
Using lxml for XML parsing/validation in Python

…Is actually very easy. lxml for Windows is distributed as an egg file, which can be installed with a single command once you have the egg installer, easy_install. It ships with setuptools, which grew out of the Python Enterprise Application Kit (PEAK). After some initial drama finding the correct version of Python on my system, I got lxml working pretty quickly. Using it is very simple:

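from lxml import etree

filename = "agent.xml"  # hypothetical path; substitute the XML file to check
agentschema = "agent-schema.xsd"

# Parse the document and the schema into element trees.
tree = etree.parse(filename)
schema_doc = etree.parse(agentschema)

# xmlschema() returns True when the document validates against the schema.
if tree.xmlschema(schema_doc):
    print("***Processed " + filename + " successfully.")
else:
    print("***Error in " + filename + "!")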

This parses an XML file and validates it against a schema (in this case, agent-schema.xsd). Once the file has been parsed and validated, XPath is probably the best way to navigate the tree, as it provides a simple way to query elements; it’s definitely more pleasant than walking the tree manually. The lxml page has good instructions on how to use XPath.
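As a rough illustration of the XPath approach, here is a minimal sketch using lxml's xpath() method. The document below is made up for the example: the agents, agent, and goal names are assumptions of mine and are not taken from agent-schema.xsd.

```python
from lxml import etree

# Hypothetical agent document; the element and attribute names are
# invented for illustration and do not come from agent-schema.xsd.
AGENT_XML = b"""<agents>
  <agent name="alpha"><goal>explore</goal></agent>
  <agent name="beta"><goal>guard</goal></agent>
</agents>"""

root = etree.fromstring(AGENT_XML)

# Select attribute values: the name of every agent element.
names = root.xpath("/agents/agent/@name")

# Select text with a predicate: the goal of the agent named "beta".
beta_goal = root.xpath("/agents/agent[@name='beta']/goal/text()")

print(names)
print(beta_goal)
```

Both queries return plain Python lists, so the results can be looped over or indexed directly instead of walking child nodes by hand.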

Writing a schema, on the other hand, is a bit trickier. www.w3schools.com has a good tutorial on how to build one. Note that its examples use the xs prefix where other references use xsd; the two are interchangeable, since the prefix is arbitrary and only has to be bound to the XML Schema namespace, http://www.w3.org/2001/XMLSchema (so xs:schema and xsd:schema mean the same thing). In addition, I’ve written up most of the CML language from Funge as a schema already, but I still need to go back to his master’s thesis to find the full grammar for it. This will give us at least a starting point for bringing in agent definitions for the core.
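For a feel of what a small schema looks like, here is a minimal sketch that builds the same toy schema twice, once with the xs prefix and once with xsd, and validates two documents against each using lxml's XMLSchema class. The agent element, its goal child, and its name attribute are hypothetical stand-ins, not the real CML grammar, and build_schema is a helper invented for this example.

```python
from lxml import etree

# Toy schema template; {p} is the prefix bound to the XML Schema namespace.
# The "agent" structure is invented for illustration only.
XSD_TEMPLATE = """<{p}:schema xmlns:{p}="http://www.w3.org/2001/XMLSchema">
  <{p}:element name="agent">
    <{p}:complexType>
      <{p}:sequence>
        <{p}:element name="goal" type="{p}:string"/>
      </{p}:sequence>
      <{p}:attribute name="name" type="{p}:string" use="required"/>
    </{p}:complexType>
  </{p}:element>
</{p}:schema>"""


def build_schema(prefix):
    """Hypothetical helper: compile the toy schema with the given prefix."""
    schema_root = etree.fromstring(XSD_TEMPLATE.format(p=prefix))
    return etree.XMLSchema(schema_root)


good = etree.fromstring('<agent name="alpha"><goal>explore</goal></agent>')
bad = etree.fromstring('<agent><goal>explore</goal></agent>')  # no name attribute

results = {}
for prefix in ("xs", "xsd"):
    schema = build_schema(prefix)
    results[prefix] = (schema.validate(good), schema.validate(bad))
    # error_log holds the messages from the most recent validate() call.
    for error in schema.error_log:
        print(prefix, "line", error.line, error.message)

print(results)
```

Either prefix should give the same validation results, and the error log says why a document failed, which is more useful than a bare True or False when a schema is still being written.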
