The XML libraries and tools

Architecture of libxml2

[Figure: the architecture of libxml2]

XML example

XML is a W3C Recommendation

Inserting metadata in text to associate structure with content

<p> Classic example based on
<a href="">HTML</a> markup</p>

The mathematical representation is a tree: an XML tree for the given markup
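
To make the tree concrete, here is a minimal sketch that walks the example markup and lists one line per node. It uses Python's standard xml.etree.ElementTree instead of libxml2, purely so that it runs without extra packages; the helper name dump and the variable names are illustrative assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The markup from the slide above
MARKUP = '<p> Classic example based on\n<a href="">HTML</a> markup</p>'

def dump(elem, depth=0):
    """Return indented lines describing elem and its descendants."""
    pad = "  " * depth
    lines = [f"{pad}element {elem.tag} {elem.attrib}"]
    if elem.text and elem.text.strip():
        lines.append(f"{pad}  text {elem.text.strip()!r}")
    for child in elem:
        lines.extend(dump(child, depth + 1))
        # Text that follows a child element is a sibling of that child,
        # so it is reported at the parent's level
        if child.tail and child.tail.strip():
            lines.append(f"{pad}  text {child.tail.strip()!r}")
    return lines

tree_lines = dump(ET.fromstring(MARKUP))
print("\n".join(tree_lines))
```

The p element ends up with three children: a text node, the a element (which holds its own text node), and a trailing text node.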

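The tree parser interface

import libxml2

# Parse the whole file into an in-memory tree
doc = libxml2.parseFile("ex1.xml")
# First child of the document node (the root element in example 1)
p = doc.children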

See example 1, XML

The SAX interface
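
As a rough illustration of the event-driven model, the sketch below records the callbacks fired while parsing a small document. It relies on Python's standard xml.sax module, not on libxml2's own SAX bindings, so treat it as an analogy for the callback style rather than the libxml2 API; the class name NameCollector is hypothetical.

```python
import xml.sax

class NameCollector(xml.sax.ContentHandler):
    """Records element and text events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Character data may arrive in several chunks; skip pure whitespace
        if content.strip():
            self.events.append(("text", content.strip()))

handler = NameCollector()
xml.sax.parseString(b'<p>Classic <a href="">HTML</a> markup</p>', handler)
for event in handler.events:
    print(event)
```

No tree is built: the application sees each event once and keeps only what it chooses to store.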

The reader interface

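import libxml2

# Wrap an open file in a libxml2 input buffer and attach a reader to it
buf = libxml2.inputBuffer(open("ex2.xml"))
reader = buf.newTextReader("ex2.xml")

# Read() returns 1 while there are more nodes to visit
ret = reader.Read()
while ret == 1:
    print(reader.Name())
    ret = reader.Read()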

See example 2, XML

Validation: XML Schemas Datatypes

[Figure: the hierarchy of data types]

Validation: Relax-NG

<element name="p">
      <element name="a">
        <attribute name="href"/>

See example 4, XML, RNG

Validation: streaming

See example 5, XML, RNG

XPath: addressing language

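//a[@href = "index.html"]

import libxml2

# url is the address of the HTML page to parse
doc = libxml2.htmlParseFile(url, None)
ctxt = doc.xpathNewContext()
# Select every anchor that carries an href attribute
anchors = ctxt.xpathEval("//a[@href]")
for anchor in anchors:
    href = anchor.prop("href")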

Try "xmllint --shell" to test XPath expressions
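
For experimenting without libxml2 installed, the following sketch evaluates the same two kinds of predicate with Python's standard xml.etree.ElementTree. This is a swapped-in tool that supports only a subset of XPath, and its paths must be relative (".//a" rather than "//a"); the sample page and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A small, well-formed stand-in for an HTML page (hypothetical content)
PAGE = """<html><body>
<a href="index.html">home</a>
<a name="top">anchor without href</a>
<a href="news.html">news</a>
</body></html>"""

root = ET.fromstring(PAGE)

# ".//a[@href]" selects every <a> descendant that has an href attribute
hrefs = [a.get("href") for a in root.findall(".//a[@href]")]

# ".//a[@href='index.html']" also tests the attribute value
home = root.findall(".//a[@href='index.html']")

print(hrefs)
print([a.text for a in home])
```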

XPointer: fragment and selection

XInclude: inclusion mechanism
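
A minimal sketch of what inclusion does, using the standard library's xml.etree.ElementInclude rather than libxml2's XInclude processor. The FILES table and the loader function are assumptions introduced here so the example needs no files on disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files" standing in for documents on disk
FILES = {
    "chapter.xml": "<chapter><title>Included</title></chapter>",
}

def loader(href, parse, encoding=None):
    """Resolve an xi:include href from the FILES table instead of the disk."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]

book = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="chapter.xml"/>'
    '</book>'
)

# Replace each xi:include element with the content it points to
ElementInclude.include(book, loader=loader)
print(ET.tostring(book, encoding="unicode"))
```

After processing, the xi:include element is gone and the chapter element sits in its place inside book.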

xml:id: defining IDs without using a DTD
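
The idea can be sketched with the standard library: no DTD declares any attribute as an ID, yet elements can still be looked up by their xml:id value. ElementTree does not implement xml:id itself, so the index below is built by hand; it only illustrates the lookup an xml:id-aware processor would provide, and the sample document is invented.

```python
import xml.etree.ElementTree as ET

# Expanded name of the xml:id attribute (the "xml" prefix is predefined)
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

doc = ET.fromstring(
    '<doc><section xml:id="intro">Intro</section>'
    '<section xml:id="usage">Usage</section></doc>'
)

# Build an ID -> element index without any DTD declaring ID attributes
by_id = {el.get(XML_ID): el for el in doc.iter() if el.get(XML_ID)}
print(by_id["usage"].text)
```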

XSLT: the transformation language

See example 7, XML, Stylesheet

Extending XSLT with Python

def f(ctx, str):

             "", f)

See example 8

Catalogs and I/O handling

[Figures: SAX speed benchmark; overall speed benchmark]