Introduction

Website: http://xmlsoft.org/

The XML libraries and tools

Architecture of libxml2

Figure: the architecture of libxml2

XML example

XML is a W3C Recommendation

Inserting metadata in text to associate structure with content

<p> Classic example based on
<a href="http://www.w3.org/">HTML</a> markup</p>

The mathematical representation is a tree.

Figure: an XML tree for the given markup
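The tree for the classic example can be printed with a few lines of Python. This is only a sketch: it uses the standard library's xml.dom.minidom rather than libxml2 so that it runs without extra packages, and the helper name outline is made up for the illustration.

```python
import xml.dom.minidom

# The classic example markup from the slide above
MARKUP = ('<p> Classic example based on\n'
          '<a href="http://www.w3.org/">HTML</a> markup</p>')

def outline(node, depth=0):
    """Return one indented line per node, visiting the tree depth-first."""
    if node.nodeType == node.TEXT_NODE:
        label = "text: %r" % node.data
    else:
        label = "element: " + node.nodeName
    lines = ["  " * depth + label]
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines

doc = xml.dom.minidom.parseString(MARKUP)
tree_lines = outline(doc.documentElement)
print("\n".join(tree_lines))
```

The p element ends up with three children: a text node, the a element, and a trailing text node. Attributes such as href are attached to their element and are not children in the tree.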

The tree parser Interface

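import libxml2

# Parse the whole file into an in-memory tree
doc = libxml2.parseFile("ex1.xml")
# First child of the document node, normally the root element
p = doc.children
print(p.name)
# Documents are not freed automatically by the bindings
doc.freeDoc()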

See example 1, XML

The SAX interface
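As a rough illustration of the event-driven style, the sketch below feeds the classic example to a SAX handler and records the callbacks it receives. It uses Python's standard xml.sax module instead of the libxml2 bindings, and the class name EventLogger is hypothetical.

```python
import xml.sax

MARKUP = (b'<p> Classic example based on '
          b'<a href="http://www.w3.org/">HTML</a> markup</p>')

class EventLogger(xml.sax.ContentHandler):
    """Record each SAX callback as a (kind, value) tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # The parser may deliver one run of text in several chunks
        self.events.append(("text", content))

handler = EventLogger()
xml.sax.parseString(MARKUP, handler)
for event in handler.events:
    print(event)
```

No tree is built: the application only sees a stream of start, end, and text events, and has to keep whatever state it needs itself.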

The reader interface

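import libxml2

# Wrap an open file so the reader can pull data from it on demand
input = libxml2.inputBuffer(open("ex2.xml"))
reader = input.newTextReader("ex2.xml")
# Read() returns 1 for each node, 0 at the end of the document, -1 on error
ret = reader.Read()
while ret == 1:
    print(reader.Name())
    ret = reader.Read()
if ret != 0:
    print("ex2.xml: failed to parse")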

See example 2, XML

Validation: XML Schemas Datatypes

Figure: the hierarchy of data types

Validation: Relax-NG

<element name="p"
   xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <choice>
      <text/>
      <element name="a">
        <attribute name="href"/>
        <text/>
      </element>
    </choice>
  </zeroOrMore>
</element>

See example 4, XML, RNG

Validation: streaming

See example 5, XML, RNG

XPath: addressing language

Examples:

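/p/a
//a
//a[@href = "index.html"]

import libxml2
# url is assumed to hold the location of an HTML page
doc = libxml2.htmlParseFile(url, None)
ctxt = doc.xpathNewContext()
anchors = ctxt.xpathEval("//a[@href]")
for anchor in anchors:
    href = anchor.prop("href")
ctxt.xpathFreeContext()
doc.freeDoc()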

Try "xmllint --shell" to test XPath expressions
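The three example expressions can also be tried from Python without libxml2. The sketch below uses the limited XPath subset of the standard xml.etree.ElementTree module, which only accepts paths relative to an element, so each expression is rewritten relative to the root p element. The second a element in the sample markup is invented so that the predicate has something to match.

```python
import xml.etree.ElementTree as ET

# Classic example, plus one invented link pointing to index.html
MARKUP = ('<p> Classic example based on '
          '<a href="http://www.w3.org/">HTML</a> markup, see also '
          '<a href="index.html">the index</a></p>')
root = ET.fromstring(MARKUP)  # root is the p element

# /p/a : a elements that are direct children of the root p
direct = root.findall("./a")

# //a : a elements anywhere below the root
anywhere = root.findall(".//a")

# //a[@href = "index.html"] : only anchors with that exact href value
index_links = root.findall(".//a[@href='index.html']")

print(len(direct), len(anywhere), [a.text for a in index_links])
```

ElementTree covers only part of XPath 1.0; functions, most axes, and absolute paths need a full implementation such as the one in libxml2.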

XPointer: fragment and selection

XInclude: inclusion mechanism

xml:id : defining IDs without using a DTD

XSLT: the transformation language

See example 7, XML, Stylesheet

Extending XSLT with python

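import libxslt

def f(ctx, value):
    # Illustrative body (an assumption): return the argument in upper case
    return value.upper()

libxslt.registerExtModuleFunction("foo",
             "http://example.com/foo", f)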

See example 8

Catalogs and I/O handling

Performance

Figure: SAX speed benchmark

Figure: overall speed benchmark

Conclusions