xmltodict

xmltodict is a Python module that makes working with XML feel like you are working with JSON, as in this "spec":

>>> doc = xmltodict.parse("""
... <mydocument has="an attribute">
...   <and>
...     <many>elements</many>
...     <many>more elements</many>
...   </and>
...   <plus a="complex">
...     element as well
...   </plus>
... </mydocument>
... """)
>>>
>>> doc['mydocument']['@has']
u'an attribute'
>>> doc['mydocument']['and']['many']
[u'elements', u'more elements']
>>> doc['mydocument']['plus']['@a']
u'complex'
>>> doc['mydocument']['plus']['#text']
u'element as well'
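
The mapping rules behind that session (attributes get an `@` prefix, element text goes under `#text`, repeated tags collapse into a list) can be sketched with the standard library alone. This is only an illustration of the convention, not xmltodict's own code; `element_to_value` and `parse_sketch` are hypothetical names, and mixed content (text after a child element) is ignored for brevity.

```python
import xml.etree.ElementTree as ET


def element_to_value(elem):
    """Convert one element to a str, dict, or None using the @/#text convention."""
    # Attributes become keys prefixed with "@".
    result = {"@" + key: value for key, value in elem.attrib.items()}
    for child in elem:
        value = element_to_value(child)
        if child.tag in result:
            # A repeated tag turns the existing entry into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    text = (elem.text or "").strip()
    if not result:
        # No attributes and no children: the element is just its text.
        return text or None
    if text:
        result["#text"] = text
    return result


def parse_sketch(xml_string):
    """Return {root_tag: converted_root} for a small XML string."""
    root = ET.fromstring(xml_string)
    return {root.tag: element_to_value(root)}


sketch_doc = parse_sketch(
    '<mydocument has="an attribute">'
    "<and><many>elements</many><many>more elements</many></and>"
    '<plus a="complex">\n  element as well\n</plus>'
    "</mydocument>"
)
print(sketch_doc["mydocument"]["and"]["many"])
```

Unlike this sketch, xmltodict builds its result from Expat events rather than from a fully loaded tree, which is what makes the streaming mode possible.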

It's very fast (Expat-based) and has a streaming mode with a small memory footprint, suitable for big XML dumps like Discogs or Wikipedia:

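>>> from gzip import GzipFile
>>> def handle_artist(_, artist):
...     print(artist['name'])
...     return True  # a true return value tells the parser to keep going
...
>>> xmltodict.parse(GzipFile('discogs_artists.xml.gz'),
...     item_depth=2, item_callback=handle_artist)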
A Perfect Circle
Fantômas
King Crimson
Chris Potter
...
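
To see why streaming keeps memory use small, here is a rough sketch of the idea on top of the standard library's Expat bindings. It is not xmltodict's implementation: `stream_items` is a hypothetical helper, `path` is simplified to a list of element names, and each item is a flat dict of its direct children's text.

```python
import io
import xml.parsers.expat


def stream_items(source, item_depth, item_callback):
    """Call item_callback(path, item) for every element found at item_depth."""
    path = []    # names of the currently open elements
    item = None  # dict being built for the current item
    text = []    # character data seen since the last tag

    def start(name, attrs):
        nonlocal item
        path.append(name)
        if len(path) == item_depth:
            item = {}
        text.clear()

    def end(name):
        nonlocal item
        value = "".join(text).strip()
        if len(path) == item_depth + 1:
            # Direct child of the current item: store its text.
            item[name] = value
        elif len(path) == item_depth:
            item_callback(list(path), item or value)
            item = None  # drop the finished item before reading the next one
        path.pop()
        text.clear()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text.append
    parser.ParseFile(source)


xml_data = io.BytesIO(
    b"<artists>"
    b"<artist><name>King Crimson</name></artist>"
    b"<artist><name>Chris Potter</name></artist>"
    b"</artists>"
)
names = []
stream_items(xml_data, 2, lambda path, artist: names.append(artist["name"]))
print(names)
```

Only one item is held at a time, so the footprint depends on the size of a single record rather than on the size of the dump.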

It can also be used from the command line to pipe objects to a script like this:

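import sys, marshal
# marshal needs a binary stream: sys.stdin.buffer on Python 3, sys.stdin on Python 2
stdin = getattr(sys.stdin, 'buffer', sys.stdin)
while True:
    try:
        _, article = marshal.load(stdin)
    except EOFError:
        break  # end of input
    print(article['title'])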

$ cat enwiki-pages-articles.xml.bz2 | bunzip2 | xmltodict.py 2 | myscript.py
AccessibleComputing
Anarchism
AfghanistanHistory
AfghanistanGeography
AfghanistanPeople
AfghanistanCommunications
Autism
...

Or just cache the dicts so you don't have to parse that big XML file again. You do this only once:

$ cat enwiki-pages-articles.xml.bz2 | bunzip2 | xmltodict.py 2 | gzip > enwiki.dicts.gz

And you reuse the dicts with every script that needs them:

$ cat enwiki.dicts.gz | gunzip | script1.py
$ cat enwiki.dicts.gz | gunzip | script2.py
...
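
The caching step relies on the fact that marshal records can be written back to back and read one at a time until the stream runs out. A small in-memory sketch of that round trip, assuming each record is a `(path, item)` pair as in the script above; `dump_items` and `load_items` are hypothetical helpers, and the sample records are made up for illustration:

```python
import gzip
import io
import marshal


def dump_items(items, stream):
    """Write each (path, item) pair to a binary stream, one after another."""
    for pair in items:
        marshal.dump(pair, stream)


def load_items(stream):
    """Yield (path, item) pairs until the stream is exhausted."""
    while True:
        try:
            yield marshal.load(stream)
        except EOFError:
            return


# Stand-in for enwiki.dicts.gz: a gzip stream kept in memory.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as out:
    dump_items(
        [
            (["mediawiki", "page"], {"title": "Anarchism"}),
            (["mediawiki", "page"], {"title": "Autism"}),
        ],
        out,
    )

buffer.seek(0)
with gzip.GzipFile(fileobj=buffer, mode="rb") as cached:
    titles = [article["title"] for _, article in load_items(cached)]
print(titles)
```

Keep in mind that the marshal format is specific to the Python version that wrote it, so a cache built this way is best read back with the same interpreter.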

You can also convert in the other direction, using the unparse() function:

>>> mydict = {
...     'page': {
...         'title': 'King Crimson',
...         'ns': 0,
...         'revision': {
...             'id': 547909091,
...         }
...     }
... }
>>> print(xmltodict.unparse(mydict))
<?xml version="1.0" encoding="utf-8"?>
<page><ns>0</ns><revision><id>547909091</id></revision><title>King Crimson</title></page>
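
For intuition, the reverse conversion can be sketched with the standard library's `XMLGenerator`. This is an illustration of the convention under simplifying assumptions, not the library's actual code; `emit` and `to_xml` are hypothetical names. Elements come out in the dict's iteration order.

```python
from io import StringIO
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def emit(gen, name, value):
    """Write one element (or one per list entry) to an XMLGenerator."""
    if isinstance(value, list):
        # A list means the same tag repeated once per entry.
        for entry in value:
            emit(gen, name, entry)
        return
    if not isinstance(value, dict):
        value = {"#text": value}
    # Keys starting with "@" become attributes of this element.
    attrs = {k[1:]: str(v) for k, v in value.items() if k.startswith("@")}
    gen.startElement(name, AttributesImpl(attrs))
    for key, child in value.items():
        if key == "#text":
            if child is not None:
                gen.characters(str(child))
        elif not key.startswith("@"):
            emit(gen, key, child)
    gen.endElement(name)


def to_xml(document):
    """Serialize a dict with exactly one root key to an XML string."""
    [(root, body)] = document.items()
    out = StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    emit(gen, root, body)
    gen.endDocument()
    return out.getvalue()


page_xml = to_xml(
    {"page": {"title": "King Crimson", "ns": 0, "revision": {"id": 547909091}}}
)
print(page_xml)
```

Going through `XMLGenerator` rather than string concatenation means text such as `&` or `<` is escaped automatically.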

Ok, how do I get it?

You just need to run:

$ pip install xmltodict

There is an official Fedora package for xmltodict. If you are on Fedora or RHEL, you can do:

$ sudo yum install python-xmltodict

Donate

If you love xmltodict, consider supporting the author on Gittip.
