xmltodict

by martinblech

martinblech / xmltodict

Python module that makes working with XML feel like you are working with JSON

4.1K Stars 399 Forks Last release: Not found MIT License 205 Commits 29 Releases

Available items

No Items, yet!

The developer of this repository has not created any items for sale yet. Need a bug fixed? Help with integration? A different license? Create a request here:

xmltodict

xmltodict
is a Python module that makes working with XML feel like you are working with JSON, as in this "spec":

Build Status

>>> print(json.dumps(xmltodict.parse("""
...  
...    
...      elements
...      more elements
...    
...    
...      element as well
...    
...  
...  """), indent=4))
{
    "mydocument": {
        "@has": "an attribute", 
        "and": {
            "many": [
                "elements", 
                "more elements"
            ]
        }, 
        "plus": {
            "@a": "complex", 
            "#text": "element as well"
        }
    }
}

Namespace support

By default,

xmltodict
does no XML namespace processing (it just treats namespace declarations as regular node attributes), but passing
process_namespaces=True
will make it expand namespaces for you:
>>> xml = """
... 
...   1
...   2
...   3
... 
... """
>>> xmltodict.parse(xml, process_namespaces=True) == {
...     'http://defaultns.com/:root': {
...         'http://defaultns.com/:x': '1',
...         'http://a.com/:y': '2',
...         'http://b.com/:z': '3',
...     }
... }
True

It also lets you collapse certain namespaces to shorthand prefixes, or skip them altogether:

>>> namespaces = {
...     'http://defaultns.com/': None, # skip this namespace
...     'http://a.com/': 'ns_a', # collapse "http://a.com/" -> "ns_a"
... }
>>> xmltodict.parse(xml, process_namespaces=True, namespaces=namespaces) == {
...     'root': {
...         'x': '1',
...         'ns_a:y': '2',
...         'http://b.com/:z': '3',
...     },
... }
True

Streaming mode

xmltodict
is very fast (Expat-based) and has a streaming mode with a small memory footprint, suitable for big XML dumps like Discogs or Wikipedia:
>>> def handle_artist(_, artist):
...     print(artist['name'])
...     return True
>>> 
>>> xmltodict.parse(GzipFile('discogs_artists.xml.gz'),
...     item_depth=2, item_callback=handle_artist)
A Perfect Circle
Fantômas
King Crimson
Chris Potter
...

It can also be used from the command line to pipe objects to a script like this:

import sys, marshal
while True:
    _, article = marshal.load(sys.stdin)
    print(article['title'])
$ bunzip2 enwiki-pages-articles.xml.bz2 | xmltodict.py 2 | myscript.py
AccessibleComputing
Anarchism
AfghanistanHistory
AfghanistanGeography
AfghanistanPeople
AfghanistanCommunications
Autism
...

Or just cache the dicts so you don't have to parse that big XML file again. You do this only once:

$ bunzip2 enwiki-pages-articles.xml.bz2 | xmltodict.py 2 | gzip > enwiki.dicts.gz

And you reuse the dicts with every script that needs them:

$ gunzip enwiki.dicts.gz | script1.py
$ gunzip enwiki.dicts.gz | script2.py
...

Roundtripping

You can also convert in the other direction, using the

unparse()
method:
>>> mydict = {
...     'response': {
...             'status': 'good',
...             'last_updated': '2014-02-16T23:10:12Z',
...     }
... }
>>> print(unparse(mydict, pretty=True))


    good
    2014-02-16T23:10:12Z

Text values for nodes can be specified with the

cdata_key
key in the python dict, while node properties can be specified with the
attr_prefix
prefixed to the key name in the python dict. The default value for
attr_prefix
is
@
and the default value for
cdata_key
is
#text
.
>>> import xmltodict
>>> 
>>> mydict = {
...     'text': {
...         '@color':'red',
...         '@stroke':'2',
...         '#text':'This is a test'
...     }
... }
>>> print(xmltodict.unparse(mydict, pretty=True))

This is a test

Lists that are specified under a key in a dictionary use the key as a tag for each item. But if a list does have a parent key, for example if a list exists inside another list, it does not have a tag to use and the items are converted to a string as shown in the example below. To give tags to nested lists, use the

expand_iter
keyword argument to provide a tag as demonstrated below. Note that using
expand_iter
will break roundtripping.
>>> mydict = {
...     "line": {
...         "points": [
...             [1, 5],
...             [2, 6],
...         ]
...     }
... }
>>> print(xmltodict.unparse(mydict, pretty=True))


        [1, 5]
        [2, 6]

>>> print(xmltodict.unparse(mydict, pretty=True, expand_iter="coord"))


        
                1
                5
        
        
                2
                6
        

Ok, how do I get it?

Using pypi

You just need to

$ pip install xmltodict

RPM-based distro (Fedora, RHEL, …)

There is an official Fedora package for xmltodict.

$ sudo yum install python-xmltodict

Arch Linux

There is an official Arch Linux package for xmltodict.

$ sudo pacman -S python-xmltodict

Debian-based distro (Debian, Ubuntu, …)

There is an official Debian package for xmltodict.

$ sudo apt install python-xmltodict

FreeBSD

There is an official FreeBSD port for xmltodict.

$ pkg install py36-xmltodict

openSUSE/SLE (SLE 15, Leap 15, Tumbleweed)

There is an official openSUSE package for xmltodict.

# Python2
$ zypper in python2-xmltodict

Python3

$ zypper in python3-xmltodict

We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.