Birkbeck MSc Bioinformatics With Systems Biology

This is a summary of the general approach to reading XML using the DOM in Python, via the standard library's xml.dom.minidom module.

Example 1

This is the 'obvious' way to do it: take the first matching element with [0] and read its text or attribute directly.

#!/usr/bin/env python3

from xml.dom import minidom

# Parse the whole file into a DOM tree
doc = minidom.parse('test.xml')

for species in doc.getElementsByTagName('species'):
    # Attribute of the <species> element itself
    speciesName = species.getAttribute('name')
    # Text content of the first <common-name> child
    commonName = species.getElementsByTagName('common-name')[0].firstChild.data
    # Attribute of the first <conservation> child
    conservation = species.getElementsByTagName('conservation')[0].getAttribute('status')
    print("%s (%s) %s" % (commonName, speciesName, conservation))

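The contents of test.xml are not reproduced here. As a rough guide, the sketch below parses a hypothetical document with the structure the code appears to expect: species elements with a name attribute, a common-name child holding text, and a conservation child with a status attribute. The root element name, the species and the status values are invented for illustration, and minidom.parseString stands in for minidom.parse so that no file is needed.

```python
from xml.dom import minidom

# Hypothetical sample data: the element and attribute names follow the
# example code, but the root element and all values are assumptions.
SAMPLE_XML = """<species-list>
  <species name="Panthera leo">
    <common-name>Lion</common-name>
    <conservation status="Vulnerable"/>
  </species>
  <species name="Gorilla gorilla">
    <common-name>Western gorilla</common-name>
    <conservation status="Critically Endangered"/>
  </species>
</species-list>
"""


def describe_species(xml_text):
    """Return one 'common (scientific) status' string per <species>."""
    doc = minidom.parseString(xml_text)
    lines = []
    for species in doc.getElementsByTagName('species'):
        species_name = species.getAttribute('name')
        common_name = species.getElementsByTagName('common-name')[0].firstChild.data
        status = species.getElementsByTagName('conservation')[0].getAttribute('status')
        lines.append("%s (%s) %s" % (common_name, species_name, status))
    return lines


if __name__ == '__main__':
    for line in describe_species(SAMPLE_XML):
        print(line)
```

Wrapping the loop in a function that returns a list, rather than printing directly, is only a convenience for checking the result; the DOM calls are the same as in the example.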

Example 2

This is now my preferred approach: don't worry about getting the first item, simply do everything as a loop so that the loops exactly mirror the structure of the XML. If an element occurs more than once, the last value read is the one that is kept.

#!/usr/bin/env python3

from xml.dom import minidom

doc = minidom.parse('test.xml')

for species in doc.getElementsByTagName('species'):
    speciesName = species.getAttribute('name')

    # Reset per species so a missing element cannot raise NameError
    # or silently reuse the previous species' value
    commonName = ''
    conservation = ''

    for commonNameElement in species.getElementsByTagName('common-name'):
        commonName = commonNameElement.firstChild.data

    for conservationElement in species.getElementsByTagName('conservation'):
        conservation = conservationElement.getAttribute('status')

    print("%s (%s) %s" % (commonName, speciesName, conservation))

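One practical difference between the two styles shows up when an element is missing. The sketch below uses a hypothetical species record with no common-name element: indexing with [0] raises IndexError on the empty node list, whereas the loop simply runs zero times. The sample XML, the function names and the 'unknown' default are all assumptions made for illustration.

```python
from xml.dom import minidom

# Hypothetical record: <common-name> is deliberately missing.
INCOMPLETE_XML = """<species-list>
  <species name="Panthera leo">
    <conservation status="Vulnerable"/>
  </species>
</species-list>
"""


def common_name_by_index(species):
    """Example 1 style: assumes at least one <common-name> exists."""
    return species.getElementsByTagName('common-name')[0].firstChild.data


def common_name_by_loop(species, default='unknown'):
    """Example 2 style: the loop body is skipped if nothing matches."""
    common_name = default
    for element in species.getElementsByTagName('common-name'):
        common_name = element.firstChild.data
    return common_name


if __name__ == '__main__':
    doc = minidom.parseString(INCOMPLETE_XML)
    species = doc.getElementsByTagName('species')[0]

    print(common_name_by_loop(species))

    try:
        common_name_by_index(species)
    except IndexError:
        print('no <common-name> element found')
```

Whether a silent default or an explicit error is preferable depends on how strictly the input is expected to follow its schema.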

Reading XML

Starting your script

Parse the file

Obtain a set of elements...

Get the first (only) element from a set...

Obtain the first child item (typically a text node)...

Convert a text node to actual text...

Get an attribute value...
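The steps listed above can be sketched one call at a time, using the same minidom calls as the examples. This is a minimal illustration, not the course's own reference code: the inline XML and the variable names are assumptions, and parseString stands in for parse('test.xml') so that the sketch runs without a file.

```python
from xml.dom import minidom  # starting your script

# Hypothetical one-record document, written without whitespace between
# tags so that firstChild of <common-name> is its text node.
XML_TEXT = (
    '<species-list><species name="Panthera leo">'
    '<common-name>Lion</common-name>'
    '<conservation status="Vulnerable"/>'
    '</species></species-list>'
)

# Parse the document (minidom.parse(filename) does the same for a file)
doc = minidom.parseString(XML_TEXT)

# Obtain a set of elements (a NodeList, which behaves like a list)
species_set = doc.getElementsByTagName('species')

# Get the first (only) element from the set
species = species_set[0]

# Obtain the first child item of an element (here a text node)
name_element = species.getElementsByTagName('common-name')[0]
text_node = name_element.firstChild

# Convert the text node to actual text
common_name = text_node.data

# Get an attribute value
species_name = species.getAttribute('name')

print(common_name, species_name)
```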
