Reading XML

This is a summary of the general approach to reading XML with the DOM in Python, using the minidom module from the standard library.

Starting your script

#!/usr/bin/env python3

import sys

# Load minidom
from xml.dom import minidom

# Read a filename from the command line, with a usage message if it is missing
progName = sys.argv.pop(0)
if not sys.argv:
    sys.exit("usage: %s file.xml" % progName)
fileName = sys.argv.pop(0)

Parse the file

doc = minidom.parse(fileName)
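When experimenting without a file on disk, minidom also provides parseString(), which builds the same kind of Document object from a string. A minimal sketch, where the XML content is a made-up sample:

```python
from xml.dom import minidom

# Hypothetical sample XML held in a string, so no file is needed
xmlText = '<zoo><species name="Panthera leo"/></zoo>'

# parseString() returns a Document, just as parse() does for a file
doc = minidom.parseString(xmlText)

# The document element is the outermost tag
print(doc.documentElement.tagName)  # zoo
```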

Obtain the set (a NodeList) of all elements with a given tag name, at any depth...

tagnameElementSet = doc.getElementsByTagName('tagname') 
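getElementsByTagName() searches every descendant, not only direct children, and returns the matches in document order. A small sketch to illustrate this, using a made-up document in which 'species' appears at two different depths:

```python
from xml.dom import minidom

# Hypothetical sample: the second species is nested inside a <group>
doc = minidom.parseString(
    '<zoo>'
    '<species name="a"><common-name>Lion</common-name></species>'
    '<group><species name="b"><common-name>Tiger</common-name></species></group>'
    '</zoo>'
)

# Both species elements are found, regardless of nesting depth
speciesSet = doc.getElementsByTagName('species')
names = [s.getAttribute('name') for s in speciesSet]

print(len(speciesSet))  # 2
print(names)            # ['a', 'b']
```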

Get the first (or only) element from a set...

tagnameElement = tagnameElementSet.item(0)
- or -
tagnameElement = tagnameElementSet[0]
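The two forms behave the same when the element exists, but differ when the set is empty: item(0) returns None, while [0] raises IndexError. A quick sketch with a made-up document:

```python
from xml.dom import minidom

doc = minidom.parseString('<root><tagname>hello</tagname></root>')

found = doc.getElementsByTagName('tagname')
missing = doc.getElementsByTagName('no-such-tag')

# Same element either way when it exists
print(found.item(0) is found[0])  # True

# item() on an empty set returns None ...
print(missing.item(0))  # None

# ... while indexing an empty set raises IndexError
try:
    missing[0]
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```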

Obtain the first child item (typically a text node)...

tagnameTextNode = tagnameElement.firstChild 
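"Typically" matters here: minidom keeps whitespace between tags as text nodes, and an empty element has no children at all, so firstChild is None. A sketch with a made-up document showing the three cases:

```python
from xml.dom import minidom, Node

doc = minidom.parseString('<root><a>text</a><b>\n  <c/></b><d/></root>')

a, b, d = (doc.getElementsByTagName(t)[0] for t in ('a', 'b', 'd'))

# <a>: the first child is the text node holding "text"
print(a.firstChild.nodeType == Node.TEXT_NODE)  # True

# <b>: the first child is the whitespace text before <c/>, not <c/> itself
print(repr(b.firstChild.data))

# <d/>: an empty element has no first child
print(d.firstChild)  # None
```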

Get the string held by a text node...

tagnameText = tagnameTextNode.data
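.data only gives the text of that one node. If an element mixes text with child elements, the first text node holds just the first fragment. One way to collect all of it is a small recursive helper; getText below is a hypothetical name, not part of minidom:

```python
from xml.dom import minidom, Node

def getText(element):
    """Concatenate the data of every text/CDATA node below element (hypothetical helper)."""
    parts = []
    for child in element.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            # Recurse into nested elements such as <b>
            parts.append(getText(child))
    return ''.join(parts)

p = minidom.parseString('<p>Hello <b>big</b> world</p>').documentElement

print(repr(p.firstChild.data))  # 'Hello '
print(getText(p))               # Hello big world
```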

Get an attribute value...

attributeText = tagnameElement.getAttribute('attr-name')
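Note that getAttribute() returns an empty string, not None, when the attribute is absent; hasAttribute() tells the two cases apart. A sketch using a made-up element:

```python
from xml.dom import minidom

el = minidom.parseString('<conservation status="LC"/>').documentElement

present = el.getAttribute('status')
absent = el.getAttribute('missing')

print(repr(present))               # 'LC'
print(repr(absent))                # ''
print(el.hasAttribute('missing'))  # False
```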

Examples

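This is the 'obvious' way to do it, taking item [0] of each set and assuming every species has exactly one common-name and one conservation element: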

#!/usr/bin/env python3

from xml.dom import minidom

doc = minidom.parse('test.xml')

for species in doc.getElementsByTagName('species'):
    speciesName = species.getAttribute('name')
    commonName = species.getElementsByTagName('common-name')[0].firstChild.data
    conservation = species.getElementsByTagName('conservation')[0].getAttribute('status')
    print("%s (%s) %s" % (commonName, speciesName, conservation))


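Since test.xml itself is not reproduced here, the sketch below runs the same loop against an inline document. The layout (species elements with a name attribute, a common-name child holding text, and a conservation child with a status attribute) is inferred from the code, and the data is made up:

```python
from xml.dom import minidom

# Hypothetical stand-in for test.xml; the real file's layout is assumed
XML = '''<zoo>
  <species name="Panthera leo">
    <common-name>Lion</common-name>
    <conservation status="vulnerable"/>
  </species>
  <species name="Ailuropoda melanoleuca">
    <common-name>Giant panda</common-name>
    <conservation status="vulnerable"/>
  </species>
</zoo>'''

doc = minidom.parseString(XML)

lines = []
for species in doc.getElementsByTagName('species'):
    speciesName = species.getAttribute('name')
    commonName = species.getElementsByTagName('common-name')[0].firstChild.data
    conservation = species.getElementsByTagName('conservation')[0].getAttribute('status')
    lines.append("%s (%s) %s" % (commonName, speciesName, conservation))

print('\n'.join(lines))
```

If any species lacked a common-name or conservation element, the [0] index would raise IndexError.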

This is now my preferred approach: don't worry about getting the first item; simply do everything as a loop, so that the loops exactly mirror the structure of the XML.

#!/usr/bin/env python3

from xml.dom import minidom

doc = minidom.parse('test.xml')

for species in doc.getElementsByTagName('species'):
    speciesName = species.getAttribute('name')

    # Defaults, reset for every species, in case an element is missing
    commonName = ''
    for commonNameElement in species.getElementsByTagName('common-name'):
        commonName = commonNameElement.firstChild.data

    conservation = ''
    for conservationElement in species.getElementsByTagName('conservation'):
        conservation = conservationElement.getAttribute('status')

    print("%s (%s) %s" % (commonName, speciesName, conservation))


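The practical difference between the two styles shows up when an element is missing. The sketch below compares them on a made-up document where the second species has no conservation element. statusByIndex and statusByLoop are hypothetical helper names, and the 'unknown' default in the loop version is an assumption of this sketch:

```python
from xml.dom import minidom

# Hypothetical sample: the second species has no <conservation> element
doc = minidom.parseString(
    '<zoo>'
    '<species name="Panthera leo"><conservation status="vulnerable"/></species>'
    '<species name="Felis catus"/>'
    '</zoo>'
)

def statusByIndex(species):
    # 'Obvious' style: assumes a <conservation> child always exists
    return species.getElementsByTagName('conservation')[0].getAttribute('status')

def statusByLoop(species):
    # Loop style: keeps a default when no <conservation> child is found
    status = 'unknown'
    for conservationElement in species.getElementsByTagName('conservation'):
        status = conservationElement.getAttribute('status')
    return status

results = []
for species in doc.getElementsByTagName('species'):
    try:
        indexed = statusByIndex(species)
    except IndexError:
        indexed = 'IndexError'
    results.append((species.getAttribute('name'), indexed, statusByLoop(species)))

for row in results:
    print(row)
```

One trade-off of the loop style: if an element unexpectedly appears more than once, the last occurrence silently wins.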
