Practical

The XML file

You should download the XML file and save a copy to disk.

You can do this on hope by typing the command:

wget http://www.bioinf.org.uk/teaching/bbk/databases/dom/mutations.xml

The XML file which you will load and parse is structured as follows:

<mutants>
   <mutant_group native="1adeA2">
      <native_structure>
         <method>crystal</method>
         <resolution>2.00</resolution>
         <rfactor>19.90</rfactor>
      </native_structure>
      <mutant domid="1cg3A2">
         <structure>
            <method>crystal</method>
            <resolution>2.50</resolution>
            <rfactor>16.20</rfactor>
         </structure>
         <mutation resnum="A143"  wildtype="Arg" substitution="Leu"/>
      </mutant>
      ...
   </mutant_group>
   ...
</mutants>

The 'root' element is <mutants> and it contains multiple <mutant_group> elements. Each of these refers to one 'native' structure which is described in the <native_structure> element.

Each <mutant_group> also contains one or more <mutant> elements describing mutations to the native structure. Each <mutant> element contains a <structure> element which describes the structure and a <mutation> element which describes the mutation.

Note: The XML is slightly more complex than the examples in the lecture since the <method>, <resolution> and <rfactor> elements appear within both the <native_structure> and the <structure> elements.
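One way to cope with the repeated element names is to narrow the search to the right parent before asking for <resolution>. The following is a minimal sketch of that idea, not a model answer: it parses the example fragment shown above from a string (with the downloaded file you would call minidom.parse("mutations.xml") instead), and the helper name element_text is an illustrative choice.

```python
from xml.dom import minidom

# The example fragment from this handout, held in a string so the sketch
# runs without the downloaded file.
SAMPLE = """<mutants>
   <mutant_group native="1adeA2">
      <native_structure>
         <method>crystal</method>
         <resolution>2.00</resolution>
         <rfactor>19.90</rfactor>
      </native_structure>
      <mutant domid="1cg3A2">
         <structure>
            <method>crystal</method>
            <resolution>2.50</resolution>
            <rfactor>16.20</rfactor>
         </structure>
         <mutation resnum="A143"  wildtype="Arg" substitution="Leu"/>
      </mutant>
   </mutant_group>
</mutants>"""


def element_text(parent, tag):
    """Return the text of the first <tag> element found below parent.

    Hypothetical helper; assumes the element exists and holds plain text.
    """
    node = parent.getElementsByTagName(tag)[0]
    return node.firstChild.data.strip()


doc = minidom.parseString(SAMPLE)
group = doc.getElementsByTagName("mutant_group")[0]

# Searching from the group finds every <resolution> below it,
# native and mutant alike, in document order.
all_resolutions = [node.firstChild.data
                   for node in group.getElementsByTagName("resolution")]

# Narrowing the search to one parent picks out the value that is wanted.
native = group.getElementsByTagName("native_structure")[0]
native_resolution = element_text(native, "resolution")

mutant = group.getElementsByTagName("mutant")[0]
mutant_resolution = element_text(mutant, "resolution")

print(group.getAttribute("native"), native_resolution)
print(mutant.getAttribute("domid"), mutant_resolution)
```

getElementsByTagName() searches all descendants of the element it is called on, which is why calling it on the whole <mutant_group> returns both resolutions.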

Tasks

You will write two Python scripts using the xml.dom.minidom module as follows:

Task 1

List all the native structures with resolution ≤ 2.0 Å.

The results should look like this:

1adeA2 2.00
1adeB2 2.00
1qf5A2 2.00
1sspE0 1.90
1eugA0 1.60
1bvtA0 1.85
1ede00 1.90
2had00 1.90
2dhd00 2.00
1akeA0 1.90
1akeB0 1.90
1ankA0 2.00
1ankB0 2.00
1aky00 1.63
2aky00 1.96
1a2300 0.00
1a2400 0.00
1fvkA0 1.70
1fvkB0 1.70
1a2j00 2.00
1dsbA0 2.00
1dsbB0 2.00
1edt00 1.90
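If you get stuck, the outline below shows one possible shape for Task 1. It is a sketch under stated assumptions rather than the expected solution: the function name natives_within is invented here, the data is a cut-down stand-in for mutations.xml, and the second group (9xyzA0 and its mutant 9xyzB0) is made up purely to show a native that fails the cut-off.

```python
from xml.dom import minidom

# Cut-down stand-in for mutations.xml. The first group comes from the
# example in this handout; the second group is invented for illustration.
SAMPLE = """<mutants>
   <mutant_group native="1adeA2">
      <native_structure>
         <method>crystal</method>
         <resolution>2.00</resolution>
         <rfactor>19.90</rfactor>
      </native_structure>
   </mutant_group>
   <mutant_group native="9xyzA0">
      <native_structure>
         <resolution>2.80</resolution>
      </native_structure>
      <mutant domid="9xyzB0">
         <structure>
            <resolution>1.50</resolution>
         </structure>
      </mutant>
   </mutant_group>
</mutants>"""


def natives_within(doc, cutoff):
    """Return (native id, resolution text) pairs with resolution <= cutoff."""
    hits = []
    for group in doc.getElementsByTagName("mutant_group"):
        # Look inside <native_structure> only, so that the resolution of a
        # mutant <structure> is never mistaken for the native one.
        native = group.getElementsByTagName("native_structure")[0]
        node = native.getElementsByTagName("resolution")[0]
        resolution = node.firstChild.data.strip()
        if float(resolution) <= cutoff:
            hits.append((group.getAttribute("native"), resolution))
    return hits


doc = minidom.parseString(SAMPLE)
for name, resolution in natives_within(doc, 2.0):
    print(name, resolution)
```

Keeping the resolution as the original string and converting to float only for the comparison preserves the two decimal places seen in the expected results.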

Task 2

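Extract a list of the mutant structures for the native 1eugA0. For each mutant, your list should contain the domain identifier, the residue number, the wild-type residue, the substitution and the resolution.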

The results should look like this:

4eugA0 A187 His Gln 1.40
5eugA0 A187 His Gln 1.60
3eugA0 A19  Tyr His 1.43
2eugA0 A19  Tyr His 1.50
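A possible outline for Task 2 follows, again as a hedged sketch rather than the required answer. The function name mutants_of is invented, the sample holds only two of the 1eugA0 mutants listed above, and the <method> and <rfactor> elements are left out of the 1eugA0 entries because their values are not shown in this handout. It also assumes each <mutant> contains a single <mutation> element.

```python
from xml.dom import minidom

# Cut-down stand-in for mutations.xml, built from values shown in this
# handout; <method> and <rfactor> are omitted where they are not given.
SAMPLE = """<mutants>
   <mutant_group native="1adeA2">
      <native_structure>
         <resolution>2.00</resolution>
      </native_structure>
      <mutant domid="1cg3A2">
         <structure>
            <resolution>2.50</resolution>
         </structure>
         <mutation resnum="A143"  wildtype="Arg" substitution="Leu"/>
      </mutant>
   </mutant_group>
   <mutant_group native="1eugA0">
      <native_structure>
         <resolution>1.60</resolution>
      </native_structure>
      <mutant domid="4eugA0">
         <structure>
            <resolution>1.40</resolution>
         </structure>
         <mutation resnum="A187" wildtype="His" substitution="Gln"/>
      </mutant>
      <mutant domid="3eugA0">
         <structure>
            <resolution>1.43</resolution>
         </structure>
         <mutation resnum="A19" wildtype="Tyr" substitution="His"/>
      </mutant>
   </mutant_group>
</mutants>"""


def mutants_of(doc, native_id):
    """Return one tuple per mutant of the given native structure."""
    rows = []
    for group in doc.getElementsByTagName("mutant_group"):
        if group.getAttribute("native") != native_id:
            continue
        for mutant in group.getElementsByTagName("mutant"):
            # Assumption: only the first <mutation> of each mutant is read.
            mutation = mutant.getElementsByTagName("mutation")[0]
            structure = mutant.getElementsByTagName("structure")[0]
            node = structure.getElementsByTagName("resolution")[0]
            rows.append((mutant.getAttribute("domid"),
                         mutation.getAttribute("resnum"),
                         mutation.getAttribute("wildtype"),
                         mutation.getAttribute("substitution"),
                         node.firstChild.data.strip()))
    return rows


doc = minidom.parseString(SAMPLE)
# Pad the residue number to four characters so the columns line up.
lines = [f"{domid} {resnum:<4} {wildtype} {substitution} {resolution}"
         for domid, resnum, wildtype, substitution, resolution
         in mutants_of(doc, "1eugA0")]
print("\n".join(lines))
```

Attributes such as domid and resnum are read with getAttribute(), whereas the resolution is element text and has to be reached through the element's first child node.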

Note: You may not have time to do the second task. Essentially both tasks teach you the same things; the second one is just a bit harder than the first, for those who find it all very straightforward!
