PDBSprotEC: Dump Details
The PDBSprotEC data may be obtained in two forms: a flat file and an XML file.
The flat file format has 6 columns:
|
The XML format contains the same data in the following format:
    <pdbsprotec>
      <pdb_sprot_ec pdb='xxxx'>
        <chain id='X'>
          <region res1='nnn' res2='nnn' sprot='ssssss'>
            <ec ec1='n' ec2='n' ec3='n' ec4='n'> n.n.n.n </ec>
          </region>
        </chain>
      </pdb_sprot_ec>
    </pdbsprotec>

You can download example code to parse this XML file using Perl/DOM.
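For readers who do not use Perl, the same structure can be read with Python's standard library. This is a minimal sketch rather than the example code offered for download. The embedded sample is a made-up placeholder that follows the layout shown above, and the function name iter_regions is hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder sample following the documented layout; the values are
# illustrative only and are not taken from the real dump.
SAMPLE = """<pdbsprotec>
  <pdb_sprot_ec pdb='xxxx'>
    <chain id='X'>
      <region res1='1' res2='100' sprot='ssssss'>
        <ec ec1='1' ec2='1' ec3='1' ec4='1'> 1.1.1.1 </ec>
      </region>
    </chain>
  </pdb_sprot_ec>
</pdbsprotec>"""


def iter_regions(root):
    """Yield one dict per <region>, with all of its EC numbers in a list."""
    for entry in root.iter("pdb_sprot_ec"):
        for chain in entry.findall("chain"):
            for region in chain.findall("region"):
                yield {
                    "pdb": entry.get("pdb"),
                    "chain": chain.get("id"),
                    "res1": region.get("res1"),
                    "res2": region.get("res2"),
                    "sprot": region.get("sprot"),
                    # The text of <ec> is padded with spaces, so strip it.
                    "ecs": [(ec.text or "").strip() for ec in region.findall("ec")],
                }


regions = list(iter_regions(ET.fromstring(SAMPLE)))
for r in regions:
    print(r["pdb"], r["chain"], r["res1"], r["res2"], r["sprot"], r["ecs"])
```

To read the downloaded dump instead of the sample, ET.parse(path).getroot() can be passed to iter_regions, where path is wherever you saved the XML file.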
Note that one PDB chain may contain more than one region (indicated by
the residue range in the flat file or the res1 and res2 attributes of
the <region> tag in the XML). In addition, one region may be assigned more than one EC number. This appears as multiple rows in the flat file dump and as multiple <ec> tags in the XML.
An EC number of 0.0.0.0 indicates that this protein is not an
enzyme. In other words, it appears in UniProtKB, but there is no EC
number specified either there or in the Enzyme database.

Where we don't know whether a protein is an enzyme, no EC indication is given. These may be protein chains which appear in UniProt/trEMBL, but not in UniProtKB or Enzyme. Alternatively, they may be short peptides or non-protein chains.
Note that there is a difference between the data in the flat file
and the XML file. Both contain EC numbers of 0.0.0.0 where we have evidence that a protein is not an enzyme. The flat file does not contain rows for the chains where we have no information. The XML file does include PDB chains for which we have no EC information, but these entries have no <ec> tag.
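The three cases described above (enzyme, known non-enzyme, no information) can be told apart when reading the XML. The following is a minimal sketch with a made-up placeholder sample. The function name classify_chain is hypothetical, and representing a chain without EC information as an empty <chain> element is an assumption; the check only relies on the absence of any <ec> tag beneath the chain.

```python
import xml.etree.ElementTree as ET

# Placeholder sample; values are illustrative and not taken from the real dump.
SAMPLE = """<pdbsprotec>
  <pdb_sprot_ec pdb='xxxx'>
    <chain id='A'>
      <region res1='1' res2='100' sprot='ssssss'>
        <ec ec1='1' ec2='1' ec3='1' ec4='1'> 1.1.1.1 </ec>
      </region>
    </chain>
    <chain id='B'>
      <region res1='1' res2='50' sprot='tttttt'>
        <ec ec1='0' ec2='0' ec3='0' ec4='0'> 0.0.0.0 </ec>
      </region>
    </chain>
    <chain id='C'/>
  </pdb_sprot_ec>
</pdbsprotec>"""


def classify_chain(chain):
    """Return 'enzyme', 'non-enzyme' or 'unknown' for one <chain> element."""
    # Collect every <ec> anywhere below this chain.
    ecs = [(ec.text or "").strip() for ec in chain.iter("ec")]
    if not ecs:
        return "unknown"      # no <ec> tag: no EC information
    if all(ec == "0.0.0.0" for ec in ecs):
        return "non-enzyme"   # evidence that the protein is not an enzyme
    return "enzyme"           # at least one real EC number


root = ET.fromstring(SAMPLE)
status = {chain.get("id"): classify_chain(chain) for chain in root.iter("chain")}
print(status)
```

A chain is treated as an enzyme if any of its regions carries a real EC number. Classifying per region instead would be a small change if that granularity is needed.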