PDBSprotEC: Methods

PDBSprotEC uses
Previously we used a mapping between individual residues in PDB entries and amino acids in SwissProt entries, produced by Sameer Velankar and Phil McNeil (MSD group) and Virginie Mittard (Sequence Database group) at the EBI.

These data sources are mirrored locally using the standard Perl Mirror script. Note that we have patched mirror.pl to allow data stored compressed remotely to be stored uncompressed locally. The patches to mirror.pl are available here.
The standard GNU 'make' utility is run each night to detect updates to any of these data sources. The Makefile can be accessed here. Updates to either the PDBChain/SwissProt mapping or the Enzyme database cause the respective tables to be dropped and reloaded, while updates to SwissProt are applied to the existing SwissProt/EC mapping table.
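The nightly check described above can be illustrated with a minimal Python sketch of make-style timestamp comparison. This is not the actual Makefile: the source names, the `.stamp` files, and the action labels are assumptions made purely for illustration.

```python
import os

# Hypothetical source names mapped to the kind of refresh each one triggers.
ACTIONS = {
    "pdb_swissprot_mapping": "reload",   # table dropped and reloaded
    "enzyme": "reload",                  # table dropped and reloaded
    "swissprot": "update_in_place",      # existing table updated
}


def is_stale(source_path, stamp_path):
    """Return True if the stamp file is missing or older than the source."""
    if not os.path.exists(stamp_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(stamp_path)


def plan_updates(sources, stamp_dir):
    """Given {name: path}, list (name, action) for every source that changed.

    A stamp file (hypothetical convention: <name>.stamp) records when the
    source was last processed, much as a make target records its build time.
    """
    plan = []
    for name, path in sources.items():
        stamp = os.path.join(stamp_dir, name + ".stamp")
        if is_stale(path, stamp):
            plan.append((name, ACTIONS[name]))
    return plan
```

Under this scheme, a source whose mirrored file is newer than its stamp is scheduled for its refresh, and untouched sources are skipped, which is the same rule 'make' applies to targets and their prerequisites.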
Perl scripts are used to extract the required data from the three data
sources and to create the two database tables. The database is
implemented using PostgreSQL.
The structure of these tables is as follows:
Note that the SwissProt/EC table is partly redundant, since it holds the EC numbers in both complete and split form. This makes queries easier. The PDBChain/SwissProt table is indexed on PDB code and SwissProt code, while the SwissProt/EC table is indexed on all columns.

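To illustrate why storing the EC number in both complete and split form eases queries, here is a minimal Python sketch. The column names (`ac`, `ec`, `ec1` to `ec4`) and the accession values are hypothetical and are not taken from the actual schema.

```python
def split_ec(ec):
    """Split an EC number 'a.b.c.d' into its four levels, kept as strings.

    Strings are kept (rather than integers) so that partial numbers such as
    '3.4.-.-' survive unchanged.
    """
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    return tuple(parts)


def make_row(accession, ec):
    """Build one row holding the EC number in complete and split form."""
    ec1, ec2, ec3, ec4 = split_ec(ec)
    return {"ac": accession, "ec": ec,
            "ec1": ec1, "ec2": ec2, "ec3": ec3, "ec4": ec4}


# Made-up accessions, used only to show the query pattern.
rows = [
    make_row("ACC_A", "1.1.1.1"),
    make_row("ACC_B", "3.4.21.4"),
    make_row("ACC_C", "3.4.-.-"),
]

# Select everything in sub-class 3.4 without any string pattern matching.
hits = [r["ac"] for r in rows if r["ec1"] == "3" and r["ec2"] == "4"]
```

With the split columns, a query at any level of the EC hierarchy becomes a plain equality test on indexed fields, while the complete form remains available for exact lookups and display.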
A flat-file dump of the data is produced using a simple query of the database, while the XML dump is created using a Perl/DBI script. Both are also created automatically as part of the 'make' run.
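The relationship between the two dump formats can be sketched as follows in Python. The real dump layouts are not reproduced here: the tab-separated flat format, the `pdbsprotec` and `entry` element names, and the attribute names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def to_flat(rows):
    """Render (pdb, chain, accession, ec) tuples as tab-separated lines."""
    return "".join("\t".join(row) + "\n" for row in rows)


def to_xml(rows):
    """Render the same tuples as XML, one hypothetical <entry> per row."""
    root = ET.Element("pdbsprotec")
    for pdb, chain, ac, ec in rows:
        ET.SubElement(root, "entry", pdb=pdb, chain=chain, ac=ac, ec=ec)
    return ET.tostring(root, encoding="unicode")


def xml_to_flat(xml_text):
    """Convert the XML form back into the flat form."""
    root = ET.fromstring(xml_text)
    rows = [(e.get("pdb"), e.get("chain"), e.get("ac"), e.get("ec"))
            for e in root.iter("entry")]
    return to_flat(rows)
```

Because both dumps carry the same fields, converting the XML form to the flat form is a matter of reading each record and writing its fields in a fixed order.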
Download a draft paper, or read the paper as published in Bioinformatics.
Download an example Perl script which converts the XML dump format into the flat-file dump format.