PDBSprotEC: Methods

PDBSprotEC uses three data sources:

our mapping from PDB chains and residue numbers to SwissProt codes and residue numbers
the Enzyme database
SwissProt, patched with updates and new entries

From these it creates two database tables, which may then be queried and joined to map a PDB chain to an EC number via its SwissProt code.

Previously we used a mapping between individual residues in PDB entries and amino acids in SwissProt entries, produced by Sameer Velankar and Phil McNeil (MSD group) and Virginie Mittard (Sequence Database group) at the EBI.

These data sources are mirrored locally using the standard Perl Mirror script. Note that we have patched mirror.pl so that data stored compressed on the remote server can be stored uncompressed locally. The patches to mirror.pl are available here.

The standard GNU 'make' utility is run each night to detect updates to any of these data sources. The Makefile can be accessed here. Updates to either the PDBChain/SwissProt mapping or the Enzyme database cause the respective tables to be dropped and reloaded, while updates to SwissProt are applied to the existing SwissProt/EC mapping table.
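The update check that 'make' performs can be illustrated with a minimal sketch. This is not the project's Makefile: the file names below and the idea of a stamp file written after each table load are assumptions made purely for illustration. The rule is the usual timestamp comparison, where a target is rebuilt if it is missing or older than any of its sources.

```python
import os
import tempfile


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any source file."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


def touch(path, mtime):
    """Create the file if needed and set its modification time (in seconds)."""
    with open(path, "a"):
        pass
    os.utime(path, (mtime, mtime))


# Hypothetical file names in a scratch directory.
workdir = tempfile.mkdtemp()
enzyme_data = os.path.join(workdir, "enzyme.dat")      # mirrored data source
table_stamp = os.path.join(workdir, "sprotec.loaded")  # stamp written after a load

touch(enzyme_data, 1_000_000_000)
missing_stamp = needs_rebuild(table_stamp, [enzyme_data])  # table never loaded

touch(table_stamp, 1_000_000_100)
up_to_date = needs_rebuild(table_stamp, [enzyme_data])     # load is newer than source

touch(enzyme_data, 1_000_000_200)
after_update = needs_rebuild(table_stamp, [enzyme_data])   # mirror fetched new data

print(missing_stamp, up_to_date, after_update)
```

In this sketch a True result would trigger the drop-and-reload step for the corresponding table.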

Perl scripts are used to extract the required data from the three data sources and to create the two database tables. The database is implemented using PostgreSQL. The structure of these tables is as follows:

PDB Chain / SwissProt

CREATE TABLE pdbsws
(
   pdbcode  char(4),
   chainid  char(1),
   sprot    varchar(10),
   res1     varchar(6),
   res2     varchar(6)
);

SwissProt / EC

CREATE TABLE sprotec
(
   sprot   varchar(10),
   ec      varchar(16),
   ec1     varchar(3),
   ec2     varchar(3),
   ec3     varchar(3),
   ec4     varchar(3)
);

Note that the SwissProt/EC table contains some redundancy, as it stores each EC number both in complete form and split into its four levels. This makes queries on individual EC levels easier. The PDBChain/SwissProt table is indexed on PDB code and SwissProt code, while the SwissProt/EC table is indexed on all columns.
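The following sketch shows how the two tables can be joined on the SwissProt code, and why the split EC columns are convenient. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL. The rows are made up for illustration (they are not real PDB or SwissProt data), and padding missing EC levels with '-' is an assumption of this sketch rather than a documented property of the table.

```python
import sqlite3


def split_ec(ec):
    """Split an EC number such as '3.4.21.4' into four level strings.

    Missing levels are padded with '-' (an assumption of this sketch).
    """
    parts = ec.split(".")
    parts += ["-"] * (4 - len(parts))
    return parts[:4]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pdbsws (pdbcode char(4), chainid char(1), sprot varchar(10),
                     res1 varchar(6), res2 varchar(6));
CREATE TABLE sprotec (sprot varchar(10), ec varchar(16),
                      ec1 varchar(3), ec2 varchar(3),
                      ec3 varchar(3), ec4 varchar(3));
CREATE INDEX pdbsws_sprot ON pdbsws (sprot);
""")

# Made-up rows: '0abc' is not a valid PDB code and the sprot values are placeholders.
conn.executemany("INSERT INTO pdbsws VALUES (?, ?, ?, ?, ?)", [
    ("0abc", "A", "SPROT_A", "1", "120"),
    ("0abc", "B", "SPROT_B", "1", "95"),
])
for sprot, ec in [("SPROT_A", "3.4.21.4"), ("SPROT_B", "1.1.1.-")]:
    conn.execute("INSERT INTO sprotec VALUES (?, ?, ?, ?, ?, ?)",
                 (sprot, ec, *split_ec(ec)))

# PDB chain -> EC number, joining via the SwissProt code.
by_chain = conn.execute("""
    SELECT p.pdbcode, p.chainid, e.ec
    FROM pdbsws p JOIN sprotec e ON p.sprot = e.sprot
    WHERE p.pdbcode = ? AND p.chainid = ?
""", ("0abc", "A")).fetchall()

# All chains in EC class 3, selected with the split column
# instead of pattern matching on the complete EC string.
class_3 = conn.execute("""
    SELECT p.pdbcode, p.chainid, e.ec
    FROM pdbsws p JOIN sprotec e ON p.sprot = e.sprot
    WHERE e.ec1 = '3'
    ORDER BY p.pdbcode, p.chainid
""").fetchall()

print(by_chain)
print(class_3)
```

The second query illustrates the design choice: with ec1 to ec4 stored separately, selecting a whole EC class is a plain equality test on an indexed column.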

A flat-file dump of the data is produced using a simple query of the database while the XML dump is created using a Perl/DBI script. These are also created automatically as part of the 'make' run.

Download a draft paper, or read the paper as published in Bioinformatics.

Download an example Perl script which converts the XML dump format into the flat file dump format.
