OMIM provides information on 'allelic variants' (mutations) but does not provide cross-references to SwissProt.
In addition, the residue numbering used in OMIM is often problematic: the residue numbers frequently do not correspond to those in the corresponding SwissProt entry. Generally, applying a constant offset to all the residue numbers in an entry corrects this, but in about 8% of entries the correct residue numbers cannot be found this way. For about 1% of the data, the numbering scheme clearly changes within the OMIM entry, although there is no parsable annotation to indicate this. Residue numbers affected by such a change are validated as 'probably correct' (see below).
On the other hand, SwissProt provides cross-links to mutation data, but the source of these data is not given in the SwissProt file (although links on the web site provide it). This makes it difficult to distinguish disease-related mutations from SNPs.
This server extracts missense mutation data and OMIM reference numbers from OMIM, and takes the sequence data, OMIM cross-references and accession codes from SwissProt. The data are linked using a PostgreSQL relational database. The residue numbers from OMIM are then validated against the SwissProt sequence, and the results of the validation are written back into the database.
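The constant-offset correction can be illustrated with a brute-force search. This is a minimal sketch, not the server's actual code: it assumes the OMIM data for one entry have already been reduced to (native residue, OMIM residue number) pairs using one-letter residue codes, and the function name `best_offset`, the search limit `max_offset` and the tie-breaking rule are all hypothetical choices made for illustration.

```python
from collections import Counter


def best_offset(sequence, mutations, max_offset=200):
    """Return (offset, n_matched) for the constant offset that places the
    most OMIM native residues on a matching SwissProt residue.

    sequence:  SwissProt sequence as a one-letter string (residue 1 = sequence[0])
    mutations: list of (native_residue, omim_resnum) pairs (hypothetical format)
    """
    counts = Counter()
    for native, resnum in mutations:
        # Try every candidate offset and record those where the native residue fits.
        for offset in range(-max_offset, max_offset + 1):
            pos = resnum + offset
            if 1 <= pos <= len(sequence) and sequence[pos - 1] == native:
                counts[offset] += 1
    if not counts:
        return None, 0
    # Prefer the offset matching the most residues; break ties towards offset 0.
    offset, n = max(counts.items(), key=lambda kv: (kv[1], -abs(kv[0])))
    return offset, n
```

For example, `best_offset("MKTAYIAKQR", [("Y", 3), ("Q", 7), ("R", 8)])` finds that shifting every OMIM number by 2 places all three native residues correctly. A real entry with only a few variants could match spuriously at several offsets, so the number of residues matched is returned alongside the offset.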
'Probably correct' residue numbers
A residue number is classed as 'probably correct' when the majority of residues in the OMIM record are found in the SwissProt sequence by applying a single offset (say, 20 residues), but a subset are not found at that offset. If any of this subset then match with an offset of zero, we assume that the numbering scheme in OMIM has changed within the record.
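The rule above can be sketched as follows, given an offset already chosen for the record. This is an illustrative sketch under stated assumptions, not the server's implementation: we assume residues found at the main offset are flagged 't', residues missed at that offset but found at offset zero are flagged '?', and everything else is flagged 'f'. The function name `classify` and the (native, OMIM residue number) input format are hypothetical, and residues are compared as one-letter codes.

```python
def classify(sequence, mutations, offset):
    """Return a (flag, resnum) pair for each (native, omim_resnum) in mutations.

    flag is 't' (found at the main offset), '?' (found only at offset zero,
    i.e. the OMIM numbering appears to change within the record) or 'f'.
    """
    def found(native, resnum, off):
        pos = resnum + off
        return 1 <= pos <= len(sequence) and sequence[pos - 1] == native

    hits = [found(native, resnum, offset) for native, resnum in mutations]
    # The zero-offset fallback only applies when the main offset explains
    # a majority of the residues in the record.
    majority = sum(hits) > len(mutations) / 2
    results = []
    for (native, resnum), hit in zip(mutations, hits):
        if hit:
            results.append(("t", resnum + offset))
        elif majority and found(native, resnum, 0):
            results.append(("?", resnum))
        else:
            # Not found either way: keep the offset-corrected number, flagged 'f'.
            results.append(("f", resnum + offset))
    return results
```

With the sequence `"MKTAYIAKQR"` and an offset of 2, a hypothetical variant `("A", 7)` is not found at position 9 but is found at position 7, so it is flagged '?' provided most of the other variants in the record fit the offset of 2.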
File formats
XML
The XML file is largely self-documenting. An <omim> tag with an id attribute corresponds to an OMIM identifier. It contains one <sprot> tag whose ac attribute gives the UniProtKB/SwissProt accession code, and this in turn contains one or more <record> tags corresponding to the OMIM allelic variant records.
Within each <record> tag, the <omim_resnum> tag gives the residue number as it appeared in OMIM. Its correct attribute indicates whether we have validated this residue number as correct with respect to the SwissProt sequence ('t' for true or 'f' for false). The <resnum> tag gives our validated residue number with respect to the SwissProt sequence. Its valid attribute indicates whether this number is definitely correct ('t'), definitely incorrect ('f': we were unable to find the residues indicated in OMIM) or probably correct ('?', see above).
Finally, within the <record> tag, the <native> and <mutant> tags give the native and mutant residues, and the <description> tag gives the brief descriptive title taken from the OMIM data.
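The structure described above can be read with Python's standard `xml.etree.ElementTree`. This is a sketch only: the sample document is hypothetical (invented identifiers, residues and description), and whether <omim> is the root element or is wrapped in another element is an assumption. Using `iter("omim")` handles both cases, because it also visits the element it is called on. The function name `read_variants` and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample following the tag and attribute names described above.
SAMPLE = """<omim id="000000">
  <sprot ac="P00000">
    <record>
      <omim_resnum correct="f">3</omim_resnum>
      <resnum valid="t">5</resnum>
      <native>TYR</native>
      <mutant>CYS</mutant>
      <description>EXAMPLE DISEASE</description>
    </record>
  </sprot>
</omim>"""


def read_variants(xml_text):
    """Yield one dict per <record>, with its parent OMIM ID and accession."""
    root = ET.fromstring(xml_text)
    for omim in root.iter("omim"):
        for sprot in omim.findall("sprot"):
            for rec in sprot.findall("record"):
                omim_resnum = rec.find("omim_resnum")
                resnum = rec.find("resnum")
                yield {
                    "omim_id": omim.get("id"),
                    "accession": sprot.get("ac"),
                    "omim_resnum": int(omim_resnum.text),
                    "omim_correct": omim_resnum.get("correct") == "t",
                    "resnum": int(resnum.text),
                    "valid": resnum.get("valid"),  # 't', 'f' or '?'
                    "native": rec.findtext("native"),
                    "mutant": rec.findtext("mutant"),
                    "description": rec.findtext("description"),
                }


if __name__ == "__main__":
    for variant in read_variants(SAMPLE):
        print(variant["accession"], variant["native"], variant["resnum"],
              variant["mutant"], variant["valid"])
```

Keeping the OMIM residue number and its correct flag next to the validated residue number lets a caller check how far the two numbering schemes differ for each record.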
Download the DTD
CSV
The comma-separated value format contains the following columns:
- OMIM ID: The OMIM identifier.
- OMIM Record: The allelic variant record number.
- UniProt/SwissProt accession: The SwissProt accession number. More than one accession number may match the same OMIM record; in this case the data appear again for each SwissProt accession, and the residue numbering may differ between SwissProt entries. Conversely, one SwissProt accession may link to more than one OMIM entry, though it is unlikely that more than one of those OMIM entries will contain allelic variant information.
- Native residue: The unmutated (native) amino acid from OMIM.
- Residue number: Our corrected residue number with respect to the SwissProt entry. This should only be trusted if the Valid field (see below) is 't' or, at your discretion, '?'.
- Mutant residue: The mutated amino acid from OMIM.
- Valid: Validation status. 't' indicates that the Residue number is definitely correct; 'f' indicates that the Residue number is definitely wrong (we were unable to find the native residues at the expected positions, even after applying an offset); '?' indicates that the Residue number is 'probably correct' (see above).
- OMIM Residue number: The residue number given in the OMIM record.
- OMIM Description: The brief descriptive title from the OMIM record.
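A sketch of loading the CSV file and keeping only rows whose Residue number can be trusted, using Python's standard `csv` module. It assumes that the columns appear in the order listed above, that the file has no header row and that fields containing commas are quoted in the usual CSV way; all three are assumptions. The field names, the function name `trusted_rows` and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical field names, in the column order listed above.
COLUMNS = ["omim_id", "omim_record", "accession", "native", "resnum",
           "mutant", "valid", "omim_resnum", "description"]

# Hypothetical sample rows (invented identifiers and values).
SAMPLE = """000000,1,P00000,TYR,5,CYS,t,3,EXAMPLE DISEASE
000000,2,P00000,ALA,7,VAL,?,7,"EXAMPLE, VARIANT"
000000,3,P00000,TRP,3,GLY,f,1,EXAMPLE
"""


def trusted_rows(lines, accept_probable=False):
    """Yield rows whose Valid field is 't' (or also '?' if accept_probable)."""
    accepted = {"t", "?"} if accept_probable else {"t"}
    for row in csv.DictReader(lines, fieldnames=COLUMNS):
        if row["valid"] in accepted:
            # Convert the corrected residue number to an integer for convenience.
            yield dict(row, resnum=int(row["resnum"]))


if __name__ == "__main__":
    for row in trusted_rows(io.StringIO(SAMPLE), accept_probable=True):
        print(row["accession"], row["native"], row["resnum"], row["mutant"])
```

Whether to accept '?' rows is left to the caller through `accept_probable`, reflecting the advice above that 'probably correct' numbers should be used at your discretion.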
Draft paper
You can download a draft of the paper.