We will now look for more information on the protein sequence encoded by our gene.

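UniProt is a resource maintained by the Swiss Institute of Bioinformatics in Geneva and the European Bioinformatics Institute near Cambridge. It consists of two parts: SwissProt, whose entries are manually annotated and reviewed, and TrEMBL, whose entries are annotated automatically and have not yet been reviewed.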

NOTE! Accession codes are unique identifiers that are used to reference specific entries in specific databanks. Thus the same protein or gene in different databanks will generally have a different accession code. Often, however, databanks will provide links to other databanks by storing the related accession code from that other resource.

In the previous step, you found the accession code for our gene in the GenBank DNA databank. We are now going to search UniProt (a databank of protein sequences) to see whether any of its entries cross-reference this GenBank accession code, i.e. whether they describe the protein encoded by the gene.

Follow the instructions below to examine, in UniProt, the protein encoded by the DNA sequence.

UniProt

  • Open the UniProt website: http://www.uniprot.org/
  • Enter the GenBank DNA accession code that you recorded in the last step into the search box at the top of the page. If the accession code has a "version number" at the end (like ".1"), do not include this when you do the search.
  • You should obtain one result from the search. Since you are now searching a different resource (a databank of protein sequences) the protein you identify has its own (different) accession code. Click on the accession code for this hit (the combination of letters and numbers in the Entry column).
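The version-number rule in the steps above can be sketched in a few lines of Python. This is only an illustration: the accession `XX000000.1` is a made-up placeholder (not the gene from this exercise), the helper names `strip_version` and `build_search_url` are hypothetical, and the search URL pattern is an assumption about how the UniProt website currently accepts queries.

```python
from urllib.parse import urlencode


def strip_version(accession: str) -> str:
    """Remove a trailing version suffix such as '.1' from an accession code."""
    cleaned = accession.strip()
    base, dot, suffix = cleaned.rpartition(".")
    # Only treat the suffix as a version number if it is purely numeric.
    if dot and suffix.isdigit():
        return base
    return cleaned


def build_search_url(accession: str) -> str:
    """Build a UniProt search URL for the unversioned accession.

    The URL pattern is an assumption; typing the accession into the
    search box on the website achieves the same thing.
    """
    query = strip_version(accession)
    return "https://www.uniprot.org/uniprotkb?" + urlencode({"query": query})


# Placeholder accession, used purely for illustration.
print(build_search_url("XX000000.1"))
```

With the placeholder above, the version suffix is dropped and only `XX000000` is sent as the search term.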

The SwissProt entry for this sequence contains a great deal of information about the protein.

Record the following information:

  • The E.C. Number. Look in the Names & Taxonomy section and the information provided by the BRENDA database. If more than one E.C. Number appears, use the name of the enzyme and the information supplied in the Function section of the UniProt web page to decide which you think is more likely.
  • The number of amino acids in the protein.
  • Ligands for which there are binding sites.
  • Whether the secondary structure of the protein consists mostly of alpha helices or beta strands.
  • The PDB entry codes for any 3D structures that have been solved.
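One way to answer the secondary structure question is to add up the residues covered by each type of element rather than judging by eye. The sketch below assumes you have copied the helix and strand ranges from the entry by hand; the ranges shown are invented for illustration and do not come from any real protein, and the function name `residues_per_element` is hypothetical.

```python
from collections import defaultdict


def residues_per_element(features):
    """Sum the number of residues covered by each secondary structure type.

    `features` is a list of (kind, start, end) tuples with inclusive,
    1-based residue positions, as they are listed in a protein entry.
    """
    totals = defaultdict(int)
    for kind, start, end in features:
        totals[kind] += end - start + 1  # inclusive range
    return dict(totals)


# Invented example ranges, for illustration only.
example_features = [
    ("helix", 5, 20),
    ("strand", 25, 30),
    ("helix", 40, 58),
    ("strand", 62, 66),
]

totals = residues_per_element(example_features)
dominant = max(totals, key=totals.get)
print(totals)    # {'helix': 35, 'strand': 11}
print(dominant)  # helix
```

Counting residues rather than elements matters because a few long helices can outweigh many short strands, or the reverse.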