Part 1: Protein Structure - Exploring a protein fold

AIM: To familiarize yourself with using basic features of PyMol and to explore a simple protein fold.

Having got PyMol running, we will start by looking at a small protein, crambin. This is a hydrophobic protein from the seeds of Abyssinian cabbage whose structure was solved in 1981. It is one of the smallest complete proteins in the Protein Data Bank, containing just 46 amino acid residues.

The Protein Data Bank (PDB) is the world-wide resource in which the 3D structures of proteins are stored. People who have solved the structure of a protein deposit it with the PDB to make it available to others. Most journals require that a structure be deposited before they will accept a paper describing it. In your own time, you can visit the PDB at http://www.rcsb.org/ and explore the many thousands of proteins that have been deposited.

Each entry in the PDB has an 'identifier' or 'code' currently consisting of a digit followed by 3 characters (digits or letters). The PDB code for crambin is '1CRN'.
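
The identifier format described above can be checked with a short Python sketch. The function name 'looks_like_pdb_code' is hypothetical, and the pattern encodes only the rule stated here (one digit followed by three letters or digits); it does not check that an entry with that code actually exists.

```python
import re

# One digit followed by exactly three letters or digits, as described above.
PDB_CODE_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def looks_like_pdb_code(text):
    """Return True if text has the four-character PDB code format."""
    return PDB_CODE_PATTERN.fullmatch(text) is not None


print(looks_like_pdb_code("1CRN"))     # the code for crambin
print(looks_like_pdb_code("crambin"))  # a protein name, not a PDB code
```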

Download the structure by right-clicking the link below and selecting the option to 'Save Target As...' (or 'Save Link As...'). Save it to your desktop, or a folder of your choice:

1CRN - crambin, 1.50 Å resolution

Having downloaded crambin, you can view it using PyMol. (Note that PyMol can also load PDB entries directly: type 'fetch' followed by the PDB code, in this case 'fetch 1crn'.)

Start the PyMol program and select 'Open...' from the 'File' menu of the control window. Browse to the desktop (or folder where you saved the 1CRN PDB file) and load it into PyMol.
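
Both routes (opening a saved file, or fetching by code) are also available from Python through PyMol's scripting interface. The sketch below is a minimal illustration, assuming it is used inside PyMol and that 'cmd' is PyMol's 'pymol.cmd' module; the function name 'load_crambin' and its 'pdb_path' parameter are hypothetical.

```python
def load_crambin(cmd, pdb_path=None):
    """Load crambin into PyMol.

    cmd      -- PyMol's command module (obtained with: from pymol import cmd)
    pdb_path -- path to a saved PDB file; if omitted, entry 1crn is fetched
    """
    if pdb_path is not None:
        cmd.load(pdb_path)   # same effect as File > Open...
    else:
        cmd.fetch("1crn")    # same effect as typing 'fetch 1crn'
```

Inside PyMol you would run 'from pymol import cmd' and then call either 'load_crambin(cmd)' or 'load_crambin(cmd, "1crn.pdb")', where the file name is whatever you chose when saving the download.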

You can now explore the crambin structure: click and drag in the main PyMol window to rotate it, and click and drag with the right mouse button to zoom in and out.

The default rendering of the protein depends on the version of PyMol you are running. It may be:

A wireframe representation - each atom is the vertex between two or more lines and the lines represent the bonds. The colouring uses a variation of the scheme known as CPK (Corey, Pauling, Koltun). True CPK colouring uses black for carbons, but since this doesn't show up well against a black background(!), PyMol uses green instead.
A cartoon representation - the backbone of the protein is shown as a cartoon in which helices are represented by curved ribbons and strands are represented by arrows. Here the whole protein will have the same colour (probably green).

If you have the cartoon representation, change it to the wireframe representation as follows:

On the main window, there is a panel at the top right with the word 'all', followed by buttons labelled A, S, H, L, C (or similar). Below we will refer to this as the 'display panel'.

Click 'S' next to 'all'
Hover over 'as' ... a new menu will appear
Move to the new menu and click 'wire'

We can change to a different colouring scheme as follows:

In the display panel at the top right of the main window:

Click 'C' next to 'all'
Click 'spectrum'
Click 'rainbow' (the version with no other text next to it)

The structure is now coloured in a rainbow from blue at the N-terminus to red at the C-terminus. This makes it much easier to explore the protein fold and to trace your way along the protein chain.
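
For reference, the menu steps so far (wireframe display, then rainbow colouring) can be approximated from Python. This is a sketch under assumptions: 'cmd' is PyMol's 'pymol.cmd' module passed in by the caller, the function name 'wireframe_rainbow' is hypothetical, and the 'spectrum' call is an approximate equivalent of the menu entry rather than a confirmed copy of what the menu runs.

```python
def wireframe_rainbow(cmd, selection="all"):
    """Show a wireframe coloured blue (N-terminus) to red (C-terminus)."""
    # 'S' > 'as' > 'wire': show only the line (wireframe) representation.
    cmd.show_as("lines", selection)
    # 'C' > 'spectrum' > 'rainbow': colour atoms by their order in the file.
    cmd.spectrum("count", "rainbow", selection)
```

Inside PyMol, after 'from pymol import cmd', calling 'wireframe_rainbow(cmd)' applies both steps to everything that is loaded.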

The wireframe rendering can still look rather confusing, so we can change to a backbone view in which only the C-alpha atoms of each amino acid are shown linked by pretend bonds:

In the display panel, click 'S' next to 'all', then click 'ribbon'

At this stage you will see a simple backbone ribbon linking the C-alpha atoms, superimposed on the wireframe representation.

In the display panel, click 'H' next to 'all', then click 'lines' (labelled 'wire' in some versions)

Note that you could also have done this in one step by selecting 'as' and then 'ribbon' from the 'S' menu. Doing it in two steps shows that 'S' stands for 'show' and 'H' for 'hide'.
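
The difference between the two routes can be sketched in Python. This assumes 'cmd' is PyMol's 'pymol.cmd' module passed in by the caller; the two function names are hypothetical.

```python
def backbone_two_steps(cmd, selection="all"):
    """Add the ribbon, then remove the wireframe."""
    cmd.show("ribbon", selection)   # 'S' > 'ribbon'
    cmd.hide("lines", selection)    # 'H' > 'lines'


def backbone_one_step(cmd, selection="all"):
    """Show the ribbon and hide every other representation at once."""
    cmd.show_as("ribbon", selection)  # 'S' > 'as' > 'ribbon'
```

The one-step version hides all other representations, not just the lines. Here the end result is the same because the wireframe is the only other representation on display.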

You should now be able to see regions that form alpha-helices (in yellow-green and pale blue) and two beta-strands in blue and orange.

You can click on atoms to identify them. When you click on an atom, all the atoms in the same amino acid are highlighted by pink squares, and the atom is described in the PyMol command window by a record such as:

You clicked /1crn//A/ILE`34/CA

This indicates that the atom you clicked on:

Comes from PDB file 1crn
Is in chain A
Is part of an isoleucine ('ILE')
Belongs to the residue labelled 34 in the PDB file
Is the C-alpha atom ('CA') of that residue

(In this example there is only one chain in the PDB file, but structures having quaternary structure (e.g. haemoglobin), or where two copies of a monomeric protein have crystallized together in the unit cell, will have additional chains.)
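
The layout of the click record can be made explicit by splitting it into its fields. The following Python sketch is illustrative only: the names 'ClickedAtom' and 'parse_clicked' are hypothetical, and the field order is inferred from the example record above. The empty field between the file name and the chain is a segment identifier, which is blank in this file.

```python
from typing import NamedTuple


class ClickedAtom(NamedTuple):
    pdb_object: str
    segment: str
    chain: str
    residue_name: str
    residue_number: str  # kept as text, since PDB labels are not always plain integers
    atom_name: str


def parse_clicked(record):
    """Split a record such as 'You clicked /1crn//A/ILE`34/CA' into fields."""
    prefix = "You clicked "
    if record.startswith(prefix):
        record = record[len(prefix):]
    # The leading '/' produces an empty first field, which is discarded.
    _, pdb_object, segment, chain, residue, atom_name = record.strip().split("/")
    residue_name, residue_number = residue.split("`")
    return ClickedAtom(pdb_object, segment, chain,
                       residue_name, residue_number, atom_name)


atom = parse_clicked("You clicked /1crn//A/ILE`34/CA")
print(atom.chain, atom.residue_name, atom.residue_number, atom.atom_name)
```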

Click anywhere on the black background to remove the pink squares.

To make the secondary structure of the protein even clearer, we can change to a cartoon representation:

In the display panel, next to 'all', click 'S' then 'as' then 'cartoon'

You will now see the alpha-helices shown as helical ribbons and the beta-strands (in dark blue and orange) as twisted flat ribbons with arrow heads.

One drawback of the cartoon view is that it can be difficult to click on the C-alpha atoms. We can display both the cartoon version and the backbone version at the same time. Simply:

In the display panel, next to 'all', click 'S' then 'ribbon'

Work your way along the protein backbone from the N-terminus (in blue) towards the C-terminus (in red). Identify the second helix and find the residue numbers of the first and last residues in the helix.

Record the following information:

The first residue number in the second helix (a number between 1 and 46)
The last residue number in the second helix (a number between 1 and 46)
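
Once you have recorded your answer by eye, you may want a cross-check. PDB-format files often carry HELIX records that list the first and last residue of each helix in fixed columns. The Python sketch below reads those columns; the function name 'helix_ranges' is hypothetical, and it assumes the file you downloaded contains HELIX records in the standard column layout. The ranges assigned in the file may differ by a residue or so from what the cartoon view suggests.

```python
def helix_ranges(pdb_lines):
    """Return (chain, first_residue, last_residue) for each HELIX record."""
    ranges = []
    for line in pdb_lines:
        if line.startswith("HELIX"):
            chain = line[19]           # column 20: chain of the first residue
            first = int(line[21:25])   # columns 22-25: first residue number
            last = int(line[33:37])    # columns 34-37: last residue number
            ranges.append((chain, first, last))
    return ranges
```

To use it, open the file you saved (for example 'with open("1crn.pdb") as handle:', where the file name is whatever you chose) and pass the open file to 'helix_ranges'; the helices are returned in the order in which they are listed in the file.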