bioinf.org.uk - Prof. Andrew C.R. Martin's group at UCL

How do I cite ProFit?
I want to fit two regions of different length. How can I specify a ZONE?
As a result of missing residues in one of my structures I obtain an error message saying the number of residues in a zone does not match
Can I run ProFit under Mac OS/X?
How can I run ProFit from a script?
How do I run ProFit in Windows?
Can I read a set of coordinates from an NAMD trajectory?
When I fit identical structures, why do I get a non-zero RMSD?
How do I specify multiple zones for fitting or RMSD calculation?

How do I cite ProFit?

This question is answered on the last page of the manual! From the manual...

How do I Reference ProFit?

No paper has been published describing ProFit itself since it is simply a convenient program (I hope) to let you use a standard fitting algorithm; consequently, it is a little difficult to reference. The exact wording is up to you and dependent on the context, but I suggest something similar to:

Fitting was performed using the McLachlan algorithm (McLachlan, A.D., 1982 'Rapid Comparison of Protein Structures', Acta Cryst A38, 871-873) as implemented in the program ProFit (Martin, A.C.R., http://www.bioinf.org.uk/software/profit/).

I want to fit two regions of different length. How can I specify a ZONE?
--or--
As a result of missing residues in one of my structures I obtain an error message saying the number of residues in a zone does not match

It is not possible to compare regions of different length! The definition of an RMSD requires that there be a 1:1 equivalence between atoms. A least-squares fitting program like ProFit minimizes the RMSD between two structures. Therefore, you must somehow decide which residues are equivalent to one another.
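To make the 1:1 requirement concrete, here is a minimal Python sketch of an RMSD calculation over two coordinate sets that have already been superimposed. It is an illustration only, not ProFit's own code; the function name rmsd and the use of (x, y, z) tuples are assumptions made for this example.

```python
import math


def rmsd(ref, mob):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the sets are already superimposed and that ref[i] is the
    atom equivalent to mob[i].
    """
    if len(ref) != len(mob):
        # Without a 1:1 equivalence the RMSD is undefined
        raise ValueError(
            "coordinate sets differ in length: %d vs %d" % (len(ref), len(mob))
        )
    if not ref:
        raise ValueError("no atoms to compare")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(ref, mob):
        # Squared distance between each pair of equivalent atoms
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(ref))
```

Every term in the sum pairs one atom from each structure, so a calculation like this cannot proceed until the equivalent residues have been chosen.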

You can do this using multiple ZONE commands to specify which part of one structure is equivalent to which part of the other structure.

Alternatively, ProFit allows you to use a sequence alignment to specify the equivalent zones. You can do this within ProFit using the ALIGN command, or you can read a pre-calculated alignment using the READALIGNMENT command.

Ideally (and especially with more diverged sequences) you need to perform a structural alignment (using a program such as SSAP) rather than a sequence alignment to define the zones for least-squares fitting. You could use such a program to generate a structure-based sequence alignment and read that into ProFit.

However, ProFit can also work out the best equivalences once you have given it a seed set. Typically you start with a sequence alignment between the two structures to obtain an initial set of equivalent atom pairs. You can then use the ITERATE command to make ProFit refine the equivalences, which it does using a dynamic programming algorithm. Note that you may need to experiment with the cutoff distance (given as a parameter to the ITERATE command). We intend to optimize this, but as a general guide, use a value that is 1-2 Å larger than the RMSD obtained from the initial fit based on the sequence alignment alone.
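The rule of thumb for the cutoff can be written down as a small Python helper. This is only a sketch: the function names and the default margin of 1.5 Å are assumptions for illustration, and the exact form of the ITERATE command line should be checked against the manual.

```python
def iterate_cutoff_range(initial_rmsd):
    """Return the (low, high) cutoff range: 1-2 Angstroms above the initial RMSD."""
    if initial_rmsd < 0:
        raise ValueError("RMSD cannot be negative")
    return (initial_rmsd + 1.0, initial_rmsd + 2.0)


def iterate_command(initial_rmsd, margin=1.5):
    """Build an ITERATE line with a cutoff `margin` above the initial RMSD.

    The margin must lie in the suggested 1-2 Angstrom window; the
    'ITERATE <cutoff>' form is assumed from the FAQ text, not verified.
    """
    if not 1.0 <= margin <= 2.0:
        raise ValueError("margin should be between 1 and 2 Angstroms")
    if initial_rmsd < 0:
        raise ValueError("RMSD cannot be negative")
    return "ITERATE %.1f" % (initial_rmsd + margin)
```

For an initial alignment-based fit with an RMSD of 2.3 Å, for instance, this suggests trying cutoffs between 3.3 Å and 4.3 Å.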

Can I run ProFit under Mac OS/X?

Yes! I don't have a Mac OS/X machine so can't provide a pre-compiled version, but V3.1 has been compiled successfully on a Mac.

I would like to thank Judith Cohn (Los Alamos National Lab), Nicola Ramsden (Dundee) and Michael Plevin (Toronto) for pointing out problems and testing code for earlier versions.

How can I run ProFit from a script?

One of the most common questions I get about ProFit is something along the lines of 'I have lots of pairs of proteins I need to fit or one protein that needs to be fitted to lots of others. How can I get ProFit to process all of these?'

As of ProFit V3.0, there are three ways to do this.

The first method is the SCRIPT command introduced in ProFit V3.0 which allows you to read in and execute a script.

The second method is to use the -f flag, followed by the name of a file containing your ProFit commands.

The third method has always been possible and relies on just using Unix-style redirection. Just place all the commands you wish to run in a text file and then run ProFit, redirecting standard input to this file.

For example, suppose you wanted to use a.pdb as a reference structure and wanted to fit it with b.pdb, c.pdb, d.pdb, e.pdb and f.pdb. Just create a text file as follows:

reference a.pdb
mobile b.pdb
fit
mobile c.pdb
fit
mobile d.pdb
fit
mobile e.pdb
fit
mobile f.pdb
fit

If you have called this file profit.in, then you just run ProFit with the command:

profit -f profit.in

or, using the redirection method, run ProFit with the command:

profit < profit.in

Of course the input file should also include any other commands you need such as specifying the atoms for fitting or the ranges over which to fit.
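If you have many structures, you may prefer to generate the command file rather than type it. The following Python sketch builds the same sequence of commands shown above; the function name build_profit_script and its setup parameter are hypothetical names chosen for this example, and the placement of extra commands directly after the reference line is an assumption.

```python
def build_profit_script(reference, mobiles, setup=()):
    """Return ProFit commands that fit each mobile structure to one reference.

    `setup` holds any extra commands (for example atom or zone
    selections) to issue before the first fit.
    """
    lines = ["reference %s" % reference]
    lines.extend(setup)
    for pdb in mobiles:
        # One mobile/fit pair per structure, as in the hand-written file
        lines.append("mobile %s" % pdb)
        lines.append("fit")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    mobiles = ["b.pdb", "c.pdb", "d.pdb", "e.pdb", "f.pdb"]
    print(build_profit_script("a.pdb", mobiles), end="")
```

Redirect the output of this program to profit.in and then run ProFit with either of the commands given above.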

How do I run ProFit in Windows?

As of ProFit V3.1, Windows is officially supported. We now provide a precompiled Windows version, and the source code should compile cleanly with Windows compilers (we use MinGW, the 'Minimalist GNU for Windows' environment).

Under Windows (as with Linux), ProFit has no graphical interface: you must still learn to use the commands as described in the manual. With Windows XP (and maybe earlier versions), double-clicking the ProFit icon will open a command window where you can type commands. You can also start ProFit from an MS-DOS command shell. Go to the directory where you unpacked ProFit and type the command:

profit

You will then be in the ProFit command interface.

From there on, you need to read the documentation. See the ProFit documentation PDF.

Can I read a set of coordinates from an NAMD trajectory?

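We do not currently support this, but the following Tcl example, which uses VMD scripting commands (mol, molinfo, atomselect), shows how to export a DCD trajectory from NAMD as a set of PDB files. Replace *.psf and *.dcd with the names of your own files: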

mol load psf *.psf dcd *.dcd
set nf [molinfo top get numframes]
for {set i 0 } {$i < $nf} {incr i} {
   [atomselect top all frame $i] writepdb frame_$i.pdb
}

Once you have your PDB files, then you can generate a ProFit script to read them using this piece of Perl:

# Set this to the number of frames generated from the trajectory
my $nf = 1000;

# Frames are numbered from 0, so the last file is frame_($nf-1).pdb
print "reference frame_0.pdb\n";
print "atoms ca\n";
for(my $i=1; $i<$nf; $i++)
{
   printf "mobile frame_%d.pdb\n", $i;
   print "fit\n";
}

Save the output of that Perl program in a file. That file is a ProFit script which will fit each frame in turn to the initial frame and calculate the C-alpha RMSD for each.

NOTE: I haven't used NAMD for some time so haven't had a chance to test the export script.

When I fit identical structures, why do I get a non-zero RMSD?

This is a known weird-and-wonderful bug in the McLachlan fitting algorithm. Our analysis shows that fitting identical structures can hit a saddle point during the minimisation, which the algorithm mistakes for convergence, leaving the structures fitted 180 degrees away from the correct position. We have spent a lot of time trying to find a proper fix for this, so far without success.

As explained in the INSTALL file, with some versions of GCC, compiling with optimization on (-O3) seems to hide the bug. Alternatively, a workaround is provided by editing the Makefile and uncommenting the line:

ROTATEREFIT = -DROTATE_REFIT

While this should sort the problem, it will slow the program down as every fit has to be performed twice.

How do I specify multiple zones for fitting or RMSD calculation?

To specify multiple zones, simply enter multiple ZONE commands. For example, if you want to fit residues A23, A25 and A34-A37, then you would enter:

ZONE A23
ZONE A25
ZONE A34-A37

Zones are additive. To reset them, use the ZONE CLEAR or equivalent ZONE * command.

Note that you cannot combine zones on a single line. The following may not give an error message, but is not correct:

ZONE A23,A25,A34-A37

In fact, this will simply take the first zone (A23).
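If your zones come from another program as a comma-separated list, you can split them into one ZONE command per zone before passing them to ProFit. The Python sketch below does this; the function name zone_commands is hypothetical, and the code does not check that each zone is valid ProFit syntax.

```python
def zone_commands(spec):
    """Split a comma-separated zone list into one ZONE command per zone."""
    zones = [part.strip() for part in spec.split(",")]
    zones = [zone for zone in zones if zone]  # drop empty entries
    if not zones:
        raise ValueError("no zones given")
    return ["ZONE %s" % zone for zone in zones]


if __name__ == "__main__":
    for command in zone_commands("A23,A25,A34-A37"):
        print(command)
```

Run on the incorrect example above, this prints the three separate ZONE lines shown earlier in this answer.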


ProFit FAQ