Promoter prediction is based on finding good matches to the classic TATA-box (TATAAAA) motif. Further support may come from other motifs such as the CCAAT-box, CpG islands and other transcription factor binding sites.

In the computer, this is done by using position weight matrices (PWMs), which are then compared against a DNA sequence using a program such as MatInspector (currently not available). Such a program slides the PWMs along the DNA sequence to find significant matches. PWMs may be obtained from databases such as TRANSFAC or JASPAR.
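To make the sliding-window idea concrete, here is a minimal sketch in Python. It is not how MatInspector works internally; the count matrix, the pseudocount, the uniform background and the threshold are all invented for illustration, and only the forward strand is scanned.

```python
import math

# Hypothetical toy count matrix for a 4-column motif (one list per base).
# The counts are invented; real matrices come from TRANSFAC or JASPAR.
COUNTS = {
    "A": [0, 10, 0, 10],
    "C": [0, 0, 0, 0],
    "G": [0, 0, 0, 0],
    "T": [10, 0, 10, 0],
}


def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds scores against a uniform background (assumed)."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[base][i] for base in "ACGT") + 4 * pseudocount
        column = {}
        for base in "ACGT":
            freq = (counts[base][i] + pseudocount) / total
            column[base] = math.log2(freq / background)
        pwm.append(column)
    return pwm


def scan(sequence, pwm, threshold):
    """Slide the PWM along the sequence and return (0-based start, score) hits."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


pwm = counts_to_pwm(COUNTS)
hits = scan("GGCTATAGGC", pwm, threshold=4.0)
for start, score in hits:
    print(f"match at position {start + 1} with score {score:.2f}")
```

Lowering the threshold in a sketch like this quickly increases the number of reported windows, which is one reason real scans need carefully chosen cut-offs.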

However, if you run a search program such as MatInspector to find matches against the TRANSFAC database, you will find a huge number of hits, far too many to interpret one by one.
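A rough back-of-the-envelope calculation suggests why short motifs produce so many hits. The sketch below assumes a random sequence with equal base frequencies and exact matching only, which is a simplification; the 10 kb sequence length is a hypothetical value.

```python
def expected_exact_matches(motif_length, sequence_length, both_strands=True):
    """Expected number of exact matches to one fixed motif in a random sequence.

    Assumes each base occurs independently with probability 0.25.
    """
    windows = sequence_length - motif_length + 1
    per_window = 0.25 ** motif_length
    strands = 2 if both_strands else 1
    return windows * per_window * strands


# A 7-base motif such as TATAAAA in a hypothetical 10 kb sequence.
expected = expected_exact_matches(motif_length=7, sequence_length=10_000)
print(f"expected chance matches: {expected:.2f}")
```

Under these assumptions a single 7-base motif is expected to match about once by chance in 10 kb. A database search uses many matrices and tolerates mismatches, so the number of chance hits grows accordingly.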

Instead, we will run two promoter prediction programs and work on the assumption that the predicted promoter which overlaps best between the two predictions is the correct one. In other words, we will do a logical 'AND' between the two predictions (if a promoter is predicted well by software A and by software B, then we will believe it).
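The logical 'AND' can be sketched as an interval-overlap check. The coordinates below are hypothetical, not real output from either server, and the sketch assumes each prediction can be expressed as an inclusive (start, end) pair on the same sequence.

```python
def overlap_length(a, b):
    """Number of positions shared by two inclusive (start, end) intervals."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def shared_predictions(preds_a, preds_b):
    """Return (overlap, interval_a, interval_b) for every overlapping pair,
    sorted so that the best-overlapping pair comes first."""
    pairs = []
    for a in preds_a:
        for b in preds_b:
            shared = overlap_length(a, b)
            if shared > 0:
                pairs.append((shared, a, b))
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


# Hypothetical predicted regions from two tools.
software_a = [(100, 350), (900, 1150)]
software_b = [(300, 550), (2000, 2250)]

agreed = shared_predictions(software_a, software_b)
for shared, a, b in agreed:
    print(f"A {a} and B {b} share {shared} positions")
```

Predictions that appear in only one list are dropped, which is the 'AND' behaviour described above.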

Click the headings below to try each of the promoter prediction tools in turn.

First we will use a program called Promoter 2.0 from the Centre for Biological Sequence Analysis at the University of Copenhagen.

In a few seconds the results will appear. Look at the set of predicted promoters listed at the top of the page. Note how many sites there are, where they occur and the confidence in the predictions.

  • Record your results by cutting and pasting them into a text editor or word processor

Next we will use the LBL Promoter predictor from the Drosophila Genome community. The program works with human sequences as well.

The results should be returned within a few seconds. You should see varying quality matches to the TATA-box consensus (tATA[A/T]A[A/T]) approximately ¼ to ½ of the way along the resulting sequence matches.

For example, in the first match, count 16 bases along and you will see tataaat.
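If you would rather not count bases by eye, a regular expression can locate the motif. This sketch assumes the consensus reads as TATA, then A or T, then A, then A or T; the test sequence is made up so that the match falls at base 16.

```python
import re

# Assumed reading of the consensus: TATA, then A or T, then A, then A or T.
# The lookahead lets overlapping matches be reported as well.
TATA_BOX = re.compile(r"(?=(TATA[AT]A[AT]))", re.IGNORECASE)


def find_tata_boxes(sequence):
    """Return (1-based position, matched text) for each consensus match."""
    return [(m.start() + 1, m.group(1)) for m in TATA_BOX.finditer(sequence)]


# Hypothetical sequence: 15 filler bases, then the motif at base 16.
example = "g" * 15 + "tataaat" + "ccgg"
matches = find_tata_boxes(example)
for position, text in matches:
    print(position, text)
```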

The predicted transcription start site is shown in a larger font.

  • Record your results by cutting and pasting them into a text editor or word processor

Next we will use a program called TSSG from a company called Softberry.

In a few seconds the results will appear. Look at the set of predicted promoters listed at the top of the page. How many sites are there compared with the previous predictor? In particular, how many TATA boxes are predicted?

NOTE! As of 2018, Softberry are limiting the number of queries that may be run in one day. If you do not get a result back, then you can access the results here.

Note that the detailed results that follow the title "Transcription factor binding sites:" are very terse and difficult to understand! This long list of results shows matches to functional motifs from the Ghosh database. The sequence fragments use the IUPAC ambiguity codes. The authors provide detailed help.
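To help read the sequence fragments, the sketch below expands the standard IUPAC nucleotide ambiguity codes into a regular expression. The helper name `iupac_to_regex` and the example fragment are hypothetical and are not part of the TSSG output.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def iupac_to_regex(fragment):
    """Translate an IUPAC-coded fragment into a regular expression string."""
    parts = []
    for code in fragment.upper():
        bases = IUPAC[code]  # raises KeyError for characters outside the table
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


# Hypothetical fragment: W stands for A or T.
pattern = iupac_to_regex("TATAWAW")
print(pattern)
found = re.search(pattern, "ggTATATATcc", re.IGNORECASE)
print(found.group(0) if found else "no match")
```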

  • Record your results by cutting and pasting them into a text editor or word processor

Compare the predictions from the different servers. Pay attention to the start and stop of each predicted region (if given). Some predictors give far more information than others, so see where you think the predictions may overlap (even if the positions of the final predictions aren't the same). Also pay attention to the prediction confidence (where given).

The main take-home message is that promoter prediction is far from trivial: different predictors can give quite different results. Comparing Promoter 2.0 (P2) and TSSG, my view is that a prediction is more likely to be correct if more than one predictor places it in the same area.
