Bioplib - a high quality programming library for protein bioinformatics

Author: Dr. Andrew C.R. Martin with additions by Dr. Craig T. Porter

Bioplib is a programming library, written in the C programming language, for handling bioinformatics data – primarily protein structures, but also sequence data.

Click the 'Related Pages' link above for information on installation and detailed listings of what the library contains

The library has equivalents for other programming languages such as Python (BioPython) and Perl (BioPerl) and another C library and tool-set (EMBOSS) is also available. However, the focus of all these other libraries is on protein and DNA sequence rather than structure. Conversely, Bioplib is a very comprehensive library for handling protein structure, with some support for protein and DNA sequences. The fact that the code is implemented in C means that it is ideal for more complex and CPU-intensive tasks since C (unlike Perl and Python) is a compiled language.

Some of the most significant work in Bioplib includes:

an extremely reliable PDB parser capable of handling multiple occupancies and selecting lower occupancy conformations for analysis
structure fitting routines (used by ProFit)
sequence alignment and general purpose dynamic programming routines
PDB manipulation routines including
- list manipulation – finding residues by name, truncating and selecting residues or atoms;
- replacing sidechains;
- adding hydrogens;
- adding a C-terminal oxygen;
- standardizing atom order;
- extracting the sequence from ATOM and SEQRES records and comparing these;
- handling and regenerating CONECT records;
- parsing of key header records (HEADER, COMPND, SEQRES, TITLE, SOURCE).

The Bioplib library currently consists of approximately 47,700 lines of C-code including comments, or 31,000 lines excluding comments.

Functions exist for:

Reading and Writing PDB and CSSR Files
Manipulating a list of PDB coordinates (e.g. joining, copying or truncating lists of coordinates or extracting regions, finding atoms or residues)
Modifying the content of the PDB coordinates (e.g. translation, rotation, renumbering)
Working with coordinates (e.g. building neighbour lists, fitting, testing for H-bonds)
General and 3D maths (e.g. distance from a point to a line, matrix multiplication)
Protein and DNA Sequences (e.g. reading and writing PIR and FASTA formats, DNA translation)
General programming (e.g. command parser, string manipulation, multiple-dimension arrays, interactive help, rigid column file reading, prime numbers, hashes/dictionaries)

Include files

Data Structures

pdb.h Definitions of WHOLEPDB and PDB data structures and functions for manipulating them.
aalist.h Include file for amino acid linked lists.

Handling Stucture Data

angle.h Calculate angles, torsions, etc.
fit.h Include file for least squares fitting
hbond.h Report whether two residues are H-bonded
access.h Solvent accessibility

Handling Sequence Data

seq.h Used for all sequence functions

Utility Functions

general.h General purpose routines used bu BiopLib
macros.h Useful macros including linked list definitions used by PDB.
hash.h Hash/dictionary functions

Example of usage

#include "bioplib/pdb.h"
#include "bioplib/macros.h"
WHOLEPDB *wpdb = NULL;
PDB      *p    = NULL;
if((fp_in = fopen("input_file.pdb","r"))!=NULL)
{
   wpdb = blReadWholePDB(fp_in);
   close(fp_in);
   if((wpdb != NULL) && (wpdb->pdb != NULL))
   {
      for(p=wpdb->pdb; p != NULL; NEXT(p))
      {
         _do_something_here_ 
      }
      if((fp_out = fopen("output_file.pdb","w"))!=NULL)
      {
         blWriteWholePDB(fp_out, wpdb);
         fclose(fp_out);
      }
      // --- else handle error --- 
   }
   // --- else handle error ---
}
// --- else handle error ---