Bioplib - a high quality programming library for protein bioinformatics
- Author
- Dr. Andrew C.R. Martin with additions by Dr. Craig T. Porter
Bioplib is a programming library, written in the C programming language, for handling bioinformatics data – primarily protein structures, but also sequence data.
Click the 'Related Pages' link above for information on installation and detailed listings of what the library contains.
Comparable libraries exist for other programming languages, such as Python (BioPython) and Perl (BioPerl), and another C library and tool-set (EMBOSS) is also available. However, all of these focus on protein and DNA sequence rather than structure. Bioplib, by contrast, is a comprehensive library for handling protein structure, with some support for protein and DNA sequences. Because it is written in C, which (unlike Perl and Python) is a compiled language, it is well suited to complex, CPU-intensive tasks.
Some of the most significant work in Bioplib includes:
- an extremely reliable PDB parser capable of handling multiple occupancies and selecting lower occupancy conformations for analysis
- structure fitting routines (used by ProFit)
- sequence alignment and general purpose dynamic programming routines
- PDB manipulation routines including:
  - list manipulation – finding residues by name, truncating and selecting residues or atoms;
  - replacing sidechains;
  - adding hydrogens;
  - adding a C-terminal oxygen;
  - standardizing atom order;
  - extracting the sequence from ATOM and SEQRES records and comparing these;
  - handling and regenerating CONECT records;
  - parsing of key header records (HEADER, COMPND, SEQRES, TITLE, SOURCE).
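To make the parsing and occupancy handling listed above more concrete, here is a minimal sketch of reading one fixed-column ATOM record and choosing between two alternate conformations by occupancy. It is not Bioplib's parser: the names AtomRecord, copy_columns, parse_atom_record and select_conformation are hypothetical, and the column positions are those of the standard PDB ATOM record layout.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down atom record (not Bioplib's PDB structure) */
typedef struct {
    char   atomName[5];   /* columns 13-16 */
    char   altLoc;        /* column  17    */
    char   resName[4];    /* columns 18-20 */
    char   chain;         /* column  22    */
    int    resNum;        /* columns 23-26 */
    double x, y, z;       /* columns 31-38, 39-46, 47-54 */
    double occupancy;     /* columns 55-60 */
    double bFactor;       /* columns 61-66 */
} AtomRecord;

/* Copy columns first..last (1-based, inclusive) of line into buf */
static void copy_columns(const char *line, int first, int last, char *buf)
{
    int len = last - first + 1;
    memcpy(buf, line + first - 1, (size_t)len);
    buf[len] = '\0';
}

/* Parse one ATOM or HETATM line; returns 1 on success, 0 otherwise */
int parse_atom_record(const char *line, AtomRecord *atom)
{
    char field[16];

    if (strlen(line) < 66)
        return 0;
    if (strncmp(line, "ATOM  ", 6) != 0 && strncmp(line, "HETATM", 6) != 0)
        return 0;

    copy_columns(line, 13, 16, atom->atomName);
    atom->altLoc = line[16];
    copy_columns(line, 18, 20, atom->resName);
    atom->chain = line[21];
    copy_columns(line, 23, 26, field); atom->resNum    = atoi(field);
    copy_columns(line, 31, 38, field); atom->x         = strtod(field, NULL);
    copy_columns(line, 39, 46, field); atom->y         = strtod(field, NULL);
    copy_columns(line, 47, 54, field); atom->z         = strtod(field, NULL);
    copy_columns(line, 55, 60, field); atom->occupancy = strtod(field, NULL);
    copy_columns(line, 61, 66, field); atom->bFactor   = strtod(field, NULL);
    return 1;
}

/* Choose between two alternate conformations of the same atom.
   With prefer_lower == 0 the higher-occupancy one is returned;
   with prefer_lower != 0 the lower-occupancy one is returned.
   On a tie the first argument is returned. */
const AtomRecord *select_conformation(const AtomRecord *a,
                                      const AtomRecord *b,
                                      int prefer_lower)
{
    if (prefer_lower)
        return (b->occupancy < a->occupancy) ? b : a;
    return (b->occupancy > a->occupancy) ? b : a;
}
```

A caller would pass each line of a PDB file to parse_atom_record() and, where two records share the same atom name and residue but differ in the alternate-location column, keep the one returned by select_conformation().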
The Bioplib library currently consists of approximately 47,700 lines of C code including comments, or 31,000 lines excluding comments.
Functions exist for:
- Reading and Writing PDB and CSSR Files
- Manipulating a list of PDB coordinates (e.g. joining, copying or truncating lists of coordinates or extracting regions, finding atoms or residues)
- Modifying the content of the PDB coordinates (e.g. translation, rotation, renumbering)
- Working with coordinates (e.g. building neighbour lists, fitting, testing for H-bonds)
- General and 3D maths (e.g. distance from a point to a line, matrix multiplication)
- Protein and DNA Sequences (e.g. reading and writing PIR and FASTA formats, DNA translation)
- General programming (e.g. command parser, string manipulation, multiple-dimension arrays, interactive help, rigid column file reading, prime numbers, hashes/dictionaries)
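As an illustration of the list-based coordinate handling described above, the following sketch walks a linked list of atoms, finds its centre of geometry and translates the list so that the centre lies at the origin. The AtomNode type and the function names are hypothetical stand-ins chosen for this example; Bioplib's own PDB structure carries many more fields and the library provides its own routines for these operations.

```c
#include <stddef.h>

/* Hypothetical, cut-down atom node holding only coordinates */
typedef struct atom_node {
    double x, y, z;
    struct atom_node *next;
} AtomNode;

/* Compute the centre of geometry of the list.
   Returns the number of atoms; centre is all zeros for an empty list. */
int centre_of_geometry(const AtomNode *atoms, double centre[3])
{
    const AtomNode *p;
    int count = 0;

    centre[0] = centre[1] = centre[2] = 0.0;
    for (p = atoms; p != NULL; p = p->next) {
        centre[0] += p->x;
        centre[1] += p->y;
        centre[2] += p->z;
        count++;
    }
    if (count > 0) {
        centre[0] /= count;
        centre[1] /= count;
        centre[2] /= count;
    }
    return count;
}

/* Add the vector shift to every atom in the list */
void translate_atoms(AtomNode *atoms, const double shift[3])
{
    AtomNode *p;

    for (p = atoms; p != NULL; p = p->next) {
        p->x += shift[0];
        p->y += shift[1];
        p->z += shift[2];
    }
}

/* Move the list so that its centre of geometry is at the origin.
   Returns the number of atoms moved. */
int move_to_origin(AtomNode *atoms)
{
    double centre[3], shift[3];
    int count = centre_of_geometry(atoms, centre);

    if (count == 0)
        return 0;
    shift[0] = -centre[0];
    shift[1] = -centre[1];
    shift[2] = -centre[2];
    translate_atoms(atoms, shift);
    return count;
}
```

Keeping the traversal in small functions that take the list head mirrors the way a coordinate list can be passed around as a single pointer.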
Include files
Data Structures
- pdb.h Definitions of WHOLEPDB and PDB data structures and functions for manipulating them.
- aalist.h Include file for amino acid linked lists.
Handling Structure Data
- angle.h Calculate angles, torsions, etc.
- fit.h Include file for least squares fitting
- hbond.h Report whether two residues are H-bonded
- access.h Solvent accessibility
Handling Sequence Data
- seq.h Used for all sequence functions
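DNA translation is one of the sequence operations mentioned earlier. The sketch below shows one common way to implement it with a 64-character lookup table for the standard genetic code. It is an independent illustration, not the seq.h interface: CODON_TABLE, base_index and translate_dna are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Standard genetic code, indexed as 16*b1 + 4*b2 + b3
   with bases numbered in the order T, C, A, G */
static const char CODON_TABLE[] =
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

/* Map a base to its index in TCAG order, or -1 if it is not recognised */
static int base_index(char base)
{
    switch (base) {
    case 'T': case 't': case 'U': case 'u': return 0;
    case 'C': case 'c':                     return 1;
    case 'A': case 'a':                     return 2;
    case 'G': case 'g':                     return 3;
    default:                                return -1;
    }
}

/* Translate dna, starting at its first base, into one-letter amino
   acid codes. Stop codons are written as '*' and codons containing
   an unrecognised base as 'X'. Trailing bases that do not form a
   complete codon are ignored. The output is always NUL-terminated
   when protein_size > 0. Returns the number of residues written. */
size_t translate_dna(const char *dna, char *protein, size_t protein_size)
{
    size_t dna_len = strlen(dna);
    size_t n = 0;
    size_t i;

    if (protein_size == 0)
        return 0;

    for (i = 0; i + 3 <= dna_len && n + 1 < protein_size; i += 3) {
        int b1 = base_index(dna[i]);
        int b2 = base_index(dna[i + 1]);
        int b3 = base_index(dna[i + 2]);

        if (b1 < 0 || b2 < 0 || b3 < 0)
            protein[n++] = 'X';
        else
            protein[n++] = CODON_TABLE[16 * b1 + 4 * b2 + b3];
    }
    protein[n] = '\0';
    return n;
}
```

The table-driven form avoids a 64-way branch, and translation does not stop at the first stop codon so the caller can decide how to treat it.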
Utility Functions
- general.h General purpose routines used by Bioplib
- macros.h Useful macros including linked list definitions used by PDB.
- hash.h Hash/dictionary functions
Example of usage
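#include <stdio.h>
#include "bioplib/pdb.h"
#include "bioplib/macros.h"

int main(void)
{
   FILE     *fp_in, *fp_out;
   WHOLEPDB *wpdb;
   PDB      *p;

   if((fp_in = fopen("input_file.pdb", "r")) != NULL)
   {
      /* blReadWholePDB() is assumed here to be the pdb.h reader that
         returns a WHOLEPDB; check the name against pdb.h */
      wpdb = blReadWholePDB(fp_in);
      fclose(fp_in);
      if(wpdb != NULL)
      {
         /* NEXT() is assumed to be the list-stepping macro from macros.h */
         for(p = wpdb->pdb; p != NULL; NEXT(p))
         {
            /* _do_something_here_ */
         }
         if((fp_out = fopen("output_file.pdb", "w")) != NULL)
         {
            /* blWriteWholePDB() is assumed to be the matching writer */
            blWriteWholePDB(fp_out, wpdb);
            fclose(fp_out);
         }
      }
   }
   return 0;
}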