Bioplib
Protein Structure C Library
ReadPDB.c File Reference

Read coordinates from a PDB file.

#include "port.h"
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>
#include <ctype.h>
#include <unistd.h>
#include "SysDefs.h"
#include "MathType.h"
#include "pdb.h"
#include "macros.h"
#include "fsscanf.h"
#include "general.h"


Macros

#define READPDB_MAIN
 
#define MAXPARTIAL   8
 
#define SMALL   0.000001
 
#define XML_BUFFER   1024
 
#define XML_SAMPLE   256
 
#define MAXBUFF   160
 
#define LOCATION_HEADER   0
 
#define LOCATION_COORDINATES   1
 
#define LOCATION_TRAILER   2
 

Functions

FILE * popen (char *, char *)
 
int pclose (FILE *)
 
PDB * blReadPDB (FILE *fp, int *natom)
 
PDB * blReadPDBAll (FILE *fp, int *natom)
 
PDB * blReadPDBAtoms (FILE *fp, int *natom)
 
PDB * blReadPDBOccRank (FILE *fp, int *natom, int OccRank)
 
PDB * blReadPDBAtomsOccRank (FILE *fp, int *natom, int OccRank)
 
WHOLEPDB * blDoReadPDB (FILE *fpin, BOOL AllAtoms, int OccRank, int ModelNum, BOOL DoWhole)
 
char * blFixAtomName (char *name, REAL occup)
 
PDB * blRemoveAlternates (PDB *pdb)
 
WHOLEPDB * blDoReadPDBML (FILE *fpin, BOOL AllAtoms, int OccRank, int ModelNum, BOOL DoWhole)
 
BOOL blCheckFileFormatPDBML (FILE *fp)
 
void blFreeWholePDB (WHOLEPDB *wpdb)
 
WHOLEPDB * blReadWholePDB (FILE *fpin)
 
WHOLEPDB * blReadWholePDBAtoms (FILE *fpin)
 

Detailed Description

Read coordinates from a PDB file.

Version
V3.11
Date
01.07.15
Author
Dr. Andrew C. R. Martin
Institute of Structural & Molecular Biology, University College London, Gower Street, London. WC1E 6BT.
andrew@bioinf.org.uk andrew.martin@ucl.ac.uk

This code is NOT IN THE PUBLIC DOMAIN, but it may be copied according to the conditions laid out in the accompanying file COPYING.DOC.

The code may be modified as required, but any modifications must be documented so that the person responsible can be identified.

The code may not be sold commercially or included as part of a commercial product except as described in the file COPYING.DOC.

Description:

pdb = blReadPDB(fp,natom)

This subroutine reads a .PDB file of any size and builds a linked list representing the protein structure. The list consists of linked structures of type pdb_entry, which is defined in "pdb.h"; see that file for details of the structure.

To free the space created by this routine, call FREELIST(pdb,PDB).
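
The walk-then-free pattern can be sketched without the library. This is an illustration only: NODE, SKETCH_FREELIST and sketch_build_count_free() are hypothetical stand-ins, not Bioplib's PDB type or its FREELIST macro. The only assumption carried over is that each item holds a next pointer and the list ends with NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a PDB list item: just the link and coordinates */
typedef struct node_s
{
   struct node_s *next;
   double x, y, z;
} NODE;

/* Stand-in for a FREELIST-style macro: frees every item in the list and
   leaves the list pointer set to NULL */
#define SKETCH_FREELIST(list, type)             \
   do {                                         \
      while((list) != NULL)                     \
      {                                         \
         type *next_item_ = (list)->next;       \
         free(list);                            \
         (list) = next_item_;                   \
      }                                         \
   } while(0)

/* Builds a list of up to n items, walks it, then frees it.
   Returns the number of items visited during the walk. */
int sketch_build_count_free(int n)
{
   NODE *list = NULL, *p;
   int  i, count = 0;

   for(i=0; i<n; i++)
   {
      NODE *item = malloc(sizeof(*item));
      if(item == NULL) break;          /* out of memory: keep what we have */
      item->x = item->y = item->z = (double)i;
      item->next = list;               /* prepend to the list */
      list = item;
   }

   /* Same traversal idiom as used for a PDB linked list */
   for(p=list; p!=NULL; p=p->next)
      count++;

   SKETCH_FREELIST(list, NODE);
   assert(list == NULL);               /* the macro resets the list pointer */
   return count;
}
```

With the real library the same shape would apply to the pointer returned by the reading routine, followed by FREELIST(pdb,PDB) once the list is no longer needed.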

The parameters passed to the subroutine are listed under Usage below.

As of V2.3, the routine makes provision for partial occupancies. If the occupancies are 1.0 or 0.0, the atoms are read verbatim. If not, only the highest occupancy atoms are read and the atom names are corrected to remove alternative labels. This behaviour can be overridden by calling one of the ...OccRank() routines to read lower occupancy atoms. If any partial occupancy atoms are read the global flag gPDBPartialOcc is set to TRUE.

The various PDB reading routines set the following global flags:

  • gPDBPartialOcc - the PDB file contained multiple occupancies
  • gPDBMultiNMR - the PDB file contained multiple models
  • gPDBXML - the file was in PDBML (XML) format
  • gPDBModelNotFound - the requested model was not found

NOTE: Although some of the fields are represented by a single character, they are still stored in character arrays.

BUGS: The subroutine cannot read files with VAX Fortran carriage control! It just sits there and page faults like crazy.

BUGS: The multiple occupancy code assumes that all positions for a given atom are in consecutive records of the file.

BUGS: 25.01.05 Note the multiple occupancy code won't work properly for 3pga where atoms have occupancies of zero and one

Usage:

pdb = blReadPDB(fp,natom)
Parameters
[in]  *fp  A pointer to type FILE in which the .PDB file is stored.
[out]  *natom  Number of atoms read.
Returns
*pdb A pointer to the first allocated item of the PDB linked list

Revision History:

Definition in file ReadPDB.c.

Macro Definition Documentation

#define LOCATION_COORDINATES   1

Definition at line 340 of file ReadPDB.c.

#define LOCATION_HEADER   0

Definition at line 339 of file ReadPDB.c.

#define LOCATION_TRAILER   2

Definition at line 341 of file ReadPDB.c.

#define MAXBUFF   160

Definition at line 337 of file ReadPDB.c.

#define MAXPARTIAL   8

Definition at line 333 of file ReadPDB.c.

#define READPDB_MAIN

Definition at line 307 of file ReadPDB.c.

#define SMALL   0.000001

Definition at line 334 of file ReadPDB.c.

#define XML_BUFFER   1024

Definition at line 335 of file ReadPDB.c.

#define XML_SAMPLE   256

Definition at line 336 of file ReadPDB.c.

Function Documentation

BOOL blCheckFileFormatPDBML ( FILE *  fp)
Parameters
[in]  *fp  A pointer to type FILE.
Returns
TRUE if the file is in PDBML format.

Simple test to detect PDBML-formatted pdb file.

Todo: Consider replacement with general function to detect file format for uncompressed file returning file type (eg pdb/pdbml/unknown).

  • 22.04.14 Original By: CTP
  • 07.07.14 Renamed to blCheckFileFormatPDBML() By: CTP
  • 29.08.14 Function re-written to take sample from the input stream then reset the stream with ungetc. By: CTP
  • 31.08.14 Bugfix: Check for 'PDBx:datablock' tag skipped if blank line before xml tag. By: CTP
  • 09.09.14 Use rewind() for DOS instead of pushing sample back on stream with ungetc(). By: CTP
  • 29.09.14 Use single character check for pdbml files for Windows or systems where ungetc() fails after pushback of single char. By: CTP
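
The sample-and-reset approach recorded in the history above can be sketched as follows. This is not the library's implementation: sketch_looks_like_pdbml() and SKETCH_SAMPLE are hypothetical names, the stream is assumed to be seekable (so the position is restored with fseek() rather than ungetc()), and the test is simply whether a '<PDBx:datablock' tag appears in the first bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_SAMPLE 256   /* sample size; mirrors XML_SAMPLE by assumption */

/* Returns 1 if the start of the stream contains a PDBx:datablock tag,
   0 otherwise (including when the stream cannot be repositioned).
   The read position is restored before returning. */
int sketch_looks_like_pdbml(FILE *fp)
{
   char   sample[SKETCH_SAMPLE + 1];
   size_t nread;
   long   start;

   assert(fp != NULL);

   start = ftell(fp);
   if(start < 0)
      return 0;                         /* not a seekable stream */

   nread = fread(sample, 1, SKETCH_SAMPLE, fp);
   sample[nread] = '\0';                /* make the sample a C string */

   if(fseek(fp, start, SEEK_SET) != 0)  /* put the stream back */
      return 0;

   return strstr(sample, "<PDBx:datablock") != NULL;
}
```

Because the position is restored, the caller can hand the same stream to whichever reader matches the detected format. Pipes and other non-seekable inputs would need a different strategy, which is presumably why the history mentions ungetc() and a single-character check.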

Definition at line 2161 of file ReadPDB.c.

WHOLEPDB* blDoReadPDB ( FILE *  fpin,
BOOL  AllAtoms,
int  OccRank,
int  ModelNum,
BOOL  DoWhole 
)
Parameters
[in]  *fpin  A pointer to type FILE in which the .PDB file is stored.
[in]  AllAtoms  TRUE: ATOM & HETATM records; FALSE: ATOM records only
[in]  OccRank  Occupancy ranking
[in]  ModelNum  NMR Model number (0 = all)
[in]  DoWhole  Read the whole PDB file rather than just the ATOM/HETATM records.
Returns
A pointer to a malloc'd WHOLEPDB structure

Reads a PDB file into a PDB linked list. The OccRank value indicates occupancy ranking to read for partial occupancy atoms. If any partial occupancy atoms are read the global flag gPDBPartialOcc is set to TRUE.
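
How the AllAtoms and ModelNum parameters might filter records can be sketched with a simple counter. This is a hypothetical illustration, not the library's reader: it only counts matching lines, treats a file with no MODEL records as a single model 1, and numbers models by the order of their MODEL records rather than by the serial number in the record.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Counts coordinate records in a PDB-format stream.
   all_atoms != 0 : count ATOM and HETATM records; otherwise ATOM only.
   model_num == 0 : count records in every model; otherwise only in the
                    model_num'th model (1 = first). */
int sketch_count_records(FILE *fp, int all_atoms, int model_num)
{
   char line[160];
   int  model      = 1;   /* files without MODEL records count as model 1 */
   int  seen_model = 0;
   int  count      = 0;

   assert(fp != NULL);

   while(fgets(line, sizeof(line), fp) != NULL)
   {
      if(strncmp(line, "MODEL ", 6) == 0)
      {
         /* The first MODEL record opens model 1; later ones increment */
         if(seen_model)
            model++;
         seen_model = 1;
         continue;
      }

      if(model_num != 0 && model != model_num)
         continue;        /* not the requested model */

      if(strncmp(line, "ATOM  ", 6) == 0 ||
         (all_atoms && strncmp(line, "HETATM", 6) == 0))
         count++;
   }
   return count;
}
```

A count of zero for a non-zero model_num is the situation in which the real routine is described as setting gPDBModelNotFound.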

  • 04.11.88 V1.0 Original
  • 07.02.89 V1.1 Ignore records which aren't ATOM or HETATM
  • 28.03.90 V1.2 Altered field widths to match PDB standard better. See notes above for deviations.
  • 28.06.90 V1.2a Buffer size increased to 85 chars.
  • 15.02.91 V1.2b Changed comment header to match new standard.
  • 07.01.92 V1.3 Ignores blank lines properly
  • 11.05.92 V1.4 Check on EOF in while() loop, memset() buffer. ANSIed.
  • 01.06.92 V1.5 Documented for autodoc
  • 19.06.92 V1.6 Corrected use of stdlib
  • 01.10.92 V1.7 Changed to use fgets()
  • 10.06.93 V1.9 Returns 0 on failure rather than exiting. Replaced SIZE with sizeof(PDB) directly.
  • 17.06.93 V2.0 Rewritten to use fsscanf()
  • 08.07.93 V2.1 Split from ReadPDB()
  • 09.07.93 V2.2 Modified to return pointer to PDB. Rewrote allocation scheme.
  • 17.03.94 V2.3 Handles partial occupancies. Sets natom to -1 if there was an error to distinguish from no atoms. Handles atom names which start in column 13 rather than column 14. This is allowed in the standard, but very rare. Sets flag for partials.
  • 06.04.94 V2.4 Atom names starting in column 13 have their first character moved to the end if it is a digit.
  • 03.10.94 V2.5 Check residue number as well as atom name when running through alternative atoms for partial occupancy. Moved increment of NPartial, so only done if there is space in the array. If OccRank is 0, all atoms are read regardless of occupancy.
  • 06.03.95 V2.7 Added value for NMR model to read (0 = all). No longer static. Sets gPDBMultiNMR if ENDMDL records found.
  • 13.01.97 V2.8 Added check on return from fsscanf. Blank lines used to result in duplication of the previous line since fsscanf() does not reset the variables on receiving a blank line. Also fixed in fsscanf().
  • 25.02.98 V2.9 Added code to read gzipped PDB files transparently when GUNZIP_SUPPORT is defined
  • 17.08.98 V2.10 Added case to popen() for SunOS
  • 08.10.99 V2.11 Initialise CurIns and CurRes
  • 15.02.01 V2.12 Added atnam_raw
  • 27.04.05 V2.14 Added another atnam_raw for multiple occupancies
  • 03.06.05 V2.15 Added altpos
  • 14.10.05 V2.16 Modified detection of partial occupancy. Handles residues like 1zeh/B16 where a lower partial is erroneously set to zero.
  • 05.06.07 V2.19 Added support for Unix compress'd files
  • 21.12.11 V2.22 Modified for cases of single occupancy < 1.0
  • 22.04.14 V2.24 Call doReadPDBML() for PDBML-formatted PDB file. By: CTP
  • 02.06.14 V2.25 Updated doReadPDBML(). By: CTP
  • 09.06.14 V2.26 Set gPDBXML flag. By: CTP
  • 07.07.14 V2.27 Renamed to blDoReadPDB() By: CTP
  • 15.08.14 V2.29 Use CLEAR_PDB() to set default values. By: CTP
  • 16.08.14 V2.30 Replaced charge with formal_charge and partial_charge for PDB structure. By: CTP
  • 18.08.14 V2.31 Added XML_SUPPORT option allowing BiopLib to be compiled without support for PDBML format. By: CTP
  • 09.09.14 V2.35 Reading of gzipped files with gunzip not supported for MS Windows. By: CTP
  • 29.09.14 V2.36 Allow single character filetype check for gzipped files. By: CTP
  • 17.02.15 V2.37 Added segid support By: ACRM
  • 20.02.15 V3.0 NOT COMPATIBLE WITH PREVIOUS VERSIONS. The functionality of the old ReadWholePDB() is now integrated into this function.
  • 20.03.15 V3.3 Fixed behaviour with reading other than the first model and now sets a global error flag if the requested model was not found. Uses MODEL records rather than ENDMDL records in counting models
  • 02.04.15 V3.4 Rewind file after reading pdbxml header data. By: CTP
  • 28.04.15 V3.5 Removed rewind. Call to blDoReadPDBML() returns WHOLEPDB instead of PDB. By: CTP
  • 21.07.15 Changed atomType to atomInfo By: ACRM

    Todo: We need to deal with freeing wpdb if we are returning NULL. We also need to deal with some sort of error code.

Definition at line 707 of file ReadPDB.c.

WHOLEPDB* blDoReadPDBML ( FILE *  fpin,
BOOL  AllAtoms,
int  OccRank,
int  ModelNum,
BOOL  DoWhole 
)
Parameters
[in]  *fpin  A pointer to type FILE in which the .PDB file is stored.
[in]  AllAtoms  TRUE: ATOM & HETATM records; FALSE: ATOM records only
[in]  OccRank  Occupancy ranking
[in]  ModelNum  NMR Model number (0 = all)
[in]  DoWhole  Read the whole PDB file rather than just the ATOM/HETATM records.
Returns
A pointer to a malloc'd WHOLEPDB structure.

Reads a PDBML-formatted PDB file into a PDB linked list.

The OccRank value indicates occupancy ranking to read for partial occupancy atoms. If any partial occupancy atoms are read the global flag gPDBPartialOcc is set to TRUE.

The global multiple-models flag is set to true if more than one model is found.

On failure, returns either NULL (if memory allocation fails) or wpdb with wpdb->pdb set to NULL and wpdb->natoms set to -1.

  • 22.04.14 Original By: CTP
  • 02.06.14 Updated setting atnam_raw and parsing data from PDB atom site labels (label_seq_id, etc.) if author-defined labels are omitted. By: CTP
  • 09.06.14 Set gPDBXML flag. By: CTP
  • 07.07.14 Renamed to blDoReadPDBML() By: CTP
  • 04.08.14 Read element and formal charge. By: CTP
  • 15.08.14 Use CLEAR_PDB() to set default values. By: CTP
  • 16.08.14 Read formal and partial charges. Use blCopyPDB() to copy data for partial occupancy atoms. By: CTP
  • 18.08.14 Added XML_SUPPORT option. Return error if XML_SUPPORT not defined By: CTP
  • 26.08.14 Pad record_type to six characters. By: CTP
  • 17.02.15 Added segid support By: ACRM
  • 25.02.15 Added some checks on potential NULL pointers. Changed all strcpy()s to strncpy()s. Initialized numeric content variable before each sscanf() in case it fails. Calls blRenumAtomsPDB() at the end since we don't use the atom site IDs for atom numbers.
  • 13.03.15 Cosmetic changes
  • 28.04.15 Set identical input parameters to blDoReadPDB(). Added CONECT and header parsing. Return WHOLEPDB instead of PDB. By: CTP
  • 14.06.15 Read entity_id By: CTP
  • 25.06.15 Detect if residue number has been set from auth_seq_id using flag - fixes bug where auth_seq_id = 0. By: CTP
  • 01.07.15 Replaced ParseHeaderPDBML() with ParseHeaderRecordsPDBML() By: CTP

Definition at line 1624 of file ReadPDB.c.

char* blFixAtomName ( char *  name,
REAL  occup 
)
Parameters
[in]  *name  Atom name read from file
[in]  occup  Occupancy to allow fixing of partial occupancy atom names
Returns
Fixed atom name (pointer into name)

Fixes an atom name by removing leading spaces, or moving a leading digit to the end of the string. Used by doReadPDB()
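
The two basic transformations described here (drop leading spaces, or move a leading digit to the end) can be sketched as below. This is a simplified, hypothetical helper rather than the library routine: it ignores the occupancy parameter and the alternate-position column discussed in the history, and it strips trailing spaces first, which is an assumption made so that the digit lands directly after the last name character.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Normalises a raw atom name in place and returns a pointer into it.
   " CA " becomes "CA"; "1HG1" becomes "HG11". */
char *sketch_fix_atom_name(char *name)
{
   size_t len;

   assert(name != NULL);

   /* Strip trailing spaces (an assumption of this sketch) */
   len = strlen(name);
   while(len > 0 && name[len-1] == ' ')
      name[--len] = '\0';

   /* A leading digit is moved to the end of the name */
   if(len > 0 && isdigit((unsigned char)name[0]))
   {
      char digit = name[0];
      memmove(name, name+1, len-1);
      name[len-1] = digit;
      return name;
   }

   /* Otherwise skip any leading spaces */
   while(*name == ' ')
      name++;

   return name;
}
```

As in the documented routine, the result points into the buffer that was passed in, so the caller must not free it separately or expect the original contents to survive.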

  • 06.04.94 Original By: ACRM
  • 01.03.01 No longer static
  • 03.06.05 The name passed in has always contained the column which is officially the alternate atom position indicator, but is used by some programs as part of the atom name. Thus the properly constructed variable coming into the routine should be something like '1HG1 ' or '1HG1A' for an alternate atom position. However, some programs use ' HG11'. Therefore we now check for a character in the last position and replace it with a space if there is a space in the preceding position (e.g. ' CA A' -> ' CA ') or if there is a character in the first position (e.g. '1HG1A' -> '1HG1 ') or if the occupancy is not zero/one. NOTE!!! To support this, the routine now has a second parameter: REAL occup
  • 07.07.14 Renamed to blFixAtomName() By: CTP

Definition at line 1291 of file ReadPDB.c.

void blFreeWholePDB ( WHOLEPDB *  wpdb)
Parameters
[in]  *wpdb  WHOLEPDB structure to be freed

Frees the header, trailer and atom content from a WHOLEPDB structure

Definition at line 2293 of file ReadPDB.c.

PDB* blReadPDB ( FILE *  fp,
int *  natom 
)
Parameters
[in]  *fp  A pointer to type FILE in which the .PDB file is stored.
[out]  *natom  Number of atoms read. -1 if error.
Returns
A pointer to the first allocated item of the PDB linked list

Reads a PDB file into a PDB linked list

  • 08.07.93 Written as entry for doReadPDB()
  • 09.07.93 Modified to return pointer to PDB
  • 17.03.94 Modified to handle OccRank
  • 06.03.95 Added value for NMR model to read (1 = first)
  • 25.01.06 Added call to RemoveAlternates() - this deals with odd cases where alternate atom positions don't appear where they should!
  • 25.01.06 Added call to RemoveAlternates(). This deals with odd uses of multiple occupancies like 3pga and the instance where the alternates are all grouped at the end of the file.
  • 07.07.14 Renamed to blReadPDB() By: CTP
  • 03.04.15 Initialize pdb to NULL avoiding returning uninitialized value if blDoReadPDB() fails. By: CTP

Definition at line 419 of file ReadPDB.c.

PDB* blReadPDBAll ( FILE *  fp,
int *  natom 
)
Parameters
[in]  *fp  A pointer to type FILE in which the .PDB file is stored.
[out]  *natom  Number of atoms read. -1 if error.
Returns
A pointer to the first allocated item of the PDB linked list

Reads a PDB file into a PDB linked list. Reads all partial occupancy atoms. Reads both ATOM and HETATM records.

  • 04.10.94 Original By: ACRM
  • 06.03.95 Added value for NMR model to read (0 = all)
  • 07.07.14 Renamed to blReadPDBAll() By: CTP
  • 03.04.15 Initialize pdb to NULL avoiding returning uninitialized value if blDoReadPDB() fails. By: CTP

Definition at line 460 of file ReadPDB.c.

PDB* blReadPDBAtoms ( FILE *  fp,
int *  natom 
)
Parameters
[in]  *fp  A pointer to type FILE in which the .PDB file is stored.
[out]  *natom  Number of atoms read. -1 if error.
Returns
A pointer to the first allocated item of the PDB linked list

Reads a PDB file into a PDB linked list. Atoms only (no HETATM cards).

  • 08.07.93 Written as entry for doReadPDB()
  • 09.07.93 Modified to return pointer to PDB
  • 17.03.94 Modified to handle OccRank
  • 06.03.95 Added value for NMR model to read (1 = first)
  • 25.01.06 Added call to RemoveAlternates(). This deals with odd uses of multiple occupancies like 3pga and the instance where the alternates are all grouped at the end of the file.
  • 07.07.14 Renamed to blReadPDBAtoms() By: CTP
  • 03.04.15 Initialize pdb to NULL avoiding returning uninitialized value if blDoReadPDB() fails. By: CTP

Definition at line 503 of file ReadPDB.c.

PDB* blReadPDBAtomsOccRank ( FILE *  fp,
int *  natom,
int  OccRank 
)
Parameters
[in]  *fp  A pointer to type FILE in which the .PDB file is stored.
[in]  OccRank  Occupancy ranking (>=1)
[out]  *natom  Number of atoms read. -1 if error.
Returns
A pointer to the first allocated item of the PDB linked list

Reads a PDB file into a PDB linked list ignoring HETATM records and selecting the OccRank'th highest occupancy atoms

  • 17.03.94 Original By: ACRM
  • 06.03.95 Added value for NMR model to read (1 = first)
  • 07.07.14 Renamed to blReadPDBAtomsOccRank() By: CTP
  • 03.04.15 Initialize pdb to NULL avoiding returning uninitialized value if blDoReadPDB() fails. By: CTP

Definition at line 586 of file ReadPDB.c.

PDB* blReadPDBOccRank ( FILE *  fp,
int *  natom,
int  OccRank 
)
Parameters
[in]  *fp  A pointer to type FILE in which the .PDB file is stored.
[in]  OccRank  Occupancy ranking (>=1)
[out]  *natom  Number of atoms read. -1 if error.
Returns
A pointer to the first allocated item of the PDB linked list

Reads a PDB file into a PDB linked list selecting the OccRank'th highest occupancy atoms

  • 17.03.94 Original By: ACRM
  • 06.03.95 Added value for NMR model to read (1 = first)
  • 07.07.14 Renamed to blReadPDBOccRank() By: CTP
  • 03.04.15 Initialize pdb to NULL avoiding returning uninitialized value if blDoReadPDB() fails. By: CTP

Definition at line 545 of file ReadPDB.c.

WHOLEPDB* blReadWholePDB ( FILE *  fpin)
Parameters
[in]  *fpin  File pointer
Returns
Whole PDB structure containing linked list to PDB coordinate data

Reads a PDB file, storing the header and trailer information as well as the coordinate data. Can read gzipped files as well as uncompressed files.

Coordinate data is accessed as linked list of type PDB as follows:

   WHOLEPDB *wpdb;
   PDB      *p;

   wpdb = blReadWholePDB(fp);

   for(p=wpdb->pdb; p!=NULL; p=p->next)
   {
      ... Do something with p ...
   }

  • 07.03.07 Made into a wrapper to doReadWholePDB()
  • 07.07.14 Use blDoReadWholePDB(). Renamed to blReadWholePDB() By: CTP

Definition at line 2328 of file ReadPDB.c.

WHOLEPDB* blReadWholePDBAtoms ( FILE *  fpin)
Parameters
[in]  *fpin  File pointer
Returns
Whole PDB structure containing linked list to PDB coordinate data

Reads a PDB file, storing the header and trailer information as well as the coordinate data. Can read gzipped files as well as uncompressed files.

Coordinate data is accessed as linked list of type PDB as follows:

   WHOLEPDB *wpdb;
   PDB      *p;

   wpdb = blReadWholePDBAtoms(fp);

   for(p=wpdb->pdb; p!=NULL; p=p->next)
   {
      ... Do something with p ...
   }

  • 07.03.07 Made into a wrapper to doReadWholePDB()
  • 07.07.14 Use blDoReadWholePDB(). Renamed to blReadWholePDBAtoms() By: CTP

Definition at line 2363 of file ReadPDB.c.

PDB* blRemoveAlternates ( PDB *  pdb)
Parameters
[in,out]  *pdb  PDB linked list
Returns
Amended linked list (in case the start has changed)

Removes alternate atoms. Only the highest occupancy position is kept, or the first one if several share the same occupancy.
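
The keep-the-highest rule can be sketched on a stripped-down list. Everything here is hypothetical: SKETCH_ATOM carries only a name, residue number and occupancy, atoms are matched on name and residue number alone (the real routine also considers the chain, according to its history), and the winning occupancy is copied into the first occurrence so the head of the list never changes in this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cut-down atom record */
typedef struct sketch_atom_s
{
   struct sketch_atom_s *next;
   char   name[8];
   int    resnum;
   double occ;
} SKETCH_ATOM;

/* Appends an atom to the list and returns the (possibly new) head */
SKETCH_ATOM *sketch_append(SKETCH_ATOM *head, const char *name,
                           int resnum, double occ)
{
   SKETCH_ATOM *item, *p;

   assert(name != NULL);
   item = calloc(1, sizeof(*item));
   if(item == NULL) return head;
   strncpy(item->name, name, sizeof(item->name) - 1);
   item->resnum = resnum;
   item->occ    = occ;

   if(head == NULL) return item;
   for(p=head; p->next!=NULL; p=p->next) ;
   p->next = item;
   return head;
}

/* For each atom, folds later duplicates into the first occurrence,
   keeping the highest occupancy (ties keep the first), and frees them */
SKETCH_ATOM *sketch_remove_alternates(SKETCH_ATOM *head)
{
   SKETCH_ATOM *p, *q, *prev;

   for(p=head; p!=NULL; p=p->next)
   {
      prev = p;
      q    = p->next;
      while(q != NULL)
      {
         if(q->resnum == p->resnum && strcmp(q->name, p->name) == 0)
         {
            if(q->occ > p->occ)      /* strictly higher: ties keep first */
               p->occ = q->occ;
            prev->next = q->next;    /* unlink and free the alternate */
            free(q);
            q = prev->next;
         }
         else
         {
            prev = q;
            q    = q->next;
         }
      }
   }
   return head;
}
```

Scanning the whole remainder of the list, rather than only adjacent records, is what lets this approach cope with alternates grouped at the end of a file, the case mentioned in the blReadPDB() history.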

  • 25.01.05 Original based on code written for Inpharmatica By: ACRM
  • 04.02.14 Use CHAINMATCH macro. By: CTP
  • 07.07.14 Renamed to blRemoveAlternates(). Use blWritePDBRecord(). Use bl prefix for functions. By: CTP

Definition at line 1362 of file ReadPDB.c.

int pclose ( FILE *  )
FILE* popen ( char *  ,
char *   
)