Bioplib
Protein Structure C Library
ReadPDB.c
1 /************************************************************************/
2 /**
3 
4  \file ReadPDB.c
5 
6  \version V3.11
7  \date 01.07.15
8  \brief Read coordinates from a PDB file
9 
10  \copyright (c) UCL / Dr. Andrew C. R. Martin 1988-2015
11  \author Dr. Andrew C. R. Martin
12  \par
13  Institute of Structural & Molecular Biology,
14  University College London,
15  Gower Street,
16  London.
17  WC1E 6BT.
18  \par
19  andrew@bioinf.org.uk
20  andrew.martin@ucl.ac.uk
21 
22 **************************************************************************
23 
24  This code is NOT IN THE PUBLIC DOMAIN, but it may be copied
25  according to the conditions laid out in the accompanying file
26  COPYING.DOC.
27 
28  The code may be modified as required, but any modifications must be
29  documented so that the person responsible can be identified.
30 
31  The code may not be sold commercially or included as part of a
32  commercial product except as described in the file COPYING.DOC.
33 
34 **************************************************************************
35 
36  Description:
37  ============
38 
39 \code
40  pdb = blReadPDB(fp,natom)
41 \endcode
42 
43  This subroutine will read a .PDB file
44  of any size and form a linked list of the protein structure.
45  This list is contained in a linked set of structures of type
46  pdb_entry. The structure is set up by including the file
47  "pdb.h". For details of the structure, see this file.
48 
49  To free the space created by this routine, call FREELIST(pdb,PDB).
50 
51  The arguments and return value of the subroutine are:
52 
53  - fp - A pointer to type FILE from which the .PDB file is read.
54  - pdb - The returned pointer to type PDB (the first item of the list).
55  - natom - A pointer to type integer in which the number of atoms
56  found is stored.
57 
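 The linked-list reading described above can be illustrated with a minimal,
 self-contained sketch. It is not BiopLib code: DEMO_ATOM, demo_field(),
 demo_read_atoms() and demo_free_atoms() are hypothetical stand-ins that keep
 only the atom name and coordinates, and the sketch assumes standard
 fixed-column ATOM and HETATM records.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in for the PDB structure defined in pdb.h;
// only a few fields are kept for illustration.
typedef struct demo_atom {
    char   atnam[8];
    double x, y, z;
    struct demo_atom *next;
} DEMO_ATOM;

// Free every item of a list built by demo_read_atoms().
void demo_free_atoms(DEMO_ATOM *head)
{
    while (head != NULL) {
        DEMO_ATOM *next = head->next;
        free(head);
        head = next;
    }
}

// Convert the fixed-width field starting at offset 'start' to a number.
static double demo_field(const char *line, int start, int width)
{
    char tmp[16];

    assert(width > 0 && width < (int)sizeof(tmp));
    memcpy(tmp, line + start, (size_t)width);
    tmp[width] = '\0';
    return atof(tmp);
}

// Read ATOM/HETATM records from fp into a linked list.
// *natom receives the number of atoms read, or -1 if allocation fails.
DEMO_ATOM *demo_read_atoms(FILE *fp, int *natom)
{
    char       buffer[160];
    DEMO_ATOM *head = NULL, *tail = NULL;

    *natom = 0;
    while (fgets(buffer, sizeof(buffer), fp) != NULL) {
        DEMO_ATOM *p;

        if (strncmp(buffer, "ATOM  ", 6) != 0 &&
            strncmp(buffer, "HETATM", 6) != 0)
            continue;                       // ignore other record types
        if (strlen(buffer) < 54)
            continue;                       // too short to hold x, y, z

        if ((p = malloc(sizeof(*p))) == NULL) {
            demo_free_atoms(head);
            *natom = -1;
            return NULL;
        }
        memcpy(p->atnam, buffer + 12, 4);   // columns 13-16
        p->atnam[4] = '\0';
        p->x = demo_field(buffer, 30, 8);   // columns 31-38
        p->y = demo_field(buffer, 38, 8);   // columns 39-46
        p->z = demo_field(buffer, 46, 8);   // columns 47-54
        p->next = NULL;

        if (tail == NULL) head = p; else tail->next = p;
        tail = p;
        (*natom)++;
    }
    return head;
}
```

 As with the real routine, the caller owns the returned list and frees it
 when finished (here with demo_free_atoms()).
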
58  As of V2.3, the routine makes provision for partial occupancies. If
59  the occupancies are 1.0 or 0.0, the atoms are read verbatim. If not,
60  only the highest occupancy atoms are read and the atom names are
61  corrected to remove alternative labels. This behaviour can be
62  overridden by calling one of the ...OccRank() routines to read lower
63  occupancy atoms. If any partial occupancy atoms are read the global
64  flag gPDBPartialOcc is set to TRUE.
65 
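 The occupancy ranking described above can be sketched in isolation. The
 helper below is hypothetical (demo_occ_rank_index() is not part of BiopLib)
 and assumes the occupancies of one atom's alternate positions have already
 been collected into an array. It only illustrates what choosing the rank-th
 most populated position means, not how the library stores or renames atoms.

```c
#include <assert.h>
#include <stddef.h>

// Return the index of the rank-th highest occupancy in occ[0..n-1]
// (rank 1 = highest), or -1 if rank is out of range. Ties are broken
// in favour of the earlier record.
int demo_occ_rank_index(const double *occ, int n, int rank)
{
    int i, j;

    assert(occ != NULL);
    if (rank < 1 || rank > n)
        return -1;

    for (i = 0; i < n; i++) {
        int better = 0;   // number of alternates that outrank alternate i

        for (j = 0; j < n; j++) {
            if (occ[j] > occ[i] || (occ[j] == occ[i] && j < i))
                better++;
        }
        if (better == rank - 1)
            return i;
    }
    return -1;
}
```

 With occupancies 0.3, 0.5 and 0.2, rank 1 selects the second record and
 rank 2 the first.
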
66  The various PDB reading routines set the following global flags:
67  gPDBPartialOcc - the PDB file contained multiple occupancies
68  gPDBMultiNMR - the PDB file contained multiple models
69  gPDBXML - the file was in PDBML (XML) format
70  gPDBModelNotFound - the requested model was not found
71 
72 
73 NOTE: Although some of the fields are represented by a single character,
74  they are still stored in character arrays.
75 
76 BUGS: The subroutine cannot read files with VAX Fortran carriage control!
77  It just sits there and page faults like crazy.
78 
79 BUGS: The multiple occupancy code assumes that all positions for a given
80  atom are in consecutive records of the file.
81 
82 BUGS: 25.01.05 Note the multiple occupancy code won't work properly for
83  3pga where atoms have occupancies of zero and one
84 
85 **************************************************************************
86 
87  Usage:
88  ======
89 
90 \code
91  pdb = blReadPDB(fp,natom)
92 \endcode
93 
94  \param[in] *fp A pointer to type FILE in which the
95  .PDB file is stored.
96 
97  \param[out] *natom Number of atoms read.
98 
99  \return *pdb A pointer to the first allocated item of
100  the PDB linked list
101 
102 **************************************************************************
103 
104  Revision History:
105  =================
106 - V1.0 04.11.88 Original
107 - V1.1 07.02.89 Now ignores any records from the .PDB file which
108  don't start with ATOM or HETATM.
109 - V1.2 28.03.90 Some fields altered to match the exact specifications
110  of the PDB. The only differences from the standard
111  are:
112  1. The residue name is 4 characters rather than 3
113  (allowing LYSH, HISA, etc.).
114  2. The atom name starts one column later than the
115  standard and is four columns wide encompassing the
116  standard's `alternate' field. These two
117  differences from the standard reflect the common
118  usage.
119 - V1.2a 28.06.90 Buffer size increased to 85 chars.
120 - V1.2b 15.02.91 Simply changed comment header to match new standard.
121 - V1.3 07.01.92 Corrected small bug in while() loop. Now ignores
122  blank lines properly
123 - V1.4 11.05.92 Added check on EOF in while() loop and memset() of
124  buffer. ANSIfied.
125 - V1.5 01.06.92 Documented for autodoc
126 - V1.6 19.06.92 Corrected use of stdlib
127 - V1.7 01.10.92 Changed to use fgets()
128 - V1.8 08.12.92 SAS/C V6 now defines atof() in stdlib
129 - V1.9 10.06.93 Returns TRUE or FALSE rather than exiting on failure
130 - V2.0 17.06.93 Rewritten to use fsscanf()
131 - V2.1 08.07.93 Modified to give ReadPDB() and ReadPDBAtoms()
132 - V2.2 09.07.93 Modified to return the PDB pointer rather than a BOOL.
133  There is now no need to initialise the structure first.
134  Rewrote allocation scheme.
135 - V2.3 17.03.94 Handles partial occupancies. If occupancies are not
136  1.0 or 0.0, the normal routine now reads only the
137  highest occupancy atoms and corrects the atom names
138  to remove alternative labels. This behaviour can be
139  overridden by calling one of the ...OccRank()
140  routines to read lower occupancy atoms.
141  Sets natom to -1 if there was an error to distinguish
142  from no atoms.
143  Handles atom names which start in column 13 rather
144  than column 14. This is allowed in the standard, but
145  very rare.
146  Added ReadPDBOccRank() & ReadPDBAtomsOccRank()
147  Sets gPDBPartialOcc flag.
148 - V2.4 06.04.94 With atom names which start in column 13, now checks
149  if the first character is a digit. If so, moves it
150  to the end of the atom name. Thus, 1HH1 becomes HH11
151  and 2HH1 becomes HH12.
152 - V2.5 04.10.94 Fixed partial occ when resnum changes as well as atom
153  name. Fixed bug when MAXPARTIAL exceeded.
154 - V2.6 03.11.94 Simply Corrected description. No code changes
155 - V2.7 06.03.95 Now reads just the first NMR model by default
156  doReadPDB() no longer static
157  Sets gPDBMultiNMR if ENDMDL records found.
158 - V2.8 13.01.97 Added check on return from fsscanf. Blank lines used
159  to result in duplication of the previous line since
160  fsscanf() does not reset the variables on receiving
161  a blank line. Also fixed in fsscanf().
162 - V2.9 25.02.98 Added transparent reading of gzipped PDB files if
163  GUNZIP_SUPPORT is defined
164 - V2.10 18.08.98 Added cast to popen() for SunOS
165 - V2.11 08.10.99 Initialised some variables
166 - V2.12 15.02.01 Added atnam_raw into PDB structure
167 - V2.13 30.05.02 Changed PDB field from 'junk' to 'record_type'
168 - V2.14 27.04.05 Fixed bug in atnam_raw for multiple occupancies
169 - V2.15 03.06.05 Added altpos field to PDB structure. The massaged atom
170  name no longer contains the alternate indicator and
171  atnam_raw has only the atom name with altpos having the
172  alternate indicator (as it should!)
173 - V2.16 14.10.05 Fixed a problem in StoreOccRankAtom() when a lower
174  occupancy atom has (erroneously) been set to occupancy
175  of zero and you want to pull out that atom
176 - V2.17 25.01.06 Added calls to RemoveAlternates()
177 - V2.18 03.02.06 Added prototypes for popen() and pclose()
178 - V2.19 05.06.07 Added support for Unix compress'd files
179 - V2.20 29.06.07 popen() and pclose() prototypes now skipped for MAC OSX
180  which defines them differently
181 - V2.21 17.03.09 popen() prototype skipped for Windows. By: CTP
182 - V2.22 21.12.11 doReadPDB() modified for cases where atoms are single
183  occupancy but occupancy is < 1.0
184 - V2.23 04.02.14 Use CHAINMATCH macro. By: CTP
185 - V2.24 22.04.14 Added PDBML parsing with doReadPDBML() and
186  CheckFileFormatPDBML(). By: CTP
187 - V2.25 02.06.14 Updated doReadPDBML(). By: CTP
188 - V2.26 09.06.14 Set gPDBXML flag. By: CTP
189 - V2.27 07.07.14 Renaming of functions with "bl" prefix. By: CTP
190 - V2.28 04.08.14 blReadPDB() and blReadPDBML() get element and charge.
191  Set access and radius to 0.0. Set atomType to NULL.
192  Added blProcessElementField() and blProcessChargeField()
193  By: CTP
194 - V2.29 15.08.14 Updated blDoReadPDB() and blDoReadPDBML() to use
195  CLEAR_PDB(). By: CTP
196 - V2.30 16.08.14 Replaced charge with formal_charge and partial_charge
197  for PDB structure. By: CTP
198 - V2.31 18.08.14 Added XML_SUPPORT option allowing compilation without
199  support for PDBML format. By: CTP
200 - V2.32 26.08.14 blDoReadPDBML() pads record type to six chars. By: CTP
201 - V2.33 29.08.14 Rewrote blCheckFileFormatPDBML() to take sample from
202  input stream then push sample back on stream. By: CTP
203 - V2.34 31.08.14 Fixed bug in blCheckFileFormatPDBML(). By: CTP
204 - V2.35 09.09.14 Updated blCheckFileFormatPDBML() for non-unix systems.
205  Decreased size of XML_SAMPLE.
206  Reading of gzipped files with gunzip not supported for
207  MS Windows. By: CTP
208 - V2.36 29.09.14 Allow single character check for filetype where ungetc()
209  fails after pushback of single character. Updates to
210  blCheckFileFormatPDBML() and blDoReadPDB(). By: CTP
211 - V2.37 17.02.15 Added segid support By: ACRM
212 - V3.0 20.02.15 Merged functionality of ReadWholePDB() into this
213 - V3.1 25.02.15 Enhanced error checking and safety of blDoReadPDBML()
214 - V3.2 05.03.15 Fixed core dump in StoreConectRecords() when alternates
215  have been deleted so CONECT specifies an atom that
216  doesn't exist any more
217 - V3.3 20.03.15 Fixed reading of whole PDB when not the first model
218  Now sets gPDBModelNotFound flag if the requested model
219  isn't found
220 - V3.4 03.04.15 Rewind file after reading pdbxml header data.
221  Initialize pdb to NULL for ReadPDB functions. By: CTP
222 - V3.5 13.05.15 Added COMPND and SOURCE parsing for PDBML-format By: CTP
223 - V3.6 23.06.15 Parse entity_id from PDBML files. Parse chain for COMPND
224  records from PDBML files. Restrict compnd type to
225  polymer entries. Parse SEQRES records from PDBML files.
226  By: CTP
227 - V3.7 23.06.15 blStoreOccRankAtom() properly clears new PDB items
228 - V3.8 24.06.15 Parse MODRES records from PDBML files. By: CTP
229 - V3.9 25.06.15 Added experimental data support. Fixed some memory
230  leaks. Tidied up comments and formatting. Renamed all
231  static functions so they don't start with bl. Added
232  checks for failed XML extractions.
233 - V3.10 25.06.15 Fixed bug for pdbml parsing where residue number is set
234  to 0. By: CTP
235 - V3.11 01.07.15 Added ParseHeaderRecordsPDBML().
236  ParseHeaderPDBML() broken into smaller functions:
237  ParseTitlePDBML(), ParseCompndPDBML(),
238  ParseSourcePDBML(), ParseResolPDBML(),
239  ParseSeqresPDBML() and ParseModresPDBML(). By: CTP
240 
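 The atom-name handling introduced in V2.3 and V2.4 above can be illustrated
 as follows. This is a simplified sketch rather than the library's
 blFixAtomName(): demo_fix_atom_name() is a hypothetical name, it takes only
 the four-character name field, and trimming the trailing padding is a choice
 made here for clarity.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

// raw: the atom name field (columns 13-16), NUL-terminated.
// out: receives the cleaned name; must have room for at least 5 chars.
void demo_fix_atom_name(const char *raw, char *out)
{
    size_t len;

    assert(raw != NULL && out != NULL);

    while (*raw == ' ')                      // name starts in column 14
        raw++;
    len = strlen(raw);
    if (len > 4)
        len = 4;                             // the field is four columns wide
    while (len > 0 && raw[len - 1] == ' ')   // ignore trailing padding
        len--;
    memcpy(out, raw, len);
    out[len] = '\0';

    // A name starting in column 13 with a digit: move the digit to the end
    if (len > 1 && isdigit((unsigned char)out[0])) {
        char digit = out[0];

        memmove(out, out + 1, len - 1);      // shift the remainder left
        out[len - 1] = digit;
    }
}
```

 Called with "1HH1" this yields "HH11", and with "2HH1" it yields "HH12",
 matching the examples in the V2.4 entry.
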
241 *************************************************************************/
242 /* Doxygen
243  -------
244  #GROUP Handling PDB Data
245  #SUBGROUP File IO
246 
247  #KEYFUNCTION blReadPDB()
248  Main way of reading a PDB file into a linked list, reading just the
249  highest occupancy atoms
250 
251  #FUNCTION blReadPDBAll()
252  Reads a PDB file into a linked list, reading all multiple
253  occupancy atoms
254 
255  #FUNCTION blReadPDBAtoms()
256  Reads only ATOM records from a PDB file into a linked list, reading
257  just the highest occupancy atoms
258 
259  #FUNCTION blReadPDBOccRank()
260  Reads the specified ranking of occupancy (e.g. the second most
261  populated coordinates) from a PDB file into a linked list
262 
263  #FUNCTION blReadPDBAtomsOccRank()
264  Reads only the ATOM records for the specified ranking of occupancy
265  (e.g. the second most populated coordinates) from a PDB file into a
266  linked list
267 
268  #FUNCTION blDoReadPDB()
269  A lower level routine giving full control over reading all or only
270  ATOM records, occupancy rankings and model numbers.
271 
272  #FUNCTION blDoReadPDBML()
273  A lower level routine giving full control over reading all or only
274  ATOM records, occupancy rankings and model numbers from a PDBML XML
275  file.
276 
277  #FUNCTION blCheckFileFormatPDBML()
278  A simple test to detect whether a file is a PDBML-formatted PDB file.
279 
280  #KEYFUNCTION blReadWholePDB()
281  Reads a PDB file, storing the header and trailer information as
282  well as the coordinate data. Can read gzipped files as well as
283  uncompressed files.
284 
285  #KEYFUNCTION blFreeWholePDB()
286  Frees the header, trailer and atom content from a WHOLEPDB structure
287 
288  #FUNCTION blReadWholePDBAtoms()
289  Reads a PDB file, storing the header and trailer information as
290  well as the coordinate data. Only reads the ATOM record for
291  coordinates
292 
293  #SUBGROUP Atom names and elements
294  #FUNCTION blFixAtomName()
295  Fixes an atom name by removing leading spaces, or moving a leading
296  digit to the end of the string.
297 
298  #SUBGROUP Manipulating the PDB linked list
299  #FUNCTION blRemoveAlternates()
300  Removes alternate occupancy atoms. This may be useful after
301  blReadPDBAll()
302 */
303 
304 /************************************************************************/
305 /* Defines required for includes
306 */
307 #define READPDB_MAIN
308 
309 /************************************************************************/
310 /* Includes
311 */
312 #include "port.h" /* Required before stdio.h */
313 
314 #include <stdio.h>
315 #include <string.h>
316 #include <math.h>
317 #include <stdlib.h>
318 #include <ctype.h>
319 #include <unistd.h>
320 
321 #ifdef XML_SUPPORT /* Required to read PDBML files */
322 #include <libxml/parser.h>
323 #include <libxml/tree.h>
324 #endif
325 
326 #include "SysDefs.h"
327 #include "MathType.h"
328 #include "pdb.h"
329 #include "macros.h"
330 #include "fsscanf.h"
331 #include "general.h"
332 
333 #define MAXPARTIAL 8
334 #define SMALL 0.000001
335 #define XML_BUFFER 1024
336 #define XML_SAMPLE 256
337 #define MAXBUFF 160
338 
339 #define LOCATION_HEADER 0
340 #define LOCATION_COORDINATES 1
341 #define LOCATION_TRAILER 2
342 
343 #ifdef XML_SUPPORT
344 #define APPEND_STRINGLIST(x, y) \
345  if(((y)!=NULL) && ((x)!=NULL)) { \
346  LAST((x)); \
347  (x)->next = (y); \
348  }
349 #endif
350 
351 /************************************************************************/
352 /* Prototypes
353 */
354 static BOOL StoreOccRankAtom(int OccRank, PDB multi[MAXPARTIAL],
355  int NPartial, PDB **ppdb, PDB **pp,
356  int *natom);
357 static void ProcessElementField(char *element, char *element_field);
358 static void ProcessChargeField(int *charge, char *charge_field);
359 static void StoreConectRecords(WHOLEPDB *wpdb, char *buffer);
360 #ifdef XML_SUPPORT
361 static BOOL SetPDBDateField(char *pdb_date, char *pdbml_date);
362 static void ParseHeaderRecordsPDBML(WHOLEPDB *wpdb, xmlDoc *document);
363 static STRINGLIST *ParseHeaderPDBML(xmlDoc *document);
364 static STRINGLIST *ParseTitlePDBML(xmlDoc *document);
365 static STRINGLIST *ParseCompndPDBML(xmlDoc *document, PDB *pdb);
366 static STRINGLIST *ParseSourcePDBML(xmlDoc *document);
367 static STRINGLIST *ParseResolPDBML(xmlDoc *document);
368 static STRINGLIST *ParseSeqresPDBML(xmlDoc *document);
369 static STRINGLIST *ParseModresPDBML(xmlDoc *document);
370 static int ParseConectPDBML(xmlDoc *document, PDB *pdb);
371 static STRINGLIST *TitleStringlist(char *titlestring);
372 static STRINGLIST *CompndStringlist(STRINGLIST *stringlist,
373  int *lines_stored, COMPND *compnd);
374 static STRINGLIST *SourceStringlist(STRINGLIST *stringlist,
375  int *lines_stored, int mol_id,
376  PDBSOURCE *source);
377 static char **GetEntityChainLabels(int entity, PDB *pdb, int *nChains);
378 static STRINGLIST *SeqresStringlist(int nchains, char **chains,
379  STRINGLIST **residues);
380 
381 #endif
382 
383 
384 
385 #if !defined(__APPLE__) && !defined(MS_WINDOWS)
386 FILE *popen(char *, char *);
387 #endif
388 #ifndef __APPLE__
389 int pclose(FILE *);
390 #endif
391 
392 /************************************************************************/
393 /*>PDB *blReadPDB(FILE *fp, int *natom)
394  ------------------------------------
395 *//**
396 
397  \param[in] *fp A pointer to type FILE in which the
398  .PDB file is stored.
399  \param[out] *natom Number of atoms read. -1 if error.
400  \return A pointer to the first allocated item of
401  the PDB linked list
402 
403  Reads a PDB file into a PDB linked list
404 
405 - 08.07.93 Written as entry for doReadPDB()
406 - 09.07.93 Modified to return pointer to PDB
407 - 17.03.94 Modified to handle OccRank
408 - 06.03.95 Added value for NMR model to read (1 = first)
409 - 25.01.06 Added call to RemoveAlternates() - this deals with odd
410  cases where alternate atom positions don't appear where
411  they should!
412 - 25.01.06 Added call to RemoveAlternates(). This deals with odd uses
413  of multiple occupancies like 3pga and the instance where
414  the alternates are all grouped at the end of the file.
415 - 07.07.14 Renamed to blReadPDB() By: CTP
416 - 03.04.15 Initialize pdb to NULL avoiding returning uninitialized value
417  if blDoReadPDB() fails. By: CTP
418 */
419 PDB *blReadPDB(FILE *fp,
420  int *natom)
421 {
422  PDB *pdb = NULL;
423  WHOLEPDB *wpdb;
424  *natom=(-1);
425 
426  if((wpdb = blDoReadPDB(fp, TRUE, 1, 1, FALSE))!=NULL)
427  {
428  blFreeStringList(wpdb->header);
429  blFreeStringList(wpdb->trailer);
430  *natom = wpdb->natoms;
431  pdb = wpdb->pdb;
432  free(wpdb);
433 
434  pdb = blRemoveAlternates(pdb);
435  }
436 
437  return(pdb);
438 }
439 
440 /************************************************************************/
441 /*>PDB *blReadPDBAll(FILE *fp, int *natom)
442  ---------------------------------------
443 *//**
444 
445  \param[in] *fp A pointer to type FILE in which the
446  .PDB file is stored.
447  \param[out] *natom Number of atoms read. -1 if error.
448  \return A pointer to the first allocated item of
449  the PDB linked list
450 
451  Reads a PDB file into a PDB linked list. Reads all partial occupancy
452  atoms. Reads both ATOM and HETATM records.
453 
454 - 04.10.94 Original By: ACRM
455 - 06.03.95 Added value for NMR model to read (0 = all)
456 - 07.07.14 Renamed to blReadPDBAll() By: CTP
457 - 03.04.15 Initialize pdb to NULL avoiding returning uninitialized value
458  if blDoReadPDB() fails. By: CTP
459 */
460 PDB *blReadPDBAll(FILE *fp,
461  int *natom)
462 {
463  PDB *pdb = NULL;
464  WHOLEPDB *wpdb;
465  *natom=(-1);
466 
467  if((wpdb = blDoReadPDB(fp, TRUE, 0, 0, FALSE))!=NULL)
468  {
469  blFreeStringList(wpdb->header);
470  blFreeStringList(wpdb->trailer);
471  *natom = wpdb->natoms;
472  pdb = wpdb->pdb;
473  free(wpdb);
474  }
475 
476  return(pdb);
477 }
478 
479 /************************************************************************/
480 /*>PDB *blReadPDBAtoms(FILE *fp, int *natom)
481  -----------------------------------------
482 *//**
483 
484  \param[in] *fp A pointer to type FILE in which the
485  .PDB file is stored.
486  \param[out] *natom Number of atoms read. -1 if error.
487  \return A pointer to the first allocated item of
488  the PDB linked list
489 
490  Reads a PDB file into a PDB linked list. Atoms only (no HETATM cards).
491 
492 - 08.07.93 Written as entry for doReadPDB()
493 - 09.07.93 Modified to return pointer to PDB
494 - 17.03.94 Modified to handle OccRank
495 - 06.03.95 Added value for NMR model to read (1 = first)
496 - 25.01.06 Added call to RemoveAlternates(). This deals with odd uses
497  of multiple occupancies like 3pga and the instance where
498  the alternates are all grouped at the end of the file.
499 - 07.07.14 Renamed to blReadPDBAtoms() By: CTP
500 - 03.04.15 Initialize pdb to NULL avoiding returning uninitialized value
501  if blDoReadPDB() fails. By: CTP
502 */
503 PDB *blReadPDBAtoms(FILE *fp,
504  int *natom)
505 {
506  PDB *pdb = NULL;
507  WHOLEPDB *wpdb;
508  *natom=(-1);
509 
510  if((wpdb = blDoReadPDB(fp, FALSE, 1, 1, FALSE))!=NULL)
511  {
512  blFreeStringList(wpdb->header);
513  blFreeStringList(wpdb->trailer);
514  *natom = wpdb->natoms;
515  pdb = wpdb->pdb;
516  free(wpdb);
517 
518  pdb = blRemoveAlternates(pdb);
519  }
520 
521  return(pdb);
522 }
523 
524 /************************************************************************/
525 /*>PDB *blReadPDBOccRank(FILE *fp, int *natom, int OccRank)
526  --------------------------------------------------------
527 *//**
528 
529  \param[in] *fp A pointer to type FILE in which the
530  .PDB file is stored.
531  \param[in] OccRank Occupancy ranking (>=1)
532  \param[out] *natom Number of atoms read. -1 if error.
533  \return A pointer to the first allocated item of
534  the PDB linked list
535 
536  Reads a PDB file into a PDB linked list selecting the OccRank'th
537  highest occupancy atoms
538 
539 - 17.03.94 Original By: ACRM
540 - 06.03.95 Added value for NMR model to read (1 = first)
541 - 07.07.14 Renamed to blReadPDBOccRank() By: CTP
542 - 03.04.15 Initialize pdb to NULL avoiding returning uninitialized value
543  if blDoReadPDB() fails. By: CTP
544 */
545 PDB *blReadPDBOccRank(FILE *fp, int *natom, int OccRank)
546 {
547  PDB *pdb = NULL;
548  WHOLEPDB *wpdb;
549  *natom=(-1);
550 
551  if((wpdb = blDoReadPDB(fp, TRUE, OccRank, 1, FALSE))!=NULL)
552  {
553  blFreeStringList(wpdb->header);
554  blFreeStringList(wpdb->trailer);
555  *natom = wpdb->natoms;
556  pdb = wpdb->pdb;
557  free(wpdb);
558 
559  pdb = blRemoveAlternates(pdb);
560  }
561 
562  return(pdb);
563 }
564 
565 /************************************************************************/
566 /*>PDB *blReadPDBAtomsOccRank(FILE *fp, int *natom, int OccRank)
567  -------------------------------------------------------------
568 *//**
569 
570  \param[in] *fp A pointer to type FILE in which the
571  .PDB file is stored.
572  \param[in] OccRank Occupancy ranking (>=1)
573  \param[out] *natom Number of atoms read. -1 if error.
574  \return A pointer to the first allocated item of
575  the PDB linked list
576 
577  Reads a PDB file into a PDB linked list ignoring HETATM records
578  and selecting the OccRank'th highest occupancy atoms
579 
580 - 17.03.94 Original By: ACRM
581 - 06.03.95 Added value for NMR model to read (1 = first)
582 - 07.07.14 Renamed to blReadPDBAtomsOccRank() By: CTP
583 - 03.04.15 Initialize pdb to NULL avoiding returning uninitialized value
584  if blDoReadPDB() fails. By: CTP
585 */
586 PDB *blReadPDBAtomsOccRank(FILE *fp, int *natom, int OccRank)
587 {
588  PDB *pdb = NULL;
589  WHOLEPDB *wpdb;
590  *natom=(-1);
591 
592  if((wpdb = blDoReadPDB(fp, FALSE, OccRank, 1, FALSE))!=NULL)
593  {
594  blFreeStringList(wpdb->header);
595  blFreeStringList(wpdb->trailer);
596  *natom = wpdb->natoms;
597  pdb = wpdb->pdb;
598  free(wpdb);
599 
600  pdb = blRemoveAlternates(pdb);
601  }
602 
603  return(pdb);
604 }
605 
606 /************************************************************************/
607 /*>WHOLEPDB *blDoReadPDB(FILE *fpin, BOOL AllAtoms, int OccRank,
608  int ModelNum, BOOL DoWhole)
609  -------------------------------------------------------------
610 *//**
611 
612  \param[in] *fpin A pointer to type FILE in which the
613  .PDB file is stored.
614  \param[in] AllAtoms TRUE: ATOM & HETATM records
615  FALSE: ATOM records only
616  \param[in] OccRank Occupancy ranking
617  \param[in] ModelNum NMR Model number (0 = all)
618  \param[in] DoWhole Read the whole PDB file rather than just
619  the ATOM/HETATM records.
620  \return A pointer to a malloc'd WHOLEPDB structure
621 
622  Reads a PDB file into a PDB linked list. The OccRank value indicates
623  occupancy ranking to read for partial occupancy atoms.
624  If any partial occupancy atoms are read the global flag
625  gPDBPartialOcc is set to TRUE.
626 
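 While reading, the routine tracks whether it is in the header, coordinate
 or trailer part of the file, which is what lets DoWhole keep the header and
 trailer records. A minimal, self-contained sketch of that classification,
 mirroring the record names tested in the function body, is given here. The
 names demo_next_location() and DEMO_* are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { DEMO_HEADER = 0, DEMO_COORDINATES = 1, DEMO_TRAILER = 2 };

// Return the section that 'record' belongs to, given the section of the
// previous record. Record names are compared over their full six columns.
int demo_next_location(int current, const char *record)
{
    assert(record != NULL);

    if (!strncmp(record, "ATOM  ", 6) ||
        !strncmp(record, "HETATM", 6) ||
        !strncmp(record, "MODEL ", 6))
        return DEMO_COORDINATES;

    if (!strncmp(record, "CONECT", 6) ||
        !strncmp(record, "MASTER", 6) ||
        !strncmp(record, "END   ", 6))
        return DEMO_TRAILER;

    return current;   // any other record stays in the current section
}
```

 Records such as TER or ENDMDL do not change the section, so they remain
 part of the coordinates once an ATOM, HETATM or MODEL record has been seen.
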
627 - 04.11.88 V1.0 Original
628 - 07.02.89 V1.1 Ignore records which aren't ATOM or HETATM
629 - 28.03.90 V1.2 Altered field widths to match PDB standard better
630  See notes above for deviations
631 - 28.06.90 V1.2a Buffer size increased to 85 chars.
632 - 15.02.91 V1.2b Changed comment header to match new standard.
633 - 07.01.92 V1.3 Ignores blank lines properly
634 - 11.05.92 V1.4 Check on EOF in while() loop, memset() buffer.
635  ANSIfied.
636 - 01.06.92 V1.5 Documented for autodoc
637 - 19.06.92 V1.6 Corrected use of stdlib
638 - 01.10.92 V1.7 Changed to use fgets()
639 - 10.06.93 V1.9 Returns 0 on failure rather than exiting
640  Replaced SIZE with sizeof(PDB) directly
641 - 17.06.93 V2.0 Rewritten to use fsscanf()
642 - 08.07.93 V2.1 Split from ReadPDB()
643 - 09.07.93 V2.2 Modified to return pointer to PDB. Rewrote allocation
644  scheme.
645 - 17.03.94 V2.3 Handles partial occupancies
646  Sets natom to -1 if there was an error to distinguish
647  from no atoms.
648  Handles atom names which start in column 13 rather
649  than column 14. This is allowed in the standard, but
650  very rare.
651  Sets flag for partials.
652 - 06.04.94 V2.4 Atom names starting in column 13 have their first
653  character moved to the end if it is a digit.
654 - 03.10.94 V2.5 Check residue number as well as atom name when running
655  through alternative atoms for partial occupancy
656  Moved increment of NPartial, so only done if there
657  is space in the array. If OccRank is 0, all atoms are
658  read regardless of occupancy.
659 - 06.03.95 V2.7 Added value for NMR model to read (0 = all)
660  No longer static. Sets gPDBMultiNMR if ENDMDL records
661  found.
662 - 13.01.97 V2.8 Added check on return from fsscanf. Blank lines used
663  to result in duplication of the previous line since
664  fsscanf() does not reset the variables on receiving
665  a blank line. Also fixed in fsscanf().
666 - 25.02.98 V2.9 Added code to read gzipped PDB files transparently
667  when GUNZIP_SUPPORT is defined
668 - 17.08.98 V2.10 Added cast to popen() for SunOS
669 - 08.10.99 V2.11 Initialise CurIns and CurRes
670 - 15.02.01 V2.12 Added atnam_raw
671 - 27.04.05 V2.14 Added another atnam_raw for multiple occupancies
672 - 03.06.05 V2.15 Added altpos
673 - 14.10.05 V2.16 Modified detection of partial occupancy. Handles
674  residues like 1zeh/B16 where a lower partial is
675  erroneously set to zero
676 - 05.06.07 V2.19 Added support for Unix compress'd files
677 - 21.12.11 V2.22 Modified for cases of single occupancy < 1.0
678 - 22.04.14 V2.24 Call doReadPDBML() for PDBML-formatted PDB file. By: CTP
679 - 02.06.14 V2.25 Updated doReadPDBML(). By: CTP
680 - 09.06.14 V2.26 Set gPDBXML flag. By: CTP
681 - 07.07.14 V2.27 Renamed to blDoReadPDB() By: CTP
682 - 15.08.14 V2.29 Use CLEAR_PDB() to set default values. By: CTP
683 - 16.08.14 V2.30 Replaced charge with formal_charge and partial_charge
684  for PDB structure. By: CTP
685 - 18.08.14 V2.31 Added XML_SUPPORT option allowing BiopLib to be compiled
686  without support for PDBML format. By: CTP
687 - 09.09.14 V2.35 Reading of gzipped files with gunzip not supported for
688  MS Windows. By: CTP
689 - 29.09.14 V2.36 Allow single character filetype check for gzipped files.
690  By: CTP
691 - 17.02.15 V2.37 Added segid support By: ACRM
692 - 20.02.15 V3.0 NOT COMPATIBLE WITH PREVIOUS VERSIONS. The functionality
693  of the old ReadWholePDB() is now integrated into this
694  function.
695 - 20.03.15 V3.3 Fixed behaviour with reading other than the first model
696  and now sets a global error flag if the requested model
697  was not found. Uses MODEL records rather than ENDMDL
698  records in counting models
699 - 02.04.15 V3.4 Rewind file after reading pdbxml header data. By: CTP
700 - 28.04.15 V3.5 Removed rewind. Call to blDoReadPDBML() returns WHOLEPDB
701  instead of PDB. By: CTP
702 - 21.07.15 Changed atomType to atomInfo By: ACRM
703 
704  TODO: wpdb is not yet freed on the paths that return NULL, and there
705  is no error code to tell the caller why the read failed.
706 */
707 WHOLEPDB *blDoReadPDB(FILE *fpin,
708  BOOL AllAtoms,
709  int OccRank,
710  int ModelNum,
711  BOOL DoWhole)
712 {
713  char record_type[8],
714  atnambuff[8],
715  *atnam,
716  atnam_raw[8],
717  resnam[8],
718  chain[4],
719  insert[4],
720  segid[8],
721  buffer[160],
722  CurAtom[8],
723  cmd[80],
724  CurIns = ' ',
725  altpos,
726  element_buff[4] = "",
727  charge_buff[4] = "",
728  element[4] = "";
729  int atnum,
730  resnum,
731  CurRes = 0,
732  NPartial,
733  ModelCount = 0,
734  charge = 0,
735  inLocation = LOCATION_HEADER;
736  FILE *fp = fpin;
737  double x,y,z,
738  occ,
739  bval;
740  PDB *p = NULL,
741  multi[MAXPARTIAL]; /* Temporary storage for partial occ */
742  WHOLEPDB *wpdb = NULL;
743  BOOL pdbml_format;
744 
745 
746 #if defined(GUNZIP_SUPPORT) && !defined(MS_WINDOWS)
747  int signature[3],
748  ch;
749  BOOL gzipped_file = FALSE;
750 # ifndef SINGLE_CHAR_FILECHECK
751  int i;
752 # endif
753 #endif
754 
755  if((wpdb=(WHOLEPDB *)malloc(sizeof(WHOLEPDB)))==NULL)
756  return(NULL);
757 
758  wpdb->pdb = NULL;
759  wpdb->header = NULL;
760  wpdb->trailer = NULL;
761 
762  wpdb->natoms = 0;
763  CurAtom[0] = '\0';
764  NPartial = 0;
765  gPDBPartialOcc = FALSE;
766  gPDBMultiNMR = 0;
767  cmd[0] = '\0';
768  gPDBXML = FALSE;
769  gPDBModelNotFound = TRUE; /* Assume we haven't found the model */
770 
771 #if defined(GUNZIP_SUPPORT) && !defined(MS_WINDOWS)
772  /* See whether this is a gzipped file */
773 # ifndef SINGLE_CHAR_FILECHECK
774  /* Default three character filetype check */
775  for(i=0; i<3; i++)
776  signature[i] = fgetc(fpin);
777  for(i=2; i>=0; i--)
778  ungetc(signature[i], fpin);
779  if(((signature[0] == (int)0x1F) && /* gzip */
780  (signature[1] == (int)0x8B) &&
781  (signature[2] == (int)0x08)) ||
782  ((signature[0] == (int)0x1F) && /* 05.06.07 compress */
783  (signature[1] == (int)0x9D) &&
784  (signature[2] == (int)0x90)))
785  {
786  gzipped_file = TRUE;
787  }
788 # else
789  /* Single character filetype check */
790  signature[0] = fgetc(fpin);
791  ungetc(signature[0], fpin);
792  if(signature[0] == (int)0x1F) gzipped_file = TRUE;
793 # endif
794 
795  if(gzipped_file)
796  {
797  /* It is gzipped so we'll open gunzip as a pipe and send the data
798  through that into a temporary file
799  */
800  sprintf(cmd,"gunzip >/tmp/readpdb_%d", (int)getpid());
801  if((fp = (FILE *)popen(cmd,"w"))==NULL)
802  {
803  wpdb->natoms = (-1);
804  return(NULL);
805  }
806  while((ch=fgetc(fpin))!=EOF)
807  fputc(ch, fp);
808  pclose(fp);
809 
810  /* We now reopen the temporary file as our PDB input file */
811  sprintf(cmd,"/tmp/readpdb_%d", (int)getpid());
812  if((fp = fopen(cmd,"r"))==NULL)
813  {
814  wpdb->natoms = (-1);
815  return(NULL);
816  }
817  }
818 #endif
819 
820 
821  /* Check file format */
822  pdbml_format = blCheckFileFormatPDBML(fp);
823 
824  /* If it's PDBML then call the appropriate parser */
825  if(pdbml_format)
826  {
827 #ifdef XML_SUPPORT
828  /* Parse PDBML-formatted PDB file */
829  blFreeWholePDB(wpdb); /* free wpdb */
830  wpdb = blDoReadPDBML(fp,AllAtoms,OccRank,ModelNum,DoWhole);
831  if(cmd[0]) unlink(cmd); /* delete tmp file */
832  return(wpdb); /* return PDB list */
833 #else
834  /* PDBML format not supported. */
835  if(cmd[0]) unlink(cmd); /* delete tmp file */
836  wpdb->natoms = (-1); /* Indicate error */
837  return(NULL); /* return NULL list */
838 #endif
839  }
840 
841  inLocation = LOCATION_HEADER;
842 
843  while(fgets(buffer,159,fp))
844  {
845  /*** Deal with counting model numbers ***/
846  if(ModelNum != 0) /* We are interested in model numbers */
847  {
848  if(!strncmp(buffer,"MODEL ",6))
849  {
850  ModelCount++;
851  gPDBMultiNMR++;
852  }
853 
854  /* See if we are in the right model */
855  if(inLocation == LOCATION_COORDINATES)
856  {
857  if((ModelCount != ModelNum) && (ModelCount != 0))
858  continue;
859  else
860  gPDBModelNotFound = FALSE;
861  }
862  }
863  else
864  {
865  gPDBModelNotFound = FALSE;
866  }
867 
868 
869  if(!strncmp(buffer, "ATOM  ", 6) ||
870  !strncmp(buffer, "HETATM", 6) ||
871  !strncmp(buffer, "MODEL ", 6))
872  {
873  inLocation = LOCATION_COORDINATES;
874  }
875  else if(!strncmp(buffer, "CONECT", 6) ||
876  !strncmp(buffer, "MASTER", 6) ||
877  !strncmp(buffer, "END   ", 6))
878  {
879  inLocation = LOCATION_TRAILER;
880  }
881 
882  /* If we are in the header, just store it */
883  if(inLocation == LOCATION_HEADER)
884  {
885  if(DoWhole)
886  {
887  if((wpdb->header = blStoreString(wpdb->header, buffer))==NULL)
888  return(NULL);
889  }
890  continue;
891  }
892  if(inLocation == LOCATION_TRAILER)
893  {
894  if(DoWhole)
895  {
896  wpdb->trailer = blStoreString(wpdb->trailer, buffer);
897  if(!strncmp(buffer, "CONECT", 6))
898  StoreConectRecords(wpdb, buffer);
899  }
900 
901  continue;
902  }
903 
904  /* Read a record */
905  if(fsscanf(buffer,
906  "%6s%5d%1x%5s%4s%1s%4d%1s%3x%8lf%8lf%8lf%6lf%6lf%6x%4s%2s%2s",
907  record_type,&atnum,atnambuff,resnam,chain,&resnum,insert,
908  &x,&y,&z,&occ,&bval,segid,element_buff,charge_buff)
909  != EOF)
910  {
911  if((!strncmp(record_type,"ATOM  ",6)) ||
912  (!strncmp(record_type,"HETATM",6) && AllAtoms))
913  {
914  /* Copy the raw atom name */
915  /* 03.06.05 Note: this reads the alternate atom position as
916  well as the atom name - changes in FixAtomName() now strip
917  that.
918  We now copy only the first 4 characters into atnam_raw and
919  put the 5th character into altpos
920  */
921  strncpy(atnam_raw, atnambuff, 4);
922  atnam_raw[4] = '\0';
923  altpos = atnambuff[4];
924 
925  /* Fix the atom name accounting for start in column 13 or 14*/
926  atnam = blFixAtomName(atnambuff, occ);
927 
928  /* Set element and charge */
929  ProcessElementField(element, element_buff);
930  ProcessChargeField(&charge, charge_buff);
931 
932  /* Set element from atom name if not in input file */
933  if(strlen(element) == 0)
934  {
935  blSetElementSymbolFromAtomName(element, atnam_raw);
936  }
937 
938  /* Check for full occupancy. If occupancy is 0.0 assume that
939  it is actually fully occupied; the column just hasn't been
940  filled in correctly
941 
942  04.10.94 Read all atoms if OccRank is 0
943 
944  14.10.05 Now takes an atom as full occupancy:
945  if occ==1.0
946  if occ==0.0 and altpos==' '
947  if OccRank==0
948  This fixes problems where a lower (partial)
949  occupancy has erroneously been set to zero
950  21.12.11 Now only worries about partial occupancy if altpos
951  is a space. The first line of the if() statement
952  here would assume single occupancy if altpos was
953  a space and occupancy was zero:
954  if(((altpos == ' ') && (occ < (double)SMALL)) ||
955  - it now assumes single occupancy if altpos is a
956  space regardless of the actual occupancy. This
957  deals with cases like 1ap2 ZN A112 and 1ces ZN
958  A238 where these HETATMs are single occupancy
959  but with occupancy < 1.0
960  */
961  if((altpos == ' ') ||
962  (occ > (double)0.999) ||
963  (OccRank == 0))
964  {
965  /* Trim the atom name to 4 characters */
966  atnam[4] = '\0';
967 
968  if(NPartial != 0)
969  {
970  if(!StoreOccRankAtom(OccRank,multi,NPartial,
971  &wpdb->pdb,&p,&(wpdb->natoms)))
972  {
973  if(wpdb->pdb != NULL) FREELIST(wpdb->pdb, PDB);
974  wpdb->natoms = (-1);
975  if(cmd[0]) unlink(cmd);
976  return(NULL);
977  }
978 
979  /* Set partial occupancy counter to 0 */
980  NPartial = 0;
981  }
982 
983  /* Allocate space in the linked list */
984  if(wpdb->pdb == NULL)
985  {
986  INIT(wpdb->pdb, PDB);
987  p = wpdb->pdb;
988  }
989  else
990  {
991  ALLOCNEXT(p, PDB);
992  }
993 
994  /* Failed to allocate space; free up list so far & return*/
995  if(p==NULL)
996  {
997  if(wpdb->pdb != NULL) FREELIST(wpdb->pdb, PDB);
998  wpdb->natoms = (-1);
999  if(cmd[0]) unlink(cmd);
1000  return(NULL);
1001  }
1002 
1003  /* Increment the number of atoms */
1004  (wpdb->natoms)++;
1005 
1006  /* Store the information read */
1007  CLEAR_PDB(p);
1008  p->atnum = atnum;
1009  p->resnum = resnum;
1010  p->x = (REAL)x;
1011  p->y = (REAL)y;
1012  p->z = (REAL)z;
1013  p->occ = (REAL)occ;
1014  p->bval = (REAL)bval;
1015  p->altpos = altpos; /* 03.06.05 Added this one */
1016  p->formal_charge = charge;
1017  p->partial_charge = (REAL)charge;
1018  p->access = 0.0;
1019  p->radius = 0.0;
1020  p->next = NULL;
1021  strcpy(p->record_type, record_type);
1022  strcpy(p->atnam, atnam);
1023  strcpy(p->atnam_raw, atnam_raw);
1024  strcpy(p->resnam, resnam);
1025  strcpy(p->chain, chain);
1026  strcpy(p->insert, insert);
1027  strcpy(p->element, element);
1028  strcpy(p->segid, segid);
1029  }
1030  else /* Partial occupancy */
1031  {
1032  /* Set flag to say we've got a partial occupancy atom */
1033  gPDBPartialOcc = TRUE;
1034 
1035  /* First in a group, store atom name */
1036  if(NPartial == 0)
1037  {
1038  CurIns = insert[0];
1039  CurRes = resnum;
1040  strncpy(CurAtom,atnam,8);
1041  }
1042 
1043  if(strncmp(CurAtom,atnam,strlen(CurAtom)-1) ||
1044  resnum != CurRes ||
1045  CurIns != insert[0])
1046  {
1047  /* Atom name has changed
1048  Select and store the OccRank highest occupancy atom
1049  */
1050  if(!StoreOccRankAtom(OccRank,multi,NPartial,
1051  &wpdb->pdb,&p,&wpdb->natoms))
1052  {
1053  if(wpdb->pdb != NULL) FREELIST(wpdb->pdb, PDB);
1054  wpdb->natoms = (-1);
1055  if(cmd[0]) unlink(cmd);
1056  return(NULL);
1057  }
1058 
1059  /* Reset the partial atom counter */
1060  NPartial = 0;
1061  strncpy(CurAtom,atnam,8);
1062  CurRes = resnum;
1063  CurIns = insert[0];
1064  }
1065 
1066  if(NPartial < MAXPARTIAL)
1067  {
1068  /* Store the partial atom data */
1069  CLEAR_PDB((multi + NPartial));
1070  multi[NPartial].atnum = atnum;
1071  multi[NPartial].resnum = resnum;
1072  multi[NPartial].x = (REAL)x;
1073  multi[NPartial].y = (REAL)y;
1074  multi[NPartial].z = (REAL)z;
1075  multi[NPartial].occ = (REAL)occ;
1076  multi[NPartial].bval = (REAL)bval;
1077  multi[NPartial].formal_charge = charge;
1078  multi[NPartial].partial_charge = (REAL)charge;
1079  multi[NPartial].access = 0.0;
1080  multi[NPartial].radius = 0.0;
1081  multi[NPartial].atomInfo = NULL;
1082  multi[NPartial].next = NULL;
1083  strcpy(multi[NPartial].record_type, record_type);
1084  strcpy(multi[NPartial].atnam, atnam);
1085  /* 27.04.05 - added this line */
1086  strcpy(multi[NPartial].atnam_raw, atnam_raw);
1087  strcpy(multi[NPartial].resnam, resnam);
1088  strcpy(multi[NPartial].chain, chain);
1089  strcpy(multi[NPartial].insert, insert);
1090  strcpy(multi[NPartial].element, element);
1091  /* 03.06.05 - added this line */
1092  multi[NPartial].altpos = altpos;
1093  /* 17.02.15 - added this line */
1094  strcpy(multi[NPartial].segid, segid);
1095 
1096  NPartial++;
1097  }
1098  }
1099  }
1100  charge_buff[0] = '\0';
1101  charge = 0;
1102  }
1103  }
1104 
1105  if(NPartial != 0)
1106  {
1107  if(!StoreOccRankAtom(OccRank,multi,NPartial,&wpdb->pdb,&p,
1108  &wpdb->natoms))
1109  {
1110  if(wpdb->pdb != NULL) FREELIST(wpdb->pdb, PDB);
1111  wpdb->natoms = (-1);
1112  if(cmd[0]) unlink(cmd);
1113  return(NULL);
1114  }
1115  }
1116 
1117  if(cmd[0]) unlink(cmd);
1118 
1119  /* Return pointer to start of linked list */
1120  return(wpdb);
1121 }
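
The reader above sorts every record into header, coordinates or trailer by looking only at the six-character record name. The fragment below is a minimal standalone sketch of that classification step, not part of Bioplib: the `Location` enum and `next_location()` are hypothetical names standing in for the `LOCATION_*` constants and the inline tests in the read loop.

```c
#include <string.h>

/* Hypothetical stand-ins for the LOCATION_* constants used by the reader */
typedef enum { LOC_HEADER, LOC_COORDINATES, LOC_TRAILER } Location;

/* Return the section that 'record' belongs to, given the section we were
   in before reading it. Only the first six characters are inspected.
*/
static Location next_location(Location current, const char *record)
{
   if(!strncmp(record, "ATOM  ", 6) ||
      !strncmp(record, "HETATM", 6) ||
      !strncmp(record, "MODEL ", 6))
   {
      return LOC_COORDINATES;
   }

   if(!strncmp(record, "CONECT", 6) ||
      !strncmp(record, "MASTER", 6) ||
      !strncmp(record, "END   ", 6))
   {
      return LOC_TRAILER;
   }

   /* Any other record type leaves the section unchanged */
   return current;
}
```

Called once per line with the previous result, this gives the same three-way split: everything before the first ATOM, HETATM or MODEL record is header, and CONECT, MASTER or END switches to the trailer.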
1122 
1123 /************************************************************************/
1124 /*>static BOOL StoreOccRankAtom(int OccRank, PDB multi[MAXPARTIAL],
1125  int NPartial, PDB **ppdb, PDB **pp,
1126  int *natom)
1127  ------------------------------------------------------------------
1128 *//**
1129 
1130  \param[in] OccRank Occupancy ranking required (>=1)
1131  \param[in] multi[] Array of PDB records for alternative atom
1132  positions
1133  \param[in] NPartial Number of items in multi array
1134  \param[in,out] **ppdb Start of PDB linked list (or NULL)
1135  \param[in,out] **pp Current position in PDB linked list (or NULL)
1136  \param[in,out] *natom Number of atoms read
1137  \return Memory allocation success
1138 
1139  Takes an array of PDB records which represent alternative atom
1140  positions for an atom. Selects the OccRank'th highest occupancy and
1141  adds this one to the PDB linked list.
1142 
1143  To be called by doReadPDB().
1144 
1145 - 17.03.94 Original By: ACRM
1146 - 08.10.99 Initialise IMaxOcc and MaxOcc
1147 - 27.04.05 Added atnam_raw
1148 - 03.06.05 Added altpos
1149 - 14.10.05 Modified the flag value from 0.0 to -1.0 so that erroneous
1150  lower occupancies of 0.0 are read properly and written back
1151  with their occupancy (0.0) rather than the next higher
1152  occupancy. Handles residues like 1zeh/B16
1153 - 04.08.14 Read charge and element. By: CTP
1154 - 16.08.14 Read formal charge and set partial charge. By: CTP
1155 - 17.02.15 Added segid support By: ACRM
1156 - 23.06.15 Clears the new PDB items
1157 - 21.07.15 Changed .atomType to .atomInfo
1158 */
1159 static BOOL StoreOccRankAtom(int OccRank, PDB multi[MAXPARTIAL],
1160  int NPartial, PDB **ppdb, PDB **pp,
1161  int *natom)
1162 {
1163  int i,
1164  j,
1165  IMaxOcc = 0;
1166  REAL MaxOcc = (REAL)0.0,
1167  LastOcc = (REAL)0.0;
1168 
1169  if(OccRank < 1) OccRank = 1;
1170 
1171  for(i=0; i<OccRank; i++)
1172  {
1173  MaxOcc = (REAL)0.0;
1174  IMaxOcc = 0;
1175 
1176  for(j=0; j<NPartial; j++)
1177  {
1178  if(multi[j].occ >= MaxOcc)
1179  {
1180  MaxOcc = multi[j].occ;
1181  IMaxOcc = j;
1182  }
1183  }
1184  /* 14.10.05 Changed flag value to -1 so that erroneous occupancies
1185  of zero are treated properly
1186  */
1187  multi[IMaxOcc].occ = (REAL)-1.0;
1188 
1189  /* 14.10.05 Changed flag value to -1 so that erroneous occupancies
1190  of zero are treated properly
1191  */
1192  if(MaxOcc < (REAL)0.0) break;
1193  LastOcc = MaxOcc;
1194  }
1195 
1196  /* If we ran out of rankings, take the last one to be found */
1197  /* 14.10.05 Changed flag value to -1 so that erroneous occupancies
1198  of zero are treated properly
1199  */
1200  if(MaxOcc < (REAL)0.0)
1201  MaxOcc = LastOcc;
1202 
1203  /* Store this atom
1204  Allocate space in the linked list
1205  */
1206  if(*ppdb == NULL)
1207  {
1208  INIT((*ppdb), PDB);
1209  *pp = *ppdb;
1210  }
1211  else
1212  {
1213  ALLOCNEXT(*pp, PDB);
1214  }
1215 
1216  /* Failed to allocate space; error return. */
1217  if(*pp==NULL)
1218  return(FALSE);
1219 
1220  /* Increment the number of atoms */
1221  (*natom)++;
1222 
1223  CLEAR_PDB((*pp)); /* 23.06.15 */
1224 
1225  /* Store the information read */
1226  (*pp)->atnum = multi[IMaxOcc].atnum;
1227  (*pp)->resnum = multi[IMaxOcc].resnum;
1228  (*pp)->x = multi[IMaxOcc].x;
1229  (*pp)->y = multi[IMaxOcc].y;
1230  (*pp)->z = multi[IMaxOcc].z;
1231  (*pp)->occ = MaxOcc;
1232  (*pp)->bval = multi[IMaxOcc].bval;
1233  (*pp)->formal_charge = multi[IMaxOcc].formal_charge;
1234  (*pp)->partial_charge = multi[IMaxOcc].partial_charge;
1235  (*pp)->access = multi[IMaxOcc].access;
1236  (*pp)->radius = multi[IMaxOcc].radius;
1237  (*pp)->atomInfo = NULL;
1238  (*pp)->next = NULL;
1239  /* 03.06.05 Added this line */
1240  (*pp)->altpos = multi[IMaxOcc].altpos;
1241  strcpy((*pp)->record_type, multi[IMaxOcc].record_type);
1242  strcpy((*pp)->atnam, multi[IMaxOcc].atnam);
1243  /* 27.04.05 Added this line */
1244  strcpy((*pp)->atnam_raw, multi[IMaxOcc].atnam_raw);
1245  strcpy((*pp)->resnam, multi[IMaxOcc].resnam);
1246  strcpy((*pp)->chain, multi[IMaxOcc].chain);
1247  strcpy((*pp)->insert, multi[IMaxOcc].insert);
1248  strcpy((*pp)->element, multi[IMaxOcc].element);
1249  /* 17.02.15 Added this line */
1250  strcpy((*pp)->segid, multi[IMaxOcc].segid);
1251 
1252  /* Patch the atom name to remove the alternate letter */
1253  if(strlen((*pp)->atnam) > 4)
1254  ((*pp)->atnam)[4] = '\0';
1255  else
1256  ((*pp)->atnam)[3] = ' ';
1257 
1258  return(TRUE);
1259 }
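
The ranking idea used above (repeatedly pick the highest remaining occupancy and flag it as used with -1.0) can be tried in isolation. The function below is a simplified sketch working on a bare array of occupancies rather than PDB records; `occ_rank_index()` is a hypothetical name, and unlike the routine above it remembers the last valid index explicitly when the requested rank exceeds the number of alternates.

```c
/* Return the index of the rank'th highest value in occ[0..n-1], where
   rank 1 is the highest. If rank is larger than n, the index of the
   lowest value is returned. Ties go to the later entry, as with the >=
   comparison in the routine above. Returns -1 if n is 0.

   NOTE: occ[] is modified; consumed entries are overwritten with -1.0
*/
static int occ_rank_index(double *occ, int n, int rank)
{
   int i, j, best, last = -1;

   if(rank < 1) rank = 1;

   for(i=0; i<rank; i++)
   {
      best = -1;
      for(j=0; j<n; j++)
      {
         /* Skip entries already consumed on an earlier pass */
         if(occ[j] < 0.0) continue;

         if((best < 0) || (occ[j] >= occ[best]))
            best = j;
      }

      /* Ran out of candidates; keep the last one we found */
      if(best < 0) break;

      last = best;
      occ[best] = -1.0;   /* Flag as used. -1 rather than 0 so that a
                             genuine occupancy of 0.0 is still ranked */
   }

   return last;
}
```

Because the flag value is negative, an alternate whose occupancy was recorded as 0.0 is still selectable, which is the point of the 14.10.05 change noted in the header comment.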
1260 
1261 /************************************************************************/
1262 /*>char *blFixAtomName(char *name, REAL occup)
1263  -------------------------------------------
1264 *//**
1265 
1266  \param[in] *name Atom name read from file
1267  \param[in] occup Occupancy to allow fixing of partial occupancy
1268  atom names
1269  \return Fixed atom name (pointer into name)
1270 
1271  Fixes an atom name by removing leading spaces, or moving a leading
1272  digit to the end of the string. Used by doReadPDB()
1273 
1274 - 06.04.94 Original By: ACRM
1275 - 01.03.01 No longer static
1276 - 03.06.05 The name passed in has always contained the column which is
1277  officially the alternate atom position indicator, but is
1278  used by some programs as part of the atom name. Thus the
1279  properly constructed variable coming into the routine should
1280  be something like '1HG1 ' or '1HG1A' for an alternate atom
1281  position. However some programs use ' HG11'. Therefore we
1282  now check for a character in the last position and replace
1283  it with a space if there is a space in the preceding
1284  position (e.g. ' CA A' -> ' CA ') or if there is a
1285  character in the first position (e.g. '1HG1A' -> '1HG1 ')
1286  or if the occupancy is not zero/one
1287  NOTE!!! To support this, the routine now has a second
1288  parameter: REAL occup
1289 - 07.07.14 Renamed to blFixAtomName() By: CTP
1290 */
1291 char *blFixAtomName(char *name, REAL occup)
1292 {
1293  char *newname;
1294  int len;
1295 
1296  /* Default behaviour, just return the input string */
1297  newname = name;
1298 
1299  if(name[0] == ' ') /* Name starts in column 14 */
1300  {
1301  /* remove leading spaces */
1302  KILLLEADSPACES(newname,name);
1303  /* 03.06.05 If the last-but-one position is a space, force the last
1304  position (the alternate atom indicator) to be a space
1305  */
1306  if(newname[2] == ' ')
1307  {
1308  newname[3] = ' ';
1309  }
1310  }
1311  else /* Name starts in column 13 */
1312  {
1313  /* 03.06.05 The last character is the alternate atom indicator,
1314  so force it to be a space
1315  */
1316  name[4] = ' ';
1317 
1318  /* If the first character is a digit, move it to the end */
1319  if(isdigit(name[0]))
1320  {
1321  if((len = blChindex(name,' ')) == (-1))
1322  {
1323  /* We didn't find a space in the name, so add the character
1324  onto the end of the string and re-terminate
1325  */
1326  len = strlen(name);
1327  newname = name+1;
1328  name[len] = name[0];
1329  name[len+1] = '\0';
1330  }
1331  else
1332  {
1333  /* We did find a space in the name, so put the first
1334  character there
1335  */
1336  newname = name+1;
1337  name[len] = name[0];
1338  }
1339  }
1340  }
1341  return(newname);
1342 }
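
The effect of the name fixing is easiest to see on concrete inputs. The sketch below re-implements the same rules with only the standard library, assuming the input is exactly the 5-character field (four name columns plus the alternate indicator) held in a writable buffer. `fix_atom_name()` is a hypothetical name; it takes no occupancy argument because the body above does not use `occup` either.

```c
#include <ctype.h>
#include <string.h>

/* Normalise a 5-character atom name field in place and return a pointer
   into the buffer. Inputs of any other length are returned unchanged.
*/
static char *fix_atom_name(char *field)
{
   char *name = field;

   if(strlen(field) != 5) return field;

   if(field[0] == ' ')                /* Name starts in column 14       */
   {
      /* Skip the leading spaces */
      while(*name == ' ') name++;

      /* If the last-but-one position is blank, blank the alternate
         indicator as well (e.g. " CA A" becomes "CA  ")
      */
      if((strlen(name) > 3) && (name[2] == ' '))
         name[3] = ' ';
   }
   else                               /* Name starts in column 13       */
   {
      /* The last character is the alternate indicator; blank it */
      field[4] = ' ';

      if(isdigit((unsigned char)field[0]))
      {
         /* Move the leading digit into the first space. A space always
            exists because field[4] was just set to one
         */
         char *gap = strchr(field, ' ');
         *gap = field[0];
         name = field + 1;
      }
   }

   return name;
}
```

With these rules "1HG1A" and " HG11" both come out as "HG11", so the two conventions for writing hydrogen names mentioned in the 03.06.05 note end up identical.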
1343 
1344 /************************************************************************/
1345 /*>PDB *blRemoveAlternates(PDB *pdb)
1346  ---------------------------------
1347 *//**
1348 
1349  \param[in,out] *pdb PDB
1350  \return Amended linked list (in case start has
1351  changed)
1352 
1353  Remove alternate atoms - we keep only the highest occupancy or the
1354  first if more than one has the same occupancy.
1355 
1356 - 25.01.05 Original based on code written for Inpharmatica By: ACRM
1357 - 04.02.14 Use CHAINMATCH macro. By: CTP
1358 - 07.07.14 Renamed to blRemoveAlternates() Use blWritePDBRecord()
1359  Use bl prefix for functions By: CTP
1360 
1361 */
1362 PDB *blRemoveAlternates(PDB *pdb)
1363 {
1364  PDB *p,
1365  *q,
1366  *r,
1367  *s,
1368  *s_prev,
1369  *r_prev,
1370  *a_prev,
1371  *next,
1372  *alts[MAXPARTIAL];
1373  int i,
1374  altCount,
1375  highest;
1376 
1377 
1378  /* Step through residues */
1379  r_prev=NULL;
1380  for(p=pdb; p!=NULL; p=q)
1381  {
1382  q=blFindNextResidue(p);
1383 
1384  /* Step through atoms */
1385  for(r=p; r!=q; NEXT(r))
1386  {
1387  if(r->altpos != ' ')
1388  {
1389 #ifdef DEBUG
1390  fprintf(stderr,"\n\nAlt pos found for record:\n");
1391  blWritePDBRecord(stderr, r);
1392 #endif
1393  /* We have an alternate, store it and search for the other
1394  ones
1395  */
1396  altCount=0;
1397  alts[altCount++] = r;
1398  /* Search through this residue for the alternates.
1399  This will work for 99.9% of files where the alternates are
1400  with the main atoms
1401  */
1402  for(s=r->next; s!=q; NEXT(s))
1403  {
1404  if(!strcmp(s->atnam_raw, alts[0]->atnam_raw))
1405  {
1406  if(altCount < MAXPARTIAL)
1407  {
1408  alts[altCount++] = s;
1409 #ifdef DEBUG
1410  fprintf(stderr,"Partner atom found in res:\n");
1411  blWritePDBRecord(stderr, s);
1412 #endif
1413  }
1414  else
1415  {
1416  fprintf(stderr,"Warning==> More than %d alternative \
1417 conformations in\n", MAXPARTIAL);
1418  fprintf(stderr," residue %c%d%c atom %s. \
1419 Increase MAXPARTIAL in ReadPDB.c\n", s->chain[0],
1420  s->resnum,
1421  s->insert[0],
1422  s->atnam);
1423  }
1424  }
1425  }
1426  /* If we didn't find the alternates within the residue, then
1427  we search the rest of the records.
1428  This covers the known entry where the alternates are shoved
1429  on the end instead!
1430  */
1431  if(altCount<2)
1432  {
1433 #ifdef DEBUG
1434  fprintf(stderr,"No partner found in residue\n");
1435 #endif
1436 
1437  s_prev = NULL;
1438  for(s=q; s!=NULL; NEXT(s))
1439  {
1440  if((s->resnum == alts[0]->resnum) &&
1441  (s->insert[0] == alts[0]->insert[0]) &&
1442  CHAINMATCH(s->chain,alts[0]->chain) &&
1443  !strcmp(s->atnam_raw, alts[0]->atnam_raw))
1444  {
1445  if(altCount < MAXPARTIAL)
1446  {
1447  alts[altCount++] = s;
1448 #ifdef DEBUG
1449  fprintf(stderr,"Partner found outside \
1450 residue:\n");
1451  blWritePDBRecord(stderr, s);
1452 #endif
1453  }
1454  else
1455  {
1456  fprintf(stderr,"Warning==> More than %d \
1457 alternative conformations in\n", MAXPARTIAL);
1458  fprintf(stderr," residue %c%d%c atom \
1459 %s. Increase MAXPARTIAL in ReadPDB.c\n", s->chain[0],
1460  s->resnum,
1461  s->insert[0],
1462  s->atnam);
1463 
1464  /* Move this record to the correct position in the
1465  linked list
1466 
1467  First unlink s from its old position
1468  */
1469  if(s_prev != NULL)
1470  s_prev->next = s->next;
1471 
1472  /* Now link it back in where it should be */
1473  next = r->next;
1474  r->next = s;
1475  s->next = next;
1476  }
1477  }
1478  s_prev = s;
1479  }
1480  }
1481 
1482  if(altCount < 2)
1483  {
1484 #ifdef DEBUG
1485  fprintf(stderr,"No alternates found. Resetting ALT \
1486 flag\n\n");
1487 #endif
1488  alts[0]->altpos = ' ';
1489 
1490  }
1491  else
1492  {
1493  /* Find the highest occupancy, defaulting to the first */
1494  highest = 0;
1495  for(i=0; i<altCount; i++)
1496  {
1497  if(alts[i]->occ > alts[highest]->occ)
1498  highest = i;
1499  }
1500 
1501  /* Delete the unwanted alternates */
1502  for(i=0; i<altCount; i++)
1503  {
1504  if(i==highest) /* For the highest remove the ALT flag */
1505  {
1506 #ifdef DEBUG
1507  fprintf(stderr,"Highest occupancy selected:\n");
1508  blWritePDBRecord(stderr, alts[i]);
1509 #endif
1510  alts[i]->altpos = ' ';
1511  }
1512  else
1513  {
1514  /* If we are deleting the current record pointer,
1515  then we need to update it
1516  */
1517  if(alts[i] == r)
1518  {
1519 #ifdef DEBUG
1520  fprintf(stderr,"Deleting current record \
1521 pointer\n");
1522 #endif
1523 
1524  if(r_prev == NULL)
1525  {
1526  r_prev = r;
1527  NEXT(r);
1528  /* We are deleting the head of the list so we
1529  must update the main list pointer
1530  */
1531  pdb = r;
1532  }
1533  else
1534  {
1535  r = r_prev;
1536  FINDPREV(r_prev, pdb, r);
1537  }
1538  }
1539 
1540  /* Delete the alternate we don't need */
1541 #ifdef DEBUG
1542  fprintf(stderr,"Deleting Alt pos record:\n");
1543  blWritePDBRecord(stderr, alts[i]);
1544 #endif
1545 
1546  FINDPREV(a_prev, pdb, alts[i]);
1547  if(a_prev != NULL)
1548  a_prev->next = alts[i]->next;
1549  free(alts[i]);
1550 
1551  } /* Not the highest, so we delete it */
1552  } /* Stepping through the alternates */
1553  }
1554 
1555  } /* We have an alternate */
1556  r_prev = r;
1557  } /* Stepping through the atoms of this residue */
1558  } /* Stepping through the residues */
1559  return(pdb);
1560 }
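
The core of the routine above is two small steps: choose the alternate with the highest occupancy (the first wins on a tie), then unlink and free the others, updating the list head if the head itself is deleted. The sketch below shows just those steps on a cut-down record. `Atom`, `highest_occupancy()`, `delete_atom()` and `keep_best_alternate()` are hypothetical names for illustration and are not Bioplib functions.

```c
#include <stdlib.h>

/* Hypothetical cut-down record; the real PDB structure has more fields */
typedef struct atom
{
   double      occ;
   char        altpos;
   struct atom *next;
} Atom;

/* Index of the highest occupancy in alts[]; the first wins on a tie */
static int highest_occupancy(Atom **alts, int count)
{
   int i, highest = 0;

   for(i=1; i<count; i++)
   {
      if(alts[i]->occ > alts[highest]->occ)
         highest = i;
   }
   return highest;
}

/* Unlink and free 'victim'; returns the (possibly changed) list head */
static Atom *delete_atom(Atom *head, Atom *victim)
{
   Atom *prev;

   if(victim == NULL) return head;

   if(head == victim)
   {
      head = victim->next;
   }
   else
   {
      for(prev=head; prev!=NULL && prev->next!=victim; prev=prev->next)
         ;
      if(prev == NULL) return head;   /* Not in this list; do nothing   */
      prev->next = victim->next;
   }

   free(victim);
   return head;
}

/* Keep only the best of alts[0..count-1] and clear its alternate flag */
static Atom *keep_best_alternate(Atom *head, Atom **alts, int count)
{
   int i, best;

   if(count < 1) return head;

   best = highest_occupancy(alts, count);
   for(i=0; i<count; i++)
   {
      if(i == best)
         alts[i]->altpos = ' ';
      else
         head = delete_atom(head, alts[i]);
   }
   return head;
}
```

Returning the head from every deletion is one way to cover the case the routine above handles by updating `pdb` when the first record in the list is removed.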
1561 
1562 
1563 
1564 
1565 
1566 /************************************************************************/
1567 /*>WHOLEPDB *blDoReadPDBML(FILE *fpin, BOOL AllAtoms, int OccRank,
1568  int ModelNum, BOOL DoWhole)
1569  ----------------------------------------------------------------
1570 *//**
1571 
1572  \param[in] *fpin A pointer to type FILE in which the
1573  .PDB file is stored.
1574  \param[in] AllAtoms TRUE: ATOM & HETATM records
1575  FALSE: ATOM records only
1576  \param[in] OccRank Occupancy ranking
1577  \param[in] ModelNum NMR Model number (0 = all)
1578  \param[in] DoWhole Read the whole PDB file rather than just
1579  the ATOM/HETATM records.
1580  \return A pointer to a malloc'd WHOLEPDB structure.
1581 
1582  Reads a PDBML-formatted PDB file into a PDB linked list.
1583 
1584  The OccRank value indicates occupancy ranking to read for partial
1585  occupancy atoms. If any partial occupancy atoms are read the global
1586  flag gPDBPartialOcc is set to TRUE.
1587 
1588  The global multiple-models flag is set to true if more than one model
1589  is found.
1590 
1591  Returns NULL if wpdb cannot be allocated; on any other error returns
1592  wpdb with wpdb->pdb set to NULL and wpdb->natoms set to -1.
1593 
1594 - 22.04.14 Original By: CTP
1595 - 02.06.14 Updated setting atnam_raw and parsing data from PDB atom site
1596  labels (label_seq_id, etc.) if author-defined labels are
1597  omitted. By: CTP
1598 - 09.06.14 Set gPDBXML flag. By: CTP
1599 - 07.07.14 Renamed to blDoReadPDBML() By: CTP
1600 - 04.08.14 Read element and formal charge. By: CTP
1601 - 15.08.14 Use CLEAR_PDB() to set default values. By: CTP
1602 - 16.08.14 Read formal and partial charges. Use blCopyPDB() to copy data
1603  for partial occupancy atoms. By: CTP
1604 - 18.08.14 Added XML_SUPPORT option. Return error if XML_SUPPORT not
1605  defined By: CTP
1606 - 26.08.14 Pad record_type to six characters. By: CTP
1607 - 17.02.15 Added segid support By: ACRM
1608 - 25.02.15 Added some checks on potential NULL pointers
1609  Changed all strcpy()s to strncpy()s
1610  Initialized numeric content variable before each sscanf()
1611  in case it fails
1612  Calls blRenumAtomsPDB() at the end since we don't use the
1613  atom site IDs for atom numbers
1614 - 13.03.15 Cosmetic changes
1615 - 28.04.15 Set identical input parameters to blDoReadPDB().
1616  Added CONECT and header parsing.
1617  Return WHOLEPDB instead of PDB. By: CTP
1618 - 14.06.15 Read entity_id By: CTP
1619 - 25.06.15 Detect if residue number has been set from auth_seq_id using
1620  flag - fixes bug where auth_seq_id = 0. By: CTP
1621 - 01.07.15 Replaced ParseHeaderPDBML() with ParseHeaderRecordsPDBML()
1622  By: CTP
1623 */
1624 WHOLEPDB *blDoReadPDBML(FILE *fpin,
1625  BOOL AllAtoms,
1626  int OccRank,
1627  int ModelNum,
1628  BOOL DoWhole)
1629 {
1630 #ifndef XML_SUPPORT
1631 
1632  /* PDBML format not supported. */
1633  return( NULL );
1634 
1635 #else
1636 
1637  /* Parse PDBML-formatted file. */
1638  xmlParserCtxtPtr ctxt;
1639  xmlDoc *document;
1640  xmlNode *root_node = NULL,
1641  *sites_node = NULL,
1642  *atom_node = NULL,
1643  *n = NULL;
1644  int size_t;
1645  char xml_buffer[XML_BUFFER];
1646  xmlChar *content;
1647  double content_lf;
1648 
1649  WHOLEPDB *wpdb = NULL;
1650 
1651  PDB *curr_pdb = NULL,
1652  *end_pdb = NULL,
1653  multi[MAXPARTIAL];
1654 
1655  int NPartial = 0,
1656  model_number = 0,
1657  natom = 0;
1658  char store_atnam[8] = "",
1659  pad_resnam[8] = "";
1660 
1661  BOOL auth_seq_id_set = FALSE;
1662 
1663  /* Allocate wpdb */
1664  if((wpdb=(WHOLEPDB *)malloc(sizeof(WHOLEPDB)))==NULL)
1665  return(NULL);
1666 
1667  /* Initialise wpdb */
1668  wpdb->pdb = NULL;
1669  wpdb->header = NULL;
1670  wpdb->trailer = NULL;
1671  wpdb->natoms = 0;
1672 
1673  /* Reset flags */
1674  gPDBXML = TRUE; /* global PDBML-format flag */
1675  gPDBPartialOcc = FALSE; /* global partial occupancy flag */
1676  gPDBMultiNMR = FALSE; /* global multiple models flag */
1677 
1678 
1679  /* Generate Document From Filehandle */
1680  size_t = fread(xml_buffer, 1, XML_BUFFER, fpin);
1681  ctxt = xmlCreatePushParserCtxt(NULL, NULL, xml_buffer, size_t, "file");
1682  while ((size_t = fread(xml_buffer, 1, XML_BUFFER, fpin)) > 0)
1683  {
1684  xmlParseChunk(ctxt, xml_buffer, size_t, 0);
1685  }
1686  xmlParseChunk(ctxt, xml_buffer, 0, 1);
1687  document = ctxt->myDoc;
1688  xmlFreeParserCtxt(ctxt);
1689 
1690  if(document == NULL)
1691  {
1692  /* Error: Failed to parse file */
1693  xmlFreeDoc(document); /* free document */
1694  xmlCleanupParser(); /* clean up xml parser */
1695  wpdb->natoms = -1; /* indicate error */
1696  return(wpdb); /* return wpdb */
1697  }
1698 
1699  /* Parse Document Tree */
1700  root_node = xmlDocGetRootElement(document);
1701  if(root_node == NULL)
1702  {
1703  /* Error: Failed to set root node */
1704  xmlFreeDoc(document); /* free document */
1705  xmlCleanupParser(); /* clean up xml parser */
1706  wpdb->natoms = -1; /* indicate error */
1707  return(wpdb); /* return wpdb */
1708  }
1709 
1710  /* Parse Atom Coordinate Nodes */
1711  for(n=root_node->children; n!=NULL; NEXT(n))
1712  {
1713  /* Find Atom Sites Node */
1714  if(!strcmp("atom_siteCategory", (char *)n->name))
1715  {
1716  /* Found Atom Sites */
1717  sites_node = n;
1718  break;
1719  }
1720  }
1721 
1722  if(sites_node == NULL)
1723  {
1724  /* Error: Failed to find atom sites */
1725  xmlFreeDoc(document); /* free document */
1726  xmlCleanupParser(); /* clean up xml parser */
1727  wpdb->natoms = -1; /* indicate error */
1728  return(wpdb); /* return wpdb */
1729  }
1730 
1731 
1732  /* Scan through atom nodes and populate PDB list. */
1733  for(atom_node = sites_node->children; atom_node;
1734  atom_node = atom_node->next)
1735  {
1736  if(!strcmp("atom_site", (char *)atom_node->name))
1737  {
1738  /* Current PDB */
1739  INIT(curr_pdb,PDB);
1740 
1741  if(curr_pdb == NULL)
1742  {
1743  /* Error: Failed to store atom in pdb list */
1744  FREELIST(wpdb->pdb,PDB); /* free pdb list */
1745  xmlFreeDoc(document); /* free document */
1746  xmlCleanupParser(); /* clean up xml parser */
1747  wpdb->natoms = -1; /* indicate error */
1748  return(wpdb); /* return wpdb */
1749  }
1750 
1751  /* Set default values */
1752  CLEAR_PDB(curr_pdb);
1753  strcpy(curr_pdb->chain, "");
1754  strcpy(curr_pdb->atnam, "");
1755  strcpy(curr_pdb->resnam, "");
1756  strcpy(curr_pdb->insert, " ");
1757  strcpy(curr_pdb->element, "");
1758  strcpy(curr_pdb->segid, "");
1759  auth_seq_id_set = FALSE; /* author residue number set */
1760 
1761  /* Scan atom node children */
1762  for(n=atom_node->children; n!=NULL; NEXT(n))
1763  {
1764  if(n->type != XML_ELEMENT_NODE){ continue; }
1765  content = xmlNodeGetContent(n);
1766  if(content == NULL)
1767  {
1768  /* Error: Failed to set node content */
1769  FREELIST(wpdb->pdb,PDB); /* free pdb list */
1770  xmlFreeDoc(document); /* free document */
1771  xmlCleanupParser(); /* clean up xml parser */
1772  wpdb->natoms = -1; /* indicate error */
1773  return(wpdb); /* return wpdb */
1774  }
1775 
1776  /* Set PDB values */
1777  if(!strcmp((char *)n->name, "B_iso_or_equiv"))
1778  {
1779  sscanf((char *)content, "%lf", &content_lf);
1780  curr_pdb->bval = (REAL)content_lf;
1781  }
1782  else if(!strcmp((char *)n->name, "Cartn_x"))
1783  {
1784  sscanf((char *)content, "%lf", &content_lf);
1785  curr_pdb->x = (REAL)content_lf;
1786  }
1787  else if(!strcmp((char *)n->name, "Cartn_y"))
1788  {
1789  sscanf((char *)content,"%lf",&content_lf);
1790  curr_pdb->y = (REAL)content_lf;
1791  }
1792  else if(!strcmp((char *)n->name, "Cartn_z"))
1793  {
1794  sscanf((char *)content, "%lf", &content_lf);
1795  curr_pdb->z = (REAL)content_lf;
1796  }
1797  else if(!strcmp((char *)n->name, "auth_asym_id"))
1798  {
1799  strcpy(curr_pdb->chain, (char *)content);
1800  }
1801  else if(!strcmp((char *)n->name, "auth_atom_id"))
1802  {
1803  strcpy(curr_pdb->atnam, (char *)content);
1804  }
1805  else if(!strcmp((char *)n->name, "auth_comp_id"))
1806  {
1807  strcpy(curr_pdb->resnam, (char *)content);
1808  }
1809  else if(!strcmp((char *)n->name, "auth_seq_id"))
1810  {
1811  sscanf((char *)content, "%lf", &content_lf);
1812  curr_pdb->resnum = (REAL)content_lf;
1813  auth_seq_id_set = TRUE;
1814  }
1815  else if(!strcmp((char *)n->name, "pdbx_PDB_ins_code"))
1816  {
1817  /* set insertion code
1818  25.02.15 Changed to strncpy() By: ACRM
1819  */
1820  strncpy(curr_pdb->insert, (char *)content, 8);
1821  }
1822  else if(!strcmp((char *)n->name, "group_PDB"))
1823  {
1824  /* 25.02.15 Changed to strncpy() By: ACRM */
1825  strncpy(curr_pdb->record_type, (char *)content, 8);
1826  PADMINTERM(curr_pdb->record_type, 6);
1827  }
1828  else if(!strcmp((char *)n->name, "occupancy"))
1829  {
1830  content_lf = (REAL)0.0; /* 25.02.15 */
1831  sscanf((char *)content, "%lf", &content_lf);
1832  curr_pdb->occ = (REAL)content_lf;
1833  }
1834  else if(!strcmp((char *)n->name, "label_alt_id"))
1835  {
1836  /* Use strlen as test for alt position */
1837  curr_pdb->altpos = strlen((char *)content) ? content[0]:' ';
1838  }
1839  else if(!strcmp((char *)n->name, "pdbx_PDB_model_num"))
1840  {
1841  content_lf = (REAL)0.0; /* 25.02.15 */
1842  sscanf((char *)content, "%lf", &content_lf);
1843  model_number = (int)content_lf;
1844  }
1845  else if(!strcmp((char *)n->name, "type_symbol"))
1846  {
1847  /* 25.02.15 Changed to strncpy() By: ACRM */
1848  strncpy(curr_pdb->element, (char *)content, 8);
1849  }
1850  else if(!strcmp((char *)n->name, "label_asym_id"))
1851  {
1852  if(strlen(curr_pdb->chain) == 0)
1853  {
1854  /* 25.02.15 Changed to strncpy() By: ACRM */
1855  strncpy(curr_pdb->chain, (char *)content, 8);
1856  }
1857  }
1858  else if(!strcmp((char *)n->name, "label_atom_id"))
1859  {
1860  if(strlen(curr_pdb->atnam) == 0)
1861  {
1862  /* 25.02.15 Changed to strncpy() By: ACRM */
1863  strncpy(curr_pdb->atnam, (char *)content, 8);
1864  }
1865  }
1866  else if(!strcmp((char *)n->name, "label_comp_id"))
1867  {
1868  if(strlen(curr_pdb->resnam) == 0)
1869  {
1870  /* 25.02.15 Changed to strncpy() By: ACRM */
1871  strncpy(curr_pdb->resnam, (char *)content, 8);
1872  }
1873  }
1874  else if(!strcmp((char *)n->name, "label_entity_id"))
1875  {
1876  if((curr_pdb->entity_id == 0) &&
1877  (strlen((char *)content) > 0))
1878  {
1879  content_lf = (REAL)0.0;
1880  sscanf((char *)content, "%lf", &content_lf);
1881  curr_pdb->entity_id = (REAL)content_lf;
1882  }
1883  }
1884  else if(!strcmp((char *)n->name, "label_seq_id"))
1885  {
1886  if((auth_seq_id_set == FALSE) &&
1887  (strlen((char *)content) > 0))
1888  {
1889  content_lf = (REAL)0.0; /* 25.02.15 */
1890  sscanf((char *)content, "%lf", &content_lf);
1891  curr_pdb->resnum = (REAL)content_lf;
1892  }
1893  }
1894  else if(!strcmp((char *)n->name, "pdbx_formal_charge"))
1895  {
1896  content_lf = (REAL)0.0; /* 25.02.15 */
1897  sscanf((char *)content, "%lf", &content_lf);
1898  curr_pdb->formal_charge = (int)content_lf;
1899  curr_pdb->partial_charge = (REAL)content_lf;
1900  }
1901  else if(!strcmp((char *)n->name, "seg_id")) /* 17.02.15 */
1902  {
1903  if(strlen(curr_pdb->segid) == 0)
1904  {
1905  /* 25.02.15 Changed to strncpy() By: ACRM */
1906  strncpy(curr_pdb->segid, (char *)content, 8);
1907  }
1908  }
1909 
1910  xmlFree(content);
1911  }
1912 
1913  /* Set raw atom name
1914  Note: The text pdb format uses columns 13-16 to store the atom
1915  name. By convention, columns 13-14 contain the
1916  right-justified element symbol for the atom.
1917 
1918  The raw atom name is equivalent to columns 13-16 of a
1919  pdb-formatted text file.
1920  */
1921 
1922  if(strlen(curr_pdb->atnam) == 1)
1923  {
1924  /* copy 1-letter name atnam_raw */
1925  strcpy((curr_pdb->atnam_raw), " ");
1926  /* 25.02.15 Changed to strncpy() By: ACRM */
1927  strncpy((curr_pdb->atnam_raw)+1, curr_pdb->atnam, 7);
1928  }
1929  else if(strlen(curr_pdb->atnam) == 4)
1930  {
1931  /* copy 4-letter name atnam_raw */
1932  /* 25.02.15 Changed to strncpy() By: ACRM */
1933  strncpy(curr_pdb->atnam_raw, curr_pdb->atnam, 8);
1934  }
1935  else if(strlen(curr_pdb->element) == 1)
1936  {
1937  strcpy((curr_pdb->atnam_raw), " ");
1938  /* 25.02.15 Changed to strncpy() By: ACRM */
1939  strncpy((curr_pdb->atnam_raw)+1, curr_pdb->atnam, 7);
1940  }
1941  else
1942  {
1943  /* 25.02.15 Changed to strncpy() By: ACRM */
1944  strncpy(curr_pdb->atnam_raw, curr_pdb->atnam, 4);
1945  }
1946 
1947  /* Pad atom names to 4 characters */
1948  PADMINTERM(curr_pdb->atnam, 4);
1949  PADMINTERM(curr_pdb->atnam_raw, 4);
1950 
1951  /* Pad Residue Name
1952  Note: The text pdb format uses columns 18-20 to store the
1953  residue name (right-justified).
1954 
1955  curr_pdb->resnam is equivalent to columns 18-21 of a
1956  pdb-formatted text file.
1957  */
1958  sprintf(pad_resnam, "%3s", curr_pdb->resnam);
1959  PADMINTERM(pad_resnam, 4);
1960  /* 25.02.15 Changed to strncpy() By: ACRM */
1961  strncpy(curr_pdb->resnam, pad_resnam, 8);
1962 
1963  /* Set chain to " " if not already set */
1964  if(strlen(curr_pdb->chain) == 0)
1965  {
1966  strcpy(curr_pdb->chain, " ");
1967  }
1968 
1969  /* Pad the segment id */
1970  PADMINTERM(curr_pdb->segid, 4);
1971 
1972  /* Set multi-model flag */
1973  if(model_number > 1)
1974  {
1975  gPDBMultiNMR = TRUE;
1976  }
1977 
1978  /* Filter: Model Number */
1979  if(model_number != ModelNum)
1980  {
1981  /* Free curr_pdb */
1982  FREELIST(curr_pdb,PDB);
1983  curr_pdb = NULL;
1984 
1985  if(model_number > ModelNum)
1986  {
1987  break; /* skip rest of tree */
1988  }
1989  else
1990  {
1991  continue; /* filter */
1992  }
1993  }
1994 
1995 
1996  /* Filter: All Atoms */
1997  if(!AllAtoms && strncmp(curr_pdb->record_type, "ATOM  ", 6))
1998  {
1999  /* Free curr_pdb and skip atom */
2000  FREELIST(curr_pdb,PDB);
2001  curr_pdb = NULL;
2002  continue; /* filter */
2003  }
2004 
2005 
2006  /* Add partial occ atom from temp storage to output PDB list */
2007  if((NPartial != 0) && strcmp(curr_pdb->atnam,store_atnam))
2008  {
2009  /* Store atom */
2010  if(StoreOccRankAtom(OccRank,multi,NPartial,&wpdb->pdb,
2011  &end_pdb,&wpdb->natoms))
2012  {
2013  LAST(end_pdb);
2014  NPartial = 0;
2015  }
2016  else
2017  {
2018  /* Error: Failed to store partial occ atom */
2019  FREELIST(curr_pdb,PDB); /* free curr_pdb */
2020  FREELIST(wpdb->pdb,PDB); /* free pdb list */
2021  xmlFreeDoc(document); /* free document */
2022  xmlCleanupParser(); /* clean up xml parser */
2023  wpdb->natoms = -1; /* indicate error */
2024  return(wpdb); /* return wpdb */
2025  }
2026  }
2027 
2028 
2029  /* Set atom number
2030  Note: Cannot use atom site id for atom number so base atnum on
2031  number of atoms stored
2032 
2033  25.02.15 We will renumber afterwards
2034  */
2035  /*curr_pdb->atnum = *natom + 1;*/
2036  curr_pdb->atnum = natom + 1;
2037 
2038 
2039  /* Add partial occupancy atom to temp storage */
2040  if((curr_pdb->altpos != ' ') && (NPartial < MAXPARTIAL))
2041  {
2042  /* Copy the partial atom data to storage */
2043  blCopyPDB(&multi[NPartial], curr_pdb);
2044 
2045  /* Set global partial occupancy flag */
2046  gPDBPartialOcc = TRUE;
2047 
2048  /* Store current atom name */
2049  /* 25.02.15 Changed to strncpy() By: ACRM */
2050  strncpy(store_atnam, curr_pdb->atnam, 8);
2051  NPartial++;
2052 
2053  /* Free curr_pdb and continue */
2054  FREELIST(curr_pdb,PDB);
2055  curr_pdb = NULL;
2056  continue;
2057  }
2058 
2059 
2060  /* Store Atom */
2061  if(wpdb->pdb == NULL)
2062  {
2063  /* store first atom */
2064  /*pdb = curr_pdb;*/
2065  wpdb->pdb = curr_pdb;
2066  end_pdb = curr_pdb;
2067  curr_pdb = NULL;
2068  wpdb->natoms = 1;
2069  }
2070  else
2071  {
2072  /* store subsequent atoms */
2073  end_pdb->next = curr_pdb;
2074  end_pdb = curr_pdb;
2075  curr_pdb = NULL;
2076  wpdb->natoms += 1;
2077  }
2078  }
2079  }
2080 
2081 
2082  /* Store final atom (if partial occupancy) */
2083  if(NPartial != 0)
2084  {
2085  if(!StoreOccRankAtom(OccRank,multi,NPartial,&wpdb->pdb,&end_pdb,
2086  &wpdb->natoms))
2087  {
2088  /* Error: Failed to store atom in pdb list */
2089  FREELIST(wpdb->pdb,PDB); /* free pdb list */
2090  xmlFreeDoc(document); /* free document */
2091  xmlCleanupParser(); /* clean up xml parser */
2092  wpdb->natoms = -1; /* indicate error */
2093  return(wpdb); /* return wpdb */
2094  }
2095  }
2096 
2097  /* Check atoms have been stored */
2098  if(wpdb->pdb == NULL || wpdb->natoms == 0)
2099  {
2100  /* Error: pdb list empty or no atoms stored */
2101  FREELIST(wpdb->pdb,PDB); /* free pdb list */
2102  xmlFreeDoc(document); /* free document */
2103  xmlCleanupParser(); /* clean up xml parser */
2104  wpdb->natoms = -1; /* indicate error */
2105  return(wpdb); /* return wpdb */
2106  }
2107 
2108  /* 25.02.15 Renumber atoms since we don't use the atom site IDs */
2109  blRenumAtomsPDB(wpdb->pdb, 1);
2110 
2111 
2112  /* Parse header data and CONECT nodes for whole pdb */
2113  if(DoWhole)
2114  {
2115  /* Parse CONECT Nodes */
2116  ParseConectPDBML(document, wpdb->pdb);
2117 
2118  /* Parse Header Data */
2119  ParseHeaderRecordsPDBML(wpdb, document);
2120  }
2121 
2122 
2123  /* Free document and globals set by XML parser */
2124  xmlFreeDoc(document);
2125  xmlCleanupParser();
2126 
2127  /* Return WHOLEPDB */
2128  return(wpdb);
2129 
2130 #endif
2131 }
2132 
2133 
2134 /************************************************************************/
2135 /*>BOOL blCheckFileFormatPDBML(FILE *fp)
2136  -------------------------------------
2137 *//**
2138 
2139  \param[in] *fp A pointer to type FILE.
2140  \return File is in PDBML format?
2141 
2142  Simple test to detect PDBML-formatted pdb file.
2143 
2144  Todo: Consider replacement with general function to detect file format
2145  for uncompressed file returning file type (eg pdb/pdbml/unknown).
2146 
2147 
2148 - 22.04.14 Original By: CTP
2149 - 07.07.14 Renamed to blCheckFileFormatPDBML() By: CTP
2150 - 29.08.14 Function re-written to take sample from the input stream then
2151  reset the stream with ungetc. By: CTP
2152 - 31.08.14 Bugfix: Check for 'PDBx:datablock' tag skipped if blank line
2153  before xml tag. By: CTP
2154 - 09.09.14 Use rewind() for DOS instead of pushing sample back on stream
2155  with ungetc(). By: CTP
2156 - 29.09.14 Use single character check for pdbml files for Windows or
2157  systems where ungetc() fails after pushback of single char.
2158  By: CTP
2159 
2160 */
2161 BOOL blCheckFileFormatPDBML(FILE *fp)
2162 {
2163 #if !defined(SINGLE_CHAR_FILECHECK) && !defined(MS_WINDOWS)
2164 
2165  /* Default Filetype Check */
2166  char buffer[XML_SAMPLE];
2167  int i, c;
2168  BOOL found_xml = FALSE,
2169  found_pdbx = FALSE;
2170 
2171  /* store sample from stream */
2172  for(i = 0; i < (XML_SAMPLE - 1); i++)
2173  {
2174  c = fgetc(fp);
2175  if(c == EOF || feof(fp)) break;
2176  buffer[i] = (char)c;
2177  }
2178  buffer[i] = '\0'; /* terminate string */
2179 
2180  /* push sample back on input stream */
2181  for(i = strlen(buffer) - 1; i >= 0; i--)
2182  {
2183  ungetc(buffer[i], fp);
2184  }
2185 
2186  /* check first line */
2187  if(!strncmp(buffer,"<?xml ",6)) found_xml = TRUE;
2188 
2189  /* check remaining lines */
2190  for(i = 0; i < strlen(buffer); i++)
2191  {
2192  if(buffer[i] != '\n') continue;
2193 
2194  /*i++;*/
2195  if(!strncmp(&buffer[i+1],"<?xml ",6)) found_xml = TRUE;
2196  if(!strncmp(&buffer[i+1],"<PDBx:datablock ",16)) found_pdbx = TRUE;
2197  }
2198 
2199  return ((found_xml && found_pdbx) ? TRUE : FALSE);
2200 
2201 #else
2202 
2203  /* Single Character Filetype Check */
2204  int c;
2205 
2206  /* get single char from input stream */
2207  c = fgetc(fp);
2208  if(c == EOF || feof(fp)) return FALSE;
2209 
2210  /* pushback character */
2211  ungetc(c, fp);
2212 
2213  /* detect filetype */
2214  return (((char)c == '<') ? TRUE:FALSE);
2215 
2216 #endif
2217 }
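
The line-start test used by the default check above can be tried in isolation on an in-memory sample. The following is a minimal sketch, not part of Bioplib: `sample_looks_like_pdbml()` is a hypothetical helper, and unlike the function above it also accepts the `<PDBx:datablock ` tag on the very first line.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not a Bioplib function): returns true if `sample`,
   a NUL-terminated prefix of a file, contains both an XML declaration and
   a PDBx:datablock tag, each at the start of a line. */
static bool sample_looks_like_pdbml(const char *sample)
{
   bool       found_xml  = false;
   bool       found_pdbx = false;
   const char *line      = sample;

   assert(sample != NULL);

   while(line != NULL && *line != '\0')
   {
      if(!strncmp(line, "<?xml ", 6))            found_xml  = true;
      if(!strncmp(line, "<PDBx:datablock ", 16)) found_pdbx = true;

      /* Move to the character after the next newline, if there is one */
      line = strchr(line, '\n');
      if(line != NULL) line++;
   }

   return found_xml && found_pdbx;
}
```

Working on a copied sample rather than the stream itself keeps the check independent of how many characters `ungetc()` can push back on a given platform.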
2218 
2219 /************************************************************************/
2220 /*>static void ProcessElementField(char *element, char *element_field)
2221  -------------------------------------------------------------------
2222 *//**
2223 
2224  \param[out] element Element symbol.
2225  \param[in] element_field Columns 77 to 78 of the ATOM/HETATM record
2226  of pdb file.
2227 
2228  Get element symbol for ATOM/HETATM record.
2229 
2230 - 04.08.14 Original By: CTP
2231 
2232 */
2233 static void ProcessElementField(char *element, char *element_field)
2234 {
2235  char element_sym[4] = "";
2236  char *element_ptr = NULL;
2237 
2238  /* Get element */
2239  if(strlen(element_field) >= 2)
2240  {
2241  strncpy(element_sym, element_field, 2);
2242  element_sym[2] = '\0';
2243  KILLLEADSPACES(element_ptr, element_sym);
2244  strcpy(element,element_ptr);
2245  }
2246 
2247  return;
2248 }
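
As an illustration of the element handling above, here is a self-contained sketch that does the same job without the Bioplib `KILLLEADSPACES` macro. `element_from_field()` is a hypothetical name, and setting the output to an empty string for short fields is an assumption of this sketch (the function above leaves `element` unchanged in that case).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not a Bioplib function): copy the element symbol
   from a two-character, right-justified field into `element`, which must
   have room for at least 3 chars. Fields shorter than two characters
   give an empty string. */
static void element_from_field(char *element, const char *field)
{
   char       sym[3];
   const char *p;

   assert(element != NULL && field != NULL);

   element[0] = '\0';
   if(strlen(field) < 2) return;

   /* Take the first two characters only */
   sym[0] = field[0];
   sym[1] = field[1];
   sym[2] = '\0';

   /* Skip leading spaces: " C" becomes "C", "FE" stays "FE" */
   for(p = sym; *p == ' '; p++) ;
   strcpy(element, p);
}
```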
2249 
2250 /************************************************************************/
2251 /*>static void ProcessChargeField(int *charge, char *charge_field)
2252  ---------------------------------------------------------------
2253 *//**
2254 
2255  \param[out] charge Charge on the atom.
2256  \param[in] charge_field Columns 79 to 80 of the ATOM/HETATM record
2257  of pdb file.
2258 
2259  Get formal charge for ATOM/HETATM record.
2260 
2261 - 04.08.14 Original By: CTP
2262 
2263 */
2264 static void ProcessChargeField(int *charge, char *charge_field)
2265 {
2266  /* Get charge magnitude */
2267  if(strlen(charge_field) >= 2 && isdigit(charge_field[0]))
2268  {
2269  *charge = charge_field[0] - '0';
2270  }
2271 
2272  /* Get charge sign */
2273  if(strlen(charge_field) >= 2 && charge_field[1] == '-')
2274  {
2275  *charge = *charge * -1;
2276  }
2277 
2278  return;
2279 }
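
The charge field is a digit followed by a sign, so "2-" means a charge of -2 and "1+" a charge of +1. A minimal sketch of the same decoding as a pure function follows; `charge_from_field()` is a hypothetical name, and returning 0 for blank or short fields is an assumption of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not a Bioplib function): decode a two-character
   charge field (magnitude digit, then sign). Returns 0 when the field is
   shorter than two characters or does not start with a digit. */
static int charge_from_field(const char *field)
{
   int charge = 0;

   assert(field != NULL);

   if(strlen(field) < 2) return 0;

   /* Magnitude: first character, if it is a digit */
   if(isdigit((unsigned char)field[0]))
   {
      charge = field[0] - '0';
   }

   /* Sign: second character; only '-' changes the value */
   if(field[1] == '-')
   {
      charge = -charge;
   }

   return charge;
}
```

The cast to `unsigned char` avoids undefined behaviour in `isdigit()` if the field ever contains bytes outside the ASCII range.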
2280 
2281 /************************************************************************/
2282 /*>void blFreeWholePDB(WHOLEPDB *wpdb)
2283  -----------------------------------
2284 *//**
2285 
2286  \param[in] *wpdb WHOLEPDB structure to be freed
2287 
2288  Frees the header, trailer and atom content from a WHOLEPDB structure
2289 
2290 - 30.05.02 Original By: ACRM
2291 - 07.07.14 Renamed to blFreeWholePDB() By: CTP
2292 */
2293 void blFreeWholePDB(WHOLEPDB *wpdb)
2294 {
2295  blFreeStringList(wpdb->header);
2296  blFreeStringList(wpdb->trailer);
2297  FREELIST(wpdb->pdb, PDB);
2298  free(wpdb);
2299 }
2300 
2301 
2302 /************************************************************************/
2303 /*>WHOLEPDB *blReadWholePDB(FILE *fpin)
2304  ------------------------------------
2305 *//**
2306 
2307  \param[in] *fpin File pointer
2308  \return Whole PDB structure containing linked
2309  list to PDB coordinate data
2310 
2311  Reads a PDB file, storing the header and trailer information as
2312  well as the coordinate data. Can read gzipped files as well as
2313  uncompressed files.
2314 
2315  Coordinate data is accessed as linked list of type PDB as follows:
2316 
2317  WHOLEPDB *wpdb;
2318  PDB *p;
2319  wpdb = blReadWholePDB(fp);
2320  for(p=wpdb->pdb; p!=NULL; p=p->next)
2321  {
2322  ... Do something with p ...
2323  }
2324 
2325 - 07.03.07 Made into a wrapper to doReadWholePDB()
2326 - 07.07.14 Use blDoReadWholePDB() Renamed to blReadWholePDB() By: CTP
2327 */
2328 WHOLEPDB *blReadWholePDB(FILE *fpin)
2329 {
2330  WHOLEPDB *wpdb;
2331  wpdb = blDoReadPDB(fpin, TRUE, 1, 1, TRUE);
2332  wpdb->pdb = blRemoveAlternates(wpdb->pdb);
2333  return(wpdb);
2334 }
2335 
2336 /************************************************************************/
2337 /*>WHOLEPDB *blReadWholePDBAtoms(FILE *fpin)
2338  -----------------------------------------
2339 *//**
2340 
2341  \param[in] *fpin File pointer
2342  \return Whole PDB structure containing linked
2343  list to PDB coordinate data
2344 
2345  Reads a PDB file, storing the header and trailer information as
2346  well as the coordinate data for ATOM records only (HETATM records
2347  are discarded). Can read gzipped files as well as uncompressed files.
2348 
2349  Coordinate data is accessed as linked list of type PDB as follows:
2350 
2351  WHOLEPDB *wpdb;
2352  PDB *p;
2353  wpdb = blReadWholePDBAtoms(fp);
2354  for(p=wpdb->pdb; p!=NULL; p=p->next)
2355  {
2356  ... Do something with p ...
2357  }
2358 
2359 - 07.03.07 Made into a wrapper to doReadWholePDB()
2360 - 07.07.14 Use blDoReadWholePDB() Renamed to blReadWholePDBAtoms()
2361  By: CTP
2362 */
2363 WHOLEPDB *blReadWholePDBAtoms(FILE *fpin)
2364 {
2365  WHOLEPDB *wpdb;
2366  wpdb = blDoReadPDB(fpin, FALSE, 1, 1, TRUE);
2367  wpdb->pdb = blRemoveAlternates(wpdb->pdb);
2368  return(wpdb);
2369 }
2370 
2371 
2372 /************************************************************************/
2373 /*>static void StoreConectRecords(WHOLEPDB *wpdb, char *buffer)
2374  ------------------------------------------------------------
2375 *//**
2376  \param[in,out] *wpdb Whole PDB structure
2377  \param[in] *buffer A line containing a CONECT record
2378 
2379  Stores the connectivity data from a CONECT record into the PDB linked
2380  list
2381 
2382 - 18.02.15 Original By: ACRM
2383 - 05.03.15 Initialize all connected atoms to NULL and check for NULLs
2384  when storing the CONECTs. This deals with cases where an
2385  atom specified in a CONECT record has alternate occupancies
2386  and an alternate position is removed. Previously this led
2387  to core dumps. e.g. PDB code 3pnw
2388 */
2389 static void StoreConectRecords(WHOLEPDB *wpdb, char *buffer)
2390 {
2391  char record_type[8];
2392  int i, j,
2393  nConect,
2394  atoms[5];
2395  PDB *p,
2396  *atomsP[5] = {NULL, NULL, NULL, NULL, NULL}; /* atomsP[0] is tested even if no atoms were read */
2397  BOOL gotLink;
2398 
2399  fsscanf(buffer,"%6s%5d%5d%5d%5d%5d",
2400  record_type,&atoms[0],&atoms[1],&atoms[2],&atoms[3],&atoms[4]);
2401 
2402  /* Find the PDB pointers for the (up to) 5 CONECT atoms */
2403  nConect = 0;
2404  for(i=0; i<5; i++)
2405  {
2406  /* Have we run out of specified atoms? */
2407  if(atoms[i] == 0)
2408  break;
2409 
2410  atomsP[i] = NULL; /* 05.03.15 */
2411 
2412  /* Look for this atom */
2413  for(p=wpdb->pdb; p!=NULL; NEXT(p))
2414  {
2415  if(atoms[i] == p->atnum)
2416  {
2417  /* Found it so store and break out */
2418  atomsP[i] = p;
2419  nConect++;
2420  break;
2421  }
2422  }
2423  }
2424 
2425  /* Set the connections from atom 0 */
2426  if(atomsP[0] != NULL) /* 05.03.15 */
2427  {
2428  for(i=1; i<nConect; i++)
2429  {
2430  /* Look to see if we have this conect stored already */
2431  gotLink = FALSE;
2432  for(j=0; j<atomsP[0]->nConect; j++)
2433  {
2434  if(atomsP[i] != NULL) /* 05.03.15 */
2435  {
2436  if(atomsP[0]->conect[j] == atomsP[i])
2437  {
2438  gotLink = TRUE;
2439  break;
2440  }
2441  }
2442  }
2443 
2444  /* If not then store it */
2445  if(!gotLink && (atomsP[0]->nConect < MAXCONECT))
2446  {
2447  if(atomsP[i] != NULL) /* 05.03.15 */
2448  {
2449  atomsP[0]->conect[atomsP[0]->nConect] = atomsP[i];
2450  (atomsP[0]->nConect)++;
2451  }
2452  }
2453  }
2454 
2455  /* Set the connections in the other direction */
2456  for(i=1; i<nConect; i++)
2457  {
2458  if(atomsP[i] != NULL) /* 05.03.15 */
2459  {
2460  /* Look to see if we have this conect stored already */
2461  gotLink = FALSE;
2462  for(j=0; j<atomsP[i]->nConect; j++)
2463  {
2464  if(atomsP[i]->conect[j] == atomsP[0])
2465  {
2466  gotLink = TRUE;
2467  break;
2468  }
2469  }
2470 
2471  /* If not then store it */
2472  if(!gotLink && (atomsP[i]->nConect < MAXCONECT))
2473  {
2474  atomsP[i]->conect[atomsP[i]->nConect] = atomsP[0];
2475  (atomsP[i]->nConect)++;
2476  }
2477  }
2478  }
2479  }
2480 }
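
The bookkeeping in StoreConectRecords() reduces to one rule: store each link once in each direction, and never exceed the fixed-size link array. The sketch below shows that rule on a simplified structure. The `Atom` type, `MAX_LINKS`, `add_link()` and `connect_atoms()` are all hypothetical stand-ins for illustration, not the Bioplib `PDB` type or `MAXCONECT`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LINKS 8   /* hypothetical stand-in for MAXCONECT */

/* Simplified stand-in for the PDB record: only the fields needed here */
typedef struct atom
{
   int         atnum;
   int         nlinks;
   struct atom *links[MAX_LINKS];
} Atom;

/* Store a one-way link unless it is already present or the array is
   full. Returns true only if a new link was stored. */
static bool add_link(Atom *from, Atom *to)
{
   int j;

   assert(from != NULL && to != NULL);

   for(j = 0; j < from->nlinks; j++)
   {
      if(from->links[j] == to) return false;   /* already stored */
   }
   if(from->nlinks >= MAX_LINKS) return false; /* no room left   */

   from->links[from->nlinks++] = to;
   return true;
}

/* Store the link in both directions; a NULL partner (for example an
   alternate position that has been removed) is silently ignored. */
static void connect_atoms(Atom *a, Atom *b)
{
   if(a == NULL || b == NULL) return;
   add_link(a, b);
   add_link(b, a);
}
```

Calling `connect_atoms()` twice with the same pair leaves one link on each atom, which is the behaviour the duplicate checks above are there to guarantee.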
2481 
2482 #ifdef XML_SUPPORT
2483 /************************************************************************/
2484 /*>static void ParseHeaderRecordsPDBML(WHOLEPDB *wpdb, xmlDoc *document)
2485  ---------------------------------------------------------------------
2486 *//**
2487 
2488  \param[in,out] *wpdb WHOLEPDB being parsed.
2489  \param[in] *document XML document
2490 
2491  Parses PDBML header data and creates PDB-formatted header records.
2492 
2493 - 01.07.15 Original. By: CTP
2494 
2495 */
2496 static void ParseHeaderRecordsPDBML(WHOLEPDB *wpdb, xmlDoc *document)
2497 {
2498 #ifndef XML_SUPPORT
2499 
2500  /* PDBML format not supported. */
2501  return;
2502 
2503 #else
2504 
2505  /* Parse PDBML header */
2506  STRINGLIST *wpdb_header = NULL,
2507  *title_lines = NULL,
2508  *compnd_lines = NULL,
2509  *source_lines = NULL,
2510  *seqres_lines = NULL,
2511  *modres_lines = NULL,
2512  *resol_lines = NULL,
2513  *last = NULL;
2514 
2515  /* Return if wpdb or document not set */
2516  if(wpdb == NULL || document == NULL)
2517  {
2518  return;
2519  }
2520 
2521  /* Get header records */
2522  wpdb_header = ParseHeaderPDBML(document);
2523  title_lines = ParseTitlePDBML(document);
2524  compnd_lines = ParseCompndPDBML(document, wpdb->pdb);
2525  source_lines = ParseSourcePDBML(document);
2526  resol_lines = ParseResolPDBML(document);
2527  seqres_lines = ParseSeqresPDBML(document);
2528  modres_lines = ParseModresPDBML(document);
2529 
2530  /* Append header records */
2531  last = wpdb_header;
2532  APPEND_STRINGLIST(last, title_lines);
2533  APPEND_STRINGLIST(last, compnd_lines);
2534  APPEND_STRINGLIST(last, source_lines);
2535  APPEND_STRINGLIST(last, resol_lines);
2536  APPEND_STRINGLIST(last, seqres_lines);
2537  APPEND_STRINGLIST(last, modres_lines);
2538 
2539  /* Set wpdb->header stringlist */
2540  if(wpdb->header != NULL)
2541  {
2542  FREELIST(wpdb->header, STRINGLIST);
2543  }
2544  wpdb->header = wpdb_header;
2545  return;
2546 
2547 #endif
2548 }
2549 
2550 
2551 /************************************************************************/
2552 /*>static STRINGLIST *ParseHeaderPDBML(xmlDoc *document)
2553  -----------------------------------------------------
2554 *//**
2555 
2556  \param[in] *document XML document
2557  \return STRINGLIST with header record.
2558 
2559  Parses PDBML header data and creates PDB-formatted header record.
2560 
2561 - 22.04.14 Original. By: CTP
2562 - 07.07.14 Renamed to ParseHeaderPDBML() By: CTP
2563 - 18.08.14 Return NULL if XML not supported. By: CTP
2564 - 10.09.14 Use SetPDBDateField() to set date field. By: CTP
2565 - 28.04.15 Parse xmlDoc instead of FILE. By: CTP
2566 - 05.05.15 Added Source and Compound data. By: CTP
2567 - 21.06.15 Restrict compnd type to polymer. By: CTP
2568 - 23.06.15 Removed filter for compnd molecule is water.
2569  Added seqres records. By: CTP
2570 - 24.06.15 Added modres records. By: CTP
2571 - 25.06.15 Added experimental data support (type, resolution,
2572  R and RFree)
2573  Checked all xmlGetProp() calls succeeded for attributes
2574  Checked all xmlNodeGetContent() calls succeeded
2575  Fixed some memory leaks with attributes
2576  replaced all x=x->next with NEXT(x)
2577  Tidied comments and variable declarations
2578  Added APPEND_STRINGLIST() macro
2579 - 29.06.15 Function broken into smaller functions:
2580  ParseHeaderPDBML(), ParseTitlePDBML(), ParseCompndPDBML(),
2581  ParseSourcePDBML(), ParseResolPDBML(), ParseSeqresPDBML()
2582  and ParseModresPDBML(). By: CTP
2583 */
2584 static STRINGLIST *ParseHeaderPDBML(xmlDoc *document)
2585 {
2586 #ifndef XML_SUPPORT
2587 
2588  /* PDBML format not supported. */
2589  return NULL;
2590 
2591 #else
2592 
2593  /* Parse PDBML header */
2594  xmlNode *root_node = NULL,
2595  *node = NULL,
2596  *subnode = NULL,
2597  *n = NULL;
2598  xmlChar *content, *attribute;
2599  STRINGLIST *header_lines = NULL;
2600  char header_line[82] = "",
2601  header_field[41] = "",
2602  pdb_field[5] = "",
2603  date_field[10] = "";
2604 
2605  /* Parse Document Tree */
2606  root_node = xmlDocGetRootElement(document);
2607  for(node = root_node->children; node; NEXT(node))
2608  {
2609  if(node->type != XML_ELEMENT_NODE){ continue; }
2610 
2611  /* get header */
2612  if(!strcmp("struct_keywordsCategory", (char *)node->name))
2613  {
2614  for(subnode = node->children; subnode; NEXT(subnode))
2615  {
2616  for(n=subnode->children; n; NEXT(n))
2617  {
2618  if(strcmp("pdbx_keywords", (char *)n->name)){ continue; }
2619  if((content = xmlNodeGetContent(n))!=NULL)
2620  {
2621  strncpy(header_field, (char *)content,40);
2622  xmlFree(content);
2623  }
2624  }
2625  }
2626  }
2627 
2628  /* get date */
2629  if(!strcmp("database_PDB_revCategory", (char *)node->name))
2630  {
2631  for(subnode = node->children; subnode; NEXT(subnode))
2632  {
2633  for(n=subnode->children; n; NEXT(n))
2634  {
2635  if(strcmp("date_original", (char *)n->name)){ continue; }
2636  if((content = xmlNodeGetContent(n))!=NULL)
2637  {
2638  /* convert date format */
2639  SetPDBDateField(date_field, (char *)content);
2640  xmlFree(content);
2641  }
2642  }
2643  }
2644  }
2645 
2646  /* get pdb code */
2647  if(!strcmp("entryCategory", (char *)node->name))
2648  {
2649  for(subnode = node->children; subnode; NEXT(subnode))
2650  {
2651  if(strcmp("entry", (char *)subnode->name)){ continue; }
2652 
2653  if((attribute = xmlGetProp(subnode, (xmlChar *)"id"))!=NULL)
2654  {
2655  strncpy(pdb_field, (char *)attribute,4);
2656  pdb_field[4] = '\0';
2657  xmlFree(attribute);
2658  }
2659  }
2660  }
2661  } /* Loop the XML nodes */
2662 
2663 
2664  /* Create Header Line */
2665  if(!strlen(header_field))
2666  {
2667  strcpy(header_field,"Converted from PDBML");
2668  }
2669  sprintf(header_line, "HEADER    %-40s%9s   %4s              \n",
2670  header_field, date_field, pdb_field);
2671 
2672 
2673  /* Make Stringlist */
2674  header_lines = blStoreString(header_lines, header_line);
2675 
2676  /* Return Stringlist */
2677  return(header_lines);
2678 
2679 #endif
2680 }
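
For reference, a PDB HEADER record places the classification in columns 11-50, the deposition date in columns 51-59 and the ID code in columns 63-66. The sketch below builds such a record with bounded formatting. `format_header_record()` is a hypothetical helper, and padding the record to 80 columns before the newline is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Bioplib function): write an 80-column HEADER
   record plus newline into `out`, which must hold at least 82 chars.
   Over-long fields are truncated by the precision specifiers. */
static void format_header_record(char *out, size_t outsize,
                                 const char *classification,
                                 const char *date,
                                 const char *idcode)
{
   char record[81];

   assert(out != NULL && outsize >= 82);
   assert(classification != NULL && date != NULL && idcode != NULL);

   /* Columns: 1-6 "HEADER", 11-50 classification, 51-59 date,
      63-66 ID code */
   snprintf(record, sizeof(record), "HEADER    %-40.40s%9.9s   %4.4s",
            classification, date, idcode);

   /* Pad with spaces to 80 columns and add the newline */
   snprintf(out, outsize, "%-80s\n", record);
}
```

A call such as `format_header_record(line, sizeof(line), "HYDROLASE", "01-JAN-15", "1ABC")` (placeholder values, not a real entry) yields a line whose date starts at column 51 and whose ID code starts at column 63.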
2681 
2682 
2683 /************************************************************************/
2684 /*>static STRINGLIST *ParseTitlePDBML(xmlDoc *document)
2685  ----------------------------------------------------
2686 *//**
2687 
2688  \param[in] *document XML document
2689  \return STRINGLIST containing title record.
2690 
2691  Parses PDBML header data and returns TITLE record.
2692 
2693 - 01.07.15 Original. By: CTP
2694 */
2695 static STRINGLIST *ParseTitlePDBML(xmlDoc *document)
2696 {
2697 #ifndef XML_SUPPORT
2698 
2699  /* PDBML format not supported. */
2700  return NULL;
2701 
2702 #else
2703 
2704  /* Parse PDBML header */
2705  xmlNode *root_node = NULL,
2706  *node = NULL,
2707  *subnode = NULL,
2708  *n = NULL;
2709  xmlChar *content;
2710  STRINGLIST *title_lines = NULL;
2711 
2712  /* Parse Document Tree */
2713  root_node = xmlDocGetRootElement(document);
2714  for(node = root_node->children; node; NEXT(node))
2715  {
2716  if(node->type != XML_ELEMENT_NODE){ continue; }
2717 
2718  /* get title */
2719  if(!strcmp("structCategory", (char *)node->name))
2720  {
2721  for(subnode = node->children; subnode; NEXT(subnode))
2722  {
2723  for(n=subnode->children; n; NEXT(n))
2724  {
2725  if(strcmp("title", (char *)n->name)){ continue; }
2726  if((content = xmlNodeGetContent(n))!=NULL)
2727  {
2728  /* Get titlestringlist */
2729  title_lines = TitleStringlist((char *)content);
2730  xmlFree(content);
2731  }
2732  }
2733  }
2734  }
2735 
2736  } /* Loop the XML nodes */
2737 
2738 
2739  /* Return Stringlist */
2740  return(title_lines);
2741 
2742 #endif
2743 }
2744 
2745 
2746 /************************************************************************/
2747 /*>static STRINGLIST *ParseCompndPDBML(xmlDoc *document, PDB *pdb)
2748  ---------------------------------------------------------------
2749 *//**
2750 
2751  \param[in] *document XML document
2752  \param[in] *pdb PDB list being parsed
2753  \return STRINGLIST containing compound record.
2754 
2755  Parses PDBML header data and returns COMPND record.
2756 
2757 - 01.07.15 Original. By: CTP
2758 */
2759 static STRINGLIST *ParseCompndPDBML(xmlDoc *document, PDB *pdb)
2760 {
2761 #ifndef XML_SUPPORT
2762 
2763  /* PDBML format not supported. */
2764  return NULL;
2765 
2766 #else
2767 
2768  /* Parse PDBML header */
2769  xmlNode *root_node = NULL,
2770  *node = NULL,
2771  *subnode = NULL,
2772  *n = NULL;
2773  xmlChar *content, *attribute;
2774  STRINGLIST *compnd_lines = NULL;
2775  int nlines = 0;
2776 
2777  /* compound */
2778  COMPND compnd;
2779  int nchains = 0,
2780  i = 0;
2781  char **chains = NULL,
2782  compnd_type[16] = "";
2783 
2784  /* Parse Document Tree */
2785  root_node = xmlDocGetRootElement(document);
2786  for(node = root_node->children; node; NEXT(node))
2787  {
2788  if(node->type != XML_ELEMENT_NODE){ continue; }
2789 
2790 
2791  /* get compound */
2792  if(!strcmp("entityCategory", (char *)node->name))
2793  {
2794  /* get compnd node */
2795  for(subnode = node->children; subnode; NEXT(subnode))
2796  {
2797  if(strcmp("entity", (char *)subnode->name)){ continue; }
2798 
2799  /* Clear Fields */
2800  compnd.molid = 0;
2801  compnd.molecule[0] = '\0';
2802  compnd.chain[0] = '\0';
2803  compnd.fragment[0] = '\0';
2804  compnd.synonym[0] = '\0'; /* entity_name_com node */
2805  compnd.ec[0] = '\0';
2806  compnd.engineered[0] = '\0'; /* not in pdbml format */
2807  compnd.mutation[0] = '\0';
2808  compnd.other[0] = '\0';
2809  compnd_type[0] = '\0';
2810 
2811  /* mol id */
2812  if((attribute = xmlGetProp(subnode, (xmlChar *)"id"))!=NULL)
2813  {
2814  sscanf((char *)attribute, "%i", &compnd.molid);
2815  xmlFree(attribute);
2816  }
2817 
2818  /* scan compnd child nodes */
2819  for(n=subnode->children; n; NEXT(n))
2820  {
2821  if(n->type != XML_ELEMENT_NODE){ continue; }
2822 
2823  if((content = xmlNodeGetContent(n))!=NULL)
2824  {
2825  if(!strcmp("details", (char *)n->name))
2826  {
2827  strcpy(compnd.other, (char *)content);
2828  }
2829  else if(!strcmp("pdbx_description", (char *)n->name))
2830  {
2831  strcpy(compnd.molecule, (char *)content);
2832  }
2833  else if(!strcmp("pdbx_fragment", (char *)n->name))
2834  {
2835  strcpy(compnd.fragment, (char *)content);
2836  }
2837  else if(!strcmp("pdbx_ec", (char *)n->name))
2838  {
2839  strcpy(compnd.ec, (char *)content);
2840  }
2841  else if(!strcmp("pdbx_mutation", (char *)n->name))
2842  {
2843  strcpy(compnd.mutation, (char *)content);
2844  }
2845  else if(!strcmp("pdbx_engineered", (char *)n->name))
2846  {
2847  strcpy(compnd.engineered, (char *)content);
2848  }
2849  else if(!strcmp("type", (char *)n->name))
2850  {
2851  strncpy(compnd_type, (char *)content, 15); /* bounded: compnd_type holds 16 chars */
2852  }
2853 
2854  xmlFree(content);
2855  }
2856  }
2857 
2858  /* restrict to compnd type to polymer */
2859  if(strcmp(compnd_type,"polymer")){ continue; }
2860 
2861 
2862  /* get chains */
2863  chains = GetEntityChainLabels(compnd.molid , pdb, &nchains);
2864  if(chains != NULL)
2865  {
2866  /* make compnd.chain string */
2867  for(i=0; i<nchains; i++)
2868  {
2869  if(i)
2870  {
2871  strncat(compnd.chain,", ",3);
2872  }
2873  strncat(compnd.chain,chains[i],8);
2874  free(chains[i]);
2875  }
2876  free(chains);
2877  nchains = 0;
2878  }
2879 
2880  /* store compound as stringlist */
2881  compnd_lines = CompndStringlist(compnd_lines, &nlines,
2882  &compnd);
2883 
2884  } /* entity */
2885  }
2886 
2887  } /* Loop the XML nodes */
2888 
2889 
2890  /* Return Stringlist */
2891  return(compnd_lines);
2892 
2893 #endif
2894 }
2895 
2896 
2897 /************************************************************************/
2898 /*>static STRINGLIST *ParseSourcePDBML(xmlDoc *document)
2899  -----------------------------------------------------
2900 *//**
2901 
2902  \param[in] *document XML document
2903  \return STRINGLIST containing source record.
2904 
2905  Parses PDBML header data and returns SOURCE record.
2906 
2907 - 01.07.15 Original. By: CTP
2908 */
2909 static STRINGLIST *ParseSourcePDBML(xmlDoc *document)
2910 {
2911 #ifndef XML_SUPPORT
2912 
2913  /* PDBML format not supported. */
2914  return NULL;
2915 
2916 #else
2917 
2918  /* Parse PDBML header */
2919  xmlNode *root_node = NULL,
2920  *node = NULL,
2921  *subnode = NULL,
2922  *n = NULL;
2923  xmlChar *content, *attribute;
2924  double content_lf = 0.0;
2925  STRINGLIST *source_lines = NULL;
2926  int source_lines_stored = 0;
2927 
2928  /* source */
2929  PDBSOURCE source;
2930  int mol_id = 0;
2931 
2932  /* Parse Document Tree */
2933  root_node = xmlDocGetRootElement(document);
2934  for(node = root_node->children; node; NEXT(node))
2935  {
2936  if(node->type != XML_ELEMENT_NODE){ continue; }
2937 
2938 
2939  /* get source */
2940  if(!strcmp("entity_src_genCategory", (char *)node->name) ||
2941  !strcmp("entity_src_natCategory", (char *)node->name))
2942  {
2943  for(subnode = node->children; subnode; NEXT(subnode))
2944  {
2945  /* if not correct subnode continue */
2946  if(strcmp("entity_src_gen", (char *)subnode->name) &&
2947  strcmp("entity_src_nat", (char *)subnode->name))
2948  { continue; }
2949 
2950  /* clear SOURCE */
2951  mol_id = 0;
2952  source.scientificName[0] = '\0';
2953  source.commonName[0] = '\0';
2954  source.strain[0] = '\0';
2955  source.taxid = 0;
2956 
2957  /* mol id */
2958  if((attribute = xmlGetProp(subnode,
2959  (xmlChar *)"entity_id"))!=NULL)
2960  {
2961  sscanf((char *)attribute, "%i", &mol_id);
2962  xmlFree(attribute);
2963  }
2964 
2965  for(n=subnode->children; n; NEXT(n))
2966  {
2967  if(n->type != XML_ELEMENT_NODE){ continue; }
2968  if((content = xmlNodeGetContent(n))!=NULL)
2969  {
2970  if(!strcmp("pdbx_gene_src_common_name",
2971  (char *)n->name) ||
2972  !strcmp("common_name", (char *)n->name))
2973  {
2974  strncpy(source.commonName, (char *)content,
2975  MAXPDBANNOTATION - 1);
2976  }
2977  else if(!strcmp("pdbx_gene_src_scientific_name",
2978  (char *)n->name) ||
2979  !strcmp("pdbx_organism_scientific",
2980  (char *)n->name))
2981  {
2982  strncpy(source.scientificName, (char *)content,
2983  MAXPDBANNOTATION - 1);
2984  }
2985  else if(!strcmp("pdbx_gene_src_ncbi_taxonomy_id",
2986  (char *)n->name) ||
2987  !strcmp("pdbx_ncbi_taxonomy_id",
2988  (char *)n->name))
2989  {
2990  sscanf((char *)content, "%lf", &content_lf);
2991  source.taxid = (REAL)content_lf;
2992  }
2993  else if(!strcmp("pdbx_gene_src_strain",
2994  (char *)n->name) ||
2995  !strcmp("strain", (char *)n->name))
2996  {
2997  strncpy(source.strain, (char *)content,
2998  MAXPDBANNOTATION - 1);
2999  }
3000 
3001  xmlFree(content);
3002  }
3003  }
3004 
3005  /* store source lines */
3006  source_lines = SourceStringlist(source_lines,
3007  &source_lines_stored,
3008  mol_id,
3009  &source);
3010  }
3011  }
3012  } /* Loop the XML nodes */
3013 
3014 
3015  /* Return Stringlist */
3016  return(source_lines);
3017 
3018 #endif
3019 }
3020 
3021 
3022 
3023 /************************************************************************/
3024 /*>static STRINGLIST *ParseResolPDBML(xmlDoc *document)
3025  ----------------------------------------------------
3026 *//**
3027 
3028  \param[in] *document XML document
3029  \return STRINGLIST containing pdb-formatted records.
3030 
3031  Parses PDBML header data and returns experimental data (type,
3032  resolution, R and RFree) as PDB-formatted header records.
3033 
3034 - 01.07.15 Original based on ACRM's additions to ParseHeaderPDBML().
3035  By: CTP
3036 */
3037 static STRINGLIST *ParseResolPDBML(xmlDoc *document)
3038 {
3039 #ifndef XML_SUPPORT
3040 
3041  /* PDBML format not supported. */
3042  return NULL;
3043 
3044 #else
3045 
3046  /* Parse PDBML header */
3047  xmlNode *root_node = NULL,
3048  *node = NULL,
3049  *subnode = NULL,
3050  *n = NULL;
3051  xmlChar *content, *attribute;
3052  REAL resolution = (-1.0),
3053  RFree = (-1.0),
3054  RWork = (-1.0);
3055  STRINGLIST *resol_lines = NULL;
3056 
3057  /* Resolution and R-factor */
3058  char resol_line[82] = "";
3059 
3060  /* Parse Document Tree */
3061  root_node = xmlDocGetRootElement(document);
3062  for(node = root_node->children; node; NEXT(node))
3063  {
3064  if(node->type != XML_ELEMENT_NODE){ continue; }
3065 
3066  /* get resolution and r-factor */
3067  if(!strcmp("refine_ls_shellCategory", (char *)node->name))
3068  {
3069  for(subnode = node->children; subnode; NEXT(subnode))
3070  {
3071  if(!strcmp("refine_ls_shell", (char *)subnode->name))
3072  {
3073  /* Get the resolution from the attribute */
3074  if((attribute = xmlGetProp(subnode,
3075  (xmlChar *)"d_res_high"))!=NULL)
3076  {
3077  sscanf((char *)attribute, "%lf", &resolution);
3078  xmlFree(attribute);
3079  sprintf(resol_line,"REMARK 2 RESOLUTION. %5.2f \
3080 ANGSTROMS. \n", resolution);
3081  resol_lines = blStoreString(resol_lines, resol_line);
3082  }
3083  /* Get RWork and RFree from the children */
3084  for(n=subnode->children; n!=NULL; NEXT(n))
3085  {
3086  if(!strcmp("R_factor_R_free", (char *)n->name))
3087  {
3088  if((content = xmlNodeGetContent(n))!=NULL)
3089  {
3090  sscanf((char *)content, "%lf", &RFree);
3091  xmlFree(content);
3092  }
3093  }
3094  else if(!strcmp("R_factor_R_work", (char *)n->name))
3095  {
3096  if((content = xmlNodeGetContent(n))!=NULL)
3097  {
3098  sscanf((char *)content, "%lf", &RWork);
3099  xmlFree(content);
3100  }
3101  }
3102  }
3103  if((RWork > (REAL)(-0.99)) || (RFree > (REAL)(-0.99)))
3104  {
3105  sprintf(resol_line,"REMARK 3 REFINEMENT. \
3106  \n");
3107  resol_lines = blStoreString(resol_lines, resol_line);
3108 
3109  sprintf(resol_line,"REMARK 3 FIT TO DATA USED IN \
3110 REFINEMENT. \n");
3111  resol_lines = blStoreString(resol_lines, resol_line);
3112  }
3113  if(RWork > (REAL)(-0.99))
3114  {
3115  sprintf(resol_line,"REMARK 3 R VALUE \
3116 (WORKING SET) : %5.3f \n", RWork);
3117  resol_lines = blStoreString(resol_lines, resol_line);
3118  }
3119  if(RFree > (REAL)(-0.99))
3120  {
3121  sprintf(resol_line,"REMARK 3 FREE R VALUE \
3122  : %5.3f \n", RFree);
3123  resol_lines = blStoreString(resol_lines, resol_line);
3124  }
3125 
3126  /* Experimental details */
3127  if((attribute = xmlGetProp(subnode,
3128  (xmlChar *)"pdbx_refine_id"))
3129  != NULL)
3130  {
3131  sprintf(resol_line,"REMARK 200 EXPERIMENTAL DETAILS \
3132  \n");
3133  resol_lines = blStoreString(resol_lines, resol_line);
3134  sprintf(resol_line,"REMARK 200 EXPERIMENT TYPE \
3135  : %-35s\n", (char *)attribute);
3136  resol_lines = blStoreString(resol_lines, resol_line);
3137  xmlFree(attribute);
3138  }
3139  }
3140  }
3141  }
3142  } /* Loop the XML nodes */
3143 
3144 
3145  /* Return Stringlist */
3146  return(resol_lines);
3147 
3148 #endif
3149 }
3150 
3151 
3152 /************************************************************************/
3153 /*>static STRINGLIST *ParseSeqresPDBML(xmlDoc *document)
3154  ----------------------------------------------------
3155 *//**
3156 
3157  \param[in] *document XML document
3158  \return STRINGLIST containing SEQRES record.
3159 
3160  Parses PDBML header data and returns SEQRES record.
3161 
3162 - 01.07.15 Original. By: CTP
3163 */
3164 static STRINGLIST *ParseSeqresPDBML(xmlDoc *document)
3165 {
3166 #ifndef XML_SUPPORT
3167 
3168  /* PDBML format not supported. */
3169  return NULL;
3170 
3171 #else
3172 
3173  /* Parse PDBML header */
3174  xmlNode *root_node = NULL,
3175  *node = NULL,
3176  *subnode = NULL,
3177  *n = NULL;
3178  xmlChar *content, *attribute;
3179  STRINGLIST *seqres_lines = NULL;
3180  int nchains = 0,
3181  i = 0;
3182  char **chains = NULL;
3183 
3184  /* seqres */
3185  STRINGLIST **residue_list = NULL;
3186  char curr_chain[8] = "",
3187  prev_chain[8] = "",
3188  resnam[8] = "";
3189 
3190 
3191  /* Parse Document Tree */
3192  root_node = xmlDocGetRootElement(document);
3193  for(node = root_node->children; node; NEXT(node))
3194  {
3195  if(node->type != XML_ELEMENT_NODE){ continue; }
3196 
3197  /* get seqres records */
3198  if(!strcmp("pdbx_poly_seq_schemeCategory", (char *)node->name))
3199  {
3200  /* get seqres nodes */
3201  for(subnode = node->children; subnode; NEXT(subnode))
3202  {
3203  if(strcmp("pdbx_poly_seq_scheme", (char *)subnode->name))
3204  { continue; }
3205 
3206 /* get residue from pdbx_poly_seq_scheme node attributes
3207 
3208  note: pdb_mon_id child node also contains the residue name
3209  but the node is absent if no coordinate data for the
3210  residue is present in the file
3211  */
3212  if((attribute = xmlGetProp(subnode, (xmlChar *)"mon_id"))
3213  != NULL)
3214  {
3215  strncpy(resnam, (char *)attribute,8);
3216  xmlFree(attribute);
3217  }
3218 
3219  /* get chain id from pdb_strand_id child node */
3220  for(n=subnode->children; n; NEXT(n))
3221  {
3222  if(n->type != XML_ELEMENT_NODE){ continue; }
3223  content = xmlNodeGetContent(n);
3224 
3225  if(!strcmp("pdb_strand_id", (char *)n->name))
3226  {
3227  strncpy(curr_chain, (char *)content,8);
3228  }
3229  xmlFree(content);
3230  }
3231 
3232  /* check for change in chain label */
3233  if(!CHAINMATCH(prev_chain,curr_chain))
3234  {
3235  /* increment chains and allocate memory */
3236  nchains++;
3237  if(nchains == 1)
3238  {
3239  chains = (char **)malloc(sizeof(char *));
3240  residue_list =
3241  (STRINGLIST **)malloc(sizeof(STRINGLIST *));
3242  if(chains == NULL || residue_list == NULL)
3243  {
3244  return NULL;
3245  }
3246  }
3247  else
3248  {
3249  chains = (char **)realloc(chains,
3250  (nchains)*sizeof(char *));
3251  residue_list =
3252  (STRINGLIST **)realloc(residue_list,
3253  (nchains)*sizeof(STRINGLIST *));
3254  }
3255 
3256  /* check memory allocated */
3257  if((chains == NULL) || (residue_list == NULL))
3258  {
3259  return NULL;
3260  }
3261 
3262  /* store chain */
3263  if((chains[nchains - 1] =
3264  (char *)malloc(8 * sizeof(char))) == NULL)
3265  {
3266  return NULL;
3267  }
3268  strncpy(chains[nchains - 1], curr_chain,8);
3269 
3270  /* set new residue list pointer to null */
3271  residue_list[nchains - 1] = NULL;
3272  }
3273 
3274  /* store residue */
3275  residue_list[nchains-1] =
3276  blStoreString(residue_list[nchains-1] ,resnam);
3277 
3278  /* set prev chain */
3279  strncpy(prev_chain, curr_chain,8);
3280  }
3281 
3282  /* store sequence as seqres records */
3283  seqres_lines = SeqresStringlist(nchains, chains, residue_list);
3284 
3285  /* free memory */
3286  if(nchains)
3287  {
3288  for(i=0; i<nchains; i++)
3289  {
3290  free(chains[i]);
3291  FREELIST(residue_list[i],STRINGLIST);
3292  }
3293  free(chains);
3294  free(residue_list);
3295  }
3296  }
3297 
3298 
3299  } /* Loop the XML nodes */
3300 
3301 
3302  /* Return Stringlist */
3303  return(seqres_lines);
3304 
3305 #endif
3306 }
3307 
3308 
3309 /************************************************************************/
3310 /*>static STRINGLIST *ParseModresPDBML(xmlDoc *document)
3311  -----------------------------------------------------
3312 *//**
3313 
3314  \param[in] *document XML document
3315  \return STRINGLIST containing MODRES record.
3316 
3317  Parses PDBML header data and returns MODRES record.
3318 
3319 - 01.07.15 Original. By: CTP
3320 */
3321 static STRINGLIST *ParseModresPDBML(xmlDoc *document)
3322 {
3323 #ifndef XML_SUPPORT
3324 
3325  /* PDBML format not supported. */
3326  return NULL;
3327 
3328 #else
3329 
3330  /* Parse PDBML header */
3331  xmlNode *root_node = NULL,
3332  *node = NULL,
3333  *subnode = NULL,
3334  *n = NULL;
3335  xmlChar *content, *attribute;
3336  double content_lf = 0.0;
3337  STRINGLIST *modres_lines = NULL;
3338  char pdb_field[5] = "";
3339 
3340  /* modres */
3341  char modres_line[82] = "",
3342  modres_resnam[8] = "",
3343  modres_chain[8] = "",
3344  modres_insert[2] = " ",
3345  modres_stdnam[8] = "",
3346  modres_comment[42] = "";
3347  int modres_seqnum = 0;
3348 
3349  /* Parse Document Tree */
3350  root_node = xmlDocGetRootElement(document);
3351  for(node = root_node->children; node; NEXT(node))
3352  {
3353  if(node->type != XML_ELEMENT_NODE){ continue; }
3354 
3355  /* get pdb code */
3356  /* todo: put code into ParsePdbcodePDBML() */
3357  if(!strcmp("entryCategory", (char *)node->name))
3358  {
3359  for(subnode = node->children; subnode; NEXT(subnode))
3360  {
3361  if(strcmp("entry", (char *)subnode->name)){ continue; }
3362 
3363  if((attribute = xmlGetProp(subnode, (xmlChar *)"id"))!=NULL)
3364  {
3365  strncpy(pdb_field, (char *)attribute,4);
3366  pdb_field[4] = '\0';
3367  xmlFree(attribute);
3368  }
3369  }
3370  }
3371 
3372  /* get modres records */
3373  if(!strcmp("pdbx_struct_mod_residueCategory", (char *)node->name))
3374  {
3375  /* get modres nodes */
3376  for(subnode = node->children; subnode; NEXT(subnode))
3377  {
3378  if(strcmp("pdbx_struct_mod_residue", (char *)subnode->name))
3379  { continue; }
3380 
3381  /* zero modres data */
3382  strcpy(modres_resnam, "");
3383  strcpy(modres_chain, "");
3384  modres_seqnum = 0;
3385  strcpy(modres_insert, " ");
3386  strcpy(modres_stdnam, "");
3387  strcpy(modres_comment, "");
3388 
3389  /* scan modres data nodes */
3390  for(n=subnode->children; n; NEXT(n))
3391  {
3392  if(n->type != XML_ELEMENT_NODE){ continue; }
3393  if((content = xmlNodeGetContent(n))!=NULL)
3394  {
3395  /* get data */
3396  if(!strcmp("auth_asym_id", (char *)n->name))
3397  {
3398  strncpy(modres_chain, (char *)content,8);
3399  }
3400  else if(!strcmp("auth_comp_id", (char *)n->name))
3401  {
3402  strncpy(modres_resnam, (char *)content,8);
3403  }
3404  else if(!strcmp("auth_seq_id", (char *)n->name))
3405  {
3406  sscanf((char *)content, "%lf", &content_lf);
3407 modres_seqnum = (int)content_lf;
3408  }
3409  else if(!strcmp("details", (char *)n->name))
3410  {
3411  strncpy(modres_comment, (char *)content,41);
3412  }
3413  else if(!strcmp("parent_comp_id", (char *)n->name))
3414  {
3415  strncpy(modres_stdnam, (char *)content,8);
3416  }
3417  else if(!strcmp("label_asym_id", (char *)n->name))
3418  {
3419  if(strlen(modres_chain) == 0)
3420  {
3421  strncpy(modres_chain, (char *)content,8);
3422  }
3423  }
3424  else if(!strcmp("label_comp_id", (char *)n->name))
3425  {
3426  if(strlen(modres_resnam) == 0)
3427  {
3428  strncpy(modres_resnam, (char *)content,8);
3429  }
3430  }
3431  else if(!strcmp("label_seq_id", (char *)n->name))
3432  {
3433  if(modres_seqnum == 0)
3434  {
3435  sscanf((char *)content, "%lf", &content_lf);
3436 modres_seqnum = (int)content_lf;
3437  }
3438  }
3439  else if(!strcmp("PDB_ins_code", (char *)n->name))
3440  {
3441 strncpy(modres_insert, (char *)content,1); /* buffer is char[2] */
3442  }
3443 
3444  xmlFree(content);
3445  }
3446  }
3447 
3448  /* make modres record */
3449  sprintf(modres_line,"MODRES %4s %3s %1s %4d%1s %3s %-40s",
3450  pdb_field, modres_resnam, modres_chain,
3451  modres_seqnum, modres_insert, modres_stdnam,
3452  modres_comment);
3453  PADMINTERM(modres_line,80);
3454  strncat(modres_line,"\n",2);
3455 
3456  /* store modres record */
3457  modres_lines = blStoreString(modres_lines ,modres_line);
3458  }
3459  }
3460 
3461  } /* Loop the XML nodes */
3462 
3463 
3464  /* Return Stringlist */
3465  return(modres_lines);
3466 
3467 #endif
3468 }
3469 
3470 
3471 /************************************************************************/
3472 /*>static char **GetEntityChainLabels(int entity, PDB *pdb,
3473  int *nChains)
3474  --------------------------------------------------------
3475 *//**
3476  \param[in] entity Entity ID
3477  \param[in] *pdb PDB linked list
3478  \param[out] *nChains Number of chain labels found.
3479  \return Array of strings containing chain labels.
3480 
3481  Gets chain labels associated with an entity_id.
3482 
3483  Called when parsing PDBML-formatted files. The PDBML entity_id is
3484  equivalent to the MOL_ID for PDB-formatted files.
3485 
3486 - 15.06.15 Original. By: CTP
3487 
3488 */
3489 static char **GetEntityChainLabels(int entity, PDB *pdb, int *nChains)
3490 {
3491 #ifndef XML_SUPPORT
3492 
3493  /* PDBML format not supported. */
3494  return NULL;
3495 
3496 #else
3497 
3498  /* PDBML format supported. */
3499 
3500  char **chains = NULL,
3501  prev_chain[8] = "";
3502  PDB *p = NULL;
3503  int i = 0;
3504  BOOL found = FALSE;
3505 
3506  /* zero chain count */
3507  *nChains = 0;
3508 
3509  /* Scan PDB list */
3510  for(p = pdb; p != NULL; NEXT(p))
3511  {
3512  /* entity found and chain is not last chain found */
3513  if(p->entity_id == entity && !CHAINMATCH(p->chain,prev_chain))
3514  {
3515  /* check all found chains */
3516  found = FALSE;
3517  for(i=0; i < *nChains && !found; i++)
3518  {
3519  if(CHAINMATCH(p->chain,chains[i]))
3520  {
3521  found = TRUE;
3522  }
3523  }
3524 
3525  /* add new chain to array */
3526  if(!found)
3527  {
3528  /* allocate memory */
3529  if(chains == NULL)
3530  {
3531  /* first chain pointer */
3532  if( (chains = (char **)malloc(sizeof(char *))) == NULL )
3533  return(NULL);
3534  }
3535  else
3536  {
3537  /* add new chain pointer */
3538  if((chains = (char **)realloc(chains,
3539  (*nChains+1)*sizeof(char *)))
3540  == NULL)
3541  return(NULL);
3542  }
3543 
3544  /* chain */
3545  if((chains[*nChains] = (char *)malloc(8*sizeof(char)))
3546  == NULL)
3547  return(NULL);
3548 
3549  /* add chain and update count */
3550  strncpy(chains[*nChains],p->chain, 8);
3551  (*nChains)++;
3552  }
3553 
3554  /* update last chain found */
3555  strncpy(prev_chain, p->chain, 8);
3556  }
3557  }
3558 
3559  /* return chains */
3560  return chains;
3561 #endif
3562 }
3563 
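/* Illustrative sketch only; not part of Bioplib. The de-duplicating,
   grow-as-you-go pattern used by GetEntityChainLabels() above can be
   written so that a failed realloc() does not lose the existing array.
   The name demo_add_chain_label and the use of strcmp() for the label
   comparison are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: append a copy of `label` to the array unless it is
// already present. Returns 1 on success (added or already present) and
// 0 if memory could not be allocated; the array is left intact on failure.
int demo_add_chain_label(char ***chains, int *nchains, const char *label)
{
   char **grown;
   char *copy;
   int  i;

   assert(chains != NULL && nchains != NULL && label != NULL);

   // skip labels that have already been stored
   for(i = 0; i < *nchains; i++)
   {
      if(strcmp((*chains)[i], label) == 0) return 1;
   }

   // realloc(NULL, n) acts like malloc(n), so the first label needs no
   // special case; the result goes to a temporary so the old array
   // pointer survives a failed allocation
   grown = realloc(*chains, (size_t)(*nchains + 1) * sizeof(char *));
   if(grown == NULL) return 0;
   *chains = grown;

   // 8-byte, zero-filled label buffer; at most 7 characters are copied
   // so the stored label is always terminated
   copy = calloc(8, sizeof(char));
   if(copy == NULL) return 0;
   strncpy(copy, label, 7);

   (*chains)[*nchains] = copy;
   (*nchains)++;
   return 1;
}
```

   The caller starts with a NULL array and a count of zero, calls the
   helper once per atom record of interest, and finally frees each label
   followed by the array itself.
*/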
3564 
3565 /************************************************************************/
3566 /*>static BOOL SetPDBDateField(char *pdb_date, char *pdbml_date)
3567  -------------------------------------------------------------
3568 *//**
3569 
3570  \param[out] *pdb_date PDB date string 'dd-MTH-yy'
3571  \param[in] *pdbml_date PDBML date string 'yyyy-mm-dd'
3572  \return Success?
3573 
3574  Convert pdbml date format to pdb date format.
3575 
3576 - 10.09.14 Original. By: CTP
3577 
3578 */
3579 static BOOL SetPDBDateField(char *pdb_date, char *pdbml_date)
3580 {
3581  char month_letter[12][4] = {"JAN","FEB","MAR","APR","MAY","JUN",
3582  "JUL","AUG","SEP","OCT","NOV","DEC"};
3583  int day = 0,
3584  month = 0,
3585  year = 0,
3586  items = 0;
3587 
3588  /* parse pdbml date */
3589  items = sscanf(pdbml_date, "%4d-%2d-%2d", &year, &month, &day);
3590 
3591  /* error check */
3592  if(items != 3 ||
3593  year == 0 || month == 0 || day == 0 ||
3594  day < 1 || day > 31 ||
3595  month < 1 || month > 12 ||
3596  year < 1900)
3597  {
3598  /* conversion failed */
3599  strncpy(pdb_date, " ", 10);
3600  return FALSE;
3601  }
3602 
3603  /* set pdb date */
3604  sprintf(pdb_date, "%02d-%3s-%02d",
3605  day, month_letter[month - 1], year % 100);
3606 
3607  return TRUE;
3608 }
3609 
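/* Illustrative sketch only; not part of Bioplib. It restates the
   'yyyy-mm-dd' to 'dd-MTH-yy' conversion performed by SetPDBDateField()
   as a stand-alone function with a bounded output buffer. The name
   demo_pdbml_to_pdb_date, the outlen parameter and the 9-character blank
   written on failure are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: convert "yyyy-mm-dd" to "dd-MTH-yy".
// Returns 1 on success, 0 if the input cannot be parsed or is out of range.
int demo_pdbml_to_pdb_date(char *out, size_t outlen, const char *in)
{
   static const char *months[12] = {"JAN","FEB","MAR","APR","MAY","JUN",
                                    "JUL","AUG","SEP","OCT","NOV","DEC"};
   int year = 0, month = 0, day = 0;

   // the output needs room for 9 characters plus the terminator
   assert(out != NULL && in != NULL && outlen >= 10);

   // the range checks also reject zero values, so no separate
   // comparison against 0 is required
   if(sscanf(in, "%4d-%2d-%2d", &year, &month, &day) != 3 ||
      day   < 1 || day   > 31 ||
      month < 1 || month > 12 ||
      year  < 1900)
   {
      // assumed failure value: a blank 9-character date field
      snprintf(out, outlen, "%9s", "");
      return 0;
   }

   snprintf(out, outlen, "%02d-%3s-%02d",
            day, months[month - 1], year % 100);
   return 1;
}
```

   For example, the input "2015-07-01" yields "01-JUL-15", while
   "2015-13-01" is rejected because the month is out of range.
*/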
3610 
3611 /************************************************************************/
3612 /*>static int ParseConectPDBML(xmlDoc *document, PDB *pdb)
3613  -------------------------------------------------------
3614 *//**
3615  \param[in] *document XML document
3616  \param[in,out] *pdb PDB linked list
3617  \return Number of CONECT records stored; -1 on error.
3618 
3619  Parses PDBML struct_conn data and adds CONECT records for covalent
      bonds to the PDB linked list. The list is freed if an error occurs.
3620 
3621 - 28.04.15 Original. By: CTP
3622 - 10.05.15 Removed distance check. By: CTP
3623 - 13.05.15 Removed dynamic variables. By: CTP
3624 
3625 */
3626 static int ParseConectPDBML(xmlDoc *document, PDB *pdb)
3627 {
3628 #ifndef XML_SUPPORT
3629 
3630  /* PDBML format not supported. */
3631  return(-1);
3632 
3633 #else
3634 
3635  /* Parse PDBML CONECT Records */
3636 
3637  xmlNode *root_node = NULL,
3638  *sites_node = NULL,
3639  *conect_node = NULL,
3640  *n = NULL;
3641 
3642  xmlChar *content;
3643  double content_lf;
3644 
3645  PDB conect_one,
3646  conect_two,
3647  *conect_a = NULL,
3648  *conect_b = NULL,
3649  *p = NULL;
3650 
3651  BOOL valid_conect = FALSE;
3652 
3653  int nconect = 0;
3654 
3655 
3656  /* Parse Document Tree */
3657  root_node = xmlDocGetRootElement(document);
3658  sites_node = NULL;
3659  for(n=root_node->children; n!=NULL; NEXT(n))
3660  {
3661  /* Find CONECT Sites Node */
3662  if(!strcmp("struct_connCategory", (char *)n->name))
3663  {
3664  /* Found CONECT Sites */
3665  sites_node = n;
3666  break;
3667  }
3668  }
3669 
3670  /* Read CONECT data */
3671  if(sites_node != NULL)
3672  {
3673  /* Scan conect nodes */
3674  for(conect_node = sites_node->children; conect_node;
3675  conect_node = conect_node->next)
3676  {
3677  if(conect_node->type != XML_ELEMENT_NODE){ continue; }
3678 
3679  /* Reset valid conect flag */
3680  valid_conect = FALSE;
3681 
3682  /* Set default values */
3683  CLEAR_PDB((&conect_one));
3684  strcpy(conect_one.chain, "");
3685  strcpy(conect_one.atnam, "");
3686  strcpy(conect_one.resnam, "");
3687  strcpy(conect_one.insert, " ");
3688  strcpy(conect_one.element, "");
3689  strcpy(conect_one.segid, "");
3690  CLEAR_PDB((&conect_two));
3691  strcpy(conect_two.chain, "");
3692  strcpy(conect_two.atnam, "");
3693  strcpy(conect_two.resnam, "");
3694  strcpy(conect_two.insert, " ");
3695  strcpy(conect_two.element, "");
3696  strcpy(conect_two.segid, "");
3697 
3698  /* Scan conect node children */
3699  for(n=conect_node->children; n!=NULL; NEXT(n))
3700  {
3701  if(n->type != XML_ELEMENT_NODE){ continue; }
3702  if((content = xmlNodeGetContent(n))==NULL)
3703  {
3704  /* Failed to assign memory
3705  Free memory and return error
3706  */
3707  FREELIST(pdb, PDB);
3708  return(-1);
3709  }
3710 
3711  /* Check CONECT type
3712  BiopLib only handles covalent bonding
3713  if not covalent bond then skip node
3714  */
3715  if(!strcmp((char *)n->name, "conn_type_id"))
3716  {
3717  if(!strncmp((char *)content,"covale",6) ||
3718  !strncmp((char *)content,"disulf",6) ||
3719  !strncmp((char *)content,"modres",6))
3720  {
3721  /* mark as covalent bond */
3722  valid_conect = TRUE;
3723  }
3724  else
3725  {
3726  /* skip remaining child nodes */
3727  valid_conect = FALSE;
3728  xmlFree(content);
3729  break;
3730  }
3731  }
3732 
3733  /* Set conect pdb data */
3734  if(!strcmp((char *)n->name, "ptnr1_auth_asym_id"))
3735  {
3736  strcpy(conect_one.chain, (char *)content);
3737  }
3738  else if(!strcmp((char *)n->name, "ptnr1_auth_comp_id"))
3739  {
3740  strcpy(conect_one.resnam, (char *)content);
3741  }
3742  else if(!strcmp((char *)n->name, "ptnr1_auth_seq_id"))
3743  {
3744  sscanf((char *)content, "%lf", &content_lf);
3745 conect_one.resnum = (int)content_lf;
3746  }
3747  else if(!strcmp((char *)n->name, "ptnr1_label_asym_id"))
3748  {
3749  if(strlen(conect_one.chain) == 0)
3750  {
3751  strncpy(conect_one.chain, (char *)content, 8);
3752  }
3753  }
3754  else if(!strcmp((char *)n->name, "ptnr1_label_atom_id"))
3755  {
3756  if(strlen(conect_one.atnam) == 0)
3757  {
3758  strncpy(conect_one.atnam, (char *)content, 8);
3759  PADMINTERM(conect_one.atnam, 4);
3760  }
3761  }
3762  else if(!strcmp((char *)n->name, "ptnr1_label_comp_id"))
3763  {
3764  if(strlen(conect_one.resnam) == 0)
3765  {
3766  strncpy(conect_one.resnam, (char *)content, 8);
3767  }
3768  }
3769  else if(!strcmp((char *)n->name, "ptnr1_label_seq_id"))
3770  {
3771  if((conect_one.resnum == 0) &&
3772  (strlen((char *)content) > 0))
3773  {
3774  content_lf = (REAL)0.0;
3775  sscanf((char *)content, "%lf", &content_lf);
3776 conect_one.resnum = (int)content_lf;
3777  }
3778  }
3779  else if(!strcmp((char *)n->name, "ptnr2_auth_asym_id"))
3780  {
3781  strcpy(conect_two.chain, (char *)content);
3782  }
3783  else if(!strcmp((char *)n->name, "ptnr2_auth_comp_id"))
3784  {
3785  strcpy(conect_two.resnam, (char *)content);
3786  }
3787  else if(!strcmp((char *)n->name, "ptnr2_auth_seq_id"))
3788  {
3789  sscanf((char *)content, "%lf", &content_lf);
3790 conect_two.resnum = (int)content_lf;
3791  }
3792  else if(!strcmp((char *)n->name, "ptnr2_label_asym_id"))
3793  {
3794  if(strlen(conect_two.chain) == 0)
3795  {
3796  strncpy(conect_two.chain, (char *)content, 8);
3797  }
3798  }
3799  else if(!strcmp((char *)n->name, "ptnr2_label_atom_id"))
3800  {
3801  if(strlen(conect_two.atnam) == 0)
3802  {
3803  strncpy(conect_two.atnam, (char *)content, 8);
3804  PADMINTERM(conect_two.atnam, 4);
3805  }
3806  }
3807  else if(!strcmp((char *)n->name, "ptnr2_label_comp_id"))
3808  {
3809  if(strlen(conect_two.resnam) == 0)
3810  {
3811  strncpy(conect_two.resnam, (char *)content, 8);
3812  }
3813  }
3814  else if(!strcmp((char *)n->name, "ptnr2_label_seq_id"))
3815  {
3816  if((conect_two.resnum == 0) &&
3817  (strlen((char *)content) > 0))
3818  {
3819  content_lf = (REAL)0.0;
3820  sscanf((char *)content, "%lf", &content_lf);
3821 conect_two.resnum = (int)content_lf;
3822  }
3823  }
3824 
3825  /* free xml content */
3826  xmlFree(content);
3827 
3828  } /* end conect node */
3829 
3830 
3831  /* Filter CONECT records
3832  1. Covalent bond type
3833  2. Atoms found in pdb list
3834  */
3835 
3836  /* 1. Skip unless CONECT entry is for covalent bond */
3837  if(!valid_conect){ continue; }
3838 
3839 
3840  /* 2. Find conect atoms in pdb list */
3841  conect_a = NULL;
3842  conect_b = NULL;
3843  for( p=pdb; p!=NULL && (conect_a == NULL || conect_b == NULL);
3844  NEXT(p) )
3845  {
3846  if(CHAINMATCH(p->chain,conect_one.chain) &&
3847  (p->resnum == conect_one.resnum) &&
3848  !strncmp(p->atnam,conect_one.atnam,8))
3849  {
3850  conect_a = p;
3851  }
3852 
3853  if(CHAINMATCH(p->chain,conect_two.chain) &&
3854  (p->resnum == conect_two.resnum) &&
3855  !strncmp(p->atnam,conect_two.atnam,8))
3856  {
3857  conect_b = p;
3858  }
3859 
3860  }
3861  /* skip if conect atoms not found in list */
3862  if(!(conect_a && conect_b)){ continue; }
3863 
3864 
3865  /* Add CONECT record */
3866  if(blAddConect(conect_a,conect_b))
3867  {
3868  /* increment nconect counter */
3869  nconect++;
3870  }
3871  else
3872  {
3873  /* failed to add conect record */
3874  FREELIST(pdb,PDB);
3875  return(-1);
3876  }
3877 
3878  } /* end conect node */
3879  } /* end conect sites */
3880 
3881  /* Return number of CONECT records stored */
3882  return(nconect);
3883 
3884 #endif
3885 }
3886 
3887 /************************************************************************/
3888 /*>static STRINGLIST *DoStoreStringlist(STRINGLIST *stringlist,
3889  char *record, int *lines_stored,
3890  char *token, char *content,
3891  char *terminator)
3892  ---------------------------------------------------------------------
3893 *//**
3894 
3895  \param[in] *stringlist Stringlist with header information
3896  (eg COMPND lines)
3897  \param[in] *record Record type (eg COMPND)
3898  \param[in,out] *lines_stored Number of lines stored
3899  \param[in] *token Item stored (eg EC:)
3900  \param[in] *content Content
3901  \param[in] *terminator Terminator for line (";" or empty string)
3902  \return STRINGLIST with header information.
3903 
3904  Stores header information in PDB-format. Data are split over multiple
3905  lines if required.
3906 
3907 - 13.05.15 Original. By: CTP
3908 - 25.06.15 Doesn't upcase CHAIN By: ACRM
3909 
3910 */
3911 static STRINGLIST *DoStoreStringlist(STRINGLIST *stringlist,
3912  char *record, int *lines_stored,
3913  char *token, char *content,
3914  char *terminator)
3915 {
3916 #ifndef XML_SUPPORT
3917 
3918  /* PDBML format not supported. */
3919  return(NULL);
3920 
3921 #else
3922 
3923 /* Store PDBML content as PDB-formatted stringlist */
3924  char content_line[82] = "",
3925  content_field[71] = "",
3926  *content_string = NULL;
3927 
3928  int cut_from = 0,
3929  cut_to = 0,
3930  nlines = 0,
3931  i = 0;
3932 
3933 
3934  /* Return if no content */
3935  if(strlen(content) == 0)
3936  {
3937  return stringlist;
3938  }
3939 
3940  /* get lines stored */
3941  nlines = *lines_stored;
3942 
3943  /* make content string */
3944  if((content_string = (char *)malloc((1+strlen(token) +
3945  strlen(content) +
3946  strlen(terminator))*sizeof(char)))
3947  ==NULL)
3948  {
3949  /* malloc failed */
3950  free(stringlist);
3951  return NULL;
3952  }
3953  /* add token, content and terminator */
3954  strcpy(content_string, token );
3955  strcpy(&content_string[strlen(token)], content);
3956  strcpy(&content_string[strlen(token)+strlen(content)], terminator);
3957 
3958  /* Upcase everything except the CHAIN record which is case sensitive */
3959  if(strncmp(content_string, " CHAIN: ", 8))
3960  UPPER(content_string);
3961 
3962  /* store content string as stringlist */
3963  for(i=0; i<strlen(content_string); i++)
3964  {
3965  if(content_string[i] == ' ' ) cut_to = i;
3966  if(i == strlen(content_string) - 1) cut_to = i+1;
3967 
3968  /* split and store title line */
3969  if( (i && !((i - cut_from)%70)) ||
3970  (i == strlen(content_string)-1) )
3971  {
3972  nlines++;
3973  cut_to = (cut_from == cut_to) ? i : cut_to;
3974  strncpy(content_field,
3975  &content_string[cut_from],
3976  cut_to - cut_from);
3977  content_field[cut_to - cut_from] = '\0';
3978  PADMINTERM(content_field,70);
3979  cut_from = cut_to;
3980  i = cut_to;
3981 
3982  if(nlines == 1)
3983  {
3984  sprintf(content_line, "%-6s %s\n", record, content_field);
3985  stringlist = blStoreString(NULL,content_line);
3986  if(stringlist == NULL)
3987  {
3988  return NULL;
3989  }
3990  }
3991  else
3992  {
3993  sprintf(content_line, "%-6s %3d%s\n",
3994  record, nlines, content_field);
3995 stringlist = blStoreString(stringlist,content_line);
3996  if(stringlist == NULL)
3997  {
3998  return NULL;
3999  }
4000  }
4001  }
4002  }
4003 
4004  /* free content_string */
4005  free(content_string);
4006 
4007  /* update lines stored */
4008  *lines_stored = nlines;
4009 
4010  /* return stringlist */
4011  return stringlist;
4012 
4013 #endif
4014 }
4015 
4016 
4017 /************************************************************************/
4018 /*>static STRINGLIST *TitleStringlist(char *titlestring)
4019  -----------------------------------------------------
4020 *//**
4021 
4022  \param[in] *titlestring TITLE string
4023  \return STRINGLIST with TITLE in PDB-format.
4024 
4025  Creates TITLE line.
4026 
4027 - 13.05.15 Original. By: CTP
4028 
4029 */
4030 static STRINGLIST *TitleStringlist(char *titlestring)
4031 {
4032 #ifndef XML_SUPPORT
4033 
4034  /* PDBML format not supported. */
4035  return(NULL);
4036 
4037 #else
4038 
4039  /* Store title as PDB-formatted stringlist */
4040  STRINGLIST *title_stringlist = NULL;
4041  int start_line = 0;
4042 
4043  title_stringlist = DoStoreStringlist(NULL, "TITLE", &start_line, "",
4044  titlestring, "");
4045 
4046  return title_stringlist;
4047 
4048 #endif
4049 }
4050 
4051 
4052 /************************************************************************/
4053 /*>static STRINGLIST *CompndStringlist(STRINGLIST *stringlist,
4054  int *lines_stored,
4055  COMPND *compnd)
4056  -----------------------------------------------------------
4057 *//**
4058 
4059  \param[in] *stringlist Stringlist with COMPND information
4060  \param[in,out] *lines_stored Number of lines stored
4061  \param[in,out] *compnd Pointer to COMPND data structure.
4062  \return STRINGLIST with COMPND in PDB-format.
4063 
4064  Creates COMPND lines in PDB-format.
4065 
4066 - 13.05.15 Original. By: CTP
4067 - 14.06.15 Store chain. By: CTP
4068 
4069 */
4070 static STRINGLIST *CompndStringlist(STRINGLIST *stringlist,
4071  int *lines_stored, COMPND *compnd)
4072 {
4073 #ifndef XML_SUPPORT
4074 
4075  /* PDBML format not supported. */
4076  return(NULL);
4077 
4078 #else
4079 
4080  /* Store compound as PDB-formatted stringlist */
4081  char mol_id[4] = "";
4082 
4083  /* set MOL_ID if absent */
4084  if(compnd->molid == 0)
4085  {
4086  compnd->molid = 1;
4087  }
4088 
4089  /* mol_id */
4090  sprintf(mol_id,"%i",compnd->molid);
4091  if(*lines_stored == 0)
4092  {
4093  /* store first compnd line */
4094  stringlist = DoStoreStringlist(stringlist, "COMPND",
4095  lines_stored, "MOL_ID: ",
4096  mol_id, ";");
4097  }
4098  else
4099  {
4100  DoStoreStringlist(stringlist, "COMPND",
4101  lines_stored, " MOL_ID: ",
4102  mol_id, ";");
4103  }
4104 
4105  /* molecule */
4106  DoStoreStringlist(stringlist, "COMPND",
4107  lines_stored, " MOLECULE: ",
4108  compnd->molecule, ";");
4109 
4110  /* chain */
4111  DoStoreStringlist(stringlist, "COMPND",
4112  lines_stored, " CHAIN: ",
4113  compnd->chain, ";");
4114 
4115  /* fragment */
4116  DoStoreStringlist(stringlist, "COMPND",
4117  lines_stored, " FRAGMENT: ",
4118  compnd->fragment, ";");
4119 
4120  /* ec */
4121  DoStoreStringlist(stringlist, "COMPND",
4122  lines_stored, " EC: ",
4123  compnd->ec, ";");
4124 
4125  /* engineered */
4126  DoStoreStringlist(stringlist, "COMPND",
4127  lines_stored, " ENGINEERED: ",
4128  compnd->engineered, ";");
4129 
4130  /* mutation */
4131  DoStoreStringlist(stringlist, "COMPND",
4132  lines_stored, " MUTATION: ",
4133  compnd->mutation, ";");
4134 
4135  /* other */
4136  DoStoreStringlist(stringlist, "COMPND",
4137  lines_stored, " OTHER_DETAILS: ",
4138  compnd->other, ";");
4139 
4140  return stringlist;
4141 
4142 #endif
4143 }
4144 
4145 
4146 /************************************************************************/
4147 /*>static STRINGLIST *SourceStringlist(STRINGLIST *stringlist,
4148  int *lines_stored, int mol_id,
4149  PDBSOURCE *source)
4150  --------------------------------------------------------------------
4151 *//**
4152 
4153  \param[in] *stringlist Stringlist with SOURCE information
4154  \param[in,out] *lines_stored Number of lines stored
4155  \param[in] mol_id MOL_ID of associated COMPND entry.
4156  \param[in] *source Pointer to PDBSOURCE data structure.
4157  \return STRINGLIST with SOURCE in PDB-format.
4158 
4159  Creates SOURCE lines in PDB-format.
4160 
4161 - 13.05.15 Original. By: CTP
4162 
4163 */
4164 static STRINGLIST *SourceStringlist(STRINGLIST *stringlist,
4165  int *lines_stored, int mol_id,
4166  PDBSOURCE *source)
4167 {
4168 #ifndef XML_SUPPORT
4169 
4170  /* PDBML format not supported. */
4171  return(NULL);
4172 
4173 #else
4174 
4175  /* Store source as PDB-formatted stringlist */
4176  char buffer[8] = "";
4177 
4178  /* mol_id */
4179  sprintf(buffer,"%i",mol_id);
4180  if(*lines_stored == 0)
4181  {
4182 /* store first source line */
4183  stringlist = DoStoreStringlist(stringlist, "SOURCE",
4184  lines_stored,"MOL_ID: ",
4185  buffer, ";");
4186  }
4187  else
4188  {
4189  DoStoreStringlist(stringlist, "SOURCE",
4190  lines_stored, " MOL_ID: ",
4191  buffer, ";");
4192  }
4193 
4194  /* scientific name */
4195  DoStoreStringlist(stringlist, "SOURCE",
4196  lines_stored, " ORGANISM_SCIENTIFIC: ",
4197  source->scientificName, ";");
4198 
4199  /* common name */
4200  DoStoreStringlist(stringlist, "SOURCE",
4201  lines_stored, " ORGANISM_COMMON: ",
4202  source->commonName, ";");
4203 
4204  /* taxon id */
4205  sprintf(buffer,"%i",source->taxid);
4206  DoStoreStringlist(stringlist, "SOURCE",
4207  lines_stored, " ORGANISM_TAXID: ",
4208  buffer, ";");
4209 
4210  /* strain */
4211  DoStoreStringlist(stringlist, "SOURCE",
4212  lines_stored, " STRAIN: ",
4213  source->strain, ";");
4214 
4215  return stringlist;
4216 
4217 #endif
4218 }
4219 
4220 /************************************************************************/
4221 /*>static STRINGLIST *SeqresStringlist(int nchains, char **chains,
4222  STRINGLIST **residues)
4223  ---------------------------------------------------------------
4224 *//**
4225 
4226  \param[in] nchains Number of chains
4227  \param[in] **chains Array of chain labels.
4228  \param[in] **residues Array of STRINGLISTS containing residue
4229  sequence for each chain.
4230  \return STRINGLIST with SEQRES in PDB-format.
4231 
4232  Creates SEQRES records in PDB-format.

- 23.06.15 Original. By: CTP

*/
static STRINGLIST *SeqresStringlist(int nchains, char **chains,
                                    STRINGLIST **residues)
{
#ifndef XML_SUPPORT

   /* PDBML format not supported. */
   return(NULL);

#else

   STRINGLIST *stringlist = NULL,
              *residue = NULL;
   int i = 0, j = 0, nres = 0;
   int nline = 0;
   char seqres_line[82] = "";
   char residue_field[5] = "";
   char sequence_field[53] = "";


   /* process chains */
   for(i=0; i<nchains; i++)
   {
      /* find number of residues */
      nres = 0;
      for(residue = residues[i]; residue != NULL; NEXT(residue))
      {
         nres++;
      }

      /* store string */
      for(residue=residues[i], j=0, nline=0;
          residue!=NULL;
          NEXT(residue))
      {
         j++;
         sprintf(residue_field,"%4s",residue->string);
         strncat(sequence_field,residue_field,4);
         if(j%13 == 0 || j == nres)
         {
            /* store line */
            nline++;
            sprintf(seqres_line,"SEQRES%4d%2s %4d %-52s \n",
                    nline,chains[i],nres,sequence_field);
            sequence_field[0] = '\0';
            stringlist = blStoreString(stringlist,seqres_line);
         }
      }
   }

   return stringlist;

#endif
}



#endif