Read a PIR sequence file.

#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include "SysDefs.h"
#include "macros.h"
#include "seq.h"

Functions
int	blReadPIR (FILE *fp, BOOL DoInsert, char **seqs, int maxchain, SEQINFO *seqinfo, BOOL *punct, BOOL *error)

Detailed Description

Read a PIR sequence file.

Version: V2.8

Date: 07.07.14

Copyright: (c) UCL / Dr. Andrew C. R. Martin 1991-2014

Author: Dr. Andrew C. R. Martin

Institute of Structural & Molecular Biology, University College London, Gower Street, London. WC1E 6BT.

andrew@bioinf.org.uk andrew.martin@ucl.ac.uk

This code is NOT IN THE PUBLIC DOMAIN, but it may be copied according to the conditions laid out in the accompanying file COPYING.DOC.

The code may be modified as required, but any modifications must be documented so that the person responsible can be identified.

The code may not be sold commercially or included as part of a commercial product except as described in the file COPYING.DOC.

Description:

Usage:

int blReadPIR(FILE *fp, BOOL DoInsert, char **seqs, int maxchain, SEQINFO *seqinfo, BOOL *punct, BOOL *error)

This version attempts to read any PIR file following the PIR specifications. It also accepts a few non-standard features: lower case sequence, no star at end of last chain, dashes in the sequence to indicate insertions.

Revision History:

V1.0 01.06.92 Original
V2.0 08.03.94 Changed name of ReadPIR() to ReadSimplePIR(). Added new ReadPIR().
V2.1 18.03.94 getc() -> fgetc()
V2.2 11.05.94 Changes to ReadPIR() for better compatibility with PIR V38.0 and V39.0
V2.3 28.02.95 Added ReadRawPIR()
V2.4 13.03.95 Fixed bug in reading text lines in ReadRawPIR()
V2.5 26.07.95 Removed unused variables
V2.6 30.10.95 Cosmetic
V2.7 06.02.96 Removes trailing spaces from comment line
V2.8 07.07.14 Use bl prefix for functions By: CTP

Definition in file ReadPIR.c.

Function Documentation

int blReadPIR	(	FILE *	fp,
		BOOL	DoInsert,
		char **	seqs,
		int	maxchain,
		SEQINFO *	seqinfo,
		BOOL *	punct,
		BOOL *	error
	)

Parameters

[in]	*fp	File pointer
[in]	DoInsert	TRUE: read '-' characters into the sequence. FALSE: skip '-' characters.
[in]	maxchain	Max number of chains to read. This is the dimension of the seqs array. N.B. THIS SHOULD BE AT LEAST 1 MORE THAN THE EXPECTED MAXIMUM NUMBER OF SEQUENCES
[out]	**seqs	Array of character pointers which will be filled in with sequence information. Memory will be allocated for any sequence length.
[out]	*seqinfo	This structure will be filled in with extra information about the sequence. Header & title information and details of any punctuation.
[out]	*punct	TRUE if any punctuation found.
[out]	*error	TRUE if an error occurred (e.g. memory allocation)

Returns: Number of chains in this sequence. 0 if file ended, or no valid sequence entries found.

This is an all-singing, all-dancing PIR reader which should handle all legal PIR files and some (slightly) incorrect ones. The only requirements of the code are that the PIR file should have 2 title lines per entry, the first line starting with a > sign.

The routine will handle multiple sequence files. Successive calls will return information on the next entry. The routine will return 0 when there are no more entries.

Header line: Must start with >. Will handle files which don't have the proper P1; or F1; parts of the header as well as those which do.
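
The header handling described above can be sketched as follows. This is only an illustration, not the code in ReadPIR.c: the function name parse_pir_header() and its buffer arguments are hypothetical, and the rule of stopping the entry code at the first space is taken from the 11.05.94 revision note.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the library): split a PIR header
   line such as ">P1;1ABC" into a two-character type and an entry code.
   If the "XX;" part is missing, type is set to "" and the text after
   '>' is used as the code. Returns 1 on success, 0 if the line does
   not start with '>' or the buffers are too small. */
int parse_pir_header(const char *line, char *type, size_t typelen,
                     char *code, size_t codelen)
{
    const char *p;
    size_t n;

    if (line == NULL || line[0] != '>' || typelen < 3 || codelen < 1)
        return 0;

    p = line + 1;
    type[0] = '\0';

    /* A well-formed header has two characters followed by ';' */
    if (p[0] != '\0' && p[1] != '\0' && p[2] == ';') {
        type[0] = p[0];
        type[1] = p[1];
        type[2] = '\0';
        p += 3;
    }

    /* Skip leading spaces, then copy up to the first white space */
    while (*p == ' ' || *p == '\t')
        p++;
    n = strcspn(p, " \t\r\n");
    if (n >= codelen)
        n = codelen - 1;
    memcpy(code, p, n);
    code[n] = '\0';
    return 1;
}
```

With this sketch, ">P1;1ABC" gives type "P1" and code "1ABC", while ">  XYZ a comment" gives an empty type and code "XYZ".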

Title line: Will read the name and source fields if correctly separated by a -, otherwise copies all information into the name.
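
A minimal sketch of the title-line rule follows. It assumes the separator is a hyphen with a space on each side (" - "), which the text above does not state explicitly; split_pir_title() and copy_trimmed() are hypothetical names and are not taken from ReadPIR.c.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy at most n characters of src into dst (of size dstlen) and
   strip trailing white space, including any newline. */
static void copy_trimmed(char *dst, size_t dstlen, const char *src, size_t n)
{
    if (dstlen == 0)
        return;
    if (n >= dstlen)
        n = dstlen - 1;
    memcpy(dst, src, n);
    while (n > 0 && isspace((unsigned char)dst[n - 1]))
        n--;
    dst[n] = '\0';
}

/* Hypothetical helper: split "name - source" into its two fields.
   ASSUMPTION: the separator is " - ". If it is absent, the whole
   line goes into name and source is left empty. */
void split_pir_title(const char *line, char *name, size_t namelen,
                     char *source, size_t sourcelen)
{
    const char *sep = strstr(line, " - ");

    if (sep != NULL) {
        copy_trimmed(name, namelen, line, (size_t)(sep - line));
        copy_trimmed(source, sourcelen, sep + 3, strlen(sep + 3));
    } else {
        copy_trimmed(name, namelen, line, strlen(line));
        copy_trimmed(source, sourcelen, "", 0);
    }
}
```

Splitting on the first separator keeps any later hyphens inside the source field; a real reader may choose differently.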

Sequence: May contain allowed punctuation. This will set the punct flag, and information on the types found will be placed in seqinfo. White space and line breaks are ignored. Each chain should end with a *, but the routine will accept the last chain of an entry with no *. While the standard requires upper case text, this routine will handle lower case and convert it to upper case. While the routine copes well with last chains not terminated with a *, a last chain ending with a / not followed by a * but followed by a text line will be identified as incomplete rather than truncated. If the DoInsert flag is set, - signs in the sequence will be read as part of the sequence; otherwise they will be skipped. This is an addition to the PIR standard.
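
The per-character rules above (ignore white space, upcase letters, keep or skip - according to the insert flag, stop at *) might look roughly like this. The function clean_pir_chain() is a hypothetical illustration rather than the library routine; in particular, it only flags punctuation and drops it, whereas the real code records the types found in seqinfo.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy one chain from raw into out (size outlen).
   Returns 1 if a terminating '*' was found, 0 if the input ran out
   first. *punct is set to 1 if any other punctuation was seen.
   ASSUMPTION: punctuation is flagged but not stored in out. */
int clean_pir_chain(const char *raw, int do_insert,
                    char *out, size_t outlen, int *punct)
{
    size_t n = 0;
    int terminated = 0;

    *punct = 0;
    if (outlen == 0)
        return 0;

    for (; *raw != '\0'; raw++) {
        unsigned char c = (unsigned char)*raw;

        if (c == '*') {            /* end of this chain */
            terminated = 1;
            break;
        }
        if (isspace(c))            /* spaces and line breaks are ignored */
            continue;
        if (c == '-') {
            if (!do_insert)        /* skip insertions unless requested */
                continue;
        } else if (isalpha(c)) {
            c = (unsigned char)toupper(c);
        } else {
            *punct = 1;            /* note punctuation, do not store it */
            continue;
        }
        if (n + 1 < outlen)        /* leave room for the terminator */
            out[n++] = (char)c;
    }
    out[n] = '\0';
    return terminated;
}
```

For the input "acd-ef\n gh*" this yields "ACD-EFGH" with the insert flag set and "ACDEFGH" without it.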

Text lines: Text lines after an entry (beginning with R;, C;, A;, N; or F;) are ignored.
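
Recognising such a text line only needs a check of the first two characters. The sketch below uses a hypothetical function name, is_pir_text_line(), and is not taken from ReadPIR.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if line begins with R;, C;, A;, N;
   or F; and 0 otherwise. */
int is_pir_text_line(const char *line)
{
    return line != NULL
        && line[0] != '\0'                 /* guard: strchr matches '\0' */
        && line[1] == ';'
        && strchr("RCANF", line[0]) != NULL;
}
```

A reader could call this on each line following the sequence and skip every line for which it returns 1.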

02.03.94 Original By: ACRM
03.03.94 Added / and = handling, upcasing, strcpy()->strncpy(), header lines without semi-colon, title lines without -
07.03.94 Added sequence insertion handling and DoInsert parameter.
11.05.94 buffer is now 504 characters (V38.0 spec allows 500 chars). Removes leading spaces from entry code and terminates at first space (V39.0 spec allows comments after the code).
28.02.95 Added check that buffer doesn't overflow. Check on nseq changed to >=
06.02.96 Removes trailing spaces from comment line
07.07.14 Use bl prefix for functions By: CTP

Definition at line 180 of file ReadPIR.c.
