Writing High Quality Code

I was send the following code by a student as a response to the Web Services practical:

## ~!/usr/bin/env python3

from urllib import request
import sys
import re

def ParseRST(result):
    p = re.compile('.*AC:\s+(.*?)#')
    m = p.match(result)
    ac = m.group(1)
    q = re.compile('.*UPCOUNT:\s+(.*?)#')
    n = q.match(result)
    upresnum = int(n.group(1))

    return(ac, upresnum)

def ReadPDBSWS(pdbcode, chain, resnum):
    url = 'http://www.bioinf.org.uk/cgi-bin/pdbsws/query.pl?plain=1&qtype=pdb'
    url += '&id=' + pdbcode
    if ('&chain' != ''):
        url += '&chain=' + chain
    url += '&res=' + str(resnum)
    result = request.urlopen(url).read()
    result = str(result, encoding='utf-8')
    result = result.replace('\n', '#')

    if (result != ''):
        rst = ParseRST(result)
        return(rst)
    else:
        sys.stderr.write('Nothing was returned\n')
    return('')

## Main program ##
pdbcode = input('Enter the pdbcode: ')
chain = input('Enter the chain label (optional): ')
resnum =  input('Enter the residue number: ')

(ac, upresnum) = ReadPDBSWS(pdbcode, chain, resnum)

print('Accession:        ' + ac)
print('UniProt Resnum: ' + str(upresnum))

[Download]

This was a well-written functioning piece of code, but there are a number of things that can be done to make it an excellent piece of code that would score higher marks in the assessed coursework.

  1. The shebang line was not valid (at least for unix/Linux). It must be the very first line and should read:
    #!/usr/bin/env python3
    
  2. Every source code file should contain a header stating the function of the code, who wrote it and when, how to run it, copyright information, a list of modifications, etc. In Python, this is conventionally done with a 'docstring' (see PEP257).
  3. Every function should be clearly separated from the next function and should have its own header to say what it does, who wrote it and when, how it has been revised and how it is called. In Python, this is conventionally done with a 'docstring' (see PEP257).
  4. Make your variable names informative. Resist the urge to use 1-character variable names (except for counters) and use proper names
  5. Code should be laid out neatly - make it look pretty, line up equals signs, etc. Separate logical blocks of code with a blank line
  6. Avoid hard-coding. Anything that is hard coded must go at the top of the file. Even better it goes into a separate configuration module or is read from a configuration file.
  7. Chain labels are not optional in a PDB file, so it is better to regard a chain plus residue number as a residue identifier. (e.g. A23)
  8. In the original code, ReadPDBSWS() returned a list when it succeeded, but a scalar when it failed. It could exit the code instead, but this is not a good idea for code reusability.
  9. ReadPDBSWS() gave an error message when the routine failed. It is better to handle errors at the highest level of the code.
  10. Programs designed for an end-user should have a help message.
  11. Generally it is better for programs to take values from the command line rather than prompt for them.
# Config.py - configuration information
baseurl = 'http://www.bioinf.org.uk/cgi-bin/pdbsws/query.pl?plain=1&qtype=pdb'

[Download]

#!/usr/bin/env python3
"""
Program:    pdb2sprot
File:       pdb2sprot.py

Version:    V1.0
Date:       01.03.18
Function:   Obtain SwissProt information for a PDB code and residue
            number using PDBSWS

Copyright:  (c) Dr. Andrew C. R. Martin, UCL, 2018
Author:     Dr. Andrew C. R. Martin
Address:    Institute of Structural and Molecular Biology
            Division of Biosciences
            University College London
            
--------------------------------------------------------------------------

This program is released under the GNU Public Licence (GPL V3)

--------------------------------------------------------------------------
Description:
============
This program takes a PDB code and residue identifier and returns
the relevant UniProtKB/SwissProt (or trEMBL) accession together with
the residue number in the sequence entry

--------------------------------------------------------------------------
Usage:
======
pdb2sprot PDBID RESID

--------------------------------------------------------------------------
Revision History:
=================
V1.0   01.03.18   Original   By: ACRM
"""

#*************************************************************************
# Import libraries
from urllib import request
import sys
import re
import config

#*************************************************************************
def ParseRST(result):
    """Parse the results from PDBSWS

    Input:  result       --- The result returned by PDBSWS
    Return: (ac, resnum) --- A list containing the UniProt accession
                             code and the UniProt residue number

    01.03.18  Original   By: ACRM
    """

    pattern = re.compile('.*AC:\s+(.*?)#')
    match   = pattern.match(result)
    ac      = match.group(1)

    pattern = re.compile('.*UPCOUNT:\s+(.*?)#')
    match   = pattern.match(result)
    resnum  = int(match.group(1))

    return(ac, resnum)

#*************************************************************************
def ReadPDBSWS(pdbcode, resid):
    """Obtain the UniProt accession, and residue number from PDBSWS
       given a PDB code, and residue identifier

    Input:  pdbcode      --- A PDB code (e.g. 8cat)
            resid        --- A residue identifier (e.g. A23)
    Return: (ac, resnum) --- A list containing the UniProt accession
                             code and the UniProt residue number

    01.03.18  Original   By: ACRM
    """

    # Obtain the chain and residue number from the resid
    # e.g.                  A          23    C                      
    pattern = re.compile('([a-zA-Z]+)([0-9]+[a-zA-z]*)')
    match   = pattern.match(resid)
    chain   = match.group(1)
    resnum  = match.group(2)

    url     = config.baseurl
    url    += '&id='    + pdbcode
    url    += '&chain=' + chain
    url    += '&res='   + str(resnum)

    result  = request.urlopen(url).read()
    result  = str(result, encoding='utf-8')
    result  = result.replace('\n', '#')

    # Success - parse and return the list
    if (result != ''):
        rst = ParseRST(result)
        return(rst)

    # Failure - return a blank list
    return([])


#*************************************************************************
def UsageDie():
    """Print a usage message and exit

    01.03.18  Original   By: ACRM
    """

    print(
"""
pdb2sws V1.0 (c) 2018 UCL, Dr. Andrew C.R. Martin

Usage: pdb2sws pdbcode resid

Obtain the UniProtKB/SwissProt accession and residue number for a 
PDB code (e.g. 8cat) and residue identifier (e.g. A23)
""")
    exit(0)


#*************************************************************************
#*** Main program                                                      ***
#*************************************************************************
if ((len(sys.argv) < 3) or (sys.argv[1] == '-h')):
   UsageDie()

pdbcode = sys.argv[1]
resid   = sys.argv[2]

results = ReadPDBSWS(pdbcode, resid)
if (len(results)):
    ac = results[0]
    upresnum = results[1]
    print('Accession:      ' + ac)
    print('UniProt Resnum: ' + str(upresnum))
else:
    print('PDB code or residue identifier not found.')


[Download]

Continue