Using REST

Practical work

Your task now is to modify the code you wrote for screen scraping so that it accesses the following URL:

http://www.bioinf.org.uk/servers/pdbsws/query.cgi?qtype=pdb&id=1bwi&res=35&plain=1

Again, start with dummy code in which ReadPDBSWS() simply returns fixed values:

#!/usr/bin/env python3

from urllib import request
import sys
import re

def ReadPDBSWS(pdbcode, resnum):
    """Dummy version: return a fixed accession and UniProt residue number."""
    ac = 'P12345'
    upresnum = 666
    return (ac, upresnum)

# Main program

pdbcode = '1bwi'
resnum  = 35

(ac, upresnum) = ReadPDBSWS(pdbcode, resnum)

print("Accession:      " + ac)
print("UniProt Resnum: %d" % upresnum)

Now modify the ReadPDBSWS() function so that it creates the URL:

    url = 'http://www.bioinf.org.uk/servers/pdbsws/query.cgi?plain=1&qtype=pdb'
    url += '&id=' + pdbcode
    url += '&res=' + str(resnum)
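
As an alternative to concatenating strings, the query string can be built with urllib.parse.urlencode(), which also escapes any characters that are not safe in a URL. The following is a minimal sketch; the names BASE_URL and build_pdbsws_url() are introduced here for illustration and are not part of the original program.

```python
from urllib import parse

# Hypothetical constant holding the part of the URL before the '?'
BASE_URL = 'http://www.bioinf.org.uk/servers/pdbsws/query.cgi'

def build_pdbsws_url(pdbcode, resnum):
    """Return the PDBSWS query URL for a PDB code and residue number."""
    params = {
        'plain': 1,
        'qtype': 'pdb',
        'id':    pdbcode,
        'res':   resnum,     # urlencode() converts numbers to strings
    }
    return BASE_URL + '?' + parse.urlencode(params)

print(build_pdbsws_url('1bwi', 35))
```

The parameters appear in the order in which they were added to the dictionary, so this produces the same URL as the string concatenation above.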

Read the URL and decode the resulting information.

    result = request.urlopen(url).read()
    result = str(result, encoding='utf-8')
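
The two lines above assume that the server responds. A slightly more defensive version might set a timeout and catch network errors, as in the sketch below. The function name fetch_text() and the 10-second timeout are assumptions made for this example.

```python
import sys
from urllib import request

def fetch_text(url, timeout=10):
    """Return the body of url decoded as UTF-8, or None if reading fails."""
    try:
        # The 'with' block closes the connection when reading is done
        with request.urlopen(url, timeout=timeout) as response:
            return response.read().decode('utf-8')
    except OSError as e:
        # urllib.error.URLError, HTTPError and socket timeouts
        # are all subclasses of OSError
        print('Could not read %s: %s' % (url, e), file=sys.stderr)
        return None
```

The caller can then test whether the return value is None before trying to match any patterns against it.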

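Replace all the newline characters with a # sign to make pattern matching easier. By default, the . in a regular expression does not match a newline, so a pattern starting with .* cannot reach beyond the first line.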

    result = result.replace('\n', '#')
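
To see why this replacement helps, the short sketch below applies the same kind of pattern to a made-up two-line string before and after flattening. The sample text is invented for illustration and is not real server output.

```python
import re

# Made-up text standing in for a server response
text = 'AC: P00698\nUPCOUNT: 53\n'

pattern = re.compile(r'.*UPCOUNT:\s+(\d+)')

# match() starts at the beginning of the string and '.' will not
# cross the newline, so the second line is never reached
before = pattern.match(text)
print(before)                    # None

# After flattening, '.*' can run past the '#' that replaced the newline
flattened = text.replace('\n', '#')
after = pattern.match(flattened)
print(after.group(1))            # 53

# An alternative that leaves the text untouched: search line by line
alt = re.search(r'^UPCOUNT:\s+(.*)$', text, re.MULTILINE)
print(alt.group(1))              # 53
```

Either approach works; the # replacement keeps the patterns in this practical simple.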

Match a pattern consisting of AC: followed by one or more whitespace characters, then capture the minimum number of characters before the next # sign.

    pattern  = re.compile(r'.*AC:\s+(.*?)#')
    match    = pattern.match(result)
    ac       = match.group(1)
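
The ? in (.*?) is what makes the capture stop at the first # sign rather than the last. The sketch below compares the two forms on a made-up flattened string (again, invented sample text rather than real output).

```python
import re

# Made-up flattened response text
sample = 'PDB: 1bwi#AC: P00698#UPCOUNT: 53#'

# Non-greedy: capture as few characters as possible before a '#'
lazy = re.match(r'.*AC:\s+(.*?)#', sample)
print(lazy.group(1))      # P00698

# Greedy: capture as many characters as possible before a '#'
greedy = re.match(r'.*AC:\s+(.*)#', sample)
print(greedy.group(1))    # P00698#UPCOUNT: 53
```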

Repeat, but look for UPCOUNT: instead of AC: and return the results.

    pattern  = re.compile(r'.*UPCOUNT:\s+(.*?)#')
    match    = pattern.match(result)
    upresnum = int(match.group(1))

    return(ac, upresnum)
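
Since the two pattern-matching steps differ only in the label, they could be factored into a small helper that also checks whether the pattern matched at all. This is one possible sketch; the function name extract_field() and the sample string are assumptions made for this example.

```python
import re

def extract_field(text, label):
    """Return the value that follows 'label:' in '#'-separated text.

    Raises ValueError if the label is not present.
    """
    pattern = re.compile(r'.*' + re.escape(label) + r':\s+(.*?)#')
    match = pattern.match(text)
    if match is None:
        # Without this check, match.group(1) would fail with AttributeError
        raise ValueError('Field %s not found in response' % label)
    return match.group(1)

# Made-up flattened response text
sample = 'PDB: 1bwi#AC: P00698#UPCOUNT: 53#'

ac       = extract_field(sample, 'AC')
upresnum = int(extract_field(sample, 'UPCOUNT'))
print(ac, upresnum)
```

Raising an exception when a field is missing lets the calling code decide how to report the problem, for example when the residue is not found in the mapping.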

You should now have a working program that obtains the information from PDBSWS.

You might now want to modify the program:

to add more error checking,
to allow the chain to be specified as well as the residue number,
to allow the PDB code, chain and residue number to be specified on the command line.
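
For the command-line option, the standard argparse module is one way to read the three values. The sketch below only parses the arguments; the argument names are chosen for this example, and how the chain is passed to PDBSWS is left open because the query parameter for it is not shown above.

```python
import argparse

def parse_args(argv=None):
    """Parse the PDB code, chain and residue number from the command line."""
    parser = argparse.ArgumentParser(
        description='Map a PDB residue to UniProt numbering using PDBSWS')
    parser.add_argument('pdbcode', help='PDB code, e.g. 1bwi')
    parser.add_argument('chain', help='chain label, e.g. A')
    parser.add_argument('resnum', type=int, help='residue number in the PDB file')
    # argv=None makes argparse read sys.argv[1:]
    return parser.parse_args(argv)

# Passing a list here simulates typing: program.py 1bwi A 35
args = parse_args(['1bwi', 'A', '35'])
print(args.pdbcode, args.chain, args.resnum)
```

In the real program, parse_args() would be called with no arguments and the resulting values passed on to ReadPDBSWS().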
