Using REST

Practical work

Your task now is to modify the code you wrote for screen scraping so that it accesses the URL:

http://www.bioinf.org.uk/servers/pdbsws/query.cgi?qtype=pdb&id=1bwi&res=35&plain=1

Again, start with dummy code:

#!/usr/bin/env python3

from urllib import request
import sys
import re

def ReadPDBSWS(pdbcode, resnum):
    # Dummy values for now; the real lookup replaces these lines
    ac = 'P12345'
    upresnum = 666
    return (ac, upresnum)

# Main program

pdbcode = '1bwi'
resnum  = 35

(ac, upresnum) = ReadPDBSWS(pdbcode, resnum)

print("Accession:      " + ac)
print("UniProt Resnum: %d" % upresnum)

Now modify the ReadPDBSWS() function so that it creates the URL:

    url = 'http://www.bioinf.org.uk/servers/pdbsws/query.cgi?plain=1&qtype=pdb'
    url += '&id=' + pdbcode
    url += '&res=' + str(resnum)
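
As an alternative sketch, the same query string could be assembled with urllib.parse.urlencode from the standard library, which also escapes any characters that are not safe in a URL. The helper name BuildPDBSWSUrl is hypothetical and not part of the original exercise; the parameter names are taken from the URL shown above.

```python
from urllib import parse

BASE_URL = 'http://www.bioinf.org.uk/servers/pdbsws/query.cgi'

def BuildPDBSWSUrl(pdbcode, resnum):
    # Hypothetical helper: builds the query string from a dictionary
    params = {'plain': 1, 'qtype': 'pdb', 'id': pdbcode, 'res': resnum}
    return BASE_URL + '?' + parse.urlencode(params)

print(BuildPDBSWSUrl('1bwi', 35))
```

String concatenation, as used in the exercise, works just as well for simple values such as a PDB code and a residue number.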

Read the URL and decode the returned bytes as UTF-8 text:

    result = request.urlopen(url).read()
    result = str(result, encoding='utf-8')
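
A network request can fail or hang, so a slightly more defensive version of this step may be useful. The sketch below assumes that a 10-second timeout is acceptable and uses a hypothetical helper name, FetchText; neither is part of the original exercise.

```python
from urllib import request, error

def FetchText(url, timeout=10):
    # Hypothetical helper: returns the page as text, or None on failure
    try:
        with request.urlopen(url, timeout=timeout) as response:
            return response.read().decode('utf-8')
    except error.URLError as e:
        # URLError also covers HTTPError (e.g. a 404 from the server)
        print('Could not read ' + url + ': ' + str(e.reason))
        return None
```

The calling code would then need to check for None before attempting any pattern matching.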

Replace all the newline characters with a # sign to make pattern matching easier:

    result = result.replace('\n', '#')
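
The replacement helps because, by default, . in a regular expression does not match a newline, so a leading .* cannot skip over earlier lines. A small demonstration follows; the sample text is made up for illustration and is not real PDBSWS output.

```python
import re

# Made-up sample text, not real PDBSWS output
sample = 'PDB: 1bwi\nAC: P12345\n'

# Without the replacement, .* stops at the first newline, so there is no match
print(re.match(r'.*AC:', sample))

# After the replacement the text is a single line and the pattern matches
flat = sample.replace('\n', '#')
print(re.match(r'.*AC:', flat) is not None)
```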

Match a pattern consisting of AC: followed by one or more whitespace characters, then capture the minimum number of characters before the next # sign:

    pattern  = re.compile(r'.*AC:\s+(.*?)#')   # raw string so \s is not treated as a string escape
    match    = pattern.match(result)
    ac       = match.group(1)

Repeat, but look for UPCOUNT: instead of AC:, convert the captured value to an integer, and return the results:

    pattern  = re.compile(r'.*UPCOUNT:\s+(.*?)#')   # raw string so \s is not treated as a string escape
    match    = pattern.match(result)
    upresnum = int(match.group(1))

    return(ac, upresnum)
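
If the server returns something unexpected, such as an error page, match will be None and match.group(1) raises an AttributeError. The following is a minimal sketch of a more defensive parsing step that uses re.search with a per-line pattern instead of the # replacement. The helper name ParsePDBSWS and the sample text are hypothetical, and the sketch assumes that each field starts at the beginning of a line.

```python
import re

def ParsePDBSWS(result):
    # Hypothetical helper: returns (ac, upresnum), or None if a field is missing
    # Assumes AC: and UPCOUNT: each appear at the start of a line
    acMatch = re.search(r'^AC:\s+(\S+)', result, re.MULTILINE)
    upMatch = re.search(r'^UPCOUNT:\s+(\d+)', result, re.MULTILINE)
    if acMatch is None or upMatch is None:
        return None
    return (acMatch.group(1), int(upMatch.group(1)))

# Made-up sample text reusing the dummy values; not real PDBSWS output
sample = 'AC: P12345\nUPCOUNT: 666\n'
print(ParsePDBSWS(sample))
print(ParsePDBSWS('no such entry'))
```

Returning None leaves it to the main program to decide how to report a failed lookup.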

You should now have a working program that obtains the information from PDBSWS.

You might now want to modify the program:
