Your task now is to modify the code you wrote for screen scraping such that it accesses the URL:
http://www.bioinf.org.uk/servers/pdbsws/query.cgi?qtype=pdb&id=1bwi&res=35&plain=1
Again, start with dummy code:
#!/usr/bin/env python3
from urllib import request
import sys
import re

def ReadPDBSWS(pdbcode, resnum):
    # Dummy values for now; the real lookup is added in the steps below
    ac = 'P12345'
    upresnum = 666
    return(ac, upresnum)

""" Main program """
pdbcode = '1bwi'
resnum = 35
(ac, upresnum) = ReadPDBSWS(pdbcode, resnum)
print("Accession: " + ac)
print("UniProt Resnum: %d" % upresnum)
Now modify the ReadPDBSWS() function so that it creates the URL:
url = 'http://www.bioinf.org.uk/servers/pdbsws/query.cgi?plain=1&qtype=pdb'
url += '&id=' + pdbcode
url += '&res=' + str(resnum)
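As an alternative to string concatenation, the query string could be built with urllib.parse.urlencode, which also escapes any awkward characters in the values. This is only a sketch: the helper name build_pdbsws_url and the BASE_URL constant are made up for illustration and are not part of the practical.

```python
from urllib.parse import urlencode

# Base address taken from the URL used in this practical
BASE_URL = 'http://www.bioinf.org.uk/servers/pdbsws/query.cgi'

def build_pdbsws_url(pdbcode, resnum):
    """Hypothetical helper: build the plain-text PDBSWS query URL."""
    # dict order is preserved, so the parameters appear in this order
    params = {'plain': 1, 'qtype': 'pdb', 'id': pdbcode, 'res': resnum}
    return BASE_URL + '?' + urlencode(params)

print(build_pdbsws_url('1bwi', 35))
```

For the example PDB code and residue number this should produce the same URL as the concatenation above.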
Read the URL and decode the returned bytes into a string:
result = request.urlopen(url).read()
result = str(result, encoding='utf-8')
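The decoding step is needed because read() returns bytes rather than a string. The short sketch below shows the conversion without touching the network; the bytes value is made up to stand in for a server response and does not reflect the real PDBSWS output.

```python
# Made-up bytes standing in for what request.urlopen(url).read() returns
raw = b'AC: P12345\nUPCOUNT: 666\n'

text_a = str(raw, encoding='utf-8')   # the form used in this practical
text_b = raw.decode('utf-8')          # an equivalent, commonly seen spelling

print(type(raw).__name__, '->', type(text_a).__name__)
print(text_a == text_b)
```

Either spelling gives the same string, so which one to use is a matter of taste.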
Replace all the newline characters with a # sign so that the whole result becomes a single line, which makes pattern matching easier:
result = result.replace('\n', '#')
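The replacement helps because, by default, '.' in a regular expression does not match a newline, so a leading '.*' cannot skip over earlier lines. The sketch below illustrates this on made-up text; the field layout and values are dummies and only assume that AC: appears on a line of its own.

```python
import re

# Made-up multi-line text; not real PDBSWS output
text = 'PDB: 1bwi\nAC: P12345\nUPCOUNT: 666\n'

# Pattern that expects the value to end at a newline
newline_pattern = re.compile(r'.*AC:\s+(.*?)\n')
# Pattern that expects the value to end at a # sign
hash_pattern = re.compile(r'.*AC:\s+(.*?)#')

# '.*' cannot cross the first newline, so match() fails on the raw text
print(newline_pattern.match(text))

# After flattening, '.*' can run past the earlier fields
flat = text.replace('\n', '#')
print(hash_pattern.match(flat).group(1))

# An alternative that needs no flattening: search() scans the whole text
print(re.search(r'AC:\s+(\S+)', text).group(1))
```

Using re.search() is a different technique from the one in this practical, shown only for comparison.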
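Match a pattern consisting of AC: followed by one or more whitespace characters, then capture the shortest possible run of characters before the next # sign. A raw string (r'...') is used so that Python does not try to interpret \s as a string escape.
pattern = re.compile(r'.*AC:\s+(.*?)#')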
match = pattern.match(result)
ac = match.group(1)
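To see why the pattern captures the minimum number of characters, compare the non-greedy group (.*?) with a greedy (.*) on the same flattened text. The text here is made up for illustration.

```python
import re

# Made-up flattened text; not real PDBSWS output
flat = 'PDB: 1bwi#AC: P12345#UPCOUNT: 666#'

# Non-greedy: stops at the first # after the value
lazy = re.match(r'.*AC:\s+(.*?)#', flat).group(1)
# Greedy: runs on to the last # in the text
greedy = re.match(r'.*AC:\s+(.*)#', flat).group(1)

print(lazy)    # P12345
print(greedy)  # P12345#UPCOUNT: 666
```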
Repeat, but look for UPCOUNT: instead of AC:, convert the captured value to an integer, and return the results.
pattern = re.compile(r'.*UPCOUNT:\s+(.*?)#')
match = pattern.match(result)
upresnum = int(match.group(1))
return(ac, upresnum)
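Putting the steps together, one possible arrangement keeps the parsing separate from the download so that it can be checked without a network connection. This is a sketch rather than the model answer: parse_field and parse_pdbsws are hypothetical helper names, the sample text is made up, and raising ValueError when a field is missing is an added assumption about how you might want failures handled.

```python
import re
from urllib import request

def parse_field(flat, key):
    """Hypothetical helper: text after 'key:' up to the next '#', or None."""
    match = re.match(r'.*' + re.escape(key) + r':\s+(.*?)#', flat)
    return match.group(1) if match else None

def parse_pdbsws(text):
    """Hypothetical helper: extract (ac, upresnum) from the plain output."""
    flat = text.replace('\n', '#')
    ac = parse_field(flat, 'AC')
    upcount = parse_field(flat, 'UPCOUNT')
    if ac is None or upcount is None:
        # match() returns None when the pattern is absent
        raise ValueError('AC or UPCOUNT not found in PDBSWS output')
    return (ac, int(upcount))

def ReadPDBSWS(pdbcode, resnum):
    url = 'http://www.bioinf.org.uk/servers/pdbsws/query.cgi?plain=1&qtype=pdb'
    url += '&id=' + pdbcode
    url += '&res=' + str(resnum)
    result = request.urlopen(url).read()
    return parse_pdbsws(str(result, encoding='utf-8'))

# Offline check on made-up text (ReadPDBSWS itself needs network access)
sample = 'PDB: 1bwi\nAC: P12345\nUPCOUNT: 666\n'
print(parse_pdbsws(sample))
```

Checking for None before calling group() avoids an AttributeError if the server returns something unexpected, such as an error page.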
You should now have a working program that obtains the information from PDBSWS.
You might now want to modify the program: