annotate corebio/resource/scop.py @ 14:778f03497adb

Uploaded
author davidmurphy
date Fri, 24 Feb 2012 11:37:26 -0500
parents c55bdc2fb9fa
children
# Copyright 2000 by Jeffrey Chang. All rights reserved.
# Copyright 2001 by Gavin E. Crooks. All rights reserved.
# Modifications Copyright 2004/2005 James Casbon.
# Copyright 2005 by Regents of the University of California. All rights Reserved.
# (Major rewrite for conformance to corebio. Gavin Crooks)
#
# This code is derived from the Biopython distribution and is governed by its
# license. Please see the LICENSE file that should have been included
# as part of this package.


""" SCOP: Structural Classification of Proteins.

The SCOP database aims to provide a manually constructed classification of
all known protein structures into a hierarchy, the main levels of which
are family, superfamily and fold.

* SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/
* Introduction: http://scop.mrc-lmb.cam.ac.uk/scop/intro.html
* SCOP parsable files: http://scop.mrc-lmb.cam.ac.uk/scop/parse/

The Scop object in this module represents the entire SCOP classification. It
can be built from the three SCOP parsable files (see DesRecord, HieRecord and
ClaRecord), modified if so desired, and converted back to the same file formats.
A single SCOP domain (represented by the Domain class) can be obtained from
Scop using the domain's SCOP identifier (sid).

Classes:
  - Scop -- The entire SCOP hierarchy.
  - Node -- A node in the SCOP hierarchy.
  - Domain -- A SCOP domain.
  - Residues -- A collection of residues from a PDB structure.
  - HieRecord -- Handle the SCOP HIErarchy files.
  - DesRecord -- Handle the SCOP DEScription file.
  - ClaRecord -- Handle the SCOP CLAssification file.


nodeCodeDict -- A mapping between known 2 letter node codes and a longer
                description. The known node types are 'cl' (class), 'cf'
                (fold), 'sf' (superfamily), 'fa' (family), 'dm' (protein),
                'sp' (species), 'px' (domain). Additional node types may
                be added in the future.
"""

import os, re


nodeCodeDict = { 'cl':'class', 'cf':'fold', 'sf':'superfamily',
                 'fa':'family', 'dm':'protein', 'sp':'species', 'px':'domain'}


_nodetype_to_code= dict([[v,k] for k,v in nodeCodeDict.items()])


nodeCodeOrder = [ 'ro', 'cl', 'cf', 'sf', 'fa', 'dm', 'sp', 'px' ]


def cmp_sccs(sccs1, sccs2) :
    """Order SCOP concise classification strings (sccs).

    a.4.5.1 < a.4.5.11 < b.1.1.1

    A sccs (e.g. a.4.5.11) compactly represents a domain's classification.
    The letter represents the class, and the numbers are the fold,
    superfamily, and family, respectively.

    """

    s1 = sccs1.split(".")
    s2 = sccs2.split(".")

    if s1[0] != s2[0]: return cmp(s1[0], s2[0])

    s1 = map(int, s1[1:])
    s2 = map(int, s2[1:])

    return cmp(s1,s2)



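# cmp_sccs relies on the Python 2 built-in cmp(). As a minimal sketch, the
# same ordering can be expressed as a sort key that also works on Python 3.
# The name sccs_key is an assumption made for this illustration; it is not
# part of corebio.

```python
def sccs_key(sccs):
    """Return a sort key for an sccs string such as 'a.4.5.11'."""
    parts = sccs.split(".")
    # The class letter compares as text; fold, superfamily and family
    # compare as integers, so that 'a.4.5.2' sorts before 'a.4.5.11'.
    return (parts[0], [int(p) for p in parts[1:]])


codes = ["b.1.1.1", "a.4.5.11", "a.4.5.1"]
ordered = sorted(codes, key=sccs_key)
print(ordered)
```

# A key function avoids the comparison-function form of sort(), which
# Python 3 no longer accepts directly.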
def _open_scop_file(scop_dir_path, version, filetype) :
    filename = "dir.%s.scop.txt_%s" % (filetype,version)
    afile = open(os.path.join( scop_dir_path, filename))
    return afile


class Scop(object):
    """The entire SCOP hierarchy.

    root -- The root node of the hierarchy
    domains -- A list of all domains
    nodes_by_sunid -- A dictionary of nodes indexed by SCOP unique
                      identifiers (e.g. 14996)
    domains_by_sid -- A dictionary of domains indexed by SCOP identifier
                      (e.g. 'd1hbia_')
    """
    def __init__(self):
        """ An empty Scop object.

        See also Scop.parse() and Scop.parse_files()
        """
        self.root = None
        self.domains = []
        self.nodes_by_sunid = dict()
        self.domains_by_sid = dict()

    #@classmethod
    def parse(cls, dir_path, version) :
        """Build the SCOP hierarchy from the SCOP parsable files.

        - dir_path -- A directory that contains the SCOP files
        - version -- The SCOP version (as a string)

        The SCOP files are named dir.XXX.scop.txt_VERSION, where XXX
        is 'cla', 'des' or 'hie'.
        """
        cla_file = None
        des_file = None
        hie_file = None
        try :
            cla_file = _open_scop_file( dir_path, version, 'cla')
            des_file = _open_scop_file( dir_path, version, 'des')
            hie_file = _open_scop_file( dir_path, version, 'hie')
            scop = cls.parse_files(cla_file, des_file, hie_file)
        finally :
            # If we opened the files, we close the files
            if cla_file : cla_file.close()
            if des_file : des_file.close()
            if hie_file : hie_file.close()

        return scop
    parse = classmethod(parse)

    #@classmethod
    def parse_files(cls, cla_file, des_file, hie_file):
        """Build the SCOP hierarchy from the SCOP parsable files.

        - cla_file -- the CLA classification file
        - des_file -- the DES description file
        - hie_file -- the HIE hierarchy file
        """

        self = cls()

        sunidDict = {}

        root = Node()
        domains = []
        root.sunid=0
        root.type='ro'
        sunidDict[root.sunid] = root

        root.description = 'SCOP Root'

        # Build the rest of the nodes using the DES file
        for rec in DesRecord.records(des_file):
            if rec.nodetype =='px' :
                n = Domain()
                n.sid = rec.name
                domains.append(n)
            else :
                n = Node()
            n.sunid = rec.sunid
            n.type = rec.nodetype
            n.sccs = rec.sccs
            n.description = rec.description

            sunidDict[n.sunid] = n

        # Glue all of the Nodes together using the HIE file
        for rec in HieRecord.records(hie_file):
            if not rec.sunid in sunidDict :
                print rec.sunid #FIXME: HUH?

            n = sunidDict[rec.sunid]
            if rec.parent !='': # Not root node
                if not rec.parent in sunidDict :
                    raise ValueError("Incomplete data?")
                n.parent = sunidDict[rec.parent]

            for c in rec.children:
                if not c in sunidDict :
                    raise ValueError("Incomplete data?")
                n.children.append(sunidDict[c])


        # Fill in the gaps with information from the CLA file
        sidDict = {}
        for rec in ClaRecord.records(cla_file):
            n = sunidDict[rec.sunid]
            assert n.sccs == rec.sccs
            assert n.sid == rec.sid
            n.residues = rec.residues
            sidDict[n.sid] = n

        # Clean up
        self.root = root
        self.nodes_by_sunid = sunidDict
        self.domains_by_sid = sidDict
        self.domains = tuple(domains)

        return self
    parse_files = classmethod(parse_files)


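# parse_files first creates every node from the DES file and only then links
# parents and children from the HIE file, so that each reference can be
# resolved through the sunid index. The sketch below isolates that two-pass
# idea. TreeNode, link_nodes and the (sunid, parent, children) row layout are
# assumptions made for illustration; they are not the record classes of this
# module.

```python
class TreeNode(object):
    """Hypothetical stand-in for Node: an id, a parent and a child list."""
    def __init__(self, sunid):
        self.sunid = sunid
        self.parent = None
        self.children = []


def link_nodes(rows):
    """Build nodes from (sunid, parent, children) rows, then glue them.

    A parent of None marks the root. A reference to an unknown sunid
    raises ValueError.
    """
    # Pass 1: create every node so that later references can be resolved.
    by_sunid = dict((sunid, TreeNode(sunid)) for sunid, _, _ in rows)

    # Pass 2: resolve parent and child references through the index.
    for sunid, parent, children in rows:
        node = by_sunid[sunid]
        if parent is not None:
            if parent not in by_sunid:
                raise ValueError("Incomplete data: unknown parent %r" % (parent,))
            node.parent = by_sunid[parent]
        for child in children:
            if child not in by_sunid:
                raise ValueError("Incomplete data: unknown child %r" % (child,))
            node.children.append(by_sunid[child])
    return by_sunid


# Made-up identifiers, for illustration only.
rows = [(0, None, [1, 2]), (1, 0, [3]), (2, 0, []), (3, 1, [])]
index = link_nodes(rows)
```

# Building the index before linking means the rows may arrive in any order.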
    def write_hie(self, stream) :
        """Build an HIE SCOP parsable file from this object"""
        nodes = self.nodes_by_sunid.values()
        # We order nodes to ease comparison with original file
        nodes.sort(lambda n1,n2: cmp(n1.sunid, n2.sunid))

        for n in nodes :
            stream.write(str(n.to_hie_record()))


    def write_des(self, stream) :
        """Build a DES SCOP parsable file from this object"""
        nodes = self.nodes_by_sunid.values()
        # Original SCOP file is not ordered?
        nodes.sort(lambda n1,n2: cmp(n1.sunid, n2.sunid))

        for n in nodes :
            if n != self.root :
                stream.write(str(n.to_des_record()))


    def write_cla(self, stream) :
        """Build a CLA SCOP parsable file from this object"""
        nodes = self.domains_by_sid.values()
        # We order nodes to ease comparison with original file
        nodes.sort(lambda n1,n2: cmp(n1.sunid, n2.sunid))

        for n in nodes :
            stream.write(str(n.to_cla_record()))
# End Scop



class Node(object) :
    """ A node in the Scop hierarchy

    sunid -- SCOP unique identifiers. e.g. '14986'
    parent -- The parent node
    children -- A list of child nodes
    sccs -- SCOP concise classification string. e.g. 'a.1.1.2'
    type -- A 2 letter node type code. e.g. 'px' for domains
    description -- A description of the node

    """
    def __init__(self) :
        """A new, uninitialized SCOP node."""
        self.sunid=''
        self.parent = None
        self.children=[]
        self.sccs = ''
        self.type =''
        self.description =''

    def __str__(self) :
        s = []
        s.append(str(self.sunid))
        s.append(self.sccs)
        s.append(self.type)
        s.append(self.description)

        return " ".join(s)

    def to_hie_record(self):
        """Return a HieRecord"""
        rec = HieRecord()
        rec.sunid = str(self.sunid)
        if self.parent : # Not root node
            rec.parent = str(self.parent.sunid)
        else:
            rec.parent = '-'
        for c in self.children :
            rec.children.append(str(c.sunid))
        return rec

    def to_des_record(self):
        """Return a DesRecord"""
        rec = DesRecord()
        rec.sunid = str(self.sunid)
        rec.nodetype = self.type
        rec.sccs = self.sccs
        rec.description = self.description
        return rec

    def descendents( self, node_type) :
        """ Return a list of all descendent nodes of the given type. Node type
        can be a two letter code or longer description. e.g. 'fa' or 'family'
        """
        if node_type in _nodetype_to_code:
            node_type = _nodetype_to_code[node_type]

        nodes = [self]

        while nodes[0].type != node_type:
            if nodes[0].type == 'px' :
                return [] # Fell off the bottom of the hierarchy
            child_list = []
            for n in nodes:
                for child in n.children:
                    child_list.append( child )
            nodes = child_list

        return nodes


    def ascendent( self, node_type) :
        """ Return the ancestor node of the given type, or None. Node type can
        be a two letter code or longer description. e.g. 'fa' or 'family'
        """
        if node_type in _nodetype_to_code :
            node_type = _nodetype_to_code[node_type]

        n = self
        if n.type == node_type: return None
        while n.type != node_type:
            if n.type == 'ro':
                return None # Fell off the top of the hierarchy
            n = n.parent

        return n
# End Node


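# Node.descendents walks the hierarchy one level at a time, and
# Node.ascendent climbs the parent links. The sketch below shows both walks
# on a tiny hand-built tree. MiniNode and the two free functions are
# assumptions made for illustration; unlike the methods above, they also
# stop cleanly on an empty level or a missing parent.

```python
class MiniNode(object):
    """Hypothetical stand-in for Node with only the fields the walks need."""
    def __init__(self, type_code, parent=None):
        self.type = type_code
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def descendents(node, node_type):
    """Collect all nodes of node_type at or below node, level by level."""
    level = [node]
    while level and level[0].type != node_type:
        if level[0].type == 'px':
            return []  # Reached the leaves without finding the type
        # Replace the current level with all of its children.
        level = [child for n in level for child in n.children]
    return level


def ascendent(node, node_type):
    """Climb parent links until a node of node_type is found, else None."""
    n = node.parent
    while n is not None and n.type != node_type:
        n = n.parent
    return n


root = MiniNode('ro')
cl = MiniNode('cl', root)
cf1 = MiniNode('cf', cl)
cf2 = MiniNode('cf', cl)
folds = descendents(root, 'cf')
```

# The level-by-level walk assumes that all nodes at the same depth share
# one type, which holds for the fixed SCOP level order in nodeCodeOrder.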
329 class Domain(Node) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
330 """ A SCOP domain. A leaf node in the Scop hierarchy.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
331
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
332 - sid -- The SCOP domain identifier. e.g. 'd5hbib_'
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
333 - residues -- A Residue object. It defines the collection
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
334 of PDB atoms that make up this domain.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
335 """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
336 def __init__(self) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
337 Node.__init__(self)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
338 self.sid = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
339 self.residues = None
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
340
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
341 def __str__(self) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
342 s = []
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
343 s.append(self.sid)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
344 s.append(self.sccs)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
345 s.append("("+str(self.residues)+")")
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
346
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
347 if not self.parent :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
348 s.append(self.description)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
349 else :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
350 sp = self.parent
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
351 dm = sp.parent
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
352 s.append(dm.description)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
353 s.append("{"+sp.description+"}")
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
354
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
355 return " ".join(s)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
356
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
357 def to_des_record(self):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
358 """Return a des.Record"""
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
359 rec = Node.to_des_record(self)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
360 rec.name = self.sid
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
361 return rec
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
362
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
363 def to_cla_record(self) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
364 """Return a cla.Record"""
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
365 rec = ClaRecord()
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
366 rec.sid = self.sid
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
367 rec.residues = self.residues
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
368 rec.sccs = self.sccs
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
369 rec.sunid = self.sunid
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
370
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
371 n = self
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
372 while n.sunid != 0: # Not root node
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
373 rec.hierarchy.append( (n.type, str(n.sunid)) )
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
374 n = n.parent
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
375
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
376 rec.hierarchy.reverse()
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
377
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
378 return rec
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
379 # End Domain
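# Illustrative sketch (not part of the original module): how the loop in
# Domain.to_cla_record walks from a node up to the root and then reverses
# the collected path. SketchNode and hierarchy_path are hypothetical names
# used only for this example; the root is assumed to have sunid 0, as in
# the loop above.

```python
class SketchNode(object):
    """Hypothetical stand-in for a SCOP node with a parent link."""

    def __init__(self, sunid, node_type, parent=None):
        self.sunid = sunid
        self.type = node_type
        self.parent = parent


def hierarchy_path(node):
    """Return [(type, sunid-as-string), ...] ordered from just below the root to node."""
    path = []
    n = node
    while n.sunid != 0:  # stop at the root node
        path.append((n.type, str(n.sunid)))
        n = n.parent
    path.reverse()  # collected leaf-to-root, wanted root-to-leaf
    return path


root = SketchNode(0, "ro")
cl = SketchNode(48724, "cl", root)
cf = SketchNode(48725, "cf", cl)
print(hierarchy_path(cf))
```

# The sunids are stored as strings here only because the original loop
# calls str(n.sunid); a caller that needs integers would convert them.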
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
380
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
381
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
382
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
383 class DesRecord(object):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
384 """ Handle the SCOP DEScription file.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
385
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
386 The file format is described in the scop
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
387 "release notes.":http://scop.berkeley.edu/release-notes-1.55.html
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
388 The latest DES file can be found
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
389 "elsewhere at SCOP.":http://scop.mrc-lmb.cam.ac.uk/scop/parse/
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
390
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
391 The DES file consists of one DES record per line. Each record
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
392 holds information for one node in the SCOP hierarchy, and consists
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
393 of 5 tab-delimited fields:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
394 sunid, node type, sccs, node name, node description.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
395
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
396 For example ::
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
397
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
398 21953 px b.1.2.1 d1dan.1 1dan T:,U:91-106
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
399 48724 cl b - All beta proteins
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
400 48725 cf b.1 - Immunoglobulin-like beta-sandwich
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
401 49265 sf b.1.2 - Fibronectin type III
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
402 49266 fa b.1.2.1 - Fibronectin type III
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
403
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
404
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
405 - sunid -- SCOP unique identifiers
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
406 - nodetype -- One of 'cl' (class), 'cf' (fold), 'sf' (superfamily),
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
407 'fa' (family), 'dm' (protein), 'sp' (species),
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
408 'px' (domain). Additional node types may be added.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
409 - sccs -- SCOP concise classification strings. e.g. b.1.2.1
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
410 - name -- The SCOP ID (sid) for domains (e.g. d1anu1),
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
411 currently empty for other node types
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
412 - description -- e.g. "All beta proteins", "Fibronectin type III"
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
413 """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
414 def __init__(self, record=None):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
415
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
416 if not record :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
417 self.sunid = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
418 self.nodetype = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
419 self.sccs = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
420 self.name = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
421 self.description =''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
422 else :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
423 entry = record.rstrip() # no trailing whitespace
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
424 columns = entry.split("\t") # separate the tab-delimited cols
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
425 if len(columns) != 5:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
426 raise ValueError("I don't understand the format of %s" % entry)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
427
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
428 self.sunid, self.nodetype, self.sccs, self.name, self.description \
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
429 = columns
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
430 if self.name == '-' : self.name =''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
431 self.sunid = int(self.sunid)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
432
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
433 def __str__(self):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
434 s = []
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
435 s.append(self.sunid)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
436 s.append(self.nodetype)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
437 s.append(self.sccs)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
438 if self.name :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
439 s.append(self.name)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
440 else :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
441 s.append("-")
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
442 s.append(self.description)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
443 return "\t".join(map(str,s)) + "\n"
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
444
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
445 #@staticmethod
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
446 def records(des_file):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
447 """Iterates over a DES file, generating DesRecords """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
448 for line in des_file:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
449 if line[0] =='#': continue # A comment
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
450 if line.isspace() : continue
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
451 yield DesRecord(line)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
452 records = staticmethod(records)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
453 # End DesRecord
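# Illustrative sketch (not part of the original module): a standalone
# round trip for one DES line, following the five-field layout described
# in the DesRecord docstring. DesFields, parse_des_line and format_des_line
# are hypothetical names for this example only.

```python
from collections import namedtuple

# Field order follows the DesRecord docstring.
DesFields = namedtuple("DesFields", "sunid nodetype sccs name description")


def parse_des_line(line):
    """Split one tab-delimited DES line into its five fields."""
    columns = line.rstrip().split("\t")
    if len(columns) != 5:
        raise ValueError("Expected 5 tab-delimited fields in %r" % line)
    sunid, nodetype, sccs, name, description = columns
    if name == "-":  # '-' marks an empty name
        name = ""
    return DesFields(int(sunid), nodetype, sccs, name, description)


def format_des_line(rec):
    """Write the fields back out, restoring '-' for an empty name."""
    name = rec.name or "-"
    fields = [str(rec.sunid), rec.nodetype, rec.sccs, name, rec.description]
    return "\t".join(fields) + "\n"


line = "49265\tsf\tb.1.2\t-\tFibronectin type III\n"
rec = parse_des_line(line)
print(rec)
print(format_des_line(rec) == line)
```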
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
454
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
455 class HieRecord(object):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
456 """Handle the SCOP HIErarchy files, which describe the SCOP hierarchy in
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
457 terms of SCOP unique identifiers (sunid).
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
458
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
459 The file format is described in the scop
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
460 "release notes.":http://scop.berkeley.edu/release-notes-1.55.html
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
461 The latest HIE file can be found
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
462 "elsewhere at SCOP.":http://scop.mrc-lmb.cam.ac.uk/scop/parse/
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
463
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
464 "Release 1.55":http://scop.berkeley.edu/parse/dir.hie.scop.txt_1.55
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
465 Records consist of 3 tab-delimited fields: node's sunid,
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
466 parent's sunid, and a list of children's sunids. For example ::
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
467
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
468 0 - 46456,48724,51349,53931,56572,56835,56992,57942
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
469 21953 49268 -
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
470 49267 49266 49268,49269
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
471
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
472 Each record holds information for one node in the SCOP hierarchy.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
473
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
474 sunid -- SCOP unique identifiers of this node
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
475 parent -- Parent's sunid
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
476 children -- Sequence of children's sunids
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
477 """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
478 def __init__(self, record = None):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
479 self.sunid = None
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
480 self.parent = None
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
481 self.children = []
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
482
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
483 if not record : return
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
484
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
485 # Parses HIE records.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
486 entry = record.rstrip() # no trailing whitespace
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
487 columns = entry.split('\t') # separate the tab-delimited cols
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
488 if len(columns) != 3:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
489 raise ValueError("I don't understand the format of %s" % entry)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
490
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
491 self.sunid, self.parent, children = columns
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
492
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
493 if self.sunid =='-' : self.sunid = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
494 if self.parent =='-' : self.parent = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
495 else : self.parent = int( self.parent )
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
496
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
497 if children =='-' :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
498 self.children = ()
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
499 else :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
500 self.children = children.split(',')
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
501 self.children = [int(child) for child in self.children] # a real list, so truth tests in __str__ also work on Python 3
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
502
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
503 self.sunid = int(self.sunid)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
504
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
505 def __str__(self):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
506 s = []
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
507 s.append(str(self.sunid))
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
508
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
509 if self.parent:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
510 s.append(str(self.parent))
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
511 else:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
512 if self.sunid != 0:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
513 s.append('0')
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
514 else:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
515 s.append('-')
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
516
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
517 if self.children :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
518 child_str = map(str, self.children)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
519 s.append(",".join(child_str))
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
520 else:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
521 s.append('-')
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
522
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
523 return "\t".join(s) + "\n"
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
524
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
525
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
526 #@staticmethod
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
527 def records(hie_file):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
528 """Iterates over a HIE file, generating HieRecords """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
529 for line in hie_file:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
530 if line[0] =='#': continue # A comment
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
531 if line.isspace() : continue
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
532 yield HieRecord(line)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
533 records = staticmethod(records)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
534 # End HieRecord
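# Illustrative sketch (not part of the original module): parsing the three
# HIE fields on their own. parse_hie_line is a hypothetical helper; unlike
# HieRecord it uses None (rather than '') for a missing parent, which is an
# assumption made for this example.

```python
def parse_hie_line(line):
    """Return (sunid, parent, children) for one tab-delimited HIE line."""
    columns = line.rstrip().split("\t")
    if len(columns) != 3:
        raise ValueError("Expected 3 tab-delimited fields in %r" % line)
    sunid, parent, children = columns
    parent = None if parent == "-" else int(parent)
    if children == "-":  # leaf node, no children
        children = []
    else:
        children = [int(c) for c in children.split(",")]
    return int(sunid), parent, children


leaf = parse_hie_line("21953\t49268\t-\n")
inner = parse_hie_line("49267\t49266\t49268,49269\n")
print(leaf)
print(inner)
```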
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
535
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
536
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
537
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
538 class ClaRecord(object):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
539 """Handle the SCOP CLAssification file, which describes SCOP domains.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
540
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
541 The file format is described in the scop
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
542 "release notes.":http://scop.berkeley.edu/release-notes-1.55.html
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
543 The latest CLA file can be found
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
544 "elsewhere at SCOP.":http://scop.mrc-lmb.cam.ac.uk/scop/parse/
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
545
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
546 sid -- SCOP identifier. e.g. d1danl2
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
547 residues -- The domain definition as a Residues object
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
548 sccs -- SCOP concise classification strings. e.g. b.1.2.1
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
549 sunid -- SCOP unique identifier for this domain
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
550 hierarchy -- A sequence of tuples (nodetype, sunid) describing the
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
551 location of this domain in the SCOP hierarchy.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
552 See the Scop module for a description of nodetypes.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
553 """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
554 def __init__(self, record=None):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
555 self.sid = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
556 self.residues = None
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
557 self.sccs = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
558 self.sunid =''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
559 self.hierarchy = []
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
560
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
561 if not record: return
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
562
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
563 # Parse a tab-delimited CLA record.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
564 entry = record.rstrip() # no trailing whitespace
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
565 columns = entry.split('\t') # separate the tab-delimited cols
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
566 if len(columns) != 6:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
567 raise ValueError("I don't understand the format of %s" % entry)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
568
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
569 self.sid, pdbid, residues, self.sccs, self.sunid, hierarchy = columns
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
570 self.residues = Residues(residues)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
571 self.residues.pdbid = pdbid
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
572 self.sunid = int(self.sunid)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
573
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
574 h = []
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
575 for ht in hierarchy.split(",") :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
576 h.append( ht.split('='))
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
577 for ht in h:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
578 ht[1] = int(ht[1])
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
579 self.hierarchy = h
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
580
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
581 def __str__(self):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
582 s = []
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
583 s.append(self.sid)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
584 s += str(self.residues).split(" ")
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
585 s.append(self.sccs)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
586 s.append(self.sunid)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
587
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
588 h=[]
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
589 for ht in self.hierarchy:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
590 h.append("=".join(map(str,ht)))
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
591 s.append(",".join(h))
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
592
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
593 return "\t".join(map(str,s)) + "\n"
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
594
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
595 #@staticmethod
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
596 def records(cla_file):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
597 """Iterates over a CLA file, generating ClaRecords """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
598 for line in cla_file:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
599 if line[0] =='#': continue # A comment
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
600 if line.isspace() : continue
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
601 yield ClaRecord(line)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
602 records = staticmethod(records)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
603
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
604 # End ClaRecord
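# Illustrative sketch (not part of the original module): the sixth CLA
# column is a comma-separated list of nodetype=sunid pairs, which
# ClaRecord turns into a list of pairs. The helper names are hypothetical,
# and the sample field is assembled from the sunids quoted in the DES and
# HIE docstrings above rather than copied from a real CLA file.

```python
def parse_cla_hierarchy(field):
    """Turn 'cl=48724,cf=48725' into [('cl', 48724), ('cf', 48725)]."""
    pairs = []
    for item in field.split(","):
        nodetype, sunid = item.split("=")
        pairs.append((nodetype, int(sunid)))
    return pairs


def format_cla_hierarchy(pairs):
    """Inverse of parse_cla_hierarchy."""
    return ",".join("%s=%d" % (nodetype, sunid) for nodetype, sunid in pairs)


field = "cl=48724,cf=48725,sf=49265,fa=49266,dm=49267,sp=49268,px=21953"
pairs = parse_cla_hierarchy(field)
print(pairs[0], pairs[-1])
print(format_cla_hierarchy(pairs) == field)
```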
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
605
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
606
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
607
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
608
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
609 class DomRecord(object):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
610 """Handle the SCOP DOMain file.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
611
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
612 The DOM file has been officially deprecated. For more information see
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
613 the SCOP "release notes.":http://scop.berkeley.edu/release-notes-1.55.html
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
614 The DOM files for older releases can be found
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
615 "elsewhere at SCOP.":http://scop.mrc-lmb.cam.ac.uk/scop/parse/
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
616
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
617 DOM records consist of 4 tab-delimited fields;
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
618 sid, pdbid, residues, hierarchy
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
619 For example ::
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
620
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
621 d1sctg_ 1sct g: 1.001.001.001.001.001
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
622 d1scth_ 1sct h: 1.001.001.001.001.001
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
623 d1flp__ 1flp - 1.001.001.001.001.002
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
624 d1moh__ 1moh - 1.001.001.001.001.002
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
625
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
626 sid -- The SCOP ID of the entry, e.g. d1anu1
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
627 residues -- The domain definition as a Residues object
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
628 hierarchy -- A string specifying where this domain is in the hierarchy.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
629 """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
630 def __init__(self, record= None):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
631 self.sid = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
632 self.residues = []
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
633 self.hierarchy = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
634
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
635 if record:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
636 entry = record.rstrip() # no trailing whitespace
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
637 columns = entry.split("\t") # separate the tab-delimited cols
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
638 if len(columns) != 4:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
639 raise ValueError("I don't understand the format of %s" % entry)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
640 self.sid, pdbid, res, self.hierarchy = columns
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
641 self.residues = Residues(res)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
642 self.residues.pdbid = pdbid
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
643
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
644 def __str__(self):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
645 s = []
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
646 s.append(self.sid)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
647 s.append(str(self.residues).replace(" ","\t") )
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
648 s.append(self.hierarchy)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
649 return "\t".join(s) + "\n"
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
650
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
651 #@staticmethod
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
652 def records(dom_file):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
653 """Iterates over a DOM file, generating DomRecords """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
654 for line in dom_file:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
655 if line[0] =='#': continue # A comment
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
656 if line.isspace() : continue
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
657 yield DomRecord(line)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
658 records = staticmethod(records)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
659 # End DomRecord
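# Illustrative sketch (not part of the original module): every records()
# generator above uses the same filter, skipping '#' comment lines and
# whitespace-only lines. iter_record_lines is a hypothetical helper that
# isolates that step; the sample text reuses two DOM lines from the
# DomRecord docstring.

```python
import io


def iter_record_lines(handle):
    """Yield only the lines that carry a record."""
    for line in handle:
        if line.startswith("#"):  # a comment
            continue
        if line.isspace():  # blank or whitespace-only
            continue
        yield line


text = (
    "# sample DOM data\n"
    "\n"
    "d1sctg_\t1sct\tg:\t1.001.001.001.001.001\n"
    "d1flp__\t1flp\t-\t1.001.001.001.001.002\n"
)
lines = list(iter_record_lines(io.StringIO(text)))
print(len(lines))
```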
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
660
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
661
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
662
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
663
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
664 _pdbid_re = re.compile(r"^(\w\w\w\w)(?:$|\s+|_)(.*)")
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
665 _fragment_re = re.compile(r"\(?(\w:)?(-?\w*)-?(-?\w*)\)?(.*)")
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
666
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
667 class Residues(object) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
668 """A collection of residues from a PDB structure.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
669
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
670 This class provides code to work with SCOP domain definitions. These
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
671 are concisely expressed as one or more chain fragments. For example,
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
672 "(1bba A:10-20,B:)" indicates residues 10 through 20 (inclusive) of
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
673 chain A, and every residue of chain B in the PDB structure 1bba. The PDB
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
674 id and brackets are optional. In addition "-" indicates every residue of
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
675 a PDB structure with one unnamed chain.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
676
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
677 Start and end residue ids consist of the residue sequence number and an
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
678 optional single letter insertion code. e.g. "12", "-1", "1a", "1000"
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
679
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
680
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
681 pdbid -- An optional PDB id, e.g. "1bba"
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
682 fragments -- A sequence of tuples (chainID, startResID, endResID)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
683 """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
684
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
685
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
686 def __init__(self, str=None) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
687 self.pdbid = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
688 self.fragments = ()
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
689 if str is not None : self._parse(str)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
690
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
691
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
692 def _parse(self, string):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
693 string = string.strip()
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
694
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
695 #Is there a pdbid at the front? e.g. 1bba A:1-100
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
696 m = _pdbid_re.match(string)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
697 if m is not None :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
698 self.pdbid = m.group(1)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
699 string = m.group(2) # Everything else
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
700
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
701 if string=='' or string == '-' or string=='(-)': # no fragments, whole sequence
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
702 return
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
703
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
704 fragments = []
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
705 for l in string.split(",") :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
706 m = _fragment_re.match(l)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
707 if m is None:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
708 raise ValueError("I don't understand the format of %s" % l)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
709 chain, start, end, postfix = m.groups()
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
710
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
711 if postfix != "" :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
712 raise ValueError("I don't understand the format of %s" % l )
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
713
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
714 if chain:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
715 if chain[-1] != ':':
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
716 raise ValueError("I don't understand the chain in %s" % l)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
717 chain = chain[:-1] # chop off the ':'
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
718 else :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
719 chain =""
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
720
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
721 fragments.append((chain, start, end))
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
722 self.fragments = tuple(fragments)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
723
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
724 def __str__(self):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
725 prefix =""
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
726 if self.pdbid :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
727 prefix =self.pdbid +' '
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
728
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
729 if not self.fragments: return prefix+'-'
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
730 strs = []
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
731 for chain, start, end in self.fragments:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
732 s = []
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
733 if chain: s.append("%s:" % chain)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
734 if start: s.append("%s-%s" % (start, end))
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
735 strs.append("".join(s))
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
736 return prefix+ ",".join(strs)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
737 # End Residues
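# Illustrative sketch (not part of the original module): the same two
# regular expressions applied outside the class, to show what the
# fragments tuple looks like. parse_residues is a hypothetical helper that
# mirrors Residues._parse; the inputs omit the optional brackets.

```python
import re

# Patterns copied from the module above.
_pdbid_re = re.compile(r"^(\w\w\w\w)(?:$|\s+|_)(.*)")
_fragment_re = re.compile(r"\(?(\w:)?(-?\w*)-?(-?\w*)\)?(.*)")


def parse_residues(text):
    """Return (pdbid, fragments), where fragments is a tuple of (chain, start, end)."""
    text = text.strip()
    pdbid = ""
    m = _pdbid_re.match(text)
    if m is not None:  # leading PDB id, e.g. "1bba A:1-100"
        pdbid, text = m.group(1), m.group(2)
    if text in ("", "-", "(-)"):  # whole structure, no fragments
        return pdbid, ()
    fragments = []
    for part in text.split(","):
        chain, start, end, postfix = _fragment_re.match(part).groups()
        if postfix != "":
            raise ValueError("Cannot parse fragment %r" % part)
        chain = chain[:-1] if chain else ""  # drop the trailing ':'
        fragments.append((chain, start, end))
    return pdbid, tuple(fragments)


print(parse_residues("1bba A:10-20,B:"))
print(parse_residues("A:-1-20"))
```

# An empty start and end, as for chain B in the first call, means the
# whole chain; residue ids stay as strings because they may carry an
# insertion code.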