comparison interproscan.xml @ 2:99517734aa65 draft default tip

Uploaded
author bjoern-gruening
date Sun, 23 Jun 2013 07:41:05 -0400
parents 94745fda6aff
children
1:94745fda6aff 2:99517734aa65
110 110
111 ----- 111 -----
112 Tools 112 Tools
113 ----- 113 -----
114 114
115 **PROSITE patterns**:: 115 **PROSITE patterns**
116
117 Some biologically significant amino acid patterns can be summarised in 116 Some biologically significant amino acid patterns can be summarised in
118 the form of regular expressions. 117 the form of regular expressions.
119 ScanRegExp (by Wolfgang.Fleischmann@ebi.ac.uk). 118 ScanRegExp (by Wolfgang.Fleischmann@ebi.ac.uk).
120 119
121 **PROSITE profiles**:: 120 **PROSITE profiles**
122
123 There are a number of protein families as well as functional or 121 There are a number of protein families as well as functional or
124 structural domains that cannot be detected using patterns due to their extreme 122 structural domains that cannot be detected using patterns due to their extreme
125 sequence divergence, so the use of techniques based on weight matrices 123 sequence divergence, so the use of techniques based on weight matrices
126 (also known as profiles) allows the detection of such proteins or domains. 124 (also known as profiles) allows the detection of such proteins or domains.
127 A profile is a table of position-specific amino acid weights and gap costs. 125 A profile is a table of position-specific amino acid weights and gap costs.
128 The profile structure used in PROSITE is similar to but slightly more general 126 The profile structure used in PROSITE is similar to but slightly more general
129 (Bucher P. et al., 1996) than the one introduced by M. Gribskov and 127 (Bucher P. et al., 1996) than the one introduced by M. Gribskov and
130 co-workers. 128 co-workers.
131 pfscan from the Pftools package (by Philipp.Bucher@isrec.unil.ch). 129 pfscan from the Pftools package (by Philipp.Bucher@isrec.unil.ch).
132 130
133 **PRINTS**:: 131 **PRINTS**
134
135 The PRINTS database houses a collection of protein family fingerprints. 132 The PRINTS database houses a collection of protein family fingerprints.
136 These are groups of motifs that together are diagnostically more 133 These are groups of motifs that together are diagnostically more
137 powerful than single motifs by making use of the biological context inherent in a 134 powerful than single motifs by making use of the biological context inherent in a
138 multiple-motif method. The fingerprinting method arose from the need for 135 multiple-motif method. The fingerprinting method arose from the need for
139 a reliable technique for detecting members of large, highly divergent 136 a reliable technique for detecting members of large, highly divergent
140 protein super-families. 137 protein super-families.
141 FingerPRINTScan (Scordis P. et al., 1999). 138 FingerPRINTScan (Scordis P. et al., 1999).
142 139
143 **PFAM**:: 140 **PFAM**
144
145 Pfam is a database of protein domain families. Pfam contains curated 141 Pfam is a database of protein domain families. Pfam contains curated
146 multiple sequence alignments for each family and corresponding hidden 142 multiple sequence alignments for each family and corresponding hidden
147 Markov models (HMMs) (Eddy S.R., 1998). 143 Markov models (HMMs) (Eddy S.R., 1998).
148 Profile hidden Markov models are statistical models of the primary 144 Profile hidden Markov models are statistical models of the primary
149 structure consensus of a sequence family. The construction and use 145 structure consensus of a sequence family. The construction and use
150 of Pfam is tightly tied to the HMMER software package. 146 of Pfam is tightly tied to the HMMER software package.
151 hmmpfam from the HMMER2.3.2 package (by Sean Eddy, 147 hmmpfam from the HMMER2.3.2 package (by Sean Eddy,
152 eddy@genetics.wustl.edu, http://hmmer.wustl.edu). 148 eddy@genetics.wustl.edu, http://hmmer.wustl.edu).
153 149
154 **PRODOM**:: 150 **PRODOM**
155
156 ProDom is a database of protein domain families obtained by automated 151 ProDom is a database of protein domain families obtained by automated
157 analysis of the SWISS-PROT and TrEMBL protein sequences. It is useful 152 analysis of the SWISS-PROT and TrEMBL protein sequences. It is useful
158 for analysing the domain arrangements of complex protein families and the 153 for analysing the domain arrangements of complex protein families and the
159 homology relationships in modular proteins. ProDom families are built by 154 homology relationships in modular proteins. ProDom families are built by
160 an automated process based on a recursive use of PSI-BLAST homology 155 an automated process based on a recursive use of PSI-BLAST homology
161 searches. 156 searches.
162 ProDomBlast3i.pl (by Emmanuel Courcelle emmanuel.courcelle@toulouse.inra.fr 157 ProDomBlast3i.pl (by Emmanuel Courcelle emmanuel.courcelle@toulouse.inra.fr
163 and Yoann Beausse beausse@toulouse.inra.fr) 158 and Yoann Beausse beausse@toulouse.inra.fr)
164 a wrapper on top of the Blast package (Altschul S.F. et al., 1997). 159 a wrapper on top of the Blast package (Altschul S.F. et al., 1997).
165 160
166 **SMART**:: 161 **SMART**
167
168 SMART (a Simple Modular Architecture Research Tool) allows the 162 SMART (a Simple Modular Architecture Research Tool) allows the
169 identification and annotation of genetically mobile domains and the 163 identification and annotation of genetically mobile domains and the
170 analysis of domain architectures. These domains are extensively 164 analysis of domain architectures. These domains are extensively
171 annotated with respect to phyletic distributions, functional class, tertiary 165 annotated with respect to phyletic distributions, functional class, tertiary
172 structures and functionally important residues. SMART alignments are 166 structures and functionally important residues. SMART alignments are
173 optimised manually and following construction of corresponding hidden Markov models (HMMs). 167 optimised manually and following construction of corresponding hidden Markov models (HMMs).
174 hmmpfam from the HMMER2.3.2 package (by Sean Eddy, 168 hmmpfam from the HMMER2.3.2 package (by Sean Eddy,
175 eddy@genetics.wustl.edu, http://hmmer.wustl.edu). 169 eddy@genetics.wustl.edu, http://hmmer.wustl.edu).
176 170
177 **TIGRFAMs**:: 171 **TIGRFAMs**
178
179 TIGRFAMs are a collection of protein families featuring curated multiple 172 TIGRFAMs are a collection of protein families featuring curated multiple
180 sequence alignments, Hidden Markov Models (HMMs) and associated 173 sequence alignments, Hidden Markov Models (HMMs) and associated
181 information designed to support the automated functional identification 174 information designed to support the automated functional identification
182 of proteins by sequence homology. Classification by equivalog family 175 of proteins by sequence homology. Classification by equivalog family
183 (see below), where achievable, complements classification by orthologs, 176 (see below), where achievable, complements classification by orthologs,
185 for automatic assignment of specific functions to proteins from large 178 for automatic assignment of specific functions to proteins from large
186 scale genome sequencing projects. 179 scale genome sequencing projects.
187 hmmpfam from the HMMER2.3.2 package (by Sean Eddy, 180 hmmpfam from the HMMER2.3.2 package (by Sean Eddy,
188 eddy@genetics.wustl.edu, http://hmmer.wustl.edu). 181 eddy@genetics.wustl.edu, http://hmmer.wustl.edu).
189 182
190 **PIR SuperFamily**:: 183 **PIR SuperFamily**
191
192 PIR SuperFamily (PIRSF) is a classification system based on evolutionary 184 PIR SuperFamily (PIRSF) is a classification system based on evolutionary
193 relationship of whole proteins. 185 relationship of whole proteins.
194 hmmpfam from the HMMER2.3.2 package (by Sean Eddy, 186 hmmpfam from the HMMER2.3.2 package (by Sean Eddy,
195 eddy@genetics.wustl.edu, http://hmmer.wustl.edu). 187 eddy@genetics.wustl.edu, http://hmmer.wustl.edu).
196 188
197 **SUPERFAMILY**:: 189 **SUPERFAMILY**
198
199 SUPERFAMILY is a library of profile hidden Markov models that represent 190 SUPERFAMILY is a library of profile hidden Markov models that represent
200 all proteins of known structure, based on SCOP. 191 all proteins of known structure, based on SCOP.
201 hmmpfam/hmmsearch from the HMMER2.3.2 package (by Sean Eddy, 192 hmmpfam/hmmsearch from the HMMER2.3.2 package (by Sean Eddy,
202 eddy@genetics.wustl.edu, http://hmmer.wustl.edu). 193 eddy@genetics.wustl.edu, http://hmmer.wustl.edu).
203 Optionally, predictions for coiled-coil, signal peptide cleavage sites 194 Optionally, predictions for coiled-coil, signal peptide cleavage sites
204 (SignalP v3) and TM helices (TMHMM v2) are supported (See the FAQs file 195 (SignalP v3) and TM helices (TMHMM v2) are supported (See the FAQs file for details).
205 for details). 196
206 197 **GENE3D**
207 **GENE3D**:: 198 Gene3D is supplementary to the CATH database. This protein sequence database
208 199 contains proteins from complete genomes which have been clustered into protein
209 Gene3D is supplementary to the CATH database. This protein sequence database 200 families and annotated with CATH domains, Pfam domains and functional
210 contains proteins from complete genomes which have been clustered into protein 201 information from KEGG, GO, COG, Affymetrix and STRINGS.
211 families and annotated with CATH domains, Pfam domains and functional 202 hmmpfam from the HMM2.3.2 package (by Sean Eddy,
212 information from KEGG, GO, COG, Affymetrix and STRINGS. 203 eddy@genetics.wustl.edu, http://hmmer.wustl.edu).
213 hmmpfam from the HMM2.3.2 package (by Sean Eddy, 204
214 eddy@genetics.wustl.edu, http://hmmer.wustl.edu). 205 **PANTHER**
215 206 The PANTHER (Protein ANalysis THrough Evolutionary Relationships)
216 **PANTHER**:: 207 Classification System was designed to classify proteins (and their genes)
217 208 in order to facilitate high-throughput analysis.
218 The PANTHER (Protein ANalysis THrough Evolutionary Relationships) 209 hmmsearch from the HMM2.3.2 package (by Sean Eddy,
219 Classification System was designed to classify proteins (and their genes) 210 eddy@genetics.wustl.edu, http://hmmer.wustl.edu).
220 in order to facilitate high-throughput analysis. 211 and blastall from the Blast package (Altschul S.F. et al., 1997).
221 hmmsearch from the HMM2.3.2 package (by Sean Eddy,
222 eddy@genetics.wustl.edu, http://hmmer.wustl.edu).
223 and blastall from the Blast package (Altschul S.F. et al., 1997).
224 212
225 ---------- 213 ----------
226 References 214 References
227 ---------- 215 ----------
228 216