Mercurial > repos > peterjc > tmhmm_and_signalp
comparison tools/protein_analysis/rxlr_motifs.xml @ 6:a290c6d4e658
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
author | peterjc |
---|---|
date | Tue, 07 Jun 2011 18:07:09 -0400 |
parents | |
children | 9b45a8743100 |
5:0f1c61998b22 | 6:a290c6d4e658 |
---|---|
1 <tool id="rxlr_motifs" name="RXLR Motifs" version="0.0.5"> | |
2 <description>Find RXLR Effectors of Plant Pathogenic Oomycetes</description> | |
3 <command interpreter="python"> | |
4 rxlr_motifs.py $fasta_file 8 $model $tabular_file | |
5 ##I want the number of threads to be a Galaxy config option... | |
6 </command> | |
7 <inputs> | |
8 <param name="fasta_file" type="data" format="fasta" label="FASTA file of protein sequences" /> | |
9 <param name="model" type="select" label="Which RXLR model?"> | |
10 <option value="Bhattacharjee2006">Bhattacharjee et al. (2006) RXLR</option> | |
11 <option value="Win2007">Win et al. (2007) RXLR</option> | |
12 <option value="Whisson2007" selected="True">Whisson et al. (2007) RXLR-EER with HMM</option> | |
13 </param> | |
14 </inputs> | |
15 <outputs> | |
16 <data name="tabular_file" format="tabular" label="$model.value_label" /> | |
17 </outputs> | |
18 <requirements> | |
19 <!-- Need SignalP for all the models --> | |
20 <requirement type="binary">signalp</requirement> | |
21 <!-- Need HMMER for Whisson et al. (2007) --> | |
22 <requirement type="binary">hmmsearch</requirement> | |
23 </requirements> | |
24 <tests> | |
25 <test> | |
26 <param name="fasta_file" value="rxlr_win_et_al_2007.fasta" ftype="fasta" /> | |
27 <param name="model" value="Win2007" /> | |
28 <output name="tabular_file" file="rxlr_win_et_al_2007.tabular" ftype="tabular" /> | |
29 </test> | |
30 </tests> | |
31 <help> | |
32 | |
33 **Background** | |
34 | |
35 Many effector proteins that oomycete plant pathogens use to manipulate the host | |
36 have been found to contain a signal peptide followed by a conserved RXLR motif | |
37 (Arg, any amino acid, Leu, Arg), and then sometimes EER (Glu, Glu, Arg). There | |
38 are striking parallels with the malarial host-targeting signal (Plasmodium | |
39 export element, or "Pexel" for short). | |
40 | |
41 ----- | |
42 | |
43 **What it does** | |
44 | |
45 Takes a protein sequence FASTA file as input, and produces a simple tabular | |
46 file as output with one line per protein, and two columns giving the sequence | |
47 ID and the predicted class. This is typically just whether or not it had the | |
48 selected RXLR motif (Y or N). | |
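As a rough illustration of consuming this output, the sketch below reads a two-column tab-separated file into a dictionary. The function name `read_rxlr_table` and the sample identifiers are hypothetical, and skipping lines that start with `#` is an assumption about a possible header line, not something stated above.

```python
import io

def read_rxlr_table(handle):
    """Map each sequence ID to its predicted class (hypothetical helper)."""
    classes = {}
    for line in handle:
        line = line.rstrip("\n")
        # Assumption: blank lines and '#' header/comment lines carry no data.
        if not line or line.startswith("#"):
            continue
        seq_id, predicted = line.split("\t")[:2]
        classes[seq_id] = predicted
    return classes

# Invented example content standing in for a real output file.
example = io.StringIO("#ID\tWin2007\nprotein_1\tY\nprotein_2\tN\n")
print(read_rxlr_table(example))
```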
49 | |
50 ----- | |
51 | |
52 **Bhattacharjee et al. (2006) RXLR Model** | |
53 | |
54 Looks for the oomycete motif RXLR as described in Bhattacharjee et al. (2006). | |
55 | |
56 Matches must have a SignalP Hidden Markov Model (HMM) score of at least 0.9, | |
57 a SignalP Neural Network (NN) predicted cleavage site giving a signal peptide | |
58 length between 10 and 40 amino acids inclusive, and the RXLR pattern must be | |
59 after but within 100 amino acids of the cleavage site. | |
60 SignalP is run truncating the sequences to the first 70 amino acids, which was | |
61 the default on the SignalP webservice used in Bhattacharjee et al. (2006). | |
62 | |
63 | |
64 **Win et al. (2007) RXLR Model** | |
65 | |
66 Looks for the protein motif RXLR as described in Win et al. (2007). | |
67 | |
68 Matches must have a SignalP Hidden Markov Model (HMM) score of at least 0.9, | |
69 a SignalP Neural Network (NN) predicted cleavage site giving a signal peptide | |
70 length between 10 and 40 amino acids inclusive, and the RXLR pattern must be | |
71 after the cleavage site and start between amino acids 30 and 60. | |
72 SignalP is run truncating the sequences to the first 70 amino acids, to match | |
73 the methodology of Torto et al. (2003) followed in Win et al. (2007). | |
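The Bhattacharjee et al. (2006) and Win et al. (2007) heuristics share the same SignalP criteria and differ only in where the RXLR motif may start. The Python sketch below makes that concrete. It assumes the SignalP HMM score and NN signal peptide length have already been obtained elsewhere. The function names, the 1-based position convention, and the reading of "within 100 amino acids" as a start at most 100 residues past the cleavage site are illustrative assumptions, not the tool's actual code.

```python
import re

# Zero-width lookahead so overlapping R.LR occurrences are all found.
RXLR_START = re.compile("(?=R.LR)")

def signal_peptide_ok(hmm_score, sp_length):
    """SignalP criteria shared by both models (sp_length is an integer)."""
    return hmm_score >= 0.9 and sp_length in range(10, 41)

def rxlr_starts(seq, sp_length):
    """1-based start positions of R.LR motifs located after the cleavage site."""
    return [m.start() + 1 for m in RXLR_START.finditer(seq)
            if m.start() >= sp_length]

def bhattacharjee2006(seq, hmm_score, sp_length):
    if not signal_peptide_ok(hmm_score, sp_length):
        return False
    # Assumed reading: the motif starts 1 to 100 residues past the cleavage site.
    return any(start - sp_length in range(1, 101)
               for start in rxlr_starts(seq, sp_length))

def win2007(seq, hmm_score, sp_length):
    if not signal_peptide_ok(hmm_score, sp_length):
        return False
    # The motif must start between amino acids 30 and 60 inclusive.
    return any(start in range(30, 61)
               for start in rxlr_starts(seq, sp_length))

# Invented peptide: RSLR starts at position 36, signal peptide length 20.
seq = "M" + "A" * 34 + "RSLR" + "A" * 20
print(bhattacharjee2006(seq, 0.95, 20), win2007(seq, 0.95, 20))
```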
74 | |
75 | |
76 **Whisson et al. (2007) RXLR-EER with HMM** | |
77 | |
78 Looks for the protein motif RXLR-EER using the heuristic regular expression | |
79 methodology, which was an extension of the Bhattacharjee et al. (2006) model, | |
80 and an HMM as described in Whisson et al. (2007). | |
81 | |
82 All the requirements described above for Bhattacharjee et al. (2006) apply, | |
83 but rather than just looking for RXLR with the regular expression R.LR the | |
84 more complicated regular expression R.LR.{,40}[ED][ED][KR] is used. This means | |
85 RXLR (Arg, any amino acid, Leu, Arg), then a stretch of up to forty amino | |
86 acids before Glu/Asp, Glu/Asp, Lys/Arg. The EER part of the name is perhaps | |
87 misleading as it also allows for DDR, EEK, and so on. | |
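To make the difference between the two patterns concrete, here is a small Python sketch using the standard `re` module, which accepts the `{,40}` quantifier as written above. The peptide strings are invented for illustration and are not real effector sequences.

```python
import re

RXLR_ONLY = re.compile("R.LR")
RXLR_EER = re.compile("R.LR.{,40}[ED][ED][KR]")

# Invented test peptides, one per situation described in the text.
examples = {
    "RXLR then EER": "AAARSLRAAAAAEERAAA",
    "RXLR then DDK": "AAARSLRAAAAADDKAAA",
    "RXLR only": "AAARSLRAAAAAAAAAAA",
    "EER too far": "AAARSLR" + "A" * 41 + "EER",
}

for label, peptide in examples.items():
    # Report whether each pattern is found anywhere in the peptide.
    print(label, bool(RXLR_ONLY.search(peptide)), bool(RXLR_EER.search(peptide)))
```

The last example has 41 residues between RXLR and EER, one more than the pattern allows, so only the plain `R.LR` pattern matches it.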
88 | |
89 Unlike Bhattacharjee et al. (2006), who used the SignalP webservice with its | |
90 default of truncating the sequences at 70 amino acids, Whisson et al. (2007) | |
91 used the SignalP 3.0 command line tool with its default of not truncating the | |
92 sequences. This does alter some of the scores, and also takes a little longer. | |
93 | |
94 Additionally, HMMER 2.3.2 is run to look for a cross-validated HMM for the | |
95 RXLR-EER domain based on known positive examples. There are no restrictions | |
96 on where within the protein the HMM match must be found. | |
97 | |
98 The output of this model has four classes: | |
99 * Y = Yes, both the heuristic motif and HMM were found. | |
100 * re = Only the heuristic SignalP with regular expression motif was found. | |
101 * hmm = Only the HMM was found. | |
102 * neither = Neither the heuristic motif nor HMM was found. | |
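The four classes amount to combining two yes/no results. A minimal sketch, assuming the heuristic (SignalP plus regular expression) result and the HMM result are already available as booleans; the function name `whisson2007_class` is hypothetical:

```python
def whisson2007_class(heuristic_hit, hmm_hit):
    """Combine the two results into one of the four class labels."""
    if heuristic_hit and hmm_hit:
        return "Y"
    if heuristic_hit:
        return "re"
    if hmm_hit:
        return "hmm"
    return "neither"

# Show the label for every combination of the two results.
for heuristic_hit in (True, False):
    for hmm_hit in (True, False):
        print(heuristic_hit, hmm_hit, whisson2007_class(heuristic_hit, hmm_hit))
```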
103 | |
104 ----- | |
105 | |
106 **Note** | |
107 | |
108 Both Bhattacharjee et al. (2006) and Win et al. (2007) used SignalP v2.0, which | |
109 is no longer available. The current release is SignalP v3.0 (Mar 5, 2007), so | |
110 this is used instead. SignalP is called with the Eukaryote model and the short | |
111 output (one line per protein). Any sequence truncation (e.g. to 70 amino acids) | |
112 is handled via the intermediate sequence files. | |
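One way to picture the truncation step is to write an intermediate FASTA file in which each sequence is cut to its first 70 residues, and then give that file to SignalP. The generator below is a hypothetical sketch of that idea in plain Python; the name `truncate_fasta` and its parameters are assumptions, not the tool's code.

```python
def truncate_fasta(lines, max_length=70):
    """Yield FASTA lines with each sequence cut to its first max_length residues."""
    title, residues = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record begins, so emit the previous one (if any).
            if title is not None:
                yield title
                yield "".join(residues)[:max_length]
            title, residues = line, []
        elif line:
            residues.append(line)
    # Emit the final record.
    if title is not None:
        yield title
        yield "".join(residues)[:max_length]

# Invented two-record input; the first sequence is longer than 70 residues.
records = [">protein_1", "M" * 50, "K" * 50, ">protein_2", "MKV"]
for out_line in truncate_fasta(records):
    print(out_line)
```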
113 | |
114 ----- | |
115 | |
116 **References** | |
117 | |
118 Stephen C. Whisson, Petra C. Boevink, Lucy Moleleki, Anna O. Avrova, Juan G. Morales, Eleanor M. Gilroy, Miles R. Armstrong, Severine Grouffaud, Pieter van West, Sean Chapman, Ingo Hein, Ian K. Toth, Leighton Pritchard and Paul R. J. Birch. | |
119 A translocation signal for delivery of oomycete effector proteins into host plant cells. | |
120 Nature 450:115-118, 2007. | |
121 http://dx.doi.org/10.1038/nature06203 | |
122 | |
123 Joe Win, William Morgan, Jorunn Bos, Ksenia V. Krasileva, Liliana M. Cano, Angela Chaparro-Garcia, Randa Ammar, Brian J. Staskawicz and Sophien Kamoun. | |
124 Adaptive evolution has targeted the C-terminal domain of the RXLR effectors of plant pathogenic oomycetes. | |
125 The Plant Cell 19:2349-2369, 2007. | |
126 http://dx.doi.org/10.1105/tpc.107.051037 | |
127 | |
128 Souvik Bhattacharjee, N. Luisa Hiller, Konstantinos Liolios, Joe Win, Thirumala-Devi Kanneganti, Carolyn Young, Sophien Kamoun and Kasturi Haldar. | |
129 The malarial host-targeting signal is conserved in the Irish potato famine pathogen. | |
130 PLoS Pathogens, 2(5):e50, 2006. | |
131 http://dx.doi.org/10.1371/journal.ppat.0020050 | |
132 | |
133 Trudy A. Torto, Shuang Li, Allison Styer, Edgar Huitema, Antonino Testa, Neil A.R. Gow, Pieter van West and Sophien Kamoun. | |
134 EST mining and functional expression assays identify extracellular effector proteins from the plant pathogen *Phytophthora*. | |
135 Genome Research, 13:1675-1685, 2003. | |
136 http://dx.doi.org/10.1101/gr.910003 | |
137 | |
138 Sean R. Eddy. | |
139 Profile hidden Markov models. | |
140 Bioinformatics, 14(9):755–763, 1998. | |
141 http://dx.doi.org/10.1093/bioinformatics/14.9.755 | |
142 | |
143 Nielsen, Engelbrecht, Brunak and von Heijne. | |
144 Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. | |
145 Protein Engineering, 10:1-6, 1997. | |
146 http://dx.doi.org/10.1093/protein/10.1.1 | |
147 | |
148 </help> | |
149 </tool> |