comparison rnabob.xml @ 0:cd00b4fe6552 draft

Imported from capsule None
author rnateam
date Mon, 22 Dec 2014 09:08:31 -0500
parents
children 5a4b00c84f50
comparison
equal deleted inserted replaced
-1:000000000000 0:cd00b4fe6552
1 <tool id="rbc_rnabob" name="RNABOB" version="2.2.1.0">
2 <description>Fast Pattern searching for RNA secondary structures</description>
3 <requirements>
4 <requirement type="package" version="2.2.1">rnabob</requirement>
5 </requirements>
6 <version_command>echo "2.2.1"</version_command>
7 <command>
8 <![CDATA[
9 rnabob
10 -q
11 $fancy
12 $compStrands
13 $skipOverlapping
14 $descriptorFile
15 $sequenceFile > $stdout
16 ]]>
17 </command>
18 <stdio>
19 <exit_code range="1:" level="fatal" description="Error occurred. Please check Tool Standard Error" />
20 <exit_code range=":-1" level="fatal" description="Error occurred. Please check Tool Standard Error" />
21 </stdio>
22 <inputs>
23 <param name="descriptorFile" type="data" format="txt" multiple="false" label="Motif Descriptor File" help="This file contains the description of the motif for which to search"/>
24 <param name="sequenceFile" type="data" format="fasta" multiple="false" label="Sequence File" help="This file specifies the sequence in which the motif will be searched"/>
25 <param name="compStrands" type="boolean" truevalue="-c" falsevalue="" checked="false" label="Also search on complementary strands" help="-c : Search both strands of the supplied sequence"/>
26 <param name="skipOverlapping" type="boolean" truevalue="-s" falsevalue="" checked="false" label="Skip overlapping matches" help="-s : This is a workaround to avoid a problem in the DNABANK, overlapping matches will be ignored"/>
27 <param name="fancy" type="boolean" checked="false" truevalue="-F" falsevalue="" label="Show Alignments" help="Display full alignments to pattern"/>
28 </inputs>
29 <outputs>
30 <data format="txt" name="stdout" label="${tool.name} on ${on_string}" />
31 </outputs>
32 <tests>
33 <test>
34 <param name="descriptorFile" value="r17.des" />
35 <param name="sequenceFile" value="F22B7.fa" />
36 <param name="compStrands" value="True" />
37 <param name="skipOverlapping" value="False" />
38 <param name="fancy" value="False" />
39 <output name="stdout" file="r17.bob" />
40 </test>
41 <test>
42 <param name="descriptorFile" value="trna.des" />
43 <param name="sequenceFile" value="F22B7.fa" />
44 <param name="compStrands" value="True" />
45 <param name="skipOverlapping" value="False" />
46 <param name="fancy" value="False" />
47 <output name="stdout" file="trna.bob" />
48 </test>
49 </tests>
50 <help>
51 **What RNABOB does**
52
53 RNABOB allows searching a sequence database for RNA structural motifs.
54 The probe motif is specified in a *descriptor* file,
55 which describes its primary sequence, secondary structure, and tertiary constraints.
56 The source in its original packaging can be found at http://selab.janelia.org/software/#rnabob.
57
58 -----
59
60 **Sequence database format**
61
62 RNABOB is currently restricted to reading sequence files in FASTA format.
63 The command line version of RNABOB can also read sequence files in GCG, EMBL, GenBank and other formats.
64
65 -----
66
67 **Descriptor file syntax**
68
69 The descriptor file syntax is fairly powerful, and allows a great deal of freedom for specifying
70 RNA motifs. The syntax is therefore a bit complicated.
71
72 The descriptor file has two parts: a **topology** description and an **explicit** description.
73
74 The first non-blank, non-comment line of the file is the topology description. It defines the
75 order of occurrence of a series of single-stranded, double-stranded and related elements. Each
76 element must be given a unique name (a number, typically) and must be prefixed with '**s**',
77 '**h**', or '**r**', indicating single-strand, helical, or a relational element. Helical and
78 relational elements are paired to other elements, which are suffixed by a prime, **\'**.
79
80 For example::
81
82 \
83 h1 s1 h1'
84
85 describes a hairpin loop structure with a simple helix and single-stranded loop. If the helix
86 always contained a non-canonical base pair at one position, the topology coud be described as::
87
88 \
89 h1 r1 h2 s1 h2' r1' h1'
90
91 where r1,r1' indicate a correlation, where the sequence r1 constrains the sequence of r1'.
92 (Helices are a special case of this.)
93
94 The remaining non-comment, non-blank lines are explicit descriptions of each element in turn. Each
95 line contains 3 or 4 fields, separated by tabs or blank space. The first field is the name of the
96 element, from the topology description. The second field is the number of mismatches allowed in
97 this element. The third field is the primary sequence constraint to apply to this element.
98
99 Helices and relational element pairs are specified on a single line rather than two. Mismatches
100 and primary sequence constraints are given as pairs, separated by a colon '**:**'. The left side
101 is the constraint applied to the upstream element, and the right side is applied to the downstream
102 elements.
103
104 The primary sequence constraint is given as a sequence of nucleotides. Any IUPAC single-letter
105 code is recognized, including N if the position can have any base identity. Allowed length
106 variations are specified with asterisks ``'*'``, where each ``*`` will allow either 0 or 1 N at
107 that position.
108
109 For example::
110
111 \
112 GGAGG******NNNAUG
113
114 specifies a GGAGG Shine/Dalgarno site and an AUG initiation codon, separated by a spacer of 3 to 9
115 nucleotides of any sequence.
116
117 An alternative syntax can be used for very long gaps::
118
119 \
120 GGAGG[10]NNNAUG is the same as GGAGG**********NNNAUG
121
122 Be careful defining variable length helices and relational elements; if the number and type (gap
123 or identity) of position do not match on left and right sides, the program will refuse to accept
124 the descriptor.
125
126 Relational elements have an additional field which specifies a "transformation matrix" of four
127 nucleotides, specifying the rule for making the ``r'`` pattern from the ``r`` sequence in order
128 ``A-C-G-T``. For example, the transformation matrix for a simple helix is ``TGCA``; if you allow
129 ``G-U`` pairs, it is ``TGYR``. RNABOB allows ``G-U`` pairing by default and uses the ``TGYR``
130 matrix for helical elements.
131
132 For example, the explicit description of our hairpin might be:
133
134 ::
135
136 \
137 h1 0:0 NNN:NNN
138 r1 0:0 R:N GNAN
139 h2 0:0 **NC:GN**
140 s1 0 UUCG
141
142 This describes a stem of 6 to 8 base pairs, in which the 4th pair from the bottom of the stem must
143 be a non-canonical GA pair. Note that, in general, the left side of the primary constraint for
144 helices and relational elements is redundant, and should be given as all N. In some cases it is
145 convenient to constrain the right side to require a particular base pair (GU, for instance) at one
146 position.
147
148 A note on mismatches: The split format for helices and relational elements works like this. The
149 number on the left constrains the primary sequence match of the left side of the primary
150 constraint. The number on the right constrains the match of the right side of the primary
151 constraint, *after* that side has been constructed according to the sequence on the left. In other
152 words, the number on the left constrains the mismatches in primary sequence only, while the number
153 on the right will constrain the number of mispaired positions in the helix.
154
155 Finally: any line that begins with a pound sign '#' is a comment line, and will not be interpreted
156 by the pattern compiler.
157
158 **Options**
159
160 The behavior of RNABOB can be modified by use of the following options:
161
162 *Complement*: Selecting this option will cause RNABOB to search for the pattern also on the
163 complementary strands.
164
165 *Skip*: This is a workaround to avoid a problem in the DNABANK. There are some sequences in the
166 database which have long stretches of ambiguous sequence (N's). Descriptors with no primary
167 sequence constraints will match these garbage sequences at many, many positions, and generate huge
168 outputs. This option toggles a search strategy that skips forward a pattern-length rather than a
169 single base when a match is found, thus printing out only a single match when overlapping matches
170 are found.
171
172 **Examples**
173
174 The following example descriptors included in the source distribution
175 (http://selab.janelia.org/software/rnabob/rnabob.tar.gz):
176
177 - trna.des - a general descriptor of a tRNA structure
178 - r17.des - descriptor of the consensus binding site for the r17 phage coat protein
179 - pseudoknot.des - description of a simple pseudoknotted structure
180
181 An example cosmid ``F22B7.fa`` from the *C. elegans* genome sequencing project is also provided
182 for running these descriptors against.
183
184 ::
185
186 \
187 # trna.des
188 #
189 # Generalized descriptor of a tRNA cloverleaf. Doesn't
190 # find them all though.
191 #
192
193 h1 s1 h2 s2 h2' s3 h3 s4 h3' s5 h4 s6 h4' h1' s8
194
195 h1 0:2 NNNNNNN:NNNNNNN
196 h2 0:1 *NNN:NNN*
197 h3 0:1 NNNNN:NNNNN
198 h4 0:1 NNNNN:NNNNN
199 s1 0 TN
200 s2 0 NNNN**********
201 s3 0 N
202 s4 0 NNNNNN*
203 s5 0 NN********************
204 s6 0 TTC****
205 s8 0 NCCA
206
207 Running RNABOB with ``trna.des`` against ``F22B7.fa`` searches the top strand of the cosmid for
208 the above motif. ``trna.des`` hits twice, once on each strand. (F22B7 has several other tRNA genes
209 in it which the pattern fails to detect - this is *not* a pattern to use for tRNA genefinding!).
210 </help>
211 <citations>
212 <citation type="doi">10.1093/bioinformatics/6.4.325</citation>
213 <citation type="bibtex">@UNPUBLISHED{rnabob,
214 author = {Eddy S.R},
215 title = {RNABOB: a program to search for RNA secondary structure motifs in sequence databases},
216 note = {}}</citation>
217 </citations>
218 </tool>