view rnabob.xml @ 0:cd00b4fe6552 draft
Imported from capsule None
author | rnateam |
---|---|
date | Mon, 22 Dec 2014 09:08:31 -0500 |
parents | |
children | 5a4b00c84f50 |
<tool id="rbc_rnabob" name="RNABOB" version="2.2.1.0">
    <description>Fast Pattern searching for RNA secondary structures</description>
    <requirements>
        <requirement type="package" version="2.2.1">rnabob</requirement>
    </requirements>
    <version_command>echo "2.2.1"</version_command>
    <command>
<![CDATA[
    rnabob -q $fancy $compStrands $skipOverlapping $descriptorFile $sequenceFile > $stdout
]]>
    </command>
    <stdio>
        <exit_code range="1:" level="fatal" description="Error occurred. Please check Tool Standard Error" />
        <exit_code range=":-1" level="fatal" description="Error occurred. Please check Tool Standard Error" />
    </stdio>
    <inputs>
        <param name="descriptorFile" type="data" format="txt" multiple="false" label="Motif Descriptor File" help="This file contains the description of the motif for which to search"/>
        <param name="sequenceFile" type="data" format="fasta" multiple="false" label="Sequence File" help="This file specifies the sequence in which the motif will be searched"/>
        <param name="compStrands" type="boolean" truevalue="-c" falsevalue="" checked="false" label="Also search on complementary strands" help="-c : Search both strands of the supplied sequence"/>
        <param name="skipOverlapping" type="boolean" truevalue="-s" falsevalue="" checked="false" label="Skip overlapping matches" help="-s : This is a workaround to avoid a problem in the DNABANK, overlapping matches will be ignored"/>
        <param name="fancy" type="boolean" checked="false" truevalue="-F" falsevalue="" label="Show Alignments" help="Display full alignments to pattern"/>
    </inputs>
    <outputs>
        <data format="txt" name="stdout" label="${tool.name} on ${on_string}" />
    </outputs>
    <tests>
        <test>
            <param name="descriptorFile" value="r17.des" />
            <param name="sequenceFile" value="F22B7.fa" />
            <param name="compStrands" value="True" />
            <param name="skipOverlapping" value="False" />
            <param name="fancy" value="False" />
            <output name="stdout" file="r17.bob" />
        </test>
        <test>
            <param name="descriptorFile" value="trna.des" />
            <param
            name="sequenceFile" value="F22B7.fa" />
            <param name="compStrands" value="True" />
            <param name="skipOverlapping" value="False" />
            <param name="fancy" value="False" />
            <output name="stdout" file="trna.bob" />
        </test>
    </tests>
    <help>

**What RNABOB does**

RNABOB allows searching a sequence database for RNA structural motifs. The probe motif is specified in a *descriptor* file, which describes its primary sequence, secondary structure, and tertiary constraints.

The source in its original packaging can be found at http://selab.janelia.org/software/#rnabob.

-----

**Sequence database format**

RNABOB is currently restricted to reading sequence files in FASTA format. The command line version of RNABOB can also read sequence files in GCG, EMBL, GenBank and other formats.

-----

**Descriptor file syntax**

The descriptor file syntax is fairly powerful, and allows a great deal of freedom for specifying RNA motifs. The syntax is therefore a bit complicated.

The descriptor file has two parts: a **topology** description and an **explicit** description.

The first non-blank, non-comment line of the file is the topology description. It defines the order of occurrence of a series of single-stranded, double-stranded and related elements. Each element must be given a unique name (a number, typically) and must be prefixed with '**s**', '**h**', or '**r**', indicating a single-stranded, helical, or relational element. Helical and relational elements are paired to other elements, which are suffixed by a prime, **\'**. For example::

    \
    h1 s1 h1'

describes a hairpin loop structure with a simple helix and single-stranded loop. If the helix always contained a non-canonical base pair at one position, the topology could be described as::

    \
    h1 r1 h2 s1 h2' r1' h1'

where r1 and r1' indicate a correlation in which the sequence of r1 constrains the sequence of r1'. (Helices are a special case of this.)

The remaining non-comment, non-blank lines are explicit descriptions of each element in turn.
Each line contains 3 or 4 fields, separated by tabs or blank space. The first field is the name of the element, from the topology description. The second field is the number of mismatches allowed in this element. The third field is the primary sequence constraint to apply to this element.

Helices and relational element pairs are specified on a single line rather than two. Mismatches and primary sequence constraints are given as pairs, separated by a colon '**:**'. The left side is the constraint applied to the upstream element, and the right side is applied to the downstream element.

The primary sequence constraint is given as a sequence of nucleotides. Any IUPAC single-letter code is recognized, including N if the position can have any base identity. Allowed length variations are specified with asterisks ``'*'``, where each ``*`` will allow either 0 or 1 N at that position. For example::

    \
    GGAGG******NNNAUG

specifies a GGAGG Shine/Dalgarno site and an AUG initiation codon, separated by a spacer of 3 to 9 nucleotides of any sequence. An alternative syntax can be used for very long gaps::

    \
    GGAGG[10]NNNAUG is the same as GGAGG**********NNNAUG

Be careful when defining variable-length helices and relational elements; if the number and type (gap or identity) of positions do not match on the left and right sides, the program will refuse to accept the descriptor.

Relational elements have an additional field which specifies a "transformation matrix" of four nucleotides, specifying the rule for making the ``r'`` pattern from the ``r`` sequence in order ``A-C-G-T``. For example, the transformation matrix for a simple helix is ``TGCA``; if you allow ``G-U`` pairs, it is ``TGYR``. RNABOB allows ``G-U`` pairing by default and uses the ``TGYR`` matrix for helical elements.

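The length-variation and transformation-matrix rules above can be illustrated with a small Python sketch. This is not RNABOB's own code: the function names (``expand_gaps``, ``length_range``, ``allowed_partners``) are hypothetical, and the IUPAC table is a deliberately small subset that only covers the codes used in the matrices quoted above.

```python
import re

# Assumption: a small subset of IUPAC codes, enough for TGCA and TGYR.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "N": {"A", "C", "G", "T"},
}


def expand_gaps(pattern):
    """Rewrite the [n] shorthand as n asterisks."""
    return re.sub(r"\[(\d+)\]", lambda m: "*" * int(m.group(1)), pattern)


def length_range(pattern):
    """Return (shortest, longest) match length; each * stands for 0 or 1 N."""
    expanded = expand_gaps(pattern)
    gaps = expanded.count("*")
    fixed = len(expanded) - gaps
    return fixed, fixed + gaps


def allowed_partners(base, matrix="TGYR"):
    """Bases allowed opposite `base` for a matrix given in A-C-G-T order."""
    code = matrix["ACGT".index(base)]
    return IUPAC[code]


print(expand_gaps("GGAGG[10]NNNAUG"))
print(length_range("GGAGG******NNNAUG"))
print(sorted(allowed_partners("G", "TGCA")))
print(sorted(allowed_partners("G", "TGYR")))
```

With the ``TGCA`` matrix a G may only pair with C, while ``TGYR`` also admits T (that is, U), which is how the default matrix permits ``G-U`` pairs.
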
For example, the explicit description of our hairpin might be::

    \
    h1 0:0 NNN:NNN
    r1 0:0 R:N GNAN
    h2 0:0 **NC:GN**
    s1 0 UUCG

This describes a stem of 6 to 8 base pairs, in which the 4th pair from the bottom of the stem must be a non-canonical GA pair.

Note that, in general, the left side of the primary constraint for helices and relational elements is redundant, and should be given as all N. In some cases it is convenient to constrain the right side to require a particular base pair (GU, for instance) at one position.

A note on mismatches: The split format for helices and relational elements works like this. The number on the left constrains the primary sequence match of the left side of the primary constraint. The number on the right constrains the match of the right side of the primary constraint, *after* that side has been constructed according to the sequence on the left. In other words, the number on the left constrains the mismatches in primary sequence only, while the number on the right will constrain the number of mispaired positions in the helix.

Finally: any line that begins with a pound sign '#' is a comment line, and will not be interpreted by the pattern compiler.

**Options**

The behavior of RNABOB can be modified by use of the following options:

*Complement*: Selecting this option will cause RNABOB to also search for the pattern on the complementary strand.

*Skip*: This is a workaround to avoid a problem in the DNABANK. There are some sequences in the database which have long stretches of ambiguous sequence (N's). Descriptors with no primary sequence constraints will match these garbage sequences at many, many positions, and generate huge outputs. This option toggles a search strategy that skips forward a pattern-length rather than a single base when a match is found, thus printing out only a single match when overlapping matches are found.

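The effect of the *Skip* option can be pictured with the following Python sketch. It is only an illustration of the strategy described above, not RNABOB's implementation: the ``scan`` function and its parameters are hypothetical, and a fixed pattern length is assumed for simplicity even though real descriptors may match variable lengths.

```python
def scan(sequence, pattern_length, matches, skip_overlapping=False):
    """Return the start positions at which `matches(window)` is true.

    Assumption: every match has the same fixed length `pattern_length`.
    """
    hits = []
    pos = 0
    while len(sequence) - pos >= pattern_length:
        window = sequence[pos:pos + pattern_length]
        if matches(window):
            hits.append(pos)
            # Skip mode jumps a whole pattern length past the hit;
            # the default mode moves on by a single base.
            pos += pattern_length if skip_overlapping else 1
        else:
            pos += 1
    return hits


def unconstrained(window):
    """Stand-in for a descriptor with no primary sequence constraint."""
    return True


ambiguous = "N" * 10
print(scan(ambiguous, 4, unconstrained))
print(scan(ambiguous, 4, unconstrained, skip_overlapping=True))
```

On a run of ten N's with a pattern of length 4, the default scan reports every overlapping start position, while skip mode reports only the non-overlapping ones.
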
**Examples**

The following example descriptors are included in the source distribution (http://selab.janelia.org/software/rnabob/rnabob.tar.gz):

- trna.des - a general descriptor of a tRNA structure
- r17.des - descriptor of the consensus binding site for the r17 phage coat protein
- pseudoknot.des - description of a simple pseudoknotted structure

An example cosmid ``F22B7.fa`` from the *C. elegans* genome sequencing project is also provided for running these descriptors against.

::

    \
    # trna.des
    #
    # Generalized descriptor of a tRNA cloverleaf. Doesn't
    # find them all though.
    #
    h1 s1 h2 s2 h2' s3 h3 s4 h3' s5 h4 s6 h4' h1' s8
    h1 0:2 NNNNNNN:NNNNNNN
    h2 0:1 *NNN:NNN*
    h3 0:1 NNNNN:NNNNN
    h4 0:1 NNNNN:NNNNN
    s1 0 TN
    s2 0 NNNN**********
    s3 0 N
    s4 0 NNNNNN*
    s5 0 NN********************
    s6 0 TTC****
    s8 0 NCCA

Running RNABOB with ``trna.des`` against ``F22B7.fa`` searches only the top strand of the cosmid for the above motif unless the complementary-strand option is selected. With both strands searched, ``trna.des`` hits twice, once on each strand. (F22B7 has several other tRNA genes in it which the pattern fails to detect - this is *not* a pattern to use for tRNA gene finding!)

    </help>
    <citations>
        <citation type="doi">10.1093/bioinformatics/6.4.325</citation>
        <citation type="bibtex">@UNPUBLISHED{rnabob,
            author = {Eddy S.R},
            title = {RNABOB: a program to search for RNA secondary structure motifs in sequence databases},
            note = {}}</citation>
    </citations>
</tool>