changeset 3:7adb2518f6e9 draft default tip

planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/rna_tools/rna_shapes commit 1527e05bcd748a2b3cef22e0e356697066a55635
author rnateam
date Sat, 11 Nov 2017 15:07:41 -0500
parents f33190f18ee6
children
files .tool_dependencies.xml.swp RNAShapes.xml tool_dependencies.xml
diffstat 3 files changed, 46 insertions(+), 43 deletions(-)
Binary file .tool_dependencies.xml.swp has changed
--- a/RNAShapes.xml	Fri Jun 19 11:43:35 2015 -0400
+++ b/RNAShapes.xml	Sat Nov 11 15:07:41 2017 -0500
@@ -1,5 +1,4 @@
-<?xml version='1.0' encoding='UTF-8'?>
-<tool id="RNAshapes" name="RNAshapes" version="3.2.2">
+<tool id="RNAshapes" name="RNAshapes" version="3.3.0">
   <description>RNA Secondary structure prediction</description>
   <macros>
     <token name="@EXECUTABLE@">RNAshapes</token>
@@ -65,20 +64,12 @@
       <param name="param_structure_probs" type="boolean" truevalue="1" falsevalue="0" label="Structure Probabilities" help="(--structureProbs) In addition to free energy also the probability of structures is calculated."/>
     </macro>
   </macros>
-
   <requirements>
     <requirement type="binary">@EXECUTABLE@</requirement>
-    <requirement type="package" version="3.2.5">rnashapes</requirement>
+    <requirement type="package" version="3.3.0">rnashapes</requirement>
   </requirements>
-  <stdio>
-    <exit_code range="1:"/>
-    <exit_code range=":-1"/>
-    <regex match="Error:"/>
-    <regex match="Exception:"/>
-  </stdio>
-
-  <command>
-  <![CDATA[
+  <command detect_errors="aggressive">
+<![CDATA[
   RNAshapes 
   --mode $param_cond_mode.param_mode
   #if $param_cond_mode.param_mode != 'outside':
@@ -230,50 +221,68 @@
 
   <help>
 
-**What id does**
+**What it does**
 
-RNA secondary structure predictions
+This tool predicts RNA secondary structures. RNAshape abstraction maps structures to a tree-like domain of shapes, retaining adjacency and nesting of structural features, but disregarding helix lengths. Shape abstraction integrates well with dynamic programming algorithms, and hence it can be applied during structure prediction rather than afterwards. This avoids exponential explosion and can still give us a non-heuristic and complete account of properties of the molecule's folding space.
+
+
+**Input**
 
------
+- **RNA sequence(s)**: A (multiple) FASTA file containing RNA primary sequences.
+- **RNA secondary structure**: A Vienna dot-bracket formatted string representing an RNA secondary structure.
+- **RNA sequence**: Exactly one RNA primary sequence.
+- **RNA family**: A family of at least two potentially related RNA sequences. This is not an alignment, since the sequences can have different lengths.
+
 
-**modes**
+**Parameters**  
+
+**Calculation Modes**
+
+- **shapes**: The output of "subopt" mode is crowded with many very similar answers, which makes it hard to focus on the "important" changes. The abstract shape concept groups similar answers together and reports only the best answer within each group. Thanks to this abstraction, suboptimal analyses can be done more thoroughly by ignoring unimportant differences (see parameter --shapeLevel).
 
-+ **mfe**: Computes the single energetically most stable secondary structure for the given RNA sequence. Co-optimal results will be suppressed, i.e. should different prediction have the same best energy value, just an arbitrary one out of them will be reported. This resembles the function of the program "RNAfold" of the Vienna group. If you only use "mfe" mode, consider switching to RNAfold, because their implementation is much faster, due to sophisticated low level C optimisations.
-+ **subopt**: Often, the biological relevant structure is hidden among suboptimal predictions. In "subopt" mode, you can also inspect all suboptimal solutions up to a given threshold (see parameters --absoluteDeviation and --relativeDeviation). Duplicates might appear when using grammar "microstate", due to its semantic ambiguity according Vienna-Dot-Bracket strings. 
-+ **shapes**: Output of "subopt" mode is crowded by many very similar answers, which make it hard to focus to the "important" changes. The abstract shape concept groups similar answers together and reports only the best answer within such a group. Due to abstraction, suboptimal analyses can be done more thorough, by ignoring boring differences. (see parameter --shapeLevel)
-+ **probs**: Structure probabilities are strictly correlated to their energy values. Grouped together into shape classes, their probabilities add up. Often a shape class with many members of worse energy becomes more probable than the shape containing the mfe structure but not much more members.
-+ **sample**: Probabilistic sampling based on partition function. This mode combines stochastic sampling with a-posteriori shape abstraction. A sample from the structure space holds M structures together with their shapes, on which classification is performed. The probability of a shape can then be approximated by its frequency in the sample.
-+ **cast**: This mode is the RNAcast approach. For a family of RNA sequences, this method independently enumerates the near-optimal abstract shape space, and predicts as the consensus an abstract shape common to all sequences. For each sequence, it delivers the thermodynamically best structure which has this common shape. Input is a multiple fasta file, which should contain at least two sequences. Output is sorted by "score" of common shapes, i.e. summed free energy of all sequences. R is the rank (= list position) of the shape in individual sequence analysis.
-+ **eval**: Evaluates the free energy of an RNA molecule in fixed secondary structure, similar to RNAeval from the Vienna group. Multiple answers stem from semantic ambiguity of the underlying grammar. It might happen, that your given structure is not a structure for the sequence. Maybe your settings are too restrictive, e.g. not allowing lonely base-pairs (--allowLP). If you input a (multiple) FASTA file, RNAshapes assumes that exactly first half of the contents of each entry is RNA sequence, second half is the according structure. Whitespaces are ignored.
-+ **abstract**: Converts a Vienna-Dot-Bracket representation of a secondary structure into a shape string.
-+ **outside**: Applies the "outside"-algorithm to compute probabilities for all base pairs (i,j), based on the partition function. Output is a PostScript file, visualizing these probabilities as a "dot plot". The "dot plot" shows a matrix of squares with area proportional to the base pair probabilities in the upper right half. For each pair (i,j) with probability above --bppmThreshold there is a line of the form i j sqrt(p) ubox in the PostScript file, so that they can be easily extracted.
-+ **mea**: Finds the secondary structure with the maximal sum of base-pair probabilities (MEA=maximal expected accuracy). The equivalent Vienna Package name is the 'centroid secondary structure', defined as 'The centroid structure is the structure with the minimum total base-pair distance to all structures in the thermodynamic ensemble.'.
+- **mfe**: Computes the single energetically most stable secondary structure for the given RNA sequence. Co-optimal results are suppressed, i.e. should different predictions have the same best energy value, just an arbitrary one of them is reported. This resembles the function of the program "RNAfold" of the Vienna group. If you only use "mfe" mode, consider switching to RNAfold, because its implementation is much faster due to sophisticated low-level C optimisations.
+
+- **subopt**: Often, the biologically relevant structure is hidden among suboptimal predictions. In "subopt" mode, you can also inspect all suboptimal solutions up to a given threshold (see parameters --absoluteDeviation and --relativeDeviation). Duplicates might appear when using grammar "microstate", due to its semantic ambiguity with respect to Vienna-Dot-Bracket strings.
 
------
+- **probs**: Structure probabilities are strictly correlated to their energy values. Grouped together into shape classes, their probabilities add up. Often a shape class with many members of worse energy becomes more probable than the shape class that contains the mfe structure but has few other members.
+
+- **sample**: Probabilistic sampling based on partition function. This mode combines stochastic sampling with a-posteriori shape abstraction. A sample from the structure space holds M structures together with their shapes, on which classification is performed. The probability of a shape can then be approximated by its frequency in the sample.
+
+- **cast**: This mode is the RNAcast approach. For a family of RNA sequences, this method independently enumerates the near-optimal abstract shape space, and predicts as the consensus an abstract shape common to all sequences. For each sequence, it delivers the thermodynamically best structure which has this common shape. Input is a multiple FASTA file, which should contain at least two sequences. Output is sorted by "score" of common shapes, i.e. summed free energy of all sequences. R is the rank (= list position) of the shape in individual sequence analysis.
 
-**grammar**
+- **eval**: Evaluates the free energy of an RNA molecule in a fixed secondary structure, similar to RNAeval from the Vienna group. Multiple answers stem from semantic ambiguity of the underlying grammar. It might happen that your given structure is not a valid structure for the sequence. Maybe your settings are too restrictive, e.g. not allowing lonely base-pairs (--allowLP). If you input a (multiple) FASTA file, RNAshapes assumes that exactly the first half of the contents of each entry is the RNA sequence and the second half is the corresponding structure. Whitespace is ignored.
 
-How to treat "dangling end" energies for bases adjacent to helices in free ends and multi-loops. 
+- **abstract**: Converts a Vienna-Dot-Bracket representation of a secondary structure into a shape string.
 
-+ **nodangle**: (-d 0 in Vienna package) ignores dangling energies altogether.
-+ **overdangle**: (-d 2 in Vienna package) always dangles bases onto helices, even if they are part of neighboring helices themselves. Seems to be wrong, but could perform surprisingly well.
-+ **microstate**: (-d 1 in Vienna package) correct optimisation of all dangling possibilities, unfortunately this results in an semantically ambiguous search space regarding Vienna-Dot-Bracket notations.
-+ **macrostate**: (no correspondens in Vienna package) same as microstate, while staying unambiguous. Unfortunately, mfe computation violates Bellman's principle of optimality. Default is "macrostate".
+- **outside**: Applies the "outside"-algorithm to compute probabilities for all base pairs (i,j), based on the partition function. Output is a PostScript file, visualizing these probabilities as a "dot plot". The "dot plot" shows a matrix of squares with area proportional to the base pair probabilities in the upper right half. For each pair (i,j) with probability above --bppmThreshold there is a line of the form i j sqrt(p) ubox in the PostScript file, so that they can be easily extracted.
 
------
+- **mea**: Finds the secondary structure with the maximal sum of base-pair probabilities (MEA=maximal expected accuracy). The equivalent Vienna Package name is the 'centroid secondary structure', defined as 'The centroid structure is the structure with the minimum total base-pair distance to all structures in the thermodynamic ensemble.'.
 
-**windowSize**
+
+**Window Size**
 
 Activates window mode and computes substrings of size i for the input. After computation for the first i bases is done, the window is pushed j bases to the right and the next computation is started. j is set by --windowIncrement. i must be a non-zero positive integer, smaller than the input length.
 
+
 **windowIncrement**
 
 If --windowSize is given, this parameter sets the offset for the next window to j bases. j must be a non-zero positive integer, smaller than --windowSize.
 
------
+
+**Dangling End Energies**
+
+How to treat "dangling end" energies for bases adjacent to helices in free ends and multi-loops. 
+
+- **nodangle**: (-d 0 in Vienna package) ignores dangling energies altogether.
+- **overdangle**: (-d 2 in Vienna package) always dangles bases onto helices, even if they are part of neighboring helices themselves. Seems to be wrong, but could perform surprisingly well.
+- **microstate**: (-d 1 in Vienna package) correct optimisation of all dangling possibilities; unfortunately, this results in a semantically ambiguous search space regarding Vienna-Dot-Bracket notations.
+- **macrostate**: (no equivalent in Vienna package) same as microstate, while staying unambiguous. Unfortunately, mfe computation violates Bellman's principle of optimality. Default is "macrostate".
+
 
 For more information, visit http://bibiserv2.cebitec.uni-bielefeld.de/rnashapes?id=rnashapes_rnashapes_manual_manual
+
   </help>
+  
   <citations>
     <citation type="doi">doi:10.1093/bioinformatics/btu649</citation>
   </citations>
--- a/tool_dependencies.xml	Fri Jun 19 11:43:35 2015 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,6 +0,0 @@
-<?xml version="1.0"?>
-<tool_dependency>
-    <package name="rnashapes" version="3.2.5">
-        <repository changeset_revision="62faae9d2401" name="package_rnashapes_3_2_5" owner="rnateam" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu" />
-    </package>
-</tool_dependency>
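
Note on the help text for the "outside" mode in RNAShapes.xml: it states that every base pair (i,j) above --bppmThreshold appears in the PostScript output as a line of the form `i j sqrt(p) ubox`, so that the pairs can be easily extracted. A minimal sketch of such an extraction is given below. It assumes the fields are separated by whitespace and that each entry sits on its own line; the function name and the example excerpt are hypothetical and are not part of RNAshapes or of this changeset.

```python
import re

# Assumed line layout, taken from the help text: "i j sqrt(p) ubox".
# Whitespace-separated fields on a single line are an assumption.
UBOX_LINE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+([0-9]*\.?[0-9]+(?:[eE][-+]?\d+)?)\s+ubox\s*$"
)


def extract_base_pair_probs(ps_text):
    """Return {(i, j): p} for every 'i j sqrt(p) ubox' line in ps_text."""
    probs = {}
    for line in ps_text.splitlines():
        match = UBOX_LINE.match(line)
        if match is None:
            continue  # not a base-pair entry, e.g. PostScript header or drawing code
        i, j = int(match.group(1)), int(match.group(2))
        sqrt_p = float(match.group(3))
        # The file stores the square root of the probability, so square it.
        probs[(i, j)] = sqrt_p ** 2
    return probs


# Hypothetical excerpt of a dot plot file, for illustration only.
example = """\
%!PS-Adobe-3.0
1 20 0.5 ubox
2 19 0.9 ubox
showpage
"""

pairs = extract_base_pair_probs(example)
for (i, j), p in sorted(pairs.items()):
    print(i, j, round(p, 4))
```

Squaring the third field recovers the probability p itself, since the file stores sqrt(p). Lines that do not match the pattern are skipped.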