diff tools/protein_analysis/tmhmm2.xml @ 0:bca9bc7fdaef

Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
author peterjc
date Tue, 07 Jun 2011 18:03:34 -0400
parents
children 3ff1dcbb9440
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/protein_analysis/tmhmm2.xml	Tue Jun 07 18:03:34 2011 -0400
@@ -0,0 +1,81 @@
+<tool id="tmhmm2" name="TMHMM 2.0" version="0.0.1">
+    <description>Find transmembrane domains in protein sequences</description>
+    <command interpreter="python">
+      tmhmm2.py 8 $fasta_file $tabular_file
+      ##I want the number of threads to be a Galaxy config option...
+    </command>
+    <inputs>
+        <param name="fasta_file" type="data" format="fasta" label="FASTA file of protein sequences"/> 
+        <!--
+        <param name="version" type="select" display="radio" label="Model version">
+            <option value="">Version 1 (old)</option>
+            <option value="" selected="True">Version 2 (default)</option>
+        </param>
+        -->
+    </inputs>
+    <outputs>
+        <data name="tabular_file" format="tabular" label="TMHMM results" />
+    </outputs>
+    <requirements>
+        <requirement type="binary">tmhmm</requirement>
+    </requirements>
+    <tests>
+        <test>
+            <param name="fasta_file" value="four_human_proteins.fasta" ftype="fasta"/>
+            <output name="tabular_file" file="four_human_proteins.tmhmm2.tsv" ftype="tabular"/>
+        </test>
+    </tests>
+    <help>
+    
+**What it does**
+
+This calls the TMHMM v2.0 tool for prediction of transmembrane (TM) helices in proteins using a hidden Markov model (HMM).
+
+The input is a FASTA file of protein sequences, and the output is tabular with six columns (one row per protein):
+
+ 1. Sequence identifier
+ 2. Sequence length
+ 3. Expected number of amino acids in TM helices (ExpAA). If this number is larger than 18 it is very likely to be a transmembrane protein (OR have a signal peptide).
+ 4. Expected number of amino acids in TM helices in the first 60 amino acids of the protein (Exp60). If this number is more than a few, be aware that a predicted transmembrane helix in the N-terminal region could be a signal peptide.
+ 5. Number of transmembrane helices predicted by N-best.
+ 6. Topology predicted by N-best (encoded as a string using o for outside and i for inside).
+
+Predicted TM segments in the N-terminal region sometimes turn out to be signal peptides.
+
+One of the most common mistakes by the program is to reverse the direction of proteins with one TM segment.
+
+Do not use the program to predict whether a non-membrane protein is cytoplasmic or not. 
+
+**Notes**
+
+The raw output from TMHMM v2.0 looks like this (six columns tab separated):
+
+=================================== ======= =========== ============= ========= =============================
+gi|2781234|pdb|1JLY|B               len=304 ExpAA=0.01  First60=0.00  PredHel=0 Topology=o
+gi|4959044|gb|AAD34209.1|AF069992_1 len=600 ExpAA=0.00  First60=0.00  PredHel=0 Topology=o
+gi|671626|emb|CAA85685.1|           len=473 ExpAA=0.19  First60=0.00  PredHel=0 Topology=o
+gi|3298468|dbj|BAA31520.1|          len=107 ExpAA=59.37 First60=31.17 PredHel=3 Topology=o23-45i52-74o89-106i
+=================================== ======= =========== ============= ========= =============================
+
+In order to make it easier to use in Galaxy, the wrapper script simplifies this to remove the redundant tags, and instead adds a comment line at the top with the column names:
+
+=================================== === ===== ======= ======= ====================
+#ID                                 len ExpAA First60 PredHel Topology
+gi|2781234|pdb|1JLY|B               304  0.01    0.00       0 o
+gi|4959044|gb|AAD34209.1|AF069992_1 600  0.00    0.00       0 o
+gi|671626|emb|CAA85685.1|           473  0.19    0.00       0 o
+gi|3298468|dbj|BAA31520.1|          107 59.37   31.17       3 o23-45i52-74o89-106i
+=================================== === ===== ======= ======= ====================
+
+**References**
+
+Krogh, Larsson, von Heijne, and Sonnhammer.
+Predicting Transmembrane Protein Topology with a Hidden Markov Model: Application to Complete Genomes.
+J. Mol. Biol. 305:567-580, 2001.
+
+Sonnhammer, von Heijne, and Krogh.
+A hidden Markov model for predicting transmembrane helices in protein sequences.
+In J. Glasgow et al., eds.: Proc. Sixth Int. Conf. on Intelligent Systems for Molecular Biology, pages 175-182. AAAI Press, 1998.
+
+    </help>
+</tool>
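
The help text above says the wrapper strips the redundant `tag=` prefixes from the raw TMHMM output and adds a header comment line. The following is a minimal sketch of that idea, not the actual contents of `tmhmm2.py`. The names `simplify_line` and `parse_topology` are hypothetical, and the sketch assumes each raw line has exactly six tab-separated fields in the order shown in the help text.

```python
import re

# Column tags in the order shown in the help text's raw-output example.
EXPECTED_TAGS = ["len", "ExpAA", "First60", "PredHel", "Topology"]
HEADER = "#ID\t" + "\t".join(EXPECTED_TAGS)


def simplify_line(line):
    """Turn one raw TMHMM line into a tag-free tabular row (hypothetical helper)."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 6:
        raise ValueError("Expected 6 tab-separated fields, got %d" % len(parts))
    identifier, fields = parts[0], parts[1:]
    values = []
    for tag, field in zip(EXPECTED_TAGS, fields):
        prefix = tag + "="
        if not field.startswith(prefix):
            raise ValueError("Expected field starting %r, got %r" % (prefix, field))
        values.append(field[len(prefix):])
    return "\t".join([identifier] + values)


def parse_topology(topology):
    """Extract (start, end) pairs for each predicted helix from a topology string."""
    return [(int(start), int(end))
            for start, end in re.findall(r"(\d+)-(\d+)", topology)]


if __name__ == "__main__":
    raw = ("gi|3298468|dbj|BAA31520.1|\tlen=107\tExpAA=59.37\t"
           "First60=31.17\tPredHel=3\tTopology=o23-45i52-74o89-106i")
    print(HEADER)
    row = simplify_line(raw)
    print(row)
    # The last column holds the topology string; list its helix coordinates.
    print(parse_topology(row.split("\t")[-1]))
```

Checking each tag before stripping it means that a change in the TMHMM output layout raises an error rather than silently shifting columns.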