diff tools/protein_analysis/signalp3.xml @ 11:99b82a2b1272 draft

Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
author peterjc
date Wed, 03 Apr 2013 10:49:10 -0400
parents e52220a9ddad
children 7de64c8b258d
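Note on the help text touched by this changeset: the column ranges and the cleavage site numbering it documents can be sketched in Python as below. This is only an illustrative sketch, not code from the wrapper. The function names `split_signalp3_row` and `split_at_cleavage` are hypothetical, and the rows are assumed to be tab-separated with exactly twenty fields, as the help text describes.

```python
# Hypothetical helpers for illustration; only the column ranges and the
# cleavage numbering rule are taken from the help text in this changeset.


def split_signalp3_row(line):
    """Split one tab-separated output row into its documented groups."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 20:
        raise ValueError("expected 20 columns, got %d" % len(fields))
    return {
        "id": fields[0],       # column 1: sequence identifier
        "nn": fields[1:14],    # columns 2-14: 13 NN prediction values
        "hmm": fields[14:20],  # columns 15-20: 6 HMM prediction values
    }


def split_at_cleavage(sequence, reported_position):
    """Return (signal_peptide, mature_protein) for a reported site.

    reported_position is 1-based and names the first residue of the
    mature protein, so a site between residues 26 and 27 is given as 27.
    """
    if not 1 <= reported_position <= len(sequence):
        raise ValueError("position outside the sequence")
    cut = reported_position - 1  # convert to a 0-based slice index
    return sequence[:cut], sequence[cut:]
```

With a reported position of 27, the first function argument is split into a 26-residue signal peptide and a mature protein that starts at residue 27.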
--- a/tools/protein_analysis/signalp3.xml	Wed Mar 27 11:21:05 2013 -0400
+++ b/tools/protein_analysis/signalp3.xml	Wed Apr 03 10:49:10 2013 -0400
@@ -1,4 +1,4 @@
-<tool id="signalp3" name="SignalP 3.0" version="0.0.10">
+<tool id="signalp3" name="SignalP 3.0" version="0.0.11">
     <description>Find signal peptides in protein sequences</description>
     <!-- If job splitting is enabled, break up the query file into parts -->
     <!-- Using 2000 chunks meaning 4 threads doing 500 each is ideal -->
@@ -71,9 +71,13 @@
 
 The input is a FASTA file of protein sequences, and the output is tabular with twenty columns (one row per protein):
 
- * Sequence identifier
- * Neural Network (NN) predictions (13 columns)
- * Hidden Markov Model (HMM) predictions (6 columns)
+====== =================================================
+Column Description
+------ -------------------------------------------------
+     1 Sequence identifier
+  2-14 Neural Network (NN) predictions (13 columns)
+ 15-20 Hidden Markov Model (HMM) predictions (6 columns)
+====== =================================================
 
 Internally the input FASTA file is divided into parts (to allow multiple processors to be used), and the proteins truncated as specified (see below). The raw output from SignalP is then reformatted into a tabular layout suitable for Galaxy (see below).
 
@@ -83,15 +87,47 @@
 
 The NN output comprises three different scores (C-max, S-max and Y-max) and two scores derived from them (S-mean and D-score).
 
-The C-score is the 'cleavage site' score. For each position in the submitted sequence, a C-score is reported, which should only be significantly high at the cleavage site. Confusion is often seen with the position numbering of the cleavage site. When a cleavage site position is referred to by a single number, the number indicates the first residue in the mature protein, meaning that a predicted cleavage site between amino acid 26-27 is reported as 27, corresponding to the mature protein starting at (and including) position 27.
-
-The S-score for the signal peptide prediction is calculated for every single amino acid position in the submitted sequence (not shown in the output via Galaxy), with high scores indicating that the corresponding amino acid is part of a signal peptide, and low scores indicating that the amino acid is part of a mature protein.
-
-Y-max is a derivative of the C-score combined with the S-score resulting in a better cleavage site prediction than the raw C-score alone. This is due to the fact that multiple high-peaking C-scores can be found in one sequence, where only one is the true cleavage site. The cleavage site is assigned from the Y-score where the slope of the S-score is steep and a significant C-score is found.
-
-The S-mean is the average of the S-score, ranging from the N-terminal amino acid to the amino acid assigned with the highest Y-max score, thus the S-mean score is calculated for the length of the predicted signal peptide. The S-mean score was in SignalP version 2.0 used as the criteria for discrimination of secretory and non-secretory proteins.
-
-The D-score was introduced in SignalP version 3.0 and is a simple average of the S-mean and Y-max score. The score shows superior discrimination performance of secretory and non-secretory proteins to that of the S-mean score which was used in SignalP version 1 and 2.
+====== ======= ===============================================================
+Column Name    Description 
+------ ------- ---------------------------------------------------------------
+   2-4 C-score The C-score is the 'cleavage site' score. For each position in
+               the submitted sequence, a C-score is reported, which should
+               only be significantly high at the cleavage site. Confusion is
+               often seen with the position numbering of the cleavage site.
+               When a cleavage site position is referred to by a single number,
+               the number indicates the first residue in the mature protein,
+               meaning that a predicted cleavage site between amino acids 26-27
+               is reported as 27, corresponding to the mature protein starting
+               at (and including) position 27.
+------ ------- ---------------------------------------------------------------
+   5-7 S-score The S-score for the signal peptide prediction is calculated for
+               every single amino acid position in the submitted sequence (not
+               shown in the output via Galaxy), with high scores indicating
+               that the corresponding amino acid is part of a signal peptide,
+               and low scores indicating that the amino acid is part of a
+               mature protein.
+------ ------- ---------------------------------------------------------------
+  8-10 Y-max   Y-max is a derivative of the C-score combined with the S-score
+               resulting in a better cleavage site prediction than the raw
+               C-score alone. This is due to the fact that multiple high-peaking
+               C-scores can be found in one sequence, where only one is the
+               true cleavage site. The cleavage site is assigned from the
+               Y-score where the slope of the S-score is steep and a
+               significant C-score is found.
+------ ------- ---------------------------------------------------------------
+ 11-12 S-mean  The S-mean is the average of the S-score from the N-terminal
+               amino acid to the amino acid assigned the highest Y-max
+               score, so it is calculated over the length of the predicted
+               signal peptide. In SignalP version 2.0 the S-mean score was
+               used as the criterion for discriminating secretory from
+               non-secretory proteins.
+------ ------- ---------------------------------------------------------------
+ 13-14 D-score The D-score was introduced in SignalP version 3.0 and is a
+               simple average of the S-mean and Y-max scores. It
+               discriminates secretory from non-secretory proteins better
+               than the S-mean score, which was used in SignalP versions 1
+               and 2.
+====== ======= ===============================================================
 
 For non-secretory proteins all the scores represented in the SignalP3-NN output should ideally be very low.