tmhmm_and_signalp: tools/protein_analysis/signalp3.xml comparison

comparison tools/protein_analysis/signalp3.xml @ 11:99b82a2b1272 draft

Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)

author	peterjc
date	Wed, 03 Apr 2013 10:49:10 -0400
parents	e52220a9ddad
children	7de64c8b258d



-:09ff180d1615
+:99b82a2b1272
-<tool id="signalp3" name="SignalP 3.0" version="0.0.10">
+<tool id="signalp3" name="SignalP 3.0" version="0.0.11">
 <description>Find signal peptides in protein sequences</description>
 <!-- If job splitting is enabled, break up the query file into parts -->
 <!-- Using 2000 chunks meaning 4 threads doing 500 each is ideal -->
 <parallelism method="basic" split_inputs="fasta_file" split_mode="to_size" split_size="2000" merge_outputs="tabular_file"></parallelism>
 <command interpreter="python">
 This calls the SignalP v3.0 tool for prediction of signal peptides, which uses both a Neural Network (NN) and a Hidden Markov Model (HMM) to produce two sets of scores.
 The input is a FASTA file of protein sequences, and the output is tabular with twenty columns (one row per protein):
-* Sequence identifier
-* Neural Network (NN) predictions (13 columns)
-* Hidden Markov Model (HMM) predictions (6 columns)
+====== =================================================
+Column Description
+------ -------------------------------------------------
+1      Sequence identifier
+2-14   Neural Network (NN) predictions (13 columns)
+15-20  Hidden Markov Model (HMM) predictions (6 columns)
+====== =================================================
 Internally, the input FASTA file is divided into parts (to allow multiple processors to be used) and the proteins are truncated as specified (see below). The raw output from SignalP is then reformatted into a tabular layout suitable for Galaxy (see below).
 **Neural Network Scores**
 For each organism class (Eukaryote, Gram-negative and Gram-positive), two different neural networks are used, one for predicting the actual signal peptide and one for predicting the position of the signal peptidase I (SPase I) cleavage site.
 The NN output comprises three different scores (C-max, S-max and Y-max) and two scores derived from them (S-mean and D-score).
-The C-score is the 'cleavage site' score. For each position in the submitted sequence, a C-score is reported, which should only be significantly high at the cleavage site. Confusion is often seen with the position numbering of the cleavage site. When a cleavage site position is referred to by a single number, the number indicates the first residue in the mature protein, meaning that a predicted cleavage site between amino acids 26 and 27 is reported as 27, corresponding to the mature protein starting at (and including) position 27.
-The S-score for the signal peptide prediction is calculated for every single amino acid position in the submitted sequence (not shown in the output via Galaxy), with high scores indicating that the corresponding amino acid is part of a signal peptide, and low scores indicating that the amino acid is part of a mature protein.
-Y-max is a derivative of the C-score combined with the S-score, resulting in a better cleavage site prediction than the raw C-score alone. This is because multiple high-peaking C-scores can be found in one sequence, where only one is the true cleavage site. The cleavage site is assigned from the Y-score where the slope of the S-score is steep and a significant C-score is found.
-The S-mean is the average of the S-score, ranging from the N-terminal amino acid to the amino acid assigned with the highest Y-max score; thus the S-mean score is calculated for the length of the predicted signal peptide. The S-mean score was used in SignalP version 2.0 as the criterion for discriminating between secretory and non-secretory proteins.
-The D-score was introduced in SignalP version 3.0 and is a simple average of the S-mean and Y-max scores. It discriminates between secretory and non-secretory proteins better than the S-mean score used in SignalP versions 1 and 2.
+====== ======= ===============================================================
+Column Name    Description
+------ ------- ---------------------------------------------------------------
+2-4    C-score The C-score is the 'cleavage site' score. For each position in
+               the submitted sequence, a C-score is reported, which should
+               only be significantly high at the cleavage site. Confusion is
+               often seen with the position numbering of the cleavage site.
+               When a cleavage site position is referred to by a single number,
+               the number indicates the first residue in the mature protein,
+               meaning that a predicted cleavage site between amino acids 26
+               and 27 is reported as 27, corresponding to the mature protein
+               starting at (and including) position 27.
+------ ------- ---------------------------------------------------------------
+5-7    S-score The S-score for the signal peptide prediction is calculated for
+               every single amino acid position in the submitted sequence (not
+               shown in the output via Galaxy), with high scores indicating
+               that the corresponding amino acid is part of a signal peptide,
+               and low scores indicating that the amino acid is part of a
+               mature protein.
+------ ------- ---------------------------------------------------------------
+8-10   Y-max   Y-max is a derivative of the C-score combined with the S-score,
+               resulting in a better cleavage site prediction than the raw
+               C-score alone. This is because multiple high-peaking C-scores
+               can be found in one sequence, where only one is the true
+               cleavage site. The cleavage site is assigned from the Y-score
+               where the slope of the S-score is steep and a significant
+               C-score is found.
+------ ------- ---------------------------------------------------------------
+11-12  S-mean  The S-mean is the average of the S-score, ranging from the
+               N-terminal amino acid to the amino acid assigned with the
+               highest Y-max score; thus the S-mean score is calculated for
+               the length of the predicted signal peptide. The S-mean score
+               was used in SignalP version 2.0 as the criterion for
+               discriminating between secretory and non-secretory proteins.
+------ ------- ---------------------------------------------------------------
+13-14  D-score The D-score was introduced in SignalP version 3.0 and is a
+               simple average of the S-mean and Y-max scores. It discriminates
+               between secretory and non-secretory proteins better than the
+               S-mean score used in SignalP versions 1 and 2.
+====== ======= ===============================================================
 For non-secretory proteins all the scores represented in the SignalP3-NN output should ideally be very low.
 **Hidden Markov Model Scores**

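The `<parallelism>` element in the tool XML asks Galaxy to split the `fasta_file` input with `split_mode="to_size"` and `split_size="2000"`. The following Python sketch illustrates that setting only. It is not Galaxy's own splitter, and `read_fasta` and `split_to_size` are hypothetical helper names. It groups FASTA records into consecutive batches of at most 2000 sequences:

```python
from typing import Iterable, Iterator, List, Tuple

# (title line without the leading '>', concatenated sequence)
Record = Tuple[str, str]


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (title, sequence) pairs from an iterable of FASTA lines."""
    title, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            # Sequence lines may be wrapped; join them back together.
            chunks.append(line.strip())
    if title is not None:
        yield title, "".join(chunks)


def split_to_size(records: Iterable[Record], split_size: int = 2000) -> List[List[Record]]:
    """Group records into consecutive batches of at most split_size records."""
    batches: List[List[Record]] = []
    current: List[Record] = []
    for rec in records:
        current.append(rec)
        if len(current) == split_size:
            batches.append(current)
            current = []
    if current:
        # Whatever is left over becomes a final, smaller batch.
        batches.append(current)
    return batches
```

With 4500 input proteins, this gives three batches of 2000, 2000 and 500 records. Each batch would then be run separately and the tabular outputs concatenated, as the `merge_outputs="tabular_file"` attribute suggests.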
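The two column tables in the help text define how the twenty output columns are grouped. Below is a minimal Python sketch that reads one row back into those groups, assuming tab-separated fields as in Galaxy's tabular format. The function name `parse_row` and the dictionary keys are hypothetical. The help text does not spell out what each individual field inside a group holds, so the sketch keeps them as raw strings:

```python
from typing import Dict, List, Tuple

# 1-based, inclusive column ranges copied from the help text tables.
NN_GROUPS: Dict[str, Tuple[int, int]] = {
    "C-score": (2, 4),
    "S-score": (5, 7),
    "Y-max": (8, 10),
    "S-mean": (11, 12),
    "D-score": (13, 14),
}
HMM_RANGE = (15, 20)
N_COLUMNS = 20


def parse_row(line: str) -> Dict[str, object]:
    """Split one tab-separated output row into its documented column groups."""
    fields: List[str] = line.rstrip("\r\n").split("\t")
    if len(fields) != N_COLUMNS:
        raise ValueError("expected %d columns, got %d" % (N_COLUMNS, len(fields)))
    row: Dict[str, object] = {"id": fields[0]}
    for name, (start, end) in NN_GROUPS.items():
        # Convert 1-based inclusive column numbers into a Python slice.
        row["NN " + name] = fields[start - 1:end]
    start, end = HMM_RANGE
    row["HMM"] = fields[start - 1:end]
    return row
```

Keeping the column ranges in one dictionary makes the layout easy to adjust if it changes between wrapper versions.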

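The C-score description notes that a single-number cleavage site gives the first residue of the mature protein. The hypothetical helper below (`split_at_cleavage_site` is not part of SignalP or the Galaxy wrapper) shows that convention in Python by turning a 1-based position into string slices:

```python
from typing import Tuple


def split_at_cleavage_site(sequence: str, site: int) -> Tuple[str, str]:
    """Split a protein using the single-number cleavage-site convention.

    ``site`` is the 1-based position of the first residue of the mature
    protein, so a cut between residues 26 and 27 is reported as 27.
    """
    if not 2 <= site <= len(sequence):
        raise ValueError("cleavage site must be between 2 and len(sequence)")
    # Residues 1..site-1 (1-based) form the signal peptide, which is the
    # Python slice [0:site-1]; the mature protein starts at index site-1.
    return sequence[: site - 1], sequence[site - 1 :]
```

For a site reported as 27, the signal peptide is residues 1 to 26 and the mature protein begins at residue 27 (index 26 in Python).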
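The S-mean and D-score rows describe how the two summary scores are derived. The Python sketch below recomputes them from per-position S- and Y-scores, which SignalP calculates internally but Galaxy does not show. It illustrates the description and is not SignalP's code. The function name is hypothetical, and the sketch assumes the Y-max position follows the cleavage-site numbering (first mature residue), so S-mean averages positions 1 to Y-max position minus 1. The fallback for a Y-max at position 1 is the sketch's own choice:

```python
from typing import List, Tuple


def smean_and_dscore(s_scores: List[float],
                     y_scores: List[float]) -> Tuple[int, float, float, float]:
    """Return (ymax_pos, ymax, smean, dscore) from per-position score lists.

    Positions are 1-based. ymax_pos is treated as the first residue of the
    mature protein, so the predicted signal peptide is 1..ymax_pos-1.
    """
    if len(s_scores) != len(y_scores) or not s_scores:
        raise ValueError("need two non-empty score lists of equal length")
    # Index of the highest Y-score; max() keeps the first one on ties.
    best = max(range(len(y_scores)), key=lambda i: y_scores[i])
    ymax_pos, ymax = best + 1, y_scores[best]
    # Average the S-score over the predicted signal peptide. If Y-max is at
    # position 1 there is nothing to average, so fall back to residue 1.
    window = s_scores[:best] or s_scores[:1]
    smean = sum(window) / len(window)
    # D-score: simple average of S-mean and Y-max, as described for SignalP 3.0.
    dscore = (smean + ymax) / 2
    return ymax_pos, ymax, smean, dscore
```

Because the D-score averages a whole-peptide measure (S-mean) with a cleavage-site measure (Y-max), a protein needs both a signal-peptide-like N-terminus and a convincing cleavage site to score highly.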