annotate scythe.xml @ 9:c7ea7b299f01 draft default tip

Uploaded
author nikhil-joshi
date Tue, 10 Mar 2015 20:38:40 -0400
<tool id="scythe" name="Scythe" version="0.991">
    <description>Trimming adapters/contaminants using a Naive Bayesian classifier</description>

    <command>
        scythe -a $adapter_file

        #if $input_fastq.ext == "fastq":
            -q sanger
        #else if $input_fastq.ext == "fastqsanger":
            -q sanger
        #else if $input_fastq.ext == "fastqillumina":
            -q illumina
        #else if $input_fastq.ext == "fastqsolexa":
            -q solexa
        #end if

        #if str($add_tag) == "add_tag_true":
            -t
        #end if

        #if str($prior) != "":
            -p $prior
        #end if

        #if str($min_match) != "":
            -n $min_match
        #end if

        #if str($min_keep) != "":
            -M $min_keep
        #end if

        #if str($matches_file) == "matches_file_true":
            -m $output_matches
        #end if

        -o $output_trimmed $input_fastq 2>&amp;1
    </command>
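    <!--
    The template above assembles a scythe command line from the tool parameters.
    As a rough illustration of what it renders, the Python sketch below mirrors
    the same branching. The helper build_scythe_command and the file names are
    hypothetical and not part of this wrapper; only the flags come from the
    template. The trailing 2>&1 redirection is left out because it belongs to
    the shell, not to the argument list.

    ```python
    import shlex

    # Maps Galaxy datatype extensions to the value passed to scythe's -q flag,
    # mirroring the #if / #else if chain in the command template.
    QUALITY_BY_EXT = {
        "fastq": "sanger",
        "fastqsanger": "sanger",
        "fastqillumina": "illumina",
        "fastqsolexa": "solexa",
    }


    def build_scythe_command(adapter_file, input_fastq, ext, output_trimmed,
                             add_tag=False, prior=None, min_match=None,
                             min_keep=None, output_matches=None):
        """Return the argument list the template would render (hypothetical helper)."""
        args = ["scythe", "-a", adapter_file]
        quality = QUALITY_BY_EXT.get(ext)
        if quality is not None:
            args += ["-q", quality]
        if add_tag:
            args.append("-t")
        # Optional parameters are only emitted when a value was supplied.
        if prior is not None:
            args += ["-p", str(prior)]
        if min_match is not None:
            args += ["-n", str(min_match)]
        if min_keep is not None:
            args += ["-M", str(min_keep)]
        if output_matches is not None:
            args += ["-m", output_matches]
        args += ["-o", output_trimmed, input_fastq]
        return args


    # Example using the default values declared in the inputs section.
    example = build_scythe_command(
        "adapters.fa", "reads.fastq", "fastqsanger", "trimmed.fastq",
        prior=0.3, min_match=5, min_keep=35,
    )
    print(shlex.join(example))
    ```

    With the defaults shown, the printed command carries -q sanger, -p 0.3,
    -n 5 and -M 35, followed by the output and input paths.
    -->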

    <inputs>
        <param format="fastq, fastqsanger, fastqillumina, fastqsolexa" name="input_fastq" type="data" optional="false" label="FastQ Reads" help="Note: Scythe will infer the quality type of the file from its datatype. I.e., if the datatype is fastqsanger, then the quality type is sanger. The default is fastqsanger."/>

        <param format="fasta" name="adapter_file" type="data" optional="false" label="Adapter/Contaminant file (in fasta format)"/>

        <param name="add_tag" type="boolean" checked="false" truevalue="add_tag_true" falsevalue="add_tag_false" label="Add a tag to the header indicating that Scythe cut a sequence?"/>

        <param name="matches_file" type="boolean" checked="false" truevalue="matches_file_true" falsevalue="matches_file_false" label="Also output another file with details about adapter/contaminant matches?"/>

        <param name="prior" value="0.3" type="float" optional="true" label="Prior" help="The prior contamination rate">
            <validator type="in_range" min="0" message="Minimum value is 0"/>
        </param>

        <param name="min_match" value="5" type="integer" optional="true" label="Smallest length adapter/contaminant to consider">
            <validator type="in_range" min="0" message="Minimum value is 0"/>
        </param>

        <param name="min_keep" value="35" type="integer" optional="true" label="Filter sequences less than this length (after trimming)">
            <validator type="in_range" min="0" message="Minimum value is 0"/>
        </param>
    </inputs>

    <outputs>
        <data format_source="input_fastq" name="output_trimmed" label="Adapter/Contaminant Trimmed FastQ using ${tool.name} on ${on_string}"/>

        <data format="txt" name="output_matches" label="Matches of Adapters/Contaminants using ${tool.name} on ${on_string}">
            <filter>(matches_file == True)</filter>
        </data>
    </outputs>

    <help>
Scythe uses a Naive Bayesian approach to classify contaminant substrings in sequence reads. It considers quality information, which can make it robust in picking out 3'-end adapters, which often include poor-quality bases.

Most next-generation sequencing reads deteriorate in quality towards the 3'-end. A quality-based trimmer is commonly run before mapping, assembly, and analysis to remove these poor-quality bases. However, quality-based trimming can remove bases that help identify (and remove) 3'-end adapter contaminants. It is therefore recommended that you run Scythe before quality-based trimming, as part of a read quality control pipeline.

Scythe's Bayesian approach compares two likelihood models: the probability of seeing the observed matches in a sequence given contamination, and the probability given no contamination. If the read is contaminated, the probability of seeing a certain number of matches and mismatches is a function of the quality of the sequence. If the read is not contaminated (and is thus assumed to be random sequence), the probability of seeing a certain number of matches and mismatches is due to chance alone. The posterior is calculated for both likelihood models, and the class (contaminated or not contaminated) with the maximum posterior probability is selected.

Scythe will infer the quality type from the datatype of the file.
    </help>
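    <!--
    The classification rule described in the help text can be sketched in a few
    lines of Python. This is an illustration under stated assumptions, not
    Scythe's actual code: it assumes Phred-scaled qualities, takes the per-base
    probability of a match under contamination to be one minus the base's error
    probability, and takes the chance of a match in random sequence to be 0.25.
    The function names are hypothetical.

    ```python
    import math


    def phred_error(q):
        """Convert a Phred quality score to an error probability, kept away from 0 and 1."""
        return min(max(10 ** (-q / 10), 1e-9), 1 - 1e-9)


    def is_contaminated(matches, quals, prior=0.3):
        """Pick the class with the larger posterior (assumed model, see lead-in).

        matches: list of bools, True where the read base equals the adapter base.
        quals:   list of Phred quality scores for the same positions.
        prior:   prior contamination rate, as in the tool's Prior parameter.
        """
        log_contam = math.log(prior)        # log prior * likelihood, contaminated
        log_random = math.log(1 - prior)    # log prior * likelihood, random sequence
        for match, q in zip(matches, quals):
            p_err = phred_error(q)
            # Under contamination, a match means the base was read correctly.
            log_contam += math.log(1 - p_err if match else p_err)
            # Under the random model, a match happens by chance one time in four.
            log_random += math.log(0.25 if match else 0.75)
        return log_contam > log_random


    # Ten high-quality matching bases versus ten high-quality mismatches.
    print(is_contaminated([True] * 10, [30] * 10))
    print(is_contaminated([False] * 10, [30] * 10))
    ```

    Working in log space avoids underflow for long adapters. Comparing the two
    unnormalised log posteriors is enough here because both share the same
    normalising constant.
    -->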

</tool>