comparison SMART/DiffExpAnal/fastq_groomer_parallel.xml @ 18:94ab73e8a190

Uploaded
author m-zytnicki
date Mon, 29 Apr 2013 03:20:15 -0400
17:b0e8584489e6 18:94ab73e8a190
1 <tool id="fastq_groomer_parallel" name="FASTQ Groomer (for DEA)" version="1.0.0">
2 <description>convert between various FASTQ quality formats for a list of inputs.</description>
3 <command interpreter="python">fastq_groomer_parallel.py '$input_file' '$input_type' '$output_file'
4 #if str( $options_type['options_type_selector'] ) == 'basic':
5 #if str( $input_type ) == 'cssanger':
6 'cssanger'
7 #else:
8 'sanger'
9 #end if
10 'ascii' 'summarize_input'
11 #else:
12 '${options_type.output_type}' '${options_type.force_quality_encoding}' '${options_type.summarize_input}'
13 #end if
14 #if $OptionPairedEnd.pairedEnd == "Yes":
15 '$OptionPairedEnd.pairedEnd_input' '$output_pairedEndFile'
16 #else:
17 'None' 'None'
18 #end if
19 </command>
20 <inputs>
21 <param name="input_file" type="data" format="txt" label="The file list to groom" />
22 <param name="input_type" type="select" label="Input FASTQ quality scores type">
23 <option value="solexa">Solexa</option>
24 <option value="illumina">Illumina 1.3-1.7</option>
25 <option value="sanger" selected="True">Sanger</option>
26 <option value="cssanger">Color Space Sanger</option>
27 </param>
28 <conditional name="options_type">
29 <param name="options_type_selector" type="select" label="Advanced Options">
30 <option value="basic" selected="True">Hide Advanced Options</option>
31 <option value="advanced">Show Advanced Options</option>
32 </param>
33 <when value="basic">
34 <!-- no options -->
35 </when>
36 <when value="advanced">
37 <param name="output_type" type="select" label="Output FASTQ quality scores type" help="Galaxy tools are designed to work with the Sanger Quality score format.">
38 <option value="solexa">Solexa</option>
39 <option value="illumina">Illumina 1.3-1.7</option>
40 <option value="sanger" selected="True">Sanger (recommended)</option>
41 <option value="cssanger">Color Space Sanger</option>
42 </param>
43 <param name="force_quality_encoding" type="select" label="Force Quality Score encoding">
44 <option value="None">Use Source Encoding</option>
45 <option value="ascii" selected="True">ASCII</option>
46 <option value="decimal">Decimal</option>
47 </param>
48 <param name="summarize_input" type="select" label="Summarize input data">
49 <option value="summarize_input" selected="True">Summarize Input</option>
50 <option value="dont_summarize_input">Do not Summarize Input (faster)</option>
51 </param>
52 </when>
53 </conditional>
54
55 <conditional name="OptionPairedEnd">
56 <param name="pairedEnd" type="select" label="Paired-end analysis?">
57 <option value="Yes">Yes</option>
58 <option value="No" selected="true">No</option>
59 </param>
60 <when value="Yes">
61 <param name="pairedEnd_input" type="data" format="txt" label="Input paired-end file list"/>
62 </when>
63 <when value="No">
64 </when>
65 </conditional>
66
67 </inputs>
68
69 <outputs>
70 <data name="output_file" format="txt">
71 </data>
72 <data format="txt" name="output_pairedEndFile" label="Output paired-end FASTQ files">
73 <filter>(OptionPairedEnd['pairedEnd']=='Yes')</filter>
74 </data>
75 </outputs>
76 <help>
77 **What it does**
78
79 This tool offers several conversion options relating to the FASTQ format.
80
81 When using *Basic* options, the output will be *sanger* formatted or *cssanger* formatted (when the input is Color Space Sanger).
82
83 When converting, if a quality score falls outside of the target score range, it will be coerced to the closest available value (i.e. the minimum or maximum).
84
85 When converting between Solexa and the other formats, quality scores are mapped between Solexa and PHRED scales using the equations found in `Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 2009 Dec 16.`_
86
87 When converting between color space (csSanger) and base/sequence space (Sanger, Illumina, Solexa) formats, adapter bases are lost or gained; if gained, the base 'G' is used as the adapter. You cannot convert a color space read to base space if there is no adapter present in the color space sequence. Any masked or ambiguous nucleotides in base space will be converted to 'N's when determining color space encoding.
88
89 -----
90
91 **Quality Score Comparison**
92
93 ::
94
95 SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
96 ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
97 ..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
98 !"#$%&amp;'()*+,-./0123456789:;&lt;=&gt;?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
99 |                         |    |        |                              |                     |
100 33                        59   64       73                             104                   126
101
102 S - Sanger       Phred+33,  94 values  (0, 93) (0 to 60 expected in raw reads)
103 I - Illumina 1.3 Phred+64,  63 values  (0, 62) (0 to 40 expected in raw reads)
104 X - Solexa       Solexa+64, 68 values (-5, 62) (-5 to 40 expected in raw reads)
105
106 Diagram adapted from http://en.wikipedia.org/wiki/FASTQ_format
107
108 .. class:: infomark
109
110 Output from Illumina 1.8+ pipelines is Sanger encoded.
111
112 ------
113
114 **Citation**
115
116 If you use this tool, please cite `Blankenberg D, Gordon A, Von Kuster G, Coraor N, Taylor J, Nekrutenko A; Galaxy Team. Manipulation of FASTQ data with Galaxy. Bioinformatics. 2010 Jul 15;26(14):1783-5. &lt;http://www.ncbi.nlm.nih.gov/pubmed/20562416&gt;`_
117
118
119 .. _Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 2009 Dec 16.: http://www.ncbi.nlm.nih.gov/pubmed/20015970
120
121 </help>
122 </tool>
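
The help text in this tool refers to the Solexa/PHRED mapping equations from Cock et al. and to coercing out-of-range scores to the closest valid value. A minimal Python sketch of both steps follows. It is not taken from `fastq_groomer_parallel.py`: the function names, the rounding step, and the treatment of PHRED 0 are assumptions made for illustration, and the score ranges are copied from the Quality Score Comparison table.

```python
import math

# Score ranges as listed in the Quality Score Comparison table.
SANGER_RANGE = (0, 93)    # Phred+33
ILLUMINA_RANGE = (0, 62)  # Phred+64
SOLEXA_RANGE = (-5, 62)   # Solexa+64


def solexa_to_phred(q_solexa):
    """Map a Solexa score to the PHRED scale (equation from Cock et al.)."""
    return 10 * math.log10(10 ** (q_solexa / 10.0) + 1)


def phred_to_solexa(q_phred):
    """Map a PHRED score to the Solexa scale (equation from Cock et al.).

    PHRED 0 has no finite Solexa equivalent, so this sketch (an assumption)
    pins it to the Solexa minimum.
    """
    if q_phred <= 0:
        return float(SOLEXA_RANGE[0])
    return 10 * math.log10(10 ** (q_phred / 10.0) - 1)


def coerce(score, score_range):
    """Round to an integer, then clamp to the closest value in the range."""
    low, high = score_range
    return min(max(int(round(score)), low), high)


def solexa_char_to_sanger_char(char):
    """Convert one Solexa+64 quality character to a Phred+33 character."""
    q_solexa = ord(char) - 64
    q_phred = coerce(solexa_to_phred(q_solexa), SANGER_RANGE)
    return chr(q_phred + 33)
```

For example, `solexa_char_to_sanger_char('h')` decodes Solexa score 40 and returns `'I'`, the Sanger character for PHRED 40, while `coerce(100, SANGER_RANGE)` is clamped to 93.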