comparison tools/next_gen_conversion/fastq_gen_conv.xml @ 0:9071e359b9a3

Uploaded
author xuebing
date Fri, 09 Mar 2012 19:37:19 -0500 (2012-03-10)
<tool id="fastq_gen_conv" name="FASTQ Groomer" version="1.0.0">
    <description>converts any FASTQ to Sanger</description>
    <command interpreter="python">
        fastq_gen_conv.py
        --input=$input
        --origType=$origTypeChoice.origType
        #if $origTypeChoice.origType == "sanger":
            --allOrNot=$origTypeChoice.howManyBlocks.allOrNot
            #if $origTypeChoice.howManyBlocks.allOrNot == "not":
                --blocks=$origTypeChoice.howManyBlocks.blocks
            #else:
                --blocks="None"
            #end if
        #else:
            --allOrNot="None"
            --blocks="None"
        #end if
        --output=$output
    </command>
    <inputs>
        <param name="input" type="data" format="fastq" label="Groom this dataset" />
        <conditional name="origTypeChoice">
            <param name="origType" type="select" label="How do you think quality values are scaled?" help="See below for explanation">
                <option value="solexa">Solexa/Illumina 1.0</option>
                <option value="illumina">Illumina 1.3+</option>
                <option value="sanger">Sanger (validation only)</option>
            </param>
            <when value="solexa" />
            <when value="illumina" />
            <when value="sanger">
                <conditional name="howManyBlocks">
                    <param name="allOrNot" type="select" label="Since your fastq is already in Sanger format you can check it for consistency">
                        <option value="all">Check all (may take a while)</option>
                        <option selected="true" value="not">Check selected number of blocks</option>
                    </param>
                    <when value="all" />
                    <when value="not">
                        <param name="blocks" type="integer" value="1000" label="How many blocks (four lines each) do you want to check?" />
                    </when>
                </conditional>
            </when>
        </conditional>
    </inputs>
    <outputs>
        <data name="output" format="fastqsanger"/>
    </outputs>
    <tests>
        <test>
            <param name="input" value="fastq_gen_conv_in1.fastq" ftype="fastq" />
            <param name="origType" value="solexa" />
            <output name="output" format="fastqsanger" file="fastq_gen_conv_out1.fastqsanger" />
        </test>
        <test>
            <param name="input" value="fastq_gen_conv_in2.fastq" ftype="fastq" />
            <param name="origType" value="sanger" />
            <param name="allOrNot" value="not" />
            <param name="blocks" value="3" />
            <output name="output" format="fastqsanger" file="fastq_gen_conv_out2.fastqsanger" />
        </test>
    </tests>
    <help>

**What it does**

Galaxy's pipeline for mapping Illumina data requires the data to be in fastq format, with quality values conforming to the so-called "Sanger" format. Unfortunately, there are many other types of fastq. The main objective of this tool is therefore to "groom" multiple types of fastq into Sanger-conforming fastq that can be used in downstream applications such as mapping.
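
Sanger and Illumina 1.3+ fastq both store Phred quality scores as single ASCII characters. They differ only in the offset added to each score: 33 for Sanger and 64 for Illumina 1.3+. For that case, grooming amounts to shifting every quality character down by 31. The sketch below illustrates this. It is not taken from the tool's source, and the helper name ``illumina13_to_sanger`` is hypothetical.

```python
SANGER_OFFSET = 33    # Sanger: ASCII code = Phred score + 33
ILLUMINA_OFFSET = 64  # Illumina 1.3+: ASCII code = Phred score + 64


def illumina13_to_sanger(qual: str) -> str:
    """Re-encode an Illumina 1.3+ quality string as Sanger (hypothetical helper)."""
    out = []
    for ch in qual:
        phred = ord(ch) - ILLUMINA_OFFSET
        # Illumina 1.3+ covers Phred 0-62 (ASCII 64-126); anything else is suspect.
        if phred not in range(0, 63):
            raise ValueError(f"character {ch!r} is outside the Illumina 1.3+ range")
        out.append(chr(phred + SANGER_OFFSET))
    return "".join(out)


print(illumina13_to_sanger("hhhB"))  # Phred 40, 40, 40, 2 -> III#
```

The range check rejects characters below ASCII 64 (for example ``!``), which suggests that the input was not Illumina 1.3+ encoded in the first place.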

.. class:: infomark

**TIP**: If the input dataset is already in Sanger format, the tool does not perform any conversion. However, validation (described below) is still performed.

-----

**Types of fastq datasets**

A good description of fastq datasets can be found `here`__, while a description of Galaxy's fastq "logic" can be found `here`__. Because the ranges of quality values used by different types of fastq overlap, it is very difficult to detect the type automatically. This tool supports conversion of two commonly found types (Solexa/Illumina 1.0 and Illumina 1.3+) into Sanger fastq.

.. __: http://en.wikipedia.org/wiki/FASTQ_format
.. __: http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup
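
Solexa/Illumina 1.0 differs more fundamentally than by its offset. Its scores are log-odds values, Q = -10 log10(p / (1 - p)), encoded with offset 64 and going as low as -5. Phred scores, by contrast, are Q = -10 log10(p). The theoretical ASCII ranges also overlap (33-126 for Sanger, 59-126 for Solexa and 64-126 for Illumina 1.3+), so the user has to state the input type. The following is a minimal sketch of the Solexa-to-Sanger conversion. It assumes the standard mapping ``Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1)`` followed by rounding. It is not the tool's own code, and the helper names are hypothetical.

```python
import math

SOLEXA_OFFSET = 64  # Solexa/Illumina 1.0: ASCII code = Solexa score + 64
SANGER_OFFSET = 33  # Sanger: ASCII code = Phred score + 33


def solexa_to_phred(q_solexa: int) -> float:
    """Map a Solexa (log-odds) score onto the Phred scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def solexa_to_sanger(qual: str) -> str:
    """Re-encode a Solexa/Illumina 1.0 quality string as Sanger (hypothetical helper)."""
    out = []
    for ch in qual:
        q_solexa = ord(ch) - SOLEXA_OFFSET
        # Solexa scores span -5 to 62 (ASCII 59-126).
        if q_solexa not in range(-5, 63):
            raise ValueError(f"character {ch!r} is outside the Solexa range")
        # Convert to Phred, round to the nearest integer, then apply the Sanger offset.
        out.append(chr(round(solexa_to_phred(q_solexa)) + SANGER_OFFSET))
    return "".join(out)


print(solexa_to_sanger(";@h"))  # Solexa -5, 0, 40 -> Phred 1, 3, 40
```

At high scores the two scales nearly coincide (Solexa 40 rounds to Phred 40). They diverge at low scores, where a Solexa score of 0 corresponds to a Phred score of about 3.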

.. class:: warningmark

**NOTE** that there is also a type of fastq format in which quality values are represented by a list of space-delimited integers (e.g., 40 40 20 15 -5 20 ...). This tool **does not** handle such fastq. If you have such a dataset, it must first be converted into ASCII-type fastq (where quality values are encoded by characters) with the "Numeric-to-ASCII" utility before it can be accepted by this tool.
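
Converting numeric qualities to ASCII amounts to adding a fixed offset to each integer and taking the corresponding character. The sketch below is a hypothetical stand-in for the "Numeric-to-ASCII" utility, not its implementation. It assumes the integers are Solexa-scaled, which the -5 in the example above suggests, and it uses offset 64. Under that assumption, the output would then be groomed with the **Solexa/Illumina 1.0** setting.

```python
def numeric_to_ascii(qual_line: str, offset: int = 64) -> str:
    """Encode space-delimited integer qualities as ASCII characters (hypothetical helper).

    offset=64 is an assumption that keeps negative Solexa scores such as -5 printable.
    """
    chars = []
    for token in qual_line.split():
        code = int(token) + offset
        # Only printable ASCII (33-126) is valid in a fastq quality string.
        if code not in range(33, 127):
            raise ValueError(f"score {token} cannot be encoded with offset {offset}")
        chars.append(chr(code))
    return "".join(chars)


print(numeric_to_ascii("40 40 20 15 -5 20"))  # -> hhTO;T
```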

-----

**Validation**

In addition to converting quality values to Sanger format, the tool also checks the input dataset for consistency. Specifically, it:

- skips empty lines
- checks that blocks are properly formed by making sure that:

  #. there are four lines per block
  #. the first line starts with "@"
  #. the third line starts with "+"
  #. the second line (sequence) and the fourth line (quality string) have identical lengths

- checks that quality values are within range for the chosen fastq format (i.e., the format selected in the **How do you think quality values are scaled?** drop-down).
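
The block checks listed above can be sketched in Python as follows. This is an illustration, not the tool's source, which is linked below. The function name and its parameters are hypothetical. ``min_code``/``max_code`` stand for the ASCII range of the chosen format (33-126 for Sanger), and ``max_blocks`` mirrors the **Check selected number of blocks** option.

```python
def validate_fastq(lines, min_code=33, max_code=126, max_blocks=None):
    """Validate fastq blocks and return how many were checked (hypothetical helper).

    min_code/max_code: allowed ASCII range for quality characters.
    max_blocks: stop after this many blocks; None checks the whole dataset.
    """
    # Skip empty (or whitespace-only) lines and drop line endings.
    records = [line.rstrip("\r\n") for line in lines if line.strip()]
    checked = 0
    for start in range(0, len(records), 4):
        if max_blocks is not None and checked == max_blocks:
            break
        block = records[start:start + 4]
        number = start // 4 + 1
        # Four lines per block.
        if len(block) != 4:
            raise ValueError(f"block {number}: expected 4 lines, found {len(block)}")
        header, seq, plus, qual = block
        # First line starts with "@", third line starts with "+".
        if not header.startswith("@"):
            raise ValueError(f"block {number}: first line does not start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"block {number}: third line does not start with '+'")
        # Sequence and quality string have the same length.
        if len(seq) != len(qual):
            raise ValueError(f"block {number}: sequence and quality lengths differ")
        # Every quality character lies within the chosen format's range.
        for ch in qual:
            if ord(ch) not in range(min_code, max_code + 1):
                raise ValueError(f"block {number}: quality character {ch!r} out of range")
        checked += 1
    return checked


sample = [
    "@read1\n", "ACGT\n", "+\n", "IIII\n",
    "\n",  # empty line, skipped
    "@read2\n", "GGCA\n", "+\n", "II#I\n",
]
print(validate_fastq(sample))  # -> 2
```

With ``max_blocks=1``, the same call stops after the first block and returns 1. This trades completeness for speed on large datasets.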

To see exactly what the tool does, you can look at its source code `here`__.

.. __: http://bitbucket.org/galaxy/galaxy-central/src/tip/tools/next_gen_conversion/fastq_gen_conv.py

    </help>
</tool>