<tool id="cshl_fastq_quality_filter" name="Filter by quality">
  <description></description>

  <command>
    cat '$input' |
    fastq_quality_filter
    #if $input.ext == "fastqsanger":
    -Q 33
    #elif $input.ext == "fastq":
    -Q 64
    #end if
    -q $quality -p $percent -v -o '$output'
  </command>
  <inputs>
    <param format="fastq,fastqsanger" name="input" type="data" label="Library to filter" />

    <param name="quality" size="4" type="integer" value="20">
      <label>Quality cut-off value</label>
    </param>

    <param name="percent" size="4" type="integer" value="90">
      <label>Percent of bases in the sequence that must have quality equal to or higher than the cut-off value</label>
    </param>
  </inputs>

  <tests>
    <test>
      <!-- Test1: 100% of bases with quality 33 or higher (pretty steep requirement...) -->
      <param name="input" value="fastq_qual_filter1.fastq" />
      <param name="quality" value="33"/>
      <param name="percent" value="100"/>
      <output name="output" file="fastq_qual_filter1a.out" />
    </test>
    <test>
      <!-- Test2: 80% of bases with quality 20 or higher -->
      <param name="input" value="fastq_qual_filter1.fastq" />
      <param name="quality" value="20"/>
      <param name="percent" value="80"/>
      <output name="output" file="fastq_qual_filter1b.out" />
    </test>
  </tests>

  <outputs>
    <data format="input" name="output" metadata_source="input" />
  </outputs>

  <help>
**What it does**

This tool filters reads based on quality scores.

.. class:: infomark

Using **percent = 100** requires every cycle of a read to have a quality equal to or higher than the cut-off value.

.. class:: infomark

Using **percent = 50** requires the median quality of the cycles (in each read) to be at least the quality cut-off value.

--------

For each read, the tool computes the percentage of cycles whose quality is equal to or higher than the quality cut-off value. If that percentage is lower than **percent**, the read is discarded.


**Example**::

    @CSHL_4_FC042AGOOII:1:2:214:584
    GACAATAAAC
    +CSHL_4_FC042AGOOII:1:2:214:584
    30 30 30 30 30 30 30 30 20 10

Using **percent = 50** and **cut-off = 30** - This read will not be discarded (the median quality is 30, which meets the cut-off).

Using **percent = 90** and **cut-off = 30** - This read will be discarded (only 80% of the cycles have quality equal to or higher than 30, which is below the required 90%).

Using **percent = 100** and **cut-off = 20** - This read will be discarded (not all cycles have quality equal to or higher than 20; the last cycle has quality 10).

------

This tool is based on `FASTX-toolkit`__ by Assaf Gordon.

.. __: http://hannonlab.cshl.edu/fastx_toolkit/
  </help>
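<!--
Maintainer note: the filtering rule described in the help text can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not code taken from the FASTX-toolkit: the function name `keep_read` and its parameters are hypothetical, and the real fastq_quality_filter may treat rounding or empty reads differently.

```python
def keep_read(qualities, cutoff, percent):
    """Return True if at least `percent` percent of the scores reach `cutoff`.

    Hypothetical helper for illustration only; not part of the FASTX-toolkit.
    """
    if not qualities:
        # Assumption: a read with no quality scores fails the filter.
        return False
    passing = sum(1 for q in qualities if q >= cutoff)
    # Integer form of passing / len(qualities) >= percent / 100,
    # which avoids floating-point rounding.
    return passing * 100 >= percent * len(qualities)


# The read from the help example: eight cycles at 30, then 20 and 10.
example = [30, 30, 30, 30, 30, 30, 30, 30, 20, 10]
print(keep_read(example, cutoff=30, percent=50))   # True
print(keep_read(example, cutoff=30, percent=90))   # False
print(keep_read(example, cutoff=20, percent=100))  # False
```

The three calls mirror the three cases in the help example: the read is kept at percent = 50 and discarded at percent = 90 (cut-off 30) and at percent = 100 (cut-off 20).
-->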
</tool>
<!-- FASTQ-Quality-Filter is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) -->