<tool id="cshl_fastx_quality_statistics" version="1.0.0" name="Compute quality statistics">
  <description></description>
  <requirements>
    <requirement type="package" version="0.0.13">fastx_toolkit</requirement>
  </requirements>
  <command>zcat -f $input | fastx_quality_stats -o $output -Q 33</command>

  <inputs>
    <param format="fastqsanger" version="1.0.0" name="input" type="data" label="Library to analyse" />
  </inputs>

  <tests>
    <test>
      <param version="1.0.0" name="input" value="fastq_stats1.fastq" ftype="fastqsanger"/>
      <output version="1.0.0" name="output" file="fastq_stats1.out" />
    </test>
  </tests>

  <outputs>
    <data format="txt" version="1.0.0" name="output" metadata_source="input" />
  </outputs>

  <help>

**What it does**

Creates a quality statistics report for the given Solexa/FASTQ library.

.. class:: infomark

**TIP:** This statistics report can be used as input for the **Quality Score** and **Nucleotides Distribution** tools.

-----

**The output file will contain the following fields:**

* column = Column number (1 to 36 for a Solexa file with 36-cycle reads).
* count = Number of bases found in this column.
* min = Lowest quality score value found in this column.
* max = Highest quality score value found in this column.
* sum = Sum of quality score values for this column.
* mean = Mean quality score value for this column.
* Q1 = 1st quartile quality score.
* med = Median quality score.
* Q3 = 3rd quartile quality score.
* IQR = Interquartile range (Q3 - Q1).
* lW = 'Left-Whisker' value (for boxplotting).
* rW = 'Right-Whisker' value (for boxplotting).
* A_Count = Count of 'A' nucleotides found in this column.
* C_Count = Count of 'C' nucleotides found in this column.
* G_Count = Count of 'G' nucleotides found in this column.
* T_Count = Count of 'T' nucleotides found in this column.
* N_Count = Count of 'N' nucleotides found in this column.


For example::

     1  6362991  -4  40  250734117  39.41  40  40  40   0  40  40  1396976  1329101   678730  2958184    0
     2  6362991  -5  40  250531036  39.37  40  40  40   0  40  40  1786786  1055766  1738025  1782414    0
     3  6362991  -5  40  248722469  39.09  40  40  40   0  40  40  2296384   984875  1443989  1637743    0
     4  6362991  -4  40  248214827  39.01  40  40  40   0  40  40  2536861  1167423  1248968  1409739    0
    36  6362991  -5  40  117158566  18.41   7  15  30  23  -5  40  4074444  1402980    63287   822035  245

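As a rough illustration of how these fields fit together, the Python sketch below parses one row of the report and checks three relationships that hold in the example rows above: IQR equals Q3 - Q1, mean equals sum / count rounded to two decimals, and the five base counts add up to count. The ``parse_stats_row`` helper and the ``FIELDS`` list are hypothetical names introduced for this sketch and are not part of FASTX-toolkit. The sketch assumes whitespace-separated columns in the order listed above, with no header line.

```python
# Hypothetical helper, not part of FASTX-toolkit: parse one report row.
# Assumes whitespace-separated columns in the documented field order.
FIELDS = [
    "column", "count", "min", "max", "sum", "mean",
    "Q1", "med", "Q3", "IQR", "lW", "rW",
    "A_Count", "C_Count", "G_Count", "T_Count", "N_Count",
]


def parse_stats_row(line):
    """Return a dict mapping each documented field name to a number."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    row = {}
    for name, text in zip(FIELDS, values):
        # Assumption: only 'mean' is fractional, as in the example rows.
        row[name] = float(text) if name == "mean" else int(text)
    return row


# Last example row from the help text (column 36).
row = parse_stats_row(
    "36 6362991 -5 40 117158566 18.41 7 15 30 23 -5 40 "
    "4074444 1402980 63287 822035 245"
)
bases = sum(row[n + "_Count"] for n in "ACGTN")

print(row["IQR"] == row["Q3"] - row["Q1"])                 # IQR is Q3 - Q1
print(round(row["sum"] / row["count"], 2) == row["mean"])  # mean is sum / count
print(bases == row["count"])                               # base counts add up
```

Checks like these can serve as a quick sanity test before the report is passed to a plotting tool. They are inferred from the example rows only and are not a specification of the file format.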

------

This tool is based on `FASTX-toolkit`__ by Assaf Gordon.

.. __: http://hannonlab.cshl.edu/fastx_toolkit/

  </help>
  <!-- FASTQ-Statistics is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) -->
</tool>