<tool id="cshl_fastx_nucleotides_distribution" name="Nucleotides Distribution">
  <description>chart</description>
  <command>fastx_nucleotide_distribution_graph.sh -t '$input.name' -i $input -o $output</command>

  <inputs>
    <param format="txt" name="input" type="data" label="Statistics Text File (output of 'FASTX Statistics' tool)" />
  </inputs>

  <outputs>
    <data format="png" name="output" metadata_source="input" />
  </outputs>
<help>

**What it does**

Creates a stacked-histogram chart showing the nucleotide distribution at each position (sequencing cycle) of a Solexa library.

.. class:: infomark

**TIP:** Use the **FASTX Statistics** tool to generate the report file needed for this tool.

-----

**Output Examples**

The following chart clearly shows the barcode used at the 5'-end of the library: **GATCT**

.. image:: ./static/fastx_icons/fastq_nucleotides_distribution_1.png

In the following chart, one can almost 'read' the most abundant sequence by looking at the dominant values: **TGATA TCGTA TTGAT GACTG AA...**

.. image:: ./static/fastx_icons/fastq_nucleotides_distribution_2.png

The following chart shows a growing number of unknown (N) nucleotides towards later cycles (which might indicate a sequencing problem):

.. image:: ./static/fastx_icons/fastq_nucleotides_distribution_3.png

Most of the time, however, the chart will look rather random:

.. image:: ./static/fastx_icons/fastq_nucleotides_distribution_4.png

</help>
</tool>
<!-- FASTQ-Nucleotides-Distribution is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) -->