comparison fasta_clipping_histogram.xml @ 0:78a7d28f2a15 draft

Uploaded
author idot
date Wed, 10 Jul 2013 06:13:48 -0400
parents
children
-1:000000000000 0:78a7d28f2a15
1 <tool id="cshl_fasta_clipping_histogram" name="Length Distribution">
2 <description>chart</description>
3 <command>fasta_clipping_histogram.pl $input $outfile</command>
4
5 <inputs>
6 <param format="fasta" name="input" type="data" label="Library to analyze" />
7 </inputs>
8
9 <outputs>
10 <data format="png" name="outfile" metadata_source="input"
11 />
12 </outputs>
13 <help>
14
15 **What it does**
16
17 This tool creates a histogram image of the sequence length distribution in a given FASTA dataset.
18
19 **TIP:** Use this tool after clipping your library (with the **FASTX Clipper** tool) to visualize the clipping results.
20
21 -----
22
23 **Output Examples**
24
25 In the following library, most sequences are 24-mers to 27-mers.
26 This could indicate an abundance of endo-siRNAs (depending, of course, on what you tried to sequence in the first place).
27
28 .. image:: ./static/fastx_icons/fasta_clipping_histogram_1.png
29
30
31 In the following library, most sequences are 19-, 22- or 23-mers.
32 This could indicate an abundance of miRNAs (depending, of course, on what you tried to sequence in the first place).
33
34 .. image:: ./static/fastx_icons/fasta_clipping_histogram_2.png
35
36
37 -----
38
39
40 **Input Formats**
41
42 This tool accepts short-read FASTA files. The reads don't have to be short, but each sequence must be on a single line, like so::
43
44 >sequence1
45 AGTAGTAGGTGATGTAGAGAGAGAGAGAGTAG
46 >sequence2
47 GTGTGTGTGGGAAGTTGACACAGTA
48 >sequence3
49 CCTTGAGATTAACGCTAATCAAGTAAAC
50
51
52 If the sequences span multiple lines::
53
54 >sequence1
55 CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG
56 TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG
57 aactggtctttacctTTAAGTTG
58
59 Use the **FASTA Width Formatter** tool to reformat the FASTA file into single-line sequences::
60
61 >sequence1
62 CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAGaactggtctttacctTTAAGTTG
63
64
65 -----
66
67
68
69 **Multiplicity counts (a.k.a. read counts)**
70
71 If the sequence identifier (the text after the '>') contains a dash and a number, the number is treated as a multiplicity count (i.e. how many times that individual sequence was repeated in the original FASTA file, before collapsing).
72
73 Example 1 - The following FASTA file *does not* have multiplicity counts::
74
75 >seq1
76 GGATCC
77 >seq2
78 GGTCATGGGTTTAAA
79 >seq3
80 GGGATATATCCCCACACACACACAC
81
82 Each sequence counts as one, to produce the following chart:
83
84 .. image:: ./static/fastx_icons/fasta_clipping_histogram_3.png
85
86
87 Example 2 - The following FASTA file has multiplicity counts::
88
89 >seq1-2
90 GGATCC
91 >seq2-10
92 GGTCATGGGTTTAAA
93 >seq3-3
94 GGGATATATCCCCACACACACACAC
95
96 The first sequence counts as 2, the second as 10, the third as 3, to produce the following chart:
97
98 .. image:: ./static/fastx_icons/fasta_clipping_histogram_4.png
99
100 Use the **FASTA Collapser** tool to create FASTA files with multiplicity counts.
101
102 </help>
103 </tool>
104 <!-- FASTA-Clipping-Histogram is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) -->
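
The multiplicity rule described in the help text can be illustrated with a short Python sketch. This is not the code of `fasta_clipping_histogram.pl`, which is not shown here. The function names `multiplicity` and `length_histogram` are hypothetical, and the sketch assumes that the count is a trailing `-<number>` on the identifier (as in `seq2-10`) and that every sequence sits on a single line.

```python
import re
from collections import Counter

# Assumption: the count is a trailing "-<digits>" on the identifier ("seq2-10").
# The real script may match identifiers differently.
_COUNT_SUFFIX = re.compile(r"-(\d+)$")


def multiplicity(identifier):
    """Return the count encoded in an identifier, or 1 if there is none."""
    match = _COUNT_SUFFIX.search(identifier.strip())
    return int(match.group(1)) if match else 1


def length_histogram(lines):
    """Map sequence length to total read count for single-line FASTA records."""
    histogram = Counter()
    weight = 1
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: remember the weight for the sequence that follows.
            weight = multiplicity(line[1:])
        else:
            # Sequence line: add its weight to the bin for its length.
            histogram[len(line)] += weight
    return histogram


example = [
    ">seq1-2", "GGATCC",
    ">seq2-10", "GGTCATGGGTTTAAA",
    ">seq3-3", "GGGATATATCCCCACACACACACAC",
]
print(sorted(length_histogram(example).items()))
# prints [(6, 2), (15, 10), (25, 3)]
```

With the identifiers from Example 1 (`seq1`, `seq2`, `seq3`) each sequence contributes a weight of 1, so the same three lengths each get a count of one.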