annotate tools/fastx_toolkit/fasta_clipping_histogram.xml @ 0:9071e359b9a3

Uploaded
author xuebing
date Fri, 09 Mar 2012 19:37:19 -0500
<tool id="cshl_fasta_clipping_histogram" name="Length Distribution">
    <description>chart</description>
    <requirements><requirement type="package">fastx_toolkit</requirement></requirements>
    <command>fasta_clipping_histogram.pl $input $outfile</command>

    <inputs>
        <param format="fasta" name="input" type="data" label="Library to analyze" />
    </inputs>

    <outputs>
        <data format="png" name="outfile" metadata_source="input" />
    </outputs>
    <help>

**What it does**

This tool creates a histogram image of the sequence length distribution in a given FASTA dataset.

**TIP:** Use this tool after clipping your library (with the **FASTX Clipper** tool) to visualize the clipping results.
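
As a rough illustration of what the chart summarizes, the following Python sketch tallies how many sequences of each length appear in single-line FASTA text. It is not the code behind fasta_clipping_histogram.pl; the function name ``length_distribution`` and the sample records are assumptions made for this example.

```python
from collections import Counter


def length_distribution(fasta_text):
    """Count sequences per length in single-line FASTA text (illustrative helper)."""
    counts = Counter()
    for line in fasta_text.splitlines():
        line = line.strip()
        # Skip blank lines and identifier lines (those starting with '>').
        if not line or line.startswith(">"):
            continue
        counts[len(line)] += 1
    return counts


# Hypothetical sample input: three reads, two of them 6 bases long.
sample = ">read1\nGGATCC\n>read2\nGGTCAT\n>read3\nGGTCATGGGTTTAAA\n"
for length, count in sorted(length_distribution(sample).items()):
    print(length, count)
```

Each printed pair is a sequence length followed by the number of sequences of that length, which corresponds to one bar of the histogram.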

-----

**Output Examples**

In the following library, most sequences are 24-mers to 27-mers.
This could indicate an abundance of endo-siRNAs (depending, of course, on what you tried to sequence in the first place).

.. image:: ./static/fastx_icons/fasta_clipping_histogram_1.png


In the following library, most sequences are 19-, 22-, or 23-mers.
This could indicate an abundance of miRNAs (depending, of course, on what you tried to sequence in the first place).

.. image:: ./static/fastx_icons/fasta_clipping_histogram_2.png


-----


**Input Formats**

This tool accepts short-read FASTA files. The reads don't have to be short, but each sequence must be on a single line, like so::

    >sequence1
    AGTAGTAGGTGATGTAGAGAGAGAGAGAGTAG
    >sequence2
    GTGTGTGTGGGAAGTTGACACAGTA
    >sequence3
    CCTTGAGATTAACGCTAATCAAGTAAAC


If the sequences span multiple lines::

    >sequence1
    CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG
    TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG
    aactggtctttacctTTAAGTTG

Use the **FASTA Width Formatter** tool to reformat the FASTA file into single-line sequences::

    >sequence1
    CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAGaactggtctttacctTTAAGTTG
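
The reformatting step amounts to joining each record's sequence lines into one. The Python sketch below illustrates the idea; it is not the FASTA Width Formatter's implementation, and the function name ``unwrap_fasta`` is an assumption made for this example.

```python
def unwrap_fasta(fasta_text):
    """Join wrapped sequence lines so each record's sequence is on one line."""
    out = []
    chunks = []  # sequence pieces collected for the current record
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new identifier ends the previous record: flush its sequence.
            if chunks:
                out.append("".join(chunks))
                chunks = []
            out.append(line)
        else:
            chunks.append(line)
    # Flush the sequence of the final record, if any.
    if chunks:
        out.append("".join(chunks))
    return "".join(line + "\n" for line in out)


# Hypothetical wrapped record, shortened for readability.
wrapped = ">sequence1\nCAGCATCTAC\nATAATATGAT\naactgg\n"
print(unwrap_fasta(wrapped), end="")
```

The sketch keeps the original letter case, so lowercase bases stay lowercase after joining, as in the example above.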


-----



**Multiplicity counts (a.k.a. read counts)**

If the sequence identifier (the text after the '>') contains a dash and a number, the number is treated as a multiplicity count (i.e. how many times that individual sequence was repeated in the original FASTA file, before collapsing).

Example 1 - The following FASTA file *does not* have multiplicity counts::

    >seq1
    GGATCC
    >seq2
    GGTCATGGGTTTAAA
    >seq3
    GGGATATATCCCCACACACACACAC

Each sequence counts as one, to produce the following chart:

.. image:: ./static/fastx_icons/fasta_clipping_histogram_3.png


Example 2 - The following FASTA file has multiplicity counts::

    >seq1-2
    GGATCC
    >seq2-10
    GGTCATGGGTTTAAA
    >seq3-3
    GGGATATATCCCCACACACACACAC

The first sequence counts as 2, the second as 10, and the third as 3, to produce the following chart:

.. image:: ./static/fastx_icons/fasta_clipping_histogram_4.png

Use the **FASTA Collapser** tool to create FASTA files with multiplicity counts.
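
The weighting rule described in this section can be sketched in Python as follows. This is an illustration only: it assumes the count is a run of digits after a dash at the very end of the identifier, which may differ from how fasta_clipping_histogram.pl actually parses identifiers. The names ``multiplicity`` and ``weighted_length_distribution`` are hypothetical.

```python
import re
from collections import Counter

# Assumption: the count is the digits following a dash at the end of the identifier.
COUNT_SUFFIX = re.compile(r"-(\d+)$")


def multiplicity(identifier):
    """Return the multiplicity encoded in an identifier, or 1 if there is none."""
    match = COUNT_SUFFIX.search(identifier)
    return int(match.group(1)) if match else 1


def weighted_length_distribution(fasta_text):
    """Sum multiplicity counts per sequence length in single-line FASTA text."""
    counts = Counter()
    weight = 1
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Remember the weight of the record whose sequence comes next.
            weight = multiplicity(line[1:])
        else:
            counts[len(line)] += weight
    return counts


# The records from Example 2 above.
example2 = (
    ">seq1-2\nGGATCC\n"
    ">seq2-10\nGGTCATGGGTTTAAA\n"
    ">seq3-3\nGGGATATATCCCCACACACACACAC\n"
)
for length, count in sorted(weighted_length_distribution(example2).items()):
    print(length, count)
```

Identifiers without a trailing count fall back to a weight of 1, so the same function also covers files like the one in Example 1.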

    </help>
</tool>
<!-- FASTA-Clipping-Histogram is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) -->