sharplabtool: tools/fastx_toolkit/fasta_clipping

comparison tools/fastx_toolkit/fasta_clipping_histogram.xml @ 0:9071e359b9a3

Uploaded

author	xuebing
date	Fri, 09 Mar 2012 19:37:19 -0500
parents
children

comparison

equal deleted inserted replaced

--1:000000000000
+:9071e359b9a3
+<tool id="cshl_fasta_clipping_histogram" name="Length Distribution">
+	<description>chart</description>
+	<requirements><requirement type="package">fastx_toolkit</requirement></requirements>
+	<command>fasta_clipping_histogram.pl $input $outfile</command>
+	<inputs>
+		<param format="fasta" name="input" type="data" label="Library to analyze" />
+	</inputs>
+	<outputs>
+		<data format="png" name="outfile" metadata_source="input" />
+	</outputs>
+<help>
+**What it does**
+This tool creates a histogram image of sequence lengths distribution in a given fasta dataset file.
+**TIP:** Use this tool after clipping your library (with **FASTX Clipper tool**), to visualize the clipping results.
+-----
+**Output Examples**
+In the following library, most sequences are 24-mers to 27-mers.
+This could indicate an abundance of endo-siRNAs (depending of course of what you've tried to sequence in the first place).
+.. image:: ./static/fastx_icons/fasta_clipping_histogram_1.png
+In the following library, most sequences are 19,22 or 23-mers.
+This could indicate an abundance of miRNAs (depending of course of what you've tried to sequence in the first place).
+.. image:: ./static/fastx_icons/fasta_clipping_histogram_2.png
+-----
+**Input Formats**
+This tool accepts short-reads FASTA files. The reads don't have to be short, but they do have to be on a single line, like so::
+>sequence1
+AGTAGTAGGTGATGTAGAGAGAGAGAGAGTAG
+>sequence2
+GTGTGTGTGGGAAGTTGACACAGTA
+>sequence3
+CCTTGAGATTAACGCTAATCAAGTAAAC
+If the sequences span over multiple lines::
+>sequence1
+CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG
+TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG
+aactggtctttacctTTAAGTTG
+Use the **FASTA Width Formatter** tool to re-format the FASTA into a single-lined sequences::
+>sequence1
+CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAGaactggtctttacctTTAAGTTG
+-----
+**Multiplicity counts (a.k.a reads-count)**
+If the sequence identifier (the text after the '>') contains a dash and a number, it is treated as a multiplicity count value (i.e. how many times that individual sequence repeated in the original FASTA file, before collapsing).
+Example 1 - The following FASTA file *does not* have multiplicity counts::
+>seq1
+GGATCC
+>seq2
+GGTCATGGGTTTAAA
+>seq3
+GGGATATATCCCCACACACACACAC
+Each sequence is counts as one, to produce the following chart:
+.. image:: ./static/fastx_icons/fasta_clipping_histogram_3.png
+Example 2 - The following FASTA file have multiplicity counts::
+>seq1-2
+GGATCC
+>seq2-10
+GGTCATGGGTTTAAA
+>seq3-3
+GGGATATATCCCCACACACACACAC
+The first sequence counts as 2, the second as 10, the third as 3, to produce the following chart:
+.. image:: ./static/fastx_icons/fasta_clipping_histogram_4.png
+Use the **FASTA Collapser** tool to create FASTA files with multiplicity counts.
+</help>
+</tool>
+<!-- FASTA-Clipping-Histogram is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) -->

Mercurial > repos > xuebing > sharplabtool

comparison tools/fastx_toolkit/fasta_clipping_histogram.xml @ 0:9071e359b9a3