# HG changeset patch # User idot # Date 1373451228 14400 # Node ID 78a7d28f2a153bda53c7e569122328e218d4c6c8 Uploaded diff -r 000000000000 -r 78a7d28f2a15 ._fasta_clipping_histogram.xml Binary file ._fasta_clipping_histogram.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fasta_formatter.xml Binary file ._fasta_formatter.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fasta_nucleotide_changer.xml Binary file ._fasta_nucleotide_changer.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastq_masker.xml Binary file ._fastq_masker.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastq_quality_boxplot.xml Binary file ._fastq_quality_boxplot.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastq_quality_converter.xml Binary file ._fastq_quality_converter.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastq_quality_filter.xml Binary file ._fastq_quality_filter.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastq_quality_trimmer.xml Binary file ._fastq_quality_trimmer.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastq_to_fasta.xml Binary file ._fastq_to_fasta.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastx_artifacts_filter.xml Binary file ._fastx_artifacts_filter.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastx_barcode_splitter.xml Binary file ._fastx_barcode_splitter.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastx_barcode_splitter_galaxy_wrapper.sh Binary file ._fastx_barcode_splitter_galaxy_wrapper.sh has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastx_clipper.xml Binary file ._fastx_clipper.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastx_collapser.xml Binary file ._fastx_collapser.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastx_nucleotides_distribution.xml Binary file ._fastx_nucleotides_distribution.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastx_nucleotides_distribution_line.xml Binary file ._fastx_nucleotides_distribution_line.xml 
has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastx_quality_statistics.xml Binary file ._fastx_quality_statistics.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastx_quality_statistics_ng.xml Binary file ._fastx_quality_statistics_ng.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastx_renamer.xml Binary file ._fastx_renamer.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastx_reverse_complement.xml Binary file ._fastx_reverse_complement.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastx_trimmer.xml Binary file ._fastx_trimmer.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastx_trimmer_from_end.xml Binary file ._fastx_trimmer_from_end.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._fastx_uncollapser.xml Binary file ._fastx_uncollapser.xml has changed diff -r 000000000000 -r 78a7d28f2a15 ._seqid_uncollapser.xml Binary file ._seqid_uncollapser.xml has changed diff -r 000000000000 -r 78a7d28f2a15 fasta_clipping_histogram.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fasta_clipping_histogram.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,104 @@ + + chart + fasta_clipping_histogram.pl $input $outfile + + + + + + + + + + +**What it does** + +This tool creates a histogram image of sequence lengths distribution in a given fasta dataset file. + +**TIP:** Use this tool after clipping your library (with **FASTX Clipper tool**), to visualize the clipping results. + +----- + +**Output Examples** + +In the following library, most sequences are 24-mers to 27-mers. +This could indicate an abundance of endo-siRNAs (depending of course of what you've tried to sequence in the first place). + +.. image:: ./static/fastx_icons/fasta_clipping_histogram_1.png + + +In the following library, most sequences are 19,22 or 23-mers. +This could indicate an abundance of miRNAs (depending of course of what you've tried to sequence in the first place). + +.. 
image:: ./static/fastx_icons/fasta_clipping_histogram_2.png + + +----- + + +**Input Formats** + +This tool accepts short-read FASTA files. The reads don't have to be short, but they do have to be on a single line, like so:: + + >sequence1 + AGTAGTAGGTGATGTAGAGAGAGAGAGAGTAG + >sequence2 + GTGTGTGTGGGAAGTTGACACAGTA + >sequence3 + CCTTGAGATTAACGCTAATCAAGTAAAC + + +If the sequences span multiple lines:: + + >sequence1 + CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG + TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG + aactggtctttacctTTAAGTTG + +Use the **FASTA Width Formatter** tool to re-format the FASTA into single-line sequences:: + + >sequence1 + CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAGaactggtctttacctTTAAGTTG + + +----- + + + +**Multiplicity counts (a.k.a. read counts)** + +If the sequence identifier (the text after the '>') contains a dash and a number, the number is treated as a multiplicity count value (i.e. how many times that individual sequence was repeated in the original FASTA file, before collapsing). + +Example 1 - The following FASTA file *does not* have multiplicity counts:: + + >seq1 + GGATCC + >seq2 + GGTCATGGGTTTAAA + >seq3 + GGGATATATCCCCACACACACACAC + +Each sequence counts as one, to produce the following chart: + +.. image:: ./static/fastx_icons/fasta_clipping_histogram_3.png + + +Example 2 - The following FASTA file has multiplicity counts:: + + >seq1-2 + GGATCC + >seq2-10 + GGTCATGGGTTTAAA + >seq3-3 + GGGATATATCCCCACACACACACAC + +The first sequence counts as 2, the second as 10, the third as 3, to produce the following chart: + +.. image:: ./static/fastx_icons/fasta_clipping_histogram_4.png + +Use the **FASTA Collapser** tool to create FASTA files with multiplicity counts. 
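The multiplicity rule described in this help text can be sketched in a few lines of Python. This is only an illustration, not the code of `fasta_clipping_histogram.pl`: the function name `length_histogram` is hypothetical, and it assumes the count is a trailing `-<digits>` on the first token of the identifier (the help text only says the identifier "contains a dash and a number").

```python
import re
from collections import Counter

# Assumption: the multiplicity count is a trailing "-<digits>" on the identifier.
COUNT_SUFFIX = re.compile(r"-(\d+)$")


def length_histogram(fasta_text):
    """Map sequence length -> number of reads, weighting by multiplicity."""
    histogram = Counter()
    weight = 1
    for raw_line in fasta_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Only the first whitespace-delimited token is treated as the identifier.
            tokens = line[1:].split()
            match = COUNT_SUFFIX.search(tokens[0]) if tokens else None
            weight = int(match.group(1)) if match else 1
        else:
            # Single-line FASTA: exactly one sequence line per identifier.
            histogram[len(line)] += weight
    return histogram


collapsed = (
    ">seq1-2\nGGATCC\n"
    ">seq2-10\nGGTCATGGGTTTAAA\n"
    ">seq3-3\nGGGATATATCCCCACACACACACAC\n"
)
for length, reads in sorted(length_histogram(collapsed).items()):
    print(length, reads)
```

Under this assumption an identifier such as `cel-let-7` would also be read as a count of 7, so the sketch is only appropriate for collapsed files whose identifiers follow the `number-count` pattern.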
+ + + + diff -r 000000000000 -r 78a7d28f2a15 fasta_formatter.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fasta_formatter.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,90 @@ + + formatter + + + cat '$input' | + fasta_formatter -w $width -o '$output' + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +This tool re-formats a FASTA file, changing the width of the nucleotides lines. + +**TIP:** Outputting a single line (with **width = 0**) can be useful for scripting (with **grep**, **awk**, and **perl**). Every odd line is a sequence identifier, and every even line is a nucleotides line. + +-------- + +**Example** + +Input FASTA file (each nucleotides line is 50 characters long):: + + >Scaffold3648 + AGGAATGATGACTACAATGATCAACTTAACCTATCTATTTAATTTAGTTC + CCTAATGTCAGGGACCTACCTGTTTTTGTTATGTTTGGGTTTTGTTGTTG + TTGTTTTTTTAATCTGAAGGTATTGTGCATTATATGACCTGTAATACACA + ATTAAAGTCAATTTTAATGAACATGTAGTAAAAACT + >Scaffold9299 + CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG + TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG + aactggtctttacctTTAAGTTG + + +Output FASTA file (with width=80):: + + >Scaffold3648 + AGGAATGATGACTACAATGATCAACTTAACCTATCTATTTAATTTAGTTCCCTAATGTCAGGGACCTACCTGTTTTTGTT + ATGTTTGGGTTTTGTTGTTGTTGTTTTTTTAATCTGAAGGTATTGTGCATTATATGACCTGTAATACACAATTAAAGTCA + ATTTTAATGAACATGTAGTAAAAACT + >Scaffold9299 + CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTAC + GTATATGACAAAGCTAATAGaactggtctttacctTTAAGTTG + +Output FASTA file (with width=0 => single line):: + + >Scaffold3648 + AGGAATGATGACTACAATGATCAACTTAACCTATCTATTTAATTTAGTTCCCTAATGTCAGGGACCTACCTGTTTTTGTTATGTTTGGGTTTTGTTGTTGTTGTTTTTTTAATCTGAAGGTATTGTGCATTATATGACCTGTAATACACAATTAAAGTCAATTTTAATGAACATGTAGTAAAAACT + >Scaffold9299 + CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAGaactggtctttacctTTAAGTTG + + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. 
__: http://hannonlab.cshl.edu/fastx_toolkit/ + + + diff -r 000000000000 -r 78a7d28f2a15 fasta_nucleotide_changer.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fasta_nucleotide_changer.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,74 @@ + + converter + + cat '$input' | + fasta_nucleotide_changer $mode -v -o '$output' + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +This tool converts RNA FASTA files to DNA (and vice-versa). + +In **RNA-to-DNA** mode, U's are changed into T's. + +In **DNA-to-RNA** mode, T's are changed into U's. + +-------- + +**Example** + +Input RNA FASTA file (from Sanger's miRBase):: + + >cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7 + UGAGGUAGUAGGUUGUAUAGUU + >cel-lin-4 MIMAT0000002 Caenorhabditis elegans lin-4 + UCCCUGAGACCUCAAGUGUGA + >cel-miR-1 MIMAT0000003 Caenorhabditis elegans miR-1 + UGGAAUGUAAAGAAGUAUGUA + +Output DNA FASTA file (with RNA-to-DNA mode):: + + >cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7 + TGAGGTAGTAGGTTGTATAGTT + >cel-lin-4 MIMAT0000002 Caenorhabditis elegans lin-4 + TCCCTGAGACCTCAAGTGTGA + >cel-miR-1 MIMAT0000003 Caenorhabditis elegans miR-1 + TGGAATGTAAAGAAGTATGTA + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. __: http://hannonlab.cshl.edu/fastx_toolkit/ + + + diff -r 000000000000 -r 78a7d28f2a15 fastq_masker.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastq_masker.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,87 @@ + + (based on quality) + + cat '$input' | + fastq_masker +#if $input.ext == "fastqsanger": + -Q 33 +#elif $input.ext == "fastq": + -Q 64 +#end if + -v -q $cutoff -r '$maskchar' -o '$output' + + + + + + Nucleotides below this quality will be masked + + + + + Replace low-quality nucleotides with this character. Common values: 'N' or '.' + + + + + + + + + + + + + + + + +**What it does** + +This tool masks low-quality nucleotides in a FASTQ file by replacing them with the specified mask character (**N** by default). 
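The masking rule can be illustrated with a minimal Python sketch. It is not the `fastq_masker` implementation: the function name `mask_low_quality` is hypothetical, and it assumes numeric quality scores where any base scoring strictly below the cutoff is replaced.

```python
def mask_low_quality(sequence, qualities, cutoff, mask_char="N"):
    """Replace every base whose quality is below `cutoff` with `mask_char`."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and quality list must have the same length")
    return "".join(
        base if quality >= cutoff else mask_char
        for base, quality in zip(sequence, qualities)
    )


# One 20-base read: eight cycles at 40, then 30, 20, 19, 20, then eight at 19.
read = "CAGCGAGGCTTTAATGCCAT"
scores = [40] * 8 + [30, 20, 19, 20] + [19] * 8
print(mask_low_quality(read, scores, cutoff=20))
```

Note that a base with quality exactly equal to the cutoff is kept, which is why a single unmasked base can survive between masked positions.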
+ +-------- + +**Example** + +Input FASTQ file:: + + @1 + TATGGTCAGAAACCATATGC + +1 + 40 40 40 40 40 40 40 40 40 40 40 20 19 19 19 19 19 19 19 19 + @2 + CAGCGAGGCTTTAATGCCAT + +2 + 40 40 40 40 40 40 40 40 30 20 19 20 19 19 19 19 19 19 19 19 + @3 + CAGCGAGGCTTTAATGCCAT + +3 + 40 40 40 40 40 40 40 40 20 19 19 19 19 19 19 19 19 19 19 19 + +After Masking nucleotides with quality lower than 20 with the character **N**:: + + @1 + TATGGTCAGAAANNNNNNNN + +1 + 40 40 40 40 40 40 40 40 40 40 40 20 19 19 19 19 19 19 19 19 + @2 + CAGCGAGGCTNTNNNNNNNN + +2 + 40 40 40 40 40 40 40 40 30 20 19 20 19 19 19 19 19 19 19 19 + @3 + CAGCGAGGCNNNNNNNNNNN + +3 + 40 40 40 40 40 40 40 40 20 19 19 19 19 19 19 19 19 19 19 19 + + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. __: http://hannonlab.cshl.edu/fastx_toolkit/ + + + + diff -r 000000000000 -r 78a7d28f2a15 fastq_quality_boxplot.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastq_quality_boxplot.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,54 @@ + + + + fastq_quality_boxplot_graph.sh -t '$input.name' -i $input -o '$output' + + + + + + + + + + +**What it does** + +Creates a boxplot graph for the quality scores in the library. + +.. class:: infomark + +**TIP:** Use the **FASTQ Statistics** tool to generate the report file needed for this tool. + +----- + +**Output Examples** + +* Black horizontal lines are medians +* Rectangular red boxes show the Inter-quartile Range (IQR) (top value is Q3, bottom value is Q1) +* Whiskers show outlier at max. 1.5*IQR + + +An excellent quality library (median quality is 40 for almost all 36 cycles): + +.. image:: ../static/fastx_icons/fastq_quality_boxplot_1.png + + +A relatively good quality library (median quality degrades towards later cycles): + +.. image:: ../static/fastx_icons/fastq_quality_boxplot_2.png + +A low quality library (median drops quickly): + +.. 
image:: ../static/fastx_icons/fastq_quality_boxplot_3.png + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. __: http://hannonlab.cshl.edu/fastx_toolkit/ + + + + diff -r 000000000000 -r 78a7d28f2a15 fastq_quality_converter.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastq_quality_converter.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,99 @@ + + (ASCII-Numeric) + + cat '$input' | + fastq_quality_converter $QUAL_FORMAT -o '$output' -Q $offset + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +Converts a Solexa FASTQ file to/from numeric or ASCII quality format. + +.. class:: warningmark + +Re-scaling is **not** performed. (e.g. conversion from Phred scale to Solexa scale). + + +----- + +FASTQ with Numeric quality scores:: + + @CSHL__2_FC042AGWWWXX:8:1:120:202 + ACGATAGATCGGAAGAGCTAGTATGCCGTTTTCTGC + +CSHL__2_FC042AGWWWXX:8:1:120:202 + 40 40 40 40 20 40 40 40 40 6 40 40 28 40 40 25 40 20 40 -1 30 40 14 27 40 8 1 3 7 -1 11 10 -1 21 10 8 + @CSHL__2_FC042AGWWWXX:8:1:103:1185 + ATCACGATAGATCGGCAGAGCTCGTTTACCGTCTTC + +CSHL__2_FC042AGWWWXX:8:1:103:1185 + 40 40 40 40 40 35 33 31 40 40 40 32 30 22 40 -0 9 22 17 14 8 36 15 34 22 12 23 3 10 -0 8 2 4 25 30 2 + + +FASTQ with ASCII quality scores:: + + @CSHL__2_FC042AGWWWXX:8:1:120:202 + ACGATAGATCGGAAGAGCTAGTATGCCGTTTTCTGC + +CSHL__2_FC042AGWWWXX:8:1:120:202 + hhhhThhhhFhh\hhYhTh?^hN[hHACG?KJ?UJH + @CSHL__2_FC042AGWWWXX:8:1:103:1185 + ATCACGATAGATCGGCAGAGCTCGTTTACCGTCTTC + +CSHL__2_FC042AGWWWXX:8:1:103:1185 + hhhhhca_hhh`^Vh@IVQNHdObVLWCJ@HBDY^B + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. 
__: http://hannonlab.cshl.edu/fastx_toolkit/ + + + + diff -r 000000000000 -r 78a7d28f2a15 fastq_quality_filter.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastq_quality_filter.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,86 @@ + + + + +cat '$input' | +fastq_quality_filter +#if $input.ext == "fastqsanger": + -Q 33 +#elif $input.ext == "fastq": + -Q 64 +#end if + -q $quality -p $percent -v -o '$output' + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +This tool filters reads based on quality scores. + +.. class:: infomark + +Using **percent = 100** requires all cycles of all reads to be at least the quality cut-off value. + +.. class:: infomark + +Using **percent = 50** requires the median quality of the cycles (in each read) to be at least the quality cut-off value. + +-------- + +For each read, the tool calculates the percentage of cycles whose quality is at least the quality cut-off value. If that percentage is lower than the **percent** parameter, the read is discarded. + + +**Example**:: + + @CSHL_4_FC042AGOOII:1:2:214:584 + GACAATAAAC + +CSHL_4_FC042AGOOII:1:2:214:584 + 30 30 30 30 30 30 30 30 20 10 + +Using **percent = 50** and **cut-off = 30** - This read will not be discarded (the median quality is 30, which meets the cut-off). + +Using **percent = 90** and **cut-off = 30** - This read will be discarded (only 80% of the cycles have quality equal to or higher than 30). + +Using **percent = 100** and **cut-off = 20** - This read will be discarded (not all cycles have quality equal to or higher than 20). + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. 
__: http://hannonlab.cshl.edu/fastx_toolkit/ + + + diff -r 000000000000 -r 78a7d28f2a15 fastq_quality_trimmer.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastq_quality_trimmer.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,100 @@ + + + +cat '$input' | +fastq_quality_trimmer +#if $input.ext == "fastqsanger": + -Q 33 +#elif $input.ext == "fastq": + -Q 64 +#end if + -v -t $cutoff -l $minlen -o '$output' + + + + + + + + Nucleotides below this quality will be trimmed + + + + + Sequences shorter than this length will be discarded. Leave at zero to keep all sequences + + + + + + + + + + + + + + + + +**What it does** + +This tool scans each sequence from the end for the first nucleotide whose quality is at least the specified minimum score, then trims (removes) all nucleotides after that position. After trimming, sequences that are shorter than the minimum length are discarded. + +-------- + +**Example** + +Input FASTQ file (with 20 bases in each sequence):: + + @1 + TATGGTCAGAAACCATATGC + +1 + 40 40 40 40 40 40 40 40 40 40 40 20 19 19 19 19 19 19 19 19 + @2 + CAGCGAGGCTTTAATGCCAT + +2 + 40 40 40 40 40 40 40 40 30 20 19 20 19 19 19 19 19 19 19 19 + @3 + CAGCGAGGCTTTAATGCCAT + +3 + 40 40 40 40 40 40 40 40 20 19 19 19 19 19 19 19 19 19 19 19 + + +Trimming with a cutoff of 20, we get the following FASTQ file:: + + @1 + TATGGTCAGAAA + +1 + 40 40 40 40 40 40 40 40 40 40 40 20 + @2 + CAGCGAGGCTTT + +2 + 40 40 40 40 40 40 40 40 30 20 19 20 + @3 + CAGCGAGGC + +3 + 40 40 40 40 40 40 40 40 20 + +Trimming with a cutoff of 20 and a minimum length of 12, we get the following FASTQ file:: + + @1 + TATGGTCAGAAA + +1 + 40 40 40 40 40 40 40 40 40 40 40 20 + @2 + CAGCGAGGCTTT + +2 + 40 40 40 40 40 40 40 40 30 20 19 20 + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. 
__: http://hannonlab.cshl.edu/fastx_toolkit/ + + + + diff -r 000000000000 -r 78a7d28f2a15 fastq_to_fasta.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastq_to_fasta.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,84 @@ + + converter + +cat '$input' | +fastq_to_fasta +#if $input.ext == "fastqsanger": + -Q 33 +#elif $input.ext == "fastq": + -Q 64 +#end if + $SKIPN $RENAMESEQ -o '$output' -v + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +This tool converts data from Solexa format to FASTA format (scroll down for format description). + +-------- + +**Example** + +The following data in Solexa-FASTQ format:: + + @CSHL_4_FC042GAMMII_2_1_517_596 + GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT + +CSHL_4_FC042GAMMII_2_1_517_596 + 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 40 40 36 40 13 14 24 24 9 24 9 40 10 10 15 40 + +Will be converted to FASTA (with 'rename sequence names' = NO):: + + >CSHL_4_FC042GAMMII_2_1_517_596 + GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT + +Will be converted to FASTA (with 'rename sequence names' = YES):: + + >1 + GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. __: http://hannonlab.cshl.edu/fastx_toolkit/ + + + diff -r 000000000000 -r 78a7d28f2a15 fastx_artifacts_filter.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastx_artifacts_filter.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,95 @@ + + + +cat '$input' | +fastx_artifacts_filter +#if $input.ext == "fastqsanger": + -Q 33 +#elif $input.ext == "fastq": + -Q 64 +#end if + -v -o '$output' + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +This tool filters sequencing artifacts (reads with all but 3 identical bases). + +-------- + +**The following is an example of sequences which will be filtered outhis tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. 
__: http://hannonlab.cshl.edu/fastx_toolkit/ + + + diff -r 000000000000 -r 78a7d28f2a15 fastx_barcode_splitter.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastx_barcode_splitter.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,75 @@ + + + fastx_barcode_splitter_galaxy_wrapper.sh $BARCODE $input "$input.name" "$output.files_path" --mismatches $mismatches --partial $partial $EOL > $output + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +This tool splits a FASTQ or FASTA file into several files, using barcodes as the split criteria. + +-------- + +**Barcode file Format** + +Barcode files are simple text files. +Each line should contain an identifier (descriptive name for the barcode), and the barcode itself (A/C/G/T), separated by a TAB character. +Example:: + + #This line is a comment (starts with a 'number' sign) + BC1 GATCT + BC2 ATCGT + BC3 GTGAT + BC4 TGTCT + +For each barcode, a new FASTQ file will be created (with the barcode's identifier as part of the file name). +Sequences matching the barcode will be stored in the appropriate file. + +One additional FASTQ file will be created (the 'unmatched' file), where sequences not matching any barcode will be stored. + +The output of this tool is an HTML file, displaying the split counts and the file locations. + +**Output Example** + +.. image:: ./static/fastx_icons/barcode_splitter_output_example.png + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. __: http://hannonlab.cshl.edu/fastx_toolkit/ + + + + diff -r 000000000000 -r 78a7d28f2a15 fastx_barcode_splitter_galaxy_wrapper.sh --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastx_barcode_splitter_galaxy_wrapper.sh Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,85 @@ +#!/bin/sh + +# FASTX-toolkit - FASTA/FASTQ preprocessing tools. +# Copyright (C) 2009 A. 
Gordon (gordon@cshl.edu) +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU Affero General Public License as +# published by the Free Software Foundation, either version 3 of the +# License, or (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU Affero General Public License for more details. +# +# You should have received a copy of the GNU Affero General Public License +# along with this program. If not, see . + +# +#This is a shell script wrapper for 'fastx_barcode_splitter.pl' +# +# 1. Output files are saved at the dataset's files_path directory. +# +# 2. 'fastx_barcode_splitter.pl' outputs a textual table. +# This script turns it into pretty HTML with working URL +# (so lazy users can just click on the URLs and get their files) + +if [ "$1x" = "x" ]; then + echo "Usage: $0 [BARCODE FILE] [FASTQ FILE] [LIBRARY_NAME] [OUTPUT_PATH]" >&2 + exit 1 +fi + +BARCODE_FILE="$1" +FASTQ_FILE="$2" +LIBNAME="$3" +OUTPUT_PATH="$4" +shift 4 +# The rest of the parameters are passed to the split program + +if [ "${OUTPUT_PATH}x" = "x" ]; then + echo "Usage: $0 [BARCODE FILE] [FASTQ FILE] [LIBRARY_NAME] [OUTPUT_PATH]" >&2 + exit 1 +fi + +#Sanitize library name, make sure we can create a file with this name +LIBNAME=${LIBNAME%.gz} +LIBNAME=${LIBNAME%.txt} +LIBNAME=$(echo "$LIBNAME" | tr -cd '[:alnum:]') + +if [ ! -r "$FASTQ_FILE" ]; then + echo "Error: Input file ($FASTQ_FILE) not found!" >&2 + exit 1 +fi +if [ ! -r "$BARCODE_FILE" ]; then + echo "Error: barcode file ($BARCODE_FILE) not found!" >&2 + exit 1 +fi +mkdir -p "$OUTPUT_PATH" +if [ ! 
-d "$OUTPUT_PATH" ]; then + echo "Error: failed to create output path '$OUTPUT_PATH'" >&2 + exit 1 +fi + +PUBLICURL="" +BASEPATH="$OUTPUT_PATH/" +#PREFIX="$BASEPATH"`date "+%Y-%m-%d_%H%M__"`"${LIBNAME}__" +PREFIX="$BASEPATH""${LIBNAME}__" +SUFFIX=".txt" + +RESULTS=`gzip -cdf "$FASTQ_FILE" | fastx_barcode_splitter.pl --bcfile "$BARCODE_FILE" --prefix "$PREFIX" --suffix "$SUFFIX" "$@"` +if [ $? != 0 ]; then + echo "error" +fi + +# +# Convert the textual tab-separated table into simple HTML table, +# with the local path replaces with a valid URL +echo "" +echo "$RESULTS" | sed -r "s|$BASEPATH(.*)|\\1|" | sed ' +i
+s|\t||g +a<\/td><\/tr> +' +echo "

" +echo "

" diff -r 000000000000 -r 78a7d28f2a15 fastx_clipper.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastx_clipper.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,117 @@ + + adapter sequences + +cat '$input' | +fastx_clipper +#if $input.ext == "fastqsanger": + -Q 33 +#elif $input.ext == "fastq": + -Q 64 +#end if + -l $minlength -a '$clip_source.clip_sequence' -d $keepdelta -o '$output' -v $KEEP_N $DISCARD_OPTIONS + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + use this for hairpin barcoding. keep at 0 unless you know what you're doing. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +This tool clips adapters from the 3'-end of the sequences in a FASTA/FASTQ file. + +-------- + + +**Clipping Illustration:** + +.. image:: ../static/fastx_icons/fastx_clipper_illustration.png + + + + + + + + +**Clipping Example:** + +.. image:: ../static/fastx_icons/fastx_clipper_example.png + + + +**In the above example:** + +* Sequence no. 1 was discarded since it wasn't clipped (i.e. didn't contain the adapter sequence). (**Output** parameter). +* Sequence no. 5 was discarded --- it's length (after clipping) was shorter than 15 nt (**Minimum Sequence Length** parameter). + + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. __: http://hannonlab.cshl.edu/fastx_toolkit/ + + + + + diff -r 000000000000 -r 78a7d28f2a15 fastx_collapser.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastx_collapser.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,92 @@ + + sequences + +cat '$input' | +fastx_collapser +#if $input.ext == "fastqsanger": + -Q 33 +#elif $input.ext == "fastq": + -Q 64 +#end if + -v -o '$output' + + + + + + + + + + + + + + + + + + +**What it does** + +This tool collapses identical sequences in a FASTQ or FASTA file into a single sequence. 
+ +-------- + +**Example** + +Example Input File (Sequence "ATAT" appears multiple times):: + + >CSHL_2_FC0042AGLLOO_1_1_605_414 + TGCG + >CSHL_2_FC0042AGLLOO_1_1_537_759 + ATAT + >CSHL_2_FC0042AGLLOO_1_1_774_520 + TGGC + >CSHL_2_FC0042AGLLOO_1_1_742_502 + ATAT + >CSHL_2_FC0042AGLLOO_1_1_781_514 + TGAG + >CSHL_2_FC0042AGLLOO_1_1_757_487 + TTCA + >CSHL_2_FC0042AGLLOO_1_1_903_769 + ATAT + >CSHL_2_FC0042AGLLOO_1_1_724_499 + ATAT + +Example Output file:: + + >1-1 + TGCG + >2-4 + ATAT + >3-1 + TGGC + >4-1 + TGAG + >5-1 + TTCA + +.. class:: infomark + +Original Sequence Names / Lane descriptions (e.g. "CSHL_2_FC0042AGLLOO_1_1_742_502") are discarded. + +The output sequence name is composed of two numbers: the first is the sequence's number, the second is the multiplicity value. + +The following output:: + + >2-4 + ATAT + +means that the sequence "ATAT" is the second sequence in the file, and it appeared 4 times in the input FASTA file. + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. __: http://hannonlab.cshl.edu/fastx_toolkit/ + + + + diff -r 000000000000 -r 78a7d28f2a15 fastx_nucleotides_distribution.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastx_nucleotides_distribution.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,51 @@ + + + fastx_nucleotide_distribution_graph.sh -t '$input.name' -i $input -o '$output' + + + + + + + + + + +**What it does** + +Creates a stacked-histogram graph for the nucleotide distribution in the Solexa library. + +.. class:: infomark + +**TIP:** Use the **FASTQ Statistics** tool to generate the report file needed for this tool. + +----- + +**Output Examples** + +The following chart clearly shows the barcode used at the 5'-end of the library: **GATCT** + +.. image:: ./static/fastx_icons/fastq_nucleotides_distribution_1.png + +In the following chart, one can almost 'read' the most abundant sequence by looking at the dominant values: **TGATA TCGTA TTGAT GACTG AA...** + +.. 
image:: ./static/fastx_icons/fastq_nucleotides_distribution_2.png + +The following chart shows a growing number of unknown (N) nucleotides towards later cycles (which might indicate a sequencing problem): + +.. image:: ./static/fastx_icons/fastq_nucleotides_distribution_3.png + +But most of the time, the chart will look rather random: + +.. image:: ./static/fastx_icons/fastq_nucleotides_distribution_4.png + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. __: http://hannonlab.cshl.edu/fastx_toolkit/ + + + + diff -r 000000000000 -r 78a7d28f2a15 fastx_nucleotides_distribution_line.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastx_nucleotides_distribution_line.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,38 @@ + + + fastx_nucleotide_distribution_line_graph.sh -i '$input' -o '$output' + + + + + + + + + + + +**What it does** + +Creates a line and points graph for the nucleotide distribution in the Solexa library. + +.. class:: infomark + +**TIP:** Use the **FASTQ Statistics** tool to generate the report file needed for this tool. + +----- + +**Output Examples** + +.. image:: ../static/fastx_icons/fastq_nucleotides_distribution_line_graph.png + +------ + +This tool was created by Oliver Tam, based on `FASTX-toolkit`__ by Assaf Gordon. + + .. __: http://hannonlab.cshl.edu/fastx_toolkit/ + + + + diff -r 000000000000 -r 78a7d28f2a15 fastx_quality_statistics.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastx_quality_statistics.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,112 @@ + + + +cat '$input' | +fastx_quality_stats +#if $input.ext == "fastqsanger": + -Q 33 +#elif $input.ext == "fastq": + -Q 64 +#end if + -o '$output' + + + + + + + + + + + + + + + + + + + +**What it does** + +Creates quality statistics report for the given Solexa/FASTQ library. + +.. class:: infomark + +**TIP:** This statistics report can be used as input for **Quality Score** and **Nucleotides Distribution** tools. 
+ +----- + +**The output file will contain the following fields:** + +* column = column number (1 to 36 for a 36-cycles read FASTQ file) +* count = number of bases found in this column. +* min = Lowest quality score value found in this column. +* max = Highest quality score value found in this column. +* sum = Sum of quality score values for this column. +* mean = Mean quality score value for this column. +* Q1 = 1st quartile quality score. +* med = Median quality score. +* Q3 = 3rd quartile quality score. +* IQR = Inter-Quartile range (Q3-Q1). +* lW = 'Left-Whisker' value (for boxplotting). +* rW = 'Right-Whisker' value (for boxplotting). +* A_Count = Count of 'A' nucleotides found in this column. +* C_Count = Count of 'C' nucleotides found in this column. +* G_Count = Count of 'G' nucleotides found in this column. +* T_Count = Count of 'T' nucleotides found in this column. +* N_Count = Count of 'N' nucleotides found in this column. + + + +**Output Example**:: + + column count min max sum mean Q1 med Q3 IQR lW rW A_Count C_Count G_Count T_Count N_Count + 1 6362991 -4 40 250734117 39.41 40 40 40 0 40 40 1396976 1329101 678730 2958184 0 + 2 6362991 -5 40 250531036 39.37 40 40 40 0 40 40 1786786 1055766 1738025 1782414 0 + 3 6362991 -5 40 248722469 39.09 40 40 40 0 40 40 2296384 984875 1443989 1637743 0 + 4 6362991 -5 40 247654797 38.92 40 40 40 0 40 40 1683197 1410855 1722633 1546306 0 + 5 6362991 -4 40 248214827 39.01 40 40 40 0 40 40 2536861 1167423 1248968 1409739 0 + 6 6362991 -5 40 248499903 39.05 40 40 40 0 40 40 1598956 1236081 1568608 1959346 0 + 7 6362991 -4 40 247719760 38.93 40 40 40 0 40 40 1692667 1822140 1496741 1351443 0 + 8 6362991 -5 40 245745205 38.62 40 40 40 0 40 40 2230936 1343260 1529928 1258867 0 + 9 6362991 -5 40 245766735 38.62 40 40 40 0 40 40 1702064 1306257 1336511 2018159 0 + 10 6362991 -5 40 245089706 38.52 40 40 40 0 40 40 1519917 1446370 1450995 1945709 0 + 11 6362991 -5 40 242641359 38.13 40 40 40 0 40 40 1717434 1282975 1387804 
1974778 0 + 12 6362991 -5 40 242026113 38.04 40 40 40 0 40 40 1662872 1202041 1519721 1978357 0 + 13 6362991 -5 40 238704245 37.51 40 40 40 0 40 40 1549965 1271411 1973291 1566681 1643 + 14 6362991 -5 40 235622401 37.03 40 40 40 0 40 40 2101301 1141451 1603990 1515774 475 + 15 6362991 -5 40 230766669 36.27 40 40 40 0 40 40 2344003 1058571 1440466 1519865 86 + 16 6362991 -5 40 224466237 35.28 38 40 40 2 35 40 2203515 1026017 1474060 1651582 7817 + 17 6362991 -5 40 219990002 34.57 34 40 40 6 25 40 1522515 1125455 2159183 1555765 73 + 18 6362991 -5 40 214104778 33.65 30 40 40 10 15 40 1479795 2068113 1558400 1249337 7346 + 19 6362991 -5 40 212934712 33.46 30 40 40 10 15 40 1432749 1231352 1769799 1920093 8998 + 20 6362991 -5 40 212787944 33.44 29 40 40 11 13 40 1311657 1411663 2126316 1513282 73 + 21 6362991 -5 40 211369187 33.22 28 40 40 12 10 40 1887985 1846300 1300326 1318380 10000 + 22 6362991 -5 40 213371720 33.53 30 40 40 10 15 40 542299 3446249 516615 1848190 9638 + 23 6362991 -5 40 221975899 34.89 36 40 40 4 30 40 347679 1233267 926621 3855355 69 + 24 6362991 -5 40 194378421 30.55 21 40 40 19 -5 40 433560 674358 3262764 1992242 67 + 25 6362991 -5 40 199773985 31.40 23 40 40 17 -2 40 944760 325595 1322800 3769641 195 + 26 6362991 -5 40 179404759 28.20 17 34 40 23 -5 40 3457922 156013 1494664 1254293 99 + 27 6362991 -5 40 163386668 25.68 13 28 40 27 -5 40 1392177 281250 3867895 821491 178 + 28 6362991 -5 40 156230534 24.55 12 25 40 28 -5 40 907189 981249 4174945 299437 171 + 29 6362991 -5 40 163236046 25.65 13 28 40 27 -5 40 1097171 3418678 1567013 280008 121 + 30 6362991 -5 40 151309826 23.78 12 23 40 28 -5 40 3514775 2036194 566277 245613 132 + 31 6362991 -5 40 141392520 22.22 10 21 40 30 -5 40 1569000 4571357 124732 97721 181 + 32 6362991 -5 40 143436943 22.54 10 21 40 30 -5 40 1453607 4519441 38176 351107 660 + 33 6362991 -5 40 114269843 17.96 6 14 30 24 -5 40 3311001 2161254 155505 734297 934 + 34 6362991 -5 40 140638447 22.10 10 20 40 30 -5 40 1501615 
1637357 18113 3205237 669 + 35 6362991 -5 40 138910532 21.83 10 20 40 30 -5 40 1532519 3495057 23229 1311834 352 + 36 6362991 -5 40 117158566 18.41 7 15 30 23 -5 40 4074444 1402980 63287 822035 245 + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. __: http://hannonlab.cshl.edu/fastx_toolkit/ + + + + diff -r 000000000000 -r 78a7d28f2a15 fastx_quality_statistics_ng.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastx_quality_statistics_ng.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,187 @@ + + (improved) + +cat '$input' | +fastx_quality_stats +#if $input.ext == "fastqsanger": + -Q 33 +#elif $input.ext == "fastq": + -Q 64 +#end if + -N -o '$output' + + + + + + + + + + + + + + + + + + + +**What it does** + +Creates a quality statistics report for the given Solexa/FASTQ library. + +.. class:: warningmark + +The output format differs from that of the old quality statistics tool. It can't be used with the quality-chart and nucleotide distribution tools (without further processing). + +----- + +**The output file will contain the following fields:** + +* cycle = cycle number (1 to 36 for a 36-cycle Solexa read file) +* max_count = maximum number of bases (in all cycles) + +For each nucleotide type of each cycle (ALL/A/C/G/T/N), the following columns are generated: + +* count = number of bases found in this column. +* min = Lowest quality score value found in this column. +* max = Highest quality score value found in this column. +* sum = Sum of quality score values for this column. +* mean = Mean quality score value for this column. +* Q1 = 1st quartile quality score. +* med = Median quality score. +* Q3 = 3rd quartile quality score. +* IQR = Inter-Quartile range (Q3-Q1). +* lW = 'Left-Whisker' value (for boxplotting). +* rW = 'Right-Whisker' value (for boxplotting). 
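The field list above implies a fixed header layout: two leading columns followed by eleven statistics for each of the six nucleotide groups. A minimal Python sketch of that layout follows. The helper names `report_columns` and `parse_row` are illustrative only and are not part of FASTX-toolkit or Galaxy, and the sketch assumes whitespace-separated numeric values.

```python
# Illustrative sketch only: rebuilds the column names described in the help
# text. Helper names are hypothetical, not FASTX-toolkit functions.
GROUPS = ["ALL", "A", "C", "G", "T", "N"]
STATS = ["count", "min", "max", "sum", "mean",
         "Q1", "med", "Q3", "IQR", "lW", "rW"]


def report_columns():
    """Return the header names: cycle, max_count, then GROUP_STAT pairs."""
    cols = ["cycle", "max_count"]
    for group in GROUPS:
        for stat in STATS:
            cols.append(f"{group}_{stat}")
    return cols


columns = report_columns()


def parse_row(line):
    """Map one whitespace-separated report row onto the column names.

    Assumes every value is numeric; raises ValueError on a wrong field count.
    """
    values = line.split()
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(values)}")
    return dict(zip(columns, (float(v) for v in values)))


print(len(columns))  # 2 + 6 * 11 = 68
```

Keying each row by column name makes it easier to pull out, for example, `ALL_mean` per cycle when post-processing the report for tools that expect the older format.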
+ + +(see column list at the bottom of this page) + +----- + +**Output Example**:: + + cycle max_count ALL_count ALL_min ALL_max ALL_sum ALL_mean ALL_Q1 ALL_med ALL_Q3 ALL_IQR ALL_lW ALL_rW A_count A_min A_max A_sum A_mean A_Q1 A_med A_Q3 A_IQR A_lW A_rW C_count C_min C_max C_sum C_mean C_Q1 C_med C_Q3 C_IQR C_lW C_rW G_count G_min G_max G_sum G_mean G_Q1 G_med G_Q3 G_IQR G_lW G_rW T_count T_min T_max T_sum T_mean T_Q1 T_med T_Q3 T_IQR T_lW T_rW N_count N_min N_max N_sum N_mean N_Q1 N_med N_Q3 N_IQR N_lW N_rW + 1 2827201 2827201 5 34 86622739 30.64 33 33 33 0 33 33 31337 5 34 841248 26.85 23 30 33 10 8 34 9269 5 34 154582 16.68 5 12 30 25 5 34 2095406 5 34 64401991 30.73 33 33 33 0 33 33 689133 5 34 21214602 30.78 33 33 33 0 33 33 2056 5 13 10316 5.02 5 5 5 0 5 5 + 2 2827201 2827201 5 34 81416729 28.80 27 33 33 6 18 34 1860337 5 34 56188709 30.20 33 33 33 0 33 33 21274 5 34 420221 19.75 11 21 30 19 5 34 862406 5 34 22835654 26.48 21 32 33 12 5 34 81979 5 34 1964575 23.96 17 26 33 16 5 34 1205 5 24 7570 6.28 5 5 5 0 5 5 + 3 2827201 2827201 5 34 89142476 31.53 33 33 34 1 32 34 18121 5 34 203489 11.23 5 5 15 10 5 30 45699 5 34 944362 20.66 5 26 33 28 5 34 79472 5 34 859251 10.81 5 5 12 7 5 22 2682082 5 34 87126165 32.48 33 33 34 1 32 34 1827 5 18 9209 5.04 5 5 5 0 5 5 + 4 2827201 2827201 5 34 90033575 31.85 33 34 34 1 32 34 172281 5 34 2905831 16.87 5 11 33 28 5 34 2597111 5 34 85653490 32.98 33 34 34 1 32 34 24461 5 34 643275 26.30 23 33 33 10 8 34 32749 5 34 827798 25.28 17 33 33 16 5 34 599 5 21 3181 5.31 5 5 5 0 5 5 + 5 2827201 2827201 5 34 89641650 31.71 33 33 34 1 32 34 26774 5 34 476388 17.79 5 13 33 28 5 34 58691 5 34 891506 15.19 5 5 32 27 5 34 54916 5 34 714335 13.01 5 5 24 19 5 34 2685062 5 34 87550414 32.61 33 33 34 1 32 34 1758 5 21 9007 5.12 5 5 5 0 5 5 + 6 2827201 2827201 5 34 84595812 29.92 29 33 33 4 23 34 1204450 5 34 36229599 30.08 29 33 33 4 23 34 463119 5 34 13924930 30.07 30 33 33 3 26 34 712076 5 34 21093763 29.62 28 33 33 5 21 34 447508 5 34 
13347178 29.83 29 33 33 4 23 34 48 5 21 342 7.12 5 5 7 2 5 10 + 7 2827201 2827201 5 34 81404399 28.79 26 33 33 7 16 34 912751 5 34 26241597 28.75 26 33 33 7 16 34 540022 5 34 15843612 29.34 28 33 33 5 21 34 701269 5 34 19699830 28.09 26 32 33 7 16 34 672893 5 34 19617405 29.15 27 33 33 6 18 34 266 5 24 1955 7.35 5 5 7 2 5 10 + 8 2827201 2827201 5 34 83714332 29.61 28 33 33 5 21 34 809852 5 34 23610246 29.15 27 33 33 6 18 34 563842 5 34 17062600 30.26 30 33 33 3 26 34 650887 5 34 18848062 28.96 27 33 33 6 18 34 802551 5 34 24192911 30.15 30 33 33 3 26 34 69 5 24 513 7.43 5 5 5 0 5 5 + 9 2827201 2827201 5 34 83974872 29.70 28 33 33 5 21 34 834129 5 34 24483965 29.35 27 33 33 6 18 34 567059 5 34 17270502 30.46 30 33 33 3 26 34 620453 5 34 17917829 28.88 26 33 33 7 16 34 805499 5 34 24302177 30.17 30 33 33 3 26 34 61 5 26 399 6.54 5 5 5 0 5 5 + 10 2827201 2827201 5 34 83278375 29.46 27 33 33 6 18 34 896783 5 34 26245652 29.27 27 33 33 6 18 34 551055 5 34 16628773 30.18 30 33 33 3 26 34 648328 5 34 18502443 28.54 26 33 33 7 16 34 730957 5 34 21900963 29.96 30 33 33 3 26 34 78 5 21 544 6.97 5 5 7 2 5 10 + 11 2827201 2827201 5 34 82511316 29.18 27 33 33 6 18 34 857880 5 34 24789241 28.90 27 33 33 6 18 34 642205 5 34 19342469 30.12 30 33 33 3 26 34 655942 5 34 18484775 28.18 26 31 33 7 16 34 671046 5 34 19893945 29.65 28 33 33 5 21 34 128 5 24 886 6.92 5 5 5 0 5 5 + 12 2827201 2827201 5 34 83171736 29.42 27 33 33 6 18 34 826807 5 34 24025084 29.06 27 33 33 6 18 34 586948 5 34 17775697 30.28 30 33 33 3 26 34 636734 5 34 17999923 28.27 26 31 33 7 16 34 776614 5 34 23370369 30.09 30 33 33 3 26 34 98 5 18 663 6.77 5 5 5 0 5 5 + 13 2827201 2827201 5 34 82829608 29.30 27 33 33 6 18 34 822607 5 34 23762719 28.89 27 33 33 6 18 34 644718 5 34 19458385 30.18 30 33 33 3 26 34 591437 5 34 16607859 28.08 26 31 33 7 16 34 768323 5 34 22999812 29.94 30 33 33 3 26 34 116 5 24 833 7.18 5 5 7 2 5 10 + 14 2827201 2827201 5 34 82345826 29.13 27 33 33 6 18 34 798164 5 34 22982673 28.79 26 33 
33 7 16 34 649845 5 34 19506688 30.02 30 33 33 3 26 34 608966 5 34 16892044 27.74 24 31 33 9 11 34 770051 5 34 22963200 29.82 29 33 33 4 23 34 175 5 24 1221 6.98 5 5 5 0 5 5 + 15 2827201 2827201 5 34 82462892 29.17 27 33 33 6 18 34 831167 5 34 23971929 28.84 26 33 33 7 16 34 613017 5 34 18416386 30.04 30 33 33 3 26 34 621149 5 34 17284205 27.83 24 31 33 9 11 34 761767 5 34 22789700 29.92 29 33 33 4 23 34 101 5 18 672 6.65 5 5 5 0 5 5 + 16 2827201 2827201 5 34 82526664 29.19 27 33 33 6 18 34 824933 5 34 23753705 28.79 26 33 33 7 16 34 610126 5 34 18388479 30.14 30 33 33 3 26 34 612088 5 34 16999148 27.77 24 31 33 9 11 34 779925 5 34 23384436 29.98 30 33 33 3 26 34 129 5 24 896 6.95 5 5 5 0 5 5 + 17 2827201 2827201 5 34 82610038 29.22 27 33 33 6 18 34 819008 5 34 23665033 28.89 27 33 33 6 18 34 618277 5 34 18651436 30.17 30 33 33 3 26 34 597414 5 34 16501609 27.62 24 30 33 9 11 34 792381 5 34 23791076 30.02 30 33 33 3 26 34 121 5 21 884 7.31 5 5 5 0 5 5 + 18 2827201 2827201 5 34 82402647 29.15 27 33 33 6 18 34 815170 5 34 23471377 28.79 26 33 33 7 16 34 615913 5 34 18527086 30.08 30 33 33 3 26 34 607020 5 34 16707257 27.52 24 30 33 9 11 34 788977 5 34 23695988 30.03 30 33 33 3 26 34 121 5 24 939 7.76 5 5 11 6 5 20 + 19 2827201 2827201 5 34 82124647 29.05 27 33 33 6 18 34 799663 5 34 22872641 28.60 26 32 33 7 16 34 628535 5 34 18876510 30.03 30 33 33 3 26 34 610246 5 34 16776560 27.49 24 30 33 9 11 34 788629 5 34 23598027 29.92 29 33 33 4 23 34 128 5 27 909 7.10 5 5 5 0 5 5 + 20 2827201 2827201 5 34 81985110 29.00 27 33 33 6 18 34 797587 5 34 22834667 28.63 26 32 33 7 16 34 636494 5 34 19081110 29.98 30 33 33 3 26 34 603916 5 34 16456404 27.25 24 30 33 9 11 34 789056 5 34 23611835 29.92 29 33 33 4 23 34 148 5 27 1094 7.39 5 5 7 2 5 10 + 21 2827201 2827201 5 34 81789492 28.93 27 33 33 6 18 34 794078 5 34 22654429 28.53 26 32 33 7 16 34 636334 5 34 19008271 29.87 29 33 33 4 23 34 614943 5 34 16761297 27.26 24 30 33 9 11 34 781661 5 34 23364202 29.89 29 33 33 4 23 34 185 
5 27 1293 6.99 5 5 5 0 5 5 + 22 2827201 2827201 5 34 81451811 28.81 27 33 33 6 18 34 789032 5 34 22366485 28.35 26 31 33 7 16 34 645777 5 34 19277917 29.85 29 33 33 4 23 34 608030 5 34 16404902 26.98 23 29 33 10 8 34 784198 5 34 23401407 29.84 29 33 33 4 23 34 164 5 24 1100 6.71 5 5 5 0 5 5 + 23 2827201 2827201 5 34 80945146 28.63 26 32 33 7 16 34 786207 5 34 22128593 28.15 26 31 33 7 16 34 647440 5 34 19187231 29.64 28 33 33 5 21 34 607663 5 34 16274550 26.78 22 29 33 11 6 34 785744 5 34 23353803 29.72 29 33 33 4 23 34 147 5 24 969 6.59 5 5 5 0 5 5 + 24 2827201 2827201 5 34 80501327 28.47 26 32 33 7 16 34 786929 5 34 22067207 28.04 26 31 33 7 16 34 645831 5 34 19042366 29.49 28 33 33 5 21 34 612772 5 34 16261175 26.54 22 29 33 11 6 34 781496 5 34 23129334 29.60 28 33 33 5 21 34 173 5 26 1245 7.20 5 5 5 0 5 5 + 25 2827201 2827201 5 34 79714527 28.20 26 31 33 7 16 34 782000 5 34 21701186 27.75 24 30 33 9 11 34 644171 5 34 18796511 29.18 27 33 33 6 18 34 617490 5 34 16226119 26.28 22 28 33 11 6 34 783396 5 34 22989588 29.35 27 33 33 6 18 34 144 5 26 1123 7.80 5 5 11 6 5 20 + 26 2827201 2827201 5 34 77523225 27.42 24 31 33 9 11 34 783881 5 34 21162231 27.00 24 30 33 9 11 34 645075 5 34 18368273 28.47 27 33 33 6 18 34 617885 5 34 15635967 25.31 21 27 33 12 5 34 779368 5 34 22349766 28.68 27 33 33 6 18 34 992 5 27 6988 7.04 5 5 5 0 5 5 + 27 2827201 2827201 5 34 76792679 27.16 24 31 33 9 11 34 788575 5 34 21113021 26.77 23 30 33 10 8 34 638456 5 34 18023093 28.23 26 32 33 7 16 34 624665 5 34 15600176 24.97 21 27 33 12 5 34 774483 5 34 22049478 28.47 27 32 33 6 18 34 1022 5 27 6911 6.76 5 5 5 0 5 5 + 28 2827201 2827201 5 34 76446203 27.04 24 30 33 9 11 34 783001 5 34 20828394 26.60 22 30 33 11 6 34 639424 5 34 17921638 28.03 26 32 33 7 16 34 621361 5 34 15437055 24.84 21 27 33 12 5 34 782313 5 34 22251729 28.44 27 32 33 6 18 34 1102 5 26 7387 6.70 5 5 5 0 5 5 + 29 2827201 2827201 5 34 75869397 26.84 24 30 33 9 11 34 777718 5 34 20485923 26.34 22 30 33 11 6 34 645283 5 34 
18004108 27.90 26 31 33 7 16 34 627295 5 34 15440771 24.61 21 27 33 12 5 34 775728 5 34 21930783 28.27 26 32 33 7 16 34 1177 5 27 7812 6.64 5 5 5 0 5 5 + 30 2827201 2827201 5 34 75137420 26.58 22 30 33 11 6 34 779313 5 34 20336426 26.10 22 29 33 11 6 34 646974 5 34 17887122 27.65 24 31 33 9 11 34 626980 5 34 15205903 24.25 19 26 33 14 5 34 772774 5 34 21699992 28.08 26 31 33 7 16 34 1160 5 27 7977 6.88 5 5 5 0 5 5 + 31 2827201 2827201 5 34 74256817 26.27 22 30 33 11 6 34 780211 5 34 20171360 25.85 21 29 33 12 5 34 645371 5 34 17606830 27.28 24 31 33 9 11 34 629456 5 34 14997599 23.83 18 26 33 15 5 34 771023 5 34 21473316 27.85 26 31 33 7 16 34 1140 5 27 7712 6.76 5 5 5 0 5 5 + 32 2827201 2827201 5 34 73624704 26.04 22 29 33 11 6 34 776741 5 34 19802248 25.49 21 28 33 12 5 34 642994 5 34 17408712 27.07 24 30 33 9 11 34 631699 5 34 14925494 23.63 18 26 32 14 5 34 774316 5 34 21478972 27.74 26 31 33 7 16 34 1451 5 27 9278 6.39 5 5 5 0 5 5 + 33 2827201 2827201 5 34 72833249 25.76 21 29 33 12 5 34 775426 5 34 19509710 25.16 21 27 33 12 5 34 644177 5 34 17265182 26.80 24 30 33 9 11 34 627490 5 34 14612407 23.29 18 26 31 13 5 34 778476 5 34 21435400 27.54 24 31 33 9 11 34 1632 5 27 10550 6.46 5 5 5 0 5 5 + 34 2827201 2827201 5 34 71937995 25.44 21 28 33 12 5 34 772803 5 34 19226676 24.88 21 27 33 12 5 34 647127 5 34 17098061 26.42 22 30 33 11 6 34 628686 5 34 14382900 22.88 17 24 31 14 5 34 777289 5 34 21221307 27.30 24 30 33 9 11 34 1296 5 27 9051 6.98 5 5 5 0 5 5 + 35 2827201 2827201 5 34 70604895 24.97 21 27 33 12 5 34 769554 5 34 18722160 24.33 19 27 32 13 5 34 643915 5 34 16662802 25.88 21 28 33 12 5 34 627642 5 34 14115224 22.49 17 24 30 13 5 34 784712 5 34 21095775 26.88 24 30 33 9 11 34 1378 5 27 8934 6.48 5 5 5 0 5 5 + 36 2827201 2827201 5 34 71705284 25.36 21 28 33 12 5 34 775278 5 34 18770248 24.21 18 27 33 15 5 34 634906 5 34 16703972 26.31 22 30 33 11 6 34 630819 5 34 14421307 22.86 17 24 31 14 5 34 784826 5 34 21800547 27.78 26 32 33 7 16 34 1372 5 27 9210 
6.71 5 5 5 0 5 5 + +----- + +All columns:: + + cycle + max_count + ALL_count + ALL_min + ALL_max + ALL_sum + ALL_mean + ALL_Q1 + ALL_med + ALL_Q3 + ALL_IQR + ALL_lW + ALL_rW + A_count + A_min + A_max + A_sum + A_mean + A_Q1 + A_med + A_Q3 + A_IQR + A_lW + A_rW + C_count + C_min + C_max + C_sum + C_mean + C_Q1 + C_med + C_Q3 + C_IQR + C_lW + C_rW + G_count + G_min + G_max + G_sum + G_mean + G_Q1 + G_med + G_Q3 + G_IQR + G_lW + G_rW + T_count + T_min + T_max + T_sum + T_mean + T_Q1 + T_med + T_Q3 + T_IQR + T_lW + T_rW + N_count + N_min + N_max + N_sum + N_mean + N_Q1 + N_med + N_Q3 + N_IQR + N_lW + N_rW + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. __: http://hannonlab.cshl.edu/fastx_toolkit/ + + + + diff -r 000000000000 -r 78a7d28f2a15 fastx_renamer.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastx_renamer.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,75 @@ + + + +cat '$input' | +fastx_renamer +#if $input.ext == "fastqsanger": + -Q 33 +#elif $input.ext == "fastq": + -Q 64 +#end if + -n $TYPE -o '$output' -v + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +This tool renames the sequence identifiers in a FASTQ/A file. + +.. class:: infomark + +Use this tool at the beginning of your workflow, as a way to keep the original sequence (before trimming, clipping, barcode-removal, etc). 
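The two renaming styles this tool offers (sequence as name, or a running counter) can be sketched in a few lines of Python. This is an illustration of the idea, not the FASTX-toolkit implementation. The function name `rename_records` and the mode labels `"SEQ"` and `"COUNT"` are assumptions of this sketch.

```python
# Illustrative sketch only: not the fastx_renamer source code.
def rename_records(records, mode):
    """Rename (identifier, sequence) pairs.

    mode "SEQ"   -> the nucleotide sequence becomes the identifier
    mode "COUNT" -> a 1-based running counter becomes the identifier
    (both labels are this sketch's own convention)
    """
    renamed = []
    for index, (_old_id, seq) in enumerate(records, start=1):
        if mode == "SEQ":
            new_id = seq
        elif mode == "COUNT":
            new_id = str(index)
        else:
            raise ValueError(f"unknown rename mode: {mode!r}")
        renamed.append((new_id, seq))
    return renamed
```

In a FASTQ file the same new identifier would also be written on the `+` line of each record, and the quality values are left untouched.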
+ +-------- + +**Example** + +The following Solexa-FASTQ file:: + + @CSHL_4_FC042GAMMII_2_1_517_596 + GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT + +CSHL_4_FC042GAMMII_2_1_517_596 + 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 40 40 36 40 13 14 24 24 9 24 9 40 10 10 15 40 + +Renamed to **nucleotides sequence**:: + + @GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT + GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT + +GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT + 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 40 40 36 40 13 14 24 24 9 24 9 40 10 10 15 40 + +Renamed to **numeric counter**:: + + @1 + GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT + +1 + 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 40 40 36 40 13 14 24 24 9 24 9 40 10 10 15 40 + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. __: http://hannonlab.cshl.edu/fastx_toolkit/ + + + diff -r 000000000000 -r 78a7d28f2a15 fastx_reverse_complement.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastx_reverse_complement.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,68 @@ + + + +cat '$input' | +fastx_reverse_complement +#if $input.ext == "fastqsanger": + -Q 33 +#elif $input.ext == "fastq": + -Q 64 +#end if + -v -o '$output' + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +This tool reverse-complements each sequence in a library. +If the library is a FASTQ, the quality-scores are also reversed. + +-------- + +**Example** + +Input FASTQ file:: + + @CSHL_1_FC42AGWWWXX:8:1:3:740 + TGTCTGTAGCCTCNTCCTTGTAATTCAAAGNNGGTA + +CSHL_1_FC42AGWWWXX:8:1:3:740 + 33 33 33 34 33 33 33 33 33 33 33 33 27 5 27 33 33 33 33 33 33 27 21 27 33 32 31 29 26 24 5 5 15 17 27 26 + + +Output FASTQ file:: + + @CSHL_1_FC42AGWWWXX:8:1:3:740 + TACCNNCTTTGAATTACAAGGANGAGGCTACAGACA + +CSHL_1_FC42AGWWWXX:8:1:3:740 + 26 27 17 15 5 5 24 26 29 31 32 33 27 21 27 33 33 33 33 33 33 27 5 27 33 33 33 33 33 33 33 33 34 33 33 33 + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. 
__: http://hannonlab.cshl.edu/fastx_toolkit/ + + + + diff -r 000000000000 -r 78a7d28f2a15 fastx_trimmer.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastx_trimmer.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,85 @@ + + to fixed length + +cat '$input' | +fastx_trimmer +#if $input.ext == "fastqsanger": + -Q 33 +#elif $input.ext == "fastq": + -Q 64 +#end if + -v -f $first -l $last -o '$output' + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What it does** + +This tool trims (cuts nucleotides from) sequences in a FASTA/Q file. + +-------- + +**Example** + +Input FASTA file (with 36 bases in each sequence):: + + >1-1 + TATGGTCAGAAACCATATGCAGAGCCTGTAGGCACC + >2-1 + CAGCGAGGCTTTAATGCCATTTGGCTGTAGGCACCA + + +Trimming with First=1 and Last=21, we get a FASTA file with 21 bases in each sequence (starting from the first base):: + + >1-1 + TATGGTCAGAAACCATATGCA + >2-1 + CAGCGAGGCTTTAATGCCATT + +Trimming with First=6 and Last=10 will generate a FASTA file with 5 bases (bases 6,7,8,9,10) in each sequence:: + + >1-1 + TCAGA + >2-1 + AGGCT + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. __: http://hannonlab.cshl.edu/fastx_toolkit/ + + + + diff -r 000000000000 -r 78a7d28f2a15 fastx_trimmer_from_end.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastx_trimmer_from_end.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,81 @@ + + of sequences + +cat '$input' | +fastx_trimmer +#if $input.ext == "fastqsanger": + -Q 33 +#elif $input.ext == "fastq": + -Q 64 +#end if + -v -t $trimnum -m $minlen -o '$output' + + + + + + + This will trim from the end of the sequences + + + + + Sequences shorter than this length will be discarded + + + + + + + + + + + + + + + + +**What it does** + +This tool trims (cuts nucleotides from) sequences in a FASTQ/FASTA file from the 3' end. + +.. class:: infomark + +When trimming a FASTQ file, the quality scores will be trimmed appropriately (to the same length as the corresponding sequence). 
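As a rough illustration of the behaviour described here, the following Python sketch trims a fixed number of bases from the 3' end, trims any quality values to the same length, and drops reads that end up too short. It is not the `fastx_trimmer` implementation. The function name and record layout are hypothetical, and the sketch assumes the minimum length is checked after trimming, which is what the worked example in this help text implies.

```python
# Illustrative sketch only: not the fastx_trimmer source code.
def trim_from_end(records, trim_num, min_len):
    """Trim trim_num bases from the 3' end of each record.

    records: iterable of (name, sequence, qualities) where qualities is a
    list of scores or None for FASTA input. Records whose trimmed length is
    below min_len are discarded (assumed to be checked after trimming).
    """
    kept = []
    for name, seq, qual in records:
        new_len = max(len(seq) - trim_num, 0)
        if new_len < min_len:
            continue  # too short after trimming: discard
        # Keep qualities aligned with the shortened sequence.
        trimmed_qual = qual[:new_len] if qual is not None else None
        kept.append((name, seq[:new_len], trimmed_qual))
    return kept
```

With `trim_num=5` and `min_len=10`, a 21-base read is kept as a 16-base read; with `trim_num=10` and `min_len=15`, the same read would shrink to 11 bases and be discarded.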
+ +-------- + +**Example** + +Input FASTA file:: + + >1-1 + TATGGTCAGAAACCATATGCAGAGCCTGTAGGCACC + >2-1 + CAGCGAGGCTTTAATGCCATT + + +Trimming 5 nucleotides from the end, and discarding sequences shorter than 10, we get the following FASTA file:: + + >1-1 + TATGGTCAGAAACCATATGCAGAGCCTGTAG + >2-1 + CAGCGAGGCTTTAATG + +Trimming 10 nucleotides from the end, and discarding sequences shorter than 15, we get the following FASTA file:: + + >1-1 + TATGGTCAGAAACCATATGCAGAGCC + +------ + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. __: http://hannonlab.cshl.edu/fastx_toolkit/ + + + + diff -r 000000000000 -r 78a7d28f2a15 fastx_uncollapser.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/fastx_uncollapser.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,64 @@ + + sequences + +cat '$input' | +fastx_uncollapser -v -o '$output' + + + + + + + + + + + + + + + + + +**What it does** + +This tool uncollapses a previously-collapsed FASTA file. It reads each collapsed sequence and generates multiple sequences based on the collapsed read count. + +-------- + +**Example** + +Example Input - a collapsed FASTA file (Sequence "ATAT" has four collapsed reads):: + + >1-1 + TGCG + >2-4 + ATAT + +Example Output - uncollapsed FASTA file (Sequence "ATAT" now appears as 4 separate sequences):: + + >1 + TGCG + >2 + ATAT + >3 + ATAT + >4 + ATAT + >5 + ATAT + +.. class:: infomark + +The original sequence IDs (with their read counts) are discarded, and each sequence is given a numeric name. + +----- + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. 
__: http://hannonlab.cshl.edu/fastx_toolkit/ + + + + diff -r 000000000000 -r 78a7d28f2a15 seqid_uncollapser.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/seqid_uncollapser.xml Wed Jul 10 06:13:48 2013 -0400 @@ -0,0 +1,75 @@ + + containing collapsed sequence IDs + +cat '$input' | +fastx_uncollapser -c $idcol -v -o '$output' + + + + + This column contains the sequence id from a collapsed FASTA file in the form of "(seq number)-(read count)" (e.g. 15-4). Use 10 if you're analyzing BLAT output + + + + + + + + + + + + + + + +**What it does** + +This tool reads each row (in a table) containing a collapsed sequence ID, and duplicates the row according to its read count. + +.. class:: warningmark + +You must specify the column containing the collapsed sequence ID (e.g. 15-4). + +-------- + +**Example Input File** + +The following input file contains two collapsed sequence identifiers at column 10: *84-2* and *87-5* + +(meaning the first has a multiplicity count of 2 and the second has a multiplicity count of 5):: + + + 23 0 0 0 0 0 0 0 + 84-2 ... + 22 0 0 0 0 0 0 0 + 87-5 ... + + +**Output Example** + +After **uncollapsing** (on column 10), the line of the first sequence-identifier is repeated *twice*, and the line of the second sequence-identifier is repeated *five* times:: + + 23 0 0 0 0 0 0 0 + 84-2 ... + 23 0 0 0 0 0 0 0 + 84-2 ... + 22 0 0 0 0 0 0 0 + 87-5 ... + 22 0 0 0 0 0 0 0 + 87-5 ... + 22 0 0 0 0 0 0 0 + 87-5 ... + 22 0 0 0 0 0 0 0 + 87-5 ... + 22 0 0 0 0 0 0 0 + 87-5 ... + + +Uncollapsing a text file allows the analysis of collapsed FASTA files to be used with any tool that doesn't 'understand' collapsed multiplicity counts. + +.. class:: infomark + +See the *Collapse* tool in the *FASTA Manipulation* category for more details about collapsing FASTA files. + +----- + +This tool is based on `FASTX-toolkit`__ by Assaf Gordon. + + .. 
__: http://hannonlab.cshl.edu/fastx_toolkit/ + + + +
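The row duplication described for the collapsed-ID column can be sketched in Python as follows. This is only an illustration of the idea, not the `fastx_uncollapser` implementation. The function name `uncollapse_rows` is hypothetical, and the sketch assumes tab-separated input and a 1-based column number, as in the column-10 example.

```python
# Illustrative sketch only: not the fastx_uncollapser source code.
def uncollapse_rows(lines, id_col):
    """Repeat each table row according to the read count in its collapsed ID.

    lines:  rows of a tab-separated table (assumed delimiter)
    id_col: 1-based column holding an ID of the form
            "(seq number)-(read count)", e.g. "15-4"
    """
    output = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        seq_id = fields[id_col - 1]
        # Split on the last hyphen so only the trailing read count is parsed.
        _seq_number, read_count = seq_id.rsplit("-", 1)
        output.extend([line] * int(read_count))
    return output
```

A row whose ID is `84-2` is emitted twice and a row whose ID is `87-5` five times, matching the output example for this tool.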