comparison fastx_collapser.xml @ 0:78a7d28f2a15 draft
Uploaded
author | idot |
---|---|
date | Wed, 10 Jul 2013 06:13:48 -0400 |
parents | |
children |
-1:000000000000 | 0:78a7d28f2a15 |
---|---|
<tool id="cshl_fastx_collapser" name="Collapse">
  <description>sequences</description>
  <command>
    cat '$input' |
    fastx_collapser
    #if $input.ext == "fastqsanger":
    -Q 33
    #elif $input.ext == "fastq":
    -Q 64
    #end if
    -v -o '$output'
  </command>

  <inputs>
    <param format="fastq,fastqsanger,fasta" name="input" type="data" label="Library to collapse" />
  </inputs>

  <tests>
    <test>
      <param name="input" value="fasta_collapser1.fasta" />
      <output name="output" file="fasta_collapser1.out" />
    </test>
  </tests>

  <outputs>
    <data format="fasta" name="output" metadata_source="input" />
  </outputs>
  <help>

**What it does**

This tool collapses identical sequences in a FASTQ or FASTA file into a single sequence.

--------

**Example**

Example Input File (Sequence "ATAT" appears multiple times)::

    >CSHL_2_FC0042AGLLOO_1_1_605_414
    TGCG
    >CSHL_2_FC0042AGLLOO_1_1_537_759
    ATAT
    >CSHL_2_FC0042AGLLOO_1_1_774_520
    TGGC
    >CSHL_2_FC0042AGLLOO_1_1_742_502
    ATAT
    >CSHL_2_FC0042AGLLOO_1_1_781_514
    TGAG
    >CSHL_2_FC0042AGLLOO_1_1_757_487
    TTCA
    >CSHL_2_FC0042AGLLOO_1_1_903_769
    ATAT
    >CSHL_2_FC0042AGLLOO_1_1_724_499
    ATAT

Example Output file::

    >1-1
    TGCG
    >2-4
    ATAT
    >3-1
    TGGC
    >4-1
    TGAG
    >5-1
    TTCA

.. class:: infomark

Original Sequence Names / Lane descriptions (e.g. "CSHL_2_FC0042AGLLOO_1_1_742_502") are discarded.

The output sequence name is composed of two numbers: the first is the sequence's number, the second is the multiplicity value.

The following output::

    >2-4
    ATAT

means that the sequence "ATAT" is the second sequence in the file, and it appeared 4 times in the input FASTA file.

------

This tool is based on `FASTX-toolkit`__ by Assaf Gordon.

 .. __: http://hannonlab.cshl.edu/fastx_toolkit/

  </help>
</tool>
<!-- FASTX-Collapser is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) -->
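
The collapsing behaviour described in the help text can be illustrated with a short Python sketch. It is not the fastx_collapser implementation. The function name `collapse_fasta` is hypothetical, the sketch assumes each sequence sits on a single line, and it numbers sequences in order of first appearance only because that matches the example in the help text; the real tool may order its output differently.

```python
def collapse_fasta(lines):
    """Collapse identical sequences and name each one 'number-multiplicity'.

    Assumptions (not taken from fastx_collapser itself): one sequence per
    record line, and numbering by order of first appearance.
    """
    counts = {}  # dicts preserve insertion order in Python 3.7+
    for line in lines:
        line = line.strip()
        # Header lines are skipped: the original sequence names are discarded.
        if not line or line.startswith(">"):
            continue
        counts[line] = counts.get(line, 0) + 1

    collapsed = []
    for number, (sequence, multiplicity) in enumerate(counts.items(), start=1):
        collapsed.append(f">{number}-{multiplicity}")
        collapsed.append(sequence)
    return collapsed


if __name__ == "__main__":
    # Same sequences as the help text example, with shortened header names.
    example = [
        ">r1", "TGCG", ">r2", "ATAT", ">r3", "TGGC", ">r4", "ATAT",
        ">r5", "TGAG", ">r6", "TTCA", ">r7", "ATAT", ">r8", "ATAT",
    ]
    print("\n".join(collapse_fasta(example)))
```

Run on the eight-record example, the sketch reports "ATAT" as `>2-4`: the second distinct sequence, seen four times.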