comparison fastq_name_affixer.xml @ 22:58807b35777a draft
planemo upload commit 20bdf879b52796d3fb251a20807191ff02084d3c-dirty
author | petr-novak |
---|---|
date | Wed, 02 Aug 2023 11:31:12 +0000 |
parents | c2c69c6090f0 |
children | 36c418bca8b2 |
21:f4ed6a65a2ff | 22:58807b35777a |
---|---|
1 <tool id="names_affixer" name="FASTQ Read name affixer" version="1.0.0"> | 1 <tool id="names_affixer" name="FASTQ Read name affixer" version="1.0.0"> |
2 <description> Tool appending suffix and prefix to sequences names </description> | 2 <description>Tool appending suffix and prefix to sequences names</description> |
3 <command interpreter="python"> | 3 <required_files> |
4 ${__tool_directory__}/name_affixer.py -f $input -p "$prefix" -s "$suffix" -n $nspace > $output | 4 <include type="literal" path="name_affixer.py"/> |
5 </command> | 5 </required_files> |
6 <command> | |
7 ${__tool_directory__}/name_affixer.py -f $input -p "$prefix" -s "$suffix" -n | |
8 $nspace > $output | |
9 </command> | |
6 | 10 |
7 <inputs> | 11 <inputs> |
8 <param format="fastq" type="data" name="input" label="Choose your FASTQ file" /> | 12 <param format="fastq" type="data" name="input" label="Choose your FASTQ file"/> |
9 <param name="prefix" type="text" size="10" value="" label="Prefix" help="Enter prefix which will be added to all sequences names" /> | 13 <param name="prefix" type="text" size="10" value="" label="Prefix" |
10 <param name="suffix" type="text" size="10" value="" label="Suffix" help="Enter suffix which will be added to all sequences names"/> | 14 help="Enter prefix which will be added to all sequences names"/> |
11 <param name="nspace" type="integer" size="10" value="0" min="0" max="1000" label="Number of spaces in sequence name to ignore" help="Sequence name is a string before the first space. If you want name to include spaces in name, enter positive integer. All other characters beyond ignored spaces are omitted"/> | 15 <param name="suffix" type="text" size="10" value="" label="Suffix" |
12 </inputs> | 16 help="Enter suffix which will be added to all sequences names"/> |
17 <param name="nspace" type="integer" size="10" value="0" min="0" max="1000" | |
18 label="Number of spaces in sequence name to ignore" | |
19 help="Sequence name is a string before the first space. If you want name to include spaces in name, enter positive integer. All other characters beyond ignored spaces are omitted"/> | |
20 </inputs> | |
13 | 21 |
14 | 22 |
15 <outputs> | 23 <outputs> |
16 <data format="fastq" name="output" label="FASTQ dataset ${input.hid} with modified sequence names" /> | 24 <data format="fastq" name="output" |
17 </outputs> | 25 label="FASTQ dataset ${input.hid} with modified sequence names"/> |
26 </outputs> | |
18 | 27 |
19 <help> | 28 <help> |
20 **What is does** | 29 **What is does** |
21 | |
22 Tool for appending prefix and suffix to sequences names in fastq formated sequences. | |
23 | 30 |
24 **Example** | 31 Tool for appending prefix and suffix to sequences names in fastq formated |
32 sequences. | |
25 | 33 |
26 The following Solexa-FASTQ file: | 34 **Example** |
27 | |
28 :: | |
29 | |
30 @CSHL_4_FC042GAMMII_2_1_517_596 | |
31 GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT | |
32 +CSHL_4_FC042GAMMII_2_1_517_596 | |
33 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 40 40 36 40 13 14 24 24 9 24 9 40 10 10 15 40 | |
34 | |
35 is renamed to: | |
36 | 35 |
37 :: | 36 The following Solexa-FASTQ file: |
38 | |
39 @prefixCSHL_4_FC042GAMMII_2_1_517_596suffix | |
40 GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT | |
41 +prefixCSHL_4_FC042GAMMII_2_1_517_596suffix | |
42 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 40 40 36 40 13 14 24 24 9 24 9 40 10 10 15 40 | |
43 | 37 |
44 different format: | 38 :: |
45 | |
46 | 39 |
47 :: | 40 @CSHL_4_FC042GAMMII_2_1_517_596 |
48 | 41 GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT |
49 @HISEQ1:92:c0190acxx:8:1101:1252:2230 2:N:0:CGATGT | 42 +CSHL_4_FC042GAMMII_2_1_517_596 |
50 AGAGGAAAAAACATAGTTCTTGTCTAAAAAAATCCCTTGAAAAAGGGCAGATGTATAGAAATAGAAAATTTCAAAGAAAAACTCTCTACAAATGGAAGAGA | 43 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 40 40 36 40 13 14 24 |
51 + | 44 24 9 24 9 40 10 10 15 40 |
52 CCCFFFFFHHHHHJJJJIJJJJJJJJJJJJJJJIJJJJJIIJJJJJJGIJIJIHHHHHHHHFFFFFFDEEEEEDCDDDDDDDCCDDDEDDDDD>CCCCB@9 | |
53 | 45 |
54 is renamed to: | 46 is renamed to: |
55 | 47 |
56 :: | 48 :: |
57 | |
58 @prefixHISEQ1:92:c0190acxx:8:1101:1252:2230suffix | |
59 AGAGGAAAAAACATAGTTCTTGTCTAAAAAAATCCCTTGAAAAAGGGCAGATGTATAGAAATAGAAAATTTCAAAGAAAAACTCTCTACAAATGGAAGAGA | |
60 + | |
61 CCCFFFFFHHHHHJJJJIJJJJJJJJJJJJJJJIJJJJJIIJJJJJJGIJIJIHHHHHHHHFFFFFFDEEEEEDCDDDDDDDCCDDDEDDDDD>CCCCB@9 | |
62 | |
63 note that string after first space is omitted! | |
64 | 49 |
65 Because sequence names sometimes containg spaces which delimit the actual name. By default, anything after spaces is | 50 @prefixCSHL_4_FC042GAMMII_2_1_517_596suffix |
66 excluded from sequences name. In example sequence: | 51 GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT |
67 | 52 +prefixCSHL_4_FC042GAMMII_2_1_517_596suffix |
68 :: | 53 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 40 40 36 40 13 14 24 |
69 | 54 24 9 24 9 40 10 10 15 40 |
70 @SRR352150.23846180 HWUSI-EAS1786:7:119:15910:19280/1 | |
71 CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC | |
72 + | |
73 IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIIFIIIIIIHDHBBIHFIHIIBHHDDHIFHIHIIIHIHGGDFDEI@EGEGFGFEFB@ECG | |
74 | 55 |
75 when **Number of spaces in name to ignore** is set to 0 (default) the output will be: | 56 different format: |
76 | 57 |
77 :: | 58 |
78 | 59 :: |
79 @prefixSRR352150.23846180suffix | 60 |
80 CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC | 61 @HISEQ1:92:c0190acxx:8:1101:1252:2230 2:N:0:CGATGT |
81 + | 62 AGAGGAAAAAACATAGTTCTTGTCTAAAAAAATCCCTTGAAAAAGGGCAGATGTATAGAAATAGAAAATTTCAAAGAAAAACTCTCTACAAATGGAAGAGA |
82 IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIIFIIIIIIHDHBBIHFIHIIBHHDDHIFHIHIIIHIHGGDFDEI@EGEGFGFEFB@ECG | 63 + |
83 | 64 CCCFFFFFHHHHHJJJJIJJJJJJJJJJJJJJJIJJJJJIIJJJJJJGIJIJIHHHHHHHHFFFFFFDEEEEEDCDDDDDDDCCDDDEDDDDD>CCCCB@9 |
84 If you want to keep spaces the setting **Number of spaces in name to ignore** to 1 will yield | 65 |
85 | 66 is renamed to: |
86 :: | 67 |
87 | 68 :: |
88 @prefixSRR352150.23846180 HWUSI-EAS1786:7:119:15910:19280/1suffix | 69 |
89 CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC | 70 @prefixHISEQ1:92:c0190acxx:8:1101:1252:2230suffix |
90 + | 71 AGAGGAAAAAACATAGTTCTTGTCTAAAAAAATCCCTTGAAAAAGGGCAGATGTATAGAAATAGAAAATTTCAAAGAAAAACTCTCTACAAATGGAAGAGA |
91 IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIIFIIIIIIHDHBBIHFIHIIBHHDDHIFHIHIIIHIHGGDFDEI@EGEGFGFEFB@ECG | 72 + |
92 | 73 CCCFFFFFHHHHHJJJJIJJJJJJJJJJJJJJJIJJJJJIIJJJJJJGIJIJIHHHHHHHHFFFFFFDEEEEEDCDDDDDDDCCDDDEDDDDD>CCCCB@9 |
93 | 74 |
94 </help> | 75 note that string after first space is omitted! |
76 | |
77 Because sequence names sometimes containg spaces which delimit the actual name. By | |
78 default, anything after spaces is | |
79 excluded from sequences name. In example sequence: | |
80 | |
81 :: | |
82 | |
83 @SRR352150.23846180 HWUSI-EAS1786:7:119:15910:19280/1 | |
84 CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC | |
85 + | |
86 IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIIFIIIIIIHDHBBIHFIHIIBHHDDHIFHIHIIIHIHGGDFDEI@EGEGFGFEFB@ECG | |
87 | |
88 when **Number of spaces in name to ignore** is set to 0 (default) the output will | |
89 be: | |
90 | |
91 :: | |
92 | |
93 @prefixSRR352150.23846180suffix | |
94 CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC | |
95 + | |
96 IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIIFIIIIIIHDHBBIHFIHIIBHHDDHIFHIHIIIHIHGGDFDEI@EGEGFGFEFB@ECG | |
97 | |
98 If you want to keep spaces the setting **Number of spaces in name to ignore** to 1 | |
99 will yield | |
100 | |
101 :: | |
102 | |
103 @prefixSRR352150.23846180 HWUSI-EAS1786:7:119:15910:19280/1suffix | |
104 CTGGATTCTATACCTTTGGCAACTACTTCTTGGTTGATCAGGAAATTAACACTAGTAGTTTAGGCAATTTGGAATGGTGCCAAAGATGTATAGAACTTTC | |
105 + | |
106 IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIIFIIIIIIHDHBBIHFIHIIBHHDDHIFHIHIIIHIHGGDFDEI@EGEGFGFEFB@ECG | |
107 | |
108 | |
109 </help> | |
95 </tool> | 110 </tool> |
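
The `<help>` text in both revisions describes the renaming rule only through examples. A minimal Python sketch of that rule follows. It is not the contents of `name_affixer.py`, which this diff does not show; the function names `affix_name` and `affix_fastq` are hypothetical, and the sketch assumes strict four-line FASTQ records.

```python
def affix_name(header, prefix, suffix, nspace=0):
    """Return a FASTQ header ('@...' or '+...') with prefix and suffix added.

    Hypothetical helper: it mirrors the behaviour described in the tool's
    help text, not the actual name_affixer.py implementation.
    """
    marker, rest = header[0], header[1:]
    if not rest:
        return marker  # a bare '+' separator line is left unchanged
    # Keep the name up to and including `nspace` spaces; drop the remainder.
    name = " ".join(rest.split(" ")[: nspace + 1])
    return f"{marker}{prefix}{name}{suffix}"


def affix_fastq(lines, prefix, suffix, nspace=0):
    """Yield FASTQ lines with renamed headers (assumes 4-line records)."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Lines 0 and 2 of each record are the '@' header and the '+' line.
        if i % 4 in (0, 2) and line:
            yield affix_name(line, prefix, suffix, nspace)
        else:
            yield line


record = [
    "@SRR352150.23846180 HWUSI-EAS1786:7:119:15910:19280/1",
    "CTGGATTCTATACC",
    "+",
    "IIIIIIIIIIIIII",
]
for out_line in affix_fastq(record, "prefix", "suffix", nspace=1):
    print(out_line)
```

With `nspace=0` the header is cut at the first space, matching the default case in the help text; with `nspace=1` one space and the text after it are kept before the suffix is appended.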