<tool id="make_nr" name="Make FASTA non-redundant" version="0.0.1">
    <description>by combining duplicated sequences</description>
    <requirements>
        <requirement type="package" version="1.67">biopython</requirement>
    </requirements>
    <version_command>
python $__tool_directory__/make_nr.py --version
    </version_command>
    <command detect_errors="aggressive">
python $__tool_directory__/make_nr.py $alphasort -s '$separator' -o '$output'
#for $f in $input
'$f'
#end for
    </command>
    <inputs>
        <param name="input" type="data" format="fasta,fasta.gz" multiple="True"
               label="Input FASTA sequence file(s)"/>
        <param name="separator" type="text" size="10" area="False" value=";"
               label="Separator string to use when combining the identifiers of duplicate sequences"
               help="A single character is recommended, e.g. a semi-colon or comma">
            <sanitizer>
                <valid initial="default">
                    <add value=";"/>
                    <add value="|"/>
                </valid>
            </sanitizer>
        </param>
        <param name="alphasort" type="select" label="Treatment of identifiers when combining duplicates with the separator">
            <option value="">Use the order they appear in the input file(s)</option>
            <option value="-a">Sort alphabetically before combining them</option>
        </param>
    </inputs>
    <outputs>
        <data name="output" format="fasta" label="$on_string (NR)" />
    </outputs>
    <tests>
        <test>
            <param name="input" value="duplicates.fasta" ftype="fasta"/>
            <output name="output" file="duplicates.nr.fasta" ftype="fasta"/>
        </test>
        <test>
            <param name="input" value="duplicates.fasta.gz" ftype="fasta.gz"/>
            <output name="output" file="duplicates.nr.fasta" ftype="fasta"/>
        </test>
        <test>
            <param name="input" value="more_duplicates.fasta,duplicates.fasta" ftype="fasta"/>
            <output name="output" file="deduplicate.nosortids.fasta" ftype="fasta"/>
        </test>
        <test>
            <param name="input" value="more_duplicates.fasta,duplicates.fasta" ftype="fasta"/>
            <param name="alphasort" value="-a"/>
            <output name="output" file="deduplicate.sortids.fasta" ftype="fasta"/>
        </test>
    </tests>
    <help>
**What it does**

Takes one or more input FASTA files, checks them to find any duplicate sequences
(ignoring case), and writes an output FASTA file where each set of duplicates
appears once with a combined identifier.

For example, consider this input, processed with the default separator (a semi-colon)::

    >1 first entry
    act
    >2 The A-Team
    AAaa
    >3 not unique...
    ACgt
    >4
    CCCC
    >5 a duplicate
    acgt
    >6 last!
    GGGG

In this simple example ``ACGT`` appears twice (ignoring case) as entries ``3``
and ``5``. Entry ``3`` is renamed as ``3;5`` and entry ``5`` is omitted::

    >1 first entry
    act
    >2 The A-Team
    AAaa
    >3;5 representing 2 records
    ACgt
    >4
    CCCC
    >6 last!
    GGGG

This means that the representative records take the position and sequence case
from the first entry with that sequence.

In this case the combined entry is labelled as ``3;5``, so the sort option
has no effect because the file order is already alphabetical. However, if the
records appeared in the file with ``5`` before ``3`` you could choose to get
``5;3`` (order from file, default) or ``3;5`` (ordered alphabetically).

Note that unique sequences are preserved exactly as they were, including any
description and mixed case.
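
As a rough illustration of the behaviour described above, the following Python
sketch groups records by their upper-cased sequence and builds the combined
titles. It is not the code of ``make_nr.py``: the function name
``combine_duplicates``, its arguments, and the tuple-based record format are
hypothetical and chosen only for this example.

```python
from collections import OrderedDict


def combine_duplicates(records, sep=";", sort_ids=False):
    """Hypothetical sketch: merge records whose sequences match ignoring case.

    records: iterable of (identifier, description, sequence) tuples.
    Returns a list of (title, sequence) tuples in first-seen order.
    """
    groups = OrderedDict()  # upper-cased sequence -> records sharing it
    for identifier, description, sequence in records:
        groups.setdefault(sequence.upper(), []).append(
            (identifier, description, sequence)
        )

    output = []
    for members in groups.values():
        first_id, first_desc, first_seq = members[0]
        if len(members) == 1:
            # Unique sequence: identifier and description are kept as they were.
            title = (first_id + " " + first_desc).strip()
        else:
            ids = [member[0] for member in members]
            if sort_ids:
                ids.sort()  # plain string sort, like the alphabetical option
            title = "%s representing %d records" % (sep.join(ids), len(ids))
        # Position and sequence case come from the first record in the group.
        output.append((title, first_seq))
    return output


# The two duplicate records from the example, with the later one listed first.
records = [("5", "a duplicate", "acgt"), ("3", "not unique...", "ACgt")]
print(combine_duplicates(records))
# [('5;3 representing 2 records', 'acgt')]
print(combine_duplicates(records, sort_ids=True))
# [('3;5 representing 2 records', 'acgt')]
```

In both calls the sequence is reported as ``acgt`` because the first record
with that sequence supplies the case; only the order of the combined
identifiers changes with the sort option.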

**References**

If you cannot cite this tool directly via the GitHub URL
https://github.com/peterjc/galaxy_blast/tree/master/tools/make_nr
and need a traditional paper, then please cite:

P.J.A. Cock, J.M. Chilton, B. Gruening, J.E. Johnson, N. Soranzo (2015).
NCBI BLAST+ integrated into Galaxy.
*GigaScience* 4:39
https://doi.org/10.1186/s13742-015-0080-7

This wrapper is available to install into other Galaxy Instances via the Galaxy
Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/make_nr
    </help>
    <citations>
        <citation type="doi">10.1186/s13742-015-0080-7</citation>
    </citations>
</tool>