view tools/make_nr/make_nr.xml @ 1:84e483325b04 draft default tip
"make_nr v0.0.2"
author | peterjc |
---|---|
date | Thu, 18 Mar 2021 12:48:57 +0000 |
parents | c84f12187af9 |
children |
<tool id="make_nr" name="Make FASTA non-redundant" version="0.0.2">
    <description>by combining duplicated sequences</description>
    <requirements>
        <requirement type="package" version="1.67">biopython</requirement>
    </requirements>
    <version_command>
python $__tool_directory__/make_nr.py --version
    </version_command>
    <command detect_errors="aggressive">
python $__tool_directory__/make_nr.py
$alphasort
-s '$separator'
-o '$output'
#for $f in $input
'$f'
#end for
    </command>
    <inputs>
        <param name="input" type="data" format="fasta,fasta.gz" multiple="True" label="Input FASTA sequence file(s)"/>
        <param argument="separator" type="text" size="10" area="False" value=";" label="Separator string to use when combining the identifiers of duplicate sequences" help="A single character is recommended, e.g. the semi-colon, or comma">
            <sanitizer>
                <valid initial="default">
                    <add value=";"/>
                    <add value="|"/>
                </valid>
            </sanitizer>
        </param>
        <param argument="alphasort" type="select" label="Treatment of identifiers when combining duplicates with the separator">
            <option value="">Use the order they appear in the input file(s)</option>
            <option value="-a">Sort alphabetically before combining them</option>
        </param>
    </inputs>
    <outputs>
        <data name="output" format="fasta" label="$on_string (NR)" />
    </outputs>
    <tests>
        <test>
            <param name="input" value="duplicates.fasta" ftype="fasta"/>
            <output name="output" file="duplicates.nr.fasta" ftype="fasta"/>
        </test>
        <test>
            <param name="input" value="duplicates.fasta.gz" ftype="fasta.gz"/>
            <output name="output" file="duplicates.nr.fasta" ftype="fasta"/>
        </test>
        <test>
            <param name="input" value="more_duplicates.fasta,duplicates.fasta" ftype="fasta"/>
            <output name="output" file="deduplicate.nosortids.fasta" ftype="fasta"/>
        </test>
        <test>
            <param name="input" value="more_duplicates.fasta,duplicates.fasta" ftype="fasta"/>
            <param name="alphasort" value="-a"/>
            <output name="output" file="deduplicate.sortids.fasta" ftype="fasta"/>
        </test>
        <test>
            <param name="input" value="empty.fasta" ftype="fasta"/>
            <output name="output" file="empty.fasta" ftype="fasta"/>
        </test>
        <test>
            <param name="input" value="empty.fasta,empty.fasta" ftype="fasta"/>
            <output name="output" file="empty.fasta" ftype="fasta"/>
        </test>
    </tests>
    <help>
**What it does**

Takes one or more input FASTA files, checks them for duplicate sequences
(ignoring case), and writes an output FASTA file in which each set of
duplicates appears once with a combined identifier.

For example, using the default separator of a semi-colon::

    >1 first entry
    act
    >2 The A-Team
    AAaa
    >3 not unique...
    ACgt
    >4
    CCCC
    >5 a duplicate
    acgt
    >6 last!
    GGGG

In this simple example ``ACGT`` appears twice (ignoring case) as entries ``3``
and ``5``. Entry ``3`` is renamed as ``3;5`` and entry ``5`` is omitted::

    >1 first entry
    act
    >2 The A-Team
    AAaa
    >3;5 representing 2 records
    ACgt
    >4
    CCCC
    >6 last!
    GGGG

This means that each representative record takes its position and sequence
case from the first entry with that sequence.

In this case the combined entry is labelled as ``3;5`` either way, so the sort
option has no effect. However, if the records appeared in the file with ``5``
before ``3`` you could choose to get ``5;3`` (order from file, default) or
``3;5`` (ordered alphabetically).

Note that unique sequences are preserved exactly as they were, including any
description and mixed case.

**References**

If you cannot cite this tool directly via the GitHub URL
https://github.com/peterjc/galaxy_blast/tree/master/tools/make_nr
and need a traditional paper, then please cite:

P.J.A. Cock, J.M. Chilton, B. Gruening, J.E. Johnson, N. Soranzo (2015).
NCBI BLAST+ integrated into Galaxy.
*GigaScience* 4:39
https://doi.org/10.1186/s13742-015-0080-7

This wrapper is available to install into other Galaxy Instances via the Galaxy
Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/make_nr
    </help>
    <citations>
        <citation type="doi">10.1186/1471-2105-10-421</citation>
    </citations>
</tool>
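<!--
The merging behaviour described in the help text can be sketched in plain
Python as below. This is a minimal illustration only and not the actual
make_nr.py script, which is not shown here and depends on Biopython. The
function names parse_fasta, make_nr and format_fasta are hypothetical, and the
"representing N records" description is assumed from the example in the help
text rather than taken from the real implementation.

```python
def parse_fasta(text):
    """Parse FASTA text into [identifier, description, sequence] lists."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # The identifier is the first word of the title line.
            ident, _, descr = line[1:].partition(" ")
            records.append([ident, descr, ""])
        elif line and records:
            records[-1][2] += line
    return records


def make_nr(records, separator=";", sort_ids=False):
    """Merge records whose sequences match when case is ignored."""
    groups = {}  # upper-cased sequence mapped to its records, first seen first
    for record in records:
        groups.setdefault(record[2].upper(), []).append(record)
    merged = []
    for members in groups.values():
        first_id, first_descr, first_seq = members[0]
        if len(members) == 1:
            # Unique sequences keep their description and original case.
            merged.append((first_id, first_descr, first_seq))
            continue
        ids = [member[0] for member in members]
        if sort_ids:
            ids.sort()
        # Assumed description format, copied from the help text example.
        descr = "representing %d records" % len(members)
        # Position and sequence case come from the first entry in the group.
        merged.append((separator.join(ids), descr, first_seq))
    return merged


def format_fasta(records):
    """Render (identifier, description, sequence) tuples as FASTA text."""
    lines = []
    for ident, descr, seq in records:
        lines.append(">" + (ident + " " + descr).strip())
        lines.append(seq)
    return "".join(line + "\n" for line in lines)


EXAMPLE = """\
>1 first entry
act
>2 The A-Team
AAaa
>3 not unique...
ACgt
>4
CCCC
>5 a duplicate
acgt
>6 last!
GGGG
"""

if __name__ == "__main__":
    print(format_fasta(make_nr(parse_fasta(EXAMPLE))), end="")
```

Grouping on the upper-cased sequence in an insertion-ordered dict keeps each
representative at the position of its first occurrence, so sorting affects
only the order of identifiers inside a combined name.
-->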