view tools/make_nr/make_nr.xml @ 1:84e483325b04 draft default tip

"make_nr v0.0.2"
author peterjc
date Thu, 18 Mar 2021 12:48:57 +0000
parents c84f12187af9
children

<tool id="make_nr" name="Make FASTA non-redundant" version="0.0.2">
    <description>by combining duplicated sequences</description>
    <requirements>
        <requirement type="package" version="1.67">biopython</requirement>
    </requirements>
    <version_command>
python $__tool_directory__/make_nr.py --version
    </version_command>
    <command detect_errors="aggressive">
python $__tool_directory__/make_nr.py $alphasort -s '$separator' -o '$output'
#for $f in $input
'$f'
#end for
    </command>
    <inputs>
        <param name="input" type="data" format="fasta,fasta.gz" multiple="True"
               label="Input FASTA sequence file(s)"/>
        <param argument="separator" type="text" size="10" area="False" value=";"
               label="Separator string to use when combining the identifiers of duplicate sequences"
               help="A single character is recommended, e.g. a semi-colon or comma">
            <sanitizer>
                <valid initial="default">
                    <add value=";"/>
                    <add value="|"/>
                </valid>
            </sanitizer>
        </param>
        <param argument="alphasort" type="select" label="Treatment of identifiers when combining duplicates with the separator">
            <option value="">Use the order they appear in the input file(s)</option>
            <option value="-a">Sort alphabetically before combining them</option>
        </param>
    </inputs>
    <outputs>
        <data name="output" format="fasta" label="$on_string (NR)" />
    </outputs>
    <tests>
        <test>
            <param name="input" value="duplicates.fasta" ftype="fasta"/>
            <output name="output" file="duplicates.nr.fasta" ftype="fasta"/>
        </test>
        <test>
            <param name="input" value="duplicates.fasta.gz" ftype="fasta.gz"/>
            <output name="output" file="duplicates.nr.fasta" ftype="fasta"/>
        </test>
        <test>
            <param name="input" value="more_duplicates.fasta,duplicates.fasta" ftype="fasta"/>
            <output name="output" file="deduplicate.nosortids.fasta" ftype="fasta"/>
        </test>
        <test>
            <param name="input" value="more_duplicates.fasta,duplicates.fasta" ftype="fasta"/>
            <param name="alphasort" value="-a"/>
            <output name="output" file="deduplicate.sortids.fasta" ftype="fasta"/>
        </test>
        <test>
            <param name="input" value="empty.fasta" ftype="fasta"/>
            <output name="output" file="empty.fasta" ftype="fasta"/>
        </test>
        <test>
            <param name="input" value="empty.fasta,empty.fasta" ftype="fasta"/>
            <output name="output" file="empty.fasta" ftype="fasta"/>
        </test>
    </tests>
    <help>
**What it does**

Takes one or more input FASTA files, checks them to find any duplicate sequences
(ignoring case), and writes an output FASTA file where each set of duplicates
appears once with a combined identifier.

For example, using the default separator of a semi-colon::

    >1 first entry
    act
    >2 The A-Team
    AAaa
    >3 not unique...
    ACgt
    >4
    CCCC
    >5 a duplicate
    acgt
    >6 last!
    GGGG

In this simple example ``ACGT`` appears twice (ignoring case) as entries ``3``
and ``5``. Entry ``3`` is renamed as ``3;5`` and entry ``5`` is omitted::

    >1 first entry
    act
    >2 The A-Team
    AAaa
    >3;5 representing 2 records
    ACgt
    >4
    CCCC
    >6 last!
    GGGG

As this shows, each representative record takes its position and sequence case
from the first entry with that sequence.

In this case the combined entry is labelled as ``3;5``, so the sort option
has no effect. However, if the records appear in the file with ``5`` before
``3`` you can choose to get ``5;3`` (order from file, default) or ``3;5``
(ordered alphabetically).

Notice that unique sequences are preserved exactly as they were, including any
description and mixed case.
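
The merging rule described above can be sketched in a few lines of Python.
This is only an illustration, not the code in ``make_nr.py``: the function name
``merge_duplicates``, its arguments, and the tuple layout are all hypothetical,
and the record identifiers ``a``, ``b`` and ``c`` are made up for the example.

```python
from collections import OrderedDict


def merge_duplicates(records, sep=";", sort_ids=False):
    """Merge records whose sequences are identical when case is ignored.

    records is an iterable of (identifier, description, sequence) tuples.
    Returns a list of (title, sequence) tuples in order of first appearance.
    This is a hypothetical helper, not part of make_nr.py.
    """
    # Map each upper-cased sequence to every record that carries it,
    # remembering the order in which sequences were first seen.
    groups = OrderedDict()
    for ident, descr, seq in records:
        groups.setdefault(seq.upper(), []).append((ident, descr, seq))

    output = []
    for members in groups.values():
        first_id, first_descr, first_seq = members[0]
        if len(members) == 1:
            # Unique sequence: keep identifier, description and case as given.
            title = (first_id + " " + first_descr).strip()
        else:
            ids = [member[0] for member in members]
            if sort_ids:
                ids.sort()  # alphabetical order instead of file order
            title = "%s representing %i records" % (sep.join(ids), len(ids))
        # The representative keeps the case of the first matching entry.
        output.append((title, first_seq))
    return output


# Identifiers deliberately appear out of alphabetical order.
example = [
    ("b", "first copy", "ACgt"),
    ("a", "second copy", "acgt"),
    ("c", "", "GGGG"),
]
for title, seq in merge_duplicates(example):
    print(title, seq)
```

With the records in file order the merged identifier is ``b;a``; calling
``merge_duplicates(example, sort_ids=True)`` gives ``a;b`` instead. In both
cases the sequence is kept as ``ACgt``, the case used by the first entry.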


**References**

If you cannot cite this tool directly via the GitHub URL
https://github.com/peterjc/galaxy_blast/tree/master/tools/make_nr
and need a traditional paper, then please cite:

P.J.A. Cock, J.M. Chilton, B. Gruening, J.E. Johnson, N. Soranzo (2015).
NCBI BLAST+ integrated into Galaxy.
*GigaScience* 4:39
https://doi.org/10.1186/s13742-015-0080-7

This wrapper is available to install into other Galaxy Instances via the Galaxy
Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/make_nr
    </help>
    <citations>
        <citation type="doi">10.1186/s13742-015-0080-7</citation>
    </citations>
</tool>