<tool id="kc-align" name="Kc-Align" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@">
    <macros>
        <token name="@TOOL_VERSION@">1.0.2</token>
        <token name="@VERSION_SUFFIX@">1</token>
    </macros>
    <xrefs>
        <xref type="bio.tools">kc-align</xref>
    </xrefs>
    <requirements>
        <requirement type="package" version="@TOOL_VERSION@">kcalign</requirement>
    </requirements>
    <command detect_errors="exit_code">
        <![CDATA[
        ## number of slots needs to be one or divisible by three
        SLOTS=\${GALAXY_SLOTS:-1};
        while [ \$SLOTS -gt 1 ];
        do
            if ! (( \$SLOTS % 3 )) ; then break; fi;
            ((SLOTS--));
        done;
        if [[ "\${GALAXY_SLOTS:-1}" -ne "\$SLOTS" ]]; then
            >&2 echo "Warning: using only \$SLOTS threads (instead of the configured \${GALAXY_SLOTS:-1}), since it needs to be 1 or a multiple of three. Please contact your Galaxy Admin";
        fi;

        kc-align
            --mode '$position.mode'
            --reference '$ref'
            --sequences '$seqs'
            #if $position.mode == "genome":
                --start '$position.start'
                --end '$position.end'
                -d '$position.dist'
            #end if
            #if $position.mode == "mixed":
                -d '$position.dist'
            #end if
            $compress
            -th "\$SLOTS"

    ]]></command>
    <inputs>
        <param name="ref" type="data" format="fasta" label="Reference Sequence" help="Single FASTA reference sequence to be aligned" />
        <param name="seqs" type="data" format="fasta" label="Reads" help="Multi-FASTA of sequences to be aligned with the reference" />
        <conditional name="position" >
            <param name="mode" type="select" label="Mode" >
                <option value="genome">Genome</option>
                <option value="gene">Gene</option>
                <option value="mixed">Mixed</option>
            </param>
            <when value="genome" >
                <param name="start" type="text" value="0" label="Start Position(s)" help="The 1-indexed start position of the gene of interest in the reference sequence. For multi-segment genes, enter each segment's start site separated by commas, e.g. 12562,12591" />
                <param name="end" type="text" value="0" label="End Position(s)" help="The 1-indexed end position of the gene of interest in the reference sequence. For multi-segment genes, enter each segment's end site separated by commas, e.g. 12592,13905" />
                <param name="dist" type="select" label="Homology distance limit" help="Sequences that are more than an empirically determined distance from the reference are discarded before the final alignment">
                    <option value="none">None</option>
                    <option value="very-close">Very Close</option>
                    <option value="semi-close">Semi-Close</option>
                    <option value="less-close">Less Close</option>
                </param>
            </when>
            <when value="gene" >
            </when>
            <when value="mixed" >
                <param name="dist" type="select" label="Homology distance limit" help="Sequences that are more than an empirically determined distance from the reference are discarded before the final alignment">
                    <option value="none">None</option>
                    <option value="very-close">Very Close</option>
                    <option value="semi-close">Semi-Close</option>
                    <option value="less-close">Less Close</option>
                </param>
            </when>
        </conditional>
        <param name="compress" type="boolean" truevalue="--compress" falsevalue="" label="Compress" help="Compress identical sequences" />
    </inputs>
    <outputs>
        <data name="fasta" format="fasta" from_work_dir="codon_align.fasta" label="${tool.name}: fasta output" />
        <data name="clustal" format="txt" from_work_dir="codon_align.clustal" label="${tool.name}: clustal output" />
    </outputs>
    <tests>
        <test>
            <param name="ref" ftype="fasta" value="wuhan_ref.fasta" />
            <param name="seqs" ftype="fasta" value="3.fasta" />
            <param name="mode" value="genome" />
            <param name="start" value="21563" />
            <param name="end" value="25384" />
            <output name="fasta" ftype="fasta" compare="contains" value="kc-align.fasta" />
            <output name="clustal" ftype="txt" compare="contains" value="kc-align.clustal" />
        </test>
    </tests>
    <help><![CDATA[

============
Kc-Align
============

Kc-Align is a codon-aware multiple aligner that uses Kalign2 to produce in-frame gapped codon alignments for selection analysis of small genomes (mostly viral and some smaller bacterial genomes). It takes nucleotide sequences as input, translates them to their in-frame amino acid sequences, performs multiple alignment with Kalign, and then converts the alignment back to the original codon sequences while preserving the gaps. It produces two outputs: the gapped nucleotide alignment in FASTA format and in CLUSTAL format.
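
The translate, align, and back-translate steps described above can be illustrated with a minimal Python sketch. It is not Kc-Align's actual implementation: the function names are hypothetical, the standard genetic code is assumed, and the protein alignment is written by hand rather than produced by Kalign.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nucleotides):
    """Translate an in-frame nucleotide string; unknown codons become 'X'."""
    usable = len(nucleotides) - len(nucleotides) % 3
    return "".join(CODON_TABLE.get(nucleotides[i:i + 3], "X") for i in range(0, usable, 3))


def back_translate(aligned_protein, nucleotides):
    """Replace each aligned residue by its original codon and each gap by '---'."""
    codons = (nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3))
    return "".join("---" if residue == "-" else next(codons) for residue in aligned_protein)


# Toy input: the read lacks the middle codon of the reference.
sequences = {"ref": "ATGAAATTT", "read1": "ATGTTT"}
proteins = {name: translate(seq) for name, seq in sequences.items()}

# Hypothetical aligner output for the two proteins above (hand-written, not from Kalign).
aligned_proteins = {"ref": "MKF", "read1": "M-F"}

codon_alignment = {
    name: back_translate(aligned_proteins[name], seq) for name, seq in sequences.items()
}
```

Because every protein gap becomes a three-nucleotide gap, the resulting nucleotide alignment stays in frame, which is the property that selection analyses depend on.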

Kc-Align will also attempt to detect any frameshift mutations in the input reads. If a frameshift is detected, that sequence will not be included in the multiple alignment and its ID will be printed to stdout.

Kc-Align also supports genes that are composed of more than one contiguous segment (currently only two segments are supported). To use this, enter each segment's start coordinate in the Start Position parameter, separated by a comma, and then do the same for each segment's end coordinate in the End Position parameter (e.g. Start Position: 12562,12591 End Position: 12592,13905).
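
As a rough illustration of how comma-separated coordinates map onto a gene sequence, here is a small Python sketch. The function name is hypothetical, and it assumes that both start and end positions are 1-indexed and inclusive and that segments are simply concatenated in order; Kc-Align's own handling may differ.

```python
def extract_gene(genome, starts, ends):
    """Join the segments given by comma-separated 1-indexed coordinates.

    Assumption: end positions are inclusive and segments are concatenated
    in the order they are listed.
    """
    start_list = [int(value) for value in starts.split(",")]
    end_list = [int(value) for value in ends.split(",")]
    if len(start_list) != len(end_list):
        raise ValueError("each start position needs a matching end position")
    # Convert each 1-indexed inclusive range to a 0-indexed Python slice.
    return "".join(genome[start - 1:end] for start, end in zip(start_list, end_list))


toy_genome = "AAACCCGGGTTT"
single_segment = extract_gene(toy_genome, "4", "6")
two_segments = extract_gene(toy_genome, "1,7", "3,9")
```

With the toy genome above, the single-segment call selects positions 4 to 6, and the two-segment call joins positions 1 to 3 with positions 7 to 9.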

Modes:
------

Kc-Align can be run in three different modes, depending on your input data.

* In **genome** mode, the "reference" and "reads" inputs are both full-genome FASTA files. This mode also requires the 1-based start and end positions, in the reference input, of the gene you are interested in aligning.

* If both the "reference" and "reads" inputs are already in-frame genes, the **gene** mode should be used. This mode does not require start and end position parameters as the reference is already in-frame.

* If your "reference" is an in-frame gene while the "reads" are whole genomes, the **mixed** mode can be used. Like gene mode, this mode does not require the start and end position parameters.
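
The way the three modes translate into command-line arguments can be sketched in Python as follows. The flags are the ones used in this wrapper's command template; the function names and the file names in the usage lines are hypothetical, and the thread rounding mirrors the wrapper's rule that the thread count must be 1 or a multiple of three.

```python
def usable_threads(slots):
    """Largest thread count <= slots that is 1 or a multiple of three (slots >= 1)."""
    return slots - slots % 3 if slots >= 3 else 1


def build_command(mode, ref, seqs, start=None, end=None, dist="none", compress=False, slots=1):
    """Assemble the kc-align argument list for the given mode."""
    args = ["kc-align", "--mode", mode, "--reference", ref, "--sequences", seqs]
    if mode == "genome":
        # Genome mode needs the gene coordinates and a distance limit.
        args += ["--start", str(start), "--end", str(end), "-d", dist]
    elif mode == "mixed":
        # Mixed mode takes a distance limit but no coordinates.
        args += ["-d", dist]
    if compress:
        args.append("--compress")
    args += ["-th", str(usable_threads(slots))]
    return args


# Hypothetical file names, for illustration only.
genome_cmd = build_command("genome", "ref.fasta", "reads.fasta", start=21563, end=25384, slots=4)
gene_cmd = build_command("gene", "ref.fasta", "reads.fasta")
```

Here `genome_cmd` carries `--start`, `--end` and `-d` and runs with three threads (four slots rounded down), while `gene_cmd` has none of the position arguments.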


    ]]></help>
    <citations>
        <citation type="bibtex">
            @misc{githubkcalign,
            author = {Nicholas Keener and Emil Bouvier},
            year = {2020},
            title = {Kc-Align},
            publisher = {GitHub},
            journal = {GitHub repository},
            url = {https://github.com/davebx/kc-align},
            }</citation>
    </citations>
</tool>
author	iuc
date	Fri, 24 Jun 2022 15:57:35 +0000