Mercurial > repos > iuc > mummer_show_coords

<tool id="mummer_show_coords" name="Show-Coords" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
    <description>Parse delta file and report coordinates and other information</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="bio_tools"/>
    <expand macro="requirements" />
    <command detect_errors="exit_code">
        <![CDATA[
        show-coords
            $merge
            -c
            -H
            -I '$identity'
            -l
            -L '$min_alignment_length'
            $annotate
            $sort
            -T
            '${delta}' > '${output}'
        ]]>
    </command>
    <inputs>
        <param name="delta" type="data" format="tabular" label="Match file from Nucmer" />
        <param name="merge" type="boolean" argument="-b" truevalue="-b" falsevalue="" label="Merge"
            help="Merges overlapping alignments regardless of match dir or frame and does not display any idenitity information. (-b)" />
        <param name="identity" type="float" argument="-I" value="75.0" label="Identity" help="Set minimum percent identity to display (-I)" />
        <param name="min_alignment_length" type="integer" argument="-L" value="100" label="Minimum Alignment Length" help="Set minimum alignment length to display (-L)" />
        <param name="annotate" type="boolean" argument="-o" truevalue="-o" falsevalue="" label="Annotate"
            help="Annotate maximal alignments between two sequences, i.e. overlaps between reference and query sequences (-o)" />
        <param name="sort" type="select" label="Sorting strategy for output" >
            <option value="-q">Sort output by query IDs and coordinates (-q)</option>
            <option value="-r">Sort output by reference IDs and coordinates (-r)</option>
        </param>
    </inputs>
    <outputs>
        <data name="output" format="tabular" from_work_dir="show-coords.txt" >
            <actions>
                <action name="column_names" type="metadata" default="[S1], [E1], [S2], [E2], [LEN 1], [LEN 2], [% IDY], [LEN R], [LEN Q], [COV R], [COV Q], [REF TAG], [QUERY TAG]" />
            </actions>
        </data>
    </outputs>
    <tests>
        <test>
            <param name="delta" ftype="tabular" value="nucmer.txt" />
            <output name="output" ftype="tabular" compare="diff" value="show-coords.txt" />
        </test>
    </tests>
    <help><![CDATA[
This program parses the delta alignment output of nucmer and displays the coordinates, and other useful information about the alignments.

Output is tabular. Below is a description of each column:
    * **[S1]** Start of the alignment region in the reference sequence
    * **[E1]** End of the alignment region in the reference sequence
    * **[S2]** Start of the alignment region in the query sequence
    * **[E2]** End of the alignment region in the query sequence
    * **[LEN 1]** Length of the alignment region in the reference sequence, measured in nucleotides
    * **[LEN 2]** Length of the alignment region in the query sequence, measured in nucleotides
    * **[% IDY]** Percent identity of the alignment, calculated as (number of exact matches) / ([LEN 1] + insertions in the query)
    * **[LEN R]** Length of the reference sequence
    * **[LEN Q]** Length of the query sequence
    * **[COV R]** Percent coverage of the alignment on the reference sequence, calculated as [LEN 1] / [LEN R]
    * **[COV Q]** Percent coverage of the alignment on the query sequence, calculated as [LEN 2] / [LEN Q]
    * **[REF TAG]** The reference FastA ID
    * **[QUERY TAG]** The query FastA ID

There is also an optional final column (turned on with the "Annotate" parameter) that will contain some 'annotations'. The Annotate option will annotate alignments that represent overlaps between two sequences. Sometimes, nucmer will extend adjacent clusters past one another, thus causing a somewhat redundant output, this option will notify users of such rare occurrences.

The Percent Coverage and Sequence Length options are useful when comparing two sets of assembly contigs, in that these options help determine if an alignment spans an entire contig, or is just a partial hit to a different read. The Merge option is useful when the user wishes to identify sytenic regions between two genomes, but is not particularly interested in the actual alignment similarity or appearance. This option also disregards match orientation, so should not be used if this information is needed.

**Options:**::

    -b      Merges overlapping alignments regardless of match dir or frame and does not display any
            identity information.

    -I      Set minimum percent identity to display

    -L      Set minimum alignment length to display

    -o      Annotate maximal alignments between two sequences, i.e. overlaps between reference and query
            sequences

    -q      Sort output lines by query IDs and coordinates

    -r      Sort output lines by reference IDs and coordinates

    ]]></help>
    <expand macro="citation" />
</tool>
author	iuc
date	Mon, 18 Mar 2024 12:41:14 +0000
parents	b4124df98b26
children