<tool id="gamma_s" name="GAMMA-S" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="23.2">
	<description>finds gene matches in microbial genomic data using nucleotide identity</description>
	<macros>
		<token name="@TOOL_VERSION@">2.2</token>
		<token name="@VERSION_SUFFIX@">0</token>
	</macros>
    <creator>
        <person givenName="Lieven" familyName="Sterck" url="https://github.com/lsterck" />
        <organization name="Sciensano-BioIT" url="https://github.com/BioinformaticsPlatformWIV-ISP" />
    </creator>
	<requirements>
		<requirement type="package" version="@TOOL_VERSION@">GAMMA</requirement>
	</requirements>
	<version_command>echo @TOOL_VERSION@</version_command>
	<command detect_errors="exit_code"><![CDATA[
GAMMA-S.py
    '$input_fasta'
    '$input_db'
    gamma-s_out
    $all
    -i $identity
    $extended
    $protein
    -m $minimum
]]></command>
	<inputs>
        <param name="input_fasta" type="data" format="fasta" label="Input FASTA file" help="A genome or assembly in FASTA format" />
        <param name="input_db" type="data" format="fasta" label="Database to screen against" help="A multi-FASTA database of gene coding sequences" />
        <param argument="--all" type="boolean" truevalue="-a" falsevalue="" checked="false" label="Include all gene matches" help="Returns all (including overlapping) gene matches" />
        <param argument="--identity" type="integer" min="0" max="100" value="90" label="Nucleotide sequence identity" help="The minimum nucleotide sequence identity (%) used by the BLAT search, given as an integer (e.g., '-i 95' for a 95% threshold). Default is 90" />
        <param argument="--extended" type="boolean" truevalue="-e" falsevalue="" checked="false" label="Return all gene mutations" help="Returns all gene mutations. Without this option, only the mutation count is given when more than 10 mutations are present" />
        <param argument="--protein" type="boolean" truevalue="-p" falsevalue="" checked="false" label="Perform protein-protein comparisons" help="Performs protein-protein comparisons. Requires both inputs to be protein sequence FASTA files" />
        <param argument="--minimum" type="integer" min="0" max="100" value="20" label="Minimum length percent match" help="The minimum length percent match for output, given as an integer (e.g., '-m 50' to report only matches covering at least 50% of the target length). Not active if the --all option is used" />
	</inputs>
	<outputs>
        <data name="gamma_s_out" format="tabular" from_work_dir="gamma-s_out.gamma" label="${tool.name} on $on_string: GAMMA Output" />
	</outputs>
	<tests>
        <test expect_num_outputs="1">
            <param name="input_fasta" value="contig_in.fasta" ftype="fasta"/>
            <param name="input_db" value="lukE_6.fasta" ftype="fasta"/>
            <output name="gamma_s_out" file="gamma-s_out.gamma" ftype="tabular"/>
        </test>
        <test expect_num_outputs="1">
            <param name="input_fasta" value="pDHQP1701672_amr_plasmid.fa" ftype="fasta"/>
            <param name="input_db" value="ResFinderDB_subset.fsa" ftype="fasta"/>
            <output name="gamma_s_out" file="gamma-s_amr.gamma" ftype="tabular"/>
        </test>
	</tests>
	<help><![CDATA[
**GAMMA-S (Gene Allele Mutation Microbial Assessment-Sequence)**
finds the best matches from a gene database without translating them, so matches are ranked by nucleotide sequence rather than by the translated protein sequence. It can also perform protein-protein sequence matching, which requires two protein FASTA files as input.

Output:

The default output of GAMMA-S is a tab-delimited file with 17 columns:

- Gene – The name of the closest matching gene (target) from the database. If there are ambiguous gene matches (i.e., multiple target matches with the same number of non-degenerate codon changes, basepair changes, and transversions), the gene match will be appended with a "‡".
- Contig – The name of the contig on which the match was found.
- Start – The start position of the sequence matching the gene on the contig.
- Stop – The end position of the sequence matching the gene on the contig.
- Match_Type – The type of the gene match based on the translation of the sequence (i.e., the protein sequence). Can be native (for identical amino acid sequences to the target), mutant (for nonsynonymous mutations), truncation (for nonsense mutations), indels (for insertions/deletions), nonstop (for a missing stop codon), contig edge (for matches that are truncated at the start or stop of a contig), or a combination of multiple types (e.g., indel truncation).
- Description – A short description of the match type.
- Mismatches – The count of nucleotide/protein sequence substitution mutations.
- Transversions – The count of basepair changes that are transversions (i.e., purine to pyrimidine or vice versa, such as an A -> C or a T -> G).
- Insertions – The count of the insertions in the matching sequence.
- Insertion_BP – The count of the total bases/residues in the insertions in the matching sequence.
- Deletions – The count of the deletions in the matching sequence.
- Deletions_BP – The count of the total bases/residues in the deletions in the matching sequence.
- Unweighted_Match_Percent – The percent match of overlapping sequences only (i.e., does not include sequences on the contig edges, insertions, or deletions).
- Match_Percent – The percent match of the sequence, subtracting out sequences missing from contig edges, insertions, or deletions. Because insertions are subtracted out, this value can be negative if the insertion is larger than the gene itself.
- Percent_Length – The percent (expressed as a decimal value) of the length of the target covered by the matching sequence, maximum of 1.
- Target_Length – The length (in basepairs) of the target sequence.
- Strand – The sense of the strand (+ or -) on which the match is found.
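
The column list above can be used to post-process the tabular output outside Galaxy. The following is a minimal Python sketch, not part of GAMMA-S itself. It assumes the file starts with a header row carrying the 17 column names listed above. The helper names ``read_gamma_s`` and ``full_length_matches`` are hypothetical, and the two sample rows are invented purely for illustration.

```python
import csv
import io

# Column names as listed in the help text above.
GAMMA_S_COLUMNS = [
    "Gene", "Contig", "Start", "Stop", "Match_Type", "Description",
    "Mismatches", "Transversions", "Insertions", "Insertion_BP",
    "Deletions", "Deletions_BP", "Unweighted_Match_Percent",
    "Match_Percent", "Percent_Length", "Target_Length", "Strand",
]


def read_gamma_s(handle):
    """Yield one dict per match; assumes a tab-delimited file with a header row."""
    yield from csv.DictReader(handle, delimiter="\t")


def full_length_matches(rows, min_percent_length=1.0):
    """Keep matches whose Percent_Length (a decimal, maximum 1) meets the cutoff."""
    return [r for r in rows if float(r["Percent_Length"]) >= min_percent_length]


# Invented sample data (NOT real GAMMA-S results), used only to exercise the helpers.
sample_rows = [
    ["geneA", "contig_1", "100", "1000", "Native", "No mutations",
     "0", "0", "0", "0", "0", "0", "1.0", "1.0", "1.0", "900", "+"],
    ["geneB", "contig_2", "1", "450", "Contig Edge", "Partial match",
     "2", "1", "0", "0", "0", "0", "0.99", "0.5", "0.5", "900", "-"],
]
sample_text = "\n".join("\t".join(fields) for fields in [GAMMA_S_COLUMNS] + sample_rows)

rows = list(read_gamma_s(io.StringIO(sample_text)))
kept = full_length_matches(rows)
print([r["Gene"] for r in kept])
```

To run this on a real result, replace the ``io.StringIO`` handle with ``open("gamma-s_out.gamma", newline="")``. All values are read as strings, so numeric columns need an explicit ``int`` or ``float`` conversion before comparison.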


**More Information**

- **Official Repository**: `GAMMA on GitHub`_

.. _GAMMA on GitHub: https://github.com/rastanton/GAMMA

	]]></help>
	<citations>
		<citation type="doi">10.1093/bioinformatics/btab607</citation>
	</citations>
</tool>