<tool id="glimmer_not_knowledge_based" name="Glimmer3" version="@WRAPPER_VERSION@">
    <description>Predict ORFs in prokaryotic genomes (not knowledge-based)</description>
    <expand macro="bio_tools"/>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements"/>
    <command><![CDATA[
        python '$__tool_directory__/glimmer_wo_icm.py'
            '$input'
            #if $report:
                '$prediction'
            #else:
                'None'
            #end if
            #if $detailed_report:
                '$detailed'
            #else:
                'None'
            #end if
            $overlap
            $gene_length
            $threshold
            $linear
            '$genes_output'
]]></command>
    <inputs>
        <param name="input" type="data" format="fasta" label="Genome sequence" />
        <param name="overlap" type="integer" value="0" label="Set maximum overlap length. Overlaps this short or shorter are ignored." />
        <param name="gene_length" type="integer" value="110" label="Set minimum gene length." />
        <param name="threshold" type="integer" value="30" label="Set threshold score for calling a region a gene. If the in-frame score is &gt;= this value, the region is given a number and considered a potential gene." />
        <param name="linear" type="boolean" truevalue="true" falsevalue="false" checked="true" label="Assume linear rather than circular genome, i.e., no wraparound" />

        <param name="detailed_report" type="boolean" truevalue="" falsevalue="" checked="false" label="Output a detailed gene prediction report as separate file" />
        <param name="report" type="boolean" truevalue="" falsevalue="" checked="false" label="Report the classic glimmer table output" />
    </inputs>
    <outputs>
        <data name="genes_output" format="fasta" label="Glimmer3 on ${on_string} (Gene Prediction FASTA)" />
        <data name="prediction" format="txt" label="Glimmer3 on ${on_string} (Gene Prediction table)">
            <filter>report == True</filter>
        </data>
        <data name="detailed" format="txt" label="Glimmer3 on ${on_string} (detailed report)">
            <filter>detailed_report == True</filter>
        </data>
    </outputs>
    <tests>
        <test>
            <param name="input" value="streptomyces_Tue6071_plasmid_genomic.fasta" />
            <param name="overlap" value="0" />
            <param name="gene_length" value="110" />
            <param name="threshold" value="30" />
            <param name="linear" value="true" />
            <param name="detailed_report" value="" />
            <param name="report" value="" />
            <output name="genes_output" file="glimmer_wo_icm_trans-table-11_plasmid_genomic.fasta" ftype="fasta" />
        </test>
    </tests>
    <help>
<![CDATA[

**What it does**

This tool predicts open reading frames (ORFs) in a given DNA sequence. It is not knowledge-based, i.e. it does not use a pre-existing training set.

The recommended approach is to run Glimmer3 with a trained ICM model. For that, use the knowledge-based version of this tool and supply or generate a training set.

-----

**Glimmer Overview**

::

    **************      **************      **************      **************
    *            *      *            *      *            *      *            *
    * long-orfs  * ===> *  extract   * ===> * build-icm  * ===> *  glimmer3  *
    *            *      *            *      *            *      *            *
    **************      **************      **************      **************

-----

**Example**

Suppose you have the following DNA sequence::

    >SQ   Sequence 8667507 BP; 1203558 A; 3121252 C; 3129638 G; 1213059 T; 0 other;
    cccgcggagcgggtaccacatcgctgcgcgatgtgcgagcgaacacccgggctgcgcccg
    ggtgttgcgctcccgctccgcgggagcgctggcgggacgctgcgcgtcccgctcaccaag
    cccgcttcgcgggcttggtgacgctccgtccgctgcgcttccggagttgcggggcttcgc
    cccgctaaccctgggcctcgcttcgctccgccttgggcctgcggcgggtccgctgcgctc
    ccccgcctcaagggcccttccggctgcgcctccaggacccaaccgcttgcgcgggcctgg
    .......

Running this tool produces a FASTA file of the predicted genes and, if requested, Glimmer report files such as the following::

    >SQ   Sequence 8667507 BP; 1203558 A; 3121252 C; 3129638 G; 1213059 T; 0 other;
    orf00001      577      699  +1     5.24
    orf00003      800     1123  +2     5.18
    orf00004     1144     3813  +1    10.62
    orf00006     3857     6220  +2     6.07
    orf00007     6226     7173  +1     1.69
    orf00008     7187     9307  +2     8.95
    orf00009     9424    10410  +1     8.29
    orf00010    10515    11363  +3     7.00
    orf00011    11812    11964  +1     2.80
    orf00012    12360    13457  +3     4.80
    orf00013    14379    14044  -1     7.41
    orf00015    15029    14739  -3    12.43
    orf00016    15066    15227  +3     1.91
    orf00020    16061    15351  -3     2.83
    orf00021    17513    17391  -3     2.20
    orf00023    17529    17675  +3     0.11
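
For downstream processing, the table above can be read with a few lines of Python. The following is only a minimal sketch, not part of this tool: it assumes the five whitespace-separated columns are ORF identifier, start, end, reading frame and score, and that the genome is linear (for a circular genome, a gene spanning the origin would need separate handling of its length). The names ``Orf`` and ``parse_predictions`` are hypothetical.

```python
from typing import List, NamedTuple


class Orf(NamedTuple):
    """One row of the prediction table (column meanings are assumed)."""
    orf_id: str
    start: int
    end: int
    frame: int    # sign gives the strand: positive = forward, negative = reverse
    score: float


def parse_predictions(text: str) -> List[Orf]:
    """Parse the table, skipping blank lines and '>' header lines."""
    orfs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        orf_id, start, end, frame, score = line.split()
        orfs.append(Orf(orf_id, int(start), int(end), int(frame), float(score)))
    return orfs


# Two rows taken from the example table above.
example = """\
>SQ   Sequence 8667507 BP; 1203558 A; 3121252 C; 3129638 G; 1213059 T; 0 other;
orf00001      577      699  +1     5.24
orf00013    14379    14044  -1     7.41
"""

for orf in parse_predictions(example):
    strand = "+" if orf.frame > 0 else "-"
    # On the reverse strand start > end, so take the absolute difference.
    length = abs(orf.end - orf.start) + 1
    print(orf.orf_id, strand, length, orf.score)
```

Note that for reverse-strand predictions the start coordinate is larger than the end coordinate, which is why the sketch uses the absolute difference when computing the length.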


]]>
    </help>
    <expand macro="citation" />
</tool>
author	iuc
date	Sun, 20 Mar 2022 10:08:30 +0000
parents	4da5f6bdcf12