Mercurial > repos > iuc > gff3_rebase

<tool id="gff3.rebase" name="Rebase GFF3 features" version="1.2">
  <description>against parent features</description>
  <macros>
    <import>macros.xml</import>
  </macros>
  <expand macro="requirements"/>
  <version_command>python gff3_rebase.py --version</version_command>
  <command detect_errors="aggressive"><![CDATA[
    python '$__tool_directory__/gff3_rebase.py'
    '$parent'
    '$child'

    $interpro
    $protein2dna
    > '$output']]>
  </command>
  <inputs>
    <param label="Parent GFF3 annotations" name="parent" format="gff3" type="data"/>
    <param label="Child GFF3 annotations to rebase against parent" name="child" format="gff3" type="data"/>

    <param label="Interpro specific modifications" name="interpro" type="boolean" truevalue="--interpro" falsevalue=""/>
    <param label="Map protein translated results to original DNA data" name="protein2dna" type="boolean" truevalue="--protein2dna" falsevalue=""/>
  </inputs>
  <outputs>
    <data format="gff3" name="output"/>
  </outputs>
  <tests>
      <test>
          <param name="parent" value="parent.gff"/>
          <param name="child" value="child.gff"/>
          <param name="protein2dna" value="True" />
          <output name="output" file="proteins.gff"/>
      </test>
  </tests>
  <help><![CDATA[
**What it does**

Often the genomic data processing/analysis process requires a workflow like the following:

-  select some features from a genome
-  export the sequences associated with those regions
-  analyse those exports with some tool like Blast

For display, especially in software like JBrowse, it is convenient to know
where in the original genome the analysis results would fall. E.g. if a
transmembrane domain is detected at bases 10-20 of an analysed protein, where
should this be displayed relative to the parent genome?

This tool helps fill that gap, by rebasing some analysis results against the
parent features which were originally analysed.

**Example Inputs**

For a "child" set of annotations like::

    #gff-version 3
    cds42	blastp	match_part	1	50	1e-40	.	.	ID=m00001;Notes=RNAse A Protein

And a parent set of annotations like::

    #gff-version 3
    PhageBob	maker	cds	300	600	.	+	.	ID=cds42

One could imagine that during the analysis process, the user had exported the parent annotation into some sequence::

    >cds42
    M......

and then analysed those results, producing the "child" annotation file. This
tool will then localize the results properly against the parent::

    #gff-version 3
    PhageBob	blastp	match_part	300	449	1e-40	+	.	ID=m00001;Notes=RNAse A Protein

which will allow you to display the results in the correct location in visualizations.

**Options**

There are two optional flags which can be passed.

The Interpro flag selectively ignores features which shouldn't be included in
the output (i.e. ``polypeptide``), and a couple of qualifiers that aren't
useful (``status``, ``Target``)

The "Map Protein..." flag says that you translated the sequences during the
genomic export process, analysing protein sequences. This indicates to the
software that the bases should be multiplied by three to obtain the correct DNA
locations.
]]></help>
    <citations>
        <citation type="doi">10.1093/bioinformatics/btp163</citation>
    </citations>
</tool>