Mercurial > repos > iuc > pizzly

<tool id="pizzly" name="pizzly" version="0.37.3">
    <description>- fast fusion detection using kallisto</description>
    <requirements>
        <requirement type="package" version="0.37.3">pizzly</requirement>
    </requirements>
    <version_command>pizzly --version</version_command>
    <command detect_errors="exit_code"><![CDATA[

        ## Get reference FASTA

        #if $ref_fa.rfasta.rfasta_source == "history":
            ln -s '$ref_fa.rfasta.ref_fa_hist' reference.fa &&
        #else:
            ln -s '$ref_fa.rfasta.ref_fa_builtin.fields.path' reference.fa &&
        #end if


        ## Get reference GTF

        #if $ref_gtf.rgtf.rgtf_source == "history":
            ln -s '$ref_gtf.rgtf.ref_gtf_hist' reference.gtf &&
        #else:
            ln -s '$ref_gtf.rgtf.ref_gtf_builtin.fields.path' reference.gtf &&
        #end if


        ## Run pizzly

        pizzly
        --fasta reference.fa
        --gtf reference.gtf
        --align-score '$adv.align_score'
        --insert-size '$adv.insert_size'
        --k '$adv.kmer_size'
        $adv.ignore_protein
        --output out
        '$kinput.input'   &&


        ## Convert json output to tabular

        pizzly_flatten_json.py out.json > out.fusions.tab

    ]]></command>
    <inputs>
        <section name="kinput" expanded="false" title="Select kallisto fusions file">
            <param name="input" type="data" format="tabular" label="Output from kallisto quant --fusion" />
        </section>
        <section name="ref_fa" expanded="false" title="Select Reference FASTA">
            <conditional name="rfasta">
                <param name="rfasta_source" type="select" label="Reference transcriptome FASTA" help="Make sure the FASTA corresponds to the same file used with kallisto quant --fusion">
                    <option value="cached" selected="true">Use a built-in transcriptome</option>
                    <option value="history">Use a FASTA from history</option>
                </param>
                <when value="cached">
                    <param name="ref_fa_builtin" type="select" label="Select a built-in FASTA" help="If your transcriptome of interest is not listed, contact your Galaxy administrator">
                        <options from_data_table="all_fasta">
                            <filter type="sort_by" column="2" />
                            <validator type="no_options" message="No FASTA is available for the selected input dataset" />
                        </options>
                    </param>
                </when>
                <when value="history">
                    <param name="ref_fa_hist" type="data" format="fasta" label="Select a history FASTA" />
                </when>
            </conditional>
        </section>
        <section name="ref_gtf" expanded="false" title="Select Reference GTF">
         <conditional name="rgtf">
            <param name="rgtf_source" type="select" label="Reference GTF">
                <option value="cached" selected="true">Use a built-in transcriptome</option>
                <option value="history">Use a GTF from history</option>
            </param>
            <when value="cached">
                <param name="ref_gtf_builtin" type="select" label="Select a built-in GTF" help="If the GTF file for your transcriptome of interest is not listed, contact your Galaxy administrator">
                    <options from_data_table="all_gtf">
                        <filter type="sort_by" column="2" />
                        <validator type="no_options" message="No GTF file is available." />
                    </options>
                </param>
            </when>
            <when value="history">
                <param name="ref_gtf_hist" type="data" format="gtf" label="Select a history GTF" />
            </when>
        </conditional>
        </section>
        <section name="adv" expanded="false" title="Advanced Options">
            <param name="insert_size" argument="--insert-size" type="integer" value="400" label="Maximum insert size of the paired-end library" help="kallisto will estimate this from the reads, default: 400">
                <validator type="in_range" min="0" />
            </param>
            <param name="align_score" argument="--align-score" type="integer" value="2" label="Number of mismatches allowed" help="default: 2">
                <validator type="in_range" min="0" />
            </param>
            <param name="kmer_size" argument="--k" type="integer" value="31" label="K-mer size used to create the kallisto index" help="default: 31">
                <validator type="in_range" min="0" />
            </param>
            <param name="ignore_protein" argument="--ignore-protein" type="boolean" truevalue="--ignore-protein" falsevalue="" checked="false" label="Ignore protein coding information in annotation" help="Warning: this will probably lead to an increase in the number of false positives reported"/>
       </section>
    </inputs>
    <outputs>
        <data name="fusions_tab" format="tabular" from_work_dir="out.fusions.tab" label="${tool.name} on ${on_string}: Table"/>
        <data name="fusions_fa" format="fasta" from_work_dir="out.fusions.fasta" label="${tool.name} on ${on_string}: Fasta"/>
    </outputs>
    <tests>
        <test>
            <param name="rfasta_source" value="history" />
            <param name="rgtf_source" value="history" />
            <param name="input" ftype="tabular" value="fusion.txt" />
            <param name="ref_fa_hist" ftype="fasta" value="ref.fasta.gz" />
            <param name="ref_gtf_hist" ftype="gtf" value="ref.gtf.gz" />
            <output name="fusions_tab" file="out.fusions.tab" ftype="tabular" />
            <output name="fusions_fa" file="out.fusions.fasta" ftype="fasta" />
        </test>
    </tests>
    <help><![CDATA[

.. class:: infomark

**What it does**

pizzly_ is a program for detecting gene fusions from RNA-Seq data of cancer samples.

pizzly introduces a novel approach to fusion detection, it builds on the pseudoalignment idea (that simplifies and accelerates transcript quantification) by inspecting paired reads that cannot be pseudoaligned due to conflicting matches. pizzly filters false positives, assembles new transcripts from the fusion reads, and reports candidate fusions.

With pizzly, fusion detection from RNA-Seq reads can be performed **in a matter of minutes**.

.. _pizzly: https://github.com/pmelsted/pizzly

-----

**Inputs**

- one or more tabular files from kallisto quant (run with the --fusion parameter on paired-end fastqs)
- a FASTA file containing the reference transcriptome. This must be the same fasta that was used to build the kallisto index.
- a GTF file describing the transcriptome. Ensembl transcriptomes recommended.

pizzly has been tested on Ensembl (versions 75+) and Gencode (version 19+) annotations. The latest Ensembl annotations (version 87 GTF, FASTA) are recommended for running with pizzly.

-----

**Outputs**

Two output files are created containing the filtered fusion calls

- a tabular file
- a FASTA file

-----

**More Information**

Fusion junctions are detected by pizzly using a two-stage method. The first stage is implemented in kallisto and detects individual reads or read pairs, whose constituent parts pseudoalign to a reference transcriptome but that in combination fail to pseudoalign. pizzly takes the output of the potential fusion junctions found by kallisto and performs a detailed analysis of the associated reads by aligning them across the putative junctions. Additionally, pizzly is annotation aware, i.e. it uses information about the genomic coordinates and gene identities of each isoform to identify possible false positives arising from repetitive sequences across the genome.

 ]]></help>
    <citations>
        <citation type="doi">10.1101/166322</citation>
    </citations>
</tool>
author	iuc
date	Wed, 13 Sep 2017 22:40:57 -0400
parents
children	91ef4a171202