view junction_saturation.xml @ 58:1a052c827e88 draft

"planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/rseqc commit 57f71aa633a43ab02bbf05acd0c6d7f406e01f1e"
author iuc
date Thu, 28 Nov 2019 15:56:37 -0500
parents 5873cd7afb67
children dbedfc5f5a3c

<tool id="rseqc_junction_saturation" name="Junction Saturation" version="@WRAPPER_VERSION@.1">
    <description>detects splice junctions from each subset and compares them to reference gene model</description>

    <macros>
        <import>rseqc_macros.xml</import>
    </macros>

    <expand macro="requirements" />

    <expand macro="stdio" />

    <version_command><![CDATA[junction_saturation.py --version]]></version_command>

    <command><![CDATA[
        junction_saturation.py
            --input-file '${input}'
            --refgene '${refgene}'
            --out-prefix output
            --min-intron ${min_intron}
            --min-coverage ${min_coverage}
            --mapq ${mapq}
        #if str($percentiles_type.percentiles_type_selector) == "specify":
            --percentile-floor ${percentiles_type.lowBound}
            --percentile-ceiling ${percentiles_type.upBound}
            --percentile-step ${percentiles_type.percentileStep}
        #end if
        ]]>
    </command>

    <inputs>
        <expand macro="bam_sam_param" />
        <expand macro="refgene_param" />
        <expand macro="min_intron_param" />
        <param name="min_coverage" type="integer" label="Minimum number of supporting reads to call a junction (default=1)" value="1" help="(--min-coverage)" />
        <expand macro="mapq_param" />
        <conditional name="percentiles_type">
            <param name="percentiles_type_selector" type="select" label="Sampling bounds and frequency">
                <option value="default" selected="true">Default sampling bounds and frequency</option>
                <option value="specify">Specify sampling bounds and frequency</option>
            </param>
            <when value="specify">
                <param name="lowBound" type="integer" value="5" label="Lower Bound Sampling Frequency (%, default=5)" help="(--percentile-floor)">
                    <validator type="in_range" min="0" max="100" />
                </param>
                <param name="upBound" type="integer" value="100" label="Upper Bound Sampling Frequency (%, default=100)" help="(--percentile-ceiling)">
                    <validator type="in_range" min="0" max="100" />
                </param>
                <param name="percentileStep" type="integer" value="5" label="Sampling increment (default=5)" help="(--percentile-step)">
                    <validator type="in_range" min="1" max="100" />
                </param>
            </when>
            <when value="default"/>
        </conditional>
        <expand macro="rscript_output_param" />
    </inputs>

    <outputs>
        <expand macro="pdf_output_data" filename="output.junctionSaturation_plot.pdf" />
        <expand macro="rscript_output_data" filename="output.junctionSaturation_plot.r" />
    </outputs>

    <tests>
        <test>
            <param name="input" value="pairend_strandspecific_51mer_hg19_chr1_1-100000.bam" />
            <param name="refgene" value="hg19_RefSeq_chr1_1-100000.bed" />
            <param name="rscript_output" value="true" />
            <output name="outputr" file="output.junctionSaturation_plot.r" compare="sim_size">
                <assert_contents>
                    <has_line line="pdf('output.junctionSaturation_plot.pdf')" />
                    <has_line line="x=c(5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100)" />
                </assert_contents>
            </output>
            <output name="outputpdf" file="output.junctionSaturation_plot.pdf" compare="sim_size" />
        </test>
    </tests>

    <help><![CDATA[
junction_saturation.py
++++++++++++++++++++++

It is important to check whether the current sequencing depth is sufficient for
alternative splicing analyses. For a well-annotated organism, the number of genes expressed
in a particular tissue is almost fixed, so the number of splice junctions is almost fixed too.
These splice junctions can be predetermined from the reference gene model. All annotated
splice junctions should be rediscovered from a saturated RNA-seq data set; otherwise,
downstream alternative splicing analysis is problematic because low-abundance splice
junctions are missing. This module checks for saturation by resampling a series of
percentages (by default 5%, 10%, 15%, ..., 100%) of the total alignments from a BAM or SAM
file, then detects splice junctions in each subset and compares them to the reference gene
model.

Inputs
++++++++++++++

Input BAM/SAM file
    Alignment file in BAM/SAM format.

Reference gene model
    Gene model in BED format.

Sampling Percentiles - Lower Bound, Upper Bound, Sampling Increment (defaults=5, 100, and 5)
    Sampling starts at the Lower Bound and proceeds to the Upper Bound in steps of the Sampling Increment. All three values are percentages of the total alignments.

Minimum intron length (default=50)
    Minimum intron length (bp).

Minimum coverage (default=1)
    Minimum number of supporting reads to call a junction.
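
The sampling percentiles described above can be illustrated with a short Python sketch. It is not the RSeQC implementation: the function name ``sampling_percentages`` is hypothetical, and the sketch assumes the Upper Bound is inclusive, as the 5 to 100 x-axis values in the R script produced with default settings suggest.

```python
def sampling_percentages(floor=5, ceiling=100, step=5):
    """Return the percentages of total alignments to sample.

    Assumption: the ceiling is inclusive, matching the 5..100 x-axis
    values in the R script this tool writes with default settings.
    """
    if not (0 <= floor <= ceiling <= 100):
        raise ValueError("bounds must satisfy 0 <= floor <= ceiling <= 100")
    if step < 1:
        raise ValueError("step must be a positive integer")
    return list(range(floor, ceiling + 1, step))


print(sampling_percentages())            # defaults: 5, 10, ..., 100
print(sampling_percentages(10, 50, 20))  # [10, 30, 50]
```

With the defaults this yields 20 sampling points, so the saturation curve is drawn from 20 subsets of the alignments.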

Output
++++++++++++++

1. output.junctionSaturation_plot.r: R script to generate the plot
2. output.junctionSaturation_plot.pdf: the junction saturation plot in PDF format

.. image:: $PATH_TO_IMAGES/junction_saturation.png
   :height: 600 px
   :width: 600 px
   :scale: 80 %

In this example, the current sequencing depth is almost saturated for detection of known junctions (red line), because the number of known junctions reaches a plateau. In other words, nearly all known junctions expressed in this particular tissue have already been detected; continued sequencing will not detect additional known junctions and will only increase junction coverage (i.e., each junction will be covered by more reads). The current sequencing depth is not, however, saturated for novel junctions (green line).

@ABOUT@

]]>
    </help>

    <expand macro="citations" />

</tool>