view junction_saturation.xml @ 63:27e16a30667a draft default tip
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/rseqc commit d7544582d5599c67a284faf9232cd2ccc4daa1de
author | iuc |
---|---|
date | Tue, 09 Apr 2024 11:24:55 +0000 |
parents | 5968573462fa |
children |
<tool id="rseqc_junction_saturation" name="Junction Saturation" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@GALAXY_VERSION@">
    <description>detects splice junctions from each subset and compares them to reference gene model</description>
    <macros>
        <import>rseqc_macros.xml</import>
    </macros>
    <expand macro="bio_tools"/>
    <expand macro="requirements"/>
    <expand macro="stdio"/>
    <version_command><![CDATA[junction_saturation.py --version]]></version_command>
    <command><![CDATA[
        @BAM_SAM_INPUTS@
        junction_saturation.py
            --input-file 'input.${extension}'
            --refgene '${refgene}'
            --out-prefix output
            --min-intron ${min_intron}
            --min-coverage ${min_coverage}
            --mapq ${mapq}
        #if str($percentiles_type.percentiles_type_selector) == "specify":
            --percentile-floor ${percentiles_type.lowBound}
            --percentile-ceiling ${percentiles_type.upBound}
            --percentile-step ${percentiles_type.percentileStep}
        #end if
        ]]>
    </command>
    <inputs>
        <expand macro="bam_sam_param"/>
        <expand macro="refgene_param"/>
        <expand macro="min_intron_param"/>
        <param name="min_coverage" type="integer" label="Minimum number of supporting reads to call a junction (default=1)" value="1" help="(--min-coverage)"/>
        <expand macro="mapq_param"/>
        <conditional name="percentiles_type">
            <param name="percentiles_type_selector" type="select" label="Sampling bounds and frequency">
                <option value="default" selected="true">Default sampling bounds and frequency</option>
                <option value="specify">Specify sampling bounds and frequency</option>
            </param>
            <when value="specify">
                <param name="lowBound" type="integer" value="5" label="Lower Bound Sampling Frequency (%, default=5)" help="(--percentile-floor)">
                    <validator type="in_range" min="0" max="100"/>
                </param>
                <param name="upBound" type="integer" value="100" label="Upper Bound Sampling Frequency (%, default=100)" help="(--percentile-ceiling)">
                    <validator type="in_range" min="0" max="100"/>
                </param>
                <param name="percentileStep" type="integer" value="5" label="Sampling increment (default=5)"
                    help="(--percentile-step)">
                    <validator type="in_range" min="0" max="100"/>
                </param>
            </when>
            <when value="default"/>
        </conditional>
        <expand macro="rscript_output_param"/>
    </inputs>
    <outputs>
        <expand macro="pdf_output_data" filename="output.junctionSaturation_plot.pdf" label="${tool.name} on ${on_string}: junction saturation (PDF)"/>
        <expand macro="rscript_output_data" filename="output.junctionSaturation_plot.r" label="${tool.name} on ${on_string}: junction saturation (Rscript)"/>
    </outputs>
    <tests>
        <test expect_num_outputs="2">
            <param name="input" value="pairend_strandspecific_51mer_hg19_chr1_1-100000.bam"/>
            <param name="refgene" value="hg19_RefSeq_chr1_1-100000.bed" ftype="bed12"/>
            <param name="rscript_output" value="true"/>
            <output name="outputr" file="output.junctionSaturation_plot_r" compare="sim_size">
                <assert_contents>
                    <has_line line="pdf('output.junctionSaturation_plot.pdf')"/>
                    <has_line line="x=c(5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100)"/>
                </assert_contents>
            </output>
            <output name="outputpdf" file="output.junctionSaturation_plot.pdf" compare="sim_size"/>
        </test>
    </tests>
    <help><![CDATA[
junction_saturation.py
++++++++++++++++++++++

Before running alternative splicing analyses, it is important to check whether
the current sequencing depth is sufficient. For a well-annotated organism, the
number of genes expressed in a particular tissue is almost fixed, so the number
of splice junctions is also almost fixed, and these junctions can be
predetermined from the reference gene model. In saturated RNA-seq data, all
annotated splice junctions should be rediscovered; otherwise, downstream
alternative splicing analysis is problematic because low-abundance splice
junctions are missing. This module checks for saturation by resampling a series
of percentages of the total alignments from the BAM or SAM file (by default 5%,
10%, 15%, ..., 100%), detecting splice junctions in each subset, and comparing
them to the reference gene model.


Inputs
++++++++++++++

Input BAM/SAM file
    Alignment file in BAM/SAM format.

Reference gene model
    Gene model in BED format.

Sampling Percentiles - Upper Bound, Lower Bound, Sampling Increment (defaults= 100, 5, and 5)
    Sampling starts from the Lower Bound and increments to the Upper Bound at the rate of the Sampling Increment.

Minimum intron length (default=50)
    Minimum intron length (bp).

Minimum coverage (default=1)
    Minimum number of supporting reads to call a junction.

Output
++++++++++++++

1. output.junctionSaturation_plot.r: R script to generate plot
2. output.junctionSaturation_plot.pdf

.. image:: $PATH_TO_IMAGES/junction_saturation.png
   :height: 600 px
   :width: 600 px
   :scale: 80 %

In this example, the current sequencing depth is almost saturated for "known
junction" (red line) detection because the number of known junctions reaches a
plateau. In other words, nearly all known junctions expressed in this
particular tissue have already been detected; further sequencing will not
detect additional known junctions and will only increase junction coverage
(i.e. each junction will be covered by more reads). In contrast, the current
sequencing depth is not saturated for novel junctions (green line).

@ABOUT@

]]>
    </help>
    <expand macro="citations"/>
</tool>
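
As a rough illustration of the sampling options defined in the tool above, the following Python sketch lists the percentages that would be resampled for a given lower bound, upper bound, and increment. The function name `sampling_percentiles` is hypothetical and is not part of RSeQC or Galaxy. The inclusive upper bound is an assumption, inferred from the tool's test, which expects an x vector running from 5 to 100 under the default settings.

```python
def sampling_percentiles(floor=5, ceiling=100, step=5):
    """Return the percentages of alignments to resample.

    Hypothetical helper for illustration only. It assumes the ceiling is
    inclusive, as the x vector expected by the tool's test suggests.
    """
    if not (0 <= floor <= ceiling <= 100):
        raise ValueError("bounds must satisfy 0 <= floor <= ceiling <= 100")
    if step < 1:
        # A zero step could never advance from the floor to the ceiling.
        raise ValueError("step must be a positive integer")
    return list(range(floor, ceiling + 1, step))


if __name__ == "__main__":
    # Default settings: lower bound 5, upper bound 100, increment 5.
    print(sampling_percentiles())
    # Custom settings, as with the "specify" option.
    print(sampling_percentiles(10, 50, 20))
```

Note that the `in_range` validator on `percentileStep` accepts an increment of 0, which this sketch rejects because a zero step cannot move from the lower bound to the upper bound.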