Mercurial > repos > nilesh > rseqc
view read_distribution.xml @ 60:1421603cc95b draft
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/rseqc commit 1dfe55ca83685cadb0ce8f6ebbd8c13232376d1d
author | iuc |
---|---|
date | Sat, 26 Nov 2022 15:19:14 +0000 |
parents | dbedfc5f5a3c |
children | 5968573462fa |
line wrap: on
line source
<tool id="rseqc_read_distribution" name="Read Distribution" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@GALAXY_VERSION@"> <description>calculates how mapped reads were distributed over genome feature</description> <expand macro="bio_tools"/> <macros> <import>rseqc_macros.xml</import> </macros> <expand macro="requirements" /> <expand macro="stdio" /> <version_command><![CDATA[read_distribution.py --version]]></version_command> <command><![CDATA[ @BAM_SAM_INPUTS@ read_distribution.py -i 'input.${extension}' -r '${refgene}' > '${output}' ]]> </command> <inputs> <expand macro="bam_sam_param" /> <expand macro="refgene_param" /> </inputs> <outputs> <data format="txt" name="output" /> </outputs> <tests> <test> <param name="input" value="pairend_strandspecific_51mer_hg19_chr1_1-100000.bam"/> <param name="refgene" value="hg19_RefSeq_chr1_1-100000.bed" ftype="bed12"/> <output name="output" file="output.read_distribution.txt"/> </test> </tests> <help><![CDATA[ read_distribution.py ++++++++++++++++++++ Provided a BAM/SAM file and reference gene model, this module will calculate how mapped reads were distributed over genome feature (like CDS exon, 5'UTR exon, 3' UTR exon, Intron, Intergenic regions). When genome features are overlapped (e.g. a region could be annotated as both exon and intron by two different transcripts) , they are prioritize as: CDS exons > UTR exons > Introns > Intergenic regions, for example, if a read was mapped to both CDS exon and intron, it will be assigned to CDS exons. * "Total Reads": This does NOT include those QC fail,duplicate and non-primary hit reads * "Total Tags": reads spliced once will be counted as 2 tags, reads spliced twice will be counted as 3 tags, etc. And because of this, "Total Tags" >= "Total Reads" * "Total Assigned Tags": number of tags that can be unambiguously assigned the 10 groups (see below table). * Tags assigned to "TSS_up_1kb" were also assigned to "TSS_up_5kb" and "TSS_up_10kb", tags assigned to "TSS_up_5kb" were also assigned to "TSS_up_10kb". Therefore, "Total Assigned Tags" = CDS_Exons + 5'UTR_Exons + 3'UTR_Exons + Introns + TSS_up_10kb + TES_down_10kb. * When assign tags to genome features, each tag is represented by its middle point. RSeQC cannot assign those reads that: * hit to intergenic regions that beyond region starting from TSS upstream 10Kb to TES downstream 10Kb. * hit to regions covered by both 5'UTR and 3' UTR. This is possible when two head-to-tail transcripts are overlapped in UTR regions. * hit to regions covered by both TSS upstream 10Kb and TES downstream 10Kb. Inputs ++++++++++++++ Input BAM/SAM file Alignment file in BAM/SAM format. Reference gene model Gene model in BED format. Sample Output ++++++++++++++ Output: =============== ============ =========== =========== Group Total_bases Tag_count Tags/Kb =============== ============ =========== =========== CDS_Exons 33302033 20002271 600.63 5'UTR_Exons 21717577 4408991 203.01 3'UTR_Exons 15347845 3643326 237.38 Introns 1132597354 6325392 5.58 TSS_up_1kb 17957047 215331 11.99 TSS_up_5kb 81621382 392296 4.81 TSS_up_10kb 149730983 769231 5.14 TES_down_1kb 18298543 266161 14.55 TES_down_5kb 78900674 729997 9.25 TES_down_10kb 140361190 896882 6.39 =============== ============ =========== =========== @ABOUT@ ]]> </help> <expand macro="citations" /> </tool>