comparison RPKM_saturation.xml @ 31:cc5eaa9376d8

Lance's updates
author nilesh
date Wed, 02 Oct 2013 02:20:04 -0400
parents 907d4b021ff6
children 580ee0c4bc4e
30:b5d2f575ccb6 31:cc5eaa9376d8
1 <tool id="RPKM_saturation" name="RPKM Saturation"> 1 <tool id="RPKM_saturation" name="RPKM Saturation" version="1.1">
2 <description>calculates raw count and RPKM values for transcript at exon, intron, and mRNA level</description> 2 <description>calculates raw count and RPKM values for transcript at exon, intron, and mRNA level</description>
3 <requirements> 3 <requirements>
4 <requirement type="package" version="2.15.1">R</requirement> 4 <requirement type="package" version="2.11.0">R</requirement>
5 <requirement type="package" version="1.7.1">numpy</requirement>
5 <requirement type="package" version="2.3.7">rseqc</requirement> 6 <requirement type="package" version="2.3.7">rseqc</requirement>
6 </requirements> 7 </requirements>
7 <command interpreter="python"> RPKM_saturation.py -i $input -o output -r $refgene 8 <command> RPKM_saturation.py -i $input -o output -r $refgene
8 9
9 #if str($strand_type.strand_specific) == "pair" 10 #if str($strand_type.strand_specific) == "pair"
10 -d 11 -d
11 #if str($strand_type.pair_type) == "sd" 12 #if str($strand_type.pair_type) == "sd"
12 '1++,1--,2+-,2-+' 13 '1++,1--,2+-,2-+'
54 <param name="percentileCeiling" type="integer" value="100" label="End sampling at this percentile (default=100)" /> 55 <param name="percentileCeiling" type="integer" value="100" label="End sampling at this percentile (default=100)" />
55 <param name="percentileStep" type="integer" value="5" label="Sampling step size (default=5)" /> 56 <param name="percentileStep" type="integer" value="5" label="Sampling step size (default=5)" />
56 <param name="rpkmCutoff" type="text" value="0.01" label="Ignore transcripts with RPKM smaller than this number (default=0.01)" /> 57 <param name="rpkmCutoff" type="text" value="0.01" label="Ignore transcripts with RPKM smaller than this number (default=0.01)" />
57 </inputs> 58 </inputs>
58 <outputs> 59 <outputs>
59 <data format="xls" name="outputxls" from_work_dir="output.eRPKM.xls"/> 60 <data format="xls" name="outputxls" from_work_dir="output.eRPKM.xls" label="${tool.name} on ${on_string} (RPKM XLS)"/>
60 <data format="xls" name="outputrawxls" from_work_dir="output.rawCount.xls"/> 61 <data format="xls" name="outputrawxls" from_work_dir="output.rawCount.xls" label="${tool.name} on ${on_string} (Raw Count XLS)"/>
61 <data format="r" name="outputr" from_work_dir="output.saturation.r"/> 62 <data format="r" name="outputr" from_work_dir="output.saturation.r" label="${tool.name} on ${on_string} (R Script)"/>
62 <data format="pdf" name="outputpdf" from_work_dir="output.saturation.pdf"/> 63 <data format="pdf" name="outputpdf" from_work_dir="output.saturation.pdf" label="${tool.name} on ${on_string} (PDF)"/>
63 </outputs> 64 </outputs>
65 <stdio>
66 <exit_code range="1:" level="fatal" description="An error occurred during execution, see stderr and stdout for more information" />
67 <regex match="[Ee]rror" source="both" description="An error occurred during execution, see stderr and stdout for more information" />
68 </stdio>
64 <help> 69 <help>
65 .. image:: https://code.google.com/p/rseqc/logo?cct=1336721062 70 RPKM_saturation.py
71 ++++++++++++++++++
66 72
67 ----- 73 The precision of any sample statistics (RPKM) is affected by sample size (sequencing depth);
74 \'resampling\' or \'jackknifing\' is a method to estimate the precision of sample statistics by
75 using subsets of available data. This module will resample a series of subsets from total RNA
76 reads and then calculate RPKM value using each subset. By doing this we are able to check if
77 the current sequencing depth was saturated or not (or if the RPKM values were stable or not)
78 in terms of genes' expression estimation. If sequencing depth was saturated, the estimated
79 RPKM value will be stationary or reproducible. By default, this module will calculate 20
80 RPKM values (using 5%, 10%, ..., 95%, 100% of total reads) for each transcript.
68 81
69 About RSeQC 82 In the output figure, Y axis is "Percent Relative Error" or "Percent Error" which is used
70 +++++++++++ 83 to measure how the RPKM estimated from a subset of reads (i.e. RPKMobs) deviates from the real
84 expression level (i.e. RPKMreal). However, in practice one cannot know the RPKMreal. As a
85 proxy, we use the RPKM estimated from total reads to approximate RPKMreal.
71 86
72 The RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data. “Basic modules” quickly inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while “RNA-seq specific modules” investigate sequencing saturation status of both splicing junction detection and expression estimation, mapped reads clipping profile, mapped reads distribution, coverage uniformity over gene body, reproducibility, strand specificity and splice junction annotation. 87 .. image:: http://rseqc.sourceforge.net/_images/RelativeError.png
73 88 :height: 80 px
74 The RSeQC package is licensed under the GNU GPL v3 license. 89 :width: 400 px
90 :scale: 100 %
75 91
76 Inputs 92 Inputs
77 ++++++++++++++ 93 ++++++++++++++
78 94
79 Input BAM/SAM file 95 Input BAM/SAM file
100 1. output.eRPKM.xls: RPKM values for each transcript 116 1. output.eRPKM.xls: RPKM values for each transcript
101 2. output.rawCount.xls: Raw count for each transcript 117 2. output.rawCount.xls: Raw count for each transcript
102 3. output.saturation.r: R script to generate plot 118 3. output.saturation.r: R script to generate plot
103 4. output.saturation.pdf: 119 4. output.saturation.pdf:
104 120
105 .. image:: http://dldcc-web.brc.bcm.edu/lilab/liguow/RSeQC/figure/saturation.png 121 .. image:: http://rseqc.sourceforge.net/_images/saturation.png
122 :height: 600 px
123 :width: 600 px
124 :scale: 80 %
106 125
107 - All transcripts were sorted in ascending order according to expression level (RPKM). Then they are divided into 4 groups: 126 - All transcripts were sorted in ascending order according to expression level (RPKM). Then they are divided into 4 groups:
108 1. Q1 (0-25%): Transcripts with expression level ranked below 25 percentile. 127 1. Q1 (0-25%): Transcripts with expression level ranked below 25 percentile.
109 2. Q2 (25-50%): Transcripts with expression level ranked between 25 percentile and 50 percentile. 128 2. Q2 (25-50%): Transcripts with expression level ranked between 25 percentile and 50 percentile.
110 3. Q3 (50-75%): Transcripts with expression level ranked between 50 percentile and 75 percentile. 129 3. Q3 (50-75%): Transcripts with expression level ranked between 50 percentile and 75 percentile.
111 4. Q4 (75-100%): Transcripts with expression level ranked above 75 percentile. 130 4. Q4 (75-100%): Transcripts with expression level ranked above 75 percentile.
112 - A BAM/SAM file containing more than 100 million alignments will make the module very slow. 131 - A BAM/SAM file containing more than 100 million alignments will make the module very slow.
113 - Follow example below to visualize a particular transcript (using R console):: 132 - Follow example below to visualize a particular transcript (using R console)::
114 - output example 133
115 .. image:: http://dldcc-web.brc.bcm.edu/lilab/liguow/RSeQC/figure/saturation_eg.png 134 pdf("xxx.pdf") #starts the graphics device driver for producing PDF graphics
135 x &lt;- seq(5,100,5) #resampling percentage (5,10,15,...,100)
136 rpkm &lt;- c(32.95,35.43,35.15,36.04,36.41,37.76,38.96,38.62,37.81,38.14,37.97,38.58,38.59,38.54,38.67, 38.67,38.87,38.68, 38.42, 38.23) #Paste RPKM values calculated from each subset
137 scatter.smooth(x,100*abs(rpkm-rpkm[length(rpkm)])/(rpkm[length(rpkm)]),type="p",ylab="Percent Relative Error",xlab="Resampling Percentage")
138 dev.off() #close graphical device
139
140 .. image:: http://rseqc.sourceforge.net/_images/saturation_eg.png
141 :height: 600 px
142 :width: 600 px
143 :scale: 80 %
144
145 -----
146
147 About RSeQC
148 +++++++++++
149
150 The RSeQC_ package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data. "Basic modules" quickly inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while "RNA-seq specific modules" investigate sequencing saturation status of both splicing junction detection and expression estimation, mapped reads clipping profile, mapped reads distribution, coverage uniformity over gene body, reproducibility, strand specificity and splice junction annotation.
151
152 The RSeQC package is licensed under the GNU GPL v3 license.
153
154 .. image:: http://rseqc.sourceforge.net/_static/logo.png
155
156 .. _RSeQC: http://rseqc.sourceforge.net/
157
116 158
117 </help> 159 </help>
118 </tool> 160 </tool>
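
The "Percent Relative Error" described in the new help text can be illustrated outside of R. The following is a minimal Python sketch, not part of RSeQC or of this changeset: the function names `resampling_percentages` and `percent_relative_error` are hypothetical, and the RPKM values are the ones pasted into the R example in the help text. As in that example, the RPKM obtained from 100% of the reads is assumed to stand in for RPKMreal.

```python
# Illustrative sketch only: these helper names are hypothetical and do not
# exist in RSeQC. They mirror the R snippet shown in the tool's help text.

def resampling_percentages(start=5, stop=100, step=5):
    """Percentages at which reads are subsampled (like seq(5,100,5) in R)."""
    return list(range(start, stop + 1, step))


def percent_relative_error(rpkm_values):
    """Percent relative error of each RPKM against the last value.

    The last value (computed from all reads) is used as a proxy for the
    real expression level, as the help text describes.
    """
    reference = rpkm_values[-1]
    return [100.0 * abs(value - reference) / reference for value in rpkm_values]


# RPKM values copied from the R example in the help text (one per subset).
rpkm = [32.95, 35.43, 35.15, 36.04, 36.41, 37.76, 38.96, 38.62, 37.81, 38.14,
        37.97, 38.58, 38.59, 38.54, 38.67, 38.67, 38.87, 38.68, 38.42, 38.23]

percentages = resampling_percentages()
errors = percent_relative_error(rpkm)

for pct, err in zip(percentages, errors):
    print(f"{pct:3d}% of reads: {err:5.2f}% relative error")
```

The error at the final point is zero by construction, because the full-data RPKM is compared with itself; a curve that flattens well before 100% suggests the sequencing depth is close to saturation for that transcript.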