<tool id="RPKM_saturation" name="RPKM Saturation" version="1.1">
    <description>calculates raw count and RPKM values for transcripts at the exon, intron, and mRNA level</description>
    <requirements>
        <requirement type="package" version="2.11.0">R</requirement>
        <requirement type="package" version="1.7.1">numpy</requirement>
        <requirement type="package" version="2.3.7">rseqc</requirement>
    </requirements>
    <command> RPKM_saturation.py -i $input -o output -r $refgene

        #if str($strand_type.strand_specific) == "pair"
            -d
            #if str($strand_type.pair_type) == "sd"
                '1++,1--,2+-,2-+'
            #else
                '1+-,1-+,2++,2--'
            #end if
        #end if

        #if str($strand_type.strand_specific) == "single"
            -d
            #if str($strand_type.single_type) == "s"
                '++,--'
            #else
                '+-,-+'
            #end if
        #end if

        -l $percentileFloor -u $percentileCeiling -s $percentileStep -c $rpkmCutoff

    </command>
    <inputs>
        <param name="input" type="data" format="bam" label="Input BAM/SAM file" />
        <param name="refgene" type="data" format="bed" label="Reference gene model" />
        <conditional name="strand_type">
            <param name="strand_specific" type="select" label="Strand-specific?" value="none">
                <option value="none">None</option>
                <option value="pair">Paired-End RNA-seq</option>
                <option value="single">Single-End RNA-seq</option>
            </param>
            <when value="pair">
                <param name="pair_type" type="select" display="radio" label="Paired-End Read Type (format: mapped --> parent)" value="sd">
                    <option value="sd">read1 (positive --> positive; negative --> negative), read2 (positive --> negative; negative --> positive)</option>
                    <option value="ds">read1 (positive --> negative; negative --> positive), read2 (positive --> positive; negative --> negative)</option>
                </param>
            </when>
            <when value="single">
                <param name="single_type" type="select" display="radio" label="Single-End Read Type (format: mapped --> parent)" value="s">
                    <option value="s">positive --> positive; negative --> negative</option>
                    <option value="d">positive --> negative; negative --> positive</option>
                </param>
            </when>
            <when value="none"></when>
        </conditional>
        <param name="percentileFloor" type="integer" value="5" label="Begin sampling from this percentile (default=5)" />
        <param name="percentileCeiling" type="integer" value="100" label="End sampling at this percentile (default=100)" />
        <param name="percentileStep" type="integer" value="5" label="Sampling step size (default=5)" />
        <param name="rpkmCutoff" type="text" value="0.01" label="Ignore transcripts with RPKM smaller than this number (default=0.01)" />
    </inputs>
    <outputs>
        <data format="xls" name="outputxls" from_work_dir="output.eRPKM.xls" label="${tool.name} on ${on_string} (RPKM XLS)"/>
        <data format="xls" name="outputrawxls" from_work_dir="output.rawCount.xls" label="${tool.name} on ${on_string} (Raw Count XLS)"/>
        <data format="r" name="outputr" from_work_dir="output.saturation.r" label="${tool.name} on ${on_string} (R Script)"/>
        <data format="pdf" name="outputpdf" from_work_dir="output.saturation.pdf" label="${tool.name} on ${on_string} (PDF)"/>
    </outputs>
    <stdio>
        <exit_code range="1:" level="fatal" description="An error occurred during execution, see stderr and stdout for more information" />
        <regex match="[Ee]rror" source="both" description="An error occurred during execution, see stderr and stdout for more information" />
    </stdio>
    <help>
RPKM_saturation.py
++++++++++++++++++

The precision of any sample statistic (such as RPKM) is affected by sample size (sequencing depth).
'Resampling' or 'jackknifing' is a method to estimate the precision of sample statistics by
using subsets of the available data. This module resamples a series of subsets from the total RNA
reads and then calculates RPKM values using each subset. This makes it possible to check whether
the current sequencing depth was saturated, i.e. whether the RPKM values were stable,
in terms of gene expression estimation. If sequencing depth was saturated, the estimated
RPKM value will be stationary or reproducible. By default, this module calculates 20
RPKM values (using 5%, 10%, ..., 95%, 100% of total reads) for each transcript.
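
As an illustration of the resampling step, here is a minimal Python sketch; it is not the RSeQC implementation. It assumes reads have already been assigned to transcripts (the hypothetical ``read_to_tx`` argument holds one transcript ID per mapped read, and the hypothetical ``tx_length`` maps transcript IDs to exonic length in bp) and uses the standard RPKM definition, 10^9 x count / (transcript length x mapped reads in the subset). The hypothetical ``floor``, ``ceiling`` and ``step`` arguments play the role of the tool's ``-l``, ``-u`` and ``-s`` options::

    import random
    from collections import Counter


    def rpkm(count, length_bp, total_reads):
        # Reads Per Kilobase of transcript per Million mapped reads
        return count * 1e9 / (length_bp * total_reads)


    def resampled_rpkm(read_to_tx, tx_length, floor=5, ceiling=100, step=5, seed=0):
        """Return {percent: {transcript: RPKM}} for each sampling percentage."""
        rng = random.Random(seed)
        reads = list(read_to_tx)
        rng.shuffle(reads)  # shuffle once, then take growing prefixes
        results = {}
        for pct in range(floor, ceiling + 1, step):
            n = len(reads) * pct // 100  # number of reads in this subset
            if n == 0:
                continue  # too few reads to sample this percentage
            counts = Counter(reads[:n])
            results[pct] = {tx: rpkm(counts[tx], tx_length[tx], n) for tx in tx_length}
        return results

Taking prefixes of a single shuffled list means each subset contains the smaller ones; this nesting is a design choice of the sketch. With the defaults, ``range(5, 101, 5)`` yields the 20 sampling percentages described above.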

In the output figure, the Y axis is "Percent Relative Error" or "Percent Error", which
measures how far the RPKM estimated from a subset of reads (i.e. RPKMobs) deviates from the real
expression level (i.e. RPKMreal). However, in practice one cannot know RPKMreal. As a
proxy, we use the RPKM estimated from total reads to approximate RPKMreal.

.. image:: http://rseqc.sourceforge.net/_images/RelativeError.png
   :height: 80 px
   :width: 400 px
   :scale: 100 %
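
The percent relative error on the Y axis can be written as 100 x abs(RPKMobs - RPKMreal) / RPKMreal, with the full-data RPKM standing in for RPKMreal; this is the same expression used in the R example in this help. A minimal Python sketch for one transcript follows (the function name is hypothetical, and the sketch assumes the full-data RPKM is non-zero)::

    def percent_relative_error(rpkm_series):
        """Percent relative error of each resampled RPKM value.

        rpkm_series holds one transcript's RPKM values ordered by sampling
        percentage; the last entry (100% of reads) is the proxy for RPKMreal
        and is assumed to be non-zero.
        """
        real = rpkm_series[-1]
        return [100 * abs(obs - real) / real for obs in rpkm_series]

For example, ``percent_relative_error([50.0, 90.0, 100.0])`` returns ``[50.0, 10.0, 0.0]``.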

Inputs
++++++++++++++

Input BAM/SAM file
    Alignment file in BAM/SAM format.

Reference gene model
    Gene model in BED format.

Strand sequencing type (default=none)
    See the Infer Experiment tool if uncertain.
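
The strandedness choices are translated by this tool's command template into the ``-d`` rule string passed to RPKM_saturation.py. Each rule reads as read number (paired-end only), strand the read mapped to, then strand of the parent gene, matching the "mapped --> parent" labels in the form. The lookup below is only an illustrative Python restatement of that template logic; ``strand_rule_args`` is a hypothetical name::

    def strand_rule_args(strand_specific, read_type=None):
        """Extra RPKM_saturation.py arguments for a strandedness setting.

        strand_specific: "none", "pair" or "single" (the form's select values).
        read_type: "sd"/"ds" for paired-end, "s"/"d" for single-end.
        """
        rules = {
            ("pair", "sd"): "1++,1--,2+-,2-+",
            ("pair", "ds"): "1+-,1-+,2++,2--",
            ("single", "s"): "++,--",
            ("single", "d"): "+-,-+",
        }
        if strand_specific == "none":
            return []  # unstranded: the template passes no -d option
        return ["-d", rules[(strand_specific, read_type)]]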

Options
++++++++++++++

Skip Multiple Hit Reads
    Use multiple-hit reads, or use only uniquely mapped reads.

Only use exonic reads
    Use only exonic (UTR exon and CDS exon) reads; otherwise, all reads are used.

Output
++++++++++++++

1. output.eRPKM.xls: RPKM values for each transcript
2. output.rawCount.xls: Raw count for each transcript
3. output.saturation.r: R script to generate the plot
4. output.saturation.pdf: the saturation plot, for example:

.. image:: http://rseqc.sourceforge.net/_images/saturation.png
   :height: 600 px
   :width: 600 px
   :scale: 80 %

- All transcripts are sorted in ascending order according to expression level (RPKM), then divided into 4 groups:

  1. Q1 (0-25%): Transcripts with expression level ranked below the 25th percentile.
  2. Q2 (25-50%): Transcripts with expression level ranked between the 25th and 50th percentiles.
  3. Q3 (50-75%): Transcripts with expression level ranked between the 50th and 75th percentiles.
  4. Q4 (75-100%): Transcripts with expression level ranked above the 75th percentile.

- BAM/SAM files containing more than 100 million alignments will make this module very slow.
- Follow the example below to visualize a particular transcript (using the R console)::

    pdf("xxx.pdf")  # start the graphics device driver for producing PDF graphics
    x = seq(5, 100, 5)  # resampling percentages (5, 10, 15, ..., 100)
    # paste the RPKM values calculated from each subset
    rpkm = c(32.95, 35.43, 35.15, 36.04, 36.41, 37.76, 38.96, 38.62, 37.81, 38.14,
             37.97, 38.58, 38.59, 38.54, 38.67, 38.67, 38.87, 38.68, 38.42, 38.23)
    final = rpkm[length(rpkm)]  # RPKM from 100% of reads, the proxy for RPKMreal
    scatter.smooth(x, 100 * abs(rpkm - final) / final, type = "p",
                   ylab = "Percent Relative Error", xlab = "Resampling Percentage")
    dev.off()  # close the graphics device

.. image:: http://rseqc.sourceforge.net/_images/saturation_eg.png
   :height: 600 px
   :width: 600 px
   :scale: 80 %
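
The quartile grouping used in the saturation plot can be sketched in Python as below. This is an illustration only: it splits transcripts by rank into four groups of (nearly) equal size, and the exact boundary and tie handling in RSeQC may differ. The function name and input format (a dict from transcript ID to full-data RPKM) are hypothetical::

    def quartile_groups(rpkm_by_tx):
        """Split transcripts into Q1..Q4 by ascending RPKM rank."""
        ranked = sorted(rpkm_by_tx, key=rpkm_by_tx.get)  # lowest expression first
        n = len(ranked)
        groups = {"Q1": [], "Q2": [], "Q3": [], "Q4": []}
        for rank, tx in enumerate(ranked):
            quarter = 4 * rank // n  # 0, 1, 2 or 3
            groups[f"Q{quarter + 1}"].append(tx)
        return groups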

-----

About RSeQC
+++++++++++

The RSeQC_ package provides a number of useful modules that can comprehensively evaluate high-throughput sequence data, especially RNA-seq data. "Basic modules" quickly inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while "RNA-seq specific modules" investigate sequencing saturation status of both splicing junction detection and expression estimation, mapped reads clipping profile, mapped reads distribution, coverage uniformity over gene body, reproducibility, strand specificity and splice junction annotation.

The RSeQC package is licensed under the GNU GPL v3 license.

.. image:: http://rseqc.sourceforge.net/_static/logo.png

.. _RSeQC: http://rseqc.sourceforge.net/

    </help>
</tool>