# HG changeset patch # User rouan # Date 1388054125 18000 # Node ID 22c941d89a89256eb6ecbc1e03cb6ff05ea84118 # Parent d7253a7818fb50445ae252eb798703949eff1f1f Uploaded diff -r d7253a7818fb -r 22c941d89a89 edgeR.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/edgeR.xml Thu Dec 26 05:35:25 2013 -0500 @@ -0,0 +1,214 @@ + + - Estimates differential gene expression for short read sequence count using methods appropriate for count data + + edgeR + limma + + + edgeR.pl -a $analysis_type.analysis -e $html_file.files_path -f $fdr -h $html_file -o $output + ## Pairwise comparisons + #if $analysis_type.analysis == "pw": + -r $analysis_type.rowsumfilter + #if $analysis_type.tagwise_disp.twd == "TRUE": + -u $analysis_type.tagwise_disp.twd_trend + -t + #end if + ## GLM + #else if $analysis_type.analysis == "glm": + #if $analysis_type.exp.export_norm == "true": + -n $norm_exp + #end if + -d $analysis_type.disp + $analysis_type.cont_pw + #for $fct in $analysis_type.factors: + factor::${$fct.fact_name}::${$fct.fact} + #end for + #for $c in $analysis_type.cont_pred: + cp::${c.cp_name}::${c.cp} + #end for + #for $cnt in $analysis_type.contrasts: + "cnt::${cnt.add_cont}" + #end for + ## LIMMA + #else + #if $analysis_type.exp.export_norm == "true": + -n $norm_exp $analysis_type.exp.log + #end if + $analysis_type.cont_pw + #for $fct in $analysis_type.factors: + factor::${$fct.fact_name}::${$fct.fact} + #end for + #for $c in $analysis_type.cont_pred: + cp::${c.cp_name}::${c.cp} + #end for + #for $cnt in $analysis_type.contrasts: + "cnt::${cnt.add_cont}" + #end for + #end if + $matrix + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + analysis_type[ "analysis" ] != "pw" and analysis_type[ "exp" ][ "export_norm" ] == "true" + + + + + +.. 
class:: infomark + +**What it does** + +Estimates differential gene expression for short read sequence count using methods appropriate for count data. +If you have paired data you may also want to consider Tophat/Cufflinks. +Input must be raw count data for each sequence arranged in a rectangular matrix as a tabular file. +Note - no scaling - please make sure you have untransformed raw counts of reads for each sequence. + +Performs digital differential gene expression analysis between groups (eg a treatment and control). +Biological replicates provide information about experimental variability required for reliable inference. + +**What it does not do** +edgeR_ requires biological replicates. +Without replicates you can't account for known important experimental sources of variability that the approach implemented here requires. + + +**Input** +A count matrix containing sequence names as rows and sample-specific counts of reads from this sequence as columns. +The matrix must have 2 header rows, the first indicating the group assignment and the second uniquely identifying the samples. It must also contain a unique set of (eg Feature) names in the first column. + +Example:: + + # G1:Mut G1:Mut G1:Mut G2:WT G2:WT G2:WT + #Feature Spl1 Spl2 Spl3 Spl4 Spl5 Spl6 + NM_001001130 97 43 61 34 73 26 + NM_001001144 25 8 9 3 5 5 + NM_001001152 72 45 29 20 31 13 + NM_001001160 0 1 1 1 0 0 + NM_001001177 0 1 0 4 3 3 + NM_001001178 0 2 1 0 4 0 + NM_001001179 0 0 0 0 0 2 + NM_001001180 0 0 0 0 0 2 + NM_001001181 415 319 462 185 391 155 + NM_001001182 1293 945 987 297 938 496 + NM_001001183 5 4 11 7 11 2 + NM_001001184 135 198 178 110 205 64 + NM_001001185 186 1 0 1 1 0 + NM_001001186 75 90 91 34 63 54 + NM_001001187 267 236 170 165 202 51 + NM_001001295 5 2 6 1 7 0 + NM_001001309 1 0 0 1 2 1 + ... + + +Please use the "Count reads in features with htseq-count" tool to generate the count matrix. 
+ +**Output** + +A tabular file containing relative expression levels, statistical estimates of differential expression probability, R scripts, log, and some helpful diagnostic plots. + +.. class:: infomark + +**Attribution** +This tool wraps the edgeR_ Bioconductor package so all calculations and plots are controlled by that code. See edgeR_ for all documentation and appropriate attribution. +Recommended reference is Mark D. Robinson, Davis J. McCarthy, Gordon K. Smyth, PMCID: PMC2796818 + +.. class:: infomark + +**Attribution** +When applying the LIMMA (Linear models for RNA-Seq) analysis the tool also makes use of the limma_ Bioconductor package. +Recommended reference is Smyth, G. K. (2005). Limma: linear models for microarray data. In: 'Bioinformatics and Computational Biology Solutions using R and Bioconductor'. R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber (eds), Springer, New York, pages 397--420. + + .. _edgeR: http://www.bioconductor.org/packages/release/bioc/html/edgeR.html + .. _limma: http://www.bioconductor.org/packages/release/bioc/html/limma.html + + + + +