# HG changeset patch
# User pravs
# Date 1529224092 14400
# Node ID 796a42e10f77c7061e3439bf1fe16ba3a406f471
# Parent fc89f8c3b777c252b63f6d0bd4f452eb228ac1ab
planemo upload

diff -r fc89f8c3b777 -r 796a42e10f77 test_data/PE_abundance_GE_abundance_pearson.html
--- a/test_data/PE_abundance_GE_abundance_pearson.html	Sun Jun 17 04:20:06 2018 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,56 +0,0 @@

Association between proteomics and transcriptomics data


Input data summary

Download mapped unmapped data

Filtering

Check for NA, Inf, or -Inf in either the Transcriptome or Proteome data; any entry containing such a value is removed.
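The filtering step described above can be sketched as follows. This is a minimal illustration, not the code that produced this report: the function name `filter_finite_pairs` and the use of `None` to stand in for NA are assumptions made for the example.

```python
import math

def filter_finite_pairs(pe_values, ge_values):
    """Keep only index-aligned (PE, GE) pairs where both abundances are finite."""
    kept = []
    for pe, ge in zip(pe_values, ge_values):
        # None stands in for NA here (an assumption of this sketch).
        if pe is None or ge is None:
            continue
        # math.isfinite is False for NaN, Inf and -Inf.
        if math.isfinite(pe) and math.isfinite(ge):
            kept.append((pe, ge))
    return kept

pe = [0.5, float("nan"), 1.2, float("inf"), -0.3]
ge = [0.1, 0.4, None, 0.2, float("-inf")]
print(filter_finite_pairs(pe, ge))
```

An entry is dropped as soon as either of its two values is missing or non-finite, so the proteome and transcriptome vectors stay aligned after filtering.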

Filtered data summary

Excluding entries with abundance values: NA/Inf/-Inf

Proteome data summary

Parameter  Value
Min.       -2.98277
1st Qu.    -0.40393
Median     -0.07986
Mean        0.00000
3rd Qu.     0.26061
Max.       15.13211

Transcriptome data summary

Parameter  Value
Min.       -8.33003
1st Qu.    -0.06755
Median      0.09635
Mean        0.00000
3rd Qu.     0.18103
Max.        8.50430

Distribution of Proteome and Transcriptome abundance (Box plot and Density plot)


Scatter plot between Proteome and Transcriptome Abundance


Correlation with all data

Parameter                Method 1                              Method 2                         Method 3
Correlation method used  Pearson's product-moment correlation  Spearman's rank correlation rho  Kendall's rank correlation tau
Correlation              -0.003584536                          0.01866248                       0.01280742
Pvalue                   0.8457255                             0.3110035                        0.314683
*Note that correlation is sensitive to outliers, so it is important to analyze outliers and influential observations in the data.
Below we use a Cook's distance based approach to identify such influential observations.
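To illustrate the note on outlier sensitivity, here is a small pure-Python sketch of Pearson and Spearman correlation. The helper names and the toy data are assumptions for illustration only; the values in this report come from the tool's own computation, not from this code.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data (hypothetical): identical except for the last y value.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_clean = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
y_outlier = [1.1, 1.9, 3.2, 3.9, 5.1, -30.0]

print(pearson(x, y_clean), pearson(x, y_outlier))
print(spearman(x, y_clean), spearman(x, y_outlier))
```

In this toy example a single extreme value is enough to flip the sign of the Pearson coefficient, while the rank-based coefficient drops but stays positive.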

Linear Regression model fit between Proteome and Transcriptome data

Assuming a linear relationship between Proteome and Transcriptome data, we fit a linear regression model here.

Parameter                Value
Formula                  PE_abundance~GE_abundance
Coefficients
(Intercept)              1.727289e-16 (Pvalue: 1)
GE_abundance             -0.003584536 (Pvalue: 0.8457255)
Model parameters
Residual standard error  1.000163 (2947 degrees of freedom)
F-statistic              0.0378662 (on 1 and 2947 degrees of freedom)
R-squared                1.28489e-05
Adjusted R-squared       -0.0003264749
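The model parameters in the table above are tied together by standard identities for a simple linear regression with one predictor. The sketch below recomputes R-squared, the F-statistic, and adjusted R-squared from the reported Pearson correlation and the residual degrees of freedom. The variable names are illustrative; only the numbers are taken from this report.

```python
# Values copied from the correlation and regression tables in this report.
pearson_r = -0.003584536
residual_df = 2947
n = residual_df + 2  # two estimated parameters: intercept and slope

# For a regression with a single predictor, R-squared equals r squared.
r_squared = pearson_r ** 2

# F-statistic on 1 and residual_df degrees of freedom.
f_stat = r_squared / (1 - r_squared) * residual_df

# Adjusted R-squared penalises the one predictor.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / residual_df

print(r_squared, f_stat, adj_r_squared)
```

The recomputed values agree with the reported ones to the printed precision, which is a quick sanity check that the table is internally consistent.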

Plotting various regression diagnostics plots


Residuals vs Fitted plot

This plot checks the linearity assumption. A roughly horizontal line with no distinct pattern in the residuals indicates a linear relationship.

Normal Q-Q plot of residuals

This plot checks whether the residuals are normally distributed. Ideally, the residual points follow the straight dashed line without deviating much from it.

Scale-Location (or Spread-Location) plot


This plot checks for homogeneity of residual variance (homoscedasticity). A horizontal line observed with equally spread residual points is a good indication of homoscedasticity.

Residuals vs Leverage plot

This plot is useful for identifying influential cases, that is, outliers or extreme values that might change the regression results depending on whether they are included in or excluded from the analysis.

Identify influential observations

Cook’s distance measures the influence of each observation on the predicted outcome, i.e., how much that observation influences the fitted values.
As a rule of thumb, observations with a Cook’s distance greater than 4 times the mean Cook’s distance may be classified as influential.


In the above plot, observations above the red line (4 * mean Cook's distance) are influential and are marked with *. Genes that are outliers could be important. These observations influence the correlation values and regression coefficients.

Parameter                                                                    Value
Mean cook's distance                                                         0.0002988385
Total influential observations (cook's distance > 4 * mean cook's distance)  90
Total influential observations (cook's distance > 3 * mean cook's distance)  116
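With the mean reported above, the 4 * mean cutoff works out to 4 * 0.0002988385 ≈ 0.001195. The following is a minimal sketch of how Cook's distance and this cutoff can be computed for a one-predictor linear regression. The function names and toy data are assumptions for illustration; it uses the textbook closed form (squared residual scaled by leverage) rather than whatever routine generated this report.

```python
def cooks_distances(x, y):
    """Cook's distance for each observation of the least-squares fit y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    p = 2  # number of fitted parameters: intercept and slope
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance
    distances = []
    for a, e in zip(x, residuals):
        h = 1 / n + (a - mx) ** 2 / sxx  # leverage of this observation
        distances.append(e * e / (p * s2) * h / (1 - h) ** 2)
    return distances

def influential_indices(distances, multiplier=4.0):
    """Indices whose distance exceeds multiplier * mean distance."""
    threshold = multiplier * sum(distances) / len(distances)
    return [i for i, d in enumerate(distances) if d > threshold]

# Toy data (hypothetical): y equals x except for the last point.
x = [float(v) for v in range(1, 11)]
y = x[:-1] + [30.0]

print(influential_indices(cooks_distances(x, y)))
```

Passing `multiplier=3.0` gives the looser cutoff that is also tabulated above.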

Top 10 influential observations (cook's distance > 4 * mean cook's distance)

Download entire list
PE_ID PE_abundance GE_ID GE_abundance cooksd
ENSMUSP00000107109.2 -0.4719799 ENSMUST00000001126 -5.301664 0.001213545
ENSMUSP00000151536.1 3.113811 ENSMUST00000001256 -0.6348804 0.00230483
ENSMUSP00000150261.1 2.914045 ENSMUST00000001583 0.4988006 0.001801232
ENSMUSP00000111204.1 2.850989 ENSMUST00000002073 0.09635024 0.001391751
ENSMUSP00000089336.4 1.219945 ENSMUST00000002391 -2.47573 0.001781417
ENSMUSP00000030805.7 -0.8313093 ENSMUST00000003469 3.660597 0.001650483
ENSMUSP00000011492.8 -0.3735374 ENSMUST00000004326 -7.366491 0.001556623
ENSMUSP00000029658.7 9.120211 ENSMUST00000004473 0.09635024 0.01423993
ENSMUSP00000099904.4 -1.913743 ENSMUST00000004673 0.9756628 0.001209039
ENSMUSP00000081956.8 3.674308 ENSMUST00000005607 1.306612 0.006223403

Scatter plot between Proteome and Transcriptome Abundance, after removal of outliers/influential observations


Correlation with removal of outliers / influential observations

We removed the influential observations and re-estimated the correlation values.

Parameter                Method 1                              Method 2                         Method 3
Correlation method used  Pearson's product-moment correlation  Spearman's rank correlation rho  Kendall's rank correlation tau
Correlation              0.01485058                            0.0246989                        0.01689519
Pvalue                   0.4273403                             0.1867467                        0.1918906

Heatmap of PE and GE abundance values


K-means clustering

Number of Clusters: 5
Download cluster list
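For readers unfamiliar with the method, here is a minimal sketch of K-means (Lloyd's algorithm) on two-dimensional points such as (PE_abundance, GE_abundance) pairs. The function name, the explicit `initial_centroids` argument, and the toy data with three groups are all assumptions for illustration; the report itself only states that 5 clusters were used and does not say how the centroids were initialised.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm on 2-D points; returns (labels, centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(
                range(len(centroids)),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2,
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids

# Toy data (hypothetical): three well-separated groups of three points.
points = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
    (-5.0, 5.0), (-5.1, 4.8), (-4.9, 5.2),
]
labels, centroids = kmeans(points, [points[0], points[3], points[6]])
print(labels)
```

The result of K-means depends on the starting centroids, so a real run would normally try several initialisations and keep the best one.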

Other regression model fitting


Comparison of model fits

Model MAE MSE RMSE MAPE Diagnostics Plot
Linear regression with all data 0.5463329 0.9996481 0.999824 0.9996321 Link
Linear regression with removal of outliers 0.5404805 1.006281 1.003136 1.455637 Link
Resistant regression (lqs / least trimmed squares method) 0.5407598 1.007932 1.003958 1.537172 Link
Robust regression (rlm / Huber M-estimator method) 0.5404879 1.005054 1.002524 1.411806 Link
Polynomial regression with degree 2 0.546322 0.9996472 0.9998236 0.9993865 Link
Polynomial regression with degree 3 0.5469588 0.9976384 0.9988185 1.043158 Link
Polynomial regression with degree 4 0.5467885 0.9975077 0.9987531 1.041541 Link
Polynomial regression with degree 5 0.5467813 0.9975076 0.998753 1.041209 Link
Polynomial regression with degree 6 0.5465911 0.996652 0.9983246 1.056632 Link
Generalized additive models 0.5463695 0.9976796 0.9988391 1.032766 Link
\ No newline at end of file
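The four error measures in the model comparison table can be sketched as below. The function name and toy data are illustrative assumptions. MAPE is computed here as a fraction rather than a percentage, which is only a guess based on the reported values being close to 1. The RMSE column can be checked directly against the MSE column: for the first row, sqrt(0.9996481) ≈ 0.999824.

```python
import math

def fit_metrics(actual, predicted):
    """MAE, MSE, RMSE and MAPE (as a fraction) for paired values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE divides by the actual value, so it is undefined when any actual is 0.
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}

# Toy data (hypothetical).
actual = [2.0, 4.0, 5.0, 10.0]
predicted = [1.0, 5.0, 5.0, 8.0]
metrics = fit_metrics(actual, predicted)
print(metrics)
```

Because MAPE divides by the actual value, it becomes unstable for abundances near zero, which is worth keeping in mind when comparing that column across models.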