
<html><head></head><body>
<h1><u>QuanTP: Association between abundance ratios of transcript and protein</u></h1><hr/>
 <font><h3>Input data summary</h3></font>
 <ul>
 <li>Abbreviations used: PE (Proteome data) and TE (Transcriptome data) </li>
 <li>Input Proteome data dimension (Row Column):  2817  x  5 </li>
 <li>Input Transcriptome data dimension (Row Column):  2817  x  5 </li></ul><hr/>
<h3 id=table_of_content>Table of Contents:</h3>
 <ul>
 <li><a href=#sample_dist>Sample distribution</a></li>
 <li><a href=#corr_data>Correlation</a></li>
 <li><a href=#regression_data>Regression analysis</a></li>
 <li><a href=#inf_obs>Influential observations</a></li>
 <li><a href=#cluster_data>Cluster analysis</a></li></ul><hr/>
<h2 id="sample_dist"><font color=#ff0000>SAMPLE DISTRIBUTION</font></h2>
<table  border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; ">
 <tr bgcolor="#7a0019"><th><font color=#ffcc33>Boxplot: Transcriptome data</font></th><th><font color=#ffcc33>Boxplot: Proteome data</font></th></tr>
 <tr><td align=center> <img src="Box_TE_all_rep.png" width=500 height=500></td>
<td align=center> <img src="Box_PE_all_rep.png" width=500 height=500></td></tr></table>
<br><font color="#ff0000"><h3>Sample wise distribution (Box plot) after using  mean  on replicates </h3></font><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Boxplot: Transcriptome data</font></th><th><font color=#ffcc33>Boxplot: Proteome data</font></th></tr>
 <tr><td align=center> <img src="Box_TE_rep.png" width=500 height=500></td>
<td align=center> <img src="Box_PE_rep.png" width=500 height=500></td></tr></table>
<br><font color="#ff0000"><h3>Distribution (Box plot) of log fold change </h3></font><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Boxplot: Transcriptome data</font></th><th><font color=#ffcc33>Boxplot: Proteome data</font></th></tr>
 <tr><td align=center> <img src="Box_TE.png" width=500 height=500></td>
<td align=center> <img src="Box_PE.png" width=500 height=500></td></tr></table>
<br><br><font size=5><b><a href='PE_TE_logfold_pval.txt' target='_blank'>Download the complete fold change data here</a></b></font><br>
<br><table  border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Transcript Fold-Change</font></th><th><font color=#ffcc33>Protein Fold-Change</font></th></tr>
<tr><td align=center> <img src="TE_volcano.png" width=600 height=600></td>
<td align=center> <img src="PE_volcano.png" width=600 height=600></td></tr></table><br>
<br><br><table  border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>PCA plot: Transcriptome data</font></th><th><font color=#ffcc33>PCA plot: Proteome data</font></th></tr>
 <tr><td align=center> <img src="PCA_TE_all_rep.png" width=500 height=500></td>
 <td align=center> <img src="PCA_PE_all_rep.png" width=500 height=500></td></tr></table>
<hr/><h2 id="corr_data"><font color=#ff0000>CORRELATION</font></h2>
<br><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Scatter plot between Proteome and Transcriptome Abundance</font></th></tr>
<tr><td align=center> <img src="TE_PE_scatter.png" width=800 height=800></td></tr>
<tr><td align=center>
<table  border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Parameter</font></th><th><font color=#ffcc33>Method 1</font></th><th><font color=#ffcc33>Method 2</font></th><th><font color=#ffcc33>Method 3</font></th></tr>
<tr><td>Correlation method</td><td> Pearson's product-moment correlation </td><td> Spearman's rank correlation rho </td><td> Kendall's rank correlation tau </td></tr>
 <tr><td>Correlation coefficient</td><td> 0.1173569 </td><td> 0.1608612 </td><td> 0.1093701 </td></tr>
</table>
<font color="red">*Note that <u>correlation</u> is <u>sensitive to outliers</u> in the data. So it is important to analyze outliers/influential observations in the data.<br> Below we use <u>Cook's distance based approach</u> to identify such influential observations.</font>
</td></tr></table><hr/><h2 id="regression_data"><font color=#ff0000>REGRESSION ANALYSIS</font></h2>
<font><h3>Linear Regression model fit between Proteome and Transcriptome data</h3></font>
 <p>Assuming a linear relationship between Proteome and Transcriptome data, we fit a linear regression model here.</p>
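<p>As a rough illustration of the model <code>PE_abundance~TE_abundance</code>, the following minimal Python sketch fits a simple least-squares line and reports its R-squared. It is not the code that produced this report; the function name <code>fit_simple_ols</code> and the toy values are hypothetical and chosen only for demonstration.</p>
<pre>
```python
from statistics import fmean


def fit_simple_ols(te, pe):
    """Least-squares fit of pe = intercept + slope * te.

    Returns (intercept, slope, r_squared). Illustrative helper only;
    the name and signature are assumptions, not part of QuanTP.
    """
    n = len(te)
    if n == 0 or n != len(pe):
        raise ValueError("te and pe must be non-empty and of equal length")

    mean_te, mean_pe = fmean(te), fmean(pe)
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_te) ** 2 for x in te)
    syy = sum((y - mean_pe) ** 2 for y in pe)
    sxy = sum((x - mean_te) * (y - mean_pe) for x, y in zip(te, pe))
    if sxx == 0 or syy == 0:
        raise ValueError("te and pe must each contain at least two distinct values")

    slope = sxy / sxx
    intercept = mean_pe - slope * mean_te

    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(te, pe))
    r_squared = 1.0 - ss_res / syy
    return intercept, slope, r_squared


if __name__ == "__main__":
    # Made-up log fold-change values, not taken from the report.
    toy_te = [-1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
    toy_pe = [-0.3, -0.1, 0.1, 0.0, 0.2, 0.4]
    intercept, slope, r_squared = fit_simple_ols(toy_te, toy_pe)
    print(f"intercept={intercept:.4f} slope={slope:.4f} R2={r_squared:.4f}")
```
</pre>
<p>A small R-squared, such as the one in the table below, means that the fitted line explains only a small share of the variance in PE abundance, even when the slope itself is statistically significant.</p>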
 <table  border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Parameter</font></th><th><font color=#ffcc33>Value</font></th></tr>
<tr><td>Formula</td><td> PE_abundance~TE_abundance </td></tr>
 <tr><td colspan='2' align='center'> <b>Coefficients</b></td> </tr>
 <tr><td> (Intercept) </td><td> -0.06910598  (Pvalue: 1.220723e-05 ) </td></tr>
 <tr><td> TE_abundance </td><td> 0.1712395  (Pvalue: 4.168015e-10 ) </td></tr>
 <tr><td colspan='2' align='center'> <b>Model parameters</b></td> </tr>
 <tr><td>Residual standard error</td><td> 0.8363295  ( 2815  degrees of freedom)</td></tr>
 <tr><td>F-statistic</td><td> 39.31142  ( on  1  and   2815  degrees of freedom)</td></tr>
 <tr><td>R-squared</td><td> 0.01377265 </td></tr>
 <tr><td>Adjusted R-squared</td><td> 0.0134223 </td></tr>
</table>
<font color='#ff0000'><h3>Regression and diagnostics plots</h3></font>
<table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "><tr bgcolor="#7a0019"><th> <font color='#ffcc33'><h4>1) <u>Residuals vs Fitted plot</u></h4></font></th>
 <th><font color=#ffcc33><h4>2) <u>Normal Q-Q plot of residuals</u></h4></font></th></tr>
<tr><td align=center><img src="PE_TE_lm_1.png" width=600 height=600></td><td align=center><img src="PE_TE_lm_2.png" width=600 height=600></td></tr>
<tr><td align=center>This plot checks the assumption of a linear relationship.<br>If a horizontal line is observed without any distinct patterns, it indicates a linear relationship.</td>
 <td align=center>This plot checks whether the residuals are normally distributed.<br>It is good if the residual points follow the straight dashed line, i.e., do not deviate much from it.</td></tr></table>
<br><h2 id="inf_obs"><font color=#ff0000>Outliers based on the residuals from regression analysis</font></h2>
<table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; ">
 <tr bgcolor="#7a0019"><th colspan=2><font color=#ffcc33>Residuals from Regression</font></th></tr>
 <tr bgcolor="#7a0019"><th><font color=#ffcc33>Parameter</font></th><th><font color=#ffcc33>Value</font></th></tr>
<tr><td>Mean Residual value</td><td> 1.942328e-17 </td></tr>
 <tr><td>Standard deviation (Residuals)</td><td> 0.836181 </td></tr>
 <tr><td>Total outliers (Residual value > 2 standard deviations from the mean)</td><td> 164  <font size=4>(<b><a href=PE_TE_outliers_residuals.txt target="_blank">Download these  164  data points with high residual values here</a></b>)</font></td></tr>
 <tr><td colspan=2 align=center><font size=4>(<b><a href=PE_TE_abundance_residuals.txt target="_blank">Download the complete residuals data here</a></b>)</font></td></tr>
 </table><br><br>
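<p>The outlier rule in the table above (residual more than 2 standard deviations from the mean residual) can be sketched in Python as follows. This is an assumption-laden illustration rather than the report's own code: the function name <code>flag_residual_outliers</code> is hypothetical, the sample standard deviation (n - 1 denominator) is assumed, and deviations in both directions are assumed to count.</p>
<pre>
```python
from statistics import fmean, stdev


def flag_residual_outliers(residuals, n_sd=2.0):
    """Return indices of residuals lying more than n_sd standard deviations
    from the mean residual.

    Hypothetical helper: uses the sample standard deviation and treats
    large positive and large negative residuals alike (both assumptions).
    """
    if len(residuals) == 0 or len(residuals) == 1:
        raise ValueError("need at least two residuals")
    mean_res = fmean(residuals)
    sd_res = stdev(residuals)
    cutoff = n_sd * sd_res
    return [i for i, r in enumerate(residuals) if abs(r - mean_res) > cutoff]


if __name__ == "__main__":
    # Made-up residuals; only the sixth value is far from the rest.
    toy_residuals = [0.1, -0.2, 0.05, -0.1, 0.15, 5.0, -0.05, 0.0, 0.1, -0.15]
    print(flag_residual_outliers(toy_residuals))
```
</pre>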
<br><br><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "><tr bgcolor="#7a0019"><th><font color=#ffcc33><h4>3) <u>Residuals vs Leverage plot</u></h4></font></th></tr>
<tr><td align=center><img src="PE_TE_lm_5.png" width=600 height=600></td></tr>
<tr><td align=center>This plot helps identify influential cases, that is, outliers or extreme values.<br>Including or excluding them from the analysis might change the regression results.</td></tr></table><br>
<hr/><h2 id="inf_obs"><font color=#ff0000>INFLUENTIAL OBSERVATIONS</font></h2>
<p><b>Cook's distance</b> measures the influence of each data point/observation on the predicted outcome, i.e., how much the observation influences the fitted values.<br>In general use, observations with a <b>Cook's distance greater than  4  times the mean</b> may be classified as <b>influential.</b></p>
<img src="PE_TE_lm_cooksd.png" width=800 height=800> <br>In the above plot, observations above the red line ( 4  * mean Cook's distance) are influential. Genes that are outliers could be important. These observations influence the correlation values and regression coefficients.<br><br><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Parameter</font></th><th><font color=#ffcc33>Value</font></th></tr>
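<p>For a simple linear regression, the "4 times the mean" rule can be sketched in Python as below. The sketch uses the standard textbook formula D<sub>i</sub> = (e<sub>i</sub><sup>2</sup> / (p * s<sup>2</sup>)) * h<sub>i</sub> / (1 - h<sub>i</sub>)<sup>2</sup> with p = 2 coefficients, where e<sub>i</sub> is the residual, h<sub>i</sub> the leverage and s<sup>2</sup> the residual mean square. The function names <code>cooks_distances</code> and <code>influential_indices</code> and the toy data are hypothetical, and this is not the code that generated the report.</p>
<pre>
```python
from statistics import fmean


def cooks_distances(te, pe):
    """Cook's distance of every observation in the fit pe = a + b * te.

    Illustrative helper (name is an assumption). Needs at least three
    points, distinct te values, and a fit that is not perfect.
    """
    n = len(te)
    p = 2  # number of fitted coefficients: intercept and slope
    if n != len(pe) or n in (0, 1, 2):
        raise ValueError("te and pe must have equal length of at least 3")

    mean_te, mean_pe = fmean(te), fmean(pe)
    sxx = sum((x - mean_te) ** 2 for x in te)
    if sxx == 0:
        raise ValueError("te must contain at least two distinct values")
    slope = sum((x - mean_te) * (y - mean_pe) for x, y in zip(te, pe)) / sxx
    intercept = mean_pe - slope * mean_te

    residuals = [y - (intercept + slope * x) for x, y in zip(te, pe)]
    # Residual mean square on n - p degrees of freedom.
    mse = sum(e * e for e in residuals) / (n - p)
    if mse == 0:
        raise ValueError("perfect fit: Cook's distance is undefined")
    # Leverage of each point in a one-predictor model.
    leverages = [1.0 / n + (x - mean_te) ** 2 / sxx for x in te]

    return [
        (e * e / (p * mse)) * h / (1.0 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]


def influential_indices(distances, multiplier=4.0):
    """Indices whose Cook's distance exceeds multiplier * mean distance."""
    cutoff = multiplier * fmean(distances)
    return [i for i, d in enumerate(distances) if d > cutoff]


if __name__ == "__main__":
    # Made-up data: a straight line except for the last point.
    toy_te = [float(x) for x in range(10)]
    toy_pe = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 20.0]
    print(influential_indices(cooks_distances(toy_te, toy_pe)))
```
</pre>
<p>Because the cutoff is relative to the mean, it adapts to the data set: with many observations, most distances are tiny and only a small fraction exceed the threshold.</p>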
<tr><td>Mean Cook's distance</td><td> 0.0004875011 </td></tr>
 <tr><td>Total influential observations (Cook's distance >  4  * mean Cook's distance)</td><td> 115 </td></tr>
 <tr><td>Observations with Cook's distance &lt;  4  * mean Cook's distance</td><td> 2702 </td></tr>
 </table><br><br>
<table  border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Scatterplot: Before removal</font></th><th><font color=#ffcc33>Scatterplot: After removal</font></th></tr>
<tr><td align=center><!--<font color='#ff0000'><h3>Scatter plot between Proteome and Transcriptome Abundance</h3></font>
--> <img src="TE_PE_scatter.png" width=600 height=600></td>
<td align=center>
 <img src="AbundancePlot_scatter_without_outliers.png" width=600 height=600></td></tr>
<tr><td>
<table  border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Parameter</font></th><th><font color=#ffcc33>Method 1</font></th><th><font color=#ffcc33>Method 2</font></th><th><font color=#ffcc33>Method 3</font></th></tr>
<tr><td>Correlation method</td><td> Pearson's product-moment correlation </td><td> Spearman's rank correlation rho </td><td> Kendall's rank correlation tau </td></tr>
 <tr><td>Correlation coefficient</td><td> 0.1173569 </td><td> 0.1608612 </td><td> 0.1093701 </td></tr>
</table>
</td>
<td><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Parameter</font></th><th><font color=#ffcc33>Method 1</font></th><th><font color=#ffcc33>Method 2</font></th><th><font color=#ffcc33>Method 3</font></th></tr>
<tr><td>Correlation method</td><td> Pearson's product-moment correlation </td><td> Spearman's rank correlation rho </td><td> Kendall's rank correlation tau </td></tr>
 <tr><td>Correlation coefficient</td><td> 0.1334038 </td><td> 0.1611936 </td><td> 0.1082761 </td></tr>
</table></td></tr></table>
<br><br><font size=5><b><a href='PE_TE_influential_observation.txt' target='_blank'>Download the complete list of influential observations</a></b></font>&nbsp;&nbsp;&nbsp;&nbsp; <font size=5><b><a href='PE_TE_non_influential_observation.txt' target='_blank'>Download the complete list (After removing influential points)</a></b></font><br>
 <br><font color="brown"><h4>Top  10  Influential observations (Cook's distance >  4  * mean Cook's distance)</h4></font>
<table border=1 cellspacing=0 cellpadding=5> <tr bgcolor="#7a0019">
<th><font color=#ffcc33>Gene</font></th><th><font color=#ffcc33>Protein Log Fold-Change</font></th><th><font color=#ffcc33>Transcript Log Fold-Change</font></th><th><font color=#ffcc33>Cook's Distance</font></th></tr>
<tr> <td> CATHL2 </td>
 <td> -1.960863 </td>
 <td> 4.88565 </td>
 <td> 0.1432189 </td></tr>
<tr> <td> CD177 </td>
 <td> -4.173263 </td>
 <td> 2.057499 </td>
 <td> 0.06826605 </td></tr>
<tr> <td> CATHL1 </td>
 <td> -0.9912973 </td>
 <td> 4.835209 </td>
 <td> 0.05767091 </td></tr>
<tr> <td> HP </td>
 <td> 2.570727 </td>
 <td> 3.885549 </td>
 <td> 0.04680496 </td></tr>
<tr> <td> AZU1 </td>
 <td> -2.226356 </td>
 <td> -5.561874 </td>
 <td> 0.03737565 </td></tr>
<tr> <td> ELANE </td>
 <td> -2.732479 </td>
 <td> -2.914936 </td>
 <td> 0.03266198 </td></tr>
<tr> <td> PYGM </td>
 <td> -0.06079228 </td>
 <td> 6.071712 </td>
 <td> 0.03242859 </td></tr>
<tr> <td> LTF </td>
 <td> -2.4294 </td>
 <td> 2.129742 </td>
 <td> 0.02725017 </td></tr>
<tr> <td> ATP1A2 </td>
 <td> 0.2871971 </td>
 <td> 6.446299 </td>
 <td> 0.01939256 </td></tr>
<tr> <td> C13H20orf194 </td>
 <td> -5.640732 </td>
 <td> -0.6697401 </td>
 <td> 0.01852927 </td></tr>
</table><br><br>
<hr/><h2 id="cluster_data"><font color=#ff0000>CLUSTER ANALYSIS</font></h2>
<br><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>Heatmap of PE and TE abundance values (Hierarchical clustering)</font></th><th><font color=#ffcc33>Number of clusters to extract:  5 </font></th></tr>
<tr><td align=center colspan="2"><img src="PE_TE_heatmap.png" width=800 height=800></td></tr>
<tr><td colspan="2" align=center><font size=5><a href="PE_TE_hc_clusterpoints.txt" target="_blank"><b>Download the hierarchical cluster list</b></a></font></td></tr></table>
<br><br><table border=1 cellspacing=0 cellpadding=5 style="table-layout:auto; "> <tr bgcolor="#7a0019"><th><font color=#ffcc33>K-means clustering</font></th><th><font color=#ffcc33>Number of clusters:  4 </font></th></tr>
<tr><td colspan="2" align=center><img src="PE_TE_kmeans.png" width=800 height=800></td></tr>
<tr><td colspan="2" align=center><font size=5><a href="PE_TE_kmeans_clusterpoints.txt" target="_blank"><b>Download the cluster list</b></a></font></td></tr></table><br><hr/>
<h3>Go To:</h3>
 <ul>
 <li><a href=#sample_dist>Sample distribution</a></li>
 <li><a href=#corr_data>Correlation</a></li>
 <li><a href=#regression_data>Regression analysis</a></li>
 <li><a href=#inf_obs>Influential observations</a></li>
 <li><a href=#cluster_data>Cluster analysis</a></li></ul>
 <br><a href=#>TOP</a></body></html>
author	galaxyp
date	Thu, 20 Dec 2018 16:06:05 -0500
parents	75faf9a89f5b
children