diff hicCorrectMatrix.xml @ 10:bfa1c014f64a draft
planemo upload for repository https://github.com/maxplanck-ie/HiCExplorer/tree/master/galaxy/wrapper/ commit dddc0b9035b8edadfd45d74b01aeca245c2725d7
author | iuc |
---|---|
date | Fri, 27 Apr 2018 08:38:17 -0400 |
parents | ac80bd0a96ca |
children | 92fc291ceb1a |
--- a/hicCorrectMatrix.xml Fri Apr 27 03:29:59 2018 -0400
+++ b/hicCorrectMatrix.xml Fri Apr 27 08:38:17 2018 -0400
@@ -189,28 +189,24 @@
 Diagnostic plot
 _______________

-The diagnostic plot consists of a bar plot of the contacts coverage per bins size together with the
-modified z-score based on the Median Absolute Deviation (MAD) method.
-See Boris Iglewicz and David Hoaglin 1993, Volume 16:
-How to Detect and Handle Outliers The ASQC Basic References in Quality Control: Statistical Techniques,
-Edward F. Mykytka, Ph.D., Editor.
-
-Using this diagnostic plot, a user can decide if values
-with a too low (and/or too high) number of contacts in respect to their genomic distance should
-be removed from the data before the correction applies.
-
-Moreover, the shown distribution should be a Gaussian bell. If it doesn’t follow a Gaussian distribution
-this is an indicator that the used data is of bad quality or that the used contact matrix
-is maybe not the one that should be used. It can happen that users select for example a merge
-matrix with a lower resolution that was previously needed for plotting. In such cases the
-diagnostic plot helps to detect this and prevent the user from running the analysis on a wrong dataset.
+The goal of the diagnostic plot is to help the user decide on a cutoff threshold that will ignore Hi-C matrix
+bins with few reads assigned to them. The plot is a histogram of the total number of Hi-C reads per matrix bin.
+A secondary scale based on the mean absolute deviation score, is shown on top of the figure.
+This secondary scale aims to offer 'normalized' values that are comparable across samples
+independently of the sequencing depth and the fraction of usable Hi-C reads. In all samples that we have studied,
+the histogram follows a bimodal distribution where the first peak is for bins with zero reads which usually occur
+at repetitive regions. Other low scoring bins tend to be close to repetitive regions.
+Also, low scoring bins can be caused by absence of a restriction site in the bin or because the restriction
+site is present but the restriction enzyme did not cut. The valley between the two peaks in the
+histogram is set by default as cutoff threshold.
+However, it is important to revise this as in some cases the selected value could not be correct.

 .. image:: $PATH_TO_IMAGES/diagnostic_plot.png
    :width: 50%

-On the example plot above, a user can then use the lower threshold defined by the MAD method (black bold bar), or define its own threshold based on the contacts distribution.
+On the example plot above, a user can then use the lower threshold defined by the Median Absolute Deviation (MAD) method (black bold bar), or define its own threshold based on the contacts distribution.

 Correct
 _______
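
The help text in this diff describes a secondary scale based on the Median Absolute Deviation (MAD), and the removed lines cite Iglewicz and Hoaglin for the modified z-score. As a rough illustration of that score only, and not of what hicCorrectMatrix actually does internally, the following sketch computes a modified z-score per bin and lists the bins that fall below a lower cutoff. The function names, the toy coverage values, and the cutoff of -1.5 are all assumptions made for this example.

```python
from statistics import median


def modified_z_scores(coverage):
    """Return the modified z-score of each value in `coverage`.

    Uses the Iglewicz and Hoaglin form 0.6745 * (x - median) / MAD,
    where MAD is the median of the absolute deviations from the median.
    """
    med = median(coverage)
    mad = median(abs(x - med) for x in coverage)
    if mad == 0:
        # The score is undefined when more than half the values are identical.
        raise ValueError("MAD is zero; modified z-score is undefined")
    return [0.6745 * (x - med) / mad for x in coverage]


def bins_below_threshold(coverage, lower):
    """Return indices of bins whose modified z-score is below `lower`."""
    scores = modified_z_scores(coverage)
    return [i for i, score in enumerate(scores) if score < lower]


# Hypothetical per-bin read totals: two nearly empty bins, the rest well covered.
coverage = [0, 2, 95, 100, 102, 98, 105, 110, 99, 101]

# -1.5 is an arbitrary illustrative cutoff, not a tool default.
low_bins = bins_below_threshold(coverage, lower=-1.5)
print(low_bins)
```

Because the score is expressed relative to the median and MAD of the sample itself, the same cutoff can be compared across samples with different sequencing depth, which is the motivation the help text gives for the secondary scale.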