comparison hicCorrectMatrix.xml @ 10:bfa1c014f64a draft

planemo upload for repository https://github.com/maxplanck-ie/HiCExplorer/tree/master/galaxy/wrapper/ commit dddc0b9035b8edadfd45d74b01aeca245c2725d7
author iuc
date Fri, 27 Apr 2018 08:38:17 -0400
parents ac80bd0a96ca
children 92fc291ceb1a
9:ac80bd0a96ca 10:bfa1c014f64a
187 ------ 187 ------
188 188
189 Diagnostic plot 189 Diagnostic plot
190 _______________ 190 _______________
191 191
192 The diagnostic plot consists of a bar plot of the contacts coverage per bins size together with the 192
193 modified z-score based on the Median Absolute Deviation (MAD) method. 193 The goal of the diagnostic plot is to help the user decide on a cutoff threshold that will ignore Hi-C matrix
194 194 bins with few reads assigned to them. The plot is a histogram of the total number of Hi-C reads per matrix bin.
195 See Boris Iglewicz and David Hoaglin 1993, Volume 16: 195 A secondary scale based on the mean absolute deviation score, is shown on top of the figure.
196 How to Detect and Handle Outliers The ASQC Basic References in Quality Control: Statistical Techniques, 196 This secondary scale aims to offer 'normalized' values that are comparable across samples
197 Edward F. Mykytka, Ph.D., Editor. 197 independently of the sequencing depth and the fraction of usable Hi-C reads. In all samples that we have studied,
198 198 the histogram follows a bimodal distribution where the first peak is for bins with zero reads which usually occur
199 Using this diagnostic plot, a user can decide if values 199 at repetitive regions. Other low scoring bins tend to be close to repetitive regions.
200 with a too low (and/or too high) number of contacts in respect to their genomic distance should 200 Also, low scoring bins can be caused by absence of a restriction site in the bin or because the restriction
201 be removed from the data before the correction applies. 201 site is present but the restriction enzyme did not cut. The valley between the two peaks in the
202 202 histogram is set by default as cutoff threshold.
203 Moreover, the shown distribution should be a Gaussian bell. If it doesn’t follow a Gaussian distribution 203 However, it is important to revise this as in some cases the selected value could not be correct.
204 this is an indicator that the used data is of bad quality or that the used contact matrix
205 is maybe not the one that should be used. It can happen that users select for example a merge
206 matrix with a lower resolution that was previously needed for plotting. In such cases the
207 diagnostic plot helps to detect this and prevent the user from running the analysis on a wrong dataset.
208 204
209 205
210 .. image:: $PATH_TO_IMAGES/diagnostic_plot.png 206 .. image:: $PATH_TO_IMAGES/diagnostic_plot.png
211 :width: 50% 207 :width: 50%
212 208
213 On the example plot above, a user can then use the lower threshold defined by the MAD method (black bold bar), or define its own threshold based on the contacts distribution. 209 On the example plot above, a user can then use the lower threshold defined by the Median Absolute Deviation (MAD) method (black bold bar), or define its own threshold based on the contacts distribution.
214 210
215 Correct 211 Correct
216 _______ 212 _______
217 213
218 Run the iterative correction and outputs the corrected matrix. This matrix can then be used with all downstream analysis tools such as ``hicPlotMatrix``, ``hicPlotTADs``, ``hicPlotViewpoint``, ``hicAggregateContacts`` for **visualization of Hi-C data**, ``hicCorrelate``, ``hicPlotDistVsCounts``, ``hicTransform``, ``hicFindTADs``, ``hicPCA`` **for data and scores computation on Hi-C data**. 214 Run the iterative correction and outputs the corrected matrix. This matrix can then be used with all downstream analysis tools such as ``hicPlotMatrix``, ``hicPlotTADs``, ``hicPlotViewpoint``, ``hicAggregateContacts`` for **visualization of Hi-C data**, ``hicCorrelate``, ``hicPlotDistVsCounts``, ``hicTransform``, ``hicFindTADs``, ``hicPCA`` **for data and scores computation on Hi-C data**.
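
The "Diagnostic plot" help text in both revisions refers to a score based on the Median Absolute Deviation (MAD), and the older revision cites Iglewicz and Hoaglin (1993) for the modified z-score. As a rough illustration only, the sketch below computes that score for per-bin read totals and lists the bins that fall below a cutoff. It assumes the standard textbook definition (0.6745 times the deviation from the median, divided by the MAD). The function names, the example coverage values and the cutoff of -3.0 are hypothetical and are not taken from the HiCExplorer source.

```python
from statistics import median


def modified_z_scores(values):
    """Return the MAD-based modified z-score of each value.

    Assumes the textbook definition: 0.6745 * (x - median) / MAD.
    """
    med = median(values)
    # Median of the absolute deviations from the median.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; the modified z-score is undefined")
    return [0.6745 * (v - med) / mad for v in values]


def low_coverage_bins(coverage, threshold):
    """Return the indices of bins whose modified z-score is below threshold."""
    scores = modified_z_scores(coverage)
    return [i for i, score in enumerate(scores) if score < threshold]


# Hypothetical per-bin read totals: two nearly empty bins among well-covered ones.
coverage = [0, 2, 95, 100, 102, 98, 105, 99, 101, 97]

# The cutoff of -3.0 is illustrative; in practice it is read off the diagnostic plot.
flagged = low_coverage_bins(coverage, threshold=-3.0)
print(flagged)
```

Because the score is built from medians, a few empty bins barely shift the scale, which is one reason such a score can be compared across samples with different sequencing depths.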