=======
ezBAMQC
=======

*"ezBAMQC, a tool to check the quality of mapped next generation sequencing files."*

:Codeology Icon:

   .. image:: https://raw.githubusercontent.com/mhammell-laboratory/bamqc/master/doc/bamqc-icon.gif
      :alt: generated at codeology.braintreepayments.com/mhammell-laboratory/bamqc
      :align: right
      :target: http://codeology.braintreepayments.com/mhammell-laboratory/bamqc

:Description:

   ezBAMQC is a tool to check the quality of one or more mapped next-generation sequencing datasets. It evaluates aligned sequencing data from multiple aspects, including: clipping profile, mapping quality distribution, mapped read length distribution, genomic/transcriptomic mapping distribution, inner distance distribution (for paired-end reads), ribosomal RNA contamination, transcript 5’ and 3’ end bias, transcription dropout rate, sample correlations, sample reproducibility, and sample variations. It outputs a set of tables and plots, plus one HTML page that summarizes the results. Many metrics are designed specifically for RNA-seq data, but ezBAMQC can be applied to any mapped sequencing dataset, such as RNA-seq, CLIP-seq, GRO-seq, ChIP-seq, or DNA-seq.

:Links:

   `GitHub Page <https://github.com/mhammell-laboratory/bamqc>`_

   `PyPI Page <https://pypi.python.org/pypi/ezBAMQC>`_

   `MHammell Lab <http://hammelllab.labsites.cshl.edu/software>`_

:Authors:
   Ying Jin, David Molik, and Molly Hammell

:Version: 0.6.7

:Contact:
   Ying Jin (yjin@cshl.edu)

Installation guide for ezBAMQC from-source installs
=======================================================

There are several ways to install ezBAMQC, but the main requirement is the same for all of them: ezBAMQC uses the C++11 standard, so you need a version of GCC that supports it, which usually means 4.8 or 4.9. Beyond that, you need Python, R, and corrplot for interfacing with the C code.

:Installation:
   `Source Code <https://github.com/mhammell-laboratory/ezBAMQC/releases>`_

   `PyPI <https://pypi.python.org/pypi?:action=display&name=ezBAMQC>`_

:Prerequisites:
   * `python2.7 <https://www.python.org/download/releases/2.7/>`_
   * `R <https://www.r-project.org/>`_
   * `corrplot <https://cran.r-project.org/web/packages/corrplot/>`_
   * `GCC 4.8.1 or greater <https://gcc.gnu.org/gcc-4.8/>`_ (GCC 4.9.1 or greater is recommended for the PyPI install)

:Notes:
   *There are multiple ways to install the prerequisites. If you use a yum-based Linux distribution, these resources may help:*

   * `Devtoolset-3 <https://access.redhat.com/documentation/en-US/Red_Hat_Developer_Toolset/3/html/User_Guide/sect-Red_Hat_Developer_Toolset-Install.html>`_ for GCC compilers
   * `IUS <https://ius.io/>`_ for Python 2.7
   * `Software Collections <https://www.softwarecollections.org/>`_ for collections of software (such as Devtoolset-3 or Python)
   * `rpmfinder <https://www.rpmfind.net/>`_ for searching RPMs across multiple systems

Setup
=====

1) Make sure that the GCC compiler is in your PATH:

::

   export PATH=/path/to/gcc:$PATH

2) Make sure that the python2.7 site-packages directory is in your PYTHONPATH:

::

   export PYTHONPATH=/path/to/python2.7/site-packages:$PYTHONPATH

3) Once the prerequisites are set up, there are three ways to install ezBAMQC: from source, from setup.py, or from PyPI.

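Before building, it can help to confirm that the ``gcc`` found on your PATH meets the 4.8.1 minimum listed under Prerequisites. The following is a minimal sketch, not part of ezBAMQC: the helper names ``parse_version``, ``gcc_is_new_enough``, and ``installed_gcc_version`` are hypothetical, and the sketch assumes that ``gcc -dumpversion`` prints a dotted version number such as ``4.8.5``.

```python
import re
import subprocess

# Minimum GCC version listed under Prerequisites.
MIN_GCC = (4, 8, 1)


def parse_version(text):
    """Extract the first dotted version number in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def gcc_is_new_enough(version_text, minimum=MIN_GCC):
    """Return True if the reported GCC version meets the minimum."""
    version = parse_version(version_text)
    # Pad short versions so that "5" compares as (5, 0, 0).
    padded = version + (0,) * (len(minimum) - len(version))
    return padded >= minimum


def installed_gcc_version():
    """Ask the gcc found on PATH for its version string (hypothetical helper)."""
    return subprocess.check_output(["gcc", "-dumpversion"]).decode().strip()
```

Calling ``gcc_is_new_enough(installed_gcc_version())`` then reports whether the compiler on PATH is recent enough; the comparison itself works on any version string, so it can be checked without a compiler installed.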
From Source
~~~~~~~~~~~

1) Download the source.

2) Unpack the tarball and change into the package directory:

::

   tar xvfz bamqc-0.6.7.tar.gz

   cd bamqc-0.6.7

3) Run make:

::

   make

From Setup.py
~~~~~~~~~~~~~

::

   python2.7 setup.py install

From PyPI
~~~~~~~~~

::

   pip2.7 install BAMqc

Usage
=====

::

   ezBAMQC [-h] -i alignment_files [alignment_files ...] -r [refgene]
           [-f [attrID]] [--rRNA [rRNA]] -o [dir] [--stranded [stranded]]
           [-q [mapq]] [-l labels [labels ...]] [-t NUMTHREADS]

optional arguments:

::

   -h, --help         show this help message and exit.
   -i, --inputFile    alignment files. Could be multiple SAM/BAM files separated by space. Required.
   -r, --refgene      gene annotation file in GTF format. Required.
   -f                 the read summation at which feature level in the GTF file. DEFAULT: gene_id.
   --rRNA             rRNA coordinates in BED format.
   -o, --outputDir    output directory. Required.
   --stranded         strandness of the library?
                      yes : sense stranded
                      reverse : reverse stranded
                      no : not stranded
                      DEFAULT: yes.
   -q, --mapq         Minimum mapping quality (phred scaled) for an alignment to be called uniquely mapped. DEFAULT:30
   -l, --label        Labels of input files. DEFAULT:smp1 smp2 ...
   -t, --threads      Number of threads to use. DEFAULT:1

Example:

::

   ezBAMQC -i test-data/exp_data/treat1.bam test-data/exp_data/treat2.bam test-data/exp_data/treat3.bam -r test-data/exp_data/hg9_refGene.gtf -q 30 --rRNA test-data/exp_data/hg19_rRNA.bed -o exp_output2

The example output can be found in the test-data folder.

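When running ezBAMQC over many samples from a script, the command line can be assembled programmatically. The sketch below is one possible way to do this and is not shipped with ezBAMQC: ``build_ezbamqc_command`` is a hypothetical helper that uses only the options documented above, and the check for one label per alignment file is an assumption about how ``-l`` is meant to be used.

```python
def build_ezbamqc_command(bam_files, refgene, output_dir,
                          rrna_bed=None, mapq=None, labels=None, threads=None):
    """Assemble an ezBAMQC argument list from the documented options."""
    if not bam_files:
        raise ValueError("at least one alignment file is required")
    # Assumption: labels, when given, correspond one-to-one to input files.
    if labels is not None and len(labels) != len(bam_files):
        raise ValueError("expected one label per alignment file")

    command = ["ezBAMQC", "-i"] + list(bam_files)
    command += ["-r", refgene, "-o", output_dir]
    if rrna_bed is not None:
        command += ["--rRNA", rrna_bed]
    if mapq is not None:
        command += ["-q", str(mapq)]
    if labels is not None:
        command += ["-l"] + list(labels)
    if threads is not None:
        command += ["-t", str(threads)]
    return command


# Rebuild the example above, adding labels and threads for illustration.
command = build_ezbamqc_command(
    ["test-data/exp_data/treat1.bam",
     "test-data/exp_data/treat2.bam",
     "test-data/exp_data/treat3.bam"],
    refgene="test-data/exp_data/hg9_refGene.gtf",
    output_dir="exp_output2",
    rrna_bed="test-data/exp_data/hg19_rRNA.bed",
    mapq=30,
    labels=["treat1", "treat2", "treat3"],
    threads=5,
)
print(" ".join(command))
```

The returned list can be passed to ``subprocess.check_call`` once ezBAMQC is installed and on your PATH. Building a list rather than a single string avoids shell quoting problems with file names that contain spaces.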
FAQ
===

Q: Why use ezBAMQC?

A: ezBAMQC is efficient and easy to use. With a single command, it reports a comprehensive evaluation of the data as a set of plots and tables. The ability to assess multiple samples together with high efficiency makes it especially useful when there are a large number of samples from the same condition, genotype, or treatment. ezBAMQC is written in C++ and supports multithreading. A mouse RNA-seq sample with 120M alignments can be processed in 8 minutes with 5 threads.

Q: Why does the total number of reads reported by ezBAMQC not match samtools flagstat?

A: The difference comes from non-uniquely mapped reads, also called multiply aligned reads (multi-reads). Samtools flagstat counts each alignment of a multi-read as a separate read, whereas ezBAMQC counts reads according to the read ID, i.e., each individual read is counted once regardless of whether it is a uniquely mapped read or a multi-read.

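The two counting conventions can be illustrated on a few SAM records. This is a simplified sketch of the idea, not ezBAMQC's or samtools' actual implementation: the function ``count_alignments_and_reads`` and the toy records are made up for illustration, and the sketch assumes single-end data (paired-end mates share a read ID and would need extra handling).

```python
def count_alignments_and_reads(sam_lines):
    """Return (alignment_records, distinct_read_ids) for SAM text lines."""
    alignment_records = 0
    read_ids = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        alignment_records += 1
        read_ids.add(fields[0])  # QNAME is the first SAM column
    return alignment_records, len(read_ids)


# Toy single-end records: read1 maps once, read2 maps to three locations.
toy_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t256\tchr2\t300\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t256\tchr3\t400\t0\t4M\t*\t0\t0\tACGT\tIIII",
]

records, reads = count_alignments_and_reads(toy_sam)
print(records, reads)
```

Here the per-alignment count is 4 while the per-read-ID count is 2, which is the kind of gap described in the answer above.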
Q: What are "Low Quality Reads"?

A: Reads marked as QC fail according to the SAM format, or reads with a mapping quality lower than the value set by the option -q, are considered "Low Quality Reads".

Q: How does the setting of option -q alter the results?

A: Reads with low quality, i.e., reads that did not pass the -q cutoff, are only counted in Total Reads, Mapped Reads, and the Mappability by mapping quality plot. The rest of the report does not include low-quality reads.

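The "Low Quality Reads" rule described above can be written as a small predicate. This is a sketch under stated assumptions rather than ezBAMQC's own code: ``is_low_quality`` is a hypothetical name, the QC-fail bit is taken to be the SAM flag ``0x200``, and the default cutoff of 30 mirrors the documented default of ``-q``.

```python
# SAM flag bit for reads not passing filters (QC fail), per the SAM format.
QC_FAIL_FLAG = 0x200


def is_low_quality(flag, mapq, min_mapq=30):
    """Return True if an alignment is QC-failed or its MAPQ is below the cutoff."""
    return bool(flag & QC_FAIL_FLAG) or mapq < min_mapq


# A few (flag, mapq) pairs and how the rule classifies them.
for flag, mapq in [(0, 50), (0x200, 50), (0, 10), (16, 30)]:
    print(flag, mapq, is_low_quality(flag, mapq))
```

Raising ``min_mapq`` moves more alignments into the low-quality group, so fewer reads contribute to the parts of the report beyond Total Reads, Mapped Reads, and the Mappability by mapping quality plot.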
Q: Are multi-reads (non-uniquely mapped reads) considered in read distribution and gene quantification?

A: No. Only uniquely mapped reads are counted.

Acknowledgements
================

#) Samtools contributors
#) Users' valuable feedback

Copying & Distribution
======================

ezBAMQC is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but *WITHOUT ANY WARRANTY*; without even the implied warranty of
*MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE*. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with ezBAMQC. If not, see `this website <http://www.gnu.org/licenses/>`_