annotate ezBAMQC/README.rst @ 3:ddfb71ec32ed

Uploaded
author cshl-bsr
date Tue, 29 Mar 2016 15:31:11 -0400
parents dfa3745e5fd8
children 6610eedd9fae
0
dfa3745e5fd8 Uploaded
youngkim
parents:
diff changeset

.. image:: https://raw.githubusercontent.com/mhammell-laboratory/bamqc/master/doc/bamqc-icon.png
   :width: 200 px
   :alt: generated at codeology.braintreepayments.com/mhammell-laboratory/bamqc
   :align: right
   :target: http://codeology.braintreepayments.com/mhammell-laboratory/bamqc

=======
ezBAMQC
=======
*"ezBAMQC, a tool to check the quality of mapped next generation sequencing files."*

:Description:

   ezBAMQC is a tool to check the quality of one or many mapped next-generation sequencing datasets. It conducts comprehensive evaluations of aligned sequencing data from multiple aspects, including: clipping profile, mapping quality distribution, mapped read length distribution, genomic/transcriptomic mapping distribution, inner distance distribution (for paired-end reads), ribosomal RNA contamination, transcript 5’ and 3’ end bias, transcription dropout rate, sample correlations, sample reproducibility, and sample variations. It outputs a set of tables and plots and one HTML page that contains a summary of the results. Many metrics are designed specifically for RNA-seq data, but ezBAMQC can be applied to any mapped sequencing dataset, such as RNA-seq, CLIP-seq, GRO-seq, ChIP-seq, or DNA-seq.

:Links:

   `GitHub Page <https://github.com/mhammell-laboratory/bamqc>`_

   `PyPI Page <https://pypi.python.org/pypi/ezBAMQC>`_

   `MHammell Lab <http://hammelllab.labsites.cshl.edu/software>`_

:Authors:
   Ying Jin, David Molik, and Molly Hammell

:Version: 0.6.5

:Contact:
   Ying Jin (yjin@cshl.edu)

Installation guide for from-source installs of ezBAMQC
======================================================

There are several ways to install ezBAMQC, but the main requirement is the same for all of them: ezBAMQC uses the C++11 standard, so you need a version of GCC that supports it, which usually means 4.8 or 4.9. Beyond that, you need Python, R, and corrplot for interfacing with the compiled code.

:Installation:
   `Source Code <https://github.com/mhammell-laboratory/ezBAMQC/releases>`_

   `PyPI <https://pypi.python.org/pypi?:action=display&name=ezBAMQC>`_

:Prerequisites:
   * `python2.7 <https://www.python.org/download/releases/2.7/>`_
   * `R <https://www.r-project.org/>`_
   * `corrplot <https://cran.r-project.org/web/packages/corrplot/>`_
   * `GCC 4.8.1 or greater <https://gcc.gnu.org/gcc-4.8/>`_ (GCC 4.9.1 or greater is recommended for a PyPI install)

:Notes:
   There are multiple ways to install the prerequisites. If you are using a yum-based Linux distribution, it may help to look at:

   * `Devtoolset-3 <https://access.redhat.com/documentation/en-US/Red_Hat_Developer_Toolset/3/html/User_Guide/sect-Red_Hat_Developer_Toolset-Install.html>`_ for GCC compilers
   * `IUS <https://ius.io/>`_ for Python 2.7
   * `Software Collections <https://www.softwarecollections.org/>`_ for collections of software (like devtoolset 3 or python)
   * `rpmfinder <https://www.rpmfind.net/>`_ for searching RPMs across multiple systems
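
The prerequisites listed above can be checked from a shell before building. The following is a minimal sketch and not part of ezBAMQC: the helper name ``version_ge`` is hypothetical, ``sort -V`` assumes GNU coreutils, and the 4.8.1 threshold is taken from the prerequisites list. Newer GCC releases may print only the major version for ``-dumpversion``, which the comparison still handles.

::

    # Succeeds when version $1 is greater than or equal to version $2.
    # Assumes GNU sort, whose -V option orders dotted version strings.
    version_ge() {
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
    }

    # Check that gcc is on PATH and at least 4.8.1 (threshold from the list above).
    if command -v gcc >/dev/null 2>&1; then
        gcc_version=$(gcc -dumpversion)
        if version_ge "$gcc_version" 4.8.1; then
            echo "gcc $gcc_version: OK"
        else
            echo "gcc $gcc_version: older than 4.8.1"
        fi
    else
        echo "gcc: not found in PATH"
    fi

    # Check that R is installed and can load the corrplot package.
    if command -v Rscript >/dev/null 2>&1 && Rscript -e 'library(corrplot)' >/dev/null 2>&1; then
        echo "corrplot: OK"
    else
        echo "corrplot: R or the corrplot package is missing"
    fi
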

Setup
=====

1) Make sure that the GCC compiler is in your PATH:

::

    export PATH=/path/to/gcc:$PATH

2) Make sure that the Python 2.7 site-packages directory is in your PYTHONPATH:

::

    export PYTHONPATH=/path/to/python2.7/site-packages:$PYTHONPATH

3) Once the prerequisites are set up, there are three ways to install ezBAMQC: from source, with setup.py, or from PyPI.

From Source
~~~~~~~~~~~

1) Download the source.

2) Unpack the tarball and change into the package directory:

::

    tar xvfz bamqc-0.6.6.tar.gz

    cd bamqc-0.6.6

3) Run make:

::

    make

From Setup.py
~~~~~~~~~~~~~

::

    python2.7 setup.py install

From PyPI
~~~~~~~~~

::

    pip2.7 install BAMqc
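
Whichever method is used, one way to confirm the result is to check that the ``ezBAMQC`` command resolves on your PATH and then print its help text with the ``-h`` option. This is a minimal sketch: the ``check_tool`` helper is hypothetical, and it assumes the chosen install method placed the executable on PATH.

::

    # Report whether a command is available on PATH.
    check_tool() {
        if command -v "$1" >/dev/null 2>&1; then
            echo "$1: found"
        else
            echo "$1: not found in PATH"
            return 1
        fi
    }

    # Print the help text only when the executable is available.
    if check_tool ezBAMQC; then
        ezBAMQC -h
    fi
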

Usage
=====

::

    ezBAMQC [-h] -i alignment_files [alignment_files ...] -r [refgene]
            [-f [attrID]] [--rRNA [rRNA]] -o [dir] [--stranded [stranded]]
            [-q [mapq]] [-l labels [labels ...]] [-t NUMTHREADS]

optional arguments:

::

    -h, --help       show this help message and exit.
    -i, --inputFile  alignment files. Can be multiple SAM/BAM files separated by spaces. Required.
    -r, --refgene    gene annotation file in GTF format. Required.
    -f               feature level in the GTF file at which reads are summed. DEFAULT: gene_id.
    --rRNA           rRNA coordinates in BED format.
    -o, --outputDir  output directory. Required.
    --stranded       strandedness of the library:
                       yes     : sense stranded
                       reverse : reverse stranded
                       no      : not stranded
                     DEFAULT: yes.
    -q, --mapq       Minimum mapping quality (phred scaled) for an alignment to be called uniquely mapped. DEFAULT: 30
    -l, --label      Labels of input files. DEFAULT: smp1 smp2 ...
    -t, --threads    Number of threads to use. DEFAULT: 1

Example:

::

    ezBAMQC -i test-data/exp_data/treat1.bam test-data/exp_data/treat2.bam test-data/exp_data/treat3.bam -r test-data/exp_data/hg9_refGene.gtf -q 30 --rRNA test-data/exp_data/hg19_rRNA.bed -o exp_output2

The example output can be found in the test-data folder.
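
The example above leaves several options at their defaults. As a further sketch, the command below also sets ``--stranded``, ``-l``, and ``-t`` from the option list. The file names are hypothetical, and giving one label per input file in the same order is an assumption about how ``-l`` pairs labels with inputs. The ``dry_run`` helper (also hypothetical) only prints the command, so nothing is executed until it is removed.

::

    # Print a command line instead of executing it.
    dry_run() {
        printf '%s\n' "$*"
    }

    # Hypothetical inputs: two BAM files from a reverse-stranded library.
    dry_run ezBAMQC \
        -i ctrl.bam treat.bam \
        -r annotation.gtf \
        -o qc_output \
        --stranded reverse \
        -q 30 \
        -l ctrl treat \
        -t 4
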

FAQ
====
Q: Why use ezBAMQC?

A: ezBAMQC is efficient and easy to use. With one command line, it reports a comprehensive evaluation of the data with a set of plots and tables. The ability to assess multiple samples together with high efficiency makes it especially useful when there are a large number of samples from the same condition, genotype, or treatment. ezBAMQC is written in C++ and supports multithreading. A mouse RNA-seq sample with 120M alignments can be processed in 8 minutes with 5 threads.

Q: Why does the total number of reads reported by ezBAMQC not match samtools flagstat?

A: The difference comes from non-uniquely mapped reads, also called multiply aligned reads (multi-reads). Samtools flagstat counts each alignment of a multi-read as a separate read, whereas ezBAMQC counts reads according to the read ID, i.e., each individual read is counted once, whether it is a uniquely mapped read or a multi-read.
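
To illustrate the difference, the sketch below counts a toy set of alignment records both ways. The records are a simplified, hypothetical stand-in for SAM lines (only the first five columns, single-end reads, no header), and the two helper functions are illustrative rather than what either tool runs internally.

::

    # Toy records, tab-separated: read ID, FLAG, reference, position, MAPQ.
    # read2 is a multi-read with two alignments.
    toy_sam=$(printf '%s\t%s\t%s\t%s\t%s\n' \
        read1 0   chr1 100 50 \
        read2 0   chr1 200 0 \
        read2 256 chr5 900 0 \
        read3 16  chr2 300 40)

    # Count every alignment record (one per line).
    count_alignments() {
        printf '%s\n' "$1" | awk 'END { print NR }'
    }

    # Count each read ID once, however many alignments it has.
    count_read_ids() {
        printf '%s\n' "$1" | cut -f 1 | sort -u | awk 'END { print NR }'
    }

    echo "alignment records: $(count_alignments "$toy_sam")"
    echo "distinct read IDs: $(count_read_ids "$toy_sam")"

With these records the per-record count is 4, while the per-ID count is 3.
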

Q: What are "Low Quality Reads"?

A: Reads marked as QC fail according to the SAM format, or reads with a mapping quality lower than the value set by the option -q, are considered "Low Quality Reads".
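
As an illustration of this rule, the sketch below treats a toy record as low quality when the QC-fail bit (0x200, decimal 512) is set in the SAM FLAG field or when its MAPQ is below the cutoff. The records and the ``count_low_quality`` helper are hypothetical, and the fallback cutoff of 30 mirrors the documented default of -q. This is not ezBAMQC's own implementation.

::

    # Toy records, tab-separated: read ID, FLAG, reference, position, MAPQ.
    toy_sam=$(printf '%s\t%s\t%s\t%s\t%s\n' \
        read1 0   chr1 100 50 \
        read2 0   chr1 200 12 \
        read3 512 chr2 300 60 \
        read4 16  chr3 400 30)

    # Count records that fail QC (FLAG bit 512) or have MAPQ below the cutoff.
    # $1: records, $2: minimum mapping quality (falls back to 30).
    count_low_quality() {
        printf '%s\n' "$1" | awk -F '\t' -v min_mapq="${2:-30}" '
            {
                qc_fail = int($2 / 512) % 2      # 1 when the 0x200 bit is set
                if (qc_fail || $5 < min_mapq) low++
            }
            END { print low + 0 }'
    }

    echo "low quality at cutoff 30: $(count_low_quality "$toy_sam" 30)"
    echo "low quality at cutoff 10: $(count_low_quality "$toy_sam" 10)"

With these records the two calls report 2 and 1: read3 fails QC regardless of the cutoff, and read2 falls below the cutoff only at 30.
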

Q: How does the setting of option -q alter the results?

A: Reads with low quality, i.e., reads that did not pass the -q cutoff, are only counted in Total Reads, Mapped Reads, and the Mappability by mapping quality plot. The rest of the report does not include low quality reads.

Q: Are multi-reads (non-uniquely mapped reads) considered in read distribution and gene quantification?

A: No. Only uniquely mapped reads are counted.


Acknowledgements
================

#) Samtools contributors
#) Users' valuable feedback

Copying & Distribution
======================

ezBAMQC is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but *WITHOUT ANY WARRANTY*; without even the implied warranty of
*MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE*. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with ezBAMQC. If not, see `this website <http://www.gnu.org/licenses/>`_.