comparison ezBAMQC/README.rst @ 0:dfa3745e5fd8
Uploaded
author | youngkim |
---|---|
date | Thu, 24 Mar 2016 17:12:52 -0400 |
parents | |
children | 6610eedd9fae |
-1:000000000000 | 0:dfa3745e5fd8 |
---|---|
1 .. image:: https://raw.githubusercontent.com/mhammell-laboratory/bamqc/master/doc/bamqc-icon.png | |
2 :width: 200 px | |
3 :alt: generated at codeology.braintreepayments.com/mhammell-laboratory/bamqc | |
4 :align: right | |
5 :target: http://codeology.braintreepayments.com/mhammell-laboratory/bamqc | |
6 | |
7 ======= | |
8 ezBAMQC | |
9 ======= | |
10 *"ezBAMQC, a tool to check the quality of mapped next generation sequencing files."* | |
11 | |
12 :Description: | |
13 | |
14 ezBAMQC is a tool to check the quality of one or many mapped next-generation sequencing datasets. It conducts comprehensive evaluations of aligned sequencing data from multiple aspects, including: clipping profile, mapping quality distribution, mapped read length distribution, genomic/transcriptomic mapping distribution, inner distance distribution (for paired-end reads), ribosomal RNA contamination, transcript 5’ and 3’ end bias, transcription dropout rate, sample correlations, sample reproducibility, and sample variation. It outputs a set of tables and plots and one HTML page that contains a summary of the results. Many metrics are designed specifically for RNA-seq data, but ezBAMQC can be applied to any mapped sequencing dataset, such as RNA-seq, CLIP-seq, GRO-seq, ChIP-seq, and DNA-seq. | |
15 | |
16 :Links: | |
17 | |
18 `Github Page <https://github.com/mhammell-laboratory/bamqc>`_ | |
19 | |
20 `Pypi Page <https://pypi.python.org/pypi/ezBAMQC>`_ | |
21 | |
22 `MHammell Lab <http://hammelllab.labsites.cshl.edu/software>`_ | |
23 | |
24 :Authors: | |
25 Ying Jin, David Molik, and Molly Hammell | |
26 | |
27 :Version: 0.6.5 | |
28 | |
29 :Contact: | |
30 Ying Jin (yjin@cshl.edu) | |
31 | |
32 Installation guide for ezBAMQC source installs | |
33 ============================================== | |
34 | |
35 There are several options for installing ezBAMQC, but the main point is this: ezBAMQC uses the C++11 standard, so you'll need a version of GCC that supports it, which usually means 4.8 or 4.9. Beyond that, you'll need Python, R, and corrplot for interfacing with the C++ code. | |
36 | |
37 :Installation: | |
38 `Source Code <https://github.com/mhammell-laboratory/ezBAMQC/releases>`_ | |
39 | |
40 `Pypi <https://pypi.python.org/pypi?:action=display&name=ezBAMQC>`_ | |
41 | |
42 :Prerequisites: | |
43 * `python2.7 <https://www.python.org/download/releases/2.7/>`_ | |
44 * `R <https://www.r-project.org/>`_ | |
45 * `corrplot <https://cran.r-project.org/web/packages/corrplot/>`_ | |
46 * `GCC 4.8.1 or greater <https://gcc.gnu.org/gcc-4.8/>`_ (GCC 4.9.1 or greater is recommended for a PyPI install) | |
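The GCC requirement above can be checked before building. The following is a minimal sketch, not part of ezBAMQC: the helper names (``parse_version``, ``gcc_is_new_enough``, ``installed_gcc_version``) are hypothetical, and it assumes the compiler is reachable as ``gcc`` on your PATH and responds to ``gcc -dumpversion``.

```python
import re
import subprocess

# Minimum GCC version taken from the prerequisites list above.
MIN_GCC = (4, 8, 1)


def parse_version(text):
    """Turn a version string such as '4.9.2' into a tuple of three ints."""
    numbers = re.findall(r"\d+", text)[:3]
    # Pad short versions such as '9' (newer GCC may print only the major number).
    parts = [int(n) for n in numbers] + [0] * (3 - len(numbers))
    return tuple(parts)


def gcc_is_new_enough(version_text, minimum=MIN_GCC):
    """Return True if the given version string meets the minimum."""
    return parse_version(version_text) >= minimum


def installed_gcc_version(gcc="gcc"):
    """Ask the compiler on PATH for its version; return None if it is missing."""
    try:
        out = subprocess.check_output([gcc, "-dumpversion"])
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.decode().strip()


if __name__ == "__main__":
    version = installed_gcc_version()
    if version is None:
        print("gcc was not found on PATH")
    elif gcc_is_new_enough(version):
        print("gcc %s meets the 4.8.1 minimum" % version)
    else:
        print("gcc %s is older than 4.8.1" % version)
```

To test against the version recommended for a PyPI install instead, pass ``minimum=(4, 9, 1)`` to ``gcc_is_new_enough``.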
47 | |
48 :Notes: | |
49 * While there are multiple methods of installing the prerequisites, it may help to look at the following (if using a yum-based Linux distro): | |
50 * `Devtoolset-3 <https://access.redhat.com/documentation/en-US/Red_Hat_Developer_Toolset/3/html/User_Guide/sect-Red_Hat_Developer_Toolset-Install.html>`_ for GCC compilers | |
51 * `IUS <https://ius.io/>`_ for Python2.7 | |
52 * `Software Collections <https://www.softwarecollections.org/>`_ for collections of software (like devtoolset 3 or python) | |
53 * `rpmfinder <https://www.rpmfind.net/>`_ for searching rpms across multiple systems | |
54 | |
55 Setup | |
56 ===== | |
57 | |
58 1) Make sure that the GCC compiler is in your PATH: | |
59 | |
60 :: | |
61 | |
62 export PATH=/path/to/gcc:$PATH | |
63 | |
64 2) Make sure that the python2.7 site-packages directory is in your PYTHONPATH: | |
65 | |
66 :: | |
67 | |
68 export PYTHONPATH=/path/to/python2.7/site-packages:$PYTHONPATH | |
69 | |
70 3) Once the prerequisites are set up, there are three ways to install ezBAMQC: from source, from setup.py, and from PyPI. | |
71 | |
72 From Source | |
73 ~~~~~~~~~~~ | |
74 | |
75 1) Download source | |
76 | |
77 2) Unpack tarball and go to the directory of the package: | |
78 | |
79 :: | |
80 | |
81 tar xvfz bamqc-0.6.6.tar.gz | |
82 | |
83 cd bamqc-0.6.6 | |
84 | |
85 3) Run make: | |
86 | |
87 :: | |
88 | |
89 make | |
90 | |
91 From Setup.py | |
92 ~~~~~~~~~~~~~ | |
93 | |
94 :: | |
95 | |
96 python2.7 setup.py install | |
97 | |
98 From Pypi | |
99 ~~~~~~~~~ | |
100 | |
101 :: | |
102 | |
103 pip2.7 install ezBAMQC | |
104 | |
105 Usage | |
106 ===== | |
107 | |
108 :: | |
109 | |
110 ezBAMQC [-h] -i alignment_files [alignment_files ...] -r [refgene] | |
111 [-f [attrID]] [--rRNA [rRNA]] -o [dir] [--stranded [stranded]] | |
112 [-q [mapq]] [-l labels [labels ...]] [-t NUMTHREADS] | |
113 | |
114 optional arguments: | |
115 | |
116 :: | |
117 | |
118 -h, --help show this help message and exit. | |
119 -i, --inputFile alignment files. Could be multiple SAM/BAM files separated by space. Required. | |
120 -r, --refgene gene annotation file in GTF format. Required. | |
121 -f the feature level in the GTF file at which reads are summed. DEFAULT: gene_id. | |
122 --rRNA rRNA coordinates in BED format. | |
123 -o, --outputDir output directory. Required. | |
124 --stranded strandedness of the library. | |
125 yes : sense stranded | |
126 reverse : reverse stranded | |
127 no : not stranded | |
128 DEFAULT: yes. | |
129 -q, --mapq Minimum mapping quality (phred scaled) for an alignment to be called uniquely mapped. DEFAULT:30 | |
130 -l, --label Labels of input files. DEFAULT:smp1 smp2 ... | |
131 -t, --threads Number of threads to use. DEFAULT:1 | |
132 | |
133 Example: | |
134 | |
135 :: | |
136 | |
137 ezBAMQC -i test-data/exp_data/treat1.bam test-data/exp_data/treat2.bam test-data/exp_data/treat3.bam -r test-data/exp_data/hg19_refGene.gtf -q 30 --rRNA test-data/exp_data/hg19_rRNA.bed -o exp_output2 | |
138 | |
139 Example output can be found in the test-data folder. | |
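When ezBAMQC is run from a script or pipeline, it can help to assemble the command line programmatically. The sketch below is only an illustration: ``build_ezbamqc_command`` is a hypothetical helper, it uses only the options documented in the Usage section, and the file names in the usage note are placeholders rather than files shipped with ezBAMQC.

```python
def build_ezbamqc_command(inputs, refgene, output_dir, rrna=None,
                          stranded=None, mapq=None, labels=None, threads=None):
    """Build an ezBAMQC argument list from the documented options.

    Only the required options (-i, -r, -o) are always emitted; optional
    ones are added when a value is given, so ezBAMQC's own defaults apply
    otherwise.
    """
    if not inputs:
        raise ValueError("at least one SAM/BAM file is required")
    cmd = ["ezBAMQC", "-i"] + list(inputs) + ["-r", refgene, "-o", output_dir]
    if rrna is not None:
        cmd += ["--rRNA", rrna]
    if stranded is not None:
        # The usage text lists exactly these three values.
        if stranded not in ("yes", "reverse", "no"):
            raise ValueError("stranded must be 'yes', 'reverse', or 'no'")
        cmd += ["--stranded", stranded]
    if mapq is not None:
        cmd += ["-q", str(mapq)]
    if labels is not None:
        cmd += ["-l"] + list(labels)
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd
```

The returned list can be passed to ``subprocess.check_call``, for example ``subprocess.check_call(build_ezbamqc_command(["a.bam", "b.bam"], "genes.gtf", "qc_out", mapq=30, threads=5))``. Passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.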
140 | |
141 FAQ | |
142 ==== | |
143 Q: Why use ezBAMQC? | |
144 | |
145 A: ezBAMQC is efficient and easy to use. With a single command, it reports a comprehensive evaluation of the data as a set of plots and tables. The ability to assess multiple samples together with high efficiency makes it especially useful in cases where there are a large number of samples from the same condition, genotype, or treatment. ezBAMQC is written in C++ and supports multithreading. A mouse RNA-seq sample with 120M alignments can be processed in 8 minutes with 5 threads. | |
146 | |
147 Q: Why does the total number of reads reported by ezBAMQC not match samtools flagstat? | |
148 | |
149 A: The difference comes from non-uniquely mapped reads, also called multiply aligned reads or multi-reads. Samtools flagstat counts each alignment of a multi-read as a separate read, whereas ezBAMQC counts reads according to the read ID, i.e., each individual read is counted once regardless of whether it is a uniquely mapped read or a multi-read. | |
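The two counting conventions can be illustrated with toy data. This is a simplified sketch, not ezBAMQC's implementation: the records are made-up ``(read_id, reference, position)`` tuples, the function names are hypothetical, and paired-end mates are ignored.

```python
# Toy alignment records: (read_id, reference, position).
ALIGNMENTS = [
    ("read1", "chr1", 100),
    ("read2", "chr1", 250),  # read2 is a multi-read with three alignments
    ("read2", "chr5", 900),
    ("read2", "chrX", 40),
    ("read3", "chr2", 77),
]


def count_alignment_records(records):
    """Count every alignment record, so a multi-read is counted once per hit."""
    return len(records)


def count_reads_by_id(records):
    """Count each read ID once, however many alignments it has."""
    return len({read_id for read_id, _, _ in records})


if __name__ == "__main__":
    print("alignment records:", count_alignment_records(ALIGNMENTS))
    print("distinct read IDs:", count_reads_by_id(ALIGNMENTS))
```

With these five records, the per-alignment count is 5 while the per-read count is 3, which is the kind of gap the answer above describes.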
150 | |
151 Q: What are "Low Quality Reads"? | |
152 | |
153 A: Reads marked as QC fail according to the SAM format, or reads with a mapping quality lower than the value set by the option -q, are considered "Low Quality Reads". | |
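The rule above can be written as a small predicate. This is a sketch of the rule as stated in this FAQ, not code taken from ezBAMQC: ``is_low_quality`` is a hypothetical name, the QC-fail bit is the 0x200 FLAG bit defined by the SAM specification, and the default cutoff of 30 mirrors the documented default of -q.

```python
# SAM FLAG bit 0x200: read fails platform/vendor quality checks.
QC_FAIL = 0x200


def is_low_quality(flag, mapq, min_mapq=30):
    """Return True if a read is QC-failed or its MAPQ is below the cutoff.

    'flag' and 'mapq' are the FLAG and MAPQ fields of a SAM record.
    The comparison is strict: a read with MAPQ equal to the cutoff passes.
    """
    return bool(flag & QC_FAIL) or mapq < min_mapq
```

For example, ``is_low_quality(0, 10)`` and ``is_low_quality(0x200, 60)`` are both true, while ``is_low_quality(0, 30)`` is false.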
154 | |
155 Q: How does the setting of option -q alter the results? | |
156 | |
157 A: Low-quality reads, i.e., reads that did not pass the -q cutoff, are counted only in Total Reads, Mapped Reads, and the Mappability by mapping quality plot. The rest of the report does not include low-quality reads. | |
158 | |
159 Q: Are multi-reads (non-uniquely mapped reads) considered in read distribution and gene quantification? | |
160 | |
161 A: No. Only uniquely mapped reads are counted. | |
162 | |
163 | |
164 Acknowledgements | |
165 ================ | |
166 | |
167 #) Samtools contributors | |
168 #) Users' valuable feedback | |
169 | |
170 Copying & Distribution | |
171 ====================== | |
172 | |
173 ezBAMQC is free software: you can redistribute it and/or modify | |
174 it under the terms of the GNU General Public License as published by | |
175 the Free Software Foundation, either version 3 of the License, or | |
176 (at your option) any later version. | |
177 | |
178 This program is distributed in the hope that it will be useful, | |
179 but *WITHOUT ANY WARRANTY*; without even the implied warranty of | |
180 *MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE*. See the | |
181 GNU General Public License for more details. | |
182 | |
183 You should have received a copy of the GNU General Public License | |
184 along with ezBAMQC. If not, see `this website <http://www.gnu.org/licenses/>`_ |