Mercurial > repos > weilong-guo > bs_seeker2
comparison BSseeker2/README.md @ 1:8b26adf64adc draft default tip
V2.0.5
author | weilong-guo |
---|---|
date | Tue, 05 Nov 2013 01:55:39 -0500 |
parents | e6df770c0e58 |
children |
comparison
equal
deleted
inserted
replaced
0:e6df770c0e58 | 1:8b26adf64adc |
---|---|
46 3. System requirements | 46 3. System requirements |
47 ============ | 47 ============ |
48 | 48 |
49 * Linux or Mac OS platform | 49 * Linux or Mac OS platform |
50 * One of the following Aligner | 50 * One of the following Aligner |
51 - [bowtie](http://bowtie-bio.sourceforge.net/) | 51 - [bowtie](http://bowtie-bio.sourceforge.net/) (fast) |
52 - [bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/) (Recommend) | 52 - [bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/) (Default) |
53 - [soap](http://soap.genomics.org.cn/) | 53 - [soap](http://soap.genomics.org.cn/) |
54 - [rmap](http://www.cmb.usc.edu/people/andrewds/rmap/) | |
54 * [Python](http://www.python.org/download/) (Version 2.6 +) | 55 * [Python](http://www.python.org/download/) (Version 2.6 +) |
55 | 56 |
56 (It is normally pre-installed in Linux. Type " python -V" to see the installed version.) | 57 (It is normally pre-installed in Linux. Type " python -V" to see the installed version.) |
57 | 58 |
58 * [pysam](http://code.google.com/p/pysam/) package is needed. | 59 * [pysam](http://code.google.com/p/pysam/) package is needed. |
72 | 73 |
73 ##Usage : | 74 ##Usage : |
74 | 75 |
75 $ python FilterReads.py | 76 $ python FilterReads.py |
76 Usage: FilterReads.py -i <input> -o <output> [-k] | 77 Usage: FilterReads.py -i <input> -o <output> [-k] |
77 Author : Guo, Weilong; guoweilong@gmail.com; 2012-11-10 | 78 Author : Guo, Weilong; 2012-11-10 |
78 Unique reads for qseq/fastq/fasta/sequencce, and filter | 79 Unique reads for qseq/fastq/fasta/sequencce, and filter |
79 low quality file in qseq file. | 80 low quality file in qseq file. |
80 | 81 |
81 Options: | 82 Options: |
82 -h, --help show this help message and exit | 83 -h, --help show this help message and exit |
103 | 104 |
104 Options: | 105 Options: |
105 -h, --help show this help message and exit | 106 -h, --help show this help message and exit |
106 -f FILE, --file=FILE Input your reference genome file (fasta) | 107 -f FILE, --file=FILE Input your reference genome file (fasta) |
107 --aligner=ALIGNER Aligner program to perform the analysis: bowtie, | 108 --aligner=ALIGNER Aligner program to perform the analysis: bowtie, |
108 bowtie2, soap [Default: bowtie2] | 109 bowtie2, soap, rmap [Default: bowtie] |
109 -p PATH, --path=PATH Path to the aligner program. Defaults: | 110 -p PATH, --path=PATH Path to the aligner program. Detected: |
110 bowtie: /u/home/mcdb/weilong/install/bowtie-0.12.8 | 111 bowtie: ~/install/bowtie |
111 bowtie2: | 112 bowtie2: ~/install/bowtie2 |
112 /u/home/mcdb/weilong/install/bowtie2-2.0.0-beta7 | 113 rmap: ~/install/rmap_/bin |
113 soap: /u/home/mcdb/weilong/install/soap2.21release/ | 114 soap: ~/install/soap/ |
114 -d DBPATH, --db=DBPATH | 115 -d DBPATH, --db=DBPATH |
115 Path to the reference genome library (generated in | 116 Path to the reference genome library (generated in |
116 preprocessing genome) [Default: /u/home/mcdb/weilong/i | 117 preprocessing genome) [Default: ~/install |
117 nstall/BSseeker2/bs_utils/reference_genomes] | 118 /BSseeker2/bs_utils/reference_genomes] |
118 -v, --version show version of BS-Seeker2 | 119 -v, --version show version of BS-Seeker2 |
119 | 120 |
120 Reduced Representation Bisulfite Sequencing Options: | 121 Reduced Representation Bisulfite Sequencing Options: |
121 Use this options with conjuction of -r [--rrbs] | 122 Use this options with conjuction of -r [--rrbs] |
122 | 123 |
123 -r, --rrbs Build index specially for Reduced Representation | 124 -r, --rrbs Build index specially for Reduced Representation |
124 Bisulfite Sequencing experiments. Genome other than | 125 Bisulfite Sequencing experiments. Genome other than |
125 certain fragments will be masked. [Default: False] | 126 certain fragments will be masked. [Default: False] |
126 -l LOW_BOUND, --low=LOW_BOUND | 127 -l LOW_BOUND, --low=LOW_BOUND |
127 lower bound of fragment length (excluding recognition | 128 lower bound of fragment length (excluding recognition |
128 sequence such as C-CGG) [Default: 40] | 129 sequence such as C-CGG) [Default: 20] |
129 -u UP_BOUND, --up=UP_BOUND | 130 -u UP_BOUND, --up=UP_BOUND |
130 upper bound of fragment length (excluding recognition | 131 upper bound of fragment length (excluding recognition |
131 sequence such as C-CGG ends) [Default: 500] | 132 sequence such as C-CGG ends) [Default: 500] |
132 -c CUT_FORMAT, --cut-site=CUT_FORMAT | 133 -c CUT_FORMAT, --cut-site=CUT_FORMAT |
133 Cut sites of restriction enzyme. Ex: MspI(C-CGG), | 134 Cut sites of restriction enzyme. Ex: MspI(C-CGG), |
134 Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG). | 135 Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG). |
135 [Default: C-CGG] | 136 [Default: C-CGG] |
136 | 137 |
138 | |
137 ##Example | 139 ##Example |
138 | 140 |
139 * Build genome index for WGBS using bowtie, path of bowtie should be included in $PATH | 141 * Build genome index for WGBS using bowtie, path of bowtie should be included in $PATH |
140 | 142 |
141 python bs_seeker2-build.py -f genome.fa --aligner=bowtie | 143 python bs_seeker2-build.py -f genome.fa --aligner=bowtie |
148 | 150 |
149 python bs_seeker2-build.py -f genome.fa -r -l 40 -u 400 --aligner=bowtie2 | 151 python bs_seeker2-build.py -f genome.fa -r -l 40 -u 400 --aligner=bowtie2 |
150 | 152 |
151 * Build genome index for RRBS library for double-enzyme : MspI (C-CGG) & ApeKI (G-CWGC, where W=A|T, see [IUPAC code](http://www.bioinformatics.org/sms/iupac.html)) | 153 * Build genome index for RRBS library for double-enzyme : MspI (C-CGG) & ApeKI (G-CWGC, where W=A|T, see [IUPAC code](http://www.bioinformatics.org/sms/iupac.html)) |
152 | 154 |
153 python bs_seeker2-build.py -f genome.fa -r -c C-CGG,G-CWGC | 155 python bs_seeker2-build.py -f genome.fa -r -c C-CGG,G-CWGC --aligner=bowtie |
154 | 156 |
155 ##Tips: | 157 ##Tips: |
156 | 158 |
157 - Index built for BS-Seeker2 is different from the index for BS-Seeker 1. | 159 - Index built for BS-Seeker2 is different from the index for BS-Seeker 1. |
158 For RRBS, you need to specify "-r" in the parameters. Also, you need to specify LOW_BOUND and UP_BOUND for the range of fragment lengths according your protocol. | 160 For RRBS, you need to specify "-r" in the parameters. Also, you need to specify LOW_BOUND and UP_BOUND for the range of fragment lengths according your protocol. |
170 Module to map reads on 3-letter converted genome. | 172 Module to map reads on 3-letter converted genome. |
171 | 173 |
172 ##Usage : | 174 ##Usage : |
173 | 175 |
174 $ python ~/install/BSseeker2/bs_seeker2-align.py -h | 176 $ python ~/install/BSseeker2/bs_seeker2-align.py -h |
175 Usage: bs_seeker2-align.py [options] | 177 Usage: bs_seeker2-align.py {-i <single> | -1 <mate1> -2 <mate2>} -g <genome.fa> [options] |
176 | 178 |
177 Options: | 179 Options: |
178 -h, --help show this help message and exit | 180 -h, --help show this help message and exit |
179 | 181 |
180 For single end reads: | 182 For single end reads: |
181 -i INFILE, --input=INFILE | 183 -i INFILE, --input=INFILE |
182 Input your read file name (FORMAT: sequences, | 184 Input read file (FORMAT: sequences, qseq, fasta, |
183 fastq, qseq,fasta) | 185 fastq). Ex: read.fa or read.fa.gz |
184 | 186 |
185 For pair end reads: | 187 For pair end reads: |
186 -1 FILE, --input_1=FILE | 188 -1 FILE, --input_1=FILE |
187 Input your read file end 1 (FORMAT: sequences, | 189 Input read file, mate 1 (FORMAT: sequences, qseq, |
188 qseq, fasta, fastq) | 190 fasta, fastq) |
189 -2 FILE, --input_2=FILE | 191 -2 FILE, --input_2=FILE |
190 Input your read file end 2 (FORMAT: sequences, | 192 Input read file, mate 2 (FORMAT: sequences, qseq, |
191 qseq, fasta, fastq) | 193 fasta, fastq) |
192 --minins=MIN_INSERT_SIZE | 194 -I MIN_INSERT_SIZE, --minins=MIN_INSERT_SIZE |
193 The minimum insert size for valid paired-end | 195 The minimum insert size for valid paired-end |
194 alignments [Default: -1] | 196 alignments [Default: 0] |
195 --maxins=MAX_INSERT_SIZE | 197 -X MAX_INSERT_SIZE, --maxins=MAX_INSERT_SIZE |
196 The maximum insert size for valid paired-end | 198 The maximum insert size for valid paired-end |
197 alignments [Default: 400] | 199 alignments [Default: 500] |
198 | 200 |
199 Reduced Representation Bisulfite Sequencing Options: | 201 Reduced Representation Bisulfite Sequencing Options: |
200 -r, --rrbs Process reads from Reduced Representation Bisulfite | 202 -r, --rrbs Map reads to the Reduced Representation genome |
201 Sequencing experiments | |
202 -c pattern, --cut-site=pattern | 203 -c pattern, --cut-site=pattern |
203 Cutting sites of restriction enzyme. Ex: MspI(C-CGG), | 204 Cutting sites of restriction enzyme. Ex: MspI(C-CGG), |
204 Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG). | 205 Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG). |
206 [Default: C-CGG] | |
205 -L RRBS_LOW_BOUND, --low=RRBS_LOW_BOUND | 207 -L RRBS_LOW_BOUND, --low=RRBS_LOW_BOUND |
206 lower bound of fragment length (excluding C-CGG ends) | 208 Lower bound of fragment length (excluding C-CGG ends) |
207 [Default: 40] | 209 [Default: 20] |
208 -U RRBS_UP_BOUND, --up=RRBS_UP_BOUND | 210 -U RRBS_UP_BOUND, --up=RRBS_UP_BOUND |
209 upper bound of fragment length (excluding C-CGG ends) | 211 Upper bound of fragment length (excluding C-CGG ends) |
210 [Default: 500] | 212 [Default: 500] |
211 | 213 |
212 General options: | 214 General options: |
213 -t TAG, --tag=TAG [Y]es for undirectional lib, [N]o for directional | 215 -t TAG, --tag=TAG [Y]es for undirectional lib, [N]o for directional |
214 [Default: N] | 216 [Default: N] |
215 -s CUTNUMBER1, --start_base=CUTNUMBER1 | 217 -s CUTNUMBER1, --start_base=CUTNUMBER1 |
216 The first base of your read to be mapped [Default: 1] | 218 The first cycle of the read to be mapped [Default: 1] |
217 -e CUTNUMBER2, --end_base=CUTNUMBER2 | 219 -e CUTNUMBER2, --end_base=CUTNUMBER2 |
218 The last cycle number of your read to be mapped | 220 The last cycle of the read to be mapped [Default: 200] |
219 [Default: 200] | |
220 -a FILE, --adapter=FILE | 221 -a FILE, --adapter=FILE |
221 Input text file of your adaptor sequences (to be | 222 Input text file of your adaptor sequences (to be |
222 trimed from the 3'end of the reads). Input 1 seq for | 223 trimmed from the 3'end of the reads, ). Input one seq |
223 dir. lib., 2 seqs for undir. lib. One line per | 224 for dir. lib., twon seqs for undir. lib. One line per |
224 sequence | 225 sequence. Only the first 10bp will be used |
225 --am=ADAPTER_MISMATCH | 226 --am=ADAPTER_MISMATCH |
226 Number of mismatches allowed in adaptor [Default: 1] | 227 Number of mismatches allowed in adapter [Default: 0] |
227 -g GENOME, --genome=GENOME | 228 -g GENOME, --genome=GENOME |
228 Name of the reference genome (the same as the | 229 Name of the reference genome (should be the same as |
229 reference genome file in the preprocessing step) [ex. | 230 "-f" in bs_seeker2-build.py ) [ex. chr21_hg18.fa] |
230 chr21_hg18.fa] | 231 -m NO_MISMATCHES, --mismatches=NO_MISMATCHES |
231 -m INT_NO_MISMATCHES, --mismatches=INT_NO_MISMATCHES | |
232 Number of mismatches in one read [Default: 4] | 232 Number of mismatches in one read [Default: 4] |
233 --aligner=ALIGNER Aligner program to perform the analisys: bowtie, | 233 --aligner=ALIGNER Aligner program for short reads mapping: bowtie, |
234 bowtie2, soap [Default: bowtie2] | 234 bowtie2, soap, rmap [Default: bowtie] |
235 -p PATH, --path=PATH | 235 -p PATH, --path=PATH |
236 Path to the aligner program. Defaults: | 236 Path to the aligner program. Detected: |
237 bowtie: /u/home/mcdb/weilong/install/bowtie-0.12.8 | 237 bowtie: ~/install/bowtie |
238 bowtie2: | 238 bowtie2: ~/install/bowtie2 |
239 /u/home/mcdb/weilong/install/bowtie2-2.0.0-beta7 | 239 rmap: ~/install/rmap/bin |
240 soap: /u/home/mcdb/weilong/soap2.21release/ | 240 soap: ~/install/soap/ |
241 -d DBPATH, --db=DBPATH | 241 -d DBPATH, --db=DBPATH |
242 Path to the reference genome library (generated in | 242 Path to the reference genome library (generated in |
243 preprocessing genome) [Default: /u/home/mcdb/weilong/i | 243 preprocessing genome) [Default: ~/i |
244 nstall/BSseeker2/bs_utils/reference_genomes] | 244 nstall/BSseeker2/bs_utils/reference_genomes] |
245 -l NO_SPLIT, --split_line=NO_SPLIT | 245 -l NO_SPLIT, --split_line=NO_SPLIT |
246 Number of lines per split (the read file will be split | 246 Number of lines per split (the read file will be split |
247 into small files for mapping. The result will be | 247 into small files for mapping. The result will be |
248 merged. [Default: 4000000] | 248 merged. [Default: 4000000] |
249 -o OUTFILE, --output=OUTFILE | 249 -o OUTFILE, --output=OUTFILE |
250 The name of output file [INFILE.bs(se|pe|rrbs)] | 250 The name of output file [INFILE.bs(se|pe|rrbs)] |
251 -f FORMAT, --output-format=FORMAT | 251 -f FORMAT, --output-format=FORMAT |
252 Output format: bam, sam, bs_seeker1 [Default: bam] | 252 Output format: bam, sam, bs_seeker1 [Default: bam] |
253 --no-header Suppress SAM header lines [Default: False] | 253 --no-header Suppress SAM header lines [Default: False] |
254 --temp_dir=PATH The path to your temporary directory [Default: /tmp] | 254 --temp_dir=PATH The path to your temporary directory [Detected: /tmp] |
255 --XS=XS_FILTER Filter definition for tag XS, format X,Y. X=0.8 and | 255 --XS=XS_FILTER Filter definition for tag XS, format X,Y. X=0.8 and |
256 y=5 indicate that for one read, if #(mCH sites)/#(all | 256 y=5 indicate that for one read, if #(mCH sites)/#(all |
257 CH sites)>0.8 and #(mCH sites)>5, then tag XS=1; or | 257 CH sites)>0.8 and #(mCH sites)>5, then tag XS=1; or |
258 else tag XS=0. [Default: 0.5,5] | 258 else tag XS=0. [Default: 0.5,5] |
259 --multiple-hit Output reads with multiple hits to | 259 --multiple-hit Output reads with multiple hits to |
261 -v, --version show version of BS-Seeker2 | 261 -v, --version show version of BS-Seeker2 |
262 | 262 |
263 Aligner Options: | 263 Aligner Options: |
264 You may specify any additional options for the aligner. You just have | 264 You may specify any additional options for the aligner. You just have |
265 to prefix them with --bt- for bowtie, --bt2- for bowtie2, --soap- for | 265 to prefix them with --bt- for bowtie, --bt2- for bowtie2, --soap- for |
266 soap, and BS Seeker will pass them on. For example: --bt-p 4 will | 266 soap, --rmap- for rmap, and BS Seeker will pass them on. For example: |
267 increase the number of threads for bowtie to 4, --bt--tryhard will | 267 --bt-p 4 will increase the number of threads for bowtie to 4, --bt-- |
268 instruct bowtie to try as hard as possible to find valid alignments | 268 tryhard will instruct bowtie to try as hard as possible to find valid |
269 when they exist, and so on. Be sure that you know what you are doing | 269 alignments when they exist, and so on. Be sure that you know what you |
270 when using these options! Also, we don't do any validation on the | 270 are doing when using these options! Also, we don't do any validation |
271 values. | 271 on the values. |
272 | |
273 | 272 |
274 | 273 |
275 ##Examples : | 274 ##Examples : |
276 | 275 |
277 * Align from fasta format with bowtie2 (local alignment) for whole genome, allowing 3 mismatches | 276 * WGBS library ; alignment mode, bowtie ; map to WGBS index |
278 | 277 |
279 python bs_seeker2-align.py -i WGBS.fa -m 3 --aligner=bowtie2 -o WGBS.bam -f bam -g genome.fa | 278 python bs_seeker2-align.py -i WGBS.fa --aligner=bowtie -o WGBS.bam -f bam -g genome.fa |
280 | 279 |
281 * Align from qseq format for RRBS with bowtie, default parameters for RRBS fragments | 280 * WGBS library ; alignment mode, bowtie2-local ; map to WGBS index |
282 | 281 |
283 python bs_seeker2-align.py -i RRBS.fa --aligner=bowtie -o RRBS.sam -f sam -g genome.fa -r -a adapter.txt | 282 python bs_seeker2-align.py -i WGBS.fa --aligner=bowtie2 -o WGBS.bam -f bam -g genome.fa |
284 | 283 |
285 * Align from qseq format for RRBS with bowtie (end-to-end), specifying lengths of fragments ranging [40bp, 400bp] | 284 * WGBS library ; alignment mode, bowtie2-end-to-end ; map to WGBS index |
286 | 285 |
287 python bs_seeker2-align.py -i RRBS.qseq --aligner=bowtie2 --bt2--end-to-end -o RRBS.bam -f bam -g genome.fa -r --low=40 --up=400 -a adapter.txt | 286 python bs_seeker2-align.py -i WGBS.fa -m 3 --aligner=bowtie2 -o WGBS.bam -f bam -g genome.fa --bt2--end-to-end |
287 | |
288 * RRBS library ; alignment mode, bowtie ; map to RR index | |
289 | |
290 python bs_seeker2-align.py -i RRBS.fa --aligner=bowtie -o RRBS.bam -g genome.fa -r -a adapter.txt | |
291 | |
292 * RRBS library ; alignment mode, bowtie ; map to WG index | |
293 | |
294 python bs_seeker2-align.py -i RRBS.fa --aligner=bowtie -o RRBS.bam -g genome.fa -a adapter.txt | |
295 | |
296 * RRBS library ; alignment mode, bowtie2-end-to-end ; map to WG index | |
297 | |
298 python bs_seeker2-align.py -i RRBS.fa --aligner=bowtie -o RRBS.bam -g genome.fa -a adapter.txt --bt2--end-to-end | |
299 | |
300 * Align from qseq format for RRBS with bowtie, specifying lengths of fragments ranging [40bp, 400bp] | |
301 | |
302 python bs_seeker2-align.py -i RRBS.qseq --aligner=bowtie -o RRBS.bam -f bam -g genome.fa -r --low=40 --up=400 -a adapter.txt | |
288 | 303 |
289 The parameters '--low' and '--up' should be the same with corresponding parameters when building the genome index | 304 The parameters '--low' and '--up' should be the same with corresponding parameters when building the genome index |
290 | 305 |
306 * WGBS library ; alignment mode, bowtie ; map to WGBS index; use 8 threads for alignment | |
307 | |
308 python bs_seeker2-align.py -i WGBS.fa --aligner=bowtie -o WGBS.bam -f bam -g genome.fa --bt-p 4 | |
309 | |
310 BS-Seeker2 will run TWO bowtie instances in parallel. | |
291 | 311 |
292 | 312 |
293 ##Input file: | 313 ##Input file: |
294 | 314 |
295 - Adapter.txt (example) : | 315 - Adapter.txt (example) : |
339 | 359 |
340 - It's always better to use a wider range for fragment length. | 360 - It's always better to use a wider range for fragment length. |
341 | 361 |
342 For example, if 95% of reads come from fragments with length range [50bp, 250bp], you'd better choose [40bp, 300bp]. | 362 For example, if 95% of reads come from fragments with length range [50bp, 250bp], you'd better choose [40bp, 300bp]. |
343 | 363 |
364 - Fewer mismatches for the 'local alignment' mode. | |
365 | |
366 As the 'local alignment', the bad sequenced bases are usually trimmed, and would not be considered by the parameter "-m". | |
367 It is suggested to user fewer mismatches for the 'local alignment' mode. | |
344 | 368 |
345 (3) bs_seeker2-call_methylation.py | 369 (3) bs_seeker2-call_methylation.py |
346 ------------ | 370 ------------ |
347 | 371 |
348 | 372 |
349 This module calls methylation levels from the mapping result. | 373 This module calls methylation levels from the mapping result. |
350 | 374 |
351 ##Usage: | 375 ##Usage: |
352 | 376 |
353 $ python bs_seeker2-call_methylation.py -h | 377 $ python bs_seeker2-call_methylation.py -h |
354 Usage: bs_seeker2-call_methylation.py [options] | |
355 | |
356 Options: | 378 Options: |
357 -h, --help show this help message and exit | 379 -h, --help show this help message and exit |
358 -i INFILE, --input=INFILE | 380 -i INFILE, --input=INFILE |
359 BAM output from bs_seeker2-align.py | 381 BAM output from bs_seeker2-align.py |
360 -d DBPATH, --db=DBPATH | 382 -d DBPATH, --db=DBPATH |
361 Path to the reference genome library (generated in | 383 Path to the reference genome library (generated in |
362 preprocessing genome) [Default: /u/home/mcdb/weilong/i | 384 preprocessing genome) [Default: ~/install |
363 nstall/BSseeker2/bs_utils/reference_genomes] | 385 /BSseeker2/bs_utils/reference_genomes] |
364 -o OUTFILE, --output-prefix=OUTFILE | 386 -o OUTFILE, --output-prefix=OUTFILE |
365 The output prefix to create ATCGmap and wiggle files | 387 The output prefix to create ATCGmap and wiggle files |
366 [INFILE] | 388 [INFILE] |
367 --wig=OUTFILE The output .wig file [INFILE.wig] | 389 --wig=OUTFILE The output .wig file [INFILE.wig] |
368 --CGmap=OUTFILE The output .CGmap file [INFILE.CGmap] | 390 --CGmap=OUTFILE The output .CGmap file [INFILE.CGmap] |
369 --ATCGmap=OUTFILE The output .ATCGmap file [INFILE.ATCGmap] | 391 --ATCGmap=OUTFILE The output .ATCGmap file [INFILE.ATCGmap] |
370 -x, --rm-SX Removed reads with tag 'XS:i:1', which would be | 392 -x, --rm-SX Removed reads with tag 'XS:i:1', which would be |
371 considered as not fully converted by bisulfite | 393 considered as not fully converted by bisulfite |
372 treatment [Default: False] | 394 treatment [Default: False] |
395 --txt Show CGmap and ATCGmap in .gz [Default: False] | |
373 -r READ_NO, --read-no=READ_NO | 396 -r READ_NO, --read-no=READ_NO |
374 The least number of reads covering one site to be | 397 The least number of reads covering one site to be |
375 shown in wig file [Default: 1] | 398 shown in wig file [Default: 1] |
376 -v, --version show version of BS-Seeker2 | 399 -v, --version show version of BS-Seeker2 |
377 | 400 |
428 (1) chromosome | 451 (1) chromosome |
429 (2) nucleotide on Watson (+) strand | 452 (2) nucleotide on Watson (+) strand |
430 (3) position | 453 (3) position |
431 (4) context (CG/CHG/CHH) | 454 (4) context (CG/CHG/CHH) |
432 (5) dinucleotide-context (CA/CC/CG/CT) | 455 (5) dinucleotide-context (CA/CC/CG/CT) |
433 (6) methyltion-level = #-of-C / (#-of-C + #-of-T) | 456 (6) methylation-level = #_of_C / (#_of_C + #_of_T). |
434 (7) #-of-C (methylated) | 457 (7) #_of_C (methylated C, the count of reads showing C here) |
435 (8) (#-ofC + #-of-T) (all cytosines) | 458 (8) = #_of_C + #_of_T (all Cytosines, the count of reads showing C or T here) |
436 | 459 |
437 | 460 |
438 - ATCGmap file | 461 - ATCGmap file |
439 | 462 |
440 Sample: | 463 Sample: |
441 | 464 |
442 chr1 T 3009410 -- -- 0 10 0 0 0 0 0 0 0 0 na | 465 chr1 T 3009410 -- -- 0 10 0 0 0 0 0 0 0 0 na |
443 chr1 C 3009411 CHH CC 0 10 0 0 0 0 0 0 0 0 0.0 | 466 chr1 C 3009411 CHH CC 0 10 0 0 0 0 0 0 0 0 0.0 |
444 chr1 C 3009412 CHG CC 0 10 0 0 0 0 0 0 0 0 0.0 | 467 chr1 C 3009412 CHG CC 0 10 0 0 0 0 0 0 0 0 0.0 |
445 chr1 C 3009413 CG CG 0 10 50 0 0 0 0 0 0 0 0.833333333333 | 468 chr1 C 3009413 CG CG 0 10 50 0 0 0 0 0 0 0 0.83 |
446 | 469 |
447 | 470 |
448 Format descriptions: | 471 Format descriptions: |
449 | 472 |
450 (1) chromosome | 473 (1) chromosome |
465 (12) # of reads from Crick strand mapped here, support T on Watson strand and A on Crick strand | 488 (12) # of reads from Crick strand mapped here, support T on Watson strand and A on Crick strand |
466 (13) # of reads from Crick strand mapped here, support C on Watson strand and G on Crick strand | 489 (13) # of reads from Crick strand mapped here, support C on Watson strand and G on Crick strand |
467 (14) # of reads from Crick strand mapped here, support G on Watson strand and C on Crick strand | 490 (14) # of reads from Crick strand mapped here, support G on Watson strand and C on Crick strand |
468 (15) # of reads from Crick strand mapped here, support N | 491 (15) # of reads from Crick strand mapped here, support N |
469 | 492 |
470 (16) methylation_level = #C/(#C+#T) = (C8+C14)/(C7+C8+C11+C14); "nan" means none reads support C/T at this position. | 493 (16) methylation_level = #C/(#C+#T) = C8/(C7+C8) for Watson strand, =C14/(C11+C14) for Crick strand; |
494 "nan" means none reads support C/T at this position. | |
471 | 495 |
472 | 496 |
473 | 497 |
474 Contact Information | 498 Contact Information |
475 ============ | 499 ============ |
476 | 500 |
477 If you still have questions on BS-Seeker 2, or you find bugs when using BS-Seeker 2, or you have suggestions, please write email to guoweilong@gmail.com (Weilong Guo). | 501 If you still have questions on BS-Seeker 2, or you find bugs when using BS-Seeker 2, or you have suggestions, please write email to [Weilong Guo](guoweilong@gmail.com). |
478 | 502 |
479 | 503 |
480 | 504 |
481 Questions & Answers | 505 Questions & Answers |
482 ============ | 506 ============ |
554 mv master BSSeeker2.zip | 578 mv master BSSeeker2.zip |
555 unzip BSSeeker2.zip | 579 unzip BSSeeker2.zip |
556 cd BSseeker2-master/ | 580 cd BSseeker2-master/ |
557 | 581 |
558 | 582 |
559 (4)Run BS-Seeker2 | 583 (4)Run BS-Seeker2 |
560 | 584 |
561 Q: Can I add the path of BS-Seeker2's *.py to the $PATH, so I can call | 585 Q: Can I add the path of BS-Seeker2's *.py to the $PATH, so I can call |
562 BS-Seeker2 from anywhere? | 586 BS-Seeker2 from anywhere? |
563 | 587 |
564 A: If you're using the "python" from path "/usr/bin/python", you can directly | 588 A: If you're using the "python" from path "/usr/bin/python", you can directly |
583 | 607 |
584 bs_seeker_build.py -h | 608 bs_seeker_build.py -h |
585 bs_seeker-align.py -h | 609 bs_seeker-align.py -h |
586 bs_seeker-call_methylation.py -h | 610 bs_seeker-call_methylation.py -h |
587 | 611 |
588 | 612 (5) Unique alignment |
589 | 613 |
590 | 614 Q: If I want to only keep alignments that map uniquely, is this an argument I should supply directly |
615 to Bowtie2 (via BS Seeker 2's command line option), or is this an option that's available in | |
616 BS Seeker 2 itself? | |
617 | |
618 A: BS-Seeker2 reports unique alignment by default already. If you want to know how many reads | |
619 could have multiple hits, run BS-Seeker2 with parameter "--multiple-hit". | |
620 | |
621 | |
622 (6) Output | |
623 | |
624 Q: In CGmap files, why some lines shown "--" but not a motif (CG/CHG/CHH), for example: | |
625 | |
626 chr01 C 4303711 -- CC 0.0 0 2 | |
627 chr01 C 4303712 -- CN 0.0 0 2 | |
628 | |
629 A: In this example, the site 4303713 is "N" in genome, thus we could not decide the explict pattern. | |
630 | |
631 (7) Algorithm to remove the adapter. | |
632 | |
633 Q: What's the algorithm to remove the adapter | |
634 | |
635 A: BS-Seeker2 has built-in algorithm for removing the adapter, which is developed by [Weilong Guo](http://bioinfo.au.tsinghua.edu.cn/member/wguo/index.html). | |
636 | |
637 First, if the adapter was provided as "AGATCGGAAGAGCACACGTC", only the first 10bp would be used. | |
638 Second, we use semi-local alignment strategy for removing the adapters. | |
639 Exmaple: | |
640 | |
641 Read : ACCGCGTTGATCGAGTACGTACGTGGGTC | |
642 Adapter : ....................ACGTGGGTCCCG | |
643 | |
644 no_mismatch : the maximum number allowed for mismatches | |
645 | |
646 Algorithm: (allowing 1 mismatch) | |
647 -Step 1: | |
648 ACCGCGTTGATCGAGTACGTACGTGGGTC | |
649 ||XX | |
650 ACGTGGGTCCCG | |
651 -Step 2: | |
652 ACCGCGTTGATCGAGTACGTACGTGGGTC | |
653 X||X | |
654 .ACGTGGGTCCCG | |
655 -Step 3: | |
656 ACCGCGTTGATCGAGTACGTACGTGGGTC | |
657 XX | |
658 ..ACGTGGGTCCCG | |
659 -Step ... | |
660 -Step N: | |
661 ACCGCGTTGATCGAGTACGTACGTGGGTC | |
662 ||||||||| | |
663 ....................ACGTGGGTCCCG | |
664 Success & return! | |
665 | |
666 Third, we also removed the synthesized bases at the end of RRBS fragments. | |
667 Take the "C-CGG" cutting site as example, | |
668 | |
669 - - C|U G G - - =>cut=> - - C =>add=> - - C|C G =>sequencing | |
670 - - G G C|C - - - - G G C - - G G C | |
671 | |
672 In our algorithm, the "CG" in "--CCG" (upper strand) was trimmed, in order to get accurate methyaltion level. | |
673 | |
674 | |
675 | |
676 | |
677 |