0
|
1 BS-Seeker2
|
|
2 =========
|
|
3
|
|
4 BS-Seeker2 (BS Seeker 2) performs accurate and fast mapping of bisulfite-treated short reads. BS-Seeker2 is an updated version on BS-Seeker.
|
|
5
|
|
6 0. Availability
|
|
7 ============
|
|
8
|
|
9 Homepage of [BS-Seeker2](http://pellegrini.mcdb.ucla.edu/BS_Seeker2/).
|
|
10
|
|
11 The source code for this package is available from
|
|
12 [https://github.com/BSSeeker/BSseeker2](https://github.com/BSSeeker/BSseeker2).
|
|
13 Also, you can use an instance of BS-Seeker 2 in Galaxy from [http://galaxy.hoffman2.idre.ucla.edu](http://galaxy.hoffman2.idre.ucla.edu).
|
|
14 (Label: "NGS: Methylation Mapping"/"Methylation Map with BS Seeker2")
|
|
15
|
|
16
|
|
17 1. Remarkable new features
|
|
18 ============
|
|
19 * Reduced index for RRBS, accelerating the mapping speed and increasing mappability
|
|
20 * Allowing local alignment with Bowtie 2, increased the mappability
|
|
21
|
|
22 2. Other features
|
|
23 ============
|
|
24 * Supported library types
|
|
25 - whole genome-wide bisulfite sequencing (WGBS)
|
|
26 - reduced representative bisulfite sequencing (RRBS)
|
|
27
|
|
28 * Supported formats for input file
|
|
29 - [fasta](http://en.wikipedia.org/wiki/FASTA_format)
|
|
30 - [fastq](http://en.wikipedia.org/wiki/FASTQ_format)
|
|
31 - [qseq](http://jumpgate.caltech.edu/wiki/QSeq)
|
|
32 - pure sequence (one-line one-sequence)
|
|
33
|
|
34 * Supported alignment tools
|
|
35 - [bowtie](http://bowtie-bio.sourceforge.net/index.shtml) : Single-seed
|
|
36 - [bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) : Multiple-seed, gapped-alignment
|
|
37 - [local alignment](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#local-alignment-example)
|
|
38 - [end-to-end alignment](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#end-to-end-alignment-example)
|
|
39 - [SOAP](http://soap.genomics.org.cn/)
|
|
40
|
|
41 * Supported formats for mapping results
|
|
42 - [BAM](http://genome.ucsc.edu/FAQ/FAQformat.html#format5.1)
|
|
43 - [SAM](http://samtools.sourceforge.net/)
|
|
44 - [BS-seeker 1](http://pellegrini.mcdb.ucla.edu/BS_Seeker/USAGE.html)
|
|
45
|
|
46 3. System requirements
|
|
47 ============
|
|
48
|
|
49 * Linux or Mac OS platform
|
|
50 * One of the following Aligner
|
|
51 - [bowtie](http://bowtie-bio.sourceforge.net/)
|
|
52 - [bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/) (Recommend)
|
|
53 - [soap](http://soap.genomics.org.cn/)
|
|
54 * [Python](http://www.python.org/download/) (Version 2.6 +)
|
|
55
|
|
56 (It is normally pre-installed in Linux. Type " python -V" to see the installed version.)
|
|
57
|
|
58 * [pysam](http://code.google.com/p/pysam/) package is needed.
|
|
59
|
|
60 (Read "Questions & Answers" if you have problem when installing this package.)
|
|
61
|
|
62
|
|
63
|
|
64 4. Modules' descriptions
|
|
65 ============
|
|
66
|
|
67 (0) FilterReads.py
|
|
68 ------------
|
|
69
|
|
70 Optional and independent module.
|
|
71 Some reads would be extremely amplified during the PCR. This script helps you get unique reads before doing the mapping. You can decide whether or not to filter reads before doing the mapping.
|
|
72
|
|
73 ##Usage :
|
|
74
|
|
75 $ python FilterReads.py
|
|
76 Usage: FilterReads.py -i <input> -o <output> [-k]
|
|
77 Author : Guo, Weilong; guoweilong@gmail.com; 2012-11-10
|
|
78 Unique reads for qseq/fastq/fasta/sequencce, and filter
|
|
79 low quality file in qseq file.
|
|
80
|
|
81 Options:
|
|
82 -h, --help show this help message and exit
|
|
83 -i FILE Name of the input qseq/fastq/fasta/sequence file
|
|
84 -o FILE Name of the output file
|
|
85 -k Would not filter low quality reads if specified
|
|
86
|
|
87
|
|
88 ##Tip :
|
|
89
|
|
90 - This step is not suggested for RRBS library, as reads from RRBS library would more likely from the same location.
|
|
91
|
|
92
|
|
93 (1) bs_seeker2-build.py
|
|
94 ------------
|
|
95
|
|
96 Module to build the index for BS-Seeker2.
|
|
97
|
|
98
|
|
99 ##Usage :
|
|
100
|
|
101 $ python bs_seeker2-build.py -h
|
|
102 Usage: bs_seeker2-build.py [options]
|
|
103
|
|
104 Options:
|
|
105 -h, --help show this help message and exit
|
|
106 -f FILE, --file=FILE Input your reference genome file (fasta)
|
|
107 --aligner=ALIGNER Aligner program to perform the analysis: bowtie,
|
|
108 bowtie2, soap [Default: bowtie2]
|
|
109 -p PATH, --path=PATH Path to the aligner program. Defaults:
|
|
110 bowtie: /u/home/mcdb/weilong/install/bowtie-0.12.8
|
|
111 bowtie2:
|
|
112 /u/home/mcdb/weilong/install/bowtie2-2.0.0-beta7
|
|
113 soap: /u/home/mcdb/weilong/install/soap2.21release/
|
|
114 -d DBPATH, --db=DBPATH
|
|
115 Path to the reference genome library (generated in
|
|
116 preprocessing genome) [Default: /u/home/mcdb/weilong/i
|
|
117 nstall/BSseeker2/bs_utils/reference_genomes]
|
|
118 -v, --version show version of BS-Seeker2
|
|
119
|
|
120 Reduced Representation Bisulfite Sequencing Options:
|
|
121 Use this options with conjuction of -r [--rrbs]
|
|
122
|
|
123 -r, --rrbs Build index specially for Reduced Representation
|
|
124 Bisulfite Sequencing experiments. Genome other than
|
|
125 certain fragments will be masked. [Default: False]
|
|
126 -l LOW_BOUND, --low=LOW_BOUND
|
|
127 lower bound of fragment length (excluding recognition
|
|
128 sequence such as C-CGG) [Default: 40]
|
|
129 -u UP_BOUND, --up=UP_BOUND
|
|
130 upper bound of fragment length (excluding recognition
|
|
131 sequence such as C-CGG ends) [Default: 500]
|
|
132 -c CUT_FORMAT, --cut-site=CUT_FORMAT
|
|
133 Cut sites of restriction enzyme. Ex: MspI(C-CGG),
|
|
134 Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG).
|
|
135 [Default: C-CGG]
|
|
136
|
|
137 ##Example
|
|
138
|
|
139 * Build genome index for WGBS using bowtie, path of bowtie should be included in $PATH
|
|
140
|
|
141 python bs_seeker2-build.py -f genome.fa --aligner=bowtie
|
|
142
|
|
143 * Build genome index for RRBS with default parameters specifying the path for bowtie2
|
|
144
|
|
145 python bs_seeker2-build.py -f genome.fa --aligner=bowtie2 -p ~/install/bowtie2-2.0.0-beta7/ -r
|
|
146
|
|
147 * Build genome index for RRBS library using bowite2, with fragment lengths ranging [40bp, 400bp]
|
|
148
|
|
149 python bs_seeker2-build.py -f genome.fa -r -l 40 -u 400 --aligner=bowtie2
|
|
150
|
|
151 * Build genome index for RRBS library for double-enzyme : MspI (C-CGG) & ApeKI (G-CWGC, where W=A|T, see [IUPAC code](http://www.bioinformatics.org/sms/iupac.html))
|
|
152
|
|
153 python bs_seeker2-build.py -f genome.fa -r -c C-CGG,G-CWGC
|
|
154
|
|
155 ##Tips:
|
|
156
|
|
157 - Index built for BS-Seeker2 is different from the index for BS-Seeker 1.
|
|
158 For RRBS, you need to specify "-r" in the parameters. Also, you need to specify LOW_BOUND and UP_BOUND for the range of fragment lengths according your protocol.
|
|
159
|
|
160 - The fragment length is different from read length. Fragments refers to the DNA fragments which you get by size-selection step (i.e. gel-cut oor AMPure beads). Lengths of fragments are supposed to be in a range, such as [50bp,250bp].
|
|
161
|
|
162 - The indexes for RRBS and WGBS are different. Also, indexes for RRBS are specific for fragment length parameters (LOW_BOUND and UP_BOUND).
|
|
163
|
|
164
|
|
165
|
|
166
|
|
167 (2) bs_seeker2-align.py
|
|
168 ------------
|
|
169
|
|
170 Module to map reads on 3-letter converted genome.
|
|
171
|
|
172 ##Usage :
|
|
173
|
|
174 $ python ~/install/BSseeker2/bs_seeker2-align.py -h
|
|
175 Usage: bs_seeker2-align.py [options]
|
|
176
|
|
177 Options:
|
|
178 -h, --help show this help message and exit
|
|
179
|
|
180 For single end reads:
|
|
181 -i INFILE, --input=INFILE
|
|
182 Input your read file name (FORMAT: sequences,
|
|
183 fastq, qseq,fasta)
|
|
184
|
|
185 For pair end reads:
|
|
186 -1 FILE, --input_1=FILE
|
|
187 Input your read file end 1 (FORMAT: sequences,
|
|
188 qseq, fasta, fastq)
|
|
189 -2 FILE, --input_2=FILE
|
|
190 Input your read file end 2 (FORMAT: sequences,
|
|
191 qseq, fasta, fastq)
|
|
192 --minins=MIN_INSERT_SIZE
|
|
193 The minimum insert size for valid paired-end
|
|
194 alignments [Default: -1]
|
|
195 --maxins=MAX_INSERT_SIZE
|
|
196 The maximum insert size for valid paired-end
|
|
197 alignments [Default: 400]
|
|
198
|
|
199 Reduced Representation Bisulfite Sequencing Options:
|
|
200 -r, --rrbs Process reads from Reduced Representation Bisulfite
|
|
201 Sequencing experiments
|
|
202 -c pattern, --cut-site=pattern
|
|
203 Cutting sites of restriction enzyme. Ex: MspI(C-CGG),
|
|
204 Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG).
|
|
205 -L RRBS_LOW_BOUND, --low=RRBS_LOW_BOUND
|
|
206 lower bound of fragment length (excluding C-CGG ends)
|
|
207 [Default: 40]
|
|
208 -U RRBS_UP_BOUND, --up=RRBS_UP_BOUND
|
|
209 upper bound of fragment length (excluding C-CGG ends)
|
|
210 [Default: 500]
|
|
211
|
|
212 General options:
|
|
213 -t TAG, --tag=TAG [Y]es for undirectional lib, [N]o for directional
|
|
214 [Default: N]
|
|
215 -s CUTNUMBER1, --start_base=CUTNUMBER1
|
|
216 The first base of your read to be mapped [Default: 1]
|
|
217 -e CUTNUMBER2, --end_base=CUTNUMBER2
|
|
218 The last cycle number of your read to be mapped
|
|
219 [Default: 200]
|
|
220 -a FILE, --adapter=FILE
|
|
221 Input text file of your adaptor sequences (to be
|
|
222 trimed from the 3'end of the reads). Input 1 seq for
|
|
223 dir. lib., 2 seqs for undir. lib. One line per
|
|
224 sequence
|
|
225 --am=ADAPTER_MISMATCH
|
|
226 Number of mismatches allowed in adaptor [Default: 1]
|
|
227 -g GENOME, --genome=GENOME
|
|
228 Name of the reference genome (the same as the
|
|
229 reference genome file in the preprocessing step) [ex.
|
|
230 chr21_hg18.fa]
|
|
231 -m INT_NO_MISMATCHES, --mismatches=INT_NO_MISMATCHES
|
|
232 Number of mismatches in one read [Default: 4]
|
|
233 --aligner=ALIGNER Aligner program to perform the analisys: bowtie,
|
|
234 bowtie2, soap [Default: bowtie2]
|
|
235 -p PATH, --path=PATH
|
|
236 Path to the aligner program. Defaults:
|
|
237 bowtie: /u/home/mcdb/weilong/install/bowtie-0.12.8
|
|
238 bowtie2:
|
|
239 /u/home/mcdb/weilong/install/bowtie2-2.0.0-beta7
|
|
240 soap: /u/home/mcdb/weilong/soap2.21release/
|
|
241 -d DBPATH, --db=DBPATH
|
|
242 Path to the reference genome library (generated in
|
|
243 preprocessing genome) [Default: /u/home/mcdb/weilong/i
|
|
244 nstall/BSseeker2/bs_utils/reference_genomes]
|
|
245 -l NO_SPLIT, --split_line=NO_SPLIT
|
|
246 Number of lines per split (the read file will be split
|
|
247 into small files for mapping. The result will be
|
|
248 merged. [Default: 4000000]
|
|
249 -o OUTFILE, --output=OUTFILE
|
|
250 The name of output file [INFILE.bs(se|pe|rrbs)]
|
|
251 -f FORMAT, --output-format=FORMAT
|
|
252 Output format: bam, sam, bs_seeker1 [Default: bam]
|
|
253 --no-header Suppress SAM header lines [Default: False]
|
|
254 --temp_dir=PATH The path to your temporary directory [Default: /tmp]
|
|
255 --XS=XS_FILTER Filter definition for tag XS, format X,Y. X=0.8 and
|
|
256 y=5 indicate that for one read, if #(mCH sites)/#(all
|
|
257 CH sites)>0.8 and #(mCH sites)>5, then tag XS=1; or
|
|
258 else tag XS=0. [Default: 0.5,5]
|
|
259 --multiple-hit Output reads with multiple hits to
|
|
260 file"Multiple_hit.fa"
|
|
261 -v, --version show version of BS-Seeker2
|
|
262
|
|
263 Aligner Options:
|
|
264 You may specify any additional options for the aligner. You just have
|
|
265 to prefix them with --bt- for bowtie, --bt2- for bowtie2, --soap- for
|
|
266 soap, and BS Seeker will pass them on. For example: --bt-p 4 will
|
|
267 increase the number of threads for bowtie to 4, --bt--tryhard will
|
|
268 instruct bowtie to try as hard as possible to find valid alignments
|
|
269 when they exist, and so on. Be sure that you know what you are doing
|
|
270 when using these options! Also, we don't do any validation on the
|
|
271 values.
|
|
272
|
|
273
|
|
274
|
|
275 ##Examples :
|
|
276
|
|
277 * Align from fasta format with bowtie2 (local alignment) for whole genome, allowing 3 mismatches
|
|
278
|
|
279 python bs_seeker2-align.py -i WGBS.fa -m 3 --aligner=bowtie2 -o WGBS.bam -f bam -g genome.fa
|
|
280
|
|
281 * Align from qseq format for RRBS with bowtie, default parameters for RRBS fragments
|
|
282
|
|
283 python bs_seeker2-align.py -i RRBS.fa --aligner=bowtie -o RRBS.sam -f sam -g genome.fa -r -a adapter.txt
|
|
284
|
|
285 * Align from qseq format for RRBS with bowtie (end-to-end), specifying lengths of fragments ranging [40bp, 400bp]
|
|
286
|
|
287 python bs_seeker2-align.py -i RRBS.qseq --aligner=bowtie2 --bt2--end-to-end -o RRBS.bam -f bam -g genome.fa -r --low=40 --up=400 -a adapter.txt
|
|
288
|
|
289 The parameters '--low' and '--up' should be the same with corresponding parameters when building the genome index
|
|
290
|
|
291
|
|
292
|
|
293 ##Input file:
|
|
294
|
|
295 - Adapter.txt (example) :
|
|
296
|
|
297 AGATCGGAAGAGCACACGTC
|
|
298
|
|
299
|
|
300 ##Output files:
|
|
301
|
|
302 - SAM file
|
|
303
|
|
304 Sample:
|
|
305
|
|
306 10918 0 chr1 133859922 255 100M * 0 0 TGGTTGTTTTTGTTATAGTTTTTTGTTGTAGAGTTTTTTTTGGAAAGTTGTGTTTATTTTTTTTTTTGTTTGGGTTTTGTTTGAAAGGGGTGGATGAGTT * XO:Z:+FW XS:i:0 NM:i:3 XM:Z:x--yx-zzzy--y--y--zz-zyx-yx-y--------z------------x--------z--zzz----y----y--x-zyx--------y--------z XG:Z:-C_CGGCCGCCCCTGCTGCAGCCTCCCGCCGCAGAGTTTTCTTTGGAAAGTTGCGTTTATTTCTTCCCTTGTCTGGGCTGCGCCCGAAAGGGGCAGATGAGTC_AC
|
|
307
|
|
308
|
|
309 Format descriptions:
|
|
310
|
|
311 BS-Seeker2 specific tags:
|
|
312 XO : orientation, from forward/reverted
|
|
313 XS : 1 when read is recognized as not fully converted by bisulfite treatment, or else 0
|
|
314 XM : number of sites for mismatch
|
|
315 X: methylated CG
|
|
316 x: un-methylated CG
|
|
317 Y: methylated CHG
|
|
318 y: un-methylated CHG
|
|
319 Z: methylated CHH
|
|
320 z: un-methylated CHH
|
|
321 XG : genome sequences, with 2bp extended on both ends, from 5' to 3'
|
|
322 YR : tag only for RRBS, serial id of mapped fragment
|
|
323 YS : tag only for RRBS, start position of mapped fragment
|
|
324 YE : tag only for RRBS, end position of mapped fragment
|
|
325
|
|
326 Note:
|
|
327 For reads mapped on Watson(minus) strand, the 10th colum in SAM file is not the original reads but the revered sequences.
|
|
328
|
|
329
|
|
330 ##Tips:
|
|
331
|
|
332 - Removing adapter is recommended.
|
|
333
|
|
334 If you don't know what's your parameter, please ask the person who generate the library for you.
|
|
335
|
|
336 If you are too shy to ask for it, you can try to de novo motif finding tools (such as [DME](http://cb1.utdallas.edu/dme/index.htm) and [MEME](http://meme.nbcr.net/meme/cgi-bin/meme.cgi)) find the enriched pattern in 1000 reads.
|
|
337
|
|
338 Of course, you can also use other tools (such as [cutadapt](http://code.google.com/p/cutadapt/) ) to remove adaptor first.
|
|
339
|
|
340 - It's always better to use a wider range for fragment length.
|
|
341
|
|
342 For example, if 95% of reads come from fragments with length range [50bp, 250bp], you'd better choose [40bp, 300bp].
|
|
343
|
|
344
|
|
345 (3) bs_seeker2-call_methylation.py
|
|
346 ------------
|
|
347
|
|
348
|
|
349 This module calls methylation levels from the mapping result.
|
|
350
|
|
351 ##Usage:
|
|
352
|
|
353 $ python bs_seeker2-call_methylation.py -h
|
|
354 Usage: bs_seeker2-call_methylation.py [options]
|
|
355
|
|
356 Options:
|
|
357 -h, --help show this help message and exit
|
|
358 -i INFILE, --input=INFILE
|
|
359 BAM output from bs_seeker2-align.py
|
|
360 -d DBPATH, --db=DBPATH
|
|
361 Path to the reference genome library (generated in
|
|
362 preprocessing genome) [Default: /u/home/mcdb/weilong/i
|
|
363 nstall/BSseeker2/bs_utils/reference_genomes]
|
|
364 -o OUTFILE, --output-prefix=OUTFILE
|
|
365 The output prefix to create ATCGmap and wiggle files
|
|
366 [INFILE]
|
|
367 --wig=OUTFILE The output .wig file [INFILE.wig]
|
|
368 --CGmap=OUTFILE The output .CGmap file [INFILE.CGmap]
|
|
369 --ATCGmap=OUTFILE The output .ATCGmap file [INFILE.ATCGmap]
|
|
370 -x, --rm-SX Removed reads with tag 'XS:i:1', which would be
|
|
371 considered as not fully converted by bisulfite
|
|
372 treatment [Default: False]
|
|
373 -r READ_NO, --read-no=READ_NO
|
|
374 The least number of reads covering one site to be
|
|
375 shown in wig file [Default: 1]
|
|
376 -v, --version show version of BS-Seeker2
|
|
377
|
|
378
|
|
379 ##Example :
|
|
380
|
|
381 -For WGBS (whole genome bisulfite sequencing):
|
|
382
|
|
383 python bs_seeker2-call_methylation.py -i WGBS.bam -o output --db /path/to/BSseeker2/bs_utils/reference_genomes/genome.fa_bowtie/
|
|
384
|
|
385 -For RRBS:
|
|
386
|
|
387 python bs_seeker2-call_methylation.py -i RRBS.bam -o output --db /path/to/BSseeker2/bs_utils/reference_genomes/genome.fa_rrbs_40_400_bowtie2/
|
|
388
|
|
389 -For RRBS and removed un-converted reads (with tag XS=1):
|
|
390
|
|
391 python bs_seeker2-call_methylation.py -x -i RRBS.bam -o output --db /path/to/BSseeker2/bs_utils/reference_genomes/genome.fa_rrbs_75_280_bowtie2/
|
|
392
|
|
393 -For RRBS and only show sites covered by at least 10 reads in WIG file:
|
|
394
|
|
395 python bs_seeker2-call_methylation.py -r 10 -i RRBS.bam -o output --db /path/to/BSseeker2/bs_utils/reference_genomes/genome.fa_rrbs_75_280_bowtie2/
|
|
396
|
|
397
|
|
398 The folder “genome.fa\_rrbs\_40\_500\_bowtie2” is built in the first step
|
|
399
|
|
400 ##Output files:
|
|
401
|
|
402 - wig file
|
|
403
|
|
404 Sample:
|
|
405
|
|
406 variableStep chrom=chr1
|
|
407 3000419 0.000000
|
|
408 3000423 -0.2
|
|
409 3000440 0.000000
|
|
410 3000588 0.5
|
|
411 3000593 -0.000000
|
|
412
|
|
413
|
|
414 Format descriptions:
|
|
415 WIG file format. Negative value for 2nd column indicate a Cytosine on minus strand.
|
|
416
|
|
417
|
|
418 - CGmap file
|
|
419
|
|
420 Sample:
|
|
421
|
|
422 chr1 G 3000851 CHH CC 0.1 1 10
|
|
423 chr1 C 3001624 CHG CA 0.0 0 9
|
|
424 chr1 C 3001631 CG CG 1.0 5 5
|
|
425
|
|
426 Format descriptions:
|
|
427
|
|
428 (1) chromosome
|
|
429 (2) nucleotide on Watson (+) strand
|
|
430 (3) position
|
|
431 (4) context (CG/CHG/CHH)
|
|
432 (5) dinucleotide-context (CA/CC/CG/CT)
|
|
433 (6) methyltion-level = #-of-C / (#-of-C + #-of-T)
|
|
434 (7) #-of-C (methylated)
|
|
435 (8) (#-ofC + #-of-T) (all cytosines)
|
|
436
|
|
437
|
|
438 - ATCGmap file
|
|
439
|
|
440 Sample:
|
|
441
|
|
442 chr1 T 3009410 -- -- 0 10 0 0 0 0 0 0 0 0 na
|
|
443 chr1 C 3009411 CHH CC 0 10 0 0 0 0 0 0 0 0 0.0
|
|
444 chr1 C 3009412 CHG CC 0 10 0 0 0 0 0 0 0 0 0.0
|
|
445 chr1 C 3009413 CG CG 0 10 50 0 0 0 0 0 0 0 0.833333333333
|
|
446
|
|
447
|
|
448 Format descriptions:
|
|
449
|
|
450 (1) chromosome
|
|
451 (2) nucleotide on Watson (+) strand
|
|
452 (3) position
|
|
453 (4) context (CG/CHG/CHH)
|
|
454 (5) dinucleotide-context (CA/CC/CG/CT)
|
|
455
|
|
456 (6) - (10) plus strand
|
|
457 (6) # of reads from Watson strand mapped here, support A on Watson strand
|
|
458 (7) # of reads from Watson strand mapped here, support T on Watson strand
|
|
459 (8) # of reads from Watson strand mapped here, support C on Watson strand
|
|
460 (9) # of reads from Watson strand mapped here, support G on Watson strand
|
|
461 (10) # of reads from Watson strand mapped here, support N
|
|
462
|
|
463 (11) - (15) minus strand
|
|
464 (11) # of reads from Crick strand mapped here, support A on Watson strand and T on Crick strand
|
|
465 (12) # of reads from Crick strand mapped here, support T on Watson strand and A on Crick strand
|
|
466 (13) # of reads from Crick strand mapped here, support C on Watson strand and G on Crick strand
|
|
467 (14) # of reads from Crick strand mapped here, support G on Watson strand and C on Crick strand
|
|
468 (15) # of reads from Crick strand mapped here, support N
|
|
469
|
|
470 (16) methylation_level = #C/(#C+#T) = (C8+C14)/(C7+C8+C11+C14); "nan" means none reads support C/T at this position.
|
|
471
|
|
472
|
|
473
|
|
474 Contact Information
|
|
475 ============
|
|
476
|
|
477 If you still have questions on BS-Seeker 2, or you find bugs when using BS-Seeker 2, or you have suggestions, please write email to guoweilong@gmail.com (Weilong Guo).
|
|
478
|
|
479
|
|
480
|
|
481 Questions & Answers
|
|
482 ============
|
|
483
|
|
484 (1) Speed-up your alignment
|
|
485
|
|
486 Q: "It takes me days to do the alignment for one lane" ...
|
|
487
|
|
488 A: Yes, alignment is a time-consuming work, especially because the sequencing depth is increasing. An efficient way to align is :
|
|
489
|
|
490 i. cut the original sequence file into multiple small pieces;
|
|
491
|
|
492 Ex: split -l 4000000 input.fq
|
|
493
|
|
494 ii. align them in parallel;
|
|
495 iii. merge all the BAM files into a single one before running "bs-seeker2_call-methylation.py" (user "samtools merge" command).
|
|
496
|
|
497 Ex: samtools merge out.bam in1.bam in2.bam in3.bam
|
|
498
|
|
499 (2) read in BAM/SAM
|
|
500
|
|
501 Q: Is the read sequence in BAM/SAM file is the same as my original one?
|
|
502
|
|
503 A: NO. They are different for several reasons.
|
|
504
|
|
505 i. For RRBS, some reads are short because of trimming of the adapters
|
|
506 ii. For read mapping on Crick (-) strand, the reads are in fact the antisense version of the original sequence, opposite both in nucleotides and direction
|
|
507
|
|
508 (3) "Pysam" package related problem
|
|
509
|
|
510 Q: I'm normal account user for Linux(Cluster). I can't install "pysam". I get following error massages:
|
|
511
|
|
512
|
|
513 $ python setup.py install
|
|
514 running install
|
|
515 error: can't create or remove files in install directory
|
|
516 The following error occurred while trying to add or remove files in the
|
|
517 installation directory:
|
|
518 [Errno 13] Permission denied: '/usr/lib64/python2.6/site-packages/test-easy-install-26802.write-test'
|
|
519 ...
|
|
520
|
|
521
|
|
522 A: You can ask the administrator of your cluster to install pysam. If you don't want to bother him/her, you might need to build your own python, and then install the "pysam" package. The following script could be helpful for you.
|
|
523
|
|
524
|
|
525 mkdir ~/install
|
|
526 cd ~/install/
|
|
527
|
|
528 # install python
|
|
529 wget http://www.python.org/ftp/python/2.7.4/Python-2.7.4.tgz # download the python from websites
|
|
530 tar zxvf Python-2.7.4.tgz # decompress
|
|
531 cd Python-2.7.4
|
|
532 ./configure --prefix=`pwd`
|
|
533 make
|
|
534 make install
|
|
535
|
|
536 # Add the path of Python to $PATH
|
|
537 # Please add the following line to file ~/.bashrc
|
|
538
|
|
539 export PATH=~/install/Python-2.7.4:$PATH
|
|
540
|
|
541 # save the ~/.bashrc file
|
|
542 source ~/.bashrc
|
|
543
|
|
544 # install pysam package
|
|
545 wget https://pysam.googlecode.com/files/pysam-0.7.4.tar.gz
|
|
546 tar zxvf pysam-0.7.4.tar.gz
|
|
547 cd pysam-0.7.4
|
|
548 python setup.py build
|
|
549 python setup.py install
|
|
550 # re-login the shell after finish installing pysam
|
|
551
|
|
552 # install BS-Seeker2
|
|
553 wget https://github.com/BSSeeker/BSseeker2/archive/master.zip
|
|
554 mv master BSSeeker2.zip
|
|
555 unzip BSSeeker2.zip
|
|
556 cd BSseeker2-master/
|
|
557
|
|
558
|
|
559 (4)Run BS-Seeker2
|
|
560
|
|
561 Q: Can I add the path of BS-Seeker2's *.py to the $PATH, so I can call
|
|
562 BS-Seeker2 from anywhere?
|
|
563
|
|
564 A: If you're using the "python" from path "/usr/bin/python", you can directly
|
|
565 add the path of BS-Seeker2 in file "~/.bash_profile" (bash) or "~/.profile"
|
|
566 (other shell) or "~/.bashrc" (per-interactive-shell startup).
|
|
567 But if you are using python under other directories, you might need to modify
|
|
568 BS-Seeker2's script first. For example, if your python path is "/my_python/python",
|
|
569 please change the first line in "bs_seeker-build.py", "bs_seeker-align.py" and
|
|
570 "bs_seeker-call_methylation.py" to
|
|
571
|
|
572 #!/my_python/python
|
|
573
|
|
574 Then add
|
|
575
|
|
576 export PATH=/path/to/BS-Seeker2/:$PATH
|
|
577
|
|
578 to file "~/.bash_profile" (e.g.), and source the file:
|
|
579
|
|
580 source ~/.bash_profile
|
|
581
|
|
582 Then you can use BS-Seeker2 globally by typing:
|
|
583
|
|
584 bs_seeker_build.py -h
|
|
585 bs_seeker-align.py -h
|
|
586 bs_seeker-call_methylation.py -h
|
|
587
|
|
588
|
|
589
|
|
590
|