0
|
1 BS-Seeker2
|
|
2 =========
|
|
3
|
|
4 BS-Seeker2 (BS Seeker 2) performs accurate and fast mapping of bisulfite-treated short reads. BS-Seeker2 is an updated version on BS-Seeker.
|
|
5
|
|
6 0. Availability
|
|
7 ============
|
|
8
|
|
9 Homepage of [BS-Seeker2](http://pellegrini.mcdb.ucla.edu/BS_Seeker2/).
|
|
10
|
|
11 The source code for this package is available from
|
|
12 [https://github.com/BSSeeker/BSseeker2](https://github.com/BSSeeker/BSseeker2).
|
|
13 Also, you can use an instance of BS-Seeker 2 in Galaxy from [http://galaxy.hoffman2.idre.ucla.edu](http://galaxy.hoffman2.idre.ucla.edu).
|
|
14 (Label: "NGS: Methylation Mapping"/"Methylation Map with BS Seeker2")
|
|
15
|
|
16
|
|
17 1. Remarkable new features
|
|
18 ============
|
|
19 * Reduced index for RRBS, accelerating the mapping speed and increasing mappability
|
|
20 * Allowing local alignment with Bowtie 2, increased the mappability
|
|
21
|
|
22 2. Other features
|
|
23 ============
|
|
24 * Supported library types
|
|
25 - whole genome-wide bisulfite sequencing (WGBS)
|
|
26 - reduced representative bisulfite sequencing (RRBS)
|
|
27
|
|
28 * Supported formats for input file
|
|
29 - [fasta](http://en.wikipedia.org/wiki/FASTA_format)
|
|
30 - [fastq](http://en.wikipedia.org/wiki/FASTQ_format)
|
|
31 - [qseq](http://jumpgate.caltech.edu/wiki/QSeq)
|
|
32 - pure sequence (one-line one-sequence)
|
|
33
|
|
34 * Supported alignment tools
|
|
35 - [bowtie](http://bowtie-bio.sourceforge.net/index.shtml) : Single-seed
|
|
36 - [bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) : Multiple-seed, gapped-alignment
|
|
37 - [local alignment](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#local-alignment-example)
|
|
38 - [end-to-end alignment](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#end-to-end-alignment-example)
|
|
39 - [SOAP](http://soap.genomics.org.cn/)
|
|
40
|
|
41 * Supported formats for mapping results
|
|
42 - [BAM](http://genome.ucsc.edu/FAQ/FAQformat.html#format5.1)
|
|
43 - [SAM](http://samtools.sourceforge.net/)
|
|
44 - [BS-seeker 1](http://pellegrini.mcdb.ucla.edu/BS_Seeker/USAGE.html)
|
|
45
|
|
46 3. System requirements
|
|
47 ============
|
|
48
|
|
49 * Linux or Mac OS platform
|
|
50 * One of the following Aligner
|
1
|
51 - [bowtie](http://bowtie-bio.sourceforge.net/) (fast)
|
|
52 - [bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/) (Default)
|
0
|
53 - [soap](http://soap.genomics.org.cn/)
|
1
|
54 - [rmap](http://www.cmb.usc.edu/people/andrewds/rmap/)
|
0
|
55 * [Python](http://www.python.org/download/) (Version 2.6 +)
|
|
56
|
|
57 (It is normally pre-installed in Linux. Type " python -V" to see the installed version.)
|
|
58
|
|
59 * [pysam](http://code.google.com/p/pysam/) package is needed.
|
|
60
|
|
61 (Read "Questions & Answers" if you have problem when installing this package.)
|
|
62
|
|
63
|
|
64
|
|
65 4. Modules' descriptions
|
|
66 ============
|
|
67
|
|
68 (0) FilterReads.py
|
|
69 ------------
|
|
70
|
|
71 Optional and independent module.
|
|
72 Some reads would be extremely amplified during the PCR. This script helps you get unique reads before doing the mapping. You can decide whether or not to filter reads before doing the mapping.
|
|
73
|
|
74 ##Usage :
|
|
75
|
|
76 $ python FilterReads.py
|
|
77 Usage: FilterReads.py -i <input> -o <output> [-k]
|
1
|
78 Author : Guo, Weilong; 2012-11-10
|
0
|
79 Unique reads for qseq/fastq/fasta/sequencce, and filter
|
|
80 low quality file in qseq file.
|
|
81
|
|
82 Options:
|
|
83 -h, --help show this help message and exit
|
|
84 -i FILE Name of the input qseq/fastq/fasta/sequence file
|
|
85 -o FILE Name of the output file
|
|
86 -k Would not filter low quality reads if specified
|
|
87
|
|
88
|
|
89 ##Tip :
|
|
90
|
|
91 - This step is not suggested for RRBS library, as reads from RRBS library would more likely from the same location.
|
|
92
|
|
93
|
|
94 (1) bs_seeker2-build.py
|
|
95 ------------
|
|
96
|
|
97 Module to build the index for BS-Seeker2.
|
|
98
|
|
99
|
|
100 ##Usage :
|
|
101
|
|
102 $ python bs_seeker2-build.py -h
|
|
103 Usage: bs_seeker2-build.py [options]
|
|
104
|
|
105 Options:
|
|
106 -h, --help show this help message and exit
|
|
107 -f FILE, --file=FILE Input your reference genome file (fasta)
|
|
108 --aligner=ALIGNER Aligner program to perform the analysis: bowtie,
|
1
|
109 bowtie2, soap, rmap [Default: bowtie]
|
|
110 -p PATH, --path=PATH Path to the aligner program. Detected:
|
|
111 bowtie: ~/install/bowtie
|
|
112 bowtie2: ~/install/bowtie2
|
|
113 rmap: ~/install/rmap_/bin
|
|
114 soap: ~/install/soap/
|
0
|
115 -d DBPATH, --db=DBPATH
|
|
116 Path to the reference genome library (generated in
|
1
|
117 preprocessing genome) [Default: ~/install
|
|
118 /BSseeker2/bs_utils/reference_genomes]
|
0
|
119 -v, --version show version of BS-Seeker2
|
|
120
|
|
121 Reduced Representation Bisulfite Sequencing Options:
|
|
122 Use this options with conjuction of -r [--rrbs]
|
|
123
|
|
124 -r, --rrbs Build index specially for Reduced Representation
|
|
125 Bisulfite Sequencing experiments. Genome other than
|
|
126 certain fragments will be masked. [Default: False]
|
|
127 -l LOW_BOUND, --low=LOW_BOUND
|
|
128 lower bound of fragment length (excluding recognition
|
1
|
129 sequence such as C-CGG) [Default: 20]
|
0
|
130 -u UP_BOUND, --up=UP_BOUND
|
|
131 upper bound of fragment length (excluding recognition
|
|
132 sequence such as C-CGG ends) [Default: 500]
|
|
133 -c CUT_FORMAT, --cut-site=CUT_FORMAT
|
|
134 Cut sites of restriction enzyme. Ex: MspI(C-CGG),
|
|
135 Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG).
|
|
136 [Default: C-CGG]
|
|
137
|
1
|
138
|
0
|
139 ##Example
|
|
140
|
|
141 * Build genome index for WGBS using bowtie, path of bowtie should be included in $PATH
|
|
142
|
|
143 python bs_seeker2-build.py -f genome.fa --aligner=bowtie
|
|
144
|
|
145 * Build genome index for RRBS with default parameters specifying the path for bowtie2
|
|
146
|
|
147 python bs_seeker2-build.py -f genome.fa --aligner=bowtie2 -p ~/install/bowtie2-2.0.0-beta7/ -r
|
|
148
|
|
149 * Build genome index for RRBS library using bowite2, with fragment lengths ranging [40bp, 400bp]
|
|
150
|
|
151 python bs_seeker2-build.py -f genome.fa -r -l 40 -u 400 --aligner=bowtie2
|
|
152
|
|
153 * Build genome index for RRBS library for double-enzyme : MspI (C-CGG) & ApeKI (G-CWGC, where W=A|T, see [IUPAC code](http://www.bioinformatics.org/sms/iupac.html))
|
|
154
|
1
|
155 python bs_seeker2-build.py -f genome.fa -r -c C-CGG,G-CWGC --aligner=bowtie
|
0
|
156
|
|
157 ##Tips:
|
|
158
|
|
159 - Index built for BS-Seeker2 is different from the index for BS-Seeker 1.
|
|
160 For RRBS, you need to specify "-r" in the parameters. Also, you need to specify LOW_BOUND and UP_BOUND for the range of fragment lengths according your protocol.
|
|
161
|
|
162 - The fragment length is different from read length. Fragments refers to the DNA fragments which you get by size-selection step (i.e. gel-cut oor AMPure beads). Lengths of fragments are supposed to be in a range, such as [50bp,250bp].
|
|
163
|
|
164 - The indexes for RRBS and WGBS are different. Also, indexes for RRBS are specific for fragment length parameters (LOW_BOUND and UP_BOUND).
|
|
165
|
|
166
|
|
167
|
|
168
|
|
169 (2) bs_seeker2-align.py
|
|
170 ------------
|
|
171
|
|
172 Module to map reads on 3-letter converted genome.
|
|
173
|
|
174 ##Usage :
|
|
175
|
|
176 $ python ~/install/BSseeker2/bs_seeker2-align.py -h
|
1
|
177 Usage: bs_seeker2-align.py {-i <single> | -1 <mate1> -2 <mate2>} -g <genome.fa> [options]
|
0
|
178
|
|
179 Options:
|
|
180 -h, --help show this help message and exit
|
|
181
|
|
182 For single end reads:
|
|
183 -i INFILE, --input=INFILE
|
1
|
184 Input read file (FORMAT: sequences, qseq, fasta,
|
|
185 fastq). Ex: read.fa or read.fa.gz
|
0
|
186
|
|
187 For pair end reads:
|
|
188 -1 FILE, --input_1=FILE
|
1
|
189 Input read file, mate 1 (FORMAT: sequences, qseq,
|
|
190 fasta, fastq)
|
0
|
191 -2 FILE, --input_2=FILE
|
1
|
192 Input read file, mate 2 (FORMAT: sequences, qseq,
|
|
193 fasta, fastq)
|
|
194 -I MIN_INSERT_SIZE, --minins=MIN_INSERT_SIZE
|
0
|
195 The minimum insert size for valid paired-end
|
1
|
196 alignments [Default: 0]
|
|
197 -X MAX_INSERT_SIZE, --maxins=MAX_INSERT_SIZE
|
0
|
198 The maximum insert size for valid paired-end
|
1
|
199 alignments [Default: 500]
|
0
|
200
|
|
201 Reduced Representation Bisulfite Sequencing Options:
|
1
|
202 -r, --rrbs Map reads to the Reduced Representation genome
|
0
|
203 -c pattern, --cut-site=pattern
|
|
204 Cutting sites of restriction enzyme. Ex: MspI(C-CGG),
|
|
205 Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG).
|
1
|
206 [Default: C-CGG]
|
0
|
207 -L RRBS_LOW_BOUND, --low=RRBS_LOW_BOUND
|
1
|
208 Lower bound of fragment length (excluding C-CGG ends)
|
|
209 [Default: 20]
|
0
|
210 -U RRBS_UP_BOUND, --up=RRBS_UP_BOUND
|
1
|
211 Upper bound of fragment length (excluding C-CGG ends)
|
0
|
212 [Default: 500]
|
|
213
|
|
214 General options:
|
|
215 -t TAG, --tag=TAG [Y]es for undirectional lib, [N]o for directional
|
|
216 [Default: N]
|
|
217 -s CUTNUMBER1, --start_base=CUTNUMBER1
|
1
|
218 The first cycle of the read to be mapped [Default: 1]
|
0
|
219 -e CUTNUMBER2, --end_base=CUTNUMBER2
|
1
|
220 The last cycle of the read to be mapped [Default: 200]
|
0
|
221 -a FILE, --adapter=FILE
|
|
222 Input text file of your adaptor sequences (to be
|
1
|
223 trimmed from the 3'end of the reads, ). Input one seq
|
|
224 for dir. lib., twon seqs for undir. lib. One line per
|
|
225 sequence. Only the first 10bp will be used
|
0
|
226 --am=ADAPTER_MISMATCH
|
1
|
227 Number of mismatches allowed in adapter [Default: 0]
|
0
|
228 -g GENOME, --genome=GENOME
|
1
|
229 Name of the reference genome (should be the same as
|
|
230 "-f" in bs_seeker2-build.py ) [ex. chr21_hg18.fa]
|
|
231 -m NO_MISMATCHES, --mismatches=NO_MISMATCHES
|
0
|
232 Number of mismatches in one read [Default: 4]
|
1
|
233 --aligner=ALIGNER Aligner program for short reads mapping: bowtie,
|
|
234 bowtie2, soap, rmap [Default: bowtie]
|
0
|
235 -p PATH, --path=PATH
|
1
|
236 Path to the aligner program. Detected:
|
|
237 bowtie: ~/install/bowtie
|
|
238 bowtie2: ~/install/bowtie2
|
|
239 rmap: ~/install/rmap/bin
|
|
240 soap: ~/install/soap/
|
0
|
241 -d DBPATH, --db=DBPATH
|
|
242 Path to the reference genome library (generated in
|
1
|
243 preprocessing genome) [Default: ~/i
|
0
|
244 nstall/BSseeker2/bs_utils/reference_genomes]
|
|
245 -l NO_SPLIT, --split_line=NO_SPLIT
|
|
246 Number of lines per split (the read file will be split
|
|
247 into small files for mapping. The result will be
|
|
248 merged. [Default: 4000000]
|
|
249 -o OUTFILE, --output=OUTFILE
|
|
250 The name of output file [INFILE.bs(se|pe|rrbs)]
|
|
251 -f FORMAT, --output-format=FORMAT
|
|
252 Output format: bam, sam, bs_seeker1 [Default: bam]
|
|
253 --no-header Suppress SAM header lines [Default: False]
|
1
|
254 --temp_dir=PATH The path to your temporary directory [Detected: /tmp]
|
0
|
255 --XS=XS_FILTER Filter definition for tag XS, format X,Y. X=0.8 and
|
|
256 y=5 indicate that for one read, if #(mCH sites)/#(all
|
|
257 CH sites)>0.8 and #(mCH sites)>5, then tag XS=1; or
|
|
258 else tag XS=0. [Default: 0.5,5]
|
|
259 --multiple-hit Output reads with multiple hits to
|
|
260 file"Multiple_hit.fa"
|
|
261 -v, --version show version of BS-Seeker2
|
|
262
|
|
263 Aligner Options:
|
|
264 You may specify any additional options for the aligner. You just have
|
|
265 to prefix them with --bt- for bowtie, --bt2- for bowtie2, --soap- for
|
1
|
266 soap, --rmap- for rmap, and BS Seeker will pass them on. For example:
|
|
267 --bt-p 4 will increase the number of threads for bowtie to 4, --bt--
|
|
268 tryhard will instruct bowtie to try as hard as possible to find valid
|
|
269 alignments when they exist, and so on. Be sure that you know what you
|
|
270 are doing when using these options! Also, we don't do any validation
|
|
271 on the values.
|
0
|
272
|
|
273
|
|
274 ##Examples :
|
|
275
|
1
|
276 * WGBS library ; alignment mode, bowtie ; map to WGBS index
|
|
277
|
|
278 python bs_seeker2-align.py -i WGBS.fa --aligner=bowtie -o WGBS.bam -f bam -g genome.fa
|
|
279
|
|
280 * WGBS library ; alignment mode, bowtie2-local ; map to WGBS index
|
0
|
281
|
1
|
282 python bs_seeker2-align.py -i WGBS.fa --aligner=bowtie2 -o WGBS.bam -f bam -g genome.fa
|
|
283
|
|
284 * WGBS library ; alignment mode, bowtie2-end-to-end ; map to WGBS index
|
0
|
285
|
1
|
286 python bs_seeker2-align.py -i WGBS.fa -m 3 --aligner=bowtie2 -o WGBS.bam -f bam -g genome.fa --bt2--end-to-end
|
|
287
|
|
288 * RRBS library ; alignment mode, bowtie ; map to RR index
|
0
|
289
|
1
|
290 python bs_seeker2-align.py -i RRBS.fa --aligner=bowtie -o RRBS.bam -g genome.fa -r -a adapter.txt
|
|
291
|
|
292 * RRBS library ; alignment mode, bowtie ; map to WG index
|
|
293
|
|
294 python bs_seeker2-align.py -i RRBS.fa --aligner=bowtie -o RRBS.bam -g genome.fa -a adapter.txt
|
0
|
295
|
1
|
296 * RRBS library ; alignment mode, bowtie2-end-to-end ; map to WG index
|
|
297
|
|
298 python bs_seeker2-align.py -i RRBS.fa --aligner=bowtie -o RRBS.bam -g genome.fa -a adapter.txt --bt2--end-to-end
|
0
|
299
|
1
|
300 * Align from qseq format for RRBS with bowtie, specifying lengths of fragments ranging [40bp, 400bp]
|
|
301
|
|
302 python bs_seeker2-align.py -i RRBS.qseq --aligner=bowtie -o RRBS.bam -f bam -g genome.fa -r --low=40 --up=400 -a adapter.txt
|
0
|
303
|
|
304 The parameters '--low' and '--up' should be the same with corresponding parameters when building the genome index
|
|
305
|
1
|
306 * WGBS library ; alignment mode, bowtie ; map to WGBS index; use 8 threads for alignment
|
|
307
|
|
308 python bs_seeker2-align.py -i WGBS.fa --aligner=bowtie -o WGBS.bam -f bam -g genome.fa --bt-p 4
|
|
309
|
|
310 BS-Seeker2 will run TWO bowtie instances in parallel.
|
0
|
311
|
|
312
|
|
313 ##Input file:
|
|
314
|
|
315 - Adapter.txt (example) :
|
|
316
|
|
317 AGATCGGAAGAGCACACGTC
|
|
318
|
|
319
|
|
320 ##Output files:
|
|
321
|
|
322 - SAM file
|
|
323
|
|
324 Sample:
|
|
325
|
|
326 10918 0 chr1 133859922 255 100M * 0 0 TGGTTGTTTTTGTTATAGTTTTTTGTTGTAGAGTTTTTTTTGGAAAGTTGTGTTTATTTTTTTTTTTGTTTGGGTTTTGTTTGAAAGGGGTGGATGAGTT * XO:Z:+FW XS:i:0 NM:i:3 XM:Z:x--yx-zzzy--y--y--zz-zyx-yx-y--------z------------x--------z--zzz----y----y--x-zyx--------y--------z XG:Z:-C_CGGCCGCCCCTGCTGCAGCCTCCCGCCGCAGAGTTTTCTTTGGAAAGTTGCGTTTATTTCTTCCCTTGTCTGGGCTGCGCCCGAAAGGGGCAGATGAGTC_AC
|
|
327
|
|
328
|
|
329 Format descriptions:
|
|
330
|
|
331 BS-Seeker2 specific tags:
|
|
332 XO : orientation, from forward/reverted
|
|
333 XS : 1 when read is recognized as not fully converted by bisulfite treatment, or else 0
|
|
334 XM : number of sites for mismatch
|
|
335 X: methylated CG
|
|
336 x: un-methylated CG
|
|
337 Y: methylated CHG
|
|
338 y: un-methylated CHG
|
|
339 Z: methylated CHH
|
|
340 z: un-methylated CHH
|
|
341 XG : genome sequences, with 2bp extended on both ends, from 5' to 3'
|
|
342 YR : tag only for RRBS, serial id of mapped fragment
|
|
343 YS : tag only for RRBS, start position of mapped fragment
|
|
344 YE : tag only for RRBS, end position of mapped fragment
|
|
345
|
|
346 Note:
|
|
347 For reads mapped on Watson(minus) strand, the 10th colum in SAM file is not the original reads but the revered sequences.
|
|
348
|
|
349
|
|
350 ##Tips:
|
|
351
|
|
352 - Removing adapter is recommended.
|
|
353
|
|
354 If you don't know what's your parameter, please ask the person who generate the library for you.
|
|
355
|
|
356 If you are too shy to ask for it, you can try to de novo motif finding tools (such as [DME](http://cb1.utdallas.edu/dme/index.htm) and [MEME](http://meme.nbcr.net/meme/cgi-bin/meme.cgi)) find the enriched pattern in 1000 reads.
|
|
357
|
|
358 Of course, you can also use other tools (such as [cutadapt](http://code.google.com/p/cutadapt/) ) to remove adaptor first.
|
|
359
|
|
360 - It's always better to use a wider range for fragment length.
|
|
361
|
|
362 For example, if 95% of reads come from fragments with length range [50bp, 250bp], you'd better choose [40bp, 300bp].
|
|
363
|
1
|
364 - Fewer mismatches for the 'local alignment' mode.
|
|
365
|
|
366 As the 'local alignment', the bad sequenced bases are usually trimmed, and would not be considered by the parameter "-m".
|
|
367 It is suggested to user fewer mismatches for the 'local alignment' mode.
|
0
|
368
|
|
369 (3) bs_seeker2-call_methylation.py
|
|
370 ------------
|
|
371
|
|
372
|
|
373 This module calls methylation levels from the mapping result.
|
|
374
|
|
375 ##Usage:
|
|
376
|
|
377 $ python bs_seeker2-call_methylation.py -h
|
|
378 Options:
|
|
379 -h, --help show this help message and exit
|
|
380 -i INFILE, --input=INFILE
|
|
381 BAM output from bs_seeker2-align.py
|
|
382 -d DBPATH, --db=DBPATH
|
|
383 Path to the reference genome library (generated in
|
1
|
384 preprocessing genome) [Default: ~/install
|
|
385 /BSseeker2/bs_utils/reference_genomes]
|
0
|
386 -o OUTFILE, --output-prefix=OUTFILE
|
|
387 The output prefix to create ATCGmap and wiggle files
|
|
388 [INFILE]
|
|
389 --wig=OUTFILE The output .wig file [INFILE.wig]
|
|
390 --CGmap=OUTFILE The output .CGmap file [INFILE.CGmap]
|
|
391 --ATCGmap=OUTFILE The output .ATCGmap file [INFILE.ATCGmap]
|
|
392 -x, --rm-SX Removed reads with tag 'XS:i:1', which would be
|
|
393 considered as not fully converted by bisulfite
|
|
394 treatment [Default: False]
|
1
|
395 --txt Show CGmap and ATCGmap in .gz [Default: False]
|
0
|
396 -r READ_NO, --read-no=READ_NO
|
|
397 The least number of reads covering one site to be
|
|
398 shown in wig file [Default: 1]
|
|
399 -v, --version show version of BS-Seeker2
|
|
400
|
|
401
|
|
402 ##Example :
|
|
403
|
|
404 -For WGBS (whole genome bisulfite sequencing):
|
|
405
|
|
406 python bs_seeker2-call_methylation.py -i WGBS.bam -o output --db /path/to/BSseeker2/bs_utils/reference_genomes/genome.fa_bowtie/
|
|
407
|
|
408 -For RRBS:
|
|
409
|
|
410 python bs_seeker2-call_methylation.py -i RRBS.bam -o output --db /path/to/BSseeker2/bs_utils/reference_genomes/genome.fa_rrbs_40_400_bowtie2/
|
|
411
|
|
412 -For RRBS and removed un-converted reads (with tag XS=1):
|
|
413
|
|
414 python bs_seeker2-call_methylation.py -x -i RRBS.bam -o output --db /path/to/BSseeker2/bs_utils/reference_genomes/genome.fa_rrbs_75_280_bowtie2/
|
|
415
|
|
416 -For RRBS and only show sites covered by at least 10 reads in WIG file:
|
|
417
|
|
418 python bs_seeker2-call_methylation.py -r 10 -i RRBS.bam -o output --db /path/to/BSseeker2/bs_utils/reference_genomes/genome.fa_rrbs_75_280_bowtie2/
|
|
419
|
|
420
|
|
421 The folder “genome.fa\_rrbs\_40\_500\_bowtie2” is built in the first step
|
|
422
|
|
423 ##Output files:
|
|
424
|
|
425 - wig file
|
|
426
|
|
427 Sample:
|
|
428
|
|
429 variableStep chrom=chr1
|
|
430 3000419 0.000000
|
|
431 3000423 -0.2
|
|
432 3000440 0.000000
|
|
433 3000588 0.5
|
|
434 3000593 -0.000000
|
|
435
|
|
436
|
|
437 Format descriptions:
|
|
438 WIG file format. Negative value for 2nd column indicate a Cytosine on minus strand.
|
|
439
|
|
440
|
|
441 - CGmap file
|
|
442
|
|
443 Sample:
|
|
444
|
|
445 chr1 G 3000851 CHH CC 0.1 1 10
|
|
446 chr1 C 3001624 CHG CA 0.0 0 9
|
|
447 chr1 C 3001631 CG CG 1.0 5 5
|
|
448
|
|
449 Format descriptions:
|
|
450
|
|
451 (1) chromosome
|
|
452 (2) nucleotide on Watson (+) strand
|
|
453 (3) position
|
|
454 (4) context (CG/CHG/CHH)
|
|
455 (5) dinucleotide-context (CA/CC/CG/CT)
|
1
|
456 (6) methylation-level = #_of_C / (#_of_C + #_of_T).
|
|
457 (7) #_of_C (methylated C, the count of reads showing C here)
|
|
458 (8) = #_of_C + #_of_T (all Cytosines, the count of reads showing C or T here)
|
0
|
459
|
|
460
|
|
461 - ATCGmap file
|
|
462
|
|
463 Sample:
|
|
464
|
|
465 chr1 T 3009410 -- -- 0 10 0 0 0 0 0 0 0 0 na
|
|
466 chr1 C 3009411 CHH CC 0 10 0 0 0 0 0 0 0 0 0.0
|
|
467 chr1 C 3009412 CHG CC 0 10 0 0 0 0 0 0 0 0 0.0
|
1
|
468 chr1 C 3009413 CG CG 0 10 50 0 0 0 0 0 0 0 0.83
|
0
|
469
|
|
470
|
|
471 Format descriptions:
|
|
472
|
|
473 (1) chromosome
|
|
474 (2) nucleotide on Watson (+) strand
|
|
475 (3) position
|
|
476 (4) context (CG/CHG/CHH)
|
|
477 (5) dinucleotide-context (CA/CC/CG/CT)
|
|
478
|
|
479 (6) - (10) plus strand
|
|
480 (6) # of reads from Watson strand mapped here, support A on Watson strand
|
|
481 (7) # of reads from Watson strand mapped here, support T on Watson strand
|
|
482 (8) # of reads from Watson strand mapped here, support C on Watson strand
|
|
483 (9) # of reads from Watson strand mapped here, support G on Watson strand
|
|
484 (10) # of reads from Watson strand mapped here, support N
|
|
485
|
|
486 (11) - (15) minus strand
|
|
487 (11) # of reads from Crick strand mapped here, support A on Watson strand and T on Crick strand
|
|
488 (12) # of reads from Crick strand mapped here, support T on Watson strand and A on Crick strand
|
|
489 (13) # of reads from Crick strand mapped here, support C on Watson strand and G on Crick strand
|
|
490 (14) # of reads from Crick strand mapped here, support G on Watson strand and C on Crick strand
|
|
491 (15) # of reads from Crick strand mapped here, support N
|
|
492
|
1
|
493 (16) methylation_level = #C/(#C+#T) = C8/(C7+C8) for Watson strand, =C14/(C11+C14) for Crick strand;
|
|
494 "nan" means none reads support C/T at this position.
|
0
|
495
|
|
496
|
|
497
|
|
498 Contact Information
|
|
499 ============
|
|
500
|
1
|
501 If you still have questions on BS-Seeker 2, or you find bugs when using BS-Seeker 2, or you have suggestions, please write email to [Weilong Guo](guoweilong@gmail.com).
|
0
|
502
|
|
503
|
|
504
|
|
505 Questions & Answers
|
|
506 ============
|
|
507
|
|
508 (1) Speed-up your alignment
|
|
509
|
|
510 Q: "It takes me days to do the alignment for one lane" ...
|
|
511
|
|
512 A: Yes, alignment is a time-consuming work, especially because the sequencing depth is increasing. An efficient way to align is :
|
|
513
|
|
514 i. cut the original sequence file into multiple small pieces;
|
|
515
|
|
516 Ex: split -l 4000000 input.fq
|
|
517
|
|
518 ii. align them in parallel;
|
|
519 iii. merge all the BAM files into a single one before running "bs-seeker2_call-methylation.py" (user "samtools merge" command).
|
|
520
|
|
521 Ex: samtools merge out.bam in1.bam in2.bam in3.bam
|
|
522
|
|
523 (2) read in BAM/SAM
|
|
524
|
|
525 Q: Is the read sequence in BAM/SAM file is the same as my original one?
|
|
526
|
|
527 A: NO. They are different for several reasons.
|
|
528
|
|
529 i. For RRBS, some reads are short because of trimming of the adapters
|
|
530 ii. For read mapping on Crick (-) strand, the reads are in fact the antisense version of the original sequence, opposite both in nucleotides and direction
|
|
531
|
|
532 (3) "Pysam" package related problem
|
|
533
|
|
534 Q: I'm normal account user for Linux(Cluster). I can't install "pysam". I get following error massages:
|
|
535
|
|
536
|
|
537 $ python setup.py install
|
|
538 running install
|
|
539 error: can't create or remove files in install directory
|
|
540 The following error occurred while trying to add or remove files in the
|
|
541 installation directory:
|
|
542 [Errno 13] Permission denied: '/usr/lib64/python2.6/site-packages/test-easy-install-26802.write-test'
|
|
543 ...
|
|
544
|
|
545
|
|
546 A: You can ask the administrator of your cluster to install pysam. If you don't want to bother him/her, you might need to build your own python, and then install the "pysam" package. The following script could be helpful for you.
|
|
547
|
|
548
|
|
549 mkdir ~/install
|
|
550 cd ~/install/
|
|
551
|
|
552 # install python
|
|
553 wget http://www.python.org/ftp/python/2.7.4/Python-2.7.4.tgz # download the python from websites
|
|
554 tar zxvf Python-2.7.4.tgz # decompress
|
|
555 cd Python-2.7.4
|
|
556 ./configure --prefix=`pwd`
|
|
557 make
|
|
558 make install
|
|
559
|
|
560 # Add the path of Python to $PATH
|
|
561 # Please add the following line to file ~/.bashrc
|
|
562
|
|
563 export PATH=~/install/Python-2.7.4:$PATH
|
|
564
|
|
565 # save the ~/.bashrc file
|
|
566 source ~/.bashrc
|
|
567
|
|
568 # install pysam package
|
|
569 wget https://pysam.googlecode.com/files/pysam-0.7.4.tar.gz
|
|
570 tar zxvf pysam-0.7.4.tar.gz
|
|
571 cd pysam-0.7.4
|
|
572 python setup.py build
|
|
573 python setup.py install
|
|
574 # re-login the shell after finish installing pysam
|
|
575
|
|
576 # install BS-Seeker2
|
|
577 wget https://github.com/BSSeeker/BSseeker2/archive/master.zip
|
|
578 mv master BSSeeker2.zip
|
|
579 unzip BSSeeker2.zip
|
|
580 cd BSseeker2-master/
|
|
581
|
|
582
|
1
|
583 (4)Run BS-Seeker2
|
0
|
584
|
|
585 Q: Can I add the path of BS-Seeker2's *.py to the $PATH, so I can call
|
|
586 BS-Seeker2 from anywhere?
|
|
587
|
|
588 A: If you're using the "python" from path "/usr/bin/python", you can directly
|
|
589 add the path of BS-Seeker2 in file "~/.bash_profile" (bash) or "~/.profile"
|
|
590 (other shell) or "~/.bashrc" (per-interactive-shell startup).
|
|
591 But if you are using python under other directories, you might need to modify
|
|
592 BS-Seeker2's script first. For example, if your python path is "/my_python/python",
|
|
593 please change the first line in "bs_seeker-build.py", "bs_seeker-align.py" and
|
|
594 "bs_seeker-call_methylation.py" to
|
|
595
|
|
596 #!/my_python/python
|
|
597
|
|
598 Then add
|
|
599
|
|
600 export PATH=/path/to/BS-Seeker2/:$PATH
|
|
601
|
|
602 to file "~/.bash_profile" (e.g.), and source the file:
|
|
603
|
|
604 source ~/.bash_profile
|
|
605
|
|
606 Then you can use BS-Seeker2 globally by typing:
|
|
607
|
|
608 bs_seeker_build.py -h
|
|
609 bs_seeker-align.py -h
|
|
610 bs_seeker-call_methylation.py -h
|
|
611
|
1
|
612 (5) Unique alignment
|
|
613
|
|
614 Q: If I want to only keep alignments that map uniquely, is this an argument I should supply directly
|
|
615 to Bowtie2 (via BS Seeker 2's command line option), or is this an option that's available in
|
|
616 BS Seeker 2 itself?
|
|
617
|
|
618 A: BS-Seeker2 reports unique alignment by default already. If you want to know how many reads
|
|
619 could have multiple hits, run BS-Seeker2 with parameter "--multiple-hit".
|
|
620
|
|
621
|
|
622 (6) Output
|
|
623
|
|
624 Q: In CGmap files, why some lines shown "--" but not a motif (CG/CHG/CHH), for example:
|
|
625
|
|
626 chr01 C 4303711 -- CC 0.0 0 2
|
|
627 chr01 C 4303712 -- CN 0.0 0 2
|
|
628
|
|
629 A: In this example, the site 4303713 is "N" in genome, thus we could not decide the explict pattern.
|
|
630
|
|
631 (7) Algorithm to remove the adapter.
|
|
632
|
|
633 Q: What's the algorithm to remove the adapter
|
|
634
|
|
635 A: BS-Seeker2 has built-in algorithm for removing the adapter, which is developed by [Weilong Guo](http://bioinfo.au.tsinghua.edu.cn/member/wguo/index.html).
|
|
636
|
|
637 First, if the adapter was provided as "AGATCGGAAGAGCACACGTC", only the first 10bp would be used.
|
|
638 Second, we use semi-local alignment strategy for removing the adapters.
|
|
639 Exmaple:
|
|
640
|
|
641 Read : ACCGCGTTGATCGAGTACGTACGTGGGTC
|
|
642 Adapter : ....................ACGTGGGTCCCG
|
|
643
|
|
644 no_mismatch : the maximum number allowed for mismatches
|
|
645
|
|
646 Algorithm: (allowing 1 mismatch)
|
|
647 -Step 1:
|
|
648 ACCGCGTTGATCGAGTACGTACGTGGGTC
|
|
649 ||XX
|
|
650 ACGTGGGTCCCG
|
|
651 -Step 2:
|
|
652 ACCGCGTTGATCGAGTACGTACGTGGGTC
|
|
653 X||X
|
|
654 .ACGTGGGTCCCG
|
|
655 -Step 3:
|
|
656 ACCGCGTTGATCGAGTACGTACGTGGGTC
|
|
657 XX
|
|
658 ..ACGTGGGTCCCG
|
|
659 -Step ...
|
|
660 -Step N:
|
|
661 ACCGCGTTGATCGAGTACGTACGTGGGTC
|
|
662 |||||||||
|
|
663 ....................ACGTGGGTCCCG
|
|
664 Success & return!
|
|
665
|
|
666 Third, we also removed the synthesized bases at the end of RRBS fragments.
|
|
667 Take the "C-CGG" cutting site as example,
|
|
668
|
|
669 - - C|U G G - - =>cut=> - - C =>add=> - - C|C G =>sequencing
|
|
670 - - G G C|C - - - - G G C - - G G C
|
|
671
|
|
672 In our algorithm, the "CG" in "--CCG" (upper strand) was trimmed, in order to get accurate methyaltion level.
|
0
|
673
|
|
674
|
|
675
|
1
|
676
|
|
677
|