Mercurial > repos > bgruening > bismark

Create a temporary index with the offered files from the user. Utilizing the script: bismark_genome_preparation
Generating index with: 'bismark_genome_preparation --bowtie2 /var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmp4_4hiluf'
Writing bisulfite genomes out into a single MFA (multi FastA) file

Bisulfite Genome Indexer version v0.20.0 (last modified 26 April 2018)

Step I - Prepare genome folders - completed


Total number of conversions performed:
C->T:	146875
G->A:	150504

Step II - Genome bisulfite conversions - completed


Bismark Genome Preparation - Step III: Launching the Bowtie 2 indexer
Please be aware that this process can - depending on genome size - take several hours!
Settings:
  Output files: "BS_CT.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  genome_mfa.CT_conversion.fa
Building a SMALL index
Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 189039
Using parameters --bmax 141780 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 141780 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:00:00
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:00
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:00
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 756159 (target: 141779)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
  No samples; assembling all-inclusive block
  Sorting block of length 756159 for bucket 1
  (Using difference cover)
  Sorting block time: 00:00:01
Returning block of 756160 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 235897
fchr[G]: 235897
fchr[T]: 386401
fchr[$]: 756159
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 4446745 bytes to primary EBWT file: BS_CT.1.bt2
Wrote 189044 bytes to secondary EBWT file: BS_CT.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 756159
    bwtLen: 756160
    sz: 189040
    bwtSz: 189040
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 47260
    offsSz: 189040
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 3939
    numLines: 3939
    ebwtTotLen: 252096
    ebwtTotSz: 252096
    color: 0
    reverse: 0
Total time for call to driver() for forward index: 00:00:01
Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
  Time to reverse reference sequence: 00:00:00
bmax according to bmaxDivN setting: 189039
Using parameters --bmax 141780 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 141780 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:00:00
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:00
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:00
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 756159 (target: 141779)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
  No samples; assembling all-inclusive block
  Sorting block of length 756159 for bucket 1
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 756160 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 235897
fchr[G]: 235897
fchr[T]: 386401
fchr[$]: 756159
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 4446745 bytes to primary EBWT file: BS_CT.rev.1.bt2
Wrote 189044 bytes to secondary EBWT file: BS_CT.rev.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 756159
    bwtLen: 756160
    sz: 189040
    bwtSz: 189040
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 47260
    offsSz: 189040
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 3939
    numLines: 3939
    ebwtTotLen: 252096
    ebwtTotSz: 252096
    color: 0
    reverse: 1
Total time for backward call to driver() for mirror index: 00:00:00
Settings:
  Output files: "BS_GA.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  genome_mfa.GA_conversion.fa
Building a SMALL index
Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 189039
Using parameters --bmax 141780 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 141780 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:00:00
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:00
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:00
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 756159 (target: 141779)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
  No samples; assembling all-inclusive block
  Sorting block of length 756159 for bucket 1
  (Using difference cover)
  Sorting block time: 00:00:01
Returning block of 756160 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 386401
fchr[G]: 533276
fchr[T]: 533276
fchr[$]: 756159
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 4446745 bytes to primary EBWT file: BS_GA.1.bt2
Wrote 189044 bytes to secondary EBWT file: BS_GA.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 756159
    bwtLen: 756160
    sz: 189040
    bwtSz: 189040
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 47260
    offsSz: 189040
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 3939
    numLines: 3939
    ebwtTotLen: 252096
    ebwtTotSz: 252096
    color: 0
    reverse: 0
Total time for call to driver() for forward index: 00:00:01
Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
  Time to reverse reference sequence: 00:00:00
bmax according to bmaxDivN setting: 189039
Using parameters --bmax 141780 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 141780 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:00:00
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:00
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:00
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 756159 (target: 141779)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
  No samples; assembling all-inclusive block
  Sorting block of length 756159 for bucket 1
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 756160 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 386401
fchr[G]: 533276
fchr[T]: 533276
fchr[$]: 756159
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 4446745 bytes to primary EBWT file: BS_GA.rev.1.bt2
Wrote 189044 bytes to secondary EBWT file: BS_GA.rev.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 756159
    bwtLen: 756160
    sz: 189040
    bwtSz: 189040
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 47260
    offsSz: 189040
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 3939
    numLines: 3939
    ebwtTotLen: 252096
    ebwtTotSz: 252096
    color: 0
    reverse: 1
Total time for backward call to driver() for mirror index: 00:00:00
Running bismark with: 'bismark --bam --gzip --temp_dir /var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmprjiux3me -o /var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmprjiux3me/results --quiet --fastq -L 20 -D 15 -R 2 --un --ambiguous /var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmp4_4hiluf /private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmpywNSSM/files/000/dataset_2.dat'
Path to Bowtie 2 specified as: bowtie2
Bowtie seems to be working fine (tested command 'bowtie2 --version' [2.3.4])
Output format is BAM (default)
Alignments will be written out in BAM format. Samtools found here: '/Users/scholtalbers/miniconda3/envs/mulled-v1-7f230b3d24f9f00e91e72af479ffae49907f4592b1a7a4b05436e0a99567150a/bin/samtools'
Reference genome folder provided is /var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmp4_4hiluf/	(absolute path is '/private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmp4_4hiluf/)'
FastQ format specified

Input files to be analysed (in current folder '/private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmpywNSSM/job_working_directory/000/3/working'):
/private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmpywNSSM/files/000/dataset_2.dat
Library is assumed to be strand-specific (directional), alignments to strands complementary to the original top or bottom strands will be ignored (i.e. not performed!)
Created output directory /var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmprjiux3me/results/!

Output will be written into the directory: /private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmprjiux3me/results/

Using temp directory: /var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmprjiux3me
Temporary files will be written into the directory: /private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmprjiux3me/
Setting parallelization to single-threaded (default)

Current working directory is: /private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmpywNSSM/job_working_directory/000/3/working

Now reading in and storing sequence information of the genome specified in: /private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmp4_4hiluf/

chr chrY_JH584300_random (182347 bp)
chr chrY_JH584301_random (259875 bp)
chr chrY_JH584302_random (155838 bp)
chr chrY_JH584303_random (158099 bp)

Single-core mode: setting pid to 1

Single-end alignments will be performed
=======================================

Input file is in FastQ format
Writing a C -> T converted version of the input file dataset_2.dat to /private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmprjiux3me/dataset_2.dat_C_to_T.fastq.gz

Created C -> T converted version of the FastQ file dataset_2.dat (44115 sequences in total)

Input file is dataset_2.dat_C_to_T.fastq.gz (FastQ)

Now running 2 instances of Bowtie 2 against the bisulfite genome of /private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmp4_4hiluf/ with the specified options: -q -L 20 -D 15 -R 2 --score-min L,0,-0.2 --ignore-quals --quiet

Now starting the Bowtie 2 aligner for CTreadCTgenome (reading in sequences from /private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmprjiux3me/dataset_2.dat_C_to_T.fastq.gz with options -q -L 20 -D 15 -R 2 --score-min L,0,-0.2 --ignore-quals --quiet --norc)
Using Bowtie 2 index: /private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmp4_4hiluf/Bisulfite_Genome/CT_conversion/BS_CT

Found first alignment:	1_1	4	*	0	0	*	*	0	0	TTGTATATATTAGATAAATTAATTTTTTTTGTTTGTATGTTAAATTTTTTAATTAATTTATTAATATTTTGTGAATTTTTAGATA	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEAEEEEEE	YT:Z:UU
Now starting the Bowtie 2 aligner for CTreadGAgenome (reading in sequences from /private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmprjiux3me/dataset_2.dat_C_to_T.fastq.gz with options -q -L 20 -D 15 -R 2 --score-min L,0,-0.2 --ignore-quals --quiet --nofw)
Using Bowtie 2 index: /private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmp4_4hiluf/Bisulfite_Genome/GA_conversion/BS_GA

Found first alignment:	1_1	4	*	0	0	*	*	0	0	TTGTATATATTAGATAAATTAATTTTTTTTGTTTGTATGTTAAATTTTTTAATTAATTTATTAATATTTTGTGAATTTTTAGATA	AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEAEEEEEE	YT:Z:UU

>>> Writing bisulfite mapping results to /private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmprjiux3me/results/dataset_2.dat_bismark_bt2.bam <<<

Unmapped sequences will be written to /private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmprjiux3me/results/dataset_2.dat_unmapped_reads.fq.gz
Ambiguously mapping sequences will be written to /private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmprjiux3me/results/dataset_2.dat_ambiguous_reads.fq.gz

Reading in the sequence file /private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmpywNSSM/files/000/dataset_2.dat
Processed 44115 sequences in total


Successfully deleted the temporary file /private/var/folders/cf/k7c5rjhj1sggk6x8z1jnw4b80000gp/T/tmprjiux3me/dataset_2.dat_C_to_T.fastq.gz

Final Alignment report
======================
Sequences analysed in total:	44115
Number of alignments with a unique best hit from the different alignments:	554
Mapping efficiency:	1.3%

Sequences with no alignments under any condition:	43115
Sequences did not map uniquely:	446
Sequences which were discarded because genomic sequence could not be extracted:	0

Number of sequences with unique best (first) alignment came from the bowtie output:
CT/CT:	230	((converted) top strand)
CT/GA:	324	((converted) bottom strand)
GA/CT:	0	(complementary to (converted) top strand)
GA/GA:	0	(complementary to (converted) bottom strand)

Number of alignments to (merely theoretical) complementary strands being rejected in total:	0

Final Cytosine Methylation Report
=================================
Total number of C's analysed:	8563

Total methylated C's in CpG context:	245
Total methylated C's in CHG context:	51
Total methylated C's in CHH context:	114
Total methylated C's in Unknown context:	1

Total unmethylated C's in CpG context:	133
Total unmethylated C's in CHG context:	1762
Total unmethylated C's in CHH context:	6258
Total unmethylated C's in Unknown context:	5

C methylated in CpG context:	64.8%
C methylated in CHG context:	2.8%
C methylated in CHH context:	1.8%
C methylated in Unknown context (CN or CHN):	16.7%


Bismark completed in 0d 0h 0m 8s

====================
Bismark run complete
====================
author	bgruening
date	Tue, 26 Feb 2019 18:42:28 -0500
parents	9bfe38410155
children	f211753166bd