annotate default-masurca-config @ 0:3f13e9565679 draft

Uploaded
author dnbenso
date Mon, 24 Jan 2022 00:00:38 +0000
parents
children 1808eaa9d699
DATA
#Illumina paired end reads supplied as <two-character prefix> <fragment mean> <fragment stdev> <forward_reads> <reverse_reads>
#if single-end, do not specify <reverse_reads>
#MUST HAVE Illumina paired end reads to use MaSuRCA
#PE= pe 500 50 /FULL_PATH/frag_1.fastq /FULL_PATH/frag_2.fastq
PE= pe MEAN STDDEV INPUTREAD1 INPUTREAD2
#Illumina mate pair reads supplied as <two-character prefix> <fragment mean> <fragment stdev> <forward_reads> <reverse_reads>
#JUMP= sh 3600 200 /FULL_PATH/short_1.fastq /FULL_PATH/short_2.fastq
#PacBio OR Nanopore reads must be in a single fasta or fastq file, given by absolute path; the file can be gzipped
#if you have both types of reads, supply them both as NANOPORE type
#PACBIO=/FULL_PATH/pacbio.fa
#PACBIO=INPUTREADLONG
#NANOPORE=/FULL_PATH/nanopore.fa
#NANOPORE=INPUTREADLONG
#Other reads (Sanger, 454, etc.): one frg file; concatenate your frg files into one if you have many
#OTHER=/FULL_PATH/file.frg
#synteny-assisted assembly, concatenate all reference genomes into one reference.fa; works for Illumina-only data
#REFERENCE=REF
END

PARAMETERS
#PLEASE READ all comments to essential parameters below, and set the parameters according to your project
#set this to 1 if your Illumina jumping library reads are shorter than 100bp
EXTEND_JUMP_READS=0
#k-mer size for the de Bruijn graph; values between 25 and 127 are supported; auto will compute the optimal size based on the read data and GC content
GRAPH_KMER_SIZE = auto
#set this to 1 for all Illumina-only assemblies
#set this to 0 if you have more than 15x coverage by long reads (PacBio or Nanopore) or any other long reads/mate pairs (Illumina MP, Sanger, 454, etc.)
USE_LINKING_MATES = 0
#specifies whether to run the assembly on the grid
USE_GRID=0
#specifies the grid engine to use: SGE or SLURM
GRID_ENGINE=SLURM
#specifies the queue (for SGE) or partition (for SLURM) to use when running on the grid; MANDATORY
GRID_QUEUE=defq
#batch size, as the amount of long read sequence in each batch on the grid
GRID_BATCH_SIZE=500000000
#use at most this much coverage by the longest PacBio or Nanopore reads, discard the rest of the reads
#can increase this to 30 or 35 if your reads are short (N50<7000bp)
LHE_COVERAGE=25
#set to 0 (default) to do two passes of mega-reads for slower, but higher quality assembly, otherwise set to 1
MEGA_READS_ONE_PASS=0
#this parameter is useful if you have too many Illumina jumping library mates. Typically set it to 60 for bacteria and 300 for other organisms
LIMIT_JUMP_COVERAGE = 300
#these are the additional parameters to Celera Assembler. do not worry about performance, number of processors or batch sizes -- these are computed automatically.
#CABOG ASSEMBLY ONLY: set cgwErrorRate=0.25 for bacteria and 0.1<=cgwErrorRate<=0.15 for other organisms.
CA_PARAMETERS = cgwErrorRate=0.15
#CABOG ASSEMBLY ONLY: whether to attempt to close gaps in scaffolds with Illumina or long read data
CLOSE_GAPS=1
#number of CPUs to use, set this to the number of CPUs/threads per node you will be using
NUM_THREADS = GALAXY_SLOTS
#this is mandatory jellyfish hash size -- a safe value is estimated_genome_size*20
JF_SIZE = JELLYFISHSIZE
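#worked example of the rule above (illustrative figures, not from a real project):
#for an estimated genome size of 5000000 bp, a safe value would be 5000000*20 = 100000000,
#i.e. the line would read JF_SIZE = 100000000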
#ILLUMINA ONLY. Set this to 1 to use SOAPdenovo contigging/scaffolding module.
#Assembly will be worse but will run faster. Useful for very large (>=8Gbp) genomes from Illumina-only data
SOAP_ASSEMBLY=0
#If you are doing Hybrid Illumina paired end + Nanopore/PacBio assembly ONLY (no Illumina mate pairs or OTHER frg files).
#Set this to 1 to use Flye assembler for final assembly of corrected mega-reads.
#A lot faster than CABOG, AND QUALITY IS THE SAME OR BETTER.
#Works well even when MEGA_READS_ONE_PASS is set to 1.
#DO NOT use if you have less than 15x coverage by long reads.
FLYE_ASSEMBLY=0
END
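
The upper-case tokens in this template (MEAN, STDDEV, INPUTREAD1, INPUTREAD2, GALAXY_SLOTS, JELLYFISHSIZE) appear to be placeholders that get replaced with real values before MaSuRCA reads the file. The Python sketch below shows one way such a substitution could work, assuming plain whole-word replacement. The function names `fill_template` and `safe_jf_size` are hypothetical, the sample values are taken from the example comments in the template, and this is not necessarily how the tool that ships this file does it.

```python
import re

# Placeholder tokens that appear in the active (uncommented) lines of the template.
PLACEHOLDERS = ("MEAN", "STDDEV", "INPUTREAD1", "INPUTREAD2",
                "GALAXY_SLOTS", "JELLYFISHSIZE")


def safe_jf_size(estimated_genome_size: int) -> int:
    """Jellyfish hash size from the template's rule of thumb (genome size * 20)."""
    return estimated_genome_size * 20


def fill_template(template: str, values: dict) -> str:
    """Replace each placeholder token, matched as a whole word, with its value."""
    missing = [name for name in PLACEHOLDERS if name not in values]
    if missing:
        raise KeyError("no value supplied for: " + ", ".join(missing))
    # \b keeps INPUTREAD1 from matching inside longer tokens such as INPUTREADLONG.
    pattern = re.compile(r"\b(" + "|".join(PLACEHOLDERS) + r")\b")
    return pattern.sub(lambda m: str(values[m.group(1)]), template)


if __name__ == "__main__":
    # Hypothetical three-line excerpt of the template and illustrative values.
    excerpt = (
        "PE= pe MEAN STDDEV INPUTREAD1 INPUTREAD2\n"
        "NUM_THREADS = GALAXY_SLOTS\n"
        "JF_SIZE = JELLYFISHSIZE\n"
    )
    print(fill_template(excerpt, {
        "MEAN": 500,
        "STDDEV": 50,
        "INPUTREAD1": "/FULL_PATH/frag_1.fastq",
        "INPUTREAD2": "/FULL_PATH/frag_2.fastq",
        "GALAXY_SLOTS": 8,
        "JELLYFISHSIZE": safe_jf_size(5_000_000),
    }))
```

Matching on word boundaries rather than bare substrings means lower-case words in the comments (such as "fragment mean") and the commented-out INPUTREADLONG lines are left untouched.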