Mercurial > repos > lsong10 > psiclass
comparison PsiCLASS-1.0.2/samtools-0.1.19/samtools.1 @ 0:903fc43d6227 draft default tip
Uploaded
| author | lsong10 |
|---|---|
| date | Fri, 26 Mar 2021 16:52:45 +0000 |
| parents | |
| children |
comparison
equal
deleted
inserted
replaced
| -1:000000000000 | 0:903fc43d6227 |
|---|---|
| 1 .TH samtools 1 "15 March 2013" "samtools-0.1.19" "Bioinformatics tools" | |
| 2 .SH NAME | |
| 3 .PP | |
| 4 samtools - Utilities for the Sequence Alignment/Map (SAM) format | |
| 5 | |
| 6 bcftools - Utilities for the Binary Call Format (BCF) and VCF | |
| 7 .SH SYNOPSIS | |
| 8 .PP | |
| 9 samtools view -bt ref_list.txt -o aln.bam aln.sam.gz | |
| 10 .PP | |
| 11 samtools sort aln.bam aln.sorted | |
| 12 .PP | |
| 13 samtools index aln.sorted.bam | |
| 14 .PP | |
| 15 samtools idxstats aln.sorted.bam | |
| 16 .PP | |
| 17 samtools view aln.sorted.bam chr2:20,100,000-20,200,000 | |
| 18 .PP | |
| 19 samtools merge out.bam in1.bam in2.bam in3.bam | |
| 20 .PP | |
| 21 samtools faidx ref.fasta | |
| 22 .PP | |
| 23 samtools pileup -vcf ref.fasta aln.sorted.bam | |
| 24 .PP | |
| 25 samtools mpileup -C50 -gf ref.fasta -r chr3:1,000-2,000 in1.bam in2.bam | |
| 26 .PP | |
| 27 samtools tview aln.sorted.bam ref.fasta | |
| 28 .PP | |
| 29 bcftools index in.bcf | |
| 30 .PP | |
| 31 bcftools view in.bcf chr2:100-200 > out.vcf | |
| 32 .PP | |
| 33 bcftools view -Nvm0.99 in.bcf > out.vcf 2> out.afs | |
| 34 | |
| 35 .SH DESCRIPTION | |
| 36 .PP | |
| 37 Samtools is a set of utilities that manipulate alignments in the BAM | |
| 38 format. It imports from and exports to the SAM (Sequence Alignment/Map) | |
| 39 format, does sorting, merging and indexing, and allows to retrieve reads | |
| 40 in any regions swiftly. | |
| 41 | |
| 42 Samtools is designed to work on a stream. It regards an input file `-' | |
| 43 as the standard input (stdin) and an output file `-' as the standard | |
| 44 output (stdout). Several commands can thus be combined with Unix | |
| 45 pipes. Samtools always output warning and error messages to the standard | |
| 46 error output (stderr). | |
| 47 | |
| 48 Samtools is also able to open a BAM (not SAM) file on a remote FTP or | |
| 49 HTTP server if the BAM file name starts with `ftp://' or `http://'. | |
| 50 Samtools checks the current working directory for the index file and | |
| 51 will download the index upon absence. Samtools does not retrieve the | |
| 52 entire alignment file unless it is asked to do so. | |
| 53 | |
| 54 .SH SAMTOOLS COMMANDS AND OPTIONS | |
| 55 | |
| 56 .TP 10 | |
| 57 .B view | |
| 58 samtools view [-bchuHS] [-t in.refList] [-o output] [-f reqFlag] [-F | |
| 59 skipFlag] [-q minMapQ] [-l library] [-r readGroup] [-R rgFile] <in.bam>|<in.sam> [region1 [...]] | |
| 60 | |
| 61 Extract/print all or sub alignments in SAM or BAM format. If no region | |
| 62 is specified, all the alignments will be printed; otherwise only | |
| 63 alignments overlapping the specified regions will be output. An | |
| 64 alignment may be given multiple times if it is overlapping several | |
| 65 regions. A region can be presented, for example, in the following | |
| 66 format: `chr2' (the whole chr2), `chr2:1000000' (region starting from | |
| 67 1,000,000bp) or `chr2:1,000,000-2,000,000' (region between 1,000,000 and | |
| 68 2,000,000bp including the end points). The coordinate is 1-based. | |
| 69 | |
| 70 .B OPTIONS: | |
| 71 .RS | |
| 72 .TP 10 | |
| 73 .B -b | |
| 74 Output in the BAM format. | |
| 75 .TP | |
| 76 .BI -f \ INT | |
| 77 Only output alignments with all bits in INT present in the FLAG | |
| 78 field. INT can be in hex in the format of /^0x[0-9A-F]+/ [0] | |
| 79 .TP | |
| 80 .BI -F \ INT | |
| 81 Skip alignments with bits present in INT [0] | |
| 82 .TP | |
| 83 .B -h | |
| 84 Include the header in the output. | |
| 85 .TP | |
| 86 .B -H | |
| 87 Output the header only. | |
| 88 .TP | |
| 89 .BI -l \ STR | |
| 90 Only output reads in library STR [null] | |
| 91 .TP | |
| 92 .BI -o \ FILE | |
| 93 Output file [stdout] | |
| 94 .TP | |
| 95 .BI -q \ INT | |
| 96 Skip alignments with MAPQ smaller than INT [0] | |
| 97 .TP | |
| 98 .BI -r \ STR | |
| 99 Only output reads in read group STR [null] | |
| 100 .TP | |
| 101 .BI -R \ FILE | |
| 102 Output reads in read groups listed in | |
| 103 .I FILE | |
| 104 [null] | |
| 105 .TP | |
| 106 .BI -s \ FLOAT | |
| 107 Fraction of templates/pairs to subsample; the integer part is treated as the | |
| 108 seed for the random number generator [-1] | |
| 109 .TP | |
| 110 .B -S | |
| 111 Input is in SAM. If @SQ header lines are absent, the | |
| 112 .B `-t' | |
| 113 option is required. | |
| 114 .TP | |
| 115 .B -c | |
| 116 Instead of printing the alignments, only count them and print the | |
| 117 total number. All filter options, such as | |
| 118 .B `-f', | |
| 119 .B `-F' | |
| 120 and | |
| 121 .B `-q' | |
| 122 , are taken into account. | |
| 123 .TP | |
| 124 .BI -t \ FILE | |
| 125 This file is TAB-delimited. Each line must contain the reference name | |
| 126 and the length of the reference, one line for each distinct reference; | |
| 127 additional fields are ignored. This file also defines the order of the | |
| 128 reference sequences in sorting. If you run `samtools faidx <ref.fa>', | |
| 129 the resultant index file | |
| 130 .I <ref.fa>.fai | |
| 131 can be used as this | |
| 132 .I <in.ref_list> | |
| 133 file. | |
| 134 .TP | |
| 135 .B -u | |
| 136 Output uncompressed BAM. This option saves time spent on | |
| 137 compression/decomprssion and is thus preferred when the output is piped | |
| 138 to another samtools command. | |
| 139 .RE | |
| 140 | |
| 141 .TP | |
| 142 .B tview | |
| 143 samtools tview | |
| 144 .RB [ \-p | |
| 145 .IR chr:pos ] | |
| 146 .RB [ \-s | |
| 147 .IR STR ] | |
| 148 .RB [ \-d | |
| 149 .IR display ] | |
| 150 .RI <in.sorted.bam> | |
| 151 .RI [ref.fasta] | |
| 152 | |
| 153 Text alignment viewer (based on the ncurses library). In the viewer, | |
| 154 press `?' for help and press `g' to check the alignment start from a | |
| 155 region in the format like `chr10:10,000,000' or `=10,000,000' when | |
| 156 viewing the same reference sequence. | |
| 157 | |
| 158 .B Options: | |
| 159 .RS | |
| 160 .TP 14 | |
| 161 .BI -d \ display | |
| 162 Output as (H)tml or (C)urses or (T)ext | |
| 163 .TP | |
| 164 .BI -p \ chr:pos | |
| 165 Go directly to this position | |
| 166 .TP | |
| 167 .BI -s \ STR | |
| 168 Display only reads from this sample or read group | |
| 169 .RE | |
| 170 | |
| 171 .TP | |
| 172 .B mpileup | |
| 173 samtools mpileup | |
| 174 .RB [ \-EBugp ] | |
| 175 .RB [ \-C | |
| 176 .IR capQcoef ] | |
| 177 .RB [ \-r | |
| 178 .IR reg ] | |
| 179 .RB [ \-f | |
| 180 .IR in.fa ] | |
| 181 .RB [ \-l | |
| 182 .IR list ] | |
| 183 .RB [ \-M | |
| 184 .IR capMapQ ] | |
| 185 .RB [ \-Q | |
| 186 .IR minBaseQ ] | |
| 187 .RB [ \-q | |
| 188 .IR minMapQ ] | |
| 189 .I in.bam | |
| 190 .RI [ in2.bam | |
| 191 .RI [ ... ]] | |
| 192 | |
| 193 Generate BCF or pileup for one or multiple BAM files. Alignment records | |
| 194 are grouped by sample identifiers in @RG header lines. If sample | |
| 195 identifiers are absent, each input file is regarded as one sample. | |
| 196 | |
| 197 In the pileup format (without | |
| 198 .BR -u or -g ), | |
| 199 each | |
| 200 line represents a genomic position, consisting of chromosome name, | |
| 201 coordinate, reference base, read bases, read qualities and alignment | |
| 202 mapping qualities. Information on match, mismatch, indel, strand, | |
| 203 mapping quality and start and end of a read are all encoded at the read | |
| 204 base column. At this column, a dot stands for a match to the reference | |
| 205 base on the forward strand, a comma for a match on the reverse strand, | |
| 206 a '>' or '<' for a reference skip, `ACGTN' for a mismatch on the forward | |
| 207 strand and `acgtn' for a mismatch on the reverse strand. A pattern | |
| 208 `\\+[0-9]+[ACGTNacgtn]+' indicates there is an insertion between this | |
| 209 reference position and the next reference position. The length of the | |
| 210 insertion is given by the integer in the pattern, followed by the | |
| 211 inserted sequence. Similarly, a pattern `-[0-9]+[ACGTNacgtn]+' | |
| 212 represents a deletion from the reference. The deleted bases will be | |
| 213 presented as `*' in the following lines. Also at the read base column, a | |
| 214 symbol `^' marks the start of a read. The ASCII of the character | |
| 215 following `^' minus 33 gives the mapping quality. A symbol `$' marks the | |
| 216 end of a read segment. | |
| 217 | |
| 218 .B Input Options: | |
| 219 .RS | |
| 220 .TP 10 | |
| 221 .B -6 | |
| 222 Assume the quality is in the Illumina 1.3+ encoding. | |
| 223 .B -A | |
| 224 Do not skip anomalous read pairs in variant calling. | |
| 225 .TP | |
| 226 .B -B | |
| 227 Disable probabilistic realignment for the computation of base alignment | |
| 228 quality (BAQ). BAQ is the Phred-scaled probability of a read base being | |
| 229 misaligned. Applying this option greatly helps to reduce false SNPs | |
| 230 caused by misalignments. | |
| 231 .TP | |
| 232 .BI -b \ FILE | |
| 233 List of input BAM files, one file per line [null] | |
| 234 .TP | |
| 235 .BI -C \ INT | |
| 236 Coefficient for downgrading mapping quality for reads containing | |
| 237 excessive mismatches. Given a read with a phred-scaled probability q of | |
| 238 being generated from the mapped position, the new mapping quality is | |
| 239 about sqrt((INT-q)/INT)*INT. A zero value disables this | |
| 240 functionality; if enabled, the recommended value for BWA is 50. [0] | |
| 241 .TP | |
| 242 .BI -d \ INT | |
| 243 At a position, read maximally | |
| 244 .I INT | |
| 245 reads per input BAM. [250] | |
| 246 .TP | |
| 247 .B -E | |
| 248 Extended BAQ computation. This option helps sensitivity especially for MNPs, but may hurt | |
| 249 specificity a little bit. | |
| 250 .TP | |
| 251 .BI -f \ FILE | |
| 252 The | |
| 253 .BR faidx -indexed | |
| 254 reference file in the FASTA format. The file can be optionally compressed by | |
| 255 .BR razip . | |
| 256 [null] | |
| 257 .TP | |
| 258 .BI -l \ FILE | |
| 259 BED or position list file containing a list of regions or sites where pileup or BCF should be generated [null] | |
| 260 .TP | |
| 261 .BI -q \ INT | |
| 262 Minimum mapping quality for an alignment to be used [0] | |
| 263 .TP | |
| 264 .BI -Q \ INT | |
| 265 Minimum base quality for a base to be considered [13] | |
| 266 .TP | |
| 267 .BI -r \ STR | |
| 268 Only generate pileup in region | |
| 269 .I STR | |
| 270 [all sites] | |
| 271 .TP | |
| 272 .B Output Options: | |
| 273 | |
| 274 .TP | |
| 275 .B -D | |
| 276 Output per-sample read depth | |
| 277 .TP | |
| 278 .B -g | |
| 279 Compute genotype likelihoods and output them in the binary call format (BCF). | |
| 280 .TP | |
| 281 .B -S | |
| 282 Output per-sample Phred-scaled strand bias P-value | |
| 283 .TP | |
| 284 .B -u | |
| 285 Similar to | |
| 286 .B -g | |
| 287 except that the output is uncompressed BCF, which is preferred for piping. | |
| 288 | |
| 289 .TP | |
| 290 .B Options for Genotype Likelihood Computation (for -g or -u): | |
| 291 | |
| 292 .TP | |
| 293 .BI -e \ INT | |
| 294 Phred-scaled gap extension sequencing error probability. Reducing | |
| 295 .I INT | |
| 296 leads to longer indels. [20] | |
| 297 .TP | |
| 298 .BI -h \ INT | |
| 299 Coefficient for modeling homopolymer errors. Given an | |
| 300 .IR l -long | |
| 301 homopolymer | |
| 302 run, the sequencing error of an indel of size | |
| 303 .I s | |
| 304 is modeled as | |
| 305 .IR INT * s / l . | |
| 306 [100] | |
| 307 .TP | |
| 308 .B -I | |
| 309 Do not perform INDEL calling | |
| 310 .TP | |
| 311 .BI -L \ INT | |
| 312 Skip INDEL calling if the average per-sample depth is above | |
| 313 .IR INT . | |
| 314 [250] | |
| 315 .TP | |
| 316 .BI -o \ INT | |
| 317 Phred-scaled gap open sequencing error probability. Reducing | |
| 318 .I INT | |
| 319 leads to more indel calls. [40] | |
| 320 .TP | |
| 321 .BI -p | |
| 322 Apply -m and -F thresholds per sample to increase sensitivity of calling. | |
| 323 By default both options are applied to reads pooled from all samples. | |
| 324 .TP | |
| 325 .BI -P \ STR | |
| 326 Comma dilimited list of platforms (determined by | |
| 327 .BR @RG-PL ) | |
| 328 from which indel candidates are obtained. It is recommended to collect | |
| 329 indel candidates from sequencing technologies that have low indel error | |
| 330 rate such as ILLUMINA. [all] | |
| 331 .RE | |
| 332 | |
| 333 .TP | |
| 334 .B reheader | |
| 335 samtools reheader <in.header.sam> <in.bam> | |
| 336 | |
| 337 Replace the header in | |
| 338 .I in.bam | |
| 339 with the header in | |
| 340 .I in.header.sam. | |
| 341 This command is much faster than replacing the header with a | |
| 342 BAM->SAM->BAM conversion. | |
| 343 | |
| 344 .TP | |
| 345 .B cat | |
| 346 samtools cat [-h header.sam] [-o out.bam] <in1.bam> <in2.bam> [ ... ] | |
| 347 | |
| 348 Concatenate BAMs. The sequence dictionary of each input BAM must be identical, | |
| 349 although this command does not check this. This command uses a similar trick | |
| 350 to | |
| 351 .B reheader | |
| 352 which enables fast BAM concatenation. | |
| 353 | |
| 354 .TP | |
| 355 .B sort | |
| 356 samtools sort [-nof] [-m maxMem] <in.bam> <out.prefix> | |
| 357 | |
| 358 Sort alignments by leftmost coordinates. File | |
| 359 .I <out.prefix>.bam | |
| 360 will be created. This command may also create temporary files | |
| 361 .I <out.prefix>.%d.bam | |
| 362 when the whole alignment cannot be fitted into memory (controlled by | |
| 363 option -m). | |
| 364 | |
| 365 .B OPTIONS: | |
| 366 .RS | |
| 367 .TP 8 | |
| 368 .B -o | |
| 369 Output the final alignment to the standard output. | |
| 370 .TP | |
| 371 .B -n | |
| 372 Sort by read names rather than by chromosomal coordinates | |
| 373 .TP | |
| 374 .B -f | |
| 375 Use | |
| 376 .I <out.prefix> | |
| 377 as the full output path and do not append | |
| 378 .I .bam | |
| 379 suffix. | |
| 380 .TP | |
| 381 .BI -m \ INT | |
| 382 Approximately the maximum required memory. [500000000] | |
| 383 .RE | |
| 384 | |
| 385 .TP | |
| 386 .B merge | |
| 387 samtools merge [-nur1f] [-h inh.sam] [-R reg] <out.bam> <in1.bam> <in2.bam> [...] | |
| 388 | |
| 389 Merge multiple sorted alignments. | |
| 390 The header reference lists of all the input BAM files, and the @SQ headers of | |
| 391 .IR inh.sam , | |
| 392 if any, must all refer to the same set of reference sequences. | |
| 393 The header reference list and (unless overridden by | |
| 394 .BR -h ) | |
| 395 `@' headers of | |
| 396 .I in1.bam | |
| 397 will be copied to | |
| 398 .IR out.bam , | |
| 399 and the headers of other files will be ignored. | |
| 400 | |
| 401 .B OPTIONS: | |
| 402 .RS | |
| 403 .TP 8 | |
| 404 .B -1 | |
| 405 Use zlib compression level 1 to comrpess the output | |
| 406 .TP | |
| 407 .B -f | |
| 408 Force to overwrite the output file if present. | |
| 409 .TP 8 | |
| 410 .BI -h \ FILE | |
| 411 Use the lines of | |
| 412 .I FILE | |
| 413 as `@' headers to be copied to | |
| 414 .IR out.bam , | |
| 415 replacing any header lines that would otherwise be copied from | |
| 416 .IR in1.bam . | |
| 417 .RI ( FILE | |
| 418 is actually in SAM format, though any alignment records it may contain | |
| 419 are ignored.) | |
| 420 .TP | |
| 421 .B -n | |
| 422 The input alignments are sorted by read names rather than by chromosomal | |
| 423 coordinates | |
| 424 .TP | |
| 425 .BI -R \ STR | |
| 426 Merge files in the specified region indicated by | |
| 427 .I STR | |
| 428 [null] | |
| 429 .TP | |
| 430 .B -r | |
| 431 Attach an RG tag to each alignment. The tag value is inferred from file names. | |
| 432 .TP | |
| 433 .B -u | |
| 434 Uncompressed BAM output | |
| 435 .RE | |
| 436 | |
| 437 .TP | |
| 438 .B index | |
| 439 samtools index <aln.bam> | |
| 440 | |
| 441 Index sorted alignment for fast random access. Index file | |
| 442 .I <aln.bam>.bai | |
| 443 will be created. | |
| 444 | |
| 445 .TP | |
| 446 .B idxstats | |
| 447 samtools idxstats <aln.bam> | |
| 448 | |
| 449 Retrieve and print stats in the index file. The output is TAB delimited | |
| 450 with each line consisting of reference sequence name, sequence length, # | |
| 451 mapped reads and # unmapped reads. | |
| 452 | |
| 453 .TP | |
| 454 .B faidx | |
| 455 samtools faidx <ref.fasta> [region1 [...]] | |
| 456 | |
| 457 Index reference sequence in the FASTA format or extract subsequence from | |
| 458 indexed reference sequence. If no region is specified, | |
| 459 .B faidx | |
| 460 will index the file and create | |
| 461 .I <ref.fasta>.fai | |
| 462 on the disk. If regions are speficified, the subsequences will be | |
| 463 retrieved and printed to stdout in the FASTA format. The input file can | |
| 464 be compressed in the | |
| 465 .B RAZF | |
| 466 format. | |
| 467 | |
| 468 .TP | |
| 469 .B fixmate | |
| 470 samtools fixmate <in.nameSrt.bam> <out.bam> | |
| 471 | |
| 472 Fill in mate coordinates, ISIZE and mate related flags from a | |
| 473 name-sorted alignment. | |
| 474 | |
| 475 .TP | |
| 476 .B rmdup | |
| 477 samtools rmdup [-sS] <input.srt.bam> <out.bam> | |
| 478 | |
| 479 Remove potential PCR duplicates: if multiple read pairs have identical | |
| 480 external coordinates, only retain the pair with highest mapping quality. | |
| 481 In the paired-end mode, this command | |
| 482 .B ONLY | |
| 483 works with FR orientation and requires ISIZE is correctly set. It does | |
| 484 not work for unpaired reads (e.g. two ends mapped to different | |
| 485 chromosomes or orphan reads). | |
| 486 | |
| 487 .B OPTIONS: | |
| 488 .RS | |
| 489 .TP 8 | |
| 490 .B -s | |
| 491 Remove duplicate for single-end reads. By default, the command works for | |
| 492 paired-end reads only. | |
| 493 .TP 8 | |
| 494 .B -S | |
| 495 Treat paired-end reads and single-end reads. | |
| 496 .RE | |
| 497 | |
| 498 .TP | |
| 499 .B calmd | |
| 500 samtools calmd [-EeubSr] [-C capQcoef] <aln.bam> <ref.fasta> | |
| 501 | |
| 502 Generate the MD tag. If the MD tag is already present, this command will | |
| 503 give a warning if the MD tag generated is different from the existing | |
| 504 tag. Output SAM by default. | |
| 505 | |
| 506 .B OPTIONS: | |
| 507 .RS | |
| 508 .TP 8 | |
| 509 .B -A | |
| 510 When used jointly with | |
| 511 .B -r | |
| 512 this option overwrites the original base quality. | |
| 513 .TP 8 | |
| 514 .B -e | |
| 515 Convert a the read base to = if it is identical to the aligned reference | |
| 516 base. Indel caller does not support the = bases at the moment. | |
| 517 .TP | |
| 518 .B -u | |
| 519 Output uncompressed BAM | |
| 520 .TP | |
| 521 .B -b | |
| 522 Output compressed BAM | |
| 523 .TP | |
| 524 .B -S | |
| 525 The input is SAM with header lines | |
| 526 .TP | |
| 527 .BI -C \ INT | |
| 528 Coefficient to cap mapping quality of poorly mapped reads. See the | |
| 529 .B pileup | |
| 530 command for details. [0] | |
| 531 .TP | |
| 532 .B -r | |
| 533 Compute the BQ tag (without -A) or cap base quality by BAQ (with -A). | |
| 534 .TP | |
| 535 .B -E | |
| 536 Extended BAQ calculation. This option trades specificity for sensitivity, though the | |
| 537 effect is minor. | |
| 538 .RE | |
| 539 | |
| 540 .TP | |
| 541 .B targetcut | |
| 542 samtools targetcut [-Q minBaseQ] [-i inPenalty] [-0 em0] [-1 em1] [-2 em2] [-f ref] <in.bam> | |
| 543 | |
| 544 This command identifies target regions by examining the continuity of read depth, computes | |
| 545 haploid consensus sequences of targets and outputs a SAM with each sequence corresponding | |
| 546 to a target. When option | |
| 547 .B -f | |
| 548 is in use, BAQ will be applied. This command is | |
| 549 .B only | |
| 550 designed for cutting fosmid clones from fosmid pool sequencing [Ref. Kitzman et al. (2010)]. | |
| 551 .RE | |
| 552 | |
| 553 .TP | |
| 554 .B phase | |
| 555 samtools phase [-AF] [-k len] [-b prefix] [-q minLOD] [-Q minBaseQ] <in.bam> | |
| 556 | |
| 557 Call and phase heterozygous SNPs. | |
| 558 .B OPTIONS: | |
| 559 .RS | |
| 560 .TP 8 | |
| 561 .B -A | |
| 562 Drop reads with ambiguous phase. | |
| 563 .TP 8 | |
| 564 .BI -b \ STR | |
| 565 Prefix of BAM output. When this option is in use, phase-0 reads will be saved in file | |
| 566 .BR STR .0.bam | |
| 567 and phase-1 reads in | |
| 568 .BR STR .1.bam. | |
| 569 Phase unknown reads will be randomly allocated to one of the two files. Chimeric reads | |
| 570 with switch errors will be saved in | |
| 571 .BR STR .chimeric.bam. | |
| 572 [null] | |
| 573 .TP | |
| 574 .B -F | |
| 575 Do not attempt to fix chimeric reads. | |
| 576 .TP | |
| 577 .BI -k \ INT | |
| 578 Maximum length for local phasing. [13] | |
| 579 .TP | |
| 580 .BI -q \ INT | |
| 581 Minimum Phred-scaled LOD to call a heterozygote. [40] | |
| 582 .TP | |
| 583 .BI -Q \ INT | |
| 584 Minimum base quality to be used in het calling. [13] | |
| 585 .RE | |
| 586 | |
| 587 .SH BCFTOOLS COMMANDS AND OPTIONS | |
| 588 | |
| 589 .TP 10 | |
| 590 .B view | |
| 591 .B bcftools view | |
| 592 .RB [ \-AbFGNQSucgv ] | |
| 593 .RB [ \-D | |
| 594 .IR seqDict ] | |
| 595 .RB [ \-l | |
| 596 .IR listLoci ] | |
| 597 .RB [ \-s | |
| 598 .IR listSample ] | |
| 599 .RB [ \-i | |
| 600 .IR gapSNPratio ] | |
| 601 .RB [ \-t | |
| 602 .IR mutRate ] | |
| 603 .RB [ \-p | |
| 604 .IR varThres ] | |
| 605 .RB [ \-m | |
| 606 .IR varThres ] | |
| 607 .RB [ \-P | |
| 608 .IR prior ] | |
| 609 .RB [ \-1 | |
| 610 .IR nGroup1 ] | |
| 611 .RB [ \-d | |
| 612 .IR minFrac ] | |
| 613 .RB [ \-U | |
| 614 .IR nPerm ] | |
| 615 .RB [ \-X | |
| 616 .IR permThres ] | |
| 617 .RB [ \-T | |
| 618 .IR trioType ] | |
| 619 .I in.bcf | |
| 620 .RI [ region ] | |
| 621 | |
| 622 Convert between BCF and VCF, call variant candidates and estimate allele | |
| 623 frequencies. | |
| 624 | |
| 625 .RS | |
| 626 .TP | |
| 627 .B Input/Output Options: | |
| 628 .TP 10 | |
| 629 .B -A | |
| 630 Retain all possible alternate alleles at variant sites. By default, the view | |
| 631 command discards unlikely alleles. | |
| 632 .TP 10 | |
| 633 .B -b | |
| 634 Output in the BCF format. The default is VCF. | |
| 635 .TP | |
| 636 .BI -D \ FILE | |
| 637 Sequence dictionary (list of chromosome names) for VCF->BCF conversion [null] | |
| 638 .TP | |
| 639 .B -F | |
| 640 Indicate PL is generated by r921 or before (ordering is different). | |
| 641 .TP | |
| 642 .B -G | |
| 643 Suppress all individual genotype information. | |
| 644 .TP | |
| 645 .BI -l \ FILE | |
| 646 List of sites at which information are outputted [all sites] | |
| 647 .TP | |
| 648 .B -N | |
| 649 Skip sites where the REF field is not A/C/G/T | |
| 650 .TP | |
| 651 .B -Q | |
| 652 Output the QCALL likelihood format | |
| 653 .TP | |
| 654 .BI -s \ FILE | |
| 655 List of samples to use. The first column in the input gives the sample names | |
| 656 and the second gives the ploidy, which can only be 1 or 2. When the 2nd column | |
| 657 is absent, the sample ploidy is assumed to be 2. In the output, the ordering of | |
| 658 samples will be identical to the one in | |
| 659 .IR FILE . | |
| 660 [null] | |
| 661 .TP | |
| 662 .B -S | |
| 663 The input is VCF instead of BCF. | |
| 664 .TP | |
| 665 .B -u | |
| 666 Uncompressed BCF output (force -b). | |
| 667 .TP | |
| 668 .B Consensus/Variant Calling Options: | |
| 669 .TP 10 | |
| 670 .B -c | |
| 671 Call variants using Bayesian inference. This option automatically invokes option | |
| 672 .BR -e . | |
| 673 .TP | |
| 674 .BI -d \ FLOAT | |
| 675 When | |
| 676 .B -v | |
| 677 is in use, skip loci where the fraction of samples covered by reads is below FLOAT. [0] | |
| 678 .TP | |
| 679 .B -e | |
| 680 Perform max-likelihood inference only, including estimating the site allele frequency, | |
| 681 testing Hardy-Weinberg equlibrium and testing associations with LRT. | |
| 682 .TP | |
| 683 .B -g | |
| 684 Call per-sample genotypes at variant sites (force -c) | |
| 685 .TP | |
| 686 .BI -i \ FLOAT | |
| 687 Ratio of INDEL-to-SNP mutation rate [0.15] | |
| 688 .TP | |
| 689 .BI -m \ FLOAT | |
| 690 New model for improved multiallelic and rare-variant calling. Another | |
| 691 ALT allele is accepted if P(chi^2) of LRT exceeds the FLOAT threshold. The | |
| 692 parameter seems robust and the actual value usually does not affect the results | |
| 693 much; a good value to use is 0.99. This is the recommended calling method. [0] | |
| 694 .TP | |
| 695 .BI -p \ FLOAT | |
| 696 A site is considered to be a variant if P(ref|D)<FLOAT [0.5] | |
| 697 .TP | |
| 698 .BI -P \ STR | |
| 699 Prior or initial allele frequency spectrum. If STR can be | |
| 700 .IR full , | |
| 701 .IR cond2 , | |
| 702 .I flat | |
| 703 or the file consisting of error output from a previous variant calling | |
| 704 run. | |
| 705 .TP | |
| 706 .BI -t \ FLOAT | |
| 707 Scaled muttion rate for variant calling [0.001] | |
| 708 .TP | |
| 709 .BI -T \ STR | |
| 710 Enable pair/trio calling. For trio calling, option | |
| 711 .B -s | |
| 712 is usually needed to be applied to configure the trio members and their ordering. | |
| 713 In the file supplied to the option | |
| 714 .BR -s , | |
| 715 the first sample must be the child, the second the father and the third the mother. | |
| 716 The valid values of | |
| 717 .I STR | |
| 718 are `pair', `trioauto', `trioxd' and `trioxs', where `pair' calls differences between two input samples, and `trioxd' (`trioxs') specifies that the input | |
| 719 is from the X chromosome non-PAR regions and the child is a female (male). [null] | |
| 720 .TP | |
| 721 .B -v | |
| 722 Output variant sites only (force -c) | |
| 723 .TP | |
| 724 .B Contrast Calling and Association Test Options: | |
| 725 .TP | |
| 726 .BI -1 \ INT | |
| 727 Number of group-1 samples. This option is used for dividing the samples into | |
| 728 two groups for contrast SNP calling or association test. | |
| 729 When this option is in use, the following VCF INFO will be outputted: | |
| 730 PC2, PCHI2 and QCHI2. [0] | |
| 731 .TP | |
| 732 .BI -U \ INT | |
| 733 Number of permutations for association test (effective only with | |
| 734 .BR -1 ) | |
| 735 [0] | |
| 736 .TP | |
| 737 .BI -X \ FLOAT | |
| 738 Only perform permutations for P(chi^2)<FLOAT (effective only with | |
| 739 .BR -U ) | |
| 740 [0.01] | |
| 741 .RE | |
| 742 | |
| 743 .TP | |
| 744 .B index | |
| 745 .B bcftools index | |
| 746 .I in.bcf | |
| 747 | |
| 748 Index sorted BCF for random access. | |
| 749 .RE | |
| 750 | |
| 751 .TP | |
| 752 .B cat | |
| 753 .B bcftools cat | |
| 754 .I in1.bcf | |
| 755 .RI [ "in2.bcf " [ ... "]]]" | |
| 756 | |
| 757 Concatenate BCF files. The input files are required to be sorted and | |
| 758 have identical samples appearing in the same order. | |
| 759 .RE | |
| 760 .SH SAM FORMAT | |
| 761 | |
| 762 Sequence Alignment/Map (SAM) format is TAB-delimited. Apart from the header lines, which are started | |
| 763 with the `@' symbol, each alignment line consists of: | |
| 764 | |
| 765 .TS | |
| 766 center box; | |
| 767 cb | cb | cb | |
| 768 n | l | l . | |
| 769 Col Field Description | |
| 770 _ | |
| 771 1 QNAME Query template/pair NAME | |
| 772 2 FLAG bitwise FLAG | |
| 773 3 RNAME Reference sequence NAME | |
| 774 4 POS 1-based leftmost POSition/coordinate of clipped sequence | |
| 775 5 MAPQ MAPping Quality (Phred-scaled) | |
| 776 6 CIAGR extended CIGAR string | |
| 777 7 MRNM Mate Reference sequence NaMe (`=' if same as RNAME) | |
| 778 8 MPOS 1-based Mate POSistion | |
| 779 9 TLEN inferred Template LENgth (insert size) | |
| 780 10 SEQ query SEQuence on the same strand as the reference | |
| 781 11 QUAL query QUALity (ASCII-33 gives the Phred base quality) | |
| 782 12+ OPT variable OPTional fields in the format TAG:VTYPE:VALUE | |
| 783 .TE | |
| 784 | |
| 785 .PP | |
| 786 Each bit in the FLAG field is defined as: | |
| 787 | |
| 788 .TS | |
| 789 center box; | |
| 790 cb | cb | cb | |
| 791 l | c | l . | |
| 792 Flag Chr Description | |
| 793 _ | |
| 794 0x0001 p the read is paired in sequencing | |
| 795 0x0002 P the read is mapped in a proper pair | |
| 796 0x0004 u the query sequence itself is unmapped | |
| 797 0x0008 U the mate is unmapped | |
| 798 0x0010 r strand of the query (1 for reverse) | |
| 799 0x0020 R strand of the mate | |
| 800 0x0040 1 the read is the first read in a pair | |
| 801 0x0080 2 the read is the second read in a pair | |
| 802 0x0100 s the alignment is not primary | |
| 803 0x0200 f the read fails platform/vendor quality checks | |
| 804 0x0400 d the read is either a PCR or an optical duplicate | |
| 805 .TE | |
| 806 | |
| 807 where the second column gives the string representation of the FLAG field. | |
| 808 | |
| 809 .SH VCF FORMAT | |
| 810 | |
| 811 The Variant Call Format (VCF) is a TAB-delimited format with each data line consists of the following fields: | |
| 812 .TS | |
| 813 center box; | |
| 814 cb | cb | cb | |
| 815 n | l | l . | |
| 816 Col Field Description | |
| 817 _ | |
| 818 1 CHROM CHROMosome name | |
| 819 2 POS the left-most POSition of the variant | |
| 820 3 ID unique variant IDentifier | |
| 821 4 REF the REFerence allele | |
| 822 5 ALT the ALTernate allele(s), separated by comma | |
| 823 6 QUAL variant/reference QUALity | |
| 824 7 FILTER FILTers applied | |
| 825 8 INFO INFOrmation related to the variant, separated by semi-colon | |
| 826 9 FORMAT FORMAT of the genotype fields, separated by colon (optional) | |
| 827 10+ SAMPLE SAMPLE genotypes and per-sample information (optional) | |
| 828 .TE | |
| 829 | |
| 830 .PP | |
| 831 The following table gives the | |
| 832 .B INFO | |
| 833 tags used by samtools and bcftools. | |
| 834 | |
| 835 .TS | |
| 836 center box; | |
| 837 cb | cb | cb | |
| 838 l | l | l . | |
| 839 Tag Format Description | |
| 840 _ | |
| 841 AF1 double Max-likelihood estimate of the site allele frequency (AF) of the first ALT allele | |
| 842 DP int Raw read depth (without quality filtering) | |
| 843 DP4 int[4] # high-quality reference forward bases, ref reverse, alternate for and alt rev bases | |
| 844 FQ int Consensus quality. Positive: sample genotypes different; negative: otherwise | |
| 845 MQ int Root-Mean-Square mapping quality of covering reads | |
| 846 PC2 int[2] Phred probability of AF in group1 samples being larger (,smaller) than in group2 | |
| 847 PCHI2 double Posterior weighted chi^2 P-value between group1 and group2 samples | |
| 848 PV4 double[4] P-value for strand bias, baseQ bias, mapQ bias and tail distance bias | |
| 849 QCHI2 int Phred-scaled PCHI2 | |
| 850 RP int # permutations yielding a smaller PCHI2 | |
| 851 CLR int Phred log ratio of genotype likelihoods with and without the trio/pair constraint | |
| 852 UGT string Most probable genotype configuration without the trio constraint | |
| 853 CGT string Most probable configuration with the trio constraint | |
| 854 VDB float Tests variant positions within reads. Intended for filtering RNA-seq artifacts around splice sites | |
| 855 RPB float Mann-Whitney rank-sum test for tail distance bias | |
| 856 HWE float Hardy-Weinberg equilibrium test, Wigginton et al., PMID: 15789306 | |
| 857 .TE | |
| 858 | |
| 859 .SH EXAMPLES | |
| 860 .IP o 2 | |
| 861 Import SAM to BAM when | |
| 862 .B @SQ | |
| 863 lines are present in the header: | |
| 864 | |
| 865 samtools view -bS aln.sam > aln.bam | |
| 866 | |
| 867 If | |
| 868 .B @SQ | |
| 869 lines are absent: | |
| 870 | |
| 871 samtools faidx ref.fa | |
| 872 samtools view -bt ref.fa.fai aln.sam > aln.bam | |
| 873 | |
| 874 where | |
| 875 .I ref.fa.fai | |
| 876 is generated automatically by the | |
| 877 .B faidx | |
| 878 command. | |
| 879 | |
| 880 .IP o 2 | |
| 881 Attach the | |
| 882 .B RG | |
| 883 tag while merging sorted alignments: | |
| 884 | |
| 885 perl -e 'print "@RG\\tID:ga\\tSM:hs\\tLB:ga\\tPL:Illumina\\n@RG\\tID:454\\tSM:hs\\tLB:454\\tPL:454\\n"' > rg.txt | |
| 886 samtools merge -rh rg.txt merged.bam ga.bam 454.bam | |
| 887 | |
| 888 The value in a | |
| 889 .B RG | |
| 890 tag is determined by the file name the read is coming from. In this | |
| 891 example, in the | |
| 892 .IR merged.bam , | |
| 893 reads from | |
| 894 .I ga.bam | |
| 895 will be attached | |
| 896 .IR RG:Z:ga , | |
| 897 while reads from | |
| 898 .I 454.bam | |
| 899 will be attached | |
| 900 .IR RG:Z:454 . | |
| 901 | |
| 902 .IP o 2 | |
| 903 Call SNPs and short INDELs for one diploid individual: | |
| 904 | |
| 905 samtools mpileup -ugf ref.fa aln.bam | bcftools view -bvcg - > var.raw.bcf | |
| 906 bcftools view var.raw.bcf | vcfutils.pl varFilter -D 100 > var.flt.vcf | |
| 907 | |
| 908 The | |
| 909 .B -D | |
| 910 option of varFilter controls the maximum read depth, which should be | |
| 911 adjusted to about twice the average read depth. One may consider to add | |
| 912 .B -C50 | |
| 913 to | |
| 914 .B mpileup | |
| 915 if mapping quality is overestimated for reads containing excessive | |
| 916 mismatches. Applying this option usually helps | |
| 917 .B BWA-short | |
| 918 but may not other mappers. | |
| 919 | |
| 920 .IP o 2 | |
| 921 Generate the consensus sequence for one diploid individual: | |
| 922 | |
| 923 samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq | |
| 924 | |
| 925 .IP o 2 | |
| 926 Call somatic mutations from a pair of samples: | |
| 927 | |
| 928 samtools mpileup -DSuf ref.fa aln.bam | bcftools view -bvcgT pair - > var.bcf | |
| 929 | |
| 930 In the output INFO field, | |
| 931 .I CLR | |
| 932 gives the Phred-log ratio between the likelihood by treating the | |
| 933 two samples independently, and the likelihood by requiring the genotype to be identical. | |
| 934 This | |
| 935 .I CLR | |
| 936 is effectively a score measuring the confidence of somatic calls. The higher the better. | |
| 937 | |
| 938 .IP o 2 | |
| 939 Call de novo and somatic mutations from a family trio: | |
| 940 | |
| 941 samtools mpileup -DSuf ref.fa aln.bam | bcftools view -bvcgT pair -s samples.txt - > var.bcf | |
| 942 | |
| 943 File | |
| 944 .I samples.txt | |
| 945 should consist of three lines specifying the member and order of samples (in the order of child-father-mother). | |
| 946 Similarly, | |
| 947 .I CLR | |
| 948 gives the Phred-log likelihood ratio with and without the trio constraint. | |
| 949 .I UGT | |
| 950 shows the most likely genotype configuration without the trio constraint, and | |
| 951 .I CGT | |
| 952 gives the most likely genotype configuration satisfying the trio constraint. | |
| 953 | |
| 954 .IP o 2 | |
| 955 Phase one individual: | |
| 956 | |
| 957 samtools calmd -AEur aln.bam ref.fa | samtools phase -b prefix - > phase.out | |
| 958 | |
| 959 The | |
| 960 .B calmd | |
| 961 command is used to reduce false heterozygotes around INDELs. | |
| 962 | |
| 963 .IP o 2 | |
| 964 Call SNPs and short indels for multiple diploid individuals: | |
| 965 | |
| 966 samtools mpileup -P ILLUMINA -ugf ref.fa *.bam | bcftools view -bcvg - > var.raw.bcf | |
| 967 bcftools view var.raw.bcf | vcfutils.pl varFilter -D 2000 > var.flt.vcf | |
| 968 | |
| 969 Individuals are identified from the | |
| 970 .B SM | |
| 971 tags in the | |
| 972 .B @RG | |
| 973 header lines. Individuals can be pooled in one alignment file; one | |
| 974 individual can also be separated into multiple files. The | |
| 975 .B -P | |
| 976 option specifies that indel candidates should be collected only from | |
| 977 read groups with the | |
| 978 .B @RG-PL | |
| 979 tag set to | |
| 980 .IR ILLUMINA . | |
| 981 Collecting indel candidates from reads sequenced by an indel-prone | |
| 982 technology may affect the performance of indel calling. | |
| 983 | |
| 984 Note that there is a new calling model which can be invoked by | |
| 985 | |
| 986 bcftools view -m0.99 ... | |
| 987 | |
| 988 which fixes some severe limitations of the default method. | |
| 989 | |
| 990 For filtering, best results seem to be achieved by first applying the | |
| 991 .IR SnpGap | |
| 992 filter and then applying some machine learning approach | |
| 993 | |
| 994 vcf-annotate -f SnpGap=n | |
| 995 vcf filter ... | |
| 996 | |
| 997 Both can be found in the | |
| 998 .B vcftools | |
| 999 and | |
| 1000 .B htslib | |
| 1001 package (links below). | |
| 1002 | |
| 1003 .IP o 2 | |
| 1004 Derive the allele frequency spectrum (AFS) on a list of sites from multiple individuals: | |
| 1005 | |
| 1006 samtools mpileup -Igf ref.fa *.bam > all.bcf | |
| 1007 bcftools view -bl sites.list all.bcf > sites.bcf | |
| 1008 bcftools view -cGP cond2 sites.bcf > /dev/null 2> sites.1.afs | |
| 1009 bcftools view -cGP sites.1.afs sites.bcf > /dev/null 2> sites.2.afs | |
| 1010 bcftools view -cGP sites.2.afs sites.bcf > /dev/null 2> sites.3.afs | |
| 1011 ...... | |
| 1012 | |
| 1013 where | |
| 1014 .I sites.list | |
| 1015 contains the list of sites with each line consisting of the reference | |
| 1016 sequence name and position. The following | |
| 1017 .B bcftools | |
| 1018 commands estimate AFS by EM. | |
| 1019 | |
| 1020 .IP o 2 | |
| 1021 Dump BAQ applied alignment for other SNP callers: | |
| 1022 | |
| 1023 samtools calmd -bAr aln.bam > aln.baq.bam | |
| 1024 | |
| 1025 It adds and corrects the | |
| 1026 .B NM | |
| 1027 and | |
| 1028 .B MD | |
| 1029 tags at the same time. The | |
| 1030 .B calmd | |
| 1031 command also comes with the | |
| 1032 .B -C | |
| 1033 option, the same as the one in | |
| 1034 .B pileup | |
| 1035 and | |
| 1036 .BR mpileup . | |
| 1037 Apply if it helps. | |
| 1038 | |
| 1039 .SH LIMITATIONS | |
| 1040 .PP | |
| 1041 .IP o 2 | |
| 1042 Unaligned words used in bam_import.c, bam_endian.h, bam.c and bam_aux.c. | |
| 1043 .IP o 2 | |
| 1044 Samtools paired-end rmdup does not work for unpaired reads (e.g. orphan | |
| 1045 reads or ends mapped to different chromosomes). If this is a concern, | |
| 1046 please use Picard's MarkDuplicate which correctly handles these cases, | |
| 1047 although a little slower. | |
| 1048 | |
| 1049 .SH AUTHOR | |
| 1050 .PP | |
| 1051 Heng Li from the Sanger Institute wrote the C version of samtools. Bob | |
| 1052 Handsaker from the Broad Institute implemented the BGZF library and Jue | |
| 1053 Ruan from Beijing Genomics Institute wrote the RAZF library. John | |
| 1054 Marshall and Petr Danecek contribute to the source code and various | |
| 1055 people from the 1000 Genomes Project have contributed to the SAM format | |
| 1056 specification. | |
| 1057 | |
| 1058 .SH SEE ALSO | |
| 1059 .PP | |
| 1060 Samtools website: <http://samtools.sourceforge.net> | |
| 1061 .br | |
| 1062 Samtools latest source: <https://github.com/samtools/samtools> | |
| 1063 .br | |
| 1064 VCFtools website with stable link to VCF specification: <http://vcftools.sourceforge.net> | |
| 1065 .br | |
| 1066 HTSlib website: <https://github.com/samtools/htslib> |
