comparison SNV/SNVMix2_source/SNVMix2-v0.12.1-rc1/samtools-0.1.6/samtools.1 @ 0:74f5ea818cea
Uploaded
author | ryanmorin |
---|---|
date | Wed, 12 Oct 2011 19:50:38 -0400 |
parents | |
children |
-1:000000000000 | 0:74f5ea818cea |
---|---|
1 .TH samtools 1 "2 September 2009" "samtools-0.1.6" "Bioinformatics tools" | |
2 .SH NAME | |
3 .PP | |
4 samtools - Utilities for the Sequence Alignment/Map (SAM) format | |
5 .SH SYNOPSIS | |
6 .PP | |
7 samtools view -bt ref_list.txt -o aln.bam aln.sam.gz | |
8 .PP | |
9 samtools sort aln.bam aln.sorted | |
10 .PP | |
11 samtools index aln.sorted.bam | |
12 .PP | |
13 samtools view aln.sorted.bam chr2:20,100,000-20,200,000 | |
14 .PP | |
15 samtools merge out.bam in1.bam in2.bam in3.bam | |
16 .PP | |
17 samtools faidx ref.fasta | |
18 .PP | |
19 samtools pileup -f ref.fasta aln.sorted.bam | |
20 .PP | |
21 samtools tview aln.sorted.bam ref.fasta | |
22 | |
23 .SH DESCRIPTION | |
24 .PP | |
25 Samtools is a set of utilities that manipulate alignments in the BAM | |
26 format. It imports from and exports to the SAM (Sequence Alignment/Map) | |
27 format, does sorting, merging and indexing, and allows reads in any | |
28 region to be retrieved swiftly. | |
29 | |
30 Samtools is designed to work on a stream. It regards an input file `-' | |
31 as the standard input (stdin) and an output file `-' as the standard | |
32 output (stdout). Several commands can thus be combined with Unix | |
33 pipes. Samtools always outputs warning and error messages to the standard | |
34 error output (stderr). | |
35 | |
36 Samtools is also able to open a BAM (not SAM) file on a remote FTP or | |
37 HTTP server if the BAM file name starts with `ftp://' or `http://'. | |
38 Samtools checks the current working directory for the index file and | |
39 will download the index if it is absent. Samtools does not retrieve the | |
40 entire alignment file unless it is asked to do so. | |
41 | |
42 .SH COMMANDS AND OPTIONS | |
43 | |
44 .TP 10 | |
45 .B import | |
46 samtools import <in.ref_list> <in.sam> <out.bam> | |
47 | |
48 Since 0.1.4, this command is an alias of: | |
49 | |
50 samtools view -bt <in.ref_list> -o <out.bam> <in.sam> | |
51 | |
52 .TP | |
53 .B sort | |
54 samtools sort [-n] [-m maxMem] <in.bam> <out.prefix> | |
55 | |
56 Sort alignments by leftmost coordinates. File | |
57 .I <out.prefix>.bam | |
58 will be created. This command may also create temporary files | |
59 .I <out.prefix>.%d.bam | |
60 when the whole alignment cannot fit into memory (controlled by | |
61 option -m). | |
62 | |
63 .B OPTIONS: | |
64 .RS | |
65 .TP 8 | |
66 .B -n | |
67 Sort by read names rather than by chromosomal coordinates | |
68 .TP | |
69 .B -m INT | |
70 Approximate maximum memory required. [500000000] | |
71 .RE | |
72 | |
73 .TP | |
74 .B merge | |
75 samtools merge [-h inh.sam] [-n] <out.bam> <in1.bam> <in2.bam> [...] | |
76 | |
77 Merge multiple sorted alignments. | |
78 The header reference lists of all the input BAM files, and the @SQ headers of | |
79 .IR inh.sam , | |
80 if any, must all refer to the same set of reference sequences. | |
81 The header reference list and (unless overridden by | |
82 .BR -h ) | |
83 `@' headers of | |
84 .I in1.bam | |
85 will be copied to | |
86 .IR out.bam , | |
87 and the headers of other files will be ignored. | |
88 | |
89 .B OPTIONS: | |
90 .RS | |
91 .TP 8 | |
92 .B -h FILE | |
93 Use the lines of | |
94 .I FILE | |
95 as `@' headers to be copied to | |
96 .IR out.bam , | |
97 replacing any header lines that would otherwise be copied from | |
98 .IR in1.bam . | |
99 .RI ( FILE | |
100 is actually in SAM format, though any alignment records it may contain | |
101 are ignored.) | |
102 .TP | |
103 .B -n | |
104 The input alignments are sorted by read names rather than by chromosomal | |
105 coordinates | |
106 .RE | |
107 | |
108 .TP | |
109 .B index | |
110 samtools index <aln.bam> | |
111 | |
112 Index sorted alignment for fast random access. Index file | |
113 .I <aln.bam>.bai | |
114 will be created. | |
115 | |
116 .TP | |
117 .B view | |
118 samtools view [-bhuHS] [-t in.refList] [-o output] [-f reqFlag] [-F | |
119 skipFlag] [-q minMapQ] [-l library] [-r readGroup] <in.bam>|<in.sam> [region1 [...]] | |
120 | |
121 Extract/print all alignments, or a subset of them, in SAM or BAM format. | |
122 If no region is specified, all the alignments will be printed; otherwise only | |
123 alignments overlapping the specified regions will be output. An | |
124 alignment may be output multiple times if it overlaps several | |
125 regions. A region can be given, for example, in one of the following | |
126 formats: `chr2', `chr2:1000000' or `chr2:1,000,000-2,000,000'. | |
127 Coordinates are 1-based. | |
128 | |
129 .B OPTIONS: | |
130 .RS | |
131 .TP 8 | |
132 .B -b | |
133 Output in the BAM format. | |
134 .TP | |
135 .B -u | |
136 Output uncompressed BAM. This option saves time spent on | |
137 compression/decompression and is thus preferred when the output is piped | |
138 to another samtools command. | |
139 .TP | |
140 .B -h | |
141 Include the header in the output. | |
142 .TP | |
143 .B -H | |
144 Output the header only. | |
145 .TP | |
146 .B -S | |
147 Input is in SAM. If @SQ header lines are absent, the | |
148 .B `-t' | |
149 option is required. | |
150 .TP | |
151 .B -t FILE | |
152 This file is TAB-delimited. Each line must contain the reference name | |
153 and the length of the reference, one line for each distinct reference; | |
154 additional fields are ignored. This file also defines the order of the | |
155 reference sequences in sorting. If you run `samtools faidx <ref.fa>', | |
156 the resultant index file | |
157 .I <ref.fa>.fai | |
158 can be used as this | |
159 .I <in.ref_list> | |
160 file. | |
161 .TP | |
162 .B -o FILE | |
163 Output file [stdout] | |
164 .TP | |
165 .B -f INT | |
166 Only output alignments with all bits in INT present in the FLAG | |
167 field. INT can be in hex in the format of /^0x[0-9A-F]+/ [0] | |
168 .TP | |
169 .B -F INT | |
170 Skip alignments with bits present in INT [0] | |
171 .TP | |
172 .B -q INT | |
173 Skip alignments with MAPQ smaller than INT [0] | |
174 .TP | |
175 .B -l STR | |
176 Only output reads in library STR [null] | |
177 .TP | |
178 .B -r STR | |
179 Only output reads in read group STR [null] | |
180 .RE | |
181 | |
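
The region syntax accepted by the view command can be illustrated with a small parser. This is only a sketch in Python, not samtools' own code: the function name `parse_region` and the returned tuple layout are hypothetical, and a missing start or end is reported as `None` because the text above does not say how an omitted end coordinate is interpreted.

```python
import re

# name, optionally followed by :start, optionally followed by -end.
# Commas are allowed inside the numbers, as in `chr2:1,000,000-2,000,000'.
_REGION = re.compile(
    r"^(?P<name>[^:]+)(?::(?P<start>[\d,]+)(?:-(?P<end>[\d,]+))?)?$"
)


def _to_int(text):
    """Convert '1,000,000' to 1000000; pass None through unchanged."""
    return int(text.replace(",", "")) if text else None


def parse_region(region):
    """Split a region string into (name, start, end).

    Coordinates are 1-based, as stated in the manual. A component that is
    not given comes back as None.
    """
    match = _REGION.match(region)
    if match is None:
        raise ValueError("unrecognised region: %r" % region)
    return (
        match.group("name"),
        _to_int(match.group("start")),
        _to_int(match.group("end")),
    )
```

For example, `parse_region("chr2:1,000,000-2,000,000")` yields the name `chr2` with integer start and end coordinates, while `parse_region("chr2")` yields the name alone.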
182 .TP | |
183 .B faidx | |
184 samtools faidx <ref.fasta> [region1 [...]] | |
185 | |
186 Index reference sequence in the FASTA format or extract subsequence from | |
187 indexed reference sequence. If no region is specified, | |
188 .B faidx | |
189 will index the file and create | |
190 .I <ref.fasta>.fai | |
191 on the disk. If regions are specified, the subsequences will be | |
192 retrieved and printed to stdout in the FASTA format. The input file can | |
193 be compressed in the | |
194 .B RAZF | |
195 format. | |
196 | |
197 .TP | |
198 .B pileup | |
199 samtools pileup [-f in.ref.fasta] [-t in.ref_list] [-l in.site_list] | |
200 [-iscgS2] [-T theta] [-N nHap] [-r pairDiffRate] <in.bam>|<in.sam> | |
201 | |
202 Print the alignment in the pileup format. In the pileup format, each | |
203 line represents a genomic position, consisting of chromosome name, | |
204 coordinate, reference base, read bases, read qualities and alignment | |
205 mapping qualities. Information on match, mismatch, indel, strand, | |
206 mapping quality and start and end of a read are all encoded at the read | |
207 base column. At this column, a dot stands for a match to the reference | |
208 base on the forward strand, a comma for a match on the reverse strand, | |
209 `ACGTN' for a mismatch on the forward strand and `acgtn' for a mismatch | |
210 on the reverse strand. A pattern `\\+[0-9]+[ACGTNacgtn]+' indicates | |
211 there is an insertion between this reference position and the next | |
212 reference position. The length of the insertion is given by the integer | |
213 in the pattern, followed by the inserted sequence. Similarly, a pattern | |
214 `-[0-9]+[ACGTNacgtn]+' represents a deletion from the reference. The | |
215 deleted bases will be presented as `*' in the following lines. Also at | |
216 the read base column, a symbol `^' marks the start of a read segment | |
217 which is a contiguous subsequence on the read separated by `N/S/H' CIGAR | |
218 operations. The ASCII of the character following `^' minus 33 gives the | |
219 mapping quality. A symbol `$' marks the end of a read segment. | |
220 | |
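
The encoding of the read base column described above can be made concrete with a small tokeniser. The following Python sketch is an illustration written from that description alone, not code taken from samtools; the function name `parse_read_bases` and the event labels are hypothetical. It uses the integer in an indel pattern to decide how many bases belong to the indel, so that a mismatch directly following an indel is not swallowed.

```python
def parse_read_bases(bases):
    """Tokenise the read-base column of one pileup line.

    Returns a list of (kind, value) tuples, where kind is one of
    'start', 'end', 'match', 'mismatch', 'insertion', 'deletion', 'deleted'.
    """
    events = []
    i = 0
    n = len(bases)
    while i < n:
        c = bases[i]
        if c == "^":
            # The character after `^' encodes the mapping quality as ASCII - 33.
            events.append(("start", ord(bases[i + 1]) - 33))
            i += 2
        elif c == "$":
            events.append(("end", None))
            i += 1
        elif c in "+-":
            # Read the length digits, then exactly that many sequence characters.
            j = i + 1
            while j < n and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j])
            kind = "insertion" if c == "+" else "deletion"
            events.append((kind, bases[j:j + length]))
            i = j + length
        elif c == ".":
            events.append(("match", "+"))      # match on the forward strand
            i += 1
        elif c == ",":
            events.append(("match", "-"))      # match on the reverse strand
            i += 1
        elif c == "*":
            events.append(("deleted", None))   # base deleted on an earlier line
            i += 1
        elif c in "ACGTNacgtn":
            # Upper case is a forward-strand mismatch, lower case reverse-strand.
            events.append(("mismatch", c))
            i += 1
        else:
            raise ValueError("unexpected character %r at offset %d" % (c, i))
    return events
```

Given the column `^F.,$A+2AGc-1t*`, the sketch reports a read start with mapping quality 37, a forward and a reverse match, a read end, a forward mismatch `A`, an insertion of `AG`, a reverse mismatch `c`, a deletion of `t` and a deleted-base placeholder.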
221 If option | |
222 .B -c | |
223 is applied, the consensus base, consensus quality, SNP quality and RMS | |
224 mapping quality of the reads covering the site will be inserted between | |
225 the `reference base' and the `read bases' columns. An indel occupies an | |
226 additional line. Each indel line consists of chromosome name, | |
227 coordinate, a star, the genotype, consensus quality, SNP quality, RMS | |
228 mapping quality, # covering reads, the first allele, the second allele, | |
229 # reads supporting the first allele, # reads supporting the second | |
230 allele and # reads containing indels different from the top two alleles. | |
231 | |
232 .B OPTIONS: | |
233 .RS | |
234 | |
235 .TP 10 | |
236 .B -s | |
237 Print the mapping quality as the last column. This option makes the | |
238 output easier to parse, although this format is not space efficient. | |
239 | |
240 .TP | |
241 .B -S | |
242 The input file is in SAM. | |
243 | |
244 .TP | |
245 .B -i | |
246 Only output pileup lines containing indels. | |
247 | |
248 .TP | |
249 .B -f FILE | |
250 The reference sequence in the FASTA format. Index file | |
251 .I FILE.fai | |
252 will be created if | |
253 absent. | |
254 | |
255 .TP | |
256 .B -M INT | |
257 Cap mapping quality at INT [60] | |
258 | |
259 .TP | |
260 .B -t FILE | |
261 List of reference names and sequence lengths, in the format described | |
262 for the | |
263 .B import | |
264 command. If this option is present, samtools assumes the input | |
265 .I <in.alignment> | |
266 is in SAM format; otherwise it assumes BAM format. | |
267 | |
268 .TP | |
269 .B -l FILE | |
270 List of sites at which pileup is output. This file is space | |
271 delimited. The first two columns are required to be chromosome and | |
272 1-based coordinate. Additional columns are ignored. It is | |
273 recommended to use option | |
274 .B -s | |
275 together with | |
276 .B -l | |
277 as in the default format we may not know the mapping quality. | |
278 | |
279 .TP | |
280 .B -c | |
281 Call the consensus sequence using MAQ consensus model. Options | |
282 .B -T, | |
283 .B -N, | |
284 .B -I | |
285 and | |
286 .B -r | |
287 are only effective when | |
288 .B -c | |
289 or | |
290 .B -g | |
291 is in use. | |
292 | |
293 .TP | |
294 .B -g | |
295 Generate genotype likelihood in the binary GLFv3 format. This option | |
296 suppresses -c, -i and -s. | |
297 | |
298 .TP | |
299 .B -T FLOAT | |
300 The theta parameter (error dependency coefficient) in the MAQ consensus | |
301 calling model [0.85] | |
302 | |
303 .TP | |
304 .B -N INT | |
305 Number of haplotypes in the sample (>=2) [2] | |
306 | |
307 .TP | |
308 .B -r FLOAT | |
309 Expected fraction of differences between a pair of haplotypes [0.001] | |
310 | |
311 .TP | |
312 .B -I INT | |
313 Phred probability of an indel in sequencing/prep. [40] | |
314 | |
315 .RE | |
316 | |
317 .TP | |
318 .B tview | |
319 samtools tview <in.sorted.bam> [ref.fasta] | |
320 | |
321 Text alignment viewer (based on the ncurses library). In the viewer, | |
322 press `?' for help and press `g' to view the alignment starting from a | |
323 region given in a format like `chr10:10,000,000'. | |
324 | |
325 .RE | |
326 | |
327 .TP | |
328 .B fixmate | |
329 samtools fixmate <in.nameSrt.bam> <out.bam> | |
330 | |
331 Fill in mate coordinates, ISIZE and mate related flags from a | |
332 name-sorted alignment. | |
333 | |
334 .TP | |
335 .B rmdup | |
336 samtools rmdup <input.srt.bam> <out.bam> | |
337 | |
338 Remove potential PCR duplicates: if multiple read pairs have identical | |
339 external coordinates, only retain the pair with highest mapping quality. | |
340 This command | |
341 .B ONLY | |
342 works with FR orientation and requires that ISIZE be correctly set. | |
343 | |
344 .RE | |
345 | |
346 .TP | |
347 .B rmdupse | |
348 samtools rmdupse <input.srt.bam> <out.bam> | |
349 | |
350 Remove potential duplicates for single-ended reads. This command will | |
351 treat all reads as single-ended even if they are in fact paired. | |
352 | |
353 .RE | |
354 | |
355 .TP | |
356 .B fillmd | |
357 samtools fillmd [-e] <aln.bam> <ref.fasta> | |
358 | |
359 Generate the MD tag. If the MD tag is already present, this command will | |
360 give a warning if the MD tag generated is different from the existing | |
361 tag. | |
362 | |
363 .B OPTIONS: | |
364 .RS | |
365 .TP 8 | |
366 .B -e | |
367 Convert the read base to = if it is identical to the aligned reference | |
368 base. Indel caller does not support the = bases at the moment. | |
369 | |
370 .RE | |
371 | |
372 .SH SAM FORMAT | |
373 | |
374 SAM is TAB-delimited. Apart from the header lines, which are started | |
375 with the `@' symbol, each alignment line consists of: | |
376 | |
377 .TS | |
378 center box; | |
379 cb | cb | cb | |
380 n | l | l . | |
381 Col Field Description | |
382 _ | |
383 1 QNAME Query (pair) NAME | |
384 2 FLAG bitwise FLAG | |
385 3 RNAME Reference sequence NAME | |
386 4 POS 1-based leftmost POSition/coordinate of clipped sequence | |
387 5 MAPQ MAPping Quality (Phred-scaled) | |
388 6 CIGAR extended CIGAR string | |
389 7 MRNM Mate Reference sequence NaMe (`=' if same as RNAME) | |
390 8 MPOS 1-based Mate POSition | |
391 9 ISIZE Inferred insert SIZE | |
392 10 SEQ query SEQuence on the same strand as the reference | |
393 11 QUAL query QUALity (ASCII-33 gives the Phred base quality) | |
394 12 OPT variable OPTional fields in the format TAG:VTYPE:VALUE | |
395 .TE | |
396 | |
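
The column layout in the table above can be sketched as a small reader for one alignment line. This Python fragment is an illustration only: the names `Alignment`, `parse_sam_line` and `phred_qualities` are hypothetical, header lines starting with `@' are assumed to have been skipped by the caller, and FLAG is assumed to be written in decimal.

```python
from collections import namedtuple

# One field per mandatory column, plus a list holding any optional fields.
Alignment = namedtuple(
    "Alignment",
    "qname flag rname pos mapq cigar mrnm mpos isize seq qual opt",
)


def parse_sam_line(line):
    """Split one TAB-delimited alignment line into its 11 columns plus OPT."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 columns, got %d" % len(fields))
    qname, flag, rname, pos, mapq, cigar, mrnm, mpos, isize, seq, qual = fields[:11]
    return Alignment(
        qname, int(flag), rname, int(pos), int(mapq), cigar,
        mrnm, int(mpos), int(isize), seq, qual, fields[11:],
    )


def phred_qualities(qual):
    """Decode QUAL: the ASCII code of each character minus 33."""
    return [ord(ch) - 33 for ch in qual]
```

Calling `phred_qualities` on the `qual` field of a parsed line gives one Phred base quality per base in `seq`.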
397 .PP | |
398 Each bit in the FLAG field is defined as: | |
399 | |
400 .TS | |
401 center box; | |
402 cb | cb | |
403 l | l . | |
404 Flag Description | |
405 _ | |
406 0x0001 the read is paired in sequencing | |
407 0x0002 the read is mapped in a proper pair | |
408 0x0004 the query sequence itself is unmapped | |
409 0x0008 the mate is unmapped | |
410 0x0010 strand of the query (1 for reverse) | |
411 0x0020 strand of the mate | |
412 0x0040 the read is the first read in a pair | |
413 0x0080 the read is the second read in a pair | |
414 0x0100 the alignment is not primary | |
415 0x0200 the read fails platform/vendor quality checks | |
416 0x0400 the read is either a PCR or an optical duplicate | |
417 .TE | |
418 | |
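
The FLAG bits in the table above, and the way the view options -f and -F select on them, can be sketched in a few lines of Python. The short labels and the function names `decode_flag` and `passes_filter` are hypothetical. Treating -F as "skip if any of the given bits is set" is an interpretation of the option text, labelled here as an assumption.

```python
# (bit, label) pairs in the order of the FLAG table; labels are illustrative.
FLAG_BITS = [
    (0x0001, "paired"),
    (0x0002, "proper_pair"),
    (0x0004, "unmapped"),
    (0x0008, "mate_unmapped"),
    (0x0010, "reverse"),
    (0x0020, "mate_reverse"),
    (0x0040, "first_in_pair"),
    (0x0080, "second_in_pair"),
    (0x0100, "not_primary"),
    (0x0200, "qc_fail"),
    (0x0400, "duplicate"),
]


def decode_flag(flag):
    """Return the labels of all bits that are set in FLAG."""
    return [label for bit, label in FLAG_BITS if flag & bit]


def passes_filter(flag, required=0, skip=0):
    """Mimic the -f/-F selection.

    required: every bit must be set in FLAG (as described for -f).
    skip:     no bit may be set in FLAG (assumed meaning of -F).
    """
    return (flag & required) == required and (flag & skip) == 0
```

A FLAG of 99 (0x63), for instance, decodes to a paired read mapped in a proper pair, with the mate on the reverse strand, that is the first read in its pair.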
419 .SH LIMITATIONS | |
420 .PP | |
421 .IP o 2 | |
422 Unaligned words used in bam_import.c, bam_endian.h, bam.c and bam_aux.c. | |
423 .IP o 2 | |
424 CIGAR operation P is not properly handled at the moment. | |
425 .IP o 2 | |
426 In merging, the input files are required to have the same number of | |
427 reference sequences. The requirement can be relaxed. In addition, | |
428 merging does not reconstruct the header dictionaries | |
429 automatically. End users have to provide the correct header. Picard is | |
430 better at merging. | |
431 .IP o 2 | |
432 Samtools' rmdup does not work for single-end data and does not remove | |
433 duplicates across chromosomes. Picard is better. | |
434 | |
435 .SH AUTHOR | |
436 .PP | |
437 Heng Li from the Sanger Institute wrote the C version of samtools. Bob | |
438 Handsaker from the Broad Institute implemented the BGZF library and Jue | |
439 Ruan from Beijing Genomics Institute wrote the RAZF library. Various | |
440 people in the 1000Genomes Project contributed to the SAM format | |
441 specification. | |
442 | |
443 .SH SEE ALSO | |
444 .PP | |
445 Samtools website: <http://samtools.sourceforge.net> |