annotate SNV/SNVMix2_source/SNVMix2-v0.12.1-rc1/samtools-0.1.6/samtools.txt @ 7:351b3acadd17 default tip

Uploaded
author ryanmorin
date Tue, 18 Oct 2011 18:33:15 -0400
parents 74f5ea818cea
children
samtools(1)                  Bioinformatics tools                  samtools(1)

NAME
       samtools - Utilities for the Sequence Alignment/Map (SAM) format

SYNOPSIS
       samtools view -bt ref_list.txt -o aln.bam aln.sam.gz

       samtools sort aln.bam aln.sorted

       samtools index aln.sorted.bam

       samtools view aln.sorted.bam chr2:20,100,000-20,200,000

       samtools merge out.bam in1.bam in2.bam in3.bam

       samtools faidx ref.fasta

       samtools pileup -f ref.fasta aln.sorted.bam

       samtools tview aln.sorted.bam ref.fasta

DESCRIPTION
       Samtools is a set of utilities that manipulate alignments in the BAM
       format. It imports from and exports to the SAM (Sequence Alignment/Map)
       format, sorts, merges and indexes alignments, and allows reads in any
       region to be retrieved swiftly.

       Samtools is designed to work on a stream. It regards an input file `-'
       as the standard input (stdin) and an output file `-' as the standard
       output (stdout). Several commands can thus be combined with Unix pipes.
       Samtools always writes warning and error messages to the standard error
       output (stderr).
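
The stream behaviour described above can be driven from a script. The following is a minimal sketch, assuming a samtools binary of this vintage is on the PATH; the function names and the file names passed to them are hypothetical, and only options documented on this page (`-u`, `-f`, and `-' for stdin) are used.

```python
import subprocess


def build_pipeline(bam, region, ref_fasta):
    """Return two argv lists: `view -u` writing to stdout, `pileup` reading `-`."""
    view_cmd = ["samtools", "view", "-u", bam, region]
    pileup_cmd = ["samtools", "pileup", "-f", ref_fasta, "-"]
    return view_cmd, pileup_cmd


def run_pipeline(view_cmd, pileup_cmd):
    """Connect the two commands with a pipe and return pileup's stdout as text."""
    view = subprocess.Popen(view_cmd, stdout=subprocess.PIPE)
    pileup = subprocess.Popen(pileup_cmd, stdin=view.stdout,
                              stdout=subprocess.PIPE)
    view.stdout.close()  # lets `view` see a broken pipe if `pileup` exits early
    out, _ = pileup.communicate()
    view.wait()
    return out.decode()
```

Warnings and errors from both commands still go to stderr, so they do not mix with the pileup text returned by `run_pipeline`.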

       Samtools is also able to open a BAM (not SAM) file on a remote FTP or
       HTTP server if the BAM file name starts with `ftp://' or `http://'.
       Samtools checks the current working directory for the index file and
       downloads the index if it is absent. Samtools does not retrieve the
       entire alignment file unless it is asked to do so.

COMMANDS AND OPTIONS
       import    samtools import <in.ref_list> <in.sam> <out.bam>

                 Since 0.1.4, this command is an alias of:

                 samtools view -bt <in.ref_list> -o <out.bam> <in.sam>


       sort      samtools sort [-n] [-m maxMem] <in.bam> <out.prefix>

                 Sort alignments by leftmost coordinates. File
                 <out.prefix>.bam will be created. This command may also
                 create temporary files <out.prefix>.%d.bam when the whole
                 alignment does not fit into memory (controlled by option
                 -m).

                 OPTIONS:

                 -n      Sort by read names rather than by chromosomal
                         coordinates

                 -m INT  Approximately the maximum required memory.
                         [500000000]


       merge     samtools merge [-h inh.sam] [-n] <out.bam> <in1.bam>
                 <in2.bam> [...]

                 Merge multiple sorted alignments. The header reference lists
                 of all the input BAM files, and the @SQ headers of inh.sam,
                 if any, must all refer to the same set of reference
                 sequences. The header reference list and (unless overridden
                 by -h) `@' headers of in1.bam will be copied to out.bam, and
                 the headers of other files will be ignored.

                 OPTIONS:

                 -h FILE Use the lines of FILE as `@' headers to be copied to
                         out.bam, replacing any header lines that would
                         otherwise be copied from in1.bam. (FILE is actually
                         in SAM format, though any alignment records it may
                         contain are ignored.)

                 -n      The input alignments are sorted by read names rather
                         than by chromosomal coordinates


       index     samtools index <aln.bam>

                 Index sorted alignment for fast random access. Index file
                 <aln.bam>.bai will be created.


       view      samtools view [-bhuHS] [-t in.refList] [-o output] [-f
                 reqFlag] [-F skipFlag] [-q minMapQ] [-l library] [-r
                 readGroup] <in.bam>|<in.sam> [region1 [...]]

                 Extract/print all or a subset of alignments in SAM or BAM
                 format. If no region is specified, all the alignments will
                 be printed; otherwise only alignments overlapping the
                 specified regions will be output. An alignment may be output
                 multiple times if it overlaps several regions. A region can
                 be given, for example, in one of the following formats:
                 `chr2', `chr2:1000000' or `chr2:1,000,000-2,000,000'.
                 Coordinates are 1-based.
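
The three region forms listed above can be split into a name and 1-based coordinates with a few lines of code. This is a minimal sketch, not the parser samtools itself uses; `parse_region` is a hypothetical helper, and it assumes that reference names contain no `:' character. The page does not say how a start without an end is interpreted, so the end is simply reported as missing.

```python
def parse_region(region):
    """Split a region string into (name, start, end).

    start and end are 1-based integers; either is None when not given.
    """
    name, sep, coords = region.partition(":")
    if not sep:
        return name, None, None
    coords = coords.replace(",", "")  # thousands separators are optional
    start, dash, end = coords.partition("-")
    return name, int(start), (int(end) if dash else None)
```

For example, `parse_region("chr2:1,000,000-2,000,000")` yields the name `chr2` with start 1000000 and end 2000000.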

                 OPTIONS:

                 -b      Output in the BAM format.

                 -u      Output uncompressed BAM. This option saves time spent
                         on compression/decompression and is thus preferred
                         when the output is piped to another samtools command.

                 -h      Include the header in the output.

                 -H      Output the header only.

                 -S      Input is in SAM. If @SQ header lines are absent, the
                         `-t' option is required.

                 -t FILE This file is TAB-delimited. Each line must contain
                         the reference name and the length of the reference,
                         one line for each distinct reference; additional
                         fields are ignored. This file also defines the order
                         of the reference sequences in sorting. If you run
                         `samtools faidx <ref.fa>', the resultant index file
                         <ref.fa>.fai can be used as this <in.ref_list> file.

                 -o FILE Output file [stdout]

                 -f INT  Only output alignments with all bits in INT present
                         in the FLAG field. INT can be in hex in the format of
                         /^0x[0-9A-F]+/ [0]

                 -F INT  Skip alignments with bits present in INT [0]

                 -q INT  Skip alignments with MAPQ smaller than INT [0]

                 -l STR  Only output reads in library STR [null]

                 -r STR  Only output reads in read group STR [null]
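
The -f, -F and -q filters described above amount to a few bitwise tests. The sketch below is an illustration rather than samtools' own code; the function names are hypothetical, and reading -F as "skip if any listed bit is set" is an interpretation of the wording above.

```python
def parse_flag_arg(text):
    """Parse an INT argument given in decimal or as 0x-prefixed hex."""
    return int(text, 16) if text.lower().startswith("0x") else int(text)


def passes_filters(flag, mapq, req_flag=0, skip_flag=0, min_mapq=0):
    """Return True if an alignment survives the -f, -F and -q filters."""
    if (flag & req_flag) != req_flag:  # -f: every required bit must be set
        return False
    if flag & skip_flag:               # -F: any listed bit causes a skip
        return False
    return mapq >= min_mapq            # -q: MAPQ smaller than INT is skipped
```

With the defaults of 0 for all three options, every alignment passes, which matches the `[0]` defaults listed above.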


       faidx     samtools faidx <ref.fasta> [region1 [...]]

                 Index a reference sequence in the FASTA format or extract
                 subsequences from an indexed reference sequence. If no region
                 is specified, faidx will index the file and create
                 <ref.fasta>.fai on the disk. If regions are specified, the
                 subsequences will be retrieved and printed to stdout in the
                 FASTA format. The input file can be compressed in the RAZF
                 format.


       pileup    samtools pileup [-f in.ref.fasta] [-t in.ref_list] [-l
                 in.site_list] [-iscgS2] [-T theta] [-N nHap] [-r
                 pairDiffRate] <in.bam>|<in.sam>

                 Print the alignment in the pileup format. In the pileup
                 format, each line represents a genomic position, consisting
                 of chromosome name, coordinate, reference base, read bases,
                 read qualities and alignment mapping qualities. Information
                 on match, mismatch, indel, strand, mapping quality and start
                 and end of a read are all encoded in the read base column.
                 In this column, a dot stands for a match to the reference
                 base on the forward strand, a comma for a match on the
                 reverse strand, `ACGTN' for a mismatch on the forward strand
                 and `acgtn' for a mismatch on the reverse strand. A pattern
                 `\+[0-9]+[ACGTNacgtn]+' indicates there is an insertion
                 between this reference position and the next reference
                 position. The length of the insertion is given by the
                 integer in the pattern, followed by the inserted sequence.
                 Similarly, a pattern `-[0-9]+[ACGTNacgtn]+' represents a
                 deletion from the reference. The deleted bases will be
                 presented as `*' in the following lines. Also in the read
                 base column, a symbol `^' marks the start of a read segment,
                 which is a contiguous subsequence on the read separated by
                 `N/S/H' CIGAR operations. The ASCII code of the character
                 following `^' minus 33 gives the mapping quality. A symbol
                 `$' marks the end of a read segment.
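
The encoding of the read base column described above can be decoded with a small tokenizer. This is a sketch under stated assumptions, not code taken from samtools: `parse_read_bases` and the event names it returns are hypothetical, and the input is assumed to be the read base column of a single pileup line.

```python
def parse_read_bases(column):
    """Tokenize the read base column of one pileup line into (kind, value) events."""
    events = []
    i = 0
    while i < len(column):
        ch = column[i]
        if ch == "^":
            # The character after '^' encodes the mapping quality (ASCII - 33).
            events.append(("start", ord(column[i + 1]) - 33))
            i += 2
        elif ch == "$":
            events.append(("end", None))
            i += 1
        elif ch in "+-":
            j = i + 1
            while j < len(column) and column[j].isdigit():
                j += 1
            length = int(column[i + 1:j])
            kind = "insertion" if ch == "+" else "deletion"
            events.append((kind, column[j:j + length]))
            i = j + length
        elif ch == ".":
            events.append(("match_fwd", None))
            i += 1
        elif ch == ",":
            events.append(("match_rev", None))
            i += 1
        elif ch == "*":
            events.append(("deleted_base", None))
            i += 1
        elif ch in "ACGTN":
            events.append(("mismatch_fwd", ch))
            i += 1
        elif ch in "acgtn":
            events.append(("mismatch_rev", ch))
            i += 1
        else:
            raise ValueError("unexpected character %r at offset %d" % (ch, i))
    return events
```

The indel length is read from the integer instead of matching letters greedily, so an insertion such as `+2AC` followed directly by a mismatch base `t` is split correctly.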

                 If option -c is applied, the consensus base, consensus
                 quality, SNP quality and RMS mapping quality of the reads
                 covering the site will be inserted between the `reference
                 base' and the `read bases' columns. An indel occupies an
                 additional line. Each indel line consists of chromosome
                 name, coordinate, a star, the genotype, consensus quality,
                 SNP quality, RMS mapping quality, # covering reads, the
                 first allele, the second allele, # reads supporting the
                 first allele, # reads supporting the second allele and #
                 reads containing indels different from the top two alleles.

                 OPTIONS:

                 -s        Print the mapping quality as the last column. This
                           option makes the output easier to parse, although
                           this format is not space efficient.

                 -S        The input file is in SAM.

                 -i        Only output pileup lines containing indels.

                 -f FILE   The reference sequence in the FASTA format. Index
                           file FILE.fai will be created if absent.

                 -M INT    Cap mapping quality at INT [60]

                 -t FILE   List of reference names and sequence lengths, in
                           the format described for the import command. If
                           this option is present, samtools assumes the input
                           <in.alignment> is in SAM format; otherwise it
                           assumes BAM format.

                 -l FILE   List of sites at which pileup is output. This file
                           is space delimited. The first two columns are
                           required to be chromosome and 1-based coordinate.
                           Additional columns are ignored. It is recommended
                           to use option -s together with -l as in the default
                           format we may not know the mapping quality.

                 -c        Call the consensus sequence using MAQ consensus
                           model. Options -T, -N, -I and -r are only effective
                           when -c or -g is in use.

                 -g        Generate genotype likelihood in the binary GLFv3
                           format. This option suppresses -c, -i and -s.

                 -T FLOAT  The theta parameter (error dependency coefficient)
                           in the maq consensus calling model [0.85]

                 -N INT    Number of haplotypes in the sample (>=2) [2]

                 -r FLOAT  Expected fraction of differences between a pair of
                           haplotypes [0.001]

                 -I INT    Phred probability of an indel in sequencing/prep.
                           [40]
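
Several of the options above are Phred-scaled. Under the usual Phred convention, Q = -10 log10(P), so the default -I value of 40 corresponds to a probability of 1 in 10,000. The helpers below are a small sketch of that arithmetic; their names are hypothetical, and `cap_mapq` only illustrates what a cap such as -M means.

```python
import math


def phred_to_prob(q):
    """Convert a Phred-scaled value to a probability."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Convert a probability to a Phred-scaled value."""
    return -10 * math.log10(p)


def cap_mapq(mapq, cap=60):
    """Limit a mapping quality to at most `cap`."""
    return min(mapq, cap)
```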


       tview     samtools tview <in.sorted.bam> [ref.fasta]

                 Text alignment viewer (based on the ncurses library). In the
                 viewer, press `?' for help and press `g' to view the
                 alignment starting from a region given in a format like
                 `chr10:10,000,000'.


       fixmate   samtools fixmate <in.nameSrt.bam> <out.bam>

                 Fill in mate coordinates, ISIZE and mate related flags from a
                 name-sorted alignment.


       rmdup     samtools rmdup <input.srt.bam> <out.bam>

                 Remove potential PCR duplicates: if multiple read pairs have
                 identical external coordinates, only retain the pair with
                 highest mapping quality. This command ONLY works with FR
                 orientation and requires that ISIZE is correctly set.
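
The duplicate rule stated above (same external coordinates, keep the pair with the highest mapping quality) can be sketched as a grouping step. This illustrates the rule only and is not the rmdup implementation; the dictionary keys are hypothetical, a single `mapq` value per pair is an assumption, and ties keep the pair seen first.

```python
def remove_duplicate_pairs(pairs):
    """Keep, for each (chrom, start, end), the pair with the highest mapq.

    `pairs` is an iterable of dicts with the hypothetical keys "name",
    "chrom", "start", "end" (external coordinates of the pair) and "mapq".
    """
    best = {}
    for pair in pairs:
        key = (pair["chrom"], pair["start"], pair["end"])
        if key not in best or pair["mapq"] > best[key]["mapq"]:
            best[key] = pair
    return list(best.values())
```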


       rmdupse   samtools rmdupse <input.srt.bam> <out.bam>

                 Remove potential duplicates for single-ended reads. This
                 command will treat all reads as single-ended even if they
                 are in fact paired.


       fillmd    samtools fillmd [-e] <aln.bam> <ref.fasta>

                 Generate the MD tag. If the MD tag is already present, this
                 command will give a warning if the MD tag generated is
                 different from the existing tag.

                 OPTIONS:

                 -e      Convert a read base to = if it is identical to the
                         aligned reference base. Indel caller does not
                         support the = bases at the moment.
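
The effect of -e on a read sequence can be pictured with a toy example. The sketch below assumes an ungapped alignment, so that read and reference bases pair up one to one, and compares bases case-insensitively; both are simplifying assumptions, and `mask_matching_bases` is a hypothetical name.

```python
def mask_matching_bases(read_seq, ref_seq):
    """Replace each read base equal to the aligned reference base with '='."""
    if len(read_seq) != len(ref_seq):
        raise ValueError("sketch handles ungapped alignments only")
    return "".join(
        "=" if r.upper() == f.upper() else r
        for r, f in zip(read_seq, ref_seq)
    )
```

For the read `ACGTA` aligned to the reference `ACCTA`, only the third base differs, so the result is `==G==`.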


SAM FORMAT
       SAM is TAB-delimited. Apart from the header lines, which start with
       the `@' symbol, each alignment line consists of:

       +----+-------+----------------------------------------------------------+
       |Col | Field | Description                                              |
       +----+-------+----------------------------------------------------------+
       | 1  | QNAME | Query (pair) NAME                                        |
       | 2  | FLAG  | bitwise FLAG                                             |
       | 3  | RNAME | Reference sequence NAME                                  |
       | 4  | POS   | 1-based leftmost POSition/coordinate of clipped sequence |
       | 5  | MAPQ  | MAPping Quality (Phred-scaled)                           |
       | 6  | CIGAR | extended CIGAR string                                    |
       | 7  | MRNM  | Mate Reference sequence NaMe (`=' if same as RNAME)      |
       | 8  | MPOS  | 1-based Mate POSition                                    |
       | 9  | ISIZE | Inferred insert SIZE                                     |
       | 10 | SEQ   | query SEQuence on the same strand as the reference       |
       | 11 | QUAL  | query QUALity (ASCII-33 gives the Phred base quality)    |
       | 12 | OPT   | variable OPTional fields in the format TAG:VTYPE:VALUE   |
       +----+-------+----------------------------------------------------------+
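
The column layout in the table above maps directly onto a TAB split. The following is a minimal sketch of reading one alignment line, assuming the eleven mandatory columns are present; `parse_alignment` and `base_qualities` are hypothetical helpers, and the choice of which columns to convert to integers follows the descriptions in the table.

```python
FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
          "MRNM", "MPOS", "ISIZE", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "MPOS", "ISIZE"}


def parse_alignment(line):
    """Parse one non-header SAM line into a dict keyed by field name."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(FIELDS):
        raise ValueError("expected at least 11 TAB-delimited columns")
    record = {}
    for name, value in zip(FIELDS, cols):
        record[name] = int(value) if name in INT_FIELDS else value
    # Optional fields are TAG:VTYPE:VALUE; VALUE itself may contain ':'.
    record["OPT"] = [tuple(field.split(":", 2)) for field in cols[len(FIELDS):]]
    return record


def base_qualities(qual):
    """Decode a QUAL string into Phred base qualities (ASCII - 33)."""
    return [ord(c) - 33 for c in qual]
```

Header lines, which start with `@', are not handled here and should be filtered out before calling `parse_alignment`.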

       Each bit in the FLAG field is defined as:

       +-------+--------------------------------------------------+
       | Flag  | Description                                      |
       +-------+--------------------------------------------------+
       |0x0001 | the read is paired in sequencing                 |
       |0x0002 | the read is mapped in a proper pair              |
       |0x0004 | the query sequence itself is unmapped            |
       |0x0008 | the mate is unmapped                             |
       |0x0010 | strand of the query (1 for reverse)              |
       |0x0020 | strand of the mate                               |
       |0x0040 | the read is the first read in a pair             |
       |0x0080 | the read is the second read in a pair            |
       |0x0100 | the alignment is not primary                     |
       |0x0200 | the read fails platform/vendor quality checks    |
       |0x0400 | the read is either a PCR or an optical duplicate |
       +-------+--------------------------------------------------+
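
A FLAG value is the sum of the bits that apply, so it can be decoded by testing each bit in turn. The sketch below copies the bit values from the table above; `decode_flag` is a hypothetical helper, and the two strand bits are worded as "reverse strand" on the assumption that the mate bit follows the same convention as the query bit.

```python
FLAG_BITS = [
    (0x0001, "the read is paired in sequencing"),
    (0x0002, "the read is mapped in a proper pair"),
    (0x0004, "the query sequence itself is unmapped"),
    (0x0008, "the mate is unmapped"),
    (0x0010, "the query is on the reverse strand"),
    (0x0020, "the mate is on the reverse strand"),
    (0x0040, "the read is the first read in a pair"),
    (0x0080, "the read is the second read in a pair"),
    (0x0100, "the alignment is not primary"),
    (0x0200, "the read fails platform/vendor quality checks"),
    (0x0400, "the read is either a PCR or an optical duplicate"),
]


def decode_flag(flag):
    """Return the descriptions of all bits set in a SAM FLAG value."""
    return [description for bit, description in FLAG_BITS if flag & bit]
```

For example, a FLAG of 99 is 0x0001 + 0x0002 + 0x0020 + 0x0040: a paired read, mapped in a proper pair, whose mate is on the reverse strand, and which is the first read of the pair.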


LIMITATIONS
       o   Unaligned words are used in bam_import.c, bam_endian.h, bam.c and
           bam_aux.c.

       o   CIGAR operation P is not properly handled at the moment.

       o   In merging, the input files are required to have the same number of
           reference sequences. The requirement can be relaxed. In addition,
           merging does not reconstruct the header dictionaries automatically.
           End users have to provide the correct header. Picard is better at
           merging.

       o   Samtools' rmdup does not work for single-end data and does not
           remove duplicates across chromosomes. Picard is better.

AUTHOR
       Heng Li from the Sanger Institute wrote the C version of samtools. Bob
       Handsaker from the Broad Institute implemented the BGZF library and Jue
       Ruan from Beijing Genomics Institute wrote the RAZF library. Various
       people in the 1000Genomes Project contributed to the SAM format
       specification.

SEE ALSO
       Samtools website: <http://samtools.sourceforge.net>

samtools-0.1.6                  2 September 2009                   samtools(1)