Mercurial > repos > ryanmorin > nextgen_variant_identification

diff SNV/SNVMix2_source/SNVMix2-v0.12.1-rc1/samtools-0.1.6/samtools.1 @ 0:74f5ea818cea
Uploaded
author: ryanmorin
date: Wed, 12 Oct 2011 19:50:38 -0400
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/SNV/SNVMix2_source/SNVMix2-v0.12.1-rc1/samtools-0.1.6/samtools.1	Wed Oct 12 19:50:38 2011 -0400
@@ -0,0 +1,445 @@
+.TH samtools 1 "2 September 2009" "samtools-0.1.6" "Bioinformatics tools"
+.SH NAME
+.PP
+samtools - Utilities for the Sequence Alignment/Map (SAM) format
+.SH SYNOPSIS
+.PP
+samtools view -bt ref_list.txt -o aln.bam aln.sam.gz
+.PP
+samtools sort aln.bam aln.sorted
+.PP
+samtools index aln.sorted.bam
+.PP
+samtools view aln.sorted.bam chr2:20,100,000-20,200,000
+.PP
+samtools merge out.bam in1.bam in2.bam in3.bam
+.PP
+samtools faidx ref.fasta
+.PP
+samtools pileup -f ref.fasta aln.sorted.bam
+.PP
+samtools tview aln.sorted.bam ref.fasta
+
+.SH DESCRIPTION
+.PP
+Samtools is a set of utilities that manipulate alignments in the BAM
+format. It imports from and exports to the SAM (Sequence Alignment/Map)
+format, does sorting, merging and indexing, and allows to retrieve reads
+in any regions swiftly.
+
+Samtools is designed to work on a stream. It regards an input file `-'
+as the standard input (stdin) and an output file `-' as the standard
+output (stdout). Several commands can thus be combined with Unix
+pipes. Samtools always output warning and error messages to the standard
+error output (stderr).
+
+Samtools is also able to open a BAM (not SAM) file on a remote FTP or
+HTTP server if the BAM file name starts with `ftp://' or `http://'.
+Samtools checks the current working directory for the index file and
+will download the index upon absence. Samtools does not retrieve the
+entire alignment file unless it is asked to do so.
+
+.SH COMMANDS AND OPTIONS
+
+.TP 10
+.B import
+samtools import <in.ref_list> <in.sam> <out.bam>
+
+Since 0.1.4, this command is an alias of:
+
+samtools view -bt <in.ref_list> -o <out.bam> <in.sam>
+
+.TP
+.B sort
+samtools sort [-n] [-m maxMem] <in.bam> <out.prefix>
+
+Sort alignments by leftmost coordinates. File
+.I <out.prefix>.bam
+will be created. This command may also create temporary files
+.I <out.prefix>.%d.bam
+when the whole alignment cannot be fitted into memory (controlled by
+option -m).
+
+.B OPTIONS:
+.RS
+.TP 8
+.B -n
+Sort by read names rather than by chromosomal coordinates
+.TP
+.B -m INT
+Approximately the maximum required memory. [500000000]
+.RE
+
+.TP
+.B merge
+samtools merge [-h inh.sam] [-n] <out.bam> <in1.bam> <in2.bam> [...]
+
+Merge multiple sorted alignments.
+The header reference lists of all the input BAM files, and the @SQ headers of
+.IR inh.sam ,
+if any, must all refer to the same set of reference sequences.
+The header reference list and (unless overridden by
+.BR -h )
+`@' headers of
+.I in1.bam
+will be copied to
+.IR out.bam ,
+and the headers of other files will be ignored.
+
+.B OPTIONS:
+.RS
+.TP 8
+.B -h FILE
+Use the lines of
+.I FILE
+as `@' headers to be copied to
+.IR out.bam ,
+replacing any header lines that would otherwise be copied from
+.IR in1.bam .
+.RI ( FILE
+is actually in SAM format, though any alignment records it may contain
+are ignored.)
+.TP
+.B -n
+The input alignments are sorted by read names rather than by chromosomal
+coordinates
+.RE
+
+.TP
+.B index
+samtools index <aln.bam>
+
+Index sorted alignment for fast random access. Index file
+.I <aln.bam>.bai
+will be created.
+
+.TP
+.B view
+samtools view [-bhuHS] [-t in.refList] [-o output] [-f reqFlag] [-F
+skipFlag] [-q minMapQ] [-l library] [-r readGroup] <in.bam>|<in.sam> [region1 [...]]
+
+Extract/print all or sub alignments in SAM or BAM format. If no region
+is specified, all the alignments will be printed; otherwise only
+alignments overlapping the specified regions will be output. An
+alignment may be given multiple times if it is overlapping several
+regions. A region can be presented, for example, in the following
+format: `chr2', `chr2:1000000' or `chr2:1,000,000-2,000,000'. The
+coordinate is 1-based.
+
+.B OPTIONS:
+.RS
+.TP 8
+.B -b
+Output in the BAM format.
+.TP
+.B -u
+Output uncompressed BAM. This option saves time spent on
+compression/decomprssion and is thus preferred when the output is piped
+to another samtools command.
+.TP
+.B -h
+Include the header in the output.
+.TP
+.B -H
+Output the header only.
+.TP
+.B -S
+Input is in SAM. If @SQ header lines are absent, the
+.B `-t'
+option is required.
+.TP
+.B -t FILE
+This file is TAB-delimited. Each line must contain the reference name
+and the length of the reference, one line for each distinct reference;
+additional fields are ignored. This file also defines the order of the
+reference sequences in sorting. If you run `samtools faidx <ref.fa>',
+the resultant index file
+.I <ref.fa>.fai
+can be used as this
+.I <in.ref_list>
+file.
+.TP
+.B -o FILE
+Output file [stdout]
+.TP
+.B -f INT
+Only output alignments with all bits in INT present in the FLAG
+field. INT can be in hex in the format of /^0x[0-9A-F]+/ [0]
+.TP
+.B -F INT
+Skip alignments with bits present in INT [0]
+.TP
+.B -q INT
+Skip alignments with MAPQ smaller than INT [0]
+.TP
+.B -l STR
+Only output reads in library STR [null]
+.TP
+.B -r STR
+Only output reads in read group STR [null]
+.RE
+
+.TP
+.B faidx
+samtools faidx <ref.fasta> [region1 [...]]
+
+Index reference sequence in the FASTA format or extract subsequence from
+indexed reference sequence. If no region is specified,
+.B faidx
+will index the file and create
+.I <ref.fasta>.fai
+on the disk. If regions are speficified, the subsequences will be
+retrieved and printed to stdout in the FASTA format. The input file can
+be compressed in the
+.B RAZF
+format.
+
+.TP
+.B pileup
+samtools pileup [-f in.ref.fasta] [-t in.ref_list] [-l in.site_list]
+[-iscgS2] [-T theta] [-N nHap] [-r pairDiffRate] <in.bam>|<in.sam>
+
+Print the alignment in the pileup format. In the pileup format, each
+line represents a genomic position, consisting of chromosome name,
+coordinate, reference base, read bases, read qualities and alignment
+mapping qualities. Information on match, mismatch, indel, strand,
+mapping quality and start and end of a read are all encoded at the read
+base column. At this column, a dot stands for a match to the reference
+base on the forward strand, a comma for a match on the reverse strand,
+`ACGTN' for a mismatch on the forward strand and `acgtn' for a mismatch
+on the reverse strand. A pattern `\\+[0-9]+[ACGTNacgtn]+' indicates
+there is an insertion between this reference position and the next
+reference position. The length of the insertion is given by the integer
+in the pattern, followed by the inserted sequence. Similarly, a pattern
+`-[0-9]+[ACGTNacgtn]+' represents a deletion from the reference. The
+deleted bases will be presented as `*' in the following lines. Also at
+the read base column, a symbol `^' marks the start of a read segment
+which is a contiguous subsequence on the read separated by `N/S/H' CIGAR
+operations. The ASCII of the character following `^' minus 33 gives the
+mapping quality. A symbol `$' marks the end of a read segment.
+
+If option
+.B -c
+is applied, the consensus base, consensus quality, SNP quality and RMS
+mapping quality of the reads covering the site will be inserted between
+the `reference base' and the `read bases' columns. An indel occupies an
+additional line. Each indel line consists of chromosome name,
+coordinate, a star, the genotype, consensus quality, SNP quality, RMS
+mapping quality, # covering reads, the first alllele, the second allele,
+# reads supporting the first allele, # reads supporting the second
+allele and # reads containing indels different from the top two alleles.
+
+.B OPTIONS:
+.RS
+
+.TP 10
+.B -s
+Print the mapping quality as the last column. This option makes the
+output easier to parse, although this format is not space efficient.
+
+.TP
+.B -S
+The input file is in SAM.
+
+.TP
+.B -i
+Only output pileup lines containing indels.
+
+.TP
+.B -f FILE
+The reference sequence in the FASTA format. Index file
+.I FILE.fai
+will be created if
+absent.
+
+.TP
+.B -M INT
+Cap mapping quality at INT [60]
+
+.TP
+.B -t FILE
+List of reference names ane sequence lengths, in the format described
+for the
+.B import
+command. If this option is present, samtools assumes the input
+.I <in.alignment>
+is in SAM format; otherwise it assumes in BAM format.
+
+.TP
+.B -l FILE
+List of sites at which pileup is output. This file is space
+delimited. The first two columns are required to be chromosome and
+1-based coordinate. Additional columns are ignored. It is
+recommended to use option
+.B -s
+together with
+.B -l
+as in the default format we may not know the mapping quality.
+
+.TP
+.B -c
+Call the consensus sequence using MAQ consensus model. Options
+.B -T,
+.B -N,
+.B -I
+and
+.B -r
+are only effective when
+.B -c
+or
+.B -g
+is in use.
+
+.TP
+.B -g
+Generate genotype likelihood in the binary GLFv3 format. This option
+suppresses -c, -i and -s.
+
+.TP
+.B -T FLOAT
+The theta parameter (error dependency coefficient) in the maq consensus
+calling model [0.85]
+
+.TP
+.B -N INT
+Number of haplotypes in the sample (>=2) [2]
+
+.TP
+.B -r FLOAT
+Expected fraction of differences between a pair of haplotypes [0.001]
+
+.TP
+.B -I INT
+Phred probability of an indel in sequencing/prep. [40]
+
+.RE
+
+.TP
+.B tview
+samtools tview <in.sorted.bam> [ref.fasta]
+
+Text alignment viewer (based on the ncurses library). In the viewer,
+press `?' for help and press `g' to check the alignment start from a
+region in the format like `chr10:10,000,000'.
+
+.RE
+
+.TP
+.B fixmate
+samtools fixmate <in.nameSrt.bam> <out.bam>
+
+Fill in mate coordinates, ISIZE and mate related flags from a
+name-sorted alignment.
+
+.TP
+.B rmdup
+samtools rmdup <input.srt.bam> <out.bam>
+
+Remove potential PCR duplicates: if multiple read pairs have identical
+external coordinates, only retain the pair with highest mapping quality.
+This command
+.B ONLY
+works with FR orientation and requires ISIZE is correctly set.
+
+.RE
+
+.TP
+.B rmdupse
+samtools rmdupse <input.srt.bam> <out.bam>
+
+Remove potential duplicates for single-ended reads. This command will
+treat all reads as single-ended even if they are paired in fact.
+
+.RE
+
+.TP
+.B fillmd
+samtools fillmd [-e] <aln.bam> <ref.fasta>
+
+Generate the MD tag. If the MD tag is already present, this command will
+give a warning if the MD tag generated is different from the existing
+tag.
+
+.B OPTIONS:
+.RS
+.TP 8
+.B -e
+Convert a the read base to = if it is identical to the aligned reference
+base. Indel caller does not support the = bases at the moment.
+
+.RE
+
+.SH SAM FORMAT
+
+SAM is TAB-delimited. Apart from the header lines, which are started
+with the `@' symbol, each alignment line consists of:
+
+.TS
+center box;
+cb | cb | cb
+n | l | l .
+Col	Field	Description
+_
+1	QNAME	Query (pair) NAME
+2	FLAG	bitwise FLAG
+3	RNAME	Reference sequence NAME
+4	POS	1-based leftmost POSition/coordinate of clipped sequence
+5	MAPQ	MAPping Quality (Phred-scaled)
+6	CIAGR	extended CIGAR string
+7	MRNM	Mate Reference sequence NaMe (`=' if same as RNAME)
+8	MPOS	1-based Mate POSistion
+9	ISIZE	Inferred insert SIZE
+10	SEQ	query SEQuence on the same strand as the reference
+11	QUAL	query QUALity (ASCII-33 gives the Phred base quality)
+12	OPT	variable OPTional fields in the format TAG:VTYPE:VALUE
+.TE
+
+.PP
+Each bit in the FLAG field is defined as:
+
+.TS
+center box;
+cb | cb
+l | l .
+Flag	Description
+_
+0x0001	the read is paired in sequencing
+0x0002	the read is mapped in a proper pair
+0x0004	the query sequence itself is unmapped
+0x0008	the mate is unmapped
+0x0010	strand of the query (1 for reverse)
+0x0020	strand of the mate
+0x0040	the read is the first read in a pair
+0x0080	the read is the second read in a pair
+0x0100	the alignment is not primary
+0x0200	the read fails platform/vendor quality checks
+0x0400	the read is either a PCR or an optical duplicate
+.TE
+
+.SH LIMITATIONS
+.PP
+.IP o 2
+Unaligned words used in bam_import.c, bam_endian.h, bam.c and bam_aux.c.
+.IP o 2
+CIGAR operation P is not properly handled at the moment.
+.IP o 2
+In merging, the input files are required to have the same number of
+reference sequences. The requirement can be relaxed. In addition,
+merging does not reconstruct the header dictionaries
+automatically. Endusers have to provide the correct header. Picard is
+better at merging.
+.IP o 2
+Samtools' rmdup does not work for single-end data and does not remove
+duplicates across chromosomes. Picard is better.
+
+.SH AUTHOR
+.PP
+Heng Li from the Sanger Institute wrote the C version of samtools. Bob
+Handsaker from the Broad Institute implemented the BGZF library and Jue
+Ruan from Beijing Genomics Institute wrote the RAZF library. Various
+people in the 1000Genomes Project contributed to the SAM format
+specification.
+
+.SH SEE ALSO
+.PP
+Samtools website: <http://samtools.sourceforge.net>
author	ryanmorin
date	Wed, 12 Oct 2011 19:50:38 -0400
parents
children