annotate ezBAMQC/src/htslib/faidx.5 @ 20:9de3bbec2479 draft default tip

Uploaded
author youngkim
date Thu, 31 Mar 2016 10:10:37 -0400
parents dfa3745e5fd8
'\" t
.TH faidx 5 "August 2013" "htslib" "Bioinformatics formats"
.SH NAME
faidx \- an index enabling random access to FASTA files
.\"
.\" Copyright (C) 2013 Genome Research Ltd.
.\"
.\" Author: John Marshall <jm18@sanger.ac.uk>
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the "Software"),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
.\" and/or sell copies of the Software, and to permit persons to whom the
.\" Software is furnished to do so, subject to the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be included in
.\" all copies or substantial portions of the Software.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
.\" THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
.\" DEALINGS IN THE SOFTWARE.
.\"
.SH SYNOPSIS
.IR file.fa .fai,
.IR file.fasta .fai
.SH DESCRIPTION
Using an \fBfai index\fP file in conjunction with a FASTA file containing
reference sequences enables efficient access to arbitrary regions within
those reference sequences.
The index file typically has the same filename as the corresponding FASTA
file, with \fB.fai\fP appended.
.P
An \fBfai index\fP file is a text file consisting of lines each with
five TAB-delimited columns:
.TS
lbl.
NAME	Name of this reference sequence
LENGTH	Total length of this reference sequence, in bases
OFFSET	Offset within the FASTA file of this sequence's first base
LINEBASES	The number of bases on each line
LINEWIDTH	The number of bytes in each line, including the newline
.TE
.P
The \fBNAME\fP and \fBLENGTH\fP columns contain the same
data as would appear in the \fBSN\fP and \fBLN\fP fields of a
SAM \fB@SQ\fP header for the same reference sequence.
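.P
As a minimal sketch of that correspondence, the Python function below turns
the lines of an \fBfai index\fP into SAM \fB@SQ\fP header lines.
The name \fBfai_to_sq_lines\fP is a hypothetical helper chosen for
illustration and is not part of samtools or htslib; the TAB character is
written as chr(9) only so that the code contains no backslash escapes.
.RS
.nf
```python
TAB = chr(9)  # the TAB character that delimits both fai and SAM header fields


def fai_to_sq_lines(fai_lines):
    """Return one SAM @SQ header line per non-empty fai index line."""
    sq_lines = []
    for line in fai_lines:
        line = line.rstrip()  # drop the trailing line terminator
        if not line:
            continue
        name, length = line.split(TAB)[:2]  # NAME and LENGTH columns
        # int() rejects a LENGTH column that is not a whole number.
        fields = ["@SQ", "SN:" + name, "LN:" + str(int(length))]
        sq_lines.append(TAB.join(fields))
    return sq_lines
```
.fi
.RE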
.P
The \fBOFFSET\fP column contains the offset within the FASTA file, in bytes
starting from zero, of the first base of this reference sequence, i.e., of
the character following the newline at the end of the "\fB>\fP" header line.
Typically the lines of a \fBfai index\fP file appear in the order in which the
reference sequences appear in the FASTA file, so \fB.fai\fP files are usually
sorted by ascending \fBOFFSET\fP.
.P
The \fBLINEBASES\fP column contains the number of bases in each of the sequence
lines that form the body of this reference sequence, apart from the final line
which may be shorter.
The \fBLINEWIDTH\fP column contains the number of \fIbytes\fP in each of
the sequence lines (except perhaps the final line), thus differing from
\fBLINEBASES\fP in that it also counts the bytes forming the line terminator.
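.P
Taken together, \fBOFFSET\fP, \fBLINEBASES\fP and \fBLINEWIDTH\fP let a reader
locate any base without scanning the file.
The following Python sketch shows one way to do that arithmetic, assuming
zero-based positions within the sequence and a FASTA file opened in binary
mode.
The names \fBparse_fai_line\fP, \fBbase_offset\fP and \fBfetch\fP are
hypothetical helpers for illustration and are not part of samtools or htslib.
.RS
.nf
```python
TAB = chr(9)  # fai columns are TAB-delimited


def parse_fai_line(line):
    """Split one fai line into (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH)."""
    name, length, offset, linebases, linewidth = line.rstrip().split(TAB)[:5]
    return name, int(length), int(offset), int(linebases), int(linewidth)


def base_offset(offset, linebases, linewidth, pos):
    """Byte offset in the FASTA file of the base at zero-based position pos."""
    # Whole lines before pos cost LINEWIDTH bytes each; the remainder is the
    # column within the line that holds the base.
    full_lines, column = divmod(pos, linebases)
    return offset + full_lines * linewidth + column


def fetch(fasta, entry, start, end):
    """Read bases [start, end) of one sequence from a binary FASTA file object."""
    name, length, offset, linebases, linewidth = entry
    end = min(end, length)
    if start >= end:
        return ""
    first = base_offset(offset, linebases, linewidth, start)
    last = base_offset(offset, linebases, linewidth, end - 1)
    fasta.seek(first)
    chunk = fasta.read(last - first + 1)
    # Line terminators inside the chunk are whitespace; split() discards them.
    return b"".join(chunk.split()).decode("ascii")
```
.fi
.RE
.P
For an entry with \fBOFFSET\fP 5, \fBLINEBASES\fP 30 and \fBLINEWIDTH\fP 31,
position 30 (the first base of the second sequence line) maps to byte
5 + 1 * 31 + 0 = 36.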
.SS FASTA Files
In order to be indexed with \fBsamtools faidx\fP, a FASTA file must be a text
file of the form
.LP
.RS
.RI > name
.RI [ description ...]
.br
ATGCATGCATGCATGCATGCATGCATGCAT
.br
GCATGCATGCATGCATGCATGCATGCATGC
.br
ATGCAT
.br
.RI > name
.RI [ description ...]
.br
ATGCATGCATGCAT
.br
GCATGCATGCATGC
.br
[...]
.RE
.LP
In particular, each reference sequence must be "well-formatted", i.e., all
of its sequence lines must be the same length, apart from the final sequence
line which may be shorter.
(While this sequence line length must be the same within each sequence,
it may vary between different reference sequences in the same FASTA file.)
.P
This also means that although the FASTA file may have Unix- or Windows-style
or other line termination, the newline characters present must be consistent,
at least within each reference sequence.
.P
The \fBsamtools\fP implementation uses the first word of the "\fB>\fP" header
line text (i.e., up to the first whitespace character) as the \fBNAME\fP column.
At present, no whitespace is permitted between the
">" character and the \fIname\fP.
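.P
The rules above can be sketched as a small indexer.
The Python function below is an illustration only, not the \fBsamtools\fP
implementation: the name \fBbuild_fai\fP is hypothetical, the FASTA file is
assumed to be opened in binary mode, and trailing whitespace on a sequence
line is treated as part of the line terminator.
.RS
.nf
```python
def build_fai(fasta):
    """Return (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH) tuples for a FASTA
    file object opened in binary mode."""
    entries = []
    current = None      # [name, length, offset, linebases, linewidth] in progress
    short_seen = False  # True once a line that must be the final one was seen
    position = 0        # byte offset of the start of the current line
    for line in fasta:
        if line.startswith(b">"):
            if current is not None:
                entries.append(tuple(current))
            words = line[1:].split()
            if not words or line[1:2].isspace():
                raise ValueError("header line has no name directly after '>'")
            # NAME is the first word; OFFSET is the byte after the header line.
            current = [words[0].decode("ascii"), 0, position + len(line), 0, 0]
            short_seen = False
        elif current is not None:
            bases, width = len(line.rstrip()), len(line)
            if bases == 0:
                short_seen = True  # a blank line may only end a sequence
            elif short_seen or bases > current[3] > 0:
                raise ValueError("sequence %s is not well-formatted" % current[0])
            else:
                if current[3] == 0:
                    # The first sequence line fixes LINEBASES and LINEWIDTH.
                    current[3], current[4] = bases, width
                elif bases < current[3] or width != current[4]:
                    short_seen = True  # only the final line may differ
                current[1] += bases
        position += len(line)
    if current is not None:
        entries.append(tuple(current))
    return entries
```
.fi
.RE
.P
Recording a differing line and then rejecting any sequence line that follows
it is one simple way to enforce that only the final line of a sequence may be
shorter or end differently.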
.SH EXAMPLE
For example, given this FASTA file
.LP
.RS
>one
.br
ATGCATGCATGCATGCATGCATGCATGCAT
.br
GCATGCATGCATGCATGCATGCATGCATGC
.br
ATGCAT
.br
>two another chromosome
.br
ATGCATGCATGCAT
.br
GCATGCATGCATGC
.br
.RE
.LP
formatted with Unix-style (LF) line termination, the corresponding fai index
would be
.RS
.TS
lnnnn.
one	66	5	30	31
two	28	98	14	15
.TE
.RE
.LP
If the FASTA file were formatted with Windows-style (CR-LF) line termination,
the fai index would be
.RS
.TS
lnnnn.
one	66	6	30	32
two	28	103	14	16
.TE
.RE
.SH SEE ALSO
.IR samtools (1)
.TP
http://en.wikipedia.org/wiki/FASTA_format
Further description of the FASTA format