.TH tabix 1 "3 February 2015" "htslib-1.2.1" "Bioinformatics tools"
.SH NAME
.PP
bgzip \- Block compression/decompression utility
.PP
tabix \- Generic indexer for TAB-delimited genome position files
.\"
.\" Copyright (C) 2009-2011 Broad Institute.
.\"
.\" Author: Heng Li <lh3@sanger.ac.uk>
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the "Software"),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
.\" and/or sell copies of the Software, and to permit persons to whom the
.\" Software is furnished to do so, subject to the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be included in
.\" all copies or substantial portions of the Software.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
.\" THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
.\" DEALINGS IN THE SOFTWARE.
.\"
.SH SYNOPSIS
.PP
.B bgzip
.RB [ -cdhB ]
.RB [ -b
.IR virtualOffset ]
.RB [ -s
.IR size ]
.RI [ file ]
.PP
.B tabix
.RB [ -0lf ]
.RB [ -p
gff|bed|sam|vcf]
.RB [ -s
.IR seqCol ]
.RB [ -b
.IR begCol ]
.RB [ -e
.IR endCol ]
.RB [ -S
.IR lineSkip ]
.RB [ -c
.IR metaChar ]
.I in.tab.bgz
.RI [ "region1 " [ "region2 " [ ... "]]]"

.SH DESCRIPTION
.PP
Tabix indexes a TAB-delimited genome position file
.I in.tab.bgz
and creates an index file
.RI ( in.tab.bgz.tbi
or
.IR in.tab.bgz.csi )
when
.I region
is absent from the command line. The input data file must be position
sorted and compressed by
.BR bgzip ,
which has a
.BR gzip (1)-like
interface. After indexing, tabix can quickly retrieve data
lines overlapping
.I regions
specified in the format "chr:beginPos-endPos". Fast data retrieval also
works over a network if a URI is given as the file name; in this case the
index file will be downloaded if it is not present locally.

.SH INDEXING OPTIONS
.TP 10
.B -0, --zero-based
Specify that the position in the data file is 0-based (e.g. UCSC files)
rather than 1-based.
.TP
.BI "-b, --begin " INT
Column of start chromosomal position. [4]
.TP
.BI "-c, --comment " CHAR
Skip lines starting with character CHAR. [#]
.TP
.B "-C, --csi"
Produce a CSI format index instead of the default TBI format index.
.TP
.BI "-e, --end " INT
Column of end chromosomal position. The end column can be the same as the
start column. [5]
.TP
.B "-f, --force"
Overwrite the index file if it is present.
.TP
.BI "-m, --min-shift " INT
Set the minimal interval size for CSI indices to 2^INT. [14]
.TP
.BI "-p, --preset " STR
Input format for indexing. Valid values are: gff, bed, sam, vcf.
This option should not be applied together with any of
.BR -s ", " -b ", " -e ", " -c " and " -0 ;
it is not used for data retrieval because this setting is stored in
the index file. [gff]
.TP
.BI "-s, --sequence " INT
Column of sequence name. Options
.BR -s ", " -b ", " -e ", " -S ", " -c " and " -0
are all stored in the index file and thus not used in data retrieval. [1]
.TP
.BI "-S, --skip-lines " INT
Skip first INT lines in the data file. [0]

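.PP
As a minimal sketch of how the column options combine, the commands below
index a custom TAB-delimited file whose single position column serves as both
start and end. The file name sites.tsv and its records are hypothetical sample
data, and the indexing step is skipped when bgzip and tabix are not installed.
.PP
.nf
```shell
# Hypothetical sample: a header line plus three position-sorted records,
# written with spaces and converted to TABs by awk (character 9 is TAB).
awk 'BEGIN { OFS = sprintf("%c", 9) } { $1 = $1; print }' > sites.tsv <<EOF
chrom pos score
chr1 100 0.5
chr1 250 0.9
chr2 40 0.1
EOF

# Compress and index only when both tools are available.
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
    bgzip -c sites.tsv > sites.tsv.gz
    # Sequence name in column 1; column 2 is both start and end;
    # skip the one-line header; overwrite any existing index.
    tabix -f -s 1 -b 2 -e 2 -S 1 sites.tsv.gz
    # Retrieve the records overlapping chr1:200-300.
    tabix sites.tsv.gz chr1:200-300
fi
```
.fi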
.SH QUERYING AND OTHER OPTIONS
.TP
.B "-h, --print-header"
Also print the header/meta lines.
.TP
.B "-H, --only-header"
Print only the header/meta lines.
.TP
.B "-i, --file-info"
Print file format info.
.TP
.B "-l, --list-chroms"
List the sequence names stored in the index file.
.TP
.BI "-r, --reheader " FILE
Replace the header with the content of FILE.
.TP
.BI "-R, --regions " FILE
Restrict to regions listed in FILE. FILE can be a BED file (requires a .bed, .bed.gz or .bed.bgz
file name extension) or a TAB-delimited file with CHROM, POS and, optionally,
POS_TO columns, where positions are 1-based and inclusive. When this option is in use, the input
file may not be sorted.
.TP
.BI "-T, --targets " FILE
Similar to
.B -R
but the entire input will be read sequentially and regions not listed in FILE will be skipped.
.PP
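.PP
The querying options can be sketched as follows, assuming a small BED file
and a TAB-delimited regions file with CHROM, POS and POS_TO columns. The file
names demo.bed and regions.txt and their contents are hypothetical, and the
tabix calls are skipped when the tools are not installed.
.PP
.nf
```shell
# Hypothetical sample data, written with spaces and converted to TABs by awk
# (character 9 is TAB). demo.bed is already sorted by name and start.
to_tabs() { awk 'BEGIN { OFS = sprintf("%c", 9) } { $1 = $1; print }'; }

to_tabs > demo.bed <<EOF
chr1 100 200 a
chr1 300 400 c
chr2 500 600 b
EOF

# Regions file: CHROM, POS, POS_TO (1-based, inclusive). The name does not
# end in a BED extension, so it is read as a plain TAB-delimited file.
to_tabs > regions.txt <<EOF
chr1 150 350
chr2 1 1000
EOF

if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
    bgzip -c demo.bed > demo.bed.gz
    tabix -f -p bed demo.bed.gz
    # List the sequence names stored in the index.
    tabix -l demo.bed.gz
    # Print only the records overlapping the listed regions.
    tabix -R regions.txt demo.bed.gz
fi
```
.fi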
.SH EXAMPLE
(grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) | bgzip > sorted.gff.gz;

tabix -p gff sorted.gff.gz;

tabix sorted.gff.gz chr1:10,000,000-20,000,000;

.SH NOTES
It is straightforward to achieve overlap queries using the standard
B-tree index (with or without binning) implemented in all SQL databases,
or the R-tree index in PostgreSQL and Oracle. But there are still many
reasons to use tabix. Firstly, tabix works directly with many widely
used TAB-delimited formats such as GFF/GTF and BED. We do not need to
design a database schema or specialized binary formats. Data do not need
to be duplicated in different formats, either. Secondly, tabix works on
compressed data files while most SQL databases do not. The GenCode
annotation GTF can be compressed down to 4% of its original size. Thirdly, tabix is
fast. The same indexing algorithm is known to work efficiently for an
alignment with a few billion short reads. SQL databases probably cannot
easily handle data at this scale. Last but not least, tabix supports
remote data retrieval. One can put the data file and the index on an FTP
or HTTP server, and other users or even web services will be able to get
a slice without downloading the entire file.

.SH AUTHOR
.PP
Tabix was written by Heng Li. The BGZF library was originally
implemented by Bob Handsaker and modified by Heng Li for remote file
access and in-memory caching.

.SH SEE ALSO
.PP
.BR samtools (1)