############################
#                          #
# Variant Effect Predictor #
#                          #
############################

Copyright (c) 1999-2011 The European Bioinformatics Institute and
Genome Research Limited.  All rights reserved.

This software is distributed under a modified Apache license.
For license details, please see

http://www.ensembl.org/info/about/code_licence.html

Please email comments or questions to the public Ensembl
developers list at <dev@ensembl.org>.

Questions may also be sent to the Ensembl help desk at
<helpdesk@ensembl.org>

Quickstart
==========

Install API and cache files, run in offline mode:

perl INSTALL.pl
perl variant_effect_predictor.pl --offline


Documentation
=============

For a summary of command line flags, run:

perl variant_effect_predictor.pl --help

For full documentation see

http://www.ensembl.org/info/docs/variation/vep/vep_script.html



Changelog
=========

New in version 2.6 (July 2012)
------------------------------

- support for structural variant consequences

- Sequence Ontology (SO) consequence terms are now the default

- script runtime 3-4x faster when using forking

- 1000 Genomes global MAF available in cache files

- improved memory usage


New in version 2.5 (May 2012)
-----------------------------

- SIFT and PolyPhen predictions now available for RefSeq transcripts

- retrieve cell type-specific regulatory consequences

- consequences can be retrieved based on a single individual's genotype in
  a VCF input file

- find overlapping structural variants

- Condel support removed from main script and moved to a plugin


New in version 2.4 (February 2012)
----------------------------------

- offline mode and new installer script make it easy to use the VEP without
  the usual dependencies

- output columns configurable using the --fields flag

- VCF output support expanded, can now carry all fields

- output affected exon and intron numbers with --numbers

- output overlapping protein domains using --domains

- enhanced support for LRGs

- plugins now work on variants called as intergenic


New in version 2.3 (December 2011)
----------------------------------

- Add custom annotations from tabix-indexed files (BED, GFF, GTF, VCF, bigWig)

- Add new functionality to the VEP with user-written plugins

- Filter input on consequence type


Version 2.2 (September 2011)
----------------------------

- SIFT, PolyPhen and Condel predictions and regulatory features now accessible
  from the cache

- Support for calling consequences against RefSeq transcripts

- Variant identifiers (e.g. dbSNP rsIDs) and HGVS notations supported as input
  formats

- Variants can now be filtered by frequency in HapMap and 1000 Genomes
  populations

- Script can be used to convert files between formats (Ensembl/VCF/Pileup/HGVS
  to Ensembl/VCF/Pileup)

- Large amount of code moved to API modules to ensure consistency between web
  and script VEP

- Memory usage optimisations

- VEP script moved to ensembl-tools CVS module

- Added --canonical, --per_gene and --no_intergenic options


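To illustrate what the format conversion listed under version 2.2 involves,
here is a minimal Python sketch for the simplest case only: a single-base
substitution in a VCF line rewritten in the whitespace-separated Ensembl
default layout (chromosome, start, end, allele string, strand, optional
identifier). It is not the script's own conversion code. The function name
and the sample line (including the identifier rs123) are hypothetical, and
insertions, deletions and multi-allelic sites are deliberately rejected.

```python
def vcf_snv_to_ensembl(vcf_line):
    """Convert one VCF data line describing a single-base substitution
    into a tab-separated line in the Ensembl default layout.

    Assumption: only the first five VCF columns (CHROM, POS, ID, REF, ALT)
    are needed, and the variant is reported on the forward strand.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError("expected at least 5 tab-separated VCF columns")
    chrom, pos, var_id, ref, alt = fields[:5]

    # This sketch handles single-base substitutions only; indels and
    # multi-allelic ALT values (e.g. "A,C") need extra coordinate handling.
    if len(ref) != 1 or len(alt) != 1 or "." in (ref, alt):
        raise ValueError("only single-base substitutions are handled")

    # For a substitution, start and end are the same position.
    parts = [chrom, pos, pos, f"{ref}/{alt}", "+"]
    if var_id != ".":
        parts.append(var_id)  # the identifier column is optional
    return "\t".join(parts)


# Hypothetical input line, for illustration only.
print(vcf_snv_to_ensembl("5\t140532\trs123\tT\tC\t.\t.\t."))
```

For real data the script's own conversion should be used, since it also
covers insertions, deletions and the other supported formats.
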
Version 2.1 (June 2011)
-----------------------

- ability to use local file cache in place of or alongside connecting to an
  Ensembl database

- significant improvements to speed of script

- whole-genome mode now default (no disadvantage for smaller datasets)

- improved status output with progress bars

- regulatory region consequences now reinstated and improved

- output file modified: the Transcript column is now Feature, and is
  followed by a Feature_type column

- full documentation now online


Version 2.0 (April 2011)
------------------------

Version 2.0 of the Variant Effect Predictor script (VEP) constitutes a complete
overhaul of both the script and the API behind it. It requires at least version
62 of the Ensembl API to function. Here follows a summary of the changes:

- support for SIFT, PolyPhen and Condel non-synonymous predictions in human

- per-allele and compound consequence types

- support for Sequence Ontology (SO) and NCBI consequence terms

- modified output format
  - support for new output fields in Extra column
  - header section containing information on database and software versions
  - codon change shown in output
  - CDS position shown in output
  - option to output Ensembl protein identifiers
  - option to output HGVS nomenclature for variants

- support for gzipped input files

- enhanced configuration options, including the ability to read configuration
  from a file

- verbose output now much more useful

- whole-genome mode now more stable

- finding existing co-located variations now ~5x faster
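
The output format changes noted above (a header section, the Feature and
Feature_type columns, and additional fields in the Extra column) can be
handled downstream with a small parser. The following Python sketch makes
several assumptions: that the output is tab-delimited, that header lines
begin with "##", that the column names are given on a single line beginning
with "#", and that the Extra column holds key=value pairs separated by ";"
(or "-" when empty). The function names and the sample lines are
hypothetical and only illustrate the layout.

```python
def parse_extra(extra):
    """Split an Extra column value such as 'A=1;B=2' into a dict.

    Assumption: an empty Extra column is written as '-'.
    """
    if extra in ("", "-"):
        return {}
    pairs = {}
    for item in extra.split(";"):
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


def parse_vep_lines(lines):
    """Turn an iterable of output lines into a list of dicts keyed by column."""
    columns = None
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # blank line or header metadata
        if line.startswith("#"):
            columns = line[1:].split("\t")  # column names line
            continue
        if columns is None:
            raise ValueError("data line seen before the column names line")
        record = dict(zip(columns, line.split("\t")))
        record["Extra"] = parse_extra(record.get("Extra", "-"))
        records.append(record)
    return records


# Hypothetical sample, shortened to a few columns for illustration.
sample = [
    "## example header line\n",
    "#Uploaded_variation\tLocation\tAllele\tFeature\tFeature_type\tConsequence\tExtra\n",
    "var1\t1:1000\tA\tFEATURE1\tTranscript\tCONSEQUENCE1\tKEY1=value1;KEY2=value2\n",
    "var2\t2:2000\tG\tFEATURE2\tTranscript\tCONSEQUENCE2\t-\n",
]
records = parse_vep_lines(sample)
for record in records:
    print(record["Uploaded_variation"], record["Feature_type"], record["Extra"])
```

Keeping Extra as a dictionary means fields added by new options or plugins
appear as extra keys without any change to the column handling.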