annotate MACARON-GenMed-LabEx/README.md @ 0:c9636a827049 draft default tip

Uploaded
author waqas
date Wed, 12 Sep 2018 08:45:03 -0400

MACARON User Guide
================

# Table of Contents

[//]: # (BEGIN automated TOC section, any edits will be overwritten on next source refresh)

* [Introduction](#introduction)
* [Installation](#installation)
  * [Operating System Guidelines](#operating-system-guidelines)
  * [Runtime Pre-requisite](#runtime-pre-requisite)
  * [Software Dependencies](#software-dependencies)
  * [Downloading the Source Code](#downloading-the-source-code)
  * [Contents of the Folder MACARON_GenMed](#contents-of-the-folder-macaron_genmed)
* [Running the MACARON](#running-the-macaron)
  * [Input Requirements](#input-requirements)
  * [Default Options](#default-options)
  * [demo Folder](#demo-folder)
  * [Advanced Options](#advanced-options)
* [MACARON Reporting Format](#macaron-reporting-format)
* [Validating SNVs Existed on the Same Reads](#validating-snvs-existed-on-the-same-reads)
* [References](#references)
* [Citation](#citation)

[//]: # (END automated TOC section, any edits will be overwritten on next source refresh)

# Introduction

MACARON (Multi-bAse Codon-Associated variant Re-annotatiON) is a Python framework to identify and re-annotate multi-base affected codons in whole genome/exome sequence data. Starting from a standard VCF file, MACARON identifies, re-annotates and predicts the amino acid change resulting from multiple single nucleotide variants (SNVs) within the same genetic codon.

The information below describes how to install and run MACARON. MACARON filters the variant records of a VCF file produced by any existing SNP-based variant caller, identifies SNVs within the same genetic codon, and corrects their corresponding amino acid change.
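
As a minimal illustration of the idea (a sketch, not MACARON's actual implementation), the Python snippet below applies several SNVs to one codon and translates the result with the standard genetic code. The function name `merge_snvs` and the 0-based codon offsets are assumptions made for this example; the codon and alleles come from the demo data (reference codon `TCT`, T>A at the first base, C>T at the second).

```python
# Standard genetic code, laid out in TCAG order for all three codon positions.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def merge_snvs(ref_codon, snvs):
    """Apply SNVs given as (0-based codon offset, alt base) pairs to one codon.

    Returns (merged codon, reference amino acid, merged amino acid).
    """
    ref_codon = ref_codon.upper()
    codon = list(ref_codon)
    for offset, alt in snvs:
        codon[offset] = alt.upper()
    merged = "".join(codon)
    return merged, CODON_TABLE[ref_codon], CODON_TABLE[merged]


if __name__ == "__main__":
    print(merge_snvs("TCT", [(0, "A")]))            # SNV1 alone
    print(merge_snvs("TCT", [(1, "T")]))            # SNV2 alone
    print(merge_snvs("TCT", [(0, "A"), (1, "T")]))  # both SNVs together
```

Annotated one at a time, the two SNVs change serine to threonine and to phenylalanine respectively; applied together they yield the codon `ATT`, which encodes isoleucine. This combined change is what a per-SNV annotation misses.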

See the latest [News](https://github.com/waqasuddinkhan/MACARON-GenMed-LabEx/wiki/News???) and [Updates](https://github.com/waqasuddinkhan/MACARON-GenMed-LabEx/wiki#updates) on the [MACARON-GenMed-LabEx Wiki page](https://github.com/waqasuddinkhan/MACARON-GenMed-LabEx/wiki).

# Installation

### Operating System Guidelines

MACARON is known to run on LINUX UBUNTU 16.04 LTS. It should also run on any other LINUX version.

### Runtime Pre-requisite

__1.__ MACARON runs with __PYTHON v2.7 or later__. If several PYTHON versions are installed, please make sure that your running environment is set to the required version of PYTHON.

__2.__ Check your __JAVA__ version. MACARON is tested with:

    java -version
    openjdk version "1.8.0_151"
    OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)
    OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
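
Both runtime checks can be scripted. The helper below is a hypothetical sketch that is not part of MACARON; the names `python_ok` and `java_version_text` are made up for illustration. Note that this helper itself assumes Python 3.7 or later (for `shutil.which` and `capture_output`), even though MACARON is stated to run from Python 2.7.

```python
import shutil
import subprocess
import sys


def python_ok(version_info=sys.version_info, minimum=(2, 7)):
    """Return True if the interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def java_version_text():
    """Return the text printed by `java -version`, or None if java is not on PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its report to stderr rather than stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print(java_version_text() or "java not found on PATH")
```

The reported JAVA version can then be compared by eye with the tested version listed above.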

### Software Dependencies

Before running MACARON, please make sure that the following software is installed properly:

__1.__ __Genome-Analysis Toolkit__ (https://software.broadinstitute.org/gatk/download/).

__2.__ __SnpEff__ (http://snpeff.sourceforge.net/download.html), tested with __v4.3__ (build 2017-05-05 18:41). MACARON can also run with any older or newer version.

__3.__ __SAMTools__, tested with version __0.1.19__. Any other version can be used.

__4.__ __Human Reference Genome__: Depends on user’s input.

__5.__ __SnpEff’s Human Annotation Database__: Depends on user’s input.

For __1__ and __2__, MACARON has no issues as long as they are compatible with JAVA.

### Downloading the Source Code

The preferred way to get the latest version of MACARON is:

    git clone https://github.com/waqasuddinkhan/MACARON-GenMed-LabEx.git

or download the ZIP folder.

MACARON source code can also be downloaded from http://www.genmed.fr/images/publications/data/MACARON_GenMed.zip

After acquiring a release distribution of the source code, the only build step is to unpack the zip file:

    unzip MACARON_GenMed.zip

### Contents of the Folder MACARON_GenMed

* *MACARON* – The MACARON Python code
* *MACARON_validate.sh* – a BASH-shell script to validate multi-SNVs located on the same read that affect the same genetic codon

# Running the MACARON

### Input Requirements

Before running MACARON, check these __input technical notes__, as the following limitations exist for the input VCF file and the required software dependencies:

* Chromosome (chr) notation must be consistent between the input VCF file and the Human Reference Genome file,

* Sequence dictionaries of the input VCF file and the Human Reference Genome file should be the same,

* The input VCF file should be annotated with ANNOVAR, and additionally with any other annotation software, e.g., VEP (https://www.ensembl.org/info/docs/tools/vep/index.html), if the user wants the full functionality of the `-f` option (see [Advanced Options](#advanced-options) below),

* The same Human Reference Genome file that was used earlier for alignment and (or) variant calling should be used for MACARON,

* Versions of the input VCF file, Human Reference Genome file and SnpEff database file should match: (hg19 / GRCh37 = SnpEff GRCh37.75) or (hg38 / GRCh38 = SnpEff GRCh38.86).
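
The chromosome-notation requirement is easy to check before a run. The sketch below is a hypothetical pre-flight helper that is not part of MACARON. It assumes an uncompressed VCF file and a `.fai` index of the reference FASTA (as written by `samtools faidx`), and reports the chromosome names used in the VCF that are absent from the reference. The function names are made up for illustration.

```python
def vcf_contigs(vcf_path):
    """Collect the CHROM values used by the records of an uncompressed VCF."""
    contigs = set()
    with open(vcf_path) as handle:
        for line in handle:
            # Skip meta-information lines, the column header and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            contigs.add(line.split("\t", 1)[0])
    return contigs


def fai_contigs(fai_path):
    """Collect the sequence names listed in the first column of a .fai index."""
    with open(fai_path) as handle:
        return {line.split("\t", 1)[0] for line in handle if line.strip()}


def unknown_contigs(vcf_path, fai_path):
    """Return VCF chromosome names that do not occur in the reference index."""
    return sorted(vcf_contigs(vcf_path) - fai_contigs(fai_path))
```

A non-empty result such as `['chr22']` against a reference that names the sequence `22` points to a `chr` prefix mismatch. The check does not compare full sequence dictionaries (lengths, order), so it covers only the first item of the list above.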

### Default Options

For a full list of MACARON executable options, run:

    python MACARON -h

By default, MACARON depends on the `GLOBAL VARIABLES` set in the script before the run:

    ## GLOBAL VARIABLES (IMPORTANT: You can set the default values here)
    GATK="/home/wuk/software/GenomeAnalysisTK.jar"
    #GATK="/home/wuk/software/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar"
    HG_REF="/home/wuk/Working/gnme_refrnces/Homo_sapiens_assembly19.fasta"
    SNPEFF="/home/wuk/software/snpEff/snpEff.jar"
    SNPEFF_HG="GRCh37.75" ## SnpEff genome version

To run MACARON with __GATK <4.0__ versions, simply type:

    python MACARON -i test_input.vcf

If running with __GATK >= 4.0__ versions, make the following changes:

    #GATK="/home/wuk/software/GenomeAnalysisTK.jar"
    GATK="/home/wuk/software/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar"
    HG_REF="/home/wuk/Working/gnme_refrnces/Homo_sapiens_assembly19.fasta"
    SNPEFF="/home/wuk/software/snpEff/snpEff.jar"
    SNPEFF_HG="GRCh37.75" ## SnpEff genome version

and run with:

    python MACARON -i test_input.vcf --gatk4
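
When MACARON is driven from another script, the two invocations above can be produced by one small wrapper. The following is only a sketch under stated assumptions: `missing_paths` and `macaron_command` are hypothetical helper names, and the only MACARON options used are `-i` and `--gatk4` as documented above.

```python
import os


def missing_paths(settings):
    """Return the names of settings whose path does not exist on disk."""
    return sorted(name for name, path in settings.items()
                  if not os.path.exists(path))


def macaron_command(vcf, gatk4=False, script="MACARON"):
    """Build the argument list for one MACARON run."""
    command = ["python", script, "-i", vcf]
    if gatk4:
        # Required when the GATK variable points to a GATK >= 4.0 jar.
        command.append("--gatk4")
    return command


if __name__ == "__main__":
    # Example values copied from the GLOBAL VARIABLES block; adjust to your system.
    settings = {
        "GATK": "/home/wuk/software/GenomeAnalysisTK.jar",
        "HG_REF": "/home/wuk/Working/gnme_refrnces/Homo_sapiens_assembly19.fasta",
        "SNPEFF": "/home/wuk/software/snpEff/snpEff.jar",
    }
    print("Missing files:", missing_paths(settings))
    print(" ".join(macaron_command("test_input.vcf", gatk4=False)))
```

Checking the three file paths first gives a clearer error than a failure inside GATK or SnpEff. The resulting list can be passed to `subprocess.call` once no path is reported missing.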

### demo Folder

To help verify a successful installation, MACARON includes a small demo data set:

* *variants_of_interest.vcf* – a test VCF file to check the functionality of MACARON
* *MACARON_output.txt* – the output file generated by running MACARON
* *sub1.chr22_21349676-21349677.sample02.bam* – a subset of a BAM file used as input for MACARON_validate.sh
* *MACARON_validate.txt* – the output file with read count information of the concerned pcSNV in sample02 (in this case).

(All files are referenced to hg19.)

`cd` to the `demo` folder and run:

    python ../MACARON -i variants_of_interest.vcf

MACARON_output.txt is the default output file name of MACARON. It can be changed with the `-o` option:

    python ../MACARON -i variants_of_interest.vcf -o variants_of_interest.txt

### Advanced Options

MACARON can also be run with the paths set directly on the command line:

```bash
python ../MACARON -i variants_of_interest.vcf --GATK /home/wuk/software/GenomeAnalysisTK.jar --HG_REF /home/wuk/Working/gnme_refrnces/Homo_sapiens_assembly19.fasta --SNPEFF /home/wuk/software/snpEff/snpEff.jar --SNPEFF_HG GRCh37.75
```

* For __GATK >= 4.0__ versions:

```bash
python ../MACARON -i variants_of_interest.vcf --gatk4 --GATK /home/wuk/software/ --HG_REF /home/wuk/Working/gnme_refrnces/Homo_sapiens_assembly19.fasta --SNPEFF /home/wuk/software/snpEff/snpEff.jar --SNPEFF_HG GRCh37.75
```

MACARON can add additional fields, besides the default ones (see [MACARON Reporting Format](#macaron-reporting-format)), by using the `-f` option:

* `-f CSQ` (if the input VCF file is additionally annotated with VEP, the output txt file also contains the same complete annotation for each variant record)

* `-f EFF` (if the user wants SnpEff annotations in the output txt file), or `-f ANN` (if SnpEff is used without the `-formatEff` option)

* `-f QUAL,DP,AF,Func.refGene,Gene.refGene,GeneDetail.refGene` (this carries any other default annotations of the input VCF file and of ANNOVAR over to the output txt file)

`-f` accepts several comma-separated fields at once, e.g.,

* `-f CSQ,DP,Func.refGene`

or

* `-f FILTER,EFF,CSQ,AF`

The order of the fields in the output txt file follows the order of the INFO field headers given to `-f`.

```bash
python ../MACARON -i variants_of_interest.vcf --gatk4 --GATK /home/wuk/software/ --HG_REF /home/wuk/Working/gnme_refrnces/Homo_sapiens_assembly19.fasta --SNPEFF /home/wuk/software/snpEff/snpEff.jar --SNPEFF_HG GRCh37.75 -f QUAL,FILTER,SIFT_pred
```

Without the `-f` option, the `QUAL` field is output by default. To keep `QUAL` along with any other field, list `QUAL` in `-f` in addition to the other field headers: `-f QUAL,FILTER,SIFT_pred`. If only `-f SIFT_pred` is used, the `QUAL` field is replaced by the `SIFT_pred` field.

# MACARON Reporting Format

MACARON outputs a tabular text file with the following format specifications:

```
chr22 21349676 rs412470 T A LZTR1 423 T/T T/A T/T 0/0 0/1 0/0 MISSENSE S92T Tct Act ATt I 0 0
chr22 21349677 rs376419 C T LZTR1 423 C/C C/T C/C 0/0 0/1 0/0 MISSENSE S92F tCt tTt 0 I 0 0
```

Field Number | Field Name | Description
--- | --- | ---
1 | CHROM | Chromosome number
2 | POS | Chromosomal position / coordinates of SNV
3 | ID | dbSNP rsID
4 | REF | Reference base
5 | ALT | Alternate base
6 | Gene_Name | Name of the gene in which the SnpCluster is located
7 | QUAL | Quality of the ALT base called
8 | [SAMPLE NAME].GT | Genotype of each sample, in base notation (e.g., T/A) as well as binary notation (e.g., 0/1)
9 | Protein_coding_EFF | Functional effect of the variant on the protein
10 | AA-Change | Amino acid change by individual SNV
11 | REF-codon | Reference codon
12 | ALT-codon | Alternate codon
13 | ALT-codon_merge-2VAR | A new codon formed by the combination of two Alt-codons (pcSNV codon; see [MACARON](https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/bty382/4992149?redirectedFrom=fulltext))
14 | AA-Change-2VAR | Re-annotated amino acid formed by pcSNV codon
15 | ALT-codon_merge-3VAR | A new codon formed by the combination of three Alt-codons
16 | AA-Change-3VAR | Re-annotated amino acid formed by the combination of three Alt-codons
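
For downstream scripting, a default-format row can be split back into these named fields. The parser below is a sketch under assumptions that the guide does not state explicitly: columns are separated by whitespace, no field is empty, and all genotype columns sit between `QUAL` and `Protein_coding_EFF` (as in the example rows, where three samples produce six genotype columns). The function name `parse_default_row` is made up for illustration.

```python
FIXED_HEAD = ["CHROM", "POS", "ID", "REF", "ALT", "Gene_Name", "QUAL"]
FIXED_TAIL = [
    "Protein_coding_EFF", "AA-Change", "REF-codon", "ALT-codon",
    "ALT-codon_merge-2VAR", "AA-Change-2VAR",
    "ALT-codon_merge-3VAR", "AA-Change-3VAR",
]


def parse_default_row(line):
    """Split one default-format output row into a dict of named fields.

    Everything between the fixed leading and trailing columns is returned
    as the list of genotype columns under the key "genotypes".
    """
    tokens = line.split()
    if len(tokens) < len(FIXED_HEAD) + len(FIXED_TAIL):
        raise ValueError("row has too few columns: %r" % line)
    record = dict(zip(FIXED_HEAD, tokens[:len(FIXED_HEAD)]))
    record.update(zip(FIXED_TAIL, tokens[-len(FIXED_TAIL):]))
    record["genotypes"] = tokens[len(FIXED_HEAD):-len(FIXED_TAIL)]
    return record


if __name__ == "__main__":
    row = ("chr22 21349676 rs412470 T A LZTR1 423 T/T T/A T/T 0/0 0/1 0/0 "
           "MISSENSE S92T Tct Act ATt I 0 0")
    record = parse_default_row(row)
    print(record["AA-Change"], record["ALT-codon_merge-2VAR"], record["AA-Change-2VAR"])
```

For the first example row, this reports the per-SNV change `S92T`, the merged codon `ATt`, and the re-annotated amino acid `I`.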

This default MACARON output can be changed with the `-f` option. For example, if MACARON is run with `-f QUAL,FILTER,SIFT_pred`, the new output looks like:

Field Number | Field Name | Description
--- | --- | ---
1 | CHROM | Chromosome number
2 | POS | Chromosomal position / coordinates of SNV
3 | ID | dbSNP rsID
4 | REF | Reference base
5 | ALT | Alternate base
6 | Gene_Name | Name of the gene in which the SnpCluster is located
7 | QUAL | Quality of the ALT base called
8 | FILTER | Filter (PASS) tag
9 | SIFT_pred | Functional effect prediction of SNV on protein
10 | [SAMPLE NAME].GT | Genotype of each sample, in base notation (e.g., T/A) as well as binary notation (e.g., 0/1)
11 | Protein_coding_EFF | Functional effect of the variant on the protein
12 | AA-Change | Amino acid change by individual SNV
13 | REF-codon | Reference codon
14 | ALT-codon | Alternate codon
15 | ALT-codon_merge-2VAR | A new codon formed by the combination of two Alt-codons (pcSNV codon; see [MACARON](https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/bty382/4992149?redirectedFrom=fulltext))
16 | AA-Change-2VAR | Re-annotated amino acid formed by pcSNV codon
17 | ALT-codon_merge-3VAR | A new codon formed by the combination of three Alt-codons
18 | AA-Change-3VAR | Re-annotated amino acid formed by the combination of three Alt-codons
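
The relationship between the `-f` value and the resulting column layout can be written down compactly. The function below is an illustrative sketch inferred from the two tables in this section, not code taken from MACARON. The name `output_columns` is hypothetical, and `[SAMPLE NAME].GT` stands for however many genotype columns the samples produce.

```python
LEADING = ["CHROM", "POS", "ID", "REF", "ALT", "Gene_Name"]
TRAILING = [
    "[SAMPLE NAME].GT", "Protein_coding_EFF", "AA-Change", "REF-codon",
    "ALT-codon", "ALT-codon_merge-2VAR", "AA-Change-2VAR",
    "ALT-codon_merge-3VAR", "AA-Change-3VAR",
]


def output_columns(f_option=None):
    """Return the expected column names for a given `-f` value.

    Without `-f`, only QUAL is reported between the leading and trailing
    columns; with `-f`, the listed fields replace it, in the order given.
    """
    if f_option:
        fields = [name.strip() for name in f_option.split(",") if name.strip()]
    else:
        fields = ["QUAL"]
    return LEADING + fields + TRAILING


if __name__ == "__main__":
    for number, name in enumerate(output_columns("QUAL,FILTER,SIFT_pred"), start=1):
        print(number, name)
```

Calling `output_columns("SIFT_pred")` shows why `QUAL` has to be listed explicitly to be kept: the `-f` fields take the place of the default `QUAL` column rather than being appended to it.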

# Validating SNVs Existed on the Same Reads

**NB: You do not need to run this step if you already used a phased VCF file to run MACARON.**

To confirm the existence of multi-SNVs within the same genetic codon, an accessory BASH-shell script [MACARON_validate.sh](MACARON_validate.sh) calculates the read count information of the affected bases. This script requires as input a subset of the BAM files (the same ones used to generate the input VCF file) covering 50 bps over each SnpCluster.

A subset of any BAM file can be generated with the following command:

`samtools view -hb -L sub1.bed sample02.bam > sub1.chr22_21349676-21349677.sample02.bam`

In this case, our large BAM file `sample02.bam` (not provided here) is subsetted as `sub1.chr22_21349676-21349677.sample02.bam` (see the [demo](demo) folder) for the position `chr22:21349676`. The output BAM file should follow the same naming format. The `sub1.bed` file has one tab-separated line:

`chr22 21349676`

representing the first position of the SnpCluster (SNV1 only).
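
When several SnpClusters have to be validated, the BED line, the output file name and the `samtools` arguments can be generated rather than typed. The helpers below are a sketch. The naming pattern `sub<N>.<chr>_<first>-<last>.<sample>.bam` is inferred from the single demo file name, and all function names are hypothetical.

```python
def bed_line(chrom, first_pos):
    """Return the one-line, tab-separated BED content for a SnpCluster."""
    return "{}\t{}".format(chrom, first_pos)


def subset_bam_name(index, chrom, first_pos, last_pos, sample):
    """Return the subset BAM file name in the format used by the demo data."""
    return "sub{}.{}_{}-{}.{}.bam".format(index, chrom, first_pos, last_pos, sample)


def samtools_view_args(bed_path, bam_path):
    """Return the samtools arguments that write a BAM subset to stdout."""
    # -h keeps the header, -b selects BAM output, -L restricts to the BED regions.
    return ["samtools", "view", "-hb", "-L", bed_path, bam_path]


if __name__ == "__main__":
    print(bed_line("chr22", 21349676))
    print(subset_bam_name(1, "chr22", 21349676, 21349677, "sample02"))
    print(" ".join(samtools_view_args("sub1.bed", "sample02.bam")))
```

To run the command from Python, the argument list can be passed to `subprocess.call` with `stdout` set to the output file opened in binary write mode, which mirrors the `>` redirection shown above.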

Once the subset BAM file(s) are generated, run MACARON_validate.sh:

`MACARON_validate.sh sub1.chr22_21349676-21349677.sample02.bam`

This generates an output text file (`MACARON_validate.txt`) that the user can use for further analysis:

    sub1 chr22:21349676-21349677 sample02
    1 AA
    1 T
    11 AT
    14 TC
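
The read counts in this report can be summarised with a few lines of Python. The sketch below assumes the layout shown above (one header line followed by `count bases` pairs). It also assumes that, for the demo SnpCluster (T>A at 21349676 and C>T at 21349677), `AT` denotes reads carrying both alternate bases and `TC` reads carrying both reference bases. That reading is an interpretation made for this example, and the function names are hypothetical.

```python
DEMO_REPORT = """\
sub1 chr22:21349676-21349677 sample02
1 AA
1 T
11 AT
14 TC
"""


def parse_validate_report(text):
    """Return (header tokens, {bases: read count}) from a validate report."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    counts = {bases: int(count) for count, bases in rows}
    return header, counts


def haplotype_fraction(counts, haplotype):
    """Fraction of reads showing `haplotype` among reads of the same length.

    Reads reporting fewer bases (here the single "T") are left out, because
    they do not cover every position of the SnpCluster.
    """
    spanning = {b: n for b, n in counts.items() if len(b) == len(haplotype)}
    total = sum(spanning.values())
    return spanning.get(haplotype, 0) / float(total) if total else 0.0


if __name__ == "__main__":
    header, counts = parse_validate_report(DEMO_REPORT)
    print(header, counts)
    print("AT fraction: %.2f" % haplotype_fraction(counts, "AT"))
```

Under these assumptions, 11 of the 26 reads spanning both positions carry the two alternate bases together, which supports both SNVs lying on the same haplotype in sample02.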

See the [MACARON-GenMed-LabEx Wiki page](https://github.com/waqasuddinkhan/MACARON-GenMed-LabEx/wiki) for more details and for interpretations of the [demo](demo) data.

# References

__1.__ [Van der Auwera G.A., et al. (2013) From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline, Curr Protoc Bioinformatics, 43:11.10.1-11.10.33](https://currentprotocols.onlinelibrary.wiley.com/doi/abs/10.1002/0471250953.bi1110s43).

__2.__ [Cingolani, P., et al. (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3, Fly, 6, 80-92](https://www.tandfonline.com/doi/full/10.4161/fly.19695).

__3.__ [McLaren, W., et al. (2010) Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor, Bioinformatics, 26, 2069-2070](https://academic.oup.com/bioinformatics/article/26/16/2069/217748).

__4.__ [Wang, K., Li, M. and Hakonarson, H. (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data, Nucleic Acids Res, 38, e164](https://academic.oup.com/nar/article/38/16/e164/1749458).

# Citation

If you use [MACARON](https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/bty382/4992149?redirectedFrom=fulltext) in your research, please cite:

*Khan W. et al. MACARON: a python framework to identify and re-annotate multi-base affected codons in whole genome/exome sequence data, Bioinformatics 2018*

*CONTACT: david-alexandre.tregouet@inserm.fr; waqasnayab@gmail.com*

*VERSION: 0.7*
*VERSION DATE: September 5, 2018*