Mercurial > repos > saskia-hiltemann > annovar
changeset 5:4600be69b96f draft
Added databases 1000g2015aug, SPIDEX, avsnp138, avsnp142, exac03
author | saskia-hiltemann |
---|---|
date | Thu, 01 Oct 2015 04:24:45 -0400 |
parents | e423536a0780 |
children | a3b16fd125c3 |
files | tools/README tools/annovar/annovar.sh tools/annovar/annovar.xml tools/tool-data/annovar.loc.sample tools/tool_data_table_conf.xml.sample tools/tool_dependencies.xml tools/tools/annovar/annovar.sh tools/tools/annovar/annovar.xml |
diffstat | 8 files changed, 2177 insertions(+), 22 deletions(-) [+] |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/README Thu Oct 01 04:24:45 2015 -0400 @@ -0,0 +1,242 @@ +ANNOVAR needs to be installed manually in the following way: + + +1) If you already have ANNOVAR installed on your system, simply edit the tool-data/annovar.loc file to reflect locations of + the perl scripts (annotate_variation.pl and convert2annovar.pl) and humandb directory (directory containing the annovar database files) +1b) Restart galaxy instance for changes in .loc file to take effect + + +2) If you do not have ANNOVAR installed, request annovar download and sign license here: + http://www.openbioinformatics.org/annovar/annovar_download_form.php + + 3) Once downloaded, install annovar per the installation instructions and edit annovar.loc file to reflect location of directory containing perl scripts. + tool uses annotate_variation.pl and convert2annovar.pl + + 4) Then download all desired databases for all desired builds as follows: + annotate_variation.pl -downdb -buildver <build> [-webfrom annovar] <database> <humandb> + + where <humandb> is location where all database files should be stored + and <database> is the database file to download, e.g. refGene (see bottom of document for all available database files at the time of writing this tool) + and <build> can be hg18 or hg19 for humans, also other organisms available. + + list of all available databases can be found here: http://www.openbioinformatics.org/annovar/annovar_db.html + + 5) edit the tool-data/annovar.loc file to reflect location of humandb folder + 5b) restart galaxy instance for changes in .loc file to take effect + +6) Tool uses cgatools join for combining of files, this should be installed automatically with repository. If not, get a copy from Complete Genomics directly: + wget http://sourceforge.net/projects/cgatools/files/1.7.1/cgatools-1.7.1.5-linux_binary-x86_64.tar.gz + tar xvzf cgatools-1.7.1.5-linux_binary-x86_64.tar.gz + + and place the "cgatools" binary found in bin/ directory on your $PATH + + +list of files in my own humandb folder: + +hg18_ALL.sites.2012_04.txt +hg18_ALL.sites.2012_04.txt.idx +hg18_CEU.sites.2010_07.txt +hg18_CEU.sites.2010_07.txt.idx +hg18_JPTCHB.sites.2010_07.txt +hg18_JPTCHB.sites.2010_07.txt.idx +hg18_YRI.sites.2010_07.txt +hg18_YRI.sites.2010_07.txt.idx +hg18_cg46.txt +hg18_cg46.txt.idx +hg18_cg69.txt +hg18_cg69.txt.idx +hg18_cytoBand.txt +hg18_dgvMerged.txt +hg18_ensGene.txt +hg18_ensGeneMrna.fa +hg18_esp5400_aa.txt +hg18_esp5400_aa.txt.idx +hg18_esp5400_all.txt +hg18_esp5400_all.txt.idx +hg18_esp6500_aa.txt +hg18_esp6500_aa.txt.idx +hg18_esp6500_all.txt +hg18_esp6500_all.txt.idx +hg18_esp6500_ea.txt +hg18_esp6500_ea.txt.idx +hg18_esp6500si_aa.txt +hg18_esp6500si_aa.txt.idx +hg18_esp6500si_all.txt +hg18_esp6500si_all.txt.idx +hg18_esp6500si_ea.txt +hg18_esp6500si_ea.txt.idx +hg18_example_db_generic.txt +hg18_example_db_gff3.txt +hg18_genomicSuperDups.txt +hg18_gerp++gt2.txt +hg18_gerp++gt2.txt.idx +hg18_gwasCatalog.txt +hg18_kgXref.txt +hg18_knownGene.txt +hg18_knownGeneMrna.fa +hg18_ljb2_fathmm.txt +hg18_ljb2_fathmm.txt.idx +hg18_ljb2_gerp++.txt +hg18_ljb2_gerp++.txt.idx +hg18_ljb2_ma.txt +hg18_ljb2_ma.txt.idx +hg18_ljb2_mt.txt +hg18_ljb2_mt.txt.idx +hg18_ljb2_phylop.txt +hg18_ljb2_phylop.txt.idx +hg18_ljb2_pp2hdiv.txt +hg18_ljb2_pp2hdiv.txt.idx +hg18_ljb2_pp2hvar.txt +hg18_ljb2_pp2hvar.txt.idx +hg18_ljb2_sift.txt +hg18_ljb2_sift.txt.idx +hg18_ljb2_siphy.txt +hg18_ljb2_siphy.txt.idx +hg18_phastConsElements44way.txt +hg18_refGene.txt +hg18_refGeneMrna.fa +hg18_refLink.txt +hg18_snp128.txt +hg18_snp128.txt.idx +hg18_snp128NonFlagged.txt +hg18_snp128NonFlagged.txt.idx +hg18_snp129.txt +hg18_snp129.txt.idx +hg18_snp129NonFlagged.txt +hg18_snp129NonFlagged.txt.idx +hg18_snp130.txt +hg18_snp130.txt.idx +hg18_snp130NonFlagged.txt +hg18_snp130NonFlagged.txt.idx +hg18_snp131.txt +hg18_snp131.txt.idx +hg18_snp131NonFlagged.txt +hg18_snp131NonFlagged.txt.idx +hg18_snp132.txt +hg18_snp132.txt.idx +hg18_snp132NonFlagged.txt +hg18_snp132NonFlagged.txt.idx +hg18_tfbsConsSites.txt +hg19_AFR.sites.2012_04.txt +hg19_AFR.sites.2012_04.txt.idx +hg19_ALL.sites.2010_11.txt +hg19_ALL.sites.2010_11.txt.idx +hg19_ALL.sites.2012_02.txt +hg19_ALL.sites.2012_02.txt.idx +hg19_ALL.sites.2012_04.txt +hg19_ALL.sites.2012_04.txt.idx +hg19_AMR.sites.2012_04.txt +hg19_AMR.sites.2012_04.txt.idx +hg19_ASN.sites.2012_04.txt +hg19_ASN.sites.2012_04.txt.idx +hg19_EUR.sites.2012_04.txt +hg19_EUR.sites.2012_04.txt.idx +hg19_avsift.txt +hg19_avsift.txt.idx +hg19_cg46.txt +hg19_cg46.txt.idx +hg19_cg69.txt +hg19_cg69.txt.idx +hg19_clinvar_20131105.txt +hg19_clinvar_20131105.txt.idx +hg19_cosmic61.txt +hg19_cosmic61.txt.idx +hg19_cosmic63.txt +hg19_cosmic63.txt.idx +hg19_cosmic64.txt +hg19_cosmic64.txt.idx +hg19_cosmic65.txt +hg19_cosmic65.txt.idx +hg19_cosmic67.txt +hg19_cytoBand.txt +hg19_dgvMerged.txt +hg19_ensGene.txt +hg19_ensGeneMrna.fa +hg19_esp5400_aa.txt +hg19_esp5400_aa.txt.idx +hg19_esp5400_all.txt +hg19_esp5400_all.txt.idx +hg19_esp6500_aa.txt +hg19_esp6500_aa.txt.idx +hg19_esp6500_all.txt +hg19_esp6500_all.txt.idx +hg19_esp6500_ea.txt +hg19_esp6500_ea.txt.idx +hg19_esp6500si_aa.txt +hg19_esp6500si_aa.txt.idx +hg19_esp6500si_all.txt +hg19_esp6500si_all.txt.idx +hg19_esp6500si_ea.txt +hg19_esp6500si_ea.txt.idx +hg19_genomicSuperDups.txt +hg19_gerp++gt2.txt +hg19_gerp++gt2.txt.idx +hg19_gwasCatalog.txt +hg19_kgXref.txt +hg19_knownGene.txt +hg19_knownGeneMrna.fa +hg19_ljb2_fathmm.txt +hg19_ljb2_fathmm.txt.idx +hg19_ljb2_gerp++.txt +hg19_ljb2_gerp++.txt.idx +hg19_ljb2_ma.txt +hg19_ljb2_ma.txt.idx +hg19_ljb2_mt.txt +hg19_ljb2_phylop.txt +hg19_ljb2_phylop.txt.idx +hg19_ljb2_pp2hdiv.txt +hg19_ljb2_pp2hdiv.txt.idx +hg19_ljb2_pp2hvar.txt +hg19_ljb2_pp2hvar.txt.idx +hg19_ljb2_sift.txt +hg19_ljb2_sift.txt.idx +hg19_ljb2_siphy.txt +hg19_nci60.txt +hg19_nci60.txt.idx +hg19_phastConsElements46way.txt +hg19_refGene.txt +hg19_refGeneMrna.fa +hg19_refLink.txt +hg19_snp130.txt +hg19_snp130.txt.idx +hg19_snp130NonFlagged.txt +hg19_snp130NonFlagged.txt.idx +hg19_snp131.txt +hg19_snp131NonFlagged.txt +hg19_snp131NonFlagged.txt.idx +hg19_snp132.txt +hg19_snp132.txt.idx +hg19_snp132NonFlagged.txt +hg19_snp132NonFlagged.txt.idx +hg19_snp135.txt +hg19_snp135NonFlagged.txt +hg19_snp135NonFlagged.txt.idx +hg19_snp137.txt +hg19_snp137NonFlagged.txt +hg19_snp137NonFlagged.txt.idx +hg19_tfbsConsSites.txt + + +obsolete functional impact database files: (disabled by default) +hg18_avsift.txt +hg18_avsift.txt.idx +hg19_ljb_all.txt +hg19_ljb_all.txt.idx +hg19_ljb_lrt.txt +hg19_ljb_lrt.txt.idx +hg19_ljb_mt.txt +hg19_ljb_mt.txt.idx +hg19_ljb_phylop.txt +hg19_ljb_phylop.txt.idx +hg19_ljb_pp2.txt +hg19_ljb_pp2.txt.idx +hg18_ljb_all.txt +hg18_ljb_all.txt.idx +hg18_ljb_lrt.txt +hg18_ljb_lrt.txt.idx +hg18_ljb_mt.txt +hg18_ljb_mt.txt.idx +hg18_ljb_phylop.txt +hg18_ljb_phylop.txt.idx +hg18_ljb_pp2.txt +hg18_ljb_pp2.txt.idx
--- a/tools/annovar/annovar.sh Thu Apr 10 09:31:26 2014 -0400 +++ b/tools/annovar/annovar.sh Thu Oct 01 04:24:45 2015 -0400 @@ -179,7 +179,7 @@ ################################# -set -- `getopt -n$0 -u -a --longoptions="inputfile: buildver: humandb: varfile: VCF: chrcol: startcol: endcol: refcol: obscol: vartypecol: convertcoords: geneanno: hgvs: verdbsnp: tfbs: mce: cytoband: segdup: dgv: gwas: ver1000g: cg46: cg69: impactscores: newimpactscores: otherinfo: esp: gerp: cosmic61: cosmic63: cosmic64: cosmic65: cosmic67: cosmic68: clinvar: nci60: outall: outfilt: outinvalid: scriptsdir: dorunannovar: dofilter: filt_dbsnp: filt1000GALL: filt1000GAFR: filt1000GAMR: filt1000GASN: filt1000GEUR: filtESP6500ALL: filtESP6500EA: filtESP6500AA: filtcg46: filtcg69: dummy:" "h:" "$@"` || usage +set -- `getopt -n$0 -u -a --longoptions="inputfile: buildver: humandb: varfile: VCF: chrcol: startcol: endcol: refcol: obscol: vartypecol: convertcoords: geneanno: hgvs: verdbsnp: tfbs: mce: cytoband: segdup: dgv: gwas: ver1000g: cg46: cg69: impactscores: newimpactscores: otherinfo: esp: exac03: spidex: gonl: gerp: cosmic61: cosmic63: cosmic64: cosmic65: cosmic67: cosmic68: clinvar: nci60: outall: outfilt: outinvalid: scriptsdir: dorunannovar: dofilter: filt_dbsnp: filt1000GALL: filt1000GAFR: filt1000GAMR: filt1000GASN: filt1000GEUR: filtESP6500ALL: filtESP6500EA: filtESP6500AA: filtcg46: filtcg69: dummy:" "h:" "$@"` || usage [ $# -eq 0 ] && usage @@ -216,6 +216,9 @@ --otherinfo) otherinfo=$2;shift;; --scriptsdir) scriptsdirtmp=$2;shift;; # Y or N --esp) esp=$2;shift;; # Y or N + --exac03) exac03=$2;shift;; + --gonl) gonl=$2;shift;; + --spidex) spidex=$2;shift;; --gerp) gerp=$2;shift;; # Y or N --cosmic61) cosmic61=$2;shift;; # Y or N --cosmic63) cosmic63=$2;shift;; # Y or N @@ -524,6 +527,7 @@ OFS="\t"; }{ if(FNR>1) { + gsub(/chr/,"",$"'"${chrcol}"'") if( $"'"${vartypecol}"'" == "snp" ){ $"'"${startcol}"'" += 1 }; if( $"'"${vartypecol}"'" == "ins" ){ $"'"${refcol}"'" = "-" }; if( $"'"${vartypecol}"'" == "del" ){ $"'"${startcol}"'" +=1; $"'"${obscol}"'" = "-" }; @@ -536,13 +540,15 @@ }' $infile > annovarinput #remove any "chr" prefixes - sed -i 's/chr//g' annovarinput + #sed -i '2,$s/chr//g' annovarinput awk 'BEGIN{ FS="\t"; - OFS="\t"; + OFS="\t"; }{ + if(FNR>=1) { + gsub(/chr/,"",$"'"${chrcol}"'") if( $"'"${vartypecol}"'" == "snp" ){ $"'"${startcol}"'" += 1 }; if( $"'"${vartypecol}"'" == "ins" ){ $"'"${refcol}"'" = "-" }; if( $"'"${vartypecol}"'" == "del" ){ $"'"${startcol}"'" +=1; $"'"${obscol}"'" = "-" }; @@ -555,7 +561,7 @@ }' $infile > originalfile #remove any "chr" prefixes - sed -i 's/chr//g' originalfile + #sed -i '2,$s/chr//g' originalfile sed -i 's/omosome/chromosome/g' originalfile @@ -565,16 +571,16 @@ FS="\t"; OFS="\t"; }{ - if(FNR>1) { - printf("%s\t%s\t%s\t%s\t%s\n",$"'"${chrcol}"'",$"'"${startcol}"'",$"'"${endcol}"'",$"'"${refcol}"'",$"'"${obscol}"'"); + if(FNR>1) { + printf("%s\t%s\t%s\t%s\t%s\n",$"'"${chrcol}"'",$"'"${startcol}"'",$"'"${endcol}"'",$"'"${refcol}"'",$"'"${obscol}"'"); } } END{ }' $infile > annovarinput #remove any "chr" prefixes - sed -i 's/chr//g' annovarinput - sed 's/chr//g' $infile > originalfile + sed -i '2,$s/chr//g' annovarinput + sed '2,$s/chr//g' $infile > originalfile sed -i 's/omosome/chromosome/g' originalfile fi @@ -745,9 +751,15 @@ echo -e "\ndbSNP region Annotation, version: $version" $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype ${version} annovarinput $humandb 2>&1 + columnname=${version} + if [[ $columnname == snp* ]] + then + columnname="db${version}" + fi + annovarout=annovarinput.${buildver}_${version}_dropped - sed -i '1i\db\tdb'${version}'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout - joinresults originalfile $annovarout 3 4 5 6 7 B.db${version} + sed -i '1i\db\t'${columnname}'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.${columnname} done @@ -767,6 +779,8 @@ g1000_colheader_AMR="${version}_AMR" g1000_colheader_ASN="${version}_ASN" g1000_colheader_EUR="${version}_EUR" + g1000_colheader_EAS="${version}_EAS" + g1000_colheader_SAS="${version}_SAS" g1000_colheader_CEU="${version}_CEU" g1000_colheader_YRI="${version}_YRI" g1000_colheader_JPTCHB="${version}_JPTCHB" @@ -775,6 +789,8 @@ doAMR="N" doAFR="N" doASN="N" + doEAS="N" + doSAS="N" doEUR="N" doCEU="N" doYRI="N" @@ -791,7 +807,30 @@ doAFR="Y" doASN="Y" doEUR="Y" - fi + fi + elif [ $version == "1000g2014oct" ] + then + fileID="2014_10" + doALL="Y" + doAMR="Y" + doAFR="Y" + doEUR="Y" + doEAS="Y" + if [ $buildver == "hg19" ] + then + doSAS="Y" + fi + + elif [[ $version == "1000g2015aug" && $buildver == "hg19" ]] + then + fileID="2015_08" + doALL="Y" + doAMR="Y" + doAFR="Y" + doEUR="Y" + doEAS="Y" + doSAS="Y" + elif [[ $version == "1000g2012feb" && $buildver == "hg19" ]] then fileID="2012_02" @@ -855,7 +894,29 @@ sed -i '1i\db\t'$g1000_colheader_ASN'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout joinresults originalfile $annovarout 3 4 5 6 7 B.$g1000_colheader_ASN fi - + + # EAS + if [ $doEAS == "Y" ] + then + echo -e "\n1000Genomes EAS" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype "${version}_eas" annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_EAS.sites.${fileID}_dropped + sed -i '1i\db\t'$g1000_colheader_EAS'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$g1000_colheader_EAS + fi + + # SAS + if [ $doSAS == "Y" ] + then + echo -e "\n1000Genomes SAS" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype "${version}_sas" annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_SAS.sites.${fileID}_dropped + sed -i '1i\db\t'$g1000_colheader_SAS'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$g1000_colheader_SAS + fi + # EUR if [ $doEUR == "Y" ] then @@ -937,6 +998,8 @@ $scriptsdir/annotate_variation.pl --filter --buildver $buildver $otherinfo -dbtype ljb2_pp2hvar annovarinput $humandb 2>&1 annovarout=annovarinput.${buildver}_ljb2_pp2hvar_dropped + + head $annovarout sed -i '1i\db\tLJB2_PolyPhen2_HVAR\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout joinresults originalfile $annovarout 3 4 5 6 7 B.LJB2_PolyPhen2_HVAR fi @@ -1191,7 +1254,76 @@ done fi + + #ExAC-03 database + if [ $exac03 == "Y" ] + then + echo -e "\nExAC03 Annotation" + $scriptsdir/annotate_variation.pl --filter -otherinfo --buildver $buildver --otherinfo -dbtype exac03 annovarinput $humandb 2>&1 + + #annovarout=annovarinput.${buildver}_exac03_dropped + + # split allelefrequency column into several columns, one per population + awk 'BEGIN{FS="\t" + OFS="\t" + }{ + gsub(",","\t",$2) + print $0 + }END{}' annovarinput.${buildver}_exac03_dropped > $annovarout + + sed -i '1i\db\tExAC_Freq\tExAC_AFR\tExAC_AMR\tExAC_EAS\tExAC_FIN\tExAC_NFE\tExAC_OTH\tExAC_SAS\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 10 11 12 13 14 B.ExAC_Freq,B.ExAC_AFR,B.ExAC_AMR,B.ExAC_EAS,B.ExAC_FIN,B.ExAC_NFE,B.ExAC_OTH,B.ExAC_SAS + fi + #GoNL database + if [ $gonl == "Y" ] + then + + if [ $buildver == "hg19" ] + then + echo -e "\nGoNL Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver --otherinfo -dbtype generic -genericdbfile ${buildver}_gonl.txt annovarinput $humandb 2>&1 + + ls + annovarout=annovarinput.${buildver}_generic_dropped + + head $annovarout + + sed -i '1i\db\tGoNL\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.GoNL + + fi + + fi + + #SPIDEX database + if [ $spidex == "Y" ] + then + + if [ $buildver == "hg19" ] + then + echo -e "\nSPIDEX Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver --otherinfo -dbtype spidex annovarinput $humandb 2>&1 + + # split allelefrequency column into several columns, one per population + awk 'BEGIN{FS="\t" + OFS="\t" + }{ + gsub(",","\t",$2) + print $0 + }END{}' annovarinput.${buildver}_spidex_dropped > $annovarout + + #annovarout=annovarinput.${buildver}_spidex_dropped + #head $annovarout + + sed -i '1i\db\tSPIDEX_dpsi_max_tissue\tSPIDEX_dpsi_zscore\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 4 5 6 7 8 B.SPIDEX_dpsi_max_tissue,B.SPIDEX_dpsi_zscore + + fi + + fi + + #GERP++ if [ $gerp == "Y" ] then @@ -1271,6 +1403,16 @@ fi + if [[ $cosmic70 == "Y" && $buildver == "hg19" ]] + then + echo -e "\nCOSMIC70 Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cosmic70 annovarinput $humandb 2>&1 + + annovarout="annovarinput.${buildver}_cosmic70_dropped" + sed -i '1i\db\tCOSMIC70\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.COSMIC70 + + fi if [[ $clinvar == "Y" && $buildver == "hg19" ]] then @@ -1356,6 +1498,10 @@ cp originalfile_coords $outfile_all cp annovarinput.invalid_input $outfile_invalid 2>&1 + + sed -i 's/chrchr/chr/g' $outfile_all + sed -i 's/chrchr/chr/g' $outfile_invalid + fi #if $dorunannovar
--- a/tools/annovar/annovar.xml Thu Apr 10 09:31:26 2014 -0400 +++ b/tools/annovar/annovar.xml Thu Oct 01 04:24:45 2015 -0400 @@ -1,4 +1,4 @@ -<tool id="AnnovarShed" name="ANNOVAR" version="2013aug"> +<tool id="AnnovarShed" name="ANNOVAR" version="2015may"> <description> Annotate a file using ANNOVAR </description> <requirements> @@ -8,6 +8,9 @@ <command interpreter="bash"> annovar.sh --esp ${esp} + --gonl ${gonl} + --exac03 ${exac03} + --spidex ${spidex} --gerp ${gerp} --cosmic61 ${cosmic61} --cosmic63 ${cosmic63} @@ -67,8 +70,9 @@ <inputs> <param name="dorun" type="hidden" value="Y"/> <!-- will add tool in future to filter on annovar columns, then will call annovar.sh with dorun==N --> - <param name="reference" type="select" label="Reference"> + <param name="reference" type="select" label="Reference"> <options from_data_table="annovar_loc" /> + <filter type="data_meta" ref="infile" key="dbkey" column="0"/> </param> <param name="infile" type="data" label="Select file to annotate" help="Must be CG varfile or a tab-separated file with a 1 line header"/> @@ -117,7 +121,7 @@ <!-- filter-based annotation --> - <param name="verdbsnp" type="select" label="Select dbSNP version(s) to annotate with" multiple="true" display="checkboxes" optional="true" help="SNPs in dbSNP may be flagged as Clinically Associated, Select the NonFlagged version if you do not wish to annotate with these SNPs "> + <param name="verdbsnp" type="select" label="Select dbSNP version(s) to annotate with" multiple="true" display="checkboxes" optional="true" help="avSNP are reformatted dbSNP databases with one variant per line and left-normalized indels (for a more detailed discussion read this article: http://annovar.openbioinformatics.org/en/latest/articles/dbSNP/). Flagged SNPs include SNPs less than 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSnp as clinically associated"> <option value="snp128" > 128 (hg18/hg19) </option> <option value="snp128NonFlagged"> 128 NonFlagged </option> <option value="snp129" > 129 (hg18/hg19) </option> @@ -134,9 +138,13 @@ <option value="snp137NonFlagged"> 137 NonFlagged </option> <option value="snp138" > 138 (hg19 only) </option> <option value="snp138NonFlagged"> 138 NonFlagged </option> + <option value="avsnp138" > 138 (avSNP, hg19 only) </option> + <option value="avsnp142" > 142 (avSNP, hg19 only) </option> </param> <param name="ver1000g" type="select" label="Select 1000Genomes Annotation(s)" multiple="true" display="checkboxes" optional="true" help="2012april database for ALL populations was converted to hg18 using the UCSC liftover program"> + <option value="1000g2015aug"> 2015aug (hg19) (6 populations: AMR,AFR,EUR,EAS,SAS,ALL) </option> + <option value="1000g2014oct"> 2014oct (hg18/hg19) (6 populations: AMR,AFR,EUR,EAS,SAS,ALL) </option> <option value="1000g2012apr"> 2012apr (hg18/hg19) (5 populations: AMR,AFR,ASN,CEU,ALL) </option> <option value="1000g2012feb"> 2012feb (hg19) (1 population: ALL) </option> <option value="1000g2010nov"> 2010nov (hg19) (1 population: ALL) </option> @@ -159,21 +167,28 @@ <option value="esp5400_aa" > ESP5400 African Americans </option> </param> + <param name="exac03" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with ExAC 03? (The Exome Aggregation Consortium)" help=" The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a wide variety of large-scale sequencing projects, and to make summary data available for the wider scientific community. The data set provided on this website spans 60,706 unrelated individuals sequenced as part of various disease-specific and population genetic studies. See http://exac.broadinstitute.org/faq for more information."/> + <param name="gonl" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with GoNL data? (hg19 only)"/> + <param name="spidex" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with SPIDEX database? (hg19 only)" help="This dataset provides machine-learning prediction on how genetic variants affect RNA splicing. (Xiong et al, Science 2015)"/> + + <param name="gerp" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="GERP++ Annotation?" help="GERP identifies constrained elements in multiple alignments by quantifying substitution deficits (see http://mendel.stanford.edu/SidowLab/downloads/gerp/ for details) This option annotates those variants having GERP++>2 in human genome, as this threshold is typically regarded as evolutionarily conserved and potentially functional"/> <param name="clinvar" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="CLINVAR Annotation? (hg19 only)" help="version 2014-02-11. Annotations include Variant Clinical Significance (unknown, untested, non-pathogenic, probable-non-pathogenic, probable-pathogenic, pathogenic, drug-response, histocompatibility, other) and Variant disease name."/> <param name="nci60" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with NCI60? (hg19 only)" help="NCI-60 exome allele frequency data"/> <param name="cgfortysix" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Complete Genomics 46 Genomes?" help="Diversity Panel; 46 unrelated individuals"/> <param name="cgsixtynine" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Complete Genomics 69 Genomes?" help="Diversity Panel, Pedigree, YRI trio and PUR trio"/> - <param name="cosmic61" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC61? (hg19 only)"/> - <param name="cosmic63" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC63? (hg19 only)"/> - <param name="cosmic64" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC64? (hg19 only)"/> - <param name="cosmic65" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC65? (hg19 only)"/> - <param name="cosmic67" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC67? (hg19 only)"/> - <param name="cosmic68" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC68? (hg19 only)"/> + <param name="cosmic61" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 61? (hg19 only)"/> + <param name="cosmic63" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 63? (hg19 only)"/> + <param name="cosmic64" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 64? (hg19 only)"/> + <param name="cosmic65" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 65? (hg19 only)"/> + <param name="cosmic67" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 67? (hg19 only)"/> + <param name="cosmic68" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 68? (hg19 only)"/> + <param name="cosmic70" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 70? (hg19 only)"/> + + - <param name="newimpactscores" type="select" label="Select functional impact scores (LJB2)" multiple="true" display="checkboxes" optional="true" help="LJB refers to Liu, Jian, Boerwinkle paper in Human Mutation, pubmed ID 21520341. "> <option value="ljb2_sift"> SIFT score </option> <option value="ljb2_pp2hdiv"> PolyPhen2 HDIV score </option>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/tool-data/annovar.loc.sample Thu Oct 01 04:24:45 2015 -0400 @@ -0,0 +1,6 @@ +#loc file for annovar tool +# +# <columns>value, dbkey, name, ANNOVAR_scripts, ANNOVAR_humandb</columns> + +#hg18 hg18 hg18 [Human Mar. 2006 (NCBI36/hg18)] /path/to/annovarscripts /path/to/humandb +#hg19 hg19 hg19 [Human Feb. 2009 (GRCh37/hg19)] /path/to/annovarscripts /path/to/humandb
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/tool_data_table_conf.xml.sample Thu Oct 01 04:24:45 2015 -0400 @@ -0,0 +1,7 @@ +<!-- ANNOVAR files --> +<tables> +<table name="annovar_loc" comment_char="#"> +<columns>value, dbkey, name, ANNOVAR_scripts, ANNOVAR_humandb</columns> +<file path="tool-data/annovar.loc" /> +</table> +</tables>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/tool_dependencies.xml Thu Oct 01 04:24:45 2015 -0400 @@ -0,0 +1,23 @@ +<?xml version="1.0"?> +<tool_dependency> + <package name="cgatools" version="1.7"> + <install version="1.0"> + <actions> + <action type="download_by_url">http://sourceforge.net/projects/cgatools/files/1.7.1/cgatools-1.7.1.5-linux_binary-x86_64.tar.gz</action> + <action type="shell_command"> chmod a+x bin/cgatools</action> + <action type="move_file"> + <source>bin/cgatools</source> + <destination>$INSTALL_DIR/bin</destination> + </action> + <action type="set_environment"> + <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/bin</environment_variable> + <environment_variable name="PATH" action="prepend_to">$REPOSITORY_INSTALL_DIR</environment_variable> + </action> + </actions> + </install> + <readme> + Downloads and installs the cgatools binary. + </readme> + </package> +</tool_dependency> +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/tools/annovar/annovar.sh Thu Oct 01 04:24:45 2015 -0400 @@ -0,0 +1,1461 @@ +#!/bin/bash + +test="N" +dofilter="N" + +######################### +# DEFINE SOME +# FUNCTIONS +######################### + +function usage(){ + echo "usage: $0 todo" +} + +function runfilter(){ + ifile=$1 + columnname=$2 + threshold=$3 + + if [[ $threshold == "-1" ]] + then + echo "not filtering" + return + fi + + echo "filtering: $columnname, $threshold" + cat $ifile + + #get column number corresponding to column header + column=`awk 'BEGIN{ + FS="\t"; + col=-1 + }{ + if(FNR==1){ + for(i=1;i<=NF;i++){ + if($i == "'"${columnname}"'") + col=i + } + print col + } + }' $ifile ` + + if [ $column == -1 ] + then + echo "no such column, exiting" + return + fi + + #perform filtering using the threshold + awk 'BEGIN{ + FS="\t"; + OFS="\t"; + }{ + if(FNR==1) + print $0; + if(FNR>1){ + if( $"'"${column}"'" == "" ) # empty column, then print + print $0 + else if ("'"${threshold}"'" == "text"){} #if set to text dont check threshold + + else if ($"'"${column}"'" < "'"${threshold}"'") #else do check it + print $0 + } + }' $ifile > tmpfile + + mv tmpfile $ifile +} + +# arguments: originalfile,resultfile,chrcol,startcol,endcol,refcol,obscol,addcols +function joinresults(){ + ofile=$1 + rfile=$2 + colchr=$3 + colstart=$4 + colend=$5 + colref=$6 + colobs=$7 + addcols=$8 #e.g. "B.col1,B.col2" + + test="N" + + # echo "joining result with original file" + if [ $test == "Y" ] + then + echo "ofile: $ofile" + head $ofile + echo "rfile: $rfile" + head $rfile + fi + numlines=`wc $rfile | cut -d" " -f2` + + # if empty results file, just add header fields + if [[ ! -s $rfile ]] + then + dummycol=${addcols:2} + outputcol=${dummycol//",B."/" "} + numcommas=`echo "$addcols" | grep -o "," | wc -l` + + awk 'BEGIN{FS="\t";OFS="\t"}{ + if(FNR==1) + print $0,"'"$outputcol"'"; + else{ + printf $0 + for(i=0;i<="'"$numcommas"'"+1;i++) + printf "\t" + printf "\n" + } + }END{}' $ofile > tempofile + + mv tempofile $ofile + return + fi + + + #get input file column names for cgatools join + col_chr_name=`head -1 $rfile | cut -f${colchr}` + col_start_name=`head -1 $rfile | cut -f${colstart}` + col_end_name=`head -1 $rfile | cut -f${colend}` + col_ref_name=`head -1 $rfile | cut -f${colref}` + col_obs_name=`head -1 $rfile | cut -f${colobs}` + + #get annotation file column names for cgatools join + chr_name=`head -1 $ofile | cut -f${chrcol}` + start_name=`head -1 $ofile | cut -f${startcol}` + end_name=`head -1 $ofile | cut -f${endcol}` + ref_name=`head -1 $ofile | cut -f${refcol}` + obs_name=`head -1 $ofile | cut -f${obscol}` + + if [ $test == "Y" ] + then + echo "input file" + echo "chr col: $col_chr_name ($colchr)" + echo "start col: $col_start_name ($colstart)" + echo "end col: $col_end_name ($colend)" + echo "ref col: $col_ref_name ($colref)" + echo "obs col: $col_obs_name ($colobs)" + echo "" + echo "annotation file" + echo "chr col: $chr_name ($chrcol)" + echo "start col: $start_name ($startcol)" + echo "end col: $end_name ($endcol)" + echo "ref col: $ref_name ($refcol)" + echo "obs col: $obs_name ($obscol)" + fi + + #perform join + cgatools join --beta \ + --input $ofile $rfile \ + --output temporiginal \ + --match ${chr_name}:${col_chr_name} \ + --match ${start_name}:${col_start_name} \ + --match ${end_name}:${col_end_name} \ + --match ${ref_name}:${col_ref_name} \ + --match ${obs_name}:${col_obs_name} \ + --select A.*,$addcols \ + --always-dump \ + --output-mode compact + + #replace originalfile + sed -i 's/^>//g' temporiginal #join sometimes adds a '>' symbol to header + mv temporiginal originalfile + + if [ $test == "Y" ] + then + echo "joining complete" + head originalfile + echo "" + fi + +} + + + + +################################# +# +# PARSE PARAMETERS +# +################################# + + +set -- `getopt -n$0 -u -a --longoptions="inputfile: buildver: humandb: varfile: VCF: chrcol: startcol: endcol: refcol: obscol: vartypecol: convertcoords: geneanno: hgvs: verdbsnp: tfbs: mce: cytoband: segdup: dgv: gwas: ver1000g: cg46: cg69: impactscores: newimpactscores: otherinfo: esp: exac03: gerp: cosmic61: cosmic63: cosmic64: cosmic65: cosmic67: cosmic68: clinvar: nci60: outall: outfilt: outinvalid: scriptsdir: dorunannovar: dofilter: filt_dbsnp: filt1000GALL: filt1000GAFR: filt1000GAMR: filt1000GASN: filt1000GEUR: filtESP6500ALL: filtESP6500EA: filtESP6500AA: filtcg46: filtcg69: dummy:" "h:" "$@"` || usage +[ $# -eq 0 ] && usage + + + +while [ $# -gt 0 ] +do + case "$1" in + --inputfile) infile=$2;shift;; # inputfile + --buildver) buildvertmp=$2;shift;; # hg18 or hg19 + --humandb) humandbtmp=$2;shift;; # location of humandb database + --varfile) varfile=$2;shift;; # Y or N + --VCF) vcf=$2;shift;; #Y or N + --chrcol) chrcol=$2;shift;; # which column has chr + --startcol) startcol=$2;shift;; # which column has start + --endcol) endcol=$2;shift;; # which column has end + --refcol) refcol=$2;shift;; # which column has ref + --obscol) obscol=$2;shift;; # which column has alt + --vartypecol) vartypecol=$2;shift;; # which column has vartype + --convertcoords) convertcoords=$2;shift;; # Y or N convert coordinate from CG to 1-based? + --geneanno) geneanno=$2;shift;; # comma-separated list of strings refSeq, knowngene, ensgene + --hgvs) hgvs=$2;shift;; + --verdbsnp) verdbsnp=$2;shift;; #comma-separated list of dbsnp version to annotate with (e.g. "132,135NonFlagged,137,138")" + --tfbs) tfbs=$2;shift;; # Y or N + --mce) mce=$2;shift;; # Y or N + --cytoband) cytoband=$2;shift;; # Y or N + --segdup) segdup=$2;shift;; # Y or N + --dgv) dgv=$2;shift;; # Y or N + --gwas) gwas=$2;shift;; # Y or N + --ver1000g) ver1000g=$2;shift;; # Y or N + --cg46) cg46=$2;shift;; + --cg69) cg69=$2;shift;; + --impactscores) impactscores=$2;shift;; # Y or N + --newimpactscores) newimpactscores=$2;shift;; # Y or N + --otherinfo) otherinfo=$2;shift;; + --scriptsdir) scriptsdirtmp=$2;shift;; # Y or N + --esp) esp=$2;shift;; # Y or N + --exac03) exac03=$2;shift;; + --gerp) gerp=$2;shift;; # Y or N + --cosmic61) cosmic61=$2;shift;; # Y or N + --cosmic63) cosmic63=$2;shift;; # Y or N + --cosmic64) cosmic64=$2;shift;; # Y or N + --cosmic65) cosmic65=$2;shift;; # Y or N + --cosmic67) cosmic67=$2;shift;; # Y or N + --cosmic68) cosmic68=$2;shift;; # Y or N + --nci60) nci60=$2;shift;; # Y or N + --clinvar) clinvar=$2;shift;; # Y or N + --filt_dbsnp) filt_dbsnp=$2;shift;; + --filt1000GALL) threshold_1000g_ALL=$2;shift;; #threshold value + --filt1000GAFR) threshold_1000g_AFR=$2;shift;; #threshold value + --filt1000GAMR) threshold_1000g_AMR=$2;shift;; #threshold value + --filt1000GASN) threshold_1000g_ASN=$2;shift;; #threshold value + --filt1000GEUR) threshold_1000g_EUR=$2;shift;; #threshold value + --filtESP6500ALL) threshold_ESP6500_ALL=$2;shift;; #threshold value + --filtESP6500EA) threshold_ESP6500_EA=$2;shift;; #threshold value + --filtESP6500AA) threshold_ESP6500_AA=$2;shift;; #threshold value + --filtcg46) threshold_cg46=$2;shift;; + --filtcg69) threshold_cg69=$2;shift;; + --outall) outfile_all=$2;shift;; # file + --outfilt) outfile_filt=$2;shift;; # file + --outinvalid) outfile_invalid=$2;shift;; #file + --dorunannovar) dorunannovar=$2;shift;; #Y or N + -h) shift;; + --) shift;break;; + -*) usage;; + *) break;; + esac + shift +done + +#sometimes galaxy screws up these variables after updates, if comma-separated list, use only what is before first comma +humandb=${humandbtmp%,*} +buildver=${buildvertmp%,*} +scriptsdir=${scriptsdirtmp%,*} + + +if [ $test == "Y" ] +then + echo "dorunannovar: $dorunannovar" + echo "infile: $infile" + echo "buildver: $buildver" + echo "annovardb: $humandb" + echo "verdbnsp: $verdbsnp" + echo "geneanno: $geneanno" + echo "tfbs: $tfbs" + echo "mce: $mce" + echo "cytoband: $cytoband" + echo "segdup: $segdup" + echo "dgv: $dgv" + echo "gwas: $gwas" + echo "g1000: ${g1000}" + echo "cg46: ${cg46}" + echo "cg69: ${cg69}" + echo "impactscores: $impactscores" + echo "impactscores: $newimpactscores" + echo "esp: $esp" + echo "gerp: $gerp" + echo "cosmic: $cosmic" + echo "outfile: $outfile_all" + echo "outinvalid: $outfile_invalid" + echo "outfiltered: $outfile_filt" + echo "varfile: $varfile" + echo "vcf" $vcf + echo "chrcol: $chrcol" + echo "startcol: $startcol" + echo "endcol: $endcol" + echo "refcol: $refcol" + echo "obscol: $obscol" + echo "convertcoords: $convertcoords" + echo "vartypecol: $vartypecol" + echo "dofilter: $dofilter" + echo "threshold_1000g_ALL : $threshold_1000g_ALL" + echo "threshold_1000g_AFR : $threshold_1000g_AFR" + echo "threshold_1000g_AMR : $threshold_1000g_AMR" + echo "threshold_1000g_ASN : $threshold_1000g_ASN" + echo "threshold_1000g_EUR : $threshold_1000g_EUR" + echo "threshold_ESP6500_ALL: $threshold_ESP6500_ALL" + echo "threshold_ESP6500_EA : $threshold_ESP6500_EA" + echo "threshold_ESP6500_AA : $threshold_ESP6500_AA" + +fi + + + +############################################ +# +# Annotate Variants +# +############################################ + +#parse geneanno param +refgene="N" +knowngene="N" +ensgene="N" + +if [[ $geneanno =~ "refSeq" ]] +then + refgene="Y" +fi +if [[ $geneanno =~ "knowngene" ]] +then + knowngene="Y" +fi +if [[ $geneanno =~ "ensgene" ]] +then + ensgene="Y" +fi +if [ $hgvs == "N" ] +then + hgvs="" +fi + +#parse verdbsnp/1000g/esp strings +dbsnpstr=${verdbsnp//,/ } +filt_dbsnpstr=${filt_dbsnp//,/ } +g1000str=${ver1000g//,/ } +espstr=${esp//,/ } + +if [ $test == "Y" ] +then + echo "annotate dbsnp: $dbsnpstr" + echo "annotate esp: $espstr" + echo "filter dbsnp: $filt_dbsnpstr" +fi + +mutationtaster="N" +avsift="N" +lrt="N" +polyphen2="N" +phylop="N" +ljbsift="N" + +#parse old impactscores param (obsolete) +if [[ $impactscores =~ "mutationtaster" ]] +then + mutationtaster="Y" +fi +if [[ $impactscores =~ "sift" ]] +then + avsift="Y" +fi +if [[ $impactscores =~ "lrt" ]] +then + lrt="Y" +fi +if [[ $impactscores =~ "ljbsift" ]] +then + ljbsift="Y" +fi +if [[ $impactscores =~ "ljb2sift" ]] +then + ljb2sift="Y" +fi +if [[ $impactscores =~ "pp2" ]] +then + polyphen2="Y" +fi +if [[ $impactscores =~ "phylop" ]] +then + phylop="Y" +fi + +if [[ $varfile == "Y" ]] +then + convertcoords="Y" +fi + +#ljb refers to Liu, Jian, Boerwinkle paper in Human Mutation with pubmed ID 21520341. Cite this paper if you use the scores + +ljb2_sift="N" +ljb2_pp2hdiv="N" +ljb2_pp2hvar="N" +ljb2_lrt="N" +ljb2_mt="N" +ljb2_ma="N" +ljb2_fathmm="N" +ljb2_gerp="N" +ljb2_phylop="N" +ljb2_siphy="N" + +# parse ljb2 newimpactscores param +# ljb2_sift, ljb2_pp2hdiv, ljb2_pp2hvar, ljb2_lrt, ljb2_mt, ljb2_ma, ljb2_fathmm, ljb2_gerp++, ljb2_phylop, ljb2_siphy +if [[ $newimpactscores =~ "ljb2_sift" ]] +then + ljb2_sift="Y" +fi +if [[ $newimpactscores =~ "ljb2_pp2hdiv" ]] +then + ljb2_pp2hdiv="Y" +fi +if [[ $newimpactscores =~ "ljb2_pp2hvar" ]] +then + ljb2_pp2hvar="Y" +fi +if [[ $newimpactscores =~ "ljb2_lrt" ]] +then + ljb2_lrt="Y" +fi +if [[ $newimpactscores =~ "ljb2_mt" ]] +then + ljb2_mt="Y" +fi +if [[ $newimpactscores =~ "ljb2_ma" ]] +then + ljb2_ma="Y" +fi +if [[ $newimpactscores =~ "ljb2_fathmm" ]] +then + ljb2_fathmm="Y" +fi +if [[ $newimpactscores =~ "ljb2_gerp" ]] +then + ljb2_gerp="Y" +fi +if [[ $newimpactscores =~ "ljb2_phylop" ]] +then + ljb2_phylop="Y" +fi +if [[ $newimpactscores =~ "ljb2_siphy" ]] +then + ljb2_siphy="Y" +fi + +if [ $otherinfo == "N" ] +then + otherinfo="" +fi + + +#column header names we will be adding +# ESP 6500 +esp6500si_colheader_ALL="ESP6500si_ALL" +esp6500si_colheader_EA="ESP6500si_EA" +esp6500si_colheader_AA="ESP6500si_AA" +esp6500_colheader_ALL="ESP6500_ALL" +esp6500_colheader_EA="ESP6500_EA" +esp6500_colheader_AA="ESP6500_AA" +esp5400si_colheader_ALL="ESP5400si_ALL" +esp5400si_colheader_EA="ESP5400si_EA" +esp5400si_colheader_AA="ESP5400si_AA" +esp5400_colheader_ALL="ESP5400_ALL" +esp5400_colheader_EA="ESP5400_EA" +esp5400_colheader_AA="ESP5400_AA" + + +# cg46 cg69 +cg46_colheader="CG_46_genomes" +cg69_colheader="CG_69_genomes" + +cp $infile originalfile +#run annovar or filter only? +if [ $dorunannovar == "Y" ] +then + + + #################################### + # + # PREPARE INPUT FILE + # + #################################### + + echo "converting input file" + vcfheader="" + if [ $vcf == "Y" ] #if CG varfile, convert + then + # convert vcf to annovarinput + $scriptsdir/convert2annovar.pl --format vcf4old --allallele --includeinfo --outfile annovarinput $infile 2>&1 + + #construct header line from vcf file + cat $infile | grep "#CHROM" > additionalcols + sed -i 's/#//g' additionalcols + vcfheader="\t`cat additionalcols`" + echo "vcfheader:$vcfheader" + echo -e "chromosome\tbegin\tend\treference\tobserved\t`cat additionalcols`" > originalfile + cat annovarinput >> originalfile + + chrcol=1 + startcol=2 + endcol=3 + refcol=4 + obscol=5 + + + elif [ $varfile == "Y" ] #if CG varfile, convert + then + # convert varfile + $scriptsdir/convert2annovar.pl --format cg --outfile annovarinput $infile 2>&1 + echo -e "chromosome\tbegin\tend\treference\talleleSeq\tvarType\thaplotype" > originalfile + cat annovarinput | cut -f1-6,8 >> originalfile + cat annovarinput | cut -f1-5 >> annovarinput2 + mv annovarinput2 annovarinput + + chrcol=1 + startcol=2 + endcol=3 + refcol=4 + obscol=5 + + elif [ $convertcoords == "Y" ] # if CG-coordinates, convert + then + #echo "rearranging columns and converting coordinates" + awk 'BEGIN{ + FS="\t"; + OFS="\t"; + }{ + if(FNR>1) { + gsub(/chr/,"",$"'"${chrcol}"'") + if( $"'"${vartypecol}"'" == "snp" ){ $"'"${startcol}"'" += 1 }; + if( $"'"${vartypecol}"'" == "ins" ){ $"'"${refcol}"'" = "-" }; + if( $"'"${vartypecol}"'" == "del" ){ $"'"${startcol}"'" +=1; $"'"${obscol}"'" = "-" }; + if( $"'"${vartypecol}"'" == "sub" ){ $"'"${startcol}"'" += 1 }; + + printf("%s\t%s\t%s\t%s\t%s\n" ,$"'"${chrcol}"'",$"'"${startcol}"'",$"'"${endcol}"'",$"'"${refcol}"'",$"'"${obscol}"'"); + } + } + END{ + }' $infile > annovarinput + + #remove any "chr" prefixes + #sed -i '2,$s/chr//g' annovarinput + + awk 'BEGIN{ + FS="\t"; + OFS="\t"; + }{ + + if(FNR>=1) { + gsub(/chr/,"",$"'"${chrcol}"'") + if( $"'"${vartypecol}"'" == "snp" ){ $"'"${startcol}"'" += 1 }; + if( $"'"${vartypecol}"'" == "ins" ){ $"'"${refcol}"'" = "-" }; + if( $"'"${vartypecol}"'" == "del" ){ $"'"${startcol}"'" +=1; $"'"${obscol}"'" = "-" }; + if( $"'"${vartypecol}"'" == "sub" ){ $"'"${startcol}"'" += 1 }; + + print $0 + } + } + END{ + }' $infile > originalfile + + #remove any "chr" prefixes + #sed -i '2,$s/chr//g' originalfile + sed -i 's/omosome/chromosome/g' originalfile + + + else #only rearrange columns if already 1-based coordinates + echo "rearranging columns " + awk 'BEGIN{ + FS="\t"; + OFS="\t"; + }{ + if(FNR>1) { + printf("%s\t%s\t%s\t%s\t%s\n",$"'"${chrcol}"'",$"'"${startcol}"'",$"'"${endcol}"'",$"'"${refcol}"'",$"'"${obscol}"'"); + } + } + END{ + }' $infile > annovarinput + + #remove any "chr" prefixes + sed -i '2,$s/chr//g' annovarinput + sed '2,$s/chr//g' $infile > originalfile + sed -i 's/omosome/chromosome/g' originalfile + fi + + echo "...finished conversion" + + + + + #################################### + # + # RUN ANNOVAR COMMANDS + # + #################################### + + + + ###### gene-based annotation ####### + + # RefSeq Gene + if [ $refgene == "Y" ] + then + echo -e "\nrefSeq gene" + $scriptsdir/annotate_variation.pl --geneanno --buildver $buildver -dbtype gene ${hgvs} annovarinput $humandb 2>&1 + + annovarout=annovarinput.variant_function + sed -i '1i\RefSeq_Func\tRefSeq_Gene\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.RefSeq_Func,B.RefSeq_Gene + + annovarout=annovarinput.exonic_variant_function + sed -i '1i\linenum\tRefSeq_ExonicFunc\tRefSeq_AAChange\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 4 5 6 7 8 B.RefSeq_ExonicFunc,B.RefSeq_AAChange + fi + + + # UCSC KnownGene + if [ $knowngene == "Y" ] + then + echo -e "\nUCSC known gene" + $scriptsdir/annotate_variation.pl --geneanno --buildver $buildver -dbtype knowngene annovarinput $humandb 2>&1 + + annovarout=annovarinput.variant_function + sed -i '1i\UCSCKnownGene_Func\tUCSCKnownGene_Gene\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.UCSCKnownGene_Func,B.UCSCKnownGene_Gene + + annovarout=annovarinput.exonic_variant_function + sed -i '1i\linenum\tUCSCKnownGene_ExonicFunc\tUCSCKnownGene_AAChange\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 4 5 6 7 8 B.UCSCKnownGene_ExonicFunc,B.UCSCKnownGene_AAChange + fi + + + # Emsembl Gene + if [ $ensgene == "Y" ] + then + echo -e "\nEnsembl gene" + $scriptsdir/annotate_variation.pl --geneanno --buildver $buildver -dbtype ensgene annovarinput $humandb 2>&1 + + annovarout=annovarinput.variant_function + sed -i '1i\EnsemblGene_Func\tEnsemblGene_Gene\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.EnsemblGene_Func,B.EnsemblGene_Gene + + annovarout=annovarinput.exonic_variant_function + sed -i '1i\linenum\tEnsemblGene_ExonicFunc\tEnsemblGene_AAChange\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 4 5 6 7 8 B.EnsemblGene_ExonicFunc,B.EnsemblGene_AAChange + fi + + + + ###### region-based annotation ####### + + + # Transcription Factor Binding Sites Annotation + if [ $mce == "Y" ] + then + echo -e "\nMost Conserved Elements" + + if [ $buildver == "hg18" ] + then + $scriptsdir/annotate_variation.pl --regionanno --buildver $buildver -dbtype mce44way annovarinput $humandb 2>&1 + annovarout=annovarinput.${buildver}_phastConsElements44way + sed -i '1i\db\tphastConsElements44way\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.phastConsElements44way + + else #hg19 + $scriptsdir/annotate_variation.pl --regionanno --buildver $buildver -dbtype mce46way annovarinput $humandb 2>&1 + annovarout=annovarinput.${buildver}_phastConsElements46way + sed -i '1i\db\tphastConsElements46way\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.phastConsElements46way + fi + + fi + + + + # Transcription Factor Binding Sites Annotation + if [ $tfbs == "Y" ] + then + echo -e "\nTranscription Factor Binding Site Annotation" + $scriptsdir/annotate_variation.pl --regionanno --buildver $buildver -dbtype tfbs annovarinput $humandb 2>&1 + + # arguments: originalfile, resultfile,chrcol,startcol,endcol,refcol,obscol,selectcolumns + annovarout=annovarinput.${buildver}_tfbsConsSites + sed -i '1i\db\tTFBS\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.TFBS + fi + + + + # Identify cytogenetic band for genetic variants + if [ $cytoband == "Y" ] + then + echo -e "\nCytogenic band Annotation" + $scriptsdir/annotate_variation.pl --regionanno --buildver $buildver -dbtype band annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_cytoBand + sed -i '1i\db\tBand\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.Band + fi + + + # Identify variants located in segmental duplications + if [ $segdup == "Y" ] + then + echo -e "\nSegmental Duplications Annotation" + $scriptsdir/annotate_variation.pl --regionanno --buildver $buildver -dbtype segdup annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_genomicSuperDups + sed -i '1i\db\tSegDup\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.SegDup + fi + + + + # Identify previously reported structural variants in DGV + if [ $dgv == "Y" ] + then + echo -e "\nDGV Annotation" + $scriptsdir/annotate_variation.pl --regionanno --buildver $buildver -dbtype dgvMerged annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_dgvMerged + sed -i '1i\db\tDGV\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.DGV + fi + + + # Identify variants reported in previously published GWAS studies + if [ $gwas == "Y" ] + then + echo -e "\nGWAS Annotation" + $scriptsdir/annotate_variation.pl --regionanno --buildver $buildver -dbtype gwascatalog annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_gwasCatalog + sed -i '1i\db\tGWAS\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.GWAS + fi + + + + + ###### filter-based annotation ####### + + #dbSNP + for version in $dbsnpstr + do + if [ $version == "None" ] + then + break + fi + echo -e "\ndbSNP region Annotation, version: $version" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype ${version} annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_${version}_dropped + sed -i '1i\db\tdb'${version}'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.db${version} + + + done + + + + #1000 Genomes + + if [ $ver1000g != "None" ] + then + + for version in $g1000str + do + #column headers + g1000_colheader_ALL="${version}_ALL" + g1000_colheader_AFR="${version}_AFR" + g1000_colheader_AMR="${version}_AMR" + g1000_colheader_ASN="${version}_ASN" + g1000_colheader_EUR="${version}_EUR" + g1000_colheader_EAS="${version}_EAS" + g1000_colheader_SAS="${version}_SAS" + g1000_colheader_CEU="${version}_CEU" + g1000_colheader_YRI="${version}_YRI" + g1000_colheader_JPTCHB="${version}_JPTCHB" + + doALL="N" + doAMR="N" + doAFR="N" + doASN="N" + doEAS="N" + doSAS="N" + doEUR="N" + doCEU="N" + doYRI="N" + doJPTCHB="N" + + + if [ $version == "1000g2012apr" ] + then + fileID="2012_04" + doALL="Y" + if [ $buildver == "hg19" ] + then + doAMR="Y" + doAFR="Y" + doASN="Y" + doEUR="Y" + fi + elif [ $version == "1000g2014oct" ] + then + fileID="2014_10" + doALL="Y" + doAMR="Y" + doAFR="Y" + doEUR="Y" + doEAS="Y" + if [ $buildver == "hg19" ] + then + doSAS="Y" + fi + elif [[ $version == "1000g2012feb" && $buildver == "hg19" ]] + then + fileID="2012_02" + doALL="Y" + elif [[ $version == "1000g2010nov" && $buildver == "hg19" ]] + then + fileID="2010_11" + doALL="Y" + elif [[ $version == "1000g2010jul" && $buildver == "hg18" ]] + then + fileID="2010_07" + doALL="N" + doCEU="Y" + doYRI="Y" + doJPTCHB="Y" + else + echo "unrecognized 1000g version, skipping" + fi + + #ALL + if [ $doALL == "Y" ] + then + echo -e "\n1000Genomes ALL" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype "${version}_all" annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_ALL.sites.${fileID}_dropped + sed -i '1i\db\t'$g1000_colheader_ALL'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$g1000_colheader_ALL + fi + + # AFR + if [ $doAFR == "Y" ] + then + echo -e "\n1000Genomes AFR" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype "${version}_afr" annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_AFR.sites.${fileID}_dropped + sed -i '1i\db\t'$g1000_colheader_AFR'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$g1000_colheader_AFR + fi + + + # AMR + if [ $doAMR == "Y" ] + then + echo -e "\n1000Genomes AMR" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype "${version}_amr" annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_AMR.sites.${fileID}_dropped + sed -i '1i\db\t'$g1000_colheader_AMR'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$g1000_colheader_AMR + fi + + # ASN + if [ $doASN == "Y" ] + then + echo -e "\n1000Genomes ASN" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype "${version}_asn" annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_ASN.sites.${fileID}_dropped + sed -i '1i\db\t'$g1000_colheader_ASN'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$g1000_colheader_ASN + fi + + # EAS + if [ $doEAS == "Y" ] + then + echo -e "\n1000Genomes EAS" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype "${version}_eas" annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_EAS.sites.${fileID}_dropped + sed -i '1i\db\t'$g1000_colheader_EAS'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$g1000_colheader_EAS + fi + + # SAS + if [ $doSAS == "Y" ] + then + echo -e "\n1000Genomes SAS" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype "${version}_sas" annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_SAS.sites.${fileID}_dropped + sed -i '1i\db\t'$g1000_colheader_SAS'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$g1000_colheader_SAS + fi + + # EUR + if [ $doEUR == "Y" ] + then + echo -e "\n1000Genomes EUR" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype "${version}_eur" annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_EUR.sites.${fileID}_dropped + sed -i '1i\db\t'$g1000_colheader_EUR'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$g1000_colheader_EUR + fi + + # CEU + if [ $doCEU == "Y" ] + then + echo -e "\n1000Genomes CEU" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype "${version}_ceu" annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_CEU.sites.${fileID}_dropped + sed -i '1i\db\t'$g1000_colheader_CEU'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$g1000_colheader_CEU + fi + + # YRI + if [ $doYRI == "Y" ] + then + echo -e "\n1000Genomes YRI" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype "${version}_yri" annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_YRI.sites.${fileID}_dropped + sed -i '1i\db\t'$g1000_colheader_YRI'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$g1000_colheader_YRI + + + fi + + #JPTCHB + if [ $doJPTCHB == "Y" ] + then + echo -e "\n1000Genomes JPTCHB" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype "${version}_jptchb" annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_JPTCHB.sites.${fileID}_dropped + sed -i '1i\db\t'$g1000_colheader_JPTCHB'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$g1000_colheader_JPTCHB + fi + + done + fi + + + + + #### IMPACT SCORE ANNOTATIONS + + + if [ $ljb2_sift == "Y" ] + then + echo -e "\nLJB2 SIFT Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver $otherinfo -dbtype ljb2_sift annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_ljb2_sift_dropped + sed -i '1i\db\tLJB2_SIFT\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.LJB2_SIFT + fi + + if [ $ljb2_pp2hdiv == "Y" ] + then + echo -e "\nLJB2 pp2hdiv Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver $otherinfo -dbtype ljb2_pp2hdiv annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_ljb2_pp2hdiv_dropped + sed -i '1i\db\tLJB2_PolyPhen2_HDIV\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.LJB2_PolyPhen2_HDIV + fi + + if [ $ljb2_pp2hvar == "Y" ] + then + echo -e "\nLJB2 pp2hvar Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver $otherinfo -dbtype ljb2_pp2hvar annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_ljb2_pp2hvar_dropped + + head $annovarout + sed -i '1i\db\tLJB2_PolyPhen2_HVAR\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.LJB2_PolyPhen2_HVAR + fi + + if [ $ljb2_lrt == "Y" ] + then + echo -e "\nLJB2 LRT Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver $otherinfo -dbtype ljb2_lrt annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_ljb2_lrt_dropped + sed -i '1i\db\tLJB2_LRT\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.LJB2_LRT + fi + + if [ $ljb2_mt == "Y" ] + then + echo -e "\nLJB2 mutationtaster Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver $otherinfo -dbtype ljb2_mt annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_ljb2_mt_dropped + sed -i '1i\db\tLJB2_MutationTaster\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.LJB2_MutationTaster + fi + + if [ $ljb2_ma == "Y" ] + then + echo -e "\nLJB2 mutationassessor Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver $otherinfo -dbtype ljb2_ma annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_ljb2_ma_dropped + sed -i '1i\db\tLJB2_MutationAssessor\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.LJB2_MutationAssessor + fi + + if [ $ljb2_fathmm == "Y" ] + then + echo -e "\nLJB2 FATHMM Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver $otherinfo -dbtype ljb2_fathmm annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_ljb2_fathmm_dropped + sed -i '1i\db\tLJB2_FATHMM\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.LJB2_FATHMM + fi + + if [ $ljb2_gerp == "Y" ] + then + echo -e "\nLJB2 GERP++ Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver $otherinfo -dbtype ljb2_gerp++ annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_ljb2_gerp++_dropped + sed -i '1i\db\tLJB2_GERP++\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.LJB2_GERP++ + fi + + if [ $ljb2_phylop == "Y" ] + then + echo -e "\nLJB2 PhyloP Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver $otherinfo -dbtype ljb2_phylop annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_ljb2_phylop_dropped + sed -i '1i\db\tLJB2_PhyloP\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.LJB2_PhyloP + fi + + if [ $ljb2_siphy == "Y" ] + then + echo -e "\nLJB2 SiPhy Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver $otherinfo -dbtype ljb2_siphy annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_ljb2_siphy_dropped + sed -i '1i\db\tLJB2_SiPhy\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.LJB2_SiPhy + fi + + + + ### OLD IMPACT SCORE ANNOTATIONS + + # SIFT + if [ $avsift == "Y" ] + then + echo -e "\nSIFT Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype avsift annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_avsift_dropped + sed -i '1i\db\tAVSIFT\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.AVSIFT + fi + + #ljb refers to Liu, Jian, Boerwinkle paper in Human Mutation with pubmed ID 21520341. Cite this paper if you use the scores + # SIFT2 + if [ $ljbsift == "Y" ] + then + echo -e "\nLJB SIFT Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype ljb_sift annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_ljb_sift_dropped + sed -i '1i\db\tLJB_SIFT\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.LJB_SIFT + fi + + + # PolyPhen2 + if [ $polyphen2 == "Y" ] + then + echo -e "\nPolyPhen Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype ljb_pp2 annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_ljb_pp2_dropped + sed -i '1i\db\tPolyPhen2\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.PolyPhen2 + fi + + + # MutationTaster + if [ $mutationtaster == "Y" ] + then + echo -e "\nMutationTaster Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype ljb_mt annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_ljb_mt_dropped + sed -i '1i\db\tMutationTaster\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.MutationTaster + fi + + + # LRT + if [ $lrt == "Y" ] + then + echo -e "\nLRT Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype ljb_lrt annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_ljb_lrt_dropped + sed -i '1i\db\tLikelihoodRatioTestScore\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.LikelihoodRatioTestScore + fi + + # PhyloP + if [ $phylop == "Y" ] + then + echo -e "\nPhyloP Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype ljb_phylop annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_ljb_phylop_dropped + sed -i '1i\db\tPhyloP\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.PhyloP + fi + + + ### ESP Exome Variant Server + if [ $esp != "None" ] + then + echo -e "\nESP Annotation" + for version in $espstr + do + echo "version: $version" + # 6500si ALL + if [ $version == "esp6500si_all" ] + then + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype esp6500si_all annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_esp6500si_all_dropped + sed -i '1i\db\t'$esp6500si_colheader_ALL'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$esp6500si_colheader_ALL + fi + + + # 6500si European American + if [ $version == "esp6500si_ea" ] + then + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype esp6500si_ea annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_esp6500si_ea_dropped + sed -i '1i\db\t'$esp6500si_colheader_EA'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'" ' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$esp6500si_colheader_EA + fi + + # 6500si African Americans + if [ $version == "esp6500si_aa" ] + then + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype esp6500si_aa annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_esp6500si_aa_dropped + sed -i '1i\db\t'$esp6500si_colheader_AA'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'" ' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$esp6500si_colheader_AA + fi + + + # 6500 ALL + if [ $version == "esp6500_all" ] + then + ls + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype esp6500_all annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_esp6500_all_dropped + sed -i '1i\db\t'$esp6500_colheader_ALL'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'" ' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$esp6500_colheader_ALL + fi + + + # 6500 European American + if [ $version == "esp6500_ea" ] + then + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype esp6500_ea annovarinput $humandb 2>&1 + annovarout=annovarinput.${buildver}_esp6500_ea_dropped + sed -i '1i\db\t'$esp6500_colheader_EA'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$esp6500_colheader_EA + fi + + # 6500 African Americans + if [ $version == "esp6500_aa" ] + then + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype esp6500_aa annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_esp6500_aa_dropped + sed -i '1i\db\t'$esp6500_colheader_AA'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$esp6500_colheader_AA + fi + + + # 5400 ALL + if [ $version == "esp5400_all" ] + then + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype esp5400_all annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_esp5400_all_dropped + sed -i '1i\db\t'$esp5400_colheader_ALL'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$esp5400_colheader_ALL + fi + + + # 5400 European American + if [ $version == "esp5400_ea" ] + then + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype esp5400_ea annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_esp5400_ea_dropped + sed -i '1i\db\t'$esp5400_colheader_EA'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$esp5400_colheader_EA + fi + + # 5400 African Americans + if [ $version == "esp5400_aa" ] + then + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype esp5400_aa annovarinput $humandb 2>&1 + + annovarout=annovarinput.${buildver}_esp5400_aa_dropped + sed -i '1i\db\t'$esp5400_colheader_AA'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.$esp5400_colheader_AA + fi + + done + fi + + + #exac03 + if [ $exac03 == "Y" ] + then + echo -e "\nexac03 Annotation" + $scriptsdir/annotate_variation.pl --filter -otherinfo --buildver $buildver $otherinfo -dbtype exac03 annovarinput $humandb 2>&1 + + #annovarout.tmp=annovarinput.${buildver}_exac03_dropped + + # split allelefrequency column into several columns, one per population + awk 'BEGIN{FS="\t" + OFS="\t" + }{ + gsub(",","\t",$2) + print $0 + }END{}' annovarinput.${buildver}_exac03_dropped > $annovarout + + head $annovarout + sed -i '1i\db\tExAC_Freq\tExAC_AFR\tExAC_AMR\tExAC_EAS\tExAC_FIN\tExAC_NFE\tExAC_OTH\tExAC_SAS\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 10 11 12 13 14 B.ExAC_Freq,B.ExAC_AFR,B.ExAC_AMR,B.ExAC_EAS,B.ExAC_FIN,B.ExAC_NFE,B.ExAC_OTH,B.ExAC_SAS + fi + + #GERP++ + if [ $gerp == "Y" ] + then + echo -e "\nGERP++ Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype gerp++gt2 annovarinput $humandb 2>&1 + + annovarout="annovarinput.${buildver}_gerp++gt2_dropped" + sed -i '1i\db\tGERP++\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.GERP++ + fi + + + #COSMIC + if [[ $cosmic61 == "Y" && $buildver == "hg19" ]] + then + echo -e "\nCOSMIC61 Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cosmic61 annovarinput $humandb 2>&1 + + annovarout="annovarinput.${buildver}_cosmic61_dropped" + sed -i '1i\db\tCOSMIC61\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.COSMIC61 + + fi + + if [[ $cosmic63 == "Y" && $buildver == "hg19" ]] + then + echo -e "\nCOSMIC63 Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cosmic63 annovarinput $humandb 2>&1 + + annovarout="annovarinput.${buildver}_cosmic63_dropped" + sed -i '1i\db\tCOSMIC63\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.COSMIC63 + + fi + + if [[ $cosmic64 == "Y" && $buildver == "hg19" ]] + then + echo -e "\nCOSMIC64 Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cosmic64 annovarinput $humandb 2>&1 + + annovarout="annovarinput.${buildver}_cosmic64_dropped" + sed -i '1i\db\tCOSMIC64\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.COSMIC64 + + fi + + if [[ $cosmic65 == "Y" && $buildver == "hg19" ]] + then + echo -e "\nCOSMIC65 Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cosmic65 annovarinput $humandb 2>&1 + + annovarout="annovarinput.${buildver}_cosmic65_dropped" + sed -i '1i\db\tCOSMIC65\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.COSMIC65 + + fi + + if [[ $cosmic67 == "Y" && $buildver == "hg19" ]] + then + echo -e "\nCOSMIC67 Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cosmic67 annovarinput $humandb 2>&1 + + annovarout="annovarinput.${buildver}_cosmic67_dropped" + sed -i '1i\db\tCOSMIC67\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.COSMIC67 + + fi + + if [[ $cosmic68 == "Y" && $buildver == "hg19" ]] + then + echo -e "\nCOSMIC68 Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cosmic68 annovarinput $humandb 2>&1 + + annovarout="annovarinput.${buildver}_cosmic68_dropped" + sed -i '1i\db\tCOSMIC68\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.COSMIC68 + + fi + + if [[ $cosmic70 == "Y" && $buildver == "hg19" ]] + then + echo -e "\nCOSMIC70 Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cosmic70 annovarinput $humandb 2>&1 + + annovarout="annovarinput.${buildver}_cosmic70_dropped" + sed -i '1i\db\tCOSMIC70\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.COSMIC70 + + fi + + if [[ $clinvar == "Y" && $buildver == "hg19" ]] + then + echo -e "\nCLINVAR Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype clinvar_20140211 annovarinput $humandb 2>&1 + + annovarout="annovarinput.${buildver}_clinvar_20140211_dropped" + sed -i '1i\db\tCLINVAR\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.CLINVAR + + fi + + if [[ $nci60 == "Y" && $buildver == "hg19" ]] + then + echo -e "\nNCI60 Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype nci60 annovarinput $humandb 2>&1 + + annovarout="annovarinput.${buildver}_nci60_dropped" + sed -i '1i\db\tNCI60\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.NCI60 + + fi + + #cg46 + if [[ $cg46 == "Y" ]] + then + echo -e "\nCG 46 genomes Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cg46 annovarinput $humandb 2>&1 + + annovarout="annovarinput.${buildver}_cg46_dropped" + sed -i '1i\db\t'${cg46_colheader}'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.${cg46_colheader} + + fi + + + #cg69 + if [[ $cg69 == "Y" ]] + then + echo -e "\nCG 69 genomes Annotation" + $scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cg69 annovarinput $humandb 2>&1 + + annovarout="annovarinput.${buildver}_cg69_dropped" + sed -i '1i\db\t'${cg69_colheader}'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout + joinresults originalfile $annovarout 3 4 5 6 7 B.${cg69_colheader} + + fi + + + + if [ $convertcoords == "Y" ] + then + echo "converting back coordinates" + awk 'BEGIN{ + FS="\t"; + OFS="\t"; + }{ + if (FNR==1) + print $0 + if(FNR>1) { + $"'"${chrcol}"'" = "chr"$"'"${chrcol}"'" + if( $"'"${vartypecol}"'" == "snp" ){ $"'"${startcol}"'" -= 1 }; + if( $"'"${vartypecol}"'" == "ins" ){ $"'"${refcol}"'" = "" }; + if( $"'"${vartypecol}"'" == "del" ){ $"'"${startcol}"'" -=1; $"'"${obscol}"'" = "" }; + if( $"'"${vartypecol}"'" == "sub" ){ $"'"${startcol}"'" -= 1 }; + print $0 + + } + } + END{ + }' originalfile > originalfile_coords + else + mv originalfile originalfile_coords + fi + + #restore "chr" prefix? + + #move to outputfile + if [ ! -s annovarinput.invalid_input ] + then + echo "Congrats, your input file contained no invalid lines!" > annovarinput.invalid_input + fi + + cp originalfile_coords $outfile_all + cp annovarinput.invalid_input $outfile_invalid 2>&1 + + sed -i 's/chrchr/chr/g' $outfile_all + sed -i 's/chrchr/chr/g' $outfile_invalid + +fi #if $dorunannovar + + + + + + + + + + + + + + + + + + + + + + +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/tools/annovar/annovar.xml Thu Oct 01 04:24:45 2015 -0400 @@ -0,0 +1,255 @@ +<tool id="AnnovarShed" name="ANNOVAR" version="2015may"> + <description> Annotate a file using ANNOVAR </description> + + <requirements> + <requirement type="package" version="1.7">cgatools</requirement> + </requirements> + + <command interpreter="bash"> + annovar.sh + --esp ${esp} + --exac03 ${exac03} + --gerp ${gerp} + --cosmic61 ${cosmic61} + --cosmic63 ${cosmic63} + --cosmic64 ${cosmic64} + --cosmic65 ${cosmic65} + --cosmic67 ${cosmic67} + --cosmic68 ${cosmic68} + --outall ${annotated} + --outinvalid ${invalid} + --dorunannovar ${dorun} + --inputfile ${infile} + --buildver ${reference.fields.dbkey} + --humandb ${reference.fields.ANNOVAR_humandb} + --scriptsdir ${reference.fields.ANNOVAR_scripts} + --verdbsnp ${verdbsnp} + --geneanno ${geneanno} + --tfbs ${tfbs} + --mce ${mce} + --cytoband ${cytoband} + --segdup ${segdup} + --dgv ${dgv} + --gwas ${gwas} + #if $filetype.type == "other" + --varfile N + --VCF N + --chrcol ${filetype.col_chr} + --startcol ${filetype.col_start} + --endcol ${filetype.col_end} + --obscol ${filetype.col_obs} + --refcol ${filetype.col_ref} + + #if $filetype.convertcoords.convert == "Y" + --vartypecol ${filetype.convertcoords.col_vartype} + --convertcoords Y + #else + --convertcoords N + #end if + #end if + #if $filetype.type == "vcf" + --varfile N + --VCF Y + --convertcoords N + #end if + #if $filetype.type == "varfile" + --varfile Y + --VCF N + #end if + --cg46 ${cgfortysix} + --cg69 ${cgsixtynine} + --ver1000g ${ver1000g} + --hgvs ${hgvs} + --otherinfo ${otherinfo} + --newimpactscores ${newimpactscores} + --clinvar ${clinvar} + + </command> + + <inputs> + <param name="dorun" type="hidden" value="Y"/> <!-- will add tool in future to filter on annovar columns, then will call annovar.sh with dorun==N --> + <param name="reference" type="select" label="Reference"> + <options from_data_table="annovar_loc" /> + <filter type="data_meta" ref="infile" key="dbkey" column="0"/> + </param> + + <param name="infile" type="data" label="Select file to annotate" help="Must be CG varfile or a tab-separated file with a 1 line header"/> + <conditional name="filetype"> + <param name="type" type="select" label="Select filetype" > + <option value="vcf" selected="false"> VCF4 file </option> + <option value="varfile" selected="false"> CG varfile </option> + <option value="other" selected="false"> Other </option> + </param> + <when value="other"> + <param name="col_chr" type="data_column" data_ref="infile" multiple="False" label="Chromosome Column" /> + <param name="col_start" type="data_column" data_ref="infile" multiple="False" label="Start Column" /> + <param name="col_end" type="data_column" data_ref="infile" multiple="False" label="End Column" /> + <param name="col_ref" type="data_column" data_ref="infile" multiple="False" label="Reference Allele Column" /> + <param name="col_obs" type="data_column" data_ref="infile" multiple="False" label="Observed Allele Column" /> + <conditional name="convertcoords"> + <param name="convert" type="select" label="Is this file using Complete Genomics (0-based half-open) cooridinates?" > + <option value="Y"> Yes </option> + <option value="N" selected="True"> No </option> + </param> + <when value="Y"> + <param name="col_vartype" type="data_column" data_ref="infile" multiple="False" label="varType Column" /> + </when> + </conditional> + </when> + </conditional> + + + + <param name="geneanno" type="select" label="Select Gene Annotation(s)" multiple="true" optional="true" display="checkboxes"> + <option value="refSeq" selected="true" > RefSeq </option> + <option value="knowngene"> UCSC KnownGene </option> + <option value="ensgene" > Ensembl </option> + </param> + <param name="hgvs" type="boolean" checked="False" truevalue="-hgvs" falsevalue="N" label="Use HGVS nomenclature for RefSeq annotation" help="if checked, cDNA level annotation is compatible with HGVS"/> + + + <!-- region-based annotation --> + <param name="cytoband" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Cytogenic band Annotation?" help="This option identifies Giemsa-stained chromosomes bands, (e.g. 1q21.1-q23.3)."/> + <param name="tfbs" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Transcription Factor Binding Site Annotation?"/> + <param name="mce" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Most Conserved Elements Annotation?" help="This option phastCons 44-way alignments to annotate variants that fall within conserved genomic regions."/> + <param name="segdup" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Segmental Duplication Annotation?" help="Genetic variants that are mapped to segmental duplications are most likely sequence alignment errors and should be treated with extreme caution."/> + <param name="dgv" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="DGV (Database of Genomic Variants) Annotation?" help="Identify previously reported structural variants in DGV (Database of Genomic Variants) "/> + <param name="gwas" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="GWAS studies Annotation?" help="Identify variants reported in previously published GWAS (Genome-wide association studies) "/> + + + + <!-- filter-based annotation --> + <param name="verdbsnp" type="select" label="Select dbSNP version(s) to annotate with" multiple="true" display="checkboxes" optional="true" help="SNPs in dbSNP may be flagged as Clinically Associated, Select the NonFlagged version if you do not wish to annotate with these SNPs "> + <option value="snp128" > 128 (hg18/hg19) </option> + <option value="snp128NonFlagged"> 128 NonFlagged </option> + <option value="snp129" > 129 (hg18/hg19) </option> + <option value="snp129NonFlagged"> 129 NonFlagged </option> + <option value="snp130" > 130 (hg18/hg19) </option> + <option value="snp130NonFlagged"> 130 NonFlagged </option> + <option value="snp131" > 131 (hg18/hg19) </option> + <option value="snp131NonFlagged"> 131 NonFlagged </option> + <option value="snp132" > 132 (hg18/hg19) </option> + <option value="snp132NonFlagged"> 132 NonFlagged </option> + <option value="snp135" > 135 (hg19 only) </option> + <option value="snp135NonFlagged"> 135 NonFlagged </option> + <option value="snp137" > 137 (hg19 only) </option> + <option value="snp137NonFlagged"> 137 NonFlagged </option> + <option value="snp138" > 138 (hg19 only) </option> + <option value="snp138NonFlagged"> 138 NonFlagged </option> + </param> + + <param name="ver1000g" type="select" label="Select 1000Genomes Annotation(s)" multiple="true" display="checkboxes" optional="true" help="2012april database for ALL populations was converted to hg18 using the UCSC liftover program"> + <option value="1000g2014oct"> 2014oct (hg18/hg19) (6 populations: AMR,AFR,EUR,EAS,SAS,ALL) </option> + <option value="1000g2012apr"> 2012apr (hg18/hg19) (5 populations: AMR,AFR,ASN,CEU,ALL) </option> + <option value="1000g2012feb"> 2012feb (hg19) (1 population: ALL) </option> + <option value="1000g2010nov"> 2010nov (hg19) (1 population: ALL) </option> + <option value="1000g2010jul"> 2010jul (hg18) (4 populations: YRI,JPT,CHB,CEU)</option> + </param> + <!-- + <param name="g1000" type="boolean" checked="True" truevalue="Y" falsevalue="N" label="Annotate with 1000genomes project? (version 2012april)"/> + --> + + + <param name="esp" type="select" label="Select Exome Variant Server version(s) to annotate with" multiple="true" display="checkboxes" optional="true" help="si versions of databases contain indels and chrY calls"> + <option value="esp6500si_all" > ESP6500si ALL </option> + <option value="esp6500si_ea" > ESP6500si European Americans </option> + <option value="esp6500si_aa" > ESP6500si African Americans </option> + <option value="esp6500_all" > ESP6500 ALL </option> + <option value="esp6500_ea" > ESP6500 European Americans </option> + <option value="esp6500_aa" > ESP6500 African Americans </option> + <option value="esp5400_all" > ESP5400 ALL </option> + <option value="esp5400_ea" > ESP5400 European Americans </option> + <option value="esp5400_aa" > ESP5400 African Americans </option> + </param> + + <param name="exac03" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with ExAC 03?"/> + + <param name="gerp" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="GERP++ Annotation?" help="GERP identifies constrained elements in multiple alignments by quantifying substitution deficits (see http://mendel.stanford.edu/SidowLab/downloads/gerp/ for details) This option annotates those variants having GERP++>2 in human genome, as this threshold is typically regarded as evolutionarily conserved and potentially functional"/> + + <param name="clinvar" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="CLINVAR Annotation? (hg19 only)" help="version 2014-02-11. Annotations include Variant Clinical Significance (unknown, untested, non-pathogenic, probable-non-pathogenic, probable-pathogenic, pathogenic, drug-response, histocompatibility, other) and Variant disease name."/> + <param name="nci60" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with NCI60? (hg19 only)" help="NCI-60 exome allele frequency data"/> + <param name="cgfortysix" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Complete Genomics 46 Genomes?" help="Diversity Panel; 46 unrelated individuals"/> + <param name="cgsixtynine" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Complete Genomics 69 Genomes?" help="Diversity Panel, Pedigree, YRI trio and PUR trio"/> + <param name="cosmic61" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 61? (hg19 only)"/> + <param name="cosmic63" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 63? (hg19 only)"/> + <param name="cosmic64" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 64? (hg19 only)"/> + <param name="cosmic65" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 65? (hg19 only)"/> + <param name="cosmic67" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 67? (hg19 only)"/> + <param name="cosmic68" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 68? (hg19 only)"/> + <param name="cosmic70" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 70? (hg19 only)"/> + + + + <param name="newimpactscores" type="select" label="Select functional impact scores (LJB2)" multiple="true" display="checkboxes" optional="true" help="LJB refers to Liu, Jian, Boerwinkle paper in Human Mutation, pubmed ID 21520341. "> + <option value="ljb2_sift"> SIFT score </option> + <option value="ljb2_pp2hdiv"> PolyPhen2 HDIV score </option> + <option value="ljb2_pp2hvar" > PolyPhen2 HVAR score </option> + <option value="ljb2_mt" > MutationTaster score </option> + <option value="ljb2_ma" > MutationAssessor score </option> + <option value="ljb2_lrt"> LRT score (Likelihood Ratio Test) </option> + <option value="ljb2_phylop"> PhyloP score </option> + <option value="ljb2_fathmm" > FATHMM score </option> + <option value="ljb2_gerp"> GERP++ score </option> + <option value="ljb2_siphy"> SiPhy score </option> + </param> + <param name="otherinfo" type="boolean" checked="False" truevalue="-otherinfo" falsevalue="N" label="Also get predictions where possible?" help="e.g. annotated as -score,damaging- or -score,benign- instead of just score"/> + + <!-- OBSOLETE impact scores, uncomment for backwards compatibility, add argument impactscores to command +<param name="impactscores" type="select" label="Select functional impact scores annotate with (OBSOLETE)" multiple="true" display="checkboxes" optional="true" help="LJB refers to Liu, Jian, Boerwinkle paper in Human Mutation, pubmed ID 21520341."> + <option value="avsift"> AV SIFT </option> + <option value="ljbsift"> LJB SIFT (corresponds to 1-SIFT)</option> + <option value="pp2"> PolyPhen2 </option> + <option value="mutationtaster" > MutationTaster </option> + <option value="lrt"> LRT (Likelihood Ratio Test) </option> + <option value="phylop"> PhyloP </option> + </param> + --> + + <!-- prefix for output file so you dont have to manually rename history items --> + <param name="fname" type="text" value="" label="Prefix for your output file" help="Optional"/> + + </inputs> + + <outputs> + <data format="tabular" name="invalid" label="$fname ANNOVAR Invalid input on ${on_string}"/> + <data format="tabular" name="annotated" label="$fname ANNOVAR Annotated variants on ${on_string}"/> + </outputs> + + <help> +**What it does** + +This tool will annotate a file using ANNOVAR. + +**ANNOVAR Website and Documentation** + +Website: http://www.openbioinformatics.org/annovar/ + +Paper: http://nar.oxfordjournals.org/content/38/16/e164 + +**Input Formats** + +Input Formats may be one of the following: + +VCF file +Complete Genomics varfile + +Custom tab-delimited file (specify chromosome, start, end, reference allele, observed allele columns) + +Custom tab-delimited CG-derived file (specify chromosome, start, end, reference allele, observed allele, varType columns) + + +**Database Notes** + +see ANNOVAR website for extensive documentation, a few notes on some of the databases: + +**LJB2 Database** + +PolyPhen2 HVAR should be used for diagnostics of Mendelian diseases, which requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles.The authors recommend calling probably damaging if the score is between 0.909 and 1, and possibly damaging if the score is between 0.447 and 0.908, and benign if the score is between 0 and 0.446. + +PolyPhen HDIV should be used when evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data. The authors recommend calling probably damaging if the score is between 0.957 and 1, and possibly damaging if the score is between 0.453 and 0.956, and benign is the score is between 0 and 0.452. + + </help> + +</tool> + +