snpsift_dbnsfp: snpSift_dbnsfp.xml comparison

comparison snpSift_dbnsfp.xml @ 4:4e21e4f2bc48 draft default tip

planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tool_collections/snpsift/snpsift_dbnsfp commit fbc18d9128669e461e76ed13307ee88dd774afa5

author	iuc
date	Mon, 12 Jun 2017 10:25:44 -0400
parents	c838e7136a40
children

comparison

equal deleted inserted replaced

-:563d1bdb7b80
+:4e21e4f2bc48
-<tool id="snpSift_dbnsfp" name="SnpSift dbNSFP" version="@WRAPPER_VERSION@.1">
+<tool id="snpSift_dbnsfp" name="SnpSift dbNSFP" version="@WRAPPER_VERSION@.0">
-<description>Add Annotations from dbNSFP or similar annotation DBs</description>
+<description>Add annotations from dbNSFP or similar annotation DBs</description>
 <macros>
 <import>snpSift_macros.xml</import>
 </macros>
 <expand macro="requirements" />
 <expand macro="stdio" />
 <expand macro="version_command" />
 <command><![CDATA[
-@CONDA_SNPSIFT_JAR_PATH@ &&
+SnpSift -Xmx6G dbnsfp -v
-java -Xmx6G -jar "\$SNPSIFT_JAR_PATH/SnpSift.jar" dbnsfp -v
+#if $db.dbsrc == 'cached':
-#if $db.dbsrc == 'cached':
+-db '$db.dbnsfp'
--db "$db.dbnsfp"
+#if $db.annotations and str($db.annotations) != '':
-#if $db.annotations and str($db.annotations) != '':
+-f '$db.annotations'
--f "$db.annotations"
+#end if
-#end if
+#else:
-#else:
+-db '${db.dbnsfpdb.extra_files_path}/${db.dbnsfpdb.metadata.bgzip}'
--db "${db.dbnsfpdb.extra_files_path}/${db.dbnsfpdb.metadata.bgzip}"
+#if $db.annotations and str($db.annotations) != '':
-#if $db.annotations and str($db.annotations) != '':
+-f '$db.annotations'
--f "$db.annotations"
+#end if
 #end if
-#end if
+'$input' > '$output'
-"$input" > "$output"
+2> tmp.err && grep -v file tmp.err
-2> tmp.err && grep -v file tmp.err
+]]></command>
-]]>
-</command>
 <inputs>
 <param name="input" type="data" format="vcf" label="Variant input file in VCF format"/>
 <conditional name="db">
 <param name="dbsrc" type="select" label="dbNSFP ">
 <option value="cached">Locally installed dbNSFP database </option>
 </param>
 </when>
 <when value="history">
 <param name="dbnsfpdb" type="data" format="snpsiftdbnsfp" label="DbNSFP"/>
 <param name="annotations" type="select" multiple="true" display="checkboxes" label="Annotate with">
 <options>
 <filter type="data_meta" ref="dbnsfpdb" key="annotation" />
 </options>
 </param>
 </when>
 </conditional>
 </inputs>
 <outputs>
-<data format="vcf" name="output" />
+<data name="output" format="vcf" />
 </outputs>
 <tests>
 <!-- This cannot be tested at the moment because test_dbnsfpdb.tabular
 is converted from dbnsfp.tabular to snpsiftdbnsfp format on-the-fly
 when this tool is run and annotation metadata is not available
 </assert_contents>
 </output>
 </test> -->
 </tests>
 <help><![CDATA[
 The dbNSFP is an integrated database of functional predictions from multiple algorithms (SIFT, Polyphen2, LRT and MutationTaster, PhyloP and GERP++, etc.).
 It contains variant annotations such as:
+1000Gp1_AC
-1000Gp1_AC
 Alternative allele counts in the whole 1000 genomes phase 1 (1000Gp1) data
 1000Gp1_AF
 Alternative allele frequency in the whole 1000Gp1 data
 1000Gp1_AFR_AC
 Alternative allele counts in the 1000Gp1 African descendent samples
 1000Gp1_AFR_AF
 Alternative allele frequency in the 1000Gp1 African descendent samples
 1000Gp1_AMR_AC
 Alternative allele counts in the 1000Gp1 American descendent samples
 1000Gp1_AMR_AF
 Alternative allele frequency in the 1000Gp1 American descendent samples
 1000Gp1_ASN_AC
 Alternative allele counts in the 1000Gp1 Asian descendent samples
 1000Gp1_ASN_AF
 Alternative allele frequency in the 1000Gp1 Asian descendent samples
 1000Gp1_EUR_AC
 Alternative allele counts in the 1000Gp1 European descendent samples
 1000Gp1_EUR_AF
 Alternative allele frequency in the 1000Gp1 European descendent samples
 aaalt
 Alternative amino acid. "." if the variant is a splicing site SNP (2bp on each end of an intron)
 aapos
 Amino acid position as to the protein. "-1" if the variant is a splicing site SNP (2bp on each end of an intron)
 aapos_SIFT
 ENSP id and amino acid positions corresponding to SIFT scores. Multiple entries separated by ";"
 aapos_FATHMM
 ENSP id and amino acid positions corresponding to FATHMM scores. Multiple entries separated by ";"
 aaref
 Reference amino acid. "." if the variant is a splicing site SNP (2bp on each end of an intron)
 alt
 Alternative nucleotide allele (as on the + strand)
 Ancestral_allele
 Ancestral allele (based on 1000 genomes reference data)
 cds_strand
 Coding sequence (CDS) strand (+ or -)
 chr
 Chromosome number
 codonpos
 Position on the codon (1, 2 or 3)
 Ensembl_geneid
 Ensembl gene ID
 Ensembl_transcriptid
 Ensembl transcript IDs (separated by ";")
 ESP6500_AA_AF
 Alternative allele frequency in the African American samples of the NHLBI GO Exome Sequencing Project (ESP6500 data set)
 ESP6500_EA_AF
 Alternative allele frequency in the European American samples of the NHLBI GO Exome Sequencing Project (ESP6500 data set)
 FATHMM_pred
 If a FATHMM_score is <=-1.5 (or rankscore <=0.81415) the corresponding non-synonymous SNP is predicted as "D(AMAGING)"; otherwise it is predicted as "T(OLERATED)". Multiple predictions separated by ";"
 FATHMM_rankscore
 FATHMMori scores were ranked among all FATHMMori scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of FATHMMori scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0 to 1
 FATHMM_score
 FATHMM default score (FATHMMori)
 fold-degenerate
 Degenerate type (0, 2 or 3)
 genename
 Gene name; if the non-synonymous SNP can be assigned to multiple genes, gene names are separated by ";"
 GERP++_NR
 GERP++ neutral rate
 GERP++_RS
 GERP++ RS score, the larger the score, the more conserved the site
 GERP++_RS_rankscore
 GERP++ RS scores were ranked among all GERP++ RS scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of GERP++ RS scores in dbNSFP
 hg18_pos(1-coor)
 Physical position on the chromosome as to hg18 (1-based coordinate)
 Interpro_domain
 Domain or conserved site on which the variant locates
 LR_pred
 Prediction of our LR based ensemble prediction score, "T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0.5. The rankscore cutoff between "D" and "T" is 0.82268
 LR_rankscore
 LR scores were ranked among all LR scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of LR scores in dbNSFP. The scores range from 0 to 1
 LR_score
 Our logistic regression (LR) based ensemble prediction score, which incorporated 10 scores (SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, MutationTaster, Mutation Assessor, FATHMM, LRT, SiPhy, PhyloP) and the maximum frequency observed in the 1000 genomes populations. Larger value means the SNV is more likely to be damaging. Scores range from 0 to 1
 LRT_Omega
 Estimated nonsynonymous-to-synonymous-rate ratio (Omega, reported by LRT)
 LRT_converted_rankscore
 LRTori scores were first converted as LRTnew=1-LRTori*0.5 if Omega<1, or LRTnew=LRTori*0.5 if Omega>=1. Then LRTnew scores were ranked among all LRTnew scores in dbNSFP. The rankscore is the ratio of the rank over the total number of the scores in dbNSFP. The scores range from 0.00166 to 0.85682
 LRT_pred
 LRT prediction, D(eleterious), N(eutral) or U(nknown), which is not solely determined by the score
 LRT_score
 The original LRT two-sided p-value (LRTori), ranges from 0 to 1
 MutationAssessor_pred
 MutationAssessor's functional impact of a variant
 MutationAssessor_rankscore
 MAori scores were ranked among all MAori scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MAori scores in dbNSFP. The scores range from 0 to 1
 MutationAssessor_score
 MutationAssessor functional impact combined score (MAori)
 MutationTaster_converted_rankscore
 The MTori scores were first converted: if the prediction is "A" or "D" MTnew=MTori; if the prediction is "N" or "P", MTnew=1-MTori. Then MTnew scores were ranked among all MTnew scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MTnew scores in dbNSFP. The scores range from 0.0931 to 0.80722
 MutationTaster_pred
 MutationTaster prediction
 MutationTaster_score
 MutationTaster p-value (MTori), ranges from 0 to 1
 phastCons46way_placental
 phastCons conservation score based on the multiple alignments of 33 placental mammal genomes (including human). The larger the score, the more conserved the site
 phastCons46way_placental_rankscore
 phastCons46way_placental scores were ranked among all phastCons46way_placental scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phastCons46way_placental scores in dbNSFP
 phastCons46way_primate
 phastCons conservation score based on the multiple alignments of 10 primate genomes (including human). The larger the score, the more conserved the site
 phastCons46way_primate_rankscore
 phastCons46way_primate scores were ranked among all phastCons46way_primate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phastCons46way_primate scores in dbNSFP
 phastCons100way_vertebrate
 phastCons conservation score based on the multiple alignments of 100 vertebrate genomes (including human). The larger the score, the more conserved the site
 phastCons100way_vertebrate_rankscore
 phastCons100way_vertebrate scores were ranked among all phastCons100way_vertebrate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phastCons100way_vertebrate scores in dbNSFP
 phyloP46way_placental
 phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 33 placental mammal genomes (including human). The larger the score, the more conserved the site
 phyloP46way_placental_rankscore
 phyloP46way_placental scores were ranked among all phyloP46way_placental scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phyloP46way_placental scores in dbNSFP
 phyloP46way_primate
 phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 10 primate genomes (including human). The larger the score, the more conserved the site
 phyloP46way_primate_rankscore
 phyloP46way_primate scores were ranked among all phyloP46way_primate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phyloP46way_primate scores in dbNSFP
 phyloP100way_vertebrate
 phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 100 vertebrate genomes (including human). The larger the score, the more conserved the site
 phyloP100way_vertebrate_rankscore
 phyloP100way_vertebrate scores were ranked among all phyloP100way_vertebrate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phyloP100way_vertebrate scores in dbNSFP
 Polyphen2_HDIV_pred
 Polyphen2 prediction based on HumDiv
 Polyphen2_HDIV_rankscore
 Polyphen2 HDIV scores were first ranked among all HDIV scores in dbNSFP. The rankscore is the ratio of the rank the score over the total number of the scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0.02656 to 0.89917
 Polyphen2_HDIV_score
 Polyphen2 score based on HumDiv, i.e. hdiv_prob. The score ranges from 0 to 1. Multiple entries separated by ";"
 Polyphen2_HVAR_pred
 Polyphen2 prediction based on HumVar
 Polyphen2_HVAR_rankscore
 Polyphen2 HVAR scores were first ranked among all HVAR scores in dbNSFP. The rankscore is the ratio of the rank the score over the total number of the scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0.01281 to 0.9711
 Polyphen2_HVAR_score
 Polyphen2 score based on HumVar, i.e. hvar_prob. The score ranges from 0 to 1. Multiple entries separated by ";"
 pos(1-coor)
 Physical position on the chromosome as to hg19 (1-based coordinate)
 RadialSVM_pred
 Prediction of our SVM based ensemble prediction score, "T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0. The rankscore cutoff between "D" and "T" is 0.83357
 RadialSVM_rankscore
 RadialSVM scores were ranked among all RadialSVM scores in dbNSFP. The rankscore is the ratio of the rank of the screo over the total number of RadialSVM scores in dbNSFP. The scores range from 0 to 1
 RadialSVM_score
 Our support vector machine (SVM) based ensemble prediction score, which incorporated 10 scores (SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, MutationTaster, Mutation Assessor, FATHMM, LRT, SiPhy, PhyloP) and the maximum frequency observed in the 1000 genomes populations. Larger value means the SNV is more likely to be damaging. Scores range from -2 to 3 in dbNSFP
 ref
 Reference nucleotide allele (as on the + strand)
 refcodon
 Reference codon
 Reliability_index
 Number of observed component scores (except the maximum frequency in the 1000 genomes populations) for RadialSVM and LR. Ranges from 1 to 10. As RadialSVM and LR scores are calculated based on imputed data, the less missing component scores, the higher the reliability of the scores and predictions
 SIFT_converted_rankscore
 SIFTori scores were first converted to SIFTnew=1-SIFTori, then ranked among all SIFTnew scores in dbNSFP. The rankscore is the ratio of the rank the SIFTnew score over the total number of SIFTnew scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The rankscores range from 0.02654 to 0.87932
 SIFT_pred
 If SIFTori is smaller than 0.05 (rankscore>0.55) the corresponding non-synonymous SNP is predicted as "D(amaging)"; otherwise it is predicted as "T(olerated)". Multiple predictions separated by ";"
 SIFT_score
 SIFT score (SIFTori). Scores range from 0 to 1. The smaller the score the more likely the SNP has damaging effect. Multiple scores separated by ";"
 SiPhy_29way_logOdds
 SiPhy score based on 29 mammals genomes. The larger the score, the more conserved the site
 SiPhy_29way_pi
 The estimated stationary distribution of A, C, G and T at the site, using SiPhy algorithm based on 29 mammals genomes
 SLR_test_statistic
 SLR test statistic for testing natural selection on codons. A negative value indicates negative selection, and a positive value indicates positive selection. Larger magnitude of the value suggests stronger evidence
 Uniprot_aapos
 Amino acid position as to Uniprot. Multiple entries separated by ";"
 Uniprot_acc
 Uniprot accession number. Multiple entries separated by ";"
 Uniprot_id
 Uniprot ID number. Multiple entries separated by ";"
 UniSNP_ids
 rs numbers from UniSNP, which is a cleaned version of dbSNP build 129, in format: rs number1;rs number2;...
+The dbNSFP database is available from https://sites.google.com/site/jpopgen/dbNSFP and there is only annotation for human genome builds.
-The website for  dbNSFP database is https://sites.google.com/site/jpopgen/dbNSFP and there is only annotation for human genome builds.
+The procedure for preparing the dbNSFP data for use in SnpSift dbnsfp and a couple of prebuilt dbNSFP databases are available at:
-The procedure for preparing the dbNSFP data for use in SnpSift dbnsfp is in the SnpSift documentation:
-*( It also provides links for dbNSFP databases prebuilt for SnpSift )*
 http://snpeff.sourceforge.net/SnpSift.html#dbNSFP
-However, any dbNSFP-like tabular file that be can used with SnpSift dbnsfp if it has::
+However, any dbNSFP-like tabular file that be can used with SnpSift dbnsfp if it has:
-	- The first line of the file must be column headers that name the annotations.
+- The first line of the file must be column headers that name the annotations.
-	- The first 4 columns are required and must be::
+- The first 4 columns are required and must be:
-		1. chromosome
-		2. position in chromosome
+1. chr: chromosome
-		3. reference base
+2. pos(1-coor): position in chromosome
-		4. alternate base
+3. ref: reference base
+4. alt: alternate base
-For example:
+For example::
-::
+#chr	pos(1-coor)	ref	alt	aaref	aaalt	genename	SIFT_score
-	#chr	pos(1-coor)	ref	alt	aaref	aaalt	genename	SIFT_score
+1	69134	A	C	E	A	OR4F5	0.03
-	1	69134	A	C	E	A	OR4F5	0.03
+1	69134	A	G	E	G	OR4F5	0.09
-	1	69134	A	G	E	G	OR4F5	0.09
+1	69134	A	T	E	V	OR4F5	0.03
-	1	69134	A	T	E	V	OR4F5	0.03
+4	100239319	T	A	H	L	ADH1B	0
-	4	100239319	T	A	H	L	ADH1B	0
+4	100239319	T	C	H	R	ADH1B	0.15
-	4	100239319	T	C	H	R	ADH1B	0.15
+4	100239319	T	G	H	P	ADH1B	0
-	4	100239319	T	G	H	P	ADH1B	0
+The Galaxy datatypes for dbNSFP can automatically convert the specially formatted tabular file for use by SnpSift dbNSFP:
-The galaxy datatypes for dbNSFP can automatically convert the specially formatted tabular file for use by SnpSift dbNSFP:
+1. Upload the tabular file, set the datatype as: **"dbnsfp.tabular"**
-1. Upload the tabular file, set the datatype as: **"dbnsfp.tabular"**
+2. Edit the history dataset attributes (pencil icon): Use "Convert Format" to convert the **"dbnsfp.tabular"** to the correct format for SnpSift dbnsfp: **"snpsiftdbnsfp"**.
-2. Edit the history dataset attributes (pencil icon): Use "Convert Format" to convert the **"dbnsfp.tabular"** to the correct format for SnpSift dbnsfp: **"snpsiftdbnsfp"**.
 @EXTERNAL_DOCUMENTATION@
-	http://snpeff.sourceforge.net/SnpSift.html#dbNSFP
+- http://snpeff.sourceforge.net/SnpSift.html#dbNSFP
-]]>
+]]></help>
-</help>
 <expand macro="citations">
-<citation type="doi">DOI: 10.1002/humu.21517</citation>
+<citation type="doi">10.1002/humu.21517</citation>
-<citation type="doi">DOI: 10.1002/humu.22376</citation>
+<citation type="doi">10.1002/humu.22932</citation>
-<citation type="doi">DOI: 10.1002/humu.22932</citation>
+<citation type="doi">10.1093/hmg/ddu733</citation>
-<citation type="doi">doi: 10.1093/hmg/ddu733</citation>
+<citation type="doi">10.3389/fgene.2012.00035</citation>
-<citation type="doi">doi: 10.1093/nar/gku1206</citation>
-<citation type="doi">doi: 10.3389/fgene.2012.00035</citation>
 </expand>
 </tool>

Mercurial > repos > iuc > snpsift_dbnsfp

comparison snpSift_dbnsfp.xml @ 4:4e21e4f2bc48 draft default tip