# HG changeset patch
# User iuc
# Date 1480957878 18000
# Node ID bf8c1526871b055b31b95c8ab6578d4632bd8596
# Parent 98708b88af9f75daec37b70f3e7ee0834c060fee
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tool_collections/snpsift/snpsift commit d12355cea76843e3ed6f09d96c3e9fe22afe4a4f
diff -r 98708b88af9f -r bf8c1526871b snpSift_annotate.xml
--- a/snpSift_annotate.xml Tue Jun 07 10:04:09 2016 -0400
+++ b/snpSift_annotate.xml Mon Dec 05 12:11:18 2016 -0500
@@ -1,6 +1,6 @@
-
+
 SNPs from dbSnp
-
@@ -10,33 +10,33 @@
 "$output"
+ #end if
+ -q "$dbSnp" "$input" > "$output"
 ]]>
-
-
+
 ^(([a-zA-Z][a-zA-Z0-9_-]*)(,[a-zA-Z][a-zA-Z0-9_-]*)*)?$
-
+
 This option will load the entire 'database' VCF file into memory (which may not be practical for large 'database' VCF files).
- Otherwise, both the database and the input VCF files should be sorted by position (Chromosome sort order can differ between files).
+ Otherwise, both the database and the input VCF files should be sorted by position (Chromosome sort order can differ between files).
diff -r 98708b88af9f -r bf8c1526871b snpSift_caseControl.xml
--- a/snpSift_caseControl.xml Tue Jun 07 10:04:09 2016 -0400
+++ b/snpSift_caseControl.xml Mon Dec 05 12:11:18 2016 -0500
@@ -1,6 +1,6 @@
-
+
 Count samples are in 'case' and 'control' groups.
-
@@ -10,16 +10,17 @@
 "$output"
+ @CONDA_SNPSIFT_JAR_PATH@ &&
+ java -Xmx1G -jar "\$SNPSIFT_JAR_PATH/SnpSift.jar" caseControl -q
+ #if str($name).strip() != '':
+ -name "$name"
+ #end if
+ #if $ctrl.ctrl_src == 'caseString':
+ '$ctrl.caseControlStr'
+ #else
+ -tfam "$ctrl.tfam"
+ #end if
+ "$input" > "$output"
 ]]>
@@ -41,8 +42,8 @@
-
- [_a-zA-Z0-9]+
+
+ [_a-zA-Z0-9]*
@@ -90,13 +91,13 @@
 **SnpSift CaseControl**
-Allows you to count how many samples are in 'case' group and a 'control' group. You can count 'homozygous', 'heterozygous' or 'any' variants.
+Allows you to count how many samples are in 'case' group and a 'control' group. You can count 'homozygous', 'heterozygous' or 'any' variants.
-Case and control are defined by a string containing plus and minus symbols {'+', '-', '0'} where '+' is case, '-' is control and '0' is neutral.
+Case and control are defined by a string containing plus and minus symbols {'+', '-', '0'} where '+' is case, '-' is control and '0' is neutral.
 This command adds two annotations to the VCF file:
- - **CaseControl**: Two comma separated numbers numbers representing the number of samples that have the variant in the case and the control group. Example:
+ - **CaseControl**: Two comma separated numbers numbers representing the number of samples that have the variant in the case and the control group. Example:
 "CaseControl=3,4" *the variant is present in 3 cases and 4 controls.*
@@ -110,7 +111,7 @@
 - Hom/Het case = "hom"
- - Hom/Het control = "any"
+ - Hom/Het control = "any"
 - Case / Control column designation = ""++++------"
diff -r 98708b88af9f -r bf8c1526871b snpSift_extractFields.xml
--- a/snpSift_extractFields.xml Tue Jun 07 10:04:09 2016 -0400
+++ b/snpSift_extractFields.xml Mon Dec 05 12:11:18 2016 -0500
@@ -1,6 +1,6 @@
-
+
- from a VCF file inot a tabular file
+ from a VCF file into a tabular file
 snpSift_macros.xml
@@ -8,16 +8,17 @@
-
-
-
-
+
+
@@ -78,13 +77,13 @@
 ID
 REF
 ALT
- FILTER
+ FILTER
 INFO fields:
 AF
 AC
 DP
 MQ
- etc. (any info field available)
+ etc. (any info field available)
 SnpEff 'ANN' fields:
 "ANN[*].ALLELE" (alias GENOTYPE)
 "ANN[*].EFFECT" (alias ANNOTATION): Effect in Sequence ontology terms (e.g. 'missense_variant', 'synonymous_variant', 'stop_gained', etc.)
@@ -104,7 +103,7 @@
 "ANN[*].AA_POS" (alias POS_AA)
 "ANN[*].AA_LEN" (alias LEN_AA)
 "ANN[*].DISTANCE"
- "ANN[*].ERRORS" (alias WARNING, INFOS)
+ "ANN[*].ERRORS" (alias WARNING, INFOS)
 SnpEff 'EFF' fields (this is for older SnpEff/SnpSift versions, new version use 'ANN' field):
 "EFF[*].EFFECT"
 "EFF[*].IMPACT"
@@ -116,17 +115,17 @@
 "EFF[*].BIOTYPE"
 "EFF[*].CODING"
 "EFF[*].TRID"
- "EFF[*].RANK"
+ "EFF[*].RANK"
 SnpEff 'LOF' fields:
 "LOF[*].GENE"
 "LOF[*].GENEID"
 "LOF[*].NUMTR"
- "LOF[*].PERC"
+ "LOF[*].PERC"
 SnpEff' NMD' fields:
 "NMD[*].GENE"
 "NMD[*].GENEID"
 "NMD[*].NUMTR"
- "NMD[*].PERC"
+ "NMD[*].PERC"
 Some examples:
@@ -135,7 +134,7 @@
 **CHROM POS ID AF**
- The result will look something like:
+ The result will look something like:
 ::
@@ -155,25 +154,25 @@
 - GEN[0].GL[1] : Second likelihood from first genotype
 - GEN[1].GL : The whole GL fiels (all entries without separating them)
 - GEN[3].GL[*] : All likelihoods form genotype 3 (this time they will be tab separated, as opposed to the previous one).
- - GEN[*].GT : Genotype subfields (GT) from ALL samples (tab separated).
+ - GEN[*].GT : Genotype subfields (GT) from ALL samples (tab separated).
- The result will look something like:
+ The result will look something like:
 ::
 #CHROM POS ID THETA GEN[0].GL[1] GEN[1].GL GEN[3].GL[*] GEN[*].GT
- 1 10583 rs58108140 0.0046 -0.47 -0.24,-0.44,-1.16 -0.48 -0.48 -0.48 0|0 0|0 0|0 0|1 0|0 0|1 0|0 0|0 0|1
- 1 10611 rs189107123 0.0077 -0.48 -0.24,-0.44,-1.16 -0.48 -0.48 -0.48 0|0 0|1 0|0 0|0 0|0 0|0 0|0 0|0 0|0
- 1 13302 rs180734498 0.0048 -0.58 -2.45,-0.00,-5.00 -0.48 -0.48 -0.48 0|0 0|1 0|0 0|0 0|0 1|0 0|0 0|1 0|0
+ 1 10583 rs58108140 0.0046 -0.47 -0.24,-0.44,-1.16 -0.48 -0.48 -0.48 0|0 0|0 0|0 0|1 0|0 0|1 0|0 0|0 0|1
+ 1 10611 rs189107123 0.0077 -0.48 -0.24,-0.44,-1.16 -0.48 -0.48 -0.48 0|0 0|1 0|0 0|0 0|0 0|0 0|0 0|0 0|0
+ 1 13302 rs180734498 0.0048 -0.58 -2.45,-0.00,-5.00 -0.48 -0.48 -0.48 0|0 0|1 0|0 0|0 0|0 1|0 0|0 0|1 0|0
 - *Extracting fields with multiple values:* (notice that there are multiple effect columns per line because there are mutiple effects per variant)
 **CHROM POS REF ALT ANN[*].EFFECT**
- The result will look something like:
+ The result will look something like:
- ::
+ ::
 #CHROM POS REF ALT ANN[*].EFFECT
 22 17071756 T C 3_prime_UTR_variant downstream_gene_variant
@@ -184,9 +183,9 @@
 **CHROM POS REF ALT ANN[*].EFFECT ANN[*].HGVS_P**
- The result will look something like:
+ The result will look something like:
- ::
+ ::
 #CHROM POS REF ALT ANN[*].EFFECT ANN[*].HGVS_P
 22 17071756 T C 3_prime_UTR_variant,downstream_gene_variant .,.
@@ -198,9 +197,9 @@
 **CHROM POS REF ALT ANN[*].EFFECT**
- The result will look something like:
+ The result will look something like:
- ::
+ ::
 #CHROM POS REF ALT ANN[*].EFFECT
 22 17071756 T C 3_prime_UTR_variant
diff -r 98708b88af9f -r bf8c1526871b snpSift_filter.xml
--- a/snpSift_filter.xml Tue Jun 07 10:04:09 2016 -0400
+++ b/snpSift_filter.xml Mon Dec 05 12:11:18 2016 -0500
@@ -1,4 +1,4 @@
-
+
 Filter variants using arbitrary expressions
 snpSift_macros.xml
@@ -8,7 +8,8 @@
 $expr#slurp
-
+
@@ -41,20 +42,19 @@
-
+
-
-
+
-
-
+
+
@@ -133,20 +133,20 @@
 ::
- (FILTER = 'PASS') | ( na FILTER )
+ (FILTER = 'PASS') | ( na FILTER )
 - *I want to filter lines with an ANN annotation EFFECT of 'frameshift_variant' ( for vcf files using Sequence Ontology terms )*:
 ::
-
+
 ( ANN[*].EFFECT has 'frameshift_variant' )
- **Important** According to the specification, there can be more than one EFFECT separated by & (e.g. 'missense_variant&splice_region_variant', thus using has operator is better than using equality operator (=). For instance 'missense_variant&splice_region_variant' = 'missense_variant' is false, whereas 'missense_variant&splice_region_variant' has 'missense_variant' is true.
+ **Important** According to the specification, there can be more than one EFFECT separated by & (e.g. 'missense_variant&splice_region_variant', thus using has operator is better than using equality operator (=). For instance 'missense_variant&splice_region_variant' = 'missense_variant' is false, whereas 'missense_variant&splice_region_variant' has 'missense_variant' is true.
 - *I want to filter lines with an EFF of 'FRAME_SHIFT' ( for vcf files using Classic Effect names )*:
 ::
-
+
 ( EFF[*].EFFECT = 'FRAME_SHIFT' )
 - *I want to filter out samples with quality less than 30*:
@@ -160,23 +160,23 @@
 ::
 (( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )
-
+
 - *...or any homozygous variant present in more than 3 samples*:
 ::
 (countHom() > 3) | (( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )
-
+
 - *...or any heterozygous sample with coverage 25 or more*:
 ::
 ((countHet() > 0) & (DP >= 25)) | (countHom() > 3) | (( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )
-
+
 - *I want to keep samples where the genotype for the first sample is homozygous variant and the genotype for the second sample is reference*:
 ::
-
+
 (isHom( GEN[0] ) & isVariant( GEN[0] ) & isRef( GEN[1] ))
diff -r 98708b88af9f -r bf8c1526871b snpSift_int.xml
--- a/snpSift_int.xml Tue Jun 07 10:04:09 2016 -0400
+++ b/snpSift_int.xml Mon Dec 05 12:11:18 2016 -0500
@@ -1,4 +1,4 @@
-
+
 Filter variants using intervals