# HG changeset patch
# User Jim Johnson
# Date 1359584432 21600
# Node ID 2c595fea585c1e1dfa6e1914b2470646285b3159
# Parent c07c403fc470fcefd642adc9508c639671b0a22d
Add more documentation

diff -r c07c403fc470 -r 2c595fea585c snpSift_caseControl.xml
--- a/snpSift_caseControl.xml Thu Jan 17 16:31:12 2013 -0500
+++ b/snpSift_caseControl.xml Wed Jan 30 16:20:32 2013 -0600
@@ -8,7 +8,7 @@
 snpEff
- java -Xmx1G -jar \$JAVA_JAR_PATH/SnpSift.jar casControl -q $hhCase $hhControl $caseControStr $input > $output
+ java -Xmx1G -jar \$JAVA_JAR_PATH/SnpSift.jar casControl -q $hhCase $hhControl '$caseControStr' $input > $output
@@ -22,7 +22,12 @@
-
+
+
+Case and control are defined by a string containing plus and minus symbols {'+', '-', '0'} where '+' is case, '-' is control and '0' is neutral
+
+ [+-0]+
+
@@ -34,7 +39,32 @@
-Count samples are in 'case' and 'control' groups. You can count 'homozygous', 'heterozygous' or 'any' variants. Case and control are defined by a string containing plus and minus symbols ('+' and '-') where '+' is case and '-' is control. This command adds two annotations to the VCF file.
+**SnpSift CaseControl**
+
+Allows you to count how many samples are in the 'case' group and the 'control' group. You can count 'homozygous', 'heterozygous' or 'any' variants.
+
+Case and control are defined by a string containing plus and minus symbols {'+', '-', '0'} where '+' is case, '-' is control and '0' is neutral.
+
+This command adds two annotations to the VCF file:
+
+  - **CaseControl**: Two comma-separated numbers representing the number of samples that have the variant in the case and the control group. Example:
+
+    "CaseControl=3,4" *the variant is present in 3 cases and 4 controls.*
+
+
+  - **CaseControlP**: A p-value (Fisher exact test) that the number of cases is N or more. Example:
+
+    "CaseControl=4,0;CaseControlP=3.030303e-02" *in this case the p-value of having 4 or more cases and zero controls is 0.03*
+
+
+For example, if we have ten samples (which means ten genotype columns in the VCF file), the first four are 'case' and the last six are 'control', so the description string would be "++++------". Let's say we want to distinguish genotypes that are homozygous in 'case' and either homozygous or heterozygous in 'control'. We would set:
+
+  - Hom/Het case = "hom"
+
+  - Hom/Het control = "any"
+
+  - Case / Control column designation = "++++------"
+
 For details about this tool, please go to http://snpeff.sourceforge.net/SnpSift.html#casecontrol
diff -r c07c403fc470 -r 2c595fea585c snpSift_filter.xml
--- a/snpSift_filter.xml Thu Jan 17 16:31:12 2013 -0500
+++ b/snpSift_filter.xml Wed Jan 30 16:20:32 2013 -0600
@@ -31,9 +31,34 @@
-You can filter using arbitrary expressions.
+**SnpSift filter**
+
+You can filter a VCF file using arbitrary expressions, for instance "(QUAL > 30) | (exists INDEL) | ( countHet() > 2 )". The actual expressions can be quite complex, so it allows for a lot of flexibility.
+
+Some examples:
+
+  - *I want to filter out samples with quality less than 30*:
+
+    * **( QUAL > 30 )**
+
+  - *...but we also want InDels that have quality 20 or more*:
+
+    * **(( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )**
 
-For details about this tool, please go to http://snpeff.sourceforge.net/SnpSift.html#filter
+  - *...or any homozygous variant present in more than 3 samples*:
+
+    * **(countHom() > 3) | (( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )**
+
+  - *...or any heterozygous sample with coverage 25 or more*:
+
+    * **((countHet() > 0) && (DP >= 25)) | (countHom() > 3) | (( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )**
+
+  - *I want to keep samples where the genotype for the first sample is homozygous variant and the genotype for the second sample is reference*:
+
+    * **isHom( GEN[0] ) & isVariant( GEN[0] ) & isRef( GEN[1] )**
+
+
+For complete details about this tool and expressions that can be used, please go to http://snpeff.sourceforge.net/SnpSift.html#filter
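
Note on the case/control designation string from the snpSift_caseControl.xml help text in this changeset: the string can be built and checked mechanically before it is pasted into the tool form. The following is a minimal Python sketch, not part of SnpSift or the Galaxy wrapper. The helper names `build_designation` and `is_valid_designation` and the group labels "case", "control" and "neutral" are assumptions made for illustration only.

```python
import re

# Hypothetical helpers for illustration; not part of SnpSift or Galaxy.
# The hyphen comes first in the character class so it is read as a
# literal '-' rather than as a range operator.
DESIGNATION_RE = re.compile(r"[-+0]+")

# Assumed label names; the symbols themselves come from the help text.
SYMBOLS = {"case": "+", "control": "-", "neutral": "0"}


def build_designation(groups):
    """Turn per-sample group labels (in VCF column order) into a +/-/0 string."""
    return "".join(SYMBOLS[group] for group in groups)


def is_valid_designation(text, n_samples):
    """True if text has one symbol per sample and uses only '+', '-' and '0'."""
    return len(text) == n_samples and DESIGNATION_RE.fullmatch(text) is not None


# Ten samples: the first four are cases, the last six are controls.
groups = ["case"] * 4 + ["control"] * 6
designation = build_designation(groups)
print(designation)  # ++++------
```

One design point worth noting: in Python's `re` syntax a class written as `[+-0]` is a range from '+' to '0', so it also accepts ',', '.' and '/'. Whether the validator in the patch is evaluated with the same rules is an assumption here, but writing the class as `[-+0]` sidesteps the question.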