changeset 1:2c595fea585c

Add more documentation
author Jim Johnson <jj@umn.edu>
date Wed, 30 Jan 2013 16:20:32 -0600
parents c07c403fc470
children a9bae7957c36
files snpSift_caseControl.xml snpSift_filter.xml
diffstat 2 files changed, 60 insertions(+), 5 deletions(-)
--- a/snpSift_caseControl.xml	Thu Jan 17 16:31:12 2013 -0500
+++ b/snpSift_caseControl.xml	Wed Jan 30 16:20:32 2013 -0600
@@ -8,7 +8,7 @@
                 <requirement type="package" version="3.1">snpEff</requirement>
 	</requirements>
 	<command>
-		java -Xmx1G -jar \$JAVA_JAR_PATH/SnpSift.jar casControl -q $hhCase $hhControl $caseControStr $input > $output
+		java -Xmx1G -jar \$JAVA_JAR_PATH/SnpSift.jar casControl -q $hhCase $hhControl '$caseControStr' $input > $output
 	</command>
 	<inputs>
 		<param format="vcf" name="input" type="data" label="VCF input"/>
@@ -22,7 +22,12 @@
 			<option value="hom">Homozygous</option>
 			<option value="het">Heterozygous</option>
 		</param>
-		<param name="caseControStr" type="text" label="Case / Control" size="50"/>
+		<param name="caseControStr" type="text" label="Case / Control column designation" size="50">
+		<help>
+Case and control are defined by a string of '+', '-' and '0' characters, one per sample, where '+' is case, '-' is control and '0' is neutral
+		</help>
+		<validator type="regex" message="must contain only plus (+), minus (-), or zero (0) characters">^[-+0]+$</validator>
+                </param>
 	</inputs>
 	<outputs>
 		<data format="vcf" name="output" />
@@ -34,7 +39,32 @@
 
 	<help>
 
-Count samples are in 'case' and 'control' groups. You can count 'homozygous', 'heterozygous' or 'any' variants. Case and control are defined by a string containing plus and minus symbols ('+' and '-') where '+' is case and '-' is control. This command adds two annotations to the VCF file.
+**SnpSift CaseControl**
+
+Counts how many samples are in the 'case' group and how many are in the 'control' group. You can count 'homozygous', 'heterozygous' or 'any' variants.
+
+Case and control are defined by a string of '+', '-' and '0' characters, one per sample, where '+' is case, '-' is control and '0' is neutral.
+
+This command adds two annotations to the VCF file:
+
+ - **CaseControl**: Two comma-separated numbers representing the number of samples that have the variant in the case group and in the control group. Example:
+
+  "CaseControl=3,4" *the variant is present in 3 cases and 4 controls.*
+
+
+ - **CaseControlP**: A p-value (Fisher exact test) that the number of cases is N or more. Example:
+
+  "CaseControl=4,0;CaseControlP=3.030303e-02" *here the p-value of having 4 or more cases and zero controls is 0.03*
+
+
+For example, suppose we have ten samples (that is, ten genotype columns in the VCF file), where the first four are 'case' and the last six are 'control'. The description string would then be "++++------". Let's say we want to distinguish genotypes that are homozygous in 'case' and either homozygous or heterozygous in 'control'. We would set:
+
+  - Hom/Het case = "hom"
+
+  - Hom/Het control = "any"  
+
+  - Case / Control column designation = "++++------"
+
 
 For details about this tool, please go to http://snpeff.sourceforge.net/SnpSift.html#casecontrol
 
--- a/snpSift_filter.xml	Thu Jan 17 16:31:12 2013 -0500
+++ b/snpSift_filter.xml	Wed Jan 30 16:20:32 2013 -0600
@@ -31,9 +31,34 @@
         </stdio>
 	<help>
 
-You can filter using arbitrary expressions.
+**SnpSift filter**
+
+You can filter a VCF file using arbitrary expressions, for instance "(QUAL > 30) | (exists INDEL) | ( countHet() > 2 )". Expressions can be quite complex, which allows for a lot of flexibility.
+
+Some examples:
+
+  - *I want to filter out variants with quality less than 30*:
+
+    * **( QUAL &gt;= 30 )**
+
+  - *...but we also want InDels that have quality 20 or more*:
+
+    * **(( exists INDEL ) &amp; (QUAL >= 20)) | (QUAL >= 30 )**
 
-For details about this tool, please go to http://snpeff.sourceforge.net/SnpSift.html#filter
+  - *...or any homozygous variant present in more than 3 samples*:
+
+    * **(countHom() > 3) | (( exists INDEL ) &amp; (QUAL >= 20)) | (QUAL >= 30 )**
+
+  - *...or any heterozygous sample with coverage 25 or more*:
+
+    * **((countHet() > 0) &amp;&amp; (DP >= 25)) | (countHom() > 3) | (( exists INDEL ) &amp; (QUAL >= 20)) | (QUAL >= 30 )**
+
+  - *I want to keep variants where the genotype for the first sample is homozygous variant and the genotype for the second sample is reference*:
+
+    * **isHom( GEN[0] ) &amp; isVariant( GEN[0] ) &amp; isRef( GEN[1] )**
+
+
+For complete details about this tool and the expressions that can be used, please go to http://snpeff.sourceforge.net/SnpSift.html#filter
 
 	</help>
 </tool>
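
A note on the "Case / Control column designation" parameter in snpSift_caseControl.xml: the help text describes a string of '+', '-' and '0' characters, one per genotype column. The Python sketch below is illustrative only and is not part of the tool. It builds such a string and checks it, assuming the validator pattern is evaluated with Python `re` semantics. The names `designation`, `is_valid`, `RANGE_CLASS` and `LITERAL_CLASS` are hypothetical. It also shows why hyphen placement inside a character class matters: a class written like `[+-0]` is read as a range from '+' to '0', which includes ',', '.' and '/'.

```python
import re

# '-' placed between '+' and '0' makes "+-0" a range (U+002B..U+0030),
# which covers the six characters: + , - . / 0
RANGE_CLASS = re.compile(r"[+-0]+")

# '-' listed first is a literal hyphen, so only '-', '+' and '0' match
LITERAL_CLASS = re.compile(r"[-+0]+")


def designation(n_case, n_control, n_neutral=0):
    """Build a designation string: cases first, then controls, then neutral samples."""
    return "+" * n_case + "-" * n_control + "0" * n_neutral


def is_valid(text, n_samples):
    """True if text has exactly one of '+', '-', '0' per genotype column."""
    return len(text) == n_samples and LITERAL_CLASS.fullmatch(text) is not None


example = designation(4, 6)
print(example)                                        # ++++------
print(is_valid(example, 10))                          # True
print(RANGE_CLASS.fullmatch("+,./-0") is not None)    # True: range accepts , . /
print(LITERAL_CLASS.fullmatch("+,./-0") is not None)  # False
```

The length check reflects the statement in the help text that ten samples mean ten genotype columns. Whether SnpSift itself rejects a string of the wrong length is not stated in this changeset.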