view snpSift_filter.xml @ 6:ec16dae84230

Fix tests for SnpSift caseControl and intervals
author Jim Johnson <jj@umn.edu>
date Tue, 26 Mar 2013 15:24:30 -0500
parents 192a236898f5
children 13b6ad2ddace

<tool id="snpSift_filter" name="SnpSift Filter" version="3.1">
	<options sanitize="False" />
	<description>Filter variants using arbitrary expressions</description>
	<!-- 
	    Change the path below to match the location of your installation.
	    To change the amount of memory used, adjust the -Xmx parameter (e.g. use -Xmx2G for 2 GB of memory).
	java -Xmx6G -jar $JAVA_JAR_PATH/SnpSift.jar filter -f $input -e $exprFile > $output
	-->
	<requirements>
                <requirement type="package" version="3.1">snpEff</requirement>
	</requirements>
	<command>
		java -Xmx6G -jar \$JAVA_JAR_PATH/SnpSift.jar filter -f $input -e $exprFile > $output
	</command>
	<inputs>
		<param format="vcf" name="input" type="data" label="VCF input"/>
		<param name="expr" type="text" label="Expression" size="50"/>
	</inputs>
	<configfiles>
		<configfile name="exprFile">
		$expr
		</configfile> 
	</configfiles>

	<outputs>
		<data format="vcf" name="output" />
	</outputs>
        <stdio>
          <exit_code range=":-1"  level="fatal"   description="Error: Cannot open file" />
          <exit_code range="1:"  level="fatal"   description="Error" />
        </stdio>

        <tests>

            <test>
                <param name="input" ftype="vcf" value="test01.vcf"/>
                <param name="expr" value="QUAL >= 50"/>
                <output name="output">
                    <assert_contents>
                        <not_has_text text="25967" />
                        <not_has_text text="NT_166464" />
                    </assert_contents>
                </output>
            </test>

            <test>
                <param name="input" ftype="vcf" value="test01.vcf"/>
                <param name="expr" value="(CHROM = '19')"/>
                <output name="output">
                    <assert_contents>
                        <has_text text="3205820" />
                        <not_has_text text="NT_16" />
                    </assert_contents>
                </output>
            </test>

            <test>
                <param name="input" ftype="vcf" value="test01.vcf"/>
                <param name="expr" value="(POS &gt;= 20175) &amp; (POS &lt;= 35549)"/>
                <output name="output">
                    <assert_contents>
                        <has_text text="20175" />
                        <has_text text="35549" />
                        <has_text text="22256" />
                        <not_has_text text="18933" />
                        <not_has_text text="37567" />
                    </assert_contents>
                </output>
            </test>

            <test>
                <param name="input" ftype="vcf" value="test01.vcf"/>
                <param name="expr" value="( DP >= 5 )"/>
                <output name="output">
                    <assert_contents>
                        <has_text text="DP=5;" />
                        <has_text text="DP=6;" />
                        <not_has_text text="DP=1;" />
                    </assert_contents>
                </output>
            </test>

        </tests>

	<help>

**SnpSift filter**

You can filter a VCF file using arbitrary expressions, for instance "(QUAL > 30) | (exists INDEL) | ( countHet() > 2 )". Expressions can combine many conditions, which makes the filter very flexible.

Some examples:

  - *I want to filter out variants with quality less than 30*:

    * **( QUAL &gt;= 30 )**

  - *...but we also want InDels that have quality 20 or more*:

    * **(( exists INDEL ) &amp; (QUAL >= 20)) | (QUAL >= 30 )**

  - *...or any homozygous variant present in more than 3 samples*:

    * **(countHom() > 3) | (( exists INDEL ) &amp; (QUAL >= 20)) | (QUAL >= 30 )**

  - *...or any heterozygous sample with coverage 25 or more*:

    * **((countHet() > 0) &amp; (DP >= 25)) | (countHom() > 3) | (( exists INDEL ) &amp; (QUAL >= 20)) | (QUAL >= 30 )**

  - *I want to keep variants where the genotype for the first sample is homozygous variant and the genotype for the second sample is reference*:

    * **isHom( GEN[0] ) &amp; isVariant( GEN[0] ) &amp; isRef( GEN[1] )**


For complete details about this tool and the expressions that can be used, please go to http://snpeff.sourceforge.net/SnpSift.html#filter

	</help>
</tool>