view snpEff/snpEff.xml @ 0:481a95ca5339

Migrated tool version 0.9 from old tool shed archive to new tool shed repository
author pcingola
date Tue, 07 Jun 2011 17:08:59 -0400

<tool id="snpEff" name="Compute SNP effect" version="0.9">
	<description>for each SNP in a file</description>
	<command>java -jar /usr/local/snpEff/snpEff.jar -c /usr/local/snpEff/snpEff.config $genomeVersion $input > $output</command>
	<!-- <command>java -Xmx2048M -jar /usr/local/snpEff/snpEff.jar -c /usr/local/snpEff/snpEff.config $genomeVersion $input > $output</command> -->
	<inputs>
		<param format="interval" name="input" type="data" label="Source file"/>
		<param name="genomeVersion" type="select" label="Genome">
			<option value="hg37">Human (hg37)</option>
			<option value="mm37">Mouse (mm37)</option>
		</param>
	</inputs>
	<outputs>
		<data format="tabular" name="output" />
	</outputs>

	<help>

This tool computes the effect of each SNP. Current predictions include:

	* GENE : ENSEMBL gene ID, gene name and bio-type
	* TRANSCRIPT : ENSEMBL transcript ID
	* INTRON : The SNP hits a transcript, but no exons (or UTRs)
	* EXON : ENSEMBL exon ID
		* SYNONYMOUS_CODING : The SNP changes the DNA sequence in a way that produces the same amino acid
		* NON_SYNONYMOUS_CODING : The SNP changes the DNA sequence in a way that produces a different amino acid
		* STOP_GAINED : The SNP creates a new STOP codon
		* STOP_LOST : The SNP changes a STOP codon into an amino acid
	* UPSTREAM : The SNP is within 2K bases upstream of a transcript (before the 5 prime UTR)
	* DOWNSTREAM : The SNP is within 2K bases downstream of a transcript (after the 3 prime UTR)
	* 5PRIME_UTR : The SNP is in the 5 prime UTR region
	* 3PRIME_UTR : The SNP is in the 3 prime UTR region
	* INTERGENIC : The SNP does not hit any known gene or up/downstream region

-----

.. class:: infomark

**File format**

The input file must be tab-separated and contain five columns that correspond to:

	* chromosome_name
	* chromosome_start_position
	* chromosome_end_position
	* allele: "base_Ori / base_Snp"
	* strand: {+,-}

.. class:: warningmark 

**WARNING** Insertions and deletions are not supported, so chromosome_start_position must be equal to chromosome_end_position.
 
-----

.. class:: infomark

**Input file format example**

This is an example of an input file::

    5   140532    140532    T/C   +
    12  1017956   1017956   T/A   +
    2   946507    946507    G/C   +
    14  19584687  19584687  C/T   -
    19  66520     66520     G/A   +
    8   150029    150029    A/T   +
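
The five-column layout and the start-equals-end constraint described above can be checked before a file is submitted. The following is a minimal Python sketch, not part of snpEff itself; the names ``SnpRecord`` and ``parse_snp_line`` are hypothetical, and the sketch assumes the columns are separated by real tab characters (the example above is shown with spaces for readability).

```python
from typing import NamedTuple


class SnpRecord(NamedTuple):
    """One input row. The class and field names are illustrative only."""
    chrom: str
    start: int
    end: int
    base_ori: str
    base_snp: str
    strand: str


def parse_snp_line(line: str) -> SnpRecord:
    """Parse and validate one tab-separated input line."""
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated columns, got {len(fields)}")
    chrom, start, end, allele, strand = fields

    # The allele column holds the original base and the SNP base, e.g. "T/C".
    base_ori, slash, base_snp = (part.strip() for part in allele.partition("/"))
    if not (slash and base_ori and base_snp):
        raise ValueError(f"allele must look like 'T/C', got {allele!r}")

    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")

    start_pos, end_pos = int(start), int(end)
    # Insertions and deletions are not supported, so both positions must match.
    if start_pos != end_pos:
        raise ValueError(f"start {start_pos} differs from end {end_pos}")

    return SnpRecord(chrom, start_pos, end_pos, base_ori, base_snp, strand)


# Usage: the first row of the example input, with tabs between columns.
record = parse_snp_line("5\t140532\t140532\tT/C\t+")
print(record)
```

A line that fails any check raises ``ValueError``, so a wrapper script can report the offending line number before the tool is run.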

-----

.. class:: infomark

**Output file format example**

The output file consists of one line per SNP effect. This means that you usually get more than one line per SNP. The format is tab-separated, consisting of two columns:

	* SNP : chr:position_baseOri/baseSnp
	* Effect : Effect[, EXON:ExonId, TRANSCRIPT:TranscriptId, GENE:GeneId (geneName bioType)]

This is an example of an output file::

    chr2:946507_G/C        UPSTREAM, ENST00000452177
    chr2:946507_G/C        UPSTREAM, ENST00000450962
    chr2:946507_G/C        UPSTREAM, ENST00000308624
    chr2:946507_G/C        UPSTREAM, ENST00000407292
    chr5:140532_T/C        NON_SYNONYMOUS_CODING (V/A), EXON:ENSE00001319336, TRANSCRIPT:ENST00000283426, GENE:ENSG00000153404(PLEKHG4B protein_coding)
    chr5:140532_T/C        NON_SYNONYMOUS_CODING (V/A), EXON:ENSE00001319336, TRANSCRIPT:ENST00000398036, GENE:ENSG00000153404(PLEKHG4B protein_coding)
    chr5:140532_T/C        DOWNSTREAM, ENST00000398036
    chr8:150029_A/T        NON_SYNONYMOUS_CODING (Y/N), EXON:ENSE00001913609, TRANSCRIPT:ENST00000490482, GENE:ENSG00000223508(RPL23AP53 protein_coding)
    chr12:1017956_T/A      STOP_LOST (*/K), EXON:ENSE00001527897, TRANSCRIPT:ENST00000340908, GENE:ENSG00000060237(WNK1 protein_coding)
    chr12:1017956_T/A      STOP_LOST (*/K), EXON:ENSE00001527897, TRANSCRIPT:ENST00000252477, GENE:ENSG00000060237(WNK1 protein_coding)
    chr12:1017956_T/A      STOP_LOST (*/K), EXON:ENSE00001527897, TRANSCRIPT:ENST00000315939, GENE:ENSG00000060237(WNK1 protein_coding)
    chr12:1017956_T/A      UPSTREAM, ENST00000340908
    chr14:19584687_C/T     3PRIME_UTR, ENSE00001583193, TRANSCRIPT:ENST00000409832, GENE:ENSG00000222036(POTEG protein_coding)
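
For downstream processing, each output line can be split back into its parts. The following is a minimal Python sketch under the two-column layout described above; ``parse_effect_line`` and the dictionary keys are hypothetical names, not part of snpEff. Items without a ``KEY:`` prefix (such as the bare transcript ID on UPSTREAM lines) are collected unchanged under ``extra``.

```python
def parse_effect_line(line: str) -> dict:
    """Split one tab-separated output line into SNP and effect parts."""
    snp, tab, effect_text = line.rstrip("\n").partition("\t")
    if not tab:
        raise ValueError("expected two tab-separated columns")

    # First column: chr:position_baseOri/baseSnp
    location, _, allele = snp.strip().rpartition("_")
    chrom, _, position = location.rpartition(":")

    # Second column: the effect, then optional comma-separated details.
    parts = [part.strip() for part in effect_text.split(",")]
    annotations = {}
    extra = []
    for part in parts[1:]:
        key, colon, value = part.partition(":")
        if colon:
            annotations[key] = value   # e.g. EXON, TRANSCRIPT, GENE
        else:
            extra.append(part)         # e.g. a bare transcript ID

    return {
        "chrom": chrom,
        "position": int(position),
        "allele": allele,
        "effect": parts[0],
        "annotations": annotations,
        "extra": extra,
    }


# Usage: one line from the example output, with a tab between the columns.
parsed = parse_effect_line("chr2:946507_G/C\tUPSTREAM, ENST00000452177")
print(parsed["effect"], parsed["extra"])
```

Grouping the parsed lines by ``chrom``, ``position`` and ``allele`` gathers all effects reported for the same SNP.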

.. class:: warningmark 

**WARNING** The same effect on one exon may be reported several times, because the exon can belong to different transcripts.
	</help>
</tool>