view deseq-hts_2.0/galaxy/deseq2.xml @ 10:2fe512c7bfdf draft

DESeq2 version 1.0.19 added to the repo
author vipints <vipin@cbio.mskcc.org>
date Tue, 08 Oct 2013 08:15:34 -0400

<tool id="deseq2-hts" name="DESeq2" version="1.0.19">
  <description>Differential gene expression analysis based on the negative binomial distribution</description>
  <command interpreter="bash"> 
./../src/deseq2-hts.sh $anno_input_selected $deseq_out $deseq_out.extra_files_path/gene_map.mat 
$distype
#for $i in $replicate_groups
#for $j in $i.replicates
$j.bam_alignment:#slurp
#end for

#end for
    >> $Log_File </command>
  <inputs>
    <param format="gff,gtf,gff3" name="anno_input_selected" type="data" label="Genome annotation in GFF format" help="A tab-delimited format for storing sequence features and annotations"/>
   <repeat name="replicate_groups" title="Replicate group" min="2">
     <repeat name="replicates" title="Replicate">
      <param format="bam" name="bam_alignment" type="data" label="BAM alignment file" help="BAM alignment file. Can be generated from a SAM file using SAMtools."/>
     </repeat>
   </repeat>

    <param name="distype" type="select" label="Select the type of fitting of dispersions to the mean intensity">
      <option value="parametric">Parametric</option>
      <option value="local">Local</option>
      <option value="mean" selected="true">Mean</option>
    </param>

  </inputs>

  <outputs>
    <data format="txt" name="deseq_out" label="${tool.name} on ${on_string}: Differential Expression"/>
    <data format="txt" name="Log_File" label="${tool.name} on ${on_string}: log"/>
  </outputs>

  <tests>
    <test> 
      <!--
      ./deseq2-hts.sh ../test_data/deseq_c_elegans_WS200-I-regions.gff3 ../test_data/deseq_c_elegans_WS200-I-regions_deseq.txt ../test_data/genes.mat ../test_data/deseq_c_elegans_WS200-I-regions-SRX001872.bam ../test_data/deseq_c_elegans_WS200-I-regions-SRX001875.bam
      -->
      <param name="anno_input_selected" value="deseq_c_elegans_WS200-I-regions.gff3" ftype="gff3" />
      <param name="bam_alignments1" value="deseq_c_elegans_WS200-I-regions-SRX001872.bam" ftype="bam" />
      <param name="bam_alignments2" value="deseq_c_elegans_WS200-I-regions-SRX001875.bam" ftype="bam" />
      <output name="deseq_out" file="deseq_c_elegans_WS200-I-regions_deseq.txt" />
    </test>
  </tests> 

  <help>

.. class:: infomark

**What it does** 

DESeq2_ estimates variance-mean dependence in count data from high-throughput sequencing assays and tests for differential expression using a model based on the negative binomial distribution.

.. _DESeq2: http://bioconductor.org/packages/2.12/bioc/html/DESeq2.html

`DESeq2` requires:

1. A genome annotation in GFF format, containing the necessary information about the transcripts that are to be quantified.
2. BAM alignment files, which store the read alignments, organized into replicate groups of several replicates each. The program also works with only two groups of a single replicate each, but such an analysis has less statistical power and is therefore not recommended.

------

**Licenses**

If **DESeq2** is used to obtain results for scientific publications, it
should be cited as [1]_.

**References** 

.. [1] Anders, S and Huber, W (2010): `Differential expression analysis for sequence count data`_. 

.. _Differential expression analysis for sequence count data: http://dx.doi.org/10.1186/gb-2010-11-10-r106

------

.. class:: infomark

**About formats**

**GFF/GTF format** General Feature Format/Gene Transfer Format is a format for describing genes and other features associated with DNA, RNA and protein sequences. GFF3 lines have nine tab-separated fields:

1. seqid - The name of a chromosome or scaffold.
2. source - The program that generated this feature.
3. type - The name of this type of feature. Some examples of standard feature types are "gene", "CDS", "protein", "mRNA", and "exon". 
4. start - The starting position of the feature in the sequence. The first base is numbered 1.
5. end - The ending position of the feature (inclusive).
6. score - A score between 0 and 1000. If there is no score value, enter ".".
7. strand - Valid entries include '+', '-', or '.' (for don't know/care).
8. phase - If the feature is a coding exon, phase should be 0, 1, or 2, indicating where the feature begins relative to the reading frame. If the feature is not a coding exon, the value should be '.'.
9. attributes - A semicolon-separated list of tag=value pairs. Features that belong together are linked through their ID and Parent tags.
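
As an illustration, a gene and its mRNA could be described by the two records below. All values are made up for this example, and the columns are aligned with spaces for readability; in a real GFF3 file they must be separated by single tabs::

    I    example    gene    1000    2000    .    +    .    ID=gene0001;Name=example_gene
    I    example    mRNA    1000    2000    .    +    .    ID=mrna0001;Parent=gene0001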

For more information see http://www.sequenceontology.org/gff3.shtml

**BAM format** The Sequence Alignment/Map (SAM) format is a
tab-delimited text format that stores large nucleotide sequence
alignments. BAM is the binary version of a SAM file that allows for
fast and intensive data processing. The format specification and a
description of SAMtools can be found at
http://samtools.sourceforge.net/.
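
For instance, a SAM file can be converted to BAM with the ``samtools view`` command. This is a minimal sketch that assumes SAMtools is installed; ``input.sam`` and ``output.bam`` are placeholder file names. The ``-b`` option requests BAM output, ``-S`` marks the input as SAM (needed by older releases, ignored by newer ones), and ``-o`` names the output file::

    samtools view -b -S -o output.bam input.sam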

------

DESeq2-hts Wrapper Version 0.2 (Aug 2013)

</help>
</tool>