0
|
1 <tool id="deseq-hts" name="DESeq" version="1.6.0">
|
|
2 <description>Determines differentially expressed transcripts from read alignments</description>
|
|
3 <command>
|
|
4 deseq-hts/src/deseq-hts.sh $anno_input_selected $deseq_out $deseq_out.extra_files_path/gene_map.mat
|
|
5 #for $i in $replicate_groups
|
|
6 #for $j in $i.replicates
|
|
7 $j.bam_alignment:#slurp
|
|
8 #end for
|
|
9 #end for
|
|
10 >> $Log_File </command>
|
|
11 <inputs>
|
|
12 <param format="gff3" name="anno_input_selected" type="data" label="Genome annotation in GFF3 file" help="A tab delimited format for storing sequence features and annotations"/>
|
|
13 <repeat name="replicate_groups" title="Replicate group" min="2">
|
|
14 <repeat name="replicates" title="Replicate">
|
|
15 <param format="bam" name="bam_alignment" type="data" label="BAM alignment file" help="BAM alignment file. Can be generated from SAM files using the SAM Tools."/>
|
|
16 </repeat>
|
|
17 </repeat>
|
|
18 </inputs>
|
|
19
|
|
20 <outputs>
|
|
21 <data format="txt" name="deseq_out" label="DESeq result"/>
|
|
22 <data format="txt" name="Log_File" label="DESeq log file"/>
|
|
23 </outputs>
|
|
24
|
|
25 <tests>
|
|
26 <test>
|
|
27 <!-- command:
|
|
28 ./deseq-hts.sh ../test_data/deseq_c_elegans_WS200-I-regions.gff3 ../test_data/deseq_c_elegans_WS200-I-regions_deseq.txt ../test_data/genes.mat ../test_data/deseq_c_elegans_WS200-I-regions-SRX001872.bam ../test_data/deseq_c_elegans_WS200-I-regions-SRX001875.bam -->
|
|
29
|
|
30 <param name="anno_input_selected" value="deseq_c_elegans_WS200-I-regions.gff3" ftype="gff3" />
|
|
31 <param name="bam_alignments1" value="deseq_c_elegans_WS200-I-regions-SRX001872.bam" ftype="bam" />
|
|
32 <param name="bam_alignments2" value="deseq_c_elegans_WS200-I-regions-SRX001875.bam" ftype="bam" />
|
|
33 <output name="deseq_out" file="deseq_c_elegans_WS200-I-regions_deseq.txt" />
|
|
34 </test>
|
|
35 </tests>
|
|
36
|
|
37 <help>
|
|
38
|
|
39 .. class:: infomark
|
|
40
|
|
41 **What it does**
|
|
42
|
|
43 `DESeq` is a tool for differential expression testing of RNA-Seq data.
|
|
44
|
|
45
|
|
46 **Inputs**
|
|
47
|
|
48 `DESeq` requires two kinds of input files to run:
|
|
49
|
|
50 1. An annotation file in GFF3 format, containing the necessary information about the transcripts that are to be quantified.
|
|
51 2. BAM alignment files, organized into replicate groups that each contain one or more replicates. BAM files store the read alignments in a compressed format and can be generated using the `SAM-to-BAM` tool in the NGS: SAM Tools section. (The script also works with only two groups containing a single replicate each. However, such an analysis has less statistical power and is therefore not recommended.)
|
|
52
|
|
53 **Output**
|
|
54
|
|
55 `DESeq` generates a text file that lists each gene name together with its p-value.
|
|
56
|
|
57 ------
|
|
58
|
|
59 **Licenses**
|
|
60
|
|
61 If **DESeq** is used to obtain results for scientific publications it
|
|
62 should be cited as [1]_.
|
|
63
|
|
64 **References**
|
|
65
|
|
66 .. [1] Anders, S and Huber, W (2010): `Differential expression analysis for sequence count data`_.
|
|
67
|
|
68 .. _Differential expression analysis for sequence count data: http://dx.doi.org/10.1186/gb-2010-11-10-r106
|
|
69
|
|
70 ------
|
|
71
|
|
72 .. class:: infomark
|
|
73
|
|
74 **About formats**
|
|
75
|
|
76
|
|
77 **GFF3 format** General Feature Format is a format for describing genes
|
|
78 and other features associated with DNA, RNA and protein
|
|
79 sequences. GFF3 lines have nine tab-separated fields:
|
|
80
|
|
81 1. seqid - The name of a chromosome or scaffold.
|
|
82 2. source - The program that generated this feature.
|
|
83 3. type - The name of this type of feature. Some examples of standard feature types are "gene", "CDS", "protein", "mRNA", and "exon".
|
|
84 4. start - The starting position of the feature in the sequence. The first base is numbered 1.
|
|
85 5. stop - The ending position of the feature (inclusive).
|
|
86 6. score - A score between 0 and 1000. If there is no score value, enter ".".
|
|
87 7. strand - Valid entries include '+', '-', or '.' (for don't know/care).
|
|
88 8. phase - If the feature is a coding exon, phase should be a number from 0 to 2 that represents the reading frame of the first base. If the feature is not a coding exon, the value should be '.'.
|
|
89 9. attributes - A semicolon-separated list of tag=value pairs (for example ID and Parent). Lines that share the same grouping attribute are linked together into a single item.
|
|
90
|
|
91 For more information see http://www.sequenceontology.org/gff3.shtml
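
As an illustration of the nine fields listed above, the following minimal Python sketch splits a single GFF3 line into named fields. It is not part of DESeq-hts: the function name ``parse_gff3_line`` and the example line are hypothetical and only serve to show the layout.

```python
# Hypothetical helper, not part of DESeq-hts: split one GFF3 line into its nine fields.
# Field names follow the list in this help text.
GFF3_FIELDS = ("seqid", "source", "type", "start", "stop",
               "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Return a dict for one GFF3 feature line, or None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    columns = line.split("\t")
    if len(columns) != 9:
        raise ValueError("expected 9 tab-separated fields, got %d" % len(columns))
    record = dict(zip(GFF3_FIELDS, columns))
    # Coordinates are 1-based and inclusive.
    record["start"] = int(record["start"])
    record["stop"] = int(record["stop"])
    # In GFF3, column 9 holds semicolon-separated tag=value pairs such as ID and Parent.
    attributes = {}
    for pair in record["attributes"].split(";"):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            attributes[tag.strip()] = value.strip()
    record["attributes"] = attributes
    return record


# Made-up example line, for illustration only.
example = "I\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene1;Name=demo"
feature = parse_gff3_line(example)
print(feature["type"], feature["start"], feature["stop"], feature["attributes"]["ID"])
```

Score, strand and phase are kept as strings here because '.' is a valid placeholder for each of them.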
|
|
92
|
|
93 **SAM/BAM format** The Sequence Alignment/Map (SAM) format is a
|
|
94 tab-delimited text format that stores large nucleotide sequence
|
|
95 alignments. BAM is the binary version of a SAM file that allows for
|
|
96 fast and intensive data processing. The format specification and the
|
|
97 description of SAMtools can be found at
|
|
98 http://samtools.sourceforge.net/.
|
|
99
|
|
100 ------
|
|
101
|
|
102 DESeq-hts Wrapper Version 0.3 (Feb 2012)
|
|
103
|
|
104 </help>
|
|
105 </tool>
|