comparison sm_tophat2_toolshed.xml @ 1:038c61725cfb draft

Uploaded
author sarahinraauzeville
date Thu, 11 Feb 2016 08:45:28 -0500
0:15aa80493a82 1:038c61725cfb
<!--# Copyright (C) 2013 INRA
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see http://www.gnu.org/licenses/.
#-->
<tool id="sm_tophat2" name="Tophat 2 for Illumina">
<description>Find splice junctions using RNA-seq data</description>
<command interpreter="perl">sm_tophat2.pl $lib $input_read1 $input_read2 $reference_source.reference_source_selector
#if $reference_source.reference_source_selector =="cached":
$reference_source.ref_file_cached.fields.path
#end if
#if $reference_source.reference_source_selector =="history":
$reference_source.ref_file
#end if
$p $r $max_intron $output_bam $output_bed $output_unmapped_bam $zip $gtf_cond.gtf
#if $gtf_cond.gtf =="T":
$gtf_cond.input_gtf
#end if
</command>
<version_command>echo tophat2 version : ; tophat2 --version</version_command>
<inputs>
<param format="fastq, fastqsanger, fastqillumina" name="input_read1" type="data" label="Your RNA-Seq FASTQ file (read 1)"/>
<param format="fastq, fastqsanger, fastqillumina" name="input_read2" type="data" label="Your RNA-Seq FASTQ file (read 2)"/>

<conditional name="reference_source">
<param name="reference_source_selector" type="select" label="Load reference genome from">
<option value="cached">Local cache</option>
<option value="history">History</option>
</param>
<when value="cached">
<param name="ref_file_cached" type="select" label="Using reference genome" help="Select genome from the list">
<options from_data_table="tophat_ind">
<filter type="sort_by" column="2" />
<validator type="no_options" message="No indexes are available" />
</options>
<validator type="no_options" message="A built-in reference genome is not available for the build associated with the selected input file"/>
</param>

</when>
<when value="history">
<param name="ref_file" type="data" format="fasta" label="Use the following dataset as the reference sequence" help="You can upload a FASTA sequence to the history and use it as reference" />
</when>
</conditional>

<param name="p" size="20" type="text" value="16" label="Number of threads used to align reads"/>
<param name="max_intron" size="20" type="text" value="5000" label="Maximum intron length"/>
<param name="r" size="20" type="text" value="200" label="Expected (mean) inner distance between mate pairs"/>
<param name="zip" type="select" display="checkboxes" multiple="True" label="Your RNA-Seq FASTQ files are zipped" help="Please check this option if your files are zipped.">
<option value="YES">Yes</option>
</param>

<conditional name="gtf_cond">
<param name="gtf" type="select" help="Do you have a GTF file available?" label="GTF file available">
<option value="T">Yes</option>
<option value="F" selected="true">No</option>
</param>
<when value="F" />
<when value="T">
<param format="gtf, gff" name="input_gtf" type="data" label="Your GTF file"/>
</when>
</conditional>

<param name="lib" type="select" label="Library type">
<option value="fr-unstranded">fr-unstranded</option>
<option value="fr-firststrand">fr-firststrand</option>
<option value="fr-secondstrand">fr-secondstrand</option>
</param>

</inputs>
<outputs>
<data format="bam" name="output_bam" label="${input_read1.name}-Tophat_mapped.bam"/>
<data format="bed" name="output_bed" label="${input_read1.name}-Tophat.bed"/>
<data format="bam" name="output_unmapped_bam" label="${input_read1.name}-Tophat_unmapped.bam"/>
</outputs>
<help>
.. class:: infomark

What it does: TopHat 2 is a program that aligns RNA-Seq reads to a genome in order to identify exon-exon splice junctions. It is built on the ultrafast short read mapping program Bowtie 2. TopHat runs on Linux and OS X.


*What types of reads can I use TopHat 2 with?*

TopHat was designed to work with reads produced by the Illumina Genome Analyzer, although users have been successful in using TopHat with reads from other technologies. In TopHat 1.1.0, we began supporting Applied Biosystems' Colorspace format. The software is optimized for reads 75bp or longer.

Mixing paired-end and single-end reads together is not supported.



*How does TopHat 2 find junctions?*

TopHat can find splice junctions without a reference annotation. By first mapping RNA-Seq reads to the genome, TopHat identifies potential exons, since many RNA-Seq reads will contiguously align to the genome. Using this initial mapping information, TopHat builds a database of possible splice junctions and then maps the reads against these junctions to confirm them.

Short-read sequencing machines can currently produce reads of 100bp or longer, but many exons are shorter than this and would be missed in the initial mapping. TopHat solves this problem mainly by splitting all input reads into smaller segments, which are then mapped independently. The segment alignments are put back together in a final step of the program to produce the end-to-end read alignments.
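
The segment-splitting idea described above can be illustrated with a minimal Python sketch. The function name split_into_segments and the 25bp segment length are assumptions made for illustration only; this is not TopHat's actual implementation, which may treat the final segment differently.

```python
def split_into_segments(read, segment_length=25):
    """Cut a read into consecutive segments of at most segment_length bases.

    Illustrative only: the name and the default length are assumptions,
    not TopHat internals.
    """
    return [read[i:i + segment_length]
            for i in range(0, len(read), segment_length)]


# A hypothetical 100bp read is cut into four 25bp segments.
read = "ACGT" * 25
segments = split_into_segments(read)
print([len(s) for s in segments])  # [25, 25, 25, 25]
```

Each segment would then be mapped on its own, and the segment alignments stitched back together into an end-to-end read alignment.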

TopHat generates its database of possible splice junctions from two sources of evidence. The first and strongest source of evidence for a splice junction is when two segments from the same read (for reads of at least 45bp) are mapped at a certain distance on the same genomic sequence, or when an internal segment fails to map, again suggesting that such reads span multiple exons. With this approach, "GT-AG", "GC-AG" and "AT-AC" introns will be found ab initio. The second source is pairings of "coverage islands", which are distinct regions of piled-up reads in the initial mapping. Neighboring islands are often spliced together in the transcriptome, so TopHat looks for ways to join these with an intron. We only suggest using this second option (--coverage-search) for short reads (shorter than 45bp) and with a small number of reads (10 million or fewer). This latter option will only report alignments across "GT-AG" introns.


Command line: Please see "information" then "stdout".


Parameters:

-o/--output-dir string

Sets the name of the directory in which TopHat will write all of its output. The default is "./tophat_out".


-r/--mate-inner-dist int

This is the expected (mean) inner distance between mate pairs. For example, for paired-end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. The default is 50bp.
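
The arithmetic behind the -r example above can be written out as a small Python sketch. The function name mate_inner_distance is hypothetical, and the formula assumes that both mates have the same read length.

```python
def mate_inner_distance(fragment_length, read_length):
    """Expected inner distance: the fragment length minus both read ends.

    Hypothetical helper; assumes both mates have the same length.
    """
    return fragment_length - 2 * read_length


# 300bp fragments sequenced with 50bp reads at each end.
print(mate_inner_distance(300, 50))  # 200
```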


-I/--max-intron-length int

The maximum intron length. When searching for junctions ab initio, TopHat will ignore donor/acceptor pairs farther than this many bases apart, except when such a pair is supported by a split segment alignment of a long read. The default is 500000.


-p/--num-threads int

Use this many threads to align reads. The default is 1.


--library-type
fr-unstranded, fr-firststrand, fr-secondstrand
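
As a rough illustration of how the parameters above combine on a TopHat 2 command line, the following Python sketch assembles an argument list. The function name build_tophat2_command, the index prefix and the FASTQ file names are hypothetical; the wrapper script sm_tophat2.pl may well build its command differently.

```python
def build_tophat2_command(index_prefix, reads1, reads2,
                          output_dir="./tophat_out",
                          mate_inner_dist=50,
                          max_intron_length=500000,
                          num_threads=1,
                          library_type="fr-unstranded"):
    """Assemble a tophat2 argument list from the parameters listed above.

    Hypothetical helper; the defaults mirror the values quoted in this help.
    """
    allowed = ("fr-unstranded", "fr-firststrand", "fr-secondstrand")
    if library_type not in allowed:
        raise ValueError("unsupported library type: %s" % library_type)
    return [
        "tophat2",
        "-o", output_dir,
        "-r", str(mate_inner_dist),
        "-I", str(max_intron_length),
        "-p", str(num_threads),
        "--library-type", library_type,
        index_prefix, reads1, reads2,
    ]


# Values taken from this tool's form defaults; file names are placeholders.
command = build_tophat2_command("genome_index", "reads_1.fastq", "reads_2.fastq",
                                mate_inner_dist=200, max_intron_length=5000,
                                num_threads=16)
print(" ".join(command))
```

Keeping the arguments as a list rather than one string avoids shell quoting problems if the list is later passed to a process launcher.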



----

Galaxy tool version: V2.0

Versions of bioinformatics tools used: TopHat 2

----

Contacts (names and emails): sigenae-support@listes.inra.fr

E-learning available: Yes.

Please cite:

Depending on the help provided, you can cite us in acknowledgements, references or both.

Examples:
Acknowledgements
We wish to thank the SIGENAE group for ....

References
X. SIGENAE [http://www.sigenae.org/]
</help>

</tool>