<tool id="mummer_maxmatch" name="MUMmer MaxMatch" version="0.9.alx" force_history_refresh="True">
    <description>: Maximal exact sequence matching</description>
    <command>
        <!-- update this path to the installed location -->
        $tool.cmd
        #if $tool.cmd == "mummer":
            $tool.cmd_extra
            $tool.mum_ref_in
            $tool.mum_q_in
        #end if
        #if $tool.cmd == "repeat-match":
            -n $tool.rm_n
            #if $tool.rm_E == "yes":
                -E
            #end if
            $tool.cmd_extra
            $tool.in_seq
        #end if
        #if $tool.cmd == "exact-tandems":
            $tool.in_seq
            $tool.et_minl
        #end if
        <!-- Unfortunately the error state is also set on successful jobs, so stderr is discarded by redirecting it to /dev/null. -->
        2> /dev/null
        > $out_tool
    </command>
    <inputs>
        <conditional name="tool">
            <param name="cmd" type="select" value="mummer" label="MUMmer maximal matching" help="Programs run with their default parameters unless options are set below. See the help below for tool-specific arguments.">
                <option value="mummer">mummer</option>
                <option value="repeat-match">repeat-match</option>
                <option value="exact-tandems">exact-tandems</option>
            </param>
            <when value="mummer">
                <param name="mum_ref_in" type="data" format="fasta" label="Reference FastA file" />
                <param name="mum_q_in" type="data" format="fasta" label="Query (multi) FastA sequence" />
                <param name="cmd_extra" type="text" size="40" value="" label="Extra cmd line options" help="See the help below for the command line options of each tool." />
            </when>
            <when value="repeat-match">
                <param name="in_seq" type="data" format="fasta" label="FastA sequence file" />
                <param name="rm_n" type="integer" min="1" value="20" label="Minimum exact match length [-n]" />
                <param name="rm_E" type="select" value="no" label="Use exhaustive (slow) search to find matches [-E]">
                    <option value="no">No</option>
                    <option value="yes">Yes</option>
                </param>
                <param name="cmd_extra" type="text" size="40" value="" label="Extra cmd line options" help="-n and -E are set above. See the help below for further command line options." />
            </when>
            <when value="exact-tandems">
                <param name="in_seq" type="data" format="fasta" label="FastA sequence file" />
                <param name="et_minl" type="integer" min="1" value="20" label="Minimum length" />
            </when>
        </conditional>
    </inputs>
    <outputs>
        <data name="out_tool" format="txt" label="Max exact match output" />
    </outputs>
    <requirements>
        <!-- <requirement type="set_environment" version="3.23">MUMMER_PATH</requirement> -->
        <requirement type="package" version="4.6.4">gnuplot</requirement>
        <requirement type="package" version="3.23">mummer</requirement>
    </requirements>
    <tests>
        <test>
        </test>
    </tests>
    <help>
|


**Reference**
=============

- **MUMmer MaxExactMatch Galaxy tool wrapper:** Alex Bossers, CVI of Wageningen UR, The Netherlands.

- **MUMmer suite v3.22:** http://mummer.sourceforge.net

- **MUMmer tutorials:** http://mummer.sourceforge.net/examples/

Please do not use any command line options that set prefixes or file names. They are useless
within Galaxy and are likely to make the job fail!

If you find these tools/wrappers useful in your research, please acknowledge our work. If you improve
or modify the wrappers, please add yourself to the acknowledgement section rather than replacing the
existing entries :)


**MUMmer Maximal exact matching**
=================================

The heart of the MUMmer package is its set of suffix tree based maximal matching routines. These can be
used for repeat detection within a single sequence, as is done by *repeat-match* and *exact-tandems*,
or for the alignment of two or more sequences, as is done by *mummer*.

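This wrapper calls the three programs in the argument order set by its command template. The Python sketch below rebuilds those argument lists for illustration only. The function name ``build_command``, its keyword arguments and the file names are hypothetical. Splitting the extra options with ``shlex`` is also an assumption: Galaxy pastes the text into the command line as-is::

    import shlex

    def build_command(cmd, extra="", ref=None, query=None, seq=None,
                      min_len=20, exhaustive=False):
        """Hypothetical helper: return the argument list for one program,
        following the order used in the wrapper's command template.
        The stderr and stdout redirections are left out."""
        extra_args = shlex.split(extra)  # assumption: whitespace-split options
        if cmd == "mummer":
            return ["mummer", *extra_args, ref, query]
        if cmd == "repeat-match":
            args = ["repeat-match", "-n", str(min_len)]
            if exhaustive:
                args.append("-E")
            return args + extra_args + [seq]
        if cmd == "exact-tandems":
            # The wrapper offers no extra options for exact-tandems.
            return ["exact-tandems", seq, str(min_len)]
        raise ValueError(f"unknown program: {cmd}")

    print(" ".join(build_command("repeat-match", seq="genome.fasta", exhaustive=True)))
    # repeat-match -n 20 -E genome.fasta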

Mummer
------

mummer is a suffix tree algorithm designed to find maximal exact matches of some minimum length
between two input sequences. By default, mummer only finds maximal matches that are unique in
the entire set of reference sequences. The match lists produced by mummer can be used alone to
generate alignment dot plots, or can be passed on to the clustering algorithms for the identification
of longer non-exact regions of conservation. These match lists are versatile because they
contain a great deal of information and can be passed on to other interpretation programs for
clustering, analysis, searching, etc.

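A match is maximal when it cannot be extended by one more character to the left or to the right. The brute-force Python sketch below illustrates only that definition; it is not mummer's suffix tree algorithm. ``maximal_exact_matches`` is a hypothetical name. The sketch scans only the forward strand and does not apply mummer's default requirement that matches be unique in the reference::

    def maximal_exact_matches(ref, query, min_len):
        """Hypothetical helper: return (ref_start, query_start, length) for
        every maximal exact match of at least min_len characters
        (1-based start positions).

        Brute force over all start pairs, so only suitable for short
        sequences; mummer itself uses a suffix tree.
        """
        matches = []
        for i in range(len(ref)):
            for j in range(len(query)):
                # Left-maximal: skip if the match extends one position left.
                if i > 0 and j > 0 and ref[i - 1] == query[j - 1]:
                    continue
                # Right-maximal: extend while the characters keep agreeing.
                k = 0
                while len(ref) > i + k and len(query) > j + k and ref[i + k] == query[j + k]:
                    k += 1
                if k >= min_len:
                    matches.append((i + 1, j + 1, k))
        return matches

    print(maximal_exact_matches("ACGTACGTTT", "GGACGTACGA", 4))
    # [(1, 3, 7), (5, 3, 4)]

Both matches start at query position 3. The second one (ACGT at reference position 5) is also maximal, but it is not unique in the reference, so mummer's default mode would not report it.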


Repeat-match
------------

repeat-match is a suffix tree algorithm designed to find maximal exact repeats within a single input
sequence. It uses an algorithm similar to mummer's, altered slightly to find maximal exact matches
within a single sequence.

Output formatting varies depending on the command line parameters, and the output can be quite large.
The standard output format that results from running repeat-match with default parameters is as follows:

::

    Long Exact Matches:
       Start1     Start2    Length
      4919485   4919506r        22

The three columns are the first position of the repeat, the second position of the repeat, and the
length of the repeat, respectively. Reverse complement repeat positions are denoted by an 'r'
following the Start2 position, and are relative to the forward strand of the sequence.

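This output can be large, so it is often post-processed. The Python sketch below parses the default layout shown above into tuples. ``parse_repeat_match`` is a hypothetical helper. It assumes that data rows are exactly three whitespace-separated columns with a numeric first column, which may not hold when non-default options are used::

    def parse_repeat_match(text):
        """Hypothetical helper: return (start1, start2, length, is_reverse)
        tuples from repeat-match output in the default layout."""
        rows = []
        for line in text.splitlines():
            fields = line.split()
            # Skip the title, the column header and blank lines.
            if len(fields) != 3 or not fields[0].isdigit():
                continue
            start2 = fields[1]
            is_reverse = start2.endswith("r")  # 'r' marks a reverse complement repeat
            rows.append((int(fields[0]), int(start2.rstrip("r")), int(fields[2]), is_reverse))
        return rows

    sample = """Long Exact Matches:
       Start1     Start2    Length
      4919485   4919506r        22
    """
    print(parse_repeat_match(sample))
    # [(4919485, 4919506, 22, True)]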


Exact-tandems
-------------

exact-tandems is a wrapper script for the repeat-match program. It lists the exact tandem
repeats within a single input sequence. As with repeat-match, the sequence file should contain only
one sequence in FastA format; if multiple sequences are present, the first one is used. The
sequence may contain any upper and lowercase characters, so both DNA and protein sequences are
allowed, and matching is case insensitive. The minimum match length parameter should be a
positive integer; it is passed to the repeat-match program via the -n option.

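The input rule described above can be illustrated with the Python sketch below: only the first FastA record is used, and case is ignored. This is not code taken from exact-tandems. ``first_fasta_record`` is a hypothetical helper that simply applies the documented behaviour::

    def first_fasta_record(text):
        """Hypothetical helper: return (header, sequence) for the first
        record in FastA text.

        Later records are ignored and the sequence is upper-cased, so that
        'acgt' and 'ACGT' compare equal.
        """
        header, seq_lines = None, []
        for line in text.splitlines():
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    break  # a second record begins: stop reading
                header = line[1:]
            elif header is not None and line:
                seq_lines.append(line)
        return header, "".join(seq_lines).upper()

    print(first_fasta_record(">chr1 test\nacgtAC\nGT\n>chr2\nTTTT\n"))
    # ('chr1 test', 'ACGTACGT')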

The output format of exact-tandems is as follows:

::

    Finding matches
    Tandem repeats
       Start   Extent   UnitLen   Copies
      416173      150        45      3.3

The four columns are the first position of the tandem, the extent of the repeat region, the length
of each tandem repeat unit, and the number of repeat units, respectively.

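The Python sketch below parses that layout. ``parse_exact_tandems`` is a hypothetical helper that assumes data rows have four whitespace-separated columns with a numeric first column. In the sample row, Copies matches Extent divided by UnitLen (150 / 45 = 3.33) shown with one decimal, and the sketch checks this for that row only::

    def parse_exact_tandems(text):
        """Hypothetical helper: return one dict per tandem repeat row of
        exact-tandems output."""
        rows = []
        for line in text.splitlines():
            fields = line.split()
            # Skip status lines, the column header and blank lines.
            if len(fields) != 4 or not fields[0].isdigit():
                continue
            start, extent, unit_len = (int(f) for f in fields[:3])
            rows.append({"start": start, "extent": extent,
                         "unit_len": unit_len, "copies": float(fields[3])})
        return rows

    sample = """Finding matches
    Tandem repeats
       Start   Extent   UnitLen   Copies
      416173      150        45      3.3
    """
    row = parse_exact_tandems(sample)[0]
    # For this sample row, Extent / UnitLen rounded to one decimal equals Copies.
    print(round(row["extent"] / row["unit_len"], 1) == row["copies"])
    # True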



**Manuals and CMD line options (specific for each tool!):**
===========================================================

**Mummer**

http://mummer.sourceforge.net/manual/#mummer

**Repeat-match**

http://mummer.sourceforge.net/manual/#repeat

**exact-tandems**

http://mummer.sourceforge.net/manual/#exact

|
|

    </help>
</tool>