toolfactory_2
=============

This is an upgrade to the tool factory, with added parameters
(optionally editable in the generated tool form - otherwise fixed) and
multiple input files.

Specify any number of parameters - well, at
least up to the limit of your patience with repeat groups.

Parameter values supplied at tool generation time are defaults and
can optionally be made editable by the user. Parameter names cannot be changed once
a tool has been generated.

If not editable, they act as hidden parameters passed to the script
and do not appear on the tool form.

Note! Galaxy applies its default sanitization to all
user input parameters, which your script may need to dance around.

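As a minimal sketch of dancing around the sanitization, the Python below restores escaped tokens in a parameter value. Only the `__lt__` token is mentioned in this README; the rest of the token table is an assumption based on Galaxy's usual defaults and should be checked against your Galaxy version. `unsanitize` and `GALAXY_TOKENS` are hypothetical names, not part of the tool factory.

```python
# Hypothetical helper: reverse Galaxy's default parameter sanitization.
# ASSUMPTION: this token table mirrors Galaxy's usual defaults; verify locally.
GALAXY_TOKENS = {
    "__lt__": "<",
    "__gt__": ">",
    "__sq__": "'",
    "__dq__": '"',
    "__ob__": "[",
    "__cb__": "]",
    "__oc__": "{",
    "__cc__": "}",
    "__at__": "@",
    "__pd__": "#",
}


def unsanitize(value, allowed=None):
    """Replace escaped tokens in value with the characters they stand for.

    If allowed is given, only the tokens it contains are restored.
    """
    for token, char in GALAXY_TOKENS.items():
        if allowed is None or token in allowed:
            value = value.replace(token, char)
    return value
```

Restoring only the tokens a given parameter really needs (via `allowed`) keeps the rest of Galaxy's protection in place.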
Any number of input files can be passed to your script, but of course it
has to deal with them. Both path and metadata name are supplied, either in the environment
(bash/sh) or as command line parameters (python, perl, rscript) that need to be parsed and
dealt with in the script. This is complicated by the common use case of needing file names
for (eg) column headers, as well as paths. Try the examples shown on the tool factory
form to see how Galaxy file and user supplied parameter values can be recovered in each
of the 4 scripting environments supported.

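A minimal sketch of recovering names alongside paths in a Python script, assuming the `--INPATHS` and `--INNAMES` comma separated lists described in the parameter passing notes of this README. `parse_inputs` is a hypothetical helper, and the sketch assumes that neither names nor paths contain commas.

```python
import argparse


def parse_inputs(argv):
    """Return a list of (name, path) pairs from the comma separated lists."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--INPATHS", default="")
    parser.add_argument("--INNAMES", default="")
    # Anything else on the command line is ignored in this sketch.
    args, _unknown = parser.parse_known_args(argv)
    paths = [p for p in args.INPATHS.split(",") if p]
    names = [n for n in args.INNAMES.split(",") if n]
    if len(paths) != len(names):
        raise ValueError("INPATHS and INNAMES differ in length")
    # Names and paths arrive in the same order, so pair them up.
    return list(zip(names, paths))
```

In a script, `parse_inputs(sys.argv[1:])` would give pairs whose first element can serve as a column header and whose second element is the file to open.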
The best way to deal with multiple outputs is to let the tool factory generate an HTML
page for your users. It automagically lays out pdf images as thumbnail galleries
and can have separate results sections gathering all similarly prefixed files, such as
a Foo section taking its text from a log (foo_whatever.log) and its
artifacts (eg foo_MDS_plot.pdf) from matching file names. All artifacts are linked for download.
A copy of the actual script is provided for provenance - be warned, it exposes
real file paths.


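The prefix grouping can be sketched in a few lines of Python. This is not the tool factory's own code; it assumes the simple rule given in this README's notes on automated outputs, where a `foo.log` file defines a section and every `foo_*` file joins it. `group_by_prefix` is a hypothetical name.

```python
import os
from collections import defaultdict


def group_by_prefix(filenames):
    """Group output files into sections keyed by the stem of each log file."""
    names = [os.path.basename(f) for f in filenames]
    # Every "<prefix>.log" defines one section.
    prefixes = sorted(n[:-4] for n in names if n.endswith(".log"))
    sections = defaultdict(list)
    for name in sorted(names):
        for prefix in prefixes:
            if name == prefix + ".log" or name.startswith(prefix + "_"):
                sections[prefix].append(name)
                break
    # Sections are returned in alphabetic order.
    return dict(sorted(sections.items()))
```

Files that match no log prefix are simply left out of the result in this sketch.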
tl;dr

```

# WARNING before you start
# Install this tool on a private Galaxy ONLY
# Please NEVER on a public or production instance
# updated august 2014 by John Chilton adding citation support
#
# updated august 8 2014 to fix bugs reported by Marius van den Beek
# please cite the resource at
http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref
# if you use this tool in your published work.

*Short Story*

This is an unusual Galaxy tool capable of generating new Galaxy tools.
It works by exposing *unrestricted* and therefore extremely dangerous scripting
to all designated administrators of the host Galaxy server, allowing them to
run scripts in R, python, sh and perl over multiple selected input data sets,
writing a single new data set as output.

*Differences between TF2 and the original Tool Factory*

1. TF2 (this one) allows any number of either fixed or user-editable parameters to be defined
for the new tool. If these are editable, the user can change them; otherwise, they are passed
as fixed and invisible parameters for each execution. Obviously, there are substantial security
implications with editable parameters, but these are always sanitized by Galaxy's inbuilt
parameter sanitization, so you may need to "unsanitize" characters - eg translate all "__lt__"
into "<" for certain parameters where that is needed. Please practise safe toolshed.

2. Any number of input files (all of the same datatype) may be defined.

These changes substantially complicate the way your script is supplied with
all the new and variable parameters. Examples in each scripting language are shown
in the tool help.

*Automated outputs in named sections*

If your script writes an arbitrary mix of (eg) pdfs, tabular analysis results
and run logs to the current directory, the tool factory can optionally
auto-generate a linked Html page with separate sections showing a thumbnail
grid for all pdfs and the log text, grouping all artifacts sharing a file
name and log name prefix::

    eg: if "foo.log" is emitted then *all* other outputs matching foo_* will
    be grouped together - eg
    foo_baz.pdf
    foo_bar.pdf and
    foo_zot.xls
    would all be displayed and linked in the same section with foo.log's contents
    - to form the "Foo" section of the Html page. Sections appear in alphabetic
    order and there are no limits on the number of files or sections.

*Automated generation of new Galaxy tools for installation into any Galaxy*

Once a script is working correctly, this tool optionally generates a
new Galaxy tool, effectively freezing the supplied script into a new,
ordinary Galaxy tool that runs it over one or more input files selected by
the user. Generated tools are installed via a tool shed by an administrator
and work exactly like all other Galaxy tools for your users.

If you use the Html output option, please ensure that sanitize_all_html is
set to False and uncommented in universe_wsgi.ini - it should show::

    # By default, all tool output served as 'text/html' will be sanitized
    sanitize_all_html = False

This opens potential security risks and may not be acceptable for public
sites. If sanitization is left on there, the lack of stylesheets may make Html
pages damage onlookers' eyeballs, but the content should still be correct.


*More Detail*

To use the ToolFactory, you should have prepared a script to paste into a
text box, and a small test input example ready to select from your history
to test your new script.

There is an example in each scripting language on the Tool Factory form. You
can just cut and paste these to try it out - remember to select the right
interpreter please. You'll also need to create a small test data set using
the Galaxy history add new data tool.

If the script fails somehow, use the "redo" button on the tool output in
your history to recreate the form complete with broken script. Fix the bug
and execute again. Rinse, wash, repeat.

Once the script runs successfully, a new Galaxy tool that runs your script
can be generated. Select the "generate" option and supply some help text and
names. The new tool will be generated in the form of a new Galaxy datatype
- toolshed.gz - as the name suggests, it's an archive ready to upload to a
Galaxy ToolShed as a new tool repository.

Once it's in a ToolShed, it can be installed into any local Galaxy server
from the server administrative interface.

Once the new tool is installed, local users can run it - each time, the script
that was supplied when it was built will be executed with the input chosen
from the user's history. In other words, the tools you generate with the
ToolFactory run just like any other Galaxy tool, but run your script every time.

Tool factory tools are perfect for workflow components. One input, one output,
no variables.

*To fully and safely exploit the awesome power* of this tool,
Galaxy and the ToolShed, you should be a developer installing this
tool on a private/personal/scratch local instance where you are an
admin_user. Then, if you break it, you get to keep all the pieces - see
https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home

**Installation**

This is a Galaxy tool. You can install it most conveniently using the
administrative "Search and browse tool sheds" link. Find the Galaxy Main
toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory
repository. Open it, review the code and select the option to install it.

(
If you can't get the tool that way, the xml and py files here need to be
copied into a new tools subdirectory such as tools/toolfactory. Your
tool_conf.xml needs a new entry pointing to the xml file - something like::

    <section name="Tool building tools" id="toolbuilders">
      <tool file="toolfactory/rgToolFactory.xml"/>
    </section>

If not already there (I just added it to datatypes_conf.xml.sample),
please add::

    <datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary"
    mimetype="multipart/x-gzip" subclass="True" />

to your local datatypes_conf.xml.
)

Of course, R, python, perl etc are needed on your path if you want to test
scripts using those interpreters. Adding new ones to this tool code should
be easy enough. Please make suggestions as bitbucket issues and code. The
HTML file code automatically shrinks R's bloated pdfs, and depends on
ghostscript. The thumbnails require imagemagick.

*Restricted execution*

The tool factory tool itself will then be usable ONLY by admin users -
people with IDs in admin_users in universe_wsgi.ini. **Yes, that's right. ONLY
admin_users can run this tool.** Think about it for a moment. If allowed to
run any arbitrary script on your Galaxy server, the only thing that would
impede a miscreant bent on destroying all your Galaxy data would probably
be lack of appropriate technical skills.

*What it does* This is a tool factory for simple scripts in python, R and
perl currently. Functional tests are automatically generated. How cool is that.

LIMITED to simple scripts that read one input from the history. Optionally can
write one new history dataset, and optionally collect any number of outputs
into links on an autogenerated HTML index page for the user to navigate -
useful if the script writes images and output files - pdf outputs are shown
as thumbnails and R's bloated pdfs are shrunk with ghostscript, so ghostscript and
imagemagick need to be available.

Generated tools can be edited and enhanced like any Galaxy tool, so start
small and build up, since a generated script gets you a serious leg up to a
more complex one.

*What you do* You paste and run your script, you fix the syntax errors and
eventually it runs. You can use the redo button and edit the script before
trying to rerun it as you debug - it works pretty well.

Once the script works on some test data, you can generate a toolshed compatible
gzip file containing your script ready to run as an ordinary Galaxy tool in
a repository on your local toolshed. That means safe and largely automated
installation in any production Galaxy configured to use your toolshed.

*Generated tool Security* Once you install a generated tool, it's just
another tool - assuming the script is safe. They just run normally and their
user cannot do anything unusually insecure but please, practice safe toolshed.
Read the fucking code before you install any tool. Especially this one -
it is really scary.

If you opt for an HTML output, you get all the script outputs arranged
as a single Html history item - all output files are linked, with thumbnails for
all the pdfs. Ugly but really inexpensive.

Patches and suggestions welcome as bitbucket issues please?

copyright ross lazarus (ross stop lazarus at gmail stop com) May 2012

all rights reserved
Licensed under the LGPL - if you want to improve it, feel free
https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home

Material for our more enthusiastic and voracious readers continues below -
we salute you.

**Motivation** Simple transformation, filtering or reporting scripts get
written, run and lost every day in most busy labs - even ours where Galaxy is
in use. This 'dark script matter' is pervasive and generally not reproducible.

**Benefits** For our group, this allows Galaxy to fill that important dark
script gap - all those "small" bioinformatics tasks. Once a user has a working
R (or python or perl) script that does something Galaxy cannot currently do
(eg transpose a tabular file) and takes parameters the way Galaxy supplies
them (see example below), they:

1. Install the tool factory on a personal private instance

2. Upload a small test data set

3. Paste the script into the 'script' text box and iteratively run the
insecure tool on test data until it works right - there is absolutely no
reason to do this anywhere other than on a personal private instance.

4. Once it works right, set the 'Generate toolshed gzip' option and run
it again.

5. A toolshed style gzip appears ready to upload and install like any other
Toolshed entry.

6. Upload the new tool to the toolshed

7. Ask the local admin to check the new tool to confirm it's not evil and
install it in the local production galaxy



**Parameter passing and file inputs**

Your script will receive up to 3 named parameters:

INPATHS is a comma separated list of input file paths
INNAMES is a comma separated list of input file names in the same order
OUTPATH is optional - if a file is being generated, your script should write there

Your script should open and write files in the provided working directory if you are using the Html
automatic presentation option.

Python script command lines will have --INPATHS and --additional_arguments etc. to make it easy to use argparse.

Rscript will need to use commandArgs(TRUE) - see the example below - additional arguments will
appear as themselves - eg foo="bar" will mean that foo is defined as "bar" for the script.

Bash and sh will see any additional parameters on their command lines and the 3 named parameters
in their environment magically - well, using env on the CL.

***python***::

    # argparse for 3 possible comma separated lists
    # additional parameters need to be parsed !
    # then echo parameters to the output file
    import sys
    import argparse

    argp = argparse.ArgumentParser()
    argp.add_argument('--INNAMES', default=None)
    argp.add_argument('--INPATHS', default=None)
    argp.add_argument('--OUTPATH', default=None)
    argp.add_argument('--additional_parameters', default=[], action="append")
    argp.add_argument('otherargs', nargs=argparse.REMAINDER)
    args = argp.parse_args()
    # write the parsed arguments and the raw command line to the output file
    with open(args.OUTPATH, 'w') as f:
        f.write('### args=%s\n' % str(args))
        f.write('sys.argv=%s\n' % sys.argv)



***Rscript***::

    # tool factory Rscript parser suggested by Forester
    # http://www.r-bloggers.com/including-arguments-in-r-cmd-batch-mode/
    # additional parameters will appear in the ls() below - they are available
    # to your script
    # echo parameters to the output file
    ourargs = commandArgs(TRUE)
    if(length(ourargs)==0){
        print("No arguments supplied.")
    }else{
        for(i in 1:length(ourargs)){
            eval(parse(text=ourargs[[i]]))
        }
        sink(OUTPATH)
        cat('INPATHS=',INPATHS,'\n')
        cat('INNAMES=',INNAMES,'\n')
        cat('OUTPATH=',OUTPATH,'\n')
        x=ls()
        cat('all objects=',x,'\n')
        sink()
    }
    sessionInfo()
    print.noquote(date())


***bash/sh***::

    # tool factory sets up these environmental variables
    # this example writes those to the output file
    # additional params appear on command line
    if [ ! -f "$OUTPATH" ] ; then
        touch "$OUTPATH"
    fi
    echo "INPATHS=$INPATHS" >> "$OUTPATH"
    echo "INNAMES=$INNAMES" >> "$OUTPATH"
    echo "OUTPATH=$OUTPATH" >> "$OUTPATH"
    echo "CL=$@" >> "$OUTPATH"

***perl***::

    # the 3 named parameters arrive as positional command line arguments
    my ($INPATHS, $INNAMES, $OUTPATH) = @ARGV;
    open(my $fh, '>', $OUTPATH) or die "Could not open file '$OUTPATH' $!";
    print $fh "INPATHS=$INPATHS\n INNAMES=$INNAMES\n OUTPATH=$OUTPATH\n";
    close $fh;



Galaxy as an IDE for developing API scripts
If you need to develop Galaxy API scripts and you like to live dangerously,
please read on.

Galaxy as an IDE?
Amazingly enough, blend-lib API scripts run perfectly well *inside*
Galaxy when pasted into a Tool Factory form. No need to generate a new
tool. Galaxy+Tool_Factory = IDE. I think we need a new t-shirt. Seriously,
it is actually quite useable.

Why bother - what's wrong with Eclipse?
Nothing. But, compared with developing API scripts in the usual way outside
Galaxy, you get persistence and other framework benefits plus, at absolutely
no extra charge, a ginormous security problem if you share the history or
any outputs, because they contain the api script with its key. So development
servers only please!

Workflow
Fire up the Tool Factory in Galaxy.

Leave the input box empty, set the interpreter to python, paste and run an
api script - eg the working example (substitute the url and key) below.

It took me a few iterations to develop the example below because I know
almost nothing about the API. I started with very simple code from one of the
samples and after each run, the (edited..) api script is conveniently recreated
using the redo button on the history output item. So each successive version
of the developing api script you run is persisted - ready to be edited and
rerun easily. It is *very* handy to be able to add a line of code to the
script and run it, then view the output to (eg) inspect dicts returned by
API calls to help move progressively deeper.

Give the below a whirl on a private clone (install the tool factory from
the main toolshed) and try adding complexity with a few rerun/edit/rerun cycles.

Eg tool factory api script::

    import sys
    from blend.galaxy import GalaxyInstance

    ourGal = 'http://x.x.x.x:xxxx'
    ourKey = 'xxx'
    gi = GalaxyInstance(ourGal, key=ourKey)
    libs = gi.libraries.get_libraries()
    res = []
    # libs looks like
    # u'url': u'/galaxy/api/libraries/441d8112651dc2f3', u'id':
    # u'441d8112651dc2f3', u'name':.... u'Demonstration sample RNA data',
    for lib in libs:
        res.append('%s:\n' % lib['name'])
        res.append(str(gi.libraries.show_library(lib['id'], contents=True)))
    outf = open(sys.argv[2], 'w')
    outf.write('\n'.join(res))
    outf.close()

**Attribution**
Creating re-usable tools from scripts: The Galaxy Tool Factory
Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573

http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref

**Licensing**
Copyright Ross Lazarus 2010
ross lazarus at g mail period com

All rights reserved.

Licensed under the LGPL

**Obligatory screenshot**

http://bitbucket.org/fubar/galaxytoolmaker/src/fda8032fe989/images/dynamicScriptTool.png


```
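The installation notes above say that the interpreters must be on your path and that the HTML output depends on ghostscript and imagemagick. A minimal sketch for checking this before testing scripts follows. The executable names (`Rscript`, `gs`, `convert` and so on) are assumptions about a typical install, and `missing_executables` is a hypothetical helper, not part of the tool factory.

```python
import shutil

# ASSUMPTION: typical executable names; adjust for your system.
REQUIRED = {
    "python": "python scripts",
    "Rscript": "R scripts",
    "perl": "perl scripts",
    "sh": "shell scripts",
    "gs": "ghostscript (pdf shrinking)",
    "convert": "imagemagick (thumbnails)",
}


def missing_executables(required=REQUIRED, which=shutil.which):
    """Return the sorted names in required that cannot be found on PATH."""
    return sorted(name for name in required if which(name) is None)
```

Calling `missing_executables()` with no arguments checks the real PATH; the `which` parameter exists only so the lookup can be swapped out when testing.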