comparison README.md @ 25:9fe74bd23af2 draft

Uploaded
author fubar
date Mon, 02 Mar 2015 05:18:21 -0500
parents
children db35d39e1de9
comparison
24:1a4d3923aa9f 25:9fe74bd23af2
1 toolfactory_2
2 =============
3
4 This is an upgrade to the tool factory but with added parameters
5 (optionally editable in the generated tool form - otherwise fixed) and
6 multiple input files.
7
8 Specify any number of parameters - well at
9 least up to the limit of your patience with repeat groups.
10
11 Parameter values supplied at tool generation time are defaults and
12 can be optionally editable by the user - names cannot be changed once
13 a tool has been generated.
14
15 If not editable, they are passed to the script as hidden parameters
16 and do not appear on the tool form.
17
18 Note! There will be Galaxy default sanitization for all
19 user input parameters which your script may need to dance around.
20
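A minimal Python sketch of the kind of "dance" a script may need: translating sanitized tokens in a parameter value back into the original characters. Only the `__lt__` to `<` translation is mentioned in this README; the other tokens in the table are assumptions based on Galaxy's default character mapping and should be checked against your Galaxy version. The names `ASSUMED_TOKENS` and `unsanitize` are hypothetical.

```python
# Assumed token table: only "__lt__" is confirmed by this README.
# Verify the other entries against your Galaxy version before relying on them.
ASSUMED_TOKENS = {
    "__lt__": "<",
    "__gt__": ">",
    "__sq__": "'",
    "__dq__": '"',
}


def unsanitize(value, tokens=ASSUMED_TOKENS):
    """Translate sanitized tokens in a parameter value back to characters."""
    for token, char in tokens.items():
        value = value.replace(token, char)
    return value


if __name__ == "__main__":
    print(unsanitize("x __lt__ 5"))
```

Apply this only to the specific parameters that genuinely need the raw characters, since undoing sanitization also undoes the protection it provides.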
21 Any number of input files can be passed to your script, but of course it
22 has to deal with them. Both path and metadata name are supplied either in the environment
23 (bash/sh) or as command line parameters (python,perl,rscript) that need to be parsed and
24 dealt with in the script. This is complicated by the common use case of needing file names
25 for (eg) column headers, as well as paths. Try the examples shown on the tool factory
26 form to see how Galaxy file and user supplied parameter values can be recovered in each
27 of the 4 scripting environments supported.
28
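As a rough illustration of recovering both paths and names in a Python script, the sketch below assumes the convention this README describes for Python: comma separated `--INPATHS` and `--INNAMES` lists in the same order, plus an optional `--OUTPATH`. The function name `parse_inputs` is hypothetical, and any extra parameters are simply returned unparsed rather than interpreted.

```python
import argparse


def parse_inputs(argv):
    """Return ([(name, path), ...], output path, unparsed extra arguments)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--INPATHS", default="")
    parser.add_argument("--INNAMES", default="")
    parser.add_argument("--OUTPATH", default=None)
    # parse_known_args leaves any additional parameters in `extras`
    args, extras = parser.parse_known_args(argv)
    paths = [p for p in args.INPATHS.split(",") if p]
    names = [n for n in args.INNAMES.split(",") if n]
    if len(paths) != len(names):
        raise ValueError("INPATHS and INNAMES differ in length")
    return list(zip(names, paths)), args.OUTPATH, extras


if __name__ == "__main__":
    import sys

    pairs, outpath, extras = parse_inputs(sys.argv[1:])
    for name, path in pairs:
        # name is suitable for (eg) a column header, path for opening the file
        print(name, path)
```

Pairing each name with its path up front keeps column headers and file handles in step. Note that a file name containing a comma would break this simple split.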
29 The best way to deal with multiple outputs is to let the tool factory generate an HTML
30 page for your users. It automagically lays out pdf images as thumbnail galleries
31 and can have separate results sections gathering all similarly prefixed files, such as
32 a Foo section built from text files (foo_whatever.log) and
33 artifacts (eg foo_MDS_plot.pdf) that share the foo prefix. All artifacts are linked for download.
34 A copy of the actual script is provided for provenance - be warned, it exposes
35 real file paths.
36
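To make the prefix grouping concrete, here is a small Python sketch of how similarly prefixed output files might be gathered into named sections. It illustrates the idea only and is not the tool factory's own code: the function name `group_sections` and the rule "prefix is the text before the first underscore" are assumptions.

```python
import os
from collections import defaultdict


def group_sections(filenames):
    """Group file names by the text before the first underscore."""
    sections = defaultdict(list)
    for name in sorted(filenames):
        stem = os.path.splitext(name)[0]
        prefix = stem.split("_", 1)[0]
        # "foo" becomes the "Foo" section title
        sections[prefix.capitalize()].append(name)
    # Return the sections in alphabetic order of their titles
    return dict(sorted(sections.items()))


if __name__ == "__main__":
    outputs = ["foo_MDS_plot.pdf", "foo_whatever.log", "bar_counts.xls", "bar_run.log"]
    for title, members in group_sections(outputs).items():
        print(title, members)
```

In practice this means a script only has to choose consistent file name prefixes to control how its outputs are sectioned on the page.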
37
38 tl;dr
39
40 ```
41
42 # WARNING before you start
43 # Install this tool on a private Galaxy ONLY
44 # Please NEVER on a public or production instance
45 # updated august 2014 by John Chilton adding citation support
46 #
47 # updated august 8 2014 to fix bugs reported by Marius van den Beek
48 # please cite the resource at
49 http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref
50 # if you use this tool in your published work.
51
52 *Short Story*
53
54 This is an unusual Galaxy tool capable of generating new Galaxy tools.
55 It works by exposing *unrestricted* and therefore extremely dangerous scripting
56 to all designated administrators of the host Galaxy server, allowing them to
57 run scripts in R, python, sh and perl over multiple selected input data sets,
58 writing a single new data set as output.
59
60 *Differences between TF2 and the original Tool Factory*
61
62 1. TF2 (this one) allows any number of either fixed or user-editable parameters to be defined
63 for the new tool. If these are editable, the user can change them but otherwise, they are passed
64 as fixed and invisible parameters for each execution. Obviously, there are substantial security
65 implications with editable parameters, but these are always sanitized by Galaxy's inbuilt
66 parameter sanitization so you may need to "unsanitize" characters - eg translate all "__lt__"
67 into "<" for certain parameters where that is needed. Please practise safe toolshed.
68
69 2. Any number of input files (all of the same datatype) may be defined.
70
71 These changes substantially complicate the way your script is supplied with
72 all the new and variable parameters. Examples in each scripting language are shown
73 in the tool help.
74
75 *Automated outputs in named sections*
76
77 If your script writes an arbitrary mix of (eg) pdfs, tabular analysis results
78 and run logs to the current directory, the tool factory can optionally
79 auto-generate a linked Html page with separate sections showing a thumbnail
80 grid for all pdfs and the log text, grouping all artifacts sharing a file
81 name and log name prefix::
82
83 eg: if "foo.log" is emitted then *all* other outputs matching foo_* will
84 all be grouped together - eg
85 foo_baz.pdf
86 foo_bar.pdf and
87 foo_zot.xls
88 would all be displayed and linked in the same section with foo.log's contents
89 - to form the "Foo" section of the Html page. Sections appear in alphabetic
90 order and there are no limits on the number of files or sections.
91
92 *Automated generation of new Galaxy tools for installation into any Galaxy*
93
94 Once a script is working correctly, this tool optionally generates a
95 new Galaxy tool, effectively freezing the supplied script into a new,
96 ordinary Galaxy tool that runs it over one or more input files selected by
97 the user. Generated tools are installed via a tool shed by an administrator
98 and work exactly like all other Galaxy tools for your users.
99
100 If you use the Html output option, please ensure that sanitize_all_html is
101 set to False and uncommented in universe_wsgi.ini - it should show::
102
103 # By default, all tool output served as 'text/html' will be sanitized
104 sanitize_all_html = False
105
106 This opens potential security risks and may not be acceptable for public
107 sites. With sanitization left on, the lack of stylesheets may make Html pages
108 damage onlookers' eyeballs, but they should still be correct.
109
110
111 *More Detail*
112
113 To use the ToolFactory, you should have prepared a script to paste into a
114 text box, and a small test input example ready to select from your history
115 to test your new script.
116
117 There is an example in each scripting language on the Tool Factory form. You
118 can just cut and paste these to try it out - remember to select the right
119 interpreter please. You'll also need to create a small test data set using
120 the Galaxy history add new data tool.
121
122 If the script fails somehow, use the "redo" button on the tool output in
123 your history to recreate the form complete with broken script. Fix the bug
124 and execute again. Rinse, wash, repeat.
125
126 Once the script runs successfully, a new Galaxy tool that runs your script
127 can be generated. Select the "generate" option and supply some help text and
128 names. The new tool will be generated in the form of a new Galaxy datatype
129 - toolshed.gz - as the name suggests, it's an archive ready to upload to a
130 Galaxy ToolShed as a new tool repository.
131
132 Once it's in a ToolShed, it can be installed into any local Galaxy server
133 from the server administrative interface.
134
135 Once the new tool is installed, local users can run it - each time, the script
136 that was supplied when it was built will be executed with the input chosen
137 from the user's history. In other words, the tools you generate with the
138 ToolFactory run just like any other Galaxy tool, but run your script every time.
139
140 Tool factory tools are perfect for workflow components. One input, one output,
141 no variables.
142
143 *To fully and safely exploit the awesome power* of this tool,
144 Galaxy and the ToolShed, you should be a developer installing this
145 tool on a private/personal/scratch local instance where you are an
146 admin_user. Then, if you break it, you get to keep all the pieces. See
147 https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home
148
149 ** Installation **
150 This is a Galaxy tool. You can install it most conveniently using the
151 administrative "Search and browse tool sheds" link. Find the Galaxy Main
152 toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory
153 repository. Open it and review the code and select the option to install it.
154
155 (
156 If you can't get the tool that way, the xml and py files here need to be
157 copied into a new tools
158 subdirectory such as tools/toolfactory. Your tool_conf.xml needs a new entry
159 pointing to the xml
160 file - something like::
161
162 <section name="Tool building tools" id="toolbuilders">
163 <tool file="toolfactory/rgToolFactory.xml"/>
164 </section>
165
166 If not already there (I just added it to datatypes_conf.xml.sample),
167 please add:
168 <datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary"
169 mimetype="multipart/x-gzip" subclass="True" />
170 to your local datatypes_conf.xml.
171 )
172
173 Of course, R, python, perl etc are needed on your path if you want to test
174 scripts using those interpreters. Adding new ones to this tool code should
175 be easy enough. Please make suggestions as bitbucket issues and code. The
176 HTML file code automatically shrinks R's bloated pdfs, and depends on
177 ghostscript. The thumbnails require imagemagick.
178
179 * Restricted execution *
180 The tool factory tool itself will then be usable ONLY by admin users -
181 people with IDs in admin_users in universe_wsgi.ini. **Yes, that's right. ONLY
182 admin_users can run this tool.** Think about it for a moment. If allowed to
183 run any arbitrary script on your Galaxy server, the only thing that would
184 impede a miscreant bent on destroying all your Galaxy data would probably
185 be lack of appropriate technical skills.
186
187 *What it does* This is a tool factory for simple scripts in python, R and
188 perl currently. Functional tests are automatically generated. How cool is that.
189
190 LIMITED to simple scripts that read one input from the history. Optionally can
191 write one new history dataset, and optionally collect any number of outputs
192 into links on an autogenerated HTML index page for the user to navigate -
193 useful if the script writes images and output files - pdf outputs are shown
194 as thumbnails and R's bloated pdfs are shrunk with ghostscript, so both ghostscript and
195 imagemagick need to be available.
196
197 Generated tools can be edited and enhanced like any Galaxy tool, so start
198 small and build up since a generated script gets you a serious leg up to a
199 more complex one.
200
201 *What you do* You paste and run your script, you fix the syntax errors and
202 eventually it runs. You can use the redo button and edit the script before
203 trying to rerun it as you debug - it works pretty well.
204
205 Once the script works on some test data, you can generate a toolshed compatible
206 gzip file containing your script ready to run as an ordinary Galaxy tool in
207 a repository on your local toolshed. That means safe and largely automated
208 installation in any production Galaxy configured to use your toolshed.
209
210 *Generated tool Security* Once you install a generated tool, it's just
211 another tool - assuming the script is safe. They just run normally and their
212 user cannot do anything unusually insecure but please, practice safe toolshed.
213 Read the fucking code before you install any tool. Especially this one -
214 it is really scary.
215
216 If you opt for an HTML output, you get all the script outputs arranged
217 as a single Html history item - all output files are linked, thumbnails for
218 all the pdfs. Ugly but really inexpensive.
219
220 Patches and suggestions welcome as bitbucket issues please?
221
222 copyright ross lazarus (ross stop lazarus at gmail stop com) May 2012
223
224 all rights reserved
225 Licensed under the LGPL. If you want to improve it, feel free.
226 https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home
227
228 Material for our more enthusiastic and voracious readers continues below -
229 we salute you.
230
231 **Motivation** Simple transformation, filtering or reporting scripts get
232 written, run and lost every day in most busy labs - even ours where Galaxy is
233 in use. This 'dark script matter' is pervasive and generally not reproducible.
234
235 **Benefits** For our group, this allows Galaxy to fill that important dark
236 script gap - all those "small" bioinformatics tasks. Once a user has a working
237 R (or python or perl) script that does something Galaxy cannot currently do
238 (eg transpose a tabular file) and takes parameters the way Galaxy supplies
239 them (see example below), they:
240
241 1. Install the tool factory on a personal private instance
242
243 2. Upload a small test data set
244
245 3. Paste the script into the 'script' text box and iteratively run the
246 insecure tool on test data until it works right - there is absolutely no
247 reason to do this anywhere other than on a personal private instance.
248
249 4. Once it works right, set the 'Generate toolshed gzip' option and run
250 it again.
251
252 5. A toolshed style gzip appears ready to upload and install like any other
253 Toolshed entry.
254
255 6. Upload the new tool to the toolshed
256
257 7. Ask the local admin to check the new tool to confirm it's not evil and
258 install it in the local production galaxy
259
260
261
262 **Parameter passing and file inputs**
263
264 Your script will receive up to 3 named parameters
265 INPATHS is a comma separated list of input file paths
266 INNAMES is a comma separated list of input file names in the same order
267 OUTPATH is optional; if a file is being generated, your script should write it there
268 Your script should open and write files in the provided working directory if you are using the Html
269 automatic presentation option.
270
271 Python script command lines will have --INPATHS and --additional_arguments etc. to make it easy to use argparse
272
273 Rscript will need to use commandArgs(TRUE) - see the example below - additional arguments will
274 appear as themselves - eg foo="bar" will mean that foo is defined as "bar" for the script.
275
276 Bash and sh will see any additional parameters on their command lines and the 3 named parameters
277 in their environment magically - well, using env on the CL
278
279 ***python***::
280
281 # argparse for 3 possible comma separated lists
282 # additional parameters need to be parsed !
283 # then echo parameters to the output file
284 import sys
285 import argparse
286 argp=argparse.ArgumentParser()
287 argp.add_argument('--INNAMES',default=None)
288 argp.add_argument('--INPATHS',default=None)
289 argp.add_argument('--OUTPATH',default=None)
290 argp.add_argument('--additional_parameters',default=[],action="append")
291 argp.add_argument('otherargs', nargs=argparse.REMAINDER)
292 args = argp.parse_args()
293 f= open(args.OUTPATH,'w')
294 s = '### args=%s\n' % str(args)
295 f.write(s)
296 s = 'sys.argv=%s\n' % sys.argv
297 f.write(s)
298 f.close()
299
300
301
302 ***Rscript***::
303
304 # tool factory Rscript parser suggested by Forester
305 # http://www.r-bloggers.com/including-arguments-in-r-cmd-batch-mode/
306 # additional parameters will appear in the ls() below - they are available
307 # to your script
308 # echo parameters to the output file
309 ourargs = commandArgs(TRUE)
310 if(length(ourargs)==0){
311 print("No arguments supplied.")
312 }else{
313 for(i in 1:length(ourargs)){
314 eval(parse(text=ourargs[[i]]))
315 }
316 sink(OUTPATH)
317 cat('INPATHS=',INPATHS,'\n')
318 cat('INNAMES=',INNAMES,'\n')
319 cat('OUTPATH=',OUTPATH,'\n')
320 x=ls()
321 cat('all objects=',x,'\n')
322 sink()
323 }
324 sessionInfo()
325 print.noquote(date())
326
327
328 ***bash/sh***::
329
330 # tool factory sets up these environmental variables
331 # this example writes those to the output file
332 # additional params appear on command line
333 if [ ! -f "$OUTPATH" ] ; then
334 touch "$OUTPATH"
335 fi
336 echo "INPATHS=$INPATHS" >> "$OUTPATH"
337 echo "INNAMES=$INNAMES" >> "$OUTPATH"
338 echo "OUTPATH=$OUTPATH" >> "$OUTPATH"
339 echo "CL=$@" >> "$OUTPATH"
340
341 ***perl***::
342
343 (my $INPATHS,my $INNAMES,my $OUTPATH ) = @ARGV;
344 open(my $fh, '>', $OUTPATH) or die "Could not open file '$OUTPATH' $!";
345 print $fh "INPATHS=$INPATHS\n INNAMES=$INNAMES\n OUTPATH=$OUTPATH\n";
346 close $fh;
347
348
349
350 Galaxy as an IDE for developing API scripts
351 If you need to develop Galaxy API scripts and you like to live dangerously,
352 please read on.
353
354 Galaxy as an IDE?
355 Amazingly enough, blend-lib API scripts run perfectly well *inside*
356 Galaxy when pasted into a Tool Factory form. No need to generate a new
357 tool. Galaxy+Tool_Factory = IDE I think we need a new t-shirt. Seriously,
358 it is actually quite useable.
359
360 Why bother - what's wrong with Eclipse
361 Nothing. But, compared with developing API scripts in the usual way outside
362 Galaxy, you get persistence and other framework benefits plus at absolutely
363 no extra charge, a ginormous security problem if you share the history or
364 any outputs because they contain the api script with key so development
365 servers only please!
366
367 Workflow
368 Fire up the Tool Factory in Galaxy.
369
370 Leave the input box empty, set the interpreter to python, paste and run an
371 api script - eg working example (substitute the url and key) below.
372
373 It took me a few iterations to develop the example below because I know
374 almost nothing about the API. I started with very simple code from one of the
375 samples and after each run, the (edited..) api script is conveniently recreated
376 using the redo button on the history output item. So each successive version
377 of the developing api script you run is persisted - ready to be edited and
378 rerun easily. It is *very* handy to be able to add a line of code to the
379 script and run it, then view the output to (eg) inspect dicts returned by
380 API calls to help move progressively deeper iteratively.
381
382 Give the below a whirl on a private clone (install the tool factory from
383 the main toolshed) and try adding complexity with a few rerun/edit/rerun cycles.
384
385 Eg tool factory api script
386 import sys
387 from blend.galaxy import GalaxyInstance
388 ourGal = 'http://x.x.x.x:xxxx'
389 ourKey = 'xxx'
390 gi = GalaxyInstance(ourGal, key=ourKey)
391 libs = gi.libraries.get_libraries()
392 res = []
393 # libs looks like
394 # u'url': u'/galaxy/api/libraries/441d8112651dc2f3', u'id':
395 # u'441d8112651dc2f3', u'name':.... u'Demonstration sample RNA data',
396 for lib in libs:
397     res.append('%s:\n' % lib['name'])
398     res.append(str(gi.libraries.show_library(lib['id'],contents=True)))
399 outf=open(sys.argv[2],'w')
400 outf.write('\n'.join(res))
401 outf.close()
402
403 **Attribution**
404 Creating re-usable tools from scripts: The Galaxy Tool Factory
405 Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
406 Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573
407
408 http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref
409
410 **Licensing**
411 Copyright Ross Lazarus 2010
412 ross lazarus at g mail period com
413
414 All rights reserved.
415
416 Licensed under the LGPL
417
418 **Obligatory screenshot**
419
420 http://bitbucket.org/fubar/galaxytoolmaker/src/fda8032fe989/images/dynamicScriptTool.png
421
422
423 ```
424