comparison toolfactory/README.md @ 32:4d578c8c1613 draft

passes planemo test
author fubar
date Fri, 07 Aug 2020 23:14:54 -0400
parents 6f48315c32c1
children c5290ea7bae0
31:69eed330c91f 32:4d578c8c1613
1 toolfactory_2 1 *WARNING before you start*
2 =============
3 2
4 This is an upgrade to the tool factory but with added parameters 3 Install this tool on a private Galaxy ONLY
5 (optionally editable in the generated tool form - otherwise fixed) and 4 Please NEVER on a public or production instance
6 multiple input files. 5
6 Updated August 2014 by John Chilton adding citation support
7 7
8 Specify any number of parameters - well at 8 Updated August 8 2014 to fix bugs reported by Marius van den Beek
9 least up to the limit of your patience with repeat groups.
10 9
11 Parameter values supplied at tool generation time are defaults and
12 can be optionally editable by the user - names cannot be changed once
13 a tool has been generated.
14
15 If not editable, they act as hidden parameters passed to the script
16 and are not editable on the tool form.
17
18 Note! There will be Galaxy default sanitization for all
19 user input parameters which your script may need to dance around.
20
21 Any number of input files can be passed to your script, but of course it
22 has to deal with them. Both path and metadata name are supplied either in the environment
23 (bash/sh) or as command line parameters (python,perl,rscript) that need to be parsed and
24 dealt with in the script. This is complicated by the common use case of needing file names
25 for (eg) column headers, as well as paths. Try the examples shown on the tool factory
26 form to see how Galaxy file and user supplied parameter values can be recovered in each
27 of the 4 scripting environments supported.
28
29 Best way to deal with multiple outputs is to let the tool factory generate an HTML
30 page for your users. It automagically lays out pdf images as thumbnail galleries
31 and can have separate results sections gathering all similarly prefixed files, such as
32 a Foo section taking text and results from text (foo_whatever.log) and
33 artifacts (eg foo_MDS_plot.pdf) file names. All artifacts are linked for download.
34 A copy of the actual script is provided for provenance - be warned, it exposes
35 real file paths.
36
37 **WARNING before you start**
38
39 Install this tool on a private Galaxy ONLY
40 Please NEVER on a public or production instance
41 Please cite the resource at 10 Please cite the resource at
42 http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref 11 http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref
43 if you use this tool in your published work. 12 if you use this tool in your published work.
44 13
45 14 **Short Story**
46 *Short Story*
47 15
48 This is an unusual Galaxy tool capable of generating new Galaxy tools. 16 This is an unusual Galaxy tool capable of generating new Galaxy tools.
49 It works by exposing *unrestricted* and therefore extremely dangerous scripting 17 It works by exposing *unrestricted* and therefore extremely dangerous scripting
50 to all designated administrators of the host Galaxy server, allowing them to 18 to all designated administrators of the host Galaxy server, allowing them to
51 run scripts in R, python, sh and perl over multiple selected input data sets, 19 run scripts in R, python, sh and perl over multiple selected input data sets,
52 writing a single new data set as output. 20 writing a single new data set as output.
53 21
54 *Differences between TF2 and the original Tool Factory* 22 *You have a working r/python/perl/bash script or any executable with positional or argparse style parameters*
55 23
56 1. TF2 (this one) allows any number of either fixed or user-editable parameters to be defined 24 It can be turned into an ordinary Galaxy tool in minutes, using a Galaxy tool.
57 for the new tool. If these are editable, the user can change them but otherwise, they are passed
58 as fixed and invisible parameters for each execution. Obviously, there are substantial security
59 implications with editable parameters, but these are always sanitized by Galaxy's inbuilt
60 parameter sanitization so you may need to "unsanitize" characters - eg translate all "__lt__"
61 into "<" for certain parameters where that is needed. Please practise safe toolshed.
62 25
63 2. Any number of input files (all of the same datatype) may be defined.
64 26
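The "unsanitize" step mentioned in point 1 can be sketched in python as below. This is only a minimal sketch: the `unsanitize` helper and the `GALAXY_ESCAPES` table are hypothetical names, and the token list is an assumption about what Galaxy's default sanitizer substitutes, so please check it against your own Galaxy version before relying on it.

```python
# Assumed mapping of Galaxy's default sanitizer tokens back to characters.
# Verify against your Galaxy version - this table is not taken from the tool factory code.
GALAXY_ESCAPES = {
    "__lt__": "<",
    "__gt__": ">",
    "__sq__": "'",
    "__dq__": '"',
    "__ob__": "[",
    "__cb__": "]",
    "__oc__": "{",
    "__cc__": "}",
    "__at__": "@",
    "__pd__": "#",
}


def unsanitize(value):
    """Translate escape tokens in a parameter value back to plain characters."""
    for token, char in GALAXY_ESCAPES.items():
        value = value.replace(token, char)
    return value


if __name__ == "__main__":
    print(unsanitize("x __lt__ 5 and y __gt__ 2"))
```

Only unsanitize the parameters that really need it. The sanitization is there for a reason.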
65 These changes substantially complicate the way your supplied script is supplied with 27 **Automated generation of new Galaxy tools for installation into any Galaxy**
66 all the new and variable parameters. Examples in each scripting language are shown
67 in the tool help
68 28
69 *Automated outputs in named sections* 29 A test is generated using small sample test data inputs and parameter settings you supply.
70 30 Once the test case outputs have been produced, they can be used to build a
71 If your script writes to the current directory path, arbitrary mix of (eg) 31 new Galaxy tool. The supplied script or executable is baked as a requirement
72 pdfs, tabular analysis results and run logs, the tool factory can optionally 32 into a new, ordinary Galaxy tool, fully workflow compatible out of the box.
73 auto-generate a linked Html page with separate sections showing a thumbnail 33 Generated tools are installed via a tool shed by an administrator
74 grid for all pdfs and the log text, grouping all artifacts sharing a file
75 name and log name prefix. If "foo.log" is emitted then *all* other outputs matching foo_* will
76 all be grouped together - eg
77 - foo_baz.pdf
78 - foo_bar.pdf and
79 - foo_zot.xls
80
81 would all be displayed and linked in the same section with foo.log's contents to form the "Foo" section of the Html page.
82 Sections appear in alphabetic order and there are no limits on the number of files or sections.
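The grouping rule described above can be illustrated with a short python sketch. It is not the tool factory's own code: `group_by_log_prefix` is a hypothetical helper, and the choice to match the longest log prefix first is an assumption made here to keep overlapping prefixes unambiguous.

```python
def group_by_log_prefix(filenames):
    """Group output file names into sections keyed by the prefix of each *.log file."""
    # Every "foo.log" defines a section called "foo"; sections come out alphabetically.
    prefixes = sorted(f[: -len(".log")] for f in filenames if f.endswith(".log"))
    sections = {prefix: [] for prefix in prefixes}
    # Assumption: try longer prefixes first so "foo_bar.log" wins over "foo.log".
    longest_first = sorted(prefixes, key=len, reverse=True)
    for fname in sorted(filenames):
        if fname.endswith(".log"):
            continue  # the log text itself heads the section, it is not an artifact
        for prefix in longest_first:
            if fname.startswith(prefix + "_"):
                sections[prefix].append(fname)
                break
    return sections


if __name__ == "__main__":
    outputs = ["foo.log", "foo_baz.pdf", "foo_bar.pdf", "foo_zot.xls",
               "bar.log", "bar_plot.pdf"]
    for section, files in group_by_log_prefix(outputs).items():
        print(section, files)
```

Files that match no log prefix are simply left out of every section in this sketch.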
83
84 *Automated generation of new Galaxy tools for installation into any Galaxy*
85
86 Once a script is working correctly, this tool optionally generates a
87 new Galaxy tool, effectively freezing the supplied script into a new,
88 ordinary Galaxy tool that runs it over one or more input files selected by
89 the user. Generated tools are installed via a tool shed by an administrator
90 and work exactly like all other Galaxy tools for your users. 34 and work exactly like all other Galaxy tools for your users.
91 35
92 If you use the Html output option, please ensure that sanitize_all_html is 36 **More Detail**
93 set to False and uncommented in universe_wsgi.ini - it should show
94
95 By default, all tool output served as 'text/html' will be sanitized
96 Change ```sanitize_all_html = False```
97
98 This opens potential security risks and may not be acceptable for public
99 sites where the lack of stylesheets may make Html pages damage onlookers'
100 eyeballs but should still be correct.
101
102 *More Detail*
103 37
104 To use the ToolFactory, you should have prepared a script to paste into a 38 To use the ToolFactory, you should have prepared a script to paste into a
105 text box, and a small test input example ready to select from your history 39 text box, or have a package in mind and a small test input example ready to select from your history
106 to test your new script. 40 to test your new script.
41
42 ```planemo test rgToolFactory2.xml --galaxy_root ~/galaxy --test_data ~/galaxy/tools/tool_makers/toolfactory/test-data``` works for me
107 43
108 There is an example in each scripting language on the Tool Factory form. You 44 There is an example in each scripting language on the Tool Factory form. You
109 can just cut and paste these to try it out - remember to select the right 45 can just cut and paste these to try it out - remember to select the right
110 interpreter please. You'll also need to create a small test data set using 46 interpreter please. You'll also need to create a small test data set using
111 the Galaxy history add new data tool. 47 the Galaxy history add new data tool.
115 and execute again. Rinse, wash, repeat. 51 and execute again. Rinse, wash, repeat.
116 52
117 Once the script runs successfully, a new Galaxy tool that runs your script 53 Once the script runs successfully, a new Galaxy tool that runs your script
118 can be generated. Select the "generate" option and supply some help text and 54 can be generated. Select the "generate" option and supply some help text and
119 names. The new tool will be generated in the form of a new Galaxy datatype 55 names. The new tool will be generated in the form of a new Galaxy datatype
120 - toolshed.gz - as the name suggests, it's an archive ready to upload to a 56 *toolshed.gz* - as the name suggests, it's an archive ready to upload to a
121 Galaxy ToolShed as a new tool repository. 57 Galaxy ToolShed as a new tool repository.
122 58
123 Once it's in a ToolShed, it can be installed into any local Galaxy server 59 Once it's in a ToolShed, it can be installed into any local Galaxy server
124 from the server administrative interface. 60 from the server administrative interface.
125 61
135 Galaxy and the ToolShed, you should be a developer installing this 71 Galaxy and the ToolShed, you should be a developer installing this
136 tool on a private/personal/scratch local instance where you are an 72 tool on a private/personal/scratch local instance where you are an
137 admin_user. Then, if you break it, you get to keep all the pieces - see 73 admin_user. Then, if you break it, you get to keep all the pieces - see
138 https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home 74 https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home
139 75
140 ** Installation ** 76 **Installation**
141 This is a Galaxy tool. You can install it most conveniently using the 77 This is a Galaxy tool. You can install it most conveniently using the
142 administrative "Search and browse tool sheds" link. Find the Galaxy Main 78 administrative "Search and browse tool sheds" link. Find the Galaxy Main
143 toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory 79 toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory
144 repository. Open it and review the code and select the option to install it. 80 repository. Open it and review the code and select the option to install it.
145 81
82 If you can't get the tool that way, the xml and py files here need to be
83 copied into a new tools
84 subdirectory such as tools/toolfactory Your tool_conf.xml needs a new entry
85 pointing to the xml
86 file - something like::
146 87
147 If you can't get the tool that way, the xml and py files here need to be
148 copied into a new tools subdirectory such as tools/toolfactory
149 Your tool_conf.xml needs a new entry pointing to the xml file - something like
150 ```
151 <section name="Tool building tools" id="toolbuilders"> 88 <section name="Tool building tools" id="toolbuilders">
152 <tool file="toolfactory/rgToolFactory.xml"/> 89 <tool file="toolfactory/rgToolFactory.xml"/>
153 </section> 90 </section>
154 ``` 91
155 If not already there (I just added it to datatypes_conf.xml.sample), 92 If not already there,
156 please add: 93 please add:
157
158 ```
159 <datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary" 94 <datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary"
160 mimetype="multipart/x-gzip" subclass="True" /> 95 mimetype="multipart/x-gzip" subclass="True" />
161 ```
162 to your local datatypes_conf.xml. 96 to your local datatypes_conf.xml.
163 97
164 98
165 Of course, R, python, perl etc are needed on your path if you want to test 99 **Restricted execution**
166 scripts using those interpreters. Adding new ones to this tool code should
167 be easy enough. Please make suggestions as bitbucket issues and code. The
168 HTML file code automatically shrinks R's bloated pdfs, and depends on
169 ghostscript. The thumbnails require imagemagick.
170 100
171 * Restricted execution *
172 The tool factory tool itself will then be usable ONLY by admin users - 101 The tool factory tool itself will then be usable ONLY by admin users -
173 people with IDs in admin_users in universe_wsgi.ini **Yes, that's right. ONLY 102 people with IDs in admin_users in universe_wsgi.ini **Yes, that's right. ONLY
174 admin_users can run this tool** Think about it for a moment. If allowed to 103 admin_users can run this tool** Think about it for a moment. If allowed to
175 run any arbitrary script on your Galaxy server, the only thing that would 104 run any arbitrary script on your Galaxy server, the only thing that would
176 impede a miscreant bent on destroying all your Galaxy data would probably 105 impede a miscreant bent on destroying all your Galaxy data would probably
177 be lack of appropriate technical skills. 106 be lack of appropriate technical skills.
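To make the admin_users restriction concrete, here is a minimal python sketch of that kind of check. It is not how the tool factory itself enforces the restriction. The `is_admin` helper, the sample ini text and the email addresses are all hypothetical, and it assumes admin_users is a comma separated list of emails in the `[app:main]` section.

```python
import configparser

# Hypothetical stand-in for the relevant part of universe_wsgi.ini
SAMPLE_INI = """
[app:main]
admin_users = alice@example.org, bob@example.org
"""


def is_admin(user_email, ini_text):
    """Return True if user_email is listed in admin_users (assumed to be under [app:main])."""
    # interpolation=None so any '%' in a real config value is left alone
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    raw = parser.get("app:main", "admin_users", fallback="")
    admins = {entry.strip().lower() for entry in raw.split(",") if entry.strip()}
    return user_email.strip().lower() in admins


if __name__ == "__main__":
    for email in ("alice@example.org", "mallory@example.org"):
        print(email, is_admin(email, SAMPLE_INI))
```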
178 107
179 *What it does* This is a tool factory for simple scripts in python, R and 108 **What it does**
109
110 This is a tool factory for simple scripts in python, R and
180 perl currently. Functional tests are automatically generated. How cool is that. 111 perl currently. Functional tests are automatically generated. How cool is that.
181 112
182 LIMITED to simple scripts that read one input from the history. Optionally can 113 LIMITED to simple scripts that read one input from the history. Optionally can
183 write one new history dataset, and optionally collect any number of outputs 114 write one new history dataset, and optionally collect any number of outputs
184 into links on an autogenerated HTML index page for the user to navigate - 115 into links on an autogenerated HTML index page for the user to navigate -
188 119
189 Generated tools can be edited and enhanced like any Galaxy tool, so start 120 Generated tools can be edited and enhanced like any Galaxy tool, so start
190 small and build up since a generated script gets you a serious leg up to a 121 small and build up since a generated script gets you a serious leg up to a
191 more complex one. 122 more complex one.
192 123
193 *What you do* You paste and run your script, you fix the syntax errors and 124 **What you do**
125
126 You paste and run your script, you fix the syntax errors and
194 eventually it runs. You can use the redo button and edit the script before 127 eventually it runs. You can use the redo button and edit the script before
195 trying to rerun it as you debug - it works pretty well. 128 trying to rerun it as you debug - it works pretty well.
196 129
197 Once the script works on some test data, you can generate a toolshed compatible 130 Once the script works on some test data, you can generate a toolshed compatible
198 gzip file containing your script ready to run as an ordinary Galaxy tool in 131 gzip file containing your script ready to run as an ordinary Galaxy tool in
199 a repository on your local toolshed. That means safe and largely automated 132 a repository on your local toolshed. That means safe and largely automated
200 installation in any production Galaxy configured to use your toolshed. 133 installation in any production Galaxy configured to use your toolshed.
201 134
202 *Generated tool Security* Once you install a generated tool, it's just 135 **Generated tool Security**
136
137 Once you install a generated tool, it's just
203 another tool - assuming the script is safe. They just run normally and their 138 another tool - assuming the script is safe. They just run normally and their
204 user cannot do anything unusually insecure but please, practice safe toolshed. 139 user cannot do anything unusually insecure but please, practice safe toolshed.
205 Read the fucking code before you install any tool. Especially this one - 140 Read the code before you install any tool. Especially this one - it is really scary.
206 it is really scary.
207 141
208 If you opt for an HTML output, you get all the script outputs arranged 142 **Send Code**
209 as a single Html history item - all output files are linked, thumbnails for
210 all the pdfs. Ugly but really inexpensive.
211 143
212 Patches and suggestions welcome as bitbucket issues please? 144 Patches and suggestions welcome as bitbucket issues please?
213 145
214 copyright ross lazarus (ross stop lazarus at gmail stop com) May 2012 146 **Attribution**
215 147
216 all rights reserved
217 Licensed under the LGPL if you want to improve it, feel free
218 https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home
219
220 Material for our more enthusiastic and voracious readers continues below -
221 we salute you.
222
223 **Motivation** Simple transformation, filtering or reporting scripts get
224 written, run and lost every day in most busy labs - even ours where Galaxy is
225 in use. This 'dark script matter' is pervasive and generally not reproducible.
226
227 **Benefits** For our group, this allows Galaxy to fill that important dark
228 script gap - all those "small" bioinformatics tasks. Once a user has a working
229 R (or python or perl) script that does something Galaxy cannot currently do
230 (eg transpose a tabular file) and takes parameters the way Galaxy supplies
231 them (see example below), they:
232
233 1. Install the tool factory on a personal private instance
234
235 2. Upload a small test data set
236
237 3. Paste the script into the 'script' text box and iteratively run the
238 insecure tool on test data until it works right - there is absolutely no
239 reason to do this anywhere other than on a personal private instance.
240
241 4. Once it works right, set the 'Generate toolshed gzip' option and run
242 it again.
243
244 5. A toolshed style gzip appears ready to upload and install like any other
245 Toolshed entry.
246
247 6. Upload the new tool to the toolshed
248
249 7. Ask the local admin to check the new tool to confirm it's not evil and
250 install it in the local production galaxy
251
252
253
254 **Parameter passing and file inputs**
255
256 Your script will receive up to 3 named parameters
257 INPATHS is a comma separated list of input file paths
258 INNAMES is a comma separated list of input file names in the same order
259 OUTPATH is optional; if a file is being generated, your script should write there
260 Your script should open and write files in the provided working directory if you are using the Html
261 automatic presentation option.
262
263 Python script command lines will have --INPATHS and --additional_arguments etc. to make it easy to use argparse
264
265 Rscript will need to use commandArgs(TRUE) - see the example below - additional arguments will
266 appear as themselves - eg foo="bar" will mean that foo is defined as "bar" for the script.
267
268 Bash and sh will see any additional parameters on their command lines and the 3 named parameters
269 in their environment magically - well, using env on the CL
270 ```
271 ***python***::
272
273 # argparse for 3 possible comma separated lists
274 # additional parameters need to be parsed !
275 # then echo parameters to the output file
276 import sys
277 import argparse
278 argp=argparse.ArgumentParser()
279 argp.add_argument('--INNAMES',default=None)
280 argp.add_argument('--INPATHS',default=None)
281 argp.add_argument('--OUTPATH',default=None)
282 argp.add_argument('--additional_parameters',default=[],action="append")
283 argp.add_argument('otherargs', nargs=argparse.REMAINDER)
284 args = argp.parse_args()
285 f= open(args.OUTPATH,'w')
286 s = '### args=%s\n' % str(args)
287 f.write(s)
288 s = 'sys.argv=%s\n' % sys.argv
289 f.write(s)
290 f.close()
291
292
293
294 ***Rscript***::
295
296 # tool factory Rscript parser suggested by Forester
297 # http://www.r-bloggers.com/including-arguments-in-r-cmd-batch-mode/
298 # additional parameters will appear in the ls() below - they are available
299 # to your script
300 # echo parameters to the output file
301 ourargs = commandArgs(TRUE)
302 if(length(ourargs)==0){
303 print("No arguments supplied.")
304 }else{
305 for(i in 1:length(ourargs)){
306 eval(parse(text=ourargs[[i]]))
307 }
308 sink(OUTPATH)
309 cat('INPATHS=',INPATHS,'\n')
310 cat('INNAMES=',INNAMES,'\n')
311 cat('OUTPATH=',OUTPATH,'\n')
312 x=ls()
313 cat('all objects=',x,'\n')
314 sink()
315 }
316 sessionInfo()
317 print.noquote(date())
318
319
320 ***bash/sh***::
321
322 # tool factory sets up these environmental variables
323 # this example writes those to the output file
324 # additional params appear on command line
325 if [ ! -f "$OUTPATH" ] ; then
326 touch "$OUTPATH"
327 fi
328 echo "INPATHS=$INPATHS" >> "$OUTPATH"
329 echo "INNAMES=$INNAMES" >> "$OUTPATH"
330 echo "OUTPATH=$OUTPATH" >> "$OUTPATH"
331 echo "CL=$@" >> "$OUTPATH"
332
333 ***perl***::
334
335 (my $INPATHS,my $INNAMES,my $OUTPATH ) = @ARGV;
336 open(my $fh, '>', $OUTPATH) or die "Could not open file '$OUTPATH' $!";
337 print $fh "INPATHS=$INPATHS\n INNAMES=$INNAMES\n OUTPATH=$OUTPATH\n";
338 close $fh;
339
340 ```
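For the common case of needing the file names (eg for column headers) alongside the paths, the two comma separated lists can be split and paired up. The python sketch below assumes the --INPATHS, --INNAMES and --OUTPATH flags described above. `pair_inputs` is a hypothetical helper, and using parse_known_args to ignore any additional parameters is a choice made for this example only.

```python
import argparse


def pair_inputs(argv):
    """Return ([(name, path), ...], outpath) from a tool factory style command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--INPATHS", default="")
    parser.add_argument("--INNAMES", default="")
    parser.add_argument("--OUTPATH", default=None)
    # Additional parameters are ignored here; a real script needs to parse them too.
    args, _extra = parser.parse_known_args(argv)
    paths = [p for p in args.INPATHS.split(",") if p]
    names = [n for n in args.INNAMES.split(",") if n]
    if len(paths) != len(names):
        raise ValueError("INPATHS and INNAMES have different lengths")
    # Both lists are supplied in the same order, so zip keeps names and paths aligned.
    return list(zip(names, paths)), args.OUTPATH


if __name__ == "__main__":
    demo = ["--INPATHS", "/tmp/a.dat,/tmp/b.dat",
            "--INNAMES", "sampleA,sampleB",
            "--OUTPATH", "/tmp/out.txt"]
    pairs, outpath = pair_inputs(demo)
    for name, path in pairs:
        print(name, path)
```

Note that a history name containing a comma would break this simple split, so the length check is there to fail loudly rather than mislabel columns.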
341
342 Galaxy as an IDE for developing API scripts
343 If you need to develop Galaxy API scripts and you like to live dangerously,
344 please read on.
345
346 Galaxy as an IDE?
347 Amazingly enough, blend-lib API scripts run perfectly well *inside*
348 Galaxy when pasted into a Tool Factory form. No need to generate a new
349 tool. Galaxy+Tool_Factory = IDE - I think we need a new t-shirt. Seriously,
350 it is actually quite useable.
351
352 Why bother - what's wrong with Eclipse
353 Nothing. But, compared with developing API scripts in the usual way outside
354 Galaxy, you get persistence and other framework benefits plus at absolutely
355 no extra charge, a ginormous security problem if you share the history or
356 any outputs because they contain the api script with key so development
357 servers only please!
358
359 Workflow
360 Fire up the Tool Factory in Galaxy.
361
362 Leave the input box empty, set the interpreter to python, paste and run an
363 api script - eg working example (substitute the url and key) below.
364
365 It took me a few iterations to develop the example below because I know
366 almost nothing about the API. I started with very simple code from one of the
367 samples and after each run, the (edited..) api script is conveniently recreated
368 using the redo button on the history output item. So each successive version
369 of the developing api script you run is persisted - ready to be edited and
370 rerun easily. It is *very* handy to be able to add a line of code to the
371 script and run it, then view the output to (eg) inspect dicts returned by
372 API calls to help move progressively deeper iteratively.
373
374 Give the below a whirl on a private clone (install the tool factory from
375 the main toolshed) and try adding complexity with a few rerun/edit/rerun cycles.
376
377 Eg tool factory api script
378 ```
379 import sys
380 from blend.galaxy import GalaxyInstance
381 ourGal = 'http://x.x.x.x:xxxx'
382 ourKey = 'xxx'
383 gi = GalaxyInstance(ourGal, key=ourKey)
384 libs = gi.libraries.get_libraries()
385 res = []
386 # libs looks like
387 # u'url': u'/galaxy/api/libraries/441d8112651dc2f3', u'id':
388 # u'441d8112651dc2f3', u'name':.... u'Demonstration sample RNA data',
389 for lib in libs:
390     res.append('%s:\n' % lib['name'])
391     res.append(str(gi.libraries.show_library(lib['id'],contents=True)))
392 outf=open(sys.argv[2],'w')
393 outf.write('\n'.join(res))
394 outf.close()
395 ```
396
397 **Attribution**
398 Creating re-usable tools from scripts: The Galaxy Tool Factory 148 Creating re-usable tools from scripts: The Galaxy Tool Factory
399 Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team 149 Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
400 Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573 150 Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573
401 151
402 http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref 152 http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref
403 153
404 **Licensing** 154 **Licensing**
155
405 Copyright Ross Lazarus 2010 156 Copyright Ross Lazarus 2010
406 ross lazarus at g mail period com 157 ross lazarus at g mail period com
407 158
408 All rights reserved. 159 All rights reserved.
409 160
410 Licensed under the LGPL 161 Licensed under the LGPL
411 162
412 **screenshot** 163 **Obligatory screenshot**
413 164
414 ![example run](/images/dynamicScriptTool.png) 165 http://bitbucket.org/fubar/galaxytoolmaker/src/fda8032fe989/images/dynamicScriptTool.png
415 166
416
417 ```
418