comparison toolfactory/README.md @ 30:6f48315c32c1 draft
Uploaded
author | fubar |
---|---|
date | Fri, 07 Aug 2020 07:54:23 -0400 |
parents | |
children | 4d578c8c1613 |
29:6db39cbc3242 | 30:6f48315c32c1 |
---|---|
1 toolfactory_2 | |
2 ============= | |
3 | |
4 This is an upgrade to the tool factory but with added parameters | |
5 (optionally editable in the generated tool form - otherwise fixed) and | |
6 multiple input files. | |
7 | |
8 Specify any number of parameters - well at | |
9 least up to the limit of your patience with repeat groups. | |
10 | |
11 Parameter values supplied at tool generation time are defaults and | |
12 can be optionally editable by the user - names cannot be changed once | |
13 a tool has been generated. | |
14 | |
15 If not editable, they act as hidden parameters passed to the script | |
16 and are not editable on the tool form. | |
17 | |
18 Note! There will be Galaxy default sanitization for all | |
19 user input parameters which your script may need to dance around. | |
20 | |
21 Any number of input files can be passed to your script, but of course it | |
22 has to deal with them. Both path and metadata name are supplied either in the environment | |
23 (bash/sh) or as command line parameters (python,perl,rscript) that need to be parsed and | |
24 dealt with in the script. This is complicated by the common use case of needing file names | |
25 for (eg) column headers, as well as paths. Try the examples shown on the tool factory | |
26 form to see how Galaxy file and user supplied parameter values can be recovered in each | |
27 of the 4 scripting environments supported. | |
28 | |
29 Best way to deal with multiple outputs is to let the tool factory generate an HTML | |
30 page for your users. It automagically lays out pdf images as thumbnail galleries | |
31 and can have separate results sections gathering all similarly prefixed files, such as | |
32 a Foo section taking text and results from text (foo_whatever.log) and | |
33 artifacts (eg foo_MDS_plot.pdf) file names. All artifacts are linked for download. | |
34 A copy of the actual script is provided for provenance - be warned, it exposes | |
35 real file paths. | |
36 | |
37 **WARNING before you start** | |
38 | |
39 Install this tool on a private Galaxy ONLY | |
40 Please NEVER on a public or production instance | |
41 Please cite the resource at | |
42 http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref | |
43 if you use this tool in your published work. | |
44 | |
45 | |
46 *Short Story* | |
47 | |
48 This is an unusual Galaxy tool capable of generating new Galaxy tools. | |
49 It works by exposing *unrestricted* and therefore extremely dangerous scripting | |
50 to all designated administrators of the host Galaxy server, allowing them to | |
51 run scripts in R, python, sh and perl over multiple selected input data sets, | |
52 writing a single new data set as output. | |
53 | |
54 *Differences between TF2 and the original Tool Factory* | |
55 | |
56 1. TF2 (this one) allows any number of either fixed or user-editable parameters to be defined | |
57 for the new tool. If these are editable, the user can change them but otherwise, they are passed | |
58 as fixed and invisible parameters for each execution. Obviously, there are substantial security | |
59 implications with editable parameters, but these are always sanitized by Galaxy's inbuilt | |
60 parameter sanitization so you may need to "unsanitize" characters - eg translate all "__lt__" | |
61 into "<" for certain parameters where that is needed. Please practise safe toolshed. | |
62 | |
63 2. Any number of input files (all of the same datatype) may be defined. | |
64 | |
65 These changes substantially complicate the way your script receives | |
66 all the new and variable parameters. Examples in each scripting language are shown | |
67 in the tool help. | |
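
The "unsanitize" step mentioned in item 1 above can be sketched in python roughly as follows. This is only an illustration: the `__lt__` to `<` mapping comes from the text above, while the other tokens in the table and the helper name `unsanitize` are assumptions that should be checked against the Galaxy version in use.

```python
# Sketch only: undo Galaxy's parameter sanitization for a trusted parameter.
# "__lt__" -> "<" is taken from the README text; the remaining tokens are
# assumptions - verify them against your own Galaxy before relying on them.
SANITIZED_TOKENS = {
    "__lt__": "<",
    "__gt__": ">",
    "__sq__": "'",
    "__dq__": '"',
}


def unsanitize(value, tokens=SANITIZED_TOKENS):
    """Replace each sanitized token in value with its original character."""
    for token, char in tokens.items():
        value = value.replace(token, char)
    return value


print(unsanitize("x __lt__ 5"))
```

Only apply this to parameters where the restored characters are really needed, since it reverses a protection Galaxy applies on purpose.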
68 | |
69 *Automated outputs in named sections* | |
70 | |
71 If your script writes an arbitrary mix of (eg) pdfs, tabular analysis results | |
72 and run logs to the current directory, the tool factory can optionally | |
73 auto-generate a linked Html page with separate sections showing a thumbnail | |
74 grid for all pdfs and the log text, grouping all artifacts sharing a file | |
75 name and log name prefix. If "foo.log" is emitted then *all* other outputs matching foo_* will | |
76 be grouped together - eg | |
77 - foo_baz.pdf | |
78 - foo_bar.pdf and | |
79 - foo_zot.xls | |
80 | |
81 would all be displayed and linked in the same section with foo.log's contents to form the "Foo" section of the Html page. | |
82 Sections appear in alphabetic order and there are no limits on the number of files or sections. | |
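
The grouping rule described above can be illustrated with a small python sketch. It is not the tool factory's own code: the function name `group_by_log_prefix` and the choice to prefer the longest matching prefix are assumptions made for the example.

```python
import os
from collections import defaultdict


def group_by_log_prefix(filenames):
    """Group output files into sections keyed by the prefix of each *.log file."""
    # Longest prefixes first, so "foo_bar.log" would win over "foo.log"
    # (an assumption - the real tool may resolve overlaps differently).
    prefixes = sorted(
        (os.path.splitext(name)[0] for name in filenames if name.endswith(".log")),
        key=len,
        reverse=True,
    )
    sections = defaultdict(list)
    for name in sorted(filenames):
        for prefix in prefixes:
            if name == prefix + ".log" or name.startswith(prefix + "_"):
                sections[prefix].append(name)
                break  # each file belongs to at most one section
    # Sections are returned in alphabetic order, as on the Html page.
    return dict(sorted(sections.items()))


outputs = ["foo.log", "foo_baz.pdf", "foo_bar.pdf", "foo_zot.xls", "other.txt"]
print(group_by_log_prefix(outputs))
```

Files that match no log prefix, such as `other.txt` here, are simply left out of every section in this sketch.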
83 | |
84 *Automated generation of new Galaxy tools for installation into any Galaxy* | |
85 | |
86 Once a script is working correctly, this tool optionally generates a | |
87 new Galaxy tool, effectively freezing the supplied script into a new, | |
88 ordinary Galaxy tool that runs it over one or more input files selected by | |
89 the user. Generated tools are installed via a tool shed by an administrator | |
90 and work exactly like all other Galaxy tools for your users. | |
91 | |
92 If you use the Html output option, please ensure that sanitize_all_html is | |
93 set to False and uncommented in universe_wsgi.ini - it should show | |
94 | |
95 By default, all tool output served as 'text/html' will be sanitized | |
96 Change ```sanitize_all_html = False``` | |
97 | |
98 This opens potential security risks and may not be acceptable for public | |
99 sites where the lack of stylesheets may make Html pages damage onlookers' | |
100 eyeballs but should still be correct. | |
101 | |
102 *More Detail* | |
103 | |
104 To use the ToolFactory, you should have prepared a script to paste into a | |
105 text box, and a small test input example ready to select from your history | |
106 to test your new script. | |
107 | |
108 There is an example in each scripting language on the Tool Factory form. You | |
109 can just cut and paste these to try it out - remember to select the right | |
110 interpreter please. You'll also need to create a small test data set using | |
111 the Galaxy history add new data tool. | |
112 | |
113 If the script fails somehow, use the "redo" button on the tool output in | |
114 your history to recreate the form complete with broken script. Fix the bug | |
115 and execute again. Rinse, wash, repeat. | |
116 | |
117 Once the script runs successfully, a new Galaxy tool that runs your script | |
118 can be generated. Select the "generate" option and supply some help text and | |
119 names. The new tool will be generated in the form of a new Galaxy datatype | |
120 - toolshed.gz - as the name suggests, it's an archive ready to upload to a | |
121 Galaxy ToolShed as a new tool repository. | |
122 | |
123 Once it's in a ToolShed, it can be installed into any local Galaxy server | |
124 from the server administrative interface. | |
125 | |
126 Once the new tool is installed, local users can run it - each time, the script | |
127 that was supplied when it was built will be executed with the input chosen | |
128 from the user's history. In other words, the tools you generate with the | |
129 ToolFactory run just like any other Galaxy tool, but run your script every time. | |
130 | |
131 Tool factory tools are perfect for workflow components. One input, one output, | |
132 no variables. | |
133 | |
134 *To fully and safely exploit the awesome power* of this tool, | |
135 Galaxy and the ToolShed, you should be a developer installing this | |
136 tool on a private/personal/scratch local instance where you are an | |
137 admin_user. Then, if you break it, you get to keep all the pieces. See | |
138 https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home | |
139 | |
140 **Installation** | |
141 This is a Galaxy tool. You can install it most conveniently using the | |
142 administrative "Search and browse tool sheds" link. Find the Galaxy Main | |
143 toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory | |
144 repository. Open it and review the code and select the option to install it. | |
145 | |
146 | |
147 If you can't get the tool that way, the xml and py files here need to be | |
148 copied into a new tools subdirectory such as tools/toolfactory | |
149 Your tool_conf.xml needs a new entry pointing to the xml file - something like | |
150 ``` | |
151 <section name="Tool building tools" id="toolbuilders"> | |
152 <tool file="toolfactory/rgToolFactory.xml"/> | |
153 </section> | |
154 ``` | |
155 If not already there (I just added it to datatypes_conf.xml.sample), | |
156 please add: | |
157 | |
158 ``` | |
159 <datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary" | |
160 mimetype="multipart/x-gzip" subclass="True" /> | |
161 ``` | |
162 to your local datatypes_conf.xml. | |
163 | |
164 | |
165 Of course, R, python, perl etc are needed on your path if you want to test | |
166 scripts using those interpreters. Adding new ones to this tool code should | |
167 be easy enough. Please make suggestions as bitbucket issues and code. The | |
168 HTML file code automatically shrinks R's bloated pdfs, and depends on | |
169 ghostscript. The thumbnails require imagemagick. | |
170 | |
171 *Restricted execution* | |
172 The tool factory tool itself will then be usable ONLY by admin users - | |
173 people with IDs in admin_users in universe_wsgi.ini. **Yes, that's right. ONLY | |
174 admin_users can run this tool** Think about it for a moment. If allowed to | |
175 run any arbitrary script on your Galaxy server, the only thing that would | |
176 impede a miscreant bent on destroying all your Galaxy data would probably | |
177 be lack of appropriate technical skills. | |
178 | |
179 *What it does* This is a tool factory for simple scripts in python, R and | |
180 perl currently. Functional tests are automatically generated. How cool is that. | |
181 | |
182 LIMITED to simple scripts that read one input from the history. Optionally can | |
183 write one new history dataset, and optionally collect any number of outputs | |
184 into links on an autogenerated HTML index page for the user to navigate - | |
185 useful if the script writes images and output files - pdf outputs are shown | |
186 as thumbnails and R's bloated pdfs are shrunk with ghostscript, so ghostscript and | |
187 imagemagick need to be available. | |
188 | |
189 Generated tools can be edited and enhanced like any Galaxy tool, so start | |
190 small and build up since a generated script gets you a serious leg up to a | |
191 more complex one. | |
192 | |
193 *What you do* You paste and run your script, you fix the syntax errors and | |
194 eventually it runs. You can use the redo button and edit the script before | |
195 trying to rerun it as you debug - it works pretty well. | |
196 | |
197 Once the script works on some test data, you can generate a toolshed compatible | |
198 gzip file containing your script ready to run as an ordinary Galaxy tool in | |
199 a repository on your local toolshed. That means safe and largely automated | |
200 installation in any production Galaxy configured to use your toolshed. | |
201 | |
202 *Generated tool Security* Once you install a generated tool, it's just | |
203 another tool - assuming the script is safe. They just run normally and their | |
204 user cannot do anything unusually insecure but please, practice safe toolshed. | |
205 Read the fucking code before you install any tool. Especially this one - | |
206 it is really scary. | |
207 | |
208 If you opt for an HTML output, you get all the script outputs arranged | |
209 as a single Html history item - all output files are linked, thumbnails for | |
210 all the pdfs. Ugly but really inexpensive. | |
211 | |
212 Patches and suggestions welcome as bitbucket issues please? | |
213 | |
214 copyright ross lazarus (ross stop lazarus at gmail stop com) May 2012 | |
215 | |
216 all rights reserved | |
217 Licensed under the LGPL if you want to improve it, feel free | |
218 https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home | |
219 | |
220 Material for our more enthusiastic and voracious readers continues below - | |
221 we salute you. | |
222 | |
223 **Motivation** Simple transformation, filtering or reporting scripts get | |
224 written, run and lost every day in most busy labs - even ours where Galaxy is | |
225 in use. This 'dark script matter' is pervasive and generally not reproducible. | |
226 | |
227 **Benefits** For our group, this allows Galaxy to fill that important dark | |
228 script gap - all those "small" bioinformatics tasks. Once a user has a working | |
229 R (or python or perl) script that does something Galaxy cannot currently do | |
230 (eg transpose a tabular file) and takes parameters the way Galaxy supplies | |
231 them (see example below), they: | |
232 | |
233 1. Install the tool factory on a personal private instance | |
234 | |
235 2. Upload a small test data set | |
236 | |
237 3. Paste the script into the 'script' text box and iteratively run the | |
238 insecure tool on test data until it works right - there is absolutely no | |
239 reason to do this anywhere other than on a personal private instance. | |
240 | |
241 4. Once it works right, set the 'Generate toolshed gzip' option and run | |
242 it again. | |
243 | |
244 5. A toolshed style gzip appears ready to upload and install like any other | |
245 Toolshed entry. | |
246 | |
247 6. Upload the new tool to the toolshed | |
248 | |
249 7. Ask the local admin to check the new tool to confirm it's not evil and | |
250 install it in the local production galaxy | |
251 | |
252 | |
253 | |
254 **Parameter passing and file inputs** | |
255 | |
256 Your script will receive up to 3 named parameters: | |
257 INPATHS is a comma separated list of input file paths | |
258 INNAMES is a comma separated list of input file names in the same order | |
259 OUTPATH is optional; if a file is being generated, your script should write there | |
260 Your script should open and write files in the provided working directory if you are using the Html | |
261 automatic presentation option. | |
262 | |
263 Python script command lines will have --INPATHS and --additional_arguments etc. to make it easy to use argparse | |
264 | |
265 Rscript will need to use commandArgs(TRUE) - see the example below - additional arguments will | |
266 appear as themselves - eg foo="bar" will mean that foo is defined as "bar" for the script. | |
267 | |
268 Bash and sh will see any additional parameters on their command lines and the 3 named parameters | |
269 in their environment magically - well, using env on the CL | |
270 ``` | |
271 ***python***:: | |
272 | |
273 # argparse for 3 possible comma separated lists | |
274 # additional parameters need to be parsed ! | |
275 # then echo parameters to the output file | |
276 import sys | |
277 import argparse | |
278 argp=argparse.ArgumentParser() | |
279 argp.add_argument('--INNAMES',default=None) | |
280 argp.add_argument('--INPATHS',default=None) | |
281 argp.add_argument('--OUTPATH',default=None) | |
282 argp.add_argument('--additional_parameters',default=[],action="append") | |
283 argp.add_argument('otherargs', nargs=argparse.REMAINDER) | |
284 args = argp.parse_args() | |
285 f= open(args.OUTPATH,'w') | |
286 s = '### args=%s\n' % str(args) | |
287 f.write(s) | |
288 s = 'sys.argv=%s\n' % sys.argv | |
289 f.write(s) | |
290 f.close() | |
291 | |
292 | |
293 | |
294 ***Rscript***:: | |
295 | |
296 # tool factory Rscript parser suggested by Forester | |
297 # http://www.r-bloggers.com/including-arguments-in-r-cmd-batch-mode/ | |
298 # additional parameters will appear in the ls() below - they are available | |
299 # to your script | |
300 # echo parameters to the output file | |
301 ourargs = commandArgs(TRUE) | |
302 if(length(ourargs)==0){ | |
303 print("No arguments supplied.") | |
304 }else{ | |
305 for(i in 1:length(ourargs)){ | |
306 eval(parse(text=ourargs[[i]])) | |
307 } | |
308 sink(OUTPATH) | |
309 cat('INPATHS=',INPATHS,'\n') | |
310 cat('INNAMES=',INNAMES,'\n') | |
311 cat('OUTPATH=',OUTPATH,'\n') | |
312 x=ls() | |
313 cat('all objects=',x,'\n') | |
314 sink() | |
315 } | |
316 sessionInfo() | |
317 print.noquote(date()) | |
318 | |
319 | |
320 ***bash/sh***:: | |
321 | |
322 # tool factory sets up these environmental variables | |
323 # this example writes those to the output file | |
324 # additional params appear on command line | |
325 if [ ! -f "$OUTPATH" ] ; then | |
326 touch "$OUTPATH" | |
327 fi | |
328 echo "INPATHS=$INPATHS" >> "$OUTPATH" | |
329 echo "INNAMES=$INNAMES" >> "$OUTPATH" | |
330 echo "OUTPATH=$OUTPATH" >> "$OUTPATH" | |
331 echo "CL=$@" >> "$OUTPATH" | |
332 | |
333 ***perl***:: | |
334 | |
335 (my $INPATHS,my $INNAMES,my $OUTPATH ) = @ARGV; | |
336 open(my $fh, '>', $OUTPATH) or die "Could not open file '$OUTPATH' $!"; | |
337 print $fh "INPATHS=$INPATHS\n INNAMES=$INNAMES\n OUTPATH=$OUTPATH\n"; | |
338 close $fh; | |
339 | |
340 ``` | |
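
Building on the python example above, the two comma separated lists can be paired so that each input path travels with its display name (eg for column headers). This is a minimal sketch: the helper name `parse_inputs` and the sample paths are made up, and it assumes that no file name contains a comma.

```python
import argparse


def parse_inputs(argv):
    """Return (path, name) pairs from --INPATHS/--INNAMES, plus the output path."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--INPATHS", default="")
    parser.add_argument("--INNAMES", default="")
    parser.add_argument("--OUTPATH", default=None)
    # parse_known_args tolerates any additional parameters on the command line
    args, _unknown = parser.parse_known_args(argv)
    paths = [p for p in args.INPATHS.split(",") if p]
    names = [n for n in args.INNAMES.split(",") if n]
    if len(paths) != len(names):
        raise ValueError("INPATHS and INNAMES must list the same number of files")
    return list(zip(paths, names)), args.OUTPATH


# Hypothetical command line, for illustration only
pairs, outpath = parse_inputs([
    "--INPATHS", "/tmp/dataset_1.dat,/tmp/dataset_2.dat",
    "--INNAMES", "control.tab,treated.tab",
    "--OUTPATH", "/tmp/out.tab",
])
for path, name in pairs:
    print(name, "->", path)
```

In a real script, `parse_inputs(sys.argv[1:])` would be the likely call; the explicit list here just keeps the sketch self-contained.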
341 | |
342 Galaxy as an IDE for developing API scripts | |
343 If you need to develop Galaxy API scripts and you like to live dangerously, | |
344 please read on. | |
345 | |
346 Galaxy as an IDE? | |
347 Amazingly enough, blend-lib API scripts run perfectly well *inside* | |
348 Galaxy when pasted into a Tool Factory form. No need to generate a new | |
349 tool. Galaxy+Tool_Factory = IDE. I think we need a new t-shirt. Seriously, | |
350 it is actually quite useable. | |
351 | |
352 Why bother - what's wrong with Eclipse | |
353 Nothing. But, compared with developing API scripts in the usual way outside | |
354 Galaxy, you get persistence and other framework benefits plus at absolutely | |
355 no extra charge, a ginormous security problem if you share the history or | |
356 any outputs because they contain the api script with key so development | |
357 servers only please! | |
358 | |
359 Workflow | |
360 Fire up the Tool Factory in Galaxy. | |
361 | |
362 Leave the input box empty, set the interpreter to python, paste and run an | |
363 api script - eg working example (substitute the url and key) below. | |
364 | |
365 It took me a few iterations to develop the example below because I know | |
366 almost nothing about the API. I started with very simple code from one of the | |
367 samples and after each run, the (edited..) api script is conveniently recreated | |
368 using the redo button on the history output item. So each successive version | |
369 of the developing api script you run is persisted - ready to be edited and | |
370 rerun easily. It is *very* handy to be able to add a line of code to the | |
371 script and run it, then view the output to (eg) inspect dicts returned by | |
372 API calls to help move progressively deeper iteratively. | |
373 | |
374 Give the below a whirl on a private clone (install the tool factory from | |
375 the main toolshed) and try adding complexity with a few rerun/edit/rerun cycles. | |
376 | |
377 Eg tool factory api script | |
378 ``` | |
379 import sys | |
380 from blend.galaxy import GalaxyInstance | |
381 ourGal = 'http://x.x.x.x:xxxx' | |
382 ourKey = 'xxx' | |
383 gi = GalaxyInstance(ourGal, key=ourKey) | |
384 libs = gi.libraries.get_libraries() | |
385 res = [] | |
386 # libs looks like | |
387 # u'url': u'/galaxy/api/libraries/441d8112651dc2f3', u'id': | |
388 # u'441d8112651dc2f3', u'name':.... u'Demonstration sample RNA data', | |
389 for lib in libs: | |
390     res.append('%s:\n' % lib['name']) | |
391     res.append(str(gi.libraries.show_library(lib['id'],contents=True))) | |
392 outf=open(sys.argv[2],'w') | |
393 outf.write('\n'.join(res)) | |
394 outf.close() | |
395 ``` | |
396 | |
397 **Attribution** | |
398 Creating re-usable tools from scripts: The Galaxy Tool Factory | |
399 Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team | |
400 Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573 | |
401 | |
402 http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref | |
403 | |
404 **Licensing** | |
405 Copyright Ross Lazarus 2010 | |
406 ross lazarus at g mail period com | |
407 | |
408 All rights reserved. | |
409 | |
410 Licensed under the LGPL | |
411 | |
412 **screenshot** | |
413 | |
414 ![example run](/images/dynamicScriptTool.png) | |
415 | |
416 | |
418 |