comparison README.txt @ 26:db35d39e1de9 draft
Passes planemo test
Uses galaxyxml to generate new tool.
More outputs will be added...
author | fubar |
---|---|
date | Thu, 30 Jul 2020 06:48:45 -0400 |
parents | d98f5a09137f |
children |
25:9fe74bd23af2 | 26:db35d39e1de9 |
---|---|
14 It works by exposing *unrestricted* and therefore extremely dangerous scripting | 14 It works by exposing *unrestricted* and therefore extremely dangerous scripting |
15 to all designated administrators of the host Galaxy server, allowing them to | 15 to all designated administrators of the host Galaxy server, allowing them to |
16 run scripts in R, python, sh and perl over multiple selected input data sets, | 16 run scripts in R, python, sh and perl over multiple selected input data sets, |
17 writing a single new data set as output. | 17 writing a single new data set as output. |
18 | 18 |
19 *Differences between TF2 and the original Tool Factory* | 19 *You have a working r/python/perl/bash script or any executable with positional or argparse style parameters* |
20 | 20 |
21 1. TF2 (this one) allows any number of either fixed or user-editable parameters to be defined | 21 It can be turned into an ordinary Galaxy tool in minutes, using a Galaxy tool. |
22 for the new tool. If these are editable, the user can change them but otherwise, they are passed | |
23 as fixed and invisible parameters for each execution. Obviously, there are substantial security | |
24 implications with editable parameters, but these are always sanitized by Galaxy's inbuilt | |
25 parameter sanitization so you may need to "unsanitize" characters - eg translate all "__lt__" | |
26 into "<" for certain parameters where that is needed. Please practise safe toolshed. | |
27 | 22 |
28 2. Any number of input files (of the same datatype) may be defined. | |
29 | |
30 These changes substantially complicate the way your supplied script is supplied with | |
31 all the new and variable parameters. Examples in each scripting language are shown | |
32 in the tool help | |
33 | |
34 *Automated outputs in named sections* | |
35 | |
36 If your script writes an arbitrary mix of (eg) pdfs, tabular analysis | |
37 results and run logs to the current directory, the tool factory can optionally | |
38 auto-generate a linked Html page with separate sections showing a thumbnail | |
39 grid for all pdfs and the log text, grouping all artifacts sharing a file | |
40 name and log name prefix:: | |
41 | |
42 eg: if "foo.log" is emitted then *all* other outputs matching foo_* will | |
43 all be grouped together - eg | |
44 foo_baz.pdf | |
45 foo_bar.pdf and | |
46 foo_zot.xls | |
47 would all be displayed and linked in the same section with foo.log's contents | |
48 - to form the "Foo" section of the Html page. Sections appear in alphabetic | |
49 order and there are no limits on the number of files or sections. | |
50 | 23 |
51 *Automated generation of new Galaxy tools for installation into any Galaxy* | 24 *Automated generation of new Galaxy tools for installation into any Galaxy* |
52 | 25 |
53 Once a script is working correctly, this tool optionally generates a | 26 A test is generated using small sample test data inputs and parameter settings you supply. |
54 new Galaxy tool, effectively freezing the supplied script into a new, | 27 Once the test case outputs have been produced, they can be used to build a |
55 ordinary Galaxy tool that runs it over one or more input files selected by | 28 new Galaxy tool. The supplied script or executable is baked as a requirement |
56 the user. Generated tools are installed via a tool shed by an administrator | 29 into a new, ordinary Galaxy tool, fully workflow compatible out of the box. |
30 Generated tools are installed via a tool shed by an administrator | |
57 and work exactly like all other Galaxy tools for your users. | 31 and work exactly like all other Galaxy tools for your users. |
58 | |
59 If you use the Html output option, please ensure that sanitize_all_html is | |
60 set to False and uncommented in universe_wsgi.ini - it should show:: | |
61 | |
62 # By default, all tool output served as 'text/html' will be sanitized | |
63 sanitize_all_html = False | |
64 | |
65 This opens potential security risks and may not be acceptable for public | |
66 sites where the lack of stylesheets may make Html pages damage onlookers' | |
67 eyeballs but should still be correct. | |
68 | |
69 | 32 |
70 *More Detail* | 33 *More Detail* |
71 | 34 |
72 To use the ToolFactory, you should have prepared a script to paste into a | 35 To use the ToolFactory, you should have prepared a script to paste into a |
73 text box, and a small test input example ready to select from your history | 36 text box, or have a package in mind and a small test input example ready to select from your history |
74 to test your new script. | 37 to test your new script. |
38 | |
39 ```planemo test rgToolFactory2.xml --galaxy_root ~/galaxy --test_data ~/galaxy/tools/tool_makers/toolfactory/test-data``` works for me | |
75 | 40 |
76 There is an example in each scripting language on the Tool Factory form. You | 41 There is an example in each scripting language on the Tool Factory form. You |
77 can just cut and paste these to try it out - remember to select the right | 42 can just cut and paste these to try it out - remember to select the right |
78 interpreter please. You'll also need to create a small test data set using | 43 interpreter please. You'll also need to create a small test data set using |
79 the Galaxy history add new data tool. | 44 the Galaxy history add new data tool. |
127 <datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary" | 92 <datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary" |
128 mimetype="multipart/x-gzip" subclass="True" /> | 93 mimetype="multipart/x-gzip" subclass="True" /> |
129 to your local data_types_conf.xml. | 94 to your local data_types_conf.xml. |
130 ) | 95 ) |
131 | 96 |
132 Of course, R, python, perl etc are needed on your path if you want to test | |
133 scripts using those interpreters. Adding new ones to this tool code should | |
134 be easy enough. Please make suggestions as bitbucket issues and code. The | |
135 HTML file code automatically shrinks R's bloated pdfs, and depends on | |
136 ghostscript. The thumbnails require imagemagick. | |
137 | |
138 * Restricted execution * | 97 * Restricted execution * |
139 The tool factory tool itself will then be usable ONLY by admin users - | 98 The tool factory tool itself will then be usable ONLY by admin users - |
140 people with IDs in admin_users in universe_wsgi.ini **Yes, that's right. ONLY | 99 people with IDs in admin_users in universe_wsgi.ini **Yes, that's right. ONLY |
141 admin_users can run this tool** Think about it for a moment. If allowed to | 100 admin_users can run this tool** Think about it for a moment. If allowed to |
142 run any arbitrary script on your Galaxy server, the only thing that would | 101 run any arbitrary script on your Galaxy server, the only thing that would |
182 | 141 |
183 all rights reserved | 142 all rights reserved |
184 Licensed under the LGPL if you want to improve it, feel free | 143 Licensed under the LGPL if you want to improve it, feel free |
185 https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home | 144 https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home |
186 | 145 |
187 Material for our more enthusiastic and voracious readers continues below - | |
188 we salute you. | |
189 | |
190 **Motivation** Simple transformation, filtering or reporting scripts get | |
191 written, run and lost every day in most busy labs - even ours where Galaxy is | |
192 in use. This 'dark script matter' is pervasive and generally not reproducible. | |
193 | |
194 **Benefits** For our group, this allows Galaxy to fill that important dark | |
195 script gap - all those "small" bioinformatics tasks. Once a user has a working | |
196 R (or python or perl) script that does something Galaxy cannot currently do | |
197 (eg transpose a tabular file) and takes parameters the way Galaxy supplies | |
198 them (see example below), they: | |
199 | |
200 1. Install the tool factory on a personal private instance | |
201 | |
202 2. Upload a small test data set | |
203 | |
204 3. Paste the script into the 'script' text box and iteratively run the | |
205 insecure tool on test data until it works right - there is absolutely no | |
206 reason to do this anywhere other than on a personal private instance. | |
207 | |
208 4. Once it works right, set the 'Generate toolshed gzip' option and run | |
209 it again. | |
210 | |
211 5. A toolshed style gzip appears ready to upload and install like any other | |
212 Toolshed entry. | |
213 | |
214 6. Upload the new tool to the toolshed | |
215 | |
216 7. Ask the local admin to check the new tool to confirm it's not evil and | |
217 install it in the local production galaxy | |
218 | |
219 **Simple examples on the tool form** | |
220 | |
221 A simple Rscript "filter" showing how the command line parameters can be | |
222 handled, takes an input file, does something (transpose in this case) and | |
223 writes the results to a new tabular file:: | |
224 | |
225 # transpose a tabular input file and write as a tabular output file | |
226 ourargs = commandArgs(TRUE) | |
227 inf = ourargs[1] | |
228 outf = ourargs[2] | |
229 inp = read.table(inf,head=F,row.names=NULL,sep='\t') | |
230 outp = t(inp) | |
231 write.table(outp,outf, quote=FALSE, sep="\t",row.names=F,col.names=F) | |
232 | |
233 Calculate a multiple test adjusted p value from a column of p values - | |
234 for this script to be useful, it needs the right column for the input to be | |
235 specified in the code for the given input file type(s) specified when the | |
236 tool is generated :: | |
237 | |
238 # use p.adjust - assumes a HEADER row and column 1 - please fix for any | |
239 # real use | |
240 column = 1 # adjust if necessary for some other kind of input | |
241 fdrmeth = 'BH' | |
242 ourargs = commandArgs(TRUE) | |
243 inf = ourargs[1] | |
244 outf = ourargs[2] | |
245 inp = read.table(inf,head=T,row.names=NULL,sep='\t') | |
246 p = inp[,column] | |
247 q = p.adjust(p,method=fdrmeth) | |
248 newval = paste(fdrmeth,'p-value',sep='_') | |
249 q = data.frame(q) | |
250 names(q) = newval | |
251 outp = cbind(inp,newval=q) | |
252 write.table(outp,outf, quote=FALSE, sep="\t",row.names=F,col.names=T) | |
253 | |
254 | |
255 | |
256 Another Rscript example without any input file - generates a random heatmap | |
257 pdf - you must make sure the option to create an HTML output file is | |
258 turned on for this to work. The heatmap will be presented as a thumbnail | |
259 linked to the pdf in the resulting HTML page:: | |
260 | |
261 # note this script takes NO input or output because it generates random data | |
262 foo = data.frame(a=runif(100),b=runif(100),c=runif(100),d=runif(100), | |
263 e=runif(100),f=runif(100)) | |
264 bar = as.matrix(foo) | |
265 pdf( "heattest.pdf" ) | |
266 heatmap(bar,main='Random Heatmap') | |
267 dev.off() | |
268 | |
269 A Python example that reverses each row of a tabular file. You'll need | |
270 to remove the leading spaces for this to work if cut and pasted into the | |
271 script box. Note that you can already do this in Galaxy by setting up the | |
272 cut columns tool with the correct number of columns in reverse order, but | |
273 this script will work for any number of columns so is completely generic:: | |
274 | |
275 # reverse order of columns in a tabular file | |
276 import sys | |
277 inp = sys.argv[1] | |
278 outp = sys.argv[2] | |
279 i = open(inp,'r') | |
280 o = open(outp,'w') | |
281 for row in i: | |
282     rs = row.rstrip().split('\t') | |
283     rs.reverse() | |
284     o.write('\t'.join(rs)) | |
285     o.write('\n') | |
286 i.close() | |
287 o.close() | |
288 | |
289 | |
290 Galaxy as an IDE for developing API scripts | |
291 If you need to develop Galaxy API scripts and you like to live dangerously, | |
292 please read on. | |
293 | |
294 Galaxy as an IDE? | |
295 Amazingly enough, blend-lib API scripts run perfectly well *inside* | |
296 Galaxy when pasted into a Tool Factory form. No need to generate a new | |
297 tool. Galaxy+Tool_Factory = IDE. I think we need a new t-shirt. Seriously, | |
298 it is actually quite useable. | |
299 | |
300 Why bother - what's wrong with Eclipse | |
301 Nothing. But, compared with developing API scripts in the usual way outside | |
302 Galaxy, you get persistence and other framework benefits plus at absolutely | |
303 no extra charge, a ginormous security problem if you share the history or | |
304 any outputs because they contain the api script with key so development | |
305 servers only please! | |
306 | |
307 Workflow | |
308 Fire up the Tool Factory in Galaxy. | |
309 | |
310 Leave the input box empty, set the interpreter to python, paste and run an | |
311 api script - eg working example (substitute the url and key) below. | |
312 | |
313 It took me a few iterations to develop the example below because I know | |
314 almost nothing about the API. I started with very simple code from one of the | |
315 samples and after each run, the (edited..) api script is conveniently recreated | |
316 using the redo button on the history output item. So each successive version | |
317 of the developing api script you run is persisted - ready to be edited and | |
318 rerun easily. It is *very* handy to be able to add a line of code to the | |
319 script and run it, then view the output to (eg) inspect dicts returned by | |
320 API calls to help move progressively deeper iteratively. | |
321 | |
322 Give the below a whirl on a private clone (install the tool factory from | |
323 the main toolshed) and try adding complexity with a few rerun/edit/rerun cycles. | |
324 | |
325 Eg tool factory api script | |
326 import sys | |
327 from blend.galaxy import GalaxyInstance | |
328 ourGal = 'http://x.x.x.x:xxxx' | |
329 ourKey = 'xxx' | |
330 gi = GalaxyInstance(ourGal, key=ourKey) | |
331 libs = gi.libraries.get_libraries() | |
332 res = [] | |
333 # libs looks like | |
334 # u'url': u'/galaxy/api/libraries/441d8112651dc2f3', u'id': | |
335 # u'441d8112651dc2f3', u'name':.... u'Demonstration sample RNA data', | |
336 for lib in libs: | |
337     res.append('%s:\n' % lib['name']) | |
338     res.append(str(gi.libraries.show_library(lib['id'],contents=True))) | |
339 outf=open(sys.argv[2],'w') | |
340 outf.write('\n'.join(res)) | |
341 outf.close() | |
342 | 146 |
343 **Attribution** | 147 **Attribution** |
344 Creating re-usable tools from scripts: The Galaxy Tool Factory | 148 Creating re-usable tools from scripts: The Galaxy Tool Factory |
345 Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team | 149 Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team |
346 Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573 | 150 Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573 |
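
Revision 26 says a script with "positional or argparse style parameters" can be turned into a tool, while the column-reversing Python example removed in this revision (old lines 275-287) only reads positional arguments. As a rough sketch of what an argparse style variant of that example might look like: the flag names `--input` and `--output` are assumptions made for illustration, not names the ToolFactory is known to generate, and `reverse_columns` is a hypothetical helper. Unlike the original, this version strips only the trailing newline, so empty trailing columns are kept.

```python
import argparse
import sys


def reverse_columns(lines):
    """Yield each tab-separated row with its columns in reverse order."""
    for row in lines:
        # strip only the newline so empty trailing columns survive
        fields = row.rstrip("\n").split("\t")
        yield "\t".join(reversed(fields))


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Reverse the column order of a tabular file"
    )
    # --input and --output are hypothetical flag names chosen for this sketch
    parser.add_argument("--input", required=True, help="tabular input path")
    parser.add_argument("--output", required=True, help="tabular output path")
    args = parser.parse_args(argv)
    with open(args.input) as inf, open(args.output, "w") as outf:
        for row in reverse_columns(inf):
            outf.write(row + "\n")


# Only run when arguments were supplied, so importing the sketch has no side effects
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Saved as, say, `revcols.py`, it could be tried outside Galaxy with `python revcols.py --input in.tsv --output out.tsv` before pasting it into the script box.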