comparison README.txt @ 26:db35d39e1de9 draft

Passes planemo test. Uses galaxyxml to generate new tool. More outputs will be added...
author fubar
date Thu, 30 Jul 2020 06:48:45 -0400
parents d98f5a09137f
compared 25:9fe74bd23af2 with 26:db35d39e1de9

It works by exposing *unrestricted* and therefore extremely dangerous scripting
to all designated administrators of the host Galaxy server, allowing them to
run scripts in R, python, sh and perl over multiple selected input data sets,
writing a single new data set as output.

*You have a working R/python/perl/bash script or any executable with positional or argparse style parameters*

It can be turned into an ordinary Galaxy tool in minutes, using a Galaxy tool.

*Differences between TF2 and the original Tool Factory*

1. TF2 (this one) allows any number of either fixed or user-editable parameters to be defined
for the new tool. If these are editable, the user can change them; otherwise, they are passed
as fixed and invisible parameters for each execution. Obviously, there are substantial security
implications with editable parameters, but these are always sanitized by Galaxy's inbuilt
parameter sanitization, so you may need to "unsanitize" characters - eg translate all "__lt__"
into "<" for certain parameters where that is needed. Please practise safe toolshed.

2. Any number of input files (all of the same datatype) may be defined.

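A minimal Python sketch of that "unsanitizing" step follows. The token table is an assumption modelled on Galaxy's default sanitizer mapping (eg "<" becomes "__lt__"). Check it against the mapping your Galaxy release actually uses. The helper name `unsanitize` is hypothetical.

```python
# Hypothetical helper: undo Galaxy's default parameter sanitization.
# ASSUMPTION: this token table mirrors Galaxy's default mapped characters;
# verify it against your own Galaxy release before relying on it.
GALAXY_MAPPED_CHARS = {
    "__gt__": ">",
    "__lt__": "<",
    "__sq__": "'",
    "__dq__": '"',
    "__ob__": "[",
    "__cb__": "]",
    "__oc__": "{",
    "__cc__": "}",
    "__at__": "@",
    "__cn__": "\n",
    "__cr__": "\r",
    "__tc__": "\t",
    "__pd__": "#",
}


def unsanitize(value: str) -> str:
    """Translate Galaxy's escape tokens (eg "__lt__") back into characters."""
    for token, char in GALAXY_MAPPED_CHARS.items():
        value = value.replace(token, char)
    return value


if __name__ == "__main__":
    print(unsanitize("x __lt__ 5 and y __gt__ 2"))
```

Here `unsanitize("x __lt__ 5 and y __gt__ 2")` returns `"x < 5 and y > 2"`. Only unsanitize the specific parameters that really need the raw characters, because the sanitization is there for security.
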
These changes substantially complicate the way your script is supplied with
all the new and variable parameters. Examples in each scripting language are shown
in the tool help.

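For a script using argparse-style parameters, a Python version might receive its inputs roughly as sketched below. The option names `--input`, `--output` and `--threshold`, and the filtering task itself, are hypothetical. The names your script actually sees are the ones you define on the Tool Factory form, and the tool help remains the reference for exactly how TF2 passes them.

```python
import argparse


def parse_args(argv=None):
    """Parse argparse-style parameters; every option name here is hypothetical."""
    parser = argparse.ArgumentParser(description="Filter rows of a tabular file")
    parser.add_argument("--input", required=True, help="tabular input file path")
    parser.add_argument("--output", required=True, help="tabular output file path")
    # an editable parameter; command-line values arrive as strings, so convert
    parser.add_argument("--threshold", type=float, default=0.05)
    return parser.parse_args(argv)


def filter_rows(lines, threshold):
    """Yield tab-separated rows whose first column is a number below threshold."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        try:
            value = float(fields[0])
        except ValueError:
            continue  # skip a header or any non-numeric row
        if value < threshold:
            yield "\t".join(fields)


def main(argv=None):
    args = parse_args(argv)
    with open(args.input) as inp, open(args.output, "w") as outp:
        for row in filter_rows(inp, args.threshold):
            outp.write(row + "\n")


if __name__ == "__main__":
    main()
```

Run it as eg `python filter.py --input in.tab --output out.tab --threshold 0.01`. Keeping the parsing separate from the work makes the script easy to test outside Galaxy first.
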
*Automated outputs in named sections*

If your script writes an arbitrary mix of (eg) pdfs, tabular analysis results
and run logs to the current directory, the tool factory can optionally
auto-generate a linked Html page with separate sections showing a thumbnail
grid for all pdfs and the log text, grouping all artifacts that share a file
name prefix with a log file. For example, if "foo.log" is emitted then *all*
other outputs matching foo_* will be grouped together, so::

    foo_baz.pdf
    foo_bar.pdf
    foo_zot.xls

would all be displayed and linked in the same section as foo.log's contents,
forming the "Foo" section of the Html page. Sections appear in alphabetical
order and there is no limit on the number of files or sections.

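The grouping rule above can be sketched in Python as follows. This only illustrates the rule as described and is not the Tool Factory's own code. The function name `group_outputs` is hypothetical, and files that share no prefix with a log are simply left out here.

```python
import os


def group_outputs(filenames):
    """Group output file names into sections keyed on log-file prefixes.

    Every "<prefix>.log" starts a section and every other file named
    "<prefix>_*" joins it. Sections are returned in alphabetical order.
    """
    prefixes = sorted(os.path.splitext(f)[0] for f in filenames if f.endswith(".log"))
    sections = {}
    for prefix in prefixes:
        members = sorted(f for f in filenames if f.startswith(prefix + "_"))
        # the section title is the capitalised prefix, eg "foo" -> "Foo"
        sections[prefix.capitalize()] = [prefix + ".log"] + members
    return sections


if __name__ == "__main__":
    files = ["foo.log", "foo_baz.pdf", "foo_bar.pdf", "foo_zot.xls", "bar.log"]
    for title, members in group_outputs(files).items():
        print(title, members)
```

A plain dict keeps insertion order in Python 3.7+, so sorting the prefixes once is enough to get alphabetical sections.
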
*Automated generation of new Galaxy tools for installation into any Galaxy*

A test is generated using the small sample test data inputs and parameter settings you supply.
Once the test case outputs have been produced, they can be used to build a
new Galaxy tool. The supplied script or executable is baked in as a requirement
of a new, ordinary Galaxy tool that runs it over one or more input files selected
by the user and is fully workflow compatible out of the box.
Generated tools are installed via a tool shed by an administrator
and work exactly like all other Galaxy tools for your users.

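The sketch below makes "baked into an ordinary Galaxy tool" more concrete. It shows roughly the kind of tool XML that results, including a test that pairs the sample input with the output it produced. It is built with Python's standard `xml.etree.ElementTree`. The Tool Factory itself uses galaxyxml (per the commit message above) and writes a much richer wrapper. The ids, the parameter names `input1`/`output1` and the positional command line are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_tool_xml(tool_id, script_name, interpreter, test_input, test_output):
    """Return a minimal Galaxy tool wrapper for a script as an XML string.

    All ids and parameter names here are hypothetical placeholders.
    """
    tool = ET.Element("tool", id=tool_id, name=tool_id, version="0.01")
    command = ET.SubElement(tool, "command")
    # positional style, as in the examples below: input path, then output path
    command.text = f"{interpreter} '$__tool_directory__/{script_name}' '$input1' '$output1'"
    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(inputs, "param", name="input1", type="data", format="tabular", label="Input file")
    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name="output1", format="tabular")
    # the test case pairs the sample input with the output it produced
    tests = ET.SubElement(tool, "tests")
    test = ET.SubElement(tests, "test")
    ET.SubElement(test, "param", name="input1", value=test_input)
    ET.SubElement(test, "output", name="output1", file=test_output)
    return ET.tostring(tool, encoding="unicode")


if __name__ == "__main__":
    print(build_tool_xml("transpose", "transpose.R", "Rscript", "in.tab", "out.tab"))
```

A test like this is what lets `planemo test` check the generated tool against the outputs you accepted.
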
If you use the Html output option, please ensure that sanitize_all_html is
set to False and uncommented in universe_wsgi.ini - it should show::

    # By default, all tool output served as 'text/html' will be sanitized
    sanitize_all_html = False

This opens potential security risks and may not be acceptable for public
sites. Where sanitization stays on, the lack of stylesheets may make Html
pages damage onlookers' eyeballs, but they should still be correct.

*More Detail*

To use the ToolFactory, you should have prepared a script to paste into a
text box (or have a package in mind), and a small test input example ready
to select from your history to test your new script.

This planemo command works for me::

    planemo test rgToolFactory2.xml --galaxy_root ~/galaxy --test_data ~/galaxy/tools/tool_makers/toolfactory/test-data

There is an example in each scripting language on the Tool Factory form. You
can just cut and paste these to try it out - remember to select the right
interpreter. You'll also need to create a small test data set using the
Galaxy history "add new data" tool.
<datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary"
mimetype="multipart/x-gzip" subclass="True" />
to your local data_types_conf.xml.
)

Of course, R, python, perl etc. need to be on your path if you want to test
scripts using those interpreters. Adding new interpreters to this tool's code
should be easy enough. Please make suggestions as bitbucket issues and code.
The HTML output code automatically shrinks R's bloated pdfs and depends on
ghostscript. The thumbnails require imagemagick.

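For orientation only, here is a Python sketch of commands that perform those two jobs. The Ghostscript options are standard pdfwrite options, and `convert` is ImageMagick's classic command (ImageMagick 7 installs may prefer `magick`). The exact settings the Tool Factory uses are not shown here, so treat these as assumptions. All function names are hypothetical.

```python
import shutil
import subprocess


def shrink_pdf_cmd(in_pdf, out_pdf):
    """Ghostscript command that rewrites a pdf at screen resolution."""
    return ["gs", "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
            "-dPDFSETTINGS=/screen", "-dNOPAUSE", "-dQUIET", "-dBATCH",
            f"-sOutputFile={out_pdf}", in_pdf]


def thumbnail_cmd(in_pdf, out_png, size=200):
    """ImageMagick command rendering the first page of a pdf as a thumbnail."""
    # "[0]" selects page one; ImageMagick itself calls ghostscript to read pdfs
    return ["convert", f"{in_pdf}[0]", "-thumbnail", f"{size}x{size}", out_png]


def run_if_available(cmd):
    """Run cmd if its program is on the PATH; return True only on success."""
    if shutil.which(cmd[0]) is None:
        return False
    return subprocess.run(cmd).returncode == 0
```

For example, `run_if_available(shrink_pdf_cmd("heattest.pdf", "heattest_small.pdf"))` quietly does nothing on a machine without ghostscript.
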
*Restricted execution*
The tool factory tool itself will then be usable ONLY by admin users -
people with IDs in admin_users in universe_wsgi.ini. **Yes, that's right. ONLY
admin_users can run this tool.** Think about it for a moment. If allowed to
run any arbitrary script on your Galaxy server, the only thing that would

all rights reserved
Licensed under the LGPL; if you want to improve it, feel free:
https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home

Material for our more enthusiastic and voracious readers continues below -
we salute you.

**Motivation** Simple transformation, filtering or reporting scripts get
written, run and lost every day in most busy labs - even ours, where Galaxy is
in use. This 'dark script matter' is pervasive and generally not reproducible.

**Benefits** For our group, this allows Galaxy to fill that important dark
script gap - all those "small" bioinformatics tasks. Once a user has a working
R (or python or perl) script that does something Galaxy cannot currently do
(eg transpose a tabular file) and takes parameters the way Galaxy supplies
them (see example below), they:

1. Install the tool factory on a personal private instance.

2. Upload a small test data set.

3. Paste the script into the 'script' text box and iteratively run the
insecure tool on test data until it works right - there is absolutely no
reason to do this anywhere other than on a personal private instance.

4. Once it works right, set the 'Generate toolshed gzip' option and run
it again.

5. A toolshed-style gzip appears, ready to upload and install like any other
Toolshed entry.

6. Upload the new tool to the toolshed.

7. Ask the local admin to check the new tool to confirm it's not evil and
install it in the local production Galaxy.

**Simple examples on the tool form**

A simple Rscript "filter" showing how the command line parameters can be
handled: it takes an input file, does something (transposes it in this case)
and writes the results to a new tabular file::

    # transpose a tabular input file and write as a tabular output file
    ourargs = commandArgs(TRUE)  # trailing arguments only: input path, output path
    inf = ourargs[1]
    outf = ourargs[2]
    inp = read.table(inf, header=FALSE, row.names=NULL, sep='\t')
    outp = t(inp)
    write.table(outp, outf, quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)

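For comparison, a rough Python counterpart of the transpose filter might look like this. It is not one of the examples on the tool form. `transpose_rows` is a hypothetical name. The sketch assumes every row has the same number of columns, because `zip` silently drops any extra fields.

```python
import sys


def transpose_rows(lines):
    """Transpose tab-separated lines; assumes a rectangular table."""
    rows = [line.rstrip("\r\n").split("\t") for line in lines]
    return ["\t".join(column) for column in zip(*rows)]


if __name__ == "__main__":
    # positional parameters, as in the R example: input path then output path
    with open(sys.argv[1]) as inp, open(sys.argv[2], "w") as outp:
        for row in transpose_rows(inp):
            outp.write(row + "\n")
```
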
Calculate multiple-test adjusted p values from a column of p values. For this
script to be useful, the right column for the input must be specified in the
code for the input file type(s) chosen when the tool is generated::

    # use p.adjust - assumes a HEADER row and column 1 - please fix for any real use
    column = 1 # adjust if necessary for some other kind of input
    fdrmeth = 'BH'
    ourargs = commandArgs(TRUE)
    inf = ourargs[1]
    outf = ourargs[2]
    inp = read.table(inf, header=TRUE, row.names=NULL, sep='\t')
    p = inp[,column]
    # append the adjusted values as a new column named eg "BH_p-value"
    newval = paste(fdrmeth, 'p-value', sep='_')
    inp[[newval]] = p.adjust(p, method=fdrmeth)
    write.table(inp, outf, quote=FALSE, sep="\t", row.names=FALSE, col.names=TRUE)

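For readers more at home in Python, the Benjamini-Hochberg ("BH") adjustment that `p.adjust` performs can be sketched as below. It follows the same recipe as R's implementation: scale each p value by n/rank, take a cumulative minimum from the largest p value down, and cap at 1. It is an illustrative stand-in, not a replacement for tested library code such as R's `p.adjust` or statsmodels' `multipletests`.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(pvalues)
    # indices ordered from the largest p value to the smallest
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # starting at 1 also caps every result at 1
    for rank_from_top, i in enumerate(order):
        rank = n - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `bh_adjust([0.01, 0.04, 0.03, 0.2])` gives approximately `[0.04, 0.0533, 0.0533, 0.2]`, the same values as `p.adjust(c(0.01, 0.04, 0.03, 0.2), method="BH")` in R.
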
Another Rscript example, without any input file, generates a random heatmap
pdf. You must make sure the option to create an HTML output file is
turned on for this to work. The heatmap will be presented as a thumbnail
linked to the pdf in the resulting HTML page::

    # note this script takes NO input or output because it generates random data
    foo = data.frame(a=runif(100), b=runif(100), c=runif(100), d=runif(100),
                     e=runif(100), f=runif(100))
    bar = as.matrix(foo)
    # the pdf is written to the current directory, where the HTML page picks it up
    pdf("heattest.pdf")
    heatmap(bar, main='Random Heatmap')
    dev.off()

A Python example that reverses the order of the columns in each row of a
tabular file. You'll need to remove the leading spaces for this to work if cut
and pasted into the script box. Note that you can already do this in Galaxy by
setting up the cut columns tool with the correct number of columns in reverse
order, but this script will work for any number of columns, so it is completely
generic::

    # reverse order of columns in a tabular file
    import sys
    inp = sys.argv[1]
    outp = sys.argv[2]
    with open(inp, 'r') as i, open(outp, 'w') as o:
        for row in i:
            # strip only the line ending so empty trailing fields survive
            rs = row.rstrip('\r\n').split('\t')
            rs.reverse()
            o.write('\t'.join(rs))
            o.write('\n')

Galaxy as an IDE for developing API scripts
If you need to develop Galaxy API scripts and you like to live dangerously,
please read on.

Galaxy as an IDE?
Amazingly enough, blend-lib (now BioBlend) API scripts run perfectly well
*inside* Galaxy when pasted into a Tool Factory form. No need to generate a new
tool. Galaxy+Tool_Factory = IDE. I think we need a new t-shirt. Seriously,
it is actually quite usable.

Why bother - what's wrong with Eclipse?
Nothing. But compared with developing API scripts in the usual way outside
Galaxy, you get persistence and other framework benefits, plus, at absolutely
no extra charge, a ginormous security problem if you share the history or
any outputs, because they contain the API script with its key. Development
servers only, please!

Workflow
Fire up the Tool Factory in Galaxy.

Leave the input box empty, set the interpreter to python, then paste and run an
API script, eg the working example below (substitute your own URL and key).

It took me a few iterations to develop the example below because I know
almost nothing about the API. I started with very simple code from one of the
samples, and after each run, the (edited..) API script is conveniently recreated
using the redo button on the history output item. So each successive version
of the developing API script you run is persisted, ready to be edited and
rerun easily. It is *very* handy to be able to add a line of code to the
script, run it, then view the output to (eg) inspect the dicts returned by
API calls, moving progressively deeper with each iteration.

Give the example below a whirl on a private clone (install the tool factory from
the main toolshed) and try adding complexity with a few edit/rerun cycles.

Eg tool factory API script::

    import sys
    from bioblend.galaxy import GalaxyInstance  # formerly "from blend.galaxy import ..."
    ourGal = 'http://x.x.x.x:xxxx'  # substitute your Galaxy URL
    ourKey = 'xxx'  # substitute your API key - anyone who sees this history sees it too
    gi = GalaxyInstance(ourGal, key=ourKey)
    libs = gi.libraries.get_libraries()
    res = []
    # libs looks like
    # u'url': u'/galaxy/api/libraries/441d8112651dc2f3', u'id':
    # u'441d8112651dc2f3', u'name':.... u'Demonstration sample RNA data',
    for lib in libs:
        res.append('%s:\n' % lib['name'])
        res.append(str(gi.libraries.show_library(lib['id'], contents=True)))
    # the output data set path is the second positional argument
    with open(sys.argv[2], 'w') as outf:
        outf.write('\n'.join(res))

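While exploring the API this way, a small helper that outlines the nested dicts and lists returned by calls such as `gi.libraries.get_libraries()` can make each rerun easier to read. This is a hypothetical addition, not part of the original example. You could write eg `'\n'.join(outline(libs))` to the output file instead of the raw `str()` dump.

```python
def outline(obj, indent=0, max_items=3):
    """Return text lines outlining nested dicts/lists, eg from an API call."""
    pad = "  " * indent
    lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if isinstance(value, (dict, list)):
                lines.append(f"{pad}{key}: {type(value).__name__}")
                lines.extend(outline(value, indent + 1, max_items))
            else:
                lines.append(f"{pad}{key}: {value!r}")
    elif isinstance(obj, list):
        for item in obj[:max_items]:  # only the first few items of long lists
            if isinstance(item, (dict, list)):
                lines.append(f"{pad}- {type(item).__name__}")
                lines.extend(outline(item, indent + 1, max_items))
            else:
                lines.append(f"{pad}- {item!r}")
    return lines


if __name__ == "__main__":
    sample = [{"id": "441d8112651dc2f3", "name": "Demonstration sample RNA data"}]
    print("\n".join(outline(sample)))
```
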
**Attribution**
Creating re-usable tools from scripts: The Galaxy Tool Factory
Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573