comparison README.txt @ 20:d98f5a09137f draft

Uploaded
author fubar
date Mon, 02 Feb 2015 20:57:40 -0500
parents
children 4e3aa95ed3ac db35d39e1de9
19:ff812453b1b3 20:d98f5a09137f
# WARNING before you start
# Install this tool on a private Galaxy ONLY
# Please NEVER on a public or production instance
# updated august 2014 by John Chilton adding citation support
#
# updated august 8 2014 to fix bugs reported by Marius van den Beek
# please cite the resource at
http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref
# if you use this tool in your published work.

*Short Story*

This is an unusual Galaxy tool capable of generating new Galaxy tools.
It works by exposing *unrestricted* and therefore extremely dangerous scripting
to all designated administrators of the host Galaxy server, allowing them to
run scripts in R, python, sh and perl over multiple selected input data sets,
writing a single new data set as output.

*Differences between TF2 and the original Tool Factory*

1. TF2 (this one) allows any number of either fixed or user-editable parameters to be defined
for the new tool. If these are editable, the user can change them; otherwise, they are passed
as fixed and invisible parameters for each execution. Obviously, there are substantial security
implications with editable parameters, but these are always sanitized by Galaxy's inbuilt
parameter sanitization, so you may need to "unsanitize" characters - eg translate all "__lt__"
into "<" for certain parameters where that is needed. Please practise safe toolshed.

2. Any number of input files (all of the same datatype) may be defined.

These changes substantially complicate the way your script is supplied with
all the new and variable parameters. Examples in each scripting language are shown
in the tool help.
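
The "unsanitize" step mentioned for editable parameters might look something like the sketch below. Only the "__lt__" to "<" translation comes from this README; the other tokens in the table and the helper name unsanitize are assumptions for illustration, so check them against your own Galaxy before relying on them.

```python
# Hypothetical helper: undo Galaxy-style parameter sanitization inside a script.
# Only "__lt__" -> "<" is taken from the README text; the remaining tokens are
# assumed and should be verified against your Galaxy version.
UNSANITIZE_MAP = {
    "__lt__": "<",
    "__gt__": ">",
    "__sq__": "'",
    "__dq__": '"',
}


def unsanitize(value, mapping=UNSANITIZE_MAP):
    """Return value with every sanitized token replaced by its character."""
    for token, char in mapping.items():
        value = value.replace(token, char)
    return value


print(unsanitize("x __lt__ 5"))  # prints: x < 5
```

Only unsanitize the specific parameters that really need it - every character you restore is one more thing a user can inject into your script.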

*Automated outputs in named sections*

If your script writes an arbitrary mix of (eg) pdfs, tabular analysis results
and run logs to the current directory, the tool factory can optionally
auto-generate a linked Html page with separate sections showing a thumbnail
grid for all pdfs and the log text, grouping all artifacts that share a
log file name prefix::

    eg: if "foo.log" is emitted then *all* other outputs matching foo_* will
    be grouped together - eg
    foo_baz.pdf
    foo_bar.pdf and
    foo_zot.xls
    would all be displayed and linked in the same section with foo.log's contents
    - to form the "Foo" section of the Html page. Sections appear in alphabetic
    order and there are no limits on the number of files or sections.
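
The grouping rule above can be sketched in a few lines of python. This is not the tool factory's own code - the function name group_outputs and the returned structure are made up for illustration, and it assumes that a section is defined purely by a "prefix.log" file plus every file starting with "prefix_".

```python
import os
from collections import OrderedDict


def group_outputs(filenames):
    """Group output files into sections keyed by the prefix of each .log file.

    Hypothetical sketch: a file belongs to section "foo" if "foo.log" exists
    and the file name starts with "foo_".
    """
    prefixes = sorted(
        os.path.splitext(name)[0] for name in filenames if name.endswith(".log")
    )
    sections = OrderedDict()  # sections are kept in alphabetic order
    for prefix in prefixes:
        members = sorted(n for n in filenames if n.startswith(prefix + "_"))
        sections[prefix] = {"log": prefix + ".log", "files": members}
    return sections


outputs = ["foo_baz.pdf", "foo.log", "foo_bar.pdf", "foo_zot.xls", "other.txt"]
for name, section in group_outputs(outputs).items():
    print(name, section["log"], section["files"])
```

Files with no matching log (other.txt here) are simply left out of every section in this sketch.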

*Automated generation of new Galaxy tools for installation into any Galaxy*

Once a script is working correctly, this tool optionally generates a
new Galaxy tool, effectively freezing the supplied script into a new,
ordinary Galaxy tool that runs it over one or more input files selected by
the user. Generated tools are installed via a tool shed by an administrator
and work exactly like all other Galaxy tools for your users.

If you use the Html output option, please ensure that sanitize_all_html is
set to False and uncommented in universe_wsgi.ini - it should show::

    # By default, all tool output served as 'text/html' will be sanitized
    sanitize_all_html = False

This opens potential security risks and may not be acceptable for public
sites. If sanitization is left on, the lack of stylesheets may make Html
pages damage onlookers' eyeballs, but they should still be correct.

*More Detail*

To use the ToolFactory, you should have prepared a script to paste into a
text box, and a small test input example ready to select from your history
to test your new script.

There is an example in each scripting language on the Tool Factory form. You
can just cut and paste these to try it out - remember to select the right
interpreter please. You'll also need to create a small test data set using
the Galaxy history add new data tool.

If the script fails somehow, use the "redo" button on the tool output in
your history to recreate the form complete with broken script. Fix the bug
and execute again. Rinse, wash, repeat.

Once the script runs successfully, a new Galaxy tool that runs your script
can be generated. Select the "generate" option and supply some help text and
names. The new tool will be generated in the form of a new Galaxy datatype
- toolshed.gz - as the name suggests, it's an archive ready to upload to a
Galaxy ToolShed as a new tool repository.
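
To give a feel for what "an archive ready to upload" means, here is a minimal python sketch that packs a script and a tool xml into a gzipped tar. The file names and the archive layout are assumptions for illustration only - this README does not describe what the tool factory really puts inside a toolshed.gz.

```python
import io
import tarfile


def build_archive(files):
    """Return gzip-compressed tar bytes for a {file name: text} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, text in sorted(files.items()):
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_archive(blob):
    """Return the member names stored in a gzipped tar held in memory."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return tar.getnames()


# Hypothetical tool name and contents
blob = build_archive({
    "mytool/mytool.xml": "<tool id='mytool' name='My tool'/>",
    "mytool/mytool.py": "print('hello')",
})
print(list_archive(blob))
```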

Once it's in a ToolShed, it can be installed into any local Galaxy server
from the server administrative interface.

Once the new tool is installed, local users can run it - each time, the script
that was supplied when it was built will be executed with the input chosen
from the user's history. In other words, the tools you generate with the
ToolFactory run just like any other Galaxy tool, but run your script every time.

Tool factory tools are perfect for workflow components. One input, one output,
no variables.

*To fully and safely exploit the awesome power* of this tool,
Galaxy and the ToolShed, you should be a developer installing this
tool on a private/personal/scratch local instance where you are an
admin_user. Then, if you break it, you get to keep all the pieces. See
https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home

** Installation **
This is a Galaxy tool. You can install it most conveniently using the
administrative "Search and browse tool sheds" link. Find the Galaxy Main
toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory
repository. Open it, review the code and select the option to install it.

(
If you can't get the tool that way, the xml and py files here need to be
copied into a new tools subdirectory such as tools/toolfactory. Your
tool_conf.xml needs a new entry pointing to the xml file - something like::

    <section name="Tool building tools" id="toolbuilders">
        <tool file="toolfactory/rgToolFactory.xml"/>
    </section>

If not already there (I just added it to datatypes_conf.xml.sample),
please add::

    <datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary"
        mimetype="multipart/x-gzip" subclass="True" />

to your local datatypes_conf.xml.
)

Of course, R, python, perl etc are needed on your path if you want to test
scripts using those interpreters. Adding new ones to this tool code should
be easy enough. Please make suggestions as bitbucket issues and code. The
HTML file code automatically shrinks R's bloated pdfs, and depends on
ghostscript. The thumbnails require imagemagick.
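
Before testing scripts, a quick check that the interpreters and helper programs are on the path can save some head scratching. The sketch below is one way to do it in python. The executable names (Rscript, gs for ghostscript, convert for imagemagick) are assumptions about a typical install, not names taken from the tool factory code.

```python
import shutil

# Assumed executable names - adjust for your system
REQUIRED = ["Rscript", "python", "perl", "sh", "gs", "convert"]


def missing_executables(names=REQUIRED, which=shutil.which):
    """Return the names from the list that cannot be found on the PATH."""
    return [name for name in names if which(name) is None]


missing = missing_executables()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All required executables found")
```

Run it as the same user that runs Galaxy jobs, because that is the PATH that matters.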

* Restricted execution *
The tool factory tool itself will then be usable ONLY by admin users -
people with IDs in admin_users in universe_wsgi.ini. **Yes, that's right. ONLY
admin_users can run this tool.** Think about it for a moment. If allowed to
run any arbitrary script on your Galaxy server, the only thing that would
impede a miscreant bent on destroying all your Galaxy data would probably
be lack of appropriate technical skills.
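
The admin-only rule boils down to a membership test against the admin_users setting. The python sketch below shows the idea only. It assumes admin_users is a comma separated list of user emails, the function name is_admin is made up, and the case-insensitive comparison is a choice made here rather than something this README specifies.

```python
def is_admin(user_email, admin_users):
    """Return True if user_email appears in a comma separated admin_users value.

    Hypothetical sketch of the check - not the tool factory's own code.
    """
    admins = {a.strip().lower() for a in admin_users.split(",") if a.strip()}
    return user_email.strip().lower() in admins


admin_setting = "ross@example.org, admin2@example.org"  # made-up addresses
print(is_admin("ross@example.org", admin_setting))
print(is_admin("someone.else@example.org", admin_setting))
```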

*What it does* This is a tool factory for simple scripts in python, R and
perl currently. Functional tests are automatically generated. How cool is that.

LIMITED to simple scripts that read one input from the history. Optionally can
write one new history dataset, and optionally collect any number of outputs
into links on an autogenerated HTML index page for the user to navigate -
useful if the script writes images and output files - pdf outputs are shown
as thumbnails and R's bloated pdfs are shrunk with ghostscript, so that and
imagemagick need to be available.

Generated tools can be edited and enhanced like any Galaxy tool, so start
small and build up since a generated script gets you a serious leg up to a
more complex one.

*What you do* You paste and run your script, you fix the syntax errors and
eventually it runs. You can use the redo button and edit the script before
trying to rerun it as you debug - it works pretty well.

Once the script works on some test data, you can generate a toolshed compatible
gzip file containing your script ready to run as an ordinary Galaxy tool in
a repository on your local toolshed. That means safe and largely automated
installation in any production Galaxy configured to use your toolshed.

*Generated tool Security* Once you install a generated tool, it's just
another tool - assuming the script is safe. They just run normally and their
user cannot do anything unusually insecure but please, practice safe toolshed.
Read the fucking code before you install any tool. Especially this one -
it is really scary.

If you opt for an HTML output, you get all the script outputs arranged
as a single Html history item - all output files are linked, thumbnails for
all the pdfs. Ugly but really inexpensive.

Patches and suggestions welcome as bitbucket issues please?

copyright ross lazarus (ross stop lazarus at gmail stop com) May 2012

all rights reserved
Licensed under the LGPL - if you want to improve it, feel free
https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home

Material for our more enthusiastic and voracious readers continues below -
we salute you.

**Motivation** Simple transformation, filtering or reporting scripts get
written, run and lost every day in most busy labs - even ours where Galaxy is
in use. This 'dark script matter' is pervasive and generally not reproducible.

**Benefits** For our group, this allows Galaxy to fill that important dark
script gap - all those "small" bioinformatics tasks. Once a user has a working
R (or python or perl) script that does something Galaxy cannot currently do
(eg transpose a tabular file) and takes parameters the way Galaxy supplies
them (see example below), they:

1. Install the tool factory on a personal private instance

2. Upload a small test data set

3. Paste the script into the 'script' text box and iteratively run the
insecure tool on test data until it works right - there is absolutely no
reason to do this anywhere other than on a personal private instance.

4. Once it works right, set the 'Generate toolshed gzip' option and run
it again.

5. A toolshed style gzip appears ready to upload and install like any other
Toolshed entry.

6. Upload the new tool to the toolshed

7. Ask the local admin to check the new tool to confirm it's not evil and
install it in the local production galaxy

**Simple examples on the tool form**

A simple Rscript "filter" showing how the command line parameters can be
handled, takes an input file, does something (transpose in this case) and
writes the results to a new tabular file::

    # transpose a tabular input file and write as a tabular output file
    ourargs = commandArgs(TRUE)
    inf = ourargs[1]
    outf = ourargs[2]
    inp = read.table(inf,head=F,row.names=NULL,sep='\t')
    outp = t(inp)
    write.table(outp,outf, quote=FALSE, sep="\t",row.names=F,col.names=F)

Calculate a multiple test adjusted p value from a column of p values -
for this script to be useful, it needs the right column for the input to be
specified in the code for the given input file type(s) specified when the
tool is generated ::

    # use p.adjust - assumes a HEADER row and column 1 - please fix for any
    # real use
    column = 1 # adjust if necessary for some other kind of input
    fdrmeth = 'BH'
    ourargs = commandArgs(TRUE)
    inf = ourargs[1]
    outf = ourargs[2]
    inp = read.table(inf,head=T,row.names=NULL,sep='\t')
    p = inp[,column]
    q = p.adjust(p,method=fdrmeth)
    newval = paste(fdrmeth,'p-value',sep='_')
    q = data.frame(q)
    names(q) = newval
    outp = cbind(inp,newval=q)
    write.table(outp,outf, quote=FALSE, sep="\t",row.names=F,col.names=T)
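
For anyone who prefers python, the 'BH' (Benjamini-Hochberg) adjustment used in the R example can be sketched as below. It is a plain python illustration of the calculation, not a drop-in replacement for p.adjust - missing values are not handled, and the function name bh_adjust is made up here.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the original order.

    Each p value is scaled by n / rank (rank 1 = smallest p), then a running
    minimum is taken from the largest p downwards so that adjusted values
    never decrease as p increases. Values are capped at 1.
    """
    n = len(pvalues)
    # indices ordered from the largest p value to the smallest
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

For the four p values above, the adjusted values come out close to 0.02, 0.04, 0.04 and 0.02 (give or take floating point rounding).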

Another Rscript example without any input file - generates a random heatmap
pdf - you must make sure the option to create an HTML output file is
turned on for this to work. The heatmap will be presented as a thumbnail
linked to the pdf in the resulting HTML page::

    # note this script takes NO input or output because it generates random data
    foo = data.frame(a=runif(100),b=runif(100),c=runif(100),d=runif(100),
        e=runif(100),f=runif(100))
    bar = as.matrix(foo)
    pdf( "heattest.pdf" )
    heatmap(bar,main='Random Heatmap')
    dev.off()

A Python example that reverses each row of a tabular file. You'll need
to remove the leading spaces for this to work if cut and pasted into the
script box. Note that you can already do this in Galaxy by setting up the
cut columns tool with the correct number of columns in reverse order, but
this script will work for any number of columns so is completely generic::

    # reverse order of columns in a tabular file
    import sys
    inp = sys.argv[1]
    outp = sys.argv[2]
    i = open(inp,'r')
    o = open(outp,'w')
    for row in i:
        # strip only the line ending so empty trailing columns are kept
        rs = row.rstrip('\r\n').split('\t')
        rs.reverse()
        o.write('\t'.join(rs))
        o.write('\n')
    i.close()
    o.close()

Galaxy as an IDE for developing API scripts
If you need to develop Galaxy API scripts and you like to live dangerously,
please read on.

Galaxy as an IDE?
Amazingly enough, blend-lib API scripts run perfectly well *inside*
Galaxy when pasted into a Tool Factory form. No need to generate a new
tool. Galaxy+Tool_Factory = IDE. I think we need a new t-shirt. Seriously,
it is actually quite usable.

Why bother - what's wrong with Eclipse?
Nothing. But, compared with developing API scripts in the usual way outside
Galaxy, you get persistence and other framework benefits plus, at absolutely
no extra charge, a ginormous security problem if you share the history or
any outputs, because they contain the api script with its key. So development
servers only please!

Workflow
Fire up the Tool Factory in Galaxy.

Leave the input box empty, set the interpreter to python, paste and run an
api script - eg the working example below (substitute the url and key).

It took me a few iterations to develop the example below because I know
almost nothing about the API. I started with very simple code from one of the
samples and after each run, the (edited..) api script is conveniently recreated
using the redo button on the history output item. So each successive version
of the developing api script you run is persisted - ready to be edited and
rerun easily. It is *very* handy to be able to add a line of code to the
script and run it, then view the output to (eg) inspect dicts returned by
API calls to help move progressively deeper iteratively.

Give the below a whirl on a private clone (install the tool factory from
the main toolshed) and try adding complexity with a few rerun/edit/rerun cycles.

Eg tool factory api script::

    import sys
    from blend.galaxy import GalaxyInstance
    ourGal = 'http://x.x.x.x:xxxx'
    ourKey = 'xxx'
    gi = GalaxyInstance(ourGal, key=ourKey)
    libs = gi.libraries.get_libraries()
    res = []
    # libs looks like
    # u'url': u'/galaxy/api/libraries/441d8112651dc2f3', u'id':
    # u'441d8112651dc2f3', u'name':.... u'Demonstration sample RNA data',
    for lib in libs:
        res.append('%s:\n' % lib['name'])
        res.append(str(gi.libraries.show_library(lib['id'],contents=True)))
    outf=open(sys.argv[2],'w')
    outf.write('\n'.join(res))
    outf.close()
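
When inspecting the dicts returned by API calls during those edit/rerun cycles, a readable dump helps more than str(). The sketch below uses only the python standard library. The helper name describe is made up, and the sample dict just reuses the values quoted in the comment of the script above rather than real API output.

```python
import json


def describe(obj):
    """Return an indented, key-sorted text dump of a dict or list structure."""
    # default=str keeps the dump working for values json cannot encode itself
    return json.dumps(obj, indent=2, sort_keys=True, default=str)


# Sample shaped like the library entries quoted in the script comment
libs = [{
    "url": "/galaxy/api/libraries/441d8112651dc2f3",
    "id": "441d8112651dc2f3",
    "name": "Demonstration sample RNA data",
}]
print(describe(libs))
```

Writing describe(libs) to the output file instead of str(...) gives one key per line, which is much easier to read in the Galaxy history preview.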

**Attribution**
Creating re-usable tools from scripts: The Galaxy Tool Factory
Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573

http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref

**Licensing**
Copyright Ross Lazarus 2010
ross lazarus at g mail period com

All rights reserved.

Licensed under the LGPL

**Obligatory screenshot**

http://bitbucket.org/fubar/galaxytoolmaker/src/fda8032fe989/images/dynamicScriptTool.png