comparison README.txt @ 29:bff4c9bfabc7 draft

Fixes for escaping all xml characters in help and code - thanks to Marius van den Beek for pointing these out
author fubar
date Thu, 07 Aug 2014 22:11:02 -0400
parents
children fb3fa6a2874d
28:03bb25f38ea8 29:bff4c9bfabc7
1 # WARNING before you start
2 # Install this tool on a private Galaxy ONLY
3 # Please NEVER on a public or production instance
4 # updated august 8 2014 to fix bugs reported by Marius van den Beek
5 Please cite:
6 http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref
7 if you use this tool in your published work.
8
9 *Short Story*
10
11 This is an unusual Galaxy tool that exposes unrestricted and therefore extremely dangerous
12 scripting to designated administrative users of a Galaxy server, allowing them to run scripts
13 in R, python, sh and perl over a single input data set, writing a single new data set as output.
14
15 In addition, this tool optionally generates very simple new Galaxy tools, that effectively
16 freeze the supplied script into a new, ordinary Galaxy tool that runs it over a single input file,
17 working just like any other Galaxy tool for your users.
18
19 To use the ToolFactory, you should have prepared a script to paste into a text box,
20 and a small test input example ready to select from your history to test your new script.
21 There is an example in each scripting language on the Tool Factory form. You can just
22 cut and paste these to try it out - remember to select the right interpreter please. You'll
23 also need to create a small test data set using the Galaxy history add new data tool.
24
25 If the script fails somehow, use the "redo" button on the tool output in your history to
26 recreate the form complete with broken script. Fix the bug and execute again. Rinse, wash, repeat.
27
28 Once the script runs successfully, a new Galaxy tool that runs your script can be generated.
29 Select the "generate" option and supply some help text and names. The new tool will be
30 generated in the form of a new Galaxy datatype - toolshed.gz - as the name suggests,
31 it's an archive ready to upload to a Galaxy ToolShed as a new tool repository.
32
33 Once it's in a ToolShed, it can be installed into any local Galaxy server from
34 the server administrative interface.
35
36 Once the new tool is installed, local users can run it - each time, the script that was supplied
37 when it was built will be executed with the input chosen from the user's history. In other words,
38 the tools you generate with the ToolFactory run just like any other Galaxy tool,
39 but run your script every time.
40
41 Tool factory tools are perfect for workflow components. One input, one output, no variables.
42
43 *Reasons to read further*
44
45 If you use Galaxy to support your research;
46
47 You and fellow users are sometimes forced to take data out of Galaxy, process it with ugly
48 little perl/awk/sed/R... scripts and put it back;
49
50 You do this when you can't do some transformation in Galaxy (the 90/10 rule);
51
52 You don't have enough developer resources for wrapping dozens of even relatively simple tools;
53
54 Your research and your institution would be far better off if those feral scripts were all tucked
55 safely in your local toolshed and Galaxy histories.
56
57 *The good news* If it can be trivially scripted, it can be running safely in your
58 local Galaxy via your own local toolshed in a few minutes - with functional tests.
59
60
61 *Value proposition* The ToolFactory allows Galaxy to efficiently take over most of your lab's
62 dark script matter, making it reproducible in Galaxy and shareable through the ToolShed.
63
64 That's what this tool does. You paste a simple script and the tool returns
65 a new, real Galaxy tool, ready to be installed from the local toolshed to local servers.
66 Scripts can be wrapped and online literally within minutes.
67
68 *To fully and safely exploit the awesome power* of this tool, Galaxy and the ToolShed,
69 you should be a developer installing this tool on a private/personal/scratch local instance where you
70 are an admin_user. Then, if you break it, you get to keep all the pieces.
71 See https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home
72
73 ** Installation **
74 This is a Galaxy tool. You can install it most conveniently using the administrative "Search and browse tool sheds" link.
75 Find the Galaxy Test toolshed (not main) and search for the toolfactory repository.
76 Open it and review the code and select the option to install it.
77
78 If you can't get the tool that way, the xml and py files here need to be copied into a new tools
79 subdirectory such as tools/toolfactory. Your tool_conf.xml needs a new entry pointing to the xml
80 file - something like::
81
    <section name="Tool building tools" id="toolbuilders">
      <tool file="toolfactory/rgToolFactory.xml"/>
    </section>
85
86 If not already there (I just added it to datatypes_conf.xml.sample), please add:
    <datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary" mimetype="multipart/x-gzip" subclass="True" />
88 to your local datatypes_conf.xml.
89
90 Ensure that html sanitization is set to False and uncommented in universe_wsgi.ini
91
92 You'll have to restart the server for the new tool to be available.
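
Before restarting, the two manual configuration steps above can be sanity checked. The following is a minimal sketch only: it assumes the tool entry and datatype entry look exactly like the fragments shown above, and the function names are made up for illustration, not part of Galaxy.

```python
import xml.etree.ElementTree as ET


def tool_registered(tool_conf_xml, tool_file="toolfactory/rgToolFactory.xml"):
    """Return True if any <tool> element points at tool_file.

    tool_conf_xml is the text of tool_conf.xml; the default path is the
    one used in the example entry above.
    """
    root = ET.fromstring(tool_conf_xml)
    return any(tool.get("file") == tool_file for tool in root.iter("tool"))


def datatype_registered(datatypes_conf_xml, extension="toolshed.gz"):
    """Return True if any <datatype> element declares the given extension."""
    root = ET.fromstring(datatypes_conf_xml)
    return any(dt.get("extension") == extension for dt in root.iter("datatype"))
```

Read each file into a string and pass it in; both functions only inspect the XML and never modify it.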
93
94 Of course, R, python, perl etc are needed on your path if you want to test scripts using those interpreters.
95 Adding new ones to this tool code should be easy enough. Please make suggestions as bitbucket issues and code.
96 The HTML file code automatically shrinks R's bloated pdfs, and depends on ghostscript. The thumbnails require imagemagick.
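
A quick way to see which of these programs are missing from the path is sketched below. The executable names are assumptions (for example gs for ghostscript and convert for imagemagick) and may differ on your system.

```python
import shutil

# Executable names are assumptions; adjust them to match your system.
INTERPRETERS = ["Rscript", "python", "perl", "sh"]
HTML_HELPERS = ["gs", "convert"]  # ghostscript and imagemagick


def find_missing(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    for label, names in (("interpreters", INTERPRETERS), ("HTML helpers", HTML_HELPERS)):
        missing = find_missing(names)
        print(label, "missing:", ", ".join(missing) if missing else "none")
```

Run it as the same user that runs Galaxy, since that user's path is the one that matters.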
97
98 * Restricted execution *
99 The new tool factory tool will then be usable ONLY by admin users - people with IDs in admin_users in universe_wsgi.ini
100 **Yes, that's right. ONLY admin_users can run this tool** Think about it for a moment. If allowed to run any
101 arbitrary script on your Galaxy server, the only thing that would impede a miscreant bent on destroying all your
102 Galaxy data would probably be lack of appropriate technical skills.
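
To make the restriction concrete, here is an illustrative sketch of an admin_users membership check. It is not the code Galaxy or this tool actually uses. It assumes admin_users is a comma separated list of email addresses in the [app:main] section of universe_wsgi.ini, and both function names are hypothetical.

```python
import configparser


def admin_users(ini_text, section="app:main"):
    """Parse the comma separated admin_users setting into a set of emails."""
    # Interpolation is disabled so values containing % do not raise errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    raw = parser.get(section, "admin_users", fallback="")
    return {user.strip().lower() for user in raw.split(",") if user.strip()}


def may_run_tool_factory(email, ini_text):
    """Return True only if email is listed in admin_users."""
    return email.strip().lower() in admin_users(ini_text)
```

Anyone not on that list should never see a working Tool Factory form.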
103
104 *What it does* This is a tool factory for simple scripts in python, R and perl currently.
105 Functional tests are automatically generated. How cool is that.
106
107 LIMITED to simple scripts that read one input from the history.
108 Optionally can write one new history dataset,
109 and optionally collect any number of outputs into links on an autogenerated HTML
110 index page for the user to navigate - useful if the script writes images and output files. Pdf outputs
111 are shown as thumbnails and R's bloated pdfs are shrunk with ghostscript, so both ghostscript and imagemagick need to
112 be available.
113
114 Generated tools can be edited and enhanced like any Galaxy tool, so start small and build up since
115 a generated script gets you a serious leg up to a more complex one.
116
117 *What you do* You paste and run your script.
118 You fix the syntax errors and eventually it runs.
119 You can use the redo button and edit the script before
120 trying to rerun it as you debug - it works pretty well.
121
122 Once the script works on some test data, you can
123 generate a toolshed compatible gzip file
124 containing your script ready to run as an ordinary Galaxy tool in a
125 repository on your local toolshed. That means safe and largely automated installation in any
126 production Galaxy configured to use your toolshed.
127
128 *Generated tool Security* Once you install a generated tool, it's just
129 another tool - assuming the script is safe. They just run normally and their user cannot do anything unusually insecure
130 but please, practice safe toolshed.
131 Read the fucking code before you install any tool.
132 Especially this one - it is really scary.
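
One way to start reading the code is to list what a generated archive contains before uploading or installing it. This sketch assumes the toolshed.gz file is a gzip compressed tar archive. The function name is made up for illustration.

```python
import tarfile


def list_archive(path):
    """Return (name, size in bytes) for every regular file in the archive."""
    with tarfile.open(path, "r:gz") as archive:
        return [(member.name, member.size)
                for member in archive.getmembers()
                if member.isfile()]
```

Listing the members does not extract or execute anything, so it is safe to run on an archive you do not yet trust. You still need to open and read the script and xml it contains.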
133
134 If you opt for an HTML output, you get all the script outputs arranged
135 as a single Html history item - all output files are linked, thumbnails for all the pdfs.
136 Ugly but really inexpensive.
137
138 Patches and suggestions are welcome as bitbucket issues please.
139
140 long route to June 2012 product
141 derived from an integrated script model
142 called rgBaseScriptWrapper.py
143 Note to the unwary:
144 This tool allows arbitrary scripting on your Galaxy as the Galaxy user
145 There is nothing stopping a malicious user doing whatever they choose
146 Extremely dangerous!!
147 Totally insecure. So, trusted users only
148
149
150
151
152 copyright ross lazarus (ross stop lazarus at gmail stop com) May 2012
153
154 all rights reserved
155 Licensed under the LGPL. If you want to improve it, feel free: https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home
156
157 Material for our more enthusiastic and voracious readers continues below - we salute you.
158
159 **Motivation** Simple transformation, filtering or reporting scripts get written, run and lost every day in most busy labs
160 - even ours where Galaxy is in use. This 'dark script matter' is pervasive and generally not reproducible.
161
162 **Benefits** For our group, this allows Galaxy to fill that important dark script gap - all those "small" bioinformatics
163 tasks. Once a user has a working R (or python or perl) script that does something Galaxy cannot currently do (eg transpose a
164 tabular file) and takes parameters the way Galaxy supplies them (see example below), they:
165
166 1. Install the tool factory on a personal private instance
167
168 2. Upload a small test data set
169
170 3. Paste the script into the 'script' text box and iteratively run the insecure tool on test data until it works right -
171 there is absolutely no reason to do this anywhere other than on a personal private instance.
172
173 4. Once it works right, set the 'Generate toolshed gzip' option and run it again.
174
175 5. A toolshed style gzip appears ready to upload and install like any other Toolshed entry.
176
177 6. Upload the new tool to the toolshed
178
179 7. Ask the local admin to check the new tool to confirm it's not evil and install it in the local production galaxy
180
181 **Simple examples on the tool form**
182
183 A simple Rscript "filter" showing how the command line parameters can be handled, takes an input file,
184 does something (transpose in this case) and writes the results to a new tabular file::
185
    # transpose a tabular input file and write as a tabular output file
    ourargs = commandArgs(TRUE)
    inf = ourargs[1]
    outf = ourargs[2]
    inp = read.table(inf,head=F,row.names=NULL,sep='\t')
    outp = t(inp)
    write.table(outp,outf, quote=FALSE, sep="\t",row.names=F,col.names=F)
193
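For comparison, the same transpose can be sketched in Python. This is not one of the examples on the tool form. It assumes Python 3 and pads short rows with empty strings, which is a design choice the R version does not make.

```python
from itertools import zip_longest


def transpose_tabular(in_path, out_path):
    """Write the rows of a tab separated file as columns, and vice versa."""
    with open(in_path) as src:
        rows = [line.rstrip("\n").split("\t") for line in src]
    with open(out_path, "w") as dst:
        # zip_longest pads ragged rows so no values are silently dropped
        for column in zip_longest(*rows, fillvalue=""):
            dst.write("\t".join(column) + "\n")

# In a pasted script you would finish with something like:
#   import sys
#   transpose_tabular(sys.argv[1], sys.argv[2])
```

As in the R version, the first argument is taken to be the input path and the second the output path.
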
194 Calculate a multiple test adjusted p value from a column of p values - for this script to be useful,
195 it needs the right column for the input to be specified in the code for the
196 given input file type(s) specified when the tool is generated ::
197
    # use p.adjust - assumes a HEADER row and column 1 - please fix for any real use
    column = 1 # adjust if necessary for some other kind of input
    fdrmeth = 'BH'
    ourargs = commandArgs(TRUE)
    inf = ourargs[1]
    outf = ourargs[2]
    inp = read.table(inf,head=T,row.names=NULL,sep='\t')
    p = inp[,column]
    q = p.adjust(p,method=fdrmeth)
    newval = paste(fdrmeth,'p-value',sep='_')
    q = data.frame(q)
    names(q) = newval
    outp = cbind(inp,newval=q)
    write.table(outp,outf, quote=FALSE, sep="\t",row.names=F,col.names=T)
212
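For readers wondering what the 'BH' method does to the column, here is a plain Python sketch of the Benjamini-Hochberg adjustment. It is an illustration of the procedure, not R's implementation, and it assumes a list of ordinary floats with no missing values.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    n = len(pvalues)
    # visit p values from largest to smallest
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for offset, idx in enumerate(order):
        rank = n - offset  # rank of this p value in ascending order
        # scale by n/rank, then enforce monotonicity and the cap at 1
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted
```

For the four p values 0.01, 0.04, 0.03 and 0.005 this gives approximately 0.02, 0.04, 0.04 and 0.02.
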
213
214
215 Another Rscript example without any input file - generates a random heatmap pdf - you must make sure the option to create an HTML output file is
216 turned on for this to work. The heatmap will be presented as a thumbnail linked to the pdf in the resulting HTML page::
217
    # note this script takes NO input or output because it generates random data
    foo = data.frame(a=runif(100),b=runif(100),c=runif(100),d=runif(100),e=runif(100),f=runif(100))
    bar = as.matrix(foo)
    pdf( "heattest.pdf" )
    heatmap(bar,main='Random Heatmap')
    dev.off()
224
225 A Python example that reverses each row of a tabular file. You'll need to remove the leading spaces for this to work if cut
226 and pasted into the script box. Note that you can already do this in Galaxy by setting up the cut columns tool with the
227 correct number of columns in reverse order, but this script will work for any number of columns so is completely generic::
228
    # reverse order of columns in a tabular file
    import sys
    inp = sys.argv[1]   # first argument: input file path
    outp = sys.argv[2]  # second argument: output file path
    i = open(inp,'r')
    o = open(outp,'w')
    for row in i:
        # split each row on tabs, reverse the fields and write them back out
        rs = row.rstrip().split('\t')
        rs.reverse()
        o.write('\t'.join(rs))
        o.write('\n')
    i.close()
    o.close()
242
243
244 Galaxy as an IDE for developing API scripts
245 If you need to develop Galaxy API scripts and you like to live dangerously, please read on.
246
247 Galaxy as an IDE?
248 Amazingly enough, blend-lib API scripts run perfectly well *inside* Galaxy when pasted into a Tool Factory form. No need to generate a new tool. Galaxy+Tool_Factory = IDE. I think we need a new t-shirt. Seriously, it is actually quite usable.
249
250 Why bother - what's wrong with Eclipse?
251 Nothing. But compared with developing API scripts in the usual way outside Galaxy, you get persistence and other framework benefits. You also get, at absolutely no extra charge, a ginormous security problem if you share the history or any outputs, because they contain the API script with its key. So development servers only please!
252
253 Workflow
254 Fire up the Tool Factory in Galaxy.
255
256 Leave the input box empty, set the interpreter to python, paste and run an api script - eg working example (substitute the url and key) below.
257
258 It took me a few iterations to develop the example below because I know almost nothing about the API. I started with very simple code from one of the samples and after each run, the (edited..) api script is conveniently recreated using the redo button on the history output item. So each successive version of the developing api script you run is persisted - ready to be edited and rerun easily. It is *very* handy to be able to add a line of code to the script and run it, then view the output to (eg) inspect dicts returned by API calls, which helps you move progressively deeper into the API with each iteration.
259
260 Give the below a whirl on a private clone (install the tool factory from the main toolshed) and try adding complexity with a few rerun/edit/rerun cycles.
261
262 Eg tool factory api script
    import sys
    from blend.galaxy import GalaxyInstance
    ourGal = 'http://x.x.x.x:xxxx'  # substitute your Galaxy url
    ourKey = 'xxx'                  # substitute your API key
    gi = GalaxyInstance(ourGal, key=ourKey)
    libs = gi.libraries.get_libraries()
    res = []
    # libs looks like
    # u'url': u'/galaxy/api/libraries/441d8112651dc2f3', u'id': u'441d8112651dc2f3', u'name':.... u'Demonstration sample RNA data',
    for lib in libs:
        res.append('%s:\n' % lib['name'])
        res.append(str(gi.libraries.show_library(lib['id'],contents=True)))
    # the second argument is the output file path
    outf=open(sys.argv[2],'w')
    outf.write('\n'.join(res))
    outf.close()
278
279 **Attribution**
280 Creating re-usable tools from scripts: The Galaxy Tool Factory
281 Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
282 Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573
283
284 http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref
285
286 **Licensing**
287 Copyright Ross Lazarus 2010
288 ross lazarus at g mail period com
289
290 All rights reserved.
291
292 Licensed under the LGPL
293
294 **Obligatory screenshot**
295
296 http://bitbucket.org/fubar/galaxytoolmaker/src/fda8032fe989/images/dynamicScriptTool.png
297