diff fubar-galaxytoolfactory-cfcf6c9df5b7/README.txt @ 1:87613ace5113 draft
Uploaded
author | fubar |
---|---|
date | Sat, 11 Aug 2012 02:41:28 -0400 |
parents | |
children |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/fubar-galaxytoolfactory-cfcf6c9df5b7/README.txt Sat Aug 11 02:41:28 2012 -0400
@@ -0,0 +1,233 @@
+# WARNING before you start
+# Install on a private Galaxy ONLY
+# Please NEVER on a public or production instance
+
+*Short Story*
+This is an unusual Galaxy tool that generates very simple but potentially
+very useful local Galaxy tools that run the user-supplied script (R, python, perl...) over a single input file.
+Whenever you run this tool, the ToolFactory, you should have prepared a script to paste into a text box,
+and a small test input example ready to select from your history to test your new script.
+
+If the script runs successfully, a new Galaxy tool that runs your script can be generated.
+The new tool is in the form of a special new Galaxy datatype - toolshed.gz - as the name suggests,
+it's an archive ready to upload to a Galaxy ToolShed as a new tool repository.
+
+Once it's in a ToolShed, it can be installed into any local Galaxy server from
+the server administrative interface.
+
+Once your new tool is installed, local users can run it - each time, the script that was supplied
+when it was built will be executed with the input chosen from the user's history. In other words,
+the tools you generate with the ToolFactory run just like any other Galaxy tool,
+but run your script every time.
+
+*Reasons to read further*
+
+If you use Galaxy to support your research;
+
+You and fellow users are sometimes forced to take data out of Galaxy, process it with ugly
+little perl/awk/sed/R... scripts and put it back;
+
+You do this when you can't do some transformation in Galaxy (the 90/10 rule);
+
+You don't have enough developer resources for wrapping dozens of even relatively simple tools;
+
+Your research and your institution would be far better off if those feral scripts were all tucked safely in
+your local toolshed and Galaxy histories.
+
+*The good news* If it can be trivially scripted, it can be running safely in your
+local Galaxy via your own local toolshed in a few minutes - with functional tests.
+
+
+*Value proposition* The ToolFactory allows Galaxy to efficiently take over most of your lab's dark script matter,
+making it reproducible in Galaxy and shareable through the ToolShed.
+
+That's what this tool does. You paste a simple script and the tool returns
+a new, real Galaxy tool, ready to be installed from the local toolshed to local servers.
+Scripts can be wrapped and online literally within minutes.
+
+*To fully and safely exploit the awesome power* of this tool, Galaxy and the ToolShed,
+you should be a developer installing this tool on a private/personal/scratch local instance where you are an admin_user.
+Then, if you break it, you get to keep all the pieces.
+See https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home
+
+** Installation **
+This is a Galaxy tool. You can install it most conveniently using the administrative "Search and browse tool sheds" link.
+Find the Galaxy Test toolshed (not main) and search for the toolfactory repository.
+Open it, review the code, and select the option to install it.
+
+If you can't get the tool that way, the xml and py files here need to be copied into a new tools subdirectory such as tools/toolfactory.
+Your tool_conf.xml needs a new entry pointing to the xml file - something like::
+
+  <section name="Tool building tools" id="toolbuilders">
+    <tool file="toolfactory/rgToolFactory.xml"/>
+  </section>
+
+If not already there (I just added it to datatypes_conf.xml.sample), please add:
+<datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary" mimetype="multipart/x-gzip" subclass="True" />
+to your local datatypes_conf.xml.
+
+Ensure that html sanitization is set to False and uncommented in universe_wsgi.ini
+
+You'll have to restart the server for the new tool to be available.
+
+Of course, R, python, perl etc are needed on your path if you want to test scripts using those interpreters.
+Adding new ones to this tool code should be easy enough. Please make suggestions as bitbucket issues and code.
+The HTML file code automatically shrinks R's bloated pdfs, and depends on ghostscript. The thumbnails require imagemagick.
+
+* Restricted execution *
+The new tool factory tool will then be usable ONLY by admin users - people with IDs in admin_users in universe_wsgi.ini.
+**Yes, that's right. ONLY admin_users can run this tool** Think about it for a moment. If allowed to run any
+arbitrary script on your Galaxy server, the only thing that would impede a miscreant bent on destroying all your
+Galaxy data would probably be lack of appropriate technical skills.
+
+*What it does* This is a tool factory for simple scripts, currently in python, R and perl.
+Functional tests are automatically generated. How cool is that.
+
+LIMITED to simple scripts that read one input from the history.
+A script can optionally write one new history dataset,
+and optionally collect any number of outputs into links on an autogenerated HTML
+index page for the user to navigate - useful if the script writes images and output files. Pdf outputs
+are shown as thumbnails and R's bloated pdfs are shrunk with ghostscript, so both ghostscript and imagemagick need to
+be available.
+
+Generated tools can be edited and enhanced like any Galaxy tool, so start small and build up, since
+a generated script gets you a serious leg up to a more complex one.
+
+*What you do* You paste and run your script,
+you fix the syntax errors, and eventually it runs.
+You can use the redo button and edit the script before
+trying to rerun it as you debug - it works pretty well.
+
+Once the script works on some test data, you can
+generate a toolshed compatible gzip file
+containing your script ready to run as an ordinary Galaxy tool in a
+repository on your local toolshed.
+That means safe and largely automated installation in any
+production Galaxy configured to use your toolshed.
+
+*Generated tool Security* Once you install a generated tool, it's just
+another tool - assuming the script is safe. Generated tools just run normally and their users cannot do anything unusually insecure,
+but please, practice safe toolshed.
+Read the code before you install any tool.
+Especially this one - it is really scary.
+
+If you opt for an HTML output, you get all the script outputs arranged
+as a single HTML history item - all output files are linked, with thumbnails for all the pdfs.
+Ugly but really inexpensive.
+
+Patches and suggestions are welcome as bitbucket issues, please.
+
+long route to June 2012 product
+derived from an integrated script model
+called rgBaseScriptWrapper.py
+Note to the unwary:
+    This tool allows arbitrary scripting on your Galaxy as the Galaxy user
+    There is nothing stopping a malicious user doing whatever they choose
+    Extremely dangerous!!
+    Totally insecure. So, trusted users only
+
+
+
+
+copyright ross lazarus (ross stop lazarus at gmail stop com) May 2012
+
+all rights reserved
+Licensed under the LGPL. If you want to improve it, feel free: https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home
+
+Material for our more enthusiastic and voracious readers continues below - we salute you.
+
+**Motivation** Simple transformation, filtering or reporting scripts get written, run and lost every day in most busy labs
+- even ours where Galaxy is in use. This 'dark script matter' is pervasive and generally not reproducible.
+
+**Benefits** For our group, this allows Galaxy to fill that important dark script gap - all those "small" bioinformatics
+tasks. Once a user has a working R (or python or perl) script that does something Galaxy cannot currently do (eg transpose a
+tabular file) and takes parameters the way Galaxy supplies them (see example below), they:
+
+1. Install the tool factory on a personal private instance
+
+2. Upload a small test data set
+
+3. Paste the script into the 'script' text box and iteratively run the insecure tool on test data until it works right -
+there is absolutely no reason to do this anywhere other than on a personal private instance.
+
+4. Once it works right, set the 'Generate toolshed gzip' option and run it again.
+
+5. A toolshed style gzip appears ready to upload and install like any other Toolshed entry.
+
+6. Upload the new tool to the toolshed
+
+7. Ask the local admin to check the new tool to confirm it's not evil and install it in the local production galaxy
+
+**Simple examples on the tool form**
+
+A simple Rscript "filter" showing how the command line parameters can be handled. It takes an input file,
+does something (transpose in this case) and writes the results to a new tabular file::
+
+    # transpose a tabular input file and write as a tabular output file
+    ourargs = commandArgs(TRUE)
+    inf = ourargs[1]
+    outf = ourargs[2]
+    inp = read.table(inf,head=F,row.names=NULL,sep='\t')
+    outp = t(inp)
+    write.table(outp,outf, quote=FALSE, sep="\t",row.names=F,col.names=F)
+
+Calculate a multiple test adjusted p value from a column of p values - for this script to be useful,
+it needs the right column for the input to be specified in the code for the
+given input file type(s) specified when the tool is generated::
+
+    # use p.adjust - assumes a HEADER row and column 1 - please fix for any real use
+    column = 1 # adjust if necessary for some other kind of input
+    fdrmeth = 'BH'
+    ourargs = commandArgs(TRUE)
+    inf = ourargs[1]
+    outf = ourargs[2]
+    inp = read.table(inf,head=T,row.names=NULL,sep='\t')
+    p = inp[,column]
+    q = p.adjust(p,method=fdrmeth)
+    newval = paste(fdrmeth,'p-value',sep='_')
+    q = data.frame(q)
+    names(q) = newval
+    outp = cbind(inp,newval=q)
+    write.table(outp,outf, quote=FALSE, sep="\t",row.names=F,col.names=T)
+
+
+
+Another Rscript example without any input file - generates a random heatmap pdf - you must make sure the option
+to create an HTML output file is
+turned on for this to work. The heatmap will be presented as a thumbnail linked to the pdf in the resulting HTML page::
+
+    # note this script takes NO input or output because it generates random data
+    foo = data.frame(a=runif(100),b=runif(100),c=runif(100),d=runif(100),e=runif(100),f=runif(100))
+    bar = as.matrix(foo)
+    pdf( "heattest.pdf" )
+    heatmap(bar,main='Random Heatmap')
+    dev.off()
+
+A Python example that reverses each row of a tabular file. You'll need to remove the leading spaces for this to work if cut
+and pasted into the script box. Note that you can already do this in Galaxy by setting up the cut columns tool with the
+correct number of columns in reverse order, but this script will work for any number of columns so is completely generic::
+
+# reverse order of columns in a tabular file
+import sys
+inp = sys.argv[1]
+outp = sys.argv[2]
+i = open(inp,'r')
+o = open(outp,'w')
+for row in i:
+    rs = row.rstrip().split('\t')
+    rs.reverse()
+    o.write('\t'.join(rs))
+    o.write('\n')
+i.close()
+o.close()
+
+
+**Attribution** Copyright Ross Lazarus (ross period lazarus at gmail period com) May 2012
+
+All rights reserved.
+
+Licensed under the LGPL
+
+
+**Obligatory screenshot**
+
+http://bitbucket.org/fubar/galaxytoolmaker/src/fda8032fe989/images/dynamicScriptTool.png
+
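For anyone who would rather paste Python than R, the Rscript transpose example in the README can be sketched along the same lines. This is only a sketch, not part of the ToolFactory: it assumes Python 3 and the same calling convention the README examples use (first command line argument is the input path, second is the output path). The function name `transpose_tabular` is made up for illustration, and padding ragged rows with empty strings is a choice made here, not something the README specifies.

```python
# Sketch only: a Python counterpart to the README's Rscript transpose example.
# Assumes the convention used by the README scripts: argv[1] is the input
# path and argv[2] is the output path. transpose_tabular is a hypothetical name.
import sys


def transpose_tabular(in_path, out_path):
    """Write the rows of a tab-separated file as the columns of a new file."""
    with open(in_path) as src:
        rows = [line.rstrip("\n").split("\t") for line in src]
    # Pad short rows with empty strings so ragged input still transposes
    # to a rectangle (an assumption of this sketch).
    width = max((len(r) for r in rows), default=0)
    padded = [r + [""] * (width - len(r)) for r in rows]
    with open(out_path, "w") as dst:
        for column in zip(*padded):
            dst.write("\t".join(column) + "\n")


# Only run when called with exactly the two expected paths.
if __name__ == "__main__" and len(sys.argv) == 3:
    transpose_tabular(sys.argv[1], sys.argv[2])
```

As with the other examples, try it on a small test file first; a two-row, three-column input should come back as three rows of two columns.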