comparison toolfactory/README.md @ 49:35a912ce0c83 draft
Can now make the bwa example from planemo :)
author | fubar |
---|---|
date | Thu, 27 Aug 2020 23:11:01 -0400 |
parents | ad564ab3cf7b |
children | 68fbdbe35f08 |
48:5a7a5b06bce0 | 49:35a912ce0c83 |
---|---|
1 *WARNING before you start* | 1 **Breaking news! Docker container is recommended as at August 2020** |
2 | 2 |
3 Install this tool on a private Galaxy ONLY | 3 A Docker container can be built - see the docker directory. |
4 It is highly recommended for isolation. It also has an integrated toolshed to allow installation of new tools back | |
5 into the Galaxy being used to generate them. | |
6 | |
7 Built from quay.io/bgruening/galaxy:20.05 but updates the | |
8 Galaxy code to the dev branch - it seems to work fine with updated bioblend>=0.14 | |
9 with planemo and the right version of gxformat2 needed by the ToolFactory (TF). | |
10 | |
11 The runclean.sh script run from the docker subdirectory of your local clone of this repository | |
12 should create a container (eventually) and serve it at localhost:8080 with a toolshed at | |
13 localhost:9009. | |
14 | |
15 Once it's up, please restart Galaxy in the container with | |
16 ```docker exec [container name] supervisorctl restart galaxy: ``` | |
17 Jobs just do not seem to run properly otherwise and the next steps won't work! | |
18 | |
19 The generated container includes a workflow and 2 sample data sets for the workflow | |
20 | |
21 Load the workflow. Adjust the inputs for each as labelled. The perl example counts GC in phiX.fasta. | |
22 The python scripts use the rgToolFactory.py as their input - any text file will work but I like the | |
23 recursion. The BWA example has some mitochondrial reads and reference. Run the workflow and watch. | |
24 This should fill the history with some sample tools you can rerun and play with. | |
25 Note that each new tool will have been tested using Planemo, in the workflow, in Galaxy. | |
26 Extremely cool to watch. | |
27 | |
28 *WARNING* | |
29 | |
30 Install this tool on a throw-away private Galaxy or Docker container ONLY | |
4 Please NEVER on a public or production instance | 31 Please NEVER on a public or production instance |
5 | |
6 Updated august 2014 by John Chilton adding citation support | |
7 | 32 |
8 Updated august 8 2014 to fix bugs reported by Marius van den Beek | 33 *Short Story* |
9 | 34 |
10 Please cite the resource at | 35 Galaxy is easily extended to new applications by adding a new tool. Each new scientific computational package added as |
11 http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref | 36 a tool to Galaxy requires some special instructions to be written. This is sometimes termed "wrapping" the package |
12 if you use this tool in your published work. | 37 because the instructions tell Galaxy how to run the package as a new Galaxy tool. Any tool in a Galaxy is |
38 readily available to all the users through a consistent and easy to use interface. | |
13 | 39 |
14 **Short Story** | 40 Most Galaxy tool wrappers have been manually prepared by skilled programmers, many using Planemo because it |
41 automates much of the basic boilerplate and makes the process much easier. The ToolFactory (TF) | |
42 uses Planemo under the hood for many functions, but hides the command | |
43 line complexities from the TF user. | |
15 | 44 |
16 This is an unusual Galaxy tool capable of generating new Galaxy tools. | 45 *More Explanation* |
17 It works by exposing *unrestricted* and therefore extremely dangerous scripting | 46 |
47 The TF is an unusual Galaxy tool, designed to allow a skilled user to make new Galaxy tools. | |
48 It appears in Galaxy just like any other tool but outputs include new Galaxy tools generated | |
49 using instructions provided by the user and the results of Planemo lint and tool testing using | |
50 small sample inputs provided by the TF user. The small samples become tests built in to the new tool. | |
51 | |
52 It offers a familiar Galaxy form driven way to define how the user of the new tool will | |
53 choose input data from their history, and what parameters the new tool user will be able to adjust. | |
54 The TF user must know, or be able to read, enough about the tool to be able to define the details of | |
55 the new Galaxy interface and the ToolFactory offers little guidance on that other than some examples. | |
56 | |
57 Tools always depend on other things. Most tools in Galaxy depend on third party | |
58 scientific packages, so TF tools usually have one or more dependencies. These can be | |
59 scientific packages such as BWA or scripting languages such as Python and are | |
60 usually managed by Conda. If the new tool relies on a system utility such as bash or awk | |
61 where the importance of version control on reproducibility is low, these can be used without | |
62 Conda management - but remember the potential risks of unmanaged dependencies on computational | |
63 reproducibility. | |
64 | |
65 The TF user can optionally supply a working script where scripting is | |
66 required and the chosen dependency is a scripting language such as Python or a system | |
67 scripting executable such as bash. Whatever the language, the script must correctly parse the command line | |
68 arguments it receives at tool execution, as they are defined by the TF user. The | |
69 text of that script is "baked in" to the new tool and will be executed each time | |
70 the new tool is run. It is highly recommended that scripts and their command lines be developed | |
71 and tested until proven to work before the TF is invoked. Galaxy as a software development | |
72 environment is actually possible, but not recommended, being somewhat clumsy and inefficient. | |
73 | |
74 Tools nearly always take one or more data sets from the user's history as input. TF tools | |
75 allow the TF user to define what Galaxy datatypes the tool end user will be able to choose and what | |
76 names or positions will be used to pass them on a command line to the package or script. | |
77 | |
78 Tools often have various parameter settings. The TF allows the TF user to define how each | |
79 parameter will appear on the tool form to the end user, and what names or positions will be | |
80 used to pass them on the command line to the package. At present, parameters are limited to | |
81 simple text and number fields. Pull requests for other kinds of parameters that galaxyxml | |
82 can handle are welcomed. | |
83 | |
84 Best practice Galaxy tools have one or more automated tests. These should use small sample data sets and | |
85 specific parameter settings so when the tool is tested, the outputs can be compared with their expected | |
86 values. The TF will automatically create a test for the new tool. It will use the sample data sets | |
87 chosen by the TF user when they built the new tool. | |
88 | |
89 The TF works by exposing *unrestricted* and therefore extremely dangerous scripting | |
18 to all designated administrators of the host Galaxy server, allowing them to | 90 to all designated administrators of the host Galaxy server, allowing them to |
19 run scripts in R, python, sh and perl over multiple selected input data sets, | 91 run scripts in R, python, sh and perl. For this reason, a Docker container is |
20 writing a single new data set as output. | 92 available to help manage the associated risks. |
21 | 93 |
22 *You have a working r/python/perl/bash script or any executable with positional or argparse style parameters* | 94 *Scripting uses* |
23 | 95 |
24 It can be turned into an ordinary Galaxy tool in minutes, using a Galaxy tool. | 96 To use a scripting language to create a new tool, you must first prepare and properly test a script. Use small sample |
97 data sets for testing. When the script is working correctly, upload the small sample datasets | |
98 into a new history, start configuring a new ToolFactory tool, and paste the script into the script text box on the TF form. | |
25 | 99 |
26 | 100 *Outputs* |
27 **Automated generation of new Galaxy tools for installation into any Galaxy** | |
28 | |
29 A test is generated using small sample test data inputs and parameter settings you supply. | |
30 Once the test case outputs have been produced, they can be used to build a | |
31 new Galaxy tool. The supplied script or executable is baked as a requirement | |
32 into a new, ordinary Galaxy tool, fully workflow compatible out of the box. | |
33 Generated tools are installed via a tool shed by an administrator | |
34 and work exactly like all other Galaxy tools for your users. | |
35 | |
36 **More Detail** | |
37 | |
38 To use the ToolFactory, you should have prepared a script to paste into a | |
39 text box, or have a package in mind and a small test input example ready to select from your history | |
40 to test your new script. | |
41 | |
42 ```planemo test rgToolFactory2.xml --galaxy_root ~/galaxy --test_data ~/galaxy/tools/tool_makers/toolfactory/test-data``` works for me | |
43 | |
44 There is an example in each scripting language on the Tool Factory form. You | |
45 can just cut and paste these to try it out - remember to select the right | |
46 interpreter please. You'll also need to create a small test data set using | |
47 the Galaxy history add new data tool. | |
48 | |
49 If the script fails somehow, use the "redo" button on the tool output in | |
50 your history to recreate the form complete with broken script. Fix the bug | |
51 and execute again. Rinse, wash, repeat. | |
52 | 101 |
53 Once the script runs successfully, a new Galaxy tool that runs your script | 102 Once the script runs successfully, a new Galaxy tool that runs your script |
54 can be generated. Select the "generate" option and supply some help text and | 103 can be generated. Select the "generate" option and supply some help text and |
55 names. The new tool will be generated in the form of a new Galaxy datatype | 104 names. The new tool will be generated in the form of a new Galaxy datatype |
56 *toolshed.gz* - as the name suggests, it's an archive ready to upload to a | 105 *tgz* - as the name suggests, it's an archive ready to upload to a |
57 Galaxy ToolShed as a new tool repository. | 106 Galaxy ToolShed as a new tool repository. |
107 | |
108 It is also possible to run a tool to generate test outputs, then test it | |
109 using planemo. A toolshed is built in to the Docker container and configured | |
110 so a tool can be tested, sent to that toolshed, then installed in the Galaxy | |
111 where the TF is running. | |
112 | |
113 If the tool requires a command or test XML override, then planemo is | |
114 needed to generate test outputs to make a complete tool, rerun to test | |
115 and if required upload to the local toolshed and install in the Galaxy | |
116 where the TF is running. | |
58 | 117 |
59 Once it's in a ToolShed, it can be installed into any local Galaxy server | 118 Once it's in a ToolShed, it can be installed into any local Galaxy server |
60 from the server administrative interface. | 119 from the server administrative interface. |
61 | 120 |
62 Once the new tool is installed, local users can run it - each time, the script | 121 Once the new tool is installed, local users can run it - each time, the |
63 that was supplied when it was built will be executed with the input chosen | 122 package and/or script that was supplied when it was built will be executed with the input chosen |
64 from the user's history. In other words, the tools you generate with the | 123 from the user's history, together with user supplied parameters. In other words, the tools you generate with the |
65 ToolFactory run just like any other Galaxy tool, but run your script every time. | 124 ToolFactory run just like any other Galaxy tool. |
66 | 125 |
67 Tool factory tools are perfect for workflow components. One input, one output, | 126 TF generated tools work as normal workflow components. |
68 no variables. | |
69 | 127 |
70 *To fully and safely exploit the awesome power* of this tool, | |
71 Galaxy and the ToolShed, you should be a developer installing this | |
72 tool on a private/personal/scratch local instance where you are an | |
73 admin_user. Then, if you break it, you get to keep all the pieces see | |
74 https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home | |
75 | 128 |
76 **Installation** | 129 *Limitations* |
77 This is a Galaxy tool. You can install it most conveniently using the | 130 |
131 The TF is flexible enough to generate wrappers for many common scientific packages | |
132 but the inbuilt automation will not cope with all possible situations. Users can | |
133 supply overrides for two tool XML segments, tests and command; the BWA | |
134 example in the supplied samples workflow illustrates their use. | |
135 | |
136 *Installation* | |
137 | |
138 The Docker container is the best way to use the TF because it is preconfigured | |
139 to automate new tool testing and has a built in local toolshed where each new tool | |
140 is uploaded. It is easy to install without Docker, but you will need to make some | |
141 configuration changes (TODO write a configuration). You can install it most conveniently using the | |
78 administrative "Search and browse tool sheds" link. Find the Galaxy Main | 142 administrative "Search and browse tool sheds" link. Find the Galaxy Main |
79 toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory | 143 toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory |
80 repository. Open it and review the code and select the option to install it. | 144 repository in the Tool Maker section. Open it and review the code and select the option to install it. |
81 | 145 |
82 If you can't get the tool that way, the xml and py files here need to be | 146 Otherwise, if not already there pending an accepted PR, |
83 copied into a new tools | |
84 subdirectory such as tools/toolfactory Your tool_conf.xml needs a new entry | |
85 pointing to the xml | |
86 file - something like:: | |
87 | |
88 <section name="Tool building tools" id="toolbuilders"> | |
89 <tool file="toolfactory/rgToolFactory.xml"/> | |
90 </section> | |
91 | |
92 If not already there, | |
93 please add: | 147 please add: |
94 <datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary" | 148 <datatype extension="tgz" type="galaxy.datatypes.binary:Binary" |
95 mimetype="multipart/x-gzip" subclass="True" /> | 149 mimetype="multipart/x-gzip" subclass="True" /> |
96 to your local data_types_conf.xml. | 150 to your local data_types_conf.xml. |
97 | 151 |
98 | 152 |
99 **Restricted execution** | 153 *Restricted execution* |
100 | 154 |
101 The tool factory tool itself will then be usable ONLY by admin users - | 155 The tool factory tool itself will then be usable ONLY by admin users - |
102 people with IDs in admin_users in universe_wsgi.ini **Yes, that's right. ONLY | 156 people with IDs in admin_users. **Yes, that's right. ONLY |
103 admin_users can run this tool** Think about it for a moment. If allowed to | 157 admin_users can run this tool** Think about it for a moment. If allowed to |
104 run any arbitrary script on your Galaxy server, the only thing that would | 158 run any arbitrary script on your Galaxy server, the only thing that would |
105 impede a miscreant bent on destroying all your Galaxy data would probably | 159 impede a miscreant bent on destroying all your Galaxy data would probably |
106 be lack of appropriate technical skills. | 160 be lack of appropriate technical skills. |
107 | |
108 **What it does** | |
109 | |
110 This is a tool factory for simple scripts in python, R and | |
111 perl currently. Functional tests are automatically generated. How cool is that. | |
112 | |
113 LIMITED to simple scripts that read one input from the history. Optionally can | |
114 write one new history dataset, and optionally collect any number of outputs | |
115 into links on an autogenerated HTML index page for the user to navigate - | |
116 useful if the script writes images and output files - pdf outputs are shown | |
117 as thumbnails and R's bloated pdf's are shrunk with ghostscript so that and | |
118 imagemagick need to be available. | |
119 | |
120 Generated tools can be edited and enhanced like any Galaxy tool, so start | |
121 small and build up since a generated script gets you a serious leg up to a | |
122 more complex one. | |
123 | |
124 **What you do** | |
125 | |
126 You paste and run your script, you fix the syntax errors and | |
127 eventually it runs. You can use the redo button and edit the script before | |
128 trying to rerun it as you debug - it works pretty well. | |
129 | |
130 Once the script works on some test data, you can generate a toolshed compatible | |
131 gzip file containing your script ready to run as an ordinary Galaxy tool in | |
132 a repository on your local toolshed. That means safe and largely automated | |
133 installation in any production Galaxy configured to use your toolshed. | |
134 | 161 |
135 **Generated tool Security** | 162 **Generated tool Security** |
136 | 163 |
137 Once you install a generated tool, it's just | 164 Once you install a generated tool, it's just |
138 another tool - assuming the script is safe. They just run normally and their | 165 another tool - assuming the script is safe. They just run normally and their |
139 user cannot do anything unusually insecure but please, practice safe toolshed. | 166 user cannot do anything unusually insecure but please, practice safe toolshed. |
140 Read the code before you install any tool. Especially this one - it is really scary. | 167 Read the code before you install any tool. Especially this one - it is really scary. |
141 | 168 |
142 **Send Code** | 169 **Send Code** |
143 | 170 |
144 Patches and suggestions welcome as bitbucket issues please? | 171 Pull requests and suggestions welcome as git issues please? |
145 | 172 |
146 **Attribution** | 173 **Attribution** |
147 | 174 |
148 Creating re-usable tools from scripts: The Galaxy Tool Factory | 175 Creating re-usable tools from scripts: The Galaxy Tool Factory |
149 Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team | 176 Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team |
158 | 185 |
159 All rights reserved. | 186 All rights reserved. |
160 | 187 |
161 Licensed under the LGPL | 188 Licensed under the LGPL |
162 | 189 |
163 **Obligatory screenshot** | |
164 | |
165 http://bitbucket.org/fubar/galaxytoolmaker/src/fda8032fe989/images/dynamicScriptTool.png | |
166 |
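
The new README text says that a supplied script "must correctly parse the command line arguments it receives at tool execution, as they are defined by the TF user", and that it should be developed and tested before the TF is invoked. As a minimal sketch of what such a script might look like, here is a Python GC counter in the spirit of the perl phiX example. The argument names `--input`, `--output` and `--decimals` are hypothetical, chosen for illustration; they would have to match whatever names the TF user defines on the form, and nothing here is taken from the ToolFactory code itself.

```python
import argparse
import sys


def gc_fraction(fasta_lines):
    """Return the G+C fraction over all sequence lines, skipping FASTA headers."""
    gc = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and header lines
        seq = line.upper()
        gc += seq.count("G") + seq.count("C")
        total += len(seq)
    return gc / total if total else 0.0


def main(argv=None):
    # Hypothetical argument names: they must match the names defined on the TF form.
    parser = argparse.ArgumentParser(description="Report GC fraction of a FASTA file")
    parser.add_argument("--input", required=True, help="FASTA file from the history")
    parser.add_argument("--output", required=True, help="path for the new history dataset")
    parser.add_argument("--decimals", type=int, default=4, help="decimal places to report")
    args = parser.parse_args(argv)
    with open(args.input) as handle:
        fraction = gc_fraction(handle)
    with open(args.output, "w") as out:
        out.write(f"{fraction:.{args.decimals}f}\n")
    return fraction


# Only run the command line entry point when arguments were actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Running a script like this by hand on a small sample file before pasting it into the TF form follows the README's advice to prove the script and its command line work first. For example, `python gc.py --input phiX.fasta --output gc.txt`, where `gc.py` and `gc.txt` are placeholder file names.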