**Breaking news! A Docker container is recommended as of August 2020**

A Docker container can be built - see the docker directory.
It is highly recommended for isolation. It also has an integrated toolshed so that newly generated tools can be
installed back into the Galaxy used to generate them.

The container is built from quay.io/bgruening/galaxy:20.05 but updates the
Galaxy code to the dev branch - it seems to work fine with updated bioblend>=0.14,
planemo and the right version of gxformat2 needed by the ToolFactory (TF).

The runclean.sh script, run from the docker subdirectory of your local clone of this repository,
should create a container (eventually) and serve it at localhost:8080 with a toolshed at
localhost:9009.

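Because the build can take a while, it may help to poll until Galaxy answers before going further. The following is a minimal sketch using only the Python standard library, assuming the ports quoted above. `http_ok` and `wait_until_up` are hypothetical helper names and are not part of this repository.

```python
import http.client
import time
import urllib.request


def http_ok(url, timeout=5):
    """Return True if a GET on url answers with HTTP 200, False on connection or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses; a half-started
        # server can also raise HTTPException on a malformed reply.
        return False


def wait_until_up(url, attempts=60, delay=10, probe=http_ok):
    """Call probe(url) up to `attempts` times, sleeping `delay` seconds after each failure."""
    for _ in range(attempts):
        if probe(url):
            return True
        time.sleep(delay)
    return False
```

For example, `wait_until_up("http://localhost:8080")` followed by `wait_until_up("http://localhost:9009")` would wait for Galaxy and then for the toolshed. The `probe` argument is only there so the loop can be exercised without a running server.
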
Once it's up, please restart Galaxy in the container with
```docker exec [container name] supervisorctl restart galaxy:```
Jobs just do not seem to run properly otherwise and the next steps won't work!

The generated container includes a workflow and 2 sample data sets for the workflow.

Load the workflow. Adjust the inputs for each step as labelled. The perl example counts GC in phiX.fasta.
The python scripts use rgToolFactory.py as their input - any text file will work but I like the
recursion. The BWA example has some mitochondrial reads and a reference. Run the workflow and watch.
This should fill the history with some sample tools you can rerun and play with.
Note that each new tool will have been tested using Planemo, in the workflow, in Galaxy.
Extremely cool to watch.

*WARNING*

Install this tool on a throw-away private Galaxy or Docker container ONLY.
Please NEVER install it on a public or production instance.

*Short Story*

Galaxy is easily extended to new applications by adding a new tool. Each new scientific computational package added as
a tool to Galaxy requires some special instructions to be written. This is sometimes termed "wrapping" the package
because the instructions tell Galaxy how to run the package as a new Galaxy tool. Any tool in a Galaxy is
readily available to all the users through a consistent and easy to use interface.

Most Galaxy tool wrappers have been manually prepared by skilled programmers, many using Planemo because it
automates much of the basic boilerplate and makes the process much easier. The ToolFactory (TF)
uses Planemo under the hood for many functions, but hides the command
line complexities from the TF user.

*More Explanation*

The TF is an unusual Galaxy tool, designed to allow a skilled user to make new Galaxy tools.
It appears in Galaxy just like any other tool, but its outputs include new Galaxy tools generated
using instructions provided by the user, together with the results of Planemo lint and tool testing using
small sample inputs provided by the TF user. The small samples become tests built in to the new tool.

It offers a familiar Galaxy form driven way to define how the user of the new tool will
choose input data from their history, and what parameters the new tool user will be able to adjust.
The TF user must know, or be able to read, enough about the tool to be able to define the details of
the new Galaxy interface, and the ToolFactory offers little guidance on that other than some examples.

Tools always depend on other things. Most tools in Galaxy depend on third party
scientific packages, so TF tools usually have one or more dependencies. These can be
scientific packages such as BWA or scripting languages such as Python and are
usually managed by Conda. If the new tool relies on a system utility such as bash or awk,
where the importance of version control for reproducibility is low, these can be used without
Conda management - but remember the potential risks of unmanaged dependencies to computational
reproducibility.

The TF user can optionally supply a working script where scripting is
required and the chosen dependency is a scripting language such as Python or a system
scripting executable such as bash. Whatever the language, the script must correctly parse the command line
arguments it receives at tool execution, as they are defined by the TF user. The
text of that script is "baked in" to the new tool and will be executed each time
the new tool is run. It is highly recommended that scripts and their command lines be developed
and tested until proven to work before the TF is invoked. Using Galaxy as a software development
environment is actually possible, but not recommended, being somewhat clumsy and inefficient.

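As an illustration of a script that parses its own command line, here is a minimal Python sketch that reports GC content for a FASTA file. It is not one of the supplied samples (the GC example in the workflow is written in perl), and the argument names `--input`, `--output` and `--digits` are assumptions chosen for this example. Whatever names are used must match those entered on the TF form.

```python
import argparse


def gc_fraction(fasta_lines):
    """Return the G+C fraction of the A/C/G/T bases, skipping FASTA header lines."""
    gc = total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue
        seq = line.strip().upper()
        gc += seq.count("G") + seq.count("C")
        total += sum(seq.count(base) for base in "ACGT")
    return gc / total if total else 0.0


def build_parser():
    # Hypothetical argument names; they must agree with the TF form.
    parser = argparse.ArgumentParser(description="Report GC content of a FASTA file")
    parser.add_argument("--input", required=True, help="FASTA file chosen from the history")
    parser.add_argument("--output", required=True, help="path where the result is written")
    parser.add_argument("--digits", type=int, default=4, help="decimal places to report")
    return parser


def main(argv=None):
    """Parse argv (sys.argv[1:] when None), compute the fraction and write it out."""
    args = build_parser().parse_args(argv)
    with open(args.input) as handle:
        fraction = gc_fraction(handle)
    with open(args.output, "w") as out:
        out.write(f"{fraction:.{args.digits}f}\n")
    return 0
```

To run it as a standalone script, call `main()` from the usual `if __name__ == "__main__":` guard and test it from a shell, for example with `--input phiX.fasta --output gc.txt`, before pasting it into the TF form.
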
Tools nearly always take one or more data sets from the user's history as input. The TF
allows the TF user to define what Galaxy datatypes the tool end user will be able to choose and what
names or positions will be used to pass them on a command line to the package or script.

Tools often have various parameter settings. The TF allows the TF user to define how each
parameter will appear on the tool form to the end user, and what names or positions will be
used to pass them on the command line to the package. At present, parameters are limited to
simple text and number fields. Pull requests for other kinds of parameters that galaxyxml
can handle are welcomed.

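The difference between passing values by name and by position can be sketched as follows. This is not the TF's own code. `build_command` is a hypothetical helper, and the `--name value` style is an assumption, since packages differ in how they spell their options.

```python
def build_command(executable, params):
    """Assemble an argument list from (key, value) pairs.

    String keys become '--key value' pairs in the order given.
    Integer keys are treated as positions and appended afterwards,
    sorted by position.
    """
    named = [(key, value) for key, value in params if isinstance(key, str)]
    positional = sorted(
        ((key, value) for key, value in params if isinstance(key, int)),
        key=lambda pair: pair[0],
    )
    command = [executable]
    for name, value in named:
        command += [f"--{name}", str(value)]
    command += [str(value) for _, value in positional]
    return command
```

With a made-up executable, `build_command("mytool", [(2, "reads.fq"), (1, "ref.fa"), ("threads", 4)])` gives `["mytool", "--threads", "4", "ref.fa", "reads.fq"]`. A script or package that expects positional arguments depends on that ordering, which is why it must be defined carefully on the form.
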
Best practice Galaxy tools have one or more automated tests. These should use small sample data sets and
specific parameter settings so when the tool is tested, the outputs can be compared with their expected
values. The TF will automatically create a test for the new tool. It will use the sample data sets
chosen by the TF user when they built the new tool.

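The idea of comparing a test output with its expected value can be sketched in a few lines of Python. This is only an illustration of the principle and not how Planemo or Galaxy implement it. `compare_outputs` and its `allowed_diff_lines` tolerance are hypothetical names.

```python
import difflib


def compare_outputs(expected_text, actual_text, allowed_diff_lines=0):
    """Return (passed, diff) where diff lists the lines that differ.

    Lines only in the expected text are prefixed '- ' and lines only
    in the actual text are prefixed '+ '.
    """
    diff = [
        line
        for line in difflib.ndiff(expected_text.splitlines(), actual_text.splitlines())
        if line.startswith(("- ", "+ "))
    ]
    return len(diff) <= allowed_diff_lines, diff
```

A small, deterministic sample input keeps this kind of comparison meaningful. Outputs that embed timestamps or random values would need a tolerance or a different check.
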
The TF works by exposing *unrestricted* and therefore extremely dangerous scripting
to all designated administrators of the host Galaxy server, allowing them to
run scripts in R, python, sh and perl. For this reason, a Docker container is
available to help manage the associated risks.

*Scripting uses*

To use a scripting language to create a new tool, you must first prepare and properly test a script. Use small sample
data sets for testing. When the script is working correctly, upload the small sample datasets
into a new history, start configuring a new ToolFactory tool, and paste the script into the script text box on the TF form.

*Outputs*

Once the script runs successfully, a new Galaxy tool that runs your script
can be generated. Select the "generate" option and supply some help text and
names. The new tool will be generated in the form of a new Galaxy datatype,
*tgz* - as the name suggests, it's an archive ready to upload to a
Galaxy ToolShed as a new tool repository.

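Before uploading a generated archive anywhere, it can be useful to look at what is inside it. A minimal sketch using Python's standard `tarfile` module follows. `list_archive` is a hypothetical helper, and the file layout inside a real TF archive is not assumed here.

```python
import tarfile


def list_archive(fileobj):
    """Return the sorted names of the regular files in a gzipped tar archive."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        return sorted(member.name for member in archive.getmembers() if member.isfile())
```

Called as `list_archive(open("mytool.tgz", "rb"))`, where `mytool.tgz` stands for whatever archive you downloaded from the history, this lists the tool XML, scripts and test data so they can be reviewed.
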
It is also possible to run a tool to generate test outputs, then test it
using planemo. A toolshed is built in to the Docker container and configured
so a tool can be tested, sent to that toolshed, then installed in the Galaxy
where the TF is running.

If the tool requires a command or test XML override, then planemo is
needed to generate test outputs to make a complete tool, which is then rerun to test
and, if required, uploaded to the local toolshed and installed in the Galaxy
where the TF is running.

Once it's in a ToolShed, it can be installed into any local Galaxy server
from the server administrative interface.

Once the new tool is installed, local users can run it - each time, the
package and/or script that was supplied when it was built will be executed with the input chosen
from the user's history, together with user supplied parameters. In other words, the tools you generate with the
ToolFactory run just like any other Galaxy tool.

TF generated tools work as normal workflow components.

*Limitations*

The TF is flexible enough to generate wrappers for many common scientific packages
but the inbuilt automation will not cope with all possible situations. Users can
supply overrides for two tool XML segments - tests and command - and the BWA
example in the supplied samples workflow illustrates their use.

*Installation*

The Docker container is the best way to use the TF because it is preconfigured
to automate new tool testing and has a built in local toolshed where each new tool
is uploaded. If you grab the docker container, it should just work.

If you build the container, there are some things to watch out for. Let it run for 10 minutes
or so once you build it - check with top until conda has finished fussing. Once everything quietens
down, find the container with
```docker ps```
and use
```docker exec [containername] supervisorctl restart galaxy:```
That colon is not a typographical mistake.
Not restarting after first boot seems to leave the job/workflow system confused and the workflow
just will not run properly until Galaxy has restarted.

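If you prefer to script those two steps, the following Python sketch wraps the same `docker ps` and `docker exec` commands with `subprocess`. The function names are hypothetical, and it assumes the `docker` command line client is on your PATH and that you know which container name belongs to the TF.

```python
import subprocess


def running_container_names():
    """Return the names of running containers, as reported by `docker ps`."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


def restart_galaxy_command(container_name):
    """Build the restart command; the trailing colon on 'galaxy:' is deliberate."""
    return ["docker", "exec", container_name, "supervisorctl", "restart", "galaxy:"]


def restart_galaxy(container_name):
    """Run the restart inside the container, raising CalledProcessError on failure."""
    subprocess.run(restart_galaxy_command(container_name), check=True)
```

Building the command as a list avoids shell quoting problems with container names, and keeps the deliberate trailing colon in one place.
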
Login as admin@galaxy.org with password "password". Feel free to change it once you are logged in.
There should be a companion toolshed at localhost:9009. The history should have some sample data for
the workflow.

Run the workflow and make sure the right dataset is selected for each of the input files. Most of the
examples use text files so should run, but the bwa example needs the right ones to work properly.

When the workflow is finished, you will have half a dozen examples to rerun and play with. They have also
all been tested and installed so you should find them in your tool menu under "Generated Tools".

It is easy to install without Docker, but you will need to make some
configuration changes (TODO write a configuration). You can install it most conveniently using the
administrative "Search and browse tool sheds" link. Find the Galaxy Main
toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory
repository in the Tool Maker section. Open it, review the code, and select the option to install it.

Otherwise, if not already there pending an accepted PR,
please add:
<datatype extension="tgz" type="galaxy.datatypes.binary:Binary"
mimetype="multipart/x-gzip" subclass="True" />
to your local datatypes_conf.xml.

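Checking for the entry and adding it when missing can also be scripted. The sketch below uses Python's standard `xml.etree.ElementTree` and assumes the usual layout of a `<datatypes>` root containing a `<registration>` element. `ensure_tgz_datatype` is a hypothetical helper, not part of Galaxy or the TF.

```python
import xml.etree.ElementTree as ET

# Attributes copied from the datatype element quoted above.
TGZ_ATTRIBUTES = {
    "extension": "tgz",
    "type": "galaxy.datatypes.binary:Binary",
    "mimetype": "multipart/x-gzip",
    "subclass": "True",
}


def ensure_tgz_datatype(xml_text):
    """Return (xml_text, changed), adding the tgz datatype if it is absent."""
    root = ET.fromstring(xml_text)
    registration = root.find("registration")
    if registration is None:
        raise ValueError("no <registration> element found")
    for datatype in registration.findall("datatype"):
        if datatype.get("extension") == "tgz":
            return xml_text, False
    ET.SubElement(registration, "datatype", TGZ_ATTRIBUTES)
    return ET.tostring(root, encoding="unicode"), True
```

Note that ElementTree drops XML comments when parsing, so for a heavily commented configuration file it is safer to use this only as a check and make the edit by hand. Keep a backup either way.
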
*Restricted execution*

The tool factory tool itself will then be usable ONLY by admin users -
people with IDs in admin_users. **Yes, that's right. ONLY
admin_users can run this tool.** Think about it for a moment. If allowed to
run any arbitrary script on your Galaxy server, the only thing that would
impede a miscreant bent on destroying all your Galaxy data would probably
be lack of appropriate technical skills.

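The kind of check involved can be sketched as below. It assumes `admin_users` is a comma separated list of user emails, as in the Galaxy configuration. The function is a hypothetical illustration, and the real check in Galaxy and the TF may differ in detail.

```python
def is_admin(user_email, admin_users):
    """Return True if user_email appears in the comma separated admin_users string."""
    admins = {email.strip() for email in admin_users.split(",") if email.strip()}
    return user_email.strip() in admins
```

For example, `is_admin("admin@galaxy.org", "admin@galaxy.org, other@example.org")` is true, while any email not in the list is refused.
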
**Generated tool Security**

Once you install a generated tool, it's just
another tool - assuming the script is safe. Generated tools just run normally and their
users cannot do anything unusually insecure but please, practice safe toolshed.
Read the code before you install any tool. Especially this one - it is really scary.

**Send Code**

Pull requests and suggestions are welcome as git issues, please.

**Attribution**

Creating re-usable tools from scripts: The Galaxy Tool Factory
Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573

http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref

**Licensing**

Copyright Ross Lazarus 2010
ross lazarus at g mail period com

All rights reserved.

Licensed under the LGPL