# Breaking news! Docker container at https://github.com/fubar2/toolfactory-galaxy-docker recommended as at December 2020

## This ToolFactory is for docker use and is used in the new recommended Docker ToolFactory

## For non-Docker situations, use the ordinary ToolFactory https://github.com/fubar2/toolfactory

# WARNING

Install this tool to a throw-away private Galaxy or Docker container ONLY!

Please NEVER install it on a public or production instance, where a hostile user may
be able to gain access if they can acquire an administrative account login.

It only runs for server administrators. The ToolFactory tool will refuse to execute for an ordinary user, since
it can install new tools to the Galaxy server it executes on! This is not something you should allow other than
on a throw-away instance that is protected from potentially hostile users.

## Short Story

Galaxy is easily extended to new applications by adding a new tool. Each new scientific computational package added as
a tool to Galaxy requires an XML document describing how the application interacts with Galaxy.
This is sometimes termed "wrapping" the package because the instructions tell Galaxy how to run the package
as a new Galaxy tool. Once installed in the local Galaxy server, any wrapped tool is readily available to all users
through a consistent and easy-to-use interface.

Most Galaxy tool wrappers have been manually prepared by skilled programmers, many using Planemo because it
automates much of the boilerplate and makes the process much easier.
The ToolFactory (TF) now uses Planemo under the hood for testing, but hides the command
line complexities. The user still needs the skills to describe the interface between
Galaxy and the new application, but is helped by a Galaxy tool form that collects all the needed
settings, together with automated testing and uploading to a toolshed with optional local installation.

## More Explanation

The TF is an unusual Galaxy tool, designed to allow a skilled user to make new Galaxy tools.
It appears in Galaxy just like any other tool, but its outputs include new Galaxy tools generated
from instructions provided by the user, together with the results of Planemo lint and tool testing on
small sample inputs provided by the TF user. The small samples become tests built in to the new tool.

It offers a familiar Galaxy form-driven way to define how the user of the new tool will
choose input data from their history, and what parameters the new tool user will be able to adjust.
The TF user must know, or be able to read, enough about the tool to define the details of
the new Galaxy interface. The ToolFactory offers little guidance on that other than some examples.

Tools always depend on other things. Most tools in Galaxy depend on third-party
scientific packages, so TF tools usually have one or more dependencies. These can be
scientific packages such as BWA or scripting languages such as Python, and are
managed by Conda. If the new tool relies on a system utility such as bash or awk,
where version control matters less for reproducibility, it can be used without
Conda management. Remember, though, that unmanaged dependencies put computational
reproducibility at risk.

The TF user can optionally supply a working script where scripting is
required and the chosen dependency is a scripting language such as Python or a system
scripting executable such as bash. Whatever the language, the script must correctly parse the command line
arguments it receives at tool execution, as they are defined by the TF user. The
text of that script is "baked in" to the new tool and will be executed each time
the new tool is run. It is highly recommended that scripts and their command lines be developed
and tested until proven to work before the TF is invoked. Using Galaxy as a software development
environment is possible, but not recommended, because it is somewhat clumsy and inefficient.

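As a minimal sketch of the kind of script described above, the following Python code parses named command-line arguments and writes an output file. The argument names (`--input`, `--output`, `--head`) and the task itself are hypothetical; a real script must accept whatever names or positions were defined on the TF form.

```python
import argparse
import os
import tempfile


def parse_args(argv):
    """Parse the command line the generated tool would pass to this script."""
    parser = argparse.ArgumentParser(description="Copy the first N lines of a text file.")
    # Hypothetical argument names; they must match those defined on the TF form.
    parser.add_argument("--input", required=True, help="path to the input data set")
    parser.add_argument("--output", required=True, help="path for the output data set")
    parser.add_argument("--head", type=int, default=10, help="number of lines to keep")
    return parser.parse_args(argv)


def main(argv):
    """Copy at most --head lines from --input to --output."""
    args = parse_args(argv)
    with open(args.input) as src, open(args.output, "w") as dst:
        for line_number, line in enumerate(src):
            if line_number >= args.head:
                break
            dst.write(line)
    return 0


if __name__ == "__main__":
    # Demonstration only: a real script would end with sys.exit(main(sys.argv[1:])).
    with tempfile.TemporaryDirectory() as workdir:
        sample = os.path.join(workdir, "sample.txt")
        result = os.path.join(workdir, "result.txt")
        with open(sample, "w") as handle:
            handle.write("a\nb\nc\n")
        main(["--input", sample, "--output", result, "--head", "2"])
        with open(result) as handle:
            print(handle.read(), end="")
```

Testing a script like this from a shell with small sample files, before pasting it into the TF form, is the quickest way to confirm that the argument names and the defaults behave as intended.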
Tools nearly always take one or more data sets from the user's history as input. The TF
allows the TF user to define which Galaxy datatypes the tool end user will be able to choose and what
names or positions will be used to pass them on a command line to the package or script.

Tools often have various parameter settings. The TF allows the TF user to define how each
parameter will appear on the tool form to the end user, and what names or positions will be
used to pass them on the command line to the package. At present, parameters are limited to
simple text and number fields. Pull requests for other kinds of parameters that galaxyxml
can handle are welcome.

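The difference between passing values by name and by position can be sketched as follows. The helper `build_command`, the executable name and the parameter names are all hypothetical, and this is an illustration only, not how the TF itself assembles command lines.

```python
import shlex


def build_command(executable, values, style="named"):
    """Build an argument list from (name, value) pairs.

    style="named" yields --name value pairs; style="positional" yields
    the values alone, in the order given.
    """
    command = [executable]
    for name, value in values:
        if style == "named":
            command.extend([f"--{name}", str(value)])
        elif style == "positional":
            command.append(str(value))
        else:
            raise ValueError(f"unknown style: {style}")
    return command


if __name__ == "__main__":
    # Hypothetical settings: one input data set and one numeric parameter.
    settings = [("input", "sample data.txt"), ("head", 5)]
    print(shlex.join(build_command("myscript.py", settings, "named")))
    print(shlex.join(build_command("myscript.py", settings, "positional")))
```

With positional passing, the order of the values is the only thing that tells the script which is which, so the script and the form definition must agree on that order.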
Best practice Galaxy tools have one or more automated tests. These should use small sample data sets and
specific parameter settings so that, when the tool is tested, the outputs can be compared with their expected
values. The TF will automatically create a test for the new tool, using the sample data sets
chosen by the TF user when they built the new tool.

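The idea of comparing an output with its expected value can be sketched in plain Python. This only illustrates the comparison; it is not how Planemo or Galaxy implement tool tests, and the function name `compare_outputs` and its `allowed_diff_lines` tolerance are hypothetical.

```python
import difflib


def compare_outputs(actual_text, expected_text, allowed_diff_lines=0):
    """Compare a tool output with its expected value, line by line.

    Returns (passed, changed_lines); changed_lines holds the ndiff lines
    that were removed from ("- ") or added to ("+ ") the expected text.
    """
    changed_lines = [
        line
        for line in difflib.ndiff(expected_text.splitlines(), actual_text.splitlines())
        if line.startswith(("- ", "+ "))
    ]
    return len(changed_lines) <= allowed_diff_lines, changed_lines


if __name__ == "__main__":
    expected = "chr1\t100\nchr2\t200\n"
    passed, changed = compare_outputs("chr1\t100\nchr2\t250\n", expected)
    print(passed, changed)
```

Small, deterministic sample data keeps comparisons like this meaningful; outputs that embed dates or random values need a tolerance or a different kind of check.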
The TF works by exposing *unrestricted* and therefore extremely dangerous scripting
to all designated administrators of the host Galaxy server, allowing them to
run scripts in R, Python, sh and Perl. For this reason, a Docker container is
available to help manage the associated risks.

## Scripting uses

To use a scripting language to create a new tool, you must first prepare and properly test a script. Use small sample
data sets for testing. When the script is working correctly, upload the small sample data sets
into a new history, start configuring a new ToolFactory tool, and paste the script into the script text box on the TF form.

### Outputs

The TF will generate the new tool described on the TF form, and test it
using Planemo. Optionally, if a local toolshed is running, it can be used to
install the new tool back into the generating Galaxy.

A toolshed is built in to the Docker container and configured
so a tool can be tested, sent to that toolshed, then installed in the Galaxy
where the TF is running using the default toolshed and Galaxy URL and API keys.

Once the new tool is in a toolshed, it can be installed into any local Galaxy server
from the server administrative interface.

Once the new tool is installed, local users can run it. Each time, the
package and/or script that was supplied when it was built will be executed with the input chosen
from the user's history, together with user-supplied parameters. In other words, the tools you generate with the
TF run just like any other Galaxy tool.

TF-generated tools work as normal workflow components.

## Limitations

The TF is flexible enough to generate wrappers for many common scientific packages
but the inbuilt automation will not cope with all possible situations. Users can
supply overrides for two tool XML segments, tests and command, and the BWA
example in the supplied samples workflow illustrates their use. The TF does not (yet) deal with
repeated elements or conditional parameters, such as allowing a user to choose to see "simple"
or "advanced" parameters, and there will be plenty of packages it just
won't cover. It is still a quick and efficient tool for the other 90% of cases. Perfect for
that bash one-liner you need to get that workflow functioning correctly for this
afternoon's demonstration!

## Installation

The Docker container https://github.com/fubar2/toolfactory-galaxy-docker/blob/main/README.md
is the best way to use the TF because it is preconfigured
to automate new tool testing and has a built-in local toolshed where each new tool
is uploaded. If you grab the Docker container, it should just work after a restart and you
can run a workflow to generate all the sample tools. Running the samples and rerunning the ToolFactory
jobs that generated them allows you to add fields and experiment to see how things work.

The TF can be installed like any other tool from the Toolshed, but you will need to make some
configuration changes (TODO write a configuration). You can install it most conveniently using the
administrative "Search and browse tool sheds" link. Find the Galaxy Main
toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory
repository in the Tool Maker section. Open it, review the code, and select the option to install it.

If not already there, please add:

```
<datatype extension="tgz" type="galaxy.datatypes.binary:Binary" mimetype="multipart/x-gzip" subclass="True" />
```

to your local config/datatypes_conf.xml.

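A minimal sketch of making that change programmatically is shown here. It assumes the datatypes configuration file has a `<datatypes>` root containing a `<registration>` element with `<datatype>` children; the function name `ensure_tgz_datatype` is hypothetical. Back up the file first.

```python
import xml.etree.ElementTree as ET

# Attributes copied from the datatype line shown above.
TGZ_ATTRIBUTES = {
    "extension": "tgz",
    "type": "galaxy.datatypes.binary:Binary",
    "mimetype": "multipart/x-gzip",
    "subclass": "True",
}


def ensure_tgz_datatype(config_path):
    """Add the tgz datatype to the file at config_path if it is missing.

    Returns True if the file was changed, False if tgz was already registered.
    Assumes a <registration> element directly under the root element.
    """
    tree = ET.parse(config_path)
    registration = tree.getroot().find("registration")
    if registration is None:
        raise ValueError("no <registration> element found in " + config_path)
    for datatype in registration.findall("datatype"):
        if datatype.get("extension") == "tgz":
            return False
    ET.SubElement(registration, "datatype", TGZ_ATTRIBUTES)
    tree.write(config_path, encoding="utf-8", xml_declaration=True)
    return True
```

Note that `ElementTree` discards XML comments when parsing, so any comments in the file are lost on rewrite. For a heavily commented configuration file, editing by hand is the safer choice.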
## Restricted execution

The ToolFactory tool itself will ONLY run for admin users -
people with IDs listed in the "admin_users" setting of config/galaxy.yml.

*ONLY admin_users can run this tool*

That doesn't mean it's safe to install on a shared or exposed instance - please don't.

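The restriction amounts to a membership check against the `admin_users` setting. The sketch below illustrates that idea only and is not the ToolFactory's actual code. It assumes `admin_users` is a comma-separated list of account email addresses and compares them case-insensitively (also an assumption); the function name `is_admin_user` is hypothetical.

```python
def is_admin_user(user_email, admin_users_setting):
    """Return True if user_email appears in a comma-separated admin_users value."""
    admins = {
        entry.strip().lower()
        for entry in admin_users_setting.split(",")
        if entry.strip()
    }
    return user_email.strip().lower() in admins


if __name__ == "__main__":
    # Example addresses only.
    setting = "admin@example.org, builder@example.org"
    for email in ("builder@example.org", "visitor@example.org"):
        verdict = "may run" if is_admin_user(email, setting) else "is refused by"
        print(f"{email} {verdict} the ToolFactory")
```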
## Generated tool Security

Once you install a generated tool, it's just
another tool, assuming the script is safe. Generated tools just run normally and their
users cannot do anything unusually insecure but please, practice safe toolshed.
Read the code before you install any tool. Especially this one - it is really scary.

## Attribution

Creating re-usable tools from scripts: The Galaxy Tool Factory
Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573

http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref