comparison toolfactory_docker/README.md @ 2:a5c5652823a6 draft

Uploaded
author fubar
date Tue, 05 Jan 2021 00:35:40 +0000
1:0778fb523693 2:a5c5652823a6
1 # Breaking news! Docker container at https://github.com/fubar2/toolfactory-galaxy-docker recommended as of December 2020
2
3 ## This ToolFactory is for Docker use and is used in the new recommended Docker ToolFactory
4
5 ## For non-Docker situations, use the ordinary ToolFactory https://github.com/fubar2/toolfactory
6
7
8 # WARNING
9
10 Install this tool to a throw-away private Galaxy or Docker container ONLY!
11
12 Please NEVER install it on a public or production instance where a hostile user may
13 be able to gain access by acquiring an administrative account login.
14
15 It only runs for server administrators - the ToolFactory tool will refuse to execute for an ordinary user since
16 it can install new tools to the Galaxy server it executes on! This is not something you should allow other than
17 on a throw-away instance that is protected from potentially hostile users.
18
19 ## Short Story
20
21 Galaxy is easily extended to new applications by adding a new tool. Each new scientific computational package added as
22 a tool to Galaxy requires an XML document describing how the application interacts with Galaxy.
23 This is sometimes termed "wrapping" the package because the instructions tell Galaxy how to run the package
24 as a new Galaxy tool. Any tool that has been wrapped is readily available to all the users through a consistent
25 and easy to use interface once installed in the local Galaxy server.
26
27 Most Galaxy tool wrappers have been manually prepared by skilled programmers, many using Planemo because it
28 automates much of the boilerplate and makes the process much easier.
29 The ToolFactory (TF) now uses Planemo under the hood for testing, but hides the command
30 line complexities. The user will still need appropriate skills in terms of describing the interface between
31 Galaxy and the new application, but will be helped by a Galaxy tool form to collect all the needed
32 settings, together with automated testing and uploading to a toolshed with optional local installation.
33
34 ## More Explanation
35
36 The TF is an unusual Galaxy tool, designed to allow a skilled user to make new Galaxy tools.
37 It appears in Galaxy just like any other tool but outputs include new Galaxy tools generated
38 using instructions provided by the user and the results of Planemo lint and tool testing using
39 small sample inputs provided by the TF user. The small samples become tests built in to the new tool.
40
41 It offers a familiar Galaxy form driven way to define how the user of the new tool will
42 choose input data from their history, and what parameters the new tool user will be able to adjust.
43 The TF user must know, or be able to read, enough about the tool to be able to define the details of
44 the new Galaxy interface and the ToolFactory offers little guidance on that other than some examples.
45
46 Tools always depend on other things. Most tools in Galaxy depend on third party
47 scientific packages, so TF tools usually have one or more dependencies. These can be
48 scientific packages such as BWA or scripting languages such as Python and are
49 managed by Conda. If the new tool relies on a system utility such as bash or awk
50 where the importance of version control on reproducibility is low, these can be used without
51 Conda management - but remember the potential risks of unmanaged dependencies on computational
52 reproducibility.
53
54 The TF user can optionally supply a working script where scripting is
55 required and the chosen dependency is a scripting language such as Python or a system
56 scripting executable such as bash. Whatever the language, the script must correctly parse the command line
57 arguments it receives at tool execution, as they are defined by the TF user. The
58 text of that script is "baked in" to the new tool and will be executed each time
59 the new tool is run. It is highly recommended that scripts and their command lines be developed
60 and tested until proven to work before the TF is invoked. Using Galaxy as a software development
61 environment is possible, but not recommended because it is somewhat clumsy and inefficient.
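
As a minimal sketch of a script that parses its own command line, assume a hypothetical tool that reverses each line of a text file. The argument names `--input`, `--output` and `--prefix` are illustrative only; the real names or positions are whatever the TF user defines on the form, and the script must match them.

```python
import argparse
import sys


def parse_args(argv):
    """Parse the (hypothetical) arguments this script expects."""
    parser = argparse.ArgumentParser(description="Reverse each line of a text file.")
    parser.add_argument("--input", required=True, help="path to the input text file")
    parser.add_argument("--output", required=True, help="path to write the result to")
    parser.add_argument("--prefix", default="", help="optional text added before each line")
    return parser.parse_args(argv)


def reverse_lines(text, prefix=""):
    """Return the text with every line reversed and prefixed."""
    return "\n".join(prefix + line[::-1] for line in text.splitlines())


def main(argv=None):
    """Read the input file, transform it, and write the output file."""
    args = parse_args(sys.argv[1:] if argv is None else argv)
    with open(args.input) as handle:
        result = reverse_lines(handle.read(), args.prefix)
    with open(args.output, "w") as handle:
        handle.write(result + "\n")

# In a real script, finish with a call to main() so it runs when executed.
```

Keeping the parsing in a function that accepts an argument list makes it easy to test the script outside Galaxy with small sample files before pasting it into the TF form.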
62
63 Tools nearly always take one or more data sets from the user's history as input. TF tools
64 allow the TF user to define what Galaxy datatypes the tool end user will be able to choose and what
65 names or positions will be used to pass them on a command line to the package or script.
66
67 Tools often have various parameter settings. The TF allows the TF user to define how each
68 parameter will appear on the tool form to the end user, and what names or positions will be
69 used to pass them on the command line to the package. At present, parameters are limited to
70 simple text and number fields. Pull requests for other kinds of parameters that galaxyxml
71 can handle are welcomed.
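
The difference between passing a parameter by name and by position can be illustrated with a small sketch. This is not the TF's own code: the function `build_command`, the tuple layout, and the `--name value` style are assumptions made purely for illustration, and the generated tool may format its command line differently.

```python
import shlex


def build_command(executable, params):
    """Build a command line from (name, value, positional) tuples.

    Positional parameters are appended as bare values in the order given;
    named parameters are passed as "--name value". (Illustrative convention.)
    """
    parts = [executable]
    for name, value, positional in params:
        if positional:
            parts.append(str(value))
        else:
            parts.extend([f"--{name}", str(value)])
    # shlex.join quotes any part that contains spaces or shell metacharacters.
    return shlex.join(parts)


# Hypothetical parameters: one positional input file and one named number.
print(build_command("mytool", [("input", "in.txt", True), ("threshold", 0.5, False)]))
```

Whichever convention is chosen on the form, the wrapped package or script has to expect exactly that convention.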
72
73 Best practice Galaxy tools have one or more automated tests. These should use small sample data sets and
74 specific parameter settings so when the tool is tested, the outputs can be compared with their expected
75 values. The TF will automatically create a test for the new tool. It will use the sample data sets
76 chosen by the TF user when they built the new tool.
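
The idea behind such a test, comparing what the tool produced with a stored expected output, can be sketched as follows. Planemo and Galaxy perform their own comparison; the helper `diff_outputs` below is a hypothetical stand-in that only shows the principle.

```python
import difflib


def diff_outputs(expected_text, actual_text):
    """Return unified-diff lines; an empty list means the outputs match."""
    return list(
        difflib.unified_diff(
            expected_text.splitlines(),
            actual_text.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )


# A test passes when the diff is empty.
differences = diff_outputs("chr1\t100\nchr2\t200\n", "chr1\t100\nchr2\t200\n")
print("PASS" if not differences else "\n".join(differences))
```

Small sample data sets keep this kind of comparison fast and make any reported difference easy to read.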
77
78 The TF works by exposing *unrestricted* and therefore extremely dangerous scripting
79 to all designated administrators of the host Galaxy server, allowing them to
80 run scripts in R, Python, sh and Perl. For this reason, a Docker container is
81 available to help manage the associated risks.
82
83 ## Scripting uses
84
85 To use a scripting language to create a new tool, you must first prepare and properly test a script. Use small sample
86 data sets for testing. When the script is working correctly, upload the small sample datasets
87 into a new history, start configuring a new ToolFactory tool, and paste the script into the script text box on the TF form.
88
89 ### Outputs
90
91 The TF will generate the new tool described on the TF form, and test it
92 using Planemo. Optionally, if a local toolshed is running, it can be used to
93 install the new tool back into the generating Galaxy.
94
95 A toolshed is built in to the Docker container and configured
96 so a tool can be tested, sent to that toolshed, then installed in the Galaxy
97 where the TF is running using the default toolshed and Galaxy URL and API keys.
98
99 Once it's in a ToolShed, it can be installed into any local Galaxy server
100 from the server administrative interface.
101
102 Once the new tool is installed, local users can run it - each time, the
103 package and/or script that was supplied when it was built will be executed with the input chosen
104 from the user's history, together with user supplied parameters. In other words, the tools you generate with the
105 TF run just like any other Galaxy tool.
106
107 TF generated tools work as normal workflow components.
108
109
110 ## Limitations
111
112 The TF is flexible enough to generate wrappers for many common scientific packages
113 but the inbuilt automation will not cope with all possible situations. Users can
114 supply overrides for two tool XML segments, tests and command; the BWA
115 example in the supplied samples workflow illustrates their use. The TF does not yet deal with
116 repeated elements or conditional parameters, such as allowing a user to choose to see "simple"
117 or "advanced" parameters, and there will be plenty of packages it just
118 won't cover - but it's a quick and efficient tool for the other 90% of cases. Perfect for
119 that bash one-liner you need to get that workflow functioning correctly for this
120 afternoon's demonstration!
121
122 ## Installation
123
124 The Docker container https://github.com/fubar2/toolfactory-galaxy-docker/blob/main/README.md
125 is the best way to use the TF because it is preconfigured
126 to automate new tool testing and has a built in local toolshed where each new tool
127 is uploaded. If you grab the docker container, it should just work after a restart and you
128 can run a workflow to generate all the sample tools. Running the samples and rerunning the ToolFactory
129 jobs that generated them allows you to add fields and experiment to see how things work.
130
131 The TF can also be installed like any other tool from the Toolshed, but you will need to make some
132 configuration changes (TODO write a configuration). You can install it most conveniently using the
133 administrative "Search and browse tool sheds" link. Find the Galaxy Main
134 toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory
135 repository in the Tool Maker section. Open it and review the code and select the option to install it.
136
137 If it is not already there, please add:
138
139 ```
140 <datatype extension="tgz" type="galaxy.datatypes.binary:Binary" mimetype="multipart/x-gzip" subclass="True" />
141 ```
142
143 to your local config/datatypes_conf.xml.
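
To check whether the entry is already present before editing, a minimal sketch along these lines may help. It assumes the datatypes configuration is well-formed XML with `datatype` elements carrying an `extension` attribute, as in the line above; the function name `has_datatype` is hypothetical, and you would read the text from your own configuration file.

```python
import xml.etree.ElementTree as ET


def has_datatype(conf_xml_text, extension="tgz"):
    """Return True if the config text registers a datatype with this extension."""
    root = ET.fromstring(conf_xml_text)
    return any(node.get("extension") == extension for node in root.iter("datatype"))


# Tiny stand-in for a real configuration file (structure assumed for illustration).
sample = """
<datatypes>
  <registration>
    <datatype extension="tgz" type="galaxy.datatypes.binary:Binary" mimetype="multipart/x-gzip" subclass="True" />
  </registration>
</datatypes>
"""
print(has_datatype(sample))
```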
144
145
146 ## Restricted execution
147
148 The tool factory tool itself will ONLY run for admin users -
149 people with IDs in config/galaxy.yml "admin_users".
150
151 *ONLY admin_users can run this tool*
152
153 That doesn't mean it's safe to install on a shared or exposed instance - please don't.
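
The kind of check involved can be sketched as below. This is not the ToolFactory's actual code: it assumes `admin_users` is a comma-separated list of account email addresses, and the function name `is_admin` and the sample addresses are hypothetical.

```python
def is_admin(user_email, admin_users):
    """Return True if user_email appears in a comma-separated admin_users string."""
    admins = {entry.strip() for entry in admin_users.split(",") if entry.strip()}
    return user_email.strip() in admins


# Hypothetical setting and users, for illustration only.
admin_users = "admin@example.org, dev@example.org"
for email in ("dev@example.org", "visitor@example.org"):
    print(email, "may run the TF" if is_admin(email, admin_users) else "is refused")
```

A check like this only limits who can start the tool; it does nothing to contain what an administrator's script can do once it runs, which is why a throw-away instance is still required.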
154
155 ## Generated tool Security
156
157 Once you install a generated tool, it's just
158 another tool - assuming the script is safe. Generated tools run normally and their
159 users cannot do anything unusually insecure but please, practice safe toolshed.
160 Read the code before you install any tool. Especially this one - it is really scary.
161
162 ## Attribution
163
164 Creating re-usable tools from scripts: The Galaxy Tool Factory
165 Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
166 Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573
167
168 http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref
169