annotate toolfactory/README.md @ 0:83f8bb78781e draft

Uploaded
author fubar
date Fri, 11 Dec 2020 02:51:15 +0000
**Breaking news! Docker container is recommended as at August 2020**

A Docker container can be built - see the docker directory.
It is highly recommended for isolation. It also has an integrated toolshed, so new tools can be
installed back into the Galaxy being used to generate them.

The container is built from quay.io/bgruening/galaxy:20.05 but updates the
Galaxy code to the dev branch. It seems to work fine with bioblend>=0.14,
planemo and the version of gxformat2 needed by the ToolFactory (TF).

The runclean.sh script, run from the docker subdirectory of your local clone of this repository,
should (eventually) create a container and serve Galaxy at localhost:8080 with a toolshed at
localhost:9009.

Once it's up, please restart Galaxy in the container with
```docker exec [container name] supervisorctl restart galaxy:```
Jobs just do not seem to run properly otherwise and the next steps won't work!

The generated container includes a workflow and 2 sample data sets for the workflow.

Load the workflow. Adjust the inputs for each step as labelled. The perl example counts GC in phiX.fasta.
The python scripts use rgToolFactory.py as their input - any text file will work but I like the
recursion. The BWA example has some mitochondrial reads and reference. Run the workflow and watch.
This should fill the history with some sample tools you can rerun and play with.
Note that each new tool will have been tested using Planemo, in the workflow, in Galaxy.
Extremely cool to watch.

*WARNING*

Install this tool on a throw-away private Galaxy or Docker container ONLY.
Please NEVER install it on a public or production instance.

*Short Story*

Galaxy is easily extended to new applications by adding a new tool. Each new scientific computational package added as
a tool to Galaxy requires some special instructions to be written. This is sometimes termed "wrapping" the package
because the instructions tell Galaxy how to run the package as a new Galaxy tool. Any tool in a Galaxy is
readily available to all the users through a consistent and easy-to-use interface.

Most Galaxy tool wrappers have been manually prepared by skilled programmers, many using Planemo because it
automates much of the basic boilerplate and makes the process much easier. The ToolFactory (TF)
uses Planemo under the hood for many functions, but hides the command
line complexities from the TF user.

*More Explanation*

The TF is an unusual Galaxy tool, designed to allow a skilled user to make new Galaxy tools.
It appears in Galaxy just like any other tool, but its outputs include new Galaxy tools generated
using instructions provided by the user, together with the results of Planemo lint and tool testing on
small sample inputs provided by the TF user. The small samples become tests built in to the new tool.

It offers a familiar Galaxy form-driven way to define how the user of the new tool will
choose input data from their history, and what parameters the new tool user will be able to adjust.
The TF user must know, or be able to read, enough about the tool to be able to define the details of
the new Galaxy interface. The ToolFactory offers little guidance on that other than some examples.

Tools always depend on other things. Most tools in Galaxy depend on third party
scientific packages, so TF tools usually have one or more dependencies. These can be
scientific packages such as BWA or scripting languages such as Python and are
usually managed by Conda. If the new tool relies on a system utility such as bash or awk,
where version control matters less for reproducibility, it can be used without
Conda management - but remember the potential risks of unmanaged dependencies on computational
reproducibility.

The TF user can optionally supply a working script where scripting is
required and the chosen dependency is a scripting language such as Python or a system
scripting executable such as bash. Whatever the language, the script must correctly parse the command line
arguments it receives at tool execution, as they are defined by the TF user. The
text of that script is "baked in" to the new tool and will be executed each time
the new tool is run. It is highly recommended that scripts and their command lines be developed
and tested until proven to work before the TF is invoked. Using Galaxy as a software development
environment is possible, but not recommended because it is somewhat clumsy and inefficient.

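As a rough illustration of a script that parses its own command line, here is a minimal Python sketch in the spirit of the GC counting example. The argument names `--input_fasta` and `--output_tab` are hypothetical; whatever names or positions you use must match those you define on the TF form.

```python
import argparse


def parse_args(argv):
    """Parse the named arguments the generated tool is assumed to pass."""
    parser = argparse.ArgumentParser(description="Report the GC fraction of a FASTA file")
    # Hypothetical argument names: they must match what is entered on the TF form.
    parser.add_argument("--input_fasta", required=True)
    parser.add_argument("--output_tab", required=True)
    return parser.parse_args(argv)


def gc_fraction(lines):
    """Return the fraction of G and C bases over all sequence lines."""
    gc = total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        seq = line.upper()
        gc += seq.count("G") + seq.count("C")
        total += len(seq)
    return gc / total if total else 0.0


def main(argv):
    """Read the input FASTA and write a one-line tabular report."""
    args = parse_args(argv)
    with open(args.input_fasta) as fasta:
        fraction = gc_fraction(fasta)
    with open(args.output_tab, "w") as out:
        out.write(f"gc_fraction\t{fraction:.4f}\n")


# A real script would finish by calling main(sys.argv[1:]) after importing sys.
```

Keeping the parsing in its own function makes it easy to test the script with an explicit argument list before pasting it into the TF form.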
Tools nearly always take one or more data sets from the user's history as input. TF tools
allow the TF user to define what Galaxy datatypes the tool end user will be able to choose and what
names or positions will be used to pass them on a command line to the package or script.

Tools often have various parameter settings. The TF allows the TF user to define how each
parameter will appear on the tool form to the end user, and what names or positions will be
used to pass them on the command line to the package. At present, parameters are limited to
simple text and number fields. Pull requests for other kinds of parameters that galaxyxml
can handle are welcome.

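The difference between passing values by name and by position can be sketched as follows. This is only an illustration of the idea, not the TF's own code; the function name, the `--name value` style and the tuple layout are all assumptions made for the example.

```python
def build_command(executable, params):
    """Turn (name, value, position) entries into an argument list.

    Entries with a name become "--name value" pairs. Entries without a name
    are placed in order of their integer position, straight after the executable.
    """
    named = []
    positional = []
    for name, value, position in params:
        if name:
            named.extend([f"--{name}", str(value)])
        else:
            positional.append((position, str(value)))
    ordered = [value for _, value in sorted(positional)]
    return [executable] + ordered + named
```

For example, `build_command("myscript.py", [(None, "out.tab", 2), (None, "in.txt", 1), ("count", 4, None)])` gives `["myscript.py", "in.txt", "out.tab", "--count", "4"]`.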
Best practice Galaxy tools have one or more automated tests. These should use small sample data sets and
specific parameter settings so when the tool is tested, the outputs can be compared with their expected
values. The TF will automatically create a test for the new tool. It will use the sample data sets
chosen by the TF user when they built the new tool.

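The idea of comparing a test output with its expected value can be sketched in a few lines of Python. Planemo and Galaxy do their own comparison; the function below is a hypothetical stand-in that only shows the principle, using the standard library's difflib.

```python
import difflib


def output_matches(expected_text, actual_text):
    """Return (ok, diff) where diff lists the lines that differ."""
    diff = list(
        difflib.unified_diff(
            expected_text.splitlines(),
            actual_text.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )
    return (not diff, diff)
```

An empty diff means the tool reproduced the expected output for the sample data and parameter settings used in the test.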
The TF works by exposing *unrestricted* and therefore extremely dangerous scripting
to all designated administrators of the host Galaxy server, allowing them to
run scripts in R, python, sh and perl. For this reason, a Docker container is
available to help manage the associated risks.

*Scripting uses*

To use a scripting language to create a new tool, you must first prepare and properly test a script. Use small sample
data sets for testing. When the script is working correctly, upload the small sample datasets
into a new history, start configuring a new ToolFactory tool, and paste the script into the script text box on the TF form.

*Outputs*

Once the script runs successfully, a new Galaxy tool that runs your script
can be generated. Select the "generate" option and supply some help text and
names. The new tool will be generated in the form of a new Galaxy datatype
*tgz* - as the name suggests, it's an archive ready to upload to a
Galaxy ToolShed as a new tool repository.

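If you want to check what is inside a generated archive before uploading it, Python's tarfile module can read it. The sketch below works on bytes in memory so it is self-contained; the file names in the usage note are hypothetical and not necessarily what the TF writes.

```python
import io
import tarfile


def make_archive(files):
    """Build a gzipped tar archive in memory from a {name: text} mapping."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_archive(tgz_bytes):
    """Return the sorted member names of a gzipped tar archive held in memory."""
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as archive:
        return sorted(archive.getnames())
```

For a downloaded archive, `list_archive(open("mytool.tgz", "rb").read())` would list its members, where `mytool.tgz` is whatever name you saved the history item under.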
It is also possible to run a tool to generate test outputs, then test it
using planemo. A toolshed is built in to the Docker container and configured
so a tool can be tested, sent to that toolshed, then installed in the Galaxy
where the TF is running.

If the tool requires a command or test XML override, then planemo is
needed to generate the test outputs that complete the tool, to rerun the test
and, if required, to upload the tool to the local toolshed and install it in the Galaxy
where the TF is running.

Once it's in a ToolShed, it can be installed into any local Galaxy server
from the server administrative interface.

Once the new tool is installed, local users can run it - each time, the
package and/or script that was supplied when it was built will be executed with the input chosen
from the user's history, together with user-supplied parameters. In other words, the tools you generate with the
ToolFactory run just like any other Galaxy tool.

TF generated tools work as normal workflow components.

*Limitations*

The TF is flexible enough to generate wrappers for many common scientific packages
but the inbuilt automation will not cope with all possible situations. Users can
supply overrides for two tool XML segments - tests and command. The BWA
example in the supplied samples workflow illustrates their use.

*Installation*

The Docker container is the best way to use the TF because it is preconfigured
to automate new tool testing and has a built in local toolshed where each new tool
is uploaded. If you grab the docker container, it should just work.

If you build the container, there are some things to watch out for. Let it run for 10 minutes
or so once you build it - check with top until conda has finished fussing. Once everything quietens
down, find the container with
```docker ps```
and use
```docker exec [containername] supervisorctl restart galaxy:```
That colon is not a typographical mistake.
Not restarting after first boot seems to leave the job/workflow system confused and the workflow
just will not run properly until Galaxy has restarted.

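If you prefer to script the restart, the same command can be issued from Python. This is a minimal sketch that assumes docker is on your PATH and that you already know the container name from `docker ps`; the function names are made up for the example.

```python
import subprocess


def restart_command(container_name):
    """Return the argument list for restarting Galaxy inside the container."""
    # The trailing colon on "galaxy:" is part of the command, not a typo.
    return ["docker", "exec", container_name, "supervisorctl", "restart", "galaxy:"]


def restart_galaxy(container_name):
    """Run the restart; raises subprocess.CalledProcessError if it fails."""
    subprocess.run(restart_command(container_name), check=True)
```

Passing the command as a list avoids shell quoting problems if the container name contains unusual characters.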
Log in as admin@galaxy.org with password "password". Feel free to change it once you are logged in.
There should be a companion toolshed at localhost:9009. The history should have some sample data for
the workflow.

Run the workflow and make sure the right dataset is selected for each of the input files. Most of the
examples use text files so should run with any of them, but the bwa example needs the right ones to work properly.

When the workflow is finished, you will have half a dozen examples to rerun and play with. They have also
all been tested and installed so you should find them in your tool menu under "Generated Tools".

It is easy to install without Docker, but you will need to make some
configuration changes (TODO write a configuration). You can install it most conveniently using the
administrative "Search and browse tool sheds" link. Find the Galaxy Main
toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory
repository in the Tool Maker section. Open it, review the code and select the option to install it.

If the tgz datatype is not already defined (pending an accepted PR),
please add:

    <datatype extension="tgz" type="galaxy.datatypes.binary:Binary"
        mimetype="multipart/x-gzip" subclass="True" />

to your local datatypes_conf.xml.

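Checking for the tgz datatype can also be scripted. The sketch below assumes the usual layout of the Galaxy datatypes configuration, with `datatype` elements inside a `registration` element; the function name is hypothetical. Note that ElementTree drops XML comments when it rewrites a file, so editing by hand may be preferable for a heavily commented configuration.

```python
import xml.etree.ElementTree as ET

# Attributes copied from the datatype element shown above.
TGZ_ATTRIBUTES = {
    "extension": "tgz",
    "type": "galaxy.datatypes.binary:Binary",
    "mimetype": "multipart/x-gzip",
    "subclass": "True",
}


def ensure_tgz_datatype(conf_xml):
    """Return the XML text with a tgz datatype added if none is registered."""
    root = ET.fromstring(conf_xml)
    registration = root.find("registration")
    if registration is None:
        raise ValueError("no <registration> element found")
    for datatype in registration.findall("datatype"):
        if datatype.get("extension") == "tgz":
            return conf_xml  # already present, leave the text untouched
    ET.SubElement(registration, "datatype", TGZ_ATTRIBUTES)
    return ET.tostring(root, encoding="unicode")
```

Running it twice is harmless: the second call finds the datatype and returns its input unchanged.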
*Restricted execution*

The tool factory tool itself will then be usable ONLY by admin users -
people with IDs in admin_users. **Yes, that's right. ONLY
admin_users can run this tool.** Think about it for a moment. If ordinary users were allowed to
run any arbitrary script on your Galaxy server, the only thing that would
impede a miscreant bent on destroying all your Galaxy data would probably
be lack of appropriate technical skills.

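The admin-only rule amounts to a membership test against the admin_users setting. The sketch below is not Galaxy's or the TF's code; it assumes admin_users is a comma-separated string of login emails and, as a further assumption, compares them case-insensitively.

```python
def is_admin(user_email, admin_users):
    """Return True if user_email appears in a comma-separated admin_users value."""
    admins = {entry.strip().lower() for entry in admin_users.split(",") if entry.strip()}
    return user_email.strip().lower() in admins
```

With `admin_users` set to `"admin@galaxy.org, other@example.org"`, only those two accounts would pass the check.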
**Generated tool Security**

Once you install a generated tool, it's just
another tool - assuming the script is safe. Generated tools just run normally and their
users cannot do anything unusually insecure, but please practice safe toolshed.
Read the code before you install any tool. Especially this one - it is really scary.

**Send Code**

Pull requests are welcome. Please send suggestions as git issues.

**Attribution**

Creating re-usable tools from scripts: The Galaxy Tool Factory
Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573

http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref

**Licensing**

Copyright Ross Lazarus 2010
ross lazarus at g mail period com

All rights reserved.

Licensed under the LGPL