**Breaking news! The Docker container is recommended as of August 2020**

A Docker container can be built; see the docker directory.
It is highly recommended for isolation. It also has an integrated toolshed, so new tools can be installed back
into the Galaxy used to generate them.

The container is built from quay.io/bgruening/galaxy:20.05 but updates the
Galaxy code to the dev branch. It seems to work fine with updated bioblend (>=0.14),
with planemo, and with the version of gxformat2 needed by the ToolFactory (TF).

Running the runclean.sh script from the docker subdirectory of your local clone of this repository
should (eventually) create a container and serve Galaxy at localhost:8080, with a toolshed at
localhost:9009.
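
While the container is coming up, a small polling loop can tell you when both servers answer. The following is a minimal sketch that assumes the default ports above (8080 for Galaxy, 9009 for the toolshed). `wait_for` and its parameters are hypothetical helpers, not part of the ToolFactory.

```python
import time
import urllib.error
import urllib.request


def wait_for(url, timeout=600.0, interval=10.0, request_timeout=5.0,
             opener=urllib.request.urlopen):
    """Poll `url` until the server answers or `timeout` seconds pass.

    Returns True if the server answered, False on timeout. `opener` is
    injectable so the loop can be exercised without a running server.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=request_timeout):
                return True
        except urllib.error.HTTPError:
            # Any HTTP status, even 4xx/5xx, means something is listening.
            return True
        except (urllib.error.URLError, OSError):
            pass  # connection refused or timed out: not up yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for("http://localhost:8080")` and then `wait_for("http://localhost:9009")` block until each server responds. A server that answers is not necessarily finished starting, so the restart step below is still needed.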

Once it is up, please restart Galaxy in the container with
```docker exec [container name] supervisorctl restart galaxy:```
Jobs do not seem to run properly otherwise, and the next steps will not work!

The generated container includes a workflow and 2 sample data sets for the workflow.

Load the workflow and adjust the inputs for each step as labelled. The perl example counts GC in phiX.fasta.
The python scripts use rgToolFactory.py as their input. Any text file will work, but I like the
recursion. The BWA example has some mitochondrial reads and a reference. Run the workflow and watch.
This should fill the history with some sample tools you can rerun and play with.
Note that each new tool will have been tested using Planemo, in the workflow, in Galaxy.
Extremely cool to watch.
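
For readers who prefer Python, the calculation behind the perl sample can be sketched as follows. This is not the perl script shipped in the workflow, only an illustrative analogue. `gc_fraction` is a hypothetical name, and the sketch assumes a plain FASTA file (one or more records) such as phiX.fasta.

```python
def gc_fraction(fasta_lines):
    """Return (gc_count, total_bases, gc_fraction) for FASTA text lines.

    Header lines starting with '>' are skipped. Only A, C, G and T count
    as bases, so ambiguity codes such as N are ignored.
    """
    gc = total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue
        seq = line.strip().upper()
        gc += seq.count("G") + seq.count("C")
        total += sum(seq.count(base) for base in "ACGT")
    return gc, total, (gc / total if total else 0.0)
```

Wrapped in a script that opens the file named on its command line (for example `gc_fraction(open(sys.argv[1]))`), this is the kind of small, pre-tested script the ToolFactory is meant to turn into a tool.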

*WARNING*

Install this tool ONLY on a throw-away private Galaxy or Docker container.
Please NEVER install it on a public or production instance.

*Short Story*

Galaxy is easily extended to new applications by adding a new tool. Each new scientific computational package added as
a tool to Galaxy requires some special instructions to be written. This is sometimes termed "wrapping" the package,
because the instructions tell Galaxy how to run the package as a new Galaxy tool. Any tool in a Galaxy is
readily available to all its users through a consistent and easy-to-use interface.

Most Galaxy tool wrappers have been manually prepared by skilled programmers, many using Planemo because it
automates much of the basic boilerplate and makes the process much easier. The ToolFactory (TF)
uses Planemo under the hood for many functions, but hides the command
line complexities from the TF user.

*More Explanation*

The TF is an unusual Galaxy tool, designed to allow a skilled user to make new Galaxy tools.
It appears in Galaxy just like any other tool, but its outputs include new Galaxy tools generated
from instructions provided by the user, together with the results of Planemo lint and tool tests run on
small sample inputs provided by the TF user. The small samples become tests built into the new tool.

It offers a familiar, form-driven Galaxy way to define how the user of the new tool will
choose input data from their history, and which parameters the new tool user will be able to adjust.
The TF user must know, or be able to read, enough about the tool to define the details of
the new Galaxy interface; the ToolFactory offers little guidance on that beyond some examples.

Tools always depend on other things. Most tools in Galaxy depend on third-party
scientific packages, so TF tools usually have one or more dependencies. These can be
scientific packages such as BWA or scripting languages such as Python, and are
usually managed by Conda. If the new tool relies on a system utility such as bash or awk,
where version control matters less for reproducibility, the utility can be used without
Conda management. Remember, however, the risks that unmanaged dependencies pose to computational
reproducibility.
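
In a Galaxy tool XML file, Conda-managed dependencies appear as `<requirement>` elements. The following minimal sketch uses Python's standard `xml.etree.ElementTree` to show the shape of that block. The package names and versions are placeholders chosen for illustration, not what the TF will pick for you.

```python
import xml.etree.ElementTree as ET


def requirements_xml(packages):
    """Build a Galaxy <requirements> element from (name, version) pairs.

    Each pair becomes <requirement type="package" version="...">name</requirement>.
    A version of None omits the attribute, which weakens reproducibility.
    """
    reqs = ET.Element("requirements")
    for name, version in packages:
        req = ET.SubElement(reqs, "requirement", type="package")
        if version is not None:
            req.set("version", version)
        req.text = name
    return ET.tostring(reqs, encoding="unicode")


# Hypothetical pins, for illustration only.
print(requirements_xml([("bwa", "0.7.17"), ("python", "3.8")]))
```

Pinning a version is what makes the dependency reproducible. Leaving it out is the XML equivalent of relying on whatever bash or awk happens to be installed.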

Where scripting is required and the chosen dependency is a scripting language such as Python or a system
scripting executable such as bash, the TF user can optionally supply a working script. Whatever the language, the script must correctly parse the command line
arguments it receives at tool execution, as they are defined by the TF user. The
text of that script is "baked in" to the new tool and will be executed each time
the new tool is run. It is highly recommended that scripts and their command lines be developed
and tested until proven to work before the TF is invoked. Using Galaxy as a software development
environment is possible, but not recommended, because it is somewhat clumsy and inefficient.
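
The following minimal Python sketch illustrates a script that parses the command line defined by the TF user. It assumes a hypothetical tool with one named input (`--input`), one named output (`--output`) and one integer parameter (`--minlen`). The real names, and whether arguments are positional or named, are whatever you choose on the TF form.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line defined for this (hypothetical) tool."""
    parser = argparse.ArgumentParser(description="Keep lines at least --minlen characters long")
    parser.add_argument("--input", required=True, help="input dataset path")
    parser.add_argument("--output", required=True, help="output dataset path")
    parser.add_argument("--minlen", type=int, default=1, help="minimum line length")
    return parser.parse_args(argv)


def filter_lines(lines, minlen):
    """Yield lines whose length, excluding the newline, is at least minlen."""
    for line in lines:
        if len(line.rstrip("\n")) >= minlen:
            yield line


def run(args):
    """Copy qualifying lines from the input dataset to the output dataset."""
    with open(args.input) as src, open(args.output, "w") as dst:
        dst.writelines(filter_lines(src, args.minlen))
```

Calling `run(parse_args())` from a local shell with small sample files, for example `--input sample.txt --output out.txt --minlen 5`, is the kind of testing recommended above before the script is pasted into the TF form.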

Tools nearly always take one or more data sets from the user's history as input. The TF
allows the TF user to define which Galaxy datatypes the tool end user will be able to choose, and what
names or positions will be used to pass them on a command line to the package or script.
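
Choosing to pass a dataset by name or by position only changes how the command line is assembled. The following sketch uses a hypothetical helper, not ToolFactory code. Each input is described by a name, the path Galaxy would substitute at run time, and a flag saying whether it is positional. Arguments are kept in the order they were declared.

```python
from dataclasses import dataclass


@dataclass
class ToolInput:
    """One history dataset passed to the package or script (hypothetical model)."""
    name: str         # argument name chosen by the TF user, e.g. "reference"
    path: str         # dataset path substituted by Galaxy at run time
    positional: bool  # True: pass the bare path; False: pass "--name path"


def build_argv(executable, inputs):
    """Assemble an argv list, keeping inputs in their declared order."""
    argv = [executable]
    for inp in inputs:
        if inp.positional:
            argv.append(inp.path)
        else:
            argv.extend([f"--{inp.name}", inp.path])
    return argv
```

With a named `reference` and a positional `reads` input, `build_argv("myscript.sh", ...)` yields `myscript.sh --reference <ref path> <reads path>`, which is exactly what the script must then parse.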

Tools often have various parameter settings. The TF allows the TF user to define how each
parameter will appear on the tool form to the end user, and what names or positions will be
used to pass them on the command line to the package. At present, parameters are limited to
simple text and number fields. Pull requests for other kinds of parameters that galaxyxml
can handle are welcome.
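
Because parameters are currently limited to text and number fields, checking a form value before it reaches the command line comes down to two cases. The following minimal sketch uses hypothetical type names "text", "integer" and "float". `shlex.join` is used only to display the assembled command as a shell would quote it.

```python
import shlex


def coerce_param(kind, raw):
    """Validate a text or number form value and return it as a string.

    Raises ValueError if a number field holds something non-numeric.
    """
    if kind == "integer":
        return str(int(raw))
    if kind == "float":
        return str(float(raw))
    if kind == "text":
        return raw
    raise ValueError(f"unsupported parameter type: {kind}")


def param_args(params):
    """Turn (name, kind, raw_value) triples into '--name value' argument pairs."""
    args = []
    for name, kind, raw in params:
        args.extend([f"--{name}", coerce_param(kind, raw)])
    return args


args = param_args([("minlen", "integer", "5"), ("label", "text", "my sample")])
print(shlex.join(["python", "script.py", *args]))
```

Passing arguments as a list, rather than pasting text into a shell string, keeps a text value containing spaces as a single argument.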

Best-practice Galaxy tools have one or more automated tests. These should use small sample data sets and
specific parameter settings, so that when the tool is tested, the outputs can be compared with their expected
values. The TF will automatically create a test for the new tool, using the sample data sets
chosen by the TF user when they built the new tool.

The TF works by exposing *unrestricted* and therefore extremely dangerous scripting
to all designated administrators of the host Galaxy server, allowing them to
run scripts in R, python, sh and perl. For this reason, a Docker container is
available to help manage the associated risks.

*Scripting uses*

To use a scripting language to create a new tool, you must first prepare and properly test a script, using small sample
data sets. When the script is working correctly, upload the small sample data sets
into a new history, start configuring a new ToolFactory tool, and paste the script into the script text box on the TF form.

*Outputs*

Once the script runs successfully, a new Galaxy tool that runs your script
can be generated. Select the "generate" option and supply some help text and
names. The new tool will be generated in the form of a new Galaxy datatype,
*tgz*. As the name suggests, it is an archive ready to upload to a
Galaxy ToolShed as a new tool repository.
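
Because the generated tool is an ordinary gzip-compressed tar archive, you can look inside it before uploading it anywhere. The following minimal sketch uses Python's `tarfile` module. Treating every `.xml` member as tool XML is a rough heuristic, and the archive path you pass in is your own.

```python
import tarfile


def list_tool_archive(path):
    """Return the sorted member names of a .tgz tool archive.

    Opening with mode "r:gz" fails if the file is not gzip-compressed tar data.
    """
    with tarfile.open(path, "r:gz") as archive:
        return sorted(archive.getnames())


def tool_xml_members(path):
    """Return members that look like Galaxy tool XML files (heuristic: *.xml)."""
    return [name for name in list_tool_archive(path) if name.endswith(".xml")]
```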

It is also possible to run a tool to generate test outputs, then test it
using Planemo. A toolshed is built into the Docker container and configured
so that a tool can be tested, sent to that toolshed, and then installed in the Galaxy
where the TF is running.

If the tool requires a command or test XML override, then Planemo is
needed to generate the test outputs that complete the tool and to rerun the tests.
If required, the tool can then be uploaded to the local toolshed and installed in the Galaxy
where the TF is running.
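
The Planemo steps described above can be scripted. The following sketch only builds and runs the command lines. It assumes `planemo` is on your PATH and uses its `lint` and `test` subcommands, with `--update_test_data` to (re)generate expected outputs. Check `planemo test --help` for your installed version before relying on these options. The tool XML path is hypothetical.

```python
import subprocess


def planemo_commands(tool_xml, regenerate=True):
    """Return planemo command lines for linting and testing one tool XML file."""
    cmds = [["planemo", "lint", tool_xml]]
    if regenerate:
        # Writes the job outputs into test-data/ so they become the expected files.
        cmds.append(["planemo", "test", "--update_test_data", tool_xml])
    # Rerun so the tool is tested against the now-complete expected outputs.
    cmds.append(["planemo", "test", tool_xml])
    return cmds


def run_all(cmds):
    """Run each command in turn, stopping at the first non-zero exit."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

For example, `run_all(planemo_commands("mytool/mytool.xml"))` would lint the tool, regenerate its expected outputs and then test it.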

Once it's in a ToolShed, it can be installed into any local Galaxy server
from the server administrative interface.

Once the new tool is installed, local users can run it. Each time, the
package and/or script that was supplied when it was built will be executed with the input chosen
from the user's history, together with user-supplied parameters. In other words, the tools you generate with the
ToolFactory run just like any other Galaxy tool.

TF-generated tools work as normal workflow components.

*Limitations*

The TF is flexible enough to generate wrappers for many common scientific packages,
but the inbuilt automation will not cope with all possible situations. Users can
supply overrides for two tool XML segments, tests and command. The BWA
example in the supplied sample workflow illustrates their use.

*Installation*

The Docker container is the best way to use the TF because it is preconfigured
to automate new tool testing and has a built-in local toolshed to which each new tool
is uploaded. If you grab the Docker container, it should just work.

If you build the container yourself, there are some things to watch out for. Let it run for 10 minutes
or so after you build it, and check with top until conda has finished fussing. Once everything quietens
down, find the container with
```docker ps```
and use
```docker exec [containername] supervisorctl restart galaxy:```
That colon is not a typographical mistake.
Not restarting after first boot seems to leave the job/workflow system confused, and the workflow
just will not run properly until Galaxy has been restarted.
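
The find-and-restart step can also be scripted. The following minimal sketch drives the docker CLI through `subprocess`. The name filter `galaxy` is an assumption about how your container is named, so adjust it. In `galaxy:`, the trailing colon follows supervisor's group syntax and addresses every process in the `galaxy` group, which is why it is not a typo.

```python
import subprocess


def running_containers():
    """Return the names of running containers, as reported by `docker ps`."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def pick_container(names, hint="galaxy"):
    """Pick the first container whose name contains `hint` (hypothetical heuristic)."""
    for name in names:
        if hint in name:
            return name
    raise LookupError(f"no running container name contains {hint!r}")


def restart_command(container):
    """Command that restarts the whole supervisor 'galaxy' process group."""
    return ["docker", "exec", container, "supervisorctl", "restart", "galaxy:"]
```

For example, `subprocess.run(restart_command(pick_container(running_containers())), check=True)` performs the restart once conda has gone quiet.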

Log in as admin@galaxy.org with password "password". Feel free to change it once you are logged in.
There should be a companion toolshed at localhost:9009. The history should have some sample data for
the workflow.

Run the workflow and make sure the right dataset is selected for each of the input files. Most of the
examples use text files, so they should run, but the BWA example needs the right ones to work properly.

When the workflow is finished, you will have half a dozen examples to rerun and play with. They have also
all been tested and installed, so you should find them in your tool menu under "Generated Tools".

It is easy to install without Docker, but you will need to make some
configuration changes (TODO write a configuration). You can install it most conveniently using the
administrative "Search and browse tool sheds" link. Find the Galaxy Main
toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory
repository in the Tool Maker section. Open it, review the code, and select the option to install it.

Otherwise, if the tgz datatype is not already defined (pending an accepted PR),
please add
```xml
<datatype extension="tgz" type="galaxy.datatypes.binary:Binary"
          mimetype="multipart/x-gzip" subclass="True" />
```
to your local datatypes_conf.xml.
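
To check whether the tgz datatype is already registered before editing the file, you could parse the configuration. The following minimal sketch uses ElementTree and takes the file's text as a string. It searches the whole tree for `<datatype>` elements, so it does not depend on the exact nesting under `<registration>`.

```python
import xml.etree.ElementTree as ET


def has_datatype(conf_xml_text, extension="tgz"):
    """True if a <datatype> element with the given extension is defined anywhere."""
    root = ET.fromstring(conf_xml_text)
    return any(dt.get("extension") == extension for dt in root.iter("datatype"))
```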

*Restricted execution*

The tool factory tool itself will then be usable ONLY by admin users,
that is, people with IDs in admin_users. **Yes, that's right. ONLY
admin_users can run this tool.** Think about it for a moment. If allowed to
run any arbitrary script on your Galaxy server, the only thing that would
impede a miscreant bent on destroying all your Galaxy data would probably
be a lack of appropriate technical skills.
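
The admin-only restriction rests on Galaxy's `admin_users` setting, a comma-separated list of user email addresses in the Galaxy configuration. The following minimal sketch shows the membership test, only to illustrate what "having your ID in admin_users" means. It is not Galaxy's own implementation, `is_admin` is a hypothetical name, and the case-insensitive comparison is an assumption.

```python
def admin_set(admin_users):
    """Parse a comma-separated admin_users value into a set of lowercased emails."""
    return {email.strip().lower() for email in admin_users.split(",") if email.strip()}


def is_admin(user_email, admin_users):
    """True if user_email appears in admin_users (compared case-insensitively here)."""
    return user_email.strip().lower() in admin_set(admin_users)
```

Keeping this list short is the main control you have over who can run arbitrary scripts through the TF.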

**Generated tool Security**

Once you install a generated tool, it is just
another tool, assuming the script is safe. Generated tools run normally and their
users cannot do anything unusually insecure, but please practice safe toolshed:
read the code before you install any tool. Especially this one; it is really scary.

**Send Code**

Pull requests and suggestions are welcome, as git issues please.

**Attribution**

Creating re-usable tools from scripts: The Galaxy Tool Factory
Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573

http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref

**Licensing**

Copyright Ross Lazarus 2010
ross lazarus at g mail period com

All rights reserved.

Licensed under the LGPL