comparison toolfactory/README.md @ 0:83f8bb78781e draft
Uploaded
author | fubar |
---|---|
date | Fri, 11 Dec 2020 02:51:15 +0000 |
parents | |
children |
-1:000000000000 | 0:83f8bb78781e |
---|---|
1 **Breaking news! Docker container is recommended as at August 2020** | |
2 | |
3 A Docker container can be built - see the docker directory. | |
4 It is highly recommended for isolation. It also has an integrated toolshed to allow installation of new tools back | |
5 into the Galaxy being used to generate them. | |
6 | |
7 Built from quay.io/bgruening/galaxy:20.05, with the Galaxy code updated to the | |
8 dev branch. It seems to work fine with bioblend>=0.14, planemo, and | |
9 the version of gxformat2 needed by the ToolFactory (TF). | |
10 | |
11 The runclean.sh script run from the docker subdirectory of your local clone of this repository | |
12 should create a container (eventually) and serve it at localhost:8080 with a toolshed at | |
13 localhost:9009. | |
14 | |
15 Once it's up, please restart Galaxy in the container with | |
16 ```docker exec [container name] supervisorctl restart galaxy: ``` | |
17 Jobs just do not seem to run properly otherwise and the next steps won't work! | |
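
The restart step can also be scripted. This is a minimal Python sketch, assuming the Docker CLI is on the PATH. The function names are made up for illustration, and the container name must be whatever `docker ps` reports for your container.

```python
import subprocess


def restart_galaxy_command(container_name):
    """Build the docker exec command that restarts Galaxy under supervisor."""
    # The trailing colon in "galaxy:" is intentional. As far as I know,
    # supervisorctl treats it as the whole "galaxy" process group.
    return ["docker", "exec", container_name, "supervisorctl", "restart", "galaxy:"]


def restart_galaxy(container_name):
    """Run the restart in the container; raises CalledProcessError on failure."""
    subprocess.run(restart_galaxy_command(container_name), check=True)
```

Calling `restart_galaxy("my_container")` runs the same command as the one-liner above, with `my_container` standing in for your real container name.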
18 | |
19 The generated container includes a workflow and 2 sample data sets for the workflow. | |
20 | |
21 Load the workflow. Adjust the inputs for each as labelled. The perl example counts GC in phiX.fasta. | |
22 The python scripts use the rgToolFactory.py as their input - any text file will work but I like the | |
23 recursion. The BWA example has some mitochondrial reads and reference. Run the workflow and watch. | |
24 This should fill the history with some sample tools you can rerun and play with. | |
25 Note that each new tool will have been tested using Planemo, in the workflow, in Galaxy. | |
26 Extremely cool to watch. | |
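
To give a feel for the kind of small script these sample tools wrap, here is a Python sketch of GC counting on FASTA text. It is not the bundled perl script, just an illustration of the same idea; the function name `count_gc` is an assumption.

```python
def count_gc(fasta_text):
    """Return (gc_count, total_bases) for FASTA-formatted text."""
    gc = total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence header lines
        seq = line.upper()
        gc += seq.count("G") + seq.count("C")
        total += len(seq)
    return gc, total
```

A script like this is small enough to test by hand on a tiny input before it goes anywhere near the TF form.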
27 | |
28 *WARNING* | |
29 | |
30 Install this tool ONLY on a throw-away private Galaxy or Docker container. | |
31 Please NEVER install it on a public or production instance. | |
32 | |
33 *Short Story* | |
34 | |
35 Galaxy is easily extended to new applications by adding a new tool. Each new scientific computational package added as | |
36 a tool to Galaxy requires some special instructions to be written. This is sometimes termed "wrapping" the package | |
37 because the instructions tell Galaxy how to run the package as a new Galaxy tool. Any tool in a Galaxy is | |
38 readily available to all the users through a consistent and easy to use interface. | |
39 | |
40 Most Galaxy tool wrappers have been manually prepared by skilled programmers, many using Planemo because it | |
41 automates much of the basic boilerplate and makes the process much easier. The ToolFactory (TF) | |
42 uses Planemo under the hood for many functions, but hides the command | |
43 line complexities from the TF user. | |
44 | |
45 *More Explanation* | |
46 | |
47 The TF is an unusual Galaxy tool, designed to allow a skilled user to make new Galaxy tools. | |
48 It appears in Galaxy just like any other tool but outputs include new Galaxy tools generated | |
49 using instructions provided by the user and the results of Planemo lint and tool testing using | |
50 small sample inputs provided by the TF user. The small samples become tests built in to the new tool. | |
51 | |
52 It offers a familiar Galaxy form driven way to define how the user of the new tool will | |
53 choose input data from their history, and what parameters the new tool user will be able to adjust. | |
54 The TF user must know, or be able to read, enough about the tool to be able to define the details of | |
55 the new Galaxy interface and the ToolFactory offers little guidance on that other than some examples. | |
56 | |
57 Tools always depend on other things. Most tools in Galaxy depend on third party | |
58 scientific packages, so TF tools usually have one or more dependencies. These can be | |
59 scientific packages such as BWA or scripting languages such as Python and are | |
60 usually managed by Conda. If the new tool relies on a system utility such as bash or awk | |
61 where the importance of version control on reproducibility is low, these can be used without | |
62 Conda management - but remember the potential risks of unmanaged dependencies on computational | |
63 reproducibility. | |
64 | |
65 The TF user can optionally supply a working script where scripting is | |
66 required and the chosen dependency is a scripting language such as Python or a system | |
67 scripting executable such as bash. Whatever the language, the script must correctly parse the command line | |
68 arguments it receives at tool execution, as they are defined by the TF user. The | |
69 text of that script is "baked in" to the new tool and will be executed each time | |
70 the new tool is run. It is highly recommended that scripts and their command lines be developed | |
71 and tested until proven to work before the TF is invoked. Using Galaxy as a software development | |
72 environment is possible, but not recommended because it is somewhat clumsy and inefficient. | |
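
As a rough illustration of a script that parses its own command line, here is a minimal Python sketch. The option names `--input` and `--output` and the line-reversing task are assumptions chosen for the example; your script must match whatever names or positions you define on the TF form.

```python
import argparse


def parse_args(argv=None):
    """Parse the named arguments the generated tool is expected to pass."""
    parser = argparse.ArgumentParser(description="Reverse each line of a text file")
    parser.add_argument("--input", required=True, help="path to the input text file")
    parser.add_argument("--output", required=True, help="path to write the result to")
    return parser.parse_args(argv)


def reverse_lines(text):
    """Return text with the characters of every line reversed."""
    return "\n".join(line[::-1] for line in text.splitlines())


def main(argv=None):
    """Read the input file, reverse each line, and write the output file."""
    args = parse_args(argv)
    with open(args.input) as src:
        result = reverse_lines(src.read())
    with open(args.output, "w") as dst:
        dst.write(result + "\n")
```

In a real script, call `main()` under the usual `if __name__ == "__main__":` guard, and run it from a shell with small sample files until it behaves before pasting it into the TF form.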
73 | |
74 Tools nearly always take one or more data sets from the user's history as input. TF tools | |
75 allow the TF user to define what Galaxy datatypes the tool end user will be able to choose and what | |
76 names or positions will be used to pass them on a command line to the package or script. | |
77 | |
78 Tools often have various parameter settings. The TF allows the TF user to define how each | |
79 parameter will appear on the tool form to the end user, and what names or positions will be | |
80 used to pass them on the command line to the package. At present, parameters are limited to | |
81 simple text and number fields. Pull requests for other kinds of parameters that galaxyxml | |
82 can handle are welcomed. | |
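
The difference between passing values by name and by position can be sketched in a few lines of Python. This is only an illustration of the idea, not the TF's own code: `build_command` is a hypothetical helper, and putting positional values before named options is an arbitrary choice made for the example.

```python
def build_command(executable, params):
    """Assemble an argv list from (name_or_position, value) pairs.

    A string key becomes a named option ("--key value"). An integer key is
    treated as a position, and its value is passed bare in ascending order.
    """
    named = [(k, v) for k, v in params if isinstance(k, str)]
    positional = sorted((k, v) for k, v in params if isinstance(k, int))
    argv = [executable]
    for _, value in positional:
        argv.append(str(value))
    for name, value in named:
        argv.extend([f"--{name}", str(value)])
    return argv
```

For example, `build_command("mytool", [("threads", 4), (2, "out.txt"), (1, "in.txt")])` gives `["mytool", "in.txt", "out.txt", "--threads", "4"]`.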
83 | |
84 Best practice Galaxy tools have one or more automated tests. These should use small sample data sets and | |
85 specific parameter settings so when the tool is tested, the outputs can be compared with their expected | |
86 values. The TF will automatically create a test for the new tool. It will use the sample data sets | |
87 chosen by the TF user when they built the new tool. | |
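
The core idea of such a test, comparing what the tool produced with what was expected, can be sketched in Python as below. This is a simplified illustration using the standard library, not how Planemo or Galaxy implement their comparisons; `output_matches` is a made-up name.

```python
import difflib


def output_matches(expected_text, actual_text):
    """Return (ok, diff_lines) comparing actual tool output with expected output."""
    diff = list(
        difflib.unified_diff(
            expected_text.splitlines(),
            actual_text.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )
    # An empty diff means the two texts are identical line by line.
    return (not diff, diff)
```

When the outputs differ, the returned diff lines show exactly which lines changed, which is usually the first thing you want to see when a test fails.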
88 | |
89 The TF works by exposing *unrestricted* and therefore extremely dangerous scripting | |
90 to all designated administrators of the host Galaxy server, allowing them to | |
91 run scripts in R, python, sh and perl. For this reason, a Docker container is | |
92 available to help manage the associated risks. | |
93 | |
94 *Scripting uses* | |
95 | |
96 To use a scripting language to create a new tool, you must first prepare and properly test a script. Use small sample | |
97 data sets for testing. When the script is working correctly, upload the small sample datasets | |
98 into a new history, start configuring a new ToolFactory tool, and paste the script into the script text box on the TF form. | |
99 | |
100 *Outputs* | |
101 | |
102 Once the script runs successfully, a new Galaxy tool that runs your script | |
103 can be generated. Select the "generate" option and supply some help text and | |
104 names. The new tool will be generated in the form of a new Galaxy datatype | |
105 *tgz* - as the name suggests, it's an archive ready to upload to a | |
106 Galaxy ToolShed as a new tool repository. | |
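
Since the generated tool is a gzip-compressed tar archive, it is easy to look inside one before uploading it anywhere. A minimal Python sketch using the standard `tarfile` module follows; the function name is an assumption, and nothing here depends on how the TF lays out the archive.

```python
import io
import tarfile


def list_archive_members(archive_bytes):
    """Return the sorted member names in a gzip-compressed tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        return sorted(tar.getnames())
```

Read a downloaded archive with `open(path, "rb").read()` and pass the bytes in to see which files the new tool repository would contain.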
107 | |
108 It is also possible to run a tool to generate test outputs, then test it | |
109 using planemo. A toolshed is built in to the Docker container and configured | |
110 so a tool can be tested, sent to that toolshed, then installed in the Galaxy | |
111 where the TF is running. | |
112 | |
113 If the tool requires a command or test XML override, then planemo is | |
114 needed to generate the test outputs that make a complete tool. The tool is then rerun to test it | |
115 and, if required, uploaded to the local toolshed and installed in the Galaxy | |
116 where the TF is running. | |
117 | |
118 Once it's in a ToolShed, it can be installed into any local Galaxy server | |
119 from the server administrative interface. | |
120 | |
121 Once the new tool is installed, local users can run it - each time, the | |
122 package and/or script that was supplied when it was built will be executed with the input chosen | |
123 from the user's history, together with user supplied parameters. In other words, the tools you generate with the | |
124 ToolFactory run just like any other Galaxy tool. | |
125 | |
126 TF generated tools work as normal workflow components. | |
127 | |
128 | |
129 *Limitations* | |
130 | |
131 The TF is flexible enough to generate wrappers for many common scientific packages | |
132 but the inbuilt automation will not cope with all possible situations. Users can | |
133 supply overrides for two tool XML segments, tests and command. The BWA | |
134 example in the supplied samples workflow illustrates their use. | |
135 | |
136 *Installation* | |
137 | |
138 The Docker container is the best way to use the TF because it is preconfigured | |
139 to automate new tool testing and has a built in local toolshed where each new tool | |
140 is uploaded. If you grab the docker container, it should just work. | |
141 | |
142 If you build the container, there are some things to watch out for. Let it run for 10 minutes | |
143 or so once you build it - check with top until conda has finished fussing. Once everything quietens | |
144 down, find the container with | |
145 ```docker ps``` | |
146 and use | |
147 ```docker exec [containername] supervisorctl restart galaxy:``` | |
148 That colon is not a typographical mistake. | |
149 Not restarting after first boot seems to leave the job/workflow system confused and the workflow | |
150 just will not run properly until Galaxy has restarted. | |
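
After a restart, Galaxy can take a little while before it answers requests again. The polling sketch below, in Python, is one way to wait for that. It only checks that a web server responds at the given URL; it does not prove that conda has finished or that jobs will run. The function names and default timings are assumptions.

```python
import time
import urllib.error
import urllib.request


def is_up(url, timeout=5):
    """Return True if an HTTP GET to url gets a non-error response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        return False


def wait_until_up(url, attempts=60, delay=10, probe=is_up, sleep=time.sleep):
    """Poll url until probe succeeds or attempts run out; return success."""
    for _ in range(attempts):
        if probe(url):
            return True
        sleep(delay)
    return False
```

For the container described here, `wait_until_up("http://localhost:8080")` would poll the Galaxy port for up to about ten minutes with the defaults shown.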
151 | |
152 Login as admin@galaxy.org with password "password". Feel free to change it once you are logged in. | |
153 There should be a companion toolshed at localhost:9009. The history should have some sample data for | |
154 the workflow. | |
155 | |
156 Run the workflow and make sure the right dataset is selected for each of the input files. Most of the | |
157 examples use text files so should run, but the bwa example needs the right ones to work properly. | |
158 | |
159 When the workflow is finished, you will have half a dozen examples to rerun and play with. They have also | |
160 all been tested and installed so you should find them in your tool menu under "Generated Tools". | |
161 | |
162 It is easy to install without Docker, but you will need to make some | |
163 configuration changes (TODO write a configuration). You can install it most conveniently using the | |
164 administrative "Search and browse tool sheds" link. Find the Galaxy Main | |
165 toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory | |
166 repository in the Tool Maker section. Open it and review the code and select the option to install it. | |
167 | |
168 Otherwise, if the tgz datatype is not already registered (a PR adding it is pending), | |
169 please add: | |
170 <datatype extension="tgz" type="galaxy.datatypes.binary:Binary" | |
171 mimetype="multipart/x-gzip" subclass="True" /> | |
172 to your local datatypes_conf.xml. | |
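
Before editing the datatypes configuration, it can help to check whether the tgz extension is already registered. A minimal Python sketch follows, assuming the configuration is XML containing `datatype` elements with an `extension` attribute, as in the snippet above. The function name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def has_datatype(conf_xml, extension):
    """Return True if the datatypes config text registers the given extension."""
    root = ET.fromstring(conf_xml)
    # iter() walks the whole tree, so the exact nesting does not matter here.
    return any(dt.get("extension") == extension for dt in root.iter("datatype"))
```

Pass in the text of your configuration file, for example `has_datatype(open(path).read(), "tgz")`, and add the datatype only if the result is `False`.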
173 | |
174 | |
175 *Restricted execution* | |
176 | |
177 The tool factory tool itself will then be usable ONLY by admin users - | |
178 people with IDs in admin_users. **Yes, that's right. ONLY | |
179 admin_users can run this tool** Think about it for a moment. If allowed to | |
180 run any arbitrary script on your Galaxy server, the only thing that would | |
181 impede a miscreant bent on destroying all your Galaxy data would probably | |
182 be lack of appropriate technical skills. | |
183 | |
184 **Generated tool Security** | |
185 | |
186 Once you install a generated tool, it's just | |
187 another tool - assuming the script is safe. Generated tools run normally and their | |
188 users cannot do anything unusually insecure, but please practice safe toolshed. | |
189 Read the code before you install any tool. Especially this one - it is really scary. | |
190 | |
191 **Send Code** | |
192 | |
193 Pull requests and suggestions are welcome as git issues, please. | |
194 | |
195 **Attribution** | |
196 | |
197 Creating re-usable tools from scripts: The Galaxy Tool Factory | |
198 Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team | |
199 Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573 | |
200 | |
201 http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref | |
202 | |
203 **Licensing** | |
204 | |
205 Copyright Ross Lazarus 2010 | |
206 ross lazarus at g mail period com | |
207 | |
208 All rights reserved. | |
209 | |
210 Licensed under the LGPL | |
211 |