toolfactory/README.md @ 0:fc50a3f507ab (draft)
Need a new repo - old tool_factory_2 is broken
| author | fubar |
|---|---|
| date | Sat, 10 Apr 2021 02:16:35 +0000 |
| parents | |
| children | 48458b0369aa |

**Breaking news! A Docker container is recommended as of August 2020**

A Docker container can be built; see the docker directory.
It is highly recommended for isolation. It also has an integrated toolshed that allows new tools to be installed back
into the Galaxy being used to generate them.

It is built from quay.io/bgruening/galaxy:20.05 but updates the
Galaxy code to the dev branch. It seems to work fine with bioblend>=0.14, planemo,
and the version of gxformat2 that the ToolFactory (TF) needs.

The runclean.sh script, run from the docker subdirectory of your local clone of this repository,
should (eventually) create a container and serve it at localhost:8080, with a toolshed at
localhost:9009.

Once it is up, please restart Galaxy in the container with:

```
docker exec [container name] supervisorctl restart galaxy:
```

Jobs just do not seem to run properly otherwise, and the next steps will not work!

The generated container includes a workflow and 2 sample data sets for the workflow.

Load the workflow and adjust the inputs for each step as labelled. The perl example counts GC in phiX.fasta.
The python scripts use rgToolFactory.py as their input; any text file will work, but I like the
recursion. The BWA example has some mitochondrial reads and a reference. Run the workflow and watch.
This should fill the history with some sample tools you can rerun and play with.
Note that each new tool will have been tested using Planemo, inside the workflow, in Galaxy.
It is extremely cool to watch.

*WARNING*

Install this tool on a throw-away private Galaxy or Docker container ONLY.
NEVER install it on a public or production instance.

*Short Story*

Galaxy is easily extended to new applications by adding a new tool. Each new scientific computational package added as
a tool to Galaxy requires some special instructions to be written. This is sometimes termed "wrapping" the package,
because the instructions tell Galaxy how to run the package as a new Galaxy tool. Any tool in a Galaxy is
readily available to all its users through a consistent, easy-to-use interface.

Most Galaxy tool wrappers have been manually prepared by skilled programmers, many using Planemo because it
automates much of the basic boilerplate and makes the process much easier. The ToolFactory (TF)
uses Planemo under the hood for many functions, but hides the command
line complexities from the TF user.

*More Explanation*

The TF is an unusual Galaxy tool, designed to allow a skilled user to make new Galaxy tools.
It appears in Galaxy just like any other tool, but its outputs include new Galaxy tools generated
using instructions provided by the user, together with the results of Planemo lint and tool testing using
small sample inputs provided by the TF user. The small samples become tests built into the new tool.

It offers a familiar, form-driven Galaxy way to define how the user of the new tool will
choose input data from their history, and which parameters the new tool user will be able to adjust.
The TF user must know, or be able to read, enough about the tool to define the details of
the new Galaxy interface; the ToolFactory offers little guidance on that other than some examples.

Tools always depend on other things. Most tools in Galaxy depend on third-party
scientific packages, so TF tools usually have one or more dependencies. These can be
scientific packages such as BWA or scripting languages such as Python, and are
usually managed by Conda. If the new tool relies on a system utility such as bash or awk,
where version control matters less for reproducibility, the utility can be used without
Conda management, but remember the risks that unmanaged dependencies pose to computational
reproducibility.

The TF user can optionally supply a working script where scripting is
required and the chosen dependency is a scripting language such as Python or a system
scripting executable such as bash. Whatever the language, the script must correctly parse the command line
arguments it receives at tool execution, as they are defined by the TF user. The
text of that script is "baked in" to the new tool and will be executed each time
the new tool is run. It is highly recommended that scripts and their command lines be developed
and tested until proven to work before the TF is invoked. Using Galaxy as a software development
environment is possible, but not recommended, because it is clumsy and inefficient.
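
The following sketch shows a script that parses its own command line. It is a hypothetical sh/awk analogue of the perl GC-counting example in the sample workflow, not the workflow's actual script. It assumes the TF user defines two positional arguments: the input FASTA path, then the output path. The function name `gc_count` and that argument order are illustrative assumptions. You can try it on a small local file before invoking the TF.

```shell
#!/bin/sh
# Hypothetical TF-style script: count G and C bases in a FASTA file.
# Assumed command line (as a TF user might define it): <input.fasta> <output.txt>
set -eu

gc_count() {
  local input="$1" output="$2"
  # Skip FASTA header lines; tally sequence length and G/C bases, ignoring case.
  awk '!/^>/ {
         seq = toupper($0)
         total += length(seq)
         gc += gsub(/[GC]/, "", seq)
       }
       END { printf "gc\t%d\ntotal\t%d\n", gc, total }' "$input" > "$output"
}

# The generated tool always passes arguments. With none (for example when the
# file is sourced for a quick local check), only the function is defined.
if [ "$#" -gt 0 ]; then
  gc_count "$@"
fi
```

For example, `sh gc_count.sh phiX.fasta gc.txt` writes two tab-separated lines, `gc` and `total`. Wrapping the logic in a function makes it easy to source the file and test that function directly.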

Tools nearly always take one or more data sets from the user's history as input. TF tools
allow the TF user to define what Galaxy datatypes the tool end user will be able to choose and what
names or positions will be used to pass them on a command line to the package or script.

Tools often have various parameter settings. The TF allows the TF user to define how each
parameter will appear on the tool form to the end user, and what names or positions will be
used to pass them on the command line to the package. At present, parameters are limited to
simple text and number fields. Pull requests for other kinds of parameters that galaxyxml
can handle are welcome.
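
Here is a minimal sh sketch of the named style. It assumes the TF user has defined one input (`--input`), one output (`--output`) and one number field (`--minlen`). All three names, and the line-length filter itself, are hypothetical. Every value arrives as text, so the script checks that the number field really is an integer before using it.

```shell
#!/bin/sh
# Hypothetical TF-style script with named arguments:
#   --input <file> --output <file> --minlen <integer>
# Copies every input line at least --minlen characters long to --output.
set -eu

filter_lines() {
  local input="" output="" minlen=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --input)  input="$2";  shift 2 ;;
      --output) output="$2"; shift 2 ;;
      --minlen) minlen="$2"; shift 2 ;;
      *) echo "unknown argument: $1" >&2; return 1 ;;
    esac
  done
  # Form fields arrive as text, so reject anything that is not a plain integer.
  case "$minlen" in
    ''|*[!0-9]*) echo "--minlen must be a non-negative integer" >&2; return 1 ;;
  esac
  if [ -z "$input" ] || [ -z "$output" ]; then
    echo "--input and --output are both required" >&2
    return 1
  fi
  awk -v min="$minlen" 'length($0) >= min' "$input" > "$output"
}

# Run only when called with arguments, as the generated tool would do.
if [ "$#" -gt 0 ]; then
  filter_lines "$@"
fi
```

Named arguments can appear in any order on the command line. Positional arguments are shorter but depend on the order the TF user chose.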

Best-practice Galaxy tools have one or more automated tests. These should use small sample data sets and
specific parameter settings so that, when the tool is tested, the outputs can be compared with their expected
values. The TF will automatically create a test for the new tool, using the sample data sets
chosen by the TF user when they built the new tool.
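
The idea behind such a test can be sketched locally in sh: run the command on a small sample input, then compare its output byte for byte with a stored expected file. This only illustrates the comparison. It is not what planemo or Galaxy actually run, and Galaxy tool tests offer several comparison modes of their own. The name `check_against_expected` and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch of the comparison a tool test performs: run a command on a small sample
# input and compare its standard output with a stored expected file.
# Illustrative only; not planemo's or Galaxy's implementation.
set -eu

check_against_expected() {
  local expected="$1" actual
  shift
  actual="$(mktemp)"
  # Run the command under test, capturing its standard output.
  "$@" > "$actual"
  if cmp -s "$expected" "$actual"; then
    echo "PASS"
    rm -f "$actual"
  else
    echo "FAIL: output differs from $expected" >&2
    diff "$expected" "$actual" >&2 || true
    rm -f "$actual"
    return 1
  fi
}
```

Suppose you have confirmed by eye that `sort sample.txt > expected.txt` produced the right answer. After that, `check_against_expected expected.txt sort sample.txt` keeps printing `PASS` as long as the output does not change.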

The TF works by exposing *unrestricted*, and therefore extremely dangerous, scripting
to all designated administrators of the host Galaxy server, allowing them to
run scripts in R, python, sh and perl. For this reason, a Docker container is
available to help manage the associated risks.

*Scripting uses*

To use a scripting language to create a new tool, you must first prepare and properly test a script. Use small sample
data sets for testing. When the script is working correctly, upload the small sample data sets
into a new history, start configuring a new ToolFactory tool, and paste the script into the script text box on the TF form.
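
The sh sketch below shows one way to make small sample data sets. It assumes FASTQ files use the usual four lines per record. FASTA records vary in length, so the FASTA helper counts `>` header lines instead. Both function names are hypothetical.

```shell
#!/bin/sh
# Make small sample inputs for developing and testing a script before using the TF.
set -eu

# Assumes standard FASTQ: exactly 4 lines per record, so keep a multiple of 4 lines.
sample_fastq() {
  local input="$1" records="$2" output="$3"
  head -n "$((records * 4))" "$input" > "$output"
}

# FASTA records span a variable number of lines, so count '>' headers instead
# and stop just before the header of record number records + 1.
sample_fasta() {
  local input="$1" records="$2" output="$3"
  awk -v n="$records" '/^>/ { seen++ } seen > n { exit } { print }' "$input" > "$output"
}
```

For example, `sample_fastq reads.fastq 1000 small.fastq` keeps the first 1000 reads. Taking the first records rather than a random subset keeps the sample reproducible. That matters once the sample becomes the new tool's built-in test data.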

*Outputs*

Once the script runs successfully, a new Galaxy tool that runs your script
can be generated. Select the "generate" option and supply some help text and
names. The new tool will be generated as a new Galaxy dataset of datatype
*tgz*. As the name suggests, it is an archive ready to upload to a
Galaxy ToolShed as a new tool repository.
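
Before you upload the archive anywhere, you can list and unpack it with standard `tar`, as in the small sh sketch below. The helper name and the scratch directory are hypothetical. The exact files inside a TF archive are not described here.

```shell
#!/bin/sh
# List a generated tool archive, then unpack it into a scratch directory for reading.
set -eu

review_archive() {
  local archive="$1" dest="$2"
  # -t lists members without extracting, which helps spot unexpected files.
  tar -tzf "$archive"
  mkdir -p "$dest"
  tar -xzf "$archive" -C "$dest"
}
```

`review_archive mytool.tgz review` prints the member list and leaves the files under `review/` so you can read the tool XML and any script it contains.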

It is also possible to run a tool to generate test outputs, then test it
using planemo. A toolshed is built into the Docker container and configured
so that a tool can be tested, sent to that toolshed, then installed in the Galaxy
where the TF is running.

If the tool requires a command or test XML override, planemo is
needed to generate the test outputs that complete the tool. The tool is then rerun to test it
and, if required, uploaded to the local toolshed and installed in the Galaxy
where the TF is running.

Once it is in a ToolShed, it can be installed into any local Galaxy server
from the server administrative interface.

Once the new tool is installed, local users can run it. Each time, the
package and/or script that was supplied when it was built will be executed with the input chosen
from the user's history, together with user-supplied parameters. In other words, the tools you generate with the
ToolFactory run just like any other Galaxy tool.

TF-generated tools work as normal workflow components.

*Limitations*

The TF is flexible enough to generate wrappers for many common scientific packages,
but the built-in automation will not cope with all possible situations. Users can
supply overrides for two tool XML segments, tests and command. The BWA
example in the supplied sample workflow illustrates their use.

*Installation*

The Docker container is the best way to use the TF because it is preconfigured
to automate new tool testing and has a built-in local toolshed to which each new tool
is uploaded. If you grab the Docker container, it should just work.

If you build the container yourself, there are some things to watch out for. Let it run for 10 minutes
or so after you build it, and check with top until conda has finished fussing. Once everything quietens
down, find the container name with

```
docker ps
```

and use

```
docker exec [containername] supervisorctl restart galaxy:
```

That colon is not a typographical mistake: it refers to the whole galaxy process group managed by supervisord.
Not restarting after first boot seems to leave the job/workflow system confused, and the workflow
just will not run properly until Galaxy has restarted.

Log in as admin@galaxy.org with password "password". Feel free to change it once you are logged in.
There should be a companion toolshed at localhost:9009. The history should have some sample data for
the workflow.

Run the workflow and make sure the right dataset is selected for each of the input files. Most of the
examples use text files, so they should run, but the bwa example needs the right ones to work properly.

When the workflow is finished, you will have half a dozen examples to rerun and play with. They have also
all been tested and installed, so you should find them in your tool menu under "Generated Tools".

It is easy to install without Docker, but you will need to make some
configuration changes (TODO: write up the configuration). You can install it most conveniently using the
administrative "Search and browse tool sheds" link. Find the Galaxy Main
toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory
repository in the Tool Maker section. Open it, review the code, and select the option to install it.

If the tgz datatype is not already there (pending an accepted PR),
please add:

```xml
<datatype extension="tgz" type="galaxy.datatypes.binary:Binary"
    mimetype="multipart/x-gzip" subclass="True" />
```

to your local datatypes_conf.xml.
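
Before editing the datatypes configuration, you can check whether it already declares `tgz` with the line-based sh sketch below. It assumes the `<datatype` tag and its `extension` attribute sit on the same line, as in the datatype line above. The function name is hypothetical, and the path to pass depends on your Galaxy layout.

```shell
#!/bin/sh
# Succeeds if the given datatypes config already declares the tgz extension.
# Line-based check: assumes "<datatype" and extension="tgz" share a line.
set -eu

has_tgz_datatype() {
  grep -Eq '<datatype[^>]*extension="tgz"' "$1"
}
```

For example, `has_tgz_datatype /path/to/datatypes_conf.xml && echo "tgz already registered"` prints the message only when the entry exists.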

*Restricted execution*

The tool factory tool itself will then be usable ONLY by admin users:
people listed in admin_users in the Galaxy configuration. **Yes, that's right. ONLY
admin_users can run this tool.** Think about it for a moment. If allowed to
run any arbitrary script on your Galaxy server, the only thing that would
impede a miscreant bent on destroying all your Galaxy data would probably
be lack of appropriate technical skills.

**Generated tool Security**

Once you install a generated tool, it is just
another tool, assuming the script is safe. Generated tools run normally and their
users cannot do anything unusually insecure, but please practice safe toolshed:
read the code before you install any tool. Especially this one; it is really scary.

**Send Code**

Pull requests and suggestions are welcome, preferably as git issues.

**Attribution**

Creating re-usable tools from scripts: The Galaxy Tool Factory
Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573

http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref

**Licensing**

Copyright Ross Lazarus 2010
ross lazarus at g mail period com

All rights reserved.

Licensed under the LGPL
