annotate toolfactory/README.md @ 140:7c8f9793127d draft

Uploaded
author fubar
date Sat, 17 Apr 2021 09:53:56 +0000
parents 63d15caea378
children
## Breaking news! Docker container at https://github.com/fubar2/toolfactory-galaxy-docker recommended as at December 2020

### New demonstration of planemo tool_factory command ![Planemo ToolFactory demonstration](images/lintplanemo-2021-01-08_18.02.45.mkv?raw=false "Demonstration inside Planemo")

## This is the original ToolFactory suitable for non-docker situations. Please use the docker container if you can because it's integrated with a Toolshed...

# WARNING

Install this tool to a throw-away private Galaxy or Docker container ONLY!

Please NEVER install it on a public or production instance, where a hostile user may
be able to gain access if they can acquire an administrative account login.

It only runs for server administrators. The ToolFactory tool will refuse to execute for an ordinary user since
it can install new tools to the Galaxy server it executes on! This is not something you should allow other than
on a throw-away instance that is protected from potentially hostile users.
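
The administrator-only rule can be pictured with a small sketch. It assumes the administrator emails come from a comma-separated `admin_users` string like the one Galaxy reads from `config/galaxy.yml`. The function names and sample addresses are hypothetical, and the ToolFactory's own check may well be implemented differently.

```python
def is_admin(user_email: str, admin_users: str) -> bool:
    """Return True if user_email appears in a comma-separated admin list."""
    admins = {a.strip().lower() for a in admin_users.split(",") if a.strip()}
    return user_email.strip().lower() in admins


def refuse_unless_admin(user_email: str, admin_users: str) -> None:
    """Raise PermissionError for ordinary users, mimicking a refusal to run."""
    if not is_admin(user_email, admin_users):
        raise PermissionError(f"{user_email} is not a Galaxy administrator")
```

A check like this is only a minimal protection: anyone who obtains an administrative login passes it, which is why the warning above still applies.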

## Short Story

Galaxy is easily extended to new applications by adding a new tool. Each new scientific computational package added as
a tool to Galaxy requires an XML document describing how the application interacts with Galaxy.
This is sometimes termed "wrapping" the package because the instructions tell Galaxy how to run the package
as a new Galaxy tool. Any tool that has been wrapped is readily available to all the users through a consistent
and easy to use interface once installed in the local Galaxy server.

Most Galaxy tool wrappers have been manually prepared by skilled programmers, many using Planemo because it
automates much of the boilerplate and makes the process much easier.
The ToolFactory (TF) now uses Planemo under the hood for testing, but hides the command
line complexities. The user will still need appropriate skills in terms of describing the interface between
Galaxy and the new application, but will be helped by a Galaxy tool form to collect all the needed
settings, together with automated testing and uploading to a toolshed with optional local installation.


## ToolFactory generated tools are ordinary Galaxy tools

TF generated tools that pass the Planemo test are ready to publish in any Galaxy Toolshed and ready to install in any running Galaxy instance.
They are fully workflow compatible and work exactly like any hand-written tool. The user can select input files of the specified type(s) from their
history and edit each of the specified parameters. The tool form will show all the labels and help text supplied when the tool was built. When the tool
is executed, the dependent binary or script will be passed all the i/o files and parameters as specified, and will write outputs to the specified new
history datasets - just like any other Galaxy tool.

## Models for tool command line construction

The key to turning any software package into a Galaxy tool is the automated construction of a suitable command line.

The TF can build a new tool that will allow the tool user to select input files from their history and set any parameters. When run, the tool will send the
new output files to the history as specified when the tool builder completed the form and built the new tool.

That tool can contain instructions to run any Conda dependency or a system executable like bash. Whether a bash script you have written or
a Conda package like bwa, the executable will expect to find settings for input, output and parameters on a command line.

These are often passed as "--name value" (argparse style) or in a fixed order (positional style).

The ToolFactory supports either style, and also supports "filter" applications that read input from STDIN and write processed output to STDOUT.
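
The three command line models can be compared side by side in a short sketch. It is an illustration only, not the ToolFactory's own code: the function `build_command`, the style names and the sample file names are all hypothetical.

```python
import shlex


def build_command(exe, style, inputs, outputs, params):
    """Build a command line string for one of three hypothetical models.

    inputs, outputs and params are lists of (name, value) pairs.
    """
    args = [exe]
    if style == "argparse":
        # Named style: every setting is passed as "--name value".
        for name, value in inputs + outputs + params:
            args += [f"--{name}", str(value)]
        return shlex.join(args)
    if style == "positional":
        # Positional style: only the values are passed, in a fixed order.
        args += [str(value) for _, value in inputs + outputs + params]
        return shlex.join(args)
    if style == "filter":
        # Filter style: one input arrives on STDIN, one output leaves on STDOUT.
        [(_, infile)] = inputs
        [(_, outfile)] = outputs
        return f"{shlex.join(args)} < {shlex.quote(infile)} > {shlex.quote(outfile)}"
    raise ValueError(f"unknown style: {style}")


ins = [("input_file", "in.txt")]
outs = [("output_file", "out.txt")]
print(build_command("mytool", "argparse", ins, outs, [("width", 80)]))
print(build_command("mytool", "positional", ins, outs, [("width", 80)]))
print(build_command("mytool", "filter", ins, outs, []))
```

Which style to choose depends entirely on what the wrapped executable expects.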

The simplest tool model wraps a simple script or Conda dependency package requiring only input and output files, with no user supplied settings. It is illustrated by
the Tacrev demonstration tool found in the Galaxy running in the ToolFactory docker container. It passes a user selected input file from the current history on STDIN
to a bash script. The bash script runs the unix tac utility (reverse cat, which reverses the order of the lines) piped to the unix rev utility (which reverses the characters in each line). It's a one liner:

`tac | rev`

The tool building form allows zero or more Conda package name(s) and version(s) and an optional script to be executed by either a system
executable like ``bash`` or the first of any named Conda dependency package/version. Tacrev uses the tiny bash script shown above and uses the system
bash. Conda bash can be specified if it is important to use the same version consistently for the tool.

On the tool form, the repeat section allowing zero or more input files was set to a single text file to be selected by the tool user.
In the repeat section allowing one or more outputs, a new output file was given the special value `STDOUT` as its positional parameter, which causes the TF to
generate a command that captures STDOUT and sends it to the new history file containing the reversed input text.

By reversed, we mean really, truly reversed.
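
To make "really, truly reversed" concrete, here is a small Python sketch of what the `tac | rev` pipeline does to a text stream. The function name `tac_rev` is hypothetical and the sketch is not part of the Tacrev tool itself.

```python
import io


def tac_rev(stream_in, stream_out):
    """Write the lines of stream_in in reverse order, each line reversed."""
    lines = stream_in.read().splitlines()
    for line in reversed(lines):          # tac: last line first
        stream_out.write(line[::-1] + "\n")  # rev: characters back to front


source = io.StringIO("abc\ndef\n")
result = io.StringIO()
tac_rev(source, result)
print(result.getvalue(), end="")  # prints "fed" then "cba"
```

The same function works as a filter if it is handed `sys.stdin` and `sys.stdout` instead of the in-memory streams.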

That simple model can be made much more complicated, and can pass inputs and outputs as named or positional parameters,
to allow more complicated scripts or dependent binaries that require:

1. Any number of input data files selected by the user from existing history data
2. Any number of output data files written to the user's history
3. Any number of user supplied parameters. These can be passed as command line arguments to the script or the dependency package. Either
   positional or named (argparse) style command line parameter passing can be used.

More complex models can be seen in the Sedtest, Pyrevpos and Pyrevargparse tools illustrating positional and argparse parameter passing.
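
From the script's side, the two styles differ only in how the arguments are read. The sketch below is a hypothetical example, not the source of the Pyrevpos or Pyrevargparse demonstration tools; the argument names `input_file`, `output_file` and `width` are assumptions chosen for illustration.

```python
import argparse


def parse_named(argv):
    """Named (argparse) style: --input_file in.txt --output_file out.txt"""
    parser = argparse.ArgumentParser(description="Hypothetical named-style script")
    parser.add_argument("--input_file", required=True)
    parser.add_argument("--output_file", required=True)
    parser.add_argument("--width", type=int, default=80)
    return parser.parse_args(argv)


def parse_positional(argv):
    """Positional style: the order of the three values is fixed."""
    input_file, output_file, width = argv
    return argparse.Namespace(
        input_file=input_file, output_file=output_file, width=int(width)
    )


named = parse_named(["--input_file", "in.txt", "--output_file", "out.txt", "--width", "40"])
positional = parse_positional(["in.txt", "out.txt", "40"])
print(vars(named) == vars(positional))  # both styles carry the same settings
```

Whichever style the script uses, the names or positions entered on the ToolFactory form must match it exactly.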

The most complex demonstration is the Planemo advanced tool tutorial BWA tool. One version uses a command-override to implement
exactly the same command structure as the Planemo tutorial. A second version uses a bash script and positional parameters to achieve the same
result. Some tool builders may find the bash version more familiar and cleaner but the choice is yours.

## Overview

![Hello example ToolFactory tool form](files/hello_toolfactory_form.png?raw=true "Part of the Hello world example ToolFactory tool form")


Steps in building a new Galaxy tool are all conducted through Galaxy running in the docker container:

1. Log in to the Galaxy running in the container at http://localhost:8080 using an admin account. They are specified in config/galaxy.yml and
   in the documentation at
   and the ToolFactory will error out and refuse to run for non-administrative tool builders as a minimal protection from opportunistic hostile use.

2. Start the TF and fill in the form, providing sample inputs and parameter values to suit the Conda package being wrapped.

3. Execute the tool to create a new XML tool wrapper using the sample inputs and parameter settings for the inbuilt tool test. Planemo runs twice,
   firstly to generate the test outputs and then to perform a proper test. The completed toolshed archive is written to the history
   together with the planemo test report. Optionally the new tool archive can be uploaded
   to the toolshed running in the same container (http://localhost:9009) and then installed inside the Galaxy in the container for further testing.

4. If the test fails, rerun the failed history job and correct errors on the tool form before rerunning until everything works correctly.
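
The two Planemo passes in step 3 can be sketched as a pair of command lines. This is an assumption about what such a two-pass run might look like, not the ToolFactory's actual invocation, which is not shown in this README and probably passes further options. The function name and file names are hypothetical; `--update_test_data` and `--test_output` are options of `planemo test`.

```python
def planemo_commands(tool_xml, report_html):
    """Return two planemo invocations as argument lists (not executed here)."""
    # First pass: run the tool on the sample inputs and save the outputs
    # as the expected test data.
    first_pass = ["planemo", "test", "--update_test_data", tool_xml]
    # Second pass: a proper test against those saved outputs, with an HTML report.
    second_pass = ["planemo", "test", "--test_output", report_html, tool_xml]
    return [first_pass, second_pass]


for command in planemo_commands("bwatest.xml", "bwatest_report.html"):
    print(" ".join(command))
```

Each list could be handed to `subprocess.run` on a machine where Planemo is installed.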

![How it works](files/TFasIDE.png?raw=true "Overview of the ToolFactory as an Integrated Development Environment")

## Planning and building new Galaxy tool wrappers

It is best to have all the required planning done to wrap any new script or binary before firing up the TF.
Conda is the only dependency manager currently supported. Before starting, at the very least, the tool builder will need
to know the required software package name in Conda and the version to use, and how the command line for
the package must be constructed. There must also be sample inputs in the working history for each of the required data inputs
for the package, together with values for every parameter to suit these sample inputs. These are required on the TF form
for preparing the inbuilt tool test. That test is run using Planemo, as part of the tool generation process.

A new tool is specified by filling in the usual Galaxy tool form.

The form starts with a new tool name. Most tools will need dependency packages and versions
for the executable. Only Conda is currently supported.

If a script is needed, it can be pasted into a text box and the interpreter named. Available system executables
such as bash can be used, or an interpreter such as python, perl or R can be nominated as a Conda dependency
to ensure reproducible analyses.

The tool form will be generated from the input data and the tool builder supplied parameters. The command line for the
executable is built using positional or argparse (named e.g. --input_file /foo/baz) style
parameters and is completely dependent on the executable. These can include:

1. Any number of input data sets needed by the executable. Each appears to the tool user on the run form and is included
   on the command line for the executable. The tool builder must supply a small representative sample for each one as
   an input for the automated tool test.

2. Any number of output data sets generated by the package can be added to the command line and will appear in
   the user's history at the end of the job.

3. Any number of text or numeric parameters. Each will appear to the tool user on the run form and is included
   on the command line to the executable. The tool builder must supply a suitable representative value for each one as
   the value to be used for the automated tool test.
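
The planning requirements above amount to a checklist that can be written down before opening the form. The sketch below is one hypothetical way to record it; the class `ToolPlan` and its field names are illustrative and do not correspond to anything in the ToolFactory code.

```python
from dataclasses import dataclass, field


@dataclass
class ToolPlan:
    name: str
    conda_package: str
    conda_version: str
    # Maps each (hypothetical) input name to a small sample dataset.
    input_samples: dict = field(default_factory=dict)
    # Maps each parameter name to the value used in the generated test.
    param_test_values: dict = field(default_factory=dict)

    def missing_items(self):
        """List anything still needed before the form can be completed."""
        missing = []
        if not self.conda_package or not self.conda_version:
            missing.append("Conda package name and version")
        missing += [f"sample for input '{k}'"
                    for k, v in self.input_samples.items() if not v]
        missing += [f"test value for parameter '{k}'"
                    for k, v in self.param_test_values.items() if v is None]
        return missing


plan = ToolPlan("bwatest", "bwa", "0.7.15",
                input_samples={"input1": "ref.fasta", "input2": ""},
                param_test_values={"threads": 2})
print(plan.missing_items())  # ["sample for input 'input2'"]
```

An empty list means every input has a sample and every parameter has a test value, so the inbuilt tool test can be prepared.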

Once the form is completed, executing the TF will build a new XML tool wrapper
including a functional test based on the sample settings and data.

If the Planemo test passes, the tool can be optionally uploaded to the local Galaxy used in the image for more testing.

A local toolshed runs inside the container to allow an automated installation, although any toolshed and any accessible
Galaxy can be specified for this process by editing the default URL and API keys to provide appropriate credentials.

## Generated Tool Dependency management

Conda is used for all dependency management, although system utilities like sed, bash or awk
may already be available on job execution nodes. Sed and friends are available as Conda (conda-forge) dependencies if necessary.
Versioned Conda dependencies are always baked in to the tool and will be used for reproducible calculation.

## Requirements

These are all managed automagically. The TF relies on galaxyxml to generate tool XML and uses ephemeris and
bioblend to load tools to the toolshed and to Galaxy. Planemo is used for testing and runs in a biocontainer currently at
https://quay.io/fubar2/planemo-biocontainer

## Caveats

This docker image requires privileged mode and so exposes potential security risks if hostile tool builders gain access.
Please do not run it in any situation where that is a problem - never, ever on a public facing Galaxy server.
Running it on a laptop or workstation should be fine in a non-hostile environment.


## Example generated XML

For the bwa-mem example, a supplied bash script is included as a configfile and so has escaped characters.
```
<tool name="bwatest" id="bwatest" version="0.01">
  <!--Cite: Creating re-usable tools from scripts doi:10.1093/bioinformatics/bts573-->
  <!--Source in git at: https://github.com/fubar2/toolfactory-->
  <!--Created by admin@galaxy.org at 30/11/2020 07:12:10 using the Galaxy Tool Factory.-->
  <description>Planemo advanced tool building sample bwa mem mapper as a ToolFactory demo</description>
  <requirements>
    <requirement version="0.7.15" type="package">bwa</requirement>
    <requirement version="1.3" type="package">samtools</requirement>
  </requirements>
  <configfiles>
    <configfile name="runme"><![CDATA[
REFFILE=\$1
FASTQ=\$2
BAMOUT=\$3
rm -f "refalias"
ln -s "\$REFFILE" "refalias"
bwa index -a is "refalias"
bwa mem -t "2" -v 1 "refalias" "\$FASTQ" > tempsam
samtools view -Sb tempsam > temporary_bam_file.bam
samtools sort -o "\$BAMOUT" temporary_bam_file.bam

]]></configfile>
  </configfiles>
  <version_command/>
  <command><![CDATA[bash
$runme
$input1
$input2
$bam_output]]></command>
  <inputs>
    <param optional="false" label="Reference sequence for bwa to map the fastq reads against" help="" format="fasta" multiple="false" type="data" name="input1" argument="input1"/>
    <param optional="false" label="Reads as fastqsanger to align to the reference sequence" help="" format="fastqsanger" multiple="false" type="data" name="input2" argument="input2"/>
  </inputs>
  <outputs>
    <data name="bam_output" format="bam" label="bam_output" hidden="false"/>
  </outputs>
  <tests>
    <test>
      <output name="bam_output" value="bam_output_sample" compare="sim_size" format="bam" delta_frac="0.1"/>
      <param name="input1" value="input1_sample"/>
      <param name="input2" value="input2_sample"/>
    </test>
  </tests>
  <help><![CDATA[

**What it Does**

Planemo advanced tool building sample bwa mem mapper

Reimagined as a bash script for a ToolFactory demonstration


------

Script::

    REFFILE=$1
    FASTQ=$2
    BAMOUT=$3
    rm -f "refalias"
    ln -s "$REFFILE" "refalias"
    bwa index -a is "refalias"
    bwa mem -t "2" -v 1 "refalias" "$FASTQ" > tempsam
    samtools view -Sb tempsam > temporary_bam_file.bam
    samtools sort -o "$BAMOUT" temporary_bam_file.bam

]]></help>
</tool>

```
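
The escaped characters in the configfile are the `\$` sequences. Galaxy processes configfile text as a Cheetah template, so a bare `$REFFILE` would be read as a template variable instead of being left for bash. A minimal sketch of that escaping step follows; the function name is hypothetical and this is not the ToolFactory's own code, which may handle more cases than the dollar sign.

```python
def escape_for_configfile(script: str) -> str:
    """Escape dollar signs so Cheetah leaves shell variables alone."""
    return script.replace("$", "\\$")


script = 'ln -s "$REFFILE" "refalias"'
print(escape_for_configfile(script))  # ln -s "\$REFFILE" "refalias"
```

The help section of the example shows the same script unescaped, which is how it reads once Cheetah has written the configfile to disk.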



## More Explanation

The TF is an unusual Galaxy tool, designed to allow a skilled user to make new Galaxy tools.
It appears in Galaxy just like any other tool but outputs include new Galaxy tools generated
using instructions provided by the user and the results of Planemo lint and tool testing using
small sample inputs provided by the TF user. The small samples become tests built in to the new tool.

It offers a familiar Galaxy form driven way to define how the user of the new tool will
choose input data from their history, and what parameters the new tool user will be able to adjust.
The TF user must know, or be able to read, enough about the tool to be able to define the details of
the new Galaxy interface and the ToolFactory offers little guidance on that other than some examples.

Tools always depend on other things. Most tools in Galaxy depend on third party
scientific packages, so TF tools usually have one or more dependencies. These can be
scientific packages such as BWA or scripting languages such as Python and are
managed by Conda. If the new tool relies on a system utility such as bash or awk
where the importance of version control on reproducibility is low, these can be used without
Conda management - but remember the potential risks of unmanaged dependencies on computational
reproducibility.
2050b2475ae5 Uploaded
fubar
parents:
diff changeset
264
2050b2475ae5 Uploaded
fubar
parents:
diff changeset
265 The TF user can optionally supply a working script where scripting is
2050b2475ae5 Uploaded
fubar
parents:
diff changeset
266 required and the chosen dependency is a scripting language such as Python or a system
2050b2475ae5 Uploaded
fubar
parents:
diff changeset
267 scripting executable such as bash. Whatever the language, the script must correctly parse the command line
2050b2475ae5 Uploaded
fubar
parents:
diff changeset
268 arguments it receives at tool execution, as they are defined by the TF user. The
2050b2475ae5 Uploaded
fubar
parents:
diff changeset
269 text of that script is "baked in" to the new tool and will be executed each time
2050b2475ae5 Uploaded
fubar
parents:
diff changeset
270 the new tool is run. It is highly recommended that scripts and their command lines be developed
2050b2475ae5 Uploaded
fubar
parents:
diff changeset
271 and tested until proven to work before the TF is invoked. Galaxy as a software development
2050b2475ae5 Uploaded
fubar
parents:
diff changeset
272 environment is actually possible, but not recommended being somewhat clumsy and inefficient.
2050b2475ae5 Uploaded
fubar
parents:
diff changeset
273
Tools nearly always take one or more data sets from the user's history as input. The TF
allows the TF user to define which Galaxy datatypes the tool end user will be able to choose and what
names or positions will be used to pass them on the command line to the package or script.

Tools often have various parameter settings. The TF allows the TF user to define how each
parameter will appear on the tool form to the end user, and what names or positions will be
used to pass them on the command line to the package. At present, parameters are limited to
simple text and number fields. Pull requests for other kinds of parameters that galaxyxml
can handle are welcome.

Best-practice Galaxy tools have one or more automated tests. These should use small sample data sets and
specific parameter settings so that when the tool is tested, the outputs can be compared with their expected
values. The TF will automatically create a test for the new tool. It will use the sample data sets
chosen by the TF user when they built the new tool.

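The idea behind such a test can be sketched in a few lines of Python. This is only an illustration of comparing an output with its expected value, not how the TF or planemo implement tests; `toy_tool` and `diff_lines` are hypothetical names, and the sample data are made up.

```python
import difflib


def diff_lines(expected, actual):
    """Return unified-diff lines between two texts; an empty list means they match."""
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )


def toy_tool(text, prefix):
    # Stand-in for the wrapped package or script: prefixes every line.
    return "".join(prefix + line for line in text.splitlines(keepends=True))


# Small sample input, fixed parameter setting and the expected output.
sample_input = "a\nb\n"
expected_output = "x_a\nx_b\n"

actual_output = toy_tool(sample_input, prefix="x_")
differences = diff_lines(expected_output, actual_output)
test_passed = not differences

print("test passed" if test_passed else "\n".join(differences))
```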
The TF works by exposing *unrestricted* and therefore extremely dangerous scripting
to all designated administrators of the host Galaxy server, allowing them to
run scripts in R, Python, sh and Perl. For this reason, a Docker container is
available to help manage the associated risks.

## Scripting uses

To use a scripting language to create a new tool, you must first prepare and properly test a script. Use small sample
data sets for testing. When the script is working correctly, upload the small sample data sets
into a new history, start configuring a new ToolFactory tool, and paste the script into the script text box on the TF form.

### Outputs

The TF will generate the new tool described on the TF form, and test it
using planemo. Optionally, if a local toolshed is running, it can be used to
install the new tool back into the generating Galaxy.

A toolshed is built into the Docker container and configured
so that a tool can be tested, sent to that toolshed, then installed in the Galaxy
where the TF is running, using the default toolshed and Galaxy URL and API keys.

Once the new tool is in a ToolShed, it can be installed into any local Galaxy server
from the server administrative interface.

Once the new tool is installed, local users can run it. Each time, the
package and/or script that was supplied when it was built will be executed with the input chosen
from the user's history, together with user-supplied parameters. In other words, the tools you generate with the
TF run just like any other Galaxy tool.

TF-generated tools work as normal workflow components.

## Limitations

The TF is flexible enough to generate wrappers for many common scientific packages
but the inbuilt automation will not cope with all possible situations. Users can
supply overrides for two tool XML segments, tests and command; the BWA
example in the supplied samples workflow illustrates their use. The TF does not (yet) deal with
repeated elements or conditional parameters, such as allowing a user to choose to see "simple"
or "advanced" parameters, and there will be plenty of packages it just
won't cover - but it's a quick and efficient tool for the other 90% of cases. Perfect for
that bash one liner you need to get that workflow functioning correctly for this
afternoon's demonstration!

## Installation

The Docker container https://github.com/fubar2/toolfactory-galaxy-docker/blob/main/README.md
is the best way to use the TF because it is preconfigured
to automate new tool testing and has a built-in local toolshed where each new tool
is uploaded. If you grab the Docker container, it should just work after a restart and you
can run a workflow to generate all the sample tools. Running the samples and rerunning the ToolFactory
jobs that generated them allows you to add fields and experiment to see how things work.

The TF can be installed like any other tool from the Toolshed, but you will need to make some
configuration changes (TODO write a configuration). You can install it most conveniently using the
administrative "Search and browse tool sheds" link. Find the Galaxy Main
toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory
repository in the Tool Maker section. Open it, review the code and select the option to install it.

If it is not already there, please add:

```
<datatype extension="tgz" type="galaxy.datatypes.binary:Binary" mimetype="multipart/x-gzip" subclass="True" />
```

to your local config/datatypes_conf.xml.

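One way to check whether the `tgz` datatype is already registered is to parse the datatypes configuration and look for a matching `extension` attribute. The sketch below works on an inline sample string so that it is self-contained; the sample layout (a `datatypes` root with a `registration` element) and the function name `has_datatype` are assumptions for illustration. For a real server, the text of the local datatypes configuration file would be read instead.

```python
import xml.etree.ElementTree as ET


def has_datatype(conf_xml, extension):
    """Return True if any <datatype> element declares the given extension."""
    root = ET.fromstring(conf_xml)
    # iter() walks the whole tree, so the exact nesting does not matter.
    return any(dt.get("extension") == extension for dt in root.iter("datatype"))


# Assumed, much shortened sample of a datatypes configuration file.
sample_conf = """<datatypes>
  <registration>
    <datatype extension="txt" type="galaxy.datatypes.data:Text" />
    <datatype extension="tgz" type="galaxy.datatypes.binary:Binary" mimetype="multipart/x-gzip" subclass="True" />
  </registration>
</datatypes>"""

tgz_registered = has_datatype(sample_conf, "tgz")
print("tgz registered:", tgz_registered)
```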
## Restricted execution

The tool factory tool itself will ONLY run for admin users -
people whose IDs are listed under "admin_users" in config/galaxy.yml.

*ONLY admin_users can run this tool*

That doesn't mean it's safe to install on a shared or exposed instance - please don't.

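The kind of check involved can be sketched as follows, assuming that `admin_users` holds a comma-separated list of login email addresses. This is not the TF's actual code; `is_admin` and the example addresses are hypothetical. Splitting the list and comparing whole entries avoids the false positives a plain substring test would give.

```python
def is_admin(user_email, admin_users):
    """Return True if user_email is one of the comma-separated admin_users entries."""
    admins = {entry.strip() for entry in admin_users.split(",") if entry.strip()}
    return user_email.strip() in admins


# Assumed example value of the admin_users setting.
admin_users = "admin@example.org, boss@example.org"

print(is_admin("admin@example.org", admin_users))
print(is_admin("min@example.org", admin_users))
```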
## Generated tool Security

Once you install a generated tool, it's just
another tool - assuming the script is safe. Generated tools run normally and their
users cannot do anything unusually insecure, but please practice safe toolshed.
Read the code before you install any tool. Especially this one - it is really scary.

## Attribution

Creating re-usable tools from scripts: The Galaxy Tool Factory
Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573

http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref
