comparison env/lib/python3.7/site-packages/cwltool/schemas/v1.0/concepts.md @ 2:6af9afd405e9 draft
"planemo upload commit 0a63dd5f4d38a1f6944587f52a8cd79874177fc1"
author | shellac |
---|---|
date | Thu, 14 May 2020 14:56:58 -0400 |
parents | 26e78fe6e8c4 |
children |
1:75ca89e9b81c | 2:6af9afd405e9 |
---|---|
1 ## References to other specifications | |
2 | |
3 **Javascript Object Notation (JSON)**: http://json.org | |
4 | |
5 **JSON Linked Data (JSON-LD)**: http://json-ld.org | |
6 | |
7 **YAML**: http://yaml.org | |
8 | |
9 **Avro**: https://avro.apache.org/docs/1.8.1/spec.html | |
10 | |
11 **Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986 | |
12 | |
13 **Internationalized Resource Identifiers (IRIs)**: | |
14 https://tools.ietf.org/html/rfc3987 | |
15 | |
16 **Portable Operating System Interface (POSIX.1-2008)**: http://pubs.opengroup.org/onlinepubs/9699919799/ | |
17 | |
18 **Resource Description Framework (RDF)**: http://www.w3.org/RDF/ | |
19 | |
20 ## Scope | |
21 | |
22 This document describes CWL syntax, execution, and object model. It | |
23 is not intended to document a specific CWL implementation; however, it may | |
24 serve as a reference for the behavior of conforming implementations. | |
25 | |
26 ## Terminology | |
27 | |
28 The terminology used to describe CWL documents is defined in the | |
29 Concepts section of the specification. The terms defined in the | |
30 following list are used in building those definitions and in describing the | |
31 actions of a CWL implementation: | |
32 | |
33 **may**: Conforming CWL documents and CWL implementations are permitted but | |
34 not required to behave as described. | |
35 | |
36 **must**: Conforming CWL documents and CWL implementations are required to behave | |
37 as described; otherwise they are in error. | |
38 | |
39 **error**: A violation of the rules of this specification; results are | |
40 undefined. Conforming implementations may detect and report an error and may | |
41 recover from it. | |
42 | |
43 **fatal error**: A violation of the rules of this specification; results are | |
44 undefined. Conforming implementations must not continue to execute the current | |
45 process and may report an error. | |
46 | |
47 **at user option**: Conforming software may or must (depending on the modal verb in | |
48 the sentence) behave as described; if it does, it must provide users a means to | |
49 enable or disable the behavior described. | |
50 | |
51 **deprecated**: Conforming software may implement a behavior for backwards | |
52 compatibility. Portable CWL documents should not rely on deprecated behavior. | |
53 Behavior marked as deprecated may be removed entirely from future revisions of | |
54 the CWL specification. | |
55 | |
56 # Data model | |
57 | |
58 ## Data concepts | |
59 | |
60 An **object** is a data structure equivalent to the "object" type in JSON, | |
61 consisting of an unordered set of name/value pairs (referred to here as | |
62 **fields**), where the name is a string and the value is a string, number, | |
63 boolean, array, or object. | |
64 | |
65 A **document** is a file containing a serialized object, or an array of objects. | |
66 | |
67 A **process** is a basic unit of computation which accepts input data, | |
68 performs some computation, and produces output data. Examples include | |
69 CommandLineTools, Workflows, and ExpressionTools. | |
70 | |
71 An **input object** is an object describing the inputs to an invocation of | |
72 a process. | |
73 | |
74 An **output object** is an object describing the output resulting from an | |
75 invocation of a process. | |
76 | |
77 An **input schema** describes the valid format (required fields, data types) | |
78 for an input object. | |
79 | |
80 An **output schema** describes the valid format for an output object. | |
81 | |
82 **Metadata** is information about workflows, tools, or input items. | |
83 | |
84 ## Syntax | |
85 | |
86 CWL documents must consist of an object or array of objects represented using | |
87 JSON or YAML syntax. Upon loading, a CWL implementation must apply the | |
88 preprocessing steps described in the | |
89 [Semantic Annotations for Linked Avro Data (SALAD) Specification](SchemaSalad.html). | |
90 An implementation may formally validate the structure of a CWL document using | |
91 SALAD schemas located at | |
92 https://github.com/common-workflow-language/common-workflow-language/tree/master/v1.0 | |
93 | |
94 ## Identifiers | |
95 | |
96 If an object contains an `id` field, that is used to uniquely identify the | |
97 object in that document. The value of the `id` field must be unique over the | |
98 entire document. Identifiers may be resolved relative to the document | |
99 base, other identifiers, or both, following the rules described in the | |
100 [Schema Salad specification](SchemaSalad.html#Identifier_resolution). | |
101 | |
102 An implementation may choose to only honor references to object types for | |
103 which the `id` field is explicitly listed in this specification. | |
104 | |
105 ## Document preprocessing | |
106 | |
107 An implementation must resolve [$import](SchemaSalad.html#Import) and | |
108 [$include](SchemaSalad.html#Import) directives as described in the | |
109 [Schema Salad specification](SchemaSalad.html). | |
110 | |
111 Another transformation defined in Schema Salad is simplification of data type definitions. | |
112 Type `<T>` ending with `?` should be transformed to `[<T>, "null"]`. | |
113 Type `<T>` ending with `[]` should be transformed to `{"type": "array", "items": <T>}`. | |
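
As an illustration of the two shorthand rules above, here is a minimal Python sketch. The function name `expand_type` is hypothetical, and applying the rules recursively (so that `File[]?` expands in two steps) is an assumption of this sketch rather than something stated in this section.

```python
def expand_type(t):
    """Expand the `?` and `[]` type shorthands into their long forms.

    Hypothetical helper: non-string types (already expanded forms) are
    returned unchanged.
    """
    if isinstance(t, str):
        if t.endswith("?"):
            # `<T>?` becomes [<T>, "null"]
            return [expand_type(t[:-1]), "null"]
        if t.endswith("[]"):
            # `<T>[]` becomes {"type": "array", "items": <T>}
            return {"type": "array", "items": expand_type(t[:-2])}
    return t


print(expand_type("string?"))
print(expand_type("int[]"))
```

With these definitions, `expand_type("string?")` yields `["string", "null"]` and `expand_type("int[]")` yields `{"type": "array", "items": "int"}`.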
114 | |
115 ## Extensions and metadata | |
116 | |
117 Input metadata (for example, a lab sample identifier) may be represented within | |
118 a tool or workflow using input parameters which are explicitly propagated to | |
119 output. Future versions of this specification may define additional facilities | |
120 for working with input/output metadata. | |
121 | |
122 Implementation extensions not required for correct execution (for example, | |
123 fields related to GUI presentation) and metadata about the tool or workflow | |
124 itself (for example, authorship for use in citations) may be provided as | |
125 additional fields on any object. Such extension fields must use a namespace | |
126 prefix listed in the `$namespaces` section of the document as described in the | |
127 [Schema Salad specification](SchemaSalad.html#Explicit_context). | |
128 | |
129 Implementation extensions which modify execution semantics must be [listed in | |
130 the `requirements` field](#Requirements_and_hints). | |
131 | |
132 # Execution model | |
133 | |
134 ## Execution concepts | |
135 | |
136 A **parameter** is a named symbolic input or output of a process, with an | |
137 associated datatype or schema. During execution, values are assigned to | |
138 parameters to make the input object or output object used for concrete | |
139 process invocation. | |
140 | |
141 A **CommandLineTool** is a process characterized by the execution of a | |
142 standalone, non-interactive program which is invoked on some input, | |
143 produces output, and then terminates. | |
144 | |
145 A **workflow** is a process characterized by multiple subprocess steps, | |
146 where step outputs are connected to the inputs of downstream steps to | |
147 form a directed acyclic graph, and independent steps may run concurrently. | |
148 | |
149 A **runtime environment** is the actual hardware and software environment when | |
150 executing a command line tool. It includes, but is not limited to, the | |
151 hardware architecture, hardware resources, operating system, software runtime | |
152 (if applicable, such as the specific Python interpreter or the specific Java | |
153 virtual machine), libraries, modules, packages, utilities, and data files | |
154 required to run the tool. | |
155 | |
156 A **workflow platform** is a specific hardware and software implementation | |
157 capable of interpreting CWL documents and executing the processes specified by | |
158 the document. The responsibilities of the workflow platform may include | |
159 scheduling process invocation, setting up the necessary runtime environment, | |
160 making input data available, invoking the tool process, and collecting output. | |
161 | |
162 A workflow platform may choose to only implement the Command Line Tool | |
163 Description part of the CWL specification. | |
164 | |
165 It is intended that the workflow platform has broad leeway outside of this | |
166 specification to optimize use of computing resources and enforce policies | |
167 not covered by this specification. Some areas that are currently out of | |
168 scope for CWL specification but may be handled by a specific workflow | |
169 platform include: | |
170 | |
171 * Data security and permissions | |
172 * Scheduling tool invocations on remote cluster or cloud compute nodes. | |
173 * Using virtual machines or operating system containers to manage the runtime | |
174 (except as described in [DockerRequirement](CommandLineTool.html#DockerRequirement)). | |
175 * Using remote or distributed file systems to manage input and output files. | |
176 * Transforming file paths. | |
177 * Determining if a process has previously been executed, and if so skipping it | |
178 and reusing previous results. | |
179 * Pausing, resuming or checkpointing processes or workflows. | |
180 | |
181 Conforming CWL processes must not assume anything about the runtime | |
182 environment or workflow platform unless explicitly declared through the use | |
183 of [process requirements](#Requirements_and_hints). | |
184 | |
185 ## Generic execution process | |
186 | |
187 The generic execution sequence of a CWL process (including workflows and | |
188 command line tools) is as follows. | |
189 | |
190 1. Load, process and validate a CWL document, yielding a process object. | |
191 2. Load input object. | |
192 3. Validate the input object against the `inputs` schema for the process. | |
193 4. Validate process requirements are met. | |
194 5. Perform any further setup required by the specific process type. | |
195 6. Execute the process. | |
196 7. Capture results of process execution into the output object. | |
197 8. Validate the output object against the `outputs` schema for the process. | |
198 9. Report the output object to the process caller. | |
199 | |
200 ## Requirements and hints | |
201 | |
202 A **process requirement** modifies the semantics or runtime | |
203 environment of a process. If an implementation cannot satisfy all | |
204 requirements, or a requirement is listed which is not recognized by the | |
205 implementation, it is a fatal error and the implementation must not attempt | |
206 to run the process, unless overridden at user option. | |
207 | |
208 A **hint** is similar to a requirement; however, it is not an error if an | |
209 implementation cannot satisfy all hints. The implementation may report a | |
210 warning if a hint cannot be satisfied. | |
211 | |
212 Requirements are inherited. A requirement specified in a Workflow applies | |
213 to all workflow steps; a requirement specified on a workflow step will | |
214 apply to the process implementation of that step and any of its substeps. | |
215 | |
216 If the same process requirement appears at different levels of the | |
217 workflow, the most specific instance of the requirement is used, that is, | |
218 an entry in `requirements` on a process implementation such as | |
219 CommandLineTool will take precedence over an entry in `requirements` | |
220 specified in a workflow step, and an entry in `requirements` on a workflow | |
221 step takes precedence over the workflow. Entries in `hints` are resolved | |
222 the same way. | |
223 | |
224 Requirements override hints. If a process implementation provides a | |
225 process requirement in `hints` which is also provided in `requirements` by | |
226 an enclosing workflow or workflow step, the enclosing `requirements` takes | |
227 precedence. | |
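
The precedence rules above can be sketched in Python as follows. This is an illustrative reading, not a reference implementation: the names `find_entry` and `effective_requirement` are hypothetical, and the sketch assumes `requirements` and `hints` are lists of objects carrying a `class` field.

```python
def find_entry(entries, class_name):
    """Return the first entry with the given class, or None."""
    for entry in entries or []:
        if entry.get("class") == class_name:
            return entry
    return None


def effective_requirement(class_name, levels):
    """Pick the entry that applies to a process.

    `levels` is ordered from outermost to innermost, for example
    [workflow, workflow_step, process_implementation].
    Any `requirements` entry beats any `hints` entry; within one field,
    the most specific (innermost) level wins.
    """
    for field in ("requirements", "hints"):
        for level in reversed(levels):
            entry = find_entry(level.get(field), class_name)
            if entry is not None:
                return entry, field
    return None, None


workflow = {"requirements": [{"class": "DockerRequirement", "dockerPull": "debian:8"}]}
step = {}
tool = {"hints": [{"class": "DockerRequirement", "dockerPull": "ubuntu:16.04"}]}
entry, field = effective_requirement("DockerRequirement", [workflow, step, tool])
print(field, entry["dockerPull"])
```

In this example the tool only lists `DockerRequirement` as a hint, so the entry in the enclosing workflow's `requirements` is the one selected.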
228 | |
229 ## Parameter references | |
230 | |
231 Parameter references are denoted by the syntax `$(...)` and may be used in any | |
232 field permitting the pseudo-type `Expression`, as specified by this document. | |
233 Conforming implementations must support parameter references. Parameter | |
234 references use the following subset of | |
235 [Javascript/ECMAScript 5.1](http://www.ecma-international.org/ecma-262/5.1/) | |
236 syntax, but they are designed to not require a Javascript engine for evaluation. | |
237 | |
238 In the following BNF grammar, character classes and grammar rules are denoted | |
239 in '{}', '-' denotes exclusion from a character class, '(())' denotes grouping, | |
240 '|' denotes alternates, trailing '*' denotes zero or more repeats, '+' denotes | |
241 one or more repeats, '/' escapes these special characters, and all other | |
242 characters are literal values. | |
243 | |
244 <p> | |
245 <table class="table"> | |
246 <tr><td>symbol:: </td><td>{Unicode alphanumeric}+</td></tr> | |
247 <tr><td>singleq:: </td><td>[' (( {character - '} | \' ))* ']</td></tr> | |
248 <tr><td>doubleq:: </td><td>[" (( {character - "} | \" ))* "]</td></tr> | |
249 <tr><td>index:: </td><td>[ {decimal digit}+ ]</td></tr> | |
250 <tr><td>segment:: </td><td>. {symbol} | {singleq} | {doubleq} | {index}</td></tr> | |
251 <tr><td>parameter reference::</td><td>$( {symbol} {segment}*)</td></tr> | |
252 </table> | |
253 </p> | |
254 | |
255 Use the following algorithm to resolve a parameter reference: | |
256 | |
257 1. Match the leading symbol as the key | |
258 2. Look up the key in the parameter context (described below) to get the current value. | |
259 It is an error if the key is not found in the parameter context. | |
260 3. If there are no subsequent segments, terminate and return current value | |
261 4. Else, match the next segment | |
262 5. Extract the symbol, string, or index from the segment as the key | |
263 6. Look up the key in current value and assign as new current value. If | |
264 the key is a symbol or string, the current value must be an object. | |
265 If the key is an index, the current value must be an array or string. | |
266 It is an error if the key does not match the required type, or the key is not found or out | |
267 of range. | |
268 7. Repeat steps 3-6 | |
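
The resolution steps above can be sketched without a Javascript engine. The following Python is a minimal illustration under stated assumptions: the name `resolve` is hypothetical, every error is reported as `ValueError`, and `\w` is used for "Unicode alphanumeric", which also admits the underscore.

```python
import re

SYMBOL = re.compile(r"\w+")
# One alternative per segment form: .symbol, ['...'], ["..."], [digits]
SEGMENT = re.compile(
    r"\.(\w+)"
    r"|\['((?:\\'|[^'])*)'\]"
    r'|\["((?:\\"|[^"])*)"\]'
    r"|\[([0-9]+)\]"
)


def resolve(reference, context):
    """Resolve a reference such as "$(inputs.files[0].path)" against context."""
    if not (reference.startswith("$(") and reference.endswith(")")):
        raise ValueError("not a parameter reference: %r" % reference)
    body = reference[2:-1]
    head = SYMBOL.match(body)
    if head is None:
        raise ValueError("missing leading symbol")
    # Steps 1-2: the leading symbol is looked up in the parameter context.
    if head.group(0) not in context:
        raise ValueError("%r not in parameter context" % head.group(0))
    current = context[head.group(0)]
    pos = head.end()
    # Steps 3-7: consume segments until none remain.
    while pos < len(body):
        seg = SEGMENT.match(body, pos)
        if seg is None:
            raise ValueError("invalid segment at offset %d" % pos)
        symbol, single, double, index = seg.groups()
        if index is not None:
            if not isinstance(current, (list, str)):
                raise ValueError("index applied to a non-array, non-string value")
            if int(index) >= len(current):
                raise ValueError("index %s out of range" % index)
            current = current[int(index)]
        else:
            if symbol is not None:
                key = symbol
            elif single is not None:
                key = single.replace("\\'", "'")
            else:
                key = double.replace('\\"', '"')
            if not isinstance(current, dict):
                raise ValueError("key %r applied to a non-object value" % key)
            if key not in current:
                raise ValueError("key %r not found" % key)
            current = current[key]
        pos = seg.end()
    return current


context = {"inputs": {"files": [{"path": "/tmp/a.txt"}]}, "self": None, "runtime": {}}
print(resolve("$(inputs.files[0].path)", context))
```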
269 | |
270 The root namespace is the parameter context. The following parameters must | |
271 be provided: | |
272 | |
273 * `inputs`: The input object to the current Process. | |
274 * `self`: A context-specific value. The contextual values for 'self' are | |
275 documented for specific fields elsewhere in this specification. If | |
276 a contextual value of 'self' is not documented for a field, it | |
277 must be 'null'. | |
278 * `runtime`: An object containing configuration details. Specific to the | |
279 process type. An implementation may provide | |
280 opaque strings for any or all fields of `runtime`. These must be | |
281 filled in by the platform after processing the Tool but before actual | |
282 execution. Parameter references and expressions may only use the | |
283 literal string value of the field and must not perform computation on | |
284 the contents, except where noted otherwise. | |
285 | |
286 If the value of a field has no leading or trailing non-whitespace | |
287 characters around a parameter reference, the effective value of the field | |
288 becomes the value of the referenced parameter, preserving the return type. | |
289 | |
290 If the value of a field has non-whitespace leading or trailing characters | |
291 around a parameter reference, it is subject to string interpolation. The | |
292 effective value of the field is a string containing the leading characters, | |
293 followed by the string value of the parameter reference, followed by the | |
294 trailing characters. The string value of the parameter reference is its | |
295 textual JSON representation with the following rules: | |
296 | |
297 * Leading and trailing quotes are stripped from strings | |
298 * Object entries are sorted by key | |
299 | |
300 Multiple parameter references may appear in a single field. This case | |
301 must be treated as a string interpolation. After interpolating the first | |
302 parameter reference, interpolation must be recursively applied to the | |
303 trailing characters to yield the final string value. | |
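
The two cases above (a bare reference that preserves the value's type, and string interpolation) can be illustrated with a short Python sketch. The names `lookup`, `to_string`, and `interpolate` are hypothetical; `lookup` is a deliberately simplified resolver that handles dotted symbols only, and the exact whitespace of the JSON text is an assumption, since only key ordering and quote stripping are specified here.

```python
import json
import re

# Simplified pattern: a reference body without nested parentheses.
REFERENCE = re.compile(r"\$\(([^()]*)\)")


def lookup(path, context):
    """Hypothetical minimal resolver: dotted symbols only, e.g. inputs.name."""
    value = context
    for key in path.split("."):
        value = value[key]
    return value


def to_string(value):
    """Textual JSON form: object keys sorted, quotes stripped from strings."""
    text = json.dumps(value, sort_keys=True, ensure_ascii=False)
    if isinstance(value, str):
        text = text[1:-1]
    return text


def interpolate(field, context):
    matches = list(REFERENCE.finditer(field))
    if not matches:
        return field
    first = matches[0]
    only_whitespace_around = (
        not field[:first.start()].strip() and not field[first.end():].strip()
    )
    if len(matches) == 1 and only_whitespace_around:
        # A bare reference: return the value itself, preserving its type.
        return lookup(first.group(1), context)
    # Otherwise substitute each reference, left to right, with its string value.
    return REFERENCE.sub(lambda m: to_string(lookup(m.group(1), context)), field)


context = {"inputs": {"name": "sample", "n": 3}}
print(interpolate("$(inputs.n)", context))
print(interpolate("$(inputs.name)_$(inputs.n).txt", context))
```

The first call returns the number `3` unchanged, while the second returns the string `sample_3.txt`.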
304 | |
305 ## Expressions | |
306 | |
307 An expression is a fragment of [Javascript/ECMAScript | |
308 5.1](http://www.ecma-international.org/ecma-262/5.1/) code evaluated by the | |
309 workflow platform to affect the inputs, outputs, or | |
310 behavior of a process. In the generic execution sequence, expressions may | |
311 be evaluated during step 5 (process setup), step 6 (execute process), | |
312 and/or step 7 (capture output). Expressions are distinct from regular | |
313 processes in that they are intended to modify the behavior of the workflow | |
314 itself rather than perform the primary work of the workflow. | |
315 | |
316 To declare the use of expressions, the document must include the process | |
317 requirement `InlineJavascriptRequirement`. Expressions may be used in any | |
318 field permitting the pseudo-type `Expression`, as specified by this | |
319 document. | |
320 | |
321 Expressions are denoted by the syntax `$(...)` or `${...}`. A code | |
322 fragment wrapped in the `$(...)` syntax must be evaluated as | |
323 an [ECMAScript expression](http://www.ecma-international.org/ecma-262/5.1/#sec-11). A | |
324 code fragment wrapped in the `${...}` syntax must be evaluated as a | |
325 [ECMAScript function body](http://www.ecma-international.org/ecma-262/5.1/#sec-13) | |
326 for an anonymous, zero-argument function. Expressions must return a valid JSON | |
327 data type: one of null, string, number, boolean, array, object. Other return | |
328 values must result in a `permanentFailure`. Implementations must permit any | |
329 syntactically valid Javascript and account for nesting of parentheses or braces | |
330 and for strings that may contain parentheses or braces when scanning for | |
331 expressions. | |
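
To illustrate the scanning requirement above, the following Python sketch locates `$(...)` and `${...}` spans while tracking nested parentheses and braces and skipping quoted strings. The names `match_close` and `find_expressions` are hypothetical, and the sketch is intentionally incomplete: it does not recognize Javascript comments or regular expression literals, which a full implementation would also need to consider.

```python
CLOSERS = {"(": ")", "{": "}"}


def match_close(text, open_pos):
    """Return the offset just past the delimiter closing text[open_pos]."""
    stack = []
    quote = None
    i = open_pos
    while i < len(text):
        ch = text[i]
        if quote is not None:
            if ch == "\\":
                i += 1  # skip the escaped character inside a string
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in CLOSERS:
            stack.append(CLOSERS[ch])
        elif ch in ")}":
            if not stack or stack.pop() != ch:
                raise ValueError("mismatched %r at offset %d" % (ch, i))
            if not stack:
                return i + 1
        i += 1
    raise ValueError("unterminated expression")


def find_expressions(text):
    """Return (begin, end) offsets of every expression in text."""
    spans = []
    i = 0
    while i < len(text) - 1:
        if text[i] == "$" and text[i + 1] in CLOSERS:
            end = match_close(text, i + 1)
            spans.append((i, end))
            i = end
        else:
            i += 1
    return spans


field = 'echo $(inputs.a + ")") and ${ return {"k": "}"}; }'
for begin, end in find_expressions(field):
    print(field[begin:end])
```

Here the `)` and `}` characters inside string literals do not end the expressions, so the two spans found are `$(inputs.a + ")")` and `${ return {"k": "}"}; }`.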
332 | |
333 The runtime must include any code defined in the ["expressionLib" field of | |
334 InlineJavascriptRequirement](#InlineJavascriptRequirement) prior to | |
335 executing the actual expression. | |
336 | |
337 Before executing the expression, the runtime must initialize as global | |
338 variables the fields of the parameter context described above. | |
339 | |
340 The effective value of the field after expression evaluation follows the | |
341 same rules as parameter references discussed above. Multiple expressions | |
342 may appear in a single field. | |
343 | |
344 Expressions must be evaluated in an isolated context (a "sandbox") which | |
345 permits no side effects to leak outside the context. Expressions also must | |
346 be evaluated in [Javascript strict mode](http://www.ecma-international.org/ecma-262/5.1/#sec-4.2.2). | |
347 | |
348 The order in which expressions are evaluated is undefined except where | |
349 otherwise noted in this document. | |
350 | |
351 An implementation may choose to implement parameter references by | |
352 evaluating as a Javascript expression. The results of evaluating | |
353 parameter references must be identical whether implemented by Javascript | |
354 evaluation or some other means. | |
355 | |
356 Implementations may apply other limits, such as process isolation, timeouts, | |
357 and operating system containers/jails to minimize the security risks associated | |
358 with running untrusted code embedded in a CWL document. | |
359 | |
360 Exceptions thrown from an expression must result in a `permanentFailure` of the | |
361 process. | |
362 | |
363 ## Executing CWL documents as scripts | |
364 | |
365 By convention, a CWL document may begin with `#!/usr/bin/env cwl-runner` | |
366 and be marked as executable (the POSIX "+x" permission bits) to enable it | |
367 to be executed directly. A workflow platform may support this mode of | |
368 operation; if so, it must provide `cwl-runner` as an alias for the | |
369 platform's CWL implementation. | |
370 | |
371 A CWL input object document may similarly begin with `#!/usr/bin/env | |
372 cwl-runner` and be marked as executable. In this case, the input object | |
373 must include the field `cwl:tool` supplying an IRI to the default CWL | |
374 document that should be executed using the fields of the input object as | |
375 input parameters. | |
376 | |
377 ## Discovering CWL documents on a local filesystem | |
378 | |
379 To discover CWL documents, look in the following locations: | |
380 | |
381 `/usr/share/commonwl/` | |
382 | |
383 `/usr/local/share/commonwl/` | |
384 | |
385 `$XDG_DATA_HOME/commonwl/` (usually `$HOME/.local/share/commonwl`) | |
386 | |
387 `$XDG_DATA_HOME` is from the [XDG Base Directory | |
388 Specification](http://standards.freedesktop.org/basedir-spec/basedir-spec-0.6.html) |
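
A minimal Python sketch of building this search list follows. The function name `cwl_search_dirs` is hypothetical, and the fallback to `$HOME/.local/share` when `XDG_DATA_HOME` is unset or empty follows the "usually" note above. The text does not state any precedence among the locations, so the returned order simply mirrors the listing.

```python
import os


def cwl_search_dirs(environ=None):
    """Return the directories to search for CWL documents."""
    environ = os.environ if environ is None else environ
    home = environ.get("HOME") or os.path.expanduser("~")
    # Fall back to the conventional default when XDG_DATA_HOME is unset or empty.
    xdg_data_home = environ.get("XDG_DATA_HOME") or os.path.join(home, ".local", "share")
    return [
        "/usr/share/commonwl/",
        "/usr/local/share/commonwl/",
        os.path.join(xdg_data_home, "commonwl") + "/",
    ]


for directory in cwl_search_dirs():
    print(directory, os.path.isdir(directory))
```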