diff env/lib/python3.7/site-packages/cwltool/schemas/v1.1.0-dev1/concepts.md @ 0:26e78fe6e8c4 draft

"planemo upload commit c699937486c35866861690329de38ec1a5d9f783"
author shellac
date Sat, 02 May 2020 07:14:21 -0400
parents
children
line wrap: on
line diff
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/env/lib/python3.7/site-packages/cwltool/schemas/v1.1.0-dev1/concepts.md	Sat May 02 07:14:21 2020 -0400
@@ -0,0 +1,586 @@
+## References to other specifications
+
+**Javascript Object Notation (JSON)**: http://json.org
+
+**JSON Linked Data (JSON-LD)**: http://json-ld.org
+
+**YAML**: http://yaml.org
+
+**Avro**: https://avro.apache.org/docs/1.8.1/spec.html
+
+**Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986)
+
+**Internationalized Resource Identifiers (IRIs)**:
+https://tools.ietf.org/html/rfc3987
+
+**Portable Operating System Interface (POSIX.1-2008)**: http://pubs.opengroup.org/onlinepubs/9699919799/
+
+**Resource Description Framework (RDF)**: http://www.w3.org/RDF/
+
+## Scope
+
+This document describes CWL syntax, execution, and object model.  It
+is not intended to document a CWL specific implementation, however it may
+serve as a reference for the behavior of conforming implementations.
+
+## Terminology
+
+The terminology used to describe CWL documents is defined in the
+Concepts section of the specification. The terms defined in the
+following list are used in building those definitions and in describing the
+actions of a CWL implementation:
+
+**may**: Conforming CWL documents and CWL implementations are permitted but
+not required to behave as described.
+
+**must**: Conforming CWL documents and CWL implementations are required to behave
+as described; otherwise they are in error.
+
+**error**: A violation of the rules of this specification; results are
+undefined. Conforming implementations may detect and report an error and may
+recover from it.
+
+**fatal error**: A violation of the rules of this specification; results are
+undefined. Conforming implementations must not continue to execute the current
+process and may report an error.
+
+**at user option**: Conforming software may or must (depending on the modal verb in
+the sentence) behave as described; if it does, it must provide users a means to
+enable or disable the behavior described.
+
+**deprecated**: Conforming software may implement a behavior for backwards
+compatibility.  Portable CWL documents should not rely on deprecated behavior.
+Behavior marked as deprecated may be removed entirely from future revisions of
+the CWL specification.
+
+# Data model
+
+## Data concepts
+
+An **object** is a data structure equivalent to the "object" type in JSON,
+consisting of a unordered set of name/value pairs (referred to here as
+**fields**) and where the name is a string and the value is a string, number,
+boolean, array, or object.
+
+A **document** is a file containing a serialized object, or an array of objects.
+
+A **process** is a basic unit of computation which accepts input data,
+performs some computation, and produces output data. Examples include
+CommandLineTools, Workflows, and ExpressionTools.
+
+An **input object** is an object describing the inputs to an invocation of
+a process.
+
+An **output object** is an object describing the output resulting from an
+invocation of a process.
+
+An **input schema** describes the valid format (required fields, data types)
+for an input object.
+
+An **output schema** describes the valid format for an output object.
+
+**Metadata** is information about workflows, tools, or input items.
+
+## Syntax
+
+CWL documents must consist of an object or array of objects represented using
+JSON or YAML syntax.  Upon loading, a CWL implementation must apply the
+preprocessing steps described in the
+[Semantic Annotations for Linked Avro Data (SALAD) Specification](SchemaSalad.html).
+An implementation may formally validate the structure of a CWL document using
+SALAD schemas located at
+https://github.com/common-workflow-language/common-workflow-language/tree/master/v1.1.0-dev1
+
+### map
+
+Note: This section is non-normative.
+> type: array<ComplexType> |  
+> map<`key_field`, ComplexType>
+
+The above syntax in the CWL specifications means there are two or more ways to write the given value.
+
+Option one is a array and is the most verbose option.
+
+Option one generic example:
+```
+some_cwl_field:
+  - key_field: a_complex_type1
+    field2: foo
+    field3: bar
+  - key_field: a_complex_type2
+    field2: foo2
+    field3: bar2
+  - key_field: a_complex_type3
+```
+
+Option one specific example using [Workflow](Workflow.html#Workflow).[inputs](Workflow.html#WorkflowInputParameter):
+> array<InputParameter> |  
+> map<`id`, `type` | InputParameter>
+
+
+```
+inputs:
+  - id: workflow_input01
+    type: string
+  - id: workflow_input02
+    type: File
+    format: http://edamontology.org/format_2572
+```
+
+Option two is enabled by the `map<…>` syntax. Instead of an array of entries we
+use a mapping, where one field of the `ComplexType` (here named `key_field`)
+becomes the key in the map, and its value is the rest of the `ComplexType`
+without the key field. If all of the other fields of the `ComplexType` are
+optional and unneeded, then we can indicate this with an empty mapping as the
+value: `a_complex_type3: {}`
+
+Option two generic example:
+```
+some_cwl_field:
+  a_complex_type1:  # this was the "key_field" from above
+    field2: foo
+    field3: bar
+  a_complex_type2:
+    field2: foo2
+    field3: bar2
+  a_complex_type3: {}  # we accept the defualt values for "field2" and "field3"
+```
+
+Option two specific example using [Workflow](Workflow.html#Workflow).[inputs](Workflow.html#WorkflowInputParameter):
+> array&lt;InputParameter&gt; |  
+> map&lt;`id`, `type` | InputParameter&gt;
+
+
+```
+inputs:
+  workflow_input01:
+    type: string
+  workflow_input02:
+    type: File
+    format: http://edamontology.org/format_2572
+```
+
+Option two specific example using [SoftwareRequirement](#SoftwareRequirement).[packages](#SoftwarePackage):
+> array&lt;SoftwarePackage&gt; |  
+> map&lt;`package`, `specs` | SoftwarePackage&gt;
+
+
+```
+hints:
+  SoftwareRequirement:
+    packages:
+      sourmash:
+        specs: [ https://doi.org/10.21105/joss.00027 ]
+      screed:
+        version: [ "1.0" ]
+      python: {}
+```
+`
+Sometimes we have a third and even more compact option denoted like this:
+> type: array&lt;ComplexType&gt; |  
+> map&lt;`key_field`, `field2` | ComplexType&gt;
+
+For this example, if we only need the `key_field` and `field2` when specifying
+our `ComplexType`s (because the other fields are optional and we are fine with
+their default values) then we can abbreviate.
+
+Option three generic example:
+```
+some_cwl_field:
+  a_complex_type1: foo   # we accept the default value for field3
+  a_complex_type2: foo2  # we accept the default value for field3
+  a_complex_type3: {}    # we accept the default values for "field2" and "field3"
+```
+
+Option three specific example using [Workflow](Workflow.html#Workflow).[inputs](Workflow.html#WorkflowInputParameter):
+> array&lt;InputParameter&gt; |
+> map&lt;`id`, `type` | InputParameter&gt;
+
+
+```
+inputs:
+  workflow_input01: string
+  workflow_input02: File  # we accept the default of no File format
+```
+
+Option three specific example using [SoftwareRequirement](#SoftwareRequirement).[packages](#SoftwarePackage):
+> array&lt;SoftwarePackage&gt; |  
+> map&lt;`package`, `specs` | SoftwarePackage&gt;
+
+
+```
+hints:
+  SoftwareRequirement:
+    packages:
+      sourmash: [ https://doi.org/10.21105/joss.00027 ]
+      python: {}
+```
+
+
+What if some entries we want to mix the option 2 and 3? You can!
+
+Mixed option 2 and 3 generic example:
+```
+some_cwl_field:
+  my_complex_type1: foo   # we accept the default value for field3
+  my_complex_type2:
+    field2: foo2
+    field3: bar2          # we did not accept the default value for field3
+                          # so we had to use the slightly expanded syntax
+  my_complex_type3: {}    # as before, we accept the default values for both
+                          # "field2" and "field3"
+```
+
+Mixed option 2 and 3 specific example using [Workflow](Workflow.html#Workflow).[inputs](Workflow.html#WorkflowInputParameter):
+> array&lt;InputParameter&gt; |
+> map&lt;`id`, `type` | InputParameter&gt;
+
+
+```
+inputs:
+  workflow_input01: string
+  workflow_input02:     # we use the longer way
+    type: File          # because we want to specify the "format" too
+    format: http://edamontology.org/format_2572
+  workflow_input03: {}  # back to the short form as this entry
+                        # uses the default of no "type" just like the prior
+                        # examples
+```
+
+Mixed option 2 and 3 specific example using [SoftwareRequirement](#SoftwareRequirement).[packages](#SoftwarePackage):
+> array&lt;SoftwarePackage&gt; |  
+> map&lt;`package`, `specs` | SoftwarePackage&gt;
+
+
+```
+hints:
+  SoftwareRequirement:
+    packages:
+      sourmash: [ https://doi.org/10.21105/joss.00027 ]
+      screed:
+        specs: [ https://github.com/dib-lab/screed ]
+        version: [ "1.0" ]
+      python: {}
+```
+
+Note: The `map<…>` (compact) versions are optional, the verbose option #1 is
+always allowed, but for presentation reasons option 3 and 2 may be preferred
+by human readers.
+
+The normative explanation for these variations, aimed at implementors, is in the
+[Schema Salad specification](SchemaSalad.html#Identifier_maps).
+
+## Identifiers
+
+If an object contains an `id` field, that is used to uniquely identify the
+object in that document.  The value of the `id` field must be unique over the
+entire document.  Identifiers may be resolved relative to either the document
+base and/or other identifiers following the rules are described in the
+[Schema Salad specification](SchemaSalad.html#Identifier_resolution).
+
+An implementation may choose to only honor references to object types for
+which the `id` field is explicitly listed in this specification.
+
+## Document preprocessing
+
+An implementation must resolve [$import](SchemaSalad.html#Import) and
+[$include](SchemaSalad.html#Import) directives as described in the
+[Schema Salad specification](SchemaSalad.html).
+
+Another transformation defined in Schema salad is simplification of data type definitions.
+Type `<T>` ending with `?` should be transformed to `[<T>, "null"]`.
+Type `<T>` ending with `[]` should be transformed to `{"type": "array", "items": <T>}`
+
+## Extensions and metadata
+
+Input metadata (for example, a lab sample identifier) may be represented within
+a tool or workflow using input parameters which are explicitly propagated to
+output.  Future versions of this specification may define additional facilities
+for working with input/output metadata.
+
+Implementation extensions not required for correct execution (for example,
+fields related to GUI presentation) and metadata about the tool or workflow
+itself (for example, authorship for use in citations) may be provided as
+additional fields on any object.  Such extensions fields must use a namespace
+prefix listed in the `$namespaces` section of the document as described in the
+[Schema Salad specification](SchemaSalad.html#Explicit_context).
+
+Implementation extensions which modify execution semantics must be [listed in
+the `requirements` field](#Requirements_and_hints).
+
+# Execution model
+
+## Execution concepts
+
+A **parameter** is a named symbolic input or output of process, with an
+associated datatype or schema.  During execution, values are assigned to
+parameters to make the input object or output object used for concrete
+process invocation.
+
+A **CommandLineTool** is a process characterized by the execution of a
+standalone, non-interactive program which is invoked on some input,
+produces output, and then terminates.
+
+A **workflow** is a process characterized by multiple subprocess steps,
+where step outputs are connected to the inputs of downstream steps to
+form a directed acylic graph, and independent steps may run concurrently.
+
+A **runtime environment** is the actual hardware and software environment when
+executing a command line tool.  It includes, but is not limited to, the
+hardware architecture, hardware resources, operating system, software runtime
+(if applicable, such as the specific Python interpreter or the specific Java
+virtual machine), libraries, modules, packages, utilities, and data files
+required to run the tool.
+
+A **workflow platform** is a specific hardware and software implementation
+capable of interpreting CWL documents and executing the processes specified by
+the document.  The responsibilities of the workflow platform may include
+scheduling process invocation, setting up the necessary runtime environment,
+making input data available, invoking the tool process, and collecting output.
+
+A workflow platform may choose to only implement the Command Line Tool
+Description part of the CWL specification.
+
+It is intended that the workflow platform has broad leeway outside of this
+specification to optimize use of computing resources and enforce policies
+not covered by this specification.  Some areas that are currently out of
+scope for CWL specification but may be handled by a specific workflow
+platform include:
+
+* Data security and permissions
+* Scheduling tool invocations on remote cluster or cloud compute nodes.
+* Using virtual machines or operating system containers to manage the runtime
+(except as described in [DockerRequirement](CommandLineTool.html#DockerRequirement)).
+* Using remote or distributed file systems to manage input and output files.
+* Transforming file paths.
+* Determining if a process has previously been executed, and if so skipping it
+and reusing previous results.
+* Pausing, resuming or checkpointing processes or workflows.
+
+Conforming CWL processes must not assume anything about the runtime
+environment or workflow platform unless explicitly declared though the use
+of [process requirements](#Requirements_and_hints).
+
+## Generic execution process
+
+The generic execution sequence of a CWL process (including workflows and
+command line line tools) is as follows.
+
+1. Load input object.
+1. Load, process and validate a CWL document, yielding one or more process objects.
+1. If there are multiple process objects (due to [`$graph`](SchemaSalad.html#Document_graph))
+and which process object to start with is not specified in the input object (via
+a [`cwl:tool`](#Executing_CWL_documents_as_scripts) entry) or by any other means
+(like a URL fragment) then choose the process with the `id` of "#main" or "main".
+1. Validate the input object against the `inputs` schema for the process.
+1. Validate process requirements are met.
+1. Perform any further setup required by the specific process type.
+1. Execute the process.
+1. Capture results of process execution into the output object.
+1. Validate the output object against the `outputs` schema for the process.
+1. Report the output object to the process caller.
+
+## Requirements and hints
+
+A **process requirement** modifies the semantics or runtime
+environment of a process.  If an implementation cannot satisfy all
+requirements, or a requirement is listed which is not recognized by the
+implementation, it is a fatal error and the implementation must not attempt
+to run the process, unless overridden at user option.
+
+A **hint** is similar to a requirement; however, it is not an error if an
+implementation cannot satisfy all hints.  The implementation may report a
+warning if a hint cannot be satisfied.
+
+Optionally, implementations may allow requirements to be specified in the input
+object document as an array of requirements under the field name
+`cwl:requirements`. If implementations allow this, then such requirements
+should be combined with any requirements present in the corresponding Process
+as if they were specified there.
+
+Requirements specified in a parent Workflow are inherited by step processes
+if they are valid for that step. If the substep is a CommandLineTool
+only the `InlineJavascriptRequirement`, `SchemaDefRequirement`, `DockerRequirement`,
+`SoftwareRequirement`, `InitialWorkDirRequirement`, `EnvVarRequirement`, 
+`ShellCommandRequirement`, `ResourceRequirement` are valid.
+
+*As good practice, it is best to have process requirements be self-contained,
+such that each process can run successfully by itself.*
+
+If the same process requirement appears at different levels of the
+workflow, the most specific instance of the requirement is used, that is,
+an entry in `requirements` on a process implementation such as
+CommandLineTool will take precedence over an entry in `requirements`
+specified in a workflow step, and an entry in `requirements` on a workflow
+step takes precedence over the workflow.  Entries in `hints` are resolved
+the same way.
+
+Requirements override hints.  If a process implementation provides a
+process requirement in `hints` which is also provided in `requirements` by
+an enclosing workflow or workflow step, the enclosing `requirements` takes
+precedence.
+
+## Parameter references
+
+Parameter references are denoted by the syntax `$(...)` and may be used in any
+field permitting the pseudo-type `Expression`, as specified by this document.
+Conforming implementations must support parameter references.  Parameter
+references use the following subset of
+[Javascript/ECMAScript 5.1](http://www.ecma-international.org/ecma-262/5.1/)
+syntax, but they are designed to not require a Javascript engine for evaluation.
+
+In the following [BNF
+grammar](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form), character
+classes, and grammar rules are denoted in '{}', '-' denotes exclusion from a
+character class, '(())' denotes grouping, '|' denotes alternates, trailing
+'*' denotes zero or more repeats, '+' denote one or more repeats, '/' escapes
+these special characters, and all other characters are literal values.
+
+<p>
+<table class="table">
+<tr><td>symbol::             </td><td>{Unicode alphanumeric}+</td></tr>
+<tr><td>singleq::            </td><td>[' (( {character - '} | \' ))* ']</td></tr>
+<tr><td>doubleq::            </td><td>[" (( {character - "} | \" ))* "]</td></tr>
+<tr><td>index::              </td><td>[ {decimal digit}+ ]</td></tr>
+<tr><td>segment::            </td><td>. {symbol} | {singleq} | {doubleq} | {index}</td></tr>
+<tr><td>parameter reference::</td><td>$( {symbol} {segment}*)</td></tr>
+</table>
+</p>
+
+Use the following algorithm to resolve a parameter reference:
+
+  1. Match the leading symbol as the key
+  2. Look up the key in the parameter context (described below) to get the current value.
+     It is an error if the key is not found in the parameter context.
+  3. If there are no subsequent segments, terminate and return current value
+  4. Else, match the next segment
+  5. Extract the symbol, string, or index from the segment as the key
+  6. Look up the key in current value and assign as new current value.  If
+     the key is a symbol or string, the current value must be an object.
+     If the key is an index, the current value must be an array or string.
+     It is an error if the key does not match the required type, or the key is not found or out
+     of range.
+  7. Repeat steps 3-6
+
+The root namespace is the parameter context.  The following parameters must
+be provided:
+
+  * `inputs`: The input object to the current Process.
+  * `self`: A context-specific value.  The contextual values for 'self' are
+    documented for specific fields elsewhere in this specification.  If
+    a contextual value of 'self' is not documented for a field, it
+    must be 'null'.
+  * `runtime`: An object containing configuration details.  Specific to the
+    process type.  An implementation may provide
+    opaque strings for any or all fields of `runtime`.  These must be
+    filled in by the platform after processing the Tool but before actual
+    execution.  Parameter references and expressions may only use the
+    literal string value of the field and must not perform computation on
+    the contents, except where noted otherwise.
+
+If the value of a field has no leading or trailing non-whitespace
+characters around a parameter reference, the effective value of the field
+becomes the value of the referenced parameter, preserving the return type.
+
+If the value of a field has non-whitespace leading or trailing characters
+around a parameter reference, it is subject to string interpolation.  The
+effective value of the field is a string containing the leading characters,
+followed by the string value of the parameter reference, followed by the
+trailing characters.  The string value of the parameter reference is its
+textual JSON representation with the following rules:
+
+  * Leading and trailing quotes are stripped from strings
+  * Objects entries are sorted by key
+
+Multiple parameter references may appear in a single field.  This case
+must be treated as a string interpolation.  After interpolating the first
+parameter reference, interpolation must be recursively applied to the
+trailing characters to yield the final string value.
+
+## Expressions
+
+An expression is a fragment of [Javascript/ECMAScript
+5.1](http://www.ecma-international.org/ecma-262/5.1/) code evaluated by the
+workflow platform to affect the inputs, outputs, or
+behavior of a process.  In the generic execution sequence, expressions may
+be evaluated during step 5 (process setup), step 6 (execute process),
+and/or step 7 (capture output).  Expressions are distinct from regular
+processes in that they are intended to modify the behavior of the workflow
+itself rather than perform the primary work of the workflow.
+
+To declare the use of expressions, the document must include the process
+requirement `InlineJavascriptRequirement`.  Expressions may be used in any
+field permitting the pseudo-type `Expression`, as specified by this
+document.
+
+Expressions are denoted by the syntax `$(...)` or `${...}`.  A code
+fragment wrapped in the `$(...)` syntax must be evaluated as a
+[ECMAScript expression](http://www.ecma-international.org/ecma-262/5.1/#sec-11).  A
+code fragment wrapped in the `${...}` syntax must be evaluated as a
+[ECMAScript function body](http://www.ecma-international.org/ecma-262/5.1/#sec-13)
+for an anonymous, zero-argument function.  Expressions must return a valid JSON
+data type: one of null, string, number, boolean, array, object. Other return
+values must result in a `permanentFailure`. Implementations must permit any
+syntactically valid Javascript and account for nesting of parenthesis or braces
+and that strings that may contain parenthesis or braces when scanning for
+expressions.
+
+The runtime must include any code defined in the ["expressionLib" field of
+InlineJavascriptRequirement](#InlineJavascriptRequirement) prior to
+executing the actual expression.
+
+Before executing the expression, the runtime must initialize as global
+variables the fields of the parameter context described above.
+
+The effective value of the field after expression evaluation follows the
+same rules as parameter references discussed above.  Multiple expressions
+may appear in a single field.
+
+Expressions must be evaluated in an isolated context (a "sandbox") which
+permits no side effects to leak outside the context.  Expressions also must
+be evaluated in [Javascript strict mode](http://www.ecma-international.org/ecma-262/5.1/#sec-4.2.2).
+
+The order in which expressions are evaluated is undefined except where
+otherwise noted in this document.
+
+An implementation may choose to implement parameter references by
+evaluating as a Javascript expression.  The results of evaluating
+parameter references must be identical whether implemented by Javascript
+evaluation or some other means.
+
+Implementations may apply other limits, such as process isolation, timeouts,
+and operating system containers/jails to minimize the security risks associated
+with running untrusted code embedded in a CWL document.
+
+Exceptions thrown from an exception must result in a `permanentFailure` of the
+process.
+
+## Executing CWL documents as scripts
+
+By convention, a CWL document may begin with `#!/usr/bin/env cwl-runner`
+and be marked as executable (the POSIX "+x" permission bits) to enable it
+to be executed directly.  A workflow platform may support this mode of
+operation; if so, it must provide `cwl-runner` as an alias for the
+platform's CWL implementation.
+
+A CWL input object document may similarly begin with `#!/usr/bin/env
+cwl-runner` and be marked as executable.  In this case, the input object
+must include the field `cwl:tool` supplying an IRI to the default CWL
+document that should be executed using the fields of the input object as
+input parameters.
+
+The `cwl-runner` interface is required for conformance testing and is
+documented in [cwl-runner.cwl](cwl-runner.cwl).
+
+## Discovering CWL documents on a local filesystem
+
+To discover CWL documents look in the following locations:
+
+`/usr/share/commonwl/`
+
+`/usr/local/share/commonwl/`
+
+`$XDG_DATA_HOME/commonwl/` (usually `$HOME/.local/share/commonwl`)
+
+`$XDG_DATA_HOME` is from the [XDG Base Directory
+Specification](http://standards.freedesktop.org/basedir-spec/basedir-spec-0.6.html)