Mercurial > repos > shellac > guppy_basecaller

diff env/lib/python3.7/site-packages/schema_salad/metaschema/salad.md @ 2:6af9afd405e9 draft
"planemo upload commit 0a63dd5f4d38a1f6944587f52a8cd79874177fc1"
author: shellac
date: Thu, 14 May 2020 14:56:58 -0400
parents: 26e78fe6e8c4
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/env/lib/python3.7/site-packages/schema_salad/metaschema/salad.md	Thu May 14 14:56:58 2020 -0400
@@ -0,0 +1,306 @@
+# Semantic Annotations for Linked Avro Data (SALAD)
+
+Author:
+
+* Peter Amstutz <pamstutz@veritasgenetics.com>, Veritas Genetics
+
+Contributors:
+
+* The developers of Apache Avro
+* The developers of JSON-LD
+* Nebojša Tijanić <nebojsa.tijanic@sbgenomics.com>, Seven Bridges Genomics
+
+# Abstract
+
+Salad is a schema language for describing structured linked data documents
+in JSON or YAML documents.  A Salad schema provides rules for
+preprocessing, structural validation, and link checking for documents
+described by a Salad schema.  Salad builds on JSON-LD and the Apache Avro
+data serialization system, and extends Avro with features for rich data
+modeling such as inheritance, template specialization, object identifiers,
+and object references.  Salad was developed to provide a bridge between the
+record oriented data modeling supported by Apache Avro and the Semantic
+Web.
+
+# Status of This Document
+
+This document is the product of the [Common Workflow Language working
+group](https://groups.google.com/forum/#!forum/common-workflow-language).  The
+latest version of this document is available in the "schema_salad" repository at
+
+https://github.com/common-workflow-language/schema_salad
+
+The products of the CWL working group (including this document) are made available
+under the terms of the Apache License, version 2.0.
+
+<!--ToC-->
+
+# Introduction
+
+The JSON data model is an extremely popular way to represent structured
+data.  It is attractive because of its relative simplicity and is a
+natural fit with the standard types of many programming languages.
+However, this simplicity means that basic JSON lacks expressive features
+useful for working with complex data structures and document formats, such
+as schemas, object references, and namespaces.
+
+JSON-LD is a W3C standard providing a way to describe how to interpret a
+JSON document as Linked Data by means of a "context".  JSON-LD provides a
+powerful solution for representing object references and namespaces in JSON
+based on standard web URIs, but is not itself a schema language.  Without a
+schema providing a well defined structure, it is difficult to process an
+arbitrary JSON-LD document as idiomatic JSON because there are many ways to
+express the same data that are logically equivalent but structurally
+distinct.
+
+Several schema languages exist for describing and validating JSON data,
+such as the Apache Avro data serialization system, however none understand
+linked data.  As a result, to fully take advantage of JSON-LD to build the
+next generation of linked data applications, one must maintain separate
+JSON schema, JSON-LD context, RDF schema, and human documentation, despite
+significant overlap of content and obvious need for these documents to stay
+synchronized.
+
+Schema Salad is designed to address this gap.  It provides a schema
+language and processing rules for describing structured JSON content
+permitting URI resolution and strict document validation.  The schema
+language supports linked data through annotations that describe the linked
+data interpretation of the content, enables generation of JSON-LD context
+and RDF schema, and production of RDF triples by applying the JSON-LD
+context.  The schema language also provides for robust support of inline
+documentation.
+
+## Introduction to v1.1
+
+This is the third version of of the Schema Salad specification.  It is
+developed concurrently with v1.1 of the Common Workflow Language for use in
+specifying the Common Workflow Language, however Schema Salad is intended to be
+useful to a broader audience.  Compared to the v1.0 schema salad
+specification, the following changes have been made:
+
+* Support for `default` values on record fields to specify default values
+* Add subscoped fields (fields which introduce a new inner scope for identifiers)
+* Add the *inVocab* flag (default true) to indicate if a type is added to the vocabulary of well known terms or must be prefixed
+* Add *secondaryFilesDSL* micro DSL (domain specific language) to convert text strings to a secondaryFiles record type used in CWL
+* The `$mixin` feature has been removed from the specification, as it
+  is poorly documented, not included in conformance testing,
+  and not widely supported.
+
+## References to Other Specifications
+
+**Javascript Object Notation (JSON)**: http://json.org
+
+**JSON Linked Data (JSON-LD)**: http://json-ld.org
+
+**YAML**: https://yaml.org/spec/1.2/spec.html
+
+**Avro**: https://avro.apache.org/docs/current/spec.html
+
+**Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986)
+
+**Resource Description Framework (RDF)**: http://www.w3.org/RDF/
+
+**UTF-8**: https://www.ietf.org/rfc/rfc2279.txt)
+
+## Scope
+
+This document describes the syntax, data model, algorithms, and schema
+language for working with Salad documents.  It is not intended to document
+a specific implementation of Salad, however it may serve as a reference for
+the behavior of conforming implementations.
+
+## Terminology
+
+The terminology used to describe Salad documents is defined in the Concepts
+section of the specification. The terms defined in the following list are
+used in building those definitions and in describing the actions of an
+Salad implementation:
+
+**may**: Conforming Salad documents and Salad implementations are permitted but
+not required to be interpreted as described.
+
+**must**: Conforming Salad documents and Salad implementations are required
+to be interpreted as described; otherwise they are in error.
+
+**error**: A violation of the rules of this specification; results are
+undefined. Conforming implementations may detect and report an error and may
+recover from it.
+
+**fatal error**: A violation of the rules of this specification; results
+are undefined. Conforming implementations must not continue to process the
+document and may report an error.
+
+**at user option**: Conforming software may or must (depending on the modal verb in
+the sentence) behave as described; if it does, it must provide users a means to
+enable or disable the behavior described.
+
+# Document model
+
+## Data concepts
+
+An **object** is a data structure equivalent to the "object" type in JSON,
+consisting of a unordered set of name/value pairs (referred to here as
+**fields**) and where the name is a string and the value is a string, number,
+boolean, array, or object.
+
+A **document** is a file containing a serialized object, or an array of
+objects.
+
+A **document type** is a class of files that share a common structure and
+semantics.
+
+A **document schema** is a formal description of the grammar of a document type.
+
+A **base URI** is a context-dependent URI used to resolve relative references.
+
+An **identifier** is a URI that designates a single document or single
+object within a document.
+
+A **vocabulary** is the set of symbolic field names and enumerated symbols defined
+by a document schema, where each term maps to absolute URI.
+
+## Syntax
+
+Conforming Salad v1.1 documents are serialized and loaded using a
+subset of YAML 1.2 syntax and UTF-8 text encoding.  Salad documents
+are written using the [JSON-compatible subset of YAML described in
+section 10.2](https://yaml.org/spec/1.2/spec.html#id2803231).  The
+following features of YAML must not be used in conforming Salad
+documents:
+
+* Use of explicit node tags with leading `!` or `!!`
+* Use of anchors with leading `&` and aliases with leading `*`
+* %YAML directives
+* %TAG directives
+
+It is a fatal error if the document is not valid YAML.
+
+A Salad document must consist only of either a single root object or an
+array of objects.
+
+## Document context
+
+### Implied context
+
+The implicit context consists of the vocabulary defined by the schema and
+the base URI.  By default, the base URI must be the URI that was used to
+load the document.  It may be overridden by an explicit context.
+
+### Explicit context
+
+If a document consists of a root object, this object may contain the
+fields `$base`, `$namespaces`, `$schemas`, and `$graph`:
+
+  * `$base`: Must be a string.  Set the base URI for the document used to
+    resolve relative references.
+
+  * `$namespaces`: Must be an object with strings as values.  The keys of
+    the object are namespace prefixes used in the document; the values of
+    the object are the prefix expansions.
+
+  * `$schemas`: Must be an array of strings.  This field may list URI
+    references to documents in RDF-XML format which will be queried for RDF
+    schema data.  The subjects and predicates described by the RDF schema
+    may provide additional semantic context for the document, and may be
+    used for validation of prefixed extension fields found in the document.
+
+Other directives beginning with `$` must be ignored.
+
+## Document graph
+
+If a document consists of a single root object, this object may contain the
+field `$graph`.  This field must be an array of objects.  If present, this
+field holds the primary content of the document.  A document that consists
+of array of objects at the root is an implicit graph.
+
+## Document metadata
+
+If a document consists of a single root object, metadata about the
+document, such as authorship, may be declared in the root object.
+
+## Document schema
+
+Document preprocessing, link validation and schema validation require a
+document schema.  A schema may consist of:
+
+  * At least one record definition object which defines valid fields that
+  make up a record type.  Record field definitions include the valid types
+  that may be assigned to each field and annotations to indicate fields
+  that represent identifiers and links, described below in "Semantic
+  Annotations".
+
+  * Any number of enumerated type objects which define a set of finite set of symbols that are
+  valid value of the type.
+
+  * Any number of documentation objects which allow in-line documentation of the schema.
+
+The schema for defining a salad schema (the metaschema) is described in
+detail in the [Schema](#Schema) section.
+
+## Record field annotations
+
+In a document schema, record field definitions may include the field
+`jsonldPredicate`, which may be either a string or object.  Implementations
+must use the following document preprocessing of fields by the following
+rules:
+
+  * If the value of `jsonldPredicate` is `@id`, the field is an identifier
+  field.
+
+  * If the value of `jsonldPredicate` is an object, and contains that
+  object contains the field `_type` with the value `@id`, the field is a
+  link field subject to [link validation](#Link_validation).
+
+  * If the value of `jsonldPredicate` is an object which contains the
+  field `_type` with the value `@vocab`, the field value is subject to
+  [vocabulary resolution](#Vocabulary_resolution).
+
+## Document traversal
+
+To perform document document preprocessing, link validation and schema
+validation, the document must be traversed starting from the fields or
+array items of the root object or array and recursively visiting each child
+item which contains an object or arrays.
+
+## Short names
+
+The "short name" of an fully qualified identifier is the portion of
+the identifier following the final slash `/` of either the fragment
+identifier following `#` or the path portion, if there is no fragment.
+Some examples:
+
+* the short name of `http://example.com/foo` is `foo`
+* the short name of `http://example.com/#bar` is `bar`
+* the short name of `http://example.com/foo/bar` is `bar`
+* the short name of `http://example.com/foo#bar` is `bar`
+* the short name of `http://example.com/#foo/bar` is `bar`
+* the short name of `http://example.com/foo#bar/baz` is `baz`
+
+## Inheritance and specialization
+
+A record definition may inherit from one or more record definitions
+with the `extends` field.  This copies the fields defined in the
+parent record(s) as the base for the new record.  A record definition
+may `specialize` type declarations of the fields inherited from the
+base record.  For each field inherited from the base record, any
+instance of the type in `specializeFrom` is replaced with the type in
+`specializeTo`.  The type in `specializeTo` should extend from the
+type in `specializeFrom`.
+
+A record definition may be `abstract`.  This means the record
+definition is not used for validation on its own, but may be extended
+by other definitions.  If an abstract type appears in a field
+definition, it is logically replaced with a union of all concrete
+subtypes of the abstract type.  In other words, the field value does
+not validate as the abstract type, but must validate as some concrete
+type that inherits from the abstract type.
+
+# Document preprocessing
+
+After processing the explicit context (if any), document preprocessing
+begins.  Starting from the document root, object fields values or array
+items which contain objects or arrays are recursively traversed
+depth-first.  For each visited object, field names, identifier fields, link
+fields, vocabulary fields, and `$import` and `$include` directives must be
+processed as described in this section.  The order of traversal of child
+nodes within a parent node is undefined.
author	shellac
date	Thu, 14 May 2020 14:56:58 -0400
parents	26e78fe6e8c4
children