comparison env/lib/python3.7/site-packages/schema_salad/metaschema/salad.md @ 2:6af9afd405e9 draft
"planemo upload commit 0a63dd5f4d38a1f6944587f52a8cd79874177fc1"
author | shellac |
---|---|
date | Thu, 14 May 2020 14:56:58 -0400 |
parents | 26e78fe6e8c4 |
children |
1:75ca89e9b81c | 2:6af9afd405e9 |
---|---|
1 # Semantic Annotations for Linked Avro Data (SALAD) | |
2 | |
3 Author: | |
4 | |
5 * Peter Amstutz <pamstutz@veritasgenetics.com>, Veritas Genetics | |
6 | |
7 Contributors: | |
8 | |
9 * The developers of Apache Avro | |
10 * The developers of JSON-LD | |
11 * Nebojša Tijanić <nebojsa.tijanic@sbgenomics.com>, Seven Bridges Genomics | |
12 | |
13 # Abstract | |
14 | |
15 Salad is a schema language for describing structured linked data documents | |
16 in JSON or YAML. A Salad schema provides rules for | |
17 preprocessing, structural validation, and link checking for documents | |
18 described by a Salad schema. Salad builds on JSON-LD and the Apache Avro | |
19 data serialization system, and extends Avro with features for rich data | |
20 modeling such as inheritance, template specialization, object identifiers, | |
21 and object references. Salad was developed to provide a bridge between the | |
22 record oriented data modeling supported by Apache Avro and the Semantic | |
23 Web. | |
24 | |
25 # Status of This Document | |
26 | |
27 This document is the product of the [Common Workflow Language working | |
28 group](https://groups.google.com/forum/#!forum/common-workflow-language). The | |
29 latest version of this document is available in the "schema_salad" repository at | |
30 | |
31 https://github.com/common-workflow-language/schema_salad | |
32 | |
33 The products of the CWL working group (including this document) are made available | |
34 under the terms of the Apache License, version 2.0. | |
35 | |
36 <!--ToC--> | |
37 | |
38 # Introduction | |
39 | |
40 The JSON data model is an extremely popular way to represent structured | |
41 data. It is attractive because of its relative simplicity and is a | |
42 natural fit with the standard types of many programming languages. | |
43 However, this simplicity means that basic JSON lacks expressive features | |
44 useful for working with complex data structures and document formats, such | |
45 as schemas, object references, and namespaces. | |
46 | |
47 JSON-LD is a W3C standard providing a way to describe how to interpret a | |
48 JSON document as Linked Data by means of a "context". JSON-LD provides a | |
49 powerful solution for representing object references and namespaces in JSON | |
50 based on standard web URIs, but is not itself a schema language. Without a | |
51 schema providing a well defined structure, it is difficult to process an | |
52 arbitrary JSON-LD document as idiomatic JSON because there are many ways to | |
53 express the same data that are logically equivalent but structurally | |
54 distinct. | |
55 | |
56 Several schema languages exist for describing and validating JSON data, | |
57 such as the Apache Avro data serialization system; however, none understand | |
58 linked data. As a result, to fully take advantage of JSON-LD to build the | |
59 next generation of linked data applications, one must maintain separate | |
60 JSON schema, JSON-LD context, RDF schema, and human documentation, despite | |
61 significant overlap of content and obvious need for these documents to stay | |
62 synchronized. | |
63 | |
64 Schema Salad is designed to address this gap. It provides a schema | |
65 language and processing rules for describing structured JSON content | |
66 permitting URI resolution and strict document validation. The schema | |
67 language supports linked data through annotations that describe the linked | |
68 data interpretation of the content, enables generation of JSON-LD context | |
69 and RDF schema, and production of RDF triples by applying the JSON-LD | |
70 context. The schema language also provides for robust support of inline | |
71 documentation. | |
72 | |
73 ## Introduction to v1.1 | |
74 | |
75 This is the third version of the Schema Salad specification. It is | |
76 developed concurrently with v1.1 of the Common Workflow Language for use in | |
77 specifying the Common Workflow Language, however Schema Salad is intended to be | |
78 useful to a broader audience. Compared to the v1.0 Schema Salad | |
79 specification, the following changes have been made: | |
80 | |
81 * Add support for `default` values on record fields | |
82 * Add subscoped fields (fields which introduce a new inner scope for identifiers) | |
83 * Add the *inVocab* flag (default true) to indicate if a type is added to the vocabulary of well known terms or must be prefixed | |
84 * Add *secondaryFilesDSL* micro DSL (domain specific language) to convert text strings to a secondaryFiles record type used in CWL | |
85 * The `$mixin` feature has been removed from the specification, as it | |
86 is poorly documented, not included in conformance testing, | |
87 and not widely supported. | |
88 | |
89 ## References to Other Specifications | |
90 | |
91 **Javascript Object Notation (JSON)**: http://json.org | |
92 | |
93 **JSON Linked Data (JSON-LD)**: http://json-ld.org | |
94 | |
95 **YAML**: https://yaml.org/spec/1.2/spec.html | |
96 | |
97 **Avro**: https://avro.apache.org/docs/current/spec.html | |
98 | |
99 **Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986 | |
100 | |
101 **Resource Description Framework (RDF)**: http://www.w3.org/RDF/ | |
102 | |
103 **UTF-8**: https://www.ietf.org/rfc/rfc2279.txt | |
104 | |
105 ## Scope | |
106 | |
107 This document describes the syntax, data model, algorithms, and schema | |
108 language for working with Salad documents. It is not intended to document | |
109 a specific implementation of Salad; however, it may serve as a reference for | |
110 the behavior of conforming implementations. | |
111 | |
112 ## Terminology | |
113 | |
114 The terminology used to describe Salad documents is defined in the Concepts | |
115 section of the specification. The terms defined in the following list are | |
116 used in building those definitions and in describing the actions of a | |
117 Salad implementation: | |
118 | |
119 **may**: Conforming Salad documents and Salad implementations are permitted but | |
120 not required to be interpreted as described. | |
121 | |
122 **must**: Conforming Salad documents and Salad implementations are required | |
123 to be interpreted as described; otherwise they are in error. | |
124 | |
125 **error**: A violation of the rules of this specification; results are | |
126 undefined. Conforming implementations may detect and report an error and may | |
127 recover from it. | |
128 | |
129 **fatal error**: A violation of the rules of this specification; results | |
130 are undefined. Conforming implementations must not continue to process the | |
131 document and may report an error. | |
132 | |
133 **at user option**: Conforming software may or must (depending on the modal verb in | |
134 the sentence) behave as described; if it does, it must provide users a means to | |
135 enable or disable the behavior described. | |
136 | |
137 # Document model | |
138 | |
139 ## Data concepts | |
140 | |
141 An **object** is a data structure equivalent to the "object" type in JSON, | |
142 consisting of an unordered set of name/value pairs (referred to here as | |
143 **fields**), where the name is a string and the value is a string, number, | |
144 boolean, array, or object. | |
145 | |
146 A **document** is a file containing a serialized object, or an array of | |
147 objects. | |
148 | |
149 A **document type** is a class of files that share a common structure and | |
150 semantics. | |
151 | |
152 A **document schema** is a formal description of the grammar of a document type. | |
153 | |
154 A **base URI** is a context-dependent URI used to resolve relative references. | |
155 | |
156 An **identifier** is a URI that designates a single document or single | |
157 object within a document. | |
158 | |
159 A **vocabulary** is the set of symbolic field names and enumerated symbols defined | |
160 by a document schema, where each term maps to an absolute URI. | |
161 | |
162 ## Syntax | |
163 | |
164 Conforming Salad v1.1 documents are serialized and loaded using a | |
165 subset of YAML 1.2 syntax and UTF-8 text encoding. Salad documents | |
166 are written using the [JSON-compatible subset of YAML described in | |
167 section 10.2](https://yaml.org/spec/1.2/spec.html#id2803231). The | |
168 following features of YAML must not be used in conforming Salad | |
169 documents: | |
170 | |
171 * Use of explicit node tags with leading `!` or `!!` | |
172 * Use of anchors with leading `&` and aliases with leading `*` | |
173 * %YAML directives | |
174 * %TAG directives | |
175 | |
176 It is a fatal error if the document is not valid YAML. | |
177 | |
178 A Salad document must consist only of either a single root object or an | |
179 array of objects. | |
180 | |
181 ## Document context | |
182 | |
183 ### Implied context | |
184 | |
185 The implicit context consists of the vocabulary defined by the schema and | |
186 the base URI. By default, the base URI must be the URI that was used to | |
187 load the document. It may be overridden by an explicit context. | |
188 | |
189 ### Explicit context | |
190 | |
191 If a document consists of a root object, this object may contain the | |
192 fields `$base`, `$namespaces`, `$schemas`, and `$graph`: | |
193 | |
194 * `$base`: Must be a string. Set the base URI for the document used to | |
195 resolve relative references. | |
196 | |
197 * `$namespaces`: Must be an object with strings as values. The keys of | |
198 the object are namespace prefixes used in the document; the values of | |
199 the object are the prefix expansions. | |
200 | |
201 * `$schemas`: Must be an array of strings. This field may list URI | |
202 references to documents in RDF-XML format which will be queried for RDF | |
203 schema data. The subjects and predicates described by the RDF schema | |
204 may provide additional semantic context for the document, and may be | |
205 used for validation of prefixed extension fields found in the document. | |
206 | |
207 Other directives beginning with `$` must be ignored. | |
208 | |
209 ## Document graph | |
210 | |
211 If a document consists of a single root object, this object may contain the | |
212 field `$graph`. This field must be an array of objects. If present, this | |
213 field holds the primary content of the document. A document that consists | |
214 of an array of objects at the root is an implicit graph. | |
215 | |
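The explicit context and `$graph` rules above can be sketched in Python. This is a minimal illustration, not the schema_salad implementation: the function name `split_document`, the layout of the returned context dictionary, and the choice to treat a root object without `$graph` as a one-element graph are all assumptions made for the example. The input is assumed to be an already-parsed YAML/JSON document.

```python
from typing import Any, Dict, List, Tuple, Union

Root = Union[Dict[str, Any], List[Any]]


def split_document(root: Root, loaded_from: str) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Return (context, graph) for an already-parsed document (hypothetical helper)."""
    if isinstance(root, list):
        # An array of objects at the root is an implicit graph with no explicit context.
        if not all(isinstance(item, dict) for item in root):
            raise ValueError("root array must contain only objects")
        return {"base": loaded_from, "namespaces": {}, "schemas": []}, root
    if not isinstance(root, dict):
        raise ValueError("root must be an object or an array of objects")

    # By default the base URI is the URI the document was loaded from.
    base = root.get("$base", loaded_from)
    if not isinstance(base, str):
        raise ValueError("$base must be a string")

    namespaces = root.get("$namespaces", {})
    if not isinstance(namespaces, dict) or not all(
        isinstance(value, str) for value in namespaces.values()
    ):
        raise ValueError("$namespaces must be an object with strings as values")

    schemas = root.get("$schemas", [])
    if not isinstance(schemas, list) or not all(isinstance(s, str) for s in schemas):
        raise ValueError("$schemas must be an array of strings")

    if "$graph" in root:
        graph = root["$graph"]
        if not isinstance(graph, list) or not all(isinstance(g, dict) for g in graph):
            raise ValueError("$graph must be an array of objects")
    else:
        # Assumption for this sketch: the root object itself is the content.
        graph = [root]

    context = {"base": base, "namespaces": namespaces, "schemas": schemas}
    return context, graph
```

Returning the context separately from the graph keeps later steps (identifier and link resolution) independent of whether the document used an explicit or implicit graph.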
216 ## Document metadata | |
217 | |
218 If a document consists of a single root object, metadata about the | |
219 document, such as authorship, may be declared in the root object. | |
220 | |
221 ## Document schema | |
222 | |
223 Document preprocessing, link validation and schema validation require a | |
224 document schema. A schema may consist of: | |
225 | |
226 * At least one record definition object which defines valid fields that | |
227 make up a record type. Record field definitions include the valid types | |
228 that may be assigned to each field and annotations to indicate fields | |
229 that represent identifiers and links, described below in "Semantic | |
230 Annotations". | |
231 | |
232 * Any number of enumerated type objects which define a finite set of symbols that are | |
233 valid values of the type. | |
234 | |
235 * Any number of documentation objects which allow in-line documentation of the schema. | |
236 | |
237 The schema for defining a salad schema (the metaschema) is described in | |
238 detail in the [Schema](#Schema) section. | |
239 | |
240 ## Record field annotations | |
241 | |
242 In a document schema, record field definitions may include the field | |
243 `jsonldPredicate`, which may be either a string or object. Implementations | |
244 must preprocess fields according to the following | |
245 rules: | |
246 | |
247 * If the value of `jsonldPredicate` is `@id`, the field is an identifier | |
248 field. | |
249 | |
250 * If the value of `jsonldPredicate` is an object which | |
251 contains the field `_type` with the value `@id`, the field is a | |
252 link field subject to [link validation](#Link_validation). | |
253 | |
254 * If the value of `jsonldPredicate` is an object which contains the | |
255 field `_type` with the value `@vocab`, the field value is subject to | |
256 [vocabulary resolution](#Vocabulary_resolution). | |
257 | |
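The three rules above amount to a small classification step. The following Python sketch shows one way to express it; the function name `classify_field` and the returned labels (`"identifier"`, `"link"`, `"vocabulary"`, `"plain"`) are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict


def classify_field(field_definition: Dict[str, Any]) -> str:
    """Classify a record field definition by its `jsonldPredicate` (hypothetical helper)."""
    predicate = field_definition.get("jsonldPredicate")

    # A plain string value of "@id" marks an identifier field.
    if predicate == "@id":
        return "identifier"

    if isinstance(predicate, dict):
        predicate_type = predicate.get("_type")
        if predicate_type == "@id":
            # Link fields are subject to link validation.
            return "link"
        if predicate_type == "@vocab":
            # Vocabulary fields are subject to vocabulary resolution.
            return "vocabulary"

    # Label invented for this sketch: no special preprocessing applies.
    return "plain"
```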
258 ## Document traversal | |
259 | |
260 To perform document preprocessing, link validation and schema | |
261 validation, the document must be traversed starting from the fields or | |
262 array items of the root object or array and recursively visiting each child | |
263 item which contains an object or array. | |
264 | |
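A recursive traversal of this kind can be sketched in a few lines of Python. The function name `traverse` and the `visit` callback are assumptions made for the example, and visiting each object before its children (pre-order) is one possible choice rather than something this section requires.

```python
from typing import Any, Callable, Dict


def traverse(node: Any, visit: Callable[[Dict[str, Any]], None]) -> None:
    """Call `visit` on every object reachable from `node` (hypothetical helper)."""
    if isinstance(node, dict):
        visit(node)
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        # Scalars (strings, numbers, booleans, null) have no children.
        return

    for child in children:
        # Only descend into children that are themselves objects or arrays.
        if isinstance(child, (dict, list)):
            traverse(child, visit)
```

For example, `traverse(document, seen.append)` with `seen = []` collects every object in the document, including those nested inside arrays.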
265 ## Short names | |
266 | |
267 The "short name" of a fully qualified identifier is the portion of | |
268 the identifier following the final slash `/` of either the fragment | |
269 identifier following `#` or the path portion, if there is no fragment. | |
270 Some examples: | |
271 | |
272 * the short name of `http://example.com/foo` is `foo` | |
273 * the short name of `http://example.com/#bar` is `bar` | |
274 * the short name of `http://example.com/foo/bar` is `bar` | |
275 * the short name of `http://example.com/foo#bar` is `bar` | |
276 * the short name of `http://example.com/#foo/bar` is `bar` | |
277 * the short name of `http://example.com/foo#bar/baz` is `baz` | |
278 | |
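The short name rule can be checked against the examples above with a small Python sketch. The function name `short_name` is hypothetical; the code relies only on `urllib.parse.urlsplit` from the standard library and assumes the input is a well-formed absolute URI.

```python
from urllib.parse import urlsplit


def short_name(identifier: str) -> str:
    """Return the short name of a fully qualified identifier (hypothetical helper)."""
    parts = urlsplit(identifier)
    # Prefer the fragment (the part after `#`); fall back to the path if there is none.
    source = parts.fragment if parts.fragment else parts.path
    # Keep only what follows the final slash.
    return source.rsplit("/", 1)[-1]


examples = [
    "http://example.com/foo",
    "http://example.com/#bar",
    "http://example.com/foo/bar",
    "http://example.com/foo#bar",
    "http://example.com/#foo/bar",
    "http://example.com/foo#bar/baz",
]
for uri in examples:
    print(uri, "->", short_name(uri))
```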
279 ## Inheritance and specialization | |
280 | |
281 A record definition may inherit from one or more record definitions | |
282 with the `extends` field. This copies the fields defined in the | |
283 parent record(s) as the base for the new record. A record definition | |
284 may `specialize` type declarations of the fields inherited from the | |
285 base record. For each field inherited from the base record, any | |
286 instance of the type in `specializeFrom` is replaced with the type in | |
287 `specializeTo`. The type in `specializeTo` should extend from the | |
288 type in `specializeFrom`. | |
289 | |
290 A record definition may be `abstract`. This means the record | |
291 definition is not used for validation on its own, but may be extended | |
292 by other definitions. If an abstract type appears in a field | |
293 definition, it is logically replaced with a union of all concrete | |
294 subtypes of the abstract type. In other words, the field value does | |
295 not validate as the abstract type, but must validate as some concrete | |
296 type that inherits from the abstract type. | |
297 | |
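The effect of `extends` and `specialize` on a record's fields can be illustrated with a simplified Python sketch. The data layout here is an assumption made for the example and is not the metaschema's actual representation: `records` maps a record name to a dictionary whose `fields` entry maps field names to type names, `extends` is a name or list of names, and `specialize` is a list of `specializeFrom`/`specializeTo` pairs. The helper names `resolve_fields` and `replace_type` are hypothetical.

```python
from typing import Any, Dict


def replace_type(type_spec: Any, mapping: Dict[str, str]) -> Any:
    """Replace every occurrence of a specialized type inside a type expression."""
    if isinstance(type_spec, list):
        # A list is treated as a union of types in this sketch.
        return [replace_type(item, mapping) for item in type_spec]
    if isinstance(type_spec, dict):
        # For example {"type": "array", "items": "Item"}.
        return {key: replace_type(value, mapping) for key, value in type_spec.items()}
    return mapping.get(type_spec, type_spec)


def resolve_fields(name: str, records: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return the effective fields of `name` after inheritance and specialization."""
    record = records[name]
    parents = record.get("extends", [])
    if isinstance(parents, str):
        parents = [parents]

    # Copy the fields of the parent record(s) as the base for the new record.
    fields: Dict[str, Any] = {}
    for parent in parents:
        fields.update(resolve_fields(parent, records))

    # Specialization applies only to the inherited fields.
    mapping = {
        entry["specializeFrom"]: entry["specializeTo"]
        for entry in record.get("specialize", [])
    }
    fields = {fname: replace_type(ftype, mapping) for fname, ftype in fields.items()}

    # Fields declared on the record itself are added last.
    fields.update(record.get("fields", {}))
    return fields
```

With a record `Derived` that extends `Base` and specializes `Item` to `SpecialItem`, an inherited field of type `Item` is reported as `SpecialItem`, while fields declared directly on `Derived` are left unchanged.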
298 # Document preprocessing | |
299 | |
300 After processing the explicit context (if any), document preprocessing | |
301 begins. Starting from the document root, object field values or array | |
302 items which contain objects or arrays are recursively traversed | |
303 depth-first. For each visited object, field names, identifier fields, link | |
304 fields, vocabulary fields, and `$import` and `$include` directives must be | |
305 processed as described in this section. The order of traversal of child | |
306 nodes within a parent node is undefined. |