Merge pull request #25 from co-cddo/move-schema-docs-to-separate-file
Move schema docs to /schema
RobNicholsGDS authored Oct 16, 2024
2 parents 1b249b5 + 798e602 commit 3923127
Showing 2 changed files with 253 additions and 233 deletions.
236 changes: 3 additions & 233 deletions readme.md
@@ -19,240 +19,10 @@ JSON Schema is a widely used protocol which means there are many tools that allo
environments. There are also a large number of tools that allow objects defined using other protocols to be compared to the schema;
for example XML.

The Project’s JSON Schema are stored at /schema
The JSON Schema descriptions and links to individual files can be found at
[/schema](https://co-cddo.github.io/data-catalogue-metadata/schema)


### JSON

[JavaScript Object Notation (JSON)](https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/JSON) is a standard text-based
format for representing structured data based on JavaScript object syntax. It is commonly used for transmitting data in web
applications (for example, sending some data from the server to the client, so it can be displayed on a web page, or vice versa).

JSON has been chosen as the default format for describing the Data Marketplace metadata models because of its simple structure (it
is basically just a text representation of key/value pairs) and ability to represent nested structures (where a value is itself
another JSON object or array).

For example, this is a JSON representation of Dataset metadata that could have been submitted via an ESDA template spreadsheet:
```json
{
  "identifier": "8d085327-21b6-4d8b-9705-88faad231d22",
  "supplierIdentifier": "acme-123",
  "modified": "2023-12-09T16:09:53+00:00",
  "status": "Published",
  "title": "Advance Passenger Information",
  "description": "Travel data and personal data given to airlines by passenger.",
  "type": "Data Set",
  "theme": [
    "Transport and infrastructure",
    "Population and society"
  ],
  "keyword": [
    "Air travel",
    "Passport",
    "Airports",
    "leaving UK",
    "entering UK"
  ],
  "contactPoint": [
    {
      "name": "Jane Doe",
      "email": "[email protected]"
    }
  ],
  "publisher": "academy-for-social-justice",
  "securityClassification": "OFFICIAL",
  "accessRights": "INTERNAL",
  "distribution": [
    {
      "accessService": ["8d085327-21b6-4d8b-9705-88faad231d23"],
      "downloadURL": "http://example.com/path/to/file.csv",
      "mediaType": ["text/csv"]
    }
  ]
}
```
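
As a minimal sketch of what "key/value pairs" and "nested structures" mean in practice, the Ruby snippet below parses a trimmed copy of the Dataset example with the standard `json` library and reads values at different depths. The variable names are illustrative only and are not part of the project.

```ruby
require "json"

# A trimmed copy of the Dataset example above; only a few fields are kept.
dataset_json = <<~JSON
  {
    "title": "Advance Passenger Information",
    "theme": ["Transport and infrastructure", "Population and society"],
    "distribution": [
      {
        "downloadURL": "http://example.com/path/to/file.csv",
        "mediaType": ["text/csv"]
      }
    ]
  }
JSON

dataset = JSON.parse(dataset_json)

# A top-level key/value pair.
title = dataset["title"]

# A value that is itself an array of strings.
theme_count = dataset["theme"].length

# A value nested two levels down: array -> object -> key.
first_download_url = dataset.dig("distribution", 0, "downloadURL")

puts "#{title}: #{theme_count} themes, first file at #{first_download_url}"
```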

One of the key advantages of the JSON description of the data over the spreadsheet description is the better support for nesting.
For example, if a second contactPoint is required another object can be added to the contactPoint array:
```json
"contactPoint": [
  {
    "name": "Jane Doe",
    "email": "[email protected]"
  },
  {
    "name": "Fred Bloggs",
    "email": "[email protected]"
  }
],
```
Within a spreadsheet, either a separate sheet or additional fields would need to be added (for example, extra fields
contactPointName2 and contactPointEmail2).
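
To illustrate the difference, here is a rough Ruby sketch that turns a spreadsheet-style row with numbered columns into the nested `contactPoint` array. The column naming pattern (`contactPointName1`, `contactPointEmail1`, and so on), the helper name `nest_contact_points`, and the email addresses are assumptions made for this example; they are not taken from the ESDA template.

```ruby
require "json"

# Hypothetical spreadsheet row: repeated contacts need numbered columns.
# The addresses are placeholders, not project data.
row = {
  "contactPointName1" => "Jane Doe",
  "contactPointEmail1" => "jane.doe@example.com",
  "contactPointName2" => "Fred Bloggs",
  "contactPointEmail2" => "fred.bloggs@example.com"
}

# Collect the numbered columns into one object per contact.
def nest_contact_points(row)
  contacts = Hash.new { |hash, index| hash[index] = {} }
  row.each do |column, value|
    match = column.match(/\AcontactPoint(Name|Email)(\d+)\z/)
    next unless match

    contacts[match[2].to_i][match[1].downcase] = value
  end
  # Order by column number and drop the numbers.
  contacts.sort.map { |_index, contact| contact }
end

contact_points = nest_contact_points(row)
puts JSON.pretty_generate({ "contactPoint" => contact_points })
```

In the JSON form a third contact is one more array element, whereas the spreadsheet form would need a new pair of columns.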

In the past the spreadsheet representation of the data has made it difficult to do things like define multiple distribution
formats for a single dataset. As with the contactPoint example above, in the JSON description it is straightforward to add
additional distribution objects:
```json
"distribution": [
  {
    "accessService": ["8d085327-21b6-4d8b-9705-88faad231d23"],
    "downloadURL": "http://example.com/path/to/file.csv",
    "mediaType": ["text/csv"]
  },
  {
    "accessService": ["8d085327-21b6-4d8b-9705-88faad231d23"],
    "downloadURL": "http://example.com/path/to/file.json",
    "mediaType": ["application/json"]
  },
  {
    "accessService": ["8d085327-21b6-4d8b-9705-88faad231d23"],
    "downloadURL": "http://example.com/path/to/file.xml",
    "mediaType": ["text/xml"]
  }
]
```
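
A consumer of this metadata could then pick the distribution that suits it. The following is a small sketch, assuming the array shape shown above; the helper name `download_url_for` is hypothetical.

```ruby
require "json"

# Abbreviated copy of the distribution array above.
distributions = JSON.parse(<<~JSON)
  [
    { "downloadURL": "http://example.com/path/to/file.csv",  "mediaType": ["text/csv"] },
    { "downloadURL": "http://example.com/path/to/file.json", "mediaType": ["application/json"] },
    { "downloadURL": "http://example.com/path/to/file.xml",  "mediaType": ["text/xml"] }
  ]
JSON

# Return the download URL of the first distribution offering the media type,
# or nil when no distribution matches.
def download_url_for(distributions, media_type)
  match = distributions.find { |dist| dist["mediaType"].include?(media_type) }
  match && match["downloadURL"]
end

json_url = download_url_for(distributions, "application/json")
puts json_url
```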

### JSON Schema

[JSON Schema](https://json-schema.org/) is the vocabulary that enables JSON data consistency, validity, and interoperability
at scale. Each schema is itself defined by a JSON object with the vocabulary defining the necessary structure of the object.

This is a simple JSON Schema that describes the structure of a Person JSON object:
```json
{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    }
  }
}
```

The schema defines an object titled “Person” with three properties: firstName, lastName, and age. The two name
properties must have string values, and age must be an integer that is not less than zero. Because the schema has no
`required` keyword, all three properties are optional.

Here is a person object that matches that schema:

```json
{
  "firstName": "John",
  "lastName": "Doe",
  "age": 21
}
```
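
To make the meaning of "matches" concrete, the Ruby sketch below hand-checks an object against the two keywords the Person schema uses, `type` and `minimum`. It is deliberately simplified and is not a JSON Schema implementation: it ignores every other keyword, and the helper names `type_matches?` and `person_errors` are invented for this example. In practice a full JSON Schema validator library should be used.

```ruby
require "json"

# Supports only the two types that appear in the Person schema.
def type_matches?(type, value)
  case type
  when "string" then value.is_a?(String)
  when "integer" then value.is_a?(Integer)
  else raise ArgumentError, "unsupported type: #{type}"
  end
end

# Return a list of human-readable problems; an empty list means the object
# satisfies the "type" and "minimum" rules of every property it contains.
def person_errors(schema, person)
  schema["properties"].each_with_object([]) do |(name, rules), errors|
    next unless person.key?(name) # properties are optional unless "required"

    value = person[name]
    unless type_matches?(rules["type"], value)
      errors << "#{name} must be of type #{rules['type']}"
      next
    end
    if rules.key?("minimum") && value < rules["minimum"]
      errors << "#{name} must be >= #{rules['minimum']}"
    end
  end
end

# The Person schema from above with descriptions removed.
schema = JSON.parse(<<~JSON)
  {
    "type": "object",
    "properties": {
      "firstName": { "type": "string" },
      "lastName": { "type": "string" },
      "age": { "type": "integer", "minimum": 0 }
    }
  }
JSON

valid_errors = person_errors(schema, { "firstName" => "John", "lastName" => "Doe", "age" => 21 })
invalid_errors = person_errors(schema, { "firstName" => "John", "age" => -1 })
puts valid_errors.inspect
puts invalid_errors.inspect
```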

#### Running the JSON schema locally

A Ruby script is provided that loads the project JSON Schema and sample JSON objects, and attempts to validate the
objects against the schema.

To use the script (assuming Ruby is installed locally) first run `bundle` to install the dependencies, and then run
the script using:

```bash
ruby schema_test.rb
```

Data Marketplace objects described by the schema
------------------------------------------------

There are three main types of objects described by the project JSON schema.

- Core objects: describe the primary objects that need to be described within the marketplace catalogue.
- Shared objects: contain structures that are shared across core objects.
- Supporting objects: objects that are useful in systems working with the core objects.

### Core objects

#### Dataset

The metadata representation of a data object. The object could be a spreadsheet, a database, or even a part of a database
(a query result for example). Typically within the Data Marketplace an ESDA record is represented as a Dataset.

#### Data Service

The service that returns data. That could be an API, a file system, or any data storage or delivery system. The data
returned could range from simple Yes/No boolean responses to gigabytes of structured data.

#### Data Share

The data gathered in support of a data share request.

### Shared objects

#### Catalogued Resource

The core objects share a common root structure. That is, there are a number of fields that are contained within each of
the core objects. Rather than repeating the defining schema for these common fields in each of the core object schema,
these field descriptions are defined once in the Catalogued Resource. Then each core object schema refers to this common
schema.

The effect of this is that Dataset, Data Service, and Data Share objects will each contain all the fields described in
the Catalogued Resource, as well as the fields specific to themselves.

Currently there is no use case for a catalogued resource object to be used directly and in isolation. All catalogued
resources will be defined via the core objects: Dataset, Data Service, and Data Share. Note that the Data Group
supporting object also shares the Catalogued Resource fields.
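
One common way to express this kind of sharing in JSON Schema is an `allOf` entry containing a `$ref` to the common schema. The Ruby sketch below assumes that approach purely for illustration; the schema contents, the `$id` URLs, and the helper `effective_property_names` are hypothetical, and the real definitions are in the files under /schema.

```ruby
require "json"

# Hypothetical, heavily trimmed schemas.
catalogued_resource = {
  "$id" => "https://example.com/catalogued-resource.schema.json",
  "type" => "object",
  "properties" => {
    "identifier" => { "type" => "string" },
    "title" => { "type" => "string" }
  }
}

# The core object refers to the shared schema and adds its own fields.
dataset = {
  "$id" => "https://example.com/dataset.schema.json",
  "allOf" => [{ "$ref" => catalogued_resource["$id"] }],
  "properties" => {
    "distribution" => { "type" => "array" }
  }
}

# Resolve "$ref" entries against a registry keyed by "$id" and list every
# property name that the schema describes, shared fields first.
def effective_property_names(schema, registry)
  inherited = schema.fetch("allOf", []).flat_map do |part|
    effective_property_names(registry.fetch(part["$ref"]), registry)
  end
  (inherited + schema.fetch("properties", {}).keys).uniq
end

registry = { catalogued_resource["$id"] => catalogued_resource }
dataset_fields = effective_property_names(dataset, registry)
puts dataset_fields.inspect
```

The shared field descriptions are written once, and each core object picks them up by reference.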

#### Shared schema
Some object definitions are more complex than a simple type and are used in multiple places. The prime example of this
is the date stamp, for which a number of formats are supported. The shared schema is the place where these object
descriptions are defined once. The other object schema then reference these descriptions.
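
As a rough illustration of why a date stamp benefits from a single shared definition, the Ruby sketch below accepts two formats: a bare `YYYY-MM-DD` date or an ISO 8601 date-time such as the `modified` value in the Dataset example. These two formats are an assumption for this example; the formats actually supported are those defined in the shared schema. The helper name `valid_date_stamp?` is hypothetical.

```ruby
require "date"
require "time"

# Accepts either a calendar date (YYYY-MM-DD) or an ISO 8601 date-time.
# Assumed formats for illustration only.
def valid_date_stamp?(value)
  return false unless value.is_a?(String)

  if (match = value.match(/\A(\d{4})-(\d{2})-(\d{2})\z/))
    year, month, day = match.captures.map(&:to_i)
    return Date.valid_date?(year, month, day)
  end

  Time.iso8601(value) # raises ArgumentError when the string is not a date-time
  true
rescue ArgumentError
  false
end

puts valid_date_stamp?("2023-12-09T16:09:53+00:00")
puts valid_date_stamp?("2023-12-09")
puts valid_date_stamp?("09/12/2023")
```

Keeping a rule like this in one place means every schema that needs a date stamp applies the same check.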

### Supporting objects

#### Data Group

There are a number of use cases where it would be useful to group a number of core objects together. For example,
if annual examination results were presented as a Dataset for each year, it would be convenient to be able to refer to
all the examination datasets within a period via a single group object.

The main purpose of the Data Group is as a container for references (ids or locations) for the objects within the group.

The Data Group shares the common structure of the core objects and is a type of catalogued resource. Therefore it contains
the fields defined in the catalogued resource schema in addition to the fields defined in its own schema.

As the Data Group shares the same basic structure as the core objects, it can be used as a replacement for one of those
objects within the systems using them. This facilitates actions such as a search for metadata concerning “exam results”
returning both individual metadata records and groups of matching records.
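
The search scenario can be sketched in Ruby as follows. The records, the `"Data Group"` type value, and the `members` field holding references are all assumptions made for this example and may not match the field names in the Data Group schema.

```ruby
# Hypothetical in-memory records. Only "identifier", "title" and "type"
# appear in the Dataset example; "members" is an invented field name.
records = [
  { "type" => "Data Set", "identifier" => "exam-2022", "title" => "Exam results 2022" },
  { "type" => "Data Set", "identifier" => "exam-2023", "title" => "Exam results 2023" },
  { "type" => "Data Set", "identifier" => "api-2023", "title" => "Advance Passenger Information" },
  { "type" => "Data Group", "identifier" => "exam-all", "title" => "Exam results 2022 to 2023",
    "members" => %w[exam-2022 exam-2023] }
]

# Because a group carries the same shared fields as the core objects,
# one case-insensitive title search covers both kinds of record.
def search_titles(records, term)
  records.select { |record| record["title"].downcase.include?(term.downcase) }
end

matches = search_titles(records, "exam results")
matches.each { |record| puts "#{record['type']}: #{record['title']}" }
```

The search code does not need to know whether a match is an individual record or a group.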

#### Error object

The schema also contains a definition of an object that will contain information about an error that has occurred while
moving or processing Data Marketplace data objects.

This is an example of an error object:
```json
{
  "message": "Validation failure",
  "code": "401",
  "errors": [
    {
      "type": "http://example.com/docs/errors/exceeded-max-length",
      "detail": "The attribute 'summary' is too long. It should be less than 250 characters",
      "instance": "identifier:f58ce342-d3cc-4a13-bba4-ac958c39397b",
      "location": "/summary",
      "severity": "minor"
    }
  ]
}
```
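
A system processing metadata might build such an object when a check fails. The Ruby sketch below reproduces the shape of the example above; the helper names `max_length_error` and `check_summary` are hypothetical, and the values (including the `code` and the 250-character limit) are copied from the example rather than from the schema.

```ruby
require "json"

# Build an error object in the shape shown above for an over-long attribute.
def max_length_error(identifier, attribute, limit)
  {
    "message" => "Validation failure",
    "code" => "401",
    "errors" => [
      {
        "type" => "http://example.com/docs/errors/exceeded-max-length",
        "detail" => "The attribute '#{attribute}' is too long. " \
                    "It should be less than #{limit} characters",
        "instance" => "identifier:#{identifier}",
        "location" => "/#{attribute}",
        "severity" => "minor"
      }
    ]
  }
end

# Return nil when the summary is short enough, otherwise an error object.
def check_summary(record, limit = 250)
  summary = record.fetch("summary", "")
  return nil if summary.length < limit

  max_length_error(record["identifier"], "summary", limit)
end

record = { "identifier" => "f58ce342-d3cc-4a13-bba4-ac958c39397b", "summary" => "x" * 300 }
error = check_summary(record)
puts JSON.pretty_generate(error)
```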

The API Specification
---------------------

@@ -333,5 +103,5 @@ logic in place. It’s automatically generated from your OpenAPI (formerly known
documentation making it easy for back end implementation and client side consumption.

A Swagger UI representation of the API Specification can be found at
[/swagger](https://co-cddo.github.io/demo-data-marketplace-metadata-json-schema/swagger/)
[/swagger](https://co-cddo.github.io/data-catalogue-metadata/swagger/)
on this repository's GitHub Pages site.