Merge pull request #22 from co-cddo/update-readme-add-schema-and-api-…
…specification-details

Add JSON Schema and API Spec documentation to README
RobNicholsGDS authored Oct 14, 2024
2 parents 0edde43 + c25f591 commit 14beadd
Showing 1 changed file with 329 additions and 8 deletions.
337 changes: 329 additions & 8 deletions readme.md
@@ -1,17 +1,338 @@
The Data Marketplace
====================

This repository stores the JSON Schema that define the structure of objects passed to and from the Data Marketplace,
together with the API specification for the interface through which those objects are transported.

JSON Schema
-----------

A JSON Schema provides a machine-readable description of a metadata object. A JSON representation of a Data Marketplace
resource’s metadata can be compared against the project’s schema, and the output is a report describing the discrepancies
between the two. This allows the schema to be used to validate metadata records.

Because JSON has a straightforward structure, the description can also be read by people, which makes it a useful aid to
discussion about how metadata objects should be constructed.

JSON Schema is a widely used specification, so there are many tools that allow it to be used in a wide variety of programming
environments. There are also many tools that allow objects defined in other formats (for example, XML) to be compared to the
schema.

The project’s JSON Schema are stored at /schema.


### JSON

[JavaScript Object Notation (JSON)](https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/JSON) is a standard text-based
format for representing structured data based on JavaScript object syntax. It is commonly used for transmitting data in web
applications (for example, sending some data from the server to the client, so it can be displayed on a web page, or vice versa).

JSON has been chosen as the default format for describing the Data Marketplace metadata models because of its simple structure (it
is basically just a text representation of key/value pairs) and ability to represent nested structures (where a value is itself
another JSON object or array).

For example, this is a JSON representation of Dataset metadata that could have been submitted via an ESDA template spreadsheet:
```json
{
  "identifier": "8d085327-21b6-4d8b-9705-88faad231d22",
  "supplierIdentifier": "acme-123",
  "modified": "2023-12-09T16:09:53+00:00",
  "status": "Published",
  "title": "Advance Passenger Information",
  "description": "Travel data and personal data given to airlines by passenger.",
  "type": "Data Set",
  "theme": [
    "Transport and infrastructure",
    "Population and society"
  ],
  "keyword": [
    "Air travel",
    "Passport",
    "Airports",
    "leaving UK",
    "entering UK"
  ],
  "contactPoint": [
    {
      "name": "Jane Doe",
      "email": "[email protected]"
    }
  ],
  "publisher": "academy-for-social-justice",
  "securityClassification": "OFFICIAL",
  "accessRights": "INTERNAL",
  "distribution": [
    {
      "accessService": ["8d085327-21b6-4d8b-9705-88faad231d23"],
      "downloadURL": "http://example.com/path/to/file.csv",
      "mediaType": ["text/csv"]
    }
  ]
}
```

One of the key advantages of the JSON description of the data over the spreadsheet description is the better support for nesting.
For example, if a second contactPoint is required another object can be added to the contactPoint array:
```json
"contactPoint": [
  {
    "name": "Jane Doe",
    "email": "[email protected]"
  },
  {
    "name": "Fred Bloggs",
    "email": "[email protected]"
  }
],
```
In a spreadsheet, either a separate sheet or additional columns would need to be added (for example, extra fields
contactPointName2 and contactPointEmail2).

In the past the spreadsheet representation of the data has made it difficult to do things like define multiple distribution
formats for a single dataset. As with the contactPoint example above, in the JSON description it is straightforward to add
additional distribution objects:
```json
"distribution": [
  {
    "accessService": ["8d085327-21b6-4d8b-9705-88faad231d23"],
    "downloadURL": "http://example.com/path/to/file.csv",
    "mediaType": ["text/csv"]
  },
  {
    "accessService": ["8d085327-21b6-4d8b-9705-88faad231d23"],
    "downloadURL": "http://example.com/path/to/file.json",
    "mediaType": ["application/json"]
  },
  {
    "accessService": ["8d085327-21b6-4d8b-9705-88faad231d23"],
    "downloadURL": "http://example.com/path/to/file.xml",
    "mediaType": ["text/xml"]
  }
]
```
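
Because the nesting survives parsing, code that consumes a record can walk these arrays directly. The following is a minimal
Python sketch using only the standard library. The record is a trimmed copy of the distribution example above, and the
variable names are illustrative rather than part of the project:

```python
import json

# Trimmed copy of the example record, as it might arrive as text.
raw = """
{
  "title": "Advance Passenger Information",
  "distribution": [
    {"downloadURL": "http://example.com/path/to/file.csv", "mediaType": ["text/csv"]},
    {"downloadURL": "http://example.com/path/to/file.json", "mediaType": ["application/json"]},
    {"downloadURL": "http://example.com/path/to/file.xml", "mediaType": ["text/xml"]}
  ]
}
"""

record = json.loads(raw)  # parse the text into nested dicts and lists

# Each distribution is a dict and mediaType is itself a list,
# so collecting every media type means flattening one level.
media_types = [
    media_type
    for distribution in record["distribution"]
    for media_type in distribution["mediaType"]
]

for distribution in record["distribution"]:
    print(distribution["downloadURL"])
```

Adding a fourth distribution to the JSON requires no change to this code, which is the practical benefit of representing
repeated items as an array rather than as numbered spreadsheet columns.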

### JSON Schema

[JSON Schema](https://json-schema.org/) is the vocabulary that enables JSON data consistency, validity, and interoperability
at scale. Each schema is itself a JSON object, and the vocabulary defines the keywords that object uses to describe the
required structure.

This is a simple JSON Schema that describes the structure of a Person JSON object:
```json
{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    }
  }
}
```

The schema defines an object with the title “Person” that has three properties: firstName, lastName, and age. The two name
properties must have string values, while age must be an integer. Age has a further restriction in that it cannot be less than
zero. Because the schema does not mark any property as required, an object that omits one of them still matches.

Here is a person object that matches that schema:

```json
{
"firstName": "John",
"lastName": "Doe",
"age": 21
}
```
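
To make the idea of comparing an object to a schema concrete, here is a deliberately simplified Python sketch. It is not a
JSON Schema implementation: the `check` function is hypothetical and understands only the keywords used in the Person
example (`properties`, `type` for strings and integers, and `minimum`). In practice a full validator library would be used
instead.

```python
def check(instance, schema):
    """Return a list of discrepancies between instance and schema.

    Handles only "properties", "type" (string/integer) and "minimum".
    """
    python_types = {"string": str, "integer": int}
    problems = []
    for name, rules in schema.get("properties", {}).items():
        if name not in instance:
            continue  # the example schema marks no property as required
        value = instance[name]
        expected = rules.get("type")
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected in python_types and (
            not isinstance(value, python_types[expected]) or isinstance(value, bool)
        ):
            problems.append(f"{name}: expected {expected}")
            continue
        if (
            "minimum" in rules
            and isinstance(value, (int, float))
            and value < rules["minimum"]
        ):
            problems.append(f"{name}: must be >= {rules['minimum']}")
    return problems


person_schema = {
    "title": "Person",
    "type": "object",
    "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "age": {"type": "integer", "minimum": 0},
    },
}

valid = check({"firstName": "John", "lastName": "Doe", "age": 21}, person_schema)
invalid = check({"firstName": 42, "lastName": "Doe", "age": -1}, person_schema)
print(valid)
print(invalid)
```

An empty list means the object matches. A non-empty list plays the role of the discrepancy report described earlier.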

Data Marketplace objects described by the schema
------------------------------------------------

There are three main types of objects described by the project JSON schema.

- Core objects: the primary objects that need to be described within the marketplace catalogue.
- Shared objects: structures that are shared across core objects.
- Support objects: objects that are useful in systems working with the core objects.

### Core objects

#### Dataset

The metadata representation of a data object. The object could be a spreadsheet, a database, or even a part of a database
(a query result for example). Typically within the Data Marketplace an ESDA record is represented as a Dataset.

#### Data Service

The service that returns data. That could be an API, a file system, or any data storage or delivery system. The data
returned could range from simple Yes/No boolean responses to gigabytes of structured data.

#### Data Share

The data gathered in support of a data share request.

### Shared objects

#### Catalogued Resource

The core objects share a common root structure. That is, there are a number of fields that are contained within each of
the core objects. Rather than repeating the defining schema for these common fields in each of the core object schema,
these field descriptions are defined once in the Catalogued Resource. Then each core object schema refers to this common
schema.

The effect of this is that Dataset, Data Service, and Data Share objects will each contain all the fields described in
the Catalogued Resource, as well as the fields specific to themselves.

Currently there is no use case for a catalogued resource object to be used directly and in isolation. All catalogued
resources will be defined via the core objects: Dataset, Data Service, and Data Share. Note that the Data Group
also shares the Catalogued Resource fields.
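
The effect of this sharing can be pictured with a short Python sketch. Everything in it is an assumption made for
illustration: the split of fields between the common part and the Dataset part, the `endpointURL` field, and the
`core_schema` helper are all hypothetical, and the real definitions are the files in /schema. JSON Schema itself usually
expresses this kind of reuse by referencing the common schema with `$ref`, rather than by copying fields as done here.

```python
# Hypothetical common fields; the real list lives in the Catalogued Resource schema.
CATALOGUED_RESOURCE_FIELDS = {
    "identifier": {"type": "string"},
    "title": {"type": "string"},
    "description": {"type": "string"},
}


def core_schema(title, own_fields):
    """Build a core object schema from the common fields plus its own fields."""
    return {
        "title": title,
        "type": "object",
        # Common fields first, then the fields specific to this object type.
        "properties": {**CATALOGUED_RESOURCE_FIELDS, **own_fields},
    }


dataset = core_schema("Dataset", {"distribution": {"type": "array"}})
data_service = core_schema("Data Service", {"endpointURL": {"type": "string"}})

# Fields present in both core objects are exactly the common ones.
shared = sorted(set(dataset["properties"]) & set(data_service["properties"]))
print(shared)
```

Defining the common fields once means a change to, say, the description of `title` is made in a single place and picked up by
every core object.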

#### Shared schema

Some object definitions are more complex than most and are used in multiple places. The prime example is the date stamp,
for which a number of formats are supported. The shared schema is the place where these object descriptions are defined.
They are then referenced from the other object schema.

### Support objects

#### Data Group

There are a number of use cases where it would be useful to group a number of core objects together. For example,
if annual examination results were presented as a Dataset for each year, it would be convenient to be able to refer to
all the examination datasets within a period via a single group object.

The main purpose of the Data Group is to act as a container for references (ids or locations) to the objects within the group.

The Data Group shares the common structure of the core objects and is a type of catalogued resource. Therefore it contains
the fields defined in the catalogued resource schema in addition to the fields defined in its own schema.

As the Data Group shares the same basic structure as the core objects, it can be used in place of one of those
objects within the systems using them. This makes it possible, for example, for a search for metadata concerning
“exam results” to return both individual metadata records and groups of matching records.

#### Error object

The schema also contains a definition of an object that will contain information about an error that has occurred while
moving or processing Data Marketplace data objects.

This is an example of an error object:
```json
{
  "message": "Validation failure",
  "code": "401",
  "errors": [
    {
      "type": "http://example.com/docs/errors/exceeded-max-length",
      "detail": "The attribute 'summary' is too long. It should be less than 250 characters",
      "instance": "identifier:f58ce342-d3cc-4a13-bba4-ac958c39397b",
      "location": "/summary",
      "severity": "minor"
    }
  ]
}
```
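
As a rough illustration of how a system might produce such an object, the Python sketch below builds one when a summary is
too long. The `validate_summary` function and the `MAX_SUMMARY_LENGTH` constant are hypothetical. The field names and values
are copied from the example above, and the authoritative definition of the error object is the schema itself.

```python
import json

MAX_SUMMARY_LENGTH = 250  # limit quoted in the example error above


def validate_summary(identifier, summary):
    """Return an error object shaped like the example, or None if the summary is valid."""
    if len(summary) < MAX_SUMMARY_LENGTH:
        return None
    return {
        "message": "Validation failure",
        "code": "401",  # value copied from the example, not chosen here
        "errors": [
            {
                "type": "http://example.com/docs/errors/exceeded-max-length",
                "detail": (
                    "The attribute 'summary' is too long. "
                    f"It should be less than {MAX_SUMMARY_LENGTH} characters"
                ),
                "instance": f"identifier:{identifier}",
                "location": "/summary",
                "severity": "minor",
            }
        ],
    }


error = validate_summary("f58ce342-d3cc-4a13-bba4-ac958c39397b", "x" * 300)
print(json.dumps(error, indent=2))
```

Because `errors` is an array, a single response can report several problems found in the same record.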

The API Specification
---------------------

The basic purpose of the API is to give systems outside the Data Marketplace access to the metadata records. To
achieve this, the outside system (remote) sends a request via HTTP (the standard protocol of the web) to an
endpoint location (a URL) provided by the Data Marketplace. The characteristics of the request determine the response,
and the API specification defines the relationship between the request and the expected response.

The project’s API Specification is stored at /api_specification.

### API Actions

There are four standard actions that APIs support. These are commonly referred to as the
[CRUD](https://en.wikipedia.org/wiki/Create,_read,_update_and_delete) actions:

- Create - used to create a new record.
- Read - used to view a record without changing it.
- Update - used to modify a record that already exists within the system being accessed.
- Delete - used to remove a record from the system being accessed.

These four actions cover most use cases for remote access of data via an API.

### REST

REST (Representational State Transfer) is a software architectural style that was created to guide the design and
development of the architecture for the World Wide Web. The REST architectural style emphasises uniform interfaces,
independent deployment of components, the scalability of interactions between them, and creating a layered architecture
to promote caching to reduce user-perceived latency, enforce security, and encapsulate legacy systems.

In a REST system, records are usually referred to as resources.

For a resource to be managed via a REST system, each CRUD action applied to the resource should have a unique endpoint,
that is, a unique combination of HTTP method and URL. To achieve this efficiently the endpoint is constructed from three
components:

- The system’s root URL
- A path to a CRUD endpoint with a matching HTTP method
- A modification to the path that uniquely identifies the resource (if the resource exists)

For the second element it is typical to match the HTTP methods POST, GET, PATCH, DELETE to the respective CRUD actions
of Create, Read, Update, and Delete.

So if an API at `http://example.com` made available a collection of Things, it would be typical to create the following
set of endpoints:

- **POST to `http://example.com/things`** - Create a new resource. The response to this action would include the ID of the
newly created “thing” resource.
- **GET `http://example.com/things`** - Read all: returns a list of things, including the URL for reading each individual
thing (see next item).
- **GET `http://example.com/things/<ID>`** - Read one: where “<ID>” would be replaced with the ID of the matching “thing” to
be viewed.
- **PATCH `http://example.com/things/<ID>`** - Update one: Update the resource with the matching ID.
- **DELETE `http://example.com/things/<ID>`** - Delete one: The resource with the matching ID would be removed from the system.
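
The mapping above can be sketched as a small lookup table in Python. This is only an illustration of the convention: the
`ACTIONS` table and the `endpoint` function are hypothetical names, nothing is sent over the network, and the endpoints the
Data Marketplace actually exposes are those defined in the API specification.

```python
BASE_URL = "http://example.com"

# CRUD action -> (HTTP method, whether the URL needs a resource ID)
ACTIONS = {
    "create": ("POST", False),
    "read_all": ("GET", False),
    "read_one": ("GET", True),
    "update": ("PATCH", True),
    "delete": ("DELETE", True),
}


def endpoint(action, collection, resource_id=None):
    """Return the (method, URL) pair for a CRUD action on a collection."""
    method, needs_id = ACTIONS[action]
    if needs_id and resource_id is None:
        raise ValueError(f"{action} requires a resource ID")
    url = f"{BASE_URL}/{collection}"
    if needs_id:
        url = f"{url}/{resource_id}"
    return method, url


for action in ACTIONS:
    print(action, *endpoint(action, "things", resource_id="42"))
```

Note that reading, updating, and deleting one resource all share the same URL. It is the HTTP method that distinguishes them,
which is why the method is treated as part of the endpoint.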

### OpenAPI

The [OpenAPI Specification (OAS)](https://spec.openapis.org/oas/latest.html) defines a standard, programming
language-agnostic interface description for HTTP APIs, which allows both humans and computers to discover and understand
the capabilities of a service without requiring access to source code, additional documentation, or inspection of network
traffic. When properly defined via OpenAPI, a consumer can understand and interact with the remote service with a minimal
amount of implementation logic. Similar to what interface descriptions have done for lower-level programming, the OpenAPI
Specification removes guesswork in calling a service.

The API specification stored at /api_specification in this repository is defined using the OpenAPI Specification.

### Swagger UI

[Swagger](https://swagger.io) started out as a simple, open source specification for designing RESTful APIs in 2010. Open
source tooling like the Swagger UI, Swagger Editor and the Swagger Codegen were also developed to better implement and
visualize APIs defined in the specification. The Swagger project, consisting of the specification and the open source tools,
became immensely popular, creating a massive ecosystem of community driven tools.

In 2015, the Swagger project was acquired by SmartBear Software. The Swagger Specification was donated to the Linux
Foundation and renamed the OpenAPI Specification.

[Swagger UI](https://swagger.io/tools/swagger-ui/) is an Open Source tool that allows anyone — be it your development
team or your end consumers — to visualize and interact with the API’s resources without having any of the implementation
logic in place. It’s automatically generated from your OpenAPI (formerly known as Swagger) Specification, with the visual
documentation making it easy for back end implementation and client side consumption.

A Swagger UI representation of the API Specification can be found at
[/swagger](https://co-cddo.github.io/demo-data-marketplace-metadata-json-schema/swagger/)
on this repository’s GitHub Pages site.

Running locally
---------------

The schema are defined in /schema.

A Ruby script is provided that loads the project JSON Schema and sample JSON objects, and attempts to validate the
objects against the schema.

To use the script (assuming Ruby is installed locally), first run `bundle` to install the dependencies, and then run
the script using:

```bash
ruby schema_test.rb
```
