Merge pull request #22 from co-cddo/update-readme-add-schema-and-api-…
…specification-details Add JSON Schema and API Spec documentation to README
Showing 1 changed file with 329 additions and 8 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,338 @@ | ||
Data Marketplace JSON Schema | ||
============================ | ||
|
||
The repository contains a JSON schema that can be used to validate metadata records submitted to the Data Marketplace | ||
The Data Marketplace | ||
==================== | ||
|
||
This repository stores the JSON Schema that define the structure of objects passed to and from the Data Marketplace, | ||
and the API specification for the interface through which those objects are transported. | ||
|
||
JSON Schema | ||
----------- | ||
|
||
A JSON Schema provides a machine-readable description of a metadata object. A JSON representation of a Data Marketplace | ||
resource’s metadata can be compared with the project’s schema to produce a report describing the discrepancies between the | ||
two. This allows the schema to be used to validate metadata records. | ||
|
||
Because JSON has a straightforward structure, the description can also be read by people, which makes it a useful aid to | ||
discussion around the construction of metadata objects. | ||
|
||
JSON Schema is a widely used specification, so there are many tools that allow it to be used in a wide variety of programming | ||
environments. There are also many tools that allow objects defined in other formats, for example XML, to be compared to | ||
the schema. | ||
|
||
The project’s JSON Schema are stored at /schema. | ||
|
||
|
||
### JSON | ||
|
||
[JavaScript Object Notation (JSON)](https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/JSON) is a standard text-based | ||
format for representing structured data based on JavaScript object syntax. It is commonly used for transmitting data in web | ||
applications (for example, sending some data from the server to the client, so it can be displayed on a web page, or vice versa). | ||
|
||
JSON has been chosen as the default format for describing the Data Marketplace metadata models because of its simple structure (it | ||
is basically just a text representation of key/value pairs) and ability to represent nested structures (where a value is itself | ||
another JSON object or array). | ||
|
||
For example, this is a JSON representation of Dataset metadata that could have been submitted via an ESDA template spreadsheet: | ||
```json | ||
{ | ||
"identifier": "8d085327-21b6-4d8b-9705-88faad231d22", | ||
"supplierIdentifier": "acme-123", | ||
"modified": "2023-12-09T16:09:53+00:00", | ||
"status": "Published", | ||
"title": "Advance Passenger Information", | ||
"description": "Travel data and personal data given to airlines by passenger.", | ||
"type": "Data Set", | ||
"theme": [ | ||
"Transport and infrastructure", | ||
"Population and society" | ||
], | ||
"keyword": [ | ||
"Air travel", | ||
"Passport", | ||
"Airports", | ||
"leaving UK", | ||
"entering UK" | ||
], | ||
"contactPoint": [ | ||
{ | ||
"name": "Jane Doe", | ||
"email": "[email protected]" | ||
} | ||
], | ||
"publisher": "academy-for-social-justice", | ||
"securityClassification": "OFFICIAL", | ||
"accessRights": "INTERNAL", | ||
"distribution": [ | ||
{ | ||
"accessService": ["8d085327-21b6-4d8b-9705-88faad231d23"], | ||
"downloadURL": "http://example.com/path/to/file.csv", | ||
"mediaType":["text/csv"] | ||
} | ||
] | ||
} | ||
``` | ||
|
||
One of the key advantages of the JSON description of the data over the spreadsheet description is the better support for nesting. | ||
For example, if a second contactPoint is required another object can be added to the contactPoint array: | ||
```json | ||
"contactPoint": [ | ||
{ | ||
"name": "Jane Doe", | ||
"email": "[email protected]" | ||
}, | ||
{ | ||
"name": "Fred Bloggs", | ||
"email": "[email protected]" | ||
} | ||
], | ||
``` | ||
Within a spreadsheet, either a separate sheet or additional fields would need to be added (for example, extra fields | ||
contactPointName2 and contactPointEmail2). | ||
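
To make the contrast concrete, here is a minimal Ruby sketch that flattens a nested contactPoint array into spreadsheet-style columns. It uses only the standard `json` library. The email addresses are placeholders and the column-naming rule is an assumption modelled on the `contactPointName2` example above, not something defined by the project.

```ruby
require "json"

# Two contact points, as they would appear in the JSON record.
# The email addresses are placeholders for illustration only.
contact_points = JSON.parse(<<~JSON)
  [
    { "name": "Jane Doe", "email": "jane.doe@example.com" },
    { "name": "Fred Bloggs", "email": "fred.bloggs@example.com" }
  ]
JSON

# Flatten into spreadsheet-style columns. Every additional contact needs
# a new set of column names, whereas the JSON array just gains an element.
columns = {}
contact_points.each_with_index do |contact, index|
  suffix = index.zero? ? "" : (index + 1).to_s
  contact.each do |key, value|
    columns["contactPoint#{key.capitalize}#{suffix}"] = value
  end
end

columns.each { |column, value| puts "#{column}: #{value}" }
```

A third contact would add two more columns to the flat form, but no new field names to the JSON form.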
|
||
In the past the spreadsheet representation of the data has made it difficult to do things like define multiple distribution | ||
formats for a single dataset. As with the contactPoint example above, in the JSON description it is straightforward to add | ||
additional distribution objects: | ||
```json | ||
"distribution": [ | ||
{ | ||
"accessService": ["8d085327-21b6-4d8b-9705-88faad231d23"], | ||
"downloadURL": "http://example.com/path/to/file.csv", | ||
"mediaType":["text/csv"] | ||
}, | ||
{ | ||
"accessService": ["8d085327-21b6-4d8b-9705-88faad231d23"], | ||
"downloadURL": "http://example.com/path/to/file.json", | ||
"mediaType":["application/json"] | ||
}, | ||
{ | ||
"accessService": ["8d085327-21b6-4d8b-9705-88faad231d23"], | ||
"downloadURL": "http://example.com/path/to/file.xml", | ||
"mediaType":["text/xml"] | ||
} | ||
] | ||
``` | ||
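
As a rough illustration of how a consuming system might read these nested structures, the following Ruby sketch parses a trimmed copy of the Dataset example with the standard `json` library. The variable names are illustrative and are not part of the project.

```ruby
require "json"

# A trimmed copy of the Dataset example, with two distribution objects.
record_json = <<~JSON
  {
    "title": "Advance Passenger Information",
    "distribution": [
      { "downloadURL": "http://example.com/path/to/file.csv", "mediaType": ["text/csv"] },
      { "downloadURL": "http://example.com/path/to/file.json", "mediaType": ["application/json"] }
    ]
  }
JSON

record = JSON.parse(record_json)

# Each distribution is an ordinary Hash, so adding a format never changes
# the shape of the record: the array simply has one more element.
media_types = record["distribution"].flat_map { |dist| dist["mediaType"] }

puts "#{record['title']}: #{media_types.join(', ')}"
```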
|
||
### JSON Schema | ||
|
||
[JSON Schema](https://json-schema.org/) is the vocabulary that enables JSON data consistency, validity, and interoperability | ||
at scale. Each schema is itself defined by a JSON object with the vocabulary defining the necessary structure of the object. | ||
|
||
This is a simple JSON Schema that describes the structure of a Person JSON object: | ||
```json | ||
{ | ||
"$id": "https://example.com/person.schema.json", | ||
"$schema": "https://json-schema.org/draft/2020-12/schema", | ||
"title": "Person", | ||
"type": "object", | ||
"properties": { | ||
"firstName": { | ||
"type": "string", | ||
"description": "The person's first name." | ||
}, | ||
"lastName": { | ||
"type": "string", | ||
"description": "The person's last name." | ||
}, | ||
"age": { | ||
"description": "Age in years which must be equal to or greater than zero.", | ||
"type": "integer", | ||
"minimum": 0 | ||
} | ||
} | ||
} | ||
``` | ||
|
||
The schema defines an object with the title “Person” that has three properties: firstName, lastName, and age. The two name | ||
properties must have string values, while age must be an integer. Age has a further restriction: it cannot be less than | ||
zero. | ||
|
||
Here is a person object that matches that schema: | ||
|
||
```json | ||
{ | ||
"firstName": "John", | ||
"lastName": "Doe", | ||
"age": 21 | ||
} | ||
``` | ||
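
To show what comparing an object with a schema involves, here is a deliberately tiny Ruby checker. It is a sketch only: it understands just the `type` and `minimum` keywords used in the Person schema and is not a JSON Schema validator. The names `PERSON_SCHEMA`, `TYPE_CHECKS` and `check_person` are invented for this example. Real validation should use an established JSON Schema library.

```ruby
require "json"

# The structural part of the Person schema shown above.
PERSON_SCHEMA = JSON.parse(<<~JSON)
  {
    "type": "object",
    "properties": {
      "firstName": { "type": "string" },
      "lastName": { "type": "string" },
      "age": { "type": "integer", "minimum": 0 }
    }
  }
JSON

# Ruby checks standing in for the two JSON Schema types used here.
TYPE_CHECKS = {
  "string" => ->(value) { value.is_a?(String) },
  "integer" => ->(value) { value.is_a?(Integer) }
}.freeze

# Returns a list of human-readable problems. An empty list means the object
# satisfies the small subset of keywords handled by this sketch.
def check_person(object, schema = PERSON_SCHEMA)
  return ["expected an object"] unless object.is_a?(Hash)

  schema["properties"].flat_map do |name, rules|
    # Properties are optional unless the schema lists them as "required".
    next [] unless object.key?(name)

    value = object[name]
    problems = []
    problems << "#{name}: expected #{rules['type']}" unless TYPE_CHECKS.fetch(rules["type"]).call(value)
    if rules.key?("minimum") && value.is_a?(Numeric) && value < rules["minimum"]
      problems << "#{name}: must be >= #{rules['minimum']}"
    end
    problems
  end
end

puts check_person({ "firstName" => "John", "lastName" => "Doe", "age" => 21 }).inspect
puts check_person({ "firstName" => "John", "age" => -1 }).inspect
```

The list of problems returned for the second object is the kind of discrepancy report that a full validator produces for every keyword in the schema.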
|
||
Data Marketplace objects described by the schema | ||
------------------------------------------------ | ||
|
||
There are three main types of objects described by the project JSON schema. | ||
|
||
- Core objects: the primary objects that need to be described within the marketplace catalogue. | ||
- Shared objects: structures that are shared across core objects. | ||
- Support objects: objects that are useful in systems working with the core objects. | ||
|
||
### Core objects | ||
|
||
#### Dataset | ||
|
||
The metadata representation of a data object. The object could be a spreadsheet, a database, or even a part of a database | ||
(a query result for example). Typically within the Data Marketplace an ESDA record is represented as a Dataset. | ||
|
||
#### Data Service | ||
|
||
A service that returns data. That could be an API, a file system, or any data storage or delivery system. The data | ||
returned could range from simple Yes/No boolean responses to gigabytes of structured data. | ||
|
||
#### Data Share | ||
|
||
The data gathered in support of a data share request. | ||
|
||
### Shared objects | ||
|
||
#### Catalogued Resource | ||
|
||
The core objects share a common root structure. That is, there are a number of fields that are contained within each of | ||
the core objects. Rather than repeating the defining schema for these common fields in each of the core object schema, | ||
these field descriptions are defined once in the Catalogued Resource. Then each core object schema refers to this common | ||
schema. | ||
|
||
The effect of this is that Dataset, Data Service, and Data Share objects will each contain all the fields described in | ||
the Catalogued Resource, as well as the fields specific to themselves. | ||
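
One common way to express this kind of sharing in JSON Schema is `allOf` with a `$ref` to the common schema. The Ruby sketch below builds two heavily trimmed, hypothetical schemas and lists the fields that end up applying to a Dataset. The `$id` values, the choice of shared fields, and the use of `allOf` are all assumptions made for illustration. The definitions in /schema are the authority on how the project does this.

```ruby
require "json"

# Hypothetical, heavily trimmed schemas; the real ones live in /schema.
catalogued_resource = {
  "$id" => "https://example.com/catalogued-resource.schema.json",
  "type" => "object",
  "properties" => {
    "identifier" => { "type" => "string" },
    "title" => { "type" => "string" }
  }
}

dataset = {
  "$id" => "https://example.com/dataset.schema.json",
  "allOf" => [{ "$ref" => catalogued_resource["$id"] }],
  "properties" => {
    "distribution" => { "type" => "array" }
  }
}

# Look up referenced schemas by "$id" and collect every property name that
# applies to a Dataset: the shared ones plus its own.
registry = { catalogued_resource["$id"] => catalogued_resource }
shared_fields = dataset["allOf"].flat_map { |part| registry.fetch(part["$ref"])["properties"].keys }
dataset_fields = shared_fields + dataset["properties"].keys

puts JSON.pretty_generate(dataset)
puts dataset_fields.inspect
```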
|
||
Currently there is no use case for a catalogued resource object to be used directly and in isolation. All catalogued | ||
resources will be defined via the core objects: Dataset, Data Service, and Data Share. Note that the Data Group | ||
also shares the Catalogued Resource fields. | ||
|
||
#### Shared schema | ||
Some object definitions are more complicated than a simple type and are used in multiple places. The prime example of this | ||
is the date stamp, where a number of formats are supported. The shared schema is the place where these object | ||
descriptions are defined. These descriptions are then referenced within the other object schema. | ||
|
||
### Support objects | ||
|
||
#### Data Group | ||
|
||
There are a number of use cases where it would be useful to group a number of core objects together. For example, | ||
if annual examination results were presented as a Dataset for each year, it would be convenient to be able to refer to | ||
all the examination datasets within a period via a single group object. | ||
|
||
The main purpose of the Data Group is as a container for references (ids or locations) for the objects within the group. | ||
|
||
The Data Group shares the common structure of the core objects and is a type of catalogued resource. Therefore it contains | ||
the fields defined in the catalogued resource schema in addition to the fields defined in its own schema. | ||
|
||
As the Data Group shares the same basic structure as the core objects, it can be used in place of one of those | ||
objects within the systems using them. This means, for example, that a search for metadata concerning “exam results” | ||
can return both individual metadata records and groups of matching records. | ||
|
||
#### Error object | ||
|
||
The schema also contains a definition of an object that will contain information about an error that has occurred while | ||
moving or processing Data Marketplace data objects. | ||
|
||
This is an example of an error object: | ||
```json | ||
{ | ||
"message": "Validation failure", | ||
"code": "401", | ||
"errors": [ | ||
{ | ||
"type": "http://example.com/docs/errors/exceeded-max-length", | ||
"detail": "The attribute 'summary' is too long. It should be less than 250 characters", | ||
"instance": "identifier:f58ce342-d3cc-4a13-bba4-ac958c39397b", | ||
"location": "/summary", | ||
"severity": "minor" | ||
} | ||
] | ||
} | ||
``` | ||
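
A minimal Ruby sketch of how a client might turn this error object into readable messages, using only the standard `json` library. The output format is an illustrative choice and is not defined by the project.

```ruby
require "json"

# The error object from the example above.
error_json = <<~JSON
  {
    "message": "Validation failure",
    "code": "401",
    "errors": [
      {
        "type": "http://example.com/docs/errors/exceeded-max-length",
        "detail": "The attribute 'summary' is too long. It should be less than 250 characters",
        "instance": "identifier:f58ce342-d3cc-4a13-bba4-ac958c39397b",
        "location": "/summary",
        "severity": "minor"
      }
    ]
  }
JSON

error = JSON.parse(error_json)

# One line per individual error: severity, where it occurred, and why.
lines = error["errors"].map do |item|
  "[#{item['severity']}] #{item['location']}: #{item['detail']}"
end

puts "#{error['message']} (code #{error['code']})"
puts lines
```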
|
||
The API Specification | ||
--------------------- | ||
|
||
The basic purpose of the API is to give systems outside the Data Marketplace access to the metadata records. To | ||
achieve this the outside (remote) system sends a request via HTTP (the standard protocol of the web) to an | ||
endpoint location (a URL) provided by the Data Marketplace. The characteristics of the request determine the response, | ||
and the API specification defines the relationship between the request and the expected response. | ||
|
||
The project’s API Specification is stored at /api_specification. | ||
|
||
### API Actions | ||
|
||
There are four standard actions that APIs support. These are commonly referred to as the | ||
[CRUD](https://en.wikipedia.org/wiki/Create,_read,_update_and_delete) actions: | ||
|
||
- Create - used to create a new record. | ||
- Read - where a record is viewed only. | ||
- Update - used to modify a record that already exists within the system being accessed. | ||
- Delete - removes the record from the system being accessed. | ||
|
||
These four actions cover most use cases for remote access of data via an API. | ||
|
||
### REST | ||
|
||
REST (Representational State Transfer) is a software architectural style that was created to guide the design and | ||
development of the architecture for the World Wide Web. The REST architectural style emphasises uniform interfaces, | ||
independent deployment of components, the scalability of interactions between them, and creating a layered architecture | ||
to promote caching to reduce user-perceived latency, enforce security, and encapsulate legacy systems. | ||
|
||
In a REST system, records are usually referred to as resources. | ||
|
||
For a resource to be managed via a REST system each CRUD action applied to the resource should have a unique endpoint. | ||
To achieve this efficiently the endpoint is constructed from three components: | ||
|
||
- The system’s root URL | ||
- A path to a CRUD endpoint with a matching HTTP method | ||
- A modification to the path that uniquely identifies the resource (if the resource exists) | ||
|
||
For the second element it is typical to match the HTTP methods POST, GET, PATCH, DELETE to the respective CRUD actions | ||
of Create, Read, Update, and Delete. | ||
|
||
So if an API at `http://example.com` made available a collection of Things, it would be typical to create the following | ||
set of endpoints: | ||
|
||
- **POST to `http://example.com/things`** - Create a new resource. The response to this action would include the ID of the | ||
newly created “thing” resource. | ||
- **GET `http://example.com/things`** - Read all: a list of things including the URLs to read each individual thing (see next | ||
item) | ||
- **GET `http://example.com/things/<ID>`** - Read one: where “<ID>” would be replaced with the ID of the matching “thing” to | ||
be viewed. | ||
- **PATCH `http://example.com/things/<ID>`** - Update one: Update the resource with the matching ID. | ||
- **DELETE `http://example.com/things/<ID>`** - Delete one: The resource with the matching ID would be removed from the system. | ||
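
The endpoint list above can be sketched in Ruby with the standard `net/http` library. The sketch only builds the request objects and does not send anything over the network. The `build_request` helper, the `BASE_URL` constant and the ID `42` are hypothetical names and values chosen for illustration.

```ruby
require "net/http"
require "uri"

BASE_URL = "http://example.com"

# Map each CRUD action to the Net::HTTP request class for its HTTP method.
REQUEST_CLASSES = {
  create: Net::HTTP::Post,
  read: Net::HTTP::Get,
  update: Net::HTTP::Patch,
  delete: Net::HTTP::Delete
}.freeze

# Build (but do not send) the request for an action on the "things" collection.
# An id is needed for every action that targets one existing resource.
def build_request(action, id: nil)
  path = id ? "/things/#{id}" : "/things"
  REQUEST_CLASSES.fetch(action).new(URI.join(BASE_URL, path))
end

[
  build_request(:create),
  build_request(:read),
  build_request(:read, id: 42),
  build_request(:update, id: 42),
  build_request(:delete, id: 42)
].each { |request| puts "#{request.method} #{request.uri}" }
```

Keeping the action-to-method mapping in one table means that each combination of HTTP method and path identifies exactly one action, which is the property the three-part endpoint construction is meant to provide.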
|
||
### OpenAPI | ||
|
||
The [OpenAPI Specification (OAS)](https://spec.openapis.org/oas/latest.html) defines a standard, programming | ||
language-agnostic interface description for HTTP APIs, which allows both humans and computers to discover and understand | ||
the capabilities of a service without requiring access to source code, additional documentation, or inspection of network | ||
traffic. When properly defined via OpenAPI, a consumer can understand and interact with the remote service with a minimal | ||
amount of implementation logic. Similar to what interface descriptions have done for lower-level programming, the OpenAPI | ||
Specification removes guesswork in calling a service. | ||
|
||
The API specification stored at /api_specification in this repository is defined using the OpenAPI Specification. | ||
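
To give a feel for what an OpenAPI description contains, the Ruby sketch below builds a minimal, hypothetical OpenAPI 3.0 document for a single "read one thing" endpoint and lists the operations it declares. It is not taken from the project's specification. The title, version, path and response descriptions are invented for illustration.

```ruby
require "json"

# Hypothetical, minimal OpenAPI 3.0 description of "read one thing".
openapi_document = {
  "openapi" => "3.0.3",
  "info" => { "title" => "Things API (illustrative)", "version" => "0.1.0" },
  "paths" => {
    "/things/{id}" => {
      "get" => {
        "summary" => "Read one thing",
        "parameters" => [
          { "name" => "id", "in" => "path", "required" => true, "schema" => { "type" => "string" } }
        ],
        "responses" => {
          "200" => { "description" => "The matching thing" },
          "404" => { "description" => "No thing has this ID" }
        }
      }
    }
  }
}

# List the operations the document declares, as "METHOD path" pairs.
operations = openapi_document["paths"].flat_map do |path, methods|
  methods.keys.map { |http_method| "#{http_method.upcase} #{path}" }
end

puts JSON.pretty_generate(openapi_document)
puts operations
```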
|
||
### Swagger UI | ||
|
||
[Swagger](https://swagger.io) started out as a simple, open source specification for designing RESTful APIs in 2010. Open | ||
source tooling like the Swagger UI, Swagger Editor and the Swagger Codegen were also developed to better implement and | ||
visualize APIs defined in the specification. The Swagger project, consisting of the specification and the open source tools, | ||
became immensely popular, creating a massive ecosystem of community driven tools. | ||
|
||
In 2015, the Swagger project was acquired by SmartBear Software. The Swagger Specification was donated to the Linux | ||
Foundation and renamed the OpenAPI Specification. | ||
|
||
[Swagger UI](https://swagger.io/tools/swagger-ui/) is an Open Source tool that allows anyone — be it your development | ||
team or your end consumers — to visualize and interact with the API’s resources without having any of the implementation | ||
logic in place. It’s automatically generated from your OpenAPI (formerly known as Swagger) Specification, with the visual | ||
documentation making it easy for back end implementation and client side consumption. | ||
|
||
A Swagger UI representation of the API Specification can be found at | ||
[/swagger](https://co-cddo.github.io/demo-data-marketplace-metadata-json-schema/swagger/) | ||
on this repository’s GitHub Pages site. | ||
|
||
Running locally | ||
--------------- | ||
|
||
The schema are defined in /schema. | ||
|
||
A Ruby script is provided that loads the project JSON Schema and sample JSON objects, and attempts to validate the | ||
objects against the schema. | ||
|
||
To use the script (assuming Ruby is installed locally), first run `bundle` to install the dependencies, and then run | ||
the script using: | ||
|
||
```bash | ||
ruby schema_test.rb | ||
``` |