Get all RDF triples/quads related to an entity based on CBD and a SHACL shape

TREEcg/extract-cbd-shape

Extract CBD Shape

Given (i) an RdfStore (see rdf-stores) of triples, (ii) an RdfStore with a SHACL shape’s triples, and (iii) a target entity URI, this library will extract all triples that belong to the entity. If more triples of the entity are needed, extra triples are retrieved by dereferencing the relevant entity.

The algorithm is a proposal to be standardized as part of W3C’s TREE hypermedia Community Group as the member extraction algorithm. This algorithm needs to be efficient and unambiguously defined, so that various implementations of the member extraction algorithm will result in the same set of triples. As a trade-off, the resulting set of triples is not guaranteed to be validated by the SHACL shape.

The algorithm is inspired by CBD and Shape Fragments and sits in between the two, with thanks to Thomas Bergwinkl and his blog post on a SHACL engine.

Use it

npm install extract-cbd-shape

import {CBDShapeExtractor} from "extract-cbd-shape";
// ...
const extractor = new CBDShapeExtractor(shapesGraph);
const entityquads = await extractor.extract(store, entityId, shapeId, graphsToIgnore);
  • The shapesGraph is an RdfStore that contains the quads of a shape
  • The store is an RdfStore containing the quads in the current context (e.g., the quads parsed from an HTTP response, or a message received over a channel)
  • The entityId is the IRI of the entity to extract from the current context
  • The shapeId is the IRI of the NodeShape in the shapesGraph to start from
  • The graphsToIgnore are the named graphs in the current context (the store) to disregard when extracting the member

Test it

Tests and examples are provided in the tests directory. Run them with mocha, which can be invoked using npm test.

The extraction algorithm

This is an extension of CBD. It extracts:

  1. all quads with this entity as subject, and their blank node triples (recursively)
  2. all quads with a named graph matching the entity we’re looking up
  3. additional quads selected using hints from a Shape Template (see below)

The first focus node is set by the user.

  1a. If a shape is set, create a shape template and execute the shape template extraction algorithm
  1b. If no shape was set, extract all quads with subject the focus node, and recursively include its blank nodes (see also CBD)
  2. Extract all quads with the graph matching the focus node
  3. When no quads were extracted from 1 and 2, a client MUST dereference the focus node and re-execute 1 and 2.
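Steps 1b, 2 and 3 can be illustrated with a minimal sketch. It is not the library’s implementation: the `Term` and `Quad` types, the helper names and the sample data are all assumptions made for this example, and the graphs to ignore are left out.

```typescript
// Simplified term and quad model (an assumption; the library works on RDF/JS quads).
type Term = { kind: "iri" | "blank" | "literal"; value: string };
interface Quad { subject: Term; predicate: Term; object: Term; graph: Term | null }

const iri = (value: string): Term => ({ kind: "iri", value });
const blank = (value: string): Term => ({ kind: "blank", value });
const lit = (value: string): Term => ({ kind: "literal", value });
const same = (a: Term | null, b: Term | null): boolean =>
  a !== null && b !== null && a.kind === b.kind && a.value === b.value;

// Step 1b: quads with the focus node as subject, following blank node objects recursively.
function extractCbd(quads: Quad[], focus: Term, seen: Record<string, true> = {}): Quad[] {
  const key = focus.kind + ":" + focus.value;
  if (seen[key]) return []; // guard against blank node cycles
  seen[key] = true;
  let result: Quad[] = [];
  for (const q of quads) {
    if (!same(q.subject, focus)) continue;
    result.push(q);
    if (q.object.kind === "blank") result = result.concat(extractCbd(quads, q.object, seen));
  }
  return result;
}

// Step 2: quads whose named graph is the focus node.
function extractGraph(quads: Quad[], focus: Term): Quad[] {
  return quads.filter((q) => same(q.graph, focus));
}

// Steps 1b and 2 combined. An empty result is the signal to dereference (step 3).
function extractWithoutShape(quads: Quad[], focus: Term): { member: Quad[]; mustDereference: boolean } {
  const all = extractCbd(quads, focus).concat(extractGraph(quads, focus));
  const member = all.filter((q, i) => all.indexOf(q) === i); // drop quads matched twice
  return { member, mustDereference: member.length === 0 };
}

// Hypothetical sample data.
const ex = (local: string): Term => iri("http://example.org/" + local);
const data: Quad[] = [
  { subject: ex("alice"), predicate: ex("name"), object: lit("Alice"), graph: null },
  { subject: ex("alice"), predicate: ex("address"), object: blank("b0"), graph: null },
  { subject: blank("b0"), predicate: ex("city"), object: lit("Ghent"), graph: null },
  { subject: ex("bob"), predicate: ex("knows"), object: ex("alice"), graph: ex("alice") },
  { subject: ex("bob"), predicate: ex("name"), object: lit("Bob"), graph: null },
];

const result = extractWithoutShape(data, ex("alice"));
console.log(result.member.length, result.mustDereference); // 4 false
```

The fourth quad has bob as its subject but is still selected for alice, because it sits in the named graph that matches the focus node.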

Shape Template extraction

The Shape Template is a structure that looks as follows:

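class ShapeTemplate {
    closed: boolean;                     // false means the template is open
    requiredPaths: Path[];               // a missing required path triggers an HTTP request
    optionalPaths: Path[];
    nodelinks: NodeLink[];
    atLeastOneLists: ShapeTemplate[][];  // per inner list, at least one shape must match
}
class NodeLink {
    shape: ShapeTemplate;                // template to apply to the targets of the path
    path: Path;
}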

Paths in the shape templates are SHACL Property Paths.
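To make the role of a path concrete, here is a small sketch of how a path could be followed from a focus node while collecting the quads needed to reach each target. It is an illustration under assumptions: terms are plain strings, the `Path`, `Match` and `follow` names are invented for this example, and only a subset of SHACL Property Paths is covered (a predicate, the inverse of a single predicate, sequences and alternatives). Zero-or-more, one-or-more and zero-or-one paths are left out.

```typescript
// Simplified quad model: terms are plain strings (an assumption for brevity).
interface Quad { subject: string; predicate: string; object: string }

// Subset of SHACL property paths, modelled as a discriminated union.
type Path =
  | { type: "predicate"; iri: string }
  | { type: "inverse"; iri: string }
  | { type: "sequence"; paths: Path[] }
  | { type: "alternative"; paths: Path[] };

// A target reached through the path, plus the quads that were traversed to get there.
interface Match { target: string; quads: Quad[] }

function follow(quads: Quad[], start: string, path: Path): Match[] {
  switch (path.type) {
    case "predicate": {
      const predicate = path.iri;
      return quads
        .filter((q) => q.subject === start && q.predicate === predicate)
        .map((q) => ({ target: q.object, quads: [q] }));
    }
    case "inverse": {
      const predicate = path.iri;
      return quads
        .filter((q) => q.object === start && q.predicate === predicate)
        .map((q) => ({ target: q.subject, quads: [q] }));
    }
    case "sequence": {
      // Walk the steps one by one, carrying along the quads used so far.
      let current: Match[] = [{ target: start, quads: [] }];
      for (const step of path.paths) {
        const next: Match[] = [];
        for (const m of current) {
          for (const n of follow(quads, m.target, step)) {
            next.push({ target: n.target, quads: m.quads.concat(n.quads) });
          }
        }
        current = next;
      }
      return current;
    }
    case "alternative": {
      let all: Match[] = [];
      for (const option of path.paths) all = all.concat(follow(quads, start, option));
      return all;
    }
  }
}

// Hypothetical sample data.
const data: Quad[] = [
  { subject: "ex:alice", predicate: "ex:address", object: "_:b0" },
  { subject: "_:b0", predicate: "ex:city", object: "ex:Ghent" },
  { subject: "ex:bob", predicate: "ex:knows", object: "ex:alice" },
];
const cityPath: Path = {
  type: "sequence",
  paths: [{ type: "predicate", iri: "ex:address" }, { type: "predicate", iri: "ex:city" }],
};
const matches = follow(data, "ex:alice", cityPath);
console.log(matches.map((m) => m.target), matches[0].quads.length);
```

Returning the traversed quads together with the target reflects that not only the final quad but every quad on the way to the target belongs in the member.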

A Shape Template has

  • Closed: A boolean telling whether it’s closed or not. If it’s open, a client MUST extract all quads, after a potential HTTP request to the focus node, with subject the focus node, and recursively include its blank nodes (see also CBD)
  • Required paths: MUST trigger an HTTP request if the member does not have this path. All quads from paths, after a potential HTTP request, matching this required path MUST be added to the Member set.
  • Optional paths: All quads from paths, after a potential HTTP request, matching this path MUST be added to the Member set.
  • Node Links: A nodelink contains a reference to another Shape Template, as well as a path. All quads, after a potential HTTP request, matching this path MUST be added to the Member set. The targets MUST be processed again using the shape template extraction algorithm with that nodelink’s shape template.
  • atLeastOneLists: Each atLeastOneList is an array of at least one shape with one or more required paths and atLeastOneLists that must be set. If none of the shapes match, it will trigger an HTTP request. Only the quads from paths matching valid shapes are included in the Member.

Note: RDF has set semantics, so while certain quads are going to be matched by the algorithm multiple times, each quad will of course be part of the member only once.
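As an illustration of the set semantics, the sketch below removes duplicates from a list of matched quads by comparing their four terms. The `Quad` type with string terms and the `dedupe` helper are assumptions for this example. A quad store will normally take care of this by itself.

```typescript
// Simplified quad model: terms are plain strings, "" stands for the default graph.
interface Quad { subject: string; predicate: string; object: string; graph: string }

// Serialise the four terms as a JSON array, so that separators occurring
// inside a term cannot make two different quads look the same.
const quadKey = (q: Quad): string =>
  JSON.stringify([q.subject, q.predicate, q.object, q.graph]);

// Keep only the first occurrence of every quad.
function dedupe(quads: Quad[]): Quad[] {
  const seen: Record<string, true> = {};
  return quads.filter((q) => {
    const key = quadKey(q);
    if (seen[key]) return false;
    seen[key] = true;
    return true;
  });
}

// The same quad matched by both a required path and an open shape (hypothetical data).
const matched: Quad[] = [
  { subject: "ex:alice", predicate: "ex:name", object: "\"Alice\"", graph: "" },
  { subject: "ex:alice", predicate: "ex:name", object: "\"Alice\"", graph: "" },
  { subject: "ex:alice", predicate: "ex:name", object: "\"Alice\"", graph: "ex:g1" },
];
console.log(dedupe(matched).length); // 2
```

The third quad is kept because it lives in a different graph, which makes it a different quad.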

This results in this algorithm:

  1. If the shape template is open, a client MUST extract all quads, after a potential HTTP request to the focus node, with subject the focus node, and recursively include its blank nodes (see also CBD)
  2. If the current focus node is a named node and it was not requested before:
    • test whether all required paths are set. If not, do an HTTP request.
    • if they are set, test whether at least one shape of each list in the atLeastOneLists matches. If not, do an HTTP request.
  3. Visit all paths (required, optional, nodelinks, and recursively the paths of the shapes in the atLeastOneLists if the shape is valid) and add all quads necessary to reach the targets to the result
  4. For the results of nodelinks, if the target is a named node, set it as a focus node and repeat this algorithm with that nodelink’s shape as a shape
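The decision in step 2, whether an HTTP request is needed, can be sketched as follows. This is a simplified illustration and not the library’s code: paths are reduced to single predicate IRIs, terms are strings, blank nodes are recognised by a `_:` prefix, nodelinks are left out, and the `isSatisfied` and `needsRequest` names are invented for this example.

```typescript
interface Quad { subject: string; predicate: string; object: string }

// Paths are reduced to single predicate IRIs here (an assumption for brevity).
interface ShapeTemplate {
  closed: boolean;
  requiredPaths: string[];
  optionalPaths: string[];
  atLeastOneLists: ShapeTemplate[][];
}

const hasPath = (quads: Quad[], node: string, predicate: string): boolean =>
  quads.some((q) => q.subject === node && q.predicate === predicate);

// A shape counts as valid here when all its required paths are set and
// each of its own atLeastOneLists contains at least one valid shape.
function isSatisfied(quads: Quad[], node: string, shape: ShapeTemplate): boolean {
  return (
    shape.requiredPaths.every((p) => hasPath(quads, node, p)) &&
    shape.atLeastOneLists.every((list) => list.some((s) => isSatisfied(quads, node, s)))
  );
}

// Step 2: only named nodes that were not requested before can trigger a request.
function needsRequest(quads: Quad[], node: string, shape: ShapeTemplate, requested: string[]): boolean {
  const isNamedNode = node.indexOf("_:") !== 0;
  if (!isNamedNode || requested.indexOf(node) !== -1) return false;
  return !isSatisfied(quads, node, shape);
}

// Hypothetical shape: a name is required, plus either an email or a phone number.
const leaf = (predicate: string): ShapeTemplate =>
  ({ closed: true, requiredPaths: [predicate], optionalPaths: [], atLeastOneLists: [] });
const personShape: ShapeTemplate = {
  closed: true,
  requiredPaths: ["ex:name"],
  optionalPaths: ["ex:nick"],
  atLeastOneLists: [[leaf("ex:email"), leaf("ex:phone")]],
};
const data: Quad[] = [
  { subject: "ex:alice", predicate: "ex:name", object: "Alice" },
  { subject: "ex:alice", predicate: "ex:phone", object: "123" },
  { subject: "ex:bob", predicate: "ex:name", object: "Bob" },
];
console.log(needsRequest(data, "ex:alice", personShape, [])); // false
console.log(needsRequest(data, "ex:bob", personShape, []));   // true
```

In this sample alice has a name and a phone number, so no request is needed. For bob, neither alternative in the list is satisfied, so a request would be triggered unless ex:bob had already been requested.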

Generating a shape template from SHACL

If there’s a shape set, the SHACL shape MUST be processed towards a Shape Template as follows:

  1. Check if the shape is deactivated (:S sh:deactivated true). If it is, don’t continue
  2. Check if the shape is closed (:S sh:closed true). If it is, set the closed boolean to true.
  3. All sh:property elements with an sh:node link are added to the shape’s NodeLinks array
  4. Add all properties with sh:minCount > 0 to the Required Paths array, and all others to the optional paths.
  5. Process the conditionals sh:xone, sh:or and sh:and (sh:not is not processed):
    • sh:and: all properties on that shape template MUST be merged with the current shape template
    • sh:xone and sh:or: in both cases, at least one item must match at least one quad for all required paths. If not, it will do an HTTP request to the current named node.
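The five steps can be sketched on a simplified, already parsed shape description. Everything in this example is an assumption made for illustration: the `NodeShape` and `PropertyShape` input types, paths reduced to predicate IRIs, and nodelinks that reference the other shape by its IRI instead of holding a template. Steps 3 and 4 are read literally and applied independently, so a property with sh:node is also classified as required or optional. How sh:closed on a shape inside sh:and should be merged is not stated, so the sketch keeps the outer value.

```typescript
// Simplified, pre-parsed view of a SHACL shape (hypothetical input format).
interface PropertyShape { path: string; minCount?: number; node?: string }
interface NodeShape {
  deactivated?: boolean;
  closed?: boolean;
  properties?: PropertyShape[];
  and?: NodeShape[];
  or?: NodeShape[];
  xone?: NodeShape[];
}

interface NodeLink { path: string; shape: string } // shape is referenced by IRI here
interface ShapeTemplate {
  closed: boolean;
  requiredPaths: string[];
  optionalPaths: string[];
  nodelinks: NodeLink[];
  atLeastOneLists: ShapeTemplate[][];
}

function toTemplate(shape: NodeShape): ShapeTemplate | null {
  if (shape.deactivated) return null; // step 1: deactivated shapes are not processed
  const t: ShapeTemplate = {
    closed: shape.closed === true, // step 2
    requiredPaths: [], optionalPaths: [], nodelinks: [], atLeastOneLists: [],
  };
  for (const p of shape.properties || []) {
    if (p.node !== undefined) t.nodelinks.push({ path: p.path, shape: p.node }); // step 3
    if ((p.minCount || 0) > 0) t.requiredPaths.push(p.path); // step 4
    else t.optionalPaths.push(p.path);
  }
  // Step 5, sh:and: merge the properties of every listed shape into this template.
  for (const sub of shape.and || []) {
    const merged = toTemplate(sub);
    if (merged === null) continue;
    t.requiredPaths = t.requiredPaths.concat(merged.requiredPaths);
    t.optionalPaths = t.optionalPaths.concat(merged.optionalPaths);
    t.nodelinks = t.nodelinks.concat(merged.nodelinks);
    t.atLeastOneLists = t.atLeastOneLists.concat(merged.atLeastOneLists);
  }
  // Step 5, sh:or and sh:xone: each becomes one list of alternatives.
  for (const list of [shape.or, shape.xone]) {
    if (!list) continue;
    const alternatives = list
      .map((s) => toTemplate(s))
      .filter((s): s is ShapeTemplate => s !== null);
    if (alternatives.length > 0) t.atLeastOneLists.push(alternatives);
  }
  return t;
}

// Hypothetical shape: name is required, nick is optional, address links to
// another shape, and either an email or a phone number must be present.
const personShape: NodeShape = {
  closed: true,
  properties: [
    { path: "ex:name", minCount: 1 },
    { path: "ex:nick" },
    { path: "ex:address", node: "ex:AddressShape" },
  ],
  or: [
    { properties: [{ path: "ex:email", minCount: 1 }] },
    { properties: [{ path: "ex:phone", minCount: 1 }] },
  ],
};
const template = toTemplate(personShape);
console.log(JSON.stringify(template, null, 2));
```

Designing a shape with this mapping in mind shows which constraints can lead to an HTTP request: only sh:minCount greater than 0 and the alternatives under sh:or and sh:xone end up in a place where a missing path triggers one.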

Note: The way we process SHACL shapes into a Shape Template is important to understand when designing SHACL shapes, in order to know when an HTTP request will be triggered. A cardinality constraint not being exactly matched or a sh:pattern not being respected will not trigger an HTTP request; the invalid quads are simply added to the Member. This is a design choice: we only define triggers for HTTP requests from the SHACL shape, in order to come to a complete set of quads describing the member the data publisher pointed at using tree:member.

Note: the algorithm only takes hints from an optional SHACL shapes graph; it does not guarantee a result that validates. It only uses the parts of the SHACL Core Constraint Components that are relevant for discovery. It does not support SPARQL-based or JavaScript-based constraints.

It won’t:

  1. Process more complex validation instructions that are part of SHACL such as sh:class, languageIn, pattern, value, qualified value shapes, etc. It is the data publisher’s responsibility to provide valid data, or it is the responsibility of the user of the library to validate the quads afterwards.
  2. Do automatic target selection based on e.g., targetClass: you need to set the target.
  3. Explicitly look for reified statements or triples that are quoted elsewhere: these are not part of the member. A triple term is only included when it can be found through a star pattern.

Logging

Logging can be enabled using the DEBUG environment variable, DEBUG=extract-cbd-shape:*.