
cube package #202

Merged
merged 37 commits into from
Nov 22, 2023
582865a
feat: cube package with validation pipeline
giacomociti Nov 2, 2023
957e8c6
add manifest
giacomociti Nov 3, 2023
b7df2d8
feat: SHACL report step
giacomociti Nov 3, 2023
1506a10
fix stdout
giacomociti Nov 6, 2023
ae3e262
fix shacl report
giacomociti Nov 7, 2023
ace571f
use external sorting to avoid db timeout
giacomociti Nov 8, 2023
ecd68d4
query observations with construct
giacomociti Nov 8, 2023
4582bea
doc: cube pipelines with package specific commans
giacomociti Nov 9, 2023
1a264ac
Merge branch 'master' into cube-validation-sort
giacomociti Nov 9, 2023
7c9871b
test cube validation pipeline
giacomociti Nov 10, 2023
101949b
Merge branch 'master' into cube-validation-sort
giacomociti Nov 10, 2023
68c034a
move cube operations from rdf package
giacomociti Nov 10, 2023
ec27cc8
fix lint
giacomociti Nov 10, 2023
c3feb14
fix: use pattern for command name
giacomociti Nov 13, 2023
7cf0d65
Update packages/cube/package.json
giacomociti Nov 15, 2023
bfb6bda
Apply suggestions from code review
giacomociti Nov 15, 2023
de52160
fix syntax
giacomociti Nov 15, 2023
bb3b99b
more code review suggestions
giacomociti Nov 15, 2023
bf7af42
Update packages/cube/pipeline/cube-validation.ttl
giacomociti Nov 15, 2023
a32425d
split pipelines in multiple files
giacomociti Nov 15, 2023
8287ed6
rename pipelines
giacomociti Nov 15, 2023
e0bab1a
Remove ENV dependency from Cube (#205)
tpluscode Nov 17, 2023
e69448b
fix: env in sort step
giacomociti Nov 17, 2023
6c60476
refactor: no stdout
tpluscode Nov 20, 2023
c7e328a
refactor: reword
tpluscode Nov 20, 2023
bfb2f04
fix: pipelines must be readable
tpluscode Nov 21, 2023
f1c6548
fix: stdout and finalization on error
giacomociti Nov 21, 2023
e17114b
fix test
giacomociti Nov 21, 2023
e17e98f
fix test
giacomociti Nov 21, 2023
47d5631
test pipeline options
giacomociti Nov 21, 2023
7c59fe3
update reference
giacomociti Nov 21, 2023
c4277ee
test: avoid elvis, prepare tests
tpluscode Nov 21, 2023
1dbb51f
refactor: default to 0 shape violations, don't fail pipeline when max…
tpluscode Nov 21, 2023
66d1262
revert: fail at any number of errors
tpluscode Nov 22, 2023
925e0e9
feat: handle `this.error` in sub-pipelines without overwriting first …
tpluscode Nov 22, 2023
4c48eb4
Update clean-owls-think.md
tpluscode Nov 22, 2023
bbc237d
Merge pull request #207 from zazuko/report-improve
tpluscode Nov 22, 2023
5 changes: 5 additions & 0 deletions .changeset/clean-owls-think.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
"barnard59-core": minor
---

Add support for "late errors" where step authors can call `context.error()` to avoid immediately breaking the pipeline
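A minimal sketch of how a step author might use this (the step factory `validateChunk` and the chunk shape are hypothetical; only the `context.error()` call reflects the feature described above):

```javascript
import { Transform } from 'node:stream'

// Hypothetical step factory: in barnard59, the pipeline context is bound
// to `this` when the operation is invoked.
function validateChunk() {
  const context = this
  return new Transform({
    objectMode: true,
    transform(chunk, _encoding, callback) {
      if (chunk.invalid) {
        // Record a "late error": the stream keeps flowing and the
        // pipeline only fails once it has finished.
        context.error(new Error(`invalid chunk: ${chunk.id}`))
      }
      callback(null, chunk)
    },
  })
}
```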
5 changes: 5 additions & 0 deletions .changeset/five-cups-wash.md
@@ -0,0 +1,5 @@
---
"barnard59-sparql": patch
---

fix code link in manifest
22 changes: 22 additions & 0 deletions .changeset/silver-humans-joke.md
@@ -0,0 +1,22 @@
---
"barnard59-cube": major
"barnard59-rdf": major
---

Move cube operations from package `barnard59-rdf` to the new package `barnard59-cube`.


```diff
<#toObservation> a p:Step;
code:implementedBy [ a code:EcmaScriptModule;
- code:link <node:barnard59-rdf/cube.js#toObservation>
+ code:link <node:barnard59-cube/cube.js#toObservation>
].

<#buildCubeShape> a p:Step;
code:implementedBy [ a code:EcmaScriptModule;
- code:link <node:barnard59-rdf/cube.js#buildCubeShape>
+ code:link <node:barnard59-cube/cube.js#buildCubeShape>
].

```
5 changes: 5 additions & 0 deletions .changeset/soft-peaches-brake.md
@@ -0,0 +1,5 @@
---
"barnard59-env": minor
---

Added `cube` and `meta` namespaces
5 changes: 5 additions & 0 deletions .changeset/strong-lions-wait.md
@@ -0,0 +1,5 @@
---
"barnard59": patch
---

include peer dependencies in manifest discovery
1 change: 1 addition & 0 deletions .github/workflows/ci.yaml
@@ -12,6 +12,7 @@ jobs:
package:
- base
- core
- cube
- csvw
- formats
- ftp
1 change: 1 addition & 0 deletions codecov.yml
@@ -6,6 +6,7 @@ flag_management:
- name: barnard59-base
- name: barnard59-core
- name: barnard59-csvw
- name: barnard59-cube
- name: barnard59-formats
- name: barnard59-ftp
- name: barnard59-graph-store
101 changes: 95 additions & 6 deletions package-lock.json


3 changes: 2 additions & 1 deletion packages/cli/lib/discoverManifests.js
@@ -10,6 +10,7 @@ const require = module.createRequire(import.meta.url)
export default async function * () {
const packages = findPlugins({
includeDev: true,
includePeer: true,
Contributor: Should add a changeset to update the CLI version

Contributor Author: done

filter({ pkg }) {
return packagePattern.test(pkg.name) && hasManifest(pkg.name)
},
@@ -19,7 +20,7 @@ export default async function * () {
if (hasManifest(dir)) {
const { name, version } = require(`${dir}/package.json`)
yield {
name,
name: packagePattern.test(name) ? name.match(packagePattern)[1] : name,
manifest: rdf.clownface({ dataset: await rdf.dataset().import(rdf.fromFile(`${dir}/manifest.ttl`)) }),
version,
}
2 changes: 1 addition & 1 deletion packages/cli/lib/pipeline.js
@@ -34,7 +34,7 @@ export const desugar = async (dataset, { logger, knownOperations } = {}) => {
const [quad] = step.dataset.match(step.term)
const knownStep = knownOperations.get(quad?.predicate)
if (!knownStep) {
logger?.warn(`Operation <${quad?.predicate.value}> not found in known manifests. Have you added the right \`branard59-*\` package as dependency?`)
logger?.warn(`Operation <${quad?.predicate.value}> not found in known manifests. Have you added the right \`barnard59-*\` package as dependency?`)
continue
}

13 changes: 10 additions & 3 deletions packages/core/lib/factory/pipeline.js
@@ -6,8 +6,8 @@ import { VariableMap } from '../VariableMap.js'
import createStep from './step.js'
import createVariables from './variables.js'

async function createPipelineContext(ptr, { basePath, context, logger, variables }) {
return { ...context, basePath, logger, variables }
async function createPipelineContext(ptr, { basePath, context, logger, variables, error }) {
return { error, ...context, basePath, logger, variables }
}

async function createPipelineVariables(ptr, { basePath, context, loaderRegistry, logger, variables }) {
@@ -35,8 +35,15 @@ function createPipeline(ptr, {
ptr = context.env.clownface({ dataset: ptr.dataset, term: ptr.term })

const onInit = async pipeline => {
function error(err) {
logger.error(err)
if (!pipeline.error) {
pipeline.error = err
}
}

variables = await createPipelineVariables(ptr, { basePath, context, loaderRegistry, logger, variables })
context = await createPipelineContext(ptr, { basePath, context, logger, variables })
context = await createPipelineContext(ptr, { basePath, context, logger, variables, error })

logVariables(ptr, context, variables)

3 changes: 3 additions & 0 deletions packages/core/lib/run.js
@@ -20,6 +20,9 @@ async function run(pipeline, { end = false, resume = false } = {}) {
pipeline.logger.on('finish', () => resolve())
})

if (pipeline.error) {
throw pipeline.error
}
pipeline.logger.end()
await p
} catch (err) {
85 changes: 85 additions & 0 deletions packages/cube/README.md
@@ -0,0 +1,85 @@
# barnard59-cube

This package provides operations and commands for RDF cubes in Barnard59 Linked Data pipelines.
The `manifest.ttl` file contains a full list of all operations included in this package.

## Operations

### `cube/buildCubeShape`

TBD

### `cube/toObservation`

TBD


## Commands

## Cube validation

The following pipelines retrieve and validate cube observations and their constraints.

### fetch constraint

Pipeline `fetch-constraint` queries a given SPARQL endpoint to retrieve
a [concise bounded description](https://docs.stardog.com/query-stardog/#describe-queries) of the `cube:Constraint` part of a given cube.

```bash
npx barnard59 cube fetch-constraint \
--cube https://agriculture.ld.admin.ch/agroscope/PRIFm8t15/2 \
--endpoint https://int.lindas.admin.ch/query
```


This pipeline is mainly useful for cubes published with [cube creator](https://github.com/zazuko/cube-creator); if the cube definition is manually crafted, it is likely already available as a local file.


### check constraint

Pipeline `check-constraint` validates the input constraint against the shapes provided with the `profile` variable (the default profile is https://cube.link/latest/shape/standalone-constraint-constraint).

The pipeline reads the constraint from `stdin`, so the input can come from a local file (as in the following example) or from the output of the `fetch-constraint` pipeline. In most cases it is useful to keep the constraint in a local file, because it is also needed for the `check-observations` pipeline.

```bash
cat myConstraint.ttl \
| npx barnard59 cube check-constraint \
--profile https://cube.link/v0.1.0/shape/standalone-constraint-constraint
```
SHACL reports for violations are written to `stdout`.


### fetch observations

Pipeline `fetch-observations` queries a given SPARQL endpoint to retrieve the observations of a given cube.

```bash
npx barnard59 cube fetch-observations \
--cube https://agriculture.ld.admin.ch/agroscope/PRIFm8t15/2 \
--endpoint https://int.lindas.admin.ch/query
```
Results are written to `stdout`.

### check observations

Pipeline `check-observations` validates the input observations against the shapes provided with the `constraint` variable.

The pipeline reads the observations from `stdin`, allowing input from a local file (as in the following example) as well as from the output of the `fetch-observations` pipeline.

```bash
cat myObservations.ttl \
| npx barnard59 cube check-observations \
--constraint myConstraint.ttl
```

To enable validation, the pipeline adds a `sh:targetClass` property with value `cube:Observation` to the constraint. This requires each observation to have an explicit `rdf:type`.
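Conceptually, the injected target can be pictured as follows (an illustrative fragment, not the pipeline's actual code; the property shape is a made-up placeholder):

```diff
 <myConstraint> a cube:Constraint ;
+  sh:targetClass cube:Observation ;
   sh:property [ sh:path cube:observedBy ] .
```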

To leverage streaming, the input is split and validated in small batches of adjustable size (the default of 50 should be appropriate in most cases). This allows very large cubes to be validated, because observations are never loaded into memory all at once. To ensure that triples for the same observation are adjacent (and hence processed in the same batch), the input is sorted by subject; when the input is large, the sorting step relies on temporary local files.
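The batching idea can be sketched as follows (illustrative only, assuming quads are plain objects with a string `subject`; the actual pipeline streams and sorts via temporary files rather than in memory):

```javascript
// Sort quads by subject so an observation's triples are adjacent, then
// cut batches only at subject boundaries.
function toBatches(quads, batchSize = 50) {
  const sorted = [...quads].sort((a, b) => a.subject.localeCompare(b.subject))
  const batches = []
  let batch = []
  for (const quad of sorted) {
    // A boundary is reached when the subject changes.
    const boundary = batch.length > 0 && batch[batch.length - 1].subject !== quad.subject
    if (batch.length >= batchSize && boundary) {
      batches.push(batch)
      batch = []
    }
    batch.push(quad)
  }
  if (batch.length > 0) batches.push(batch)
  return batches
}
```

Cutting only at subject boundaries means a batch may slightly exceed `batchSize`, but the triples of a single observation are never split across batches.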

SHACL reports for violations are written to `stdout`.

To limit the output size, there is also a `maxViolations` option to stop validation when the given number of violations is reached.

### Known issues

Command `check-constraint` may fail if there are `sh:in` constraints with too many values.
File renamed without changes.