feat: Rate datasets when crawling #814

Merged
merged 1 commit into from
Nov 5, 2023

ddeboer (Member) commented Nov 5, 2023

  • Model ratings as schema:Ratings, with penalties subtracting from the total
    score of 100 for missing data.
  • Base ratings on SHACL applied to mapped DCAT output, so improve DCAT
    SHACL where needed.
  • Write output to https://data.netwerkdigitaalerfgoed.nl/registry/ratings
    named graph.
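The penalty model in the first bullet can be sketched roughly as follows. This is only an illustration, not the code from this PR: the `Penalty` and `Rating` shapes, the `rate` function, the penalty values and the clamping at 0 are all assumptions; only the starting score of 100 and the idea of subtracting penalties for missing data come from the description above.

```typescript
// Hypothetical shape: one penalty per property missing from the dataset description.
interface Penalty {
  path: string; // IRI of the missing property
  score: number; // points subtracted; the values used below are illustrative only
}

// Loosely mirrors schema:Rating (ratingValue, bestRating, worstRating, ratingExplanation).
interface Rating {
  ratingValue: number;
  bestRating: number;
  worstRating: number;
  ratingExplanation: string;
}

const BEST_RATING = 100;
const WORST_RATING = 0; // assumption: the score is clamped and never goes below 0

function rate(penalties: Penalty[]): Rating {
  // Sum all penalties and subtract them from the best possible score.
  const total = penalties.reduce((sum, penalty) => sum + penalty.score, 0);
  return {
    ratingValue: Math.max(WORST_RATING, BEST_RATING - total),
    bestRating: BEST_RATING,
    worstRating: WORST_RATING,
    ratingExplanation: penalties.map((p) => `Missing ${p.path}`).join('; '),
  };
}
```

With no penalties the rating stays at 100; each missing property lowers it by its own penalty score.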

Fix #762

ddeboer force-pushed the 762-rating branch 2 times, most recently from 3419ba1 to e2d1a7f, on November 5, 2023 21:18
ddeboer enabled auto-merge (squash) on November 5, 2023 21:19
* Base ratings on DCAT SHACL
ddeboer merged commit b0f72dd into main on Nov 5, 2023
2 checks passed
ddeboer deleted the 762-rating branch on November 5, 2023 21:21
coret (Contributor) commented Nov 18, 2023

I'm looking into presenting the ratings in the demonstrator, for example for https://www.goudatijdmachine.nl/omeka/api/items/13000:
[image]

  1. Why does http://www.w3.org/ns/dcat#distribution need improving?

  2. As the rating is based on the DCAT version, the explanation is also given in DCAT. If the provided data description is schema.org based then these suggestions will be confusing to the dataset description provider! Do we store if the dataset description is schema.org or DCAT based? If not, can we store this property? Then we should convert the DCAT explanation to schema.org properties (in demo or crawler) if the origin was schema.org based.

ddeboer (Member, Author) commented Nov 18, 2023

  1. Why does http://www.w3.org/ns/dcat#distribution need improving?

Fixed in #826.

  2. As the rating is based on the DCAT version, the explanation is also given in DCAT. If the provided data description is schema.org based then these suggestions will be confusing to the dataset description provider! Do we store if the dataset description is schema.org or DCAT based? If not, can we store this property? Then we should convert the DCAT explanation to schema.org properties (in demo or crawler) if the origin was schema.org based.

I suggest translating the DCAT keys to labels such as ‘date last modified’, ‘keywords’ and ‘language’. These are human-readable, so they look better in a UI, and they apply to both Schema.org and DCAT.
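A minimal sketch of such a translation, assuming a simple lookup table in the demonstrator or crawler. The `labels` map and `labelFor` function are hypothetical names, and which IRI gets which label is my assumption; only the three example labels come from the suggestion above.

```typescript
// Hypothetical lookup from DCAT / DC Terms property IRIs to neutral, human-readable labels.
const labels = new Map<string, string>([
  ['http://purl.org/dc/terms/modified', 'date last modified'],
  ['http://www.w3.org/ns/dcat#keyword', 'keywords'],
  ['http://purl.org/dc/terms/language', 'language'],
]);

// Fall back to the raw IRI when no label is known, so nothing is hidden from the user.
function labelFor(propertyIri: string): string {
  return labels.get(propertyIri) ?? propertyIri;
}
```

Because the labels do not name a vocabulary, the same explanation can be shown regardless of whether the original description was Schema.org or DCAT.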


Successfully merging this pull request may close these issues.

Automate rating during crawling