Consistent graph replication - RDF Dataset Canonicalization #51

Open
sandervd opened this issue Dec 20, 2023 · 4 comments

Comments

@sandervd

When a client requires hard guarantees on consistency, the logic described in RDF Dataset Canonicalization could be used to provide hashes of the state that should be reached after applying a fragment or, even better, a transaction.
This becomes relevant when LDES is used as a replication protocol for named graphs, where the client should hold an exact copy of the named graph the publisher intended. For instance, consistency could be lost if a client is offline for longer than the retention period allows, which could cause it to miss delete operations (tombstone events). If a checksum mismatch is detected, the client must restart replication from the start of the log to arrive at a consistent state.

Reference: https://www.w3.org/TR/rdf-canon/
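
To make the checksum idea concrete, here is a minimal sketch in Python. It assumes the named graph contains no blank nodes and that every statement is already serialized as a canonical N-Quads line; under those assumptions, sorting the lines and hashing the result gives an order-independent digest. Datasets with blank nodes would need the full algorithm from the referenced specification. The function name `graph_hash` and the `http://example.org/` IRIs are hypothetical.

```python
import hashlib


def graph_hash(nquads):
    """SHA-256 over the sorted, de-duplicated N-Quads lines of a dataset.

    Assumption: no blank nodes, and each line is already canonical N-Quads.
    """
    lines = sorted({line.strip() for line in nquads if line.strip()})
    document = "".join(line + "\n" for line in lines)
    return hashlib.sha256(document.encode("utf-8")).hexdigest()


# Hypothetical named graph as intended by the publisher.
publisher = [
    '<http://example.org/A> <http://example.org/state> "State 2" <http://example.org/g> .',
    '<http://example.org/B> <http://example.org/state> "Some value" <http://example.org/g> .',
]

# A replica holding the same statements in a different order hashes identically.
replica = list(reversed(publisher))

# A replica that missed an update hashes differently and should restart replication.
stale_replica = [
    '<http://example.org/A> <http://example.org/state> "State 1" <http://example.org/g> .',
    publisher[1],
]
needs_restart = graph_hash(stale_replica) != graph_hash(publisher)
```

With this sketch, `graph_hash(publisher) == graph_hash(replica)` holds, while `needs_restart` is true for the stale replica.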

@xdxxxdx

xdxxxdx commented Feb 8, 2024

I think this can be applied generically to TREE (tree client)?

@sandervd
Author

sandervd commented Feb 9, 2024

Hmm, I was thinking more of including a hash on each member (version object) that would represent the state of the full graph after applying the change.
For instance, if we had a collection {(1, A, State 1), (2, B, Some value), (3, A, State 2)},
after applying the third member we would have the graph
{(A: State 2), (B: Some value)}. In this case the hash should be the hash of the state of the full graph, if that makes sense 😄
This way we can give much stronger guarantees of consistency.

Of course, the hashes would only be valid in the tail of the log, because retention deletes objects that have newer state further along in the log.
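
The example collection and the retention caveat can be replayed in a few lines of Python. This is only a sketch: the graph is modelled as a plain dictionary from entity to latest value, and a key-sorted JSON dump stands in for a real canonical RDF serialization. The names `state_hash` and `replay` are hypothetical.

```python
import hashlib
import json


def state_hash(state):
    # Stand-in for a canonical RDF hash: hash a key-sorted JSON dump.
    payload = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def replay(members):
    """Apply (sequence, entity, value) members in order.

    Returns the final state and the state hash recorded after each member.
    """
    state = {}
    hashes = {}
    for seq, entity, value in sorted(members):
        state[entity] = value
        hashes[seq] = state_hash(state)
    return state, hashes


log = [(1, "A", "State 1"), (2, "B", "Some value"), (3, "A", "State 2")]
full_state, full_hashes = replay(log)

# Retention removed member 1 because member 3 carries newer state for A.
retained = [member for member in log if member[0] != 1]
late_state, late_hashes = replay(retained)

# A client that starts after retention cannot reproduce the hash on member 2,
# but it converges on the hash of member 3 at the tail of the log.
hash_2_matches = late_hashes[2] == full_hashes[2]
hash_3_matches = late_hashes[3] == full_hashes[3]
```

Here `full_state` ends up as `{"A": "State 2", "B": "Some value"}`, `hash_2_matches` is false, and `hash_3_matches` is true, which illustrates why only hashes in the tail remain verifiable for a late-joining client.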

@pietercolpaert
Member

I actually use that over here, to transform data dumps into an LDES feed: https://github.com/pietercolpaert/DCAT-AP-Dumps-To-Feeds/blob/main/index.ts#L59

I’m not sure, however, what the influence on the LDES spec itself would be. Do you expect this hash to be present in the member? Do you want a path to point to that property?

@sandervd
Author

Yes, I would see it as metadata of an event, similar to its timestamp. The hash would indicate the state of the graph after applying the member (or members, in the case of a transaction). This way we can ensure graph integrity over time: the client can validate that it holds an exact replica of the graph the publisher intended.
I see this as an important guarantee in cases like the base registries.
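
On the client side, the check described above could look roughly like the following Python sketch. Nothing here comes from the LDES specification: the `Member` fields, the `state_hash` property, the convention that only the last member of a transaction carries a hash, and the use of `None` as a tombstone value are all assumptions made for illustration. As before, a key-sorted JSON dump stands in for a canonical RDF hash.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


class ReplicaInconsistent(Exception):
    """Local state does not match the hash published with a member."""


@dataclass
class Member:
    timestamp: str                    # existing event metadata
    entity: str
    value: Optional[str]              # None marks a tombstone (assumption)
    state_hash: Optional[str] = None  # hypothetical; unset inside a transaction


def state_hash(state):
    # Stand-in for a canonical RDF hash: hash a key-sorted JSON dump.
    payload = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def apply_members(state, members):
    """Apply members in order and verify every published state hash."""
    for member in members:
        if member.value is None:
            state.pop(member.entity, None)
        else:
            state[member.entity] = member.value
        if member.state_hash is not None and state_hash(state) != member.state_hash:
            raise ReplicaInconsistent(f"hash mismatch at {member.timestamp}")
    return state


expected = state_hash({"A": "State 2", "B": "Some value"})
feed = [
    # Two members forming one transaction: only the last one carries the hash.
    Member("2024-02-09T10:00:00Z", "A", "State 2"),
    Member("2024-02-09T10:00:00Z", "B", "Some value", expected),
]

good = apply_members({"A": "State 1"}, feed)

# This client missed an earlier tombstone for C while it was offline.
stale = {"A": "State 1", "C": "Deleted upstream"}
try:
    apply_members(stale, feed)
    restart_needed = False
except ReplicaInconsistent:
    restart_needed = True
```

In this sketch the up-to-date client passes the check, while the client that missed the tombstone ends with `restart_needed` set to true and would replicate again from the start of the log.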
