Member extraction algorithm and blank node graphs #113

Open
pietercolpaert opened this issue Sep 10, 2024 · 3 comments
pietercolpaert commented Sep 10, 2024

Blank node graphs are a neat trick. For example:

_:b0 {
 <a> <b> <c> .
}

Once the page containing this data has been parsed, we can be certain we have all the triples in _:b0, because the scope of this blank node is limited to this page. We can also be certain that this graph will never conflict with a graph from another source.

This means we could use blank node graphs as packages within a member, giving each part of the member's graph a specific purpose. For example:

ex:LDES a ldes:EventStream ;
    rdfs:comment "An LDES with per member: an ActivityStreams update, the payload and the signature of the payload";
    tree:view <> ;
    tree:member <A> .

## Imagine we would like to create a transactional profile of LDES, including things like policies, transactions, ways to upsert/remove sets of triples, etc.
<A> a patch:Event ;
    patch:processingMethod patch:Upsert ;
    patch:upsertKey <https://example.org/Dataset1> ;
    patch:transaction ex:Transaction1 ;
    patch:upsertPayload _:b0 ;
    patch:sequence 1 ;
    patch:time "2024-09-09T13:27:33.681Z";
    patch:provenance _:b1 ;
    patch:signature _:b2 ;
    patch:policy _:b3 .

_:b0 {
     ## The (updated) representation of this particular dataset
     ## ...
     <https://example.org/Dataset1> a dcat:Dataset ;

}

_:b1 {
     <https://example.org/Dataset1#Event1> a as:Create, prov:Activity ;
       as:object <https://example.org/Dataset1> ;
       as:published "2023-10-01T12:00:00Z"^^xsd:dateTime .
}

_:b2 {
     # Signature
     [] a ex:DataIntegrityProof;
        ...
        ex:signature "rCWNBuxBK1In93X8dvuK1ss91LK0rMiA2KzvsNaEhdGt7PTD5aQ0X58TzbvnTOhvl9t5bRGoOHnxfys52Q9bWjnmD4GoljEWVWFSrBnORsLBOLwcAnLRfEtTvz4t0EYV";
        ex:target (_:b0 _:b1 _:b3) ;
}

_:b3 {
  ## We want to indicate that this specific member must be removed after 1 month
  [] a ex:Policy ;
      ex:target <https://example.org/Dataset1> ;
      ex:duty [
             a ex:RemovalDuty ;
             ex:after "P1M" 
     ]
}

For this to work, we need a smarter way of dealing with blank node graphs, which the member extraction algorithm currently does not specify at all.

Proposal:

  • Extend the spec so that shapes and CBD never look into blank node graphs.
  • Whenever a blank node is encountered, check whether there is a blank node graph for it. If there is, include that blank node graph’s quads in the member.
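
The two bullets above could be sketched roughly as follows. This is only an illustration, not the spec's algorithm: quads are plain string tuples, `Quad`, `is_blank` and `extract_member` are hypothetical names, and the CBD step is simplified to "follow blank node objects in the default graph" (no reification, no shapes). It also assumes that only blank nodes met while walking the default graph pull in their graph; blank nodes that appear inside an included graph are not expanded further.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Optional


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: Optional[str]  # None stands for the default graph


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def extract_member(quads: Iterable[Quad], member: str) -> List[Quad]:
    """Simplified CBD over the default graph plus referenced blank node graphs."""
    by_subject = defaultdict(list)  # default-graph quads, indexed by subject
    by_graph = defaultdict(list)    # quads of every non-default graph, by graph name
    for q in quads:
        if q.g is None:
            by_subject[q.s].append(q)
        else:
            by_graph[q.g].append(q)

    result: List[Quad] = []
    visited = set()
    included_graphs = set()
    stack = [member]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        # CBD only ever reads the default graph, never a blank node graph.
        for q in by_subject.get(node, []):
            result.append(q)
            if not is_blank(q.o):
                continue
            # A blank node object that names a graph on this page: take the whole graph.
            if q.o in by_graph and q.o not in included_graphs:
                included_graphs.add(q.o)
                result.extend(by_graph[q.o])
            # Keep following the blank node in the default graph (simplified CBD).
            stack.append(q.o)
    return result
```

For the example in this issue, calling `extract_member(page_quads, "<A>")` on the parsed page would return the triples of `<A>` together with every quad in `_:b0` to `_:b3`, while leaving out blank node graphs that `<A>` never points to.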

pietercolpaert commented

Additionally:

The main goal of this entire spec is to be able to package triples. If a member is packaged with one system, I believe we probably should not mix it with the other systems (this is a design choice, I guess). So we could change the algorithm so that:

  1. Only shape topologies can trigger an HTTP request to out-of-band members: the package here is the shape topology, but in a way also the external resource. Shape topologies only apply when no named member graphs or blank node graphs are used.
  2. As soon as a named graph is used as a member, we limit the metadata to CBD and otherwise only include data from that named graph. Here the package is one particular named graph that, because it is a member of a collection, will by convention remain the same everywhere at this particular moment. The CBD algorithm excludes blank node graphs and member graphs.
  3. A check is needed for whether blank nodes in the page are directly related to a member. If they are, an advanced packaging technique is being used to form that member.
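
To make the "one packaging system per member" idea concrete, here is a minimal sketch of how an extractor might pick a system before doing anything else. The names `Packaging`, `choose_packaging` and `may_dereference` are hypothetical, the precedence of named graphs over blank node graphs is an assumption (the list above does not fix an order), and "directly related" is read here as "the member has a default-graph triple whose object is a blank node that names a graph on the page".

```python
from enum import Enum
from typing import Iterable, NamedTuple, Optional


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: Optional[str]  # None stands for the default graph


class Packaging(Enum):
    SHAPE_TOPOLOGY = "shape topology"
    NAMED_GRAPH = "named graph"
    BLANK_NODE_GRAPHS = "blank node graphs"


def choose_packaging(quads: Iterable[Quad], member: str) -> Packaging:
    quads = list(quads)
    graph_names = {q.g for q in quads if q.g is not None}
    # Item 2: the member IRI itself is used as a named graph.
    if member in graph_names:
        return Packaging.NAMED_GRAPH
    # Item 3: the member points directly at a blank node that names a graph.
    for q in quads:
        if q.g is None and q.s == member and q.o.startswith("_:") and q.o in graph_names:
            return Packaging.BLANK_NODE_GRAPHS
    # Item 1: nothing else applies, so the shape topology is the package.
    return Packaging.SHAPE_TOPOLOGY


def may_dereference(packaging: Packaging) -> bool:
    # Only shape topologies are allowed to trigger HTTP requests.
    return packaging is Packaging.SHAPE_TOPOLOGY
```

Deciding once, up front, keeps the three systems from mixing: a member packaged as a named graph or as blank node graphs never causes an out-of-band request.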

pietercolpaert commented Oct 9, 2024

Thoughts when going over this issue today:

  • This way, it becomes very difficult to understand which graphs we can and cannot look into: how do you know that a blank node graph has not been used somewhere in a triple of another member?
  • It’s hard to explain why you would do that for blank node graphs, but not for IRI named graphs.

An alternative proposal to support multiple named graphs per member would be a separate property on the LDES that indicates which namedgraphPaths must be considered.

E.g., this description could work for the example above:

ex:LDES tree:namedgraphPath (patch:upsertPayload) .
ex:LDES tree:namedgraphPath patch:provenance .
ex:LDES tree:namedgraphPath patch:signature .
ex:LDES tree:namedgraphPath patch:policy .
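
A rough sketch of how an extractor could evaluate such a description, under a few assumptions: `tree:namedgraphPath` is only the property proposed in this comment, each path is modelled as a tuple of predicates (so a one-element list such as `(patch:upsertPayload)` and a bare predicate are treated the same), paths are evaluated in the default graph starting from the member, and `follow_path` and `graphs_for_member` are hypothetical names.

```python
from typing import Iterable, List, NamedTuple, Optional, Sequence, Set, Tuple


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: Optional[str]  # None stands for the default graph


def follow_path(default_quads: List[Quad], start: str, path: Sequence[str]) -> Set[str]:
    """Evaluate a sequence of predicates from `start`, in the default graph only."""
    nodes = {start}
    for predicate in path:
        nodes = {q.o for q in default_quads if q.s in nodes and q.p == predicate}
    return nodes


def graphs_for_member(
    quads: Iterable[Quad],
    member: str,
    named_graph_paths: Iterable[Tuple[str, ...]],
) -> List[Quad]:
    """Return the quads of every graph that one of the paths points to."""
    quads = list(quads)
    default_quads = [q for q in quads if q.g is None]
    selected: Set[str] = set()
    for path in named_graph_paths:
        selected |= follow_path(default_quads, member, path)
    # Blank node graphs and IRI named graphs are handled identically.
    return [q for q in quads if q.g in selected]
```

Because the graphs are selected by an explicit path rather than by spotting blank nodes, the same rule covers blank node graphs and IRI named graphs, which addresses the second concern above.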
