Skip to content
This repository has been archived by the owner on Apr 13, 2022. It is now read-only.

Clarify and detail the definition of globbing to only apply to folders #148

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

RubenVerborgh
Copy link
Contributor

There is a suspicion that globbing is currently defined too loosely, and that this loose behavior is not depended upon (#147).

This PR restricts globbing to concatenating RDF files in a single folder.

Copy link
Contributor

@michielbdejong michielbdejong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to remove globbing altogether, but this is indeed an improvement. I think this PR is a clarification of the text, not a change of the intent of the text, right?

I would change the example, adding file3.ttl to which the user has no access.

Also maybe we can specify that if the user has no read access to the container, then the request would fail, but if there are no Turtle documents to which the have read access, it would return an empty RDF doc. Also, how would the prefixes be done? What if they clash? Is it just a text-concatenation, or should it understand Turtle?

What would the server do if one of the documents has a syntax error?

@acoburn
Copy link
Member

acoburn commented Mar 27, 2019

Another consideration w/r/t globbing is the issue of blank nodes. Consider the case of two documents in a container:

/container/resource1.ttl
<> a ldp:RDFSource ;
      skos:prefLabel "resource 1" ;
      prov:wasGeneratedBy [
           a prov:Activity, as:Create ;
           prov:wasAssocatedWith <https://example.com/my-webid> ;
           prov:atTime             "2019-03-27T19:23:00.512Z"^^xsd:dateTime ] .

and also:

/container/resource2.ttl
<> a ldp:RDFSource ;
      skos:prefLabel "resource 2" ;
      prov:wasGeneratedBy [
           a prov:Activity, as:Create ;
           prov:wasAssocatedWith <https://example.com/another-webid> ;
           prov:atTime             "2019-03-27T20:13:00.000Z"^^xsd:dateTime ] .

If the underlying RDF serialization library converts each resource's blank node to _:b0 (anything, really, but the issue is for any blank node label conflicts), then the resulting graph would be semantically different than requesting each resource individually.

I am no fan of blank nodes in the context of linked data, but if this is specified behavior, this general case would need to be considered.

As an alternative to globbing, I wonder whether LDPath would satisfy the existing use cases.

@RubenVerborgh
Copy link
Contributor Author

RubenVerborgh commented Mar 27, 2019 via email

@kjetilk
Copy link
Member

kjetilk commented Mar 27, 2019

The formal specification for merging RDF graphs are here:
https://www.w3.org/TR/rdf-mt/#graphdefs
I think that should be the basis of this feature, if it is implemented. E.g. INSERT DATA is specified in terms of RDF Merge.

@acoburn
Copy link
Member

acoburn commented Mar 27, 2019

The Web Annotation specification makes use of Prefer headers, specifically oa:PreferContainedDescriptions and a resulting ActivityStream-based container to fetch an entire container of resources. This interaction fits naturally within the existing LDP specification without defining a new semantic for /*.

@michielbdejong
Copy link
Contributor

Not just editorial, we exclude filename globs.

Yes sorry, I agree, and I also applaud that simplifying change.

My only issue then with this PR is that it talks about 'concatenating', not 'merging' Turtle docs. I think we should prescribe merging instead of concatenating, and refer to the document @kjetilk mentioned.

Just to make sure I understand blank nodes correctly, should we assume that the graphs have no blank nodes in common? For instance, when merging two graphs that both state that Alice drank a glass of wine, both using a blank node, then we have no ground to assume that these two docs describe the same event, right? So the result should be that we just keep both blank nodes (not mark them as identical), and so we in the resulting merge we say Alice drank two glasses of wine, right?

@RubenVerborgh
Copy link
Contributor Author

@michielbdejong

I would change the example, adding file3.ttl to which the user has no access.

Also maybe we can specify that if the user has no read access to the container, then the request would fail, but if there are no Turtle documents to which the have read access, it would return an empty RDF doc.

What would the server do if one of the documents has a syntax error?

Agree with all of the above;
however, a note here that the spec currently is not at this level of detail at all.
It should be, don't get me wrong, but I did not omit any details that were previously specified.

Will do it for this section, but a more rigorous version of this spec is needed indeed.

Also, how would the prefixes be done?

Up to the implementation.

Is it just a text-concatenation, or should it understand Turtle?

From the current version:

Then a request to /data/*
will return the union of the datasets in file1.ttl and file2.ttl

My only issue then with this PR is that it talks about 'concatenating', not 'merging' Turtle docs.

Not really as per the above, but will add spec reference as per @kjetilk.

@RubenVerborgh
Copy link
Contributor Author

@acoburn Interesting, created #149 to track that idea.

@RubenVerborgh
Copy link
Contributor Author

Applied the suggested changes. This part of the spec is now very precise, giving clear expectations for a server (and also for a client library that would assume this functionality). Contrasts a bit with the rest of the spec, but good to start somewhere.

@michielbdejong Could you re-review?

@melvincarvalho Given that you are the main stakeholder for globbing, could you please review?

Copy link
Contributor

@michielbdejong michielbdejong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@melvincarvalho
Copy link
Member

melvincarvalho commented Mar 28, 2019

I'd take a look at gold as node solid server was based on gold for globbing. The write up in the spec should be considered a really good try based on someone writing up a documentation having not created that spec. So while the write up might not be perfect, to an extent gold should lead the text.

What I recall is

  • just use /* type URIs is what I thought globbing was
  • I'm unsure that the BasicContainer thing is part of globbing ... is the /* a container even?
  • Unsure what the merge spec is, and dont have time to read that doc right now, but I'd avoid bnodes and globbing in a design

I think globbing should be used sparingly, but is a great tool to take solid apps to the new level.

It's anticipated that new techniques will emerge over time -- things like http2 and websockets, proxies, caching, databases, and out of band streaming were expected to emerge over time and therefore the need for globbing would grow less and less. These techniques did not emerge yet, because we've focused resource on gettng a stable node solid server.

So tl;dr let's support what work in gold which the /* pattern. And take the triples. If you mix bnodes and globbing you're asking for trouble. Globbing is very useful but other web servers such as apache dont use it, so it's slightly bespoke. We should avoid feature creep, and keep it basic but useful.

@melvincarvalho
Copy link
Member

Also, how would the prefixes be done? What if they clash?

Prefixes are just a per file short hand to save you typing out characters over and over.

They are designed not to clash.

For example if you expand the prefixes, you get consistent URIs which are also UUIDs, if that makes sense.

@melvincarvalho
Copy link
Member

melvincarvalho commented Mar 28, 2019

@michielbdejong as background consider a chat app that may have 50 friends, with 1000s of messages. What do you do at startup time?

It's a reasonable, if not good design decision to put each chat item in its own file.

Now at startup you'll need to get 1000 files. Just watch stuff crash until you queue or throttle it. Then think about async type JS frameworks which will update a UI when a new file is received. You can get incredible lag if you are pulling in a lot of files (think about thats also a problem with http2).

Globbing makes some of these problems go away. And make more than the simplest apps more feasible. Trust me you are hitting so many walls with solid at that stage, that things like globbing a god send. The current generation of app builders have not hit these walls yet, but I look forward to people doing so, appreciating how useful this feature can be to get you to prototype (where otherwise people would just give up) as app data grows.

I think as new techniques become available we'll only then be able to evaluate apps which have 100s of files or even 1000s. Also remember globbing is not just about http, you can get files from a cache, from indexeddb, from a file system and sometimes these operations are much slower one a time etc.

@RubenVerborgh
Copy link
Contributor Author

Thanks @melvincarvalho!

So tl;dr let's support what work in gold which the /* pattern.

To the best of my knowledge, this is what the current proposal contains.
Do you see any issues/inconsistencies? Any reason not to merge?

If you mix bnodes and globbing you're asking for trouble.

First, I disagree (no trouble in the way I currently specified it, which is a main reason for specifying). Second and more importantly: what do you propose alternatively? That we a) skip files with blank nodes b) throw an error on blank nodes c) not include triples with blank nodes d) canonicalize them e) not spec it and leave up to implementers f) … ?

@melvincarvalho
Copy link
Member

@RubenVerborgh well I dont use bnodes, so dont have a problem with any merge strategy.

@RubenVerborgh
Copy link
Contributor Author

Thanks, I will consider the above as no objection, unless I hear otherwise.

@RubenVerborgh
Copy link
Contributor Author

RubenVerborgh commented Mar 28, 2019

Globbing makes some of these problems go away.

But creates new ones. Do we really want a single stream with an unknown number of messages, that might be very long and cause server and client pressure? As opposed to fetching resources individually (which, under HTTP/2, comes with virtually zero overhead).

This is not a problem we can solve with theoretical arguments, but which requires experimental measurements. (Take it from a researcher who happens to specialize in the partitioning of Linked Data documents to tackle that specific situation 😉 http://linkeddatafragments.org/publications/jws2016.pdf)

But let's discuss removal in #145.

@melvincarvalho
Copy link
Member

Any reason not to merge?

Why is the BasicContainer stuff in there with contains? Does gold / databox do that?

Not that I think it's a terrible idea, just dont remember seeing that before, so wonder where it came from. If we're cleaning up the spec, then gold should be the reference imho.

@elf-pavlik
Copy link
Member

I would like to see feedback and from Tim and Andrei before a substantive spec change.

I haven't seen activities around solid repos from @deiu for quite some time. If you wan't his feedback you might need to make effort to reach out to him via some more direct channel.

@melvincarvalho
Copy link
Member

@elf-pavlik he may be busy. Let's give him a bit of time to respond, should he wish to. Or possibly Tim could weigh in.

The main concern is regarding the timing. Once the timing is clarified, I could get behind it.

@melvincarvalho
Copy link
Member

@RubenVerborgh perhaps the easiest thing would be, could you state your preferred timeline, or preferred set of time lines for changes to the spec.

I think that may alleviate potential misunderstandings. I'm feeling quite pressured to take time out from other urgent matters, which is not an optimal way to make specification changes.

@RubenVerborgh
Copy link
Contributor Author

RubenVerborgh commented Mar 28, 2019

@melvincarvalho

of first moving it to a "deprecated" section

You have my formal objection to this.

I didn't even formally propose it, so no need to object.
My point is that even if we decide to start the removal process, it would be good to document the current behavior.

There are currently four described behaviors, and at least two of them are incompatible:

  1. the spec
  2. @melvincarvalho's interpretation of what the spec should be (Write a client-side alternative for globbing solid#253 (comment))
  3. the GOLD implementation
  4. the NSS implementation

The goal of this PR was to ensure that 2. is reflected in 1.

in a discussion with @timbl, he expressed his preference for removing globbing altogether

Citation required.

That's it right there. No need to doubt my word.

I'd like to see the context of this. And also the time line.

It's a private conversation that I hence cannot share.
Timeline: today, pointing to these issues and asking for input.

Not now. The bar for spec changes is extremely high, and by default should be unanimous.

For all we know, this is a spec correction.

A bug fix that aligns the spec with gold test indicated I think would be fine.

That's exactly what this PR does. The TestGlob you pointed to tests only one star at the end of a container path: https://github.com/linkeddata/gold/blob/b000d003f9e2aa40e4977839ca063f09435f80c8/server_test.go#L1193

I will aim to give a more specific review to this work. But not this week. More likely in the month of April.

That's fine, but we might not necessarily wait for that.

All I'm doing here is aligning the spec with behavior that you want, GOLD has, and NSS has. Instead of all the tangential discussions, a simple “thank you” could also have worked.

@RubenVerborgh perhaps the easiest thing would be, could you state your preferred timeline

  1. We fix the incorrect description of globbing (this PR).
  2. We implement client-side globbing.
  3. [optional] We deprecate globbing.
  4. We remove globbing.

However, given that we only have one stakeholder and two apps that use globbing, I (personally) am inclined to drop 3, possibly 2, possibly 1.

Timeline: ASAP. If there were more stakeholders, my answer would be different.

@RubenVerborgh RubenVerborgh dismissed melvincarvalho’s stale review March 28, 2019 23:01

No actual changes requested (only additional reviews).

@RubenVerborgh RubenVerborgh mentioned this pull request Mar 28, 2019
@melvincarvalho
Copy link
Member

melvincarvalho commented Mar 29, 2019

Timeline: today, pointing to these issues and asking for input.

Solid lives through its specifications.

Contentious changes to the spec cannot be carried out unilaterally on such a short time line. Globbing came about iteratively through many years of work on solid, and produced critical apps in its evolution. It's an important part of solid.

Your proposal remains problematic, not least because there are some factual errors that could be expanded upon, and I will take some time go to through the reasons for that. As such, I've readded the "on-hold" tag. It would be good not to have a "tag war", which sort of defeats the purpose of the on-hold principle too! :)

We also could use some time for Tim or Andrei to weigh in if they choose to.

I appreciate the review from @michielbdejong however I cant help but reflect that we had almost this exact same conversation c. 6 years ago in unhost on the unhosted spec. When basti and I wanted to add HEAD to the GET part of the REST spec of remote storage, and michiel in this case wanted a freeze due to "slippery slope". Somewhat amusing that we are now in reverse roles, but I do fondly recall our discussion, and hope that you can empathize with that meta principle :)

Instead of all the tangential discussions, a simple “thank you” could also have worked.

@RubenVerborgh I do thank you for this and other contributions to the topic. I simply ask that contentious spec changes have a longer time line on which to discuss. Spec changes should have a high bar for changes. Hope this is making at least some sense.

@RubenVerborgh
Copy link
Contributor Author

s such, I've readded the "on-hold" tag. It would be good not to have a "tag war", which sort of defeats the purpose of the on-hold principle too! :)

on-hold is for technical blockers; I have been made aware of none. Could you please remove? (or link to another issue that blocks this one?)

Solid lives through its specifications.

Or the GOLD implementation apparently. There is a bug in the spec, this fixes it.

It would be very different if globbing were actually used by more than one stakeholder. Given that we have identified all stakeholders, and that this issue correctly captures globbing as defined by them, I don't understand the resistance.

@RubenVerborgh
Copy link
Contributor Author

RubenVerborgh commented Mar 29, 2019

Discussed out of band with @melvincarvalho; this should be on hold until he and @timbl can discuss.

@Mitzi-Laszlo Mitzi-Laszlo added this to the Spec Pull Requests milestone May 7, 2019
RubenVerborgh added a commit to nodeSolidServer/node-solid-server that referenced this pull request Jun 2, 2019
Also reduces globbing in scope to all files in one folder,
as in solid/solid-spec#148
RubenVerborgh added a commit to nodeSolidServer/node-solid-server that referenced this pull request Jun 2, 2019
Also reduces globbing in scope to all files in one folder,
as in solid/solid-spec#148
michielbdejong
michielbdejong previously approved these changes Jul 1, 2019
Copy link
Contributor

@michielbdejong michielbdejong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should merge this because it describes our understanding of the 0.8 spec better than the current text. This is regardless of whether globbing will be removed altogether in the next spec version or not.

@RubenVerborgh
Copy link
Contributor Author

Resolved conflict; mergeable again.

@kjetilk
Copy link
Member

kjetilk commented Jul 1, 2019

OK by me, and my assumption is that it is also OK by @timbl , but perhaps we should have a go from him?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants