Idea: Record stronger alternative definitions of ease #68

Open
sbp opened this issue Apr 26, 2019 · 1 comment
Labels
Category: big ideas (For major ideas that span multiple issue categories)

Comments

sbp commented Apr 26, 2019

After @dbooth-boston CCed me on the original EasierRDF proposal in Nov 2018, we chatted about it on the Semantic Web Interest Group IRC channel. I tried to convince him that, though important, what he thought was primarily broken about RDF does not match what I think is primarily broken about RDF. I did not manage to convince him at all. My summary to him was that if he continues to fix issues that are relative frivolities then he is going to miss creating a much better RDF, and create instead a marginally improved or perhaps even a merely changed RDF.

I am surprised to find that none of my objections are recorded, in the slightest degree, in this GitHub issue tracker. Usually it is good academic practice to record objections against your conjectures, methods, results, etc. even if you do not agree with those objections. Indeed, one of the purposes of doing so is to provide strong critical responses to your critics so that the same ground is not trod again in future. And in those situations where your critics turn out to be right you can at least say that you performed due diligence in engaging with those criticisms, even if you initially missed their full validity.

Of course I can add issues to this issue tracker myself, which is why I am creating this umbrella issue. I will not add the individual issues which constituted the bulk of my initial and ongoing criticism, but I will note them inline here so that they are at least on the record.

  • URIs should be the only means of identifying resources. As URIs were intended to be a universal identification space, the "U" only later being changed from Universal to Uniform, literals should not be a separate disjunct space. Internally, representations of URIs in BetterRDF tools could be made as efficient as necessary.
  • Most URIs should be statically typed. When you get an HTTP resource, the type is determined at resolution time by the Content-Type. It may be different for different clients. On the other hand, for e.g. data: resources you know the type before you resolve that URI. In RDF, because of the OWA we might never know the full range of RDF types of a resource, but if statically typed URIs were used predominantly then we would at least know the primary type of a URI before any resolution or RDFish use of the resource that it identifies. This is related to learning the lessons of immutability from pure functional programming.
  • Binary data should be easier to represent. In the existing RDF serialisations, if you want to include binary data then you have to either URI escape it or base64 encode it, and whether your data ends up in a URI or in a literal that's not exactly efficient for those who want to use binary data. The fact that this was only suggested to me a couple of years ago shows just how little even those of us who concentrated on RDF use actually used RDF.
  • URIs for mutable resources should have their authorities decentralised. This should probably apply to many URIs of immutable resources too, but it is especially pressing for mutable resources. One of the primary motivations for RDF was to decentralise knowledge representation. Due to various issues, related to e.g. siloing and the attack mentioned in RFC 7258, the World Wide Web has not (yet) lived up to its potential for decentralisation. There are three camps of responses to this problem: don't care, care little, and care lots. Most RDF specificationistas are in the don't care and care little camps. I am apparently isolated in being in the care lots camp. This is a strange situation since RDF clearly depends on the persistence of its identifiers, and the problem of persistent identifiers has been a persistent nuisance to RDF developers, engendering alternative URI schemes such as tag: and the use of makeshift social solutions such as PURLs. But we now have the technology to begin properly solving this in the RDF and wider web settings, and this should be central to fixing RDF above all else.
  • Consideration should be paid to the lisp model over the SQL model. If you study the existing models of memory, you will find that RDF is strongly associated with what is called in the foregoing linked essay the SQL model. The essay provides some criticism of the SQL model which is pertinent to what has unfolded amongst RDF usage over the years. Ultimately, RDF was a failed research project to add URIs to some sort of Prolog or abstract FOL system. It failed for the same reasons that Prolog failed to capture the imagination of programmers. These are the same reasons why, although Prolog is cool and has niche uses, we don't really see operating systems and browsers and apps written in Prolog. The combination of the memory model and the logic are just not easy to work with for reasons of efficiency, understanding, etc. On the other hand, the lisp model has been quite successful, and may be the basis of a reasonable alternative for EasierRDF.
  • Vocabularies should focus on practical use. They should be built around EasierRDF clients. In around 2001 Seth Russell suggested that graphs should be annotated with a property which gives a default entry point, i.e. making an RDF graph equivalent of a pointed set if you think of graph nodes as elements of a set. This was kind of hard to do unless you made a sort of CWA about some properties, which we never really discussed in detail, but I don't recall the proposal ever being realised, made, or understood by anybody else. There were dozens of usage patterns like these over the years which were obvious things that you'd need in a vibrant and pleasant RDF ecosystem, and yet which never really came to the fore because RDF was mostly a futile exercise in splitting philosophical hairs left on the cutting room floor by those who precipitated the AI winter.
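As an illustration of the second and third points in the list above, here is a minimal Python sketch. It is not part of the original proposal: the helper names `data_uri` and `static_media_type` are hypothetical, and the only thing relied on is the `data:` URI syntax from RFC 2397. It shows that a `data:` URI exposes its media type before any resolution, whereas an HTTP URI does not, and that both of the encodings available today inflate binary payloads.

```python
import base64
from typing import Optional
from urllib.parse import quote


def data_uri(payload: bytes, media_type: str) -> str:
    """Build a base64 data: URI (RFC 2397) for the given bytes."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def static_media_type(uri: str) -> Optional[str]:
    """Return the media type readable from the URI text alone, or None.

    Only data: URIs are handled here. For http(s): URIs the type is
    only known at resolution time (via Content-Type), so None is returned.
    """
    if not uri.startswith("data:"):
        return None
    header = uri[len("data:"):].split(",", 1)[0]
    media_type = header.split(";", 1)[0]
    return media_type or "text/plain"  # RFC 2397 default when omitted


# Every possible byte value once, as a stand-in for arbitrary binary data.
payload = bytes(range(256))
as_base64 = base64.b64encode(payload).decode("ascii")
as_percent = quote(payload, safe="")  # URI percent-encoding

print(len(payload), len(as_base64), len(as_percent))
print(static_media_type(data_uri(b"\x89PNG", "image/png")))
print(static_media_type("https://example.org/logo"))
```

On this payload, base64 costs roughly a third more than the raw bytes and percent-encoding costs more still, which is the inefficiency the binary data point refers to.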

There are many more important issues, but I think this covers the bulk of what is truly and deeply wrong with RDF, especially the issue of decentralisation. Just as @dbooth-boston's original cluster of issues produced a range of smaller associated issues, so the issues above would produce their own mostly independent range of issues.

As I said to @dbooth-boston in Nov 2018 I have barely any interest in fixing RDF properly, let alone trying to pseudofix it, and so the present issue contribution is not intended to stimulate discussion or to involve me in the process. I will, however, note that if you want to fix RDF then you should probably contact the wide range of people who, like me, worked on RDF and the Semantic Web for many years but stopped and are no longer involved because of all of these problems. The people who continued to work on it and are still interested in RDF now are those who did not become jaded as such with RDF, and are therefore the least qualified to continue to work on it. Not only did they not at any point attempt to fix the massive glaring problems, but they were not even put off by these massive glaring problems. Getting these people to subsequently fix RDF is like asking an unrepentant bank robber to be the new head of security at the bank after a heist.

Don't let the people who stole RDF steal its future too!

Collaborator

dbooth-boston commented May 1, 2019

@sbp commented:

I am surprised to find that none of my objections are recorded

Apologies. I certainly did not mean to exclude your criticisms! When the github repo was created I worked very hard to collect and capture all of the relevant issues that I found in the hundreds of messages in the original public email thread, but apparently I did not think to look back at our IRC discussion. Mea culpa. Sorry!

Thank you for capturing these comments now!

URIs should be the only means of identifying resources.

Interesting point! The RDF model could have used a single space -- with syntactic sugar for literals -- and any other segregation between URIs and literals could have been internal to implementations (if desired).
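One way to picture "a single space with syntactic sugar for literals" is the rough Python sketch below. It assumes, purely for illustration, that plain string literals are desugared into `data:text/plain` URIs. The helper names `literal_to_uri` and `uri_to_literal` are hypothetical, and no existing RDF tool is claimed to work this way.

```python
from urllib.parse import quote, unquote

# Assumption: plain string literals live in the URI space as text/plain data: URIs.
PREFIX = "data:text/plain;charset=utf-8,"


def literal_to_uri(text: str) -> str:
    """Desugar a plain string literal into a data: URI."""
    return PREFIX + quote(text, safe="")


def uri_to_literal(uri: str):
    """Recover the literal text, or None if the URI is not a sugared literal."""
    if not uri.startswith(PREFIX):
        return None
    return unquote(uri[len(PREFIX):])


# Every position of the triple is now a URI; the object is a sugared literal.
triple = (
    "http://example.org/book/1",        # hypothetical subject
    "http://purl.org/dc/terms/title",
    literal_to_uri("Moby Dick"),
)

print(triple[2])
print(uri_to_literal(triple[2]))
```

An implementation would remain free to store such terms in a compact internal form, keeping any URI/literal segregation as an internal detail.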

Most URIs should be statically typed. When you get an HTTP resource, the type is determined at resolution time by the Content-Type.

I see your point about statically typing most resources, and that makes very good sense. But it is not clear how this is relevant to RDF, since RDF does not rely on URI resolution at all. The Linked Data usage style does, so maybe you are suggesting that RDF should rely on URI resolution, moving full-force down the Linked Data path? That seems like an interesting idea, but it also raises classic questions about trust and fitness for purpose. Can I trust the data I get from dereferencing this URI, especially since domain names may change ownership? Will the data that I get from dereferencing this URI align with my application's needs? What if that data includes assertions that are not relevant to my application's needs, but cause logical inconsistency? RDF has historically punted on these issues, and as a community we have never established best practices for dealing with them: each application handles them its own way. It would be really good to establish standard-ish techniques for addressing them.

In short: this would be a HUGE change to RDF itself, but certainly in line with Linked Data and what TimBL envisioned for the Semantic Web.

Binary data should be easier to represent.

Yes, good point.

URIs for mutable resources should have their authorities decentralised.

Agreed, and I am glad you brought this up, though I do not think this problem is unique to RDF. Decentralization is needed for the entire Web.

Consideration should be paid to the lisp model over the SQL model.

Another interesting idea! I fully agree that RDF needs to be able to deal atomically with higher-level data objects/chunks, including lists.

It's funny though, over time I have actually come to think of RDF instance data as more relational-like than I used to. I used to think of it as more arbitrary-graph-like. But the fact is that, aside from Tbox data -- classes and predicates -- most RDF instance data (Abox) involves multiple instances of the same shape -- often tuples -- which often looks a lot like relational tables. And of course, a huge amount of RDF data originates in relational tables. I am not advocating here for restricting RDF to the relational model. I am merely making an observation. But yes, the Lisp model probably makes sense.
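The observation about same-shaped instances can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `tables_by_shape` and the sample data are made up, terms are plain strings, and each subject has at most one value per predicate.

```python
from collections import defaultdict


def tables_by_shape(triples):
    """Group subjects by the set of predicates they use.

    Each distinct predicate set (a "shape") becomes one table whose rows
    are (subject, value, value, ...) in sorted predicate order.
    """
    rows = defaultdict(dict)
    for s, p, o in triples:
        rows[s][p] = o  # assumption: one value per (subject, predicate)

    tables = defaultdict(list)
    for s, row in rows.items():
        shape = tuple(sorted(row))
        tables[shape].append((s,) + tuple(row[p] for p in shape))
    return dict(tables)


# Hypothetical instance data: two people of the same shape, one organisation.
triples = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:age", "30"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:bob", "ex:age", "25"),
    ("ex:acme", "ex:name", "Acme"),
]

for shape, table in tables_by_shape(triples).items():
    print(shape, table)
```

Subjects that share a shape collapse into rows of one table, which is the relational-looking pattern described above; subjects with unusual predicate sets end up in small tables of their own.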

Vocabularies should focus on practical use.

Agreed.

if you want to fix RDF then you should probably contact the wide range of people who, like me, worked on RDF and the Semantic Web for many years but stopped and are no longer involved because of all of these problems

Very good idea.

The people who continued to work on it and are still interested in RDF now are those who did not become jaded as such with RDF, and are therefore the least qualified to continue to work on it.

I take your point. It is hard to change one's mindset after becoming so accustomed to looking at the RDF world in a particular way. This is why I really think we need fresh ideas on the table -- to relax some of the assumptions that we've made for so long. I am hoping that some bright young minds can look at all this, abstract the good from it, and come up with something much easier and more pragmatic.

@dbooth-boston added the "Category: big ideas" label (For major ideas that span multiple issue categories) Aug 22, 2019
@dbooth-boston changed the title from "Record stronger alternative definitions of ease" to "Idea: Record stronger alternative definitions of ease" Aug 22, 2019