-
-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathNOTES
49 lines (43 loc) · 2.9 KB
/
NOTES
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
* Adapters
A lot of interesting data is external and does not offer SPARQL endpoints.
Not all of that data can be imported into our triplestore, either.
Either because it's too much data, or because it's highly dynamic.
We'll need adapters that can translate between SPARQL and external APIs.
This will need special considerations.
** Allow automatic use of adapters
The user shouldn't need to write their queries around what data is internal vs what data is external.
If we forego the "automatic" part, we could probably use vanilla SPARQL federation (https://www.w3.org/TR/sparql11-federated-query/).
We could always go with that first, then solve automatic federation later.
There's also some existing research on automatic federation (which I currently cannot find).
** Allow JOINs between adapters, without absurd query costs
A query like "find all artists born in all cities that currently have rain" is not feasible if we have to query multiple APIs.
Such queries will probably need to include local information so that we can reduce the number of calls we need to make to APIs.
This is related to the topic of query planning.
* Importing data
Not all external data has to go through adapters.
For example, when we index movies on the local file system, we can import the most important information (such as title, release year, actors, …) into our triplestore.
This will allow us to answer most queries without having to call out to APIs.
* Relational databases like MusicBrainz
There exists a language called R2RML (https://www.w3.org/TR/r2rml/) that is used for mapping from relational databases to RDF.
Learn more about it, see if it can be used for live queries or if we'd have to materialize the whole RDF graph for the database.
See also
- http://www.arida.ufc.br/livesynrdbrdf/linkedbrainzlive.html for a setup that maintains a live RDF graph of the MusicBrainz DB.
- https://github.com/metabrainz/MusicBrainz-R2RML
- https://www.stardog.com/tutorials/data-mappings/
- https://docs.stardog.com/virtual-graphs/
* Related research
https://doi.org/10.1109/RIVF.2007.369129 - Towards Ontology-based Semantic File Systems
https://doi.org/10.48084/etasr.1898 - The Design and Development of a Semantic File System Ontology
https://doi.org/10.1145/121132.121138 - Semantic file systems
* External datasets we may want to make use of
** OpenStreetMap
Has a SPARQL endpoint at https://sophox.org/
** MusicBrainz
Used to have a SPARQL endpoint via the LinkedBrainz project, but that seems to be dead.
The entire dataset is available as SQL dumps, so we could host our own kind of LinkedBrainz/tell users to self-host, as a more powerful alternative to using the API
** IMDB
Publishes some of their data at https://www.imdb.com/interfaces/, free for personal use
** TMDB
Has a data dump of IDs only, would be limited to using their API
** Wikidata
https://query.wikidata.org/