IRIs for DLite instances #788

Open
jesper-friis opened this issue Feb 14, 2024 · 4 comments

@jesper-friis
Collaborator

jesper-friis commented Feb 14, 2024

Currently the following instance IDs translate into the same UUIDs:

25a1d213-15bb-5d46-9fcc-cbb3a6e0568e
http://onto-ns.com/meta/calm/0.1/Chemistry/25a1d213-15bb-5d46-9fcc-cbb3a6e0568e
aa6060

where http://onto-ns.com/meta/calm/0.1/Chemistry is the URI of the metadata of the instance. The last ID maps to the same UUID because 25a1d213-15bb-5d46-9fcc-cbb3a6e0568e is the version 5 (SHA-1-based) UUID of aa6060, computed using the DNS namespace.
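As a rough model of the mapping described above, here is a minimal Python sketch. It is not DLite's code (the real logic lives in src/getuuid.c); the helper name `instance_uuid` is hypothetical, and the assumption is that an ID ending in a UUID yields that UUID while anything else is hashed with `uuid.uuid5` in the DNS namespace.

```python
import re
import uuid

# Canonical UUID layout: 8-4-4-4-12 hexadecimal digits.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def instance_uuid(ident: str) -> str:
    """Hypothetical helper: map an instance ID to a UUID string."""
    # Look at the last path segment, so that a bare UUID and
    # "<metadata URI>/<UUID>" are treated alike.
    last = ident.rsplit("/", 1)[-1]
    if UUID_RE.fullmatch(last):
        return last.lower()
    # Assumption: everything else is hashed as a whole into a
    # version 5 (SHA-1) UUID in the DNS namespace.
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, ident))


meta = "http://onto-ns.com/meta/calm/0.1/Chemistry"
u = instance_uuid("aa6060")
# The hashed ID, the bare UUID and the "<metadata URI>/<UUID>" form agree.
print(u == instance_uuid(u) == instance_uuid(f"{meta}/{u}"))  # True
```

Note that in this sketch `http://onto-ns.com/meta/calm/0.1/Chemistry/aa6060` is hashed as a whole string and therefore gives a different UUID, which is the gap this issue is about.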

It would be nice to also be able to refer to the same instance with the IRI:

http://onto-ns.com/meta/calm/0.1/Chemistry/aa6060

Implementing this only requires a small update in src/getuuid.c.

@torhaugl
Contributor

PS: "AA6060" is a commonly used aluminum alloy, so the ID carries semantic meaning. In this issue it refers to an instance.

To include this feature we need a function to check whether a URI points to a datamodel or an instance. Currently, if the last part of the URL (aa6060 in http://.../Chemistry/aa6060, Chemistry in http://.../Chemistry) is not a UUID, then it has to be a datamodel. With this change it would be difficult to know (for a computer at least) whether a URL points to a datamodel or an instance without imposing stricter requirements on the form of the namespace, version and datamodel name.

An example for
http://onto-ns.com/meta/calm/0.1/Chemistry/aa6060

Naively, a machine could read datamodel="aa6060", version="Chemistry", namespace="calm/0.1". If we required a recognizable version format (a floating-point-like number such as 0.1, for instance), then we could tell that this URL refers to an instance by locating the version: instance="aa6060", datamodel="Chemistry", version="0.1", namespace="calm".

Choosing standard semver would unfortunately not be backwards-compatible (since 0.1 is not a valid semver, 0.1.0 is).
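The idea of locating the version segment can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper `split_uri` is hypothetical, a version is taken to be dot-separated integers (so the non-semver `0.1` is accepted), and the namespace is returned as everything before the version rather than just `calm`.

```python
import re

# Assumption: a version is dot-separated integers, e.g. "0.1" or "1.2.3".
VERSION_RE = re.compile(r"\d+(\.\d+)*")


def split_uri(uri: str) -> dict:
    """Hypothetical helper: split a URI around its version segment."""
    parts = uri.split("/")
    # The version must be followed by the datamodel name (one trailing
    # segment) or by the name plus an instance ID (two trailing segments).
    for tail_len in (1, 2):
        i = len(parts) - 1 - tail_len
        if i > 0 and VERSION_RE.fullmatch(parts[i]):
            tail = parts[i + 1:]
            return {
                "namespace": "/".join(parts[:i]),
                "version": parts[i],
                "datamodel": tail[0],
                "instance": tail[1] if len(tail) == 2 else None,
            }
    raise ValueError(f"no version segment found in {uri!r}")


info = split_uri("http://onto-ns.com/meta/calm/0.1/Chemistry/aa6060")
print(info["datamodel"], info["instance"])  # Chemistry aa6060
```

A datamodel whose name itself looks like a version would still be ambiguous here, which is why the naming rules matter.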

@jesper-friis
Collaborator Author

jesper-friis commented Mar 13, 2024

Good comments. I think we only need to consider the following cases:

NAMESPACE/VERSION/NAME       # Data model URI. UUID is a hash of the URI
NAMESPACE/VERSION/NAME/UUID  # Instance with given UUID
NAMESPACE/VERSION/NAME/ID    # Instance. Its UUID is a hash of the `ID` (no slash in ID)
UUID                         # Instance with given UUID
ID                           # Instance. Its UUID is a hash of the `ID`

The first three forms are valid IRIs and can be used in a knowledge base.

Ideally, the ID in the last case should be anything not matching any of the first four cases. To safely distinguish between the five cases we have to impose some syntactic requirements, like:

  • UUID must always match the pattern XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX, where each X is a hexadecimal digit.
  • NAME must not start with a digit
  • VERSION must start with a digit.
  • NAMESPACE must start with SCHEME:, where SCHEME is a combination of letters, digits, plus (+), period (.) or hyphen (-).

Only instances whose ID contains no slashes can use the third form. If the ID contains a slash, the second form should be used when a valid IRI is needed.

Apart from making it invalid to give an instance an ID that matches the second or third form, these requirements do not impose any additional restrictions on what IDs can be used.
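The five cases and the syntactic requirements above could be checked roughly as in the following Python sketch. The function `classify` and the regular expressions are illustrative assumptions, not part of DLite; in particular, the namespace is matched greedily, so the rightmost segment that can serve as a version is used.

```python
import re

HEX = "[0-9a-fA-F]"
UUID_RE = re.compile(f"{HEX}{{8}}-{HEX}{{4}}-{HEX}{{4}}-{HEX}{{4}}-{HEX}{{12}}")

# NAMESPACE/VERSION/NAME, optionally followed by one slash-free segment.
URI_RE = re.compile(
    r"(?P<namespace>[A-Za-z0-9+.-]+:.*)"  # starts with SCHEME:
    r"/(?P<version>\d[^/]*)"              # starts with a digit
    r"/(?P<name>[^\d/][^/]*)"             # does not start with a digit
    r"(?:/(?P<last>[^/]+))?"              # UUID or ID without slashes
)


def classify(ident: str) -> str:
    """Hypothetical helper: return which of the five forms `ident` has."""
    if UUID_RE.fullmatch(ident):
        return "UUID"
    match = URI_RE.fullmatch(ident)
    if match is None:
        return "ID"
    last = match.group("last")
    if last is None:
        return "NAMESPACE/VERSION/NAME"
    if UUID_RE.fullmatch(last):
        return "NAMESPACE/VERSION/NAME/UUID"
    return "NAMESPACE/VERSION/NAME/ID"


base = "http://onto-ns.com/meta/calm/0.1/Chemistry"
for ident in (base, f"{base}/aa6060", "aa6060"):
    print(classify(ident))
```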

@jesper-friis
Collaborator Author

To increase the freedom in selecting IDs, we could strengthen the restrictions on NAMESPACE, VERSION and NAME, for instance:

  • NAME must be a valid identifier (e.g. match regexp [_a-zA-Z][_a-zA-Z0-9]*)
  • VERSION must be of the form X[.Y[.Z]], where X, Y, Z are integers
  • NAMESPACE must start with SCHEME://

However, whether this is a good idea can be discussed.

The rules should be simple!
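For comparison, the stricter rules could be written as three short patterns. This is a sketch only: `is_datamodel_uri` is a hypothetical helper, and the SCHEME character set is carried over from the earlier comment.

```python
import re

NAME_RE = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")    # valid identifier
VERSION_RE = re.compile(r"\d+(\.\d+(\.\d+)?)?")    # X[.Y[.Z]]
NAMESPACE_RE = re.compile(r"[A-Za-z0-9+.-]+://.+")  # starts with SCHEME://


def is_datamodel_uri(uri: str) -> bool:
    """Hypothetical check: is `uri` of the form NAMESPACE/VERSION/NAME?"""
    try:
        # Split off the last two segments; the rest is the namespace.
        namespace, version, name = uri.rsplit("/", 2)
    except ValueError:
        return False  # fewer than two slashes
    return bool(
        NAMESPACE_RE.fullmatch(namespace)
        and VERSION_RE.fullmatch(version)
        and NAME_RE.fullmatch(name)
    )


print(is_datamodel_uri("http://onto-ns.com/meta/calm/0.1/Chemistry"))         # True
print(is_datamodel_uri("http://onto-ns.com/meta/calm/0.1/Chemistry/aa6060"))  # False
```

With these rules a plain ID such as aa6060 can never be mistaken for a data model URI, since it contains no slashes.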

@jesper-friis
Collaborator Author

Thinking more about this. The ID should be a global identifier. To make it easier to refer to instances in a specific case, DLite supports simple human-readable IDs of the form "aa6060". But in a general setting, the IDs should always have a namespace to ensure global uniqueness. Hence, you should really use the UUID or the full IRI (e.g. "http://onto-ns.com/meta/calm/0.1/Chemistry/aa6060") when referring to the instance. In the latter case, the UUID should be calculated from the full IRI.

To conclude, we should not add support for treating NAMESPACE/VERSION/NAME/ID as a synonym for ID. Hence, the current implementation is sufficient.

However, it may be possible to add support for namespace prefixes. For example, if you define "chem" as a namespace prefix for "http://onto-ns.com/meta/calm/0.1/Chemistry/", you should be able to refer to the instance with chem:aa6060.
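A namespace-prefix lookup of this kind might look like the Python sketch below. The `PREFIXES` table and the `expand` function are hypothetical names used for illustration; DLite does not necessarily expose anything like them. The last lines assume the UUID of the full IRI is a version 5 UUID in the DNS namespace, as for plain IDs.

```python
import uuid

# Hypothetical prefix table, using the example from this comment.
PREFIXES = {"chem": "http://onto-ns.com/meta/calm/0.1/Chemistry/"}


def expand(ident: str, prefixes: dict = PREFIXES) -> str:
    """Expand "prefix:local" to a full IRI if the prefix is known."""
    prefix, sep, local = ident.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    # Unknown prefix (e.g. "http") or no colon: leave the ID unchanged.
    return ident


iri = expand("chem:aa6060")
print(iri)  # http://onto-ns.com/meta/calm/0.1/Chemistry/aa6060

# Assumption: the UUID is then hashed from the full IRI, so it differs
# from the UUID hashed from the short ID "aa6060".
full_uuid = uuid.uuid5(uuid.NAMESPACE_DNS, iri)
short_uuid = uuid.uuid5(uuid.NAMESPACE_DNS, "aa6060")
print(full_uuid != short_uuid)  # True
```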
