Currently we have some very basic abilities to filter `get_links` calls off a base:

- by a single link type or a list of link types
- by the exact link tag, or by a prefix of the link tag in raw bytes
I would like richer capacity to filter and sort `get_links` calls -- making them more generally useful (so we don't need to rely on path-based indexes for this purpose, and we don't need to send a bunch of data unnecessarily over the wire from the authority to the requestor), and also more friendly and ergonomic for the app developer.
On the last Kasra call we talked about adding the ability to filter by author and timestamp range, which is a great start. I would like even richer filtering / sorting / de-duplicating capacity -- ideally as close as possible to the full range of flexibility we would have in SQL queries. I think there are a few ways we could approach this:
Approach 1
We could include a `filter_fn_name: Option<&str>` parameter in the `get_links` query, and create a filtering function in the coordinator zome. The authority then gets all links, runs them through the specified filtering function, and returns the results. The links returned to the client would then be only those actually needed, already filtered and sorted as required. This would also support running aggregations / calculations on the results richer than just counting (e.g. take the average of a bunch of scores and return just the average).
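A minimal sketch of how approach 1 could look on the authority side. Everything here is hypothetical -- `Link`, the filter registry, and `get_links_filtered` are illustrative stand-ins, not real HDK types; in practice the named function would live in the coordinator zome and be invoked by the authority:

```rust
use std::collections::HashMap;

// Stand-in for a link as stored by the authority (not the real HDK type).
#[derive(Debug, Clone, PartialEq)]
struct Link {
    tag: Vec<u8>,
    timestamp: u64, // e.g. microseconds since epoch
}

// The coordinator zome would register filter functions by name; the
// authority looks one up via the `filter_fn_name` passed to get_links.
type FilterFn = fn(Vec<Link>) -> Vec<Link>;

// Example filter: sort newest-first (could equally filter, dedupe, etc.).
fn recent_first(mut links: Vec<Link>) -> Vec<Link> {
    links.sort_by(|a, b| b.timestamp.cmp(&a.timestamp));
    links
}

fn get_links_filtered(
    all_links: Vec<Link>,
    filter_fn_name: Option<&str>,
    registry: &HashMap<&str, FilterFn>,
) -> Vec<Link> {
    match filter_fn_name.and_then(|name| registry.get(name)) {
        Some(f) => f(all_links),
        None => all_links, // no filter requested: current behaviour
    }
}

fn main() {
    let mut registry: HashMap<&str, FilterFn> = HashMap::new();
    registry.insert("recent_first", recent_first);

    let links = vec![
        Link { tag: b"a".to_vec(), timestamp: 10 },
        Link { tag: b"b".to_vec(), timestamp: 30 },
        Link { tag: b"c".to_vec(), timestamp: 20 },
    ];
    let sorted = get_links_filtered(links, Some("recent_first"), &registry);
    assert_eq!(sorted[0].timestamp, 30);
}
```

The same shape extends naturally to aggregation: a registered function could return a serialized summary value (an average, say) instead of a link list, so only the aggregate crosses the wire.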
Approach 2
We could have an opinionated use of the link tag as a serialized struct, and expose functionality to query the database:

The link tag data structure is pre-specified in the link tag definition in the integrity zome, e.g. `AgentToBooks(AgentToBooksLinkTag)`.
Before saving a link tag to the SQLite database, the tag is deserialized and either:

- inserted into a `link_tag_deserialized` JSON column using the SQLite JSON extension, or
- a new database table is created for each link tag data structure, and the link tag fields are inserted directly as SQL columns (this could not support nested data structures, but I think that is fine)
We then add a `filter_query` parameter to `get_links` which uses / pilfers an existing SQL ORM to give us access to the full flexibility of SQL for specifying complex filtering and sorting. This SQL could then be used directly by the authority in its database call to get the links. We would need to put some constraints on `filter_query` to avoid griefing attacks where a requestor runs very complex or slow SQL queries.
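One way to get those constraints, sketched below under assumed names: rather than accepting raw SQL from the requestor, `filter_query` could be a small structured value (field, operator) over a whitelist of deserialized-tag fields, which the authority itself renders to parameterised SQL against the `link_tag_deserialized` JSON column. The field names and column name are hypothetical:

```rust
#[derive(Debug)]
enum Op {
    Eq,
    Lt,
    Gt,
}

// One clause of a constrained filter_query; values would be bound as
// SQL parameters by the authority, never interpolated into the string.
#[derive(Debug)]
struct FilterClause {
    field: String,
    op: Op,
}

// Only fields the integrity zome declared in the tag struct are queryable,
// which bounds query complexity and rules out arbitrary SQL.
const ALLOWED_FIELDS: &[&str] = &["author", "timestamp", "genre"];

fn render_where(clauses: &[FilterClause]) -> Result<String, String> {
    let mut parts = Vec::new();
    for c in clauses {
        if !ALLOWED_FIELDS.contains(&c.field.as_str()) {
            return Err(format!("field not allowed: {}", c.field));
        }
        let op = match c.op {
            Op::Eq => "=",
            Op::Lt => "<",
            Op::Gt => ">",
        };
        // `?` placeholders: the value is supplied as a bound parameter.
        parts.push(format!(
            "json_extract(link_tag_deserialized, '$.{}') {} ?",
            c.field, op
        ));
    }
    Ok(parts.join(" AND "))
}

fn main() {
    let clauses = vec![FilterClause { field: "genre".into(), op: Op::Eq }];
    let sql = render_where(&clauses).unwrap();
    println!("WHERE {}", sql);
}
```

A clause structure like this also gives us a natural place to cap clause count or forbid expensive operators, addressing the griefing concern without inspecting arbitrary SQL.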
It seems to me that approach 1 is a simpler change to core, but would have a higher performance cost. I don't feel strongly about either approach, as both would provide all the desired functionality.
I do feel like whatever we decide, it might still make sense to have an opinionated way to specify link tag structures as serialized structs for a given link type, so you can always rely on the link tag having the expected structure and validate it as such.
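To make that concrete, here is a dependency-free sketch of a typed link tag. The struct name echoes the `AgentToBooksLinkTag` example above but its fields and the fixed byte layout are invented for illustration; a real zome would presumably use serde and the HDK's serialized-bytes machinery instead of hand-rolled encoding:

```rust
// An "opinionated" link tag: the integrity zome declares this struct, and
// all tag serialization / validation goes through it, so a tag either
// parses into the expected shape or the link fails validation.
#[derive(Debug, PartialEq)]
struct AgentToBooksLinkTag {
    year: u16,      // hypothetical field
    genre: String,  // hypothetical field
}

impl AgentToBooksLinkTag {
    // Fixed layout for the sketch: 2 big-endian year bytes, then the genre.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = self.year.to_be_bytes().to_vec();
        out.extend_from_slice(self.genre.as_bytes());
        out
    }

    // Validation: reject any tag bytes that don't match the declared
    // structure, so links can always rely on the tag's shape.
    fn from_bytes(bytes: &[u8]) -> Result<Self, String> {
        if bytes.len() < 2 {
            return Err("tag too short".into());
        }
        let year = u16::from_be_bytes([bytes[0], bytes[1]]);
        let genre = String::from_utf8(bytes[2..].to_vec())
            .map_err(|_| "genre is not valid UTF-8".to_string())?;
        Ok(Self { year, genre })
    }
}

fn main() {
    let tag = AgentToBooksLinkTag { year: 1969, genre: "scifi".into() };
    let roundtrip = AgentToBooksLinkTag::from_bytes(&tag.to_bytes()).unwrap();
    assert_eq!(roundtrip, tag);
}
```

Whichever approach wins, a declared tag struct like this is what would let approach 2 deserialize tags into queryable columns, and would let approach 1's filter functions work with typed data rather than raw bytes.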