Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use INTERSECT and UNION for multiple filters in generated query #2725

Open
LZRS opened this issue Nov 15, 2024 · 4 comments · May be fixed by #2742
Open

Use INTERSECT and UNION for multiple filters in generated query #2725

LZRS opened this issue Nov 15, 2024 · 4 comments · May be fixed by #2742

Comments

@LZRS
Copy link
Collaborator

LZRS commented Nov 15, 2024

A suggestion to use Intersect or Union of the resulting resourceUuid of the index table subqueries depending on the Search operation. This might help reduce redundancy if same resourceUuid appears across the tables or in the case of operation AND, resourceUuid doesn't appear.

Example
Query generated for multiple filters currently uses multiple ANDs across the different index tables

An example to currently generated query

SELECT a.resourceUuid, a.serializedResource FROM ResourceEntity a
WHERE a.resourceType = 'Location'
AND 
a.resourceUuid IN (
    SELECT resourceUuid FROM TokenIndexEntity WHERE resourceType = 'Location' AND index_name = 'status' AND index_value = 'active')
AND 
a.resourceUuid IN (
    SELECT resourceUuid FROM TokenIndexEntity WHERE resourceType = 'Location' AND index_name = 'type' AND (index_value = 'bu' AND IFNULL(index_system,'') = 'http://terminology.hl7.org/CodeSystem/location-physical-type'))

May be replaced with

SELECT a.resourceUuid, a.serializedResource
FROM ResourceEntity a
WHERE a.resourceType = 'Location'
  AND a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Location' AND index_name = 'status' AND index_value = 'active'

INTERSECT

SELECT resourceUuid
FROM TokenIndexEntity
WHERE resourceType = 'Location'
  AND index_name = 'type'
  AND (index_value = 'bu' AND IFNULL(index_system, '') = 'http://terminology.hl7.org/CodeSystem/location-physical-type'));
@LZRS LZRS changed the title Use INTERSECT instead of multiple ANDs for multiple filters in generated query Use INTERSECT and UNION for multiple filters in generated query Nov 18, 2024
@jingtang10
Copy link
Collaborator

do you have any performance benchmark numbers for this one or is this more speculative?

@LZRS
Copy link
Collaborator Author

LZRS commented Nov 18, 2024

Yeah, it's kinda of speculative but I can get some data

@LZRS
Copy link
Collaborator Author

LZRS commented Nov 18, 2024

For the above queries, with db

SELECT COUNT(*) FROM TokenIndexEntity;
74942 rows

SELECT COUNT(*) FROM TokenIndexEntity WHERE resourceType = 'Location';
11784 rows

SELECT COUNT(*) FROM ResourceEntity;
8881 rows

SELECT COUNT(*) FROM ResourceEntity WHERE resourceType = 'Location';
1966 rows

The first query took around 17ms and the second one 7ms

Additional context

SELECT COUNT(*)
FROM (SELECT resourceUuid
      FROM TokenIndexEntity
      WHERE resourceType = 'Location'
        AND index_name = 'status'
        AND index_value = 'active'

      INTERSECT

      SELECT resourceUuid
      FROM TokenIndexEntity
      WHERE resourceType = 'Location'
        AND index_name = 'type'
        AND (index_value = 'bu' AND
             IFNULL(index_system, '') = 'http://terminology.hl7.org/CodeSystem/location-physical-type'));

57 rows

SELECT COUNT(*)
FROM (SELECT resourceUuid
      FROM TokenIndexEntity
      WHERE resourceType = 'Location'
        AND index_name = 'status'
        AND index_value = 'active'

      UNION ALL

      SELECT resourceUuid
      FROM TokenIndexEntity
      WHERE resourceType = 'Location'
        AND index_name = 'type'
        AND (index_value = 'bu' AND
             IFNULL(index_system, '') = 'http://terminology.hl7.org/CodeSystem/location-physical-type'));

2023 rows

@LZRS LZRS linked a pull request Nov 27, 2024 that will close this issue
7 tasks
@jingtang10
Copy link
Collaborator

I think one limitation that this exposes is that if we use union to join multiple filters, we are more likely to be hit by the limit of values in the IN statement.

But I think it's a problem with our existing queries anyway... not a problem for this PR, but we should perhaps add some more documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: New
Development

Successfully merging a pull request may close this issue.

2 participants