We have encountered problems on large amounts of data. We tried to analyze the query that Qryn generates, and we have some questions.
SQL query:
```sql
EXPLAIN indexes = 1
WITH sel_a AS
(
    SELECT samples.string AS string,
           samples.fingerprint AS fingerprint,
           samples.timestamp_ns AS timestamp_ns
    FROM qryn.samples_v3_dist AS samples
    WHERE ((samples.timestamp_ns BETWEEN 1722865286192000000 AND 1722865586192000000)
       AND (samples.type IN (0, 0)))
      AND (samples.fingerprint IN
      (
          SELECT sel_1.fingerprint
          FROM
          (
              SELECT fingerprint
              FROM qryn.time_series_gin
              WHERE ((key = 'k8s_clusters') AND (val = 'k8s-omp-test'))
          ) AS sel_1
          INNER ANY JOIN
          (
              SELECT fingerprint
              FROM qryn.time_series_gin
              WHERE ((key = 'ClientId') AND (val = '123'))
          ) AS sel_2 ON sel_1.fingerprint = sel_2.fingerprint
      ))
    ORDER BY timestamp_ns DESC
    LIMIT 100
)
SELECT JSONExtractKeysAndValues(time_series.labels, 'String') AS labels,
       sel_a.*
FROM sel_a
LEFT ANY JOIN qryn.time_series_dist AS time_series
    ON sel_a.fingerprint = time_series.fingerprint
ORDER BY labels DESC, timestamp_ns DESC
```
The first part is fine; it is filtered by time:
```
Expression (Projection)
  Sorting (Sorting for ORDER BY)
    Expression (Before ORDER BY)
      Join (JOIN FillRightFirst)
        Expression ((Before JOIN + Projection))
          Limit (preliminary LIMIT (without OFFSET))
            Sorting (Merge sorted streams after aggregation stage for ORDER BY)
              Union
                Expression
                  CreatingSets (Create sets before main query execution)
                    Sorting (Sorting for ORDER BY)
                      Expression (Before ORDER BY)
                        ReadFromMergeTree (qryn.samples_v3)
                        Indexes:
                          MinMax
                            Keys:
                              timestamp_ns
                            Condition: and((timestamp_ns in (-Inf, 1722865586192000000]), (timestamp_ns in [1722865286192000000, +Inf)))
                            Parts: 1/53
                            Granules: 3/126065
                          Partition
                            Keys:
                              toStartOfDay(toDateTime(divide(timestamp_ns, 1000000000)))
                            Condition: and((toStartOfDay(toDateTime(divide(timestamp_ns, 1000000000))) in (-Inf, 1722805200]), (toStartOfDay(toDateTime(divide(timestamp_ns, 1000000000))) in [1722805200, +Inf)))
                            Parts: 1/1
                            Granules: 3/3
                          PrimaryKey
                            Keys:
                              timestamp_ns
                            Condition: and((timestamp_ns in (-Inf, 1722865586192000000]), (timestamp_ns in [1722865286192000000, +Inf)))
                            Parts: 1/1
                            Granules: 1/3
```
In the second part, however, we found that it performs a full table scan. The third part is similar, reading a large amount of data.
Is this scan unavoidable, or is there a way to also bound it by time, or to filter it in some other way?
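One possible direction, as a sketch only (not tested against Qryn's actual code): if the label-index table carries a `date` column, the fingerprint subqueries could be bounded by the same time window as the samples scan, so the index lookup no longer touches all historical series. The `date` column and the exact window arithmetic here are assumptions:

```sql
-- Sketch: bound the label-index lookup by date as well.
-- Assumes qryn.time_series_gin has a `date` column; the BETWEEN bounds
-- are the query's time range converted from nanoseconds to days.
SELECT sel_1.fingerprint
FROM
(
    SELECT fingerprint
    FROM qryn.time_series_gin
    WHERE (key = 'k8s_clusters') AND (val = 'k8s-omp-test')
      AND date BETWEEN toDate(1722865286) AND toDate(1722865586)
) AS sel_1
INNER ANY JOIN
(
    SELECT fingerprint
    FROM qryn.time_series_gin
    WHERE (key = 'ClientId') AND (val = '123')
      AND date BETWEEN toDate(1722865286) AND toDate(1722865586)
) AS sel_2 ON sel_1.fingerprint = sel_2.fingerprint
```

If the GIN table is partitioned or ordered by `date`, such a predicate would let ClickHouse prune parts instead of scanning the whole index.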
It looks like the approach with multiple joins does not work well on large amounts of data.
Denormalization, i.e. storing the label data in another format, may help. There are some options:
- Store labels as `Map(LowCardinality(String), String)`, as in the `otel.otel_logs` schema from this article: https://clickhouse.com/blog/storing-log-data-in-clickhouse-fluent-bit-vector-open-telemetry#querying-the-map-type
- See the section "Approach 3: JSON as pairwise arrays" here: https://www.propeldata.com/blog/how-to-store-json-in-clickhouse-the-right-way
- SigNoz also uses this approach: https://signoz.io/docs/userguide/logs_clickhouse_queries/
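To illustrate the first option, here is a minimal sketch of the denormalized layout. The table and column names are hypothetical, not Qryn's schema; the point is that label filtering becomes a single scan bounded by the same time predicates as the data, with no join:

```sql
-- Hypothetical table: labels stored inline with each log row as a Map.
CREATE TABLE logs_map
(
    timestamp_ns UInt64,
    string       String,
    labels       Map(LowCardinality(String), String)
)
ENGINE = MergeTree
PARTITION BY toStartOfDay(toDateTime(intDiv(timestamp_ns, 1000000000)))
ORDER BY timestamp_ns;

-- The same query as above, now a single time-bounded scan:
SELECT timestamp_ns, string, labels
FROM logs_map
WHERE timestamp_ns BETWEEN 1722865286192000000 AND 1722865586192000000
  AND labels['k8s_clusters'] = 'k8s-omp-test'
  AND labels['ClientId'] = '123'
ORDER BY timestamp_ns DESC
LIMIT 100;
```

The trade-off is larger rows (labels repeated per entry) in exchange for avoiding the fingerprint join entirely; a `LowCardinality` key type keeps the repetition cheap.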