Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable :nested-field-columns feature in the driver #252

Open
frankyso opened this issue Jun 25, 2024 · 6 comments
Open

Enable :nested-field-columns feature in the driver #252

frankyso opened this issue Jun 25, 2024 · 6 comments
Labels

Comments

@frankyso
Copy link

Enhance ClickHouse Driver to Support JSON Parsing

Introduction

  • Brief Description: Proposal to add JSON parsing support to the ClickHouse driver in Metabase.
  • Motivation: ClickHouse has robust JSON functions that are currently underutilized in Metabase, limiting the usability for datasets that contain JSON fields.

Feature Description

  • Detailed Explanation: Integrate JSON parsing functions within the ClickHouse SQL driver in Metabase to allow users to directly query JSON-type fields and extract data in a more flexible and powerful manner.
  • Examples:
    • SELECT JSONExtractRaw(data, '$.key') FROM table_name WHERE JSONHas(data, '$.key');
    • This feature would allow for extracting elements directly from JSON columns in queries.

Benefits

  • Use Cases: Users dealing with JSON data stored in ClickHouse will be able to perform more complex queries directly from Metabase, improving analytics capabilities.
  • Potential Impact: Enhances Metabase's utility for users who work with modern data types, such as JSON, especially in big data and NoSQL contexts.

Additional Context

  • Related Issues or Discussions: None at this time.
  • Implementation Suggestions: Consider leveraging existing ClickHouse functions like JSONExtractRaw, JSONHas, etc., within the Metabase ClickHouse driver.
@frankyso frankyso added the enhancement New feature or request label Jun 25, 2024
@slvrtrn
Copy link
Collaborator

slvrtrn commented Jun 27, 2024

From your description, it looks like we need to enable the :nested-field-columns feature in the driver and implement the required methods.

https://github.com/metabase/metabase/blob/v0.50.7/src/metabase/driver.clj#L482-L483

;; Does this database support nested fields but only for certain field types (e.g. Postgres and JSON / JSONB columns)?
:nested-field-columns

This is probably also related: https://www.metabase.com/docs/latest/data-modeling/json-unfolding

Is it correct?

@frankyso
Copy link
Author

Yes, that's correct. Enabling the :nested-field-columns feature for the ClickHouse driver seems to be the appropriate approach for supporting JSON parsing. This feature, as described in the Metabase documentation, matches our requirements for handling JSON fields effectively

@slvrtrn
Copy link
Collaborator

slvrtrn commented Jul 1, 2024

@frankyso,
I tried a few things with the driver code, and while ClickHouse works well with JSON stored in String columns, it's tricky to make the driver recognize this as a "nested" type (so the JSON features will be enabled for this particular column) because String is already matched to a "text" MB base type. Hacks like using a workaround based on a type (something like Variant(String)) don't look like a good option to me, and the old JSON ClickHouse type is now obsolete.

However, luckily, the new semi-structured data type (to be used in exactly these scenarios) is planned to be available in 24.7 (see ClickHouse/ClickHouse#54864 (comment)). Using this new type will allow the implementation of the nested-field-columns feature to be much cleaner.

So, when 24.7 (or maybe even the head version containing the required changeset) is out, I will start working on this in the driver code.

@slvrtrn slvrtrn changed the title Enhance ClickHouse Driver to Support JSON Parsing Enable :nested-field-columns feature in the driver Jul 3, 2024
@cptjacky
Copy link

cptjacky commented Oct 1, 2024

Hello @slvrtrn !

The new JSON field was released a couple of versions ago (behind a feature flag) but its usage with Metabase is not very convenient :/

Selecting any property of the JSON field without casting it to a proper type fails with:

java.lang.IllegalArgumentException: Unknown data type: Dynamic

Would the implementation of nested-field-columns improve this or are we forced to explicitly type all of the JSON columns we use?

Thanks!

@slvrtrn
Copy link
Collaborator

slvrtrn commented Oct 1, 2024

@cptjacky, I think we will need to wait until ClickHouse/clickhouse-java#1833 is resolved.

@cptjacky
Copy link

cptjacky commented Oct 2, 2024

You're right! I'll subscribe to that issue too, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants