Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: assertion query optimization #1736

Open
federicojasson opened this issue May 15, 2024 · 3 comments
Open

Question: assertion query optimization #1736

federicojasson opened this issue May 15, 2024 · 3 comments

Comments

@federicojasson
Copy link

Hello 👋

When one defines rowConditions, a view is created with the following query:

SELECT
  '<condition>' AS failing_row_condition,
  *
FROM dataset
WHERE NOT (<condition>)
UNION ALL
...

Reference:

SELECT
${this.sqlString(rowCondition)} AS failing_row_condition,
*

I was wondering if the * is really necessary here.

We're trying to follow BigQuery best practices regarding queries, and this forces us to define manual assertions.

I apologize if this has already been answered before, I couldn't find a related issue.
Thank you!

@BenBirt
Copy link
Collaborator

BenBirt commented May 20, 2024

It's definitely not necessary!

I think the reason it is there is to make debugging the assertion - when it fails - easier.

With a "normal" assertion (i.e. manually-defined), the user gets to choose what column(s) to select into the assertion's view. (Presumably those that are likely to help debug the issue.) Config-block assertions obviously don't have any way for the user to express that, so (I think) we chose to select all columns to aid debuggability.

I'm not against removing it, but do note that SELECT * in a view should not cost anything (because creating a view doesn't actually query any data).

@federicojasson
Copy link
Author

Ohh I see, it's for debugging purposes.

I'm not against removing it, but do note that SELECT * in a view should not cost anything (because creating a view doesn't actually query any data).

It does query the data when the assertions are run, which I think is always the case for built-in assertions, correct?

But the only solution I can think of is to have a different query for running the assertions than for the view (used for debugging). I'm not sure how would that work or how hard is to implement that though.

@BenBirt
Copy link
Collaborator

BenBirt commented May 22, 2024

It does query the data when the assertions are run, which I think is always the case for built-in assertions, correct?

D'oh, of course, you're absolutely right!

Thanks for the discussion - we will keep this in mind and see if we can figure out options here.

For now though, if you want to avoid SELECT *, I think your best bet is to use "standard" assertions. (And note that if you have common pattern(s), you can always extract those out using JavaScript.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants