-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate views on all
dataset
#142
Comments
You could do this: CREATE OR REPLACE VIEW `httparchive.scratchspace.test_view_2022_08_01_star` AS (
SELECT
client AS TABLE_SUFFIX,
page,
rank,
payload
FROM
`httparchive.all.pages`
WHERE
date = '2022-08-01'
); But three downsides to this hack:
So not a drop in replacement. Plus it's just an awful, awful, hack. |
FYI, bugs for wildcard support in views: |
I'm gonna close this. The Views look like a good option for Given the wildcard issue, they don't look like a good option to fully replicate our current schema, from the new schema. There's also a bigger question as to whether we want to maintain the old schema indefinitely even if we could do that with views. I do have one further discussion point for potentially using views to restrict access to some of the more expensive queries which I've raised #149 for, but think that's a separate issue to this one, which was to investigate what views meant: did they indeed lead to cheaper queries (yes!) and are there any limits on them (yes - wildcards), so let's close this out. |
Reopening. FYI @rviscomi |
I'd be interested to use views for the July dataset and seeing what breaks |
Quite a few queries in our BigQuery repo that fed our site for a start. |
@tunetheweb you reopened it a few months ago, is there still some research to be done? |
@rviscomi , @paulcalvano , @pmeenan as discussed on last call.
I ran the following:
Then when I run this:
It uses up 1.55 GB - so clearly isn't including the payload (in this view), and definitely not the Lighthouse results (not in this view, but in underlying table).
Changing it to this uses up 1.76GB (so slightly more as two columns):
Changing it to either of these uses up 5.38TB:
This SQL is only 409.83 MB:
So it looks to me like it IS using the clustering on the underlying table and not charging you the full amount each time, as you thought it would @rviscomi ? Unless I'm missing something?
This means we could in theory move everything to the new
all
schema (including backfilling) and replace all the existing old tables with views to maintain backwards compatibility for old data, but also have it in the new schema format, without duplicating data.We could in theory also keep creating these old-style views every month so no one has to learn the new schema - they just get cheaper queries. That would also potentially reduce risk of someone accidentally querying all columns, or large date ranges, and running up huge bills too easily.
However I did discover one flaw in this plan in that you can't do this:
As you get this error 😔:
Turns out this limitation is documented:
Still, non-wildcard queries would work on the views.
The text was updated successfully, but these errors were encountered: