After the data is piped to BigQuery, we currently run a bunch of pre-written queries against it and save the results to GCS to be viewed on the httparchive.org website.
Instead, we want to seed the monthly queries with those used by the 2021 Web Almanac (or more as needed) and run them as an aggregation step after the results have all been written to BigQuery.
This effectively deprecates the generate_reports.sh script and reimplements it using GCP primitives.
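As a rough illustration of the aggregation step described above, here is a minimal sketch of how a set of monthly queries could be planned and run once a crawl has landed in BigQuery. Everything named here is an assumption made for illustration, not something from the existing scripts: the `{{crawl_date}}` placeholder, the `reports/YYYY_MM_DD/<metric>.json` layout, the `ReportJob` type, and the bucket name. The BigQuery and GCS calls are injected as plain callables so the sketch runs without any GCP credentials.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Dict, List

# Hypothetical template marker; the real queries may parameterize dates differently.
PLACEHOLDER = "{{crawl_date}}"


@dataclass(frozen=True)
class ReportJob:
    metric: str       # name of the stat or trend, e.g. "bytesTotal"
    sql: str          # query text with the crawl date filled in
    destination: str  # where the JSON result should be written


def plan_monthly_jobs(
    queries: Dict[str, str], crawl_date: date, bucket: str
) -> List[ReportJob]:
    """Turn query templates into one job per metric for a given crawl."""
    jobs = []
    for metric, template in sorted(queries.items()):
        sql = template.replace(PLACEHOLDER, crawl_date.isoformat())
        # Assumed output layout, for illustration only.
        destination = f"gs://{bucket}/reports/{crawl_date:%Y_%m_%d}/{metric}.json"
        jobs.append(ReportJob(metric, sql, destination))
    return jobs


def run_jobs(
    jobs: List[ReportJob],
    run_query: Callable[[str], List[Dict[str, Any]]],
    write_result: Callable[[str, str], None],
) -> List[str]:
    """Run each job and write its rows as JSON; returns the paths written."""
    written = []
    for job in jobs:
        rows = run_query(job.sql)
        write_result(job.destination, json.dumps(rows))
        written.append(job.destination)
    return written
```

In a real pipeline, `run_query` would wrap a BigQuery client and `write_result` a Cloud Storage upload. Keeping them as parameters also leaves room to run the same jobs once per lens, or only for selected queries.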
"Instead"? So we're retiring the existing queries?
I presumed we were going to keep both? Especially since the existing ones in the BigQuery repo are written so that they can also be run by lens. There are also a lot of time series in there that aren't run for the Almanac.
That's what I meant by "or more as needed". We should start over with the Web Almanac queries and add back in anything that's still of use.
Not sure what will become of the lenses. They might become less costly in the new pipeline so we could continue to run every query through them. Or we might want to only use the lenses for specific queries that we know would be interesting.
rviscomi changed the title from "Monthly stats and trends should be an evergreen version of Web Almanac queries" to "Build a new monthly analysis pipeline based on an evergreen version of Web Almanac queries" on Feb 1, 2022
Owner: @giancarloaf
Supporters: @rviscomi @tunetheweb