Build a new monthly analysis pipeline based on an evergreen version of Web Almanac queries #19

Open
rviscomi opened this issue Feb 1, 2022 · 2 comments

@rviscomi
Member

rviscomi commented Feb 1, 2022

Owner: @giancarloaf
Supporters: @rviscomi @tunetheweb

After the data is piped to BigQuery, we currently run a bunch of pre-written queries against it and save the results to GCS to be viewed on the httparchive.org website.

Instead, we want to seed the monthly queries with those used by the 2021 Web Almanac (or more as needed) and run them as an aggregation step after the results have all been written to BigQuery.

This effectively deprecates the generate_reports.sh script and reimplements it using GCP primitives.
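
To make the proposed aggregation step concrete, here is a minimal sketch, not the actual pipeline. It assumes each monthly query is a SQL template with a `{date}` placeholder and that results are saved once per query and crawl date. The names `MonthlyQuery` and `run_monthly_aggregation`, the `reports/<date>/<name>.json` path layout, and the injected `run_query` / `save_result` callables are all hypothetical; in a real setup the callables would presumably wrap the BigQuery and GCS clients.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class MonthlyQuery:
    name: str          # hypothetical metric name, e.g. "bytes_total"
    sql_template: str  # SQL text containing a {date} placeholder


def run_monthly_aggregation(
    queries: List[MonthlyQuery],
    crawl_date: str,
    run_query: Callable[[str], List[dict]],
    save_result: Callable[[str, List[dict]], None],
) -> Dict[str, str]:
    """Run every query for one crawl date and save each result.

    Returns a mapping of query name to the path its rows were saved under.
    The path layout is an assumption made for this sketch.
    """
    outputs: Dict[str, str] = {}
    for query in queries:
        # Fill in the crawl date; templates must not contain other braces.
        sql = query.sql_template.format(date=crawl_date)
        rows = run_query(sql)
        path = f"reports/{crawl_date}/{query.name}.json"
        save_result(path, rows)
        outputs[query.name] = path
    return outputs
```

Passing the query runner and the writer in as arguments keeps the step testable without cloud credentials, and it means the list of queries can start from the Almanac set and grow as older reports are added back.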

@tunetheweb
Member

tunetheweb commented Feb 1, 2022

"Instead"? So we're retiring the existing queries?

I presumed we were going to keep both? Especially since the existing ones in the BigQuery repo are written so that they can also be run by lens. And there are also a lot of time series in there that aren't run for the Almanac.

@rviscomi
Member Author

rviscomi commented Feb 1, 2022

That's what I meant by "or more as needed". We should start over with the Web Almanac queries and add back in anything that's still of use.

Not sure what will become of the lenses. They might become less costly in the new pipeline so we could continue to run every query through them. Or we might want to only use the lenses for specific queries that we know would be interesting.
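
The two lens options could be sketched roughly like this. It is only an illustration under assumptions: `plan_runs` and its parameters are hypothetical names, and a lens is treated as an opaque label applied to a query. Passing `lensed_queries=None` means every query runs through every lens; passing a set restricts lenses to the queries we know are interesting.

```python
from typing import Iterable, List, Optional, Set, Tuple


def plan_runs(
    query_names: Iterable[str],
    lenses: Iterable[str],
    lensed_queries: Optional[Set[str]] = None,
) -> List[Tuple[str, Optional[str]]]:
    """Build the list of (query, lens) runs for one month.

    A lens of None is the plain, unlensed run, which every query gets.
    If lensed_queries is None, all queries are also run through all lenses;
    otherwise only the queries named in lensed_queries are.
    """
    lens_list = list(lenses)
    plan: List[Tuple[str, Optional[str]]] = []
    for name in query_names:
        plan.append((name, None))
        if lensed_queries is None or name in lensed_queries:
            for lens in lens_list:
                plan.append((name, lens))
    return plan
```

The size of the returned plan gives a quick way to compare the cost of the two approaches before committing to one.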

@rviscomi rviscomi changed the title Monthly stats and trends should be an evergreen version of Web Almanac queries Build a new monthly analysis pipeline based on an evergreen version of Web Almanac queries Feb 1, 2022