The latest lighthouse.2018_10_15 table is 237 GB. A query across all lighthouse tables currently scans 4.15 TB of data and takes several minutes to run.
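For context, BigQuery bills by bytes scanned, which is where the 4.15 TB figure comes from. A minimal sketch of reproducing that number with a dry-run query (the wildcard table pattern and the report column are assumptions about the dataset layout):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Dry run: BigQuery reports how many bytes the query would scan
# without actually executing it or incurring any cost.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(
    "SELECT report FROM `httparchive.lighthouse.*`",  # wildcard is an assumption
    job_config=job_config,
)
print("Would scan %.2f TB" % (job.total_bytes_processed / 1e12))
```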
Can you point to where this trimming could be done?
Hey @connorjclark, the get_lighthouse_reports function in the Dataflow pipeline would be the place where we can trim off excess response data:
https://github.com/HTTPArchive/bigquery/blob/acef15add27f0ba360fba44e2b74ab2575baed46/dataflow/python/bigquery_import.py#L188-L222
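As a rough illustration of what that trim could look like, here is a minimal sketch. The audit names listed are assumptions about where the bulk of the payload lives (base64 screenshots and similar), not a measured inventory; the real function would need to be profiled against actual reports:

```python
import json

# Audits whose details tend to carry large payloads, e.g. base64-encoded
# screenshots. This list is a guess, not a confirmed inventory.
BULKY_AUDITS = ('screenshot-thumbnails', 'final-screenshot')

def trim_lighthouse_report(report_json):
    """Strip bulky audit details from a Lighthouse report, keeping scores."""
    report = json.loads(report_json)
    audits = report.get('audits', {})
    for name in BULKY_AUDITS:
        if name in audits:
            # Keep the audit's score and metadata; drop its heavy details.
            audits[name].pop('details', None)
    # Compact separators also shave whitespace from the stored JSON.
    return json.dumps(report, separators=(',', ':'))
```

A helper along these lines could be called on each report inside get_lighthouse_reports before the rows are written to BigQuery.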
The lighthouse payload function has since moved.
FWIW, we can also trim the payload directly on the agents, which might be cleaner.