Data for meta reports (plus new tech dashboard) #12

Open
wants to merge 20 commits into
base: main

@max-ostapenko max-ostapenko commented Sep 24, 2024

Added metadata tables to be used in the Looker dashboards.

scratchspace.meta_crawl for HTTP Archive BigQuery Meta Dashboard

The reasons for using a dedicated table are speed and cost:

  • materialised queries have limited SQL support,
  • the cache expires much more often than the tables are updated,
  • using INFORMATION_SCHEMA is incompatible with caching and materialisation,
  • after deprecating the legacy tables I'd like to add a few more metrics from the quality checks (which will make the query heavier).

scratchspace.meta_technologies for HTTP Archive Technologies Detection Dashboard

The technologies dashboard can be used to maintain the technology detection rules (as I wrote previously in HTTPArchive/wappalyzer#33).

Closes: HTTPArchive/wappalyzer#70

@max-ostapenko max-ostapenko marked this pull request as ready for review October 17, 2024 09:01
@max-ostapenko
Contributor Author

Once the list of detection rules is available in BQ (HTTPArchive/wappalyzer#34), I would join it with this table to find the rules that never produce a detection. Such rules waste crawl resources and need to be fixed or removed.
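
The check described above is an anti-join: keep every rule that has no matching detection. Below is a minimal sketch of that logic in Python, not the actual dashboard query. The function name `unused_rules` and the sample technology names are hypothetical, and the real version would presumably be a SQL join in BigQuery against `scratchspace.meta_technologies`, whose schema is not shown in this PR.

```python
from collections import Counter


def unused_rules(rule_names, detections):
    """Return the rules that never appear in the detections, sorted by name.

    rule_names: iterable of technology names that have a detection rule
                (hypothetical input; stands in for the rules list in BQ).
    detections: iterable of technology names, one entry per detection
                (hypothetical input; stands in for the crawl results).
    """
    counts = Counter(detections)
    # Anti-join: keep only rules with zero matching detections.
    return sorted(name for name in set(rule_names) if counts[name] == 0)


if __name__ == "__main__":
    # Illustrative data only, not taken from the crawl.
    rules = ["WordPress", "jQuery", "ExampleCMS"]
    detected = ["WordPress", "jQuery", "jQuery"]
    print(unused_rules(rules, detected))  # ['ExampleCMS']
```

Rules returned by such a check would be the candidates to fix or remove.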

Successfully merging this pull request may close these issues.

Keep technology detections up to date