Technologies report #2

Open
maceto opened this issue Aug 5, 2023 · 11 comments
@maceto
Collaborator

maceto commented Aug 5, 2023

Could you describe the origin/source of this data?

  • Technology name
  • Technology description
  • List of category names
  • List of similar technologies
    • Technology name
    • used by # of origins
    • used by % of origins

Create a script to query this data from BigQuery, transform it, and save it in Firestore.

@sarahfossheim

I'm not sure whether all of that information is available already, or where it lives if so. But the technology name and category names are already available and in use, e.g.:
https://cdn.httparchive.org/reports/cwvtech/ALL/ALL/jQuery.json

{
    "app": "jQuery",
    "category": "JavaScript libraries, Miscellaneous, Static site generator"
}

I don't think the description exists anywhere yet, but if the first aim is feature parity, then the main thing we need right now is name + categories.

The similar technologies can probably be based on the category names, if there's no data on it yet?
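Since the categories arrive as a single comma-separated string, anything that compares technologies by category first has to split that string. A minimal Python sketch, assuming a record shaped like the excerpt above (the function name `parse_technology` and the output keys are hypothetical, not part of any existing API):

```python
import json


def parse_technology(raw: str) -> dict:
    """Turn one JSON record into a technology name plus a list of categories.

    Assumes the record has an "app" field and a comma-separated "category"
    field, as in the excerpt above.
    """
    record = json.loads(raw)
    # Split the CSV string and drop surrounding whitespace and empty entries.
    categories = [c.strip() for c in record["category"].split(",") if c.strip()]
    return {"technology": record["app"], "categories": categories}


sample = (
    '{"app": "jQuery", '
    '"category": "JavaScript libraries, Miscellaneous, Static site generator"}'
)
parsed = parse_technology(sample)
```

Keeping the categories as a list (or a set) makes the "categories in common" comparison a simple set intersection later on.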

@rviscomi
Member

rviscomi commented Sep 1, 2023

SELECT
  client,
  app AS technology,
  # TODO
  NULL AS description,
  # CSV format
  category,
  # TODO: other technologies within category?
  NULL AS similar_technologies,
  origins
FROM
  `httparchive.core_web_vitals.technologies`
WHERE
  date = '2023-07-01' AND
  geo = 'ALL' AND
  rank = 'ALL'
ORDER BY
  origins DESC

@rviscomi
Member

rviscomi commented Sep 3, 2023

@sarahfossheim how should we source the similar_technologies field, something like "top 3 technologies within same category"?

Also note that the description field isn't set in BigQuery so we'll leave it null for now.

@maceto
Collaborator Author

maceto commented Sep 9, 2023

@rviscomi, should we have any mandatory param for this endpoint?

@rviscomi
Member

I think just technology

cc @sarahfossheim

@sarahfossheim

I think for the first version something like you said can make sense: technologies with at least one category in common, sorted by number of origins, and then pick the top 3 (or maybe top 5?).

Or maybe an alternative could be:

  • technologies with the most categories in common,
  • and use number of origins as a secondary sorting key.

Then technologies that have many categories in common will come up, even if they're new or niche technologies with few origins. I think that makes more sense when it comes to pinning down similar technologies.

If any data gets returned along with the technology names (e.g. number of origins), then we also need to pass in the rank and geo, so that the data for the similar technologies is filtered by the same criteria as the data for the current technology.
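The alternative ranking described above could be sketched roughly as follows. This is only an illustration in Python: the function name `similar_technologies`, the input shape, and the placeholder technology names and origin counts are all made up for the example, and the input is assumed to be already filtered to a single rank and geo.

```python
def similar_technologies(target: str, technologies: dict, limit: int = 3) -> list:
    """Rank other technologies by shared categories, then by origins.

    `technologies` maps a name to {"categories": set, "origins": int}.
    All rows are assumed to share the same rank and geo filter.
    """
    target_categories = technologies[target]["categories"]
    candidates = []
    for name, info in technologies.items():
        if name == target:
            continue
        shared = len(target_categories & info["categories"])
        if shared == 0:
            continue  # require at least one category in common
        candidates.append((shared, info["origins"], name))
    # Primary key: most shared categories. Secondary key: most origins.
    # The name is a final tie-breaker so the result is deterministic.
    candidates.sort(key=lambda c: (-c[0], -c[1], c[2]))
    return [name for _, _, name in candidates[:limit]]


# Placeholder data, not real measurements.
example = {
    "TechA": {"categories": {"CMS", "Blogs"}, "origins": 1000},
    "TechB": {"categories": {"CMS", "Blogs"}, "origins": 10},
    "TechC": {"categories": {"CMS"}, "origins": 5000},
    "TechD": {"categories": {"Analytics"}, "origins": 9000},
}
ranked = similar_technologies("TechA", example)
```

With this placeholder data, `TechB` ranks ahead of `TechC` despite having far fewer origins, because it shares two categories with `TechA` instead of one; `TechD` is excluded because it shares none.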

@maceto
Collaborator Author

maceto commented Sep 15, 2023

Example of how to consume this endpoint

  curl --request GET \
  --url 'https://dev-gw-2vzgiib6.ue.gateway.dev/v1/technologies?category=["Blogs", "CMS", "Ecommerce"]&technology=["WordPress", "Chameleon system"]'
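The query string in the example contains spaces, quotes, and brackets, which some HTTP clients will not encode for you. A small Python sketch of building the same request URL with percent-encoding, assuming the endpoint accepts JSON-array values for `category` and `technology` exactly as in the curl example (the helper name `build_technologies_url` is hypothetical, and the base URL is copied from the example above):

```python
import json
from urllib.parse import quote, urlencode

# Base URL copied from the curl example above.
BASE_URL = "https://dev-gw-2vzgiib6.ue.gateway.dev/v1/technologies"


def build_technologies_url(categories: list, technologies: list) -> str:
    """Build a request URL with JSON-array parameters, percent-encoded."""
    params = {
        "category": json.dumps(categories),
        "technology": json.dumps(technologies),
    }
    # quote_via=quote encodes spaces as %20 rather than "+".
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


url = build_technologies_url(
    ["Blogs", "CMS", "Ecommerce"],
    ["WordPress", "Chameleon system"],
)
```

Decoding the resulting query string gives back the same JSON arrays as in the curl example, so the request should be equivalent if the gateway decodes parameters in the usual way.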

@maceto
Collaborator Author

maceto commented Sep 16, 2023

@rviscomi @sarahfossheim, all the changes discussed are already deployed.

New URL https://dev-gw-2vzgiib6.uk.gateway.dev/v1/technologies

Documentation: https://github.com/HTTPArchive/tech-report-apis#get-technologies

@rviscomi
Member

Updated query to pull in the descriptions:

SELECT
  client,
  app AS technology,
  description,
  # CSV format
  category,
  # TODO: other technologies within category?
  NULL AS similar_technologies,
  origins
FROM
  `httparchive.core_web_vitals.technologies`
JOIN
  `httparchive.core_web_vitals.technology_descriptions`
ON
  app = technology
WHERE
  date = '2023-07-01' AND
  geo = 'ALL' AND
  rank = 'ALL'
ORDER BY
  origins DESC

@maceto
Collaborator Author

maceto commented Dec 4, 2023

Hi @rviscomi,

Why is there a static date in the WHERE clause (2023-07-01 for technologies and 2023-08-01 for categories)? I think we said this should be the latest month instead.

@rviscomi
Member

rviscomi commented Dec 4, 2023

Yeah it should probably track the latest month.
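One way to track the latest month is to replace the hard-coded date with a subquery on the maximum `date` in the table. The Python sketch below only builds the query text; the helper name `build_latest_month_query` is hypothetical, the `MAX(date)` subquery is an assumption about how "latest month" could be expressed, and it has not been run against the dataset (a subquery like this may also scan more data than a literal date).

```python
# Table name taken from the query earlier in this thread.
TABLE = "httparchive.core_web_vitals.technologies"


def build_latest_month_query(table: str = TABLE) -> str:
    """Return the technologies query filtered to the most recent date."""
    return f"""
SELECT
  client,
  app AS technology,
  category,
  origins
FROM
  `{table}`
WHERE
  # Assumption: the latest month is the maximum date present in the table.
  date = (SELECT MAX(date) FROM `{table}`) AND
  geo = 'ALL' AND
  rank = 'ALL'
ORDER BY
  origins DESC
""".strip()


query = build_latest_month_query()
```

The same pattern would apply to the categories query, so both stay on the same month without manual edits.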

Is httparchive.core_web_vitals.technology_descriptions manually or automatically generated? If manual, we wouldn't pick up the descriptions for any new technologies, right?
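If the descriptions table is maintained by hand, a quick check for gaps might help. Note also that a plain JOIN, as in the updated query, is an inner join, so technologies without a matching description row would be dropped from the result rather than returned with a null description. A minimal Python sketch of the gap check, with a hypothetical helper name and placeholder inputs:

```python
def missing_descriptions(technologies: list, descriptions: dict) -> list:
    """Return technologies that have no description, or an empty one.

    `technologies` is a list of names from the technologies table;
    `descriptions` maps a technology name to its description text.
    """
    return sorted(t for t in set(technologies) if not descriptions.get(t))


# Placeholder inputs, not real data.
gaps = missing_descriptions(
    ["TechA", "TechB", "TechC"],
    {"TechA": "An example description.", "TechB": ""},
)
```

Here `gaps` lists `TechB` (empty description) and `TechC` (no row at all), which are the cases that would need attention after a new technology is added.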
