Use HN algolia endpoint to retrieve trees #3

Open
simonw opened this issue Jul 25, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@simonw
Collaborator

simonw commented Jul 25, 2021

The trees command currently has to make a request for every single comment. Algolia have an endpoint that bundles the entire thread together into a single request.

https://hn.algolia.com/api/v1/items/ID

Here's an example that loads quickly, with about 50 comments: https://hn.algolia.com/api/v1/items/27941108

It doesn't appear to use pagination at all - if a thread is big then the response is big.
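A minimal sketch of fetching a whole thread in one request, assuming only the URL pattern shown above. The helper names `item_url` and `fetch_item` are hypothetical, and the generous default timeout is an assumption chosen because there is no pagination and large threads can take a long time to return.

```python
import json
import urllib.request

# URL pattern taken from the issue text; the function names are illustrative only.
ITEMS_URL = "https://hn.algolia.com/api/v1/items/{item_id}"


def item_url(item_id):
    """Build the Algolia items URL for a story or comment ID."""
    return ITEMS_URL.format(item_id=int(item_id))


def fetch_item(item_id, timeout=120):
    """Fetch one item and its nested children as a single JSON document.

    The response is not paginated, so the timeout is deliberately long
    (an assumption, not a documented requirement of the API).
    """
    with urllib.request.urlopen(item_url(item_id), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_item(27941108)` would make the single request described above; nothing is fetched until the function is called.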

I ran this search to find some stories with more than 1000 comments: https://hn.algolia.com/api/v1/search?tags=story&numericFilters=num_comments%3E=1000

Here's one: https://news.ycombinator.com/item?id=25015967 with 4759 comments. Hitting the API takes 41s and returns 3.7 MB of JSON!

wget 'https://hn.algolia.com/api/v1/items/25015967'  0.03s user 0.04s system 0% cpu 41.368 total
/tmp % ls -lah 25015967 
-rw-r--r--  1 simon  wheel   3.7M Jul 24 20:31 25015967
@simonw simonw added the enhancement New feature or request label Jul 25, 2021
@simonw
Collaborator Author

simonw commented Jul 25, 2021

Prototype:

curl 'https://hn.algolia.com/api/v1/items/27941108' \
  | jq '[recurse(.children[]) | del(.children)]' \
  | sqlite-utils insert hn.db items - --pk id
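For comparison, here is a rough Python sketch of what the jq step does: walk the nested `children` arrays depth-first and emit each item without its `children` key. The function name `flatten_thread` is hypothetical, and the code assumes every item is a dict whose `children` value, if present, is a list of items.

```python
def flatten_thread(item):
    """Yield the item and all of its descendants in pre-order, minus 'children'.

    Mirrors jq's `[recurse(.children[]) | del(.children)]`. Uses an explicit
    stack rather than recursion so deeply nested threads cannot hit
    Python's recursion limit.
    """
    stack = [item]
    while stack:
        node = stack.pop()
        children = node.get("children") or []
        # Emit a copy of the node without the nested children.
        yield {key: value for key, value in node.items() if key != "children"}
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(children))


# Small hand-made tree (not real API data) to show the ordering.
example = {
    "id": 1,
    "children": [
        {"id": 2, "children": [{"id": 3, "children": []}]},
        {"id": 4, "children": []},
    ],
}
rows = list(flatten_thread(example))
```

The resulting rows are flat dicts, so they could be inserted into a table keyed on `id`, much as the `sqlite-utils insert ... --pk id` step does.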

@simonw
Collaborator Author

simonw commented Jul 25, 2021

If you hit the endpoint for a comment that's part of a thread you get that comment and its recursive children: https://hn.algolia.com/api/v1/items/27941552

You can tell that it's not the top-level item because its parent_id isn't null. You can use story_id to figure out what the top-level item is.

{
  "id": 27941552,
  "created_at": "2021-07-24T15:08:39.000Z",
  "created_at_i": 1627139319,
  "type": "comment",
  "author": "nine_k",
  "title": null,
  "url": null,
  "text": "<p>I wish ...",
  "points": null,
  "parent_id": 27941108,
  "story_id": 27941108
}
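The check described above can be sketched as follows. The helper names are hypothetical, and falling back to the item's own `id` for a top-level item is an assumption rather than something the API response confirms.

```python
def is_top_level(item):
    """An item is top-level when it has no parent (parent_id is null)."""
    return item.get("parent_id") is None


def top_level_id(item):
    """Return the ID of the story that the item belongs to.

    For a comment this is its story_id. For a top-level item we assume
    the item's own id is the right answer.
    """
    if is_top_level(item):
        return item["id"]
    return item["story_id"]


# Fields copied from the comment shown above.
comment = {"id": 27941552, "type": "comment", "parent_id": 27941108, "story_id": 27941108}
```

With this, a tool that is handed a comment ID could look up `top_level_id(comment)` and then fetch the full thread for that story instead.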

@simonw
Collaborator Author

simonw commented Jul 25, 2021
