github-to-sqlite should handle rate limits better #51

Open
simonw opened this issue Sep 17, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@simonw
Collaborator

simonw commented Sep 17, 2020

From #50 - right now it will crash with an error if it hits the rate limit. Since the rate limit information (including the reset time) is available in the response headers, it could automatically sleep and try again instead.
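A minimal sketch of the sleep-and-retry idea described above, not the project's actual implementation. It assumes the `x-ratelimit-reset` header holds a UTC epoch timestamp in seconds and that a rate-limited response has status 403 or 429 with `x-ratelimit-remaining` set to `0`. The names `seconds_until_reset`, `fetch_with_rate_limit_retry`, and the `send` callable are hypothetical and chosen only for illustration.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait until the time given in x-ratelimit-reset."""
    now = time.time() if now is None else now
    # Header values arrive as strings; the value is assumed to be epoch seconds.
    reset_at = int(headers["x-ratelimit-reset"])
    # Add a one-second margin and never return a negative duration.
    return max(0.0, reset_at + 1 - now)


def fetch_with_rate_limit_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() until the response is no longer rate limited.

    `send` is a hypothetical zero-argument callable that returns an object
    with `status_code` and `headers`, for example
    `lambda: requests.get(url, headers=headers)`.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        rate_limited = (
            response.status_code in (403, 429)
            and response.headers.get("x-ratelimit-remaining") == "0"
        )
        if not rate_limited:
            return response
        # Only sleep if another attempt will follow.
        if attempt < max_attempts - 1:
            sleep(seconds_until_reset(response.headers))
    # Still rate limited after all attempts: hand the response back so the
    # caller can raise its own error.
    return response
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in production the default `time.sleep` applies.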

@simonw added the enhancement label Sep 17, 2020
@simonw changed the title from "github-to-sqlite get should follow rate limits" to "github-to-sqlite should handle rate limits better" Nov 30, 2020
@simonw
Collaborator Author

simonw commented Nov 30, 2020

This just caused a failure in deploying the demo: https://github.com/dogsheep/github-to-sqlite/runs/1471304407?check_suite_focus=true

  File "/opt/hostedtoolcache/Python/3.8.6/x64/bin/github-to-sqlite", line 33, in <module>
    sys.exit(load_entry_point('github-to-sqlite', 'console_scripts', 'github-to-sqlite')())
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/runner/work/github-to-sqlite/github-to-sqlite/github_to_sqlite/cli.py", line 142, in issue_comments
    for comment in utils.fetch_issue_comments(repo, token, issue):
  File "/home/runner/work/github-to-sqlite/github-to-sqlite/github_to_sqlite/utils.py", line 380, in fetch_issue_comments
    for comments in paginate(url, headers):
  File "/home/runner/work/github-to-sqlite/github-to-sqlite/github_to_sqlite/utils.py", line 472, in paginate
    raise GitHubError.from_response(response)
github_to_sqlite.utils.GitHubError: ('API rate limit exceeded for user ID 9599.', 403)
Error: Process completed with exit code 1.

@daniel-butler
Contributor

daniel-butler commented Jan 30, 2021

I don't have much experience with GitHub's rate limiting. In my day job we use the tenacity library to handle the HTTP errors we get.

@hydrosquall

I've been looking into how to get this data out of GitHub, especially now that there are "secondary rate limits" without an advertised allowance, separate from the regular rate limits.

I've had decent success with the Airbyte GitHub extractor (aside from one data quality issue, airbytehq/airbyte#15420). Airbyte splits data extraction between the GraphQL and REST endpoints depending on the resource type, and its coverage is very comprehensive.

https://github.com/airbytehq/airbyte/blob/306a75ef5370728e0912cf52a1a898a530db0c90/airbyte-integrations/connectors/source-github/source_github/streams.py#L22-L122

Before this, I tried a few of the solutions mentioned in this thread and its children (PyGithub/PyGithub#1989) in my own custom wrapper, but they weren't working as expected.
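Because secondary rate limits have no advertised allowance, a client can only react to what the response says. The sketch below is one hedged way to pick a wait time. It assumes, based on my reading of GitHub's REST API guidance, that a `retry-after` header (in seconds) takes priority, that `x-ratelimit-reset` applies when `x-ratelimit-remaining` is `0`, and that otherwise waiting at least a minute with exponential backoff is reasonable. The function name `choose_wait` and the 60-second base are illustrative assumptions.

```python
def choose_wait(headers, now, attempt):
    """Pick a wait in seconds after a rate-limited response.

    `headers` is a mapping with lowercase keys, `now` is the current epoch
    time in seconds, and `attempt` counts retries starting from 0.
    """
    # 1. An explicit retry-after header (assumed to be a number of seconds).
    retry_after = headers.get("retry-after")
    if retry_after is not None:
        return float(retry_after)

    # 2. Primary limit exhausted: wait until the advertised reset time.
    if headers.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in headers:
        return max(0.0, int(headers["x-ratelimit-reset"]) - now)

    # 3. No guidance in the headers: back off exponentially from one minute.
    return 60.0 * (2 ** attempt)
```

For example, `choose_wait({"retry-after": "30"}, now=0, attempt=0)` returns `30.0`, while a response with none of these headers on the third attempt (`attempt=2`) yields `240.0`.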

@chapmanjacobd

Also, GitHub says that authenticated requests get a much higher "rate limit". Unauthenticated requests only get 60 requests per hour, which seems more like a quota than a "rate limit" (although I guess the two are semantically equivalent).

You would want to use x-ratelimit-reset

time.sleep(max(0, int(r['x-ratelimit-reset']) + 1 - time.time()))  # r is the response headers; values are strings

But a more complete solution would bring authenticated requests to the other subcommands. I'm surprised that only github-to-sqlite get is using the --auth= CLI flag.
