
Lease Acquisition Recovery Interval causing lots of AWS rate-limit errors #409

Open
duinness opened this issue Mar 8, 2021 · 1 comment


duinness commented Mar 8, 2021

Currently, ACQUIRE_LEASES_RECOVERY_INTERVAL is hard-coded, so a failed lease acquisition attempt is retried every 5 seconds. This becomes a big problem when an account has a lot of Kinesis streams and multiple instances of lifion-kinesis managing them.

My prod account is safe because it only has about 10 Kinesis streams. But my development account covers 10 different testing environments, so it has about 100 streams. Searching my logs for "Unexpected recoverable failure when trying to acquire" returns ~5,000 hits per minute.
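
To give a feel for how a fixed 5-second retry adds up, here is a rough back-of-envelope sketch. It assumes one acquisition loop per stream per instance, with every loop failing, which may not match how the library actually schedules work. The function name and the instance count of 4 are made up for illustration.

```javascript
// Rough estimate of retry attempts per minute when every acquisition loop
// is failing and retrying on a fixed recovery interval.
// Assumption: one loop per stream per instance (not confirmed by the library).
function retriesPerMinute(failingLoops, recoveryIntervalMs) {
  const attemptsPerLoop = 60000 / recoveryIntervalMs;
  return failingLoops * attemptsPerLoop;
}

// A single failing loop that retries every 5 seconds.
console.log(retriesPerMinute(1, 5000)); // 12

// Hypothetical account: 100 streams, 4 instances each, all failing.
console.log(retriesPerMinute(100 * 4, 5000)); // 4800
```

Under those assumptions, a few hundred failing loops are enough to land in the thousands-per-minute range.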

I've opened a PR to make the lease acquisition recovery interval configurable.
I ran this code in my test environment with the interval set to 30 seconds. This dropped my error rate from ~5,000/min to 20/min.
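
For readers who want the general idea, here is a minimal sketch of a retry loop with a configurable recovery interval. It is not the code from the PR: `nextAttemptDelay`, `startAcquisitionLoop`, `recoveryIntervalMs`, and `acquireIntervalMs` are hypothetical names, and the 20-second regular interval is an assumed value. Only the 5-second default comes from this issue.

```javascript
// Default mirrors the 5-second recovery interval described in this issue.
const DEFAULT_RECOVERY_INTERVAL_MS = 5000;

// Hypothetical helper: choose the delay before the next acquisition attempt.
function nextAttemptDelay(failed, options = {}) {
  const {
    acquireIntervalMs = 20000, // assumed value, for illustration only
    recoveryIntervalMs = DEFAULT_RECOVERY_INTERVAL_MS
  } = options;
  return failed ? recoveryIntervalMs : acquireIntervalMs;
}

// Hypothetical loop: tryAcquire is any async function that may throw.
// Returns a function that stops the loop.
function startAcquisitionLoop(tryAcquire, options = {}) {
  let timer = null;
  let stopped = false;

  const run = async () => {
    let failed = false;
    try {
      await tryAcquire();
    } catch (err) {
      // A recoverable failure: retry sooner or later depending on config.
      failed = true;
    }
    if (!stopped) {
      timer = setTimeout(run, nextAttemptDelay(failed, options));
    }
  };

  run();
  return () => {
    stopped = true;
    clearTimeout(timer);
  };
}
```

With `{ recoveryIntervalMs: 30000 }`, a failing loop in this sketch retries 2 times per minute instead of 12. The drop reported above is much larger than 6x, possibly because fewer attempts also trip fewer rate limits, but that is a guess.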


duinness commented Mar 8, 2021

Hmm... it isn't letting me link the PR. Well, here's the PR for this issue.
#406
