Allow adaptive limits #57
Implementation
Make another package inside https://github.com/cep21/circuit/tree/master/closers called hystrix-adaptive. It uses composition to include the hystrix package, but changes ShouldOpen to be adaptive.
Problem
In the basic case, we want to time out or limit the rare bad request so we can maintain a good SLA. However, when problems happen (maybe the database takes 110ms rather than 100ms for all requests because of a DB issue), we don't want to fail 100% of requests and would rather increase our timeout by a bit while requests are slow, and move it back down when things normalize.
Idea
Move all limits from static numbers to (min/max/rate of change). For example, you could have a timeout normally at 100ms, but allow it to increase by 10ms per some unit if requests are slower than 100ms, while never allowing it to exceed 300ms. Then, when things settle down, allow requests to time out at 100ms again.
Solution
Circuit open/close logic is defined inside https://github.com/cep21/circuit/blob/master/closers.go#L9 and the implementations listen to all the events on https://github.com/cep21/circuit/blob/master/metrics.go#L164
The function ShouldOpen is called when a circuit decides whether it should open: https://github.com/cep21/circuit/blob/master/closers.go#L14
Right now, for hystrix, we open directly on error percentage: https://github.com/cep21/circuit/blob/master/closers/hystrix/opener.go#L140
Instead of opening on some threshold, it could detect why the circuit is failing (for example, because of too many timeouts or concurrency limit rejections). If that is the cause, it would modify the thread safe config on the circuit https://github.com/cep21/circuit/blob/master/circuit.go#L71 to increase the timeout. On concurrent Success, we can inspect the timeouts and lower the limit if things recover.
Similarly, on ErrConcurrencyLimitReject calls, we could increase the concurrency limit up to a point, and decrease it on Success without ErrInterrupt.