
Slow task execution due to long retry backoff in ReplicationThrottleHelper #2147

Open
eazama opened this issue May 2, 2024 · 0 comments · May be fixed by #2148
eazama commented May 2, 2024

I've observed extremely long gaps in the execution of task batches. Based on the logs, these gaps appear to be caused by ReplicationThrottleHelper submitting requests to add or remove the replication throttle rate configurations and then waiting for the change to be reflected on each broker. Specifically, there is a 10-second gap between each broker's configuration change, which can add up to minutes of idle time even on small 6- or 12-node clusters.

ReplicationThrottleHelper waits for these changes by calling CruiseControlMetricsUtils.retry, specifically the overload that uses the default backoff configuration of scale=5 seconds and base=2.
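
For illustration, here's a minimal sketch of how that backoff grows (not the actual Cruise Control implementation; I'm assuming the wait before retry attempt i is scale * base^i, which matches the observed 10-second first backoff):

```java
// Sketch only: exponential backoff with the default scale = 5 s and base = 2,
// assuming sleep(i) = scale * base^i for retry attempt i.
public class BackoffDemo {
    static long backoffMs(long scaleMs, long base, int attempt) {
        return scaleMs * (long) Math.pow(base, attempt);
    }

    public static void main(String[] args) {
        long scaleMs = 5_000; // default scale of 5 seconds
        long base = 2;        // default exponential base
        for (int attempt = 1; attempt <= 3; attempt++) {
            // retry 1 -> 10000 ms, retry 2 -> 20000 ms, retry 3 -> 40000 ms
            System.out.printf("retry %d: wait %d ms%n",
                    attempt, backoffMs(scaleMs, base, attempt));
        }
    }
}
```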

I assume the first describe request doesn't return the expected configurations for some reason, so the retry loop triggers the first 10-second backoff.

10 seconds seems like an excessive amount of time to wait for the first retry, so it would be nice if the retry loop started with a much smaller scale, somewhere on the order of a few milliseconds. Because the exponential backoff has no cap, starting with a small scale is somewhat necessary to prevent the backoff from becoming unreasonably large after only one or two retries.
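
As a sketch of the kind of change proposed here (the helper name, the millisecond scale, and the cap value are all hypothetical, not part of the current API):

```java
// Hypothetical variant: start with a millisecond-scale backoff and cap its growth,
// so a transient config mismatch costs milliseconds instead of tens of seconds.
static long cappedBackoffMs(long scaleMs, long base, int attempt, long capMs) {
    return Math.min(capMs, scaleMs * (long) Math.pow(base, attempt));
}
// With scale = 5 ms, base = 2, cap = 5000 ms:
// retry 1 -> 10 ms, retry 2 -> 20 ms, ..., retry 10 -> 5000 ms (capped)
```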
