Avoiding rebalance on historical restart #17594

Open
Z9n2JktHlZDmlhSvqc9X2MmL3BwQG7tk opened this issue Dec 21, 2024 · 1 comment

Comments

Z9n2JktHlZDmlhSvqc9X2MmL3BwQG7tk commented Dec 21, 2024

High IO load on historical nodes during the upgrade/restart of a single node

Affected Version

31.0.0
smartSegmentLoading = true

Description

When changing the configuration or upgrading the version, we restart historical nodes one by one, waiting until the restarted node becomes available again (i.e. is registered in ZooKeeper). We have a replication factor of 2. However, it looks like the coordinator immediately assigns load tasks to the remaining historicals, which causes high IO load on almost all running historicals and on deep storage (we use CephFS). In previous versions of Druid we did not see this behavior; redundancy was recovered much more slowly.

Is there a way to tell the coordinator to delay redundancy recovery?

The only coordinator parameter related to this situation that I can see is replicationThrottleLimit, but it does not prevent the redundancy-recovery load queue from appearing. There must be another setting to delay recovery completely (i.e. not send load tasks to historicals at all).

kfaraz (Contributor) commented Dec 21, 2024

Thanks for reporting this issue, @Z9n2JktHlZDmlhSvqc9X2MmL3BwQG7tk !
When smartSegmentLoading is set to true, the value of replicationThrottleLimit provided in the coordinator's dynamic config is essentially ignored. Smart segment loading automatically computes replicationThrottleLimit as 5% of the total number of "used" segments in the cluster.

So, if you have 1000 used segments in the cluster, the replication throttle limit would be computed as 50.
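
For illustration, here is a sketch of that computation, assuming a straight 5% of the used-segment count (the Coordinator's exact rounding and minimum bounds may differ):

```python
# Sketch of the throttle computation described above; the Coordinator's
# actual rounding/minimum bounds may differ.
def smart_replication_throttle_limit(num_used_segments: int) -> int:
    # replicationThrottleLimit = 5% of the total "used" segments in the cluster
    return max(1, num_used_segments * 5 // 100)

print(smart_replication_throttle_limit(1000))  # -> 50, matching the example above
```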

For the time being, you could try setting the following (see the sketch after this list for one way to apply them):

- smartSegmentLoading: false
- maxSegmentsInNodeLoadingQueue: 0
- replicationThrottleLimit: 10 (or any other value that suits the total number of segments in your cluster)
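
A minimal sketch of applying these settings through the Coordinator's dynamic configuration endpoint (GET /druid/coordinator/v1/config to read, POST to the same path to update). The host/port and the absence of authentication are assumptions; adjust for your deployment.

```python
# Sketch only: update the coordinator dynamic config with the settings above.
# Assumes the Coordinator is reachable at COORDINATOR_URL with no auth.
import requests

COORDINATOR_URL = "http://coordinator:8081"  # placeholder host/port

# Read the current dynamic config so unrelated fields are preserved.
config = requests.get(f"{COORDINATOR_URL}/druid/coordinator/v1/config").json()

config.update({
    "smartSegmentLoading": False,
    "maxSegmentsInNodeLoadingQueue": 0,
    "replicationThrottleLimit": 10,  # tune to your cluster's segment count
})

resp = requests.post(f"{COORDINATOR_URL}/druid/coordinator/v1/config", json=config)
resp.raise_for_status()
```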

However, in the long run, we would like to improve the computation of replicationThrottleLimit performed by the Coordinator even when smartSegmentLoading is set to true.

Edit: Or at the very least, we can continue to honor the value of replicationThrottleLimit provided by the user even if smartSegmentLoading is true.

kfaraz self-assigned this Dec 21, 2024