[Feature Request] Parallelise allocation decider executions to prevent high priority tasks from timing out #13742

Open
Bukhtawar opened this issue May 18, 2024 · 1 comment
Labels
enhancement, ShardManagement:Performance

Comments

@Bukhtawar
Collaborator

Bukhtawar commented May 18, 2024

Is your feature request related to a problem? Please describe

[2024-05-18T18:44:03,287][DEBUG][o.o.a.a.c.s.TransportClusterUpdateSettingsAction] [b869f183befc74cff9f3b5572821ec21] #[org.opensearch.cluster.metadata.ProcessClusterEventTimeoutException]#failed to perform [cluster_update_settings]
ProcessClusterEventTimeoutException[failed to process cluster event (cluster_update_settings) within 1m]
        at org.opensearch.cluster.service.MasterService$Batcher.lambda$onTimeout$0(MasterService.java:200)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
        at org.opensearch.cluster.service.MasterService$Batcher.lambda$onTimeout$1(MasterService.java:199)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:863)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)

Notice how much time the cluster manager update thread spends in the AllocationDeciders.canAllocate loop:

99.1% (9.9s out of 10s) cpu usage by thread 'opensearch[b869f183befc74cff9f3b5572821ec21][clusterManagerService#updateTask][T#1]'
     9/10 snapshots sharing following 20 elements
       app//org.opensearch.cluster.routing.allocation.decider.AllocationDeciders.canAllocate(AllocationDeciders.java:94)
       app//org.opensearch.cluster.routing.allocation.allocator.LocalShardsBalancer.decideAllocateUnassigned(LocalShardsBalancer.java:927)
       app//org.opensearch.cluster.routing.allocation.allocator.LocalShardsBalancer.allocateUnassigned(LocalShardsBalancer.java:813)
       app//org.opensearch.cluster.routing.allocation.allocator.BalancedShardsAllocator.allocate(BalancedShardsAllocator.java:288)
       app//org.opensearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:557)
       app//org.opensearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:501)
       app//org.opensearch.action.admin.cluster.reroute.TransportClusterRerouteAction$ClusterRerouteResponseAckedClusterStateUpdateTask.execute(TransportClusterRerouteAction.java:269)
       app//org.opensearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:67)
       app//org.opensearch.cluster.service.MasterService.executeTasks(MasterService.java:882)
       app//org.opensearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:434)
       app//org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:301)
       app//org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:212)
       app//org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:209)
       app//org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:247)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:863)
       app//org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283)
       app//org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246)
       java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
       java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
       java.base/java.lang.Thread.run(Thread.java:840)
     unique snapshot
       java.base/java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1051)
       java.base/java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1050)
       app//org.opensearch.cluster.routing.allocation.decider.AllocationDeciders.canAllocate(AllocationDeciders.java:93)
       app//org.opensearch.cluster.routing.allocation.allocator.LocalShardsBalancer.decideAllocateUnassigned(LocalShardsBalancer.java:927)
       app//org.opensearch.cluster.routing.allocation.allocator.LocalShardsBalancer.allocateUnassigned(LocalShardsBalancer.java:813)
       app//org.opensearch.cluster.routing.allocation.allocator.BalancedShardsAllocator.allocate(BalancedShardsAllocator.java:288)
       app//org.opensearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:557)
       app//org.opensearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:501)
       app//org.opensearch.action.admin.cluster.reroute.TransportClusterRerouteAction$ClusterRerouteResponseAckedClusterStateUpdateTask.execute(TransportClusterRerouteAction.java:269)
       app//org.opensearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:67)
       app//org.opensearch.cluster.service.MasterService.executeTasks(MasterService.java:882)
       app//org.opensearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:434)
       app//org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:301)
       app//org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:212)
       app//org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:209)
       app//org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:247)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:863)
       app//org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283)
       app//org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246)
       java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
       java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
       java.base/java.lang.Thread.run(Thread.java:840)
   

Individual thread dumps

"opensearch[b869f183befc74cff9f3b5572821ec21][clusterManagerService#updateTask][T#1]" #3427 daemon prio=5 os_prio=0 cpu=153387532.42ms elapsed=184422.36s tid=0x0000fffee40040e0 nid=0x1152 runnable  [0x0000fffdf1ba1000]
   java.lang.Thread.State: RUNNABLE
        at org.opensearch.common.settings.Setting.get(Setting.java:506)
        at org.opensearch.common.settings.Setting.get(Setting.java:477)
        at org.opensearch.common.settings.Setting.get(Setting.java:623)
        at org.opensearch.cluster.routing.allocation.decider.ShardsLimitAllocationDecider.doDecide(ShardsLimitAllocationDecider.java:129)
        at org.opensearch.cluster.routing.allocation.decider.ShardsLimitAllocationDecider.canAllocate(ShardsLimitAllocationDecider.java:113)
        at org.opensearch.cluster.routing.allocation.decider.AllocationDeciders.canAllocate(AllocationDeciders.java:94)
        at org.opensearch.cluster.routing.allocation.allocator.LocalShardsBalancer.decideAllocateUnassigned(LocalShardsBalancer.java:927)
        at org.opensearch.cluster.routing.allocation.allocator.LocalShardsBalancer.allocateUnassigned(LocalShardsBalancer.java:813)
        at org.opensearch.cluster.routing.allocation.allocator.BalancedShardsAllocator.allocate(BalancedShardsAllocator.java:288)
        at org.opensearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:557)
        at org.opensearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:501)
        at org.opensearch.action.admin.cluster.reroute.TransportClusterRerouteAction$ClusterRerouteResponseAckedClusterStateUpdateTask.execute(TransportClusterRerouteAction.java:269)
        at org.opensearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:67)
        at org.opensearch.cluster.service.MasterService.executeTasks(MasterService.java:882)
        at org.opensearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:434)
        at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:301)
        at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:212)
        at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:209)
        at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:247)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:863)
        at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283)
        at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base/ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base/ThreadPoolExecutor.java:635)
        at java.lang.Thread.run(java.base/Thread.java:840)
"clusterManagerService#updateTask"
"opensearch[b869f183befc74cff9f3b5572821ec21][clusterManagerService#updateTask][T#1]" #3427 daemon prio=5 os_prio=0 cpu=153454163.28ms elapsed=184496.35s tid=0x0000fffee40040e0 nid=0x1152 runnable  [0x0000fffdf1ba1000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Spliterator.getExactSizeIfKnown(java.base/Spliterator.java:414)
        at java.util.stream.AbstractPipeline.copyIntoWithCancel(java.base/AbstractPipeline.java:526)
        at java.util.stream.AbstractPipeline.copyInto(java.base/AbstractPipeline.java:513)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(java.base/AbstractPipeline.java:499)
        at java.util.stream.FindOps$FindOp.evaluateSequential(java.base/FindOps.java:150)
        at java.util.stream.AbstractPipeline.evaluate(java.base/AbstractPipeline.java:234)
        at java.util.stream.IntPipeline.findFirst(java.base/IntPipeline.java:552)
        at java.text.DecimalFormatSymbols.findNonFormatChar(java.base/DecimalFormatSymbols.java:844)
        at java.text.DecimalFormatSymbols.initialize(java.base/DecimalFormatSymbols.java:815)
        at java.text.DecimalFormatSymbols.<init>(java.base/DecimalFormatSymbols.java:115)
        at sun.util.locale.provider.DecimalFormatSymbolsProviderImpl.getInstance(java.base/DecimalFormatSymbolsProviderImpl.java:85)
        at java.text.DecimalFormatSymbols.getInstance(java.base/DecimalFormatSymbols.java:182)
        at java.util.Formatter.zero(java.base/Formatter.java:2450)
        at java.util.Formatter$FormatSpecifier.getZero(java.base/Formatter.java:4450)
        at java.util.Formatter$FormatSpecifier.localizedMagnitude(java.base/Formatter.java:4466)
        at java.util.Formatter$FormatSpecifier.print(java.base/Formatter.java:3276)
        at java.util.Formatter$FormatSpecifier.print(java.base/Formatter.java:3261)
        at java.util.Formatter$FormatSpecifier.printInteger(java.base/Formatter.java:2957)
        at java.util.Formatter$FormatSpecifier.print(java.base/Formatter.java:2918)
        at java.util.Formatter.format(java.base/Formatter.java:2689)
        at java.util.Formatter.format(java.base/Formatter.java:2625)
        at java.lang.String.format(java.base/String.java:4186)
        at org.opensearch.cluster.routing.allocation.decider.ThrottlingAllocationDecider.allocateNonInitialShardCopies(ThrottlingAllocationDecider.java:243)
        at org.opensearch.cluster.routing.allocation.decider.ThrottlingAllocationDecider.canAllocate(ThrottlingAllocationDecider.java:200)
        at org.opensearch.cluster.routing.allocation.decider.AllocationDeciders.canAllocate(AllocationDeciders.java:94)
        at org.opensearch.cluster.routing.allocation.allocator.LocalShardsBalancer.decideAllocateUnassigned(LocalShardsBalancer.java:927)
        at org.opensearch.cluster.routing.allocation.allocator.LocalShardsBalancer.allocateUnassigned(LocalShardsBalancer.java:813)
        at org.opensearch.cluster.routing.allocation.allocator.BalancedShardsAllocator.allocate(BalancedShardsAllocator.java:288)
        at org.opensearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:557)
"clusterManagerService#updateTask"
"opensearch[b869f183befc74cff9f3b5572821ec21][clusterManagerService#updateTask][T#1]" #3427 daemon prio=5 os_prio=0 cpu=153486544.07ms elapsed=184535.29s tid=0x0000fffee40040e0 nid=0x1152 runnable  [0x0000fffdf1ba1000]
   java.lang.Thread.State: RUNNABLE
        at java.util.HashMap.hash(java.base/HashMap.java:338)
        at java.util.HashMap.getNode(java.base/HashMap.java:568)
        at java.util.HashMap.get(java.base/HashMap.java:556)
        at java.util.Collections$UnmodifiableMap.get(java.base/Collections.java:1502)
        at org.opensearch.cluster.routing.allocation.decider.DiskThresholdDecider.getDiskUsage(DiskThresholdDecider.java:577)
        at org.opensearch.cluster.routing.allocation.decider.DiskThresholdDecider.canAllocate(DiskThresholdDecider.java:224)
        at org.opensearch.cluster.routing.allocation.decider.AllocationDeciders.canAllocate(AllocationDeciders.java:94)
        at org.opensearch.cluster.routing.allocation.allocator.LocalShardsBalancer.decideAllocateUnassigned(LocalShardsBalancer.java:927)
        at org.opensearch.cluster.routing.allocation.allocator.LocalShardsBalancer.allocateUnassigned(LocalShardsBalancer.java:813)
        at org.opensearch.cluster.routing.allocation.allocator.BalancedShardsAllocator.allocate(BalancedShardsAllocator.java:288)
        at org.opensearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:557)
        at org.opensearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:501)
        at org.opensearch.action.admin.cluster.reroute.TransportClusterRerouteAction$ClusterRerouteResponseAckedClusterStateUpdateTask.execute(TransportClusterRerouteAction.java:269)
        at org.opensearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:67)
        at org.opensearch.cluster.service.MasterService.executeTasks(MasterService.java:882)
        at org.opensearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:434)
        at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:301)
        at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:212)
        at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:209)
        at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:247)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:863)
        at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283)
        at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base/ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base/ThreadPoolExecutor.java:635)
        at java.lang.Thread.run(java.base/Thread.java:840)

Describe the solution you'd like

  1. Break long-running executions into smaller batches with checkpoints, so that other critical operations can be unblocked between two batch executions
  2. Parallelise allocation decider executions to speed up the single-threaded cluster manager task execution
  3. Make long-running executions such as reroute and routing table processing non-blocking, and evaluate optimistic concurrency control
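
To make item 2 concrete, here is a minimal sketch of evaluating deciders concurrently and combining their verdicts. `Verdict`, `Decider` and `ParallelDeciders` are hypothetical, simplified stand-ins invented for illustration; they are not the OpenSearch `Decision` or `AllocationDecider` types, and the sketch assumes every decider is thread-safe and reads only immutable state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical verdicts, ordered from least to most restrictive.
enum Verdict { YES, THROTTLE, NO }

// Hypothetical decider contract; assumed thread-safe and side-effect free.
interface Decider {
    Verdict canAllocate(String shardId, String nodeId);
}

final class ParallelDeciders {
    private final List<Decider> deciders;
    private final ExecutorService pool;

    ParallelDeciders(List<Decider> deciders, ExecutorService pool) {
        this.deciders = List.copyOf(deciders);
        this.pool = pool;
    }

    // Runs every decider on the pool and returns the most restrictive verdict.
    Verdict canAllocate(String shardId, String nodeId) {
        List<Callable<Verdict>> tasks = new ArrayList<>();
        for (Decider d : deciders) {
            tasks.add(() -> d.canAllocate(shardId, nodeId));
        }
        Verdict result = Verdict.YES;
        try {
            // invokeAll blocks until all submitted tasks have completed.
            for (Future<Verdict> f : pool.invokeAll(tasks)) {
                Verdict v = f.get();
                if (v.ordinal() > result.ordinal()) {
                    result = v;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while deciding", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("decider failed", e.getCause());
        }
        return result;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            Decider disk = (shard, node) -> Verdict.YES;
            Decider throttle = (shard, node) -> Verdict.THROTTLE;
            ParallelDeciders all = new ParallelDeciders(List.of(disk, throttle), pool);
            System.out.println(all.canAllocate("shard-0", "node-1"));
        } finally {
            pool.shutdown();
        }
    }
}
```

Submitting one task per decider per call adds scheduling overhead, so for cheap deciders a coarser unit of work (for example, one task per node or per group of shards) may be the better trade-off; that would need measuring.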

Related component

ShardManagement:Performance

Describe alternatives you've considered

No response

Additional context

No response

@Bukhtawar added the enhancement and untriaged labels on May 18, 2024
@shwetathareja
Member

Another option is to run these deciders on a batch of shards instead of one shard at a time; on large clusters, running the deciders per shard slows them down significantly.
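
As a rough illustration of the batching idea, the sketch below decides a whole batch of shards for one node and resolves each index's limit once per batch rather than once per shard. `BatchShardLimitDecider`, its fields and the per-index limit map are all hypothetical names made up for this example and do not reflect the actual `ShardsLimitAllocationDecider` code; the sketch also assumes every accepted shard in the batch lands on that node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BatchShardLimitDecider {
    // Hypothetical setting: maximum shards of a given index allowed on one node.
    private final Map<String, Integer> perIndexLimit;
    // Counts how often a limit is resolved, purely to show the saving.
    int limitLookups = 0;

    BatchShardLimitDecider(Map<String, Integer> perIndexLimit) {
        this.perIndexLimit = Map.copyOf(perIndexLimit);
    }

    // Decides a batch of shards (given by index name) for a single node.
    // shardsOnNode maps index name to the number of its shards already on the node.
    List<Boolean> canAllocateBatch(List<String> shardIndices, Map<String, Integer> shardsOnNode) {
        Map<String, Integer> limits = new HashMap<>();                   // resolved once per index
        Map<String, Integer> projected = new HashMap<>(shardsOnNode);   // counts including accepted shards
        List<Boolean> decisions = new ArrayList<>();
        for (String index : shardIndices) {
            Integer limit = limits.get(index);
            if (limit == null) {
                limitLookups++;
                limit = perIndexLimit.getOrDefault(index, Integer.MAX_VALUE);
                limits.put(index, limit);
            }
            int current = projected.getOrDefault(index, 0);
            boolean allowed = current < limit;
            if (allowed) {
                projected.put(index, current + 1);
            }
            decisions.add(allowed);
        }
        return decisions;
    }

    public static void main(String[] args) {
        BatchShardLimitDecider decider = new BatchShardLimitDecider(Map.of("logs", 2));
        List<Boolean> out = decider.canAllocateBatch(List.of("logs", "logs", "metrics"), Map.of("logs", 1));
        System.out.println(out + " after " + decider.limitLookups + " limit lookups");
    }
}
```

The saving comes from hoisting repeated lookups out of the per-shard path; whether the real deciders can be restructured this way depends on how much of their state is shared across shards.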
