Memory leak in MessageChannelPartitionHandler when polling the database #4598
Comments
Thanks for the very well written bug report. I think your description and analysis are absolutely on point, and the example files are very concise and useful. Very much appreciated! I've opened a PR to resolve the issue: #4599 With this fix, your reproducer runs successfully even with |
@a-del-devision Thank you for reporting this issue in detail and for providing a minimal complete example! In fact, there is no need to hold a reference to the entire object graph of each completed worker step in memory until the partition step is completed. The change in #4599 removes the intermediate result that holds these references, which fixes the memory leak. I will plan that fix for the upcoming patch releases 5.1.2 and 5.0.6.
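The idea in the comment above (keep nothing from a poll unless the whole partition step is finished) can be illustrated with a minimal sketch. This is not the code from #4599: `WorkerStep`, `NonRetainingPoll`, and `pollOnce` are hypothetical stand-ins, and the loader is an assumed placeholder for whatever reloads the step executions from the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a worker step execution; not the Spring Batch class.
final class WorkerStep {
    final long id;
    final boolean finished;

    WorkerStep(long id, boolean finished) {
        this.id = id;
        this.finished = finished;
    }
}

final class NonRetainingPoll {

    // One poll: inspect a freshly loaded snapshot and hand back its finished
    // steps only when every expected worker is finished.
    static List<WorkerStep> pollOnce(Supplier<List<WorkerStep>> loader, int expectedWorkers) {
        List<WorkerStep> snapshot = loader.get();
        List<WorkerStep> finished = new ArrayList<>();
        for (WorkerStep step : snapshot) {
            if (step.finished) {
                finished.add(step);
            }
        }
        // null means "not done yet"; nothing from this snapshot is stored
        // anywhere, so it can be garbage collected before the next poll.
        return finished.size() == expectedWorkers ? finished : null;
    }

    public static void main(String[] args) {
        List<WorkerStep> partial = List.of(new WorkerStep(1, true), new WorkerStep(2, false));
        List<WorkerStep> complete = List.of(new WorkerStep(1, true), new WorkerStep(2, true));
        System.out.println(pollOnce(() -> partial, 2) == null);
        System.out.println(pollOnce(() -> complete, 2).size());
    }
}
```

Because no collection outlives a single call, at most one loaded snapshot is reachable at a time, regardless of how many polls happen before the partition step completes.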
Bug description
In remote partitioning jobs which use the MessageChannelPartitionHandler with database polling, on each poll of the database where there is 1 or more new completed worker StepExecutions, the partition handler loads and keeps in memory an additional copy of the corresponding JobInstance, JobExecution, and all StepExecutions and their ExecutionContexts until the partition step is completed.
This leads to high memory consumption during the partition step and can lead to out of memory errors if the poll interval is short enough and the number of partitions is high enough, especially since the ExecutionContexts are held in memory as well.
Environment
Any environment using spring-batch-integration 5.0.1 and above (93800c6) which also uses the MessageChannelPartitionHandler with database polling.
Steps to reproduce
Run a remote partitioning batch job with database polling, a short poll interval, a high number of partitions, and limited available memory.
Expected behavior
Polling of the database in remote partitioning jobs does not lead to a constant, gradual increase of consumed memory until the partition step completes.
Minimal Complete Reproducible example
Minimal Complete Reproducible example is here: spring-batch-mcve-memory-leak.zip
The example runs a remote partitioning batch job with 1000 partitions, each having an ExecutionContext containing a single UUID. In order to exacerbate the memory consumption, the poll interval is set very low (2ms), and the worker step sleeps for 50ms before completing. This allows a new copy of the JobInstance, JobExecution, StepExecutions, and ExecutionContexts to be loaded and held in memory each time a worker step completes.
Please run the example with the following JVM options:
-Xmx64m -XX:+HeapDumpOnOutOfMemoryError
This should cause an OutOfMemoryError to be thrown rather quickly and the resulting heap dump should contain the following:
JobInstance, JobParameters, JobExecution
StepExecution, ExecutionContext
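To get a feel for why a 64 MB heap fills up so quickly, here is a rough upper-bound estimate. It assumes the worst case in which every completing worker step is observed by its own poll, so each completion pins one more full copy of the object graph; it ignores the manager step and the per-object size. `RetentionEstimate` and its method are hypothetical names used only for this illustration.

```java
final class RetentionEstimate {

    // Upper bound on StepExecution instances kept reachable: each poll that
    // sees a new completion pins one full graph of `partitions` step executions.
    static long retainedStepExecutions(long partitions, long pollsWithNewCompletions) {
        return partitions * pollsWithNewCompletions;
    }

    public static void main(String[] args) {
        long partitions = 1000; // as in the example job
        // Worst case assumed here: one separate poll per completed worker step.
        System.out.println(retainedStepExecutions(partitions, partitions));
    }
}
```

Under these assumptions the example job could end up holding on the order of a million StepExecution instances, each with its own ExecutionContext, by the time the last worker finishes. The real figure depends on how many workers complete between two polls.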
Analysis
In MessageChannelPartitionHandler#pollReplies, the callback calls JobExplorer#getJobExecution. The SimpleJobExplorer implementation loads the JobExecution and all of the StepExecutions as well as their ExecutionContexts. Each of these StepExecutions also contains a reference to the JobExecution and thus to all other StepExecutions indirectly. If any of the loaded StepExecutions is completed and not present in the result Set, it is added to the Set, and this causes the currently loaded JobExecution instance and all of the other StepExecution instances to be held in memory.
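The retention pattern described in this analysis can be reproduced without Spring Batch. The sketch below uses hypothetical stub classes (`JobExecutionStub`, `StepExecutionStub`) rather than the real domain types, and it assumes id-based equality for step executions so that a reloaded copy of an already recorded step is not added again. It illustrates the mechanism only and is not the handler's actual code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for the Spring Batch domain classes; not the real API.
final class JobExecutionStub {
    final List<StepExecutionStub> steps = new ArrayList<>();
}

final class StepExecutionStub {
    final long id;
    final boolean completed;
    final JobExecutionStub jobExecution; // back-reference that keeps the whole graph reachable

    StepExecutionStub(long id, boolean completed, JobExecutionStub jobExecution) {
        this.id = id;
        this.completed = completed;
        this.jobExecution = jobExecution;
    }

    // Assumption for this sketch: equality is based on the id only.
    @Override
    public boolean equals(Object other) {
        return other instanceof StepExecutionStub && ((StepExecutionStub) other).id == id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }
}

final class LeakDemo {

    // Simulates one explorer call: every invocation builds a brand-new object graph.
    static JobExecutionStub load(int partitions, int completedCount) {
        JobExecutionStub job = new JobExecutionStub();
        for (int i = 0; i < partitions; i++) {
            job.steps.add(new StepExecutionStub(i, i < completedCount, job));
        }
        return job;
    }

    // Polls once per newly completed step and counts how many distinct
    // graphs remain reachable through the accumulated result set.
    static int retainedGraphs(int partitions) {
        Set<StepExecutionStub> result = new HashSet<>();
        for (int completed = 1; completed <= partitions; completed++) {
            JobExecutionStub fresh = load(partitions, completed);
            for (StepExecutionStub step : fresh.steps) {
                if (step.completed) {
                    result.add(step); // no-op when a step with this id is already present
                }
            }
        }
        // JobExecutionStub does not override equals, so this set counts distinct instances.
        Set<JobExecutionStub> graphs = new HashSet<>();
        for (StepExecutionStub step : result) {
            graphs.add(step.jobExecution);
        }
        return graphs.size();
    }

    public static void main(String[] args) {
        System.out.println(retainedGraphs(100));
    }
}
```

Each poll contributes exactly one new element to the set, yet that single element keeps its entire freshly loaded graph alive through the back-reference. With 100 partitions the set ends up pinning 100 separate graphs of 100 step executions each.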