
Too high memory consumption #334

Open
Drwalin opened this issue May 15, 2024 · 3 comments
Drwalin commented May 15, 2024

Hi,

I have written a program that works in a producer-consumer allocation scheme (there are other threads as well, but most of the work and allocation happens in the main producer-consumer pair of threads). Initially I used glibc malloc, but it was very slow; then I found rpmalloc, which in my case sped my program up about 5 times. There is one problem: memory consumption increases constantly. This surprised me, because my program works in iterations and after each iteration most of the allocated memory is freed.

While the program was running, resident memory and virtual memory both grew substantially every minute. After some time, the program crashed with over 1.5 TiB of virtual memory allocated, 15 GiB in RAM and 40 GiB in swap. The data stored in swap was never loaded back into RAM (disk usage showed only writes). The total number of rpmalloc/rpfree call pairs was around 140 billion, with most allocations of size 256, 512 and 4096 bytes.

In contrast, the glibc malloc version peaked at 9 GiB of RAM, and the custom object pool peaked at 6 GiB of RAM, having done the same amount of work as the rpmalloc version.
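To make the workload concrete, here is a minimal sketch of the producer-consumer pattern described above: one thread allocates blocks of the dominant sizes (256, 512, 4096 bytes) and a paired thread frees them, so every block is freed on a different thread than it was allocated on. This uses plain malloc/free as a stand-in for rpmalloc/rpfree, and the queue capacity and iteration count are illustrative assumptions, not values from the real application.

```c
/* Sketch of the producer-consumer cross-thread allocation pattern.
 * malloc/free stand in for rpmalloc/rpfree; queue size and iteration
 * count are assumptions for illustration. */
#include <pthread.h>
#include <stdlib.h>

#define QUEUE_CAP 1024
#define ITERATIONS 100000L

static void *queue[QUEUE_CAP];
static size_t head, tail, count;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

/* Producer: allocates blocks of the sizes dominating the workload. */
static void *producer(void *arg) {
    (void)arg;
    static const size_t sizes[] = { 256, 512, 4096 };
    for (long i = 0; i < ITERATIONS; ++i) {
        void *block = malloc(sizes[i % 3]); /* rpmalloc() in the real app */
        pthread_mutex_lock(&lock);
        while (count == QUEUE_CAP)
            pthread_cond_wait(&not_full, &lock);
        queue[tail] = block;
        tail = (tail + 1) % QUEUE_CAP;
        ++count;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Consumer: frees every block on a different thread than the one that
 * allocated it, exercising the allocator's cross-thread free path. */
static void *consumer(void *arg) {
    long freed = 0;
    for (long i = 0; i < ITERATIONS; ++i) {
        pthread_mutex_lock(&lock);
        while (count == 0)
            pthread_cond_wait(&not_empty, &lock);
        void *block = queue[head];
        head = (head + 1) % QUEUE_CAP;
        --count;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        free(block);                        /* rpfree() in the real app */
        ++freed;
    }
    *(long *)arg = freed;
    return NULL;
}

/* Runs one producer-consumer pair; returns how many blocks were freed. */
long run_pair(void) {
    long freed = 0;
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, &freed);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return freed;
}
```

In the real application there are additional threads and occasional larger allocations, but this pair is where most allocation traffic happens.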

My system:
OS: Arch Linux x86_64
Kernel: 6.8.9-arch1-2
RAM: 32GiB
SWAP: 44 GiB

rpmalloc version:
branch: develop
commit: 955f44b

Branch mjansson/rewrite at 2dd697f seems to exhibit the same behavior.

I am not sure whether this is a problem with my application or with rpmalloc, but other allocators do not show any faulty behavior. Valgrind and GCC's AddressSanitizer report no memory leaks, buffer overflows, or hidden segmentation faults in any of the following versions: rpmalloc, malloc, or the custom object pool.

Edit:
I've found out that malloc's overly high resident memory usage is due to Linux settings, and that I can call malloc_trim(0) to decommit/free resident memory (as far as I can tell, other systems neither have nor require malloc_trim).
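For reference, malloc_trim is a glibc extension declared in <malloc.h>; it asks the allocator to return free heap pages to the OS and is not portable to other libcs. A minimal sketch of the pattern described above (trimming after an iteration that freed most of its allocations; sizes and counts are illustrative):

```c
/* glibc-specific sketch: release freed heap pages back to the OS after
 * an iteration. malloc_trim is a glibc extension (<malloc.h>). */
#include <malloc.h>
#include <stdlib.h>

int trim_after_iteration(void) {
    /* Simulate one iteration that frees most of what it allocated. */
    enum { N = 4096 };
    void *blocks[N];
    for (int i = 0; i < N; ++i)
        blocks[i] = malloc(4096);
    for (int i = 0; i < N; ++i)
        free(blocks[i]);
    /* malloc_trim(0) returns 1 if memory was released to the system,
     * 0 otherwise; both outcomes are valid. */
    return malloc_trim(0);
}
```

This only affects glibc's allocator, which is why it helped the malloc version of the program but is irrelevant to the rpmalloc builds.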

I should also note that I compile rpmalloc with ENABLE_OVERRIDE=0, because I had strange errors while debugging otherwise.

@mjansson (Owner) commented
Interesting - so if I want to try and reproduce this, I can basically create a program that runs a number of threads in pairs where each pair has one thread allocating memory and the other then deallocating it? What are the memory sizes being allocated?

@mjansson mjansson self-assigned this May 16, 2024

Drwalin commented May 16, 2024

That's the core workload of my application, though small allocations from other threads, or of other sizes, may amplify this behavior as far as I know. The allocations performed most often (99.9% of all allocations) are of size 256, 512 and 4096 bytes; there may be a few in the 4 KiB-60 KiB range, but those are sparse.

P.S.:
I am now testing my application with the main branch, and memory consumption does not seem to grow beyond expected amounts.

@Drwalin Drwalin closed this as completed May 16, 2024
@Drwalin Drwalin reopened this May 16, 2024

Drwalin commented May 17, 2024

I think I have now tested enough with the main branch. After doing more work than with the develop branch, virtual and resident memory consumption stays well below the acceptable/predicted maximum range.

Is it normal for rpmalloc to consume (reserve/cache) a lot more memory than needed?
The application uses 50 MiB when idle with the custom pool allocator, but the rpmalloc/main version does not drop below 500 MiB. Is this extra 450 MiB of resident memory just a thread-local or global cache kept for future fast allocations? Is this normal/acceptable behavior for rpmalloc in an application with 3 threads? It seems like a good enough trade-off between performance and memory consumption; I am just wondering what the norm is.

`main` branch is faster for me than `develop` branch by 10%-15%, ignoring the memory leak in the `develop` branch.
