
improve MallocUsage() accuracy #28531

Open
wants to merge 2 commits into base: master

Conversation

LarryRuane
Contributor

@LarryRuane LarryRuane commented Sep 25, 2023

The MallocUsage() function takes an allocation size as an argument and returns the amount of physical memory consumed, which is greater due to memory allocator overhead and alignment. It was first added in 2015 (first commit of #6102), but its accuracy has degraded as memory allocation libraries have evolved. It's used where it's important that large data structures, such as the coins cache and the mempool, use a predictable, configurable (limited) amount of physical memory (see the -dbcache and -maxmempool configuration options), as well as in a few other places.

sipa figured out a concise, efficient expression that this function can use, and that's what's implemented here.

Also add a unit test, which is more helpful than usual in this case since platforms, operating systems, and libraries vary significantly in this area.
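
To make the idea concrete, here is a minimal sketch of this kind of overhead formula. It assumes 64-bit glibc behaviour (8-byte chunk header, 16-byte alignment, 32-byte minimum chunk) and is not necessarily the exact expression implemented in this PR:

```cpp
#include <algorithm>
#include <cstddef>

// Sketch only (assumes 64-bit glibc): physical bytes consumed by malloc(alloc).
// Each chunk carries an 8-byte header, is aligned to 16 bytes, and occupies at
// least 32 bytes, so: round (request + header) up to 16, with a 32-byte floor.
inline size_t MallocUsageSketch(size_t alloc)
{
    if (alloc == 0) return 0;
    return std::max<size_t>(32, (alloc + 8 + 15) & ~static_cast<size_t>(15));
}
// MallocUsageSketch(1) == 32, MallocUsageSketch(24) == 32, MallocUsageSketch(25) == 48
```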

@DrahtBot
Contributor

DrahtBot commented Sep 25, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #28939 (memusage: let PoolResource keep track of all allocated/deallocated memory by martinus)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@LarryRuane
Contributor Author

The results of the old and new versions of this function should differ only slightly; it would be bad if the new one gave very different results, because node operators might find too much memory being consumed, or (not as bad) not enough. To manually test this, I ran -reindex-chainstate on Ubuntu and watched memory usage by printing /proc/<pid>/statm and monitoring the data (6th) field. On master, the value settled to 315140 (units are 4 KiB pages); with this PR it was 315606, which indicates that the new version returns slightly smaller estimates (so slightly more physical memory is actually used before the limits are reached), but they're close.
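
For anyone wanting to reproduce this kind of measurement, a Linux-only sketch that reads the process's own statm (rather than another pid's, as above) could look like:

```cpp
#include <cstdio>

// Sketch: print the "data" (6th) field of /proc/self/statm, in 4 KiB pages.
// The fields are: size resident shared text lib data dt (see proc(5)).
int main()
{
    long size, resident, shared, text, lib, data, dt;
    FILE* f = std::fopen("/proc/self/statm", "r");
    if (!f) return 1;
    if (std::fscanf(f, "%ld %ld %ld %ld %ld %ld %ld",
                    &size, &resident, &shared, &text, &lib, &data, &dt) == 7) {
        std::printf("data: %ld pages\n", data);
    }
    std::fclose(f);
    return 0;
}
```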

@LarryRuane
Contributor Author

Changing to draft because CI on a couple of the platforms failed the new test, so the physical memory calculation will need to be different on those platforms; I'll try to figure that out.

@martinus
Contributor

martinus commented Nov 20, 2023

I can reproduce the same results with glibc 2.31 and 2.38, in both 32-bit and 64-bit builds. I don't think it needs to be perfect on all platforms; I think your new calculation is fine.

I played a bit with a reproducer; here is mine: https://godbolt.org/z/s971sbnhG. I refactored your MallocUsage formula a bit, but it's basically the same.
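
The actual reproducer is at the godbolt link; a self-contained sketch of that kind of experiment (glibc-specific, comparing a candidate formula against malloc_usable_size(), and not the code from the link) might be:

```cpp
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <malloc.h>  // malloc_usable_size() (glibc)

// Candidate formula (same 64-bit glibc assumptions as in the sketch above).
size_t Formula(size_t alloc)
{
    return alloc == 0 ? 0 : std::max<size_t>(32, (alloc + 8 + 15) & ~static_cast<size_t>(15));
}

int main()
{
    for (size_t n = 1; n <= 64; ++n) {
        void* p = std::malloc(n);
        // Usable bytes plus the 8-byte chunk header approximates the physical footprint.
        std::printf("request %2zu: usable %2zu, usable+8 %2zu, formula %2zu\n",
                    n, malloc_usable_size(p), malloc_usable_size(p) + 8, Formula(n));
        std::free(p);
    }
}
```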

@DrahtBot
Contributor

🤔 There hasn't been much activity lately and the CI seems to be failing.

If no one reviewed the current pull request by commit hash, a rebase can be considered. While the CI failure may be a false positive, the CI hasn't been running for some time, so there may be a real issue hiding as well. A rebase triggers the latest CI and makes sure that no silent merge conflicts have snuck in.

@LarryRuane LarryRuane marked this pull request as ready for review February 29, 2024 20:07
@LarryRuane
Contributor Author

Force-pushed to 04f3fbe - rebase only. (CI is still expected to fail, I'm working on that now.)

@LarryRuane LarryRuane force-pushed the 2023-09-MallocUsage branch 2 times, most recently from cee653f to 86b71ec Compare March 1, 2024 23:22
@LarryRuane LarryRuane force-pushed the 2023-09-MallocUsage branch 3 times, most recently from 6bfdad9 to 58ebf90 Compare March 15, 2024 21:32
@LarryRuane
Contributor Author

@martinus
Can you think of why the macOS and Win64 CI jobs are failing (I have only Ubuntu, which passes)? As an experiment, I relaxed the ratio the test requires, so that the resource map only needs to use just less memory than std_map, but even that doesn't pass (that is, the resource map uses more space). Both of those platforms fail the same way (you can see this in the CI results; this is Win64):

D:/a/bitcoin/bitcoin/src/test/pool_tests.cpp(183): error: in "pool_tests/memusage_test": 
  check memusage::DynamicUsage(resource_map) <= memusage::DynamicUsage(std_map) * 100 / 100 
  has failed [466624 > 451088]

On earlier runs, some other tests have failed, but I'm not sure if those are spurious failures.

I'm not sure what to do now, any thoughts or suggestions are welcome!

@sipa
Member

sipa commented Mar 16, 2024

@LarryRuane There is no reason to assume that the memory allocation overhead on those platforms, with substantially different C libraries, would match the formula we inferred on x86_64 and arm Linux.

If our goal is actually accurate memory usage calculations on all systems, we will need to derive a formula for each supported system separately.
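
A hypothetical sketch of what per-system handling could look like structurally; only the final branch reflects the 64-bit glibc assumption used above, and the other branches are placeholders that would each need constants measured on that platform's allocator:

```cpp
#include <algorithm>
#include <cstddef>

inline size_t MallocUsagePerPlatformSketch(size_t alloc)
{
    if (alloc == 0) return 0;
#if defined(__APPLE__) || defined(_WIN32)
    // Placeholder only: macOS libmalloc and the MSVC CRT heap round requests
    // differently and would each need their own measured formula.
    return (alloc + 15) & ~static_cast<size_t>(15);
#else
    // 64-bit glibc-style: 8-byte header, 16-byte alignment, 32-byte minimum chunk.
    return std::max<size_t>(32, (alloc + 8 + 15) & ~static_cast<size_t>(15));
#endif
}
```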

@LarryRuane
Contributor Author

LarryRuane commented Mar 19, 2024

@sipa - I don't think the test failures could be due to differences in memory allocation overhead across platforms. The actual memory allocation overhead isn't being tested at all. To do that would probably require a test that uses /proc/meminfo (which doesn't exist on all platforms) or similar.

I don't mind giving up and closing this PR, but as @martinus said in an earlier comment, this calculation doesn't need to be perfect, just better.

I think this test failure must be due to differences in std::unordered_map implementations. That container performs various allocations for different reasons, and it must be that, with this PR's change in the memory usage calculation, the std::unordered_map on macOS and Win64 (at least) does more of a kind of allocation that the new calculation treats as having higher overhead. Something like that.

[UPDATE: the following discussion about std::unordered_map::reserve() is incorrect; calling reserve() doesn't prevent the bucket array from being reallocated multiple times, and that reallocation doesn't happen within the pool resource -- rather, it uses regular memory, since only allocations up to 128 bytes come from the pool allocator. (I'll probably remove these comments later.)]

I thought I had it figured out; I verified with the debugger that as the test adds the 10k entries to the two maps, it has to repeatedly grow the bucket array. The allocation sizes (in bytes) for the bucket array that I saw here on Ubuntu were: 104, 232, 472, 1026, 2056, 4328, 8872, 18856, 40696, 82184.... The tradeoff one makes when using the pool resource allocator is that allocations which are freed, but whose sizes are never requested again, are in effect leaked (until the resource is destroyed). The pool resource works great when freeing and allocating the same sizes (rounded up to the alignment, usually 8 bytes) repeatedly, which is, of course, the case for the individual map entries.

So the latest force-push simply calls reserve() initially on both maps (this is only really needed for the pool resource map). Debugging verifies that this does prevent the bucket array from being reallocated, here on Ubuntu. But it didn't fix the problem on at least macOS and Win64. Maybe those platforms' implementations of std::unordered_map ignore the reserve() hint?

It's difficult when CI fails on platforms that I don't have access to. Do others have a solution for that? Can I run macOS as a VM on Ubuntu or Windows 10 (I have both)? Maybe I can set up my Windows laptop to build and run the debugger, but that seems like a lot of work.

@martinus
Contributor

martinus commented Mar 20, 2024

@LarryRuane The problem here is that DynamicUsage(const std::unordered_map<X, Y, Z>& m) just estimates the memory usage by assuming that the container internally uses a node with a single pointer (see struct unordered_node : private X in memusage.h). This estimation is probably not correct on all platforms; e.g. Windows seems to use a doubly linked list, so this underestimates actual memory usage.

The inconsistency now comes when using DynamicUsage with a pool resource. Here the actual real-world memory usage is counted (because it uses a custom allocator, there's no need to estimate the node size), and here the different implementations actually affect the result.

So, with your new MallocUsage the numbers are most likely more correct, but this seems to expose the incorrect underestimation in DynamicUsage(const std::unordered_map<X, Y, Z>& m).
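
A simplified sketch of the estimation approach being described (not a verbatim copy of memusage.h): the node is modeled as the stored pair plus a single next-pointer, and each term would be fed through MallocUsage() in the real code. On a standard library whose nodes carry two pointers, this model undercounts.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>

// Model of a hash-table node: the stored element plus one "next" pointer.
// Implementations whose nodes hold two pointers (or also cache the hash)
// are larger than this, so the estimate comes out low on those platforms.
template <typename X>
struct unordered_node : private X
{
private:
    void* ptr;
};

template <typename X, typename Y, typename Z>
size_t EstimatedDynamicUsage(const std::unordered_map<X, Y, Z>& m)
{
    // Per-node cost times the node count, plus the bucket array (one pointer
    // per bucket). The real code wraps each term in MallocUsage().
    return sizeof(unordered_node<std::pair<const X, Y>>) * m.size()
         + sizeof(void*) * m.bucket_count();
}
```

For example, MSVC's std::unordered_map is built on a doubly linked list, which matches the two-pointer case described above.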

@LarryRuane
Contributor Author

@martinus - thanks, that makes perfect sense! I hadn't thought of the possibility of the nodes having two pointers instead of one (a doubly instead of a singly linked list). Seems like a bad design, but anyway.

I don't think there would be much benefit to fixing DynamicUsage(const std::unordered_map<X, Y, Z>& m) to account for this (making it platform-dependent), because this function is no longer important now that we're using the pool resource. It's probably only really used in the test. So it's really just a test problem. I'll see what I can do to fix this.

@DrahtBot DrahtBot marked this pull request as draft May 13, 2024 08:02
@DrahtBot
Contributor

Turned into draft for now, due to failing CI. If this is no longer a WIP, you can move it out of draft.

@LarryRuane
Contributor Author

NOTE, this PR isn't ready to merge; at the very least it has some debug prints that need to be removed.

CI was failing on some platforms (Windows and macOS) in test/functional/mempool_limit.py: test_mid_package_eviction(). I think this test is fragile; a mostly unrelated PR like this one shouldn't be able to cause it to fail, IMO. The problem turned out to be the following:

This test calls the test utility function fill_mempool(), which creates independent, physically large (around 65 kB) transactions at varying fee rates until the one with the lowest feerate is evicted because the mempool has become full. The functional tests run with -maxmempool=5 (MB), so that requires about 72 transactions.

After fill_mempool() returns, the mempool likely won't be completely full, because these were large transactions, so there's some space left but not enough for another transaction. The test, test_mid_package_eviction, then submits a smaller transaction (weight around 8k) called mempool_evicted_tx with a fee just barely high enough to be accepted, and expects that transaction to be accepted. (It has this name, mempool_evicted_tx, because the test expects it to be evicted later.)

With this PR, that transaction was not accepted on Windows and macOS. Some debug prints show that on those platforms, when fill_mempool() returns, there are only 1200 bytes of free space in the mempool. This is because the std::unordered_map library implementations are different on these platforms, and this causes the memory usage calculation (for the entire map) to be slightly different in this PR. So it's really just bad luck. This is why I think the test is fragile.

After trying a few fixes to the test that didn't work, what finally worked (the current branch) is to have fill_mempool(), just before it returns, submit another transaction that replaces (via RBF) an existing one in the mempool but is slightly smaller. This frees up about 10 kB of additional space, which allows room for the 8k transaction to be reliably accepted. This doesn't break the other tests that call fill_mempool().

This attempt didn't work the first time, however, because the mempools wouldn't reliably sync in test/functional/p2p_1p1c_network.py. This test starts 4 nodes, calls fill_mempool() on node 0, then waits for all 4 mempools to become synced. Sometimes they do, but other times this never happens. I wonder if there might be an unrelated bug (I can investigate later), but what often happens (it's timing-dependent and thus random) is that the replacement tx (added by this PR) does appear on all nodes, but a low-feerate tx that was evicted on node 0 remains on nodes 1-3. I can explain what I'm seeing in more detail later if needed. What ended up fixing this is delaying a few seconds before submitting the RBF transaction. It's ugly, and maybe there's a better way, but syncing mempools is somewhat non-deterministic anyway, so I don't think it's terribly bad.

@glozow and @theStack, can you let me know your opinion? You two have worked a lot in this area.

@theStack
Contributor

@LarryRuane: Interesting problem (and very creative solution with the RBF tx :)). Fwiw, on one of my machines running OpenBSD 7.5, the same behaviour as you described occurs on commit 663e501: after fill_mempool, mempool usage is very close to the max ('usage': 4998768), and the subtest test_mid_package_eviction fails on the first MiniWallet tx submission with a "mempool full" error.

One idea I had to solve this on the fill_mempool side is to submit an extra tx at the end of fill_mempool that is smaller (I chose half the size) than the filler txs. Given that this causes an eviction, a large filler tx is removed and the smaller tx gets in, freeing up some space (around 32.5 kB). It's still very ugly, especially since this code is only ever executed for edge cases (note that we can't unconditionally submit the smaller tx, as that could cause the problem rather than solve it if no eviction happens), but it passes all the tests that use fill_mempool for me: theStack@f758e73

However, does this even need to be solved in fill_mempool? I haven't looked closer at the failing test yet, but maybe it's just a bug in the fee calculation (or in MiniWallet's treatment of target_weight).

@LarryRuane
Contributor Author

Your patch is much better, @theStack, thanks! It's simpler and doesn't need that weird sleep. Force-pushed your patch. If it passes CI, I think this is the way to go. I had tried something similar, but I didn't think to make the sending of that extra (smaller) transaction conditional on available mempool space.

@LarryRuane
Contributor Author

CI passed, thanks @theStack. I just force-pushed to swap the two commits; the test commit has to come first so that each commit passes CI.
