dbwrapper: Bump LevelDB max file size to 128 MiB to avoid system slowdown from high disk cache flush rate #30039
base: master
Conversation
@jamesob Feel like benchmarking a reindex or so with this?
Are there any drawbacks to this?
Might partially address #29662
I didn't notice any. It's worth mentioning that the total amount of data stored in this database is at least two orders of magnitude larger than even the 128 MiB file size.
Oh yes, I asked because I like this even from a "leveldb creates less files" point of view; e.g., anecdotally on one of my nodes the counter exceeds 6 digits (bitcoin-core/bitcoin-maintainer-tools#161). Of course, this includes deleted files; the active number is "only" about 6000.
It would be great if there are few or no drawbacks. Do you mind sharing the methods used so far to test this? It would be great to have some data for comparison. Other questions also come to mind (thinking out loud before I dig deeper or perform testing).
That issue is complaining about long compaction times. From https://github.com/bitcoin/bitcoin/blob/master/src/leveldb/include/leveldb/options.h#L111-L112:
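The linked comment reads approximately as follows (quoted from the upstream LevelDB source; exact wording may differ slightly):

```cpp
// Leveldb will write up to this amount of bytes to a file before
// switching to a new one.
// Most clients should leave this parameter alone. However if your
// filesystem is more efficient with larger files, you could
// consider increasing the value. The downside will be longer
// compactions and hence longer latency/performance hiccups.
// Another reason to increase this parameter might be when you are
// initially populating a large database.
size_t max_file_size = 2 * 1024 * 1024;
```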
it seems this change would make compaction times longer, so would exacerbate that issue?
I am simply watching the disk cache flush rate.
Did not observe any such effect; the RAM usage of the Bitcoin process seems to vary within roughly the same bounds when syncing with the Bitcoin network with or without this change.
Maybe, but I don't know whether it makes sense to expose an additional tuning option with respect to, for example, maintenance impact.
For me, the biggest performance impact of compaction is from the disk cache flushes this operation generates.
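To make the mechanism concrete, here is a minimal standalone sketch (not Bitcoin Core or LevelDB code; all names here are made up) of the write-rotate-sync pattern being discussed, which is why throughput ends up bounded by max file size times flush rate:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Write `len` bytes, starting a new output file and issuing fdatasync()
// every time `max_file_size` bytes have been written -- one disk cache
// flush per completed file, analogous to LevelDB's table file rotation.
void write_with_rotation(const char* data, std::size_t len, std::size_t max_file_size)
{
    int file_no = 0;
    char name[32];
    std::snprintf(name, sizeof(name), "out.%06d", file_no);
    int fd = ::open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    std::size_t in_file = 0;

    for (std::size_t off = 0; off < len;) {
        const std::size_t chunk = std::min(len - off, max_file_size - in_file);
        if (::write(fd, data + off, chunk) < 0) break;
        off += chunk;
        in_file += chunk;
        if (in_file == max_file_size) { // file is full: flush, close, rotate
            ::fdatasync(fd);            // this is the disk cache flush
            ::close(fd);
            std::snprintf(name, sizeof(name), "out.%06d", ++file_no);
            fd = ::open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
            in_file = 0;
        }
    }
    ::fdatasync(fd);
    ::close(fd);
}
```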
src/dbwrapper.cpp
Outdated
```diff
@@ -147,6 +147,7 @@ static leveldb::Options GetOptions(size_t nCacheSize)
         // on corruption in later versions.
         options.paranoid_checks = true;
     }
+    options.max_file_size = 128 << 20;
```
Should we make this a constant? Would it be appropriate to reuse MAX_BLOCKFILE_SIZE?
+1 on a constant, but I don't think it's appropriate to reuse MAX_BLOCKFILE_SIZE; better to define a new one.
Added a relevant constant.
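For illustration, a sketch of what the constant version might look like (the constant name here is assumed, not necessarily the one the PR ended up with):

```cpp
#include <leveldb/options.h>
#include <cstddef>

// Sketch only: a dedicated constant, kept separate from MAX_BLOCKFILE_SIZE
// as suggested in the review discussion above.
static constexpr std::size_t MAX_LEVELDB_FILE_SIZE{128 << 20}; // 128 MiB

static leveldb::Options GetOptions(std::size_t nCacheSize)
{
    leveldb::Options options;
    // ... existing cache/compression/paranoid_checks setup elided ...
    options.max_file_size = MAX_LEVELDB_FILE_SIZE;
    return options;
}
```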
Benchmarked IBD with an SSD to block 800k.
From what I understand, this patch reduces the frequency of flushes, but they will take longer when they do occur. This is great for IBD, but for #29662 the issue is an unavoidable compaction at startup. The compaction could potentially take longer with this patch.
That's impressive!
Not only the frequency of flushes; another potential advantage here is that leveldb will spend less time open()ing and close()ing files to maintain its allowed number of open files (e.g. the max_open_files limit).
If there are no drawbacks, why not go even larger?
I ran some benchmarks of IBD to block 800,000 vs master for comparison, and got similar, if slightly less impressive, results with the default dbcache. I only did a single run of each though. Sync was performed from a second local node with the datadir on a separate SSD.
FWIW re: #29662 I did not notice any difference in compaction time at startup on an SSD. It takes about 5 seconds to finish with or without this patch.
Force-pushed from 7f15e71 to 3e32d23.
I used 128 MiB as the new size for commonality with MAX_BLOCKFILE_SIZE. But just to be sure, changed the patch to use a separate constant.
I've played around with this branch a bit, upgrading and downgrading between it and master with existing datadirs on signet and didn't run into any issues.
Also just noting that this will affect all leveldb databases, including the indexes and the block index db.
It appears that RocksDB (a more actively developed derivative of LevelDB) uses a default of 64 MiB (https://github.com/facebook/rocksdb/blob/main/include/rocksdb/advanced_options.h#L468). See also my comment in #30059 (comment).
The default max file size for LevelDB is 2 MiB, which results in the LevelDB compaction code generating ~4 disk cache flushes per second when syncing with the Bitcoin network. These disk cache flushes are triggered by the fdatasync() syscall issued by the LevelDB compaction code when reaching the max file size. If the database is on an HDD, this flush rate brings the whole system to a crawl. It also results in very slow throughput, since 2 MiB * 4 flushes per second is about 8 MiB/second max throughput, while even an old HDD can pull 100-200 MiB/second streaming throughput.

Increase the max file size for LevelDB to 128 MiB instead, so the flush rate drops to about 1 flush per 2 seconds and the system no longer gets so sluggish. The max file size value chosen also matches the MAX_BLOCKFILE_SIZE setting already used by the block storage.
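As a quick sanity check of the arithmetic above (a sketch; the figures are the ones quoted in this description, not newly measured):

```cpp
#include <cstdio>

int main()
{
    const double old_file_mib = 2.0;     // LevelDB default max_file_size
    const double new_file_mib = 128.0;   // value proposed by this PR
    const double old_flush_per_s = 4.0;  // ~4 flushes/second observed
    const double new_flush_per_s = 0.5;  // ~1 flush per 2 seconds

    // Max throughput is bounded by file size times flush rate.
    std::printf("default: ~%.0f MiB/s\n", old_file_mib * old_flush_per_s); // ~8 MiB/s
    std::printf("patched: ~%.0f MiB/s\n", new_file_mib * new_flush_per_s); // ~64 MiB/s
    return 0;
}
```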