validation: sync chainstate to disk after syncing to tip #15218
base: master
Conversation
Concept ACK, but I think it would be better to implement this closer to the validation logic and database-update logic itself.
@laanwj Good point. I refactored to move this behaviour to
Thanks, much better!
I'm not really a fan of this change -- the problem described in #11600 is from an unclean shutdown (i.e. a system crash), where our recovery code could take a long time (but typically much faster than doing a -reindex to recover, which is how our code used to work). This change doesn't really solve that problem; it just changes the window in which an unclean shutdown could occur (reducing it by at most 24 hours). But extra flushes, particularly during initial sync, aren't obviously a good idea, since they harm performance. (Note that we leave IBD before we've synced all the way to the tip -- I think once we're within a day or two?)

Because we flush every day anyway, it's hard for me to say that this is really that much worse, performance-wise (after all, we don't currently support a node configuration where the utxo set is kept entirely cached). But I'm not sure this solves anything either, and a change like this would have to be reverted if, for instance, we wanted to make the cache actually more useful on startup (something I've thought we should do for a while). So I think I'm a -0 on this change.
@sdaftuar This change also greatly improves the common workflow of spinning up a high-performance instance to sync, then immediately shutting it down and using a cheaper one. Currently, you have to log in to it and do a clean shutdown instead of just terminating. Similarly, when syncing to an external drive, you can now just unplug the drive or turn off the machine when finished.

I would argue that moving the window to 0 hours directly after initial sync is an objective improvement. There is a lot of data that would be lost directly after syncing, so why risk another 24 hours? After that point, the most a user would lose is 24 hours' worth of rolling back, instead of 10 years.

Also, this change does not do any extra flushes during initial sync, only after. I can't speak to your last point about changing the way we use the cache, since I don't know what your ideas are.
@andrewtoth We already support this (better, I think) with the

I don't really view data that is in memory as "at risk"; I view it as a massive performance optimization that will allow a node to process new blocks at the fastest possible speed while the data hasn't yet been flushed. I also don't feel very strongly about this for the reasons I gave above, so if others want this behavior then so be it.
@sdaftuar Maybe this is a bit of a different discussion, but there is another option; namely, supporting flushing the dirty state to disk, but without wiping it from the cache. Based on our earlier benchmarking, we wouldn't want to do this purely for maximizing IBD performance, but it could be done at specific times to minimize losses in case of crashes (the once-per-day flush, for example, and also this IBD-is-finished one).
@sipa Agreed. I think that would make a lot more sense as a first-pass optimization for the periodic flushes, and it would also work better for this purpose.
Well, with this, if you "just terminate" you're going to end up with a replay of several days' blocks at start, which is still ugly, even if less bad than before. Aside: if you actually shut off the computer at any time during IBD, you'll likely completely corrupt the state and need to reindex, because we don't use fsync during IBD for performance reasons.

We really need to get background writing going, so that our writes are never more than (say) a week of blocktime behind... but that is a much bigger change, so I don't suggest "just do that instead", though it would make the change here completely unnecessary.

Might it be better to trigger the flush the first time it goes 30 seconds without connecting a block and there are no queued transfers, from the scheduler thread?
@sdaftuar Ahh, I never considered using that for this purpose. Thanks!

@gmaxwell It might still be ugly to have a replay of a few days, but that is much better than making everything unusable for hours. There are comments from several people in this PR about adding background writing and writing dirty state to disk without wiping the cache. This change wouldn't affect either of those improvements, and is an improvement by itself in the interim.

As for moving this to the scheduler thread, I think this is better since it happens in a place where periodic flushes are already expected. Also, checking every 30 seconds for a new block wouldn't work if, for instance, the network cuts out for a few minutes.
@andrewtoth The problem is that right now, causing a flush when exiting IBD will (temporarily) kill your performance right before finishing the sync (because it leaves you with an empty cache). If instead it were a non-clearing flush, there would be no such downside.
My experiment in #15265 has changed my view on this a bit -- now I think that we might as well make a change like this for now, but we should change the approach slightly to do something like @gmaxwell's proposal, so that we don't trigger the flush before we are done syncing.
@sdaftuar @gmaxwell I've updated this to check every 30 seconds on the scheduler thread if there has been an update to the active chain height. This only actually checks after

I'm not sure how to check if there are queued transfers. If this is not sufficient, some guidance on how to do that would be appreciated.
Concept ACK
While this one-time sync after IBD should help in some situations, I'm not sure that it completely resolves #11600 (I encountered this PR while looking into possible improvements to ReplayBlocks()). After all, there are several other situations in which a crash / unclean shutdown could lead to extensive replays (e.g. during IBD) that this PR doesn't address.
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the

Possibly this is due to a silent merge conflict (the changes in this pull request being

Leave a comment here, if you need help tracking down a confusing failure.
@mzumsande @chrisguida Thank you for your reviews and suggestions. I've addressed them and rebased.
When finishing syncing the chainstate to tip, the chainstate is not persisted to disk until 24 hours after startup. This can cause an issue where the unpersisted chainstate must be resynced if bitcoind is not cleanly shut down. If using a large enough `dbcache`, it's possible the entire chainstate from genesis would have to be resynced.

This fixes the issue by persisting the chainstate to disk right after syncing to tip, but not clearing the utxo cache (using the `Sync` method introduced in #17487). This happens by scheduling a call to the new function `SyncCoinsTipAfterChainSync` every 30 seconds. This function checks that the node is out of IBD, and then checks if no new block has been added since the last call. Finally, it checks that there are no blocks currently being downloaded by peers. If all these conditions are met, then the chainstate is persisted and the function is no longer scheduled.

Mitigates #11600.