New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
blockstorage: Separate reindexing from saving new blocks #29975
blockstorage: Separate reindexing from saving new blocks #29975
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
761ed1a
to
39ad8d8
Compare
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
Concept ACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recreated the change to understand it better, commented on what I've noticed.
src/test/blockmanager_tests.cpp
Outdated
@@ -189,7 +189,7 @@ BOOST_AUTO_TEST_CASE(blockmanager_flush_block_file) | |||
// to block 2 location. | |||
CBlockFileInfo* block_data = blockman.GetBlockFileInfo(0); | |||
BOOST_CHECK_EQUAL(block_data->nBlocks, 2); | |||
BOOST_CHECK(blockman.SaveBlockToDisk(block3, /*nHeight=*/3, /*dbp=*/&pos2) == pos2); | |||
blockman.AddToBlockFileInfo(block3, /*nHeight=*/3, /*pos=*/pos2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems to me the related comments needs to be updated in this file (e.g line 184 and 199)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated some comments. The entire test setup probably makes a bit less sense after the refactor, users unfamiliar
with the history might ask themselves why someone could think that Reindex
/ AddToBlockFileInfo
would change the block files so that we'd require a test making sure it doesn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
39ad8d8
to
194e84a
Compare
39ad8d8 to 194e84a: Addressed review feedback by @paplorinc, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 194e84a. I left a lot of comments, but everything looks right here and the code is a lot nicer than before.
return false; | ||
FlatFilePos blockPos{}; | ||
if (dbp) { | ||
blockPos = *dbp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "validation, blockstorage: Separate code paths for reindex and saving new blocks" (a17eaca)
It looks like previously there would have been an error here if dbp->IsNull()
was true, and now there will not be an error. This is probably a good change, since AcceptBlock should not be looking at block positions, just passing them on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in the reindex case; In this case the passed dbp isn't changed (it's now a const arg to AddToBlockFileInfo
). If a dpb was passed to AcceptBlock for which dbp->IsNull()
, the error message ("Failed to find position to write new block to disk") would have been very confusing anyway, because we don't write a block to disk during reindex anyway.
src/node/blockstorage.h
Outdated
* @param[in] pos the position of the block on disk. This must point *after* the | ||
* 8 byte serialization header, at the beginning of the actual block data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "blockstorage: split up FindBlockPos function" (a2ae0a3)
I think this description is technically accurate, but I got confused by it and thought it was wrong because "the 8 byte serialization header" sounds like something that is part of CBlock
serialization, when actually it is referring to separator fields written by WriteBlockToDisk
before the serialized CBlock
.
Would suggest changing comment to "pos: the position of the serialized CBlock on disk. This is the position returned by WriteBlockToDisk pointing at the CBlock, not the separator fields before it."
I would also suggesting adding two more comments to this commit to make it clear what it happening at this stage of the PR.
In WriteBlockToDisk
documentation, "// The pos argument passed to this function is modified by this call. Before this call, it should point to an unused file location where separator fields will be written followed by the serialized CBlock data. After this call, it will point to the beginning of the serialized CBlock data, after the separator fields"
In FindBlockPos
documentation, "// The nAddSize argument passed to this function should include not just the size of the serialized CBlock, but the also size of separator fields which are written before it by WriteBlockToDisk (BLOCK_SERIALIZATION_HEADER_SIZE).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done that, but bundled most of the doc changes to a doc-only commit at the beginning in order to not have unrelated things in the "blockstorage: split up FindBlockPos function" commit.
With 194e84a, since reindexing regenerates undo data, and undo data shouldn't be added until all existing blocks are, it seems like there is no reason for the --- a/src/node/blockstorage.cpp
+++ b/src/node/blockstorage.cpp
@@ -941,22 +941,11 @@ bool BlockManager::FindBlockPos(FlatFilePos& pos, unsigned int nAddSize, unsigne
void BlockManager::AddToBlockFileInfo(const CBlock& block, unsigned int nHeight, const FlatFilePos& pos)
{
LOCK(cs_LastBlockFile);
-
const unsigned int added_size = ::GetSerializeSize(TX_WITH_WITNESS(block));
- const BlockfileType chain_type = BlockfileTypeForHeight(nHeight);
- // -reindex and -reindex-chainstate delete any snapshot chainstate during init
- Assume(chain_type == BlockfileType::NORMAL);
-
const int nFile = pos.nFile;
if (static_cast<int>(m_blockfile_info.size()) <= nFile) {
m_blockfile_info.resize(nFile + 1);
}
-
- const int last_blockfile = m_blockfile_cursors[chain_type]->file_num;
- if (nFile != last_blockfile) {
- // No undo data yet in the new file, so reset our undo-height tracking.
- m_blockfile_cursors[chain_type] = BlockfileCursor{nFile};
- }
m_blockfile_info[nFile].AddBlock(nHeight, block.GetBlockTime());
m_blockfile_info[nFile].nSize = std::max(pos.nPos + added_size, m_blockfile_info[nFile].nSize);
m_dirty_fileinfo.insert(nFile); |
fKnown is true during reindex (and only then), which deletes any existing snapshot chainstate. As a result, this function can never be called wth fKnown set and a snapshot chainstate. Add an Assume for this, and make the code initializing a blockfile cursor for the snapshot conditional on !fKnown. This is a preparation for splitting the reindex logic out of FindBlockPos in the following commits.
194e84a
to
6a22eed
Compare
Thanks for the detailed review @ryanofsky! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 6a22eed. Just some suggested changes were made since the last review: adding more Assume checks, renaming a function, and moving a declaration. Everything still looks good.
ff2ef7d
to
9cf475f
Compare
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 9cf475f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, only left a few nit comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 9cf475f. I left some suggestions to clean up comments in intermediate commits, but everything looks good in the end.
src/node/blockstorage.h
Outdated
@@ -155,6 +155,10 @@ class BlockManager | |||
/** Return false if undo file flushing fails. */ | |||
[[nodiscard]] bool FlushUndoFile(int block_file, bool finalize = false); | |||
|
|||
/** | |||
* The nAddSize argument passed to this function should include not just the size of the serialized CBlock, but the also size of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "doc: Improve doc for functions involved in saving blocks to disk" (f90098e)
This comment is wrong at this point in the PR. The way FindBlockPos works right now, you only pass it the size of the serialized CBlock plus the size of the separator fields when fKnown is false. When fKnown is true, you are supposed to pass it just the size of serialized CBlock, without the size of the separator fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expanded the comment with the fKnown
behavior.
In particular, document the flat file positions expected and returned by functions better. Co-authored-by: Ryan Ofsky <[email protected]>
FindBlockPos does different things depending on whether the block is known or not, as shown by the fact that much of the existing code is conditional on fKnown set or not. If the block position is known (during reindex) the function only updates the block info statistics. It doesn't actually find a block position in this case. This commit removes fKnown and splits up these two code paths by introducing a separate function for the reindex case when the block position is known. It doesn't change behavior.
…new blocks By calling SaveBlockToDisk only when we actually want to save a new block to disk. In the reindex case, we now call UpdateBlockInfo directly from validation. This commit doesn't change behavior.
The new name reflects that it is no longer called with existing blocks for which the position is already known. Returning a FlatFilePos directly simplifies the interface.
Previously, it was possible to move the cursor back to an older file during reindex if blocks are enocuntered out of order during reindex. This would mean that MaxBlockfileNum() would be incorrect, and a wrong DB_LAST_BLOCK could be written to disk. This improves the logic by only ever moving the cursor forward (if possible) but not backwards. Co-authored-by: Martin Zumsande <[email protected]>
9cf475f
to
e41667b
Compare
ACK e41667b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-ACK e41667b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK e41667b. Just improvements to comments since last review.
{ | ||
LOCK(cs_LastBlockFile); | ||
|
||
const unsigned int added_size = ::GetSerializeSize(TX_WITH_WITNESS(block)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "blockstorage: split up FindBlockPos function" (064859b)
note: At this point in the PR, GetSerializeSize is already called by the SaveBlockToDisk function calling it, so calling it again here is a bit inefficient. But this is resolved by the next commit moving the the UpdateBlockInfo call
SaveBlockToDisk
/FindBlockPos
are used for two purposes, depending on whether they are called during reindexing (dbp
set,fKnown = true
) or in the "normal" case when adding new blocks (dbp == nullptr
,fKnown = false
).The actual tasks are quite different
Using one function with many conditional statements for this leads to code that is hard to read / understand and bug-prone:
FindBlockPos
are conditional onfKnown
or!fKnown
#24858 and #27039 were recent bugs directly related to the differences between reindexing and normal mode, and in both cases the simple fix took a long time to be reviewed and merged.
This PR proposes to clean this code up by splitting out the reindex logic into a separate function (
UpdateBlockInfo
) which will be called directly from validation. As a result,SaveBlockToDisk
andFindBlockPos
only need to cover the non-reindex logic.