bootstrap: Sunset using LegacySnapshotHashes for snapshot discovery #31275
Conversation
Instead of reading the diff top-to-bottom, I've added comments in a way that may make understanding the changes easier (at least for me!).
```rust
/// The `get_snapshot_hashes_for_node` parameter is a function that maps a pubkey to its snapshot
/// hashes. This parameter exists to provide a way to test the inner algorithm without needing
/// runtime information such as the ClusterInfo or ValidatorConfig.
fn build_known_snapshot_hashes<'a>(
```
The main changes are here in `build_known_snapshot_hashes()`. I'll come back to add more PR comments; but the main thing to notice is that there are no longer separate parameters to get the full and incremental snapshot hashes per node.
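To make the new shape concrete, here's a rough sketch of the single-closure design (not the PR's exact code): the `get_snapshot_hashes_for_node` parameter and the `SnapshotHash` field names come from the diff hunks below, while the `nodes` parameter, the type aliases, and the return type are illustrative assumptions.

```rust
use std::collections::{HashMap, HashSet};
use solana_sdk::{clock::Slot, hash::Hash, pubkey::Pubkey};

// Illustrative stand-ins for the PR's actual types.
type FullSnapshotHash = (Slot, Hash);
type IncrementalSnapshotHash = (Slot, Hash);
type KnownSnapshotHashes = HashMap<FullSnapshotHash, HashSet<IncrementalSnapshotHash>>;

// Assumed shape: one full hash, optionally paired with an incremental hash.
struct SnapshotHash {
    full: FullSnapshotHash,
    incr: Option<IncrementalSnapshotHash>,
}

/// One closure answers "what snapshot hashes does this node advertise?",
/// replacing the old pair of full/incremental lookups per node.
fn build_known_snapshot_hashes<'a>(
    nodes: impl IntoIterator<Item = &'a Pubkey>,
    get_snapshot_hashes_for_node: impl Fn(&'a Pubkey) -> Option<SnapshotHash>,
) -> KnownSnapshotHashes {
    let mut known_snapshot_hashes = KnownSnapshotHashes::new();
    for node in nodes {
        // A node either advertises one (full, maybe-incremental) pair, or nothing.
        let Some(SnapshotHash { full, incr }) = get_snapshot_hashes_for_node(node) else {
            continue;
        };
        let incrementals = known_snapshot_hashes.entry(full).or_default();
        if let Some(incr) = incr {
            incrementals.insert(incr);
        }
    }
    known_snapshot_hashes
}
```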
```rust
            .map(|hashes| (hashes.full, hashes.incremental))
    };
    // Get the snapshot hashes for a node from CRDS
    let get_snapshot_hashes_for_node = |node| get_snapshot_hashes_for_node(cluster_info, node);
```
Here within `get_snapshot_hashes_from_known_validators()` is where we call `build_known_snapshot_hashes()`. And this line is how we get the full and incremental snapshot hashes from a node.
Before, we would call into `cluster_info`, but now we can use the same function that we later use to ask the peer (i.e. non-known) nodes what their snapshot hashes are (`get_snapshot_hashes_for_node()`). Yay!
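A minimal, self-contained sketch of that reuse (the stub types, `NodeId`, and the hard-coded data below are hypothetical; only the capture-once, reuse-twice pattern mirrors the diff):

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for ClusterInfo and the per-node snapshot hash.
type NodeId = u64;
type SnapshotHash = (u64, u64); // (full slot, incremental slot), purely illustrative

struct ClusterInfoStub {
    hashes: HashMap<NodeId, SnapshotHash>,
}

fn get_snapshot_hashes_for_node(cluster_info: &ClusterInfoStub, node: &NodeId) -> Option<SnapshotHash> {
    cluster_info.hashes.get(node).copied()
}

fn main() {
    let cluster_info = ClusterInfoStub {
        hashes: HashMap::from([(1, (100, 150)), (2, (100, 140))]),
    };

    // Build the lookup closure once, capturing cluster_info...
    let get_snapshot_hashes_for_node = |node| get_snapshot_hashes_for_node(&cluster_info, node);

    // ...use it for the known validators...
    let known: Vec<_> = [1u64, 2, 3].iter().filter_map(&get_snapshot_hashes_for_node).collect();

    // ...and reuse the very same closure for the (non-known) peers.
    let peers: Vec<_> = [2u64, 3].iter().filter_map(&get_snapshot_hashes_for_node).collect();

    println!("known: {known:?}, peers: {peers:?}");
}
```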
```rust
fn get_snapshot_hashes_for_node(cluster_info: &ClusterInfo, node: &Pubkey) -> Option<SnapshotHash> {
    cluster_info.get_snapshot_hashes_for_node(node).map(
```
The old impl had separate functions for getting the full and the incremental snapshot hashes. The full hashes came from `LegacySnapshotHashes` and the incremental hashes came from `SnapshotHashes`. Now we get both from `SnapshotHashes`.
This impl is basically the same as the old `get_highest_incremental_snapshot_hash_for_peer()`, just with a few renames.
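Roughly what that mapping might look like. The CRDS-side struct here (`CrdsSnapshotHashes`, with one full hash plus a list of incrementals) is my guess for illustration, not the exact gossip type; the idea, as in the old `get_highest_incremental_snapshot_hash_for_peer()`, is to keep the full hash and pick the highest incremental:

```rust
type Slot = u64;
type Hash = [u8; 32]; // stand-in for the real hash type

// Guessed shape of the CRDS gossip entry: one full hash, many incrementals.
struct CrdsSnapshotHashes {
    full: (Slot, Hash),
    incremental: Vec<(Slot, Hash)>,
}

// The bootstrap-side type from the diff: one full hash plus, at most,
// the highest incremental hash.
struct SnapshotHash {
    full: (Slot, Hash),
    incr: Option<(Slot, Hash)>,
}

// Keep the full hash; pick the incremental hash with the highest slot, if any.
fn to_snapshot_hash(crds: &CrdsSnapshotHashes) -> SnapshotHash {
    SnapshotHash {
        full: crds.full,
        incr: crds.incremental.iter().max_by_key(|(slot, _)| *slot).copied(),
    }
}
```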
```rust
    let peer_snapshot_hashes = rpc_peers
        .iter()
        .flat_map(|rpc_peer| {
            get_snapshot_hashes_for_node(cluster_info, &rpc_peer.id).map(|snapshot_hash| {
```
And getting the peer snapshot hashes here now becomes a lot easier. Since each node only has one snapshot hash (and not a vec anymore), or `None`, we map and collect up all the `Some` ones.
This inner call to `get_snapshot_hashes_for_node()` does the heavy-ish lifting of getting the node's highest snapshot hash.
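A self-contained sketch of that collect, with made-up `RpcPeer` and `SnapshotHash` stand-ins (the real types, and exactly how the PR pairs peer and hash, may differ):

```rust
// Illustrative stand-ins for the real contact-info and snapshot-hash types.
#[derive(Clone, Debug)]
struct RpcPeer {
    id: u64,
}

#[derive(Clone, Copy, Debug)]
struct SnapshotHash {
    full_slot: u64,
    incr_slot: Option<u64>,
}

fn main() {
    let rpc_peers = vec![RpcPeer { id: 1 }, RpcPeer { id: 2 }, RpcPeer { id: 3 }];

    // Pretend CRDS lookup: peer 2 hasn't advertised any snapshot hashes.
    let get_snapshot_hashes_for_node = |id: u64| match id {
        1 => Some(SnapshotHash { full_slot: 100, incr_slot: Some(150) }),
        3 => Some(SnapshotHash { full_slot: 100, incr_slot: None }),
        _ => None,
    };

    // Each peer yields at most one SnapshotHash, so flat_map keeps the Somes
    // and drops the Nones; no per-peer vec handling needed.
    let peer_snapshot_hashes: Vec<(RpcPeer, SnapshotHash)> = rpc_peers
        .iter()
        .flat_map(|rpc_peer| {
            get_snapshot_hashes_for_node(rpc_peer.id)
                .map(|snapshot_hash| (rpc_peer.clone(), snapshot_hash))
        })
        .collect();

    println!("{peer_snapshot_hashes:?}");
}
```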
```rust
    };
    get_snapshot_hashes_for_node: impl Fn(&'a Pubkey) -> Option<SnapshotHash>,
) -> bool {
    let node_has_snapshot_hashes = |node| get_snapshot_hashes_for_node(node).is_some();
```
Checking whether the known validators have snapshot hashes is now much simpler too. We don't need to check multiple CRDS sources, just the one. And again, since we don't have a vec of snapshot hashes, we can simply ask whether the node has snapshot hashes or not.
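For illustration, the kind of check this enables; note the `all()` quantifier is only an assumption here, since this hunk doesn't show whether the PR requires every known validator to have hashes:

```rust
use std::collections::HashMap;

fn main() {
    // Hypothetical advertised snapshot hashes, keyed by node id:
    // (full slot, incremental slot). Node 3 has advertised nothing.
    let advertised: HashMap<u64, (u64, u64)> =
        HashMap::from([(1, (100, 150)), (2, (100, 140))]);
    let known_validators = [1u64, 2, 3];

    // Single CRDS source, single question per node.
    let get_snapshot_hashes_for_node = |node: &u64| advertised.get(node).copied();
    let node_has_snapshot_hashes = |node| get_snapshot_hashes_for_node(node).is_some();

    // Reports false here, because node 3 has no snapshot hashes.
    let all_have_hashes = known_validators.iter().all(node_has_snapshot_hashes);
    println!("{all_have_hashes}");
}
```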
```rust
        let Some(SnapshotHash {full: full_snapshot_hash, incr: incremental_snapshot_hash}) = get_snapshot_hashes_for_node(node) else {
            continue 'to_next_node;
        };
```
This is the best part. There's at most one full and one incremental snapshot hash per node, instead of vecs of each. So we entirely remove the complexity of the nested loops over them.
```rust
        let known_incremental_snapshot_hashes =
            known_snapshot_hashes.entry(full_snapshot_hash).or_default();
```
Because of this, the whole `else` block on old-line 1053 becomes impossible. With `.entry()` we always have the correct spot to insert an incremental snapshot hash (if there is one).
And then we also don't need to re-find the correct location, like in old-line 1026.
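The `.entry()` observation stands on its own; a minimal standalone example (with made-up slot numbers) of why the separate lookup-or-insert `else` branch disappears:

```rust
use std::collections::{HashMap, HashSet};

fn main() {
    let mut known_snapshot_hashes: HashMap<u64, HashSet<u64>> = HashMap::new();

    // (full slot, optional incremental slot) pairs, as if reported by nodes.
    let reports = [(100u64, Some(150u64)), (100, Some(160)), (200, None)];

    for (full, incr) in reports {
        // entry().or_default() hands back the set for this full hash, creating
        // it on first sight: no separate "was it already there?" else branch,
        // and no second lookup to find where the incremental hash belongs.
        let known_incremental_snapshot_hashes =
            known_snapshot_hashes.entry(full).or_default();
        if let Some(incr) = incr {
            known_incremental_snapshot_hashes.insert(incr);
        }
    }

    // {100: {150, 160}, 200: {}}
    println!("{known_snapshot_hashes:?}");
}
```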
```rust
        // simulate a set of known validators with various snapshot hashes
        let oracle = {
```
In the test for `build_known_snapshot_hashes()`, it's now much easier to generate the oracle as well. Yay!
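As a rough sketch of what such an oracle could look like (all types and values here are made up for illustration): map each simulated validator to its optional snapshot hash and let the test's lookup closure read straight from that map.

```rust
use std::collections::HashMap;

fn main() {
    // Stand-in snapshot hash: (full slot, optional incremental slot).
    type FakeSnapshotHash = (u64, Option<u64>);

    // simulate a set of known validators with various snapshot hashes
    let oracle: HashMap<u64, Option<FakeSnapshotHash>> = HashMap::from([
        (1, Some((100, Some(150)))), // full + incremental
        (2, Some((100, None))),      // full only
        (3, None),                   // nothing advertised
    ]);

    // The test's lookup closure is just a map read.
    let get_snapshot_hashes_for_node = |node: &u64| oracle.get(node).copied().flatten();

    assert_eq!(get_snapshot_hashes_for_node(&1), Some((100, Some(150))));
    assert_eq!(get_snapshot_hashes_for_node(&3), None);
}
```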
Codecov Report
```text
@@            Coverage Diff            @@
##           master   #31275     +/-   ##
=========================================
  Coverage    81.5%    81.5%
=========================================
  Files         732      732
  Lines      206999   206861     -138
=========================================
- Hits       168767   168699      -68
+ Misses      38232    38162      -70
```
```rust
            );
            continue 'to_next_node;
        }
        let Some(SnapshotHash {full: full_snapshot_hash, incr: incremental_snapshot_hash}) = get_snapshot_hashes_for_node(node) else {
```
Brooks is the king of the let/else
```text
@@ -977,134 +923,81 @@ where
    }

    'to_next_node: for node in nodes {
```
I think we can drop the label now that the inner loop is gone
Yes, we can. Would that be preferred? I kinda like the label; since the for-loop is so big, whenever I encounter a `continue` it has the label alongside it. (The for-loop is not nearly as big as it was before, so this may not be an issue anymore.)
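The readability argument in miniature; a trivial standalone example of how the label makes each `continue` self-describing even when it sits far from the `for` header:

```rust
fn main() {
    let nodes = [Some(1u64), None, Some(3)];

    'to_next_node: for node in nodes {
        let Some(node) = node else {
            // The label says exactly where control goes, even when this line is
            // far from the `for` header in a large loop body.
            continue 'to_next_node;
        };
        println!("processing node {node}");
    }
}
```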
Your call. I can see the merits of either
Sounds good. Since this PR has already passed CI, I'd like to leave it. It can be removed in a subsequent PR, and will be trivial to review as well.
1 nit, but this looks great to me!
Problem
The bootstrap code uses both `LegacySnapshotHashes` and `SnapshotHashes` from CRDS to discover snapshots. This sometimes doesn't work out, since these two data sources are not synchronized. So getting the full snapshots from Legacy and the incremental snapshots from non-Legacy can cause bootstrap to incorrectly not download an incremental snapshot. The full context and background can be found here: #26180

Since that issue, we've changed the non-Legacy CRDS data to always send correct, up-to-date full and incremental snapshot hashes together. This obviates the need for using `LegacySnapshotHashes` entirely. We can now robustly discover all snapshot hashes, and can remove many conditional/error/edge cases that could arise when using two CRDS data sources.
Summary of Changes
Remove all use of `LegacySnapshotHashes`.

Note: I've added comments (#31275 (review)) that may be easier to read first, before looking at the diff top-down.
Additional Testing
I bootstrapped a validator running this PR against mnb and confirmed it was successfully able to download snapshots—and the correct ones!
Fixes: #30759
Fixes: #30740
Fixes: #28478
Fixes: #26180