bootstrap: Sunset using LegacySnapshotHashes for snapshot discovery #31275

brooksprumo · 2023-04-19T20:19:09Z

Problem

The bootstrap code uses both LegacySnapshotHashes and SnapshotHashes from CRDS to discover snapshots. This sometimes doesn't work out, since these two data sources are not synchronized. So getting the full snapshots from Legacy and the incremental snapshots from non-Legacy can cause bootstrap to incorrectly not download an incremental snapshot.

The full context and background can be found here:
#26180

Since that issue, we've now changed the non-Legacy CRDS data to always send correct, up-to-date full and incremental snapshot hashes together. This obviates the need for using LegacySnapshotHashes entirely.

We can now robustly discover all snapshot hashes, and can remove many conditional/error/edge cases that could arise when using two CRDS data sources.

Summary of Changes

Remove all use of LegacySnapshotHashes.

Note: I've added comments (#31275 (review)) that may be easier to read first, before looking at the diff top-down.

Additional Testing

I bootstrapped a validator running this PR against mnb and confirmed it was successfully able to download snapshots—and the correct ones!

Fixes: #30759
Fixes: #30740
Fixes: #28478
Fixes: #26180

brooksprumo

Instead of reading the diff top-to-bottom, I've added comments in a way that may make understanding the changes easier (at least for me!).

brooksprumo · 2023-04-19T20:52:53Z

validator/src/bootstrap.rs

+/// The `get_snapshot_hashes_for_node` parameter is a function that map a pubkey to its snapshot
+/// hashes.  This parameter exist to provide a way to test the inner algorithm without needing
+/// runtime information such as the ClusterInfo or ValidatorConfig.
+fn build_known_snapshot_hashes<'a>(


The main changes are here in build_known_snapshot_hashes. I'll come back to add more PR comments; but the main thing to notice is that there are no longer separate parameter to get the full and incremental snapshot hashes per node.

brooksprumo · 2023-04-19T20:55:58Z

validator/src/bootstrap.rs

-            .map(|hashes| (hashes.full, hashes.incremental))
-    };
+    // Get the snapshot hashes for a node from CRDS
+    let get_snapshot_hashes_for_node = |node| get_snapshot_hashes_for_node(cluster_info, node);


Here within get_snapshot_hashes_from_known_validators() is where we call build_known_snapshot_hashes(). And this line is how we get the full and incremental snapshot hashes from a node.

Before, we would call into cluster_info, but now we can use the same function that we later use to ask the peer (i.e. non-known) nodes what their snapshot hashes are (get_snapshot_hashes_for_node()). Yay!

brooksprumo · 2023-04-19T20:59:45Z

validator/src/bootstrap.rs

+fn get_snapshot_hashes_for_node(cluster_info: &ClusterInfo, node: &Pubkey) -> Option<SnapshotHash> {
+    cluster_info.get_snapshot_hashes_for_node(node).map(


The old impl had separate functions for getting the full and the incremental snapshot hashes. The full hashes came from LegacySnapshotHashes and the incremental hashes came from SnapshotHashes. Now we get both from SnapshotHashes.

This impl is basically the same as the old get_highest_incremental_snapshot_hash_for_peer(), just with a few renames.

brooksprumo · 2023-04-19T21:03:55Z

validator/src/bootstrap.rs

+    let peer_snapshot_hashes = rpc_peers
+        .iter()
+        .flat_map(|rpc_peer| {
+            get_snapshot_hashes_for_node(cluster_info, &rpc_peer.id).map(|snapshot_hash| {


And getting the peer snapshot hashes here now becomes a lot easier. Since each node only has one snapshot hash (and not a vec anymore), or None, we map and collect up all the Some ones.

This inner call to get_snapshot_hashes_for_node() does the heavy-ish lifting of getting the node's highest snapshot hash.

brooksprumo · 2023-04-19T21:05:36Z

validator/src/bootstrap.rs

-    };
+    get_snapshot_hashes_for_node: impl Fn(&'a Pubkey) -> Option<SnapshotHash>,
+) -> bool {
+    let node_has_snapshot_hashes = |node| get_snapshot_hashes_for_node(node).is_some();


Checking if the known validators now have snapshot hashes is now much simpler too. We don't need to check multiple CRDS sources, just the one. And again since we don't have a vec of snapshot hashes, we can simply ask if the node has snapshot hashes or not.

brooksprumo · 2023-04-19T21:07:36Z

validator/src/bootstrap.rs

+        let Some(SnapshotHash {full: full_snapshot_hash, incr: incremental_snapshot_hash}) = get_snapshot_hashes_for_node(node) else {
+            continue 'to_next_node;
+        };


This is the best part. There's only a maximum of one full and incremental snapshot hash per node, instead of vecs of each. So we entirely remove the complexity of looping over both possible loops.

brooksprumo · 2023-04-19T21:11:31Z

validator/src/bootstrap.rs

+        let known_incremental_snapshot_hashes =
+            known_snapshot_hashes.entry(full_snapshot_hash).or_default();


Because of this, the whole else block on old-line 1053 becomes impossible. With .entry() we always have the correct spot to insert an incremental snapshot hash (if there is one).

And then we also don't need to re-find the correct location, like in old-line 1026.

brooksprumo · 2023-04-19T21:14:24Z

validator/src/bootstrap.rs

+        // simulate a set of known validators with various snapshot hashes
+        let oracle = {


In the test for build_known_snapshot_hashes(), it's now much easier to generate the oracle as well. Yay!

codecov · 2023-04-19T21:55:34Z

Codecov Report

Merging #31275 (f3932c8) into master (b4ea104) will increase coverage by 0.0%.
The diff coverage is 70.3%.

@@           Coverage Diff            @@
##           master   #31275    +/-   ##
========================================
  Coverage    81.5%    81.5%            
========================================
  Files         732      732            
  Lines      206999   206861   -138     
========================================
- Hits       168767   168699    -68     
+ Misses      38232    38162    -70

bw-solana · 2023-04-20T14:00:40Z

validator/src/bootstrap.rs

-                );
-                continue 'to_next_node;
-            }
+        let Some(SnapshotHash {full: full_snapshot_hash, incr: incremental_snapshot_hash}) = get_snapshot_hashes_for_node(node) else {


Brooks is the king of the let/else

bw-solana · 2023-04-20T14:02:54Z

validator/src/bootstrap.rs

@@ -977,134 +923,81 @@ where
    }

    'to_next_node: for node in nodes {


I think we can drop the label now that the inner loop is gone

Yes, we can. Would that be preferred? I kinda like the label; since the for-loop is so big, whenever I encounter a continue it has the label alongside it. (The for-loop is not nearly as big as it was before, so this may not be an issue anymore.)

Your call. I can see the merits of either

Sounds good. Since this PR has already passed CI, I'd like to leave it. It can be removed in a subsequent PR, and will be trivial to review as well.

bw-solana · 2023-04-20T14:15:38Z

Winning

bw-solana

1 nit, but this looks great to me!

brooksprumo self-assigned this Apr 19, 2023

bootstrap: Sunset using LegacySnapshotHashes for snapshot discovery

f3932c8

brooksprumo force-pushed the bsb/bootstrap/remove-legacy-snapshot-hashes branch from 91f9215 to f3932c8 Compare April 19, 2023 20:44

brooksprumo commented Apr 19, 2023

View reviewed changes

brooksprumo marked this pull request as ready for review April 19, 2023 22:19

brooksprumo requested a review from bw-solana April 19, 2023 22:19

bw-solana reviewed Apr 20, 2023

View reviewed changes

bw-solana approved these changes Apr 20, 2023

View reviewed changes

brooksprumo merged commit 04bbf3b into solana-labs:master Apr 20, 2023

brooksprumo deleted the bsb/bootstrap/remove-legacy-snapshot-hashes branch April 20, 2023 14:33

brooksprumo linked an issue Apr 20, 2023 that may be closed by this pull request

Bootstrap cannot differentiate between "incremental snapshots don't exist" and "incremental snapshots haven't been seen yet" #30759

Closed

brooksprumo mentioned this pull request Oct 7, 2023

Stops pushing legacy snapshot hashes to crds #33576

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bootstrap: Sunset using LegacySnapshotHashes for snapshot discovery #31275

bootstrap: Sunset using LegacySnapshotHashes for snapshot discovery #31275

brooksprumo commented Apr 19, 2023 •

edited

Loading

brooksprumo left a comment

brooksprumo Apr 19, 2023

brooksprumo Apr 19, 2023

brooksprumo Apr 19, 2023

brooksprumo Apr 19, 2023

brooksprumo Apr 19, 2023

brooksprumo Apr 19, 2023

brooksprumo Apr 19, 2023

brooksprumo Apr 19, 2023

codecov bot commented Apr 19, 2023

bw-solana Apr 20, 2023

bw-solana Apr 20, 2023

brooksprumo Apr 20, 2023

bw-solana Apr 20, 2023

brooksprumo Apr 20, 2023

bw-solana commented Apr 20, 2023

bw-solana left a comment

		fn get_snapshot_hashes_for_node(cluster_info: &ClusterInfo, node: &Pubkey) -> Option<SnapshotHash> {
		cluster_info.get_snapshot_hashes_for_node(node).map(

		let known_incremental_snapshot_hashes =
		known_snapshot_hashes.entry(full_snapshot_hash).or_default();

		// simulate a set of known validators with various snapshot hashes
		let oracle = {

		@@ -977,134 +923,81 @@ where
		}

		'to_next_node: for node in nodes {

bootstrap: Sunset using LegacySnapshotHashes for snapshot discovery #31275

bootstrap: Sunset using LegacySnapshotHashes for snapshot discovery #31275

Conversation

brooksprumo commented Apr 19, 2023 • edited Loading

Problem

Summary of Changes

Additional Testing

brooksprumo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Apr 19, 2023

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bw-solana commented Apr 20, 2023

bw-solana left a comment

Choose a reason for hiding this comment

brooksprumo commented Apr 19, 2023 •

edited

Loading