-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bootstrap cannot differentiate between "incremental snapshots don't exist" and "incremental snapshots haven't been seen yet" #30759
Comments
In the solution 1 case, wdyt about the new crds value only holding a single full snapshot hash and Option representing the latest it holds? The node may actually store more in case an older snapshot is requested |
Yeah, that likely should work. I'm reluctant to add more crds values for snapshots based on previous comments I've seen from @behzadnouri. I'm not sure what the process is for being able to deprecate and remove old crds values. Having only the single latest full snapshot and (optional) single latest incremental snapshot would be more straight forward in the bootstrap code, and be less data sent to gossip. |
Yeah, this is going to be a very painful slow process. see the "Update plan:" here: #29596 That being said it might be the right option to go with if in the longer term there are fewer values to gossip (as in the case of new |
Gotcha, yeah figured as much. How do the patches roll out across different clusters? Are feature-gates needed/used, or do targeted backports happen (i.e. a PR is only backported to
The two snapshot CRDS values combined are allowed to hold up to struct NewSnapshotCrdsValue {
full: (Slot, Hash),
incremental: Option<(Slot, Hash)>,
} (and could even try out changing Is that sufficient savings for it to be worth it? Footnotes
|
Just as new releases get adopted by the cluster.
yeah, probably. |
Might be nice to reserve space in a struct NewSnapshotCrdsValue {
latest: Vec<(SnapshotType, Slot, Hash)>,
} where |
Yeah, that makes sense. I like the idea of planning for the future, but I'm also a little wary of an open-ended @behzadnouri Do you have thoughts on how to handle/plan for forwards compatibility? I imagine smaller size of the data is good for performance, and also adding new versions of values can be painful... |
I think it would be better to just add a new enum variant here: |
Problem
Bootstrap runs into issues when it is discovering snapshots due to full snapshots and incremental snapshots gossiping out separately. This makes it impossible to know if an incremental snapshot does not exist yet, or if incremental snapshots just haven't been seen yet in gossip.
The current impl was an issue for a cluster restart. More context here #30740
For example:
scenario 1:
scenario 2:
scenario 3:
Possible Solutions
Solution 1: Combine full and incremental snapshots together into a new CRDS tag
Solution 2: Push a sentinel "empty" incremental snapshot on new full snapshot generation
[0; 32]
, or allFF
s too. I like the lower slot better, because we know it's invalid to have an incremental snapshot with a slot lower than its full snapshot's slot.If we could go back in time, maybe Solution 1 would be better, and we could design it to accommodate more changes in the future. For the current network, likely Solution 2 is more practical.
Edit: Looks like Solution 1 has buy in!
The text was updated successfully, but these errors were encountered: