docs: add docs and details for the differential state archive #7049

Open
wants to merge 1 commit into
base: feature/differential-archive
@@ -0,0 +1,79 @@
---
title: Understanding Historical State Regeneration
---

# Understanding Historical State Regeneration

To run a blockchain client and establish consensus, we need the latest headers and fork choice data. This operation does not require access to historical data, especially for epochs that are already finalized. Storing the full state for every finalized slot increases the storage requirement significantly and is not suitable for running a node over a long period of time.

## Solution

To overcome the storage problem for archive nodes, we implemented the following algorithm to store and fetch historical states.

**Approach**

Assume the following chain represents the state object at every slot, with the diff layer configuration `1,2,3,5`. Assuming 8 slots per epoch, this layer configuration implies:

1. We store a snapshot every 5th epoch.
2. We take a diff every epoch, every 2nd epoch, and every 3rd epoch.

Please see the following table for a better understanding of these layers.

![historical-regen](docs/static/images/historical-regen/historical-regen.png)

These are the rules we follow:

1. If the frequencies of two layers collide on one slot, we use the lower layer. These slots are shown with a black border.
2. The lowest layer is called the snapshot layer, and we store the fully serialized bytes of the state object for that slot.
3. We always try to find the shortest hierarchical path to the snapshot layer, starting from the topmost layer.
4. For the rest of the layers, we recursively compute the binary difference and store only the diffs on the upper layers.

Let's take a few scenarios:

1. For slot `0` all layers collide, so we use the lowest layer, which is the snapshot layer. So for slot `0` we store and fetch the snapshot.
2. For slots (0-7) within the first epoch, there is no intermediary layer, so we read the snapshot from slot `0`.
3. For slots (8-15) the path we follow is `8 -> 0`. For example, for slot `12` we apply the diff from slot `8` to the snapshot from slot `0`, then replay blocks 9-12.
4. For slot `18` the shortest path to the nearest snapshot is `16 -> 0`, and the rest follows the same steps as above.
5. For slot `34` the path we follow is `32 -> 24 -> 0`.
6. For slot `41` the path to the nearest snapshot is just one layer, directly at slot `40`.
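
The scenarios above can be reproduced with a small path-finding routine. The following is a minimal sketch under the layer semantics described in this section, not the actual implementation. The function name `diffPath` and its parameters are hypothetical and chosen only for illustration.

```typescript
// Hypothetical helper for illustration only; not taken from the code base.
// Returns the slots whose stored entries (diffs, then a snapshot) are needed
// to rebuild the nearest stored state at or before `slot`.
function diffPath(slot: number, layersInEpochs: number[], slotsPerEpoch: number): number[] {
  // Layer intervals in slots: top layer (most frequent) first, snapshot layer last.
  const intervals = [...layersInEpochs].sort((a, b) => a - b).map((e) => e * slotsPerEpoch);
  const snapshotIndex = intervals.length - 1;
  const path: number[] = [];
  let current = slot;
  let layer = 0;

  while (true) {
    // Latest slot at or before `current` that belongs to this layer.
    const stored = Math.floor(current / intervals[layer]) * intervals[layer];

    // Rule 1: when several layers collide on a slot, the lowest layer wins.
    let lowest = layer;
    for (let i = layer + 1; i <= snapshotIndex; i++) {
      if (stored % intervals[i] === 0) lowest = i;
    }

    path.push(stored);
    if (lowest === snapshotIndex) return path; // reached a snapshot
    current = stored;
    layer = lowest + 1; // continue on the next layer below
  }
}

// Layers `1,2,3,5` with 8 slots per epoch, as in the scenarios above.
console.log(diffPath(34, [1, 2, 3, 5], 8)); // path 32 -> 24 -> 0
console.log(diffPath(41, [1, 2, 3, 5], 8)); // path 40 (snapshot only)
```

The first slot in the returned path is the nearest stored state, so the blocks to replay are those from that slot plus one up to the requested slot.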

As you can see, this approach finds shorter paths with fewer diffs to apply. Applying them generates the nearest full state and reduces the number of blocks we have to replay to reach the target slot.

**Constants**

To derive the right values for layers, we developed a mathematical approach that provides an estimation based on different parameters in the system.

$$
\begin{align*}
Cost &= \frac{w_{s} \times TotalStorage + w_{b} \times BackupTime + w_{r} \times RestoreTime}{(n \times T_{diff} + T_{full}) \times G_{max}} + (G_{min} \times T_{replay}) \\
TotalStorage &= F \times S_{full} + \sum\limits_{i=1}^{n}D_{i}\times S_{diff}\\
BackupTime &= F \times T_{full} + \sum\limits_{i=1}^{n}D_{i} \times T_{diff}\\
RestoreTime &= F \times R_{full} + \sum\limits_{i=1}^{n}D_{i} \times R_{diff}\\
\\
\text{Where}\\
\\
F &= \text{Frequency of full backup}\\
D_{i} &= \text{Frequency of differential backup for layer } i\\
S_{full} &= \text{Size of full backup}\\
S_{diff} &= \text{Size of differential backup}\\
T_{full} &= \text{Time to take full backup}\\
T_{diff} &= \text{Time to take differential backup}\\
T_{replay} &= \text{Time to replay a block}\\
R_{full} &= \text{Time to restore full backup}\\
R_{diff} &= \text{Time to restore differential backup}\\
G_{max} &= \text{Max gap between backups (usually the snapshot gap)}\\
G_{min} &= \text{Minimum gap between backups (usually the top layer gap)}\\
w_{s} &= \text{Weight for total storage}\\
w_{b}&= \text{Weight for total backup time}\\
w_{r} &= \text{Weight for total restore time}\\
n &= \text{Number of differential layers}\\
\end{align*}
$$

Because the system has many parameters and we don't have accurate values for all of them, we started with a few possible estimates. Also, as the chain is an ever-growing data structure, the value for `F` is not finite. We decided to base this estimation on a 30-day time period and `mainnet` parameters.

Based on these assumptions and the system parameters, we decided on the following constants.

| Name                | Value           | Description              |
| ------------------- | --------------- | ------------------------ |
| DEFAULT_DIFF_LAYERS | 8, 32, 128, 512 | Default value for layers |
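
To make the `TotalStorage` term concrete, here is a rough sketch that counts how many entries each layer would store over the 30-day estimation period. It assumes the default layers are expressed in epochs (as in the `1,2,3,5` example), interprets `F` and `D_i` as entry counts within that period, and uses the mainnet values of 32 slots per epoch and 12 seconds per slot. The function names are hypothetical, and the sizes passed to `totalStorage` are placeholders to be replaced with measured values.

```typescript
// Mainnet consensus constants.
const SLOTS_PER_EPOCH = 32;
const SECONDS_PER_SLOT = 12;

// Hypothetical helper: counts how many epochs in [0, totalEpochs) are stored
// on each layer. When several layers collide on an epoch, only the lowest
// layer (the one with the largest interval) stores it.
// Result order: top layer first, snapshot layer last.
function countStoredPerLayer(layersInEpochs: number[], totalEpochs: number): number[] {
  const layers = [...layersInEpochs].sort((a, b) => a - b);
  const counts = layers.map(() => 0);
  for (let epoch = 0; epoch < totalEpochs; epoch++) {
    for (let i = layers.length - 1; i >= 0; i--) {
      if (epoch % layers[i] === 0) {
        counts[i]++;
        break;
      }
    }
  }
  return counts;
}

// TotalStorage = F * S_full + sum(D_i * S_diff), with F and D_i taken as the
// number of snapshots and diffs stored in the period (an assumption).
function totalStorage(counts: number[], sFull: number, sDiff: number): number {
  const snapshots = counts[counts.length - 1];
  const diffs = counts.slice(0, -1).reduce((sum, d) => sum + d, 0);
  return snapshots * sFull + diffs * sDiff;
}

const epochsIn30Days = (30 * 24 * 60 * 60) / (SECONDS_PER_SLOT * SLOTS_PER_EPOCH); // 6750
const counts = countStoredPerLayer([8, 32, 128, 512], epochsIn30Days);
console.log(counts); // 633, 158, 39 diffs per layer and 14 snapshots
```

Calling `totalStorage(counts, sFull, sDiff)` with measured snapshot and diff sizes then gives the storage estimate for the period.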