docs: add docs and details for the differential state archive #7049

Open

nazarhussain wants to merge 1 commit into feature/differential-archive from nh/state-diff-docs

Contributor

nazarhussain commented Aug 24, 2024

Motivation

Add docs explaining differential state archive.

Description

Keep the reference docs updated.

Steps to test or reproduce

Run all checks


          Add state diff docs

8a1de32

nazarhussain requested a review from a team as a code owner

August 24, 2024 11:50

nazarhussain self-assigned this

codecov bot commented Aug 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 49.25%. Comparing base (6d01593) to head (8a1de32).

Additional details and impacted files

@@                      Coverage Diff                      @@
##           feature/differential-archive    #7049   +/-   ##
=============================================================
  Coverage                         49.24%   49.25%           
=============================================================
  Files                               578      578           
  Lines                             37443    37443           
  Branches                           2168     2172    +4     
=============================================================
+ Hits                              18440    18441    +1     
+ Misses                            18963    18962    -1     
  Partials                             40       40

twoeths reviewed

docs/pages/contribution/advanced-topics/historical-state-regen.md

		@@ -0,0 +1,79 @@
		---
		title: Understanding Historical Sate Regeneration

Contributor

twoeths Aug 26, 2024

Suggested change

      
            title: Understanding Historical Sate Regeneration
          
            title: Understanding Historical State Regeneration

twoeths reviewed

docs/pages/contribution/advanced-topics/historical-state-regen.md


		Approach

		Assume we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, The following configuration for layers implies:

Contributor

twoeths Aug 26, 2024

Suggested change

      
            Assume we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, The following configuration for layers implies:
          
            Assume we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, the following configuration for layers implies:

Member

philknows Aug 26, 2024

Suggested change

      
            Assume we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, The following configuration for layers implies:
          
            Assume we have the following chain representing the state object of every slot, with the following diff layer configurations `1,2,3,5`. With the assumption that we have 8 slots each epoch, the following configuration for layers implies:

nflaig reviewed

docs/pages/contribution/advanced-topics/historical-state-regen.md


		# Understanding Historical Sate Regeneration

		To run a blockchain client and establish consensus we need latest headers and forkchoice data. This operation does not require to historical data, specially after the epochs which are finalized. Storing the full state information for the finalized slots increase the storage requirement a lot and not suitable for running the node for long time.

Member

nflaig Aug 26, 2024 (edited)

Suggested change

      
            To run a blockchain client and establish consensus we need latest headers and forkchoice data. This operation does not require to historical data, specially after the epochs which are finalized. Storing the full state information for the finalized slots increase the storage requirement a lot and not suitable for running the node for long time.
          
            To run a blockchain client and establish consensus we need latest headers and fork choice data. This operation does not require access to historical data, especially after the epochs which are finalized. Storing the full state information for the finalized slots increase the storage requirement a lot and not suitable for running the node for long time.

docs/pages/contribution/advanced-topics/historical-state-regen.md


		## Solution

		To overcome the storage problem for the archive nodes we implemented following algorithm to store and fetch the historical sates.

Member

nflaig Aug 26, 2024

Suggested change

      
            To overcome the storage problem for the archive nodes we implemented following algorithm to store and fetch the historical sates.
          
            To overcome the storage problem for the archive nodes, we implemented the following algorithm to store and fetch the historical states.

docs/pages/contribution/advanced-topics/historical-state-regen.md


		Approach

		Assume we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, The following configuration for layers implies:

Member

nflaig Aug 26, 2024

Suggested change

      
            Assume we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, The following configuration for layers implies:
          
            Assume we have the following chain which represents the state object every slot, with following diff layer configurations `1,2,3,5`. With the assumption that we have 8 slots each epoch, the following configuration for layers implies:

docs/pages/contribution/advanced-topics/historical-state-regen.md

+. For slot `34` the path we follow `32 -> 24 -> 0`.
+. For slot `41` path for the nearest snapshot slot is just one layer directly at slot `40`.
+              As you can see with this approach we can find a shorter paths with smaller number of diffs to apply, which generate the nearest full state and reduce the number of blocks we have to replay to reach to actual slot.

Member

nflaig Aug 26, 2024

Suggested change

      
            As you can see with this approach we can find a shorter paths with smaller number of diffs to apply, which generate the nearest full state and reduce the number of blocks we have to replay to reach to actual slot.
          
            As you can see with this approach we can find shorter paths with smaller number of diffs to apply, which generate the nearest full state and reduce the number of blocks we have to replay to reach the actual slot.
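
The path selection described in the scenarios above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation from this PR: `diffPath`, `blocksToReplay` and `SLOTS_PER_EPOCH` are hypothetical names, the layer configuration `1,2,3,5` is read as "a diff every 1, 2 and 3 epochs and a snapshot every 5 epochs", and an epoch is taken to be 8 slots as in the quoted example.

```typescript
// Hypothetical sketch, not the code from this PR.
// Assumption from the quoted example: 8 slots per epoch (mainnet uses 32).
const SLOTS_PER_EPOCH = 8;

/**
 * Returns the slots whose stored data must be read to rebuild the nearest
 * full state at or before `slot`, snapshot first.
 * `layersInEpochs` is ascending; its last entry is the snapshot layer.
 */
function diffPath(slot: number, layersInEpochs: number[] = [1, 2, 3, 5]): number[] {
  // Walk from the least frequent layer (snapshot) to the most frequent one.
  const intervals = [...layersInEpochs].reverse().map((epochs) => epochs * SLOTS_PER_EPOCH);
  const path: number[] = [];
  for (const interval of intervals) {
    // Latest slot of this layer that is not after the requested slot.
    const layerSlot = Math.floor(slot / interval) * interval;
    // Keep it only if it gets us closer than the layers already chosen;
    // when layers collide on a slot, the lower (less frequent) layer wins.
    if (path.length === 0 || layerSlot > path[path.length - 1]) {
      path.push(layerSlot);
    }
  }
  return path;
}

/** Number of blocks to replay after the last diff on the path has been applied. */
function blocksToReplay(slot: number, path: number[]): number {
  return slot - path[path.length - 1];
}
```

Under these assumptions, `[...diffPath(34)].reverse().join(" -> ")` gives `32 -> 24 -> 0`, `diffPath(41)` contains only the snapshot at slot `40`, and slot `12` needs the diff at slot `8` on top of the snapshot at slot `0` followed by a replay of 4 blocks (9-12).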

docs/pages/contribution/advanced-topics/historical-state-regen.md

+              \end{align*}
+              $$
+              As there are lot of parameters in the system and we don't have accurate values for these so we started few possible estimates. Also as the chain is ever growing data structure the value for `F` is not finite. We decided to do this estimation based on 30 days time period and `mainnet` parameters.

Member

nflaig Aug 26, 2024

Suggested change

      
            As there are lot of parameters in the system and we don't have accurate values for these so we started few possible estimates. Also as the chain is ever growing data structure the value for `F` is not finite. We decided to do this estimation based on 30 days time period and `mainnet` parameters.
          
            As there are lot of parameters in the system and we don't have accurate values for these so we started few possible estimates. Also as the chain is an ever-growing data structure, the value for `F` is not finite. We decided to do this estimation based on 30 days time period and `mainnet` parameters.
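
For a sense of scale, the 30-day window mentioned above can be converted into slots and epochs. This is a rough worked figure that assumes the standard `mainnet` timing of 12 seconds per slot and 32 slots per epoch; those constants are not stated in this thread.

$$
\begin{align*}
\text{slots in 30 days} &= \frac{30 \times 86400}{12} = 216000\\
\text{epochs in 30 days} &= \frac{216000}{32} = 6750
\end{align*}
$$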

docs/pages/contribution/advanced-topics/historical-state-regen.md

+              T_{diff} &= \text{Time to take differential backup}\\
+              T_{replay} &= \text{Time to replay a block}\\
+              R_{full} &= \text{Time to restore full backup}\\
+              R_{diff} &= \text{Tiem to restore differential backup}\\

Member

nflaig Aug 26, 2024

Suggested change

      
            R_{diff} &= \text{Tiem to restore differential backup}\\
          
            R_{diff} &= \text{Time to restore differential backup}\\

docs/pages/contribution/advanced-topics/historical-state-regen.md

+              title: Understanding Historical Sate Regeneration
+              ---
+              # Understanding Historical Sate Regeneration

Member

nflaig Aug 26, 2024

Suggested change

      
            # Understanding Historical Sate Regeneration
          
            # Understanding Historical State Regeneration

philknows reviewed

docs/pages/contribution/advanced-topics/historical-state-regen.md


		Based on these assumptions and system we decided for the following constants.

		| Name | Value | Description |

Member

philknows Aug 26, 2024

Please also run `yarn docs:lint:fix` so that prettier fixes this table.

ensi321 reviewed

docs/pages/contribution/advanced-topics/historical-state-regen.md


		# Understanding Historical Sate Regeneration

		To run a blockchain client and establish consensus we need latest headers and forkchoice data. This operation does not require to historical data, specially after the epochs which are finalized. Storing the full state information for the finalized slots increase the storage requirement a lot and not suitable for running the node for long time.

Contributor

ensi321 Aug 27, 2024

Suggested change

      
            To run a blockchain client and establish consensus we need latest headers and forkchoice data. This operation does not require to historical data, specially after the epochs which are finalized. Storing the full state information for the finalized slots increase the storage requirement a lot and not suitable for running the node for long time.
          
            To run a blockchain client and establish consensus we need the latest headers and forkchoice data. This operation does not require to historical data, specially after the epochs are being finalized. Storing the full state information for the finalized slots increase the storage requirement a lot and is not suitable for running the node for long time.

docs/pages/contribution/advanced-topics/historical-state-regen.md


		Approach

		Assume we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, The following configuration for layers implies:

Contributor

ensi321 Aug 27, 2024

Suggested change

      
            Assume we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, The following configuration for layers implies:
          
            Assuming we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, The following configuration for layers implies:

docs/pages/contribution/advanced-topics/historical-state-regen.md

+              Let's take few scenarios:
+ 1. For slot `0` all layers collide, so we use the lowest layer which is the snapshot layer. So for the slot `0` we store and fetch the snapshot.
+ 2. For slots (0-7) within first epoch we there is no intermediary layer, so we read the snapshot from slot `0`.

Contributor

ensi321 Aug 27, 2024

Suggested change

      
            2. For slots (0-7) within first epoch we there is no intermediary layer, so we read the snapshot from slot `0`.
          
            2. For slots (0-7) within the first epoch, there is no intermediary layer, so we read the snapshot from slot `0`.

docs/pages/contribution/advanced-topics/historical-state-regen.md

+ 1. For slot `0` all layers collide, so we use the lowest layer which is the snapshot layer. So for the slot `0` we store and fetch the snapshot.
+ 2. For slots (0-7) within first epoch we there is no intermediary layer, so we read the snapshot from slot `0`.
+ 3. For slots (8-15) the path we follow is `8 -> 0`. e.g. For slot `12`, we apply diff from slot `8` on snapshot from slot `0`. Then we replay blocks from 9-12.
+ 4. For slot `18` the shortest path to nearest snapshot is `16 -> 0` and rest will follow same as above.

Contributor

ensi321 Aug 27, 2024

Suggested change

      
            4. For slot `18` the shortest path to nearest snapshot is `16 -> 0` and rest will follow same as above.
          
            4. For slot `18` the shortest path to nearest snapshot is `16 -> 0` and the rest will follow same as above.

docs/pages/contribution/advanced-topics/historical-state-regen.md

+              \end{align*}
+              $$
+              As there are lot of parameters in the system and we don't have accurate values for these so we started few possible estimates. Also as the chain is ever growing data structure the value for `F` is not finite. We decided to do this estimation based on 30 days time period and `mainnet` parameters.

Contributor

ensi321 Aug 27, 2024

Suggested change

      
            As there are lot of parameters in the system and we don't have accurate values for these so we started few possible estimates. Also as the chain is ever growing data structure the value for `F` is not finite. We decided to do this estimation based on 30 days time period and `mainnet` parameters.
          
            As there are a lot of parameters in the system and we don't have accurate values for these, we started with a few possible estimates. Also, as the chain is an ever-growing data structure, the value for `F` is not finite. We decided to do this estimation based on a 30-day time period and `mainnet` parameters.
