eth/protocols/eth: implement eth/69 #29158

MariusVanDerWijden · 2024-03-04T11:03:03Z

This PR implements eth/69 which drops the bloom filter from the receipts messages.
This will reduce the amount of data needed for a sync by ~530GB (2.3B txs * 256 byte) uncompressed.
Compressed this will be reduced to ~100GB

eth/protocols/eth/protocol.go

core/blockchain_reader.go

eth/protocols/eth/handler.go

eth/protocols/eth/handlers.go

core/types/receipt.go

holiman · 2024-08-20T08:53:31Z

Please link to the protocol specification in the PR description.

core/types/receipt_test.go

eth/protocols/eth/handlers.go

holiman · 2024-08-20T10:11:04Z

eth/protocols/eth/handlers.go

+				continue
+			}
+		} else {
+			block := chain.GetBlockByHash(hash)


chain.GetBlockByHash

GetBlockByHash retrieves a block from the database by hash, caching it if found.

(emphasis mine). Are we sure we want to do that? I mean, maybe, I don't know, but worth considering.

Also, there are basically two paths here:

The header/block has no txs / recipts

The header/block has txs / receipts

(error-path: the header/block has it, but we're missing it)

For the two paths, we can distinguish between them if we first read the header, and check the receipts-root. The current code unconditionally tries to load the receipts, which reads header number and then loads from ancients or regular-database. And then, it always needs to read the header anyway (either explicitly or implicitly when loading the body) to distinguish between 1 or 2.

So if you read the header first, you can make path 1 less loady.

eth/protocols/eth/handlers.go

eth/protocols/eth/handshake.go

eth/protocols/eth/protocol.go

holiman · 2024-08-21T07:56:44Z

eth/protocols/eth/handlers.go

+		} else {
+			header := chain.GetHeaderByHash(hash)
+			if header.ReceiptHash != types.EmptyReceiptsHash {
+				body := chain.GetBody(hash)


The other handles do not add body to cache (only the rlp-body. ). The GetBody uses ReadBodyRLP under the hood, perhaps you can/should use chain.GetBodyRLP instead?

It will avoid putting the body object in cache.
You can then either decode it into a body object, or you can choose to iterate the rlp lists: the first one is the transactions. And you can decode one transaction at a time, or make some smart iterator which serves you the data you need.

Serving this query is mighty expensive.

GetBlodyRLP also caches it...
This query will be less expensive than before either way

Rlp caches the rlp, not the objecr

This query will be a lot more expensive than before, as I see it. How do you reckon "less expensive"?

Before this, we always called ReadReceipts which does two ReadHeaders a ReadBody and a ReadRawReceipts also it does DeriveFields on every receipt

rjl493456442 · 2024-09-06T08:39:35Z

cmd/devp2p/internal/ethtest/conn.go

@@ -316,8 +316,10 @@ loop:
 				return fmt.Errorf("wrong head block in status, want:  %#x (block %d) have %#x",
 					want, chain.blocks[chain.Len()-1].NumberU64(), have)
 			}
-			if have, want := msg.TD.Cmp(chain.TD()), 0; have != want {
-				return fmt.Errorf("wrong TD in status: have %v want %v", have, want)
+			if c.negotiatedProtoVersion < 69 {


Any particular reason to specialize the check for protocols before 69?

After 69 we will not have the TD anymore. We could just remove the check, but I wanted to preserve it to make sure we don't accidentally break 68 compatibility

rjl493456442 · 2024-09-06T08:41:44Z

core/blockchain_reader.go

+// GetRawReceiptsByHash retrieves the receipts for all transactions in a given block
+// without deriving the internal fields and the Bloom.
+func (bc *BlockChain) GetRawReceiptsByHash(hash common.Hash) rlp.RawValue {
+	number := rawdb.ReadHeaderNumber(bc.db, hash)


Should we also maintain a RLP cache for the raw receipt data?

rjl493456442 · 2024-09-06T09:40:00Z

eth/protocols/eth/handlers.go

+		}
+		// Retrieve the requested block's receipts
+		results := chain.GetRawReceiptsByHash(hash)
+		if results == nil {


If the results returned by chain.GetRawReceiptsByHash(hash) is nil, it means the receipts is not existent.
Namely, the results should never be nil even the block is empty (receipt size is zero).

The eth68 spec is unclear how should we handle the request for non-existent receipts.
Should we include the nil into the response slice (it's chosen by this pull request)?
Should we completely ignore the missing receipt?

Eth68 is defined as: https://github.com/ethereum/devp2p/blob/master/caps/eth.md#getreceipts-0x0f,
probably @fjl can provide more information here.

Besides, serviceGetReceiptsQuery68 has the same issue.

karalabe

One thing we've been thinking about is to completely remove the Bloom field from the receipt and generate it on the fly wnehever/wherever it is needed. Otherwise it's going to get generated in a potentially many cases when we don't even need it. Also trying to be smart and not generate it will blow up if someone later starts accessing it. By not having it available (or only via a method), we can make sure it's used correctly. Of course, if we would end up regenerating it multiple times then that's a problem, so we might need to be careful with cached stuff, but still.

Ethiqque · 2024-09-26T06:59:47Z

core/blockchain_reader.go

+func (bc *BlockChain) GetRawReceiptsByHash(hash common.Hash) rlp.RawValue {
+	number := rawdb.ReadHeaderNumber(bc.db, hash)


Better LRU cache

holiman · 2024-09-26T08:31:47Z

One thing we've been thinking about is to completely remove the Bloom field from the receipt and generate it on the fly wnehever/wherever it is needed.

This can be done e.g. when deriving the root sha, e.g. in a closure. However, it can be done in a follow-up PR, it's not directly related to eth69, it's an internal thing. So let's ignore it here and now.

@fjl will discuss the protocol options a bit more in the next p2p meeting.

MariusVanDerWijden · 2024-11-21T09:56:44Z

eth/protocols/eth/handlers.go

 	// A batch of receipts arrived to one of our previous requests
 	res := new(ReceiptsPacket)
 	if err := msg.Decode(res); err != nil {
 		return fmt.Errorf("%w: message %v: %v", errDecode, msg, err)
 	}
+	// calculate the bloom filter before dispatching


Can't decode directly into types.Receipts, since it will just try to decode it into a legacy receipt

Decode into network receipts

MariusVanDerWijden · 2024-11-21T11:39:38Z

@fjl I added a commit to provide custom encoding/decoding to the receipts for network. I'm not 100% sure about my decoding logic, also the way I do it right now will result in a bunch more allocations. I'm not sure how to fix that without writing much more weird code, this is the cleanest solution I came up with, maybe you have some better idea?

holiman · 2024-11-25T09:39:02Z

core/types/receipt_test.go

+	for _, receipt := range receipts {
+		receipt.Bloom = CreateBloom(Receipts{receipt})
+	}


Please do this via some lazy-loading, e.g. using sync.Once. Otherwise this will happen on every test-run which imports core/types, and it will show up on cpu-profiles and mem-profiles

fjl · 2024-11-25T10:07:41Z

eth/protocols/eth/receipt.go

+	if kind != rlp.List {
+		return errors.New("invalid receipt")
+	}
+	if size == 4 {


In the RLP format, the size is always given in bytes, even for lists. There is no way to know the number of items in a list before decoding the whole list. So this check doesn't work, and it's why I suggested to just encode all receipt types, even legacy ones, in the format [type, statusOrRoot, gasUsed, logs].

My suggestion here would be something like:

var ( rtype byte status []byte gasUsed uint64 logs []*types.Log err error ) if _, err = s.List(); err != nil { return err } { if rtype, err = s.Uint8(); err != nil { return err } if status, err = s.Bytes(); err != nil { return err } if gasUsed, err = s.Uint64(); err != nil { return nil } if err = s.Decode(&logs); err != ni { return err } } if err := s.ListEnd(); err != nil { return err } receipt := types.Receipt{Type: rtype, GasUsed: gasUsed, Logs: logs} if len(status) == 32 { receipt.Root = status } else if len(status) == 8 { receipt.Status = binary.BigEndian.Uint64(status) }

It's ugly, but hand-writing decoders always is.