
cmd/evm, eth/tracers: refactor structlogger + make it streamable #30806

Open · wants to merge 6 commits into base: master

Conversation

holiman (Contributor) commented Nov 25, 2024

This PR refactors the structlog a bit, so that it can be used in a streaming mode.

The command-line usage of structlog now uses the streaming mode, leaving the non-streaming mode of operation for `eth_call`.

There are two benefits to streaming mode:

  1. We do not have to maintain a long list of operations.
  2. We do not have to duplicate (or n-plicate) data, e.g. memory / stack / returndata, so that each entry has its own private slice.

Before:

[user@work go-ethereum]$ ./evm-master --debug --code 604019 run

#### TRACE ####
PUSH1           pc=00000000 gas=10000000000 cost=3

NOT             pc=00000002 gas=9999999997 cost=3
Stack:
00000000  0x40

STOP            pc=00000003 gas=9999999994 cost=0
Stack:
00000000  0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffbf

#### LOGS ####

After:

[user@work go-ethereum]$ ./evm-structlog --debug --code 604019 run
PUSH1           pc=00000000 gas=10000000000 cost=3

NOT             pc=00000002 gas=9999999997 cost=3
Stack:
00000000  0x40

STOP            pc=00000003 gas=9999999994 cost=0
Stack:
00000000  0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffbf

	// create a log
	if l.writer == nil {
		// Non-streaming, need to copy slices.
		log.Memory = slices.Clone(log.Memory)
Member:

I'm wondering if it makes more sense to write them to a buffer in this case and flush them at the end, if we want to support the non-streaming case. I don't see a case in the codebase where we use the structlogs directly, only ever in a formatted/textified form.

Reply:

The buffering is really due to them being included as items in a JSON list in the RPC response.

Member:

I mean, shouldn't we marshal them into a buffer here already, rather than storing them as individual logs? But I guess that would break external projects using the struct logger.

Contributor Author:

We kind of could, but then we would have to redefine the other RPC message so that it does not double-encode the already-encoded snippets. It would, however, give us a better estimate of how much data we are accumulating: instead of limiting to n items, we could limit to a certain number of MB.
Is there any other reason for you to suggest this?

Member:

Nope, basically those reasons. It will also limit the peak amount of memory, since we can free the un-encoded data quickly and don't have to keep it around until the end of the encoding. Average memory consumption will go up slightly, though (encoded data is bigger than unencoded).

Contributor Author:

What you are suggesting makes sense, though, because it simplifies a whole lot.

holiman (Contributor Author) commented Nov 26, 2024

I have now pushed some changes to the non-streaming encoder, which is used when returning "legacy" traces in the debug API.

To ensure that nothing substantial changed, this is how I tested it:

 go run ./cmd/geth --dev console --exec 'debug.traceCall({input: "0x600051600155602051600255"},"latest" , {"enableMemory": true})'  > ethcall_result

The expected response is

{
  failed: false,
  gas: 57606,
  returnValue: "",
  structLogs: [{
      depth: 1,
      gas: 49946818,
      gasCost: 3,
      op: "PUSH1",
      pc: 0,
      stack: []
  }, {
      depth: 1,
      gas: 49946815,
      gasCost: 6,
      op: "MLOAD",
      pc: 2,
      stack: ["0x0"]
  }, {
      depth: 1,
      gas: 49946809,
      gasCost: 3,
      memory: ["0000000000000000000000000000000000000000000000000000000000000000"],
      op: "PUSH1",
      pc: 3,
      stack: ["0x0"]
  }, {
      depth: 1,
      gas: 49946806,
      gasCost: 2200,
      memory: ["0000000000000000000000000000000000000000000000000000000000000000"],
      op: "SSTORE",
      pc: 5,
      stack: ["0x0", "0x1"],
      storage: {
        0000000000000000000000000000000000000000000000000000000000000001: "0000000000000000000000000000000000000000000000000000000000000000"
      }
  }, {
      depth: 1,
      gas: 49944606,
      gasCost: 3,
      memory: ["0000000000000000000000000000000000000000000000000000000000000000"],
      op: "PUSH1",
      pc: 6,
      stack: []
  }, {
      depth: 1,
      gas: 49944603,
      gasCost: 6,
      memory: ["0000000000000000000000000000000000000000000000000000000000000000"],
      op: "MLOAD",
      pc: 8,
      stack: ["0x20"]
  }, {
      depth: 1,
      gas: 49944597,
      gasCost: 3,
      memory: ["0000000000000000000000000000000000000000000000000000000000000000", "0000000000000000000000000000000000000000000000000000000000000000"],
      op: "PUSH1",
      pc: 9,
      stack: ["0x0"]
  }, {
      depth: 1,
      gas: 49944594,
      gasCost: 2200,
      memory: ["0000000000000000000000000000000000000000000000000000000000000000", "0000000000000000000000000000000000000000000000000000000000000000"],
      op: "SSTORE",
      pc: 11,
      stack: ["0x0", "0x2"],
      storage: {
        0000000000000000000000000000000000000000000000000000000000000001: "0000000000000000000000000000000000000000000000000000000000000000",
        0000000000000000000000000000000000000000000000000000000000000002: "0000000000000000000000000000000000000000000000000000000000000000"
      }
  }, {
      depth: 1,
      gas: 49942394,
      gasCost: 0,
      memory: ["0000000000000000000000000000000000000000000000000000000000000000", "0000000000000000000000000000000000000000000000000000000000000000"],
      op: "STOP",
      pc: 12,
      stack: []
  }]
}

The only diff between this PR and master is that this omits empty memory.

9d8
<       memory: [],
17d15
<       memory: [],

The early encoding means we don't have to copy slices explicitly; we just encode a json.RawMessage. This is ideal for keeping track of the response size. Earlier, the limit was on the number of elements, which is a very vague measure (e.g. each element may carry a megabyte of memory); we can now redefine the limit to mean a number of bytes.

MariusVanDerWijden (Member) left a comment:

LGTM, but I think @s1na should also take a look

	// Create a tracer which records the number of steps
	var steps = 0
	tracer := &tracing.Hooks{
		OnOpcode: func(pc uint64, op byte, gas, cost uint64, scope tracing.OpContext, rData []byte, depth int, err error) {
			steps++
		},
	}
Contributor:

I feel like at this point we are benching EVM execution :) It seems this benchmark was introduced in #23016 to test optimizations to the struct logger. What do you think about dropping it?

Reply:

Sgtm!

Contributor Author:

I fixed it instead in the latest commit. It's much slower now, though, due to the JSON encoding and the extra memory usage.

Contributor:

You were not kidding. It's 10x slower.

PR:

goos: darwin
goarch: arm64
pkg: github.com/ethereum/go-ethereum/eth/tracers
BenchmarkTransactionTraceV2-11                15          74619664 ns/op        87275606 B/op     897587 allocs/op
PASS

Master:

goos: darwin
goarch: arm64
pkg: github.com/ethereum/go-ethereum/eth/tracers
BenchmarkTransactionTrace-11                 163           6710361 ns/op         3798259 B/op      81648 allocs/op
PASS


// toLegacyJSON converts the structlog to json-encoded legacy form (StructLogRes).
//
// The differences between the structlog json and the 'legacy' json are:
Contributor:

Where do we use the non-legacy json format? I can't seem to find it.

Reply:

See my comment above, with `debug.traceCall`.

Contributor Author:

go run ./cmd/geth --dev console --exec 'debug.traceCall({input: "0x600051600155602051600255"},"latest" , {"enableMemory": true})'

The updated benchmark is a lot slower than previously, since previously the JSON encoding was deferred until later. Also, storing the data as JSON-encoded strings takes more space than the raw bytes.

The benchmark is thus renamed from the original BenchmarkTransactionTrace to BenchmarkTransactionTraceV2.