DKG saga, part 4: On-chain integration for TBTC DKG #3427
Conversation
This is a preparation for integrating ECDSA DKG with on-chain contracts. We no longer want the beacon to execute test functions and pollute production logs. We can always re-enable them for development.
Introduced a function that converts `map[group.MemberIndex][]byte` into two slices: `[]*big.Int` and `[]byte`. This format conversion is needed for `SubmitDkgResult` (to be implemented in one of the next commits).
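For illustration, a minimal sketch of such a conversion; the function and variable names here are my own, not necessarily the PR's, and the `group` import path is the keep-core protocol package. Sorting by member index keeps the two output slices aligned:

```go
import (
	"math/big"
	"sort"

	"github.com/keep-network/keep-core/pkg/protocol/group"
)

// convertMemberDataToChainFormat flattens a map of per-member byte payloads
// into two parallel slices: member indices as *big.Int, and the payloads
// concatenated in member-index order.
func convertMemberDataToChainFormat(
	data map[group.MemberIndex][]byte,
) ([]*big.Int, []byte) {
	indices := make([]group.MemberIndex, 0, len(data))
	for index := range data {
		indices = append(indices, index)
	}
	// Map iteration order is random in Go; sort for a deterministic output.
	sort.Slice(indices, func(i, j int) bool {
		return indices[i] < indices[j]
	})

	memberIndices := make([]*big.Int, 0, len(indices))
	payloads := []byte{}
	for _, index := range indices {
		memberIndices = append(memberIndices, big.NewInt(int64(index)))
		payloads = append(payloads, data[index]...)
	}

	return memberIndices, payloads
}
```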
Introduced a function that converts `ecdsa.PublicKey` into a 64-byte-long array. This format conversion is needed for `SubmitDkgResult` (to be implemented in one of the next commits).
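A sketch of what such a conversion typically looks like (the function name is assumed, not the PR's): the uncompressed secp256k1 key is X || Y, i.e. the standard 65-byte encoding without its 0x04 prefix, with each coordinate left-padded to 32 bytes:

```go
import "crypto/ecdsa"

// publicKeyToChainFormat serializes an uncompressed secp256k1 public key
// into 64 bytes: the 32-byte X coordinate followed by the 32-byte Y
// coordinate.
func publicKeyToChainFormat(publicKey *ecdsa.PublicKey) [64]byte {
	var serialized [64]byte

	x := publicKey.X.Bytes()
	y := publicKey.Y.Bytes()

	// big.Int.Bytes() drops leading zeros, so left-pad each coordinate
	// to a fixed 32-byte width.
	copy(serialized[32-len(x):32], x)
	copy(serialized[64-len(y):], y)

	return serialized
}
```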
We need to know the identifiers of members selected for the signing group during the DKG result submission. This change makes the IDs available for storing and processing but does not yet use them. This will be done in one of the next commits.
The function is partially implemented and DOES NOT WORK. It will be finished in the next commits. Specifically, the member IDs selected for the group by the sortition pool are not passed.
Here we wire up the missing pieces, i.e. the IDs of operators selected for DKG and the hash of operators who completed the protocol successfully.
Here we introduce a temporary heartbeat mechanism that will request a heartbeat from the active wallet every ~8 hours. A single request consists of 5 messages that must be signed sequentially by the target wallet. A single message has the format of a valid heartbeat and is compatible with the fraud mechanism. Additionally, this changeset attaches to the real `Bridge` contract and takes the `WalletRegistry` address from it instead of taking it from the config. Therefore, the latter option is now deprecated and should be removed in the near future.
Here we add the outstanding parts required to finalize the DKG after the result is submitted on-chain.
The result submission logic should use a block delay and have a cancel signal based either on the DKG timeout or the DKG result submission event. This changeset aligns the existing code with those requirements.
We have that work captured in #3450
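For illustration, the intended shape could look roughly like this sketch; `Chain`, `dkg.Result`, and the `waitUntilBlock` helper are stand-ins here, not the actual keep-core API:

```go
// submitResultWithCancel waits until the member-specific submission block
// before submitting, and gives up early if the DKG times out or another
// member submits a result first.
func submitResultWithCancel(
	ctx context.Context,
	chain Chain,
	result *dkg.Result,
	submissionBlock uint64, // start block + member-specific delay
	timeoutBlock uint64, // block at which DKG times out on-chain
) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	// Cancel signal 1: another member submitted the result first.
	subscription := chain.OnDKGResultSubmitted(
		func(event *DKGResultSubmittedEvent) { cancel() },
	)
	defer subscription.Unsubscribe()

	// Cancel signal 2: the on-chain DKG timeout passed.
	go func() {
		_ = waitUntilBlock(ctx, chain, timeoutBlock)
		cancel()
	}()

	// The member-specific block delay spreads submissions so that
	// members do not all race to submit in the same block.
	if err := waitUntilBlock(ctx, chain, submissionBlock); err != nil {
		return err // canceled by one of the signals above
	}

	return chain.SubmitDKGResult(result)
}
```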
I went through all files but did not run any tests. I have no comments other than those posted so far.
```
@@ -125,9 +136,36 @@ func Initialize(
	}()
```
There is one more thing that occurred to me: at line 112 in this file (not modified in this PR) we have:
```go
_ = chain.OnDKGStarted(func(event *DKGStartedEvent) {
	go func() {
		if ok := deduplicator.notifyDKGStarted(
			event.Seed,
		); !ok {
			// (...)
			return
		}

		node.joinDKGIfEligible(
			event.Seed,
			event.BlockNumber,
		)
	}()
})
```
I am pretty sure this will be problematic on mainnet as-is because `event.BlockNumber` is a part of the signature:

keep-core/solidity/ecdsa/contracts/libraries/EcdsaDkg.sol, lines 103 to 106 at b8dcba4:

```solidity
// `\x19Ethereum signed message:\n${keccak256(
//   groupPubKey, misbehavedMembersIndices, dkgStartBlock
// )}`
bytes signatures;
```
We will have small chain reorganizations on mainnet resulting in `DKGStartedEvent`s emitted one after another, with different `startBlock`. They will be deduplicated by `deduplicator.notifyDKGStarted` but the `startBlock` matters later for the on-chain result submission.
What I would suggest doing here is to wait for 20 blocks (this feels like safe enough finalization for Ethereum), then take the last event via `GetPastDkgStartedEvent`, and trigger DKG based on the values from that event.
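Sketching the suggestion in Go — every helper name here is an assumption; `GetPastDkgStartedEvents` just mirrors the `GetPastDkgStartedEvent` mentioned above:

```go
// Number of blocks considered a safe finalization window on Ethereum.
const dkgStartedConfirmationBlocks = 20

func confirmAndJoinDKG(chain Chain, node *Node, notified *DKGStartedEvent) {
	// Wait until the notified start block has ~20 confirmations.
	waitUntilBlock(chain, notified.BlockNumber+dkgStartedConfirmationBlocks)

	// Re-fetch the events instead of trusting the notification; a small
	// reorg may have replaced it with one at a different start block.
	events, err := chain.GetPastDkgStartedEvents(notified.BlockNumber)
	if err != nil || len(events) == 0 {
		return // the DKG start did not survive the reorg
	}

	// Take the last event as the final source of truth.
	last := events[len(events)-1]
	node.joinDKGIfEligible(last.Seed, last.BlockNumber)
}
```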
Yeah, good point. Just a couple of questions:
- Are we sure `GetPastDkgStartedEvent` will return the same event for all? I can imagine a case where not all members receive the same event due to ETH node re-connections or slight state differences.
- What about deduplication? Do we want to keep the current logic? Do we want to compare the original event with the one fetched via `GetPastDkgStartedEvent`?
> Are we sure `GetPastDkgStartedEvent` will return the same event for all? I can imagine a case where not all members receive the same event due to ETH node re-connections or slight state differences.
If we wait for 20 blocks, then I am pretty sure the answer is yes. This function, under the hood, just fetches filtered logs from the contract. It's a simple call to the Ethereum node; it does not maintain any state. If the smart contract's state has enough confirmations, then all nodes should receive the same result.
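With go-ethereum, for example, such a fetch boils down to a stateless `eth_getLogs` call (illustrative values; not the keep-core wrapper):

```go
import (
	"context"
	"math/big"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/ethclient"
)

// fetchDkgStartedLogs shows the stateless log fetch described above.
func fetchDkgStartedLogs(
	ctx context.Context,
	client *ethclient.Client,
	walletRegistry common.Address,
	dkgStartedTopic common.Hash,
	fromBlock uint64,
) ([]types.Log, error) {
	query := ethereum.FilterQuery{
		FromBlock: new(big.Int).SetUint64(fromBlock),
		Addresses: []common.Address{walletRegistry},
		Topics:    [][]common.Hash{{dkgStartedTopic}},
	}
	// A plain eth_getLogs JSON-RPC call; given enough confirmations,
	// every node gets the same answer.
	return client.FilterLogs(ctx, query)
}
```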
> What about deduplication? Do we want to keep the current logic? Do we want to compare the original event with the one fetched via `GetPastDkgStartedEvent`?
I think we need to deduplicate so that we do not start multiple monitoring loops for the 20 confirmations. I don't think we need to compare events; I think we should just accept the fact that the event can change and use the one fetched after the confirmations.
In my mind, this works as follows:
- I know DKG is probably going to happen, so I start just one (!) monitoring loop.
- I know the information I have right now may change, so I need to wait 20 blocks to see the final version.
- After 20 blocks, I confirm DKG is in progress by checking the event. I take that event as the final source of truth.
Ok, I'll rework that.
I spent some time on it and I'm afraid this is not so simple. If we wait for 20 confirmation blocks, we start the actual off-chain protocol with a delay that breaks a lot of things. This is because we use the original block number from the event as the starting point, but the actual block is more than 20 blocks in the future. This PR is already a ball of mud, so I think we should think about it and address that problem separately. I created an issue capturing it: #3456
I don't think we should use the original block number from the event. Agreed, let's address it separately given the size of the PR. I'll leave this conversation unresolved for now so that we can reference it.
The DKG result validation logic used so far was vulnerable to malicious result re-submission and chain reorgs. Here we improve it to be resistant to both problems. First, we improve the `notifyDKGResultSubmitted` method of the event deduplicator to consider the DKG seed and submission block while computing the cache key. This way, the same malicious result will not be considered a duplicate due to a different block (and sometimes seed) and thus will be validated and challenged by nodes. Second, each node will confirm the DKG state after submitting a DKG result challenge. This way, nodes will be able to re-submit the challenge if the previous one has not changed the DKG state (for example, due to a chain reorg that wiped out the block containing the previous challenge transaction).
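For illustration, the improved cache key could be computed along these lines (a sketch; the actual deduplicator code may differ):

```go
import (
	"fmt"
	"math/big"
)

// dkgResultSubmittedCacheKey builds the deduplication cache key from the
// DKG seed, the submission block, and the result hash. Including the block
// number means a maliciously re-submitted result is not mistaken for a
// duplicate and still gets validated and challenged.
func dkgResultSubmittedCacheKey(
	seed *big.Int,
	blockNumber uint64,
	resultHash [32]byte,
) string {
	return fmt.Sprintf("%v-%v-%x", seed, blockNumber, resultHash)
}
```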
Higher values will allow individual members to bump the transaction's gas price two times.
The current approach is that the client defines what is supported off-chain and confirms the chain does not ask for something else. The requirements about the group size or threshold do not come from the chain.
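A sketch of that confirmation under stated assumptions — the constants and the `DKGParameters` accessor are illustrative, not the actual keep-core API:

```go
// Values the client supports off-chain; treat these as assumptions.
const (
	supportedGroupSize       = 100
	supportedHonestThreshold = 51
)

// confirmChainParameters fails fast if the chain asks for a group size or
// threshold the off-chain protocol does not implement.
func confirmChainParameters(chain Chain) error {
	parameters, err := chain.DKGParameters()
	if err != nil {
		return fmt.Errorf("cannot fetch DKG parameters: [%w]", err)
	}

	if parameters.GroupSize != supportedGroupSize {
		return fmt.Errorf("unsupported group size: [%v]", parameters.GroupSize)
	}
	if parameters.HonestThreshold != supportedHonestThreshold {
		return fmt.Errorf(
			"unsupported honest threshold: [%v]",
			parameters.HonestThreshold,
		)
	}

	return nil
}
```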
This is now handled by the `activeWalletPublicKey` method.
@pdyraga @lukasz-zimnoch
I tested a network of three clients and one bootstrap. The DKG was submitted successfully:
```
2023-01-02T18:47:12.447+0100 INFO keep-tbtc tbtc/dkg.go:279 [member:63] DKG result with group public key [0xc0162fd817b7cbb9e4130bec8d69b7830c2610502e5c8e845bca0d1ab37fe6ac9d3e4972586aaa5de424d52505bbba347466a3078a9cb9fe493839a2f2a34d1b] and result hash [0x0ff2b9f501d2d81e608beb1645e1de1abed567dc24895fbf34b0e03982622613] submitted at block [1704] by member [1] {"seed": "0xbed8cada2f7e8cbfaae4b68557faab80ca3f6442532c6540ed181160fdb68dc5"}
```
Note that I used a simple `RandomBeaconChaosnet` contract that calls `WalletRegistry`'s callback without any gas limit.
```go
dkgResultApprovalHandlersMutex sync.Mutex
dkgResultApprovalHandlers      map[int]func(submission *DKGResultApprovedEvent)

dkgResultApprovalGuard func() bool
```
I would consider renaming it to `dkgResultApprovalRejectFn` or `dkgResultApprovalFailFn` and explaining in the comment that it is used in tests to programmatically fail the execution of `ApproveDKGResult`. It was not clear to me at first what the guard is doing: delaying the call or preventing it. And if preventing it, then why.
```go
// Reject the first approval (with index 0) that will be made by
// member 1 (the submitter) in order to force member 2 to
// approve after the precedence period.
rejectedApprovalsIndexes: []int{0},
```
Wouldn't it be more natural to have it as `failedApprovalsCount` or `rejectedApprovalsCount`? This is how it works later in the test: we increment `approvalIndex`, which is actually an approval attempt count (I'd rename it as well).
For this test, we are always failing the first n attempts of the approval. For example, it's not possible to have `rejectedApprovalsIndexes: []int{0, 4}` here.
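The counter-based variant proposed above could look like this sketch (names assumed; it builds on the `dkgResultApprovalGuard` hook from the earlier comment):

```go
// Fail the first failedApprovalsCount approval attempts, let the rest pass.
approvalAttempt := 0
localChain.dkgResultApprovalGuard = func() bool {
	approvalAttempt++
	return approvalAttempt > failedApprovalsCount
}
```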
```go
if err != nil {
	return fmt.Errorf("cannot assemble DKG chain result [%w]", err)
}
```
Something that has just occurred to me: we could validate the result with a call to the `EcdsaDkgValidator` contract before submitting it. This way, we can eliminate the risk of slashing the operator in case of a bug in the client. I opened a separate issue to capture this work: #3461.
```go
); !ok {
	logger.Warnf(
		"Result with hash [0x%x] for DKG with seed [0x%x] "+
			"and starting block [%v] has been already processed",
```
This is not a starting block, right? This is the block at which the `DKGResultSubmitted` event was emitted; the DKG start block is currently the block at which `DkgStarted` was emitted and may change in the future given #3456.
Should this log say `submit block` instead?
```go
// notifyDKGResultSubmitted notifies the client wants to start some actions
// upon the DKG result submission. It returns boolean indicating whether the
// client should proceed with the actions or ignore the event as a duplicate.
```
I think the current deduplication logic makes a lot of sense given the challenges, but we should make it crystal clear in the docs in case future us ever thinks about removing the result block from the hash. Something like:
```go
// (...)
//
// IMPORTANT: This function considers two events with the same seed and hash as
// separate ones if they do not have the same block number. This is important in
// the context of a possible challenge - we never know how soon the next result
// is submitted after the challenge. In extreme circumstances, it may even be
// the same block as the last challenge.
```
```go
dkgLogger.Infof(
	"invalid DKG result still not challenged; retrying",
)
```
This log may not always be true. We could have another result submitted in the meantime and challenged. I think we should remove this log from here and move it to the beginning of the loop after we check that the result we are challenging is the last one submitted.
Captured it in a separate issue: #3462
```go
return

err = de.chain.ChallengeDKGResult(result)
```
In case we run this loop one more time, we do not know if the `result` is still the one most recently submitted to the chain. It could happen that the challenge was mined and another result got submitted later. Before we execute this loop again, we should check the result, probably by reading it from the last event.
Captured it in a separate issue: #3462
I did just a few initial commits in this PR; all the heavy work was handled by @lukasz-zimnoch, who is the actual author of the code in this PR. I reviewed @lukasz-zimnoch's code and it looks good. This PR is huge, so we decided to open separate issues to track the improvements:
I did not test this code manually, but @lukasz-zimnoch performed a lot of tests, all listed in the PR description, and @tomaszslabon confirmed the happy path works as well. Given that the tests were performed, that I am not the actual author of this code (just a reviewer), and that there is approval from another reviewer - @tomaszslabon - I am 🚢ing this stuff to
Closes: #3368
Depends on #3428
This is a PR with an on-chain integration for all `ethereum.TbtcChain` functions. Functions based on a mocked contract implementation are replaced with real (TM) functions talking to the chain. Specific changes are described below.

Random Beacon test loops disabled

This PR disables mock random beacon chain functions. This is the M3 client that should talk to the real `WalletRegistry` contract, generate groups based on the contract requests, and generate heartbeat signatures. We no longer need artificial beacon activity for tests here.

Real implementation of the `ethereum.TbtcChain` functions

The `ethereum.TbtcChain` component is the Ethereum implementation of the `tbtc.Chain` interface that defines the expectations of a tBTC v2 node regarding the host chain. So far, the `ethereum.TbtcChain` was actually a mock that was not talking with real contracts but triggered some test actions based on the in-memory state. This pull request changes that and provides real implementations for all `ethereum.TbtcChain` methods, which now talk with the on-chain contracts like `Bridge` and `WalletRegistry`. That means the tBTC v2 node is now fully integrated with the chain.

Submission and validation of the DKG result
As mentioned in the previous section, the chain integration was done. That means the tBTC v2 node must deal with DKG results according to the rules defined by the on-chain `WalletRegistry` contract. The node was adjusted to follow those rules.

Having that implemented means that the tBTC v2 nodes are able to answer wallet creation requests issued by the `Bridge` contract. A successful DKG result approval registers the given wallet in the `Bridge` and makes it ready to handle tBTC protocol actions.

Worth noting that this PR does not handle DKG retries that should be made upon a DKG result challenge. This scenario will be handled separately.
Temporary wallet heartbeats
This pull request also changes the existing signing test loop into a temporary heartbeat mechanism ensuring the on-chain wallets are live. The previous signing test loop asked the latest in-mem wallet to sign 10 distinct messages every 500 blocks. The new heartbeat mechanism asks the on-chain active wallet to sign 5 distinct messages every ~8 hours. Importantly, the signed messages follow a special heartbeat format that makes them valid from the fraud mechanism's viewpoint. This is important as wallets registered on-chain can be subject to fraud challenges. If they sign things that are invalid from the tBTC protocol's perspective, they are slashed.
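A minimal sketch of such a loop, assuming illustrative names (`ActiveWallet` and `buildHeartbeatMessage` are stand-ins) and the interval and message count from the description:

```go
const (
	heartbeatInterval     = 8 * time.Hour
	heartbeatMessageCount = 5
)

// runHeartbeatLoop periodically asks the active on-chain wallet to sign a
// batch of heartbeat-formatted messages, one after another.
func runHeartbeatLoop(ctx context.Context, wallet ActiveWallet) {
	ticker := time.NewTicker(heartbeatInterval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for i := 0; i < heartbeatMessageCount; i++ {
				// Each message follows the heartbeat format, so signing
				// it can never be used as fraud evidence.
				message := buildHeartbeatMessage(i)
				if _, err := wallet.Sign(ctx, message); err != nil {
					break // signing is sequential; stop this round
				}
			}
		}
	}
}
```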
Testing
To test changes presented in this PR, the following on-chain requirements must be met:
- `Bridge`'s contract state must allow for new wallet creation
- `WalletRegistry` must use a mocked random beacon that will answer requests by calling back the right `WalletRegistry` function
- `WalletRegistry` must be authorized in the `ReimbursementPool` contract (necessary for successful DKG result approval)

Each test scenario was triggered by calling `Bridge`'s `requestNewWallet` function. 3 off-chain clients were always involved. Here is a summary of the most important scenarios that were checked multiple times and in different flavors:
- DKG result approval failure (`WalletRegistry` unauthorized from the `ReimbursementPool`). All members tried to approve by submitting an approval transaction that eventually failed.