New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
p2p: index TxOrphanage by wtxid, allow entries with same txid #30000
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsNo conflicts as of last run. |
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
Will fix the linter when I rebase |
c1bd985
to
4b4dfaa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code changes are quite mechanical, thankfully
concept ACK
src/net_processing.cpp
Outdated
if (m_orphanage.HaveTx(gtxid)) return true; | ||
// Orphanage is checked by wtxid. However, even if this is a txid, look up the same hash in | ||
// case this is a non-segwit transaction in the orphanage. | ||
if (m_orphanage.HaveTx(Wtxid::FromUint256(gtxid.GetHash()))) return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (m_orphanage.HaveTx(Wtxid::FromUint256(gtxid.GetHash()))) return true; | |
if (m_orphanage.HaveTx(Wtxid::FromUint256(hash))) return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
should be able to add fuzz coverage easily for differing wtxids but same txids in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, but I'm concerned about the behaviour change in AlreadyHaveTx
, which seems quite non-trivial to review so I'll need to look into further.
src/txorphanage.cpp
Outdated
if (maybeErase->second.fromPeer == peer) | ||
{ | ||
nErased += EraseTxNoLock(maybeErase->second.tx->GetHash()); | ||
nErased += EraseTxNoLock(maybeErase->second.tx->GetWitnessHash()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can't we just use first
?
nErased += EraseTxNoLock(maybeErase->second.tx->GetWitnessHash()); | |
nErased += EraseTxNoLock(maybeErase->first); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or alternatively:
git diff on 4b4dfaa
diff --git a/src/txorphanage.cpp b/src/txorphanage.cpp
index 504a1f7ec3..bed92e6b32 100644
--- a/src/txorphanage.cpp
+++ b/src/txorphanage.cpp
@@ -98,13 +98,11 @@ void TxOrphanage::EraseForPeer(NodeId peer)
m_peer_work_set.erase(peer);
int nErased = 0;
- std::map<Wtxid, OrphanTx>::iterator iter = m_orphans.begin();
- while (iter != m_orphans.end())
- {
- std::map<Wtxid, OrphanTx>::iterator maybeErase = iter++; // increment to avoid iterator becoming invalid
- if (maybeErase->second.fromPeer == peer)
+ for (auto iter = m_orphans.begin(); iter != m_orphans.end(); ) {
+ const auto& [wtxid, tx] = *iter++; // increment to avoid iterator becoming invalid
+ if (tx.fromPeer == peer)
{
- nErased += EraseTxNoLock(maybeErase->second.tx->GetWitnessHash());
+ nErased += EraseTxNoLock(wtxid);
}
}
if (nErased > 0) LogPrint(BCLog::TXPACKAGES, "Erased %d orphan tx from peer=%d\n", nErased, peer);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could still happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was (mostly) taken
concept ACK After being caught out of guard due to the short notice on this PR at the Bitcoin Review Club, I took time to review the code changes and read the comments above. To my understanding, we need to check the This change needs to be implemented only for the orphanage and not the rest of the structures that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
src/txorphanage.cpp
Outdated
if (maybeErase->second.fromPeer == peer) | ||
{ | ||
nErased += EraseTxNoLock(maybeErase->second.tx->GetHash()); | ||
nErased += EraseTxNoLock(maybeErase->second.tx->GetWitnessHash()); | ||
} | ||
} | ||
if (nErased > 0) LogPrint(BCLog::TXPACKAGES, "Erased %d orphan tx from peer=%d\n", nErased, peer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 646aa4d
nit: nErased
is a counter right? Shouldn't this read txs
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I've added a commit to fix this and another log to say "transaction(s)". I took the opportunity to add a few nice things to the logs as well.
4b4dfaa
to
65f69bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I have split the first commit into:
- net processing change to stop querying by txid
- 2 refactors changing the TxOrphanage methods to be by Wtxid only
- TxOrphanage change to index by wtxid and thus allow entries with the same txid
Hopefully in review, we can convince ourselves that the first commit is the right thing to do, separately from the other changes. The tests may help with that - also added one for invs that helps illustrate what the cases are, and added unit tests.
I thought a bit about whether we should check for same-txid-different-witness but from the same peer and replace that in orphanage. Given that a Bitcoin Core node doesn't ever do witness replacement, this seems like unnecessary complexity that won't ever get used between normally-operating nodes. So I did not add this.
src/net_processing.cpp
Outdated
if (m_orphanage.HaveTx(gtxid)) return true; | ||
// Orphanage is checked by wtxid. However, even if this is a txid, look up the same hash in | ||
// case this is a non-segwit transaction in the orphanage. | ||
if (m_orphanage.HaveTx(Wtxid::FromUint256(gtxid.GetHash()))) return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewed through 65f69bf
Given that a Bitcoin Core node doesn't ever do witness replacement, this seems like unnecessary complexity that won't ever get used between normally-operating nodes. So I did not add this.
Makes sense, seems unlikely to ever be used in a 20 minute window lifetime of an orphan given that we don't do these kinds of replacements.
also opened #30082 seeing we're touching EraseForPeer
et al a bit in not quite "trivial" ways
src/txorphanage.cpp
Outdated
if (maybeErase->second.fromPeer == peer) | ||
{ | ||
nErased += EraseTxNoLock(maybeErase->second.tx->GetHash()); | ||
nErased += EraseTxNoLock(maybeErase->second.tx->GetWitnessHash()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could still happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Code changes look good at first glance, still thinking about the implications of 9d2e905 (w.r.t. discussion in #30000 (comment)). Verified that both the unit tests and each of the newly introduced functional sub-tests fail without the main commit 4206ba3, as expected.
65f69bf
to
822b2f1
Compare
Thanks @instagibbs @theStack @sr-gi! Addressed the comments and added a few minor log improvements. |
822b2f1
to
16483fe
Compare
Ah, p2p_invalid_tx.py failed because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewed through 16483fe
git range-diff master 65f69bf 16483fee7c6d93722bfb79fce9efbe841ec13d6a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tACK 16483fe with some nits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested ACK 16483fe
I was at first skeptical about the first commit for similar reasons as stickies-v above, but after thinking about it more and reading the discussion it makes sense to me to cast a Txid
to a Wtxid
here.
I also played with the functional tests a bit (e.g. giving the parent in test_orphan_txid_inv
a low fee and testing that everything still works if 1p1c package validation is used).
No behavior change right now, as transactions in the orphanage are unique by txid. This makes the next commit easier to review.
Index by wtxid instead of txid to allow entries with the same txid but different witnesses in orphanage. This prevents an attacker from blocking a transaction from entering the orphanage by sending a mutated version of it.
16483fe
to
51c73a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mzumsande @AngusP @instagibbs! Addressed comments. I've also expanded the "cast from txid to wtxid" part to be slightly more verbose but hopefully easier to understand.
- Add elapsed time in "remove orphan" log - Add size in "stored orphan" log - grammar edit
51c73a0
to
0fb17bf
Compare
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 0fb17bf
reviewed via git range-diff master 16483fee7c6d93722bfb79fce9efbe841ec13d6a 0fb17bf61a40b73a2b81a18e70b3de180c917f22
if (gtxid.IsWtxid()) { | ||
// Normal query by wtxid. | ||
if (m_orphanage.HaveTx(Wtxid::FromUint256(hash))) return true; | ||
} else { | ||
// Never query by txid: it is possible that the transaction in the orphanage has the same | ||
// txid but a different witness, which would give us a false positive result. If we decided | ||
// not to request the transaction based on this result, an attacker could prevent us from | ||
// downloading a transaction by intentionally creating a malleated version of it. While | ||
// only one (or none!) of these transactions can ultimately be confirmed, we have no way of | ||
// discerning which one that is, so the orphanage can store multiple transactions with the | ||
// same txid. | ||
// | ||
// While we won't query by txid, we can try to "guess" what the wtxid is based on the txid. | ||
// A non-segwit transaction's txid == wtxid. Query this txid "casted" to a wtxid. This will | ||
// help us find non-segwit transactions, saving bandwidth, and should have no false positives. | ||
if (m_orphanage.HaveTx(Wtxid::FromUint256(hash))) return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(since @sr-gi mentioned this offline) Yes, the last push turned this 1 line into 2 identical lines. The purpose is to illustrate to current reviewers and future readers more clearly... if wtxid, we do this. if txid, we also do this, but notice that we are actually doing a "cast" and that's ok because we are guessing what the wtxid is.
Also, perhaps these will diverge again in the future if GenTxid
becomes a std::variant<Txid, Wtxid>
.
ACK 0fb17bf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 0fb17bf
I like the latest, more verbose version of the first commit (7e475b9), as it was significantly easier to grasp what's going on and convince myself that this is the behaviour we want. The only small drawback I see that it looks odd if both the if- and else-branch do exactly the same thing, and it could invite people to open deduplication refactor PRs (or leave them wondering if it's a bug). Maybe adding some "this is intended" comment would make sense for a follow-up.
tx_child = self.wallet.create_self_transfer(utxo_to_spend=tx_parent["new_utxo"]) | ||
|
||
# Create a fake version of the child | ||
tx_orphan_bad_wit = self.create_malleated_version(tx_child) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, potential follow-up material: the test is imho a bit easier to follow if the naming refers to the original (non-malleated) counter-part:
tx_orphan_bad_wit = self.create_malleated_version(tx_child) | |
tx_child_bad_wit = self.create_malleated_version(tx_child) |
(same as for the other two sub-tests below)
Could make the 'cast' explicit in the second branch, though there's already the comment making it pretty clear this is intentional so may be unnecessary const auto guessed_wtxid = Wtxid::FromUint256(hash);
if (m_orphanage.HaveTx(guessed_wtxid)) return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crACK 0fb17bf
Non-blocking comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 0fb17bf
It took me a while to grok the implications of this PR, but the latest version of this PR makes it much more reviewable and I'm now comfortable that the changes are safe and desired. Also, nice work on the tests!
What I missed/misunderstood in my earlier review is that the reduced orphanage hit rate is a goal, and not a bug/something to minimize. Related nit: perhaps the commit message of 7e475b9 could be explicit about this reduction and why we want it?
I remain uneasy about AlreadyHaveTx()
not really doing what the name indicates (because of the GenTxid
ambiguity), but is not a huge concern and I think should not block this PR / can be addressed separately if it is desired.
@@ -180,6 +190,49 @@ BOOST_AUTO_TEST_CASE(DoS_mapOrphans) | |||
BOOST_CHECK(orphanage.CountOrphans() == 0); | |||
} | |||
|
|||
BOOST_AUTO_TEST_CASE(same_txid_diff_witness) | |||
{ | |||
FastRandomContext det_rand{true}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tidy nit:
FastRandomContext det_rand{true}; | |
FastRandomContext det_rand{/*fDeterministic=*/true}; |
src/net_processing.cpp
Outdated
if (gtxid.IsWtxid()) { | ||
// Normal query by wtxid. | ||
if (m_orphanage.HaveTx(gtxid)) return true; | ||
} else { | ||
// Never query by txid: it is possible that the transaction in the orphanage has the same | ||
// txid but a different witness, which would give us a false positive result. If we decided | ||
// not to request the transaction based on this result, an attacker could prevent us from | ||
// downloading a transaction by intentionally creating a malleated version of it. | ||
// | ||
// While we won't query by txid, we can try to "guess" what the wtxid is based on the txid. | ||
// A non-segwit transaction's txid == wtxid. Query this txid "casted" to a wtxid. This will | ||
// help us find non-segwit transactions, saving bandwidth, and should have no false positives. | ||
if (m_orphanage.HaveTx(GenTxid::Wtxid(hash))) return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 7e475b9:
I think having the if/else
path in this commit is helpful, since it distinguishes between when we cast the txid and when we don't.
nit: I think this already works pretty well, but some bits (e.g. "guess what the wtxid is") are not super clear imo. I've summarized my understanding into the below, feel free to use what you like.
// Never query by txid: it is possible that an existing transaction in the orphanage has the
// same txid but a different witness, which would give us a false positive result. If we
// don't request the transaction based on this result, an attacker could prevent us from
// downloading a transaction by intentionally creating a malleated version of it.
//
// False positive are unsafe and must be avoided.False negatives are acceptable and just
// lead to increased bandwidth usage.
//
// When gtxid is a txid it means when we don't have enough information to query by wtxid,
// e.g. because the transaction was announced by a non-wtxid-relay peer, or because we're
// checking the prevouts of a transaction. It doesn't mean that the transaction has no
// witness data, so we need to distinguish between those cases.
//
// If the transaction:
// 1) does not have witness data, we can safely query the orphanage by casting a txid to a
// wtxid because they are equivalent. False positives and false negatives are impossible.
// 2) has witness data, the txid and wtxid are different by definition, making this query a
// no-op. False positives are not possible, but we may have false negatives.
//
// So, by querying for a txid cast to a wtxid, we avoid false positives entirely, but we
// save bandwidth for the case where txid == wtxid.
side-note rant: I think the fact that we need such a verbose comment is a pretty strong sign AlreadyHaveTx
should be refactored, e.g. using {Wt,T}xid
types as I suggested here, but I appreciate that it's mostly orthogonal to this PR. Renaming AlreadyHaveTx
to e.g. ShouldRequestTx
or something would probably already clarify quite a bit, too (although also not blocking for this PR either).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've summarized my understanding into the below
Looks correct to me
I think the fact that we need such a verbose comment is a pretty strong sign AlreadyHaveTx should be refactored, e.g. using {Wt,T}xid types as I suggested #30000 (comment), but I appreciate that it's mostly orthogonal to this PR.
I think it's really a sign to ban txid-based transaction relay. Just kidding. I agree it's hard to reason about logic dealing with a hash that can be txid, wtxid, or bot. I think it's just one of the common things that trips people in tx relay, just like everybody's mempool fuzzer gets snagged on CPFP carve out, and that for
loop in init.cpp that claims reviewer victims every time it's touched.
Renaming AlreadyHaveTx to e.g. ShouldRequestTx or something would probably already clarify quite a bit, too (although also not blocking for this PR either).
Yeah, AlreadyHaveTx
should probably have a name that reflects its purpose of "should I bother {requesting, downloading, validating} this tx".
Do note that #30110 makes AlreadyHaveTx
a function internal to the TxDownloadImpl
, i.e. no longer something that peerman knows about. Funnily, TxDownloadImpl::AlreadyHaveTx
might be a fine name when it's in that context. I get that it's confusing, but I'd ask that we defer renaming until after that PR since it would conflict + maybe be less problematic afterwards.
@@ -23,7 +23,7 @@ bool TxOrphanage::AddTx(const CTransactionRef& tx, NodeId peer) | |||
|
|||
const Txid& hash = tx->GetHash(); | |||
const Wtxid& wtxid = tx->GetWitnessHash(); | |||
if (m_orphans.count(hash)) | |||
if (m_orphans.count(wtxid)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c++-20 nit:
if (m_orphans.count(wtxid)) | |
if (m_orphans.contains(wtxid)) |
} else { | ||
return m_orphans.count(Txid::FromUint256(gtxid.GetHash())); | ||
} | ||
return m_orphans.count(wtxid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c++20-nit:
return m_orphans.count(wtxid); | |
return m_orphans.contains(wtxid); |
// downloading a transaction by intentionally creating a malleated version of it. While | ||
// only one (or none!) of these transactions can ultimately be confirmed, we have no way of | ||
// discerning which one that is, so the orphanage can store multiple transactions with the | ||
// same txid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 8923edf:
nit: I feel like documentation on why TxOrphanage
indexes by wtxid/allows multiple versions of the same transaction should be documented in TxOrphanage
instead of here.
bad_peer = node.add_p2p_connection(P2PInterface()) | ||
honest_peer = node.add_p2p_connection(P2PInterface()) | ||
|
||
# 1. Fake orphan is received first. It is missing an input. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I first interpreted that as that an input had been removed from the transaction, perhaps better to use "parent" as to stay consistent with the language in the next steps?
# 1. Fake orphan is received first. It is missing an input. | |
# 1. Fake orphan is received first, the parent is not yet broadcast. |
tx_middle = self.wallet.create_self_transfer(utxo_to_spend=tx_grandparent["new_utxo"]) | ||
|
||
# Create a fake version of the middle tx | ||
tx_orphan_bad_wit = self.create_malleated_version(tx_middle) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would tx_middle_bad_wit
be more consistent naming here?
trACK 0fb17bf |
I merged this as-is since it has so many acks, but it probably makes sense to follow up on the last few comments and suggestions. |
Part of #27463 in the "make orphan handling more robust" section.
Currently the main map in
TxOrphanage
is indexed by txid; we do not allow 2 transactions with the same txid into TxOrphanage. This means that if we receive a transaction and want to store it in orphanage, we'll fail to do so if a same-txid-different-witness version of the tx already exists in the orphanage. The existing orphanage entry can stay until it expires 20 minutes later, or until we find that it is invalid.This means an attacker can try to block/delay us accepting an orphan transaction by sending a mutated version of the child ahead of time. See included test.
Prior to #28970, we don't rely on the orphanage for anything and it would be relatively difficult to guess what transaction will go to a node's orphanage. After the parent(s) are accepted, if anybody sends us the correct transaction, we'll end up accepting it. However, this is a bit more painful for 1p1c: it's easier for an attacker to tell when a tx is going to hit a node's orphanage, and we need to store the correct orphan + receive the parent before we'll consider the package. If we start out with a bad orphan, we can't evict it until we receive the parent + try the 1p1c, and then we'll need to download the real child, put it in orphanage, download the parent again, and then retry 1p1c.