-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test(gossipsub): peer discovery #1168
base: master
Are you sure you want to change the base?
Conversation
1852506
to
561ca6d
Compare
e2e278e
to
e53d041
Compare
@@ -18,7 +18,7 @@ import | |||
discovery/discoverymngr, | |||
discovery/rendezvousinterface, | |||
] | |||
import ./helpers | |||
import ./helpers, ./asyncunit, ./utils/[async, assertions, futures] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those tests need to be moved to a new PR with a different scope as they are unrelated to GossipSub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I follow. Why are you linking the comment to the imports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I mean the tests in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say they are related to Gossipsub given they are specified in Gossipsub Testplan's Peer Discovery block, as linked in the description.
Do you perhaps mean to move them to a different file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rendezvous and Peer discovery aren't related to GossipSub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how gossipsub is informed of newly connected peers https://github.com/vacp2p/nim-libp2p/blob/master/libp2p/protocols/pubsub/pubsub.nim#L665. We don't need to use any discovery mechanism to test this, just connecting to peers using the switch
is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whenever a new peer is connected, the gossipsub implementation checks to see if the peer implements floodsub and/or gossipsub, and if so, it sends it a hello packet that announces the topics that it is currently subscribing to.
I don't know if we are doing this part checks to see if the peer implements floodsub and/or gossipsub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are two different things, I believe.
What you mention is the callback that lets you run code when a peer joins or leaves.
This is a discovery mechanism that allows you to find peers available for connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I believe that having removed the xasyncTest
, this PR's scope is just peer discovery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as we agreed, it's best to keep PRs small and focused on a single scope.
That said, we should avoid prolonging the PR process unnecessarily. In this case, for example, the cost of delaying to debate whether to split the PR outweighs simply merging it, provided the code is sound.
Approaching the discussion more collaboratively might have helped resolve the issue more quickly.
7fd3a28
to
b40eaa8
Compare
@@ -277,54 +278,6 @@ suite "GossipSub": | |||
|
|||
await allFuturesThrowing(nodes[0].switch.stop(), nodes[1].switch.stop()) | |||
|
|||
asyncTest "GossipSub unsub - resub faster than backoff": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why has this been removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because that test is wrong. The correct one is the one I implemented but marked with xasyncTest
because currently there's no mechanism to make it work as it should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the test wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it doesn't take into account the fanout mechanism. The message is sent via fanout, it doesn't care about backoff period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add the test back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if it's not working?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rendezvous tests are unrelated to GossipSub, they shound't be in GossipSub test cases. Please create a new PR with a different description.
) | ||
gossip1.routingRecordsHandler.add( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First time I see RoutingRecordsHandler
. Would you happen to know what is the purpose of this? Is it something mentioned in the spec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the callback for peer exchange.
proc toResult*[T](future: Future[T]): Result[T, string] = | ||
if future.cancelled(): | ||
return results.err("Future cancelled/timed out.") | ||
elif future.finished(): | ||
if not future.failed(): | ||
return future.toOk() | ||
else: | ||
return results.err("Future finished but failed.") | ||
else: | ||
return results.err("Future still not finished.") | ||
|
||
proc waitForResult*[T]( | ||
future: Future[T], timeout = DURATION_TIMEOUT | ||
): Future[Result[T, string]] {.async.} = | ||
discard await future.withTimeout(timeout) | ||
return future.toResult() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that this complexity isn't necessary. Result
is a substitute for an Exception
, not a Future
. Our focus should be to make the codebase simpler and avoid unnecessary complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Result
is not a substitute for Exception
, but for a more general "any failed computation".
In this case, this simplifies and makes more robust a lot of checks in tests. When you run withTimeout
you are effectively collapsing the "uncertainty of the future", that is, making it either have a value, or an error.
Casting that "collapsed future" into a result allows for a more visual and easier handling of that situation.
nodes[1].unsubscribe("foobar", handler) | ||
|
||
await passed.wait(5.seconds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why has this been removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it's been replaced by the waitForResult
call. Conceptually similar, but handier for the checks below.
const | ||
DURATION_TIMEOUT* = 1.seconds | ||
DURATION_TIMEOUT_EXTENDED* = 1500.milliseconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this shouldn't be defined in an utils file. The constant names don't have any meaningful description and we don't have any requirement or consensus about the use of those values in the test suite. If you want to avoid magic numbers and repetition in your tests, I'd suggest to define them for a particular test file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say DURATION_TIMEOUT
is not a horrible name. It specifies the type (duration) and the context (timeout). Given that's the default measurement I'm using, I decided to leave the name as is. I believe the EXTENDED
suffix can be easily understood from that.
The reason I'm specifying them in a file is precisely because I'm using those numbers widely across different files, so it felt a bit repetitive having to repeat them all over the test codebase.
tests/pubsub/utils.nim
Outdated
@@ -201,3 +207,18 @@ proc waitSubGraph*(nodes: seq[PubSub], key: string) {.async.} = | |||
|
|||
await sleepAsync(5.milliseconds) | |||
doAssert Moment.now() < timeout, "waitSubGraph timeout!" | |||
|
|||
proc waitForMesh*( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was used in a test that was removed by your suggestion. Removing it.
await rdvB.unsubscribe(namespace) | ||
var | ||
query2 = dmA.request(rdvNamespace) | ||
res2 = await query2.getPeer().waitForResult(1.seconds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use withTimeout
and check the result is false
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I could. That being said, I consider casting the Future
to Result
a much cleaner approach which also allows for value checking if need be (not in this case, ofc).
dmB.advertise(rdvNamespace) | ||
let | ||
query1 = dmA.request(rdvNamespace) | ||
res1 = await query1.getPeer().waitForResult(1.seconds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't supposed to hang forever, so you can just use await
and check the value is returned as expected. I'd suggest using a more descriptive name than res1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using await
means there's no timeout, which is something quite handy for this situation. This operation should take way lower than 1 second, which means if the timeout is hit, there's something very wrong with this and the test should quite. If, by any chance, the reason the timeout is hit is the task is hanging, this makes the test exit and not leave the test suite running forever.
Regarding the name, if you have a look at the tests you will see it's kind of standard practice. If you have a suggestion I'm all ears :)
query1 = dmA.request(rdvNamespace) | ||
res1 = await query1.getPeer().waitForResult(1.seconds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A query
is a tool to re-use. Recreating one every loop is not the way to do.
It should work just by writing
asyncTest "Frequent sub/unsub":
let query = dmA.request(rdvNamespace)
for i in 0 ..< 10:
(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just did that and it doesn't work, it breaks at the 3rd iteration. I don't know what might the correlation be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then there's a problem in rendezvous or the discovery manager 😕
But I think I already tried to understand this. It rings a bell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you rather I do with this? Leave it as is and open an issue or would you like to investigate it before merging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to investigate, it's a real problem. I'll do this next week.
|
||
res2.assertIsErr() | ||
|
||
asyncTest "Frequent sub/unsub with multiple clients": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing in this test, you should create only one query per discovery manager.
Implement Gossipsub's test plan's Peer Discovery block.
Note
One of the tests is commented out due to the reasons explained in the comment.