test(gossipsub): peer discovery #1168

AlejandroCabeza · 2024-08-07T19:07:38Z

Implement Gossipsub's test plan's Peer Discovery block.

Note

One of the tests is commented out due to the reasons explained in the comment.

diegomrsantos · 2024-09-05T16:00:15Z

tests/testdiscovery.nim

@@ -18,7 +18,7 @@ import
    discovery/discoverymngr,
    discovery/rendezvousinterface,
  ]
-import ./helpers
+import ./helpers, ./asyncunit, ./utils/[async, assertions, futures]


Those tests need to be moved to a new PR with a different scope as they are unrelated to GossipSub.

I don't think I follow. Why are you linking the comment to the imports?

Sorry, I mean the tests in this file.

I would say they are related to Gossipsub given they are specified in Gossipsub Testplan's Peer Discovery block, as linked in the description.

Do you perhaps mean to move them to a different file?

Rendezvous and Peer discovery aren't related to GossipSub.

This is how gossipsub is informed of newly connected peers https://github.com/vacp2p/nim-libp2p/blob/master/libp2p/protocols/pubsub/pubsub.nim#L665. We don't need to use any discovery mechanism to test this, just connecting to peers using the switch is enough.

Whenever a new peer is connected, the gossipsub implementation checks to see if the peer implements floodsub and/or gossipsub, and if so, it sends it a hello packet that announces the topics that it is currently subscribing to.

I don't know if we are doing this part: "checks to see if the peer implements floodsub and/or gossipsub".

Those are two different things, I believe.
What you mention is the callback that lets you run code when a peer joins or leaves.
This is a discovery mechanism that allows you to find peers available for connection.

Also, I believe that having removed the xasyncTest, this PR's scope is just peer discovery.

Yes, as we agreed, it's best to keep PRs small and focused on a single scope.

That said, we should avoid prolonging the PR process unnecessarily. In this case, for example, the cost of delaying to debate whether to split the PR outweighs simply merging it, provided the code is sound.

Approaching the discussion more collaboratively might have helped resolve the issue more quickly.

tests/utils/tests.nim

tests/utils/async.nim

diegomrsantos · 2024-09-10T15:41:25Z

tests/pubsub/testgossipsub.nim

@@ -277,54 +278,6 @@ suite "GossipSub":

    await allFuturesThrowing(nodes[0].switch.stop(), nodes[1].switch.stop())

-  asyncTest "GossipSub unsub - resub faster than backoff":


Why has this been removed?

Because that test is wrong. The correct one is the one I implemented but marked with xasyncTest because currently there's no mechanism to make it work as it should.

Why is the test wrong?

Because it doesn't take into account the fanout mechanism. The message is sent via fanout, it doesn't care about backoff period.

Please, add the test back.

Even if it's not working?

diegomrsantos

Rendezvous tests are unrelated to GossipSub, so they shouldn't be in the GossipSub test cases. Please create a new PR with a different description.

diegomrsantos · 2024-09-11T18:08:31Z

tests/pubsub/testgossipsub.nim

    )
+    gossip1.routingRecordsHandler.add(


First time I see RoutingRecordsHandler. Would you happen to know what is the purpose of this? Is it something mentioned in the spec?

It's the callback for peer exchange.

diegomrsantos · 2024-09-11T18:12:45Z

tests/utils/async.nim

+proc toResult*[T](future: Future[T]): Result[T, string] =
+  if future.cancelled():
+    return results.err("Future cancelled/timed out.")
+  elif future.finished():
+    if not future.failed():
+      return future.toOk()
+    else:
+      return results.err("Future finished but failed.")
+  else:
+    return results.err("Future still not finished.")
+
+proc waitForResult*[T](
+    future: Future[T], timeout = DURATION_TIMEOUT
+): Future[Result[T, string]] {.async.} =
+  discard await future.withTimeout(timeout)
+  return future.toResult()


I believe that this complexity isn't necessary. Result is a substitute for an Exception, not a Future. Our focus should be to make the codebase simpler and avoid unnecessary complexity.

Result is not a substitute for Exception, but for a more general "any failed computation".
In this case, it simplifies a lot of checks in tests and makes them more robust. When you run withTimeout you are effectively collapsing the "uncertainty of the future", that is, making it have either a value or an error.
Casting that "collapsed future" into a result allows for a more visual and easier handling of that situation.
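
For readers following the thread, here is a minimal sketch of the idea, assuming chronos and the results package are available. `slowAnswer` and the error strings are hypothetical names for illustration, and this `toResult` is a simplified stand-in, not the PR's exact helper:

```nim
import chronos, results

proc toResult[T](future: Future[T]): Result[T, string] =
  ## Collapse a future that has already been waited on into a Result.
  if future.cancelled():
    Result[T, string].err("cancelled or timed out")
  elif future.failed():
    Result[T, string].err("failed")
  elif future.finished():
    Result[T, string].ok(future.read())
  else:
    Result[T, string].err("still pending")

# Hypothetical operation that needs 200 ms to produce a value.
proc slowAnswer(): Future[int] {.async.} =
  await sleepAsync(200.milliseconds)
  return 42

proc main() {.async.} =
  # Generous timeout: the future completes, so the Result carries the value.
  let inTime = slowAnswer()
  discard await inTime.withTimeout(1.seconds)
  doAssert inTime.toResult().isOk()

  # Tight timeout: the future does not complete, so the Result is an error.
  let tooSlow = slowAnswer()
  discard await tooSlow.withTimeout(50.milliseconds)
  doAssert tooSlow.toResult().isErr()

waitFor main()
```

The point of the sketch is only that, once the timeout has elapsed, a single `isOk`/`isErr` check covers the completed, failed, cancelled, and pending cases.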

diegomrsantos · 2024-09-11T18:16:34Z

tests/pubsub/testgossipsub.nim

    nodes[1].unsubscribe("foobar", handler)

-    await passed.wait(5.seconds)


Why has this been removed?

Because it's been replaced by the waitForResult call. Conceptually similar, but handier for the checks below.

diegomrsantos · 2024-09-11T18:23:33Z

tests/utils/futures.nim

+const
+  DURATION_TIMEOUT* = 1.seconds
+  DURATION_TIMEOUT_EXTENDED* = 1500.milliseconds


I believe this shouldn't be defined in a utils file. The constant names don't have any meaningful description and we don't have any requirement or consensus about the use of those values in the test suite. If you want to avoid magic numbers and repetition in your tests, I'd suggest defining them for a particular test file.

I would say DURATION_TIMEOUT is not a horrible name. It specifies the type (duration) and the context (timeout). Given that's the default measurement I'm using, I decided to leave the name as is. I believe the EXTENDED suffix can be easily understood from that.

The reason I'm specifying them in a file is precisely because I'm using those numbers widely across different files, so it felt a bit repetitive having to repeat them all over the test codebase.

diegomrsantos · 2024-09-11T18:29:07Z

tests/pubsub/utils.nim

@@ -201,3 +207,18 @@ proc waitSubGraph*(nodes: seq[PubSub], key: string) {.async.} =

    await sleepAsync(5.milliseconds)
    doAssert Moment.now() < timeout, "waitSubGraph timeout!"
+
+proc waitForMesh*(


This doesn't seem to be used.

It was used in a test that was removed at your suggestion. Removing it.

diegomrsantos · 2024-09-11T23:42:19Z

tests/testdiscovery.nim

+    await rdvB.unsubscribe(namespace)
+    var
+      query2 = dmA.request(rdvNamespace)
+      res2 = await query2.getPeer().waitForResult(1.seconds)


You can use withTimeout and check the result is false.

Yes, I could. That being said, I consider casting the Future to Result a much cleaner approach which also allows for value checking if need be (not in this case, ofc).
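
As a rough illustration of the withTimeout alternative mentioned above, assuming chronos, where `withTimeout` yields `false` when the timeout elapses first. `neverCompletes` is a hypothetical stand-in for a query that should return no peer:

```nim
import chronos

# Hypothetical operation that will not finish within the test's time budget.
proc neverCompletes(): Future[int] {.async.} =
  await sleepAsync(10.seconds)
  return 1

proc main() {.async.} =
  let pending = neverCompletes()
  # withTimeout returns a bool: true if `pending` finished first,
  # false if the timeout elapsed first.
  let completedInTime = await pending.withTimeout(50.milliseconds)
  doAssert not completedInTime

waitFor main()
```

This form only reports whether the future finished in time; reading the value on success takes a separate step, which is the trade-off discussed in this thread.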

diegomrsantos · 2024-09-11T23:45:22Z

tests/testdiscovery.nim

+    dmB.advertise(rdvNamespace)
+    let
+      query1 = dmA.request(rdvNamespace)
+      res1 = await query1.getPeer().waitForResult(1.seconds)


This isn't supposed to hang forever, so you can just use await and check the value is returned as expected. I'd suggest using a more descriptive name than res1.

Using await means there's no timeout, and having one is quite handy in this situation. This operation should take far less than 1 second, which means that if the timeout is hit, something is very wrong and the test should quit. If, by any chance, the timeout is hit because the task is hanging, this makes the test exit instead of leaving the test suite running forever.

Regarding the name, if you have a look at the tests you will see it's kind of standard practice. If you have a suggestion I'm all ears :)

lchenut · 2024-09-24T12:22:47Z

tests/testdiscovery.nim

+        query1 = dmA.request(rdvNamespace)
+        res1 = await query1.getPeer().waitForResult(1.seconds)


A query is a tool to re-use. Recreating one every loop is not the way to do it.
It should work just by writing

asyncTest "Frequent sub/unsub":
  let query = dmA.request(rdvNamespace)
  for i in 0 ..< 10:
    (...)

Just did that and it doesn't work, it breaks at the 3rd iteration. I don't know what the correlation might be.

Then there's a problem in rendezvous or the discovery manager 😕
But I think I already tried to understand this. It rings a bell.

What would you rather I do with this? Leave it as is and open an issue or would you like to investigate it before merging?

I would like to investigate, it's a real problem. I'll do this next week.

lchenut · 2024-09-24T12:31:20Z

tests/testdiscovery.nim

+
+      res2.assertIsErr()
+
+  asyncTest "Frequent sub/unsub with multiple clients":


Same thing in this test, you should create only one query per discovery manager.

AlejandroCabeza self-assigned this Aug 7, 2024

AlejandroCabeza mentioned this pull request Aug 7, 2024

bug: rendezvous unsubscribed peer is discoverable #1169

Open

AlejandroCabeza force-pushed the tests/gossipsub/peer-discovery branch 2 times, most recently from 1852506 to 561ca6d Compare August 14, 2024 15:39

AlejandroCabeza added 12 commits August 27, 2024 18:56

Add utility functions.

a85be04

Refactor discovery tests for reusability.

c8bc839

Implement renddezvous sub/unsub test.

788e156

Implement rendezvous frequent sub/unsub tests.

f791e78

Update waitForResult to handle void futures.

5641946

Improve existing PX test.

543c281

Add timeout duration consts.

e199782

Add active waitForMesh proc.

b32badf

Fix timeout variable naming

cba704d

Update future to result utils.

3e68f91

Add simple mocking mechanism.

6501a61

Fix resub after unsub test.

e53d041

AlejandroCabeza force-pushed the tests/gossipsub/peer-discovery branch from e2e278e to e53d041 Compare August 27, 2024 16:56

AlejandroCabeza marked this pull request as ready for review August 27, 2024 16:56

AlejandroCabeza requested review from lchenut and diegomrsantos August 27, 2024 16:57

AlejandroCabeza added 3 commits August 28, 2024 18:00

Remove mock and comment test.

0db9812

Remove mocking module.

2bbc2f3

Remove mock import.

6447714

diegomrsantos reviewed Sep 5, 2024

Remove unused code.

b40eaa8

AlejandroCabeza force-pushed the tests/gossipsub/peer-discovery branch from 7fd3a28 to b40eaa8 Compare September 10, 2024 15:38

diegomrsantos reviewed Sep 10, 2024

diegomrsantos suggested changes Sep 11, 2024

diegomrsantos reviewed Sep 11, 2024

Remove unused utils.

d60822e

lchenut reviewed Sep 24, 2024
