Skip to content
This repository has been archived by the owner on Mar 12, 2020. It is now read-only.

Flaky dials #871

Open
sanderpick opened this issue Jul 23, 2019 · 0 comments
Open

Flaky dials #871

sanderpick opened this issue Jul 23, 2019 · 0 comments
Labels
bug Something isn't working

Comments

@sanderpick
Copy link
Member

sanderpick commented Jul 23, 2019

Describe the bug
Dialing a remote peer often fails at relatively steady intervals. The remote peer stops accepting connections for ~1-2 min, after which everything starts working again. At least I can confirm that the remote is actually not responding.

During this downtime,

curl -v telnet://104.210.43.77:4001

will hang. Once it starts responding again (with /multistream/1.0.0), our service layer calls succeed.

To Reproduce
Run a desktop peer in debug mode and register a remote cafe. After awhile, you should notice that dials start to fail:

16:31:04.149 DEBUG tex-servic: sending CAFE_CHECK_MESSAGES to 12D3KooWSsM117bNw6yu1auMfNqeu59578Bct5V4S9fWxavogrsw main.go:102
16:31:04.149  INFO tex-servic: error writing message, trying again:  stream reset sender.go:165
16:31:21.089 DEBUG tex-servic: received pubsub CAFE_PUBSUB_QUERY from 12D3KooWRWXWWWuUKP6u8Fdq9FvyHKz6ksWys5cPoGVwhMMGUp1f main.go:502
16:31:34.151 ERROR   tex-core: error checking messages: context deadline exceeded main.go:713
16:31:34.151 DEBUG   tex-core: flushing downloads block_downloads.go:53
16:31:34.151 DEBUG   tex-core: handling 0 downloads block_downloads.go:61
16:31:34.582 DEBUG tex-servic: received pubsub CAFE_PUBSUB_QUERY from 12D3KooWGN8VAsPHsHeJtoTbbzsGjs2LTmQZ6wFKvuPich1TYmYY main.go:502
16:32:04.150 DEBUG   tex-core: flushing cafe outbox cafe_outbox.go:172
16:32:04.151 DEBUG   tex-core: handling 0 cafe requests cafe_service.go:1454
16:32:04.152 DEBUG tex-servic: sending CAFE_CHECK_MESSAGES to 12D3KooWSsM117bNw6yu1auMfNqeu59578Bct5V4S9fWxavogrsw main.go:102
16:32:09.156 ERROR   tex-core: error checking messages: failed to dial : all dials failed
  * [/ip4/127.0.0.1/tcp/8081/ws] dial tcp 127.0.0.1:8081: connect: connection refused
  * [/ip4/127.0.0.1/tcp/4001] dial tcp4 127.0.0.1:4001: connect: connection refused
  * [/ip4/104.210.43.77/tcp/4001] dial tcp4 0.0.0.0:9113->104.210.43.77:4001: i/o timeout main.go:713
16:32:09.156 DEBUG   tex-core: flushing downloads block_downloads.go:53
16:32:09.156 DEBUG   tex-core: handling 0 downloads block_downloads.go:61
16:32:21.117 DEBUG tex-servic: received pubsub CAFE_PUBSUB_QUERY from 12D3KooWRWXWWWuUKP6u8Fdq9FvyHKz6ksWys5cPoGVwhMMGUp1f main.go:502
16:32:32.606 DEBUG tex-servic: received pubsub CAFE_PUBSUB_QUERY from 12D3KooWLnUv9MWuRM6uHirRPBM4NwRj54n4gNNnBtiFiwPiv3Up main.go:502
16:32:48.116 DEBUG tex-servic: received pubsub CAFE_PUBSUB_QUERY from 12D3KooWNjRYvJza7dFx4YTf9kUYBUcbefBDexhBJ3KYHGrC1dtG main.go:502
16:33:04.152 DEBUG   tex-core: flushing cafe outbox cafe_outbox.go:172
16:33:04.152 DEBUG   tex-core: handling 0 cafe requests cafe_service.go:1454
16:33:04.154 DEBUG tex-servic: sending CAFE_CHECK_MESSAGES to 12D3KooWSsM117bNw6yu1auMfNqeu59578Bct5V4S9fWxavogrsw main.go:102
16:33:04.704 DEBUG tex-servic: received CAFE_MESSAGES response from 12D3KooWSsM117bNw6yu1auMfNqeu59578Bct5V4S9fWxavogrsw main.go:130

Expected behavior
All dials should succeed.

@sanderpick sanderpick added the bug Something isn't working label Jul 23, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant