-
-
Notifications
You must be signed in to change notification settings - Fork 348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use autojump_threshold in test_handshake_over_terrible_network #2603
use autojump_threshold in test_handshake_over_terrible_network #2603
Conversation
observation: any autojump threshold over 0 takes MUCH longer than 0 hypothesis: 10 second timeout cscope at the end of test is spuriously cancelling attempts on slow machines experiment: set a minimal threshold and scale back # of handshakes to finish in a reasonable time, check if CI timeouts are avoided
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #2603 +/- ##
=======================================
Coverage 92.45% 92.45%
=======================================
Files 118 118
Lines 16352 16352
Branches 3156 3156
=======================================
Hits 15119 15119
Misses 1104 1104
Partials 129 129
|
I'm not sure I agree with the reasoning in the description of the PR (the 10 second cancel scope should solely be skipping sends?). I think I agree with that changes though, just the description needs to be rethought! I would make |
I suppose I should put a fuller explanation here in the PR. I just learned about this stuff today as well after all! The My thinking is that the send is effectively "nonblocking" and immediately reschedules itself, whereas the receive has to wait long enough for the system (kernel? python?) to get around to handling the sent payload. This handling process may be just slow enough on an overloaded CI machine to outlast one cycle of the run loop that applies for the default setting, but not the 1 ms Finally, it does seem to work, I've rerun CI on 55abf37 4 times and never had a pypy timeout failure. Take another look after I unify |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does seem like a pretty important issue! Alright, I'm on board. I'd rather wait for confirmation on whether this autojump + cancel scope issue might impact elsewhere (from maybe njs?), though that's not too important.
I shouldn't have left this out to dry for so long, sorry! |
observation: any autojump threshold over 0 takes MUCH longer than 0
hypothesis: 10 second timeout cscope at the end of test is spuriously cancelling attempts on slow machines
experiment: set a minimal threshold and scale back # of handshakes to finish in a reasonable time, check if CI timeouts are avoided