Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(supervisors): fix occasional assertion failures and hangs #2431

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

vvanglro
Copy link
Contributor

@vvanglro vvanglro commented Aug 14, 2024

Summary

It's not entirely clear why the assertion fails here. assert not process.is_alive()

Make sure that test resources can be cleaned up anyway.

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.

…lly'

Ensure proper cleanup in 'test_multiprocess' by wrapping the supervisor shutdown
logic in a 'finally' block. This guarantees that the supervisor is properly terminated
and all processes are joined, avoiding potential resource leaks or state residuebetween test runs.
@vvanglro
Copy link
Contributor Author

vvanglro commented Aug 14, 2024

Now there are two problems, one is that the process hangs when assertion fails(This PR solves), and the second is that it hangs on execution.

related:
https://stackoverflow.com/questions/74114409/flask-hangs-when-run-as-a-subprocess-alongside-pytest
pytest-dev/pytest#10965

I create a Windows environment, but it's hard to reproduce.🙃

A10-second timeout has been added to the subprocess call within the test_multiprocess.py
to ensure it does not hang indefinitely in case of failures or slow systems.
@vvanglro
Copy link
Contributor Author

Ah, It seems to make sense now. process.kill() process does not end immediately, This is why failure is occasionally asserted here assert not process.is_alive().

Successful cases are all executed for more than 7s, that is, when the ping is executed, the process is really closed, and then the timeout returns false.

I add a timeout of 1 second to the assertion here, and the total execution time is more than 3 s, indicating that the ping method is taken here.
image

The same.

image

@Kludex
Copy link
Member

Kludex commented Aug 15, 2024

See? No need for retry. 😌👍

Thanks for your time investigating this. 🙏

@vvanglro vvanglro changed the title test(supervisors): ensure cleanup in test_multiprocess by using 'finally' test(supervisors): fix occasional assertion failures and hangs Aug 16, 2024
tests/supervisors/test_multiprocess.py Outdated Show resolved Hide resolved
@@ -34,6 +34,7 @@ def new_function():
"-c",
f"from {module} import {name}; {name}.__wrapped__()",
],
timeout=10,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this was added earlier when I was investigating why it was hanging. No need for it now, I'll remove it later

Kludex
Kludex previously approved these changes Nov 20, 2024
@Kludex Kludex enabled auto-merge (squash) November 20, 2024 20:54
- Replace fixed sleep with retry mechanism for process status checks- Enhance test stability by allowing for variable process startup times
auto-merge was automatically disabled November 21, 2024 08:12

Head branch was pushed to by a user without write access

@Kludex
Copy link
Member

Kludex commented Nov 21, 2024

Thanks.

@vvanglro
Copy link
Contributor Author

vvanglro commented Nov 21, 2024

We can't control it with precision, so we might as well retry.

Just make sure the dead process is eventually started successfully.

@Kludex Kludex dismissed their stale review December 14, 2024 09:22

Retries are not a good sign in a test suite.

@Kludex
Copy link
Member

Kludex commented Dec 15, 2024

@abersheeran suggestions to solve this flaky test?

@vvanglro
Copy link
Contributor Author

vvanglro commented Dec 16, 2024

Very rare, Run it many times before encountering. It looks like the restart of the process is delayed?

image

@vvanglro
Copy link
Contributor Author

Ah, does the unit test code have to be statistically covered too?

@abersheeran
Copy link
Member

@abersheeran suggestions to solve this flaky test?

I think using while to check is not a good idea in unit tests. To be honest, I think it would be better to just kill all processes in the tree externally in case of weird scheduling errors.

@vvanglro
Copy link
Contributor Author

@Kludex I think the issue is resolved in the latest commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants