-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Be careful about freeing callback trampolines #64
Conversation
08e0ee9
to
38a2f77
Compare
ruff started to complain about this in the venv tests.
38a2f77
to
5788fa2
Compare
src/systemd_ctypes/libsystemd.py
Outdated
if self.deferred is not None: | ||
self.deferred.append(self.trampoline) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me a bit nervous -- deferred
is a class variable and also set as such (on the class), but del is an instance method. Can it happen that event sources or bus slots get freed at the same time in parallel threads? Does python do the locking here, or don't we support ctypes from multiple threads at all?
Also, could this be Trampoiline.deferred
, to point out that this is a class var? Accessing it as self
feels misleading (even though it probably does the same).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could definitely be accessed via the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot about the GIL! So parallel access should be safe.
# We can be sure we're not dispatching callbacks anymore | ||
libsystemd.Trampoline.deferred = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so that's the final cleanup of the deferred trampolines. Is there only ever one instance of Selector
? This doesn't feel correct if there could be multiple ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. This is why I'm not super happy about this fix. If we tried to run independent mainloops in separate threads, this would indeed be incorrect. Trying to do something "more correct" here is hard, though, and we never have anything but the default loop running in the main thread, so ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine. This could do with an assertion that there's only a single instance, but good enough now!
Our approach to handling Source and Slot objects is fairly clever: we tie the call trampoline and closure to the same object that holds a reference to the source object on the C side. When we are about to `__del__()` that object, we unref the source, preventing any further events from being dispatched. In this way, we can be completely sure that systemd will never call our trampoline after it's been freed. Unfortunately, this isn't good enough: we have a lot of cases where we free a Source while it is currently being dispatched. Until now we've never noticed a problem, but Cockpit recently added a stress-test for inotify (`test_fsinfo_watch_identity_changes`) which dispatches thousand of events and runs long enough that garbage collection gets invoked, freeing trampolines while they are currently running. Python does not hold a reference to the data, and this causes crashes on some architectures. Let's give Source and Slot a common base class (Trampoline) that models their common behaviour. This helper class also changes the `__del__()` behaviour: in case some external caller has requested deferral of the destruction of trampolines, we add them to a list just before we get deleted, to prevent the FFI wrapper from being destroyed with us. We know that the problem described above is only a problem if we're dispatching from systemd's event loop, so setup deferral on entry to the loop and drop the deferred objects on exit. Closes #63
5788fa2
to
fbb8e57
Compare
Our approach to handling Source and Slot objects is fairly clever: we
tie the call trampoline and closure to the same object that holds a
reference to the source object on the C side. When we are about to
__del__()
that object, we unref the source, preventing any furtherevents from being dispatched. In this way, we can be completely sure
that systemd will never call our trampoline after it's been freed.
Unfortunately, this isn't good enough: we have a lot of cases where we
free a Source while it is currently being dispatched. Until now we've
never noticed a problem, but Cockpit recently added a stress-test for
inotify (
test_fsinfo_watch_identity_changes
) which dispatches thousandof events and runs long enough that garbage collection gets invoked,
freeing trampolines while they are currently running. Python does not
hold a reference to the data, and this causes crashes on some
architectures.
Let's give Source and Slot a common base class (Trampoline) that models
their common behaviour. This helper class also changes the
__del__()
behaviour: in case some external caller has requested deferral of the
destruction of trampolines, we add them to a list just before we get
deleted, to prevent the FFI wrapper from being destroyed with us.
We know that the problem described above is only a problem if we're
dispatching from systemd's event loop, so setup deferral on entry to the
loop and drop the deferred objects on exit.
Closes #63