-
Notifications
You must be signed in to change notification settings - Fork 397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Override dlsym instead of dlopen to correctly honour RPATH/RUNPATHS #525
Conversation
6a6bf73
to
5c7be6d
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #525 +/- ##
==========================================
+ Coverage 92.55% 92.86% +0.30%
==========================================
Files 91 91
Lines 11304 11136 -168
Branches 1581 2026 +445
==========================================
- Hits 10462 10341 -121
+ Misses 837 795 -42
+ Partials 5 0 -5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Hm. What I pushed seems to work on glibc Linux and macOS, but I think it's hanging on musl linux. I'll probably need to spin up a container to debug that. 😔 |
08ac332
to
f8e51aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got a few small nits, but otherwise this looks good to me.
In addition to these, would you mind squashing the first two commits together (and leaving the 3rd and 4th separate)? Commit 2 just un-does something done in commit 1, mostly (and adds a &::
that wasn't there before). Maybe update the commit message to mention the rationale for the &::
to force a lookup in the global namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I've squashed the fixups and rebased.
We need to override dlopen/dlclose to account for new shared libraries being loaded in the process memory space. This is needed so we can correctly track allocations in those libraries by overriding their PLT entries and also so we can properly map the addresses of the symbols in those libraries when we resolve later native traces. Unfortunately, we can't just override dlopen directly because of the following edge case: when a shared library dlopen's another by name (e.g. dlopen("libfoo.so")), the dlopen call will honor the RPATH/RUNPATH of the calling library if it's set. Some libraries set an RPATH/RUNPATH based on $ORIGIN (the path of the calling library) to load dependencies from a relative directory based on the location of the calling library. This means that if we override dlopen, we'll end up loading the library from the wrong path or more likely, not loading it at all because the dynamic loader will think the memray extenion it's the calling library and the RPATH of the real calling library will not be honoured. To work around this, we override dlsym instead and override the symbols in the loaded libraries only the first time we have seen a handle passed to dlsym. This works because for a symbol from a given dlopen-ed library to appear in a call stack, *something* from that library has to be dlsym-ed first. The only exception to this are static initializers, but we cannot track those anyway by overriding dlopen as they run within the dlopen call itself. Signed-off-by: Pablo Galindo <[email protected]>
Now that we're patching `dlsym`, we need to be careful to always pick up the global symbol named `dlsym`, to avoid lldb considering our own namespaces and finding the wrong symbol's address. Signed-off-by: Matt Wozniski <[email protected]>
LLDB wants to resolve `&::malloc` to `memray::hooks::malloc` for reasons that I haven't fully managed to make sense of, but we can work around this by calling our hook `memray::hooks::memray_malloc` instead. Wrap that up in a macro to make it harder to forget the way to compute the hook object name from the hooked function's name. Signed-off-by: Matt Wozniski <[email protected]>
To avoid deadlocks and infinite recursion, we need to avoid patching symbols inside the linker shared objects. We have been diligently doing this with the GNU linker ("ld-linux") but not for the muslc linker ("ld-musl"). This has been working by chance because we use our recursion guards in many of the symbols we patch, including dlopen. The fact that we override dlopen and we use the guard was preventing a deadlock when using native traces in muslc, as fetching native traces ends adquiring a lock that it's shared by dlopen itself. Now that we do not patch dlopen, there is nothing preventing memray to try to fetch native traces in the inner calls to calloc() and that tries to adquire the dlopen lock which is not re-entrant. The proper fix is to avoid patching inside the muslc linker as we do for Linux. Signed-off-by: Pablo Galindo <[email protected]>
We need to override dlopen/dlclose to account for new shared libraries
being loaded in the process memory space. This is needed so we can
correctly track allocations in those libraries by overriding their PLT
entries and also so we can properly map the addresses of the symbols in
those libraries when we resolve later native traces. Unfortunately, we
can't just override dlopen directly because of the following edge case:
when a shared library dlopen's another by name (e.g.
dlopen("libfoo.so")), the dlopen call will honor the RPATH/RUNPATH of
the calling library if it's set. Some libraries set an RPATH/RUNPATH
based on $ORIGIN (the path of the calling library) to load dependencies
from a relative directory based on the location of the calling library.
This means that if we override dlopen, we'll end up loading the library
from the wrong path or more likely, not loading it at all because the
dynamic loader will think the memray extenion it's the calling library
and the RPATH of the real calling library will not be honoured.
To work around this, we override dlsym instead and override the symbols
in the loaded libraries only the first time we have seen a handle passed
to dlsym. This works because for a symbol from a given dlopen-ed library
to appear in a call stack, something from that library has to be
dlsym-ed first. The only exception to this are static initializers, but
we cannot track those anyway by overriding dlopen as they run within the
dlopen call itself.
Closes: #212