Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Override dlsym instead of dlopen to correctly honour RPATH/RUNPATHS #525

Merged
merged 4 commits into from
Feb 8, 2024

Conversation

pablogsal
Copy link
Member

We need to override dlopen/dlclose to account for new shared libraries
being loaded in the process memory space. This is needed so we can
correctly track allocations in those libraries by overriding their PLT
entries and also so we can properly map the addresses of the symbols in
those libraries when we resolve later native traces. Unfortunately, we
can't just override dlopen directly because of the following edge case:
when a shared library dlopen's another by name (e.g.
dlopen("libfoo.so")), the dlopen call will honor the RPATH/RUNPATH of
the calling library if it's set. Some libraries set an RPATH/RUNPATH
based on $ORIGIN (the path of the calling library) to load dependencies
from a relative directory based on the location of the calling library.
This means that if we override dlopen, we'll end up loading the library
from the wrong path or more likely, not loading it at all because the
dynamic loader will think the memray extenion it's the calling library
and the RPATH of the real calling library will not be honoured.

To work around this, we override dlsym instead and override the symbols
in the loaded libraries only the first time we have seen a handle passed
to dlsym. This works because for a symbol from a given dlopen-ed library
to appear in a call stack, something from that library has to be
dlsym-ed first. The only exception to this are static initializers, but
we cannot track those anyway by overriding dlopen as they run within the
dlopen call itself.

Closes: #212

@pablogsal pablogsal force-pushed the dlopen branch 4 times, most recently from 6a6bf73 to 5c7be6d Compare January 16, 2024 16:39
src/memray/_memray/hooks.cpp Show resolved Hide resolved
src/memray/_memray/hooks.cpp Show resolved Hide resolved
src/memray/_memray/hooks.cpp Outdated Show resolved Hide resolved
src/memray/_memray/hooks.cpp Outdated Show resolved Hide resolved
src/memray/_memray/hooks.cpp Outdated Show resolved Hide resolved
src/memray/_memray/hooks.cpp Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Jan 19, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (41248ed) 92.55% compared to head (422a0ee) 92.86%.
Report is 6 commits behind head on main.

Files Patch % Lines
tests/integration/test_extensions.py 90.47% 2 Missing ⚠️
tests/conftest.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #525      +/-   ##
==========================================
+ Coverage   92.55%   92.86%   +0.30%     
==========================================
  Files          91       91              
  Lines       11304    11136     -168     
  Branches     1581     2026     +445     
==========================================
- Hits        10462    10341     -121     
+ Misses        837      795      -42     
+ Partials        5        0       -5     
Flag Coverage Δ
cpp 92.86% <95.16%> (+6.92%) ⬆️
python_and_cython 92.86% <95.16%> (-2.86%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@godlygeek
Copy link
Contributor

godlygeek commented Jan 19, 2024

Hm. What I pushed seems to work on glibc Linux and macOS, but I think it's hanging on musl linux. I'll probably need to spin up a container to debug that. 😔

@pablogsal pablogsal force-pushed the dlopen branch 3 times, most recently from 08ac332 to f8e51aa Compare January 19, 2024 15:00
Copy link
Contributor

@godlygeek godlygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a few small nits, but otherwise this looks good to me.

In addition to these, would you mind squashing the first two commits together (and leaving the 3rd and 4th separate)? Commit 2 just un-does something done in commit 1, mostly (and adds a &:: that wasn't there before). Maybe update the commit message to mention the rationale for the &:: to force a lookup in the global namespace.

tests/integration/rpath_extension/setup.py Outdated Show resolved Hide resolved
tests/integration/test_extensions.py Outdated Show resolved Hide resolved
tests/integration/test_extensions.py Show resolved Hide resolved
src/memray/_memray/hooks.cpp Outdated Show resolved Hide resolved
src/memray/_memray/hooks.cpp Show resolved Hide resolved
src/memray/_memray/hooks.cpp Show resolved Hide resolved
Copy link
Contributor

@godlygeek godlygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I've squashed the fixups and rebased.

pablogsal and others added 4 commits February 8, 2024 16:07
We need to override dlopen/dlclose to account for new shared libraries
being loaded in the process memory space. This is needed so we can
correctly track allocations in those libraries by overriding their PLT
entries and also so we can properly map the addresses of the symbols in
those libraries when we resolve later native traces. Unfortunately, we
can't just override dlopen directly because of the following edge case:
when a shared library dlopen's another by name (e.g.
dlopen("libfoo.so")), the dlopen call will honor the RPATH/RUNPATH of
the calling library if it's set. Some libraries set an RPATH/RUNPATH
based on $ORIGIN (the path of the calling library) to load dependencies
from a relative directory based on the location of the calling library.
This means that if we override dlopen, we'll end up loading the library
from the wrong path or more likely, not loading it at all because the
dynamic loader will think the memray extenion it's the calling library
and the RPATH of the real calling library will not be honoured.

To work around this, we override dlsym instead and override the symbols
in the loaded libraries only the first time we have seen a handle passed
to dlsym. This works because for a symbol from a given dlopen-ed library
to appear in a call stack, *something* from that library has to be
dlsym-ed first. The only exception to this are static initializers, but
we cannot track those anyway by overriding dlopen as they run within the
dlopen call itself.

Signed-off-by: Pablo Galindo <[email protected]>
Now that we're patching `dlsym`, we need to be careful to always pick up
the global symbol named `dlsym`, to avoid lldb considering our own
namespaces and finding the wrong symbol's address.

Signed-off-by: Matt Wozniski <[email protected]>
LLDB wants to resolve `&::malloc` to `memray::hooks::malloc` for reasons
that I haven't fully managed to make sense of, but we can work around
this by calling our hook `memray::hooks::memray_malloc` instead.

Wrap that up in a macro to make it harder to forget the way to compute
the hook object name from the hooked function's name.

Signed-off-by: Matt Wozniski <[email protected]>
To avoid deadlocks and infinite recursion, we need to avoid patching
symbols inside the linker shared objects. We have been diligently doing
this with the GNU linker ("ld-linux") but not for the muslc linker
("ld-musl"). This has been working by chance because we use our
recursion guards in many of the symbols we patch, including dlopen.

The fact that we override dlopen and we use the guard was preventing a
deadlock when using native traces in muslc, as fetching native traces
ends adquiring a lock that it's shared by dlopen itself. Now that we do
not patch dlopen, there is nothing preventing memray to try to fetch
native traces in the inner calls to calloc() and that tries to adquire
the dlopen lock which is not re-entrant.

The proper fix is to avoid patching inside the muslc linker as we do for
Linux.

Signed-off-by: Pablo Galindo <[email protected]>
@godlygeek godlygeek merged commit 6fbadf7 into bloomberg:main Feb 8, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Could not load library libcudnn_ops_infer.so.8.
3 participants