Add a path for BPF-accelerated async signal emulation. #3731

Open: wants to merge 7 commits into master

khuey (Collaborator) commented Apr 22, 2024

Starting with kernel 6.10, BPF filters can choose whether or not to trigger the SIGIO behavior for a perf event that becomes readable. We combine that with a hardware breakpoint and a BPF filter that matches the GPRs (general-purpose registers) to produce an accelerated internal breakpoint type that can fast-forward through loop iterations to deliver async signals. On one trace this reduced rr's replay overhead by 94%.
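A rough way to picture the filtering step: the hardware breakpoint fires on every loop iteration, and the filter lets the notification through only when the registers match the state recorded for the signal. The sketch below models that decision in ordinary user-space C++. It is not the PR's BPF program; the names (Gprs, should_wake, hits_suppressed_before_match, simulate_loop) and the choice of rip/rcx/rsi as the compared registers are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical subset of the general-purpose registers a filter might
// compare; a real filter would look at the full register state at the hit.
struct Gprs {
  std::uint64_t rip;
  std::uint64_t rcx;  // e.g. a loop counter
  std::uint64_t rsi;
};

// User-space stand-in for the BPF filter: signal "readable" (wake rr) only
// when every compared register equals the state recorded at the point where
// the async signal must be delivered.
bool should_wake(const Gprs& current, const Gprs& target) {
  return current.rip == target.rip && current.rcx == target.rcx &&
         current.rsi == target.rsi;
}

// Simulates a hardware breakpoint at target.rip firing once per loop
// iteration. Returns the number of hits the filter swallowed before the
// first hit that would reach rr, or -1 if no hit matches.
long hits_suppressed_before_match(const std::vector<Gprs>& hits,
                                  const Gprs& target) {
  for (std::size_t i = 0; i < hits.size(); ++i) {
    assert(hits[i].rip == target.rip);  // the breakpoint only fires here
    if (should_wake(hits[i], target)) {
      return static_cast<long>(i);
    }
  }
  return -1;
}

// Builds the register state seen at each of `iterations` loop iterations,
// with rcx counting down from `iterations` and rsi held constant.
std::vector<Gprs> simulate_loop(std::uint64_t pc, std::uint64_t iterations) {
  std::vector<Gprs> hits;
  for (std::uint64_t i = 0; i < iterations; ++i) {
    hits.push_back({pc, iterations - i, 0x7f00});
  }
  return hits;
}
```

With a 1000-iteration loop and a target state where rcx == 3, the model swallows 997 hits and wakes rr once, instead of stopping rr on every iteration as an unfiltered breakpoint would.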

This adds a runtime dependency on libbpf and a compile-time dependency on clang --target bpf. rr also needs CAP_BPF and CAP_PERFMON to use this feature. Because of all that, this isn't really suitable for wide use yet and is instead gated behind a CMake option, usebpf. Set -Dusebpf=ON to test it.

(I think we should wait until the kernel side hits Linus's tree to merge this.)

CMakeLists.txt (outdated, resolved)
static struct user_regs_struct* bpf_regs;

if (!fd_async_signal_accelerator.is_open()) {
if (!initialized) {
Collaborator

How about moving this BPF initialization code into its own function?

Collaborator

It feels a bit ugly to be mashing the BPF program's global state in this function. And it's ugly to be mmapping that buffer and then leaking it into the global variable.

How hard would it be to put the BPF program and its state into its own class with proper ownership, and have each ReplaySession hold a shared pointer to an object of that class?
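One possible shape for the ownership the reviewer asks for, sketched under assumptions: the mapped buffer is owned by an object that unmaps it in its destructor, and each session holds a std::shared_ptr to that object. BpfStateOwner and SessionSketch are hypothetical names, and an anonymous mmap stands in for whatever buffer the real code maps.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <sys/mman.h>

// Hypothetical owner: maps a buffer once and unmaps it in the destructor,
// instead of leaking the pointer into a global. An anonymous mapping stands
// in for the buffer the real code would mmap.
class BpfStateOwner {
public:
  explicit BpfStateOwner(std::size_t size) : size_(size) {
    assert(size_ > 0);
    void* p = mmap(nullptr, size_, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
      throw std::runtime_error("mmap failed");
    }
    buffer_ = p;
  }
  ~BpfStateOwner() { munmap(buffer_, size_); }

  // Non-copyable: exactly one object owns the mapping.
  BpfStateOwner(const BpfStateOwner&) = delete;
  BpfStateOwner& operator=(const BpfStateOwner&) = delete;

  void* buffer() const { return buffer_; }
  std::size_t size() const { return size_; }

private:
  void* buffer_ = nullptr;
  std::size_t size_;
};

// Hypothetical session type: each session holds a shared reference, so the
// mapping lives exactly as long as the last session that uses it.
struct SessionSketch {
  std::shared_ptr<BpfStateOwner> bpf;
};
```

Because the owner is non-copyable and only shared through std::shared_ptr, munmap runs exactly once, when the last session drops its reference, rather than never, as with a leaked global.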

Collaborator Author

Alright, I reorganized this along those lines. The BPF singleton state now lives in a BpfAccelerator class that's shared between the different PerfCounters instances.

src/PerfCounters.h (outdated, resolved)
src/ReplaySession.cc (resolved)
src/bpf/async_event_filter.c (outdated, resolved)
src/bpf/async_event_filter.c (outdated, resolved)
khuey added 3 commits May 26, 2024 12:01
khuey requested a review from rocallahan May 27, 2024 01:39
src/PerfCounters.cc (outdated, resolved)
src/PerfCounters.cc (resolved)
src/PerfCounters.cc (outdated, resolved)
src/PerfCounters.cc (resolved)

class BpfAccelerator {
public:
static std::shared_ptr<BpfAccelerator> get_or_create();
Collaborator

I was thinking we could just create one BpfAccelerator in ReplaySession and copy the reference when we clone ReplaySessions so we don't need a static variable here.
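The reviewer's alternative, sketched with hypothetical names (AcceleratorSketch, ReplaySessionSketch) rather than rr's actual ReplaySession interface: the root session creates the accelerator, and clone() copies the shared_ptr, so every clone reuses one instance without a static variable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for BpfAccelerator; only identity matters here.
class AcceleratorSketch {};

// Hypothetical session: the root session creates the accelerator, and
// clone() copies the reference instead of creating a new one.
class ReplaySessionSketch {
public:
  static std::shared_ptr<ReplaySessionSketch> create() {
    std::shared_ptr<ReplaySessionSketch> s(new ReplaySessionSketch());
    s->accelerator_ = std::make_shared<AcceleratorSketch>();
    return s;
  }

  std::shared_ptr<ReplaySessionSketch> clone() const {
    assert(accelerator_);  // clones are only made from initialized sessions
    std::shared_ptr<ReplaySessionSketch> s(new ReplaySessionSketch());
    s->accelerator_ = accelerator_;  // share, don't recreate
    return s;
  }

  const std::shared_ptr<AcceleratorSketch>& accelerator() const {
    return accelerator_;
  }

private:
  ReplaySessionSketch() = default;
  std::shared_ptr<AcceleratorSketch> accelerator_;
};
```

Ownership then follows the session tree: the accelerator is released when the root and all of its clones are gone, and two independently created sessions get separate accelerators.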

Collaborator Author

I'm not convinced this is a great idea. It means moving BpfAccelerator into the header so ReplaySession can get at it. Is that really better than a static singleton?
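For comparison, a static singleton behind get_or_create() could look like the sketch below. This is a guess at the approach, not the PR's code: caching a std::weak_ptr in a function-local static lets all users share one instance while still destroying it once the last std::shared_ptr goes away. BpfAcceleratorSketch is a hypothetical name, and the sketch is not thread-safe; it assumes single-threaded callers.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the PR's BpfAccelerator.
class BpfAcceleratorSketch {
public:
  // Returns the live instance if one exists, otherwise creates a new one.
  // A weak_ptr (not a shared_ptr) is cached, so the static itself never
  // keeps the accelerator alive. Not thread-safe.
  static std::shared_ptr<BpfAcceleratorSketch> get_or_create() {
    static std::weak_ptr<BpfAcceleratorSketch> cached;
    if (std::shared_ptr<BpfAcceleratorSketch> existing = cached.lock()) {
      return existing;
    }
    auto created = std::make_shared<BpfAcceleratorSketch>();
    cached = created;
    return created;
  }
};
```

On the header concern, a std::shared_ptr member can be declared against a forward-declared class, so holding the accelerator in ReplaySession need not by itself pull the full BpfAccelerator definition into the header; only the code that creates the object needs the complete type.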

src/bpf/async_event_filter.c (outdated, resolved)

2 participants