Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion failure in process_syscall_entry() during record #3729

Closed
nuudlman opened this issue Apr 19, 2024 · 15 comments
Closed

Assertion failure in process_syscall_entry() during record #3729

nuudlman opened this issue Apr 19, 2024 · 15 comments

Comments

@nuudlman
Copy link

Attempting to debug LLVM opt, crash only happens with a Bazel development build (-O3 -g -UNDEBUG -fno-omit-frame-pointer -fsanitize=address,undefined, LLVM commit 44718311dee486f1823876e8af9100afcc50041b); regular release binaries seem to be fine.

Can provide executable if needed.

Log:

[FATAL src/RecordSession.cc:1944:process_syscall_entry()]
 (task 123227 (rec:123227) at time 72777)
 -> Assertion `t->desched_rec() || is_rrcall_notify_syscall_hook_exit_syscall( t->regs().original_syscallno(), t->arch()) || t->ip() == t->vm() ->privileged_traced_syscall_ip() .increment_by_syscall_insn_length(t->arch())' failed to hold. Stashed signal pending on syscall entry when it shouldn't be: {signo:SIGSTOP,errno:SUCCESS,code:SI_TKILL}; regs={ ip:0x7b9e505de9db args:(0xefffffffffffffff,0,0,0,0,0) orig_syscall: 24 syscallno: -38 }; last_execution_resume=0x7b9e505de9db; sig ip=0x7b9e505de9db
Tail of trace dump:
{
  real_time:10557.434084 global_time:72757, event:`SYSCALL: lseek' (state:EXITING_SYSCALL) tid:123228, ticks:1275
rax:0x0 rbx:0x1000 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x3 rbp:0x7b9e44200e40 rsp:0x7b9e44200da0 r8:0xffffffffffffffff r9:0x0 r10:0x22 r11:0x246 r12:0x7b9e44200ef0 r13:0x7b9e4ffbcf5c r14:0x7b9e44200e20 r15:0x0 rip:0x7b9e4ff60215 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x8 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434110 global_time:72758, event:`SYSCALL: getdents64' (state:ENTERING_SYSCALL) tid:123228, ticks:1277
rax:0xffffffffffffffda rbx:0x1000 rcx:0xffffffffffffffff rdx:0x1000 rsi:0x7b9e44270000 rdi:0x3 rbp:0x7b9e44200e40 rsp:0x7b9e44200da0 r8:0xffffffffffffffff r9:0x0 r10:0x22 r11:0x246 r12:0x1 r13:0x2 r14:0x7b9e44200e20 r15:0x0 rip:0x7b9e4ff6026a eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd9 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434136 global_time:72759, event:`SYSCALL: getdents64' (state:EXITING_SYSCALL) tid:123228, ticks:1277
rax:0x50 rbx:0x1000 rcx:0xffffffffffffffff rdx:0x1000 rsi:0x7b9e44270000 rdi:0x3 rbp:0x7b9e44200e40 rsp:0x7b9e44200da0 r8:0xffffffffffffffff r9:0x0 r10:0x22 r11:0x246 r12:0x1 r13:0x2 r14:0x7b9e44200e20 r15:0x0 rip:0x7b9e4ff6026a eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd9 fs_base:0x7b9e505a6380 gs_base:0x0
  { tid:123228, addr:0x7b9e44270000, length:0x50 }
}
{
  real_time:10557.434167 global_time:72760, event:`SYSCALL: openat' (state:ENTERING_SYSCALL) tid:123228, ticks:1500
rax:0xffffffffffffffda rbx:0x0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x7b9e44200d20 rdi:0xffffffffffffff9c rbp:0x7b9e44200d20 rsp:0x7b9e44200c48 r8:0x1b0 r9:0x7b9e44200d20 r10:0x1b0 r11:0x246 r12:0x0 r13:0x0 r14:0x7b9e44200e48 r15:0x50 rip:0x7b9e4ff5f1e1 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x101 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434178 global_time:72761, event:`SYSCALL: sched_yield' (state:EXITING_SYSCALL) tid:123227, ticks:525773288
rax:0x0 rbx:0x7b9e5008bcc8 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0xefffffffffffffff rbp:0x7ffd2a7be590 rsp:0x7ffd2a7be4f8 r8:0x0 r9:0x0 r10:0x0 r11:0x246 r12:0x7ffd2a7be51c r13:0x7b9e44000000 r14:0x1 r15:0x201000 rip:0x7b9e505de9db eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x18 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434214 global_time:72762, event:`SYSCALL: sched_yield' (state:ENTERING_SYSCALL) tid:123227, ticks:525773289
rax:0xffffffffffffffda rbx:0x7b9e5008bcc8 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0xefffffffffffffff rbp:0x7ffd2a7be590 rsp:0x7ffd2a7be4f8 r8:0x0 r9:0x0 r10:0x0 r11:0x246 r12:0x7ffd2a7be51c r13:0x7b9e44000000 r14:0x1 r15:0x201000 rip:0x7b9e505de9db eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x18 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434259 global_time:72763, event:`SYSCALL: openat' (state:EXITING_SYSCALL) tid:123228, ticks:1500
rax:0x4 rbx:0x0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x7b9e44200d20 rdi:0xffffffffffffff9c rbp:0x7b9e44200d20 rsp:0x7b9e44200c48 r8:0x1b0 r9:0x7b9e44200d20 r10:0x1b0 r11:0x246 r12:0x0 r13:0x0 r14:0x7b9e44200e48 r15:0x50 rip:0x7b9e4ff5f1e1 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x101 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434308 global_time:72764, event:`PATCH_SYSCALL' tid:123228, ticks:1579
rax:0x0 rbx:0x0 rcx:0xffffffffffffffff rdx:0x1000 rsi:0x7b9e44270000 rdi:0x4 rbp:0x7b9e44200cc0 rsp:0x7b9e44200c78 r8:0x0 r9:0x7b9e44200d20 r10:0x1b0 r11:0x246 r12:0x0 r13:0x10000000 r14:0x7b9e44200e48 r15:0x0 rip:0x7b9e4ff5f1fd eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7b9e505a6380 gs_base:0x0
  { tid:123228, addr:0x7b9e5066969c, length:0x5e }
  { tid:123228, addr:0x7b9e4ff5f1fd, length:0x8 }
}
{
  real_time:10557.434352 global_time:72765, event:`SYSCALLBUF_FLUSH' tid:123228, ticks:1659
  { syscall:'read', ret:0x50c, size:0x51c, desched:1 }
  { syscall:'read', ret:0x0, size:0x10, desched:1 }
}
{
  real_time:10557.434359 global_time:72766, event:`SYSCALL: close' (state:ENTERING_SYSCALL) tid:123228, ticks:1659
rax:0xffffffffffffffda rbx:0x0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x4 rbp:0x4 rsp:0x7b9e44200c98 r8:0x0 r9:0x7b9e44200d20 r10:0x1b0 r11:0x246 r12:0x1000 r13:0x10000000 r14:0x7b9e44200e48 r15:0x50c rip:0x7b9e4ff5f19e eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434362 global_time:72767, event:`SYSCALLBUF_RESET' tid:123228, ticks:1659
}
{
  real_time:10557.434387 global_time:72768, event:`SYSCALL: close' (state:EXITING_SYSCALL) tid:123228, ticks:1659
rax:0x0 rbx:0x0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x4 rbp:0x4 rsp:0x7b9e44200c98 r8:0x0 r9:0x7b9e44200d20 r10:0x1b0 r11:0x246 r12:0x1000 r13:0x10000000 r14:0x7b9e44200e48 r15:0x50c rip:0x7b9e4ff5f19e eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3 fs_base:0x7b9e505a6380 gs_base:0x0
  { tid:123227, addr:0x7b9e505e350c, length:0x1 }
}
{
  real_time:10557.434412 global_time:72769, event:`SYSCALL: getdents64' (state:ENTERING_SYSCALL) tid:123228, ticks:4594
rax:0xffffffffffffffda rbx:0x1000 rcx:0xffffffffffffffff rdx:0x1000 rsi:0x7b9e44270000 rdi:0x3 rbp:0x7b9e44200e40 rsp:0x7b9e44200da0 r8:0x7b9e44271000 r9:0x1 r10:0x4e r11:0x246 r12:0x0 r13:0x2 r14:0x7b9e44200e20 r15:0x50 rip:0x7b9e4ff6026a eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd9 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434442 global_time:72770, event:`SYSCALL: getdents64' (state:EXITING_SYSCALL) tid:123228, ticks:4594
rax:0x0 rbx:0x1000 rcx:0xffffffffffffffff rdx:0x1000 rsi:0x7b9e44270000 rdi:0x3 rbp:0x7b9e44200e40 rsp:0x7b9e44200da0 r8:0x7b9e44271000 r9:0x1 r10:0x4e r11:0x246 r12:0x0 r13:0x2 r14:0x7b9e44200e20 r15:0x50 rip:0x7b9e4ff6026a eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd9 fs_base:0x7b9e505a6380 gs_base:0x0
  { tid:123228, addr:0x7b9e44270000, length:(nil) }
}
{
  real_time:10557.434486 global_time:72771, event:`PATCH_SYSCALL' tid:123228, ticks:4600
rax:0x65 rbx:0x1e15b rcx:0xffffffffffffffff rdx:0x0 rsi:0x1e15b rdi:0x10 rbp:0x7b9e44200ef0 rsp:0x7b9e44200da8 r8:0x0 r9:0x1 r10:0x0 r11:0x246 r12:0x7b9e44200ef0 r13:0x7b9e4426f008 r14:0x7b9e44200e40 r15:0x0 rip:0x7b9e4ff5f915 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7b9e505a6380 gs_base:0x0
  { tid:123228, addr:0x7b9e506696fa, length:0x5e }
  { tid:123228, addr:0x7b9e4ff5f915, length:0xb }
}
{
  real_time:10557.434533 global_time:72772, event:`SYSCALL: ptrace' (state:ENTERING_SYSCALL) tid:123228, ticks:4610
rax:0xffffffffffffffda rbx:0x7b9e43ffffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x1e15b rdi:0x10 rbp:0x7b9e43fffef0 rsp:0x7b9e43fffdc0 r8:0x0 r9:0x1 r10:0x0 r11:0x246 r12:0x7b9e44200ef0 r13:0x7b9e4426f008 r14:0x7b9e44200e40 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x65 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434553 global_time:72773, event:`SYSCALL: ptrace' (state:EXITING_SYSCALL) tid:123228, ticks:4610
rax:0x0 rbx:0x7b9e43ffffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x1e15b rdi:0x10 rbp:0x7b9e43fffef0 rsp:0x7b9e43fffdc0 r8:0x0 r9:0x1 r10:0x0 r11:0x246 r12:0x7b9e44200ef0 r13:0x7b9e4426f008 r14:0x7b9e44200e40 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x65 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434598 global_time:72774, event:`PATCH_SYSCALL' tid:123228, ticks:4616
rax:0x3d rbx:0x1e15b rcx:0xffffffffffffffff rdx:0x40000000 rsi:0x7b9e44200dbc rdi:0x1e15b rbp:0x7b9e44200ef0 rsp:0x7b9e44200da8 r8:0x0 r9:0x1 r10:0x0 r11:0x246 r12:0x1e15b r13:0x7b9e44200dc0 r14:0x7b9e44200dbc r15:0x0 rip:0x7b9e4ff5f935 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7b9e505a6380 gs_base:0x0
  { tid:123228, addr:0x7b9e50669758, length:0x5e }
  { tid:123228, addr:0x7b9e4ff5f935, length:0xb }
}
{
  real_time:10557.434623 global_time:72775, event:`SYSCALL: wait4' (state:ENTERING_SYSCALL) tid:123228, ticks:4625
rax:0xffffffffffffffda rbx:0x7b9e43ffffa0 rcx:0xffffffffffffffff rdx:0x40000000 rsi:0x7b9e44200dbc rdi:0x1e15b rbp:0x7b9e43fffef0 rsp:0x7b9e43fffdc0 r8:0x0 r9:0x1 r10:0x0 r11:0x246 r12:0x1e15b r13:0x7b9e44200dc0 r14:0x7b9e44200dbc r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3d fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434634 global_time:72776, event:`SYSCALL: sched_yield' (state:EXITING_SYSCALL) tid:123227, ticks:525773289
rax:0x0 rbx:0x7b9e5008bcc8 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0xefffffffffffffff rbp:0x7ffd2a7be590 rsp:0x7ffd2a7be4f8 r8:0x0 r9:0x0 r10:0x0 r11:0x246 r12:0x7ffd2a7be51c r13:0x7b9e44000000 r14:0x1 r15:0x201000 rip:0x7b9e505de9db eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x18 fs_base:0x7b9e505a6380 gs_base:0x0
}
=== Start rr backtrace:
rr(_ZN2rr9GdbServer15emergency_debugEPNS_4TaskE+0x159)[0x5f50bbfab279]
rr(+0xda941)[0x5f50bbfbe941]
rr(+0xdb4c6)[0x5f50bbfbf4c6]
rr(_ZN2rr13RecordSession21process_syscall_entryEPNS_10RecordTaskEPNS0_9StepStateEPNS0_12RecordResultENS_13SupportedArchE+0x3a4)[0x5f50bbff3174]
rr(_ZN2rr13RecordSession29handle_seccomp_traced_syscallEPNS_10RecordTaskEPNS0_9StepStateEPNS0_12RecordResultEPb+0x3a9)[0x5f50bbfecc39]
rr(_ZN2rr13RecordSession19handle_ptrace_eventEPPNS_10RecordTaskEPNS0_9StepStateEPNS0_12RecordResultEPb+0x782)[0x5f50bbfeda42]
rr(_ZN2rr13RecordSession11record_stepEv+0x2bc)[0x5f50bbff96ac]
rr(_ZN2rr13RecordCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0xe8e)[0x5f50bbfe9dce]
rr(main+0x191)[0x5f50bbf36161]
/usr/lib/libc.so.6(+0x26010)[0x70f853426010]
/usr/lib/libc.so.6(__libc_start_main+0x8a)[0x70f8534260ca]
rr(_start+0x25)[0x5f50bbf378a5]
=== End rr backtrace
@khuey
Copy link
Collaborator

khuey commented Apr 19, 2024

Looks like this is a bug in our PTRACE_ATTACH emulation. Event 72772 is a PTRACE_ATTACH to tid 123227, then event 72776 is that process executing a syscall exit. Seems like we're not expecting the tracee to be in a syscall?

@rocallahan
Copy link
Collaborator

We do handle PTRACE_ATTACH to a tracee blocked in a syscall, in general. Here's a test for that: f4ff830

@rocallahan
Copy link
Collaborator

Can provide executable if needed.

Sure. Steps to reproduce would be useful.

@nuudlman
Copy link
Author

nuudlman commented May 17, 2024

Please be warned that the executable is 5.3GB, and compiling it from source will take ~50GB of disk space (and a lot of time). I'm waiting on it to upload but meanwhile to reproduce on it on your own:
git clone --depth=1 https://git.sr.ht/~nuudlman/bazel479k and then bazel build //:opt --config=asan --//:llvm=19.

To get the bug rr record bazel-bin/external/llvm-19/llvm/opt --help. I'm running on Zen 3 system and compiling with GCC 13.2.1 and 14.1.0

@nuudlman
Copy link
Author

@rocallahan
Copy link
Collaborator

I downloaded that binary and ran it with rr record ./cupt90r924jwo9l40c13ik4hxobgfv1o --help. It recorded and replayed successfully. This is Ubuntu 24 LTS, 6.8.0-1008-aws kernel.

@rocallahan
Copy link
Collaborator

The assertion is similar to the one in #3745... which I also can't reproduce. If you re-reproduce this bug with master, the assertion should include the pid that we think sent SIGSTOP. If you can identify that process that would be helpful.

@nuudlman
Copy link
Author

Im also using Arch, just like that other bug. Ill build master and see if I can get you the PID.

@nuudlman
Copy link
Author

I can't reproduce it with master. The arch version that does have the issue is 5.7.0

@rocallahan
Copy link
Collaborator

do you want to try bisecting to find the rr change that fixed this?

@nuudlman
Copy link
Author

That's a good idea, I'll try it.

@nuudlman
Copy link
Author

I can't reproduce with rr 5.7.0 built from source, only with the Arch package. What's even stranger is that when I copy the build config that Arch uses, I still can't reproduce the issue. Since building from source seems to work, I'll try to get a trace of it and upload it here.

@nuudlman
Copy link
Author

Since this seems like a bug with the Arch package, I'll close this for now unless you want to investigate further.

@rocallahan
Copy link
Collaborator

Can you file a Arch bug about it?

@rocallahan
Copy link
Collaborator

PS thanks for checking this out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants