Divergence with tcmalloc on arm64 #3740

Open
pcc opened this issue Apr 26, 2024 · 2 comments

Comments

@pcc
Contributor

pcc commented Apr 26, 2024

I'm seeing the following divergence while replaying a program that uses tcmalloc on arm64:

[FATAL src/ReplaySession.cc:1226:check_ticks_consistency()]
 (task 2944657 (rec:2944634) at time 424)
 -> Assertion `ticks_now == trace_ticks' failed to hold. ticks mismatch for 'SIGNAL: SIGSEGV(det)'; expected 10014507, got 10014509

I suspect this is caused by reads of CNTVCT_EL0 in the tcmalloc code. Unfortunately, the kernel does not support trapping counter register accesses on arm64:

prctl(PR_SET_TSC, PR_TSC_SIGSEGV)       = -1 EINVAL (Invalid argument)

It would be possible for the kernel to configure the CPU to trap on this access by clearing CNTKCTL_EL1.EL0VCTEN.
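
For anyone who wants to check their own kernel, the prctl call above can be wrapped in a small probe. This is only a sketch: the function name `probe_counter_trap_support` is made up for illustration, and it assumes that immediately restoring `PR_TSC_ENABLE` is enough to leave the process unaffected on kernels where the request succeeds.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Hypothetical helper: asks the kernel to trap counter reads, then undoes it.
 * Returns 0 if PR_TSC_SIGSEGV was accepted, otherwise the errno value
 * (EINVAL on kernels/architectures without support, as in the output above). */
int probe_counter_trap_support(void) {
    if (prctl(PR_SET_TSC, PR_TSC_SIGSEGV, 0, 0, 0) != 0)
        return errno;

    /* Restore normal access so later counter reads in this process
     * do not raise SIGSEGV. */
    if (prctl(PR_SET_TSC, PR_TSC_ENABLE, 0, 0, 0) != 0)
        return errno;

    return 0;
}
```

A return value of `EINVAL` corresponds to the failure shown above; `0` means the kernel accepted the trap request.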

@pcc
Contributor Author

pcc commented Apr 27, 2024

The kernel side of this is https://lore.kernel.org/all/[email protected]/T/

@pcc
Contributor Author

pcc commented Apr 27, 2024

(I also confirmed that it's CNTVCT_EL0 -- if I nop out the MRS instruction in the binary I can no longer reproduce the divergence.)
