Deadlocks vs. Storage Latency #9260

Open
turowicz opened this issue Dec 4, 2024 · 3 comments

Comments

@turowicz
turowicz commented Dec 4, 2024

We have noticed a strong correlation between the .NET contention rate and Orleans storage write latency. We are using a custom StorageProvider that is fully async, and we are trying to pinpoint where all the threads are being blocked. We also see log messages indicating that .NET is hanging.

The situation is caused by a temporarily slow storage backend, which we are in the process of upgrading. That said, I don't think we should be seeing a high contention rate just because some async I/O calls are taking longer than usual.

Are you able to tell whether something inside Orleans is making a synchronous call that saturates the thread pool?

As you can see below, as soon as there is any discernible latency bump, contention goes through the roof.

We have a high-throughput system using BroadcastChannels. Grains are long-lived and all state writes happen on 5-minute timers.

Latency last 2 days:
Image

Contention last 2 days (some graphs have gaps because contention caused .NET to freeze completely):
Image

cc @ReubenBond

@ReubenBond
Member

Hi @turowicz, are you able to provide more info? Situations like this are hard to diagnose from counters alone, but they are a good first indicator. CPU profiling traces or a memory dump from the malperforming process would be very useful. With that, we should be able to quickly identify the issue. If it's an Orleans issue, it's most likely the directory cache, which we are hoping to replace soon. I wouldn't be surprised if it were something else, though, like logging or IO.
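
For readers following along, here is a minimal sketch of one way to gather a CPU trace and live runtime counters with the dotnet-trace and dotnet-counters CLI tools. It is not something posted in this thread: the process ID, the output file name and the `RUN_DIAG` switch are placeholders chosen for illustration, and both tools are assumed to be installed as .NET global tools.

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions, not taken from this issue).
TARGET_PID=12345
TRACE_FILE=./silo.nettrace

# Install once, if needed (requires the .NET SDK):
#   dotnet tool install --global dotnet-trace
#   dotnet tool install --global dotnet-counters

# Records a trace from the running process until stopped.
TRACE_CMD="dotnet-trace collect --process-id $TARGET_PID --output $TRACE_FILE"

# Streams the System.Runtime counters (includes lock contention and thread pool figures).
COUNTERS_CMD="dotnet-counters monitor --process-id $TARGET_PID --counters System.Runtime"

echo "$TRACE_CMD"
echo "$COUNTERS_CMD"

# Only start the trace when explicitly requested and the tool is present.
if [ -n "${RUN_DIAG:-}" ] && command -v dotnet-trace >/dev/null 2>&1; then
  $TRACE_CMD
fi
```

Capturing the trace while the latency bump is happening is what makes it useful; a trace taken during normal operation is unlikely to show where threads are blocked.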

@turowicz
Author

turowicz commented Dec 6, 2024

@ReubenBond could you point me towards an article explaining how to perform a dump that would be most useful to you?

@ReubenBond
Member

A full heap dump would be useful. You can capture one with dotnet-dump: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-dump#dotnet-dump-collect
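
As a rough sketch of what the linked `dotnet-dump collect` page describes, a full dump of a running process could be captured along these lines. The process ID, the output path and the `RUN_DUMP` switch are placeholders for illustration, not values from this thread.

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions): replace with the real silo process ID and path.
# `dotnet-dump ps` lists the .NET processes that can be dumped.
TARGET_PID=12345
DUMP_FILE=./silo-full.dmp

# Install once, if needed (requires the .NET SDK):
#   dotnet tool install --global dotnet-dump

# --type Full asks for a dump that includes all process memory.
DUMP_CMD="dotnet-dump collect --process-id $TARGET_PID --type Full --output $DUMP_FILE"
echo "$DUMP_CMD"

# Only take the dump when explicitly requested and the tool is present.
if [ -n "${RUN_DUMP:-}" ] && command -v dotnet-dump >/dev/null 2>&1; then
  $DUMP_CMD
fi
```

A full dump can be large and may contain sensitive data from process memory, so it is worth checking before sharing it.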
