
Backpressure: overly susceptible to temporary issues #70034

Open
mwarkentin opened this issue May 1, 2024 · 1 comment

@mwarkentin
Member

Environment

SaaS (https://sentry.io/)

Steps to Reproduce

Over the last 30 days, we've seen ~265 instances where backpressure marked a service as unhealthy because of a connection timeout while checking the health of a Redis or RabbitMQ cluster: https://cloudlogging.app.goo.gl/KNZDAduqrHWQn5At7


Each of these comes with a corresponding pause and delay in ingestion.

A single timeout appears to cause about 15s of ingestion latency.
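As a rough, back-of-the-envelope estimate, multiplying the two approximate figures above gives the cumulative ingestion latency over the 30-day window. This assumes each timeout costs the same ~15s independently and ignores the extra burn-down time when several timeouts land close together, so treat it as a lower bound:

```python
# Back-of-the-envelope estimate of ingestion latency caused by
# health-check timeouts. Both inputs are approximations from this report.
TIMEOUTS_LAST_30_DAYS = 265   # "~265 instances" over the last 30 days
PAUSE_PER_TIMEOUT_S = 15      # "about 15s of ingestion latency" per timeout

total_s = TIMEOUTS_LAST_30_DAYS * PAUSE_PER_TIMEOUT_S
total_min = total_s / 60
print(f"~{total_s} s (~{total_min:.0f} min) of ingestion latency in 30 days")
```

That works out to roughly 66 minutes of ingestion latency per month from isolated timeouts alone.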

There are also instances where several timeouts occur in succession, which seems to be enough to build up a backlog large enough that it may page SRE while the backlog burns down.

Expected Result

Some possible improvements we can make:

  • Add retries so that transient failures (flakes) don't immediately mark a cluster as unhealthy
  • Require multiple consecutive failures to trigger the unhealthy state
  • Have backpressure fail open instead of closed (this could have a negative impact if the failures are caused by a real outage of a cluster).
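The second option can be sketched as a small gate that only reports "unhealthy" after a run of consecutive failed checks and resets on any success. This is a minimal, hypothetical sketch: `ConsecutiveFailureGate` and the threshold of 3 are illustrative assumptions, not Sentry's backpressure code.

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveFailureGate:
    """Hypothetical wrapper: report "unhealthy" only after `threshold`
    consecutive failed health checks; a single success resets the count."""

    threshold: int = 3
    consecutive_failures: int = 0

    def record(self, check_ok: bool) -> bool:
        """Record one health-check result and return True if the cluster
        should still be treated as healthy."""
        if check_ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures < self.threshold


gate = ConsecutiveFailureGate(threshold=3)
# One isolated failure is tolerated; three failures in a row are not.
results = [gate.record(ok) for ok in [True, False, True, False, False, False]]
print(results)
```

The trade-off is detection latency: a real outage is only acted on after `threshold` failed checks, i.e. roughly `threshold - 1` extra check intervals.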

I would probably start by adding retries on failure, as it seems like the simplest thing that could work.
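A retry wrapper around a single health check might look like the following minimal sketch. `check_with_retries`, its parameters, and the default attempt count and delay are hypothetical; the default `retry_on` only covers Python's built-in `TimeoutError` and `ConnectionError`, so a real Redis or RabbitMQ client's own timeout exceptions would need to be passed in explicitly.

```python
import time
from typing import Callable, Tuple, Type


def check_with_retries(
    check: Callable[[], None],
    attempts: int = 3,
    delay_s: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
) -> bool:
    """Hypothetical helper: run `check` up to `attempts` times and return True
    as soon as one call succeeds. Only exceptions in `retry_on` are treated
    as transient; any other exception propagates to the caller."""
    for attempt in range(1, attempts + 1):
        try:
            check()
            return True
        except retry_on:
            if attempt == attempts:
                return False
            # Linear backoff between attempts keeps the added delay small.
            time.sleep(delay_s * attempt)
    return False


calls = {"n": 0}


def flaky_check() -> None:
    """Simulated health check that times out once, then succeeds."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("simulated connection timeout")


print(check_with_retries(flaky_check, attempts=3, delay_s=0.01))
```

With the defaults, the worst case adds 0.5s + 1.0s of sleep (plus each attempt's own connection timeout) before a cluster is declared unhealthy, which is still small next to the ~15s pause a single false positive causes.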

Actual Result

Backpressure pauses ingestion after a single failed health check.

Product Area

Ingestion and Filtering

Link

No response

DSN

No response

Version

No response

@loewenheim
Contributor

I agree with adding retries. On the other hand, not counting a failed check as unhealthy (failing open) sounds dicey to me, for the reason you mention.
