CPUThrottlingHigh alert too easily triggered #36

Closed
lnovara opened this issue May 29, 2020 · 4 comments · Fixed by #43
Labels
bug Something isn't working

Comments

@lnovara
Contributor

lnovara commented May 29, 2020

The CPUThrottlingHigh alert is used to notify when the Kubelet is throttling a pod's CPU usage for more than 25% of the time over the last 5 minutes. While this might indicate wrong CPU limits, there is a particular class of pods (e.g. node-exporter) that is more subject to throttling than others.

My proposal is either to drop this alert or to increase the threshold to at least 50-75%, given the narrow time window.
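
For reference, this is roughly what the raised threshold could look like, assuming the rule keeps the same shape as the upstream kubernetes-mixin one (exact metric selectors, `for` duration and annotations depend on the version in use):

```yaml
# Sketch of CPUThrottlingHigh with the threshold raised from 25% to 75%.
# Based on the shape of the upstream kubernetes-mixin rule; adapt labels
# and annotations to the version actually deployed.
groups:
  - name: kubernetes-resources
    rules:
      - alert: CPUThrottlingHigh
        expr: |
          sum(increase(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) by (container, pod, namespace)
            /
          sum(increase(container_cpu_cfs_periods_total[5m])) by (container, pod, namespace)
            > (75 / 100)
        for: 15m
        labels:
          severity: warning
        annotations:
          message: 'CPU throttling above 75% for container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }}.'
```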

What's your opinion about this?

I am also attaching some issues from upstream projects.

Refs:

@lzecca78
Contributor

lzecca78 commented Jun 3, 2020

It's not easy to tune this kind of alert: it's very domain-specific and depends a lot on the application's behavior. We can try setting it higher (50-75%), as you suggested, and analyze the behavior after that change.

@ralgozino
Member

It's not clear to me if the problem is the alert itself or the current limit values in use.

If I understand correctly, you are proposing to change only the alert threshold, right? In that case, we won't know that the pods are being throttled. How much of an issue is that?

Would it make sense to instead drop the limits for some pods, like OpenShift did for the monitoring ones? A sketch of what that could look like is below.
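
As a sketch (names and values are illustrative, not the ones we actually ship), dropping only the CPU limit while keeping the request and the memory limit would look roughly like this in the container spec:

```yaml
# Sketch: keep the CPU request for scheduling, drop the CPU limit so the
# container is never CFS-throttled. Values are illustrative.
resources:
  requests:
    cpu: 100m
    memory: 30Mi
  limits:
    memory: 50Mi   # memory limit kept; no cpu limit set
```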

@lzecca78
Contributor

lzecca78 commented Jul 7, 2020

Can we temporarily disable this kind of alert? I think it is causing more problems than it solves, and the biggest risk is that you eventually get used to seeing alerts in the Slack channel and stop paying attention to them.
I am literally being riddled by these alerts 🔫

@ralgozino
Member

Given that we are being DDoSed by these alerts 😄, I'd say let's drop them and revisit them in the future. Or, if possible, leave them muted by default so one can go to Alertmanager's dashboard and see them if needed.
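
For the "muted by default" option, a minimal Alertmanager routing sketch (receiver names are placeholders, and the `match`-based routing syntax is assumed): the alert keeps firing and stays visible in the UI, but it is routed to a receiver with no notification integrations, so nothing reaches Slack.

```yaml
# Sketch: route CPUThrottlingHigh to a receiver with no notifiers.
# The alert still shows up in Alertmanager's dashboard, but no notification is sent.
route:
  receiver: default            # placeholder for the existing receiver (Slack, email, ...)
  routes:
    - match:
        alertname: CPUThrottlingHigh
      receiver: "null"
receivers:
  - name: default              # existing notification integrations go here
  - name: "null"               # intentionally empty: notifications are dropped
```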
