Remove kube-state-metrics labels from Kubernetes workload alerts #37

Open
brancz opened this issue Jun 18, 2018 · 8 comments

Comments

@brancz
Member

brancz commented Jun 18, 2018

It is often confusing for users when an alert about a Kubernetes workload (Deployment, DaemonSet, StatefulSet, etc.) appears, at first sight, to come from the kube-state-metrics target. We should probably drop any labels that identify kube-state-metrics and keep only the actual contextual information, such as the object name and namespace.

My hunch is that this would need to be configurable. I understand that, for example, in the Kausal ksonnet-prometheus package this would be the instance label, but in most other setups out there (such as the default Prometheus configuration from the Prometheus repo and the Prometheus Operator) these will be labels carrying the respective Kubernetes resource (pod/service/namespace/etc.). It is also reasonable to let people do this however they like.

@tomwilkie @metalmatze

@tomwilkie
Member

I agree; but isn't this up to the scrape config, i.e. out of scope for the mixin?

The mixin should only depend on labels from KSM, for sure.

@tomwilkie
Member

Flip side of this is the "rule" that alerts should at least specify a job name (or at least, I recall that rule but can't find a reference to it). In this case do we have to have a job name, or is the KSM metric prefix (kube_) distinct enough?

@brancz
Member Author

brancz commented Jun 18, 2018

I think we are aligned, just want to give an example to make 100% clear what I mean. For example a possible alert you may receive today:

{
    "alert": "KubeDeploymentGenerationMismatch",
    "instance": "<ip:8080>",
    "job": "kube-state-metrics",
    "service": "kube-state-metrics",
    "namespace": "default",
    "deployment": "my-app"
}

Because all services are configured/relabeled/scraped the same way, they all get the service label attached, and that causes the problem here. The instance label is not very useful either: the alert is about an abstract Kubernetes object rather than about kube-state-metrics itself (kube-state-metrics exposes metrics about itself on a separate port). job being kube-state-metrics may still be applicable, but even that people may find confusing.

@lilic
Contributor

lilic commented Sep 26, 2019

@brancz I guess you already had an issue for this open? :)

@brancz
Member Author

brancz commented Sep 26, 2019

Let me update myself here because I don't know what I was thinking when I wrote the last comments.

What we would like to do is to drop the job selector wherever alerts use kube-state-metrics metrics, except in the KubeStateMetricsDown alert. The reason: we perform metric relabeling on kube-state-metrics metrics, dropping the job label, so there are no confusing labels on these metrics, while up monitoring for the kube-state-metrics endpoint is retained. I think this is fine, as the metrics are not about a job anyway; they are, in a sense, meta metrics or application metrics.
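A minimal sketch of the kind of metric relabeling described above, written as a jsonnet fragment for a Prometheus scrape config (field names follow Prometheus's `metric_relabel_configs`; the regex and placement are assumptions, not the actual config used downstream):

```jsonnet
// Hypothetical sketch: clear the job label on all kube-state-metrics
// series, which share the kube_ name prefix. Setting a label to the
// empty string removes it. The synthetic `up` series is generated by
// Prometheus itself and is not subject to metric_relabel_configs, so
// up{job="kube-state-metrics"} keeps its job label and a
// KubeStateMetricsDown alert keeps working.
{
  metric_relabel_configs: [
    {
      source_labels: ['__name__'],
      regex: 'kube_.+',
      target_label: 'job',
      replacement: '',
      action: 'replace',
    },
  ],
}
```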

@csmarchbanks @metalmatze @gouthamve @tomwilkie does this sound ok with you? (forget about everything I said in the previous comments)

@csmarchbanks
Member

That seems reasonable to me, though does any work need to be done? If you are not using the job label for kube-state-metrics, couldn't you set kubeStateMetricsSelector: '' in $._config?
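A hedged sketch of that suggestion, assuming the kubernetes-mixin is consumed as a jsonnet library (the import path is illustrative, not necessarily the one used downstream):

```jsonnet
// Illustrative only: empty the kube-state-metrics selector so the
// mixin's alert expressions no longer pin kube_* metrics to a job label.
local mixin = import 'kubernetes-mixin/mixin.libsonnet';

mixin {
  _config+:: {
    kubeStateMetricsSelector: '',
  },
}
```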

@brancz
Member Author

brancz commented Sep 30, 2019

You're totally right, there's nothing really to do for that. I was thinking about the KubeStateMetricsDown alert, but it turns out that is actually defined in our downstream usage, not here. We can just set the job selector there "manually" and set the selector to empty in the kubernetes-mixin config.
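For illustration, a downstream KubeStateMetricsDown rule with the job selector written out "manually" might look like the following jsonnet; the expression, duration, and severity are assumptions, not the actual downstream definition:

```jsonnet
{
  alert: 'KubeStateMetricsDown',
  // Fires when no kube-state-metrics target has been up for 15 minutes.
  // The job selector is hard-coded here instead of coming from
  // $._config.kubeStateMetricsSelector, which stays empty in the mixin.
  expr: 'absent(up{job="kube-state-metrics"} == 1)',
  'for': '15m',
  labels: { severity: 'critical' },
}
```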


This issue has not had any activity in the past 30 days, so the
stale label has been added to it.

  • The stale label will be removed if there is new activity
  • The issue will be closed in 7 days if there is no new activity
  • Add the keepalive label to exempt this issue from the stale check action

Thank you for your contributions!

@github-actions github-actions bot added stale and removed stale labels Dec 10, 2024