Remove kube-state-metrics labels from Kubernetes workload alerts #37

Open
brancz opened this issue Jun 18, 2018 · 8 comments

Comments

@brancz
Member

brancz commented Jun 18, 2018

It is often confusing for users when an alert about a Kubernetes workload (Deployment, DaemonSet, StatefulSet, etc.) appears, at first sight, to come from the kube-state-metrics target. We should probably drop any labels that identify kube-state-metrics and keep only the actual contextual information, such as the object name and namespace.

My hunch is that this would need to be configurable. I understand that, for example, in the Kausal ksonnet-prometheus package this would be the instance label, but in most other setups out there (such as the default Prometheus configuration from the Prometheus repo and the Prometheus Operator) these will be labels carrying the respective Kubernetes resource (pod/service/namespace/etc.). It is also reasonable to let people do this however they like.

@tomwilkie @metalmatze

@tomwilkie
Member

I agree; but isn't this up to the scrape config, i.e. out of scope for the mixin?

The mixin should only depend on labels from KSM, for sure.

@tomwilkie
Member

Flip side of this is the "rule" that alerts should at least specify a job name (or at least, I recall that rule but can't find a reference to it). In this case do we have to have a job name, or is the KSM metric prefix (kube_) distinct enough?

@brancz
Member Author

brancz commented Jun 18, 2018

I think we are aligned, just want to give an example to make 100% clear what I mean. For example a possible alert you may receive today:

{
    "alert": "KubeDeploymentGenerationMismatch",
    "instance": "<ip:8080>",
    "job": "kube-state-metrics",
    "service": "kube-state-metrics",
    "namespace": "default",
    "deployment": "my-app"
}

Because all services are configured/relabeled/scraped the same way, they all get the service label attached, and that causes the problem here. The instance label is not very useful either: the alert is about an abstract Kubernetes object rather than about kube-state-metrics itself (kube-state-metrics exposes metrics about itself on a separate port). job being kube-state-metrics may still be applicable, but even that people may find confusing.

@lilic
Contributor

lilic commented Sep 26, 2019

@brancz I guess you already had an issue for this open? :)

@brancz
Member Author

brancz commented Sep 26, 2019

Let me update myself here because I don't know what I was thinking when I wrote the last comments.

What we would like to do is to drop the job selector wherever alerts use kube-state-metrics metrics, except in the KubeStateMetricsDown alert. The reason: we perform metric relabeling on kube-state-metrics metrics, dropping the job label, so there are no confusing labels on these metrics, while up monitoring for the kube-state-metrics endpoint is retained. I think this is fine, as the metrics are not about a job anyway; they are, in a sense, meta metrics or application metrics.
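A minimal sketch of the kind of metric relabeling described above, written as a jsonnet fragment for a Prometheus scrape config (field names follow Prometheus's `metric_relabel_configs`; the regex and placement are assumptions, not the actual config used downstream):

```jsonnet
// Hypothetical sketch: clear the job label on all kube-state-metrics
// series, which share the kube_ name prefix. Setting a label to the
// empty string removes it. The synthetic `up` series is generated by
// Prometheus itself and is not subject to metric_relabel_configs, so
// up{job="kube-state-metrics"} keeps its job label and a
// KubeStateMetricsDown alert keeps working.
{
  metric_relabel_configs: [
    {
      source_labels: ['__name__'],
      regex: 'kube_.+',
      target_label: 'job',
      replacement: '',
      action: 'replace',
    },
  ],
}
```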

@csmarchbanks @metalmatze @gouthamve @tomwilkie does this sound ok with you? (forget about everything I said in the previous comments)

@csmarchbanks
Member

That seems reasonable to me, though does any work need to be done? If you are not using the job label for kube-state-metrics, couldn't you set kubeStateMetricsSelector: '' in $._config?
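A hedged sketch of that suggestion, assuming the kubernetes-mixin is consumed as a jsonnet library (the import path is illustrative, not necessarily the one used downstream):

```jsonnet
// Illustrative only: empty the kube-state-metrics selector so the
// mixin's alert expressions no longer pin kube_* metrics to a job label.
local mixin = import 'kubernetes-mixin/mixin.libsonnet';

mixin {
  _config+:: {
    kubeStateMetricsSelector: '',
  },
}
```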

@brancz
Member Author

brancz commented Sep 30, 2019

You're totally right, there's nothing really to do for that. I was thinking about the KubeStateMetricsDown alert, but it turns out that is actually defined in our downstream usage, not here. We can just set the job selector there "manually" and set the selector to empty in the kubernetes-mixin config.
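For illustration, a downstream KubeStateMetricsDown rule with the job selector written out "manually" might look like the following jsonnet; the expression, duration, and severity are assumptions, not the actual downstream definition:

```jsonnet
{
  alert: 'KubeStateMetricsDown',
  // Fires when no kube-state-metrics target has been up for 15 minutes.
  // The job selector is hard-coded here instead of coming from
  // $._config.kubeStateMetricsSelector, which stays empty in the mixin.
  expr: 'absent(up{job="kube-state-metrics"} == 1)',
  'for': '15m',
  labels: { severity: 'critical' },
}
```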


This issue has not had any activity in the past 30 days, so the
stale label has been added to it.

  • The stale label will be removed if there is new activity
  • The issue will be closed in 7 days if there is no new activity
  • Add the keepalive label to exempt this issue from the stale check action

Thank you for your contributions!

@github-actions github-actions bot added stale and removed stale labels Dec 10, 2024