Only containerD is shown in application #5

Open
sumeet-zuora opened this issue Sep 29, 2022 · 20 comments

Comments

@sumeet-zuora

As per the docs, after installing Coroot and the agent, Prometheus was attached properly, but the only visible application is containerd. Any help is appreciated.

@apetruhin
Member

@Schaudhari7565, please attach the logs of the agent.

@sumeet-zuora
Author

corootnodeagent-n4cbb.txt
Attached are the logs from one of the agents.

@sumeet-zuora
Author

Manifest for the agent:

---
# Source: corootnodeagent/templates/daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  namespace: coroot
  labels:
    chart: "corootnodeagent-1.0.0"
    release: "corootnodeagent"
    heritage: "Helm"
  name: corootnodeagent
spec:
  selector:
    matchLabels:
      app: corootnodeagent
      group: observability
      provider: tools
  template:
    metadata:
      annotations:
        prometheus.io/port: "80"
        prometheus.io/scrape: "true"
      labels:
        app: corootnodeagent
        group: observability
        provider: tools
    spec:
      imagePullSecrets:
        - name: regcred
      tolerations:
        - operator: Exists
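      # run in the host PID namespace so the eBPF-based agent can see every process on the node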
      hostPID: true
      containers:
        - name: corootnodeagent
          image: "ghcr.io/coroot/coroot-node-agent:latest"
          imagePullPolicy: "IfNotPresent"
          args: ["--cgroupfs-root", "/host/sys/fs/cgroup"]
          ports:
            - name: http
              containerPort: 80
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /host/sys/fs/cgroup
              name: cgroupfs
              readOnly: true
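            # debugfs is mounted writable because the agent's eBPF tracing needs it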
            - mountPath: /sys/kernel/debug
              name: debugfs
              readOnly: false
      volumes:
        - hostPath:
            path: /sys/fs/cgroup
          name: cgroupfs
        - hostPath:
            path: /sys/kernel/debug
          name: debugfs

@sumeet-zuora
Author

Also, I am using VictoriaMetrics instead of Prometheus. Not sure if that breaks anything, but the connection worked as expected.
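
For reference, a minimal sketch of the Prometheus-compatible endpoint forms VictoriaMetrics exposes (hostnames are placeholders; the ports are the documented VictoriaMetrics defaults), which is what Coroot's Prometheus URL setting would point at:

# single-node VictoriaMetrics
http://<vm-host>:8428
# cluster version, via vmselect (0 is the default tenant)
http://<vmselect-host>:8481/select/0/prometheus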

@apetruhin
Member

At first glance, nothing unusual.
Please show me how it looks in Coroot: the main page and the settings page of the project.

@apetruhin
Member

Also, Coroot logs would help.

@sumeet-zuora
Author

Ah, I was missing kube-state-metrics. Seems like progress: no more logs other than compaction ones. Does it take some time for the UI to show the services?

W0929 19:03:37.575715       1 containers.go:65] unknown pod: kube-system/cilium-bs7md, seems like no kube-state-metrics installed
W0929 19:03:37.575736       1 containers.go:65] unknown pod: coroot/corootnodeagent-cwk82, seems like no kube-state-metrics installed
W0929 19:03:37.576552       1 containers.go:65] unknown pod: pomerium/pomerium-proxy-587b77dd7c-zj899, seems like no kube-state-metrics installed
W0929 19:03:37.576582       1 containers.go:65] unknown pod: pomerium/pomerium-authenticate-6f5c68ff6b-p4vzb, seems like no kube-state-metrics installed
W0929 19:03:37.576603       1 containers.go:65] unknown pod: vertical-pod-autoscaler-ecc/vertical-pod-autoscaler-updater-f6c6c88d6-tq648, seems like no kube-state-metrics installed
W0929 19:03:37.577216       1 containers.go:65] unknown pod: kong-internal/kong-kong-internal-948b64c4b-26zzp, seems like no kube-state-metrics installed
W0929 19:03:37.577245       1 containers.go:65] unknown pod: vertical-pod-autoscaler/vertical-pod-autoscaler-recommender-577b8847df-nc84r, seems like no kube-state-metrics installed
W0929 19:03:37.577644       1 containers.go:65] unknown pod: kube-system/cilium-j8stt, seems like no kube-state-metrics installed
W0929 19:03:37.577675       1 containers.go:65] unknown pod: logging/elasticsearch-es-client-1, seems like no kube-state-metrics installed
W0929 19:03:37.577696       1 containers.go:65] unknown pod: zodiac/zookeeper-0, seems like no kube-state-metrics installed
W0929 19:03:37.578267       1 containers.go:65] unknown pod: logging/elasticsearch-es-client-1, seems like no kube-state-metrics installed
W0929 19:03:37.578308       1 containers.go:65] unknown pod: kube-system/cilium-ldzf5, seems like no kube-state-metrics installed
W0929 19:03:37.578330       1 containers.go:65] unknown pod: zodiac/elastic-master-2, seems like no kube-state-metrics installed
W0929 19:03:37.578518       1 containers.go:65] unknown pod: kube-system/cilium-bs7md, seems like no kube-state-metrics installed
I0929 19:03:37.584400       1 constructor.go:64] got 13 nodes, 1500 services, 1390 applications
I0929 19:03:39.063600       1 compaction.go:92] compaction iteration started
I0929 19:03:49.064250       1 compaction.go:92] compaction iteration started
I0929 19:03:57.011050       1 updater.go:53] worker iteration for 2tt6kt9l
I0929 19:03:59.158375       1 compaction.go:92] compaction iteration started
I0929 19:04:09.064052       1 compaction.go:92] compaction iteration started
I0929 19:04:19.064213       1 compaction.go:92] compaction iteration started
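
For anyone hitting the same warnings: a minimal kube-state-metrics install, as a sketch assuming the upstream prometheus-community Helm chart and a monitoring namespace:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-state-metrics prometheus-community/kube-state-metrics -n monitoring --create-namespace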

@sumeet-zuora
Author

Still the same after almost 15 minutes: only containerd is visible.

[screenshot]

@apetruhin
Member

It can take some time (depending on the cluster size) for the cache updater to download the kube-state-metrics metrics for the first time.
Do you have more lines like this in the Coroot logs?

I0929 19:03:57.011050       1 updater.go:53] worker iteration for 2tt6kt9l

Or maybe some errors?

@sumeet-zuora
Author

Still nothing, and no errors during startup. The only messages I see are:

I0930 07:03:55.449972       1 main.go:29] version: 0.4.0
I0930 07:03:55.450088       1 db.go:39] using sqlite database
I0930 07:03:55.795158       1 cache.go:130] cache loaded from disk in 339.678568ms
I0930 07:03:55.795491       1 compaction.go:81] compaction worker started
I0930 07:03:55.795534       1 main.go:77] listening on 0.0.0.0:8080
I0930 07:03:56.796094       1 updater.go:53] worker iteration for 2tt6kt9l
I0930 07:04:05.795815       1 compaction.go:92] compaction iteration started
I0930 08:15:05.809959       1 compaction.go:155] compaction task 3c4b3c56d9bf3ed9c6fb8ca80b6e51d3 [1664511240,1664514840,1664518440,1664522040]:3600 -> 1664511240:14400 done in 12.387516ms
I0930 08:15:05.811276       1 compaction.go:144] deleting chunk after compaction: /data/cache/2tt6kt9l/2tt6kt9l-ad52fcad143b8b1451800115bbe853fe-1664511240-120-30.db
I0930 08:15:05.811322       1 compaction.go:144] deleting chunk after compaction: /data/cache/2tt6kt9l/2tt6kt9l-ad52fcad143b8b1451800115bbe853fe-1664514840-120-30.db
I0930 08:15:05.811344       1 compaction.go:144] deleting chunk after compaction: /data/cache/2tt6kt9l/2tt6kt9l-ad52fcad143b8b1451800115bbe853fe-1664518440-120-30.db
I0930 08:15:05.811370       1 compaction.go:144] deleting chunk after compaction: /data/cache/2tt6kt9l/2tt6kt9l-ad52fcad143b8b1451800115bbe853fe-1664522040-120-30.db
I0930 08:15:05.811410       1 compaction.go:155] compaction task ad52fcad143b8b1451800115bbe853fe [1664511240,1664514840,1664518440,1664522040]:3600 -> 1664511240:14400 done in 1.410773ms
I0930 08:15:15.795574       1 compaction.go:92] compaction iteration started
I0930 08:15:25.796272       1 compaction.go:92] compaction iteration started
I0930 08:15:26.200839       1 updater.go:53] worker iteration for 2tt6kt9l

@apetruhin
Member

  • Can you show a screenshot of the settings page (/p/2tt6kt9l/settings)?
  • Execute the kube_pod_info query in your VictoriaMetrics and show the output.
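
For example, via the Prometheus-compatible HTTP API (host and port are assumptions; 8428 is the single-node VictoriaMetrics default):

curl -s 'http://<victoriametrics-host>:8428/api/v1/query?query=kube_pod_info'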

@sumeet-zuora
Author

[screenshot]

So, I found that I was not scraping the kube-state-metrics metrics from the cluster where Coroot was running, but after adding the annotations I got the metrics:
[screenshot]

kube_pod_info{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="kube-state-metrics", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.6.0", container="kube-state-metrics", created_by_kind="DaemonSet", created_by_name="aws-node-termination-handler", datacenter="eks-12-ecc-xxxx-xxxxx", exported_namespace="aws-node-termination-handler", exported_node="ip-10-124-128-97.us-west-2.compute.internal", exported_pod="aws-node-termination-handler-4tzqm", helm_sh_chart="kube-state-metrics-4.20.1", host_ip="10.124.128.97", host_network="true", instance="10.8.30.247:8080", job="1", namespace="monitoring", node="ip-10-124-130-55.us-west-2.compute.internal", pod="kube-state-metrics-c6678766c-cbprt", pod_ip="10.124.128.97", pod_template_hash="c6678766c", priority_class="system-node-critical", uid="1385a8d7-9a21-4674-8dfa-b0cb50fe6b54"}
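
For reference, the annotations follow the same convention already used in the agent DaemonSet above; on the kube-state-metrics pod template they would look roughly like this (a sketch; 8080 is the default kube-state-metrics metrics port, matching the instance label in the output):

  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"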

@sumeet-zuora
Author

Something new showed up, and it keeps changing: different applications are shown automatically under monitoring.

[screenshot]

Does it take time to build the cache or something?

@apetruhin
Member

Coroot uses the metrics gathered by kube-state-metrics to join containers into applications, so this should probably fix the issue.
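
As a rough illustration of that join (plain PromQL, not Coroot's actual implementation): kube_pod_info carries the created_by_kind/created_by_name labels, visible in the query output above, which let pod-level series be grouped into workload-level applications:

count by (namespace, created_by_kind, created_by_name) (kube_pod_info)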

@sumeet-zuora
Author

So, after adding the annotations I can see the metrics in VictoriaMetrics, but it still complains about some pods being missing and then suddenly detects them. It seems like it is losing connections.

W0930 18:55:57.155919       1 containers.go:65] unknown pod: logging/elasticsearch-es-client-1, seems like no kube-state-metrics installed
W0930 18:55:57.155951       1 containers.go:65] unknown pod: keda/keda-operator-675b587d7b-xcls7, seems like no kube-state-metrics installed
W0930 18:55:57.156000       1 containers.go:65] unknown pod: kube-system/cilium-7lfxm, seems like no kube-state-metrics installed
W0930 18:55:57.156039       1 containers.go:65] unknown pod: kube-system/cilium-ns58n, seems like no kube-state-metrics installed
W0930 18:55:57.156068       1 containers.go:65] unknown pod: logging/elasticsearch-es-warm-0, seems like no kube-state-metrics installed
W0930 18:55:57.156106       1 containers.go:65] unknown pod: kube-system/cilium-q5gqr, seems like no kube-state-metrics installed
W0930 18:55:57.156140       1 containers.go:65] unknown pod: kube-system/cilium-c6kzq, seems like no kube-state-metrics installed
W0930 18:55:57.156176       1 containers.go:65] unknown pod: kube-system/cilium-operator-69c65bf5c6-mrz6b, seems like no kube-state-metrics installed
W0930 18:55:57.156257       1 containers.go:65] unknown pod: elastic-operator/elastic-operator-1, seems like no kube-state-metrics installed
W0930 18:55:57.156292       1 containers.go:65] unknown pod: kube-system/cilium-c6kzq, seems like no kube-state-metrics installed
W0930 18:55:57.156336       1 containers.go:65] unknown pod: kube-system/cilium-c6kzq, seems like no kube-state-metrics installed
W0930 18:55:57.156374       1 containers.go:65] unknown pod: kube-system/kube-proxy-dxkzf, seems like no kube-state-metrics installed
W0930 18:55:57.156413       1 containers.go:65] unknown pod: kube-system/cilium-68x2b, seems like no kube-state-metrics installed
I0930 18:55:57.163314       1 constructor.go:64] got 18 nodes, 1656 services, 1450 applications
2022/09/30 18:55:57 http: panic serving 127.0.0.1:50250: runtime error: invalid memory address or nil pointer dereference
goroutine 5986 [running]:
net/http.(*conn).serve.func1()
	/usr/local/go/src/net/http/server.go:1825 +0xbf
panic({0xac2f00, 0x1203280})
	/usr/local/go/src/runtime/panic.go:844 +0x258
github.com/coroot/coroot/api/views/overview.Render(0xc00029cbd0)
	/go/src/api/views/overview/overview.go:107 +0xb07
github.com/coroot/coroot/api/views.Overview(...)
	/go/src/api/views/views.go:20
github.com/coroot/coroot/api.(*Api).Overview(0xa92fa0?, {0xc5ccf0, 0xc006c16380}, 0xc000241e00?)
	/go/src/api/api.go:193 +0x91
net/http.HandlerFunc.ServeHTTP(0xc006c0d000?, {0xc5ccf0?, 0xc006c16380?}, 0x0?)
	/usr/local/go/src/net/http/server.go:2084 +0x2f
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0001c2240, {0xc5ccf0, 0xc006c16380}, 0xc006c0cc00)
	/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0x1cf
net/http.serverHandler.ServeHTTP({0xc000775860?}, {0xc5ccf0, 0xc006c16380}, 0xc006c0cc00)
	/usr/local/go/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc000fbe460, {0xc5d398, 0xc0003dcd80})
	/usr/local/go/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:3071 +0x4db
I0930 18:55:57.325649       1 compaction.go:92] compaction iteration started
I0930 18:55:58.325732       1 updater.go:53] worker iteration for 2tt6kt9l

In the drop-down I can see applications:
[screenshot]

But after selecting one, nothing is there:
[screenshot]

Also, the UI is flaky: the applications keep changing.

@apetruhin
Member

apetruhin commented Oct 11, 2022

@Schaudhari7565, apologies for the delayed response. We have fixed the panic. Please update Coroot.

@sumeet-zuora
Author

I updated to the latest, 0.5.0, and still got a panic. Is this due to the large number of applications? I'd like to know if we can restrict the applications or filter them based on some labels, like datacenter=eks16, to avoid reading all the metrics at the same time.

I1011 17:28:51.890568       1 constructor.go:68] got 46 nodes, 1557 services, 1484 applications
2022/10/11 17:28:52 http: panic serving 127.0.0.1:56478: runtime error: invalid memory address or nil pointer dereference
goroutine 20983 [running]:
net/http.(*conn).serve.func1()
	/usr/local/go/src/net/http/server.go:1825 +0xbf
panic({0xaf80e0, 0x1260280})
	/usr/local/go/src/runtime/panic.go:844 +0x258
github.com/coroot/coroot/auditor.(*appAuditor).cpu(0xc01207ebb8)
	/go/src/auditor/cpu.go:39 +0x4a2
github.com/coroot/coroot/auditor.Audit(0xc079e1e000)
	/go/src/auditor/auditor.go:26 +0x10a
github.com/coroot/coroot/api/views/overview.Render(0xc079e1e000)
	/go/src/api/views/overview/overview.go:40 +0x9d
github.com/coroot/coroot/api/views.Overview(...)
	/go/src/api/views/views.go:20
github.com/coroot/coroot/api.(*Api).Overview(0xc079e00120?, {0xc9f470, 0xc079e0c000}, 0xc0002c3680?)
	/go/src/api/api.go:194 +0x91
net/http.HandlerFunc.ServeHTTP(0xc079e1a000?, {0xc9f470?, 0xc079e0c000?}, 0xc0c526a9c0?)
	/usr/local/go/src/net/http/server.go:2084 +0x2f
github.com/gorilla/mux.(*Router).ServeHTTP(0xc000242000, {0xc9f470, 0xc079e0c000}, 0xc06c530000)
	/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0x1cf
net/http.serverHandler.ServeHTTP({0xc06c512ea0?}, {0xc9f470, 0xc079e0c000}, 0xc06c530000)
	/usr/local/go/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc06c528000, {0xc9fb18, 0xc00013d9b0})
	/usr/local/go/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:3071 +0x4db
I1011 17:28:55.345169       1 compaction.go:92] compaction iteration started
I1011 17:29:05.344864       1 compaction.go:92] compaction iteration started
I1011 17:29:15.345428       1 compaction.go:92] compaction iteration started
I1011 17:29:25.345135       1 compaction.go:92] compaction iteration started
^C
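
On the filtering question: independent of whether Coroot supports it, the series can be restricted at scrape time with standard Prometheus-style relabeling, which vmagent also accepts; a sketch assuming the datacenter label from the kube_pod_info output above:

# under the relevant scrape_config:
metric_relabel_configs:
  - source_labels: [datacenter]
    regex: eks16
    action: keep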

@apetruhin
Member

It is a new bug; we will fix it soon. Meanwhile, please install version 0.4.1.

@sumeet-zuora
Author

Downgraded to 0.4.1; will monitor the logs.

@apetruhin
Copy link
Member

apetruhin commented Oct 13, 2022

@Schaudhari7565, we've fixed the panic bug. Please upgrade Coroot to version >=0.5.1
