Get podname and namespace "unknown" #102

Open · wsszh opened this issue Jul 15, 2022 · 16 comments

wsszh commented Jul 15, 2022

Hi, I set filenameTemplate: "{uuid}-dump-{timestamp}-{hostname}-{exe_name}-{pid}-{signal}-{podname}-{namespace}", but I get a filename like this: "9a1fc79c-758c-4599-a22d-2e94444a3250-dump-1657867608-segfaulter-segfaulter-1-4-unknown-unknown.zip", i.e. the {podname} and {namespace} placeholders resolve to "unknown". How can I fix it?
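(For reference, that template is set under the chart's composer block; a minimal values sketch, assuming the Helm chart layout shown later in this thread:)

composer:
  filenameTemplate: "{uuid}-dump-{timestamp}-{hostname}-{exe_name}-{pid}-{signal}-{podname}-{namespace}"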

wsszh changed the title Get podname and namespace related to the core-dump file → Get podname and namespace "unknow" Jul 15, 2022
wsszh changed the title Get podname and namespace "unknow" → Get podname and namespace "unknown" Jul 15, 2022
wsszh closed this as completed Jul 15, 2022
joaogbcravo commented May 22, 2023

Hey, I also got this "unknown". How did you solve it?

@Robert-Stam

We see this behaviour here as well (on AWS). Any news?
I see the issue is set to closed, however it doesn't seem to be resolved?

No9 reopened this Jan 17, 2024
No9 (Collaborator) commented Jan 17, 2024

Hey @Robert-Stam
Can you confirm which aws.values.xxx.yaml you used in the deployment and which version of EKS you are using?
It's likely that the version of crio is now outdated, as this hasn't been updated for a while.

@Robert-Stam

> Can you confirm which aws.values.xxx.yaml you used in the deployment and which version of EKS you are using? It's likely that the version of crio is now outdated, as this hasn't been updated for a while.

I have used the settings from: https://github.com/IBM/core-dump-handler/blob/main/charts/core-dump-handler/values.aws.yaml

We are using Kubernetes 1.28 (on Intel hardware, m6i family) with the AMI: amazon-eks-node-1.28-v20240110
See: https://github.com/awslabs/amazon-eks-ami/releases/tag/v20240110


Thanks in advance!

@Robert-Stam
Copy link

Robert-Stam commented Mar 15, 2024

@No9 Hi Anton, any update on this?

@No9
Copy link
Collaborator

No9 commented Mar 22, 2024

I don't have access to an AWS account to debug.
Can you log into an agent container that has processed a core dump and provide the output of:

cat /var/mnt/core-dump-handler/composer.log
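(One way to pull that from outside the pod; the namespace, label, and pod name here are illustrative, substitute your own from kubectl get pods:)

kubectl get pods -n observe -l name=core-dump-ds
kubectl exec -n observe core-dump-handler-xxxxx -- cat /var/mnt/core-dump-handler/composer.log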

If there are no errors, can you enable debugging by setting
https://github.com/IBM/core-dump-handler/blob/main/charts/core-dump-handler/values.yaml#L27
to Debug?
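(A minimal sketch of that override, matching the composer block shown later in this thread:)

composer:
  logLevel: "Debug"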

@Robert-Stam

I tested with k8s v1.29 on AKS (Azure) and GKE (Google), and the namespace resolves to 'unknown' in both cases.
This is the output from the composer log on AKS:

ERROR - 2024-04-05T09:41:43.149688332+00:00 - failed to create pod at index 0
ERROR - 2024-04-05T09:41:47.803435709+00:00 - Failed to get pod id

Hope this helps.

@Robert-Stam

@No9 I tried to create a small PR to update the packages and the crictl version, however without luck.
FYI, here is my PR: #158

Have you tried k8s v1.29 on IBM Cloud successfully?

No9 (Collaborator) commented Apr 8, 2024

crictl is already on the host on IKS and others, so it isn't a useful test.
Did you look for the composer logs as per this comment?
#102 (comment)

@Robert-Stam

> crictl is already on the host on IKS and others, so it isn't a useful test.
>
> Did you look for the composer logs as per this comment?
>
> #102 (comment)

See: #102 (comment)

@Robert-Stam

> Have you tried k8s v1.29 on IBM Cloud successfully?

Have you tried k8s v1.29 on IBM Cloud successfully?

No9 (Collaborator) commented Apr 8, 2024

Sorry I missed your log output post for some reason.
So it appears as though this command is executing but not returning a list of pods:

crictl pods --name <hostname> -o json

where <hostname> is captured from the crashing container.
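(Roughly what the composer extracts from that JSON, sketched with jq for illustration rather than the actual Rust implementation; $CRASH_HOSTNAME is a stand-in for the captured hostname:)

crictl pods --name "$CRASH_HOSTNAME" -o json | jq -r '.items[0].metadata | .name, .namespace'

(An empty "items" list here would line up with the "failed to create pod at index 0" error posted earlier.)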

Are you overriding the hostname on the deployed workloads?

In the meantime I'll take a look at a 1.29 cluster to confirm.
[Edit]
Confirmed that the core dump works as expected on IBM Cloud IKS 1.29 with no additional values parameters.
Tested with the following failing container:

kubectl run -i -t segfaulter --image=quay.io/icdh/segfaulter --restart=Never

Robert-Stam commented Apr 9, 2024

> Sorry I missed your log output post for some reason. So it appears as though this command is executing but not returning a list of pods:
>
> crictl pods --name <hostname> -o json
>
> where <hostname> is captured from the crashing container.
>
> Are you overriding the hostname on the deployed workloads?
>
> In the meantime I'll take a look at a 1.29 cluster to confirm. [Edit] Confirmed that the core dump works as expected on IBM Cloud IKS 1.29 with no additional values parameters. Tested with the following failing container:
>
> kubectl run -i -t segfaulter --image=quay.io/icdh/segfaulter --restart=Never

I am not overriding the hostname.

To make sure we are on the same page: you did test with {namespace} in the filenameTemplate, and it is filled out correctly?

No9 (Collaborator) commented Apr 9, 2024

Revalidated with this config:

composer:
  ignoreCrio: false
  crioImageCmd: "img"
  logLevel: "Warn"
  filenameTemplate: "{uuid}-dump-{timestamp}-{hostname}-{exe_name}-{pid}-{signal}-{namespace}"

Ran kubectl run -i -t segfaulter --image=quay.io/icdh/segfaulter --restart=Never

The following output, showing the default namespace, is obtained from the container:

[2024-04-09T20:04:27Z INFO  core_dump_agent] Uploading: /var/mnt/core-dump-handler/cores/3fb6b86a-6726-4f5c-80fd-f34e8a971536-dump-1712693067-segfaulter-segfaulter-1-4-default.zip
[2024-04-09T20:04:27Z INFO  core_dump_agent] zip size is 28610
[2024-04-09T20:04:27Z INFO  core_dump_agent] S3 Returned: 200

Can I suggest getting a debug container on the host and establishing what happens when the following is run:

crictl pods --name <hostname> -o json

If JSON is returned, can you either post it here and/or validate it in the test suite?

Thanks
[Edit] Kubernetes info: IBM Kubernetes Service 1.29.3_1531

Robert-Stam commented Apr 10, 2024

@No9 Anton, I executed your command in the running container (ibm/core-dump-handler:v8.10.0) on AWS (with k8s v1.29).
This is the result:

[root@core-dump-lgd5p app]# ./crictl pods  --name ip-10-87-16-57.eu-west-2.compute.internal -o json
WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock]. As the default settings are now deprecated, you should set the endpoint instead.
ERRO[0002] connect endpoint 'unix:///var/run/dockershim.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded
ERRO[0004] connect endpoint 'unix:///run/containerd/containerd.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded
FATA[0006] connect: connect endpoint 'unix:///run/crio/crio.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded

And these are the settings applied (based on the log):

[2024-04-10T09:21:31Z INFO  core_dump_agent] Writing composer .env
    LOG_LEVEL=Warn
    IGNORE_CRIO=false
    CRIO_IMAGE_CMD=img
    USE_CRIO_CONF=false
    FILENAME_TEMPLATE={namespace}-{uuid}-dump-{timestamp}-{hostname}-{exe_name}-{pid}-{signal}
    LOG_LENGTH=500
    POD_SELECTOR_LABEL=
    TIMEOUT=600
    COMPRESSION=true
    CORE_EVENTS=false
    EVENT_DIRECTORY=/var/mnt/core-dump-handler/events

No9 (Collaborator) commented Apr 10, 2024

OK, it looks like you are trying to run crictl from the handler container. What I was trying to suggest was setting up a debug session on the node, e.g.:

kubectl get nodes
NAME    STATUS   ROLES           AGE    VERSION
node1   Ready    master,worker   176d   v1.26.9+52589e6
node2   Ready    master,worker   176d   v1.26.9+52589e6
node3   Ready    master,worker   176d   v1.26.9+52589e6

With a node name (it doesn't matter which), run:

kubectl debug node/node1 --image=ubuntu

When you have a debug session, run something like the following:

/host/usr/bin/crictl -r unix:///host/run/crio/crio.sock pods --name core-dump-lgd5p -o json

where /host/usr/bin/crictl is wherever you have configured crictl to be copied, unix:///host/run/crio/crio.sock is the CRI socket (which may be in a different location), and core-dump-lgd5p is the pod name.

Expected output:

{
  "items": [
    {
      "id": "df2bb27cbc78c2fb51aea8cb2f9eeb6124c871244a5fb71e989458bb673125df",
      "metadata": {
        "name": "core-dump-handler-7kqc6",
        "uid": "c8ea5ce9-72be-4826-82b3-b8c3a8144d50",
        "namespace": "observe",
        "attempt": 0
      },
      "state": "SANDBOX_READY",
      "createdAt": "1712691523593249607",
      "labels": {
        "controller-revision-hash": "7b6c988b5d",
        "io.kubernetes.container.name": "POD",
        "io.kubernetes.pod.name": "core-dump-handler-7kqc6",
        "io.kubernetes.pod.namespace": "observe",
        "io.kubernetes.pod.uid": "c8ea5ce9-72be-4826-82b3-b8c3a8144d50",
        "name": "core-dump-ds",
        "pod-template-generation": "1"
      },
      "annotations": {
        "kubectl.kubernetes.io/default-container": "coredump-container",
        "kubernetes.io/config.seen": "2024-04-09T14:38:43.120492372-05:00",
        "kubernetes.io/config.source": "api",
        "openshift.io/scc": "core-dump-admin-privileged"
      },
      "runtimeHandler": ""
    }
  ]
}
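
(Conversely, if the pods query matches nothing, crictl just returns an empty list, which would line up with the earlier "failed to create pod at index 0" composer error and the "unknown" fields in the filename:)

{
  "items": []
}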
