Get podname and namespace "unknown" #102

Open · wsszh opened this issue Jul 15, 2022 · 16 comments

wsszh commented Jul 15, 2022

Hi, I set filenameTemplate: "{uuid}-dump-{timestamp}-{hostname}-{exe_name}-{pid}-{signal}-{podname}-{namespace}", but I get a filename like this: "9a1fc79c-758c-4599-a22d-2e94444a3250-dump-1657867608-segfaulter-segfaulter-1-4-unknown-unknown.zip", i.e. the {podname} and {namespace} placeholders resolve to "unknown". How can I fix it?
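(For reference, that template is set under the chart's composer block; a minimal values sketch, assuming the Helm chart layout shown later in this thread:)

composer:
  filenameTemplate: "{uuid}-dump-{timestamp}-{hostname}-{exe_name}-{pid}-{signal}-{podname}-{namespace}"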

wsszh changed the title Get podname and namespace related to the core-dump file → Get podname and namespace "unknow" Jul 15, 2022
wsszh changed the title Get podname and namespace "unknow" → Get podname and namespace "unknown" Jul 15, 2022
wsszh closed this as completed Jul 15, 2022
joaogbcravo commented May 22, 2023

Hey, I also got this "unknown". How did you solve it?

@Robert-Stam

We see this behaviour here as well (on AWS). Any news?
I see the issue is set to closed, however it doesn't seem to be resolved?

No9 reopened this Jan 17, 2024
No9 (Collaborator) commented Jan 17, 2024

Hey @Robert-Stam
Can you confirm which aws.values.xxx.yaml you used in the deployment and which version of EKS you are using?
It's likely that the version of crio is now outdated, as this hasn't been updated for a while.

@Robert-Stam

> Can you confirm which aws.values.xxx.yaml you used in the deployment and which version of EKS you are using? It's likely that the version of crio is now outdated, as this hasn't been updated for a while.

I have used the settings from: https://github.com/IBM/core-dump-handler/blob/main/charts/core-dump-handler/values.aws.yaml

We are using Kubernetes 1.28 (on Intel hardware, m6i family) with the AMI: amazon-eks-node-1.28-v20240110
See: https://github.com/awslabs/amazon-eks-ami/releases/tag/v20240110


Thanks in advance!

@Robert-Stam
Copy link

Robert-Stam commented Mar 15, 2024

@No9 Hi Anton, any update on this?

@No9
Copy link
Collaborator

No9 commented Mar 22, 2024

I don't have access to an AWS account to debug.
Can you log into an agent container that has processed a core dump and provide the output of:

cat /var/mnt/core-dump-handler/composer.log
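(One way to pull that from outside the pod; the namespace, label, and pod name here are illustrative, substitute your own from kubectl get pods:)

kubectl get pods -n observe -l name=core-dump-ds
kubectl exec -n observe core-dump-handler-xxxxx -- cat /var/mnt/core-dump-handler/composer.log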

If there are no errors, can you enable debugging by setting
https://github.com/IBM/core-dump-handler/blob/main/charts/core-dump-handler/values.yaml#L27
to Debug?
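(A minimal sketch of that override, matching the composer block shown later in this thread:)

composer:
  logLevel: "Debug"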

@Robert-Stam

I tested with k8s v1.29 on AKS (Azure) and GKE (Google), and the namespace resolves to 'unknown' in both cases.
This is the output from the composer log on AKS:

ERROR - 2024-04-05T09:41:43.149688332+00:00 - failed to create pod at index 0
ERROR - 2024-04-05T09:41:47.803435709+00:00 - Failed to get pod id

Hope this helps.

@Robert-Stam

@No9 I tried to create a small PR to update the packages and the crictl version, however without luck.
FYI, here is my PR: #158

Have you tried k8s v1.29 on IBM Cloud successfully?

No9 (Collaborator) commented Apr 8, 2024

crictl is already on the host on IKS and others, so it isn't a useful test.
Did you look for the composer logs as per this comment?
#102 (comment)

@Robert-Stam

> crictl is already on the host on IKS and others, so it isn't a useful test.
>
> Did you look for the composer logs as per this comment?
>
> #102 (comment)

See: #102 (comment)

@Robert-Stam

> Have you tried k8s v1.29 on IBM Cloud successfully?

Have you tried k8s v1.29 on IBM Cloud successfully?

No9 (Collaborator) commented Apr 8, 2024

Sorry I missed your log output post for some reason.
So it appears as though this command is executing but not returning a list of pods:

crictl pods --name <hostname> -o json

where <hostname> is captured from the crashing container.
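(Roughly what the composer extracts from that JSON, sketched with jq for illustration rather than the actual Rust implementation; $CRASH_HOSTNAME is a stand-in for the captured hostname:)

crictl pods --name "$CRASH_HOSTNAME" -o json | jq -r '.items[0].metadata | .name, .namespace'

(An empty "items" list here would line up with the "failed to create pod at index 0" error posted earlier.)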

Are you overriding the hostname on the deployed workloads?

In the meantime I'll take a look at a 1.29 cluster to confirm.
[Edit]
Confirmed that the core dump works as expected on IBM Cloud IKS 1.29 with no additional values parameters.
Tested with the following failing container:

kubectl run -i -t segfaulter --image=quay.io/icdh/segfaulter --restart=Never

Robert-Stam commented Apr 9, 2024

> Sorry I missed your log output post for some reason. So it appears as though this command is executing but not returning a list of pods:
>
> crictl pods --name <hostname> -o json
>
> where <hostname> is captured from the crashing container.
>
> Are you overriding the hostname on the deployed workloads?
>
> In the meantime I'll take a look at a 1.29 cluster to confirm. [Edit] Confirmed that the core dump works as expected on IBM Cloud IKS 1.29 with no additional values parameters. Tested with the following failing container:
>
> kubectl run -i -t segfaulter --image=quay.io/icdh/segfaulter --restart=Never

I am not overriding the hostname.

To make sure we are on the same page: you did test with {namespace} in the filenameTemplate, and it is filled out correctly?

No9 (Collaborator) commented Apr 9, 2024

Revalidated with this config:

composer:
  ignoreCrio: false
  crioImageCmd: "img"
  logLevel: "Warn"
  filenameTemplate: "{uuid}-dump-{timestamp}-{hostname}-{exe_name}-{pid}-{signal}-{namespace}"

Ran kubectl run -i -t segfaulter --image=quay.io/icdh/segfaulter --restart=Never

The following output, showing the default namespace, is obtained from the container:

[2024-04-09T20:04:27Z INFO  core_dump_agent] Uploading: /var/mnt/core-dump-handler/cores/3fb6b86a-6726-4f5c-80fd-f34e8a971536-dump-1712693067-segfaulter-segfaulter-1-4-default.zip
[2024-04-09T20:04:27Z INFO  core_dump_agent] zip size is 28610
[2024-04-09T20:04:27Z INFO  core_dump_agent] S3 Returned: 200

Can I suggest getting a debug container on the host and establishing what happens when the following is run:

crictl pods --name <hostname> -o json

If JSON is returned, can you either post it here and/or validate it in the test suite?

Thanks
[Edit] Kubernetes info: IBM Kubernetes Service 1.29.3_1531

Robert-Stam commented Apr 10, 2024

@No9 Anton, I executed your command in the running container (ibm/core-dump-handler:v8.10.0) on AWS (with k8s v1.29).
This is the result:

[root@core-dump-lgd5p app]# ./crictl pods  --name ip-10-87-16-57.eu-west-2.compute.internal -o json
WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock]. As the default settings are now deprecated, you should set the endpoint instead.
ERRO[0002] connect endpoint 'unix:///var/run/dockershim.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded
ERRO[0004] connect endpoint 'unix:///run/containerd/containerd.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded
FATA[0006] connect: connect endpoint 'unix:///run/crio/crio.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded

And these are the settings applied (based on the log):

[2024-04-10T09:21:31Z INFO  core_dump_agent] Writing composer .env
    LOG_LEVEL=Warn
    IGNORE_CRIO=false
    CRIO_IMAGE_CMD=img
    USE_CRIO_CONF=false
    FILENAME_TEMPLATE={namespace}-{uuid}-dump-{timestamp}-{hostname}-{exe_name}-{pid}-{signal}
    LOG_LENGTH=500
    POD_SELECTOR_LABEL=
    TIMEOUT=600
    COMPRESSION=true
    CORE_EVENTS=false
    EVENT_DIRECTORY=/var/mnt/core-dump-handler/events

No9 (Collaborator) commented Apr 10, 2024

OK, it looks like you are trying to run crictl from the handler container. What I was trying to suggest was setting up a debug session on the node, e.g.:

kubectl get nodes
NAME    STATUS   ROLES           AGE    VERSION
node1   Ready    master,worker   176d   v1.26.9+52589e6
node2   Ready    master,worker   176d   v1.26.9+52589e6
node3   Ready    master,worker   176d   v1.26.9+52589e6

With a node name (it doesn't matter which), run:

kubectl debug node/node1 --image=ubuntu

When you have a debug session, run something like the following:

/host/usr/bin/crictl -r unix:///host/run/crio/crio.sock pods --name core-dump-lgd5p -o json

where /host/usr/bin/crictl is wherever you have configured crictl to be copied, unix:///host/run/crio/crio.sock is the CRI socket (which may be in a different location), and core-dump-lgd5p is the pod name.

Expected output:

{
  "items": [
    {
      "id": "df2bb27cbc78c2fb51aea8cb2f9eeb6124c871244a5fb71e989458bb673125df",
      "metadata": {
        "name": "core-dump-handler-7kqc6",
        "uid": "c8ea5ce9-72be-4826-82b3-b8c3a8144d50",
        "namespace": "observe",
        "attempt": 0
      },
      "state": "SANDBOX_READY",
      "createdAt": "1712691523593249607",
      "labels": {
        "controller-revision-hash": "7b6c988b5d",
        "io.kubernetes.container.name": "POD",
        "io.kubernetes.pod.name": "core-dump-handler-7kqc6",
        "io.kubernetes.pod.namespace": "observe",
        "io.kubernetes.pod.uid": "c8ea5ce9-72be-4826-82b3-b8c3a8144d50",
        "name": "core-dump-ds",
        "pod-template-generation": "1"
      },
      "annotations": {
        "kubectl.kubernetes.io/default-container": "coredump-container",
        "kubernetes.io/config.seen": "2024-04-09T14:38:43.120492372-05:00",
        "kubernetes.io/config.source": "api",
        "openshift.io/scc": "core-dump-admin-privileged"
      },
      "runtimeHandler": ""
    }
  ]
}
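
(Conversely, if the pods query matches nothing, crictl just returns an empty list, which would line up with the earlier "failed to create pod at index 0" composer error and the "unknown" fields in the filename:)

{
  "items": []
}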
