
Installation of Dragonfly2 service components dragonfly-client, dragonfly-seed-client health check failed #3673

Open
caisheng821 opened this issue Nov 27, 2024 · 8 comments

@caisheng821

dragonfly-client and dragonfly-seed-client both fail their health checks with: Liveness probe failed: timeout: failed to connect service "unix:///var/run/dragonfly/dfdaemon.sock" within 1s

@caisheng821
Author

Kubernetes: v1.28.15
OS: Debian 12
containerd: 1.6.20


@gaius-qi gaius-qi self-assigned this Nov 27, 2024
@gaius-qi
Member

@caisheng821 Can you provide the output of kubectl describe po dragonfly-seed-client and kubectl logs dragonfly-seed-client?

@caisheng821
Author

kubectl describe po dragonfly-seed-client-0 -n dragonfly-system
Name: dragonfly-seed-client-0
Namespace: dragonfly-system
Priority: 0
Service Account: default
Node: cdsf/172.23.11.188
Start Time: Mon, 25 Nov 2024 11:51:01 +0800
Labels: app=dragonfly
apps.kubernetes.io/pod-index=0
component=seed-client
controller-revision-hash=dragonfly-seed-client-85c644b4f9
release=dragonfly
statefulset.kubernetes.io/pod-name=dragonfly-seed-client-0
Annotations: checksum/config: 140c96bc815d67ab002f38b32efa512dd48948518016d079fbe0329d252efdd8
Status: Running
IP: 172.21.1.155
IPs:
IP: 172.21.1.155
Controlled By: StatefulSet/dragonfly-seed-client
Init Containers:
wait-for-manager:
Container ID: containerd://4d9680d81f9e89e11a33b6b0ee1583132ec4305d297772cf868b154be71ea3ce
Image: busybox:latest
Image ID: busybox@sha256:c121a8a6392cffd1288512fb51bf828dffb7969d6ee5f63ae56937bacb5dc7ce
Port:
Host Port:
Command:
sh
-c
until nslookup dragonfly-manager.dragonfly-system.svc.cluster.local && nc -vz dragonfly-manager.dragonfly-system.svc.cluster.local 8080; do echo waiting for manager; sleep 2; done;
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 25 Nov 2024 11:51:02 +0800
Finished: Mon, 25 Nov 2024 11:51:02 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 0
memory: 0
Environment:
Mounts:
Containers:
seed-client:
Container ID: containerd://36923af718dd34e1e0721450786f0085a6b5449d308954f1d16ab53022451df1
Image: dragonflyoss/client:latest
Image ID: dragonflyoss/client@sha256:6571b0b1d206237bfec7eca6e9177c72a3d362b77e1a338dad816dbbe672e02b
Ports: 4000/TCP, 4001/TCP, 4003/TCP, 4002/TCP, 4004/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
Args:
--log-level=info
--verbose
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Thu, 28 Nov 2024 09:24:52 +0800
Finished: Thu, 28 Nov 2024 09:26:52 +0800
Ready: False
Restart Count: 936
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 0
memory: 0
Liveness: exec [/bin/grpc_health_probe -addr=unix:///var/run/dragonfly/dfdaemon.sock] delay=15s timeout=5s period=30s #success=1 #failure=3
Readiness: exec [/bin/grpc_health_probe -addr=unix:///var/run/dragonfly/dfdaemon.sock] delay=5s timeout=5s period=30s #success=1 #failure=3
Environment:
Mounts:
/etc/dragonfly from config (rw)
/var/lib/dragonfly/ from storage (rw)
/var/log/dragonfly/dfdaemon/ from logs (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: storage-dragonfly-seed-client-0
ReadOnly: false
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: dragonfly-seed-client
Optional: false
logs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type     Reason     Age                        From     Message
----     ------     ----                       ----     -------
Warning  Unhealthy  12m (x2803 over 2d21h)     kubelet  Liveness probe failed: timeout: failed to connect service "unix:///var/run/dragonfly/dfdaemon.sock" within 1s
Warning  BackOff    7m12s (x11584 over 2d21h)  kubelet  Back-off restarting failed container seed-client in pod dragonfly-seed-client-0_dragonfly-system(a6b881a8-b34a-449a-bb7f-0c876694ff24)
Warning  Unhealthy  2m22s (x6434 over 2d21h)   kubelet  Readiness probe failed: timeout: failed to connect service "unix:///var/run/dragonfly/dfdaemon.sock" within 1s
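
Note on reading the output above: exit code 137 and the Started/Finished pair are worth decoding before digging further. The sketch below is only an illustration in Python, not part of Dragonfly or kubectl. The helper names signal_from_exit_code and lifetime_seconds are made up for this example. It assumes the usual convention that an exit code above 128 means the process was ended by signal (code - 128), and that timestamps follow the layout printed by kubectl describe.

```python
import signal
from datetime import datetime
from typing import Optional

# Timestamp layout as printed above, e.g. "Thu, 28 Nov 2024 09:24:52 +0800".
DESCRIBE_TIME_FORMAT = "%a, %d %b %Y %H:%M:%S %z"


def signal_from_exit_code(exit_code: int) -> Optional[str]:
    """Return the signal name if the exit code follows the 128 + N convention."""
    if exit_code <= 128:
        return None
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        # Not a signal number known on this platform.
        return None


def lifetime_seconds(started: str, finished: str) -> float:
    """Seconds between the Started and Finished timestamps of a container."""
    start = datetime.strptime(started, DESCRIBE_TIME_FORMAT)
    end = datetime.strptime(finished, DESCRIBE_TIME_FORMAT)
    return (end - start).total_seconds()


if __name__ == "__main__":
    print(signal_from_exit_code(137))
    print(lifetime_seconds("Thu, 28 Nov 2024 09:24:52 +0800",
                           "Thu, 28 Nov 2024 09:26:52 +0800"))
```

For the values above this gives SIGKILL and a lifetime of 120 seconds. That would be consistent with the kubelet killing the container after repeated liveness failures rather than dfdaemon exiting on its own. An OOM kill also ends in SIGKILL, but kubectl describe would normally show Reason: OOMKilled in that case, not Error.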

@caisheng821
Author

kubectl describe po dragonfly-client-lgx66 -n dragonfly-system
Name: dragonfly-client-lgx66
Namespace: dragonfly-system
Priority: 0
Service Account: default
Node: cdsf/172.23.11.188
Start Time: Mon, 25 Nov 2024 11:27:19 +0800
Labels: app=dragonfly
component=client
controller-revision-hash=896f66885
pod-template-generation=5
release=dragonfly
Annotations: cattle.io/timestamp: 2024-11-25T03:01:32Z
checksum/config: d0d38701744b259ae47121c4a0429cda1090e7b581094aa168709256f8bbc838
Status: Running
IP: 172.23.11.88
IPs:
IP: 172.23.11.88
Controlled By: DaemonSet/dragonfly-client
Init Containers:
wait-for-scheduler:
Container ID: containerd://13e22d8b5a3f211be8de2f73e44007d3231f62095d1d84bdb521c6520c74ad5e
Image: busybox:latest
Image ID: busybox@sha256:c121a8a6392cffd1288512fb51bf828dffb7969d6ee5f63ae56937bacb5dc7ce
Port:
Host Port:
Command:
sh
-c
until nslookup dragonfly-scheduler.dragonfly-system.svc.cluster.local && nc -vz dragonfly-scheduler.dragonfly-system.svc.cluster.local 8002; do echo waiting for scheduler; sleep 2; done;
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 25 Nov 2024 11:27:19 +0800
Finished: Mon, 25 Nov 2024 11:27:19 +0800
Ready: True
Restart Count: 0
Environment:
Mounts:
dfinit:
Container ID: containerd://3997ea845d16a93c79849e3261e2e10c30ace898b379415f84278af9008d4ac7
Image: dragonflyoss/dfinit:latest
Image ID: dragonflyoss/dfinit@sha256:0f63c8458db216412471a24449be4f994b9d3bd0fb4f67f95e60923a1bde9bf8
Port:
Host Port:
Args:
--log-level=info
--verbose
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 25 Nov 2024 11:38:01 +0800
Finished: Mon, 25 Nov 2024 11:38:01 +0800
Ready: True
Restart Count: 7
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 0
memory: 0
Environment:
Mounts:
/etc/containerd from containerd-config-dir (rw)
/etc/dragonfly from dfinit-config (rw)
restart-container-runtime:
Container ID: containerd://c9f3415a53a0a8a3b0703aad2b755ce4ca74afb2ba1e22ecb5d4b48e4821512b
Image: busybox:latest
Image ID: busybox@sha256:c121a8a6392cffd1288512fb51bf828dffb7969d6ee5f63ae56937bacb5dc7ce
Port:
Host Port:
Command:
/bin/sh
-cx
nsenter -t 1 -m -- systemctl restart containerd.service
echo "restart container"
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 25 Nov 2024 11:38:02 +0800
Finished: Mon, 25 Nov 2024 11:38:02 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 0
memory: 0
Environment:
Mounts:
Containers:
client:
Container ID: containerd://4b603f7d5c4c5499925c09b93480e29b9f1489fcc402fa394d22c8497fd18ed2
Image: dragonflyoss/client:latest
Image ID: dragonflyoss/client@sha256:6571b0b1d206237bfec7eca6e9177c72a3d362b77e1a338dad816dbbe672e02b
Ports: 4000/TCP, 4003/TCP, 4002/TCP, 4004/TCP
Host Ports: 4000/TCP, 4003/TCP, 4002/TCP, 4004/TCP
Args:
--log-level=info
--verbose
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Thu, 28 Nov 2024 09:27:10 +0800
Finished: Thu, 28 Nov 2024 09:29:10 +0800
Ready: False
Restart Count: 941
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 0
memory: 0
Liveness: exec [/bin/grpc_health_probe -addr=unix:///var/run/dragonfly/dfdaemon.sock] delay=15s timeout=5s period=30s #success=1 #failure=3
Readiness: exec [/bin/grpc_health_probe -addr=unix:///var/run/dragonfly/dfdaemon.sock] delay=5s timeout=5s period=30s #success=1 #failure=3
Environment:
Mounts:
/etc/dragonfly from config (rw)
/var/lib/dragonfly/ from storage (rw)
/var/log/dragonfly/dfdaemon/ from logs (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: dragonfly-client
Optional: false
dfinit-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: dragonfly-dfinit
Optional: false
containerd-config-dir:
Type: HostPath (bare host directory volume)
Path: /etc/containerd
HostPathType: DirectoryOrCreate
storage:
Type: HostPath (bare host directory volume)
Path: /var/lib/dragonfly/
HostPathType: DirectoryOrCreate
logs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type     Reason     Age                       From     Message
----     ------     ----                      ----     -------
Warning  Unhealthy  41m (x6424 over 2d21h)    kubelet  Readiness probe failed: timeout: failed to connect service "unix:///var/run/dragonfly/dfdaemon.sock" within 1s
Warning  Unhealthy  6m44s (x2824 over 2d21h)  kubelet  Liveness probe failed: timeout: failed to connect service "unix:///var/run/dragonfly/dfdaemon.sock" within 1s
Warning  BackOff    104s (x11652 over 2d21h)  kubelet  Back-off restarting failed container client in pod dragonfly-client-lgx66_dragonfly-system(f0f5b4a1-d9ac-4d3f-8fb7-1662606d18ab)
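
The probe settings in the describe output (delay=15s, period=30s, #failure=3) put a floor on how soon the kubelet can decide to restart the container. Below is a minimal sketch of that arithmetic. It assumes probes are skipped until the initial delay has passed and that failures must be consecutive. ExecProbe and earliest_trip_s are hypothetical names, and real kubelet timing adds scheduling offsets and the termination grace period on top.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecProbe:
    """Subset of the probe settings shown by kubectl describe."""
    initial_delay_s: int
    period_s: int
    failure_threshold: int

    def earliest_trip_s(self) -> int:
        """Lower bound, in seconds after container start, for the probe to trip.

        The first failure cannot occur before the initial delay, and each
        further consecutive failure needs one more period.
        """
        return self.initial_delay_s + (self.failure_threshold - 1) * self.period_s


# Liveness settings copied from the describe output above.
liveness = ExecProbe(initial_delay_s=15, period_s=30, failure_threshold=3)

if __name__ == "__main__":
    print(liveness.earliest_trip_s())
```

The result is 75 seconds, and the container here lived for 120 seconds (09:27:10 to 09:29:10), so the restarts fit the liveness probe tripping. Separately, the events say "within 1s" even though the probe is declared with timeout=5s. That 1s most likely comes from grpc_health_probe itself, whose -connect-timeout flag defaults to 1s, and not from the kubelet's probe timeout.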

@caisheng821
Author

@caisheng821 Can you provide the log of kubectl describe po dragonfly-seed-client and kubectl logs dragonfly-seed-client.

The kubectl describe output has already been posted above.

@gaius-qi
Member

@caisheng821 Please provide kubectl logs dragonfly-seed-client.

@caisheng821
Author

kubectl logs dragonfly-seed-client-0 -n dragonfly-system
Defaulted container "seed-client" out of: seed-client, wait-for-manager (init)
2024-11-29T09:48:36.952553530+00:00 INFO tracing initialized directory: /var/log/dragonfly/dfdaemon, level: INFO
at dragonfly-client/src/tracing/mod.rs:131

2024-11-29T09:48:36.952628118+00:00 INFO initializing metadata directory: "/var/lib/dragonfly/" ["task", "piece", "persistent_cache_task"]
at dragonfly-client-storage/src/storage_engine/rocksdb.rs:72
in open
in new
in new

2024-11-29T09:48:37.022500735+00:00 INFO metadata initialized directory: "/var/lib/dragonfly/metadata"
at dragonfly-client-storage/src/storage_engine/rocksdb.rs:131
in open
in new
in new

2024-11-29T09:48:37.023419115+00:00 INFO content initialized directory: "/var/lib/dragonfly/content"
at dragonfly-client-storage/src/content.rs:80
in new
in new

2024-11-29T09:48:37.042387642+00:00 INFO refresh available scheduler addresses: ["172.21.1.203", "172.21.2.68", "172.21.0.178"]
at dragonfly-client/src/grpc/scheduler.rs:602
in update_available_scheduler_addrs
in refresh_available_scheduler_addrs
in new

2024-11-29T09:48:37.042472470+00:00 INFO load [http] builtin backend
at dragonfly-client-backend/src/lib.rs:261
in load_builtin_backends
in new

2024-11-29T09:48:37.042496818+00:00 INFO load [https] builtin backend
at dragonfly-client-backend/src/lib.rs:267
in load_builtin_backends
in new

2024-11-29T09:48:37.042509179+00:00 INFO load [s3] builtin backend
at dragonfly-client-backend/src/lib.rs:275
in load_builtin_backends
in new

2024-11-29T09:48:37.042521043+00:00 INFO load [gcs] builtin backend
at dragonfly-client-backend/src/lib.rs:283
in load_builtin_backends
in new

2024-11-29T09:48:37.042532588+00:00 INFO load [abs] builtin backend
at dragonfly-client-backend/src/lib.rs:291
in load_builtin_backends
in new

2024-11-29T09:48:37.042543320+00:00 INFO load [oss] builtin backend
at dragonfly-client-backend/src/lib.rs:299
in load_builtin_backends
in new

2024-11-29T09:48:37.042553894+00:00 INFO load [obs] builtin backend
at dragonfly-client-backend/src/lib.rs:307
in load_builtin_backends
in new

2024-11-29T09:48:37.042566372+00:00 INFO load [cos] builtin backend
at dragonfly-client-backend/src/lib.rs:315
in load_builtin_backends
in new

2024-11-29T09:48:37.042577263+00:00 INFO load [hdfs] builtin backend
at dragonfly-client-backend/src/lib.rs:319
in load_builtin_backends
in new

2024-11-29T09:48:37.042608595+00:00 WARN skip loading plugin backends, because the plugin directory /var/lib/dragonfly/plugins/dfdaemon/ does not exist
at dragonfly-client-backend/src/lib.rs:329
in load_plugin_backends
in new

2024-11-29T09:48:37.045909994+00:00 INFO load registry cert success
at dragonfly-client/src/proxy/mod.rs:117
in new

2024-11-29T09:48:37.045929033+00:00 INFO load proxy ca cert and key success
at dragonfly-client/src/proxy/mod.rs:129
in new

2024-11-29T09:48:37.049418421+00:00 INFO announce host to 172.21.1.203:8002
at dragonfly-client/src/grpc/scheduler.rs:232
in init_announce_host with request: AnnounceHostRequest { host: Some(Host { id: "172.21.1.155-dragonfly-seed-client-0-seed", r#type: 1, hostname: "dragonfly-seed-client-0", ip: "172.21.1.155", port: 4000, download_port: 4000, os: "linux", platform: "linux", platform_family: "unix", platform_version: "12", kernel_version: "6.1.0-25-amd64", cpu: Some(Cpu { logical_count: 2, physical_count: 2, percent: 4.112109184265137, process_percent: 0.0, times: None }), memory: Some(Memory { total: 7824576512, available: 6338109440, used: 1486467072, used_percent: 0.0, process_used_percent: 0.0, free: 432066560 }), network: Some(Network { tcp_connection_count: 0, upload_tcp_connection_count: 0, location: Some(""), idc: Some(""), download_rate: 0, download_rate_limit: 53687091200, upload_rate: 0, upload_rate_limit: 53687091200 }), disk: Some(Disk { total: 41892380672, free: 28545040384, used: 13347340288, used_percent: 31.861021202170747, inodes_total: 0, inodes_used: 0, inodes_free: 0, inodes_used_percent: 0.0, read_bandwidth: 0, write_bandwidth: 0 }), build: Some(Build { git_version: "0.1.117", git_commit: Some("unknown"), go_version: None, rust_version: Some(""), platform: None }), scheduler_cluster_id: 0, disable_shared: false }), interval: Some(Duration { seconds: 60, nanos: 0 }) }
in new

2024-11-29T09:48:37.049556916+00:00 INFO announce host to 172.21.2.68:8002
at dragonfly-client/src/grpc/scheduler.rs:232
in init_announce_host with request: AnnounceHostRequest { host: Some(Host { id: "172.21.1.155-dragonfly-seed-client-0-seed", r#type: 1, hostname: "dragonfly-seed-client-0", ip: "172.21.1.155", port: 4000, download_port: 4000, os: "linux", platform: "linux", platform_family: "unix", platform_version: "12", kernel_version: "6.1.0-25-amd64", cpu: Some(Cpu { logical_count: 2, physical_count: 2, percent: 4.112109184265137, process_percent: 0.0, times: None }), memory: Some(Memory { total: 7824576512, available: 6338109440, used: 1486467072, used_percent: 0.0, process_used_percent: 0.0, free: 432066560 }), network: Some(Network { tcp_connection_count: 0, upload_tcp_connection_count: 0, location: Some(""), idc: Some(""), download_rate: 0, download_rate_limit: 53687091200, upload_rate: 0, upload_rate_limit: 53687091200 }), disk: Some(Disk { total: 41892380672, free: 28545040384, used: 13347340288, used_percent: 31.861021202170747, inodes_total: 0, inodes_used: 0, inodes_free: 0, inodes_used_percent: 0.0, read_bandwidth: 0, write_bandwidth: 0 }), build: Some(Build { git_version: "0.1.117", git_commit: Some("unknown"), go_version: None, rust_version: Some(""), platform: None }), scheduler_cluster_id: 0, disable_shared: false }), interval: Some(Duration { seconds: 60, nanos: 0 }) }
in new

2024-11-29T09:48:37.049559032+00:00 INFO announce host to 172.21.0.178:8002
at dragonfly-client/src/grpc/scheduler.rs:232
in init_announce_host with request: AnnounceHostRequest { host: Some(Host { id: "172.21.1.155-dragonfly-seed-client-0-seed", r#type: 1, hostname: "dragonfly-seed-client-0", ip: "172.21.1.155", port: 4000, download_port: 4000, os: "linux", platform: "linux", platform_family: "unix", platform_version: "12", kernel_version: "6.1.0-25-amd64", cpu: Some(Cpu { logical_count: 2, physical_count: 2, percent: 4.112109184265137, process_percent: 0.0, times: None }), memory: Some(Memory { total: 7824576512, available: 6338109440, used: 1486467072, used_percent: 0.0, process_used_percent: 0.0, free: 432066560 }), network: Some(Network { tcp_connection_count: 0, upload_tcp_connection_count: 0, location: Some(""), idc: Some(""), download_rate: 0, download_rate_limit: 53687091200, upload_rate: 0, upload_rate_limit: 53687091200 }), disk: Some(Disk { total: 41892380672, free: 28545040384, used: 13347340288, used_percent: 31.861021202170747, inodes_total: 0, inodes_used: 0, inodes_free: 0, inodes_used_percent: 0.0, read_bandwidth: 0, write_bandwidth: 0 }), build: Some(Build { git_version: "0.1.117", git_commit: Some("unknown"), go_version: None, rust_version: Some(""), platform: None }), scheduler_cluster_id: 0, disable_shared: false }), interval: Some(Duration { seconds: 60, nanos: 0 }) }
in new

2024-11-29T09:48:37.054749685+00:00 INFO dfdaemon started at pid 1
at dragonfly-client/src/bin/dfdaemon/main.rs:293

2024-11-29T09:48:37.057299804+00:00 INFO upload server listening on 0.0.0.0:4000
at dragonfly-client/src/grpc/dfdaemon_upload.rs:124
in run
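
The excerpt ends with the upload server listening on 0.0.0.0:4000. No line in it reports a listener on /var/run/dragonfly/dfdaemon.sock, which is the path both probes target, and /var/run/dragonfly is not among the mounts listed by kubectl describe. One way to narrow this down is to check whether the socket file exists and accepts connections. The sketch below is a hedged illustration in Python, not a Dragonfly tool: probe_unix_socket and its return labels are invented for this example, and it would have to run somewhere that sees the container's filesystem.

```python
import os
import socket
import stat


def probe_unix_socket(path: str, timeout_s: float = 1.0) -> str:
    """Classify a Unix socket path as 'missing', 'not-a-socket', 'unreachable' or 'ok'."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        # Nothing has created the socket file (or its directory) yet.
        return "missing"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout_s)
        try:
            sock.connect(path)
        except OSError:
            # Covers connection refused (stale socket file) and timeouts.
            return "unreachable"
    return "ok"


if __name__ == "__main__":
    # Path taken from the probe definition in the describe output.
    print(probe_unix_socket("/var/run/dragonfly/dfdaemon.sock"))
```

A result of "missing" would point at the daemon never creating the socket, for example because the directory is absent or not writable. A result of "unreachable" would point at a stale socket file or a server that stopped accepting connections. Either reading is an assumption until the full dfdaemon log is available.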
