This repository has been archived by the owner on Sep 18, 2020. It is now read-only.

segvault when updating / stuck updating #184

Open

johanneswuerbach opened this issue Aug 27, 2018 · 2 comments

Comments

@johanneswuerbach

Recently CLUO (v0.7.0) seems to have gotten stuck and continuously tried to update the same node.

CoreOS: CoreOS 1800.5.0
Kubernetes: v1.9.9
Cloud: AWS us-east-1, kops 1.10

Agent logs:

I0827 23:00:28.134375       1 agent.go:184] Node drained, rebooting
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x1238536]
 goroutine 33 [running]:
github.com/coreos/container-linux-update-operator/pkg/updateengine.(*Client).ReceiveStatuses(0xc4203c6660, 0xc420052480, 0xc420052300)
	/go/src/github.com/coreos/container-linux-update-operator/pkg/updateengine/client.go:99 +0x186
created by github.com/coreos/container-linux-update-operator/pkg/agent.(*Klocksmith).watchUpdateStatus
	/go/src/github.com/coreos/container-linux-update-operator/pkg/agent/agent.go:251 +0x102

Controller logs:

I0827 23:03:26.319148       1 operator.go:449] Found node "ip-10-100-24-49.ec2.internal" still rebooting, waiting
I0827 23:03:26.319172       1 operator.go:451] Found 1 (of max 1) rebooting nodes; waiting for completion
I0827 23:03:59.455065       1 operator.go:507] Found 0 rebooted nodes
I0827 23:03:59.719801       1 operator.go:449] Found node "ip-10-100-24-49.ec2.internal" still rebooting, waiting
I0827 23:03:59.720003       1 operator.go:451] Found 1 (of max 1) rebooting nodes; waiting for completion
I0827 23:04:32.719449       1 operator.go:507] Found 0 rebooted nodes
I0827 23:04:33.119047       1 operator.go:449] Found node "ip-10-100-24-49.ec2.internal" still rebooting, waiting
I0827 23:04:33.119072       1 operator.go:451] Found 1 (of max 1) rebooting nodes; waiting for completion
I0827 23:05:06.520970       1 operator.go:507] Found 0 rebooted nodes
I0827 23:05:06.918956       1 operator.go:449] Found node "ip-10-100-24-49.ec2.internal" still rebooting, waiting
I0827 23:05:06.918976       1 operator.go:451] Found 1 (of max 1) rebooting nodes; waiting for completion
I0827 23:05:39.920518       1 operator.go:507] Found 0 rebooted nodes
I0827 23:05:40.320071       1 operator.go:449] Found node "ip-10-100-24-49.ec2.internal" still rebooting, waiting
I0827 23:05:40.320094       1 operator.go:451] Found 1 (of max 1) rebooting nodes; waiting for completion
I0827 23:06:13.719273       1 operator.go:507] Found 1 rebooted nodes
I0827 23:06:14.519760       1 operator.go:449] Found node "ip-10-100-24-49.ec2.internal" still rebooting, waiting
I0827 23:06:14.519909       1 operator.go:451] Found 1 (of max 1) rebooting nodes; waiting for completion
johanneswuerbach changed the title from "segvault when updating" to "segvault when updating / stuck updating" on Aug 28, 2018
@johanneswuerbach
Author

Looks like downgrading to v0.6.0 has solved the issue for us.

@sdemos
Contributor

sdemos commented Aug 29, 2018

The panic you are running into looks the same as the one in #93, which is odd because, according to that issue, it should have been fixed in v0.7.0. I'll have to try to reproduce it; I don't remember much about it.

As for the stuck updates, the panic shouldn't have anything to do with them; the panic happens because the D-Bus channel gets closed underneath the watch function when the system goes down for the reboot.
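
For context, here's a minimal sketch of that failure mode. It is not the actual ReceiveStatuses code and the names are made up; it just shows how a signal channel that dbus closes during shutdown produces the nil pointer dereference in the trace above, and the kind of nil check that would turn it into a clean return instead:

```go
package updatewatch

import "github.com/godbus/dbus"

// watchSignals is an illustrative sketch, not the CLUO implementation.
// When the host shuts down for the reboot, dbus closes the signal channel
// out from under the watcher. A receive on a closed channel yields the zero
// value, here a nil *dbus.Signal, and dereferencing signal.Body is the
// SIGSEGV seen in the agent logs.
func watchSignals(signalCh <-chan *dbus.Signal, stop <-chan struct{}) {
	for {
		select {
		case <-stop:
			return
		case signal := <-signalCh:
			if signal == nil {
				// Channel closed underneath us; the reboot is already in progress.
				return
			}
			_ = signal.Body // the real code builds an update status from signal.Body
		}
	}
}
```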

Can you post the operator deployment for the failing one? Do you have a reboot window or any pre- or post-reboot hooks configured? It might also be helpful to get some of the debug logs, which you can do by adding the flag -v 4 to the operator deployment.
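
Roughly, assuming the stock update-operator Deployment layout (the container name, image, and command here are illustrative and may differ in your manifest), that flag change would look something like:

```yaml
# Fragment of the operator Deployment's pod spec, for illustration only.
# The relevant change is passing "-v" "4" to the operator binary.
containers:
  - name: update-operator
    image: quay.io/coreos/container-linux-update-operator:v0.7.0
    command:
      - /bin/update-operator
      - -v
      - "4"
```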
