Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reboots do not work when using reboot resource set to delay_mins 0 #1801

Open
ramereth opened this issue Sep 12, 2021 · 4 comments
Open

Reboots do not work when using reboot resource set to delay_mins 0 #1801

ramereth opened this issue Sep 12, 2021 · 4 comments

Comments

@ramereth
Copy link
Contributor

🗣️ Foreword

Thank for taking the time to fill this bug report fully. Without it we may not be able to fix the bug, and the issue may be closed without resolution.

👻 Brief Description

Using a reboot resource to :reboot_new to reboot a VM using test-kitchen fails to work.

Version

Test Kitchen version: 3.0.0

Environment

sous-chefs/apparmor#28

Scenario

Rebooting a VM to test disabling AppArmor.

Steps to Reproduce

See PR above for an example and try any of the disable suites.

Expected Result

Rebooting a VM without an error.

Actual Result

   [2021-09-12T06:32:14+00:00] WARN: Rebooting server at a recipe's request. Details: {:delay_mins=>0, :reason=>"Changing the AppArmor state requires a reboot", :timestamp=>2021-09-12 06:32:14.78388044 +0000, :requested_by=>"apparmor_state_change"}
       
       Running handlers:
   [2021-09-12T06:32:14+00:00] ERROR: Running exception handlers
       Running handlers complete
   [2021-09-12T06:32:14+00:00] ERROR: Exception handlers complete
       Infra Phase failed. 4 resources updated in 08 seconds
   [2021-09-12T06:32:14+00:00] FATAL: Stacktrace dumped to /var/tmp/kitchen/cache/chef-stacktrace.out
   [2021-09-12T06:32:14+00:00] FATAL: ---------------------------------------------------------------------------------------
   [2021-09-12T06:32:14+00:00] FATAL: PLEASE PROVIDE THE CONTENTS OF THE stacktrace.out FILE (above) IF YOU FILE A BUG REPORT
   [2021-09-12T06:32:14+00:00] FATAL: ---------------------------------------------------------------------------------------
   [2021-09-12T06:32:14+00:00] FATAL: Chef::Exceptions::Reboot: Rebooting server at a recipe's request. Details: {:delay_mins=>0, :reason=>"Changing the AppArmor state requires a reboot", :timestamp=>2021-09-12 06:32:14.78388044 +0000, :requested_by=>"apparmor_state_change"}
read:0x00007fb03315ac80 /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/logging-2.3.0/lib/logging/diagnostic_context.rb:471 run> terminated with exception (report_on_exception is true):
/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/transport/packet_stream.rb:75:in `select': closed stream (IOError)
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/transport/packet_stream.rb:75:in `available_for_read?'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/transport/packet_stream.rb:90:in `next_packet'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/transport/session.rb:193:in `block in poll_message'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/transport/session.rb:190:in `loop'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/transport/session.rb:190:in `poll_message'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/session.rb:545:in `dispatch_incoming_packets'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/session.rb:248:in `ev_preprocess'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/event_loop.rb:100:in `each'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/event_loop.rb:100:in `ev_preprocess'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/event_loop.rb:28:in `process'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/session.rb:227:in `process'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/session.rb:180:in `block in loop'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/session.rb:180:in `loop'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/session.rb:180:in `loop'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/session.rb:126:in `close'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/test-kitchen-3.0.0/lib/kitchen/transport/ssh.rb:129:in `close'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/test-kitchen-3.0.0/lib/kitchen/transport/ssh.rb:106:in `cleanup!'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/test-kitchen-3.0.0/lib/kitchen/instance.rb:298:in `cleanup!'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/test-kitchen-3.0.0/lib/kitchen/command.rb:203:in `run_action_in_thread'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/test-kitchen-3.0.0/lib/kitchen/command.rb:166:in `block (2 levels) in run_action'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/logging-2.3.0/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/transport/packet_stream.rb:75:in `select': closed stream (IOError)
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/transport/packet_stream.rb:75:in `available_for_read?'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/transport/packet_stream.rb:90:in `next_packet'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/transport/session.rb:193:in `block in poll_message'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/transport/session.rb:190:in `loop'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/transport/session.rb:190:in `poll_message'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/session.rb:545:in `dispatch_incoming_packets'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/session.rb:248:in `ev_preprocess'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/event_loop.rb:100:in `each'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/event_loop.rb:100:in `ev_preprocess'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/event_loop.rb:28:in `process'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/session.rb:227:in `process'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/session.rb:180:in `block in loop'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/session.rb:180:in `loop'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/session.rb:180:in `loop'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/connection/session.rb:126:in `close'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/test-kitchen-3.0.0/lib/kitchen/transport/ssh.rb:129:in `close'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/test-kitchen-3.0.0/lib/kitchen/transport/ssh.rb:106:in `cleanup!'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/test-kitchen-3.0.0/lib/kitchen/instance.rb:298:in `cleanup!'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/test-kitchen-3.0.0/lib/kitchen/command.rb:203:in `run_action_in_thread'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/test-kitchen-3.0.0/lib/kitchen/command.rb:166:in `block (2 levels) in run_action'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/logging-2.3.0/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
Error: The process '/usr/local/bin/kitchen' failed with exit code 1

➕ Additional context

Add any other context about the problem here. e.g. related issues or existing pull requests.

@ramereth ramereth changed the title Reboots no longer working Reboots do not work when using reboot resource set to delay_mins 1 Sep 16, 2021
@ramereth
Copy link
Contributor Author

I have an update on this issue. This seems to work as long as you do something like the following:

reboot 'default' do
  delay_mins 1
  action :reboot_now
end

It does not work (most of the time) if you just do the following:

reboot 'default' do
  action :reboot_now
end

@ramereth ramereth changed the title Reboots do not work when using reboot resource set to delay_mins 1 Reboots do not work when using reboot resource set to delay_mins 0 Sep 16, 2021
@nielsbuus
Copy link

I would like to weigh in on this issue, as I am also facing similar challenges with the way test-kitchen behaves, although I'm still trying to understand why.

From what I have been able to piece together, when you use a reboot resource, Chef will ultimately execute this method:

https://github.com/chef/chef/blob/v17.6.2/lib/chef/platform/rebooter.rb#L34

The problem here is that once you execute those shutdown commands, things move very fast. I haven't tested on Linux VM's (like I assume you do), but the behavior is likely similar.

chef-client tries to terminate in a controlled manner when reboots are ordered, by immediately throwing an exception, this exception is caught by an error handler which maps it to an exit code and exits with the exit code 35.

This exit code is then returned by the ssh or winrm transports to kitchen, which then concludes failure... unless the retry_on_exit_code option is set to include 35 in your kitchen.yml provisioner settings.

Regardless.... there seems to be a race condition here, because the shutdown executable will more or less immediately instruct all running processes to terminate - including the Ruby process that powers chef-client inside the VM.

I have not been able to verify it with chef-client as it terminates so quickly (and I don't know how to send fake signals on Windows), but Ruby seems to react to Windows signals by immediately exiting and chef-client does not seem to have anything that hooks into that behavior and delays the termination in time for a clean termination.

As a result, I suspect that the "clean" exit that chef-client tries to obtain with the exit code 35 doesn't happen if Ruby dies before it resumes from the shell execution of shutdown, raises the reboot exception and finally handles it to map it to an exit code and exits.

And even if chef-client is able to exit cleanly, the news might not reach test-kitchen if the SSH server or WinRM server is killed before it can relay the exit code (and close the session).

Your error

/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/net-ssh-6.1.0/lib/net/ssh/transport/packet_stream.rb:75:in `select': closed stream (IOError)

Seems to suggest that your VM's SSH server has indeed been killed and the TCP connection closed abruptly.

In other words, I think the reason why delay_mins appears to work is that it ensures Chef can exit the way it wants and the transport (SSH or WinRM) can conclude the command (chef-client) failed due to a reboot order and close gracefully, because it will be another 60 seconds before the OS pulls the plug.

The only immediate solution I can think of here would be for chef-client to run the shutdown command in a seperate detached process with 1-2 second delay and then raise the reboot exception immediately, so chef-client can exit on its own terms just before the OS ends it.

But I'm not a contributor (yet), so I wouldn't know if that would be acceptable or where to shove it in.

I would be nice if a chef maintainer could comment on this issue and confirm (or correct) these findings...

@i5pranay93
Copy link
Contributor

i5pranay93 commented Sep 28, 2021

@ramereth @nielsbuus We had this issue in past as well,
#1376
chrisgit/test_kitchen-multiboot#1
#1376 (comment)
2 solution:
#1218 (comment)

@i5pranay93
Copy link
Contributor

i5pranay93 commented Oct 7, 2021

Hi
was trying to replicate issue
it did replicate,
but as per @ramereth
it should work in case when we have something like

reboot 'default' do
  delay_mins 1
  action :reboot_now
end

and doesn’t work in case
when

reboot 'default' do
  action :reboot_now
end

but when I try, it (Kitchen converge )fails in both case

Kitchen.yml

driver:
  name: vagrant

provisioner:
  name: chef_infra
  product_name: chef
  product_version: <%= ENV['CHEF_VERSION'] || 'latest' %>
  chef_license: accept-no-persist
  multiple_converge: 2
  enforce_idempotency: true
  deprecations_as_errors: true
  log_level: <%= ENV['CHEF_LOG_LEVEL'] || 'auto' %>

  # Allow reboots
  max_retries: 3
  wait_for_retry: 90
  retry_on_exit_code: # An array of exit codes that can indicate that kitchen should retry the converge command. Defaults to an empty array.
        - 35              # Reboot is scheduled
        - 20              # The Windows system cannot find the device specified.
        - 1               # Generic failure
  root_path: /var/tmp/kitchen

verifier:
  name: inspec

platforms:
  - name: ubuntu-18.04

suites:
  - name: disable
    run_list:
      - recipe[workstation::default]
    attributes:
      apparmor:
        disable: true
        automatic_reboot: true
    verifier:
      inspec_tests:
        - path: test/integration/common
      inputs:
        enabled: false
    provisioner:
      multiple_converge: 1
      enforce_idempotency: false

recipes/default.rb

#
# Cookbook:: workstation
# Recipe:: default
#
# Copyright:: 2021, The Authors, All Rights Reserved.


reboot 'default' do
  delay_mins 1
  action :reboot_now
end


file '/tmp/test.txt' do
  content 'this is managed by chef'
  action :create
end

Logs
  Running handlers:
       [2021-10-06T15:38:30+00:00] ERROR: Running exception handlers
           Running handlers complete
       [2021-10-06T15:38:30+00:00] ERROR: Exception handlers complete
           Infra Phase failed. 0 resources updated in 01 seconds
       [2021-10-06T15:38:30+00:00] FATAL: Stacktrace dumped to /var/tmp/kitchen/cache/chef-stacktrace.out
       [2021-10-06T15:38:30+00:00] FATAL: ---------------------------------------------------------------------------------------
       [2021-10-06T15:38:30+00:00] FATAL: PLEASE PROVIDE THE CONTENTS OF THE stacktrace.out FILE (above) IF YOU FILE A BUG REPORT
       [2021-10-06T15:38:30+00:00] FATAL: ---------------------------------------------------------------------------------------
       [2021-10-06T15:38:30+00:00] FATAL: Chef::Exceptions::Reboot: Rebooting server at a recipe's request. Details: {:delay_mins=>1, :reason=>"Reboot by Chef Infra Client", :timestamp=>2021-10-06 15:38:30.579979039 +0000, :requested_by=>"default"}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants