'reboot' module not working as expected: failure to set up boot_time_command's execution is treated as the result of boot_time_command #83018
Labels
affects_2.16
bug
This issue/PR relates to a bug.
module
This issue/PR relates to a module.
needs_verified
This issue needs to be verified/reproduced by maintainer
P3
Priority 3 - Approved, No Time Limitation
Summary
To verify that a reboot actually took place, the ansible.builtin.reboot module compares an initial output of boot_time_command (execution # 1) with successive ones (executions # 2, # 3, # 4, ...). It considers the host rebooted the first time they differ, and then continues with the post-reboot checks. However, it does not correctly handle certain cases where the setup for executing boot_time_command fails before the command itself is even run. In those cases it interprets the output of that failure as the actual output of boot_time_command. That output then differs from the output of a correct execution, so the host is marked as rebooted. A failure while setting up the execution of boot_time_command should be considered a normal case while the host is rebooting, because various parts of the system stop working normally (filesystems get unmounted, ...).
A workaround is to set post_reboot_delay to a "high-enough" value, but a safe threshold is hard to pick because it varies with how fast a system shuts down properly. For an end user, the result is similar to #78007: the module may complete and mark the host as successfully rebooted while it is in fact still rebooting. It does so without verifying whether the reboot actually succeeded, and subsequent tasks may fail midway.
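The detection rule described above can be sketched in Python as follows. This is a hypothetical simplification written for this report, not the code of the ansible.builtin.reboot action plugin. The function name first_changed_execution and both boot_id values are made up for illustration. The sketch shows that a plain "differs from execution # 1" test cannot tell a new boot_id apart from the same boot_id prefixed by a setup error message.

```python
from typing import Iterable, Optional


def first_changed_execution(initial: str, later_outputs: Iterable[str]) -> Optional[int]:
    """Return the execution number (2, 3, ...) of the first output that differs
    from execution # 1, or None if every later output matched.

    Mimics the naive rule described in the report: *any* textual difference is
    taken as proof of a reboot, whatever caused it.
    """
    baseline = initial.strip()
    for execution, output in enumerate(later_outputs, start=2):
        if output.strip() != baseline:
            return execution
    return None


# Hypothetical boot_id values, for illustration only.
OLD_ID = "11111111-1111-1111-1111-111111111111"
NEW_ID = "22222222-2222-2222-2222-222222222222"

outputs = [
    OLD_ID + "\r\n",  # execution 2: host not shutting down yet
    # execution 3: shell setup fails mid-shutdown, but the command still prints the OLD boot_id
    "Could not chdir to home directory /home/automation: No such file or directory\r\n"
    + OLD_ID + "\r\n",
    NEW_ID + "\r\n",  # execution 4: the host has really rebooted
]

print(first_changed_execution(OLD_ID + "\r\n", outputs))  # prints 3, not 4
```

In this simulation execution # 3 is reported as the reboot even though its boot_id is unchanged; only execution # 4 reflects a real reboot.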
The workaround from #78007 does not work, since boot_time_command is not even executed in our case. The associated PR was also rejected and does not look like it would solve this instance of the problem.
The issue can be reproduced about 50% of the time on a freshly installed Debian 12 system running in a virtual machine on VMware, by executing the reboot module as an ad hoc command with the latest currently released version of Ansible (2.16.5). Since this is possibly a race condition, reproducing the issue may be difficult. I therefore encourage you to look at the kind of error generated in the attached execution and to verify that the current source code does not treat this category of error as an error, but rather as a normal execution of boot_time_command.
Issue Type
Bug Report
Component Name
ansible.builtin.reboot
Ansible Version
Configuration
OS / Environment
controller: ubuntu 22.04
target host: debian12
Steps to Reproduce
Expected Results
cb8d4303-f662-496a-b5de-df2a4ac85da7\r\n
which is the output of the first execution of cat /proc/sys/kernel/random/boot_id, the default value of boot_time_command.
boot_time_command # 2 and # 3:
Could not chdir to home directory /home/automation: No such file or directory\r\ncb8d4303-f662-496a-b5de-df2a4ac85da7\r\n
which mixes the error generated while setting up boot_time_command's execution # 4 with the command's own output. Due to the race conditions involved, we have seen several kinds of error messages. In this particular instance, we may speculate that /home had been unmounted, so /home/automation no longer existed. We have been able to reproduce similar errors even after changing remote_tmp in ansible.cfg.
Note: this is our partition table, for full disclosure.
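One possible mitigation, sketched here under the assumption that the default boot_time_command (cat /proc/sys/kernel/random/boot_id) is in use, is to accept an output only when it is exactly one UUID line and to treat anything else as an inconclusive attempt to retry. The helper names parse_boot_id and has_rebooted are hypothetical. This is not how Ansible currently behaves, and a custom boot_time_command would need its own validation. The sample strings are the outputs quoted in this report.

```python
import re
from typing import Optional

# Assumption: the boot_id is a lowercase UUID printed on a single line,
# as produced by `cat /proc/sys/kernel/random/boot_id`.
BOOT_ID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def parse_boot_id(output: str) -> Optional[str]:
    """Return the boot_id if `output` is exactly one UUID line, else None.

    None means "inconclusive": the caller should keep polling instead of
    treating the text as a new boot_id.
    """
    text = output.strip()
    return text if BOOT_ID_RE.fullmatch(text) else None


def has_rebooted(initial_output: str, current_output: str) -> Optional[bool]:
    """True/False when both outputs parse; None when the current one is noise."""
    before = parse_boot_id(initial_output)
    if before is None:
        raise ValueError("initial boot_time_command output is not a boot_id")
    after = parse_boot_id(current_output)
    if after is None:
        return None  # setup error or partial output: retry later
    return after != before


first = "cb8d4303-f662-496a-b5de-df2a4ac85da7\r\n"
mixed = (
    "Could not chdir to home directory /home/automation: "
    "No such file or directory\r\n"
    "cb8d4303-f662-496a-b5de-df2a4ac85da7\r\n"
)
print(has_rebooted(first, mixed))  # None -> keep polling, not "rebooted"
print(has_rebooted(first, first))  # False
```

Returning None rather than True for unparseable output keeps the polling loop waiting, which is the behaviour one would expect while filesystems such as /home are being unmounted.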
Actual Results
Code of Conduct