"Restart the networking service" failing repeatedly (during IIAB install, and when ./iiab-network is run) [similar Debian 11 Bullseye VM has a strange DNS-like freezing issue] [QA Automation / testing cloud-init Multipass script example] #3568

Open
holta opened this issue May 5, 2023 · 17 comments

@holta
Member

holta commented May 5, 2023

Thanks to @EMG70. Here are some basic diagnostics (from the Debian 11 VM) until we understand more:

2023-05-05 18:57:26,155 p=85862 u=root n=ansible | TASK [network : Restart the networking service] ******************
2023-05-05 18:57:27,132 p=85862 u=root n=ansible | fatal: [127.0.0.1]: FAILED! => {"changed": false, "msg": "Unable to start service networking: Job for networking.service failed because the control process exited with error code.\nSee \"systemctl status networking.service\" and \"journalctl -xe\" for details.\n"}

iiab-diagnostics: http://sprunge.us/ui5O7W?en

root@box:~# journalctl -u networking | pastebinit -b sprunge.us
( http://sprunge.us/ixfrp5?en )

@holta holta added the question label May 5, 2023
@holta holta added this to the 8.1 milestone May 5, 2023
@holta
Member Author

holta commented May 5, 2023

root@box:~# journalctl -u networking | pastebinit -b sprunge.us
( http://sprunge.us/ixfrp5?en )

The bottommost 10 lines above correspond to:

root@box:~# systemctl status networking.service
● networking.service - Raise network interfaces
     Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Fri 2023-05-05 18:57:27 BST; 21min ago
       Docs: man:interfaces(5)
   Main PID: 88346 (code=exited, status=1/FAILURE)
        CPU: 134ms

May 05 18:57:26 box dhclient[88406]: before submitting a bug.  These pages explain the proper
May 05 18:57:26 box dhclient[88406]: process and the information we find helpful for debugging.
May 05 18:57:26 box dhclient[88406]:
May 05 18:57:26 box dhclient[88406]: exiting.
May 05 18:57:27 box ifup[88346]: ifup: failed to bring up extra0
May 05 18:57:27 box ifup[88433]: Cannot find device "br0"
May 05 18:57:27 box ifup[88346]: ifup: failed to bring up br0
May 05 18:57:27 box systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE
May 05 18:57:27 box systemd[1]: networking.service: Failed with result 'exit-code'.
May 05 18:57:27 box systemd[1]: Failed to start Raise network interfaces.
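
A small sketch for pulling the failing interface names out of journal output like the above. The function name failed_ifaces is purely illustrative (it is not part of IIAB or ifupdown), and it assumes the "ifup: failed to bring up NAME" wording seen in these logs:

```shell
#!/bin/sh
# Print each interface name that ifup reported it could not bring up.
# Reads journal text on stdin and de-duplicates the names.
failed_ifaces() {
    sed -n 's/.*ifup: failed to bring up \(.*\)$/\1/p' | sort -u
}

# Sample lines copied from the status output above.
failed_ifaces <<'EOF'
May 05 18:57:27 box ifup[88346]: ifup: failed to bring up extra0
May 05 18:57:27 box ifup[88433]: Cannot find device "br0"
May 05 18:57:27 box ifup[88346]: ifup: failed to bring up br0
EOF
```

On a live system the same function could be fed from the journal, e.g. journalctl -u networking | failed_ifaces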

@holta
Member Author

holta commented May 5, 2023

  1. FYI / FWIW rebooting @EMG70's above VM and trying ./iiab-network does not help. After rebooting, the errors are nearly identical:

    root@box:~# systemctl status networking.service
    ...
    May 05 19:37:52 box ifup[4237]: ifup: failed to bring up default
    May 05 19:37:52 box ifup[4295]: Cannot find device "br0"
    May 05 19:37:52 box ifup[4237]: ifup: failed to bring up br0
    May 05 19:37:52 box systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE
    May 05 19:37:52 box systemd[1]: networking.service: Failed with result 'exit-code'.
    May 05 19:37:52 box systemd[1]: Failed to start Raise network interfaces.
    

    I don't know why it changed from extra0 (prior to reboot) to default (after reboot).

  2. A nearly identical VM I set up (but with "just 1 ethernet" instead of "2 ethernets") has a serious DNS and connectivity/slowness glitch of a different kind. I don't know if the two VMs' networking problems are related. In any case, both issues happen every time each respective VM is rebooted. On my own VM, it takes a full minute to sudo su - every time, complaining about "name resolution" in a very mysterious way. This is confusing, as ping to a named host (like mit.edu below) works throughout: before, during and afterwards:

    ubuntu@box:~$ sudo su -
    sudo: unable to resolve host box: Temporary failure in name resolution

    root@box:~# ping mit.edu
    PING mit.edu (23.32.174.245) 56(84) bytes of data.
    64 bytes from a23-32-174-245.deploy.static.akamaitechnologies.com (23.32.174.245): icmp_seq=1 ttl=59 time=9.83 ms
    64 bytes from a23-32-174-245.deploy.static.akamaitechnologies.com (23.32.174.245): icmp_seq=2 ttl=59 time=10.7 ms

    iiab-diagnostics for my own VM: http://sprunge.us/HD0RqA?en

@holta holta changed the title "Restart the networking service" failing repeatedly (during IIAB install, and when ./iiab-network is run) "Restart the networking service" failing repeatedly (during IIAB install, and when ./iiab-network is run) [similar VM has a strange DNS-like freezing issue] May 5, 2023
@holta
Member Author

holta commented May 5, 2023

EXPLANATION: Debian 11 VM's do not work properly with 2 Ethernet interfaces within Multipass, as demonstrated by this vanilla/fresh "OS ONLY VM" here, prior to installing IIAB...

root@deb11b:~# systemctl status networking.service
● networking.service - Raise network interfaces
     Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Fri 2023-05-05 20:06:21 BST; 54s ago
       Docs: man:interfaces(5)
    Process: 426 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=1/FAILURE)
   Main PID: 426 (code=exited, status=1/FAILURE)
        CPU: 107ms

May 05 20:06:19 deb11b dhclient[503]: before submitting a bug.  These pages explain the proper
May 05 20:06:19 deb11b dhclient[503]: process and the information we find helpful for debugging.
May 05 20:06:19 deb11b dhclient[503]:
May 05 20:06:19 deb11b dhclient[503]: exiting.
May 05 20:06:19 deb11b ifup[426]: ifup: failed to bring up extra0
May 05 20:06:19 deb11b ifup[426]: ifup: waiting for lock on /run/network/ifstate.enp5s0
May 05 20:06:19 deb11b ifup[426]: ifup: waiting for lock on /run/network/ifstate.enp6s0
May 05 20:06:21 deb11b systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE
May 05 20:06:21 deb11b systemd[1]: networking.service: Failed with result 'exit-code'.
May 05 20:06:21 deb11b systemd[1]: Failed to start Raise network interfaces.

An "OS ONLY" Debian 11 VM with just 1 Ethernet interface works better, as the paste below shows. Still, the above mess suggests Debian 11 networking cannot be trusted within Multipass VMs, which is very unfortunate, as Debian 12 VMs work really well within Multipass, allowing for accelerated testing!

# systemctl status networking
● networking.service - Raise network interfaces
     Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
     Active: active (exited) since Fri 2023-05-05 15:13:10 EDT; 44s ago
       Docs: man:interfaces(5)
    Process: 414 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=0/SUCCESS)
   Main PID: 414 (code=exited, status=0/SUCCESS)
        CPU: 32ms

May 05 15:13:10 deb11c systemd[1]: Starting Raise network interfaces...
May 05 15:13:10 deb11c ifup[414]: ifup: waiting for lock on /run/network/ifstate.enp5s0
May 05 15:13:10 deb11c systemd[1]: Finished Raise network interfaces.

@holta
Member Author

holta commented May 5, 2023

Background: @EMG70 is working on QA Automation of IIAB unit tests using cloud-init scripts like this iiab-pbx.yml example...

# https://cloudinit.readthedocs.io/en/latest/topics/examples.html

# -y at end of lines for Mint's problematic /usr/local/bin/apt
# (Hard-coding /usr/bin/apt is another option!)

runcmd:
  - apt update
  - apt install git -y
  - mkdir /etc/iiab
  - curl https://raw.githubusercontent.com/iiab/iiab/master/vars/local_vars_unittest.yml > /etc/iiab/local_vars.yml
  - [sed, -i, 's/^pbx_install:.*/pbx_install: True/', /etc/iiab/local_vars.yml]
  - [sed, -i, 's/^pbx_enabled:.*/pbx_enabled: True/', /etc/iiab/local_vars.yml]
  - curl iiab.io/risky.txt | bash &

Then he launches everything (fresh VM + OS install + IIAB install + unit test) thanks to one single command — as follows:

ETC.

@holta holta changed the title "Restart the networking service" failing repeatedly (during IIAB install, and when ./iiab-network is run) [similar VM has a strange DNS-like freezing issue] "Restart the networking service" failing repeatedly (during IIAB install, and when ./iiab-network is run) [similar VM has a strange DNS-like freezing issue] [QA Automation / testing cloud-init Multipass script example] May 5, 2023
@jvonau
Contributor

jvonau commented May 6, 2023

Set 'network_enabled: False' and manage the network on your own.
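
A minimal sketch of making that change the same way the cloud-init script above flips pbx_install with sed. It assumes /etc/iiab/local_vars.yml already contains a line starting with "network_enabled:" (not verified in this thread); the function name disable_iiab_network is hypothetical:

```shell
#!/bin/sh
# Filter: rewrite any "network_enabled:" line to False and pass
# every other line through unchanged.
disable_iiab_network() {
    sed 's/^network_enabled:.*/network_enabled: False/'
}

# Example on inline sample text (hypothetical file contents).
printf 'network_enabled: True\npbx_install: True\n' | disable_iiab_network
```

Written as a filter so it can be tried on a copy first, e.g. disable_iiab_network < /etc/iiab/local_vars.yml > /tmp/local_vars.yml, before moving the result into place.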

@holta holta changed the title "Restart the networking service" failing repeatedly (during IIAB install, and when ./iiab-network is run) [similar VM has a strange DNS-like freezing issue] [QA Automation / testing cloud-init Multipass script example] "Restart the networking service" failing repeatedly (during IIAB install, and when ./iiab-network is run) [similar Debian 11 Bullseye VM has a strange DNS-like freezing issue] [QA Automation / testing cloud-init Multipass script example] May 6, 2023
@jvonau
Contributor

jvonau commented May 6, 2023

Gather the information from a fresh VM without ever installing IIAB. Seems like '--bridged --cloud-init' assembled this config file:

DIRECTORY /etc/network/interfaces.d FILES WILL FOLLOW...IF THEY EXIST
-IIAB--------------------------------------------------------------------------
-rw-r--r-- 1 root root 432 May  5 18:14 /etc/network/interfaces.d/50-cloud-init
                        ...ITS LAST 100 LINES FOLLOW...

# This file is generated from information provided by the datasource.  Changes
# to it will not persist across an instance reboot.  To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
auto lo
iface lo inet loopback

auto default
iface default inet dhcp

auto extra0
iface extra0 inet dhcp
    metric 200

and the resulting network looked like:

=IIAB==========================================================================
COMMAND: /usr/sbin/ip route    # Routing table

default via 10.88.216.1 dev enp5s0
10.8.0.0/18 via 10.8.0.25 dev tun0
10.8.0.1 via 10.8.0.25 dev tun0
10.8.0.25 dev tun0 proto kernel scope link src 10.8.0.26
10.88.216.0/24 dev enp5s0 proto kernel scope link src 10.88.216.245
192.168.0.0/24 dev enp6s0 proto kernel scope link src 192.168.0.39

Seems like a bit of a bug/misconfiguration somewhere: the above has only ONE default route, while the other OS's would show 2 default routes with a different metric assigned to each. Without the second gateway present, that interface would not be added to 'exclude_devices' (which happens via 'second_gateway_found'), so it would become part of 'lan_list_result' and end up as a slave device under br0. I wonder why there is no mention of the devices enp5s0 or enp6s0 in the configuration file.
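
A quick way to check the "one default route vs. two" point on any VM, as a sketch only. The helper name count_default_routes is made up for illustration, and the sample text is the routing table from the diagnostics above:

```shell
#!/bin/sh
# Count the default routes in `ip route` output read from stdin.
count_default_routes() {
    awk '$1 == "default" { n++ } END { print n + 0 }'
}

# Routing table from the diagnostics above; this prints 1.
count_default_routes <<'EOF'
default via 10.88.216.1 dev enp5s0
10.8.0.0/18 via 10.8.0.25 dev tun0
10.8.0.1 via 10.8.0.25 dev tun0
10.8.0.25 dev tun0 proto kernel scope link src 10.8.0.26
10.88.216.0/24 dev enp5s0 proto kernel scope link src 10.88.216.245
192.168.0.0/24 dev enp6s0 proto kernel scope link src 192.168.0.39
EOF
```

On a live system: ip route | count_default_routes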

@holta
Member Author

holta commented May 6, 2023

Gather the information from a fresh VM without ever installing IIAB. Seems like '--bridged --cloud-init'

I might be wrong, but I'm assuming it's Multipass's --bridged flag that just does not play well with Debian 11 VM's.

So be it.

Still, I'll post the 2 things you mention, in case there's something to learn here:

  • /etc/network/interfaces.d/50-cloud-init
  • Output from: ip r

Any others?

@jvonau
Contributor

jvonau commented May 6, 2023

That would be a good start. Think I can gather enough info from ip r

@jvonau
Contributor

jvonau commented May 6, 2023

Think it's more of an issue within the pre-canned images not playing nice with more than one network interface.

@holta
Member Author

holta commented May 6, 2023

Think it's more of an issue within the pre-canned images not playing nice with more than one network interface.

Ok.

And interestingly, even with "1 single Ethernet" interface, the other Debian 11 (Multipass VM) consistently acted very strangely in a very different way:

  1. Severe slowness might be arising from packet loss or similar
  2. False allegations that DNS / name resolution was not working (while ping mit.edu always worked fine!)

@jvonau
Contributor

jvonau commented May 6, 2023

2. False allegations that DNS / name resolution was not working (while `ping mit.edu` always worked fine!)

Think that is a result of the image using the older ifupdown functions where the hostname change becomes effective upon reboot. With a reboot does sudo fail to resolve the hostname of the VM?
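
One common cause of "sudo: unable to resolve host" is that the machine's own hostname has no entry in /etc/hosts; whether that is what happened on this VM is not confirmed anywhere in this thread. A sketch for checking it, with a made-up helper name hosts_has_name:

```shell
#!/bin/sh
# Exit 0 if NAME appears as a hostname (any field after the address)
# on a non-comment line of hosts-format text read from stdin.
hosts_has_name() {
    awk -v name="$1" '
        $1 ~ /^#/ { next }                  # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # stop at an inline comment
                if ($i == name) found = 1
            }
        }
        END { exit !found }
    '
}

# Sample data with no entry for "box"; this prints "box is not listed".
if printf '127.0.0.1 localhost\n' | hosts_has_name box; then
    echo "box is listed"
else
    echo "box is not listed"
fi
```

On the VM itself this would be: hosts_has_name "$(hostname)" < /etc/hosts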

@holta
Member Author

holta commented May 6, 2023

"OS ONLY" Debian 11 VMs with 2 Ethernet interfaces are such a mess, they appear to barely reboot. But on first boot (all you get?) it looks like this:

ubuntu@deb11d:~$ sudo su -
root@deb11d:~# cat /etc/network/interfaces.d/50-cloud-init
# This file is generated from information provided by the datasource.  Changes
# to it will not persist across an instance reboot.  To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
auto lo
iface lo inet loopback

auto default
iface default inet dhcp

auto extra0
iface extra0 inet dhcp
    metric 200
root@deb11d:~# ip r
default via 10.181.233.1 dev enp5s0
10.181.233.0/24 dev enp5s0 proto kernel scope link src 10.181.233.14
192.168.0.0/24 dev enp6s0 proto kernel scope link src 192.168.0.169
root@deb11d:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 52:54:00:a8:b2:bc brd ff:ff:ff:ff:ff:ff
    inet 10.181.233.14/24 brd 10.181.233.255 scope global dynamic enp5s0
       valid_lft 3460sec preferred_lft 3460sec
    inet6 fd42:fb6b:9c93:d2f1:5054:ff:fea8:b2bc/64 scope global dynamic mngtmpaddr
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fea8:b2bc/64 scope link
       valid_lft forever preferred_lft forever
3: enp6s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 52:54:00:a1:5f:4f brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.169/24 brd 192.168.0.255 scope global dynamic enp6s0
       valid_lft 7081sec preferred_lft 7081sec
    inet6 fe80::5054:ff:fea1:5f4f/64 scope link
       valid_lft forever preferred_lft forever
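
The stanzas in 50-cloud-init are named default and extra0, while the kernel devices are enp5s0 and enp6s0. That mismatch may be why ifup reports "failed to bring up extra0" / "default" on later boots, though this thread does not confirm it. A sketch that lists stanza names with no matching device (both function names are hypothetical):

```shell
#!/bin/sh
# List the names declared by "iface NAME ..." stanzas in
# ifupdown-format text read from stdin.
declared_ifaces() {
    awk '$1 == "iface" { print $2 }'
}

# Print declared names that are not among the device names given
# as arguments.  usage: unmatched_ifaces DEV... < interfaces-file
unmatched_ifaces() {
    declared_ifaces | while read -r name; do
        found=no
        for dev in "$@"; do
            if [ "$name" = "$dev" ]; then found=yes; fi
        done
        if [ "$found" = no ]; then echo "$name"; fi
    done
}

# Stanzas and device names taken from the paste above;
# this prints "default" and "extra0".
unmatched_ifaces lo enp5s0 enp6s0 <<'EOF'
auto lo
iface lo inet loopback

auto default
iface default inet dhcp

auto extra0
iface extra0 inet dhcp
    metric 200
EOF
```

On a live VM the device list could come from /sys/class/net, e.g. unmatched_ifaces $(ls /sys/class/net) < /etc/network/interfaces.d/50-cloud-init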

@holta
Member Author

holta commented May 6, 2023

"OS ONLY" Debian 11 VM with 1 single Ethernet interface:

root@deb11c:~# cat /etc/network/interfaces.d/50-cloud-init
# This file is generated from information provided by the datasource.  Changes
# to it will not persist across an instance reboot.  To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
auto lo
iface lo inet loopback

auto enp5s0
iface enp5s0 inet dhcp
root@deb11c:~# ip r
default via 10.181.233.1 dev enp5s0
10.181.233.0/24 dev enp5s0 proto kernel scope link src 10.181.233.67
root@deb11c:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 52:54:00:9d:11:a2 brd ff:ff:ff:ff:ff:ff
    inet 10.181.233.67/24 brd 10.181.233.255 scope global dynamic enp5s0
       valid_lft 3507sec preferred_lft 3507sec
    inet6 fd42:fb6b:9c93:d2f1:5054:ff:fe9d:11a2/64 scope global dynamic mngtmpaddr
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe9d:11a2/64 scope link
       valid_lft forever preferred_lft forever

@holta
Member Author

holta commented May 6, 2023

2. False allegations that DNS / name resolution was not working (while `ping mit.edu` always worked fine!)

Think that is a result of the image using the older ifupdown functions where the hostname change becomes effective upon reboot. With a reboot does sudo fail to resolve the hostname of the VM?

Apologies, that original single-Ethernet-interface VM (--bridged flag was not used) is now deleted.

Just FYI/FWIW it had contained IIAB with roles/pbx successfully installed.

Just FYI/FWIW such minimal Multipass VMs with 1 single Ethernet interface have all rebooted and generally (appeared to) work fine prior to installation of IIAB.

@holta
Member Author

holta commented May 7, 2023

"OS ONLY" Debian 11 VMs with 2 Ethernet interfaces are such a mess, they appear to barely reboot

The irony is that Debian 11 (with 2 Ethernet interfaces) works well (as a Multipass VM) until it's rebooted 🤔
So the question is whether the VM's networking might possibly be patched up a bit?
I don't know.
I gave it a shot, trying a few different things here:

  1. I deleted /etc/network/interfaces.d/50-cloud-init which wasn't enough to make the Multipass VM usable after reboot.

  2. I also tried rebooting a Debian 11 VM (likewise with 2 interfaces) after trying the suggestion in /etc/network/interfaces.d/50-cloud-init, which says you can disable cloud-init's network configuration capabilities as follows:

    root@deb11:~# cat > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
    network: {config: disabled}
    
  3. Finally, I also tried apt purge cloud-init then rebooting the VM.

  4. I also tried a VM including all 3 of the above changes. All these "OS ONLY" VM's fail on reboot — you can no longer log in — the usual 2 methods fail as seen here:

    # multipass shell deb11
    shell failed: ssh connection failed: 'No route to host'
    
    root@126-u2204-desk:~# ssh [email protected]
    ssh: connect to host 10.181.233.150 port 22: No route to host
    

    Likewise the VM may have rebooted (it says it's running below!?) but not in a usable form — even the VM's basic system diagnostics (multipass info ...) are not available:

    # multipass list
    Name                    State             IPv4             Image
    deb11                   Running           10.181.233.150    Not Available
    
    # multipass info deb11
    info failed: ssh connection failed: 'No route to host'
    

    So I deleted each of the above broken VMs, as follows:

    lxc delete deb11 --project=multipass --force
    snap restart multipass
    

    (And recreated a new VM each time, using...)

    multipass launch --bridged -n deb11 https://cloud.debian.org/images/cloud/bullseye/latest/debian-11-generic-amd64.qcow2
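
For repeating step 2 above on each fresh VM without typing into cat, here is a non-interactive sketch. As noted, this change alone did not make the VM usable after reboot. The function name write_disable_network_cfg is hypothetical; the file name and content are the ones quoted in 50-cloud-init:

```shell
#!/bin/sh
# Write the drop-in that tells cloud-init to stop managing network
# config.  DIR defaults to the real location (needs root); pass
# another directory to try it out safely.
write_disable_network_cfg() {
    dir=${1:-/etc/cloud/cloud.cfg.d}
    printf 'network: {config: disabled}\n' > "$dir/99-disable-network-config.cfg"
}

# Demo against a scratch directory so nothing under /etc is touched.
scratch=$(mktemp -d)
write_disable_network_cfg "$scratch"
cat "$scratch/99-disable-network-config.cfg"
rm -r "$scratch"
```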

RECAP:

  • It's not looking good.
  • Any other potentially plausible approaches? (To patching Debian 11 VM's networking, allowing VM to at least reboot!)
  • Should any of the cloud-init files below perhaps be modified or stripped out?
root@deb11:/etc/cloud# tree
.
├── cloud.cfg
├── cloud.cfg.d
│   ├── 00_debian.cfg
│   ├── 01_debian_cloud.cfg
│   ├── 05_logging.cfg
│   └── README
└── templates
    ├── chef_client.rb.tmpl
    ├── chrony.conf.alpine.tmpl
    ├── chrony.conf.debian.tmpl
    ├── chrony.conf.fedora.tmpl
    ├── chrony.conf.opensuse.tmpl
    ├── chrony.conf.rhel.tmpl
    ├── chrony.conf.sles.tmpl
    ├── chrony.conf.ubuntu.tmpl
    ├── hosts.alpine.tmpl
    ├── hosts.debian.tmpl
    ├── hosts.freebsd.tmpl
    ├── hosts.redhat.tmpl
    ├── hosts.suse.tmpl
    ├── ntp.conf.alpine.tmpl
    ├── ntp.conf.debian.tmpl
    ├── ntp.conf.fedora.tmpl
    ├── ntp.conf.opensuse.tmpl
    ├── ntp.conf.rhel.tmpl
    ├── ntp.conf.sles.tmpl
    ├── ntp.conf.ubuntu.tmpl
    ├── resolv.conf.tmpl
    ├── sources.list.debian.tmpl
    ├── sources.list.ubuntu.tmpl
    └── timesyncd.conf.tmpl

@holta
Member Author

holta commented May 15, 2023

We can tag this issue as "wontfix" if it's too much trouble.

Debian 11 networking is probably patchable after its first boot, to make it work with Multipass & FreePBX etc.

But if the effort is too complicated in the end, let's drop it.

@jvonau
Contributor

jvonau commented Dec 9, 2023

#3672

@holta holta modified the milestones: 8.1, 8.2 Jan 1, 2024