Preserve kernel-assigned IPv6 link-local addresses on a bridge network's bridge #47787

robmry · 2024-05-01T17:18:43Z

- What I did

Before release 25.0.0, an IPv6-enabled bridge network's bridge was always assigned address fe80::1 as well as an IPAM-assigned address from fixed-cidr-v6 or the network's --subnet/--ip-range. The daemon then deleted any other routable addresses. When the bridge came up because a container was connected to it, the kernel also assigned it a link-local address in fe80::/64.

In 25.0.0, #46850 tried to reconcile the addresses expected on a bridge network's bridge with those actually present on the bridge, before deciding which addresses to add or remove. That solved a problem where a change to fixed-cidr-v6 prevented the daemon from starting if the new and old subnets overlapped. But it was more aggressive about removing addresses, and would remove the kernel-assigned link-local address. In most circumstances fe80::1 seems to be used for NDP etc. instead, but it could cause problems, as discussed in a Slack thread (repro steps from that chat are copied into a comment below).

In 26.1.1, #47771 added env var DOCKER_BRIDGE_PRESERVE_KERNEL_LL which, if set to 1, prevented the daemon from adding the fe80::1 address and from deleting any address in fe80::/64. That preserves both the kernel-assigned link-local address and fe80::1 (fe80::1 has to be kept because, if an earlier version of the daemon had already replaced the kernel-assigned link-local address, the bridge wouldn't get a new one). That solved the problem discussed in the Slack thread.

This change makes the daemon use the 26.1.1 env-var-controlled behaviour in all cases: it removes the env var, never tries to assign fe80::1, and doesn't remove addresses in fe80::/64.

Please note that not assigning fe80::1 is new in this release (apart from the optional behaviour in 26.1.1); all previous releases assigned that address. I don't think there's any reason to add that extra link-local address: the kernel will always set one up, nothing set up by the daemon uses it, and it's not documented.

Closes #47778

In separate commits, this PR also:

- Improves error reporting for invalid subnets.
- Adds a check that fixed-cidr-v6 / --subnet is not a multicast address.
- Prevents the daemon from removing multicast addresses from a bridge (because, if there's one there, it was added by the user so it's not the daemon's to delete).
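The two subnet checks above can be sketched with Go's net/netip package. This is a minimal illustration under assumptions: validateBridgeSubnetV6 is a hypothetical name, the error wording is invented, and rejecting any subnet that overlaps ff00::/8 is just one way to express "not a multicast address". The daemon's actual checks may differ.

```go
package main

import (
	"fmt"
	"net/netip"
)

// ipv6Multicast is ff00::/8, the IPv6 multicast range.
var ipv6Multicast = netip.MustParsePrefix("ff00::/8")

// validateBridgeSubnetV6 parses an IPv6 subnet supplied as fixed-cidr-v6 or
// --subnet and rejects values that can't be used for a bridge network.
// Hypothetical helper: it illustrates the checks, not the daemon's code.
func validateBridgeSubnetV6(s string) (netip.Prefix, error) {
	p, err := netip.ParsePrefix(s)
	if err != nil {
		// Wrap the parse error so the message names the bad value.
		return netip.Prefix{}, fmt.Errorf("invalid IPv6 subnet %q: %w", s, err)
	}
	if !p.Addr().Is6() || p.Addr().Is4In6() {
		return netip.Prefix{}, fmt.Errorf("subnet %q is not an IPv6 subnet", s)
	}
	if p.Overlaps(ipv6Multicast) {
		return netip.Prefix{}, fmt.Errorf("subnet %q overlaps the multicast range %s", s, ipv6Multicast)
	}
	return p, nil
}

func main() {
	for _, s := range []string{"fddd::/96", "ff02::/16", "10.0.0.0/8", "fddd::2:0/300"} {
		if _, err := validateBridgeSubnetV6(s); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("ok:", s)
		}
	}
}
```

Wrapping the parse error with %w keeps the underlying netip message while adding the offending value, which is the kind of improvement "better error reporting for invalid subnets" suggests.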

- How I did it

Simplified the code a little: now that fe80::1 isn't added, there's only one IPv6 address to add to the bridge (the IPAM-assigned address), so there's no longer any need for a map of wanted addresses.

- How to verify it

Updated/added tests.

- Description for the changelog

For IPv6-enabled bridge networks, do not attempt to replace the bridge's kernel-assigned link-local address with `fe80::1`.

Make the behaviour enabled by env var DOCKER_BRIDGE_PRESERVE_KERNEL_LL the default:
- don't remove kernel-assigned link-local addresses, or any address in fe80::/64
- don't assign fe80::1 to a bridge

Signed-off-by: Rob Murray <[email protected]>

thaJeztah · 2024-05-07T11:53:19Z

Quick note: the Docker Community Slack is on a free plan, which means that Slack conversations will eventually disappear. Not sure if there's anything relevant in the Slack thread that must be preserved, but if there is, it's worth either copying (or screenshotting) the relevant discussion.

robmry · 2024-05-07T12:28:16Z

Oh, thank you, I hadn't appreciated that. I think it's mostly here, apart from Vinod's repro steps:

vinod kumar 14 days ago

I believe I found the problem. We create dockerbr using manual configuration, then use an old-style network-script to bring the interface up and down, and also to persist the configuration across restarts. Here is the script:
# Tue Apr 23 05:27:59 2024

DEVICE=dockerbr
TYPE=Bridge
IPADDR=172.16.2.1
NETMASK=255.255.0.0
MTU=1450
ONBOOT=yes
BOOTPROTO=none
NM_CONTROLLED=no
DELAY=0
NOZEROCONF=yes
IPV6INIT=yes
IPV6ADDR=fddd::2:0/96
IPV6_DEFAULTGW=::dockerbr
Here we assign both an IPv6 and an IPv4 address.
When Docker comes up, it reassigns the IPv6 address based on the assigned CIDR. Here is the command line used to bring up dockerd:

/usr/bin/dockerd --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/bin/docker-proxy -H unix:///var/run/docker.sock -H tcp://172.16.2.1:2375 --data-root /opt/ciena/data/docker --label host_id=2 --fixed-cidr=172.16.2.0/24 --bridge dockerbr4 --mtu 1450 --ipv6 --fixed-cidr-v6=fddd::2:0/112

I guess the two configurations conflict and break IPv6 connectivity when the interface is brought down and up at a later stage.
Note: I don't know whether this is right or wrong, but it used to work before the 25.x release.

vinod kumar 12 days ago

Hi Rob, here are the steps to recreate the issue. Please note that we use vxilan (a VXLAN interface) to connect to the adjacent hosts' Docker network.
systemctl stop docker
sudo route -6 del fddd::/96 dev dockerbr
ifconfig dockerbr down
brctl delbr dockerbr
rm -f /opt/ciena/data/docker/docker/network/files/local-kv.db

# create everything from scratch

bridge fdb del to 00:00:00:00:00:00 dst 10.182.39.162 dev vxilan
bridge fdb del to 00:00:00:00:00:00 dst 10.182.39.160 dev vxilan
ifconfig vxilan down
ip link del vxilan

sudo ip link add vxilan type vxlan id 42 dev ens192 dstport 0
sudo brctl addbr dockerbr
sudo brctl addif dockerbr vxilan || sudo brctl show dockerbr | grep vxilan
sudo ifconfig vxilan 0 up
sudo ifconfig dockerbr 172.16.2.1/16 up
sudo ifconfig vxilan mtu 1450
sudo bridge fdb append to 00:00:00:00:00:00 dst 10.182.39.162 dev vxilan
sudo bridge fdb append to 00:00:00:00:00:00 dst 10.182.39.160 dev vxilan
sudo route -6 add fddd::/96 dev dockerbr
systemctl start docker
docker network create --driver bridge --subnet 172.16.0.0/16 --gateway 172.16.2.1 --ip-range 172.16.2.0/24 --ipv6 --subnet fddd::/96 --gateway fddd::2:1 --ip-range fddd::2:0/112 --label bp={} -o "com.docker.network.bridge.name"="dockerbr" -o "com.docker.network.driver.mtu"="1450" bp-bridge

# now start a container and run an IPv6 ping; choose an IPv6 address that is pingable from the host.

docker run --rm --net bp-bridge -it alpine sh -c "ping6 2620:11b:d06c:f801:20c:29ff:fe12:cab0"

# command for running dockerd

/usr/bin/dockerd --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/bin/docker-proxy -H unix:///var/run/docker.sock -H tcp://172.16.0.1:2375 --data-root /opt/ciena/data/docker --label host_id=2 
ping6 stops after a while.

robmry added 4 commits May 1, 2024 17:20

Refactor IPv6 subnet validation

aa3a86c

- Remove package variable bridge.bridgeIPv6
- Use netip in more places
- Improve error messages from fixed-cidr-v6 checks

Signed-off-by: Rob Murray <[email protected]>

Disallow IPv6 multicast as bridge n/w subnet

a5f82ba

Signed-off-by: Rob Murray <[email protected]>

Don't delete IPv6 multicast addresses from a bridge

b11e95f

Multicast addresses aren't added by the daemon so, if they're present, it's because they were explicitly added, possibly to a user-managed bridge. So, don't remove them.

Signed-off-by: Rob Murray <[email protected]>

robmry self-assigned this May 1, 2024

robmry added the status/1-design-review, status/2-code-review, area/networking, impact/changelog, kind/bugfix and area/networking/ipv6 labels May 1, 2024

robmry added this to the 27.0.0 milestone May 1, 2024

robmry requested review from corhere and akerouanton May 1, 2024 18:29

robmry changed the title from "47778 preserve kernel ll addrs" to "Preserve kernel-assigned IPv6 link-local addresses on a bridge network's bridge" May 1, 2024

robmry marked this pull request as ready for review May 1, 2024 18:31

akerouanton reviewed May 7, 2024

integration/networking/bridge_test.go (resolved)

akerouanton approved these changes May 7, 2024

corhere approved these changes May 8, 2024

akerouanton merged commit 75821a7 into moby:master May 10, 2024
153 checks passed

robmry deleted the 47778_preserve_kernel_ll_addrs branch May 15, 2024 09:12