Preserve kernel-assigned IPv6 link-local addresses on a bridge network's bridge #47787

Merged: 4 commits into moby:master on May 10, 2024

@robmry robmry commented May 1, 2024

- What I did

Before release 25.0.0, the daemon always assigned address fe80::1 to an IPv6-enabled bridge network's bridge, as well as an IPAM-assigned address from fixed-cidr-v6 or the network's --subnet/--ip-range. It then deleted any other routable addresses. When the bridge came up because a container got linked to it, the kernel assigned a link-local address in fe80::/64.

In 25.0.0, #46850 tried to reconcile the addresses expected on a bridge network's bridge with those actually on the bridge, before deciding which addresses to add/remove. That solved a problem where changes in fixed-cidr-v6 prevented the daemon from starting if the new and old subnets overlapped. But it was more aggressive about removing addresses, and would remove the kernel-assigned link-local address. In most circumstances, fe80::1 seems to be used for NDP etc. - but removing the kernel-assigned address could cause problems, as discussed in this Slack thread (repro steps from that chat are copied into a comment below).

In 26.1.1, #47771 added env var DOCKER_BRIDGE_PRESERVE_KERNEL_LL which, if set to 1, prevented the daemon from adding the fe80::1 address and from deleting any address in fe80::/64. That preserved the kernel-assigned link-local address, and also fe80::1 (because if an earlier version of the daemon had replaced the kernel-ll address, the bridge wouldn't get a new one). That solved the problem discussed in the Slack thread.

This change updates the daemon to use the 26.1.1 env-var controlled behaviour in all cases: it removes the env-var, never tries to assign fe80::1, and doesn't remove addresses in fe80::/64.

Please note that not assigning fe80::1 is new in this release (apart from the optional behaviour in 26.1.1). All previous releases would assign that address. I don't think there's any reason to add that extra LL address, as the kernel will always set one up. Nothing set up by the daemon uses it, and it's not documented.

Closes #47778

In separate commits, this PR also:

  • Improves error reporting for invalid subnets.
  • Adds a check that fixed-cidr-v6 / --subnet is not a multicast address.
  • Prevents the daemon from removing multicast addresses from a bridge (because, if there's one there, it was added by the user so it's not the daemon's to delete).
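
A minimal sketch of the kind of subnet check described in the list above, using `net/netip`. The function name `validateSubnetV6` and the error wording are assumptions; the real checks and messages in the daemon may differ.

```go
package main

import (
	"fmt"
	"net/netip"
)

// validateSubnetV6 parses a fixed-cidr-v6 / --subnet style value and rejects
// values that are not IPv6 prefixes or that are multicast.
// Hypothetical helper for illustration only.
func validateSubnetV6(s string) (netip.Prefix, error) {
	p, err := netip.ParsePrefix(s)
	if err != nil {
		return netip.Prefix{}, fmt.Errorf("invalid subnet %q: %w", s, err)
	}
	if !p.Addr().Is6() {
		return netip.Prefix{}, fmt.Errorf("invalid subnet %q: not an IPv6 prefix", s)
	}
	if p.Addr().IsMulticast() {
		return netip.Prefix{}, fmt.Errorf("invalid subnet %q: multicast address", s)
	}
	return p, nil
}

func main() {
	for _, s := range []string{"fddd::2:0/112", "ff02::/16", "not-a-subnet"} {
		if _, err := validateSubnetV6(s); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", s)
	}
}
```

The same `Addr().IsMulticast()` test could be used when deciding which existing bridge addresses to leave untouched.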

- How I did it

Simplified the code a little. Now that fe80::1 isn't added, there's only one IPv6 address to add to the bridge (the IPAM-assigned address), so there's no longer any need for a map of wanted addresses.

- How to verify it

Updated/added tests.

- Description for the changelog

For IPv6-enabled bridge networks, do not attempt to replace the bridge's kernel-assigned link local address with `fe80::1`.

robmry added 4 commits May 1, 2024 17:20
Make the behaviour enabled by env var DOCKER_BRIDGE_PRESERVE_KERNEL_LL
the default...
- don't remove kernel assigned link-local addresses
  - or any address in fe80::/64
- don't assign fe80::1 to a bridge

Signed-off-by: Rob Murray <[email protected]>
- Remove package variable bridge.bridgeIPv6
- Use netip in more places
- Improve error messages from fixed-cidr-v6 checks

Signed-off-by: Rob Murray <[email protected]>
Multicast addresses aren't added by the daemon so, if they're present,
it's because they were explicitly added - possibly to a user-managed
bridge. So, don't remove.

Signed-off-by: Rob Murray <[email protected]>
@robmry robmry self-assigned this May 1, 2024
@robmry robmry added this to the 27.0.0 milestone May 1, 2024
@robmry robmry requested review from corhere and akerouanton May 1, 2024 18:29
@robmry robmry changed the title 47778 preserve kernel ll addrs Preserve kernel-assigned IPv6 link-local addresses on a bridge network's bridge May 1, 2024
@robmry robmry marked this pull request as ready for review May 1, 2024 18:31
@thaJeztah

Quick note: the Docker Community Slack is on a free plan, which means that Slack conversations will eventually disappear. Not sure if there's anything relevant in the Slack thread that must be preserved, but if there is, it's worth copying (or screenshotting) the relevant discussion.

robmry commented May 7, 2024

Oh, thank you - hadn't appreciated that. I think it's mostly here, apart from Vinod's repro steps ...

vinod kumar 14 days ago

I believe I found the problem. We create dockerbr using manual configuration, then use an old-style network-script to bring the interface up and down, and also to persist the information across restarts. Here is the script.

# Tue Apr 23 05:27:59 2024

DEVICE=dockerbr
TYPE=Bridge
IPADDR=172.16.2.1
NETMASK=255.255.0.0
MTU=1450
ONBOOT=yes
BOOTPROTO=none
NM_CONTROLLED=no
DELAY=0
NOZEROCONF=yes
IPV6INIT=yes
IPV6ADDR=fddd::2:0/96
IPV6_DEFAULTGW=::dockerbr

Here we assign both IPv6 and IPv4 addresses.
When Docker comes up, it reassigns the IPv6 address based on the assigned CIDR. Here is the command line used to bring up dockerd.

/usr/bin/dockerd --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/bin/docker-proxy -H unix:///var/run/docker.sock -H tcp://172.16.2.1:2375 --data-root /opt/ciena/data/docker --label host_id=2 --fixed-cidr=172.16.2.0/24 --bridge dockerbr4 --mtu 1450 --ipv6 --fixed-cidr-v6=fddd::2:0/112

I guess the two are conflicting and breaking IPv6 connectivity when the interface is brought down and up at a later stage.
Note: I don't know if it's wrong or right, but this used to work before the 25.x release.
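
For reference, the two prefixes in the message above (IPV6ADDR=fddd::2:0/96 in the network-script and --fixed-cidr-v6=fddd::2:0/112 on the dockerd command line) can be compared with `net/netip`. This is only a sketch with a hypothetical helper name, `prefixesOverlap`. It shows whether the prefixes share addresses and says nothing on its own about the cause of the connectivity loss, which the PR description attributes to removal of the kernel-assigned link-local address.

```go
package main

import (
	"fmt"
	"net/netip"
)

// prefixesOverlap reports whether two CIDR strings share any addresses.
// Hypothetical helper for illustration only.
func prefixesOverlap(a, b string) (bool, error) {
	pa, err := netip.ParsePrefix(a)
	if err != nil {
		return false, fmt.Errorf("parsing %q: %w", a, err)
	}
	pb, err := netip.ParsePrefix(b)
	if err != nil {
		return false, fmt.Errorf("parsing %q: %w", b, err)
	}
	// Masked() zeroes the host bits so both values are compared as networks.
	return pa.Masked().Overlaps(pb.Masked()), nil
}

func main() {
	overlap, err := prefixesOverlap("fddd::2:0/96", "fddd::2:0/112")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("overlap:", overlap)
}
```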

vinod kumar 12 days ago

Hi Rob, here are the steps to recreate the issue. Please note that we use vxilan to connect with the adjacent hosts' Docker network.

systemctl stop docker
sudo route -6 del fddd::/96 dev dockerbr
ifconfig dockerbr down
brctl delbr dockerbr
rm -f /opt/ciena/data/docker/docker/network/files/local-kv.db

# create everything from scratch

bridge fdb del to 00:00:00:00:00:00 dst 10.182.39.162 dev vxilan
bridge fdb del to 00:00:00:00:00:00 dst 10.182.39.160 dev vxilan
ifconfig vxilan down
ip link del vxilan

sudo ip link add vxilan type vxlan id 42 dev ens192 dstport 0
sudo brctl addbr dockerbr
sudo brctl addif dockerbr vxilan || sudo brctl show dockerbr | grep vxilan
sudo ifconfig vxilan 0 up
sudo ifconfig dockerbr 172.16.2.1/16 up
sudo ifconfig vxilan mtu 1450
sudo bridge fdb append to 00:00:00:00:00:00 dst 10.182.39.162 dev vxilan
sudo bridge fdb append to 00:00:00:00:00:00 dst 10.182.39.160 dev vxilan
sudo route -6 add fddd::/96 dev dockerbr
systemctl start docker
docker network create --driver bridge --subnet 172.16.0.0/16 --gateway 172.16.2.1 --ip-range 172.16.2.0/24 --ipv6 --subnet fddd::/96 --gateway fddd::2:1 --ip-range fddd::2:0/112 --label bp={} -o "com.docker.network.bridge.name"="dockerbr" -o "com.docker.network.driver.mtu"="1450" bp-bridge

# now start a container and start an IPv6 ping; choose an IPv6 address that is pingable from the host.

docker run --rm  --net bp-bridge -it alpine  sh -c "ping6 2620:11b:d06c:f801:20c:29ff:fe12:cab0"

# command for running dockerd

/usr/bin/dockerd --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/bin/docker-proxy -H unix:///var/run/docker.sock -H tcp://172.16.0.1:2375 --data-root /opt/ciena/data/docker --label host_id=2 

ping6 will stop after some duration.

@akerouanton akerouanton merged commit 75821a7 into moby:master May 10, 2024
153 checks passed
@robmry robmry deleted the 47778_preserve_kernel_ll_addrs branch May 15, 2024 09:12
Development

Successfully merging this pull request may close these issues.

Review deletion of kernel-ll addresses on bridges
4 participants