Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bgpd: fix dynamic peer graceful restart race condition #16020

Merged
merged 1 commit into from
May 17, 2024

Conversation

louis-6wind
Copy link
Contributor

bgp_llgr topotest sometimes fails at step 8:

topo: STEP 8: 'Check if we can see 172.16.1.2/32 after R4 (dynamic peer) was killed'

R4 neighbor is deleted on R2 because it fails to re-connect:

14:33:40.128048 BGP: [HKWM3-ZC5QP] 192.168.3.1 fd -1 went from Established to Clearing
14:33:40.128154 BGP: [MJ1TJ-HEE3V] 192.168.3.1(r4) graceful restart timer expired
14:33:40.128158 BGP: [ZTA2J-YRKGY] 192.168.3.1(r4) graceful restart stalepath timer stopped
14:33:40.128162 BGP: [H917J-25EWN] 192.168.3.1(r4) Long-lived stale timer (IPv4 Unicast) started for 20 sec
14:33:40.128168 BGP: [H5X66-NXP9S] 192.168.3.1(r4) Long-lived set stale community (LLGR_STALE) for: 172.16.1.2/32
14:33:40.128220 BGP: [H5X66-NXP9S] 192.168.3.1(r4) Long-lived set stale community (LLGR_STALE) for: 192.168.3.0/24
[...]
14:33:41.138869 BGP: [RGGAC-RJ6WG] 192.168.3.1 [Event] Connect failed 111(Connection refused)
14:33:41.138906 BGP: [ZWCSR-M7FG9] 192.168.3.1 [FSM] TCP_connection_open_failed (Connect->Active), fd 23
14:33:41.138912 BGP: [JA9RP-HSD1K] 192.168.3.1 (dynamic neighbor) deleted (bgp_connect_fail)
14:33:41.139126 BGP: [P98A2-2RDFE] 192.168.3.1(r4) graceful restart stalepath timer stopped

af8496a ("bgpd: Do not delete BGP dynamic peers if graceful restart kicks in") forgot to modify bgp_connect_fail()

Do not delete the peer in bgp_connect_fail() if Non-Stop-Forwarding is in progress.

Fixes: af8496a ("bgpd: Do not delete BGP dynamic peers if graceful restart kicks in")

bgp_llgr topotest sometimes fails at step 8:

> topo: STEP 8: 'Check if we can see 172.16.1.2/32 after R4 (dynamic peer) was killed'

R4 neighbor is deleted on R2 because it fails to re-connect:

> 14:33:40.128048 BGP: [HKWM3-ZC5QP] 192.168.3.1 fd -1 went from Established to Clearing
> 14:33:40.128154 BGP: [MJ1TJ-HEE3V] 192.168.3.1(r4) graceful restart timer expired
> 14:33:40.128158 BGP: [ZTA2J-YRKGY] 192.168.3.1(r4) graceful restart stalepath timer stopped
> 14:33:40.128162 BGP: [H917J-25EWN] 192.168.3.1(r4) Long-lived stale timer (IPv4 Unicast) started for 20 sec
> 14:33:40.128168 BGP: [H5X66-NXP9S] 192.168.3.1(r4) Long-lived set stale community (LLGR_STALE) for: 172.16.1.2/32
> 14:33:40.128220 BGP: [H5X66-NXP9S] 192.168.3.1(r4) Long-lived set stale community (LLGR_STALE) for: 192.168.3.0/24
> [...]
> 14:33:41.138869 BGP: [RGGAC-RJ6WG] 192.168.3.1 [Event] Connect failed 111(Connection refused)
> 14:33:41.138906 BGP: [ZWCSR-M7FG9] 192.168.3.1 [FSM] TCP_connection_open_failed (Connect->Active), fd 23
> 14:33:41.138912 BGP: [JA9RP-HSD1K] 192.168.3.1 (dynamic neighbor) deleted (bgp_connect_fail)
> 14:33:41.139126 BGP: [P98A2-2RDFE] 192.168.3.1(r4) graceful restart stalepath timer stopped

af8496a ("bgpd: Do not delete BGP dynamic peers if graceful restart
kicks in") forgot to modify bgp_connect_fail()

Do not delete the peer in bgp_connect_fail() if Non-Stop-Forwarding is
in progress.

Fixes: af8496a ("bgpd: Do not delete BGP dynamic peers if graceful restart kicks in")
Signed-off-by: Louis Scalbert <[email protected]>
@ton31337
Copy link
Member

@Mergifyio backport stable/10.0 stable/9.1 stable/9.0

Copy link

mergify bot commented May 16, 2024

backport stable/10.0 stable/9.1 stable/9.0

✅ Backports have been created

@ton31337 ton31337 merged commit 5dcb188 into FRRouting:master May 17, 2024
13 checks passed
donaldsharp added a commit that referenced this pull request May 17, 2024
bgpd: fix dynamic peer graceful restart race condition (backport #16020)
donaldsharp added a commit that referenced this pull request May 17, 2024
bgpd: fix dynamic peer graceful restart race condition (backport #16020)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants