Windows Server 2022 failover or switchover with duplicate IP address #202

thaala · 2023-11-10T07:59:30Z


Failovers and switchovers using Patroni's etcdctl fail about 80% of the time because of a duplicate IP address on the network.
Windows (in our case: three Windows Server 2022 nodes, two of which run Postgres) does not reactivate or re-test the additional IP address once such a conflict has been detected.

In the output of ipconfig /all, the IPv4 address normally shows the state (Preferred). When this error occurs, it shows (Duplicate) instead.

The failover stays stuck until the address is removed with a PowerShell command. Once the address has been removed, vip-manager adds it again and the failover succeeds.

We have found a workaround for now: a permanently running task on both Postgres servers checks the interface every 5 seconds for an address in the (Duplicate) state and, if it finds one, removes that address.

PowerShell:
Remove-NetIPAddress -Confirm:$false -InterfaceAlias yourinterfacename -AddressState Duplicate

A better approach might be to run this verification inside the vip-manager service shortly after adding the IP to the interface, instead of every 5 seconds from a separate task. If the address ends up in the Duplicate state, it can be removed and added again until its state is (Preferred), or until a maximum number of attempts is reached and the operation fails permanently.

Thank you for this software.
BR Thilo

MarkIITech · 2023-11-28T19:12:21Z


We have been running into issues ever since we updated Python from 3.9 to 3.11+. vip-manager seems to move too quickly, and the servers detect duplicate IP addresses during an administrative switchover. We've found that restarting the vip-manager service on the destination leader resolves the contention. The problem is intermittent, but it completely undermines the high availability of the etcd/PostgreSQL cluster, since we can't know whether a switchover will succeed during a real host failure. We've disabled Windows' built-in duplicate IP detection on both cluster nodes to see whether that would help. It improved things slightly (Windows stopped showing pop-ups about duplicate IP addresses), but the switchover still fails 80+% of the time. Any thoughts on how to improve this setup, or on replacing vip-manager with another solution?

0 replies

TakashiArakane · 2023-12-05T12:11:49Z


Would disabling gratuitous ARP in the registry help? (The pros and cons of disabling it would need to be weighed separately.)

Registry key: HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
Value name: ArpRetryCount
Value type: REG_DWORD
Data: 0x0

In a Windows Server 2022 environment, the same duplicate-detection problem was improved by the setting above.

1 reply

MarkIITech Jan 17, 2024

We set ArpRetryCount to 0 on our two nodes, which eliminated the Windows pop-up notification about a duplicate IP address. We still see seemingly random failures of vip-manager to switch over; restarting vip-manager on the destination leader corrects the problem. We monitor by pinging the virtual IP address: when more than one packet is dropped, we know the switchover has failed. I'd be happy to provide more information if it helps resolve this bug. Regards-

thaala (Author) · 2024-01-18T07:58:25Z

Hello,

Suppressing the alarm is not the right way to solve this problem, because I think there really are duplicate IPs on the network. In addition, the Ethernet switch must learn that the IP is no longer active on that specific port, which may take some time.

I think it would be best to move to a three-state switchover instead of a hard on/off procedure. The trigger key in the parameter database (etcd3) should also support an "intermediate" value that is held for a short while. This gives every vip-manager node a chance to remove the virtual IP. When the trigger key is then changed, after a short timeout, to a real node ID, the network should be ready to accept the IP without problems.

BR Thilo

6 replies

MarkIITech Jan 24, 2024

Just to confirm: it looks like last week's update should have corrected the timing issues we were facing, right?
Thank you in advance-

pashagolub Jan 25, 2024
Maintainer

We hope so, yes

thaala May 29, 2024
Author

@pashagolub: Do you plan to update the vip-manager code to check for duplicate IP addresses? The Windows PowerShell workaround seems to work well, but I think doing the same procedure inside your code would be an improvement. I see that you're using iphlpapi.lib, but I don't know how to query the state of an interface to find out whether the IP is a duplicate.
If you could implement this check, remove the address again when the problem occurs, and retry after a short delay, the IP address could then be added successfully. I am not sure whether this is a Windows-only problem, but I don't think so.

BTW, many thanks for this software. After a long investigation, I found that Patroni is "the only" Postgres standby database solution that works well.
BR Thilo

pashagolub May 29, 2024
Maintainer

@thaala PRs are welcome! We're, of course, interested in improving our software! :)

thaala Jul 11, 2024
Author

Hi,

In the meantime, another problem has occurred. The interface the virtual IP is added to was identified by its name. But after some months of 24/7 operation, the name became inaccessible. This may be related to the millions of calls to the PowerShell Remove-NetIPAddress command. The problem can be solved by renaming the interface and then changing the name back. Strange...

However, this was a workaround for the workaround, which is one workaround too many.
I have edited the code to add two features:

A parameter that supersedes the "Interfacename" parameter: if the "InterfaceIndex" parameter is set to an interface number, it is used to look up the interface with net.InterfaceByIndex() instead of net.InterfaceByName().
The ipmanager is extended with a function "isdup()" that calls a Windows function to evaluate the member "DadState" == "Duplicated" (2). If the state becomes "Duplicated" after the virtual IP was added, the virtual IP is removed again (checked in 1-second slices). The test after adding the address runs every second for at most 6 seconds; if "DadState" == "Duplicated" is not seen, the add is considered successful. This add/remove procedure is limited to a maximum of 7 retries. If the IP is still "Duplicated" after the 7th try, the virtual IP is left alone.

Next week we are going to test this software on our production system. Unfortunately, the function "isdup()" is a dummy on Linux systems because I have no Linux programming skills, and the same goes for the Go programming language (I only know C#, C++, and C).
If you want the code in the mainline, please advise me how to hand it over; maybe you can make it a bit "smoother".

Replies: 3 comments 7 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

Windows Server 2022 failover or switchover with duplicate IP address #202

thaala Nov 10, 2023

Replies: 3 comments · 7 replies

MarkIITech Nov 28, 2023

TakashiArakane Dec 5, 2023

MarkIITech Jan 17, 2024

thaala Jan 18, 2024 Author

MarkIITech Jan 24, 2024

pashagolub Jan 25, 2024 Maintainer

thaala May 29, 2024 Author

pashagolub May 29, 2024 Maintainer

thaala Jul 11, 2024 Author

thaala
Nov 10, 2023

Replies: 3 comments 7 replies

MarkIITech
Nov 28, 2023

TakashiArakane
Dec 5, 2023

thaala
Jan 18, 2024
Author

pashagolub Jan 25, 2024
Maintainer

thaala May 29, 2024
Author

pashagolub May 29, 2024
Maintainer

thaala Jul 11, 2024
Author