Claude Session - Understanding WireGuard Routing

I have a question about a problem I'm having with wireguard used between two routers to implement a tunnel. The scenario is quite complex but the basic idea is that the wireguard session between router a and router b should always be active, regardless of the underlying IP route used to carry the wireguard UDP packets. Specifically what I have seen is that when routes change such that router a ends up using a different source IP address than before, the wireguard session becomes a "zombie" -- the session stays up as far as both ends are concerned but no traffic passes. Presumably this is because router b tries to send any wireguard packets to router a's old IP address and so they never arrive. In this state I've found that if I manually terminate the wireguard session then re-start it, everything works as expected. My question is: is there a way that I can have router a re-start the wireguard session it has with router b in the event that its routing topology changes? The routers are running RouterOS by the way.

This is a well-known WireGuard behavior. WireGuard is designed to handle "roaming" — when it receives a valid authenticated packet from a peer at a new source IP, it updates its endpoint for that peer. However, your scenario creates a chicken-and-egg problem: router B won't update its endpoint until it receives a packet from router A's new IP, but if there's no outbound traffic from A (or keepalives aren't configured), B keeps sending to the stale address.

A few approaches for RouterOS:

1. Ensure PersistentKeepalive is set

On the WireGuard peer configuration on router A, make sure you have:

/interface wireguard peers set [find where endpoint-address="<router-b-ip>"] persistent-keepalive=25

This should cause A to send keepalives from its new source IP, prompting B to update. If you already have this and it's still failing, the issue may be more subtle (perhaps related to how RouterOS handles the interface during route transitions).
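
To see whether B has actually picked up the new address, it can help to look at B's view of the peer. A minimal sketch, assuming you can log in to router B; the public-key value is a placeholder for router A's real key:

routeros

# On router B: list peers with their current endpoint and handshake age
/interface wireguard peers print detail
# Or print just the two fields of interest for router A's peer entry
:put [/interface wireguard peers get [find where public-key="<router-a-public-key>"] current-endpoint-address]
:put [/interface wireguard peers get [find where public-key="<router-a-public-key>"] last-handshake]

If current-endpoint-address still shows A's old IP well after the route change, B has not received a valid packet from the new source.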

2. Script-based restart on route change

You can create a script that monitors your routing state and restarts the WireGuard peer when a relevant change occurs. Here's a skeleton approach:

routeros

# Script: check-wg-source-ip
# Remember the gateway last used to reach router B in a global variable.
# If it changes, disable and re-enable the WireGuard peer.

# Router B's IP (example value)
:local peerEndpoint "203.0.113.50"
# Assumes a dedicated /32 route to router B exists and exactly one is active;
# adjust the match if the peer is reached through the default route.
:local currentSrc [/ip route get [/ip route find dst-address="$peerEndpoint/32" active] gateway]
# Alternative, unverified: ask the router which path it would pick. The fields
# returned by "/ip route check" differ between RouterOS versions, so check first:
# :local currentSrc ([/ip route check $peerEndpoint as-value]->"src-address")

:global lastWgSrcIP
:if ($currentSrc != $lastWgSrcIP) do={
    :log warning "WireGuard path changed from $lastWgSrcIP to $currentSrc - restarting peer"
    /interface wireguard peers disable [find where endpoint-address~"$peerEndpoint"]
    :delay 1s
    /interface wireguard peers enable [find where endpoint-address~"$peerEndpoint"]
    :set lastWgSrcIP $currentSrc
}

Then schedule it to run frequently:

routeros

/system scheduler add name=wg-source-monitor interval=10s on-event=check-wg-source-ip
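
Before relying on the scheduler, it is worth checking the script once by hand. A small sketch, assuming the script was saved under /system script with the name check-wg-source-ip:

routeros

# Confirm the script exists under the name the scheduler refers to
/system script print where name="check-wg-source-ip"
# Run it once manually, then look for its log line
/system script run check-wg-source-ip
/log print where message~"restarting peer"

On the very first run the global variable is still empty, so the script should log a restart even though nothing has changed.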

3. Use Netwatch to detect tunnel failure

If you have a known IP on the other side of the tunnel, you can use Netwatch to detect when it becomes unreachable and trigger a restart:

routeros

/tool netwatch add host=10.255.0.2 interval=10s timeout=2s \
    down-script="/interface wireguard peers disable [find]; :delay 2s; /interface wireguard peers enable [find]"

This is reactive rather than proactive but catches the zombie state.
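
Note that [find] with no condition matches every peer, so the down-script above bounces all WireGuard peers on the router. If A has other tunnels, a narrower variant may be preferable. A sketch, assuming router B's peer entry has endpoint-address 203.0.113.50 (the example address used earlier):

routeros

/tool netwatch add host=10.255.0.2 interval=10s timeout=2s \
    down-script="/interface wireguard peers disable [find where endpoint-address=\"203.0.113.50\"]; :delay 2s; /interface wireguard peers enable [find where endpoint-address=\"203.0.113.50\"]"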

4. Consider the endpoint on both sides

If router A has a stable public IP (or at least more stable than B), you might configure B with A's endpoint and have A with no endpoint configured for B — letting A be the "server" that B connects to. This inverts the problem.
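
As a rough sketch of that inverted layout: the interface name wireguard1, the tunnel addresses, and the key placeholders are all assumptions for illustration, not values from your setup.

routeros

# On router A (the "server"): no endpoint, it learns B's address from incoming packets
/interface wireguard peers add interface=wireguard1 public-key="<router-b-public-key>" \
    allowed-address=10.255.0.2/32
# On router B (the initiator): A's endpoint plus a keepalive so the tunnel stays warm
/interface wireguard peers add interface=wireguard1 public-key="<router-a-public-key>" \
    endpoint-address=<router-a-ip> endpoint-port=51820 \
    allowed-address=10.255.0.1/32 persistent-keepalive=25s

The side with the endpoint and keepalive is the one that can change address freely, since every keepalive it sends refreshes the other side's view of where it is.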

Does router A have multiple WAN links or is this more about internal routing changes affecting the source IP selection? That might affect which approach is cleanest.

That points to a few more subtle possibilities:

Connection tracking holding stale state

RouterOS (like Linux) maintains connection tracking entries for UDP flows. Even after your route changes, the existing conntrack entry for the WireGuard UDP flow can survive, and if a src-nat or masquerade rule was applied to that flow, the old translated source IP stays "pinned" to it. The keepalives then keep going out with the stale source because conntrack reuses the original tuple for as long as the entry lives.

You can test this theory by checking:

routeros

/ip firewall connection print where dst-address~":51820"

If you see the old source IP there after a route change, that's your culprit.

Possible fixes:

Flush the relevant conntrack entry when routes change — add to your route-change script:

routeros

   /ip firewall connection remove [find where dst-address~"<router-b-ip>"]

Exclude WireGuard traffic from connection tracking entirely:

routeros

   /ip firewall raw add chain=prerouting protocol=udp port=51820 action=notrack
   /ip firewall raw add chain=output protocol=udp port=51820 action=notrack

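This takes those packets out of connection tracking, so no stale entry can be applied to them. Be aware that untracked packets also bypass NAT and never match connection-state=established, so your filter rules need to accept them explicitly.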
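
If you go the notrack route, each router's input chain still has to let the WireGuard packets in. A minimal sketch, assuming the default port 51820 and that you place the rule above any drop rule; the comment text is just a label I made up:

routeros

# Accept WireGuard UDP regardless of connection state (tracked or untracked)
/ip firewall filter add chain=input protocol=udp dst-port=51820 action=accept comment="allow wireguard"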

WireGuard interface source address binding

Check your WireGuard interface configuration:

routeros

/interface wireguard print detail

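As far as I know, the RouterOS WireGuard interface exposes listen-port but no option to bind to a specific local address, so nothing on the interface itself should anchor the source. If your version does show a bound address there, it should typically be unset or 0.0.0.0.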
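
Because the source address of locally generated packets comes from the routing lookup, it is also worth checking whether a route toward router B carries a pref-src value. A quick look, with nothing assumed beyond the standard menus:

routeros

# Show active routes in detail; look for pref-src on the route that covers router B
/ip route print detail where active
# List the addresses the router could use as a source
/ip address print

A pref-src that points at an address on the old path would keep A sending from that address even after the route itself has moved.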

Routing table lookup timing

There can also be a race condition where the WireGuard interface sends its keepalive before the routing table has fully converged. This is harder to solve elegantly but the conntrack removal approach usually catches it.

Does your setup involve NAT anywhere in the path, or is this purely routed? And are both routers under your control, or is router B a third-party endpoint?