fix(connectivity): clean up cell_relay policy routing on teardown
Unit Tests / test (push) Successful in 9m37s
A cell_relay policy-routes an assigned peer with `ip rule from <peer>
lookup <table>` plus a shared `default via <cell-ip>` route in that table
inside cell-wireguard. Two teardown bugs leaked both (confirmed on
hardware, pic0<->pic1):

- remove_peer_route_via deleted the rule with a hardcoded default table
  100, but the v2 cell_relay path adds it with the connection's own table
  (1000+), so the rule never matched and survived peer detach/delete. It
  now deletes by source IP (table-agnostic), covering both the v2 and the
  legacy route-via (table 100) paths.
- nothing ever removed the table's shared default route: delete_connection
  explicitly skipped cell_relay and reconcile_cell_relays deletes the
  record directly. Added wireguard_manager.teardown_route_table(table)
  (removes any leftover lookup-<table> rules + flushes the table) and call
  it from both delete_connection and the reconcile removal path.

Also clear a peer's relay rule on peer deletion so a peer deleted while
still assigned doesn't leave a stale source rule that could misroute a
future peer reusing the IP.

Regression tests: detach removes the rule by source; delete_connection and
reconcile-removal each flush the relay table.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
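For readers unfamiliar with the iproute2 side, the two cleanup steps described in the message can be sketched roughly as below. This is an illustration only, not the code in this commit: `rule_del_by_source`, `table_teardown_cmds`, and `run_all` are hypothetical names, the list of leftover source IPs is assumed to be supplied by the caller, and the `runner` callable stands in for however the project executes commands inside cell-wireguard.

```python
from typing import Callable, List, Sequence

Cmd = List[str]


def rule_del_by_source(peer_ip: str) -> Cmd:
    """Build a table-agnostic delete: match the rule on its source only."""
    # No "lookup <table>" selector, so the same command covers a rule added
    # with table 100 (legacy route-via) or with a per-connection table (1000+).
    return ["ip", "rule", "del", "from", peer_ip]


def table_teardown_cmds(table: int,
                        leftover_sources: Sequence[str] = ()) -> List[Cmd]:
    """Build commands that drop leftover rules for a table, then flush it."""
    cmds: List[Cmd] = [
        ["ip", "rule", "del", "from", src, "lookup", str(table)]
        for src in leftover_sources  # assumption: caller already listed these
    ]
    # Flushing the table removes the shared `default via <cell-ip>` route.
    cmds.append(["ip", "route", "flush", "table", str(table)])
    return cmds


def run_all(cmds: Sequence[Cmd], runner: Callable[[Cmd], int]) -> int:
    """Run every command and return how many exited non-zero."""
    failures = 0
    for cmd in cmds:
        if runner(cmd) != 0:
            failures += 1  # keep going: cleanup is best-effort
    return failures
```

Separating command construction from execution keeps the sketch testable without root privileges or a network namespace; a real runner might wrap `subprocess.run(cmd).returncode`.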
@@ -1437,6 +1437,18 @@ class ConnectivityManager(BaseServiceManager):
             except Exception as e:
                 logger.warning(f"delete_connection: killswitch cleanup failed "
                                f"(non-fatal): {e}")
+        elif (record.get('type') == self.CELL_RELAY_TYPE
+              and isinstance(table, int)
+              and self.wireguard_manager is not None):
+            # A cell_relay policy-routes peers via a source ip rule + a
+            # shared default route in its table inside cell-wireguard. Per-peer
+            # detach removes the rules; the table's default route only goes
+            # away here, when the connection is gone — otherwise it leaks.
+            try:
+                self.wireguard_manager.teardown_route_table(table)
+            except Exception as e:
+                logger.warning(f"delete_connection: cell_relay route table "
+                               f"cleanup failed (non-fatal): {e}")
 
         for secret_ref in record.get('secret_refs', []):
             if self.vault_manager is not None:
@@ -1554,6 +1566,18 @@ class ConnectivityManager(BaseServiceManager):
                                    f"{cell_name!r} no longer offered but still "
                                    f"referenced; keeping")
                 continue
+            # Flush the relay's policy-routing table (shared default route)
+            # before forgetting the record — this path deletes the config
+            # entry directly rather than via delete_connection, so it must
+            # do the same host-routing teardown or the route leaks.
+            rtable = rec.get('table')
+            if self.wireguard_manager is not None and isinstance(rtable, int):
+                try:
+                    self.wireguard_manager.teardown_route_table(rtable)
+                except Exception as e:
+                    logger.warning(f"reconcile_cell_relays: route table "
+                                   f"cleanup for {cell_name!r} failed "
+                                   f"(non-fatal): {e}")
             try:
                 self.config_manager.delete_connection(rec.get('id'))
                 removed.append(rec.get('id'))
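
Both hunks apply the same guard: skip the teardown when there is no wireguard manager or the table is not an int, and log any failure as non-fatal. The sketch below shows that guard as one standalone function so its behaviour can be checked in isolation. It is not part of this commit; `safe_teardown_route_table` and its `context` parameter are hypothetical, and only `teardown_route_table(table)` comes from the diff.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def safe_teardown_route_table(wireguard_manager: Optional[Any],
                              table: Any,
                              context: str) -> bool:
    """Best-effort route table teardown; True only if the call succeeded."""
    # Same preconditions as both call sites in the diff.
    if wireguard_manager is None or not isinstance(table, int):
        return False
    try:
        wireguard_manager.teardown_route_table(table)
    except Exception as e:
        # A failed cleanup must not block deleting the connection record.
        logger.warning(f"{context}: route table cleanup failed "
                       f"(non-fatal): {e}")
        return False
    return True
```

A regression test along the lines mentioned in the commit message could pass a stub manager that records the tables it was asked to tear down and assert that the relay's table appears exactly once.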