fix(connectivity): clean up cell_relay policy routing on teardown

A cell_relay policy-routes an assigned peer with `ip rule from <peer>
lookup <table>` plus a shared `default via <cell-ip>` route in that table
inside cell-wireguard. Two teardown bugs leaked both the rule and the route
(confirmed on hardware, pic0<->pic1):

- remove_peer_route_via deleted the rule by matching a hardcoded default table
  100, but the v2 cell_relay path adds it with the connection's own table
  (1000+), so the delete never matched and the rule survived peer
  detach/delete. It now deletes by source IP (table-agnostic), covering both
  the v2 and the legacy route-via (table 100) paths.
- nothing ever removed the table's shared default route: delete_connection
  explicitly skipped cell_relay connections, and reconcile_cell_relays deleted
  the record directly. Add wireguard_manager.teardown_route_table(table)
  (removes any leftover lookup-<table> rules, then flushes the table) and call
  it from both delete_connection and the reconcile removal path.
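To make the two bugs concrete, here is a toy in-memory model of the relay's policy-routing state. It is a sketch under stated assumptions, not the wireguard_manager code: `Rule`, `PolicyRouting`, and its methods are hypothetical names, and `del_rule` only approximates `ip rule del`, where every selector you pass must match and the first matching rule is removed. Deleting with `table=100` misses a rule that looks up table 1000, while deleting by source alone succeeds; `teardown_table` stands in for teardown_route_table by removing any leftover rules for the table and dropping its shared default route.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Rule:
    src: str    # the `from <peer>` selector
    table: int  # the `lookup <table>` action


@dataclass
class PolicyRouting:
    """Hypothetical toy model of `ip rule` plus per-table routes."""
    rules: List[Rule] = field(default_factory=list)
    routes: Dict[int, List[str]] = field(default_factory=dict)

    def attach(self, peer_ip: str, table: int, cell_ip: str) -> None:
        # Roughly `ip rule add from <peer> lookup <table>`.
        self.rules.append(Rule(peer_ip, table))
        # The default route is shared by every peer using this table,
        # so re-attaching simply replaces it.
        self.routes[table] = [f"default via {cell_ip}"]

    def del_rule(self, src: Optional[str] = None,
                 table: Optional[int] = None) -> bool:
        # Approximates `ip rule del`: only the selectors that are given
        # must match, and only the first matching rule is removed.
        for i, r in enumerate(self.rules):
            if (src is None or r.src == src) and (table is None or r.table == table):
                del self.rules[i]
                return True
        return False

    def teardown_table(self, table: int) -> None:
        # Remove every leftover "lookup <table>" rule, then flush the table.
        while self.del_rule(table=table):
            pass
        self.routes.pop(table, None)
```

For example, after `attach("10.8.0.5", 1000, "10.9.0.1")` (illustrative addresses), `del_rule(src="10.8.0.5", table=100)` returns `False`, the old behaviour, while `del_rule(src="10.8.0.5")` returns `True`. The table's default route stays in place until `teardown_table(1000)` runs, which is exactly the route that leaked before this change.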

Also clear a peer's relay rule on peer deletion so a peer deleted while still
assigned doesn't leave a stale source rule that could misroute a future peer
reusing the IP.

Regression tests: detach removes the rule by source; delete_connection and
reconcile-removal each flush the relay table.
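The delete_connection regression test can be approximated with `unittest.mock`. The sketch below is not the commit's test file: `StubConnectivityManager` is a hypothetical stand-in that copies only the cell_relay branch of delete_connection shown in the diff, takes the record directly instead of looking it up by id, and assumes the table number is stored under the record's `'table'` key (as the reconcile path reads it). It checks that the relay table is flushed, that a failing teardown stays non-fatal, and that other connection types are left alone.

```python
import logging
import unittest
from unittest import mock

logger = logging.getLogger(__name__)


class StubConnectivityManager:
    """Hypothetical stand-in holding only the cell_relay teardown branch."""
    CELL_RELAY_TYPE = "cell_relay"

    def __init__(self, wireguard_manager):
        self.wireguard_manager = wireguard_manager

    def delete_connection(self, record):
        table = record.get("table")  # assumption: table kept on the record
        if (record.get("type") == self.CELL_RELAY_TYPE
                and isinstance(table, int)
                and self.wireguard_manager is not None):
            try:
                self.wireguard_manager.teardown_route_table(table)
            except Exception as e:
                logger.warning(f"delete_connection: cell_relay route table "
                               f"cleanup failed (non-fatal): {e}")


class CellRelayTeardownTest(unittest.TestCase):
    def test_delete_connection_flushes_relay_table(self):
        wg = mock.Mock()
        StubConnectivityManager(wg).delete_connection(
            {"type": "cell_relay", "table": 1000})
        wg.teardown_route_table.assert_called_once_with(1000)

    def test_teardown_failure_is_non_fatal(self):
        wg = mock.Mock()
        wg.teardown_route_table.side_effect = RuntimeError("ip failed")
        # Must not raise: the failure is logged and deletion carries on.
        StubConnectivityManager(wg).delete_connection(
            {"type": "cell_relay", "table": 1000})
        wg.teardown_route_table.assert_called_once_with(1000)

    def test_other_connection_types_are_untouched(self):
        wg = mock.Mock()
        StubConnectivityManager(wg).delete_connection(
            {"type": "wireguard", "table": 1000})
        wg.teardown_route_table.assert_not_called()
```

Run it with `python -m unittest <module>`; the second test logs a warning to stderr by design. The reconcile-removal test would follow the same pattern against the reconcile_cell_relays branch.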

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 11:34:41 -04:00
parent 639fb66e5b
commit 2ab6e715d8
4 changed files with 129 additions and 8 deletions
+24
@@ -1437,6 +1437,18 @@ class ConnectivityManager(BaseServiceManager):
             except Exception as e:
                 logger.warning(f"delete_connection: killswitch cleanup failed "
                                f"(non-fatal): {e}")
+        elif (record.get('type') == self.CELL_RELAY_TYPE
+              and isinstance(table, int)
+              and self.wireguard_manager is not None):
+            # A cell_relay policy-routes peers via a source ip rule + a
+            # shared default route in its table inside cell-wireguard. Per-peer
+            # detach removes the rules; the table's default route only goes
+            # away here, when the connection is gone — otherwise it leaks.
+            try:
+                self.wireguard_manager.teardown_route_table(table)
+            except Exception as e:
+                logger.warning(f"delete_connection: cell_relay route table "
+                               f"cleanup failed (non-fatal): {e}")
         for secret_ref in record.get('secret_refs', []):
             if self.vault_manager is not None:
@@ -1554,6 +1566,18 @@ class ConnectivityManager(BaseServiceManager):
                             f"{cell_name!r} no longer offered but still "
                             f"referenced; keeping")
                 continue
+            # Flush the relay's policy-routing table (shared default route)
+            # before forgetting the record — this path deletes the config
+            # entry directly rather than via delete_connection, so it must
+            # do the same host-routing teardown or the route leaks.
+            rtable = rec.get('table')
+            if self.wireguard_manager is not None and isinstance(rtable, int):
+                try:
+                    self.wireguard_manager.teardown_route_table(rtable)
+                except Exception as e:
+                    logger.warning(f"reconcile_cell_relays: route table "
+                                   f"cleanup for {cell_name!r} failed "
+                                   f"(non-fatal): {e}")
             try:
                 self.config_manager.delete_connection(rec.get('id'))
                 removed.append(rec.get('id'))