fix(connectivity): clean up cell_relay policy routing on teardown

A cell_relay policy-routes an assigned peer with `ip rule from <peer>
lookup <table>` plus a shared `default via <cell-ip>` route in that table
inside cell-wireguard. Two teardown bugs leaked both (confirmed on hardware,
pic0<->pic1):

- remove_peer_route_via deleted the rule with a hardcoded default table 100,
  but the v2 cell_relay path adds it with the connection's own table (1000+),
  so the rule never matched and survived peer detach/delete. It now deletes
  by source IP (table-agnostic), covering both the v2 and the legacy
  route-via (table 100) paths.
- nothing ever removed the table's shared default route: delete_connection
  explicitly skipped cell_relay, and reconcile_cell_relays deleted the record
  directly. Add wireguard_manager.teardown_route_table(table), which removes
  any leftover lookup-<table> rules and flushes the table, and call it from
  both delete_connection and the reconcile removal path.
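
As a rough illustration of the two fixes above, the cleanup can be sketched as plain `ip` argument lists. This is only a sketch under assumptions: the helper names `rule_delete_by_source` and `table_teardown_commands` are hypothetical, the real wireguard_manager code is not shown in this commit view, and how the commands are executed inside cell-wireguard is left out.

```python
from typing import List


def rule_delete_by_source(peer_ip: str) -> List[str]:
    """Build an `ip rule del` command that matches on source only.

    Hypothetical helper: selecting by source avoids naming a table, so the
    same command matches a rule added with table 100 or with a 1000+ table.
    """
    source = peer_ip.split('/')[0]  # '10.0.0.5/32' -> '10.0.0.5'
    return ['ip', 'rule', 'del', 'from', source]


def table_teardown_commands(table: int) -> List[List[str]]:
    """Build the commands that drop a relay table's rules and routes.

    Hypothetical helper: `ip rule del` removes one matching rule per call,
    so a caller would repeat the first command until it fails in order to
    clear every leftover rule pointing at the table.
    """
    return [
        ['ip', 'rule', 'del', 'lookup', str(table)],
        ['ip', 'route', 'flush', 'table', str(table)],
    ]
```

Returning argument lists rather than running them keeps the sketch free of side effects; a caller could pass each list to `subprocess.run` in whatever namespace or container the rules live in.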

Also clear a peer's relay rule on peer deletion so a peer deleted while still
assigned doesn't leave a stale source rule that could misroute a future peer
reusing the IP.
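
The misrouting risk can be shown with a toy in-memory model of source-based policy rules. This is not the project's code: `RuleTable` and its methods are hypothetical names used only to illustrate why a stale rule affects a later peer that reuses the address.

```python
class RuleTable:
    """Toy stand-in for the kernel's source-based policy rules."""

    def __init__(self):
        self._rules = {}  # source IP -> routing table number

    def add(self, source, table):
        self._rules[source] = table

    def remove_by_source(self, source):
        # Table-agnostic removal: only the source address is needed.
        return self._rules.pop(source, None) is not None

    def lookup(self, source):
        # None stands for "no policy rule, use the main table".
        return self._rules.get(source)


rules = RuleTable()
rules.add('10.0.0.5', 1000)      # peer assigned to a relay
# Peer deleted WITHOUT clearing its rule, then the IP is reused:
stale = rules.lookup('10.0.0.5')  # new peer is still sent to table 1000

rules.remove_by_source('10.0.0.5')  # what peer deletion now does
clean = rules.lookup('10.0.0.5')    # new peer falls back to the main table
```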

Regression tests: detach removes the rule by source; delete_connection and
reconcile-removal each flush the relay table.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 11:34:41 -04:00
parent 639fb66e5b
commit 2ab6e715d8
4 changed files with 129 additions and 8 deletions
@@ -319,5 +319,56 @@ class TestHealth(_Base):
        self.assertEqual(health, 'down')


# ---------------------------------------------------------------------------
# Teardown cleanup: regression for the confirmed cell_relay routing leak.
#
# A cell_relay policy-routes a peer with `ip rule from <peer> lookup <table>`
# plus a shared `default via <cell-ip>` route in that table, inside
# cell-wireguard. Before the fix, detaching/deleting the peer left the rule
# (remove_peer_route_via used the wrong default table) and nothing ever flushed
# the table's default route. Both leaked, confirmed on hardware.
# ---------------------------------------------------------------------------
class TestTeardownCleanup(_Base):

    def _relay(self):
        self.cell_link.list_connections.return_value = [_link('alpha')]
        self.mgr.reconcile_cell_relays()
        return self._raw_relays()[0]

    def test_detach_removes_peer_ip_rule(self):
        relay = self._relay()
        peer = {'peer': 'laptop', 'ip': '10.0.0.5/32',
                'exit_via': relay['id'], 'route_via': 'alpha'}
        self.peer_registry.get_peer.return_value = peer
        self.peer_registry.set_peer_exit_via.return_value = True
        with patch.object(self.mgr, 'apply_routes'):
            res = self.mgr.set_peer_exit('laptop', 'default')
        self.assertTrue(res['ok'])
        # The peer's source ip rule is cleared by source (table-agnostic), so it
        # matches the relay's allocated table rather than the old default 100.
        self.wg.remove_peer_route_via.assert_called_once_with('10.0.0.5')

    def test_delete_connection_flushes_relay_route_table(self):
        relay = self._relay()
        # Not referenced by any peer (detached) → deletable.
        self.peer_registry.list_peers.return_value = []
        res = self.mgr.delete_connection(relay['id'])
        self.assertTrue(res['ok'])
        self.wg.teardown_route_table.assert_called_once_with(relay['table'])

    def test_reconcile_removal_flushes_relay_route_table(self):
        relay = self._relay()
        table = relay['table']
        # Offer withdrawn and not referenced → reconcile removes the relay and
        # must flush its routing table (this path bypasses delete_connection).
        self.cell_link.list_connections.return_value = [
            _link('alpha', remote_exit_offered=False)]
        self.peer_registry.list_peers.return_value = []
        out = self.mgr.reconcile_cell_relays()
        self.assertIn(relay['id'], out['removed'])
        self.wg.teardown_route_table.assert_called_once_with(table)


if __name__ == '__main__':
    unittest.main()