When a cell is connected to others, changing the local WireGuard address
or Docker ip_range to a subnet that overlaps a connected cell's vpn_subnet
would break routing. Both now return 409 with the conflicting cell name.
- wireguard.address: derive network from new address, check all connected
cells' vpn_subnet for overlap (after existing format validation)
- ip_range: check all connected cells' vpn_subnet for overlap (after
existing RFC-1918 validation)
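A minimal sketch of the overlap check, using Python's ipaddress module;
the connected-cells record shape (name, vpn_subnet) is an assumption,
not the exact implementation:

```python
import ipaddress

def find_vpn_subnet_conflict(new_address, connected_cells):
    """Return the name of the first connected cell whose vpn_subnet
    overlaps the network derived from new_address, else None."""
    # e.g. "10.0.0.1/24" -> 10.0.0.0/24
    new_net = ipaddress.ip_interface(new_address).network
    for cell in connected_cells:
        if new_net.overlaps(ipaddress.ip_network(cell["vpn_subnet"])):
            return cell["name"]  # caller responds 409 naming this cell
    return None
```

The ip_range path runs the same overlaps() test, with the new range
passed straight to ip_network().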
Tests: 4 cases each (overlap → 409, no overlap → ok, no cells → ok,
format error still fires first → 400).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two gaps allowed a cell to take a domain already in use by a connected cell:
1. PUT /api/config domain change: added a check against cell_link_manager's
connected-cells list before saving; returns 409 if the new domain
collides with any connected cell's domain (sketched after this list).
2. accept_invite healing path: a remote cell changing its domain via a
re-invite was not validated against other connected cells' domains.
Now calls _check_invite_conflicts(invite, exclude_cell=name) before
applying any change.
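A minimal sketch of the check in (1); get_connections() and the record
fields are assumptions about cell_link_manager's shape:

```python
def find_domain_conflict(new_domain, cell_link_manager):
    """Return the name of a connected cell already using new_domain,
    else None; the route responds 409 with that name."""
    for cell in cell_link_manager.get_connections():
        if cell.get("domain", "").lower() == new_domain.lower():
            return cell["name"]
    return None
```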
Also: the healing path now detects domain changes (alongside dns_ip/
vpn_subnet/endpoint), updates the stored domain, and refreshes the DNS
forward rule accordingly.
Tests: 3 new domain-conflict tests in test_config_validation.py;
3 new accept_invite healing tests in test_cell_link_manager.py.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three issues fixed together:
1. WireGuard address changes now go through the pending-restart queue
(shown in the UI banner) instead of restarting cell-wireguard immediately.
Only private_key changes still restart immediately; address and port
changes both defer to the user-initiated Apply flow. Previously the
address change was silently applied and never appeared in Settings →
Pending Configuration.
2. When the WG address changes, the API spawns a background thread that
pushes the updated invite to all connected cells (over LAN, before the
WG tunnel is back up). This lets remote cells automatically update
their dns_ip, AllowedIPs, and CoreDNS forwarding rules without manual
re-pairing (a sketch follows this list).
3. accept_invite now handles the "already connected but changed" case:
if the remote cell re-sends an invite with a different dns_ip, vpn_subnet,
or endpoint, we update the stored link, the WG AllowedIPs, and the
CoreDNS forward rule in place, with no delete/re-add required. Previously
such re-invites were ignored and the stale record was returned unchanged.
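A minimal sketch of the background push in (2), assuming each stored
link keeps the remote's LAN endpoint as "host:port" and that the remote
API listens on API_PORT; all names here are illustrative:

```python
import threading

import requests

API_PORT = 8080  # assumption: whatever port the cell API listens on

def push_invite_to_connected_cells(invite, cell_links):
    """Best-effort LAN push of the updated invite to every connected
    cell, run off the request thread so the API responds immediately."""
    def worker():
        for link in cell_links:
            host = link["endpoint"].split(":")[0]  # LAN IP, not the WG IP
            try:
                requests.post(
                    f"http://{host}:{API_PORT}/api/cells/peer-sync/accept-invite",
                    json=invite, timeout=10)
            except requests.RequestException:
                pass  # non-fatal; the remote heals on the next re-invite
    threading.Thread(target=worker, daemon=True).start()
```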
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three fixes:
1. Extend the docker-exec safety guard in wireguard_manager to also check
for 'wg_confs' in the config path. When running unit tests on the host,
the API uses /app/config/wireguard/wg0.conf (no wg_confs subdir), so the
old '/tmp/' | 'pytest' check didn't fire: _syncconf and friends were
executing live 'docker exec cell-wireguard wg set' calls against the
running container, removing real VPN peers that didn't appear in the
test config. The wg_confs subdir only exists inside the container mount,
so its presence reliably gates live calls (sketched after this list).
2. Fix get_split_tunnel_ips() wrong path: self.data_dir + 'api/cell_links.json'
→ self.data_dir + 'cell_links.json'. The extra 'api/' segment produced
/app/data/api/cell_links.json inside the container instead of the real
/app/data/cell_links.json, so connected cells were silently excluded from
split-tunnel CIDRs.
3. update_peer_ip_registry and ip_update now also call
wireguard_manager.update_peer_ip so wg0.conf AllowedIPs stay in sync when
a peer's VPN IP changes at runtime (previously only peers.json was updated).
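A sketch of the widened guard from fix 1; the actual predicate in
wireguard_manager may be shaped differently:

```python
def live_docker_exec_allowed(config_path: str) -> bool:
    """Gate live 'docker exec cell-wireguard wg set' calls."""
    path = str(config_path)
    # Test environments: tmp dirs and pytest paths never touch Docker.
    if "/tmp/" in path or "pytest" in path:
        return False
    # wg_confs/ only exists inside the container mount, so its presence
    # reliably marks a real deployment.
    return "wg_confs" in path
```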
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
**Auto mutual pairing**
When Cell A imports Cell B's invite (POST /api/cells on A), A now
immediately pushes its own invite to Cell B over the LAN (using the
endpoint IP, before the WG tunnel exists) via the new endpoint:
POST /api/cells/peer-sync/accept-invite
Cell B auto-adds Cell A as a WireGuard peer and DNS forward, completing
the bidirectional tunnel without any manual action on Cell B's UI.
The endpoint is idempotent and unauthenticated (it has to work before
the WG tunnel exists, so tunnel-based auth is unavailable).
Previously, the pairing was one-sided: Cell A had Cell B as a WG peer
but Cell B never had Cell A — the tunnel never established and all
cross-cell operations silently failed.
**Conflict detection (add_connection + accept-invite)**
_check_invite_conflicts() now validates before connecting:
- VPN subnet must not overlap own subnet or any already-connected cell's subnet
- Domain must not match own domain or any already-connected cell's domain
Returns clear error messages so the admin knows which cell to reconfigure.
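A minimal sketch of _check_invite_conflicts, assuming the manager holds
its own network/domain plus a name → link dict; exclude_cell is the
parameter the healing path (above) passes so a re-inviting cell does
not conflict with its own old record:

```python
import ipaddress

def _check_invite_conflicts(self, invite, exclude_cell=None):
    """Return an error string on conflict, else None."""
    new_net = ipaddress.ip_network(invite["vpn_subnet"])
    if new_net.overlaps(self.own_vpn_network):
        return "vpn_subnet overlaps this cell's own WireGuard network"
    if invite["domain"] == self.own_domain:
        return "domain matches this cell's own domain"
    for name, link in self.links.items():
        if name == exclude_cell:
            continue
        if new_net.overlaps(ipaddress.ip_network(link["vpn_subnet"])):
            return f"vpn_subnet overlaps connected cell '{name}'"
        if invite["domain"] == link.get("domain"):
            return f"domain already used by connected cell '{name}'"
    return None
```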
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cross-cell DNS and service routing:
- DNS A records now return the WireGuard server IP (10.0.0.1) instead of
Docker bridge VIPs, so cross-cell peers resolve service names correctly
regardless of their bridge subnet.
- DNAT rules (wg0:53→cell-dns:53 and wg0:80→cell-caddy:80) are applied
at startup (sketched below); Caddy routes by Host header, eliminating
the Docker bridge subnet conflict.
- Firewall cell rules allow DNS and service (Caddy) traffic from linked
cell subnets.
- Split-tunnel AllowedIPs now dynamically includes connected-cell VPN
subnets and drops the 172.20.0.0/16 range.
- Peers with route_via set now receive full-tunnel config (0.0.0.0/0) so
all their traffic exits via the remote cell.
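A sketch of the startup DNAT step, assuming the rules are installed
inside cell-wireguard (where wg0 lives) and the cell-dns/cell-caddy
bridge IPs are resolved beforehand; the exact invocation is an
assumption:

```python
import subprocess

def apply_startup_dnat(dns_ip: str, caddy_ip: str) -> None:
    rules = [
        ("udp", "53", f"{dns_ip}:53"),    # wg0:53 -> cell-dns:53
        ("tcp", "53", f"{dns_ip}:53"),
        ("tcp", "80", f"{caddy_ip}:80"),  # wg0:80 -> cell-caddy:80
    ]
    for proto, port, dest in rules:
        subprocess.run(
            ["docker", "exec", "cell-wireguard",
             "iptables", "-t", "nat", "-A", "PREROUTING",
             "-i", "wg0", "-p", proto, "--dport", port,
             "-j", "DNAT", "--to-destination", dest],
            check=True)
```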
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the ability to route a specific peer's internet traffic through a
connected cell acting as an exit relay.
Cell A side:
- PUT /api/peers/<peer>/route-via {"via_cell": "cellB"} sets route_via
- Updates WG AllowedIPs to include 0.0.0.0/0 for the exit cell peer
- Adds an ip rule + ip route in a policy table inside cell-wireguard so
the specific peer's traffic egresses via cellB's WG IP (sketched below)
- Sets exit_relay_active on the cell link and pushes use_as_exit_relay=True
to cellB via peer-sync
Cell B side:
- Receives use_as_exit_relay in the peer-sync payload
- Calls apply_cell_rules(..., exit_relay=True) to add FORWARD -o eth0 ACCEPT
- Stores remote_exit_relay_active flag for startup recovery
Startup recovery:
- apply_all_cell_rules passes exit_relay=remote_exit_relay_active (cellB)
- _apply_startup_enforcement reapplies ip rule for each peer with route_via (cellA)
since policy routing rules don't survive container restart
peer_registry gets route_via field with lazy migration.
22 new tests across test_cell_link_manager, test_peer_registry, test_peer_route_via.
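A minimal sketch of the Cell A policy-routing step; the table number
and exact commands are assumptions:

```python
import subprocess

POLICY_TABLE = "51821"  # assumption: a dedicated routing table

def route_peer_via_cell(peer_ip: str, exit_cell_wg_ip: str) -> None:
    # Traffic *from* this peer is looked up in the dedicated table...
    subprocess.run(
        ["docker", "exec", "cell-wireguard",
         "ip", "rule", "add", "from", peer_ip, "table", POLICY_TABLE],
        check=True)
    # ...whose default route egresses via the exit cell's WG IP.
    subprocess.run(
        ["docker", "exec", "cell-wireguard",
         "ip", "route", "replace", "default",
         "via", exit_cell_wg_ip, "dev", "wg0", "table", POLICY_TABLE],
        check=True)
```

_apply_startup_enforcement re-runs this per route_via peer at boot,
since these rules live only in the container's network namespace.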
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the ability for a cell to signal to a peer that it's willing to
route internet traffic on their behalf. This is the signaling layer
for Phase 3 (per-peer routing via exit cell).
Changes:
- cell_links.json: exit_offered (bool) + remote_exit_offered (bool)
fields with lazy migration (default false for existing records)
- _push_permissions_to_remote: includes exit_offered in the push body
- apply_remote_permissions: accepts exit_offered kwarg; stores it as
remote_exit_offered on the matching cell link
- peer-sync receiver: passes exit_offered from body to apply_remote_permissions
- CellLinkManager.set_exit_offered(cell_name, offered): persists and
triggers a push so the remote learns of our offer immediately (sketched
after this list)
- PUT /api/cells/<name>/exit-offer: REST endpoint to toggle the flag
- 12 new tests covering all new paths
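A minimal sketch of set_exit_offered; the persistence helper and link
store are assumptions:

```python
def set_exit_offered(self, cell_name: str, offered: bool) -> None:
    link = self.links.get(cell_name)
    if link is None:
        raise KeyError(f"unknown cell: {cell_name}")
    link["exit_offered"] = bool(offered)
    self._save_links()  # persist the flag
    # Push immediately so the remote stores it as remote_exit_offered.
    self._push_permissions_to_remote(cell_name)
```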
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When PIC A updates service sharing permissions, it immediately pushes
the mirrored state to PIC B over the WireGuard tunnel so B's UI shows
what A is sharing with it in real time.
Architecture:
- Push model: update_permissions() → _push_permissions_to_remote() →
POST /api/cells/peer-sync/permissions on remote cell
- Auth: source IP must be inside a known cell's vpn_subnet (the WireGuard
tunnel proves identity) and the body's from_public_key must match the
stored key (sketched below)
- Mirror semantics: our inbound (what we share) → their outbound view
- Non-fatal: push failures set pending_push=True; replay_pending_pushes()
retries at startup so offline cells catch up on reconnect
- add_connection() also pushes initial state so remote sees permissions
immediately on the first connect
New fields on cell_links.json records (lazy-migrated):
remote_api_url, last_push_status, last_push_at, last_push_error,
pending_push, last_remote_update_at
New endpoint: POST /api/cells/peer-sync/permissions
30 new tests (1101 total).
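A minimal sketch of the auth check, assuming each link record stores
vpn_subnet and the remote's public key:

```python
import ipaddress

def authenticate_peer_sync(source_ip, from_public_key, cell_links):
    """Return the matching cell link, or None if auth fails."""
    addr = ipaddress.ip_address(source_ip)
    for link in cell_links:
        if (addr in ipaddress.ip_network(link["vpn_subnet"])
                and from_public_key == link["public_key"]):
            return link
    return None
```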
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phase 1 — connection fixes:
- routing_manager.stop(): remove iptables -F / -t nat -F nuclear flush that
would wipe WireGuard MASQUERADE and all peer rules on any UI stop action
- wireguard_manager.add_cell_peer(): reject vpn_subnet that overlaps the local
WG network (a routing blackhole, and the root cause of the missing handshake)
- wireguard_manager._syncconf(): pass Endpoint to 'wg set' so cell peers with
static endpoints are synced to the kernel (not just AllowedIPs)
Phase 2 — service-sharing permissions backend:
- firewall_manager: add _cell_tag(), clear_cell_rules(), apply_cell_rules(),
apply_all_cell_rules(): iptables FORWARD rules for cell-to-cell traffic
using 'pic-cell-<name>' comment tags, distinct from 'pic-peer-*'
(apply_cell_rules sketched after this list)
- app.py startup enforcement: call apply_all_cell_rules(cell_links) so rules
survive API restarts
- cell_link_manager: permissions schema {inbound, outbound} per service;
lazy migration for existing entries; update_permissions(), get_permissions();
apply_cell_rules wired into add_connection/remove_connection
- routes/cells.py: GET /api/cells/services, GET+PUT /api/cells/<n>/permissions;
RuntimeError now returns 400 (not 500) from add_connection
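A minimal sketch of the tag-based rules; the service-to-port mapping
and exact iptables invocation are assumptions:

```python
import subprocess

def _cell_tag(name: str) -> str:
    return f"pic-cell-{name}"  # distinct from 'pic-peer-*' tags

def apply_cell_rules(name: str, vpn_subnet: str, ports: list[int]) -> None:
    tag = _cell_tag(name)
    for port in ports:
        subprocess.run(
            ["iptables", "-A", "FORWARD", "-s", vpn_subnet,
             "-p", "tcp", "--dport", str(port),
             "-m", "comment", "--comment", tag, "-j", "ACCEPT"],
            check=True)
```

clear_cell_rules() can then remove exactly these rules by matching the
comment tag, without touching 'pic-peer-*' rules.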
Removed broken 'test' cell (subnet 10.0.0.0/24 collided with local WG network).
Second PIC must use a distinct subnet (e.g. 10.0.1.0/24) before reconnecting.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>