5 Commits

Author SHA1 Message Date
roof c2d215ee2e fix: cross-cell routing for split-tunnel peers
Three related fixes for split-tunnel peers that need to reach connected cells:

1. apply_peer_rules/apply_all_peer_rules now accept wg_subnet (actual local VPN
   subnet) and cell_subnets (connected cells' vpn_subnets) parameters instead of
   hardcoding 10.0.0.0/24. All callers (startup, add_peer, update_peer,
   apply-enforcement endpoint) pass the real values.

2. Explicit ACCEPT rules are inserted into the FORWARD chain for each connected
   cell's subnet so split-tunnel peers (internet_access=False) can still reach
   connected cells via the wg0→wg0 path.

3. apply_ip_range in network_manager now loads cell_links.json and passes it
   to generate_corefile(), fixing a race where the bootstrap DNS thread could
   overwrite the Corefile and wipe cross-cell DNS forwarding zones on startup.
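Items 1 and 2 can be sketched as a small rule builder. `forward_accept_rules` is a hypothetical helper, not the real apply_peer_rules(). It only builds iptables argument lists and never executes them, and the wg0 interface name and the `-I FORWARD` placement are assumptions. It shows how taking wg_subnet and cell_subnets as parameters removes the hardcoded 10.0.0.0/24.

```python
import ipaddress
from typing import List


def forward_accept_rules(peer_ip: str, wg_subnet: str, cell_subnets: List[str]) -> List[List[str]]:
    """Hypothetical helper: build iptables argv lists that let one split-tunnel
    peer reach the local VPN subnet and every connected cell's subnet over the
    wg0 -> wg0 path. Nothing here is executed."""
    src = f"{ipaddress.ip_address(peer_ip)}/32"
    rules = []
    # Subnets come from the caller instead of a hardcoded 10.0.0.0/24.
    for dest in [wg_subnet, *cell_subnets]:
        # strict=False normalises host bits, e.g. "10.2.0.9/24" -> "10.2.0.0/24"
        net = ipaddress.ip_network(dest, strict=False)
        rules.append([
            # -I puts the ACCEPT ahead of any later restriction (assumed to exist)
            "iptables", "-I", "FORWARD",
            "-i", "wg0", "-o", "wg0",
            "-s", src, "-d", str(net),
            "-j", "ACCEPT",
        ])
    return rules
```

For example, `forward_accept_rules("10.0.0.7", "10.0.0.0/24", ["10.1.0.0/24"])` yields two rules, one for the local subnet and one for the connected cell.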

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 14:36:28 -04:00
roof dc2606541c feat: Phase 4 hardening — retry/backoff, loop detection, sync status UI + tests
Phase 4.1 — Retry/backoff for failed permission pushes:
- _compute_next_retry(): capped exponential backoff with jitter (60s–1h)
- _record_push_result(): tracks push_attempts and next_retry_at per link
- replay_pending_pushes(): skips links still in backoff window, logs deferred count
- _load() migration: adds push_attempts/next_retry_at to existing records
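A minimal sketch of capped exponential backoff with jitter, assuming the 60 s to 1 h window quoted above, a doubling delay, and ±10% jitter (the doubling factor and jitter size are assumptions). `compute_next_retry`, `is_due` and `split_due` are illustrative stand-ins for _compute_next_retry() and the backoff check in replay_pending_pushes(), not their actual code.

```python
import random
import time
from typing import List, Optional, Tuple

MIN_DELAY = 60.0     # seconds (lower bound from the commit message)
MAX_DELAY = 3600.0   # seconds (upper bound from the commit message)
JITTER = 0.1         # +/-10% jitter (assumed value)


def compute_next_retry(push_attempts: int, now: Optional[float] = None,
                       rng: Optional[random.Random] = None) -> float:
    """Return the epoch time of the next retry after `push_attempts` failures."""
    now = time.time() if now is None else now
    uniform = rng.uniform if rng is not None else random.uniform
    # 60, 120, 240, ... seconds; the exponent is capped so the number stays small
    exponent = min(max(push_attempts - 1, 0), 10)
    delay = min(MAX_DELAY, MIN_DELAY * 2 ** exponent)
    # Jitter keeps many failing links from retrying in lockstep
    delay *= 1 + uniform(-JITTER, JITTER)
    # Clamp back into the 60 s to 1 h window after jitter
    return now + max(MIN_DELAY, min(MAX_DELAY, delay))


def is_due(link: dict, now: float) -> bool:
    """A link is retried only once its backoff window has elapsed."""
    next_retry_at = link.get("next_retry_at")
    return next_retry_at is None or now >= next_retry_at


def split_due(links: List[dict], now: float) -> Tuple[List[dict], int]:
    """Return the links to replay now and how many were deferred."""
    due = [link for link in links if is_due(link, now)]
    return due, len(links) - len(due)
```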

Phase 4.2 — Loop detection (A→B→A routing cycle):
- set_peer_route_via(): returns 409 if the target cell already routes its peers through us
- apply_remote_permissions(): issues a soft warning when accepting an exit-relay request that would create a cycle
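One way the A→B→A check could work, assuming it reuses the remote_exit_relay_active flag that Phase 3 stores on a link when the other cell uses this cell as its exit relay. `check_route_via` is hypothetical, and returning 404 for an unknown cell is an assumption. Only the 409 case comes from the commit.

```python
from http import HTTPStatus
from typing import Dict, Tuple


def check_route_via(via_cell: str, cell_links: Dict[str, dict]) -> Tuple[int, str]:
    """Hypothetical pre-check for set_peer_route_via().

    remote_exit_relay_active (assumed semantics) is True on our link record
    when the other cell already routes its peers' traffic through us.
    """
    link = cell_links.get(via_cell)
    if link is None:
        return HTTPStatus.NOT_FOUND, f"unknown cell {via_cell!r}"  # assumed status
    if link.get("remote_exit_relay_active"):
        # Routing A -> B while B -> A would bounce traffic between the two cells
        return HTTPStatus.CONFLICT, f"{via_cell!r} already routes peers through this cell"
    return HTTPStatus.OK, "ok"
```

HTTPStatus members are ints, so a Flask view could return the pair directly as `(body, status)` after reordering.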

Phase 4.3 — Sync staleness indicator in Cell Network UI:
- SyncBadge component: green (synced), amber (pending/failed), gray (never)
- Shows the relativeTime of the last sync, the last error message, and a next-retry estimate
- Injected into CellPanel header alongside tunnel online/handshake status
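The badge logic boils down to a status-to-colour mapping plus a coarse relative-time label. SyncBadge itself is a front-end component. This Python sketch only illustrates that logic, and the status strings ("synced", "pending", "failed") and the time buckets are assumptions.

```python
from datetime import datetime
from typing import Optional


def badge_color(status: Optional[str]) -> str:
    """Map a link's sync status to a badge colour (status names assumed)."""
    if status == "synced":
        return "green"
    if status in ("pending", "failed"):
        return "amber"
    return "gray"  # never synced


def relative_time(then: datetime, now: datetime) -> str:
    """Coarse "N units ago" label, similar in spirit to a relativeTime helper."""
    seconds = int((now - then).total_seconds())
    if seconds < 60:
        return "just now"
    if seconds < 3600:
        return f"{seconds // 60}m ago"
    if seconds < 86400:
        return f"{seconds // 3600}h ago"
    return f"{seconds // 86400}d ago"
```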

Tests (54 new):
- TestCheckInviteConflicts: subnet overlap, domain conflict, exclude_cell (9 tests)
- TestPushInviteToRemote: success, 4xx, no endpoint, subprocess errors (7 tests)
- TestAcceptInviteNew: new cell, idempotent, healing dns/subnet changes (16 tests)
- TestAddConnectionMutualPairing: push-invite call, non-fatal failure (5 tests)
- TestPeerSyncAcceptInvite endpoint: happy path, field validation, error propagation (16 tests)
- Fixed 2 existing replay tests to clear the backoff gate (simulating an elapsed window)
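The conflict cases exercised by TestCheckInviteConflicts (subnet overlap, domain conflict, exclude_cell) can be sketched with the standard `ipaddress` module. `find_invite_conflicts` and its signature are hypothetical. The record fields vpn_subnet and domain follow the names used elsewhere in this log, and the case-insensitive domain comparison is an assumption.

```python
import ipaddress
from typing import Dict, List, Optional


def find_invite_conflicts(subnet: str, domain: str, cells: Dict[str, dict],
                          exclude_cell: Optional[str] = None) -> List[str]:
    """Hypothetical check: list reasons an invite clashes with known cells."""
    new_net = ipaddress.ip_network(subnet, strict=False)
    conflicts = []
    for name, cell in cells.items():
        if name == exclude_cell:
            continue  # e.g. re-accepting an invite from a cell we already know
        existing = ipaddress.ip_network(cell["vpn_subnet"], strict=False)
        if new_net.overlaps(existing):
            conflicts.append(f"subnet {new_net} overlaps {name} ({existing})")
        if cell.get("domain", "").lower() == domain.lower():
            conflicts.append(f"domain {domain} already used by {name}")
    return conflicts
```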

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 04:18:36 -04:00
roof 0e16d6968a fix: prevent test runs from corrupting live WG state; sync wg0.conf on IP change
Three fixes:

1. Extend the docker-exec safety guard in wireguard_manager to also check
   for 'wg_confs' in the config path.  When running unit tests on the host,
   the API uses /app/config/wireguard/wg0.conf (no wg_confs subdir), so the
   old check (path contains '/tmp/' or 'pytest') didn't fire, and _syncconf
   and related helpers were executing live 'docker exec cell-wireguard wg set'
   calls against the running container, removing real VPN peers that didn't
   appear in the test config.  The wg_confs subdir only exists inside the
   container mount, so its presence reliably gates live calls.

2. Fix the wrong path in get_split_tunnel_ips(): self.data_dir + 'api/cell_links.json'
   → self.data_dir + 'cell_links.json'.  The extra 'api/' segment produced
   /app/data/api/cell_links.json inside the container instead of the real
   /app/data/cell_links.json, so connected cells were silently excluded from
   split-tunnel CIDRs.

3. update_peer_ip_registry and ip_update now also call
   wireguard_manager.update_peer_ip so wg0.conf AllowedIPs stay in sync when
   a peer's VPN IP changes at runtime (previously only peers.json was updated).
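The guard in fix 1 can be sketched as a path check in front of every live `docker exec`. `live_calls_allowed` and `wg_exec` are hypothetical names, and /config/wg_confs/wg0.conf below is only an illustrative in-container path.

```python
import subprocess
from typing import List


def live_calls_allowed(config_path: str) -> bool:
    """Hypothetical guard: only touch the running container when the config
    path looks like the real container mount."""
    if "/tmp/" in config_path or "pytest" in config_path:
        return False                  # the original test-path heuristics
    return "wg_confs" in config_path  # subdir only exists in the container mount


def wg_exec(args: List[str], config_path: str, container: str = "cell-wireguard") -> bool:
    """Run `docker exec <container> wg ...` unless the guard blocks it.
    Returns True only if the command was actually executed."""
    if not live_calls_allowed(config_path):
        return False
    subprocess.run(["docker", "exec", container, "wg", *args], check=True)
    return True
```

With this shape, `wg_exec(["show"], "/app/config/wireguard/wg0.conf")` returns False without spawning anything, which is the host-side test scenario described in fix 1.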

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 07:45:28 -04:00
roof 8ea834e108 feat: Phase 3 - per-peer internet routing via exit cell
Adds the ability to route a specific peer's internet traffic through a
connected cell acting as an exit relay.

Cell A side:
- PUT /api/peers/<peer>/route-via {"via_cell": "cellB"} sets route_via
- Updates WG AllowedIPs to include 0.0.0.0/0 for the exit cell peer
- Adds an ip rule and an ip route in a policy routing table inside the
  cell-wireguard container so that specific peer's traffic egresses via cellB's WG IP
- Sets exit_relay_active on the cell link and pushes use_as_exit_relay=True
  to cellB via peer-sync
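A sketch of the commands Cell A might run inside cell-wireguard for one routed peer. It assumes routing table 100 and the wg0 interface, and `route_via_commands` is a hypothetical builder that does not execute anything. The `wg set ... allowed-ips` and `ip rule` / `ip route ... table` forms are standard wg(8) and iproute2 syntax.

```python
import ipaddress
from typing import List


def route_via_commands(peer_ip: str, exit_pubkey: str, exit_allowed_ips: List[str],
                       exit_wg_ip: str, table: int = 100) -> List[List[str]]:
    """Hypothetical helper: argv lists that send one peer's internet traffic
    out through the exit cell. Table 100 is an assumed, otherwise unused table."""
    src = f"{ipaddress.ip_address(peer_ip)}/32"
    gateway = str(ipaddress.ip_address(exit_wg_ip))
    # Keep the exit cell's existing AllowedIPs and add the default route (deduped)
    allowed = list(dict.fromkeys([*exit_allowed_ips, "0.0.0.0/0"]))
    return [
        # WireGuard may now send any destination to the exit cell's peer entry
        ["wg", "set", "wg0", "peer", exit_pubkey, "allowed-ips", ",".join(allowed)],
        # Only packets *from* this peer consult the custom table
        ["ip", "rule", "add", "from", src, "lookup", str(table)],
        # The custom table's default route points at the exit cell's WG IP
        ["ip", "route", "replace", "default", "via", gateway, "dev", "wg0", "table", str(table)],
    ]
```

Because the rule matches on source address, other peers keep using the main table and are unaffected.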

Cell B side:
- Receives use_as_exit_relay in the peer-sync payload
- Calls apply_cell_rules(..., exit_relay=True) to add a FORWARD -o eth0 ACCEPT rule
- Stores remote_exit_relay_active flag for startup recovery
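On Cell B, the exit_relay flag only has to add or remove one FORWARD rule. This sketch makes that idempotent with `iptables -C`, which exits 0 when an identical rule already exists. `sync_exit_relay_rule` is hypothetical, and matching on `-i wg0 -s <remote subnet>` is an assumption beyond the `-o eth0 ACCEPT` part named in the commit. The runner is injectable so the logic can be tested without root.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the exit status (0 = success).
Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    return subprocess.run(list(cmd), check=False).returncode


def exit_relay_spec(remote_subnet: str) -> List[str]:
    # Assumed match: traffic from the remote cell over wg0, leaving via eth0
    return ["FORWARD", "-i", "wg0", "-s", remote_subnet, "-o", "eth0", "-j", "ACCEPT"]


def sync_exit_relay_rule(remote_subnet: str, exit_relay: bool, runner: Runner = run) -> None:
    """Hypothetical helper: make the FORWARD -o eth0 ACCEPT rule match the flag.
    Checking first means repeated calls (e.g. on every startup) never stack
    duplicate rules."""
    spec = exit_relay_spec(remote_subnet)
    present = runner(["iptables", "-C", *spec]) == 0
    if exit_relay and not present:
        runner(["iptables", "-A", *spec])
    elif not exit_relay and present:
        runner(["iptables", "-D", *spec])
```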

Startup recovery:
- apply_all_cell_rules passes exit_relay=remote_exit_relay_active (cellB)
- _apply_startup_enforcement reapplies ip rule for each peer with route_via (cellA)
  since policy routing rules don't survive container restart
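A sketch of the Cell A half of startup recovery. It walks the peer records, skips peers without route_via, and re-emits the `ip rule` for the rest. `startup_route_via_rules`, the peer `ip`/`name` fields and table 100 are assumptions. exit_relay_active is the link flag named above. Skipping peers whose link is gone or inactive is one possible policy, not necessarily the project's.

```python
import logging
from typing import Dict, Iterator, List

log = logging.getLogger(__name__)


def startup_route_via_rules(peers: List[dict], cell_links: Dict[str, dict],
                            table: int = 100) -> Iterator[List[str]]:
    """Hypothetical helper: yield the `ip rule` commands to re-create after a
    container restart for every peer that has route_via set."""
    for peer in peers:
        via = peer.get("route_via")
        if not via:
            continue  # normal peer, uses the main routing table
        link = cell_links.get(via)
        if not link or not link.get("exit_relay_active"):
            # Exit cell removed or relaying switched off while we were down
            log.warning("peer %s: route_via=%s has no active exit link, skipping",
                        peer.get("name"), via)
            continue
        yield ["ip", "rule", "add", "from", f"{peer['ip']}/32", "lookup", str(table)]
```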

peer_registry gets a route_via field, with lazy migration for existing records.
22 new tests across test_cell_link_manager, test_peer_registry, test_peer_route_via.
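"Lazy migration" here can be read as filling in missing fields when records are read, rather than rewriting the file up front. This sketch assumes peers are stored as a JSON list of dicts. `load_peers` and `PEER_DEFAULTS` are hypothetical names.

```python
import copy
import json
from typing import Any, Dict, List

# Fields added after the first release, with the value old records should get.
PEER_DEFAULTS: Dict[str, Any] = {"route_via": None}


def load_peers(raw: str) -> List[dict]:
    """Hypothetical loader: add missing fields on read. The stored file only
    gains the new field the next time the registry is saved."""
    peers = json.loads(raw)
    for peer in peers:
        for key, default in PEER_DEFAULTS.items():
            # deepcopy so a mutable default is never shared between records
            peer.setdefault(key, copy.deepcopy(default))
    return peers
```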

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 16:23:31 -04:00
roof 09138fbc18 A5: Extract all route groups into Flask blueprints (app.py -1735 lines)
Extract 9 route groups out of app.py into routes/ blueprints:
- routes/network.py  — DNS, DHCP, NTP, network info/test (10 routes)
- routes/wireguard.py — WireGuard keys, peers, config, enforcement (18 routes)
- routes/cells.py    — cell-to-cell connections (5 routes)
- routes/peers.py    — peer CRUD + IP update + _next_peer_ip helper (10 routes)
- routes/routing.py  — NAT, peer routes, firewall, iptables (17 routes)
- routes/vault.py    — certs, trust, secrets (19 routes)
- routes/containers.py — containers, images, volumes (14 routes)
- routes/services.py — service bus, logs, services status/connectivity (18 routes)
- routes/peer_dashboard.py — peer-scoped dashboard/services (2 routes)

All blueprints use lazy `from app import X` inside route bodies to preserve
test patch compatibility (patch('app.email_manager', mock) still works).
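Why the lazy import matters: `patch('app.email_manager', ...)` replaces the attribute on the `app` module, so only code that looks the name up at call time sees the mock. This self-contained demo fakes an `app` module in `sys.modules`. The module contents are placeholders, not the real application.

```python
import sys
import types
from unittest.mock import MagicMock, patch

# Stand-in for the real app module; the attribute value is a placeholder.
app = types.ModuleType("app")
app.email_manager = "real email manager"
sys.modules["app"] = app

# Eager style: the name is bound once, when this module is imported.
from app import email_manager as eager_email_manager  # noqa: E402


def eager_route():
    return eager_email_manager


def lazy_route():
    # Lazy style: `from app import X` inside the body re-reads app.X on every call.
    from app import email_manager
    return email_manager


with patch("app.email_manager", MagicMock(name="email_manager")) as mocked:
    print(lazy_route() is mocked)   # True: sees the patched attribute
    print(eager_route() is mocked)  # False: still holds the original object
```

The trade-off is a dictionary lookup per request in exchange for keeping every existing `patch('app.X')` test valid after the move to blueprints.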

Also included in this commit:
- A1 fix: backup/restore now includes email/calendar user files
- A2 fix: apply_config sets applying=True flag via helper container
- A3 fix: add_peer rolls back firewall on DNS failure
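The A3 fix follows a common compensate-on-failure pattern. If a later step fails, undo the earlier step that already succeeded, then re-raise. This is a minimal sketch with injected callables. The `add_peer` signature here is hypothetical and much simpler than the real one.

```python
from typing import Callable


def add_peer(name: str,
             apply_firewall: Callable[[str], None],
             remove_firewall: Callable[[str], None],
             add_dns_record: Callable[[str], None]) -> None:
    """Hypothetical shape of the A3 fix: if the DNS step fails after the
    firewall rules were applied, undo the firewall step before re-raising."""
    apply_firewall(name)
    try:
        add_dns_record(name)
    except Exception:
        remove_firewall(name)  # roll back the step that already succeeded
        raise                  # let the caller see the original DNS error
```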

app.py reduced: 3011 → 1294 lines. 1021 tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 06:11:21 -04:00