Commit Graph

10 Commits

Author SHA1 Message Date
roof 69862331e7 fix: DDNS update token in body, webdav gating, regression tests
Unit Tests / test (push) Successful in 7m25s
- PicNgoDDNS.update(): send token in request body instead of Authorization
  header; DDNS server validates it from body (was returning HTTP 422 on
  every heartbeat, leaving IP record stale after fresh install)
- peers.py / Peers.jsx: webdav service_access only valid when 'files' store
  service is installed; was always shown even with no services, confusing
  users into thinking WebDAV was pre-installed
- 10 new regression tests: DDNS update body contract, Caddy always
  regenerates on startup with no services, peer role allowed on
  /api/services/active, webdav gating by installed services

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:56:12 -04:00
roof c696ca9ef6 fix: DNS split-horizon in DDNS mode, service access filter, health check, verbosity persistence
Unit Tests / test (push) Successful in 7m32s
- DNS (critical): add _configured_dns_params() that returns (primary_domain,
  split_horizon_zones) from config_manager so all apply_all_dns_rules() callers
  pass the correct primary zone (e.g. 'pic.ngo') and split-horizon list
  (e.g. ['pic1.pic.ngo']) instead of the FQDN as the primary — fixes
  DNS_PROBE_FINISHED_BAD_CONFIG for all external domains when on VPN

- firewall_manager: add split_horizon_zones param to apply_all_dns_rules()
  and forward it to generate_corefile()

- Peers: filter service_access list to installed services only; peers.py
  derives valid services from config_manager.get_installed_services() with
  the email→mail ID mapping; Peers.jsx fetches from /api/store/installed
  and filters the checkboxes and defaults accordingly

- Health check: fix file_manager→'files' ID mapping so files service health
  is checked when installed (was silently skipped due to 'file' vs 'files')

- Verbosity persistence: move log_levels.json from non-mounted
  /app/api/config/ to CONFIG_DIR (/app/config/) which maps to config/api/
  on the host; both load (managers.py) and save (routes/services.py) updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 13:05:58 -04:00
roof eee0e800aa feat: add GET /api/peers/<peer_name> endpoint
Unit Tests / test (push) Successful in 11m19s
Allows fetching a single peer by name. E2E tests need this to verify
persisted peer state after PUT operations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 05:19:10 -04:00
roof 41d09c598b wire: AccountManager HTTP dispatch + EgressManager startup + egress API routes
Unit Tests / test (push) Successful in 11m15s
- add_peer() now calls account_manager.provision() for any installed store
  service whose manifest declares accounts.manager == 'http', enabling
  per-peer credential provisioning to third-party HTTP services
- reapply_on_startup() calls egress_manager.apply_all() so fwmark rules
  survive container restarts without manual intervention
- add GET /api/egress/status and PUT /api/egress/services/<id>/exit routes
  so the UI can read and override per-service egress policy
- tests: HTTP provision wiring (happy path + non-fatal failure), egress
  apply_all at startup (wired/unwired/failure cases)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 10:30:41 -04:00
roof 16fb362df7 feat: replace hardcoded service names with ServiceRegistry-driven Caddy and CoreDNS config
Unit Tests / test (push) Failing after 11s
Previously, CaddyManager and NetworkManager contained hardcoded lists of
service names (calendar, files, mail, webdav, etc.), meaning every new
service required a code change to appear in Caddy routes and DNS records.
Now both managers accept a service_registry parameter and derive their
service lists dynamically from the registry at runtime.

- CaddyManager: new _build_registry_service_routes() and
  _http01_service_pairs() methods pull routes from the registry
- NetworkManager: new _get_service_subdomains() method returns registry
  subdomains with a hardcoded fallback when no registry is wired in;
  _build_dns_records, stale-record detection, and service name sets all
  use the registry
- managers.py: service_registry constructed before network_manager so it
  can be injected into both CaddyManager and NetworkManager
- service_registry.py: validation chokepoint in get_caddy_routes() rejects
  invalid subdomain/backend values and reserved service names
- service_store_manager.py: _validate_manifest now validates top-level
  subdomain, backend, extra_subdomains, and extra_backends fields
- tests: 24 new tests covering registry-driven routing and DNS subdomain
  generation (test_caddy_registry_integration.py)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 18:27:52 -04:00
roof c2d215ee2e fix: cross-cell routing for split-tunnel peers
Three related fixes for split-tunnel peers that need to reach connected cells:

1. apply_peer_rules/apply_all_peer_rules now accept wg_subnet (actual local VPN
   subnet) and cell_subnets (connected cells' vpn_subnets) parameters instead of
   hardcoding 10.0.0.0/24. All callers (startup, add_peer, update_peer,
   apply-enforcement endpoint) pass the real values.

2. Explicit ACCEPT rules are inserted in FORWARD for each connected cell's
   subnet so split-tunnel peers (internet_access=False) can still reach
   connected cells via the wg0→wg0 path.

3. apply_ip_range in network_manager now loads cell_links.json and passes it
   to generate_corefile(), fixing a race where the bootstrap DNS thread could
   overwrite the Corefile and wipe cross-cell DNS forwarding zones on startup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 14:36:28 -04:00
roof dc2606541c feat: Phase 4 hardening — retry/backoff, loop detection, sync status UI + tests
Phase 4.1 — Retry/backoff for failed permission pushes:
- _compute_next_retry(): capped exponential backoff with jitter (60s–1h)
- _record_push_result(): tracks push_attempts and next_retry_at per link
- replay_pending_pushes(): skips links still in backoff window, logs deferred count
- _load() migration: adds push_attempts/next_retry_at to existing records

Phase 4.2 — Loop detection (A→B→A routing cycle):
- set_peer_route_via(): returns 409 if target cell already routes peers through us
- apply_remote_permissions(): soft warning when accepting exit-relay that would cycle

Phase 4.3 — Sync staleness indicator in Cell Network UI:
- SyncBadge component: green (synced), amber (pending/failed), gray (never)
- Shows relativeTime of last sync + error message + next retry estimate
- Injected into CellPanel header alongside tunnel online/handshake status

Tests (54 new):
- TestCheckInviteConflicts: subnet overlap, domain conflict, exclude_cell (9 tests)
- TestPushInviteToRemote: success, 4xx, no endpoint, subprocess errors (7 tests)
- TestAcceptInviteNew: new cell, idempotent, healing dns/subnet changes (16 tests)
- TestAddConnectionMutualPairing: push-invite call, non-fatal failure (5 tests)
- TestPeerSyncAcceptInvite endpoint: happy path, field validation, error propagation (16 tests)
- Fixed 2 existing replay tests to clear backoff gate (simulates elapsed window)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 04:18:36 -04:00
roof 0e16d6968a fix: prevent test runs from corrupting live WG state; sync wg0.conf on IP change
Three fixes:

1. Extend the docker-exec safety guard in wireguard_manager to also check
   for 'wg_confs' in the config path.  When running unit tests on the host
   the API uses /app/config/wireguard/wg0.conf (no wg_confs subdir), so the
   old '/tmp/' | 'pytest' check didn't fire — _syncconf and friends were
   executing live 'docker exec cell-wireguard wg set' calls against the
   running container, removing real VPN peers that didn't appear in the
   test config.  The wg_confs subdir only exists inside the container mount,
   so its presence reliably gates live calls.

2. Fix get_split_tunnel_ips() wrong path: self.data_dir + 'api/cell_links.json'
   → self.data_dir + 'cell_links.json'.  The extra 'api/' segment produced
   /app/data/api/cell_links.json inside the container instead of the real
   /app/data/cell_links.json, so connected cells were silently excluded from
   split-tunnel CIDRs.

3. update_peer_ip_registry and ip_update now also call
   wireguard_manager.update_peer_ip so wg0.conf AllowedIPs stay in sync when
   a peer's VPN IP changes at runtime (previously only peers.json was updated).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 07:45:28 -04:00
roof 8ea834e108 feat: Phase 3 - per-peer internet routing via exit cell
Adds the ability to route a specific peer's internet traffic through a
connected cell acting as an exit relay.

Cell A side:
- PUT /api/peers/<peer>/route-via {"via_cell": "cellB"} sets route_via
- Updates WG AllowedIPs to include 0.0.0.0/0 for the exit cell peer
- Adds ip rule + ip route in policy table inside cell-wireguard so the
  specific peer's traffic egresses via cellB's WG IP
- Sets exit_relay_active on the cell link and pushes use_as_exit_relay=True
  to cellB via peer-sync

Cell B side:
- Receives use_as_exit_relay in the peer-sync payload
- Calls apply_cell_rules(..., exit_relay=True) to add FORWARD -o eth0 ACCEPT
- Stores remote_exit_relay_active flag for startup recovery

Startup recovery:
- apply_all_cell_rules passes exit_relay=remote_exit_relay_active (cellB)
- _apply_startup_enforcement reapplies ip rule for each peer with route_via (cellA)
  since policy routing rules don't survive container restart

peer_registry gets route_via field with lazy migration.
22 new tests across test_cell_link_manager, test_peer_registry, test_peer_route_via.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 16:23:31 -04:00
roof 09138fbc18 A5: Extract all route groups into Flask blueprints (app.py -1735 lines)
Extract 9 route groups out of app.py into routes/ blueprints:
- routes/network.py  — DNS, DHCP, NTP, network info/test (10 routes)
- routes/wireguard.py — WireGuard keys, peers, config, enforcement (18 routes)
- routes/cells.py    — cell-to-cell connections (5 routes)
- routes/peers.py    — peer CRUD + IP update + _next_peer_ip helper (10 routes)
- routes/routing.py  — NAT, peer routes, firewall, iptables (17 routes)
- routes/vault.py    — certs, trust, secrets (19 routes)
- routes/containers.py — containers, images, volumes (14 routes)
- routes/services.py — service bus, logs, services status/connectivity (18 routes)
- routes/peer_dashboard.py — peer-scoped dashboard/services (2 routes)

All blueprints use lazy `from app import X` inside route bodies to preserve
test patch compatibility (patch('app.email_manager', mock) still works).

Also included in this commit:
- A1 fix: backup/restore now includes email/calendar user files
- A2 fix: apply_config sets applying=True flag via helper container
- A3 fix: add_peer rolls back firewall on DNS failure

app.py reduced: 3011 → 1294 lines. 1021 tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 06:11:21 -04:00