When a cell is connected to others, changing the local WireGuard address
or Docker ip_range to a subnet that overlaps a connected cell's vpn_subnet
would break routing. Both now return 409 with the conflicting cell name.
- wireguard.address: derive network from new address, check all connected
cells' vpn_subnet for overlap (after existing format validation)
- ip_range: check all connected cells' vpn_subnet for overlap (after
existing RFC-1918 validation)
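A minimal sketch of the overlap check, using Python's ipaddress module;
the connected-cells record shape (name, vpn_subnet) is an assumption,
not the exact implementation:

```python
import ipaddress

def find_vpn_subnet_conflict(new_address, connected_cells):
    """Return the name of the first connected cell whose vpn_subnet
    overlaps the network derived from new_address, else None."""
    # e.g. "10.0.0.1/24" -> 10.0.0.0/24
    new_net = ipaddress.ip_interface(new_address).network
    for cell in connected_cells:
        if new_net.overlaps(ipaddress.ip_network(cell["vpn_subnet"])):
            return cell["name"]  # caller responds 409 naming this cell
    return None
```

The ip_range path runs the same overlaps() test, with the new range
passed straight to ip_network().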
Tests: 4 cases each (overlap → 409, no overlap → ok, no cells → ok,
format error still fires first → 400).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two gaps allowed a cell to take a domain already in use by a connected cell:
1. PUT /api/config domain change: added a check against cell_link_manager's
connected-cells list before saving; returns 409 if the new domain
collides with any connected cell's domain (sketched after this list).
2. accept_invite healing path: a remote cell changing its domain via a
re-invite was not validated against other connected cells' domains.
Now calls _check_invite_conflicts(invite, exclude_cell=name) before
applying any change.
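A minimal sketch of the check in (1); get_connections() and the record
fields are assumptions about cell_link_manager's shape:

```python
def find_domain_conflict(new_domain, cell_link_manager):
    """Return the name of a connected cell already using new_domain,
    else None; the route responds 409 with that name."""
    for cell in cell_link_manager.get_connections():
        if cell.get("domain", "").lower() == new_domain.lower():
            return cell["name"]
    return None
```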
Also: the healing path now detects domain changes (alongside dns_ip/
vpn_subnet/endpoint), updates the stored domain, and refreshes the DNS
forward rule accordingly.
Tests: 3 new domain-conflict tests in test_config_validation.py;
3 new accept_invite healing tests in test_cell_link_manager.py.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three issues fixed together:
1. WireGuard address changes now go through the pending-restart queue
(shown in the UI banner) instead of restarting cell-wireguard immediately.
Only private_key changes still restart immediately; address and port
changes both defer to the user-initiated Apply flow. Previously the
address change was silently applied and never appeared in Settings →
Pending Configuration.
2. When the WG address changes, the API spawns a background thread that
pushes the updated invite to all connected cells (over LAN, before the
WG tunnel is back up). This lets remote cells automatically update
their dns_ip, AllowedIPs, and CoreDNS forwarding rules without manual
re-pairing (a sketch follows this list).
3. accept_invite now handles the "already connected but changed" case:
if the remote cell re-sends an invite with a different dns_ip, vpn_subnet,
or endpoint, we update the stored link, the WG AllowedIPs, and the
CoreDNS forward rule in place, with no delete/re-add required. Previously
such re-invites were ignored and the stale record was returned unchanged.
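A minimal sketch of the background push in (2), assuming each stored
link keeps the remote's LAN endpoint as "host:port" and that the remote
API listens on API_PORT; all names here are illustrative:

```python
import threading

import requests

API_PORT = 8080  # assumption: whatever port the cell API listens on

def push_invite_to_connected_cells(invite, cell_links):
    """Best-effort LAN push of the updated invite to every connected
    cell, run off the request thread so the API responds immediately."""
    def worker():
        for link in cell_links:
            host = link["endpoint"].split(":")[0]  # LAN IP, not the WG IP
            try:
                requests.post(
                    f"http://{host}:{API_PORT}/api/cells/peer-sync/accept-invite",
                    json=invite, timeout=10)
            except requests.RequestException:
                pass  # non-fatal; the remote heals on the next re-invite
    threading.Thread(target=worker, daemon=True).start()
```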
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three fixes:
1. Extend the docker-exec safety guard in wireguard_manager to also check
for 'wg_confs' in the config path. When running unit tests on the host,
the API uses /app/config/wireguard/wg0.conf (no wg_confs subdir), so the
old '/tmp/' | 'pytest' check didn't fire: _syncconf and friends were
executing live 'docker exec cell-wireguard wg set' calls against the
running container, removing real VPN peers that didn't appear in the
test config. The wg_confs subdir only exists inside the container mount,
so its presence reliably gates live calls (sketched after this list).
2. Fix get_split_tunnel_ips() wrong path: self.data_dir + 'api/cell_links.json'
→ self.data_dir + 'cell_links.json'. The extra 'api/' segment produced
/app/data/api/cell_links.json inside the container instead of the real
/app/data/cell_links.json, so connected cells were silently excluded from
split-tunnel CIDRs.
3. update_peer_ip_registry and ip_update now also call
wireguard_manager.update_peer_ip so wg0.conf AllowedIPs stay in sync when
a peer's VPN IP changes at runtime (previously only peers.json was updated).
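A sketch of the widened guard from fix 1; the actual predicate in
wireguard_manager may be shaped differently:

```python
def live_docker_exec_allowed(config_path: str) -> bool:
    """Gate live 'docker exec cell-wireguard wg set' calls."""
    path = str(config_path)
    # Test environments: tmp dirs and pytest paths never touch Docker.
    if "/tmp/" in path or "pytest" in path:
        return False
    # wg_confs/ only exists inside the container mount, so its presence
    # reliably marks a real deployment.
    return "wg_confs" in path
```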
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
**Auto mutual pairing**
When Cell A imports Cell B's invite (POST /api/cells on A), A now
immediately pushes its own invite to Cell B over the LAN (using the
endpoint IP, before the WG tunnel exists) via the new endpoint:
POST /api/cells/peer-sync/accept-invite
Cell B auto-adds Cell A as a WireGuard peer and DNS forward, completing
the bidirectional tunnel without any manual action on Cell B's UI.
The endpoint is idempotent and unauthenticated (it has to work before
the WG tunnel exists, so tunnel-based auth is unavailable).
Previously, the pairing was one-sided: Cell A had Cell B as a WG peer
but Cell B never had Cell A — the tunnel never established and all
cross-cell operations silently failed.
**Conflict detection (add_connection + accept-invite)**
_check_invite_conflicts() now validates before connecting:
- VPN subnet must not overlap own subnet or any already-connected cell's subnet
- Domain must not match own domain or any already-connected cell's domain
Returns clear error messages so the admin knows which cell to reconfigure.
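A minimal sketch of _check_invite_conflicts, assuming the manager holds
its own network/domain plus a name → link dict; exclude_cell is the
parameter the healing path (above) passes so a re-inviting cell does
not conflict with its own old record:

```python
import ipaddress

def _check_invite_conflicts(self, invite, exclude_cell=None):
    """Return an error string on conflict, else None."""
    new_net = ipaddress.ip_network(invite["vpn_subnet"])
    if new_net.overlaps(self.own_vpn_network):
        return "vpn_subnet overlaps this cell's own WireGuard network"
    if invite["domain"] == self.own_domain:
        return "domain matches this cell's own domain"
    for name, link in self.links.items():
        if name == exclude_cell:
            continue
        if new_net.overlaps(ipaddress.ip_network(link["vpn_subnet"])):
            return f"vpn_subnet overlaps connected cell '{name}'"
        if invite["domain"] == link.get("domain"):
            return f"domain already used by connected cell '{name}'"
    return None
```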
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cross-cell DNS and service routing:
- DNS A records now return the WireGuard server IP (10.0.0.1) instead of
Docker bridge VIPs, so cross-cell peers resolve service names correctly
regardless of their bridge subnet.
- DNAT rules (wg0:53→cell-dns:53 and wg0:80→cell-caddy:80) are applied
at startup (sketched below); Caddy routes by Host header, eliminating
the Docker bridge subnet conflict.
- Firewall cell rules allow DNS and service (Caddy) traffic from linked
cell subnets.
- Split-tunnel AllowedIPs now dynamically includes connected-cell VPN
subnets and drops the 172.20.0.0/16 range.
- Peers with route_via set now receive full-tunnel config (0.0.0.0/0) so
all their traffic exits via the remote cell.
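A sketch of the startup DNAT step, assuming the rules are installed
inside cell-wireguard (where wg0 lives) and the cell-dns/cell-caddy
bridge IPs are resolved beforehand; the exact invocation is an
assumption:

```python
import subprocess

def apply_startup_dnat(dns_ip: str, caddy_ip: str) -> None:
    rules = [
        ("udp", "53", f"{dns_ip}:53"),    # wg0:53 -> cell-dns:53
        ("tcp", "53", f"{dns_ip}:53"),
        ("tcp", "80", f"{caddy_ip}:80"),  # wg0:80 -> cell-caddy:80
    ]
    for proto, port, dest in rules:
        subprocess.run(
            ["docker", "exec", "cell-wireguard",
             "iptables", "-t", "nat", "-A", "PREROUTING",
             "-i", "wg0", "-p", proto, "--dport", port,
             "-j", "DNAT", "--to-destination", dest],
            check=True)
```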
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the ability to route a specific peer's internet traffic through a
connected cell acting as an exit relay.
Cell A side:
- PUT /api/peers/<peer>/route-via {"via_cell": "cellB"} sets route_via
- Updates WG AllowedIPs to include 0.0.0.0/0 for the exit cell peer
- Adds an ip rule + ip route in a policy table inside cell-wireguard so
the specific peer's traffic egresses via cellB's WG IP (sketched below)
- Sets exit_relay_active on the cell link and pushes use_as_exit_relay=True
to cellB via peer-sync
Cell B side:
- Receives use_as_exit_relay in the peer-sync payload
- Calls apply_cell_rules(..., exit_relay=True) to add FORWARD -o eth0 ACCEPT
- Stores remote_exit_relay_active flag for startup recovery
Startup recovery:
- apply_all_cell_rules passes exit_relay=remote_exit_relay_active (cellB)
- _apply_startup_enforcement reapplies ip rule for each peer with route_via (cellA)
since policy routing rules don't survive container restart
peer_registry gets route_via field with lazy migration.
22 new tests across test_cell_link_manager, test_peer_registry, test_peer_route_via.
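A minimal sketch of the Cell A policy-routing step; the table number
and exact commands are assumptions:

```python
import subprocess

POLICY_TABLE = "51821"  # assumption: a dedicated routing table

def route_peer_via_cell(peer_ip: str, exit_cell_wg_ip: str) -> None:
    # Traffic *from* this peer is looked up in the dedicated table...
    subprocess.run(
        ["docker", "exec", "cell-wireguard",
         "ip", "rule", "add", "from", peer_ip, "table", POLICY_TABLE],
        check=True)
    # ...whose default route egresses via the exit cell's WG IP.
    subprocess.run(
        ["docker", "exec", "cell-wireguard",
         "ip", "route", "replace", "default",
         "via", exit_cell_wg_ip, "dev", "wg0", "table", POLICY_TABLE],
        check=True)
```

_apply_startup_enforcement re-runs this per route_via peer at boot,
since these rules live only in the container's network namespace.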
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the ability for a cell to signal to a peer that it's willing to
route internet traffic on their behalf. This is the signaling layer
for Phase 3 (per-peer routing via exit cell).
Changes:
- cell_links.json: exit_offered (bool) + remote_exit_offered (bool)
fields with lazy migration (default false for existing records)
- _push_permissions_to_remote: includes exit_offered in the push body
- apply_remote_permissions: accepts exit_offered kwarg; stores it as
remote_exit_offered on the matching cell link
- peer-sync receiver: passes exit_offered from body to apply_remote_permissions
- CellLinkManager.set_exit_offered(cell_name, offered): persists and
triggers a push so the remote learns of our offer immediately (sketched
after this list)
- PUT /api/cells/<name>/exit-offer: REST endpoint to toggle the flag
- 12 new tests covering all new paths
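A minimal sketch of set_exit_offered; the persistence helper and link
store are assumptions:

```python
def set_exit_offered(self, cell_name: str, offered: bool) -> None:
    link = self.links.get(cell_name)
    if link is None:
        raise KeyError(f"unknown cell: {cell_name}")
    link["exit_offered"] = bool(offered)
    self._save_links()  # persist the flag
    # Push immediately so the remote stores it as remote_exit_offered.
    self._push_permissions_to_remote(cell_name)
```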
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When PIC A updates service sharing permissions, it immediately pushes
the mirrored state to PIC B over the WireGuard tunnel so B's UI shows
what A is sharing with it in real time.
Architecture:
- Push model: update_permissions() → _push_permissions_to_remote() →
POST /api/cells/peer-sync/permissions on remote cell
- Auth: source IP must be inside a known cell's vpn_subnet (the WireGuard
tunnel proves identity) and the body's from_public_key must match the
stored key (sketched below)
- Mirror semantics: our inbound (what we share) → their outbound view
- Non-fatal: push failures set pending_push=True; replay_pending_pushes()
retries at startup so offline cells catch up on reconnect
- add_connection() also pushes initial state so remote sees permissions
immediately on the first connect
New fields on cell_links.json records (lazy-migrated):
remote_api_url, last_push_status, last_push_at, last_push_error,
pending_push, last_remote_update_at
New endpoint: POST /api/cells/peer-sync/permissions
30 new tests (1101 total).
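A minimal sketch of the auth check, assuming each link record stores
vpn_subnet and the remote's public key:

```python
import ipaddress

def authenticate_peer_sync(source_ip, from_public_key, cell_links):
    """Return the matching cell link, or None if auth fails."""
    addr = ipaddress.ip_address(source_ip)
    for link in cell_links:
        if (addr in ipaddress.ip_network(link["vpn_subnet"])
                and from_public_key == link["public_key"]):
            return link
    return None
```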
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phase 1 — connection fixes:
- routing_manager.stop(): remove iptables -F / -t nat -F nuclear flush that
would wipe WireGuard MASQUERADE and all peer rules on any UI stop action
- wireguard_manager.add_cell_peer(): reject vpn_subnet that overlaps the local
WG network (a routing blackhole, and the root cause of the missing handshake)
- wireguard_manager._syncconf(): pass Endpoint to 'wg set' so cell peers with
static endpoints are synced to the kernel (not just AllowedIPs)
Phase 2 — service-sharing permissions backend:
- firewall_manager: add _cell_tag(), clear_cell_rules(), apply_cell_rules(),
apply_all_cell_rules(): iptables FORWARD rules for cell-to-cell traffic
using 'pic-cell-<name>' comment tags, distinct from 'pic-peer-*'
(apply_cell_rules sketched after this list)
- app.py startup enforcement: call apply_all_cell_rules(cell_links) so rules
survive API restarts
- cell_link_manager: permissions schema {inbound, outbound} per service;
lazy migration for existing entries; update_permissions(), get_permissions();
apply_cell_rules wired into add_connection/remove_connection
- routes/cells.py: GET /api/cells/services, GET+PUT /api/cells/<n>/permissions;
RuntimeError now returns 400 (not 500) from add_connection
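A minimal sketch of the tag-based rules; the service-to-port mapping
and exact iptables invocation are assumptions:

```python
import subprocess

def _cell_tag(name: str) -> str:
    return f"pic-cell-{name}"  # distinct from 'pic-peer-*' tags

def apply_cell_rules(name: str, vpn_subnet: str, ports: list[int]) -> None:
    tag = _cell_tag(name)
    for port in ports:
        subprocess.run(
            ["iptables", "-A", "FORWARD", "-s", vpn_subnet,
             "-p", "tcp", "--dport", str(port),
             "-m", "comment", "--comment", tag, "-j", "ACCEPT"],
            check=True)
```

clear_cell_rules() can then remove exactly these rules by matching the
comment tag, without touching 'pic-peer-*' rules.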
Removed broken 'test' cell (subnet 10.0.0.0/24 collided with local WG network).
Second PIC must use a distinct subnet (e.g. 10.0.1.0/24) before reconnecting.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>