roof/pic

Author	SHA1	Message	Date
roof	16fb362df7	feat: replace hardcoded service names with ServiceRegistry-driven Caddy and CoreDNS config Unit Tests / test (push) Failing after 11s Details Previously, CaddyManager and NetworkManager contained hardcoded lists of service names (calendar, files, mail, webdav, etc.), meaning every new service required a code change to appear in Caddy routes and DNS records. Now both managers accept a service_registry parameter and derive their service lists dynamically from the registry at runtime. - CaddyManager: new _build_registry_service_routes() and _http01_service_pairs() methods pull routes from the registry - NetworkManager: new _get_service_subdomains() method returns registry subdomains with a hardcoded fallback when no registry is wired in; _build_dns_records, stale-record detection, and service name sets all use the registry - managers.py: service_registry constructed before network_manager so it can be injected into both CaddyManager and NetworkManager - service_registry.py: validation chokepoint in get_caddy_routes() rejects invalid subdomain/backend values and reserved service names - service_store_manager.py: _validate_manifest now validates top-level subdomain, backend, extra_subdomains, and extra_backends fields - tests: 24 new tests covering registry-driven routing and DNS subdomain generation (test_caddy_registry_integration.py) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 18:27:52 -04:00
roof	0afdee32da	feat: Services UI — nested nav, per-service pages, settings migration Rename Store → Services: ServicesIndex.jsx shows built-in core services (Email, Calendar, Files) with Manage links, plus the existing add-on store below. New service sub-pages at /services/email|calendar|files serve both admin and peer roles. Admins see connection info, service status, users list, and an inline config form (port/data-dir). Peers see connection info and their personal credentials fetched from peerAPI. Navigation restructured: a Services parent item expands to show the three sub-pages via a collapsible sidebar group (ChevronDown toggle). Both admin and peer navigation include the Services group. Sidebar extracted NavItem/NavList components to eliminate the duplicate mobile/desktop rendering. Settings.jsx drops EmailForm, CalendarForm, FilesForm and their SERVICE_DEFS entries. Port conflict detection and per-service validation logic extracted to utils/serviceConfig.js, shared by Settings and the new service pages. Service form flushers are registered without cleanup so the Apply banner saves dirty config even when the user navigates away from a service page before clicking Apply. Legacy routes /email, /calendar, /files, /store redirect to their new canonical paths. GET /api/config now includes installed_services so the nav can derive which add-ons are installed without a separate store fetch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 06:46:17 -04:00
roof	b16189d00f	Fix three DNS corruption bugs in DDNS/non-LAN mode Unit Tests / test (push) Successful in 11m30s Details apply_cell_name() now skips multi-label zone files (split-horizon DDNS zones like pic2.pic.ngo.zone) and excludes '*' and '@' from hostname candidate detection, preventing the wildcard record from being renamed to the old cell name during a cell rename. update_split_horizon_zone() now deletes stale zone files from previous cell names sharing the same TLD (e.g. pic3.pic.ngo.zone when renaming to pic2.pic.ngo), eliminating orphaned DNS entries. _bootstrap_dns() now detects non-LAN domain modes and calls update_split_horizon_zone() instead of apply_ip_range(), preventing service records (api, calendar, files…) from being re-injected into the DDNS parent zone on every container restart. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 05:56:00 -04:00
roof	66500bb128	fix: use effective_domain for service links and clean up stale DNS records Unit Tests / test (push) Successful in 11m32s Details Dashboard, Email, Calendar, and Files pages were building service URLs with the internal LAN zone name (e.g. 'cell') instead of the public effective domain (e.g. 'pic2.pic.ngo'), and always using http:// even in DDNS mode where HTTPS is available. Changes: - Dashboard/Email/Calendar/Files: read effective_domain + domain_mode from ConfigContext; use effective_domain in non-LAN mode and https:// for all DDNS domain modes. - Calendar: show port 443 instead of 80 in DDNS mode. - network_manager.update_split_horizon_zone: when the primary internal zone name is a parent of the effective DDNS domain (e.g. pic.ngo is a parent of pic2.pic.ngo), remove stale bootstrap service records (api, calendar, files, mail, webmail, webdav) that pollute the DNS display and would shadow public DNS responses. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 05:06:52 -04:00
roof	d7dbd596ab	feat: route PIC services as subdomains of the cell's effective domain Unit Tests / test (push) Successful in 11m33s Details In DDNS modes (pic_ngo, cloudflare, duckdns, http01), all built-in services are now reachable as subdomains of the cell domain, e.g. calendar.pic1.pic.ngo instead of pic1.pic.ngo/calendar. Key changes: - CaddyManager._build_core_service_routes(): new helper generates Caddy named-matcher host blocks for calendar, mail/webmail, files, webdav, and api subdomains within the wildcard TLS server block. - All ACME modes (pic_ngo, cloudflare, duckdns) use the new subdomain matchers; http01 emits a dedicated server block per service. - http01: installed store-plugin services whose name clashes with a core service are skipped to prevent duplicate server blocks. - routes/config.py: ip_utils.write_caddyfile() is skipped in non-LAN modes so LAN Caddy config never overwrites the ACME config. - firewall_manager.generate_corefile(): new split_horizon_zones param adds local authoritative file zones so LAN clients resolve *.pic1.pic.ngo to the internal Caddy IP without hairpin NAT. - NetworkManager.update_split_horizon_zone(): writes the wildcard zone file and regenerates the Corefile with the split-horizon block; called automatically after every identity change in non-LAN mode. - Added @ to allowed record-name chars in update_dns_zone validation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 04:31:57 -04:00
roof	1f016de855	feat: make DDNS domain_name the effective domain across all services Unit Tests / test (push) Successful in 11m35s Details - ConfigManager.get_effective_domain(): returns domain_name when DDNS active (pic_ngo/cloudflare/duckdns), domain otherwise. Used by all public-facing services so they use the real registered FQDN. - ConfigManager.get_internal_domain(): always returns _identity.domain (CoreDNS zone name, dnsmasq, cell-link invites — stays internal). - Silent migration: if domain_mode != lan and domain is generic "cell", auto-set to {cell_name}.local for unique CoreDNS zone naming. - caddy_manager: fix custom_domain bug — cloudflare/http01 modes were reading identity.get('custom_domain') which never exists; now reads domain_name correctly. - routes/config, app: expose effective_domain in GET /api/config and /api/status responses. - email_manager, routes/email: use get_effective_domain() for OVERRIDE_HOSTNAME, POSTMASTER_ADDRESS, and new-user email defaults. - ServiceBus.IDENTITY_CHANGED event: emitted from PUT /api/config and POST /api/ddns/register after identity writes; caddy_manager and email_manager subscribe to regenerate config automatically. - Settings.jsx: hide Local Domain input in non-LAN modes; show read-only effective_domain with "managed by DDNS" badge and an Advanced toggle for the internal CoreDNS zone name. - 11 new test classes covering all new helpers, event subscriptions, caddy/email handlers, and the custom_domain fix. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 02:48:47 -04:00
roof	ad2eaca273	feat: release old pic.ngo subdomain when cell name changes Unit Tests / test (push) Successful in 15m45s Details Adds DELETE /api/v1/registration to the DDNS server (token-authenticated, owner-only) and PicNgoDDNS.release() on the client. DDNSManager.register() now automatically releases the old subdomain before claiming the new one, so stale names are freed for others to use. Release failures are logged as warnings and do not block the new registration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 17:07:13 -04:00
roof	de43f4a9a0	fix: DDNS register() always sends public IP and saves token to correct location Unit Tests / test (push) Successful in 15m27s Details Two bugs that prevented registration from working after wizard completion: 1. register(name, '') sent empty IP; server stored blank A record. Now calls _get_public_ip() when ip is empty so the A record is always set correctly. 2. Token was saved to _identity.domain.ddns.token (TypeError when domain is a string) instead of the top-level ddns config where update_ip() reads it. Subdomain also now correctly written to _identity.domain_name. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 16:05:55 -04:00
roof	0b31d02f10	feat: DDNS self-healing heartbeat + manual re-register endpoint Unit Tests / test (push) Successful in 15m26s Details - DDNSTokenExpired exception triggers auto re-register in update_ip() so cells recover silently after a DDNS DB reset - POST /api/ddns/register lets the user force re-registration from Settings - Re-register button in Settings → External Domain & DDNS (pic_ngo only) - 3 new tests covering register endpoint: wrong provider, missing name, success Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 15:05:27 -04:00
roof	cde177966d	fix: DDNS URL env var takes priority; switch default to HTTPS - ddns_manager: DDNS_URL env var overrides stored api_base_url so existing cells pick up the new HTTPS endpoint without re-registering - docker-compose.yml: default DDNS_URL now points to https://ddns.pic.ngo - setup_manager.py: add rstrip('/') before replacing /api/v1 to handle URLs with or without trailing slash Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 14:50:28 -04:00
roof	61e8631c7d	feat: DDNS settings integration — check availability, update credentials - GET /api/config now returns domain_mode, domain_name, ddns.{provider,subdomain,has_token} - GET /api/ddns/check/<name> proxies availability check to DDNS service - PUT /api/ddns validates and saves cloudflare/duckdns credentials post-setup - When cell_name changes for pic_ngo provider, auto-registers the new subdomain - Settings: Cell Name shows availability badge for pic_ngo; auto-save blocks on taken - Settings: new External Domain & DDNS section — pic_ngo info, cloudflare/duckdns edit - 11 new tests for the two new endpoints (all pass) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 14:35:37 -04:00
roof	777ffa4fb2	fix: use DDNS_URL env var for availability check; default to port 8080 Unit Tests / test (push) Successful in 15m23s Details _check_pic_ngo_available was hardcoding https://ddns.pic.ngo, ignoring DDNS_URL. Now imports DDNS_API_BASE from setup_manager so both the availability check and DDNS registration use the same configured URL. API container now receives DDNS_URL and DDNS_TOTP_SECRET from env. Default DDNS_URL points to http://ddns.pic.ngo:8080/api/v1 (the FastAPI service runs on port 8080 without TLS termination in front). Also returns 503 (not 500) when the DDNS service is unreachable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 13:06:44 -04:00
roof	1c62c47475	fix: 500 on setup complete + wizard shows all 7 steps Unit Tests / test (push) Successful in 15m41s Details Two bugs: 1. AttributeError: AuthManager.update_password does not exist — the fallback when create_user fails should call set_password_admin(). This caused a 500 on every setup submit when an admin user already existed (e.g. from a previous install attempt). 2. Wizard was jumping to step 2 and skipping domain steps 3-4 when preconfigured data existed in cell_config.json. Since the installer no longer sets that data, and the wizard must always show all steps, the installerConfigured state and all step-skipping navigation is removed. Values are still pre-filled if found in config. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 16:41:33 -04:00
roof	4a42ff5dcc	wizard: move all config to /setup; install.sh is infrastructure-only Unit Tests / test (push) Successful in 15m41s Details install.sh no longer prompts for anything. It installs packages (with sudo), creates the system user, clones the repo, and runs 'make install' — all as the invoking user. Only package installs and system-level ops use sudo. All folder creation happens under the user's own account, no chown needed. /setup wizard gains the missing validation that was previously in install.sh: - Step 1: checks pic.ngo name availability via backend (non-blocking) - Step 4: 'Verify token' button for Cloudflare and DuckDNS tokens, validated server-side through new /api/setup/validate steps API changes (routes/setup.py): - validate step 'pic_ngo_available': proxy check to ddns.pic.ngo - validate step 'cloudflare_token': verify via Cloudflare tokens API - validate step 'duckdns_token': verify via DuckDNS update endpoint Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 16:07:56 -04:00
roof	2d842abe5b	installer: restore cell identity prompts and domain setup Unit Tests / test (push) Successful in 15m39s Details Reverts `8d1ef39`. The installer must collect cell name, domain mode, and provider tokens before 'make install' so that DDNS registration, availability checks, and Caddy TLS can be configured at first boot. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 15:01:32 -04:00
roof	f550f04ce2	Fix DDNS registration and wizard pre-fill after installer run Unit Tests / test (push) Successful in 15m29s Details DDNS registration (setup_cell.py): - Replace pyotp dependency with stdlib TOTP (HMAC-SHA1, RFC 6238) pyotp is only available inside the Docker container, not on the host where setup_cell.py runs — registration was silently skipped every time - OTP header still sent if generation succeeds; omitted gracefully if not Wizard pre-fill (setup_manager + Setup.jsx): - GET /api/setup/status now returns 'preconfigured' dict with cell_name, domain_mode, domain_name, and provider tokens from installer-written config - Setup.jsx fetches status on mount and pre-fills all form state so the user only needs to set password, services, and timezone — not re-enter the identity they already configured in the bash installer - Fails silently so wizard still works on fresh installs with no config Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 12:22:53 -04:00
roof	925ab1f696	Overhaul setup wizard: domain config, password strength, field alignment Unit Tests / test (push) Successful in 8m48s Details Password: - Add lowercase to strength scoring; "Good" now requires all API criteria (12 chars, upper, lower, digit) — no more submitting passwords the API rejects - isReady gates the Next button on meeting API requirements, not just length Domain steps 3 + 4: - Step 3: choose pic_ngo / custom / lan (sends valid API domain_modes) - Step 4 (pic.ngo): shows derived [cellName].pic.ngo domain preview - Step 4 (custom): domain name field + TLS method selector (Cloudflare DNS-01 + API token, DuckDNS + token, HTTP-01 + port-80 warning) - Step 4 skipped entirely for LAN-only - Review step shows actual domain string and TLS method instead of opaque codes Cell name: - Description and preview hint make clear it becomes the pic.ngo subdomain - Step 1 shows live "name.pic.ngo" preview as you type Backend: - setup_manager now accepts and stores domain_name, cloudflare_api_token, duckdns_token for Phase 3 DDNS registration use Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-11 07:27:59 -04:00
roof	24877df976	Fix setup wizard and installer for fresh-install flow Unit Tests / test (push) Successful in 8m53s Details - setup_manager: fall back to update_password if admin already exists (installer bootstrap creates admin; wizard now updates rather than fails) - install.sh: chown repo to SUDO_USER instead of pic user so the invoking operator can run make update without git safe.directory errors - test: update mock to also stub update_password when testing total auth failure Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-11 06:08:55 -04:00
roof	1989dfa0a3	Fix: exempt /api/setup/* from enforce_auth so setup wizard works on fresh install Unit Tests / test (push) Successful in 8m49s Details The setup wizard runs before any account exists, but the installer's setup_cell.py creates auth_users.json with an admin account first. This meant enforce_auth was active by the time the browser hit /setup, blocking all /api/setup/* calls with 401. The CSRF hook already exempted /api/setup/* — auth enforcement now matches. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-11 05:03:44 -04:00
roof	f1b48208fc	Fix CI unit test failures and DDNS config wiring Unit Tests / test (push) Failing after 8m58s Details - auth_manager._ensure_file(): stop creating the empty auth_users.json on init — the constructor now only creates the parent directory. The 503 guard in enforce_auth relies on the file existing-but-empty; by not creating it on init, a fresh install correctly bypasses auth (file missing → FileNotFoundError → bypass), while the explicit misconfiguration case (file created with [] but no users added) still returns 503. - test_enforce_auth_configured.py: update empty_auth_manager fixture to explicitly write '[]' to the file (reproduces the misconfig scenario now that the constructor no longer creates it). - ddns_manager: read ddns config from configs['ddns'] directly instead of identity.domain.ddns — _identity.domain is a plain string, not a dict, so the nested lookup silently returned nothing on every call. - setup_cell.py: write top-level 'ddns' block into cell_config.json with provider, api_base_url, and totp_secret; default TOTP secret to the production value so installs work without a manual env var. - test_ddns_manager.py: update _make_config_manager to populate cm.configs instead of mocking get_identity() to match the new ddns config location. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 04:20:19 -04:00
roof	e38bd4e81f	Phase 5: extended connectivity — WireGuard ext, OpenVPN, Tor exit routing - ConnectivityManager: per-peer exit routing via iptables fwmark/policy tables (wg_ext=0x10/t110, openvpn=0x20/t120, tor=0x30/t130) - Dedicated PIC_CONNECTIVITY chains (mangle+nat), kill-switch FORWARD DROP - Config upload with sanitization: strips PostUp/PostDown and OVpn script dirs - Peer exit_via field added to peer registry (backward-compat, default=default) - 7 Flask routes at /api/connectivity/* - Connectivity.jsx: 693-line frontend with exit cards, peer assignment table - 72 new tests for ConnectivityManager (72 passing) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 10:48:20 -04:00
roof	0a21f22076	Phase 4: service store — manifest validation, install/remove, Store UI - ServiceStoreManager: manifest allowlist (git.pic.ngo/roof/*), volume denylist, ACCEPT-only iptables rules, ${SERVICE_IP}-only dest_ip - IP allocator: pool 172.20.0.20-254, skips CONTAINER_OFFSETS VIPs - Compose overlay: docker-compose.services.yml auto-included via DCF - Flask blueprint at /api/store: list, install, remove, refresh - Store.jsx: full install/remove UI with spinners and toast notifications - 95 new unit tests for ServiceStoreManager (all passing) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 10:19:39 -04:00
roof	f77d7fabcd	Phase 3: ddns_manager — DDNS client, provider adapters, IP heartbeat Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 09:42:00 -04:00
roof	7d290c12c4	Phase 2: caddy_manager — Caddyfile generation, health monitor, DNS-01 support Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 09:04:11 -04:00
roof	cf1b9672f4	Phase 1: first-run setup wizard, bash installer, Docker profiles Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 08:05:38 -04:00
roof	b8e57b6e51	Fix race condition in ensure_forward_stateful: add threading.Lock Concurrent callers (health monitor + startup) could both pass the delete-all loop and each insert a copy, producing duplicate ESTABLISHED,RELATED rules. Lock serialises all calls. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-07 10:12:18 -04:00
roof	1b61e9e290	Fix ICMP latency: re-anchor ESTABLISHED,RELATED to FORWARD position 1 on every health tick Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-05 18:51:38 -04:00
roof	0042b3b1bb	Use alpine instead of busybox for cell subnet route injection pic1 ships alpine but not busybox; ensure_cell_subnet_routes() now uses the alpine image so route injection works on all cells. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-05 12:59:23 -04:00
roof	e2c50c381a	Fix cross-cell domain access: scope DNAT rules, add Docker→wg0 routing - firewall_manager: add _get_wg_server_ip() helper; scope ensure_cell_api_dnat(), ensure_dns_dnat(), ensure_service_dnat() DNAT rules with -d server_ip; add ensure_wg_masquerade() (Docker→wg0 MASQUERADE+FORWARD) and ensure_cell_subnet_routes() (host routes via docker run busybox) - wireguard_manager: scope PostUp DNAT rules with -d server_ip in generate_config() and ensure_postup_dnat(); add Docker→wg0 MASQUERADE+FORWARD rules - app.py: call ensure_wg_masquerade() and ensure_cell_subnet_routes() in _apply_startup_enforcement() - tests/test_firewall_manager.py: mock _get_wg_server_ip, add test_dnat_is_scoped_to_server_ip and test_returns_false_when_wg_server_ip_not_found - tests/e2e/wg/test_cell_to_cell_routing.py: rewrite to use dynamic config (no hardcoded IPs/ports), add latency and domain access tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-05 12:37:02 -04:00
roof	1e1bda4679	Fix cross-cell ICMP routing: state-based cell DROP + e2e test The cell catch-all DROP rule blocked all traffic from a connected cell's subnet, including ESTABLISHED/RELATED packets (ICMP replies, TCP ACKs) for connections initiated by local VPN peers. This broke ping to the remote cell's WireGuard IP even when the cell-to-cell tunnel was healthy. Change the DROP to match only NEW,INVALID connections so established reply traffic passes through to the stateful ACCEPT rule. Also adds tests/e2e/wg/test_cell_to_cell_routing.py — an end-to-end test that brings up a real WireGuard tunnel from the test runner to pic1 and verifies full cross-cell routing including ICMP ping, API /health, and Caddy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-05 10:59:11 -04:00
roof	5a4e292440	fix: allow reply traffic from connected cells through FORWARD chain apply_cell_rules drops all traffic from a cell's subnet except specific service ports. This also drops ICMP replies and TCP ACKs for connections initiated by local peers to the connected cell, breaking cross-cell routing (ping to 10.0.0.1 silently dropped by test's cell DROP rule). Fix: ensure_forward_stateful() inserts a stateful ESTABLISHED,RELATED ACCEPT at the top of FORWARD. Called from apply_cell_rules (every cell add/update) and from _apply_startup_enforcement. Idempotent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 15:13:59 -04:00
roof	c2d215ee2e	fix: cross-cell routing for split-tunnel peers Three related fixes for split-tunnel peers that need to reach connected cells: 1. apply_peer_rules/apply_all_peer_rules now accept wg_subnet (actual local VPN subnet) and cell_subnets (connected cells' vpn_subnets) parameters instead of hardcoding 10.0.0.0/24. All callers (startup, add_peer, update_peer, apply-enforcement endpoint) pass the real values. 2. Explicit ACCEPT rules are inserted in FORWARD for each connected cell's subnet so split-tunnel peers (internet_access=False) can still reach connected cells via the wg0→wg0 path. 3. apply_ip_range in network_manager now loads cell_links.json and passes it to generate_corefile(), fixing a race where the bootstrap DNS thread could overwrite the Corefile and wipe cross-cell DNS forwarding zones on startup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 14:36:28 -04:00
roof	8ee1d88e37	Add subnet conflict validation for wireguard.address and ip_range changes When a cell is connected to others, changing the local WireGuard address or Docker ip_range to a subnet that overlaps a connected cell's vpn_subnet would break routing. Both now return 409 with the conflicting cell name. - wireguard.address: derive network from new address, check all connected cells' vpn_subnet for overlap (after existing format validation) - ip_range: check all connected cells' vpn_subnet for overlap (after existing RFC-1918 validation) Tests: 4 cases each (overlap → 409, no overlap → ok, no cells → ok, format error still fires first → 400). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 10:00:58 -04:00
roof	c658d2b16c	Add domain conflict validation when changing domain or accepting heal invite Two gaps allowed a cell to take a domain already in use by a connected cell: 1. PUT /api/config domain change: added check against cell_link_manager's connected cells list before saving — returns 409 if the new domain collides with any connected cell's domain. 2. accept_invite healing path: a remote cell changing its domain via a re-invite was not validated against other connected cells' domains. Now calls _check_invite_conflicts(invite, exclude_cell=name) before applying any change. Also: the healing path now detects domain changes (alongside dns_ip/ vpn_subnet/endpoint), updates the stored domain, and refreshes the DNS forward rule when the domain changes. Tests: 3 new domain-conflict tests in test_config_validation.py; 3 new accept_invite healing tests in test_cell_link_manager.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 09:46:58 -04:00
roof	ac0c16c97b	Fix session cookie name collision when running multiple PIC instances on localhost Flask's default cookie name ('session') is shared across all ports on the same hostname. When two PIC instances are accessed via localhost:portA and localhost:portB, logging into one overwrites the other's session cookie, causing repeated logouts. Derive a unique 8-hex suffix from each instance's persistent SECRET_KEY and set SESSION_COOKIE_NAME = 'pic_sess_<suffix>'. This ensures each cell uses a distinct cookie name, so sessions are fully isolated regardless of hostname. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 09:15:42 -04:00
roof	28a193e430	Fix ensure_postup_dnat to strip-and-replace all DNAT rules idempotently _get_dnat_container_ips() used a concatenating docker inspect format that produced "invalid IP" when containers had multiple network attachments. The old ensure_postup_dnat appended rather than replacing, so each update call added a broken duplicate set of rules causing iptables to fail on startup and tear down wg0 entirely. Fix _get_dnat_container_ips to use a space separator in the format string and validate each token as a real IP before accepting it. Rewrite ensure_postup_dnat with _is_dnat_rule() helper: strips every managed DNAT/FORWARD rule (any IP, port 53/80) on semicolon-split and appends a single correct set — fully idempotent regardless of prior state. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 06:54:20 -04:00
roof	dc2606541c	feat: Phase 4 hardening — retry/backoff, loop detection, sync status UI + tests Phase 4.1 — Retry/backoff for failed permission pushes: - _compute_next_retry(): capped exponential backoff with jitter (60s–1h) - _record_push_result(): tracks push_attempts and next_retry_at per link - replay_pending_pushes(): skips links still in backoff window, logs deferred count - _load() migration: adds push_attempts/next_retry_at to existing records Phase 4.2 — Loop detection (A→B→A routing cycle): - set_peer_route_via(): returns 409 if target cell already routes peers through us - apply_remote_permissions(): soft warning when accepting exit-relay that would cycle Phase 4.3 — Sync staleness indicator in Cell Network UI: - SyncBadge component: green (synced), amber (pending/failed), gray (never) - Shows relativeTime of last sync + error message + next retry estimate - Injected into CellPanel header alongside tunnel online/handshake status Tests (54 new): - TestCheckInviteConflicts: subnet overlap, domain conflict, exclude_cell (9 tests) - TestPushInviteToRemote: success, 4xx, no endpoint, subprocess errors (7 tests) - TestAcceptInviteNew: new cell, idempotent, healing dns/subnet changes (16 tests) - TestAddConnectionMutualPairing: push-invite call, non-fatal failure (5 tests) - TestPeerSyncAcceptInvite endpoint: happy path, field validation, error propagation (16 tests) - Fixed 2 existing replay tests to clear backoff gate (simulates elapsed window) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 04:18:36 -04:00
roof	960a4ecc51	fix: WG address change now queues pending restart + heals cell connections Three issues fixed together: 1. WireGuard address changes now go through the pending-restart queue (shown in the UI banner) instead of restarting cell-wireguard immediately. Only private_key changes still restart immediately; address and port changes both defer to the user-initiated Apply flow. Previously the address change was silently applied and never appeared in Settings → Pending Configuration. 2. When the WG address changes, the API spawns a background thread that pushes the updated invite to all connected cells (over LAN, before the WG tunnel is back up). This lets remote cells automatically update their dns_ip, AllowedIPs, and CoreDNS forwarding rules without manual re-pairing. 3. accept_invite now handles the "already connected but changed" case: if the remote cell re-sends an invite with a different dns_ip, vpn_subnet or endpoint, we update the stored link, the WG AllowedIPs, and the CoreDNS forward rule in place — no delete/re-add required. Previously the endpoint was ignored and returned the stale record unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 08:29:18 -04:00
roof	0e16d6968a	fix: prevent test runs from corrupting live WG state; sync wg0.conf on IP change Three fixes: 1. Extend the docker-exec safety guard in wireguard_manager to also check for 'wg_confs' in the config path. When running unit tests on the host the API uses /app/config/wireguard/wg0.conf (no wg_confs subdir), so the old '/tmp/' | 'pytest' check didn't fire — _syncconf and friends were executing live 'docker exec cell-wireguard wg set' calls against the running container, removing real VPN peers that didn't appear in the test config. The wg_confs subdir only exists inside the container mount, so its presence reliably gates live calls. 2. Fix get_split_tunnel_ips() wrong path: self.data_dir + 'api/cell_links.json' → self.data_dir + 'cell_links.json'. The extra 'api/' segment produced /app/data/api/cell_links.json inside the container instead of the real /app/data/cell_links.json, so connected cells were silently excluded from split-tunnel CIDRs. 3. update_peer_ip_registry and ip_update now also call wireguard_manager.update_peer_ip so wg0.conf AllowedIPs stay in sync when a peer's VPN IP changes at runtime (previously only peers.json was updated). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 07:45:28 -04:00
roof	99c1d9cd92	feat: auto mutual WG pairing + subnet/domain conflict detection Auto mutual pairing When Cell A imports Cell B's invite (POST /api/cells on A), A now immediately pushes its own invite to Cell B over the LAN (using the endpoint IP, before the WG tunnel exists) via the new endpoint: POST /api/cells/peer-sync/accept-invite Cell B auto-adds Cell A as a WireGuard peer and DNS forward, completing the bidirectional tunnel without any manual action on Cell B's UI. The endpoint is idempotent and unauthenticated (runs before WG tunnel). Previously, the pairing was one-sided: Cell A had Cell B as a WG peer but Cell B never had Cell A — the tunnel never established and all cross-cell operations silently failed. Conflict detection (add_connection + accept-invite) _check_invite_conflicts() now validates before connecting: - VPN subnet must not overlap own subnet or any already-connected cell's subnet - Domain must not match own domain or any already-connected cell's domain Returns clear error messages so the admin knows which cell to reconfigure. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 06:24:46 -04:00
roof	1a611e0474	fix: UI always accessible; fix exit-relay AllowedIPs not updating PIC UI always accessible (service_access=[]) Remove the per-peer Caddy:80 ACCEPT/DROP rule from apply_peer_rules. Service access was enforced at two layers (iptables DROP + CoreDNS ACL), but the iptables layer also blocked the PIC web UI served through Caddy. CoreDNS ACL alone is sufficient — DNS blocks service hostnames; the UI path through Caddy remains reachable regardless of service_access value. Exit-relay internet routing (route_via another cell) update_peer_ip validated new_ip as a single ip_network, rejecting the comma-separated '10.0.1.0/24, 0.0.0.0/0' string passed by update_cell_peer_allowed_ips(add_default_route=True). The AllowedIPs in wg0.conf was never updated, so WireGuard never routed internet traffic through the exit cell's tunnel. Fix: validate each CIDR individually and apply the change live via wg set without a container restart. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 05:41:22 -04:00
roof	c521fab1cb	fix: merge CoreDNS ACL per-service and add reload plugin; add peer/cell e2e tests - _build_acl_block: put all blocked IPs for a service in ONE acl block instead of one block per peer — the first block's allow-all was silently granting access to every peer after the first blocked one (first-match semantics) - generate_corefile: add 'reload' plugin so SIGUSR1 triggers Corefile reload in newer CoreDNS builds (without it the signal was a no-op) - tests/test_firewall_manager.py: new tests for single merged ACL block and the reload directive - tests/e2e/api/test_peer_access_update.py: e2e tests for service_access, internet_access, and peer_access updates persisting live to iptables/CoreDNS - tests/e2e/api/test_cell_to_cell.py: e2e tests for cell-to-cell connection management, permissions API, and cross-cell service access restrictions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 04:57:37 -04:00
roof	f1666ba19c	fix: embed DNAT rules in wg0.conf PostUp for persistence + fix dns_ip in server config DNAT rules applied via docker exec are lost whenever wg-easy reloads the WireGuard interface (PostDown flushes the nat table then PostUp only re-adds static rules). Fix: embed DNS (port 53) and service (port 80) DNAT rules directly in wg0.conf PostUp/PostDown so they reapply on every interface restart. ensure_postup_dnat() patches existing configs on startup. get_server_config() now returns the WG server IP (e.g. 10.0.0.1) for dns_ip instead of the cell-dns container IP (172.20.0.3). This makes the value consistent with what get_peer_config() writes into the .conf file, and fixes the stale hint text in Peers.jsx and WireGuard.jsx. UI: fallback dns_ip changed from 172.20.0.3 to 10.0.0.1; split-tunnel fallback drops the 172.20.0.0/16 stale range. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 04:07:10 -04:00
roof	9a800e3b6b	feat: fix cross-cell service access — DNS DNAT, service DNAT, Caddy routing DNS A records now return the WireGuard server IP (10.0.0.1) instead of Docker bridge VIPs so cross-cell peers resolve service names correctly regardless of their bridge subnet. DNAT rules (wg0:53→cell-dns:53 and wg0:80→cell-caddy:80) are applied at startup. Caddy routes by Host header, eliminating the Docker bridge subnet conflict. Firewall cell rules allow DNS and service (Caddy) traffic from linked cell subnets. Split-tunnel AllowedIPs now dynamically includes connected-cell VPN subnets and drops the 172.20.0.0/16 range. Peers with route_via set now receive full-tunnel config (0.0.0.0/0) so all their traffic exits via the remote cell. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 03:12:09 -04:00
roof	f2f15eb17e	fix: restore cell WG peer blocks lost from wg0.conf on startup Cell link [Peer] blocks can vanish from wg0.conf after a container rebuild or config reset. The startup recovery previously only restored VPN peer rules (iptables) but not the WireGuard peer blocks needed for cell-to-cell tunnels, leaving the link red with no automatic recovery. Add _restore_cell_wg_peers() called from _apply_startup_enforcement() that reconciles wg0.conf against cell_links.json and re-adds any missing [Peer] blocks, then calls _syncconf() to hot-reload the interface. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 01:52:47 -04:00
roof	68c27b4521	security: replace WireGuard catch-all ACCEPT with DROP The PostUp rule appended `iptables -A FORWARD -i wg0 -j ACCEPT` which allowed any WireGuard-connected client full internet access regardless of per-peer rules, even when no peers were configured in wg0.conf. Fix: change PostUp/PostDown to use DROP as the catch-all. Per-peer and per-cell rules use -I (insert at top) so they take precedence; unknown or unconfigured WG traffic hits the DROP at the bottom. Also add reconcile_stale_peer_rules() called on startup to remove FORWARD rules for peer IPs that no longer exist in the registry, preventing deleted peers from retaining firewall access across container restarts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 00:31:55 -04:00
roof	8ea834e108	feat: Phase 3 - per-peer internet routing via exit cell Adds the ability to route a specific peer's internet traffic through a connected cell acting as an exit relay. Cell A side: - PUT /api/peers/<peer>/route-via {"via_cell": "cellB"} sets route_via - Updates WG AllowedIPs to include 0.0.0.0/0 for the exit cell peer - Adds ip rule + ip route in policy table inside cell-wireguard so the specific peer's traffic egresses via cellB's WG IP - Sets exit_relay_active on the cell link and pushes use_as_exit_relay=True to cellB via peer-sync Cell B side: - Receives use_as_exit_relay in the peer-sync payload - Calls apply_cell_rules(..., exit_relay=True) to add FORWARD -o eth0 ACCEPT - Stores remote_exit_relay_active flag for startup recovery Startup recovery: - apply_all_cell_rules passes exit_relay=remote_exit_relay_active (cellB) - _apply_startup_enforcement reapplies ip rule for each peer with route_via (cellA) since policy routing rules don't survive container restart peer_registry gets route_via field with lazy migration. 22 new tests across test_cell_link_manager, test_peer_registry, test_peer_route_via. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-01 16:23:31 -04:00
roof	dcee03dd3f	feat(cells): Phase 2 — exit-offer signaling between connected cells Adds the ability for a cell to signal to a peer that it's willing to route internet traffic on their behalf. This is the signaling layer for Phase 3 (per-peer routing via exit cell). Changes: - cell_links.json: exit_offered (bool) + remote_exit_offered (bool) fields with lazy migration (default false for existing records) - _push_permissions_to_remote: includes exit_offered in the push body - apply_remote_permissions: accepts exit_offered kwarg; stores it as remote_exit_offered on the matching cell link - peer-sync receiver: passes exit_offered from body to apply_remote_permissions - CellLinkManager.set_exit_offered(cell_name, offered): persists + triggers push so the remote learns of our offer immediately - PUT /api/cells/<name>/exit-offer: REST endpoint to toggle the flag - 12 new tests covering all new paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-01 15:49:21 -04:00
roof	7da0cbb714	fix: add X-Forwarded-For WG IP to peer-sync push curl command MASQUERADE rewrites the source IP of forwarded packets from the cell's WG address (10.0.x.1) to cell-wireguard's bridge IP (172.20.x.9). The peer-sync endpoint authenticates callers by checking that the source IP is inside a known cell's vpn_subnet, so MASQUERADE caused all pushes to fail with 403. Fix: _push_permissions_to_remote() now calls _local_wg_ip() to get the local wg0 address and passes it as X-Forwarded-For. _authenticate_peer_cell() already supports XFF for exactly this proxying scenario. Also adds a test verifying the header is present in the constructed curl command. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-01 15:24:08 -04:00
roof	59927b6ad7	fix: whitelist peer-sync endpoint from session auth + CSRF /api/cells/peer-sync/permissions is called over the WireGuard tunnel by remote cells — they have no session cookie and cannot produce a CSRF token. The endpoint authenticates via source IP (must be in the remote cell's vpn_subnet) and WireGuard public key instead. Without this, the global enforce_auth hook returns 401 before the route handler runs, so all cross-cell permission pushes fail even when the WG tunnel and iptables rules are correct. Also adds a test verifying the route can be reached without a session. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-01 14:59:57 -04:00
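
Note on commit 1a611e0474: the message describes update_peer_ip rejecting a comma-separated AllowedIPs string because it was validated as a single ip_network, and the fix as validating each CIDR individually. A minimal sketch of that per-CIDR validation follows, using Python's standard ipaddress module. The function name parse_allowed_ips and its return shape are hypothetical illustrations, not the repository's actual code.

```python
import ipaddress
from typing import List


def parse_allowed_ips(value: str) -> List[str]:
    """Validate a comma-separated AllowedIPs string one CIDR at a time.

    Hypothetical helper: the real update_peer_ip may differ in name,
    signature, and error handling.
    """
    cidrs = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas or surrounding whitespace
        # Raises ValueError if this single entry is not a valid network.
        # strict=False normalizes entries with host bits set.
        network = ipaddress.ip_network(part, strict=False)
        cidrs.append(str(network))
    if not cidrs:
        raise ValueError("AllowedIPs must contain at least one CIDR")
    return cidrs


if __name__ == "__main__":
    # The string from the commit message fails as one network...
    try:
        ipaddress.ip_network("10.0.1.0/24, 0.0.0.0/0")
    except ValueError:
        print("whole string rejected as a single network")
    # ...but passes when each entry is checked on its own.
    print(parse_allowed_ips("10.0.1.0/24, 0.0.0.0/0"))
```

Splitting before validating keeps the error specific to the offending entry, which is useful when the string is assembled from several cell subnets plus an optional default route.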

133 Commits