201 Commits

Author SHA1 Message Date
roof 925ab1f696 Overhaul setup wizard: domain config, password strength, field alignment
Unit Tests / test (push) Successful in 8m48s
Password:
- Add lowercase to strength scoring; "Good" now requires all API criteria
  (12 chars, upper, lower, digit) — no more submitting passwords the API rejects
- isReady gates the Next button on meeting API requirements, not just length

Domain steps 3 + 4:
- Step 3: choose pic_ngo / custom / lan (sends valid API domain_modes)
- Step 4 (pic.ngo): shows derived [cellName].pic.ngo domain preview
- Step 4 (custom): domain name field + TLS method selector
  (Cloudflare DNS-01 + API token, DuckDNS + token, HTTP-01 + port-80 warning)
- Step 4 skipped entirely for LAN-only
- Review step shows actual domain string and TLS method instead of opaque codes

Cell name:
- Description and preview hint make clear it becomes the pic.ngo subdomain
- Step 1 shows live "name.pic.ngo" preview as you type

Backend:
- setup_manager now accepts and stores domain_name, cloudflare_api_token,
  duckdns_token for Phase 3 DDNS registration use

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 07:27:59 -04:00
roof 439886624e Fix config/data ownership — chown to invoking user after make install
Unit Tests / test (push) Successful in 8m46s
make install runs as root so all generated files (config/, data/) land
as root:root. Added a chown pass in install.sh after make install
completes, re-applying REPO_OWNER ownership. Also fixed the make setup
chown to use SUDO_USER when invoked via sudo rather than always id -u
(which is 0 when running as root).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 06:46:12 -04:00
roof 24877df976 Fix setup wizard and installer for fresh-install flow
Unit Tests / test (push) Successful in 8m53s
- setup_manager: fall back to update_password if admin already exists
  (installer bootstrap creates admin; wizard now updates rather than fails)
- install.sh: chown repo to SUDO_USER instead of pic user so the
  invoking operator can run make update without git safe.directory errors
- test: update mock to also stub update_password when testing total auth failure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 06:08:55 -04:00
roof bfa0d99dd1 Fix git safe.directory error for non-root users after install
Unit Tests / test (push) Successful in 8m55s
The installer runs as root and chowns /opt/pic to the pic user.
Any other user (roof, operator) running make update then hits
"detected dubious ownership". Fix: add /opt/pic to system-wide
git safe.directory after clone, and add same guard in make update.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 05:46:40 -04:00
roof 1e2cf5580f Fix setup wizard: align field names with API (domain_type→domain_mode, services→services_enabled)
Unit Tests / test (push) Successful in 8m52s
The wizard was sending domain_type and services but the API expected
domain_mode and services_enabled, causing a validation error on submit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 05:36:18 -04:00
roof 1989dfa0a3 Fix: exempt /api/setup/* from enforce_auth so setup wizard works on fresh install
Unit Tests / test (push) Successful in 8m49s
The setup wizard runs before any account exists, but the installer's
setup_cell.py creates auth_users.json with an admin account first.
This meant enforce_auth was active by the time the browser hit /setup,
blocking all /api/setup/* calls with 401. The CSRF hook already exempted
/api/setup/* — auth enforcement now matches.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 05:03:44 -04:00
roof 5dab6377bc Restore https:// now that git.pic.ngo has a TLS certificate
Unit Tests / test (push) Failing after 15m59s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 04:33:51 -04:00
roof 0a24d20bbc Update QUICKSTART: use http for install.pic.ngo and git.pic.ngo (no HTTPS yet)
Unit Tests / test (push) Successful in 8m50s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 02:58:48 -04:00
roof 46599bd37e Fix installer: use http://git.pic.ngo without port (nginx forwards)
Unit Tests / test (push) Successful in 8m55s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 02:57:13 -04:00
roof dde4d9a53f Rewrite CLAUDE.md following article best practices
Unit Tests / test (push) Successful in 8m54s
Adds: tech stack, coding conventions, file placement rules, safety rules,
infrastructure topology table, and expands architecture with key-file table
and before-request hook documentation. Removes vague guidance, replaces
with actionable rules Claude can follow automatically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 07:25:53 -04:00
roof 674a66f7a0 Revert registry port: git.pic.ngo uses standard port (DNS fix pending)
Unit Tests / test (push) Successful in 8m55s
2026-05-10 06:59:13 -04:00
roof 9df3bf6a17 Fix release workflow: registry is git.pic.ngo:3000 not port 80
Unit Tests / test (push) Successful in 8m55s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 06:52:42 -04:00
roof 0773179962 Gitignore .coverage files
Unit Tests / test (push) Successful in 8m55s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 06:28:40 -04:00
roof 3a35cf72d3 Fix CI failures on root — mock OSError instead of relying on filesystem
Tests assumed write to /nonexistent/... fails, but CI runs as root where
Linux allows creating any path. Use unittest.mock.patch on builtins.open
with OSError side_effect so the test is environment-independent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 06:19:24 -04:00
roof 515f3d5075 Update QUICKSTART: lead with curl installer, document all domain modes
Unit Tests / test (push) Failing after 8m43s
Option A is now the one-line curl installer (install.pic.ngo); Option B
is the manual git clone path. Wizard section covers all five domain modes
(pic_ngo, cloudflare, duckdns, http01, lan) and current password rules.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 05:05:08 -04:00
roof 35993bc79d Update all documentation to reflect current architecture
Unit Tests / test (push) Failing after 8m47s
README, QUICKSTART, and Wiki were pre-wizard, pre-auth, pre-DDNS, and
pre-service-store.  Full rewrite covering:
- First-run wizard replaces manual make setup + .env identity config
- Session-based auth (admin/peer roles, CSRF protection)
- DDNS: pic.ngo registration with TOTP, provider abstraction
- Service store: install/remove optional services from manifest index
- Cell-to-cell networking and peer-sync protocol
- Extended connectivity: WG external, OpenVPN, Tor exit routing
- Caddy HTTPS: Let's Encrypt (DNS-01/HTTP-01) or internal CA
- Current container list, port bindings, and security model
- Accurate make targets (ddns-update, reset-admin-password, etc.)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 04:35:37 -04:00
roof f1b48208fc Fix CI unit test failures and DDNS config wiring
Unit Tests / test (push) Failing after 8m58s
- auth_manager._ensure_file(): stop creating the empty auth_users.json on
  init — the constructor now only creates the parent directory.  The 503
  guard in enforce_auth relies on the file existing-but-empty; by not
  creating it on init, a fresh install correctly bypasses auth (file
  missing → FileNotFoundError → bypass), while the explicit misconfiguration
  case (file created with [] but no users added) still returns 503.
- test_enforce_auth_configured.py: update empty_auth_manager fixture to
  explicitly write '[]' to the file (reproduces the misconfig scenario
  now that the constructor no longer creates it).
- ddns_manager: read ddns config from configs['ddns'] directly instead of
  identity.domain.ddns — _identity.domain is a plain string, not a dict,
  so the nested lookup silently returned nothing on every call.
- setup_cell.py: write top-level 'ddns' block into cell_config.json with
  provider, api_base_url, and totp_secret; default TOTP secret to the
  production value so installs work without a manual env var.
- test_ddns_manager.py: update _make_config_manager to populate cm.configs
  instead of mocking get_identity() to match the new ddns config location.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 04:20:19 -04:00
roof ffe1dbeed6 Integrate DDNS registration and IP update into installer
Unit Tests / test (push) Failing after 8m57s
setup_cell.py: register_with_ddns() called at end of setup — detects
public IP via api.ipify.org, generates TOTP code from DDNS_TOTP_SECRET,
POSTs to DDNS /register, saves token to data/api/.ddns_token (mode 600).
Idempotent: skips if token file already exists. Fails gracefully if
DDNS_TOTP_SECRET is unset or network is unreachable.

scripts/ddns_update.py: standalone script for periodic IP updates.
Reads token from data/api/.ddns_token, fetches current public IP,
compares to cached last IP (data/api/.ddns_last_ip) and calls /update
only when the IP has actually changed.

Makefile: add ddns-update (run update script) and ddns-register (force
re-registration by removing old token then calling register_with_ddns).
Usage: DDNS_TOTP_SECRET=<secret> make ddns-register

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 02:28:02 -04:00
roof 15376b67c7 Add runtime-generated config paths to .gitignore
Unit Tests / test (push) Failing after 9m0s
config/api/dns/, config/api/network.json, config/api/webdav/ are
created at API startup and should never be tracked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 13:26:03 -04:00
roof 8efe8c1225 Merge PIC v2 — phases 1-5 + CI/CD: wizard, HTTPS, DDNS, service store, connectivity
Unit Tests / test (push) Failing after 8m52s
2026-05-09 12:11:15 -04:00
roof 64e60dc577 Add Gitea Actions CI workflows — unit tests on push, image builds on tag
Unit Tests / test (push) Failing after 9m3s
- test.yml: run unit tests on every push (all branches)
- release.yml: build and push pic-api + pic-webui images on v*.*.* tags

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 10:59:29 -04:00
roof e38bd4e81f Phase 5: extended connectivity — WireGuard ext, OpenVPN, Tor exit routing
- ConnectivityManager: per-peer exit routing via iptables fwmark/policy tables
  (wg_ext=0x10/t110, openvpn=0x20/t120, tor=0x30/t130)
- Dedicated PIC_CONNECTIVITY chains (mangle+nat), kill-switch FORWARD DROP
- Config upload with sanitization: strips PostUp/PostDown and OVpn script dirs
- Peer exit_via field added to peer registry (backward-compat, default=default)
- 7 Flask routes at /api/connectivity/*
- Connectivity.jsx: 693-line frontend with exit cards, peer assignment table
- 72 new tests for ConnectivityManager (72 passing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 10:48:20 -04:00
roof 0a21f22076 Phase 4: service store — manifest validation, install/remove, Store UI
- ServiceStoreManager: manifest allowlist (git.pic.ngo/roof/*), volume
  denylist, ACCEPT-only iptables rules, ${SERVICE_IP}-only dest_ip
- IP allocator: pool 172.20.0.20-254, skips CONTAINER_OFFSETS VIPs
- Compose overlay: docker-compose.services.yml auto-included via DCF
- Flask blueprint at /api/store: list, install, remove, refresh
- Store.jsx: full install/remove UI with spinners and toast notifications
- 95 new unit tests for ServiceStoreManager (all passing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 10:19:39 -04:00
roof f77d7fabcd Phase 3: ddns_manager — DDNS client, provider adapters, IP heartbeat
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 09:42:00 -04:00
roof 7d290c12c4 Phase 2: caddy_manager — Caddyfile generation, health monitor, DNS-01 support
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 09:04:11 -04:00
roof c1b1686cd9 Add frontend wiring for setup wizard — setupAPI, SetupGuard, /setup route
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 08:27:13 -04:00
roof cf1b9672f4 Phase 1: first-run setup wizard, bash installer, Docker profiles
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 08:05:38 -04:00
roof 6dbd0dff46 Add Gitea Actions CI workflows — unit tests on push, image build on tag
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 07:21:35 -04:00
roof 7391d7f7a2 Add e2e latency consistency test for WireGuard tunnel
Sends 50 pings at 0.2s intervals through the cell-to-cell tunnel and
asserts that ≤5% exceed 3× the median RTT (floor 15ms). Catches
server-side packet processing regressions on wired paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 15:13:27 -04:00
roof b8e57b6e51 Fix race condition in ensure_forward_stateful: add threading.Lock
Concurrent callers (health monitor + startup) could both pass the
delete-all loop and each insert a copy, producing duplicate
ESTABLISHED,RELATED rules. Lock serialises all calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 10:12:18 -04:00
roof 1b61e9e290 Fix ICMP latency: re-anchor ESTABLISHED,RELATED to FORWARD position 1 on every health tick
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 18:51:38 -04:00
roof 6f84a3ffe1 Fix e2e fixture: use Table=off + manual routes to avoid wg-quick conflict
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 13:31:53 -04:00
roof 0042b3b1bb Use alpine instead of busybox for cell subnet route injection
pic1 ships alpine but not busybox; ensure_cell_subnet_routes() now uses
the alpine image so route injection works on all cells.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 12:59:23 -04:00
roof e2c50c381a Fix cross-cell domain access: scope DNAT rules, add Docker→wg0 routing
- firewall_manager: add _get_wg_server_ip() helper; scope ensure_cell_api_dnat(),
  ensure_dns_dnat(), ensure_service_dnat() DNAT rules with -d server_ip; add
  ensure_wg_masquerade() (Docker→wg0 MASQUERADE+FORWARD) and
  ensure_cell_subnet_routes() (host routes via docker run busybox)
- wireguard_manager: scope PostUp DNAT rules with -d server_ip in generate_config()
  and ensure_postup_dnat(); add Docker→wg0 MASQUERADE+FORWARD rules
- app.py: call ensure_wg_masquerade() and ensure_cell_subnet_routes() in
  _apply_startup_enforcement()
- tests/test_firewall_manager.py: mock _get_wg_server_ip, add
  test_dnat_is_scoped_to_server_ip and test_returns_false_when_wg_server_ip_not_found
- tests/e2e/wg/test_cell_to_cell_routing.py: rewrite to use dynamic config
  (no hardcoded IPs/ports), add latency and domain access tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 12:37:02 -04:00
roof 1e1bda4679 Fix cross-cell ICMP routing: state-based cell DROP + e2e test
The cell catch-all DROP rule blocked all traffic from a connected cell's
subnet, including ESTABLISHED/RELATED packets (ICMP replies, TCP ACKs) for
connections initiated by local VPN peers. This broke ping to the remote
cell's WireGuard IP even when the cell-to-cell tunnel was healthy.

Change the DROP to match only NEW,INVALID connections so established reply
traffic passes through to the stateful ACCEPT rule.

Also adds tests/e2e/wg/test_cell_to_cell_routing.py — an end-to-end test
that brings up a real WireGuard tunnel from the test runner to pic1 and
verifies full cross-cell routing including ICMP ping, API /health, and Caddy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 10:59:11 -04:00
roof 5a4e292440 fix: allow reply traffic from connected cells through FORWARD chain
apply_cell_rules drops all traffic from a cell's subnet except specific
service ports. This also drops ICMP replies and TCP ACKs for connections
initiated by local peers to the connected cell, breaking cross-cell
routing (ping to 10.0.0.1 silently dropped by test's cell DROP rule).

Fix: ensure_forward_stateful() inserts a stateful ESTABLISHED,RELATED
ACCEPT at the top of FORWARD. Called from apply_cell_rules (every cell
add/update) and from _apply_startup_enforcement. Idempotent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 15:13:59 -04:00
roof c2d215ee2e fix: cross-cell routing for split-tunnel peers
Three related fixes for split-tunnel peers that need to reach connected cells:

1. apply_peer_rules/apply_all_peer_rules now accept wg_subnet (actual local VPN
   subnet) and cell_subnets (connected cells' vpn_subnets) parameters instead of
   hardcoding 10.0.0.0/24. All callers (startup, add_peer, update_peer,
   apply-enforcement endpoint) pass the real values.

2. Explicit ACCEPT rules are inserted in FORWARD for each connected cell's
   subnet so split-tunnel peers (internet_access=False) can still reach
   connected cells via the wg0→wg0 path.

3. apply_ip_range in network_manager now loads cell_links.json and passes it
   to generate_corefile(), fixing a race where the bootstrap DNS thread could
   overwrite the Corefile and wipe cross-cell DNS forwarding zones on startup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 14:36:28 -04:00
roof 8ee1d88e37 Add subnet conflict validation for wireguard.address and ip_range changes
When a cell is connected to others, changing the local WireGuard address
or Docker ip_range to a subnet that overlaps a connected cell's vpn_subnet
would break routing. Both now return 409 with the conflicting cell name.

- wireguard.address: derive network from new address, check all connected
  cells' vpn_subnet for overlap (after existing format validation)
- ip_range: check all connected cells' vpn_subnet for overlap (after
  existing RFC-1918 validation)

Tests: 4 cases each (overlap → 409, no overlap → ok, no cells → ok,
format error still fires first → 400).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:00:58 -04:00
roof c658d2b16c Add domain conflict validation when changing domain or accepting heal invite
Two gaps allowed a cell to take a domain already in use by a connected cell:

1. PUT /api/config domain change: added check against cell_link_manager's
   connected cells list before saving — returns 409 if the new domain
   collides with any connected cell's domain.

2. accept_invite healing path: a remote cell changing its domain via a
   re-invite was not validated against other connected cells' domains.
   Now calls _check_invite_conflicts(invite, exclude_cell=name) before
   applying any change.

Also: the healing path now detects domain changes (alongside dns_ip/
vpn_subnet/endpoint), updates the stored domain, and refreshes the DNS
forward rule when the domain changes.

Tests: 3 new domain-conflict tests in test_config_validation.py;
3 new accept_invite healing tests in test_cell_link_manager.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 09:46:58 -04:00
roof ac0c16c97b Fix session cookie name collision when running multiple PIC instances on localhost
Flask's default cookie name ('session') is shared across all ports on the same
hostname. When two PIC instances are accessed via localhost:portA and localhost:portB,
logging into one overwrites the other's session cookie, causing repeated logouts.

Derive a unique 8-hex suffix from each instance's persistent SECRET_KEY and set
SESSION_COOKIE_NAME = 'pic_sess_<suffix>'. This ensures each cell uses a distinct
cookie name, so sessions are fully isolated regardless of hostname.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 09:15:42 -04:00
roof 28a193e430 Fix ensure_postup_dnat to strip-and-replace all DNAT rules idempotently
_get_dnat_container_ips() used a concatenating docker inspect format that
produced "invalid IP" when containers had multiple network attachments.
The old ensure_postup_dnat appended rather than replacing, so each update
call added a broken duplicate set of rules causing iptables to fail on
startup and tear down wg0 entirely.

Fix _get_dnat_container_ips to use a space separator in the format string
and validate each token as a real IP before accepting it.

Rewrite ensure_postup_dnat with _is_dnat_rule() helper: strips every
managed DNAT/FORWARD rule (any IP, port 53/80) on semicolon-split and
appends a single correct set — fully idempotent regardless of prior state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 06:54:20 -04:00
roof d36fe88e16 feat(ui): add show/hide password toggle to login and account settings
Login.jsx:
- Eye/EyeOff toggle on the password field
- Locked account error now shows exact minutes remaining ("Try again in 3 minutes")
  instead of generic "Try again later"

AccountSettings.jsx:
- PasswordInput component wraps all 4 password fields with individual eye toggles
  (current password, new password, confirm, admin reset)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 06:05:46 -04:00
roof 67362349d1 test: add loop detection tests for PUT /api/peers/<peer>/route-via
3 new tests in TestSetPeerRouteVia:
- 409 when remote_exit_relay_active=True (would create A→B→A cycle)
- disable (via_cell=null) bypasses loop check — always allowed
- no 409 when remote_exit_relay_active=False (safe to enable)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 04:24:02 -04:00
roof dc2606541c feat: Phase 4 hardening — retry/backoff, loop detection, sync status UI + tests
Phase 4.1 — Retry/backoff for failed permission pushes:
- _compute_next_retry(): capped exponential backoff with jitter (60s–1h)
- _record_push_result(): tracks push_attempts and next_retry_at per link
- replay_pending_pushes(): skips links still in backoff window, logs deferred count
- _load() migration: adds push_attempts/next_retry_at to existing records

Phase 4.2 — Loop detection (A→B→A routing cycle):
- set_peer_route_via(): returns 409 if target cell already routes peers through us
- apply_remote_permissions(): soft warning when accepting exit-relay that would cycle

Phase 4.3 — Sync staleness indicator in Cell Network UI:
- SyncBadge component: green (synced), amber (pending/failed), gray (never)
- Shows relativeTime of last sync + error message + next retry estimate
- Injected into CellPanel header alongside tunnel online/handshake status

Tests (54 new):
- TestCheckInviteConflicts: subnet overlap, domain conflict, exclude_cell (9 tests)
- TestPushInviteToRemote: success, 4xx, no endpoint, subprocess errors (7 tests)
- TestAcceptInviteNew: new cell, idempotent, healing dns/subnet changes (16 tests)
- TestAddConnectionMutualPairing: push-invite call, non-fatal failure (5 tests)
- TestPeerSyncAcceptInvite endpoint: happy path, field validation, error propagation (16 tests)
- Fixed 2 existing replay tests to clear backoff gate (simulates elapsed window)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 04:18:36 -04:00
roof 960a4ecc51 fix: WG address change now queues pending restart + heals cell connections
Three issues fixed together:

1. WireGuard address changes now go through the pending-restart queue
   (shown in the UI banner) instead of restarting cell-wireguard immediately.
   Only private_key changes still restart immediately; address and port
   changes both defer to the user-initiated Apply flow.  Previously the
   address change was silently applied and never appeared in Settings →
   Pending Configuration.

2. When the WG address changes, the API spawns a background thread that
   pushes the updated invite to all connected cells (over LAN, before the
   WG tunnel is back up).  This lets remote cells automatically update
   their dns_ip, AllowedIPs, and CoreDNS forwarding rules without manual
   re-pairing.

3. accept_invite now handles the "already connected but changed" case:
   if the remote cell re-sends an invite with a different dns_ip, vpn_subnet
   or endpoint, we update the stored link, the WG AllowedIPs, and the
   CoreDNS forward rule in place — no delete/re-add required.  Previously
   the endpoint was ignored and returned the stale record unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 08:29:18 -04:00
roof 0e16d6968a fix: prevent test runs from corrupting live WG state; sync wg0.conf on IP change
Three fixes:

1. Extend the docker-exec safety guard in wireguard_manager to also check
   for 'wg_confs' in the config path.  When running unit tests on the host
   the API uses /app/config/wireguard/wg0.conf (no wg_confs subdir), so the
   old '/tmp/' | 'pytest' check didn't fire — _syncconf and friends were
   executing live 'docker exec cell-wireguard wg set' calls against the
   running container, removing real VPN peers that didn't appear in the
   test config.  The wg_confs subdir only exists inside the container mount,
   so its presence reliably gates live calls.

2. Fix get_split_tunnel_ips() wrong path: self.data_dir + 'api/cell_links.json'
   → self.data_dir + 'cell_links.json'.  The extra 'api/' segment produced
   /app/data/api/cell_links.json inside the container instead of the real
   /app/data/cell_links.json, so connected cells were silently excluded from
   split-tunnel CIDRs.

3. update_peer_ip_registry and ip_update now also call
   wireguard_manager.update_peer_ip so wg0.conf AllowedIPs stay in sync when
   a peer's VPN IP changes at runtime (previously only peers.json was updated).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 07:45:28 -04:00
roof 99c1d9cd92 feat: auto mutual WG pairing + subnet/domain conflict detection
**Auto mutual pairing**
When Cell A imports Cell B's invite (POST /api/cells on A), A now
immediately pushes its own invite to Cell B over the LAN (using the
endpoint IP, before the WG tunnel exists) via the new endpoint:
  POST /api/cells/peer-sync/accept-invite

Cell B auto-adds Cell A as a WireGuard peer and DNS forward, completing
the bidirectional tunnel without any manual action on Cell B's UI.
The endpoint is idempotent and unauthenticated (runs before WG tunnel).

Previously, the pairing was one-sided: Cell A had Cell B as a WG peer
but Cell B never had Cell A — the tunnel never established and all
cross-cell operations silently failed.

**Conflict detection (add_connection + accept-invite)**
_check_invite_conflicts() now validates before connecting:
  - VPN subnet must not overlap own subnet or any already-connected cell's subnet
  - Domain must not match own domain or any already-connected cell's domain
Returns clear error messages so the admin knows which cell to reconfigure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 06:24:46 -04:00
roof 1a611e0474 fix: UI always accessible; fix exit-relay AllowedIPs not updating
**PIC UI always accessible (service_access=[])**
Remove the per-peer Caddy:80 ACCEPT/DROP rule from apply_peer_rules.
Service access was enforced at two layers (iptables DROP + CoreDNS ACL),
but the iptables layer also blocked the PIC web UI served through Caddy.
CoreDNS ACL alone is sufficient — DNS blocks service hostnames; the UI
path through Caddy remains reachable regardless of service_access value.

**Exit-relay internet routing (route_via another cell)**
update_peer_ip validated new_ip as a single ip_network, rejecting the
comma-separated '10.0.1.0/24, 0.0.0.0/0' string passed by
update_cell_peer_allowed_ips(add_default_route=True). The AllowedIPs
in wg0.conf was never updated, so WireGuard never routed internet traffic
through the exit cell's tunnel. Fix: validate each CIDR individually and
apply the change live via wg set without a container restart.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 05:41:22 -04:00
roof c521fab1cb fix: merge CoreDNS ACL per-service and add reload plugin; add peer/cell e2e tests
- _build_acl_block: put all blocked IPs for a service in ONE acl block instead
  of one block per peer — the first block's allow-all was silently granting
  access to every peer after the first blocked one (first-match semantics)
- generate_corefile: add 'reload' plugin so SIGUSR1 triggers Corefile reload
  in newer CoreDNS builds (without it the signal was a no-op)
- tests/test_firewall_manager.py: new tests for single merged ACL block and
  the reload directive
- tests/e2e/api/test_peer_access_update.py: e2e tests for service_access,
  internet_access, and peer_access updates persisting live to iptables/CoreDNS
- tests/e2e/api/test_cell_to_cell.py: e2e tests for cell-to-cell connection
  management, permissions API, and cross-cell service access restrictions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 04:57:37 -04:00
roof f1666ba19c fix: embed DNAT rules in wg0.conf PostUp for persistence + fix dns_ip in server config
DNAT rules applied via docker exec are lost whenever wg-easy reloads the
WireGuard interface (PostDown flushes the nat table then PostUp only
re-adds static rules). Fix: embed DNS (port 53) and service (port 80)
DNAT rules directly in wg0.conf PostUp/PostDown so they reapply on every
interface restart. ensure_postup_dnat() patches existing configs on startup.

get_server_config() now returns the WG server IP (e.g. 10.0.0.1) for
dns_ip instead of the cell-dns container IP (172.20.0.3). This makes the
value consistent with what get_peer_config() writes into the .conf file,
and fixes the stale hint text in Peers.jsx and WireGuard.jsx.

UI: fallback dns_ip changed from 172.20.0.3 to 10.0.0.1; split-tunnel
fallback drops the 172.20.0.0/16 stale range.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 04:07:10 -04:00