Commit Graph

215 Commits

Author SHA1 Message Date
roof 61e8631c7d feat: DDNS settings integration — check availability, update credentials
- GET /api/config now returns domain_mode, domain_name, ddns.{provider,subdomain,has_token}
- GET /api/ddns/check/<name> proxies availability check to DDNS service
- PUT /api/ddns validates and saves cloudflare/duckdns credentials post-setup
- When cell_name changes for pic_ngo provider, auto-registers the new subdomain
- Settings: Cell Name shows availability badge for pic_ngo; auto-save blocks on taken
- Settings: new External Domain & DDNS section — pic_ngo info, cloudflare/duckdns edit
- 11 new tests for the two new endpoints (all pass)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 14:35:37 -04:00
roof 81dcced0ca fix: bake DDNS_TOTP_SECRET and correct URL into defaults
Unit Tests / test (push) Successful in 15m42s
docker-compose.yml DDNS_TOTP_SECRET defaulted to empty string —
containers on fresh installs had no OTP, so every /register call
was rejected with 401 and no domain was ever registered.

setup_cell.py still pointed to https://ddns.pic.ngo/api/v1 (no nginx
on VPS, so HTTPS fails). Both now default to the correct values; both
are still overridable via env var for custom DDNS deployments.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 13:49:43 -04:00
roof 777ffa4fb2 fix: use DDNS_URL env var for availability check; default to port 8080
Unit Tests / test (push) Successful in 15m23s
_check_pic_ngo_available was hardcoding https://ddns.pic.ngo, ignoring
DDNS_URL. Now imports DDNS_API_BASE from setup_manager so both the
availability check and DDNS registration use the same configured URL.

API container now receives DDNS_URL and DDNS_TOTP_SECRET from env.
Default DDNS_URL points to http://ddns.pic.ngo:8080/api/v1 (the
FastAPI service runs on port 8080 without TLS termination in front).

Also returns 503 (not 500) when the DDNS service is unreachable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 13:06:44 -04:00
roof 55d36eb410 wizard: block Next if external service cannot be verified
Unit Tests / test (push) Successful in 15m44s
For pic_ngo: name must be confirmed available (not just format-valid).
For cloudflare/duckdns: token is auto-verified on Next if not already
done — invalid or unreachable service blocks proceeding. Only lan and
http01 (no external dependency) allow Next without a live check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 08:09:06 -04:00
roof 99dcb1332a wizard: check pic.ngo availability on Next, not just on blur
The availability check was only triggered onBlur, so clicking Next
without blurring the field skipped the DDNS request entirely. Now
handleNext awaits the check and blocks with an error if the name is
taken. Unknown/unreachable DDNS is treated as available to avoid
blocking the wizard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 07:56:59 -04:00
roof 900781032a wizard: 5-step redesign — password, domain, timezone, services, review
Unit Tests / test (push) Successful in 15m22s
Domain name is now the cell identity (no separate cell name step).
All 5 providers (pic_ngo, cloudflare, duckdns, http01, lan) are
first-class options in a single Domain step. pic.ngo availability
is checked live via backend proxy to ddns.pic.ngo. Cloudflare and
DuckDNS tokens are verified via backend before proceeding.
cell_name is derived automatically from the chosen domain.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 07:09:57 -04:00
roof 1c62c47475 fix: 500 on setup complete + wizard shows all 7 steps
Unit Tests / test (push) Successful in 15m41s
Two bugs:

1. AttributeError: AuthManager.update_password does not exist — the
   fallback when create_user fails should call set_password_admin().
   This caused a 500 on every setup submit when an admin user already
   existed (e.g. from a previous install attempt).

2. Wizard was jumping to step 2 and skipping domain steps 3-4 when
   preconfigured data existed in cell_config.json. Since the installer
   no longer sets that data, and the wizard must always show all steps,
   the installerConfigured state and all step-skipping navigation is
   removed. Values are still pre-filled if found in config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 16:41:33 -04:00
roof 4a42ff5dcc wizard: move all config to /setup; install.sh is infrastructure-only
Unit Tests / test (push) Successful in 15m41s
install.sh no longer prompts for anything. It installs packages (with sudo),
creates the system user, clones the repo, and runs 'make install' — all as
the invoking user. Only package installs and system-level ops use sudo.
All folder creation happens under the user's own account, no chown needed.

/setup wizard gains the missing validation that was previously in install.sh:
- Step 1: checks pic.ngo name availability via backend (non-blocking)
- Step 4: 'Verify token' button for Cloudflare and DuckDNS tokens,
  validated server-side through new /api/setup/validate steps

API changes (routes/setup.py):
- validate step 'pic_ngo_available': proxy check to ddns.pic.ngo
- validate step 'cloudflare_token': verify via Cloudflare tokens API
- validate step 'duckdns_token': verify via DuckDNS update endpoint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 16:07:56 -04:00
roof 2d842abe5b installer: restore cell identity prompts and domain setup
Unit Tests / test (push) Successful in 15m39s
Reverts 8d1ef39. The installer must collect cell name, domain mode, and
provider tokens before 'make install' so that DDNS registration,
availability checks, and Caddy TLS can be configured at first boot.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 15:01:32 -04:00
roof 8d1ef39ca5 installer: remove cell identity prompts — wizard handles all config
Unit Tests / test (push) Successful in 15m44s
The /setup wizard now collects cell name, domain mode, credentials,
password, services, and timezone.  The bash installer's job is just
infrastructure: packages, user, repo clone, make install, start.

Removes: prompt/prompt_secret helpers, verify_cf_token, verify_duckdns,
check_pic_ngo_available, and the entire Step 5 identity block.
TOTAL_STEPS 8 → 7.  Step numbers renumbered accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 14:41:46 -04:00
roof 9566f7dd1b wizard: skip cell-name and domain steps when installer pre-configured them
Unit Tests / test (push) Successful in 15m44s
When the bash installer collects cell name and domain mode, the first-run
wizard's /setup should only ask for a password, service selection, and
timezone.  Previously the wizard pre-filled those fields but still showed
all 7 steps.

- useEffect fetches /api/setup/status on mount; if preconfigured.cell_name
  and preconfigured.domain_mode are both set, sets installerConfigured=true
  and jumps to step 2 (password)
- handleStep2Next → step 5 when installerConfigured (skips domain steps 3+4)
- handleStep2Back → step 1 when installerConfigured (review cell name)
- handleStep5Back returns to step 2 when installerConfigured

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 14:03:56 -04:00
roof f03a5f08c6 Makefile: explicitly pass all identity env vars to setup_cell.py
Unit Tests / test (push) Successful in 15m41s
DOMAIN_MODE, CELL_DOMAIN_NAME, CLOUDFLARE_API_TOKEN, DUCKDNS_TOKEN,
DUCKDNS_SUBDOMAIN are now explicit in the setup target so they are
visible and documented, not silently inherited from the environment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 13:27:53 -04:00
roof f550f04ce2 Fix DDNS registration and wizard pre-fill after installer run
Unit Tests / test (push) Successful in 15m29s
DDNS registration (setup_cell.py):
- Replace pyotp dependency with stdlib TOTP (HMAC-SHA1, RFC 6238)
  pyotp is only available inside the Docker container, not on the host
  where setup_cell.py runs — registration was silently skipped every time
- OTP header still sent if generation succeeds; omitted gracefully if not

Wizard pre-fill (setup_manager + Setup.jsx):
- GET /api/setup/status now returns 'preconfigured' dict with cell_name,
  domain_mode, domain_name, and provider tokens from installer-written config
- Setup.jsx fetches status on mount and pre-fills all form state so the
  user only needs to set password, services, and timezone — not re-enter
  the identity they already configured in the bash installer
- Fails silently so wizard still works on fresh installs with no config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 12:22:53 -04:00
roof 579f49ba13 Installer: interactive cell identity prompts with live token validation
Unit Tests / test (push) Successful in 15m24s
install.sh now guides the user through the full identity setup before
running make install:
- Cell name prompt with format validation and pic.ngo availability check
- Domain mode selection: pic.ngo / Cloudflare / DuckDNS / HTTP-01 / LAN
- Cloudflare API token: collected and verified against CF tokens/verify API
- DuckDNS: subdomain + token verified against duckdns.org/update
- HTTP-01: domain name collected, port-80 warning shown
- All collected values passed as env vars to make install
- After two failed token attempts user can continue (re-verified at boot)
- Final banner shows configured cell name and domain

setup_cell.py: updated to handle all domain modes
- Reads DOMAIN_MODE / CELL_DOMAIN_NAME / CLOUDFLARE_API_TOKEN /
  DUCKDNS_TOKEN / DUCKDNS_SUBDOMAIN from env
- write_cell_config() now writes domain_mode + domain_name to _identity
  and builds the ddns section for each provider (not hardcoded to pic_ngo)
- register_with_ddns() only called when DOMAIN_MODE == 'pic_ngo'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 11:34:22 -04:00
roof 925ab1f696 Overhaul setup wizard: domain config, password strength, field alignment
Unit Tests / test (push) Successful in 8m48s
Password:
- Add lowercase to strength scoring; "Good" now requires all API criteria
  (12 chars, upper, lower, digit) — no more submitting passwords the API rejects
- isReady gates the Next button on meeting API requirements, not just length

Domain steps 3 + 4:
- Step 3: choose pic_ngo / custom / lan (sends valid API domain_modes)
- Step 4 (pic.ngo): shows derived [cellName].pic.ngo domain preview
- Step 4 (custom): domain name field + TLS method selector
  (Cloudflare DNS-01 + API token, DuckDNS + token, HTTP-01 + port-80 warning)
- Step 4 skipped entirely for LAN-only
- Review step shows actual domain string and TLS method instead of opaque codes

Cell name:
- Description and preview hint make clear it becomes the pic.ngo subdomain
- Step 1 shows live "name.pic.ngo" preview as you type

Backend:
- setup_manager now accepts and stores domain_name, cloudflare_api_token,
  duckdns_token for Phase 3 DDNS registration use

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 07:27:59 -04:00
roof 439886624e Fix config/data ownership — chown to invoking user after make install
Unit Tests / test (push) Successful in 8m46s
make install runs as root so all generated files (config/, data/) land
as root:root. Added a chown pass in install.sh after make install
completes, re-applying REPO_OWNER ownership. Also fixed the make setup
chown to use SUDO_USER when invoked via sudo rather than always id -u
(which is 0 when running as root).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 06:46:12 -04:00
roof 24877df976 Fix setup wizard and installer for fresh-install flow
Unit Tests / test (push) Successful in 8m53s
- setup_manager: fall back to update_password if admin already exists
  (installer bootstrap creates admin; wizard now updates rather than fails)
- install.sh: chown repo to SUDO_USER instead of pic user so the
  invoking operator can run make update without git safe.directory errors
- test: update mock to also stub update_password when testing total auth failure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 06:08:55 -04:00
roof bfa0d99dd1 Fix git safe.directory error for non-root users after install
Unit Tests / test (push) Successful in 8m55s
The installer runs as root and chowns /opt/pic to the pic user.
Any other user (roof, operator) running make update then hits
"detected dubious ownership". Fix: add /opt/pic to system-wide
git safe.directory after clone, and add same guard in make update.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 05:46:40 -04:00
roof 1e2cf5580f Fix setup wizard: align field names with API (domain_type→domain_mode, services→services_enabled)
Unit Tests / test (push) Successful in 8m52s
The wizard was sending domain_type and services but the API expected
domain_mode and services_enabled, causing a validation error on submit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 05:36:18 -04:00
roof 1989dfa0a3 Fix: exempt /api/setup/* from enforce_auth so setup wizard works on fresh install
Unit Tests / test (push) Successful in 8m49s
The setup wizard runs before any account exists, but the installer's
setup_cell.py creates auth_users.json with an admin account first.
This meant enforce_auth was active by the time the browser hit /setup,
blocking all /api/setup/* calls with 401. The CSRF hook already exempted
/api/setup/* — auth enforcement now matches.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 05:03:44 -04:00
roof 5dab6377bc Restore https:// now that git.pic.ngo has a TLS certificate
Unit Tests / test (push) Failing after 15m59s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 04:33:51 -04:00
roof 0a24d20bbc Update QUICKSTART: use http for install.pic.ngo and git.pic.ngo (no HTTPS yet)
Unit Tests / test (push) Successful in 8m50s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 02:58:48 -04:00
roof 46599bd37e Fix installer: use http://git.pic.ngo without port (nginx forwards)
Unit Tests / test (push) Successful in 8m55s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 02:57:13 -04:00
roof dde4d9a53f Rewrite CLAUDE.md following article best practices
Unit Tests / test (push) Successful in 8m54s
Adds: tech stack, coding conventions, file placement rules, safety rules,
infrastructure topology table, and expands architecture with key-file table
and before-request hook documentation. Removes vague guidance, replaces
with actionable rules Claude can follow automatically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 07:25:53 -04:00
roof 674a66f7a0 Revert registry port: git.pic.ngo uses standard port (DNS fix pending)
Unit Tests / test (push) Successful in 8m55s
2026-05-10 06:59:13 -04:00
roof 9df3bf6a17 Fix release workflow: registry is git.pic.ngo:3000 not port 80
Unit Tests / test (push) Successful in 8m55s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 06:52:42 -04:00
roof 0773179962 Gitignore .coverage files
Unit Tests / test (push) Successful in 8m55s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 06:28:40 -04:00
roof 3a35cf72d3 Fix CI failures on root — mock OSError instead of relying on filesystem
Tests assumed write to /nonexistent/... fails, but CI runs as root where
Linux allows creating any path. Use unittest.mock.patch on builtins.open
with OSError side_effect so the test is environment-independent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 06:19:24 -04:00
roof 515f3d5075 Update QUICKSTART: lead with curl installer, document all domain modes
Unit Tests / test (push) Failing after 8m43s
Option A is now the one-line curl installer (install.pic.ngo); Option B
is the manual git clone path. Wizard section covers all five domain modes
(pic_ngo, cloudflare, duckdns, http01, lan) and current password rules.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 05:05:08 -04:00
roof 35993bc79d Update all documentation to reflect current architecture
Unit Tests / test (push) Failing after 8m47s
README, QUICKSTART, and Wiki were pre-wizard, pre-auth, pre-DDNS, and
pre-service-store.  Full rewrite covering:
- First-run wizard replaces manual make setup + .env identity config
- Session-based auth (admin/peer roles, CSRF protection)
- DDNS: pic.ngo registration with TOTP, provider abstraction
- Service store: install/remove optional services from manifest index
- Cell-to-cell networking and peer-sync protocol
- Extended connectivity: WG external, OpenVPN, Tor exit routing
- Caddy HTTPS: Let's Encrypt (DNS-01/HTTP-01) or internal CA
- Current container list, port bindings, and security model
- Accurate make targets (ddns-update, reset-admin-password, etc.)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 04:35:37 -04:00
roof f1b48208fc Fix CI unit test failures and DDNS config wiring
Unit Tests / test (push) Failing after 8m58s
- auth_manager._ensure_file(): stop creating the empty auth_users.json on
  init — the constructor now only creates the parent directory.  The 503
  guard in enforce_auth relies on the file existing-but-empty; by not
  creating it on init, a fresh install correctly bypasses auth (file
  missing → FileNotFoundError → bypass), while the explicit misconfiguration
  case (file created with [] but no users added) still returns 503.
- test_enforce_auth_configured.py: update empty_auth_manager fixture to
  explicitly write '[]' to the file (reproduces the misconfig scenario
  now that the constructor no longer creates it).
- ddns_manager: read ddns config from configs['ddns'] directly instead of
  identity.domain.ddns — _identity.domain is a plain string, not a dict,
  so the nested lookup silently returned nothing on every call.
- setup_cell.py: write top-level 'ddns' block into cell_config.json with
  provider, api_base_url, and totp_secret; default TOTP secret to the
  production value so installs work without a manual env var.
- test_ddns_manager.py: update _make_config_manager to populate cm.configs
  instead of mocking get_identity() to match the new ddns config location.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 04:20:19 -04:00
roof ffe1dbeed6 Integrate DDNS registration and IP update into installer
Unit Tests / test (push) Failing after 8m57s
setup_cell.py: register_with_ddns() called at end of setup — detects
public IP via api.ipify.org, generates TOTP code from DDNS_TOTP_SECRET,
POSTs to DDNS /register, saves token to data/api/.ddns_token (mode 600).
Idempotent: skips if token file already exists. Fails gracefully if
DDNS_TOTP_SECRET is unset or network is unreachable.

scripts/ddns_update.py: standalone script for periodic IP updates.
Reads token from data/api/.ddns_token, fetches current public IP,
compares to cached last IP (data/api/.ddns_last_ip) and calls /update
only when the IP has actually changed.

Makefile: add ddns-update (run update script) and ddns-register (force
re-registration by removing old token then calling register_with_ddns).
Usage: DDNS_TOTP_SECRET=<secret> make ddns-register

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 02:28:02 -04:00
roof 15376b67c7 Add runtime-generated config paths to .gitignore
Unit Tests / test (push) Failing after 9m0s
config/api/dns/, config/api/network.json, config/api/webdav/ are
created at API startup and should never be tracked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 13:26:03 -04:00
roof 8efe8c1225 Merge PIC v2 — phases 1-5 + CI/CD: wizard, HTTPS, DDNS, service store, connectivity
Unit Tests / test (push) Failing after 8m52s
2026-05-09 12:11:15 -04:00
roof 64e60dc577 Add Gitea Actions CI workflows — unit tests on push, image builds on tag
Unit Tests / test (push) Failing after 9m3s
- test.yml: run unit tests on every push (all branches)
- release.yml: build and push pic-api + pic-webui images on v*.*.* tags

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 10:59:29 -04:00
roof e38bd4e81f Phase 5: extended connectivity — WireGuard ext, OpenVPN, Tor exit routing
- ConnectivityManager: per-peer exit routing via iptables fwmark/policy tables
  (wg_ext=0x10/t110, openvpn=0x20/t120, tor=0x30/t130)
- Dedicated PIC_CONNECTIVITY chains (mangle+nat), kill-switch FORWARD DROP
- Config upload with sanitization: strips PostUp/PostDown and OVpn script dirs
- Peer exit_via field added to peer registry (backward-compat, default=default)
- 7 Flask routes at /api/connectivity/*
- Connectivity.jsx: 693-line frontend with exit cards, peer assignment table
- 72 new tests for ConnectivityManager (72 passing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 10:48:20 -04:00
roof 0a21f22076 Phase 4: service store — manifest validation, install/remove, Store UI
- ServiceStoreManager: manifest allowlist (git.pic.ngo/roof/*), volume
  denylist, ACCEPT-only iptables rules, ${SERVICE_IP}-only dest_ip
- IP allocator: pool 172.20.0.20-254, skips CONTAINER_OFFSETS VIPs
- Compose overlay: docker-compose.services.yml auto-included via DCF
- Flask blueprint at /api/store: list, install, remove, refresh
- Store.jsx: full install/remove UI with spinners and toast notifications
- 95 new unit tests for ServiceStoreManager (all passing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 10:19:39 -04:00
roof f77d7fabcd Phase 3: ddns_manager — DDNS client, provider adapters, IP heartbeat
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 09:42:00 -04:00
roof 7d290c12c4 Phase 2: caddy_manager — Caddyfile generation, health monitor, DNS-01 support
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 09:04:11 -04:00
roof c1b1686cd9 Add frontend wiring for setup wizard — setupAPI, SetupGuard, /setup route
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 08:27:13 -04:00
roof cf1b9672f4 Phase 1: first-run setup wizard, bash installer, Docker profiles
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 08:05:38 -04:00
roof 6dbd0dff46 Add Gitea Actions CI workflows — unit tests on push, image build on tag
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 07:21:35 -04:00
roof 7391d7f7a2 Add e2e latency consistency test for WireGuard tunnel
Sends 50 pings at 0.2s intervals through the cell-to-cell tunnel and
asserts that ≤5% exceed 3× the median RTT (floor 15ms). Catches
server-side packet processing regressions on wired paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 15:13:27 -04:00
roof b8e57b6e51 Fix race condition in ensure_forward_stateful: add threading.Lock
Concurrent callers (health monitor + startup) could both pass the
delete-all loop and each insert a copy, producing duplicate
ESTABLISHED,RELATED rules. Lock serialises all calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 10:12:18 -04:00
roof 1b61e9e290 Fix ICMP latency: re-anchor ESTABLISHED,RELATED to FORWARD position 1 on every health tick
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 18:51:38 -04:00
roof 6f84a3ffe1 Fix e2e fixture: use Table=off + manual routes to avoid wg-quick conflict
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 13:31:53 -04:00
roof 0042b3b1bb Use alpine instead of busybox for cell subnet route injection
pic1 ships alpine but not busybox; ensure_cell_subnet_routes() now uses
the alpine image so route injection works on all cells.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 12:59:23 -04:00
roof e2c50c381a Fix cross-cell domain access: scope DNAT rules, add Docker→wg0 routing
- firewall_manager: add _get_wg_server_ip() helper; scope ensure_cell_api_dnat(),
  ensure_dns_dnat(), ensure_service_dnat() DNAT rules with -d server_ip; add
  ensure_wg_masquerade() (Docker→wg0 MASQUERADE+FORWARD) and
  ensure_cell_subnet_routes() (host routes via docker run busybox)
- wireguard_manager: scope PostUp DNAT rules with -d server_ip in generate_config()
  and ensure_postup_dnat(); add Docker→wg0 MASQUERADE+FORWARD rules
- app.py: call ensure_wg_masquerade() and ensure_cell_subnet_routes() in
  _apply_startup_enforcement()
- tests/test_firewall_manager.py: mock _get_wg_server_ip, add
  test_dnat_is_scoped_to_server_ip and test_returns_false_when_wg_server_ip_not_found
- tests/e2e/wg/test_cell_to_cell_routing.py: rewrite to use dynamic config
  (no hardcoded IPs/ports), add latency and domain access tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 12:37:02 -04:00
roof 1e1bda4679 Fix cross-cell ICMP routing: state-based cell DROP + e2e test
The cell catch-all DROP rule blocked all traffic from a connected cell's
subnet, including ESTABLISHED/RELATED packets (ICMP replies, TCP ACKs) for
connections initiated by local VPN peers. This broke ping to the remote
cell's WireGuard IP even when the cell-to-cell tunnel was healthy.

Change the DROP to match only NEW,INVALID connections so established reply
traffic passes through to the stateful ACCEPT rule.

Also adds tests/e2e/wg/test_cell_to_cell_routing.py — an end-to-end test
that brings up a real WireGuard tunnel from the test runner to pic1 and
verifies full cross-cell routing including ICMP ping, API /health, and Caddy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 10:59:11 -04:00
roof 5a4e292440 fix: allow reply traffic from connected cells through FORWARD chain
apply_cell_rules drops all traffic from a cell's subnet except specific
service ports. This also drops ICMP replies and TCP ACKs for connections
initiated by local peers to the connected cell, breaking cross-cell
routing (ping to 10.0.0.1 silently dropped by test's cell DROP rule).

Fix: ensure_forward_stateful() inserts a stateful ESTABLISHED,RELATED
ACCEPT at the top of FORWARD. Called from apply_cell_rules (every cell
add/update) and from _apply_startup_enforcement. Idempotent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 15:13:59 -04:00