Commit Graph

12 Commits

Author SHA1 Message Date
roof 3d750ed1e8 Fix DDNS security and reliability gaps (#2, #3, #5, #6, #7)
Unit Tests / test (push) Successful in 7m23s
- Fix #2: Move DDNS bearer token from cell_config.json to data/api/ddns_token.
  Token is now in the secrets store (data/) rather than the config store (config/).
  Auto-migrates existing installs on first access. ConfigManager.get/set_ddns_token()
  added. set_ddns_config() now strips 'token' key to prevent it leaking back.

- Fix #3: Set Caddyfile permissions to 0o600 after write so the token embedded
  in the Caddyfile is not world-readable on the host filesystem.

- Fix #5: Heartbeat now fires IDENTITY_CHANGED after re-registration so Caddy
  regenerates its config with the new token automatically — users no longer need
  to click Re-register in Settings after a wizard registration failure.
  Also: heartbeat skips the 401-cycle when no token exists and goes straight to
  registration instead. DDNSManager now accepts service_bus= and is wired up.

- Fix #6: Settings page starts polling GET /api/caddy/cert-status every 15s
  after a successful DDNS re-registration and shows "Acquiring certificate…"
  feedback until Let's Encrypt issues the cert (up to 5 minutes).

- Fix #7: regenerate_with_installed() is debounced (5 s window) so two rapid
  IDENTITY_CHANGED events (e.g. wizard + heartbeat) can't start simultaneous
  ACME orders that interfere with each other.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-09 03:37:48 -04:00
roof e9077b2633 fix: Caddy health check must hit /config/ not /
Unit Tests / test (push) Successful in 7m35s
GET http://cell-caddy:2019/ returns 404 because Caddy's admin API has no
root handler.  The health monitor interpreted every response as a failure,
restarted Caddy every 3 minutes, and prevented ACME from ever completing.

/config/ returns 200 + the running config JSON whenever Caddy is up and
serving — that is the correct liveness indicator.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 15:57:32 -04:00
roof da302b5d54 fix: renew_cert regenerates Caddyfile before reload
Unit Tests / test (push) Successful in 7m32s
A stale or empty-token Caddyfile on disk caused Caddy to reject the
/load request, so the Renew button appeared to do nothing. Now
renew_cert() calls regenerate_with_installed([]) first, which writes a
fresh Caddyfile from current identity/config before reloading Caddy.
This ensures a broken on-disk file never blocks ACME renewal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 14:38:30 -04:00
roof 7ef294fd65 fix: fall back to lan mode in pic_ngo Caddyfile when token is empty
Unit Tests / test (push) Successful in 7m42s
On a fresh install before DDNS registration completes, ddns.token is
empty. Writing `token ` (bare keyword, no value) causes Caddy to reject
the Caddyfile at startup with "wrong argument count or unexpected line
ending after 'token'".

Guard added: if the token is empty, generate a LAN-mode Caddyfile so
Caddy starts cleanly. The Caddyfile is regenerated automatically once
registration completes and the token is persisted to cell_config.json.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 13:38:51 -04:00
roof 33d255f089 feat: TLS certificate management in Vault page
Unit Tests / test (push) Successful in 7m26s
Adds live cert status, one-click ACME renewal, and custom cert upload
directly to the Vault page so users never need to touch Caddy config.

Backend:
- CaddyManager.get_cert_status() now returns domain, domain_mode, and
  cert_type so the UI can render the right controls without a separate
  identity fetch
- CaddyManager.renew_cert() reloads Caddy and invalidates the status
  cache; the frontend polls until the cert turns valid
- CaddyManager.upload_custom_cert() validates PEM, writes cert+key to
  the shared config/caddy/certs/ volume, updates identity (cert_type=custom),
  and regenerates the Caddyfile so Caddy references the new paths
- LAN-mode Caddyfile switches from /etc/caddy/internal/ to the shared
  certs dir automatically when cert_type=custom is set
- ddns_api default no longer includes /api/v1 — the plugin appends it;
  legacy /api/v1 suffix is stripped at write time to keep the Caddyfile clean
- POST /api/caddy/cert-renew and POST /api/caddy/custom-cert routes added

Frontend:
- TLSPanel component at the top of Vault.jsx shows status badge
  (valid/expiring-soon/expired/pending/internal) with domain and expiry
- Renew button visible only for ACME modes; spins during the API call
  then polls GET /api/caddy/cert-status every 10 s until valid
- Upload Custom Cert opens a modal with PEM text areas; works for all modes
- caddyAPI.renewCert() and uploadCustomCert() added to api.js

Tests: 22 new tests across 5 classes covering enriched status,
renew_cert guards, upload_custom_cert validation/writes/persistence,
custom-cert Caddyfile path selection, and ddns_api suffix stripping.
All 2093 existing tests continue to pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 12:53:42 -04:00
roof 85d265187d fix: Caddy TLS cert acquisition — two DNS-01 blockers
Unit Tests / test (push) Successful in 7m32s
1. caddy_manager: embed ddns.token (registration bearer token) in
   Caddyfile, not DDNS_TOTP_SECRET. The pic_ngo plugin sends the token
   to POST /api/v1/dns-challenge; using the TOTP secret caused 401 on
   every attempt.

2. firewall_manager: add _acme-challenge.<zone> forwarding block before
   each split-horizon zone in the Corefile. Without this, CoreDNS was
   authoritative for the challenge name and returned NODATA for TXT
   queries (wildcard A record matches but wrong type), blocking Caddy's
   internal DNS pre-verification step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:45:15 -04:00
roof 0267dce73d feat: HTTPS cert status, IDENTITY_CHANGED wiring, remove stale ip_utils Caddyfile writes
Unit Tests / test (push) Successful in 11m18s
- CaddyManager: add refresh_cert_status() and get_cert_status_fresh() that
  open a live TLS connection to cell-caddy:443 to read cert expiry; avoids
  needing a volume mount into the API container
- CaddyManager: periodic cert refresh in health_monitor_loop (every 60 cycles)
- config.py PUT /api/ddns: publish IDENTITY_CHANGED so CaddyManager regenerates
  the Caddyfile immediately after any domain/cell_name change — previously the
  event was never fired from this route
- config.py: remove all ip_utils.write_caddyfile() calls; CaddyManager is now
  the sole authority for Caddyfile generation
- app.py: add GET /api/caddy/cert-status route
- app.py: add GET /api/egress/status and PUT /api/egress/services/<id>/exit routes
- Settings.jsx: display cert status badge (valid/expired/internal/unknown) with
  expiry date and days-remaining in the domain section
- Tests: TestRefreshCertStatus (8 tests), TestDdnsConfigUpdatesFiresIdentityChanged,
  TestCaddyCertStatusRoute added; fix expired-cert helper to set not_valid_before
  relative to expiry so it's always earlier

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 11:39:36 -04:00
roof a906c26b5d fix: resolve Caddy env vars at write time to prevent parse errors
Unit Tests / test (push) Successful in 11m25s
acme_ca and the pic_ngo DNS credentials ({$PIC_NGO_DDNS_TOKEN},
{$PIC_NGO_DDNS_API}) were written as Caddy env-var placeholders, but the
Caddy container does not inherit the API container's environment, so the
substitutions always failed — Caddy saw bare directive names with no
arguments and rejected the Caddyfile.

- _global_acme_block: only emit the acme_ca directive when ACME_CA_URL is
  actually set; omitting it makes Caddy default to Let's Encrypt production.
- _caddyfile_pic_ngo: embed the DDNS_TOTP_SECRET and DDNS_URL values directly
  into the Caddyfile at write time rather than relying on Caddy env expansion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 15:01:15 -04:00
roof 0bfe95320b feat: Phase 2 — remove builtins layer, ServiceRegistry is installed-only
Unit Tests / test (push) Successful in 11m31s
Builtins (email/calendar/files) are no longer baked into the API image.
ServiceRegistry now only knows about installed store services. When nothing
is installed, Caddy and DNS get no service routes — no hardcoded fallback.

Changes:
- service_registry.py: remove _BUILTINS_DIR, _builtin_ids, _builtin_manifest,
  _load_manifest; get() and list_all() now delegate entirely to installed services
- caddy_manager.py: remove _build_core_service_routes(); remove hardcoded
  fallback pairs from _http01_service_pairs(); empty registry → api block only
- network_manager.py: _get_service_subdomains() returns [] when no registry
- api/services/builtins/: deleted (email, calendar, files manifests)
- Tests updated throughout: removed builtin-dependent assertions, added
  installed-service fixtures, updated fallback expectations to api-only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 08:53:44 -04:00
roof d7dbd596ab feat: route PIC services as subdomains of the cell's effective domain
Unit Tests / test (push) Successful in 11m33s
In DDNS modes (pic_ngo, cloudflare, duckdns, http01), all built-in
services are now reachable as subdomains of the cell domain, e.g.
calendar.pic1.pic.ngo instead of pic1.pic.ngo/calendar.

Key changes:
- CaddyManager._build_core_service_routes(): new helper generates
  Caddy named-matcher host blocks for calendar, mail/webmail, files,
  webdav, and api subdomains within the wildcard TLS server block.
- All ACME modes (pic_ngo, cloudflare, duckdns) use the new
  subdomain matchers; http01 emits a dedicated server block per service.
- http01: installed store-plugin services whose name clashes with a
  core service are skipped to prevent duplicate server blocks.
- routes/config.py: ip_utils.write_caddyfile() is skipped in non-LAN
  modes so LAN Caddy config never overwrites the ACME config.
- firewall_manager.generate_corefile(): new split_horizon_zones param
  adds local authoritative file zones so LAN clients resolve
  *.pic1.pic.ngo to the internal Caddy IP without hairpin NAT.
- NetworkManager.update_split_horizon_zone(): writes the wildcard zone
  file and regenerates the Corefile with the split-horizon block;
  called automatically after every identity change in non-LAN mode.
- Added @ to allowed record-name chars in update_dns_zone validation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 04:31:57 -04:00
roof 1f016de855 feat: make DDNS domain_name the effective domain across all services
Unit Tests / test (push) Successful in 11m35s
- ConfigManager.get_effective_domain(): returns domain_name when DDNS
  active (pic_ngo/cloudflare/duckdns), domain otherwise. Used by all
  public-facing services so they use the real registered FQDN.
- ConfigManager.get_internal_domain(): always returns _identity.domain
  (CoreDNS zone name, dnsmasq, cell-link invites — stays internal).
- Silent migration: if domain_mode != lan and domain is generic "cell",
  auto-set to {cell_name}.local for unique CoreDNS zone naming.
- caddy_manager: fix custom_domain bug — cloudflare/http01 modes were
  reading identity.get('custom_domain') which never exists; now reads
  domain_name correctly.
- routes/config, app: expose effective_domain in GET /api/config and
  /api/status responses.
- email_manager, routes/email: use get_effective_domain() for
  OVERRIDE_HOSTNAME, POSTMASTER_ADDRESS, and new-user email defaults.
- ServiceBus.IDENTITY_CHANGED event: emitted from PUT /api/config and
  POST /api/ddns/register after identity writes; caddy_manager and
  email_manager subscribe to regenerate config automatically.
- Settings.jsx: hide Local Domain input in non-LAN modes; show
  read-only effective_domain with "managed by DDNS" badge and an
  Advanced toggle for the internal CoreDNS zone name.
- 11 new test classes covering all new helpers, event subscriptions,
  caddy/email handlers, and the custom_domain fix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 02:48:47 -04:00
roof 7d290c12c4 Phase 2: caddy_manager — Caddyfile generation, health monitor, DNS-01 support
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 09:04:11 -04:00