Root causes fixed:
- Dead LOG_LEVEL globals() lookup pinned root logger at INFO regardless of
PIC_LOG_LEVEL env or config; replaced with _resolve_root_log_level() +
apply_root_log_level() which sets both root logger and all attached handlers
at startup and on runtime re-apply.
- set_service_level() only set the named 'pic.<service>' logger; bare module
loggers (e.g. 'caddy_manager') were never reached, so per-service log files
stayed 0 bytes. Fixed via _SERVICE_MODULE_LOGGERS map covering all managers.
- Log viewer GET /api/logs had no level filter; added ?level= query param.
- Per-service log levels lived in an out-of-band config/api/log_levels.json
side-file with no validation; migrated into ConfigManager under a new
'logging' section ({python:{root,services}, containers:{caddy,coredns,
wireguard,mailserver,api}}) with get/set helpers, invalid-level rejection,
and one-time migration from the old file on first load.
New capabilities:
- Container log levels: Caddy (injects global log { level X } + hot reload),
CoreDNS (DEBUG enables log plugin, else errors-only), WireGuard/mailserver
via pending_restart path.
- PUT /api/logs/verbosity accepts {python, containers} dict; returns per-entry
applied:hot|pending_restart status.
- Webui Logs page gains two-section Verbosity tab (Python services + Container
services) with needs-restart badges.
- managers.py wires per-service loggers before manager instantiation and
re-applies persisted levels from ConfigManager; legacy log_levels.json read
removed.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Security — WireGuard:
- Replace linuxserver/wireguard (privileged + SYS_MODULE + /lib/modules) with a
bespoke alpine image (wireguard/Dockerfile + entrypoint.sh): CAP_NET_ADMIN only,
119 MB → 14.7 MB. Modern kernels (≥5.6) have WireGuard built in; no module
loading required. Kernel-fallback comment left in compose for rare old kernels.
Security — supply-chain digest pins:
- CoreDNS image pinned by SHA-256 digest in docker-compose.yml.
- api/Dockerfile: python:3.11-slim and docker:27-cli pinned by digest.
- webui/Dockerfile: node:20-alpine and nginxinc/nginx-unprivileged:alpine pinned.
- ntp/Dockerfile: alpine:3.20 pinned by digest.
- wireguard/Dockerfile: alpine:3.20 pinned by digest.
Security — webui non-root:
- Switch from nginx:alpine (root, port 80) to nginxinc/nginx-unprivileged:alpine
(port 8080, runs as nginx uid 101). Compose port mapping and all Caddy upstream
references updated: cell-webui:80 → cell-webui:8080 everywhere.
API layer reduction (561 MB → 245 MB):
- Multi-stage api/Dockerfile: docker CLI copied from docker:27-cli stage instead
of being installed via apt from Docker's external repo (removes GPG key fetch,
lsb-release, gnupg, two apt-get update rounds). --no-install-recommends on
remaining apt install. mkdir folded into the same RUN layer.
Bug fix — WireGuard config path mismatch:
- setup_cell.py wrote wg0.conf to config/wireguard/wg0.conf but wireguard_manager
and the new entrypoint expect config/wireguard/wg_confs/wg0.conf (the standard
wg-quick sub-directory). Fixed by creating the wg_confs/ sub-dir and writing
there; REQUIRED_DIRS updated to pre-create it.
Bug fix — empty chrony.conf:
- config/ntp/chrony.conf was 0 bytes (pre-existing gap); added a real config
(pool.ntp.org + Cloudflare, allow 172.20/10.0, local stratum 10, driftfile,
makestep, rtcsync). NTP compose service now builds from ./ntp instead of
pulling alpine:latest and running apk at every container start.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Four bugs fixed:
1. Banner delay (up to 5 s): DraftConfigContext now exposes isDirty as
reactive useState so App.jsx re-renders immediately when any section
marks itself dirty, instead of waiting for the next checkPending() poll.
2. Banner re-triggers after Apply (race): For non-'*' container restarts
(e.g., cell_name → DNS restart) the background thread took ~300 ms to
clear _pending_restart. A concurrent checkPending() poll could see
needs_restart=True and overwrite the frontend's optimistic clear.
Fix: set needs_restart=False and applying=True synchronously before
spawning the thread.
3. Apply showed banner during applyPending() when hasDirty()==false:
setApplyStatus('saving') was skipped for the auto-save-then-apply
path, leaving applyStatus=null while applyPending() ran and the
banner stayed visible. Always set 'saving' before applyPending().
4. Cert status always 'unknown' in pic_ngo mode: _check_cert_via_ssl
connected to cell-caddy:443 but sent SNI='cell-caddy'. Caddy finds no
matching cert and returns nothing. Fix: pass the effective public
domain (e.g. pic1.pic.ngo) as SNI so Caddy returns the right cert.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix#2: Move DDNS bearer token from cell_config.json to data/api/ddns_token.
Token is now in the secrets store (data/) rather than the config store (config/).
Auto-migrates existing installs on first access. ConfigManager.get/set_ddns_token()
added. set_ddns_config() now strips 'token' key to prevent it leaking back.
- Fix#3: Set Caddyfile permissions to 0o600 after write so the token embedded
in the Caddyfile is not world-readable on the host filesystem.
- Fix#5: Heartbeat now fires IDENTITY_CHANGED after re-registration so Caddy
regenerates its config with the new token automatically — users no longer need
to click Re-register in Settings after a wizard registration failure.
Also: heartbeat skips the 401-cycle when no token exists and goes straight to
registration instead. DDNSManager now accepts service_bus= and is wired up.
- Fix#6: Settings page starts polling GET /api/caddy/cert-status every 15s
after a successful DDNS re-registration and shows "Acquiring certificate…"
feedback until Let's Encrypt issues the cert (up to 5 minutes).
- Fix#7: regenerate_with_installed() is debounced (5 s window) so two rapid
IDENTITY_CHANGED events (e.g. wizard + heartbeat) can't start simultaneous
ACME orders that interfere with each other.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GET http://cell-caddy:2019/ returns 404 because Caddy's admin API has no
root handler. The health monitor interpreted every response as a failure,
restarted Caddy every 3 minutes, and prevented ACME from ever completing.
/config/ returns 200 + the running config JSON whenever Caddy is up and
serving — that is the correct liveness indicator.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A stale or empty-token Caddyfile on disk caused Caddy to reject the
/load request, so the Renew button appeared to do nothing. Now
renew_cert() calls regenerate_with_installed([]) first, which writes a
fresh Caddyfile from current identity/config before reloading Caddy.
This ensures a broken on-disk file never blocks ACME renewal.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On a fresh install before DDNS registration completes, ddns.token is
empty. Writing `token ` (bare keyword, no value) causes Caddy to reject
the Caddyfile at startup with "wrong argument count or unexpected line
ending after 'token'".
Guard added: if the token is empty, generate a LAN-mode Caddyfile so
Caddy starts cleanly. The Caddyfile is regenerated automatically once
registration completes and the token is persisted to cell_config.json.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds live cert status, one-click ACME renewal, and custom cert upload
directly to the Vault page so users never need to touch Caddy config.
Backend:
- CaddyManager.get_cert_status() now returns domain, domain_mode, and
cert_type so the UI can render the right controls without a separate
identity fetch
- CaddyManager.renew_cert() reloads Caddy and invalidates the status
cache; the frontend polls until the cert turns valid
- CaddyManager.upload_custom_cert() validates PEM, writes cert+key to
the shared config/caddy/certs/ volume, updates identity (cert_type=custom),
and regenerates the Caddyfile so Caddy references the new paths
- LAN-mode Caddyfile switches from /etc/caddy/internal/ to the shared
certs dir automatically when cert_type=custom is set
- ddns_api default no longer includes /api/v1 — the plugin appends it;
legacy /api/v1 suffix is stripped at write time to keep the Caddyfile clean
- POST /api/caddy/cert-renew and POST /api/caddy/custom-cert routes added
Frontend:
- TLSPanel component at the top of Vault.jsx shows status badge
(valid/expiring-soon/expired/pending/internal) with domain and expiry
- Renew button visible only for ACME modes; spins during the API call
then polls GET /api/caddy/cert-status every 10 s until valid
- Upload Custom Cert opens a modal with PEM text areas; works for all modes
- caddyAPI.renewCert() and uploadCustomCert() added to api.js
Tests: 22 new tests across 5 classes covering enriched status,
renew_cert guards, upload_custom_cert validation/writes/persistence,
custom-cert Caddyfile path selection, and ddns_api suffix stripping.
All 2093 existing tests continue to pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. caddy_manager: embed ddns.token (registration bearer token) in
Caddyfile, not DDNS_TOTP_SECRET. The pic_ngo plugin sends the token
to POST /api/v1/dns-challenge; using the TOTP secret caused 401 on
every attempt.
2. firewall_manager: add _acme-challenge.<zone> forwarding block before
each split-horizon zone in the Corefile. Without this, CoreDNS was
authoritative for the challenge name and returned NODATA for TXT
queries (wildcard A record matches but wrong type), blocking Caddy's
internal DNS pre-verification step.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- CaddyManager: add refresh_cert_status() and get_cert_status_fresh() that
open a live TLS connection to cell-caddy:443 to read cert expiry; avoids
needing a volume mount into the API container
- CaddyManager: periodic cert refresh in health_monitor_loop (every 60 cycles)
- config.py PUT /api/ddns: publish IDENTITY_CHANGED so CaddyManager regenerates
the Caddyfile immediately after any domain/cell_name change — previously the
event was never fired from this route
- config.py: remove all ip_utils.write_caddyfile() calls; CaddyManager is now
the sole authority for Caddyfile generation
- app.py: add GET /api/caddy/cert-status route
- app.py: add GET /api/egress/status and PUT /api/egress/services/<id>/exit routes
- Settings.jsx: display cert status badge (valid/expired/internal/unknown) with
expiry date and days-remaining in the domain section
- Tests: TestRefreshCertStatus (8 tests), TestDdnsConfigUpdatesFiresIdentityChanged,
TestCaddyCertStatusRoute added; fix expired-cert helper to set not_valid_before
relative to expiry so it's always earlier
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
acme_ca and the pic_ngo DNS credentials ({$PIC_NGO_DDNS_TOKEN},
{$PIC_NGO_DDNS_API}) were written as Caddy env-var placeholders, but the
Caddy container does not inherit the API container's environment, so the
substitutions always failed — Caddy saw bare directive names with no
arguments and rejected the Caddyfile.
- _global_acme_block: only emit the acme_ca directive when ACME_CA_URL is
actually set; omitting it makes Caddy default to Let's Encrypt production.
- _caddyfile_pic_ngo: embed the DDNS_TOTP_SECRET and DDNS_URL values directly
into the Caddyfile at write time rather than relying on Caddy env expansion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Builtins (email/calendar/files) are no longer baked into the API image.
ServiceRegistry now only knows about installed store services. When nothing
is installed, Caddy and DNS get no service routes — no hardcoded fallback.
Changes:
- service_registry.py: remove _BUILTINS_DIR, _builtin_ids, _builtin_manifest,
_load_manifest; get() and list_all() now delegate entirely to installed services
- caddy_manager.py: remove _build_core_service_routes(); remove hardcoded
fallback pairs from _http01_service_pairs(); empty registry → api block only
- network_manager.py: _get_service_subdomains() returns [] when no registry
- api/services/builtins/: deleted (email, calendar, files manifests)
- Tests updated throughout: removed builtin-dependent assertions, added
installed-service fixtures, updated fallback expectations to api-only
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In DDNS modes (pic_ngo, cloudflare, duckdns, http01), all built-in
services are now reachable as subdomains of the cell domain, e.g.
calendar.pic1.pic.ngo instead of pic1.pic.ngo/calendar.
Key changes:
- CaddyManager._build_core_service_routes(): new helper generates
Caddy named-matcher host blocks for calendar, mail/webmail, files,
webdav, and api subdomains within the wildcard TLS server block.
- All ACME modes (pic_ngo, cloudflare, duckdns) use the new
subdomain matchers; http01 emits a dedicated server block per service.
- http01: installed store-plugin services whose name clashes with a
core service are skipped to prevent duplicate server blocks.
- routes/config.py: ip_utils.write_caddyfile() is skipped in non-LAN
modes so LAN Caddy config never overwrites the ACME config.
- firewall_manager.generate_corefile(): new split_horizon_zones param
adds local authoritative file zones so LAN clients resolve
*.pic1.pic.ngo to the internal Caddy IP without hairpin NAT.
- NetworkManager.update_split_horizon_zone(): writes the wildcard zone
file and regenerates the Corefile with the split-horizon block;
called automatically after every identity change in non-LAN mode.
- Added @ to allowed record-name chars in update_dns_zone validation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ConfigManager.get_effective_domain(): returns domain_name when DDNS
active (pic_ngo/cloudflare/duckdns), domain otherwise. Used by all
public-facing services so they use the real registered FQDN.
- ConfigManager.get_internal_domain(): always returns _identity.domain
(CoreDNS zone name, dnsmasq, cell-link invites — stays internal).
- Silent migration: if domain_mode != lan and domain is generic "cell",
auto-set to {cell_name}.local for unique CoreDNS zone naming.
- caddy_manager: fix custom_domain bug — cloudflare/http01 modes were
reading identity.get('custom_domain') which never exists; now reads
domain_name correctly.
- routes/config, app: expose effective_domain in GET /api/config and
/api/status responses.
- email_manager, routes/email: use get_effective_domain() for
OVERRIDE_HOSTNAME, POSTMASTER_ADDRESS, and new-user email defaults.
- ServiceBus.IDENTITY_CHANGED event: emitted from PUT /api/config and
POST /api/ddns/register after identity writes; caddy_manager and
email_manager subscribe to regenerate config automatically.
- Settings.jsx: hide Local Domain input in non-LAN modes; show
read-only effective_domain with "managed by DDNS" badge and an
Advanced toggle for the internal CoreDNS zone name.
- 11 new test classes covering all new helpers, event subscriptions,
caddy/email handlers, and the custom_domain fix.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>