- config/cosign/cosign.pub: public verification key committed to repo (safe);
cosign private key lives in /home/roof/.pic-secrets/ and is NEVER committed
- api/config_manager.py: image_verification config block (modes: off|warn|enforce,
default: warn) so existing deployments are unaffected until images are signed
- api/service_composer.py: cosign verify before pull/up; enforce aborts the
operation, warn logs and proceeds, off skips entirely; also fixes the prior
unsafe proceed-on-pull-failure path
- api/service_store_manager.py: store-image digest requirement (warn default,
reject under enforce)
- api/Dockerfile: cosign binary copied from the official cosign image
- docker-compose.yml: config/cosign/ bind-mounted into cell-api container
- install.sh: ensure/verify bundled cosign pubkey on new cell installs
- api/manifest_validator.py: validate_build_context() — Dockerfile lint
- tests: full coverage for config modes, composer verify paths, store digest
guard, and validate_build_context
Verification defaults to warn so nothing breaks in production until images are
signed (phase 2). Private key stored outside git at /home/roof/.pic-secrets/.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Three independent bugs surfaced during pic1 clean-install testing:
1. Tor _exit_status hardcoded configured=True regardless of whether Tor was
actually installed. Status now flows through the same store-installed /
container-running bridge used by every other optional service, so Tor only
reports installed when the container is present and running.
2. check_port_open compared the port from wg0.conf against the kernel-reported
listening port, causing false "port closed" results whenever the conf and the
running container were momentarily out of sync. The function is now an honest
liveness check: any wg0 interface that is up and has a "listening port:" line
in `wg show` is considered open. The check-port API endpoint now also returns
the actual kernel listening_port and a port_mismatch flag so the UI can inform
the user when a container recreate is needed. (The recreate machinery already
exists via the port-change pending-restart path; this fix makes the mismatch
visible rather than silently lying about reachability.)
3. upload_backup only handled .zip archives; encrypted .age blobs were rejected
with a generic error. The endpoint now calls backup_crypto.is_encrypted() to
detect Age-encrypted blobs and stores them verbatim as <id>.tar.gz.age with
mode 0600 so they can be uploaded and then restored with a passphrase. The
plaintext zip path is unchanged.
Tests added/updated: test_connectivity_manager.py (Tor status bridge),
test_wireguard_manager.py + test_wireguard_endpoints.py (port-check liveness
and mismatch flag), test_config_backup_restore_http.py (encrypted upload
round-trip).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
cell exits surface as cell_relay connections via reconcile, bridged onto
the existing cell route_via mechanism, health from handshake, loop
detection, assignable in the unified UI
- CELL_RELAY_TYPE constant; not manually creatable
- reconcile_cell_relays() derives connections from cell links offering an
exit (name "Cell: <cellname>", mark+table only, no iface/port/container)
- apply_routes bridges cell_relay to existing route_via path via
apply_peer_route_via + cell firewall rules + set_exit_relay_active;
keeps peer.route_via in sync
- _probe_cell_relay health from cell handshake + offer state
- _cell_relay_loops loop detection at assign and apply time
- FAILOPEN_DEFAULTS cell_relay=False
- set_peer_exit clears stale route_via on reassignment
- reconcile hooked into PUT /exit-offer and peer-sync/permissions handlers
- cell_link_manager + wireguard_manager wired into connectivity_manager
- UI: cell_relay in TYPE_META/GROUP_TYPES/GROUP_LABELS (Cells optgroup),
removed "coming soon" placeholder
- 18 new tests in tests/test_connectivity_cell_relay.py
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
instanceable rendering, per-instance up/down on create/delete,
store-service-installed gate, per-instance health
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Health probes (probe_health/refresh_health) are type-aware: WireGuard
checks the last WG handshake timestamp, OpenVPN checks the tun/tap
interface, Tor checks the control-port GETINFO, and sshuttle/proxy
types do a TCP reachability probe to the remote endpoint. Results are
persisted via set_connection_status and wired into the health_monitor_loop
so the UI always has a current health snapshot without polling.
Per-peer fail-open semantics: VPN, SSH, and proxy connections default to
fail-closed (kill-switch stays active even when the tunnel is down).
Tor defaults to fail-open. The default can be overridden per-peer via
set_peer_failopen/effective_failopen. apply_routes skips the fwmark and
kill-switch rules for any fail-open peer whose connection health is not
"working", letting traffic fall back to direct routing transparently.
New generic admin-only connection CRUD endpoints (GET/POST/PUT/DELETE
/api/connectivity/connections, GET /<id>/health, PUT
/api/connectivity/peers/<peer>/failopen) are guarded by the existing
admin role check. connection.create, connection.update, connection.delete,
and peer.failopen are all registered in ROUTE_ACTION_MAP for the audit
hook so every change is recorded in the owner-visible change log.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Add AuditManager (api/audit_manager.py): JSONL append-only log at
data/api/audit/audit.log with SHA-256 hash chain for tamper detection,
verify endpoint, size-based rotation, and automatic redaction of secret
fields before any entry is written. Supports structured query (actor,
action, date range) and CSV export.
Wire an @app.after_request hook in app.py that fires on every mutating
/api/* request: captures actor, role, remote IP, and maps the route +
method to a human-readable action via ROUTE_ACTION_MAP. Explicit audit
entries for password_change and password_reset are added in
auth_routes.py so those events record the actor without logging secret
values.
Expose an admin-only blueprint (api/routes/audit.py):
GET /api/audit — paginated query
GET /api/audit/export — CSV download
GET /api/audit/verify — hash-chain integrity check
Register AuditManager in managers.py and add api/audit to
config_manager.py critical_data_paths so it is included in backups and
restored with other persistent state.
Add Activity page (webui/src/pages/Activity.jsx, admin-only) reachable
from the nav in App.jsx. New auditAPI helper in api.js covers all three
endpoints.
Tests: test_audit_manager.py (unit: hash chain, redaction, rotation,
query, csv, verify) and test_audit_hook_routes.py (integration: hook
fires on mutating routes, skips safe methods, records actor/ip/action,
backup-inclusion assertion).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Root causes fixed:
- Dead LOG_LEVEL globals() lookup pinned root logger at INFO regardless of
PIC_LOG_LEVEL env or config; replaced with _resolve_root_log_level() +
apply_root_log_level() which sets both root logger and all attached handlers
at startup and on runtime re-apply.
- set_service_level() only set the named 'pic.<service>' logger; bare module
loggers (e.g. 'caddy_manager') were never reached, so per-service log files
stayed 0 bytes. Fixed via _SERVICE_MODULE_LOGGERS map covering all managers.
- Log viewer GET /api/logs had no level filter; added ?level= query param.
- Per-service log levels lived in an out-of-band config/api/log_levels.json
side-file with no validation; migrated into ConfigManager under a new
'logging' section ({python:{root,services}, containers:{caddy,coredns,
wireguard,mailserver,api}}) with get/set helpers, invalid-level rejection,
and one-time migration from the old file on first load.
New capabilities:
- Container log levels: Caddy (injects global log { level X } + hot reload),
CoreDNS (DEBUG enables log plugin, else errors-only), WireGuard/mailserver
via pending_restart path.
- PUT /api/logs/verbosity accepts {python, containers} dict; returns per-entry
applied:hot|pending_restart status.
- Webui Logs page gains two-section Verbosity tab (Python services + Container
services) with needs-restart badges.
- managers.py wires per-service loggers before manager instantiation and
re-applies persisted levels from ConfigManager; legacy log_levels.json read
removed.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
apply_routes now iterates over connection instances rather than types:
each instance gets its own fwmark, routing table, interface, and
redirect_port via _routing_connections / _resolve_peer_connection /
_apply_connection_for_src; kill-switch is enforced per iface-instance.
Old per-type MARKS/TABLES constants are kept only as migration scaffolding.
peer_registry: exit_via is now stored as a connection id (or 'default');
_migrate_exit_via_to_connection_id runs on _load_peers to upgrade legacy
type-string values; set_peer_exit_via validates against known connection
ids; VALID_EXIT_VIA removed; config_manager wired in from managers.py.
egress_manager: egress_overrides keyed by service_id → connection_id;
local MARKS/TABLES/EXIT_TYPES/_REDIRECT_PORTS/_add_tor_redirect removed;
(mark, table, redirect_port) resolved at apply-time via
connectivity_manager.get_connection; manifest egress.allowed still
enforced by connection type.
api/app.py + api.js: PUT peer/service exit endpoints accept {connection_id};
back-compat shim resolves a legacy type string to its single active instance.
Tests extended: two same-type instances produce distinct marks/tables/ports;
peer exit_via and egress override id migrations round-trip correctly;
single-instance behaviour is equivalent to the old type-keyed path.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Migrate from the single-exit-per-type model (one wireguard_exit, one
tor_exit, etc.) to N named connection instances, each carrying its own
resource allocations and vault-backed secret refs.
config_manager.py:
- Connectivity v2 schema: top-level `connections` list, each entry has
id, name, type, enabled, status, config, secret_ref, and allocated
resources (mark, table, iface, redirect_port).
- Helpers: get_connectivity / list_connections / get_connection /
add_connection / update_connection / delete_connection /
set_connection_status.
- v1→v2 migration: promotes legacy wireguard_exit / tor fields into
the new list on first load; idempotent on v2 configs.
connectivity_manager.py:
- Resource allocator: per-instance fwmark range 0x1000–0x1FFF, routing
table range 1000+, interface names, and redirect ports 9100–9199;
all tracked in config to survive restarts.
- Connection CRUD: create / update / delete / list / get with vault
secret refs for WireGuard private keys and Tor credentials.
- Single-Tor enforcement: rejects a second tor/tor_bridge instance at
creation time.
- Per-instance config validation for each connection type.
- apply_routes, peer wiring, and egress hookups are intentionally left
unchanged in this phase; they land in later phases alongside UI.
tests/test_connectivity_connections.py (new, 473 lines):
- Allocator uniqueness, v1→v2 migration round-trip, CRUD lifecycle,
single-Tor enforcement, and status transitions.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The pic1 commit (c65beb2) correctly removed rp_filter sysctl from
WireGuard PostUp/PostDown because writing /proc/sys fails in the
unprivileged (NET_ADMIN-only) container and crashed wg-quick. Two
tests that asserted rp_filter was present were left stale. Replace
them with a single test asserting rp_filter is NOT in the generated
config, restoring green main.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Security — WireGuard:
- Replace linuxserver/wireguard (privileged + SYS_MODULE + /lib/modules) with a
bespoke alpine image (wireguard/Dockerfile + entrypoint.sh): CAP_NET_ADMIN only,
119 MB → 14.7 MB. Modern kernels (≥5.6) have WireGuard built in; no module
loading required. Kernel-fallback comment left in compose for rare old kernels.
Security — supply-chain digest pins:
- CoreDNS image pinned by SHA-256 digest in docker-compose.yml.
- api/Dockerfile: python:3.11-slim and docker:27-cli pinned by digest.
- webui/Dockerfile: node:20-alpine and nginxinc/nginx-unprivileged:alpine pinned.
- ntp/Dockerfile: alpine:3.20 pinned by digest.
- wireguard/Dockerfile: alpine:3.20 pinned by digest.
Security — webui non-root:
- Switch from nginx:alpine (root, port 80) to nginxinc/nginx-unprivileged:alpine
(port 8080, runs as nginx uid 101). Compose port mapping and all Caddy upstream
references updated: cell-webui:80 → cell-webui:8080 everywhere.
API layer reduction (561 MB → 245 MB):
- Multi-stage api/Dockerfile: docker CLI copied from docker:27-cli stage instead
of being installed via apt from Docker's external repo (removes GPG key fetch,
lsb-release, gnupg, two apt-get update rounds). --no-install-recommends on
remaining apt install. mkdir folded into the same RUN layer.
Bug fix — WireGuard config path mismatch:
- setup_cell.py wrote wg0.conf to config/wireguard/wg0.conf but wireguard_manager
and the new entrypoint expect config/wireguard/wg_confs/wg0.conf (the standard
wg-quick sub-directory). Fixed by creating the wg_confs/ sub-dir and writing
there; REQUIRED_DIRS updated to pre-create it.
Bug fix — empty chrony.conf:
- config/ntp/chrony.conf was 0 bytes (pre-existing gap); added a real config
(pool.ntp.org + Cloudflare, allow 172.20/10.0, local stratum 10, driftfile,
makestep, rtcsync). NTP compose service now builds from ./ntp instead of
pulling alpine:latest and running apk at every container start.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Adds TestStartupCaddyRegen::test_startup_regenerates_caddyfile_first,
asserting that _apply_startup_enforcement() calls
caddy_manager.regenerate_with_installed([]) before any peer/iptables work.
This pins the fix that ensures a stale on-disk Caddyfile (e.g. missing
`admin 0.0.0.0:2019`) is overwritten at startup and cannot cause the health
monitor to restart Caddy every few minutes.
Also restores two displaced lines in test_health_history_maxlen_evicts_old_entries.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
VPN clients got dns_probe_finished_bad_config / couldn't resolve any domain
after first setup because:
1. complete_setup() never wrote the split-horizon DNS zone for non-LAN modes;
SetupManager now accepts network_manager as an optional 3rd constructor
param, and complete_setup() calls
self.network_manager.update_split_horizon_zone(effective_domain, wg_ip,
primary_domain) for pic_ngo/cell_to_cell modes.
2. generate_corefile() used a tmp-file + os.replace pattern; the Corefile is
a Docker FILE bind-mount, so os.replace orphaned the inode and CoreDNS
never saw config updates. Fixed by truncating and rewriting in place
(open with 'w', seek(0), truncate()), preserving the inode CoreDNS holds.
api/managers.py passes network_manager into SetupManager.
Tests: new mock_network_manager fixture, 2 setup-zone tests, 1 inode
regression test in test_firewall_manager.py.
Verified live on pic1.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Coverage was below acceptable levels and several newly-added code paths
(sshuttle egress, proxy egress, DDNS provider stubs, DNS overview route,
peer-registry provisioning) had zero test coverage.
~250 new unit tests are added across 16 new test files. Existing test files
are updated to match refactored interfaces (DHCP removed, constants
introduced, network_manager restructured). .coveragerc is added to pin the
source mapping and the 70% floor so regressions are caught at commit time.
tests/test_enhanced_api.py was previously living in api/ (wrong location)
and is moved to tests/ where it belongs.
Integration test files are updated to remove references to DHCP endpoints
and add coverage for the new DNS overview and DDNS sync endpoints.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Four bugs fixed:
1. Banner delay (up to 5 s): DraftConfigContext now exposes isDirty as
reactive useState so App.jsx re-renders immediately when any section
marks itself dirty, instead of waiting for the next checkPending() poll.
2. Banner re-triggers after Apply (race): For non-'*' container restarts
(e.g., cell_name → DNS restart) the background thread took ~300 ms to
clear _pending_restart. A concurrent checkPending() poll could see
needs_restart=True and overwrite the frontend's optimistic clear.
Fix: set needs_restart=False and applying=True synchronously before
spawning the thread.
3. Apply showed banner during applyPending() when hasDirty()==false:
setApplyStatus('saving') was skipped for the auto-save-then-apply
path, leaving applyStatus=null while applyPending() ran and the
banner stayed visible. Always set 'saving' before applyPending().
4. Cert status always 'unknown' in pic_ngo mode: _check_cert_via_ssl
connected to cell-caddy:443 but sent SNI='cell-caddy'. Caddy finds no
matching cert and returns nothing. Fix: pass the effective public
domain (e.g. pic1.pic.ngo) as SNI so Caddy returns the right cert.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_bootstrap_dns runs at container start before the wizard, writing the
default cell name ('mycell') into cell.zone. When the wizard completed
it fired IDENTITY_CHANGED for Caddy but never updated the DNS zone, so
DNS records kept showing 'mycell.cell' even after naming the cell.
After successful wizard completion, call network_manager.apply_cell_name
to rename the hostname record in the primary zone file, then reload
CoreDNS. The empty old_name triggers auto-detection so it works even
when the zone was written with the env-var default.
Adds test_setup_route.py covering: apply_cell_name called on success,
not called on failure, 410 on repeat completion, and IDENTITY_CHANGED
publication.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix#2: Move DDNS bearer token from cell_config.json to data/api/ddns_token.
Token is now in the secrets store (data/) rather than the config store (config/).
Auto-migrates existing installs on first access. ConfigManager.get/set_ddns_token()
added. set_ddns_config() now strips 'token' key to prevent it leaking back.
- Fix#3: Set Caddyfile permissions to 0o600 after write so the token embedded
in the Caddyfile is not world-readable on the host filesystem.
- Fix#5: Heartbeat now fires IDENTITY_CHANGED after re-registration so Caddy
regenerates its config with the new token automatically — users no longer need
to click Re-register in Settings after a wizard registration failure.
Also: heartbeat skips the 401-cycle when no token exists and goes straight to
registration instead. DDNSManager now accepts service_bus= and is wired up.
- Fix#6: Settings page starts polling GET /api/caddy/cert-status every 15s
after a successful DDNS re-registration and shows "Acquiring certificate…"
feedback until Let's Encrypt issues the cert (up to 5 minutes).
- Fix#7: regenerate_with_installed() is debounced (5 s window) so two rapid
IDENTITY_CHANGED events (e.g. wizard + heartbeat) can't start simultaneous
ACME orders that interfere with each other.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GET http://cell-caddy:2019/ returns 404 because Caddy's admin API has no
root handler. The health monitor interpreted every response as a failure,
restarted Caddy every 3 minutes, and prevented ACME from ever completing.
/config/ returns 200 + the running config JSON whenever Caddy is up and
serving — that is the correct liveness indicator.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A stale or empty-token Caddyfile on disk caused Caddy to reject the
/load request, so the Renew button appeared to do nothing. Now
renew_cert() calls regenerate_with_installed([]) first, which writes a
fresh Caddyfile from current identity/config before reloading Caddy.
This ensures a broken on-disk file never blocks ACME renewal.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On a fresh install before DDNS registration completes, ddns.token is
empty. Writing `token ` (bare keyword, no value) causes Caddy to reject
the Caddyfile at startup with "wrong argument count or unexpected line
ending after 'token'".
Guard added: if the token is empty, generate a LAN-mode Caddyfile so
Caddy starts cleanly. The Caddyfile is regenerated automatically once
registration completes and the token is persisted to cell_config.json.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds live cert status, one-click ACME renewal, and custom cert upload
directly to the Vault page so users never need to touch Caddy config.
Backend:
- CaddyManager.get_cert_status() now returns domain, domain_mode, and
cert_type so the UI can render the right controls without a separate
identity fetch
- CaddyManager.renew_cert() reloads Caddy and invalidates the status
cache; the frontend polls until the cert turns valid
- CaddyManager.upload_custom_cert() validates PEM, writes cert+key to
the shared config/caddy/certs/ volume, updates identity (cert_type=custom),
and regenerates the Caddyfile so Caddy references the new paths
- LAN-mode Caddyfile switches from /etc/caddy/internal/ to the shared
certs dir automatically when cert_type=custom is set
- ddns_api default no longer includes /api/v1 — the plugin appends it;
legacy /api/v1 suffix is stripped at write time to keep the Caddyfile clean
- POST /api/caddy/cert-renew and POST /api/caddy/custom-cert routes added
Frontend:
- TLSPanel component at the top of Vault.jsx shows status badge
(valid/expiring-soon/expired/pending/internal) with domain and expiry
- Renew button visible only for ACME modes; spins during the API call
then polls GET /api/caddy/cert-status every 10 s until valid
- Upload Custom Cert opens a modal with PEM text areas; works for all modes
- caddyAPI.renewCert() and uploadCustomCert() added to api.js
Tests: 22 new tests across 5 classes covering enriched status,
renew_cert guards, upload_custom_cert validation/writes/persistence,
custom-cert Caddyfile path selection, and ddns_api suffix stripping.
All 2093 existing tests continue to pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. caddy_manager: embed ddns.token (registration bearer token) in
Caddyfile, not DDNS_TOTP_SECRET. The pic_ngo plugin sends the token
to POST /api/v1/dns-challenge; using the TOTP secret caused 401 on
every attempt.
2. firewall_manager: add _acme-challenge.<zone> forwarding block before
each split-horizon zone in the Corefile. Without this, CoreDNS was
authoritative for the challenge name and returned NODATA for TXT
queries (wildcard A record matches but wrong type), blocking Caddy's
internal DNS pre-verification step.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The method is named get_email_users in EmailManager; the route was
calling the non-existent get_users, causing an AttributeError on every
GET /api/email/users request.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
VPN peers can reach Caddy via the host's WireGuard interface (10.0.0.1),
not via the Docker bridge IP (172.20.0.2) which is unreachable outside
the container network. _bootstrap_dns now calls _get_wg_server_ip()
instead of ip_utils.get_service_ips() so the internal zone returns a
routable address for service subdomains.
Also log config save failures instead of silently swallowing them —
the silent PermissionError/OSError was masking write failures and
making it impossible to diagnose why installed services disappeared
after container restarts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
make start-core (called by install.sh step 6) used $(DCF) which includes
docker-compose.services.yml — that file declares cell-network as external:true.
On a fresh machine the network doesn't exist yet, so compose up failed with
"network cell-network declared as external, but could not be found".
Fix: add the same network-create idempotency guard that start and update
already have. Also add 26 regression tests (test_install_process.py) that
verify install.sh structure and that all start-* targets using DCF create
the network before running compose up.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- PicNgoDDNS.update(): send token in request body instead of Authorization
header; DDNS server validates it from body (was returning HTTP 422 on
every heartbeat, leaving IP record stale after fresh install)
- peers.py / Peers.jsx: webdav service_access only valid when 'files' store
service is installed; was always shown even with no services, confusing
users into thinking WebDAV was pre-installed
- 10 new regression tests: DDNS update body contract, Caddy always
regenerates on startup with no services, peer role allowed on
/api/services/active, webdav gating by installed services
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- DNS (critical): add _configured_dns_params() that returns (primary_domain,
split_horizon_zones) from config_manager so all apply_all_dns_rules() callers
pass the correct primary zone (e.g. 'pic.ngo') and split-horizon list
(e.g. ['pic1.pic.ngo']) instead of the FQDN as the primary — fixes
DNS_PROBE_FINISHED_BAD_CONFIG for all external domains when on VPN
- firewall_manager: add split_horizon_zones param to apply_all_dns_rules()
and forward it to generate_corefile()
- Peers: filter service_access list to installed services only; peers.py
derives valid services from config_manager.get_installed_services() with
the email→mail ID mapping; Peers.jsx fetches from /api/store/installed
and filters the checkboxes and defaults accordingly
- Health check: fix file_manager→'files' ID mapping so files service health
is checked when installed (was silently skipped due to 'file' vs 'files')
- Verbosity persistence: move log_levels.json from non-mounted
/app/api/config/ to CONFIG_DIR (/app/config/) which maps to config/api/
on the host; both load (managers.py) and save (routes/services.py) updated
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_build_dns_records() only hardcoded 'api' and 'webui', relying on the
optional service registry for the rest. Built-in services (calendar,
files, mail, webdav) were never registered, so they were absent from
the zone file and tests querying webdav.<domain> via CoreDNS got
NXDOMAIN.
Add _BUILTIN_SERVICE_SUBDOMAINS constant and include those names in
every zone build. Also update _stale and apply_cell_name exclusion
sets so DDNS mode correctly removes them from the parent zone.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nft lives in /usr/sbin which is absent from the non-root PATH on Debian.
The delete call already used sudo; add it to the list call too so the
session-scoped cleanup fixture doesn't crash before any test runs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
wg-quick creates an nftables 'preraw' table per interface that drops
decrypted ICMP replies arriving on any other interface. If a test run
crashes before bring_down(), the table persists and silently kills pings
on subsequent runs (handshake succeeds, replies are decrypted, but the
stale table drops them before the ping process sees them).
Extend cleanup_stale_e2e_interfaces() to also delete any orphaned
wg-quick-pic-e2e-* nftables tables found on the host.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
test_wg_connect_and_ping_server and the connected_peer fixture hardcoded
10.0.0.1 / 10.0.0.0/24 as the server VPN address. This breaks when the
server uses a different subnet (e.g. pic1 uses 10.0.1.1/24). Now both
read 'address' from /api/wireguard/status at session start and pass the
live server_ip / server_network through wg_server_info and connected_peer.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The test_remote_permissions_pushed_to_cell2 test verifies that permission
changes on cell1 are pushed to cell2 via the WireGuard tunnel. When both
cells use a public endpoint (DDNS VPS) instead of LAN IPs, no tunnel is
established and the push silently fails. The test now probes cell2's API
at its WG DNS IP before asserting the push succeeded — skips gracefully
if the tunnel is down rather than failing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
API:
- _configured_domain() now prefers _identity.domain_name (full FQDN
e.g. 'test5.pic.ngo') over domain ('pic.ngo'). Service URLs in
/api/peer/services and /api/peer/dashboard now correctly return
'calendar.test5.pic.ngo' instead of 'calendar.pic.ngo'.
WG e2e tests:
- test_api_domain_returns_json_not_webui: accept 3xx redirect as
valid routing (Caddy redirects HTTP→HTTPS in pic_ngo mode).
- test_catchall_api_path_returns_json and test_catchall_root_serves_webui:
skip when Caddy is in HTTPS-redirect mode — catch-all :80 block only
exists in HTTP-mode cells (lan/local domain).
- test_http_api_domain_reaches_api: replace --dns-servers (requires
c-ares) with dig + curl --host pattern.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cells with wildcard zone (e.g. * -> 172.20.0.2) and cells with per-service
VIP DNS records are both valid. Accept either in the assertion so the test
passes regardless of the zone file style.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
test_peer_services_* functions hardcoded 'http://192.168.31.51:3000' as the
fallback for PIC_API_BASE, causing failures when tests run on any other host
(including pic1 itself). Use the api_base fixture, which reads PIC_HOST and
PIC_API_PORT from the environment.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The synthetic cell fixture used a 46-char base64 key where the validator
expects exactly 43 chars before '='. The key failed format validation so
add_cell_peer returned False, making the cell connection store nothing and
all TestCellPermissionsApi tests hit 404.
The TestCellServiceAccessRestrictions and TestLiveCellConnection teardown
fixtures called _remove_connection(cell2_client, ...) without checking if
cell2_client is None (expected when no second cell is configured), causing
AttributeError on teardown.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The e2e tests were reading a stale Corefile at a hardcoded fallback path
(/home/roof/pic/config/dns/Corefile) instead of the live one written by
the API (/opt/pic/config/dns/Corefile on pic1). Adding a proper API
endpoint eliminates the path ambiguity.
The iptables test was checking whether peer_ip, DROP, and dpt:80 appeared
anywhere in the full multi-line output rather than on the same rule line,
producing false positives. Now checks per line.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
POST requests from PicAPIClient were failing with 403 (CSRF token missing)
because the login response csrf_token was not being applied to subsequent
request headers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- CaddyManager: add refresh_cert_status() and get_cert_status_fresh() that
open a live TLS connection to cell-caddy:443 to read cert expiry; avoids
needing a volume mount into the API container
- CaddyManager: periodic cert refresh in health_monitor_loop (every 60 cycles)
- config.py PUT /api/ddns: publish IDENTITY_CHANGED so CaddyManager regenerates
the Caddyfile immediately after any domain/cell_name change — previously the
event was never fired from this route
- config.py: remove all ip_utils.write_caddyfile() calls; CaddyManager is now
the sole authority for Caddyfile generation
- app.py: add GET /api/caddy/cert-status route
- app.py: add GET /api/egress/status and PUT /api/egress/services/<id>/exit routes
- Settings.jsx: display cert status badge (valid/expired/internal/unknown) with
expiry date and days-remaining in the domain section
- Tests: TestRefreshCertStatus (8 tests), TestDdnsConfigUpdatesFiresIdentityChanged,
TestCaddyCertStatusRoute added; fix expired-cert helper to set not_valid_before
relative to expiry so it's always earlier
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- add_peer() now calls account_manager.provision() for any installed store
service whose manifest declares accounts.manager == 'http', enabling
per-peer credential provisioning to third-party HTTP services
- reapply_on_startup() calls egress_manager.apply_all() so fwmark rules
survive container restarts without manual intervention
- add GET /api/egress/status and PUT /api/egress/services/<id>/exit routes
so the UI can read and override per-service egress policy
- tests: HTTP provision wiring (happy path + non-fatal failure), egress
apply_all at startup (wired/unwired/failure cases)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
acme_ca and the pic_ngo DNS credentials ({$PIC_NGO_DDNS_TOKEN},
{$PIC_NGO_DDNS_API}) were written as Caddy env-var placeholders, but the
Caddy container does not inherit the API container's environment, so the
substitutions always failed — Caddy saw bare directive names with no
arguments and rejected the Caddyfile.
- _global_acme_block: only emit the acme_ca directive when ACME_CA_URL is
actually set; omitting it makes Caddy default to Let's Encrypt production.
- _caddyfile_pic_ngo: embed the DDNS_TOTP_SECRET and DDNS_URL values directly
into the Caddyfile at write time rather than relying on Caddy env expansion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ConnectivityManager: move config dirs to data_dir/services/<id>/config so
Docker can bind-mount them into store-service containers (Docker resolves
bind-mount paths on the host, not inside the API container). Add
_migrate_legacy_configs to copy existing files from the old config_dir
location on first boot.
- manifest_validator: add allow_host_network parameter to
validate_rendered_compose. When True, waives the external-network
requirement, permits network_mode: host, and allows devices: — all needed
by VPN/Tor containers that must share the host network namespace to create
tun/wg interfaces. Non-host services are unaffected.
- service_composer: read requires_host_network from the manifest and pass
allow_host_network=True to validate_rendered_compose for connectivity
services.
- Tests: update file-path assertions to new data_dir layout; add
TestMigrateLegacyConfigs, TestValidateRenderedComposeHostNetwork, and
two TestWriteCompose cases for the host-network path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tests verify:
- /services page loads and lists all available services
- Admin can install calendar, files, email, and webmail via the store UI
- Install order respects dependencies (email before webmail)
- Uninstall flow shows confirmation dialog before removing
- Dashboard shows service links after install
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two manifest validation bugs blocked all store service installs:
1. service_store_manager.RESERVED_SUBDOMAINS included 'mail', which
prevented the email service from using its required subdomain.
Removed mail/calendar/files/webmail — they belong to official PIC
store services and must be claimable by them.
2. manifest_validator required @sha256 digest pins on ALL images,
including first-party git.pic.ngo/roof/* images that the PIC team
builds and controls. service_store_manager._validate_manifest already
only warned for first-party images; the secondary validator was
stricter than intended, causing a hard reject on :latest tags.
Aligned to warn-not-reject for first-party; malformed digests (when
provided) are still a hard error.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Routes outbound traffic from installed service containers through
alternate exits (wireguard_ext, openvpn, tor) using host-side
iptables fwmark policy-routing in a dedicated PIC_EGRESS chain.
Marks 0x110/0x120/0x130 are distinct from ConnectivityManager's
0x10/0x20/0x30. Container IPs discovered at runtime via docker
inspect. Wired into ServiceStoreManager install/remove lifecycle
and managers.py singleton. 22 new tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Services with accounts.manager='http' now use POST/DELETE to the
service container's /service-api/accounts endpoint instead of
requiring a named Python manager. _resolve_service allows 'http'
without a registered Python object; _provision_http and
_deprovision_http handle the HTTP calls with 404-as-success on
delete. 9 new tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Services are now installed post-setup from the Store page, so the
wizard step that let users pre-select email/calendar/files is removed.
Reduces wizard from 5 steps to 4 (Step4Services deleted, Step5Review
renamed to Step4Review). Backend drops services_enabled validation,
background install thread, and service_store_manager dependency.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Email/calendar/files routes now return 404 when the service is not
installed, using a require_active_service decorator that checks
ServiceRegistry. Status endpoints are exempt so health checks always work.
SetupManager.complete_setup() now accepts a service_store_manager and
installs any wizard-selected services in a background daemon thread after
setup completes. Failures are logged but do not fail the wizard.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>