Commit Graph

128 Commits

Author SHA1 Message Date
roof 7ef294fd65 fix: fall back to lan mode in pic_ngo Caddyfile when token is empty
Unit Tests / test (push) Successful in 7m42s
On a fresh install before DDNS registration completes, ddns.token is
empty. Writing `token ` (bare keyword, no value) causes Caddy to reject
the Caddyfile at startup with "wrong argument count or unexpected line
ending after 'token'".

Guard added: if the token is empty, generate a LAN-mode Caddyfile so
Caddy starts cleanly. The Caddyfile is regenerated automatically once
registration completes and the token is persisted to cell_config.json.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 13:38:51 -04:00
roof 33d255f089 feat: TLS certificate management in Vault page
Unit Tests / test (push) Successful in 7m26s
Adds live cert status, one-click ACME renewal, and custom cert upload
directly to the Vault page so users never need to touch Caddy config.

Backend:
- CaddyManager.get_cert_status() now returns domain, domain_mode, and
  cert_type so the UI can render the right controls without a separate
  identity fetch
- CaddyManager.renew_cert() reloads Caddy and invalidates the status
  cache; the frontend polls until the cert turns valid
- CaddyManager.upload_custom_cert() validates PEM, writes cert+key to
  the shared config/caddy/certs/ volume, updates identity (cert_type=custom),
  and regenerates the Caddyfile so Caddy references the new paths
- LAN-mode Caddyfile switches from /etc/caddy/internal/ to the shared
  certs dir automatically when cert_type=custom is set
- ddns_api default no longer includes /api/v1 — the plugin appends it;
  legacy /api/v1 suffix is stripped at write time to keep the Caddyfile clean
- POST /api/caddy/cert-renew and POST /api/caddy/custom-cert routes added

Frontend:
- TLSPanel component at the top of Vault.jsx shows status badge
  (valid/expiring-soon/expired/pending/internal) with domain and expiry
- Renew button visible only for ACME modes; spins during the API call
  then polls GET /api/caddy/cert-status every 10 s until valid
- Upload Custom Cert opens a modal with PEM text areas; works for all modes
- caddyAPI.renewCert() and uploadCustomCert() added to api.js

Tests: 22 new tests across 5 classes covering enriched status,
renew_cert guards, upload_custom_cert validation/writes/persistence,
custom-cert Caddyfile path selection, and ddns_api suffix stripping.
All 2093 existing tests continue to pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 12:53:42 -04:00
roof 85d265187d fix: Caddy TLS cert acquisition — two DNS-01 blockers
Unit Tests / test (push) Successful in 7m32s
1. caddy_manager: embed ddns.token (registration bearer token) in
   Caddyfile, not DDNS_TOTP_SECRET. The pic_ngo plugin sends the token
   to POST /api/v1/dns-challenge; using the TOTP secret caused 401 on
   every attempt.

2. firewall_manager: add _acme-challenge.<zone> forwarding block before
   each split-horizon zone in the Corefile. Without this, CoreDNS was
   authoritative for the challenge name and returned NODATA for TXT
   queries (wildcard A record matches but wrong type), blocking Caddy's
   internal DNS pre-verification step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:45:15 -04:00
roof 76bbc2b67a fix: EmailManager route calls get_email_users not get_users
Unit Tests / test (push) Successful in 7m27s
The method is named get_email_users in EmailManager; the route was
calling the non-existent get_users, causing an AttributeError on every
GET /api/email/users request.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:12:24 -04:00
roof bd71466a87 fix: split-horizon DNS zone uses WireGuard IP, not Docker bridge IP
Unit Tests / test (push) Successful in 7m31s
VPN peers can reach Caddy via the host's WireGuard interface (10.0.0.1),
not via the Docker bridge IP (172.20.0.2) which is unreachable outside
the container network. _bootstrap_dns now calls _get_wg_server_ip()
instead of ip_utils.get_service_ips() so the internal zone returns a
routable address for service subdomains.

Also log config save failures instead of silently swallowing them —
the silent PermissionError/OSError was masking write failures and
making it impossible to diagnose why installed services disappeared
after container restarts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 02:11:01 -04:00
roof e4c80149f4 fix: start-core missing cell-network creation breaks fresh install
Unit Tests / test (push) Successful in 7m34s
make start-core (called by install.sh step 6) used $(DCF) which includes
docker-compose.services.yml — that file declares cell-network as external:true.
On a fresh machine the network doesn't exist yet, so compose up failed with
"network cell-network declared as external, but could not be found".

Fix: add the same network-create idempotency guard that start and update
already have. Also add 26 regression tests (test_install_process.py) that
verify install.sh structure and that all start-* targets using DCF create
the network before running compose up.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 01:07:00 -04:00
roof 69862331e7 fix: DDNS update token in body, webdav gating, regression tests
Unit Tests / test (push) Successful in 7m25s
- PicNgoDDNS.update(): send token in request body instead of Authorization
  header; DDNS server validates it from body (was returning HTTP 422 on
  every heartbeat, leaving IP record stale after fresh install)
- peers.py / Peers.jsx: webdav service_access only valid when 'files' store
  service is installed; was always shown even with no services, confusing
  users into thinking WebDAV was pre-installed
- 10 new regression tests: DDNS update body contract, Caddy always
  regenerates on startup with no services, peer role allowed on
  /api/services/active, webdav gating by installed services

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:56:12 -04:00
roof c696ca9ef6 fix: DNS split-horizon in DDNS mode, service access filter, health check, verbosity persistence
Unit Tests / test (push) Successful in 7m32s
- DNS (critical): add _configured_dns_params() that returns (primary_domain,
  split_horizon_zones) from config_manager so all apply_all_dns_rules() callers
  pass the correct primary zone (e.g. 'pic.ngo') and split-horizon list
  (e.g. ['pic1.pic.ngo']) instead of the FQDN as the primary — fixes
  DNS_PROBE_FINISHED_BAD_CONFIG for all external domains when on VPN

- firewall_manager: add split_horizon_zones param to apply_all_dns_rules()
  and forward it to generate_corefile()

- Peers: filter service_access list to installed services only; peers.py
  derives valid services from config_manager.get_installed_services() with
  the email→mail ID mapping; Peers.jsx fetches from /api/store/installed
  and filters the checkboxes and defaults accordingly

- Health check: fix file_manager→'files' ID mapping so files service health
  is checked when installed (was silently skipped due to 'file' vs 'files')

- Verbosity persistence: move log_levels.json from non-mounted
  /app/api/config/ to CONFIG_DIR (/app/config/) which maps to config/api/
  on the host; both load (managers.py) and save (routes/services.py) updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 13:05:58 -04:00
roof 08f46332b0 fix: add built-in service subdomains to DNS zone on startup
Unit Tests / test (push) Successful in 7m45s
_build_dns_records() only hardcoded 'api' and 'webui', relying on the
optional service registry for the rest. Built-in services (calendar,
files, mail, webdav) were never registered, so they were absent from
the zone file and tests querying webdav.<domain> via CoreDNS got
NXDOMAIN.

Add _BUILTIN_SERVICE_SUBDOMAINS constant and include those names in
every zone build. Also update _stale and apply_cell_name exclusion
sets so DDNS mode correctly removes them from the parent zone.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 03:14:34 -04:00
roof e8b8e47aa4 fix: use sudo for nft list tables — /usr/sbin not in roof user PATH
Unit Tests / test (push) Successful in 7m26s
nft lives in /usr/sbin which is absent from the non-root PATH on Debian.
The delete call already used sudo; add it to the list call too so the
session-scoped cleanup fixture doesn't crash before any test runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 15:46:09 -04:00
roof adce219a46 fix: clean up stale wg-quick nftables tables in e2e test teardown
Unit Tests / test (push) Successful in 7m29s
wg-quick creates an nftables 'preraw' table per interface that drops
decrypted ICMP replies arriving on any other interface. If a test run
crashes before bring_down(), the table persists and silently kills pings
on subsequent runs (handshake succeeds, replies are decrypted, but the
stale table drops them before the ping process sees them).

Extend cleanup_stale_e2e_interfaces() to also delete any orphaned
wg-quick-pic-e2e-* nftables tables found on the host.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 15:35:19 -04:00
roof ab6d6230dd Fix: read WG server IP and subnet from live API instead of hardcoding 10.0.0.x
Unit Tests / test (push) Successful in 7m30s
test_wg_connect_and_ping_server and the connected_peer fixture hardcoded
10.0.0.1 / 10.0.0.0/24 as the server VPN address. This breaks when the
server uses a different subnet (e.g. pic1 uses 10.0.1.1/24). Now both
read 'address' from /api/wireguard/status at session start and pass the
live server_ip / server_network through wg_server_info and connected_peer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:09:48 -04:00
roof e2e9c50786 Test: skip peer-sync push test when WG tunnel between cells is not active
Unit Tests / test (push) Successful in 7m27s
The test_remote_permissions_pushed_to_cell2 test verifies that permission
changes on cell1 are pushed to cell2 via the WireGuard tunnel. When both
cells use a public endpoint (DDNS VPS) instead of LAN IPs, no tunnel is
established and the push silently fails. The test now probes cell2's API
at its WG DNS IP before asserting the push succeeded — skips gracefully
if the tunnel is down rather than failing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 12:52:03 -04:00
roof 31f76c54fa Fix: use domain_name as service URL base and harden WG e2e tests
Unit Tests / test (push) Successful in 11m15s
API:
- _configured_domain() now prefers _identity.domain_name (full FQDN
  e.g. 'test5.pic.ngo') over domain ('pic.ngo'). Service URLs in
  /api/peer/services and /api/peer/dashboard now correctly return
  'calendar.test5.pic.ngo' instead of 'calendar.pic.ngo'.

WG e2e tests:
- test_api_domain_returns_json_not_webui: accept 3xx redirect as
  valid routing (Caddy redirects HTTP→HTTPS in pic_ngo mode).
- test_catchall_api_path_returns_json and test_catchall_root_serves_webui:
  skip when Caddy is in HTTPS-redirect mode — catch-all :80 block only
  exists in HTTP-mode cells (lan/local domain).
- test_http_api_domain_reaches_api: replace --dns-servers (requires
  c-ares) with dig + curl --host pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 08:40:59 -04:00
roof b6af71acb5 Fix: accept both VIP and Caddy IP in DNS resolution test
Unit Tests / test (push) Successful in 11m9s
Cells with wildcard zone (e.g. * -> 172.20.0.2) and cells with per-service
VIP DNS records are both valid. Accept either in the assertion so the test
passes regardless of the zone file style.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 08:29:05 -04:00
roof 352bb6bb9e Fix: use api_base fixture instead of hardcoded pic0 IP in WG domain access tests
test_peer_services_* functions hardcoded 'http://192.168.31.51:3000' as the
fallback for PIC_API_BASE, causing failures when tests run on any other host
(including pic1 itself). Use the api_base fixture, which reads PIC_HOST and
PIC_API_PORT from the environment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 08:06:29 -04:00
roof 3e26186f85 fix: correct fake WireGuard key length and guard cell2_client teardown
Unit Tests / test (push) Successful in 11m14s
The synthetic cell fixture used a 46-char base64 key where the validator
expects exactly 43 chars before '='. The key failed format validation so
add_cell_peer returned False, making the cell connection store nothing and
all TestCellPermissionsApi tests hit 404.

The TestCellServiceAccessRestrictions and TestLiveCellConnection teardown
fixtures called _remove_connection(cell2_client, ...) without checking if
cell2_client is None (expected when no second cell is configured), causing
AttributeError on teardown.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 06:20:52 -04:00
roof f84f16fcd6 fix: add /api/network/dns/corefile endpoint and per-line iptables check
Unit Tests / test (push) Successful in 11m13s
The e2e tests were reading a stale Corefile at a hardcoded fallback path
(/home/roof/pic/config/dns/Corefile) instead of the live one written by
the API (/opt/pic/config/dns/Corefile on pic1). Adding a proper API
endpoint eliminates the path ambiguity.

The iptables test was checking whether peer_ip, DROP, and dpt:80 appeared
anywhere in the full multi-line output rather than on the same rule line,
producing false positives. Now checks per line.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 05:54:17 -04:00
roof 2b29938a64 fix: set CSRF token in PicAPIClient after login
Unit Tests / test (push) Successful in 11m22s
POST requests from PicAPIClient were failing with 403 (CSRF token missing)
because the login response csrf_token was not being applied to subsequent
request headers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 05:05:08 -04:00
roof 0267dce73d feat: HTTPS cert status, IDENTITY_CHANGED wiring, remove stale ip_utils Caddyfile writes
Unit Tests / test (push) Successful in 11m18s
- CaddyManager: add refresh_cert_status() and get_cert_status_fresh() that
  open a live TLS connection to cell-caddy:443 to read cert expiry; avoids
  needing a volume mount into the API container
- CaddyManager: periodic cert refresh in health_monitor_loop (every 60 cycles)
- config.py PUT /api/ddns: publish IDENTITY_CHANGED so CaddyManager regenerates
  the Caddyfile immediately after any domain/cell_name change — previously the
  event was never fired from this route
- config.py: remove all ip_utils.write_caddyfile() calls; CaddyManager is now
  the sole authority for Caddyfile generation
- app.py: add GET /api/caddy/cert-status route
- app.py: add GET /api/egress/status and PUT /api/egress/services/<id>/exit routes
- Settings.jsx: display cert status badge (valid/expired/internal/unknown) with
  expiry date and days-remaining in the domain section
- Tests: TestRefreshCertStatus (8 tests), TestDdnsConfigUpdatesFiresIdentityChanged,
  TestCaddyCertStatusRoute added; fix expired-cert helper to set not_valid_before
  relative to expiry so it's always earlier

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 11:39:36 -04:00
roof 41d09c598b wire: AccountManager HTTP dispatch + EgressManager startup + egress API routes
Unit Tests / test (push) Successful in 11m15s
- add_peer() now calls account_manager.provision() for any installed store
  service whose manifest declares accounts.manager == 'http', enabling
  per-peer credential provisioning to third-party HTTP services
- reapply_on_startup() calls egress_manager.apply_all() so fwmark rules
  survive container restarts without manual intervention
- add GET /api/egress/status and PUT /api/egress/services/<id>/exit routes
  so the UI can read and override per-service egress policy
- tests: HTTP provision wiring (happy path + non-fatal failure), egress
  apply_all at startup (wired/unwired/failure cases)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 10:30:41 -04:00
roof a906c26b5d fix: resolve Caddy env vars at write time to prevent parse errors
Unit Tests / test (push) Successful in 11m25s
acme_ca and the pic_ngo DNS credentials ({$PIC_NGO_DDNS_TOKEN},
{$PIC_NGO_DDNS_API}) were written as Caddy env-var placeholders, but the
Caddy container does not inherit the API container's environment, so the
substitutions always failed — Caddy saw bare directive names with no
arguments and rejected the Caddyfile.

- _global_acme_block: only emit the acme_ca directive when ACME_CA_URL is
  actually set; omitting it makes Caddy default to Let's Encrypt production.
- _caddyfile_pic_ngo: embed the DDNS_TOTP_SECRET and DDNS_URL values directly
  into the Caddyfile at write time rather than relying on Caddy env expansion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 15:01:15 -04:00
roof 7d5c5421f1 Implement connectivity store services (wireguard-ext, openvpn-client, tor)
Unit Tests / test (push) Successful in 11m31s
- ConnectivityManager: move config dirs to data_dir/services/<id>/config so
  Docker can bind-mount them into store-service containers (Docker resolves
  bind-mount paths on the host, not inside the API container).  Add
  _migrate_legacy_configs to copy existing files from the old config_dir
  location on first boot.

- manifest_validator: add allow_host_network parameter to
  validate_rendered_compose.  When True, waives the external-network
  requirement, permits network_mode: host, and allows devices: — all needed
  by VPN/Tor containers that must share the host network namespace to create
  tun/wg interfaces.  Non-host services are unaffected.

- service_composer: read requires_host_network from the manifest and pass
  allow_host_network=True to validate_rendered_compose for connectivity
  services.

- Tests: update file-path assertions to new data_dir layout; add
  TestMigrateLegacyConfigs, TestValidateRenderedComposeHostNetwork, and
  two TestWriteCompose cases for the host-network path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 10:06:48 -04:00
roof 5ed75677c3 test: add e2e tests for service store install/uninstall flow
Unit Tests / test (push) Successful in 11m13s
Tests verify:
- /services page loads and lists all available services
- Admin can install calendar, files, email, and webmail via the store UI
- Install order respects dependencies (email before webmail)
- Uninstall flow shows confirmation dialog before removing
- Dashboard shows service links after install

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 04:51:10 -04:00
roof f7bb2cc962 fix: allow first-party store service subdomains and registry images
Unit Tests / test (push) Successful in 11m25s
Two manifest validation bugs blocked all store service installs:

1. service_store_manager.RESERVED_SUBDOMAINS included 'mail', which
   prevented the email service from using its required subdomain.
   Removed mail/calendar/files/webmail — they belong to official PIC
   store services and must be claimable by them.

2. manifest_validator required @sha256 digest pins on ALL images,
   including first-party git.pic.ngo/roof/* images that the PIC team
   builds and controls. service_store_manager._validate_manifest already
   only warned for first-party images; the secondary validator was
   stricter than intended, causing a hard reject on :latest tags.
   Aligned to warn-not-reject for first-party; malformed digests (when
   provided) are still a hard error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 03:09:41 -04:00
roof 03a67ad922 feat: add EgressManager — per-service egress enforcement via host iptables
Unit Tests / test (push) Successful in 11m20s
Routes outbound traffic from installed service containers through
alternate exits (wireguard_ext, openvpn, tor) using host-side
iptables fwmark policy-routing in a dedicated PIC_EGRESS chain.
Marks 0x110/0x120/0x130 are distinct from ConnectivityManager's
0x10/0x20/0x30. Container IPs discovered at runtime via docker
inspect. Wired into ServiceStoreManager install/remove lifecycle
and managers.py singleton. 22 new tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 00:58:47 -04:00
roof 5cbbfb41d9 feat: add HTTP dispatch to AccountManager for generic store services
Services with accounts.manager='http' now use POST/DELETE to the
service container's /service-api/accounts endpoint instead of
requiring a named Python manager. _resolve_service allows 'http'
without a registered Python object; _provision_http and
_deprovision_http handle the HTTP calls with 404-as-success on
delete. 9 new tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 00:46:54 -04:00
roof 1f2f9d9f6e feat: add manifest_validator.py — security chokepoint for compose and manifest validation
Unit Tests / test (push) Successful in 11m18s
Rejects privileged compose configs (network_mode:host, pid:host, ipc:host,
userns_mode:host, cap_add:ALL, string commands, missing cell-network,
reserved container names). Validates manifest schema_version=3, image
digest pinning (sha256 required, :tag-only rejected), and provision hook
format. Wired into ServiceComposer.write_compose() and
ServiceStoreManager.install() as a single enforcement point.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 18:45:45 -04:00
roof 62b31b072b feat: remove optional services step from setup wizard
Services are now installed post-setup from the Store page, so the
wizard step that let users pre-select email/calendar/files is removed.
Reduces wizard from 5 steps to 4 (Step4Services deleted, Step5Review
renamed to Step4Review). Backend drops services_enabled validation,
background install thread, and service_store_manager dependency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 18:33:43 -04:00
roof 44d7e96f29 feat: Phase 6 — require_active_service decorator + wizard install wiring
Email/calendar/files routes now return 404 when the service is not
installed, using a require_active_service decorator that checks
ServiceRegistry. Status endpoints are exempt so health checks always work.

SetupManager.complete_setup() now accepts a service_store_manager and
installs any wizard-selected services in a background daemon thread after
setup completes. Failures are logged but do not fail the wizard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 16:58:57 -04:00
roof a69ca1e402 feat: Phase 5 — remove legacy service blocks, one-shot container cleanup
Unit Tests / test (push) Successful in 11m20s
Email, calendar, files, webmail (rainloop), and the file manager (filegator)
are removed from the main docker-compose stack. They install as independent
per-service compose projects via ServiceComposer.

On startup, _cleanup_legacy_builtin_containers() stops and removes any of the
5 legacy containers still running from the old main stack (guarded by a
one-shot sentinel in _meta.legacy_builtins_cleaned so it never runs twice).
Per-service installs (com.docker.compose.project != 'pic') are left untouched.

Changes:
- docker-compose.yml: remove mail, radicale, webdav, rainloop, filegator blocks;
  fix dhcp + ntp to profiles: ["core","full"] so they start with --profile core
- Makefile: replace all --profile full with --profile core (6 occurrences);
  remove mailserver.env conditional from update: target
- api/legacy_cleanup.py: new module with cleanup_legacy_builtin_containers()
- api/app.py: import and call cleanup at startup before reapply_on_startup()
- tests/test_legacy_cleanup.py: 7 tests covering sentinel, absent containers,
  per-service project skip, main-stack removal, exception safety

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 15:57:45 -04:00
roof a10fe11136 feat: Phase 4 — dynamic nav + service visibility based on installed services
Unit Tests / test (push) Successful in 11m24s
Email, calendar, and files no longer appear in the nav or as usable pages
unless they are installed. The nav refreshes whenever a service is installed
or removed via the new pic-services-changed CustomEvent.

Changes:
- routes/services.py: add GET /api/services/active endpoint
- api.js: add servicesAPI.listActive()
- App.jsx: replace hardcoded coreServiceChildren with dynamic state fetched
  from /api/services/active; SERVICE_META maps ids to nav entry shapes
- ServiceNotInstalledBanner.jsx: new component — admin gets catalog link,
  peer gets "contact admin" message
- EmailPage/CalendarPage/FilesPage: show banner when service not installed
- ServicesIndex.jsx: remove CoreServiceCard + CORE_SERVICES "Built-in"
  section; rename Remove → Uninstall; dispatch pic-services-changed on
  install/uninstall success
- MyServices.jsx: conditionally render service cards based on active list;
  placeholder card when absent; page-level notice when nothing is installed
- tests/test_services_active_endpoint.py: 4 new endpoint tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 12:15:02 -04:00
roof 87c321c1c9 feat: Phase 3 — ServiceComposer deps + store install via per-service compose
Unit Tests / test (push) Successful in 11m21s
ServiceStoreManager.install() now delegates container lifecycle to
ServiceComposer (per-service docker-compose.yml) instead of appending to a
shared compose override. This eliminates IP pool allocation, compose override
rendering, and the single-stack docker exec approach.

Changes:
- service_composer.py: add _resolve_requires(), _resolve_dependents(),
  reapply_active_services() — dependency graph and startup reapply
- service_store_manager.py: rewrite install() and remove() to use
  ServiceComposer; add _fetch_template(); delete _allocate_service_ip(),
  _render_compose_override(), _write_compose_override(); remove() now guards
  against removing services that others depend on
- managers.py: pass service_composer= to ServiceStoreManager
- Tests: 13 new composer dep tests; TestInstall/TestRemove rewritten for
  the new composer-driven path; test_optional_services_feature.py updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 09:33:02 -04:00
roof 0bfe95320b feat: Phase 2 — remove builtins layer, ServiceRegistry is installed-only
Unit Tests / test (push) Successful in 11m31s
Builtins (email/calendar/files) are no longer baked into the API image.
ServiceRegistry now only knows about installed store services. When nothing
is installed, Caddy and DNS get no service routes — no hardcoded fallback.

Changes:
- service_registry.py: remove _BUILTINS_DIR, _builtin_ids, _builtin_manifest,
  _load_manifest; get() and list_all() now delegate entirely to installed services
- caddy_manager.py: remove _build_core_service_routes(); remove hardcoded
  fallback pairs from _http01_service_pairs(); empty registry → api block only
- network_manager.py: _get_service_subdomains() returns [] when no registry
- api/services/builtins/: deleted (email, calendar, files manifests)
- Tests updated throughout: removed builtin-dependent assertions, added
  installed-service fixtures, updated fallback expectations to api-only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 08:53:44 -04:00
roof 18b50d08c1 fix: post-Phase-0 corrections — data-dir bind mounts, reserved subdomains, list_active()
Unit Tests / test (push) Successful in 11m31s
Three related fixes discovered during review of Phase 0 and Phase 1 manifests:

1. validate_rendered_compose(): add allowed_data_dir param. After ${PIC_DATA_DIR}
   substitution, compose templates produce absolute paths; without this the
   validator would reject every service install.  ServiceComposer.write_compose()
   now passes its resolved data_dir so only the designated data directory is
   exempt — /etc, /proc, docker.sock etc. still blocked.

2. _RESERVED_SUBDOMAINS: remove service-level subdomains (mail, calendar, files,
   webdav, webmail). The reserved list should protect PIC infrastructure endpoints
   (api, webui, admin) — not service subdomains that official store services
   (calendar, files, webmail) must be allowed to claim.  Aligns with the
   existing _RESERVED_SUBS in service_registry.py.

3. ServiceRegistry.list_active(): new method returning only installed store
   services (no builtins). This is the forward-looking API that Phase 2 will
   make the primary read path once builtins are deleted. Adding it now unblocks
   the QA agent's test_optional_services_feature.py which was already testing
   the expected Phase 2 behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 07:35:43 -04:00
roof c40919d374 feat: Phase 0 — manifest_validator, compose YAML safety check, cap_add allowlist, backend denylist, provision hook enforcement, size cap
Introduces api/manifest_validator.py as a single security chokepoint
imported by both ServiceComposer and ServiceStoreManager:

- validate_manifest(): rejects kind=builtin, reserved container names,
  reserved subdomains, backend denylist (localhost, cell-api, etc.),
  cap_add outside allowlist / in denylist, shell-string provision hooks,
  and env values with shell-special characters
- validate_rendered_compose(): walks the rendered YAML and rejects
  privileged:true, host network/pid/ipc/userns, absolute bind mounts,
  denied capabilities, devices key, apparmor/seccomp unconfined, and
  string-form command/entrypoint (shell-injection vector)
- validate_provision_hook(): requires argv list form, lowercase binary,
  rejects NUL bytes

ServiceStoreManager changes:
- _validate_manifest() delegates to validate_manifest() after existing checks
- _fetch_manifest() and fetch_index() now stream with a 256 KB size cap
  (prevents memory exhaustion from a malicious or compromised index)
- Digest-pin warning for images missing @sha256 (hard error for unknown
  registries, warning for git.pic.ngo/roof/* and TRUSTED_IMAGES_NO_DIGEST)

ServiceComposer changes:
- write_compose() calls validate_rendered_compose() before any disk write
  so no partial file is left if validation fails
- render_template() substitutes ${PIC_DATA_DIR} with the resolved data_dir path

102 new tests in tests/test_manifest_validator.py covering all five P0
security issues.  Existing test mocks updated to use streaming response
pattern (stream=True + raw.read) and valid compose YAML templates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 07:23:08 -04:00
roof 2f5370bd98 feat: add Steps 1-4 implementation files (AccountManager, ServiceComposer, builtins, tests)
Unit Tests / test (push) Successful in 11m24s
These files were created during Steps 1-4 of the services architecture but were
never staged: AccountManager (per-service credential provisioning), ServiceComposer
(docker-compose lifecycle), built-in service manifests for email/calendar/files,
and their test suites (158 tests). Also un-tracks .coverage binaries that were
accidentally committed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 04:39:19 -04:00
roof 16fb362df7 feat: replace hardcoded service names with ServiceRegistry-driven Caddy and CoreDNS config
Unit Tests / test (push) Failing after 11s
Previously, CaddyManager and NetworkManager contained hardcoded lists of
service names (calendar, files, mail, webdav, etc.), meaning every new
service required a code change to appear in Caddy routes and DNS records.
Now both managers accept a service_registry parameter and derive their
service lists dynamically from the registry at runtime.

- CaddyManager: new _build_registry_service_routes() and
  _http01_service_pairs() methods pull routes from the registry
- NetworkManager: new _get_service_subdomains() method returns registry
  subdomains with a hardcoded fallback when no registry is wired in;
  _build_dns_records, stale-record detection, and service name sets all
  use the registry
- managers.py: service_registry constructed before network_manager so it
  can be injected into both CaddyManager and NetworkManager
- service_registry.py: validation chokepoint in get_caddy_routes() rejects
  invalid subdomain/backend values and reserved service names
- service_store_manager.py: _validate_manifest now validates top-level
  subdomain, backend, extra_subdomains, and extra_backends fields
- tests: 24 new tests covering registry-driven routing and DNS subdomain
  generation (test_caddy_registry_integration.py)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 18:27:52 -04:00
roof 0afdee32da feat: Services UI — nested nav, per-service pages, settings migration
Rename Store → Services: ServicesIndex.jsx shows built-in core services
(Email, Calendar, Files) with Manage links, plus the existing add-on
store below.

New service sub-pages at /services/email|calendar|files serve both
admin and peer roles. Admins see connection info, service status, users
list, and an inline config form (port/data-dir). Peers see connection
info and their personal credentials fetched from peerAPI.

Navigation restructured: a Services parent item expands to show the
three sub-pages via a collapsible sidebar group (ChevronDown toggle).
Both admin and peer navigation include the Services group. Sidebar
extracted NavItem/NavList components to eliminate the duplicate mobile/
desktop rendering.

Settings.jsx drops EmailForm, CalendarForm, FilesForm and their
SERVICE_DEFS entries. Port conflict detection and per-service validation
logic extracted to utils/serviceConfig.js, shared by Settings and the
new service pages. Service form flushers are registered without cleanup
so the Apply banner saves dirty config even when the user navigates away
from a service page before clicking Apply.

Legacy routes /email, /calendar, /files, /store redirect to their new
canonical paths.

GET /api/config now includes installed_services so the nav can derive
which add-ons are installed without a separate store fetch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 06:46:17 -04:00
roof b16189d00f Fix three DNS corruption bugs in DDNS/non-LAN mode
Unit Tests / test (push) Successful in 11m30s
apply_cell_name() now skips multi-label zone files (split-horizon DDNS
zones like pic2.pic.ngo.zone) and excludes '*' and '@' from hostname
candidate detection, preventing the wildcard record from being renamed
to the old cell name during a cell rename.

update_split_horizon_zone() now deletes stale zone files from previous
cell names sharing the same TLD (e.g. pic3.pic.ngo.zone when renaming
to pic2.pic.ngo), eliminating orphaned DNS entries.

_bootstrap_dns() now detects non-LAN domain modes and calls
update_split_horizon_zone() instead of apply_ip_range(), preventing
service records (api, calendar, files…) from being re-injected into
the DDNS parent zone on every container restart.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 05:56:00 -04:00
roof 66500bb128 fix: use effective_domain for service links and clean up stale DNS records
Unit Tests / test (push) Successful in 11m32s
Dashboard, Email, Calendar, and Files pages were building service URLs
with the internal LAN zone name (e.g. 'cell') instead of the public
effective domain (e.g. 'pic2.pic.ngo'), and always using http:// even
in DDNS mode where HTTPS is available.

Changes:
- Dashboard/Email/Calendar/Files: read effective_domain + domain_mode
  from ConfigContext; use effective_domain in non-LAN mode and https://
  for all DDNS domain modes.
- Calendar: show port 443 instead of 80 in DDNS mode.
- network_manager.update_split_horizon_zone: when the primary internal
  zone name is a parent of the effective DDNS domain (e.g. pic.ngo is a
  parent of pic2.pic.ngo), remove stale bootstrap service records (api,
  calendar, files, mail, webmail, webdav) that pollute the DNS display
  and would shadow public DNS responses.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 05:06:52 -04:00
roof d7dbd596ab feat: route PIC services as subdomains of the cell's effective domain
Unit Tests / test (push) Successful in 11m33s
In DDNS modes (pic_ngo, cloudflare, duckdns, http01), all built-in
services are now reachable as subdomains of the cell domain, e.g.
calendar.pic1.pic.ngo instead of pic1.pic.ngo/calendar.

Key changes:
- CaddyManager._build_core_service_routes(): new helper generates
  Caddy named-matcher host blocks for calendar, mail/webmail, files,
  webdav, and api subdomains within the wildcard TLS server block.
- All ACME modes (pic_ngo, cloudflare, duckdns) use the new
  subdomain matchers; http01 emits a dedicated server block per service.
- http01: installed store-plugin services whose name clashes with a
  core service are skipped to prevent duplicate server blocks.
- routes/config.py: ip_utils.write_caddyfile() is skipped in non-LAN
  modes so LAN Caddy config never overwrites the ACME config.
- firewall_manager.generate_corefile(): new split_horizon_zones param
  adds local authoritative file zones so LAN clients resolve
  *.pic1.pic.ngo to the internal Caddy IP without hairpin NAT.
- NetworkManager.update_split_horizon_zone(): writes the wildcard zone
  file and regenerates the Corefile with the split-horizon block;
  called automatically after every identity change in non-LAN mode.
- Added @ to allowed record-name chars in update_dns_zone validation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 04:31:57 -04:00
roof 1f016de855 feat: make DDNS domain_name the effective domain across all services
Unit Tests / test (push) Successful in 11m35s
- ConfigManager.get_effective_domain(): returns domain_name when DDNS
  active (pic_ngo/cloudflare/duckdns), domain otherwise. Used by all
  public-facing services so they use the real registered FQDN.
- ConfigManager.get_internal_domain(): always returns _identity.domain
  (CoreDNS zone name, dnsmasq, cell-link invites — stays internal).
- Silent migration: if domain_mode != lan and domain is generic "cell",
  auto-set to {cell_name}.local for unique CoreDNS zone naming.
- caddy_manager: fix custom_domain bug — cloudflare/http01 modes were
  reading identity.get('custom_domain') which never exists; now reads
  domain_name correctly.
- routes/config, app: expose effective_domain in GET /api/config and
  /api/status responses.
- email_manager, routes/email: use get_effective_domain() for
  OVERRIDE_HOSTNAME, POSTMASTER_ADDRESS, and new-user email defaults.
- ServiceBus.IDENTITY_CHANGED event: emitted from PUT /api/config and
  POST /api/ddns/register after identity writes; caddy_manager and
  email_manager subscribe to regenerate config automatically.
- Settings.jsx: hide Local Domain input in non-LAN modes; show
  read-only effective_domain with "managed by DDNS" badge and an
  Advanced toggle for the internal CoreDNS zone name.
- 11 new test classes covering all new helpers, event subscriptions,
  caddy/email handlers, and the custom_domain fix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 02:48:47 -04:00
roof ad2eaca273 feat: release old pic.ngo subdomain when cell name changes
Unit Tests / test (push) Successful in 15m45s
Adds DELETE /api/v1/registration to the DDNS server (token-authenticated,
owner-only) and PicNgoDDNS.release() on the client. DDNSManager.register()
now automatically releases the old subdomain before claiming the new one,
so stale names are freed for others to use. Release failures are logged as
warnings and do not block the new registration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 17:07:13 -04:00
roof de43f4a9a0 fix: DDNS register() always sends public IP and saves token to correct location
Unit Tests / test (push) Successful in 15m27s
Two bugs that prevented registration from working after wizard completion:
1. register(name, '') sent empty IP; server stored blank A record. Now calls
   _get_public_ip() when ip is empty so the A record is always set correctly.
2. Token was saved to _identity.domain.ddns.token (TypeError when domain is a
   string) instead of the top-level ddns config where update_ip() reads it.
   Subdomain also now correctly written to _identity.domain_name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 16:05:55 -04:00
roof 0b31d02f10 feat: DDNS self-healing heartbeat + manual re-register endpoint
Unit Tests / test (push) Successful in 15m26s
- DDNSTokenExpired exception triggers auto re-register in update_ip()
  so cells recover silently after a DDNS DB reset
- POST /api/ddns/register lets the user force re-registration from Settings
- Re-register button in Settings → External Domain & DDNS (pic_ngo only)
- 3 new tests covering register endpoint: wrong provider, missing name, success

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 15:05:27 -04:00
roof 61e8631c7d feat: DDNS settings integration — check availability, update credentials
- GET /api/config now returns domain_mode, domain_name, ddns.{provider,subdomain,has_token}
- GET /api/ddns/check/<name> proxies availability check to DDNS service
- PUT /api/ddns validates and saves cloudflare/duckdns credentials post-setup
- When cell_name changes for pic_ngo provider, auto-registers the new subdomain
- Settings: Cell Name shows availability badge for pic_ngo; auto-save blocks on taken
- Settings: new External Domain & DDNS section — pic_ngo info, cloudflare/duckdns edit
- 11 new tests for the two new endpoints (all pass)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 14:35:37 -04:00
roof 1c62c47475 fix: 500 on setup complete + wizard shows all 7 steps
Unit Tests / test (push) Successful in 15m41s
Two bugs:

1. AttributeError: AuthManager.update_password does not exist — the
   fallback when create_user fails should call set_password_admin().
   This caused a 500 on every setup submit when an admin user already
   existed (e.g. from a previous install attempt).

2. Wizard was jumping to step 2 and skipping domain steps 3-4 when
   preconfigured data existed in cell_config.json. Since the installer
   no longer sets that data, and the wizard must always show all steps,
   the installerConfigured state and all step-skipping navigation is
   removed. Values are still pre-filled if found in config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 16:41:33 -04:00
roof 2d842abe5b installer: restore cell identity prompts and domain setup
Unit Tests / test (push) Successful in 15m39s
Reverts 8d1ef39. The installer must collect cell name, domain mode, and
provider tokens before 'make install' so that DDNS registration,
availability checks, and Caddy TLS can be configured at first boot.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 15:01:32 -04:00
roof f550f04ce2 Fix DDNS registration and wizard pre-fill after installer run
Unit Tests / test (push) Successful in 15m29s
DDNS registration (setup_cell.py):
- Replace pyotp dependency with stdlib TOTP (HMAC-SHA1, RFC 6238)
  pyotp is only available inside the Docker container, not on the host
  where setup_cell.py runs — registration was silently skipped every time
- OTP header still sent if generation succeeds; omitted gracefully if not

Wizard pre-fill (setup_manager + Setup.jsx):
- GET /api/setup/status now returns 'preconfigured' dict with cell_name,
  domain_mode, domain_name, and provider tokens from installer-written config
- Setup.jsx fetches status on mount and pre-fills all form state so the
  user only needs to set password, services, and timezone — not re-enter
  the identity they already configured in the bash installer
- Fails silently so wizard still works on fresh installs with no config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 12:22:53 -04:00