Setup wizard (Issue 1 — UI):
- pic.ngo subdomain input now uses the same split-field style as DuckDNS:
input + static '.pic.ngo' suffix in a flex row, availability status below
Setup wizard (Issue 2 — Caddy not regenerating after completion):
- complete_setup route now fires IDENTITY_CHANGED after a successful wizard
submission so CaddyManager regenerates the Caddyfile immediately; users
no longer need to press 'Renew Certificate' to start ACME
Settings — DDNS status (Issue 2 — domain status missing):
- New GET /api/ddns/status endpoint: returns registered flag, domain_name,
public_ip (ipify with 30s cache), last_ip from heartbeat
- Settings DDNS section for pic_ngo now shows a live status row with
color-coded dot (green=registered+current, yellow=registered+stale,
gray=not registered), current public IP, and a Check button
- Status auto-refreshes on mount and after each successful re-registration
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GET http://cell-caddy:2019/ returns 404 because Caddy's admin API has no
root handler. The health monitor interpreted every response as a failure,
restarted Caddy every 3 minutes, and prevented ACME from ever completing.
/config/ returns 200 + the running config JSON whenever Caddy is up and
serving — that is the correct liveness indicator.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A stale or empty-token Caddyfile on disk caused Caddy to reject the
/load request, so the Renew button appeared to do nothing. Now
renew_cert() calls regenerate_with_installed([]) first, which writes a
fresh Caddyfile from current identity/config before reloading Caddy.
This ensures a broken on-disk file never blocks ACME renewal.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two problems on fresh install with pic_ngo mode:
1. Caddy crashed at startup because ddns.token was empty (registration
hadn't completed yet), producing a bare `token` keyword in the
Caddyfile that Caddy rejects with "wrong argument count".
Fix: fall back to lan mode in _caddyfile_pic_ngo when the token is
empty so Caddy always starts cleanly. The Caddyfile is regenerated
once registration completes and the token is persisted.
2. DDNS registration failures were silently swallowed — the wizard
showed "Setup complete!" with no indication that HTTPS wouldn't work.
This made it look like everything was fine when the subdomain was
never registered (e.g. name already taken from a previous install,
or transient network error).
Fix: capture the exception, classify it (name_taken vs transient),
and return it as a `warnings` list in the setup response. The wizard
done screen now shows amber warning cards with actionable text instead
of auto-redirecting, giving the user a "Continue to login" button and
a clear explanation of what went wrong.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On a fresh install before DDNS registration completes, ddns.token is
empty. Writing `token ` (bare keyword, no value) causes Caddy to reject
the Caddyfile at startup with "wrong argument count or unexpected line
ending after 'token'".
Guard added: if the token is empty, generate a LAN-mode Caddyfile so
Caddy starts cleanly. The Caddyfile is regenerated automatically once
registration completes and the token is persisted to cell_config.json.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds live cert status, one-click ACME renewal, and custom cert upload
directly to the Vault page so users never need to touch Caddy config.
Backend:
- CaddyManager.get_cert_status() now returns domain, domain_mode, and
cert_type so the UI can render the right controls without a separate
identity fetch
- CaddyManager.renew_cert() reloads Caddy and invalidates the status
cache; the frontend polls until the cert turns valid
- CaddyManager.upload_custom_cert() validates PEM, writes cert+key to
the shared config/caddy/certs/ volume, updates identity (cert_type=custom),
and regenerates the Caddyfile so Caddy references the new paths
- LAN-mode Caddyfile switches from /etc/caddy/internal/ to the shared
certs dir automatically when cert_type=custom is set
- ddns_api default no longer includes /api/v1 — the plugin appends it;
legacy /api/v1 suffix is stripped at write time to keep the Caddyfile clean
- POST /api/caddy/cert-renew and POST /api/caddy/custom-cert routes added
Frontend:
- TLSPanel component at the top of Vault.jsx shows status badge
(valid/expiring-soon/expired/pending/internal) with domain and expiry
- Renew button visible only for ACME modes; spins during the API call
then polls GET /api/caddy/cert-status every 10 s until valid
- Upload Custom Cert opens a modal with PEM text areas; works for all modes
- caddyAPI.renewCert() and uploadCustomCert() added to api.js
Tests: 22 new tests across 5 classes covering enriched status,
renew_cert guards, upload_custom_cert validation/writes/persistence,
custom-cert Caddyfile path selection, and ddns_api suffix stripping.
All 2093 existing tests continue to pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. caddy_manager: embed ddns.token (registration bearer token) in
Caddyfile, not DDNS_TOTP_SECRET. The pic_ngo plugin sends the token
to POST /api/v1/dns-challenge; using the TOTP secret caused 401 on
every attempt.
2. firewall_manager: add _acme-challenge.<zone> forwarding block before
each split-horizon zone in the Corefile. Without this, CoreDNS was
authoritative for the challenge name and returned NODATA for TXT
queries (wildcard A record matches but wrong type), blocking Caddy's
internal DNS pre-verification step.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The method is named get_email_users in EmailManager; the route was
calling the non-existent get_users, causing an AttributeError on every
GET /api/email/users request.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
VPN peers can reach Caddy via the host's WireGuard interface (10.0.0.1),
not via the Docker bridge IP (172.20.0.2) which is unreachable outside
the container network. _bootstrap_dns now calls _get_wg_server_ip()
instead of ip_utils.get_service_ips() so the internal zone returns a
routable address for service subdomains.
Also log config save failures instead of silently swallowing them —
the silent PermissionError/OSError was masking write failures and
making it impossible to diagnose why installed services disappeared
after container restarts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- PicNgoDDNS.update(): send token in request body instead of Authorization
header; DDNS server validates it from body (was returning HTTP 422 on
every heartbeat, leaving IP record stale after fresh install)
- peers.py / Peers.jsx: webdav service_access only valid when 'files' store
service is installed; was always shown even with no services, confusing
users into thinking WebDAV was pre-installed
- 10 new regression tests: DDNS update body contract, Caddy always
regenerates on startup with no services, peer role allowed on
/api/services/active, webdav gating by installed services
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add _PEER_READABLE_PATHS allowlist in enforce_auth so peer-role sessions
can read /api/services/active; fixes My Services showing 'not installed'
for cell members when services are installed
- Move Caddy regeneration before the early-return in reapply_on_startup so
the Caddyfile is always rebuilt from current identity on startup, even when
no store services are installed; fixes ERR_SSL_PROTOCOL_ERROR after a cell
rename (Caddyfile retained old wildcard domain)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- calendar: create_calendar_user() now writes bcrypt htpasswd entry to
data/services/calendar/config/users (the path Radicale reads at
/etc/radicale/users); delete_calendar_user() removes the entry
- email: create_email_user() calls `docker exec cell-mail setup email add`
to register the account in docker-mailserver's Dovecot/Postfix store;
delete_email_user() calls the matching `setup email del` — both are
non-fatal if the container isn't running
- service_composer.install(): pull image separately before up so slow
registry pulls don't race with container startup; retry up once on
failure so a transient registry hiccup on first install doesn't
require the user to manually retry
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- DNS (critical): add _configured_dns_params() that returns (primary_domain,
split_horizon_zones) from config_manager so all apply_all_dns_rules() callers
pass the correct primary zone (e.g. 'pic.ngo') and split-horizon list
(e.g. ['pic1.pic.ngo']) instead of the FQDN as the primary — fixes
DNS_PROBE_FINISHED_BAD_CONFIG for all external domains when on VPN
- firewall_manager: add split_horizon_zones param to apply_all_dns_rules()
and forward it to generate_corefile()
- Peers: filter service_access list to installed services only; peers.py
derives valid services from config_manager.get_installed_services() with
the email→mail ID mapping; Peers.jsx fetches from /api/store/installed
and filters the checkboxes and defaults accordingly
- Health check: fix file_manager→'files' ID mapping so files service health
is checked when installed (was silently skipped due to 'file' vs 'files')
- Verbosity persistence: move log_levels.json from non-mounted
/app/api/config/ to CONFIG_DIR (/app/config/) which maps to config/api/
on the host; both load (managers.py) and save (routes/services.py) updated
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The apply_all_dns_rules() call at the end of _bootstrap_dns() was
added to force reload 30s into the Corefile on startup. Now that
reload 30s is removed (it broke CoreDNS zone serving), the call is
unnecessary in LAN mode and actively harmful in DDNS mode:
update_split_horizon_zone() already writes the correct Corefile
with the split-horizon block; the subsequent apply_all_dns_rules()
call would overwrite it without the split-horizon zones, causing
all service subdomain lookups to return NXDOMAIN.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CoreDNS 1.14.3 returns REFUSED for all zones that use
'file /data/zone reload 30s' — the reload timer defers the
initial zone load, causing the plugin to return REFUSED until
the timer fires. The timer never resolves this correctly.
Zone updates are already triggered by SIGUSR1 sent from
_reload_dns_service() after every zone file write, which
causes CoreDNS to reinitialise all plugins and re-read zone
files. No periodic zone polling is needed.
Also update config/dns/Corefile to remove the stale reload 30s.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three related issues prevented CoreDNS from serving updated zone records:
1. The `file` plugin blocks in generate_corefile() lacked a `reload`
option, so CoreDNS never re-read zone files after they were written.
Added `reload 30s` so zone file changes are picked up within 30s.
2. _reload_dns_service() sent SIGHUP via `docker exec ... kill -HUP 1`,
which doesn't trigger zone reloads. Changed to SIGUSR1 via
`docker kill --signal=SIGUSR1` (same as firewall_manager.reload_coredns).
3. _bootstrap_dns() wrote the zone file but never regenerated the
Corefile. CoreDNS's reload plugin only fires when the Corefile
changes, so zone records from startup were invisible until the next
peer modification triggered apply_all_dns_rules(). Now _bootstrap_dns()
always calls apply_all_dns_rules() after the zone write.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_build_dns_records() only hardcoded 'api' and 'webui', relying on the
optional service registry for the rest. Built-in services (calendar,
files, mail, webdav) were never registered, so they were absent from
the zone file and tests querying webdav.<domain> via CoreDNS got
NXDOMAIN.
Add _BUILTIN_SERVICE_SUBDOMAINS constant and include those names in
every zone build. Also update _stale and apply_cell_name exclusion
sets so DDNS mode correctly removes them from the parent zone.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The address field in get_status() was hardcoded to SERVER_ADDRESS
('10.0.0.1/24') regardless of what wg0.conf contains, so instances
with a non-default subnet (e.g. pic1 at 10.0.1.1/24) always reported
the wrong server IP to callers such as the e2e WG conftest fixture.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_write_config() was stripping trailing newlines, causing the next
add_cell_peer() to create a single-newline separator between [Interface]
and [Peer] blocks instead of the required blank line. On the following
remove_peer() call, split('\n\n') treated both sections as one block,
matched the PublicKey filter, and wrote an empty string — destroying the
[Interface] section and reverting to the hardcoded SERVER_ADDRESS fallback.
Two-part fix:
1. _write_config() always ends content with a newline
2. remove_peer() normalises single-newline [Peer] headers to blank-line
separators before splitting, and refuses to write if [Interface] would
be lost
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The GET /api/cells/invite endpoint was returning domain='pic.ngo' instead
of the full FQDN 'test5.pic.ngo' because it read _identity.domain rather
than _identity.domain_name.
Apply the same domain_name preference (domain_name || domain) to:
- routes/cells.py get_cell_invite() — the invite shown to connecting cells
- routes/cells.py update_cell_permissions() — Corefile DNS regeneration
- cell_link_manager.py _check_invite_conflicts() — incoming domain collision check
- cell_link_manager.py exchange_invites() — own invite construction
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
API:
- _configured_domain() now prefers _identity.domain_name (full FQDN
e.g. 'test5.pic.ngo') over domain ('pic.ngo'). Service URLs in
/api/peer/services and /api/peer/dashboard now correctly return
'calendar.test5.pic.ngo' instead of 'calendar.pic.ngo'.
WG e2e tests:
- test_api_domain_returns_json_not_webui: accept 3xx redirect as
valid routing (Caddy redirects HTTP→HTTPS in pic_ngo mode).
- test_catchall_api_path_returns_json and test_catchall_root_serves_webui:
skip when Caddy is in HTTPS-redirect mode — catch-all :80 block only
exists in HTTP-mode cells (lan/local domain).
- test_http_api_domain_reaches_api: replace --dns-servers (requires
c-ares) with dig + curl --host pattern.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds listen_port to /api/wireguard/status response so e2e test conftest
picks up the actual port (51821) instead of defaulting to 51820.
Extends PostUp/PreDown in generate_config to also DNAT and forward port
443 (HTTPS) through to cell-caddy — mirrors the ensure_service_dnat fix
so HTTPS works even after a WireGuard container restart without an API
restart. Updates _is_dnat_rule to recognize 443 rules.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ensure_service_dnat() only wired port 80 → cell-caddy, so HTTPS was
silently dropped: no DNAT rule redirected 443 to the Caddy container,
and the FORWARD chain had no ACCEPT for dport 443. Refactored the
function to loop over both 80 and 443 so both are DNAT'd and forwarded.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The e2e tests were reading a stale Corefile at a hardcoded fallback path
(/home/roof/pic/config/dns/Corefile) instead of the live one written by
the API (/opt/pic/config/dns/Corefile on pic1). Adding a proper API
endpoint eliminates the path ambiguity.
The iptables test was checking whether peer_ip, DROP, and dpt:80 appeared
anywhere in the full multi-line output rather than on the same rule line,
producing false positives. Now checks per line.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Allows fetching a single peer by name. E2E tests need this to verify
persisted peer state after PUT operations.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Endpoint override:
- Add PUT /api/wireguard/endpoint to set endpoint_override in identity
config; GET returns detected, override, and effective endpoints
- _effective_endpoint() helper applies override in peer config generation
(wireguard.py and peer_dashboard.py); detected IP still shown in UI
- Add Endpoint Override input in WireGuard page — solves the common case
where auto-detected IP is a gateway/VPS but peers connect via LAN IP
Docker cell-network fix:
- Declare cell-network external in docker-compose.yml; Docker Compose v5
enforces label ownership and rejects networks created by older versions
- Makefile start/update pre-create cell-network idempotently
- reinstall/uninstall(full) explicitly delete and recreate the network
- Fix uninstall loop path: data/api/services/ (not data/services/)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When the email store service is installed but no explicit domain has been
set in its config, _provision_email now falls back to
config_manager.get_effective_domain() so peer account creation works
immediately without requiring a separate config step.
Also threads config_manager into AccountManager.__init__ (optional kwarg,
no existing callers break) so the fallback is available without a global
import.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- CaddyManager: add refresh_cert_status() and get_cert_status_fresh() that
open a live TLS connection to cell-caddy:443 to read cert expiry; avoids
needing a volume mount into the API container
- CaddyManager: periodic cert refresh in health_monitor_loop (every 60 cycles)
- config.py PUT /api/ddns: publish IDENTITY_CHANGED so CaddyManager regenerates
the Caddyfile immediately after any domain/cell_name change — previously the
event was never fired from this route
- config.py: remove all ip_utils.write_caddyfile() calls; CaddyManager is now
the sole authority for Caddyfile generation
- app.py: add GET /api/caddy/cert-status route
- app.py: add GET /api/egress/status and PUT /api/egress/services/<id>/exit routes
- Settings.jsx: display cert status badge (valid/expired/internal/unknown) with
expiry date and days-remaining in the domain section
- Tests: TestRefreshCertStatus (8 tests), TestDdnsConfigUpdatesFiresIdentityChanged,
TestCaddyCertStatusRoute added; fix expired-cert helper to set not_valid_before
relative to expiry so it's always earlier
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- add_peer() now calls account_manager.provision() for any installed store
service whose manifest declares accounts.manager == 'http', enabling
per-peer credential provisioning to third-party HTTP services
- reapply_on_startup() calls egress_manager.apply_all() so fwmark rules
survive container restarts without manual intervention
- add GET /api/egress/status and PUT /api/egress/services/<id>/exit routes
so the UI can read and override per-service egress policy
- tests: HTTP provision wiring (happy path + non-fatal failure), egress
apply_all at startup (wired/unwired/failure cases)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
acme_ca and the pic_ngo DNS credentials ({$PIC_NGO_DDNS_TOKEN},
{$PIC_NGO_DDNS_API}) were written as Caddy env-var placeholders, but the
Caddy container does not inherit the API container's environment, so the
substitutions always failed — Caddy saw bare directive names with no
arguments and rejected the Caddyfile.
- _global_acme_block: only emit the acme_ca directive when ACME_CA_URL is
actually set; omitting it makes Caddy default to Let's Encrypt production.
- _caddyfile_pic_ngo: embed the DDNS_TOTP_SECRET and DDNS_URL values directly
into the Caddyfile at write time rather than relying on Caddy env expansion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- docker-compose.services.yml: change external network name from
pic_cell-network to cell-network so store-service compose files can find
it. The project-prefixed name was overriding the explicit name: cell-network
fix in docker-compose.yml when both files were merged by make start.
- service_store.py: normalize docker compose stderr into the error key in
the 400 response so the Store page shows the actual failure reason instead
of the generic fallback message.
- app.py: skip health checks for email/calendar/files managers when those
optional store services are not installed — prevents false Down alerts and
unnecessary noise in health history.
- Logs.jsx: remove Email/Calendar/Files columns from the health history table;
they are optional store services, not core builtins that should always appear.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ConnectivityManager: move config dirs to data_dir/services/<id>/config so
Docker can bind-mount them into store-service containers (Docker resolves
bind-mount paths on the host, not inside the API container). Add
_migrate_legacy_configs to copy existing files from the old config_dir
location on first boot.
- manifest_validator: add allow_host_network parameter to
validate_rendered_compose. When True, waives the external-network
requirement, permits network_mode: host, and allows devices: — all needed
by VPN/Tor containers that must share the host network namespace to create
tun/wg interfaces. Non-host services are unaffected.
- service_composer: read requires_host_network from the manifest and pass
allow_host_network=True to validate_rendered_compose for connectivity
services.
- Tests: update file-path assertions to new data_dir layout; add
TestMigrateLegacyConfigs, TestValidateRenderedComposeHostNetwork, and
two TestWriteCompose cases for the host-network path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two manifest validation bugs blocked all store service installs:
1. service_store_manager.RESERVED_SUBDOMAINS included 'mail', which
prevented the email service from using its required subdomain.
Removed mail/calendar/files/webmail — they belong to official PIC
store services and must be claimable by them.
2. manifest_validator required @sha256 digest pins on ALL images,
including first-party git.pic.ngo/roof/* images that the PIC team
builds and controls. service_store_manager._validate_manifest already
only warned for first-party images; the secondary validator was
stricter than intended, causing a hard reject on :latest tags.
Aligned to warn-not-reject for first-party; malformed digests (when
provided) are still a hard error.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Routes outbound traffic from installed service containers through
alternate exits (wireguard_ext, openvpn, tor) using host-side
iptables fwmark policy-routing in a dedicated PIC_EGRESS chain.
Marks 0x110/0x120/0x130 are distinct from ConnectivityManager's
0x10/0x20/0x30. Container IPs discovered at runtime via docker
inspect. Wired into ServiceStoreManager install/remove lifecycle
and managers.py singleton. 22 new tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Services with accounts.manager='http' now use POST/DELETE to the
service container's /service-api/accounts endpoint instead of
requiring a named Python manager. _resolve_service allows 'http'
without a registered Python object; _provision_http and
_deprovision_http handle the HTTP calls with 404-as-success on
delete. 9 new tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Services are now installed post-setup from the Store page, so the
wizard step that let users pre-select email/calendar/files is removed.
Reduces wizard from 5 steps to 4 (Step4Services deleted, Step5Review
renamed to Step4Review). Backend drops services_enabled validation,
background install thread, and service_store_manager dependency.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Email/calendar/files routes now return 404 when the service is not
installed, using a require_active_service decorator that checks
ServiceRegistry. Status endpoints are exempt so health checks always work.
SetupManager.complete_setup() now accepts a service_store_manager and
installs any wizard-selected services in a background daemon thread after
setup completes. Failures are logged but do not fail the wizard.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Email, calendar, files, webmail (rainloop), and the file manager (filegator)
are removed from the main docker-compose stack. They install as independent
per-service compose projects via ServiceComposer.
On startup, _cleanup_legacy_builtin_containers() stops and removes any of the
5 legacy containers still running from the old main stack (guarded by a
one-shot sentinel in _meta.legacy_builtins_cleaned so it never runs twice).
Per-service installs (com.docker.compose.project != 'pic') are left untouched.
Changes:
- docker-compose.yml: remove mail, radicale, webdav, rainloop, filegator blocks;
fix dhcp + ntp to profiles: ["core","full"] so they start with --profile core
- Makefile: replace all --profile full with --profile core (6 occurrences);
remove mailserver.env conditional from update: target
- api/legacy_cleanup.py: new module with cleanup_legacy_builtin_containers()
- api/app.py: import and call cleanup at startup before reapply_on_startup()
- tests/test_legacy_cleanup.py: 7 tests covering sentinel, absent containers,
per-service project skip, main-stack removal, exception safety
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Email, calendar, and files no longer appear in the nav or as usable pages
unless they are installed. The nav refreshes whenever a service is installed
or removed via the new pic-services-changed CustomEvent.
Changes:
- routes/services.py: add GET /api/services/active endpoint
- api.js: add servicesAPI.listActive()
- App.jsx: replace hardcoded coreServiceChildren with dynamic state fetched
from /api/services/active; SERVICE_META maps ids to nav entry shapes
- ServiceNotInstalledBanner.jsx: new component — admin gets catalog link,
peer gets "contact admin" message
- EmailPage/CalendarPage/FilesPage: show banner when service not installed
- ServicesIndex.jsx: remove CoreServiceCard + CORE_SERVICES "Built-in"
section; rename Remove → Uninstall; dispatch pic-services-changed on
install/uninstall success
- MyServices.jsx: conditionally render service cards based on active list;
placeholder card when absent; page-level notice when nothing is installed
- tests/test_services_active_endpoint.py: 4 new endpoint tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ServiceStoreManager.install() now delegates container lifecycle to
ServiceComposer (per-service docker-compose.yml) instead of appending to a
shared compose override. This eliminates IP pool allocation, compose override
rendering, and the single-stack docker exec approach.
Changes:
- service_composer.py: add _resolve_requires(), _resolve_dependents(),
reapply_active_services() — dependency graph and startup reapply
- service_store_manager.py: rewrite install() and remove() to use
ServiceComposer; add _fetch_template(); delete _allocate_service_ip(),
_render_compose_override(), _write_compose_override(); remove() now guards
against removing services that others depend on
- managers.py: pass service_composer= to ServiceStoreManager
- Tests: 13 new composer dep tests; TestInstall/TestRemove rewritten for
the new composer-driven path; test_optional_services_feature.py updated
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Builtins (email/calendar/files) are no longer baked into the API image.
ServiceRegistry now only knows about installed store services. When nothing
is installed, Caddy and DNS get no service routes — no hardcoded fallback.
Changes:
- service_registry.py: remove _BUILTINS_DIR, _builtin_ids, _builtin_manifest,
_load_manifest; get() and list_all() now delegate entirely to installed services
- caddy_manager.py: remove _build_core_service_routes(); remove hardcoded
fallback pairs from _http01_service_pairs(); empty registry → api block only
- network_manager.py: _get_service_subdomains() returns [] when no registry
- api/services/builtins/: deleted (email, calendar, files manifests)
- Tests updated throughout: removed builtin-dependent assertions, added
installed-service fixtures, updated fallback expectations to api-only
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three related fixes discovered during review of Phase 0 and Phase 1 manifests:
1. validate_rendered_compose(): add allowed_data_dir param. After ${PIC_DATA_DIR}
substitution, compose templates produce absolute paths; without this the
validator would reject every service install. ServiceComposer.write_compose()
now passes its resolved data_dir so only the designated data directory is
exempt — /etc, /proc, docker.sock etc. still blocked.
2. _RESERVED_SUBDOMAINS: remove service-level subdomains (mail, calendar, files,
webdav, webmail). The reserved list should protect PIC infrastructure endpoints
(api, webui, admin) — not service subdomains that official store services
(calendar, files, webmail) must be allowed to claim. Aligns with the
existing _RESERVED_SUBS in service_registry.py.
3. ServiceRegistry.list_active(): new method returning only installed store
services (no builtins). This is the forward-looking API that Phase 2 will
make the primary read path once builtins are deleted. Adding it now unblocks
the QA agent's test_optional_services_feature.py which was already testing
the expected Phase 2 behaviour.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces api/manifest_validator.py as a single security chokepoint
imported by both ServiceComposer and ServiceStoreManager:
- validate_manifest(): rejects kind=builtin, reserved container names,
reserved subdomains, backend denylist (localhost, cell-api, etc.),
cap_add outside allowlist / in denylist, shell-string provision hooks,
and env values with shell-special characters
- validate_rendered_compose(): walks the rendered YAML and rejects
privileged:true, host network/pid/ipc/userns, absolute bind mounts,
denied capabilities, devices key, apparmor/seccomp unconfined, and
string-form command/entrypoint (shell-injection vector)
- validate_provision_hook(): requires argv list form, lowercase binary,
rejects NUL bytes
ServiceStoreManager changes:
- _validate_manifest() delegates to validate_manifest() after existing checks
- _fetch_manifest() and fetch_index() now stream with a 256 KB size cap
(prevents memory exhaustion from a malicious or compromised index)
- Digest-pin warning for images missing @sha256 (hard error for unknown
registries, warning for git.pic.ngo/roof/* and TRUSTED_IMAGES_NO_DIGEST)
ServiceComposer changes:
- write_compose() calls validate_rendered_compose() before any disk write
so no partial file is left if validation fails
- render_template() substitutes ${PIC_DATA_DIR} with the resolved data_dir path
102 new tests in tests/test_manifest_validator.py covering all five P0
security issues. Existing test mocks updated to use streaming response
pattern (stream=True + raw.read) and valid compose YAML templates.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
These files were created during Steps 1-4 of the services architecture but were
never staged: AccountManager (per-service credential provisioning), ServiceComposer
(docker-compose lifecycle), built-in service manifests for email/calendar/files,
and their test suites (158 tests). Also un-tracks .coverage binaries that were
accidentally committed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously, CaddyManager and NetworkManager contained hardcoded lists of
service names (calendar, files, mail, webdav, etc.), meaning every new
service required a code change to appear in Caddy routes and DNS records.
Now both managers accept a service_registry parameter and derive their
service lists dynamically from the registry at runtime.
- CaddyManager: new _build_registry_service_routes() and
_http01_service_pairs() methods pull routes from the registry
- NetworkManager: new _get_service_subdomains() method returns registry
subdomains with a hardcoded fallback when no registry is wired in;
_build_dns_records, stale-record detection, and service name sets all
use the registry
- managers.py: service_registry constructed before network_manager so it
can be injected into both CaddyManager and NetworkManager
- service_registry.py: validation chokepoint in get_caddy_routes() rejects
invalid subdomain/backend values and reserved service names
- service_store_manager.py: _validate_manifest now validates top-level
subdomain, backend, extra_subdomains, and extra_backends fields
- tests: 24 new tests covering registry-driven routing and DNS subdomain
generation (test_caddy_registry_integration.py)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rename Store → Services: ServicesIndex.jsx shows built-in core services
(Email, Calendar, Files) with Manage links, plus the existing add-on
store below.
New service sub-pages at /services/email|calendar|files serve both
admin and peer roles. Admins see connection info, service status, users
list, and an inline config form (port/data-dir). Peers see connection
info and their personal credentials fetched from peerAPI.
Navigation restructured: a Services parent item expands to show the
three sub-pages via a collapsible sidebar group (ChevronDown toggle).
Both admin and peer navigation include the Services group. Sidebar
extracted NavItem/NavList components to eliminate the duplicate mobile/
desktop rendering.
Settings.jsx drops EmailForm, CalendarForm, FilesForm and their
SERVICE_DEFS entries. Port conflict detection and per-service validation
logic extracted to utils/serviceConfig.js, shared by Settings and the
new service pages. Service form flushers are registered without cleanup
so the Apply banner saves dirty config even when the user navigates away
from a service page before clicking Apply.
Legacy routes /email, /calendar, /files, /store redirect to their new
canonical paths.
GET /api/config now includes installed_services so the nav can derive
which add-ons are installed without a separate store fetch.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
apply_cell_name() now skips multi-label zone files (split-horizon DDNS
zones like pic2.pic.ngo.zone) and excludes '*' and '@' from hostname
candidate detection, preventing the wildcard record from being renamed
to the old cell name during a cell rename.
update_split_horizon_zone() now deletes stale zone files from previous
cell names sharing the same TLD (e.g. pic3.pic.ngo.zone when renaming
to pic2.pic.ngo), eliminating orphaned DNS entries.
_bootstrap_dns() now detects non-LAN domain modes and calls
update_split_horizon_zone() instead of apply_ip_range(), preventing
service records (api, calendar, files…) from being re-injected into
the DDNS parent zone on every container restart.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dashboard, Email, Calendar, and Files pages were building service URLs
with the internal LAN zone name (e.g. 'cell') instead of the public
effective domain (e.g. 'pic2.pic.ngo'), and always using http:// even
in DDNS mode where HTTPS is available.
Changes:
- Dashboard/Email/Calendar/Files: read effective_domain + domain_mode
from ConfigContext; use effective_domain in non-LAN mode and https://
for all DDNS domain modes.
- Calendar: show port 443 instead of 80 in DDNS mode.
- network_manager.update_split_horizon_zone: when the primary internal
zone name is a parent of the effective DDNS domain (e.g. pic.ngo is a
parent of pic2.pic.ngo), remove stale bootstrap service records (api,
calendar, files, mail, webmail, webdav) that pollute the DNS display
and would shadow public DNS responses.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In DDNS modes (pic_ngo, cloudflare, duckdns, http01), all built-in
services are now reachable as subdomains of the cell domain, e.g.
calendar.pic1.pic.ngo instead of pic1.pic.ngo/calendar.
Key changes:
- CaddyManager._build_core_service_routes(): new helper generates
Caddy named-matcher host blocks for calendar, mail/webmail, files,
webdav, and api subdomains within the wildcard TLS server block.
- All ACME modes (pic_ngo, cloudflare, duckdns) use the new
subdomain matchers; http01 emits a dedicated server block per service.
- http01: installed store-plugin services whose name clashes with a
core service are skipped to prevent duplicate server blocks.
- routes/config.py: ip_utils.write_caddyfile() is skipped in non-LAN
modes so LAN Caddy config never overwrites the ACME config.
- firewall_manager.generate_corefile(): new split_horizon_zones param
adds local authoritative file zones so LAN clients resolve
*.pic1.pic.ngo to the internal Caddy IP without hairpin NAT.
- NetworkManager.update_split_horizon_zone(): writes the wildcard zone
file and regenerates the Corefile with the split-horizon block;
called automatically after every identity change in non-LAN mode.
- Added @ to allowed record-name chars in update_dns_zone validation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>