- Add _PEER_READABLE_PATHS allowlist in enforce_auth so peer-role sessions
can read /api/services/active; fixes My Services showing 'not installed'
for cell members when services are installed
- Move Caddy regeneration before the early-return in reapply_on_startup so
the Caddyfile is always rebuilt from current identity on startup, even when
no store services are installed; fixes ERR_SSL_PROTOCOL_ERROR after a cell
rename (Caddyfile retained old wildcard domain)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- calendar: create_calendar_user() now writes bcrypt htpasswd entry to
data/services/calendar/config/users (the path Radicale reads at
/etc/radicale/users); delete_calendar_user() removes the entry
- email: create_email_user() calls `docker exec cell-mail setup email add`
to register the account in docker-mailserver's Dovecot/Postfix store;
delete_email_user() calls the matching `setup email del` — both are
non-fatal if the container isn't running
- service_composer.install(): pull image separately before up so slow
registry pulls don't race with container startup; retry up once on
failure so a transient registry hiccup on first install doesn't
require the user to manually retry
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- DNS (critical): add _configured_dns_params() that returns (primary_domain,
split_horizon_zones) from config_manager so all apply_all_dns_rules() callers
pass the correct primary zone (e.g. 'pic.ngo') and split-horizon list
(e.g. ['pic1.pic.ngo']) instead of the FQDN as the primary — fixes
DNS_PROBE_FINISHED_BAD_CONFIG for all external domains when on VPN
- firewall_manager: add split_horizon_zones param to apply_all_dns_rules()
and forward it to generate_corefile()
- Peers: filter service_access list to installed services only; peers.py
derives valid services from config_manager.get_installed_services() with
the email→mail ID mapping; Peers.jsx fetches from /api/store/installed
and filters the checkboxes and defaults accordingly
- Health check: fix file_manager→'files' ID mapping so files service health
is checked when installed (was silently skipped due to 'file' vs 'files')
- Verbosity persistence: move log_levels.json from non-mounted
/app/api/config/ to CONFIG_DIR (/app/config/) which maps to config/api/
on the host; both load (managers.py) and save (routes/services.py) updated
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The apply_all_dns_rules() call at the end of _bootstrap_dns() was
added to force reload 30s into the Corefile on startup. Now that
reload 30s is removed (it broke CoreDNS zone serving), the call is
unnecessary in LAN mode and actively harmful in DDNS mode:
update_split_horizon_zone() already writes the correct Corefile
with the split-horizon block; the subsequent apply_all_dns_rules()
call would overwrite it without the split-horizon zones, causing
all service subdomain lookups to return NXDOMAIN.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CoreDNS 1.14.3 returns REFUSED for all zones that use
'file /data/zone reload 30s' — the reload timer defers the
initial zone load, causing the plugin to return REFUSED until
the timer fires. The timer never resolves this correctly.
Zone updates are already triggered by SIGUSR1 sent from
_reload_dns_service() after every zone file write, which
causes CoreDNS to reinitialise all plugins and re-read zone
files. No periodic zone polling is needed.
Also update config/dns/Corefile to remove the stale reload 30s.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three related issues prevented CoreDNS from serving updated zone records:
1. The `file` plugin blocks in generate_corefile() lacked a `reload`
option, so CoreDNS never re-read zone files after they were written.
Added `reload 30s` so zone file changes are picked up within 30s.
2. _reload_dns_service() sent SIGHUP via `docker exec ... kill -HUP 1`,
which doesn't trigger zone reloads. Changed to SIGUSR1 via
`docker kill --signal=SIGUSR1` (same as firewall_manager.reload_coredns).
3. _bootstrap_dns() wrote the zone file but never regenerated the
Corefile. CoreDNS's reload plugin only fires when the Corefile
changes, so zone records from startup were invisible until the next
peer modification triggered apply_all_dns_rules(). Now _bootstrap_dns()
always calls apply_all_dns_rules() after the zone write.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_build_dns_records() only hardcoded 'api' and 'webui', relying on the
optional service registry for the rest. Built-in services (calendar,
files, mail, webdav) were never registered, so they were absent from
the zone file and tests querying webdav.<domain> via CoreDNS got
NXDOMAIN.
Add _BUILTIN_SERVICE_SUBDOMAINS constant and include those names in
every zone build. Also update _stale and apply_cell_name exclusion
sets so DDNS mode correctly removes them from the parent zone.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nft lives in /usr/sbin which is absent from the non-root PATH on Debian.
The delete call already used sudo; add it to the list call too so the
session-scoped cleanup fixture doesn't crash before any test runs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
wg-quick creates an nftables 'preraw' table per interface that drops
decrypted ICMP replies arriving on any other interface. If a test run
crashes before bring_down(), the table persists and silently kills pings
on subsequent runs (handshake succeeds, replies are decrypted, but the
stale table drops them before the ping process sees them).
Extend cleanup_stale_e2e_interfaces() to also delete any orphaned
wg-quick-pic-e2e-* nftables tables found on the host.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The address field in get_status() was hardcoded to SERVER_ADDRESS
('10.0.0.1/24') regardless of what wg0.conf contains, so instances
with a non-default subnet (e.g. pic1 at 10.0.1.1/24) always reported
the wrong server IP to callers such as the e2e WG conftest fixture.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
test_wg_connect_and_ping_server and the connected_peer fixture hardcoded
10.0.0.1 / 10.0.0.0/24 as the server VPN address. This breaks when the
server uses a different subnet (e.g. pic1 uses 10.0.1.1/24). Now both
read 'address' from /api/wireguard/status at session start and pass the
live server_ip / server_network through wg_server_info and connected_peer.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The test_remote_permissions_pushed_to_cell2 test verifies that permission
changes on cell1 are pushed to cell2 via the WireGuard tunnel. When both
cells use a public endpoint (DDNS VPS) instead of LAN IPs, no tunnel is
established and the push silently fails. The test now probes cell2's API
at its WG DNS IP before asserting the push succeeded — skips gracefully
if the tunnel is down rather than failing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_write_config() was stripping trailing newlines, causing the next
add_cell_peer() to create a single-newline separator between [Interface]
and [Peer] blocks instead of the required blank line. On the following
remove_peer() call, split('\n\n') treated both sections as one block,
matched the PublicKey filter, and wrote an empty string — destroying the
[Interface] section and reverting to the hardcoded SERVER_ADDRESS fallback.
Two-part fix:
1. _write_config() always ends content with a newline
2. remove_peer() normalises single-newline [Peer] headers to blank-line
separators before splitting, and refuses to write if [Interface] would
be lost
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The GET /api/cells/invite endpoint was returning domain='pic.ngo' instead
of the full FQDN 'test5.pic.ngo' because it read _identity.domain rather
than _identity.domain_name.
Apply the same domain_name preference (domain_name || domain) to:
- routes/cells.py get_cell_invite() — the invite shown to connecting cells
- routes/cells.py update_cell_permissions() — Corefile DNS regeneration
- cell_link_manager.py _check_invite_conflicts() — incoming domain collision check
- cell_link_manager.py exchange_invites() — own invite construction
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
API:
- _configured_domain() now prefers _identity.domain_name (full FQDN
e.g. 'test5.pic.ngo') over domain ('pic.ngo'). Service URLs in
/api/peer/services and /api/peer/dashboard now correctly return
'calendar.test5.pic.ngo' instead of 'calendar.pic.ngo'.
WG e2e tests:
- test_api_domain_returns_json_not_webui: accept 3xx redirect as
valid routing (Caddy redirects HTTP→HTTPS in pic_ngo mode).
- test_catchall_api_path_returns_json and test_catchall_root_serves_webui:
skip when Caddy is in HTTPS-redirect mode — catch-all :80 block only
exists in HTTP-mode cells (lan/local domain).
- test_http_api_domain_reaches_api: replace --dns-servers (requires
c-ares) with dig + curl --host pattern.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cells with wildcard zone (e.g. * -> 172.20.0.2) and cells with per-service
VIP DNS records are both valid. Accept either in the assertion so the test
passes regardless of the zone file style.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
test_peer_services_* functions hardcoded 'http://192.168.31.51:3000' as the
fallback for PIC_API_BASE, causing failures when tests run on any other host
(including pic1 itself). Use the api_base fixture, which reads PIC_HOST and
PIC_API_PORT from the environment.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds listen_port to /api/wireguard/status response so e2e test conftest
picks up the actual port (51821) instead of defaulting to 51820.
Extends PostUp/PreDown in generate_config to also DNAT and forward port
443 (HTTPS) through to cell-caddy — mirrors the ensure_service_dnat fix
so HTTPS works even after a WireGuard container restart without an API
restart. Updates _is_dnat_rule to recognize 443 rules.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ensure_service_dnat() only wired port 80 → cell-caddy, so HTTPS was
silently dropped: no DNAT rule redirected 443 to the Caddy container,
and the FORWARD chain had no ACCEPT for dport 443. Refactored the
function to loop over both 80 and 443 so both are DNAT'd and forwarded.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The synthetic cell fixture used a 46-char base64 key where the validator
expects exactly 43 chars before '='. The key failed format validation so
add_cell_peer returned False, making the cell connection store nothing and
all TestCellPermissionsApi tests hit 404.
The TestCellServiceAccessRestrictions and TestLiveCellConnection teardown
fixtures called _remove_connection(cell2_client, ...) without checking if
cell2_client is None (expected when no second cell is configured), causing
AttributeError on teardown.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The e2e tests were reading a stale Corefile at a hardcoded fallback path
(/home/roof/pic/config/dns/Corefile) instead of the live one written by
the API (/opt/pic/config/dns/Corefile on pic1). Adding a proper API
endpoint eliminates the path ambiguity.
The iptables test was checking whether peer_ip, DROP, and dpt:80 appeared
anywhere in the full multi-line output rather than on the same rule line,
producing false positives. Now checks per line.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Allows fetching a single peer by name. E2E tests need this to verify
persisted peer state after PUT operations.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
POST requests from PicAPIClient were failing with 403 (CSRF token missing)
because the login response csrf_token was not being applied to subsequent
request headers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Endpoint override:
- Add PUT /api/wireguard/endpoint to set endpoint_override in identity
config; GET returns detected, override, and effective endpoints
- _effective_endpoint() helper applies override in peer config generation
(wireguard.py and peer_dashboard.py); detected IP still shown in UI
- Add Endpoint Override input in WireGuard page — solves the common case
where auto-detected IP is a gateway/VPS but peers connect via LAN IP
Docker cell-network fix:
- Declare cell-network external in docker-compose.yml; Docker Compose v5
enforces label ownership and rejects networks created by older versions
- Makefile start/update pre-create cell-network idempotently
- reinstall/uninstall(full) explicitly delete and recreate the network
- Fix uninstall loop path: data/api/services/ (not data/services/)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Docker Compose v5 enforces label ownership on networks it creates. On
systems where cell-network was created by an older compose version (no
labels), Caddy and other services fail to start with "incorrect label"
error.
Declaring the network external in docker-compose.yml skips label
validation. The Makefile start/update targets now create the network if
it doesn't exist (idempotent). The reinstall and uninstall (full) paths
explicitly delete the network so fresh recreations are clean.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When the email store service is installed but no explicit domain has been
set in its config, _provision_email now falls back to
config_manager.get_effective_domain() so peer account creation works
immediately without requiring a separate config step.
Also threads config_manager into AccountManager.__init__ (optional kwarg,
no existing callers break) so the fallback is available without a global
import.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Iterates data/services/*/docker-compose.yml and runs `docker compose down`
for each before stopping core containers, so stale optional-service
containers (email, calendar, files, etc.) don't leave cell-network occupied
and block a subsequent fresh install.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- CaddyManager: add refresh_cert_status() and get_cert_status_fresh() that
open a live TLS connection to cell-caddy:443 to read cert expiry; avoids
needing a volume mount into the API container
- CaddyManager: periodic cert refresh in health_monitor_loop (every 60 cycles)
- config.py PUT /api/ddns: publish IDENTITY_CHANGED so CaddyManager regenerates
the Caddyfile immediately after any domain/cell_name change — previously the
event was never fired from this route
- config.py: remove all ip_utils.write_caddyfile() calls; CaddyManager is now
the sole authority for Caddyfile generation
- app.py: add GET /api/caddy/cert-status route
- app.py: add GET /api/egress/status and PUT /api/egress/services/<id>/exit routes
- Settings.jsx: display cert status badge (valid/expired/internal/unknown) with
expiry date and days-remaining in the domain section
- Tests: TestRefreshCertStatus (8 tests), TestDdnsConfigUpdatesFiresIdentityChanged,
TestCaddyCertStatusRoute added; fix expired-cert helper to set not_valid_before
relative to expiry so it's always earlier
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- add_peer() now calls account_manager.provision() for any installed store
service whose manifest declares accounts.manager == 'http', enabling
per-peer credential provisioning to third-party HTTP services
- reapply_on_startup() calls egress_manager.apply_all() so fwmark rules
survive container restarts without manual intervention
- add GET /api/egress/status and PUT /api/egress/services/<id>/exit routes
so the UI can read and override per-service egress policy
- tests: HTTP provision wiring (happy path + non-fatal failure), egress
apply_all at startup (wired/unwired/failure cases)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
acme_ca and the pic_ngo DNS credentials ({$PIC_NGO_DDNS_TOKEN},
{$PIC_NGO_DDNS_API}) were written as Caddy env-var placeholders, but the
Caddy container does not inherit the API container's environment, so the
substitutions always failed — Caddy saw bare directive names with no
arguments and rejected the Caddyfile.
- _global_acme_block: only emit the acme_ca directive when ACME_CA_URL is
actually set; omitting it makes Caddy default to Let's Encrypt production.
- _caddyfile_pic_ngo: embed the DDNS_TOTP_SECRET and DDNS_URL values directly
into the Caddyfile at write time rather than relying on Caddy env expansion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- docker-compose.services.yml: change external network name from
pic_cell-network to cell-network so store-service compose files can find
it. The project-prefixed name was overriding the explicit name: cell-network
fix in docker-compose.yml when both files were merged by make start.
- service_store.py: normalize docker compose stderr into the error key in
the 400 response so the Store page shows the actual failure reason instead
of the generic fallback message.
- app.py: skip health checks for email/calendar/files managers when those
optional store services are not installed — prevents false Down alerts and
unnecessary noise in health history.
- Logs.jsx: remove Email/Calendar/Files columns from the health history table;
they are optional store services, not core builtins that should always appear.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ConnectivityManager: move config dirs to data_dir/services/<id>/config so
Docker can bind-mount them into store-service containers (Docker resolves
bind-mount paths on the host, not inside the API container). Add
_migrate_legacy_configs to copy existing files from the old config_dir
location on first boot.
- manifest_validator: add allow_host_network parameter to
validate_rendered_compose. When True, waives the external-network
requirement, permits network_mode: host, and allows devices: — all needed
by VPN/Tor containers that must share the host network namespace to create
tun/wg interfaces. Non-host services are unaffected.
- service_composer: read requires_host_network from the manifest and pass
allow_host_network=True to validate_rendered_compose for connectivity
services.
- Tests: update file-path assertions to new data_dir layout; add
TestMigrateLegacyConfigs, TestValidateRenderedComposeHostNetwork, and
two TestWriteCompose cases for the host-network path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without name: cell-network, Docker Compose creates the network as
pic_cell-network (prefixed with the project name). Store service compose
templates declare cell-network as external: true and can't find it.
Adding name: cell-network makes the network name predictable regardless
of the Compose project name.
Existing installs need: make stop && make start to recreate the network.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tests verify:
- /services page loads and lists all available services
- Admin can install calendar, files, email, and webmail via the store UI
- Install order respects dependencies (email before webmail)
- Uninstall flow shows confirmation dialog before removing
- Dashboard shows service links after install
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two manifest validation bugs blocked all store service installs:
1. service_store_manager.RESERVED_SUBDOMAINS included 'mail', which
prevented the email service from using its required subdomain.
Removed mail/calendar/files/webmail — they belong to official PIC
store services and must be claimable by them.
2. manifest_validator required @sha256 digest pins on ALL images,
including first-party git.pic.ngo/roof/* images that the PIC team
builds and controls. service_store_manager._validate_manifest already
only warned for first-party images; the secondary validator was
stricter than intended, causing a hard reject on :latest tags.
Aligned to warn-not-reject for first-party; malformed digests (when
provided) are still a hard error.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SERVICES was computed on line 33 using activeServiceIds which was not
declared until line 36. In strict JS, const is not hoisted — this threw
a ReferenceError on mount, crashing the component and showing a blank page.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fetches /api/services/active on load; service status cards and quick-
access links for email, calendar, files, and webmail are suppressed
until the service is installed via the Store. Core services (WireGuard,
Routing, Network) always show. Fixes #setup_complete gate on dev stack.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Routes outbound traffic from installed service containers through
alternate exits (wireguard_ext, openvpn, tor) using host-side
iptables fwmark policy-routing in a dedicated PIC_EGRESS chain.
Marks 0x110/0x120/0x130 are distinct from ConnectivityManager's
0x10/0x20/0x30. Container IPs discovered at runtime via docker
inspect. Wired into ServiceStoreManager install/remove lifecycle
and managers.py singleton. 22 new tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Services with accounts.manager='http' now use POST/DELETE to the
service container's /service-api/accounts endpoint instead of
requiring a named Python manager. _resolve_service allows 'http'
without a registered Python object; _provision_http and
_deprovision_http handle the HTTP calls with 404-as-success on
delete. 9 new tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Services are now installed post-setup from the Store page, so the
wizard step that let users pre-select email/calendar/files is removed.
Reduces wizard from 5 steps to 4 (Step4Services deleted, Step5Review
renamed to Step4Review). Backend drops services_enabled validation,
background install thread, and service_store_manager dependency.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
setup_cell.py no longer creates mail/radicale/webdav config and data dirs —
those are managed by ServiceComposer when services are installed. Added
data/services/ for ServiceComposer. sanity_check.py now uses stdlib urllib
and discovers installed services via /api/services/active before checking
their status routes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Email, calendar, and files are now optional store services, not always-on
builtins. Updated README, QUICKSTART, Wiki, and service-developer-guide to
reflect: dynamic nav, optional service install flow, correct egress
identifiers (wireguard_ext/default vs wireguard/cell_internet), removed
builtin/store distinction from manifest reference, 7 core containers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Email/calendar/files routes now return 404 when the service is not
installed, using a require_active_service decorator that checks
ServiceRegistry. Status endpoints are exempt so health checks always work.
SetupManager.complete_setup() now accepts a service_store_manager and
installs any wizard-selected services in a background daemon thread after
setup completes. Failures are logged but do not fail the wizard.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Email, calendar, files, webmail (rainloop), and the file manager (filegator)
are removed from the main docker-compose stack. They install as independent
per-service compose projects via ServiceComposer.
On startup, _cleanup_legacy_builtin_containers() stops and removes any of the
5 legacy containers still running from the old main stack (guarded by a
one-shot sentinel in _meta.legacy_builtins_cleaned so it never runs twice).
Per-service installs (com.docker.compose.project != 'pic') are left untouched.
Changes:
- docker-compose.yml: remove mail, radicale, webdav, rainloop, filegator blocks;
fix dhcp + ntp to profiles: ["core","full"] so they start with --profile core
- Makefile: replace all --profile full with --profile core (6 occurrences);
remove mailserver.env conditional from update: target
- api/legacy_cleanup.py: new module with cleanup_legacy_builtin_containers()
- api/app.py: import and call cleanup at startup before reapply_on_startup()
- tests/test_legacy_cleanup.py: 7 tests covering sentinel, absent containers,
per-service project skip, main-stack removal, exception safety
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Email, calendar, and files no longer appear in the nav or as usable pages
unless they are installed. The nav refreshes whenever a service is installed
or removed via the new pic-services-changed CustomEvent.
Changes:
- routes/services.py: add GET /api/services/active endpoint
- api.js: add servicesAPI.listActive()
- App.jsx: replace hardcoded coreServiceChildren with dynamic state fetched
from /api/services/active; SERVICE_META maps ids to nav entry shapes
- ServiceNotInstalledBanner.jsx: new component — admin gets catalog link,
peer gets "contact admin" message
- EmailPage/CalendarPage/FilesPage: show banner when service not installed
- ServicesIndex.jsx: remove CoreServiceCard + CORE_SERVICES "Built-in"
section; rename Remove → Uninstall; dispatch pic-services-changed on
install/uninstall success
- MyServices.jsx: conditionally render service cards based on active list;
placeholder card when absent; page-level notice when nothing is installed
- tests/test_services_active_endpoint.py: 4 new endpoint tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ServiceStoreManager.install() now delegates container lifecycle to
ServiceComposer (per-service docker-compose.yml) instead of appending to a
shared compose override. This eliminates IP pool allocation, compose override
rendering, and the single-stack docker exec approach.
Changes:
- service_composer.py: add _resolve_requires(), _resolve_dependents(),
reapply_active_services() — dependency graph and startup reapply
- service_store_manager.py: rewrite install() and remove() to use
ServiceComposer; add _fetch_template(); delete _allocate_service_ip(),
_render_compose_override(), _write_compose_override(); remove() now guards
against removing services that others depend on
- managers.py: pass service_composer= to ServiceStoreManager
- Tests: 13 new composer dep tests; TestInstall/TestRemove rewritten for
the new composer-driven path; test_optional_services_feature.py updated
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Builtins (email/calendar/files) are no longer baked into the API image.
ServiceRegistry now only knows about installed store services. When nothing
is installed, Caddy and DNS get no service routes — no hardcoded fallback.
Changes:
- service_registry.py: remove _BUILTINS_DIR, _builtin_ids, _builtin_manifest,
_load_manifest; get() and list_all() now delegate entirely to installed services
- caddy_manager.py: remove _build_core_service_routes(); remove hardcoded
fallback pairs from _http01_service_pairs(); empty registry → api block only
- network_manager.py: _get_service_subdomains() returns [] when no registry
- api/services/builtins/: deleted (email, calendar, files manifests)
- Tests updated throughout: removed builtin-dependent assertions, added
installed-service fixtures, updated fallback expectations to api-only
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three related fixes discovered during review of Phase 0 and Phase 1 manifests:
1. validate_rendered_compose(): add allowed_data_dir param. After ${PIC_DATA_DIR}
substitution, compose templates produce absolute paths; without this the
validator would reject every service install. ServiceComposer.write_compose()
now passes its resolved data_dir so only the designated data directory is
exempt — /etc, /proc, docker.sock etc. still blocked.
2. _RESERVED_SUBDOMAINS: remove service-level subdomains (mail, calendar, files,
webdav, webmail). The reserved list should protect PIC infrastructure endpoints
(api, webui, admin) — not service subdomains that official store services
(calendar, files, webmail) must be allowed to claim. Aligns with the
existing _RESERVED_SUBS in service_registry.py.
3. ServiceRegistry.list_active(): new method returning only installed store
services (no builtins). This is the forward-looking API that Phase 2 will
make the primary read path once builtins are deleted. Adding it now unblocks
the QA agent's test_optional_services_feature.py which was already testing
the expected Phase 2 behaviour.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces api/manifest_validator.py as a single security chokepoint
imported by both ServiceComposer and ServiceStoreManager:
- validate_manifest(): rejects kind=builtin, reserved container names,
reserved subdomains, backend denylist (localhost, cell-api, etc.),
cap_add outside allowlist / in denylist, shell-string provision hooks,
and env values with shell-special characters
- validate_rendered_compose(): walks the rendered YAML and rejects
privileged:true, host network/pid/ipc/userns, absolute bind mounts,
denied capabilities, devices key, apparmor/seccomp unconfined, and
string-form command/entrypoint (shell-injection vector)
- validate_provision_hook(): requires argv list form, lowercase binary,
rejects NUL bytes
ServiceStoreManager changes:
- _validate_manifest() delegates to validate_manifest() after existing checks
- _fetch_manifest() and fetch_index() now stream with a 256 KB size cap
(prevents memory exhaustion from a malicious or compromised index)
- Digest-pin warning for images missing @sha256 (hard error for unknown
registries, warning for git.pic.ngo/roof/* and TRUSTED_IMAGES_NO_DIGEST)
ServiceComposer changes:
- write_compose() calls validate_rendered_compose() before any disk write
so no partial file is left if validation fails
- render_template() substitutes ${PIC_DATA_DIR} with the resolved data_dir path
102 new tests in tests/test_manifest_validator.py covering all five P0
security issues. Existing test mocks updated to use streaming response
pattern (stream=True + raw.read) and valid compose YAML templates.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>