Live verification on pic1 of the connectivity v2 multi-instance feature
surfaced four integration bugs that prevented installing any published
connectivity store service (proxy/wireguard-ext/openvpn-client/sshuttle)
and left stale host routing state behind. All four are fixed here:
1. manifest_validator rejected the CI-published `name:tag@sha256:<digest>`
image form (it required digest-only), while service_store_manager already
accepted it — so every published store image failed validation. Allow an
optional tag before the digest, matching service_store_manager.
2. The cell-api image shipped the docker CLI but not the Compose v2 plugin,
so every `docker compose` ServiceComposer runs (pull/up/down for store
services) failed with "'compose' is not a docker command". Copy the
compose plugin binary from the docker-cli stage.
3. service_store_manager.install ran the base compose up for instanceable
services, whose template still contains ${INSTANCE_ID}/${REDIRECT_PORT}
(there is no base container — one runs per connection instance). It now
verifies the image signature but defers the container to connection
creation for instanceable manifests.
4. delete_connection freed the record/secrets/container but never removed the
connection's individually-managed `ip rule fwmark->table` or its FORWARD
kill-switch (apply_routes only flushes the PIC_CONNECTIVITY chains and
re-adds rules for surviving connections), leaking stale host routing state.
It now tears both down; added _remove_killswitch.
Verified end-to-end on pic1: two proxy instances allocate distinct
marks/tables/ports (skipping in-use resources), render distinct per-instance
containers, two peers route through distinct instances (per-peer MARK +
REDIRECT), delete is blocked while referenced (409) and cleans its ip rule.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
All store images are now digest-pinned and cosign-signed by the publish
pipeline, so the warn-by-default training-wheels period is over: an
unsigned or undigested image must not install unless the admin
explicitly opts out. The service_composer fallback used when the config
manager is unavailable or corrupt also flips to enforce — config
corruption must fail closed rather than silently weaken verification.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- config/cosign/cosign.pub: public verification key committed to repo (safe);
cosign private key lives in /home/roof/.pic-secrets/ and is NEVER committed
- api/config_manager.py: image_verification config block (modes: off|warn|enforce,
default: warn) so existing deployments are unaffected until images are signed
- api/service_composer.py: cosign verify before pull/up; enforce aborts the
operation, warn logs and proceeds, off skips entirely; also fixes the prior
unsafe proceed-on-pull-failure path
- api/service_store_manager.py: store-image digest requirement (warn default,
reject under enforce)
- api/Dockerfile: cosign binary copied from the official cosign image
- docker-compose.yml: config/cosign/ bind-mounted into cell-api container
- install.sh: ensure/verify bundled cosign pubkey on new cell installs
- api/manifest_validator.py: validate_build_context() — Dockerfile lint
- tests: full coverage for config modes, composer verify paths, store digest
guard, and validate_build_context
Verification defaults to warn so nothing breaks in production until images are
signed (phase 2). Private key stored outside git at /home/roof/.pic-secrets/.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- PicNgoDDNS.update(): send token in request body instead of Authorization
header; DDNS server validates it from body (was returning HTTP 422 on
every heartbeat, leaving IP record stale after fresh install)
- peers.py / Peers.jsx: webdav service_access only valid when 'files' store
service is installed; was always shown even with no services, confusing
users into thinking WebDAV was pre-installed
- 10 new regression tests: DDNS update body contract, Caddy always
regenerates on startup with no services, peer role allowed on
/api/services/active, webdav gating by installed services
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- add_peer() now calls account_manager.provision() for any installed store
service whose manifest declares accounts.manager == 'http', enabling
per-peer credential provisioning to third-party HTTP services
- reapply_on_startup() calls egress_manager.apply_all() so fwmark rules
survive container restarts without manual intervention
- add GET /api/egress/status and PUT /api/egress/services/<id>/exit routes
so the UI can read and override per-service egress policy
- tests: HTTP provision wiring (happy path + non-fatal failure), egress
apply_all at startup (wired/unwired/failure cases)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two manifest validation bugs blocked all store service installs:
1. service_store_manager.RESERVED_SUBDOMAINS included 'mail', which
prevented the email service from using its required subdomain.
Removed mail/calendar/files/webmail — they belong to official PIC
store services and must be claimable by them.
2. manifest_validator required @sha256 digest pins on ALL images,
including first-party git.pic.ngo/roof/* images that the PIC team
builds and controls. service_store_manager._validate_manifest already
only warned for first-party images; the secondary validator was
stricter than intended, causing a hard reject on :latest tags.
Aligned to warn-not-reject for first-party; malformed digests (when
provided) are still a hard error.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ServiceStoreManager.install() now delegates container lifecycle to
ServiceComposer (per-service docker-compose.yml) instead of appending to a
shared compose override. This eliminates IP pool allocation, compose override
rendering, and the single-stack docker exec approach.
Changes:
- service_composer.py: add _resolve_requires(), _resolve_dependents(),
reapply_active_services() — dependency graph and startup reapply
- service_store_manager.py: rewrite install() and remove() to use
ServiceComposer; add _fetch_template(); delete _allocate_service_ip(),
_render_compose_override(), _write_compose_override(); remove() now guards
against removing services that others depend on
- managers.py: pass service_composer= to ServiceStoreManager
- Tests: 13 new composer dep tests; TestInstall/TestRemove rewritten for
the new composer-driven path; test_optional_services_feature.py updated
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces api/manifest_validator.py as a single security chokepoint
imported by both ServiceComposer and ServiceStoreManager:
- validate_manifest(): rejects kind=builtin, reserved container names,
reserved subdomains, backend denylist (localhost, cell-api, etc.),
cap_add outside allowlist / in denylist, shell-string provision hooks,
and env values with shell-special characters
- validate_rendered_compose(): walks the rendered YAML and rejects
privileged:true, host network/pid/ipc/userns, absolute bind mounts,
denied capabilities, devices key, apparmor/seccomp unconfined, and
string-form command/entrypoint (shell-injection vector)
- validate_provision_hook(): requires argv list form, lowercase binary,
rejects NUL bytes
ServiceStoreManager changes:
- _validate_manifest() delegates to validate_manifest() after existing checks
- _fetch_manifest() and fetch_index() now stream with a 256 KB size cap
(prevents memory exhaustion from a malicious or compromised index)
- Digest-pin warning for images missing @sha256 (hard error for unknown
registries, warning for git.pic.ngo/roof/* and TRUSTED_IMAGES_NO_DIGEST)
ServiceComposer changes:
- write_compose() calls validate_rendered_compose() before any disk write
so no partial file is left if validation fails
- render_template() substitutes ${PIC_DATA_DIR} with the resolved data_dir path
102 new tests in tests/test_manifest_validator.py covering all five P0
security issues. Existing test mocks updated to use streaming response
pattern (stream=True + raw.read) and valid compose YAML templates.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>