Commit Graph

11 Commits

Author SHA1 Message Date
roof 238db60702 feat: secure build phase 1 — cosign cell-side image verification (warn default) + Dockerfile validation
Unit Tests / test (push) Successful in 13m28s
- config/cosign/cosign.pub: public verification key committed to repo (safe);
  cosign private key lives in /home/roof/.pic-secrets/ and is NEVER committed
- api/config_manager.py: image_verification config block (modes: off|warn|enforce,
  default: warn) so existing deployments are unaffected until images are signed
- api/service_composer.py: cosign verify before pull/up; enforce aborts the
  operation, warn logs and proceeds, off skips entirely; also fixes the prior
  unsafe proceed-on-pull-failure path
- api/service_store_manager.py: store-image digest requirement (warn default,
  reject under enforce)
- api/Dockerfile: cosign binary copied from the official cosign image
- docker-compose.yml: config/cosign/ bind-mounted into cell-api container
- install.sh: ensure/verify bundled cosign pubkey on new cell installs
- api/manifest_validator.py: validate_build_context() — Dockerfile lint
- tests: full coverage for config modes, composer verify paths, store digest
  guard, and validate_build_context

Verification defaults to warn so nothing breaks in production until images are
signed (phase 2). Private key stored outside git at /home/roof/.pic-secrets/.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 03:53:47 -04:00
roof 603225694c feat: connectivity redesign phase 5 — one container per connection instance
Unit Tests / test (push) Successful in 13m5s
instanceable rendering, per-instance up/down on create/delete,
store-service-installed gate, per-instance health

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 22:56:31 -04:00
roof 6232ef23a9 feat: connectivity — registry-driven peer table, sshuttle/proxy egress, egress UI
The peer table was empty because it was not consulting the peer registry;
now peers are driven by PeerRegistry so the Connectivity page reflects actual
connected cells.

Exit-key handling is unified: all code paths now use the same key derivation
so a store-service exit bridge and a manual WireGuard peer both produce
consistent routing state.

Two new egress exit types are added (sshuttle via SSH tunnel and proxy via
redsocks SOCKS5), wiring through connectivity_manager, egress_manager, and
app.py routes. This lets a cell route its traffic through an SSH host or a
SOCKS5 proxy as an alternative to WireGuard exit nodes.

ServiceStoreManager and ServiceBus updated so the egress lifecycle (install /
uninstall) is cleanly signalled between components.

Connectivity.jsx gains the Service Egress section, letting operators assign
and reassign egress methods from the UI without touching config files.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 08:36:15 -04:00
roof 1607a2e86f fix: peer access to /api/services/active and unconditional Caddy startup regen
Unit Tests / test (push) Successful in 7m23s
- Add _PEER_READABLE_PATHS allowlist in enforce_auth so peer-role sessions
  can read /api/services/active; fixes My Services showing 'not installed'
  for cell members when services are installed
- Move Caddy regeneration before the early-return in reapply_on_startup so
  the Caddyfile is always rebuilt from current identity on startup, even when
  no store services are installed; fixes ERR_SSL_PROTOCOL_ERROR after a cell
  rename (Caddyfile retained old wildcard domain)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:58:27 -04:00
roof 41d09c598b wire: AccountManager HTTP dispatch + EgressManager startup + egress API routes
Unit Tests / test (push) Successful in 11m15s
- add_peer() now calls account_manager.provision() for any installed store
  service whose manifest declares accounts.manager == 'http', enabling
  per-peer credential provisioning to third-party HTTP services
- reapply_on_startup() calls egress_manager.apply_all() so fwmark rules
  survive container restarts without manual intervention
- add GET /api/egress/status and PUT /api/egress/services/<id>/exit routes
  so the UI can read and override per-service egress policy
- tests: HTTP provision wiring (happy path + non-fatal failure), egress
  apply_all at startup (wired/unwired/failure cases)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 10:30:41 -04:00
roof f7bb2cc962 fix: allow first-party store service subdomains and registry images
Unit Tests / test (push) Successful in 11m25s
Two manifest validation bugs blocked all store service installs:

1. service_store_manager.RESERVED_SUBDOMAINS included 'mail', which
   prevented the email service from using its required subdomain.
   Removed mail/calendar/files/webmail — they belong to official PIC
   store services and must be claimable by them.

2. manifest_validator required @sha256 digest pins on ALL images,
   including first-party git.pic.ngo/roof/* images that the PIC team
   builds and controls. service_store_manager._validate_manifest already
   only warned for first-party images; the secondary validator was
   stricter than intended, causing a hard reject on :latest tags.
   Aligned to warn-not-reject for first-party; malformed digests (when
   provided) are still a hard error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 03:09:41 -04:00
roof 03a67ad922 feat: add EgressManager — per-service egress enforcement via host iptables
Unit Tests / test (push) Successful in 11m20s
Routes outbound traffic from installed service containers through
alternate exits (wireguard_ext, openvpn, tor) using host-side
iptables fwmark policy-routing in a dedicated PIC_EGRESS chain.
Marks 0x110/0x120/0x130 are distinct from ConnectivityManager's
0x10/0x20/0x30. Container IPs discovered at runtime via docker
inspect. Wired into ServiceStoreManager install/remove lifecycle
and managers.py singleton. 22 new tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 00:58:47 -04:00
roof 87c321c1c9 feat: Phase 3 — ServiceComposer deps + store install via per-service compose
Unit Tests / test (push) Successful in 11m21s
ServiceStoreManager.install() now delegates container lifecycle to
ServiceComposer (per-service docker-compose.yml) instead of appending to a
shared compose override. This eliminates IP pool allocation, compose override
rendering, and the single-stack docker exec approach.

Changes:
- service_composer.py: add _resolve_requires(), _resolve_dependents(),
  reapply_active_services() — dependency graph and startup reapply
- service_store_manager.py: rewrite install() and remove() to use
  ServiceComposer; add _fetch_template(); delete _allocate_service_ip(),
  _render_compose_override(), _write_compose_override(); remove() now guards
  against removing services that others depend on
- managers.py: pass service_composer= to ServiceStoreManager
- Tests: 13 new composer dep tests; TestInstall/TestRemove rewritten for
  the new composer-driven path; test_optional_services_feature.py updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 09:33:02 -04:00
roof c40919d374 feat: Phase 0 — manifest_validator, compose YAML safety check, cap_add allowlist, backend denylist, provision hook enforcement, size cap
Introduces api/manifest_validator.py as a single security chokepoint
imported by both ServiceComposer and ServiceStoreManager:

- validate_manifest(): rejects kind=builtin, reserved container names,
  reserved subdomains, backend denylist (localhost, cell-api, etc.),
  cap_add outside allowlist / in denylist, shell-string provision hooks,
  and env values with shell-special characters
- validate_rendered_compose(): walks the rendered YAML and rejects
  privileged:true, host network/pid/ipc/userns, absolute bind mounts,
  denied capabilities, devices key, apparmor/seccomp unconfined, and
  string-form command/entrypoint (shell-injection vector)
- validate_provision_hook(): requires argv list form, lowercase binary,
  rejects NUL bytes

ServiceStoreManager changes:
- _validate_manifest() delegates to validate_manifest() after existing checks
- _fetch_manifest() and fetch_index() now stream with a 256 KB size cap
  (prevents memory exhaustion from a malicious or compromised index)
- Digest-pin warning for images missing @sha256 (hard error for unknown
  registries, warning for git.pic.ngo/roof/* and TRUSTED_IMAGES_NO_DIGEST)

ServiceComposer changes:
- write_compose() calls validate_rendered_compose() before any disk write
  so no partial file is left if validation fails
- render_template() substitutes ${PIC_DATA_DIR} with the resolved data_dir path

102 new tests in tests/test_manifest_validator.py covering all five P0
security issues.  Existing test mocks updated to use streaming response
pattern (stream=True + raw.read) and valid compose YAML templates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 07:23:08 -04:00
roof 16fb362df7 feat: replace hardcoded service names with ServiceRegistry-driven Caddy and CoreDNS config
Unit Tests / test (push) Failing after 11s
Previously, CaddyManager and NetworkManager contained hardcoded lists of
service names (calendar, files, mail, webdav, etc.), meaning every new
service required a code change to appear in Caddy routes and DNS records.
Now both managers accept a service_registry parameter and derive their
service lists dynamically from the registry at runtime.

- CaddyManager: new _build_registry_service_routes() and
  _http01_service_pairs() methods pull routes from the registry
- NetworkManager: new _get_service_subdomains() method returns registry
  subdomains with a hardcoded fallback when no registry is wired in;
  _build_dns_records, stale-record detection, and service name sets all
  use the registry
- managers.py: service_registry constructed before network_manager so it
  can be injected into both CaddyManager and NetworkManager
- service_registry.py: validation chokepoint in get_caddy_routes() rejects
  invalid subdomain/backend values and reserved service names
- service_store_manager.py: _validate_manifest now validates top-level
  subdomain, backend, extra_subdomains, and extra_backends fields
- tests: 24 new tests covering registry-driven routing and DNS subdomain
  generation (test_caddy_registry_integration.py)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 18:27:52 -04:00
roof 0a21f22076 Phase 4: service store — manifest validation, install/remove, Store UI
- ServiceStoreManager: manifest allowlist (git.pic.ngo/roof/*), volume
  denylist, ACCEPT-only iptables rules, ${SERVICE_IP}-only dest_ip
- IP allocator: pool 172.20.0.20-254, skips CONTAINER_OFFSETS VIPs
- Compose overlay: docker-compose.services.yml auto-included via DCF
- Flask blueprint at /api/store: list, install, remove, refresh
- Store.jsx: full install/remove UI with spinners and toast notifications
- 95 new unit tests for ServiceStoreManager (all passing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 10:19:39 -04:00