diff --git a/ADR-–-001-Isolated-Build-and-Signing-Pipeline-for-Store-Images.md b/ADR-–-001-Isolated-Build-and-Signing-Pipeline-for-Store-Images.md
deleted file mode 100644
index 33b90e1..0000000
--- a/ADR-–-001-Isolated-Build-and-Signing-Pipeline-for-Store-Images.md
+++ /dev/null
@@ -1,54 +0,0 @@
-> **Status:** Active | **Owner:** @roof | **Updated:** 2026-06-11
-
-# ADR – 001 Isolated Build and Signing Pipeline for Store Images
-
-## Context
-
-The service store will accept community submissions. Any community-submitted pull request contains a Dockerfile that runs arbitrary code during `docker build`. Two distinct threat vectors exist:
-
-1. **Malware in pre-built images.** A maintainer who blindly pulls and runs a pre-built image submitted by a community member has no assurance that the image matches the published source.
-2. **Credential theft during build.** Building untrusted Dockerfiles on a CI runner that has access to signing keys, registry credentials, or other secrets gives the attacker direct access to those secrets via the build environment.
-
-These two risks mean neither "trust the community image" nor "build on the trusted runner" is safe as a sole approach.
-
----- 
-
-## Options Considered
-
-### Option A — Mount Docker socket into the CI runner
-
-The simplest approach: give a single CI runner both Docker build access and the ability to sign and push images. No isolation.
-
-**Rejected because:** Docker socket access is root-equivalent on the host. A malicious Dockerfile that exploits the build to escape the container gains the runner's full credentials, including signing keys and registry push access. This is a complete supply-chain compromise.
-
-### Option B — Rootless builders on the trusted runner
-
-Use rootless Buildah or Podman on the trusted runner (`gitea-action0`) where secrets live, with strict process isolation but no VM boundary.
-
-**Rejected because:** container escapes and Linux kernel exploits in rootless builders have a history. A sophisticated Dockerfile can still exfiltrate secrets through timing attacks, shared kernel state, or sandbox escapes without a VM boundary. The risk surface is smaller but not zero.
-
-### Option C — Two-stage pipeline with a sacrificial VM (chosen)
-
-Separate the build from the sign-and-push using two physically distinct runner VMs:
-
-- **`gitea-action1`** (sacrificial, no secrets): accepts pull requests from the community, builds the Docker image using kaniko (no Docker daemon, no host privileges), and pushes an unverified image to a staging area. This runner is treated as potentially hostile — it is rebuilt between jobs and never given signing keys or registry push credentials beyond the staging target.
-- **`gitea-action0`** (trusted, has secrets): pulls the built image, runs a Trivy vulnerability scan, signs with cosign using the PIC signing key (stored only in Gitea Secrets on this runner), re-tags and pushes to the production registry, then writes the `@sha256:` digest back into the manifest in the `pic-services` repository.
-
-Cells bundle the cosign public key and verify every store image before starting its containers. Rollout used warn-by-default first; the default was flipped to enforce once all existing store images had been signed.
-
----- 
-
-## Decision
-
-We use the two-stage pipeline (Option C). Untrusted Dockerfiles are built in an isolated, no-secrets kaniko environment on `gitea-action1`. The resulting image is then pulled, scanned, signed, and published by the trusted `gitea-action0`. Cells reject unsigned or undigested images by default (`image_verification: enforce`).
-
----- 
-
-## Consequences
-
-- **Unsigned or undigested images refuse to install** on any cell running the default `enforce` mode. This blocks malware injected between the community's source and the image pull.
-- **Service developers must disable or downgrade verification locally.** During development, set `image_verification.mode` to `warn` or `off` in `config/api/cell_config.json` (then restart the API container) to run unsigned local images. See [[Dev – Build a Store Service]].
-- **The cosign private key lives only in Gitea Secrets on `gitea-action0`.** It is never on disk on `gitea-action1` or on any cell. The public key is bundled with PIC and used only for verification.
-- **Trivy scan is a gate, not an alert.** A failed scan blocks the sign-and-push step. The definition of "fail" is a HIGH or CRITICAL severity CVE with a fixed version available.
-- **`gitea-action1` is disposable.** It has no persistent state worth protecting. If it is compromised during a build, the worst outcome is a poisoned staging image that fails the cosign verification gate on `gitea-action0`.
-- **Manifest digests are written back by CI.** The `image` field in a merged manifest is always a `@sha256:` digest, not a mutable tag. This prevents tag-redirect attacks after the manifest is merged.
diff --git a/ADR-–-001-Store-Images-Are-Signed-and-Verified-by-Cells.md b/ADR-–-001-Store-Images-Are-Signed-and-Verified-by-Cells.md
new file mode 100644
index 0000000..3c50eae
--- /dev/null
+++ b/ADR-–-001-Store-Images-Are-Signed-and-Verified-by-Cells.md
@@ -0,0 +1,55 @@
+> **Status:** Active | **Owner:** @roof | **Applies to:** main (2026-06) | **Updated:** 2026-06-11
+
+# ADR – 001 Store Images Are Signed and Verified by Cells
+
+## Context
+
+The service store will accept community-submitted services. A cell installing a store service pulls a container image and runs it with access to the cell network. Without provenance guarantees, two attacks are possible:
+
+1. **Malicious image content.** An image that does not match its published source can carry malware onto every cell that installs it.
+2. **Tag redirection.** A manifest that references a mutable tag (`:latest`) can be made to pull different content after review, even if the reviewed content was clean.
+
+The store needed a way for a cell to decide, at install time, whether an image is exactly what the PIC publish process produced.
+
+---
+
+## Options Considered
+
+### Option A — Trust the registry
+
+Restrict images to `git.pic.ngo/roof/*` and rely on registry access control.
+
+**Rejected because:** it protects against outsiders but not against a compromised publish path, a stolen registry credential, or tag mutation. The cell has no way to detect that an image changed after review.
+
+### Option B — Digest pinning only
+
+Require every manifest `image` to be a `@sha256:` digest.
+
+**Rejected as insufficient alone:** digests guarantee immutability but not origin. A digest written into a manifest by an attacker still installs cleanly. Digest pinning is kept as a necessary part of the solution.
+
+### Option C — Digest pinning + cryptographic signing, verified on the cell (chosen)
+
+Every published store image is digest-pinned and signed with cosign. The cell verifies the signature against a bundled public key before starting any store container.
+
+---
+
+## Decision
+
+Store images must be **digest-pinned and cosign-signed**, and **cells verify them at install time**:
+
+- `ServiceStoreManager` rejects manifests whose `image` is not a `@sha256:` digest.
+- `ServiceComposer` runs `cosign verify` against the bundled public key (`config/cosign/cosign.pub`) before bringing a service up (`api/service_composer.py`).
+- Behaviour is controlled by the `image_verification` section in `cell_config.json`: `off` | `warn` | `enforce`. The default is **`enforce`** — an undigested, unsigned, or signature-mismatched image refuses to install. If the verification mode cannot be read (corrupt config), the composer falls back to `enforce`: verification fails closed, never silently weakens.
+- Signing happens in the publish pipeline (images are signed and their digests written back into manifests before they reach the store index); the private key never exists on a cell.
+
+Rollout was staged: verification shipped warn-by-default first, and the default flipped to `enforce` once every store image was signed.
+
+---
+
+## Consequences
+
+- **Unsigned or undigested images refuse to install** on any cell running the default `enforce` mode.
+- **Service developers must downgrade verification locally.** To run images that have not been through the publish pipeline, set `image_verification.mode` to `warn` or `off` in `config/api/cell_config.json` and restart the API container. See [[Dev – Build a Store Service]].
+- **Manifests always reference digests, not tags.** The `image` field in a published manifest is a `@sha256:` digest, which removes tag-redirect attacks entirely.
+- **Only the public key ships with PIC.** Compromising a cell yields nothing that can sign new images.
+- **There is currently no API route or UI for the verification mode** — changing it is a config-file edit. This is intentional friction: weakening verification should not be one click away.
diff --git a/ADR-–-004-The-Wiki-Is-the-Single-Documentation-Source.md b/Archive-–-ADR-004-The-Wiki-Is-the-Single-Documentation-Source.md
similarity index 93%
rename from ADR-–-004-The-Wiki-Is-the-Single-Documentation-Source.md
rename to Archive-–-ADR-004-The-Wiki-Is-the-Single-Documentation-Source.md
index 22dcf51..b8a635e 100644
--- a/ADR-–-004-The-Wiki-Is-the-Single-Documentation-Source.md
+++ b/Archive-–-ADR-004-The-Wiki-Is-the-Single-Documentation-Source.md
@@ -1,4 +1,6 @@
-> **Status:** Active | **Owner:** @roof | **Updated:** 2026-06-11
+> **Status:** Deprecated | **Owner:** @roof | **Updated:** 2026-06-11
+
+> ⚠️ **ARCHIVED** — ADRs cover PIC product and code decisions only; documentation-process decisions are out of ADR scope. The decision itself still stands: the wiki is the single documentation source and the code repo keeps only `README.md`.
 
 # ADR – 004 The Wiki Is the Single Documentation Source
 
diff --git a/Dev-–-Build-a-Store-Service.md b/Dev-–-Build-a-Store-Service.md
index 2a0de85..3173ad7 100644
--- a/Dev-–-Build-a-Store-Service.md
+++ b/Dev-–-Build-a-Store-Service.md
@@ -185,6 +185,6 @@ Merged services are built by the two-stage pipeline (untrusted kaniko build → 
 
 While developing locally, set `image_verification.mode` to `warn` or `off` in `config/api/cell_config.json` (then restart the API container) to run images that have not been through the pipeline yet.
 
-See [[ADR – 001 Isolated Build and Signing Pipeline for Store Images]] for the rationale behind the two-stage pipeline.
+See [[ADR – 001 Store Images Are Signed and Verified by Cells]] for the rationale behind signing and verification.
 
 See [[Dev – Service Manifest Reference]] for the complete field reference, compose template variables, the account-provisioning HTTP interface, and backup/egress integration details.
diff --git a/_Sidebar.md b/_Sidebar.md
index 290b846..5f0a515 100644
--- a/_Sidebar.md
+++ b/_Sidebar.md
@@ -47,10 +47,9 @@
 
 ### Decisions (ADRs)
 
-[[ADR – 001 Isolated Build and Signing Pipeline for Store Images]]
+[[ADR – 001 Store Images Are Signed and Verified by Cells]]
 [[ADR – 002 Named Connection Instances for Connectivity]]
 [[ADR – 003 All Optional Functionality Ships as Store Services]]
-[[ADR – 004 The Wiki Is the Single Documentation Source]]
 
 ---
 
@@ -59,3 +58,10 @@
 [[Meta – Glossary]]
 [[Meta – Template Runbook]]
 [[Meta – Template ADR]]
+
+---
+
+### Archive
+
+[[Archive – User Guide]]
+[[Archive – ADR 004 The Wiki Is the Single Documentation Source]]