fix: scope ADRs to product/code decisions only

ADR – 001 refocused from the CI pipeline onto the product decision (cells
verify digest-pinned, cosign-signed store images; enforce by default).
ADR – 004 (docs process) archived — out of ADR scope per maintainer rule.
Sidebar gains an Archive section so archived pages stay discoverable.
2026-06-11 15:47:54 -04:00
parent 150b9c6c47
commit b54d42ba57
5 changed files with 67 additions and 58 deletions
@@ -1,54 +0,0 @@
> **Status:** Active | **Owner:** @roof | **Updated:** 2026-06-11
# ADR – 001 Isolated Build and Signing Pipeline for Store Images
## Context
The service store will accept community submissions. Any community-submitted pull request contains a Dockerfile that runs arbitrary code during `docker build`. Two distinct threat vectors exist:
1. **Malware in pre-built images.** A maintainer who blindly pulls and runs a pre-built image submitted by a community member has no assurance that the image matches the published source.
2. **Credential theft during build.** Building untrusted Dockerfiles on a CI runner that has access to signing keys, registry credentials, or other secrets gives the attacker direct access to those secrets via the build environment.
These two risks mean neither "trust the community image" nor "build on the trusted runner" is safe as a sole approach.
---
## Options Considered
### Option A — Mount Docker socket into the CI runner
The simplest approach: give a single CI runner both Docker build access and the ability to sign and push images. No isolation.
**Rejected because:** Docker socket access is root-equivalent on the host. A malicious Dockerfile that exploits the build to escape the container gains the runner's full credentials, including signing keys and registry push access. This is a complete supply-chain compromise.
### Option B — Rootless builders on the trusted runner
Use rootless Buildah or Podman on the trusted runner (`gitea-action0`) where secrets live, with strict process isolation but no VM boundary.
**Rejected because:** container escapes and Linux kernel exploits in rootless builders have a history. A sophisticated Dockerfile can still exfiltrate secrets through timing attacks, shared kernel state, or sandbox escapes without a VM boundary. The risk surface is smaller but not zero.
### Option C — Two-stage pipeline with a sacrificial VM (chosen)
Separate the build from the sign-and-push using two physically distinct runner VMs:
- **`gitea-action1`** (sacrificial, no secrets): accepts pull requests from the community, builds the Docker image using kaniko (no Docker daemon, no host privileges), and pushes an unverified image to a staging area. This runner is treated as potentially hostile — it is rebuilt between jobs and never given signing keys or registry push credentials beyond the staging target.
- **`gitea-action0`** (trusted, has secrets): pulls the built image, runs a Trivy vulnerability scan, signs with cosign using the PIC signing key (stored only in Gitea Secrets on this runner), re-tags and pushes to the production registry, then writes the `@sha256:` digest back into the manifest in the `pic-services` repository.
Cells bundle the cosign public key and verify every store image before starting its containers. Rollout used warn-by-default first; the default was flipped to enforce once all existing store images had been signed.
---
## Decision
We use the two-stage pipeline (Option C). Untrusted Dockerfiles are built in an isolated, no-secrets kaniko environment on `gitea-action1`. The resulting image is then pulled, scanned, signed, and published by the trusted `gitea-action0`. Cells reject unsigned or undigested images by default (`image_verification: enforce`).
---
## Consequences
- **Unsigned or undigested images refuse to install** on any cell running the default `enforce` mode. This blocks malware injected between the community's source and the image pull.
- **Service developers must disable or downgrade verification locally.** During development, set `image_verification.mode` to `warn` or `off` in `config/api/cell_config.json` (then restart the API container) to run unsigned local images. See [[Dev – Build a Store Service]].
- **The cosign private key lives only in Gitea Secrets on `gitea-action0`.** It is never on disk on `gitea-action1` or on any cell. The public key is bundled with PIC and used only for verification.
- **Trivy scan is a gate, not an alert.** A failed scan blocks the sign-and-push step. The definition of "fail" is a HIGH or CRITICAL severity CVE with a fixed version available.
- **`gitea-action1` is disposable.** It has no persistent state worth protecting. If it is compromised during a build, the worst outcome is a poisoned staging image that fails the cosign verification gate on `gitea-action0`.
- **Manifest digests are written back by CI.** The `image` field in a merged manifest is always a `@sha256:` digest, not a mutable tag. This prevents tag-redirect attacks after the manifest is merged.
@@ -0,0 +1,55 @@
> **Status:** Active | **Owner:** @roof | **Applies to:** main (2026-06) | **Updated:** 2026-06-11
# ADR – 001 Store Images Are Signed and Verified by Cells
## Context
The service store will accept community-submitted services. A cell installing a store service pulls a container image and runs it with access to the cell network. Without provenance guarantees, two attacks are possible:
1. **Malicious image content.** An image that does not match its published source can carry malware onto every cell that installs it.
2. **Tag redirection.** A manifest that references a mutable tag (`:latest`) can be made to pull different content after review, even if the reviewed content was clean.
The store needed a way for a cell to decide, at install time, whether an image is exactly what the PIC publish process produced.
---
## Options Considered
### Option A — Trust the registry
Restrict images to `git.pic.ngo/roof/*` and rely on registry access control.
**Rejected because:** it protects against outsiders but not against a compromised publish path, a stolen registry credential, or tag mutation. The cell has no way to detect that an image changed after review.
### Option B — Digest pinning only
Require every manifest `image` to be a `@sha256:` digest.
**Rejected as insufficient alone:** digests guarantee immutability but not origin. A digest written into a manifest by an attacker still installs cleanly. Digest pinning is kept as a necessary part of the solution.
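Here is a minimal Python sketch of the digest-pin check. It assumes a reference counts as pinned only when it ends in `@sha256:` followed by 64 lowercase hex characters. `is_digest_pinned` and the `notes` image are hypothetical, and this is not the actual `ServiceStoreManager` code.

```python
import re

# Assumption: a pinned reference is "<name>@sha256:<64 lowercase hex chars>".
_DIGEST_REF = re.compile(r"[^@\s]+@sha256:[0-9a-f]{64}")

def is_digest_pinned(image: str) -> bool:
    """Return True only if the whole reference names content by sha256 digest."""
    return _DIGEST_REF.fullmatch(image) is not None

# Hypothetical image names, for illustration only.
print(is_digest_pinned("git.pic.ngo/roof/notes:latest"))  # False: mutable tag
print(is_digest_pinned("git.pic.ngo/roof/notes@sha256:" + "a" * 64))  # True
```

The check only proves that the reference is immutable. As the rejection reason above notes, it says nothing about who produced the digest.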
### Option C — Digest pinning + cryptographic signing, verified on the cell (chosen)
Every published store image is digest-pinned and signed with cosign. The cell verifies the signature against a bundled public key before starting any store container.
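Here is a sketch of the cell-side signature check. It assumes the `cosign` binary is on the cell's `PATH` and runs from a directory where the bundled key resolves at `config/cosign/cosign.pub`, the path given under Decision. `cosign verify --key <key> <image>` exits non-zero when verification fails. The helper names and the injectable `run` parameter are hypothetical. The `run` parameter exists only so the logic can be exercised without cosign installed.

```python
import subprocess
from typing import Callable

# Assumption: relative to the working directory of the API process.
DEFAULT_PUBKEY = "config/cosign/cosign.pub"

def cosign_verify_argv(image: str, key_path: str = DEFAULT_PUBKEY) -> list[str]:
    """Build the `cosign verify --key <public key> <image>` command line."""
    return ["cosign", "verify", "--key", key_path, image]

def signature_ok(
    image: str,
    key_path: str = DEFAULT_PUBKEY,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True only if cosign exits 0 for this image and key."""
    try:
        result = run(cosign_verify_argv(image, key_path),
                     capture_output=True, text=True)
    except FileNotFoundError:
        # cosign is not installed: treat as a failed verification (fail closed).
        return False
    return result.returncode == 0
```

Treating a missing cosign binary as a failed check is a design choice in this sketch. It follows the same fail-closed principle as the verification mode.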
---
## Decision
Store images must be **digest-pinned and cosign-signed**, and **cells verify them at install time**:
- `ServiceStoreManager` rejects manifests whose `image` is not a `@sha256:` digest.
- `ServiceComposer` runs `cosign verify` against the bundled public key (`config/cosign/cosign.pub`) before bringing a service up (`api/service_composer.py`).
- Behaviour is controlled by the `image_verification.mode` setting in `config/api/cell_config.json`: `off` | `warn` | `enforce`. The default is **`enforce`**: an undigested, unsigned, or signature-mismatched image refuses to install. If the verification mode cannot be read (corrupt config), the composer falls back to `enforce`, so verification fails closed and never silently weakens.
- Signing happens in the publish pipeline (images are signed and their digests written back into manifests before they reach the store index); the private key never exists on a cell.
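Here is a minimal sketch of the fail-closed mode handling and the install decision that follows from it. It makes three assumptions. The setting is stored as `image_verification.mode` in `cell_config.json`. A missing file or key also resolves to the `enforce` default. The mode governs both the digest and the signature checks. Function names and return strings are hypothetical, and this is not the code in `api/service_composer.py`.

```python
import json
from pathlib import Path

VALID_MODES = {"off", "warn", "enforce"}

def read_verification_mode(config_path: Path) -> str:
    """Return the configured mode, falling back to "enforce" on any doubt."""
    try:
        config = json.loads(config_path.read_text())
        mode = config["image_verification"]["mode"]
    except (OSError, ValueError, KeyError, TypeError):
        # Missing file, corrupt JSON, or unexpected shape: fail closed.
        return "enforce"
    if isinstance(mode, str) and mode in VALID_MODES:
        return mode
    return "enforce"  # unknown value: fail closed

def install_decision(mode: str, digest_pinned: bool, signature_valid: bool) -> str:
    """Return "install", "install-with-warning", or "refuse"."""
    if mode == "off" or (digest_pinned and signature_valid):
        return "install"
    if mode == "warn":
        return "install-with-warning"
    return "refuse"  # "enforce" and anything unrecognised
```

Both functions treat an unrecognised value the same way as `enforce`. A typo in the config therefore tightens verification rather than loosening it.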
Rollout was staged: verification shipped warn-by-default first, and the default flipped to `enforce` once every store image was signed.
---
## Consequences
- **Unsigned or undigested images refuse to install** on any cell running the default `enforce` mode.
- **Service developers must downgrade verification locally.** To run images that have not been through the publish pipeline, set `image_verification.mode` to `warn` or `off` in `config/api/cell_config.json` and restart the API container. See [[Dev – Build a Store Service]].
- **Manifests always reference digests, not tags.** The `image` field in a published manifest is a `@sha256:` digest, which removes tag-redirect attacks entirely.
- **Only the public key ships with PIC.** Compromising a cell yields nothing that can sign new images.
- **There is currently no API route or UI for the verification mode** — changing it is a config-file edit. This is intentional friction: weakening verification should not be one click away.
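For local development, the downgrade is a single-field edit. The sketch below scripts it. It assumes the file nests the setting as `{"image_verification": {"mode": ...}}`. `set_verification_mode` is a hypothetical helper, not part of PIC. The API container still has to be restarted afterwards.

```python
import json
from pathlib import Path

def set_verification_mode(config_path: Path, mode: str) -> None:
    """Set image_verification.mode in place, keeping every other setting."""
    if mode not in {"off", "warn", "enforce"}:
        raise ValueError(f"unknown verification mode: {mode!r}")
    config = json.loads(config_path.read_text())
    config.setdefault("image_verification", {})["mode"] = mode
    # Note: rewrites the whole file with two-space indentation.
    config_path.write_text(json.dumps(config, indent=2) + "\n")
```

For example, call `set_verification_mode(Path("config/api/cell_config.json"), "warn")` from the directory that contains `config/`. Set the mode back to `enforce` before testing anything that should behave like a production cell.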
@@ -1,4 +1,6 @@
> **Status:** Active | **Owner:** @roof | **Updated:** 2026-06-11 > **Status:** Deprecated | **Owner:** @roof | **Updated:** 2026-06-11
> ⚠️ **ARCHIVED** — ADRs cover PIC product and code decisions only; documentation-process decisions are out of ADR scope. The decision itself still stands: the wiki is the single documentation source and the code repo keeps only `README.md`.
# ADR – 004 The Wiki Is the Single Documentation Source
+1 -1
@@ -185,6 +185,6 @@ Merged services are built by the two-stage pipeline (untrusted kaniko build →
While developing locally, set `image_verification.mode` to `warn` or `off` in `config/api/cell_config.json` (then restart the API container) to run images that have not been through the pipeline yet.
See [[ADR – 001 Isolated Build and Signing Pipeline for Store Images]] for the rationale behind the two-stage pipeline. See [[ADR – 001 Store Images Are Signed and Verified by Cells]] for the rationale behind signing and verification.
See [[Dev – Service Manifest Reference]] for the complete field reference, compose template variables, the account-provisioning HTTP interface, and backup/egress integration details.
+8 -2
@@ -47,10 +47,9 @@
### Decisions (ADRs)
[[ADR – 001 Isolated Build and Signing Pipeline for Store Images]] [[ADR – 001 Store Images Are Signed and Verified by Cells]]
[[ADR – 002 Named Connection Instances for Connectivity]]
[[ADR – 003 All Optional Functionality Ships as Store Services]]
[[ADR – 004 The Wiki Is the Single Documentation Source]]
---
@@ -59,3 +58,10 @@
[[Meta – Glossary]]
[[Meta – Template Runbook]]
[[Meta – Template ADR]]
---
### Archive
[[Archive – User Guide]]
[[Archive – ADR 004 The Wiki Is the Single Documentation Source]]