
# Personal Internet Cell (PIC)
A self-hosted digital infrastructure platform. One stack, one API, one UI — managing DNS, DHCP, NTP, WireGuard VPN, email, calendar/contacts, file storage, and a reverse proxy on your own hardware.

---
## What it does
- **Network services** — CoreDNS, dnsmasq DHCP, chrony NTP, all dynamically managed
- **WireGuard VPN** — peer lifecycle, QR-code provisioning, per-peer service access control
- **Digital services** — Email (Postfix/Dovecot), Calendar/Contacts (Radicale CalDAV), Files (WebDAV + Filegator)
- **Reverse proxy** — Caddy with per-service virtual IPs; subdomains like `calendar.mycell.cell` work on VPN clients automatically
- **Certificate authority** — self-hosted CA via VaultManager
- **Cell mesh** — connect two PIC instances with site-to-site WireGuard + DNS forwarding

Everything is configured through a REST API and a React web UI. No manual config file editing is needed for normal operations.

---
## Quick Start
### Prerequisites
- Debian/Ubuntu host (apt-based)
- 2 GB+ RAM, 10 GB+ disk
- Open ports: 53 (DNS), 80 (HTTP), 3000 (API), 8081 (Web UI), 51820/udp (WireGuard)
### Install
```bash
git clone <repo-url> pic
cd pic
# Install system deps (docker, python3, python3-cryptography, etc.)
make check-deps
# Generate keys + write configs
make setup
# Build and start all 12 containers
make start
```
`make setup` accepts overrides for a second cell on a different host:
```bash
CELL_NAME=pic1 VPN_ADDRESS=10.1.0.1/24 make setup && make start
```
### Access
| Service | URL |
|---------|-----|
| Web UI | `http://<host-ip>:8081` |
| API | `http://<host-ip>:3000` |
| Health | `http://<host-ip>:3000/health` |
From a WireGuard client: `http://mycell.cell` (replace with your cell name/domain).
### Local dev (no Docker)
```bash
pip install -r api/requirements.txt
python api/app.py # Flask API on :3000
cd webui && npm install && npm run dev # React UI on :5173 (proxies /api → :3000)
```
---
## Management Commands
```bash
# First install
make check-deps # install system packages via apt
make setup # generate keys, write configs, create data dirs
make start # start all 12 containers
# Daily operations
make status # container status + API health
make logs # follow all container logs
make logs-api # follow logs for one service (api, dns, wg, mail, caddy, ...)
make shell-api # shell inside a container
# Deploy latest code
make update # git pull + rebuild api image + restart
# Maintenance
make backup # tar config/ + data/ into backups/
make restore # list available backups and restore
make clean # remove containers/volumes, keep config/data
# Full wipe (test machines)
make reinstall # stop, wipe config/data, setup, start fresh
make uninstall # stop + remove images; prompts to also wipe config/data
# Tests
make test # run full pytest suite
make test-coverage # tests + HTML coverage report in htmlcov/
```
---
## Connecting Two Cells (PIC Mesh)
Two PIC instances form a mesh: site-to-site WireGuard tunnels with automatic DNS forwarding so each cell's services resolve from the other.
### Exchange invites
1. On **Cell A** → Web UI → **Cell Network** → copy the invite JSON.
2. On **Cell B** → Web UI → **Cell Network** → paste into "Connect to Another Cell" → **Connect**.
3. On **Cell B** → copy its invite JSON.
4. On **Cell A** → paste Cell B's invite → **Connect**.

Both cells now have a WireGuard peer with `AllowedIPs = remote VPN subnet` and a CoreDNS forwarding block so `*.pic1.cell` resolves across the tunnel.
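
Concretely, the forwarding block each cell gains in its Corefile has roughly this shape (cell name and tunnel IP are illustrative, not your exact values):

```
# Forward lookups for the remote cell's domain across the WireGuard tunnel.
pic1.cell:53 {
    forward . 10.1.0.1
}
```
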
### Same-LAN tip
If both cells share the same external IP (behind NAT), replace the auto-detected endpoint with the LAN IP before connecting:
```json
{ "endpoint": "192.168.31.50:51820", ... }
```
---
## Architecture
### Stack
```
cell-caddy (Caddy) :80/:443 + per-service virtual IPs
cell-api (Flask :3000) REST API + config management + container orchestration
cell-webui (Nginx :8081) React UI
cell-dns (CoreDNS :53) internal DNS + per-peer ACLs
cell-dhcp (dnsmasq) DHCP + static reservations
cell-ntp (chrony) NTP
cell-wireguard WireGuard VPN
cell-mail (docker-mailserver) SMTP/IMAP
cell-radicale CalDAV/CardDAV :5232
cell-webdav WebDAV :80
cell-filegator file manager UI :8080
cell-rainloop webmail :8888
```
All containers share a custom Docker bridge network. Static IPs are assigned in `docker-compose.yml`. Caddy adds per-service virtual IPs to its own interface at API startup so `calendar.<domain>`, `files.<domain>`, etc. route to the right container.
### Backend (`api/`)
Service managers (`network_manager.py`, `wireguard_manager.py`, `peer_registry.py`, etc.) all inherit `BaseServiceManager`. `app.py` contains all Flask routes — one file, organized by service.

`ConfigManager` (`config_manager.py`) is the single source of truth. Config lives in `config/api/cell_config.json`. All managers read/write through it.

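
Config writes of this kind are typically made crash-safe with a write-to-temp-then-rename pattern. The sketch below illustrates the idea; it is not `ConfigManager`'s exact code:

```python
import json
import os
import tempfile

def write_config_atomic(path: str, config: dict) -> None:
    """Write JSON so a reader sees either the old file or the new one,
    never a truncated one: temp file -> fsync -> atomic rename."""
    fd, tmp_path = tempfile.mkstemp(
        dir=os.path.dirname(os.path.abspath(path)), suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(config, f, indent=2)
            f.flush()
            os.fsync(f.fileno())       # data hits disk before the rename
        os.replace(tmp_path, path)     # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)
        raise
```
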
`ip_utils.py` owns all container IP logic via `CONTAINER_OFFSETS` — do not hardcode IPs elsewhere.

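
The offset scheme works roughly like this (the offset values below are illustrative; the real table lives in `api/ip_utils.py`):

```python
import ipaddress

# Illustrative offsets only; consult CONTAINER_OFFSETS in api/ip_utils.py.
CONTAINER_OFFSETS = {"cell-api": 2, "cell-dns": 3, "cell-caddy": 4}

def container_ip(ip_range: str, name: str) -> str:
    """Derive a container's static IP from the cell's ip_range CIDR."""
    network = ipaddress.ip_network(ip_range)
    return str(network.network_address + CONTAINER_OFFSETS[name])

print(container_ip("10.10.0.0/24", "cell-dns"))  # 10.10.0.3
```
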
When a config change requires recreating the Docker network (e.g. `ip_range` change), the API spawns a helper container that outlives cell-api to run `docker compose down && up`. Other restarts run `compose up -d --no-deps <containers>` directly.
### Frontend (`webui/`)
React 18 + Vite + Tailwind CSS. All API calls go through `src/services/api.js` (Axios). Vite dev server proxies `/api` to `localhost:3000`. Pages in `src/pages/`, shared components in `src/components/`.
### Project layout
```
pic/
├── api/ # Flask API + all service managers
│ ├── app.py # all routes (~2700 lines)
│ ├── config_manager.py # unified config CRUD
│ ├── ip_utils.py # IP/CIDR helpers + Caddyfile generator
│ ├── firewall_manager.py # iptables (via cell-wireguard) + Corefile
│ ├── network_manager.py # DNS zones, DHCP, NTP
│ ├── wireguard_manager.py
│ ├── peer_registry.py
│ ├── vault_manager.py
│ ├── email_manager.py
│ ├── calendar_manager.py
│ ├── file_manager.py
│ └── container_manager.py
├── webui/ # React frontend
├── config/ # Config files (bind-mounted into containers)
│ ├── api/cell_config.json ← live config
│ ├── caddy/Caddyfile
│ ├── dns/Corefile
│ └── ...
├── data/ # Persistent data (git-ignored)
├── tests/ # pytest suite (414 tests)
├── docker-compose.yml
└── Makefile
```
---
## API Reference
### Config
```
GET /api/config full config + service IPs
PUT /api/config update identity or service config
GET /api/config/pending pending restart info
POST /api/config/apply apply pending restart
POST /api/config/backup create backup
POST /api/config/restore/<backup_id> restore from backup
```
### Network
```
GET /api/dns/records
POST /api/dns/records
GET /api/dhcp/leases
GET /api/dhcp/reservations
POST /api/dhcp/reservations
```
### WireGuard & Peers
```
GET /api/wireguard/status
GET /api/wireguard/peers
POST /api/wireguard/peers
GET /api/peers
POST /api/peers
PUT /api/peers/<name>
DELETE /api/peers/<name>
GET /api/peers/<name>/config peer config + QR code
```
### Containers & Health
```
GET /api/containers
POST /api/containers/<name>/restart
GET /health
GET /api/services/status
```
---
## Testing
```bash
make test # run full suite
make test-coverage # coverage report in htmlcov/
pytest tests/test_<module>.py # single file
pytest tests/ -k "test_name" # single test
```
Tests live in `tests/` and use `unittest.TestCase` collected by pytest. External system calls (Docker, iptables, file writes) are mocked with `unittest.mock.patch`.

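
A typical test in this style patches the system call so nothing touches Docker. A minimal sketch (`restart_container` here is a stand-in, not a real PIC function):

```python
import subprocess
from unittest.mock import patch

def restart_container(name: str) -> bool:
    # Stand-in for a manager method that shells out to Docker.
    result = subprocess.run(["docker", "restart", name],
                            capture_output=True, text=True)
    return result.returncode == 0

# Patch subprocess.run so the test never invokes Docker.
with patch("subprocess.run") as mock_run:
    mock_run.return_value.returncode = 0
    assert restart_container("cell-dns") is True
    mock_run.assert_called_once_with(
        ["docker", "restart", "cell-dns"],
        capture_output=True, text=True)
```
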
Known coverage gap: the helper-container path of `POST /api/config/apply` remains the highest-risk untested path.

---
## Security Notes
- The API is access-controlled by `is_local_request()` — it checks whether the request comes from a local/loopback/cell-network IP. Sensitive endpoints (containers, vault) are restricted to local access only.
- All per-peer service access is enforced via iptables rules inside `cell-wireguard` and CoreDNS ACL blocks.
- The Docker socket is mounted into `cell-api` for container management — treat network access to port 3000 as privileged.
- `ip_range` must be an RFC-1918 CIDR (10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16). The API and UI both validate this.
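
That last check can be sketched with the standard-library `ipaddress` module (an illustration of the rule, not the API's exact code):

```python
import ipaddress

# The three RFC-1918 blocks listed above.
PRIVATE_BLOCKS = [ipaddress.ip_network(b) for b in
                  ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def valid_ip_range(value: str) -> bool:
    """True only for a well-formed IPv4 CIDR inside an RFC-1918 block."""
    try:
        net = ipaddress.ip_network(value, strict=True)
    except ValueError:
        return False                  # not a well-formed CIDR
    if net.version != 4:
        return False                  # IPv6 is out of scope here
    return any(net.subnet_of(block) for block in PRIVATE_BLOCKS)

print(valid_ip_range("10.10.0.0/24"))  # True
print(valid_ip_range("8.8.8.0/24"))    # False
```
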
---
## License
MIT — see [LICENSE](LICENSE).