Commit Graph

140 Commits

Author SHA1 Message Date
roof 4ba79fd614 Fix Phase 1 permission sync: route push via cell-wireguard + DNAT receive
cell-api has no route to remote WG tunnel IPs — only cell-wireguard does.
Fix _push_permissions_to_remote() to use 'docker exec cell-wireguard curl'
so outbound sync HTTP traverses the WG tunnel from the right namespace.

On the receive side, add ensure_cell_api_dnat() which installs three
iptables rules inside cell-wireguard on startup:
  - PREROUTING DNAT: wg0:3000 → cell-api:3000 (Docker bridge IP)
  - POSTROUTING MASQUERADE: so cell-api's reply routes back via wg0
  - FORWARD ACCEPT: allow the wg0→eth0 forwarded traffic

Called from _apply_startup_enforcement() so rules survive container restarts.
Tests updated to mock subprocess.run instead of urllib.request.urlopen.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 13:48:49 -04:00
roof a3d0cd5a48 feat(cells): Phase 1 — permission sync between connected PICs
When PIC A updates service sharing permissions, it immediately pushes
the mirrored state to PIC B over the WireGuard tunnel so B's UI shows
what A is sharing with it in real time.

Architecture:
- Push model: update_permissions() → _push_permissions_to_remote() →
  POST /api/cells/peer-sync/permissions on remote cell
- Auth: source IP must be inside a known cell's vpn_subnet (WireGuard
  tunnel proves identity) + body's from_public_key must match stored key
- Mirror semantics: our inbound (what we share) → their outbound view
- Non-fatal: push failures set pending_push=True; replay_pending_pushes()
  retries at startup so offline cells catch up on reconnect
- add_connection() also pushes initial state so remote sees permissions
  immediately on the first connect

New fields on cell_links.json records (lazy-migrated):
  remote_api_url, last_push_status, last_push_at, last_push_error,
  pending_push, last_remote_update_at

New endpoint: POST /api/cells/peer-sync/permissions

30 new tests (1101 total).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 13:12:30 -04:00
roof 37d023659a fix(ui): parse getPeerStatuses dict response correctly in CellNetwork
The /api/wireguard/peers/statuses endpoint returns {pubkey: {online,...}}
not {peers: [{public_key,...}]}. The status mapping loop was always
producing an empty statusByKey, making every connected cell show Offline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 12:25:12 -04:00
roof 29390f064a fix(scripts): api code lives at /app/api/ inside container, not /app/
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 11:50:56 -04:00
roof a8a1de1cba fix(scripts): detect container vs host layout reliably in reset_admin_password
Replace heuristic directory scan with explicit container detection:
/app/scripts path means container, script sibling to api/ means host.
Prevents accidental /app/data/api misdetection when that dir exists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 11:49:19 -04:00
roof 56d677e925 fix: copy button HTTP fallback, reset-admin-password in Docker, scripts volume
- CellNetwork.jsx CopyButton: use execCommand fallback when clipboard API
  is unavailable (HTTP non-localhost context)
- Makefile reset-admin-password: run inside cell-api container via docker exec
  so bcrypt and all deps are available without host installation
- docker-compose.yml: mount ./scripts:/app/scripts:ro in cell-api so the
  reset script is accessible inside the container
- scripts/reset_admin_password.py: auto-detect API module path and data dir
  so the script works in both host (api/ sibling) and container (/app) layouts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 11:34:38 -04:00
roof e8b3288a41 Merge feature/security-fixes-and-qa into main
Includes 6 commits:
- PIC-to-PIC cell connection fixes (subnet overlap, WG reload, FORWARD rules, DNS)
- Service-sharing permissions backend (cell_links.json, /api/cells/permissions)
- Flask blueprint refactor (app.py -1735 lines extracted)
- Peer add rollback + manager extraction (P2 arch debt)
- VPN key sync on startup, DNS zone hostname/SOA fix, peer status UI
- 50 new tests (1071 total) + CellNetwork.jsx UI for service-sharing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 11:09:34 -04:00
roof 562d866a65 feat(cells): Phase 3 tests + Phase 4 UI for cell service-sharing
Phase 3 — tests (50 new, total now 1071):
- test_cell_link_manager: atomicity (WG fail → DNS not called, link not
  persisted), DNS warning non-fatal, inbound_services arg, unknown service
  filtered, update/get permissions, lazy migration of legacy entries
- test_wireguard_manager: subnet overlap rejection (exact, supernet, adjacent
  non-overlapping, different class-A, honours wg0.conf configured network)
- test_firewall_manager: _cell_tag sanitisation, apply_cell_rules emits correct
  ACCEPT/DROP per service + catch-all DROP, clear_cell_rules no-op and exact
  line removal, apply_all_cell_rules iterates with correct args
- test_cells_endpoints: RuntimeError→400, GET /services, GET/PUT permissions
  (200/400/404 paths, service name validation, arg forwarding)

Phase 4 — UI:
- CellNetwork.jsx: replace flat cell list with CellPanel expandable cards;
  add ServiceShareToggle (ARIA switch, saves immediately), InboundServiceBadge
  (read-only), DisconnectConfirmModal (replaces window.confirm); relative
  timestamps; paste validation on blur; WireGuard status merged by public_key
- api.js: add cellLinkAPI.getPermissions, updatePermissions, getServices

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 08:45:32 -04:00
roof 0b103ffafb feat(cells): fix PIC-to-PIC connection + add service-sharing permissions
Phase 1 — connection fixes:
- routing_manager.stop(): remove iptables -F / -t nat -F nuclear flush that
  would wipe WireGuard MASQUERADE and all peer rules on any UI stop action
- wireguard_manager.add_cell_peer(): reject vpn_subnet that overlaps the local
  WG network (routing blackhole — was the root cause of no handshake)
- wireguard_manager._syncconf(): pass Endpoint to 'wg set' so cell peers with
  static endpoints are synced to the kernel (not just AllowedIPs)

Phase 2 — service-sharing permissions backend:
- firewall_manager: add _cell_tag(), clear_cell_rules(), apply_cell_rules(),
  apply_all_cell_rules() — iptables FORWARD rules for cell-to-cell traffic
  using 'pic-cell-<name>' comment tags, distinct from 'pic-peer-*'
- app.py startup enforcement: call apply_all_cell_rules(cell_links) so rules
  survive API restarts
- cell_link_manager: permissions schema {inbound, outbound} per service;
  lazy migration for existing entries; update_permissions(), get_permissions();
  apply_cell_rules wired into add_connection/remove_connection
- routes/cells.py: GET /api/cells/services, GET+PUT /api/cells/<n>/permissions;
  RuntimeError now returns 400 (not 500) from add_connection

Removed broken 'test' cell (subnet 10.0.0.0/24 collided with local WG network).
Second PIC must use a distinct subnet (e.g. 10.0.1.0/24) before reconnecting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 08:35:24 -04:00
roof f3118ff401 fix(vpn): sync WireGuard server key on startup; fix DNS zone cell_name/SOA; fix peer status UI
- API key store was out of sync with wg0.conf: get_keys() generated a random
  phantom key instead of reading the actual WireGuard server key, so all peer
  configs had the wrong PublicKey and could never handshake. Fixed by writing
  correct raw-bytes key files at deploy time and adding _sync_wg_keys() to API
  startup so the store auto-syncs from wg0.conf on every restart.

- apply_domain() fell back silently when zone file had no $ORIGIN directive;
  now also parses the SOA MNAME as the old-domain fallback.

- apply_cell_name() only replaced the hostname if old_name matched literally
  in the zone file; now auto-detects the actual hostname (non-service A record)
  so a stale zone (mycell vs dev) is corrected on next config apply.

- DNS zone file corrected: SOA pic.ngo. admin.pic.ngo., mycell → dev.

- WireGuard UI: add 30s auto-poll for peer statuses; fix "peers currently
  connected" counter to show online/total instead of total count.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 08:05:45 -04:00
roof 5d0238ff3c A5: Extract config routes into blueprint (app.py 1294 → 579 lines)
Move all /api/config/* routes and pending-restart helpers into
routes/config.py. Re-export helpers from app.py for backward compat:

  from routes.config import _set_pending_restart, _clear_pending_restart,
                           _collect_service_ports, _dedup_changes

Test patches updated:
  app._set_pending_restart     → routes.config._set_pending_restart
  app._clear_pending_restart   → routes.config._clear_pending_restart
  app.threading.Thread         → routes.config.threading.Thread

Remaining in app.py: Flask setup, middleware, health monitor thread,
/health, /api/status, /api/health/history* (use module-level state).

1021 tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 06:53:24 -04:00
roof 09138fbc18 A5: Extract all route groups into Flask blueprints (app.py -1735 lines)
Extract 9 route groups out of app.py into routes/ blueprints:
- routes/network.py  — DNS, DHCP, NTP, network info/test (10 routes)
- routes/wireguard.py — WireGuard keys, peers, config, enforcement (18 routes)
- routes/cells.py    — cell-to-cell connections (5 routes)
- routes/peers.py    — peer CRUD + IP update + _next_peer_ip helper (10 routes)
- routes/routing.py  — NAT, peer routes, firewall, iptables (17 routes)
- routes/vault.py    — certs, trust, secrets (19 routes)
- routes/containers.py — containers, images, volumes (14 routes)
- routes/services.py — service bus, logs, services status/connectivity (18 routes)
- routes/peer_dashboard.py — peer-scoped dashboard/services (2 routes)

All blueprints use lazy `from app import X` inside route bodies to preserve
test patch compatibility (patch('app.email_manager', mock) still works).

Also included in this commit:
- A1 fix: backup/restore now includes email/calendar user files
- A2 fix: apply_config sets applying=True flag via helper container
- A3 fix: add_peer rolls back firewall on DNS failure

app.py reduced: 3011 → 1294 lines. 1021 tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 06:11:21 -04:00
roof d54844cd44 fix(P2): peer add rollback, helper failure recovery, manager extraction (A2/A3/A5)
A3 — Peer add atomicity: track firewall_applied flag and call
clear_peer_rules() during rollback so partial peer-add failures
don't leave stale iptables rules behind. Added test.

A2 — Pending config flag: instead of clearing before spawning the
helper container (fire-and-forget), set applying=True and let the
helper clear it on success by writing to cell_config.json via a
mounted /app/data volume. On API restart after a failed apply,
_recover_pending_apply() resets the applying flag so the UI shows
pending changes and the user can retry. GET /api/config/pending now
includes the applying field.

A5 (foundation) — Extract all manager instantiation into managers.py.
app.py re-exports every name so existing test patches (patch('app.X'))
continue to work unchanged. 1021 unit tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 05:27:39 -04:00
roof 2455fe189e fix: apply_cell_name regex now matches zone files with TTL field
_generate_zone_content writes records as "name TTL IN A value" but the
regex only matched "name IN A value" (no TTL), so renaming the cell
never updated the DNS hostname record. Updated regex to make TTL optional.
Also fixed the unit test zone fixture to use the actual generated format.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 09:32:51 -04:00
roof 10eac1fda1 fix: make update stashes runtime config before pull to avoid merge conflicts
Runtime config files exist on disk but are now gitignored. A bare git pull
conflicts with them. Stash (including untracked) before pulling and pop after.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 09:10:00 -04:00
roof caadcaf5c9 fix: untrack runtime config files and add them to .gitignore
These files are machine-specific and generated at runtime — pulling on any
other machine caused unmerged file conflicts. Remove from index (files kept
on disk) and add explicit gitignore rules.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 09:07:27 -04:00
roof ede01b316e fix: untrack runtime config files and add them to .gitignore
These files are machine-specific and generated at runtime — they should
never have been committed. Remove from index (files kept on disk) and
add explicit gitignore rules to prevent future re-adds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 09:02:22 -04:00
roof fcb338b659 merge: feature/security-fixes-and-qa — security audit fixes, CSRF, test coverage
Merges 7 commits covering:
- P0/P1/P2/P3 audit remediations (CSRF, restart_service, dual config sync, peer atomicity, DNS preservation, trust boundary)
- 1020 passing tests + 8 new test files
- CSRF regression fixes: grace period for existing sessions, GET endpoints for check-port/refresh-ip, native fetch CSRF headers in WireGuard.jsx and Peers.jsx

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:58:54 -04:00
roof 9aaacd11cc fix: CSRF regression — grace period for old sessions, GET check-port/refresh-ip, Peers.jsx native fetch tokens
- check_csrf() now issues a token for sessions that predate CSRF (existing logins) instead of blocking them
- /api/wireguard/check-port and /api/wireguard/refresh-ip accept GET so native fetch calls bypass the token requirement
- WireGuard.jsx: changed three native fetch POST → GET for the above endpoints
- Peers.jsx: add X-CSRF-Token header to three native fetch mutation calls (calendar collection, peer PUT, clear-reinstall)
- api.js: export getCsrfToken() so non-Axios callers can read the current token

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 12:18:02 -04:00
roof a43f9fbf0d fix: full security audit remediation — P0/P1/P2/P3 fixes + 1020 passing tests
P0 — Broken functionality:
- Fix 12+ endpoints with wrong manager method signatures (email/calendar/file/routing)
- Fix email_manager.delete_email_user() missing domain arg
- Fix cell-link DNS forwarding wiped on every peer change (generate_corefile now
  accepts cell_links param; add/remove_cell_dns_forward no longer clobber the file)
- Fix Flask SECRET_KEY regenerating on every restart (persisted to DATA_DIR)
- Fix _next_peer_ip exhaustion returning 500 instead of 409
- Fix ConfigManager Caddyfile path (/app/config-caddy/)
- Fix UI double-add and wrong-key peer bugs in Peers.jsx / WireGuard.jsx
- Remove hardcoded credentials from Dashboard.jsx

P1 — Security:
- CSRF token validation on all POST/PUT/DELETE/PATCH to /api/* (double-submit pattern)
- enforce_auth: 503 only when users file readable but empty; never bypass on IOError
- WireGuard add_cell_peer: validate pubkey, name, endpoint against strict regexes
- DNS add_cell_dns_forward: validate IP and domain; reject injection chars
- DNS zone write: realpath containment + record content validation
- iptables comment /32 suffix prevents substring match deleting wrong peer rules
- is_local_request() trusts only loopback + 172.16.0.0/12 (Docker bridge)
- POST /api/containers: volume allow-list prevents arbitrary host mounts
- file_manager: bcrypt ($2b→$2y) for WebDAV; realpath containment in delete_user
- email/calendar: stop persisting plaintext passwords in user records
- routing_manager: validate IPs, networks, and interface names
- peer_registry: write peers.json at mode 0o600
- vault_manager: Fernet key file at mode 0o600
- CORS: lock down to explicit origin list
- domain/cell_name validation: reject newline, brace, semicolon injection chars

P2 — Architecture:
- Peer add: rollback registry entry if firewall rules fail post-add
- restart_service(): base class now calls _restart_container(); email and calendar
  managers call cell-mail / cell-radicale respectively
- email/calendar managers sync user list (no passwords) to cell_config.json
- Pending-restart flag cleared only after helper subprocess exits with code 0
- docker-compose.yml: add config-caddy volume to API container

P3 — Tests (854 → 1020):
- Fill test_email_endpoints.py, test_calendar_endpoints.py,
  test_network_endpoints.py, test_routing_endpoints.py
- New: test_peer_management_update.py, test_peer_management_edge_cases.py,
  test_input_validation.py, test_enforce_auth_configured.py,
  test_cell_link_dns.py, test_logs_endpoints.py, test_cells_endpoints.py,
  test_is_local_request_per_endpoint.py, test_caddy_routing.py
- E2E conftest: skip WireGuard suite when wg-quick absent
- Update existing tests to match fixed signatures and comment formats

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 11:30:21 -04:00
roof 0c12e3fc97 fix: change domain from dev to lan to avoid browser HSTS preload blocking HTTP
The .dev TLD has been HSTS preloaded in Chrome/Firefox/Safari/Edge since 2019.
Browsers silently redirect http://anything.dev to https://anything.dev before
making any network request. Since Caddy has auto_https off, all browser-based
access to .dev domains fails with a connection error even though DNS, routing,
and HTTP all work correctly (curl works; browsers don't).

- cell_config.json: domain "dev" -> "lan"
- Caddyfile: all http://*.dev blocks -> http://*.lan
- Corefile: dev zone -> lan zone (file /data/lan.zone)
- data/dns/lan.zone: new zone file (dev.zone removed live)
- test_wg_domain_access.py: remove hardcoded DOMAIN_IPS / .dev references;
  read domain from /api/config at runtime so tests work with any configured TLD

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 01:54:33 -04:00
roof 32272420cb test: add E2E coverage for peer dashboard/services, DNS records, and WG domain access
- test_peer_dashboard_services.py (63 tests): unit tests for all API fixes
  * peer_dashboard field names (name/transfer_rx/transfer_tx vs old stale names)
  * peer_dashboard service_urls dict with correct domain-keyed URLs
  * peer_services email structure (nested smtp/imap, address not username)
  * peer_services files key (not webdav), caldav URL (calendar.dev not radicale.dev:5232)
  * peer_services wireguard DNS (not 10.0.0.1), config text with DNS line
  * DNS zone records (api/webui → Caddy, VIPs for calendar/files/mail/webdav)
  * Caddyfile generation (all service blocks including webui.dev)
  * Access control (401 anon, 403 admin on peer-only routes, 404 missing peer)
- e2e/api/test_peer_endpoints.py: fix stale field assertions, add structure checks
- e2e/wg/test_wg_domain_access.py: E2E WG tests for DNS resolution via VPN tunnel
  * All *.dev domains resolve to correct IPs via CoreDNS
  * api.dev/webui.dev must resolve to Caddy, not container direct IPs
  * CoreDNS reachability through VPN tunnel
  * Peer config DNS field correctness
- e2e/ui/test_peer_dashboard.py: UI checks for service icon links, CalDAV URL, email

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 17:41:21 -04:00
roof 3690c6d955 fix: correct DNS records, peer dashboard field names, and services API response
- network_manager: api/webui DNS records now point to Caddy (172.20.0.2)
  instead of their container IPs so Caddy can reverse-proxy correctly
- ip_utils: add webui.dev block to generated Caddyfile
- config/caddy/Caddyfile: regenerated with webui.dev block
- config/dns/Corefile: simplify to single forward zone (remove duplicate)
- app.py peer_dashboard: rename peer_name→name, rx_bytes→transfer_rx,
  tx_bytes→transfer_tx to match PeerDashboard.jsx; add service_urls dict
- app.py peer_services: fix DNS (10.0.0.1→real CoreDNS IP), CalDAV URL
  (radicale.dev:5232→calendar.dev), email structure (flat→nested smtp/imap
  objects), rename webdav→files, add WireGuard config text, add username field
- PeerDashboard.jsx: render service icon links from service_urls

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 17:11:21 -04:00
roof e5d59fd94d fix: sync API key-store from wg0.conf to prevent WireGuard handshake failure
linuxserver/wireguard auto-generates its own PrivateKey on first container
start, independently of the PIC API's key-store.  When the two diverge, the
API generates peer configs with the wrong server public key and the WireGuard
handshake fails silently — the client can ping the VPN subnet (10.0.0.x) but
gets no internet and cannot reach any Docker service (172.20.0.x).

Adds _sync_keys_from_conf(): called at the top of apply_config(), reads the
PrivateKey from wg0.conf, derives the matching public key, and overwrites the
API key files (private.key / public.key) if they differ.  This makes wg0.conf
the authoritative source for the server identity, keeping get_peer_config()
consistent with the live WireGuard interface.

Adds 5 new tests in TestSyncKeysFromConf covering:
- key-store update when conf key differs
- no-op when keys already match
- get_peer_config() uses the synced key
- no raise when conf is missing
- apply_config() passes the synced key through bootstrap

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 16:40:21 -04:00
roof 9418c3da5b feat: restore WireGuard peers after bootstrap and add VPN routing tests
apply_config() now calls _load_registered_peers() when wg0.conf is empty
so all active peers from peers.json are written back into the config file
after a bootstrap — preventing clients from losing tunnel access after
an API restart that regenerated wg0.conf from scratch.

Adds test_wireguard_vpn_routing.py (36 tests) covering:
- generate_config() PostUp/PostDown rules enabling internet forwarding
  (MASQUERADE + FORWARD ACCEPT required for internet-through-VPN)
- get_peer_config() DNS field pointing to cell-dns for domain resolution
- apply_config() bootstrap peer restoration from peers.json
- _load_registered_peers() filtering (inactive, missing fields, malformed)
- add_peer() /32 AllowedIPs enforcement to prevent route leaks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 15:33:57 -04:00
roof 78706d685f merge: feature/security-fixes-and-qa — WireGuard fixes, test infrastructure, port propagation
Merges 5 commits from feature/security-fixes-and-qa:

- WireGuard peer sync, privileged mode, E2E and integration test correctness
- e2e/integration test infrastructure and Makefile test targets
- wireguard_port identity change and check_port_open verification
- apply_config bootstraps wg0.conf when file is empty
- Port changes now propagate to containers via env file in-place writes
  (root cause: write_env_file used os.replace which changes inode; Docker
  file bind-mounts track the original inode, so containers never saw
  port changes; fixed by in-place write + --force-recreate on apply)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 15:06:17 -04:00
roof 580d8af7ae fix: port changes now propagate to containers via env file in-place writes
Root cause: write_env_file used os.replace() which creates a new inode.
Docker file bind-mounts track the original inode at mount time, so the
container's /app/.env.compose never saw updates — docker compose always
read the stale port value and skipped container recreation.

Fixes:
- ip_utils.write_env_file: write in-place (open 'w') instead of os.replace()
  so Docker bind-mounted files see the update immediately
- apply_pending_config: add --force-recreate to docker compose up for
  specific-container restarts, bypassing config-hash comparison as a
  belt-and-suspenders measure

Tests added:
- TestWriteEnvFileInPlace: verifies inode is preserved across writes
- TestApplyPendingConfigForceRecreate: verifies --force-recreate is in the
  docker compose command for specific-container restarts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 15:00:43 -04:00
roof 729c401c33 fix: apply_config bootstraps wg0.conf when file is empty
If wg0.conf exists but is empty or has no [Interface] section,
apply_config previously found no lines to update and silently
returned with no changes — leaving the container broken on next
restart with an empty config.

Fix: detect empty/missing [Interface] section and regenerate the
full config from generate_config() before applying field updates.

This was the root cause of port changes not propagating:
apply_config was called but found nothing to patch in an empty file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 09:25:02 -04:00
roof de5ff75a2e fix: wireguard_port identity change and check_port_open verification
Bug 1 — port not propagated to wg0.conf:
  The identity update path (wireguard_port via PUT /api/config) was calling
  wireguard_manager.update_config() which only saves to a JSON file via
  BaseServiceManager. wg0.conf was never updated, so after a container
  restart the WireGuard interface would still listen on the old port.
  Fix: call apply_config() instead — it writes ListenPort into wg0.conf.

Bug 2 — check_port_open ignored configured port:
  check_port_open() checked for 'listening port' in wg show output but
  never compared it against the configured port. A port-mismatch (e.g.
  after config change but before restart) would return True — misleading.
  Fix: require 'listening port: {configured_port}' to match exactly.

Tests added:
  - test_check_port_open_wrong_port_returns_false
  - test_check_port_open_explicit_port_matches
  - test_check_port_open_explicit_port_mismatch
  - test_wireguard_port_identity_change_calls_apply_config
  - test_wireguard_port_same_value_does_not_call_apply_config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 08:41:22 -04:00
roof 9677755b4f fix: e2e/integration test infrastructure and Makefile test targets
- Fix make test: was pointing to non-existent api/tests/, now runs unit tests
  correctly with --ignore=e2e --ignore=integration
- Remove dead phase test targets (test-phase1..4, test-all-phases) that all
  referenced cd api && pytest tests/ (non-existent path)
- Add .test_admin_pass file: reset_admin_password.py now writes a persistent
  test password file alongside .admin_initial_password; the API never deletes
  it (unlike .admin_initial_password which is consumed on first startup)
- Update both integration/conftest.py and e2e/helpers/admin_password.py to
  read .test_admin_pass before .admin_initial_password — so tests work after
  make restart without needing PIC_ADMIN_PASS env var
- Add AI collaboration rules to CLAUDE.md (auto-loaded every session)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 08:27:27 -04:00
roof 420dced9ff fix: WireGuard peer sync, privileged mode, E2E and integration test correctness
- api/app.py: sync WireGuard server config on peer add/remove (non-fatal)
- docker-compose.yml: add privileged:true to wireguard service
- E2E tests: fix logout selector, DNS IP lookup, wg config DNS line, VIP skip guards,
  badge text selectors, heading .first, async logout wait
- Integration tests: fix 4 tests that sent unauthenticated requests expecting 400
  (now use authenticated session helpers); accept 401 as valid in webui proxy test;
  add password field to service_access validation test
- Remove stale tracked config templates (config/api/api/*, config/api/cell.env, etc.)
  that no longer exist on disk after config layout was reorganised

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 06:04:40 -04:00
roof 31a7951ffd fix: 4 issues — admin password sudo, peer modal, WireGuard fetch creds, port check
1. make reset/show-admin-password: use sudo so data/api/ owned-by-root
   files are writable without explicit sudo prefix

2. Peers.jsx: remove one-time password modal on peer creation — admin
   already knows the password they typed; replace with a success toast
   showing peer name and provisioned accounts

3. WireGuard.jsx + Peers.jsx: add credentials:'include' to every raw
   fetch() call (7 calls across two files, plus fix one hardcoded
   localhost:3000 URL); the port check and peer status calls were
   returning 401 because they didn't send the session cookie

4. test_admin_wireguard.py: update test to match new toast flow (no modal),
   add Scenario 10 test that verifies the port check badge renders on the
   WireGuard page after the credentials fix

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 03:33:11 -04:00
roof ec9ceec7a7 feat: add show-admin-password and reset-admin-password make targets
make show-admin-password — prints the admin password from the initial
  setup file if the API hasn't consumed it yet; otherwise prompts to reset
make reset-admin-password — generates a strong random password, updates
  auth_users.json directly, writes it back to the setup file, and prints
  it prominently so it's easy to copy

Also enhances reset_admin_password.py with --show, --generate flags and
a clear banner output, and adds both targets to make help.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 03:17:38 -04:00
roof 7d2979b8af fix: integration and E2E test correctness after auth enforcement
config_manager: make per-file copy errors non-fatal during restore
  (resolves test failures when /app/config/* is not writable by test runner)
test_live_api.py: fix NameError (_req.Session not requests.Session)
test_negative_scenarios.py: replace raw requests.* with authenticated _S.*
  (all endpoints now require auth; unauthenticated calls return 401)
wg/conftest.py: fix wg_server_info — public key is at /api/wireguard/keys
test_admin_navigation.py, test_peer_acl.py: add .first to ambiguous locators
  to avoid Playwright strict-mode errors when desktop+mobile nav both mount

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 18:14:38 -04:00
roof 828dc8cb8f fix: Makefile test targets for Debian system Python and sudo
- All test targets now use python3 -m pytest (not bare pytest)
- test-e2e-deps uses sudo pip3 --break-system-packages and
  sudo python3 -m playwright install (Debian externally-managed env)
- test-e2e-wg uses sudo -E python3 -m pytest (preserves PATH/env)
- reset-test-admin-pass uses make ifndef guard instead of shell ?: expansion
- Remove stale -m markers from test targets (filters were redundant)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 17:42:32 -04:00
roof a98e095e10 fix: enrich peer dashboard and services API endpoints
/api/peer/dashboard now returns live WireGuard stats (online, rx_bytes,
tx_bytes, last_handshake, allowed_ips) by calling wireguard_manager.
/api/peer/services now returns a structured dict with wireguard, email,
caldav, webdav sections containing hostnames and credentials.
Fixes 2 failing E2E API tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:49:10 -04:00
roof 0d32038150 feat: add comprehensive E2E test suite (Playwright + WireGuard + API)
Adds tests/e2e/ with three layers of E2E coverage:
- API layer (tests/e2e/api/): unauthenticated access, admin endpoints,
  peer endpoints, access control enforcement — 24 tests
- Playwright UI (tests/e2e/ui/): login flows, admin navigation, peer
  dashboard/services, role-based ACL, password change — 60+ tests
- WireGuard connectivity (tests/e2e/wg/): tunnel up/down, DNS resolution
  through VPN, service ACL enforcement via iptables, full-tunnel routing
Shared helpers: PicAPIClient, WGInterface, playwright_login, cleanup.
Makefile targets: test-e2e-api, test-e2e-ui, test-e2e-wg, test-e2e.
Adds scripts/reset_admin_password.py for test bootstrap.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:41:13 -04:00
roof 1e81b3b618 Fix webui port binding: restore public access on 8081
The devops security pass incorrectly bound the webui to 127.0.0.1,
making it unreachable from the network. The webui is the user-facing
interface and must be publicly accessible. Internal-only services
(api :3000, radicale :5232, webdav :8080, rainloop :8888,
filegator :8082) retain their loopback bindings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:10:49 -04:00
roof fc3cfc9741 Fix post-deploy auth issues: best-effort service provisioning, integration test auth, test mock corrections
- api/app.py: email/calendar/files provisioning now best-effort (non-fatal); fixed email_manager.create_email_user call to include domain argument
- tests/integration: added module-level auth sessions to all integration test files; added admin auth to api fixture and _resolve_admin_pass() helper; added TEST_PEER_PASSWORD constant; added password to peer creation calls
- tests/test_peer_provisioning.py: renamed rollback test to reflect new best-effort semantics (email failure no longer causes rollback)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 15:42:03 -04:00
Administrator 975d05eef3 Merge branch 'feature/security-fixes-and-qa' into 'main'
feat: add authentication and authorization system

See merge request root/pic!10
2026-04-25 19:10:17 +00:00
roof 8650704316 feat: add authentication and authorization system
Backend:
- AuthManager (api/auth_manager.py): server-side user store with bcrypt
  password hashing, account lockout after 5 failed attempts (15 min),
  and atomic file writes
- AuthRoutes (api/auth_routes.py): Blueprint at /api/auth/* — login,
  logout, me, change-password, admin reset-password, list-users
- app.py: register auth_bp blueprint; add enforce_auth before_request
  hook (401 for unauthenticated, 403 for wrong role; only active when
  auth store has users so pre-auth tests remain green); instantiate
  AuthManager; update POST /api/peers to require password >= 10 chars
  and auto-provision email + calendar + files + auth accounts with full
  rollback on any failure; extend DELETE /api/peers to tear down all
  four service accounts; add /api/peer/dashboard and /api/peer/services
  peer-scoped routes; fix is_local_request to also trust the last
  X-Forwarded-For entry appended by the reverse proxy (Caddy)
- Role-based access: admin for /api/* (except /api/auth/* which is
  public and /api/peer/* which is peer-only)
- setup_cell.py: generate and print initial admin password, store in
  .admin_initial_password with 0600 permissions; cleaned up on first
  admin login

Frontend:
- AuthContext.jsx: React context with login/logout/me state and Axios
  interceptor for automatic 401 redirect
- PrivateRoute.jsx: route guard component
- Login.jsx: login page with error handling and must-change-password
  redirect
- AccountSettings.jsx: change-password form for any authenticated user
- PeerDashboard.jsx: peer-role landing page (IP, service list)
- MyServices.jsx: peer service links page
- App.jsx, Sidebar.jsx: AuthContext integration, logout button,
  PrivateRoute wrappers, peer-role routing
- Peers.jsx, WireGuard.jsx, api.js: auth-aware API calls

Tests: 100 new auth tests all pass (test_auth_manager, test_auth_routes,
test_route_protection, test_peer_provisioning). Fix pre-existing test
failures: update WireGuard test keys to valid 44-char base64 format
(test_wireguard_manager, test_peer_wg_integration), add password field
and service manager mocks to test_api_endpoints peer tests, add auth
helpers to conftest.py. Full suite: 845 passed, 0 failures.

Fixed: .admin_initial_password security cleanup on bootstrap, username
minimum length (3 chars enforced by USERNAME_RE regex)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 15:00:06 -04:00
Administrator 693262409c Merge branch 'feature/security-fixes-and-qa' into 'main'
add security fixes, port hardening, and expanded QA coverage

See merge request root/pic!9
2026-04-25 17:55:56 +00:00
roof a338836bb8 add security fixes, port hardening, and expanded QA coverage
Security fixes:
- Replace debug=True with env-driven FLASK_DEBUG in app.py
- Add _safe_path helper and path-traversal protection to all 6 file routes
  in file_manager.py
- Add peer_name regex and input validation (public_key, name, endpoint_ip)
  in wireguard_manager.py
- Stop returning private key from GET /api/wireguard/keys; return only
  public_key + has_private_key boolean
- Fix is_local_request() XFF bypass by checking remote_addr only, ignoring
  X-Forwarded-For
- Remove duplicate get_all_configs / get_config_summary methods from
  config_manager.py

DevOps:
- Bind 6 internal service ports to 127.0.0.1 in docker-compose.yml
  (radicale, webdav, api, webui, rainloop, filegator)
- Move WebDAV credentials to env vars (WEBDAV_USER, WEBDAV_PASS)
- Pin flask, flask-cors, requests, cryptography, docker to secure minimum
  versions in requirements.txt

QA (560 tests, 0 failures):
- tests/test_wireguard_endpoints.py: 18 new endpoint tests
- tests/test_file_endpoints.py: 24 new endpoint tests incl. path traversal
- tests/test_container_manager.py: expanded from 2 to 30 tests
- tests/test_config_backup_restore_http.py: 25 new tests (new file)
- tests/test_config_apply.py: 9 new tests (new file)

Docs:
- Rewrite README.md with accurate architecture, ports, env vars, security notes
- Rewrite QUICKSTART.md with verified commands

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 13:08:24 -04:00
roof eb817ffdc5 fix: WireGuard sysctl || true, port check on page load, add peer status tests
Root cause: sysctl -q net.ipv4.conf.all.rp_filter=0 in PostUp exited non-zero
inside the linuxserver/wireguard container (no permission), causing wg-quick to
tear down the wg0 interface — breaking peer status, port check, and internet
access through full tunnel.

- wireguard_manager.py: add || true to both sysctl PostUp/PostDown lines
- docker-compose.yml: add net.ipv4.conf.all.rp_filter=0 to wireguard sysctls
- WireGuard.jsx: kick off port check asynchronously on page load (was refresh-only)
- tests: add TestWireGuardSysctlAndPortCheck — 14 new tests covering sysctl
  content, check_port_open (interface up / down / fallback-to-handshake),
  get_peer_status (online / offline / not-found / no-handshake), and
  get_all_peer_statuses (multi-peer / empty / skips interface line)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 10:31:57 -04:00
roof 4b994a5964 feat: domain validation for NTP servers and mail domain fields
- isValidDomain / isValidDomainOrIp helpers (RFC-compliant label regex)
- network.ntp_servers: each entry validated as hostname or IP; invalid entry shown in error message
- email.domain: validated as a proper domain name; blocks autosave until fixed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 09:39:59 -04:00
roof 15e009bd94 feat: fix export/import, add backup download/upload, restore service checkboxes
- export_config: clean output (no internal _keys), identity exposed as 'identity'
- import_config: handle 'identity' key, merge into existing config (not replace)
- restore_config: accept optional services list for selective restore
- backup_config: include 'identity' in manifest services list
- new GET /api/config/backups/<id>/download → zip file download
- new POST /api/config/backup/upload → zip file upload
- webui: Download + Upload buttons, restore modal with per-service checkboxes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 08:51:40 -04:00
roof 2bd6545f0e fix: silent autosave, pending dedup, domain/cell_name pending, containers access
- Settings: remove Save buttons; autosave is silent (no toast on success, error only)
- Settings: loadAll() resets dirty flags to prevent stale autosave after discard
- app.py: fix domain/ip_range "actually changed" check — full identity is always
  sent on save so these were triggering pending on every keystroke regardless
- app.py: _dedup_changes handles port-change format "service field: old → new"
  (split on ':' not ' changed') so dns_port changed twice shows one entry
- app.py: domain + cell_name changes now go through pending restart banner;
  apply_domain/apply_cell_name write files immediately (reload=False) and set
  pending; Discard restores zone files + Caddyfile to pre-change state
- app.py: _set_pending_restart captures pre-change snapshot BEFORE config writes
  (was snapshotting after, making Discard a no-op)
- app.py: is_local_request reads /proc/net/route to allow the actual Docker
  bridge subnet (172.0.0.0/24) which is not RFC-1918; fixes Containers page 403
- container_manager: get_container_logs raises instead of swallowing exceptions
  so nonexistent container returns 500+error not 200+empty

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 07:16:13 -04:00
roof 4215e03ac6 fix: autosave, cell name overflow, length validation, apply-and-verify tests
Autosave on Apply (was broken):
- App.jsx called useDraftConfig() in the same component that rendered
  DraftConfigProvider — a component cannot consume context it provides.
  Fixed by splitting into AppCore (consumes context, all logic) and App
  (thin shell that wraps AppCore in DraftConfigProvider).  The hook now
  runs inside the provider and hasDirty()/flushAll() work correctly.

Cell name / domain length validation (255-char DNS standard):
- api/app.py: reject cell_name or domain > 255 chars or empty with 400
- api/app.py: reject ip_range without CIDR prefix (bare IPs shift all VIPs)
- webui/src/pages/Settings.jsx: cellNameError + domainError computed values
  block saveIdentity and show inline error; maxLength={255} on inputs
- tests/test_identity_validation.py: 8 unit tests for the new validation

Cell name overflow on all pages:
- Dashboard.jsx: add min-w-0 to flex child div + truncate + title on cell_name
- CellNetwork.jsx: min-w-0 + truncate + title on cell_name, domain, endpoint,
  vpn_subnet in invite cards and connected-cells list

Apply-and-verify integration tests:
- tests/integration/test_apply_propagation.py: TestPendingState (no restarts)
  and TestApplyAndVerify (triggers real container restart + health poll)
  covering the full save → apply → wait → verify propagation lifecycle

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 05:29:09 -04:00
roof 3ce45a8911 fix: get_live_service_vips uses config API, require CIDR prefix for ip_range
- tests/integration/conftest.py: get_live_service_vips() now reads from the
  config API's service_ips field instead of docker exec.  The docker exec approach
  spawns a fresh Python process that imports firewall_manager with its hardcoded
  initial SERVICE_IPS, ignoring any update_service_ips() calls made at runtime.
  The config API always computes VIPs from the current ip_range, so it matches what
  the running app actually uses when writing iptables rules.

- api/app.py: reject ip_range values without a CIDR prefix (e.g. '10.0.0.1')
  with a 400.  Bare IPs are parsed as /32 by ipaddress.ip_network(strict=False),
  which shifts all VIP offsets and produces unusable Docker subnet configs.

- tests/integration/test_config_api.py: update bare-ip test to expect 400 now
  that the API enforces the prefix requirement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 04:54:47 -04:00
roof 768571f2b7 feat: port conflict validation, autosave on Apply, extended integration tests
Port conflict validation:
- api/port_registry.py: detect_conflicts() checks all service sections for shared port values
- api/app.py: returns HTTP 409 on port conflict after existing range validation
- webui/src/pages/Settings.jsx: JS-side detectPortConflicts() with useMemo shows inline
  conflict errors and blocks Save before the request is made; catch blocks surface server
  error messages (including 409) instead of generic fallbacks

Config autosave on Apply:
- webui/src/contexts/DraftConfigContext.jsx: new context; Settings registers flush callbacks
  per section; App calls flushAll() before applyPending() when any section is dirty
- webui/src/App.jsx: wraps tree with DraftConfigProvider, handleApply shows 'saving' banner
  state and awaits flushAll()
- webui/src/pages/Settings.jsx: registers identity + per-service flushers; propagates dirty
  state into context via setDirty; uses refs to avoid stale closures

Extended integration test coverage (114 new tests):
- tests/integration/test_config_api.py: GET/PUT config, export, import, backup lifecycle
- tests/integration/test_network_services.py: DNS records + DHCP reservations CRUD
- tests/integration/test_containers.py: list, restart, logs, stats; recovery polling
- tests/integration/test_negative_scenarios.py: error-path coverage for all endpoints
- tests/test_port_conflicts.py: 20 unit tests for port_registry.detect_conflicts()

Pre-commit hook updated to skip tests/integration/ (live-stack tests require a running
stack and must be run explicitly via `make test-integration`).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 04:45:47 -04:00