apply_cell_rules drops all traffic from a cell's subnet except specific
service ports. This also drops ICMP replies and TCP ACKs for connections
initiated by local peers to the connected cell, breaking cross-cell
routing (a ping to 10.0.0.1 was silently dropped by the 'test' cell's DROP rule).
Fix: ensure_forward_stateful() inserts a stateful ESTABLISHED,RELATED
ACCEPT rule at the top of FORWARD. It is called from apply_cell_rules (on
every cell add/update) and from _apply_startup_enforcement, and is idempotent.
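A minimal sketch of the idempotency pattern, assuming a plain subprocess wrapper around iptables (the function body is illustrative, not the actual implementation):

```python
import subprocess

def ensure_forward_stateful() -> None:
    """Idempotently insert a stateful ACCEPT at the top of FORWARD."""
    spec = ["FORWARD", "-m", "conntrack", "--ctstate", "ESTABLISHED,RELATED",
            "-j", "ACCEPT"]
    # `iptables -C` exits non-zero when the rule is absent, which is what
    # makes repeated calls (every cell add/update, startup) safe.
    if subprocess.run(["iptables", "-C", *spec], capture_output=True).returncode != 0:
        subprocess.run(["iptables", "-I", "FORWARD", "1", *spec[1:]], check=True)
```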
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three related fixes for split-tunnel peers that need to reach connected cells:
1. apply_peer_rules/apply_all_peer_rules now accept wg_subnet (actual local VPN
subnet) and cell_subnets (connected cells' vpn_subnets) parameters instead of
hardcoding 10.0.0.0/24. All callers (startup, add_peer, update_peer,
apply-enforcement endpoint) pass the real values.
2. Explicit ACCEPT rules are inserted in FORWARD for each connected cell's
subnet so split-tunnel peers (internet_access=False) can still reach
connected cells via the wg0→wg0 path (see the sketch after this list).
3. apply_ip_range in network_manager now loads cell_links.json and passes it
to generate_corefile(), fixing a race where the bootstrap DNS thread could
overwrite the Corefile and wipe cross-cell DNS forwarding zones on startup.
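A sketch of fix 2, assuming cell_subnets is a list of CIDR strings; rule placement in the real apply_peer_rules may differ:

```python
import subprocess

def allow_peer_to_cells(peer_ip: str, cell_subnets: list[str]) -> None:
    # Split-tunnel peers are denied internet access, but wg0 -> wg0 traffic
    # to a connected cell's VPN subnet must still be accepted explicitly.
    for subnet in cell_subnets:
        subprocess.run([
            "iptables", "-I", "FORWARD",
            "-i", "wg0", "-o", "wg0",
            "-s", f"{peer_ip}/32", "-d", subnet,
            "-j", "ACCEPT",
        ], check=True)
```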
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
**PIC UI always accessible (service_access=[])**
Remove the per-peer Caddy:80 ACCEPT/DROP rule from apply_peer_rules.
Service access was enforced at two layers (iptables DROP + CoreDNS ACL),
but the iptables layer also blocked the PIC web UI served through Caddy.
CoreDNS ACL alone is sufficient — DNS blocks service hostnames; the UI
path through Caddy remains reachable regardless of service_access value.
**Exit-relay internet routing (route_via another cell)**
update_peer_ip validated new_ip as a single ip_network, rejecting the
comma-separated '10.0.1.0/24, 0.0.0.0/0' string passed by
update_cell_peer_allowed_ips(add_default_route=True). The AllowedIPs line
in wg0.conf was therefore never updated, so WireGuard never routed internet
traffic through the exit cell's tunnel. Fix: validate each CIDR individually
and apply the change live via wg set, without a container restart.
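A sketch of the per-CIDR validation, assuming the peer is addressed by public key (names are illustrative):

```python
import ipaddress
import subprocess

def update_peer_allowed_ips(public_key: str, new_ip: str) -> None:
    # Validate each comma-separated CIDR on its own instead of feeding the
    # whole string to ip_network, which rejects it.
    cidrs = [c.strip() for c in new_ip.split(",")]
    for cidr in cidrs:
        ipaddress.ip_network(cidr, strict=False)  # raises ValueError if invalid
    # 'wg set' applies the change live, so no container restart is needed.
    subprocess.run(
        ["wg", "set", "wg0", "peer", public_key, "allowed-ips", ",".join(cidrs)],
        check=True,
    )
```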
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- _build_acl_block: put all blocked IPs for a service in ONE acl block instead
of one block per peer; with per-peer blocks, the first block's allow-all was
silently granting access to every peer after the first blocked one (the acl
plugin uses first-match semantics). See the sketch after this list.
- generate_corefile: add 'reload' plugin so SIGUSR1 triggers Corefile reload
in newer CoreDNS builds (without it the signal was a no-op)
- tests/test_firewall_manager.py: new tests for single merged ACL block and
the reload directive
- tests/e2e/api/test_peer_access_update.py: e2e tests for service_access,
internet_access, and peer_access updates persisting live to iptables/CoreDNS
- tests/e2e/api/test_cell_to_cell.py: e2e tests for cell-to-cell connection
management, permissions API, and cross-cell service access restrictions
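A sketch of the merged block (referenced in the first item above); the Corefile fragment it emits is illustrative:

```python
def _build_acl_block(blocked_ips: list[str]) -> str:
    # All blocked peers go into ONE acl block, each before the allow-all.
    # With one block per peer, the first block's 'allow net *' matched
    # every other peer before later blocks were consulted (first-match).
    lines = ["acl {"]
    for ip in blocked_ips:
        lines.append(f"    block net {ip}/32")
    lines.append("    allow net *")
    lines.append("}")
    return "\n".join(lines)
```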
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DNS A records now return the WireGuard server IP (10.0.0.1) instead of
Docker bridge VIPs so cross-cell peers resolve service names correctly
regardless of their bridge subnet. DNAT rules (wg0:53→cell-dns:53 and
wg0:80→cell-caddy:80) are applied at startup. Caddy routes by Host header,
eliminating the Docker bridge subnet conflict. Firewall cell rules allow
DNS and service (Caddy) traffic from linked cell subnets. Split-tunnel
AllowedIPs now dynamically includes connected-cell VPN subnets and drops
the 172.20.0.0/16 range. Peers with route_via set now receive full-tunnel
config (0.0.0.0/0) so all their traffic exits via the remote cell.
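A sketch of the startup DNAT rules, with illustrative container IPs passed in:

```python
import subprocess

def apply_service_dnat(dns_ip: str, caddy_ip: str) -> None:
    # Traffic arriving on wg0 for the WG server IP is redirected to the
    # containers' bridge IPs; Caddy then disambiguates by Host header.
    for proto, port, target in (("udp", "53", dns_ip), ("tcp", "80", caddy_ip)):
        subprocess.run([
            "iptables", "-t", "nat", "-A", "PREROUTING",
            "-i", "wg0", "-p", proto, "--dport", port,
            "-j", "DNAT", "--to-destination", f"{target}:{port}",
        ], check=True)
```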
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The PostUp rule appended `iptables -A FORWARD -i wg0 -j ACCEPT`, which
allowed any WireGuard-connected client full internet access regardless of
per-peer rules, even when no peers were configured in wg0.conf.
Fix: change PostUp/PostDown to use DROP as the catch-all. Per-peer and
per-cell rules use -I (insert at top) so they take precedence; unknown
or unconfigured WG traffic hits the DROP at the bottom.
Also add reconcile_stale_peer_rules(), called on startup, to remove FORWARD
rules for peer IPs that no longer exist in the registry, preventing deleted
peers from retaining firewall access across container restarts.
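A minimal sketch of the reconciliation, assuming peer rules are tagged with 'pic-peer-<ip>' comments (parsing details are illustrative):

```python
import shlex
import subprocess

def reconcile_stale_peer_rules(registered_ips: set[str]) -> None:
    # `iptables -S` prints each rule as an '-A FORWARD ...' line that can
    # be replayed verbatim as a delete.
    out = subprocess.run(["iptables", "-S", "FORWARD"],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if "pic-peer-" not in line:
            continue
        ip = line.split("pic-peer-", 1)[1].split()[0].strip('"')
        if ip not in registered_ips:
            args = shlex.split(line.replace("-A FORWARD", "-D FORWARD", 1))
            subprocess.run(["iptables", *args], check=True)
```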
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the ability to route a specific peer's internet traffic through a
connected cell acting as an exit relay.
Cell A side:
- PUT /api/peers/<peer>/route-via {"via_cell": "cellB"} sets route_via
- Updates WG AllowedIPs to include 0.0.0.0/0 for the exit cell peer
- Adds an ip rule + ip route in a policy table inside cell-wireguard so the
specific peer's traffic egresses via cellB's WG IP (see the sketch below)
- Sets exit_relay_active on the cell link and pushes use_as_exit_relay=True
to cellB via peer-sync
Cell B side:
- Receives use_as_exit_relay in the peer-sync payload
- Calls apply_cell_rules(..., exit_relay=True) to add FORWARD -o eth0 ACCEPT
- Stores remote_exit_relay_active flag for startup recovery
Startup recovery:
- apply_all_cell_rules passes exit_relay=remote_exit_relay_active (cellB)
- _apply_startup_enforcement reapplies ip rule for each peer with route_via (cellA)
since policy routing rules don't survive container restart
peer_registry gains a route_via field with lazy migration.
22 new tests across test_cell_link_manager, test_peer_registry, test_peer_route_via.
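A sketch of the policy-routing step that _apply_startup_enforcement reapplies; the table number is an arbitrary illustrative choice:

```python
import subprocess

def reapply_route_via(peer_wg_ip: str, table: int = 100) -> None:
    # Inside cell-wireguard: steer this one peer's traffic into a dedicated
    # table whose default route goes out wg0. Cryptokey routing (AllowedIPs
    # 0.0.0.0/0 on the exit-cell peer) then selects cellB as the next hop.
    base = ["docker", "exec", "cell-wireguard", "ip"]
    subprocess.run([*base, "rule", "add", "from", f"{peer_wg_ip}/32",
                    "lookup", str(table)], check=True)
    subprocess.run([*base, "route", "replace", "default", "dev", "wg0",
                    "table", str(table)], check=True)
```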
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The per-cell catch-all DROP was landing at position 5, ahead of our ACCEPT
(position 6), because apply_all_cell_rules can re-run after
ensure_cell_api_dnat, pushing the DNAT ACCEPT below the DROP.
Fix: add the API-sync ACCEPT inside apply_cell_rules itself, tagged with
the cell's own tag and inserted last (with -I, that means position 1, above
the DROP). Since it is part of the cell's rule block it is always in the
right position relative to the catch-all DROP, regardless of call order.
Also adds a _get_cell_api_ip() helper (docker inspect cell-api) so the
destination IP is always current, and two new tests that verify both that
the rule exists and that the insertion order guarantees it wins over the DROP.
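A sketch of why insertion order settles the race; the rule specs are illustrative:

```python
import subprocess

def apply_cell_rules(tag: str, subnet: str, api_ip: str) -> None:
    def insert(*spec: str) -> None:
        # Everything goes in at position 1, so rules stack in reverse:
        # whatever is inserted LAST ends up on top.
        subprocess.run(["iptables", "-I", "FORWARD", "1",
                        "-m", "comment", "--comment", tag, *spec], check=True)

    insert("-s", subnet, "-j", "DROP")            # catch-all, pushed down
    insert("-s", subnet, "-d", api_ip, "-p", "tcp",
           "--dport", "3000", "-j", "ACCEPT")     # inserted last -> position 1
```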
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cell-api has no route to remote WG tunnel IPs; only cell-wireguard does.
Fix _push_permissions_to_remote() to use 'docker exec cell-wireguard curl'
so outbound sync HTTP traverses the WG tunnel from the right network namespace.
On the receive side, add ensure_cell_api_dnat() which installs three
iptables rules inside cell-wireguard on startup:
- PREROUTING DNAT: wg0:3000 → cell-api:3000 (Docker bridge IP)
- POSTROUTING MASQUERADE: so cell-api's reply routes back via wg0
- FORWARD ACCEPT: allow the wg0→eth0 forwarded traffic
Called from _apply_startup_enforcement() so rules survive container restarts.
Tests updated to mock subprocess.run instead of urllib.request.urlopen.
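A sketch of the three rules, assuming cell-api's bridge IP is resolved beforehand (flag details are illustrative):

```python
import subprocess

def ensure_cell_api_dnat(api_ip: str) -> None:
    def run_in_wg(*args: str) -> None:
        subprocess.run(["docker", "exec", "cell-wireguard", *args], check=True)

    # 1. Redirect inbound sync traffic on wg0:3000 to cell-api.
    run_in_wg("iptables", "-t", "nat", "-A", "PREROUTING", "-i", "wg0",
              "-p", "tcp", "--dport", "3000",
              "-j", "DNAT", "--to-destination", f"{api_ip}:3000")
    # 2. Masquerade so cell-api's replies route back through cell-wireguard.
    run_in_wg("iptables", "-t", "nat", "-A", "POSTROUTING",
              "-d", api_ip, "-p", "tcp", "--dport", "3000", "-j", "MASQUERADE")
    # 3. Accept the forwarded wg0 -> eth0 leg.
    run_in_wg("iptables", "-A", "FORWARD", "-i", "wg0", "-o", "eth0",
              "-d", api_ip, "-p", "tcp", "--dport", "3000", "-j", "ACCEPT")
```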
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phase 1 — connection fixes:
- routing_manager.stop(): remove the `iptables -F` / `iptables -t nat -F` nuclear
flush that would wipe WireGuard MASQUERADE and all peer rules on any UI stop action
- wireguard_manager.add_cell_peer(): reject a vpn_subnet that overlaps the local
WG network; this routing blackhole was the root cause of the missing handshake
(see the sketch after this list)
- wireguard_manager._syncconf(): pass Endpoint to 'wg set' so cell peers with
static endpoints are synced to the kernel (not just AllowedIPs)
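A sketch of the overlap check from Phase 1, using only the standard library:

```python
import ipaddress

def validate_cell_subnet(vpn_subnet: str, local_wg_subnet: str) -> None:
    remote = ipaddress.ip_network(vpn_subnet, strict=False)
    local = ipaddress.ip_network(local_wg_subnet, strict=False)
    if remote.overlaps(local):
        # An overlapping AllowedIPs range shadows the local WG network,
        # so handshake traffic is blackholed. Reject at add time; the
        # route layer maps RuntimeError to a 400 response.
        raise RuntimeError(
            f"cell vpn_subnet {remote} overlaps local WG network {local}"
        )
```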
Phase 2 — service-sharing permissions backend:
- firewall_manager: add _cell_tag(), clear_cell_rules(), apply_cell_rules(),
apply_all_cell_rules() — iptables FORWARD rules for cell-to-cell traffic
using 'pic-cell-<name>' comment tags, distinct from 'pic-peer-*'
- app.py startup enforcement: call apply_all_cell_rules(cell_links) so rules
survive API restarts
- cell_link_manager: permissions schema {inbound, outbound} per service;
lazy migration for existing entries; update_permissions(), get_permissions();
apply_cell_rules wired into add_connection/remove_connection
- routes/cells.py: GET /api/cells/services, GET+PUT /api/cells/<n>/permissions;
RuntimeError now returns 400 (not 500) from add_connection
Removed broken 'test' cell (subnet 10.0.0.0/24 collided with local WG network).
Second PIC must use a distinct subnet (e.g. 10.0.1.0/24) before reconnecting.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sprint 1 — Security & correctness:
- Restore all 10 commented-out is_local_request() checks (vault, containers, images, volumes)
- Fix XFF spoofing: only trust the LAST X-Forwarded-For entry (the one Caddy appends), not all of them (see the sketch after this list)
- Require prefix length in wireguard.address (was accepting bare IPs like 10.0.0.1)
- Validate service_access list in add_peer (valid: calendar/files/mail/webdav)
- Fix dhcp/reservations POST/DELETE: unpack mac/ip/hostname from body (was passing dict as positional arg)
- Fix network/test POST: remove spurious data arg (test_connectivity takes no args)
- Fix remove_peer: clear iptables rules and regenerate DNS ACLs on deletion (was leaving stale rules)
- Fix CoreDNS reload: SIGHUP → SIGUSR1 (SIGHUP kills the process; SIGUSR1 triggers reload plugin)
- Remove local.{domain} block from Corefile template (local.zone doesn't exist, caused log spam)
- Fix routing_manager._remove_nat_rule: targeted -D instead of flushing entire POSTROUTING chain
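A sketch of the XFF fix, assuming Caddy is the only proxy appending to the header (the helper name is illustrative):

```python
from flask import request

def client_ip() -> str:
    # Clients can send a forged X-Forwarded-For; Caddy APPENDS the real
    # address, so only the last entry is trustworthy.
    xff = request.headers.get("X-Forwarded-For", "")
    if xff:
        return xff.split(",")[-1].strip()
    return request.remote_addr or ""
```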
Sprint 2 — State consistency:
- Atomic config writes in config_manager, ip_utils, firewall_manager, network_manager
(write to .tmp → fsync → os.replace, preventing truncated files on kill; see the sketch after this list)
- backup_config: now also backs up Caddyfile, Corefile, .env, DNS zone files
- restore_config: restores all of the above so config stays consistent after restore
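The atomic write pattern shared by the four modules, as a standalone sketch:

```python
import os

def atomic_write(path: str, data: str) -> None:
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # data reaches disk before the rename
    os.replace(tmp, path)     # atomic: readers see old or new, never truncated
```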
Sprint 3 — Dead code / documentation:
- Remove CellManager instantiation from app startup (it was never called and double-instantiated all managers)
- Document routing_manager scope (targets host, not cell-wireguard; methods not called by any active route)
Sprint 4 — Test infrastructure:
- Add tests/conftest.py with shared tmp_dir, tmp_config_dir, tmp_data_dir, flask_client fixtures
- Add tests/test_config_validation.py: 400 paths for ip_range, port, wireguard.address validation
- Add tests/test_ip_utils_caddyfile.py: 14 tests for write_caddyfile (was completely untested)
- Expand test_app_misc.py: 7 new is_local_request tests covering XFF spoofing and cell-network IPs
- Add --cov-fail-under=70 to make test-coverage
- Add pre-commit hook that runs pytest before every commit
414 tests pass (was 372).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs caused DNS to fail when the domain name was changed:
1. generate_corefile() hardcoded 'cell' as the zone name instead of using
the configured domain, so on startup it would silently reset any domain
change back to 'cell'
2. apply_domain() regex-replaced every zone name except the root '.' zone
(including local.cell) with the new domain → duplicate zone blocks →
CoreDNS crash
Fix: add a domain parameter to generate_corefile/apply_all_dns_rules,
add _configured_domain() helper in app.py, and delegate Corefile updates
in apply_domain() to generate_corefile() so the logic is in one place.
Also parameterise SERVICE_HOSTS ACL entries via the domain argument.
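A sketch of the consolidated code path; the zone layout shown is illustrative:

```python
def generate_corefile(domain: str, acl_block: str = "") -> str:
    # Single source of truth: apply_domain() now calls this with the
    # configured domain instead of regex-editing zone names, so a domain
    # change can neither be reset on startup nor duplicate zone blocks.
    return (
        f"{domain} {{\n"
        f"    file /etc/coredns/zones/{domain}.zone\n"
        f"{acl_block}"
        "}\n"
        ". {\n"
        "    forward . /etc/resolv.conf\n"
        "}\n"
    )
```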
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When ip_range changes in Settings, the new subnet is now applied to:
- DNS zone records (network_manager.apply_ip_range)
- Caddy virtual IPs (firewall_manager.ensure_caddy_virtual_ips)
- iptables per-service rules (firewall_manager.update_service_ips)
- docker-compose.yml static IPs if writable (ip_utils.update_docker_compose_ips)
New module ip_utils.py derives all container IPs from the subnet using
fixed offsets so the entire stack stays consistent from one setting.
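A sketch of the offset scheme; the offsets below are illustrative, not the real assignments:

```python
import ipaddress

OFFSETS = {"gateway": 1, "cell-dns": 10, "cell-caddy": 20, "cell-api": 30}

def derive_ips(ip_range: str) -> dict[str, str]:
    net = ipaddress.ip_network(ip_range, strict=False)
    # Each container IP is the network address plus a fixed offset, so a
    # single ip_range setting moves the whole stack consistently.
    return {name: str(net.network_address + off) for name, off in OFFSETS.items()}
```

For example, derive_ips("172.20.0.0/16") yields 172.20.0.1 for the gateway and 172.20.0.10/.20/.30 for the containers.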
321 tests pass (72 new tests added for ip_utils, apply_ip_range, update_service_ips).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Server-side access control:
- firewall_manager.py: per-peer iptables FORWARD rules in WireGuard container;
virtual IPs on Caddy (172.20.0.21-24) for per-service DROP/ACCEPT targeting
- CoreDNS Corefile regenerated with ACL blocks for blocked services per peer
- POST /api/wireguard/apply-enforcement re-applies rules after WireGuard restart;
wg0.conf PostUp calls it via curl so rules restore automatically on container start
WireGuard fixes:
- _syncconf uses `wg set peer` instead of `wg syncconf` to avoid resetting ListenPort
- add_peer validates that AllowedIPs is a single /32, rejecting full/split-tunnel
CIDRs that would route internet or LAN traffic to that peer (see the sketch after this list)
- _config_file() checks for linuxserver wg_confs/ subdirectory first
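A sketch of the /32 check:

```python
import ipaddress

def validate_peer_allowed_ips(allowed_ips: str) -> None:
    net = ipaddress.ip_network(allowed_ips, strict=True)
    if net.prefixlen != 32:
        # A wider CIDR (0.0.0.0/0, 10.0.0.0/24, ...) would make WireGuard
        # cryptokey-route internet or LAN traffic to this peer.
        raise ValueError(f"AllowedIPs must be a single /32, got {allowed_ips}")
```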
UI:
- Peers page fetches /api/wireguard/peers/statuses for live handshake data;
status badge now shows real Online/Offline + seconds since last handshake
- IP field removed from Add Peer form (auto-assigned from 10.0.0.0/24)
Tests (246 pass):
- test_firewall_manager.py: 22 tests for ACL generation, iptables rule correctness,
comment tagging, clear_peer_rules filter logic
- test_peer_wg_integration.py: 10 tests for /32 enforcement, IP auto-assignment,
syncconf called on add/remove
- test_wireguard_manager.py: updated to reflect correct IPs and /32 requirement
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>