fix: architecture audit — security, atomicity, broken endpoints, test coverage

Sprint 1 — Security & correctness:
- Restore all 10 commented-out is_local_request() checks (vault, containers, images, volumes)
- Fix XFF spoofing: only trust the LAST X-Forwarded-For entry (Caddy's append), not all
- Require prefix length in wireguard.address (was accepting bare IPs like 10.0.0.1)
- Validate service_access list in add_peer (valid: calendar/files/mail/webdav)
- Fix dhcp/reservations POST/DELETE: unpack mac/ip/hostname from body (was passing dict as positional arg)
- Fix network/test POST: remove spurious data arg (test_connectivity takes no args)
- Fix remove_peer: clear iptables rules and regenerate DNS ACLs on deletion (was leaving stale rules)
- Fix CoreDNS reload: SIGHUP → SIGUSR1 (SIGHUP kills the process; SIGUSR1 triggers reload plugin)
- Remove local.{domain} block from Corefile template (local.zone doesn't exist, caused log spam)
- Fix routing_manager._remove_nat_rule: targeted -D instead of flushing entire POSTROUTING chain
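
Two of the input checks above (last-entry XFF trust and the mandatory prefix length) can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function names last_forwarded_for and parse_wireguard_address are hypothetical, and it assumes the reverse proxy appends its peer address as the final X-Forwarded-For entry.

```python
import ipaddress


def last_forwarded_for(header_value):
    """Return the right-most X-Forwarded-For entry, or None if there is none."""
    # Earlier entries are supplied by the client and can be forged; the last
    # one is the address appended by the nearest proxy (assumed here).
    parts = [p.strip() for p in (header_value or "").split(",")]
    parts = [p for p in parts if p]
    return parts[-1] if parts else None


def parse_wireguard_address(value):
    """Accept 'IP/prefix' only; reject bare IPs such as '10.0.0.1'."""
    if "/" not in value:
        raise ValueError(f"prefix length required: {value!r}")
    # ip_interface keeps the host bits, so 10.0.0.1/24 is accepted as written.
    return ipaddress.ip_interface(value)
```

A caller would then compare the returned entry against the local and cell-network ranges instead of scanning every entry in the header.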

Sprint 2 — State consistency:
- Atomic config writes in config_manager, ip_utils, firewall_manager, network_manager
  (write to .tmp → fsync → os.replace, prevents truncated files on kill)
- backup_config: now also backs up Caddyfile, Corefile, .env, DNS zone files
- restore_config: restores all of the above so config stays consistent after restore
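
The write-to-temp, fsync, os.replace sequence described above might look like the sketch below. The helper name atomic_write_text is hypothetical, and it uses tempfile.mkstemp for the temporary file instead of a fixed .tmp name, which is a substitution made for this illustration and not necessarily what the managers do.

```python
import os
import tempfile


def atomic_write_text(path, text):
    """Write text to path so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the temp file in the target directory so os.replace never has to
    # cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process is killed mid-write, only the temporary file is truncated; the previous config file stays intact until the final rename.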

Sprint 3 — Dead code / documentation:
- Remove CellManager instantiation from app startup (was never called, double-instantiated all managers)
- Document routing_manager scope (targets host, not cell-wireguard; methods not called by any active route)

Sprint 4 — Test infrastructure:
- Add tests/conftest.py with shared tmp_dir, tmp_config_dir, tmp_data_dir, flask_client fixtures
- Add tests/test_config_validation.py: 400 paths for ip_range, port, wireguard.address validation
- Add tests/test_ip_utils_caddyfile.py: 14 tests for write_caddyfile (was completely untested)
- Expand test_app_misc.py: 7 new is_local_request tests covering XFF spoofing and cell-network IPs
- Add --cov-fail-under=70 to the make test-coverage target
- Add pre-commit hook that runs pytest before every commit

414 tests pass (was 372).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 03:27:52 -04:00
parent 55bec04603
commit d5018c2b34
13 changed files with 801 additions and 633 deletions
+21 -7
@@ -2,6 +2,16 @@
 """
 Routing Manager for Personal Internet Cell
 Handles VPN gateway, NAT, iptables, and advanced routing
+NOTE: This manager runs iptables/ip-route commands on the HOST (the machine running
+docker-compose), not inside cell-wireguard. This is intentional for host-level
+routing features (exit-node, bridge, split-route) that are not yet wired to any
+UI endpoint. The manager is instantiated but its methods are not called by any
+active API route.
+CRITICAL: _remove_nat_rule flushes ALL of POSTROUTING (-F), which would wipe the
+WireGuard MASQUERADE rule. Do not call it until this is fixed to use targeted
+rule deletion (-D) instead of a full flush.
 """
 import os
@@ -766,14 +776,18 @@ class RoutingManager(BaseServiceManager):
             logger.error(f"Failed to apply NAT rule: {e}")
     def _remove_nat_rule(self, rule_id: str):
-        """Remove NAT rule from iptables"""
+        """Remove NAT rule from iptables by rule_id comment tag."""
         try:
-            # This is a simplified removal - in practice you'd need to track the exact rule
-            cmd = ['iptables', '-t', 'nat', '-F', 'POSTROUTING']
-            subprocess.run(cmd, check=True, timeout=10)
-            logger.info(f"Removed NAT rule: {rule_id}")
+            # Use -D with the comment tag to remove the specific rule rather than
+            # flushing the entire POSTROUTING chain (which would wipe WireGuard MASQUERADE).
+            cmd = ['iptables', '-t', 'nat', '-D', 'POSTROUTING',
+                   '-m', 'comment', '--comment', rule_id, '-j', 'MASQUERADE']
+            result = subprocess.run(cmd, timeout=10)
+            if result.returncode != 0:
+                # Rule may not exist — not an error
+                logger.debug(f"NAT rule {rule_id} not found (already removed?)")
+            else:
+                logger.info(f"Removed NAT rule: {rule_id}")
         except Exception as e:
             logger.error(f"Failed to remove NAT rule: {e}")
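
iptables -D with a rule specification only removes a rule whose specification matches the one used when the rule was added. One way to keep the two in sync is to build the specification in a single place and reuse it for both -A and -D. The sketch below assumes the rule carries nothing but the comment match and the MASQUERADE target, which this diff does not confirm for the add path. The helper names nat_rule_spec and nat_rule_cmd are hypothetical.

```python
def nat_rule_spec(rule_id):
    """Rule specification shared by the add and delete paths (assumed shape)."""
    return ['-m', 'comment', '--comment', rule_id, '-j', 'MASQUERADE']


def nat_rule_cmd(action, rule_id):
    """Build the iptables argv for appending (-A) or deleting (-D) one NAT rule."""
    if action not in ('-A', '-D'):
        raise ValueError(f"unsupported action: {action!r}")
    return ['iptables', '-t', 'nat', action, 'POSTROUTING'] + nat_rule_spec(rule_id)
```

If the real add path also passes source or interface matches (for example -s or -o), those would have to be part of the shared specification too, otherwise the delete would not find the rule.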