fix: complete cross-cell peer-sync push (domain SNI + source-preserving NAT)

Finishes the transport repair (L1+L2 landed in 714fb9b). The push now works
end-to-end between linked cells — verified live: offer/permission state
propagates automatically and the cell_relay derives/reverts without manual steps.

L3 — push by domain, not bare IP (cell_link_manager): the push targeted
https://<vpn-ip>, but in DDNS/ACME mode Caddy only holds a cert for the cell's
domain, so the TLS handshake failed by IP. Target https://<remote-domain> with
`curl --resolve <domain>:443:<dns_ip>` — connect to the VPN IP over the tunnel
but present the domain as SNI/Host. remote_api_url is now domain-based; legacy
http://ip:3000 and https://ip URLs migrate on load.
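
A minimal sketch of the domain-based push target, assuming the push is issued as a curl argv. `build_push_argv`, the example domain, the VPN IP, and the `/api/peer-sync` path are hypothetical names for illustration and are not taken from cell_link_manager:

```python
from typing import List


def build_push_argv(domain: str, dns_ip: str, path: str = "/") -> List[str]:
    """Build a curl argv that connects to dns_ip but presents domain as SNI/Host.

    Hypothetical helper: --resolve pins <domain>:443 to the VPN IP, so the TCP
    connection goes over the tunnel while the TLS handshake and Host header
    still carry the domain that the remote certificate was issued for.
    """
    return [
        "curl", "--silent", "--show-error", "--fail",
        "--resolve", f"{domain}:443:{dns_ip}",
        f"https://{domain}{path}",
    ]


# Example values are placeholders, not real cell data.
argv = build_push_argv("cell-b.example.org", "10.8.2.1", "/api/peer-sync")
print(argv)
```

Passing the arguments as a list (rather than a shell string) keeps the domain and IP out of shell parsing, but they still need validating before they reach curl.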

L4 — preserve the real source for auth (firewall_manager): the blanket
`-o eth0 MASQUERADE` rewrote the push source, so the remote's X-Forwarded-For
source-subnet auth couldn't match. apply_cell_rules adds a tightly-scoped nat
POSTROUTING RETURN (linked-subnet → caddy:443 only) above the masquerade; the
host route returns Caddy's reply through the tunnel. Reviewed by pic-security:
WireGuard per-cell AllowedIPs + Caddy last-XFF (no trusted_proxies) keep this
un-spoofable; the API stays 127.0.0.1-only.
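
To illustrate why the masqueraded source broke auth, here is a sketch of a source-subnet check on the last X-Forwarded-For entry. `last_xff_in_subnet` is a hypothetical function, not the remote's actual handler, and the addresses are placeholders:

```python
import ipaddress


def last_xff_in_subnet(xff_header: str, vpn_subnet: str) -> bool:
    """Return True if the last X-Forwarded-For entry lies inside vpn_subnet.

    Hypothetical sketch: only the last entry is trusted, because that is the
    one appended by the nearest proxy; earlier entries are client-supplied.
    """
    entries = [part.strip() for part in xff_header.split(",") if part.strip()]
    if not entries:
        return False
    try:
        addr = ipaddress.ip_address(entries[-1])
        net = ipaddress.ip_network(vpn_subnet, strict=False)
    except ValueError:
        return False
    return addr in net


# Real VPN source preserved: the check can match.
print(last_xff_in_subnet("10.8.2.5", "10.8.2.0/24"))              # True
# Source rewritten to a bridge IP by MASQUERADE: the check cannot match.
print(last_xff_in_subnet("172.20.0.9", "10.8.2.0/24"))            # False
# A spoofed leading entry is ignored because only the last entry counts.
print(last_xff_in_subnet("10.8.2.5, 203.0.113.7", "10.8.2.0/24"))  # False
```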

Also:
- validate remote-invite domain/dns_ip/endpoint/subnet at ingest (they reach a
  curl --resolve argv — block leading-dash argument-injection).
- remove the host subnet route on cell unlink (remove_cell_subnet_route); the
  route was never cleaned up, so the stale route made is_local_request treat
  the unlinked cell's subnet as local.
- mock firewall side-effects in the affected unit tests.
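
The ingest validation above could look roughly like the following sketch. The function names and the exact domain pattern are assumptions for illustration, not the checks this commit adds; endpoint (host:port) validation is omitted:

```python
import ipaddress
import re

# Hypothetical pattern: dot-separated labels that start and end with an
# alphanumeric character, so a value can never begin with "-".
_DOMAIN_RE = re.compile(
    r"([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}",
    re.IGNORECASE,
)


def is_safe_domain(value: str) -> bool:
    """Accept only plain hostnames; rejects anything curl could read as a flag."""
    return len(value) <= 253 and _DOMAIN_RE.fullmatch(value) is not None


def is_safe_ip(value: str) -> bool:
    """Accept only strings that parse as a single IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def is_safe_subnet(value: str) -> bool:
    """Accept only strings that parse as a CIDR network."""
    try:
        ipaddress.ip_network(value, strict=False)
    except ValueError:
        return False
    return True


print(is_safe_domain("cell-b.example.org"))  # True
print(is_safe_domain("--output=/tmp/x"))     # False
print(is_safe_ip("-K/tmp/x"))                # False
```

Parsing with `ipaddress` rather than a hand-written pattern means a leading dash is rejected as a side effect of the value not being a valid address at all.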

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-17 01:02:20 -04:00
parent 714fb9b1a9
commit 639fb66e5b
4 changed files with 250 additions and 46 deletions
@@ -438,11 +438,26 @@ def apply_cell_rules(cell_name: str, vpn_subnet: str, inbound_services: List[str
_iptables(['-I', 'FORWARD', '-s', vpn_subnet, '-d', caddy_ip,
'-p', 'tcp', '--dport', '443',
'-m', 'comment', '--comment', tag, '-j', 'ACCEPT'])
# Preserve the linked cell's real VPN source on peer-sync traffic:
# the blanket `-o eth0 MASQUERADE` would rewrite it to cell-wireguard's
# bridge IP, and the remote side authenticates the push by matching the
# source (via X-Forwarded-For) to the cell's VPN subnet. RETURN before
# the MASQUERADE (inserted at the top of nat POSTROUTING). Caddy's reply
# to the real VPN IP routes back via the cell-subnet host route
# (ensure_cell_subnet_routes). The :80 service path keeps masquerade.
_iptables(['-t', 'nat', '-I', 'POSTROUTING', '-s', vpn_subnet,
'-d', caddy_ip, '-p', 'tcp', '--dport', '443',
'-m', 'comment', '--comment', tag, '-j', 'RETURN'])
# Ensure reply traffic (e.g. ICMP, TCP ACKs) for connections initiated
# by local peers to this cell is not dropped by the cell's catch-all DROP.
ensure_forward_stateful()
# Host route so Caddy's peer-sync reply (to the linked cell's un-masqueraded
# VPN IP) leaves via cell-wireguard rather than the default gateway. Added at
# startup for all links; ensure it on runtime link-add too. Idempotent.
ensure_cell_subnet_routes([{'vpn_subnet': vpn_subnet}])
logger.info(
f"Applied cell rules for {cell_name} ({vpn_subnet}): "
f"inbound={inbound_services} exit_relay={exit_relay}"
@@ -689,6 +704,25 @@ def ensure_cell_subnet_routes(cell_links: List[Dict[str, Any]]) -> None:
logger.warning(f'ensure_cell_subnet_routes: {subnet}: {e}')
def remove_cell_subnet_route(vpn_subnet: str) -> None:
    """Remove the host route for a disconnected cell's VPN subnet (idempotent).

    Counterpart to ensure_cell_subnet_routes. Without it the route lingers after a
    cell is unlinked — blackholing that subnet via cell-wireguard, and (on a host
    that runs the API/tests directly, e.g. a dev box) making is_local_request /
    _local_subnets treat the stale subnet as locally attached.
    """
    if not vpn_subnet:
        return
    WG_BRIDGE_IP = '172.20.0.9'
    try:
        _run(['docker', 'run', '--rm', '--network', 'host', '--cap-add', 'NET_ADMIN',
              'alpine', 'ip', 'route', 'del', vpn_subnet, 'via', WG_BRIDGE_IP],
             check=False)
    except Exception as e:
        logger.warning(f'remove_cell_subnet_route: {vpn_subnet}: {e}')
# ---------------------------------------------------------------------------
# DNS ACL (CoreDNS Corefile generation)
# ---------------------------------------------------------------------------