fix: complete cross-cell peer-sync push (domain SNI + source-preserving NAT)
Unit Tests / test (push) Successful in 9m45s

Finishes the transport repair (L1+L2 landed in 714fb9b). The push now works
end-to-end between linked cells. Verified live: offer/permission state
propagates automatically, and the cell_relay derives/reverts without manual steps.

L3: push by domain, not bare IP (cell_link_manager). The push targeted
https://<vpn-ip>, but in DDNS/ACME mode Caddy only holds a cert for the cell's
domain, so the TLS handshake failed when connecting by IP. The push now targets
https://<remote-domain> with `curl --resolve <domain>:443:<dns_ip>`, which
connects to the VPN IP over the tunnel but presents the domain as SNI/Host.
remote_api_url is now domain-based; legacy http://ip:3000 and https://ip URLs
migrate on load.
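
A minimal sketch of the L3 idea, not the code in cell_link_manager: build a
curl argv that pins the domain to the VPN IP via `--resolve`, and rewrite an
IP-based remote_api_url to the domain form. The function names, the
`/api/peer-sync` path, and the sample domain/IP are hypothetical; only the
`--resolve <domain>:443:<dns_ip>` shape comes from this commit.

```python
import ipaddress
from typing import List
from urllib.parse import urlsplit


def build_push_argv(domain: str, dns_ip: str, path: str = "/") -> List[str]:
    """Return a curl argv that connects to dns_ip but sends `domain` as SNI/Host."""
    return [
        "curl", "--silent", "--show-error", "--fail",
        # Pin <domain>:443 to the VPN IP so no public DNS lookup is needed.
        "--resolve", f"{domain}:443:{dns_ip}",
        f"https://{domain}{path}",
    ]


def is_ip_literal(host: str) -> bool:
    """True if host parses as an IPv4/IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def migrate_remote_api_url(url: str, domain: str) -> str:
    """Rewrite legacy IP-based URLs (http://ip:3000, https://ip) to https://<domain>."""
    host = urlsplit(url).hostname or ""
    return f"https://{domain}" if is_ip_literal(host) else url


# Hypothetical values for illustration only.
argv = build_push_argv("cell-b.example.org", "10.8.2.1", "/api/peer-sync")
```

Passing the argv as a list (no shell) keeps the domain and IP as single
arguments, which is why the ingest validation mentioned under "Also" matters.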

L4: preserve the real source for auth (firewall_manager). The blanket
`-o eth0 MASQUERADE` rewrote the push source, so the remote's X-Forwarded-For
source-subnet auth couldn't match. apply_cell_rules now inserts a tightly
scoped nat POSTROUTING RETURN (linked-subnet → caddy:443 only) above the
masquerade; the host route returns Caddy's reply through the tunnel. Reviewed
by pic-security: WireGuard per-cell AllowedIPs + Caddy last-XFF (no
trusted_proxies) keep this un-spoofable; the API stays 127.0.0.1-only.
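
Why the RETURN has to sit above the masquerade can be shown with a toy model of
first-match rule evaluation. This is an illustrative simulation, not iptables
itself: it ignores the `-o eth0` interface match and connection tracking, and
the subnet and Caddy IP below are hypothetical. In a built-in chain, RETURN
falls through to the chain policy, so the packet leaves un-NATed.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NatRule:
    target: str                  # "RETURN" or "MASQUERADE"
    src: Optional[str] = None    # source CIDR, None matches any
    dst: Optional[str] = None    # destination IP/CIDR, None matches any
    dport: Optional[int] = None  # TCP destination port, None matches any

    def matches(self, src: str, dst: str, dport: int) -> bool:
        if self.src and ipaddress.ip_address(src) not in ipaddress.ip_network(self.src):
            return False
        if self.dst and ipaddress.ip_address(dst) not in ipaddress.ip_network(self.dst):
            return False
        if self.dport is not None and dport != self.dport:
            return False
        return True


def postrouting_verdict(rules: List[NatRule], src: str, dst: str, dport: int) -> str:
    """Return the target of the first matching rule, or the chain policy."""
    for rule in rules:
        if rule.matches(src, dst, dport):
            return rule.target
    return "ACCEPT"  # default policy: source address left untouched


LINKED_SUBNET = "10.8.2.0/24"  # hypothetical linked cell's VPN subnet
CADDY_IP = "172.20.0.2"        # hypothetical Caddy bridge IP

rules = [
    # Inserted with -I, so it is evaluated before the blanket masquerade.
    NatRule("RETURN", src=LINKED_SUBNET, dst=CADDY_IP, dport=443),
    NatRule("MASQUERADE"),
]
```

In this model only linked-subnet traffic to Caddy on 443 keeps its source; the
same peer hitting port 80, or any other source, still reaches the masquerade.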

Also:
- validate remote-invite domain/dns_ip/endpoint/subnet at ingest (they reach a
  curl --resolve argv; block leading-dash argument injection).
- remove the host subnet route on cell unlink (remove_cell_subnet_route); the
  route was never cleaned up, leaving a stale subnet that made is_local_request
  treat it as local.
- mock firewall side-effects in the affected unit tests.
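
One way the ingest validation could look, as a sketch under assumptions rather
than the validators added in this commit: every function name here is
hypothetical, and the endpoint format is assumed to be `host:port` with an IPv4
address or hostname. Strict parsing rejects anything starting with `-` as a
side effect, so the values cannot be read as options once they reach an argv.

```python
import ipaddress
import re

# One DNS label: alphanumeric at both ends, hyphens only in the middle.
_LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def validate_domain(value: str) -> str:
    labels = value.split(".")
    if len(value) > 253 or len(labels) < 2:
        raise ValueError(f"invalid domain: {value!r}")
    if not all(_LABEL_RE.fullmatch(label) for label in labels):
        raise ValueError(f"invalid domain: {value!r}")
    return value


def validate_ip(value: str) -> str:
    # ipaddress raises ValueError on anything that is not a plain address.
    return str(ipaddress.ip_address(value))


def validate_subnet(value: str) -> str:
    # Default strict mode also rejects CIDRs with host bits set.
    return str(ipaddress.ip_network(value))


def validate_endpoint(value: str) -> str:
    host, sep, port = value.rpartition(":")
    if not sep or not (port.isascii() and port.isdigit()):
        raise ValueError(f"invalid endpoint: {value!r}")
    if not 1 <= int(port) <= 65535:
        raise ValueError(f"invalid endpoint port: {value!r}")
    try:
        validate_ip(host)
    except ValueError:
        validate_domain(host)  # raises ValueError if it is not a hostname either
    return value
```

Each validator returns the accepted value or raises ValueError, so an invite
such as `-oProxyCommand=x` for the domain is refused before it is stored.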

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@@ -438,11 +438,26 @@ def apply_cell_rules(cell_name: str, vpn_subnet: str, inbound_services: List[str
    _iptables(['-I', 'FORWARD', '-s', vpn_subnet, '-d', caddy_ip,
               '-p', 'tcp', '--dport', '443',
               '-m', 'comment', '--comment', tag, '-j', 'ACCEPT'])
    # Preserve the linked cell's real VPN source on peer-sync traffic:
    # the blanket `-o eth0 MASQUERADE` would rewrite it to cell-wireguard's
    # bridge IP, and the remote side authenticates the push by matching the
    # source (via X-Forwarded-For) to the cell's VPN subnet. RETURN before
    # the MASQUERADE (inserted at the top of nat POSTROUTING). Caddy's reply
    # to the real VPN IP routes back via the cell-subnet host route
    # (ensure_cell_subnet_routes). The :80 service path keeps masquerade.
    _iptables(['-t', 'nat', '-I', 'POSTROUTING', '-s', vpn_subnet,
               '-d', caddy_ip, '-p', 'tcp', '--dport', '443',
               '-m', 'comment', '--comment', tag, '-j', 'RETURN'])

    # Ensure reply traffic (e.g. ICMP, TCP ACKs) for connections initiated
    # by local peers to this cell is not dropped by the cell's catch-all DROP.
    ensure_forward_stateful()

    # Host route so Caddy's peer-sync reply (to the linked cell's un-masqueraded
    # VPN IP) leaves via cell-wireguard rather than the default gateway. Added at
    # startup for all links; ensure it on runtime link-add too. Idempotent.
    ensure_cell_subnet_routes([{'vpn_subnet': vpn_subnet}])

    logger.info(
        f"Applied cell rules for {cell_name} ({vpn_subnet}): "
        f"inbound={inbound_services} exit_relay={exit_relay}"
@@ -689,6 +704,25 @@ def ensure_cell_subnet_routes(cell_links: List[Dict[str, Any]]) -> None:
            logger.warning(f'ensure_cell_subnet_routes: {subnet}: {e}')


def remove_cell_subnet_route(vpn_subnet: str) -> None:
    """Remove the host route for a disconnected cell's VPN subnet (idempotent).

    Counterpart to ensure_cell_subnet_routes. Without it the route lingers after a
    cell is unlinked — blackholing that subnet via cell-wireguard, and (on a host
    that runs the API/tests directly, e.g. a dev box) making is_local_request /
    _local_subnets treat the stale subnet as locally attached.
    """
    if not vpn_subnet:
        return
    WG_BRIDGE_IP = '172.20.0.9'
    try:
        _run(['docker', 'run', '--rm', '--network', 'host', '--cap-add', 'NET_ADMIN',
              'alpine', 'ip', 'route', 'del', vpn_subnet, 'via', WG_BRIDGE_IP],
             check=False)
    except Exception as e:
        logger.warning(f'remove_cell_subnet_route: {vpn_subnet}: {e}')


# ---------------------------------------------------------------------------
# DNS ACL (CoreDNS Corefile generation)
# ---------------------------------------------------------------------------