fix: make cross-cell peer-sync push actually reach the remote cell's API

The offer/permission push between linked cells never worked end-to-end. The
push already targets the remote cell over the WG tunnel, and the earlier
fix #3 pointed it at HTTPS. Two further fixes complete the transport:

1. The push originates in the slim WireGuard image, the only namespace with
   routes to remote-cell VPN subnets. That image had no TLS-capable HTTP
   client: busybox wget lacks TLS and curl was not installed. Add curl +
   ca-certificates (~5 MB).

2. The receiving cell's cell-link firewall allowed the linked subnet to reach
   cell-api:3000, which is a dead path: the API binds 127.0.0.1 only and
   nothing DNATs :3000. Move the peer-sync ACCEPT to Caddy:443. The WG server
   already DNATs that port (wg0:443 → Caddy → cell-api), and the existing
   `-o eth0 MASQUERADE` rule routes Caddy's replies back through the tunnel.
   Source auth (cell VPN subnet via X-Forwarded-For) is preserved; the API
   stays 127.0.0.1-only.
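
For illustration, the relocated rule and the "inserted last → top" ordering can
be sketched as below. This is a minimal sketch, not the project's code: the
helper names `peer_sync_accept_rule` and `simulate_inserts`, the subnet, the
Caddy IP, and the comment tag are all hypothetical. It assumes only that
`iptables -I CHAIN` without a rule number inserts at position 1.

```python
from typing import List


def peer_sync_accept_rule(vpn_subnet: str, caddy_ip: str, tag: str) -> List[str]:
    """Build iptables arguments for the peer-sync ACCEPT as seen after DNAT.

    Mirrors the shape of the rule in this commit: source is the linked cell's
    VPN subnet, destination is Caddy on TCP 443.
    """
    return [
        '-I', 'FORWARD', '-s', vpn_subnet, '-d', caddy_ip,
        '-p', 'tcp', '--dport', '443',
        '-m', 'comment', '--comment', tag, '-j', 'ACCEPT',
    ]


def simulate_inserts(rules_in_call_order: List[str]) -> List[str]:
    """Model repeated `iptables -I CHAIN <rule>` calls.

    With no rule number, -I inserts at position 1, so each new rule lands
    above everything inserted before it.
    """
    chain: List[str] = []
    for rule in rules_in_call_order:
        chain.insert(0, rule)
    return chain


# Hypothetical example values, not taken from a real deployment.
argv = peer_sync_accept_rule('10.8.2.0/24', '172.20.0.2', 'cell-example')

# The catch-all DROP goes in first and the peer-sync ACCEPT last,
# so the ACCEPT ends up at the top of the chain.
order = simulate_inserts([
    'catch-all DROP',
    'service ACCEPT :80',
    'DNS ACCEPT :53',
    'peer-sync ACCEPT :443',
])
print(order)
```

Because netfilter evaluates rules top to bottom, inserting the DROP first and
the ACCEPT rules afterwards keeps every ACCEPT above the catch-all.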

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-16 10:01:56 -04:00
parent c7e01d4aa7
commit 714fb9b1a9
3 changed files with 61 additions and 38 deletions
+14 -9
@@ -374,7 +374,8 @@ def apply_cell_rules(cell_name: str, vpn_subnet: str, inbound_services: List[str
Traffic from vpn_subnet is allowed only to service VIPs listed in
inbound_services; all other cell traffic is DROPped. Cells get no
internet or peer access — only explicit service access via Caddy on
-port 80, plus the cell-api port (3000) for permission-sync pushes.
+port 80, plus Caddy on 443 for cross-cell peer-sync pushes (offer/
+permission state) which reach cell-api through Caddy.
DNS (port 53) is always allowed so cell peers can resolve service names.
Service names resolve to the WG server IP; ensure_service_dnat() routes
@@ -388,7 +389,7 @@ def apply_cell_rules(cell_name: str, vpn_subnet: str, inbound_services: List[str
2. Exit relay ACCEPT (-o eth0) (if exit_relay, above catch-all)
3. Service ACCEPT to Caddy port 80 (if any inbound_services)
4. DNS ACCEPT to cell-dns port 53 (UDP + TCP)
-5. API-sync ACCEPT (inserted last → top)
+5. Peer-sync ACCEPT to Caddy port 443 (inserted last → top)
"""
try:
tag = _cell_tag(cell_name)
@@ -425,13 +426,17 @@ def apply_cell_rules(cell_name: str, vpn_subnet: str, inbound_services: List[str
'-p', proto, '--dport', '53',
'-m', 'comment', '--comment', tag, '-j', 'ACCEPT'])
-# API permission-sync ACCEPT — inserted LAST so it goes to position 1 (above
-# the catch-all DROP). Remote cells push permissions to our cell-api via the
-# WG tunnel; iptables sees source=cell_subnet dst=api_ip after DNAT.
-api_ip = _get_cell_api_ip()
-if api_ip:
-_iptables(['-I', 'FORWARD', '-s', vpn_subnet, '-d', api_ip,
-'-p', 'tcp', '--dport', '3000',
+# Peer-sync ACCEPT — inserted LAST so it goes to position 1 (above the
+# catch-all DROP). Remote cells push offer/permission state to our API over
+# the WG tunnel. The push targets the remote's Caddy on 443 (DNAT wg0:443 →
+# Caddy → cell-api), NOT cell-api:3000 directly: the API binds 127.0.0.1
+# only and is reachable solely through Caddy. After DNAT iptables sees
+# source=cell_subnet dst=caddy_ip:443; the existing `-o eth0 MASQUERADE`
+# routes Caddy's reply back through the tunnel.
+caddy_ip = _get_caddy_container_ip()
+if caddy_ip:
+_iptables(['-I', 'FORWARD', '-s', vpn_subnet, '-d', caddy_ip,
+'-p', 'tcp', '--dport', '443',
'-m', 'comment', '--comment', tag, '-j', 'ACCEPT'])
# Ensure reply traffic (e.g. ICMP, TCP ACKs) for connections initiated