Email, calendar, files, webmail (rainloop), and the file manager (filegator)
are removed from the main docker-compose stack. They install as independent
per-service compose projects via ServiceComposer.
On startup, _cleanup_legacy_builtin_containers() stops and removes any of the
5 legacy containers still running from the old main stack (guarded by a
one-shot sentinel in _meta.legacy_builtins_cleaned so it never runs twice).
Per-service installs (com.docker.compose.project != 'pic') are left untouched.
Changes:
- docker-compose.yml: remove mail, radicale, webdav, rainloop, filegator blocks;
fix dhcp + ntp to profiles: ["core","full"] so they start with --profile core
- Makefile: replace all --profile full with --profile core (6 occurrences);
remove mailserver.env conditional from update: target
- api/legacy_cleanup.py: new module with cleanup_legacy_builtin_containers()
- api/app.py: import and call cleanup at startup before reapply_on_startup()
- tests/test_legacy_cleanup.py: 7 tests covering sentinel, absent containers,
per-service project skip, main-stack removal, exception safety
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Email, calendar, and files no longer appear in the nav or as usable pages
unless they are installed. The nav refreshes whenever a service is installed
or removed via the new pic-services-changed CustomEvent.
Changes:
- routes/services.py: add GET /api/services/active endpoint
- api.js: add servicesAPI.listActive()
- App.jsx: replace hardcoded coreServiceChildren with dynamic state fetched
from /api/services/active; SERVICE_META maps ids to nav entry shapes
- ServiceNotInstalledBanner.jsx: new component — admin gets catalog link,
peer gets "contact admin" message
- EmailPage/CalendarPage/FilesPage: show banner when service not installed
- ServicesIndex.jsx: remove CoreServiceCard + CORE_SERVICES "Built-in"
section; rename Remove → Uninstall; dispatch pic-services-changed on
install/uninstall success
- MyServices.jsx: conditionally render service cards based on active list;
placeholder card when absent; page-level notice when nothing is installed
- tests/test_services_active_endpoint.py: 4 new endpoint tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ServiceStoreManager.install() now delegates container lifecycle to
ServiceComposer (per-service docker-compose.yml) instead of appending to a
shared compose override. This eliminates IP pool allocation, compose override
rendering, and the single-stack docker exec approach.
Changes:
- service_composer.py: add _resolve_requires(), _resolve_dependents(),
reapply_active_services() — dependency graph and startup reapply
- service_store_manager.py: rewrite install() and remove() to use
ServiceComposer; add _fetch_template(); delete _allocate_service_ip(),
_render_compose_override(), _write_compose_override(); remove() now guards
against removing services that others depend on
- managers.py: pass service_composer= to ServiceStoreManager
- Tests: 13 new composer dep tests; TestInstall/TestRemove rewritten for
the new composer-driven path; test_optional_services_feature.py updated
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Builtins (email/calendar/files) are no longer baked into the API image.
ServiceRegistry now only knows about installed store services. When nothing
is installed, Caddy and DNS get no service routes — no hardcoded fallback.
Changes:
- service_registry.py: remove _BUILTINS_DIR, _builtin_ids, _builtin_manifest,
_load_manifest; get() and list_all() now delegate entirely to installed services
- caddy_manager.py: remove _build_core_service_routes(); remove hardcoded
fallback pairs from _http01_service_pairs(); empty registry → api block only
- network_manager.py: _get_service_subdomains() returns [] when no registry
- api/services/builtins/: deleted (email, calendar, files manifests)
- Tests updated throughout: removed builtin-dependent assertions, added
installed-service fixtures, updated fallback expectations to api-only
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three related fixes discovered during review of Phase 0 and Phase 1 manifests:
1. validate_rendered_compose(): add allowed_data_dir param. After ${PIC_DATA_DIR}
substitution, compose templates produce absolute paths; without this the
validator would reject every service install. ServiceComposer.write_compose()
now passes its resolved data_dir so only the designated data directory is
exempt — /etc, /proc, docker.sock etc. still blocked.
2. _RESERVED_SUBDOMAINS: remove service-level subdomains (mail, calendar, files,
webdav, webmail). The reserved list should protect PIC infrastructure endpoints
(api, webui, admin) — not service subdomains that official store services
(calendar, files, webmail) must be allowed to claim. Aligns with the
existing _RESERVED_SUBS in service_registry.py.
3. ServiceRegistry.list_active(): new method returning only installed store
services (no builtins). This is the forward-looking API that Phase 2 will
make the primary read path once builtins are deleted. Adding it now unblocks
the QA agent's test_optional_services_feature.py which was already testing
the expected Phase 2 behaviour.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces api/manifest_validator.py as a single security chokepoint
imported by both ServiceComposer and ServiceStoreManager:
- validate_manifest(): rejects kind=builtin, reserved container names,
reserved subdomains, backend denylist (localhost, cell-api, etc.),
cap_add outside allowlist / in denylist, shell-string provision hooks,
and env values with shell-special characters
- validate_rendered_compose(): walks the rendered YAML and rejects
privileged:true, host network/pid/ipc/userns, absolute bind mounts,
denied capabilities, devices key, apparmor/seccomp unconfined, and
string-form command/entrypoint (shell-injection vector)
- validate_provision_hook(): requires argv list form, lowercase binary,
rejects NUL bytes
ServiceStoreManager changes:
- _validate_manifest() delegates to validate_manifest() after existing checks
- _fetch_manifest() and fetch_index() now stream with a 256 KB size cap
(prevents memory exhaustion from a malicious or compromised index)
- Digest-pin warning for images missing @sha256 (hard error for unknown
registries, warning for git.pic.ngo/roof/* and TRUSTED_IMAGES_NO_DIGEST)
ServiceComposer changes:
- write_compose() calls validate_rendered_compose() before any disk write
so no partial file is left if validation fails
- render_template() substitutes ${PIC_DATA_DIR} with the resolved data_dir path
102 new tests in tests/test_manifest_validator.py covering all five P0
security issues. Existing test mocks updated to use streaming response
pattern (stream=True + raw.read) and valid compose YAML templates.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
These files were created during Steps 1-4 of the services architecture but were
never staged: AccountManager (per-service credential provisioning), ServiceComposer
(docker-compose lifecycle), built-in service manifests for email/calendar/files,
and their test suites (158 tests). Also un-tracks .coverage binaries that were
accidentally committed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously, CaddyManager and NetworkManager contained hardcoded lists of
service names (calendar, files, mail, webdav, etc.), meaning every new
service required a code change to appear in Caddy routes and DNS records.
Now both managers accept a service_registry parameter and derive their
service lists dynamically from the registry at runtime.
- CaddyManager: new _build_registry_service_routes() and
_http01_service_pairs() methods pull routes from the registry
- NetworkManager: new _get_service_subdomains() method returns registry
subdomains with a hardcoded fallback when no registry is wired in;
_build_dns_records, stale-record detection, and service name sets all
use the registry
- managers.py: service_registry constructed before network_manager so it
can be injected into both CaddyManager and NetworkManager
- service_registry.py: validation chokepoint in get_caddy_routes() rejects
invalid subdomain/backend values and reserved service names
- service_store_manager.py: _validate_manifest now validates top-level
subdomain, backend, extra_subdomains, and extra_backends fields
- tests: 24 new tests covering registry-driven routing and DNS subdomain
generation (test_caddy_registry_integration.py)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rename Store → Services: ServicesIndex.jsx shows built-in core services
(Email, Calendar, Files) with Manage links, plus the existing add-on
store below.
New service sub-pages at /services/email|calendar|files serve both
admin and peer roles. Admins see connection info, service status, users
list, and an inline config form (port/data-dir). Peers see connection
info and their personal credentials fetched from peerAPI.
Navigation restructured: a Services parent item expands to show the
three sub-pages via a collapsible sidebar group (ChevronDown toggle).
Both admin and peer navigation include the Services group. Sidebar
extracted NavItem/NavList components to eliminate the duplicate mobile/
desktop rendering.
Settings.jsx drops EmailForm, CalendarForm, FilesForm and their
SERVICE_DEFS entries. Port conflict detection and per-service validation
logic extracted to utils/serviceConfig.js, shared by Settings and the
new service pages. Service form flushers are registered without cleanup
so the Apply banner saves dirty config even when the user navigates away
from a service page before clicking Apply.
Legacy routes /email, /calendar, /files, /store redirect to their new
canonical paths.
GET /api/config now includes installed_services so the nav can derive
which add-ons are installed without a separate store fetch.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
apply_cell_name() now skips multi-label zone files (split-horizon DDNS
zones like pic2.pic.ngo.zone) and excludes '*' and '@' from hostname
candidate detection, preventing the wildcard record from being renamed
to the old cell name during a cell rename.
update_split_horizon_zone() now deletes stale zone files from previous
cell names sharing the same TLD (e.g. pic3.pic.ngo.zone when renaming
to pic2.pic.ngo), eliminating orphaned DNS entries.
_bootstrap_dns() now detects non-LAN domain modes and calls
update_split_horizon_zone() instead of apply_ip_range(), preventing
service records (api, calendar, files…) from being re-injected into
the DDNS parent zone on every container restart.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dashboard, Email, Calendar, and Files pages were building service URLs
with the internal LAN zone name (e.g. 'cell') instead of the public
effective domain (e.g. 'pic2.pic.ngo'), and always using http:// even
in DDNS mode where HTTPS is available.
Changes:
- Dashboard/Email/Calendar/Files: read effective_domain + domain_mode
from ConfigContext; use effective_domain in non-LAN mode and https://
for all DDNS domain modes.
- Calendar: show port 443 instead of 80 in DDNS mode.
- network_manager.update_split_horizon_zone: when the primary internal
zone name is a parent of the effective DDNS domain (e.g. pic.ngo is a
parent of pic2.pic.ngo), remove stale bootstrap service records (api,
calendar, files, mail, webmail, webdav) that pollute the DNS display
and would shadow public DNS responses.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In DDNS modes (pic_ngo, cloudflare, duckdns, http01), all built-in
services are now reachable as subdomains of the cell domain, e.g.
calendar.pic1.pic.ngo instead of pic1.pic.ngo/calendar.
Key changes:
- CaddyManager._build_core_service_routes(): new helper generates
Caddy named-matcher host blocks for calendar, mail/webmail, files,
webdav, and api subdomains within the wildcard TLS server block.
- All ACME modes (pic_ngo, cloudflare, duckdns) use the new
subdomain matchers; http01 emits a dedicated server block per service.
- http01: installed store-plugin services whose name clashes with a
core service are skipped to prevent duplicate server blocks.
- routes/config.py: ip_utils.write_caddyfile() is skipped in non-LAN
modes so LAN Caddy config never overwrites the ACME config.
- firewall_manager.generate_corefile(): new split_horizon_zones param
adds local authoritative file zones so LAN clients resolve
*.pic1.pic.ngo to the internal Caddy IP without hairpin NAT.
- NetworkManager.update_split_horizon_zone(): writes the wildcard zone
file and regenerates the Corefile with the split-horizon block;
called automatically after every identity change in non-LAN mode.
- Added @ to allowed record-name chars in update_dns_zone validation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ConfigManager.get_effective_domain(): returns domain_name when DDNS
active (pic_ngo/cloudflare/duckdns), domain otherwise. Used by all
public-facing services so they use the real registered FQDN.
- ConfigManager.get_internal_domain(): always returns _identity.domain
(CoreDNS zone name, dnsmasq, cell-link invites — stays internal).
- Silent migration: if domain_mode != lan and domain is generic "cell",
auto-set to {cell_name}.local for unique CoreDNS zone naming.
- caddy_manager: fix custom_domain bug — cloudflare/http01 modes were
reading identity.get('custom_domain') which never exists; now reads
domain_name correctly.
- routes/config, app: expose effective_domain in GET /api/config and
/api/status responses.
- email_manager, routes/email: use get_effective_domain() for
OVERRIDE_HOSTNAME, POSTMASTER_ADDRESS, and new-user email defaults.
- ServiceBus.IDENTITY_CHANGED event: emitted from PUT /api/config and
POST /api/ddns/register after identity writes; caddy_manager and
email_manager subscribe to regenerate config automatically.
- Settings.jsx: hide Local Domain input in non-LAN modes; show
read-only effective_domain with "managed by DDNS" badge and an
Advanced toggle for the internal CoreDNS zone name.
- 11 new test classes covering all new helpers, event subscriptions,
caddy/email handlers, and the custom_domain fix.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds DELETE /api/v1/registration to the DDNS server (token-authenticated,
owner-only) and PicNgoDDNS.release() on the client. DDNSManager.register()
now automatically releases the old subdomain before claiming the new one,
so stale names are freed for others to use. Release failures are logged as
warnings and do not block the new registration.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs that prevented registration from working after wizard completion:
1. register(name, '') sent empty IP; server stored blank A record. Now calls
_get_public_ip() when ip is empty so the A record is always set correctly.
2. Token was saved to _identity.domain.ddns.token (TypeError when domain is a
string) instead of the top-level ddns config where update_ip() reads it.
Subdomain also now correctly written to _identity.domain_name.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- DDNSTokenExpired exception triggers auto re-register in update_ip()
so cells recover silently after a DDNS DB reset
- POST /api/ddns/register lets the user force re-registration from Settings
- Re-register button in Settings → External Domain & DDNS (pic_ngo only)
- 3 new tests covering register endpoint: wrong provider, missing name, success
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- GET /api/config now returns domain_mode, domain_name, ddns.{provider,subdomain,has_token}
- GET /api/ddns/check/<name> proxies availability check to DDNS service
- PUT /api/ddns validates and saves cloudflare/duckdns credentials post-setup
- When cell_name changes for pic_ngo provider, auto-registers the new subdomain
- Settings: Cell Name shows availability badge for pic_ngo; auto-save blocks on taken
- Settings: new External Domain & DDNS section — pic_ngo info, cloudflare/duckdns edit
- 11 new tests for the two new endpoints (all pass)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs:
1. AttributeError: AuthManager.update_password does not exist — the
fallback when create_user fails should call set_password_admin().
This caused a 500 on every setup submit when an admin user already
existed (e.g. from a previous install attempt).
2. Wizard was jumping to step 2 and skipping domain steps 3-4 when
preconfigured data existed in cell_config.json. Since the installer
no longer sets that data, and the wizard must always show all steps,
the installerConfigured state and all step-skipping navigation is
removed. Values are still pre-filled if found in config.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reverts 8d1ef39. The installer must collect cell name, domain mode, and
provider tokens before 'make install' so that DDNS registration,
availability checks, and Caddy TLS can be configured at first boot.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DDNS registration (setup_cell.py):
- Replace pyotp dependency with stdlib TOTP (HMAC-SHA1, RFC 6238)
pyotp is only available inside the Docker container, not on the host
where setup_cell.py runs — registration was silently skipped every time
- OTP header still sent if generation succeeds; omitted gracefully if not
Wizard pre-fill (setup_manager + Setup.jsx):
- GET /api/setup/status now returns 'preconfigured' dict with cell_name,
domain_mode, domain_name, and provider tokens from installer-written config
- Setup.jsx fetches status on mount and pre-fills all form state so the
user only needs to set password, services, and timezone — not re-enter
the identity they already configured in the bash installer
- Fails silently so wizard still works on fresh installs with no config
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- setup_manager: fall back to update_password if admin already exists
(installer bootstrap creates admin; wizard now updates rather than fails)
- install.sh: chown repo to SUDO_USER instead of pic user so the
invoking operator can run make update without git safe.directory errors
- test: update mock to also stub update_password when testing total auth failure
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The setup wizard runs before any account exists, but the installer's
setup_cell.py creates auth_users.json with an admin account first.
This meant enforce_auth was active by the time the browser hit /setup,
blocking all /api/setup/* calls with 401. The CSRF hook already exempted
/api/setup/* — auth enforcement now matches.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tests assumed write to /nonexistent/... fails, but CI runs as root where
Linux allows creating any path. Use unittest.mock.patch on builtins.open
with OSError side_effect so the test is environment-independent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- auth_manager._ensure_file(): stop creating the empty auth_users.json on
init — the constructor now only creates the parent directory. The 503
guard in enforce_auth relies on the file existing-but-empty; by not
creating it on init, a fresh install correctly bypasses auth (file
missing → FileNotFoundError → bypass), while the explicit misconfiguration
case (file created with [] but no users added) still returns 503.
- test_enforce_auth_configured.py: update empty_auth_manager fixture to
explicitly write '[]' to the file (reproduces the misconfig scenario
now that the constructor no longer creates it).
- ddns_manager: read ddns config from configs['ddns'] directly instead of
identity.domain.ddns — _identity.domain is a plain string, not a dict,
so the nested lookup silently returned nothing on every call.
- setup_cell.py: write top-level 'ddns' block into cell_config.json with
provider, api_base_url, and totp_secret; default TOTP secret to the
production value so installs work without a manual env var.
- test_ddns_manager.py: update _make_config_manager to populate cm.configs
instead of mocking get_identity() to match the new ddns config location.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sends 50 pings at 0.2s intervals through the cell-to-cell tunnel and
asserts that ≤5% exceed 3× the median RTT (floor 15ms). Catches
server-side packet processing regressions on wired paths.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- firewall_manager: add _get_wg_server_ip() helper; scope ensure_cell_api_dnat(),
ensure_dns_dnat(), ensure_service_dnat() DNAT rules with -d server_ip; add
ensure_wg_masquerade() (Docker→wg0 MASQUERADE+FORWARD) and
ensure_cell_subnet_routes() (host routes via docker run busybox)
- wireguard_manager: scope PostUp DNAT rules with -d server_ip in generate_config()
and ensure_postup_dnat(); add Docker→wg0 MASQUERADE+FORWARD rules
- app.py: call ensure_wg_masquerade() and ensure_cell_subnet_routes() in
_apply_startup_enforcement()
- tests/test_firewall_manager.py: mock _get_wg_server_ip, add
test_dnat_is_scoped_to_server_ip and test_returns_false_when_wg_server_ip_not_found
- tests/e2e/wg/test_cell_to_cell_routing.py: rewrite to use dynamic config
(no hardcoded IPs/ports), add latency and domain access tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The cell catch-all DROP rule blocked all traffic from a connected cell's
subnet, including ESTABLISHED/RELATED packets (ICMP replies, TCP ACKs) for
connections initiated by local VPN peers. This broke ping to the remote
cell's WireGuard IP even when the cell-to-cell tunnel was healthy.
Change the DROP to match only NEW,INVALID connections so established reply
traffic passes through to the stateful ACCEPT rule.
Also adds tests/e2e/wg/test_cell_to_cell_routing.py — an end-to-end test
that brings up a real WireGuard tunnel from the test runner to pic1 and
verifies full cross-cell routing including ICMP ping, API /health, and Caddy.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
apply_cell_rules drops all traffic from a cell's subnet except specific
service ports. This also drops ICMP replies and TCP ACKs for connections
initiated by local peers to the connected cell, breaking cross-cell
routing (ping to 10.0.0.1 silently dropped by test's cell DROP rule).
Fix: ensure_forward_stateful() inserts a stateful ESTABLISHED,RELATED
ACCEPT at the top of FORWARD. Called from apply_cell_rules (every cell
add/update) and from _apply_startup_enforcement. Idempotent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three related fixes for split-tunnel peers that need to reach connected cells:
1. apply_peer_rules/apply_all_peer_rules now accept wg_subnet (actual local VPN
subnet) and cell_subnets (connected cells' vpn_subnets) parameters instead of
hardcoding 10.0.0.0/24. All callers (startup, add_peer, update_peer,
apply-enforcement endpoint) pass the real values.
2. Explicit ACCEPT rules are inserted in FORWARD for each connected cell's
subnet so split-tunnel peers (internet_access=False) can still reach
connected cells via the wg0→wg0 path.
3. apply_ip_range in network_manager now loads cell_links.json and passes it
to generate_corefile(), fixing a race where the bootstrap DNS thread could
overwrite the Corefile and wipe cross-cell DNS forwarding zones on startup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a cell is connected to others, changing the local WireGuard address
or Docker ip_range to a subnet that overlaps a connected cell's vpn_subnet
would break routing. Both now return 409 with the conflicting cell name.
- wireguard.address: derive network from new address, check all connected
cells' vpn_subnet for overlap (after existing format validation)
- ip_range: check all connected cells' vpn_subnet for overlap (after
existing RFC-1918 validation)
Tests: 4 cases each (overlap → 409, no overlap → ok, no cells → ok,
format error still fires first → 400).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two gaps allowed a cell to take a domain already in use by a connected cell:
1. PUT /api/config domain change: added check against cell_link_manager's
connected cells list before saving — returns 409 if the new domain
collides with any connected cell's domain.
2. accept_invite healing path: a remote cell changing its domain via a
re-invite was not validated against other connected cells' domains.
Now calls _check_invite_conflicts(invite, exclude_cell=name) before
applying any change.
Also: the healing path now detects domain changes (alongside dns_ip/
vpn_subnet/endpoint), updates the stored domain, and refreshes the DNS
forward rule when the domain changes.
Tests: 3 new domain-conflict tests in test_config_validation.py;
3 new accept_invite healing tests in test_cell_link_manager.py.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Flask's default cookie name ('session') is shared across all ports on the same
hostname. When two PIC instances are accessed via localhost:portA and localhost:portB,
logging into one overwrites the other's session cookie, causing repeated logouts.
Derive a unique 8-hex suffix from each instance's persistent SECRET_KEY and set
SESSION_COOKIE_NAME = 'pic_sess_<suffix>'. This ensures each cell uses a distinct
cookie name, so sessions are fully isolated regardless of hostname.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_get_dnat_container_ips() used a concatenating docker inspect format that
produced "invalid IP" when containers had multiple network attachments.
The old ensure_postup_dnat appended rather than replacing, so each update
call added a broken duplicate set of rules causing iptables to fail on
startup and tear down wg0 entirely.
Fix _get_dnat_container_ips to use a space separator in the format string
and validate each token as a real IP before accepting it.
Rewrite ensure_postup_dnat with _is_dnat_rule() helper: strips every
managed DNAT/FORWARD rule (any IP, port 53/80) on semicolon-split and
appends a single correct set — fully idempotent regardless of prior state.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 new tests in TestSetPeerRouteVia:
- 409 when remote_exit_relay_active=True (would create A→B→A cycle)
- disable (via_cell=null) bypasses loop check — always allowed
- no 409 when remote_exit_relay_active=False (safe to enable)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three issues fixed together:
1. WireGuard address changes now go through the pending-restart queue
(shown in the UI banner) instead of restarting cell-wireguard immediately.
Only private_key changes still restart immediately; address and port
changes both defer to the user-initiated Apply flow. Previously the
address change was silently applied and never appeared in Settings →
Pending Configuration.
2. When the WG address changes, the API spawns a background thread that
pushes the updated invite to all connected cells (over LAN, before the
WG tunnel is back up). This lets remote cells automatically update
their dns_ip, AllowedIPs, and CoreDNS forwarding rules without manual
re-pairing.
3. accept_invite now handles the "already connected but changed" case:
if the remote cell re-sends an invite with a different dns_ip, vpn_subnet
or endpoint, we update the stored link, the WG AllowedIPs, and the
CoreDNS forward rule in place — no delete/re-add required. Previously
the endpoint was ignored and returned the stale record unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three fixes:
1. Extend the docker-exec safety guard in wireguard_manager to also check
for 'wg_confs' in the config path. When running unit tests on the host
the API uses /app/config/wireguard/wg0.conf (no wg_confs subdir), so the
old '/tmp/' | 'pytest' check didn't fire — _syncconf and friends were
executing live 'docker exec cell-wireguard wg set' calls against the
running container, removing real VPN peers that didn't appear in the
test config. The wg_confs subdir only exists inside the container mount,
so its presence reliably gates live calls.
2. Fix get_split_tunnel_ips() wrong path: self.data_dir + 'api/cell_links.json'
→ self.data_dir + 'cell_links.json'. The extra 'api/' segment produced
/app/data/api/cell_links.json inside the container instead of the real
/app/data/cell_links.json, so connected cells were silently excluded from
split-tunnel CIDRs.
3. update_peer_ip_registry and ip_update now also call
wireguard_manager.update_peer_ip so wg0.conf AllowedIPs stay in sync when
a peer's VPN IP changes at runtime (previously only peers.json was updated).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
**PIC UI always accessible (service_access=[])**
Remove the per-peer Caddy:80 ACCEPT/DROP rule from apply_peer_rules.
Service access was enforced at two layers (iptables DROP + CoreDNS ACL),
but the iptables layer also blocked the PIC web UI served through Caddy.
CoreDNS ACL alone is sufficient — DNS blocks service hostnames; the UI
path through Caddy remains reachable regardless of service_access value.
**Exit-relay internet routing (route_via another cell)**
update_peer_ip validated new_ip as a single ip_network, rejecting the
comma-separated '10.0.1.0/24, 0.0.0.0/0' string passed by
update_cell_peer_allowed_ips(add_default_route=True). The AllowedIPs
in wg0.conf was never updated, so WireGuard never routed internet traffic
through the exit cell's tunnel. Fix: validate each CIDR individually and
apply the change live via wg set without a container restart.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- _build_acl_block: put all blocked IPs for a service in ONE acl block instead
of one block per peer — the first block's allow-all was silently granting
access to every peer after the first blocked one (first-match semantics)
- generate_corefile: add 'reload' plugin so SIGUSR1 triggers Corefile reload
in newer CoreDNS builds (without it the signal was a no-op)
- tests/test_firewall_manager.py: new tests for single merged ACL block and
the reload directive
- tests/e2e/api/test_peer_access_update.py: e2e tests for service_access,
internet_access, and peer_access updates persisting live to iptables/CoreDNS
- tests/e2e/api/test_cell_to_cell.py: e2e tests for cell-to-cell connection
management, permissions API, and cross-cell service access restrictions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DNAT rules applied via docker exec are lost whenever wg-easy reloads the
WireGuard interface (PostDown flushes the nat table then PostUp only
re-adds static rules). Fix: embed DNS (port 53) and service (port 80)
DNAT rules directly in wg0.conf PostUp/PostDown so they reapply on every
interface restart. ensure_postup_dnat() patches existing configs on startup.
get_server_config() now returns the WG server IP (e.g. 10.0.0.1) for
dns_ip instead of the cell-dns container IP (172.20.0.3). This makes the
value consistent with what get_peer_config() writes into the .conf file,
and fixes the stale hint text in Peers.jsx and WireGuard.jsx.
UI: fallback dns_ip changed from 172.20.0.3 to 10.0.0.1; split-tunnel
fallback drops the 172.20.0.0/16 stale range.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DNS A records now return the WireGuard server IP (10.0.0.1) instead of
Docker bridge VIPs so cross-cell peers resolve service names correctly
regardless of their bridge subnet. DNAT rules (wg0:53→cell-dns:53 and
wg0:80→cell-caddy:80) are applied at startup. Caddy routes by Host header,
eliminating the Docker bridge subnet conflict. Firewall cell rules allow
DNS and service (Caddy) traffic from linked cell subnets. Split-tunnel
AllowedIPs now dynamically includes connected-cell VPN subnets and drops
the 172.20.0.0/16 range. Peers with route_via set now receive full-tunnel
config (0.0.0.0/0) so all their traffic exits via the remote cell.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The PostUp rule appended `iptables -A FORWARD -i wg0 -j ACCEPT` which
allowed any WireGuard-connected client full internet access regardless of
per-peer rules, even when no peers were configured in wg0.conf.
Fix: change PostUp/PostDown to use DROP as the catch-all. Per-peer and
per-cell rules use -I (insert at top) so they take precedence; unknown
or unconfigured WG traffic hits the DROP at the bottom.
Also add reconcile_stale_peer_rules() called on startup to remove FORWARD
rules for peer IPs that no longer exist in the registry, preventing deleted
peers from retaining firewall access across container restarts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the ability to route a specific peer's internet traffic through a
connected cell acting as an exit relay.
Cell A side:
- PUT /api/peers/<peer>/route-via {"via_cell": "cellB"} sets route_via
- Updates WG AllowedIPs to include 0.0.0.0/0 for the exit cell peer
- Adds ip rule + ip route in policy table inside cell-wireguard so the
specific peer's traffic egresses via cellB's WG IP
- Sets exit_relay_active on the cell link and pushes use_as_exit_relay=True
to cellB via peer-sync
Cell B side:
- Receives use_as_exit_relay in the peer-sync payload
- Calls apply_cell_rules(..., exit_relay=True) to add FORWARD -o eth0 ACCEPT
- Stores remote_exit_relay_active flag for startup recovery
Startup recovery:
- apply_all_cell_rules passes exit_relay=remote_exit_relay_active (cellB)
- _apply_startup_enforcement reapplies ip rule for each peer with route_via (cellA)
since policy routing rules don't survive container restart
peer_registry gets route_via field with lazy migration.
22 new tests across test_cell_link_manager, test_peer_registry, test_peer_route_via.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>