Three related issues prevented CoreDNS from serving updated zone records:
1. The `file` plugin blocks in generate_corefile() lacked a `reload`
option, so CoreDNS never re-read zone files after they were written.
Added `reload 30s` so zone file changes are picked up within 30s.
2. _reload_dns_service() sent SIGHUP via `docker exec ... kill -HUP 1`,
which doesn't trigger zone reloads. Changed to SIGUSR1 via
`docker kill --signal=SIGUSR1` (same as firewall_manager.reload_coredns).
3. _bootstrap_dns() wrote the zone file but never regenerated the
Corefile. CoreDNS's reload plugin only fires when the Corefile
changes, so zone records from startup were invisible until the next
peer modification triggered apply_all_dns_rules(). Now _bootstrap_dns()
always calls apply_all_dns_rules() after the zone write.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_build_dns_records() only hardcoded 'api' and 'webui', relying on the
optional service registry for the rest. Built-in services (calendar,
files, mail, webdav) were never registered, so they were absent from
the zone file and tests querying webdav.<domain> via CoreDNS got
NXDOMAIN.
Add _BUILTIN_SERVICE_SUBDOMAINS constant and include those names in
every zone build. Also update _stale and apply_cell_name exclusion
sets so DDNS mode correctly removes them from the parent zone.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Builtins (email/calendar/files) are no longer baked into the API image.
ServiceRegistry now only knows about installed store services. When nothing
is installed, Caddy and DNS get no service routes — no hardcoded fallback.
Changes:
- service_registry.py: remove _BUILTINS_DIR, _builtin_ids, _builtin_manifest,
_load_manifest; get() and list_all() now delegate entirely to installed services
- caddy_manager.py: remove _build_core_service_routes(); remove hardcoded
fallback pairs from _http01_service_pairs(); empty registry → api block only
- network_manager.py: _get_service_subdomains() returns [] when no registry
- api/services/builtins/: deleted (email, calendar, files manifests)
- Tests updated throughout: removed builtin-dependent assertions, added
installed-service fixtures, updated fallback expectations to api-only
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously, CaddyManager and NetworkManager contained hardcoded lists of
service names (calendar, files, mail, webdav, etc.), meaning every new
service required a code change to appear in Caddy routes and DNS records.
Now both managers accept a service_registry parameter and derive their
service lists dynamically from the registry at runtime.
- CaddyManager: new _build_registry_service_routes() and
_http01_service_pairs() methods pull routes from the registry
- NetworkManager: new _get_service_subdomains() method returns registry
subdomains with a hardcoded fallback when no registry is wired in;
_build_dns_records, stale-record detection, and service name sets all
use the registry
- managers.py: service_registry constructed before network_manager so it
can be injected into both CaddyManager and NetworkManager
- service_registry.py: validation chokepoint in get_caddy_routes() rejects
invalid subdomain/backend values and reserved service names
- service_store_manager.py: _validate_manifest now validates top-level
subdomain, backend, extra_subdomains, and extra_backends fields
- tests: 24 new tests covering registry-driven routing and DNS subdomain
generation (test_caddy_registry_integration.py)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
apply_cell_name() now skips multi-label zone files (split-horizon DDNS
zones like pic2.pic.ngo.zone) and excludes '*' and '@' from hostname
candidate detection, preventing the wildcard record from being renamed
to the old cell name during a cell rename.
update_split_horizon_zone() now deletes stale zone files from previous
cell names sharing the same TLD (e.g. pic3.pic.ngo.zone when renaming
to pic2.pic.ngo), eliminating orphaned DNS entries.
_bootstrap_dns() now detects non-LAN domain modes and calls
update_split_horizon_zone() instead of apply_ip_range(), preventing
service records (api, calendar, files…) from being re-injected into
the DDNS parent zone on every container restart.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dashboard, Email, Calendar, and Files pages were building service URLs
with the internal LAN zone name (e.g. 'cell') instead of the public
effective domain (e.g. 'pic2.pic.ngo'), and always using http:// even
in DDNS mode where HTTPS is available.
Changes:
- Dashboard/Email/Calendar/Files: read effective_domain + domain_mode
from ConfigContext; use effective_domain in non-LAN mode and https://
for all DDNS domain modes.
- Calendar: show port 443 instead of 80 in DDNS mode.
- network_manager.update_split_horizon_zone: when the primary internal
zone name is a parent of the effective DDNS domain (e.g. pic.ngo is a
parent of pic2.pic.ngo), remove stale bootstrap service records (api,
calendar, files, mail, webmail, webdav) that pollute the DNS display
and would shadow public DNS responses.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In DDNS modes (pic_ngo, cloudflare, duckdns, http01), all built-in
services are now reachable as subdomains of the cell domain, e.g.
calendar.pic1.pic.ngo instead of pic1.pic.ngo/calendar.
Key changes:
- CaddyManager._build_core_service_routes(): new helper generates
Caddy named-matcher host blocks for calendar, mail/webmail, files,
webdav, and api subdomains within the wildcard TLS server block.
- All ACME modes (pic_ngo, cloudflare, duckdns) use the new
subdomain matchers; http01 emits a dedicated server block per service.
- http01: installed store-plugin services whose name clashes with a
core service are skipped to prevent duplicate server blocks.
- routes/config.py: ip_utils.write_caddyfile() is skipped in non-LAN
modes so LAN Caddy config never overwrites the ACME config.
- firewall_manager.generate_corefile(): new split_horizon_zones param
adds local authoritative file zones so LAN clients resolve
*.pic1.pic.ngo to the internal Caddy IP without hairpin NAT.
- NetworkManager.update_split_horizon_zone(): writes the wildcard zone
file and regenerates the Corefile with the split-horizon block;
called automatically after every identity change in non-LAN mode.
- Added @ to allowed record-name chars in update_dns_zone validation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three related fixes for split-tunnel peers that need to reach connected cells:
1. apply_peer_rules/apply_all_peer_rules now accept wg_subnet (actual local VPN
subnet) and cell_subnets (connected cells' vpn_subnets) parameters instead of
hardcoding 10.0.0.0/24. All callers (startup, add_peer, update_peer,
apply-enforcement endpoint) pass the real values.
2. Explicit ACCEPT rules are inserted in FORWARD for each connected cell's
subnet so split-tunnel peers (internet_access=False) can still reach
connected cells via the wg0→wg0 path.
3. apply_ip_range in network_manager now loads cell_links.json and passes it
to generate_corefile(), fixing a race where the bootstrap DNS thread could
overwrite the Corefile and wipe cross-cell DNS forwarding zones on startup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DNS A records now return the WireGuard server IP (10.0.0.1) instead of
Docker bridge VIPs so cross-cell peers resolve service names correctly
regardless of their bridge subnet. DNAT rules (wg0:53→cell-dns:53 and
wg0:80→cell-caddy:80) are applied at startup. Caddy routes by Host header,
eliminating the Docker bridge subnet conflict. Firewall cell rules allow
DNS and service (Caddy) traffic from linked cell subnets. Split-tunnel
AllowedIPs now dynamically includes connected-cell VPN subnets and drops
the 172.20.0.0/16 range. Peers with route_via set now receive full-tunnel
config (0.0.0.0/0) so all their traffic exits via the remote cell.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- API key store was out of sync with wg0.conf: get_keys() generated a random
phantom key instead of reading the actual WireGuard server key, so all peer
configs had the wrong PublicKey and could never handshake. Fixed by writing
correct raw-bytes key files at deploy time and adding _sync_wg_keys() to API
startup so the store auto-syncs from wg0.conf on every restart.
- apply_domain() fell back silently when zone file had no $ORIGIN directive;
now also parses the SOA MNAME as the old-domain fallback.
- apply_cell_name() only replaced the hostname if old_name matched literally
in the zone file; now auto-detects the actual hostname (non-service A record)
so a stale zone (mycell vs dev) is corrected on next config apply.
- DNS zone file corrected: SOA pic.ngo. admin.pic.ngo., mycell → dev.
- WireGuard UI: add 30s auto-poll for peer statuses; fix "peers currently
connected" counter to show online/total instead of total count.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_generate_zone_content writes records as "name TTL IN A value" but the
regex only matched "name IN A value" (no TTL), so renaming the cell
never updated the DNS hostname record. Updated regex to make TTL optional.
Also fixed the unit test zone fixture to use the actual generated format.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- network_manager: api/webui DNS records now point to Caddy (172.20.0.2)
instead of their container IPs so Caddy can reverse-proxy correctly
- ip_utils: add webui.dev block to generated Caddyfile
- config/caddy/Caddyfile: regenerated with webui.dev block
- config/dns/Corefile: simplify to single forward zone (remove duplicate)
- app.py peer_dashboard: rename peer_name→name, rx_bytes→transfer_rx,
tx_bytes→transfer_tx to match PeerDashboard.jsx; add service_urls dict
- app.py peer_services: fix DNS (10.0.0.1→real CoreDNS IP), CalDAV URL
(radicale.dev:5232→calendar.dev), email structure (flat→nested smtp/imap
objects), rename webdav→files, add WireGuard config text, add username field
- PeerDashboard.jsx: render service icon links from service_urls
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Settings: remove Save buttons; autosave is silent (no toast on success, error only)
- Settings: loadAll() resets dirty flags to prevent stale autosave after discard
- app.py: fix domain/ip_range "actually changed" check — full identity is always
sent on save so these were triggering pending on every keystroke regardless
- app.py: _dedup_changes handles port-change format "service field: old → new"
(split on ':' not ' changed') so dns_port changed twice shows one entry
- app.py: domain + cell_name changes now go through pending restart banner;
apply_domain/apply_cell_name write files immediately (reload=False) and set
pending; Discard restores zone files + Caddyfile to pre-change state
- app.py: _set_pending_restart captures pre-change snapshot BEFORE config writes
(was snapshotting after, making Discard a no-op)
- app.py: is_local_request reads /proc/net/route to allow the actual Docker
bridge subnet (172.0.0.0/24) which is not RFC-1918; fixes Containers page 403
- container_manager: get_container_logs raises instead of swallowing exceptions
so nonexistent container returns 500+error not 200+empty
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the loop broke after processing the first zone file it found.
If dev.zone already existed (created by apply_ip_range), it would be
processed and the loop would stop — leaving any other zone files (e.g.
cell.zone from an earlier domain) in place. get_dns_records() reads all
.zone files so the stale zone appeared doubled in the UI.
Fix: collect all non-local zone files first, write the target, then
delete every file that is not the current domain's zone.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sprint 1 — Security & correctness:
- Restore all 10 commented-out is_local_request() checks (vault, containers, images, volumes)
- Fix XFF spoofing: only trust the LAST X-Forwarded-For entry (Caddy's append), not all
- Require prefix length in wireguard.address (was accepting bare IPs like 10.0.0.1)
- Validate service_access list in add_peer (valid: calendar/files/mail/webdav)
- Fix dhcp/reservations POST/DELETE: unpack mac/ip/hostname from body (was passing dict as positional arg)
- Fix network/test POST: remove spurious data arg (test_connectivity takes no args)
- Fix remove_peer: clear iptables rules and regenerate DNS ACLs on deletion (was leaving stale rules)
- Fix CoreDNS reload: SIGHUP → SIGUSR1 (SIGHUP kills the process; SIGUSR1 triggers reload plugin)
- Remove local.{domain} block from Corefile template (local.zone doesn't exist, caused log spam)
- Fix routing_manager._remove_nat_rule: targeted -D instead of flushing entire POSTROUTING chain
Sprint 2 — State consistency:
- Atomic config writes in config_manager, ip_utils, firewall_manager, network_manager
(write to .tmp → fsync → os.replace, prevents truncated files on kill)
- backup_config: now also backs up Caddyfile, Corefile, .env, DNS zone files
- restore_config: restores all of the above so config stays consistent after restore
Sprint 3 — Dead code / documentation:
- Remove CellManager instantiation from app startup (was never called, double-instantiated all managers)
- Document routing_manager scope (targets host, not cell-wireguard; methods not called by any active route)
Sprint 4 — Test infrastructure:
- Add tests/conftest.py with shared tmp_dir, tmp_config_dir, tmp_data_dir, flask_client fixtures
- Add tests/test_config_validation.py: 400 paths for ip_range, port, wireguard.address validation
- Add tests/test_ip_utils_caddyfile.py: 14 tests for write_caddyfile (was completely untested)
- Expand test_app_misc.py: 7 new is_local_request tests covering XFF spoofing and cell-network IPs
- Add --cov-fail-under=70 to make test-coverage
- Add pre-commit hook that runs pytest before every commit
414 tests pass (was 372).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs caused DNS to fail when the domain name changes:
1. generate_corefile() hardcoded 'cell' as the zone name instead of
using the configured domain — on startup it would silently reset any
domain change back to 'cell'
2. apply_domain() regex replaced ALL non-dot zones (including local.cell)
with the new domain → duplicate zone blocks → CoreDNS crash
Fix: add a domain parameter to generate_corefile/apply_all_dns_rules,
add _configured_domain() helper in app.py, and delegate Corefile updates
in apply_domain() to generate_corefile() so the logic is in one place.
Also parameterise SERVICE_HOSTS ACL entries via the domain argument.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When ip_range changes in Settings, the new subnet is now applied to:
- DNS zone records (network_manager.apply_ip_range)
- Caddy virtual IPs (firewall_manager.ensure_caddy_virtual_ips)
- iptables per-service rules (firewall_manager.update_service_ips)
- docker-compose.yml static IPs if writable (ip_utils.update_docker_compose_ips)
New module ip_utils.py derives all container IPs from the subnet using
fixed offsets so the entire stack stays consistent from one setting.
321 tests pass (72 new tests added for ip_utils, apply_ip_range, update_service_ips).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- NetworkManager.bootstrap_dns_records(): creates A records for all
cell services (api, webui, calendar, files, mail, webmail, webdav,
<cell_name>) using their static container IPs — only runs when the
zone file doesn't exist yet (idempotent)
- API startup: _bootstrap_dns() thread reads cell_name/domain from
config_manager and calls bootstrap — runs alongside enforcement thread
- Fix: add_dns_record(data) and remove_dns_record(data) now correctly
unpack dict kwargs instead of passing dict as positional arg
- Fix: remove duplicate cell{} block in config/dns/Corefile
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Site-to-site WireGuard tunnels between PIC cells with automatic DNS forwarding.
Each cell generates an invite JSON (public key, endpoint, VPN subnet, DNS IP,
domain); the remote cell imports it to establish a bidirectional tunnel and
CoreDNS forwarding block so each cell's domain resolves across the mesh.
Backend:
- CellLinkManager: invite generation, add/remove connections, live WireGuard
handshake status; stores links in data/cell_links.json
- WireGuardManager: add_cell_peer() accepts subnet CIDRs (not /32) and an
optional endpoint for site-to-site peers; _read_iface_field() reads port,
address, and network directly from wg0.conf at runtime instead of constants
- NetworkManager: add/remove CoreDNS forwarding blocks per remote cell domain
- app.py: /api/cells/* routes; _next_peer_ip() derives VPN range from
configured address so peer allocation follows any address change
Frontend:
- CellNetwork page: invite panel (JSON + QR), connect form (paste JSON),
connected cells list (green/red status, disconnect button)
- App.jsx: Cell Network nav entry and route
Tests: 25 new tests across test_wireguard_manager, test_network_manager,
test_cell_link_manager (263 total)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Changes:
- ConfigContext.jsx: React context that loads /api/config once; exposes domain,
cell_name, refresh() — wraps entire app in App.jsx
- Email/Calendar/Files pages: replace hardcoded 'mail.cell', 'calendar.cell',
'files.cell', 'webdav.cell' with domain from ConfigContext; hostname updates
immediately after Settings save (refreshConfig() called on save)
- /api/status: cell_name and domain now read from stored _identity in config_manager,
not hardcoded 'personal-internet-cell' / 'cell.local'
- network_manager.apply_cell_name(old, new): updates hostname A-record in primary
zone file and reloads CoreDNS; called from PUT /api/config when cell_name changes
- Old identity captured before save so apply_cell_name gets the correct old value
- Settings EmailForm: smtp/imap ports are read-only with note (docker-compose.yml level)
- Settings FilesForm: port is read-only with note (Caddy proxies on 80 externally)
- Settings CalendarForm: port labeled "Internal port; clients use 80 via Caddy"
Tests added:
- test_apply_cell_name_renames_host_record
- test_apply_cell_name_noop_when_same
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
config_manager restore_config and import_config previously injected zero-filled
entries (port=0, domain='') for every service schema regardless of whether that
service was in the backup/import data. Removed this logic — only restore what's
actually in the backup.
network_manager.apply_domain now:
- updates dnsmasq.conf domain= line (reload cell-dhcp)
- rewrites Corefile zone blocks to the new domain name
- renames and rewrites the primary zone file $ORIGIN + SOA records
- reloads CoreDNS
Tests added first (TDD):
- test_restore_does_not_zero_unconfigured_services
- test_restore_does_not_zero_import
- test_apply_domain_updates_corefile (zone file + Corefile)
- test_apply_domain_updates_dnsmasq
- test_apply_config_writes_dhcp_range / ntp_servers
- test_apply_config_updates_mailserver_env / no_domain_no_restart
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each service manager now has apply_config() that writes to the actual config:
- network: dhcp_range → dnsmasq.conf (reload cell-dhcp), ntp_servers → chrony.conf
(restart cell-ntp), domain → dnsmasq.conf domain= line
- email: domain → mailserver.env OVERRIDE_HOSTNAME + POSTMASTER_ADDRESS,
restart cell-mail
- wireguard: port/address/private_key → wg0.conf ListenPort/Address/PrivateKey,
restart cell-wireguard
- calendar: port → radicale config hosts=, restart cell-radicale
PUT /api/config now calls apply_config() after persisting JSON, and returns
{restarted: [...], warnings: [...]} so Settings UI can show which containers
were restarted. _restart_container() helper added to BaseServiceManager.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Service manager fixes (connectivity tests):
- email_manager: replace telnet with socket.create_connection for SMTP/IMAP;
replace nslookup with socket.getaddrinfo for DNS; exclude unconfigured domain
from success (email healthy=False now correctly means ports refused, not missing domain)
- calendar_manager: replace localhost:5232 with cell-radicale:5232;
fix database check to test dir writability instead of file existence (files created on demand)
- file_manager: replace localhost:8080 with cell-webdav:80; add top-level success key
- network_manager: replace nslookup with socket.getaddrinfo;
add success key to dhcp_test and ntp_test return values
- routing_manager: exclude iptables_access from success
(iptables runs in cell-wireguard, not API container)
- wireguard_manager: add success key to no-arg test_connectivity result
Health history UI:
- SvcCol reads data?.status?.running || data?.status?.status — handles nested health check shape
Result: network/wireguard/calendar/files/routing/vault all healthy=True.
Email healthy=False is correct — mail server needs ≥1 account before Dovecot starts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix CoreDNS not loading .cell zones (wrong Corefile path, now uses -conf flag)
- Fix WireGuard server address conflict (172.20.0.1/16 overlapped with Docker
network; changed to 10.0.0.1/24 to eliminate duplicate routes)
- Add SERVERMODE=true and sysctls to WireGuard docker-compose for server mode
- Fix DNS zone file parser to handle 4-field records (name IN type value)
- Add get_dns_records() to NetworkManager; mount data/dns into API container
- Fix peer config endpoint: look up IP/key from registry, use real endpoint
- Add bulk peer statuses endpoint keyed by public_key
- Normalize snake_case API fields to camelCase in WireGuard UI
- Add port check endpoint (checks via live handshake, not unreliable TCP probe)
- Add Caddy virtual hosts for ui/calendar/files/mail .cell domains (HTTP only)
- Fix cell config domain default from cell.local to cell
- Fix Routing Network Config tab (was calling hardcoded localhost:3000)
- Fix DNS records display (record.value not record.ip)
- Move service access guide to top of Dashboard with login hints
- Add /api/routing/setup endpoint
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>