Add AuditManager (api/audit_manager.py): JSONL append-only log at
data/api/audit/audit.log with SHA-256 hash chain for tamper detection,
verify endpoint, size-based rotation, and automatic redaction of secret
fields before any entry is written. Supports structured query (actor,
action, date range) and CSV export.
Wire an @app.after_request hook in app.py that fires on every mutating
/api/* request: captures actor, role, remote IP, and maps the route +
method to a human-readable action via ROUTE_ACTION_MAP. Explicit audit
entries for password_change and password_reset are added in
auth_routes.py so those events record the actor without logging secret
values.
Expose an admin-only blueprint (api/routes/audit.py):
GET /api/audit — paginated query
GET /api/audit/export — CSV download
GET /api/audit/verify — hash-chain integrity check
Register AuditManager in managers.py and add api/audit to
config_manager.py critical_data_paths so it is included in backups and
restored with other persistent state.
Add Activity page (webui/src/pages/Activity.jsx, admin-only) reachable
from the nav in App.jsx. New auditAPI helper in api.js covers all three
endpoints.
Tests: test_audit_manager.py (unit: hash chain, redaction, rotation,
query, csv, verify) and test_audit_hook_routes.py (integration: hook
fires on mutating routes, skips safe methods, records actor/ip/action,
backup-inclusion assertion).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Root causes fixed:
- Dead LOG_LEVEL globals() lookup pinned root logger at INFO regardless of
PIC_LOG_LEVEL env or config; replaced with _resolve_root_log_level() +
apply_root_log_level() which sets both root logger and all attached handlers
at startup and on runtime re-apply.
- set_service_level() only set the named 'pic.<service>' logger; bare module
loggers (e.g. 'caddy_manager') were never reached, so per-service log files
stayed 0 bytes. Fixed via _SERVICE_MODULE_LOGGERS map covering all managers.
- Log viewer GET /api/logs had no level filter; added ?level= query param.
- Per-service log levels lived in an out-of-band config/api/log_levels.json
side-file with no validation; migrated into ConfigManager under a new
'logging' section ({python:{root,services}, containers:{caddy,coredns,
wireguard,mailserver,api}}) with get/set helpers, invalid-level rejection,
and one-time migration from the old file on first load.
New capabilities:
- Container log levels: Caddy (injects global log { level X } + hot reload),
CoreDNS (DEBUG enables log plugin, else errors-only), WireGuard/mailserver
via pending_restart path.
- PUT /api/logs/verbosity accepts {python, containers} dict; returns per-entry
applied:hot|pending_restart status.
- Webui Logs page gains two-section Verbosity tab (Python services + Container
services) with needs-restart badges.
- managers.py wires per-service loggers before manager instantiation and
re-applies persisted levels from ConfigManager; legacy log_levels.json read
removed.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
apply_routes now iterates over connection instances rather than types:
each instance gets its own fwmark, routing table, interface, and
redirect_port via _routing_connections / _resolve_peer_connection /
_apply_connection_for_src; kill-switch is enforced per iface-instance.
Old per-type MARKS/TABLES constants are kept only as migration scaffolding.
peer_registry: exit_via is now stored as a connection id (or 'default');
_migrate_exit_via_to_connection_id runs on _load_peers to upgrade legacy
type-string values; set_peer_exit_via validates against known connection
ids; VALID_EXIT_VIA removed; config_manager wired in from managers.py.
egress_manager: egress_overrides keyed by service_id → connection_id;
local MARKS/TABLES/EXIT_TYPES/_REDIRECT_PORTS/_add_tor_redirect removed;
(mark, table, redirect_port) resolved at apply-time via
connectivity_manager.get_connection; manifest egress.allowed still
enforced by connection type.
api/app.py + api.js: PUT peer/service exit endpoints accept {connection_id};
back-compat shim resolves a legacy type string to its single active instance.
Tests extended: two same-type instances produce distinct marks/tables/ports;
peer exit_via and egress override id migrations round-trip correctly;
single-instance behaviour is equivalent to the old type-keyed path.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
VPN clients got dns_probe_finished_bad_config / couldn't resolve any domain
after first setup because:
1. complete_setup() never wrote the split-horizon DNS zone for non-LAN modes;
SetupManager now accepts network_manager as an optional 3rd constructor
param, and complete_setup() calls
self.network_manager.update_split_horizon_zone(effective_domain, wg_ip,
primary_domain) for pic_ngo/cell_to_cell modes.
2. generate_corefile() used a tmp-file + os.replace pattern; the Corefile is
a Docker FILE bind-mount, so os.replace orphaned the inode and CoreDNS
never saw config updates. Fixed by truncating and rewriting in place
(open with 'w', seek(0), truncate()), preserving the inode CoreDNS holds.
api/managers.py passes network_manager into SetupManager.
Tests: new mock_network_manager fixture, 2 setup-zone tests, 1 inode
regression test in test_firewall_manager.py.
Verified live on pic1.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The peer table was empty because it was not consulting the peer registry;
now peers are driven by PeerRegistry so the Connectivity page reflects actual
connected cells.
Exit-key handling is unified: all code paths now use the same key derivation
so a store-service exit bridge and a manual WireGuard peer both produce
consistent routing state.
Two new egress exit types are added (sshuttle via SSH tunnel and proxy via
redsocks SOCKS5), wiring through connectivity_manager, egress_manager, and
app.py routes. This lets a cell route its traffic through an SSH host or a
SOCKS5 proxy as an alternative to WireGuard exit nodes.
ServiceStoreManager and ServiceBus updated so the egress lifecycle (install /
uninstall) is cleanly signalled between components.
Connectivity.jsx gains the Service Egress section, letting operators assign
and reassign egress methods from the UI without touching config files.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Fix#2: Move DDNS bearer token from cell_config.json to data/api/ddns_token.
Token is now in the secrets store (data/) rather than the config store (config/).
Auto-migrates existing installs on first access. ConfigManager.get/set_ddns_token()
added. set_ddns_config() now strips 'token' key to prevent it leaking back.
- Fix#3: Set Caddyfile permissions to 0o600 after write so the token embedded
in the Caddyfile is not world-readable on the host filesystem.
- Fix#5: Heartbeat now fires IDENTITY_CHANGED after re-registration so Caddy
regenerates its config with the new token automatically — users no longer need
to click Re-register in Settings after a wizard registration failure.
Also: heartbeat skips the 401-cycle when no token exists and goes straight to
registration instead. DDNSManager now accepts service_bus= and is wired up.
- Fix#6: Settings page starts polling GET /api/caddy/cert-status every 15s
after a successful DDNS re-registration and shows "Acquiring certificate…"
feedback until Let's Encrypt issues the cert (up to 5 minutes).
- Fix#7: regenerate_with_installed() is debounced (5 s window) so two rapid
IDENTITY_CHANGED events (e.g. wizard + heartbeat) can't start simultaneous
ACME orders that interfere with each other.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- DNS (critical): add _configured_dns_params() that returns (primary_domain,
split_horizon_zones) from config_manager so all apply_all_dns_rules() callers
pass the correct primary zone (e.g. 'pic.ngo') and split-horizon list
(e.g. ['pic1.pic.ngo']) instead of the FQDN as the primary — fixes
DNS_PROBE_FINISHED_BAD_CONFIG for all external domains when on VPN
- firewall_manager: add split_horizon_zones param to apply_all_dns_rules()
and forward it to generate_corefile()
- Peers: filter service_access list to installed services only; peers.py
derives valid services from config_manager.get_installed_services() with
the email→mail ID mapping; Peers.jsx fetches from /api/store/installed
and filters the checkboxes and defaults accordingly
- Health check: fix file_manager→'files' ID mapping so files service health
is checked when installed (was silently skipped due to 'file' vs 'files')
- Verbosity persistence: move log_levels.json from non-mounted
/app/api/config/ to CONFIG_DIR (/app/config/) which maps to config/api/
on the host; both load (managers.py) and save (routes/services.py) updated
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When the email store service is installed but no explicit domain has been
set in its config, _provision_email now falls back to
config_manager.get_effective_domain() so peer account creation works
immediately without requiring a separate config step.
Also threads config_manager into AccountManager.__init__ (optional kwarg,
no existing callers break) so the fallback is available without a global
import.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Routes outbound traffic from installed service containers through
alternate exits (wireguard_ext, openvpn, tor) using host-side
iptables fwmark policy-routing in a dedicated PIC_EGRESS chain.
Marks 0x110/0x120/0x130 are distinct from ConnectivityManager's
0x10/0x20/0x30. Container IPs discovered at runtime via docker
inspect. Wired into ServiceStoreManager install/remove lifecycle
and managers.py singleton. 22 new tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Services are now installed post-setup from the Store page, so the
wizard step that let users pre-select email/calendar/files is removed.
Reduces wizard from 5 steps to 4 (Step4Services deleted, Step5Review
renamed to Step4Review). Backend drops services_enabled validation,
background install thread, and service_store_manager dependency.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Email/calendar/files routes now return 404 when the service is not
installed, using a require_active_service decorator that checks
ServiceRegistry. Status endpoints are exempt so health checks always work.
SetupManager.complete_setup() now accepts a service_store_manager and
installs any wizard-selected services in a background daemon thread after
setup completes. Failures are logged but do not fail the wizard.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ServiceStoreManager.install() now delegates container lifecycle to
ServiceComposer (per-service docker-compose.yml) instead of appending to a
shared compose override. This eliminates IP pool allocation, compose override
rendering, and the single-stack docker exec approach.
Changes:
- service_composer.py: add _resolve_requires(), _resolve_dependents(),
reapply_active_services() — dependency graph and startup reapply
- service_store_manager.py: rewrite install() and remove() to use
ServiceComposer; add _fetch_template(); delete _allocate_service_ip(),
_render_compose_override(), _write_compose_override(); remove() now guards
against removing services that others depend on
- managers.py: pass service_composer= to ServiceStoreManager
- Tests: 13 new composer dep tests; TestInstall/TestRemove rewritten for
the new composer-driven path; test_optional_services_feature.py updated
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously, CaddyManager and NetworkManager contained hardcoded lists of
service names (calendar, files, mail, webdav, etc.), meaning every new
service required a code change to appear in Caddy routes and DNS records.
Now both managers accept a service_registry parameter and derive their
service lists dynamically from the registry at runtime.
- CaddyManager: new _build_registry_service_routes() and
_http01_service_pairs() methods pull routes from the registry
- NetworkManager: new _get_service_subdomains() method returns registry
subdomains with a hardcoded fallback when no registry is wired in;
_build_dns_records, stale-record detection, and service name sets all
use the registry
- managers.py: service_registry constructed before network_manager so it
can be injected into both CaddyManager and NetworkManager
- service_registry.py: validation chokepoint in get_caddy_routes() rejects
invalid subdomain/backend values and reserved service names
- service_store_manager.py: _validate_manifest now validates top-level
subdomain, backend, extra_subdomains, and extra_backends fields
- tests: 24 new tests covering registry-driven routing and DNS subdomain
generation (test_caddy_registry_integration.py)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ConfigManager.get_effective_domain(): returns domain_name when DDNS
active (pic_ngo/cloudflare/duckdns), domain otherwise. Used by all
public-facing services so they use the real registered FQDN.
- ConfigManager.get_internal_domain(): always returns _identity.domain
(CoreDNS zone name, dnsmasq, cell-link invites — stays internal).
- Silent migration: if domain_mode != lan and domain is generic "cell",
auto-set to {cell_name}.local for unique CoreDNS zone naming.
- caddy_manager: fix custom_domain bug — cloudflare/http01 modes were
reading identity.get('custom_domain') which never exists; now reads
domain_name correctly.
- routes/config, app: expose effective_domain in GET /api/config and
/api/status responses.
- email_manager, routes/email: use get_effective_domain() for
OVERRIDE_HOSTNAME, POSTMASTER_ADDRESS, and new-user email defaults.
- ServiceBus.IDENTITY_CHANGED event: emitted from PUT /api/config and
POST /api/ddns/register after identity writes; caddy_manager and
email_manager subscribe to regenerate config automatically.
- Settings.jsx: hide Local Domain input in non-LAN modes; show
read-only effective_domain with "managed by DDNS" badge and an
Advanced toggle for the internal CoreDNS zone name.
- 11 new test classes covering all new helpers, event subscriptions,
caddy/email handlers, and the custom_domain fix.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A3 — Peer add atomicity: track firewall_applied flag and call
clear_peer_rules() during rollback so partial peer-add failures
don't leave stale iptables rules behind. Added test.
A2 — Pending config flag: instead of clearing before spawning the
helper container (fire-and-forget), set applying=True and let the
helper clear it on success by writing to cell_config.json via a
mounted /app/data volume. On API restart after a failed apply,
_recover_pending_apply() resets the applying flag so the UI shows
pending changes and the user can retry. GET /api/config/pending now
includes the applying field.
A5 (foundation) — Extract all manager instantiation into managers.py.
app.py re-exports every name so existing test patches (patch('app.X'))
continue to work unchanged. 1021 unit tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>