Changelog

All notable changes to Watchgrid are documented here. Format follows Keep a Changelog.


[Unreleased]

[1.27.0] - 2026-05-18

Added

  • System → About page (/system/about) consolidates versions, control-plane endpoints, container CPU/memory, Postgres health, and live tenant counts into a single screen. CPU and memory are sourced from cgroup v2 (/sys/fs/cgroup/cpu.stat + memory.current / memory.max) with cgroup v1 and host /proc fallbacks, so the percentages reflect the actual server container — not the host VM. Database card runs a 2 s PingContext, surfaces ping latency, Postgres version, the live pg_stat_activity connection count for current_database(), and the sql/pgx pool's open/in-use/idle counters. The Resources card auto-refreshes every 5 s; the rest of the page reloads on demand via the Refresh button. New endpoint GET /api/system/stats powers the card and is documented in Swagger.
  • Configurable session timeout + IP-bound sessions — new system_settings table (migration 022) stores session_ttl_minutes (default 60 minutes, was hardcoded at 24 hours) and session_bind_ip (default on). Admins manage both from a new "Session" card at the top of System → Users. JWTs now carry an ip claim derived from X-Forwarded-For / X-Real-IP / RemoteAddr; requireAuth rejects requests whose client IP differs from the claim when IP binding is on. Tokens minted before this release have an empty IP claim and are grandfathered through the IP check so existing sessions aren't yanked at deploy time. Auth cookie Max-Age and the registry-token expires_in now both follow the configured TTL. Backed by GET /api/system/settings (any authed user) and PUT /api/system/settings (admin only) with a 1-minute in-memory cache. Registry tokens are deliberately not IP-bound — docker daemon traffic comes over a separate connection.
  • Per-repository sync feedback in App Store: Sync All now fires every repository in parallel via Promise.all, and each repository row shows its own inline state (Syncing… with a pulsing dot, Synced at HH:MM:SS in green, Sync failed: <reason> in red) instead of a single blocking browser alert at the end. The aggregate toast tells you "synced N repositories" or "X of N failed — see row for details."
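
The IP-binding rule from the session-timeout entry above can be sketched roughly as follows. This is an illustration, not the shipped requireAuth code: the function names pickClientIP, clientIP, and ipBindingOK are hypothetical, and trusting the first X-Forwarded-For element is an assumption that only holds behind a trusted reverse proxy.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strings"
)

// pickClientIP applies the precedence the entry lists: X-Forwarded-For,
// then X-Real-IP, then the connection's RemoteAddr.
// Assumption: the first X-Forwarded-For element is the original client,
// which is only safe when a trusted reverse proxy sets the header.
func pickClientIP(forwardedFor, realIP, remoteAddr string) string {
	first, _, _ := strings.Cut(forwardedFor, ",")
	if ip := strings.TrimSpace(first); ip != "" {
		return ip
	}
	if ip := strings.TrimSpace(realIP); ip != "" {
		return ip
	}
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return remoteAddr // no port to strip
	}
	return host
}

// clientIP is the net/http-facing wrapper around pickClientIP.
func clientIP(r *http.Request) string {
	return pickClientIP(r.Header.Get("X-Forwarded-For"), r.Header.Get("X-Real-IP"), r.RemoteAddr)
}

// ipBindingOK reports whether a request may proceed. Tokens with an empty
// ip claim (minted before 1.27.0) are grandfathered through the check.
func ipBindingOK(bindIP bool, claimIP, requestIP string) bool {
	if !bindIP || claimIP == "" {
		return true
	}
	return claimIP == requestIP
}

func main() {
	// The URL is a placeholder; only the headers and RemoteAddr matter here.
	r, err := http.NewRequest(http.MethodGet, "http://watchgrid.invalid/api/devices", nil)
	if err != nil {
		panic(err)
	}
	r.RemoteAddr = "10.0.0.7:51234"
	r.Header.Set("X-Forwarded-For", "203.0.113.9, 10.0.0.1")

	ip := clientIP(r)
	fmt.Println("client IP:", ip)
	fmt.Println("same IP allowed:", ipBindingOK(true, "203.0.113.9", ip))
	fmt.Println("other IP allowed:", ipBindingOK(true, "198.51.100.4", ip))
	fmt.Println("legacy token allowed:", ipBindingOK(true, "", ip))
}
```

Keeping the header precedence in a pure function makes the rule easy to test without constructing requests.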

Changed

  • SSH-key repository auth is now stored in the database, not as a filesystem path — the Add Repository form's "SSH Key Path" single-line input is replaced with a "SSH Private Key" textarea that accepts the PEM key contents directly. The server writes the value to a per-sync os.CreateTemp file with 0600 permissions, hands the path to git via GIT_SSH_COMMAND=ssh -i <tmp> -o IdentitiesOnly=yes -o StrictHostKeyChecking=no, and defers the file removal. The repository list/create API responses now run through a redactedRepo helper that blanks password and ssh_key and emits has_password / has_ssh_key booleans — secrets that the API previously echoed back to every caller are no longer disclosed.
  • Notifications are uniformly toasts — the inline mil-banner-success / mil-banner-error blocks and native alert() / confirm() dialogs were swept out of DNS, SSO, PKI, App Store, Audit Log, Inventory, License Management, Firewall, Tenants, User Management, Registry Manager, K8s Device Panel, App Config Modal, Host Row, and Cluster Row in favour of the existing ToastProvider API and ConfirmProvider modal. Persistent page-state banners (initial-load errors, license-summary status card, permission-restriction notices, delete-confirm modal warnings) were intentionally left in place — those are page state, not transient notifications. Pre-auth flows (Login, 2FA setup, onboarding wizard) keep their inline banners since there's no ToastProvider mounted before login.
  • Sidebar redesign — the Hosts Overview page header lost its Sites eyebrow so it matches the flat title pattern used everywhere else (DNS Management, Single Sign-On, User Management, etc.). The main left sidebar is narrower (w-56 → w-52), with tighter row padding (px-6 py-3 → px-5 py-2.5) and a smaller logo cell; system submenu items use px-8 py-2. Every nav label has whitespace-nowrap so long items like "Site Management" and "Pending Approvals" never wrap. The inner Sites column (visible on /inventory) is also narrower (w-60 → w-48), drops the "List" subheading, replaces the rounded "card" treatment with the flat left-accent-bar style used in the main nav, and truncates long site names. The Sites column no longer appears on /sites (Site Management) where it was redundant with the page content.
  • Sites workspace collapse arrow looks like a macOS sidebar toggle — the chevron-in-a-bordered-pill (|<|) was reading as "something hidden between two vertical lines" (h/t Joël for the feedback). Now a borderless SF-Symbols-style sidebar icon (rounded rect + inner divider) with a subtle green pane fill when expanded, and a hover-only background. The toggle column lost its border-r so it no longer visually frames the icon.
  • Version stickers moved off the chrome — the v1.26.x / Agent v1.26.x strip under the WatchGrid logo was visual noise. Both lines are gone from the sidebar; versions now live exclusively in System → About, where Server, Agent, uptime, and platform sit on the Versions card.
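
The per-sync key handling described in the SSH-key entry above could look roughly like this. It is a sketch under assumptions: writeTempKey, isPrivate, and gitSSHCommand are hypothetical names, the key text is a placeholder, and the example only builds the environment rather than invoking git. os.CreateTemp creates the file with mode 0600 before umask, which is what the entry relies on.

```go
package main

import (
	"fmt"
	"os"
)

// writeTempKey writes PEM key material to a private temp file and returns
// its path plus a cleanup func the caller should defer.
func writeTempKey(pem string) (path string, cleanup func(), err error) {
	f, err := os.CreateTemp("", "watchgrid-repo-key-*") // prefix is illustrative
	if err != nil {
		return "", nil, err
	}
	cleanup = func() { os.Remove(f.Name()) }
	if _, err := f.WriteString(pem); err != nil {
		f.Close()
		cleanup()
		return "", nil, err
	}
	if err := f.Close(); err != nil {
		cleanup()
		return "", nil, err
	}
	return f.Name(), cleanup, nil
}

// isPrivate reports whether the file grants no group or other permissions.
func isPrivate(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.Mode().Perm()&0o077 == 0
}

// gitSSHCommand builds the GIT_SSH_COMMAND value quoted in the entry.
// Assumption: the temp path contains no spaces, so no shell quoting is added.
func gitSSHCommand(keyPath string) string {
	return "ssh -i " + keyPath + " -o IdentitiesOnly=yes -o StrictHostKeyChecking=no"
}

func main() {
	path, cleanup, err := writeTempKey("-----BEGIN OPENSSH PRIVATE KEY-----\n(placeholder)\n-----END OPENSSH PRIVATE KEY-----\n")
	if err != nil {
		panic(err)
	}
	defer cleanup() // the key never outlives the sync

	fmt.Println("private file:", isPrivate(path))
	env := append(os.Environ(), "GIT_SSH_COMMAND="+gitSSHCommand(path))
	fmt.Println(env[len(env)-1])
}
```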

Security

  • Default session lifetime cut from 24 hours to 1 hour. Combined with the new IP binding (also on by default), a stolen token expires within an hour and is rejected as soon as it is presented from a different client IP.
  • Repository credentials redacted in API responses: GET /api/repositories previously returned the raw password and ssh_key fields to any authenticated caller. The new redactedRepo helper blanks both and emits boolean has_password / has_ssh_key flags instead. POST responses use the same redaction.
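
A minimal sketch of what a redaction helper of this kind might do is shown below. Only the helper name and the four JSON field names (password, ssh_key, has_password, has_ssh_key) come from the entries above; the Repository struct, its other fields, and the body of the function are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Repository is a hypothetical shape for a repository API response.
type Repository struct {
	ID          int    `json:"id"`
	URL         string `json:"url"`
	Password    string `json:"password"`
	SSHKey      string `json:"ssh_key"`
	HasPassword bool   `json:"has_password"`
	HasSSHKey   bool   `json:"has_ssh_key"`
}

// redactedRepo returns a copy that is safe to serialise: secrets are
// blanked and replaced by presence flags. The argument is passed by value,
// so the caller's record keeps its secrets for later sync runs.
func redactedRepo(r Repository) Repository {
	r.HasPassword = r.Password != ""
	r.HasSSHKey = r.SSHKey != ""
	r.Password = ""
	r.SSHKey = ""
	return r
}

func main() {
	stored := Repository{ID: 1, URL: "git@git.invalid:apps.git", SSHKey: "(private key material)"}
	out, err := json.Marshal(redactedRepo(stored))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```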

Fixed

  • docker-compose.ui-test.yml failed to start after the JWT-length validator landed — the committed JWT_SECRET was 25 characters, but the server now rejects anything under 32. The committed value is now a 60-char dev-only secret. The same change also seeds WATCHGRID_ALLOWED_ORIGINS with all four localhost:5173 / 5174 and 127.0.0.1:5173 / 5174 variants so a browser opened on the Vite default origin gets through CORS instead of seeing "Network error" after the in-browser fetch is 403'd. Additionally, WATCHGRID_DEV_MODE=true now auto-appends those four origins inside the server (allowedBrowserOrigins in ws_security.go), so a fresh checkout works without any env-var dance.
  • docs/local-ui-testing.md pointed at the wrong Vite port — was 5174, is actually 5173.

Database migrations

  • 022_system_settings.sql — new system_settings (key, value, updated_at, updated_by) table seeded with session_ttl_minutes=60 and session_bind_ip=true. Mirrored into scripts/init-db.sql for clean installs.

[1.26.7] - 2026-05-14

Fixed

  • Cluster-agent pod restart wiped Watchgrid's view of installed apps — the cluster-agent's proxy route map is in-memory, and after kubectl rollout restart deploy/watchgrid-cluster-agent -n watchgrid-system the next heartbeat carried exposed_apps: []. ClusterRow.jsx reads /api/clusters/apps which just returns the device's reported exposed_apps, so the UI offered the apps as available-to-install even though their Deployments and Services were still running in the cluster. With the new heartbeat-driven DNS sync from 1.26.6 it got worse — the empty array also caused syncClusterDNSRecords to delete the cluster's DNS rows. Fix is two-sided: (1) cluster-agent/commands.go handleDeploy now stamps annotations on the Service after kubectl apply (app.watchgrid.io/hostname, /expose-port, /protocol, /name) — the server already set app.watchgrid.io/managed-by=watchgrid and /name as labels via addAppLabels, but those alone weren't enough to rebuild a route because the port and protocol were lost; (2) new cluster-agent/rediscover.go runs once at startup after WireGuard comes up, lists every Service cluster-wide carrying the managed-by label, and rebuilds the in-memory proxy map from the annotations. Includes a fallback path for apps deployed before this fix shipped: hostname is derived from BuildDNSHostname(appName) and port falls back to svc.Spec.Ports[0].Port — good enough for the bundled demo apps, and gets superseded by exact annotations on the next redeploy. After this fix lands, the next heartbeat repopulates exposed_apps, the 1.26.6 DNS sync re-upserts records, and the UI shows the apps as installed again — automatically, without any manual redeploy.
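
The annotation-first, fallback-second rebuild described above might be sketched like this, without the Kubernetes client. The Route type and routeFromService are hypothetical, the fallbackHost parameter stands in for BuildDNSHostname (whose real behaviour is not shown in this changelog), and defaulting the protocol to http is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

const annotationPrefix = "app.watchgrid.io/"

// Route is a hypothetical in-memory proxy route.
type Route struct {
	Hostname string
	Port     int
	Protocol string
}

// routeFromService rebuilds one route from a Service's annotations.
// firstPort plays the role of svc.Spec.Ports[0].Port for Services that
// were deployed before the annotations existed.
func routeFromService(annotations map[string]string, appName string, firstPort int, fallbackHost func(string) string) Route {
	rt := Route{
		Hostname: annotations[annotationPrefix+"hostname"],
		Protocol: annotations[annotationPrefix+"protocol"],
		Port:     firstPort,
	}
	if p, err := strconv.Atoi(annotations[annotationPrefix+"expose-port"]); err == nil && p > 0 {
		rt.Port = p // exact annotation wins over the fallback
	}
	if rt.Hostname == "" {
		rt.Hostname = fallbackHost(appName)
	}
	if rt.Protocol == "" {
		rt.Protocol = "http" // assumption: not specified in the changelog
	}
	return rt
}

func main() {
	// Placeholder hostname derivation; not the real BuildDNSHostname.
	derive := func(app string) string { return app + ".wg" }

	annotated := map[string]string{
		annotationPrefix + "hostname":    "hello-web.wg",
		annotationPrefix + "expose-port": "8080",
		annotationPrefix + "protocol":    "http",
	}
	fmt.Printf("%+v\n", routeFromService(annotated, "hello-web", 80, derive))
	fmt.Printf("%+v\n", routeFromService(nil, "nginx-demo", 80, derive))
}
```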

[1.26.6] - 2026-05-14

Fixed

  • Cluster-agent DNS registration calls were 401-ing: cluster-agent/dns.go POSTed proxy hostnames to /api/dns/records, but that endpoint is gated by requireAuth (admin JWT) and the cluster-agent has no token, so every deploy logged DNS register failed for {hostname} after 3 attempts: status 401 (operators saw this for nginx-demo---local, hello-web, etc.). The Kubernetes/Proxy UI's "Open WebUI" link reverse-proxies via /api/k8s/service-proxy/... and dials dev.WireGuard.TunnelIP:8081 directly, so this didn't block that flow — but the *.wg hostnames advertised on the cluster's proxy never had matching DNS records, so anyone trying to resolve {app}.wg directly (over the VPN) got NXDOMAIN. The cluster-agent already publishes its current proxy routes in the exposed_apps array of every heartbeat, so the server now owns DNS state: a new syncClusterDNSRecords function in server/main.go upserts a custom_dns_records row for each advertised hostname (pointing at the cluster's tunnel IP) and deletes rows for hostnames the cluster previously claimed but no longer reports. Per-cluster ownership is tracked in an in-memory map, with a safety guard so we only delete records whose IP still matches the cluster's tunnel — operators who overwrote a record manually won't have it clobbered. cluster-agent/dns.go was deleted; commands.go and main.go no longer wire a DNSManager. Stale rows from before this fix linger until the same cluster reasserts a smaller exposed_apps set; clean them up by hand via DELETE FROM custom_dns_records WHERE ip_address = '<cluster tunnel ip>' AND hostname NOT IN (...) if needed.
  • Registry Test Web couldn't register a proxy route because its metadata had no expose_port: apps/registry-test-web/metadata.yaml was missing expose_port and expose_protocol, so when the cluster-agent's deploy handler read the metadata it had nothing to add to the proxy and never called proxy.AddRoute(...). Symptom: the agent log showed Command deploy completed: Deployed Registry Test Web: ... but no matching Proxy route added line, and clicking the app in the UI did nothing because the cluster's exposed_apps heartbeat array never contained it. Set expose_port: 80 and expose_protocol: http on the metadata to match the Service definition in deployment.yaml. (Note: the deployment image is pulled from registry.wg:5000/..., which still requires containerd registry-mirror configuration on each cluster node — that's a separate problem if image pulls fail.)
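
The heartbeat-driven reconciliation from the DNS entry above can be illustrated with an in-memory model. This is a sketch, not syncClusterDNSRecords itself: the dnsSync type and its maps are hypothetical stand-ins for the custom_dns_records table and the per-cluster ownership map, and all names and addresses are placeholders.

```go
package main

import (
	"fmt"
	"sort"
)

// dnsSync is a hypothetical in-memory model of the server-owned DNS state.
type dnsSync struct {
	records map[string]string          // hostname -> IP (stand-in for custom_dns_records)
	owned   map[string]map[string]bool // cluster ID -> hostnames it last advertised
}

func newDNSSync() *dnsSync {
	return &dnsSync{records: map[string]string{}, owned: map[string]map[string]bool{}}
}

// sync upserts a record for every advertised hostname and deletes records
// the cluster previously claimed but no longer reports. The safety guard
// skips deletion when the record's IP no longer matches the tunnel IP,
// so a manual override is left alone.
func (s *dnsSync) sync(clusterID, tunnelIP string, exposed []string) (deleted []string) {
	current := make(map[string]bool, len(exposed))
	for _, host := range exposed {
		current[host] = true
		s.records[host] = tunnelIP
	}
	for host := range s.owned[clusterID] {
		if current[host] {
			continue
		}
		if s.records[host] == tunnelIP {
			delete(s.records, host)
			deleted = append(deleted, host)
		}
	}
	s.owned[clusterID] = current
	sort.Strings(deleted) // map iteration order is random; sort for stable output
	return deleted
}

func main() {
	s := newDNSSync()
	s.sync("cluster-1", "10.8.0.5", []string{"hello-web.wg", "nginx-demo.wg"})
	s.records["nginx-demo.wg"] = "192.0.2.10" // operator override
	fmt.Println("deleted:", s.sync("cluster-1", "10.8.0.5", nil))
	fmt.Println("remaining:", s.records)
}
```

The second call also shows why an empty exposed_apps heartbeat is destructive in this design: every record the cluster still owns at its tunnel IP is removed.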

[1.26.5] - 2026-05-14

Fixed

  • K8s reverse proxy showed "cluster device not reachable" because the cluster-agent never brought up WireGuard: clusterProvisionHandler generated the cluster-agent ConfigMap with WATCHGRID_DISABLE_WIREGUARD: "true" baked in. That dates back to when the cluster-agent shipped (commit 22d6167) and the only traffic it needed was HTTPS to the public server URL for registration/heartbeats/commands. The K8s service-proxy feature added later (k8sServiceProxyHandler in server/main.go) reverse-proxies into cluster workloads by dialing the cluster-agent's :8081 endpoint at dev.WireGuard.TunnelIP, so a cluster-agent without a tunnel always fails with "cluster device not reachable" (server/main.go:3624-3626). The provisioning manifest now sets WATCHGRID_DISABLE_WIREGUARD: "false" (with an inline comment explaining when to disable it). Prerequisites — /dev/net/tun mount and NET_ADMIN capability — were already in the manifest, so userspace WireGuard via wireguard-go works without any other changes. If WG setup fails on a given node, the cluster-agent logs WireGuard setup failed: ... (continuing without VPN) and proceeds, so the failure mode is graceful. Operators with already-onboarded clusters need to regenerate the cluster manifest in the UI and re-apply it on the cluster so the cluster-agent pod restarts with the new env var.

[1.26.4] - 2026-05-14

Fixed

  • App deploys to newly-onboarded clusters failed silently because the cluster row was never written to devices: registerHandler upserted the device into the in-memory devices map and returned 200, but never called dbSaveDevice. Heartbeats from a registered device take the UPDATE-only path (dbUpdateHeartbeat), so a missing row stays missing forever; zero rows affected, no error. The cluster therefore showed up in /api/clusters (which reads the in-memory map) and in the UI, but cluster_commands.cluster_id has a foreign key on devices(id) ON DELETE CASCADE, so the first INSERT INTO cluster_commands (...) from clusterDeployHandler → dbEnqueueClusterCommand was rejected by Postgres and the deploy never reached the cluster-agent. Symptom on the rob.trial server: an Omni cluster heartbeating with K8s info visible and 18 pods reported, but the user could not install any app onto it. registerHandler now persists the device with dbSaveDevice immediately after upserting the in-memory map (same pattern the heartbeat auto-register path already uses), so the row exists by the time any FK-dependent insert runs. Operators with already-onboarded clusters that hit this bug will need to delete/re-apply the cluster manifest (or restart the cluster-agent pod) so it re-issues /api/register against the patched server.

[1.26.3] - 2026-05-07

Fixed

  • Onboarding & cluster manifest URLs honor the real client scheme behind Caddy — the frontend nginx forwarded X-Forwarded-Proto: $scheme to the Go server, but $scheme is the scheme nginx itself listens on (port 80, so always http). That overwrote the https value Caddy was already setting on the inbound request, so forwardedProto(r) returned http and the onboarding "Reprovision Existing Device" / "Basic Installation" / "Installation With Kubernetes" curl snippets — plus the server_url baked into the cluster manifest — all came out as http://... even when the operator was on HTTPS. New map $http_x_forwarded_proto $forwarded_proto block at the top of frontend/nginx.conf falls back to $scheme only when the upstream proxy didn't set the header; all three proxy_set_header X-Forwarded-Proto ... lines (/api/, /downloads/, \.sh$) now use that variable.
  • Duplicated --siteid flag in onboarding commands — the backend getOnboardingInfoHandler appends --siteid <id> whenever the request carries site_id=…, and Inventory.jsx was also wrapping the returned strings with siteScopedCommand() which appended the same flag again. The deduplication regex inside siteScopedCommand was broken ("+? required at least one literal quote, which the backend output never has), so the second flag always slipped through and operators saw --siteid 1 --siteid 1. Removed the redundant frontend wrapper entirely; the backend is the single source of truth.
  • Device-detail tabs wrap instead of clipping the rightmost ones: HostRow.jsx and ClusterRow.jsx rendered the tab strip with flex … overflow-x-auto whitespace-nowrap … no-scrollbar. On a typical desktop width the ninth tab (Kubernetes, only present when device.k8s_enabled) was pushed past the right edge with the scrollbar deliberately hidden, so operators couldn't see or reach the Kubernetes tab even though the tab itself was rendered. Replaced with flex flex-wrap gap-x-6 gap-y-2; tabs now flow to a second line when they don't fit, which mirrors how page-level tab strips already behave.
  • Sites/Inventory side panel starts expanded each time you enter that workspace: sitesExpanded was initialised once at Layout mount from location.pathname, so if the app loaded on Dashboard (or anywhere outside /sites / /inventory) the panel was stuck collapsed even after you navigated into Sites. Default state is now true and a small useEffect re-expands the panel whenever the user enters the Sites/Inventory workspace from outside it. In-session manual collapse via the toggle still works — leaving the workspace and coming back simply reopens it.
  • Onboard-To-Site modal couldn't scroll on shorter viewports: .mil-modal-card had no max-height and no overflow rule, so the Cluster Manifest section at the bottom was clipped off-screen with no way to reach it. Added max-h-[90vh] overflow-y-auto to the shared modal card class.
  • K8sDevicePanel is not defined when opening Kubernetes on a host device — the Inventory.jsx split into per-row component files (HostRow.jsx, ClusterRow.jsx, etc.) extracted the JSX that renders <K8sDevicePanel ... /> into HostRow.jsx but did not carry the import K8sDevicePanel from '../K8sDevicePanel' along with it. ClusterRow.jsx got the import; HostRow.jsx didn't. The build still succeeded because JSX references are compiled to React.createElement(K8sDevicePanel, …) calls that only blow up at render time, so the regression only surfaced when an operator with a K3s-enabled host clicked the Kubernetes tab — they then saw a red error overlay instead of the cluster panel. Added the missing import; verified the eight PascalCase JSX tags in HostRow.jsx now all resolve.
  • createPortal is not defined in cluster + app-config modals — same class of regression as the K8sDevicePanel miss: the inventory split moved createPortal(...) calls into ClusterRow.jsx and AppConfigModal.jsx without bringing import { createPortal } from 'react-dom'. TerminalOverlay.jsx and K8sDevicePanel.jsx had the import; the other two did not. Same Vite-can't-see-it-at-build-time, blows-up-at-render mechanism — the failure surfaced when a user opened the cluster row's config modal or the app-config modal. Added the missing imports; full audit of every PascalCase JSX tag across src/components/inventory/*.jsx now passes (no other unbound references).

[1.26.2] - 2026-05-07

Fixed

  • Creating users in the UI no longer fails with users_role_check — the System → Users dropdown offered user, admin, operator, but the database constraint users_role_check only accepts super-admin, tenant-admin, operator, viewer. Saving any role other than operator was rejected by Postgres with pq: new row for relation "users" violates check constraint "users_role_check". The dropdown is now viewer / operator (plus tenant-admin and super-admin when the caller is a super-admin), the backend default in usersCreateHandler is viewer, the privilege gate is updated to block super-admin/tenant-admin for non-super-admins, and unknown roles are now rejected with a 400 before they reach the DB.
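
The role handling described above (default to viewer, reject unknown roles before they reach the database, and gate the two admin roles) might look like the following sketch. resolveRole and the two error values are hypothetical names; mapping the privilege failure to a 403 is an assumption, while the 400 for unknown roles comes from the entry.

```go
package main

import (
	"errors"
	"fmt"
)

// validRoles mirrors the values the users_role_check constraint accepts.
var validRoles = map[string]bool{
	"super-admin":  true,
	"tenant-admin": true,
	"operator":     true,
	"viewer":       true,
}

var (
	errUnknownRole   = errors.New("unknown role")                   // would map to HTTP 400
	errForbiddenRole = errors.New("role requires a super-admin caller") // assumed HTTP 403
)

// resolveRole validates the requested role for a new user.
func resolveRole(requested, callerRole string) (string, error) {
	if requested == "" {
		return "viewer", nil // backend default
	}
	if !validRoles[requested] {
		return "", errUnknownRole
	}
	privileged := requested == "super-admin" || requested == "tenant-admin"
	if privileged && callerRole != "super-admin" {
		return "", errForbiddenRole
	}
	return requested, nil
}

func main() {
	for _, req := range []string{"", "operator", "admin", "tenant-admin"} {
		role, err := resolveRole(req, "tenant-admin")
		fmt.Printf("requested=%q role=%q err=%v\n", req, role, err)
	}
}
```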

Removed

  • Licensing docs page: docs/licensing.md removed and dropped from the mkdocs.yml nav and the docs/index.md table; the Licensing / editions row in CLAUDE.md's docs-sync table is gone too.
  • "Getting Help" footer on docs landing page — removed the trailing Getting Help section (sales email + "Watchgrid B.V. — The Netherlands" line) from docs/index.md.

Changed

  • Dashboard map fits all devices + control plane on first load: MapContainer only honors center/zoom on mount, and serverLocation is fetched async, so the very first paint had only the device list (one Pi → zoom 10 → map locked on London). After the server location landed the props were ignored and the user was stuck looking at London with the Falkenstein control plane off-screen. New FitToContent child uses useMap().fitBounds once data arrives, then sets a ref so subsequent polling refreshes don't yank the user's manual pan/zoom back.

Fixed

  • Device flapping between assigned site and "unassigned" after WireGuard approval: wgApproveHandler was overwriting the in-memory devices[id] entry with a stripped Device{} that carried only the ID, tunnel IP, and LastSeen. That clobbered the TenantID, Hostname, DeviceType, and crucially the SiteID that wgRegisterHandler had just populated. The DB row stayed correct, so REST /api/devices?tenant_id=X (which filters in-memory by tenant and falls back to the DB record) still returned the right site, but the dashboard WebSocket snapshot uses tenantID="" and the in-memory entry survives the merge — every WS push wiped site_id and the device jumped to "Unassigned" until the next REST poll restored it. The handler now updates LastSeen and the tunnel IP on the existing in-memory record instead of replacing it. Existing servers with corrupt in-memory state recover on the next restart (the map is rehydrated from the DB at boot).
  • Onboarding commands respect TLS-terminating reverse proxy: getOnboardingInfoHandler (and the server_url it returns to the cluster-manifest generator) chose the URL scheme by checking r.TLS != nil, but r.TLS is always nil when TLS is terminated upstream. The Onboard-To-Site modal therefore showed curl -fsSL http://... even when the operator was logged in over HTTPS, leaving them to hand-edit the command. Now uses the existing forwardedProto(r) helper (which honors X-Forwarded-Proto) and falls back to X-Forwarded-Host when set, matching the convention already used by getExternalBaseURL for OIDC redirects.
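
The scheme and host selection behind a TLS-terminating proxy, as described in the onboarding entry above, can be sketched as follows. requestScheme and externalBaseURL are hypothetical stand-ins, not the real forwardedProto or getExternalBaseURL; accepting only the first comma-separated header value and only http or https is an assumption, and the request URL is a placeholder.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// requestScheme prefers X-Forwarded-Proto when it carries a usable value
// and otherwise falls back to whether this hop itself saw TLS.
func requestScheme(forwardedProto string, hasTLS bool) string {
	first, _, _ := strings.Cut(forwardedProto, ",")
	switch strings.ToLower(strings.TrimSpace(first)) {
	case "https":
		return "https"
	case "http":
		return "http"
	}
	if hasTLS {
		return "https"
	}
	return "http"
}

// externalBaseURL builds the URL an operator's curl command should use.
func externalBaseURL(r *http.Request) string {
	host := r.Host
	if fh := strings.TrimSpace(r.Header.Get("X-Forwarded-Host")); fh != "" {
		host = fh
	}
	return requestScheme(r.Header.Get("X-Forwarded-Proto"), r.TLS != nil) + "://" + host
}

func main() {
	// Simulates the plain-HTTP hop between the reverse proxy and the server.
	r, err := http.NewRequest(http.MethodGet, "http://10.0.0.5:8080/api/placeholder", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("without proxy headers:", externalBaseURL(r))

	r.Header.Set("X-Forwarded-Proto", "https")
	r.Header.Set("X-Forwarded-Host", "watchgrid.example.invalid")
	fmt.Println("behind TLS proxy:", externalBaseURL(r))
}
```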

Changed

  • SSO config moved to its own System menu item — the OIDC settings form is no longer a section under System → Users; it now lives at System → SSO (/system/sso). New frontend/src/SSO.jsx owns the form, fetch, and save handlers. Users.jsx drops the OIDC state, fetchOIDCSettings, handleSaveOIDCSettings, and the embedded form (~200 lines). Same backend endpoints (GET/PUT /api/auth/oidc/settings); no API changes. Page is super-admin only — non-super-admins see a permission-denied notice instead of the form.
  • Inventory.jsx split into per-component files (#42) — the 4308-line monolith is now 2218 lines. New frontend/src/components/inventory/ hosts six extracted files: shared.jsx (formatters + InfoPanel/InfoRow/DeviceTabPanel/RuntimeTrend), HostRow.jsx (1237 lines), ClusterRow.jsx (538 lines), ServiceRow.jsx, AppConfigModal.jsx, TerminalOverlay.jsx. All 17 unit tests still pass; frontend build clean; no behaviour change (each extracted function already took its dependencies via props so extraction was purely structural).

Added

  • Web Vitals reporting (#44): web-vitals initialised in main.jsx; LCP / CLS / INP / FCP / TTFB ship via navigator.sendBeacon to a new POST /api/metrics/vitals endpoint on the server, which folds the values into two Prometheus histograms (watchgrid_web_vitals_ms and watchgrid_web_vitals_cls). No third-party analytics — stays on the customer's own infrastructure. Endpoint is exempt from CSRF + CORS (fires before auth, via sendBeacon which may drop cookies).
  • Vitest + React Testing Library harness (#45): vite.config.js gains a test: section (jsdom, globals, coverage via v8). New src/test/setup.js pulls in @testing-library/jest-dom matchers and cleans the DOM between cases. 17 tests shipped against useApi, usePolling, ConfirmProvider, ToastProvider; CI gates the frontend image build on vitest run via a new test-frontend job in build.yml. npm test, npm run test:watch, npm run test:coverage added to scripts.
  • Virtualization primitive for long device lists (#43): react-window dependency + new frontend/src/lib/virtualList.js exposing VirtualizedList and a VIRTUALIZE_THRESHOLD constant (300). Not yet wired into Inventory's expandable-row path — current tenants stay well below the threshold and expandable rows need per-row height tracking — but the primitive is ready for the first customer who trips it.

Changed

  • Shared skeleton-loader primitives (#37) — new components/Skeleton.jsx (SkeletonBlock, SkeletonLine, SkeletonCard, SkeletonRows). Inventory and Dashboard render skeletons matching the real layout on first load instead of spinner→content, eliminating CLS. Announced to screen readers via role="status" + aria-busy="true".
  • Typed API client helper (#41): frontend/src/lib/api.js wraps AuthContext.apiRequest and exposes useApi() returning { get, post, put, patch, delete, raw }. Errors throw a typed ApiError with .status, .body, human-readable .message; uncaught errors auto-surface via the toast system (401 is skipped — already handled by the session-expired overlay). Firewall delete flow seeded as a migration example; remaining call-sites can move incrementally.

Changed

  • Exponential backoff with jitter on terminal reconnect (#36): DeviceTerminal.jsx moves from deterministic 2 s × 2^n retries to full-jitter exponential backoff (1 s base, 30 s cap, 6 attempts). Counter resets on every successful ready status so a network blip doesn't count against a fresh streak. After max attempts the UI shows a Reconnect button that resets the counter.
  • CSP tightened (#40): connect-src dropped from 'self' ws: wss: (any-origin) to 'self' (modern browsers cover same-origin WSS). Added frame-src 'none'. Preserved style-src 'self' 'unsafe-inline' as a documented Tailwind exception; nginx.conf now carries the full rationale inline.
  • Mobile: horizontal scroll on device-panel tabs (#39) — Inventory device- and cluster-panel tab rows switch from flex-wrap to horizontal scroll (overflow-x-auto + whitespace-nowrap + new .no-scrollbar utility in index.css). Min 44 px tap target per WCAG 2.5.5. Added role="tablist" / role="tab" / aria-selected.
  • Accessibility spot pass (#38): Login error banner is now role="alert" + aria-live="assertive"; username/password inputs gain aria-invalid/aria-describedby pointing at the error banner, plus correct autoComplete hints.
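
The full-jitter schedule from the terminal-reconnect entry above (1 s base, 30 s cap, 6 attempts) is shown here as arithmetic. The shipped code lives in DeviceTerminal.jsx and is JavaScript; this Go sketch only illustrates the technique, and backoffCeiling and fullJitter are hypothetical names. Full jitter picks a uniformly random delay between zero and the capped exponential ceiling.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	baseDelay   = 1 * time.Second
	maxDelay    = 30 * time.Second
	maxAttempts = 6
)

// backoffCeiling returns min(maxDelay, baseDelay * 2^attempt) for a
// zero-based attempt number.
func backoffCeiling(attempt int) time.Duration {
	d := baseDelay
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

// fullJitter scales the ceiling by a random factor in [0, 1). The random
// source is injected so the schedule can be tested deterministically.
func fullJitter(attempt int, rnd func() float64) time.Duration {
	return time.Duration(rnd() * float64(backoffCeiling(attempt)))
}

func main() {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		fmt.Printf("attempt %d: ceiling %v, sleeping %v\n",
			attempt+1, backoffCeiling(attempt), fullJitter(attempt, rand.Float64))
	}
	fmt.Println("giving up; a manual Reconnect would reset the counter")
}
```

With these constants the ceilings are 1, 2, 4, 8, 16, and 30 seconds, so the cap only takes effect on the sixth attempt.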

Added

  • Structured JSON logging with log/slog (#13) — new server/logging.go and agent/logging.go install a slog.JSONHandler at the level picked from WATCHGRID_LOG_LEVEL (debug/info/warn/error; default info). Server logs carry component=server + version; agent logs add device_id when available. The standard log package is bridged through slog via log.SetOutput so existing log.Printf call sites emit JSON immediately (marked legacy_log=true) — migration to first-class slog attributes can land incrementally without touching every file at once.
  • GDPR user export + cascade purge (#24) — two new super-admin-only endpoints. GET /api/users/{username}/export bundles every row across users, admin_audit_log, device_security_log, ssh_certificates, license_audit_log, and device_profile_runs that references the user into a single JSON download (password hash + 2FA secret redacted). DELETE /api/users/{username}?purge=true runs a transactional cascade-delete across the same tables, scrubs username matches in device_security_log.details, and writes a user_gdpr_purge audit entry before deleting so the record survives its own cascade. The regular DELETE /api/users/{id} without ?purge=true keeps its existing off-boarding semantics. Documented in docs/users.md#gdpr--data-subject-access-requests.
  • Prometheus /metrics endpoint (#14): github.com/prometheus/client_golang with a package-private registry instruments HTTP routes (low-cardinality route bucketing, status class, latency histogram), agent heartbeats, login failures by reason, per-limiter rate-limit rejections, WireGuard peer count, and DB pool stats (open + in-use). Mounted at /metrics on :8080; production deployments must block external access via the reverse proxy / NetworkPolicy. Documented in docs/production.md#observability-prometheus.
  • k8s hardening: PodDisruptionBudget + default-deny NetworkPolicies (#22) — new k8s/07-policies.yaml ships PDBs (minAvailable: 2 for server, minAvailable: 1 for Postgres) plus a default-deny network policy with explicit allow edges: frontend→server, server→postgres, server→registry, cluster-agent→server, kube-dns for every workload, and controlled Internet egress from the server (RFC1918 excluded).
  • Route-level code splitting with React.lazy + Suspense (#31) — every dashboard route except / (Dashboard) is now React.lazy-loaded with an accessible spinner fallback. The entry chunk shrank from ~375 KB raw / 87 KB gz to 60 KB / 19 KB gz (more than 4× smaller gzipped); heavy screens like Inventory (40 KB gz) and Sites/Users/Tenants (~4.5 KB gz each) ship only when visited. Initial-load JS+CSS drops from ~163 KB gz to ~136 KB gz.
  • Production runbook (#16) — new docs/runbook.md covers first-install checklist, backup & verification drills (Postgres, WireGuard key, SSH-CA), upgrade + rollback procedure with post-upgrade smoke tests, an alert → action playbook, leader-election verification drill, and the observability cheat-sheet. Linked from mkdocs.yml and deployed at docs.watchgrid.dev/runbook/.
  • CSRF protection on state-changing endpoints (#19) — new csrfMiddleware enforces the double-submit-cookie pattern: on login the server sets a non-httpOnly watchgrid_csrf cookie with 32 random bytes; AuthContext.apiRequest reads it back and echoes the value in an X-CSRF-Token header on every POST/PUT/PATCH/DELETE. The middleware compares header against cookie in constant time and returns 403 on mismatch. Exempt paths: agent endpoints (token-authenticated), registry proxy, /downloads/, WebSocket upgrades, /api/auth/login, /api/auth/oidc/*. GET/HEAD/OPTIONS and Authorization-header-only clients are unaffected.
  • Password policy + bcrypt cost 12 (#20) — new validatePasswordPolicy enforces min length 8, mixed case + digit + special, and a common-passwords blocklist on every user-facing password endpoint (createUserHandler, changePasswordHandler, onboardingHandler). Bcrypt cost raised from the library default (10) to 12 for new hashes. GET /api/auth/password-policy publishes the rules so the frontend can render live feedback. Existing stored hashes at cost 10 remain verifiable — no forced migration. Env-var admin bootstrap does NOT validate (so operators with weak existing ADMIN_PASSWORD don't lose access on upgrade).
  • Toast notification system (#33) — new ToastProvider exposes toast.success/error/info with accessible role="status" announcement, auto-dismiss (5 s for info/success, 8 s for errors), pause-on-hover, and focus behaviour. Top-level alert() call sites in Tenants, ProvisioningProfiles, AdminDevices, Sites, and more migrated off the browser's native dialogs.
  • Branded ConfirmModal replaces window.confirm (#32) — new ConfirmProvider exposes confirm({ title, message, variant }) returning Promise<boolean>, with focus trap, ESC-to-close, backdrop-click-to-cancel, and a variant: 'danger' style for destructive actions. Sign-out in Layout, plus delete flows in Users, Sites, DNS, Firewall, ProvisioningProfiles, Tenants, and AdminDevices all migrated.
  • Cluster command queue persisted to PostgreSQL — new cluster_commands table (migration 021) stores every deploy/delete/restart operation destined for a cluster-agent with its kind, JSONB payload, status (pending/claimed/done/failed), idempotency key, result, and lifecycle timestamps. clusterDeployHandler and clusterUndeployHandler now enqueue to the DB via dbEnqueueClusterCommand; commandHandler claims the next command atomically with SELECT ... FOR UPDATE SKIP LOCKED when a cluster-agent polls; commandResultHandler marks the claimed row done/failed on result POST. An idempotency key (<kind>:<app>:<namespace>) deduplicates double-click enqueues while a previous command for the same target is still in flight. On server startup, commands stuck claimed for more than 10 minutes are reset to pending for re-delivery (Kubernetes deploys are idempotent). New GET /api/clusters/commands?cluster_id=<id> endpoint returns queue history. Documented in docs/clusters.md#command-queue.
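
The double-submit-cookie check from the CSRF entry above can be sketched as below. This is not the shipped csrfMiddleware: newCSRFToken, needsCSRFCheck, and csrfValid are hypothetical names, and hex-encoding the 32 random bytes is an assumption about the cookie format. The comparison uses crypto/subtle so that timing does not reveal how many leading characters matched.

```go
package main

import (
	"crypto/rand"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// newCSRFToken returns 32 random bytes, hex-encoded for use as a cookie value.
func newCSRFToken() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// needsCSRFCheck reports whether the method changes state. GET, HEAD, and
// OPTIONS are left alone, as in the entry.
func needsCSRFCheck(method string) bool {
	switch method {
	case "GET", "HEAD", "OPTIONS":
		return false
	}
	return true
}

// csrfValid compares the cookie value with the echoed header value in
// constant time. Empty values never match.
func csrfValid(cookie, header string) bool {
	if cookie == "" || header == "" {
		return false
	}
	return subtle.ConstantTimeCompare([]byte(cookie), []byte(header)) == 1
}

func main() {
	token, err := newCSRFToken()
	if err != nil {
		panic(err)
	}
	fmt.Println("token length:", len(token))
	fmt.Println("POST checked:", needsCSRFCheck("POST"))
	fmt.Println("matching header accepted:", csrfValid(token, token))
	fmt.Println("missing header accepted:", csrfValid(token, ""))
}
```

A handler would answer 403 whenever needsCSRFCheck is true and csrfValid is false, after applying the path exemptions the entry lists.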

Fixed

  • Hijacker/Flusher interface forwarding through metrics middleware (#57) — Tier 3a's statusRecorder embedded http.ResponseWriter for Write/WriteHeader but Go doesn't promote interfaces across embedded fields, so http.Hijacker was lost. gorilla/websocket refuses to upgrade without Hijacker and 500'd on /api/ws/dashboard. Surfaced by post-deploy E2E smoke. statusRecorder.Hijack() + Flush() now forward explicitly.
  • CSRF bypass for Authorization: Bearer / X-Agent-Token (#55) — follow-up to #48. Requests authenticating with header tokens can't be forged cross-origin (CORS preflight blocks setting those headers), so CSRF adds no defensive value and just breaks API-token callers that happen to share a browser with a live session cookie. Surfaced by the E2E firewall suite, which uses Bearer auth for fixtures.
  • E2E fixtures aligned with Tier 2b changes (#54, #56) — sign-out test now drives the ConfirmModal instead of window.confirm; multi-tenancy fixture password bumped from 9 chars to 19 chars to meet the new policy; firewall-delete test drives the modal.
  • CI pipeline unblocked (multiple hotfixes early in the session) — aquasecurity/trivy-action tag pinned to @v0.36.0 (earlier @0.28.0 didn't exist as a tag and its transitively-referenced setup-trivy@v0.2.1 was also removed). Image refs lowercased via a REPO_LC env var (Trivy can't parse uppercase). dorny/paths-filter@v3 received explicit pull-requests: read permission on the detect-changes job (PRs otherwise failed with "Resource not accessible by integration" and skipped every downstream build). The static-analysis workflow (security.yml with CodeQL + Trivy SARIF upload) was removed — every job in it targets GitHub code-scanning alerts, which requires GitHub Advanced Security on private repos; the image-scan Trivy gate in the build workflow remains the primary CVE enforcement path.
  • Migration 020 insert statement — 020_onboarding_token_expiry.sql was missing the name column in its INSERT INTO schema_migrations statement, causing fresh-install migrations to fail at that step. The insert now matches the format used by migrations 018 and 019.

Changed

  • CI: image scanning + SBOM — every build workflow run now scans the pushed image with Trivy (CRITICAL-severity gate, ignore-unfixed) and generates a CycloneDX SBOM per component (server, frontend, cluster-agent, service-agent) as a 90-day retained artefact. CVE response SLA documented in docs/production.md (Critical: triage 24h, patch 3d, release 7d). The originally-planned static-analysis workflow (CodeQL Go + CodeQL JavaScript + Trivy fs SARIF upload) was dropped — every job in it relies on GitHub code-scanning alerts, which requires GitHub Advanced Security on private repos. The Trivy image gate in the build workflow remains the primary CVE enforcement path.
  • K8s manifests: container images pinned to semver + sha256 digest — imagePullPolicy changed from Always to IfNotPresent in all base manifests. k8s/base/kustomization.yaml and k8s/kustomization.yaml default to the current release tag instead of :latest. The production overlay (k8s/overlays/production/kustomization.yaml) carries both newTag and digest fields. scripts/pin-images.sh <version> fetches digests from the registry (using crane, skopeo, or docker) and updates the overlay in one step. The release CI job (pin-manifests) runs automatically on semver tag pushes, captures the digest from the build step, and commits the updated overlay to main. Dev overlay retains :latest intentionally.

Added

  • Real /healthz and /readyz endpoints (#15) — liveness probe runs a DB PingContext, checks that the wg0 WireGuard device is present, and stats the default-tenant SSH-CA host key. Readiness additionally gates on a migrationsApplied flag that flips once schema_migrations has at least one row. k8s/04-server.yaml swapped tcpSocket probes for httpGet ones and picked up a startupProbe so slow first migrations don't trigger liveness failures.
  • Proprietary LICENSE, EULA.md, NOTICES.txt (#26) — LICENSE states the source-code terms, EULA.md is the customer-facing End User Licence Agreement, NOTICES.txt is regenerated by scripts/gen-notices.sh (Go modules via go-licenses when installed, else go list -m all; npm via license-checker-rseidelsohn). README now links all three. README's prior License: MIT footer was inconsistent with the product's paid licence-key enforcement — replaced with a Licensing section that points to the three new files.
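The split between /healthz and /readyz described above can be illustrated with a small handler sketch. The `probe` type, `statusOf` helper, and `errDemo` value are hypothetical; only the `migrationsApplied` flag name and the liveness-versus-readiness split come from the entry. The real probes would ping the database, look up wg0, and stat the SSH-CA host key.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// migrationsApplied mirrors the flag named in the entry (atomic.Bool needs Go 1.19+).
var migrationsApplied atomic.Bool

// errDemo stands in for a failed dependency check.
var errDemo = errors.New("wg0 missing")

// probe is a hypothetical check signature.
type probe func() error

// healthz returns 200 only when every liveness probe succeeds, else 503.
func healthz(probes ...probe) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		for _, p := range probes {
			if err := p(); err != nil {
				http.Error(w, err.Error(), http.StatusServiceUnavailable)
				return
			}
		}
		w.WriteHeader(http.StatusOK)
	}
}

// readyz adds the migration gate on top of the liveness probes.
func readyz(probes ...probe) http.HandlerFunc {
	live := healthz(probes...)
	return func(w http.ResponseWriter, r *http.Request) {
		if !migrationsApplied.Load() {
			http.Error(w, "migrations not applied", http.StatusServiceUnavailable)
			return
		}
		live(w, r)
	}
}

// statusOf runs a handler against an in-memory request and returns the status code.
func statusOf(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	ok := func() error { return nil }
	bad := func() error { return errDemo }
	fmt.Println("healthz ok:", statusOf(healthz(ok)))
	fmt.Println("healthz with failing probe:", statusOf(healthz(ok, bad)))
	fmt.Println("readyz before migrations:", statusOf(readyz(ok)))
	migrationsApplied.Store(true)
	fmt.Println("readyz after migrations:", statusOf(readyz(ok)))
}
```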

Changed

  • Frontend polling: 30s default + pause on hidden tabs (#28) — new frontend/src/lib/usePolling.js hook encapsulates setInterval + visibilitychange so dashboards stop hammering the API when the operator switches tabs and resume immediately on focus. Dashboard, Sites, Inventory, AppManager, AdminDevices, and K8sDevicePanel moved to the hook; intervals raised from 10–15 s to 30 s. On a 100-device tenant that cuts idle background load roughly 3× (and 100 % while tabs are hidden).
  • Audit-log retention sweeper (#25) — new audit_retention.go runs a daily goroutine that DELETEs rows older than WATCHGRID_AUDIT_RETENTION_DAYS (default 90) from both admin_audit_log and device_security_log. Sweep cadence is tunable via WATCHGRID_AUDIT_RETENTION_SWEEP_HOURS (default 24). Guidance for regulated customers on archive-to-object-storage workflows added to docs/audit.md.
  • Frontend build hardening (#29, #30) — Vite now drops debugger statements and marks console.{log,info,warn,debug,trace} as pure in production builds (so they're tree-shaken from the shipped bundle while console.error is preserved for real errors). Sourcemaps no longer ship to prod, and React/router/xterm/leaflet/icons are split into their own long-cacheable chunks via manualChunks. frontend/nginx.conf serves /assets/ (Vite's hashed output) with Cache-Control: public, max-age=31536000, immutable while index.html stays no-cache. Initial-load gzipped JS+CSS is ~163 KB (target was <500 KB).
  • Soft 401 handling on background polls (#34) — AuthContext.apiRequest(url, { background: true }) no longer hard-logs-out on 401. Instead it sets a sessionExpired flag, returns the response, and a new SessionExpiredOverlay mounts the Login form on top of the running app so unsaved form state is preserved. User-initiated requests still log out hard. Polling sites in Dashboard, Sites, Inventory, AppManager, Layout, AdminDevices, and K8sDevicePanel are opted in.

Security

  • CORS lockdown on browser-facing API (#23) — new corsMiddleware rejects requests carrying an Origin header that is neither same-origin (Origin host == request Host) nor on the comma-separated WATCHGRID_ALLOWED_ORIGINS allowlist. Agent endpoints (/api/register, /api/heartbeat, /api/commands/, /api/commandresult, /api/wg/..., /api/logs/...), the registry proxy (/api/registry/, /v2/, /registry/), public /downloads/, and WebSocket upgrades are exempt. OPTIONS preflights short-circuit with Access-Control-Max-Age: 600. Reuses the existing WATCHGRID_ALLOWED_ORIGINS env var (renamed helper allowedWebSocketOrigins → allowedBrowserOrigins). Documented in docs/production.md.
  • Trial + prod Postgres TLS bootstrap fixed for fresh volumes — the previous design had two bugs that combined to make a fresh-volume bootstrap impossible. (1) command: postgres -c ssl=on -c ssl_cert_file=... crashed on first boot because the docker-entrypoint forwards command-line args to the temp server it spins up to run init scripts — and that temp server can't start without a cert that hasn't been generated yet, leaving the container in a restart loop where the init script never runs. (2) postgres:16-alpine doesn't ship openssl, and init scripts run as the unprivileged postgres user so apk add from the script isn't possible — the cert generation would have failed silently with exit 127 anyway. Fix: drop the command: override in both trials/docker-compose.trial.yml and docker-compose.prod.yml, switch the image to postgres:16 (Debian, ships openssl), and have scripts/postgres-ssl-init.sh generate the cert AND append ssl = on plus cert paths to postgresql.auto.conf. The real exec postgres after init reads auto-conf, persisting SSL across restarts. Re-run admin-panel/scripts/seed-kv.sh <kv-namespace-id> so Cloudflare KV picks up the new compose template; the seed script itself was updated to current wrangler syntax (kv key put + --remote).
  • Go toolchain bumped to 1.24.13 / 1.25.9 — addresses CVE-2025-68121 (crypto/tls: incorrect certificate validation in stdlib, CRITICAL) which was blocking the Trivy image gate in CI. server, agent, and service-agent Dockerfiles move from golang:1.24.9-alpine to golang:1.24.13-alpine; their go.mod toolchain directives bump from go1.24.9 to go1.24.13 so cross-compiled agent binaries embed the patched stdlib. cluster-agent pins golang:1.25-alpine → golang:1.25.9-alpine for explicit patch-level tracking.
  • PostgreSQL TLS enforced in production — initDatabase now returns a fatal error if WATCHGRID_DB_SSLMODE=disable outside WATCHGRID_DEV_MODE=true. docker-compose.prod.yml mounts scripts/postgres-ssl-init.sh into /docker-entrypoint-initdb.d/, which generates a one-time self-signed cert and writes ssl = on (plus cert paths) into postgresql.auto.conf so the cluster starts encrypted from first boot, with WATCHGRID_DB_SSLMODE=require on the client side. Registry proxy logs a security warning when REGISTRY_URL uses HTTP for a non-localhost host. Trust bundle handling documented in docs/production.md.
  • SSH CA key backup & restore runbook — scripts/backup-ssh-ca.sh creates an AES-256-CBC encrypted tarball of all four CA key files, supports local paths and rsync remote destinations, and retains a configurable number of backups (default 14). Systemd service + timer units in scripts/systemd/ for daily automated backups. Full restore procedure with RTO < 15 min documented in docs/ssh-ca.md#backup--restore.
  • Rate limiter memory bounded + Traefik cross-replica safety net — in-process rate limiter now runs a background goroutine that evicts stale buckets every 5 minutes (10-minute TTL), replacing the ad-hoc GC. docker-compose.prod.yml gains a Traefik auth-ratelimit middleware (10 req/min/IP) on the /api router as a cross-replica enforcement layer. Architecture decision documented in docs/production.md.
  • WebSocket endpoints require JWT before upgrade — POST /api/ws-ticket issues a 2-minute purpose-bound ticket ("ws") so browsers can open WebSockets without putting a long-lived JWT in the URL. Both dashboardWSHandler and terminalUserWebsocketHandler now use shared extractWSToken + verifyWSToken helpers that accept regular JWTs, ws-tickets, httpOnly cookies, and the Sec-WebSocket-Protocol: watchgrid-jwt.<token> sub-protocol trick. terminalUserWebsocketHandler also gains a tenant check: the connecting user must have access to the session device's tenant.
  • Admin password required in production — server now refuses to start if neither ADMIN_PASSWORD nor ADMIN_PASSWORD_HASH is set and WATCHGRID_DEV_MODE is not true. The hardcoded changeme fallback is restricted to dev mode only. docker-compose.prod.yml updated to document and pass ADMIN_PASSWORD.
  • Onboarding tokens now expire — token_expires_at column added to tenants; default TTL is 1 year on generation/rotation (raised from the original 30 days — short enough to bound the blast radius of a leaked token, long enough that fleets on a yearly re-image cadence don't have to rotate mid-cycle). Expired tokens are rejected at /api/register and /api/wg/register and the event is logged to device_security_log. Existing tokens are backfilled by migration 020. The Tenants UI shows expiry date (yellow <7 days, red = expired) and a "Rotate Token" button for admins.
  • Agent binary self-update now supports Ed25519 signature verification — when WATCHGRID_UPDATE_PUBKEY (hex-encoded Ed25519 public key) is set or the key is embedded at build time via -ldflags, the agent downloads a .sig file alongside the binary and verifies the signature before installation. Updates are rejected if verification fails. Falls back to checksum-only with a warning when no key is configured.
  • Provisioning script requires HTTPS — provision.sh now refuses http:// server URLs to prevent supply-chain attacks during agent binary download. Set WATCHGRID_ALLOW_HTTP=1 to override in local development.
  • SPKI certificate pinning in agent — set WATCHGRID_SERVER_SPKI to a comma-separated list of hex-encoded SHA-256 SPKI hashes to pin the server's TLS certificate. Both the HTTP client and WebSocket dialer enforce the pin.
  • Terminal WebSocket agent connection requires a per-session token — the server issues a one-time agent_token with each terminal session command. The agent sends it as X-Agent-Token header when connecting; the server rejects connections with a missing or incorrect token.
  • tcpdump interface name validated against system allowlist — runCapture now validates the interface name with a strict regex and confirms it exists on the host before invoking tcpdump, preventing command-injection via crafted interface names.
  • Packet captures written to private directory — captures are now stored in /var/lib/watchgrid/captures/capture.pcap (directory mode 0700, file mode 0600) instead of world-readable /tmp/capture.pcap.
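The origin rule from the CORS lockdown entry in this list (same-origin, or present on the WATCHGRID_ALLOWED_ORIGINS allowlist) can be sketched as a pure function. `parseAllowlist` and `originAllowed` are hypothetical names, and the case-insensitive comparison and trailing-slash trimming are assumptions made for the example. Endpoint exemptions and preflight handling are left out.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseAllowlist splits a comma-separated WATCHGRID_ALLOWED_ORIGINS value,
// dropping empty items and trailing slashes.
func parseAllowlist(raw string) []string {
	var out []string
	for _, o := range strings.Split(raw, ",") {
		if o = strings.TrimSpace(o); o != "" {
			out = append(out, strings.TrimRight(o, "/"))
		}
	}
	return out
}

// originAllowed reports whether a request may proceed: requests without an
// Origin header pass, same-origin requests pass, allowlisted origins pass.
func originAllowed(origin, requestHost string, allowlist []string) bool {
	if origin == "" {
		return true // no Origin header, so the rule does not apply
	}
	u, err := url.Parse(origin)
	if err != nil || u.Host == "" {
		return false // malformed or opaque origin such as "null"
	}
	if strings.EqualFold(u.Host, requestHost) {
		return true // same-origin: Origin host equals request Host
	}
	for _, a := range allowlist {
		if strings.EqualFold(a, origin) {
			return true
		}
	}
	return false
}

func main() {
	allow := parseAllowlist("https://admin.example.com, https://ops.example.com/")
	fmt.Println(originAllowed("https://grid.example.com", "grid.example.com", allow))
	fmt.Println(originAllowed("https://ops.example.com", "grid.example.com", allow))
	fmt.Println(originAllowed("https://evil.example.net", "grid.example.com", allow))
}
```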

[1.26.1] - 2026-04-21

Security

  • K8s API TLS verification restored — the server no longer disables certificate verification when routing K8s API calls through the WireGuard tunnel. The cluster CA embedded in the kubeconfig is now used with ServerName set to the original kubeconfig hostname, preserving cert validation while routing via the tunnel IP.
  • JWT secret minimum length enforced — the server now rejects startup if JWT_SECRET is shorter than 32 characters, preventing weak secrets that could be brute-forced to forge tokens.
  • JWT removed from service proxy URL — the K8s service proxy no longer appends the auth token as a URL query parameter (exposed in browser history and server logs). The httpOnly session cookie is used instead.
  • K8s service proxy restricted to Watchgrid server — the cluster-agent's port 8081 proxy now only accepts connections from the WireGuard gateway IP (100.64.1.254), blocking other WireGuard peers from reaching internal cluster services.
  • K8s proxy and registry DNS gateway IP derived per-tenant — the cluster-agent previously hardcoded 100.64.1.254 as the allowed gateway IP and DNS server, which broke multi-tenant deployments where the gateway is a different IP (e.g. 100.64.2.254). The gateway is now derived from the cluster-agent's own tunnel IP (replacing the last octet with .254), matching the per-tenant subnet convention. The WATCHGRID_GATEWAY_IP env var can still override this for non-standard deployments.
  • System namespaces protected from destructive K8s commands — handleDeploy, handleDelete, handleRestart, handleScale, and handleK8sDeploy now reject any operation targeting kube-system, kube-public, kube-node-lease, or watchgrid-system.
  • OIDC issuer URL validated against SSRF — the server now resolves the OIDC issuer hostname before fetching the discovery document and rejects URLs that resolve to loopback, private, or link-local addresses, and requires HTTPS.
  • Firewall rule scopeID validated against tenant — when creating a firewall rule scoped to a site or device, the server now verifies that the resource belongs to the authenticated user's tenant, preventing cross-tenant rule injection.

[1.26.0] - 2026-04-20

Fixed

  • Dashboard map now visible on fresh tenants — the map is shown when the control plane server has a location set, even if no devices have registered yet
  • App Store repository sync crash — clicking Sync in the Repository tab threw "is not a function"; the onRepoChange callback was missing from the RepoManager component call

Security

  • Agent self-update now verifies SHA-256 checksum before installing the downloaded binary — a compromised or tampered binary is rejected before it can replace the running agent. The build script generates .sha256 files alongside each architecture binary; the server exposes them at /downloads/watchgrid-agent-{arch}.sha256.
  • Removed insecure_skip_verify: true from K3s registry config — the containerd registry configuration no longer disables TLS certificate verification. Plain HTTP endpoints (sufficient for the WireGuard-encrypted tunnel) are used directly, eliminating the unnecessary TLS bypass.
  • Shell command parameters no longer logged — the agent debug log now redacts the full command Params field, so credentials or secrets embedded in shell commands no longer appear in system logs.
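The checksum step from the agent self-update entry above can be sketched as follows. `verifyChecksum` and `checksumLine` are hypothetical helpers, and the `.sha256` file layout (hex digest, optionally followed by a file name, as `sha256sum` writes it) is an assumption, since the entry does not specify the format.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// checksumLine renders a digest the way sha256sum does: "<hex>  <name>".
func checksumLine(binary []byte, name string) string {
	sum := sha256.Sum256(binary)
	return hex.EncodeToString(sum[:]) + "  " + name
}

// verifyChecksum compares the SHA-256 of the downloaded binary against the
// first field of the .sha256 file contents.
func verifyChecksum(binary []byte, sha256File string) error {
	fields := strings.Fields(sha256File)
	if len(fields) == 0 {
		return errors.New("empty checksum file")
	}
	want, err := hex.DecodeString(fields[0])
	if err != nil || len(want) != sha256.Size {
		return errors.New("malformed checksum")
	}
	got := sha256.Sum256(binary)
	if subtle.ConstantTimeCompare(got[:], want) != 1 {
		return errors.New("checksum mismatch: refusing to install")
	}
	return nil
}

func main() {
	binary := []byte("pretend this is the agent binary")
	line := checksumLine(binary, "watchgrid-agent-arm64")
	fmt.Println("untouched download:", verifyChecksum(binary, line))
	fmt.Println("tampered download: ", verifyChecksum(append(binary, 0x00), line))
}
```

A checksum served from the same host as the binary guards against corruption and partial tampering. The Ed25519 signature check added in 1.27.0 covers the case where both files are replaced together.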

[1.24.0] - 2026-04-12

Added

  • OIDC single sign-on — configurable Login with SSO button on the login page; supports Microsoft Entra ID and any OpenID Connect provider
  • Super-admin SSO settings panel in System → Users for configuring issuer, client ID/secret, button text, claim mapping, default tenant/role, and auto-provisioning behavior
  • Automatic OIDC user linking and provisioning with persisted auth_source metadata
  • System → Admin Devices — dedicated page for managing WireGuard-enabled admin workstations (moved from dashboard)
  • System → Pending Approvals — dedicated page with full approve/deny/profiles workflow (moved from dashboard)
  • Multi-level firewall rule management — create allow/deny rules at tenant, site, or device scope, enforced as iptables entries in the WireGuard mesh
  • Firewall rules support protocol (tcp/udp/icmp/any), source/destination IP or CIDR, port or port range, direction (inbound/outbound/both), priority, and enable/disable toggle
  • System → Firewall page with scope tabs, rules table, and create/edit modal
  • REST API for firewall rules: GET/POST /api/firewall/rules, PUT/DELETE/POST /api/firewall/rules/{id}[/toggle]
  • Location tab on device and cluster detail panels — set name, latitude, longitude, and location lock directly from the Sites workspace
  • Raspberry Pi telemetry (CPU temperature, core voltage, SDRAM voltage) shown in the Sites device info panel under a dedicated Pi Telemetry section
  • Devices without a location now appear on the dashboard map as a gray ? marker at a deterministic placeholder position

Fixed

  • Persistence manager was never initialized, causing OIDC settings save to fail with "Persistence is not initialized" — now initialized at startup using /etc/watchgrid as config dir
  • OIDC redirect_uri was built with http:// instead of https:// when running behind a reverse proxy without X-Forwarded-Proto — added WATCHGRID_EXTERNAL_URL env var to explicitly set the base URL
  • Firewall rule direction: both now correctly creates iptables entries for both src→dst and dst→src; previously two separate rules were required for bidirectional traffic

Changed

  • Dashboard decluttered: device list and services section removed; devices are managed from the Sites workspace. Dashboard now shows map, license warnings, and pending approvals badge only
  • Dashboard map expanded to fill available viewport height
  • Pending approvals section on dashboard replaced with a compact orange badge linking to the dedicated approvals page
  • User management now displays whether an account is local or oidc
  • Auth configuration can be stored in persisted server state; environment variables remain as fallbacks
  • Tenant peer allowlist removed — replaced by tenant-scope firewall rules
  • Tenant firewall modal now shows only the peer-to-peer toggle (master open/isolated switch)
  • API documentation switched from Swagger UI to Redoc for improved readability

[1.23.0] - 2025-04-01

Added

  • Sites — logical groupings of devices representing physical locations or teams
      • Aggregate metrics (avg CPU/memory/disk, total bandwidth) across all site devices
      • One-click provisioning profile runs across entire site
      • Auto-deploy: apps automatically deployed to new devices joining the site
      • Site-scoped firewall rules
      • REST API: GET/POST /api/sites, GET/PUT/DELETE /api/sites/{id}, assign/unassign endpoints
  • K3s Cluster Management — register and manage K3s clusters via cluster-agent
      • Cluster provisioning generates a ready-to-run install command
      • Deploy/undeploy apps from the Watchgrid catalog to any cluster
      • K8s service proxy: forward HTTP requests to services running inside the cluster
      • Kubernetes resource queries (pods, deployments, namespaces, logs, scale)
      • Multi-architecture cluster-agent builds (amd64, arm64)
  • Provisioning Profiles — tag-based bash scripts that run on devices automatically
      • Profiles match devices by tag overlap
      • Execution tracking with per-device run history and output
      • Site-level bulk profile execution
      • Quick-add bundles for common setups
  • App Routines — schedule recurring actions on deployed apps
      • Actions: start, stop, restart
      • Cron-based scheduling with per-routine timezone support
      • Manual trigger (run now) outside of schedule
  • App Repositories — add external Git or Helm repositories as app sources
      • Git: public or private repos, configurable branch
      • Helm: chart repositories
      • Manual sync trigger
  • Onboarding tokens — provision devices to specific tenants using --token flag
  • Device re-registration preserves site assignment and WireGuard stats
  • Real client IP tracked in audit log (bypasses Docker internal proxy)
  • In-cluster registry access via localhost:5000 hostPort and registry proxy sidecar
  • K8s hostNetwork on cluster-agent to prevent localhost registry port conflicts
  • Auto-site-lock: site assignment is locked once set, preventing accidental reassignment

Changed

  • Unified frontend layout and typography across all pages
  • Cluster app management moved from AppManager to dedicated Clusters UI with tabbed interface

[1.22.0] - 2025-04-01

Added

  • Per-device app configuration system — configure app settings (strings, secrets, booleans) through the web UI
  • Automatic config substitution on deployment — values injected into K8s manifests at deploy time
  • Configuration persistence across redeployments

[1.21.0] - 2025-03-01

Added

  • Provisioning profiles — tag-based scripts that run automatically on device registration
  • App metadata system — define configurable fields in app manifests

Fixed

  • WireGuard peer cleanup on device deletion

[1.20.0] - 2025-02-01

Added

  • Audit log — tracks all administrative actions with user, timestamp, and detail
  • Multi-tenancy firewall policies — per-tenant WireGuard ACLs

Changed

  • Server module split begun — main.go decomposed into auth.go, database.go, middleware.go, and domain modules

[1.19.0] - 2025-01-01

Added

  • SSH Certificate Authority — server-signed short-lived user certs (24h) and host certs (365d)
  • ./test-ssh-ca.sh validation script

Fixed

  • Magic DNS resolution timing on fresh device registration

[1.18.0] - 2024-12-01

Added

  • Two-factor authentication (TOTP) — HMAC-SHA1, ±30s window, custom base32 implementation
  • K3s cluster-agent for external Kubernetes cluster registration

Changed

  • WireGuard subnet expanded to 100.64.0.0/10 (RFC 6598) for multi-tenant scalability

[1.17.0] - 2024-11-01

Added

  • Private Docker registry built into the stack — accessible at registry.wg:5000 over VPN
  • Registry authentication proxy through server API

Fixed

  • Agent reconnection after server restart

[1.16.0] - 2024-10-01

Added

  • Web terminal — WebSocket-based shell access to devices and K8s pods via @xterm/xterm
  • Real-time dashboard WebSocket feed for device status

Changed

  • Frontend migrated to React 18 + Vite + TailwindCSS