Changelog
All notable changes to Watchgrid are documented here. Format follows Keep a Changelog.
[Unreleased]
[1.27.0] - 2026-05-18
Added
- System → About page (/system/about) consolidates versions, control-plane endpoints, container CPU/memory, Postgres health, and live tenant counts into a single screen. CPU and memory are sourced from cgroup v2 (/sys/fs/cgroup/cpu.stat + memory.current / memory.max) with cgroup v1 and host /proc fallbacks, so the percentages reflect the actual server container, not the host VM. The Database card runs a 2 s PingContext and surfaces ping latency, the Postgres version, the live pg_stat_activity connection count for current_database(), and the sql/pgx pool's open/in-use/idle counters. The Resources card auto-refreshes every 5 s; the rest of the page reloads on demand via the Refresh button. A new endpoint, GET /api/system/stats, powers the card and is documented in Swagger.
- Configurable session timeout + IP-bound sessions — a new system_settings table (migration 022) stores session_ttl_minutes (default 60 minutes, previously hardcoded at 24 hours) and session_bind_ip (default on). Admins manage both from a new "Session" card at the top of System → Users. JWTs now carry an ip claim derived from X-Forwarded-For / X-Real-IP / RemoteAddr; requireAuth rejects requests whose client IP differs from the claim when IP binding is on. Tokens minted before this release have an empty IP claim and are grandfathered through the IP check, so existing sessions aren't yanked at deploy time. The auth cookie Max-Age and the registry-token expires_in now both follow the configured TTL. Backed by GET /api/system/settings (any authenticated user) and PUT /api/system/settings (admin only) with a 1-minute in-memory cache. Registry tokens are deliberately not IP-bound, because Docker daemon traffic comes over a separate connection.
- Per-repository sync feedback in App Store — Sync All now fires every repository in parallel via Promise.all, and each repository row shows its own inline state (Syncing… with a pulsing dot, Synced at HH:MM:SS in green, Sync failed: <reason> in red) instead of a single blocking browser alert at the end. The aggregate toast reports either "synced N repositories" or "X of N failed — see row for details."
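The cgroup v2 memory figure on the Resources card comes down to a ratio of two files. Below is a minimal Go sketch. It assumes the raw contents of memory.current and memory.max have already been read. The helper name memoryPercent is hypothetical, and the server's actual code may differ. cgroup v2 writes the literal string max into memory.max when no limit is set, which is one case where a host-level fallback is needed.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// memoryPercent derives container memory usage from the raw contents of the
// cgroup v2 files memory.current and memory.max (hypothetical helper).
// ok is false when memory.max is "max" (no limit), signalling that the caller
// should fall back to host-level numbers instead.
func memoryPercent(currentRaw, maxRaw string) (pct float64, ok bool, err error) {
	current, err := strconv.ParseUint(strings.TrimSpace(currentRaw), 10, 64)
	if err != nil {
		return 0, false, fmt.Errorf("parse memory.current: %w", err)
	}
	limitStr := strings.TrimSpace(maxRaw)
	if limitStr == "max" {
		return 0, false, nil
	}
	limit, err := strconv.ParseUint(limitStr, 10, 64)
	if err != nil {
		return 0, false, fmt.Errorf("parse memory.max: %w", err)
	}
	if limit == 0 {
		return 0, false, nil
	}
	return float64(current) / float64(limit) * 100, true, nil
}

func main() {
	// A real handler would os.ReadFile("/sys/fs/cgroup/memory.current") and
	// os.ReadFile("/sys/fs/cgroup/memory.max"); literals keep this runnable.
	pct, ok, err := memoryPercent("268435456\n", "1073741824\n")
	fmt.Println(pct, ok, err) // 25 true <nil>
}
```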
Changed
- SSH-key repository auth is now stored in the database, not as a filesystem path — the Add Repository form's "SSH Key Path" single-line input is replaced with an "SSH Private Key" textarea that accepts the PEM key contents directly. The server writes the value to a per-sync os.CreateTemp file with 0600 permissions, hands the path to git via GIT_SSH_COMMAND=ssh -i <tmp> -o IdentitiesOnly=yes -o StrictHostKeyChecking=no, and defers the file removal. The repository list/create API responses now run through a redactedRepo helper that blanks password and ssh_key and emits has_password / has_ssh_key booleans, so secrets that the API previously echoed back to every caller are no longer disclosed.
- Notifications are uniformly toasts — the inline mil-banner-success / mil-banner-error blocks and native alert() / confirm() dialogs were swept out of DNS, SSO, PKI, App Store, Audit Log, Inventory, License Management, Firewall, Tenants, User Management, Registry Manager, K8s Device Panel, App Config Modal, Host Row, and Cluster Row in favour of the existing ToastProvider API and ConfirmProvider modal. Persistent page-state banners (initial-load errors, license-summary status card, permission-restriction notices, delete-confirm modal warnings) were intentionally left in place, since those are page state, not transient notifications. Pre-auth flows (Login, 2FA setup, onboarding wizard) keep their inline banners because no ToastProvider is mounted before login.
- Sidebar redesign — the Hosts Overview page header lost its Sites eyebrow so it matches the flat title pattern used everywhere else (DNS Management, Single Sign-On, User Management, etc.). The main left sidebar is narrower (w-56 → w-52), with tighter row padding (px-6 py-3 → px-5 py-2.5) and a smaller logo cell; system submenu items use px-8 py-2. Every nav label has whitespace-nowrap so long items like "Site Management" and "Pending Approvals" never wrap. The inner Sites column (visible on /inventory) is also narrower (w-60 → w-48), drops the "List" subheading, replaces the rounded "card" treatment with the flat left-accent-bar style used in the main nav, and truncates long site names. The Sites column no longer appears on /sites (Site Management), where it was redundant with the page content.
- Sites workspace collapse arrow looks like a macOS sidebar toggle — the chevron-in-a-bordered-pill (|<|) was reading as "something hidden between two vertical lines" (h/t Joël for the feedback). It is now a borderless SF-Symbols-style sidebar icon (rounded rect + inner divider) with a subtle green pane fill when expanded and a hover-only background. The toggle column lost its border-r so it no longer visually frames the icon.
- Version stickers moved off the chrome — the v1.26.x / Agent v1.26.x strip under the WatchGrid logo was visual noise. Both lines are gone from the sidebar; versions now live exclusively in System → About, where Server, Agent, uptime, and platform sit on the Versions card.
Security
- Default session lifetime cut from 24 hours to 1 hour. Combined with the new IP binding (also on by default), a stolen token is dead within an hour and unusable from a different network within seconds.
- Repository credentials redacted in API responses — GET /api/repositories previously returned the raw password and ssh_key fields to any authenticated caller. The new redactedRepo helper blanks both and emits boolean has_password / has_ssh_key flags instead. POST responses use the same redaction.
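Here is one way to express the redaction in Go. It assumes a simplified Repository struct, whose fields and JSON tags are illustrative rather than the server's schema. The stored row is embedded in the response type, so every other field still appears in the JSON output. The presence flags are computed first, and then the two secrets are blanked.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Repository is a hypothetical stand-in for the stored repository row.
type Repository struct {
	ID       int    `json:"id"`
	URL      string `json:"url"`
	Password string `json:"password"`
	SSHKey   string `json:"ssh_key"`
}

// redactedRepository is the API shape: secrets blanked, presence flags added.
type redactedRepository struct {
	Repository
	HasPassword bool `json:"has_password"`
	HasSSHKey   bool `json:"has_ssh_key"`
}

// redactedRepo copies r (it is passed by value, so the stored row is untouched),
// records which secrets exist, then blanks them.
func redactedRepo(r Repository) redactedRepository {
	out := redactedRepository{
		Repository:  r,
		HasPassword: r.Password != "",
		HasSSHKey:   r.SSHKey != "",
	}
	out.Password = ""
	out.SSHKey = ""
	return out
}

func main() {
	repo := Repository{ID: 1, URL: "git@example.com:org/apps.git", SSHKey: "-----BEGIN OPENSSH PRIVATE KEY-----"}
	b, err := json.Marshal(redactedRepo(repo))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```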
Fixed
- docker-compose.ui-test.yml failed to start after the JWT-length validator landed — the committed JWT_SECRET was 25 characters, but the server now rejects anything under 32. The committed value is now a 60-character dev-only secret. While in the file, we also seeded WATCHGRID_ALLOWED_ORIGINS with all four localhost:5173 / 5174 and 127.0.0.1:5173 / 5174 variants, so a browser opened on the Vite default origin gets through CORS instead of seeing "Network error" after the in-browser fetch is 403'd. Additionally, WATCHGRID_DEV_MODE=true now auto-appends those four origins inside the server (allowedBrowserOrigins in ws_security.go), so a fresh checkout works without any env-var dance.
- docs/local-ui-testing.md pointed at the wrong Vite port — was 5174, is actually 5173.
Database migrations
- 022_system_settings.sql — new system_settings (key, value, updated_at, updated_by) table seeded with session_ttl_minutes=60 and session_bind_ip=true. Mirrored into scripts/init-db.sql for clean installs.
[1.26.7] - 2026-05-14
Fixed
- Cluster-agent pod restart wiped Watchgrid's view of installed apps — the cluster-agent's proxy route map is in-memory, and after kubectl rollout restart deploy/watchgrid-cluster-agent -n watchgrid-system the next heartbeat carried exposed_apps: []. ClusterRow.jsx reads /api/clusters/apps, which just returns the device's reported exposed_apps, so the UI offered the apps as available-to-install even though their Deployments and Services were still running in the cluster. With the new heartbeat-driven DNS sync from 1.26.6 it got worse: the empty array also caused syncClusterDNSRecords to delete the cluster's DNS rows. The fix is two-sided: (1) handleDeploy in cluster-agent/commands.go now stamps annotations on the Service after kubectl apply (app.watchgrid.io/hostname, /expose-port, /protocol, /name). The server already set app.watchgrid.io/managed-by=watchgrid and /name as labels via addAppLabels, but those alone weren't enough to rebuild a route because the port and protocol were lost. (2) The new cluster-agent/rediscover.go runs once at startup after WireGuard comes up, lists every Service cluster-wide carrying the managed-by label, and rebuilds the in-memory proxy map from the annotations. It includes a fallback path for apps deployed before this fix shipped: the hostname is derived from BuildDNSHostname(appName) and the port falls back to svc.Spec.Ports[0].Port. That is good enough for the bundled demo apps and gets superseded by exact annotations on the next redeploy. After this fix lands, the next heartbeat repopulates exposed_apps, the 1.26.6 DNS sync re-upserts records, and the UI shows the apps as installed again, automatically and without any manual redeploy.
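Here is a simplified Go model of the startup rediscovery pass. It uses plain structs instead of the Kubernetes client types. BuildDNSHostname is passed in as a function because its real signature and output format are not shown here. The default protocol of http for unannotated Services is likewise an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// Service is a pared-down stand-in for a Kubernetes Service; only the fields
// needed to rebuild a proxy route are modelled.
type Service struct {
	Namespace   string
	Name        string
	Labels      map[string]string
	Annotations map[string]string
	Ports       []int32 // stands in for svc.Spec.Ports[i].Port
}

// Route is a hypothetical proxy route entry.
type Route struct {
	Hostname string
	Target   string // namespace/name
	Port     int
	Protocol string
}

const (
	labelManagedBy = "app.watchgrid.io/managed-by"
	keyName        = "app.watchgrid.io/name"
	annHostname    = "app.watchgrid.io/hostname"
	annExposePort  = "app.watchgrid.io/expose-port"
	annProtocol    = "app.watchgrid.io/protocol"
)

// rediscoverRoutes rebuilds the in-memory route map from Service metadata.
// Annotated Services yield exact routes; older ones fall back to a derived
// hostname and the first Service port.
func rediscoverRoutes(svcs []Service, buildHostname func(app string) string) map[string]Route {
	routes := make(map[string]Route)
	for _, s := range svcs {
		if s.Labels[labelManagedBy] != "watchgrid" {
			continue // not deployed by Watchgrid
		}
		app := s.Annotations[keyName]
		if app == "" {
			app = s.Labels[keyName]
		}
		host := s.Annotations[annHostname]
		if host == "" {
			host = buildHostname(app)
		}
		port, err := strconv.Atoi(s.Annotations[annExposePort])
		if err != nil {
			if len(s.Ports) == 0 {
				continue // nothing to route to
			}
			port = int(s.Ports[0])
		}
		proto := s.Annotations[annProtocol]
		if proto == "" {
			proto = "http" // assumed default for pre-annotation apps
		}
		routes[host] = Route{Hostname: host, Target: s.Namespace + "/" + s.Name, Port: port, Protocol: proto}
	}
	return routes
}

func main() {
	// Stand-in for BuildDNSHostname; the real derivation may differ.
	build := func(app string) string { return app + ".wg" }
	svcs := []Service{{
		Namespace: "default", Name: "hello-web",
		Labels: map[string]string{labelManagedBy: "watchgrid", keyName: "hello-web"},
		Ports:  []int32{80},
	}}
	fmt.Println(rediscoverRoutes(svcs, build))
}
```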
[1.26.6] - 2026-05-14
Fixed
- Cluster-agent DNS registration calls were 401-ing — cluster-agent/dns.go POSTed proxy hostnames to /api/dns/records, but that endpoint is gated by requireAuth (admin JWT) and the cluster-agent has no token, so every deploy logged DNS register failed for {hostname} after 3 attempts: status 401 (operators saw this for nginx-demo---local, hello-web, etc.). The Kubernetes/Proxy UI's "Open WebUI" link reverse-proxies via /api/k8s/service-proxy/... and dials dev.WireGuard.TunnelIP:8081 directly, so this didn't block that flow. However, the *.wg hostnames advertised on the cluster's proxy never had matching DNS records, so anyone trying to resolve {app}.wg directly (over the VPN) got NXDOMAIN. The cluster-agent already publishes its current proxy routes in the exposed_apps array of every heartbeat, so the server now owns DNS state: a new syncClusterDNSRecords function in server/main.go upserts a custom_dns_records row for each advertised hostname (pointing at the cluster's tunnel IP) and deletes rows for hostnames the cluster previously claimed but no longer reports. Per-cluster ownership is tracked in an in-memory map, with a safety guard so we only delete records whose IP still matches the cluster's tunnel; operators who overwrote a record manually won't have it clobbered. cluster-agent/dns.go was deleted; commands.go and main.go no longer wire a DNSManager. Stale rows from before this fix linger until the same cluster reasserts a smaller exposed_apps set; if needed, clean them up by hand via DELETE FROM custom_dns_records WHERE ip_address = '<cluster tunnel ip>' AND hostname NOT IN (...).
- Registry Test Web couldn't register a proxy route because its metadata had no expose_port — apps/registry-test-web/metadata.yaml was missing expose_port and expose_protocol, so when the cluster-agent's deploy handler read the metadata it had nothing to add to the proxy and never called proxy.AddRoute(...). Symptom: the agent log showed Command deploy completed: Deployed Registry Test Web: ... but no matching Proxy route added line, and clicking the app in the UI did nothing because the cluster's exposed_apps heartbeat array never contained it. Set expose_port: 80 and expose_protocol: http in the metadata to match the Service definition in deployment.yaml. (Note: the deployment image is pulled from registry.wg:5000/..., which still requires containerd registry-mirror configuration on each cluster node; that's a separate problem if image pulls fail.)
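The ownership and safety-guard rules can be captured in a pure planning function. This is a sketch, not syncClusterDNSRecords itself. planDNSSync is a hypothetical name, database access is reduced to a hostname→IP map, and the IP addresses in the example are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// planDNSSync (hypothetical) computes one heartbeat's DNS changes for a cluster.
//   owned    - hostnames this cluster claimed on the previous heartbeat
//   reported - hostnames in the current exposed_apps array
//   stored   - hostname -> ip_address currently in custom_dns_records
// Every reported hostname is upserted to tunnelIP. A hostname that dropped out
// of the report is deleted only if its row still points at tunnelIP, so a
// record an operator re-pointed by hand is left alone.
func planDNSSync(owned map[string]bool, reported []string, stored map[string]string, tunnelIP string) (upsert, del []string, nowOwned map[string]bool) {
	nowOwned = make(map[string]bool)
	for _, h := range reported {
		if !nowOwned[h] {
			nowOwned[h] = true
			upsert = append(upsert, h)
		}
	}
	for h := range owned {
		if !nowOwned[h] && stored[h] == tunnelIP {
			del = append(del, h)
		}
	}
	sort.Strings(del) // map iteration order is random; sort for stable output
	return upsert, del, nowOwned
}

func main() {
	owned := map[string]bool{"hello-web.wg": true, "old-app.wg": true}
	stored := map[string]string{"hello-web.wg": "10.8.0.5", "old-app.wg": "10.8.0.5"}
	up, del, _ := planDNSSync(owned, []string{"hello-web.wg"}, stored, "10.8.0.5")
	fmt.Println(up, del) // [hello-web.wg] [old-app.wg]
}
```

With an empty report, every owned row that still points at the tunnel becomes a deletion candidate. That is the failure mode described for cluster-agent restarts in 1.26.7.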
[1.26.5] - 2026-05-14
Fixed
- K8s reverse proxy showed "cluster device not reachable" because the cluster-agent never brought up WireGuard — clusterProvisionHandler generated the cluster-agent ConfigMap with WATCHGRID_DISABLE_WIREGUARD: "true" baked in. That dates back to when the cluster-agent first shipped (commit 22d6167), when the only traffic it needed was HTTPS to the public server URL for registration/heartbeats/commands. The K8s service-proxy feature added later (k8sServiceProxyHandler in server/main.go) reverse-proxies into cluster workloads by dialing the cluster-agent's :8081 endpoint at dev.WireGuard.TunnelIP, so a cluster-agent without a tunnel always fails with "cluster device not reachable" (server/main.go:3624-3626). The provisioning manifest now sets WATCHGRID_DISABLE_WIREGUARD: "false" (with an inline comment explaining when to disable it). The prerequisites, a /dev/net/tun mount and the NET_ADMIN capability, were already in the manifest, so userspace WireGuard via wireguard-go works without any other changes. If WG setup fails on a given node, the cluster-agent logs WireGuard setup failed: ... (continuing without VPN) and proceeds, so the failure mode is graceful. Operators with already-onboarded clusters need to regenerate the cluster manifest in the UI and re-apply it on the cluster so the cluster-agent pod restarts with the new env var.
[1.26.4] - 2026-05-14
Fixed
- App deploys to newly-onboarded clusters failed silently because the cluster row was never written to devices — registerHandler upserted the device into the in-memory devices map and returned 200 but never called dbSaveDevice. Heartbeats from a registered device take the UPDATE-only path (dbUpdateHeartbeat), so a missing row stays missing forever: zero rows affected, no error. The cluster therefore showed up in /api/clusters (which reads the in-memory map) and in the UI. However, cluster_commands.cluster_id has a foreign key on devices(id) ON DELETE CASCADE, so the first INSERT INTO cluster_commands (...) from clusterDeployHandler → dbEnqueueClusterCommand was rejected by Postgres and the deploy never reached the cluster-agent. Symptom on the rob.trial server: an Omni cluster heartbeating with K8s info visible and 18 pods reported, but the user could not install any app onto it. registerHandler now persists the device with dbSaveDevice immediately after upserting the in-memory map (the same pattern the heartbeat auto-register path already uses), so the row exists by the time any FK-dependent insert runs. Operators with already-onboarded clusters that hit this bug will need to delete and re-apply the cluster manifest (or restart the cluster-agent pod) so it re-issues /api/register against the patched server.
[1.26.3] - 2026-05-07
Fixed
- Onboarding & cluster manifest URLs honor the real client scheme behind Caddy — the frontend nginx forwarded X-Forwarded-Proto: $scheme to the Go server, but $scheme is the scheme nginx itself listens on (port 80, so always http). That overwrote the https value Caddy was already setting on the inbound request, so forwardedProto(r) returned http. As a result, the onboarding "Reprovision Existing Device" / "Basic Installation" / "Installation With Kubernetes" curl snippets, plus the server_url baked into the cluster manifest, all came out as http://... even when the operator was on HTTPS. A new map $http_x_forwarded_proto $forwarded_proto block at the top of frontend/nginx.conf falls back to $scheme only when the upstream proxy didn't set the header; all three proxy_set_header X-Forwarded-Proto ... lines (/api/, /downloads/, \.sh$) now use that variable.
- Duplicated --siteid flag in onboarding commands — the backend getOnboardingInfoHandler appends --siteid <id> whenever the request carries site_id=…, and Inventory.jsx was also wrapping the returned strings with siteScopedCommand(), which appended the same flag again. The deduplication regex inside siteScopedCommand was broken ("+? required at least one literal quote, which the backend output never has), so the second flag always slipped through and operators saw --siteid 1 --siteid 1. Removed the redundant frontend wrapper entirely; the backend is the single source of truth.
- Device-detail tabs wrap instead of clipping the rightmost ones — HostRow.jsx and ClusterRow.jsx rendered the tab strip with flex … overflow-x-auto whitespace-nowrap … no-scrollbar. On a typical desktop width the ninth tab (Kubernetes, only present when device.k8s_enabled) was pushed past the right edge with the scrollbar deliberately hidden, so operators couldn't see or reach the Kubernetes tab even though the tab itself was rendered. Replaced with flex flex-wrap gap-x-6 gap-y-2; tabs now flow to a second line when they don't fit, mirroring how page-level tab strips already behave.
- Sites/Inventory side panel starts expanded each time you enter that workspace — sitesExpanded was initialised once at Layout mount from location.pathname, so if the app loaded on Dashboard (or anywhere outside /sites / /inventory) the panel was stuck collapsed even after you navigated into Sites. The default state is now true, and a small useEffect re-expands the panel whenever the user enters the Sites/Inventory workspace from outside it. In-session manual collapse via the ‹ toggle still works; leaving the workspace and coming back simply reopens it.
- Onboard-To-Site modal couldn't scroll on shorter viewports — .mil-modal-card had no max-height and no overflow rule, so the Cluster Manifest section at the bottom was clipped off-screen with no way to reach it. Added max-h-[90vh] overflow-y-auto to the shared modal card class.
- K8sDevicePanel is not defined when opening Kubernetes on a host device — the Inventory.jsx split into per-row component files (HostRow.jsx, ClusterRow.jsx, etc.) extracted the JSX that renders <K8sDevicePanel ... /> into HostRow.jsx but did not carry import K8sDevicePanel from '../K8sDevicePanel' along with it. ClusterRow.jsx got the import; HostRow.jsx didn't. The build still succeeded because JSX references are compiled to React.createElement(K8sDevicePanel, …) calls that only blow up at render time, so the regression only surfaced when an operator with a K3s-enabled host clicked the Kubernetes tab and saw a red error overlay instead of the cluster panel. Added the missing import; verified that the eight PascalCase JSX tags in HostRow.jsx now all resolve.
- createPortal is not defined in cluster + app-config modals — same class of regression as the K8sDevicePanel miss: the inventory split moved createPortal(...) calls into ClusterRow.jsx and AppConfigModal.jsx without bringing import { createPortal } from 'react-dom'. TerminalOverlay.jsx and K8sDevicePanel.jsx had the import; the other two did not. Same Vite-can't-see-it-at-build-time, blows-up-at-render mechanism: the failure surfaced when a user opened the cluster row's config modal or the app-config modal. Added the missing imports; a full audit of every PascalCase JSX tag across src/components/inventory/*.jsx now passes (no other unbound references).
[1.26.2] - 2026-05-07
Fixed
- Creating users in the UI no longer fails with users_role_check — the System → Users dropdown offered user, admin, operator, but the database constraint users_role_check only accepts super-admin, tenant-admin, operator, viewer. Saving any role other than operator was rejected by Postgres with pq: new row for relation "users" violates check constraint "users_role_check". The dropdown now offers viewer / operator (plus tenant-admin and super-admin when the caller is a super-admin), the backend default in usersCreateHandler is viewer, the privilege gate is updated to block super-admin / tenant-admin for non-super-admins, and unknown roles are now rejected with a 400 before they reach the DB.
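The role rules can be summarized in a small Go validator. resolveRole is a hypothetical name. The 403 for the privilege gate is an assumption, since the entry only says such requests are blocked. The 400 for unknown roles and the viewer default come from the entry.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// validRoles mirrors the values allowed by the users_role_check constraint.
var validRoles = map[string]bool{
	"super-admin": true, "tenant-admin": true, "operator": true, "viewer": true,
}

// resolveRole (hypothetical) applies the default and the privilege gate,
// returning the role to store and the HTTP status to send on failure.
func resolveRole(callerRole, requested string) (string, int, error) {
	if requested == "" {
		requested = "viewer" // backend default
	}
	if !validRoles[requested] {
		return "", http.StatusBadRequest, fmt.Errorf("unknown role %q", requested)
	}
	if (requested == "super-admin" || requested == "tenant-admin") && callerRole != "super-admin" {
		return "", http.StatusForbidden, errors.New("only a super-admin can grant " + requested)
	}
	return requested, http.StatusOK, nil
}

func main() {
	for _, req := range []string{"", "admin", "tenant-admin"} {
		role, code, err := resolveRole("tenant-admin", req)
		fmt.Printf("%q -> role=%q status=%d err=%v\n", req, role, code, err)
	}
}
```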
Removed
- Licensing docs page — docs/licensing.md removed and dropped from the mkdocs.yml nav and the docs/index.md table; the Licensing / editions row in CLAUDE.md's docs-sync table is gone too.
- "Getting Help" footer on docs landing page — removed the trailing Getting Help section (sales email + "Watchgrid B.V. — The Netherlands" line) from docs/index.md.
Changed
- Dashboard map fits all devices + control plane on first load — MapContainer only honors center/zoom on mount, and serverLocation is fetched async, so the very first paint had only the device list (one Pi → zoom 10 → map locked on London). After the server location landed, the props were ignored and the user was stuck looking at London with the Falkenstein control plane off-screen. A new FitToContent child uses useMap().fitBounds once data arrives, then sets a ref so subsequent polling refreshes don't yank the user's manual pan/zoom back.
Fixed
- Device flapping between assigned site and "unassigned" after WireGuard approval — wgApproveHandler was overwriting the in-memory devices[id] entry with a stripped Device{} that carried only the ID, tunnel IP, and LastSeen. That clobbered the TenantID, Hostname, DeviceType, and crucially the SiteID that wgRegisterHandler had just populated. The DB row stayed correct, so REST /api/devices?tenant_id=X (which filters in-memory by tenant and falls back to the DB record) still returned the right site. However, the dashboard WebSocket snapshot uses tenantID="" and the in-memory entry survives the merge, so every WS push wiped site_id and the device jumped to "Unassigned" until the next REST poll restored it. The handler now updates LastSeen and the tunnel IP on the existing in-memory record instead of replacing it. Existing servers with corrupt in-memory state recover on the next restart (the map is rehydrated from the DB at boot).
- Onboarding commands respect TLS-terminating reverse proxy — getOnboardingInfoHandler (and the server_url it returns to the cluster-manifest generator) built the URL by checking r.TLS != nil, but r.TLS is always nil when TLS is terminated upstream. The Onboard-To-Site modal therefore showed curl -fsSL http://... even when the operator was logged in over HTTPS, leaving them to hand-edit the command. It now uses the existing forwardedProto(r) helper (which honors X-Forwarded-Proto) and falls back to X-Forwarded-Host when set, matching the convention already used by getExternalBaseURL for OIDC redirects.
Changed
- SSO config moved to its own System menu item — the OIDC settings form is no longer a section under System → Users; it now lives at System → SSO (/system/sso). The new frontend/src/SSO.jsx owns the form, fetch, and save handlers. Users.jsx drops the OIDC state, fetchOIDCSettings, handleSaveOIDCSettings, and the embedded form (~200 lines). Same backend endpoints (GET/PUT /api/auth/oidc/settings); no API changes. The page is super-admin only; non-super-admins see a permission-denied notice instead of the form.
- Inventory.jsx split into per-component files (#42) — the 4308-line monolith is now 2218 lines. The new frontend/src/components/inventory/ directory hosts six extracted files: shared.jsx (formatters + InfoPanel / InfoRow / DeviceTabPanel / RuntimeTrend), HostRow.jsx (1237 lines), ClusterRow.jsx (538 lines), ServiceRow.jsx, AppConfigModal.jsx, TerminalOverlay.jsx. All 17 unit tests still pass; the frontend build is clean; no behaviour change (each extracted function already took its dependencies via props, so extraction was purely structural).
Added
- Web Vitals reporting (#44) — web-vitals is initialised in main.jsx; LCP / CLS / INP / FCP / TTFB ship via navigator.sendBeacon to a new POST /api/metrics/vitals endpoint on the server, which folds the values into two Prometheus histograms (watchgrid_web_vitals_ms and watchgrid_web_vitals_cls). No third-party analytics; the data stays on the customer's own infrastructure. The endpoint is exempt from CSRF + CORS (it fires before auth, via sendBeacon, which may drop cookies).
- Vitest + React Testing Library harness (#45) — vite.config.js gains a test: section (jsdom, globals, coverage via v8). A new src/test/setup.js pulls in @testing-library/jest-dom matchers and cleans the DOM between cases. 17 tests shipped against useApi, usePolling, ConfirmProvider, ToastProvider; CI gates the frontend image build on vitest run via a new test-frontend job in build.yml. npm test, npm run test:watch, and npm run test:coverage were added to scripts.
- Virtualization primitive for long device lists (#43) — react-window dependency + new frontend/src/lib/virtualList.js exposing VirtualizedList and a VIRTUALIZE_THRESHOLD constant (300). Not yet wired into Inventory's expandable-row path (current tenants stay well below the threshold, and expandable rows need per-row height tracking), but the primitive is ready for the first customer who trips it.
Changed
- Shared skeleton-loader primitives (#37) — new components/Skeleton.jsx (SkeletonBlock, SkeletonLine, SkeletonCard, SkeletonRows). Inventory and Dashboard render skeletons matching the real layout on first load instead of spinner→content, eliminating CLS. Announced to screen readers via role="status" + aria-busy="true".
- Typed API client helper (#41) — frontend/src/lib/api.js wraps AuthContext.apiRequest and exposes useApi() returning { get, post, put, patch, delete, raw }. Errors throw a typed ApiError with .status, .body, and a human-readable .message; uncaught errors auto-surface via the toast system (401 is skipped, since the session-expired overlay already handles it). The Firewall delete flow is seeded as a migration example; remaining call sites can move incrementally.
Changed
- Exponential backoff with jitter on terminal reconnect (#36) — DeviceTerminal.jsx moves from fixed 2 s × 2^n retries to full-jitter exponential backoff (1 s base, 30 s cap, 6 attempts). The counter resets on every successful ready status so a network blip doesn't count against a fresh streak. After max attempts the UI shows a Reconnect button that resets the counter.
- CSP tightened (#40) — connect-src dropped from 'self' ws: wss: (any origin) to 'self' (modern browsers cover same-origin WSS). Added frame-src 'none'. Preserved style-src 'self' 'unsafe-inline' as a documented Tailwind exception; nginx.conf now carries the full rationale inline.
- Mobile: horizontal scroll on device-panel tabs (#39) — Inventory device- and cluster-panel tab rows switch from flex-wrap to horizontal scroll (overflow-x-auto + whitespace-nowrap + a new .no-scrollbar utility in index.css). Min 44 px tap target per WCAG 2.5.5. Added role="tablist" / role="tab" / aria-selected.
- Accessibility spot pass (#38) — the Login error banner is now role="alert" + aria-live="assertive"; the username/password inputs gain aria-invalid / aria-describedby pointing at the error banner, plus correct autoComplete hints.
Added
- Structured JSON logging with log/slog (#13) — the new server/logging.go and agent/logging.go install a slog.JSONHandler at the level picked from WATCHGRID_LOG_LEVEL (debug/info/warn/error; default info). Server logs carry component=server + version; agent logs add device_id when available. The standard log package is bridged through slog via log.SetOutput, so existing log.Printf call sites emit JSON immediately (marked legacy_log=true). Migration to first-class slog attributes can land incrementally without touching every file at once.
- GDPR user export + cascade purge (#24) — two new super-admin-only endpoints. GET /api/users/{username}/export bundles every row across users, admin_audit_log, device_security_log, ssh_certificates, license_audit_log, and device_profile_runs that references the user into a single JSON download (password hash + 2FA secret redacted). DELETE /api/users/{username}?purge=true runs a transactional cascade-delete across the same tables, scrubs username matches in device_security_log.details, and writes a user_gdpr_purge audit entry before deleting so the record survives its own cascade. The regular DELETE /api/users/{id} without ?purge=true keeps its existing off-boarding semantics. Documented in docs/users.md#gdpr--data-subject-access-requests.
- Prometheus /metrics endpoint (#14) — github.com/prometheus/client_golang with a package-private registry instruments HTTP routes (low-cardinality route bucketing, status class, latency histogram), agent heartbeats, login failures by reason, per-limiter rate-limit rejections, WireGuard peer count, and DB pool stats (open + in-use). Mounted at /metrics on :8080; production deployments must block external access via the reverse proxy / NetworkPolicy. Documented in docs/production.md#observability-prometheus.
- k8s hardening: PodDisruptionBudget + default-deny NetworkPolicies (#22) — the new k8s/07-policies.yaml ships PDBs (minAvailable: 2 for server, minAvailable: 1 for Postgres) plus a default-deny network policy with explicit allow edges: frontend→server, server→postgres, server→registry, cluster-agent→server, kube-dns for every workload, and controlled Internet egress from the server (RFC1918 excluded).
- Route-level code splitting with React.lazy + Suspense (#31) — every dashboard route except / (Dashboard) is now React.lazy-loaded with an accessible spinner fallback. The entry chunk shrank from ~375 KB raw / 87 KB gz to 60 KB / 19 KB gz (about 4.6× smaller gzipped); heavy screens like Inventory (40 KB gz) and Sites/Users/Tenants (~4.5 KB gz each) ship only when visited. Initial-load JS+CSS drops from ~163 KB gz to ~136 KB gz.
- Production runbook (#16) — the new docs/runbook.md covers a first-install checklist, backup & verification drills (Postgres, WireGuard key, SSH-CA), an upgrade + rollback procedure with post-upgrade smoke tests, an alert → action playbook, a leader-election verification drill, and the observability cheat-sheet. Linked from mkdocs.yml and deployed at docs.watchgrid.dev/runbook/.
- CSRF protection on state-changing endpoints (#19) — a new csrfMiddleware enforces the double-submit-cookie pattern: on login the server sets a non-httpOnly watchgrid_csrf cookie with 32 random bytes; AuthContext.apiRequest reads it back and echoes the value in an X-CSRF-Token header on every POST/PUT/PATCH/DELETE. The middleware compares the header against the cookie in constant time and returns 403 on mismatch. Exempt paths: agent endpoints (token-authenticated), registry proxy, /downloads/, WebSocket upgrades, /api/auth/login, /api/auth/oidc/*. GET/HEAD/OPTIONS and Authorization-header-only clients are unaffected.
- Password policy + bcrypt cost 12 (#20) — a new validatePasswordPolicy enforces min length 8, mixed case + digit + special, and a common-passwords blocklist on every user-facing password endpoint (createUserHandler, changePasswordHandler, onboardingHandler). Bcrypt cost is raised from the library default (10) to 12 for new hashes. GET /api/auth/password-policy publishes the rules so the frontend can render live feedback. Existing stored hashes at cost 10 remain verifiable; there is no forced migration. The env-var admin bootstrap does NOT validate (so operators with a weak existing ADMIN_PASSWORD don't lose access on upgrade).
- Toast notification system (#33) — a new ToastProvider exposes toast.success / error / info with accessible role="status" announcement, auto-dismiss (5 s for info/success, 8 s for errors), pause-on-hover, and focus behaviour. Top-level alert() call sites in Tenants, ProvisioningProfiles, AdminDevices, Sites, and more migrated off the browser's native dialogs.
- Branded ConfirmModal replaces window.confirm (#32) — a new ConfirmProvider exposes confirm({ title, message, variant }) returning Promise<boolean>, with focus trap, ESC-to-close, backdrop-click-to-cancel, and a variant: 'danger' style for destructive actions. Sign-out in Layout, plus delete flows in Users, Sites, DNS, Firewall, ProvisioningProfiles, Tenants, and AdminDevices, all migrated.
- Cluster command queue persisted to PostgreSQL — a new cluster_commands table (migration 021) stores every deploy/delete/restart operation destined for a cluster-agent with its kind, JSONB payload, status (pending / claimed / done / failed), idempotency key, result, and lifecycle timestamps. clusterDeployHandler and clusterUndeployHandler now enqueue to the DB via dbEnqueueClusterCommand; commandHandler claims the next command atomically with SELECT ... FOR UPDATE SKIP LOCKED when a cluster-agent polls; commandResultHandler marks the claimed row done/failed on result POST. An idempotency key (<kind>:<app>:<namespace>) deduplicates double-click enqueues while a previous command for the same target is still in flight. On server startup, commands stuck in claimed for more than 10 minutes are reset to pending for re-delivery (Kubernetes deploys are idempotent). A new GET /api/clusters/commands?cluster_id=<id> endpoint returns queue history. Documented in docs/clusters.md#command-queue.
Fixed
- Hijacker/Flusher interface forwarding through metrics middleware (#57) — Tier 3a's statusRecorder embedded http.ResponseWriter for Write/WriteHeader, but embedding an interface only promotes that interface's own methods (Header, Write, WriteHeader), so the underlying writer's http.Hijacker implementation was hidden. gorilla/websocket refuses to upgrade without Hijacker and 500'd on /api/ws/dashboard. Surfaced by the post-deploy E2E smoke test. statusRecorder.Hijack() + Flush() now forward explicitly.
- CSRF bypass for Authorization: Bearer / X-Agent-Token (#55) — follow-up to #48. Requests authenticating with header tokens can't be forged cross-origin (CORS preflight blocks setting those headers), so CSRF adds no defensive value and just breaks API-token callers that happen to share a browser with a live session cookie. Surfaced by the E2E firewall suite, which uses Bearer auth for fixtures.
- E2E fixtures aligned with Tier 2b changes (#54, #56) — the sign-out test now drives the ConfirmModal instead of window.confirm; the multi-tenancy fixture password was bumped from 9 to 19 characters to meet the new policy; the firewall-delete test drives the modal.
- CI pipeline unblocked (multiple hotfixes early in the session) — aquasecurity/trivy-action tag pinned to @v0.36.0 (the earlier @0.28.0 didn't exist as a tag, and its transitively-referenced setup-trivy@v0.2.1 was also removed). Image refs lowercased via a REPO_LC env var (Trivy can't parse uppercase). dorny/paths-filter@v3 received explicit pull-requests: read permission on the detect-changes job (PRs otherwise failed with "Resource not accessible by integration" and skipped every downstream build). The static-analysis workflow (security.yml with CodeQL + Trivy SARIF upload) was removed: every job in it targets GitHub code-scanning alerts, which require GitHub Advanced Security on private repos. The image-scan Trivy gate in the build workflow remains the primary CVE enforcement path.
- Migration 020 insert statement — 020_onboarding_token_expiry.sql was missing the name column in its INSERT INTO schema_migrations statement, causing fresh-install migrations to fail at that step. The insert now matches the format used by migrations 018 and 019.
Changed
- CI: image scanning + SBOM — every build workflow run now scans the pushed image with Trivy (CRITICAL-severity gate, ignore-unfixed) and generates a CycloneDX SBOM per component (server, frontend, cluster-agent, service-agent), retained as an artefact for 90 days. CVE response SLA documented in docs/production.md (Critical: triage 24h, patch 3d, release 7d). The originally planned static-analysis workflow (CodeQL Go + CodeQL JavaScript + Trivy fs SARIF upload) was dropped — every job in it relies on GitHub code-scanning alerts, which require GitHub Advanced Security on private repos. The Trivy image gate in the build workflow remains the primary CVE enforcement path.
- K8s manifests: container images pinned to semver + sha256 digest — imagePullPolicy changed from Always to IfNotPresent in all base manifests. k8s/base/kustomization.yaml and k8s/kustomization.yaml default to the current release tag instead of :latest. The production overlay (k8s/overlays/production/kustomization.yaml) carries both newTag and digest fields. scripts/pin-images.sh <version> fetches digests from the registry (using crane, skopeo, or docker) and updates the overlay in one step. The release CI job (pin-manifests) runs automatically on semver tag pushes, captures the digest from the build step, and commits the updated overlay to main. The dev overlay intentionally retains :latest.
Added
- Real /healthz and /readyz endpoints (#15) — the liveness probe runs a DB PingContext, checks that the wg0 WireGuard device is present, and stats the default-tenant SSH-CA host key. Readiness additionally gates on a migrationsApplied flag that flips once schema_migrations has at least one row. k8s/04-server.yaml swapped its tcpSocket probes for httpGet ones and gained a startupProbe so slow first migrations don't trigger liveness failures.
- Proprietary LICENSE, EULA.md, NOTICES.txt (#26) — LICENSE states the source-code terms, EULA.md is the customer-facing End User Licence Agreement, and NOTICES.txt is regenerated by scripts/gen-notices.sh (Go modules via go-licenses when installed, else go list -m all; npm via license-checker-rseidelsohn). README now links all three. README's prior License: MIT footer was inconsistent with the product's paid licence-key enforcement, so it was replaced with a Licensing section that points to the three new files.
Changed
- Frontend polling: 30 s default + pause on hidden tabs (#28) — the new frontend/src/lib/usePolling.js hook encapsulates setInterval + visibilitychange so dashboards stop hammering the API when the operator switches tabs and resume immediately on focus. Dashboard, Sites, Inventory, AppManager, AdminDevices, and K8sDevicePanel moved to the hook; intervals raised from 10–15 s to 30 s. On a 100-device tenant that cuts idle background load roughly 3× (and eliminates it entirely while tabs are hidden).
- Audit-log retention sweeper (#25) — the new audit_retention.go runs a daily goroutine that DELETEs rows older than WATCHGRID_AUDIT_RETENTION_DAYS (default 90) from both admin_audit_log and device_security_log. Sweep cadence is tunable via WATCHGRID_AUDIT_RETENTION_SWEEP_HOURS (default 24). Guidance for regulated customers on archive-to-object-storage workflows added to docs/audit.md.
- Frontend build hardening (#29, #30) — Vite now drops debugger statements and marks console.{log,info,warn,debug,trace} as pure in production builds, so the minifier strips them from the shipped bundle while console.error is preserved for real errors. Sourcemaps no longer ship to prod, and React/router/xterm/leaflet/icons are split into their own long-cacheable chunks via manualChunks. frontend/nginx.conf serves /assets/ (Vite's hashed output) with Cache-Control: public, max-age=31536000, immutable while index.html stays no-cache. Initial-load gzipped JS+CSS is ~163 KB (target was <500 KB).
- Soft 401 handling on background polls (#34) — AuthContext.apiRequest(url, { background: true }) no longer forces a hard logout on 401. Instead it sets a sessionExpired flag and returns the response, and a new SessionExpiredOverlay mounts the Login form on top of the running app so unsaved form state is preserved. User-initiated requests still log out hard. Polling call sites in Dashboard, Sites, Inventory, AppManager, Layout, AdminDevices, and K8sDevicePanel are opted in.
Security
- CORS lockdown on browser-facing API (#23) — a new corsMiddleware rejects requests carrying an Origin header that is neither same-origin (Origin host == request Host) nor on the comma-separated WATCHGRID_ALLOWED_ORIGINS allowlist. Agent endpoints (/api/register, /api/heartbeat, /api/commands/, /api/commandresult, /api/wg/..., /api/logs/...), the registry proxy (/api/registry/, /v2/, /registry/), public /downloads/, and WebSocket upgrades are exempt. OPTIONS preflights short-circuit with Access-Control-Max-Age: 600. Reuses the existing WATCHGRID_ALLOWED_ORIGINS env var (helper renamed from allowedWebSocketOrigins to allowedBrowserOrigins). Documented in docs/production.md.
- Trial + prod Postgres TLS bootstrap fixed for fresh volumes — the previous design had two bugs that together made a fresh-volume bootstrap impossible. (1) command: postgres -c ssl=on -c ssl_cert_file=... crashed on first boot because the docker-entrypoint forwards command-line args to the temporary server it spins up to run init scripts, and that temporary server can't start without a cert that hasn't been generated yet, leaving the container in a restart loop where the init script never runs. (2) postgres:16-alpine doesn't ship openssl, and init scripts run as the unprivileged postgres user, so apk add from the script isn't possible; cert generation would have failed anyway (exit 127, openssl not found). Fix: drop the command: override in both trials/docker-compose.trial.yml and docker-compose.prod.yml, switch the image to postgres:16 (Debian, ships openssl), and have scripts/postgres-ssl-init.sh generate the cert AND append ssl = on plus the cert paths to postgresql.auto.conf. The real exec postgres after init reads postgresql.auto.conf, so SSL persists across restarts. Re-run admin-panel/scripts/seed-kv.sh <kv-namespace-id> so Cloudflare KV picks up the new compose template; the seed script itself was updated to current wrangler syntax (kv key put + --remote).
- Go toolchain bumped to 1.24.13 / 1.25.9 — addresses CVE-2025-68121 (crypto/tls: incorrect certificate validation in stdlib, CRITICAL), which was blocking the Trivy image gate in CI. The server, agent, and service-agent Dockerfiles move from golang:1.24.9-alpine to golang:1.24.13-alpine; their go.mod toolchain directives bump from go1.24.9 to go1.24.13 so cross-compiled agent binaries embed the patched stdlib. cluster-agent moves from golang:1.25-alpine to golang:1.25.9-alpine for explicit patch-level tracking.
- PostgreSQL TLS enforced in production — initDatabase now returns a fatal error if WATCHGRID_DB_SSLMODE=disable is set without WATCHGRID_DEV_MODE=true. docker-compose.prod.yml mounts scripts/postgres-ssl-init.sh into /docker-entrypoint-initdb.d/, which generates a one-time self-signed cert and writes ssl = on (plus cert paths) into postgresql.auto.conf so the cluster starts encrypted from first boot, with WATCHGRID_DB_SSLMODE=require on the client side. The registry proxy logs a security warning when REGISTRY_URL uses HTTP for a non-localhost host. Trust bundle handling documented in docs/production.md.
- SSH CA key backup & restore runbook — scripts/backup-ssh-ca.sh creates an AES-256-CBC encrypted tarball of all four CA key files, supports local paths and rsync remote destinations, and retains a configurable number of backups (default 14). Systemd service + timer units in scripts/systemd/ run the backup daily. Full restore procedure with RTO < 15 min documented in docs/ssh-ca.md#backup--restore.
- Rate limiter memory bounded + Traefik cross-replica safety net — the in-process rate limiter now runs a background goroutine that evicts stale buckets every 5 minutes (10-minute TTL), replacing the ad-hoc GC. docker-compose.prod.yml gains a Traefik auth-ratelimit middleware (10 req/min/IP) on the /api router as a cross-replica enforcement layer. Architecture decision documented in docs/production.md.
- WebSocket endpoints require JWT before upgrade — POST /api/ws-ticket issues a 2-minute ticket bound to the "ws" purpose so browsers can open WebSockets without putting a long-lived JWT in the URL. Both dashboardWSHandler and terminalUserWebsocketHandler now use shared extractWSToken + verifyWSToken helpers that accept regular JWTs, ws-tickets, httpOnly cookies, and the Sec-WebSocket-Protocol: watchgrid-jwt.<token> sub-protocol trick. terminalUserWebsocketHandler also gains a tenant check: the connecting user must have access to the session device's tenant.
- Admin password required in production — the server now refuses to start if neither ADMIN_PASSWORD nor ADMIN_PASSWORD_HASH is set and WATCHGRID_DEV_MODE is not true. The hardcoded changeme fallback is restricted to dev mode. docker-compose.prod.yml now documents and passes ADMIN_PASSWORD.
- Onboarding tokens now expire — a token_expires_at column was added to tenants; the default TTL is 1 year on generation/rotation (raised from the original 30 days; a year is short enough to bound the blast radius of a leaked token, yet long enough that fleets on a yearly re-image cadence don't have to rotate mid-cycle). Expired tokens are rejected at /api/register and /api/wg/register, and the event is logged to device_security_log. Existing tokens are backfilled by migration 020. The Tenants UI shows the expiry date (yellow <7 days, red = expired) and a "Rotate Token" button for admins.
- Agent binary self-update now supports Ed25519 signature verification — when WATCHGRID_UPDATE_PUBKEY (hex-encoded Ed25519 public key) is set or the key is embedded at build time via -ldflags, the agent downloads a .sig file alongside the binary and verifies the signature before installation. Updates are rejected if verification fails. Falls back to checksum-only with a warning when no key is configured.
- Provisioning script requires HTTPS — provision.sh now refuses http:// server URLs to prevent supply-chain attacks during agent binary download. Set WATCHGRID_ALLOW_HTTP=1 to override in local development.
- SPKI certificate pinning in agent — set WATCHGRID_SERVER_SPKI to a comma-separated list of hex-encoded SHA-256 SPKI hashes to pin the server's TLS certificate. Both the HTTP client and the WebSocket dialer enforce the pin.
- Terminal WebSocket agent connection requires a per-session token — the server issues a one-time agent_token with each terminal session command. The agent sends it in an X-Agent-Token header when connecting; the server rejects connections with a missing or incorrect token.
- tcpdump interface name validated against system allowlist — runCapture now validates the interface name with a strict regex and confirms it exists on the host before invoking tcpdump, preventing command injection via crafted interface names.
- Packet captures written to private directory — captures are now stored in /var/lib/watchgrid/captures/capture.pcap (directory mode 0700, file mode 0600) instead of the world-readable /tmp/capture.pcap.
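The origin decision in corsMiddleware reduces to a small predicate, sketched here under the assumption that allowlist entries are full origins such as https://ops.example.com. Path exemptions and preflight handling are omitted, and parseAllowlist and originAllowed are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseAllowlist splits the comma-separated WATCHGRID_ALLOWED_ORIGINS value.
func parseAllowlist(raw string) map[string]bool {
	allowed := map[string]bool{}
	for _, o := range strings.Split(raw, ",") {
		if o = strings.TrimSpace(o); o != "" {
			allowed[strings.TrimRight(o, "/")] = true
		}
	}
	return allowed
}

// originAllowed reports whether a request may proceed: no Origin header
// (non-browser caller), same-origin (Origin host equals the request Host),
// or an exact allowlist match. "null" and malformed origins are rejected.
func originAllowed(origin, requestHost string, allowed map[string]bool) bool {
	if origin == "" {
		return true
	}
	u, err := url.Parse(origin)
	if err != nil || u.Host == "" {
		return false
	}
	if strings.EqualFold(u.Host, requestHost) {
		return true
	}
	return allowed[strings.TrimRight(origin, "/")]
}

func main() {
	fmt.Println(originAllowed("https://evil.example", "wg.example.com", parseAllowlist(""))) // false
}
```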
[1.26.1] - 2026-04-21
Security
- K8s API TLS verification restored — the server no longer disables certificate verification when routing K8s API calls through the WireGuard tunnel. The cluster CA embedded in the kubeconfig is now used with ServerName set to the original kubeconfig hostname, preserving cert validation while routing via the tunnel IP.
- JWT secret minimum length enforced — the server now refuses to start if JWT_SECRET is shorter than 32 characters, preventing weak secrets that could be brute-forced to forge tokens.
- JWT removed from service proxy URL — the K8s service proxy no longer appends the auth token as a URL query parameter (exposed in browser history and server logs). The httpOnly session cookie is used instead.
- K8s service proxy restricted to Watchgrid server — the cluster-agent's port 8081 proxy now only accepts connections from the WireGuard gateway IP (100.64.1.254), blocking other WireGuard peers from reaching internal cluster services.
- K8s proxy and registry DNS gateway IP derived per-tenant — the cluster-agent previously hardcoded 100.64.1.254 as the allowed gateway IP and DNS server, which broke multi-tenant deployments where the gateway is a different IP (e.g. 100.64.2.254). The gateway is now derived from the cluster-agent's own tunnel IP by replacing the last octet with .254, matching the per-tenant subnet convention. The WATCHGRID_GATEWAY_IP env var can still override this for non-standard deployments.
- System namespaces protected from destructive K8s commands — handleDeploy, handleDelete, handleRestart, handleScale, and handleK8sDeploy now reject any operation targeting kube-system, kube-public, kube-node-lease, or watchgrid-system.
- OIDC issuer URL validated against SSRF — the server now requires HTTPS and resolves the OIDC issuer hostname before fetching the discovery document, rejecting URLs that resolve to loopback, private, or link-local addresses.
- Firewall rule scopeID validated against tenant — when creating a firewall rule scoped to a site or device, the server now verifies that the resource belongs to the authenticated user's tenant, preventing cross-tenant rule injection.
[1.26.0] - 2026-04-20
Fixed
- Dashboard map now visible on fresh tenants — the map is shown when the control plane server has a location set, even if no devices have registered yet
- App Store repository sync crash — clicking Sync in the Repository tab threw "is not a function"; the onRepoChange callback was missing from the RepoManager component call
Security
- Agent self-update now verifies the SHA-256 checksum before installing the downloaded binary — a compromised or tampered binary is rejected before it can replace the running agent. The build script generates .sha256 files alongside each architecture binary; the server exposes them at /downloads/watchgrid-agent-{arch}.sha256.
- Removed insecure_skip_verify: true from K3s registry config — the containerd registry configuration no longer disables TLS certificate verification. Plain HTTP endpoints (sufficient for the WireGuard-encrypted tunnel) are used directly, eliminating the unnecessary TLS bypass.
- Shell command parameters no longer logged — the agent debug log now redacts the full command Params field to prevent credentials or secrets embedded in shell commands from appearing in system logs.
[1.24.0] - 2026-04-12
Added
- OIDC single sign-on — configurable Login with SSO button on the login page; supports Microsoft Entra ID and any OpenID Connect provider
- Super-admin SSO settings panel in System → Users for configuring issuer, client ID/secret, button text, claim mapping, default tenant/role, and auto-provisioning behavior
- Automatic OIDC user linking and provisioning with persisted auth_source metadata
- System → Admin Devices — dedicated page for managing WireGuard-enabled admin workstations (moved from dashboard)
- System → Pending Approvals — dedicated page with full approve/deny/profiles workflow (moved from dashboard)
- Multi-level firewall rule management — create allow/deny rules at tenant, site, or device scope, enforced as iptables entries in the WireGuard mesh
- Firewall rules support protocol (tcp/udp/icmp/any), source/destination IP or CIDR, port or port range, direction (inbound/outbound/both), priority, and enable/disable toggle
- System → Firewall page with scope tabs, rules table, and create/edit modal
- REST API for firewall rules: GET/POST /api/firewall/rules, PUT/DELETE/POST /api/firewall/rules/{id}[/toggle]
- Location tab on device and cluster detail panels — set name, latitude, longitude, and location lock directly from the Sites workspace
- Raspberry Pi telemetry (CPU temperature, core voltage, SDRAM voltage) shown in the Sites device info panel under a dedicated Pi Telemetry section
- Devices without a location now appear on the dashboard map as a gray ? marker at a deterministic placeholder position
Fixed
- Persistence manager was never initialized, causing OIDC settings save to fail with "Persistence is not initialized" — now initialized at startup using /etc/watchgrid as the config dir
- OIDC redirect_uri was built with http:// instead of https:// when running behind a reverse proxy without X-Forwarded-Proto — added WATCHGRID_EXTERNAL_URL env var to explicitly set the base URL
- Firewall rule direction: both now correctly creates iptables entries for both src→dst and dst→src; previously two separate rules were required for bidirectional traffic
Changed
- Dashboard decluttered: device list and services section removed; devices are managed from the Sites workspace. Dashboard now shows map, license warnings, and pending approvals badge only
- Dashboard map expanded to fill available viewport height
- Pending approvals section on dashboard replaced with a compact orange badge linking to the dedicated approvals page
- User management now displays whether an account is local or oidc
- Auth configuration can be stored in persisted server state; environment variables remain as fallbacks
- Tenant peer allowlist removed — replaced by tenant-scope firewall rules
- Tenant firewall modal now shows only the peer-to-peer toggle (master open/isolated switch)
- API documentation switched from Swagger UI to Redoc for improved readability
[1.23.0] - 2025-04-01
Added
- Sites — logical groupings of devices representing physical locations or teams
  - Aggregate metrics (avg CPU/memory/disk, total bandwidth) across all site devices
  - One-click provisioning profile runs across entire site
  - Auto-deploy: apps automatically deployed to new devices joining the site
  - Site-scoped firewall rules
  - REST API: GET/POST /api/sites, GET/PUT/DELETE /api/sites/{id}, assign/unassign endpoints
- K3s Cluster Management — register and manage K3s clusters via cluster-agent
  - Cluster provisioning generates a ready-to-run install command
  - Deploy/undeploy apps from the Watchgrid catalog to any cluster
  - K8s service proxy: forward HTTP requests to services running inside the cluster
  - Kubernetes resource queries (pods, deployments, namespaces, logs, scale)
  - Multi-architecture cluster-agent builds (amd64, arm64)
- Provisioning Profiles — tag-based bash scripts that run on devices automatically
  - Profiles match devices by tag overlap
  - Execution tracking with per-device run history and output
  - Site-level bulk profile execution
  - Quick-add bundles for common setups
- App Routines — schedule recurring actions on deployed apps
  - Actions: start, stop, restart
  - Cron-based scheduling with per-routine timezone support
  - Manual trigger (run now) outside of schedule
- App Repositories — add external Git or Helm repositories as app sources
  - Git: public or private repos, configurable branch
  - Helm: chart repositories
  - Manual sync trigger
- Onboarding tokens — provision devices to specific tenants using the --token flag
- Device re-registration preserves site assignment and WireGuard stats
- Real client IP tracked in audit log (bypasses Docker internal proxy)
- In-cluster registry access via localhost:5000 hostPort and registry proxy sidecar
- K8s hostNetwork on cluster-agent to prevent localhost registry port conflicts
- Auto-site-lock: site assignment is locked once set, preventing accidental reassignment
Changed
- Unified frontend layout and typography across all pages
- Cluster app management moved from AppManager to dedicated Clusters UI with tabbed interface
[1.22.0] - 2025-04-01
Added
- Per-device app configuration system — configure app settings (strings, secrets, booleans) through the web UI
- Automatic config substitution on deployment — values injected into K8s manifests at deploy time
- Configuration persistence across redeployments
[1.21.0] - 2025-03-01
Added
- Provisioning profiles — tag-based scripts that run automatically on device registration
- App metadata system — define configurable fields in app manifests
Fixed
- WireGuard peer cleanup on device deletion
[1.20.0] - 2025-02-01
Added
- Audit log — tracks all administrative actions with user, timestamp, and detail
- Multi-tenancy firewall policies — per-tenant WireGuard ACLs
Changed
- Server module split begun — main.go decomposed into auth.go, database.go, middleware.go, and domain modules
[1.19.0] - 2025-01-01
Added
- SSH Certificate Authority — server-signed short-lived user certs (24h) and host certs (365d)
- ./test-ssh-ca.sh validation script
Fixed
- Magic DNS resolution timing on fresh device registration
[1.18.0] - 2024-12-01
Added
- Two-factor authentication (TOTP) — HMAC-SHA1, ±30s window, custom base32 implementation
- K3s cluster-agent for external Kubernetes cluster registration
Changed
- WireGuard subnet expanded to 100.64.0.0/10 (RFC 6598) for multi-tenant scalability
[1.17.0] - 2024-11-01
Added
- Private Docker registry built into the stack — accessible at registry.wg:5000 over VPN
- Registry authentication proxy through server API
Fixed
- Agent reconnection after server restart
[1.16.0] - 2024-10-01
Added
- Web terminal — WebSocket-based shell access to devices and K8s pods via @xterm/xterm
- Real-time dashboard WebSocket feed for device status
Changed
- Frontend migrated to React 18 + Vite + TailwindCSS