Skip to content

Port the kata daemon to Windows#75

Open
njt wants to merge 11 commits into
kenn-io:mainfrom
njt:windows-port
Open

Port the kata daemon to Windows#75
njt wants to merge 11 commits into
kenn-io:mainfrom
njt:windows-port

Conversation

@njt
Copy link
Copy Markdown
Contributor

@njt njt commented May 31, 2026

Ports the kata daemon to Windows. go build, go vet, go test, and go mod tidy are clean on Windows and unchanged on Unix.

Supersedes #73 / #74 (the 0xc0000142 detach fix is folded in as its own commit).

What's in it

Daemon port — platform-specific pieces behind build tags:

  • Signaling — Windows has no cross-process SIGTERM/SIGHUP, so stop/reload use per-daemon named events (Local\kata-stop|reload-<dbhash>-<pid>) set by daemon stop/reload. Unix keeps SIGTERM/SIGHUP.
  • ProcessAlive — Windows uses OpenProcess + GetExitCodeProcess(STILL_ACTIVE); Unix keeps kill(0).
  • Namespace — no uid on Windows, so the per-user socket-dir tag derives from a sanitized USERNAME.

Console detach fix (the original bug) — the daemon runs git remote to resolve project identity. On Windows the auto-started daemon shared the caller's console; once that shell exited, git aborted at startup with STATUS_DLL_INIT_FAILED (exit status 0xc0000142). Now the daemon is created with DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP and routes stdout/stderr to daemon.log instead of the caller's stderr (which a surviving detached daemon would hold open, hanging output-capturing parents).

Test portability — stock main had ~25 Windows failures in a Unix-first suite:

  • token_identity e2e test tagged !windows like its siblings.
  • hookprobe helper built as hookprobe.exe; hook-config tests use bare-name / per-OS absolute commands.
  • Unix 0600 file-mode assertions skipped on Windows (ACL-based there).
  • shouldRefuseAutoStartDaemon normalized via filepath.ToSlash (a real bug: it looked for \go-build on Windows).
  • agent-output tests assert through agentValue; beads-from-live-bd tests skip on Windows; stop/reload tests stand up the named events a real daemon would.

CI — adds a windows job to .github/workflows/test.yml running go build, go vet, and go test on windows-latest. The existing ubuntu lanes never compile the *_windows.go files or !windows-tagged tests, so this is the regression gate for the port. No cgo, so no extra toolchain setup.

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented May 31, 2026

roborev: Combined Review (d4e05cb)

High severity finding: Windows build still fails due to an unsupported signal constant.

High

  • cmd/kata/daemon_signaling_windows.go:97
    The Windows-only file sends syscall.SIGHUP, but Go’s Windows syscall package does not define SIGHUP, so the Windows build still fails. Use a synthetic os.Signal value that exists on Windows, or define a small local signal type; runReloadLoop only needs a receive event and does not inspect the signal value.

Synthesized from 2 reviews (agents: codex | types: default, security)

@wesm
Copy link
Copy Markdown
Member

wesm commented May 31, 2026

Thanks. If you don't mind I can take it from here and I'll add CI so this doesn't regress?

Nat Torkington and others added 3 commits May 31, 2026 18:15
Squashed Windows port branch. Additional commits folded in:

- fix(daemon): detach auto-started daemon from caller console/stdio on Windows
- fix(windows): tag token_identity e2e test !windows like its siblings
- test(windows): stand up named signal events for faked daemons in stop/reload tests
- fix(windows): make the test suite pass on Windows
Guard the Windows daemon port against regressions. The _windows.go
files and !windows build tags are never compiled by the existing
ubuntu-latest lanes, so breakage on Windows is invisible there.

The new job runs go build, go vet, and go test -p 1 ./... on
windows-latest. The project has no cgo, so unlike msgvault it needs no
MSYS2 or toolchain setup. vet runs explicitly because the GOOS=linux
golangci-lint lane never sees the Windows-tagged files.
The path is built from the daemon's own data dir and the fixed
"daemon.log" constant, not user input. Annotate //nolint:gosec so the
make-lint pre-commit hook (and CI lint lane) pass.
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented May 31, 2026

roborev: Combined Review (42a1fcd)

Medium issue found; security review found no additional findings.

Medium

  • internal/client/ensure.go:152 - The automatic restart path for incompatible daemons still uses p.Signal(syscall.SIGTERM). Since the PR establishes that Windows requires the named stop event, EnsureRunning can detect an old live daemon on Windows, fail to stop it, wait 3 seconds, and return old daemon did not stop within 3s instead of replacing it.

    Suggested fix: Move daemon stop signaling into shared platform-specific code, or add a Windows-specific stopRunningDaemons path that sets the same Local\kata-stop-<dbhash>-<pid> event, deriving the DB hash from the runtime record/config.


Synthesized from 2 reviews (agents: codex | types: default, security)

wesm added 2 commits May 31, 2026 19:23
Git for Windows checks out text files as CRLF (core.autocrlf=true by
default). The internal/tui snapshot tests compare golden files
byte-for-byte against LF-rendered output, so on windows-latest all ~45
TestSnapshot_* cases failed with invisible want/got diffs. A repo-wide
.gitattributes pins text files to LF on checkout for every platform.
TestClaimHubClientUsesUnixRuntimeForKataInvalid hardcoded
os.MkdirTemp("/tmp", ...); on the GitHub windows-latest runner /tmp
resolves to a nonexistent C:\tmp, failing with GetFileAttributesEx
before the test ever reached its AF_UNIX listener. The test exercises
the unix:// runtime-file fallback, which is Unix-only, so guard it with
the same runtime.GOOS check its endpoint_test.go siblings already use.
The file's other (platform-independent) test keeps running on Windows.
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Jun 1, 2026

roborev: Combined Review (0f88398)

Windows support is not ready to merge: one High compile break and one Medium restart-path bug remain.

High

  • cmd/kata/daemon_signaling_windows.go:88: The Windows-only reload event path sends syscall.SIGHUP, but SIGHUP is not available in the Windows syscall API, so the new windows-latest build will fail to compile. Use a cross-platform sentinel signal value such as os.Interrupt, since runReloadLoop ignores the value, or define a small private os.Signal implementation for synthetic reload events.

Medium

  • internal/client/ensure.go:137: The auto-start compatibility path still stops incompatible daemons with p.Signal(syscall.SIGTERM). The new Windows stop mechanism uses named events because SIGTERM cannot be delivered cross-process there, so a Windows client that discovers an incompatible daemon will fail to stop it and restart cleanly. Move platform-specific stop signaling into a shared internal helper and use it from both kata daemon stop and stopRunningDaemons.

Synthesized from 2 reviews (agents: codex | types: default, security)

@vercel
Copy link
Copy Markdown

vercel Bot commented Jun 1, 2026

@wesm is attempting to deploy a commit to the Kenn Software Team on Vercel.

A member of the Team first needs to authorize it.

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Jun 1, 2026

roborev: Combined Review (405f282)

Medium risk: one lifecycle race can block daemon upgrades on Windows or during normal shutdown.

Medium

  • internal/client/ensure.go:149
    stopRunningDaemons treats any SignalDaemonStop failure as fatal after a prior liveness/probe check. The daemon can exit between the probe and the signal/open-event call, leaving a stale runtime record; on Windows the named event can also disappear during shutdown. In that case, an upgrade path that should treat the old daemon as already stopped instead fails and refuses to start the replacement.

    Suggested fix: After a signal error, re-check the runtime record/process or probe result and ignore/clean up the record when the daemon is no longer live. Only return an error when the daemon is still confirmed running but cannot be signaled.


Synthesized from 2 reviews (agents: codex | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Jun 1, 2026

roborev: Combined Review (aa985bc)

High severity issue: Windows CI will fail to compile due to a Unix-only signal in a shared test.

High

  • cmd/kata/daemon_reload_loop_test.go:63
    The new Windows CI job will compile this test package on Windows, but the test still sends syscall.SIGHUP, which is Unix-only. As a result, go test ./... on windows-latest will fail at compile time.
    Fix: Use a platform-independent os.Signal test value, or send any cross-platform signal-like value, since runReloadLoop only reacts to channel delivery and does not inspect the signal.

Synthesized from 2 reviews (agents: codex | types: default, security)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants