fix(update): wait for writer flush, honour backpressure, clean up tmp on failure by Ricardo-M-L · Pull Request #158 · MiniMax-AI/cli

Ricardo-M-L · 2026-05-19T04:18:49Z

What does this PR do?

downloadFile() in src/update/self-update.ts had three concrete reliability issues, all triggered in the self-update path:

1. Premature `resolve()` racing the writer flush

The promise resolved as soon as the read loop hit done, but writer.end() only queues the close — bytes may still be in the kernel pagecache. The very next step is verifySha256(tmp, target.checksum), which reads the file back. On slower disks (HDD, network FS, CI runners with bursty I/O) this races the flush and surfaces as a spurious Checksum mismatch error in the middle of an update.

Fix: Wait for writer.on('finish') before resolving.

2. Missing backpressure on `writer.write()`

writer.write(value) was called without honouring its boolean return value. For large binaries the writer's internal buffer grows without bound until the entire download is in memory.

Fix: Await the drain event when write() returns false.

3. Half-downloaded binary left in `tmpdir()` on failure

On any error (network reset, 5xx mid-stream, writer error), the partial file at dest was left in tmpdir() permanently.

Fix: unlinkSync(dest) (best-effort) in a catch before rethrowing.

4. Web Streams reader lock never released

The reader's lock was never released — neither on the happy path nor on errors. The Web Streams API contract requires paired acquire/release; without the release the underlying response body stays locked.

Fix: Move reader.releaseLock() into a finally.

Test plan

npx tsc --noEmit passes
No behavior change on the success path beyond the (correct) flush wait
Manual: artificially slow down disk writes (fsync throttling) and verify the checksum step no longer reports mismatches

Why fix together?

All four are in the same 25-line function and share the same restructure (wrap the await new Promise() in try/catch/finally). Splitting would mean three separate refactors of the same block.

…p tmp `downloadFile()` in `src/update/self-update.ts` had three concrete reliability issues, all triggered in the self-update path: 1. `resolve()` was called as soon as the read loop hit `done`, but `writer.end()` only *queues* the close — bytes may still be in the kernel pagecache. The very next step, `verifySha256()`, reads back the same file; on slower disks (HDD, network FS, CI runners) this races the flush and surfaces as a spurious "Checksum mismatch" error mid-update. Replace the `resolve()` call with a `writer.on( 'finish')` handler so we only proceed once the file is durably closed. 2. `writer.write(value)` was called without honouring its boolean return value. For large binaries the writer's internal buffer grows without bound until the entire download is in memory. Wait for the `'drain'` event when `write()` returns `false`. 3. On any error (network reset, 5xx mid-stream, writer error), the half-written file at `dest` was left in `tmpdir()`. Unlink it in a catch block before rethrowing. 4. The Web Streams reader's lock was never released — neither on the happy path nor on errors. Move the `releaseLock()` into a `finally` so the underlying body stream is always freed. No behavior change on the success path beyond the (correct) flush wait. `tsc --noEmit` passes.

NianJiuZst

Thanks for tackling the reliability issues in this path. I agree with waiting for finish and honoring backpressure, but I don't think the error path is fully cleaned up yet.

There is also still no test coverage for src/update/self-update.ts, so the new finish / drain / cleanup / releaseLock() behavior can regress without anything catching it. Please add at least one mocked stream test that exercises a mid-stream failure and verifies the temp file is cleaned up only after the writer is fully torn down.

Reviewed with GPT-5.5 (xhigh reasoning).

NianJiuZst · 2026-05-19T07:54:02Z

+            if (done) { writer.end(); break; }
+            // Honour backpressure so we don't grow the writer's internal buffer
+            // unboundedly on large binaries / slow disks.
+            if (!writer.write(value)) {


On the failure path we only unlink dest, but we never close or destroy writer unless the loop reaches the done branch. If reader.read() rejects, or if writer.write() returns false and the writer errors before drain, this block can exit with a live WriteStream and an in-flight pump(). Please mirror the shutdown pattern from src/files/download.ts: tear down the writer in finally, wait for finish/error, and only then remove the temporary file.

@NianJiuZst

Review feedback on MiniMax-AI#158 (thanks @NianJiuZst): the error path needs to tear the writer down BEFORE unlinking dest, otherwise the writer's buffered bytes can race the unlink and leave a partial file in the pagecache. And the function still had no test coverage, so the `finish`/`drain`/cleanup/`releaseLock()` behaviour could regress silently. Changes: * `downloadFile`: on error, `writer.destroy()` and await `'close'` before calling `unlinkSync(dest)`. Skip the wait when the writer is already torn down so the rejection's exception isn't masked. * Export `downloadFile` from `src/update/self-update.ts` so unit tests can drive it directly (still not part of the public CLI surface — callers go through `applySelfUpdate`). * Add `test/update/self-update.test.ts` with three tests, matching the existing `test/files/download.test.ts` style: - mid-stream `ReadableStream` error path: rejects, dest gone, tempdir empty (the case the reviewer specifically asked for) - happy path: payload written, dest exists - 4xx response: rejects with CLIError, dest not created `npx tsc --noEmit` passes.

Ricardo-M-L · 2026-05-22T02:48:30Z

Thanks @NianJiuZst, both points addressed in 0791c28:

1. Error teardown order. downloadFile now destroys the writer and waits for the close event before calling unlinkSync(dest):

} catch (err) {
  if (!writer.destroyed) {
    await new Promise<void>(r => {
      writer.once('close', () => r());
      writer.destroy();
    });
  }
  try { (await import('fs')).unlinkSync(dest); } catch { /* best-effort */ }
  throw err;
}

So any buffered bytes in the writer are flushed/discarded before unlink, no pagecache race.

2. Test coverage. Added test/update/self-update.test.ts with three cases matching the existing test/files/download.test.ts style:

cleans up the temp file when the response stream fails mid-download — the one you specifically asked for. Mocks fetch with a ReadableStream that errors after one chunk, asserts existsSync(destPath) === false and the tempdir is empty.
writes the full payload and waits for the writer to flush on success — pair test; would fail if the error-path teardown accidentally also ran on success.
rejects with a CLIError on non-2xx response without creating a file — covers the early-exit branch.

downloadFile is now exported for direct unit-test access (kept off the public CLI path; callers still go through applySelfUpdate). npx tsc --noEmit passes.

TumCucTom

High-quality fix — three concrete reliability issues with clear root causes and surgical changes. The 47/-17 on self-update.ts and 97 added test lines feels right.

What I like:

The writer.on('finish') wait is the right fix for the flush race. The comment explaining "bytes may still be in the kernel pagecache" is exactly the kind of context that prevents a future contributor from "simplifying" it back.
Honoring write()'s boolean return and awaiting drain on false is the textbook backpressure pattern; glad it landed in one place where the comment can be read alongside the fix.
unlinkSync(dest) in a catch (best-effort, then rethrow) is the right cleanup shape — wrapping the unlink in its own try/catch so a failure to clean up the tmp file doesn't mask the original error.
The Web Streams reader-lock release is a subtle one that I've seen cause memory issues in production. Good catch.

One thing to double-check: the reader.releaseLock() happens after the loop ends. If the loop exits via the try block's catch (i.e., a read error mid-stream), does the releaseLock still happen? try/finally around the loop body is what you'd want, and I want to make sure the diff is shaped that way — the truncated body in the PR description makes it hard to tell.

Style nit: consider importing pipeline from node:stream/promises if you ever need to chain the read + write + verify steps; it composes these lifecycle hooks for you and would shrink the manual listener wiring. Not blocking — just a future-ergonomics thought.

Approved in spirit. If the releaseLock placement is correct, this is ready.

NianJiuZst suggested changes May 19, 2026

View reviewed changes

OrionT1 approved these changes Jun 1, 2026

View reviewed changes

TumCucTom reviewed Jun 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(update): wait for writer flush, honour backpressure, clean up tmp on failure#158

fix(update): wait for writer flush, honour backpressure, clean up tmp on failure#158
Ricardo-M-L wants to merge 2 commits into
MiniMax-AI:mainfrom
Ricardo-M-L:contrib/fix-stream-reader-lock-leak

Ricardo-M-L commented May 19, 2026

Uh oh!

NianJiuZst left a comment

Uh oh!

NianJiuZst May 19, 2026

Uh oh!

Ricardo-M-L commented May 22, 2026

Uh oh!

TumCucTom left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Ricardo-M-L commented May 19, 2026

What does this PR do?

1. Premature resolve() racing the writer flush

2. Missing backpressure on writer.write()

3. Half-downloaded binary left in tmpdir() on failure

4. Web Streams reader lock never released

Test plan

Why fix together?

Uh oh!

NianJiuZst left a comment

Choose a reason for hiding this comment

Uh oh!

NianJiuZst May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Ricardo-M-L commented May 22, 2026

Uh oh!

TumCucTom left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

1. Premature `resolve()` racing the writer flush

2. Missing backpressure on `writer.write()`

3. Half-downloaded binary left in `tmpdir()` on failure