[Feature] Implement JSON-RPC over TCP loopback transport for gradle-server (resolves #1815) #1860

@wenytang-ms

Summary

Status — ✅ Shipped (all PRs merged to develop). #1863 (PR 1/3, Java) · #1864 (PR 2/3, TS) · #1867 (PR 3/3, cleanup) · follow-up #1868 (rename GrpcGradle* proto messages). Bug #1815 is resolved on develop. The only deferred item is handshake-nonce authentication (see "Deferred" below); it is not required for the #1815 fix.

Replace the gRPC/HTTP-2 transport between the VS Code extension and the long-running gradle-server JVM with JSON-RPC 2.0 over a TCP loopback socket (127.0.0.1 + ephemeral port) via LSP4J + vscode-jsonrpc. Proto message types (GetBuildReply, RunBuildReply, etc.) are retained as serialized payloads inside the JSON-RPC envelope — no schema changes, no public API changes.

This is the implementation follow-up to the design discussion in #1825 with the motivation re-grounded on the actual root cause identified from customer logs in #1815.

Root cause (verified from customer logs in #1815)

   Node @grpc/grpc-js HTTP/2 session reuse
        ↓ (known race, grpc/grpc-node#2872, Node http2 write/END_STREAM ordering)
   Truncated DATA frame: declares length N, flushes N-Δ bytes before END_STREAM
        ↓
   netty 4.1.130 strict mode (introduced by CVE-2025-55163 fix path, PR #15518)
        ↓
   PROTOCOL_ERROR → RST_STREAM → grpc-js status CANCELLED
        ↓
   User sees: "INTERNAL: Encountered end-of-stream mid-frame" + "Call cancelled"

PR #1775 (netty bump to 4.1.130) is the trigger. The customer logs analyzed in the #1815 thread show pure HTTP/2 frame errors with no process-kill markers — this is not an EDR issue (correcting the original framing in #1825).

Neither side will fix this upstream:

Proposed solution

Move the task RPC transport off HTTP/2 entirely:

| Aspect | Before | After |
| --- | --- | --- |
| Wire | HTTP/2 over TCP loopback | Raw TCP loopback (127.0.0.1:<ephemeral>) |
| RPC layer | gRPC | JSON-RPC 2.0 (LSP4J on Java, vscode-jsonrpc on Node) |
| Message bodies | proto on wire | proto bytes base64-embedded in JSON-RPC params/result |
| Streaming (GetBuild, RunBuild) | gRPC server-streaming | JSON-RPC request + correlated notifications keyed by streamId |
| Cancellation | (de facto) business-level cancellation_key | Same: single channel, no $/cancelRequest bridging |
| Port allocation | Node get-port + retry | Node net.createServer().listen(0, '127.0.0.1') → kernel-assigned, zero collision |
| Same-host isolation | (none; TCP loopback is inherently same-host-reachable) | First-connection-wins on a loopback-only listener; handshake-nonce auth deferred (see below) |
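To make the "Message bodies" row concrete, here is a minimal sketch of how serialized proto bytes could be carried inside a JSON-RPC 2.0 request. The `request` and `streamId` field names follow the GradleRequestParams DTO named later in this issue; the helper names (`toBase64`, `fromBase64`, `buildRequest`) and the method string used in the usage note are assumptions for illustration, not the shipped code.

```typescript
// Field names mirror the GradleRequestParams DTO mentioned in this issue.
// Helper names and the envelope type are illustrative assumptions.
interface GradleRequestParams {
  request: string; // base64 of the serialized proto request
  streamId?: string; // set only for streaming calls
}

interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params: GradleRequestParams;
}

// Encode raw proto bytes so they survive a JSON text transport.
function toBase64(protoBytes: Uint8Array): string {
  return Buffer.from(protoBytes).toString('base64');
}

// Decode a base64 payload back into raw proto bytes.
function fromBase64(encoded: string): Uint8Array {
  return new Uint8Array(Buffer.from(encoded, 'base64'));
}

// Wrap proto bytes in a JSON-RPC 2.0 request envelope.
function buildRequest(
  id: number,
  method: string,
  protoBytes: Uint8Array,
  streamId?: string
): JsonRpcRequest {
  const params: GradleRequestParams = { request: toBase64(protoBytes) };
  if (streamId !== undefined) {
    params.streamId = streamId;
  }
  return { jsonrpc: '2.0', id, method, params };
}
```

For example, `buildRequest(1, 'gradle/getBuild', bytes, 'stream-1')` would yield an envelope whose `params.request` decodes back to `bytes` via `fromBase64`. The proto schema itself is untouched; only the framing around it changes.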

Why TCP loopback (and not Named Pipe / UDS)

The original draft of this proposal used Named Pipe (Windows) / Unix Domain Socket (macOS/Linux). After re-evaluation it was changed to plain TCP loopback for three reasons:

  1. First-class, mainstream transport in the VS Code extension ecosystem. vscode-languageclient exposes TransportKind with stdio / ipc / pipe / socket as peer-level options — socket is not a fallback. Microsoft's own vscode-java ships TCP loopback as a supported JDT.LS transport (see JDTLS_CLIENT_PORT) and has used it in enterprise environments for years.
  2. Lower Java-side complexity, zero cross-platform branching. No NamedPipeStream.java reimplementation (~120 lines avoided), no Windows pipe path vs Unix socket path branching, no UDS residual-file cleanup, and the listener is already bound by the time the JVM is spawned, so the JVM connect path needs no retry loop.
  3. Better observability. Standard netstat / lsof -i / tcpdump -i lo tooling works out of the box for diagnosing connect failures and stuck streams.

The EDR-avoidance argument that originally favored Named Pipe is not supported by evidence — the #1815 customer logs contain pure HTTP/2 frame errors with no process-kill markers, and Named Pipe is itself monitored by Defender for Identity / Microsoft ATA as a lateral-movement signal. Both transports are subject to the same enterprise HIPS surface, so the simpler, more ecosystem-aligned choice wins.

Why pipe/loopback instead of trying to stabilize HTTP/2

Why not a full proto-less rewrite

Retaining proto as payload preserves:

  • extension/src/api/Api.ts public surface (Output type, RunTaskOpts, etc.) — keeps backward compatibility for any third-party consumer.
  • All ~60 proto getter call sites across 14 TS files — zero churn outside TaskServerClient.ts.
  • All handler business logic in gradle-server/.../handlers/*Handler.java; only the IO sink abstraction changes (StreamObserver → TaskReplySink<T>).

Connection establishment

As built (PR 1 + PR 2):

  1. Node creates net.createServer(), calls .listen(0, '127.0.0.1', cb), and only after the callback fires reads server.address().port (kernel-assigned ephemeral port — no port-probe race, no collision-retry).
  2. Node spawns the JVM with --port=<port>.
  3. JVM connects back via new Socket("127.0.0.1", port); Node accepts only the first inbound connection and wraps it as the MessageConnection. The handshake is gated by a 30 s connect timeout that disposes the listener, kills the JVM, and surfaces a clear error — no half-started state.
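The three steps above can be sketched on the Node side roughly as follows. This is a simplified illustration under stated assumptions, not the shipped loopbackServer.ts: the function name `listenOnLoopback` and its return shape are hypothetical, JVM spawning is left to the caller, and on timeout the sketch only closes the listener (the real flow also kills the JVM).

```typescript
import * as net from 'node:net';

// Hypothetical return shape: the kernel-assigned port plus a promise
// for the first (and only) accepted connection.
interface LoopbackListener {
  port: number;
  connection: Promise<net.Socket>;
}

function listenOnLoopback(timeoutMs = 30_000): Promise<LoopbackListener> {
  return new Promise((resolve, reject) => {
    const server = net.createServer();
    server.once('error', reject);

    // Port 0 asks the kernel for a free ephemeral port; binding to
    // 127.0.0.1 keeps the listener loopback-only.
    server.listen(0, '127.0.0.1', () => {
      // The port is only read after the listen callback has fired.
      const { port } = server.address() as net.AddressInfo;

      const connection = new Promise<net.Socket>((accept, fail) => {
        const timer = setTimeout(() => {
          server.close();
          fail(new Error(`no connection within ${timeoutMs} ms`));
        }, timeoutMs);

        // First-connection-wins: take one socket, then stop listening.
        server.once('connection', (socket) => {
          clearTimeout(timer);
          server.close(); // the already accepted socket stays open
          accept(socket);
        });
      });

      resolve({ port, connection });
    });
  });
}
```

A caller would await `listenOnLoopback()`, spawn the JVM with `--port=<port>`, and then await `connection` before wrapping the socket in a MessageConnection.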

Deferred — handshake nonce authentication:

The original design mandated a 128-bit nonce passed to the JVM via --taskSocketNonce=<hex> and sent as the first gradle/handshake message, validated before the connection is published, because 127.0.0.1:<port> is reachable by any same-host process and ephemeral-port randomness is not a security boundary. This was intentionally deferred and is not in PR 1 / PR 2 — the current model relies on loopback-only binding + first-connection-wins. Tracked as a follow-up. (Switching ports is not a substitute: any loopback port is enumerable and reachable by same-host processes regardless of its number.)
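For reference, the deferred nonce check could look something like the sketch below. Since this part was never implemented, everything here is hypothetical: the function names are invented, and only the 128-bit size and hex encoding come from the design described above. It uses Node's `crypto.randomBytes` and `crypto.timingSafeEqual`.

```typescript
import { randomBytes, timingSafeEqual } from 'node:crypto';

// Generate a 128-bit nonce as 32 lowercase hex characters, suitable for
// passing on the JVM command line (for example as --taskSocketNonce=<hex>).
function createNonce(): string {
  return randomBytes(16).toString('hex');
}

// Validate the nonce received in the first message before the connection
// is published. Rejects anything that is not a 32-character hex string.
function isValidNonce(expectedHex: string, received: unknown): boolean {
  if (typeof received !== 'string' || !/^[0-9a-f]{32}$/.test(received)) {
    return false;
  }
  const expected = Buffer.from(expectedHex, 'hex');
  const actual = Buffer.from(received, 'hex');
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

Under this sketch, a connection whose first message fails `isValidNonce` would be destroyed and the listener would keep waiting, rather than treating the first inbound socket as trusted.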

Benefits

Impact on adjacent projects — verified zero

The vscode-gradle public API surface (extension/src/api/Api.ts) keeps the same exported types and method signatures. Verified via code search that no sibling project imports them:

| Sibling | Touch point | Affected? |
| --- | --- | --- |
| build-server-for-gradle | Independent JVM; already uses JSON-RPC over Named Pipe (BuildServerConnector.ts); zero gRPC/netty deps in its tree | No |
| eclipse.jdt.ls | Uses BSP via build-server-for-gradle, not gradle-server | No |
| vscode-java / vscode-java-test / vscode-java-debug | 0 references to vscode-gradle's runTask / runBuild / onReady / RunTaskOpts | No |
| vscode-java-dependency | Only references extensionId: "vscjava.vscode-gradle" metadata (for project creation), not API | No |
| vscode-maven / vscode-java-pack | No interaction | No |

Implementation phases (PRs)

The cut-over is shipped as three sequential PRs rather than one mega-PR, so each protocol change stays reviewable in isolation. None of the intermediate states is meant to ship — no version is published between PR 1 and PR 3, so a temporary extension↔server protocol mismatch on develop (between the PR 1 and PR 2 merges) is explicitly accepted. There is intentionally no gRPC↔JSON-RPC compatibility shim (no dual-transport bootstrap, no feature flag); handlers and the client are rewritten in place.

The original standalone "Phase 1" transport-neutral handler refactor (#1861, TaskReplySink + GrpcReplySinkAdapter) was superseded and closed. Because the final design does a hard cut-over with no dual-transport window, the adapter is unnecessary — the handler rewrite is folded directly into PR 1 below.

PR 1/3 — gradle-server (Java): gRPC → JSON-RPC — #1863

Migrates the server half of the transport.

  • Adds package com.github.badsyntax.gradle.transport.jsonrpc: GradleService (@JsonSegment("gradle"), six @JsonRequest methods), GradleClient (two @JsonNotification streaming-reply methods gradle/getBuild/reply + gradle/runBuild/reply), the wire DTOs (GradleRequestParams { request, streamId }, GradleResponse { reply }, GradleStreamPayload { streamId, payload }), GradleServiceImpl (dispatches each RPC to its handler on a worker executor), and the socket bootstrap that connects out to Node's loopback listener and builds the LSP4J launcher.
  • Modifies GradleServer.java into a thin JSON-RPC bootstrap (no more ServerBuilder/NettyServerBuilder, no netty mid-frame log filter); rewrites all six handlers (GetBuild, RunBuild, GetProjectDependencies, CancelBuild, CancelBuilds, ExecuteCommand) to drop every io.grpc.* import; trims gradle-server/build.gradle to remove the gRPC/netty deps + grpc codegen plugin while keeping LSP4J + protobuf-java.
  • Removes the gRPC service shim (TaskService.java), ErrorMessageBuilder.java, and the netty-filter test.
  • Pins the wire contract consumed by PR 2 — method namespace gradle/, base64-of-proto-bytes payloads, and the four error codes (-32000 UNKNOWN, -32001 NOT_FOUND, -32002 CANCELLED, -32603 INTERNAL) — enforced by JsonRpcTransportTest.
  • Tests: GradleServerTest, JsonRpcTransportTest (two LSP4J launchers over piped streams: codec roundtrip, all six RPCs, streaming-notification channel isolation, error-code mapping), ExecuteCommandHandlerTest, GradleServerThreadFactoryTest.
  • State after merge: Java speaks JSON-RPC; the TS extension on develop still speaks gRPC, so the handshake fails. Bug #1815 ("Gradle Encountered end-of-stream mid-frame") is not yet fixed from a user's perspective (the netty stack only leaves the runtime path once PR 2 also lands). Accepted because no version ships here.
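As seen from the client that consumes this contract, the four pinned error codes could be modelled as below. The numeric values and names are the ones listed above; the helper names and the choice to map unrecognized codes to UNKNOWN are assumptions made for this sketch. Note that -32603 is the standard JSON-RPC 2.0 "Internal error" code, while -32000 to -32002 sit in the range the spec reserves for implementation-defined server errors.

```typescript
// The four error codes pinned by the wire contract.
const GradleErrorCode = {
  UNKNOWN: -32000,
  NOT_FOUND: -32001,
  CANCELLED: -32002,
  INTERNAL: -32603,
} as const;

type GradleErrorName = keyof typeof GradleErrorCode;

// Map a numeric JSON-RPC error code back to its symbolic name.
// Assumption: codes outside the pinned set are reported as UNKNOWN.
function errorNameFor(code: number): GradleErrorName {
  for (const name of Object.keys(GradleErrorCode) as GradleErrorName[]) {
    if (GradleErrorCode[name] === code) {
      return name;
    }
  }
  return 'UNKNOWN';
}

// Convenience check for user-initiated cancellation.
function isCancelled(error: { code?: number }): boolean {
  return error.code === GradleErrorCode.CANCELLED;
}
```

A terminal or task runner could use `isCancelled` to suppress error output when the user cancelled a build, and `errorNameFor` when logging anything else.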

PR 2/3 — extension (TypeScript): gRPC client → JSON-RPC client — #1864

Migrates the client half and closes the handshake gap.

  • Adds package extension/src/transport/jsonrpc/: GradleJsonRpcClient (typed facade over MessageConnection, demultiplexes the two streaming reply notifications back to per-call callbacks by streamId), loopbackServer.ts (binds 127.0.0.1:0, 30 s connect timeout), protoCodec.ts (base64 ↔ proto bytes), streamId.ts, JsonRpcErrors.ts (the four pinned error codes + helpers), and types.ts (the wire shapes pinned by PR 1).
  • Flips the architecture: the extension now listens on 127.0.0.1:0 and the JVM dials in (GradleServer.start() binds the listener before spawning the JVM and passes the ephemeral port via --port=<n>), eliminating any window where the JVM could connect before the extension is ready.
  • Rewrites TaskServerClient.ts onto GradleJsonRpcClient, dropping @grpc/grpc-js, waitForReady, channel-state polling, and the retryOnSpuriousCancel workaround (the Node HTTP/2 race it compensated for cannot occur over plain TCP + LSP4J framing). GradleRunnerTerminal.ts switches its error checks to the JSON-RPC helpers.
  • Removes the generated gRPC client stub (gradle_grpc_pb.{js,d.ts}), retryOnSpuriousCancel.ts + its test, the grpc protobuf plugin wiring in extension/build.gradle, and @grpc/grpc-js + grpc-tools from package.json (adds a direct pin on vscode-jsonrpc).
  • Tests: new GradleJsonRpcClient.test.ts (codec roundtrip, streaming dispatch, concurrent-stream multiplexing by streamId, unary roundtrips, ResponseError → GradleRpcError translation) plus the existing integration suite (notably "should load gradle tasks").
  • State after merge: both sides speak JSON-RPC and #1815 ("Gradle Encountered end-of-stream mid-frame") is fixed on develop. The proto/gradle.proto service Gradle { … } block and the root build.gradle grpcVersion property still exist but are no longer consumed by codegen on either side.
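The streamId demultiplexing described for GradleJsonRpcClient can be illustrated with a small standalone sketch. It is not the actual class: `StreamDemultiplexer`, its method names, and the `stream-N` id format are hypothetical, while the `{ streamId, payload }` notification shape follows the GradleStreamPayload DTO named in PR 1.

```typescript
// Shape of a streaming-reply notification, following GradleStreamPayload.
interface GradleStreamPayload {
  streamId: string;
  payload: string; // base64 of the serialized proto reply
}

type ReplyHandler = (protoBytes: Uint8Array) => void;

// Routes reply notifications that share one connection back to the
// callback of the call that started each stream.
class StreamDemultiplexer {
  private readonly handlers = new Map<string, ReplyHandler>();
  private nextId = 0;

  // Allocate a fresh streamId and remember the handler for it.
  register(handler: ReplyHandler): string {
    this.nextId += 1;
    const streamId = `stream-${this.nextId}`;
    this.handlers.set(streamId, handler);
    return streamId;
  }

  // Forget a stream once its request has completed or failed.
  release(streamId: string): void {
    this.handlers.delete(streamId);
  }

  // Deliver one notification; returns false if no call owns the stream.
  dispatch(notification: GradleStreamPayload): boolean {
    const handler = this.handlers.get(notification.streamId);
    if (handler === undefined) {
      return false;
    }
    handler(new Uint8Array(Buffer.from(notification.payload, 'base64')));
    return true;
  }
}
```

In use, a streaming call would `register` a handler, send the returned streamId in the request params, feed every incoming reply notification through `dispatch`, and `release` the id when the request promise settles. Two concurrent builds then never see each other's output even though they share one socket.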

CI note: because PR 1's base is develop (still gRPC on the TS side), PR 1's cross-platform integration test "should load gradle tasks" cannot pass standalone; only its Build & Analyse + Java unit tests do. PR 2's branch is rebased on top of PR 1, so PR 2's diff against develop contains both halves and its full cross-platform CI goes green, validating the combined stack. The two PRs are merged together (or back-to-back) to keep develop green.

PR 3/3 — proto + root-build cleanup — #1867 (merged)

Removes the now-dead plumbing once both halves are on JSON-RPC.

  • Modifies proto/gradle.proto to delete the service Gradle { … } block (all message definitions stay — they remain the payload schema), and deletes the root build.gradle grpcVersion ext property.
  • Clears the remaining grpc mentions across docs and packaging: README.md, ARCHITECTURE.md, CONTRIBUTING.md, doc comments in the Java/TS transport code, the four io.grpc entries in cgmanifest.json, the grpc block in ThirdPartyNotices.txt, the @grpc/grpc-js dependency in npm-package/package.json (+ lockfile), and the @grpc/proto-loader webpack external.
  • Tests: no new tests; the existing suite keeps passing (:gradle-server:test, tsc, production webpack).
  • State after merge: the only remaining grpc tokens are intentional, documented exemptions: the GrpcGradle{Closure,Method,Field} proto message names (renamed separately in #1868, "refactor: rename GrpcGradle* proto messages to Gradle*Proto"), the user-facing io.grpc:* Maven artifacts in gradle-language-server's ArtifactUsage.json autocomplete catalog, and historical entries in CHANGELOG.md.

Follow-up — rename GrpcGradle* proto messages — #1868 (merged)

A cosmetic follow-up (not part of the #1815 fix) that renames GrpcGradleClosure/Method/Field → GradleClosureProto/MethodProto/FieldProto (name-only; the protobuf wire format is unchanged because field numbers/tags are untouched) and drops the now-unused "Grpc" cSpell word. The *Proto suffix avoids colliding with the existing com.microsoft.gradle.api.GradleClosure/Method/Field domain models. After this, the codebase has zero gRPC-named identifiers outside the two data/history files noted above.

Out of scope

Related
