[SEA-NodeJS] (3/3) INTERVAL type parity + operation-lifecycle depth#411
Conversation
1ab03b7 to
33252ab
Compare
e79405d to
d8bcc30
Compare
33252ab to
59fa37a
Compare
d8bcc30 to
dde8efa
Compare
…rity, docs
Validated each finding against a live pecotesting warehouse first; the headline
INTERVAL story turned out to be split-artifact, not breakage.
- F7: getResultMetadata stored the *unpatched* Duration IPC bytes in
meta.arrowSchema while advertising ArrowBased — store the patched bytes so an
ArrowBased consumer doesn't hit `Unrecognized type "Duration" (18)`.
- F3: fetchChunk now honors the `isClosed` cooperative-cancel probe (parity with
ThriftOperationBackend) at its yield points.
- F6: on a fetch error, best-effort close the statement (napi contract: stream
is unspecified after Err) and surface a typed kernel error via decodeNapiKernelError.
- F9: cancel-after-fetch now throws the canonical OperationStateError(Canceled)
("The operation was canceled by a client") — byte-matches the Thrift message.
- F10: typed HiveDriverError (not raw Error) in the schema/fetchNextBatch guards.
- F1: corrected SeaArrowIpcDurationFix docs — on this layer the rewriter only
makes Duration *decodable* (raw Int64); the duration_unit formatter lands in
#411 (verified live: byte-identical to Thrift).
- F5: documented that nested Duration is a SHARED apache-arrow@13 limitation —
verified the Thrift backend throws the identical error, so SEA matches parity.
- F2: added a live e2e that drives a real Arrow Duration column through the
rewriter (asserts no "Duration (18)" crash + raw-Int64 on this layer).
- F8: pinned the no-`Failed` invariant in status() (failures reject at submit).
- F12: renamed SeaResultsProvider's SeaStatementHandle → SeaFetchHandle (was a
name collision with the lifecycle interface of a different shape).
- F13: dropped the no-op await on the synchronous statement.schema().
- F14: fixed the Float-precision comment (Precision enum, not bit-width).
- F15: SeaResultsProvider.prime loops instead of self-recursing on empty batches.
Deferred (noted on the PR): F4 (per-batch triple-decode perf) and F11
(hasResultSet() hard-coded true for M0).
Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
dde8efa to
fc2aee5
Compare
…erage Validated every interval edge case (null, multi-row, negative sub-year, sibling-survives) against a live pecotesting warehouse first — all byte-identical to Thrift. The findings were layering/dead-code/coverage, not runtime bugs. - F1: corrected the DURATION_UNIT_METADATA_KEY doc — the interval/duration branches are SEA-gated by construction (Thrift maps INTERVAL → ArrowString and never reaches them), NOT "reused by thrift" as the old comment claimed. - F3/F6/F10: formatArrowInterval now handles YEAR_MONTH only (typed `Interval` + `IntervalUnit.YEAR_MONTH`, no magic `=== 0`) and THROWS on any other unit. The old non-exhaustive default silently misread MONTH_DAY_NANO/undefined as [days,ms]; and the native Interval[DayTime] branch was dead+broken (the kernel emits DAY-TIME as Duration, handled separately) — removed it. - F2: exported the metadata key + added a test pinning it equal to the SEA-side declaration (guards against a silent rename drift). - F8: dropped the dead `_unit` param from formatDayTimeFromTotal. - F9: removed the dead duration check in the BigNum branch (rewritten Int64s arrive as raw bigint) and fixed the false "Int32Array instanceof Uint8Array" comment. - F7: added a YEAR-MONTH sub-year-negative unit test (-1 month → "-0-1"). - F4: added live e2e for null INTERVAL → null and multi-row batches. - F5: e2e before() now probes getSeaNative() and skips (not errors) when the binding is absent; dropped the flaky wall-clock latency assertions (assert behavior — cancel resolved, callback fired once). Deferred (low/informational, noted): F11 (consolidate test makeContext helpers), F12 (tighten instanceOf assertions), F13 (per-operation interval-representation breadcrumb — a per-value log would be spam). Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
|
Change naming to Kernel.. instead of Sea.. everywhere |
|
🟡 P2 (Minor )
|
…erage Validated every interval edge case (null, multi-row, negative sub-year, sibling-survives) against a live pecotesting warehouse first — all byte-identical to Thrift. The findings were layering/dead-code/coverage, not runtime bugs. - F1: corrected the DURATION_UNIT_METADATA_KEY doc — the interval/duration branches are SEA-gated by construction (Thrift maps INTERVAL → ArrowString and never reaches them), NOT "reused by thrift" as the old comment claimed. - F3/F6/F10: formatArrowInterval now handles YEAR_MONTH only (typed `Interval` + `IntervalUnit.YEAR_MONTH`, no magic `=== 0`) and THROWS on any other unit. The old non-exhaustive default silently misread MONTH_DAY_NANO/undefined as [days,ms]; and the native Interval[DayTime] branch was dead+broken (the kernel emits DAY-TIME as Duration, handled separately) — removed it. - F2: exported the metadata key + added a test pinning it equal to the SEA-side declaration (guards against a silent rename drift). - F8: dropped the dead `_unit` param from formatDayTimeFromTotal. - F9: removed the dead duration check in the BigNum branch (rewritten Int64s arrive as raw bigint) and fixed the false "Int32Array instanceof Uint8Array" comment. - F7: added a YEAR-MONTH sub-year-negative unit test (-1 month → "-0-1"). - F4: added live e2e for null INTERVAL → null and multi-row batches. - F5: e2e before() now probes getSeaNative() and skips (not errors) when the binding is absent; dropped the flaky wall-clock latency assertions (assert behavior — cancel resolved, callback fired once). Deferred (low/informational, noted): F11 (consolidate test makeContext helpers), F12 (tighten instanceOf assertions), F13 (per-operation interval-representation breadcrumb — a per-value log would be spam). Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
…t gaps Addresses the P2 review comment on #411 (the interval/duration layer) and cascades the #410 fixes underneath (this branch was rebased onto the updated #410 tip). Validated against a live warehouse (interval-duration + interval-edge e2e) and unit tests. - ArrowResultConverter.toNanoseconds: throw on an unrecognized Arrow duration unit instead of silently defaulting to NANOSECOND. The four units are exactly what SeaArrowIpcDurationFix stamps, so an unknown unit means the two sides drifted — fail loud, matching formatArrowInterval. - SeaIntervalParity tests: add DAY-TIME via Duration(MILLISECOND) and Duration(SECOND) (only MICROSECOND/NANOSECOND were exercised), and a test asserting the converter throws (HiveDriverError) on a native non-YEAR_MONTH Arrow Interval rather than misreading it. - interval-duration-e2e: flip the stale "raw Int64 on this layer" assertion to expect the formatted thrift string "1 02:03:04.000000000". That test was written on #410 (pre-formatter) and explicitly noted "#411 flips this to the formatted string" — now that #411's formatter is wired, the value is the byte-identical thrift DAY-TIME string. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
e2abd38 to
54dbcff
Compare
|
Addressed the P2 items in
The "rename |
|
Re: "Change naming to |
…erage Validated every interval edge case (null, multi-row, negative sub-year, sibling-survives) against a live pecotesting warehouse first — all byte-identical to Thrift. The findings were layering/dead-code/coverage, not runtime bugs. - F1: corrected the DURATION_UNIT_METADATA_KEY doc — the interval/duration branches are SEA-gated by construction (Thrift maps INTERVAL → ArrowString and never reaches them), NOT "reused by thrift" as the old comment claimed. - F3/F6/F10: formatArrowInterval now handles YEAR_MONTH only (typed `Interval` + `IntervalUnit.YEAR_MONTH`, no magic `=== 0`) and THROWS on any other unit. The old non-exhaustive default silently misread MONTH_DAY_NANO/undefined as [days,ms]; and the native Interval[DayTime] branch was dead+broken (the kernel emits DAY-TIME as Duration, handled separately) — removed it. - F2: exported the metadata key + added a test pinning it equal to the SEA-side declaration (guards against a silent rename drift). - F8: dropped the dead `_unit` param from formatDayTimeFromTotal. - F9: removed the dead duration check in the BigNum branch (rewritten Int64s arrive as raw bigint) and fixed the false "Int32Array instanceof Uint8Array" comment. - F7: added a YEAR-MONTH sub-year-negative unit test (-1 month → "-0-1"). - F4: added live e2e for null INTERVAL → null and multi-row batches. - F5: e2e before() now probes getSeaNative() and skips (not errors) when the binding is absent; dropped the flaky wall-clock latency assertions (assert behavior — cancel resolved, callback fired once). Deferred (low/informational, noted): F11 (consolidate test makeContext helpers), F12 (tighten instanceOf assertions), F13 (per-operation interval-representation breadcrumb — a per-value log would be spam). Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
…t gaps Addresses the P2 review comment on #411 (the interval/duration layer) and cascades the #410 fixes underneath (this branch was rebased onto the updated #410 tip). Validated against a live warehouse (interval-duration + interval-edge e2e) and unit tests. - ArrowResultConverter.toNanoseconds: throw on an unrecognized Arrow duration unit instead of silently defaulting to NANOSECOND. The four units are exactly what SeaArrowIpcDurationFix stamps, so an unknown unit means the two sides drifted — fail loud, matching formatArrowInterval. - SeaIntervalParity tests: add DAY-TIME via Duration(MILLISECOND) and Duration(SECOND) (only MICROSECOND/NANOSECOND were exercised), and a test asserting the converter throws (HiveDriverError) on a native non-YEAR_MONTH Arrow Interval rather than misreading it. - interval-duration-e2e: flip the stale "raw Int64 on this layer" assertion to expect the formatted thrift string "1 02:03:04.000000000". That test was written on #410 (pre-formatter) and explicitly noted "#411 flips this to the formatted string" — now that #411's formatter is wired, the value is the byte-identical thrift DAY-TIME string. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
54dbcff to
456046c
Compare
…rity, docs
Validated each finding against a live pecotesting warehouse first; the headline
INTERVAL story turned out to be split-artifact, not breakage.
- F7: getResultMetadata stored the *unpatched* Duration IPC bytes in
meta.arrowSchema while advertising ArrowBased — store the patched bytes so an
ArrowBased consumer doesn't hit `Unrecognized type "Duration" (18)`.
- F3: fetchChunk now honors the `isClosed` cooperative-cancel probe (parity with
ThriftOperationBackend) at its yield points.
- F6: on a fetch error, best-effort close the statement (napi contract: stream
is unspecified after Err) and surface a typed kernel error via decodeNapiKernelError.
- F9: cancel-after-fetch now throws the canonical OperationStateError(Canceled)
("The operation was canceled by a client") — byte-matches the Thrift message.
- F10: typed HiveDriverError (not raw Error) in the schema/fetchNextBatch guards.
- F1: corrected SeaArrowIpcDurationFix docs — on this layer the rewriter only
makes Duration *decodable* (raw Int64); the duration_unit formatter lands in
#411 (verified live: byte-identical to Thrift).
- F5: documented that nested Duration is a SHARED apache-arrow@13 limitation —
verified the Thrift backend throws the identical error, so SEA matches parity.
- F2: added a live e2e that drives a real Arrow Duration column through the
rewriter (asserts no "Duration (18)" crash + raw-Int64 on this layer).
- F8: pinned the no-`Failed` invariant in status() (failures reject at submit).
- F12: renamed SeaResultsProvider's SeaStatementHandle → SeaFetchHandle (was a
name collision with the lifecycle interface of a different shape).
- F13: dropped the no-op await on the synchronous statement.schema().
- F14: fixed the Float-precision comment (Precision enum, not bit-width).
- F15: SeaResultsProvider.prime loops instead of self-recursing on empty batches.
Deferred (noted on the PR): F4 (per-batch triple-decode perf) and F11
(hasResultSet() hard-coded true for M0).
Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
8d60efd to
1991121
Compare
* feat(sea): SEA execution + result fetching [2/3]
Second of three stacked PRs (base: [1/3] connect + auth). Wires the
statement-execution + result-read path:
- SeaSessionBackend.executeStatement: real implementation — runs SQL via the
napi Connection and returns a SeaOperationBackend (replaces [1/3]'s stub).
- SeaOperationBackend: fetch pipeline (napi Statement.fetchNextBatch →
SeaResultsProvider → ArrowResultConverter → ResultSlicer) plus operation
cancel/close/finished via the SeaOperationLifecycle helpers.
- SeaResultsProvider / SeaArrowIpc / SeaArrowIpcDurationFix: Arrow IPC decode
for inline + CloudFetch result batches (the duration-fix pre-processor
rewrites Arrow Duration → Int64 so apache-arrow@13 can read it).
- ArrowResultConverter: constructor now takes the neutral { schema? } shape so
both the Thrift and SEA backends construct it without an adapter.
- flatbuffers pinned to 23.5.26 to match apache-arrow@13's nested copy.
Tests: executeStatement + openSession forwarding, M0 datatype round-trip
through the shared converter (primitives + ARRAY/MAP/STRUCT), multi-batch
streaming, and the neutral-metadata converter contract. Full INTERVAL-type
value parity + exhaustive operation-lifecycle coverage land in [3/3].
Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
* fix(sea): address #410 review — fetch cleanup, cooperative cancel, parity, docs
Validated each finding against a live pecotesting warehouse first; the headline
INTERVAL story turned out to be split-artifact, not breakage.
- F7: getResultMetadata stored the *unpatched* Duration IPC bytes in
meta.arrowSchema while advertising ArrowBased — store the patched bytes so an
ArrowBased consumer doesn't hit `Unrecognized type "Duration" (18)`.
- F3: fetchChunk now honors the `isClosed` cooperative-cancel probe (parity with
ThriftOperationBackend) at its yield points.
- F6: on a fetch error, best-effort close the statement (napi contract: stream
is unspecified after Err) and surface a typed kernel error via decodeNapiKernelError.
- F9: cancel-after-fetch now throws the canonical OperationStateError(Canceled)
("The operation was canceled by a client") — byte-matches the Thrift message.
- F10: typed HiveDriverError (not raw Error) in the schema/fetchNextBatch guards.
- F1: corrected SeaArrowIpcDurationFix docs — on this layer the rewriter only
makes Duration *decodable* (raw Int64); the duration_unit formatter lands in
#411 (verified live: byte-identical to Thrift).
- F5: documented that nested Duration is a SHARED apache-arrow@13 limitation —
verified the Thrift backend throws the identical error, so SEA matches parity.
- F2: added a live e2e that drives a real Arrow Duration column through the
rewriter (asserts no "Duration (18)" crash + raw-Int64 on this layer).
- F8: pinned the no-`Failed` invariant in status() (failures reject at submit).
- F12: renamed SeaResultsProvider's SeaStatementHandle → SeaFetchHandle (was a
name collision with the lifecycle interface of a different shape).
- F13: dropped the no-op await on the synchronous statement.schema().
- F14: fixed the Float-precision comment (Precision enum, not bit-width).
- F15: SeaResultsProvider.prime loops instead of self-recursing on empty batches.
Deferred (noted on the PR): F4 (per-batch triple-decode perf) and F11
(hasResultSet() hard-coded true for M0).
Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
* fix(sea): address #410 review — error fidelity, fetch perf, validation, tests
Addresses the 8 review threads on PR #410. Validated against a live
warehouse (results-e2e parity gate + interval-duration-e2e + execution-e2e
all pass) plus new/updated unit tests.
- SeaOperationLifecycle: rethrowKernelError now delegates to the canonical
decodeNapiKernelError, so cancel/close errors get the same fidelity as
fetch errors — the sqlState remap (envelope field is `sqlState`, the old
code read `sqlstate` and dropped it), the kernelMetadata namespace, and
the strict `startsWith` sentinel match (was a loose `indexOf >= 0`).
- SeaArrowIpc: replace decodeIpcBatch (full RecordBatchReader
materialization just to sum row counts) with countRowsInIpc, which reads
RecordBatch header `length` via MessageReader and skips bodies — no
vector decode. Removes ~2x Arrow decode CPU + transient allocation on the
fetch hot path (the converter still re-decodes for values). SeaResultsProvider
switched to it.
- SeaArrowIpc: hermetic unit tests (tests/unit/sea/SeaArrowIpc.test.ts) for
the framing walk, no-op/garbage rewrite paths, the row-count path, and the
empty-schema guard. (The Duration-positive rewrite stays covered by the
live e2e — apache-arrow@13 can't construct a Duration column hermetically.)
- SeaOperationBackend: on a fetch-error cleanup close() that also fails, log
the failure at warn (statement may leak) instead of fully swallowing it —
the original fetch error is still surfaced.
- SeaSessionBackend: reject queryTags / useLZ4Compression /
stagingAllowedLocalPath with M0-style errors instead of silently ignoring
them (+ unit tests). Silent no-ops are the worst failure mode for callers.
- SeaArrowIpc: throw a typed HiveDriverError (not a raw TypeError) when an
IPC payload carries no schema.
- SeaArrowIpcDurationFix.readMessageAt: fail-closed guards for negative
metadataLength / bodyLength (was relying on subarray clamping).
- Fix stale `tests/integration/sea/...` doc refs → `tests/e2e/sea/`.
Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
* test(telemetry): de-flake FeatureFlagCache placeholder test (network seam)
The "fetchFeatureFlag should return false as placeholder implementation"
test called the real fetchFeatureFlag, which makes an HTTP call via
fetchWithRetry (10s timeout) to a bogus host. Under mocha's 2s default it
only passed when the DNS failure happened to resolve quickly — flaky
across runners and Node versions (it timed out on Node 14/16/18 in CI,
fail-fast-canceling the rest of the matrix; Node 20 passed).
Stub the fetchWithRetry network seam so the test deterministically
exercises the behavior under test (fetchFeatureFlag resolves to false)
with no real network call. No production code change.
Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
---------
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Third of three stacked PRs (base: [2/3] execution + results). Completes the
SEA foundation:
- ArrowResultConverter: INTERVAL parity. Formats Arrow Interval[YearMonth] /
Interval[DayTime] and Duration (rewritten to Int64 by SeaArrowIpcDurationFix)
into the canonical Thrift strings ("Y-M" / "D HH:mm:ss.fffffffff"), byte-
identical to the Thrift path. Threads the Arrow field through convertArrowTypes
so the duration-unit metadata is available at value-conversion time.
- Exhaustive operation-lifecycle coverage: seaCancel / seaClose / seaFinished
idempotency, flag-set-before-await ordering (cancel-mid-fetch), kernel-error
mapping, and the neutral OperationStatus callback shape.
- SeaIntervalParity tests build real Arrow IPC batches via flatbuffers and
assert the formatted strings.
With this, SEA reaches M0 parity with Thrift (connect/auth → execute →
fetch → operation lifecycle → INTERVAL types). Replaces the single 8/8 PR #383.
Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
…erage Validated every interval edge case (null, multi-row, negative sub-year, sibling-survives) against a live pecotesting warehouse first — all byte-identical to Thrift. The findings were layering/dead-code/coverage, not runtime bugs. - F1: corrected the DURATION_UNIT_METADATA_KEY doc — the interval/duration branches are SEA-gated by construction (Thrift maps INTERVAL → ArrowString and never reaches them), NOT "reused by thrift" as the old comment claimed. - F3/F6/F10: formatArrowInterval now handles YEAR_MONTH only (typed `Interval` + `IntervalUnit.YEAR_MONTH`, no magic `=== 0`) and THROWS on any other unit. The old non-exhaustive default silently misread MONTH_DAY_NANO/undefined as [days,ms]; and the native Interval[DayTime] branch was dead+broken (the kernel emits DAY-TIME as Duration, handled separately) — removed it. - F2: exported the metadata key + added a test pinning it equal to the SEA-side declaration (guards against a silent rename drift). - F8: dropped the dead `_unit` param from formatDayTimeFromTotal. - F9: removed the dead duration check in the BigNum branch (rewritten Int64s arrive as raw bigint) and fixed the false "Int32Array instanceof Uint8Array" comment. - F7: added a YEAR-MONTH sub-year-negative unit test (-1 month → "-0-1"). - F4: added live e2e for null INTERVAL → null and multi-row batches. - F5: e2e before() now probes getSeaNative() and skips (not errors) when the binding is absent; dropped the flaky wall-clock latency assertions (assert behavior — cancel resolved, callback fired once). Deferred (low/informational, noted): F11 (consolidate test makeContext helpers), F12 (tighten instanceOf assertions), F13 (per-operation interval-representation breadcrumb — a per-value log would be spam). Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
…t gaps Addresses the P2 review comment on #411 (the interval/duration layer) and cascades the #410 fixes underneath (this branch was rebased onto the updated #410 tip). Validated against a live warehouse (interval-duration + interval-edge e2e) and unit tests. - ArrowResultConverter.toNanoseconds: throw on an unrecognized Arrow duration unit instead of silently defaulting to NANOSECOND. The four units are exactly what SeaArrowIpcDurationFix stamps, so an unknown unit means the two sides drifted — fail loud, matching formatArrowInterval. - SeaIntervalParity tests: add DAY-TIME via Duration(MILLISECOND) and Duration(SECOND) (only MICROSECOND/NANOSECOND were exercised), and a test asserting the converter throws (HiveDriverError) on a native non-YEAR_MONTH Arrow Interval rather than misreading it. - interval-duration-e2e: flip the stale "raw Int64 on this layer" assertion to expect the formatted thrift string "1 02:03:04.000000000". That test was written on #410 (pre-formatter) and explicitly noted "#411 flips this to the formatted string" — now that #411's formatter is wired, the value is the byte-identical thrift DAY-TIME string. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
456046c to
92f2ee8
Compare
Third of three stacked PRs (base: 2/3 execution + results). Completes the SEA M0 foundation.
Y-M/D HH:mm:ss.fffffffff), byte-identical to the Thrift path.With this, SEA reaches M0 parity with Thrift. Replaces the single 8/8 #383.
Stack: 1/3 → 2/3 → 3/3
This pull request and its description were written by Isaac.