Skip to content

[SEA-NodeJS] (3/3) INTERVAL type parity + operation-lifecycle depth#411

Merged
msrathore-db merged 3 commits into
mainfrom
msrathore/sea-operation-lifecycle
Jun 2, 2026
Merged

[SEA-NodeJS] (3/3) INTERVAL type parity + operation-lifecycle depth#411
msrathore-db merged 3 commits into
mainfrom
msrathore/sea-operation-lifecycle

Conversation

@msrathore-db
Copy link
Copy Markdown
Contributor

Third of three stacked PRs (base: 2/3 execution + results). Completes the SEA M0 foundation.

  • ArrowResultConverter — INTERVAL parity: formats Arrow Interval[YearMonth]/Interval[DayTime] and Duration (→Int64 via SeaArrowIpcDurationFix) into canonical Thrift strings (Y-M / D HH:mm:ss.fffffffff), byte-identical to the Thrift path.
  • Exhaustive operation-lifecycle coverage: seaCancel/seaClose/seaFinished idempotency, flag-set-before-await (cancel-mid-fetch), kernel-error mapping, neutral OperationStatus callback shape.
  • SeaIntervalParity tests build real Arrow IPC batches via flatbuffers and assert the formatted strings.

With this, SEA reaches M0 parity with Thrift. Replaces the single 8/8 #383.

Stack: 1/3 → 2/3 → 3/3

This pull request and its description were written by Isaac.

@msrathore-db msrathore-db force-pushed the msrathore/sea-execution-results branch from 1ab03b7 to 33252ab Compare June 1, 2026 13:43
@msrathore-db msrathore-db force-pushed the msrathore/sea-operation-lifecycle branch from e79405d to d8bcc30 Compare June 1, 2026 13:43
@msrathore-db msrathore-db force-pushed the msrathore/sea-execution-results branch from 33252ab to 59fa37a Compare June 1, 2026 17:57
@msrathore-db msrathore-db force-pushed the msrathore/sea-operation-lifecycle branch from d8bcc30 to dde8efa Compare June 1, 2026 17:58
msrathore-db added a commit that referenced this pull request Jun 1, 2026
…rity, docs

Validated each finding against a live pecotesting warehouse first; the headline
INTERVAL story turned out to be split-artifact, not breakage.

- F7: getResultMetadata stored the *unpatched* Duration IPC bytes in
  meta.arrowSchema while advertising ArrowBased — store the patched bytes so an
  ArrowBased consumer doesn't hit `Unrecognized type "Duration" (18)`.
- F3: fetchChunk now honors the `isClosed` cooperative-cancel probe (parity with
  ThriftOperationBackend) at its yield points.
- F6: on a fetch error, best-effort close the statement (napi contract: stream
  is unspecified after Err) and surface a typed kernel error via decodeNapiKernelError.
- F9: cancel-after-fetch now throws the canonical OperationStateError(Canceled)
  ("The operation was canceled by a client") — byte-matches the Thrift message.
- F10: typed HiveDriverError (not raw Error) in the schema/fetchNextBatch guards.
- F1: corrected SeaArrowIpcDurationFix docs — on this layer the rewriter only
  makes Duration *decodable* (raw Int64); the duration_unit formatter lands in
  #411 (verified live: byte-identical to Thrift).
- F5: documented that nested Duration is a SHARED apache-arrow@13 limitation —
  verified the Thrift backend throws the identical error, so SEA matches parity.
- F2: added a live e2e that drives a real Arrow Duration column through the
  rewriter (asserts no "Duration (18)" crash + raw-Int64 on this layer).
- F8: pinned the no-`Failed` invariant in status() (failures reject at submit).
- F12: renamed SeaResultsProvider's SeaStatementHandle → SeaFetchHandle (was a
  name collision with the lifecycle interface of a different shape).
- F13: dropped the no-op await on the synchronous statement.schema().
- F14: fixed the Float-precision comment (Precision enum, not bit-width).
- F15: SeaResultsProvider.prime loops instead of self-recursing on empty batches.

Deferred (noted on the PR): F4 (per-batch triple-decode perf) and F11
(hasResultSet() hard-coded true for M0).

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
@msrathore-db msrathore-db force-pushed the msrathore/sea-operation-lifecycle branch from dde8efa to fc2aee5 Compare June 1, 2026 20:42
msrathore-db added a commit that referenced this pull request Jun 1, 2026
…erage

Validated every interval edge case (null, multi-row, negative sub-year,
sibling-survives) against a live pecotesting warehouse first — all byte-identical
to Thrift. The findings were layering/dead-code/coverage, not runtime bugs.

- F1: corrected the DURATION_UNIT_METADATA_KEY doc — the interval/duration
  branches are SEA-gated by construction (Thrift maps INTERVAL → ArrowString and
  never reaches them), NOT "reused by thrift" as the old comment claimed.
- F3/F6/F10: formatArrowInterval now handles YEAR_MONTH only (typed `Interval` +
  `IntervalUnit.YEAR_MONTH`, no magic `=== 0`) and THROWS on any other unit. The
  old non-exhaustive default silently misread MONTH_DAY_NANO/undefined as
  [days,ms]; and the native Interval[DayTime] branch was dead+broken (the kernel
  emits DAY-TIME as Duration, handled separately) — removed it.
- F2: exported the metadata key + added a test pinning it equal to the SEA-side
  declaration (guards against a silent rename drift).
- F8: dropped the dead `_unit` param from formatDayTimeFromTotal.
- F9: removed the dead duration check in the BigNum branch (rewritten Int64s
  arrive as raw bigint) and fixed the false "Int32Array instanceof Uint8Array"
  comment.
- F7: added a YEAR-MONTH sub-year-negative unit test (-1 month → "-0-1").
- F4: added live e2e for null INTERVAL → null and multi-row batches.
- F5: e2e before() now probes getSeaNative() and skips (not errors) when the
  binding is absent; dropped the flaky wall-clock latency assertions (assert
  behavior — cancel resolved, callback fired once).

Deferred (low/informational, noted): F11 (consolidate test makeContext helpers),
F12 (tighten instanceOf assertions), F13 (per-operation interval-representation
breadcrumb — a per-value log would be spam).

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
@msrathore-db msrathore-db requested a review from gopalldb June 2, 2026 09:10
@gopalldb
Copy link
Copy Markdown
Collaborator

gopalldb commented Jun 2, 2026

Change naming to Kernel.. instead of Sea.. everywhere

@gopalldb
Copy link
Copy Markdown
Collaborator

gopalldb commented Jun 2, 2026

🟡 P2 (Minor )

  • toNanoseconds default treats unknown units as nanoseconds — SEA-only and currently impossible (rewriter enumerates exactly the four units), but the asymmetry (rewriter defaults MICROSECOND, converter
    defaults NANOSECOND) is a latent footgun if they ever drift. → consider throwing on unrecognized units, mirroring formatArrowInterval's fail-loud stance.
  • formatArrowInterval throws mid-result on native non-YEAR_MONTH intervals — reachable only on SEA if the kernel's "DAY-TIME always arrives as Duration" assumption breaks. Fail-loud is defensible (the old
    non-exhaustive default emitted confidently-wrong [days, ms]), and it can't affect Thrift.
  • Test gaps — no dedicated test asserts the converter throws on a native non-YEAR_MONTH Interval, and no SEA test exercises MILLISECOND/SECOND duration units specifically

msrathore-db added a commit that referenced this pull request Jun 2, 2026
…erage

Validated every interval edge case (null, multi-row, negative sub-year,
sibling-survives) against a live pecotesting warehouse first — all byte-identical
to Thrift. The findings were layering/dead-code/coverage, not runtime bugs.

- F1: corrected the DURATION_UNIT_METADATA_KEY doc — the interval/duration
  branches are SEA-gated by construction (Thrift maps INTERVAL → ArrowString and
  never reaches them), NOT "reused by thrift" as the old comment claimed.
- F3/F6/F10: formatArrowInterval now handles YEAR_MONTH only (typed `Interval` +
  `IntervalUnit.YEAR_MONTH`, no magic `=== 0`) and THROWS on any other unit. The
  old non-exhaustive default silently misread MONTH_DAY_NANO/undefined as
  [days,ms]; and the native Interval[DayTime] branch was dead+broken (the kernel
  emits DAY-TIME as Duration, handled separately) — removed it.
- F2: exported the metadata key + added a test pinning it equal to the SEA-side
  declaration (guards against a silent rename drift).
- F8: dropped the dead `_unit` param from formatDayTimeFromTotal.
- F9: removed the dead duration check in the BigNum branch (rewritten Int64s
  arrive as raw bigint) and fixed the false "Int32Array instanceof Uint8Array"
  comment.
- F7: added a YEAR-MONTH sub-year-negative unit test (-1 month → "-0-1").
- F4: added live e2e for null INTERVAL → null and multi-row batches.
- F5: e2e before() now probes getSeaNative() and skips (not errors) when the
  binding is absent; dropped the flaky wall-clock latency assertions (assert
  behavior — cancel resolved, callback fired once).

Deferred (low/informational, noted): F11 (consolidate test makeContext helpers),
F12 (tighten instanceOf assertions), F13 (per-operation interval-representation
breadcrumb — a per-value log would be spam).

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
msrathore-db added a commit that referenced this pull request Jun 2, 2026
…t gaps

Addresses the P2 review comment on #411 (the interval/duration layer) and
cascades the #410 fixes underneath (this branch was rebased onto the
updated #410 tip). Validated against a live warehouse (interval-duration
+ interval-edge e2e) and unit tests.

- ArrowResultConverter.toNanoseconds: throw on an unrecognized Arrow
  duration unit instead of silently defaulting to NANOSECOND. The four
  units are exactly what SeaArrowIpcDurationFix stamps, so an unknown unit
  means the two sides drifted — fail loud, matching formatArrowInterval.

- SeaIntervalParity tests: add DAY-TIME via Duration(MILLISECOND) and
  Duration(SECOND) (only MICROSECOND/NANOSECOND were exercised), and a
  test asserting the converter throws (HiveDriverError) on a native
  non-YEAR_MONTH Arrow Interval rather than misreading it.

- interval-duration-e2e: flip the stale "raw Int64 on this layer" assertion
  to expect the formatted thrift string "1 02:03:04.000000000". That test
  was written on #410 (pre-formatter) and explicitly noted "#411 flips this
  to the formatted string" — now that #411's formatter is wired, the value
  is the byte-identical thrift DAY-TIME string.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
@msrathore-db msrathore-db force-pushed the msrathore/sea-operation-lifecycle branch from e2abd38 to 54dbcff Compare June 2, 2026 14:13
@msrathore-db
Copy link
Copy Markdown
Contributor Author

Addressed the P2 items in 54dbcff (validated against a live warehouse + unit tests):

  • toNanoseconds fail-loud: now throws HiveDriverError on an unrecognized Arrow duration unit instead of defaulting to NANOSECOND. The four units are exactly what SeaArrowIpcDurationFix stamps, so an unknown unit means the two sides drifted — fail loud, matching formatArrowInterval's stance.
  • formatArrowInterval mid-result throw: kept as-is (fail-loud is the intended contract, as you noted — it can't affect Thrift).
  • Test gaps: added DAY-TIME via Duration(MILLISECOND) and Duration(SECOND) cases (only MICROSECOND/NANOSECOND were exercised), plus a test asserting the converter throws on a native non-YEAR_MONTH Arrow Interval rather than misreading it as [days, ms].

The "rename Sea..Kernel.. everywhere" comment is broader (it touches code already on main and the public useSEA option) — following up separately on scope.

@msrathore-db
Copy link
Copy Markdown
Contributor Author

Re: "Change naming to Kernel.. instead of Sea.. everywhere" — deferring this to a separate dedicated PR rather than folding it into this stack. The SEA backend (the lib/sea/ files + the public useSEA connection option) is already on main, and a ~349-occurrence rename across both #410/#411 and the merged code would diverge from main, churn the public API, and make these feature PRs hard to review. It'll be cleaner as one self-contained rename PR once the stack lands.

msrathore-db added a commit that referenced this pull request Jun 2, 2026
…erage

Validated every interval edge case (null, multi-row, negative sub-year,
sibling-survives) against a live pecotesting warehouse first — all byte-identical
to Thrift. The findings were layering/dead-code/coverage, not runtime bugs.

- F1: corrected the DURATION_UNIT_METADATA_KEY doc — the interval/duration
  branches are SEA-gated by construction (Thrift maps INTERVAL → ArrowString and
  never reaches them), NOT "reused by thrift" as the old comment claimed.
- F3/F6/F10: formatArrowInterval now handles YEAR_MONTH only (typed `Interval` +
  `IntervalUnit.YEAR_MONTH`, no magic `=== 0`) and THROWS on any other unit. The
  old non-exhaustive default silently misread MONTH_DAY_NANO/undefined as
  [days,ms]; and the native Interval[DayTime] branch was dead+broken (the kernel
  emits DAY-TIME as Duration, handled separately) — removed it.
- F2: exported the metadata key + added a test pinning it equal to the SEA-side
  declaration (guards against a silent rename drift).
- F8: dropped the dead `_unit` param from formatDayTimeFromTotal.
- F9: removed the dead duration check in the BigNum branch (rewritten Int64s
  arrive as raw bigint) and fixed the false "Int32Array instanceof Uint8Array"
  comment.
- F7: added a YEAR-MONTH sub-year-negative unit test (-1 month → "-0-1").
- F4: added live e2e for null INTERVAL → null and multi-row batches.
- F5: e2e before() now probes getSeaNative() and skips (not errors) when the
  binding is absent; dropped the flaky wall-clock latency assertions (assert
  behavior — cancel resolved, callback fired once).

Deferred (low/informational, noted): F11 (consolidate test makeContext helpers),
F12 (tighten instanceOf assertions), F13 (per-operation interval-representation
breadcrumb — a per-value log would be spam).

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
msrathore-db added a commit that referenced this pull request Jun 2, 2026
…t gaps

Addresses the P2 review comment on #411 (the interval/duration layer) and
cascades the #410 fixes underneath (this branch was rebased onto the
updated #410 tip). Validated against a live warehouse (interval-duration
+ interval-edge e2e) and unit tests.

- ArrowResultConverter.toNanoseconds: throw on an unrecognized Arrow
  duration unit instead of silently defaulting to NANOSECOND. The four
  units are exactly what SeaArrowIpcDurationFix stamps, so an unknown unit
  means the two sides drifted — fail loud, matching formatArrowInterval.

- SeaIntervalParity tests: add DAY-TIME via Duration(MILLISECOND) and
  Duration(SECOND) (only MICROSECOND/NANOSECOND were exercised), and a
  test asserting the converter throws (HiveDriverError) on a native
  non-YEAR_MONTH Arrow Interval rather than misreading it.

- interval-duration-e2e: flip the stale "raw Int64 on this layer" assertion
  to expect the formatted thrift string "1 02:03:04.000000000". That test
  was written on #410 (pre-formatter) and explicitly noted "#411 flips this
  to the formatted string" — now that #411's formatter is wired, the value
  is the byte-identical thrift DAY-TIME string.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
@msrathore-db msrathore-db force-pushed the msrathore/sea-operation-lifecycle branch from 54dbcff to 456046c Compare June 2, 2026 14:40
msrathore-db added a commit that referenced this pull request Jun 2, 2026
…rity, docs

Validated each finding against a live pecotesting warehouse first; the headline
INTERVAL story turned out to be split-artifact, not breakage.

- F7: getResultMetadata stored the *unpatched* Duration IPC bytes in
  meta.arrowSchema while advertising ArrowBased — store the patched bytes so an
  ArrowBased consumer doesn't hit `Unrecognized type "Duration" (18)`.
- F3: fetchChunk now honors the `isClosed` cooperative-cancel probe (parity with
  ThriftOperationBackend) at its yield points.
- F6: on a fetch error, best-effort close the statement (napi contract: stream
  is unspecified after Err) and surface a typed kernel error via decodeNapiKernelError.
- F9: cancel-after-fetch now throws the canonical OperationStateError(Canceled)
  ("The operation was canceled by a client") — byte-matches the Thrift message.
- F10: typed HiveDriverError (not raw Error) in the schema/fetchNextBatch guards.
- F1: corrected SeaArrowIpcDurationFix docs — on this layer the rewriter only
  makes Duration *decodable* (raw Int64); the duration_unit formatter lands in
  #411 (verified live: byte-identical to Thrift).
- F5: documented that nested Duration is a SHARED apache-arrow@13 limitation —
  verified the Thrift backend throws the identical error, so SEA matches parity.
- F2: added a live e2e that drives a real Arrow Duration column through the
  rewriter (asserts no "Duration (18)" crash + raw-Int64 on this layer).
- F8: pinned the no-`Failed` invariant in status() (failures reject at submit).
- F12: renamed SeaResultsProvider's SeaStatementHandle → SeaFetchHandle (was a
  name collision with the lifecycle interface of a different shape).
- F13: dropped the no-op await on the synchronous statement.schema().
- F14: fixed the Float-precision comment (Precision enum, not bit-width).
- F15: SeaResultsProvider.prime loops instead of self-recursing on empty batches.

Deferred (noted on the PR): F4 (per-batch triple-decode perf) and F11
(hasResultSet() hard-coded true for M0).

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
@msrathore-db msrathore-db force-pushed the msrathore/sea-execution-results branch from 8d60efd to 1991121 Compare June 2, 2026 14:54
msrathore-db added a commit that referenced this pull request Jun 2, 2026
* feat(sea): SEA execution + result fetching [2/3]

Second of three stacked PRs (base: [1/3] connect + auth). Wires the
statement-execution + result-read path:

- SeaSessionBackend.executeStatement: real implementation — runs SQL via the
  napi Connection and returns a SeaOperationBackend (replaces [1/3]'s stub).
- SeaOperationBackend: fetch pipeline (napi Statement.fetchNextBatch →
  SeaResultsProvider → ArrowResultConverter → ResultSlicer) plus operation
  cancel/close/finished via the SeaOperationLifecycle helpers.
- SeaResultsProvider / SeaArrowIpc / SeaArrowIpcDurationFix: Arrow IPC decode
  for inline + CloudFetch result batches (the duration-fix pre-processor
  rewrites Arrow Duration → Int64 so apache-arrow@13 can read it).
- ArrowResultConverter: constructor now takes the neutral { schema? } shape so
  both the Thrift and SEA backends construct it without an adapter.
- flatbuffers pinned to 23.5.26 to match apache-arrow@13's nested copy.

Tests: executeStatement + openSession forwarding, M0 datatype round-trip
through the shared converter (primitives + ARRAY/MAP/STRUCT), multi-batch
streaming, and the neutral-metadata converter contract. Full INTERVAL-type
value parity + exhaustive operation-lifecycle coverage land in [3/3].

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>

* fix(sea): address #410 review — fetch cleanup, cooperative cancel, parity, docs

Validated each finding against a live pecotesting warehouse first; the headline
INTERVAL story turned out to be split-artifact, not breakage.

- F7: getResultMetadata stored the *unpatched* Duration IPC bytes in
  meta.arrowSchema while advertising ArrowBased — store the patched bytes so an
  ArrowBased consumer doesn't hit `Unrecognized type "Duration" (18)`.
- F3: fetchChunk now honors the `isClosed` cooperative-cancel probe (parity with
  ThriftOperationBackend) at its yield points.
- F6: on a fetch error, best-effort close the statement (napi contract: stream
  is unspecified after Err) and surface a typed kernel error via decodeNapiKernelError.
- F9: cancel-after-fetch now throws the canonical OperationStateError(Canceled)
  ("The operation was canceled by a client") — byte-matches the Thrift message.
- F10: typed HiveDriverError (not raw Error) in the schema/fetchNextBatch guards.
- F1: corrected SeaArrowIpcDurationFix docs — on this layer the rewriter only
  makes Duration *decodable* (raw Int64); the duration_unit formatter lands in
  #411 (verified live: byte-identical to Thrift).
- F5: documented that nested Duration is a SHARED apache-arrow@13 limitation —
  verified the Thrift backend throws the identical error, so SEA matches parity.
- F2: added a live e2e that drives a real Arrow Duration column through the
  rewriter (asserts no "Duration (18)" crash + raw-Int64 on this layer).
- F8: pinned the no-`Failed` invariant in status() (failures reject at submit).
- F12: renamed SeaResultsProvider's SeaStatementHandle → SeaFetchHandle (was a
  name collision with the lifecycle interface of a different shape).
- F13: dropped the no-op await on the synchronous statement.schema().
- F14: fixed the Float-precision comment (Precision enum, not bit-width).
- F15: SeaResultsProvider.prime loops instead of self-recursing on empty batches.

Deferred (noted on the PR): F4 (per-batch triple-decode perf) and F11
(hasResultSet() hard-coded true for M0).

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>

* fix(sea): address #410 review — error fidelity, fetch perf, validation, tests

Addresses the 8 review threads on PR #410. Validated against a live
warehouse (results-e2e parity gate + interval-duration-e2e + execution-e2e
all pass) plus new/updated unit tests.

- SeaOperationLifecycle: rethrowKernelError now delegates to the canonical
  decodeNapiKernelError, so cancel/close errors get the same fidelity as
  fetch errors — the sqlState remap (envelope field is `sqlState`, the old
  code read `sqlstate` and dropped it), the kernelMetadata namespace, and
  the strict `startsWith` sentinel match (was a loose `indexOf >= 0`).

- SeaArrowIpc: replace decodeIpcBatch (full RecordBatchReader
  materialization just to sum row counts) with countRowsInIpc, which reads
  RecordBatch header `length` via MessageReader and skips bodies — no
  vector decode. Removes ~2x Arrow decode CPU + transient allocation on the
  fetch hot path (the converter still re-decodes for values). SeaResultsProvider
  switched to it.

- SeaArrowIpc: hermetic unit tests (tests/unit/sea/SeaArrowIpc.test.ts) for
  the framing walk, no-op/garbage rewrite paths, the row-count path, and the
  empty-schema guard. (The Duration-positive rewrite stays covered by the
  live e2e — apache-arrow@13 can't construct a Duration column hermetically.)

- SeaOperationBackend: on a fetch-error cleanup close() that also fails, log
  the failure at warn (statement may leak) instead of fully swallowing it —
  the original fetch error is still surfaced.

- SeaSessionBackend: reject queryTags / useLZ4Compression /
  stagingAllowedLocalPath with M0-style errors instead of silently ignoring
  them (+ unit tests). Silent no-ops are the worst failure mode for callers.

- SeaArrowIpc: throw a typed HiveDriverError (not a raw TypeError) when an
  IPC payload carries no schema.

- SeaArrowIpcDurationFix.readMessageAt: fail-closed guards for negative
  metadataLength / bodyLength (was relying on subarray clamping).

- Fix stale `tests/integration/sea/...` doc refs → `tests/e2e/sea/`.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>

* test(telemetry): de-flake FeatureFlagCache placeholder test (network seam)

The "fetchFeatureFlag should return false as placeholder implementation"
test called the real fetchFeatureFlag, which makes an HTTP call via
fetchWithRetry (10s timeout) to a bogus host. Under mocha's 2s default it
only passed when the DNS failure happened to resolve quickly — flaky
across runners and Node versions (it timed out on Node 14/16/18 in CI,
fail-fast-canceling the rest of the matrix; Node 20 passed).

Stub the fetchWithRetry network seam so the test deterministically
exercises the behavior under test (fetchFeatureFlag resolves to false)
with no real network call. No production code change.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>

---------

Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Third of three stacked PRs (base: [2/3] execution + results). Completes the
SEA foundation:

- ArrowResultConverter: INTERVAL parity. Formats Arrow Interval[YearMonth] /
  Interval[DayTime] and Duration (rewritten to Int64 by SeaArrowIpcDurationFix)
  into the canonical Thrift strings ("Y-M" / "D HH:mm:ss.fffffffff"), byte-
  identical to the Thrift path. Threads the Arrow field through convertArrowTypes
  so the duration-unit metadata is available at value-conversion time.
- Exhaustive operation-lifecycle coverage: seaCancel / seaClose / seaFinished
  idempotency, flag-set-before-await ordering (cancel-mid-fetch), kernel-error
  mapping, and the neutral OperationStatus callback shape.
- SeaIntervalParity tests build real Arrow IPC batches via flatbuffers and
  assert the formatted strings.

With this, SEA reaches M0 parity with Thrift (connect/auth → execute →
fetch → operation lifecycle → INTERVAL types). Replaces the single 8/8 PR #383.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
…erage

Validated every interval edge case (null, multi-row, negative sub-year,
sibling-survives) against a live pecotesting warehouse first — all byte-identical
to Thrift. The findings were layering/dead-code/coverage, not runtime bugs.

- F1: corrected the DURATION_UNIT_METADATA_KEY doc — the interval/duration
  branches are SEA-gated by construction (Thrift maps INTERVAL → ArrowString and
  never reaches them), NOT "reused by thrift" as the old comment claimed.
- F3/F6/F10: formatArrowInterval now handles YEAR_MONTH only (typed `Interval` +
  `IntervalUnit.YEAR_MONTH`, no magic `=== 0`) and THROWS on any other unit. The
  old non-exhaustive default silently misread MONTH_DAY_NANO/undefined as
  [days,ms]; and the native Interval[DayTime] branch was dead+broken (the kernel
  emits DAY-TIME as Duration, handled separately) — removed it.
- F2: exported the metadata key + added a test pinning it equal to the SEA-side
  declaration (guards against a silent rename drift).
- F8: dropped the dead `_unit` param from formatDayTimeFromTotal.
- F9: removed the dead duration check in the BigNum branch (rewritten Int64s
  arrive as raw bigint) and fixed the false "Int32Array instanceof Uint8Array"
  comment.
- F7: added a YEAR-MONTH sub-year-negative unit test (-1 month → "-0-1").
- F4: added live e2e for null INTERVAL → null and multi-row batches.
- F5: e2e before() now probes getSeaNative() and skips (not errors) when the
  binding is absent; dropped the flaky wall-clock latency assertions (assert
  behavior — cancel resolved, callback fired once).

Deferred (low/informational, noted): F11 (consolidate test makeContext helpers),
F12 (tighten instanceOf assertions), F13 (per-operation interval-representation
breadcrumb — a per-value log would be spam).

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
…t gaps

Addresses the P2 review comment on #411 (the interval/duration layer) and
cascades the #410 fixes underneath (this branch was rebased onto the
updated #410 tip). Validated against a live warehouse (interval-duration
+ interval-edge e2e) and unit tests.

- ArrowResultConverter.toNanoseconds: throw on an unrecognized Arrow
  duration unit instead of silently defaulting to NANOSECOND. The four
  units are exactly what SeaArrowIpcDurationFix stamps, so an unknown unit
  means the two sides drifted — fail loud, matching formatArrowInterval.

- SeaIntervalParity tests: add DAY-TIME via Duration(MILLISECOND) and
  Duration(SECOND) (only MICROSECOND/NANOSECOND were exercised), and a
  test asserting the converter throws (HiveDriverError) on a native
  non-YEAR_MONTH Arrow Interval rather than misreading it.

- interval-duration-e2e: flip the stale "raw Int64 on this layer" assertion
  to expect the formatted thrift string "1 02:03:04.000000000". That test
  was written on #410 (pre-formatter) and explicitly noted "#411 flips this
  to the formatted string" — now that #411's formatter is wired, the value
  is the byte-identical thrift DAY-TIME string.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
@msrathore-db msrathore-db changed the base branch from msrathore/sea-execution-results to main June 2, 2026 15:01
@msrathore-db msrathore-db force-pushed the msrathore/sea-operation-lifecycle branch from 456046c to 92f2ee8 Compare June 2, 2026 15:01
@msrathore-db msrathore-db merged commit de04e19 into main Jun 2, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants