`eth_call`/`StateCall` needlessly refused at the epoch after an expensive migration, with an undetectable error code #13642

_Originally reported by @ArseniiPetrovich_

## Summary

Explicit calls (`eth_call`, `EthEstimateGas`, `StateCall`) pinned to a specific tipset are rejected with `ErrExpensiveFork` ("refusing explicit call due to state fork at epoch") not only at an expensive upgrade epoch `U`, but also at `U+1`, the first epoch *after* the migration, whose state is already fully materialised. The error is also returned with the generic JSON-RPC code `1`, so downstream tooling cannot distinguish it from any other application error.

Two independent problems, both worth fixing:

1. The refusal window is one epoch too wide (`U+1` should be served).
2. The error has no stable, registered code.

## Symptom

Reproduced on mainnet across the nv28 / FireHorse upgrade (`UpgradeFireHorseHeight = 6052800`). Same `eth_call` (`feeGrowthGlobal0X128()` on a Uniswap v3 pool), varying only the pinned block:

```
epoch U-1 (6052799): OK   -> 0x...15675c877f4119b6014c3ff7346ceae74
epoch U   (6052800): ERR  -> {"code":1,"message":"refusing explicit call due to state fork at epoch"}   <-- legit
epoch U+1 (6052801): ERR  -> {"code":1,"message":"refusing explicit call due to state fork at epoch"}   <-- the problem
epoch U+2 (6052802): OK   -> 0x...15675c877f4119b6014c3ff7346ceae74
```

The state at `U+1` is plainly available: `eth_getStorageAt` and `eth_getCode` both succeed there and return the correct values. Only the *explicit-call* path refuses.
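The sweep above can be regenerated with a small helper that builds the four pinned `eth_call` request bodies. This is a minimal sketch, not the script used for the report: the pool address and calldata are placeholders to be filled in by the reader, the helper name `pinnedEthCall` is made up for illustration, and it assumes the Ethereum block number passed to the node equals the Filecoin epoch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// callObject is the transaction object of a standard eth_call request.
type callObject struct {
	To   string `json:"to"`
	Data string `json:"data"`
}

// rpcRequest is a plain JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	ID      int           `json:"id"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
}

// pinnedEthCall (hypothetical helper) builds an eth_call body pinned to one
// epoch, encoded as a hex block number.
func pinnedEthCall(id int, to, data string, epoch int64) ([]byte, error) {
	req := rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "eth_call",
		Params:  []interface{}{callObject{To: to, Data: data}, fmt.Sprintf("0x%x", epoch)},
	}
	return json.Marshal(req)
}

func main() {
	const upgrade = int64(6052800) // UpgradeFireHorseHeight from the report

	// Placeholders: substitute the real pool address and the ABI-encoded
	// call to feeGrowthGlobal0X128().
	const pool = "0x0000000000000000000000000000000000000000"
	const calldata = "0x"

	for i, epoch := range []int64{upgrade - 1, upgrade, upgrade + 1, upgrade + 2} {
		body, err := pinnedEthCall(i+1, pool, calldata, epoch)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(body))
	}
}
```

Each printed line can be POSTed to the node's JSON-RPC endpoint; only the final block parameter differs between the four requests.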

## Downstream impact

The Graph's `graph-node` indexes FEVM contracts by replaying `eth_call` at the block where each event was emitted. When an event lands in block `U` or `U+1` the call is refused, and `graph-node` does not recognise the message as a deterministic error, so it treats it as a possible reorg and retries indefinitely. The subgraph wedges permanently at the upgrade epoch. This becomes more likely with each upgrade as FEVM activity grows.

## Why it happens

The guard (in `node/impl/eth/gas.go` and `chain/stmgr/call.go`) refuses a call when an expensive migration sits anywhere between the *parent* epoch and the called epoch. Because the parent is included, a call at `U+1` trips on the migration at `U`, even though the result of that migration is already baked into the state `U+1` runs on: that state is recoverable without re-execution, so serving the call never runs the migration on demand. Epoch `U` itself must stay refused, because serving it would run the migration on demand against the wrong state.
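The off-by-one can be modelled in a few lines. The following is a simplified sketch of the window logic only, not the Lotus guard itself: `expensiveAt`, `refuseCurrent` and `refuseNarrowed` are hypothetical names, FireHorse is assumed to be the only expensive migration, and null rounds are ignored so that the parent is always the previous epoch.

```go
package main

import "fmt"

// upgradeEpoch is UpgradeFireHorseHeight from the report; treating it as the
// only expensive migration is an assumption made for this sketch.
const upgradeEpoch int64 = 6052800

// expensiveAt is a hypothetical stand-in for the node's fork schedule lookup.
func expensiveAt(epoch int64) bool { return epoch == upgradeEpoch }

// refuseCurrent models the behaviour described above: refuse when an
// expensive migration sits anywhere in [parent, called], parent included.
func refuseCurrent(parent, called int64) bool {
	for e := parent; e <= called; e++ {
		if expensiveAt(e) {
			return true
		}
	}
	return false
}

// refuseNarrowed models the narrowed window (parent, called]: a migration at
// the parent epoch no longer triggers a refusal.
func refuseNarrowed(parent, called int64) bool {
	for e := parent + 1; e <= called; e++ {
		if expensiveAt(e) {
			return true
		}
	}
	return false
}

func main() {
	for _, called := range []int64{upgradeEpoch - 1, upgradeEpoch, upgradeEpoch + 1, upgradeEpoch + 2} {
		parent := called - 1 // assumes no null rounds
		fmt.Printf("epoch %d: current refuses=%t, narrowed refuses=%t\n",
			called, refuseCurrent(parent, called), refuseNarrowed(parent, called))
	}
}
```

Under these assumptions the current window refuses both `U` and `U+1`, matching the symptom table, while the narrowed window refuses only `U`.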

Separately, `ErrExpensiveFork` is a bare sentinel that is never converted to a typed RPC error, so go-jsonrpc falls back to code `1`. Lotus already registers typed errors with stable codes (`api/api_errors.go`); `ErrExpensiveFork` is the natural sibling of `ErrNullRound` (both mean "this epoch can't be served as requested").

## Workarounds today

- **graph-node operators:** set `GRAPH_GETH_ETH_CALL_ERRORS="refusing explicit call due to state fork"` so the message is treated as deterministic, stopping the retry loop. This relies on the exact message string, and until a proper fix is applied it means some indexable content may be missed.
- **callers:** avoid pinning explicit calls to the upgrade epoch or the one immediately after; `U+2` onward is fine.

## Possible fixes

- **Serve `U+1`:** narrow the guard so it no longer refuses on a migration at the parent epoch, keeping `U` refused. This is what actually unblocks indexers, since events almost always land at `U+1` rather than exactly at `U`.
- **Signal it the way the ecosystem already does:** there's no widely-recognised code for this, but the de-facto convention for "state not servable at this block" (pruned/archive nodes) is code `-32000` plus a recognisable message phrase such as `required historical state unavailable` or `state ... is not available`. Lotus emits `1` with a bespoke message that no tooling recognises. Moving to `-32000` and extending the message to carry that phrase (e.g. `required historical state unavailable: refusing explicit call due to state fork at epoch N`) aligns with geth-oriented tooling. We should *not* borrow revert-style phrasing (e.g. `-32015` / "execution reverted"): `graph-node` would then treat the call as a successful revert and index `null` data, which is worse than the retry loop.

The serve-`U+1` fix is the one that resolves this in practice. The residual case (`U` itself, genuinely unservable) needs `graph-node` to gain a "state unavailable, advance past this block" path; it currently only knows retry-forever or treat-as-revert. The code/message change is the precondition for such an upstream fix, and less in our control.
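As a rough picture of what the code/message change buys a client, the sketch below classifies an RPC error into the two paths that exist today plus the hypothetical third one. It is not `graph-node`'s logic (which is not written in Go); `classify`, the action names and the matching rules are assumptions made for illustration, using only the codes and phrases quoted above.

```go
package main

import (
	"fmt"
	"strings"
)

// rpcError is the code/message pair of a JSON-RPC error object.
type rpcError struct {
	Code    int
	Message string
}

type action string

const (
	actionRetry       action = "retry"             // assume a possible reorg
	actionRevert      action = "treat-as-revert"   // index the call as reverted
	actionUnavailable action = "state-unavailable" // hypothetical new path
)

// classify (hypothetical) maps an error to a client action.
func classify(e rpcError) action {
	msg := strings.ToLower(e.Message)
	switch {
	case strings.Contains(msg, "execution reverted"):
		return actionRevert
	case e.Code == -32000 && strings.Contains(msg, "required historical state unavailable"):
		return actionUnavailable
	default:
		return actionRetry
	}
}

func main() {
	current := rpcError{Code: 1, Message: "refusing explicit call due to state fork at epoch"}
	proposed := rpcError{
		Code:    -32000,
		Message: "required historical state unavailable: refusing explicit call due to state fork at epoch N",
	}
	fmt.Println("current error:", classify(current))
	fmt.Println("proposed error:", classify(proposed))
}
```

With today's error nothing matches, so the client falls through to the retry path; the proposed code and phrase give it something to key on without resorting to revert semantics.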

