Skip to content

Python: add new shared-SSA-backed SSA adapter (additive)#21923

Open
yoff wants to merge 1 commit into
yoff/python-add-new-cfg-libraryfrom
yoff/python-add-new-ssa
Open

Python: add new shared-SSA-backed SSA adapter (additive)#21923
yoff wants to merge 1 commit into
yoff/python-add-new-cfg-libraryfrom
yoff/python-add-new-ssa

Conversation

@yoff
Copy link
Copy Markdown
Contributor

@yoff yoff commented Jun 1, 2026

Summary

Preparatory refactor for the shared-CFG dataflow migration (#21894).
Based on #21921 — merge that first.

Adds the new Python SSA adapter additively, without changing any production behaviour.

Library addition

  • semmle.python.dataflow.new.internal.SsaImpl — Python SSA implementation built on the new (shared) CFG.

    Mirrors the Java SSA adapter at java/ql/lib/semmle/code/java/dataflow/internal/SsaImpl.qll: an InputSig is defined in terms of positional (BasicBlock, int) variable references, and the shared codeql.ssa.Ssa::Make<Location, Cfg, Input> module is then instantiated.

    SourceVariable is the AST-level Py::Variable. Variable references are looked up via the new CFG facade's NameNode.defines/uses/deletes predicates (added in the preceding PR), which themselves are one-line bridges to AST-level Name.defines/uses/deletes.

    Implicit-entry definitions are inserted for non-local/global/builtin reads, captured variables, and (when needed) parameters.

Test additions

  • library-tests/dataflow-new-ssa/ — exercises the new SSA over a representative test corpus and checks expected def/use chains.
  • library-tests/dataflow-new-ssa-vs-legacy/ — runs both new SSA and legacy ESSA over the same corpus and diffs the results, so any semantic divergence shows up as a test failure.

Production impact

None. The new SSA adapter has zero callers in lib/ and src/ — the legacy ESSA SSA (semmle/python/essa/*) remains the default. The dataflow library is not migrated yet; that lands in #21894 (which will shrink considerably once this stack merges).

Verification

  • All 367 lib/ + src/ + consistency-queries/ queries compile clean.
  • All 641 ControlFlow + PointsTo + dataflow + essa + consistency library-tests pass.
  • Both new dataflow-new-ssa[/vs-legacy] test packs pass.

Notes for reviewers

  • SsaImpl.qll is the only non-test addition; ~290 LOC. Mostly the InputSig glue + the implicit-entry-definition logic.
  • The vs-legacy test is the easiest way to spot semantic differences: any new divergence will show up as a diff in CmpTest.expected.

Copilot AI review requested due to automatic review settings June 1, 2026 12:04
@yoff yoff requested a review from a team as a code owner June 1, 2026 12:04
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an additive, shared-SSA-backed SSA adapter for Python on top of the new shared CFG facade, along with library test packs to validate the new SSA in isolation and against legacy ESSA as preparation for the upcoming dataflow migration.

Changes:

  • Introduce semmle.python.dataflow.new.internal.SsaImpl (shared-SSA instantiation + ESSA-shaped adapter surface).
  • Add a new inline-expectations SSA test pack (dataflow-new-ssa) and a “vs legacy ESSA” comparison pack (dataflow-new-ssa-vs-legacy).
  • Add a Python library change note documenting the new adapter.
Show a summary per file
File Description
python/ql/test/library-tests/dataflow-new-ssa/test.py New Python corpus for inline SSA def/use/phi expectations.
python/ql/test/library-tests/dataflow-new-ssa/SsaTest.ql Inline-expectations driver query for the new SSA adapter.
python/ql/test/library-tests/dataflow-new-ssa/SsaTest.expected Snapshot of current inline-expectations outcomes.
python/ql/test/library-tests/dataflow-new-ssa-vs-legacy/test.py Shared corpus used to compare new SSA vs legacy ESSA.
python/ql/test/library-tests/dataflow-new-ssa-vs-legacy/CmpTest.ql Query comparing def signatures between new SSA and legacy ESSA.
python/ql/test/library-tests/dataflow-new-ssa-vs-legacy/CmpTest.expected Snapshot of current “def-only-old/new” diffs.
python/ql/lib/semmle/python/dataflow/new/internal/SsaImpl.qll New shared-SSA-backed SSA adapter and ESSA-shaped compatibility layer.
python/ql/lib/change-notes/2026-05-19-add-shared-ssa.md Change note entry for the new SSA adapter.

Copilot's findings

  • Files reviewed: 8/8 changed files
  • Comments generated: 3

Comment on lines +3 to +7
# The shared SSA implementation prunes its construction by liveness:
# definitions of variables that are not read are never materialised.
# This is by design — write-only variables would only bloat the SSA
# graph. Tests therefore must always include a read of each variable
# being verified.
Comment on lines +15 to +16
* - Captured / closure variables (gap: new SSA does not yet model
* closure captures).
Comment on lines +1 to +3
---
category: minorAnalysis
---
yoff pushed a commit that referenced this pull request Jun 1, 2026
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
yoff pushed a commit that referenced this pull request Jun 1, 2026
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
yoff pushed a commit that referenced this pull request Jun 1, 2026
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@yoff yoff force-pushed the yoff/python-add-new-cfg-library branch from b547f1b to 370dc98 Compare June 1, 2026 13:27
@yoff yoff requested a review from a team as a code owner June 1, 2026 13:27
@yoff yoff force-pushed the yoff/python-add-new-ssa branch from 28062e1 to 4d4b916 Compare June 1, 2026 13:28
yoff pushed a commit that referenced this pull request Jun 1, 2026
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@yoff yoff force-pushed the yoff/python-add-new-cfg-library branch from 370dc98 to 0ca4cab Compare June 1, 2026 14:05
Preparatory refactor for the shared-CFG dataflow migration. Adds the
new Python SSA adapter additively, without changing any production
behaviour.

Library additions:

- semmle.python.dataflow.new.internal.SsaImpl — Python SSA
  implementation built on the new (shared) CFG. Mirrors the Java SSA
  adapter (java/ql/lib/semmle/code/java/dataflow/internal/SsaImpl.qll):
  an InputSig is defined in terms of positional (BasicBlock, int)
  variable references, and the shared
  codeql.ssa.Ssa::Make<Location, Cfg, Input> module is then
  instantiated.

  SourceVariable is the AST-level Py::Variable. Variable references
  are looked up via the new CFG facade's NameNode.defines/uses/deletes
  predicates (added in the preceding PR), which themselves are
  one-line bridges to AST-level Name.defines/uses/deletes.

  Implicit-entry definitions are inserted for non-local/global/builtin
  reads, captured variables, and (when needed) parameters.

Test additions:

- library-tests/dataflow-new-ssa/ — exercises the new SSA over a
  representative test corpus and checks expected def/use chains.

- library-tests/dataflow-new-ssa-vs-legacy/ — runs both new SSA and
  legacy ESSA over the same corpus and diffs the results, so any
  semantic divergence shows up as a test failure.

Production impact:

None. The new SSA adapter has zero callers in lib/ and src/ — the
legacy ESSA SSA (semmle/python/essa/*) remains the default. The
dataflow library is not migrated yet; that lands in a follow-up PR.

Verified by:
- All 367 lib + src + consistency-queries compile clean.
- All 641 ControlFlow + PointsTo + dataflow + essa + consistency
  library-tests pass.
- Both new dataflow-new-ssa[/vs-legacy] test packs pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@yoff yoff force-pushed the yoff/python-add-new-ssa branch from 4d4b916 to 6313e70 Compare June 1, 2026 14:05
yoff pushed a commit that referenced this pull request Jun 1, 2026
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants