Skip to content

Python: remove AstNode.getAFlowNode() and rewrite callers#21919

Open
yoff wants to merge 3 commits into
mainfrom
yoff/python-remove-getAFlowNode
Open

Python: remove AstNode.getAFlowNode() and rewrite callers#21919
yoff wants to merge 3 commits into
mainfrom
yoff/python-remove-getAFlowNode

Conversation

@yoff
Copy link
Copy Markdown
Contributor

@yoff yoff commented Jun 1, 2026

Summary

Preparatory refactor for the shared-CFG dataflow migration (#21894).

Removes the AstNode.getAFlowNode() cached predicate from the public Python QL API. All ~140 callers across lib/, src/, test/, and tools/ are rewritten from expr.getAFlowNode() = cfgNode to cfgNode.getNode() = expr, using ControlFlowNode.getNode() which already exists in Flow.qll.

Why

The dataflow library is being migrated to the shared CFG. In the new world there is no AST → CFG predicate (analogous to Java, Ruby, Swift, etc.) — the bridge only goes from CFG → AST via ControlFlowNode.getNode(). This PR factors out the call-site rewrite so that the bigger migration PR contains only genuinely new logic.

Semantic effect

None. The two forms are exact equivalents in the legacy CFG:

e.getAFlowNode() = n  <==>  n.getNode() = e

Verified by:

  • All 361 lib/ + src/ queries compile clean.
  • All 122 ControlFlow + PointsTo library-tests pass.
  • All 64 dataflow library-tests pass.
  • All 113 Variables/Exceptions/Expressions/Statements/Functions/Imports/Security/CWE-798/ModificationOfParameterWithDefault query-tests pass.

Notes for reviewers

  • The cached predicate definition in python/ql/lib/semmle/python/AstExtended.qll is the central deletion. Everything else is mechanical rewrite at call sites.
  • A handful of Exprs.qll / Import.qll classes had override XxxNode getAFlowNode() { result = super.getAFlowNode() } redeclarations that became dangling — removed.
  • A breaking change-note is included.

Copilot AI review requested due to automatic review settings June 1, 2026 10:54
@yoff yoff requested a review from a team as a code owner June 1, 2026 10:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Removes the public Python QL API predicate AstNode.getAFlowNode() and rewrites callers to use the inverse mapping via ControlFlowNode.getNode(), as a preparatory refactor for the Python shared-CFG dataflow migration.

Changes:

  • Delete AstNode.getAFlowNode() from python/ql/lib/semmle/python/AstExtended.qll.
  • Mechanically rewrite call sites from e.getAFlowNode() = n to n.getNode() = e across library, queries, tests, and tools.
  • Add a breaking change-note documenting the API removal and the migration pattern.
Show a summary per file
File Description
python/tools/recorded-call-graph-metrics/ql/lib/RecordedCalls.qll Rewrites recorded call resolution to match AST↔CFG via getNode().
python/ql/test/library-tests/PointsTo/local/LocalPointsTo.ql Updates PointsTo test to bind CFG nodes using getNode().
python/ql/test/library-tests/PointsTo/global/Global.ql Updates PointsTo test to bind CFG nodes using getNode().
python/ql/test/library-tests/dataflow/tainttracking/TestTaintLib.qll Rewrites a tainttracking test to compare through ControlFlowNode.getNode().
python/ql/test/library-tests/ControlFlow/splitting/NodeCount.ql Replaces flow-node counting with a ControlFlowNode comprehension over getNode().
python/ql/test/library-tests/ControlFlow/PointsToSupport/UseFromDefinition.ql Rewrites DefinitionNode binding from AST using getNode().
python/ql/test/experimental/import-resolution/importflow.ql Updates import-resolution experiment to pass CFG nodes without getAFlowNode().
python/ql/test/2/library-tests/comprehensions/ConsistencyCheck.ql Updates comprehension consistency check to use `exists(ControlFlowNode n
python/ql/src/Variables/UnusedModuleVariable.ql Rewrites reachability use from u.getAFlowNode() to an explicit CFG node bound by getNode().
python/ql/src/Variables/UndefinedPlaceHolder.ql Rewrites SSA/points-to use binding to CFG nodes via getNode().
python/ql/src/Variables/UndefinedGlobal.ql Rewrites control-flow containment/dominance checks to bind CFG nodes via getNode().
python/ql/src/Variables/Undefined.qll Rewrites loop-entry edge logic to bind statement CFG nodes via getNode().
python/ql/src/Variables/SuspiciousUnusedLoopIterationVariable.ql Updates comprehension CFG predecessor navigation to use getNode().
python/ql/src/Variables/MultiplyDefined.ql Rewrites assignment-definition comparison to use Definition.getNode() = <ast>.
python/ql/src/Variables/Loop.qll Rewrites loop-to-use reachability to use CFG nodes bound via getNode().
python/ql/src/Variables/LeakingListComprehension.ql Rewrites reachability/dominance constraints to use CFG nodes bound via getNode().
python/ql/src/Variables/Definition.qll Rewrites list-comprehension declaration reachability to use CFG nodes bound via getNode().
python/ql/src/Statements/SideEffectInAssert.ql Rewrites subprocess-call matching to bind the call CFG node via getNode().
python/ql/src/Statements/NonIteratorInForLoop.ql Rewrites for-iter binding to iter.getNode() = loop.getIter().
python/ql/src/Statements/NestedLoopsSameVariableWithReuse.ql Rewrites SSA definition matching to compare via getNode().
python/ql/src/Statements/IterableStringOrSequence.ql Rewrites for-iter binding to use getNode().
python/ql/src/semmle/python/functions/ModificationOfParameterWithDefaultCustomizations.qll Rewrites augmented-assignment target matching to use asCfgNode().getNode().
python/ql/src/Security/CWE-798/HardcodedCredentials.ql Rewrites keyword-value matching to use asCfgNode().getNode() = k.getValue().
python/ql/src/Resources/FileOpen.qll Rewrites return-value/SSA linking to use AST via getNode() rather than getAFlowNode().
python/ql/src/Functions/ReturnValueIgnored.ql Rewrites the “unused return value” counting logic to bind call CFG nodes via getNode().
python/ql/src/Functions/ExplicitReturnInInit.ql Rewrites “never returns” call detection to bind return-value CFG via getNode().
python/ql/src/Expressions/TruncatedDivision.ql Rewrites BinaryExprNode binding to bin.getNode() = div.
python/ql/src/Expressions/IsComparisons.qll Rewrites CompareNode binding to fcomp.getNode() = comp.
python/ql/src/Expressions/IncorrectComparisonUsingIs.ql Rewrites CompareNode binding to fcomp.getNode() = comp.
python/ql/src/Expressions/Formatting/AdvancedFormatting.qll Rewrites format-string points-to checks to pass a CFG node bound via getNode().
python/ql/src/Expressions/DuplicateKeyInDictionaryLiteral.ql Rewrites block-order and dominance comparisons to use CFG nodes bound via getNode().
python/ql/src/Expressions/CallArgs.qll Rewrites call matching helpers to bind call CFG nodes via getNode().
python/ql/src/Exceptions/CatchingBaseException.ql Rewrites except-block “reaches exit” check to use CFG nodes bound via getNode().
python/ql/src/Classes/ClassAttributes.qll Rewrites SSA and points-to bindings for attributes to use CFG nodes via getNode().
python/ql/src/analysis/PointsToFailure.ql Rewrites points-to failure detection to bind CFG nodes via getNode().
python/ql/src/analysis/KeyPointsToFailure.ql Rewrites key points-to failure logic to bind SSA uses via CFG nodes from getNode().
python/ql/src/analysis/ImportFailure.ql Rewrites import failure checks to bind import CFG nodes via getNode().
python/ql/src/analysis/CrossProjectDefinitions.qll Rewrites attribute resolution plumbing to pass CFG nodes bound via getNode().
python/ql/lib/semmle/python/types/Object.qll Rewrites vararg-origin checks to compare via ControlFlowNode.getNode().
python/ql/lib/semmle/python/types/FunctionObject.qll Rewrites parameter-definition linking to compare via getNode().
python/ql/lib/semmle/python/types/Extensions.qll Rewrites loop iterable/target bindings to compare via getNode().
python/ql/lib/semmle/python/types/Exceptions.qll Rewrites tuple-element extraction to bind element CFG node via getNode().
python/ql/lib/semmle/python/types/ClassObject.qll Rewrites metaclass/origin bindings to use CFG→AST via getNode().
python/ql/lib/semmle/python/SelfAttribute.qll Rewrites dominance/guard checks to bind self/object CFG nodes via getNode().
python/ql/lib/semmle/python/pointsto/PointsTo.qll Rewrites import/base/decorator/metaclass SSA bindings to use CFG→AST via getNode().
python/ql/lib/semmle/python/objects/TObject.qll Rewrites decorator CFG binding to use deco.getNode() = f.getADecorator().
python/ql/lib/semmle/python/objects/Classes.qll Rewrites class-expression origin binding to use an explicit CFG node via getNode().
python/ql/lib/semmle/python/objects/Callables.qll Rewrites callable-expression origin binding and return BB lookup via getNode().
python/ql/lib/semmle/python/internal/CachedStages.qll Rewrites cached-stage trigger from AST→CFG existence to CFG→AST via getNode().
python/ql/lib/semmle/python/Import.qll Removes now-dangling getAFlowNode() override on ImportMember.
python/ql/lib/semmle/python/frameworks/FlaskAdmin.qll Rewrites decorator matching for route handlers via node.getNode() = decorator.
python/ql/lib/semmle/python/frameworks/Flask.qll Rewrites decorator matching and return-value flow-node binding via getNode().
python/ql/lib/semmle/python/frameworks/FastApi.qll Rewrites decorator matching for route handlers via getNode().
python/ql/lib/semmle/python/frameworks/Bottle.qll Rewrites decorator matching for route handlers via getNode().
python/ql/lib/semmle/python/Flow.qll Rewrites internal legacy-CFG predicates/methods to use ControlFlowNode.getNode() comparisons.
python/ql/lib/semmle/python/Exprs.qll Rewrites side-effect detection to bind an expression CFG node via getNode() and removes dangling overrides.
python/ql/lib/semmle/python/essa/SsaDefinitions.qll Rewrites SSA source-definition predicates to bind defs via defn.getNode() = var.
python/ql/lib/semmle/python/dataflow/old/Implementation.qll Rewrites legacy dataflow plumbing to bind return/yield/assign nodes via getNode().
python/ql/lib/semmle/python/dataflow/new/internal/VariableCapture.qll Rewrites captured-variable reads/writes to bind CFG nodes via getNode().
python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPublic.qll Rewrites module-variable local read to bind via asCfgNode().getNode().
python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPrivate.qll Rewrites yield store steps to bind yield/value via asCfgNode().getNode().
python/ql/lib/semmle/python/dataflow/new/internal/DataFlowDispatch.qll Rewrites extracted return/yield node selection to bind via node.getNode() = ....
python/ql/lib/semmle/python/dataflow/new/internal/Attributes.qll Rewrites class-definition-as-attr object node to bind via asCfgNode().getNode() = cls.
python/ql/lib/semmle/python/dataflow/new/BarrierGuards.qll Rewrites constant-comparison guard logic to bind constants’ CFG nodes via getNode().
python/ql/lib/semmle/python/AstExtended.qll Removes the public cached predicate AstNode.getAFlowNode().
python/ql/lib/LegacyPointsTo.qll Rewrites points-to/refers-to helpers to bind both expression and origin CFG nodes via getNode().
python/ql/lib/change-notes/2026-05-19-remove-getAFlowNode.md Adds a breaking change-note documenting the API removal and replacement pattern.
python/ql/lib/analysis/DefinitionTracking.qll Rewrites definition tracking entry points to bind use CFG nodes via getNode().

Copilot's findings

  • Files reviewed: 68/68 changed files
  • Comments generated: 0

yoff pushed a commit that referenced this pull request Jun 1, 2026
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
yoff pushed a commit that referenced this pull request Jun 1, 2026
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
yoff pushed a commit that referenced this pull request Jun 1, 2026
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
yoff pushed a commit that referenced this pull request Jun 1, 2026
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The internal predicates that identify `@staticmethod`, `@classmethod` and
`@property` decorators previously required the decorator's `NameNode` to
satisfy `isGlobal()` (i.e. no SSA def reaches the decorator's name use).
That filter was correct but unnecessarily indirect: these three names
are builtins, and even when a class body redefines one, the class body
has not started executing at the decorator position, so Python uses the
builtin.

Match the decorator's AST `Name` directly instead, dropping the CFG/SSA
detour. The slight semantic change — `isGlobal()` would have rejected
module-level shadowing of these builtins — is negligible in practice
and explicitly documented in the change note.

`hasContextmanagerDecorator` and `hasOverloadDecorator` keep the
`NameNode.isGlobal()` check because their target names (`contextmanager`,
`overload`) are imported, not builtin, and local shadowing is a real
concern.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI and others added 2 commits June 1, 2026 14:05
Preparatory refactor for the shared-CFG dataflow migration.

Removes the AstNode.getAFlowNode() cached predicate from the public
Python QL API. All ~140 callers across lib/, src/, test/, and tools/
are rewritten from `expr.getAFlowNode() = cfgNode` to
`cfgNode.getNode() = expr`, using ControlFlowNode.getNode() which
already exists in Flow.qll.

Semantic noop verified by:
- All 361 lib/ + src/ queries compile clean.
- All 122 ControlFlow + PointsTo library-tests pass.
- All 64 dataflow library-tests pass.
- All 113 Variables/Exceptions/Expressions/Statements/Functions/Imports/
  Security/CWE-798/ModificationOfParameterWithDefault query-tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Follow-up to the getAFlowNode removal in the same PR: same AST→legacy-CFG
bridge pattern. Rewrite the 11 call sites (across objects/, types/,
frameworks/, and TypeTrackingImpl) to bind a `Return ret` explicitly,
then constrain via `ret.getScope() = f and n.getNode() = ret.getValue()`.

Semantic noop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@yoff yoff force-pushed the yoff/python-remove-getAFlowNode branch from befb51a to 8e4f127 Compare June 1, 2026 14:05
Copy link
Copy Markdown
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit conflicted about this PR. The changes themselves look reasonable, but it does introduce quite a lot of churn.

Also, I'm not sure I find the "why" (that other languages don't have this) terribly persuasive. The getAFlowNode predicate is a simple inversion of getNode, and it seems to me that it would be just as useful in a world where we use the new CFG library.

Rather, it seems to me that the issue is that -- at present -- getAFlowNode only returns an old CFG node, which ties the dataflow library to that API.

However, this suggests we could have used a different approach here: rename getAFlowNode to getALegacyFlowNode. The latter can be made deprecated, which allows external users to continue to use it until we get rid of it entirely. When adding the new CFG library, we can then re-add getAFlowNode with the old API, and we don't have to make any changes to the dataflow libraries (apart from a slight detour to using getALegacyFlowNode).

What do you think?

Finally, I'll be curious to see if this affects performance in any way. One way to guarantee that it doesn't would be to look at the RA before and after the change, and verify that it is (hopefully) identical.

@yoff
Copy link
Copy Markdown
Contributor Author

yoff commented Jun 1, 2026

I'm a bit conflicted about this PR. The changes themselves look reasonable, but it does introduce quite a lot of churn.

Also, I'm not sure I find the "why" (that other languages don't have this) terribly persuasive. The getAFlowNode predicate is a simple inversion of getNode, and it seems to me that it would be just as useful in a world where we use the new CFG library.

Rather, it seems to me that the issue is that -- at present -- getAFlowNode only returns an old CFG node, which ties the dataflow library to that API.

However, this suggests we could have used a different approach here: rename getAFlowNode to getALegacyFlowNode. The latter can be made deprecated, which allows external users to continue to use it until we get rid of it entirely. When adding the new CFG library, we can then re-add getAFlowNode with the old API, and we don't have to make any changes to the dataflow libraries (apart from a slight detour to using getALegacyFlowNode).

What do you think?

I actually consider it a clean-up. I would like to untangle our hierarchies as much as possible and I think that the AST not knowing about the CFG is a feature. Are you worried that users will experience a lot of churn from this?

Finally, I'll be curious to see if this affects performance in any way. One way to guarantee that it doesn't would be to look at the RA before and after the change, and verify that it is (hopefully) identical.

That is a nice suggestion!

@tausbn
Copy link
Copy Markdown
Contributor

tausbn commented Jun 1, 2026

I actually consider it a clean-up. I would like to untangle our hierarchies as much as possible and I think that the AST not knowing about the CFG is a feature. Are you worried that users will experience a lot of churn from this?

Yeah, that's my main concern, seeing as getAFlowNode is used all over the place (as this PR demonstrates).

In that case, how about we add back getAFlowNode, but make it deprecated? That should satisfy your desire to clean up the API, while still allowing everything to run smoothly (and your migration fixes will have gotten rid of the deprecation warnings in our own code).

yoff pushed a commit that referenced this pull request Jun 1, 2026
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@yoff
Copy link
Copy Markdown
Contributor Author

yoff commented Jun 1, 2026

I actually consider it a clean-up. I would like to untangle our hierarchies as much as possible and I think that the AST not knowing about the CFG is a feature. Are you worried that users will experience a lot of churn from this?

Yeah, that's my main concern, seeing as getAFlowNode is used all over the place (as this PR demonstrates).

In that case, how about we add back getAFlowNode, but make it deprecated? That should satisfy your desire to clean up the API, while still allowing everything to run smoothly (and your migration fixes will have gotten rid of the deprecation warnings in our own code).

Yes, I supose we could do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants