[3.15] gh-150228: Improve the PEP 829 batch processing APIs (GH-150542)#150748

Merged
warsaw merged 1 commit into python:3.15 from miss-islington:backport-27ebd9a-3.15 on Jun 2, 2026
@miss-islington

As previously discussed with @ncoghlan and approved for 3.15b2 by @hugovk,
this implements the batch processing APIs for addsitedir() and friends. We
remove the defer_processing_start_files flag, which required some implicit
module global state, and promote StartupState to the public, documented API.

This also moves the bulk of the module global functions into methods of the
StartupState class, which removes the awkward APIs from 3.15b1. Instances of
this class now act as accumulators for startup state, and
StartupState.process() processes what has been accumulated. Callers can batch
up startup state themselves by using the methods on this class. The module
global functions are now shims that preserve the legacy APIs and semantics
using the new state class.
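
To make the accumulate-then-process idea concrete, here is a minimal sketch. It does not use the real site module: ToyStartupState and its pending and processed attributes are hypothetical names invented for illustration, and the actual StartupState tracks more than a list of directories.

```python
class ToyStartupState:
    """Hypothetical stand-in for an accumulator; not the real site.StartupState."""

    def __init__(self):
        self.pending = []    # site directories queued but not yet handled
        self.processed = []  # site directories handled by process()

    def addsitedir(self, sitedir):
        # Only record the directory; nothing is processed at this point.
        self.pending.append(sitedir)

    def process(self):
        # Handle everything accumulated so far in a single batch.
        self.processed.extend(self.pending)
        self.pending.clear()


state = ToyStartupState()
state.addsitedir("/opt/app/site-a")
state.addsitedir("/opt/app/site-b")
before = list(state.processed)  # snapshot taken before the batch runs
state.process()
after = list(state.processed)   # snapshot taken after the batch runs
```

The point of the pattern is that the caller decides when the batch runs, rather than a flag plus hidden module-level state deciding it.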

This PR also fixes the interleaving regression identified by @ncoghlan in the
same issue. Now, .pth file sys.path extensions are added to sys.path after
the sitedir that the .pth file is found in, restoring the legacy behavior.
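
The ordering described above can be sketched with plain lists. This is an illustrative model only: interleave() and the example paths are hypothetical, and the sketch assumes that "after the sitedir" means the .pth entries land directly after their own sitedir and before the next one.

```python
def interleave(sitedirs, pth_entries):
    """Return a path list where each sitedir is followed by its .pth additions.

    pth_entries maps a sitedir to the extra paths contributed by the
    .pth files found in that sitedir (an assumption made for this sketch).
    """
    path = []
    for sitedir in sitedirs:
        path.append(sitedir)
        path.extend(pth_entries.get(sitedir, []))
    return path


sitedirs = ["/site-a", "/site-b"]
pth_entries = {
    "/site-a": ["/extra-a1", "/extra-a2"],
    "/site-b": ["/extra-b"],
}
ordered = interleave(sitedirs, pth_entries)
```

Here ordered lists /site-a, then its two extras, then /site-b and its extra, instead of grouping all site directories ahead of all .pth extensions.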

Along the way, I've made a lot of improvements to function docstrings,
site.rst documentation, and comments in the code explaining what's going on.

  • Add a note that if known_paths is provided to StartupState.__init__(), it
    will get mutated in place.

  • Improve some conditional flows.

  • Improve some comments.

  • Improve the what's new entry.

  • Make test_impl_exec_imports_suppressed_by_matching_start() more robust

Based on a PR comment, we need to read both the .pth and .start files, and
prove that the .pth file's import line (which passes a bigger increment) is
not called, but the .start file's entry point (which uses the default
increment) is called.
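
The shape of that check can be sketched without touching the file system. Everything here is hypothetical: bump(), run_startup(), and the rule that a matching .start entry point suppresses the .pth import line are assumptions inferred from the test name, not the real test or the real site logic.

```python
calls = []  # records the increment passed on each call


def bump(increment=1):
    calls.append(increment)


def run_startup(pth_import, start_entry_point):
    # Assumed rule for this sketch: when a matching .start entry point
    # exists, it runs and the .pth import line is skipped.
    if start_entry_point is not None:
        start_entry_point()
    elif pth_import is not None:
        pth_import()


# The .pth import line would pass a bigger increment; the .start entry
# point uses the default. Only the default should be recorded.
run_startup(pth_import=lambda: bump(10), start_entry_point=bump)
```

Using different increments for the two code paths lets a single recorded value show both that the entry point ran and that the import line did not.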

  • As per review, move some methods to the private API

_read_pth_file() and _read_start_file() are not intended to be part of the
public API surface outside of the site module, so even though they are used by
methods outside of the StartupState class, make them privately named.

  • Resolve several review comments

  • Move a versionadded

  • Better list comprehension formatting (use the output from
    ruff format --line-length 78)

  • Add docs for site.makepath() and point the case-normalization requirement to
    this utility function.

  • Note that StartupState.process() is not idempotent.

  • Address another feedback comment

This time, we get rid of the legacy implementation's reset local variable,
which was always difficult to understand, and just implement a return value
based on the processing mode selected.
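
As a rough illustration of the difference, the sketch below contrasts a flag-style reset local with a return value chosen directly from the calling mode. Both functions are hypothetical toys: legacy_style() is modeled loosely on the older pattern, and mode_based() is only one guess at what "a return value based on the processing mode" could look like.

```python
def legacy_style(sitedir, known_paths=None):
    # A local flag remembers whether the caller supplied known_paths.
    if known_paths is None:
        known_paths = set()
        reset = True
    else:
        reset = False
    known_paths.add(sitedir)
    if reset:
        known_paths = None  # hide the internally created set again
    return known_paths


def mode_based(sitedir, known_paths=None):
    # Decide the mode once, then return accordingly.
    caller_supplied = known_paths is not None
    paths = known_paths if caller_supplied else set()
    paths.add(sitedir)  # a caller-supplied set is mutated in place
    return paths if caller_supplied else None


shared = set()
legacy_result = legacy_style("/site-a", shared)
mode_result = mode_based("/site-b", shared)
```

Both variants return None when no set is passed and return the caller's own set otherwise; the second simply makes that rule visible at the return statement.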

  • Changes based on gh-150228 review

The comment by @encukou that started this change:

    I still see two red flags here though: an argument that doesn't combine with
    other arguments, and (another instance of) changing the return type based on
    an argument.

    Did you consider adding a StartupState.addsitedir(sitedir) method, instead of
    the startup_state argument?

As it turns out, this is an even cleaner design. By moving the bulk of the
previous module global functions into StartupState methods, we can get rid
of all the awkward startup_state keyword-only arguments which conflict
with known_paths (Petr's first point). We can also get rid of the
return value dichotomy (Petr's second point) because now we can preserve
exactly the Python 3.14 API in the module global functions, and implement
the better APIs in the class methods. We also generally don't have to
pass around process_known_sitedirs.

Now the following module global functions are essentially shims around
class methods:

  • site.addsitedir() -> StartupState.addsitedir()
  • site.addusersitepackages() -> StartupState.addusersitepackages()
  • site.addsitepackages() -> StartupState.addsitepackages()

  • Additional minor changes

  • Remove a now unused parameter
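
The shim relationship between the module global functions and the class methods can be sketched as follows. This is a toy model under stated assumptions: ToyStartupState, the applied list, and the module-level addsitedir() below are hypothetical and do not reproduce the real site module; only the idea of a thin wrapper that keeps the older return convention is taken from the description above.

```python
class ToyStartupState:
    """Hypothetical accumulator; not the real site.StartupState."""

    def __init__(self, known_paths=None):
        # A caller-supplied set is kept by reference and mutated in place.
        self.known_paths = set() if known_paths is None else known_paths
        self.sitedirs = []

    def addsitedir(self, sitedir):
        if sitedir not in self.known_paths:
            self.known_paths.add(sitedir)
            self.sitedirs.append(sitedir)

    def process(self):
        # Hand back the accumulated directories and clear the queue.
        done, self.sitedirs = self.sitedirs, []
        return done


applied = []  # stands in for the side effects of processing


def addsitedir(sitedir, known_paths=None):
    """Shim: build a state object, add one directory, process immediately."""
    state = ToyStartupState(known_paths)
    state.addsitedir(sitedir)
    applied.extend(state.process())
    # Older-style return: the caller's set if one was given, else None.
    return known_paths


seen = set()
with_set = addsitedir("/site-a", seen)
without_set = addsitedir("/site-b")
```

Keeping the wrapper this thin means the batching behavior lives in one place, while existing callers of the function form see the same signature and return value as before.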

(cherry picked from commit 27ebd9a)

Co-authored-by: Barry Warsaw <barry@python.org>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>

@warsaw warsaw enabled auto-merge (squash) June 2, 2026 01:46
@warsaw warsaw merged commit 848ba18 into python:3.15 Jun 2, 2026
60 checks passed
@miss-islington miss-islington deleted the backport-27ebd9a-3.15 branch June 2, 2026 02:11