[3.15] gh-150228: Improve the PEP 829 batch processing APIs (GH-150542) #150748
Merged
warsaw approved these changes Jun 2, 2026
As previously discussed with @ncoghlan and approved for 3.15b2 by @hugovk,
this implements the batch processing APIs for addsitedir() and friends. We
remove the `defer_processing_start_files` flag, which required some implicit
module global state, and promote `StartupState` to the public documented API.
This also moves the bulk of the module global functions into methods of the
`StartupState` class, which removes the awkward APIs in 3.15b1. Instances of
this class now act as accumulators for startup state, and
`StartupState.process()` processes what they have accumulated. Callers can
now batch up startup state themselves by using the methods on this class. The
module global functions are shims that preserve the legacy APIs and semantics
using the new state class.
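The accumulate-then-process pattern described above can be sketched roughly as follows. This is a toy illustration only: `ToyStartupState`, its `_pending` list, the `path` parameter, and the module-level `addsitedir()` shim are all hypothetical names invented here, and they do not reproduce the real `site.StartupState` signatures or behavior.

```python
class ToyStartupState:
    """Hypothetical stand-in for an accumulator; NOT the real site.StartupState."""

    def __init__(self, known_paths=None):
        # Assumption for illustration: a caller-supplied set is reused and
        # therefore mutated in place.
        self.known_paths = set() if known_paths is None else known_paths
        self._pending = []

    def addsitedir(self, sitedir):
        # Only record the directory; nothing is applied to the path yet.
        if sitedir not in self.known_paths:
            self.known_paths.add(sitedir)
            self._pending.append(sitedir)

    def process(self, path):
        # Apply everything accumulated so far to the given path list.
        path.extend(self._pending)
        self._pending.clear()


def addsitedir(sitedir, known_paths=None, path=None):
    """Toy module-level shim: one directory, processed immediately."""
    path = [] if path is None else path
    state = ToyStartupState(known_paths)
    state.addsitedir(sitedir)
    state.process(path)
    return path


# Batch usage: several directories are accumulated, then applied in one step.
demo_path = []
state = ToyStartupState()
for d in ("/opt/a", "/opt/b", "/opt/a"):
    state.addsitedir(d)
path_before_process = list(demo_path)
state.process(demo_path)
```

The point of the sketch is the split of responsibilities: the class owns the accumulated state, and the module-level function keeps a one-call convenience interface on top of it.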
This PR also fixes the interleaving regression identified by @ncoghlan in the
same issue. Now, .pth file sys.path extensions are added to sys.path after
the sitedir that the .pth file is found in, restoring the legacy behavior.
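The legacy ordering mentioned here can be observed with the long-standing `site.addsitedir()` function. The sketch below is an assumption-laden demonstration, not part of this PR: the helper name `demo_pth_ordering` and the directory and file names are made up, and it only checks that the site directory lands on `sys.path` before the directory named by its `.pth` file.

```python
import os
import site
import sys
import tempfile


def demo_pth_ordering():
    """Return (sitedir, extra, added), where added lists new sys.path entries."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)
        sitedir = os.path.join(tmp, "sitedir")
        extra = os.path.join(tmp, "extra")
        os.mkdir(sitedir)
        os.mkdir(extra)  # .pth entries are only added if the path exists
        with open(os.path.join(sitedir, "demo.pth"), "w", encoding="utf-8") as f:
            f.write(extra + "\n")

        saved = sys.path[:]
        try:
            site.addsitedir(sitedir)
            # Entries appended by addsitedir(), in the order they appear.
            added = [p for p in sys.path if p not in saved]
        finally:
            sys.path[:] = saved  # leave the interpreter's path untouched
    return sitedir, extra, added


sitedir, extra, added = demo_pth_ordering()
```

With the legacy behavior, `added` holds the site directory first and the `.pth` extension second.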
Along the way, I've made a lot of improvements to function docstrings,
site.rst documentation, and comments in the code explaining what's going on.
Add a note that if known_paths is provided to `StartupState.__init__()`, it
will get mutated in place.
Improve some conditional flows.
Improve some comments.
Improve the what's new entry.
Make test_impl_exec_imports_suppressed_by_matching_start() more robust
Based on PR comment, we need to read both the .pth and .start files, and prove
that the .pth file's import line (which passes a bigger increment) is not
called, but the .start file's entry point (which uses the default increment)
is called.
As per review, move some methods to the private API
_read_pth_file() and _read_start_file() are not intended to be part of the
public API surface outside of the site module, so even though they are used by
methods outside of the StartupState class, make them privately named.
Resolve several review feedbacks
Move a `versionadded`
Better list comprehension formatting (use the output from
`ruff format --line-length 78`)
Add docs for site.makepath() and point the case-normalization requirement to
this utility function.
Note that StartupState.process() is not idempotent.
Address another feedback comment
This time, we get rid of the legacy implementation `reset` local, which was
always difficult to understand, and just implement a return value based on the
processing mode selected.
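Changes based on gh-150228 review
The comment by @encukou that started this change:
> I still see two red flags here though: an argument that doesn't combine with other arguments, and (another instance of) changing the return type based on an argument.
> Did you consider adding a StartupState.addsitedir(sitedir) method, instead of the startup_state argument?
As it turns out, this is an even cleaner design. By moving the bulk of the
previous module global functions into `StartupState` methods, we can get rid
of all the awkward `startup_state` keyword-only arguments which conflict
with `known_paths` (Petr's first point). We can also get rid of the
return value dichotomy (Petr's second point) because now we can preserve
exactly the Python 3.14 API in the module global functions, and implement
the better APIs in the class methods. We also generally don't have to
pass around `process_known_sitedirs`.
Now the following module global functions are essentially shims around
class methods:
site.addsitedir() -> StartupState.addsitedir()
site.addusersitepackages() -> StartupState.addusersitepackages()
site.addsitepackages() -> StartupState.addsitepackages()
Additional minor changes
Remove a now unused parameter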
(cherry picked from commit 27ebd9a)
Co-authored-by: Barry Warsaw <barry@python.org>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>