gh-150228: Improve the PEP 829 batch processing APIs #150542
As previously discussed with @ncoghlan and approved for 3.15b2 by @hugovk, this exposes the batch processing APIs for addsitedir() and friends. We remove the `defer_processing_start_files` flag which required some implicit module global state, and promote StartupState to the public documented API. This removes the need for module global implicit state and allows callers to control when accumulated .start and .pth file state is processed if they want. This also fixes the interleaving regression identified by @ncoghlan in the same issue. Now, .pth file sys.path extensions are added to sys.path after the sitedir that the .pth file is found in, restoring the legacy behavior. Along the way, I've made a lot of improvements to function docstrings, site.rst documentation, and comments in the code explaining what's going on.
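The interleaving behavior described above can be pictured with a small in-memory model. This is only a sketch: `interleaved_order` and the `pth_entries` mapping are hypothetical names made up for illustration, not part of `site`, and the real module reads `.pth` and `.start` files from disk and does considerably more.

```python
def interleaved_order(sitedirs, pth_entries):
    """Return a toy sys.path ordering (illustrative model, not the site module).

    sitedirs: site directories, in the order they are added.
    pth_entries: hypothetical mapping of sitedir -> extra paths named by
        the .pth files found in that sitedir.
    """
    path = []
    seen = set()

    def add(entry):
        # Skip entries that are already present, keeping the first position.
        if entry not in seen:
            seen.add(entry)
            path.append(entry)

    for sitedir in sitedirs:
        # The sitedir itself goes first...
        add(sitedir)
        # ...followed by its own .pth extensions, before the next sitedir.
        for extra in pth_entries.get(sitedir, []):
            add(extra)
    return path


order = interleaved_order(
    ["/site/a", "/site/b"],
    {"/site/a": ["/extra/a1", "/extra/a2"], "/site/b": ["/extra/b1"]},
)
print(order)
```

In this model each sitedir is followed directly by the entries from its own `.pth` files, rather than all sitedirs being added first and all extensions afterwards.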
…NPiO-.rst Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
* Add a note that if known_paths is provided to StartupState.__init__(), it will get mutated in place.
* Improve some conditional flows.
* Improve some comments.
* Improve the what's new entry.
ncoghlan left a comment
My approval is for the updated API design - much tidier without the implicit global state. Thanks @warsaw!
For the exact implementation and docs details, +1 to @gpshead's comments and questions. I don't have any strong opinions on how the open questions should be resolved; I just agree there are some details still to be tweaked for consistency.
* Add docs for site.makepath() and point the case-normalization requirement to this utility function.
* Note that StartupState.process() is not idempotent.
This time, we get rid of the legacy implementation `reset` local, which was always difficult to understand, and just implement a return value based on the processing mode selected.
I have made the requested changes; please review again.
encukou left a comment
Thanks! This looks like a better API design!
I still see two red flags here though: an argument that doesn't combine with other arguments, and (another instance of) changing the return type based on an argument.
Did you consider adding a StartupState.addsitedir(sitedir) method, instead of the startup_state argument?
Sadly, we have to keep these warts to cater to the legacy APIs. We could potentially deprecate them, but I don't really see much value in that, other than API hygiene, and it's probably not worth the backward incompatibilities.
I didn't, but that actually might be pretty cool.

    state = site.StartupState(known_paths)
    state.addsitedir()
    state.process()

Then we only have to keep
The comment by @encukou that started this change:

```
I still see two red flags here though: an argument that doesn't combine with other arguments, and (another instance of) changing the return type based on an argument.

Did you consider adding a StartupState.addsitedir(sitedir) method, instead of the startup_state argument?
```

As it turns out, this is an even cleaner design. By moving the bulk of the previous module global functions into `StartupState` methods, we can get rid of all the awkward `startup_state` keyword-only arguments which conflict with `known_paths` (Petr's first point). We can also get rid of the return value dichotomy (Petr's second point) because now we can preserve exactly the Python 3.14 API in the module global functions, and implement the better APIs in the class methods. We also generally don't have to pass around `process_known_sitedirs`.

Now the following module global functions are essentially shims around class methods:

* site.addsitedir() -> StartupState.addsitedir()
* site.addusersitepackages() -> StartupState.addusersitepackages()
* site.addsitepackages() -> StartupState.addsitepackages()
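The shim arrangement described above might look roughly like the following toy model. Everything here is an assumption for illustration: `ToyStartupState`, its `pending` list, and the explicit `path` argument are hypothetical and are not the real `site.StartupState` signatures. The sketch only shows the shape of the design, in which methods accumulate state, `process()` applies it, and the module-level function keeps the legacy return convention (the set when the caller passed one, otherwise `None`).

```python
class ToyStartupState:
    """Hypothetical accumulator; NOT the real site.StartupState."""

    def __init__(self, known_paths=None):
        # A caller-supplied set is kept by reference, so it is mutated in place.
        self.known_paths = set() if known_paths is None else known_paths
        self.pending = []

    def addsitedir(self, sitedir):
        # Record the directory; nothing is applied until process() is called.
        if sitedir not in self.known_paths:
            self.known_paths.add(sitedir)
            self.pending.append(sitedir)

    def process(self, path):
        # Apply everything accumulated so far to `path`, then clear the queue.
        path.extend(self.pending)
        self.pending.clear()


def addsitedir(sitedir, path, known_paths=None):
    """Legacy-style shim (toy): accumulate and process immediately."""
    caller_supplied = known_paths is not None
    state = ToyStartupState(known_paths)
    state.addsitedir(sitedir)
    state.process(path)
    # Assumed legacy convention: return the set only if the caller passed one.
    return state.known_paths if caller_supplied else None


# Batch use: accumulate several directories, then process once.
batch_path = []
state = ToyStartupState()
state.addsitedir("/site/a")
state.addsitedir("/site/b")
untouched_before_process = list(batch_path)
state.process(batch_path)

# Shim use: one call adds and processes in a single step.
shim_path = []
known = set()
result_without_set = addsitedir("/site/c", shim_path)
result_with_set = addsitedir("/site/d", shim_path, known)
```

The toy takes an explicit `path` list rather than touching `sys.path`, so it can be run without affecting the interpreter.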
In fact, it was pretty cool. I really like where that headed so now

Thanks @warsaw for the PR 🌮🎉. I'm working now to backport this PR to: 3.15.

GH-150748 is a backport of this pull request to the 3.15 branch.
… (#150748)

gh-150228: Improve the PEP 829 batch processing APIs (GH-150542)

* gh-150228: Improve the PEP 829 batch processing APIs

  As previously discussed with @ncoghlan and approved for 3.15b2 by @hugovk, this implements the batch processing APIs for addsitedir() and friends. We remove the `defer_processing_start_files` flag which required some implicit module global state, and promote StartupState to the public documented API.

  This also moves the bulk of the module global functions into methods of the `StartupState` class, so it removes the awkward APIs in 3.15b1. Now, instances of this class are an accumulator for startup state, using `StartupState.process()` to process them. Callers can now batch up startup state themselves by using the methods on this class. The module global functions are shims for this which preserve the legacy APIs and semantics using the new state class.

  This PR also fixes the interleaving regression identified by @ncoghlan in the same issue. Now, .pth file sys.path extensions are added to sys.path after the sitedir that the .pth file is found in, restoring the legacy behavior.

  Along the way, I've made a lot of improvements to function docstrings, site.rst documentation, and comments in the code explaining what's going on.

* Add a note that if known_paths is provided to StartupState.__init__(), it will get mutated in place.
* Improve some conditional flows.
* Improve some comments.
* Improve the what's new entry.
* Make test_impl_exec_imports_suppressed_by_matching_start() more robust

  Based on PR comment, we need to read both the .pth and .start files, and prove that the .pth file's import line (which passes a bigger increment) is not called, but the .start file's entry point (which uses the default increment) is called.

* As per review, move some methods to the private API

  _read_pth_file() and _read_start_file() are not intended to be part of the public API surface outside of the site module, so even though they are used by methods outside of the StartupState class, make them privately named.

* Resolve several review feedbacks
* Move a `versionadded`
* Better list comprehension formatting (use the output from `ruff format --line-length 78`)
* Add docs for site.makepath() and point the case-normalization requirement to this utility function.
* Note that StartupState.process() is not idempotent.
* Address another feedback comment

  This time, we get rid of the legacy implementation `reset` local, which was always difficult to understand, and just implement a return value based on the processing mode selected.

* Changes based on gh-150228 review

  The comment by @encukou that started this change:

  ```
  I still see two red flags here though: an argument that doesn't combine with other arguments, and (another instance of) changing the return type based on an argument.

  Did you consider adding a StartupState.addsitedir(sitedir) method, instead of the startup_state argument?
  ```

  As it turns out, this is an even cleaner design. By moving the bulk of the previous module global functions into `StartupState` methods, we can get rid of all the awkward `startup_state` keyword-only arguments which conflict with `known_paths` (Petr's first point). We can also get rid of the return value dichotomy (Petr's second point) because now we can preserve exactly the Python 3.14 API in the module global functions, and implement the better APIs in the class methods. We also generally don't have to pass around `process_known_sitedirs`.

  Now the following module global functions are essentially shims around class methods:

  * site.addsitedir() -> StartupState.addsitedir()
  * site.addusersitepackages() -> StartupState.addusersitepackages()
  * site.addsitepackages() -> StartupState.addsitepackages()

* Additional minor changes
* Remove a now unused parameter

(cherry picked from commit 27ebd9a)

Co-authored-by: Barry Warsaw <barry@python.org>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>