-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New HTTP cache with lower memory usage #11143
Conversation
Summary of failing checks:
|
@pfmoore would appreciate any feedback. |
Not sure what you want from me? I don't feel I have anything to say regarding the technical details of the implementation. As far as the design goes, my position remains the same. I don't like pip leaving a load of unused cache data that it will no longer use around. This PR seems to simply offer a means of listing what's present via The savings with the new cache are excellent, and maybe that justifies a sucky UI, and hidden bloat for anyone who isn't already sufficiently affected by the cache size to be actively looking for a way to clean stuff up. This whole thing feels like we're designing for the rare/suboptimal case (people using both old and new versions of pip on the same machine) where we should be optimising for the normal/recommended case (people always use the latest version of pip). As such, use the new style cache and simply trash an old style cache if found is the indicated approach. Just possibly with a temporary means to opt into not deleting the old-style cache (a I'd be very interested to know how other CacheControl users have handled this sort of transition. I can't believe we're the only people in this situation. I wish CacheControl had better support for this type of transition, but as they don't seem to, I'd like to know how their other users feel about the situation. At the moment, I'm not clear on the way forward here. We still seem no closer to having an acceptable transition strategy. But maybe I'm the only one with reservations about just leaving the old cache around. If so, one of the other maintainers can override me and merge this - I won't object to that, my concerns aren't severe enough that I want to block someone else from accepting this if they feel I'm making too much of the issue. |
I am happy to clear out the old cache if that's what you prefer, I don't have strong opinions here. And I don't know how |
Mostly by rough consensus, so I think the best thing to do is to wait for some of the other @pypa/pip-committers to weigh in with their opinions. |
Note btw that this may also speed up installation slightly, since it doesn't involve decoding a 500MB messagepack message to extract the body. |
0d76330
to
e8654ad
Compare
I am back to working on this now that |
Any chance for a review? |
Did you do this, or is the code still implementing the same strategy as it was when I previously commented? I don't think I have anything new to add beyond what I said above. |
It's still the same strategy. The main difference is that this is now at least theoretically mergeable, insofar as it's vendoring a released version of cachecontrol instead of a patched one. |
Cool. I still think we should build this to optimise for people who are upgrading to the latest pip, as I said above (and hence switch to the new cache format and clear out the old data, rather than leaving it there "just in case") but that's just my view. I'm fine if you'd rather leave the code as is and wait for other opinions. |
I do have a few concerns with wiping out the cache:
I wonder if it might not be better to sort of phase it in? Basically do:
I dunno, take my thoughts with a big grain of salt, just what popped into my head. |
Alternatively, put this behind a flag? |
Let's leave the cache in place for now. I think we should wipe it in a later release, but not in the first cut for sure. |
I don't think this needs an opt-in/opt-out cycle, to be honest. It's unclear to me what that would achieve. |
OK. How will we ensure we don't forget to do that? Maybe create a follow-up PR now that adds the "delete old cache data" functionality, and add it to a later release milestone? Would it be worth adding some docs in https://pip.pypa.io/en/stable/topics/caching/ covering how to manually clear the old cache data if you know you don't need it? The "Cache management" section is basically all about managing the wheel cache, and doesn't help anyone who wants to clear the HTTP cache. Even if it's just "to clear the HTTP cache, delete the contents of such-and-such directory, there's no fine control" along with a note explaining that pip switched to a new cache format in version X.Y and the old cache data will no longer be used after that. This is very much from my personal perspective, as someone who hates wasting disk space on "obsolete" cached data (there's a bunch of directories on my PC linked to apps I no longer have installed, and that annoys the heck out of me 🙁). I'd rather pip not be one of the tools that annoys me like that... |
OK, so summary of tasks so far—
In current PR:
|
I personally don't think we should wipe the old cache without user consent. I frequently see users with a large number of virtual environments with various python and pip versions. Wiping the cache would make thing painful for them, while really benefiting only few users who would have very large caches and can afford to use the same (or modern) pip versions everywhere. So I'd be in favor of documenting. We could also log a warning when creating the new cache while an old cache is present, to inform users and give them information so they can decide whether to clean or not. |
A warning as an alternative seems like an option for "in a couple of releases", insofar as the gap is intended to decrease chances of people using multiple versions of I have filed #12082 for the "what do we do with old cache" issue. In addition to the options discussed here (delete automatically, warn with instructions) I added a third option. Further discussion of the cleanup issue should probably happen there insofar as a cleanup mechanism is not deemed immediately necessary in this PR. |
45c69e2
to
a86d7d6
Compare
I have addressed the immediate comments, and there is an issue filed (#12082) for whatever the clean-up policy is going to be in a future release. |
Beyond the two comments above, anyone have other comments and able to do a formal sign off? Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, appears to be a solid improvement over how the cache currently works and the migration path looks good.
Since I haven't been super active in pip lately I'm not going to merge this myself out of respect for the folks who have been, but for whatever my review is worth, I think this is good to go.
Thank you! There's one unaddressed comment above re output, I can try to do it today if I have time, otherwise tomorrow. |
Cool - I'd misread "maybe we can merge" as meaning it was OK as it is, rather than "we can update the doc to merge these cases" 🙂 Ping me once you've done the update and I'll merge the PR then. |
OK, did a tweak, so if tests pass that should address that comment. |
No idea why this is failing. Doesn't seem related to the PR? |
Unrelated, see discussion in #12267. |
Added to the 23.3 milestone, so that it doesn't get forgotten. If we decide to defer it, let's make sure that at least it's a conscious decision (personally, I'm fine for it to go in). @itamarst could you merge the latest main into this PR and fix up any remaining CI issues? |
OK, merging as per this comment |
Thank you! |
Your patience and persistence in getting this PR incorporated is highly appreciated! Sorry it was such a long process. |
Absolutely awesome, I'm very happy this landed, thanks 🙏 |
Mostly I just hope I didn't break everything 😁 |
Fixes #2984
Additionally, updates
CacheControl
to the latest release.TL;DR: Huge decrease in memory usage, by using streaming cache instead of loading (multiple copies of) the whole file into memory in one go.
Note to reviewer: The Windows CI didn't run for reasons that I can't control. Please rerun, and if tests fail I will fix them.
Open questions
What to do about the old cache? My personal feeling is leave it alone, and then follow-up in a year or so with code to delete old caches. Happy to implement some cleanup code if you disagree.
Did I catch all places in
pip
that need updating? The original code is from a year ago so there might be something changed since then.Memory usage improvements
I did a before/after memory usage comparison of installing
tensorflow
, which is a 500MB wheel for my OS/Python version. The measurement was done with the following script:I ran it twice, once with latest released version of
pip
(./measure-memory.sh pip
) and once with my branch (./measure-memory.sh $PWD
). Here are the results, measuring peak memory usage:As you can see, memory usage is significantly improved!