
Fetch version metadata in batches #170

Closed
charliermarsh opened this issue Oct 23, 2023 · 4 comments · Fixed by #2452

@charliermarsh
Member

When resolving:

urllib3<1.25.4
boto3

We end up testing hundreds of boto3 versions. After a point, we should start fetching the version metadata in parallel for these cases.
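A minimal sketch of what batched, parallel metadata fetching could look like, assuming a placeholder `fetch_metadata` function and `Version`/`Metadata` types (illustrative, not uv's actual API), using `buffer_unordered` from the `futures` crate to cap the number of in-flight requests:

```rust
use futures::stream::{self, StreamExt};

#[derive(Clone, Debug)]
struct Version(String);

#[derive(Debug)]
struct Metadata; // stand-in for the per-version metadata we want to cache

// Stub: in practice this would hit the index (or the local cache) for one version.
async fn fetch_metadata(version: &Version) -> Metadata {
    let _ = version;
    Metadata
}

/// Fetch metadata for a batch of candidate versions with up to
/// `concurrency` requests in flight at once.
async fn prefetch_batch(candidates: &[Version], concurrency: usize) -> Vec<Metadata> {
    stream::iter(candidates)
        .map(|version| fetch_metadata(version))
        .buffer_unordered(concurrency)
        .collect()
        .await
}
```

With `buffer_unordered`, results arrive in completion order rather than request order, which is fine when the goal is just to warm the metadata cache.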

charliermarsh added the performance (Potential performance improvement) label Oct 24, 2023
charliermarsh added this to the Future milestone Oct 25, 2023
@konstin
Member

konstin commented Nov 24, 2023

For me, this is effectively blocked on being smarter with ranges: either being a lot faster at intersections, or, ideally, having ranges with fewer segments.

Checking the first ~500 versions of tf-models-nightly takes 12s for me, which is spent nearly exclusively on intersecting ranges with a lot of segments, even though we should be able to merge those segments.
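To illustrate the "merge the segments" point, here is a toy sketch (integers standing in for versions; this is not pubgrub's `Range` type) of collapsing adjacent or overlapping segments so that later intersections touch one segment instead of many:

```rust
/// Merge overlapping or adjacent half-open segments `[start, end)`.
fn merge_segments(mut segments: Vec<(u64, u64)>) -> Vec<(u64, u64)> {
    segments.sort_by_key(|&(start, _)| start);
    let mut merged: Vec<(u64, u64)> = Vec::new();
    for (start, end) in segments {
        if let Some(last) = merged.last_mut() {
            if start <= last.1 {
                // Touches or overlaps the previous segment: extend it in place.
                last.1 = last.1.max(end);
                continue;
            }
        }
        // Otherwise this starts a new, disjoint segment.
        merged.push((start, end));
    }
    merged
}

fn main() {
    // Three contiguous segments collapse into one.
    assert_eq!(merge_segments(vec![(0, 2), (2, 3), (3, 5)]), vec![(0, 5)]);
}
```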


@charliermarsh
Member Author

Is that with all the metadata cached? The fragmented-ranges problem is similar to what @zanieb is looking at for error reporting, and it's related to an issue I filed on PubGrub about this.

@konstin
Member

konstin commented Nov 24, 2023

Yes. It basically does not make sense to work on this until pubgrub-rs/pubgrub#135 is fixed or we have some workaround. We could try making intersections really fast, but I think that would be effort spent in the wrong place when the ranges shouldn't be that large in the first place.

@konstin
Member

konstin commented Feb 20, 2024

As an update: pubgrub is much faster now, and it will be even faster for this case with pubgrub-rs/pubgrub#177 (the benchmark I used hits the boto3 case), so we can retry doing this.

zanieb pushed a commit that referenced this issue Mar 13, 2024
This update pulls in pubgrub-rs/pubgrub#177,
optimizing common range operations in pubgrub. Please refer to this PR
for a more extensive description and discussion of the changes.

The changes optimize the last remaining pathological case, `bio_embeddings[all]` on Python 3.12, which has to try 100k versions, from 12s to 3s in the cached case. They should also enable smarter prefetching in batches (#170), even though a naive attempt didn't show better network usage.

**before** 12s


![image](https://github.com/pubgrub-rs/pubgrub/assets/6826232/80ffdc49-1159-453d-a3ea-0dd431df6d3b)

**after** 3s


![image](https://github.com/pubgrub-rs/pubgrub/assets/6826232/69508c29-73ab-4593-a588-d8c722242513)

```
$  taskset -c 0 hyperfine --warmup 1 "../uv/target/profiling/main-uv pip compile ../uv/scripts/requirements/bio_embeddings.in"  "../uv/target/profiling/branch-uv pip compile ../uv/scripts/requirements/bio_embeddings.in"
Benchmark 1: ../uv/target/profiling/main-uv pip compile ../uv/scripts/requirements/bio_embeddings.in
  Time (mean ± σ):     12.321 s ±  0.064 s    [User: 12.014 s, System: 0.300 s]
  Range (min … max):   12.224 s … 12.406 s    10 runs

Benchmark 2: ../uv/target/profiling/branch-uv pip compile ../uv/scripts/requirements/bio_embeddings.in
  Time (mean ± σ):      3.109 s ±  0.004 s    [User: 2.782 s, System: 0.321 s]
  Range (min … max):    3.103 s …  3.116 s    10 runs

Summary
  ../uv/target/profiling/branch-uv pip compile ../uv/scripts/requirements/bio_embeddings.in ran
    3.96 ± 0.02 times faster than ../uv/target/profiling/main-uv pip compile ../uv/scripts/requirements/bio_embeddings.in
```

It also adds `bio_embeddings[all]` as a requirements test case.
konstin self-assigned this Mar 14, 2024
konstin added a commit that referenced this issue Apr 8, 2024
…2452)

With pubgrub now being fast for complex ranges, we can compute the next n candidates without taking a performance hit. This speeds up the cold-cache `urllib3<1.25.4` `boto3` case from roughly 40s-50s to ~2s. See the docstrings for details on the heuristics.
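A rough sketch of that kind of heuristic, with placeholder names and thresholds (`BatchPrefetcher`, `THRESHOLD`, and `BATCH` here are illustrative, not necessarily uv's actual types or constants; the real logic is in the PR's docstrings):

```rust
use std::collections::HashMap;

type PackageName = String;
type Version = String;

const THRESHOLD: usize = 5; // rejected versions before we start prefetching
const BATCH: usize = 50; // how many candidates to prefetch at once

#[derive(Default)]
struct BatchPrefetcher {
    rejected: HashMap<PackageName, usize>,
}

impl BatchPrefetcher {
    /// Called each time the resolver rejects a version of `package`. Once we
    /// have crossed the threshold we are likely to keep stepping backwards
    /// through versions, so return the next batch of candidates whose
    /// metadata should be requested in the background.
    fn on_rejected<'a>(
        &mut self,
        package: &PackageName,
        remaining_candidates: &'a [Version],
    ) -> Option<&'a [Version]> {
        let count = self.rejected.entry(package.clone()).or_insert(0);
        *count += 1;
        if *count >= THRESHOLD {
            let n = BATCH.min(remaining_candidates.len());
            Some(&remaining_candidates[..n])
        } else {
            None
        }
    }
}
```

The returned slice would then be fetched concurrently, so later resolver iterations hit the cache instead of the network.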

**Before**


![image](https://github.com/astral-sh/uv/assets/6826232/b7b06950-e45b-4c49-b65e-ae19fe9888cc)

**After**


![image](https://github.com/astral-sh/uv/assets/6826232/1c749248-850e-49c1-9d57-a7d78f87b3aa)

---

The prefetching needs two parts: first looking for compatible versions, then falling back to the flat next versions. After we have selected a boto3 version, there is only one compatible botocore version remaining, so we won't find other compatible candidates for prefetching. We see this as a pattern where we only prefetch boto3 (stacked bars), but not botocore (sequential requests between the stacked bars).


![image](https://github.com/astral-sh/uv/assets/6826232/e5186800-23ac-4ed1-99b9-4d1046fbd03a)
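A minimal sketch of the two phases described above, with a generic `is_compatible` check standing in for a real range test (names are illustrative, not uv's API):

```rust
/// Pick up to `batch` candidates to prefetch: prefer versions that are still
/// compatible with the current range, then pad with the flat "next" versions
/// when too few compatible candidates remain.
fn select_prefetch_candidates<'a, V>(
    candidates: &'a [V],
    is_compatible: impl Fn(&V) -> bool,
    batch: usize,
) -> Vec<&'a V> {
    // Phase 1: candidates the current range still allows.
    let mut selected: Vec<&V> = candidates
        .iter()
        .filter(|&v| is_compatible(v))
        .take(batch)
        .collect();
    // Phase 2: fall back to the next versions in order, skipping those
    // already taken in phase 1.
    for v in candidates {
        if selected.len() >= batch {
            break;
        }
        if !is_compatible(v) {
            selected.push(v);
        }
    }
    selected
}
```

Without the fallback, botocore would never be prefetched in the scenario above, because after a boto3 version is chosen only a single compatible botocore version remains.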

The risk is that the guess is completely wrong and we cause a lot of useless network requests. I think this is acceptable, since the mechanism only triggers when we're already on the bad path, and with a fast index like PyPI we would simply have fetched all the versions after a few seconds anyway.

---

It would be even better if the pubgrub state were copy-on-write so we could simulate more progress than we actually have; currently we're guessing what the next version will be, which could be completely wrong, but I think this is still a valuable heuristic.

Fixes #170.