Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: don't redundantly add incompatibilities (1.5x speedup) #29

Merged
merged 1 commit into from
Oct 12, 2020

Conversation

Eh2406
Copy link
Member

@Eh2406 Eh2406 commented Oct 10, 2020

This follows up on #19 (comment), by adding a cache of get_dependencies already called. For me the benchmark 0.175 sec -> 0.086 sec so that is a big win!

The odd type Map<P, Set<V>> instead of ___Set<(P, V)> is because (P, V) does not impl Hash nor Ord.

@aleksator
Copy link
Member

The odd type Map<P, Set> instead of ___Set<(P, V)> is because (P, V) does not impl Hash nor Ord.

I'd say it would be odd to do it the other way actually. It's logical to group versions by packages in Map rather than copying it in every tuple.

Looks good!

@Eh2406 Eh2406 marked this pull request as draft October 10, 2020 22:33
@Eh2406
Copy link
Member Author

Eh2406 commented Oct 10, 2020

I marked this as Draft as when rebased on #32 it is not clear that it pulls its weight. We will need to reevaluate after a new benchmark is made/found.

@Eh2406
Copy link
Member Author

Eh2406 commented Oct 11, 2020

I did some work on new benchmarks, and it looks like this is still 1.5x after #32. The new benchmarks will hopefully have a PR up later today.

Having slept on it, A variation on this that guarantees that get_dependencies is only called once for each arguments, would mean that we don't need to recommend caching get_dependencies. So may be worth doing even if it is perf neutral.

@mpizenberg
Copy link
Member

Looks good!

I'm not sure what you mean by

A variation on this that guarantees that get_dependencies is only called once for each arguments, would mean that we don't need to recommend caching get_dependencies

but any useful and performance-sensitive retriever will be kept around for more than one resolve call, so it should cache dependencies. With this version you implemented, it's a gain in performance, and no double caching. So in my opinion it's optimal while we don't have more usage.

@Eh2406
Copy link
Member Author

Eh2406 commented Oct 11, 2020

I pushed another commit that impls my idea, it means that we only call get_dependencies once and store the incompatibilities. For Cargo's use case where we are building only one dependency tree we wouldn't need an additional cache. For the benchmarks I have made, it has the same performance. It is a separate commit, so if we don't want the more complicated code we can remove it easily.

@mpizenberg
Copy link
Member

For Cargo's use case where we are building only one dependency tree we wouldn't need an additional cache.

I thought Cargo would store in target/ or in ~/.cargo a cache file of dependencies that it can load once instead of looking for every package dependencies in the file system. Or does it do something completely different?

@mpizenberg
Copy link
Member

so if we don't want the more complicated code we can remove it easily.

It's safer to move one step first, and then evaluate with other benchmarks the impact of caching dependencies. We can safely merge the simple version once #32 lands, then re-evaluate.

@Eh2406 Eh2406 mentioned this pull request Oct 11, 2020
@Eh2406
Copy link
Member Author

Eh2406 commented Oct 11, 2020

It's safer to move one step first, and then evaluate with other benchmarks the impact of caching dependencies. We can safely merge the simple version once #32 lands, then re-evaluate.

Good plan. Backed out the complications.

I thought Cargo would store in target/ or in ~/.cargo a cache file of dependencies that it can load once instead of looking for every package dependencies in the file system. Or does it do something completely different?

It does have a copy of crates.io-index in ~/.cargo , and that gets cache as files (for when git is to slow), and then there is an in memory cache (for when the resolver asks for the same package and files are to slow), and then another in memory cache (for when the resolver asks for the same dependencies and the memory cache are to slow). (I may have missed some.) So I think we could skip the memory caches.

@mpizenberg mpizenberg marked this pull request as ready for review October 12, 2020 18:42
@aleksator aleksator merged commit f297599 into dev Oct 12, 2020
@aleksator aleksator changed the title perf: don't redundantly add incompatibilities (2x speedup) perf: don't redundantly add incompatibilities (1.5x speedup) Oct 12, 2020
@Eh2406 Eh2406 deleted the dont-readd-deps branch October 12, 2020 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants