Solving slows down dramatically when testing hundreds of versions #135
I'll let @Eh2406 answer the performance question, as I've been out of the loop for quite a long time. It feels weird though, since your root only depends on a single version of foo: as soon as one is found to be ok, it should be good. But if you're not adding any "bar" to the knowledge, there isn't any solution.

One remark: I'd try to separate the performance of the solver from the performance of the error reporting, as no effort has been put into error reporting speed at all.

Regarding your second issue, it's not as you think. An open bracket means the value is excluded.
Thanks @mpizenberg.
To clarify, omitting the error handling entirely (…)
Means: versions between 0a0.dev0 (included) and 5.0a1 (excluded), and versions >= 5.0a1.post0.dev0. Assuming 5.0a1.post0.dev0 comes just after 5.0a1, this is the reduced form saying "all versions >= 0a0.dev0, except 5.0a1, which is excluded".
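The structure described above can be sketched with a toy model, using integers in place of real version numbers. This is an illustration only, not pubgrub's actual `Range` type; the names `Interval`, `exclude`, and `contains` are made up for the sketch:

```rust
// Illustration only: model a version range as a union of half-open
// intervals [start, end) over integer "versions".
type Interval = (u32, u32);

// "All versions except v": everything below v, plus everything from
// v's successor upward. The successor plays the role that
// 5.0a1.post0.dev0 plays just after 5.0a1.
fn exclude(v: u32) -> Vec<Interval> {
    vec![(0, v), (v + 1, u32::MAX)]
}

// Membership test over a union of intervals.
fn contains(range: &[Interval], v: u32) -> bool {
    range.iter().any(|&(lo, hi)| lo <= v && v < hi)
}
```

Here `exclude(5)` yields the two intervals `[0, 5)` and `[6, ∞)`: two pieces standing for the single fact "everything except 5", which is the reduced form in the quote.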
That makes sense, and I'm still learning as I go here, but assume we know that there are no versions after … Similarly, given …
To be clear: I don't know for certain that the term expansion itself is the cause of the slowdown. I have just noticed, in practice, that the solver slows to a halt when running into cases in which hundreds of versions of a given package are tested in sequence. In profiling, significant time is spent computing term unions and intersections, which led me to notice that the terms get extremely long in these cases.
Oh yeah, definitely. I don't remember clearly how much work we put into this kind of reduction. If I remember, reduction of terms was still on our todo list; Jacob can confirm. Some of it was discussed in #19 if I recall correctly.
Thanks for reporting it! It definitely helps to have concrete examples like these to figure out performance improvement opportunities.
Thank you for reporting and for the interesting conversation! With the current architecture, the resolver does not know about any … It would be nice to be able to confirm that this is what's actually causing the slowdown in this case. Let me do some experiments.
On a side note, another generic way to help speed up the solver is to pre-record incompatibilities representing any kind of representable knowledge, such as non-existing versions in a given range, or a strategic incompatibility between two packages that can save the solver many steps and dead ends (cf #121). It's possible that, given an ecosystem of packages (like Rust packages), some key incompatibilities could save hours of cumulative time spent by everyone.
Before #112:

```rust
// Imports assume the pubgrub dev branch of that era.
use std::time::Instant;

use pubgrub::range::Range;
use pubgrub::solver::{resolve, OfflineDependencyProvider};
use pubgrub::version::NumberVersion;

fn main() {
    let mut dependency_provider = OfflineDependencyProvider::<&str, Range<NumberVersion>>::new();
    // root depends on foo...
    dependency_provider.add_dependencies("root", 1, vec![("foo", Range::any())]);
    for i in 1..500 {
        let start = Instant::now();
        // foo depends on bar...
        dependency_provider.add_dependencies("foo", i, vec![("bar", Range::any())]);
        _ = resolve(&dependency_provider, "root", 1);
        let time = start.elapsed().as_secs_f32();
        println!("{i}, {time}");
    }
}
```

I used Excel to fit a trendline (note: the two lines are on different axes):
Thinking about yesterday's analysis again: even when …
Regarding the rebuilding of assignment intersections ( https://github.com/pubgrub-rs/pubgrub/blob/dev/src/internal/partial_solution.rs#L279-L282 ), we could speed that up in different ways.

First, intersection is fundamentally associative, meaning it's fully parallelizable (even on the GPU?). So we could compute the intersection in parallel.

Second, multiple backtrackings essentially recompute exactly the same intersections, up to a point. Therefore, we could keep logarithmically spaced snapshots of the intersections for a given package. When backtracking happens, we would start at the closest snapshot before the backtracked step, instead of starting from 0.
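A minimal sketch of the snapshot idea, under some simplifying assumptions: bitwise AND stands in for term intersection (any associative operation with an identity works the same way), a fixed stride replaces logarithmic spacing to keep the sketch short, and the type and method names are hypothetical:

```rust
// Snapshotted prefix "intersections". `u32::MAX` is the identity for AND.
const STRIDE: usize = 4;

struct PrefixLog {
    elems: Vec<u32>,
    // snapshots[i] holds the fold of elems[..i * STRIDE]
    snapshots: Vec<u32>,
}

impl PrefixLog {
    fn new() -> Self {
        PrefixLog { elems: Vec::new(), snapshots: vec![u32::MAX] }
    }

    fn push(&mut self, x: u32) {
        self.elems.push(x);
        // Record a new snapshot every STRIDE elements.
        if self.elems.len() % STRIDE == 0 {
            let start = (self.snapshots.len() - 1) * STRIDE;
            let last = *self.snapshots.last().unwrap();
            let acc = self.elems[start..].iter().fold(last, |a, &b| a & b);
            self.snapshots.push(acc);
        }
    }

    // Backtrack to the first `n` assignments, keeping still-valid snapshots.
    fn backtrack(&mut self, n: usize) {
        self.elems.truncate(n);
        self.snapshots.truncate(n / STRIDE + 1);
    }

    // Fold of elems[..n], resuming from the nearest snapshot instead of 0.
    fn prefix(&self, n: usize) -> u32 {
        let snap = (n / STRIDE).min(self.snapshots.len() - 1);
        self.elems[snap * STRIDE..n]
            .iter()
            .fold(self.snapshots[snap], |a, &b| a & b)
    }
}
```

After a backtrack, `prefix` recomputes at most STRIDE elements past the last surviving snapshot rather than refolding the whole prefix; with logarithmic spacing the snapshot storage would shrink further at the cost of longer refolds.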
I spent a lot of last night thinking about this problem, time when a more reasonable person would have been sleeping.
To elaborate, the benchmark as written should be a pretty direct test of this optimization; however, it does not seem to be firing correctly. It is not firing because of the fact that …

It is fairly easy to change the example to avoid that optimization even if it were working correctly. So algorithmic improvements to backtracking and finding satisfiers are probably valuable.

Most importantly, a lot of these computations can be cached: either in a targeted fashion, by keeping track of exactly what's needed and pre-computing it, or in a general fashion, in which we keep track of the relations between terms in some data structure. Designs here can get pretty fanciful, but generally they can be explored after 0.3. (Well, possibly adding a …)

The problems of optimizing the representation of …

I'm considering breaking these three topics into separate issues to keep conversations focused. Thoughts?
So I just had a look at the code this morning, to get a sense of what it would mean to store pre-intersected derivations. There are three locations where we recompute these term intersections.
Point (3) is tricky because we can't get around recomputing all intersections, since we have a different starting point. It could still be sped up by using a dichotomic search, where we look for the previous satisfier by computing, at each search point, the intersection between that initial term and the precomputed accumulators.
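The dichotomic search can be sketched as a binary search over precomputed prefix accumulators. Again using bitwise AND as a stand-in for intersection (an assumption for illustration, as is the function name): because each accumulator is the intersection of everything before it, the sequence only shrinks, so "intersecting with a given term yields empty" is monotone and a standard binary search applies.

```rust
// prefix_accs[i] models the intersection of assignments[..=i]; it can
// only lose bits as i grows. Find the first index at which intersecting
// with `term` becomes empty, by binary search rather than a linear scan.
fn first_empty(prefix_accs: &[u32], term: u32) -> Option<usize> {
    // partition_point is valid because acc & term can only go from
    // non-zero to zero as the accumulator shrinks.
    let i = prefix_accs.partition_point(|&acc| acc & term != 0);
    if i < prefix_accs.len() { Some(i) } else { None }
}
```

With real terms the comparison at each probe would itself be an intersection, but only O(log n) of them are needed instead of n.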
I have been working on several different approaches to see what will make a difference on these benchmarks. None of them are at a state for public discussion. Each on its own did not make a significant performance difference. Today I experimented with putting them all together, and I'm seeing significant results! The current version of the benchmark is:

```rust
let mut dependency_provider = OfflineDependencyProvider::<&str, Range<NumberVersion>>::new();
// root depends on foo...
dependency_provider.add_dependencies("root", 1, vec![("foo", Range::full())]);
for i in 1..500 {
    // foo depends on bar...
    dependency_provider.add_dependencies("foo", i, vec![("bad", Range::full())]);
}
let start = Instant::now();
_ = resolve(&dependency_provider, "root", 1);
let time = start.elapsed().as_secs_f32();
let len = dependency_provider
    .versions(&"foo")
    .into_iter()
    .flatten()
    .count();
println!("{len}, {time}");
```

with a slower variation by changing …
This is without optimizations for making … I will clean up some of my hacks into PRs, but probably not till after PackagingCon. I may also push for us to merge #104, as keeping two versions of each of these commits is getting difficult.
It turns out that one of the hacks on its own was sufficient to get most of the performance benefit for the harder case. I am working on polishing it for review in the branch slow_135. The fix may be over-indexed on the synthetic benchmark. @charliermarsh, can you check if it helps with your real-world use case? If not, we will have to figure out how to make a more realistic benchmark.
Here are some numbers on the synthetic benchmark:
All of the mentioned branches pass all tests and should be cleanable for merge, if they turn out to be as useful as they seem. All of those branches are pushed, so you can check commit by commit how much of that improvement translates to real-world examples. I will run our "ron" files next time I'm at my benchmarking computer. The flame graph still shows obvious opportunities for improvement, although interestingly not the same suggestions as last time.
I was able to update our branch to pull in those changes, and I let this pathological resolution run for 60 seconds.
The last two (the ones with *) would take far too long to run a full criterion analysis, so I'm just reporting the "Collecting 100 samples in estimated" time. To say this is not the dramatic improvement I was hoping for would be an understatement. On the other hand, I don't have an example demonstrating the dramatic slowdowns @charliermarsh observed. 🤔
When testing many consecutive versions of a given package, the cost of computing compatible versions seems to increase non-linearly over time.
Concretely, running this example slows down noticeably for me at around 100 iterations, and grinds to a ~halt in the 200s:
I suspect that this may have to do with the increased size of the intersected versions. For example, after testing and rejecting Django 5.0a1, we produce a term like:

`[ 0a0.dev0, 5.0a1 [  [ 5.0a1.post0.dev0, ∞ [`

We then test Django 4.2.6, which gives us:

`[ 0a0.dev0, 4.2.6 [  [ 4.2.6.post0.dev0, ∞ [`

We then take the intersection of these terms, which gives us:

`[ 0a0.dev0, 4.2.6 [  [ 4.2.6.post0.dev0, 5.0a1 [  [ 5.0a1.post0.dev0, ∞ [`

This continues until we have a range like:

…
If you're testing hundreds of versions, the terms continue to grow, and I suspect this leads to some kind of quadratic behavior, since we're increasing the number of terms linearly with the number of tested versions.
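The suspected quadratic behavior can be reproduced in a toy interval model, with integers standing in for versions. This is an illustration under stated assumptions, not pubgrub's representation, and `intersect`/`grow_term` are made-up names. Each exclusion of a non-adjacent version adds one interval to the term, so the k-th intersection walks a list of about k intervals, and the total work grows quadratically:

```rust
// Half-open interval [start, end) over integer "versions".
type Interval = (u32, u32);

// Intersect two sorted, disjoint interval lists with a two-pointer sweep.
// Cost is linear in the number of intervals, which is the point: the
// term below gains one interval per excluded version.
fn intersect(a: &[Interval], b: &[Interval]) -> Vec<Interval> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        let lo = a[i].0.max(b[j].0);
        let hi = a[i].1.min(b[j].1);
        if lo < hi {
            out.push((lo, hi));
        }
        if a[i].1 < b[j].1 { i += 1 } else { j += 1 }
    }
    out
}

// Start from "any version" and exclude versions one by one, newest
// first. Excluding every other version keeps the gaps from merging,
// mimicking the pre-release/post-release gaps in the Django example.
fn grow_term(exclusions: u32) -> Vec<Interval> {
    let mut term = vec![(0, u32::MAX)];
    for v in (1..=exclusions).map(|k| 2 * k).rev() {
        term = intersect(&term, &[(0, v), (v + 1, u32::MAX)]);
    }
    term
}
```

After 250 rejections, `grow_term(250)` holds 251 intervals, so every subsequent intersection already scans hundreds of intervals even though the term still just means "everything except what we rejected".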
The intersections aren't "wrong", and it's possible that we're doing something wrong with our version formulations -- but could we take advantage of the fact that we know there are no versions in these partial ranges?
For example, given `[ 0a0.dev0, 5.0a1 [  [ 5.0a1.post0.dev0, ∞ [` and `[ 0a0.dev0, 4.2.6 [  [ 4.2.6.post0.dev0, ∞ [`, it would be preferable to reduce to `[ 0a0.dev0, 4.2.6 [`, if I'm understanding the syntax correctly.