feat: add a `simplify` for error messages #156

Eh2406 · 2023-11-22T21:08:43Z

Ranges generated by PubGrub often end up being verbose and pedantic about versions that do not matter. To take an example from #155, 2 | 3 | 4 | 5 could be more concisely stated as >=2, <=5, given that it's not possible to have versions in between the integers. More generally, it could be expressed as >=2 if the list of available versions were taken into account.
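To make "more concisely stated" concrete, here is a minimal sketch of the equivalence being relied on: two sets are interchangeable for reporting if they select exactly the same available versions. The `Segment` alias and the `contains` and `same_selection` helpers are hypothetical toy names over integer versions, not part of PubGrub's API.

```rust
/// A toy version set over integers: a union of inclusive ranges.
/// `None` stands for an unbounded side. (Hypothetical type, for illustration only.)
type Segment = (Option<u32>, Option<u32>);

/// True if `v` falls inside any segment of `set`.
fn contains(set: &[Segment], v: u32) -> bool {
    set.iter().any(|&(lo, hi)| {
        lo.map_or(true, |lo| v >= lo) && hi.map_or(true, |hi| v <= hi)
    })
}

/// Two sets are interchangeable for error reporting if they select
/// exactly the same members of the list of available `versions`.
fn same_selection(a: &[Segment], b: &[Segment], versions: &[u32]) -> bool {
    versions.iter().all(|&v| contains(a, v) == contains(b, v))
}
```

With available versions 1 through 5, the sets 2 | 3 | 4 | 5, >=2, <=5 and >=2 all select the same versions. If a version 6 were also available, >=2 would no longer be a valid replacement.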

The logic for simplifying a VS given a complete set of versions feels like it should be simple, but it is trickier to implement than I expected, especially if you want to guarantee O(len(VS) + len(versions)) time and O(1) allocations.

While working through the logic I had to implement a check for which versions from a list match, in O(len(VS) + len(versions)) time, which felt useful enough to be worth including in the API. In implementing that I noticed that our implementation of contains did not aggressively short-circuit. The short-circuiting implementation is just as easy to read, so I decided to include that as well.
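As an illustration of how such a linear-time check can work, here is a minimal two-pointer sketch. It assumes toy inclusive integer segments that are sorted and non-overlapping, and a list of versions sorted in ascending order. The function name `matching_versions` is hypothetical and this is not the code from the PR.

```rust
/// For each version (sorted ascending), report whether it lies in one of the
/// segments (inclusive bounds, sorted, non-overlapping).
/// Runs in O(segments.len() + versions.len()) with a single allocation.
fn matching_versions(segments: &[(u32, u32)], versions: &[u32]) -> Vec<bool> {
    let mut out = Vec::with_capacity(versions.len());
    // Index of the first segment that could still contain the current version.
    let mut i = 0;
    for &v in versions {
        // Segments ending before `v` cannot match `v` or any later version.
        while i < segments.len() && segments[i].1 < v {
            i += 1;
        }
        // `segments[i]` ends at or after `v`; it matches if it also starts at or before `v`.
        out.push(i < segments.len() && segments[i].0 <= v);
    }
    out
}
```

The segment index only ever moves forward, which is what keeps the total work linear instead of one search per version.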

Eh2406 · 2023-11-22T21:19:49Z

cc @zanieb, I think this can be used to dramatically improve your error messages.

cc @baszalmstra mamba-org/resolvo#2 (comment) I don't know if these are improvements you'd be interested in including in your copy of this type.

A future possibility is figuring out when it is safe for PubGrub to ask for a range to be simplified during processing. This would be enormously helpful for #135, but it needs to be done with extreme care. Basic set properties do not hold if simplify is added to the equation: A.intersection(A.negate()) == empty is true, but simplify(A).intersection(simplify(A.negate())) == empty is not.

mpizenberg · 2023-11-23T23:52:02Z

It feels like a very good idea for error reporting where solver logic is not important anymore, and readability is much more useful. Also at this point the potential costs are likely negligible compared to the solving costs?

mpizenberg · 2023-11-23T23:56:09Z

src/range.rs

-                (Excluded(start), Excluded(end)) => v > start && v < end,
+                (Excluded(start), _) => v <= start,
+                (Included(start), _) => v < start,
+                (Unbounded, _) => false,


Isn't this wrong if the very first segment is (Unbounded, Unbounded)?

This whole short-circuit change feels shady ...

The false here means that it fails to short-circuit and falls through to the check below, which correctly returns true. That being said, I needed to reread it twice to figure out what was going on here, and I wrote the code yesterday. There has to be a clearer way to write this.

Similarly, we can probably improve the testing to ensure that the code is correct, even when we're not paying attention. I will need to look into what is worth the effort. (One day, I would love to use kani or creusot to prove the code correct, not that I have time for it now.)
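To spell out the fall-through being discussed, here is a minimal sketch of a short-circuiting contains over sorted segments. It assumes integer versions and segments stored as pairs of `std::ops::Bound`. The helper names are hypothetical and the code is not taken from the PR.

```rust
use std::ops::Bound::{self, Excluded, Included, Unbounded};

type Segment = (Bound<u32>, Bound<u32>);

/// True if `v` lies strictly before the start of `seg`.
fn below_lower_bound(v: u32, seg: &Segment) -> bool {
    match seg.0 {
        Included(start) => v < start,
        Excluded(start) => v <= start,
        // Nothing is below an unbounded start: do not short-circuit.
        Unbounded => false,
    }
}

/// True if `v` does not exceed the end of `seg`.
fn within_upper_bound(v: u32, seg: &Segment) -> bool {
    match seg.1 {
        Included(end) => v <= end,
        Excluded(end) => v < end,
        Unbounded => true,
    }
}

/// Segments are assumed sorted and non-overlapping.
fn contains(segments: &[Segment], v: u32) -> bool {
    for seg in segments {
        if below_lower_bound(v, seg) {
            // Later segments start even higher, so `v` cannot be in any of them.
            return false;
        }
        if within_upper_bound(v, seg) {
            return true;
        }
        // Otherwise `v` is above this segment: try the next one.
    }
    false
}
```

For a first segment of (Unbounded, Unbounded), the lower-bound check returns false, so the function does not bail out early and the upper-bound check then returns true.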

Eh2406 · 2023-11-24T18:46:03Z

I tinkered with the code to make it more "obviously correct". While I was at it I noticed that std provides some helpful methods which simplified a few things.

I looked at our test code generation, and I'm pretty confident it will cover all the corner cases.

mpizenberg · 2023-11-24T22:08:00Z

src/range.rs

-                (Excluded(start), Included(end)) => v > start && v <= end,
-                (Excluded(start), Excluded(end)) => v > start && v < end,
-            } {
+            if !within_lower_bound(segment, v) {


Ah ok, I'm getting it now. The naming sounds a bit weird? "within bounds" makes sense, but "within lower bound" feels odd. I'd say something like "above lower bound", "below upper bound". Or if we want to avoid the negation, use "below" in both cases, like this:

if version_below_lower_bound(v, segment) {
    return false;
} else if version_below_upper_bound(v, segment) {
    return true;
}

PS, there is a typo in within_uppern_bound with an "n" at the end of "upper".

Or alternatively, a method on the bounded range maybe:

if segment.is_above(v) {
    return false;
} else if !segment.is_below(v) {
    return true;
}

I'm not a super fan of the not ! in there though.

EDIT: forget that, we don't have a type for the segment

How about a within_bounds that returns a https://doc.rust-lang.org/std/cmp/enum.Ordering.html ?

So taking Ordering as a slightly richer Bool. It's tempting. In that case it's important to have the version argument before the segment argument (within_bounds(v, segment)) because ordering can be confusing.
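A minimal sketch of the Ordering idea, with the version argument placed before the segment as suggested. It assumes integer versions and segments as pairs of `std::ops::Bound`. The signature of `within_bounds` here is a guess at what was meant, not the code that was merged.

```rust
use std::cmp::Ordering;
use std::ops::Bound::{self, Excluded, Included, Unbounded};

/// Position of `v` relative to `segment`:
/// `Less` if below it, `Equal` if inside it, `Greater` if above it.
fn within_bounds(v: u32, segment: &(Bound<u32>, Bound<u32>)) -> Ordering {
    let below = match segment.0 {
        Included(start) => v < start,
        Excluded(start) => v <= start,
        Unbounded => false,
    };
    if below {
        return Ordering::Less;
    }
    let inside = match segment.1 {
        Included(end) => v <= end,
        Excluded(end) => v < end,
        Unbounded => true,
    };
    if inside {
        Ordering::Equal
    } else {
        Ordering::Greater
    }
}

/// Segments are assumed sorted and non-overlapping.
fn contains(segments: &[(Bound<u32>, Bound<u32>)], v: u32) -> bool {
    for segment in segments {
        match within_bounds(v, segment) {
            Ordering::Less => return false, // every later segment is higher still
            Ordering::Equal => return true,
            Ordering::Greater => {} // keep scanning
        }
    }
    false
}
```

Reading the result as "v compared to segment" is why the argument order matters: `Less` means the version is below the segment, not the other way around.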

mpizenberg · 2023-11-26T00:51:40Z

What do you think of splitting the simplify function up a bit, to make it easier to follow? I'm thinking of something like this (pseudo code):

// return the segment index in the range for each version in the range, None otherwise
locate_versions(&self, versions: Iter<V>) -> Iter<Option<usize>>

// group adjacent versions locations
// [None, 3, 6, 7, None] -> [(3, 7)]
// [3, 6, 7, None] -> [(None, 7)]
// [3, 6, 7] -> [(None, None)]
// [None, 1, 4, 7, None, None, None, 8, None, 9] -> [(1, 7), (8, 8), (9, None)]
group_adjacent_locations(locations: Iter<Option<usize>>) -> Iter<(Option<usize>, Option<usize>)>

// simplify range with segments at given location bounds.
keep_segments(&self, kept_segments: Iter<(Option<usize>, Option<usize>)>) -> Self

simplify(&self, versions: Iter<V>) -> Self {
  let version_locations = self.locate_versions(versions);
  let kept_segments = group_adjacent_locations(version_locations);
  self.keep_segments(kept_segments)
}
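One possible reading of the keep_segments step in the pseudo code above, sketched under assumptions: each (start, end) pair names the first and last segment index of a group, the merged segment takes its lower bound from the first and its upper bound from the last, and `None` means unbounded on that side. The `Segment` alias and the free-function form are hypothetical.

```rust
use std::ops::Bound::{self, Unbounded};

type Segment = (Bound<u32>, Bound<u32>);

/// Build a simplified list of segments: one merged segment per (start, end)
/// pair of segment indices. `None` on either side means "unbounded".
fn keep_segments(
    segments: &[Segment],
    kept: impl IntoIterator<Item = (Option<usize>, Option<usize>)>,
) -> Vec<Segment> {
    kept.into_iter()
        .map(|(start, end)| {
            // Lower bound comes from the first kept segment of the group.
            let lower = start.map_or(Unbounded, |i| segments[i].0);
            // Upper bound comes from the last kept segment of the group.
            let upper = end.map_or(Unbounded, |i| segments[i].1);
            (lower, upper)
        })
        .collect()
}
```

Under this reading, the segments 1 | 2 | 3 with the group (0, 2) collapse to >=1, <=3, and the group (None, None) collapses to the full range.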

Eh2406 · 2023-11-27T01:01:57Z

group_adjacent_locations was harder than I was expecting. It's hard to build out of normal iterator methods, because it may need to return one more result after the underlying iterator has been exhausted. It would be straightforward to implement using generators, if they were stable. But eventually I made it work.
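For illustration, the "one more result after the underlying iterator has been exhausted" problem can be handled with a hand-written iterator adapter that remembers whether it has finished. This is a sketch following the grouping examples in the pseudo code above, not the version in the PR. The `GroupAdjacent` struct and `group_adjacent` function are hypothetical names.

```rust
/// Lazily groups runs of adjacent `Some(index)` items into (start, end) pairs.
/// A run touching the very start or the very end of the input gets `None`
/// on that side.
struct GroupAdjacent<I> {
    inner: I,
    at_start: bool, // no item has been consumed yet
    done: bool,     // the inner iterator has returned `None`
}

impl<I: Iterator<Item = Option<usize>>> Iterator for GroupAdjacent<I> {
    type Item = (Option<usize>, Option<usize>);

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        // (start bound, last index seen) of the run currently being built.
        let mut run: Option<(Option<usize>, usize)> = None;
        loop {
            match (self.inner.next(), run) {
                (Some(Some(idx)), None) => {
                    let start = if self.at_start { None } else { Some(idx) };
                    run = Some((start, idx));
                }
                (Some(Some(idx)), Some((start, _))) => run = Some((start, idx)),
                // A gap closes the current run.
                (Some(None), Some((start, last))) => return Some((start, Some(last))),
                (Some(None), None) => {}
                // Input exhausted: emit the open run, if any, as the extra result.
                (None, open_run) => {
                    self.done = true;
                    return open_run.map(|(start, _)| (start, None));
                }
            }
            self.at_start = false;
        }
    }
}

fn group_adjacent<I: Iterator<Item = Option<usize>>>(inner: I) -> GroupAdjacent<I> {
    GroupAdjacent { inner, at_start: true, done: false }
}
```

The `done` flag is what lets the adapter yield a final group on the same call that sees the inner iterator end, and then stop cleanly on the next call.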

zanieb · 2023-11-27T16:02:15Z

I'll try to use this today!

mpizenberg · 2023-11-27T18:49:37Z

group_adjacent_locations was harder than I was expecting.

If you feel it's harder than it's worth, maybe it's better not to split group_adjacent_locations and keep_segments. I'm not finding more straightforward ways to write it either. Your call.

I don't see your previous version to compare complexity. I guess your force push overwrote it. I haven't followed the CI merging changes. Are PRs still squash-merged, or are they "normal" merges? If they are still squash-merged, there is no need to overwrite your commits while working on the PR. And if they aren't, well, it's slightly annoying to make sure a PR's whole history is clean. I don't want to derail this PR; I just thought it was relevant, as it makes reviewing the PR harder for me.

Eh2406 · 2023-11-27T19:33:55Z

I think they're worth keeping. I wish they had been easier to come up with or more direct code, but they do make things clear.

I did force push, and will try to avoid doing so while you're reviewing. I got into the habit because it simplifies the history, which is helpful when dealing with several long-lived branches. Hopefully those will become less common. If you scroll through the conversation here on GitHub, it provides links for every time I force push, like

the compare button can show you what changed between each force push.

src/range.rs

konstin · 2023-11-28T08:32:45Z

But needs to be done with extreme care. Basic set properties do not hold if simplify is added to the equation: A.intersection(A.negate()) == empty is true, but simplify(A).intersection(simplify(A.negate())) == empty is not.

Could you give an example where this doesn't hold? I tried but couldn't find one.

konstin · 2023-11-28T11:12:21Z

I made a POC for simplifying all terms: astral-sh/pubgrub@main...zanieb:pubgrub:know-thy-versions-rebase. The main problem is that this breaks accum_term.subset_of(&incompat_term), any ideas? I've just inserted the simplification at all the places where we regularly intersect, though it feels like we shouldn't need to build up all the intersections every time in the first place.

Co-authored-by: konsti <[email protected]>

zanieb · 2023-11-28T18:14:19Z

Hm so I gave this a try by hacking it into the reporter and testing a simple holes case.

b9b0e1d

Before:

Because there is no available version for bar and foo 1.0.0 depends on bar, foo 1.0.0 is forbidden.
And because there is no version of foo in <1.0.0 | >1.0.0, <2.0.0 | >2.0.0, <3.0.0 | >3.0.0, <4.0.0 | >4.0.0, foo <2.0.0 | >2.0.0, <3.0.0 | >3.0.0, <4.0.0 | >4.0.0 is forbidden. (1)

Because there is no available version for bar and foo 2.0.0 depends on bar, foo 2.0.0 is forbidden.
And because foo <2.0.0 | >2.0.0, <3.0.0 | >3.0.0, <4.0.0 | >4.0.0 is forbidden (1), foo <3.0.0 | >3.0.0, <4.0.0 | >4.0.0 is forbidden. (2)

Because there is no available version for bar and foo 3.0.0 depends on bar, foo 3.0.0 is forbidden.
And because foo <3.0.0 | >3.0.0, <4.0.0 | >4.0.0 is forbidden (2), foo <4.0.0 | >4.0.0 is forbidden. (3)

Because there is no available version for bar and foo 4.0.0 depends on bar, foo 4.0.0 is forbidden.
And because foo <4.0.0 | >4.0.0 is forbidden (3), foo * is forbidden.
And because root 1.0.0 depends on foo, root 1.0.0 is forbidden.

After:


Because there is no version of bar in ∅ and foo <=1.0.0 depends on bar ∅, foo 1.0.0 is forbidden.
And because there is no version of foo in ∅, foo <2.0.0 | >2.0.0, <3.0.0 | >3.0.0, <4.0.0 | >4.0.0 is forbidden. (1)

Because there is no version of bar in ∅ and foo 2.0.0 depends on bar ∅, foo 2.0.0 is forbidden.
And because foo <2.0.0 | >2.0.0, <3.0.0 | >3.0.0, <4.0.0 | >4.0.0 is forbidden (1), foo <3.0.0 | >3.0.0, <4.0.0 | >4.0.0 is forbidden. (2)

Because there is no version of bar in ∅ and foo 3.0.0 depends on bar ∅, foo 3.0.0 is forbidden.
And because foo <3.0.0 | >3.0.0, <4.0.0 | >4.0.0 is forbidden (2), foo <4.0.0 | >4.0.0 is forbidden. (3)

Because there is no version of bar in ∅ and foo >=4.0.0 depends on bar ∅, foo 4.0.0 is forbidden.
And because foo <4.0.0 | >4.0.0 is forbidden (3), foo * is forbidden.
And because root depends on foo, root 1.0.0 is forbidden.

While we can certainly do something to improve the null cases, I'm not seeing this as a drastic improvement. Perhaps I'm doing something wrong?

Eh2406 · 2023-11-28T18:41:16Z

But needs to be done with extreme care. Basic set properties do not hold if simplify is added to the equation: A.intersection(A.negate()) == empty is true, but simplify(A).intersection(simplify(A.negate())) == empty is not.

Could you give an example where this doesn't hold? I tried but couldn't find one.

You seem to be correct, this implementation of simplify does uphold this property. The point I was trying to make is defining what properties simplify must uphold and making sure that the rest of the algorithm doesn't rely on anything else requires some care.

I made a POC for simplifying all terms: astral-sh/pubgrub@main...zanieb:pubgrub:know-thy-versions-rebase. The main problem is that this breaks accum_term.subset_of(&incompat_term), any ideas? I've just inserted the simplification at all the places where we regularly intersect,

There are clearly some properties that the code is relying on that simplify does not uphold. What they are, I do not yet know. And I would like to separate/delay the conversation about how the algorithm can use simplify until after we merge making the freestanding method useful/available.

though it feels like we shouldn't need to build up all the intersections every time in the first place.

You may want to look at the "accumulated_intersection" and the "fewer_intersections" branches. I'd be happy to discuss other changes to the algorithm to reduce intersections, either on Zulip or in an issue.

Hm so I gave this a try by hacking it into the reporter and testing a simple holes case.

This is exactly where I was hoping this freestanding method would be useful. Let's see how much it helped...

While we can certainly do something to improve the null cases, I'm not seeing this as a drastic improvement. Perhaps I'm doing something wrong?

That is not as helpful as I was hoping :-( clearly simplify is not working as well as I'd like. Let me experiment with your branch and I will report back.

Eh2406 · 2023-11-28T19:11:11Z

Got it. Most of the error reporting is based on the terms, replacing terms: derived.terms.clone(), with

terms: derived
                    .terms
                    .iter()
                    .map(|(p, t)| (p.clone(), t.simplify(versions.get(&p).unwrap_or(&Vec::new()).into_iter())))
                    .collect(),

and adding a simplify method to term that forwards along. I get:

Because there is no version of bar in ∅ and foo <=1.0.0 depends on bar ∅, foo <=1.0.0 is forbidden.
And because there is no version of foo in ∅, foo <2.0.0 is forbidden. (1)

Because there is no version of bar in ∅ and foo 2.0.0 depends on bar ∅, foo 2.0.0 is forbidden.
And because foo <2.0.0 is forbidden (1), foo <3.0.0 is forbidden. (2)

Because there is no version of bar in ∅ and foo 3.0.0 depends on bar ∅, foo 3.0.0 is forbidden.
And because foo <3.0.0 is forbidden (2), foo <4.0.0 is forbidden. (3)

Because there is no version of bar in ∅ and foo >=4.0.0 depends on bar ∅, foo >=4.0.0 is forbidden.
And because foo <4.0.0 is forbidden (3), foo * is forbidden.
And because root depends on foo, root * is forbidden.
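The forwarding method mentioned above could look roughly like the following sketch. It assumes a term is either a positive or a negative wrapper around a version set, and it takes the set-level simplification as a closure so the example stays self-contained. The name `simplify_with` and its signature are hypothetical, not the method added on the branch.

```rust
/// A toy term: a version set that is either required or forbidden.
#[derive(Debug, Clone, PartialEq)]
enum Term<VS> {
    Positive(VS),
    Negative(VS),
}

impl<VS> Term<VS> {
    /// Apply a set-simplifying function to the wrapped version set,
    /// keeping the polarity of the term unchanged.
    fn simplify_with(&self, simplify_set: impl Fn(&VS) -> VS) -> Self {
        match self {
            Term::Positive(set) => Term::Positive(simplify_set(set)),
            Term::Negative(set) => Term::Negative(simplify_set(set)),
        }
    }
}
```

Forwarding to the inner set is reasonable for a negative term only as long as the simplified set selects the same available versions as the original, because then its complement does too.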

zanieb · 2023-11-28T19:34:19Z

@Eh2406 ah that makes a lot of sense! I'll explore the effect on more error messages then.

zanieb · 2023-11-28T19:46:01Z

With 53c9f6d we get better handling of the empty ranges

Because there is no available version for bar and foo <=1.0.0 depends on bar, foo <=1.0.0 is forbidden.
And because there is no available version for foo, foo <2.0.0 is forbidden. (1)

Because there is no available version for bar and foo 2.0.0 depends on bar, foo 2.0.0 is forbidden.
And because foo <2.0.0 is forbidden (1), foo <3.0.0 is forbidden. (2)

Because there is no available version for bar and foo 3.0.0 depends on bar, foo 3.0.0 is forbidden.
And because foo <3.0.0 is forbidden (2), foo <4.0.0 is forbidden. (3)

Because there is no available version for bar and foo >=4.0.0 depends on bar, foo >=4.0.0 is forbidden.
And because foo <4.0.0 is forbidden (3), foo * is forbidden.
And because root depends on foo, root * is forbidden.

There's a bit of a problem: "And because there is no available version for foo" should read "And because there is no available version for foo >1.0.0,<2.0.0", and "And because foo <2.0.0 is forbidden (1)" should read "And because foo <2.0.0 is forbidden (1) and there is no available version for foo >2.0.0,<3.0.0".

While attempting to use this simplification code I got an odd lifetime error with

```
let c = set.complement();
let s = c.simplify(versions);
s.complement()
```

By inlining locate_versions, the lifetimes could be simplified so that this code works.

Eh2406 · 2023-11-28T20:28:12Z

Right. This simplification code assumes that information about versions that don't exist is unneeded, which is not true when dealing with "NoVersions" or anything derived from them. Because of #155, everything in this example derives from a "NoVersions". I'm open to ideas on how to get us to a better place.

In the meantime, the open question in this PR is whether this code is useful and worth merging.

zanieb · 2023-11-28T22:13:17Z

I think this is a pretty clear path to better error messages. We can either continue tackling it piecewise by merging or devote a new branch to error messaging and merge the whole thing to dev at once.

Eh2406 · 2023-11-28T22:44:11Z

This project has been plagued with long-lived branches, so I'm biased toward merging as often as is acceptable.

Furthermore there are at least two independent ways to build on this PR, incorporating it in the algorithm and using it on the output, which could be done in parallel and may each take several attempts.

mpizenberg · 2023-11-28T22:55:33Z

That's a lot nicer error message @zanieb ! If this is useful as-is I'd agree with @Eh2406 that we can merge it.

src/range.rs

Co-authored-by: Zanie Blue <[email protected]>

Uses pubgrub-rs/pubgrub#156 to consolidate version ranges in error reports using the actual available versions for each package.

Alternative to astral-sh/pubgrub#8, which implements this behavior as a method in the `Reporter`; here it's implemented in our custom report formatter (#521) instead, which requires no upstream changes.

Requires astral-sh/pubgrub#11 to only retrieve the versions for packages that will be used in the report.

This is a work in progress. Some things to do:

- ~We may want to allow lazy retrieval of the version maps from the formatter~
- [x] We should probably create a separate error type for no solution instead of mixing them with other resolve errors
- ~We can probably do something smarter than creating vectors to hold the versions~
- [x] This degrades error messages when a single version is not available, we'll need to special case that
- [x] It seems safer to coerce the error type in `resolve` instead of `solve` if feasible

Eh2406 force-pushed the simplify branch from b116cf6 to 3a8e5bb Compare November 22, 2023 21:24

mpizenberg reviewed Nov 23, 2023

Eh2406 force-pushed the simplify branch from 3a8e5bb to c843929 Compare November 24, 2023 18:41

mpizenberg reviewed Nov 24, 2023

Eh2406 force-pushed the simplify branch from c843929 to 6f3eed3 Compare November 25, 2023 22:32

Eh2406 force-pushed the simplify branch from 6f3eed3 to 66a229e Compare November 27, 2023 00:59

Eh2406 force-pushed the simplify branch from 66a229e to c54cc97 Compare November 27, 2023 01:06

feat: add a simplify for error messages

5cd1dc9

Eh2406 force-pushed the simplify branch from c54cc97 to 5cd1dc9 Compare November 27, 2023 01:10

konstin mentioned this pull request Nov 28, 2023

Perf: Clone only valid segments astral-sh/pubgrub#6

Closed

konstin reviewed Nov 28, 2023

Fix broken links

8f7ef55

Co-authored-by: konsti <[email protected]>

zanieb reviewed Nov 29, 2023

zanieb reviewed Nov 29, 2023

zanieb approved these changes Nov 29, 2023

Eh2406 and others added 2 commits November 29, 2023 11:09

correct capitalization

581f66d

Co-authored-by: Zanie Blue <[email protected]>

improve comment

e518819

Co-authored-by: Zanie Blue <[email protected]>

Eh2406 added this pull request to the merge queue Nov 29, 2023

Merged via the queue into dev with commit 2b2d8d4 Nov 29, 2023
5 checks passed

Eh2406 deleted the simplify branch November 29, 2023 16:13

This was referenced Nov 29, 2023

Simplify version sets in error reports astral-sh/pubgrub#7

Closed

Simplify version sets in error reports astral-sh/pubgrub#8

Closed

Eh2406 mentioned this pull request Nov 30, 2023

When can the algorithm use simplify #162

Open

zanieb mentioned this pull request Dec 4, 2023

Use available versions to simplify unsat error reports astral-sh/uv#547

Merged

3 tasks

mpizenberg mentioned this pull request Dec 5, 2023

How to get the report formatting presented in the README #167

Open

Eh2406 mentioned this pull request Jan 31, 2024

Improve error messages mamba-org/resolvo#9

Open
