-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add a simplify
for error messages
#156
Conversation
cc @zanieb, I think this can be used to dramatically improve your error messages. cc @baszalmstra mamba-org/resolvo#2 (comment) I don't know if these are improvements you'd be interested in include in your copy of this type. A future possibility is figuring out when it is safe for PubGrub to ask for a range to be simplified during processing. This would be enormously helpful for #135. But needs to be done with extreme care. Basic set properties do not hold if |
It feels like a very good idea for error reporting where solver logic is not important anymore, and readability is much more useful. Also at this point the potential costs are likely negligible compared to the solving costs? |
src/range.rs
Outdated
(Excluded(start), Excluded(end)) => v > start && v < end, | ||
(Excluded(start), _) => v <= start, | ||
(Included(start), _) => v < start, | ||
(Unbounded, _) => false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this wrong if the very first segment is (Unbounded, Unbounded)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole short-circuit change feels shady ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The false
here means that it fails to short-circuit and falls through to the check below which correctly returns true
. That being said I need to reread it twice to figure out what was going on here, and I wrote the code yesterday. There has to be a clear way to write this.
Similarly, we can probably improve the testing to ensure that the code are correct, even when were not paying attention. I will need to look into what is worth the effort. (One day, I would love to use kani or creusot to prove the code correct, not that I have time for it now.)
I tinkered with the code to make it more "obviously correct". While I was at it I noticed that STD provide some helpful methods which simplified a few things. I looked at our test code generation, and I'm pretty confident it will cover all the corner cases. |
src/range.rs
Outdated
(Excluded(start), Included(end)) => v > start && v <= end, | ||
(Excluded(start), Excluded(end)) => v > start && v < end, | ||
} { | ||
if !within_lower_bound(segment, v) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, I I'm getting it now. The naming sounds a bit weird? "within bounds" makes sense, but "within lower bound" feels odd. I'd say something like "above lower bound", "below upper bound". Or if we want to avoid the negation, use "below" in both cases, like this:
if version_below_lower_bound( v, segment ) {
return false;
} else if version_below_upper_bound( v, segment ) {
return true;
}
PS, there is a typo in within_uppern_bound
with an "n" at the end of "upper".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or alternatively, a method on the bounded range maybe:
if segment.is_above(v) {
return false;
} else if !segment.is_below(v) {
return true;
}
I'm not a super fan of the not !
in there though.
EDIT: forget that, we don't have a type for the segment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about a within_bounds
that returns a https://doc.rust-lang.org/std/cmp/enum.Ordering.html ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So taking Ordering
as a slightly richer Bool
. It's tempting. In that case it's important to have the version argument before the segment argument (within_bounds(v, segment)
) because ordering can be confusing.
What do you think of splitting a bit the simplify function, to make it easier to follow? I'm thinking something like this (pseudo code): // return the segment index in the range for each version in the range, None otherwise
locate_versions(&self, versions: Iter<V>) -> Iter<Option<usize>>
// group adjacent versions locations
// [None, 3, 6, 7, None] -> [(3, 7)]
// [3, 6, 7, None] -> [(None, 7)]
// [3, 6, 7] -> [(None, None)]
// [None, 1, 4, 7, None, None, None, 8, None, 9] -> [(1, 7), (8, 8), (9, None)]
group_adjacent_locations(locations: Iter<Option<usize>>) -> Iter<(Option<usize>, Option<usize>)>
// simplify range with segments at given location bounds.
keep_segments(&self, kept_segments: Iter<(Option<usize>, Option<usize>)) -> Self
simplify(&self, versions: Iter<V>) -> Self {
let version_locations = self.locate_versions(versions);
let kept_segments = group_adjacent_locations(version_locations);
self.keep_segments(kept_segments)
} |
|
I'll try to use this today! |
If you feel it's harder than worth it maybe it's better to not split group_adjacent_locations and keep_segments. I'm not finding more straightforward ways to write it either. Your call. I don't see your previous version to compare complexity. I guess you forced push rewrote it. I haven't followed the CI merging changes. Are PRs still squashed merged? or are they "normal" merge? Because if they are still squashed merge there is no need to overwrite your commit while working on the PR. And if they aren't well it's slightly annoying to make sure all a PR history is clean. I don't want to derail this PR, I just thought it was relevant as it makes my reviewing of the PR harder. |
Could you give an example where this doesn't hold? I tried but couldn't find one. |
I made a POC for simplifying all terms: astral-sh/pubgrub@main...zanieb:pubgrub:know-thy-versions-rebase. The main problem is that this breaks |
Co-authored-by: konsti <[email protected]>
Hm so I gave this a try by hacking it into the reporter and testing a simple holes case. Before:
After:
While we can certainly do something to improve the null cases, I'm not seeing this as a drastic improvement. Perhaps I'm doing something wrong? |
You seem to be correct, this implementation of simplify does uphold this property. The point I was trying to make is defining what properties
There are clearly some properties that code is relying on that
You may want to look at the "accumulated_intersection" and the "fewer_intersections" branches. I be happy to discuss other changes to the algorithm to reduce intersections, either on zulip or in a issue.
This is exactly where I was hoping this freestanding method would be useful. Let's see how much it helped...
That is not as helpful as I was hoping :-( clearly |
Got it. Most of the error reporting is based on the
and adding a
|
@Eh2406 ah that makes a lot of sense! I'll explore the effect on more error messages then. |
With 53c9f6d we get better handling of the empty ranges
There's a bit of a problem because |
While attempting to use this simplification code I got an odd lifetime error with ``` let c = set.complement(); let s = c.simplify(versions); s.complement() ``` By in lining locate_versions the lifetimes could be simplified so that that code works
Right. This simplification code assumes that information about versions that don't exist is unneeded. Which is not true when dealing with "NoVertions", or anything derived from them. Because of #155, everything in this example derives from a "NoVertions". I'm open to ideas on how to get us to a better place. In the meantime, the open question in this PR is whether this code is useful and worth merging. |
I think this is a pretty clear path to better error messages. We can either continue tackling it piecewise by merging or devote a new branch to error messaging and merge the whole thing to |
This project has been plagued with long living branches, so I'm biased toward merging as often as is acceptable. Furthermore there are at least two independent ways to build on this PR, incorporating it in the algorithm and using it on the output, which could be done in parallel and may each take several attempts. |
Co-authored-by: Zanie Blue <[email protected]>
Co-authored-by: Zanie Blue <[email protected]>
Uses pubgrub-rs/pubgrub#156 to consolidate version ranges in error reports using the actual available versions for each package. Alternative to astral-sh/pubgrub#8 which implements this behavior as a method in the `Reporter` — here it's implemented in our custom report formatter (#521) instead which requires no upstream changes. Requires astral-sh/pubgrub#11 to only retrieve the versions for packages that will be used in the report. This is a work in progress. Some things to do: - ~We may want to allow lazy retrieval of the version maps from the formatter~ - [x] We should probably create a separate error type for no solution instead of mixing them with other resolve errors - ~We can probably do something smarter than creating vectors to hold the versions~ - [x] This degrades error messages when a single version is not available, we'll need to special case that - [x] It seems safer to coerce the error type in `resolve` instead of `solve` if feasible
Ranges generated by PubGrub often end up being verbose and pedantic about versions that do not matter. To take an example from #155
2 | 3 | 4 | 5
could be more concisely stated as>=2, <=5
given that it's not possible to have versions in between the integers. More generally it could be expressed as>=2
, if the list of available versions were taken into account.The logic for simplifying a VS given a complete set of versions feels like it should be simple. But it is trickier to implement than I expected. Especially if you want to guarantee
O(len(VS) + len(versions))
time andO(1)
allocations.While working through the logic I had to implement a check which versions match from a list in
O(len(VS) + len(versions))
time, which felt useful enough to be worth including in the API. In implementing that I noticed that our implementation ofcontains
, did not aggressively short-circuit. The short-circuiting implementation is just as easy to read, so I decided to include that as well.