Consider returning an empty collection instead of an exception for MaxByAsync/MinByAsync #2166

MatthewSteeples · 2024-09-12T12:10:21Z

Feature request

Behavior change on an existing operator.

Which subcomponent library: Async.Ix

Which next library version: major (runtime behaviour breaking change)

Please describe the feature.

The versions of this method in the framework only throw exceptions for ValueTypes, instead preferring to return null for reference types. In the spirit of this, it may be worth returning the empty List from these methods instead of throwing an error. I'll probably put together a branch with this on later this week for consideration and our own internal use.

idg10 · 2024-09-12T14:35:04Z

The MinBy and MaxBy in Rx and Ix are, unfortunately, just not the same thing as the ones with those names in .NET. It's unfortunate that when .NET added operators with the same names (many years after Rx and Ix had them), they chose different semantics.

So when there are multiple items all of which end up being the 'maximum' (i.e., if you have a tie for first place) we return all of them, whereas the .NET runtime library just arbitrarily picks one and returns you that. Bearing in mind that these are, then, basically different operators (that just happen to have the same name), aligning the behaviours isn't really a goal for us, particularly since full alignment would mean removing functionality we have to today:

If we were going to make a breaking change with the goal of fully aligning our MinBy and MaxBy with the .NET runtime operators of the same names, we would also have to change the behaviour (and, where appropriate, the method signatures) to reflect the fact that we've adopted the .NET runtime approach in which these operators now return a single item instead of a collection.

But I don't think we want to remove this functionality. Existing Rx and Ix users relying on MinBy and MaxBy will expect the current behaviour, and downgrading it to match the "exactly one item returned" behaviour of the .NET runtime versions will break their code without providing any means by which they can unbreak it.

I don't think that's a price worth paying simply for making our MinBy and MaxBy more closely resemble their newer-and-different namesakes.

Another option, I suppose, would be to retain the existing multiple min/max behaviour but to change the empty collection behaviour. But if the sole motivation here is to make Ix look a bit more like the runtime libraries, this seems only to get half way there. It just makes them slightly more like their namesakes, but still fundamentally different in an important respect. Is that a useful change?

Remember the point of the Interactive Extensions was to add operators for IEnumerable<T> that are unavailable in the .NET runtime. Rx.NET implemented the standard operators but also a load more. And Ix's job was to backfill all of those extra operators to IEnumerable<T>.

So it's very much a non-goal of Ix to provide operators that look just like ones already in .NET. The only reason Ix defined MinBy and MaxBy at all is because at the time, there were no such operators in .NET.

I think if we were to change anything here, we'd want to ask how we should deal with the fact that since Rx and Ix defined the extended set of operators that went beyond standard LINQ, the .NET runtime has now added some new members, a few of which have naming incompatibilities with existing Rx/Ix operators. (E.g., they added Chunk, which does a subset of what our Buffer operator does. And of course, they've added these MinBy and MaxBy operators that are different from ours.)

There is an argument that we should add IAsyncEnumerable<T> implementations of MinBy and MaxBy that work just like the .NET runtime ones not to System.Interactive.Async but to System.Linq.Async, because it is the job of System.Linq.Async to look as much as possible like LINQ to Objects. But to do that requires us to work out what is the correct strategy for dealing with these naming inconsistencies that have emerged as a result of .NET Runtime adding new operators that conflict with naming choices previously made by Rx/Ix.

E.g., perhaps we should deprecate our existing MinBy and MaxBy, and add new methods that do exactly the same thing but with different names. And we could introduce new methods that do the same thing as these operators in .NET runtime.

But we'd need a plan for how this would avoid breaking existing code.

Remember, one of the issues we have to deal with that a lot of libraries don't is that because Rx is very widely used, it's common for projects to have several components all with references to different versions of Rx. These all get unified to the highest version requested, and as a result of this, we really can't just change the behaviour of existing operators.

It's not good enough for us to say "Well this is a major version increase, so we're allowed to have breaking changes." Consider the following setup:

MyApp uses Lib1 and Lib2
Lib1 uses System.Interactive.Async v5.0
Lib2 uses System.Interactive.Async v7.0

The application is going to use v7.0 of System.Interactive.Async. But the author of Lib1 didn't ask for that. They asked for v5.0. If we change the behaviour of MinBy and MaxBy in v7.0, but Lib1 was written to expect the old behaviour, this is going to cause incorrect behaviour at runtime.

The argument "Well it's a major version change, so you should expect this sort of thing" cuts no ice. Lib1 asked for v5.0. If a widely used library decides that it's entitled to make these kinds of breaking changes just because SemVer says it's OK, this effectively amounts to a restriction saying "No single application shall use a mixture of components that depend on different versions of Ix."

But since these are widely used libraries, application authors might not even know there's an issue. Their dependency on either Ix or Rx might be hidden beneath many layers of indirect dependencies.

So the bottom line is that we have to be a lot more careful. We can't simply interpret SemVer rules as a license to break anything whenever we like. In practice, we have to avoid breaking changes if at all possible, and in cases where they really are unavoidable, we have to introduce them painfully gradually, and in a way that a) gives people several years of warning that they are coming and b) provides ways to mitigate problems caused by the breaking changes.

That's asking rather a lot more than you were probably thinking of doing, so if you don't particularly want to come up with a 5 year plan for how this could be done, that would be an understandable response to what I've written. But I wanted to explain (hopefully before you put any work into this) why if you submit a change that simply changes the basic behaviour of an operator that has been around for over a decade, we are extremely unlikely to approve that PR.

MatthewSteeples · 2024-09-15T21:05:56Z

First of all Ian, thanks so much for your detailed reply. I agree with what you're saying pretty much in entirety.

Firstly, I believe that this library's method of returning a collection of the largest/smallest items is the correct way of going about the implementation, so I'm not suggesting changing that at all. We make use of Morelinq who subscribe to the same philosophy of returning a collection of items, although to deal with the conflicting .NET implementations they've recently chosen to rename theirs Maxima and Minima (indicating the plurals).

I suppose, rewinding my comments about compatibility with the framework, what it comes down to is that I believe that returning and empty collection is a lot more useful than throwing an exception for these methods. Having no items in your IAsyncEnumerable is most likely not an "exceptional" circumstance, and an empty collection signifying there are no Largest or Smallest values seems logically correct. The way this library works currently means that you have to catch the exception, check your IAsyncEnumerable before calling the method, or do some workaround involving DefaultIfEmpty (so that there's a null that can be passed around).

Based on your comments about the complexities of breaking changes in major versions, and the long term "goal" of aligning naming with the .NET Framework, my revised suggestion would be to create new methods MinimaByAsync/MaximaByAsync and obsolete the existing ones that return a collection but with a fairly lengthy/indefinite delay before removal. I'd be happy to add MinByAsync/MaxByAsync to System.Linq.Async to mirror the framework's implementations at the same time if that would be useful?

Regarding putting the work into it, we'll probably be writing some if not all of these methods anyway in some way shape or form so as to avoid the workarounds I mentioned earlier, and I'm all for sharing this kind of infrastructure code which is why I wanted to contribute it back somewhere.

MatthewSteeples added the [area] Ix label Sep 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider returning an empty collection instead of an exception for MaxByAsync/MinByAsync #2166

Consider returning an empty collection instead of an exception for MaxByAsync/MinByAsync #2166

MatthewSteeples commented Sep 12, 2024

idg10 commented Sep 12, 2024

MatthewSteeples commented Sep 15, 2024

Consider returning an empty collection instead of an exception for MaxByAsync/MinByAsync #2166

Consider returning an empty collection instead of an exception for MaxByAsync/MinByAsync #2166

Comments

MatthewSteeples commented Sep 12, 2024

Feature request

idg10 commented Sep 12, 2024

MatthewSteeples commented Sep 15, 2024