-
Notifications
You must be signed in to change notification settings - Fork 790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CompilerPerf] Faster equality in generic contexts #5112
Conversation
Very, very interesting. I rely heavily on Would this improve such situations as well, or would existing code benefit if it also implemented the generic |
This is getting close to the territory we would accept, if backed up by perf results Here are some current failures:
|
If you are create a sealed type with |
Sorry, WIP, so I will bash my way through failures as I get time... Oh, currently this is just equality (GetHashCode/Equals + associated operators) but I plan on getting to comparison (CompareTo + associated operators). Should that be done as a seperate PR? I think there is value in that, shrinking the surface area of an individual change? |
If the technique is the same then do it in the same PR |
Hahahah! I guess I have been a little... ummm... adventurous (?) with my PRs. Anyway, I always kind of intended them to be group efforts and really I see myself throwing out ideas as much as anything... But hey, after 3 years I guess this just isn't they way that this open source stuff works. Get an idea and polish it off yourself or be damned :-) (obviously a bit tongue in cheek - I didn't have a number of helpers on the Seq work...)
Let's drag up some of the old test from the previous PRs... (micro benchmarks, blah, blah, blah - I haven't run these extensively, numbers jump around, looking for overall trend... at the end I realized I shouldn't have had as many significant figures in the %s as really everything is +/- a few %... But anyway...) In #574 we are linking to this gist:
In #549 where I've copied the code to this gist:
In #930 where I've copied the code to this gist:
In #513 we're I've copied the code to this gist:
More in #513 where I've copied the code to this gist:
More in #513 where I've copied the code to this gist:
|
(seems to be a build server issue, and builds/tests on my machine, and the build before the failure had a 120 minute timeout and thus was incomplete... that wasn't this PR, but something to do with Span...) Anyway, I'd be happy if I could leave this here for the moment. i.e. I have intentions of doing the comparison side of things, but this stands on its own, and I don't think I currently have the time to do the work (from my, admittedly old, memory I did an initial phase of refactoring out the exception throwing throwing behaviour and I think a bit more refactoring before being able to effectively do the same process as here) So I'm changing the title of this PR to reflect that it is just equality... and also that it isn't a WIP anymore... |
|
||
.line 16707566,16707566 : 0,0 '' | ||
IL_000a: ldarg.0 | ||
IL_000b: ldloc.0 | ||
IL_000c: tail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not totally comfortable with these tailcalls disappearing - have we discussed this? Thanks
@manofstick I see we talked about the tailcall removal in the context of #513: I added this issue here to track it #946. My specific reply was here: #946 (comment)
Still, I wonder if we should make the tailcall removal a separate PR? In case we needed to revert it at a later point? Thanks! |
@dotnet-bot test this please |
Hmm. Removing the tail call avoiding code does smash performance, in many cases making things worse than they were. The 64-bit JIT now appears to do tail calls internally as what would of causes a stack overflow under 32-bit, doesn't under 64-bit. I create some example types here. Anyway, I then modified all calls in FSharp.Core to be of the form:
Which, albeit a bit ugly, does work. Thoughts? |
@manofstick Yes, tail calls prevent inlining - hopefully will be revisited soon: https://github.com/dotnet/coreclr/issues/18361#issuecomment-396060473 |
@manofstick I applaud your efforts in trying to optimize the big and the small! It's a dangerous game you play (might break things) but someone has to do it :) Btw in the referenced issue I actually saw performance drop when surpressing |
I would suggest leaving the For cases where performance might improve without
These are things we can revisit on .Net Core (see dotnet/coreclr#18406). Since we generally only see |
OK; had some good progress. Basically scrapped everything (well except the core idea) and re-implemented. So, to make everyone happy I have restored tail calls which now doesn't have as much of an impact on performance as I have managed to drop one level of call indirection in some cases - but looking with great anticipation for when @AndyAyersMS can implement some optimizations in the JIT! @mrange - yes, yes I am a sucker for punishment :-) But there is a fairly significant set of tests I put in when I working on #513, so hopefully have a reasonable degree of confidence... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very promising, will review more completely. Can you report latest performance results? thanks
src/fsharp/FSharp.Core/prim-types.fs
Outdated
member __.Equals (x,y) = System.String.Equals (x, y) | ||
member __.GetHashCode x = HashString x } | ||
|
||
| _, ty when ty.Equals typeof<decimal> -> box { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This is odd formatting for an object expression, we would always use
{ new X with
member ...
}
or
{ new X with
member ... }
If piping is defined at this point then the box can be like this:
{ new X with
member ... }
|> box
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think piping is available in prim_types but I'll see... I'll move the brace down anyway...
src/fsharp/FSharp.Core/prim-types.fs
Outdated
/// Implements generic equality between two values, with PER semantics for NaN (so equality on two NaN values returns false) | ||
// | ||
// The compiler optimizer is aware of this function (see use of generic_equality_per_inner_vref in opt.fs) | ||
// and devirtualizes calls to it based on "T". | ||
let GenericEqualityIntrinsic (x : 'T) (y : 'T) : bool = | ||
GenericEqualityObj false fsEqualityComparerNoHashingPER ((box x), (box y)) | ||
FSharpEqualityComparer_PER<'T>.Equals (x, y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm aware that this is now more expensive initially: we have the table lookup followed by an indirect call, before getting to GenericEqualityObj.
Am I correct that that's the fundamental tradeoff inherent to the PR? i.e. those costs are small compared to subsequent savings of not boxing (for primitive value types) and the advantages of using the default equality comparer (for other types)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a slight cost, but the cost of boxing (for value types) and type-based pattern matching (in GenericEqualityObj) are an order of magnitude higher (he says without actually running the numbers, but from general feel.)
Also remember that these functions are only kicking in when equality doesn't know the type at compile time, so it's not a hit all over the place.
src/fsharp/FSharp.Core/prim-types.fs
Outdated
let canUseDefaultEqualityComparer (ty:Type) = | ||
let processed = System.Collections.Generic.HashSet () | ||
|
||
let rec recurse idx (types:array<Type>) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use Type[]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really? I haven't been following any F# formatting recommendation discussion but find that a bit surprising. I really like the consistency of array
with list
and just generics in general. Anyway, if that's the way that it goes then happy enough to change it.
src/fsharp/FSharp.Core/prim-types.fs
Outdated
&& not (ty.Equals typeof<float32>) | ||
&& isValidGenericType true "System.Nullable`1" | ||
&& not (typeof<IStructuralEquatable>.IsAssignableFrom ty | ||
&& not (false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wonky formatting here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using true && ...
and false || ...
is a bit odd, just remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wonky formatting is what is aligned with - i.e what the not
, &&
or ||
corresponds to. I find it much easier to read than any other formatting - but happy to change it whatever you want.
The true
at the start of &&
and the false
at the start of ||
allow you to comment out any of the corresponding lines without changing any other lines - the compiler is smart enough to remove them - but if you don't like 'em I'll get rid of them.
src/fsharp/FSharp.Core/prim-types.fs
Outdated
&& not (ty.Equals typeof<float>) | ||
&& not (ty.Equals typeof<float32>) | ||
&& isValidGenericType true "System.Nullable`1" | ||
&& not (typeof<IStructuralEquatable>.IsAssignableFrom ty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I assume this means non-type-specialized hash/eq/compare on keys that are instances of F# structural record and union types (e.g. option values) are a little bit slower overall? But the F# compiler de-virtualizes these normally? thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not too sure what your saying, I can't use EqualityComparer.Default on records and union types because they have IStruturalEquality which has different ER/PER symantics:
type Boo = Boo of float
let nan = Boo System.Double.NaN
printfn "%b" (HashIdentity.Structural.Equals (nan, nan))
printfn "%b" (System.Collections.Generic.EqualityComparer.Default.Equals (nan, nan))
// false
// true
They are de-virtualized by the compiler under normal conditions - none of the things I'm changing should change generated code (which is borne out by no changed to the IL as part of the test suite)
src/fsharp/FSharp.Core/prim-types.fs
Outdated
&& isValidGenericType true "System.Nullable`1" | ||
&& not (typeof<IStructuralEquatable>.IsAssignableFrom ty | ||
&& not (false | ||
|| isValidGenericType false "System.ValueTuple`1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to special case ValueTuple
? There is no corresponding ValueTuple
case in GenericEqualityObj
?
Have you put this in because ValueTuple
is known to respect structural checking, i.e. if T1
and T2
support default comparison then ValueTuple<T1,T2>
? Would it make sense to add Result
and ValueOption
to this table too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was confused about whether this code applies to enum and value types. I see // covers enum and value types
but it seems it applies to reference types as well? Could you add more comments in this please?
Also the double negative not (..... isValidGenericType false "System.ValueTuple
1")` is really confusing and something feels wrong here, How can we have both
not (isValidGenericType false "System.ValueTuple`1" || .... )
and
isValidGenericType true "System.Nullable`1"
The recursive call to isValidGenericType
seems to be used in both a positive and negative sense in these different cases. Simplify/clarify the code please so it's really easy to think about, since this routine is really important to have correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValueTuples
have nice implementation of IEquatable<>.Equals
which utilizes EqualityComparer<'Tn>.Default
so if all the 'Tns
are safe (which we check recursively) then we get a really quick comparison. Unfortuantely standard tuple in its IEquatable<>.Equals
method uses EqualityComparer.Default on all Tns, so really doesn't help. I haven't looked at Result and ValueOption.
The second parameter in isValidGenericType is if the root type doesn't match (i.e. is the type Nullable? no, then return true so check the next &&
condition - or is the type a ValueTuple? no, then return false to check the next ||
condition...), otherwise the result of the function is true if all it's generic arguments are good or false if one of them fails.
The && not (false
followed by the ||
s can be read as unless ...
I did play around a bit with this whole logic. I'm not sure if there is a way to have it so that on first glance it reads well, but I think the combination of the wonky formatting :-) and maybe a couple of comments might help? I don't know, if you think you can make it much easier then feel free to mess around with it. I just think the first way that it clicks in your head is the easiest - which might be different for different people so not sure of the gain of changing it? But happy to do so.
But I will fix up some comments.
@manofstick Perhaps we should start looking at dotnet core jitter? Improve how it handle .tail calls to improve for F#? I am interested in doing something but is looking for an idea of limited scope to get started with. |
Updated all the perf tests. Lots of transcribing by hand as I haven't automated this so might be so copy and paste errors, etc. But think it should be pretty good. So worse, some better. The final test was a lot better. Should get even better if there is some work done to improve tail calls in the JIT! In #574 we are linking to this gist:
In #549 where I've copied the code to this gist:
In #930 where I've copied the code to this gist:
In #513 we're I've copied the code to this gist:
More in #513 where I've copied the code to this gist:
More in #513 where I've copied the code to this gist:
|
@mrange Yeah I think you might be right, but besides allowing inlining on methods with tail instructions I'm not sure what the path would be (and hey, if @AndyAyersMS is going to look after those then woo hoo!) The problem, as I understand it (which, believe me, is rather limited!), is that the calling convention under the 64-bit JIT is that anything wider than 64-bits is passed on the stack (where-as otherwise the first 4 (?) 64-bit or less arguments are just passed on registers), and its this spilling onto the stack which causes grief. There is some __vectorcall calling convention which allows wider things, but hey, I know as much about that as my insights into the meaning of life... So yeah, as always, willing to give anything a crack, but I'm (currently!) not standing on the shoulders of any giants... Giants. Damn Giants. Come over here... |
OK; few more tweeks, but think now I'm done. I made changes based on your comments as well as some additional code. The changes have resulted in more improvements across the board, so I'll redo the results soon - I have also added an extra test. Oh, and I think there was a problem with the build server, which is why some of the builds failed, so maybe after the weekend they could be trigger to start again... |
Added new test that checks comparison of
|
…iends (as they have special handling)
…ized GetHashCode)
…or EqualityComparer
…or EqualityComparer
…or EqualityComparer
…or EqualityComparer
…or EqualityComparer
@manofstick , @dsyme , @cartermp - what do we want to do with this PR? It looks like we got quite close, then it just lost momentum? Do we want to resurrect it? Thanks KEvin |
Rising from the ashes of #513 (3 years later!) comes a new implementation. The new implementation is much simpler, albeit with a smaller scope (I'm no longer attempting to handle Tuples, no longer wandering into forbidden code generation - even if it was only
MakeGenericType
)So what does this do? It increases performance of Equals, CompareTo, GetHashCode when used in a generic context. In a non-generic context the fsharp compiler inserts efficient code through a combination of inline IL, statically resolved type parameters and compiler optimizations (lots of magic).
So what is a generic context?
A type which has generated equality/comparison such as a record type:
or
So any of the containers, such as Map, Set, dict as well as other places in the standard library isn't handled by
inline
functions also has potential for improvement.