-
Notifications
You must be signed in to change notification settings - Fork 790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extraneous JIT_TailCall generated #533
Comments
Can you provide a minimal, self-contained repro? |
I'm guessing this is kind of f# issue, but actually just a > 8-bytes sized Value Types on 64-bit JIT issue (and larger than that, 64 bit, pass-by-value calling convention decision for windows). Although F# does exacerbate the problem with the desire to put the IL tail instruction all over the place. Anyway, I should have, but haven't, gone through the underlying principals in worked examples, but the overview of this is described in these postings. Personally I always just turn off "tail call optimization" on projects where I used value types, as it's only really required if you have mutually recursive tail call functions. It doesn't stop the f# compiler from turning recursive defined functions into iterative solutions on a per funciton basis. I think this issue should results in an Attribute to selectively apply the tail instruction. Or alternatively, and possibly better, selectively opt-out of tail being applied. This way at least as an optimization step with profiler in hand, one could stop tail being applied to individual problem areas. At the moment, for any non-trivial use (which should always performance related, why use them otherwise?), value types > 64-bit are basically not viable with "generate tail calls" flag set. This caught me out badly when we move from 32-bit to 64-bit runtime, but now I just turn it off before I type one line of code (and I'm doing my best to helper remove it being a problem from FSharp.Core!). |
Yes, this is a 64-bit and/or RyuJIT CLR issue (exacerbated by F#'s "desire to put the IL tail instruction all over the place" as @manofstick puts it :) ) |
I have a case when mutually recursive functions generate 30% load for JIT_TailCall and 15% load for JIT_TailCallHelperStub_ReturnAddress. These functions are closed over method variables and class fields. When I turn off tail call generation, my performance increases exactly by 45%. I haven't profiled this snippet, but this is how my real code is structured:
.NET 4.6 and F# 4.0 don't help at all. I tried to rewrite this as methods, but got StackOverflowException. However, I do not understand why I am not getting SO when I run a very big number of iterations without tail call generation? Update
reduces JIT_TailCall and JIT_TailCallHelperStub_ReturnAddress overhead to 18% an 2% of execution time that is 2x faster, so the actual overhead decreased from 45% to 10% of the initial time. Still high, but not as dismal as in the first scenario. |
Your question doesn't really fit this issue, In your first case you have a legitimate need for the tail instruction being added, and building without tail calls may indeed blow your stack. Try your code in a 64 bit build and I think you will see improvement from a tail call perspective. Maybe. In the second case you have reduced your problem to a state where the f# compiler just converts the recursion into a loop, so your shouldn't have an problem. I'm surprised you see any profiling time in tail call method... And to be in good coding practice you should be trying to avoid mutables. Maybe it's just your cut down example, but, assuming my refactoring is correct, the following should be a "nicer"case:
|
@manofstick Thanks! This is not the first time I see a huge percentage if JIT_TailCall, but the first when it is critical. The issue is about this, I posted this only to confirm it. All tests were made under Release/x64. Mutable state hidden inside methods is one of the F# blessings, it is fast and often more readable, but does removing it reduces the JIT_TailCall overhead? From your posts above I guess mutable variables should mitigate the issue of passing structs > 8 bytes. Faster code is always a "nicer" one, other public things being equal. I would rewrite the loop as imperative one, but struggle to do so in my quite complex real code. You wrote that disabling tail call generation:
How to ensure that my method won't blow up with a SO exception at some point, even if I tested it on 100m iterations? |
I'll leave you with your mutable hidden state :-) (Oh, there can be some situations, but shouldn't be the first arrow in the quiver to reach for...) And I'm all for faster code, but I wouldn't says that always correlates highly with "nicer"... As for determining if the f# compiler have turned a tail recursive function into an iterative one, I suggest ILSpy. Anyway, the F# compiler can be strange. I had a tail recursive function that was acting on lists of a concrete type and in the Debug version it was converted into an iterative implementation, but then I converted the function to be generic on the list type, and it wasn't anymore. (In both cases the Release version "did the right thing".) Anyway, this is all getting a bit off topic, and should be moved to stackoverflow if you wish to continue. |
…64-bit value type optimization dotnet/fsharp#533 (comment) and microbench in Script1.fsx) TODO: profile and try to reach LINQ performance Series Add, #50000000, ops: 5990176, mem/item: 21 Series Read, #50000000, ops: 35161744, mem/item: 0 Series Read, #50000000, ops: 33534540, mem/item: 0 Series Read, #50000000, ops: 36982248, mem/item: 0 Series Read, #50000000, ops: 36982248, mem/item: 0 Series Read, #50000000, ops: 37792894, mem/item: 0 Series Add/divide Chained, #50000000, ops: 22182786, mem/item: 0 Series Add/divide Chained, #50000000, ops: 21720243, mem/item: 0 Series Add/divide Chained, #50000000, ops: 21824530, mem/item: 0 Series Add/divide Chained, #50000000, ops: 22104332, mem/item: 0 Series Add/divide Chained, #50000000, ops: 22133687, mem/item: 0 Series Add/Delete Inline, #50000000, ops: 24366471, mem/item: 0 Series Add/Delete Inline, #50000000, ops: 24777006, mem/item: 0 Series Add/Delete Inline, #50000000, ops: 24801587, mem/item: 0 Series Add/Delete Inline, #50000000, ops: 24003840, mem/item: 0 Series Add/Delete Inline, #50000000, ops: 24630541, mem/item: 0 LINQ Add/divide Chained, #50000000, ops: 26385224, mem/item: 0 LINQ Add/divide Chained, #50000000, ops: 26219192, mem/item: 0 LINQ Add/divide Chained, #50000000, ops: 25960539, mem/item: 0 LINQ Add/divide Chained, #50000000, ops: 26399155, mem/item: 0 LINQ Add/divide Chained, #50000000, ops: 26867275, mem/item: 0 LINQ Add/Delete Inline, #50000000, ops: 27292576, mem/item: 0 LINQ Add/Delete Inline, #50000000, ops: 26553372, mem/item: 0 LINQ Add/Delete Inline, #50000000, ops: 27901785, mem/item: 0 LINQ Add/Delete Inline, #50000000, ops: 26780931, mem/item: 0 LINQ Add/Delete Inline, #50000000, ops: 26695141, mem/item: 0 Streams Add/divide Chained, #50000000, ops: 17774617, mem/item: 0 Streams Add/divide Chained, #50000000, ops: 18491124, mem/item: 0 Streams Add/divide Chained, #50000000, ops: 18867924, mem/item: 0 Streams Add/divide Chained, #50000000, ops: 18953752, mem/item: 0 Streams Add/divide Chained, #50000000, ops: 18518518, mem/item: 0 Streams Add/divide Inline, #50000000, ops: 19470404, mem/item: 0 Streams Add/divide Inline, #50000000, ops: 19833399, mem/item: 0 Streams Add/divide Inline, #50000000, ops: 19149751, mem/item: 0 Streams Add/divide Inline, #50000000, ops: 18712574, mem/item: 0 Streams Add/divide Inline, #50000000, ops: 19319938, mem/item: 0 ----------------
Originally posted this issue at FSharp.Compiler.Services as: fsharp/fsharp-compiler-docs#380
I have posted the following head-scratcher on stackoverflow: http://stackoverflow.com/questions/31433605/how-to-eliminate-time-spent-in-jit-tailcall-for-functions-that-are-genuinely-non
Summary: I am calling a .NET assembly (F# 4.0) from a dynamically generated assembly (FSCS 3.1) and it is passing a custom value type (struct) over the assembly boundary. This causes a set of unnecessary calls to JIT_TailCall that degrade performance materially.
Direction: as raised by @dsyme is this an F# issue or a core CLR issue? In the latter case I will move this issue to .NET core CLR. The only "evidence" of it being F# related are (1) that I am using F# when it occurs and (2) it is tail-call related - so evidence not conclusive.
The text was updated successfully, but these errors were encountered: