Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extraneous JIT_TailCall generated #533

Closed
cognociente opened this issue Jul 15, 2015 · 7 comments
Closed

Extraneous JIT_TailCall generated #533

cognociente opened this issue Jul 15, 2015 · 7 comments

Comments

@cognociente
Copy link

Originally posted this issue at FSharp.Compiler.Services as: fsharp/fsharp-compiler-docs#380

I have posted the following head-scratcher on stackoverflow: http://stackoverflow.com/questions/31433605/how-to-eliminate-time-spent-in-jit-tailcall-for-functions-that-are-genuinely-non

Summary: I am calling a .NET assembly (F# 4.0) from a dynamically generated assembly (FSCS 3.1) and it is passing a custom value type (struct) over the assembly boundary. This causes a set of unnecessary calls to JIT_TailCall that degrade performance materially.

Direction: as raised by @dsyme is this an F# issue or a core CLR issue? In the latter case I will move this issue to .NET core CLR. The only "evidence" of it being F# related are (1) that I am using F# when it occurs and (2) it is tail-call related - so evidence not conclusive.

@latkin
Copy link
Contributor

latkin commented Jul 15, 2015

Can you provide a minimal, self-contained repro?

@manofstick
Copy link
Contributor

I'm guessing this is kind of f# issue, but actually just a > 8-bytes sized Value Types on 64-bit JIT issue (and larger than that, 64 bit, pass-by-value calling convention decision for windows). Although F# does exacerbate the problem with the desire to put the IL tail instruction all over the place. Anyway, I should have, but haven't, gone through the underlying principals in worked examples, but the overview of this is described in these postings.

Personally I always just turn off "tail call optimization" on projects where I used value types, as it's only really required if you have mutually recursive tail call functions. It doesn't stop the f# compiler from turning recursive defined functions into iterative solutions on a per funciton basis.

I think this issue should results in an Attribute to selectively apply the tail instruction. Or alternatively, and possibly better, selectively opt-out of tail being applied. This way at least as an optimization step with profiler in hand, one could stop tail being applied to individual problem areas.

At the moment, for any non-trivial use (which should always performance related, why use them otherwise?), value types > 64-bit are basically not viable with "generate tail calls" flag set. This caught me out badly when we move from 32-bit to 64-bit runtime, but now I just turn it off before I type one line of code (and I'm doing my best to helper remove it being a problem from FSharp.Core!).

@dsyme
Copy link
Contributor

dsyme commented Aug 3, 2015

Yes, this is a 64-bit and/or RyuJIT CLR issue (exacerbated by F#'s "desire to put the IL tail instruction all over the place" as @manofstick puts it :) )

@latkin latkin added the question label Aug 4, 2015
@buybackoff
Copy link
Contributor

I have a case when mutually recursive functions generate 30% load for JIT_TailCall and 15% load for JIT_TailCallHelperStub_ReturnAddress. These functions are closed over method variables and class fields. When I turn off tail call generation, my performance increases exactly by 45%.

I haven't profiled this snippet, but this is how my real code is structured:

#time "on"
type MyRecType() = 

  let list = System.Collections.Generic.List()

  member this.DoWork() =
    let mutable tcs = new TaskCompletionSource<_>()
    let returnTask = tcs.Task 
    let mutable local = 1

    let rec outerLoop() =
      if local < 1000000 then
        innerLoop(1)
      else
        tcs.SetResult(local)
        ()

    and innerLoop(inc:int) =
      if local % 2 = 0 then
        local <- local + inc
        outerLoop()
      else
        list.Add(local) // just fake access to a field to illustrate the pattern
        local <- local + 1
        innerLoop(inc)

    outerLoop()

    returnTask


let instance = MyRecType()

instance.DoWork().Result

> Real: 00:00:00.019, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
> val it : int = 1000001

.NET 4.6 and F# 4.0 don't help at all.

I tried to rewrite this as methods, but got StackOverflowException. However, I do not understand why I am not getting SO when I run a very big number of iterations without tail call generation?

Update
Rewriting the method as:

  member this.DoWork2() =
    let mutable tcs = new TaskCompletionSource<_>()
    let returnTask = tcs.Task
    let mutable local = 1
    let rec loop(isOuter:bool, inc:int) =
      if isOuter then
        if local < 1000000 then
          loop(false,1)
        else
          tcs.SetResult(local)
          ()
      else
        if local % 2 = 0 then
          local <- local + inc
          loop(true,1)
        else
          list.Add(local) // just fake access to a field to illustrate the pattern
          local <- local + 1
          loop(false,1)

    loop(true,1)

    returnTask


> Real: 00:00:00.004, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
> val it : int = 1000001

reduces JIT_TailCall and JIT_TailCallHelperStub_ReturnAddress overhead to 18% an 2% of execution time that is 2x faster, so the actual overhead decreased from 45% to 10% of the initial time. Still high, but not as dismal as in the first scenario.

@manofstick
Copy link
Contributor

@buybackoff

Your question doesn't really fit this issue, In your first case you have a legitimate need for the tail instruction being added, and building without tail calls may indeed blow your stack. Try your code in a 64 bit build and I think you will see improvement from a tail call perspective. Maybe.

In the second case you have reduced your problem to a state where the f# compiler just converts the recursion into a loop, so your shouldn't have an problem. I'm surprised you see any profiling time in tail call method...

And to be in good coding practice you should be trying to avoid mutables. Maybe it's just your cut down example, but, assuming my refactoring is correct, the following should be a "nicer"case:

member this.DoWork3() =
  let rec loop isOuter counter inc =
    if isOuter then
      if counter < 10000000
        then loop false counter 1
        else counter
    elif counter % 2 = 0 then
        loop true (counter+inc) 1
    else
        list.Add counter // just fake access to a field to illustrate the pattern
        loop false (counter+1) 1

  let tcs = TaskCompletionSource ()
  tcs.SetResult (loop true 1 1)
  tcs.Task

@buybackoff
Copy link
Contributor

@manofstick Thanks! This is not the first time I see a huge percentage if JIT_TailCall, but the first when it is critical. The issue is about this, I posted this only to confirm it. All tests were made under Release/x64.

Mutable state hidden inside methods is one of the F# blessings, it is fast and often more readable, but does removing it reduces the JIT_TailCall overhead? From your posts above I guess mutable variables should mitigate the issue of passing structs > 8 bytes. Faster code is always a "nicer" one, other public things being equal. I would rewrite the loop as imperative one, but struggle to do so in my quite complex real code.

You wrote that disabling tail call generation:

... doesn't stop the f# compiler from turning recursive defined functions into iterative solutions on a per funciton basis.

How to ensure that my method won't blow up with a SO exception at some point, even if I tested it on 100m iterations?

@manofstick
Copy link
Contributor

@buybackoff

I'll leave you with your mutable hidden state :-) (Oh, there can be some situations, but shouldn't be the first arrow in the quiver to reach for...)

And I'm all for faster code, but I wouldn't says that always correlates highly with "nicer"...

As for determining if the f# compiler have turned a tail recursive function into an iterative one, I suggest ILSpy. Anyway, the F# compiler can be strange. I had a tail recursive function that was acting on lists of a concrete type and in the Debug version it was converted into an iterative implementation, but then I converted the function to be generic on the list type, and it wasn't anymore. (In both cases the Release version "did the right thing".)

Anyway, this is all getting a bit off topic, and should be moved to stackoverflow if you wish to continue.

buybackoff added a commit to Spreads/Spreads that referenced this issue Dec 20, 2015
…64-bit value type optimization dotnet/fsharp#533 (comment) and microbench in Script1.fsx)

TODO: profile and try to reach LINQ performance

Series Add, #50000000, ops: 5990176, mem/item: 21
Series Read, #50000000, ops: 35161744, mem/item: 0
Series Read, #50000000, ops: 33534540, mem/item: 0
Series Read, #50000000, ops: 36982248, mem/item: 0
Series Read, #50000000, ops: 36982248, mem/item: 0
Series Read, #50000000, ops: 37792894, mem/item: 0
Series Add/divide Chained, #50000000, ops: 22182786, mem/item: 0
Series Add/divide Chained, #50000000, ops: 21720243, mem/item: 0
Series Add/divide Chained, #50000000, ops: 21824530, mem/item: 0
Series Add/divide Chained, #50000000, ops: 22104332, mem/item: 0
Series Add/divide Chained, #50000000, ops: 22133687, mem/item: 0
Series Add/Delete Inline, #50000000, ops: 24366471, mem/item: 0
Series Add/Delete Inline, #50000000, ops: 24777006, mem/item: 0
Series Add/Delete Inline, #50000000, ops: 24801587, mem/item: 0
Series Add/Delete Inline, #50000000, ops: 24003840, mem/item: 0
Series Add/Delete Inline, #50000000, ops: 24630541, mem/item: 0
LINQ Add/divide Chained, #50000000, ops: 26385224, mem/item: 0
LINQ Add/divide Chained, #50000000, ops: 26219192, mem/item: 0
LINQ Add/divide Chained, #50000000, ops: 25960539, mem/item: 0
LINQ Add/divide Chained, #50000000, ops: 26399155, mem/item: 0
LINQ Add/divide Chained, #50000000, ops: 26867275, mem/item: 0
LINQ Add/Delete Inline, #50000000, ops: 27292576, mem/item: 0
LINQ Add/Delete Inline, #50000000, ops: 26553372, mem/item: 0
LINQ Add/Delete Inline, #50000000, ops: 27901785, mem/item: 0
LINQ Add/Delete Inline, #50000000, ops: 26780931, mem/item: 0
LINQ Add/Delete Inline, #50000000, ops: 26695141, mem/item: 0
Streams Add/divide Chained, #50000000, ops: 17774617, mem/item: 0
Streams Add/divide Chained, #50000000, ops: 18491124, mem/item: 0
Streams Add/divide Chained, #50000000, ops: 18867924, mem/item: 0
Streams Add/divide Chained, #50000000, ops: 18953752, mem/item: 0
Streams Add/divide Chained, #50000000, ops: 18518518, mem/item: 0
Streams Add/divide Inline, #50000000, ops: 19470404, mem/item: 0
Streams Add/divide Inline, #50000000, ops: 19833399, mem/item: 0
Streams Add/divide Inline, #50000000, ops: 19149751, mem/item: 0
Streams Add/divide Inline, #50000000, ops: 18712574, mem/item: 0
Streams Add/divide Inline, #50000000, ops: 19319938, mem/item: 0
----------------
@dsyme dsyme changed the title Will the issue I am having with generation of extraneous JIT_TailCall be solved in the upcoming release to F# 4.0? Extraneous JIT_TailCall generated Jul 18, 2016
@dsyme dsyme closed this as completed Dec 27, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants