Bump tuple inference length cutoff from 16 to 32 #27398

andyferris · 2018-06-03T05:16:34Z

~~Bumps tupletype_len to 31~~ Removes tupletype_len entirely.
Bumps tuple_splat to 32
~~Similarly for the types Any16 -> Any32 and All16 -> All32.~~

For context, I feel this would be useful for working with heterogenously typed data as tuples and named tuples. In particular, for v0.7/v1.0, simple containers such as Vector{NamedTuple{...}} could be versatile, performant containers for tables and data (similarly for named tuples of arrays, etc), and at times the existing limits (where practically speaking its 14 elements being the biggest size that gives "full run-time speed") felt a bit limiting (e.g. a table with 15-30 columns doesn't seem particularly unreasonable, though for very large numbers I admit that switching to a more dynamic data structure might be preferable).

Incidentally, this might help with things like arrays with 15+ dimensions and so-on (#20163) (cc @Jutho). Having 30 dimensions as the maximum with fully-inferred code seems a somewhat reasonable cutoff number to me, giving ~1 billion elements for a 2x2x2x... sized array, as in #20163.

Of course, I'm more than a bit ignorant of what other impacts this may have internally for the compiler (compile time speed will obviously be slower in some situations) but I thought this might be worthwhile floating for inclusion in v0.7.

andyferris · 2018-06-03T12:10:57Z

Appveyer timed out, Travis is complaining about Sockets, and the other CIs passed.

We could also run nanosoldier, except it doesn't really measure the negative impacts (compile time) and I'm guessing the benchmark suite is likely to be insensitive to any positive impacts...

ararslan · 2018-06-03T17:34:01Z

Nanosoldier is broken right now anyway.

vtjnash · 2018-06-07T21:23:12Z

Similarly for the types Any16 -> Any32 and All16 -> All32.

This isn't necessarily similar – it's the cutoff for when we want to switch from the O(n^2) algorithm to the O(n) algorithm (although we have some compiler optimizations that – very slowly – can turn this particular O(n^2) algorithm back into the O(n) algorithm). If anything, it should probably be much smaller.

andyferris · 2018-06-12T10:58:38Z

OK, I tend to agree with Jameson, such changes should be at the very least be considered separately (I had originally figured the less independent "magic" parameters, the better).

This PR now merely tweaks the default inference parameters and leaves Any16 alone.

andyferris · 2018-06-14T11:45:01Z

Bump

martinholters · 2018-06-14T12:21:32Z

@nanosoldier runbenchmarks(ALL, vs=":master")

vtjnash · 2018-06-14T15:31:46Z

Fixes #22370

In that issue, I recommended removing limit_tuple_type, so this seems to be a cautious start in that direction.

ChrisRackauckas · 2018-06-14T16:05:26Z

I'm not sure it's a full fix for #22370 which asks for a local override. While raising the global max helps, one could still run into a case where you know you have an array of 40 dimensions and just want a macro to locally change the max.

nanosoldier · 2018-06-14T18:38:11Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

andyferris · 2018-06-14T23:14:28Z

Is that a typical amount of noise for nanosoldier? It seems unlikely to me that this change should affect many of the benchmarks... (I assume we mostly benchmark already inferrable code)

andyferris · 2018-06-14T23:19:17Z

In that issue, I recommended removing limit_tuple_type, so this seems to be a cautious start in that direction.

I'd also be happy if we simply removed the tuple limitations from inference, so long as this won't cause problems (but that shouldn't mean we can't merge this for now).

ChrisRackauckas · 2018-06-15T00:21:18Z

I think @KristofferC mentioned to me some adverse timing effects in unrelated benchmarks due to high tuple type limitations in Slack. I didn't understand why it would occur though (IIRC it was runtime and not compile time?), so maybe he has some details.

ChrisRackauckas · 2018-06-24T08:54:42Z

Bumping this. For reference, I received a PR which uses length 46 SVectors, and it's not a particularly bad idea in this case either.

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/pull/374/files/581bf0e28fac0f5490cfc5bdc484ad531546de2c..9833eb62f4c2b4b6eda6cca62182428611bd33d6#diff-6c62ad55534b8f93c40614616894d756R2

andyferris · 2018-06-24T12:13:35Z

@ChrisRackauckas Yes, I believe (hope) we will be getting something like this in soon. After brief discussions with Jameson, my understanding is that we might try to eliminate the tupletype_len parameter altogether, which I planned to attempt on this branch when I get a moment.

Out of curiousity, did you experience performance problems working with SVector{46, Int}?

ChrisRackauckas · 2018-06-24T14:26:37Z

Yes, I believe (hope) we will be getting something like this in soon. After brief discussions with Jameson, my understanding is that we might try to eliminate the tupletype_len parameter altogether, which I planned to attempt on this branch when I get a moment.

Thanks for the update!

Out of curiousity, did you experience performance problems working with SVector{46, Int}?

I didn't performance test that yet since right now we're just getting it working. It's an algorithm with 46 hardcoded constants though, so I hope we don't need an allocation to get around this issue.

KristofferC · 2018-06-24T14:37:36Z

Isn't this a cache that is not supposed to be created over and over in e.g. a hot loop? Creating a length 46 array take on the order for 50ns so why are you worried about the allocation? Did you profile to see that using an Array slows things down? It might be faster?

ChrisRackauckas · 2018-06-24T14:45:56Z

Isn't this a cache that is not supposed to be created over and over in e.g. a hot loop? Creating a length 46 array take on the order for 50ns so why are you worried about the allocation? Did you profile to see that using an Array slows things down? It might be faster?

It has nothing to do with that. Performance on the CPU with standard types etc. is probably fine with an array. But there are many reasons to want to write an algorithm with a loop-able list of more than 16 constants. In this particular application it has to do with attempts to compile to alternative targets like asm.js and .ptx kernels. Getting that stuff working is much easier when there are no arrays involved (and even if we can get it to work, in that case not having to create 4000 small GPU-based arrays to solve each small ODE in parallel is probably a good idea anyways)

andyferris · 2018-06-26T12:02:15Z

@vtjnash I've made a first attempt at erasing tupletype_len from existence, in the second commit. Is this now something like you imagined?

If not, we can try sneak just the first commit in v0.7 ASAP.

andyferris · 2018-06-26T12:29:38Z

@nanosoldier runbenchmarks(ALL, vs=":master")

nanosoldier · 2018-06-26T17:13:30Z

Something went wrong when running your job:

ProcessExitedException()

cc @ararslan

ararslan · 2018-06-26T17:52:05Z

Sorry, had to kill the job because the server needed to be restarted.

@nanosoldier runbenchmarks(ALL, vs=":master")

andyferris · 2018-06-26T21:04:12Z

CircleCI seems unique in complaining that

From worker 3:	Internal error: encountered unexpected error in runtime:
      From worker 3:	ErrorException("type Params has no field MAX_TUPLETYPE_LEN")

Not sure how I can not be seeing this locally or in other CI? I've restarted one of them...

vtjnash · 2018-06-26T21:21:29Z

Yes, this looks like what I had in mind

andyferris · 2018-06-27T00:48:53Z

OK - CircleCI was correctly picking up that @martinholters very recently merged some changes that depended on MAX_TUPLETYPE_LEN in #27434. @martinholters can you please check if the logic remains sound now I've ~~removed that bound, or if we need to~~ used the splat size parameter (MAX_TUPLE_SPLAT) to limit the loop here? Thanks. (EDIT: as I understand our plan to remove MAX_TUPLETYPE_LEN, we don't want to remove any limits on recursion, but we do want to treat tuples a bit more like structs where inference doesn't care specifically how many fields there are).

andyferris · 2018-06-27T03:56:11Z

@nanosoldier runbenchmarks(ALL, vs=":master")

KristofferC · 2018-06-27T06:08:21Z

@nanosoldier runbenchmarks(ALL, vs=":master")

martinholters · 2018-06-27T06:17:58Z

Yes, using MAX_TUPLE_SPLAT there sounds correct 👍

nanosoldier · 2018-06-27T06:20:44Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

martinholters · 2018-06-27T06:31:10Z

base/compiler/abstractinterpretation.jl

@@ -388,7 +387,7 @@ function abstract_iteration(@nospecialize(itertype), vtypes::VarTable, sv::Infer
    valtype = statetype = Bottom
    ret = Any[]
    stateordonet = widenconst(stateordonet)
-    while !(Nothing <: stateordonet) && length(ret) < sv.params.MAX_TUPLETYPE_LEN
+    while !(Nothing <: stateordonet) || length(ret) < sv.params.MAX_TUPLE_SPLAT


Why the change from && to ||? Doesn't that lead to an infinite loop for something like

struct Foo end Base.iterate(::Foo) = (0, nothing) Base.iterate(::Foo, state) = (0, (state,)) foo(x) = tuple(x...) @code_warntype foo(Foo())

?

Oops!!! 😧 I'll fix that

nanosoldier · 2018-06-27T11:49:14Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

JeffBezanson · 2018-06-27T15:24:31Z

This is not release-blocking.

andyferris · 2018-06-27T21:26:28Z

Sorry Jeff

andyferris · 2018-06-27T21:26:54Z

CI and Nansoldier seem good (Nanosoldier noise looks like last time?). Going to merge.

ChrisRackauckas · 2018-07-03T20:51:12Z

Why does the splat limit need to stay?

andyferris · 2018-07-03T22:02:38Z

I believe that it protects us from infinite recursion from badly written code, and the compiler never terminating.

I think this is because deciding whether a given piece of code is badly written is the same as the halting problem, which we can't solve. So we need some variety of simplistic, strong heuristic that guarantees compilation would not fail.

martinholters · 2018-07-04T14:25:51Z

MAX_TUPLE_SPLAT is used for two things. One is when splatting non-tuples (so renaming it might be a good idea), the other is for elision/inlining of the _apply call for tuple arguments. The former needs a hard limit to ensure termination (see #27398 (comment)), the latter ... I don't know.

GiggleLiu · 2020-12-15T18:49:45Z

For tuple size > 16, the spatting still allocates. According to this PR, it should have been fixed. Is this an regression?
My Julia version is 1.5.2

FYI: It causes problem to CUDA array manipulations, see issue
JuliaGPU/GPUArrays.jl#340 (comment)

mauro3 mentioned this pull request Jun 4, 2018

Type instability as soon as number of keyword arguments > 12 or optional args > 15 #27385

Closed

andyferris requested review from vtjnash and JeffBezanson June 6, 2018 03:23

andyferris force-pushed the ajf/infer-ntuple-32 branch from 457f701 to 92a375f Compare June 12, 2018 10:56

andyferris force-pushed the ajf/infer-ntuple-32 branch from 92a375f to 1b2493b Compare June 26, 2018 11:41

andyferris force-pushed the ajf/infer-ntuple-32 branch from ce031f3 to 3572d50 Compare June 27, 2018 00:42

andyferris added the compiler:inference Type inference label Jun 27, 2018

andyferris added this to the 0.7 milestone Jun 27, 2018

martinholters reviewed Jun 27, 2018

View reviewed changes

Andy Ferris added 2 commits June 27, 2018 18:44

Bump tuple inference length cutoff from 16 to 32

e7b2499

Remove tupletype_len heuristic

978c39f

andyferris force-pushed the ajf/infer-ntuple-32 branch from 1931d9d to 978c39f Compare June 27, 2018 08:44

JeffBezanson removed this from the 0.7 milestone Jun 27, 2018

andyferris merged commit fed0d5d into master Jun 27, 2018

StefanKarpinski deleted the ajf/infer-ntuple-32 branch June 27, 2018 21:30

ChrisRackauckas mentioned this pull request Jun 30, 2018

Locally change or ignore inference limits #22370

Closed

tkoolen mentioned this pull request Jul 4, 2018

Fix #388, hcat allocations with Julia master JuliaArrays/StaticArrays.jl#461

Merged

ChrisRackauckas mentioned this pull request Jul 15, 2018

WIP: Implement the ROCK2 method SciML/OrdinaryDiffEq.jl#374

Merged

ExpandingMan mentioned this pull request Aug 25, 2018

compiler performance regression with long tuples #28890

Closed

ChrisRackauckas mentioned this pull request Sep 24, 2018

Change README rule of thumb JuliaArrays/StaticArrays.jl#506

Open

ChrisRackauckas mentioned this pull request Nov 14, 2018

Hessian of 4x4 matrix multiplication using StaticArrays brings Julia to its knees JuliaDiff/ForwardDiff.jl#266

Closed

GiggleLiu mentioned this pull request Nov 9, 2020

[WIP] fix permutedims JuliaGPU/GPUArrays.jl#334

Closed

GiggleLiu mentioned this pull request Dec 14, 2020

Cartesian indexing issue for high dimensional arrays JuliaGPU/GPUArrays.jl#340

Closed

kshyatt mentioned this pull request Apr 13, 2021

Allow use of 32 element tuples without dynamic allocation #40468

Merged

Bump tuple inference length cutoff from 16 to 32 #27398

Bump tuple inference length cutoff from 16 to 32 #27398

Conversation

andyferris commented Jun 3, 2018 • edited Loading

andyferris commented Jun 3, 2018

ararslan commented Jun 3, 2018

vtjnash commented Jun 7, 2018

andyferris commented Jun 12, 2018

andyferris commented Jun 14, 2018

martinholters commented Jun 14, 2018

vtjnash commented Jun 14, 2018

ChrisRackauckas commented Jun 14, 2018

nanosoldier commented Jun 14, 2018

andyferris commented Jun 14, 2018 • edited Loading

andyferris commented Jun 14, 2018

ChrisRackauckas commented Jun 15, 2018

ChrisRackauckas commented Jun 24, 2018

andyferris commented Jun 24, 2018

ChrisRackauckas commented Jun 24, 2018

KristofferC commented Jun 24, 2018

ChrisRackauckas commented Jun 24, 2018

andyferris commented Jun 26, 2018

andyferris commented Jun 26, 2018

nanosoldier commented Jun 26, 2018

ararslan commented Jun 26, 2018

andyferris commented Jun 26, 2018 • edited Loading

vtjnash commented Jun 26, 2018

andyferris commented Jun 27, 2018 • edited Loading

andyferris commented Jun 27, 2018

KristofferC commented Jun 27, 2018

martinholters commented Jun 27, 2018

nanosoldier commented Jun 27, 2018

martinholters Jun 27, 2018

Choose a reason for hiding this comment

andyferris Jun 27, 2018

Choose a reason for hiding this comment

nanosoldier commented Jun 27, 2018

JeffBezanson commented Jun 27, 2018

andyferris commented Jun 27, 2018

andyferris commented Jun 27, 2018

ChrisRackauckas commented Jul 3, 2018

andyferris commented Jul 3, 2018

martinholters commented Jul 4, 2018

GiggleLiu commented Dec 15, 2020 • edited Loading

andyferris commented Jun 3, 2018 •

edited

Loading

andyferris commented Jun 14, 2018 •

edited

Loading

andyferris commented Jun 26, 2018 •

edited

Loading

andyferris commented Jun 27, 2018 •

edited

Loading

GiggleLiu commented Dec 15, 2020 •

edited

Loading