Bump tuple inference length cutoff from 16 to 32 #27398
Conversation
AppVeyor timed out, and Travis is complaining about … We could also run nanosoldier, except it doesn't really measure the negative impacts (compile time), and I'm guessing the benchmark suite is likely to be insensitive to any positive impacts...
Nanosoldier is broken right now anyway.
This isn't necessarily similar – it's the cutoff for when we want to switch from the …
Force-pushed from 457f701 to 92a375f (Compare)
OK, I tend to agree with Jameson; such changes should at the very least be considered separately (I had originally figured the fewer independent "magic" parameters, the better). This PR now merely tweaks the default inference parameters and leaves …
Bump …
@nanosoldier
Fixes #22370. In that issue, I recommended removing …
I'm not sure it's a full fix for #22370, which asks for a local override. While raising the global max helps, one could still run into a case where you know you have an array of 40 dimensions and just want a macro to locally change the max.
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Is that a typical amount of noise for nanosoldier? It seems unlikely to me that this change should affect many of the benchmarks... (I assume we mostly benchmark already-inferrable code.)
I'd also be happy if we simply removed the tuple limitations from inference, so long as this won't cause problems (but that shouldn't mean we can't merge this for now).
I think @KristofferC mentioned to me on Slack some adverse timing effects in unrelated benchmarks due to high tuple type limits. I didn't understand why that would occur, though (IIRC it was runtime and not compile time?), so maybe he has some details.
Bumping this. For reference, I received a PR which uses length-46 SVectors, and it's not a particularly bad idea in this case either.
@ChrisRackauckas Yes, I believe (hope) we will be getting something like this in soon. After brief discussions with Jameson, my understanding is that we might try to eliminate the … Out of curiosity, did you experience performance problems working with …
Thanks for the update!
I didn't performance-test that yet, since right now we're just getting it working. It's an algorithm with 46 hardcoded constants, though, so I hope we don't need an allocation to get around this issue.
Isn't this a cache that is not supposed to be created over and over in, e.g., a hot loop? Creating a length-46 array takes on the order of 50 ns, so why are you worried about the allocation? Did you profile to see that using an Array slows things down? It might be faster.
It has nothing to do with that. Performance on the CPU with standard types etc. is probably fine with an array. But there are many reasons to want to write an algorithm with a loopable list of more than 16 constants. In this particular application it has to do with attempts to compile to alternative targets like asm.js and .ptx kernels. Getting that stuff working is much easier when there are no arrays involved (and even if we can get it to work, not having to create 4000 small GPU-based arrays to solve each small ODE in parallel is probably a good idea anyway).
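As a minimal sketch of the pattern being discussed (the coefficient values and names here are invented for illustration, not taken from the actual solver PR): keeping a fixed set of constants as a tuple rather than a `Vector` means the container is an isbits value with no heap allocation, which matters for targets like GPU kernels where array allocation is unavailable or costly.

```julia
# Hypothetical example: 46 hardcoded coefficients stored as a tuple.
# A tuple of isbits elements lives on the stack / in registers, so looping
# over it needs no heap allocation, unlike the equivalent Vector.
const COEFFS = ntuple(i -> i / 46, 46)

# Summing works the same way for either container; with the tuple, the
# length is part of the type, so inference knows it statically.
weighted_sum(cs) = sum(cs)

@assert isbits(COEFFS)
@assert weighted_sum(COEFFS) ≈ weighted_sum(collect(COEFFS))
```

Whether code operating on a 46-element tuple is fully inferred depends on the inference cutoffs this PR changes; below the cutoff, inference tracks each element's type exactly.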
Force-pushed from 92a375f to 1b2493b (Compare)
@vtjnash I've made a first attempt at erasing … If not, we can try to sneak just the first commit into v0.7 ASAP.
@nanosoldier
Sorry, had to kill the job because the server needed to be restarted. @nanosoldier
CircleCI seems unique in complaining that …
Not sure how I'm not seeing this locally or in other CI? I've restarted one of them...
Yes, this looks like what I had in mind.
Force-pushed from ce031f3 to 3572d50 (Compare)
OK - CircleCI was correctly picking up that @martinholters very recently merged some changes that depended on …
@nanosoldier runbenchmarks(ALL, vs=":master")
@nanosoldier
Yes, using …
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
@@ -388,7 +387,7 @@ function abstract_iteration(@nospecialize(itertype), vtypes::VarTable, sv::Infer
     valtype = statetype = Bottom
     ret = Any[]
     stateordonet = widenconst(stateordonet)
-    while !(Nothing <: stateordonet) && length(ret) < sv.params.MAX_TUPLETYPE_LEN
+    while !(Nothing <: stateordonet) || length(ret) < sv.params.MAX_TUPLE_SPLAT
Why the change from `&&` to `||`? Doesn't that lead to an infinite loop for something like

```julia
struct Foo end
Base.iterate(::Foo) = (0, nothing)
Base.iterate(::Foo, state) = (0, (state,))
foo(x) = tuple(x...)
@code_warntype foo(Foo())
```

?
Oops!!! 😧 I'll fix that
Force-pushed from 1931d9d to 978c39f (Compare)
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
This is not release-blocking.
Sorry, Jeff.
CI and Nanosoldier seem good (Nanosoldier noise looks like last time?). Going to merge.
Why does the splat limit need to stay?
I believe it protects us from infinite recursion on badly written code, which would leave the compiler never terminating. Deciding whether a given piece of code is badly written in this way is equivalent to the halting problem, which we can't solve in general, so we need some variety of simple, strong heuristic that guarantees compilation terminates.
For tuple sizes > 16, splatting still allocates. According to this PR, it should have been fixed. Is this a regression? FYI: it causes problems for CUDA array manipulations; see issue …
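A quick way to check the behavior being reported (this is a generic sketch, not the actual CUDA reproducer): splat tuples on either side of the old 16-element cutoff and compare. Whether the longer splat allocates depends on the `tuple_splat` inference parameter of the Julia version you run this on.

```julia
# Hypothetical repro sketch: splatting tuples below and above the old
# length-16 cutoff. The results are always correct; the question in the
# comment above is only whether the long case heap-allocates.
resplat(t) = (t..., 0)

t16 = ntuple(identity, 16)   # below the old cutoff
t40 = ntuple(identity, 40)   # well above it

@assert resplat(t16) === (t16..., 0)
@assert length(resplat(t40)) == 41
@assert resplat(t40)[end] == 0
```

On a build where splatting above the cutoff falls back to a dynamic path, `@allocated resplat(t40)` (after a warm-up call) would be nonzero; below the cutoff it is typically zero.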
- ~~Bumps `tupletype_len` to 31~~ Removes `tupletype_len` entirely.
- Bumps `tuple_splat` to 32.
- Similarly for the types `Any16` -> `Any32` and `All16` -> `All32`.

For context, I feel this would be useful for working with heterogeneously typed data as tuples and named tuples. In particular, for v0.7/v1.0, simple containers such as `Vector{NamedTuple{...}}` could be versatile, performant containers for tables and data (similarly for named tuples of arrays, etc), and at times the existing limits (where, practically speaking, 14 elements is the biggest size that gives "full run-time speed") felt a bit limiting (e.g. a table with 15-30 columns doesn't seem particularly unreasonable, though for very large numbers I admit that switching to a more dynamic data structure might be preferable).

Incidentally, this might help with things like arrays with 15+ dimensions and so on (#20163) (cc @Jutho). Having 30 dimensions as the maximum with fully-inferred code seems a somewhat reasonable cutoff number to me, giving ~1 billion elements for a 2x2x2x... sized array, as in #20163.

Of course, I'm more than a bit ignorant of what other impacts this may have internally for the compiler (compile-time speed will obviously be slower in some situations), but I thought this might be worthwhile floating for inclusion in v0.7.
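The `Vector{NamedTuple{...}}`-as-table pattern mentioned above can be sketched as follows (the column names and values here are made up for illustration):

```julia
# A homogeneous vector of named tuples acts as a simple row-oriented table.
# Each row is an isbits-or-nearly-isbits value whose field types are fully
# known to inference, so field access like r.x compiles to a direct load.
rows = [
    (name = "alpha", x = 1, y = 2.0),
    (name = "beta",  x = 3, y = 4.0),
]

# Column-wise reduction over a field:
total_x = sum(r.x for r in rows)

@assert eltype(rows) <: NamedTuple
@assert total_x == 4
```

With more than the old 14-16 usable fields per row, such code previously fell off the fully-inferred fast path, which is the limitation this PR relaxes.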