Remove inline annotations from broadcast kernels #35675
base: master
Conversation
@nanosoldier

On master:

On this PR:
```diff
@@ -1102,15 +1102,15 @@ struct BitMaskedBitArray{N,M}
     mask::BitArray{M}
     BitMaskedBitArray{N,M}(parent, mask) where {N,M} = new(parent, mask)
 end
-@inline function BitMaskedBitArray(parent::BitArray{N}, mask::BitArray{M}) where {N,M}
+function BitMaskedBitArray(parent::BitArray{N}, mask::BitArray{M}) where {N,M}
```
```diff
-function BitMaskedBitArray(parent::BitArray{N}, mask::BitArray{M}) where {N,M}
+@inline function BitMaskedBitArray(parent::BitArray{N}, mask::BitArray{M}) where {N,M}
```

This one was overzealous — we should keep the force `@inline` here for the `@boundscheck`. It currently still inlines, but I think it's good practice to keep this here in case something else in here gets big enough to lose the auto-inline.
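As context for why the force `@inline` matters for `@boundscheck`: in Julia, a callee's `@boundscheck` block can only be elided by a caller's `@inbounds` when the callee is inlined into that caller. A minimal sketch of that behavior (the names `getval` and `sum_all` are hypothetical, not from this PR):

```julia
# Sketch: `@boundscheck` elision depends on inlining. The check inside
# `getval` is removed by the caller's `@inbounds` only because `getval`
# is force-inlined into `sum_all`.
@inline function getval(v::Vector{Int}, i::Int)
    @boundscheck checkbounds(v, i)
    return @inbounds v[i]
end

function sum_all(v::Vector{Int})
    s = 0
    for i in eachindex(v)
        @inbounds s += getval(v, i)  # `checkbounds` elided via inlining
    end
    return s
end

sum_all([1, 2, 3])  # 6
```

If `getval` ever grew past the auto-inlining threshold and lost its `@inline`, the `@boundscheck` body would run on every iteration despite the `@inbounds`.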
Beautiful!

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

Very impressive gain!

Making the inlining cost model more accurate by making reality match the model FTW.

Well that turned out exceptionally well! I wasn't expecting any run-time performance gains but was fearful of losses. Looks like there are just a handful of extra allocations in a few cases... but it's in exchange for a 20%+ compile time speedup in the unit tests. I'll see if anything can be done about those allocations, but I think the fact that this will allow for future runtime improvements (like #30973) will make the broadcast superusers more than pleased.
Ok, the regressions are limited to broadcast expressions that use literal pows (like …).
Maybe …
The main confusion with that might be that it's not really a …
Oh interesting. I reviewed all the issues/discourse threads I could find to look for other alternatives, but the above is just about the universe of what I found (…).
What about using a lazy …

```julia
julia> show(reshape([1], ()))
fill(1)
```

It actually fits linguistically, too — you can imagine the …
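For reference, a zero-dimensional array already behaves like a scalar under broadcasting while still being a container, which is why the `fill(1)` spelling reads well here (a sketch of stock Julia behavior, not of this PR's changes):

```julia
x = fill(2)        # 0-dimensional Array{Int,0}
size(x)            # ()
x[]                # 2: indexed with zero indices
[1, 2, 3] .+ x     # [3, 4, 5]: participates in broadcast like a scalar
```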
Back to the issue at hand, I suppose this points towards another form of performance regression that this would introduce — it'd block constant propagation in the same vein that …
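To illustrate the literal-pow concern mentioned earlier (again, a sketch of stock Julia behavior, not of this PR's changes): `x .^ 2` lowers to a `Base.literal_pow` broadcast with the exponent encoded as `Val(2)`, so any wrapper that hides the exponent from the compiler loses that compile-time constant and falls back to the generic power path.

```julia
x = [1.0, 2.0, 3.0]
x .^ 2                            # [1.0, 4.0, 9.0] via the literal-pow fast path
# The line above lowers to (approximately):
Base.literal_pow.(^, x, Val(2))   # [1.0, 4.0, 9.0]
# A runtime exponent takes the generic `^` path instead:
p = 2
x .^ p
```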
Mostly, neither did …
Having …

```julia
x = get(dict, key)
if x !== nothing
    f(x[]) # nicer than `f(something(x))`?
else
    ...
end
```
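The `get(dict, key)` variant sketched above isn't in Base (two-argument `get` on a `Dict` is an error); a hypothetical version of it can be written with the existing `Some`/`nothing` machinery. The name `maybe_get` is made up here for illustration:

```julia
# Hypothetical helper: return `Some(value)` if the key exists, else `nothing`,
# so a stored value of `nothing` is distinguishable from a missing key.
maybe_get(d::AbstractDict, k) = haskey(d, k) ? Some(d[k]) : nothing

d = Dict("a" => nothing, "b" => 1)
maybe_get(d, "a")             # Some(nothing): key present, value is `nothing`
maybe_get(d, "c")             # nothing: key absent
something(maybe_get(d, "b"))  # 1: unwrap with `something`
```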
I just proposed somewhere else
But …
Force-pushed from 18b6944 to b688db5.
bump?
Force-pushed from b688db5 to 5b8bba6.
I've rebased, but in doing so I was reminded that the blocker here was performance of …
This removes the dependence on inlining for performance, so we also remove `@inline`, since it can harm performance.

make Some type a zero-dim broadcast container (e.g. a scalar)

Replaces JuliaLang#35778
Replaces JuliaLang#39184
Fixes JuliaLang#39151
Refs JuliaLang#35675
Refs JuliaLang#43200
In my cursory spot-tests, it appears as though we no longer need to force inlining the whole way through to the innermost loops of broadcast. This is huge, and in my naive understanding I think it'll greatly improve codegen times and sizes. It'll also allow for embedding more alternative loop designs as they no longer need to be inlined into the same function body — solving my reservations in #30973.
I've kept the preparatory `@inline`s for now, just removing them on the actual implementation.

Making this an early PR just to allow a Nanosoldier run.