Inverted marking of vector tuple type #15244

ArchRobison · 2016-02-25T20:45:18Z

This is PR is intended to generate a discussion about a solution to a problem that @eschnett and I were discussing. See here for start of discussion chain. The problem is how to get access to LLVM vector types from Julia. The intent of the PR is to get feedback on whether the inverted approach (5 below) is something we should take forward.

Some approaches to the general problem are:

Map homogeneous tuples to LLVM vector types automatically. This introduces layout problems, which though I've been able to surmount though type punning, but can cause performance problems since sometimes the vector type causes worse performance, as when operations on a tuple do not map to vector operations and require excessive extract/insert code.
Modify LLVM SLP to work on arrays as if they were vectors. This approach works up to a point, but requires type punning on loads/stores that starts to break down for vectors that are members of structs, such as in the GradientNumber example. If we could talk the LLVM community into adding cast operations or intrinsics to case between arrays and vectors, this approach would be my preference since it keeps use of SIMD instructions hidden. Though some users want explicit SIMD.
Introduce a special Julia type that maps to an LLVM vector. This requires teaching the Julia type system and users about the type (yuck!)
Use an immutable Julia type with a single tuple field. Mark the type in some way to indicate that the tuple should map to an LLVM vector. Experience so far indicates that this is awkward. In retrospect, the approach is troublesome because the interpretation of a tuple type depends on surrounding context.
Use a homogeneous Julia tuple where the elements have a marked type. Marking the element types eliminates the context-sensitivity problems of approach 4.

Approach 5, which sounds upside-down, seems to be the simplest solution. This PR implements it. [Updated 2016-3-1 to reflect update in PR. Original PR called the marked type Banana and used "pre-peeled" metaphor. Now marked type is VecElement. Updated example uses typealias to retain discussion context.] Example:

#from Base
#immutable VecElement{T}  # Marked type
#    value :: T
#end
#export VecElt

typealias Banana VecElement
typealias V NTuple{4,Banana{Float32}}

The PR causes two special mappings for the example:

A Banana{T} maps to the LLVM type for a T, not an LLVM struct containing the LLVM representation of T. In other words, bananas are pre-peeled in LLVM.
A homogeneous tuple of Banana{T} maps to an LLVM vector of T. I.e., an LLVM vector is used instead of an array.

The pre-peeling rule ensures that a Banana{T} is mapped consistently regardless of surrounding context.

Below is an example that uses the definitions above:

add(u,v) = (Banana(u[1].value+v[1].value),
            Banana(u[2].value+v[2].value),
            Banana(u[3].value+v[3].value),
            Banana(u[4].value+v[4].value))

code_llvm(add,(V,V))

At -O3, the code is:

define void @julia_add_22173(<4 x float>* sret, <4 x float>, <4 x float>) #0 {
top:
  %3 = fadd <4 x float> %1, %2
  store <4 x float> %3, <4 x float>* %0, align 16
  ret void
}

A library such as SIMD.jl could presumably obtain similar effect without -O via llvmcall.

StefanKarpinski · 2016-02-25T20:51:03Z

The type could be called 🍌.

eschnett · 2016-02-25T22:06:03Z

But. This change has only about 20 lines of code!

Amazing. Thanks a lot! 👍

vtjnash · 2016-02-25T22:13:50Z

this would be analogous to the way we treat Float64, Float32, Complex versions of the same, and Signed specially in codegen, so i think there's reasonable precedence for going this route

ArchRobison · 2016-02-26T00:01:25Z

Per @StefanKarpinski's suggestion, I tried using "\xF0\x9F\x8D\x8C" instead of "Banana" and then the following code works fine.

import Base.+

immutable 🍌{T}
    x :: T
end

typealias V NTuple{4,🍌{Float32}}

(+)(u,v) = (🍌(u[1].x+v[1].x),
            🍌(u[2].x+v[2].x),
            🍌(u[3].x+v[3].x),
            🍌(u[4].x+v[4].x))

code_llvm(+,(V,V))

It certainly has the advantage of being unlikely to conflict with user identifiers. I only wish the font wasn't monochrome.

StefanKarpinski · 2016-02-26T00:05:25Z

I see it in spectacular yellow 😁

JeffBezanson · 2016-02-26T00:32:53Z

I see it in spectacular yellow

Apple really knows how to sell computers.

ArchRobison · 2016-02-26T00:41:09Z

Wow. I'm home and now see the full color on my MacBook.

I'm open to replacement names less high up in Unicode. Perhaps VecElement?

eschnett · 2016-02-26T02:16:33Z

I'd call it VecElt, VecElem (both abbreviated), or VectorElement. Alternatively, use SIMD instead of Vec.

timholy · 2016-02-26T14:20:50Z

The main problem I see with this is that people are likely to encounter mysterious disappearances of data if they use the 🐵.jl package.

pkofod · 2016-02-26T14:38:02Z

Just load 🐯.jl first, that removes most of the problems introduced in 🐵.jl, although it does introduce some new ones as well.

timholy · 2016-02-26T21:37:13Z

This conversation certainly gives new meaning to the phrase "the Julia ecosystem."

timholy · 2016-02-26T22:00:44Z

To the substance of this PR: it's certainly not the first approach I would have thought of, but often good ideas aren't. This is probably in the domain of "fiendishly clever."

Would we need some kind of promotion to permit operations like ::Int * ::Banana{T} to automatically fall back to ::Int * ::T or perhaps Banana{::Int * ::T}? I guess I'm wondering about getting data into and out of Bananas.

The name SIMD{T} might be best. I like VecElement, but I agree that @eschnett has a good point about abbreviating consistently.

ArchRobison · 2016-02-26T22:08:19Z

I tried to clean up the use of name comparison but did something wrong. With the second commit, I get an infinite recursion when compiling infererence.jl. Here is a sample of the call chain:

jl_apply_generic at /localdisk/adrobiso/julia-invert/julia/src/gf.c:1909
unknown function (ip: 0x2af6d793d7f1)
jl_apply_generic at /localdisk/adrobiso/julia-invert/julia/src/gf.c:1909
unknown function (ip: 0x2af6d76ca413)
jl_apply_generic at /localdisk/adrobiso/julia-invert/julia/src/gf.c:1909
unknown function (ip: 0x2af6d76b13df)

@yuyichao 's suggestion looks like what I want to do anyway. (Thanks!)

ArchRobison · 2016-02-26T22:21:28Z

With respect to getting values in/out of Banana, I was expecting that to be hidden inside a library like SIMD.jl. I like VecElt for the final name, but am keeping Banana for work in progress because it's easy to grep.

ArchRobison · 2016-02-26T22:43:28Z

I'm not understanding some point of the boot process. With the latest three commits, I'm getting a failure during the build process. Here's an excerpt:

make[1]: Entering directory `/localdisk/adrobiso/julia-invert/julia'
 cd /localdisk/adrobiso/julia-invert/julia/base && /localdisk/adrobiso/julia-invert/julia/usr/bin/julia -C native --output-ji /localdisk/adrobiso/julia-invert/julia/usr/lib/julia/inference0.ji -f coreimg.jl
A method error occurred before the base module was defined. Aborting...
#<null>.TypeVar
(:T, #<null>.Any, true)
while loading boot.jl, in expression starting on line 280

See here for what I defined in boot.jl, based on "monkey see, monkey do" knowledge.

yuyichao · 2016-02-26T22:54:33Z

Move it below the constructor of TypeVar's (and posibly others) are defined?

ArchRobison · 2016-02-26T23:00:46Z

Thanks! That solved the problem.

mschauer · 2016-02-28T10:32:24Z

Marking the elements would make SIMD optimization seamlessly compatible with Julia packages which are based on wrapped homogeneous tuple types like FixedSizeArrays .

KristofferC · 2016-02-28T11:16:09Z

FWIW I don't get any packed instructions using this branch, julia -O and the proof of concept code in the first post.

Here is LLVM and generated code: https://gist.github.com/KristofferC/4be0fea644ef22a4b773

ArchRobison · 2016-02-29T17:13:46Z

@KristofferC : It's my fault for updating the pull request and not commenting on a functional change. The update cleaned up the hacky test for is_banana_type. It now requires that Base.Banana be used instead of any type called Banana. I've updated the example in the first comment to reflect the change.

Next update will change the name to VecElt and remove WIP (Wackiness in Progress?) from the title.

KristofferC · 2016-02-29T17:57:11Z

Thanks, it is working now. Experimenting with something like this:

import Base.Banana

typealias Vec{N, T} NTuple{N,Base.Banana{T}}

@generated function add{N}(u::Vec{N}, v::Vec{N})
    Expr(:tuple, [:(Banana(u[$i].value + v[$i].value)) for i in 1:N]...)
end

code_llvm(add,(Vec{4,Float32},Vec{4,Float32}))

it seems a bit brittle. For example

code_llvm(add,(Vec{8,Float32},Vec{8,Float32}))

does not produce two fadd <4x float> but instead does some strange juggling between the registers, generated code: https://gist.github.com/KristofferC/3fbf15ed40015e49cb8a.

It would be nice if it "just worked" without having to match the exact register size but maybe it is out of scope for this PR.

KristofferC · 2016-02-29T18:15:44Z

Actually creating a Vec{8, Float32} segfaults so it doesn't really matter what code is being generated :P :

@generated function Base.rand{N, T}(::Type{Vec{N,T}})
    Expr(:tuple, [:(Banana(rand(T))) for i in 1:N]...)
end

a = rand(Vec{8,Float32})


signal (11): Segmentation fault
while loading no file, in expression starting on line 0
indexed_next at ./tuple.jl:33
unknown function (ip: 0x7eff2c7b4188)
[inline] at /home/kristoffer/julia/src/julia_internal.h:69
jl_call_method_internal at /home/kristoffer/julia/src/gf.c:1906
send_to_backend at ./REPL.jl:613
unknown function (ip: 0x7f013f4a8655)
.
.
.
Allocations: 4703615 (Pool: 4700646; Big: 2969); GC: 10
Segmentation fault (core dumped)

ArchRobison · 2016-02-29T19:25:46Z

PR updated to use VecElt instead of Banana.

I can replicate the optimization and crash fragility that @KristofferC observes. On the optimization front, I think we'll have to rely on llvmcall within SIMD.jl. I'm guessing that the SLP vectorizer has a cutoff parameter for how wide a vector to consider, and compile time may be quadratic in that parameter? To rely on llvmcall, we'll probably want an introspection function to tell whether a tuple is represented as an LLVM vector or not.

I'll look into the crash problem. The following example works past the show, but crashes at the second println.

typealias Vec{N, T} NTuple{N,Base.VecElt{T}}

@generated function Base.rand{N, T}(::Type{Vec{N,T}})
    Expr(:tuple, [:(VecElt(rand(T))) for i in 1:N]...)
end

a = rand(Vec{8,Float32})

show(a)
println()
println(a) # Crashes here

tkelman · 2016-02-29T20:14:11Z

Do we (edit: use) Elt as shorthand for element anywhere else?

KristofferC · 2016-02-29T21:09:46Z

I personally like VecElem or VecElement more.

ArchRobison · 2016-02-29T22:35:42Z

I was wondering if there was prior art for the Elt abbreviation. I couldn't find any for Elt or Elem. I see no problem with a long name such as VecElement since it's a low-level interface that SIMD.jl can hide.

Anyone else have a preference?

tkelman · 2016-03-01T00:51:09Z

CI failure was #15294, restarted

From worker 2:       * numbers              Error During Test
    From worker 2:    Test threw an exception of type UndefVarError
    From worker 2:    Expression: round(T,true // false) === convert(T,Inf)
    From worker 2:    UndefVarError: no_op_err not defined
    From worker 2:     in gcd(::Bool, ::Bool) at ./intfuncs.jl:11
    From worker 2:     in Rational{Bool}(::Bool, ::Bool) at ./rational.jl:9
    From worker 2:     [inlined code] from /tmp/julia/share/julia/test/numbers.jl:2849
    From worker 2:     in anonymous at ./no file:-1
    From worker 2:     [inlined code] from ./essentials.jl:78
    From worker 2:     in include_string(::UTF8String, ::ASCIIString) at ./loading.jl:371
    From worker 2:     in include_from_node1(::ASCIIString) at ./loading.jl:420
    From worker 2:     [inlined code] from ./util.jl:179
    From worker 2:     in runtests(::ASCIIString) at /tmp/julia/share/julia/test/testdefs.jl:7
    From worker 2:     in (::Base.Serializer.__deserialized_types__.##9183)(::ASCIIString) at /tmp/julia/share/julia/test/runtests.jl:36
    From worker 2:     in run_work_thunk(::Base.##254#256{Base.CallMsg{:call_fetch}}, ::Bool) at ./multi.jl:714
    From worker 2:     [inlined code] from ./multi.jl:1010
    From worker 2:     in (::Base.##253#255{Base.CallMsg{:call_fetch},TCPSocket})() at ./task.jl:59

KristofferC · 2016-03-30T13:50:26Z

Small update, this crashes:

julia> ((VecElement(1.0), VecElement(2.0), VecElement(2.0), VecElement(2.0)))

signal (11): Segmentation fault
while loading no file, in expression starting on line 0
[inline] at ./array.jl:391
put! at ./channels.jl:41
unknown function (ip: 0x7fdb3570f2d7)
[inline] at /home/kristoffer/juliacxx/src/julia_internal.h:69

ArchRobison · 2016-04-07T18:26:28Z

Today's commits (up to "Allow more sizes...") fix all the crashes that I'm aware of. So feel free to increase my awareness.

I have not tackled fixing LLVM for all vector sixes like @eschnett's PR does, though my PR does allow the non-power-of-two sizes that seem to work.

StefanKarpinski · 2016-04-07T18:38:46Z

Given that there are no known crashes, this seems like it may be at a good point to merge and allow additional issues to be fixed in future PRs. If the participants don't object to that, I suggest merging.

yuyichao · 2016-04-07T18:43:38Z

This uses some c++11 features (nullptr) so it needs to either use only c++03 features or manually turn on c++11 when we are compiling with LLVM 3.3.

ArchRobison · 2016-04-07T23:34:45Z

I'll try to knock it back to C++03. I didn't realize LLVM 3.3 was still in
play.

On Thu, Apr 7, 2016 at 1:44 PM, Yichao Yu [email protected] wrote:

This uses some c++11 features (nullptr) so it needs to either use only
c++03 features or manually turn on c++11 when we are compiling with LLVM
3.3.

—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#15244 (comment)

yuyichao · 2016-04-07T23:57:38Z

src/cgutils.cpp

        }
        return (Type*)jst->struct_decl;
    }
    return julia_type_to_llvm(jt, isboxed);
 }

+// Returns 0 if no special rule applies
+unsigned jl_llvm_special_tuple_alignment(jl_tupletype_t *st)


This will probably break AOT compilation.

Could you explain why? (I'm clueless on what ahead of time compilation involves for Julia.)

IIUC it allows you to build julia without llvm at runtime.

I skimmed through @vtjnash's article. Maybe he can comment on the AOT issue here. The change in question causes jl_compute_field_offsets to need LLVM. In the AOT model, is jl_compute_field_offsets part of the run-time?

Yes, it is part of the run time.

Thanks for the info. Basically I need to add an oracle to the run-time that determines whether LLVM uses a vector, using only Julia type information and not it's mapping to LLVM types. If I call this oracle from the Julia-to-LLVM mapping logic where it decides whether to use an LLVM vector, then there will always be agreement on the answer.

How complicated/platform-dependent is the LLVM logic used here?

The tricky part is that currently the tuple type is used if all the elements have the same LLVM type and the LLVM type is a scalar type. I'll have to cut that back to using a tuple only if all the elements have the same Julia type, or all are pointers. And make an estimate on whether the Julia element type maps to an LLVM scalar type (seems straightforward).

I suspect the pointer case isn't interesting, though AVX-512 and a creative mind will prove me wrong.

I'll have to cut that back to using a tuple only if all the elements have the same Julia type, or all are pointers. And make an estimate on whether the Julia element type maps to an LLVM scalar type (seems straightforward).

I think it is better to do that anyway since it will be easier to describe what our ABI is. If it matters, we can treat non floating point bitstype with the same size as the same scalar integer type when deciding whether it is a homogeneous vectorizable tuple.

Latest revision fixes the AOT issue per discussion. test/vecelement.jl tests all the types of interest, and there's an assertion in cgutils.cpp to catch disagreements between Julia and LLVM about the alignment of a struct.

ArchRobison · 2016-04-08T18:46:55Z

I tried compiling with:

LLVM_VER = 3.3

and got lots of errors in jitlayer.cpp unrelated to my changes. Has anyone successfully built Julia with LLVM 3.3 lately?

tkelman · 2016-04-08T19:02:43Z

I did a few days ago but it required manually adding -std=c++11 to CXXFLAGS. That file could maybe be fixed with llvm version ifdefs, not sure.

ArchRobison · 2016-04-15T19:48:42Z

Since @tkelman's remarks indicate that we've bought into using C++11 even when compiling with LLVM 3.3, there seems to be no point in changing nullptr to NULL, so I'll leave it as is in the PR.

yuyichao · 2016-04-15T21:24:55Z

I'll leave it as is in the PR.

Should be fine.... Supplying CXXFLAGS shouldn't be too bad for someone that want to test LLVM 3.3... As a side note, #15556 breaks linking of sysimg on linux with LLVM 3.3 although there's dirty workaround to make it work (by throw away the first character in the function name)...

vtjnash · 2016-04-16T05:05:03Z

it still works on clang / osx, so i hadn't noticed any llvm3.3 breakage. i haven't had a reason to test on linux recently. but iirc, using nullptr has been accepted by clang in the past without complaint (and without demanding a c++11 flag – unlike gcc)

ArchRobison · 2016-04-28T20:53:39Z

Something went astray with my last pull/merge. Please ignore latest update until I straighten it out.

ArchRobison · 2016-04-29T02:16:26Z

Commits squashed. If the checks pass, I suggest giving the changes a final review and merging if it looks good.

tkelman · 2016-04-29T02:25:45Z

src/cgutils.cpp

+// It is currently unused, but might be used in the future for a more precise answer.
+static unsigned julia_alignment(Value* /*ptr*/, jl_value_t *jltype, unsigned alignment) {
+    if (!alignment && ((jl_datatype_t*)jltype)->alignment > MAX_ALIGN) {
+        // Type's natural alignment exceeds strictess alignment promised in heap, so return the heap alignment.


Thanks. Now fixed.

vtjnash · 2016-04-29T17:48:20Z

lgtm. Can I ask that you also write a NEWS entry and some documentation on VecElement to the stdlib, mention it in the Julia-to-C mapping table (http://docs.julialang.org/en/latest/manual/calling-c-and-fortran-code/#bits-types), and perhaps put some usage examples somewhere in the manual (maybe in http://docs.julialang.org/en/latest/manual/performance-tips/?). I don't think that needs to hold up merging this though. Thanks for your persistence in making this happen.

A VecElement{T} is represented same as a T in LLVM. Furthermore, a homogenous tuple of VecElement{T} is represented as an LLVM vector if allowed by LLVM and number of elements the tuple is 16 or fewer, *and* can be correctly handled as a vector. Adds an assertion and test to check if LLVM and Julia agree on alignment. Feature is designed to work with Ahead-Of-Time compilation too.

ArchRobison · 2016-04-30T15:45:22Z

Yes, I'll next work on the items that @vtnash suggested.

simonbyrne mentioned this pull request Feb 27, 2016

Allow @simd to use vectorised math operations when available #15265

Open

ArchRobison force-pushed the adr/invert branch from 683723d to 30fb043 Compare February 27, 2016 17:36

ArchRobison force-pushed the adr/invert branch from 30fb043 to f82117e Compare February 29, 2016 18:50

yuyichao mentioned this pull request Mar 26, 2016

Fix atomics #15634

Merged

ArchRobison force-pushed the adr/invert branch 2 times, most recently from b008838 to 5230671 Compare April 7, 2016 14:52

yuyichao reviewed Apr 7, 2016
View reviewed changes

ArchRobison mentioned this pull request Apr 26, 2016

Initial Design Discussion rprechelt/Vectorize.jl#2

Closed

ArchRobison force-pushed the adr/invert branch 2 times, most recently from b208328 to 1388b2a Compare April 29, 2016 02:11

tkelman reviewed Apr 29, 2016
View reviewed changes

ArchRobison force-pushed the adr/invert branch from 1f55382 to f9b3020 Compare April 29, 2016 21:31

ArchRobison merged commit 7315f52 into JuliaLang:master Apr 30, 2016

ArchRobison changed the title ~~Inverted marking of vector tuple type (WIP)~~ Inverted marking of vector tuple type Apr 30, 2016

ArchRobison mentioned this pull request Apr 30, 2016

ccall should pass vectorized tuples by value instead of reference #16138

Closed

simonbyrne mentioned this pull request Jul 22, 2016

The way to use AES. JuliaRandom/RandomNumbers.jl#10

Closed

andyferris mentioned this pull request Sep 6, 2018

Alternative to specify open/closed endpoints JuliaMath/IntervalSets.jl#39

Closed

Inverted marking of vector tuple type #15244

Inverted marking of vector tuple type #15244

Conversation

ArchRobison commented Feb 25, 2016

StefanKarpinski commented Feb 25, 2016

eschnett commented Feb 25, 2016

vtjnash commented Feb 25, 2016

ArchRobison commented Feb 26, 2016

StefanKarpinski commented Feb 26, 2016

JeffBezanson commented Feb 26, 2016

ArchRobison commented Feb 26, 2016

eschnett commented Feb 26, 2016

timholy commented Feb 26, 2016

pkofod commented Feb 26, 2016

timholy commented Feb 26, 2016

timholy commented Feb 26, 2016

ArchRobison commented Feb 26, 2016

ArchRobison commented Feb 26, 2016

ArchRobison commented Feb 26, 2016

yuyichao commented Feb 26, 2016

ArchRobison commented Feb 26, 2016

mschauer commented Feb 28, 2016

KristofferC commented Feb 28, 2016

ArchRobison commented Feb 29, 2016

KristofferC commented Feb 29, 2016

KristofferC commented Feb 29, 2016

ArchRobison commented Feb 29, 2016

tkelman commented Feb 29, 2016

KristofferC commented Feb 29, 2016

ArchRobison commented Feb 29, 2016

tkelman commented Mar 1, 2016

KristofferC commented Mar 30, 2016

ArchRobison commented Apr 7, 2016

StefanKarpinski commented Apr 7, 2016

yuyichao commented Apr 7, 2016

ArchRobison commented Apr 7, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ArchRobison commented Apr 8, 2016

tkelman commented Apr 8, 2016

ArchRobison commented Apr 15, 2016

yuyichao commented Apr 15, 2016

vtjnash commented Apr 16, 2016

ArchRobison commented Apr 28, 2016

ArchRobison commented Apr 29, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vtjnash commented Apr 29, 2016

ArchRobison commented Apr 30, 2016