Inverted marking of vector tuple type #15244

Merged (1 commit, Apr 30, 2016)

Conversation

@ArchRobison (Contributor)

This PR is intended to generate discussion about a solution to a problem that @eschnett and I were discussing. See here for the start of the discussion chain. The problem is how to get access to LLVM vector types from Julia. The intent of the PR is to get feedback on whether the inverted approach (5 below) is something we should take forward.

Some approaches to the general problem are:

  1. Map homogeneous tuples to LLVM vector types automatically. This introduces layout problems, which I've been able to surmount through type punning, but it can cause performance problems: sometimes the vector type makes performance worse, as when operations on a tuple do not map to vector operations and require excessive extract/insert code.
  2. Modify LLVM SLP to work on arrays as if they were vectors. This approach works up to a point, but requires type punning on loads/stores that starts to break down for vectors that are members of structs, such as in the GradientNumber example. If we could talk the LLVM community into adding cast operations or intrinsics to cast between arrays and vectors, this approach would be my preference, since it keeps use of SIMD instructions hidden. Some users, though, want explicit SIMD.
  3. Introduce a special Julia type that maps to an LLVM vector. This requires teaching the Julia type system and users about the type (yuck!)
  4. Use an immutable Julia type with a single tuple field. Mark the type in some way to indicate that the tuple should map to an LLVM vector. Experience so far indicates that this is awkward. In retrospect, the approach is troublesome because the interpretation of a tuple type depends on surrounding context.
  5. Use a homogeneous Julia tuple where the elements have a marked type. Marking the element types eliminates the context-sensitivity problems of approach 4.

Approach 5, which sounds upside-down, seems to be the simplest solution. This PR implements it. [Updated 2016-3-1 to reflect an update to the PR: the original PR called the marked type Banana and used a "pre-peeled" metaphor; the marked type is now VecElement. The updated example uses a typealias to retain the discussion context.] Example:

#from Base
#immutable VecElement{T}  # Marked type
#    value :: T
#end
#export VecElt

typealias Banana VecElement
typealias V NTuple{4,Banana{Float32}}

The PR causes two special mappings for the example:

  • A Banana{T} maps to the LLVM type for a T, not an LLVM struct containing the LLVM representation of T. In other words, bananas are pre-peeled in LLVM.
  • A homogeneous tuple of Banana{T} maps to an LLVM vector of T. I.e., an LLVM vector is used instead of an array.

The pre-peeling rule ensures that a Banana{T} is mapped consistently regardless of surrounding context.

Below is an example that uses the definitions above:

add(u,v) = (Banana(u[1].value+v[1].value),
            Banana(u[2].value+v[2].value),
            Banana(u[3].value+v[3].value),
            Banana(u[4].value+v[4].value))

code_llvm(add,(V,V))

At -O3, the code is:

define void @julia_add_22173(<4 x float>* sret, <4 x float>, <4 x float>) #0 {
top:
  %3 = fadd <4 x float> %1, %2
  store <4 x float> %3, <4 x float>* %0, align 16
  ret void
}

A library such as SIMD.jl could presumably obtain a similar effect without -O via llvmcall.
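As an illustration (not code from this PR), here is a minimal llvmcall sketch under the assumption that NTuple{4,VecElement{Float32}} is represented as <4 x float>; the Float32x4 alias, the vadd name, and the IR string are all hypothetical:

```julia
# Hypothetical sketch: a vector add via llvmcall, independent of -O level.
# Assumes NTuple{4,VecElement{Float32}} maps to the LLVM type <4 x float>.
typealias Float32x4 NTuple{4,VecElement{Float32}}

vadd(u::Float32x4, v::Float32x4) = Base.llvmcall("""
        %res = fadd <4 x float> %0, %1
        ret <4 x float> %res""",
    Float32x4, Tuple{Float32x4,Float32x4}, u, v)
```

A library would presumably wrap such definitions per element type and width, falling back to scalar code for tuples that are not represented as vectors.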

@StefanKarpinski (Member)

The type could be called 🍌.

@eschnett (Contributor)

But. This change has only about 20 lines of code!

Amazing. Thanks a lot! 👍

@vtjnash (Member) commented Feb 25, 2016

this would be analogous to the way we treat Float64, Float32, Complex versions of the same, and Signed specially in codegen, so i think there's reasonable precedent for going this route

@ArchRobison (Contributor, Author)

Per @StefanKarpinski's suggestion, I tried using "\xF0\x9F\x8D\x8C" instead of "Banana" and then the following code works fine.

import Base.+

immutable 🍌{T}
    x :: T
end

typealias V NTuple{4,🍌{Float32}}

(+)(u,v) = (🍌(u[1].x+v[1].x),
            🍌(u[2].x+v[2].x),
            🍌(u[3].x+v[3].x),
            🍌(u[4].x+v[4].x))

code_llvm(+,(V,V))

It certainly has the advantage of being unlikely to conflict with user identifiers. I only wish the font wasn't monochrome.

@StefanKarpinski (Member)

I see it in spectacular yellow 😁

@JeffBezanson (Member)

I see it in spectacular yellow

Apple really knows how to sell computers.

@ArchRobison (Contributor, Author)

Wow. I'm home and now see the full color on my MacBook.

I'm open to replacement names less high up in Unicode. Perhaps VecElement?

@eschnett (Contributor)

I'd call it VecElt, VecElem (both abbreviated), or VectorElement. Alternatively, use SIMD instead of Vec.

@timholy (Member) commented Feb 26, 2016

The main problem I see with this is that people are likely to encounter mysterious disappearances of data if they use the 🐵.jl package.

@pkofod (Contributor) commented Feb 26, 2016

Just load 🐯.jl first, that removes most of the problems introduced in 🐵.jl, although it does introduce some new ones as well.

@timholy (Member) commented Feb 26, 2016

This conversation certainly gives new meaning to the phrase "the Julia ecosystem."

@timholy (Member) commented Feb 26, 2016

To the substance of this PR: it's certainly not the first approach I would have thought of, but often good ideas aren't. This is probably in the domain of "fiendishly clever."

Would we need some kind of promotion to permit operations like ::Int * ::Banana{T} to automatically fall back to ::Int * ::T or perhaps Banana{::Int * ::T}? I guess I'm wondering about getting data into and out of Bananas.

The name SIMD{T} might be best. I like VecElement, but I agree that @eschnett has a good point about abbreviating consistently.

@ArchRobison (Contributor, Author)

I tried to clean up the use of name comparison but did something wrong. With the second commit, I get an infinite recursion when compiling inference.jl. Here is a sample of the call chain:

jl_apply_generic at /localdisk/adrobiso/julia-invert/julia/src/gf.c:1909
unknown function (ip: 0x2af6d793d7f1)
jl_apply_generic at /localdisk/adrobiso/julia-invert/julia/src/gf.c:1909
unknown function (ip: 0x2af6d76ca413)
jl_apply_generic at /localdisk/adrobiso/julia-invert/julia/src/gf.c:1909
unknown function (ip: 0x2af6d76b13df)

@yuyichao's suggestion looks like what I want to do anyway. (Thanks!)

@ArchRobison (Contributor, Author)

With respect to getting values in/out of Banana, I was expecting that to be hidden inside a library like SIMD.jl. I like VecElt for the final name, but am keeping Banana for work in progress because it's easy to grep.
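For concreteness, a hedged sketch of how a library might hide the wrapping; the Vec4 name and its methods are illustrative, not actual SIMD.jl code:

```julia
# Hypothetical wrapper: users see Vec4{T}, never Banana directly.
immutable Vec4{T}
    data::NTuple{4,Banana{T}}
end
Vec4{T}(a::T, b::T, c::T, d::T) = Vec4{T}((Banana(a), Banana(b), Banana(c), Banana(d)))

Base.getindex(v::Vec4, i::Integer) = v.data[i].value   # unwrap on the way out

import Base: +
(+){T}(u::Vec4{T}, v::Vec4{T}) = Vec4(u[1] + v[1], u[2] + v[2], u[3] + v[3], u[4] + v[4])
```

Promotion rules like the ones @timholy asked about could then live on the wrapper type rather than on the element marker itself.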

@ArchRobison (Contributor, Author)

I'm not understanding some point of the boot process. With the latest three commits, I'm getting a failure during the build process. Here's an excerpt:

make[1]: Entering directory `/localdisk/adrobiso/julia-invert/julia'
 cd /localdisk/adrobiso/julia-invert/julia/base && /localdisk/adrobiso/julia-invert/julia/usr/bin/julia -C native --output-ji /localdisk/adrobiso/julia-invert/julia/usr/lib/julia/inference0.ji -f coreimg.jl
A method error occurred before the base module was defined. Aborting...
#<null>.TypeVar
(:T, #<null>.Any, true)
while loading boot.jl, in expression starting on line 280

See here for what I defined in boot.jl, based on "monkey see, monkey do" knowledge.

@yuyichao (Contributor)

Move it below where the constructors of TypeVar (and possibly others) are defined?

@ArchRobison (Contributor, Author)

Thanks! That solved the problem.

@mschauer (Contributor)

Marking the elements would make SIMD optimization seamlessly compatible with Julia packages that are based on wrapped homogeneous tuple types, like FixedSizeArrays.

@KristofferC (Member)

FWIW I don't get any packed instructions using this branch with julia -O and the proof-of-concept code in the first post.

Here is LLVM and generated code: https://gist.github.com/KristofferC/4be0fea644ef22a4b773

@ArchRobison (Contributor, Author)

@KristofferC : It's my fault for updating the pull request and not commenting on a functional change. The update cleaned up the hacky test for is_banana_type. It now requires that Base.Banana be used instead of any type called Banana. I've updated the example in the first comment to reflect the change.

Next update will change the name to VecElt and remove WIP (Wackiness in Progress?) from the title.

@KristofferC (Member)

Thanks, it is working now. Experimenting with something like this:

import Base.Banana

typealias Vec{N, T} NTuple{N,Base.Banana{T}}

@generated function add{N}(u::Vec{N}, v::Vec{N})
    Expr(:tuple, [:(Banana(u[$i].value + v[$i].value)) for i in 1:N]...)
end

code_llvm(add,(Vec{4,Float32},Vec{4,Float32}))

it seems a bit brittle. For example

code_llvm(add,(Vec{8,Float32},Vec{8,Float32}))

does not produce two fadd <4 x float> instructions but instead does some strange juggling between the registers; generated code: https://gist.github.com/KristofferC/3fbf15ed40015e49cb8a.

It would be nice if it "just worked" without having to match the exact register size but maybe it is out of scope for this PR.

@KristofferC (Member)

Actually, creating a Vec{8, Float32} segfaults, so it doesn't really matter what code is being generated :P

@generated function Base.rand{N, T}(::Type{Vec{N,T}})
    Expr(:tuple, [:(Banana(rand(T))) for i in 1:N]...)
end

a = rand(Vec{8,Float32})


signal (11): Segmentation fault
while loading no file, in expression starting on line 0
indexed_next at ./tuple.jl:33
unknown function (ip: 0x7eff2c7b4188)
[inline] at /home/kristoffer/julia/src/julia_internal.h:69
jl_call_method_internal at /home/kristoffer/julia/src/gf.c:1906
send_to_backend at ./REPL.jl:613
unknown function (ip: 0x7f013f4a8655)
.
.
.
Allocations: 4703615 (Pool: 4700646; Big: 2969); GC: 10
Segmentation fault (core dumped)

@ArchRobison (Contributor, Author)

PR updated to use VecElt instead of Banana.

I can replicate the optimization and crash fragility that @KristofferC observes. On the optimization front, I think we'll have to rely on llvmcall within SIMD.jl. I'm guessing that the SLP vectorizer has a cutoff parameter for how wide a vector to consider, and compile time may be quadratic in that parameter? To rely on llvmcall, we'll probably want an introspection function to tell whether a tuple is represented as an LLVM vector or not.
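As a strawman for that introspection function (the predicate shape, function name, and the 16-element cutoff are assumptions mirroring the mapping rule as discussed, not code from this PR):

```julia
# Hypothetical oracle: would this tuple type be represented as an LLVM vector?
# Uses the VecElt name current at this point in the thread.
function is_vector_tuple(T::DataType)
    T <: Tuple || return false
    ps = T.parameters
    n = length(ps)
    2 <= n <= 16 || return false        # assumed size cutoff
    E = ps[1]
    E <: Base.VecElt || return false    # elements must be marked
    all(p -> p === E, ps)               # and homogeneous
end
```

llvmcall-based kernels in a library could dispatch on this answer and fall back to scalar code when the tuple is represented as an array.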

I'll look into the crash problem. The following example works past the show, but crashes at the second println.

typealias Vec{N, T} NTuple{N,Base.VecElt{T}}

@generated function Base.rand{N, T}(::Type{Vec{N,T}})
    Expr(:tuple, [:(VecElt(rand(T))) for i in 1:N]...)
end

a = rand(Vec{8,Float32})

show(a)
println()
println(a) # Crashes here

@tkelman (Contributor) commented Feb 29, 2016

Do we use Elt as shorthand for element anywhere else?

@KristofferC (Member)

I personally like VecElem or VecElement more.

@ArchRobison (Contributor, Author)

I was wondering if there was prior art for the Elt abbreviation. I couldn't find any for Elt or Elem. I see no problem with a long name such as VecElement since it's a low-level interface that SIMD.jl can hide.

Anyone else have a preference?

@tkelman (Contributor) commented Mar 1, 2016

CI failure was #15294, restarted

From worker 2:       * numbers              Error During Test
    From worker 2:    Test threw an exception of type UndefVarError
    From worker 2:    Expression: round(T,true // false) === convert(T,Inf)
    From worker 2:    UndefVarError: no_op_err not defined
    From worker 2:     in gcd(::Bool, ::Bool) at ./intfuncs.jl:11
    From worker 2:     in Rational{Bool}(::Bool, ::Bool) at ./rational.jl:9
    From worker 2:     [inlined code] from /tmp/julia/share/julia/test/numbers.jl:2849
    From worker 2:     in anonymous at ./no file:-1
    From worker 2:     [inlined code] from ./essentials.jl:78
    From worker 2:     in include_string(::UTF8String, ::ASCIIString) at ./loading.jl:371
    From worker 2:     in include_from_node1(::ASCIIString) at ./loading.jl:420
    From worker 2:     [inlined code] from ./util.jl:179
    From worker 2:     in runtests(::ASCIIString) at /tmp/julia/share/julia/test/testdefs.jl:7
    From worker 2:     in (::Base.Serializer.__deserialized_types__.##9183)(::ASCIIString) at /tmp/julia/share/julia/test/runtests.jl:36
    From worker 2:     in run_work_thunk(::Base.##254#256{Base.CallMsg{:call_fetch}}, ::Bool) at ./multi.jl:714
    From worker 2:     [inlined code] from ./multi.jl:1010
    From worker 2:     in (::Base.##253#255{Base.CallMsg{:call_fetch},TCPSocket})() at ./task.jl:59

@yuyichao yuyichao mentioned this pull request Mar 26, 2016
@KristofferC (Member)

Small update, this crashes:

julia> ((VecElement(1.0), VecElement(2.0), VecElement(2.0), VecElement(2.0)))

signal (11): Segmentation fault
while loading no file, in expression starting on line 0
[inline] at ./array.jl:391
put! at ./channels.jl:41
unknown function (ip: 0x7fdb3570f2d7)
[inline] at /home/kristoffer/juliacxx/src/julia_internal.h:69

@ArchRobison ArchRobison force-pushed the adr/invert branch 2 times, most recently from b008838 to 5230671 Compare April 7, 2016 14:52
@ArchRobison (Contributor, Author)

Today's commits (up to "Allow more sizes...") fix all the crashes that I'm aware of. So feel free to increase my awareness.

I have not tackled fixing LLVM for all vector sizes like @eschnett's PR does, though my PR does allow the non-power-of-two sizes that seem to work.

@StefanKarpinski (Member)

Given that there are no known crashes, this seems like it may be at a good point to merge and allow additional issues to be fixed in future PRs. If the participants don't object to that, I suggest merging.

@yuyichao (Contributor) commented Apr 7, 2016

This uses some c++11 features (nullptr) so it needs to either use only c++03 features or manually turn on c++11 when we are compiling with LLVM 3.3.

@ArchRobison (Contributor, Author)

I'll try to knock it back to C++03. I didn't realize LLVM 3.3 was still in play.


}
return (Type*)jst->struct_decl;
}
return julia_type_to_llvm(jt, isboxed);
}

// Returns 0 if no special rule applies
unsigned jl_llvm_special_tuple_alignment(jl_tupletype_t *st)
(Contributor)

This will probably break AOT compilation.

(Contributor, Author)

Could you explain why? (I'm clueless on what ahead of time compilation involves for Julia.)

(Contributor)

IIUC it allows you to build julia without llvm at runtime.

(Contributor, Author)

I skimmed through @vtjnash's article. Maybe he can comment on the AOT issue here. The change in question causes jl_compute_field_offsets to need LLVM. In the AOT model, is jl_compute_field_offsets part of the run-time?

(Member)

Yes, it is part of the run time.

(Contributor, Author)

Thanks for the info. Basically I need to add an oracle to the run-time that determines whether LLVM uses a vector, using only Julia type information and not its mapping to LLVM types. If I call this oracle from the Julia-to-LLVM mapping logic where it decides whether to use an LLVM vector, then there will always be agreement on the answer.

(Contributor)

How complicated/platform-dependent is the LLVM logic used here?

(Contributor, Author)

The tricky part is that currently the tuple type is used if all the elements have the same LLVM type and the LLVM type is a scalar type. I'll have to cut that back to using a tuple only if all the elements have the same Julia type, or all are pointers. And make an estimate on whether the Julia element type maps to an LLVM scalar type (seems straightforward).

I suspect the pointer case isn't interesting, though AVX-512 and a creative mind will prove me wrong.

(Contributor)

I'll have to cut that back to using a tuple only if all the elements have the same Julia type, or all are pointers. And make an estimate on whether the Julia element type maps to an LLVM scalar type (seems straightforward).

I think it is better to do that anyway, since it will be easier to describe what our ABI is. If it matters, we can treat a non-floating-point bitstype as the scalar integer type of the same size when deciding whether it is a homogeneous vectorizable tuple.

(Contributor, Author)

Latest revision fixes the AOT issue per discussion. test/vecelement.jl tests all the types of interest, and there's an assertion in cgutils.cpp to catch disagreements between Julia and LLVM about the alignment of a struct.

@ArchRobison (Contributor, Author)

I tried compiling with:

LLVM_VER = 3.3

and got lots of errors in jitlayer.cpp unrelated to my changes. Has anyone successfully built Julia with LLVM 3.3 lately?

@tkelman (Contributor) commented Apr 8, 2016

I did a few days ago but it required manually adding -std=c++11 to CXXFLAGS. That file could maybe be fixed with llvm version ifdefs, not sure.

@ArchRobison (Contributor, Author)

Since @tkelman's remarks indicate that we've bought into using C++11 even when compiling with LLVM 3.3, there seems to be no point in changing nullptr to NULL, so I'll leave it as is in the PR.

@yuyichao (Contributor)

I'll leave it as is in the PR.

Should be fine... Supplying CXXFLAGS shouldn't be too bad for someone who wants to test LLVM 3.3. As a side note, #15556 breaks linking of the sysimg on Linux with LLVM 3.3, although there's a dirty workaround to make it work (throwing away the first character in the function name)...

@vtjnash (Member) commented Apr 16, 2016

it still works on clang / osx, so i hadn't noticed any llvm3.3 breakage. i haven't had a reason to test on linux recently. but iirc, using nullptr has been accepted by clang in the past without complaint (and without demanding a c++11 flag – unlike gcc)

@ArchRobison (Contributor, Author)

Something went astray with my last pull/merge. Please ignore latest update until I straighten it out.

@ArchRobison ArchRobison force-pushed the adr/invert branch 2 times, most recently from b208328 to 1388b2a Compare April 29, 2016 02:11
@ArchRobison (Contributor, Author)

Commits squashed. If the checks pass, I suggest giving the changes a final review and merging if it looks good.

// It is currently unused, but might be used in the future for a more precise answer.
static unsigned julia_alignment(Value* /*ptr*/, jl_value_t *jltype, unsigned alignment) {
if (!alignment && ((jl_datatype_t*)jltype)->alignment > MAX_ALIGN) {
// Type's natural alignment exceeds strictess alignment promised in heap, so return the heap alignment.
(Contributor)

strictest?

(Contributor, Author)

Thanks. Now fixed.

@vtjnash (Member) commented Apr 29, 2016

lgtm. Can I ask that you also write a NEWS entry, add some documentation on VecElement to the stdlib docs, mention it in the Julia-to-C mapping table (http://docs.julialang.org/en/latest/manual/calling-c-and-fortran-code/#bits-types), and perhaps put some usage examples somewhere in the manual (maybe in http://docs.julialang.org/en/latest/manual/performance-tips/)? I don't think that needs to hold up merging this, though. Thanks for your persistence in making this happen.

A VecElement{T} is represented the same as a T in LLVM.
Furthermore, a homogeneous tuple of VecElement{T} is
represented as an LLVM vector if allowed by LLVM, the
number of elements in the tuple is 16 or fewer, *and* it
can be correctly handled as a vector.

Adds an assertion and test to check if LLVM and Julia agree
on alignment.

Feature is designed to work with Ahead-Of-Time compilation too.
@ArchRobison ArchRobison merged commit 7315f52 into JuliaLang:master Apr 30, 2016
@ArchRobison (Contributor, Author)

Yes, I'll next work on the items that @vtjnash suggested.

@ArchRobison ArchRobison changed the title Inverted marking of vector tuple type (WIP) Inverted marking of vector tuple type Apr 30, 2016