-
-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recursive and dotted replacement with muladd #57
Conversation
This is absolutely beautiful. The code is very clean so even though it's complicated, it's still easy to follow. This is probably the best way to fix SciML/OrdinaryDiffEq.jl#86 too. I like that you got the macro in front of the equals working, and I never thought about using IMO it's looking complicated enough yet widely useful enough that we might want to spin it off as a separate package. Other people might want to use this if a README is stuck on it. I'll merge when tests pass, or I can quickly generate a MuladdMacro.jl in DiffEq and give you write privileges, transfer the tests over, get that registered, and then replace its usage in OrdinaryDiffEq.jl/StochasticDiffEq.jl. |
The main work was already done, these are just some adjustments. If you think this is worth a separate package, I can transfer it and also add a README (basically a (maybe a bit longer) copy of the existing documentation and some examples, I guess?). Of course, one still has to apply some fixes to julia> muladd([1.], [1.], [1.])
ERROR: DimensionMismatch("Cannot multiply two vectors")
Stacktrace:
[1] muladd(::Array{Float64,1}, ::Array{Float64,1}, ::Array{Float64,1}) at ./math.jl:943 and even in the use case above with scalar julia> @code_llvm muladd([1.], 1., [1.])
define i8** @julia_muladd_60850(i8** dereferenceable(40), double, i8** dereferenceable(40)) #0 !dbg !5 {
top:
%ptls_i8 = call i8* asm "movq %fs:0, $0;\0Aaddq $$-10888, $0", "=r,~{dirflag},~{fpsr},~{flags}"() #2
%ptls = bitcast i8* %ptls_i8 to i8****
%3 = alloca [3 x i8**], align 8
%.sub = getelementptr inbounds [3 x i8**], [3 x i8**]* %3, i64 0, i64 0
%4 = getelementptr [3 x i8**], [3 x i8**]* %3, i64 0, i64 2
store i8** null, i8*** %4, align 8
%5 = bitcast [3 x i8**]* %3 to i64*
store i64 2, i64* %5, align 8
%6 = getelementptr [3 x i8**], [3 x i8**]* %3, i64 0, i64 1
%7 = bitcast i8* %ptls_i8 to i64*
%8 = load i64, i64* %7, align 8
%9 = bitcast i8*** %6 to i64*
store i64 %8, i64* %9, align 8
store i8*** %.sub, i8**** %ptls, align 8
%10 = call i8** @"julia_*_60853"(i8** nonnull %0, double %1)
store i8** %10, i8*** %4, align 8
%11 = call i8** @"julia_+_60851"(i8** %10, i8** nonnull %2)
%12 = load i64, i64* %9, align 8
store i64 %12, i64* %7, align 8
ret i8** %11
} which does not call julia> @code_llvm muladd(1., 1., 1.)
define double @jlsys_muladd_59756(double, double, double) #0 !dbg !5 {
top:
%3 = call double @llvm.fmuladd.f64(double %0, double %1, double %2)
ret double %3
} and the huge expression julia> @code_llvm muladd.([1.], [1.], [1.])
define i8** @japi1_broadcast_60847(i8**, i8***, i32) #0 !dbg !5 {
top:
%3 = alloca %OneTo, align 8
%4 = alloca %OneTo, align 8
%5 = alloca %OneTo, align 8
%6 = alloca i8***, align 8
store volatile i8*** %1, i8**** %6, align 8
%ptls_i8 = call i8* asm "movq %fs:0, $0;\0Aaddq $$-10888, $0", "=r,~{dirflag},~{fpsr},~{flags}"() #5
%ptls = bitcast i8* %ptls_i8 to i8****
%7 = alloca [4 x i8**], align 8
%.sub = getelementptr inbounds [4 x i8**], [4 x i8**]* %7, i64 0, i64 0
%8 = getelementptr [4 x i8**], [4 x i8**]* %7, i64 0, i64 2
%9 = bitcast [4 x i8**]* %7 to i64*
%10 = bitcast i8*** %8 to i8*
call void @llvm.memset.p0i8.i64(i8* %10, i8 0, i64 16, i32 8, i1 false)
store i64 4, i64* %9, align 8
%11 = getelementptr [4 x i8**], [4 x i8**]* %7, i64 0, i64 1
%12 = bitcast i8* %ptls_i8 to i64*
%13 = load i64, i64* %12, align 8
%14 = bitcast i8*** %11 to i64*
store i64 %13, i64* %14, align 8
store i8*** %.sub, i8**** %ptls, align 8
%newout = alloca %OneTo, align 8
%15 = getelementptr i8**, i8*** %1, i64 2
switch i32 %2, label %pass26 [
i32 2, label %fail
i32 3, label %fail25
]
fail: ; preds = %top
call void @jl_bounds_error_tuple_int(i8*** %15, i64 0, i64 1)
unreachable
fail25: ; preds = %top
call void @jl_bounds_error_tuple_int(i8*** %15, i64 1, i64 2)
unreachable
pass26: ; preds = %top
%16 = getelementptr i8**, i8*** %1, i64 1
%17 = load i8**, i8*** %16, align 8
%18 = load i8**, i8*** %15, align 8
%19 = getelementptr i8**, i8*** %1, i64 3
%20 = load i8**, i8*** %19, align 8
%21 = getelementptr i8*, i8** %17, i64 3
%22 = bitcast i8** %21 to i64*
%23 = load i64, i64* %22, align 8
%24 = getelementptr i8*, i8** %18, i64 3
%25 = bitcast i8** %24 to i64*
%26 = load i64, i64* %25, align 8
%27 = getelementptr i8*, i8** %20, i64 3
%28 = bitcast i8** %27 to i64*
%29 = load i64, i64* %28, align 8
%30 = icmp slt i64 %26, 0
%31 = select i1 %30, i64 0, i64 %26
%32 = getelementptr inbounds %OneTo, %OneTo* %5, i64 0, i32 0
store i64 %31, i64* %32, align 8
%33 = icmp slt i64 %29, 0
%34 = select i1 %33, i64 0, i64 %29
%35 = getelementptr inbounds %OneTo, %OneTo* %4, i64 0, i32 0
store i64 %34, i64* %35, align 8
%36 = icmp slt i64 %23, 0
%37 = select i1 %36, i64 0, i64 %23
%38 = getelementptr inbounds %OneTo, %OneTo* %3, i64 0, i32 0
store i64 %37, i64* %38, align 8
%39 = call %OneTo @julia__bcs1_60823(%OneTo* nocapture nonnull readonly %3, %OneTo* nocapture nonnull readonly %5)
%40 = extractvalue %OneTo %39, 0
%41 = getelementptr inbounds %OneTo, %OneTo* %newout, i64 0, i32 0
store i64 %40, i64* %41, align 8
%42 = call %OneTo @julia__bcs1_60823(%OneTo* nocapture nonnull readonly %newout, %OneTo* nocapture nonnull readonly %4)
%43 = extractvalue %OneTo %42, 0
%44 = call i8** inttoptr (i64 139923554842592 to i8** (i8**, i64)*)(i8** inttoptr (i64 139923408648656 to i8**), i64 %43)
store i8** %44, i8*** %8, align 8
%45 = icmp slt i64 %43, 1
br i1 %45, label %L139.backedge, label %if28.lr.ph
L139.loopexit.loopexit: ; preds = %middle.block206, %if28.us.us.us
br label %L139.loopexit
L139.loopexit.loopexit64: ; preds = %middle.block183, %if28.us.us
br label %L139.loopexit
L139.loopexit.loopexit65: ; preds = %middle.block160, %if28.us.us48
br label %L139.loopexit
L139.loopexit.loopexit66: ; preds = %middle.block139, %if28.us
br label %L139.loopexit
L139.loopexit.loopexit67: ; preds = %middle.block116, %if28.us44.us
br label %L139.loopexit
L139.loopexit.loopexit68: ; preds = %middle.block95, %if28.us44
br label %L139.loopexit
L139.loopexit.loopexit69: ; preds = %middle.block75, %if28.us46
br label %L139.loopexit
L139.loopexit.loopexit70: ; preds = %middle.block, %if28
br label %L139.loopexit
L139.loopexit: ; preds = %L139.loopexit.loopexit70, %L139.loopexit.loopexit69, %L139.loopexit.loopexit68, %L139.loopexit.loopexit67, %L139.loopexit.loopexit66, %L139.loopexit.loopexit65, %L139.loopexit.loopexit64, %L139.loopexit.loopexit
%46 = getelementptr [4 x i8**], [4 x i8**]* %7, i64 0, i64 3
store i8** %44, i8*** %46, align 8
br label %L139.backedge
L139.backedge: ; preds = %pass26, %L139.loopexit
%47 = load i64, i64* %14, align 8
store i64 %47, i64* %12, align 8
ret i8** %44
if28.lr.ph: ; preds = %pass26
%48 = load i64, i64* %28, align 8
%49 = icmp slt i64 %48, 0
%50 = select i1 %49, i64 0, i64 %48
%51 = icmp eq i64 %43, %50
%52 = load i64, i64* %25, align 8
%53 = icmp slt i64 %52, 0
%54 = select i1 %53, i64 0, i64 %52
%55 = icmp eq i64 %43, %54
%56 = load i64, i64* %22, align 8
%57 = icmp slt i64 %56, 0
%58 = select i1 %57, i64 0, i64 %56
%59 = icmp eq i64 %43, %58
%60 = bitcast i8** %17 to double**
%61 = bitcast i8** %18 to double**
%62 = bitcast i8** %20 to double**
%63 = bitcast i8** %44 to double**
%64 = load double*, double** %60, align 8
%65 = load double*, double** %61, align 8
%66 = load double*, double** %62, align 8
%67 = load double*, double** %63, align 8
br i1 %59, label %if28.lr.ph.split.us, label %if28.lr.ph.if28.lr.ph.split_crit_edge
if28.lr.ph.if28.lr.ph.split_crit_edge: ; preds = %if28.lr.ph
br i1 %55, label %if28.lr.ph.split.split.us, label %if28.lr.ph.split.if28.lr.ph.split.split_crit_edge
if28.lr.ph.split.us: ; preds = %if28.lr.ph
br i1 %55, label %if28.lr.ph.split.us.split.us, label %if28.lr.ph.split.us.if28.lr.ph.split.us.split_crit_edge
if28.lr.ph.split.us.if28.lr.ph.split.us.split_crit_edge: ; preds = %if28.lr.ph.split.us
br i1 %51, label %if28.us.us48.preheader, label %if28.us.preheader
if28.us.preheader: ; preds = %if28.lr.ph.split.us.if28.lr.ph.split.us.split_crit_edge
%min.iters.check141 = icmp ult i64 %43, 4
br i1 %min.iters.check141, label %scalar.ph140, label %min.iters.checked142
min.iters.checked142: ; preds = %if28.us.preheader
%n.vec144 = and i64 %43, -4
%cmp.zero145 = icmp eq i64 %n.vec144, 0
br i1 %cmp.zero145, label %scalar.ph140, label %vector.ph146
vector.ph146: ; preds = %min.iters.checked142
br label %vector.body138
vector.body138: ; preds = %vector.body138, %vector.ph146
%index147 = phi i64 [ 0, %vector.ph146 ], [ %index.next148, %vector.body138 ]
%68 = getelementptr double, double* %64, i64 %index147
%69 = bitcast double* %68 to <2 x double>*
%wide.load155 = load <2 x double>, <2 x double>* %69, align 8
%70 = getelementptr double, double* %68, i64 2
%71 = bitcast double* %70 to <2 x double>*
%wide.load156 = load <2 x double>, <2 x double>* %71, align 8
%72 = load double, double* %65, align 8
%73 = insertelement <2 x double> undef, double %72, i32 0
%74 = insertelement <2 x double> %73, double %72, i32 1
%75 = insertelement <2 x double> undef, double %72, i32 0
%76 = insertelement <2 x double> %75, double %72, i32 1
%77 = load double, double* %66, align 8
%78 = insertelement <2 x double> undef, double %77, i32 0
%79 = insertelement <2 x double> %78, double %77, i32 1
%80 = insertelement <2 x double> undef, double %77, i32 0
%81 = insertelement <2 x double> %80, double %77, i32 1
%82 = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> %wide.load155, <2 x double> %74, <2 x double> %79)
%83 = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> %wide.load156, <2 x double> %76, <2 x double> %81)
%84 = getelementptr double, double* %67, i64 %index147
%85 = bitcast double* %84 to <2 x double>*
store <2 x double> %82, <2 x double>* %85, align 8
%86 = getelementptr double, double* %84, i64 2
%87 = bitcast double* %86 to <2 x double>*
store <2 x double> %83, <2 x double>* %87, align 8
%index.next148 = add i64 %index147, 4
%88 = icmp eq i64 %index.next148, %n.vec144
br i1 %88, label %middle.block139, label %vector.body138
middle.block139: ; preds = %vector.body138
%cmp.n150 = icmp eq i64 %43, %n.vec144
br i1 %cmp.n150, label %L139.loopexit.loopexit66, label %scalar.ph140
scalar.ph140: ; preds = %middle.block139, %min.iters.checked142, %if28.us.preheader
%bc.resume.val149 = phi i64 [ %n.vec144, %middle.block139 ], [ 0, %if28.us.preheader ], [ 0, %min.iters.checked142 ]
br label %if28.us
if28.us.us48.preheader: ; preds = %if28.lr.ph.split.us.if28.lr.ph.split.us.split_crit_edge
%min.iters.check162 = icmp ult i64 %43, 4
br i1 %min.iters.check162, label %scalar.ph161, label %min.iters.checked163
min.iters.checked163: ; preds = %if28.us.us48.preheader
%n.vec165 = and i64 %43, -4
%cmp.zero166 = icmp eq i64 %n.vec165, 0
br i1 %cmp.zero166, label %scalar.ph161, label %vector.ph167
vector.ph167: ; preds = %min.iters.checked163
br label %vector.body159
vector.body159: ; preds = %vector.body159, %vector.ph167
%index168 = phi i64 [ 0, %vector.ph167 ], [ %index.next169, %vector.body159 ]
%89 = getelementptr double, double* %64, i64 %index168
%90 = bitcast double* %89 to <2 x double>*
%wide.load176 = load <2 x double>, <2 x double>* %90, align 8
%91 = getelementptr double, double* %89, i64 2
%92 = bitcast double* %91 to <2 x double>*
%wide.load177 = load <2 x double>, <2 x double>* %92, align 8
%93 = load double, double* %65, align 8
%94 = insertelement <2 x double> undef, double %93, i32 0
%95 = insertelement <2 x double> %94, double %93, i32 1
%96 = insertelement <2 x double> undef, double %93, i32 0
%97 = insertelement <2 x double> %96, double %93, i32 1
%98 = getelementptr double, double* %66, i64 %index168
%99 = bitcast double* %98 to <2 x double>*
%wide.load178 = load <2 x double>, <2 x double>* %99, align 8
%100 = getelementptr double, double* %98, i64 2
%101 = bitcast double* %100 to <2 x double>*
%wide.load179 = load <2 x double>, <2 x double>* %101, align 8
%102 = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> %wide.load176, <2 x double> %95, <2 x double> %wide.load178)
%103 = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> %wide.load177, <2 x double> %97, <2 x double> %wide.load179)
%104 = getelementptr double, double* %67, i64 %index168
%105 = bitcast double* %104 to <2 x double>*
store <2 x double> %102, <2 x double>* %105, align 8
%106 = getelementptr double, double* %104, i64 2
%107 = bitcast double* %106 to <2 x double>*
store <2 x double> %103, <2 x double>* %107, align 8
%index.next169 = add i64 %index168, 4
%108 = icmp eq i64 %index.next169, %n.vec165
br i1 %108, label %middle.block160, label %vector.body159
middle.block160: ; preds = %vector.body159
%cmp.n171 = icmp eq i64 %43, %n.vec165
br i1 %cmp.n171, label %L139.loopexit.loopexit65, label %scalar.ph161
scalar.ph161: ; preds = %middle.block160, %min.iters.checked163, %if28.us.us48.preheader
%bc.resume.val170 = phi i64 [ %n.vec165, %middle.block160 ], [ 0, %if28.us.us48.preheader ], [ 0, %min.iters.checked163 ]
br label %if28.us.us48
if28.lr.ph.split.us.split.us: ; preds = %if28.lr.ph.split.us
br i1 %51, label %if28.us.us.us.preheader, label %if28.us.us.preheader
if28.us.us.preheader: ; preds = %if28.lr.ph.split.us.split.us
%min.iters.check185 = icmp ult i64 %43, 4
br i1 %min.iters.check185, label %scalar.ph184, label %min.iters.checked186
min.iters.checked186: ; preds = %if28.us.us.preheader
%n.vec188 = and i64 %43, -4
%cmp.zero189 = icmp eq i64 %n.vec188, 0
br i1 %cmp.zero189, label %scalar.ph184, label %vector.ph190
vector.ph190: ; preds = %min.iters.checked186
br label %vector.body182
vector.body182: ; preds = %vector.body182, %vector.ph190
%index191 = phi i64 [ 0, %vector.ph190 ], [ %index.next192, %vector.body182 ]
%109 = getelementptr double, double* %64, i64 %index191
%110 = bitcast double* %109 to <2 x double>*
%wide.load199 = load <2 x double>, <2 x double>* %110, align 8
%111 = getelementptr double, double* %109, i64 2
%112 = bitcast double* %111 to <2 x double>*
%wide.load200 = load <2 x double>, <2 x double>* %112, align 8
%113 = getelementptr double, double* %65, i64 %index191
%114 = bitcast double* %113 to <2 x double>*
%wide.load201 = load <2 x double>, <2 x double>* %114, align 8
%115 = getelementptr double, double* %113, i64 2
%116 = bitcast double* %115 to <2 x double>*
%wide.load202 = load <2 x double>, <2 x double>* %116, align 8
%117 = load double, double* %66, align 8
%118 = insertelement <2 x double> undef, double %117, i32 0
%119 = insertelement <2 x double> %118, double %117, i32 1
%120 = insertelement <2 x double> undef, double %117, i32 0
%121 = insertelement <2 x double> %120, double %117, i32 1
%122 = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> %wide.load199, <2 x double> %wide.load201, <2 x double> %119)
%123 = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> %wide.load200, <2 x double> %wide.load202, <2 x double> %121)
%124 = getelementptr double, double* %67, i64 %index191
%125 = bitcast double* %124 to <2 x double>*
store <2 x double> %122, <2 x double>* %125, align 8
%126 = getelementptr double, double* %124, i64 2
%127 = bitcast double* %126 to <2 x double>*
store <2 x double> %123, <2 x double>* %127, align 8
%index.next192 = add i64 %index191, 4
%128 = icmp eq i64 %index.next192, %n.vec188
br i1 %128, label %middle.block183, label %vector.body182
middle.block183: ; preds = %vector.body182
%cmp.n194 = icmp eq i64 %43, %n.vec188
br i1 %cmp.n194, label %L139.loopexit.loopexit64, label %scalar.ph184
scalar.ph184: ; preds = %middle.block183, %min.iters.checked186, %if28.us.us.preheader
%bc.resume.val193 = phi i64 [ %n.vec188, %middle.block183 ], [ 0, %if28.us.us.preheader ], [ 0, %min.iters.checked186 ]
br label %if28.us.us
if28.us.us.us.preheader: ; preds = %if28.lr.ph.split.us.split.us
%min.iters.check208 = icmp ult i64 %43, 4
br i1 %min.iters.check208, label %scalar.ph207, label %min.iters.checked209
min.iters.checked209: ; preds = %if28.us.us.us.preheader
%n.vec211 = and i64 %43, -4
%cmp.zero212 = icmp eq i64 %n.vec211, 0
br i1 %cmp.zero212, label %scalar.ph207, label %vector.ph213
vector.ph213: ; preds = %min.iters.checked209
br label %vector.body205
vector.body205: ; preds = %vector.body205, %vector.ph213
%index214 = phi i64 [ 0, %vector.ph213 ], [ %index.next215, %vector.body205 ]
%129 = getelementptr double, double* %64, i64 %index214
%130 = bitcast double* %129 to <2 x double>*
%wide.load222 = load <2 x double>, <2 x double>* %130, align 8
%131 = getelementptr double, double* %129, i64 2
%132 = bitcast double* %131 to <2 x double>*
%wide.load223 = load <2 x double>, <2 x double>* %132, align 8
%133 = getelementptr double, double* %65, i64 %index214
%134 = bitcast double* %133 to <2 x double>*
%wide.load224 = load <2 x double>, <2 x double>* %134, align 8
%135 = getelementptr double, double* %133, i64 2
%136 = bitcast double* %135 to <2 x double>*
%wide.load225 = load <2 x double>, <2 x double>* %136, align 8
%137 = getelementptr double, double* %66, i64 %index214
%138 = bitcast double* %137 to <2 x double>*
%wide.load226 = load <2 x double>, <2 x double>* %138, align 8
%139 = getelementptr double, double* %137, i64 2
%140 = bitcast double* %139 to <2 x double>*
%wide.load227 = load <2 x double>, <2 x double>* %140, align 8
%141 = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> %wide.load222, <2 x double> %wide.load224, <2 x double> %wide.load226)
%142 = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> %wide.load223, <2 x double> %wide.load225, <2 x double> %wide.load227)
%143 = getelementptr double, double* %67, i64 %index214
%144 = bitcast double* %143 to <2 x double>*
store <2 x double> %141, <2 x double>* %144, align 8
%145 = getelementptr double, double* %143, i64 2
%146 = bitcast double* %145 to <2 x double>*
store <2 x double> %142, <2 x double>* %146, align 8
%index.next215 = add i64 %index214, 4
%147 = icmp eq i64 %index.next215, %n.vec211
br i1 %147, label %middle.block206, label %vector.body205
middle.block206: ; preds = %vector.body205
%cmp.n217 = icmp eq i64 %43, %n.vec211
br i1 %cmp.n217, label %L139.loopexit.loopexit, label %scalar.ph207
scalar.ph207: ; preds = %middle.block206, %min.iters.checked209, %if28.us.us.us.preheader
%bc.resume.val216 = phi i64 [ %n.vec211, %middle.block206 ], [ 0, %if28.us.us.us.preheader ], [ 0, %min.iters.checked209 ]
br label %if28.us.us.us
if28.us.us.us: ; preds = %scalar.ph207, %if28.us.us.us
%"i#668.043.us.us.us" = phi i64 [ %156, %if28.us.us.us ], [ %bc.resume.val216, %scalar.ph207 ]
%148 = getelementptr double, double* %64, i64 %"i#668.043.us.us.us"
%149 = load double, double* %148, align 8
%150 = getelementptr double, double* %65, i64 %"i#668.043.us.us.us"
%151 = load double, double* %150, align 8
%152 = getelementptr double, double* %66, i64 %"i#668.043.us.us.us"
%153 = load double, double* %152, align 8
%154 = call double @llvm.fmuladd.f64(double %149, double %151, double %153)
%155 = getelementptr double, double* %67, i64 %"i#668.043.us.us.us"
store double %154, double* %155, align 8
%156 = add nuw nsw i64 %"i#668.043.us.us.us", 1
%exitcond56 = icmp eq i64 %156, %43
br i1 %exitcond56, label %L139.loopexit.loopexit, label %if28.us.us.us
if28.us.us: ; preds = %scalar.ph184, %if28.us.us
%"i#668.043.us.us" = phi i64 [ %164, %if28.us.us ], [ %bc.resume.val193, %scalar.ph184 ]
%157 = getelementptr double, double* %64, i64 %"i#668.043.us.us"
%158 = load double, double* %157, align 8
%159 = getelementptr double, double* %65, i64 %"i#668.043.us.us"
%160 = load double, double* %159, align 8
%161 = load double, double* %66, align 8
%162 = call double @llvm.fmuladd.f64(double %158, double %160, double %161)
%163 = getelementptr double, double* %67, i64 %"i#668.043.us.us"
store double %162, double* %163, align 8
%164 = add nuw nsw i64 %"i#668.043.us.us", 1
%exitcond53 = icmp eq i64 %164, %43
br i1 %exitcond53, label %L139.loopexit.loopexit64, label %if28.us.us
if28.us.us48: ; preds = %scalar.ph161, %if28.us.us48
%"i#668.043.us.us49" = phi i64 [ %172, %if28.us.us48 ], [ %bc.resume.val170, %scalar.ph161 ]
%165 = getelementptr double, double* %64, i64 %"i#668.043.us.us49"
%166 = load double, double* %165, align 8
%167 = load double, double* %65, align 8
%168 = getelementptr double, double* %66, i64 %"i#668.043.us.us49"
%169 = load double, double* %168, align 8
%170 = call double @llvm.fmuladd.f64(double %166, double %167, double %169)
%171 = getelementptr double, double* %67, i64 %"i#668.043.us.us49"
store double %170, double* %171, align 8
%172 = add nuw nsw i64 %"i#668.043.us.us49", 1
%exitcond54 = icmp eq i64 %172, %43
br i1 %exitcond54, label %L139.loopexit.loopexit65, label %if28.us.us48
if28.us: ; preds = %scalar.ph140, %if28.us
%"i#668.043.us" = phi i64 [ %179, %if28.us ], [ %bc.resume.val149, %scalar.ph140 ]
%173 = getelementptr double, double* %64, i64 %"i#668.043.us"
%174 = load double, double* %173, align 8
%175 = load double, double* %65, align 8
%176 = load double, double* %66, align 8
%177 = call double @llvm.fmuladd.f64(double %174, double %175, double %176)
%178 = getelementptr double, double* %67, i64 %"i#668.043.us"
store double %177, double* %178, align 8
%179 = add nuw nsw i64 %"i#668.043.us", 1
%exitcond50 = icmp eq i64 %179, %43
br i1 %exitcond50, label %L139.loopexit.loopexit66, label %if28.us
if28.lr.ph.split.if28.lr.ph.split.split_crit_edge: ; preds = %if28.lr.ph.if28.lr.ph.split_crit_edge
br i1 %51, label %if28.us46.preheader, label %if28.preheader
if28.preheader: ; preds = %if28.lr.ph.split.if28.lr.ph.split.split_crit_edge
%min.iters.check = icmp ult i64 %43, 4
br i1 %min.iters.check, label %scalar.ph, label %min.iters.checked
min.iters.checked: ; preds = %if28.preheader
%n.vec = and i64 %43, -4
%cmp.zero = icmp eq i64 %n.vec, 0
br i1 %cmp.zero, label %scalar.ph, label %vector.ph
vector.ph: ; preds = %min.iters.checked
br label %vector.body
vector.body: ; preds = %vector.body, %vector.ph
%index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
%180 = load double, double* %64, align 8
%181 = insertelement <2 x double> undef, double %180, i32 0
%182 = insertelement <2 x double> %181, double %180, i32 1
%183 = insertelement <2 x double> undef, double %180, i32 0
%184 = insertelement <2 x double> %183, double %180, i32 1
%185 = load double, double* %65, align 8
%186 = insertelement <2 x double> undef, double %185, i32 0
%187 = insertelement <2 x double> %186, double %185, i32 1
%188 = insertelement <2 x double> undef, double %185, i32 0
%189 = insertelement <2 x double> %188, double %185, i32 1
%190 = load double, double* %66, align 8
%191 = insertelement <2 x double> undef, double %190, i32 0
%192 = insertelement <2 x double> %191, double %190, i32 1
%193 = insertelement <2 x double> undef, double %190, i32 0
%194 = insertelement <2 x double> %193, double %190, i32 1
%195 = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> %182, <2 x double> %187, <2 x double> %192)
%196 = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> %184, <2 x double> %189, <2 x double> %194)
%197 = getelementptr double, double* %67, i64 %index
%198 = bitcast double* %197 to <2 x double>*
store <2 x double> %195, <2 x double>* %198, align 8
%199 = getelementptr double, double* %197, i64 2
%200 = bitcast double* %199 to <2 x double>*
store <2 x double> %196, <2 x double>* %200, align 8
%index.next = add i64 %index, 4
%201 = icmp eq i64 %index.next, %n.vec
br i1 %201, label %middle.block, label %vector.body
middle.block: ; preds = %vector.body
%cmp.n = icmp eq i64 %43, %n.vec
br i1 %cmp.n, label %L139.loopexit.loopexit70, label %scalar.ph
scalar.ph: ; preds = %middle.block, %min.iters.checked, %if28.preheader
%bc.resume.val = phi i64 [ %n.vec, %middle.block ], [ 0, %if28.preheader ], [ 0, %min.iters.checked ]
br label %if28
if28.us46.preheader: ; preds = %if28.lr.ph.split.if28.lr.ph.split.split_crit_edge
%min.iters.check77 = icmp ult i64 %43, 4
br i1 %min.iters.check77, label %scalar.ph76, label %min.iters.checked78
min.iters.checked78: ; preds = %if28.us46.preheader
%n.vec80 = and i64 %43, -4
%cmp.zero81 = icmp eq i64 %n.vec80, 0
br i1 %cmp.zero81, label %scalar.ph76, label %vector.ph82
vector.ph82: ; preds = %min.iters.checked78
br label %vector.body74
vector.body74: ; preds = %vector.body74, %vector.ph82
%index83 = phi i64 [ 0, %vector.ph82 ], [ %index.next84, %vector.body74 ]
%202 = load double, double* %64, align 8
%203 = insertelement <2 x double> undef, double %202, i32 0
%204 = insertelement <2 x double> %203, double %202, i32 1
%205 = insertelement <2 x double> undef, double %202, i32 0
%206 = insertelement <2 x double> %205, double %202, i32 1
%207 = load double, double* %65, align 8
%208 = insertelement <2 x double> undef, double %207, i32 0
%209 = insertelement <2 x double> %208, double %207, i32 1
%210 = insertelement <2 x double> undef, double %207, i32 0
%211 = insertelement <2 x double> %210, double %207, i32 1
%212 = getelementptr double, double* %66, i64 %index83
%213 = bitcast double* %212 to <2 x double>*
%wide.load = load <2 x double>, <2 x double>* %213, align 8
%214 = getelementptr double, double* %212, i64 2
%215 = bitcast double* %214 to <2 x double>*
%wide.load91 = load <2 x double>, <2 x double>* %215, align 8
%216 = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> %204, <2 x double> %209, <2 x double> %wide.load)
%217 = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> %206, <2 x double> %211, <2 x double> %wide.load91)
%218 = getelementptr double, double* %67, i64 %index83
%219 = bitcast double* %218 to <2 x double>*
store <2 x double> %216, <2 x double>* %219, align 8
%220 = getelementptr double, double* %218, i64 2
%221 = bitcast double* %220 to <2 x double>*
store <2 x double> %217, <2 x double>* %221, align 8
%index.next84 = add i64 %index83, 4
%222 = icmp eq i64 %index.next84, %n.vec80
br i1 %222, label %middle.block75, label %vector.body74
middle.block75: ; preds = %vector.body74
%cmp.n86 = icmp eq i64 %43, %n.vec80
br i1 %cmp.n86, label %L139.loopexit.loopexit69, label %scalar.ph76
scalar.ph76: ; preds = %middle.block75, %min.iters.checked78, %if28.us46.preheader
%bc.resume.val85 = phi i64 [ %n.vec80, %middle.block75 ], [ 0, %if28.us46.preheader ], [ 0, %min.iters.checked78 ]
br label %if28.us46
if28.lr.ph.split.split.us: ; preds = %if28.lr.ph.if28.lr.ph.split_crit_edge
br i1 %51, label %if28.us44.us.preheader, label %if28.us44.preheader
if28.us44.preheader: ; preds = %if28.lr.ph.split.split.us
%min.iters.check97 = icmp ult i64 %43, 4
br i1 %min.iters.check97, label %scalar.ph96, label %min.iters.checked98
min.iters.checked98: ; preds = %if28.us44.preheader
%n.vec100 = and i64 %43, -4
%cmp.zero101 = icmp eq i64 %n.vec100, 0
br i1 %cmp.zero101, label %scalar.ph96, label %vector.ph102
vector.ph102: ; preds = %min.iters.checked98
br label %vector.body94
vector.body94: ; preds = %vector.body94, %vector.ph102
%index103 = phi i64 [ 0, %vector.ph102 ], [ %index.next104, %vector.body94 ]
%223 = load double, double* %64, align 8
%224 = insertelement <2 x double> undef, double %223, i32 0
%225 = insertelement <2 x double> %224, double %223, i32 1
%226 = insertelement <2 x double> undef, double %223, i32 0
%227 = insertelement <2 x double> %226, double %223, i32 1
%228 = getelementptr double, double* %65, i64 %index103
%229 = bitcast double* %228 to <2 x double>*
%wide.load111 = load <2 x double>, <2 x double>* %229, align 8
%230 = getelementptr double, double* %228, i64 2
%231 = bitcast double* %230 to <2 x double>*
%wide.load112 = load <2 x double>, <2 x double>* %231, align 8
%232 = load double, double* %66, align 8
%233 = insertelement <2 x double> undef, double %232, i32 0
%234 = insertelement <2 x double> %233, double %232, i32 1
%235 = insertelement <2 x double> undef, double %232, i32 0
%236 = insertelement <2 x double> %235, double %232, i32 1
%237 = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> %225, <2 x double> %wide.load111, <2 x double> %234)
%238 = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> %227, <2 x double> %wide.load112, <2 x double> %236)
%239 = getelementptr double, double* %67, i64 %index103
%240 = bitcast double* %239 to <2 x double>*
store <2 x double> %237, <2 x double>* %240, align 8
%241 = getelementptr double, double* %239, i64 2
%242 = bitcast double* %241 to <2 x double>*
store <2 x double> %238, <2 x double>* %242, align 8
%index.next104 = add i64 %index103, 4
%243 = icmp eq i64 %index.next104, %n.vec100
br i1 %243, label %middle.block95, label %vector.body94
middle.block95: ; preds = %vector.body94
%cmp.n106 = icmp eq i64 %43, %n.vec100
br i1 %cmp.n106, label %L139.loopexit.loopexit68, label %scalar.ph96
scalar.ph96: ; preds = %middle.block95, %min.iters.checked98, %if28.us44.preheader
%bc.resume.val105 = phi i64 [ %n.vec100, %middle.block95 ], [ 0, %if28.us44.preheader ], [ 0, %min.iters.checked98 ]
br label %if28.us44
if28.us44.us.preheader: ; preds = %if28.lr.ph.split.split.us
%min.iters.check118 = icmp ult i64 %43, 4
br i1 %min.iters.check118, label %scalar.ph117, label %min.iters.checked119
min.iters.checked119: ; preds = %if28.us44.us.preheader
%n.vec121 = and i64 %43, -4
%cmp.zero122 = icmp eq i64 %n.vec121, 0
br i1 %cmp.zero122, label %scalar.ph117, label %vector.ph123
vector.ph123: ; preds = %min.iters.checked119
br label %vector.body115
vector.body115: ; preds = %vector.body115, %vector.ph123
%index124 = phi i64 [ 0, %vector.ph123 ], [ %index.next125, %vector.body115 ]
%244 = load double, double* %64, align 8
%245 = insertelement <2 x double> undef, double %244, i32 0
%246 = insertelement <2 x double> %245, double %244, i32 1
%247 = insertelement <2 x double> undef, double %244, i32 0
%248 = insertelement <2 x double> %247, double %244, i32 1
%249 = getelementptr double, double* %65, i64 %index124
%250 = bitcast double* %249 to <2 x double>*
%wide.load132 = load <2 x double>, <2 x double>* %250, align 8
%251 = getelementptr double, double* %249, i64 2
%252 = bitcast double* %251 to <2 x double>*
%wide.load133 = load <2 x double>, <2 x double>* %252, align 8
%253 = getelementptr double, double* %66, i64 %index124
%254 = bitcast double* %253 to <2 x double>*
%wide.load134 = load <2 x double>, <2 x double>* %254, align 8
%255 = getelementptr double, double* %253, i64 2
%256 = bitcast double* %255 to <2 x double>*
%wide.load135 = load <2 x double>, <2 x double>* %256, align 8
%257 = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> %246, <2 x double> %wide.load132, <2 x double> %wide.load134)
%258 = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> %248, <2 x double> %wide.load133, <2 x double> %wide.load135)
%259 = getelementptr double, double* %67, i64 %index124
%260 = bitcast double* %259 to <2 x double>*
store <2 x double> %257, <2 x double>* %260, align 8
%261 = getelementptr double, double* %259, i64 2
%262 = bitcast double* %261 to <2 x double>*
store <2 x double> %258, <2 x double>* %262, align 8
%index.next125 = add i64 %index124, 4
%263 = icmp eq i64 %index.next125, %n.vec121
br i1 %263, label %middle.block116, label %vector.body115
middle.block116: ; preds = %vector.body115
%cmp.n127 = icmp eq i64 %43, %n.vec121
br i1 %cmp.n127, label %L139.loopexit.loopexit67, label %scalar.ph117
scalar.ph117: ; preds = %middle.block116, %min.iters.checked119, %if28.us44.us.preheader
%bc.resume.val126 = phi i64 [ %n.vec121, %middle.block116 ], [ 0, %if28.us44.us.preheader ], [ 0, %min.iters.checked119 ]
br label %if28.us44.us
if28.us44.us: ; preds = %scalar.ph117, %if28.us44.us
%"i#668.043.us45.us" = phi i64 [ %271, %if28.us44.us ], [ %bc.resume.val126, %scalar.ph117 ]
%264 = load double, double* %64, align 8
%265 = getelementptr double, double* %65, i64 %"i#668.043.us45.us"
%266 = load double, double* %265, align 8
%267 = getelementptr double, double* %66, i64 %"i#668.043.us45.us"
%268 = load double, double* %267, align 8
%269 = call double @llvm.fmuladd.f64(double %264, double %266, double %268)
%270 = getelementptr double, double* %67, i64 %"i#668.043.us45.us"
store double %269, double* %270, align 8
%271 = add nuw nsw i64 %"i#668.043.us45.us", 1
%exitcond55 = icmp eq i64 %271, %43
br i1 %exitcond55, label %L139.loopexit.loopexit67, label %if28.us44.us
if28.us44: ; preds = %scalar.ph96, %if28.us44
%"i#668.043.us45" = phi i64 [ %278, %if28.us44 ], [ %bc.resume.val105, %scalar.ph96 ]
%272 = load double, double* %64, align 8
%273 = getelementptr double, double* %65, i64 %"i#668.043.us45"
%274 = load double, double* %273, align 8
%275 = load double, double* %66, align 8
%276 = call double @llvm.fmuladd.f64(double %272, double %274, double %275)
%277 = getelementptr double, double* %67, i64 %"i#668.043.us45"
store double %276, double* %277, align 8
%278 = add nuw nsw i64 %"i#668.043.us45", 1
%exitcond51 = icmp eq i64 %278, %43
br i1 %exitcond51, label %L139.loopexit.loopexit68, label %if28.us44
if28.us46: ; preds = %scalar.ph76, %if28.us46
%"i#668.043.us47" = phi i64 [ %285, %if28.us46 ], [ %bc.resume.val85, %scalar.ph76 ]
%279 = load double, double* %64, align 8
%280 = load double, double* %65, align 8
%281 = getelementptr double, double* %66, i64 %"i#668.043.us47"
%282 = load double, double* %281, align 8
%283 = call double @llvm.fmuladd.f64(double %279, double %280, double %282)
%284 = getelementptr double, double* %67, i64 %"i#668.043.us47"
store double %283, double* %284, align 8
%285 = add nuw nsw i64 %"i#668.043.us47", 1
%exitcond52 = icmp eq i64 %285, %43
br i1 %exitcond52, label %L139.loopexit.loopexit69, label %if28.us46
if28: ; preds = %scalar.ph, %if28
%"i#668.043" = phi i64 [ %291, %if28 ], [ %bc.resume.val, %scalar.ph ]
%286 = load double, double* %64, align 8
%287 = load double, double* %65, align 8
%288 = load double, double* %66, align 8
%289 = call double @llvm.fmuladd.f64(double %286, double %287, double %288)
%290 = getelementptr double, double* %67, i64 %"i#668.043"
store double %289, double* %290, align 8
%291 = add nuw nsw i64 %"i#668.043", 1
%exitcond = icmp eq i64 %291, %43
br i1 %exitcond, label %L139.loopexit.loopexit70, label %if28
} |
Codecov Report
@@ Coverage Diff @@
## master #57 +/- ##
==========================================
+ Coverage 15.89% 19.25% +3.36%
==========================================
Files 30 30
Lines 1076 1101 +25
==========================================
+ Hits 171 212 +41
+ Misses 905 889 -16
Continue to review full report at Codecov.
|
I'm not completely sure yet about when exactly julia> A = reshape(1:4, 2, :)
2×2 Base.ReshapedArray{Int64,2,UnitRange{Int64},Tuple{}}:
1 3
2 4
julia> x = 1:2
1:2
julia> y = 3:4
3:4
julia> A*x.+y
2-element Array{Int64,1}:
10
14
julia> muladd.(A, x, y)
2×2 Array{Int64,2}:
4 6
8 12 So I would suggest only applying |
It seems that because of JuliaLang/julia#22255 it is currently not possible to immediately apply |
1 similar comment
That's catching all of the broadcast code, but in the end it's actually doing a |
Yes broadcasts work with |
The other option is to split them into separate statements: @. tmp = a1*k1 + a2*k2 + a3*k3 + a4*k4
@. tmp += a5*k5 + a6*k6 + a7*k7 + a8*k8 But that is a lot of code churn for what will hopefully get fixed soon. |
Sounds good. Looks like we're all set. |
Hi!
As a reaction to SciML/OrdinaryDiffEq.jl#86 (comment), this PR changes the current implementation of
@muladd
in the following ways:muladd
is applied recursivelyx
,y
,z
,+
,*
in the expressionx*y+z
) is a dot call,muladd
is also applied as a dot call, resulting inmuladd.(x,y,z)
This is a first step towards fixing SciML/OrdinaryDiffEq.jl#86 (comment), since with this new implementation use of
@muladd
in https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/integrators/verner_rk_integrators.jl leads to the following expressions:Moreover, it enables a shorter notation. The last example could also be written as
which is transformed to the same expression
Or e.g. in line https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/integrators/verner_rk_integrators.jl#L207
can be replaced by
which leads to