Inconsistent performance with order of operations with constant #626

AxisAngles · 2022-08-05T09:42:31Z

Usually, one should try to have constants of multiplicaiton at the front so that the constant value can be pre-computed for performance. For example, doing 3^0.5/2*x instead of x*3^0.5/2. However, If one does this in Luau, there is a considerable performance penalty, and the latter ends up being faster.

To reproduce this bug, run this code:

local sum01 = 0
local sum12 = 0
local sum23 = 0
local sum34 = 0

for i = 1, 10 do

	local x = 0
	local const = 1.234

	local t0 = os.clock()
	
	for i = 1, 2_000_000 do
		x = i*1.234
		x = i*1.234
		x = i*1.234
		x = i*1.234
		x = i*1.234
	end

	local t1 = os.clock()

	for i = 1, 2_000_000 do
		x = 1.234*i
		x = 1.234*i
		x = 1.234*i
		x = 1.234*i
		x = 1.234*i
	end

	local t2 = os.clock()

	for i = 1, 2_000_000 do
		x = i*const
		x = i*const
		x = i*const
		x = i*const
		x = i*const
	end

	local t3 = os.clock()

	for i = 1, 2_000_000 do
		x = const*i
		x = const*i
		x = const*i
		x = const*i
		x = const*i
	end

	local t4 = os.clock()
	
	sum01 += t1 - t0
	sum12 += t2 - t1
	sum23 += t3 - t2
	sum34 += t4 - t3
end

print("i*1.234", sum01)
print("1.234*i", sum12)
print("i*const", sum23)
print("const*i", sum34)

Output should look something like this:

i*1.234 0.17914739972911775
1.234*i 0.25017180014401674
i*const 0.1743612999562174
const*i 0.19351800007279962

In this case, multiplication with a constant up front is specifically slower than all other options.

The text was updated successfully, but these errors were encountered:

zeux · 2022-08-05T16:05:51Z

Yeah this is a current limitation of the bytecode encoding. For optimal performance right now you'd want something like var * (c1 * c2 / c3).

We were planning on lifting that for non-commutative operators (/ and - which are right now in an odd spot where 1/x or 1-x are simply not possible to reverse), but we'd need to look into how all of this works out from the balance perspective when applied to all operators, as especially for + we mostly see constants on the right hand side in the wild:

benchmarks:

      6 ADD k r
    327 ADD r k
    243 ADD r r
      9 DIV k r
     43 DIV r k
     32 DIV r r
     21 MOD r k
      4 MOD r r
     42 MUL k r
     77 MUL r k
    224 MUL r r
      1 POW k r
     11 POW r k
     15 SUB k r
    118 SUB r k
     92 SUB r r

lua-apps:

    225 ADD k r
   2873 ADD r k
   2344 ADD r r
     79 DIV k r
    787 DIV r k
    488 DIV r r
    112 MOD r k
     59 MOD r r
    435 MUL k r
    769 MUL r k
   1854 MUL r r
     14 POW k r
     18 POW r k
      3 POW r r
    298 SUB k r
   1259 SUB r k
   2059 SUB r r

zeux · 2022-08-05T16:22:37Z

A quick test indicates that this would increase the interpreter code (binary) size by about 1 KB, so this will need to wait until we stop supporting older bytecode versions so that we don't increase the L1 cache footprint unnecessarily.

Right now, we can compile R\*K for all arithmetic instructions, but K\*R gets compiled into two instructions (LOADN/LOADK + arithmetic opcode). This is problematic since it leads to reduced performance for some code. However, we'd like to avoid adding reverse variants of ADDK et al for all opcodes to avoid the increase in I$ footprint for interpreter. Looking at the arithmetic instructions, % and // don't have interesting use cases for K\*V; ^ is sometimes used with constant on the left hand side but this would need to call pow() by necessity in all cases so it would be slow regardless of the dispatch overhead. This leaves the four basic arithmetic operations. For + and \*, we can implement a compiler-side optimization in the future that transforms K\*R to R\*K automatically. This could either be done unconditionally at -O2, or conditionally based on the type of the value (driven by type annotations / inference) -- this technically changes behavior in presence of metamethods, although it might be sensible to just always do this because non-commutative +/* are evil. However, for - and / it is impossible for the compiler to optimize this in the future, so we need dedicated opcodes. This only increases the interpreter size by ~300 bytes (~1.5%) on X64. This makes spectral-norm and math-partial-sums 6% faster; maybe more importantly, voxelgen gets 1.5% faster (so this change does have real-world impact). To avoid the proliferation of bytecode versions this change piggybacks on the bytecode version bump that was just made in 604 for vector constants; we would still be able to enable these independently but we'll consider v5 complete when both are enabled. Related: #626 --------- Co-authored-by: vegorov-rbx <[email protected]>

When type information is specified, we can compile k*n and k+n into MULK/ADDK forms that are faster to execute, as long as we think n is a number. Since we generally restrict type aware optimizations to O2, this does that as well. This makes trig benchmark ~4% faster on Apple M2 in VM, and also a tiny improvement on scimark (~0.1%) can be observed. The optimization only affects interpreted execution, as NCG already can synthesize optimal code here. If the type information is not truthful (e.g. user annotates type as a number and it's not), the worst case scenario is flipped arguments to metamethods like __add/__mul for constant left hand side. Fixes #626 (the fix requires type information or NCG but I doubt any further work on this is warranted) --------- Co-authored-by: vegorov-rbx <[email protected]>

AxisAngles added the bug label Aug 5, 2022

zeux self-assigned this Aug 5, 2022

zeux mentioned this issue Nov 27, 2023

Add SUBRK and DIVRK bytecode instructions to bytecode v5 #1115

Merged

zeux mentioned this issue Nov 16, 2024

Compiler: Optimize k*n and k+n when types are known #1529

Merged

vegorov-rbx closed this as completed in #1529 Nov 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent performance with order of operations with constant #626

Inconsistent performance with order of operations with constant #626

AxisAngles commented Aug 5, 2022 •

edited

Loading

zeux commented Aug 5, 2022

zeux commented Aug 5, 2022

Inconsistent performance with order of operations with constant #626

Inconsistent performance with order of operations with constant #626

Comments

AxisAngles commented Aug 5, 2022 • edited Loading

zeux commented Aug 5, 2022

zeux commented Aug 5, 2022

AxisAngles commented Aug 5, 2022 •

edited

Loading