Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent performance with order of operations with constant #626

Closed
AxisAngles opened this issue Aug 5, 2022 · 2 comments · Fixed by #1529
Closed

Inconsistent performance with order of operations with constant #626

AxisAngles opened this issue Aug 5, 2022 · 2 comments · Fixed by #1529
Assignees
Labels
bug Something isn't working

Comments

@AxisAngles
Copy link

AxisAngles commented Aug 5, 2022

Usually, one should try to have constants of multiplicaiton at the front so that the constant value can be pre-computed for performance. For example, doing 3^0.5/2*x instead of x*3^0.5/2. However, If one does this in Luau, there is a considerable performance penalty, and the latter ends up being faster.

To reproduce this bug, run this code:

local sum01 = 0
local sum12 = 0
local sum23 = 0
local sum34 = 0

for i = 1, 10 do

	local x = 0
	local const = 1.234

	local t0 = os.clock()
	
	for i = 1, 2_000_000 do
		x = i*1.234
		x = i*1.234
		x = i*1.234
		x = i*1.234
		x = i*1.234
	end

	local t1 = os.clock()

	for i = 1, 2_000_000 do
		x = 1.234*i
		x = 1.234*i
		x = 1.234*i
		x = 1.234*i
		x = 1.234*i
	end

	local t2 = os.clock()

	for i = 1, 2_000_000 do
		x = i*const
		x = i*const
		x = i*const
		x = i*const
		x = i*const
	end

	local t3 = os.clock()

	for i = 1, 2_000_000 do
		x = const*i
		x = const*i
		x = const*i
		x = const*i
		x = const*i
	end

	local t4 = os.clock()
	
	sum01 += t1 - t0
	sum12 += t2 - t1
	sum23 += t3 - t2
	sum34 += t4 - t3
end

print("i*1.234", sum01)
print("1.234*i", sum12)
print("i*const", sum23)
print("const*i", sum34)

Output should look something like this:

i*1.234 0.17914739972911775
1.234*i 0.25017180014401674
i*const 0.1743612999562174
const*i 0.19351800007279962

In this case, multiplication with a constant up front is specifically slower than all other options.

@AxisAngles AxisAngles added the bug Something isn't working label Aug 5, 2022
@zeux
Copy link
Collaborator

zeux commented Aug 5, 2022

Yeah this is a current limitation of the bytecode encoding. For optimal performance right now you'd want something like var * (c1 * c2 / c3).

We were planning on lifting that for non-commutative operators (/ and - which are right now in an odd spot where 1/x or 1-x are simply not possible to reverse), but we'd need to look into how all of this works out from the balance perspective when applied to all operators, as especially for + we mostly see constants on the right hand side in the wild:

benchmarks:

      6 ADD k r
    327 ADD r k
    243 ADD r r
      9 DIV k r
     43 DIV r k
     32 DIV r r
     21 MOD r k
      4 MOD r r
     42 MUL k r
     77 MUL r k
    224 MUL r r
      1 POW k r
     11 POW r k
     15 SUB k r
    118 SUB r k
     92 SUB r r

lua-apps:

    225 ADD k r
   2873 ADD r k
   2344 ADD r r
     79 DIV k r
    787 DIV r k
    488 DIV r r
    112 MOD r k
     59 MOD r r
    435 MUL k r
    769 MUL r k
   1854 MUL r r
     14 POW k r
     18 POW r k
      3 POW r r
    298 SUB k r
   1259 SUB r k
   2059 SUB r r

@zeux
Copy link
Collaborator

zeux commented Aug 5, 2022

A quick test indicates that this would increase the interpreter code (binary) size by about 1 KB, so this will need to wait until we stop supporting older bytecode versions so that we don't increase the L1 cache footprint unnecessarily.

@zeux zeux self-assigned this Aug 5, 2022
vegorov-rbx added a commit that referenced this issue Nov 28, 2023
Right now, we can compile R\*K for all arithmetic instructions, but K\*R
gets compiled into two instructions (LOADN/LOADK + arithmetic opcode).

This is problematic since it leads to reduced performance for some code.
However, we'd like to avoid adding reverse variants of ADDK et al for
all opcodes to avoid the increase in I$ footprint for interpreter.

Looking at the arithmetic instructions, % and // don't have interesting
use cases for K\*V; ^ is sometimes used with constant on the left hand
side but this would need to call pow() by necessity in all cases so it
would be slow regardless of the dispatch overhead. This leaves the four
basic arithmetic operations.

For + and \*, we can implement a compiler-side optimization in the
future that transforms K\*R to R\*K automatically. This could either be
done unconditionally at -O2, or conditionally based on the type of the
value (driven by type annotations / inference) -- this technically
changes behavior in presence of metamethods, although it might be
sensible to just always do this because non-commutative +/* are evil.

However, for - and / it is impossible for the compiler to optimize this
in the future, so we need dedicated opcodes. This only increases the
interpreter size by ~300 bytes (~1.5%) on X64.

This makes spectral-norm and math-partial-sums 6% faster; maybe more
importantly, voxelgen gets 1.5% faster (so this change does have
real-world impact).

To avoid the proliferation of bytecode versions this change piggybacks
on the bytecode version bump that was just made in 604 for vector
constants; we would still be able to enable these independently but
we'll consider v5 complete when both are enabled.

Related: #626

---------

Co-authored-by: vegorov-rbx <[email protected]>
vegorov-rbx added a commit that referenced this issue Nov 19, 2024
When type information is specified, we can compile k*n and k+n into
MULK/ADDK forms that are faster to execute, as long as we think n is a
number. Since we generally restrict type aware optimizations to O2, this
does that as well.

This makes trig benchmark ~4% faster on Apple M2 in VM, and also a tiny
improvement on scimark (~0.1%) can be observed. The optimization only
affects interpreted execution, as NCG already can synthesize optimal
code here.

If the type information is not truthful (e.g. user annotates type as a
number and it's not), the worst case scenario is flipped arguments to
metamethods like __add/__mul for constant left hand side.

Fixes #626 (the fix requires type information or NCG but I doubt any
further work on this is warranted)

---------

Co-authored-by: vegorov-rbx <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Development

Successfully merging a pull request may close this issue.

2 participants