Instantiate global arrays only once #6775

vezenovm · 2024-12-11T19:34:52Z

Aim

I should be able to use large global array constants in a loop. Global constants cannot be mutable so I should be able to just be performing a read on an already existing array value.

Expected Behavior

The loop from regression_4709 should not take very long to compute:

    let mut acc: Field = 0;
    for i in 0..257 {
        for j in 0..257 {
            acc += EXPONENTIATE[i][j];
        }
    }
    assert(!acc.lt(x));
    assert(x != y);

However, regression_4709 is not run as part of our brillig execution opcodes report as it takes too much time/memory.

Bug

Reading from a global array in a loop can lead to a massive execution blowup as we are generating the array and incrementing its reference counter inside of the loop.

If we take regression_4709, and reduce the EXPONENTIATE array to be 10 x 10 rather than 257 x 257 we can see this blowup.

global EXPONENTIATE: [[Field; 10]; 10] = [
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
];
fn main(x: Field, y: pub Field) {
    let mut acc: Field = 0;
    for i in 0..10 {
        for j in 0..10 {
            acc += EXPONENTIATE[i][j];
        }
    }
    assert(!acc.lt(x));
    assert(x != y);
}

The initial SSA we generate is the following:

brillig(inline) fn main f0 {
  b0(v0: Field, v1: Field):
    v4 = allocate -> &mut Field
    store Field 0 at v4
    jmp b1(u32 0)
  b1(v2: u32):
    v8 = lt v2, u32 10
    jmpif v8 then: b3, else: b2
  b2():
    v9 = load v4 -> Field
    v11 = call f1(v9, v0) -> u1
    v12 = not v11
    constrain v11 == u1 0
    v14 = eq v0, v1
    v15 = not v14
    constrain v14 == u1 0
    return
  b3():
    jmp b4(u32 0)
  b4(v3: u32):
    v16 = lt v3, u32 10
    jmpif v16 then: b6, else: b5
  b5():
    v18 = add v2, u32 1
    jmp b1(v18)
  b6():
    v19 = load v4 -> Field
    v21 = make_array [Field 1, Field 1, Field 1, Field 1, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] : [Field; 10]
    v22 = make_array [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] : [Field; 10]
    v23 = make_array [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] : [Field; 10]
    v24 = make_array [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] : [Field; 10]
    v25 = make_array [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] : [Field; 10]
    v26 = make_array [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] : [Field; 10]
    v27 = make_array [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] : [Field; 10]
    v28 = make_array [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] : [Field; 10]
    v29 = make_array [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] : [Field; 10]
    v30 = make_array [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] : [Field; 10]
    v31 = make_array [v21, v22, v23, v24, v25, v26, v27, v28, v29, v30] : [[Field; 10]; 10]
    v32 = array_get v31, index v2 -> [Field; 10]
    inc_rc v32
    v33 = array_get v32, index v3 -> Field
    v34 = add v19, v33
    store v34 at v4
    v35 = add v3, u32 1
    jmp b4(v35)
}

You will see that we are creating our nested array inside of the inner most nested loop. If we were to switch the global to a let statement inside of main we see these make_array instructions are laid down in b0. The global constant version executes 12.5k brillig opcodes and the local variable version executes 2.2k.

Once we get to constant folding the repeat make_array instructions are replaced by inc_rc instructions due to this logic when deduplicating make arrays

noir/compiler/noirc_evaluator/src/ssa/opt/constant_folding.rs

Line 321 in f6957fa

dfg.insert_instruction_and_results(inc_rc, block, None, call_stack);

.

Our final SSA then looks like this:

brillig(inline) fn main f0 {
  b0(v0: Field, v1: Field):
    v4 = allocate -> &mut Field
    store Field 0 at v4
    jmp b1(u32 0)
  b1(v2: u32):
    v8 = lt v2, u32 10
    jmpif v8 then: b3, else: b2
  b2():
    v9 = load v4 -> Field
    v11 = call field_less_than(v9, v0) -> u1
    constrain v11 == u1 0
    v13 = eq v0, v1
    constrain v13 == u1 0
    return
  b3():
    jmp b4(u32 0)
  b4(v3: u32):
    v14 = lt v3, u32 10
    jmpif v14 then: b6, else: b5
  b5():
    v16 = add v2, u32 1
    jmp b1(v16)
  b6():
    v17 = load v4 -> Field
    v19 = make_array [Field 1, Field 1, Field 1, Field 1, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] : [Field; 10]
    v20 = make_array [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] : [Field; 10]
    inc_rc v20
    inc_rc v20
    inc_rc v20
    inc_rc v20
    inc_rc v20
    inc_rc v20
    inc_rc v20
    inc_rc v20
    v21 = make_array [v19, v20, v20, v20, v20, v20, v20, v20, v20, v20] : [[Field; 10]; 10]
    v22 = array_get v21, index v2 -> [Field; 10]
    inc_rc v22
    v23 = array_get v22, index v3 -> Field
    v24 = add v17, v23
    store v24 at v4
    v25 = add v3, u32 1
    jmp b4(v25)
}

This array creation logic should be outside of the loop and we should not see a difference in execution trace when working with a global or local array.

To Reproduce

Go to execution_success/regression_4709
Reduce size of exponentiation
Compile with --show-ssa
Inspect SSA

Workaround

Yes

Workaround Description

Use a local array to reduce execution trace

Additional Context

No response

Project Impact

Blocker

Blocker Context

There is a workaround, but unless a user is aware of this internal compiler bug they will not be aware of why their program is so slow. Thus, I have marked it as a blocker.

Nargo Version

nargo version = 1.0.0-beta.0 noirc version = 1.0.0-beta.0+f1756ef1b2fe78da24d250bba3534870a00685d4 (git version hash: 2a840ca, is dirty: true)

NoirJS Version

No response

Proving Backend Tooling & Version

No response

Would you like to submit a PR for this Issue?

Yes

Support Needs

No response

The text was updated successfully, but these errors were encountered:

vezenovm added brillig Unconstrained functions / brillig IR bug Something isn't working ssa labels Dec 11, 2024

vezenovm self-assigned this Dec 11, 2024

github-project-automation bot added this to Noir Dec 11, 2024

github-project-automation bot moved this to 📋 Backlog in Noir Dec 11, 2024

vezenovm mentioned this issue Dec 12, 2024

feat(ssa): Hoist MakeArray instructions during loop invariant code motion #6782

Merged

5 tasks

vezenovm closed this as completed in #6782 Dec 12, 2024

github-project-automation bot moved this from 📋 Backlog to ✅ Done in Noir Dec 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Instantiate global arrays only once #6775

Instantiate global arrays only once #6775

vezenovm commented Dec 11, 2024

Instantiate global arrays only once #6775

Instantiate global arrays only once #6775

Comments

vezenovm commented Dec 11, 2024

Aim

Expected Behavior

Bug

To Reproduce

Workaround

Workaround Description

Additional Context

Project Impact

Blocker Context

Nargo Version

NoirJS Version

Proving Backend Tooling & Version

Would you like to submit a PR for this Issue?

Support Needs