
Conditional RAM writes scale nonlinearly in loops #5027

Open
sirasistant opened this issue May 13, 2024 · 24 comments
Labels
bug Something isn't working

Comments

@sirasistant
Contributor

sirasistant commented May 13, 2024

Aim

Using conditional RAM writes:

global SIZE = 500;

fn main(mut fields: [Field; SIZE], enables: [bool; SIZE], indices: [u64; SIZE]) -> pub [Field; SIZE] {
    for i in 0..SIZE {
        if enables[i] {
            fields[indices[i]] = i as Field;
        }
    }

    fields
}

Expected Behavior

I expected conditional RAM writes to scale with the number of iterations in the same way as unconditional ones:

global SIZE = 500;

fn main(mut fields: [Field; SIZE], indices: [u64; SIZE]) -> pub [Field; SIZE] {
    for i in 0..SIZE {
        fields[indices[i]] = i as Field;
    }

    fields
}

Bug

The first example scales nonlinearly: with SIZE=30 it's 10k constraints (333 constraints/iteration) and with SIZE=500 it's 1.65M constraints (3300 constraints/iteration).

The second example (unconditional) scales sublinearly (possibly because of fixed costs): with SIZE=100 it's 4.6k constraints and with SIZE=500 it's 12k constraints.

Also, it seems like even without the nonlinear scaling, conditional RAM writes are quite expensive (10x the cost of an unconditional one in the best case).


@sirasistant sirasistant added the bug Something isn't working label May 13, 2024
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Noir May 13, 2024
@TomAFrench
Member

SSA for the expensive program with SIZE = 2

After Array Set Optimizations:
acir(inline) fn main f0 {
  b0(v0: [Field; 2], v1: [u1; 2]):
    v92 = array_get v1, index u64 0
    enable_side_effects v92
    v93 = array_get v0, index u64 0
    enable_side_effects u1 1
    v95 = cast v92 as Field
    v97 = mul v95, v93
    v98 = cast v92 as u64
    v100 = array_get v1, index u64 1
    enable_side_effects v100
    v101 = array_get v0, index u64 1
    v102 = lt v98, u64 2
    v103 = mul v102, v100
    constrain v103 == v100 'push out of bounds'
    v104 = array_set [v97, Field 0], index v98, value v101
    v105 = add v98, u64 1
    v106 = not v100
    enable_side_effects v100
    v107 = array_get v104, index v98
    v108 = array_get [v97, Field 0], index v98
    v109 = cast v100 as Field
    v110 = cast v106 as Field
    v111 = mul v109, v107
    v112 = mul v110, v108
    v113 = add v111, v112
    v114 = array_set mut v104, index v98, value v113
    enable_side_effects u1 1
    v115 = cast v100 as u64
    v116 = cast v106 as u64
    v117 = mul v115, v105
    v118 = mul v116, v98
    v119 = add v117, v118
    constrain v119 == u64 0
    return v114
}

One thing that jumps out is the check that we're not pushing past the capacity of the BoundedVec. We have enough information here that we should be able to rule that out at compile time.

@TomAFrench
Member

assert(self.len < MaxLen, "push out of bounds");

We should be able to just remove this assertion (or throw the error in an unconstrained block if we want to maintain the error message). The attempt to write past the end of the array would trigger the ACVM to fail.
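
A minimal sketch of that idea (hypothetical code, not the actual stdlib implementation; the function name and parameter types are illustrative):

// Hypothetical helper: the friendly error message is only evaluated in an
// unconstrained context, so it adds no constraints. If a push really does
// write past the end of the array, the ACVM still fails at execution time.
unconstrained fn assert_push_in_bounds(len: u32, max_len: u32) {
    assert(len < max_len, "push out of bounds");
}

Inside BoundedVec::push, the constrained assert would then be replaced by a call along the lines of assert_push_in_bounds(self.len, MaxLen).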

@TomAFrench
Member

This ends up being just a small contribution (~4000 constraints).

@guipublic
Contributor

Did you really compare the number of constraints or just the number of acir opcodes?
Because 25 constraints for writing 500 times into an array of 500 elements does not seem right. I would expect ~2000 instead.

@sirasistant
Contributor Author

sirasistant commented May 28, 2024

@guipublic

I was referring to 25 constraints/item, will update first message!

This function:

global SIZE = 500;

fn main(fields: [Field; SIZE], to_keep: [bool; SIZE]) -> pub [Field; SIZE] {
    let mut bounded_vec: BoundedVec<Field, SIZE> = BoundedVec::new();

    for i in 0..SIZE {
        if to_keep[i] {
            bounded_vec.push(fields[i]);
        }
    }

    assert_eq(bounded_vec.len(), 0);
    bounded_vec.storage
}

is 1.6M gates

This one:

global SIZE = 500;

fn main(mut fields: [Field; SIZE], indices: [u64; SIZE]) -> pub [Field; SIZE] {
    for i in 0..SIZE {
        fields[indices[i]] = i as Field;
    }

    fields
}

is 12k gates

@jfecher
Contributor

jfecher commented May 28, 2024

One thing that jumps out is the check that we're not pushing past the capacity of the BoundedVec. We have enough information here that we should be able to rule that out at compile time.

This looks like it'd be difficult to rule out at compile time. We know that we only push at most once per loop and only loop SIZE times, but the compiler just sees that each push is conditional on to_keep and can't keep an exact count. If we wanted to remove this check we'd need much more meticulous tracking of maximum bounds, probably tracking each individual case of the value (e.g. len can be 0 or 1 here) and then verifying that all of those cases are < 2.
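
Spelling out that case tracking for a SIZE = 2 version of the program (purely illustrative; the compiler does not track values this way today):

// possible values of bounded_vec.len with SIZE = 2:
//   before iteration 0: {0}
//   after iteration 0:  {0, 1}      (push taken or skipped)
//   after iteration 1:  {0, 1, 2}
// Every push only executes in cases where len was still below 2, so the
// "push out of bounds" assert can never actually fire - but proving that
// requires tracking the whole set of possible values, not just one bound.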

@sirasistant
Contributor Author

The length check is not the main contributor though; there seems to be some weird scaling with loop sizes going on. Constraint counts per iteration become much more similar if we decrease the number of iterations to 20. In that case, the conditional version costs 320 constraints/write (a 10x reduction per iteration just from scaling down the number of iterations).

@sirasistant
Contributor Author

To avoid any BoundedVec-related overhead, I tried this code:

global SIZE = 500;

fn main(mut fields: [Field; SIZE], enables: [bool; SIZE], indices: [u64; SIZE]) -> pub [Field; SIZE] {
    for i in 0..SIZE {
        if enables[i] {
            fields[indices[i]] = i as Field;
        }
    }

    fields
}

And it also scales nonlinearly. With size=500 it's 1.65M constraints (3300 constraints/iteration) and with size=30 it's 10k constraints (333 constraints/iteration)

@sirasistant sirasistant changed the title Conditional RAM writes cost many constraints Conditional RAM writes scale nonlinearly in loops May 29, 2024
@sirasistant
Contributor Author

Updating the original issue

@jfecher
Contributor

jfecher commented May 29, 2024

And it also scales nonlinearly. With size=500 it's 1.65M constraints (3300 constraints/iteration) and with size=30 it's 10k constraints (333 constraints/iteration)

Sounds related to #4629 then.

Testing your most recent example in the comment I get:

SIZE | Acir Opcodes | Change
2    |     57       |
3    |     88       |   31
4    |     121      |   33
5    |     156      |   35

So it is increasing by 2 per each increase in SIZE.
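
For what it's worth, those four data points fit a quadratic exactly (just a fit of the table above, not a derived cost model):

opcodes(SIZE) = SIZE^2 + 26*SIZE + 1
  SIZE = 2:  4 +  52 + 1 =  57
  SIZE = 3:  9 +  78 + 1 =  88
  SIZE = 4: 16 + 104 + 1 = 121
  SIZE = 5: 25 + 130 + 1 = 156

A constant second difference of 2 is exactly what a quadratic with leading coefficient 1 looks like.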

@sirasistant
Contributor Author

Is there a way to use the as_witness intrinsic to fix this nonlinear scaling, or is it not fixable by that? I haven't looked much into the actual constraints being generated for predicated RAM writes yet.

@vezenovm
Contributor

For this program where the SIZE = 2:

global SIZE = 2;

fn main(mut fields: [Field; SIZE], enables: [bool; SIZE], indices: [u64; SIZE]) -> pub [Field; SIZE] {
    for i in 0..SIZE {
        if enables[i] {
            fields[indices[i]] = i as Field;
        }
    }

    fields
}

we get the following SSA:

acir(inline) fn main f0 {
  b0(v0: [Field; 2], v1: [u1; 2], v2: [u64; 2]):
    inc_rc v0
    v69 = array_get v1, index u32 0
    enable_side_effects v69
    v70 = array_get v2, index u32 0
    v71 = cast v70 as u32
    v72 = array_set v0, index v71, value Field 0
    v74 = not v69
    v75 = array_get v72, index v71
    v76 = array_get v0, index v71
    v77 = cast v69 as Field
    v78 = cast v74 as Field
    v79 = mul v77, v75
    v80 = mul v78, v76
    v81 = add v79, v80
    v82 = array_set mut v72, index v71, value v81
    enable_side_effects u1 1
    v83 = array_get v1, index u32 1
    enable_side_effects v83
    v84 = array_get v2, index u32 1
    v85 = cast v84 as u32
    v86 = array_set v82, index v85, value Field 1
    v88 = not v83
    v89 = array_get v86, index v85
    v90 = array_get v82, index v85
    v91 = cast v83 as Field
    v92 = cast v88 as Field
    v93 = mul v91, v89
    v94 = mul v92, v90
    v95 = add v93, v94
    v96 = array_set mut v86, index v85, value v95
    enable_side_effects u1 1
    dec_rc v0
    return v96
}

We have two places where we are not optimally setting up the array gets/sets.

    v72 = array_set v0, index v71, value Field 0
    v74 = not v69
    v75 = array_get v72, index v71
    v76 = array_get v0, index v71

and

    v86 = array_set v82, index v85, value Field 1
    v88 = not v83
    v89 = array_get v86, index v85
    v90 = array_get v82, index v85

For both v76 = array_get v0, index v71 and v90 = array_get v82, index v85, we can call the array_get before the array_set, but right now we end up writing to and copying the dynamic array when we do not need to do so. For example, we could instead read from v82 before the array_set, and then we can have a mut v82 in the array_set. This saves us having to copy the dynamic array.

This is in line with the conclusion from this comment above:

So it is increasing by 2 per each increase in SIZE.

As we increase SIZE by one we have two extra dynamic reads to copy over the array from the previous SIZE.

@jfecher
Contributor

jfecher commented Jul 12, 2024

As we increase SIZE by one we have two extra dynamic reads to copy over the array from the previous SIZE.

That sounds like it would just be a change in velocity of 2 extra constraints, not an acceleration of 2 extra constraints. If there were a constant 2 extra constraints per each SIZE increase we'd see no change in acceleration / nonlinear scaling.

(pardon my velocity/acceleration analogy)

@vezenovm
Contributor

vezenovm commented Jul 12, 2024

That sounds like it would just be a change in velocity of 2 extra constraints, not an acceleration of 2 extra constraints. If there were a constant 2 extra constraints per each SIZE increase we'd see no change in acceleration / nonlinear scaling.

For clarity's sake: I am using ACIR gates as my metric, as I find them a bit easier to reason about for ACIR gen than backend constraint counts.

If we increase SIZE by one we have two extra dynamic reads to copy over the array from the previous SIZE, but we also perform that copy once more per extra iteration. E.g. if we made SIZE = 6 the SSA above would repeat three more times, so we would be copying and initializing a new array of size 6 six times, rather than copying and initializing a new size-2 array twice. This is what makes the scaling quadratic rather than linear.
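
As a rough cost model implied by this (orders of growth only, not exact opcode counts):

per iteration: copy/re-initialize an array of SIZE elements  ->  O(SIZE) memory operations
iterations:    SIZE
total:         SIZE * O(SIZE) = O(SIZE^2)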

@jfecher
Contributor

jfecher commented Jul 12, 2024

Thanks for the explanation, I misunderstood before.

I wonder if #5449 would help (or fix 🤞) this. With that I'd expect:

    v72 = array_set v0, index v71, value Field 0
    v74 = not v69
    v75 = array_get v72, index v71
    v76 = array_get v0, index v71

To be turned into

    v72 = array_set v0, index v71, value Field 0
    v74 = not v69
    // v75 optimized to constant Field 0
    v76 = array_get v0, index v71

and

    v86 = array_set v82, index v85, value Field 1
    v88 = not v83
    v89 = array_get v86, index v85
    v90 = array_get v82, index v85

to be optimized to:

    v86 = array_set v82, index v85, value Field 1
    v88 = not v83
    // v89 optimized to constant Field 1
    v90 = array_get v82, index v85

@TomAFrench
Member

TomAFrench commented Jul 12, 2024

I was having a look at whether #5449 would help with this as well. I get the SSA:

After Array Set Optimizations:
acir(inline) fn main f0 {
  b0(v0: [Field; 2], v1: [u1; 2], v2: [u64; 2]):
    inc_rc v0
    v59 = array_get v1, index u32 0
    enable_side_effects v59
    v60 = array_get v2, index u32 0
    v61 = cast v60 as u32
    v62 = array_set v0, index v61, value Field 0
    v64 = not v59
    v65 = array_get v0, index v61
    v67 = cast v64 as Field
    v68 = mul v67, v65
    v69 = array_set mut v62, index v61, value v68
    enable_side_effects u1 1
    v70 = array_get v1, index u32 1
    enable_side_effects v70
    v71 = array_get v2, index u32 1
    v72 = cast v71 as u32
    v73 = array_set v69, index v72, value Field 1
    v75 = not v70
    v76 = array_get v69, index v72
    v77 = cast v70 as Field
    v78 = cast v75 as Field
    v79 = mul v78, v76
    v80 = add v77, v79
    v81 = array_set mut v73, index v72, value v80
    enable_side_effects u1 1
    dec_rc v0
    return v81
}

which lines up with your predictions.

@jfecher
Contributor

jfecher commented Jul 12, 2024

Testing Maxime's example with that PR on different sizes I get fewer constraints:

size | acir opcodes
2    | 43
3    | 67
4    | 93

But it is still accelerating by 2 each time.
(For reference on master the opcode count for these sizes is 51, 79, and 109 respectively)

@vezenovm
Contributor

From Tom's new SSA above we get fewer constraints, but we still have this block:

v73 = array_set v69, index v72, value Field 1
v75 = not v70
v76 = array_get v69, index v72

The issue is that this array_get should be able to happen before this array_set. Then v69 can be a mut v69.
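
i.e. roughly this ordering instead (a sketch reusing the value names above):

    v76 = array_get v69, index v72                      // read the old value before it is overwritten
    v73 = array_set mut v69, index v72, value Field 1   // the set can now mutate v69 in place
    v75 = not v70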

@jfecher
Contributor

jfecher commented Jul 12, 2024

    v73 = array_set v69, index v72, value Field 1
    v75 = not v70
    v76 = array_get v69, index v72

Odd, I'd have expected v76 to be optimized out by Tom's PR.
I'd also think we don't want the array_get to be before the array_set though since that'd change the result.

@vezenovm
Contributor

I'd also think we don't want the array_get to be before the array_set though since that'd change the result.

v69 is immutable no?

@jfecher
Contributor

jfecher commented Jul 12, 2024

v69 is immutable no?

Right, my bad. I was staring at that snippet for a bit too long 😅

@TomAFrench
Member

Odd, I'd have expected v76 to be optimized out by Tom's PR.

That would require the read to be performed on v73 rather than v69

@jfecher
Contributor

jfecher commented Jul 12, 2024

That would require the read to be performed on v73 rather than v69

Right, I forgot how array_set works

@TomAFrench
Member

We should be able to reorder these array reads/writes; my only concern is whether this could change the error message emitted in some cases, e.g. moving an out-of-bounds read earlier in the program could make execution appear to "jump" to a later point in the program.
