Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Make nargo::ops::transform_program idempotent #6695

Merged

Conversation

aakoshh
Copy link
Contributor

@aakoshh aakoshh commented Dec 4, 2024

Description

Problem*

Followup for #6694

Summary*

Makes nargo::ops::transform_program idempotent by adding two loops, both up to 3 passes, exiting early if the opcode hash doesn't change:

  • an outer loop in acvm::compiler::compile that includes optimize_internal and transform_internal
  • an inner loop in acvm::compiler::transformers::transform_internal

Additional Context

Here's the journey I went through investigating the source of changes across optimisation runs.

Ever increasing current_witness_index

On the slice_loop example the Circuit::current_witness_index field increased by 2 after each pass. This seems to be because:

  1. It is increased by the number of intermediate variables created here even though the mutable reference is also increased, hence the double increment
  2. The intermediate variables are (in this case entirely) eliminated later, but the current index isn't adjusted accordingly.

The PR adds a brute force step to visit each opcode and collect the remaining witnesses to set the next one correctly.

Alternatively we could implement PartialEq for Circuit in a way that it ignores current_witness_index.

After this fix we have the following tests still flagging the transformation as non-idempotent:

  • 7_function
  • conditional_1
  • fold_fibonacci
  • hashmap
  • regression_5252
  • regression_6451
  • sha256
  • sha256_regression
  • sha256_var_size_regression
  • sha256_var_witness_const_regression
  • slices
  • to_be_bytes

Multiple passes

For the above I found that adding the following to the test makes all of them pass:

                for _ in 0..2 {
                    program_0 = nargo::ops::transform_program(program_0, width);
                }

So two extra full passes to stabilise the optimisation currently. An example where 1 extra pass is not enough was conditional_1. Ideally we would find which part of the transformation needs to be repeated to avoid having to do a full pass; our initial hypothesis was to look at #6668

I had to add a loop around transform_internal allowing 3 passes before the tests indicated more stability than before. The expectation was that we only need to loop around MergeExpressionOptimizer, but that doesn't seem to be true. Even after this, the following two programs fail:

  • 7_function
  • slices
    The difference in both cases is the removal of a range check:
assertion failed: `(left == right)`: optimization not idempotent for test program 'slices''
  left: `"func 0\ncurrent witness index : 617\nprivate parameters indices : [0]\npublic parameters indices : [1]\nreturn value indices : []\nEXPR [ (1, _1) -10 ]\nEXPR [ (-1, _0) (-1, _2) 10 ]\nBRILLIG CALL func 0: in..."` (truncated)
 right: `"func 0\ncurrent witness index : 617\nprivate parameters indices : [0]\npublic parameters indices : [1]\nreturn value indices : []\nEXPR [ (1, _1) -10 ]\nEXPR [ (-1, _0) (-1, _2) 10 ]\nBRILLIG CALL func 0: in..."` (truncated)

Differences (-left|+right):
 BRILLIG CALL func 0: inputs: [Single(Expression { mul_terms: [], linear_combinations: [(1, Witness(13))], q_c: 0 })], outputs: [Simple(Witness(14))]
 EXPR [ (1, _13, _14) (1, _15) -1 ]
 EXPR [ (1, _13, _15) 0 ]
 EXPR [ (1, _7) (-1, _16) 5 ]
-BLACKBOX::RANGE [(16)] [ ]
 EXPR [ (-1, _17) 0 ]
 EXPR [ (1, _0, _4) (-1, _489) 0 ]
 EXPR [ (-10, _4, _8) (-1, _490) 0 ]
 EXPR [ (10, _8) (-1, _18) (1, _489) (1, _490) 0 ]

For these to stabilise had to move the loop to be around the entire transformation that includes the initial backend agnostic optimize_internal as well:

/// Applies [`ProofSystemCompiler`][crate::ProofSystemCompiler] specific optimizations to a [`Circuit`].
pub fn compile<F: AcirField>(
    acir: Circuit<F>,
    expression_width: ExpressionWidth,
) -> (Circuit<F>, AcirTransformationMap) {
    let mut pass = 0;
    let mut prev_acir = acir;

    let (mut acir, acir_opcode_positions) = loop {
        let (acir, acir_opcode_positions) = optimize_internal(prev_acir);
        let (acir, acir_opcode_positions) =
            transform_internal(acir, expression_width, acir_opcode_positions);

        if pass == 2 {
            break (acir, acir_opcode_positions);
        }
        pass += 1;
        prev_acir = acir;
    };

    let transformation_map = AcirTransformationMap::new(&acir_opcode_positions);
    acir.assert_messages = transform_assert_messages(acir.assert_messages, &transformation_map);

    (acir, transformation_map)
}

The final solution has an inner and an outer loop, treating both as black boxes.

Documentation*

Check one:

  • No documentation needed.
  • Documentation included in this PR.
  • [For Experimental Features] Documentation to be submitted in a separate PR.

PR Checklist*

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.

@aakoshh aakoshh changed the base branch from master to 6670-test-transform-is-idempotent December 4, 2024 11:56
Copy link
Contributor

github-actions bot commented Dec 4, 2024

Peak Memory Sample

Program Peak Memory %
keccak256 83.22M 0%
workspace 121.82M -1%
regression_4709 333.68M 0%
ram_blowup_regression 2.65G 0%

@aakoshh aakoshh force-pushed the 6670-fix-transform-idempotent branch from 4b370d6 to b59cb34 Compare December 4, 2024 11:59
Copy link
Member

@TomAFrench TomAFrench left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we need two extra passes to stabilise the optimisation currently. An example were 1 extra pass is not enough was conditional_1.

Performing the full transformation step again is a little overkill. The only non-idempotent optimization is #6668 afaik so if anything we should run multiple instances of this but avoid the rest.

acvm-repo/acvm/src/compiler/transformers/mod.rs Outdated Show resolved Hide resolved
@aakoshh
Copy link
Contributor Author

aakoshh commented Dec 4, 2024

Performing the full transformation step again is a little overkill.

Agreed, I just tried that as a quick way to see if there is an upper bound now.

@aakoshh
Copy link
Contributor Author

aakoshh commented Dec 4, 2024

The only non-idempotent optimization is #6668 afaik so if anything we should run multiple instances of this but avoid the rest.

I pushed a commit which allows multiple passes with the MergeExpressionsOptimizer, but despite this another full cycle still changes the results 🤔

Copy link
Contributor

github-actions bot commented Dec 4, 2024

Changes to circuit sizes

Generated at commit: f4e1f04a230d77a9a4f95cf19a8ccd56c9bcc841, compared to commit: 73e11c629cac3b1308d1f48ea2db74accf800f97

🧾 Summary (10% most significant diffs)

Program ACIR opcodes (+/-) % Circuit size (+/-) %
regression_6451 -2 ✅ -12.50% -1 ✅ -3.57%
conditional_1 +255 ❌ +4.15% -690 ✅ -4.12%

Full diff report 👇
Program ACIR opcodes (+/-) % Circuit size (+/-) %
sha256_regression 20,465 (+39) +0.19% 200,368 (+21) +0.01%
sha256 1,732 (+2) +0.12% 24,916 (0) 0.00%
sha256_var_witness_const_regression 1,370 (+2) +0.15% 16,810 (0) 0.00%
regression_5252 32,898 (+9) +0.03% 44,246 (-6) -0.01%
slices 737 (-1) -0.14% 3,836 (-2) -0.05%
7_function 155 (-1) -0.64% 2,968 (-2) -0.07%
sha256_var_size_regression 11,505 (+64) +0.56% 70,388 (-48) -0.07%
hashmap 24,133 (+28) +0.12% 62,509 (-64) -0.10%
to_be_bytes 67 (-6) -8.22% 118 (-3) -2.48%
regression_6451 14 (-2) -12.50% 27 (-1) -3.57%
conditional_1 6,405 (+255) +4.15% 16,044 (-690) -4.12%

@aakoshh
Copy link
Contributor Author

aakoshh commented Dec 4, 2024

For example on the following example it looks like the MEO only runs once, which suggests there is another source of unstable output:

cargo test -p nargo_cli --bins -- test_transform_program_is_idempotent to_be_bytes

@aakoshh aakoshh marked this pull request as ready for review December 4, 2024 17:32
@aakoshh aakoshh force-pushed the 6670-fix-transform-idempotent branch from 371678e to ce9d107 Compare December 4, 2024 17:37
Copy link
Collaborator

@asterite asterite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@asterite asterite force-pushed the 6670-fix-transform-idempotent branch from 051be60 to ce9d107 Compare December 4, 2024 19:31
@aakoshh aakoshh merged commit 82b38bd into 6670-test-transform-is-idempotent Dec 5, 2024
26 checks passed
@aakoshh aakoshh deleted the 6670-fix-transform-idempotent branch December 5, 2024 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants