feat(ssa): Hoist MakeArray instructions during loop invariant code motion #6782

Merged
merged 11 commits into from
Dec 12, 2024

Conversation

vezenovm

@vezenovm vezenovm commented Dec 12, 2024

Description

Problem*

Resolves #6775

Summary*

In #6685 we restricted MakeArray deduplication to the ACIR runtime, because arrays can actually be mutated in unconstrained code.

Now when hoisting loop invariants, we check whether we are hoisting a MakeArray. If we have hoisted a MakeArray instruction, we also insert an IncrementRc instruction in the old location of MakeArray in the loop block.
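The hoisting step described above can be pictured with a toy model. This is a minimal sketch, not the compiler's Rust implementation: `Instr`, `hoist_make_arrays`, `loop_params`, and the string opcodes are hypothetical names chosen for illustration, and "invariant" is simplified to "reads no value defined inside the loop".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    op: str                 # hypothetical opcode name, e.g. "make_array"
    result: Optional[str]   # value defined by this instruction, if any
    args: tuple             # names of the values this instruction reads


def hoist_make_arrays(pre_header, loop_body, loop_params=()):
    """Move loop-invariant make_array instructions into the pre-header.

    Each hoisted make_array leaves an inc_rc on its result at the
    position it used to occupy in the loop body.
    """
    # Values that change per iteration: block parameters plus body results.
    defined_in_loop = set(loop_params)
    defined_in_loop |= {i.result for i in loop_body if i.result is not None}

    new_pre_header = list(pre_header)
    new_body = []
    for instr in loop_body:
        invariant = not any(arg in defined_in_loop for arg in instr.args)
        if instr.op == "make_array" and invariant:
            new_pre_header.append(instr)
            # The array now lives outside the loop, so later instructions
            # that only depend on it may become invariant as well.
            defined_in_loop.discard(instr.result)
            new_body.append(Instr("inc_rc", None, (instr.result,)))
        else:
            new_body.append(instr)
    return new_pre_header, new_body
```

In this model, a `make_array` whose elements are all defined before the loop moves to the pre-header, while one that reads the loop index stays where it is.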

This PR is trading off the cost of creating an array inside of a loop vs. the cost of creating an array once + incrementing its reference counter inside the loop. Other than for the smallest of arrays, this tradeoff should almost always be worth it.
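To make the tradeoff concrete, here is a rough cost sketch. The unit costs are assumptions for illustration only (one opcode per array element stored, one opcode per reference-count increment), not measured Brillig costs, and the function names are hypothetical.

```python
def opcodes_in_loop(array_len, iterations, store_cost=1):
    """Assumed cost of rebuilding the array on every iteration."""
    return iterations * array_len * store_cost


def opcodes_hoisted(array_len, iterations, store_cost=1, inc_rc_cost=1):
    """Assumed cost of building the array once plus one inc_rc per iteration."""
    return array_len * store_cost + iterations * inc_rc_cost
```

Under these assumed costs, a one-element array over 100 iterations costs 100 in the loop versus 101 when hoisted, while a 32-element array costs 3200 versus 132, which is why only the smallest arrays fail to benefit in this model.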

Before merging we may want to determine some kind of heuristic for when this optimization should run. It could also be left for a follow-up PR.

Additional Context

A potential follow-up optimization would be to bring back the rc tracker that was fully removed from DIE in #6700. This would help more generally with the cost of deduplicating MakeArray instructions, as right now we always insert an IncrementRc upon deduplicating or hoisting a MakeArray. If that array is never mutably borrowed in the block, DIE should be able to remove the instructions that were inserted only to maintain correctness.
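The idea behind such a tracker can be sketched as a filter over one block. This is an illustrative simplification, not the tracker from #6700 or #6783: instructions are modeled as hypothetical `(op, result, args)` tuples, and `array_set` is assumed to be the only operation that mutates or mutably borrows an array.

```python
def drop_unneeded_inc_rc(block):
    """Remove inc_rc instructions on arrays that are never mutated in the block.

    block is a list of (op, result, args) tuples; a new list is returned.
    """
    # Arrays that appear as the target of a mutation somewhere in the block.
    mutated = {args[0] for op, _result, args in block if op == "array_set"}
    return [
        (op, result, args)
        for op, result, args in block
        if not (op == "inc_rc" and args[0] not in mutated)
    ]
```

An `inc_rc` survives only when the array it protects is later written to in the same block; every other `inc_rc` is treated as dead.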

regression_4709 performance:
Master:
The test was previously excluded from our CI's Brillig tests because execution would hang. I even ran out of memory and crashed the mainframe when trying to profile it.
After this PR:
~52 million opcodes executed
After this PR w/ #6783:
~1.2 million opcodes executed

blob performance:
Master:
~990 million opcodes executed
After this PR:
~775 million opcodes executed (23.7% improvement from master)
After this PR w/ #6783:
~449 million opcodes executed (54.6% improvement from master)

Documentation*

Check one:

  • No documentation needed.
  • Documentation included in this PR.
  • [For Experimental Features] Documentation to be submitted in a separate PR.

PR Checklist*

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.


github-actions bot commented Dec 12, 2024

Changes to Brillig bytecode sizes

Generated at commit: 426b9e32148487e3d4d2b1c358d7001370c60599, compared to commit: 919149d3413be5232b33611094687fdb5fd86086

🧾 Summary (10% most significant diffs)

| Program | Brillig opcodes (+/-) | % |
|---|---|---|
| array_dedup_regression | +6 ❌ | +2.29% |
| merkle_insert | -16 ✅ | -1.95% |
| sha256_var_size_regression | -40 ✅ | -2.10% |

Full diff report 👇
| Program | Brillig opcodes (+/-) | % |
|---|---|---|
| array_dedup_regression | 268 (+6) | +2.29% |
| regression_4449 | 796 (+15) | +1.92% |
| array_dynamic_blackbox_input | 1,077 (+18) | +1.70% |
| simple_shield | 938 (+9) | +0.97% |
| fold_numeric_generic_poseidon | 775 (+6) | +0.78% |
| no_predicates_numeric_generic_poseidon | 775 (+6) | +0.78% |
| regression_4709 | 134,514 (+774) | +0.58% |
| sha256_var_padding_regression | 5,318 (+18) | +0.34% |
| sha2_byte | 2,833 (+9) | +0.32% |
| ram_blowup_regression | 1,002 (+3) | +0.30% |
| hashmap | 22,212 (+61) | +0.28% |
| uhashmap | 14,454 (+39) | +0.27% |
| sha256_regression | 7,112 (+15) | +0.21% |
| sha256_brillig_performance_regression | 1,740 (+3) | +0.17% |
| brillig_cow_regression | 2,236 (+3) | +0.13% |
| poseidon_bn254_hash | 5,533 (-2) | -0.04% |
| poseidon_bn254_hash_width_3 | 5,533 (-2) | -0.04% |
| regression_5252 | 4,741 (-15) | -0.32% |
| poseidonsponge_x5_254 | 4,353 (-24) | -0.55% |
| merkle_insert | 804 (-16) | -1.95% |
| sha256_var_size_regression | 1,864 (-40) | -2.10% |


github-actions bot commented Dec 12, 2024

Changes to number of Brillig opcodes executed

Generated at commit: 426b9e32148487e3d4d2b1c358d7001370c60599, compared to commit: 919149d3413be5232b33611094687fdb5fd86086

🧾 Summary (10% most significant diffs)

| Program | Brillig opcodes (+/-) | % |
|---|---|---|
| merkle_insert | -197 ✅ | -4.86% |
| regression_4449 | -23,445 ✅ | -9.80% |
| array_dedup_regression | -410 ✅ | -37.00% |

Full diff report 👇
| Program | Brillig opcodes (+/-) | % |
|---|---|---|
| sha2_byte | 47,575 (+601) | +1.28% |
| fold_numeric_generic_poseidon | 5,218 (+6) | +0.12% |
| no_predicates_numeric_generic_poseidon | 5,218 (+6) | +0.12% |
| poseidonsponge_x5_254 | 184,584 (-426) | -0.23% |
| regression_5252 | 919,211 (-5,169) | -0.56% |
| uhashmap | 146,320 (-1,684) | -1.14% |
| sha256_brillig_performance_regression | 23,341 (-309) | -1.31% |
| brillig_cow_regression | 520,937 (-8,525) | -1.61% |
| ram_blowup_regression | 781,398 (-13,205) | -1.66% |
| sha256_var_padding_regression | 223,017 (-4,142) | -1.82% |
| simple_shield | 2,925 (-59) | -1.98% |
| sha256_regression | 119,623 (-2,481) | -2.03% |
| hashmap | 54,647 (-1,235) | -2.21% |
| poseidon_bn254_hash | 163,416 (-3,822) | -2.29% |
| poseidon_bn254_hash_width_3 | 163,416 (-3,822) | -2.29% |
| sha256_var_size_regression | 16,733 (-601) | -3.47% |
| array_dynamic_blackbox_input | 18,636 (-714) | -3.69% |
| merkle_insert | 3,856 (-197) | -4.86% |
| regression_4449 | 215,700 (-23,445) | -9.80% |
| array_dedup_regression | 698 (-410) | -37.00% |


github-actions bot commented Dec 12, 2024

Peak Memory Sample

| Program | Peak Memory |
|---|---|
| keccak256 | 79.14M |
| workspace | 122.03M |
| regression_4709 | 295.99M |
| ram_blowup_regression | 2.44G |
| private-kernel-tail | 210.43M |
| private-kernel-reset | 862.22M |
| private-kernel-inner | 307.94M |
| parity-root | 175.78M |


github-actions bot commented Dec 12, 2024

Compilation Sample

| Program | Compilation Time | % |
|---|---|---|
| sha256_regression | 0m1.594s | 10% |
| regression_4709 | 0m0.823s | 8% |
| ram_blowup_regression | 0m17.405s | 1% |
| private-kernel-tail | 0m1.311s | 6% |
| private-kernel-reset | 0m8.432s | -7% |
| private-kernel-inner | 0m2.320s | -15% |
| parity-root | 0m0.918s | -3% |
| noir-contracts | 2m52.531s | 5% |

@vezenovm vezenovm mentioned this pull request Dec 12, 2024
5 tasks
@vezenovm vezenovm marked this pull request as ready for review December 12, 2024 04:00
@vezenovm vezenovm added the run-external-checks Trigger CI job to run tests on external repos label Dec 12, 2024
@jfecher

jfecher commented Dec 12, 2024

This PR is trading off the cost of creating an array inside of a loop vs. the cost of creating an array once + incrementing its reference counter inside the loop. Other than for the smallest of arrays, this tradeoff should almost always be worth it.

I'll note that if we're creating an array in the loop and mutating it each iteration, then this approach would be slower, since we'd have an extra inc_rc instruction each iteration and would need a dynamic check to copy the array. Overall though it's probably a fine tradeoff, since the best case of this optimization gains much more than the worst case of applying it costs.
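The concern above can be illustrated with a toy reference-counted, copy-on-write array. This is a sketch of the scenario exactly as stated in this comment, not of Brillig's actual memory model: `RcArray`, `inc_rc`, `array_set`, and `run_loop` are hypothetical names, and the unhoisted loop is assumed to mutate its fresh array in place.

```python
class RcArray:
    """Toy array with a reference count (hypothetical model)."""

    def __init__(self, items):
        self.items = list(items)
        self.rc = 1


def inc_rc(arr):
    arr.rc += 1


def array_set(arr, index, value):
    """Write one element; copy first when the array is shared (rc > 1).

    Returns (resulting_array, copied).
    """
    if arr.rc > 1:  # the dynamic check mentioned above
        arr.rc -= 1
        fresh = RcArray(arr.items)
        fresh.items[index] = value
        return fresh, True
    arr.items[index] = value
    return arr, False


def run_loop(iterations, hoisted):
    """Return (allocations, copies) for a loop that mutates its array."""
    allocations = 0
    copies = 0
    shared = None
    if hoisted:
        shared = RcArray([0, 0, 0])  # created once, before the loop
        allocations += 1
    for i in range(iterations):
        if hoisted:
            inc_rc(shared)  # left behind where make_array used to be
            arr = shared
        else:
            arr = RcArray([0, 0, 0])  # created on every iteration
            allocations += 1
        arr, copied = array_set(arr, 0, i)
        if copied:
            copies += 1
            allocations += 1
    return allocations, copies
```

In this model the unhoisted loop allocates once per iteration and never copies, while the hoisted loop pays an extra `inc_rc` and a copy on every iteration, so hoisting saves nothing when the array is mutated each time.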

@vezenovm
Contributor Author

I'll note that if we're creating an array in the loop and mutating it each iteration

Yeah, good note. The trade-off noted in the description is more for creating immutable arrays outside the loop. However, an array declared mutable inside the loop should already be issuing an inc_rc instruction after being created and will already be copied every iteration.

@vezenovm vezenovm requested a review from a team December 12, 2024 15:31

@jfecher jfecher left a comment

I think hoisting without the mutable checks should be fine. Unlike deduplication, there should be no chance the array in question is used between the hoist point and original point since the instruction just wasn't present there before.

@jfecher jfecher added this pull request to the merge queue Dec 12, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 12, 2024
@vezenovm vezenovm added this pull request to the merge queue Dec 12, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 12, 2024
@vezenovm vezenovm enabled auto-merge December 12, 2024 20:46
@vezenovm vezenovm added this pull request to the merge queue Dec 12, 2024
@vezenovm vezenovm removed this pull request from the merge queue due to a manual request Dec 12, 2024
@vezenovm vezenovm added this pull request to the merge queue Dec 12, 2024
Merged via the queue into master with commit b88db67 Dec 12, 2024
79 of 81 checks passed
@vezenovm vezenovm deleted the mv/hoist-make-array-fromloops branch December 12, 2024 21:23
AztecBot added a commit to AztecProtocol/aztec-packages that referenced this pull request Dec 13, 2024
…iant code motion (noir-lang/noir#6782)

feat: add `(x | 1)` optimization for booleans (noir-lang/noir#6795)
feat: `nargo test -q` (or `nargo test --format terse`) (noir-lang/noir#6776)
fix: disable failure persistance in nargo test fuzzing (noir-lang/noir#6777)
feat(cli): Verify `return` against ABI and `Prover.toml` (noir-lang/noir#6765)
chore(ssa): Activate loop invariant code motion on ACIR functions (noir-lang/noir#6785)
fix: use extension in docs link so it also works on GitHub (noir-lang/noir#6787)
fix: optimizer to keep track of changing opcode locations (noir-lang/noir#6781)
fix: Minimal change to avoid reverting entire PR #6685 (noir-lang/noir#6778)
TomAFrench added a commit that referenced this pull request Dec 14, 2024
* master: (313 commits)
  chore: Do not print entire functions when running debug trace (#6814)
  chore(ci): Active rollup circuits in compilation report (#6813)
  feat(ssa): Bring back tracking of RC instructions during DIE (#6783)
  feat: add `nargo test --format json` (#6796)
  chore: Change Id to use a u32 (#6807)
  feat(ssa): Hoist MakeArray instructions during loop invariant code motion  (#6782)
  feat: add `(x | 1)` optimization for booleans (#6795)
  feat: `nargo test -q` (or `nargo test --format terse`) (#6776)
  fix: disable failure persistance in nargo test fuzzing (#6777)
  feat(cli): Verify `return` against ABI and `Prover.toml` (#6765)
  chore(ssa): Activate loop invariant code motion on ACIR functions (#6785)
  fix: use extension in docs link so it also works on GitHub (#6787)
  fix: optimizer to keep track of changing opcode locations (#6781)
  fix: Minimal change to avoid reverting entire PR #6685 (#6778)
  feat: several `nargo test` improvements (#6728)
  chore: Try replace callstack with a linked list (#6747)
  chore: Use `NumericType` not `Type` for casts and numeric constants (#6769)
  chore(ci): Extend compiler memory report to external repos (#6768)
  chore(ci): Handle external libraries in compilation timing report (#6750)
  feat(ssa): Implement missing brillig constraints SSA check (#6658)
  ...
Labels
run-external-checks Trigger CI job to run tests on external repos
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Instantiate global arrays only once
3 participants