
chore: manage call stacks using a tree #6791

Open · guipublic wants to merge 16 commits into master
Conversation

guipublic (Contributor)

Description

Problem*

Resolves #6603

Summary*

Call stacks are stored in a single large tree, which allows identical prefixes to be shared between call stacks.

Additional Context

The only drawback is that we need to re-create the trees across function contexts during inlining.
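
To make the idea concrete, here is a minimal, self-contained sketch of a call-stack tree (all names are illustrative, not the PR's actual types): each call stack is identified by a node id, and each node stores its parent plus one call-site location, so call stacks that share a prefix share those nodes.

```rust
// Illustrative sketch only; the PR's real types and names may differ.
#[derive(Clone, Copy, PartialEq, Eq)]
struct CallStackId(usize);

type Location = u64; // stand-in for the compiler's source-location type

struct CallStackNode {
    parent: Option<CallStackId>,
    location: Location,
    children: Vec<CallStackId>,
}

struct CallStackTree {
    nodes: Vec<CallStackNode>,
}

impl CallStackTree {
    fn new() -> Self {
        // Node 0 is the root and represents the empty call stack.
        Self {
            nodes: vec![CallStackNode { parent: None, location: 0, children: Vec::new() }],
        }
    }

    /// Extend `parent` with one call-site `location`, reusing an existing
    /// child when the same call site was pushed before; this reuse is where
    /// identical prefixes end up shared.
    fn push(&mut self, parent: CallStackId, location: Location) -> CallStackId {
        for &child in &self.nodes[parent.0].children {
            if self.nodes[child.0].location == location {
                return child;
            }
        }
        let id = CallStackId(self.nodes.len());
        self.nodes.push(CallStackNode { parent: Some(parent), location, children: Vec::new() });
        self.nodes[parent.0].children.push(id);
        id
    }

    /// Materialize a full call stack by walking parent links back to the root.
    fn unroll(&self, mut id: CallStackId) -> Vec<Location> {
        let mut stack = Vec::new();
        while let Some(parent) = self.nodes[id.0].parent {
            stack.push(self.nodes[id.0].location);
            id = parent;
        }
        stack.reverse();
        stack
    }
}
```

With this layout, storing many call stacks that share a long common prefix costs the prefix once plus one node per divergent frame, at the price of walking parent links whenever a full stack must be materialized.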

Documentation*

Check one:

  • No documentation needed.
  • Documentation included in this PR.
  • [For Experimental Features] Documentation to be submitted in a separate PR.

PR Checklist*

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.


github-actions bot commented Dec 12, 2024

Peak Memory Sample

| Program | Peak Memory | % |
| --- | --- | --- |
| keccak256 | 77.72M | -2% |
| workspace | 121.91M | -1% |
| regression_4709 | 273.51M | -5% |
| ram_blowup_regression | 1.61G | 0% |
| private-kernel-tail | 202.33M | -4% |
| private-kernel-reset | 744.53M | -13% |
| private-kernel-inner | 308.49M | 1% |
| parity-root | 168.40M | -4% |


github-actions bot commented Dec 12, 2024

Compilation Sample

| Program | Compilation Time | % |
| --- | --- | --- |
| sha256_regression | 0m1.490s | 1% |
| regression_4709 | 0m0.811s | 2% |
| ram_blowup_regression | 0m15.972s | -3% |
| rollup-base-public | 6m51.466s | 39% |
| rollup-base-private | 3m38.207s | 11% |
| private-kernel-tail | 0m1.230s | -13% |
| private-kernel-reset | 0m8.276s | -7% |
| private-kernel-inner | 0m2.241s | -12% |
| parity-root | 0m0.919s | -7% |
| noir-contracts | 2m38.298s | -9% |

TomAFrench (Member)

Looks like this is an improvement over #6747 and gets us closer to the optimum discussed in #6753 (comment).

TomAFrench (Member)

We are eating a performance penalty, however.

jfecher (Contributor) left a comment

Some good improvements here - I wonder why regression_4709 increased so much in compilation time though. From the description it sounds like inlining could take longer but regression_4709 seems to be dominated by a large loop instead.

Review thread on compiler/noirc_evaluator/src/ssa/ir/dfg.rs (outdated, resolved)
guipublic (Contributor, Author)

> I wonder why regression_4709 increased so much in compilation time though. From the description it sounds like inlining could take longer but regression_4709 seems to be dominated by a large loop instead.

My guess is that it comes from a large set of flattened call stacks: not very deep, but very wide, so some nodes have many children and a linear search among them becomes significant. I will try a sorted container for the children, such as a BTree.
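
A sketch of that sorted-container idea (illustrative names, assuming children are keyed by their call-site location): the linear scan over a `Vec` of children becomes a `BTreeMap` lookup.

```rust
use std::collections::BTreeMap;

type Location = u64;

#[derive(Clone, Copy)]
struct CallStackId(usize);

struct CallStackNode {
    parent: Option<CallStackId>,
    // Children keyed by location: O(log n) lookup instead of a linear scan.
    // The location itself is the map key in the parent, so it need not be
    // duplicated in the child node.
    children: BTreeMap<Location, CallStackId>,
}

fn push(nodes: &mut Vec<CallStackNode>, parent: CallStackId, location: Location) -> CallStackId {
    // Reuse an existing child for this call site if one exists.
    if let Some(&child) = nodes[parent.0].children.get(&location) {
        return child;
    }
    let id = CallStackId(nodes.len());
    nodes.push(CallStackNode { parent: Some(parent), children: BTreeMap::new() });
    nodes[parent.0].children.insert(location, id);
    id
}
```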

```diff
@@ -85,6 +86,8 @@ impl Function {
     /// Note that any parameters or attributes of the function must be manually added later.
     pub(crate) fn new(name: String, id: FunctionId) -> Self {
         let mut dfg = DataFlowGraph::default();
+        // Adds root node for the location tree
+        dfg.add_location_to_root(Location::dummy());
```
aakoshh (Contributor) commented Dec 12, 2024

Is this something that could be done in DataFlowGraph::default() itself? Or maybe the new CallStackHelper?
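
For illustration, one way that could look, as a toy sketch (not the real DataFlowGraph; the field names are guesses):

```rust
// Toy sketch: create the location-tree root inside Default so that callers
// like Function::new cannot forget to do it.
struct CallStackHelper {
    locations: Vec<u64>, // index 0 becomes the root once initialized
}

struct DataFlowGraph {
    call_stacks: CallStackHelper,
}

impl Default for DataFlowGraph {
    fn default() -> Self {
        let mut call_stacks = CallStackHelper { locations: Vec::new() };
        // Root node for the location tree, previously added in Function::new.
        call_stacks.locations.push(0); // dummy root location
        DataFlowGraph { call_stacks }
    }
}
```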

jfecher (Contributor) left a comment

LGTM - we may still want to change call stacks back to Vecs from lists now that we no longer need the sharing there
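
One reading of that suggestion, as a hypothetical sketch (the PR's actual representation may differ): call stacks were previously kept as persistent linked lists so that pushing a frame could share the tail, and with sharing now handled by the tree, a materialized call stack can be a plain Vec.

```rust
use std::rc::Rc;

type Location = u64;

// Before (hypothetical): a persistent cons list, so two stacks that share a
// tail share memory for it.
#[allow(dead_code)]
enum SharedStack {
    Nil,
    Frame(Location, Rc<SharedStack>),
}

// After: prefix sharing lives in the tree, so a materialized call stack can
// simply be an owned, contiguous vector.
type CallStack = Vec<Location>;
```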


github-actions bot commented Dec 13, 2024

Changes to Brillig bytecode sizes

Generated at commit: 79abe7c0caf997b4cff3847c7fdc1b764dcacd34, compared to commit: f065c6682e2c896a346716cf88ac285f1d4bf846

🧾 Summary (10% most significant diffs)

| Program | Brillig opcodes (+/-) | % |
| --- | --- | --- |
| poseidonsponge_x5_254 | +3 ❌ | +0.07% |
| sha256_regression | +3 ❌ | +0.04% |

Full diff report 👇

| Program | Brillig opcodes (+/-) | % |
| --- | --- | --- |
| poseidonsponge_x5_254 | 4,254 (+3) | +0.07% |
| sha256_regression | 6,920 (+3) | +0.04% |

Comment on lines +67 to +69:

```rust
.iter()
.rev()
.take(1000)
```

I assume we're searching in reverse so that we find recent additions faster; is there any significance to giving up after 1000 children, after which a duplicate entry becomes acceptable? Is that value something that was actually observed?

I wonder whether you considered a reverse index, e.g. from the hash of the location to the child CallStackId, which could make this faster and easier to reason about? Something like `children: HashMap<u64, CallStackId>` where the key is `fxhash::hash64(location)`.
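
A sketch of that suggestion (`fxhash::hash64` is a real function from the fxhash crate; the surrounding types are illustrative). Note that a collision check is still needed before reusing a child, since two distinct locations could hash to the same key.

```rust
use std::collections::HashMap;

type Location = (u32, u32); // stand-in: e.g. (file id, span start)

#[derive(Clone, Copy)]
struct CallStackId(usize);

struct CallStackNode {
    parent: Option<CallStackId>,
    location: Location,
    // Reverse index: fxhash of a location -> child id, making the
    // "does this child already exist?" check an O(1) lookup.
    children: HashMap<u64, CallStackId>,
}

fn find_child(nodes: &[CallStackNode], parent: CallStackId, location: &Location) -> Option<CallStackId> {
    let child = *nodes[parent.0].children.get(&fxhash::hash64(location))?;
    // Guard against hash collisions: verify the stored location actually
    // matches before reusing the node.
    (nodes[child.0].location == *location).then_some(child)
}
```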

Successfully merging this pull request may close these issues:

- Reduce memory usage of tracking instruction callstacks in dfg.locations (#6603)

4 participants