
chore: manage call stacks using a tree #6791

Open · guipublic wants to merge 16 commits into master
Conversation

guipublic (Contributor)

Description

Problem*

Resolves #6603

Summary*

Call stacks are stored in a single large tree, which allows identical prefixes to be shared between call stacks.

Additional Context

The only drawback is that we need to re-create the trees across function contexts during inlining.
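
To make the idea concrete, here is a minimal, self-contained sketch of a call-stack tree (all names are illustrative, not the PR's actual types): each call stack is identified by a node id, and each node stores its parent plus one call-site location, so call stacks that share a prefix share those nodes.

```rust
// Illustrative sketch only; the PR's real types and names may differ.
#[derive(Clone, Copy, PartialEq, Eq)]
struct CallStackId(usize);

type Location = u64; // stand-in for the compiler's source-location type

struct CallStackNode {
    parent: Option<CallStackId>,
    location: Location,
    children: Vec<CallStackId>,
}

struct CallStackTree {
    nodes: Vec<CallStackNode>,
}

impl CallStackTree {
    fn new() -> Self {
        // Node 0 is the root and represents the empty call stack.
        Self {
            nodes: vec![CallStackNode { parent: None, location: 0, children: Vec::new() }],
        }
    }

    /// Extend `parent` with one call-site `location`, reusing an existing
    /// child when the same call site was pushed before; this reuse is where
    /// identical prefixes end up shared.
    fn push(&mut self, parent: CallStackId, location: Location) -> CallStackId {
        for &child in &self.nodes[parent.0].children {
            if self.nodes[child.0].location == location {
                return child;
            }
        }
        let id = CallStackId(self.nodes.len());
        self.nodes.push(CallStackNode { parent: Some(parent), location, children: Vec::new() });
        self.nodes[parent.0].children.push(id);
        id
    }

    /// Materialize a full call stack by walking parent links back to the root.
    fn unroll(&self, mut id: CallStackId) -> Vec<Location> {
        let mut stack = Vec::new();
        while let Some(parent) = self.nodes[id.0].parent {
            stack.push(self.nodes[id.0].location);
            id = parent;
        }
        stack.reverse();
        stack
    }
}
```

With this layout, storing many call stacks that share a long common prefix costs the prefix once plus one node per divergent frame, at the price of walking parent links whenever a full stack must be materialized.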

Documentation*

Check one:

  • No documentation needed.
  • Documentation included in this PR.
  • [For Experimental Features] Documentation to be submitted in a separate PR.

PR Checklist*

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.


github-actions bot commented Dec 12, 2024

Peak Memory Sample

| Program | Peak Memory | % |
| --- | --- | --- |
| keccak256 | 77.72M | -2% |
| workspace | 121.91M | -1% |
| regression_4709 | 273.51M | -5% |
| ram_blowup_regression | 1.61G | 0% |
| private-kernel-tail | 202.33M | -4% |
| private-kernel-reset | 744.53M | -13% |
| private-kernel-inner | 308.49M | 1% |
| parity-root | 168.40M | -4% |


github-actions bot commented Dec 12, 2024

Compilation Sample

| Program | Compilation Time | % |
| --- | --- | --- |
| sha256_regression | 0m1.490s | 1% |
| regression_4709 | 0m0.811s | 2% |
| ram_blowup_regression | 0m15.972s | -3% |
| rollup-base-public | 6m51.466s | 39% |
| rollup-base-private | 3m38.207s | 11% |
| private-kernel-tail | 0m1.230s | -13% |
| private-kernel-reset | 0m8.276s | -7% |
| private-kernel-inner | 0m2.241s | -12% |
| parity-root | 0m0.919s | -7% |
| noir-contracts | 2m38.298s | -9% |

TomAFrench (Member)

Looks like this is an improvement over #6747 and gets us closer to the optimum discussed in #6753 (comment).

TomAFrench (Member)

We are eating a performance penalty, however.

jfecher (Contributor) left a comment

Some good improvements here - I wonder why regression_4709 increased so much in compilation time though. From the description it sounds like inlining could take longer but regression_4709 seems to be dominated by a large loop instead.

Review thread on compiler/noirc_evaluator/src/ssa/ir/dfg.rs (outdated, resolved)
guipublic (Contributor, Author)

> I wonder why regression_4709 increased so much in compilation time though. From the description it sounds like inlining could take longer but regression_4709 seems to be dominated by a large loop instead.

My guess is that it comes from a large set of flattened call stacks: not very deep, but very wide, so some nodes have many children and a linear search among them becomes significant. I will try a sorted container for the children, such as a BTree.
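
A sketch of that sorted-container idea (illustrative names, assuming children are keyed by their call-site location): the linear scan over a `Vec` of children becomes a `BTreeMap` lookup.

```rust
use std::collections::BTreeMap;

type Location = u64;

#[derive(Clone, Copy)]
struct CallStackId(usize);

struct CallStackNode {
    parent: Option<CallStackId>,
    // Children keyed by location: O(log n) lookup instead of a linear scan.
    // The location itself is the map key in the parent, so it need not be
    // duplicated in the child node.
    children: BTreeMap<Location, CallStackId>,
}

fn push(nodes: &mut Vec<CallStackNode>, parent: CallStackId, location: Location) -> CallStackId {
    // Reuse an existing child for this call site if one exists.
    if let Some(&child) = nodes[parent.0].children.get(&location) {
        return child;
    }
    let id = CallStackId(nodes.len());
    nodes.push(CallStackNode { parent: Some(parent), children: BTreeMap::new() });
    nodes[parent.0].children.insert(location, id);
    id
}
```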

```diff
@@ -85,6 +86,8 @@ impl Function {
     /// Note that any parameters or attributes of the function must be manually added later.
     pub(crate) fn new(name: String, id: FunctionId) -> Self {
         let mut dfg = DataFlowGraph::default();
+        // Adds root node for the location tree
+        dfg.add_location_to_root(Location::dummy());
```
aakoshh (Contributor) commented Dec 12, 2024

Is this something that could be done in DataFlowGraph::default() itself? Or maybe the new CallStackHelper?
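
For illustration, one way that could look, as a toy sketch (not the real DataFlowGraph; the field names are guesses):

```rust
// Toy sketch: create the location-tree root inside Default so that callers
// like Function::new cannot forget to do it.
struct CallStackHelper {
    locations: Vec<u64>, // index 0 becomes the root once initialized
}

struct DataFlowGraph {
    call_stacks: CallStackHelper,
}

impl Default for DataFlowGraph {
    fn default() -> Self {
        let mut call_stacks = CallStackHelper { locations: Vec::new() };
        // Root node for the location tree, previously added in Function::new.
        call_stacks.locations.push(0); // dummy root location
        DataFlowGraph { call_stacks }
    }
}
```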

jfecher (Contributor) left a comment

LGTM - we may still want to change call stacks back to Vecs from lists now that we no longer need the sharing there
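
One reading of that suggestion, as a hypothetical sketch (the PR's actual representation may differ): call stacks were previously kept as persistent linked lists so that pushing a frame could share the tail, and with sharing now handled by the tree, a materialized call stack can be a plain Vec.

```rust
use std::rc::Rc;

type Location = u64;

// Before (hypothetical): a persistent cons list, so two stacks that share a
// tail share memory for it.
#[allow(dead_code)]
enum SharedStack {
    Nil,
    Frame(Location, Rc<SharedStack>),
}

// After: prefix sharing lives in the tree, so a materialized call stack can
// simply be an owned, contiguous vector.
type CallStack = Vec<Location>;
```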


github-actions bot commented Dec 13, 2024

Changes to Brillig bytecode sizes

Generated at commit: 79abe7c0caf997b4cff3847c7fdc1b764dcacd34, compared to commit: f065c6682e2c896a346716cf88ac285f1d4bf846

🧾 Summary (10% most significant diffs)

| Program | Brillig opcodes (+/-) | % |
| --- | --- | --- |
| poseidonsponge_x5_254 | +3 ❌ | +0.07% |
| sha256_regression | +3 ❌ | +0.04% |

Full diff report 👇

| Program | Brillig opcodes (+/-) | % |
| --- | --- | --- |
| poseidonsponge_x5_254 | 4,254 (+3) | +0.07% |
| sha256_regression | 6,920 (+3) | +0.04% |

Comment on lines +67 to +69:

```rust
.iter()
.rev()
.take(1000)
```

I assume we're searching in reverse so that we find recent additions faster; is there any significance to giving up after 1000 children, after which a duplicate entry becomes acceptable? Is that value something that was actually observed?

I wonder whether you considered a reverse index, e.g. from the hash of the location to the child CallStackId, which could make this faster and easier to reason about? Something like `children: HashMap<u64, CallStackId>` where the key is `fxhash::hash64(location)`.
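
A sketch of that suggestion (`fxhash::hash64` is a real function from the fxhash crate; the surrounding types are illustrative). Note that a collision check is still needed before reusing a child, since two distinct locations could hash to the same key.

```rust
use std::collections::HashMap;

type Location = (u32, u32); // stand-in: e.g. (file id, span start)

#[derive(Clone, Copy)]
struct CallStackId(usize);

struct CallStackNode {
    parent: Option<CallStackId>,
    location: Location,
    // Reverse index: fxhash of a location -> child id, making the
    // "does this child already exist?" check an O(1) lookup.
    children: HashMap<u64, CallStackId>,
}

fn find_child(nodes: &[CallStackNode], parent: CallStackId, location: &Location) -> Option<CallStackId> {
    let child = *nodes[parent.0].children.get(&fxhash::hash64(location))?;
    // Guard against hash collisions: verify the stored location actually
    // matches before reusing the node.
    (nodes[child.0].location == *location).then_some(child)
}
```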

Successfully merging this pull request may close these issues:

- Reduce memory usage of tracking instruction callstacks in dfg.locations (#6603)

4 participants