Ensure that all allocas are created in the entry block. #4685

zygoloid · 2024-12-13T23:55:17Z

Non-entry-block allocas will allocate new stack memory each time they're reached, resulting in leaking stack memory over time for allocas in a loop. Move all such allocas to the entry block instead, and use an LLVM intrinsic to mark when the lifetime of the variable actually begins.

chandlerc · 2024-12-14T01:30:08Z

toolchain/lower/handle.cpp

+  // If we're not in the entry block, make a note that we should the alloca up
+  // into the entry block, and add a lifetime marker to indicate when its
+  // lifetime really begins. This avoids creating a new stack allocation each
+  // time the variable declaration is reached.


I'd focus on the specific aspect of how this works in the comment maybe:

Suggested change

// If we're not in the entry block, make a note that we should the alloca up

// into the entry block, and add a lifetime marker to indicate when its

// lifetime really begins. This avoids creating a new stack allocation each

// time the variable declaration is reached.

// If we're not in the entry block, add the instruction to a list to

// move into the entry block at the end of the function, and emit a lifetime marker to indicate when its

// lifetime really begins.

However, what do you think about a somewhat different approach: directly create the alloca at the start of the entry block, and unconditionally emit the start intrinsics?

Specifically, this would avoid interleaving allocas with other instructions in the entry block, and would give sub-block coverage of the start/end live region even within the entry block. It also just seems a bit cleaner, even though it means a bit of setup at the start.

Hah, I started doing that and then talked myself out of it. The issue is that the allocas end up in reverse order compared to the variables, which really should be fine but doesn't seem "right" in some sense :) There's a convenience member on IRBuilder to jump to an insert point after the existing allocas in the entry point, but it does a linear scan so building IR would be quadratic in the number of allocas.

Other LLVM frontends seem to go to lengths to avoid this reversing of alloca order -- Clang collects allocas in a vector, and I found one that was inserting a placeholder instruction to insert allocas before and then removing it when it was done.

If you think the reverse order (and extra lifetime start markers) is fine, I'll do that. It's definitely the easiest thing.

I found that commit in my git history, switched over to that approach.

Sadly, I do suspect that reverse order of allocas will end up causing problems...

What about having two IR builders, one for allocas and one for everything else? It'll be a bit fiddly to keep it at the right position... likely need to reset its insertion point the first time we go to use it after creating any other instruction in the entry block. But there isn't any caching or state to really save between the builders I would think?

If the above approach is too annoying, what about immediately creating a new basic block, so the entry block only holds allocas and an unconditional branch? We'd basically have an "alloca" block followed by the "real" entry block. That should let you pin an IR builder to the unconditional branch at the start of function emission for allocas, while the other moves. And we could even keep your finish and merge the two basic blocks together at the end -- the only cost would be the unconditional branch instruction as the rest would be spliced. Seems similar to your initial PR, but using a basic block instead of a vector, and getting to just splice rather than having to walk it?

chandlerc · 2024-12-14T01:33:51Z

toolchain/lower/function_context.cpp

+auto FunctionContext::Finish() -> void {
+  // Move allocas into the entry block.
+  auto& entry_block = function_->getEntryBlock();
+  auto insert_point = entry_block.getFirstNonPHIOrDbgOrAlloca();


Hmm, shouldn't we ensure that the entry block isn't branched to? And given that, there shouldn't be PHIs or other instructions canonically?

I guess you want to skip over allocas placed while emitting the entry block itself though? That makes sense, but maybe the suggestion I make in the other inline comment can avoid needing this...

and creating a lifetime start marker.

github-actions bot requested a review from chandlerc December 13, 2024 23:55

github-actions bot added the toolchain label Dec 13, 2024

Simplify test and avoid relying on carbon-language#4644

480b5c4

chandlerc reviewed Dec 14, 2024

View reviewed changes

Switch to unconditionally moving the alloca to the start of the function

7645bd3

and creating a lifetime start marker.

zygoloid force-pushed the toolchain-entry-block-allocas branch from b0d3fb5 to 7645bd3 Compare December 14, 2024 02:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure that all allocas are created in the entry block. #4685

Ensure that all allocas are created in the entry block. #4685

zygoloid commented Dec 13, 2024

chandlerc Dec 14, 2024

zygoloid Dec 14, 2024 •

edited

Loading

zygoloid Dec 14, 2024

chandlerc Dec 14, 2024

chandlerc Dec 14, 2024

Ensure that all allocas are created in the entry block. #4685

Are you sure you want to change the base?

Ensure that all allocas are created in the entry block. #4685

Conversation

zygoloid commented Dec 13, 2024

chandlerc Dec 14, 2024

Choose a reason for hiding this comment

zygoloid Dec 14, 2024 • edited Loading

Choose a reason for hiding this comment

zygoloid Dec 14, 2024

Choose a reason for hiding this comment

chandlerc Dec 14, 2024

Choose a reason for hiding this comment

chandlerc Dec 14, 2024

Choose a reason for hiding this comment

zygoloid Dec 14, 2024 •

edited

Loading