-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure that all allocas are created in the entry block. #4685
base: trunk
Are you sure you want to change the base?
Ensure that all allocas are created in the entry block. #4685
Conversation
Non-entry-block allocas will allocate new stack memory each time they're reached, resulting in leaking stack memory over time for allocas in a loop. Move all such allocas to the entry block instead, and use an LLVM intrinsic to mark when the lifetime of the variable actually begins.
toolchain/lower/handle.cpp
Outdated
// If we're not in the entry block, make a note that we should the alloca up | ||
// into the entry block, and add a lifetime marker to indicate when its | ||
// lifetime really begins. This avoids creating a new stack allocation each | ||
// time the variable declaration is reached. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd focus on the specific aspect of how this works in the comment maybe:
// If we're not in the entry block, make a note that we should the alloca up | |
// into the entry block, and add a lifetime marker to indicate when its | |
// lifetime really begins. This avoids creating a new stack allocation each | |
// time the variable declaration is reached. | |
// If we're not in the entry block, add the instruction to a list to | |
// move into the entry block at the end of the function, and emit a lifetime marker to indicate when its | |
// lifetime really begins. |
However, what do you think about a somewhat different approach: directly create the alloca at the start of the entry block, and unconditionally emit the start intrinsics?
Specifically, this would avoid interleaving allocas with other instructions in the entry block, and would give sub-block coverage of the start/end live region even within the entry block. It also just seems a bit cleaner, even though it means a bit of setup at the start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, I started doing that and then talked myself out of it. The issue is that the allocas end up in reverse order compared to the variables, which really should be fine but doesn't seem "right" in some sense :) There's a convenience member on IRBuilder to jump to an insert point after the existing allocas in the entry point, but it does a linear scan so building IR would be quadratic in the number of allocas.
Other LLVM frontends seem to go to lengths to avoid this reversing of alloca order -- Clang collects allocas in a vector, and I found one that was inserting a placeholder instruction to insert allocas before and then removing it when it was done.
If you think the reverse order (and extra lifetime start markers) is fine, I'll do that. It's definitely the easiest thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that commit in my git history, switched over to that approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly, I do suspect that reverse order of allocas will end up causing problems...
What about having two IR builders, one for allocas and one for everything else? It'll be a bit fiddly to keep it at the right position... likely need to reset its insertion point the first time we go to use it after creating any other instruction in the entry block. But there isn't any caching or state to really save between the builders I would think?
If the above approach is too annoying, what about immediately creating a new basic block, so the entry block only holds allocas and an unconditional branch? We'd basically have an "alloca" block followed by the "real" entry block. That should let you pin an IR builder to the unconditional branch at the start of function emission for allocas, while the other moves. And we could even keep your finish and merge the two basic blocks together at the end -- the only cost would be the unconditional branch instruction as the rest would be spliced. Seems similar to your initial PR, but using a basic block instead of a vector, and getting to just splice rather than having to walk it?
toolchain/lower/function_context.cpp
Outdated
auto FunctionContext::Finish() -> void { | ||
// Move allocas into the entry block. | ||
auto& entry_block = function_->getEntryBlock(); | ||
auto insert_point = entry_block.getFirstNonPHIOrDbgOrAlloca(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, shouldn't we ensure that the entry block isn't branched to? And given that, there shouldn't be PHIs or other instructions canonically?
I guess you want to skip over allocas placed while emitting the entry block itself though? That makes sense, but maybe the suggestion I make in the other inline comment can avoid needing this...
and creating a lifetime start marker.
b0d3fb5
to
7645bd3
Compare
Non-entry-block allocas will allocate new stack memory each time they're reached, resulting in leaking stack memory over time for allocas in a loop. Move all such allocas to the entry block instead, and use an LLVM intrinsic to mark when the lifetime of the variable actually begins.