Generators are too big #52924
I am wondering if thinking of this as "slots of a struct" is the right model anyway. The content of the generator type (aside from the state flag) is basically the call frame for this generator, so shouldn't this be treated much more like the stack? There probably shouldn't even be types, just a large enough opaque "blob of bytes" that can be used for generator execution. If you think of this in terms of fields, you are not going to be able to reuse the space of two
Yes, it's just a stack of bytes. This optimization was referring to the possibility of overlapping some of the bytes to hold data that is currently stored in separate "slots of a struct".
My intuition about generator layout is that it would be:
I expect that this is complicated by alignment and wanting to avoid moving values around inside of the generator as it iterates through states, but clearly we should be doing better than we are if the size is currently the sum of all yield point sizes instead of the max. If benchmarks are bad enough this may be a pre-stabilization priority. :-\
It's the sum of all variables that are alive across a yield point (one slot for every variable, not one slot for every yield point). We can do better by tracking which variables are live across non-intersecting sets of yield points and using the same space to store both.
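This can be observed on stable Rust with async blocks, whose frames are laid out like generator frames. A minimal sketch (the function names are made up for illustration; exact sizes are compiler-dependent, so only conservative lower bounds are asserted):

```rust
use std::mem::size_of_val;

// Two 1 KiB locals live across *disjoint* await points: with slot
// overlap they can share storage, so the frame need only be ~1 KiB.
async fn disjoint() {
    let a = [0u8; 1024];
    std::future::ready(()).await;
    std::hint::black_box(a);
    let b = [0u8; 1024];
    std::future::ready(()).await;
    std::hint::black_box(b);
}

// Both locals are live across the *same* await point: they conflict
// and must occupy separate slots, so the frame is at least 2 KiB.
async fn conflicting() {
    let a = [0u8; 1024];
    let b = [0u8; 1024];
    std::future::ready(()).await;
    std::hint::black_box((a, b));
}

fn main() {
    let d = size_of_val(&disjoint());
    let c = size_of_val(&conflicting());
    // Conservative bounds only; exact sizes vary by compiler version.
    assert!(d >= 1024);
    assert!(c >= 2048);
    assert!(c >= d);
    println!("disjoint: {d}, conflicting: {c}");
}
```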
Marking as blocking for async/await stabilization. This may be too strong, but this issue certainly prevents us from having the performant sort of futures that we want. @cramertj had advocated for a simple strategy, vaguely described here, and covered in some detail in the video
Created a topic on Zulip which contains some notes on this.
I think @tmandry is intending to work on this, so tentatively assigning-- feel free to let me know if I should unassign.
#59087 was merged into this, but I'm not sure it would be fixed by the optimization we were discussing here. Taking this (simplified) segment:

```rust
let composed_2 = async {
    let inner = i_am_1kb();
    println!("");
    await!(inner);
};
```

Here, removing the
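The shape of that example can be reproduced on stable Rust. A sketch with a hypothetical stand-in for `i_am_1kb` (only a lower bound is asserted, since the exact layout varies by compiler):

```rust
use std::mem::size_of_val;

// Stand-in for the 1 KiB future from the comment above: the buffer
// is live across an await point, so the frame is at least 1 KiB.
async fn i_am_1kb() {
    let buf = [0u8; 1024];
    std::future::ready(()).await;
    std::hint::black_box(buf);
}

fn main() {
    // Binding the inner future to a local before awaiting keeps it a
    // named variable of the outer frame until the await completes.
    let composed = async {
        let inner = i_am_1kb();
        println!("");
        inner.await;
    };
    // The outer frame must contain the inner 1 KiB frame.
    assert!(size_of_val(&composed) >= 1024);
    println!("composed: {}", size_of_val(&composed));
}
```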
I haven't followed the Rust async story closely, but I recently encountered this paper in the most recent C++ committee mailing. Apparently the size issue was considered a deal-breaker for adopting the same strategy that's implemented in Rust, and the type-erased version was chosen instead. Where are Rust coroutines exactly in the "trade-offs" tables from the paper?
@petrochenkov We currently have some optimizations that are performed before the generator type is created, and these are allowed to affect the

As far as the requirements around being called recursively, placing it on an ABI boundary, being virtual: these can all be done by choosing to type-erase by using the

The header-file bits don't apply to Rust for obvious reasons (though it is interesting to consider that these optimizations are a dependency of crate metadata due to their effects on the generator size).
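As a sketch of the type-erasure trade-off discussed here: boxing a future behind `dyn Future` moves its frame to the heap, shrinking the inline handle to a fat pointer regardless of frame size. The function name is illustrative; only the pointer-size claim and a lower bound are asserted:

```rust
use std::future::Future;
use std::mem::size_of_val;
use std::pin::Pin;

// A future whose frame must hold a 4 KiB buffer across an await.
async fn big_frame() -> u32 {
    let buf = [0u8; 4096];
    std::future::ready(()).await;
    buf[0] as u32
}

fn main() {
    // Stored inline, the full frame is part of the value.
    let inline = big_frame();
    assert!(size_of_val(&inline) >= 4096);

    // Type-erased, the frame lives on the heap; the handle is just a
    // fat pointer (data pointer + vtable pointer), i.e. two words.
    let boxed: Pin<Box<dyn Future<Output = u32>>> = Box::pin(big_frame());
    assert_eq!(size_of_val(&boxed), 2 * std::mem::size_of::<usize>());

    println!("inline: {}, boxed: {}", size_of_val(&inline), size_of_val(&boxed));
}
```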
I played around a bit with @tmandry's optimization from #60187 in my project and wanted to share the results here instead of spamming the PR. Here are my results (for the top-level future):

Old nightly:

tmandry:generator-optimization:

And here is the relevant code which puts those things together:

```rust
let x = async {
    let everything = async {
        let (ctrl, starter) = connection_builder.build().unwrap();
        dbg!(std::mem::size_of_val(&starter));
        let conn_fut = async move {
            let conn_res = starter.start_connection().await;
            trace!("Res: {:?}", conn_res);
        };
        let send_task_futs = join_all((0..8).map(|_| send_msgs(&ctrl)));
        let lifecycle_fut = lifecycle_task(&ctrl);
        dbg!(std::mem::size_of_val(&conn_fut));
        dbg!(std::mem::size_of_val(&send_task_futs));
        dbg!(std::mem::size_of_val(&lifecycle_fut));
        join!(conn_fut, send_task_futs, lifecycle_fut);
    };
    dbg!(std::mem::size_of_val(&everything));
    everything.await
};
dbg!(std::mem::size_of_val(&x));
block_on(x);
```

So it definitely seems to help quite a bit, including for the inner futures. However it seems like the main issue is still the exponential growth that is caused by some operations, e.g. the `join!` here. Afterwards I was curious how much `join!` itself contributes, so I replaced it with a hand-rolled `Joiner`. The code then looks like:

```rust
let x = async {
    let everything = async {
        let (ctrl, starter) = connection_builder.build().unwrap();
        dbg!(std::mem::size_of_val(&starter));
        let conn_fut = async move {
            let conn_res = starter.start_connection().await;
            trace!("Res: {:?}", conn_res);
        };
        let send_task_futs = join_all((0..8).map(|_| send_msgs(&ctrl)));
        let lifecycle_fut = lifecycle_task(&ctrl);
        dbg!(std::mem::size_of_val(&conn_fut));
        dbg!(std::mem::size_of_val(&send_task_futs));
        dbg!(std::mem::size_of_val(&lifecycle_fut));
        let joiner = Joiner {
            a: Some(conn_fut),
            b: Some(send_task_futs),
            c: Some(lifecycle_fut),
            a_res: None,
            b_res: None,
            c_res: None,
        };
        dbg!(std::mem::size_of_val(&joiner));
        joiner.await;
    };
    dbg!(std::mem::size_of_val(&everything));
    everything.await
};
dbg!(std::mem::size_of_val(&x));
block_on(x);
```

The sizes with that change are:
Which are in the same ballpark as with `join!`. The interesting thing here: actually there is only a single

Edit: The last sentence was not true,
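For reference, here is a minimal self-contained sketch of such a hand-rolled joiner, reduced to two futures, restricted to `Unpin` futures to keep the polling code simple, and driven by a no-op-waker `block_on`. The field names mirror the snippet above, but the implementation details are assumptions, not the code actually benchmarked:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Polls both futures until each is done, then yields both results.
struct Joiner<A: Future, B: Future> {
    a: Option<A>,
    b: Option<B>,
    a_res: Option<A::Output>,
    b_res: Option<B::Output>,
}

impl<A, B> Future for Joiner<A, B>
where
    A: Future + Unpin,
    B: Future + Unpin,
    A::Output: Unpin,
    B::Output: Unpin,
{
    type Output = (A::Output, B::Output);

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = &mut *self;
        if let Some(fut) = this.a.as_mut() {
            if let Poll::Ready(v) = Pin::new(fut).poll(cx) {
                this.a_res = Some(v);
                this.a = None; // drop the finished future, releasing its frame
            }
        }
        if let Some(fut) = this.b.as_mut() {
            if let Poll::Ready(v) = Pin::new(fut).poll(cx) {
                this.b_res = Some(v);
                this.b = None;
            }
        }
        match (this.a_res.take(), this.b_res.take()) {
            (Some(a), Some(b)) => Poll::Ready((a, b)),
            (a, b) => {
                this.a_res = a;
                this.b_res = b;
                Poll::Pending
            }
        }
    }
}

// Minimal executor with a no-op waker; enough for futures that never
// actually need to be woken.
fn block_on<F: Future + Unpin>(mut fut: F) -> F::Output {
    fn raw() -> RawWaker {
        fn clone(_: *const ()) -> RawWaker {
            raw()
        }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(v) = Pin::new(&mut fut).poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    let joiner = Joiner {
        a: Some(std::future::ready(1u32)),
        b: Some(std::future::ready("two")),
        a_res: None,
        b_res: None,
    };
    assert_eq!(block_on(joiner), (1, "two"));
    println!("ok");
}
```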
@Matthias247 This is really helpful, thanks! I'm not sure what the root cause here is, but if you put that future in a crate I can checkout and build somewhere, I can probably find out.
Preserve local scopes in generator MIR

Part of #52924, depended upon by the generator layout optimization #60187.

This PR adds `StorageDead` statements in more places in generators, so we can see when non-`Drop` locals have gone out of scope and recover their storage. The reason this is only done for generators is compiler performance. See #60187 (comment) for what happens when we do this for all functions.

For `Drop` locals, we modify the `MaybeStorageLive` analysis to use `drop` to indicate that storage is no longer live for the local. Once `drop` returns or unwinds to our function, we implicitly assume that the local is `StorageDead`. Instead of using `drop`, it is possible to emit more `StorageDead` statements in the MIR for `Drop` locals so we can handle all locals the same. I am fine with doing it that way, but this was the simplest approach for my purposes. It is also likely to be more performant.

r? @Zoxc (feel free to reassign)
cc @cramertj @eddyb @RalfJung @rust-lang/wg-async-await
So I've taken a close look at @Matthias247's example, and believe that the solution here is the same as the one in #59123 (comment).
Removing the "blocking" label here as I believe the most urgent issue has been resolved. These types are still bigger than we'd like them to be and there're lots of opportunities for improvement, but I think we're now at a point where I'd feel comfortable stabilizing WRT size. |
This issue caused stack overflows for me in some cases. I think it should be a blocker.
@PvdBerg1998 A major optimization PR (#60187) landed the other day, have you tested with that? That said, I think there's still a huge improvement left to do in #59123. I'm hoping to have a fix for that up soon.
@tmandry I didn't, that's great! I'll take a look some time in the future.
Is generator size only a stability hazard because of the performance implications, or are there concerns about adding future generator size optimizations backward-compatibly? I was under the impression that we only guarantee size_of stability for repr(c) types. |
I recently had to deal with a special-purpose implementation of mini stackless coroutines based on LLVM IR and used for running GPU shaders with barriers on CPU.
Not sure how applicable these ideas are to our MIR, but I'm still just going to leave them here. |
The resulting size of generator `_gen` is 4100.

```rust
#![feature(generators, generator_trait)]

fn main() {
    let _gen = move || {
        let x = [1u8; 1024];
        yield;
        drop(x);
        let x = [1u8; 1024];
        yield;
        drop(x);
        let x = [1u8; 1024];
        yield;
        drop(x);
        let x = [1u8; 1024];
        yield;
        drop(x);
    };
    dbg!(std::mem::size_of_val(&_gen));
}
```
@DustinByfuglien Yes. Note that adding manual scopes around each variable keeps the size down to roughly a single slot:

```rust
#![feature(generators, generator_trait)]

fn main() {
    let _gen = move || {
        {
            let x = [1u8; 1024];
            yield;
            drop(x);
        }
        {
            let x = [1u8; 1024];
            yield;
            drop(x);
        }
        {
            let x = [1u8; 1024];
            yield;
            drop(x);
        }
        {
            let x = [1u8; 1024];
            yield;
            drop(x);
        }
    };
    dbg!(std::mem::size_of_val(&_gen));
}
```
But adding one more `yield` to each scope gives 4100 again.

```rust
#![feature(generators, generator_trait)]

fn main() {
    let _gen = || {
        {
            let x = [1u8; 1024];
            yield;
            yield;
            drop(x);
        }
        {
            let x = [1u8; 1024];
            yield;
            yield;
            drop(x);
        }
        {
            let x = [1u8; 1024];
            yield;
            yield;
            drop(x);
        }
        {
            let x = [1u8; 1024];
            yield;
            yield;
            drop(x);
        }
    };
    dbg!(std::mem::size_of_val(&_gen));
}
```
@tmandry Maybe you can take some time to look at this. Maybe these examples can be useful for improving the optimization algorithm.
Yes, that's because we only consider temporaries to be eligible for overlap if they're only live for exactly one yield point. |
The original impetus for this issue (getting generators to ever re-use storage slots) has been addressed to some degree, shall we close this? Alternatively, should we re-purpose this issue into a tracking issue for all generator-size-related issues? |
I feel like it would likely be better to create a new tracking issue, rather than inherit all the history from this one. |
Closing in favor of #69826 |
Currently, generators won't ever reuse storage slots, causing them to take up more space than is necessary:
Finding optimal solutions seems like a difficult packing problem, but it should be easy to do quite a bit better.
Also see the example case in #59087.
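One simple heuristic for that packing problem can be sketched as follows (an illustration, not rustc's actual algorithm): model each saved local by the set of yield points it is live across, and first-fit locals into slots whose liveness sets are disjoint.

```rust
use std::collections::HashSet;

// Assign each variable a storage slot. Variables whose live-across
// yield-point sets are disjoint may share a slot (first-fit greedy).
fn assign_slots(live: &[&[usize]]) -> Vec<usize> {
    let mut slot_live: Vec<HashSet<usize>> = Vec::new();
    let mut assignment = Vec::with_capacity(live.len());
    for var in live {
        let var_set: HashSet<usize> = var.iter().copied().collect();
        match slot_live.iter().position(|s| s.is_disjoint(&var_set)) {
            // An existing slot is free at all of this variable's
            // yield points: reuse it and mark those points occupied.
            Some(i) => {
                slot_live[i].extend(&var_set);
                assignment.push(i);
            }
            // Otherwise open a new slot.
            None => {
                slot_live.push(var_set);
                assignment.push(slot_live.len() - 1);
            }
        }
    }
    assignment
}

fn main() {
    // x live across yield 0, y across yield 1, z across both:
    // x and y can share a slot; z needs its own.
    let assignment = assign_slots(&[&[0], &[1], &[0, 1]]);
    assert_eq!(assignment, vec![0, 0, 1]);
    println!("{assignment:?}");
}
```

First-fit is not optimal in general (the underlying problem resembles interval-graph coloring with sizes and alignment on top), but as the issue says, it is already much better than never reusing storage.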