-
-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Moka get_with
inflates future size by ~7x
#212
Comments
Another issue I kinda measured is stack depth: Printing a Still, there is quite some overhead there, though luckily release builds should optimize most of that out. |
Thank you for the detailed analysis. I was not aware of this kind of optimization issue exists in
You are right. I added fn main() {
let cache = moka::future::Cache::new(1);
let get_fut = async {
let buf = [0u8; 1024];
async {}.await;
drop(buf);
};
let get_fut = Box::pin(get_fut); // <= Added this line.
println!("get_fut size: {}", std::mem::size_of_val(&get_fut));
let moka_fut = cache.get_with((), get_fut);
println!("moka_fut size: {}", std::mem::size_of_val(&moka_fut));
} Before adding the line: get_fut size: 1026
moka_fut size: 7712 After adding the line: get_fut size: 8
moka_fut size: 688 (rustc 1.66.1 (90743e729 2023-01-10), host: aarch64-apple-darwin) According to rust-lang/rust#62958, nesting Line 917 in 84bee99
async fn get_with(..., init, ...)
=> async fn get_or_insert_with_hash_and_fun(..., init, ...)
=> async fn insert_with_hash_and_fun(..., init, ...) and also I will get the |
Thank you for taking a look ❤️ I believe just reducing the number of nested calls should improve the situation here. All in all, this seems to be a well known problem for Rust compiler developers. Apart from the issue you linked to, there is also a tracking issue for general memory usage issues: rust-lang/rust#69826 A Also, I have done my analysis on BTW, I blogged about this whole topic here: https://swatinem.de/blog/future-size/ also with an analysis of assembly etc. |
We can extract this from #1010, and this should help with moka-rs/moka#212 even if we do not (yet) switch to more widespread moka usage.
I started to work on PR #220 to mitigate this issue for upcoming v0.10.0 release. Once I am satisfied with the result, I will back-port the mitigation to v0.9.7 as well. In v0.10.0, there are Current ResultI ran the following program (debug build) against the head of the fn main() {
let cache = moka::future::Cache::new(1);
async fn get() {
let buf = [0u8; 1024];
async {}.await;
#[allow(clippy::drop_copy)]
drop(buf);
}
let get_fut = get();
let get_fut_size = std::mem::size_of_val(&get_fut);
println!("get_fut size: {} bytes", get_fut_size);
let moka_fut = cache.get_with((), get_fut);
let moka_fut_size = std::mem::size_of_val(&moka_fut);
println!(
"moka_get_with_fut size: {} bytes ({:.2}x)",
moka_fut_size,
moka_fut_size as f64 / get_fut_size as f64
);
let get_fut = get();
let moka_fut = cache.entry(()).or_insert_with(get_fut);
let moka_fut_size = std::mem::size_of_val(&moka_fut);
println!(
"moka_entry_or_insert_wt_fut size: {} bytes ({:.2}x)",
moka_fut_size,
moka_fut_size as f64 / get_fut_size as f64
);
} Before get_fut size: 1026 bytes
moka_get_with_fut size: 7712 bytes (7.52x)
moka_entry_or_insert_wt_fut size: 7720 bytes (7.52x) After get_fut size: 1026 bytes
moka_get_with_fut size: 4464 bytes (4.35x)
moka_entry_or_insert_wt_fut size: 5536 bytes (5.40x) $ rustc -V
rustc 1.67.0 (fc594f156 2023-01-24) Current Change
|
I used a nightly compiler on the topic branch of the PR to log the type sizes. I found that the most inner future, which is from print-type-size variant `Suspend1`: 3361 bytes
...
print-type-size padding: 1026 bytes
print-type-size upvar `.init`: 1026 bytes, alignment: 1 bytes
...
print-type-size local `.__awaitee`: 1026 bytes, alignment: 1 bytes $ rustc +nightly -V
rustc 1.69.0-nightly (11d96b593 2023-02-01)
$ cargo +nightly rustc -F future --example check_future_size -- -Zprint-type-sizes
...
print-type-size type: `[async fn body@moka::future::value_initializer::ValueInitializer<...>::
try_init_or_read<'_, (), (), ...>::{closure#0}]`: 3368 bytes, alignment: 8 bytes
print-type-size discriminant: 1 bytes
print-type-size variant `Suspend1`: 3361 bytes
print-type-size local `.get`: 32 bytes, offset: 0 bytes, alignment: 8 bytes
print-type-size upvar `.get`: 32 bytes
print-type-size local `.cht_key`: 16 bytes
print-type-size padding: 16 bytes
print-type-size upvar `.type_id`: 8 bytes, alignment: 8 bytes
print-type-size local `.self`: 8 bytes
print-type-size local `.insert`: 24 bytes
print-type-size local `.post_init`: 8 bytes
print-type-size padding: 8 bytes
print-type-size local `.hash`: 8 bytes, alignment: 8 bytes
print-type-size local `.waiter`: 8 bytes
print-type-size local `..generator_field10`: 8 bytes
print-type-size local `.waiter_guard`: 56 bytes
print-type-size upvar `.self`: 8 bytes
print-type-size upvar `.key`: 8 bytes
print-type-size upvar `.insert`: 24 bytes
print-type-size upvar `.post_init`: 8 bytes
print-type-size padding: 1026 bytes
print-type-size upvar `.init`: 1026 bytes, alignment: 1 bytes
print-type-size local `..generator_field19`: 1 bytes
print-type-size local `..generator_field20`: 1 bytes
print-type-size local `..generator_field21`: 1 bytes
print-type-size padding: 1 bytes
print-type-size local `.__awaitee`: 1026 bytes, alignment: 1 bytes
... Unfortunately, I do not think I could simplify |
@tatsuya6502 thanks for taking a look at this! ❤️ I believe the duplication between The padding looks a bit out of place though. It might be reserved for some space that is needed in |
Yes, there are other suspend points: -Zprint-type-sizes
|
I tried one of the solutions in your blog article, and it worked very well for our case.
Commit: bc92f4e Result: get_fut size: 1026 bytes
moka_get_with_fut size: 2544 bytes (2.48x)
moka_entry_or_insert_wt_fut size: 2616 bytes (2.55x)
As for safety, I made only internal methods to take I think ~2.5x is almost optimal with current |
Nice, that looks like a very nice improvement!
Your comment is not formatted properly. It might be worth creating a separate issue in rustc for that one, depending on what is seemingly going on. |
Oops. Fixed it. FYI,
|
Thank you for creating |
Tested v0.9.7 and published it to crates.io: |
Closing this issue:
Please reopen if needed. |
Add test for Future inflating arg size to 3x This adds one more test that should track improvements to generator layout, like rust-lang#62958 and rust-lang#62575. In particular, this test highlights suboptimal layout, as the storage for the argument future is not being reused across its usage as `upvar`, `local` and `awaitee` (being polled to completion). This is on top of rust-lang#107692 (as those would conflict with each other) It is a minimal repro for code mentioned in moka-rs/moka#212 (comment) (CC `@tatsuya6502)`
FYI, there is work underway in the compiler to solve some of the problems with future size blowup: |
Consider the following example:
This outputs the following (even in
--release
mode):The resulting future grows by pretty much 7x the size of the future I provide.
The problem does not seem to be affected by rust-lang/rust#62321 though.
If I comment out the two
println
statements, I can use a nightly compiler to log the type sizes, usingcargo +nightly rustc -- -Zprint-type-sizes
:-Zprint-type-sizes
This problem manifests itself here: getsentry/symbolicator#979
The tests are failing due to stack overflows.
I believe there is a workaround though to intentionally use
Box::pin
to reduce the size of the future passed toget_with
.Also,
get_with
has roughly an overhead of~616
bytes, if I give it an empty future (though I believe that still has at least a discriminant).The text was updated successfully, but these errors were encountered: