Improve rust resource usage #1
Conversation
IMO this is not a correct or fair replacement. With all other languages and runtimes, what's tested is: spawn N tasks on the runtime's scheduler, each of which sleeps for 10 seconds concurrently.

In my opinion, this is the only thing reasonable to measure. In contrast, what this PR measures is: create N sleep futures and then await them one by one inside a single task.
In other words, calling `sleep` by itself doesn't start anything running concurrently. The reason this is invisible is that `sleep` computes its deadline when the future is created, not when it is first polled. One way to prevent this is to write:

```rust
use std::env;
use tokio::time::{sleep, Duration};

#[tokio::main]
async fn main() {
    let args: Vec<String> = env::args().collect();
    let num_tasks = args[1].parse::<i32>().unwrap();
    let tasks = (0..num_tasks)
        .map(|_| tokio::spawn(sleep(Duration::from_secs(10))))
        .collect::<Vec<_>>();
    for task in tasks {
        task.await.unwrap();
    }
}
```

Note the use of `tokio::spawn`.
Nope, it doesn't evaluate them one by one. The script takes 10 seconds, no matter how many tasks you run. So for 1M tasks it also takes 10 seconds. And it consumed more memory than 100k.
I don't think you understood what I meant to say. Your code does evaluate futures one by one. However, this seems to work as intended because "sleep for 10s" is translated to "sleep until now + 10s" upon future creation. So the futures are created, all waiting for approximately the same moment, and then they are evaluated one by one. The first future takes a while to evaluate; the rest are awaited almost instantaneously. This is purely a side effect of how `sleep` is implemented.

In my opinion, it is exceedingly misleading to say that the code in this PR demonstrates that the tasks run concurrently.
To be more specific, the problem here is that futures don't "really" start execution until they are first polled, and they don't resume execution until they're polled again. Your code starts futures sequentially, while all other examples in this repository start tasks in parallel, including the original Rust one.

This effect is hidden when the future's result depends only on its creation time, as it does for `sleep`. This is only the case for some very specific futures provided by the runtime -- for most futures, nothing happens until the first poll.
So if you are right, this code should wait 10 seconds between "2" and "3"?

```rust
use std::env;
use tokio::time::{sleep, Duration};

#[tokio::main]
async fn main() {
    let args: Vec<String> = env::args().collect();
    let num_tasks = args[1].parse::<i32>().unwrap();
    let tasks = (0..num_tasks)
        .map(|_| sleep(Duration::from_secs(10)))
        .collect::<Vec<_>>();
    println!("1");
    std::thread::sleep(std::time::Duration::from_secs(10));
    println!("2");
    for task in tasks {
        task.await;
    }
    println!("3");
}
```
Maybe we're having a language barrier problem here. This is not what I'm saying, at all. I hate telling you to try reading what I said again, but I'm out of ideas for how to explain this. As a very high-level, not-at-all-correct metaphor, maybe consider that `sleep(Duration::from_secs(10))` means "sleep until (creation time + 10s)", not "sleep for 10s starting from the first poll".
Maybe a lower-level explanation will work better? I can write something up if you let me know how familiar you are with async internals, scheduling, and event loops in general.
I just checked. By having `spawn` in there, it consumes more memory than C#. I know the basics, but I'm not familiar with tokio internals.
The original code does run futures in parallel, because `tokio::spawn` submits each task to the executor immediately.

However, I agree that the two tokio/async_std benchmarks are somewhat incorrect, because they test the implementation of the runtime's timer as much as its task system.

The correct way to test the individual runtimes is to invoke `spawn(sleep(...))` directly, rather than `spawn(async { sleep(...).await })`.
With that suggestion implemented, I get 386 MiB for tokio and 513 MiB for async_std, both on 1M tasks. I'm not yet sure why the memory use is that high, but it's quite possible that that's just how things are.
Sadly, C# is the one language in this benchmark that I'm totally unfamiliar with. I'm reading the .NET code at the moment, and I think it's quite possible that their implementation is simply better than Rust's. Maybe polling (Rust) vs continuation (.NET) has something to do with this, I'm not sure. async_std's 512 bytes per task sounds unexpectedly high for no apparent reason, though.
I think the main difference there in memory usage comes from a simple place: the `Vec` collecting the tasks grows by doubling, so its capacity ends up at the next power of two above the task count.
No, it's quite easy to verify that's not the case. Even if you didn't verify that in practice, it's still clear that the next power of two after 1M (1,048,576) is very close to 1M, so over-allocation of that kind can't explain the difference.
It only works the same by coincidence. Reading this thread would demonstrate that a semantic difference exists. Unless you're purely talking about the measured memory numbers, that is.
I've done some research, and IMO Rust is actually very inefficient re: coroutine memory use, apparently more so than C#. Hopefully this is fixed in rustc at some point, but this seems to be a known issue. I'll try to push some PRs into async_std/tokio in the meantime to work around individual manifestations of this issue, but I won't make any promises.
By creating the future manually instead of relying on `async { .. }`, we work around rustc's inefficient future layout. On [a simple benchmark](https://github.com/hez2010/async-runtimes-benchmarks-2024) spawning 1M tasks, this reduces memory use from about 512 bytes per future to about 340 bytes per future. More context: hez2010/async-runtimes-benchmarks-2024#1
I've sent a PR to solve part of the problem for async_std, and a similar one for tokio. Perhaps the lesson here is to create futures manually instead of relying on `async { .. }` when the per-task footprint matters.
This might be reasonable. Tasks in .NET serve a similar role to Futures in Rust, but at the same time they are not part of a singular, coarser-grained concurrency unit like in Rust, where Futures are part of a Task. There is no need to deterministically know the memory consumed by all the call stacks within a Task in advance, since .NET is garbage-collected and heap-allocates the state machine boxes for tasks that yield asynchronously. The baseline allocation cost of a Task hence starts at about 100 B; another 100 B or so are spent on the timer object backing the delay.

At the same time, a Task in Rust, if my understanding is correct, has definitive knowledge of the memory taken by all of its constituent Futures. As a result, it has much smaller CPU overhead due to the way it is dispatched/polled, and smaller amortized memory overhead once we start throwing complex code at it. Which could explain why it starts at about 500 B.

This difference is unlikely to be relevant that often, but might be interesting in a common concurrency pattern seen in C#, where you have multiple calls with data independent from each other:

```csharp
using var http = new HttpClient { BaseAddress = someUrl };
var page1 = http.GetStringAsync("/page1");
var page2 = http.GetStringAsync("/page2");
Console.WriteLine(await page1 + await page2);
```

In Rust, if you would like whatever post-processing an async function does to be handled on a different thread, you have to spawn a Task. In .NET, on the other hand, this is the default behavior: tasks are hot-started, and any worker thread can steal the processing of a continuation from another worker's thread queue if the latter does not get to it in time.

Let me know if I got any details wrong; hopefully this sheds some light on the async differences between the two.
I'm closing this in favor of the other merged PR. |
It lowered memory use by about 30% for rust_tokio.