Add default feature for pinning TaskPool threads to CPU cores #6471
Conversation
I'm not sure that's the case. While the preemptive scheduler on all of the OSes we support can cause large-scale thread interruptions, CPU pinning alone won't solve that. The system itself may be momentarily oversubscribed, and there's not much we can do about that in userspace. For that, we'd need to set thread priority to prevent the app's threads from being preempted by the scheduler. I'm much less confident we should make that move, though, since altering process/thread priorities without good reason can easily choke a system by preventing it from being interrupted by high-priority realtime tasks (OS-level IO, etc.). One potential option in this space is temporarily setting a thread to a higher priority while executing a system or a task (see the sketch below).

One other well-known area that definitely could benefit from this is audio processing, which is known to benefit greatly from lowered preemption rates and higher cache coherency, so we could also scope that out as a focus area for either of these two config options.

Ultimately, CPU pinning should be viewed as a performance knob exposed to developers (and maybe end users via a config file) that represents an explicit tradeoff between the time to schedule the app's threads and the cache coherency of normal operation. It may be a perf win or a perf loss depending on the use case.
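A minimal sketch of that "boost priority only while a task runs" idea, using the third-party `thread_priority` crate. The helper name, the baseline-priority parameter, and the choice of `ThreadPriority::Max` are illustrative assumptions, not anything this PR implements:

```rust
use thread_priority::{set_current_thread_priority, ThreadPriority};

/// Hypothetical helper: run `task` at elevated priority, then restore a
/// caller-supplied baseline. Raising priority is best-effort -- it may need
/// elevated permissions on some OSes, in which case this silently no-ops.
fn run_boosted<T>(baseline: ThreadPriority, task: impl FnOnce() -> T) -> T {
    let boosted = set_current_thread_priority(ThreadPriority::Max).is_ok();
    let result = task();
    if boosted {
        // Drop back down so this thread doesn't starve the rest of the system.
        let _ = set_current_thread_priority(baseline);
    }
    result
}

fn main() {
    let sum = run_boosted(ThreadPriority::Min, || (0u64..1_000_000).sum::<u64>());
    println!("{sum}");
}
```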
Ah, I think I know why. Setting the core affinity means the thread can only be scheduled onto those cores, so the OS will defer scheduling it until one of them is available again. This includes when the thread willingly yields to the OS. Some other process is probably holding the core for longer, which is what's forcing the stall. The only real way to prevent this is by altering thread priority, and that might not be what we want.
That makes sense. It looks like the time slice a process is given on Windows is around 20 ms, which is about what I'm seeing. And it's mostly happening to the prepare systems because those park the thread while the GPU is busy.
Backlog cleanup: this mentions #4740 (which was closed without merging) as best merged first. Given that, and the inactivity since Dec 2022, I suspect it's reasonable to close this one too.
Objective
Fixes #4803. When the OS moves a thread to a different physical CPU core, the caches it warmed on the old core are effectively lost, forcing the thread to refetch its entire working set from main memory. This can be a significant performance killer in data-oriented programming, which Bevy, and ECS in general, heavily leverage. The current situation also allows threads within a process to be assigned to the same physical core, which can cause an app's threads to fight over the same physical resources.
Solution
Use core_affinity to forcibly pin OS threads to specific CPU cores. This isn't a silver bullet for dealing with the impact of context switches, but it trades away some of the scheduling flexibility the OS would otherwise have in exchange for the guarantee that a task on a given thread cannot be moved to a different physical core.
This PR fetches the list of physical core IDs and pins threads 1:1 to each one (see the sketch below). This is enabled by an optional-but-default feature on bevy_tasks.
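A minimal sketch of that approach, using the `core_affinity` crate's `get_core_ids`/`set_for_current` API; the thread setup and worker-loop placeholder here are illustrative stand-ins for bevy_tasks' actual pool internals:

```rust
use std::thread;

fn main() {
    // Fetch the IDs of all cores the process may run on.
    let core_ids = core_affinity::get_core_ids().expect("could not query core IDs");

    // Spawn one worker thread per core and pin them 1:1.
    let handles: Vec<_> = core_ids
        .into_iter()
        .map(|core_id| {
            thread::spawn(move || {
                // `set_for_current` returns false if pinning failed or is
                // unsupported on this platform; treat it as best-effort.
                if !core_affinity::set_for_current(core_id) {
                    eprintln!("failed to pin thread to {core_id:?}");
                }
                // ... the task pool's worker loop would run here ...
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}
```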
This is currently a draft, as the current configuration causes all 3 TaskPools (Compute, AsyncCompute, and IO) to pin their threads to the same cores, which is not ideal. This would be best merged after #4740 is completed.