
Add default feature for pinning TaskPool threads to CPU cores #6471

Closed
wants to merge 6 commits

Conversation

@james7132 (Member) commented Nov 5, 2022

Objective

Fixes #4803. When the OS moves a thread to a different physical CPU core, the CPU cache is effectively flushed as a result of the context switch, forcing the thread to refetch its entire execution context from main memory. This can be a significant performance killer in data-oriented programming, which Bevy, and ECS in general, heavily leverage. The current situation also allows multiple threads within a process to be assigned to the same physical core, causing an app's threads to fight over the same physical resources.

Solution

Use core_affinity to pin OS threads to specific CPU cores. This isn't a silver bullet for the impact of context switches: it trades away some of the scheduling opportunities the OS would otherwise have in exchange for ensuring that a task on a given thread can never be moved to a different physical core.

This PR fetches a list of all of the physical core IDs and pins threads to them 1:1. This is gated behind an optional, default-enabled feature on bevy_tasks.
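
As a rough illustration of the approach, here is a minimal sketch using the core_affinity crate. `spawn_pinned_workers` and the executor-loop placeholder are invented for this example and are not the PR's actual code:

```rust
use std::thread;

// Minimal sketch of 1:1 thread-to-core pinning with the `core_affinity` crate.
// `spawn_pinned_workers` is a hypothetical helper, not this PR's actual API.
fn spawn_pinned_workers() -> Vec<thread::JoinHandle<()>> {
    // Fetch the IDs of all cores this process may run on.
    let core_ids = core_affinity::get_core_ids().unwrap_or_default();

    core_ids
        .into_iter()
        .map(|core_id| {
            thread::spawn(move || {
                // Pin this worker thread to its assigned core; `set_for_current`
                // returns false if the OS rejects the affinity request.
                if core_affinity::set_for_current(core_id) {
                    // ... run the task pool's executor loop on this thread ...
                }
            })
        })
        .collect()
}
```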

This is currently a draft, as the current configuration pins all 3 TaskPools (Compute, AsyncCompute, and IO) to the same set of cores, which is not ideal. Preferably this would be merged after #4740 is completed.

@james7132 marked this pull request as draft November 5, 2022
@james7132 added the C-Performance and A-Tasks labels November 5, 2022
@james7132 requested a review from hymm November 5, 2022
@alice-i-cecile added the S-Blocked label November 5, 2022
@alice-i-cecile (Member) commented:

At work, we've been seeing anomalous lag spikes without a corresponding gameplay reason.
[screenshot: frame-time graph showing anomalous lag spikes]

This seems like a very plausible candidate for this problem.

@james7132 (Member, Author) commented Nov 5, 2022

> At work, we've been seeing anomalous lag spikes without a corresponding gameplay reason.
>
> This seems like a very plausible candidate for this problem.

I'm not sure that's the case. While the preemptive schedulers on all of the OSes we support can interrupt threads on a large scale, CPU pinning alone won't solve that. The system itself may be momentarily oversubscribed, and there's not much we can do about that in userspace. For that, we'd need to set thread priority to prevent the app's threads from being preempted by the scheduler. I'm much less confident we should make that move, though, since altering process/thread priorities without good reason can easily choke a system by preventing it from being interrupted for high-priority realtime tasks (OS-level IO, etc.).

One potential option in this space is temporarily setting a thread to a higher priority while executing a system or a par_for_each task, to avoid preemption inside that block; the system calls needed to set and reset the priority might add overhead, though. Worth testing.
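
As a sketch of that idea (assuming the third-party thread_priority crate; this is not part of the PR, and `run_with_boosted_priority` is a hypothetical helper):

```rust
use thread_priority::{set_current_thread_priority, ThreadPriority};

// Hypothetical sketch: briefly boost the current thread's priority while a
// task runs, to reduce the chance of preemption inside that block. Each set
// call is a syscall, which is the overhead concern mentioned above. A real
// implementation would capture and restore the thread's original priority
// rather than assuming a baseline.
fn run_with_boosted_priority<R>(task: impl FnOnce() -> R) -> R {
    let _ = set_current_thread_priority(ThreadPriority::Max);
    let result = task();
    let _ = set_current_thread_priority(ThreadPriority::Min);
    result
}
```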

One other area that could clearly benefit is audio processing, which is known to gain a great deal from lower preemption rates and higher cache coherency, so we could also scope that out as a focus area for either of these two config options.

Ultimately, CPU pinning should be viewed as a performance knob exposed to developers (and maybe to end users via a config file) that makes an explicit tradeoff between the OS's freedom to schedule the app's threads and the cache coherency of normal operation. It may be a perf win or a perf loss depending on the use case.
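
In API terms, such a knob might look something like this (the `pin_threads` method is hypothetical; bevy_tasks' TaskPoolBuilder does not currently offer it):

```rust
use bevy_tasks::{TaskPool, TaskPoolBuilder};

// Hypothetical sketch: pinning surfaced as an explicit opt-in knob on the
// builder. `pin_threads` does not exist; it only illustrates the
// "performance knob" framing above.
fn build_compute_pool() -> TaskPool {
    TaskPoolBuilder::new()
        .num_threads(4)
        .thread_name("Compute Task Pool".to_string())
        // .pin_threads(true) // opt in where cache locality outweighs scheduling flexibility
        .build()
}
```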

@james7132 marked this pull request as ready for review November 29, 2022
@james7132 removed the S-Blocked label November 29, 2022
@hymm (Contributor) commented Dec 16, 2022

Using Tracy, I can see this is working on my Windows machine.

[screenshot: Tracy capture showing threads pinned to cores]

Initial investigation suggests this performs worse, but something weird is going on that deserves more investigation.

So frame times are worse on many_foxes

[screenshot: many_foxes frame times]

but frames are randomly getting stalled out
[screenshot: frame-time graph with random stalls]

I see it mostly during the prepare stage, but sometimes also in other systems like transform propagation and check visibility. No idea why it might be happening.

@james7132 (Member, Author) commented:

Ah, I think I know why. Setting the core affinity means the thread can only be scheduled onto those cores, so the OS will defer scheduling the thread until the core is available again, including when the thread willingly yields to the OS. There's probably some other process holding the core for longer, which is forcing that stall. The only real way we can prevent this is by altering thread priority, and that might not be what we want.

@hymm (Contributor) commented Dec 16, 2022

That makes sense. It looks like the time slice a process is given on Windows is around 20ms, which is about what I'm seeing. And it's hitting the prepare systems mostly because those park the thread while the GPU is busy.

@bas-ie (Contributor) commented Oct 20, 2024

Backlog cleanup: this mentions #4740 (which was closed without merging) as best merged first. Given that, and the inactivity since Dec 2022, I suspect it's reasonable to close this one too.

@bas-ie closed this Oct 20, 2024
Labels: A-Tasks, C-Performance
Linked issue: TaskPool threads should utilize CPU affinity