WebGLRenderer: Rearrange logic in renderObjects to reduce CPU-side draw cost for multi-camera setups. #22123
Description
Rendering all objects for one camera at a time (rather than rendering each object for all cameras before moving on to the next object) reduces the CPU cost of rendering fairly significantly. Specifically, it avoids recomputing per-camera state for each object, and it reduces the number of GL state changes made when rendering objects with similar materials.
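To make the difference in loop order concrete, here is a minimal toy model. It is not the actual `renderObjects` code: `countCameraSwitches`, the `cameraOuter` flag, and the string stand-ins for objects and cameras are all hypothetical, and "camera switch" is used only as a rough proxy for the per-camera work and state changes described above.

```javascript
// Toy model (hypothetical names, not three.js internals): count how often the
// "current camera" changes between consecutive draws under each loop order.
function countCameraSwitches(objects, cameras, cameraOuter) {
  let switches = 0;
  let currentCamera = null;

  // Stand-in for a draw call; only tracks whether the camera changed.
  const draw = (object, camera) => {
    if (camera !== currentCamera) {
      switches++;
      currentCamera = camera;
    }
  };

  if (cameraOuter) {
    // Proposed order: all objects for one camera, then the next camera.
    for (const camera of cameras) {
      for (const object of objects) draw(object, camera);
    }
  } else {
    // Current order: each object for all cameras, then the next object.
    for (const object of objects) {
      for (const camera of cameras) draw(object, camera);
    }
  }
  return switches;
}

const objects = Array.from({ length: 1000 }, (_, i) => `cube${i}`);
const cameras = ['left', 'right'];

const objectOuterSwitches = countCameraSwitches(objects, cameras, false);
const cameraOuterSwitches = countCameraSwitches(objects, cameras, true);
console.log(objectOuterSwitches, cameraOuterSwitches); // 2000 2
```

In this model the object-outer order switches cameras on every draw, while the camera-outer order switches once per camera. How much of that translates into real savings depends on what the renderer actually recomputes on a camera change.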
You can see a before/after example in this sample - link. By default, it renders with the current Three.js behavior for WebGLRenderer::renderObjects. Tapping the right grip button switches to the proposed renderObjects order, and tapping the left grip button reverts to the default behavior. To evaluate this most effectively, run OVR Metrics HUD and keep an eye on the framerate. On Quest 2, I see an initial framerate of ~60fps. After toggling to the proposed renderObjects order, I get a solid 90fps.
A note on the sample: it is focused on stressing CPU draw calls. It draws a large number of individual cubes, each as its own draw call, and it intentionally renders headlocked and to low-resolution eye buffers so that perf results are consistent and the timing results reflect CPU perf.