WebGLRenderer: Merge update ranges before issuing updates to the GPU. #29189
TLDR: this PR achieves up to 3 orders of magnitude performance improvement when updating a large number of adjacent ranges within InstancedBufferAttribute, which is a common use case for projects heavily leveraging instancing.

Description
BufferAttribute#addUpdateRange can be used with needsUpdate so that three.js only transfers subsections of data to the GPU. This is a powerful feature which allows clients to better manage CPU<>GPU bandwidth. For example, in cases where a BufferAttribute may be several MB large and only a few bytes change per frame, clients can transfer only the changed bytes instead of the entire buffer.

In our product we've seen large performance gains using update ranges, but frame drops in cases where many update ranges are present in a single frame. This can easily be observed with InstancedBufferAttribute. In a project which heavily leverages InstancedMesh, and therefore InstancedBufferAttribute, to represent instance data, individual instances commonly need to be updated using addUpdateRange. In a frame where all instances need to be updated, this can create a large number of update ranges which are nearly all adjacent. As a result we observe a large number of avoidable gl.bufferSubData calls and frame drops (I imagine due to GPU command overhead).

This PR automatically merges overlapping / adjacent update ranges before calling gl.bufferSubData and results in up to a 99.78% wall time reduction rendering our project (see below for details).

Impact
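As background for the numbers in this section, the per-instance update pattern described above can be sketched as follows. This is a standalone illustration, not code from the PR: FakeInstancedAttribute is a hypothetical stand-in that is assumed to mirror the array, itemSize, updateRanges, addUpdateRange(start, count) and needsUpdate members of the real attribute, and setInstancePosition is a made-up helper name.

```javascript
// Hypothetical stand-in for the parts of THREE.InstancedBufferAttribute used here.
// Assumption: the real attribute exposes the same members with the same meaning.
class FakeInstancedAttribute {
  constructor(array, itemSize) {
    this.array = array;
    this.itemSize = itemSize;
    this.updateRanges = [];
    this.needsUpdate = false;
  }

  addUpdateRange(start, count) {
    this.updateRanges.push({ start, count });
  }
}

// Write one instance's vec3 position and flag only that slice for upload.
function setInstancePosition(attribute, index, x, y, z) {
  const offset = index * attribute.itemSize;
  attribute.array[offset] = x;
  attribute.array[offset + 1] = y;
  attribute.array[offset + 2] = z;
  attribute.addUpdateRange(offset, attribute.itemSize);
  attribute.needsUpdate = true;
}

const INSTANCE_COUNT = 10000;
const positions = new FakeInstancedAttribute(
  new Float32Array(INSTANCE_COUNT * 3),
  3
);

// Update every instance in a single "frame".
for (let i = 0; i < INSTANCE_COUNT; i++) {
  setInstancePosition(positions, i, i, 0, 0);
}

// Count ranges that begin exactly where the previous one ended.
let adjacentPairs = 0;
for (let i = 1; i < positions.updateRanges.length; i++) {
  const prev = positions.updateRanges[i - 1];
  const next = positions.updateRanges[i];
  if (prev.start + prev.count === next.start) adjacentPairs++;
}

console.log(positions.updateRanges.length, adjacentPairs); // 10000 9999
```

Every range touches its neighbour, so a renderer that uploads each range separately would issue one upload per instance even though the data forms a single contiguous block.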
In a toy example within our company, I created a scene with 10k plane geometries (via InstancedMesh) which are positioned by vec3s interleaved via InstancedBufferAttribute. Updating all 10k positions in a single frame on a 2021 M1 MacBook Pro takes 112.21ms in three.js today; when run on this PR it takes 0.25ms instead.

Design Notes
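As a rough illustration of the merge step this PR describes (not the PR's actual implementation), overlapping or adjacent ranges can be collapsed by sorting them by start offset and extending the previous range whenever the next one begins at or before its end. The helper name mergeRanges is hypothetical, and the { start, count } range shape is assumed to match the entries stored in updateRanges.

```javascript
// Illustrative only: mergeRanges is a hypothetical helper, not the function added by this PR.
// Assumption: each range is a { start, count } object, as in updateRanges.
function mergeRanges(ranges) {
  if (ranges.length === 0) return [];

  // Sort a copy by start offset so overlapping or adjacent ranges become neighbours.
  const sorted = ranges
    .map((r) => ({ start: r.start, count: r.count }))
    .sort((a, b) => a.start - b.start);

  const merged = [sorted[0]];
  for (let i = 1; i < sorted.length; i++) {
    const last = merged[merged.length - 1];
    const current = sorted[i];
    const lastEnd = last.start + last.count;

    if (current.start <= lastEnd) {
      // Overlapping or touching: extend the previous range to cover both.
      const end = Math.max(lastEnd, current.start + current.count);
      last.count = end - last.start;
    } else {
      // A gap separates the two ranges, so start a new merged range.
      merged.push(current);
    }
  }
  return merged;
}

const result = mergeRanges([
  { start: 6, count: 3 },
  { start: 0, count: 3 },
  { start: 3, count: 3 },
  { start: 20, count: 4 },
  { start: 22, count: 1 },
]);

console.log(result); // [ { start: 0, count: 9 }, { start: 20, count: 4 } ]
```

In this sketch five input ranges collapse to two, so a renderer would need two uploads instead of five. Because the function copies and sorts its input, it gives the same result whether the ranges were added through addUpdateRange or pushed onto the array directly.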
I did not add the merging logic to addUpdateRange, where we could amortize the merging costs, because clients are allowed to directly manipulate the updateRanges array. Adding this logic to the renderers ensures robustness regardless of how clients interact with update ranges.

[...] WebGLAttributes, making it challenging to envision how we'd mock gl when instantiating WebGLAttributes. If there's a suggestion here, I'd love to hear it.

This contribution is funded by SOOT