
WebGPURenderer: Compute modelViewMatrix using GPU #29299

Merged · 13 commits merged into mrdoob:dev on Sep 3, 2024

Conversation

@sunag (Collaborator) commented Sep 2, 2024

Related issue: #28719
Related: https://lxjk.github.io/2017/10/01/Stop-Using-Normal-Matrix.html

Performance

This change is part of the integration work for #28719 aimed at reducing CPU usage. It brought a performance gain of around 25% for scenes with many objects.

default: 12.10 ms (old) → 8.85 ms (now)
bundle:   5.67 ms (old) → 3.27 ms (now)

Precision

You can use highPrecisionModelViewMatrix for all MVP computations globally or only for selected materials.

Global usage:

// global
import { highPrecisionModelViewMatrix, highPrecisionModelNormalViewMatrix } from 'three/tsl';

const renderer = new THREE.WebGPURenderer( { antialias: true } );
renderer.nodes.modelViewMatrix = highPrecisionModelViewMatrix;
renderer.nodes.modelNormalViewMatrix = highPrecisionModelNormalViewMatrix;

Single Material / Object

import { cameraProjectionMatrix, highPrecisionModelViewMatrix, highPrecisionModelNormalViewMatrix, positionLocal, normalLocal } from 'three/tsl';

const material = new THREE.NodeMaterial();
material.vertexNode = cameraProjectionMatrix.mul( highPrecisionModelViewMatrix ).mul( positionLocal );
material.normalNode = highPrecisionModelNormalViewMatrix.transformDirection( normalLocal );
  • model* nodes use the GPU.
  • highPrecision* nodes use the CPU.

Computing modelViewMatrix on the GPU is the default. Since these are all nodes, you can also build your own, as sketched below.
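
As one illustration of that flexibility, here is a minimal sketch (not from the PR) that drives the vertex stage from a plain mat4 uniform filled on the CPU each frame; customModelView and updateCustomModelView are hypothetical names, and mesh/camera stand in for your own objects:

import * as THREE from 'three/webgpu';
import { uniform, cameraProjectionMatrix, positionLocal } from 'three/tsl';

// Plain mat4 uniform holding a model-view matrix computed on the CPU.
const customModelView = uniform( new THREE.Matrix4() );

const material = new THREE.NodeMaterial();
material.vertexNode = cameraProjectionMatrix.mul( customModelView ).mul( positionLocal );

// Called each frame before rendering; the multiplication runs in 64-bit on the CPU.
function updateCustomModelView( mesh, camera ) {

	customModelView.value.multiplyMatrices( camera.matrixWorldInverse, mesh.matrixWorld );

}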


github-actions bot commented Sep 2, 2024

📦 Bundle size

Full ESM build, minified and gzipped.

              Before (min / gzip)    After (min / gzip)     Diff (min / gzip)
WebGL         685.1 kB / 169.6 kB    685.1 kB / 169.6 kB    +0 B / +0 B
WebGPU        821.7 kB / 220.6 kB    822.6 kB / 220.8 kB    +862 B / +223 B
WebGPU Nodes  821.3 kB / 220.5 kB    822.1 kB / 220.7 kB    +1.28 kB / +315 B

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

              Before (min / gzip)    After (min / gzip)     Diff (min / gzip)
WebGL         461.9 kB / 111.4 kB    461.9 kB / 111.4 kB    +0 B / +0 B
WebGPU        522.1 kB / 140.7 kB    521.7 kB / 140.6 kB    -441 B / -61 B
WebGPU Nodes  478.8 kB / 130.5 kB    478.3 kB / 130.5 kB    -43.78 kB / -57 B

@sunag sunag changed the title WebGPURenderer: Compute modelViewMatrix using GPU WebGPURenderer: Compute modelViewMatrix using GPU - WIP Sep 2, 2024
@WestLangley (Collaborator)

We do something similar here for instancing, but this only works if the columns of the matrix are orthogonal, which is not true in general.

See this explanation -- especially the last sentence.

@sunag (Collaborator, Author) commented Sep 2, 2024

Thanks @WestLangley!

Now I just need to find another approach for the webgpu_postprocessing_motion_blur example :)

@sunag sunag changed the title WebGPURenderer: Compute modelViewMatrix using GPU - WIP WebGPURenderer: Compute modelViewMatrix using GPU Sep 3, 2024
@sunag sunag marked this pull request as ready for review September 3, 2024 00:36
@sunag sunag added this to the r169 milestone Sep 3, 2024
@gkjohnson (Collaborator)

I'm not as familiar with shader nodes, so I may be misunderstanding what's happening here, but here are my two cents based on the description:

This change could cause precision issues when using large coordinates, since GPU calculations use 32-bit math. This has not been an uncommon issue with instanced and skinned meshes that have large position values (bone and instance matrices are multiplied into the modelView matrix on the GPU). See here and here. I suspect it will be even more common if this is done on the GPU for every mesh.

Assume the camera is far from the origin (e.g. orbiting a to-scale globe model with a radius of 6.3e6 meters), meaning the objects in frame also have extremely large positional values. If the modelView matrix is calculated on the CPU, 64-bit precision is used, so any error resulting from the calculations is much smaller than when the same calculations are done with 32 bits on the GPU. This can cause very noticeable jitter artifacts during rendering.
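
A small illustration (not from the comment) of the float32 resolution involved: near 6.3e6 the spacing between representable 32-bit values is about 0.5 m, so sub-meter motion can quantize away entirely.

const x = 6.3e6;    // meters from the origin, e.g. a point on a to-scale globe
const y = x + 0.1;  // move 10 cm

console.log( y - x );                               // 0.1 with 64-bit CPU math
console.log( Math.fround( y ) - Math.fround( x ) ); // 0, the 10 cm step is lost in float32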

@RenaudRohlinger (Collaborator)

In that case, similar to how logarithmicDepthBuffer improves depth precision at the cost of performance, we could introduce another option like renderer.normalMatrix (or somethingNormalCPU...) for 64-bit precision when jitter artifacts occur.
This would still allow high-precision modelView matrix calculations on the CPU, would align with the existing logarithmicDepthBuffer logic, and would improve performance while giving developers an option to increase precision in demanding scenarios.

@sunag (Collaborator, Author) commented Sep 3, 2024

I like the idea of having a point of origin relative to the camera, as presented in item 3.2.1 of this article: https://www.diva-portal.org/smash/get/diva2:275843/FULLTEXT02.

@gkjohnson (Collaborator)

> we could introduce another option like renderer.normalMatrix

To be clear, is this just for normal matrices? This PR involves moving both the model-view matrix multiplication (and, implicitly, the normal matrix generation) to the GPU, right? In that case we'd want to name it something indicating it's for more than just normal matrices.

> I like the idea of having a point of origin relative to the camera as presented in item 3.2.1 of this article https://www.diva-portal.org/smash/get/diva2:275843/FULLTEXT02.

This is what multiplying the model and view matrices on the CPU achieves, i.e. what WebGLRenderer is already doing.

@sunag (Collaborator, Author) commented Sep 3, 2024

I think the idea is to have global matrices relative to the camera position. It's not what we do today.

@WestLangley (Collaborator)

Restatement of my previous comment:

The technique proposed in this PR will only be correct when the columns of the model view matrix are orthogonal. The columns will typically not be orthogonal when, for example,

(a) a non-uniformly-scaled parent has a rotated child (see the sketch after this list),

(b) a user-provided object matrix has non-orthogonal columns.
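
A short illustration (not from the comment) of case (a) using plain three.js math classes: a non-uniformly-scaled parent with a rotated child yields a world matrix with non-orthogonal columns, and transforming a normal by that matrix directly diverges from the inverse-transpose normal matrix.

import * as THREE from 'three';

const parent = new THREE.Object3D();
parent.scale.set( 1, 3, 1 );        // non-uniformly-scaled parent

const child = new THREE.Object3D();
child.rotation.z = Math.PI / 4;     // rotated child
parent.add( child );
parent.updateMatrixWorld( true );

// Correct: inverse-transpose of the upper 3x3 of the world matrix.
const normalMatrix = new THREE.Matrix3().getNormalMatrix( child.matrixWorld );

const n = new THREE.Vector3( 0, 1, 0 );
const correct = n.clone().applyMatrix3( normalMatrix ).normalize();

// Naive: rotate the normal by the model matrix itself, which is only valid
// when its columns are orthogonal; here it gives a different direction.
const naive = n.clone().transformDirection( child.matrixWorld );

console.log( correct, naive ); // ≈ (-0.95, 0.32, 0) vs ≈ (-0.32, 0.95, 0)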

@WestLangley (Collaborator)

Maybe revisit #5974, instead.

@gkjohnson (Collaborator)

> I think the idea is to have global matrices relative to the camera position. It's not what we do today.

This is no different than calculating a model-view matrix, though, as far as I understand. The model-view matrix places the object relative to the camera. Perhaps you're imagining something different, but in order to maintain these matrices you have to multiply the existing world matrix by the inverse of the camera world matrix. You can either do that before rendering or maintain it on each object, but I'm not sure of the value of the latter, since it just makes things more difficult to maintain and removes the ability to render with multiple cameras without recalculating everything. Either way the same (if not more) matrix multiplication has to happen, and everything has to be recalculated when the camera moves.

I may need a more concrete explanation to understand the differences in what's being suggested.

@aardgoose (Contributor)

I have been experimenting with a similar idea (obviously restricted to uniform scaling), but made it an opt-in via object.static as proposed in #28719. Thus the existing, known-to-be-correct behavior is preserved, while a lighter-CPU variant is available for render bundles (to get lighting working, light uniforms need to move into a shared bindGroup, etc.).

https://github.com/aardgoose/three.js/tree/freeze2

@sunag (Collaborator, Author) commented Sep 3, 2024

This would not use a matrix multiplication; it would be a simple subtraction of the object's world-matrix position from the camera's world position, so that for the GPU the camera's world position is effectively always zero. It is certainly something else to compute on the CPU, just as viewMatrix and normalMatrix are today, and not ideal for render bundles. But since three.js currently has no dedicated API for "huge open world" scenes, and given the issues you presented in WebGLRenderer, a viewMatrix calculated on the CPU does not solve problems such as attached SkinnedMesh, InstancedMesh and probably others that world matrices relative to the camera position should resolve.
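
A minimal sketch of that subtraction (the helper name computeCameraRelativeMatrix is hypothetical; the renderer would still apply the camera's rotation separately):

import * as THREE from 'three';

const _relativeMatrix = new THREE.Matrix4();

// Copy the object's world matrix and subtract the camera's world position from
// its translation in 64-bit on the CPU, so the GPU sees the camera at the origin.
function computeCameraRelativeMatrix( object, camera, target = _relativeMatrix ) {

	target.copy( object.matrixWorld );

	target.elements[ 12 ] -= camera.matrixWorld.elements[ 12 ];
	target.elements[ 13 ] -= camera.matrixWorld.elements[ 13 ];
	target.elements[ 14 ] -= camera.matrixWorld.elements[ 14 ];

	return target;

}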

It is also noticeable that most of those issues are related to incorrect use of scale, where 1 meter should be 1.0.

I don't think there is a perfect solution here. This PR prioritizes performance while remaining functional in situations where the camera moves around 10 kilometers from the center of the scene, which seems reasonable to me; in the remaining specific cases it is usually the scene that moves instead.

We could have Nodes to deal with these situations; since TSL Fn calls are deferred, we would have no problem defining how the viewMatrix is constructed for a given object. These are other possibilities to be studied.

@sunag (Collaborator, Author) commented Sep 3, 2024

> We could have Nodes to deal with these situations; since TSL Fn calls are deferred, we would have no problem defining how the viewMatrix is constructed for a given object. These are other possibilities to be studied.

It seems like the best way to close this issue:
You can use highPrecisionModelViewMatrix globally or for specific cases, for example:

Global usage:

// global
import { highPrecisionModelViewMatrix } from 'three/tsl';

const renderer = new THREE.WebGPURenderer( { antialias: true } );
renderer.nodes.modelViewMatrix = highPrecisionModelViewMatrix; // it will replace all MVP with this modelView node

Single Material / Object

import { cameraProjectionMatrix, highPrecisionModelViewMatrix, positionLocal } from 'three/tsl';

const material = new THREE.NodeMaterial();
material.vertexNode = cameraProjectionMatrix.mul( highPrecisionModelViewMatrix ).mul( positionLocal );

highPrecisionModelViewMatrix will use the CPU and modelViewMatrix will use the GPU.
modelViewMatrix will be the default.

@sunag sunag merged commit 29cb17f into mrdoob:dev Sep 3, 2024
12 checks passed
@sunag sunag deleted the dev-performance-3 branch September 3, 2024 15:20
@WestLangley (Collaborator)

Master branch (WebGLRenderer and WebGPURenderer):

[screenshot: master]

This PR (WebGPURenderer):

[screenshot: 169dev]

This is because this PR computes incorrect normals on the GPU... not surprising, based on my comments above.

@sunag (Collaborator, Author) commented Sep 3, 2024

@WestLangley Could you share the code of this test?

@WestLangley (Collaborator)

WebGPU dev branch fiddle: https://jsfiddle.net/La1e5gmz/

@sunag (Collaborator, Author) commented Sep 3, 2024

I'm checking that out, thanks. Maybe I'll try something like that, but I still need to do some testing.

The code below is just an abstraction

const modelNormalMatrix = ( object ) => ... new Matrix3().getNormalMatrix( object.matrixWorld )
const normalView = cameraViewMatrix.transformDirection( modelNormalMatrix.mul( normal ) );
