Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebGPURenderer: partial compute() shader support for WebGL backend #27367

Merged
merged 9 commits into from
Jan 9, 2024

Conversation

aardgoose
Copy link
Contributor

Adds support for compute shaders where storageBuffers are only accessed by instanceIndex.

Areas that need work are a possible helper StorageAttribute subclass of InstancedBufferAttribute to handle different alignment characteristics of WebGL and WebGPU and coordinate system differences.

With minimal changes this allows the webgpu_compute_particles, webgpu_compute_particles_rain and webgpu_compute_points examples to run.

With #27355 the webgpu_compute_audio runs but passes the original audio through unchanged, demonstrating the instanceIndex limitation above.

This PR replaces the WebGPU storage buffers with attribute/varying pairs with transform feedback in the WebGL shaders.
(as a result buffer allocation is doubled for these buffers)

This also includes (again) the movement of VAO creation into the draw call, now with full material/geometry caching of VAOs which allows other examples to reuse VAOs correctly across objects. This is required because the VAO required is not constant for draw calls when vertex attributes from storage buffers are used.

Also fixed confusion of return of NodeBuilderState or NodeBuilder from nodes.getForCompute().

@sunag sunag added this to the r160 milestone Dec 12, 2023
@mrdoob mrdoob modified the milestones: r160, r161 Dec 22, 2023
@RenaudRohlinger
Copy link
Collaborator

The introduction of a compute shaders fallback in WebGL is amazing. Moving ahead with this seems like a strategic step, especially considering the intricate task of aligning the WebGL and WebGPU backends.
We can refine the 'StorageAttribute' subclass implementation in future updates, which will allow us to swiftly integrate these essential functionalities.

I am worried that waiting too long to merge this draft might lead to potential conflicts with future updates of the WebGPURenderer and particularly the WebGLBackend and potentially delay this crucial feature. What's your thoughts on this @aardgoose?

@sunag sunag marked this pull request as ready for review January 9, 2024 02:10
@sunag
Copy link
Collaborator

sunag commented Jan 9, 2024

It seems that webgpu_compute_audio is not returning the audio computed in WebGLBackend.

@sunag
Copy link
Collaborator

sunag commented Jan 9, 2024

Amazing work so far.

@RenaudRohlinger
Copy link
Collaborator

RenaudRohlinger commented Jan 9, 2024

Maybe we should convert the transformFeedbackVarying usage from gl.SEPARATE_ATTRIBS to gl.INTERLEAVED_ATTRIBS before merging as the current implementation is too limited in the WebGL context (4 attributes maximum vs 128 with interleaved).

I can take care of it if you want.

/cc @aardgoose @sunag

@aardgoose
Copy link
Contributor Author

@sunag the audio example is one that can't be replicated, at least easily, because it relies on using storage buffer indexes other than InstanceIndex so you just get the same audio out that you put in.

@aardgoose
Copy link
Contributor Author

@RenaudRohlinger using interleaved buffers and remaining compatible with the existing webGPU storage buffer code would be difficult I suspect (I haven't looked deeply though), I presume you would have to define the storage buffers as structs in wgsl to keep buffers compatible (apart from alignment differences).

@RenaudRohlinger
Copy link
Collaborator

RenaudRohlinger commented Jan 9, 2024

I believe it would be doable without affecting the WebGPU side.
Basically in the class WebGLAttributeUtils there could be a new method createComputeAttributes which packs and interleaved all the storage attributes (regardless of their type) into a giant ArrayBuffer with a reference of stride and offset.
Then the WebGLBackend would stay almost the same except that instead of switching all the attributes we would just swap the interleavedBuffer that contains all the preconfigured attributes via VAO. /cc @aardgoose

I just gave it a try but I couldn't figure out why the buffers were not correctly bounded (no error and no result to the screen, definitely not the best type of issue).

I'd say we can merge it as it is and try to improve it through interleaved array buffer with another PR. In the meantime developers can just try to be careful on how they pack their datas 😄.
/cc @sunag

@sunag
Copy link
Collaborator

sunag commented Jan 9, 2024

I'd say we can merge it as it is and try to improve it through interleaved array buffer with another PR. In the meantime developers can just try to be careful on how they pack their datas 😄.

I think it's great. I think we can think about InterleavedStorageBufferAttribute and transfer the logic of the packs initially to the user.

@sunag sunag changed the title RFC WebGPURenderer: partial compute() shader support for WebGL backend WebGPURenderer: partial compute() shader support for WebGL backend Jan 9, 2024
@sunag sunag merged commit 9fb80cf into mrdoob:dev Jan 9, 2024
11 checks passed
@aardgoose aardgoose deleted the compute-2 branch January 12, 2024 14:14
AdaRoseCannon pushed a commit to AdaRoseCannon/three.js that referenced this pull request Jan 15, 2024
…rdoob#27367)

* compute support

* code bot fixes

* add StorageBufferAttribute

* adapt to storagetBufferAttribute

* dynamically realign the size of storage buffer based on backend

* compute examples takes too much time to init for puppeteer

* Add StorageBufferAttribute.create()

---------

Co-authored-by: aardgoose <[email protected]>
@LeviPesin
Copy link
Contributor

Should we close #25273 then?

@LeviPesin LeviPesin mentioned this pull request Jan 18, 2024
3 tasks
console.warn( 'Abstract class.' );
const gl = this.gl;

this.discard = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this flag? Maybe RASTERIZER_DISCARD can be just always enabled in beginCompute and disabled in finishCompute? /ping @aardgoose @sunag

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did have it that way at first, but for some reason which I can't remember that had issues. If I get a chance I will try and reproduce the problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked again, the check deals with nested compute shaders.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... Feels a bit strange for me. So you mean one shader starts executing before another is finished? Is this intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment needs fixing. Actually the problem is that compute passes can be nested and indeed are in the compute_points example, which is where I hit this issue:

renderer.compute( precomputeShaderNode().compute( particleNum ) );

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just thought about this again and I think the current approach is completely equivalent to always calling enable in compute (the only possibility is that it could be called multiple times -- but I guess it wouldn't break?) and disable in finishCompute (it already does this). Can you please check this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIth nested calls you get the sequence, seen in practice in compute_points: I originally had enable in discard in beginCompute().

beginCompute()
   beginCompute() <- called from onBeforeRender()
      compute()
   endCompute()
   compute() <- discard disabled by previous endCompute()
endCompute()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compute() <- discard disabled by previous endCompute()

No -- I'm suggesting that it be enabled in compute() instead of beginCompute() (currently it is the same).


}

static create( count, itemSize, typeClass = Float32Array ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of using this method we can overload the constuctor? Like so: /ping @aardgoose @sunag

constructor( array, itemSize, typeClass = Float32Array ) {

    if ( ArrayBuffer.isView( arrray ) === false ) array = new typeClass( array * itemSize );

    super( array, itemSize );

    this.isStorageBufferAttribute = true;

}

'webgpu_portal'
'webgpu_portal',

// WebGPU idleTime and parseTime too low
Copy link
Contributor

@LeviPesin LeviPesin Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to increase idleTime -- I think from these two only it could be relevant here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants