Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Add a helper for storage buffers similar to UniformVec #4079

Closed
wants to merge 6 commits into from

Conversation

superdump
Copy link
Contributor

Objective

  • Add a helper for storage buffers similar to UniformVec

Solution

  • Add a StorageBuffer<T, U> where T is the main body of the shader struct without any final variable-sized array member, and U is the type of the items in a variable-sized array.
  • Use () as the type for unwanted parts, e.g. StorageBuffer<(), Vec4>::default() would construct a binding that would work with struct MyType { data: array<vec4<f32>>; } in WGSL and StorageBuffer<MyType, ()>::default() would work with struct MyType { ... } in WGSL as long as there are no variable-sized arrays.
  • Std430 requires that there is at most one variable-sized array in a storage buffer, that if there is one it is the last member of the binding, and that it has at least one item. StorageBuffer handles all of these constraints.

StorageBuffer takes two types, the first is the type of the body of the shader
struct, excluding any final variable-sized array. The second is the type of the
variable-sized array items.
This allows specification of the empty () type for StorageBuffer bodies or
variable-sized arrays.
@github-actions github-actions bot added the S-Needs-Triage This issue needs to be labelled label Mar 2, 2022
@superdump superdump added A-Rendering Drawing game state to the screen C-Usability A targeted quality-of-life change that makes Bevy easier to use and removed S-Needs-Triage This issue needs to be labelled labels Mar 2, 2022
&mut self.values[index]
}

pub fn reserve(&mut self, capacity: usize, device: &RenderDevice) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to separately reserve the buffer from writing to it?

Could we use an approach like the one in #3532 that implements Deref(Mut) targeting the underlying Vec and defer buffer allocation/reservation to before writing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sweet! I hadn’t seen that PR. I was just saying to Cart that something felt wrong with how UniformVec contains the Vec but that looks like it addresses many of the things my gut was concerned about. We should merge that first and I’ll update this to follow the same pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed what I think is in line with what you have done in #3532 and I guess it's parallel work so this does not depend on that. Still, I've added #3532 to the rendering project board so we get both merged when they're ready to have a consistent API.

let mut offset = 0;
// First write the struct body if there is one
if Self::BODY_SIZE > 0 {
if let Ok(new_offset) = writer.write(&self.body).map_err(|e| warn!("{:?}", e)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is failing to write really just a warning? Should it bubble up the Result instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I suppose not successfully writing could leave the buffer in a trashed state. Maybe this should be an error. But, this was copy paste from UniformVec so would need addressing everywhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think this should be unwrapped (like we do in UniformVec). This will only fail if we haven't allocated enough space in the buffer. That is something we should be doing correctly (and behavior would be undefined if we didnt). Not something we should be surfacing to the user, or allowing to continue if we did this wrong.

self.scratch[i] = 0;
}
} else {
// Then write the array. Note that padding bytes may be added between the body
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the note should probably be on the function declaration to make it more discoverable.

Copy link
Contributor

@IceSentry IceSentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks correct to me other than the warning in my previous comment, but that should probably be dealt with in a separate PR that also touches UniformVec.

Copy link
Contributor

@kurtkuehnert kurtkuehnert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition. LGTM

Further Discussion

We should probably document this type once we have cleaned up its API.

Are there cases where StorageBuffer is not sufficient to represent any storage type?
Is it possible/useful to find a single general abstraction for uniform buffers as well? I suspect no.

@mockersf
Copy link
Member

mockersf commented Mar 5, 2022

should this be done for std140?

@superdump
Copy link
Contributor Author

should this be done for std140?

Should what be done? For std140 we have UniformVec and DynamicUniformVec.

@mockersf
Copy link
Member

mockersf commented Mar 5, 2022

Should what be done? For std140 we have UniformVec and DynamicUniformVec.

Good then! That was me missing the point 🙂

@superdump
Copy link
Contributor Author

Should what be done? For std140 we have UniformVec and DynamicUniformVec.

Good then! That was me missing the point 🙂

Also there is #3532 that implements Deref and DerefMut for *UniformVec and this PR adding StorageBuffer was updated to align with that approach for the contained Vec. I called this StorageBuffer because it can also have a body and there can only be one unsized array. As such StorageBuffer covers all of what a storage buffer binding can be, I think. Uniform buffers are a bit different in that all of their members are Sized (I think that is correct) and so unsized arrays are not allowed within the binding. But dynamic uniform bindings allow you to provide an offset when binding which will bind the nth item of the Vec. My brain is feeling fuzzy so I’m a bit uncertain about this but this is what it is telling me. :)

@james7132 james7132 added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Mar 6, 2022
@james7132
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Mar 7, 2022
@james7132
Copy link
Member

CI issues seem to be transient errors from dead links.

bors bot added a commit that referenced this pull request Mar 7, 2022
@superdump
Copy link
Contributor Author

bors try

bors bot added a commit that referenced this pull request Mar 13, 2022
@cart
Copy link
Member

cart commented Mar 22, 2022

bors r+

bors bot pushed a commit that referenced this pull request Mar 22, 2022
# Objective

- Add a helper for storage buffers similar to `UniformVec`

## Solution

- Add a `StorageBuffer<T, U>` where `T` is the main body of the shader struct without any final variable-sized array member, and `U` is the type of the items in a variable-sized array.
- Use `()` as the type for unwanted parts, e.g. `StorageBuffer<(), Vec4>::default()` would construct a binding that would work with `struct MyType { data: array<vec4<f32>>; }` in WGSL and `StorageBuffer<MyType, ()>::default()` would work with `struct MyType { ... }` in WGSL as long as there are no variable-sized arrays.
- Std430 requires that there is at most one variable-sized array in a storage buffer, that if there is one it is the last member of the binding, and that it has at least one item. `StorageBuffer` handles all of these constraints.
@bors bors bot changed the title Add a helper for storage buffers similar to UniformVec [Merged by Bors] - Add a helper for storage buffers similar to UniformVec Mar 22, 2022
@bors bors bot closed this Mar 22, 2022
bors bot pushed a commit that referenced this pull request Apr 6, 2022
# Objective

- Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene
- Fixes #3605 
- Based on top of #4079 

This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.

## Solution

- Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability
- Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.
- Appropriate shader defs in the shader code to handle the two cases

## Context on some decisions / open questions

- I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.
- Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants?


Co-authored-by: Carter Anderson <[email protected]>
bors bot pushed a commit that referenced this pull request Apr 6, 2022
# Objective

- Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene
- Fixes #3605 
- Based on top of #4079 

This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.

## Solution

- Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability
- Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.
- Appropriate shader defs in the shader code to handle the two cases

## Context on some decisions / open questions

- I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.
- Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants?


Co-authored-by: Carter Anderson <[email protected]>
bors bot pushed a commit that referenced this pull request Apr 6, 2022
# Objective

- Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene
- Fixes #3605 
- Based on top of #4079 

This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.

## Solution

- Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability
- Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.
- Appropriate shader defs in the shader code to handle the two cases

## Context on some decisions / open questions

- I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.
- Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants?


Co-authored-by: Carter Anderson <[email protected]>
bors bot pushed a commit that referenced this pull request Apr 7, 2022
# Objective

- Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene
- Fixes #3605 
- Based on top of #4079 

This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.

## Solution

- Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability
- Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.
- Appropriate shader defs in the shader code to handle the two cases

## Context on some decisions / open questions

- I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.
- Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants?


Co-authored-by: Carter Anderson <[email protected]>
aevyrie pushed a commit to aevyrie/bevy that referenced this pull request Jun 7, 2022
…4079)

# Objective

- Add a helper for storage buffers similar to `UniformVec`

## Solution

- Add a `StorageBuffer<T, U>` where `T` is the main body of the shader struct without any final variable-sized array member, and `U` is the type of the items in a variable-sized array.
- Use `()` as the type for unwanted parts, e.g. `StorageBuffer<(), Vec4>::default()` would construct a binding that would work with `struct MyType { data: array<vec4<f32>>; }` in WGSL and `StorageBuffer<MyType, ()>::default()` would work with `struct MyType { ... }` in WGSL as long as there are no variable-sized arrays.
- Std430 requires that there is at most one variable-sized array in a storage buffer, that if there is one it is the last member of the binding, and that it has at least one item. `StorageBuffer` handles all of these constraints.
aevyrie pushed a commit to aevyrie/bevy that referenced this pull request Jun 7, 2022
# Objective

- Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene
- Fixes bevyengine#3605 
- Based on top of bevyengine#4079 

This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.

## Solution

- Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability
- Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.
- Appropriate shader defs in the shader code to handle the two cases

## Context on some decisions / open questions

- I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.
- Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants?


Co-authored-by: Carter Anderson <[email protected]>
ItsDoot pushed a commit to ItsDoot/bevy that referenced this pull request Feb 1, 2023
…4079)

# Objective

- Add a helper for storage buffers similar to `UniformVec`

## Solution

- Add a `StorageBuffer<T, U>` where `T` is the main body of the shader struct without any final variable-sized array member, and `U` is the type of the items in a variable-sized array.
- Use `()` as the type for unwanted parts, e.g. `StorageBuffer<(), Vec4>::default()` would construct a binding that would work with `struct MyType { data: array<vec4<f32>>; }` in WGSL and `StorageBuffer<MyType, ()>::default()` would work with `struct MyType { ... }` in WGSL as long as there are no variable-sized arrays.
- Std430 requires that there is at most one variable-sized array in a storage buffer, that if there is one it is the last member of the binding, and that it has at least one item. `StorageBuffer` handles all of these constraints.
ItsDoot pushed a commit to ItsDoot/bevy that referenced this pull request Feb 1, 2023
# Objective

- Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene
- Fixes bevyengine#3605 
- Based on top of bevyengine#4079 

This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.

## Solution

- Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability
- Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.
- Appropriate shader defs in the shader code to handle the two cases

## Context on some decisions / open questions

- I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.
- Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants?


Co-authored-by: Carter Anderson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Usability A targeted quality-of-life change that makes Bevy easier to use S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants