Compute Shaders #139
Conversation
I want some eyes on this now, but there are a few items remaining:
Merged master in. This should resolve the test issues.
Most of it went over my head, so take this as a newbie's perspective. I can't comment much on the quality of the code; most of my comments relate to the documentation for a user who is not familiar with shaders. Maybe add some descriptions over the key structs/impls like ComputeNode, ComputeState and ComputePipelineCompiler for a high-level overview.
examples/3d/compute.rs
Outdated
uint[] indices;
}; // this is used as both input and output for convenience

// The Collatz Conjecture states that for any integer n:
I feel like this needs a better introduction. I had to google the Collatz Conjecture to understand what we will be doing in this example. Maybe add a direct link to the Collatz Conjecture video made by Numberphile on YouTube? I know you've explained the rule, but it's still quite abstract.
This is mostly a port from the original wgpu example found here:
https://github.com/gfx-rs/wgpu-rs/tree/master/examples/hello-compute
I'm open to a more practical example, but it would likely require more time than I have now to implement.
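For readers who (like the reviewer above) had to look the conjecture up: the example shader counts how many Collatz steps each input takes to reach 1. A CPU-side Rust sketch of that rule (an illustration, not the shader code from this PR):

```rust
/// CPU-side sketch of what the example's compute shader evaluates:
/// the number of Collatz steps it takes for `n` to reach 1.
fn collatz_steps(mut n: u64) -> u32 {
    let mut steps = 0;
    while n != 1 {
        if n % 2 == 0 {
            n /= 2; // even: halve
        } else {
            n = 3 * n + 1; // odd: triple and add one
        }
        steps += 1;
    }
    steps
}

fn main() {
    // The wgpu hello-compute example feeds a small list of inputs like
    // these to the GPU and reads the step counts back.
    for n in [1u64, 2, 4, 27] {
        println!("collatz_steps({}) = {}", n, collatz_steps(n));
    }
}
```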
examples/3d/compute.rs
Outdated
}

/// set up a simple 3D scene
fn setup(
I think there should be an overview of the high-level steps taken by a compute shader. Nothing too big, just enough to get the gist of it. Maybe a comment in the example is not the best place for it, but it would speed up learning how compute is used in bevy. Surely it will be added in the documentation later, but in the meantime a simple 3-4 line paragraph explaining the basic steps would help. I'd like to see those explanations in the first user-facing API you reach for when creating a shader.
For sure, I'm not entirely happy with the sample and there is some cleanup that needs to happen. I also would like to output the result to the console. I think a comment here explaining compute might help a lot!
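As a hedged sketch of such an overview (illustrative names only, not the bevy API), the basic lifecycle of a compute workload could be outlined like this:

```rust
// Hypothetical outline of the steps a compute workload typically goes
// through each frame. These are generic GPU-compute concepts, not
// bevy-specific types.
#[derive(Debug)]
enum ComputeStep {
    WriteInputBuffer,                     // upload input data to a GPU storage buffer
    BindPipeline,                         // select the compiled compute shader
    BindResources,                        // attach buffers/images via bind groups
    Dispatch { x: u32, y: u32, z: u32 },  // launch the work groups
    ReadBackOutput,                       // copy results back to the CPU
}

fn frame_outline() -> Vec<ComputeStep> {
    vec![
        ComputeStep::WriteInputBuffer,
        ComputeStep::BindPipeline,
        ComputeStep::BindResources,
        ComputeStep::Dispatch { x: 64, y: 1, z: 1 },
        ComputeStep::ReadBackOutput,
    ]
}

fn main() {
    for step in frame_outline() {
        println!("{:?}", step);
    }
}
```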
I've started reviewing this (finally). Can you rebase on master? You'll need to account for a few wgpu api changes.
Great work. I've got a few questions though: is a separate compute stage (https://github.com/bevyengine/bevy/pull/139/files#diff-1272fb5bd21fcab0f60c9f567f1e7d93R64) useful?
@Philipp-M Also, I'm not really convinced an entity-based approach to compute makes a lot of sense. I'm still leaning towards compute being system based instead of entity based, where you have a resource that lets you dispatch compute work inside of the compute node.
Yes, certainly, that'd make sense, especially with the current shift toward programmable render pipelines. As for the sample, I could provide a simple raytracer I've recently written with wgpu; maybe this fits better with a game engine and shows the intercommunication of buffers/images between the compute and forward pipelines (as I said, I haven't read the source yet, but I guess this is possible)?
A simple raytracer might be a better example. My only concern would be how advanced that is, as the example provided in this PR is really simple in terms of setup. Perhaps what it actually processes is a bit strange, but since it matched the wgpu example I thought that might make it more approachable.
commands
    .spawn(ComputeComponents {
        compute_pipelines: ComputePipelines::from_pipelines(vec![
            ComputePipeline::specialized(
If pipeline specialization isn't required, we can use the simpler from_handles interface:
.spawn(ComputeComponents {
compute_pipelines: ComputePipelines::from_handles(&[pipeline_handle]),
dispatch: Dispatch {
only_once: false,
work_group_size_x: data_count as u32,
..Default::default()
},
})
If/when we sort out dynamic_uniform inference, this will be the preferred interface for most cases
@@ -290,7 +290,10 @@ impl<'a> DrawContext<'a> {
         .get_layout()
         .ok_or_else(|| DrawError::PipelineHasNoLayout)?;
     for bindings in render_resource_bindings.iter_mut() {
-        bindings.update_bind_groups(pipeline_descriptor, &**self.render_resource_context);
+        bindings.update_bind_groups(
+            pipeline_descriptor.get_layout().unwrap(),
why re-get the pipeline layout here with an unwrap? we already got a reference above (and handled the error)
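The suggestion above can be sketched with simplified stand-in types (not the real bevy ones): handle the missing-layout error once, then pass that same reference down instead of calling get_layout().unwrap() a second time.

```rust
// Simplified stand-ins for the real types, to illustrate the review point.
struct Layout {
    bind_groups: usize,
}
struct PipelineDescriptor {
    layout: Option<Layout>,
}

fn update_bind_groups(layout: &Layout) -> usize {
    layout.bind_groups
}

fn draw(pd: &PipelineDescriptor) -> Result<usize, &'static str> {
    // Handle the missing-layout error exactly once...
    let layout = pd.layout.as_ref().ok_or("pipeline has no layout")?;
    // ...then reuse the same reference; no second unwrap needed.
    Ok(update_bind_groups(layout))
}

fn main() {
    let pd = PipelineDescriptor { layout: Some(Layout { bind_groups: 2 }) };
    println!("bind groups updated: {:?}", draw(&pd));
}
```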
@@ -57,6 +61,8 @@ pub mod stage {
     pub static RENDER_RESOURCE: &str = "render_resource";
     /// Stage where Render Graph systems are run. In general you shouldn't add systems to this stage manually.
     pub static RENDER_GRAPH_SYSTEMS: &str = "render_graph_systems";
+    /// Compute stage where compute systems are executed.
+    pub static COMPUTE: &str = "compute";
Do we need a new stage here? I know "DRAW" isn't the best name for running compute, but each stage we add reduces parallelism potential. Unless we really need a new stage, let's use DRAW (and consider naming alternatives if it isn't a good fit).
Ideally I think we can throw all of the compute work onto its own queue. We don't specifically need a stage, but we need to make sure that compute runs before any draw calls.
Does compute need to be "queued up in a resource/component" before draw calls are "queued up in a resource/component"?
I imagine order would matter when it comes to creating/submitting command buffers, but that's all deferred until the render graph execution.
> Does compute need to be "queued up in a resource/component" before draw calls are "queued up in a resource/component"?

Yes, because you can have multiple dispatches.

> I imagine order would matter when it comes to creating command buffers / submitting command buffers, but that's all deferred until the render graph execution.

I think I need to look at the render graph a bit closer. I was assuming a graph node is a new command buffer?
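The ordering requirement being discussed can be sketched with a hypothetical command list (illustrative names, not the bevy or wgpu API): compute dispatches are recorded ahead of any draw so their buffer writes are visible to the draws in the same frame submission.

```rust
// Hypothetical frame recording, illustrating "compute before draw".
#[derive(Debug, PartialEq)]
enum Command {
    Dispatch(&'static str),
    Draw(&'static str),
}

fn record_frame() -> Vec<Command> {
    let mut commands = Vec::new();
    // Multiple dispatches are allowed, all recorded ahead of any draw.
    commands.push(Command::Dispatch("update_velocities"));
    commands.push(Command::Dispatch("apply_velocities"));
    commands.push(Command::Draw("forward_pass"));
    commands
}

fn main() {
    let commands = record_frame();
    // No dispatch may appear after the first draw in the submission.
    let first_draw = commands
        .iter()
        .position(|c| matches!(c, Command::Draw(_)))
        .unwrap();
    assert!(commands[first_draw..].iter().all(|c| matches!(c, Command::Draw(_))));
    println!("{:?}", commands);
}
```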
}

impl ComputePipelineCompiler {
    // TODO: Share some of this with PipelineCompiler.
There's more code duplication here than I would like, and a lot of "implementation divergence risk" as it stands. I'd rather do one of these:
- add an enum or something that wraps the two descriptors and use that during compilation.
- have one descriptor for the "common stuff", but with an enum inside it for compute stuff vs rasterization stuff.
- make PipelineCompiler generic on descriptors and implement a trait that extracts the required data.
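The first option could look something like the sketch below: one enum wrapping both descriptor kinds so a single compiler can branch on them. The type and field names here are illustrative, not the real bevy types.

```rust
// Hypothetical descriptor types standing in for the real ones.
struct RenderDescriptor {
    shader: String,
    vertex_buffers: usize,
}
struct ComputeDescriptor {
    shader: String,
}

enum AnyPipelineDescriptor {
    Render(RenderDescriptor),
    Compute(ComputeDescriptor),
}

impl AnyPipelineDescriptor {
    // The "common stuff" both pipeline kinds share.
    fn shader(&self) -> &str {
        match self {
            AnyPipelineDescriptor::Render(d) => &d.shader,
            AnyPipelineDescriptor::Compute(d) => &d.shader,
        }
    }

    // Kind-specific data the compiler can branch on.
    fn vertex_buffer_count(&self) -> usize {
        match self {
            AnyPipelineDescriptor::Render(d) => d.vertex_buffers,
            AnyPipelineDescriptor::Compute(_) => 0,
        }
    }
}

fn main() {
    let pipelines = [
        AnyPipelineDescriptor::Render(RenderDescriptor {
            shader: "forward.vert".into(),
            vertex_buffers: 1,
        }),
        AnyPipelineDescriptor::Compute(ComputeDescriptor {
            shader: "collatz.comp".into(),
        }),
    ];
    for p in &pipelines {
        println!("{} ({} vertex buffers)", p.shader(), p.vertex_buffer_count());
    }
}
```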
I think the enum approach makes sense. Compute pipelines are very very small and only require bind group layouts.
Works for me!
render_context.begin_compute_pass(&mut |compute_pass| {
    let mut compute_state = ComputeState::default();

    let mut entities = world.query::<&Dispatch>();
nit: for short queries I generally like inlining them in the for loop. Feels "clearer" to me.
}

/// Tracks the current pipeline state to ensure compute calls are valid.
#[derive(Default)]
if we merge into a single PipelineDescriptor we could use the same "state" type in PassNode and ComputeNode
I'm okay with that, as long as we can still make sure compute gets sent to the GPU first.
Actually, given that the pass types are different (both in our abstraction and in wgpu), it's probably better to keep them separate. My bad!
@@ -420,6 +421,7 @@ fn render_resources_node_system<T: RenderResources>(
     mut state: Local<RenderResourcesNodeState<T>>,
     render_resource_context: Res<Box<dyn RenderResourceContext>>,
     mut query: Query<(&T, &Draw, &mut RenderPipelines)>,
+    mut query2: Query<(&T, &Dispatch, &mut ComputePipelines)>,
nit: as a style thing, whenever we have more than one query, give them descriptive names, e.g. draw_query and compute_query.
@@ -434,6 +436,14 @@ fn render_resources_node_system<T: RenderResources>(
             .uniform_buffer_arrays
             .increment_changed_item_counts(&uniforms);
     }

+    // update uniforms info
+    for (uniforms, _dispatch, _compute_pipelines) in &mut query2.iter() {
In its current state, I don't think the (non-asset) RenderResourcesNode will behave correctly for compute:
- it never reads buffer data from the compute query
- if an entity has both ComputePipelines and RenderPipelines, we will double-increment the changed item count.
Let's either fix these problems or just remove the code and add an issue to follow up.
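The double-increment concern can be illustrated with plain collections standing in for the two queries (hypothetical helper names, not bevy code): tallying each query independently over-counts any entity that appears in both.

```rust
use std::collections::HashSet;

// Naive tally: counts an entity once per query it matches,
// so an entity with both RenderPipelines and ComputePipelines
// is counted twice.
fn naive_changed_count(draw_query: &[u32], compute_query: &[u32]) -> usize {
    draw_query.len() + compute_query.len()
}

// Deduplicated tally: each entity is counted once regardless of
// how many queries it matches.
fn deduped_changed_count(draw_query: &[u32], compute_query: &[u32]) -> usize {
    let unique: HashSet<u32> = draw_query
        .iter()
        .chain(compute_query.iter())
        .copied()
        .collect();
    unique.len()
}

fn main() {
    // Entity 2 has both a draw and a compute pipeline.
    let draw = [1, 2];
    let compute = [2, 3];
    println!("naive: {}", naive_changed_count(&draw, &compute));
    println!("deduped: {}", deduped_changed_count(&draw, &compute));
}
```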
> 1. it never reads buffer data from the compute query

I'm not sure I understand what you mean here?

> 2. if an entity has both ComputePipelines and RenderPipelines, we will double increment changed item count.
Yeah, I think we need more separation between them. Ideally compute is never run on a single component, as that would just be wasteful, and I don't think we want to encourage people to have large components. Instead we need to rework the compute API as it stands now to be system based and process multiple components. An example:
// Compute system pseudo code..
fn compute_velocity(
compute_dispatcher: ResMut<ComputeDispatcher>,
velocity_pipeline: ResMut<VelocityPipeline>,
velocity_compute_resource: ResMut<VelocityComputeResource>,
query: Query<(&Translation, &Velocity)>,
) {
// Imagine this a bit more verbose..
query.translations = velocity_compute_resource.read();
// Imagine this a bit more verbose..
velocity_compute_resource.write(query);
compute_dispatcher.set_pipeline(velocity_pipeline);
// Adds a resource to be used by the compute shader for the current frame.
compute_dispatcher.add_resource(velocity_compute_resource)
// Add dispatch command for the current frame.
// Again imagine something more verbose here..
compute_dispatcher.dispatch(query.len());
}
Yeah, I agree that compute should generally be run on sets of components. I imagine there will be a number of cases where people will want to run compute on "all transforms" or "all material assets" as input. It probably makes sense to make it easy to reuse the buffers generated by RenderResourcesNode (and ditto for AssetRenderResourcesNode). They are already contiguous arrays of data (although each value in the array is aligned to 256 bytes in order to meet the "dynamic uniform" constraints). Reusing these abstractions also makes it "easy" to operate on ECS data.
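The dynamic-uniform padding mentioned above can be sketched as a stride computation. This is a hedged illustration: the 256-byte value is the dynamic uniform offset alignment common on most hardware, and the function name is made up for this sketch.

```rust
// Round item_size up to the next multiple of alignment, giving the
// per-element stride in a buffer shared via dynamic uniform offsets.
fn aligned_stride(item_size: usize, alignment: usize) -> usize {
    ((item_size + alignment - 1) / alignment) * alignment
}

fn main() {
    // Even a tiny 4-byte value occupies a full 256-byte slot,
    // and a 260-byte value spills into a second slot.
    for size in [4usize, 64, 256, 260] {
        println!("item {} bytes -> stride {} bytes", size, aligned_stride(size, 256));
    }
}
```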
> I'm not sure I understand what you mean here?

Currently you call "increment_changed_item_counts", which sets up BufferArrayStatus to track the render resources returned from the "compute query". But then you don't actually copy buffer data. The only code that currently does that is this:

for (uniforms, draw, mut render_pipelines) in &mut query.iter() {

But that only runs on the "draw query", not the "compute query".
}

// Setup our compute pipeline and a "dispatch" component/entity that will dispatch the shader.
fn setup(
I wouldn't consider this example "complete" until we have outputs copied back to the cpu side / printed to the console. Can we either add that or create an issue to follow up?
I wouldn't want to merge this until the example actually did something the user could see. 👍 I haven't had time lately to take a deep dive into retrieving data back from a GPU resource in bevy.
Yes, a raytracer should probably be an additional example, to show how intercommunication between the compute and forward pipelines can be achieved, as this is a quite common use case these days.
@Philipp-M Yeah, I feel as though after this PR is merged we should follow up with a more complex compute example like raytracing or perhaps even boids.
Currently this can't be merged, even though the conflict is trivial. #361 broke
Since the fix is simple, I'm posting the diff here:

diff --git a/crates/bevy_render/src/render_graph/nodes/render_resources_node.rs b/crates/bevy_render/src/render_graph/nodes/render_resources_node.rs
index 38bcdcc0..cf0ee30a 100644
--- a/crates/bevy_render/src/render_graph/nodes/render_resources_node.rs
+++ b/crates/bevy_render/src/render_graph/nodes/render_resources_node.rs
@@ -428,7 +428,7 @@ fn render_resources_node_system<T: RenderResources>(
mut state: Local<RenderResourcesNodeState<Entity, T>>,
render_resource_context: Res<Box<dyn RenderResourceContext>>,
mut query: Query<(Entity, &T, &Draw, &mut RenderPipelines)>,
- mut query2: Query<(&T, &Dispatch, &mut ComputePipelines)>,
+ mut query2: Query<(Entity, &T, &Dispatch, &mut ComputePipelines)>,
) {
let state = state.deref_mut();
let uniform_buffer_arrays = &mut state.uniform_buffer_arrays;
@@ -443,20 +443,6 @@ fn render_resources_node_system<T: RenderResources>(
uniform_buffer_arrays.remove_bindings(*entity);
}
- // update uniforms info
- for (uniforms, _dispatch, _compute_pipelines) in &mut query2.iter() {
- state
- .uniform_buffer_arrays
- .increment_changed_item_counts(&uniforms);
- }
-
- state
- .uniform_buffer_arrays
- .setup_buffer_arrays(render_resource_context, state.dynamic_uniforms);
- state
- .uniform_buffer_arrays
- .update_staging_buffer(render_resource_context);
-
for (entity, uniforms, draw, mut render_pipelines) in &mut query.iter() {
if !draw.is_visible {
continue;
@@ -469,6 +455,23 @@ fn render_resources_node_system<T: RenderResources>(
&mut render_pipelines.bindings,
)
}
+
+ if let Some((_, first, _, _)) = query2.iter().iter().next() {
+ uniform_buffer_arrays.initialize(first);
+ }
+
+ for entity in query2.removed::<T>() {
+ uniform_buffer_arrays.remove_bindings(*entity);
+ }
+
+ for (entity, uniforms, _, mut compute_pipelines) in &mut query2.iter() {
+ uniform_buffer_arrays.prepare_uniform_buffers(entity, uniforms);
+ setup_uniform_texture_resources::<T>(
+ &uniforms,
+ render_resource_context,
+ &mut compute_pipelines.bindings,
+ )
+ }
uniform_buffer_arrays.resize_buffer_arrays(render_resource_context);
 uniform_buffer_arrays.resize_staging_buffer(render_resource_context);
Hello
Hello @StarArawn!
No matter what, thank you for your work!
Closing this, as it's missing some key things and I think a better job can be done.
closes: #120
I'm not super happy with the API, and I'm open to suggestions on how we can make it better. Also, this relies on the shader reflection PR I opened (#189), which ~~should merge first~~ is merged!