
Compute Shaders #139

Closed · wants to merge 31 commits

Conversation

@StarArawn (Contributor) commented Aug 12, 2020

closes: #120

I'm not super happy with the API, and I'm open to suggestions on how we can make it better. Also, this relies on the shader reflection PR I opened (#189), which should be merged first.

@karroffel added the labels C-Feature (A new feature, making something new possible) and A-Rendering (Drawing game state to the screen) on Aug 12, 2020
@StarArawn (Contributor Author)

I want some eyes on this now, but there are a few items remaining:

  1. The compute example should just run once and print the data to the console. I wasn't exactly sure how to map buffers, especially one from a render resource. Perhaps that's not possible yet.
  2. There is some cleanup I have to do.

@StarArawn StarArawn marked this pull request as ready for review August 18, 2020 01:57
@StarArawn (Contributor Author)

Merged master in. This should resolve the test issues.

@TheJasonLessard left a comment

Most of it went over my head, so take this as a newbie's perspective. I can't comment much on the quality of the code; most of my comments relate to the documentation for a user who is not familiar with shaders. Maybe add some descriptions over the key structs/impls like ComputeNode, ComputeState, and ComputePipelineCompiler for a high-level overview.

Cargo.toml (outdated review thread)
uint[] indices;
}; // this is used as both input and output for convenience

// The Collatz Conjecture states that for any integer n:


I feel like this needs a better introduction. I had to google the Collatz Conjecture to understand what we will be doing in this example. Maybe link directly to the Collatz Conjecture video made by Numberphile on YouTube? I know you've explained the rule, but it's still quite abstract.

@StarArawn (Contributor Author):

This is mostly a port from the original wgpu example found here:
https://github.com/gfx-rs/wgpu-rs/tree/master/examples/hello-compute

I'm open to a more practical example, but it would likely require more time than I have now to implement.
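For anyone skimming the example, the rule itself is tiny. A plain-Rust sketch of the per-element step count the shader computes (the [1, 2, 3, 4] input mirrors the wgpu hello-compute example; treat the exact values as illustrative):

```rust
/// Number of Collatz steps needed to reach 1 from `n`.
/// This mirrors what the compute shader does per element.
fn collatz_steps(mut n: u64) -> u32 {
    let mut steps = 0;
    while n != 1 {
        if n % 2 == 0 {
            n /= 2; // even: halve
        } else {
            n = 3 * n + 1; // odd: triple and add one
        }
        steps += 1;
    }
    steps
}

fn main() {
    let input = [1u64, 2, 3, 4];
    let steps: Vec<u32> = input.iter().map(|&n| collatz_steps(n)).collect();
    println!("{:?}", steps); // [0, 1, 7, 2]
}
```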

examples/3d/compute.rs (outdated review thread)
}

/// set up a simple 3D scene
fn setup(


I think there should be an overview of the high-level steps a compute shader takes. Nothing too big, just enough to get the gist of it. Maybe a comment in the example is not the best place for it, but it would speed up learning how compute is used in Bevy. Surely it will be added to the documentation later, but in the meantime a simple 3-4 line paragraph explaining the basic steps would help. I'd like to see those explanations in the first user-facing API you hit when about to create a shader.

@StarArawn (Contributor Author):

For sure. I'm not entirely happy with the sample, and there is some cleanup that needs to happen; I also would like to output the result to the console. I think a comment here explaining compute might help a lot!

@cart (Member) commented Aug 24, 2020

I've started reviewing this (finally). Can you rebase on master? You'll need to account for a few wgpu api changes.

@Philipp-M (Contributor)

Great work!

I've got a few questions though: is a separate compute stage (https://github.com/bevyengine/bevy/pull/139/files#diff-1272fb5bd21fcab0f60c9f567f1e7d93R64) useful? I'm thinking of fancy pipelines like the ones just presented in Unreal Engine 5, where multiple compute stages probably occur during rendering (even for rasterization...), not at one defined point. Or is this intended as a good default for simpler cases, where extended usage is not necessary? I haven't yet read the source code in detail... is the render graph programmable with multiple compute shaders?

@StarArawn (Contributor Author)

@Philipp-M
This is primarily meant as a higher level way for end users to quickly spawn off compute work. I think we do need a lower level API around building compute nodes directly into the graph, and the functionality exists now, but it could be a little nicer.

Also, I'm not really convinced an entity-based approach to compute makes a lot of sense. I'm still leaning towards compute being system-based instead of entity-based, where you have a resource that lets you dispatch compute work inside of the compute node.

@Philipp-M (Contributor)

I think we do need a lower level API around building compute nodes directly into the graph

Yes, certainly, that'd make sense, especially given the current shift toward programmable render pipelines.

As for the sample, I could provide a simple raytracer I've recently written with wgpu; maybe that fits a game engine better and shows the intercommunication of buffers/images between the compute and forward pipelines (as I said, I haven't read the source yet, but I guess this is possible)?

@StarArawn (Contributor Author) commented Aug 24, 2020

@Philipp-M

As for the sample, I could provide a simple raytracer I've recently written with wgpu; maybe that fits a game engine better and shows the intercommunication of buffers/images between the compute and forward pipelines (as I said, I haven't read the source yet, but I guess this is possible)?

A simple raytracer might be a better example. My only concern would be how advanced that is, as the example provided in this PR is really simple in terms of setup. Perhaps what it actually processes is a bit strange, but since it matched the wgpu example I thought that might make it more approachable.

commands
.spawn(ComputeComponents {
compute_pipelines: ComputePipelines::from_pipelines(vec![
ComputePipeline::specialized(
Member:


If pipeline specialization isn't required, we can use the simpler from_handles interface:

.spawn(ComputeComponents {
    compute_pipelines: ComputePipelines::from_handles(&[pipeline_handle]),
    dispatch: Dispatch {
        only_once: false,
        work_group_size_x: data_count as u32,
        ..Default::default()
    },
})

If/when we sort out dynamic_uniform inference, this will be the preferred interface for most cases

@@ -290,7 +290,10 @@ impl<'a> DrawContext<'a> {
.get_layout()
.ok_or_else(|| DrawError::PipelineHasNoLayout)?;
for bindings in render_resource_bindings.iter_mut() {
bindings.update_bind_groups(pipeline_descriptor, &**self.render_resource_context);
bindings.update_bind_groups(
pipeline_descriptor.get_layout().unwrap(),
Member:


Why re-get the pipeline layout here with an unwrap? We already got a reference above (and handled the error).

@@ -57,6 +61,8 @@ pub mod stage {
pub static RENDER_RESOURCE: &str = "render_resource";
/// Stage where Render Graph systems are run. In general you shouldn't add systems to this stage manually.
pub static RENDER_GRAPH_SYSTEMS: &str = "render_graph_systems";
/// Compute stage where compute systems are executed.
pub static COMPUTE: &str = "compute";
Member:


Do we need a new stage here? I know "DRAW" isn't the best name for running compute, but each stage we add reduces parallelism potential. Unless we really need a new stage, let's use DRAW (and consider naming alternatives if DRAW isn't a good fit).

@StarArawn (Contributor Author):

Ideally I think we can throw all of the compute work onto its own queue. We don't specifically need a stage, but we need to make sure that compute runs before any draw calls.

@cart (Member) commented Aug 24, 2020:

Does compute need to be "queued up in a resource/component" before draw calls are "queued up in a resource/component"?

I imagine order would matter when it comes to creating command buffers / submitting command buffers, but that's all deferred until render graph execution.

@StarArawn (Contributor Author):

Does compute need to be "queued up in a resource/component" before draw calls are "queued up in a resource/component"?

Yes because you can have multiple dispatches.

I imagine order would matter when it comes to creating command buffers / submitting command buffers, but that's all deferred until render graph execution.

I think I need to look at the render graph a bit closer. I was assuming a graph node is a new command buffer?

}

impl ComputePipelineCompiler {
// TODO: Share some of this with PipelineCompiler.
Member:


There's more code duplication here than I would like, and a lot of "implementation divergence risk" as it stands. I'd rather do one of these:

  1. add an enum or something that wraps the two descriptors and use that during compilation.
  2. have one descriptor for the "common stuff", but then have an enum inside that for compute stuff vs rasterization stuff.
  3. make PipelineCompiler generic on descriptors and implement a Trait that extracts the required data.
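Option 1 could be sketched along these lines; the descriptor structs below are illustrative stubs (the real Bevy descriptors carry shader stages, rasterization state, and so on), just to show the shape of the enum:

```rust
// Illustrative stubs, not the actual Bevy descriptor types.
struct RenderPipelineDescriptor {
    layout: String, // plus shader stages, rasterization state, ...
}

struct ComputePipelineDescriptor {
    layout: String, // plus the compute shader stage
}

// Option 1: one enum wrapping both descriptor kinds, so a single
// compiler path can handle the shared work (e.g. bind group layouts).
enum PipelineDescriptorKind {
    Render(RenderPipelineDescriptor),
    Compute(ComputePipelineDescriptor),
}

impl PipelineDescriptorKind {
    // The common data the compiler needs, regardless of pipeline kind.
    fn layout(&self) -> &str {
        match self {
            PipelineDescriptorKind::Render(d) => &d.layout,
            PipelineDescriptorKind::Compute(d) => &d.layout,
        }
    }
}

fn main() {
    let compute = PipelineDescriptorKind::Compute(ComputePipelineDescriptor {
        layout: "bind_group_layouts".to_string(),
    });
    println!("{}", compute.layout());
}
```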

@StarArawn (Contributor Author):

I think the enum approach makes sense. Compute pipelines are very small and only require bind group layouts.

Member:

Works for me!

render_context.begin_compute_pass(&mut |compute_pass| {
let mut compute_state = ComputeState::default();

let mut entities = world.query::<&Dispatch>();
Member:


Nit: for short queries I generally like inlining them in the for loop; it feels "clearer" to me.

}

/// Tracks the current pipeline state to ensure compute calls are valid.
#[derive(Default)]
Member:


If we merge into a single PipelineDescriptor, we could use the same "state" type in PassNode and ComputeNode.

@StarArawn (Contributor Author):

I'm okay with that, as long as we can still make sure compute gets sent to the GPU first.

Member:

Actually, given that the pass types are different (both in our abstraction and in wgpu), it's probably better to keep them separate. My bad!

@@ -420,6 +421,7 @@ fn render_resources_node_system<T: RenderResources>(
mut state: Local<RenderResourcesNodeState<T>>,
render_resource_context: Res<Box<dyn RenderResourceContext>>,
mut query: Query<(&T, &Draw, &mut RenderPipelines)>,
mut query2: Query<(&T, &Dispatch, &mut ComputePipelines)>,
Member:


Nit: as a style thing, whenever we have more than one query, give them names that describe them, e.g. draw_query and compute_query.

@@ -434,6 +436,14 @@ fn render_resources_node_system<T: RenderResources>(
.uniform_buffer_arrays
.increment_changed_item_counts(&uniforms);
}

// update uniforms info
for (uniforms, _dispatch, _compute_pipelines) in &mut query2.iter() {
Member:


In its current state, I don't think the (non-asset) RenderResourcesNode will behave correctly for compute:

  1. it never reads buffer data from the compute query
  2. if an entity has both ComputePipelines and RenderPipelines, we will double increment changed item count.

Let's either fix these problems or just remove the code and add an issue to follow up.

@StarArawn (Contributor Author):

1. it never reads buffer data from the compute query

I'm not sure I understand what you mean here?

2. if an entity has both ComputePipelines and RenderPipelines, we will double increment changed item count.

Yeah, I think we need more separation between them. Ideally compute is never run on a single component, as that would just be wasteful, and I don't think we want to encourage people to have large components. Instead, we really need to rework the compute API to be system-based so it processes multiple components. An example:

// Compute system pseudo code..
fn compute_velocity(
  compute_dispatcher: ResMut<ComputeDispatcher>,
  velocity_pipeline: ResMut<VelocityPipeline>,
  velocity_compute_resource: ResMut<VelocityComputeResource>,
  query: Query<(&Translation, &Velocity)>,
) {

  // Imagine this a bit more verbose..
  query.translations = velocity_compute_resource.read();

  // Imagine this a bit more verbose..
  velocity_compute_resource.write(query);

  compute_dispatcher.set_pipeline(velocity_pipeline);

  // Adds a resource to be used by the compute shader for the current frame.
  compute_dispatcher.add_resource(velocity_compute_resource);

  // Add dispatch command for the current frame.
  // Again imagine something more verbose here..
  compute_dispatcher.dispatch(query.len());
}

Member:

Yeah, I agree that compute should generally run on sets of components. I imagine there will be a number of cases where people will want to run compute on "all transforms" or "all material assets" as input. It probably makes sense to make it easy to reuse the buffers generated by RenderResourcesNode (and ditto for AssetRenderResourcesNode). They are already contiguous arrays of data (although each value in the array is aligned to 256 bytes in order to meet the "dynamic uniform" constraints). Reusing these abstractions also makes it "easy" to operate on ECS data.
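As a side note on that alignment constraint, the offset math is straightforward; a sketch assuming the usual 256-byte minimum dynamic uniform offset alignment:

```rust
// Round `size` up to the next multiple of `align` (a power of two).
fn align_up(size: u64, align: u64) -> u64 {
    (size + align - 1) & !(align - 1)
}

fn main() {
    const DYNAMIC_UNIFORM_ALIGN: u64 = 256;
    // A 4x4 f32 matrix is 64 bytes, but each array element
    // occupies a full 256-byte slot in the shared buffer.
    let stride = align_up(64, DYNAMIC_UNIFORM_ALIGN);
    println!("stride = {}", stride); // stride = 256
    // Dynamic offset of the i-th entity's uniform (here i = 2):
    let offset = 2 * stride;
    println!("offset = {}", offset); // offset = 512
}
```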

I'm not sure I understand what you mean here?

Currently you call "increment_changed_item_counts", which sets up BufferArrayStatus to track the render resources returned from the "compute query". But then you don't actually copy buffer data. The only code that currently does that is this:

for (uniforms, draw, mut render_pipelines) in &mut query.iter() {

But that only runs on the "draw query", not the "compute query".

}

// Setup our compute pipeline and a "dispatch" component/entity that will dispatch the shader.
fn setup(
Member:


I wouldn't consider this example "complete" until we have outputs copied back to the cpu side / printed to the console. Can we either add that or create an issue to follow up?

@StarArawn (Contributor Author) commented Aug 24, 2020:

I wouldn't want to merge this until the example actually did something the user could see. 👍 I haven't had time lately to take a deep dive into retrieving data back from a GPU resource in bevy.

@Philipp-M (Contributor)

A simple raytracer might be a better example. My only concern would be how advanced that is, as the example provided in this PR is really simple in terms of setup. Perhaps what it actually processes is a bit strange, but since it matched the wgpu example I thought that might make it more approachable.

Yes, a raytracer should probably be an additional example, showing how intercommunication between the compute and forward pipelines can be achieved, as this is quite a common use case these days.

@StarArawn (Contributor Author) commented Aug 24, 2020

@Philipp-M Yeah, I feel as though after this PR is merged we should follow up with a more complex compute example like raytracing, or perhaps even boids.

@sim-the-bean (Contributor)

Currently this can't be merged, even though the conflict is trivial.

#361 broke UniformBufferArrays usage by changing nearly its entire API. I'm currently working on fixing this; if I succeed before @StarArawn does, I'll submit a PR to their branch.

@sim-the-bean (Contributor)

Since the fix is simple, I'm posting the diff here. Note that the merge conflict is already resolved and not represented in the diff:

diff --git a/crates/bevy_render/src/render_graph/nodes/render_resources_node.rs b/crates/bevy_render/src/render_graph/nodes/render_resources_node.rs
index 38bcdcc0..cf0ee30a 100644
--- a/crates/bevy_render/src/render_graph/nodes/render_resources_node.rs
+++ b/crates/bevy_render/src/render_graph/nodes/render_resources_node.rs
@@ -428,7 +428,7 @@ fn render_resources_node_system<T: RenderResources>(
     mut state: Local<RenderResourcesNodeState<Entity, T>>,
     render_resource_context: Res<Box<dyn RenderResourceContext>>,
     mut query: Query<(Entity, &T, &Draw, &mut RenderPipelines)>,
-    mut query2: Query<(&T, &Dispatch, &mut ComputePipelines)>,
+    mut query2: Query<(Entity, &T, &Dispatch, &mut ComputePipelines)>,
 ) {
     let state = state.deref_mut();
     let uniform_buffer_arrays = &mut state.uniform_buffer_arrays;
@@ -443,20 +443,6 @@ fn render_resources_node_system<T: RenderResources>(
         uniform_buffer_arrays.remove_bindings(*entity);
     }
 
-    // update uniforms info
-    for (uniforms, _dispatch, _compute_pipelines) in &mut query2.iter() {
-        state
-            .uniform_buffer_arrays
-            .increment_changed_item_counts(&uniforms);
-    }
-
-    state
-        .uniform_buffer_arrays
-        .setup_buffer_arrays(render_resource_context, state.dynamic_uniforms);
-    state
-        .uniform_buffer_arrays
-        .update_staging_buffer(render_resource_context);
-
     for (entity, uniforms, draw, mut render_pipelines) in &mut query.iter() {
         if !draw.is_visible {
             continue;
@@ -469,6 +455,23 @@ fn render_resources_node_system<T: RenderResources>(
             &mut render_pipelines.bindings,
         )
     }
+
+    if let Some((_, first, _, _)) = query2.iter().iter().next() {
+        uniform_buffer_arrays.initialize(first);
+    }
+
+    for entity in query2.removed::<T>() {
+        uniform_buffer_arrays.remove_bindings(*entity);
+    }
+
+    for (entity, uniforms, _, mut compute_pipelines) in &mut query2.iter() {
+        uniform_buffer_arrays.prepare_uniform_buffers(entity, uniforms);
+        setup_uniform_texture_resources::<T>(
+            &uniforms,
+            render_resource_context,
+            &mut compute_pipelines.bindings,
+        )
+    }
 
     uniform_buffer_arrays.resize_buffer_arrays(render_resource_context);
     uniform_buffer_arrays.resize_staging_buffer(render_resource_context);

Base automatically changed from master to main February 19, 2021 20:44
@Dumdidldum commented Apr 15, 2021

Hello! Is there any news on this? I would be very interested in having compute shaders in Bevy!

@Moxinilian (Member) commented Apr 17, 2021

Hello @StarArawn!
This PR has been sitting here for a while. Here's what we can do:

  • If you are happy with the design, we can have cart review it again after the merge conflict is fixed so it can be merged.
  • If not and you don't feel like working on it anymore, feel free to close the PR. We have your tracking issue for compute shaders, so the need won't be lost.

No matter what, thank you for your work!

@StarArawn StarArawn closed this Apr 19, 2021
@StarArawn (Contributor Author)

Closing this, as it's missing some key things and I think a better job can be done.

Linked issue: Add compute shaders

8 participants