
Compute Shaders #139

Closed · wants to merge 31 commits

Conversation

@StarArawn (Contributor) commented Aug 12, 2020

closes: #120

I'm not super happy with the API, and I'm open to suggestions on how we can make it better. Also, this relies on the shader reflection PR I opened (#189), which should be merged first.

@karroffel added the labels C-Feature (A new feature, making something new possible) and A-Rendering (Drawing game state to the screen) on Aug 12, 2020
@StarArawn (Contributor Author)

I want some eyes on this now, but there are a few items remaining:

  1. The compute example should just run once and print the data to the console. I wasn't exactly sure how to map buffers, especially one from a render resource. Perhaps that's not possible yet.
  2. There is some cleanup I have to do.

@StarArawn StarArawn marked this pull request as ready for review August 18, 2020 01:57
@StarArawn (Contributor Author)

Merged master in. This should resolve the test issues.

@TheJasonLessard left a comment

Most of it went over my head, so take this as a newbie's perspective. I can't comment much on the quality of the code; most of my comments relate to the documentation for a user who is not familiar with shaders. Maybe add some descriptions over the key structs/impls like ComputeNode, ComputeState, and ComputePipelineCompiler for a high-level overview.

Cargo.toml (outdated review thread)
uint[] indices;
}; // this is used as both input and output for convenience

// The Collatz Conjecture states that for any integer n:


I feel like this needs a better introduction. I had to google the Collatz Conjecture to understand what we will be doing in this example. Maybe link directly to the Collatz Conjecture video made by Numberphile on YouTube? I know you've explained the rule, but it's still quite abstract.

@StarArawn (Contributor Author):

This is mostly a port from the original wgpu example found here:
https://github.com/gfx-rs/wgpu-rs/tree/master/examples/hello-compute

I'm open to a more practical example, but it would likely require more time than I have now to implement.
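For anyone skimming the example, the rule itself is tiny. A plain-Rust sketch of the per-element step count the shader computes (the [1, 2, 3, 4] input mirrors the wgpu hello-compute example; treat the exact values as illustrative):

```rust
/// Number of Collatz steps needed to reach 1 from `n`.
/// This mirrors what the compute shader does per element.
fn collatz_steps(mut n: u64) -> u32 {
    let mut steps = 0;
    while n != 1 {
        if n % 2 == 0 {
            n /= 2; // even: halve
        } else {
            n = 3 * n + 1; // odd: triple and add one
        }
        steps += 1;
    }
    steps
}

fn main() {
    let input = [1u64, 2, 3, 4];
    let steps: Vec<u32> = input.iter().map(|&n| collatz_steps(n)).collect();
    println!("{:?}", steps); // [0, 1, 7, 2]
}
```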

examples/3d/compute.rs (outdated review thread)
}

/// set up a simple 3D scene
fn setup(


I think there should be an overview of the high-level steps a compute shader takes. Nothing too big, just enough to get the gist of it. Maybe a comment in the example is not the best place for it, but it would speed up learning how compute is used in Bevy. Surely it will be added to the documentation later, but in the meantime a simple 3-4 line paragraph explaining the basic steps would help. I'd like to see those explanations in the first user-facing API you hit when about to create a shader.

@StarArawn (Contributor Author):

For sure. I'm not entirely happy with the sample, and there is some cleanup that needs to happen; I also would like to output the result to the console. I think a comment here explaining compute might help a lot!

@cart (Member) commented Aug 24, 2020

I've started reviewing this (finally). Can you rebase on master? You'll need to account for a few wgpu api changes.

@Philipp-M (Contributor)

Great work!

I've got a few questions though: is a separate compute stage (https://github.com/bevyengine/bevy/pull/139/files#diff-1272fb5bd21fcab0f60c9f567f1e7d93R64) useful? I'm thinking of fancy pipelines like the ones just presented in Unreal Engine 5, where multiple compute stages probably occur during rendering (even for rasterization...), not at one defined point. Or is this intended as a good default for simpler cases, where extended usage is not necessary? I haven't yet read the source code in detail... is the render graph programmable with multiple compute shaders?

@StarArawn (Contributor Author)

@Philipp-M
This is primarily meant as a higher level way for end users to quickly spawn off compute work. I think we do need a lower level API around building compute nodes directly into the graph, and the functionality exists now, but it could be a little nicer.

Also, I'm not really convinced an entity-based approach to compute makes a lot of sense. I'm still leaning towards compute being system-based instead of entity-based, where you have a resource that lets you dispatch compute work inside of the compute node.

@Philipp-M (Contributor)

I think we do need a lower level API around building compute nodes directly into the graph

Yes, certainly, that'd make sense, especially given the current shift toward programmable render pipelines.

As for the sample, I could provide a simple raytracer I've recently written with wgpu; maybe that fits a game engine better and shows the intercommunication of buffers/images between the compute and forward pipelines (as I said, I haven't read the source yet, but I guess this is possible)?

@StarArawn (Contributor Author) commented Aug 24, 2020

@Philipp-M

As for the sample, I could provide a simple raytracer I've recently written with wgpu; maybe that fits a game engine better and shows the intercommunication of buffers/images between the compute and forward pipelines (as I said, I haven't read the source yet, but I guess this is possible)?

A simple raytracer might be a better example. My only concern would be how advanced that is, as the example provided in this PR is really simple in terms of setup. Perhaps what it actually processes is a bit strange, but since it matched the wgpu example I thought that might make it more approachable.

commands
.spawn(ComputeComponents {
compute_pipelines: ComputePipelines::from_pipelines(vec![
ComputePipeline::specialized(
Member:


If pipeline specialization isn't required, we can use the simpler from_handles interface:

.spawn(ComputeComponents {
    compute_pipelines: ComputePipelines::from_handles(&[pipeline_handle]),
    dispatch: Dispatch {
        only_once: false,
        work_group_size_x: data_count as u32,
        ..Default::default()
    },
})

If/when we sort out dynamic_uniform inference, this will be the preferred interface for most cases

@@ -290,7 +290,10 @@ impl<'a> DrawContext<'a> {
.get_layout()
.ok_or_else(|| DrawError::PipelineHasNoLayout)?;
for bindings in render_resource_bindings.iter_mut() {
bindings.update_bind_groups(pipeline_descriptor, &**self.render_resource_context);
bindings.update_bind_groups(
pipeline_descriptor.get_layout().unwrap(),
Member:


Why re-get the pipeline layout here with an unwrap? We already got a reference above (and handled the error).

@@ -57,6 +61,8 @@ pub mod stage {
pub static RENDER_RESOURCE: &str = "render_resource";
/// Stage where Render Graph systems are run. In general you shouldn't add systems to this stage manually.
pub static RENDER_GRAPH_SYSTEMS: &str = "render_graph_systems";
/// Compute stage where compute systems are executed.
pub static COMPUTE: &str = "compute";
Member:


Do we need a new stage here? I know "DRAW" isn't the best name for running compute, but each stage we add reduces parallelism potential. Unless we really need a new stage, let's use DRAW (and consider naming alternatives if DRAW isn't a good fit).

@StarArawn (Contributor Author):

Ideally I think we can throw all of the compute work onto its own queue. We don't specifically need a stage, but we need to make sure that compute runs before any draw calls.

@cart (Member) commented Aug 24, 2020:

Does compute need to be "queued up in a resource/component" before draw calls are "queued up in a resource/component"?

I imagine order would matter when it comes to creating command buffers / submitting command buffers, but that's all deferred until render graph execution.

@StarArawn (Contributor Author):

Does compute need to be "queued up in a resource/component" before draw calls are "queued up in a resource/component"?

Yes because you can have multiple dispatches.

I imagine order would matter when it comes to creating command buffers / submitting command buffers, but that's all deferred until render graph execution.

I think I need to look at the render graph a bit closer. I was assuming a graph node is a new command buffer?

}

impl ComputePipelineCompiler {
// TODO: Share some of this with PipelineCompiler.
Member:


There's more code duplication here than I would like, and a lot of "implementation divergence risk" as it stands. I'd rather do one of these:

  1. add an enum or something that wraps the two descriptors and use that during compilation.
  2. have one descriptor for the "common stuff", but then have an enum inside that for compute stuff vs rasterization stuff.
  3. make PipelineCompiler generic on descriptors and implement a Trait that extracts the required data.
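Option 1 could be sketched along these lines; the descriptor structs below are illustrative stubs (the real Bevy descriptors carry shader stages, rasterization state, and so on), just to show the shape of the enum:

```rust
// Illustrative stubs, not the actual Bevy descriptor types.
struct RenderPipelineDescriptor {
    layout: String, // plus shader stages, rasterization state, ...
}

struct ComputePipelineDescriptor {
    layout: String, // plus the compute shader stage
}

// Option 1: one enum wrapping both descriptor kinds, so a single
// compiler path can handle the shared work (e.g. bind group layouts).
enum PipelineDescriptorKind {
    Render(RenderPipelineDescriptor),
    Compute(ComputePipelineDescriptor),
}

impl PipelineDescriptorKind {
    // The common data the compiler needs, regardless of pipeline kind.
    fn layout(&self) -> &str {
        match self {
            PipelineDescriptorKind::Render(d) => &d.layout,
            PipelineDescriptorKind::Compute(d) => &d.layout,
        }
    }
}

fn main() {
    let compute = PipelineDescriptorKind::Compute(ComputePipelineDescriptor {
        layout: "bind_group_layouts".to_string(),
    });
    println!("{}", compute.layout());
}
```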

@StarArawn (Contributor Author):

I think the enum approach makes sense. Compute pipelines are very small and only require bind group layouts.

Member:

Works for me!

render_context.begin_compute_pass(&mut |compute_pass| {
let mut compute_state = ComputeState::default();

let mut entities = world.query::<&Dispatch>();
Member:


Nit: for short queries I generally like inlining them in the for loop; it feels "clearer" to me.

}

/// Tracks the current pipeline state to ensure compute calls are valid.
#[derive(Default)]
Member:


If we merge into a single PipelineDescriptor, we could use the same "state" type in PassNode and ComputeNode.

@StarArawn (Contributor Author):

I'm okay with that, as long as we can still make sure compute gets sent to the GPU first.

Member:

Actually, given that the pass types are different (both in our abstraction and in wgpu), it's probably better to keep them separate. My bad!

@@ -420,6 +421,7 @@ fn render_resources_node_system<T: RenderResources>(
mut state: Local<RenderResourcesNodeState<T>>,
render_resource_context: Res<Box<dyn RenderResourceContext>>,
mut query: Query<(&T, &Draw, &mut RenderPipelines)>,
mut query2: Query<(&T, &Dispatch, &mut ComputePipelines)>,
Member:


Nit: as a style thing, whenever we have more than one query, give them names that describe them, e.g. draw_query and compute_query.

@@ -434,6 +436,14 @@ fn render_resources_node_system<T: RenderResources>(
.uniform_buffer_arrays
.increment_changed_item_counts(&uniforms);
}

// update uniforms info
for (uniforms, _dispatch, _compute_pipelines) in &mut query2.iter() {
Member:


In its current state, I don't think the (non-asset) RenderResourcesNode will behave correctly for compute:

  1. it never reads buffer data from the compute query
  2. if an entity has both ComputePipelines and RenderPipelines, we will double increment changed item count.

Let's either fix these problems or just remove the code and add an issue to follow up.

@StarArawn (Contributor Author):

1. it never reads buffer data from the compute query

I'm not sure I understand what you mean here?

2. if an entity has both ComputePipelines and RenderPipelines, we will double increment changed item count.

Yeah, I think we need more separation between them. Ideally compute is never run on a single component, as that would just be wasteful, and I don't think we want to encourage people to have large components. Instead, we really need to rework the compute API to be system-based so it processes multiple components. An example:

// Compute system pseudo code..
fn compute_velocity(
  compute_dispatcher: ResMut<ComputeDispatcher>,
  velocity_pipeline: ResMut<VelocityPipeline>,
  velocity_compute_resource: ResMut<VelocityComputeResource>,
  query: Query<(&Translation, &Velocity)>,
) {

  // Imagine this a bit more verbose..
  query.translations = velocity_compute_resource.read();

  // Imagine this a bit more verbose..
  velocity_compute_resource.write(query);

  compute_dispatcher.set_pipeline(velocity_pipeline);

  // Adds a resource to be used by the compute shader for the current frame.
  compute_dispatcher.add_resource(velocity_compute_resource);

  // Add dispatch command for the current frame.
  // Again imagine something more verbose here..
  compute_dispatcher.dispatch(query.len());
}

Member:

Yeah, I agree that compute should generally run on sets of components. I imagine there will be a number of cases where people will want to run compute on "all transforms" or "all material assets" as input. It probably makes sense to make it easy to reuse the buffers generated by RenderResourcesNode (and ditto for AssetRenderResourcesNode). They are already contiguous arrays of data (although each value in the array is aligned to 256 bytes in order to meet the "dynamic uniform" constraints). Reusing these abstractions also makes it "easy" to operate on ECS data.
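As a side note on that alignment constraint, the offset math is straightforward; a sketch assuming the usual 256-byte minimum dynamic uniform offset alignment:

```rust
// Round `size` up to the next multiple of `align` (a power of two).
fn align_up(size: u64, align: u64) -> u64 {
    (size + align - 1) & !(align - 1)
}

fn main() {
    const DYNAMIC_UNIFORM_ALIGN: u64 = 256;
    // A 4x4 f32 matrix is 64 bytes, but each array element
    // occupies a full 256-byte slot in the shared buffer.
    let stride = align_up(64, DYNAMIC_UNIFORM_ALIGN);
    println!("stride = {}", stride); // stride = 256
    // Dynamic offset of the i-th entity's uniform (here i = 2):
    let offset = 2 * stride;
    println!("offset = {}", offset); // offset = 512
}
```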

I'm not sure I understand what you mean here?

Currently you call "increment_changed_item_counts", which sets up BufferArrayStatus to track the render resources returned from the "compute query". But then you don't actually copy buffer data. The only code that currently does that is this:

for (uniforms, draw, mut render_pipelines) in &mut query.iter() {

But that only runs on the "draw query", not the "compute query".

}

// Setup our compute pipeline and a "dispatch" component/entity that will dispatch the shader.
fn setup(
Member:


I wouldn't consider this example "complete" until we have outputs copied back to the cpu side / printed to the console. Can we either add that or create an issue to follow up?

@StarArawn (Contributor Author) commented Aug 24, 2020:

I wouldn't want to merge this until the example actually did something the user could see. 👍 I haven't had time lately to take a deep dive into retrieving data back from a GPU resource in bevy.

@Philipp-M (Contributor)

A simple raytracer might be a better example. My only concern would be how advanced that is, as the example provided in this PR is really simple in terms of setup. Perhaps what it actually processes is a bit strange, but since it matched the wgpu example I thought that might make it more approachable.

Yes, a raytracer should probably be an additional example, showing how intercommunication between the compute and forward pipelines can be achieved, as this is quite a common use case these days.

@StarArawn (Contributor Author) commented Aug 24, 2020

@Philipp-M Yeah, I feel as though after this PR is merged we should follow up with a more complex compute example like raytracing, or perhaps even boids.

@sim-the-bean (Contributor)

Currently this can't be merged, even though the conflict is trivial.

#361 broke UniformBufferArrays usage by changing nearly its entire API. I'm currently working on fixing this; if I succeed before @StarArawn does, I'll submit a PR to their branch.

@sim-the-bean (Contributor)

Since the fix is simple, I'm posting the diff here. Note that the merge conflict is already resolved and not represented in the diff:

diff --git a/crates/bevy_render/src/render_graph/nodes/render_resources_node.rs b/crates/bevy_render/src/render_graph/nodes/render_resources_node.rs
index 38bcdcc0..cf0ee30a 100644
--- a/crates/bevy_render/src/render_graph/nodes/render_resources_node.rs
+++ b/crates/bevy_render/src/render_graph/nodes/render_resources_node.rs
@@ -428,7 +428,7 @@ fn render_resources_node_system<T: RenderResources>(
     mut state: Local<RenderResourcesNodeState<Entity, T>>,
     render_resource_context: Res<Box<dyn RenderResourceContext>>,
     mut query: Query<(Entity, &T, &Draw, &mut RenderPipelines)>,
-    mut query2: Query<(&T, &Dispatch, &mut ComputePipelines)>,
+    mut query2: Query<(Entity, &T, &Dispatch, &mut ComputePipelines)>,
 ) {
     let state = state.deref_mut();
     let uniform_buffer_arrays = &mut state.uniform_buffer_arrays;
@@ -443,20 +443,6 @@ fn render_resources_node_system<T: RenderResources>(
         uniform_buffer_arrays.remove_bindings(*entity);
     }
 
-    // update uniforms info
-    for (uniforms, _dispatch, _compute_pipelines) in &mut query2.iter() {
-        state
-            .uniform_buffer_arrays
-            .increment_changed_item_counts(&uniforms);
-    }
-
-    state
-        .uniform_buffer_arrays
-        .setup_buffer_arrays(render_resource_context, state.dynamic_uniforms);
-    state
-        .uniform_buffer_arrays
-        .update_staging_buffer(render_resource_context);
-
     for (entity, uniforms, draw, mut render_pipelines) in &mut query.iter() {
         if !draw.is_visible {
             continue;
@@ -469,6 +455,23 @@ fn render_resources_node_system<T: RenderResources>(
             &mut render_pipelines.bindings,
         )
     }
+
+    if let Some((_, first, _, _)) = query2.iter().iter().next() {
+        uniform_buffer_arrays.initialize(first);
+    }
+
+    for entity in query2.removed::<T>() {
+        uniform_buffer_arrays.remove_bindings(*entity);
+    }
+
+    for (entity, uniforms, _, mut compute_pipelines) in &mut query2.iter() {
+        uniform_buffer_arrays.prepare_uniform_buffers(entity, uniforms);
+        setup_uniform_texture_resources::<T>(
+            &uniforms,
+            render_resource_context,
+            &mut compute_pipelines.bindings,
+        )
+    }
 
     uniform_buffer_arrays.resize_buffer_arrays(render_resource_context);
     uniform_buffer_arrays.resize_staging_buffer(render_resource_context);

Base automatically changed from master to main February 19, 2021 20:44
@Dumdidldum commented Apr 15, 2021

Hello! Is there any news on this? I would be very interested in having compute shaders in Bevy!

@Moxinilian (Member) commented Apr 17, 2021

Hello @StarArawn!
This PR has been sitting here for a while. Here's what we can do:

  • If you are happy with the design, we can have cart review it again after the merge conflict is fixed so it can be merged.
  • If not and you don't feel like working on it anymore, feel free to close the PR. We have your tracking issue for compute shaders, so the need won't be lost.

No matter what, thank you for your work!

@StarArawn StarArawn closed this Apr 19, 2021
@StarArawn (Contributor Author)

Closing this, as it's missing some key things and I think a better job can be done.

Linked issue: Add compute shaders

8 participants