Performance issue - Update_grass #1

Closed
dror-g opened this issue Dec 3, 2023 · 3 comments

Comments

@dror-g

dror-g commented Dec 3, 2023

Hi, thanks for sharing this and the great article! learned a lot.

I've been using the grass plugin in my app, but I get poor performance on my laptop: 30-40 fps (not surprising, I'm sure 😄).
CPU and GPU are both at 30% load, so I started to investigate.

The delay is in Update_grass.
I noticed that the problem is not the noise function as I originally thought (I tried Simplex, Value, even a simple sine wave),
but rather the meshes.get_mut() call. Replace it with the immutable .get and all is well (but no wind, of course).
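
A likely explanation, offered as an assumption rather than something confirmed in this thread: an asset store that hands out a mutable reference cannot know whether the caller will actually write, so it flags the asset as modified up front, and the renderer then re-prepares that asset. The toy model below sketches that pattern in plain Rust. `AssetStore` and `take_modified` are hypothetical names for illustration, not Bevy's API.

```rust
use std::collections::HashMap;

/// Toy stand-in for an asset store with change tracking.
/// All names here are hypothetical; this is not Bevy's API.
pub struct AssetStore<T> {
    items: HashMap<u32, T>,
    /// Ids handed out mutably since the last call to `take_modified`.
    modified: Vec<u32>,
}

impl<T> AssetStore<T> {
    pub fn new() -> Self {
        Self { items: HashMap::new(), modified: Vec::new() }
    }

    pub fn insert(&mut self, id: u32, value: T) {
        self.items.insert(id, value);
    }

    /// Read-only access: nothing is flagged.
    pub fn get(&self, id: u32) -> Option<&T> {
        self.items.get(&id)
    }

    /// Mutable access: the store cannot tell whether the caller will
    /// write, so it flags the asset as modified before returning it.
    pub fn get_mut(&mut self, id: u32) -> Option<&mut T> {
        if self.items.contains_key(&id) && !self.modified.contains(&id) {
            self.modified.push(id);
        }
        self.items.get_mut(&id)
    }

    /// Pretend render step: returns the ids whose data would need to be
    /// re-uploaded this frame, and clears the list.
    pub fn take_modified(&mut self) -> Vec<u32> {
        std::mem::take(&mut self.modified)
    }
}
```

In this model, a frame that only calls `get` leaves `take_modified` empty, while a single `get_mut` on the grass mesh puts the whole mesh on the re-upload list, however few vertices changed.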

https://github.com/mikeam565/first-game/blob/262d0da747b27e8306cdba75237fa61bc9e07ab1/src/entities/grass.rs#L176C12-L176C12

I tried different ways of getting a mutable reference, and even replacing the whole Mesh asset with a modified one in one go, but no luck yet.
But I'm new to Rust and not a developer by trade, so perhaps you'll have better ideas.

To reproduce on a stronger PC, perhaps generate more blades of grass until the fps drops to 30 or so.
Attached flamegraphs showing time spent with and without get_mut / wind.

Thanks again!
P.S. Perhaps consider releasing the grass plugin as a crate?

Flamegraph: without Grass plugin (baseline).

Flamegraph: with Grass.

Flamegraph: no applywind, still low fps.

Flamegraph: no get_mut line and no wind. Same as baseline really.

@mikeam565
Owner

Hi Dror,

First of all, thank you for the feedback! I appreciate that you took the time to read my posts and have forked the repo :)

Yes, the performance isn't amazing. I have a 3060 Ti and a 5800X and I get ~70 fps, which is not great for a scene with just grass and two rectangles. I believe it comes down to not pushing enough work to the GPU. Ideally, the array of vertices would be stored and modified entirely on the GPU, since grass is mostly visual (it is not likely to interact much with other entities in the game world). Based on this Bevy issue, I see that they've done some work to implement GPU instancing, so I might actually move this issue to the top of my priority list to get a chance to play with that.

I will also add that the Perlin noise sampling's hit on performance is not insignificant either and is worth looking into. Replacing the body of the function sample_noise with a constant f32, I go from ~70 fps to ~88. I wonder what I'm missing here; I figured sampling 2D Perlin noise would just be a constant-time op. I wonder if this also falls under work that could be pushed to the GPU.

Eventually when I think the grass is in a game-ready state, I will put it out as a crate.

@dror-g
Author

dror-g commented Dec 5, 2023

I agree that the Perlin noise is not a major hit on performance. The flamegraphs show that when it's disabled or replaced with a sine wave, there's still a performance hit caused by preparing the meshes for render.
Moving to the GPU is probably the best way to resolve this. I think a shader is the most common solution.

Another idea / workaround, though:
perhaps only modify some of the meshes instead of all of them?
If you exclude meshes that are outside the camera view from updating,
or apply wind in waves that move across the field, affecting only part of the grass,
then there will be fewer meshes / draw calls.
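
The "wind in waves" idea could be sketched roughly as follows, in plain Rust with no Bevy types. `apply_wind_band`, its parameters, and the layout of vertices as `[x, y, z]` arrays with a separate copy of the rest positions are all assumptions for illustration, not the plugin's actual code.

```rust
/// Hypothetical sketch: displace only the vertices inside a band of wind
/// centered at `band_center` along the z axis.
///
/// `rest` holds the undisplaced vertex positions as [x, y, z] and `out`
/// holds the positions that would be written back to the mesh.
/// Returns the number of vertices that were touched.
pub fn apply_wind_band(
    rest: &[[f32; 3]],
    out: &mut [[f32; 3]],
    band_center: f32,
    band_half_width: f32,
    strength: f32,
) -> usize {
    let mut touched = 0;
    for (r, o) in rest.iter().zip(out.iter_mut()) {
        let d = (r[2] - band_center).abs();
        if d < band_half_width {
            // Displacement fades to zero at the band edges, so a blade is
            // back near its rest position by the time the band leaves it.
            let falloff = 1.0 - d / band_half_width;
            // Scale by height so the roots (y = 0) stay put.
            o[0] = r[0] + strength * falloff * r[1];
            touched += 1;
        }
    }
    touched
}
```

Moving `band_center` a little each frame sweeps the wave across the field. Note that this only reduces the per-frame CPU work; if the whole field lives in one mesh, writing to any part of it would presumably still mark the entire mesh as changed.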

But yeah, overall the GPU is the way to go.

Thanks again! Feel free to close the issue.

@mikeam565
Owner

I could have been clearer, but I meant that the hit to performance from sampling Perlin noise is significant. 88 to 70 is roughly a 20% hit.
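
To make that arithmetic explicit, here is a small sketch using only the two frame rates quoted in this thread. The function names are hypothetical helpers.

```rust
/// Frame time in milliseconds for a given frame rate.
pub fn frame_time_ms(fps: f64) -> f64 {
    1000.0 / fps
}

/// Fractional drop in frame rate when going from `before` to `after`.
pub fn fps_drop(before: f64, after: f64) -> f64 {
    (before - after) / before
}

/// Extra milliseconds spent per frame at `after` fps compared to `before` fps.
pub fn added_frame_time_ms(before: f64, after: f64) -> f64 {
    frame_time_ms(after) - frame_time_ms(before)
}
```

With these, `fps_drop(88.0, 70.0)` is 18/88, or about 0.20, and `added_frame_time_ms(88.0, 70.0)` comes to a little under 3 ms per frame that would be attributable to the noise sampling.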

Also, there is only one grass mesh, with all of the individual blades’ vertices. The performance would be even worse if each blade of grass was its own mesh (I would know, that’s what I did the first time lol). So that get_mut is for the entire grass field, and then I iterate on all of the vertices to modify them.

In terms of only rendering what is on screen, that is called frustum culling, and I think that’s already implemented in Bevy, so it could be a trivial optimization. SimonDev actually just released a video on view-based optimizations in the games industry that covers this and many other tricks game developers use. In his grass video he also covered grass that dynamically loads denser close to the player and sparser far away.
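
One simple way to get denser grass near the player and sparser grass far away is sketched below. This is not taken from the video mentioned above; the linear falloff, the hash constant, and the names `density_at` and `keep_blade` are assumptions chosen for illustration.

```rust
/// Hypothetical density falloff: the fraction of blades to keep at a given
/// distance from the player. Full density up to `near`, `min_density` from
/// `far` onward, and a linear blend in between.
pub fn density_at(distance: f32, near: f32, far: f32, min_density: f32) -> f32 {
    if distance <= near {
        return 1.0;
    }
    if distance >= far {
        return min_density;
    }
    let t = (distance - near) / (far - near);
    1.0 + t * (min_density - 1.0)
}

/// Decide whether to keep blade `index` at the given density.
/// A cheap multiplicative hash makes the choice deterministic, so the same
/// blades are kept every time instead of flickering.
pub fn keep_blade(index: u32, density: f32) -> bool {
    // Keep the top 24 bits so the value converts to f32 exactly.
    let h = index.wrapping_mul(2_654_435_761) >> 8;
    (h as f32) / ((1u32 << 24) as f32) < density
}
```

When generating a patch of grass, one would compute `density_at` for the patch's distance to the player and skip every blade for which `keep_blade` returns false.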

It will likely come down to whatever I feel like focusing on when I sit down next time to work on this, but I really would like to play around with that new GPU array buffer they introduced to Bevy.

Thanks again for your interest in the project! I’ll go ahead and close this issue but if you catch anything else feel free to open another one.
