-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding a Prepare stage for the sprite pipeline #5247
Adding a Prepare stage for the sprite pipeline #5247
Conversation
Output from This branch:
Main:
~1 FPS gain |
dbea56a
to
725cba2
Compare
d506de5
to
304798d
Compare
}; | ||
if sprite.color == Color::WHITE { | ||
extracted_sprites.sprites.alloc().init(sprite); | ||
} else { | ||
extracted_sprites.colored_sprites.alloc().init(sprite); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By merging #5103 this kind of code duplication would disappear
New output from This branch:
main:
~3 FPS gain |
Cool. :) Could you get set up to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't taken the time to closely compare the cut and pasted code to make sure it's the same. But this is looking pretty good. Interesting that it brings a performance benefit. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now checked over the cut and pasted code to look for differences. I did find some but I think it will be ultimately the same from a functional perspective, just possibly different performance with a bit more work for the post-phase-sort batching code to do.
|
||
for (colored, sprites) in [ | ||
(false, &extracted_sprites.sprites), | ||
(true, &extracted_sprites.colored_sprites), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that doing this would be wrong as previously all sprites whether colour or not were sorted together but I guess the batches will be split as part of the batching code that is run after the phases are sorted. It could create more work for the batching to do though. Maybe need to compare performance for something with a mix of colour and not colour sprites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I introduced this separation in 0a0d5c4, previously colored and non colored were merged together.
I can rollback the changes but the performance improvements would go back to 1FPS instead of 3FPS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems maybe they aren't being split in the batching system... maybe. I'm speculating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I may have figured out the problem. Because this processes the tinted and plain sprites separately, it only ends up inserting 2 sprite batches below and I hadn't realised that.
The previous code in queue_sprites
adds a SpriteBatch
for every sprite because we don't know if they need to be merged or not because there are other things that can be drawn (Mesh2d
s with custom materials or so) and in the sprite code we don't know at what z they are, nor that they even exist.
So you can't just add one SpriteBatch
per batch we perceive here. At least not with the current batch_phase_system
. You have to add one per sprite.
Red: main branch do you want me to do the same test without the color separation? It feels like this MR brings perf gains but also a more consistent framerate |
5084583
to
ddfbf91
Compare
# Objective Allow better performance testing for #5247 ## Solution I added color tints to the `many_sprites` example stress test.
ddfbf91
to
fdf09f3
Compare
@superdump Shocking results imo: For Yellow: This PR It's very suspicious how much faster this branch is, I wonder if something maybe broke |
I'll have to try it out. That looks great if true! :) |
If the separation I did works correctly I think |
IIRC, sorting all the sprites was one of the major bottlenecks in the sprites renderer. It makes sense to me that splitting the sprites preemptively to sort them as multiple smaller arrays, rather than one huge array, would improve perf. Potentially, now that there are multiple separate buckets of data, these sorts could also be done in parallel (maybe just if the arrays are large). Maybe that could further improve perf? The other major bottleneck for sprites rendering is the memory bandwidth bloat from all the data in vertex attributes. However, afaik, we have all thought hard about that area and haven't found ways to reduce it much, without moving to a different rendering approach using storage buffers (incompatible with WebGL2). |
I responded about sprite performance that is unrelated to this change in #rendering-dev on Discord as it's quite a deep topic: https://discord.com/channels/691052431525675048/743663924229963868/997122417795284993 |
Could you update this on top of |
It's in progress, #5310 brought some conflicts |
@superdump do you think this is worth keeping in the milestone as we draw close to release? Seems great but not critical to me? |
const QUAD_VERTEX_POSITIONS: [Vec2; 4] = [ | ||
// Top left |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the bottom left. y goes up? And similar for the rest.
0, 2, 3, // Bottom left triangle | ||
0, 1, 2, // Top right triangle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0, 2, 3, // Bottom left triangle | |
0, 1, 2, // Top right triangle | |
0, 2, 3, // Top-left triangle | |
0, 1, 2, // Bottom-right triangle |
It isn't critical, but it is a night and day enormous performance improvement for I just tested on an M1 Max and I see similar performance uplift. Here is the distribution of frame times: |
This does look suspicious though. I've been trying to figure out what I'm missing. I just thought I would compare the statistics for I did see some z-fighting on one sprite in one run of |
I'm not sure I understand, are you saying that this PR is fixing batching logic? |
|
||
for (colored, sprites) in [ | ||
(false, &extracted_sprites.sprites), | ||
(true, &extracted_sprites.colored_sprites), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I may have figured out the problem. Because this processes the tinted and plain sprites separately, it only ends up inserting 2 sprite batches below and I hadn't realised that.
The previous code in queue_sprites
adds a SpriteBatch
for every sprite because we don't know if they need to be merged or not because there are other things that can be drawn (Mesh2d
s with custom materials or so) and in the sprite code we don't know at what z they are, nor that they even exist.
So you can't just add one SpriteBatch
per batch we perceive here. At least not with the current batch_phase_system
. You have to add one per sprite.
I was saying that this PR breaks batching logic. See above. :) |
So the |
No, it’s rather that you add a SpriteBatch for each sprite with the batch range that is to the best of your knowledge. So in this case the plain sprites would have just 0, then 0 to 1 inclusive, 0 to 2, etc. and the tinted ones would also have something similar. But I think each has its own index within the batch and its own sort key. Then all the plain and tinted and mesh2d and everything get sorted so now they’re interleaved with each other. And then the batch system will merge / split batches based on whether there are other things that are from other batches or that are non-batchable between. So if you have sprite batches with plain index 0 batch 0 to 0, then tinted index 0 batch 0 to 0, then plain index 1 batch 0 to 1, then it would have to split the plain batch because there is a tinted batch item between. I’d suggest to look closely at how the previous code worked. |
# Objective Allow better performance testing for bevyengine#5247 ## Solution I added color tints to the `many_sprites` example stress test.
# Objective Allow better performance testing for bevyengine#5247 ## Solution I added color tints to the `many_sprites` example stress test.
I know this PR is closed now, but i just have a comment regarding the phase item batching. @superdump @ManevilleF It is not strictly necessary to have one phase item per sprite for correct sorting and batching. As you have said previously, the problem is that other non-sprite items (like Mesh2d or Text2d or whatever) could exist, and if their Z coordinate is in-between sprites, then those batches have to be broken up. So the thing is, what we actually care about are the Z coordinates. Respecting the Z layering. So you could preemptively batch things somewhat and spawn a pre-batched phase item of sprites, as long as they have the same Z coordinate. Group the sprites based on Z coordinate. This should still work correctly (allow other things to be rendered sandwiched between sprites) and also reduce the work for the phase item batching stage somewhat, when there are many sprites at the same Z. |
I closed this in favor of #6621 which doesn't try to change any logic ( since I ended up breaking the batching) |
I'm just trying to help you understand why sprites are represented in phase items the way they are. :) Conceptually, you can think of each phase item is a "thing to draw". Bevy's phase items are the abstraction that allows the engine to generically optimize the entire set of objects to be rendered in the whole scene, without having to know the details of how each one is to be rendered exactly. The original purpose for all this phase items stuff (and the PhaseSort render stage) is to be able to sort everything in the entire scene (front-to-back for opaque render passes to reduce overdraw, back-to-front for blended render passes for correct transparency / alpha blending). Everything that needs to be rendered (sprites, meshes, shapes, other custom user things, ...) represents itself as some sort of phase item. This is the job of the Queue systems: to generate "phase items" for the things they want to render, and add them to bevy. Each phase item contains a "sort key" (typically the Z distance from the camera) and other metadata, as well as the "draw function", which is what contains the code to do the actual rendering (wgpu draw calls). During the PhaseSort stage, Bevy will sort all the phase items. During the Render stage, it will call their "draw functions", so they can render themselves however they like. When sprite batching was implemented in Bevy, PhaseSort/PhaseItems were extended to also do batching as well as sorting. After all the items in the scene are sorted, there is a system that will try to opportunistically batch them together. It will check if successive items are "compatible" and merge them. For now, this is only used for sprites (AFAIK), but it is a general abstraction that could be extended to other things, too. So, this is how sprites become "batched". If there are many "sprite" phase items one after another, and there is nothing in between, they get merged into a single phase item (whose draw function will then render them all using one draw call). There can be other 2d things, besides sprites. For example, Mesh2d, Text2d, shapes/tilemaps from 3rd party plugins, some sort of custom things implemented by users, ... the list goes on. These things would also have their own phase items. Imagine you have 5 sprites at Z = 1.0, and 15 sprites at Z = 3.0. All of these 20 sprites can be batched together. If every individual sprite is a separate phase item, all of this "just works". Bevy just iterates through all phase items and merges together what it can, replacing the phase items of the sprites with phase items that draw a whole batch at once. Now, the question is, could you "pre-batch" things? Could you create these "sprite batch" phase items ahead of time? Instead of the queue system creating one phase item per sprite (which would then be merged by the general "phase batching" system), could it create already-batched phase items to begin with? And the answer is: yes, but only if it respects all the aforementioned considerations. It must respect the fact that other non-sprite items might exist. What this really means is that you can group sprites together by Z coordinate, but no more than that. Using the example from above, the queue system could have created a phase item for the 5 sprites at Z=1.0, and another phase item for the 15 sprites at Z=3.0. Bevy will then happily merge those two phase items into one big 20-sprite batch, if it can (when there are no non-sprite items in between). This would have been just as correct as creating one phase item for each individual sprite. |
@inodentry Thank you very much ! This explanation is very clear and helps a lot. |
# Objective Allow better performance testing for bevyengine#5247 ## Solution I added color tints to the `many_sprites` example stress test.
Objective
The entire logic for rendering preparation is in the
queue
stage and should be moved in aprepare
stage.Solution
Add a
prepare
stage.prepare
stage spawns the sprite batches and computes the vertex dataqueue
stage only handles bind groups and render phasesI also added some documentation.
Related MRs: