Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.x] Fix Rare GL Bug caused by Particle Systems #61632

Conversation

jasonwinterpixel
Copy link
Contributor

@jasonwinterpixel jasonwinterpixel commented Jun 2, 2022

This PR unbinds particle systems textures after use.

I spoke with @clayjohn on the rocket chat about this.

If you have a particle system that uses more than 1 texture, it will bind two textures.

It will never unbind that second texture.

If you render a viewport later on in your scene, on webgl, you will receive a GL error:

image

[.WebGL-0x4c044fc600] GL_INVALID_OPERATION: Feedback loop formed between Framebuffer and active Texture.

On non webgl platforms, the system merely liberally allows the texture to be bound, and returns all black if it is sampled.

This happens because the particle system left a texture bound. This PR fixes it.

There is some possibility this bug is present on master, but I don't know how the renderer works on master so I cant comment.

@jasonwinterpixel jasonwinterpixel requested a review from a team as a code owner June 2, 2022 14:08
@akien-mga akien-mga added this to the 3.5 milestone Jun 2, 2022
@lawnjelly
Copy link
Member

lawnjelly commented Jun 9, 2022

Is this fix also needed for GLES2?

Could we alternatively unbind all texture slots when starting to render a viewport? I was just wondering if this same bug could occur in other circumstances than particle systems, and we hadn't spotted it yet, and maybe we could fix all such bugs in the same way.

One effect of this PR could be if you were rendering e.g. 100 of the same particle system, it would bind / unbind the same texture 100 times. It is a bit of a tricky one though. Another alternative is some kind of wrapper for keeping track of which slots are in use and their current values, and unbinding the unused slots, and preventing redundant changes. 🤔

What are your thoughts @clayjohn ?

@jasonwinterpixel
Copy link
Contributor Author

jasonwinterpixel commented Jun 10, 2022

Could we alternatively unbind all texture slots when starting to render a viewport? I was just wondering if this same bug could occur in other circumstances than particle systems, and we hadn't spotted it yet, and maybe we could fix all such bugs in the same way.

This bug almost definetly exists in other systems. @clayjohn has raised this to me.

I think cleaning the textures up at the start or end of a viewport render is okay but I think it would be better to identify what render commands are leaving stuff bound and unbinding them intelligently during the render loop.

One effect of this PR could be if you were rendering e.g. 100 of the same particle system, it would bind / unbind the same texture 100 times.

The actual execution of the unbind and whether there is a performance increase to 'not unbinding after each particle use' is probably driver dependent and a performance increase is not clearly gained there. In general, I think a robust, simple renderer system will bind it's resources, render its thing(s) and then unbind those resources. I think it's a mistake in this renderer to leave this bound. A much more powerful renderer would batch it's draw calls more intelligently, but just leaving random stuff bound is not the way to do it. Godot 3.x should heavily prioritize correctness rather than optimizations as 4.0 is meant to be the future proof place for large work like this.

@clayjohn
Copy link
Member

Could we alternatively unbind all texture slots when starting to render a viewport? I was just wondering if this same bug could occur in other circumstances than particle systems, and we hadn't spotted it yet, and maybe we could fix all such bugs in the same way.

Possibly, I'm not sure what the cost would be if we were to be unbinding 1000 already unbound texture slots. My guess is most if not all drivers would be able to no-op and perform no work, but it could lead to pathological edge cases on devices with low-quality drivers.

My original suggestion was to track max texture id usage over the course of a single renderloop (that is per viewport) and then unbind all the used slots. So the unbind would be called once per frame per viewport. Essentially making each viewport clean up after itself.

I think that we can intelligently identify places that require a full unbind and then ensure that all active texture slots are unbound. For example, the issue here is from the engine leaving a screen_texture bound to texture slot 2 and then rendering to the screen_texture later in the frame (or perhaps early in the next frame). To work around this problem while minimizing binds/unbinds, we could track max texture slot usage, and then call a full unbind right before switching to a new framebuffer. That would ensure that everytime we render to a framebuffer there is no chance that that framebuffer's texture is currently bound. At the same time, it would result in fewer unbinds then unbinding after every draw command.

@lawnjelly
Copy link
Member

Possibly, I'm not sure what the cost would be if we were to be unbinding 1000 already unbound texture slots. My guess is most if not all drivers would be able to no-op and perform no work, but it could lead to pathological edge cases on devices with low-quality drivers.

This was assuming the maximum would be 8 or 16 slots (we could look on the GLES database to find the actual numbers involved, I think only 8 are mandated in GLES2). Keeping track of the maximum we use is also a good option if the number of slots is higher than this, or using a wrapper to keep track of texture slots for this (I've used this approach in the past in other projects).

I think that we can intelligently identify places that require a full unbind and then ensure that all active texture slots are unbound. For example, the issue here is from the engine leaving a screen_texture bound to texture slot 2 and then rendering to the screen_texture later in the frame (or perhaps early in the next frame). To work around this problem while minimizing binds/unbinds, we could track max texture slot usage, and then call a full unbind right before switching to a new framebuffer. That would ensure that everytime we render to a framebuffer there is no chance that that framebuffer's texture is currently bound. At the same time, it would result in fewer unbinds then unbinding after every draw command.

Also agree that unbinding on situations like rendering a new viewport seems more sensible than the current approach in this PR, as it should catch all such bugs instead of just this particular one.

@jasonwinterpixel
Copy link
Contributor Author

jasonwinterpixel commented Jun 20, 2022

Also agree that unbinding on situations like rendering a new viewport seems more sensible than the current approach in this PR, as it should catch all such bugs instead of just this particular one.

This is a known, precise fix for a known, precise individual bug which should afford some value. I think 'catch all' systems as described should be considered a last resort. Locating the unbound textures and unbinding them per command feels better to me, rather than using the blunt instrument of 'unbind everything'. Maybe I'm wrong to feel that way.

Can we identify a second instance of this class of bug? Is there only one other example? Should we similarly precisely fix that single other bug?

For example, the issue here is from the engine leaving a screen_texture bound to texture slot 2 and then rendering to the screen_texture later in the frame (or perhaps early in the next frame).

I dont think it matters but just to clarify, I'm fairly certain that the bug that this PR fixes was not related to SCREEN_TEXTURE. None of the particle systems in our project use SCREEN_TEXTURE. Rather, we have a low resolution viewport that we render something into, and that is where the issue arose.

@@ -6557,6 +6557,9 @@ void RasterizerStorageGLES3::update_particles() {
}

Material *material = material_owner.getornull(particles->process_material);
int texture_count = 0;
RID *textures = nullptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem needed in this outer scope in the end?

@clayjohn
Copy link
Member

Also agree that unbinding on situations like rendering a new viewport seems more sensible than the current approach in this PR, as it should catch all such bugs instead of just this particular one.

This is a known, precise fix for a known, precise individual bug which should afford some value. I think 'catch all' systems as described should be considered a last resort. Locating the unbound textures and unbinding them per command feels better to me, rather than using the blunt instrument of 'unbind everything'. Maybe I'm wrong to feel that way.

The idea isn't to unbind everything, its to unbind what was used since the last framebuffer switch. In general, I agree that it is best to fix the specific bug. But in this case, this isn't a bug that arises between commands. Unbinding all resources after a command is guaranteed to unbind more often than necessary (although depending on drivers, that may not actually be a big deal).

Again, the bug comes from having a framebuffer's texture left bound while rendering to that framebuffer. If we unbind up to the max texture slot used by a given Viewport, then we guarantee that the framebuffer texture is never left bound when it comes time to render to that framebuffer.

Can we identify a second instance of this class of bug? Is there only one other example? Should we similarly precisely fix that single other bug?

#45532, #46883

For example, the issue here is from the engine leaving a screen_texture bound to texture slot 2 and then rendering to the screen_texture later in the frame (or perhaps early in the next frame).

I dont think it matters but just to clarify, I'm fairly certain that the bug that this PR fixes was not related to SCREEN_TEXTURE. None of the particle systems in our project use SCREEN_TEXTURE. Rather, we have a low resolution viewport that we render something into, and that is where the issue arose.

You are correct, I mispoke, when I wrote screen_texture I meant the backbuffer texture associated with a framebuffer (which is the current viewports SCREEN_TEXTURE). When accessed from another Viewport, it isn't correct to call it the screen_texture anymore.

@clayjohn
Copy link
Member

@jasonwinterpixel This is what I had in mind, clayjohn@baf691a

Is there any way if you could test out that commit and let me know if it resolves your bug as well? I wasn't able to reproduce locally.

@lawnjelly
Copy link
Member

This is a known, precise fix for a known, precise individual bug which should afford some value. I think 'catch all' systems as described should be considered a last resort. Locating the unbound textures and unbinding them per command feels better to me, rather than using the blunt instrument of 'unbind everything'. Maybe I'm wrong to feel that way.

On a philosophy point - although it may seem strange, the policy as far as I see in Godot is that just because something fixes a bug, doesn't guarantee it will be merged.

The emphasis is very much on finding "the right fix" (by consensus with maintainers etc). It does seem to help keep the codebase as a whole more concise and sensible, and less what might be considered "a patchwork of bugfixes" which can happen to some projects, making them more difficult to maintain. The flipside of course is that it can mean bugs remain open a little longer.

This happens to all of us BTW regularly. What often happens making a PR is we don't see the forest for the trees (and often reduz comes up with a better way of fixing as he has the best overview).

Anyway that isn't to say the approach in this PR has no value, it shows very well the cause and solution to the problem. And we don't yet know if the other approaches will fix it! 😀

@akien-mga akien-mga modified the milestones: 3.5, 3.x Jul 2, 2022
@akien-mga akien-mga added the cherrypick:3.5 Considered for cherry-picking into a future 3.5.x release label Jul 2, 2022
@akien-mga akien-mga modified the milestones: 3.x, 3.6 Dec 12, 2022
@lawnjelly
Copy link
Member

@clayjohn that branch of yours looked fine to me, and as we haven't heard back, maybe you could make a PR, as it should (in theory anyway 😁 ) fix the issue without the risk of unnecessary API calls.

@akien-mga akien-mga removed the cherrypick:3.5 Considered for cherry-picking into a future 3.5.x release label Aug 10, 2023
@akien-mga
Copy link
Member

Superseded by #80481. Thanks for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: PRs for 3.6 beta 1
Development

Successfully merging this pull request may close these issues.

5 participants