Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.x] Add deferred notifications #65581

Closed
wants to merge 1 commit into from

Conversation

lawnjelly
Copy link
Member

@lawnjelly lawnjelly commented Sep 9, 2022

Adds the ability to defer sending notifications to the next flush, instead of sending notifications immediately. This system allows the prevention of duplicate notifications on the same frame / tick, which can result in large numbers of notifications and slowdown.

Helps address #61929
Alternative to #74672

Notes

  • Although [3.x] Faster queue free #62444 helps with notifications specifically during deletion, this PR deals with detaching and moving nodes, which also suffers from slowdowns.
  • Duplicate notifications can become especially problematic with large numbers of child nodes, especially NOTIFICATION_MOVED_IN_PARENT
  • This adds a framework whereby deferred notifications are sent via the SceneTree rather than sent directly to the Node. The SceneTree keeps lists for each deferred notification type, which can be flushed on frames and physics ticks.
  • In order to prevent duplicates, dirty flags are stored on each node for each deferred type.
  • Also relevant to 4.x, I can do version for 4.x if this is approved.

Statistics

Using the MRP from #61929, trying with Nodes, blank and filled MeshInstances:

30000 nodes queue_deleted:

Before 20.6 seconds
After 7.06 seconds

Notifications sent:

Before 450 million
After 30000

Discussion and additional methods

This PR essentially replaces the heavyweight notification with a cheaper call, but these cheap calls are still made a lot of times.

An additional technique might be to maintain a list of interested children for each notification type (like observer pattern), but the added complexity and memory use is likely not warranted at this stage. Another problem is that for NOTIFICATION_MOVED_IN_PARENT, we are only sending to the children after a certain point in the child list, and this could become expensive unless the subscriber list was also sorted by child index.

Another idea is for nodes to register their interest in a notification type (via a set of bitflags), and check this prior to sending the more heavyweight notification.

However although both of these techniques are theoretically possible in core, there is a problem - users must also be able to respond to notifications (either through script or deriving from classes in modules etc), and users would have to register interest / or lack of interest somehow, and thus this might not be doable in a backward compatible way.

@timothyqiu
Copy link
Member

This breaks compat I guess? Because logic inside NOTIFICATION_MOVED_IN_PARENT won't be executed immediately.

For example, Skeleton2D marks the bones dirty in this notification. Bone references might be wrong until the notification is received.

if (p_what == NOTIFICATION_MOVED_IN_PARENT) {
if (skeleton) {
skeleton->_make_bone_setup_dirty();
}
}

@lawnjelly
Copy link
Member Author

lawnjelly commented Nov 23, 2022

This breaks compat I guess? Because logic inside NOTIFICATION_MOVED_IN_PARENT won't be executed immediately.

For example, Skeleton2D marks the bones dirty in this notification. Bone references might be wrong until the notification is received.

Yes absolutely, I should have emphasised this. 👍

By deferring the notifications till the next flush (currently at the start of the physics tick and frame) there will inevitably be a potential gap in which nodes have a pending notification, if they are processed before the next flush. So there is the potential for regressions which may need fixing / bypassing the system.

This is one of those things that is hard to predict the special cases a priori so would turn up through a few betas - there are highly likely to be 2 or 3, most likely resulting in a frame delay in something. However imo the benefits of reducing notifications are significant. The current system is flawed imo - the runaway notifications is clearly nonsense behaviour, so it is well worth working through any regressions to get to a better new stable point.

I will try and have a find in files through though and see if there are any more obvious pinchpoints, so we can fix ahead of time.

Update:
Ok having a find_in_files look (I probably did this at the time of making the PR but here's an update):
NOTIFICATION_MOVED_IN_PARENT is actually only responded to in limited places:

  • Control - in cases where it needs the order of children to draw. This will presumably be updated in the flush before the frame update, but if necessary we can put a flush in between the idle process call and the actual draw operation.
  • CanvasLayer - similar situation to Control
  • Skeleton2D - the case you mention above.

Out of these the Skeleton2D is most likely to be interesting, but I see the action it creates is making a deferred call:

void Skeleton2D::_make_bone_setup_dirty() {
	if (bone_setup_dirty) {
		return;
	}
	bone_setup_dirty = true;
	if (is_inside_tree()) {
		call_deferred("_update_bone_setup");
	}
}

This is likely having a frame delay already, and it seems like at the moment the deferred notification will worst case create a frame delay, so maybe a 2 frame delay? I'll try and test this.

@lawnjelly lawnjelly force-pushed the deferred_notifications branch from 6bce782 to ad9d4ea Compare November 23, 2022 15:29
@lawnjelly
Copy link
Member Author

As a result of testing for extra "safety" I've added one extra flush(), at the end of the idle() and before the draw. This should catch the case if things need to have draw order updated due to changes in idle, prior to the draw (otherwise there is at least the possibility of a frame delay).

Should be fine now even with Skeleton2D, as far as I can see. 👍

It's possible to check this empirically by checking the queue is empty at the end of SceneTree::idle() (I did this for testing, but removed for the PR).

@RandomShaper
Copy link
Member

It's possible to check this empirically by checking the queue is empty at the end of SceneTree::idle() (I did this for testing, but removed for the PR).

May it be a good idea to kept that check in debug builds so users can detect and report potential still unprocessed notifications, or can't that reasonably happen at this point?

@lawnjelly
Copy link
Member Author

May it be a good idea to kept that check in debug builds so users can detect and report potential still unprocessed notifications, or can't that reasonably happen at this point?

_call_idle_callbacks() looks like the only other call that could cause this in SceneTree. However in Main::iteration() there is a message_queue flush prior to the draw, which could just possibly cause a notification, so I can put a WARN_PRINT_ONCE there as a double check for this occurrence.

Given that deferred notifications could cause regression (we've tried to eliminate but there could be something unforeseen), we could make it optional 🤔 , however I'm not convinced this is totally necessary as it should show up in a beta.

@RandomShaper
Copy link
Member

I can put a WARN_PRINT_ONCE there as a double check for this occurrence.

Wouldn't harm I guess, but it's up to you.

@lawnjelly
Copy link
Member Author

Wouldn't harm I guess, but it's up to you.

Looking at it, it's a little ugly to put in Main::iteration() (because SceneTree is inherited from MainLoop etc) so maybe we can leave for now unless there's evidence of a problem. It's easy to test this when debugging anyway.

@RandomShaper
Copy link
Member

Makes sense.

Adds the ability to defer sending notifications to the next flush, instead of sending notifications immediately. This system allows the prevention of duplicate notifications on the same frame / tick, which can result in large numbers of notifications and slowdown.
@lawnjelly lawnjelly force-pushed the deferred_notifications branch from ad9d4ea to f315490 Compare December 7, 2022 11:09
@lawnjelly
Copy link
Member Author

On thinking about this, I've gone straight for using a pending notification flag instead of storing a tick count (which was mainly for debugging during development anyway). This makes it easier to add future flags (should we decide to do this) without increasing memory use, and is a little easier to understand for a reader.

@Zylann
Copy link
Contributor

Zylann commented Dec 16, 2022

Can't we also make NOTIFICATION_MOVED_IN_PARENT opt-in with a basic flag in Node, so it's not sent in the first place? That would likely solve the issue even faster for non-2D nodes, because they dont use that notification (see #61929 (comment)). I tested that option and it got rid of the issue for me.

@akien-mga akien-mga changed the title Add deferred notifications [3.x] Add deferred notifications Jan 9, 2023
@lawnjelly lawnjelly marked this pull request as draft March 12, 2023 05:26
@lawnjelly
Copy link
Member Author

Marking this as draft (even though approved) to prevent accidental merging, because I am now slightly more favoured towards my newer PR #74672 which solves the same problem in a more targeted way. (I can mark this as ready for review again if reviewers prefer this version though 👍 ).

@lawnjelly
Copy link
Member Author

Closing in favour of #82248 .

@lawnjelly lawnjelly closed this Feb 1, 2024
@AThousandShips AThousandShips removed this from the 3.6 milestone Feb 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants