Added null check to prevent segfaults on disconnects while using websockets #33930
There should be no race condition there, unless you are using threads yourself to poll the server/client. I don't see how this fixes an issue. Can you provide a minimal reproduction project that shows just this issue, and that this patch fixes it? Note: as stated in the comments, #33279 is most likely related to #33788. Are you sure you are not experiencing that as well?
I agree, but I am just saying what we were seeing. As you can see from the log I posted on issue #33279, our system logs that a player disconnected followed by:
Around the websocket networking code in 3.1 there were 3 locations that were accessing the result from get_peer directly that were not trying to access the server. (You fixed both of these on PR #31482 and #31483.) Stability improved when the first two were fixed, but we did not fully see the segfaults go away till I added the null check to the
We are not using any threads or doing any polls ourselves. We are using the standard Godot High Level Multiplayer API.
I saw that, but we are running a headless Mono Godot server in release, and from your comments (correct me if I am wrong though) it only affected debug-enabled exports.
Sure, I can work on putting one together later this week, but testing it may be tricky. Like with issue #33279, it fails quite randomly, and the frequency of failures depends on the frequency of connections and disconnections. We only found the problem after we put the server up live for people to test out. I do not think it would be of much help to you, but I can provide you our most recent logs as well if you wish to see them. For reference: the full log of the segfault I posted on issue #33279.
It's more consistent in debug-enabled exports, but memory corruption might still exist in release builds. I'm still not convinced this is the issue, sorry.
Sure, I think we could try to test it with that fix. We are in the middle of testing right now, but I think we can do it in the next couple of days or so.
Ok I was able to run a test tonight since we are planning on bringing the server down for a bit anyway. Before I go into the results here is some background info:
We ran a total of 3 tests on our server and we crashed 3 times. Looking through the logs I did not see a notification that freeing references was failing due to them being locked. Here is the shortest of the 3 logs:
We did see these messages starting to pop up with debug enabled though:
Here are all 3 full logs for you to review if you wish.
In your log I see:
Oh sorry, I was in a rush to test it as I was going on vacation, and I think I used an older branch that I forked by mistake. I'll double-check it as soon as I get back.
Closing, this does not address the real problem, which is related to #33290 and references being deleted during signal emission.
Sorry I was unable to get back to this. I am happy the root issue was found though. Thank you for your time on this.
While testing a websocket-enabled server I started to experience random crashes when players disconnected, similar to what was reported on issue #33279. Looking over the logs, the best I could tell is that `_peer_map` was being modified between the `ERR_FAIL_COND(!has_peer(p_peer_id))` check in `disconnect_peer` and the `ERR_FAIL_COND_V(!has_peer(p_id), NULL)` check in `get_peer`, causing `get_peer` to return NULL, which in turn caused the segfault.

To fix this I replaced the `ERR_FAIL_COND(!has_peer(p_peer_id));` check in `disconnect_peer` with a null check similar to the change made on PR #31482. This does not fix the underlying race condition with `_peer_map`, but the server now only logs the issue instead of outright crashing.

It should be noted that there were two areas in `websocket_multiplayer_peer` that also needed null checks, but these were added on PR #31482 and #31483. This issue also occurs in 3.1, so cherry-picking this commit would help solve it there, since it looks like PR #31482 and #31483 have already been moved to 3.1 as well.
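
The idea behind the change can be sketched in isolation. The following is a simplified standalone model, not the actual Godot code: `Peer`, `Server`, `add_peer`, `remove_peer`, and the `bool` return value of `disconnect_peer` are hypothetical stand-ins chosen for illustration. It shows the lookup being done once and its result checked, instead of relying on an earlier `has_peer` check still holding when the pointer is used.

```cpp
#include <cstdio>
#include <map>
#include <memory>

// Simplified stand-in for a websocket peer; not the Godot class.
struct Peer {
    bool open = true;
    void close() { open = false; }
};

// Hypothetical model of a server that keeps its peers in a map keyed by ID.
class Server {
    std::map<int, std::shared_ptr<Peer>> peer_map;

public:
    void add_peer(int id) { peer_map[id] = std::make_shared<Peer>(); }
    void remove_peer(int id) { peer_map.erase(id); }
    bool has_peer(int id) const { return peer_map.count(id) != 0; }

    // Returns nullptr when the ID is unknown, mirroring a failed lookup.
    std::shared_ptr<Peer> get_peer(int id) const {
        auto it = peer_map.find(id);
        return it == peer_map.end() ? nullptr : it->second;
    }

    // Look the peer up once and check the result before using it,
    // rather than checking has_peer() and dereferencing a second lookup.
    // Returns false (and logs) when the peer is already gone.
    bool disconnect_peer(int id) {
        std::shared_ptr<Peer> peer = get_peer(id);
        if (!peer) {
            std::fprintf(stderr, "disconnect_peer: no peer with id %d\n", id);
            return false;
        }
        peer->close();
        return true;
    }
};
```

With this shape, calling `disconnect_peer` for an ID that has already been removed logs a message and returns instead of dereferencing a null pointer. As noted above, this only guards the symptom; it does not explain why the map changed between the two checks.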