-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/147 Create a separate dockerized server #229
Conversation
@Tarmslitaren I know that this is a big one and that you probably weren't expecting it take your time before you merge it. Let me know if you have questions. |
This is awesome! I was looking at this myself last weekend and realized it needed a larger refactor to make it work. I'll definitely give it a spin and see if it works for us! Our next meet isn't till the 20th, so I won't have real world feedback till then. But I'll try building it in advance at least. Thank you for putting this together! |
I very excitedly played with this tonight. Ran into a small build issue which I've noted in the PR. Additionally, a couple of observations:
This is awesome! My group has struggled a bit with connectivity stability, I suspect this could eliminate problems for us. My group will be excited to try it! |
Co-authored-by: Daniel Kimsey <[email protected]>
Thanks for looking in to things. I accepted your change. I added listeners for SIGINT and SIGTERM, which do seem to help. I had them call the existing stopServer before exiting it should be more graceful, but there still is probably a better way to do / handle things. When I use docker run attached it still won't respond for ctrl+c for me unless I pass the -it flags. I'm working on Windows if it is potentially related to that. I did see that it does appear to respond correctly to start / stop in Docker desktop. |
This is really nice. I will try to test this out on Friday. I've been running FrosthavenAssistant on my desktop and connecting with mobile devices for reliability issues. This would make it much more convenient. |
Looks nice guys! Tell me when you think this is ready and tested ( within reason ) and then I will merge. I really appreciate the engagement since this is not a feature that's up my alley. |
So we did run into some trouble with our attempt this weekend. There were cases where the server just stopped listening/responding. Process is still running, but the server is no longer responding. Nothing useful in the logs, but it looked like the usual suspect: one of our players has a (cough, strong) tendency to navigate away from the app. It wasn't a guarantee, but when it happened it would definitely kill the server until I restarted the process. Thinking aloud, I'm not sure what to do here. The process doesn't log enough information to provide useful diagnostics. So maybe, that's worth doing in this effort? Or perhaps as a second PR on top of this one? Otherwise it was definitely more stable than running it from the phone, that's a success! |
If you prefer to add diagnostics in a separate MR, then I can merge this now. (provided, regular non-dockerized connections still work ok) If the server stops responding after a client disconnects and the reconnects frequently, it is of course useful to look into how the reconnecting is working. It is possible a new socket connection is opened every time, which might not be great. (Haven't touched this in a while) |
I'll put in some additional logging and see what I can do to recreate. I feel like I saw some similar things the one real world test that I ran, but I wasn't able to consistently recreate afterwards after some tweaks. I'll try running some additional testing with multiple clients over a longer timeframe to see if I can recreate this. My group typically runs on a monthly cadence. I might be able to get a real world test in a few weeks. |
We tried this out on Friday and ran through two scenarios. It worked well, but there seemed to be some wonkiness when hitting undo more than a few times. I will test it out again later. |
I pushed out some changes the most obvious thing that I've been finding that could cause an issue is trying to access something on a closed socket connection, which could potentially happen in a race condition and could be more likely with many clients connecting and disconnecting. I added some logging about the connection health including how many connections have been made. I suspect some of the server unresponsive things could be how the client is handling the connection when the app is paused and then resumed with switching away. On the undo side I did see that there could be an index access error, but think that it might need a little more work than that potentially. In particular I've noticed some strange behavior if the edit screen is up with one client for a character and a different client undo's the actions done with that edit screen. I personally am okay shipping with potential issues with undo, but would like to confirm that the stability issues have been addressed. |
Definitely fewer crashes. Last night we experienced ~ 2 hard crashes that took the server down, (container needed to be restarted). Only 1 or 2 disconnects. Noted bugs:
That all being said, this is good work. I got compliments from my players that the changes were definitely improving the game play experience. I know squashing this bugs isn't likely directly server code. But I think the fact this PR separates the server from the UI is letting us identify and tackle them. And that's been 100% worth it. |
@dekimsey Can you say which scenario load caused the crash? It might not be related, since the crash is on a ping message, but it doesn't hurt to check, in case it's a reproducible error. |
Will do.
Interesting, I know I re-fetched, rebuilt, and pulled it for the evening's game. I checked, and I can honestly say I don't know what happened. Perhaps I failed to reset the head or rebase. But I'm definitely missing the added logging. Very strange, sorry for the bad report!
Scenario |
Thanks for the info, even if it wasn't the most up to date I did find a few additional places that were accessing the remote address on the socket that could potential be closed in a race condition that from what I've seen has been the source of the hard crashes. I did get a chance to use things out in the wild yesterday and everything seemed to work well. I did see a few server unresponsive messages pop-up, but they did seem to indicate an issue with the server, but I suspect are mostly due to connections breaking and needing to be reestablished when the client app pauses if that is the Flutter terminology for the application lifecycle event when the app is hidden. I did see that with 4 users in about a 2 and a half hour session we saw 66 total connections get established. If everyone else is I think I'm fine with merging at this point and when I'm able to I'll try to dig in to the client code and see I can make some improvements to how it manages connections to get rid of the server unresponsive errors. I did push in a couple minor changes today with this comment where I made some of the code accessing the remote address on a Socket safer and I also added the client and server versions to the set network message client mismatch error. |
All right, I'll merge this now. Thank you everyone! Please contact me for any questions regarding the code base if needed, gong forward. And please advise if/when it would be a good idea to backport the separate server for use in the flutter app. The main valid reason for clients to disconnect is when they go to background is in IOS, the platform does not allow sockets to stay open in background. Android is more open, but there are still limitations. There is code to automatically reconnect and so get the latest state when app goes back to foreground, however as discussed here it seems to not always work as intended for getting the latest state? |
Network refactoring around the server code.
Moved the base server logic without the Flutter dependencies to its own abstract class and refactored the existing Server class / file to inherit from it.
I then moved the abstract class to a separate pure dart project folder frosthaven-assistant-server and created an implementation in pure dart. The implementation was able to be fairly simplified due to not fully needing to understand and update the game state itself since it can mostly serve as a relay to the other clients.
When containerizing the code I found that the remoteAddress was not necessarily enough information to distinguish between the different socket clients so I made a small change to also include the remote port.
If I missed something please let me know. I wasn't sure if it was necessary to update the github actions for the project
Also I've done some testing throughout the development and tested it during a Gloomhaven session this last weekend. It worked fairly well. It was a little hard to know if issues I saw were transient issues caused by things like my laptop going to sleep and running on a hotel WiFi network. I'd appreciate additional testing and being informed of any issues encountered and steps to reproduce if possible.
Note that I tried to make sure that I didn't make changes that would cause problems with the existing client / server communication contract.