Keep the launchd daemon alive #45
Working with multiple Lima clusters in recent weeks, I found that socket_vmnet sometimes stops running for no apparent reason. The typical flow: I try to start the clusters, and the Lima hostagent fails with "connection refused" on /var/run/socket_vmnet. This happens to me one or more times a day. A stress test that creates and destroys the Lima clusters 50 times fails after several runs, and every run after the first failure fails as well.

The issue seems to be that launchd stops socket_vmnet because it appears to be idle, and never starts it again. Adding the keep alive option eliminated this issue.

With this change the daemon is kept running and it should restart after failures.

Signed-off-by: Nir Soffer <[email protected]>
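For readers unfamiliar with the keep alive option, here is a minimal sketch of a launchd job definition with `KeepAlive` enabled, generated with Python's standard `plistlib`. `KeepAlive` and `RunAtLoad` are standard launchd keys; the label and program arguments below are hypothetical placeholders, not the values from the plist shipped in this repository.

```python
import plistlib


def build_daemon_plist(label, program_arguments, keep_alive=True):
    """Return launchd property-list bytes for a daemon job."""
    job = {
        "Label": label,
        "ProgramArguments": list(program_arguments),
        # Start the job as soon as launchd loads it.
        "RunAtLoad": True,
        # Ask launchd to keep the job running and restart it if it exits.
        "KeepAlive": keep_alive,
    }
    return plistlib.dumps(job)


# Hypothetical label and command line, for illustration only.
plist_bytes = build_daemon_plist(
    "com.example.socket_vmnet",
    ["/path/to/socket_vmnet", "/var/run/socket_vmnet"],
)
print(plist_bytes.decode())
```

The output contains `<key>KeepAlive</key>` followed by `<true/>`, which is the kind of entry this change adds to the launchd file.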
Thanks, LGTM
This makes sense, but I don't see how it can be idle when I'm running 3 clusters with many components communicating between the clusters (e.g. rbd mirroring, submariner tunnels, etc.). Worse, terminating socket_vmnet breaks the Lima VMs' network: after termination the hostagent ends up in a kind of busy loop, logging errors about using a closed connection.
Thanks, but I don't think this launchd file is used by Lima?
It is not used when you are using "managed" networks. In that case Lima is starting/stopping the daemon as-needed. But Lima can also connect to an "unmanaged" network, where you just specify the socket address. I assumed that is what @nirs was doing, and in that case you would use the |
Right, this is how I use it. I feel safer when limactl does not have special permissions, and ensuring that running socket_vmnet from brew is safe requires changing ownership of brew directories, which is messy and breaks brew upgrades. Using |
I think the issue was wrong handling of SIGPIPE #48. You can see in the log from #43:
I think the issue is:
The SIGPIPE issue is fixed in #49, but there may be other reasons for fatal failure.
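When chasing such failures, it can help to check whether anything is still accepting connections on the socket, since the symptom reported here is "connection refused" on /var/run/socket_vmnet. The following is a rough sketch, assuming the daemon listens on a Unix stream socket; the helper name `daemon_is_listening` is hypothetical, and the probe only opens and closes a connection without speaking the socket_vmnet protocol.

```python
import socket


def daemon_is_listening(socket_path, timeout=1.0):
    """Return True if something accepts connections on the Unix socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(socket_path)
        except OSError:
            # Covers connection refused, a missing socket file,
            # permission errors, and timeouts.
            return False
        return True


if __name__ == "__main__":
    # Path taken from the report above; adjust for your setup.
    print(daemon_is_listening("/var/run/socket_vmnet"))
```

A `False` result does not say why the daemon is gone, only that nothing is listening at that path or that the caller may not connect to it.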