Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panic: runtime/cgo: misuse of an invalid Handle (Code-Hex/vz/v3.changeStateOnObserver) #1381

Closed
AkihiroSuda opened this issue Feb 24, 2023 · 14 comments · Fixed by #1494, #1497 or #1525
Closed
Labels
bug Something isn't working component/vz

Comments

@AkihiroSuda
Copy link
Member

$ limactl start template://experimental/vz

$ limactl stop vz
INFO[0000] Sending SIGINT to hostagent process 47742    
INFO[0000] Waiting for the host agent and the driver processes to shut down 
INFO[0000] [hostagent] 2023/02/24 17:59:43 tcpproxy: for incoming conn 127.0.0.1:51498, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: no route to host 
INFO[0000] [hostagent] 2023/02/24 17:59:53 tcpproxy: for incoming conn 127.0.0.1:51499, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: connection was refused 
INFO[0000] [hostagent] Received SIGINT, shutting down the host agent 
INFO[0000] [hostagent] Shutting down the host agent     
INFO[0000] [hostagent] Stopping forwarding "/run/lima-guestagent.sock" (guest) to "/Users/suda/.lima/vz/ga.sock" (host) 
INFO[0000] [hostagent] Shutting down VZ                 
ERRO[0000] [hostagent] dhcp: unhandled message type: RELEASE 
INFO[0001] [hostagent] panic: runtime/cgo: misuse of an invalid Handle 
INFO[0001] [hostagent] goroutine 17 [running, locked to thread]: 
INFO[0001] [hostagent] runtime/cgo.Handle.Value(...)    
INFO[0001] [hostagent]  /usr/local/Cellar/go/1.19.6/libexec/src/runtime/cgo/handle.go:124 
INFO[0001] [hostagent] github.com/Code-Hex/vz/v3.changeStateOnObserver(0x0, 0xc000006801?) 
INFO[0001] [hostagent]  /Users/suda/gopath/pkg/mod/github.com/!code-!hex/vz/[email protected]/virtualization.go:160 +0xec 

version: e939454

@AkihiroSuda AkihiroSuda added bug Something isn't working component/vz labels Feb 24, 2023
@AkihiroSuda
Copy link
Member Author

Looks like this happens only on stopping a new instance

@balajiv113
Copy link
Member

Will check this out

@balajiv113
Copy link
Member

Debugged this, this is from upstream only. Will raise a ticket in vz. I tried to do some workaround but none worked.

@jandubois
Copy link
Member

Looks like this happens only on stopping a new instance

Yes, looks like that for me too. It also hangs for almost another 3 minutes after the panic() before exiting the process. This seems to be inside the vz package, so I don't think we could capture it and exit right away (maybe not worth doing anyways).

After restarting the instance, I can shut it down normally without a panic, but still get an error from DHCP:

ERRO[0001] [hostagent] dhcp: unhandled message type: RELEASE

@balajiv113
Copy link
Member

but still get an error from DHCP:
DHCP error is from gvisor-tap-vsock.

@dmigwi
Copy link

dmigwi commented Mar 4, 2023

I experienced the same error working on a separate product on Mac Os. The crashes also happened severally when I was shutting down the application. After updating the o.s to Monterey the crash seems to have disappeared by I want to understand what exactly happened.
image

@jandubois
Copy link
Member

@dmigwi This issue is about macOS 13 (Ventura) because there is no Virtualization framework support in Lima for earlier macOS version (we only support EFI boot, which was added in Ventura). So if your problem with the vz library was fixed by Monterey, then this is a separate issue.

@dmigwi
Copy link

dmigwi commented Mar 4, 2023

@jandubois Thanks for letting me know that is issue is not yet resolved since Ventura is the latest Mac Os release. A weird thing that happened when I experienced that crash on my product (desktop app) is my mac was weirdly hot and the fan was humming way louder than usual. This led me to believe something in the background or o.s. level was pushing my CPU to its extreme limit.

I tend to believe the golang GC kicked in to clean up memory space it thought wasn't needed anymore since space was kinda being stretched thin then. The communication between golang and cgo is via memory locations inform of integers, so when cgo attempted to resolve the memory location that turned out to reused by something else that crash happened then. Its all documented here: https://golang.design/research/cgo-handle/

My initial hypothesis is that the crash happens when the garbage collector runs and cleans up memory locations its not expected to mess with. I could be wrong on this!

@jandubois
Copy link
Member

@dmigwi Please report your issue on the https://github.com/Code-Hex/vz repo if you have additional information to reproduce the problem.

This repo is purely about Lima, so we will not be able to help you with your project here, since you seem to be experiencing a separate issue.

pendo324 added a commit to runfinch/finch that referenced this issue Apr 12, 2023
Issue #, if available:

*Description of changes:*
- always force stop when using Virtualization.framework, potential
workaround for lima-vm/lima#1381 (needs more
testing)

*Testing done:*
- local testing


- [x] I've reviewed the guidance in CONTRIBUTING.md


#### License Acceptance

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Signed-off-by: Justin Alvarez <[email protected]>
@AkihiroSuda
Copy link
Member Author

AkihiroSuda commented Apr 15, 2023

Still doesn't seem fixed (macOS 13.3.1, Intel)

$ limactl stop vz
INFO[0000] Sending SIGINT to hostagent process 33283    
INFO[0000] Waiting for the host agent and the driver processes to shut down 
INFO[0000] [hostagent] 2023/04/15 22:39:32 tcpproxy: for incoming conn 127.0.0.1:60602, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: no route to host 
INFO[0000] [hostagent] 2023/04/15 22:39:42 tcpproxy: for incoming conn 127.0.0.1:60612, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: connection was refused 
INFO[0000] [hostagent] Received SIGINT, shutting down the host agent 
INFO[0000] [hostagent] Shutting down the host agent     
INFO[0000] [hostagent] Stopping forwarding "/run/lima-guestagent.sock" (guest) to "/Users/suda/.lima/vz/ga.sock" (host) 
INFO[0000] [hostagent] Shutting down VZ                 
ERRO[0000] [hostagent] dhcp: unhandled message type: RELEASE 
INFO[0001] [hostagent] panic: runtime/cgo: misuse of an invalid Handle 
INFO[0001] [hostagent] goroutine 17 [running, locked to thread]: 
INFO[0001] [hostagent] runtime/cgo.Handle.Value(...)    
INFO[0001] [hostagent]  /usr/local/Cellar/go/1.20.3/libexec/src/runtime/cgo/handle.go:124 
INFO[0001] [hostagent] github.com/Code-Hex/vz/v3.changeStateOnObserver(0x0, 0xc000006801?) 
INFO[0001] [hostagent]  /Users/suda/gopath/pkg/mod/github.com/!code-!hex/vz/[email protected]/virtualization.go:160 +0xb4 

@AkihiroSuda AkihiroSuda reopened this Apr 15, 2023
@balajiv113
Copy link
Member

@AkihiroSuda
Is the steps to reproduce are the same ??

I tried with multiple create stop and start stop use cases :( Didn't face the issue.

@AkihiroSuda
Copy link
Member Author

@AkihiroSuda Is the steps to reproduce are the same ??

Yes

I tried with multiple create stop and start stop use cases :( Didn't face the issue.

Probably because this is a race condition

@abiosoft
Copy link
Contributor

This is still happening on v0.15.1.

A user reported it abiosoft/colima#691 (comment) and I am also able to reproduce it.

@AkihiroSuda AkihiroSuda reopened this Apr 16, 2023
@joellarsonibotta
Copy link

joellarsonibotta commented Apr 27, 2023

I can confirm that it is still happening for me:

$ colima stop
INFO[0000] stopping colima
INFO[0000] stopping ...                                  context=docker
INFO[0002] stopping ...                                  context=vm
> [hostagent] panic: runtime/cgo: misuse of an invalid Handle
> [hostagent] goroutine 17 [running, locked to thread]:
> [hostagent] runtime/cgo.Handle.Value(...)
> [hostagent] 	/opt/homebrew/Cellar/go/1.20.3/libexec/src/runtime/cgo/handle.go:124
> [hostagent] github.com/Code-Hex/vz/v3.changeStateOnObserver(0x0, 0x14000002680?)
> [hostagent] 	/Users/brew/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/!code-!hex/vz/[email protected]/virtualization.go:160 +0xb0

Version Info:

colima version 0.5.4
limactl version 0.15.1

INFO[0000] colima is running using macOS Virtualization.Framework
INFO[0000] arch: aarch64
INFO[0000] runtime: docker
INFO[0000] mountType: virtiofs
INFO[0000] socket: unix:///$HOME/.colima/default/docker.sock

I do see that there's a problem identified in vz with a solution coming soon: Code-Hex/vz#124

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working component/vz
Projects
None yet
6 participants