This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

EOSVM OC Compiler Monitor crash #10992

Open
eosusa opened this issue Dec 30, 2021 · 1 comment

Comments

@eosusa

eosusa commented Dec 30, 2021

I was syncing a StateHist node on the WAX Mainnet and was 1.5 weeks into the sync when nodeos suddenly crashed with the following error in the stderr.txt log:

ERROR: EOS VM OC Compiler monitor exiting with active connections

When checking dmesg, I see the following error:

[1367672.002726] traps: oc-compile[18426] general protection fault ip:7f0c4bc33d00 sp:7ffc3d9728a0 error:0 in libc-2.27.so[7f0c4bb9f000+1e7000]
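
As a rough aid for reading a line like this, the sketch below pulls the fields out of the `traps:` message and computes how far the faulting instruction pointer sits from the start of the reported libc mapping. The regular expression and the `parse_trap` helper are illustrative assumptions based only on the layout of the line above, not part of nodeos or any kernel tooling.

```python
import re

# Assumed layout, taken from the dmesg line quoted above:
# traps: <comm>[<pid>] <reason> ip:<hex> sp:<hex> error:<hex> in <module>[<base>+<size>]
TRAP_RE = re.compile(
    r"traps: (?P<comm>\S+)\[(?P<pid>\d+)\] (?P<reason>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+)"
    r"(?: in (?P<module>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)

def parse_trap(line):
    """Return the trap fields as a dict, or None if the line does not match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    info = {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "reason": m["reason"],
        "module": m["module"],
        "offset": None,
    }
    if m["base"] is not None:
        ip, base, size = (int(m[k], 16) for k in ("ip", "base", "size"))
        # Only report an offset when ip really falls inside the mapping.
        if base <= ip < base + size:
            info["offset"] = ip - base
    return info
```

For the line above this gives `comm == "oc-compile"`, `pid == 18426`, and an offset of `0x94d00` (`0x7f0c4bc33d00 - 0x7f0c4bb9f000`) into the `libc-2.27.so` mapping. Turning that offset into a function name would need the exact same libc build with symbols, which is not available from the report alone.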

I am using the WAX 2.0.12wax01 code from https://github.com/cc32d9/wax2.0/tree/v2.0.12wax01 and had the http, chain, and statehist plugins running (ship was collecting trace and console logging). The node had a single p2p connection to a geographically close node and had synced perfectly from genesis all the way to the #150385039 - #150385538 range, and then it died.

The machine is Ubuntu 18.04 on a bare-metal i9 10900K with 128 GB DDR4, with OS/state on a 2 TB NVMe and blocks/ship on ZFS-mirrored 8 TB SSDs.

@spoonincode
Contributor

This message from OC tends to be a red herring (and probably should have its direness toned down). The OC monitor is a separate process that maintains a socket-based connection to nodeos and performs actions (compiling WASM) at nodeos' request. This message (EOS VM OC Compiler monitor exiting with active connections) means the OC monitor lost its connection to nodeos while it was still performing actions on nodeos' behalf. That is a critical error from the perspective of the OC monitor: nodeos should never close the socket connection to the OC monitor without performing a proper shutdown communication sequence first.
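
To make that failure mode concrete, here is a toy model of a monitor that notices its peer vanished while work is still outstanding. It is not the OC monitor's code: the single-byte protocol (`C` for a compile request, `Q` for a clean shutdown) and the names `run_monitor` and `simulate` are made up for illustration.

```python
import socket

ERR_ACTIVE = "ERROR: monitor exiting with active connections"
IDLE_CLOSE = "peer closed while idle"
CLEAN = "clean shutdown"

def run_monitor(conn):
    """Toy monitor loop over a hypothetical one-byte protocol."""
    active = 0
    while True:
        msg = conn.recv(1)
        if msg == b"":
            # recv() returning b"" means the peer closed the socket.
            return ERR_ACTIVE if active else IDLE_CLOSE
        if msg == b"C":
            # A compile request arrived; count it as outstanding work.
            active += 1
        elif msg == b"Q":
            # The peer announced shutdown before closing the socket.
            return CLEAN

def simulate(messages):
    """Send `messages`, then drop the connection as a crashed peer would."""
    node_side, monitor_side = socket.socketpair()
    try:
        node_side.sendall(messages)
        node_side.close()
        return run_monitor(monitor_side)
    finally:
        monitor_side.close()
```

In this model `simulate(b"C")` ends with the error string because the peer disappeared with a request outstanding, while `simulate(b"CQ")` ends with a clean shutdown. The error therefore says something about how the peer went away, not about the monitor itself.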

Sometimes the OC monitor doesn't even get the opportunity to print this message, as it asks the kernel to SIGKILL it when nodeos dies anyway!

Typically this message comes about because nodeos crashes or otherwise shuts down uncleanly while a compilation is ongoing. The traps: oc-compile[18426] general protection message may indicate that a compilation failed because the system was out of memory, which could have cascaded to nodeos failing because it also ran out of memory. (An oc-compile process failing or crashing on its own is not a problem.)

Do you have any other interesting log messages during this time frame? Any feel for system resource usage?
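
One low-effort way to look for such messages is to filter saved kernel log output for out-of-memory and trap lines. The sketch below assumes the usual Linux kernel wording ("invoked oom-killer", "Out of memory: Kill process ..." or "Killed process ...", and "traps:"); exact phrasing varies between kernel versions, and the helper name `suspicious_lines` is made up for this example.

```python
import re

# Substrings the Linux kernel typically logs around OOM kills and user-space traps.
# Wording differs across kernel versions, so treat these as a starting point.
PATTERNS = [
    re.compile(r"invoked oom-killer"),
    re.compile(r"Out of memory: Kill(?:ed)? process \d+"),
    re.compile(r"traps: \S+\[\d+\]"),
]

def suspicious_lines(lines):
    """Return the log lines that match any of the patterns above."""
    hits = []
    for line in lines:
        text = line.rstrip("\n")
        if any(p.search(text) for p in PATTERNS):
            hits.append(text)
    return hits
```

It could be fed the lines of a saved `dmesg` dump, for example `suspicious_lines(open("dmesg.txt"))`, where `dmesg.txt` is a placeholder file name. An OOM kill of nodeos shortly before the trap would support the out-of-memory explanation.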
