node crashes and corrupts DB on heavy usage #2813
Comments
Can you build defid with debug enabled using |
I can try this. As these nodes should run 24/7 without interruption, I would prefer to only "force" such an interruption once. So to make sure it has everything prepared: I install
|
I'd just build with the following and skip the release steps. Binaries will be in
We can set the ulimit just for defid if you run it via systemd. Edit your service file, assuming it is
Run the following command.
Then start or restart the defid service.
If you use systemd, ignore the rest below. If, however, you run defid under a user account, you would set
Or add the line to your script, or create a wrapper script if cron ran a binary.
Or you could set ulimit system wide by editing
After a reboot, check whether ulimit is now set to unlimited. |
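To confirm that the limit actually applies to the running defid process (rather than to the shell you happen to be logged into), one option is to read the limits the kernel reports for that process. The following is a minimal sketch, assuming a Linux host where `/proc/<pid>/limits` is available; the function names are illustrative and not part of defid.

```python
from pathlib import Path

LIMIT_NAME = "Max core file size"


def core_limit_from_limits(text):
    """Return (soft, hard) core file size limits from /proc/<pid>/limits text.

    Returns None if the line is not present.
    """
    for line in text.splitlines():
        if line.startswith(LIMIT_NAME):
            # Remaining columns are: soft limit, hard limit, units
            fields = line[len(LIMIT_NAME):].split()
            return fields[0], fields[1]
    return None


def core_limit_for_pid(pid):
    """Read the core file size limits of a running process (Linux only)."""
    return core_limit_from_limits(Path(f"/proc/{pid}/limits").read_text())
```

Calling `core_limit_for_pid(pid)` with the PID of defid should return `("unlimited", "unlimited")` if the setting took effect. A soft limit of `0` would mean no core dump is written at all.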
thx. Yes, using systemd. Did the steps and it's all running now. Will report when it crashes again. |
Run
Then GZIP the core dump and include the defid binary as well. I will then fire up gdb with the core dump and the executable and get the back trace. However you could also just get the backtrace and post it here. Install gdb if not already installed and run the following replacing PID with the PID of the core dump.
Once at the GDB prompt type in |
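If an interactive GDB session is inconvenient, the backtrace can also be collected non-interactively. This is a rough sketch, assuming gdb is installed and on the PATH; the binary and core file paths passed in are placeholders you would supply yourself, and the helper names are made up for illustration.

```python
import subprocess


def gdb_backtrace_cmd(binary, core):
    """Build a gdb invocation that prints a backtrace and exits.

    --batch makes gdb exit after running the commands given with -ex.
    """
    return ["gdb", "--batch", "-ex", "bt", binary, core]


def backtrace(binary, core):
    """Run gdb against the executable and core dump; return its stdout."""
    result = subprocess.run(
        gdb_backtrace_cmd(binary, core),
        capture_output=True,
        text=True,
    )
    return result.stdout
```

The returned text can be pasted into the issue directly. Without debug symbols in the binary, the frames will mostly show as `??`.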
thx. I prefer not to install too much on the production machine. Will see if I can do it locally. One thing that I think might be the reason is a memory leak, because the logs around the crash feel a bit like an out-of-memory condition. Will try to watch the memory usage over the next hours. |
Update so far: it seems that the node's memory usage is constantly increasing. It started at 3 GB and is now at 5.7 GB. |
Quick update: the node is now at 7 GB of memory. It's holding up longer than expected already (not sure if that's due to the unlimited setting, the debug build, or just because there was less load). |
Memory is still rising, but more slowly than in previous tries. Now 7.4 GB. I will go back to the release build of 4.0.8, where the problem appeared within 12 hours. Let's see whether the difference is the build or the unlimited setting. |
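For anyone who wants to track the memory growth more systematically than by checking by hand, here is a small sketch that samples the resident set size of a process at a fixed interval. It assumes Linux (`/proc/<pid>/status`); the function names, interval, and sample count are arbitrary choices, not anything defid provides.

```python
import time
from pathlib import Path


def rss_kib(status_text):
    """Extract VmRSS (resident memory, in KiB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def watch_rss(pid, interval_s=60, samples=10):
    """Print a timestamped RSS reading every interval_s seconds."""
    for _ in range(samples):
        rss = rss_kib(Path(f"/proc/{pid}/status").read_text())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        if rss is None:
            print(stamp, "VmRSS not reported")
        else:
            print(stamp, f"{rss / 1024**2:.2f} GiB")
        time.sleep(interval_s)
```

Redirecting the output of `watch_rss(pid)` to a file gives a simple time series that shows whether usage climbs steadily or levels off.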
@Bushstar running it with the So it seems that there is a difference to result of gdb:
(I guess it cannot read the data due to missing debug symbols, but since the error is not happening in the debug build it will be hard to track down :/) The last logs before the crash don't really help either:
|
running it on 4.0.9 with |
looks like its really the |
Held up longer, but it has also crashed now, again with no symbols. What's weird are those log messages, as they do not make sense imho. They come in massively right before it just stops.
this time gdb says:
(Not sure whether the unlimited setting works under systemd, as the dump is truncated; will try setting ulimit directly.) |
@Bushstar it finally also happened in a debug build, but it seems the dump is still truncated and does not have debug symbols. Will try a completely fresh build; maybe the build process is not cleaning up correctly. This time it was a SIGSEGV again.
|
I managed to reproduce it locally with the bot in dry-run mode (so not sending any transactions, basically only reading blockcount, pool data etc.). With defid running directly in the console, it still just stops, but with the note about the kill:
core dump is 77.4GB ... bt shows:
build was done from latest master: |
yes, 4.1.0 is running without any issues here. |
Summary
I am running multiple nodes on different servers. All get heavy usage of transactions (lots of TD and EVM txs, also native swaps).
Trying to update to 4.0.8 leads to a node crash with corrupted DB within a few hours.
It all works perfectly stable on 4.0.3
Environment