Thread 'tokio-runtime-worker' panicked at 'Critical database error // Too many open files -- 2409-2 upgrade #6488

Closed
2 tasks done
infrachris opened this issue Nov 15, 2024 · 5 comments
Labels
I2-bug The node fails to follow expected behavior. I10-unconfirmed Issue might be valid, but it's not yet known.

Comments

@infrachris

infrachris commented Nov 15, 2024

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

While starting up for the first time after upgrading to 2409-2, I received the following error:
Thread 'tokio-runtime-worker' panicked at 'Critical database error: Custom { kind: Other, error: Error { message: "IO error: While open a file for random read: /home/polkadot/.local/share/polkadot/chains/polkadot/db/full/4304564.sst: Too many open files" } }', /builds/substrate/primitives/database/src/kvdb.rs:29
This is a bug. Please report it at: https://github.com/paritytech/polkadot-sdk/issues/new
polkadot.service: Main process exited, code=exited, status=1/FAILURE
polkadot.service: Failed with result 'exit-code'.
polkadot.service: Consumed 1min 23.427s CPU time.

On the start that produced this error, the node exited and the service went into a failed/stopped state. On the subsequent start, the node stayed running.

I have ulimits set to 10,000, and wait 120 seconds between stopping and starting.

Steps to reproduce

This occurred while upgrading just one of my nodes, but that node is the only Polkadot node I have (RocksDB). I will work on posting more detail, along with some of the surrounding error messages, shortly.

infrachris added the I10-unconfirmed and I2-bug labels on Nov 15, 2024
@TheCrazyStaker

I think this error ('tokio-runtime-worker' panicked) has nothing to do with upgrading to 2409-2. I got the same error one day before 2409-2 was released.

@bkchr
Member

bkchr commented Nov 15, 2024

> I have ulimits set to 10,000, and wait 120 seconds between stopping and starting.

If you increase it to 20000, does it fix it?

@infrachris
Author

> > I have ulimits set to 10,000, and wait 120 seconds between stopping and starting.

> If you increase it to 20000, does it fix it?

It hasn't been very repeatable: simply starting the service again did not show the error. I'll try 20000 if it happens again. I'm also going to dig further to see whether there's a hidden ulimit or something else going on, and then update or close this. Thank you.
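One way to look for a hidden limit is to compare the limit the running process actually has with the one configured in the shell, since a service manager can apply its own value. The following is a minimal sketch in Python, assuming a Linux host where /proc/<pid>/limits is readable. The helper names `parse_nofile_limits`, `nofile_limits_for_pid` and `open_fd_count` are made up for illustration and are not part of the node or of any Polkadot tooling.

```python
from pathlib import Path


def parse_nofile_limits(limits_text):
    """Return (soft, hard) from the 'Max open files' row of a /proc/<pid>/limits dump.

    Values are returned as strings because the kernel may print 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', <soft>, <hard>, 'files']
            return fields[3], fields[4]
    raise ValueError("no 'Max open files' row found")


def nofile_limits_for_pid(pid):
    """Read the effective open-file limits of a running process (Linux only)."""
    return parse_nofile_limits(Path(f"/proc/{pid}/limits").read_text())


def open_fd_count(pid):
    """Count the file descriptors the process currently holds (Linux only)."""
    return sum(1 for _ in Path(f"/proc/{pid}/fd").iterdir())
```

Calling `nofile_limits_for_pid(pid)` with the PID of the polkadot process would show whether the soft limit it runs under matches the 10,000 set via ulimit. Reading /proc/<pid>/fd generally requires running as the same user as the process or as root.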

@bkchr
Member

bkchr commented Nov 15, 2024

The node automatically raises the limit to the maximum that is possible without root rights. It also prints a warning if this target limit is below 10_000. However, I think this 10_000 is more or less an arbitrary number that we may need to increase now.
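The behaviour described here, raising the soft limit to the hard limit and warning when the result is still low, can be sketched as follows. This is an illustration in Python using the standard `resource` module, not the node's actual Rust code. The function names and the handling of a failed request are assumptions; only the 10_000 threshold comes from the comment.

```python
import resource

WARN_THRESHOLD = 10_000  # warning threshold quoted in the comment above


def is_below_threshold(limit, threshold=WARN_THRESHOLD):
    """True if a finite limit is under the threshold; 'unlimited' never is."""
    if limit == resource.RLIM_INFINITY:
        return False
    return limit < threshold


def raise_fd_limit():
    """Try to raise the soft open-file limit to the hard limit.

    An unprivileged process may raise its soft limit up to the hard limit,
    but not beyond it. Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms reject the request; keep the current limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = raise_fd_limit()
    if is_below_threshold(soft):
        print(f"warning: open-file limit is {soft}, below {WARN_THRESHOLD}")
```

Because the soft limit can only go as high as the hard limit, a hard limit of 10,000 would cap the process there regardless of what it requests.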

@infrachris
Author

I've changed the ulimit to 1048576. I'll update and re-open if this comes up again.


3 participants