Failure to grant database privileges in CockroachDB on startup #1304
Comments
A similar change broke the CLI GitHub Action, because CRDB was not pinned to a specific version: oxidecomputer/oxide.rs#203. While I think we can - and should - update the SQL statement, I don't think we updated the revision of CRDB being used by Omicron recently. Do you have any idea why this failure would occur, if we're still using a months-old version of cockroach?
FWIW, on Linux, I upgrade my rev of CRDB to |
The error message implies that the syntax change was handled by Cockroach without issue. It looks to me like the real problem is:
|
@bnaecker is it possible that cockroach died during this phase? You might check the stdout/stderr and the log files it writes to. Alternatively, you could look at the SMF log for CockroachDB to see if SMF restarted it due to processes exiting.
@davepacheco I noticed that re-reading the error message. It does seem like two distinct failures, since the |
@davepacheco I'm actually not sure we have Cockroach's logfiles at this point. When sled-agent noticed this failure, it tore down the zone and tried again. I think that removes any extant state of the zone, including whatever log files Cockroach was writing, right?
Reopening this because I don't think #1305 fixed the real issue here.
If it destroys the ZFS dataset that was used for the CockroachDB files, then yes, it would have destroyed the evidence we'd want for further investigation here. I'd suggest that if zone setup fails and we want to retry it (which I'm not sure is a good idea, but might be), we should at least archive the on-disk state so we can debug problems like this (which will of course happen in the field too). We may also want to save core files of any processes running in the zone or the ones of interest to us. @smklein I don't know enough about how this works today -- would that be a reasonable issue to file?
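To make the "archive before retry" suggestion concrete, here is a rough sketch of the idea in Python. It is not how sled-agent works today: `archive_zone_state`, `setup_with_retry`, the directory layout, and the attempt labels are all hypothetical, and a real version would likely snapshot the ZFS dataset rather than tar a directory.

```python
import shutil
from pathlib import Path


def archive_zone_state(state_dir: Path, archive_root: Path, label: str) -> Path:
    """Pack everything under state_dir into <archive_root>/<name>-<label>.tar.gz.

    Hypothetical helper: archive_root must live outside state_dir.
    """
    archive_root.mkdir(parents=True, exist_ok=True)
    base = archive_root / f"{state_dir.name}-{label}"
    # make_archive appends ".tar.gz" itself and returns the path it wrote.
    return Path(shutil.make_archive(str(base), "gztar", root_dir=str(state_dir)))


def setup_with_retry(setup, state_dir: Path, archive_root: Path, attempts: int = 2):
    """Call setup(); after each failure, archive the state, wipe it, and retry."""
    for attempt in range(1, attempts + 1):
        try:
            return setup()
        except Exception:
            if state_dir.exists():
                # Preserve logs and data files before they are destroyed.
                archive_zone_state(state_dir, archive_root, f"attempt{attempt}")
                shutil.rmtree(state_dir)
            if attempt == attempts:
                raise
```

The point of the ordering is that the tarball is written before `rmtree` runs, so a failed first attempt leaves something to inspect even when the second attempt succeeds.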
I saw an internal server error installing the control plane this evening. I built and installed Omicron using omicron-package, building from a local commit merging #1298 and the current main, which is at 154a4a6. From a fresh reboot I ran:
pfexec ./tools/create_virtual_hardware.sh
cargo build --bin omicron-package --release
./target/release/omicron-package package
pfexec ./target/release/omicron-package install
Looking at the sled agent logs, I saw:
The relevant bit there is the long list of database permissions. The sled agent starts up CRDB, then runs dbinit.sql to initialize the database and schema. That includes a statement granting permissions to operate on the tables in the database. That syntax appears outdated as of cockroachdb/cockroach#73065. The sled agent apparently gets a non-zero exit code from the subprocess running that SQL file, since it tears down the zone and tries again. The second attempt succeeds. I'm not sure why there would be a difference.
We should probably adopt the recommended action, which is to use ALTER DEFAULT PRIVILEGES instead.
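For illustration only, a small Python sketch of the two pieces described above: the shape of the statement change, and a subprocess wrapper that treats a non-zero exit as failure. The database name `mydb`, the user `app`, the privilege list, and the client flags are placeholders I made up, not the contents of dbinit.sql, and `run_sql_file` is a hypothetical helper rather than sled-agent code.

```python
import subprocess

# Hypothetical names: database "mydb" and user "app" are placeholders.
# A guess at the older form: table privileges granted at the database level.
OLD_GRANT = "GRANT SELECT, INSERT, UPDATE, DELETE ON DATABASE mydb TO app;"
# Suggested replacement: default privileges for tables created afterwards
# by the role that runs the statement.
NEW_GRANT = "ALTER DEFAULT PRIVILEGES GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app;"

# Assumed invocation; the flags and address depend on the deployment.
DEFAULT_CLIENT = ("cockroach", "sql", "--insecure", "--host", "localhost:26257")


def run_sql_file(sql_path, command=DEFAULT_CLIENT):
    """Feed a SQL file to the client on stdin; raise if it exits non-zero."""
    with open(sql_path, "rb") as sql:
        result = subprocess.run(list(command), stdin=sql, capture_output=True)
    if result.returncode != 0:
        # Keep stderr in the error so the cause survives any later teardown.
        stderr = result.stderr.decode(errors="replace")
        raise RuntimeError(f"SQL init failed (exit {result.returncode}): {stderr}")
    return result.stdout.decode(errors="replace")
```

Carrying the client's stderr in the raised error is deliberate: if the caller tears the zone down and retries, that message may be the only record of why the first attempt failed.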