
replace in-memory datastore with a CockroachDB-based one #57

Merged Apr 13, 2021 (78 commits)

davepacheco (Collaborator)
The in-memory datastore was always intended just for prototyping, to defer the work of figuring out the database interaction until we had confidence in the bigger architectural pieces. It's time to replace it with a real database.

Compatibility / changes in developer workflow

My goal with this change is to make it as easy to run and test Omicron on CockroachDB as it was with the in-memory store. If you run omicron_dev db-run in one terminal to start up a single-node CockroachDB cluster, then you can essentially do everything you used to do with Nexus and Sled Agent, except that the data will be stored in CockroachDB instead of in memory. If you want to wipe the database, you can just stop and start omicron_dev db-run again -- it deletes its database when it shuts down. (I built it that way to mimic the in-memory version, though it's worth considering if we'd rather have it store data in some well-known directory by default and not delete it.)

The test suite uses the same facility to spin up a whole new CockroachDB instance for each test. This may be overkill -- in particular, it would probably be faster to spin up one CockroachDB cluster and use separate databases for each test -- but it works reliably for me and doesn't take very long.

It was tempting to keep the in-memory datastore, but I think that would add a lot more ongoing work. And given how easy it is to spin up single-node CockroachDB clusters that delete their data on clean shutdown, that seems like the better way to go.

In this change

Most of the new stuff is in src/nexus/db, grouped into several files. (This ought to be documented in module-level documentation, but I have not done that yet.)

  • conversions.rs: conversions between Rust types and database types. The patterns here are documented at the top of the file.
  • datastore.rs: interfaces similar to the old in-memory datastore. These are the logical database operations (e.g., project_create()).
  • operations.rs: low-level interfaces for database operations: mostly "query", "execute", and wrappers that handle common cases like "if no rows are returned, produce an ObjectNotFound error".
  • schema.rs: the glue between datastore.rs (application-level operations like "create project") and operations.rs (which executes database queries). This describes our schema, and it's the sort of thing that could perhaps be replaced by Diesel or the like.
  • sql.rs: interfaces for generating SQL at least somewhat safely.
  • sql_operations.rs: another big piece of glue between datastore.rs and operations.rs; this generates the actual SQL queries.
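
To make the operations.rs idea concrete, here's a hedged sketch of the "no rows returned means not found" wrapper pattern. The names `ApiError`, `ObjectNotFound`, and `expect_one_row` are illustrative stand-ins, not the actual Omicron definitions:

```rust
// Hypothetical sketch of the "no rows => ObjectNotFound" wrapper
// pattern described above. ApiError and expect_one_row are stand-in
// names, not the real Omicron types.
#[derive(Debug, PartialEq)]
enum ApiError {
    ObjectNotFound { type_name: String, name: String },
}

/// Convert "zero rows returned" into a typed not-found error, so
/// callers never have to match on an empty result set themselves.
fn expect_one_row<T>(
    rows: Vec<T>,
    type_name: &str,
    name: &str,
) -> Result<T, ApiError> {
    rows.into_iter().next().ok_or_else(|| ApiError::ObjectNotFound {
        type_name: type_name.to_string(),
        name: name.to_string(),
    })
}

fn main() {
    // One or more rows: the first is returned as-is.
    assert_eq!(expect_one_row(vec![42], "project", "demo"), Ok(42));
    // No rows: surfaced as a typed ObjectNotFound error.
    assert!(expect_one_row::<i32>(vec![], "project", "demo").is_err());
}
```

The point of centralizing this is that every lookup path produces the same error shape for a missing object, instead of each call site improvising.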

Other notable changes here:

  • Updated README
  • The Nexus configuration file now has a required database.url property for configuring how to talk to CockroachDB.
  • Model changes (api_model.rs):
    • The objects in api_model.rs used to be wrapped in an Arc. The idea was that we could cache these. However, our discussion of ORMs and Diesel led me to feel that we want to discourage code paths that load entire objects most of the time, since that couples them tightly to the current schema. The pattern I'd like to pursue instead is to make more targeted updates. (See instance_update_runtime().)
    • I changed some of the numeric types in api_model.rs to promise that their values stay within i64 because that's what CockroachDB uses for integers.
    • I added a newtype for generation numbers. I also removed the unused generation number from ApiProject.
  • Updated Nexus to maintain a connection pool for the database
  • Updated the test suite to spin up databases as needed
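
As a rough illustration of the generation-number newtype, here's a minimal sketch. The real type lives in api_model.rs and may differ in detail; the invariant shown — that the value always fits in an i64, since that's what CockroachDB uses for integer columns — is the one described above:

```rust
// Hypothetical sketch of a generation-number newtype; the real
// Omicron type in api_model.rs may differ. The invariant: the value
// must always fit in an i64, CockroachDB's integer column type.
#[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
struct Generation(u64);

impl Generation {
    fn new() -> Generation {
        Generation(1)
    }

    /// Advance to the next generation, asserting that the result
    /// still fits in an i64 (which should never fail in practice).
    fn next(&self) -> Generation {
        assert!(self.0 < i64::MAX as u64);
        Generation(self.0 + 1)
    }
}

fn main() {
    let g = Generation::new();
    let g2 = g.next();
    assert!(g2 > g);
    assert_eq!(g2, Generation(2));
}
```

A newtype like this prevents accidentally mixing generation numbers with other integers, and gives one place to enforce the i64 bound.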

Unrelated stuff here:

  • I added a new bail_unless! macro. This is like anyhow's ensure, but it produces an ApiError::InternalError.
  • I refactored the provision saga to use separate functions rather than closures. I think this will be a better pattern going forward.
  • There are some small changes as a result of syncing up with a newer Dropshot.
  • I removed "boot_disk_size" from the Instance. I think this probably doesn't belong here, though we might still want it in the provision request as a shorthand for creating a disk.
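
The bail_unless! idea can be sketched roughly as follows. This is a hedged approximation, not the actual macro; `ApiError` and `check_generation` here are illustrative stand-ins:

```rust
// Hedged sketch of a bail_unless!-style macro: like anyhow's
// ensure!, but producing a hypothetical internal-error variant
// instead of a generic anyhow::Error. ApiError is a stand-in.
#[derive(Debug)]
enum ApiError {
    InternalError { message: String },
}

macro_rules! bail_unless {
    ($cond:expr, $($fmt:tt)+) => {
        if !$cond {
            return Err(ApiError::InternalError {
                message: format!($($fmt)+),
            });
        }
    };
}

// Example call site: an invariant check that should never fail
// unless there's a bug, so a violation is an internal error.
fn check_generation(old: u64, new: u64) -> Result<(), ApiError> {
    bail_unless!(new > old, "generation went backwards: {} -> {}", old, new);
    Ok(())
}

fn main() {
    assert!(check_generation(1, 2).is_ok());
    assert!(check_generation(3, 2).is_err());
}
```

The value over a plain `if`/`return` is brevity at call sites plus a consistent error variant for invariant violations.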

Future work

There's a ton of future work here.

  • Update SagaLog to use a database!
  • Service discovery and connection pooling: I used bb8 here because it was easy. Based on my survey, I think we'll probably want to build our own cueball-like library.
  • The type conversions could probably be implemented with a proc macro.
  • There's not that much automated testing for the new stuff. (The basics are covered by the existing integration tests.)
  • We need better logging of queries.
  • We could use cockroach cert to generate real TLS certificates rather than running the local cluster in insecure mode.
  • The functions in sql_operations.rs may make more sense inside the Table or LookupKey traits, or in other traits attached to types that impl these.
  • If we do that, we may want to remove datastore.rs entirely and instead expect sagas to interface directly with the lower-level database code. Most of what's in datastore.rs is just a thin wrapper over that code, and it may make more sense to wrap it with saga actions.

There's a lot more as well.
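
On the "type conversions via proc macro" item: the boilerplate a macro would generate looks roughly like this manual version. `ByteCount` and `OutOfRange` are illustrative names, not the real Omicron types; the pattern is a domain newtype whose values must round-trip through an i64 column:

```rust
// Hedged sketch of the manual conversion boilerplate a proc macro
// could generate. ByteCount and OutOfRange are illustrative names;
// the real api_model.rs types may differ.
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
struct ByteCount(u64);

#[derive(Debug, PartialEq)]
struct OutOfRange;

// Database -> domain: reject values that can't be a byte count.
impl TryFrom<i64> for ByteCount {
    type Error = OutOfRange;
    fn try_from(value: i64) -> Result<ByteCount, OutOfRange> {
        u64::try_from(value).map(ByteCount).map_err(|_| OutOfRange)
    }
}

// Domain -> database: safe for values constructed via TryFrom<i64>,
// since those are guaranteed to fit back into an i64.
impl From<ByteCount> for i64 {
    fn from(b: ByteCount) -> i64 {
        b.0 as i64
    }
}

fn main() {
    assert_eq!(ByteCount::try_from(1024), Ok(ByteCount(1024)));
    assert_eq!(ByteCount::try_from(-1), Err(OutOfRange));
}
```

Writing one such pair per type is tedious but mechanical, which is exactly what makes it a good proc-macro candidate.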
