
replace in-memory datastore with a CockroachDB-based one #57

Merged Apr 13, 2021 (78 commits)

davepacheco (Collaborator)
The in-memory datastore was always intended just for prototyping, to defer the work of figuring out the database interaction until we had confidence in the bigger architectural pieces. It's time to replace it with a real database.

Compatibility / changes in developer workflow

My goal with this change is to make it as easy to run and test Omicron on CockroachDB as it was with the in-memory store. If you run omicron_dev db-run in one terminal to start up a single-node CockroachDB cluster, then you can essentially do everything you used to do with Nexus and Sled Agent, except that the data will be stored in CockroachDB instead of in memory. If you want to wipe the database, you can just stop and start omicron_dev db-run again -- it deletes its database when it shuts down. (I built it that way to mimic the in-memory version, though it's worth considering if we'd rather have it store data in some well-known directory by default and not delete it.)

The test suite uses the same facility to spin up a whole new CockroachDB instance for each test. This may be overkill -- in particular, it would probably be faster to spin up one CockroachDB cluster and use separate databases for each test -- but it works reliably for me and doesn't take very long.

It was tempting to keep the in-memory datastore, but I think that would add a lot more ongoing work. And given how easy it is to spin up single-node CockroachDB clusters that delete their data on clean shutdown, that seems like the better way to go.

In this change

Most of the new stuff is in src/nexus/db, grouped into several files. (This ought to be documented in module-level documentation, but I have not done that yet.)

  • conversions.rs: conversions between Rust types and database types. The patterns here are documented at the top of the file.
  • datastore.rs: interfaces similar to the old in-memory datastore. These are the logical database operations (e.g., project_create()).
  • operations.rs: low-level interfaces for database operations: mostly "query", "execute", and wrappers that handle common cases like "if no rows are returned, produce an ObjectNotFound error".
  • schema.rs: the glue between datastore.rs (application-level operations like "create project") and operations.rs (which executes database queries). This describes our schema, and it's the sort of thing that could perhaps be replaced by Diesel or the like.
  • sql.rs: interfaces for generating SQL at least somewhat safely.
  • sql_operations.rs: another big piece of glue between datastore.rs and operations.rs; this generates the actual SQL queries.
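
To make the operations.rs idea concrete, here's a hedged sketch of the "no rows returned means not found" wrapper pattern. The names `ApiError`, `ObjectNotFound`, and `expect_one_row` are illustrative stand-ins, not the actual Omicron definitions:

```rust
// Hypothetical sketch of the "no rows => ObjectNotFound" wrapper
// pattern described above. ApiError and expect_one_row are stand-in
// names, not the real Omicron types.
#[derive(Debug, PartialEq)]
enum ApiError {
    ObjectNotFound { type_name: String, name: String },
}

/// Convert "zero rows returned" into a typed not-found error, so
/// callers never have to match on an empty result set themselves.
fn expect_one_row<T>(
    rows: Vec<T>,
    type_name: &str,
    name: &str,
) -> Result<T, ApiError> {
    rows.into_iter().next().ok_or_else(|| ApiError::ObjectNotFound {
        type_name: type_name.to_string(),
        name: name.to_string(),
    })
}

fn main() {
    // One or more rows: the first is returned as-is.
    assert_eq!(expect_one_row(vec![42], "project", "demo"), Ok(42));
    // No rows: surfaced as a typed ObjectNotFound error.
    assert!(expect_one_row::<i32>(vec![], "project", "demo").is_err());
}
```

The point of centralizing this is that every lookup path produces the same error shape for a missing object, instead of each call site improvising.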

Other notable changes here:

  • Updated README
  • The Nexus configuration file now has a required database.url property for configuring how to talk to CockroachDB.
  • Model changes (api_model.rs):
    • The objects in api_model.rs used to be wrapped in an Arc. The idea was that we could cache these. However, our discussion of ORMs and Diesel led me to feel that we want to discourage code paths that load entire objects most of the time, since that couples them tightly to the current schema. The pattern I'd like to pursue instead is to make more targeted updates. (See instance_update_runtime().)
    • I changed some of the numeric types in api_model.rs to promise that their values stay within i64 because that's what CockroachDB uses for integers.
    • I added a newtype for generation numbers. I also removed the unused generation number from ApiProject.
  • Updated Nexus to maintain a connection pool for the database
  • Updated the test suite to spin up databases as needed
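
As a rough illustration of the generation-number newtype, here's a minimal sketch. The real type lives in api_model.rs and may differ in detail; the invariant shown — that the value always fits in an i64, since that's what CockroachDB uses for integer columns — is the one described above:

```rust
// Hypothetical sketch of a generation-number newtype; the real
// Omicron type in api_model.rs may differ. The invariant: the value
// must always fit in an i64, CockroachDB's integer column type.
#[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
struct Generation(u64);

impl Generation {
    fn new() -> Generation {
        Generation(1)
    }

    /// Advance to the next generation, asserting that the result
    /// still fits in an i64 (which should never fail in practice).
    fn next(&self) -> Generation {
        assert!(self.0 < i64::MAX as u64);
        Generation(self.0 + 1)
    }
}

fn main() {
    let g = Generation::new();
    let g2 = g.next();
    assert!(g2 > g);
    assert_eq!(g2, Generation(2));
}
```

A newtype like this prevents accidentally mixing generation numbers with other integers, and gives one place to enforce the i64 bound.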

Unrelated stuff here:

  • I added a new bail_unless! macro. This is like anyhow's ensure, but it produces an ApiError::InternalError.
  • I refactored the provision saga to use separate functions rather than closures. I think this will be a better pattern going forward.
  • There are some small changes as a result of syncing up with a newer Dropshot.
  • I removed "boot_disk_size" from the Instance. I think this probably doesn't belong here, though we might still want it in the provision request as a shorthand for creating a disk.
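
The bail_unless! idea can be sketched roughly as follows. This is a hedged approximation, not the actual macro; `ApiError` and `check_generation` here are illustrative stand-ins:

```rust
// Hedged sketch of a bail_unless!-style macro: like anyhow's
// ensure!, but producing a hypothetical internal-error variant
// instead of a generic anyhow::Error. ApiError is a stand-in.
#[derive(Debug)]
enum ApiError {
    InternalError { message: String },
}

macro_rules! bail_unless {
    ($cond:expr, $($fmt:tt)+) => {
        if !$cond {
            return Err(ApiError::InternalError {
                message: format!($($fmt)+),
            });
        }
    };
}

// Example call site: an invariant check that should never fail
// unless there's a bug, so a violation is an internal error.
fn check_generation(old: u64, new: u64) -> Result<(), ApiError> {
    bail_unless!(new > old, "generation went backwards: {} -> {}", old, new);
    Ok(())
}

fn main() {
    assert!(check_generation(1, 2).is_ok());
    assert!(check_generation(3, 2).is_err());
}
```

The value over a plain `if`/`return` is brevity at call sites plus a consistent error variant for invariant violations.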

Future work

There's a ton of future work here.

  • Update SagaLog to use a database!
  • Service discovery and connection pooling: I used bb8 here because it was easy. Based on my survey, I think we'll probably want to build our own cueball-like library.
  • The type conversions could probably be implemented with a proc macro.
  • There's not that much automated testing for the new stuff. (The basics are covered by the existing integration tests.)
  • We need better logging of queries.
  • We could use cockroach cert to generate real TLS certificates rather than running the local cluster in insecure mode.
  • The functions in sql_operations.rs may make more sense inside the Table or LookupKey traits, or in other traits attached to types that impl these.
  • If we do that, we may want to remove datastore.rs entirely and instead expect sagas to interface directly with the lower-level database code. Most of what's in datastore.rs is just a thin wrapper over that code, and it may make more sense to wrap it with saga actions.

There's a lot more as well.
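
On the "type conversions via proc macro" item: the boilerplate a macro would generate looks roughly like this manual version. `ByteCount` and `OutOfRange` are illustrative names, not the real Omicron types; the pattern is a domain newtype whose values must round-trip through an i64 column:

```rust
// Hedged sketch of the manual conversion boilerplate a proc macro
// could generate. ByteCount and OutOfRange are illustrative names;
// the real api_model.rs types may differ.
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
struct ByteCount(u64);

#[derive(Debug, PartialEq)]
struct OutOfRange;

// Database -> domain: reject values that can't be a byte count.
impl TryFrom<i64> for ByteCount {
    type Error = OutOfRange;
    fn try_from(value: i64) -> Result<ByteCount, OutOfRange> {
        u64::try_from(value).map(ByteCount).map_err(|_| OutOfRange)
    }
}

// Domain -> database: safe for values constructed via TryFrom<i64>,
// since those are guaranteed to fit back into an i64.
impl From<ByteCount> for i64 {
    fn from(b: ByteCount) -> i64 {
        b.0 as i64
    }
}

fn main() {
    assert_eq!(ByteCount::try_from(1024), Ok(ByteCount(1024)));
    assert_eq!(ByteCount::try_from(-1), Err(OutOfRange));
}
```

Writing one such pair per type is tedious but mechanical, which is exactly what makes it a good proc-macro candidate.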
