Skip to content

Commit

Permalink
adapter: Expression cache design doc
Browse files Browse the repository at this point in the history
This commit adds a design doc for an optimized expression cache.

Works towards resolving #MaterializeInc/database-issues/issues/8384
  • Loading branch information
jkosh44 committed Oct 8, 2024
1 parent 92ff51b commit 329766e
Showing 1 changed file with 135 additions and 0 deletions.
135 changes: 135 additions & 0 deletions doc/developer/design/2024_10_08_expression_cache.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
# Expression Cache

## The Problem

Optimization is a slow process that is on the critical path for `environmentd` startup times. Some
environments spend 30 seconds just on optimization. All optimization time spent in startup is
experienced downtime for users when `environmentd` restarts.

## Success Criteria

Startup spends less than 1 second optimizing expressions.

## Solution Proposal

The solution being proposed in this document is a cache of optimized expressions. During startup,
`environmentd` will first look in the cache for optimized expressions and only compute a new
expression if it isn't present in the cache. If enough expressions are cached and the cache is fast
enough, then the time spent on this part of startup should be small.

The cache will present similarly as a key-value value store where the key is a composite of

- deployment generation
- object global ID
- expression type (local MIR, global MIR, LIR, etc)

The value will be a serialized version of the optimized expression. The cache will also be made
durable so that it's available after a restart, at least within the same deployment generation.

Upgrading an environment will look something like this:

1. Start deploy generation `n` in read-only mode.
2. Populate the expression cache for generation `n`.
3. Start deploy generation `n` in read-write mode.
4. Read optimized expressions from cache.

Restarting an environment will look something like this:

1. Start deploy generation `n` in read-write mode.
2. Read optimized expressions from cache.

### Cache API

Below is the API that the cache will present. It may be further wrapped with typed methods that
take care of serializing and deserializing bytes. Additionally, we probably don't need a trait when
implementing.

```Rust
trait ExpressionCache {
/// Returns the `expression_type` of `global_id` that is currently deployed in a cluster. This
/// will not change in-between restarts as result of DDL, as long as `global_id` exists.
fn get_deployed_expression(&self, global_id: GlobalId, expression_type: ExpressionType) -> Option<Bytes>;

/// Returns the `expression_type` of `global_id` based on the current catalog contents of
/// `deploy_generation`. This may change in-between restarts as result of DDL.
fn get_durable_expression(&self, deploy_generation: u64, global_id: GlobalId, expression_type: ExpressionType) -> Option<Bytes>;

/// Durably inserts `expression`, with key `(deploy_generation, global_id, expression_type)`.
///
/// Panics if `(deploy_generation, global_id, expression_type)` already exists.
fn insert_expression(&mut self, deploy_generation: u64, global_id: GlobalId, expression_type: ExpressionType, expression: Bytes);

/// Durably remove and return all entries in `deploy_generation` that depend on an ID in
/// `dropped_ids`.
fn invalidate_entries(&mut self, deploy_generation: u64, dropped_ids: BTreeSet<GlobalId>) -> Vec<(GlobalId, ExpressionType)>;

/// Durably removes all entries in `deploy_generation`.
fn remove_deploy_generation(&mut self, deploy_generation: u64);

/// Remove all entries that depend on a global ID that is not present in `txn`.
fn reconcile(&mut self, txn: mz_catalog::durable::Transaction);
}
```

### Startup

Below is a detailed set of steps that will happen in startup.

1. Call `ExpressionCache::reconcile` to remove any invalid entries.
2. While opening the catalog, for each object:
a. If the object is present in the cache, read the cached optimized expression via
`ExpressionCache::get_durable_expression`.
b. Else generate the optimized expressions and insert the expressions via
`ExpressionCache::insert_expression`.
3. If in read-write mode, call `ExpressionCache::remove_deploy_generation` to remove the previous
deploy generation.

### DDL - Create

1. Execute catalog transaction.
2. Update cache via `ExpressionCache::insert_expression`.

### DDL - Drop
1. Execute catalog transaction.
2. Invalidate cache entries via `ExpressionCache::invalidate_entries`.
3. Re-compute and repopulate cache entries that depended on dropped entries via
`ExpressionCache::insert_expression`.

### File System Implementation

One potential implementation is via the filesystem of an attached durable storage to environmentd.
Each cache entry would be saved as a file of the format
`/path/to/cache/<deploy_generation>/<global_id>/<expression_type>`.

#### Pros
- No need to worry about coordination across K8s pods.
- Bulk deletion is a simple directory delete.

#### Cons
- Need to worry about Flushing/fsync.
- Need to worry about concurrency.
- Need to worry about atomicity.
- Need to worry about mocking things in memory for tests.
- If we lose the pod, then we also lose the cache.

### Persist implementation

Another potential implementation is via persist. Each cache entry would be keyed by
`(deploy_generation, global_id, expression_type)` and the value would be a serialized version of the
expression.

#### Pros
- Flushing, concurrency, atomicity, mocking are already implemented by persist.

#### Cons
- We need to worry about coordinating access across multiple pods. It's expected that during
upgrades at least two `environmentd`s will be communicating with the cache.

## Open questions

- Which implementation should we use?
- If we use the persist implementation, how do we coordinate writes across pods?
- I haven't thought much about this, but here's one idea. The cache will maintain a subscribe on
the persist shard. Everytime it experiences an upper mismatch, it will listen for all new
changes. If any of the changes contain the current deploy generation, then panic, else ignore
them.

0 comments on commit 329766e

Please sign in to comment.