adapter: Implement expression cache #30122

jkosh44 · 2024-10-21T18:54:37Z

This commit adds a module for an expression cache. It adds the
functionality to durably cache optimized expressions. It uses the
durable_cache module in its implementation.

The expression cache is run on its own task, so that callers can insert
new entries without blocking on the insert completing.

The adapter doesn't use the expression cache yet, that is saved for a
later commit.

Works towards resolving #MaterializeInc/database-issues/issues/8384

Motivation

This PR adds a known-desirable feature.

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

def- · 2024-10-23T17:17:52Z

I'm guessing this should not cause any behavior changes, so no new tests. Could we add a feature benchmark scenario that shows something speeding up from the new expression cache?

jkosh44 · 2024-10-24T13:18:39Z

I'm guessing this should not cause any behavior changes, so no new tests.

I'm working on writing unit tests, but yeah no new integration tests are needed. In the next PR when we hook everything up we may want to add some targeted integration tests.

Could we add a feature benchmark scenario that shows something speeding up from the new expression cache?

That will also have to wait until the next PR when this actually gets hooked up.

jkosh44 · 2024-10-24T17:27:04Z

src/durable-cache/src/lib.rs

@danhhz FYI, I added logs here. No need to review the rest of the PR.

Cool, this at least lets us turn them on if we suspect something is going on. We could also (potentially later) do something more sophisticated like persist's machine::snapshot which switches to info after some number of retries and/or add metrics, but I'll leave that to your discretion.

Btw, not changed in this PR but I noticed while I was looking, would you mind replacing the Diagnostics::from_purpose call with constructing the Diagnostics directly so we can set the name field plz? That's what ultimately powers e.g. the "catalog" shard dropdown in the persist dash and from_purpose sets it to "unknown"

I've updated the Diagnostics, I'll probably hold off on more sophisticated logging.

This commit adds a module for an expression cache. It adds the functionality to durably cache optimized expressions. It uses the durable_cache module in its implementation. The expression cache is run on its own task, so that callers can insert new entries without blocking on the insert completing. The adapter doesn't use the expression cache yet, that is saved for a later commit. Works towards resolving #MaterializeInc/database-issues/issues/8384

shepherdlybot · 2024-10-24T18:30:29Z

Mitigations

Completing required mitigations increases Resilience Coverage.

(Required) Code Review 🔍 Detected
(Required) Feature Flag
(Required) Integration Test
(Required) Observability 🔍 Detected
(Required) QA Review 🔍 Detected
(Required) Run Nightly Tests
Unit Test 🔍 Detected

Risk Summary:

The pull request carries a high risk score of 80, primarily driven by the predictors "Sum Bug Reports Of Files" and "Delta of Executable Lines." Historically, PRs with these predictors are 117% more likely to cause a bug than the repository baseline. Additionally, the repository's predicted bug trend is increasing.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:
What's This?

File	Percentile
../durable/persist.rs	95

aljoscha

As far as I can tell, this does what it's supposed to do! And the tests look good. I'm very curious to see how it shakes out when actually used.

aljoscha · 2024-10-25T13:52:58Z

src/catalog/src/expr_cache.rs

+}
+
+impl ExpressionCache {
+    /// Creates a new [`ExpressionCache`] and reconciles all entries in current deploy generation


Could we describe what "reconciles" means here, in the docstring? I now know that it means "remove all entries from cache that are not in current_ids.

aljoscha · 2024-10-25T13:56:06Z

src/catalog/src/expr_cache.rs

+    }
+}
+
+impl Arbitrary for Expressions {


This is for testing, but for reasons ™️ it can't be in the tests module, right?

Actually I just never thought about it, I moved it into the tests module just fine.

aljoscha · 2024-10-25T13:58:30Z

src/repr/src/global_id.rs

@@ -84,6 +84,9 @@ impl FromStr for GlobalId {
        if s.len() < 2 {
            return Err(anyhow!("couldn't parse id {}", s));
        }
+        if s == "Explained Query" {


why's this one needed?

The tests randomly generate GlobalIds. If it happens to generate a GlobalId::Explain, then it wouldn't be able to round-trip through persist without this.

ahh, the Display <-> FromStr round-trip didn't work before this ... I see! 🙈

danhhz · 2024-10-25T15:19:20Z

src/durable-cache/src/lib.rs

Cool, this at least lets us turn them on if we suspect something is going on. We could also (potentially later) do something more sophisticated like persist's machine::snapshot which switches to info after some number of retries and/or add metrics, but I'll leave that to your discretion.

Btw, not changed in this PR but I noticed while I was looking, would you mind replacing the Diagnostics::from_purpose call with constructing the Diagnostics directly so we can set the name field plz? That's what ultimately powers e.g. the "catalog" shard dropdown in the persist dash and from_purpose sets it to "unknown"

danhhz · 2024-10-25T15:24:53Z

src/catalog/src/expr_cache.rs

+    ) {
+        (
+            RelationDesc::builder()
+                .with_column("data", ScalarType::Jsonb.nullable(false))


I've been thinking since we initially talked about all this and have slightly warmed up to two fields instead: "key" and "val", both jsonb. It would be harder to change away from the key/val structure, but I don't want to overindex on my regrets for txn-wal and how hard it was to add in the schema later. To be clear, I'm on the fence and either is probably fine but...

The advantage to two fields:

More semantic

Perhaps better persist pushdown stats, they're recursively computed and trimmed, but ultimately stored per-field

Maybe something something future projection pushdown handwave

As in something like

( RelationDesc::builder() .with_column("key", ScalarType::Jsonb.nullable(false)) .finish(), RelationDesc::builder() .with_column("val", ScalarType::Jsonb.nullable(false)) .finish(), )

?

Another advantage of this approach that I'm just realizing is that we can more easily deserialize the key and value separately in case of backwards compatibility issues.

Yeah, exactly

I'm going to merge as is, but I might have to change this when I try and implement backwards compatibility in the next PR.

The more I think about it, the more I think you might want separate fields for persist stats reasons. So it might be a good idea to switch before we roll this out too widely. BUT, unlike txn-wal, we could always easily burn const EXPRESSION_CACHE_SEED: usize = 4; so maybe it really is fine to punt until/if it becomes a problem

(But yeah, definitely feel free to merge as-is)

jkosh44 · 2024-10-25T17:01:35Z

I'm just now realizing that modifiying any of the optimized expression sturcts will break the cache because it will fail to deserailize, which is exactly what we wanted to avoid.

We'll need to add some functionality on working with undecoded values in the cache if we want this to work properly.

I'm still going to merge this, but I'll work on fixing the issue in a follow up PR.

jkosh44 · 2024-10-25T17:12:51Z

src/catalog/src/expr_cache.rs

+    proptest! {
+        #![proptest_config(ProptestConfig::with_cases(32))]
+
+        #[mz_ore::test]
+        #[cfg_attr(miri, ignore)]
+        fn expr_cache_roundtrip((key, val) in any::<(CacheKey, Expressions)>()) {
+            let (encoded_key, encoded_val) = ExpressionCodec::encode(&key, &val);
+            let (decoded_key, decoded_val) = ExpressionCodec::decode(encoded_key, encoded_val);
+
+            assert_eq!(key, decoded_key);
+            assert_eq!(val, decoded_val);
+        }
+    }


FYI, this test with just 32 cases is pretty slow. It's possible that Arbitrary creates completely unrealistically huge expressions, but I am worried about the serialization perf in prod.

jkosh44 force-pushed the expression_cache branch 4 times, most recently from 7cb23b1 to 8911b1c Compare October 22, 2024 20:16

jkosh44 force-pushed the expression_cache branch 2 times, most recently from 5230b91 to d5ba35d Compare October 23, 2024 20:55

jkosh44 force-pushed the expression_cache branch from d5ba35d to 36cfe91 Compare October 24, 2024 17:19

jkosh44 changed the title ~~WIP: Expression cache~~ adapter: Implement expression cache Oct 24, 2024

jkosh44 force-pushed the expression_cache branch from 36cfe91 to 0ea1fe0 Compare October 24, 2024 17:25

jkosh44 commented Oct 24, 2024

View reviewed changes

jkosh44 force-pushed the expression_cache branch from 0ea1fe0 to 5a7d00d Compare October 24, 2024 17:35

jkosh44 force-pushed the expression_cache branch from 5a7d00d to 9c4c722 Compare October 24, 2024 17:45

jkosh44 marked this pull request as ready for review October 24, 2024 18:30

jkosh44 requested review from a team and benesch as code owners October 24, 2024 18:30

jkosh44 requested review from ParkMyCar and aljoscha October 24, 2024 18:30

jkosh44 added 2 commits October 24, 2024 16:06

Move module to mz_catalog crate

4269ce1

Fixup bazel

c8a322b

aljoscha approved these changes Oct 25, 2024

View reviewed changes

danhhz reviewed Oct 25, 2024

View reviewed changes

Fixup

d949513

benesch removed their request for review October 25, 2024 17:08

jkosh44 commented Oct 25, 2024

View reviewed changes

jkosh44 enabled auto-merge (squash) October 25, 2024 17:14

jkosh44 disabled auto-merge October 25, 2024 17:39

jkosh44 merged commit 506d2b7 into MaterializeInc:main Oct 25, 2024
80 checks passed

jkosh44 deleted the expression_cache branch October 25, 2024 17:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

adapter: Implement expression cache #30122

adapter: Implement expression cache #30122

jkosh44 commented Oct 21, 2024 •

edited

Loading

def- commented Oct 23, 2024

jkosh44 commented Oct 24, 2024

jkosh44 Oct 24, 2024

danhhz Oct 25, 2024

jkosh44 Oct 25, 2024 •

edited

Loading

shepherdlybot bot commented Oct 24, 2024 •

edited

Loading

aljoscha left a comment

aljoscha Oct 25, 2024

jkosh44 Oct 25, 2024

aljoscha Oct 25, 2024

jkosh44 Oct 25, 2024

aljoscha Oct 25, 2024

jkosh44 Oct 25, 2024

aljoscha Oct 25, 2024

danhhz Oct 25, 2024

danhhz Oct 25, 2024

jkosh44 Oct 25, 2024

danhhz Oct 25, 2024

jkosh44 Oct 25, 2024

danhhz Oct 25, 2024 •

edited

Loading

jkosh44 commented Oct 25, 2024 •

edited

Loading

jkosh44 Oct 25, 2024

adapter: Implement expression cache #30122

adapter: Implement expression cache #30122

Conversation

jkosh44 commented Oct 21, 2024 • edited Loading

Motivation

Checklist

def- commented Oct 23, 2024

jkosh44 commented Oct 24, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jkosh44 Oct 25, 2024 • edited Loading

Choose a reason for hiding this comment

shepherdlybot bot commented Oct 24, 2024 • edited Loading

Mitigations

aljoscha left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danhhz Oct 25, 2024 • edited Loading

Choose a reason for hiding this comment

jkosh44 commented Oct 25, 2024 • edited Loading

Choose a reason for hiding this comment

jkosh44 commented Oct 21, 2024 •

edited

Loading

jkosh44 Oct 25, 2024 •

edited

Loading

shepherdlybot bot commented Oct 24, 2024 •

edited

Loading

danhhz Oct 25, 2024 •

edited

Loading

jkosh44 commented Oct 25, 2024 •

edited

Loading