Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adapter: Implement expression cache #30122

Merged
merged 4 commits into from
Oct 25, 2024

Conversation

jkosh44
Copy link
Contributor

@jkosh44 jkosh44 commented Oct 21, 2024

This commit adds a module for an expression cache. It adds the
functionality to durably cache optimized expressions. It uses the
durable_cache module in its implementation.

The expression cache is run on its own task, so that callers can insert
new entries without blocking on the insert completing.

The adapter doesn't use the expression cache yet, that is saved for a
later commit.

Works towards resolving #MaterializeInc/database-issues/issues/8384

Motivation

This PR adds a known-desirable feature.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@jkosh44 jkosh44 force-pushed the expression_cache branch 4 times, most recently from 7cb23b1 to 8911b1c Compare October 22, 2024 20:16
@def-
Copy link
Contributor

def- commented Oct 23, 2024

I'm guessing this should not cause any behavior changes, so no new tests. Could we add a feature benchmark scenario that shows something speeding up from the new expression cache?

@jkosh44 jkosh44 force-pushed the expression_cache branch 2 times, most recently from 5230b91 to d5ba35d Compare October 23, 2024 20:55
@jkosh44
Copy link
Contributor Author

jkosh44 commented Oct 24, 2024

I'm guessing this should not cause any behavior changes, so no new tests.

I'm working on writing unit tests, but yeah no new integration tests are needed. In the next PR when we hook everything up we may want to add some targeted integration tests.

Could we add a feature benchmark scenario that shows something speeding up from the new expression cache?

That will also have to wait until the next PR when this actually gets hooked up.

@jkosh44 jkosh44 changed the title WIP: Expression cache adapter: Implement expression cache Oct 24, 2024
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danhhz FYI, I added logs here. No need to review the rest of the PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, this at least lets us turn them on if we suspect something is going on. We could also (potentially later) do something more sophisticated like persist's machine::snapshot which switches to info after some number of retries and/or add metrics, but I'll leave that to your discretion.

Btw, not changed in this PR but I noticed while I was looking, would you mind replacing the Diagnostics::from_purpose call with constructing the Diagnostics directly so we can set the name field plz? That's what ultimately powers e.g. the "catalog" shard dropdown in the persist dash and from_purpose sets it to "unknown"

Copy link
Contributor Author

@jkosh44 jkosh44 Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the Diagnostics, I'll probably hold off on more sophisticated logging.

This commit adds a module for an expression cache. It adds the
functionality to durably cache optimized expressions. It uses the
durable_cache module in its implementation.

The expression cache is run on its own task, so that callers can insert
new entries without blocking on the insert completing.

The adapter doesn't use the expression cache yet, that is saved for a
later commit.

Works towards resolving #MaterializeInc/database-issues/issues/8384
@jkosh44 jkosh44 marked this pull request as ready for review October 24, 2024 18:30
@jkosh44 jkosh44 requested review from a team and benesch as code owners October 24, 2024 18:30
@jkosh44 jkosh44 requested review from ParkMyCar and aljoscha October 24, 2024 18:30
Copy link

shepherdlybot bot commented Oct 24, 2024

Risk Score:80 / 100 Bug Hotspots:1 Resilience Coverage:50%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test
  • (Required) Observability 🔍 Detected
  • (Required) QA Review 🔍 Detected
  • (Required) Run Nightly Tests
  • Unit Test 🔍 Detected
Risk Summary:

The pull request carries a high risk score of 80, primarily driven by the predictors "Sum Bug Reports Of Files" and "Delta of Executable Lines." Historically, PRs with these predictors are 117% more likely to cause a bug than the repository baseline. Additionally, the repository's predicted bug trend is increasing.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:
What's This?

File Percentile
../durable/persist.rs 95

Copy link
Contributor

@aljoscha aljoscha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, this does what it's supposed to do! And the tests look good. I'm very curious to see how it shakes out when actually used.

}

impl ExpressionCache {
/// Creates a new [`ExpressionCache`] and reconciles all entries in current deploy generation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we describe what "reconciles" means here, in the docstring? I now know that it means "remove all entries from cache that are not in current_ids.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}

impl Arbitrary for Expressions {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for testing, but for reasons ™️ it can't be in the tests module, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I just never thought about it, I moved it into the tests module just fine.

@@ -84,6 +84,9 @@ impl FromStr for GlobalId {
if s.len() < 2 {
return Err(anyhow!("couldn't parse id {}", s));
}
if s == "Explained Query" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why's this one needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests randomly generate GlobalIds. If it happens to generate a GlobalId::Explain, then it wouldn't be able to round-trip through persist without this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, the Display <-> FromStr round-trip didn't work before this ... I see! 🙈

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, this at least lets us turn them on if we suspect something is going on. We could also (potentially later) do something more sophisticated like persist's machine::snapshot which switches to info after some number of retries and/or add metrics, but I'll leave that to your discretion.

Btw, not changed in this PR but I noticed while I was looking, would you mind replacing the Diagnostics::from_purpose call with constructing the Diagnostics directly so we can set the name field plz? That's what ultimately powers e.g. the "catalog" shard dropdown in the persist dash and from_purpose sets it to "unknown"

) {
(
RelationDesc::builder()
.with_column("data", ScalarType::Jsonb.nullable(false))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking since we initially talked about all this and have slightly warmed up to two fields instead: "key" and "val", both jsonb. It would be harder to change away from the key/val structure, but I don't want to overindex on my regrets for txn-wal and how hard it was to add in the schema later. To be clear, I'm on the fence and either is probably fine but...

The advantage to two fields:

  • More semantic
  • Perhaps better persist pushdown stats, they're recursively computed and trimmed, but ultimately stored per-field
  • Maybe something something future projection pushdown handwave

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in something like

        (
            RelationDesc::builder()
                .with_column("key", ScalarType::Jsonb.nullable(false))
                .finish(),
            RelationDesc::builder()
                .with_column("val", ScalarType::Jsonb.nullable(false))
                .finish(),
        )

?

Another advantage of this approach that I'm just realizing is that we can more easily deserialize the key and value separately in case of backwards compatibility issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, exactly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to merge as is, but I might have to change this when I try and implement backwards compatibility in the next PR.

Copy link
Contributor

@danhhz danhhz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I think about it, the more I think you might want separate fields for persist stats reasons. So it might be a good idea to switch before we roll this out too widely. BUT, unlike txn-wal, we could always easily burn const EXPRESSION_CACHE_SEED: usize = 4; so maybe it really is fine to punt until/if it becomes a problem

(But yeah, definitely feel free to merge as-is)

@jkosh44
Copy link
Contributor Author

jkosh44 commented Oct 25, 2024

I'm just now realizing that modifiying any of the optimized expression sturcts will break the cache because it will fail to deserailize, which is exactly what we wanted to avoid.

We'll need to add some functionality on working with undecoded values in the cache if we want this to work properly.

I'm still going to merge this, but I'll work on fixing the issue in a follow up PR.

@benesch benesch removed their request for review October 25, 2024 17:08
Comment on lines +581 to +593
proptest! {
#![proptest_config(ProptestConfig::with_cases(32))]

#[mz_ore::test]
#[cfg_attr(miri, ignore)]
fn expr_cache_roundtrip((key, val) in any::<(CacheKey, Expressions)>()) {
let (encoded_key, encoded_val) = ExpressionCodec::encode(&key, &val);
let (decoded_key, decoded_val) = ExpressionCodec::decode(encoded_key, encoded_val);

assert_eq!(key, decoded_key);
assert_eq!(val, decoded_val);
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, this test with just 32 cases is pretty slow. It's possible that Arbitrary creates completely unrealistically huge expressions, but I am worried about the serialization perf in prod.

@jkosh44 jkosh44 enabled auto-merge (squash) October 25, 2024 17:14
@jkosh44 jkosh44 disabled auto-merge October 25, 2024 17:39
@jkosh44 jkosh44 merged commit 506d2b7 into MaterializeInc:main Oct 25, 2024
80 checks passed
@jkosh44 jkosh44 deleted the expression_cache branch October 25, 2024 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants