Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

source/pg: proactively validate schemas #30807

Merged

Conversation

petrosagg
Copy link
Contributor

The PostgreSQL logical replication protocol only notifies us that a table has changed schema if that table participates in a transaction after the schema change has occured.

This is problematic because until that table participates in a transaction Materialize will keep advancing the upper frontier of the table, asserting that everything is fine and potentially revealing invalid data.

This PR puts an upper bound on this window of invalidity by proactively re-validating the schema of all tables in an ingestion in a cadence. The interval is set to 15s by default but configurable through LD.

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@petrosagg petrosagg requested a review from a team as a code owner December 12, 2024 16:09
@petrosagg petrosagg force-pushed the pg-proactive-schema-validation branch 2 times, most recently from 549d9bd to 621bcb1 Compare December 13, 2024 17:19
Copy link
Contributor

@teskje teskje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 513 to 514
// If the publication is missing there is nothing else to
// do. These errors are not retractable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest dropping one of these two comments, since they seem redundant.

The PostgreSQL logical replication protocol only notifies us that a
table has changed schema if that table participates in a transaction
*after* the schema change has occured.

This is problematic because until that table participates in a
transaction Materialize will keep advancing the upper frontier of the
table, asserting that everything is fine and potentially revealing
invalid data.

This PR puts an upper bound on this window of invalidity by proactively
re-validating the schema of all tables in an ingestion in a cadence. The
interval is set to 15s by default but configurable through LD.

Signed-off-by: Petros Angelatos <[email protected]>
@petrosagg petrosagg force-pushed the pg-proactive-schema-validation branch from 621bcb1 to 9799a0a Compare January 9, 2025 09:17
@petrosagg petrosagg enabled auto-merge January 9, 2025 09:17
@petrosagg petrosagg merged commit 84fff69 into MaterializeInc:main Jan 9, 2025
80 checks passed
@petrosagg petrosagg deleted the pg-proactive-schema-validation branch January 9, 2025 10:15
def- added a commit to def-/materialize that referenced this pull request Jan 10, 2025
def- added a commit to def-/materialize that referenced this pull request Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants