Allow restricting table features to auto-update protocol versions #1660
Conversation
Hi, pushing parts of my review + questions. Will review the rest later. :)
"Your table schema requires the following table feature(s) that require manual enablement: <features>.", | ||
"", | ||
"To do this, run the following command for each of features listed above:", | ||
" ALTER TABLE table_name SET TBLPROPERTIES ('delta.feature.feature_name' = 'supported')", | ||
"Replace \"table_name\" and \"feature_name\" with real values.", | ||
"", | ||
"Required Delta protocol: <requiredVersion>", | ||
"Current Delta protocol: <currentVersion>" | ||
], |
Q: Does that mean we now allow users to specify support for all table features? I thought they were not meant to be user-facing.
Nevermind, they are allowed to be user-facing. So a user can specify support for DVs like this or add CDC support and change the metadata at the same time, if they wish 🤔
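For instance, a minimal sketch of both cases, assuming a `SparkSession` named `spark` is in scope and a hypothetical table named `events`; the feature keys follow the `'delta.feature.<name>' = 'supported'` convention from this PR, and `'delta.enableChangeDataFeed'` is Delta's standard CDC table property:

```scala
// Sketch: explicitly adding deletion-vector support via a table property.
spark.sql(
  "ALTER TABLE events SET TBLPROPERTIES ('delta.feature.deletionVectors' = 'supported')")

// Sketch: adding CDC feature support and changing its metadata in one statement.
spark.sql(
  """ALTER TABLE events SET TBLPROPERTIES (
    |  'delta.feature.changeDataFeed' = 'supported',
    |  'delta.enableChangeDataFeed' = 'true')""".stripMargin)
```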
A feature can also be controlled exclusively by table features, such as TimestampNtz: #1626. In this case the config must be user-facing.
"Your table schema requires the following table feature(s) that require manual enablement: <features>.", | ||
"", | ||
"To do this, run the following command for each of features listed above:", | ||
" ALTER TABLE table_name SET TBLPROPERTIES ('delta.feature.feature_name' = 'supported')", |
Can the table name be passed in to this error message?
Seems non-trivial: at this point we only have the DeltaLog instance, which is based on a path. The table name is already lost at this stage.
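For context, a minimal sketch of the constraint (assuming Delta's Scala API, where `DeltaLog.forTable` resolves a log from a filesystem path; the path here is made up):

```scala
import org.apache.spark.sql.delta.DeltaLog

// A DeltaLog is keyed by a filesystem path, not a catalog identifier,
// so by this stage the original table name is no longer recoverable.
val deltaLog = DeltaLog.forTable(spark, "/data/warehouse/events")
println(deltaLog.dataPath) // only the path survives
```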
 * ([[FeatureAutomaticallyEnabledByMetadata.automaticallyUpdateProtocolOfExistingTables]] is set
 * to `true`), or being supported in the same transaction via a table property.
 */
private def assertTableFeaturesAutomaticallySupported(
Based on the description, rename to assertTableFeaturesSupportedByProtocol? It doesn't necessarily only check automatic support.
How about assertRequiredFeaturesCanBeSupportedByProtocol 😄?
 * When the feature's metadata requirements are satisfied for <b>new tables</b>, or for
 * <b>existing tables when [[automaticallyUpdateProtocolOfExistingTables]] is set to `true`</b>, the
Do we allow HTML tags in our doc comments? I thought it was just Markdown?
I believe so... Someone could correct me if I'm wrong
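For what it's worth, Scaladoc does accept inline HTML alongside its wiki markup, so a comment like this is valid (illustrative only):

```scala
/**
 * Enables the feature for <b>new tables</b> automatically, and for
 * <b>existing tables</b> only when the auto-update flag is set.
 */
def htmlInScaladocExample(): Unit = ()
```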
Here's the other half of the review. Looks good, just some more nits and questions.
Map(TestReaderWriterMetadataNoAutoUpdateFeature.TABLE_PROP_KEY -> "true"),
tableProtocol =
  Protocol(TABLE_FEATURES_MIN_READER_VERSION, TABLE_FEATURES_MIN_WRITER_VERSION),
expectedExceptionClass = Some("DELTA_FEATURES_REyQUIRE_MANUAL_ENABLEMENT"))
nit: AFAICT, for this test function this is always the same exception. Can we change it to always intercept the right exception and only make it more generic when necessary? Class-string matching is more prone to mistakes where we make a typo and it still compiles, but the test fails.
I discovered I didn't do assert around intercept[], so the test can pass even with the typo 😂
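A sketch of the difference, using ScalaTest (the real suite matches on Spark error classes; the exception type and strings here are illustrative):

```scala
import org.scalatest.funsuite.AnyFunSuite

class InterceptAssertExample extends AnyFunSuite {
  test("assert on the intercepted exception, not just its type") {
    // intercept[] only checks the exception type; it returns the exception
    // precisely so that its contents can (and should) be asserted on.
    val e = intercept[IllegalStateException] {
      throw new IllegalStateException("DELTA_FEATURES_REQUIRE_MANUAL_ENABLEMENT")
    }
    // Without this assert, a typo in the expected string goes unnoticed.
    assert(e.getMessage.contains("DELTA_FEATURES_REQUIRE_MANUAL_ENABLEMENT"))
  }
}
```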
s"ALTER TABLE delta.`${dir.getCanonicalPath}` SET TBLPROPERTIES (" + | ||
s" '${TestReaderWriterMetadataAutoUpdateFeature.TABLE_PROP_KEY}' = 'true'," + | ||
s" '${TestWriterMetadataNoAutoUpdateFeature.TABLE_PROP_KEY}' = 'true')") | ||
} |
It's definitely interesting to test the combination of setting multiple features together. Do we already have a test for the combination of two different auto-update features (e.g. CDC and Column Mapping)? A test for an auto-update feature with the same feature's delta.feature.xx = true/enabled/supported being set manually? And for an auto-update feature that is set to not supported (e.g. the default in the session), do we make sure it doesn't accidentally do a protocol update?
I added tests from this matrix:
- legacy vs TF protocol
- non-auto-update feature vs auto-update feature
- enabled via metadata vs via delta.feature.xxx

Also the following:
- a non-auto-update feature enabled via both metadata and delta.feature.xxx
- a non-auto-update feature, an auto-update feature, and some legacy features enabled via metadata
- the above with a session default

Should be enough.
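One such combination might be sketched as follows (assuming the suite's withTempDir and sql helpers; the feature names and the writerFeatureNames accessor are illustrative assumptions, not the actual suite code):

```scala
// Hypothetical sketch of one matrix entry: a metadata-driven feature together
// with explicit 'delta.feature.*' support, set in a single ALTER TABLE.
withTempDir { dir =>
  val path = dir.getCanonicalPath
  sql(s"CREATE TABLE delta.`$path` (id BIGINT) USING delta")
  sql(
    s"ALTER TABLE delta.`$path` SET TBLPROPERTIES (" +
      "'delta.enableChangeDataFeed' = 'true', " +       // metadata-driven
      "'delta.feature.changeDataFeed' = 'supported')")  // explicit support
  val protocol = DeltaLog.forTable(spark, path).update().protocol
  // The resulting protocol should list the feature as supported.
  assert(protocol.writerFeatureNames.contains("changeDataFeed"))
}
```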
Just nits left from me. Please still address those!
Description
This PR allows marking a metadata table feature as "auto-update capable", such that during ALTER TABLE, when its metadata requirements are satisfied, the table's protocol will be automatically and silently bumped to the required version.
Note that this mechanism only applies to "metadata requirements". Setting the 'delta.feature.featureName' = 'supported' table property will still auto-update the table's protocol to support table features. Also, this mechanism only affects existing tables. For new tables, the protocol is always set to the minimum version that satisfies all enabled features, i.e. all features are auto-update capable.
For compatibility, legacy table features (features supported by Protocol(2, 6)) are always auto-update capable. Specifically, Column Mapping implements its own mechanism to block usage without protocol version bumps.

How was this patch tested?
New tests.
Does this PR introduce any user-facing changes?
No. This PR affects only feature developers.