Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: handle protocol compatibility #1807

Merged
merged 5 commits into from
Nov 17, 2023
Merged

Conversation

roeap
Copy link
Collaborator

@roeap roeap commented Nov 5, 2023

Description

In preparation for further improvements to protocol support, this PR introduces a ProtocolChecker which validates that we can read form / write to / commit to a specific table. So far everything is expressed in table features in the hopes, that this keeps on giving correct behaviours as we add more table features.

The existing support for append only is integrated and extended to handle enablement logic according to the protocol.

@github-actions github-actions bot added the binding/python Issues for the Python package label Nov 5, 2023
@roeap roeap force-pushed the preconditions branch 3 times, most recently from 9815fc6 to 0c72b20 Compare November 6, 2023 06:54
@github-actions github-actions bot removed the binding/python Issues for the Python package label Nov 6, 2023
@roeap roeap marked this pull request as ready for review November 6, 2023 07:19
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just a drive-by fix as builds were failing for benches.

Comment on lines +148 to +170
pub static INSTANCE: Lazy<ProtocolChecker> = Lazy::new(|| {
let reader_features = HashSet::new();
// reader_features.insert(ReaderFeatures::ColumnMapping);

let mut writer_features = HashSet::new();
writer_features.insert(WriterFeatures::AppendOnly);
writer_features.insert(WriterFeatures::Invariants);
// writer_features.insert(WriterFeatures::CheckConstraints);
// writer_features.insert(WriterFeatures::ChangeDataFeed);
// writer_features.insert(WriterFeatures::GeneratedColumns);
// writer_features.insert(WriterFeatures::ColumnMapping);
// writer_features.insert(WriterFeatures::IdentityColumns);

ProtocolChecker::new(reader_features, writer_features)
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like the API of ProtocolChecker and I think it's a great addition to the library. The motivation for this global I am struggling with however. I thought it might be to avoid allocations, but as far as I can tell, the common paths through can_read and can_write_to will clone a number of the related HashSets anyways, which has me perplexed.

I think I'm just not understanding, would you mind putting some more commentary above INSTANCE on the motivation for it so that I may be enlightened 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to add some comments, just really quick here as well.

The cloning bothered me as well, and I'll see that I can fix that. The main motivation though is that with table features protocol support is essentially a negotiation between the client and the table. Since the table features we support lock in at compile time, there really is exactly one correct way the instance should look like, so i felt its less error prone if we just create it globally and track support for features in one place.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some comments and avoided cloning the sets.

@hntd187
Copy link
Collaborator

hntd187 commented Nov 11, 2023

Ayyo @MrPowers

use crate::kernel::{Action, ReaderFeatures, WriterFeatures};
use crate::table::state::DeltaTableState;

lazy_static! {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this used for? The table reader and writers have Into<usize> impls so you should just be able to cast them to find their associated reader / writer version.

Copy link
Collaborator Author

@roeap roeap Nov 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main intent is to have a uniform way of handling compatibility for tables on 3/7 and the ones below.

I.e. to check compatibility with a version we need all features leading up to that version, unless on 3/7 where we just need to check all features the table requires. So casting a specific features to a min version only helps for certain checks. So just enumerating all features per version allowed to have one consistent way of handling capability checks, that "should" keep on behaving correctly at we add more and more features to the supported feature list.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I think I might not understand the check here. Are you saying these are for the chain of required features? Like check constraints for example requires append only and invariants? So maybe it's the name that throws me off if that's the case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as usual, it depends 😆 - tables features as a concept are only available at reader/writer version 3/7. AT that version, clients can validate that they implement all features the table requires as stated in the Protocol action. The respective fields must be available at these versions. For any reader/writer version below that, clients must assume that they need to implement all features up to the respective version.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh I see, so this only matters when we are not dealing with actual table features and meant to enforce the "all or nothing" style of protocol versioning prior?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly.

@roeap
Copy link
Collaborator Author

roeap commented Nov 14, 2023

@rtyler @wjones127 - any chance you could approve this so we can land it on main? There seems to be quite some activity around table features right now, so i think it would be valualble to establish a baseline...

@roeap roeap merged commit c15d768 into delta-io:main Nov 17, 2023
21 checks passed
@roeap roeap deleted the preconditions branch November 17, 2023 06:33
roeap added a commit that referenced this pull request Dec 11, 2023
~~based on #1807~~

# Description

In the effort to advance protocol support and move our internal APIs
closer to the kernel library, it is advantageous to leverage the
expression handling logic from kernel specifically for filtering actions
etc.

This PR just add the expression definitions and evaluation logic.
Integrating it with our current codebase and basing the existing
partition handling logic on this is left for follow up PRs to keep thigs
review-able.

related: #1894, #1776
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/rust Issues for the Rust crate crate/core
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants