Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that RestCatalog passes user config to FileIO #476

Merged
merged 2 commits into from
Aug 20, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 26 additions & 2 deletions crates/catalog/rest/src/catalog.rs
Original file line number Diff line number Diff line change
Expand Up @@ -466,6 +466,11 @@ impl Catalog for RestCatalog {
}

/// Create a new table inside the namespace.
///
/// In the resulting table, if there are any config properties that
/// are present in both the response from the REST server and the
/// config provided when creating this `RestCatalog` instance then
/// the value provided locally to the `RestCatalog` will take precedence.
async fn create_table(
&self,
namespace: &NamespaceIdent,
Expand Down Expand Up @@ -504,8 +509,15 @@ impl Catalog for RestCatalog {
.query::<LoadTableResponse, ErrorResponse, OK>(request)
.await?;

let config = resp
.config
.unwrap_or_default()
.into_iter()
.chain(self.user_config.props.clone().into_iter())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if it was that way around, you could not use the local user_config to override a property that came from the REST service? I think we would want the ability to do that. I think it makes more sense the way around that I have it.

Copy link
Collaborator

@liurenjie1024 liurenjie1024 Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both two methods have pros and cons.

If we prioritize user config, we get max flexibility, but also would be easy to introduce bugs. For example, what if user misconfiged some fileio property which should be managed by rest catalog?

Also, the rest catalog spec 's statement make me feel that we should repsect rest catalog's response.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rationale behind this was so that users could override values from the REST config. For instance if you wanted to try out using an S3 proxy / cache by overriding the value of the S3 warehouse endpoint. I can change it if you insist, but that would reduce the usefulness of this change in my eyes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a reasonable case for me. cc @Xuanwo @Fokko what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference for this but respecting the user's override seems sensible to me. We can mention its behavior in the API comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some doc comments to the two relevant methods to clarify the behaviour. Hope this helps.

.collect();

let file_io = self
.load_file_io(resp.metadata_location.as_deref(), resp.config)
.load_file_io(resp.metadata_location.as_deref(), Some(config))
.await?;

let table = Table::builder()
Expand All @@ -524,6 +536,11 @@ impl Catalog for RestCatalog {
}

/// Load table from the catalog.
///
/// If there are any config properties that are present in
/// both the response from the REST server and the config provided
/// when creating this `RestCatalog` instance then the value
/// provided locally to the `RestCatalog` will take precedence.
async fn load_table(&self, table: &TableIdent) -> Result<Table> {
let request = self
.context()
Expand All @@ -542,8 +559,15 @@ impl Catalog for RestCatalog {
.query::<LoadTableResponse, ErrorResponse, OK>(request)
.await?;

let config = resp
.config
.unwrap_or_default()
.into_iter()
.chain(self.user_config.props.clone().into_iter())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

.collect();

let file_io = self
.load_file_io(resp.metadata_location.as_deref(), resp.config)
.load_file_io(resp.metadata_location.as_deref(), Some(config))
.await?;

let table_builder = Table::builder()
Expand Down
Loading