Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add iceberg-rest-catalog-client #16

Closed
wants to merge 2 commits into from

Conversation

Xuanwo
Copy link
Member

@Xuanwo Xuanwo commented Aug 2, 2023

This PR will add iceberg-rest-catalog-client which generated from openapi specs.

Reviewer Notes

Please ignore all code under src/apis, src/models and docs. They are all generated.

Acknowledgment

This work is inspired by @liurenjie1024's work in icelake

Xuanwo and others added 2 commits August 2, 2023 12:09
Signed-off-by: Xuanwo <[email protected]>
Co-authored-by: Renjie Liu <[email protected]>
@liurenjie1024
Copy link
Collaborator

Another option is to use build.rs to generate it automatically at compile time. What do you guys think?

@Xuanwo
Copy link
Member Author

Xuanwo commented Aug 2, 2023

Another option is to use build.rs to generate it automatically at compile time.

I don't want our users to install openapi-generator 😢

@liurenjie1024
Copy link
Collaborator

I don't want our users to install openapi-generator

Yeah, that's a tradeoff.

@Xuanwo
Copy link
Member Author

Xuanwo commented Aug 2, 2023

The issue we are facing is that openapi-generator is not as commonly used as protoc. While we can create a gRPC-based client using build.rs to call protoc, the same cannot be done for an openapi-generator based client since many users may not have it installed.

In fact, I don't want users to install protoc either 😢

Copy link
Collaborator

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Copy link
Collaborator

@JanKaul JanKaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thank you.

I guess eventually we need to define a catalog interface(trait) that this crate will then implement.

JanKaul

This comment was marked as duplicate.

@Xuanwo
Copy link
Member Author

Xuanwo commented Aug 2, 2023

I guess eventually we need to define a catalog interface(trait) that this crate will then implement.

Yes, we will implement our own catalog trait based on this client.

@liurenjie1024
Copy link
Collaborator

cc @nastra @Fokko PTAL

@Fokko
Copy link
Contributor

Fokko commented Aug 3, 2023

Yes, we will implement our own catalog trait based on this client.

Shouldn't we first create a trait and then the implementation? I'm not sure if this is the right approach. There is a lot of stuff in here that will be duplicated. For example, Schema, Types, Metadata (it maps 1 to 1 with the Iceberg metadata), Snapshots, PartitionSpecs, SortFields. I don't think we want to have a Schema in the rest catalog package, and a general Schema, because then we would need to convert the objects vica versa.

There is a working REST catalog available at: https://github.com/tabular-io/docker-spark-iceberg/tree/main/spark Would be awesome if we start testing if we can actually pull a table from there, including tests.

@liurenjie1024
Copy link
Collaborator

Schema, Types, Metadata (it maps 1 to 1 with the Iceberg metadata), Snapshots, PartitionSpecs, SortFields.

I think eventually we will need this. From my experience, stuffs in this crate are used for ser/de, while those in spec crate are used for in memory and user-facing. Conversion between this stuff will simply ser/de codes, otherwise, we will need to maintain these by ourselves, which is cumbersome.

But we should be careful that don't expose these things as public api to end user.

@Xuanwo
Copy link
Member Author

Xuanwo commented Aug 3, 2023

There is a lot of stuff in here that will be duplicated.

If we want to avoid manually writing a client for the rest catalog, then this is necessary. Rust is unlike to java (which has runtime reflect) or python (which is dynamic), we must declare structs or types first before we can using them. The convertion could be simple and zero cost.

But we should be careful that don't expose these things as public api to end user.

iceberg-rest-catalog-client is an underlying crate may be used by iceberg or any other users. iceberg can use iceberg-rest-catalog-client to implement rest-catalog support. And other users may use this client to connect to a rest catalog service directly.

The client itself is a simple HTTP SDK like aws-sdk-s3, it won't be part of public api of iceberg.

There is a working REST catalog available at: https://github.com/tabular-io/docker-spark-iceberg/tree/main/spark Would be awesome if we start testing if we can actually pull a table from there, including tests.

Makes sense. But I'm afraid that it's not part of iceberg-rest-catalog-client. We need those tests when we implement rest catalog in iceberg. In fact, iceberg-rest-catalog-client knows nothing about table or metadata, everything to it is request & response with pre-defained schema in open-api.yml.

@Fokko
Copy link
Contributor

Fokko commented Aug 3, 2023

If we want to avoid manually writing a client for the rest catalog, then this is necessary.

I'm afraid that we have to alter the code anyway. For example, the open-API spec does not define how to deserialize a fixed[16] or a decimal(10, 2). The open-API spec was created afterward. I would also love to rely on the spec itself, but that's not possible.

@Xuanwo
Copy link
Member Author

Xuanwo commented Aug 3, 2023

The open-API spec was created afterward. I would also love to rely on the spec itself, but that's not possible.

Ok, I understand. It appears that we are unable to create a client based on the specifications provided. Therefore, we must develop and maintain our own implementation. Let's close this PR now. We can define the trait first and than implement it by hand.

@Xuanwo Xuanwo closed this Aug 3, 2023
@Xuanwo Xuanwo deleted the iceberg-rest-catalog branch August 3, 2023 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants