Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Introduce schema definition. #19

Merged
merged 4 commits into from
Aug 3, 2023
Merged

Conversation

liurenjie1024
Copy link
Collaborator

Initial schema definition. SchemaVisitor, name indexes will come later.

crates/iceberg/src/error.rs Outdated Show resolved Hide resolved
}

/// Set the field's initial default value.
pub fn with_write_default(mut self, value: impl ToString) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write_default and initial_default seems should be a value?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but currently Value is not defined yet 🤪

Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM! Although, I feel that the usage of once_cell here is a bit over-designed. Is it possible that we may need to add new fields into StructType, for example, in schema evaluation?

@liurenjie1024
Copy link
Collaborator Author

The Python and java implementation treats schema as immutable data structures, I think we should also follow that. Mutable makes things complicated, especially when we have indexes. For example the name indexing following.

@Xuanwo
Copy link
Member

Xuanwo commented Aug 2, 2023

The Python and java implementation treats schema as immutable data structures, I think we should also follow that. Mutable makes things complicated, especially when we have indexes. For example the name indexing following.

Got it, makes sense now.

@JanKaul
Copy link
Collaborator

JanKaul commented Aug 2, 2023

Should we also include identifier-field-ids?

@liurenjie1024
Copy link
Collaborator Author

Should we also include identifier-field-ids?

Just took a look at it. We need visitor pattern to verify them, for example the types, etc. So I want to postpone it after we introduce schema visitor.

@JanKaul
Copy link
Collaborator

JanKaul commented Aug 3, 2023

I don't entirely understand. Could you elaborate why we would need the visitor pattern for the identifier-field-ids? I was thinking about something similar to the serialized representation.

@liurenjie1024
Copy link
Collaborator Author

liurenjie1024 commented Aug 3, 2023

I don't entirely understand. Could you elaborate why we would need the visitor pattern for the identifier-field-ids? I was thinking about something similar to the serialized representation.

I mean the verification check, you can find the implementation in java: https://github.com/apache/iceberg/blob/1bb853191fd378fb1dfda5a5cb297475b7fc204b/api/src/main/java/org/apache/iceberg/Schema.java#L104

cc @Fokko @JanKaul Why I don't include identifier-field-ids here.

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small comments, but looks good 👍🏻

crates/iceberg/src/spec/datatypes.rs Show resolved Hide resolved
crates/iceberg/src/spec/schema.rs Show resolved Hide resolved
r#struct: StructType,
schema_id: i32,
highest_field_id: i32,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the identifier-field-ids are missing: https://iceberg.apache.org/spec/#identifier-field-ids

@Fokko Fokko merged commit 67f394e into apache:main Aug 3, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants