-
Notifications
You must be signed in to change notification settings - Fork 383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#1313][#1471] feat(iceberg): Support struct column for iceberg #1721
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test also need to be added in com.datastrato.gravitino.integration.test.catalog.lakehouse.iceberg.CatalogIcebergIT
* @param field Gravitino field. | ||
* @param id | ||
* @return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a full comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll add it
@@ -61,4 +61,22 @@ public static IcebergColumn fromNestedField(Types.NestedField nestedField) { | |||
.withType(ConvertUtil.formIcebergType(nestedField.type())) | |||
.build(); | |||
} | |||
|
|||
/** | |||
* Convert the Gravitino field of Iceberg to the Iceberg column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's field of StructType, not Iceberg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll fix it
Arrays.stream(fields) | ||
.map( | ||
field -> { | ||
return ConvertUtil.fromGravitinoField(field, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The id is always zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's best to generate this ID according to iceberg's generation rules
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The id will not be used in the subsequent logic, visitor.struct() will generate this ID according to iceberg's generation rules.
My idea is that since the generated id here will not be used later, assigning a value of 0 directly to the id will make the code more concise
@Override
public Type struct(IcebergTable struct, List<Type> types) {
List<IcebergColumn> fields =
Arrays.stream(struct.columns())
.map(column -> (IcebergColumn) column)
.collect(Collectors.toList());
List<Types.NestedField> newFields = Lists.newArrayListWithExpectedSize(fields.size());
boolean isRoot = root == struct;
for (int i = 0; i < fields.size(); i += 1) {
IcebergColumn field = fields.get(i);
Type type = types.get(i);
// for new conversions, use ordinals for ids in the root struct
int id = isRoot ? i : getNextId();
if (field.nullable()) {
newFields.add(Types.NestedField.optional(id, field.name(), type, field.comment()));
} else {
newFields.add(Types.NestedField.required(id, field.name(), type, field.comment()));
}
}
return Types.StructType.of(newFields);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is necessary to add generated ID logic in this method, though these ID will never be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's confirmed that this won't be used, then I think we can pass a 0 to keep the code clean. We should first raise an issue to refactor the previous code and abandon the use of ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 0 of id passed here will never be used in the subsequent logic, so It's not necessary to generate id here.
As you mentioned, it will be much appreciate if you can raise an issue to discuss the refactor of previous code and abandon the use of ID.
@FANNG1 @Clearvive can you please also help to review. |
.../main/java/com/datastrato/gravitino/catalog/lakehouse/iceberg/converter/FromIcebergType.java
Outdated
Show resolved
Hide resolved
public static IcebergColumn fromGravitinoField( | ||
com.datastrato.gravitino.rel.types.Types.StructType.Field field, int id) { | ||
return new IcebergColumn.Builder() | ||
.withId(id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The iceberg type ID has generation rules, but it is uncertain whether the underlying layer is useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input parameters of this method is StructType.Field, not a table, my understanding of iceberg type ID generation rules is aimed at IcebergTable
I think it's better to put the rules of generating id to the previous layer method (the method that call fromGravitinoField)
Consider adding relevant tests in CatalogIcebergIT |
.../java/com/datastrato/gravitino/catalog/lakehouse/iceberg/converter/ToIcebergTypeVisitor.java
Outdated
Show resolved
Hide resolved
.../test/java/com/datastrato/gravitino/catalog/lakehouse/iceberg/converter/TestConvertUtil.java
Show resolved
Hide resolved
@@ -17,10 +18,25 @@ public class ConvertUtil { | |||
* Convert the Iceberg Table to the corresponding schema information in the Iceberg. | |||
* | |||
* @param icebergTable Iceberg table. | |||
* @return iceberg schema. | |||
* @return Iceberg schema. | |||
*/ | |||
public static Schema toIcebergSchema(IcebergTable icebergTable) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you rename icebergTable
to gravitinoTable
to make it more clear?
.../src/main/java/com/datastrato/gravitino/catalog/lakehouse/iceberg/converter/ConvertUtil.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/datastrato/gravitino/catalog/lakehouse/iceberg/converter/ConvertUtil.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/datastrato/gravitino/catalog/lakehouse/iceberg/converter/ConvertUtil.java
Outdated
Show resolved
Hide resolved
.../java/com/datastrato/gravitino/catalog/lakehouse/iceberg/converter/ToIcebergTypeVisitor.java
Outdated
Show resolved
Hide resolved
.../java/com/datastrato/gravitino/catalog/lakehouse/iceberg/converter/ToIcebergTypeVisitor.java
Outdated
Show resolved
Hide resolved
.../java/com/datastrato/gravitino/catalog/lakehouse/iceberg/converter/ToIcebergTypeVisitor.java
Outdated
Show resolved
Hide resolved
LGTM, except a few minor comments |
please rebase the latest code. and change the Iceberg document add |
So we finally catch the 0.4.0 release, thanks @TEOTEO520 for your work. 😄 |
@FANNG1 |
It's my first PR for open source community in my life, thanks to everyone in Gravitino for your advices and selfless help! |
LGTM, let's wait the CI to finish |
…berg (#1721) ### What changes were proposed in this pull request? add struct column support for iceberg ### Why are the changes needed? Fix: #1313 Fix: #1471 ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? UT added --------- Co-authored-by: teo <[email protected]>
@TEOTEO520 , thanks for your work, wellcome to join Gravitino community, hope you enjoy the opensource journey :) |
What changes were proposed in this pull request?
add struct column support for iceberg
Why are the changes needed?
Fix: #1313
Fix: #1471
Does this PR introduce any user-facing change?
no
How was this patch tested?
UT added