[#1313][#1471] feat(iceberg): Support struct column for iceberg #1721

Merged
merged 6 commits into from
Feb 5, 2024

Conversation

TEOTEO520
Contributor

@TEOTEO520 TEOTEO520 commented Jan 25, 2024

What changes were proposed in this pull request?

Add struct column support for Iceberg.

Why are the changes needed?

Fix: #1313
Fix: #1471

Does this PR introduce any user-facing change?

no

How was this patch tested?

UT added

@TEOTEO520 TEOTEO520 changed the title from "support struct for iceberg" to "[#1313] feat(iceberg): Support struct column for iceberg" Jan 25, 2024
@qqqttt123 qqqttt123 requested a review from mchades January 26, 2024 01:50
@mchades mchades requested review from FANNG1 and Clearvive January 26, 2024 02:22
@mchades mchades requested a review from qqqttt123 January 26, 2024 02:22
Contributor

@mchades mchades left a comment

The test also needs to be added in com.datastrato.gravitino.integration.test.catalog.lakehouse.iceberg.CatalogIcebergIT.

Comment on lines 68 to 70
* @param field Gravitino field.
* @param id
* @return
Contributor

Please add a full comment

Contributor Author

OK, I'll add it

@@ -61,4 +61,22 @@ public static IcebergColumn fromNestedField(Types.NestedField nestedField) {
        .withType(ConvertUtil.formIcebergType(nestedField.type()))
        .build();
  }

  /**
   * Convert the Gravitino field of Iceberg to the Iceberg column.
Contributor

It's a field of StructType, not of Iceberg.

Contributor Author

OK, I'll fix it

Arrays.stream(fields)
    .map(
        field -> {
          return ConvertUtil.fromGravitinoField(field, 0);
Contributor

The id is always zero?

Contributor

It's best to generate this ID according to Iceberg's ID generation rules.

Contributor Author

@TEOTEO520 TEOTEO520 Jan 29, 2024

The id will not be used in the subsequent logic; visitor.struct() will generate the ID according to Iceberg's generation rules.
Since the id generated here is never used later, my idea is to assign 0 directly, which keeps the code more concise:

  @Override
  public Type struct(IcebergTable struct, List<Type> types) {
    List<IcebergColumn> fields =
        Arrays.stream(struct.columns())
            .map(column -> (IcebergColumn) column)
            .collect(Collectors.toList());
    List<Types.NestedField> newFields = Lists.newArrayListWithExpectedSize(fields.size());
    boolean isRoot = root == struct;

    for (int i = 0; i < fields.size(); i += 1) {
      IcebergColumn field = fields.get(i);
      Type type = types.get(i);

      // for new conversions, use ordinals for ids in the root struct
      int id = isRoot ? i : getNextId();
      if (field.nullable()) {
        newFields.add(Types.NestedField.optional(id, field.name(), type, field.comment()));
      } else {
        newFields.add(Types.NestedField.required(id, field.name(), type, field.comment()));
      }
    }
    return Types.StructType.of(newFields);
  }
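
To make the argument above concrete, here is a minimal, self-contained sketch of the same rule: root fields take their ordinal as id, nested fields take the next value of a counter, and the incoming ids are never read. It does not use the Gravitino or Iceberg classes. `IdReassignmentSketch`, `Field`, `reassign`, and `demo` are hypothetical names introduced only for illustration, and starting the counter at the root field count is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in types; none of these names come from Gravitino or Iceberg. */
final class IdReassignmentSketch {

  /** A field with an id, a name, and (for struct-typed fields) nested children. */
  static final class Field {
    final int id;
    final String name;
    final List<Field> children;

    Field(int id, String name, List<Field> children) {
      this.id = id;
      this.name = name;
      this.children = children;
    }
  }

  // Counter for nested fields; assumed to start after the ordinals used by root fields.
  private int nextId;

  IdReassignmentSketch(int rootFieldCount) {
    this.nextId = rootFieldCount;
  }

  /** Rebuilds the fields post-order and never reads the incoming ids. */
  List<Field> reassign(List<Field> fields, boolean isRoot) {
    // Children are converted first, as a post-order visitor would do.
    List<List<Field>> convertedChildren = new ArrayList<>(fields.size());
    for (Field field : fields) {
      convertedChildren.add(reassign(field.children, false));
    }
    List<Field> result = new ArrayList<>(fields.size());
    for (int i = 0; i < fields.size(); i++) {
      // Root fields use their ordinal; nested fields take the next counter value.
      int id = isRoot ? i : nextId++;
      result.add(new Field(id, fields.get(i).name, convertedChildren.get(i)));
    }
    return result;
  }

  /** Builds id + address{city, zip} with every id set to the placeholder 0, then reassigns. */
  static List<Field> demo() {
    List<Field> none = Collections.emptyList();
    Field city = new Field(0, "city", none);
    Field zip = new Field(0, "zip", none);
    List<Field> root =
        Arrays.asList(new Field(0, "id", none), new Field(0, "address", Arrays.asList(city, zip)));
    return new IdReassignmentSketch(root.size()).reassign(root, true);
  }

  public static void main(String[] args) {
    for (Field field : demo()) {
      System.out.println(field.id + " " + field.name);
      for (Field child : field.children) {
        System.out.println("  " + child.id + " " + child.name);
      }
    }
  }
}
```

In this sketch every input field carries the placeholder id 0, yet the output ids are all distinct, which is why the value passed at conversion time has no effect on the result.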

Contributor Author

Do you think it is necessary to add ID generation logic in this method, even though these IDs will never be used?

Contributor

If it's confirmed that this won't be used, then I think we can pass a 0 to keep the code clean. We should first raise an issue to refactor the previous code and abandon the use of ID.

Contributor Author

The id of 0 passed here will never be used in the subsequent logic, so it's not necessary to generate an id here.
As you mentioned, it would be much appreciated if you could raise an issue to discuss refactoring the previous code and abandoning the use of ID.

@jerryshao
Contributor

@FANNG1 @Clearvive could you please also help review?

@qqqttt123 qqqttt123 changed the title from "[#1313] feat(iceberg): Support struct column for iceberg" to "[#1313][#1471] feat(iceberg): Support struct column for iceberg" Jan 26, 2024
public static IcebergColumn fromGravitinoField(
    com.datastrato.gravitino.rel.types.Types.StructType.Field field, int id) {
  return new IcebergColumn.Builder()
      .withId(id)
Contributor

The Iceberg type ID has generation rules, but it is uncertain whether the underlying layer actually uses it.

Contributor Author

The input parameter of this method is a StructType.Field, not a table; my understanding is that Iceberg's type ID generation rules apply to IcebergTable.
I think it's better to put the ID generation rules in the layer above (the method that calls fromGravitinoField).
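
The layering suggested here could look roughly like the following sketch, in which the per-field converter only copies the id it is handed and the calling method owns the numbering policy. This is an illustration under stated assumptions, not the code from the PR: `CallerSideIdSketch`, `Column`, `fromField`, and `toColumns` are hypothetical stand-ins, field names are plain strings, and sequential numbering from `firstId` is just one possible policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-ins; these are not the Gravitino classes shown in the diff. */
final class CallerSideIdSketch {

  /** Minimal column holding only the id and the name. */
  static final class Column {
    final int id;
    final String name;

    Column(int id, String name) {
      this.id = id;
      this.name = name;
    }
  }

  /** Stand-in for a per-field converter: it only copies the id it is handed. */
  static Column fromField(String fieldName, int id) {
    return new Column(id, fieldName);
  }

  /** The calling layer sees all fields at once, so it decides which id each one gets. */
  static List<Column> toColumns(List<String> fieldNames, int firstId) {
    List<Column> columns = new ArrayList<>(fieldNames.size());
    int nextId = firstId;
    for (String name : fieldNames) {
      columns.add(fromField(name, nextId++));
    }
    return columns;
  }

  public static void main(String[] args) {
    for (Column column : toColumns(Arrays.asList("city", "zip"), 5)) {
      System.out.println(column.id + " " + column.name);
    }
  }
}
```

Keeping the policy in the caller means the converter has no knowledge of how ids are chosen, so switching to another scheme (or to a constant placeholder) would only touch `toColumns`.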

@Clearvive
Contributor

Consider adding relevant tests in CatalogIcebergIT

@@ -17,10 +18,25 @@ public class ConvertUtil {
   * Convert the Iceberg Table to the corresponding schema information in the Iceberg.
   *
   * @param icebergTable Iceberg table.
-  * @return iceberg schema.
+  * @return Iceberg schema.
   */
   public static Schema toIcebergSchema(IcebergTable icebergTable) {
Contributor

Could you rename icebergTable to gravitinoTable to make it clearer?

@FANNG1
Contributor

FANNG1 commented Feb 5, 2024

LGTM, except for a few minor comments.

@FANNG1
Contributor

FANNG1 commented Feb 5, 2024

Please rebase onto the latest code, and update the Iceberg document to add Struct to the supported Iceberg column types in docs/lakehouse-iceberg-catalog.md.

@jerryshao
Contributor

So we finally caught the 0.4.0 release; thanks @TEOTEO520 for your work. 😄

@jerryshao jerryshao added the need backport and branch-0.4 labels Feb 5, 2024
@yuqi1129
Contributor

yuqi1129 commented Feb 5, 2024

@FANNG1
Can you assist in checking if the changes to the documents are acceptable?

@TEOTEO520
Contributor Author

It's my first PR for the open source community in my life; thanks to everyone in Gravitino for your advice and selfless help!
@jerryshao @Clearvive @FANNG1 @mchades @qqqttt123 @yuqi1129

@FANNG1
Contributor

FANNG1 commented Feb 5, 2024

> @FANNG1 Can you assist in checking if the changes to the documents are acceptable?

LGTM, let's wait for the CI to finish.

@FANNG1 FANNG1 merged commit f63f6f3 into apache:main Feb 5, 2024
12 checks passed
github-actions bot pushed a commit that referenced this pull request Feb 5, 2024
…berg (#1721)

### What changes were proposed in this pull request?

add  struct column support for iceberg 

### Why are the changes needed?

Fix: #1313
Fix: #1471 

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

UT added

---------

Co-authored-by: teo <[email protected]>
@FANNG1
Contributor

FANNG1 commented Feb 5, 2024

@TEOTEO520, thanks for your work, and welcome to the Gravitino community. Hope you enjoy the open-source journey :)
