Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#1313][#1471] feat(iceberg): Support struct column for iceberg #1721

Merged
merged 6 commits into from
Feb 5, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -61,4 +61,22 @@ public static IcebergColumn fromNestedField(Types.NestedField nestedField) {
.withType(ConvertUtil.formIcebergType(nestedField.type()))
.build();
}

/**
* Convert the Gravitino field of Iceberg to the Iceberg column.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's field of StructType, not Iceberg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll fix it

*
* @param field Gravitino field.
* @param id
* @return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a full comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll add it

*/
public static IcebergColumn fromGravitinoField(
FANNG1 marked this conversation as resolved.
Show resolved Hide resolved
com.datastrato.gravitino.rel.types.Types.StructType.Field field, int id) {
return new IcebergColumn.Builder()
.withId(id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iceberg type ID has generation rules, but it is uncertain whether the underlying layer is useful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input parameters of this method is StructType.Field, not a table, my understanding of iceberg type ID generation rules is aimed at IcebergTable
I think it's better to put the rules of generating id to the previous layer method (the method that call fromGravitinoField)

.withName(field.name())
.withNullable(field.nullable())
.withComment(field.comment())
.withType(field.type())
.build();
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,16 @@ public Type schema(Schema schema, Type structType) {

@Override
public Type struct(Types.StructType struct, List<Type> fieldResults) {
throw new UnsupportedOperationException("Data conversion of struct type is not supported");
return com.datastrato.gravitino.rel.types.Types.StructType.of(
struct.fields().stream()
FANNG1 marked this conversation as resolved.
Show resolved Hide resolved
.map(
nestedField ->
com.datastrato.gravitino.rel.types.Types.StructType.Field.of(
nestedField.name(),
fieldResults.get(struct.fields().indexOf(nestedField)),
nestedField.isOptional(),
nestedField.doc()))
.toArray(com.datastrato.gravitino.rel.types.Types.StructType.Field[]::new));
}

@Override
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,9 @@
import com.datastrato.gravitino.rel.types.Type;
import com.datastrato.gravitino.rel.types.Types;
import com.google.common.collect.Lists;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/**
* Type converter belonging to gravitino.
Expand Down Expand Up @@ -52,6 +54,19 @@ public static <T> T visit(Type type, ToIcebergTypeVisitor<T> visitor) {
} else if (type instanceof Types.ListType) {
Types.ListType list = (Types.ListType) type;
return visitor.array(list, visit(list.elementType(), visitor));
} else if (type instanceof Types.StructType) {
Types.StructType struct = (Types.StructType) type;
Types.StructType.Field[] fields = struct.fields();
List<IcebergColumn> columns =
Arrays.stream(fields)
.map(
field -> {
return ConvertUtil.fromGravitinoField(field, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The id is always zero?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's best to generate this ID according to iceberg's generation rules

Copy link
Contributor Author

@TEOTEO520 TEOTEO520 Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The id will not be used in the subsequent logic, visitor.struct() will generate this ID according to iceberg's generation rules.
My idea is that since the generated id here will not be used later, assigning a value of 0 directly to the id will make the code more concise

  @Override
  public Type struct(IcebergTable struct, List<Type> types) {
    List<IcebergColumn> fields =
        Arrays.stream(struct.columns())
            .map(column -> (IcebergColumn) column)
            .collect(Collectors.toList());
    List<Types.NestedField> newFields = Lists.newArrayListWithExpectedSize(fields.size());
    boolean isRoot = root == struct;

    for (int i = 0; i < fields.size(); i += 1) {
      IcebergColumn field = fields.get(i);
      Type type = types.get(i);

      // for new conversions, use ordinals for ids in the root struct
      int id = isRoot ? i : getNextId();
      if (field.nullable()) {
        newFields.add(Types.NestedField.optional(id, field.name(), type, field.comment()));
      } else {
        newFields.add(Types.NestedField.required(id, field.name(), type, field.comment()));
      }
    }
    return Types.StructType.of(newFields);
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is necessary to add generated ID logic in this method, though these ID will never be used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's confirmed that this won't be used, then I think we can pass a 0 to keep the code clean. We should first raise an issue to refactor the previous code and abandon the use of ID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 0 of id passed here will never be used in the subsequent logic, so It's not necessary to generate id here.
As you mentioned, it will be much appreciate if you can raise an issue to discuss the refactor of previous code and abandon the use of ID.

})
.collect(Collectors.toList());
IcebergTable mockTable =
new IcebergTable.Builder().withColumns(columns.toArray(new IcebergColumn[0])).build();
FANNG1 marked this conversation as resolved.
Show resolved Hide resolved
return visit(mockTable, visitor);
} else {
return visitor.atomic((Type.PrimitiveType) type);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -144,7 +144,25 @@ public void testCreateIcebergTable() {
.withComment(ICEBERG_COMMENT)
.withNullable(false)
.build();
Column[] columns = new Column[] {col1, col2};
Types.StructType structTypeInside =
Types.StructType.of(
Types.StructType.Field.notNullField("integer_field_inside", Types.IntegerType.get()),
Types.StructType.Field.notNullField(
"string_field_inside", Types.StringType.get(), "string field inside"));
Types.StructType structType =
Types.StructType.of(
Types.StructType.Field.notNullField("integer_field", Types.IntegerType.get()),
Types.StructType.Field.notNullField(
"string_field", Types.StringType.get(), "string field"),
Types.StructType.Field.nullableField("struct_field", structTypeInside, "struct field"));
IcebergColumn col3 =
new IcebergColumn.Builder()
.withName("col_3")
.withType(structType)
.withComment(ICEBERG_COMMENT)
.withNullable(false)
.build();
Column[] columns = new Column[] {col1, col2, col3};

SortOrder[] sortOrders = createSortOrder();
Table table =
Expand All @@ -169,6 +187,7 @@ public void testCreateIcebergTable() {
Assertions.assertEquals("val2", loadedTable.properties().get("key2"));
Assertions.assertTrue(loadedTable.columns()[0].nullable());
Assertions.assertFalse(loadedTable.columns()[1].nullable());
Assertions.assertFalse(loadedTable.columns()[2].nullable());

Assertions.assertTrue(icebergCatalog.asTableCatalog().tableExists(tableIdentifier));
NameIdentifier[] tableIdents =
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -252,6 +252,37 @@ public void testFormIcebergType() {
Assertions.assertTrue(
((com.datastrato.gravitino.rel.types.Types.ListType) gravitinoListType).elementType()
instanceof com.datastrato.gravitino.rel.types.Types.StringType);

Types.StructType structTypeInside =
FANNG1 marked this conversation as resolved.
Show resolved Hide resolved
Types.StructType.of(
Types.NestedField.optional(
2, "integer_type_inside", Types.IntegerType.get(), "integer type"),
Types.NestedField.optional(
3, "string_type_inside", Types.StringType.get(), "string type"));
Types.StructType structType =
Types.StructType.of(
Types.NestedField.optional(0, "integer_type", Types.IntegerType.get(), "integer type"),
Types.NestedField.optional(1, "struct_type", structTypeInside, "struct type inside"));
com.datastrato.gravitino.rel.types.Type gravitinoStructType =
ConvertUtil.formIcebergType(structType);
Assertions.assertTrue(
(gravitinoStructType) instanceof com.datastrato.gravitino.rel.types.Types.StructType);
Assertions.assertTrue(
((com.datastrato.gravitino.rel.types.Types.StructType) gravitinoStructType)
.fields()[0].type()
instanceof com.datastrato.gravitino.rel.types.Types.IntegerType);
Assertions.assertTrue(
((com.datastrato.gravitino.rel.types.Types.StructType)
((com.datastrato.gravitino.rel.types.Types.StructType) gravitinoStructType)
.fields()[1].type())
.fields()[0].type()
instanceof com.datastrato.gravitino.rel.types.Types.IntegerType);
Assertions.assertTrue(
((com.datastrato.gravitino.rel.types.Types.StructType)
((com.datastrato.gravitino.rel.types.Types.StructType) gravitinoStructType)
.fields()[1].type())
.fields()[1].type()
instanceof com.datastrato.gravitino.rel.types.Types.StringType);
}

@Test
Expand Down
Loading