Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#4758] correct Iceberg document location is immutable not reserved #4808

Merged
merged 5 commits into from
Sep 5, 2024

Conversation

FANNG1
Copy link
Contributor

@FANNG1 FANNG1 commented Aug 30, 2024

What changes were proposed in this pull request?

correct Iceberg document location is immutable not reserved

Why are the changes needed?

Fix: #4758

Does this PR introduce any user-facing change?

no

How was this patch tested?

document

@FANNG1 FANNG1 self-assigned this Aug 30, 2024
| `location` | Iceberg location for table storage. | None | No | 0.2.0 |
| `format` | Iceberg table format like `iceberg/parquet`. | None | No | 0.2.0 |
| `format-version` | Iceberg meta data format version. | None | No | 0.2.0 |
| `provider` | Table provider like `iceberg`. | None | No | 0.2.0 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

provider is the concept of Spark. I'm not sure whether it should be a property of Iceberg catalogs. Could you give more resources about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually format version can be changed. v1 can change to v2. But v2 can't change to v1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can change format, too. Iceberg can mix to use multiple formats.
For example, partition 1 use parquet. partiton 2 use orc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

location can be changed too. For one Iceberg table, they can have different locations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, they can changed by Iceberg, but we doesn't test and verify the action, we'd better keep it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a consensus on the changes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users can change underlying Iceberg catalog these properties, you can't make them immutable. It will make users confusing.

@FANNG1
Copy link
Contributor Author

FANNG1 commented Sep 3, 2024

@jerqi @mchades any other comments?

| Configuration item | Description | Default value | Required | Since Version |
|--------------------|----------------------------------------------|---------------|----------|---------------|
| `location` | Iceberg location for table storage. | None | No | 0.2.0 |
| `format` | Iceberg table format like `iceberg/parquet`. | None | No | 0.2.0 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kinds of format do we support?

| `location` | Iceberg location for table storage. | None | No | 0.2.0 |
| `format` | Iceberg table format like `iceberg/parquet`. | None | No | 0.2.0 |
| `format-version` | Iceberg meta data format version. | None | No | 0.2.0 |
| `provider` | Table provider like `iceberg`. | None | No | 0.2.0 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The provider seems useless.

| `location` | Iceberg location for table storage. | None | No | 0.2.0 |
| `format` | Iceberg table format like `iceberg/parquet`. | None | No | 0.2.0 |
| `format-version` | Iceberg meta data format version. | None | No | 0.2.0 |
| `provider` | Table provider like `iceberg`. | None | No | 0.2.0 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users can change underlying Iceberg catalog these properties, you can't make them immutable. It will make users confusing.

@FANNG1
Copy link
Contributor Author

FANNG1 commented Sep 3, 2024

@jerqi you could create seperate issue to change the Iceberg properties, it's not related to this PR.

| `cherry-pick-snapshot-id` | Selecting a specific snapshot in a merge operation. | None | No | 0.2.0 |
| `sort-order` | Iceberg table sort order, please use `SortOrder` in table meta instead. | None | No | 0.2.0 |
| `identifier-fields` | The identifier fields for defining the table. | None | No | 0.2.0 |
| `write.distribution-mode` | Defines distribution of write data, please use `distribution` in table meta instead. | None | No | 0.2.0 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this property have no default value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't have default property values to show up, you may mean Iceberg has a default value for write.distribution-mode, but we shouldn't depend on it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Different Iceberg table may have different default values, too. I thought this property is a property to create a catalog. If not, it's ok. https://iceberg.apache.org/docs/latest/spark-writes/#writing-distribution-modes

Copy link
Contributor Author

@FANNG1 FANNG1 Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could remove default value column for this table, because most properties is not managed by Gravitino, so default value not so much meaningful here, it just shows these properties are immutable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed default value and required column from the table, since this's seems not so useful for reserved properties. cc @mchades

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can state that the default value is generated by Iceberg and provide the link.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems complicated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed default value and required column from the table, since this's seems not so useful for reserved properties. cc @mchades

It's ok for me. One Iceberg user like me will be confused about the different default value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you mean about different default value?

Copy link
Contributor

@mchades mchades left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM.

@jerryshao jerryshao merged commit a0cb174 into apache:main Sep 5, 2024
17 checks passed
github-actions bot pushed a commit that referenced this pull request Sep 5, 2024
…4808)

### What changes were proposed in this pull request?
correct Iceberg document  location is immutable not reserved

### Why are the changes needed?

Fix: #4758 

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
document
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement] how to specify Iceberg table location
4 participants