-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#4758] correct Iceberg document location is immutable not reserved #4808
Conversation
docs/lakehouse-iceberg-catalog.md
Outdated
| `location` | Iceberg location for table storage. | None | No | 0.2.0 | | ||
| `format` | Iceberg table format like `iceberg/parquet`. | None | No | 0.2.0 | | ||
| `format-version` | Iceberg meta data format version. | None | No | 0.2.0 | | ||
| `provider` | Table provider like `iceberg`. | None | No | 0.2.0 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
provider
is the concept of Spark. I'm not sure whether it should be a property of Iceberg catalogs. Could you give more resources about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually format version can be changed. v1 can change to v2. But v2 can't change to v1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can change format, too. Iceberg can mix to use multiple formats.
For example, partition 1 use parquet. partiton 2 use orc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
location can be changed too. For one Iceberg table, they can have different locations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, they can changed by Iceberg, but we doesn't test and verify the action, we'd better keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a consensus on the changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users can change underlying Iceberg catalog these properties, you can't make them immutable. It will make users confusing.
docs/lakehouse-iceberg-catalog.md
Outdated
| Configuration item | Description | Default value | Required | Since Version | | ||
|--------------------|----------------------------------------------|---------------|----------|---------------| | ||
| `location` | Iceberg location for table storage. | None | No | 0.2.0 | | ||
| `format` | Iceberg table format like `iceberg/parquet`. | None | No | 0.2.0 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kinds of format do we support?
docs/lakehouse-iceberg-catalog.md
Outdated
| `location` | Iceberg location for table storage. | None | No | 0.2.0 | | ||
| `format` | Iceberg table format like `iceberg/parquet`. | None | No | 0.2.0 | | ||
| `format-version` | Iceberg meta data format version. | None | No | 0.2.0 | | ||
| `provider` | Table provider like `iceberg`. | None | No | 0.2.0 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The provider seems useless.
docs/lakehouse-iceberg-catalog.md
Outdated
| `location` | Iceberg location for table storage. | None | No | 0.2.0 | | ||
| `format` | Iceberg table format like `iceberg/parquet`. | None | No | 0.2.0 | | ||
| `format-version` | Iceberg meta data format version. | None | No | 0.2.0 | | ||
| `provider` | Table provider like `iceberg`. | None | No | 0.2.0 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users can change underlying Iceberg catalog these properties, you can't make them immutable. It will make users confusing.
@jerqi you could create seperate issue to change the Iceberg properties, it's not related to this PR. |
docs/lakehouse-iceberg-catalog.md
Outdated
| `cherry-pick-snapshot-id` | Selecting a specific snapshot in a merge operation. | None | No | 0.2.0 | | ||
| `sort-order` | Iceberg table sort order, please use `SortOrder` in table meta instead. | None | No | 0.2.0 | | ||
| `identifier-fields` | The identifier fields for defining the table. | None | No | 0.2.0 | | ||
| `write.distribution-mode` | Defines distribution of write data, please use `distribution` in table meta instead. | None | No | 0.2.0 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this property have no default value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't have default property values to show up, you may mean Iceberg has a default value for write.distribution-mode
, but we shouldn't depend on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Different Iceberg table may have different default values, too. I thought this property is a property to create a catalog. If not, it's ok. https://iceberg.apache.org/docs/latest/spark-writes/#writing-distribution-modes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we could remove default value column for this table, because most properties is not managed by Gravitino, so default value not so much meaningful here, it just shows these properties are immutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed default value
and required
column from the table, since this's seems not so useful for reserved properties. cc @mchades
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can state that the default value is generated by Iceberg and provide the link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed
default value
andrequired
column from the table, since this's seems not so useful for reserved properties. cc @mchades
It's ok for me. One Iceberg user like me will be confused about the different default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what you mean about different default value
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM.
What changes were proposed in this pull request?
correct Iceberg document location is immutable not reserved
Why are the changes needed?
Fix: #4758
Does this PR introduce any user-facing change?
no
How was this patch tested?
document