Iceberg: Add support for creating and dropping tables using Iceberg object store #20555
base: master
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla
Re-created from #20516
cda8e04 to 4f83e8a
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
Any thoughts, @amogh-jahagirdar?
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.
I added the stale-ignore label so the PR stays open. Please rebase as a next step.
6ed669a to a7454d1
Hi @mosabua, I pushed the rebase. It looks like the failed check timed out, but I'm not able to restart it.
Please ask for help on the #iceberg or the #core-dev channel.
ed46a94 to e452e6d
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergConfig.java
Hi @findinpath, did you have a chance to take a look at my updates?
7473dad to a99dde1
* - `iceberg.data-location`
  - Sets the path that data files will be written to.
  - table location + /data
I don't understand why this property is needed for catalog level. When do we set this globally?
Ah, my mistake. I did mean to add this globally so it doesn't necessarily need to be configured per table. I'll add this in
@ebyhr actually, looking at it, I have this property set up the same way `iceberg.file-format` is set up, so it should be configurable at the catalog level already.
`iceberg.file-format` is unrelated. I'm asking about the scenario in which we would want to set it globally.
Please remove this property if you are not sure. We can add it in a follow-up.
Ah, gotcha. I thought you were asking if it's possible to set it globally.
The use case for being able to set it globally would be if you want every table to be created with the object store file layout to preemptively avoid running into rate limiting. Without being able to set it globally, you would have to manually update tables that run into rate limiting which can be time consuming and prone to missing some tables that might run into the issue in the future.
Setting it globally would also be helpful if you want to have all of your tables follow the same setup, instead of some using the object store file layout and some using the default
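To make the catalog-level versus table-level discussion concrete, here is a minimal sketch of the fallback being described: use the table's own `data_location` when it is set, otherwise fall back to a catalog-wide default. The class and method names are hypothetical and are not taken from this PR, and the exact precedence the PR implements is an assumption here.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration only; not code from this PR.
final class DataLocationResolutionSketch {
    // Assumed precedence: a table-level "data_location" wins, and the
    // catalog-level default (iceberg.data-location) is used only when the
    // table does not set one.
    static Optional<String> resolve(Map<String, String> tableProperties, Optional<String> catalogDefault) {
        return Optional.ofNullable(tableProperties.get("data_location"))
                .or(() -> catalogDefault);
    }

    public static void main(String[] args) {
        Optional<String> catalogDefault = Optional.of("s3://mybucket/datafiles");
        // Table without an override picks up the catalog default.
        System.out.println(resolve(Map.of(), catalogDefault));
        // Table with its own location keeps it.
        System.out.println(resolve(Map.of("data_location", "s3://otherbucket/data"), catalogDefault));
    }
}
```

With a catalog default in place, newly created tables get the shared location without anyone having to remember to set it per table, which is the use case argued for above.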
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSessionProperties.java
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTableProperties.java
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergTableWithObjectStore.java
// TODO: support path override in Iceberg table creation: https://github.com/trinodb/trino/issues/8861
if (table.properties().containsKey(OBJECT_STORE_PATH) ||
        table.properties().containsKey("write.folder-storage.path") || // Removed from Iceberg as of 0.14.0, but preserved for backward compatibility
        table.properties().containsKey(WRITE_METADATA_LOCATION) ||
why was WRITE_METADATA_LOCATION removed?
Not sure I totally follow. I'm not removing the property, just its usage here, since with this change Trino will be able to drop tables created with Iceberg's object storage file layout.
Adding support for updating this table property would be useful. It is common to create a table and only later encounter object storage rate limiting. Not allowing updates would make the workflow for fixing rate limiting a pain. Not sure if updates should go in this PR or another one @ebyhr, but heads up.
892a76d to 38fe6eb
38fe6eb to 78224d6
That's a good callout. I'll add it in this PR.
@jakelong95 I already push the code to support
Yeah, saw that you added object_store_enabled. I'm adding support for modifying data_location since the two properties are related.
Description
Currently, Trino is only able to write to and read from tables that use Iceberg's `ObjectStorageLocationProvider`, but is unable to create or drop tables using the location provider.

This PR enables Trino to create tables using Iceberg's object storage by adding the following properties:
- `iceberg.object-store.enabled` - Corresponds to Spark's `write.object-storage.enabled`, which enables use of Iceberg's `ObjectStorageLocationProvider`
- `iceberg.data-location` - Corresponds to Spark's `write.data.path`, which sets where Iceberg data files will be written

Enabling the object store property and setting the data path will cause Iceberg to provide data file locations prefixed by a deterministic hash generated from the file name in the location specified by `write.data.path`, which will help reduce throttling from cloud storage systems like S3 by evenly distributing files across multiple prefixes. Iceberg's own documentation on this feature has more information.

For example, without the object store enabled you would get the following locations for data files, all under the same prefix `iceberg-tables/myschema/mytable/data`:
s3://mybucket/iceberg-tables/myschema/mytable/data/file1.parquet
s3://mybucket/iceberg-tables/myschema/mytable/data/file2.parquet
s3://mybucket/iceberg-tables/myschema/mytable/data/file3.parquet
But if you enable the object store and set the data path to `s3://mybucket/datafiles`, you would get the following locations, each in its own prefix:
s3://mybucket/datafiles/<file1 hash>/myschema/mytable/file1.parquet
s3://mybucket/datafiles/<file2 hash>/myschema/mytable/file2.parquet
s3://mybucket/datafiles/<file3 hash>/myschema/mytable/file3.parquet
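As a rough illustration of the two layouts above, the following sketch builds both kinds of path for the same file names. The class and method names are hypothetical, and CRC32 is only a stand-in so the example runs on the standard library; it is not the hash function Iceberg's `ObjectStorageLocationProvider` uses, so the hash values will not match real table paths.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch of the path layouts described above; not Iceberg or Trino code.
final class ObjectStoreLayoutSketch {
    // Default layout: every data file shares the "<table location>/data" prefix.
    static String defaultPath(String tableLocation, String fileName) {
        return tableLocation + "/data/" + fileName;
    }

    // Object store layout: a hash derived deterministically from the file name
    // is placed directly after the configured data location.
    // CRC32 is an assumption made for illustration, NOT Iceberg's actual hash.
    static String objectStorePath(String dataLocation, String schema, String table, String fileName) {
        CRC32 crc = new CRC32();
        crc.update(fileName.getBytes(StandardCharsets.UTF_8));
        String hash = String.format("%08x", crc.getValue());
        return dataLocation + "/" + hash + "/" + schema + "/" + table + "/" + fileName;
    }

    public static void main(String[] args) {
        for (String file : new String[] {"file1.parquet", "file2.parquet", "file3.parquet"}) {
            System.out.println(defaultPath("s3://mybucket/iceberg-tables/myschema/mytable", file));
            System.out.println(objectStorePath("s3://mybucket/datafiles", "myschema", "mytable", file));
        }
    }
}
```

Because the hash depends only on the file name, the same file always maps to the same location, while different files land under different leading prefixes, which is what spreads requests across the object store.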
Additional context and related issues
This PR maintains compatibility with Spark by using the table properties `write.object-storage.enabled` and `write.data.path`, which had previously been set up in #8573 to allow Trino to write to Iceberg tables using the `ObjectStorageLocationProvider`.
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: