Added ability to have unique table location for each iceberg table #6063

Merged
merged 1 commit into from
Aug 3, 2021

Contributor

@sshkvar sshkvar commented Nov 23, 2020

#5632

Added new iceberg configuration property iceberg.unique-table-location

By default this property is false, so the table directory has the same name as the table.
When iceberg.unique-table-location = true, a unique UUID is appended to the table directory name, so each table gets a unique location.
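
As a rough illustration of the behaviour described above, the sketch below derives a directory name with and without the unique suffix. It is not the connector's code: the class and method names are hypothetical, and the suffix format (a dash followed by a UUID with its dashes removed) is an assumption inferred from the `-[0-9a-f]{32}` pattern checked in the test later in this thread.

```java
import java.util.UUID;

// Hypothetical helper for illustration only; not the connector's implementation.
final class TableLocationSketch
{
    private TableLocationSketch() {}

    // With uniqueLocation = false the directory is named after the table.
    // With uniqueLocation = true a dash and a random UUID (32 hex digits,
    // dashes stripped) are appended. The suffix format is an assumption.
    static String tableDirectoryName(String tableName, boolean uniqueLocation)
    {
        if (!uniqueLocation) {
            return tableName;
        }
        String suffix = UUID.randomUUID().toString().replace("-", "");
        return tableName + "-" + suffix;
    }

    public static void main(String[] args)
    {
        System.out.println(tableDirectoryName("orders", false));
        System.out.println(tableDirectoryName("orders", true));
    }
}
```

With the flag off, two tables created under the same name at different times would map to the same directory name; with it on, each creation gets its own directory.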

Member

@electrum electrum left a comment

A few minor comments, otherwise looks good

@sshkvar
Contributor Author

sshkvar commented Nov 24, 2020

> A few minor comments, otherwise looks good

@electrum Thanks for the comments, I have addressed all of them.

@sshkvar
Contributor Author

sshkvar commented Nov 26, 2020

@electrum is it possible to have these changes in the next release?

@sshkvar
Contributor Author

sshkvar commented Dec 3, 2020

@electrum could you please approve this PR if everything looks good?

@sshkvar
Contributor Author

sshkvar commented Dec 7, 2020

@electrum I would really appreciate your review of this PR.

Member

@raunaqmorarka raunaqmorarka left a comment

Please add a test to AbstractTestIcebergSmoke to verify that a UUID is added to the table location.

Does iceberg.unique-table-location have any purpose beyond making it possible for Trino to delete data when a table is dropped?
If not, it would be easier to have just one flag, iceberg.delete-files-on-table-drop, and implement both the unique table location and data removal behind that flag, rather than requiring users to set two flags.

Should we also provide this as a table property? From the user's point of view, it is easier to create a "managed" table using a table property than to change the config to set a catalog property.

Member

@findepi findepi left a comment

@sshkvar thanks for working on this.

Can you please rebase?

We should also have a test for this that would cover
CREATE + DROP
CREATE + RENAME + DROP

showing that table data gets correctly dropped.
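
To make the two requested scenarios concrete, here is a toy in-memory model rather than a real Trino test. Everything in it is hypothetical: the class, its methods, and in particular the assumption that a rename changes only the table name and leaves the storage location untouched. It only shows what "data gets correctly dropped" would mean in each sequence.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy in-memory model of a catalog; every name here is hypothetical.
final class DropScenarioSketch
{
    private final Map<String, String> locationByTable = new HashMap<>();
    private final Set<String> directories = new HashSet<>();

    void create(String table, String location)
    {
        locationByTable.put(table, location);
        directories.add(location);
    }

    // Assumption: renaming changes the table name only, not its location.
    void rename(String from, String to)
    {
        locationByTable.put(to, locationByTable.remove(from));
    }

    // Dropping a table removes the directory it points at.
    void drop(String table)
    {
        directories.remove(locationByTable.remove(table));
    }

    int remainingDirectories()
    {
        return directories.size();
    }

    // Scenario 1: CREATE + DROP
    static int createAndDrop()
    {
        DropScenarioSketch catalog = new DropScenarioSketch();
        catalog.create("t", "t-location");
        catalog.drop("t");
        return catalog.remainingDirectories();
    }

    // Scenario 2: CREATE + RENAME + DROP
    static int createRenameAndDrop()
    {
        DropScenarioSketch catalog = new DropScenarioSketch();
        catalog.create("t", "t-location");
        catalog.rename("t", "t_renamed");
        catalog.drop("t_renamed");
        return catalog.remainingDirectories();
    }

    public static void main(String[] args)
    {
        System.out.println("directories left after CREATE + DROP: " + createAndDrop());
        System.out.println("directories left after CREATE + RENAME + DROP: " + createRenameAndDrop());
    }
}
```

A real test would run the same sequences through the query runner and check the file system instead of an in-memory set.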

Member

@findepi findepi left a comment

Thanks for adding the test.
Overall LGTM. Some editorial comments in the test.

@findepi findepi requested review from electrum and phd3 July 29, 2021 07:20
@findepi
Member

findepi commented Jul 29, 2021

cc @rdblue

Comment on lines 70 to 73
public static DistributedQueryRunner createIcebergQueryRunner(Map<String, String> extraProperties,
FileFormat format,
List<TpchTable<?>> tables,
Optional<File> metastoreDirectory)
Member

Please update formatting

Contributor Author

Thanks for the link to code style.
Formatting updated

public static DistributedQueryRunner createIcebergQueryRunner(
Map<String, String> extraProperties,
FileFormat format,
List<TpchTable<?>> tables,
- Optional<File> metastoreDirectory)
+ Optional<File> metastoreDirectory,
+ Optional<Map<String, String>> icebergPropertiesOverride)
Member

We call it connectorProperties in other XxxQueryRunner classes.

also, no need for optional, since empty map has the same meaning

           Map<String, String> connectorProperties)

also, please add it right after extraProperties (also seems usual)
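
A small sketch of why the `Optional` wrapper is redundant here: overlaying an empty map onto the defaults changes nothing, so "no overrides" needs no separate representation. The class name, method name, and the `example.default` key are made up for illustration; only `iceberg.unique-table-location` comes from this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the property handling inside a query runner factory.
final class ConnectorPropertiesSketch
{
    private ConnectorPropertiesSketch() {}

    // Starts from an illustrative default and overlays the caller's properties.
    // Passing an empty map simply leaves the defaults as they are.
    static Map<String, String> effectiveProperties(Map<String, String> connectorProperties)
    {
        Map<String, String> properties = new HashMap<>();
        properties.put("example.default", "value"); // made-up default, not a real property
        properties.putAll(connectorProperties);
        return properties;
    }

    public static void main(String[] args)
    {
        System.out.println(effectiveProperties(Map.of()));
        System.out.println(effectiveProperties(Map.of("iceberg.unique-table-location", "true")));
    }
}
```

Callers that need no overrides pass `Map.of()` (or `ImmutableMap.of()` in code that already uses Guava), which keeps the signature free of a second "absent" state.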

Contributor Author

done

Comment on lines 52 to 53
File tempDir = Files.createTempDirectory("test_iceberg_v2").toFile();
metastoreDir = new File(tempDir, "iceberg_data");
Member

Why two levels of nesting?
Would it be fine to have just

metastoreDir = Files.createTempDirectory("....");

?
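
For illustration, the single-level variant asked about here might look like the sketch below. `Files.createTempDirectory` returns a `Path`, so a `File`-typed field such as `metastoreDir` still needs `toFile()`. The class name, method name, and the `sketch_metastore` prefix are placeholders, not taken from the PR.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical helper showing a single temp directory with no nested subdirectory.
final class TempMetastoreDirSketch
{
    private TempMetastoreDirSketch() {}

    static File createMetastoreDir()
    {
        try {
            // Creates the directory on disk and returns its Path; toFile() converts it.
            return Files.createTempDirectory("sketch_metastore").toFile();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        File metastoreDir = createMetastoreDir();
        System.out.println("created: " + metastoreDir.getAbsolutePath());
        // Clean up the (empty) directory again.
        System.out.println("deleted: " + metastoreDir.delete());
    }
}
```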

Member

however, it seems the test doesn't need to know the metastoreDir, so instead of setting it up (and cleaning it later), it should be sufficient to pass Optional.empty() for the metastoreDir, right?

Contributor Author

We need metastoreDir because it is used in createTestingFileHiveMetastore(metastoreDir),
but I removed the two-level nesting.


return createIcebergQueryRunner(
ImmutableMap.of(),
ORC,
Member

use default

new IcebergConfig().getFileFormat()

Member

as a follow up we should remove format parameter and use the new connectorProperties instead

Contributor Author

replaced with new IcebergConfig().getFileFormat()

assertTrue(table.isPresent(), "Table should exists");
String location = table.get().getStorage().getLocation();
- assertTrue(location.matches(format(".*%s-[0-9a-f]{32}", TABLE)), "Table location should have UUID suffix");
+ assertTrue(location.matches(format(".*%s-[0-9a-f]{32}", "table_with_uuid")), "Table location should have UUID suffix");
Member

Do not use assertTrue with complex expressions. In case of failure, it produces an unhelpful message.

Use

[org.assertj.core.api.Assertions.] assertThat(location).matches(....);

instead
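
The point about failure messages can be shown without any test library. The sketch below uses a hypothetical `assertMatches` helper written with the JDK only; it is not AssertJ and does not reproduce AssertJ's message format. It merely illustrates the idea that an assertion which knows both the value and the pattern can report them, whereas a boolean check only knows that the result was false.

```java
// Hypothetical JDK-only helper illustrating a value-aware assertion; not AssertJ.
final class AssertionMessageSketch
{
    private AssertionMessageSketch() {}

    // Reports both the actual value and the pattern when the match fails.
    static void assertMatches(String actual, String regex)
    {
        if (!actual.matches(regex)) {
            throw new AssertionError("Expected \"" + actual + "\" to match pattern: " + regex);
        }
    }

    // Returns the failure message, or an empty string when the assertion passes.
    static String failureMessage(String actual, String regex)
    {
        try {
            assertMatches(actual, regex);
            return "";
        }
        catch (AssertionError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args)
    {
        String pattern = ".*table_with_uuid-[0-9a-f]{32}";
        // A location without the suffix: the message names the offending value.
        System.out.println(failureMessage("/tmp/table_with_uuid", pattern));
    }
}
```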

Contributor Author

done

@findepi
Member

findepi commented Aug 3, 2021

@sshkvar could you please remove the merge commits from the git history?
you should be able to do this with git rebase.

@sshkvar
Contributor Author

sshkvar commented Aug 3, 2021

> @sshkvar could you please remove the merge commits from the git history?
> you should be able to do this with git rebase.

done

@findepi
Member

findepi commented Aug 3, 2021

@sshkvar thanks!
is the separation into 4 commits significant here? if not, can you squash them into a single commit?
(again, git rebase may be useful here)

@sshkvar
Contributor Author

sshkvar commented Aug 3, 2021

@findepi I squashed the changes into a single commit

@findepi findepi merged commit bea74d5 into trinodb:master Aug 3, 2021
@findepi findepi added this to the 361 milestone Aug 3, 2021
@findepi findepi mentioned this pull request Aug 3, 2021
10 tasks
@findepi
Member

findepi commented Aug 3, 2021

Merged, thanks!

@sshkvar
Contributor Author

sshkvar commented Aug 3, 2021

> Merged, thanks!

Thank you for the review!
