Hive: close the fileIO client when closing the hive catalog #10771

Merged
merged 4 commits into from
Jul 25, 2024

hussein-awala
Member

This PR fixes the following warning:

WARN S3FileIO: Unclosed S3FileIO instance created by:
        org.apache.iceberg.aws.s3.S3FileIO.initialize(S3FileIO.java:359)
        org.apache.iceberg.CatalogUtil.loadFileIO(CatalogUtil.java:350)
        org.apache.iceberg.hive.HiveCatalog.initialize(HiveCatalog.java:111)
        ...

@github-actions github-actions bot added the hive label Jul 24, 2024
@Override
public void close() throws IOException {
  super.close();
  fileIO.close();
Contributor

fileIO could be null at this point if initialization failed and close is called immediately after it, so it would be good to add a null check.

Generally speaking, there's no guarantee that adding close() here actually fixes the warning you mentioned in the PR. You would most likely need a concept similar to what was introduced by #7487 / #8315.
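
The null check suggested above could look roughly like the following. This is a minimal, self-contained sketch rather than the actual HiveCatalog code: `SketchCatalog` and its `initialize(Closeable)` method are hypothetical stand-ins, and a plain `java.io.Closeable` is used in place of Iceberg's FileIO.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for a catalog that owns a closeable IO resource.
// The names here are assumptions for illustration, not Iceberg APIs.
class SketchCatalog implements Closeable {
  // Stays null if initialize() was never called or failed before assignment.
  private Closeable fileIO;

  void initialize(Closeable io) {
    this.fileIO = io;
  }

  @Override
  public void close() throws IOException {
    // Guard against close() being called right after a failed initialize().
    if (fileIO != null) {
      fileIO.close();
    }
  }
}
```

With this guard, calling `close()` on an instance that was never initialized is a no-op instead of a NullPointerException.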

Member Author

I just created a fileIOCloser to track the FileIO instances and close them when the catalog is closed. Could you recheck, please?
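
To illustrate the idea of tracking IO resources and closing them together with the catalog, here is a simplified sketch using only the Java standard library. It is not the PR's implementation: `IoTracker` and its methods are hypothetical names, `java.io.Closeable` stands in for FileIO, and this version holds strong references, so it does not release entries when an owner is garbage collected.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical tracker: remembers which IO resource belongs to which owner
// (for example, a table-operations object) and closes them all on close().
class IoTracker implements Closeable {
  private final Map<Object, Closeable> tracked = new IdentityHashMap<>();

  synchronized void put(Object owner, Closeable io) {
    // Ignore nulls so a partially initialized owner cannot break close().
    if (owner != null && io != null) {
      tracked.put(owner, io);
    }
  }

  @Override
  public synchronized void close() throws IOException {
    // Several owners may share one IO instance; close each instance once.
    Set<Closeable> unique = Collections.newSetFromMap(new IdentityHashMap<>());
    unique.addAll(tracked.values());
    tracked.clear();

    IOException first = null;
    for (Closeable io : unique) {
      try {
        io.close();
      } catch (IOException e) {
        // Keep closing the rest; report the first failure at the end.
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }
}
```

A catalog would call `put(ops, ops.io())` whenever it creates a table-operations object and call the tracker's `close()` from its own `close()`. Calling `close()` a second time is a no-op because the map has been cleared.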

@amogh-jahagirdar
Contributor

Looks like the Flink test failure is unrelated. I reopened #10356 since that test still looks flaky, and I'm retriggering CI.

Comment on lines 534 to 536
HiveTableOperations hiveTableOperations =
    new HiveTableOperations(conf, clients, fileIO, name, dbName, tableName);
fileIOCloser.put(hiveTableOperations, hiveTableOperations.io());
Contributor

Something I've wondered for a bit: is it possible to uplevel any of this logic to BaseMetastoreCatalog, so that every catalog extending it doesn't need its own Cache<TableOperations, FileIO>?

Or does that not generalize? Not a blocker, but just wanted to make sure we at least thought through it to avoid duplication.

Member Author

I tried to move it up a level, but I found some subclasses without a FileIO instance (ViewAwareTableBuilder and HadoopCatalogTableBuilder). If you have any suggestions for generalizing it, I'm open to them.

Contributor

I have a TODO on my list to generalize this, so I'll take a look (hopefully this week).

Contributor

@amogh-jahagirdar I've opened #10893 to address this.

@amogh-jahagirdar
Contributor

Thanks @hussein-awala, and thank you @nastra for reviewing!

@amogh-jahagirdar amogh-jahagirdar merged commit 7e2920a into apache:main Jul 25, 2024
60 checks passed
3 participants