Upgrade Glue Client to AWS SDK v2 #17866

ShubhamChaurasia
Member

@ShubhamChaurasia ShubhamChaurasia commented Jun 13, 2023

Description

This PR updates the Glue client to AWS SDK v2. It mainly converts GlueHiveMetastore and TrinoGlueCatalog, used by trino-hive and trino-iceberg respectively, to v2.

  • The packages have been relocated from com.amazonaws.services.glue to software.amazon.awssdk.services.glue. Clients and requests are now created with the builder pattern.

v1 version

new DeleteDatabaseRequest().withName(database)

v2 version

DeleteDatabaseRequest.builder().name(database).build()
  • In v1, the async client extended the sync client and therefore exposed both sync and async methods. In v2, the async client has only async methods, which return CompletableFuture. To use an async method for a sync call, the caller must block on the returned future with join() or get(). Because this is a common use case, the handling has been added as a utility in io.trino.plugin.hive.metastore.glue.AwsSdkUtil.

  • The request handlers are converted to ExecutionInterceptor implementations, which intercept a request and can act at several points in its lifecycle.

  • MetricsRequester is converted to MetricsPublisher, which collects metrics and stores them in GlueMetastoreStats. Some of the metrics have changed; the queries and results below show the difference.

v1 version

select "awsclientexecutetime.alltime.avg", "awsclientexecutetime.alltime.count", "awsrequesttime.alltime.avg", "awsrequesttime.alltime.count", "awsclientretrypausetime.alltime.avg", "awsclientretrypausetime.alltime.count", "awsthrottleexceptions.totalcount", "awsrequestcount.totalcount", "awsretrycount.totalcount", "awshttpclientpoolavailablecount", "awshttpclientpoolleasedcount", "awshttpclientpoolpendingcount"  from "trino.plugin.hive.metastore.glue:name=hive,type=gluehivemetastore";

Result:

"awsclientexecutetime.alltime.avg","awsclientexecutetime.alltime.count","awsrequesttime.alltime.avg","awsrequesttime.alltime.count","awsclientretrypausetime.alltime.avg","awsclientretrypausetime.alltime.count","awsthrottleexceptions.totalcount","awsrequestcount.totalcount","awsretrycount.totalcount","awshttpclientpoolavailablecount","awshttpclientpoolleasedcount","awshttpclientpoolpendingcount"
"505.086064688","125.0","425.98159896799996","125.0","NaN","0.0","0","125","0","5","0","0"

v2 version

select "awsservicecallduration.alltime.avg", "awsservicecallduration.alltime.count", "awsapicallduration.alltime.avg", "awsapicallduration.alltime.count", "awsbackoffdelayduration.alltime.avg", "awsbackoffdelayduration.alltime.count", "awsthrottleexceptions.totalcount", "awsrequestcount.totalcount", "awsretrycount.totalcount", "awshttpclientpoolavailablecount", "awshttpclientpoolleasedcount", "awshttpclientpoolpendingcount"  from "trino.plugin.hive.metastore.glue:name=hive,type=gluehivemetastore";

Result:

"awsservicecallduration.alltime.avg","awsservicecallduration.alltime.count","awsapicallduration.alltime.avg","awsapicallduration.alltime.count","awsbackoffdelayduration.alltime.avg","awsbackoffdelayduration.alltime.count","awsthrottleexceptions.totalcount","awsrequestcount.totalcount","awsretrycount.totalcount","awshttpclientpoolavailablecount","awshttpclientpoolleasedcount","awshttpclientpoolpendingcount"
"920.0434782608696","23.0","924.3913043478261","23.0","0.0","23.0","0","23","0","1","0","0"

Co-authored-by: Karan Makhija <[email protected]>
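
The blocking pattern described in the second bullet of the description can be sketched with plain `CompletableFuture`. This is a minimal sketch, not the PR's `AwsSdkUtil` code: the class name `SyncOverAsync` and the method `awaitResult` are hypothetical, and the real helper may differ in signature and error handling.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Function;

// Hypothetical helper for illustration only; not the PR's AwsSdkUtil.
final class SyncOverAsync
{
    private SyncOverAsync() {}

    // Issues the request through an async call and blocks until the future completes.
    static <Request, Result> Result awaitResult(
            Function<Request, CompletableFuture<Result>> asyncCall,
            Request request)
    {
        try {
            return asyncCall.apply(request).join();
        }
        catch (CompletionException e) {
            // join() wraps failures in CompletionException; rethrow the original
            // cause when it is unchecked so callers see the underlying exception.
            if (e.getCause() instanceof RuntimeException) {
                throw (RuntimeException) e.getCause();
            }
            throw e;
        }
    }
}
```

Assuming a `GlueAsyncClient` named `glueClient`, a call might look like `awaitResult(glueClient::deleteDatabase, DeleteDatabaseRequest.builder().name(database).build())`. Unwrapping the cause is one possible design choice; it keeps exception handling at call sites similar to what a sync client would produce.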

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

`hive.metastore.glue.pin-client-to-current-region` is deprecated. The current region is inferred automatically when running on an EC2 instance.

@cla-bot

cla-bot bot commented Jun 13, 2023

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@github-actions github-actions bot added the delta-lake (Delta Lake connector), hive (Hive connector), iceberg (Iceberg connector), and tests:hive labels Jun 13, 2023
@electrum
Member

I'm not sure why the original code used the async version of the interface, but we can use the synchronous GlueClient in the new version.


<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>http-client-spi</artifactId>
Member

We should use apache-client since we don't need or want async behavior. Please see trino-filesystem-s3 to see how we use it with S3.

Member Author

I added a few details in this comment. trino-hive may not be able to use the sync client as suggested.
Should I convert the one in trino-iceberg (TrinoGlueCatalog) to sync?

/**
* Helper method to handle sync request with async client
*/
public static <Request, Result> Result awsSyncRequest(
Member

This shouldn't be needed if we use GlueClient.

import software.amazon.awssdk.services.glue.model.UpdateColumnStatisticsForTableRequest;
import software.amazon.awssdk.services.glue.model.UpdateDatabaseRequest;
import software.amazon.awssdk.services.glue.model.UpdatePartitionRequest;
import software.amazon.awssdk.services.glue.model.UpdateTableRequest;

import static java.util.Objects.requireNonNull;

public class GlueCatalogIdRequestHandler
Member

Since we're updating all the code, I think it would make more sense to drop this class and set the catalog ID directly when creating the request objects. @ebyhr thoughts?

Member Author

That way we would need to pass catalogId (or GlueHiveMetastoreConfig) to every part of the code that creates a Glue request. Currently it is needed in only one place, where the client is created and GlueCatalogIdRequestHandler is injected. @electrum @ebyhr thoughts?

Member

@electrum electrum Jun 21, 2023

Correct, the alternative is to update all of the call sites in GlueHiveMetastore. It looks like most requests of a given type are only made in one place, so the amount of code should be the same (probably fewer lines). The advantage of the GlueCatalogIdRequestHandler approach is that we guarantee that no places are missed, so perhaps we should leave this alone.

It is unfortunate that the Glue SDK doesn't have a common interface or subclass for Glue requests for the catalog ID methods.

Member Author

right, have kept it as is.

@ShubhamChaurasia
Member Author

> I'm not sure why the original code used the async version of the interface, but we can use the synchronous GlueClient in the new version.

@electrum thanks for the review. I am working on addressing the other comments. On this one, I think the async client originally existed because of the batchUpdatePartitionAsync, batchGetPartitionAsync, and batchCreatePartitionAsync methods in GlueHiveMetastore, which operate on batches of partitions concurrently. I think it is appropriate for them to be async. With the sync client we would have to run these operations explicitly in multiple threads. WDYT?

TrinoGlueCatalog in the Iceberg module does not have any such requirements yet and can work with the sync client. I can convert that.
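
The fan-out described above (issuing one async call per batch of partitions and then waiting for all of them, instead of managing threads explicitly) can be illustrated with plain `CompletableFuture`. This is only a sketch under stated assumptions: `BatchedAsyncCalls`, `runInBatches`, and the batch size parameter are hypothetical names and are not taken from `GlueHiveMetastore`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical illustration; not the code in GlueHiveMetastore.
final class BatchedAsyncCalls
{
    private BatchedAsyncCalls() {}

    // Splits items into fixed-size batches, starts one async call per batch,
    // then blocks until every call has completed. Results keep batch order.
    static <T, R> List<R> runInBatches(
            List<T> items,
            int batchSize,
            Function<List<T>, CompletableFuture<List<R>>> asyncBatchCall)
    {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Start all calls first so they can proceed concurrently
        List<CompletableFuture<List<R>>> futures = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            List<T> batch = items.subList(start, Math.min(start + batchSize, items.size()));
            futures.add(asyncBatchCall.apply(batch));
        }
        // Only now block, collecting results in the order the batches were submitted
        List<R> results = new ArrayList<>();
        for (CompletableFuture<List<R>> future : futures) {
            results.addAll(future.join());
        }
        return results;
    }
}
```

With a sync client, the same effect would need an explicit executor to run each batch on its own thread, which is the extra work the comment above refers to.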

@cla-bot

cla-bot bot commented Jun 15, 2023

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@electrum
Member

That makes sense. Let's change Iceberg to use the synchronous client, but keep the async client for Hive. I note that the V1 client actually used an executor internally; with V2 we can use the async client and reduce thread usage (though with higher fixed thread usage and overhead due to Netty).

@ShubhamChaurasia
Member Author

Sure @electrum, I changed the Glue calls in the Iceberg module to use the sync client.

@findinpath
Contributor

@ShubhamChaurasia pls rebase on top of master to address the code conflicts.

@ShubhamChaurasia ShubhamChaurasia force-pushed the aws-sdk-v2-upgrade/glue branch 2 times, most recently from cc68a9a to 4343d86 July 4, 2023 03:32
@ShubhamChaurasia
Member Author

@findinpath rebased and updated.

…client-to-current-region

This commit addresses review comments and includes the following changes:
1. Uses the paginator APIs provided by SDK V2 to avoid handling pagination externally.
2. Deprecates "hive.metastore.glue.pin-client-to-current-region", as the SDK automatically tries to determine the region when the code is running on an EC2 instance.
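
To show what "handling pagination externally" involves, here is a minimal, SDK-free sketch of a token-following iterable, which is roughly the bookkeeping a paginator hides from callers. It is not the SDK's implementation: `TokenPaginator`, `Page`, and `paginate` are hypothetical names, and the convention that a null token ends the sequence is an assumption made for the example.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical illustration of token-based pagination; not AWS SDK code.
final class TokenPaginator
{
    private TokenPaginator() {}

    // Assumed page shape: some items plus a continuation token (null when done).
    record Page<T>(List<T> items, String nextToken) {}

    // Lazily walks all pages, starting with a null token, until a page
    // comes back without a continuation token.
    static <T> Iterable<T> paginate(Function<String, Page<T>> fetchPage)
    {
        return () -> new Iterator<T>()
        {
            private Iterator<T> current = Collections.emptyIterator();
            private String token;
            private boolean exhausted;

            @Override
            public boolean hasNext()
            {
                // Keep fetching while the current page is drained and more pages
                // remain; this also skips over empty intermediate pages.
                while (!current.hasNext() && !exhausted) {
                    Page<T> page = fetchPage.apply(token);
                    current = page.items().iterator();
                    token = page.nextToken();
                    exhausted = token == null;
                }
                return current.hasNext();
            }

            @Override
            public T next()
            {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }
}
```

With a paginator supplied by the SDK, callers iterate over results directly and no longer write this token loop at each call site.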
@ShubhamChaurasia
Member Author

@electrum @findinpath rebased again, could you please check ?

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Jan 15, 2024
@mosabua
Member

mosabua commented Jan 15, 2024

👋 @ShubhamChaurasia @findinpath @electrum, could you please collaborate and figure out whether this PR and its approach are still valid and should be implemented?

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

@findinpath
Contributor

@ShubhamChaurasia pls rebase. We should be now in good shape to review the changes to eventually land this PR.

@ShubhamChaurasia
Member Author

@mosabua @findinpath thanks for letting me know, will give rebase a try.

@github-actions github-actions bot removed the stale label Jan 16, 2024

github-actions bot commented Feb 7, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Feb 7, 2024
@findinpath
Contributor

@ShubhamChaurasia do you still plan to continue with the work needed to land this PR?

@findinpath findinpath mentioned this pull request Feb 12, 2024
@github-actions github-actions bot removed the stale label Feb 12, 2024

github-actions bot commented Mar 5, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Mar 5, 2024
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Mar 27, 2024
@mosabua
Member

mosabua commented May 17, 2024

This PR has been made redundant by #20657, a merged update that includes many more improvements.

Labels
cla-signed delta-lake Delta Lake connector hive Hive connector iceberg Iceberg connector stale