Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hive Meta Store impersonation access #14464

Closed
wants to merge 1 commit into from

Conversation

BlueStalker
Copy link

This commits squash the original commits from PR
#13699
which includes the follow commits:

HMS impersonation access
refactoring to use HMS Authentication Module
add Config for multiple hms instances
Update HMS memory settings
address review comments

== RELEASE NOTES ==

General Changes
* N/A

Hive Changes
* add hive metastore impersonation access
* add multiple hive metastore instances support and metrics
* Code refactoring

If release note is NOT required, use:

== NO RELEASE NOTE ==

This commits squash the original commits from PR
prestodb#13699
which includes the follow commits:

HMS impersonation access
refactoring to use HMS Authentication Module
add Config for multiple hms instances
Update HMS memory settings
address review comments
@BlueStalker
Copy link
Author

@arhimondr This comes from #13699 and let's make another round of code reviews. Thanks. Also, FYI, @zhenxiao

@arhimondr
Copy link
Member

@BlueStalker Thank you, will start reviewing it soon.

Copy link
Member

@arhimondr arhimondr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just skimmed through. Some high level comments.

I don't see any changes to the CachingHiveMetastore. How is this supposed to work?

Let's assume the first call to getTable is done by user Alice that is authorized to get the table. The response is cached by the CachingHiveMetastore. Then the user Bob calls getTable, and the table information is returned from the CachingHiveMetastore without any security checks. This is a potential security risk.

@@ -1450,6 +1450,12 @@
<version>1.3.5-4</version>
</dependency>

<dependency>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use ApacheCommons in Presto. Please use Guava instead.

@@ -131,6 +131,8 @@ Property Name Description
to the Hive metastore service.

``hive.metastore.client.keytab`` Hive metastore client keytab location.
``hive.metastore.impersonation.enabled`` Enable metastore end-user impersonation.
``hive.metastore.impersonation.user`` Default impersonation user when communicating with Hive Metastore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what cases we fallback to the default user? It feels like if impersonation is enabled we should never contact the metastore with "default".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MetastoreHiveStatisticsProvider#getPartitionsStatistics, it gets the paritition stats.

@@ -7,6 +7,11 @@ set -euo pipefail -x
cleanup_docker_containers
start_docker_containers

# restart HMS to pickup memory settings
exec_in_hadoop_master_container cp /etc/hadoop/conf/hive-env.sh /etc/hive/conf/hive-env.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes should go directly to the docker containers. Here is the repository: https://github.com/prestodb/docker-images

@BlueStalker
Copy link
Author

Hi, Andrii, Sorry for late reply.
Actually I am wondering how the CachingHiveMetastore works here. I saw in HiveMetadataFactory, it delegates SemiTransactionalHiveMetastore into "CachingHiveMetastore.memoizeMetastore(this.metastore, perTransactionCacheMaximumSize)", the cache is per transaction scope if I understand correctly? Anywhere else we are using this cache globally?

Just skimmed through. Some high level comments.

I don't see any changes to the CachingHiveMetastore. How is this supposed to work?

Let's assume the first call to getTable is done by user Alice that is authorized to get the table. The response is cached by the CachingHiveMetastore. Then the user Bob calls getTable, and the table information is returned from the CachingHiveMetastore without any security checks. This is a potential security risk.

@arhimondr
Copy link
Member

@BlueStalker The global metastore cache can be enabled with https://github.com/prestodb/presto/blob/master/presto-hive-metastore/src/main/java/com/facebook/presto/hive/MetastoreClientConfig.java#L35.

Also It doesn't feel right to wrap with doAs at the SemiTransactionalHiveMetastore level. The doAs will only work for ThriftHiveMetastoreClient.java. Wrapping it with doAs actually breaks encapsulation. Other metastore implementation may have their own means of providing user. It feels like the information about identity has to be passed all the way down to the ThriftHiveMetastoreClient, and then the ThriftHiveMetastoreClient can implement authentication by applying doAs, as this is very ThriftHiveMetastoreClient specific.

@stale
Copy link

stale bot commented Dec 12, 2020

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants