Add an information_schema.role_authorization_descriptors table #3535
Conversation
@kokosing FYI. Note that I have not done the predicate pushdown yet. This has most of the plumbing. The caching seems to be fairly effective already, though. Although I assume there's some other optimization at work. :) Is there a place where the information_schema is documented? (I could not find any)
For the predicate pushdown I can think of three scenarios:
For (1) I'd loop over the passed values (up to a limit) and call
Added predicate pushdown as described above. For role names it supports all kinds of predicates (since we can enumerate all existing roles and then filter them); grantee names support "point values" only. To avoid surprises the grantee_type does not have to be specified: the code will retrieve both USERs and ROLEs of that name and then post-filter. The "public" role is not listed, since that mapping is not actually stored anywhere and so cannot be retrieved when querying through roles. There's a basic test. If there's a good place to test the predicate pushdown, please point me to it. Other than perhaps more tests and a place to put the documentation, this should be good for review.
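For illustration, queries along these lines would exercise that pushdown (role and grantee names are made up; the sketch just declares the query strings, Java-style):

// Any predicate on role_name can be handled: all roles are enumerated, then filtered.
String byRolePattern = "SELECT * FROM information_schema.role_authorization_descriptors WHERE role_name LIKE 'etl_%'";
// Grantee names are pushed down as point values only; grantee_type can be omitted,
// since both USERs and ROLEs of that name are fetched and post-filtered.
String byGrantees = "SELECT * FROM information_schema.role_authorization_descriptors WHERE grantee IN ('alice', 'bob')";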
Looking at the test failures...
All tests fixed now. @kokosing
Expanded the test a bit to include a predicate filter.
Just skimmed
Apologies for jumping into this late.
information_schema is covered by the SQL specification, so it should only contain tables and columns exactly as specified. Non-standard things should go in the system catalog or be connector specific.
Based on the comments and code in the table implementation, this seems to be Hive specific, so I suggest making this a Hive connector specific table like system.role_grants. You can bind an implementation of SystemTable in HiveModule.
Otherwise, making this a generic system table with an SPI change will require more thorough design.
I like the idea of a system table; I regret that I didn't think about this in the first place. However, I am not sure whether it is easy to implement predicate pushdown for that.
There's certainly precedent for non-standard information_schemas:
Without predicate pushdown I do not think that this would be useful. In fact, I cannot even find ROLES, ENABLED_ROLES, APPLICABLE_ROLES in the SQL standard: Edit:
Ok. I found a way to do this, including some more limited predicate pushdown (limit is not supported, and only single values). Before I go on with this, let's please agree whether to do so and what exactly the table should be called. :) I still think it's much better to expose this along
I think we'd need to reserve
My main concern is that the implementation in the engine is specific to Hive. I'm not familiar enough with roles in the SQL standard to know what this should look like and will need to research it. Maybe @martint knows.
Thanks @electrum . I have a change more or less ready for the system table approach (it was actually a good opportunity to understand the bootstrapping process better). The only problem left is how not to install that table when SqlStandardAccessControl is not enabled for a catalog. For now I'll hold off until I hear more. Edit: Figured that last part out as well, by binding the new system table in the SqlStandardSecurityModule. So that particular system table would only show up when SqlStandardAccessControl is enabled in a catalog.
For the
In the future I'd envision that we could expose ROLES, ENABLED_ROLES, APPLICABLE_ROLES, TABLE_PRIVILEGES (and hopefully ROLE_GRANTS) for those as well. I suppose the general question is how much of that we allow to be done via Presto and for how many of these we'd defer to the underlying databases.
I started a Slack discussion on this. I did find this in the standard (https://crate.io/docs/sql-99/en/latest/chapters/16.html#the-information-schema):
And this (http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt):
So adding a table to the
Also, from reading the standard, the
Based on discussions on Slack I propose the following: Add
Note: Technically the existing ROLES table and this table should be in a schema called
I'll post an update soon that adds the
In order to avoid a large amount of code, both schemas are served from the current InformationSchema* classes. If that is not acceptable I can disentangle that too.
Fixing some tests.
I'll look at the remaining tests when we have agreed on the final way to expose this information. I have now explored four different ways:
All of these are good options (IMHO). Let's pick one. :)
Rebased yesterday evening. Yep... (2) or (4) seem to be the best options. If we decide for (2) I'll remove the last commit and continue from there; for (4) I'd squash the two commits and then go from there. This is also a friendly (ping) :)
I didn't realize that the existing role tables in
I'm still concerned that the comments in
Thanks @electrum . That's an excellent point. These are Hive Metastore limitations. There are two options, I think.
OK... I'll go with (2) above + change the name of the table to
Edit: Do you prefer the new table be named
OK... Updated to use
I will now look into how to add a more useful method to
Good. Functional tests pass. That's a baseline. I'll have a change to push the Hive specific logic into the Hive Connector.
OK... Have a look. The Hive logic is now where it should be:
long count = 0;
if (grantees.isPresent()) {
    TOP:
    for (String grantee : grantees.get()) {
If there's a more elegant way to phrase this, let me know.
Instead of the break label, that could build() and return as well.
extract a method and use return?
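A minimal sketch of that refactoring (toy types and names, not the actual PR code): pulling the nested loop into its own method lets a plain return replace the labeled break.

import java.util.List;

final class GrantEnumeration
{
    // Extracted method: returning here replaces the labeled break (TOP:) in the nested loop.
    static long addGrantsUpToLimit(List<String> grantees, List<String> roles, long limit, List<String> output)
    {
        long count = 0;
        for (String grantee : grantees) {
            for (String role : roles) {
                output.add(grantee + " -> " + role);
                count++;
                if (count >= limit) {
                    return count; // instead of: break TOP;
                }
            }
        }
        return count;
    }
}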
Looks decent. I left initial comments.
@@ -2346,6 +2346,12 @@ public void dropRole(ConnectorSession session, String role)
    return accessControlMetadata.listRoles(session);
The schema of this table is:
role_name | grantor | grantor_type | grantee | grantee_type | is_grantable
Is this how it is defined in the standard? Does the standard define types for these columns? Do we use the same column types?
ROLE_NAME, GRANTEE, and GRANTOR are defined as INFORMATION_SCHEMA.SQL_IDENTIFIER (which looks like it's not defined in the copy of the standard I have).
I think we should go by the other tables we have (ROLES, APPLICABLE_ROLES, ENABLED_ROLES), which use VARCHAR, as I have done here.
The standard does not have grantor_type and grantee_type. This is an artifact of Presto (PrestoPrincipals can be USER or ROLE). We need those to support any connector where USERs and ROLEs come from separate domains, and we have to distinguish them.
@@ -182,7 +186,7 @@ public ConnectorTableProperties getTableProperties(ConnectorSession session, Con
}

return Optional.of(new LimitApplicationResult<>(
Do we have tests that cover cases like filter(limit(scan)) and limit(filter(scan))? In other words, I think we should consider banning filter pushdown if there is already a limit applied, as I guess it is buggy today.
That's something I wanted to ask. It seems we can push down either a filter or a limit, but not both.
I debugged it and found that the framework is doing the right thing. When a filter is present it does not push the limit down (i.e. does not call applyLimit), and vice versa. So that works fine, and I assume the "framework" has tests for that.
The framework might have such tests, but for regular tables. I think we need to add new ones for information_schema tables.
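For example, the two query shapes to cover could look like this (a sketch only; the grantee value is made up, and the strings would be fed to assertQuery/assertMetadataCalls as elsewhere in TestInformationSchemaConnector):

// limit(filter(scan)): predicate first, LIMIT on top
String limitOverFilter = "SELECT * FROM information_schema.role_authorization_descriptors WHERE grantee = 'user1' LIMIT 3";
// filter(limit(scan)): LIMIT in a subquery, predicate on the outside
String filterOverLimit = "SELECT * FROM (SELECT * FROM information_schema.role_authorization_descriptors LIMIT 3) WHERE grantee = 'user1'";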
Thanks for the review @kokosing . Pushed a new change:
Didn't change:
Enhanced MockConnectorFactory and added role tests to TestInformationSchema.
More comments...
"('test_r_a_d1', null, null, 'user', 'USER', 'NO')"); | ||
|
||
assertQuery( | ||
"SELECT * FROM information_schema.role_authorization_descriptors LIMIT 1000000000", |
use LIMIT 5
I tried specifically with a very large number to make sure it does not cause any issues.
    return;
}

for (RoleGrant grant : metadata.listAllRoleGrants(session, catalogName, roles, grantees, limit)) {
Do you think it is doable to implement predicate and limit pushdown only here (instead of in SqlStandardAccessControlMetadata)? That way it would work out of the box for any other connector, or even for any access control metadata in Hive. That would possibly require adding different methods to metadata (more granular ones).
WDYT?
That's what I had first, but @electrum correctly pointed out that this is very Hive specific.
So in this design we just pass the predicates down and then let the connectors do what's best for them. (For example, only HMS has this weird limitation that the only APIs are getting roles for a single principal, or principals for a single role.)
I see, but I am afraid that we went too far with making this too Hive specific, so it may be difficult to reuse in the future, as one might need to reimplement predicate pushdown for another use case.
HMS has this weird limitation that the only APIs are getting roles for a single principal, or principals for a single role
Could we expose this in the connector metadata SPI? I would imagine that in the future we could also have batch versions of these methods. When a batch version is not implemented (like in Hive) we could fall back to using the methods that access roles for a given principal or principals for a given role.
WDYT?
ConnectorMetadata now has two relevant methods: (1) list all roles, and (2) list all role grants (with optional filter criteria).
I think connectors will be quite different on this front. Databases (Postgres, Oracle, MySQL, Phoenix, etc.) have their own information_schema, and the connector would just turn this into a query on that information_schema; in that case we'd want to pass the predicate through (both on role and principal).
Others (like Hive) have limited and different APIs, and that needs some logic to be "optimal". Yet other connectors have other ways to query their role metadata.
Both methods may or may not be batched, but that is now completely hidden. The connector decides how to implement them. Note that the Hive implementation is completely hidden in the Hive metadata, and the only part the information_schema needs is these two methods to call. Everything else is automated (extracting the predicate, passing it down, etc.).
If we do not want to leak Hive specifics into the information_schema code, I do not see how to do it fundamentally differently. If I put specific logic any higher than ConnectorMetadata we'd leak the details of Hive up.
But maybe I don't understand what you mean...
Perhaps you could outline what methods you would add to ConnectorMetadata, and how the code in InformationSchemaPageSource would use them...?
Thanks!
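For reference, a hedged sketch of the shape of those two methods (this is not the exact SPI; the parameter list is an assumption based on the call sites in this PR, and the session/catalog arguments are omitted):

import java.util.Optional;
import java.util.OptionalLong;
import java.util.Set;

// Standalone sketch of the two calls described above, not the real interface.
interface RoleGrantListing<RoleGrant>
{
    // (1) list all roles
    Set<String> listRoles();

    // (2) list all role grants, with optional pushed-down filters and limit
    Set<RoleGrant> listAllRoleGrants(
            Optional<Set<String>> roles,     // role_name values pushed down, if any
            Optional<Set<String>> grantees,  // grantee values pushed down, if any
            OptionalLong limit);             // LIMIT pushed down, if any
}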
Thanks!
Thanks for the review @kokosing . Pushed another update.
Also added predicate pushdown counting, which also verifies that LIMIT is not pushed together with the other predicates.
Pushed a rebase. Let's get this over the finishing line. :)
Let's solve the: #3535 (comment)
Thanks @kokosing . See my comment: #3535 (comment)
A few minor comments.
Please ping me once it is ready to be merged.
    granteesPushedCounter.incrementAndGet();
}
if (limit.isPresent()) {
    limitPushedCounter.incrementAndGet();
listRowGrantsCallsCounter, rolesPushedCounter, granteesPushedCounter and limitPushedCounter should be grouped into a single bean that represents the interaction with the metadata call. Typically we have a single counter per method; you have 4. Using a bean we would maintain a single counter per method, where that one counter is just complex.
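A minimal sketch of that grouping (class and method names are guesses, not the code that ended up in the PR):

import java.util.concurrent.atomic.AtomicLong;

// One "counter" per metadata method: a single listAllRoleGrants counter that is just complex.
public class ListAllRoleGrantsCalls
{
    private final AtomicLong calls = new AtomicLong();
    private final AtomicLong rolesPushed = new AtomicLong();
    private final AtomicLong granteesPushed = new AtomicLong();
    private final AtomicLong limitPushed = new AtomicLong();

    public void record(boolean rolesPresent, boolean granteesPresent, boolean limitPresent)
    {
        calls.incrementAndGet();
        if (rolesPresent) {
            rolesPushed.incrementAndGet();
        }
        if (granteesPresent) {
            granteesPushed.incrementAndGet();
        }
        if (limitPresent) {
            limitPushed.incrementAndGet();
        }
    }
}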
I added beans: a Counter bean and a Count bean to compare against.
@kokosing , pushed a new update. Let me know whether the bean approach in CountingMockConnector is what you had in mind.
TestThrottledAsyncQueue seems unrelated and passes locally.
Let me know whether the bean approach in CountingMockConnector is what you had in mind.
Yes. I left a few comments there.
Also, I would like to merge this pull request after the current release goes out. I don't want to mess with the current release-related work.
assertMetadataCalls(
        "SELECT count(*) FROM (SELECT * FROM test_catalog.information_schema.role_authorization_descriptors WHERE grantee = 'user5' LIMIT 1)", "VALUES 1",
        new MetadataCallsCount()
                .withRoleGrantCount(1, 0, 1, 0));
That is very nice. How about going one step further and making this more fluent, like the following?
new MetadataCallsCount().with(roleGrantCount(/*method calls count*/ 1).withGranteesPushedDown(1));
I had those methods before. It just seemed to make more sense now to compare the beans...
Maybe it was better the way I had it before?
In the end we have certain metrics we want to count for the connector; those are the atomic counters.
I think I misunderstood what you wanted... MetadataCallsCount is already a "bean", and so having the individual members for counts is fine...?
    }
}

public static class RoleGrantCounter
RoleGrantCounter?
Yeah... Naming is hard. :)
private final AtomicLong listRowGrantsCallsCounter = new AtomicLong();
private final AtomicLong rolesPushedCounter = new AtomicLong();
private final AtomicLong granteesPushedCounter = new AtomicLong();
private final AtomicLong limitPushedCounter = new AtomicLong();
I would prefer the bean to be immutable. incrementListRoleGrants could return a new instance. Then you don't need two classes.
The AtomicLongs for the other counters are mutable. I think this counter would be inherently mutable.
How do you handle atomicity between calls? Have the counter bean be an atomic reference and you update the reference with each new instance?
We're going to need 2 beans, right? One for counting and one to compare against during the test.
The more I think about it, the more I think the previous code was better.
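For what it's worth, one way the AtomicReference idea could look, as a minimal sketch with made-up names (not a claim about which version is better):

import java.util.concurrent.atomic.AtomicReference;

// Immutable counts bean: "incrementing" returns a new instance, and atomicity between
// calls comes from AtomicReference.updateAndGet swapping in that new instance.
final class RoleGrantCounts
{
    final long listRoleGrantsCalls;
    final long granteesPushed;

    RoleGrantCounts(long listRoleGrantsCalls, long granteesPushed)
    {
        this.listRoleGrantsCalls = listRoleGrantsCalls;
        this.granteesPushed = granteesPushed;
    }

    RoleGrantCounts incrementListRoleGrants()
    {
        return new RoleGrantCounts(listRoleGrantsCalls + 1, granteesPushed);
    }
}

class RoleGrantCountsHolder
{
    private final AtomicReference<RoleGrantCounts> counts = new AtomicReference<>(new RoleGrantCounts(0, 0));

    void recordCall()
    {
        counts.updateAndGet(RoleGrantCounts::incrementListRoleGrants);
    }
}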
The schema of this table is:
role_name | grantor | grantor_type | grantee | grantee_type | is_grantable
For Hive, queries on this table are translated into calls to metastore.get_principals_in_role or metastore.get_role_grants_for_principal, with proper predicate pushdown to avoid excessive requests to the metastore. grantor and grantor_type are not yet implemented and are always NULL.
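A hedged sketch of that dispatch, with hypothetical helper names standing in for the two metastore calls (the actual logic lives in the Hive connector's access control metadata):

import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

// Sketch: HMS only exposes "grants for one principal" and "principals in one role",
// so pushed-down values are expanded into one metastore call each.
abstract class RoleGrantLookup<RoleGrant>
{
    // assumed wrappers around get_role_grants_for_principal / get_principals_in_role
    protected abstract Set<RoleGrant> grantsForGrantee(String grantee);
    protected abstract Set<RoleGrant> granteesOfRole(String role);
    protected abstract Set<String> listAllRoles();

    Set<RoleGrant> listRoleGrants(Optional<Set<String>> grantees, Optional<Set<String>> roles)
    {
        Set<RoleGrant> result = new LinkedHashSet<>();
        if (grantees.isPresent()) {
            // one metastore call per pushed-down grantee
            for (String grantee : grantees.get()) {
                result.addAll(grantsForGrantee(grantee));
            }
        }
        else {
            // otherwise enumerate roles (the pushed-down ones, or all of them) and expand each
            for (String role : roles.orElseGet(this::listAllRoles)) {
                result.addAll(granteesOfRole(role));
            }
        }
        return result;
    }
}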
Pushed another update. @kokosing
Merged, thanks!
Thank you @kokosing . And thanks for the detailed review on this - it makes me feel really good about the quality of the Presto code base!
Update:
this is now called information_schema.role_authorization_descriptors
The schema of this table is:
| role_name | grantor | grantor_type | grantee | grantee_type | is_grantable |
Connectors can use that to make role grant information available to (admin) users by implementing ConnectorMetadata.listAllRoleGrants, with an option to filter by predicates for optimization.
Currently this is implemented in the Hive Connector only.
grantor and grantor_type are not yet supported and are always null (but they could be added in a separate PR).
Continued from #3232