
Add an information_schema.role_authorization_descriptors table #3535

Merged: 1 commit merged into trinodb:master on Jun 1, 2020

Conversation

@lhofhansl (Member) commented Apr 24, 2020

Update:
this is now called information_schema.role_authorization_descriptors
The schema of this table is:
| role_name | grantor | grantor_type | grantee | grantee_type | is_grantable |

Connectors can use that to make role grant information available to (admin) users by implementing ConnectorMetadata.listAllRoleGrants, with an option to filter by predicates for optimization.

Currently this is implemented only in the Hive connector.

grantor and grantor_type are not yet supported and are always null (but they could be added in a separate PR).

Continued from #3232
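For illustration, a minimal sketch of how a connector might implement the listAllRoleGrants hook mentioned above, assuming an in-memory grant set. The parameter shapes follow this thread, but the class and helper names (InMemoryRoleGrants, allGrants) are made up and this is not the merged code:

import io.prestosql.spi.connector.ConnectorSession;
import io.prestosql.spi.security.RoleGrant;

import java.util.Optional;
import java.util.OptionalLong;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch only: filter an in-memory set of grants by the pushed-down role and
// grantee names, and honour the optional limit. allGrants() is a placeholder.
public class InMemoryRoleGrants
{
    public Set<RoleGrant> listAllRoleGrants(
            ConnectorSession session,
            Optional<Set<String>> roles,
            Optional<Set<String>> grantees,
            OptionalLong limit)
    {
        return allGrants().stream()
                .filter(grant -> roles.map(names -> names.contains(grant.getRoleName())).orElse(true))
                .filter(grant -> grantees.map(names -> names.contains(grant.getGrantee().getName())).orElse(true))
                .limit(limit.orElse(Long.MAX_VALUE))
                .collect(Collectors.toSet());
    }

    private Set<RoleGrant> allGrants()
    {
        throw new UnsupportedOperationException("placeholder for the connector's own storage");
    }
}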

@cla-bot added the cla-signed label Apr 24, 2020
@lhofhansl (Member Author):

@kokosing FYI

Note that I have not done the predicate pushdown, yet. This has most of the plumbing. The caching seems to be fairly effective already, though.

presto> select count(*) from information_schema.role_grants,information_schema.role_grants,information_schema.role_grants,information_schema.role_grants,information_schema.role_grants,information_schema.role_grants,information_schema.role_grants,information_schema.role_grants,information_schema.role_grants,information_schema.role_grants,information_schema.role_grants;
   _col0   
-----------
 362797056 
(1 row)

Query 20200424_023942_00015_7issh, FINISHED, 1 node
Splits: 198 total, 198 done (100.00%)
0:00 [126 rows, 40B] [477 rows/s, 152B/s]

Although I assume there's some other optimization at work. :)

Is there a place where the information_schema is documented? (I could not find any)

@lhofhansl (Member Author):

For the predicate push down I can think of three scenarios:

  1. point values for role_name - WHERE role_name = 'x', or WHERE role_name IN (...), etc.
  2. arbitrary predicates on role_name - WHERE role_name LIKE ..., etc.
  3a. point values for grantee and grantee_type - WHERE grantee = 'x' AND grantee_type = 'USER', or WHERE (grantee, grantee_type) IN (('x', 'USER'), ('y', 'ROLE'), ...)
  3b. point values for grantee (name) only.

For (1) I'd loop over the passed values (up to a limit) and call metastore.listPrincipals.
For (2) I'd retrieve all roles with metadata.listRoles, then filter them by the predicate and retrieve the remaining ones with metadata.listPrincipals. That way any predicate would work. Perhaps (1) should just do the same (see the sketch below).
For (3a) this looks quite tricky to me.
For (3b) I'd loop over the passed grantee names and retrieve them with two calls to metadata.listRoleGrants, one for USER and one for ROLE. This is still a pretty significant optimization when you look for the roles granted to a specific user.
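A minimal sketch of scenario (2), assuming a metastore facade with listRoles() and listRoleGrants(role) methods; these names and types are placeholders, not the actual Hive metastore API:

import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;

// Sketch only: enumerate all role names, filter them with an arbitrary
// predicate on role_name, then fetch grants for the surviving roles.
// HiveMetastoreStub and RoleGrantRow are placeholders, not real classes.
final class RoleGrantLookup
{
    private final HiveMetastoreStub metastore;

    RoleGrantLookup(HiveMetastoreStub metastore)
    {
        this.metastore = metastore;
    }

    Set<RoleGrantRow> grantsForMatchingRoles(Predicate<String> roleNamePredicate)
    {
        Set<RoleGrantRow> result = new LinkedHashSet<>();
        for (String role : metastore.listRoles()) {            // enumerate every role
            if (roleNamePredicate.test(role)) {                 // apply the pushed-down predicate
                result.addAll(metastore.listRoleGrants(role));  // fetch grants only for matches
            }
        }
        return result;
    }

    interface HiveMetastoreStub
    {
        Set<String> listRoles();
        Set<RoleGrantRow> listRoleGrants(String role);
    }

    interface RoleGrantRow {}
}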

@lhofhansl (Member Author) commented Apr 24, 2020

Added predicate pushdown as described above.

For role names it supports all kinds of predicates (since we can enumerate all existing roles and then filter them); grantee names support "point values" only. To avoid surprises the grantee_type does not have to be specified; the code will retrieve both USERs and ROLEs of that name and then post-filter.

The "public" role is not listed, since that mapping is not actually stored anywhere and so cannot be retrieved when querying through roles.

There's a basic test. If there's a good place to test the predicate pushdown, please point me to it.

Other than perhaps more tests and a place to put the documentation, this should be good for review.

@kokosing @martint

@lhofhansl (Member Author):

Looking at the test-failures...

@lhofhansl (Member Author):

All tests fixed now. @kokosing

@lhofhansl (Member Author):

Expanded the test a bit to include a predicate filter.

@kokosing (Member) left a comment:

Just skimmed

@electrum (Member) left a comment:

Apologies for jumping into this late.

information_schema is covered by the SQL specification, so it should only contain tables and columns exactly as specified. Non-standard things should go in the system catalog or be connector specific.

Based on the comments and code in the table implementation, this seems to be Hive specific, so I suggest making this a Hive connector specific table like system.role_grants. You can bind an implementation of SystemTable in HiveModule.

Otherwise, making this a generic system table with an SPI change will require more thorough design.
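For reference, the Guice pattern for contributing a SystemTable looks roughly like the sketch below. Whether the Hive connector wires its system tables through a set binder exactly like this is an assumption, and the module name is made up:

import com.google.inject.Binder;
import com.google.inject.Module;
import com.google.inject.Scopes;
import com.google.inject.multibindings.Multibinder;
import io.prestosql.spi.connector.SystemTable;

// Sketch only: contribute one more SystemTable implementation to whatever set
// of system tables the connector exposes.
public class SystemTableBindingSketch
        implements Module
{
    private final Class<? extends SystemTable> table;

    public SystemTableBindingSketch(Class<? extends SystemTable> table)
    {
        this.table = table;
    }

    @Override
    public void configure(Binder binder)
    {
        Multibinder.newSetBinder(binder, SystemTable.class)
                .addBinding()
                .to(table)              // e.g. a hypothetical RoleGrantsSystemTable
                .in(Scopes.SINGLETON);
    }
}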

@kokosing (Member):

I like the idea of a system table; I regret that I haven't thought about this in the first place. However, I am not sure whether it is easy to implement predicate pushdown for that.

@lhofhansl (Member Author) commented Apr 27, 2020

There's certainly precedent for non-standard information_schemas:

Without predicate pushdown I do not think that this would be useful.

In fact I cannot even find ROLES, ENABLED_ROLES, APPLICABLE_ROLES in the SQL standard:
http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt

Edit:
Poking around a bit... It's important that this table is catalog scoped. Not sure how I'd do that with the system connector.

presto> select * from system.information_schema.roles;
Query 20200427_231741_00015_74n8c failed: This connector does not support roles

presto> select * from information_schema.roles;
 role_name 
-----------
 admin     
 public    
 auth      
 test      
 x         
(5 rows)

Query 20200427_231747_00016_74n8c, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [5 rows, 45B] [21 rows/s, 191B/s]

@lhofhansl (Member Author) commented Apr 28, 2020

Another question... So if we had a Hive catalog called x; there would then be a table x.system.role_grants, right?

What if somebody has a system schema already in their Hive catalog? How would we ensure that there would be no name collisions?

@electrum @kokosing

@lhofhansl (Member Author):

OK, I found a way of doing this, including somewhat more limited predicate pushdown (LIMIT is not supported, and only single values are).

Before I go on with this, let's please agree on the approach and on what exactly the table should be called. :)

I still think it's much better to expose this alongside ROLES, ENABLED_ROLES, and APPLICABLE_ROLES in information_schema.

@electrum (Member):

I think we’d need to reserve system as a special name for Presto (in the Hive connector). We could make that name configurable if it’s a problem.

My main concern is that the implementation in the engine is specific to Hive. I’m not familiar enough with roles in the SQL standard to know what this should look like and will need to research it. Maybe @martint knows.

@lhofhansl (Member Author) commented Apr 29, 2020

Thanks @electrum. I have a change more or less ready for the system table approach (it was actually a good opportunity to understand the bootstrapping process better). The only remaining problem is not installing that table when SqlStandardAccessControl is not enabled for a catalog.

For now I'll hold off until I hear more.

Edit: Figured that last part out as well by binding the new system table in the SqlStandardSecurityModule. So that particular system table would only show up when SqlStandardAccessControl is enabled in a catalog.

@lhofhansl (Member Author) commented Apr 29, 2020

For the information_schema we have more connectors that do support standard SQL access control: PostgreSQL, Phoenix, MySQL, SQLServer, and others.

In the future I'd envision that we could expose ROLES, ENABLED_ROLES, APPLICABLE_ROLES, TABLE_PRIVILEGES (and hopefully ROLE_GRANTS) for those as well.

I suppose the general question is how much of this we allow to be done via Presto and how much of it we'd defer to the underlying databases.

@kokosing @electrum @martint

@lhofhansl (Member Author) commented Apr 29, 2020

I started a slack discussion on this.

I did find this in the standard (https://crate.io/docs/sql-99/en/latest/chapters/16.html#the-information-schema):

The total number of Views in INFORMATION_SCHEMA, and their exact definition, is non-standard because the SQL Standard allows implementors to add additional Views, as well as to add additional Columns to the Standard-defined Views, to describe additional, implementation-defined features. However, except for one exception, the View descriptions that follow must all be supported by your SQL DBMS.

And this (http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt):

An implementation may define objects that are associated with INFORMATION_SCHEMA that are not defined in this Clause. An implementation may also add columns to tables that are defined in this Clause.

So adding a table to the information_schema really seems to be the right way here.

Also, from reading the standard, the information_schema seems to be for users (not admins). SQL-99 does describe ENABLED_ROLES and APPLICABLE_ROLES, but not ROLES (since ROLES requires admin privileges). So yet another option is to put both ROLES and ROLE_GRANTS in a system schema. Edit: Then again, ROLES and ROLE_GRANTS would still show just what a user can see in the information_schema... Regular users just do not see anything.
(I still prefer information_schema.role_grants.)

@lhofhansl (Member Author):

Based on discussions on slack I propose the following:

Add information_schema.ROLE_AUTHORIZATION_DESCRIPTORS with the following shape:
(role, grantor, grantor_type, grantee, grantee_type, is_grantable)

Note: Technically the existing ROLES table and this table should be in a schema called definition_schema.
I'm proposing - just like ROLES - to put it in information_schema, but again we wouldn't be SQL standard compliant. (Perhaps with a follow-up PR to move both into a schema called definition_schema.)

@lhofhansl (Member Author):

I'll post an update soon that adds the definition_schema with both roles and role_authorization_descriptors in it. roles will also remain available in information_schema for backwards compatibility.

In order to avoid a large amount of code both schemas are served from the current InformationSchema* classes. If that is not acceptable I can disentangle that too.

@lhofhansl (Member Author) commented May 1, 2020

Pushed as a new commit, so that we can see the difference. If OK, I'll squash before merging.

@martint , @electrum , @kokosing please have a look.

As a next step we could replace enabled_roles and applicable_roles with actual views over role_authorization_descriptors.

@lhofhansl (Member Author):

presto> show schemas;
       Schema       
--------------------
[...]
 definition_schema  
 information_schema 
[...]
(11 rows)

Query 20200501_020552_00067_7ij2u, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [11 rows, 129B] [43 rows/s, 514B/s]

presto> show tables in definition_schema;
             Table              
--------------------------------
 role_authorization_descriptors 
 roles                          
(2 rows)

Query 20200501_020609_00068_7ij2u, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [2 rows, 89B] [6 rows/s, 307B/s]

presto> show tables in information_schema;          
      Table       
------------------
 applicable_roles 
 columns          
 enabled_roles    
 roles            
 schemata         
 table_privileges 
 tables           
 views            
(8 rows)

Query 20200501_020616_00069_7ij2u, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [8 rows, 300B] [32 rows/s, 1.19KB/s]

presto> select * from definition_schema.role_authorization_descriptors;
 role_name | grantor | grantor_type | grantee | grantee_type | is_grantable 
-----------+---------+--------------+---------+--------------+--------------
 admin     | NULL    | NULL         | admin   | USER         | YES          
 admin     | NULL    | NULL         | lars    | USER         | NO           
 admin     | NULL    | NULL         | user    | USER         | YES          
 admin     | NULL    | NULL         | test    | USER         | NO           
 auth      | NULL    | NULL         | lars    | USER         | NO           
 auth      | NULL    | NULL         | auth    | USER         | NO           
 test      | NULL    | NULL         | auth    | ROLE         | NO           
 test      | NULL    | NULL         | lars    | USER         | YES          
(8 rows)

Query 20200501_020635_00070_7ij2u, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [8 rows, 360B] [29 rows/s, 1.28KB/s]

@lhofhansl (Member Author):

Fixing some tests.

@lhofhansl (Member Author) commented May 1, 2020

I'll look at the remaining tests when we have agreed on the final way to expose this information.

I have now explored four different ways:

  1. SHOW PRINCIPALS role and SHOW ROLE GRANTS FOR
  2. A new table in information_schema (I called it role_grants, but we can just name it role_authorization_descriptors and add the grantor columns)
  3. A system table of the same shape.
  4. roles, and role_authorization_descriptors in a new definition_schema. Also keep a roles table in information_schema for backward compatibility.

All of these are good options (IMHO). Let's pick one. :)
(I'd vote for (2) followed by (4), followed by (3) and (1)... In that order.)

@kokosing (Member) commented May 4, 2020

I would go with 2 and possibly with 4 later.

We need to get to a conclusion before we move any further. @martint @electrum What is your view here?

@lhofhansl (Member Author):

Rebased yesterday evening.

Yep... (2) or (4) seem to be the best options.
(4) is a bit more SQL compliant, but (2) is more in line with what we already have in Presto, a smaller, less disruptive change, and perhaps less of a surprise to users (no new schema).

If we decide on (2) I'll remove the last commit and continue from there; for (4) I'd squash the two commits and go from there.

This is also a friendly (ping) :)
(I'd like to do an internal release with this functionality and would prefer to do it with the open source implementation.)

@lhofhansl (Member Author):

@martint @electrum ping... Just point me in the right direction and I'll continue.

@electrum (Member):

I didn't realize that the existing role tables in information_schema are non-standard. That was probably a miss when we were reviewing and merging that feature. But since we already have those, I think it makes sense to be consistent with those. Let's go with option 2.

I'm still concerned that the comments in InformationSchemaPageSource talk about the Hive metastore. How specific to Hive is this implementation? Would it be applicable to other connectors?

@lhofhansl (Member Author) commented May 14, 2020

Thanks @electrum . That's an excellent point. These are Hive Metastore limitations.

There are two options, I think.

  1. Say that listRoleGrants and listPrincipals are the lowest common denominator. Then all connectors would have to implement those two.
  2. Think of a more general interface in Metadata, and connectors are free to implement that as they see fit. The trick would be to still benefit from predicate push down.

OK... I'll go with (2) above + change the name of the table to role_authorization_descriptors.
And I'll think through a metadata interface that is better applicable to all connectors.

Edit: Do you prefer the new table be named ROLE_GRANTS or ROLE_AUTHORIZATION_DESCRIPTORS? ROLE_AUTHORIZATION_DESCRIPTORS is "more standard", but it technically belongs in the definition_schema.

@lhofhansl (Member Author):

OK... Updated to use information_schema.role_authorization_descriptors.
The syntax should be final.

I will now look into how to add a more useful method to Metadata and ConnectorMetadata to allow connectors to retrieve all role grants. Something like `getAllRoleGrants` with optional roles and grantees passed in for filtering.
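Roughly, the method being discussed could have the following shape (a sketch based on this thread; the final name, owning interface, and signature may differ from what was merged):

import io.prestosql.spi.connector.ConnectorSession;
import io.prestosql.spi.security.RoleGrant;

import java.util.Optional;
import java.util.OptionalLong;
import java.util.Set;

// Sketch only: the rough shape of the method as it might be added to
// ConnectorMetadata; shown as a standalone interface for illustration.
public interface RoleGrantListing
{
    Set<RoleGrant> listAllRoleGrants(
            ConnectorSession session,
            Optional<Set<String>> roles,     // optional filter on role names
            Optional<Set<String>> grantees,  // optional filter on grantee names
            OptionalLong limit);             // optional limit pushed down from the query
}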

@lhofhansl (Member Author) commented May 15, 2020

Good. Functional tests pass. That's a baseline. I'll prepare a change to push the Hive-specific logic into the Hive connector.

@lhofhansl (Member Author):

OK... Have a look. The Hive logic is now where it should be: io.prestosql.plugin.hive.security.SqlStandardAccessControlMetadata, in a new method, listAllRoleGrants, that is passed the relevant predicates to optionally filter the results.

long count = 0;
if (grantees.isPresent()) {
TOP:
for (String grantee : grantees.get()) {
@lhofhansl (Member Author):

If there's a more elegant way to phrase this, let me know.
Instead of the labeled break, this could build() and return as well.

@kokosing (Member):

extract a method and use return?
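A sketch of that refactoring with placeholder types: pull the grantee loop into a helper that returns as soon as the limit is reached, so the labeled break is no longer needed:

import java.util.LinkedHashSet;
import java.util.OptionalLong;
import java.util.Set;

// Sketch only: replaces the TOP:-labeled loop with an extracted method that
// returns when the limit is hit. RoleGrantRow and GrantLookup are placeholders.
final class GranteeGrants
{
    static Set<RoleGrantRow> grantsForGrantees(Set<String> grantees, OptionalLong limit, GrantLookup lookup)
    {
        Set<RoleGrantRow> result = new LinkedHashSet<>();
        for (String grantee : grantees) {
            for (RoleGrantRow grant : lookup.grantsFor(grantee)) {
                result.add(grant);
                if (limit.isPresent() && result.size() >= limit.getAsLong()) {
                    return result;   // early return instead of a labeled break
                }
            }
        }
        return result;
    }

    interface GrantLookup
    {
        Set<RoleGrantRow> grantsFor(String grantee);
    }

    interface RoleGrantRow {}
}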

@kokosing (Member) left a comment:

Looks decent. I left initial comments.

@@ -2346,6 +2346,12 @@ public void dropRole(ConnectorSession session, String role)
return accessControlMetadata.listRoles(session);
@kokosing (Member):

The schema of this table is:
role_name | grantor | grantor_type | grantee | grantee_type | is_grantable

Is this how it is defined in the standard? Does the standard define types for these columns? Do we use the same column types?

@lhofhansl (Member Author) commented May 19, 2020:

ROLE_NAME, GRANTEE, and GRANTOR are defined as INFORMATION_SCHEMA.SQL_IDENTIFIER (which looks like it's not defined in the copy of the standard I have).
I think we should go by the other tables we have (ROLES, APPLICABLE_ROLES, ENABLED_ROLES), which use VARCHAR as I have done here.

The standard does not have grantor_type and grantee_type. This is a Presto artifact (PrestoPrincipals can be USER or ROLE). We need those to support any connector where USERs and ROLEs are from separate domains, and we have to distinguish them.
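For concreteness, the intended column set could be written down as below; this is only a sketch listing the columns and their VARCHAR types, not the actual InformationSchemaMetadata builder code:

import com.google.common.collect.ImmutableList;
import io.prestosql.spi.connector.ColumnMetadata;

import java.util.List;

import static io.prestosql.spi.type.VarcharType.createUnboundedVarcharType;

// Sketch only: the six columns of role_authorization_descriptors, all VARCHAR,
// matching the typing of the existing ROLES / APPLICABLE_ROLES / ENABLED_ROLES tables.
final class RoleAuthorizationDescriptorsColumns
{
    static final List<ColumnMetadata> COLUMNS = ImmutableList.of(
            new ColumnMetadata("role_name", createUnboundedVarcharType()),
            new ColumnMetadata("grantor", createUnboundedVarcharType()),
            new ColumnMetadata("grantor_type", createUnboundedVarcharType()),
            new ColumnMetadata("grantee", createUnboundedVarcharType()),
            new ColumnMetadata("grantee_type", createUnboundedVarcharType()),
            new ColumnMetadata("is_grantable", createUnboundedVarcharType()));
}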


@@ -182,7 +186,7 @@ public ConnectorTableProperties getTableProperties(ConnectorSession session, Con
}

return Optional.of(new LimitApplicationResult<>(
@kokosing (Member):

Do we have tests that cover cases like filter(limit(scan)) and limit(filter(scan))? In other words, I think we should consider banning filter pushdown if a limit is already applied, as I guess it is buggy today.

@lhofhansl (Member Author) commented May 19, 2020:

That's something I wanted to ask. It seems we can push down either a filter or a limit, but not both.
I debugged it and found that the framework is doing the right thing. When a filter is present it does not push the limit down (i.e. does not call applyLimit) and vice versa. So that works fine, and I assume the "framework" has tests for that.

@kokosing (Member):

The framework might have such tests, but only for regular tables. I think we need to add new ones for the information_schema tables.

@lhofhansl (Member Author) commented May 19, 2020

Thanks for the review @kokosing .

Pushed a new change:

  • Added LIMIT tests. The interesting bit there was that unless we force an ordering, the order is more or less random, and if we do provide an ordering the optimization cannot be used. So I'm just verifying the count, except for the one case where I know internally which row will be returned.
  • Cleaned up the logic in SqlStandardAccessControlMetadata.
  • Renamed the variables as requested.
  • Fixed the logic in InformationSchemaMetadata

Didn't change:

  • extra tests for InformationSchema.
  • extra permission ROLE_AUTHORIZATION_DESCRIPTORS

@lhofhansl (Member Author):

Enhanced MockConnectorFactory and added role tests to TestInformationSchema.

@kokosing (Member) left a comment:

More comments...

"('test_r_a_d1', null, null, 'user', 'USER', 'NO')");

assertQuery(
"SELECT * FROM information_schema.role_authorization_descriptors LIMIT 1000000000",
@kokosing (Member):

use LIMIT 5

@lhofhansl (Member Author):

I tried specifically with a very large number to make sure it does not cause any issues.


return;
}

for (RoleGrant grant : metadata.listAllRoleGrants(session, catalogName, roles, grantees, limit)) {
@kokosing (Member):

Do you think it is doable to implement predicate and limit pushdown only here (instead of in SqlStandardAccessControlMetadata)? That way it would work out of the box for any other connector, or even for any access control metadata in Hive. That would possibly require adding different methods to metadata (more granular ones).

WDYT?

@lhofhansl (Member Author):

That's what I had first, but @electrum correctly pointed out that this is very Hive specific.
So in this design we just pass the predicates down and then let the connectors do what's best for them. (For example, only HMS has this weird limit that the only APIs are getting Roles by a single Principal, or Principals by a single Role.)

@kokosing (Member):

I see, but I am afraid that we went too far with this being too Hive specific, so it can be difficult to reuse in the future, as one might need to reimplement predicate pushdown for another use case.

HMS has this weird limit that the only APIs are getting Roles by a single Principal, or Principals by a single Role

Could we expose this in the ConnectorMetadata SPI? I would imagine that in the future we could also have batch versions of these methods. In case a batch version is not implemented (like in Hive) we could fall back to using the methods that access roles for a given principal or principals for a given role.

WDYT?

@lhofhansl (Member Author) commented May 27, 2020:

ConnectorMetadata now has two relevant methods: (1) list all roles, and (2) list all role grants (with optional filter criteria).

I think connectors will be quite different on this front. Databases (Postgres, Oracle, MySQL, Phoenix, etc.) have their own information_schema, and the connector would just turn this into a query on that information_schema; in that case we'd want to pass the predicate through (both on role and principal).
Others (like Hive) have limited and different APIs, and that needs some logic to be "optimal". Yet other connectors have other ways to query their role metadata.

Both methods may or may not be batched, but that is completely hidden now. The connector decides how to implement them. Note that the Hive implementation is completely hidden in the Hive metadata, and the only thing the InformationSchema code needs is for these two methods to be called. Everything else is automated (extracting the predicate, passing it down, etc.).

If we do not want to leak Hive specifics into the InformationSchema code, I do not see how to do this fundamentally differently. If I put specific logic any higher than ConnectorMetadata, we'd leak the details of Hive up.

But maybe I don't understand what you mean...
Perhaps you could outline what methods you would add to ConnectorMetadata, and how the code in InformationSchemaPageSource would use them?

Thanks!

@kokosing (Member):

Thanks!

@lhofhansl (Member Author) commented May 20, 2020

Thanks for the review @kokosing .

Pushed another update.

  • I left the predicate pushdown where it is, since it is really only the connector that knows how best to optimize. I think this is as good as it gets; otherwise we'd be leaking Hive-specific details into all other connectors.
  • Also, I have not done the predicate pushdown counting yet. Looking into that part now.

@lhofhansl (Member Author):

Also added predicate pushdown counting, which verifies that LIMIT is not pushed down together with the other predicates.

@lhofhansl requested a review from kokosing May 27, 2020 16:10
@lhofhansl (Member Author):

Pushed a rebase. Let's get this over the finishing line. :)

@kokosing (Member):

Let's get this over the finishing line. :)

Let's solve: #3535 (comment)

@lhofhansl (Member Author):

Thanks @kokosing . See my comment: #3535 (comment)

@kokosing (Member) left a comment:

Minor comments.

Please ping me once it is ready to be merged.

granteesPushedCounter.incrementAndGet();
}
if (limit.isPresent()) {
limitPushedCounter.incrementAndGet();
@kokosing (Member):

listRowGrantsCallsCounter, rolesPushedCounter, granteesPushedCounter and limitPushedCounter should be grouped into a single bean that represents the interaction with the metadata call. Typically we have had a single counter per method; you have 4. Using a bean we would keep having a single counter per method, where that one counter is just more complex.

@lhofhansl (Member Author):

I added beans: a Counter bean and a Count bean to compare against.
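A sketch of the kind of grouping bean discussed here, using only standard AtomicLongs; the class and method names are illustrative, not the actual CountingMockConnector code:

import java.util.concurrent.atomic.AtomicLong;

// Sketch only: group the four role-grant counters into one mutable bean so the
// mock connector exposes a single object per metadata method.
public class RoleGrantCalls
{
    private final AtomicLong calls = new AtomicLong();
    private final AtomicLong rolesPushed = new AtomicLong();
    private final AtomicLong granteesPushed = new AtomicLong();
    private final AtomicLong limitPushed = new AtomicLong();

    public void record(boolean rolesPresent, boolean granteesPresent, boolean limitPresent)
    {
        calls.incrementAndGet();
        if (rolesPresent) {
            rolesPushed.incrementAndGet();
        }
        if (granteesPresent) {
            granteesPushed.incrementAndGet();
        }
        if (limitPresent) {
            limitPushed.incrementAndGet();
        }
    }

    public long getCalls()
    {
        return calls.get();
    }
}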

@lhofhansl (Member Author):

@kokosing , pushed a new update. Let me know whether the bean approach in CountingMockConnector is what you had in mind.

@lhofhansl (Member Author):

TestThrottledAsyncQueue seems unrelated and passes locally.

@kokosing (Member) left a comment:

Let me know whether the bean approach in CountingMockConnector is what you had in mind.

Yes. I left a few comments there.

Also, I would like to merge this pull request after the current release goes out. I don't want to interfere with the current release-related work.

assertMetadataCalls(
"SELECT count(*) FROM (SELECT * FROM test_catalog.information_schema.role_authorization_descriptors WHERE grantee = 'user5' LIMIT 1)", "VALUES 1",
new MetadataCallsCount()
.withRoleGrantCount(1, 0, 1, 0));
@kokosing (Member):

That is very nice. How about going one step further and having this more fluent, like

new MetadataCallsCount().with(roleGrantCount(/*method calls count*/ 1).withGranteesPushedDown(1));

?

@lhofhansl (Member Author):

I had those methods before. It just seemed to now make more sense to compare the beans...

Maybe it was better the way I had it before?

In the end we have certain metrics we want to count for the connector; those are the atomic counters.
I think I misunderstood what you wanted... MetadataCallsCount is already a "bean", and so having the individual members for counts is fine...?

}
}

public static class RoleGrantCounter
@kokosing (Member):

RoleGrantCounter?

@lhofhansl (Member Author):

Yeah... Naming is hard. :)

private final AtomicLong listRowGrantsCallsCounter = new AtomicLong();
private final AtomicLong rolesPushedCounter = new AtomicLong();
private final AtomicLong granteesPushedCounter = new AtomicLong();
private final AtomicLong limitPushedCounter = new AtomicLong();
@kokosing (Member):

I would prefer the bean to be immutable. incrementListRoleGrants could return a new instance. Then you don't need two classes.

@lhofhansl (Member Author):

The AtomicLongs for the other counters are mutable. I think this counter would be inherently mutable.
How do you handle atomicity between calls? Have the counter bean be an atomic reference and you update the reference with each new instance?

We're going to need 2 beans, right? One for counting and one to compare against during the test.
The more I think about it, the more I think the previous code was better.

The schema of this table is:
role_name | grantor | grantor_type | grantee | grantee_type | is_grantable

For Hive, queries on this table are translated to calls to
metastore.get_principals_in_role or metastore.get_role_grants_for_principal,
with proper predicate pushdown to avoid excessive requests to the metastore.

grantor and grantor_type are not yet implemented and always NULL.

@lhofhansl (Member Author):

Pushed another update, @kokosing.
I think for the test this is as good as it gets. The counters need to be mutable (that's also why the other calls are tracked with - mutable - AtomicLongs). I consolidated the counting into a single bean that tracks the relevant counts.

@kokosing merged commit 1e3690a into trinodb:master on Jun 1, 2020
@kokosing (Member) commented Jun 1, 2020

Merged, thanks!

@kokosing changed the title from "Add an information_schema.role_grants table" to "Add an information_schema.role_authorization_descriptors table" on Jun 1, 2020
@kokosing mentioned this pull request Jun 1, 2020
@lhofhansl (Member Author):

Thank you @kokosing. And thanks for the detailed review on this; it makes me feel really good about the quality of the Presto code base!
