-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Presto ranger integration with row-level filters and column masking #11640
Conversation
c6c542e
to
21f1f09
Compare
@stagraqubole Does this solve your use case ? |
Yet to try this out but the main usecase for us is using Ranger policies already defined via Hive. From code it looks like with config "ranger-service-types" it can do catalog level authorization but it would be good to have examples/doc. |
Following steps will help enable
|
Design doc : https://docs.google.com/document/d/1Jtapmwkp1Up_w6w_3dUeOXfLPLRAsglbqoIWNRG-NJM/edit# Presto-spi changes : https://github.com/prestodb/presto/blob/21f1f0932ac395a6a574596bc2f12a39ac5750df/presto-spi/src/main/java/com/facebook/presto/spi/security/SystemAccessControl.java Rest all is the presto-ranger plugin |
ship it! |
@cryptoe can you post a link to the ranger policy language (everything I can find is a sales pitch or pictures of UIs)? I'm curious what the limitations of the language are, or is anything allowed in the row level filter? |
@dain Are you referring to language model of the row level filtering expression? May be you are referring to https://cwiki.apache.org/confluence/display/RANGER/Row-level+filtering+and+column-masking+using+Apache+Ranger+policies+in+Apache+Hive . I think you can put any non empty string in the row level filter. I am doing the row level filter string to presto sql conversion here. |
@cryptoe That is whatI was afraid of. That leads to many questions:
If there was a small spec for these expressions which contained the syntax for the language and semantics of functions/operations, I see the translation as doable, but short of that, I have concerns about this feature both producing wrong answers and breaking security, both of which are core values of this project. |
@dain (I have contributed some work to Ranger in the past) Ranger works as a policy management system. It let's you define policies and store them by means of plugins. E.g. hive is a plugin, hdfs is a plugin and potentially presto could be a plugin. Policy checking happens entirely within the client (e.g. Presto). The Domain Specific Language is set by the plugin. Policies are refreshed by the client every 30s by default. So the questions you are raising are highly relevant, but cannot not be answered in the domain of Ranger and should be answered in Presto's domain. So for example "what language is the string in?" should be up to Presto to define in context of its policy evaluation and supported by the Ranger plugin. Either as part of enforced templating of input or by definition. In the first then it would make sense to have the plugin as part of Presto (although it would need to be copied over to Ranger). The nature of Presto being a kind of layer on top of other data requesting engines brings a practical challenge: ideally you would like to reuse the policies applied to the underlying databases and data (e.g. hive, mysql, postgres etc). Obviously you then become dependent on what those plugins support as DSL. Thus my suggestion would be to have a "policy enforcement specification" in Presto that specifies what the DSL for Presto is and next to that a specification how to handle the "sub database policy definitions". My 2 cents. Let me know how we could contribute if required. We would love to have this in. |
@bolkedebruin That makes sense. The problem is that most people likely want to use existing Hive rules in Presto, but the super lax policy system in Ranger makes that almost impossible without implementing HQL in Presto. Looking at the docs for Ranger my guess is 99% of users only use the UI to define policy. Is the UI limited in scope? Specifically, does it only support a reasonable/portable language? Or is there a simple portable language that can be used from the UI and understandable from multiple systems. |
@dain maybe lax is another word for very flexible ;-). So in many cases the UI will be used for setting policies and they will be slow changing. However, I do know of deployments that do use the Rest API for defining policies and some deployments that have faster changing ones. For the language that doesn't really matter as it is verfied by the plugin in Ranger. For hive that means no verification (as far as I could determine) and the language is HQL. E.g. for row level masking the query is basically added to the statement created by the user. This also means that the query will fail if an error is made by the administrator while defining policies (probably tough to debug for the end user). So I would argue that leaves us with a couple of scenarios
|
@bolkedebruin, to clarify, Ranger internally just stores strings? I ask because looking at the docs, the system seems to have a graphical UI to create and update the policies. To implement such a UI, I would assume that you either need to store the filter as an abstract syntax tree, or you need to be able to parse the native language of ever supported system and produce an AST for the UI. Actually even in that case, the AST supported by the UI would be limited, so maybe we can support only that subset? |
Ah gotcha. Ranger indeed stores an AST and renders this in the UI. That's easy to support and already done by this patch. Only supporting this is my scenario 3. The Hive plugin in Ranger also adds support for row level filter expressions or masks. These are defined in a HQL subset and more or less directly added to the query a user creates (in the hive UI that is!, Eg a select query) The REST API and UI both generate the same AST. They also both support the custom row level filters which are just an array of strings in the AST. The patch as it stands supports the custom row level filters to the extend that the expressions aRe verified by prestos parser. Hence there might be minor caveats. This is scenario 1. Why not make row level filtering and masking support optional (I think it is in this patch on a phone though). Also make make configurable if the query needs to fail if presto then encounters a requirement for those filters in a particular policy preventing data leakage. Maybe mark it as experimental. This would put it in the hands of an operator and we could see what the real world experience is? |
@bolkedebruin How did you arrive at conclusion that ranger stores AST..? @dain you noted about "Ranger makes that almost impossible without implementing HQL in Presto" |
@RameshByndoor because it does. This is an example of a policy AST in Json (row level filter not being part of it)
So the row level expression is stored as an arbitrary string within the AST of the policy (probably something like In other words the AST of a Hive policy is more than just the row level expression. It is the whole Policy you define in Ranger, which can include row level filters and masks but not neccesarily. The client (presto in this case) downloads the whole Policy and then does verification on its own. Verification happens by parsing the AST (which is already done for you by the Ranger framework) and applying it. Note: in this case the DB schema of Ranger is irrelevant. It reassembles the AST from the DB and provides that as an interface to clients. |
@bolkedebruin ln simple words Ranger Row filters are NOT parsed to Abstract syntax Tree(AST) anywhere in ranger, instead it propagates as just expression string, I don't see any code which is doing in apache/ranger,please do post code if it does. |
@RameshByndoor that is exactly what I said and I am not trying to make the point that it does. However, the AST of Ranger 'wraps' (if you will) the row level expressions. Please note that we are not just talking about the row level expressions here, but that @dain was, in my opinion, asking for a more holistic view on how policies work in Ranger and how these could be integrated into Presto. A 'row level' expression is just a mere sub item of a policy. |
@bolkedebruin Thanks for raising that concern. You are right, Ranger is much more capable in that way, We can consume whole of presto resources and define everything over there. I have tried that too. @bolkedebruin @dain have look at here. Hope this helps. :) |
@bolkedebruin and @RameshByndoor, my main concern is row level filtering (I think the rest is pretty straight forward). Specifically, for the example above, |
@cryptoe my personal suggestion is we split this PR into two parts. In the first part, we add support for column Ranger checks, but fail if there are row filters (I believe this is uncontroversial). Then for the second part, we develop a spec for a limited subset of the HQL filter language, that has tight definitions of the semantics the language and each supported operator, function and type. The spec would be our commitment to correctness. This second part might be controversial, so I'd wait to hear feedback from @martint, @electrum, @kokosing, and others. |
Sounds fair. Can we add support for "just" the presto filter language (too)? (Eg. Decoupling it from hive?) |
@bolkedebruin, is that something people want? I think there is a much bigger discussion to be had around how this is actually integrated into the engine. I expect that the current "string expression" API isn't what the optimizer folks will want for this integration point, but we should wait for them to comment. |
Hi everyone, Thanks for putting this PR up and it looks great! I am part of the security team at FB and we really have a high priority to merge the row level security piece (column masking is not as high a priority for us). However, it seems like there is still more flushing out needed in terms of how the filters are propagated. A couple of questions: Thanks! |
@Phelodas I was splitting the PR into two as @dain suggested. One would be the ranger piece containing only basic ACL's on tables/schemas/catalogs. @bolkedebruin Presto sever uses jersey 2.xx while ranger uses jersey version 1.xx . We got around the dependency issues by shading the jar in the released artifact. Solving the dependency issues required for test case addition is a little tricky. Currently stuck there. Working on having a specific classloader for test cases to load the ranger jars in TestingPrestoServer . |
gentle ping |
Hi all, I am preparing a PR for the Ranger 1.2.0 integration. This will only include some simple functionalities, row-level filters and column masking are not included. I hope to submit the PR for review by the end of this week. |
Hey guys, I am going to make a new PR for the row level security changes. I think it's important to explore a more structured approach to passing the filters to Presto but it would be great to vet if this design works for Ranger as well. I'll have a design doc up by tomorrow. Thanks! |
@Phelodas What do you mean by "passing the filters to Presto"? All the policies should be defined in the ranger UI. The policy file is then cached and evaluated within Presto using the |
I have sent the pr in the prestosql repo. Please check if it solves your usecase |
Update: shading jersey when integrating with Ranger 1.2 proofs hard, due to transitive dependencies. We are working (incl @WalkerTR ) on an intermediate solution. We would love to know how you shaded Jersey @cryptoe and would like to see if it applies to a recent ranger. We think Ranger changes are required and that will take some time to have those merged upstream. |
@bolkedebruin Please checkout the ranger pom.xml https://github.com/prestodb/presto/blob/5bdc5106f2ecaf7068fdf5b54045deccd8bc0136/presto-ranger-plugin/pom.xml . I think the pretty much the same pom should work for your usecase no ? |
We reached out to the Ranger ppl. They have a shim pattern that solves the issues, most likely, without the need for sharing. Diving in today. |
Opened a new issue that is focused solely on Row Level Security : #12491 . I've attached a design doc there as well. Would love to hear your comments/feedback. @WalkerTR Sorry, I missed your comment. I am mentioning the format for the filter in the design doc I mentioned above. The idea would be to conform the SQL (HQL) to that format in the connector implementation. Please let me know your thoughts in the issue. |
@cryptoe @Phelodas I have made a plugin that works with latest Ranger and PrestoSql. It is part of the Ranger source code and I will push it upstream there. It does not do "control" yet, it is the boilerplate. https://github.com/bolkedebruin/ranger/tree/presto_plugin |
@Phelodas @cryptoe I have submitted the first iteration of the patch to Ranger: https://issues.apache.org/jira/browse/RANGER-2395 . We are testing it in our production environment. Couple of points:
|
its throwing me error stating plugin not registered when i restart the presto.. here are my access control properties and jar is kept under plugin/ranger-presto/ranger-presto.jar did i miss anything. |
@radd3 any chance you copied non-shaded jar? Shaded jar from |
i checked and it was shaded one.. after some debugging i am getting this error.. i also checked whether port was reused for something.. without this plugin the interaction in the cli works normal. with this plugin i get the below error. |
Hi I am not able to download the below jar. Github repo URL has compile time error in ranger plugin. presto/plugin/ranger-presto-plugin/presto-ranger-plugin-0.version-shaded.jar |
@cryptoe do you think this is a problem? how to deal with it? thank you very much. |
This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions! |
|
Hey @rohanpednekar , I see that you reopened this PR. |
Yes, this is stale PR we have another PR #15519 for Ranger integration, Closing this for now. |
This PR includes #8980 and #10996, Also includes row-level filtering.