-
Notifications
You must be signed in to change notification settings - Fork 447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kerberos support for the livy client #314
Conversation
brockn
commented
Jan 4, 2017
- Added authentication parameter (-a was taken so long opt only)
- Fixes all failing tests
- Adds requests-kerberos support
- Handles "waiting" state
* Added authentication parameter (-a was taken so long opt only) * Fixes all failing tests * Adds requests-kerberos support
We'll be testing this patch over the new few weeks. |
Thanks for the submission @brockn! Before we discuss code details, I'd like to take a step back and discuss how Kerberos authentication should be done so that the following configurations are possible, and all of the people mentioned below are able to cover the scenarios they care about (please do let me know if I'm missing some scenario!):
We've already started this sort of discussion in #284 cc. |
Hi @aggFTW, Thanks for pointing out #284! I didn't see that. I emphasize with that and too be clear, this approach requires the user to kinit (and renew via say crontab) on the host where their jupyter instance is running. So fairly narrow in scope, but is straightforward to extend to use cases 1, 2, and 4 and perhaps 5, which is a larger project of which I don't know the scope. Given that Kerberos support seems stalled, I feel like we might be trying to boil the ocean here. Either this or #284 would make SparkMagic usable in the real-world where clusters have Kerberos. Just my two cents though! Brock |
I agree that we can build this incrementally. I just want to give people who have worked on scenarios 4 and 5 the opportunity to chime in if this is a good first step that they could extend for their scenarios. I like this first approach if we can also expose options to support scenarios 1 and 2 at a minimum before check in. @languy @tc0312 @msftristew: Thoughts? |
Hey, @praveenkanamarlapudi @joychak @prabhu1984 @languy @tc0312 @msftristew: have you had a chance to read this thread? Thanks! |
Not yet but I will do this week.
…Sent from my iPhone
On Jan 9, 2017, at 10:45 AM, Alejandro Guerrero Gonzalez ***@***.***> wrote:
Hey, @praveenkanamarlapudi @joychak @prabhu1984 @languy @tc0312 @msftristew: have you had a chance to read this thread?
Thanks!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Much appreciated @joychak! Hopefully we can release something valuable for everyone! :) |
Hi @aggFTW For the scenarios you mentioned earlier I am giving my view from #284 perspective.
|
Hi @aggFTW, We can implement kerberos authentication incrementally using #284 PR. i.e. Lets enable kerberos authentication first on Spark Magic. That way, spark magic manages kerberos ticket init, renew and lifetime. PR #284 manages kerberos authentication in two ways:-
Problem with enabling kerberos authentication on jupyterhub: If we manage kerberos authentication outside the sparkmagic in jupyterhub, then it would add unnecessary dependent to install Jupyterhub. Most of the user community install Jupyter/Sparkmagic for their own instance. Let us know your thoughts. |
Some thoughts: Auto-renewing the ticket ourselves seems like a good optimization (over, say cronjob'ing kinit as was suggested), but there might be some pitfalls: is there enough output for the end-user to troubleshoot? Does it conflict with another renewal mechanism already on the box? Does it properly update the ticket on Windows OS? (I've had issues with a java kerberos lib that required changing registry keys to get permissions to write to the ticket cache folder). Are there scenarios that require turning off auto-renewal and prompt for user creds instead when ticket expires? On mutual auth: I think it would be good to expose as many dials as requests_kerberos provides in configuration and also adopt its default behavior to avoid surprising the end-user. For instance, if I read their doc correctly, it seems that mutual auth is required by default. On scenario 5, flowing the ticket to other components (livy, sparkmagic, etc): is this something the component can request from jupyter when needed? Or will it get pushed and cached for later use? In which case the ticket might be stale (not to mention possible security issues). Also, if the purpose is to sign on to external Kerberos servers, we're talking about propagating (and renewing) a different ticket (TGS). My 2 cents. |
Thanks @languy @prabhu1984 @praveenkanamarlapudi! I like @languy's suggestion about keeping ticket management completely out of scope as a way to simplify. Thinking about it, I do see some pitfalls coming in... Let's start by getting this PR in, exposing all the dials that @prabhu1984 @praveenkanamarlapudi, Thanks for your feedback! Points are valid, and we can definitely take some of the code from #284 and enrich this PR. Can you help us code review this PR? I will start doing so myself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good start! In general, I would ask you to take a look at the feedback for #284 and add unit tests for new behaviors.
I will install in my machine once you update the PR and also test locally.
Thanks!
@@ -85,6 +85,7 @@ def version(path): | |||
'pandas>=0.17.1', | |||
'numpy', | |||
'requests', | |||
'requests-kerberos', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add it to the requirements.txt file too please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will need to bump the version of all 3 packages in this code base from 0.10.0
to 0.11.0
@@ -331,12 +331,15 @@ def _do_not_call_change_language(self, line, cell="", local_ns=None): | |||
|
|||
@magic_arguments() | |||
@line_magic | |||
@argument("--authentication", type=str, default=NONE_AUTH, choices=POSSIBLE_AUTH, \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also update the server extension to pass in the authentication method:
code = '%{} -s {} -u {} -p {}'.format(KernelMagics._do_not_call_change_endpoint.__name__, endpoint, username, password) |
Look at _get_kernel_name
for how the server extension handles optional parameters. I think default should be configurable, with default being BASIC_AUTH
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add unit tests for this.
(username, password, url) = (credentials['username'], credentials['password'], credentials['url']) | ||
self.endpoint = Endpoint(url, username, password) | ||
(username, password, authentication, url) = (credentials['username'], credentials['password'], \ | ||
credentials['authentication'], credentials['url']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please update the README.md file with examples on how to configure different authentication methods in the config file, and list Kerberos support as a feature 😄 ?
from requests_kerberos import HTTPKerberosAuth, REQUIRED | ||
auth = HTTPKerberosAuth(mutual_authentication=REQUIRED, force_preemptive=True) | ||
else: | ||
raise ValueError("Unsupported authentication type {}".format(self._endpoint.authentication)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use BadUserConfigurationException
@@ -52,6 +53,8 @@ def manage_spark(self, line, local_ns=None): | |||
"from the server for SQL queries") | |||
@argument("-r", "--samplefraction", type=float, default=None, help="Sample fraction for sampling from SQL queries") | |||
@argument("-u", "--url", type=str, default=None, help="URL for Livy endpoint") | |||
@argument("--authentication", type=str, default=NONE_AUTH, choices=POSSIBLE_AUTH, \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about using -t
for authentication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update the widgets that serve as UI for this too like in https://github.com/jupyter-incubator/sparkmagic/pull/284/files#r83328805
if data is None: | ||
r = function(url, headers=self._headers, verify=self.verify_ssl) | ||
else: | ||
r = function(url, headers=self._headers, data=json.dumps(data), verify=self.verify_ssl) | ||
else: | ||
if self._endpoint.authentication == BASIC_AUTH: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some unit tests for this if/else statements?
@@ -52,7 +52,7 @@ def _get_statement_output(self, session, statement_id): | |||
|
|||
self.logger.debug(u"Status of statement {} is {}.".format(statement_id, status)) | |||
|
|||
if status == u"running": | |||
if status == u"running" or status == u"waiting": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a unit test for this please?
Hi @brockn, are you taking a look at the review feedback? Can I help with something else? |
FYI, talk given by @joychak at Spark Summit on the topic: http://www.slideshare.net/SparkSummit/secured-kerberosbased-spark-notebook-for-data-science-spark-summit-east-talk-by-joy-chakraborty Joy, want to chime in on the thread to comment on this particular implementation? Thanks! |
Sure. I would be more than happy to chat.
Thanks,
Joy
…Sent from my iPhone
On Feb 21, 2017, at 5:33 PM, Alejandro Guerrero Gonzalez ***@***.***> wrote:
FYI, talk given by @joychak at Spark Summit on the topic: http://www.slideshare.net/SparkSummit/secured-kerberosbased-spark-notebook-for-data-science-spark-summit-east-talk-by-joy-chakraborty
Joy, want to chime in on the thread to comment on this particular implementation?
Thanks!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
@brockn Any update on merging this PR? |
Hi all! I understand that this has been implemented? If so, could you please share the details of configuration? Have added auth "Kerberos" to config.json, kinited, but still getting Problem accessing /sessions. Reason: Authentication required. How do I get sparkmagic to do the authentication? |