Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kerberos support for the livy client #314

Closed
wants to merge 3 commits into from
Closed

Kerberos support for the livy client #314

wants to merge 3 commits into from

Conversation

brockn
Copy link

@brockn brockn commented Jan 4, 2017

  • Added authentication parameter (-a was taken so long opt only)
  • Fixes all failing tests
  • Adds requests-kerberos support
  • Handles "waiting" state

* Added authentication parameter (-a was taken so long opt only)
* Fixes all failing tests
* Adds requests-kerberos support
@brockn
Copy link
Author

brockn commented Jan 4, 2017

We'll be testing this patch over the new few weeks.

@brockn brockn mentioned this pull request Jan 4, 2017
@aggFTW
Copy link
Contributor

aggFTW commented Jan 4, 2017

Thanks for the submission @brockn!

Before we discuss code details, I'd like to take a step back and discuss how Kerberos authentication should be done so that the following configurations are possible, and all of the people mentioned below are able to cover the scenarios they care about (please do let me know if I'm missing some scenario!):

  1. default/explicit principals
  2. required/optional/disabled mutual authentication
  3. single-user Jupyter where sparkmagic takes the default ticket in the box
  4. single-user Jupyter where sparkmagic is responsible for managing the lifetime of its tickets
  5. integrated authentication where the ticket flows from client-machine to JupyterHub, SparkMagic, and finally Livy

We've already started this sort of discussion in #284

cc.
@languy @tc0312 @msftristew @praveenkanamarlapudi @joychak @prabhu1984

@brockn
Copy link
Author

brockn commented Jan 5, 2017

Hi @aggFTW,

Thanks for pointing out #284! I didn't see that.

I emphasize with that and too be clear, this approach requires the user to kinit (and renew via say crontab) on the host where their jupyter instance is running. So fairly narrow in scope, but is straightforward to extend to use cases 1, 2, and 4 and perhaps 5, which is a larger project of which I don't know the scope.

Given that Kerberos support seems stalled, I feel like we might be trying to boil the ocean here. Either this or #284 would make SparkMagic usable in the real-world where clusters have Kerberos.

Just my two cents though!

Brock

@aggFTW
Copy link
Contributor

aggFTW commented Jan 5, 2017

I agree that we can build this incrementally. I just want to give people who have worked on scenarios 4 and 5 the opportunity to chime in if this is a good first step that they could extend for their scenarios.
Ping @praveenkanamarlapudi @joychak @prabhu1984

I like this first approach if we can also expose options to support scenarios 1 and 2 at a minimum before check in.

@languy @tc0312 @msftristew: Thoughts?

@aggFTW
Copy link
Contributor

aggFTW commented Jan 9, 2017

Hey, @praveenkanamarlapudi @joychak @prabhu1984 @languy @tc0312 @msftristew: have you had a chance to read this thread?

Thanks!

@joychak
Copy link

joychak commented Jan 9, 2017 via email

@aggFTW
Copy link
Contributor

aggFTW commented Jan 9, 2017

Much appreciated @joychak! Hopefully we can release something valuable for everyone! :)

@praveen-kanamarlapudi
Copy link
Contributor

Hi @aggFTW

For the scenarios you mentioned earlier I am giving my view from #284 perspective.

  1. default/explicit principals
    Since user can provide the principal on USERNAME tab, user can provide the default/explicit principal as "user" or "user@PRINCIPAL_DOMAIN".

  2. required/optional/disabled mutual authentication
    User can mention Kerberos on "auth_type" for kerberos authentication or leave it blank for plain/simple authentication. We made mutual authentication as optional. I am not sure if required is the best practice.

  3. single-user Jupyter where sparkmagic takes the default ticket in the box
    We have two scenarios here.

    • If user uses config.json to create ticket? If user creates kerberos ticket using kinit on box and starta spark/pyspark kernel on jupyter, it will automatically uses the existing kerberos ticket.
    • If user uses %manage_spark, we are creating a new ticket/renewing the ticket if exists any.
  4. single-user Jupyter where sparkmagic is responsible for managing the lifetime of its tickets
    Yes, our code changes (as per configuration), it will manage the lifetime and renew ticket automatically. Ticket won't be renewed if there are no sessions available for the give user. We are not destroying the ticket as it may impact other processes.

  5. integrated authentication where the ticket flows from client-machine to JupyterHub, SparkMagic, and finally Livy
    We are using JupyterHub for our environment, but we didn't enable kerberos for Jupyterhub, we are managing kerberos on SparkMagic.
    If user wants to setup sparkmagic/jupyter without jupyterhub, then our setup will works, since kerberos is managed on SparkMagic.

Note: All users will not go through jupyterhub setup everytime, since it requires root access for npm, node, configurable-http-proxy installation, etc. So, keeping kerberos authentication on sparkmagic may be the right way.

@pkasinathan
Copy link

Hi @aggFTW,

We can implement kerberos authentication incrementally using #284 PR. i.e. Lets enable kerberos authentication first on Spark Magic. That way, spark magic manages kerberos ticket init, renew and lifetime.

PR #284 manages kerberos authentication in two ways:-

  1. User run kinit on jupyter/sparkmagic installed host and enable kerberos authentication using auth_type on ~/.sparkmagic/config.json.
    {or}
  2. User run %manage_spark and add endpoint with Kerberos authentication by providing username/password to run kinit command by spark magic and manage lifetime.

Problem with enabling kerberos authentication on jupyterhub: If we manage kerberos authentication outside the sparkmagic in jupyterhub, then it would add unnecessary dependent to install Jupyterhub. Most of the user community install Jupyter/Sparkmagic for their own instance.

Let us know your thoughts.

@languy
Copy link

languy commented Jan 10, 2017

Some thoughts:
This is a good first step that can help most users. Leaving the management of the ticket lifecycle (ie renewing the ticket) outside of the scope has some advantages: kerberos already comes with an expected set of tools and documentation that can be built upon.

Auto-renewing the ticket ourselves seems like a good optimization (over, say cronjob'ing kinit as was suggested), but there might be some pitfalls: is there enough output for the end-user to troubleshoot? Does it conflict with another renewal mechanism already on the box? Does it properly update the ticket on Windows OS? (I've had issues with a java kerberos lib that required changing registry keys to get permissions to write to the ticket cache folder). Are there scenarios that require turning off auto-renewal and prompt for user creds instead when ticket expires?

On mutual auth: I think it would be good to expose as many dials as requests_kerberos provides in configuration and also adopt its default behavior to avoid surprising the end-user. For instance, if I read their doc correctly, it seems that mutual auth is required by default.

On scenario 5, flowing the ticket to other components (livy, sparkmagic, etc): is this something the component can request from jupyter when needed? Or will it get pushed and cached for later use? In which case the ticket might be stale (not to mention possible security issues). Also, if the purpose is to sign on to external Kerberos servers, we're talking about propagating (and renewing) a different ticket (TGS).

My 2 cents.

@aggFTW
Copy link
Contributor

aggFTW commented Jan 10, 2017

Thanks @languy @prabhu1984 @praveenkanamarlapudi!

I like @languy's suggestion about keeping ticket management completely out of scope as a way to simplify. Thinking about it, I do see some pitfalls coming in... Let's start by getting this PR in, exposing all the dials that requests_kerberos provides.

@prabhu1984 @praveenkanamarlapudi, Thanks for your feedback! Points are valid, and we can definitely take some of the code from #284 and enrich this PR. Can you help us code review this PR? I will start doing so myself.

Copy link
Contributor

@aggFTW aggFTW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start! In general, I would ask you to take a look at the feedback for #284 and add unit tests for new behaviors.

I will install in my machine once you update the PR and also test locally.

Thanks!

@@ -85,6 +85,7 @@ def version(path):
'pandas>=0.17.1',
'numpy',
'requests',
'requests-kerberos',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add it to the requirements.txt file too please?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to bump the version of all 3 packages in this code base from 0.10.0 to 0.11.0

@@ -331,12 +331,15 @@ def _do_not_call_change_language(self, line, cell="", local_ns=None):

@magic_arguments()
@line_magic
@argument("--authentication", type=str, default=NONE_AUTH, choices=POSSIBLE_AUTH, \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also update the server extension to pass in the authentication method:

code = '%{} -s {} -u {} -p {}'.format(KernelMagics._do_not_call_change_endpoint.__name__, endpoint, username, password)

Look at _get_kernel_name for how the server extension handles optional parameters. I think default should be configurable, with default being BASIC_AUTH.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add unit tests for this.

(username, password, url) = (credentials['username'], credentials['password'], credentials['url'])
self.endpoint = Endpoint(url, username, password)
(username, password, authentication, url) = (credentials['username'], credentials['password'], \
credentials['authentication'], credentials['url'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please update the README.md file with examples on how to configure different authentication methods in the config file, and list Kerberos support as a feature 😄 ?

from requests_kerberos import HTTPKerberosAuth, REQUIRED
auth = HTTPKerberosAuth(mutual_authentication=REQUIRED, force_preemptive=True)
else:
raise ValueError("Unsupported authentication type {}".format(self._endpoint.authentication))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -52,6 +53,8 @@ def manage_spark(self, line, local_ns=None):
"from the server for SQL queries")
@argument("-r", "--samplefraction", type=float, default=None, help="Sample fraction for sampling from SQL queries")
@argument("-u", "--url", type=str, default=None, help="URL for Livy endpoint")
@argument("--authentication", type=str, default=NONE_AUTH, choices=POSSIBLE_AUTH, \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using -t for authentication?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the widgets that serve as UI for this too like in https://github.com/jupyter-incubator/sparkmagic/pull/284/files#r83328805

if data is None:
r = function(url, headers=self._headers, verify=self.verify_ssl)
else:
r = function(url, headers=self._headers, data=json.dumps(data), verify=self.verify_ssl)
else:
if self._endpoint.authentication == BASIC_AUTH:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some unit tests for this if/else statements?

@@ -52,7 +52,7 @@ def _get_statement_output(self, session, statement_id):

self.logger.debug(u"Status of statement {} is {}.".format(statement_id, status))

if status == u"running":
if status == u"running" or status == u"waiting":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a unit test for this please?

@aggFTW
Copy link
Contributor

aggFTW commented Jan 17, 2017

Hi @brockn, are you taking a look at the review feedback? Can I help with something else?
I also noticed that you are checking that status is not None: when does this happen?
Thanks!

@aggFTW
Copy link
Contributor

aggFTW commented Feb 21, 2017

FYI, talk given by @joychak at Spark Summit on the topic: http://www.slideshare.net/SparkSummit/secured-kerberosbased-spark-notebook-for-data-science-spark-summit-east-talk-by-joy-chakraborty

Joy, want to chime in on the thread to comment on this particular implementation?

Thanks!

@joychak
Copy link

joychak commented Feb 21, 2017 via email

@pranayhasan
Copy link

@brockn Any update on merging this PR?

@aggFTW aggFTW closed this May 1, 2017
@vkabanorv
Copy link

Hi all!

I understand that this has been implemented? If so, could you please share the details of configuration? Have added auth "Kerberos" to config.json, kinited, but still getting

Problem accessing /sessions. Reason:

    Authentication required

. How do I get sparkmagic to do the authentication?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants