Kerberos support for the livy client #314

brockn · 2017-01-04T03:26:40Z

Added authentication parameter (-a was taken so long opt only)
Fixes all failing tests
Adds requests-kerberos support
Handles "waiting" state


brockn · 2017-01-04T03:27:36Z

We'll be testing this patch over the next few weeks.

aggFTW · 2017-01-04T18:35:32Z

Thanks for the submission @brockn!

Before we discuss code details, I'd like to take a step back and discuss how Kerberos authentication should be done so that the following configurations are possible, and all of the people mentioned below are able to cover the scenarios they care about (please do let me know if I'm missing some scenario!):

1. default/explicit principals
2. required/optional/disabled mutual authentication
3. single-user Jupyter where sparkmagic takes the default ticket in the box
4. single-user Jupyter where sparkmagic is responsible for managing the lifetime of its tickets
5. integrated authentication where the ticket flows from client-machine to JupyterHub, SparkMagic, and finally Livy

We've already started this sort of discussion in #284

cc.
@languy @tc0312 @msftristew @praveenkanamarlapudi @joychak @prabhu1984

brockn · 2017-01-05T15:58:51Z

Hi @aggFTW,

Thanks for pointing out #284! I didn't see that.

I empathize with that, and to be clear, this approach requires the user to kinit (and renew via, say, crontab) on the host where their Jupyter instance is running. It is fairly narrow in scope, but it is straightforward to extend to use cases 1, 2, and 4, and perhaps 5, which is a larger project whose scope I don't know.
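
As a rough illustration of the prerequisite described above, the following minimal sketch checks whether a valid ticket is already present on the host. It assumes MIT Kerberos, where `klist -s` prints nothing and exits with status 0 only when the credential cache is readable and unexpired. The function name and the injectable `run` parameter are hypothetical and are not part of this PR.

```python
import subprocess


def has_valid_ticket(run=subprocess.call):
    """Return True if `klist -s` reports a usable credential cache.

    Assumption: MIT Kerberos semantics, where `klist -s` is silent and
    exits 0 only if the cache can be read and has not expired.
    `run` is injectable so the check can be exercised without Kerberos.
    """
    try:
        return run(["klist", "-s"]) == 0
    except OSError:
        # klist is not installed or not on PATH.
        return False
```

A caller could run this before opening a Livy session and tell the user to run kinit when it returns False.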

Given that Kerberos support seems stalled, I feel like we might be trying to boil the ocean here. Either this or #284 would make SparkMagic usable in the real world, where clusters have Kerberos.

Just my two cents though!

Brock

aggFTW · 2017-01-05T18:25:51Z

I agree that we can build this incrementally. I just want to give people who have worked on scenarios 4 and 5 the opportunity to chime in if this is a good first step that they could extend for their scenarios.
Ping @praveenkanamarlapudi @joychak @prabhu1984

I like this first approach if we can also expose options to support scenarios 1 and 2 at a minimum before check in.

@languy @tc0312 @msftristew: Thoughts?

aggFTW · 2017-01-09T15:45:04Z

Hey, @praveenkanamarlapudi @joychak @prabhu1984 @languy @tc0312 @msftristew: have you had a chance to read this thread?

Thanks!

joychak · 2017-01-09T16:02:17Z

Not yet, but I will this week.


aggFTW · 2017-01-09T16:27:01Z

Much appreciated @joychak! Hopefully we can release something valuable for everyone! :)

praveen-kanamarlapudi · 2017-01-09T20:39:28Z

Hi @aggFTW

For the scenarios you mentioned earlier, here is my view from the perspective of #284.

1. default/explicit principals
Since the user provides the principal on the USERNAME tab, they can give either the default or an explicit principal, as "user" or "user@PRINCIPAL_DOMAIN".
2. required/optional/disabled mutual authentication
The user can set "auth_type" to Kerberos for Kerberos authentication, or leave it blank for plain/simple authentication. We made mutual authentication optional. I am not sure whether required is the best practice.
3. single-user Jupyter where sparkmagic takes the default ticket in the box
We have two scenarios here.
- If the user relies on config.json: when the user creates a Kerberos ticket with kinit on the box and starts a spark/pyspark kernel in Jupyter, the kernel automatically uses the existing ticket.
- If the user uses %manage_spark, we create a new ticket, or renew the existing one if there is one.
4. single-user Jupyter where sparkmagic is responsible for managing the lifetime of its tickets
Yes. With our code changes (depending on configuration), sparkmagic manages the lifetime and renews the ticket automatically. The ticket won't be renewed if there are no sessions available for the given user. We do not destroy the ticket, as that may impact other processes.
5. integrated authentication where the ticket flows from client-machine to JupyterHub, SparkMagic, and finally Livy
We use JupyterHub in our environment, but we didn't enable Kerberos for JupyterHub; we manage Kerberos in SparkMagic.
If a user wants to set up sparkmagic/Jupyter without JupyterHub, our setup still works, since Kerberos is managed in SparkMagic.

Note: Not every user will go through the JupyterHub setup, since it requires root access to install npm, node, configurable-http-proxy, etc. So keeping Kerberos authentication in sparkmagic may be the right way.

pkasinathan · 2017-01-10T01:51:46Z

Hi @aggFTW,

We can implement Kerberos authentication incrementally using PR #284, i.e., let's enable Kerberos authentication in sparkmagic first. That way, sparkmagic manages Kerberos ticket init, renewal, and lifetime.

PR #284 manages Kerberos authentication in two ways:

- The user runs kinit on the host where Jupyter/sparkmagic is installed and enables Kerberos authentication via auth_type in ~/.sparkmagic/config.json, or
- The user runs %manage_spark and adds an endpoint with Kerberos authentication, providing a username and password so that sparkmagic runs kinit and manages the ticket lifetime.

Problem with enabling Kerberos authentication in JupyterHub: if we manage Kerberos authentication outside sparkmagic, in JupyterHub, it adds an unnecessary dependency on installing JupyterHub. Most of the user community installs Jupyter/sparkmagic for their own instance.

Let us know your thoughts.

languy · 2017-01-10T13:09:59Z

Some thoughts:
This is a good first step that can help most users. Leaving the management of the ticket lifecycle (i.e., renewing the ticket) out of scope has some advantages: Kerberos already comes with an expected set of tools and documentation that can be built upon.

Auto-renewing the ticket ourselves seems like a good optimization (over, say, running kinit from a cron job as was suggested), but there might be some pitfalls: is there enough output for the end-user to troubleshoot? Does it conflict with another renewal mechanism already on the box? Does it properly update the ticket on Windows? (I've had issues with a Java Kerberos lib that required changing registry keys to get permissions to write to the ticket cache folder.) Are there scenarios that require turning off auto-renewal and prompting for user creds instead when the ticket expires?

On mutual auth: I think it would be good to expose as many dials as requests_kerberos provides in configuration and also adopt its default behavior to avoid surprising the end-user. For instance, if I read their doc correctly, it seems that mutual auth is required by default.
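
A minimal sketch of what exposing those dials might look like, assuming a hypothetical dictionary of Kerberos options read from the sparkmagic config. The option names listed (`mutual_authentication`, `service`, `delegate`, `force_preemptive`, `principal`, `hostname_override`) are keyword arguments of `requests_kerberos.HTTPKerberosAuth`; the helper names and the idea of spelling the mutual-auth mode as a string are illustrative only. Options the user leaves out are not passed at all, so the library's own defaults apply.

```python
# Keyword arguments of requests_kerberos.HTTPKerberosAuth that could be
# surfaced in configuration (this selection is an assumption).
ALLOWED_OPTIONS = {
    "mutual_authentication", "service", "delegate",
    "force_preemptive", "principal", "hostname_override",
}
# Names of the mutual-authentication constants exported by requests_kerberos.
MUTUAL_AUTH_MODES = {"REQUIRED", "OPTIONAL", "DISABLED"}


def kerberos_kwargs(user_config):
    """Validate user-supplied options and return a copy of them.

    Omitted options stay omitted, so requests_kerberos falls back to its
    own default behavior for them.
    """
    unknown = set(user_config) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError("Unknown Kerberos options: {}".format(sorted(unknown)))
    mode = user_config.get("mutual_authentication")
    if mode is not None and mode not in MUTUAL_AUTH_MODES:
        raise ValueError("Unknown mutual_authentication mode: {}".format(mode))
    return dict(user_config)


def build_kerberos_auth(user_config):
    """Create an HTTPKerberosAuth object from validated options."""
    import requests_kerberos  # lazy import: only needed when Kerberos is used

    kwargs = kerberos_kwargs(user_config)
    if "mutual_authentication" in kwargs:
        # Translate the string into the library constant of the same name.
        kwargs["mutual_authentication"] = getattr(
            requests_kerberos, kwargs["mutual_authentication"])
    return requests_kerberos.HTTPKerberosAuth(**kwargs)
```

For example, `build_kerberos_auth({"mutual_authentication": "OPTIONAL", "principal": "user@EXAMPLE.COM"})` would cover scenarios 1 and 2 without sparkmagic hard-coding either choice.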

On scenario 5, flowing the ticket to other components (livy, sparkmagic, etc): is this something the component can request from jupyter when needed? Or will it get pushed and cached for later use? In which case the ticket might be stale (not to mention possible security issues). Also, if the purpose is to sign on to external Kerberos servers, we're talking about propagating (and renewing) a different ticket (TGS).

My 2 cents.

aggFTW · 2017-01-10T23:59:26Z

Thanks @languy @prabhu1984 @praveenkanamarlapudi!

I like @languy's suggestion about keeping ticket management completely out of scope as a way to simplify. Thinking about it, I do see some pitfalls coming in... Let's start by getting this PR in, exposing all the dials that requests_kerberos provides.

@prabhu1984 @praveenkanamarlapudi, thanks for your feedback! Your points are valid, and we can definitely take some of the code from #284 and enrich this PR. Can you help us code review this PR? I will start doing so myself.

aggFTW

Good start! In general, I would ask you to take a look at the feedback for #284 and add unit tests for new behaviors.

Once you update the PR, I will also install it on my machine and test locally.

Thanks!

aggFTW · 2017-01-10T23:56:29Z

sparkmagic/setup.py

@@ -85,6 +85,7 @@ def version(path):
          'pandas>=0.17.1',
          'numpy',
          'requests',
+          'requests-kerberos',


Can you please add it to the requirements.txt file too?

You will also need to bump the version of all 3 packages in this code base from 0.10.0 to 0.11.0.

aggFTW · 2017-01-11T00:06:24Z

sparkmagic/sparkmagic/kernels/kernelmagics.py

@@ -331,12 +331,15 @@ def _do_not_call_change_language(self, line, cell="", local_ns=None):

    @magic_arguments()
    @line_magic
+    @argument("--authentication", type=str, default=NONE_AUTH, choices=POSSIBLE_AUTH, \


Please also update the server extension to pass in the authentication method:

sparkmagic/sparkmagic/sparkmagic/serverextension/handlers.py

Line 55 in 67fcf74

code = '%{} -s {} -u {} -p {}'.format(KernelMagics._do_not_call_change_endpoint.__name__, endpoint, username, password)

Look at _get_kernel_name for how the server extension handles optional parameters. I think the default should be configurable, with BASIC_AUTH as the default.

Please also add unit tests for this.
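
A minimal sketch of how the magic line could carry the authentication method, assuming the `--authentication` long option from this PR. The helper name, its parameters, and the string values of the constants are placeholders; the real constants live in the PR's code.

```python
# Placeholder values: the PR defines the real constants.
NONE_AUTH = "None"
BASIC_AUTH = "Basic"
KERBEROS_AUTH = "Kerberos"
POSSIBLE_AUTH = [NONE_AUTH, BASIC_AUTH, KERBEROS_AUTH]


def build_change_endpoint_line(magic_name, endpoint, username, password,
                               authentication=None, default_auth=BASIC_AUTH):
    """Build the magic line, falling back to a configurable default.

    `authentication` is optional in the incoming request; when it is
    missing, `default_auth` is used instead.
    """
    auth = authentication if authentication is not None else default_auth
    if auth not in POSSIBLE_AUTH:
        raise ValueError("Unsupported authentication type {}".format(auth))
    return "%{} -s {} -u {} -p {} --authentication {}".format(
        magic_name, endpoint, username, password, auth)
```

A unit test can then assert on the returned string for both the explicit and the defaulted case.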

aggFTW · 2017-01-11T00:08:22Z

sparkmagic/sparkmagic/kernels/kernelmagics.py

-        (username, password, url) = (credentials['username'], credentials['password'], credentials['url'])
-        self.endpoint = Endpoint(url, username, password)
+        (username, password, authentication, url) = (credentials['username'], credentials['password'], \
+                credentials['authentication'], credentials['url'])


Can you please update the README.md file with examples on how to configure different authentication methods in the config file, and list Kerberos support as a feature 😄 ?
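
One thing worth illustrating for the README: the diff above indexes `credentials['authentication']` directly, so if the credentials entry comes straight from an existing config.json that predates this key, a plain dictionary lookup would raise `KeyError`. A minimal sketch of a tolerant read follows; the function name, the default value, and the example entry are assumptions, not the PR's code.

```python
NONE_AUTH = "None"  # placeholder value; the PR defines the real constant


def read_credentials(credentials, default_auth=NONE_AUTH):
    """Return (username, password, authentication, url) from a config entry.

    A missing 'authentication' key falls back to `default_auth`, so
    config files written before this option existed keep working.
    """
    return (
        credentials["username"],
        credentials["password"],
        credentials.get("authentication", default_auth),
        credentials["url"],
    )


# Hypothetical config entry using the new key.
example = {
    "username": "",
    "password": "",
    "url": "http://localhost:8998",
    "authentication": "Kerberos",
}
```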

aggFTW · 2017-01-11T00:11:51Z

sparkmagic/sparkmagic/livyclientlib/reliablehttpclient.py

+                        from requests_kerberos import HTTPKerberosAuth, REQUIRED
+                        auth = HTTPKerberosAuth(mutual_authentication=REQUIRED, force_preemptive=True)
+                    else:
+                        raise ValueError("Unsupported authentication type {}".format(self._endpoint.authentication))


Please use BadUserConfigurationException instead of ValueError.
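
A minimal sketch of the suggested change, written as a standalone function so that each branch can be unit tested. The exception class here is a stand-in for sparkmagic's `BadUserConfigurationException`, and the constant values and function name are placeholders. The Kerberos branch mirrors the arguments used in the diff above.

```python
class BadUserConfigurationException(Exception):
    """Stand-in for sparkmagic's exception of the same name."""


# Placeholder values: the PR defines the real constants.
NONE_AUTH, BASIC_AUTH, KERBEROS_AUTH = "None", "Basic", "Kerberos"


def make_auth(authentication, username="", password=""):
    """Return an object suitable for the `auth=` argument of requests."""
    if authentication == NONE_AUTH:
        return None
    if authentication == BASIC_AUTH:
        from requests.auth import HTTPBasicAuth
        return HTTPBasicAuth(username, password)
    if authentication == KERBEROS_AUTH:
        # Lazy import so users without Kerberos are not affected.
        from requests_kerberos import HTTPKerberosAuth, REQUIRED
        return HTTPKerberosAuth(mutual_authentication=REQUIRED,
                                force_preemptive=True)
    raise BadUserConfigurationException(
        u"Unsupported authentication type {}".format(authentication))
```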

aggFTW · 2017-01-11T00:14:54Z

sparkmagic/sparkmagic/magics/remotesparkmagics.py

@@ -52,6 +53,8 @@ def manage_spark(self, line, local_ns=None):
                                                                        "from the server for SQL queries")
    @argument("-r", "--samplefraction", type=float, default=None, help="Sample fraction for sampling from SQL queries")
    @argument("-u", "--url", type=str, default=None, help="URL for Livy endpoint")
+    @argument("--authentication", type=str, default=NONE_AUTH, choices=POSSIBLE_AUTH, \


What about using -t for authentication?

Can you also update the widgets that serve as the UI for this, as in https://github.com/jupyter-incubator/sparkmagic/pull/284/files#r83328805?

aggFTW · 2017-01-11T00:20:39Z

sparkmagic/sparkmagic/livyclientlib/reliablehttpclient.py

                    if data is None:
                        r = function(url, headers=self._headers, verify=self.verify_ssl)
                    else:
                        r = function(url, headers=self._headers, data=json.dumps(data), verify=self.verify_ssl)
                else:
+                    if self._endpoint.authentication == BASIC_AUTH:


Could you add some unit tests for these if/else statements?

aggFTW · 2017-01-11T00:20:54Z

sparkmagic/sparkmagic/livyclientlib/command.py

@@ -52,7 +52,7 @@ def _get_statement_output(self, session, statement_id):

            self.logger.debug(u"Status of statement {} is {}.".format(statement_id, status))

-            if status == u"running":
+            if status == u"running" or status == u"waiting":


Can you add a unit test for this please?
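
A minimal sketch of the behavior such a test would pin down: both "running" and "waiting" mean the statement is not finished, so polling continues. This is not the actual `_get_statement_output`; the function name, the injectable `fetch_status` and `sleep` callables, and the `max_polls` limit are assumptions made so the loop can be tested without a Livy server.

```python
import time

# States in which the statement has not produced output yet.
PENDING_STATES = (u"running", u"waiting")


def wait_for_statement(fetch_status, sleep=time.sleep, interval=1.0,
                       max_polls=1000):
    """Poll until the statement leaves the pending states.

    Returns the first non-pending status. Raises RuntimeError if the
    statement is still pending after `max_polls` polls.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status not in PENDING_STATES:
            return status
        sleep(interval)
    raise RuntimeError(
        "Statement still pending after {} polls".format(max_polls))
```

A test can feed a fixed sequence of statuses through `fetch_status` and record the calls to `sleep` instead of actually waiting.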

aggFTW · 2017-01-17T19:54:59Z

Hi @brockn, are you taking a look at the review feedback? Can I help with something else?
I also noticed that you are checking that status is not None: when does this happen?
Thanks!

aggFTW · 2017-02-21T22:33:27Z

FYI, talk given by @joychak at Spark Summit on the topic: http://www.slideshare.net/SparkSummit/secured-kerberosbased-spark-notebook-for-data-science-spark-summit-east-talk-by-joy-chakraborty

Joy, want to chime in on the thread to comment on this particular implementation?

Thanks!

joychak · 2017-02-21T23:46:08Z

Sure. I would be more than happy to chat. Thanks, Joy


pranayhasan · 2017-03-02T15:34:01Z

@brockn Any update on merging this PR?

vkabanorv · 2018-07-24T15:44:56Z

Hi all!

I understand that this has been implemented? If so, could you please share the configuration details? I have added auth "Kerberos" to config.json and run kinit, but I am still getting:

Problem accessing /sessions. Reason:

    Authentication required

How do I get sparkmagic to do the authentication?

Kerberos support for the livy client

69de9e3

* Added authentication parameter (-a was taken so long opt only) * Fixes all failing tests * Adds requests-kerberos support

brockn mentioned this pull request Jan 4, 2017

Add Kerberos Support #282

Closed

aggFTW requested changes Jan 11, 2017

Fix bug when current status comes back None

aabe862

Additional debugging changes

7d7c0e2

aggFTW closed this May 1, 2017
