Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OAuth2 support for flyte-cli and SDK engine #23

Merged
merged 46 commits into from
Dec 6, 2019
Merged

Conversation

katrogan
Copy link
Contributor

@katrogan katrogan commented Oct 11, 2019

This change adds authentication support for flyte-cli and the pyflyte CLIs.

New authorization code

Specifically this change introduces an AuthorizationClient which implements the PKCE authorization flow for untrusted clients. This client handles requesting an initial access token, spinning up a callback server to receive the access token and using that to retrieve an authorization code. The client also handles refreshing expired authorization tokens.

This change also includes a lightweight DiscoveryClient for retrieving authorization endpoint metadata defined in the OAuth 2.0 Authorization Server Metadata draft document.

An authorization client singleton is lazily initialized for use by flyte-cli.

Pyflyte changes (basic auth)

Requests an authorization token using a username and password.

Flyte-cli changes (standard auth)

Requests an authorization token using the PKCE flow.

Raw client changes

Wraps RPC calls to flyteadmin in a retry handler that initiates the appropriate authentication flow defined in the flytekit config in response to HTTP 401 unauthorized response codes.

from flytekit.clis.helpers import construct_literal_map_from_variable_map as _construct_literal_map_from_variable_map, \
construct_literal_map_from_parameter_map as _construct_literal_map_from_parameter_map, \
parse_args_into_dict as _parse_args_into_dict, str2bool as _str2bool
from flytekit.common import utils as _utils, launch_plan as _launch_plan_common
from flytekit.common.core import identifier as _identifier
from flytekit.common.types import helpers as _type_helpers
from flytekit.common.utils import load_proto_from_file as _load_proto_from_file
from flytekit.configuration import platform as _platform_config
from flytekit.configuration import creds as _creds_config, platform as _platform_config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I think we should keep the underlying config objects regardless (because they'll be useful in pyflyte), but currently none of these files iirc are used in flyte-cli. That is, the flytekit.config style files are never read by flyte-cli, because it's assumed that users of flyte-cli might not have any flyte repos checked out at all - they may only be an administrator. Auth has a fair amount to config though, it's pretty awkward to pass in everything as command line args. What does @matthewphsmith think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should definitely be env var based. You can use the configuration package though to make things easier since the objects are env var aware.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think making it env var based makes it difficult to set-up for an end-user. I'd prefer to have a config that can be easily cloned and written, but I'm happy to move it to a separate directory if it makes sense organizationally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haytham's argument was that there is a .aws/config and a .kube/config and a .gitconfig. Why can't there be a .flyte/config? I think it's cleaner than having to rely on five or six environment variables. What's the downside that you're worried about Matt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can definitely have .flyte/config The only thing is then we need to provide config modification commands (but with click opening up the editor is simple)
Another option is to have .flyte/config as an option otherwise explicitly specify all the args

from http import client as _StatusCodes
except ImportError: # Python 2
import httplib as _StatusCodes
from BaseHTTPServer import HTTPServer as _HTTPServer, BaseHTTPRequestHandler as _BaseHTTPRequestHandler
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's import the module. it makes refactoring easier and keeps the namespace cleaner.

import BaseHTTPServer as _base_http_server

Later...
_base_http_server.HTTPServer

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for urlparse below

Copy link
Contributor

@wild-endeavor wild-endeavor Nov 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it makes it harder to jump into the definition of the underlying object. and the ide shows everything as yellow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm actually i can still jump around in my ide to the underlying object with refactoring


from flytekit.configuration import common as _config_common

DISCOVERY_ENDPOINT = _config_common.FlyteStringConfigurationEntry('credentials', 'discovery_endpoint', default=None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are all these env vars going to be auto-set via the auth flow? or would a user ever specify any of this? If yes, can you let me know about a few of those cases to help me understand?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are specific to the hosted Flyte platform - so they can't be set. For example the discovery endpoint is configured in admin. The native Okta app at Lyft we're testing with has a unique client id and a hardcoded, pre-configured redirect URI. The authorization metadata key is a departure we have to make from the standard protocol because Envoy actually behaves funkily if we use the default value.

@katrogan katrogan requested a review from kumare3 as a code owner November 12, 2019 19:16
if not _platform_config.AUTH.get():
# nothing to do
return None
access_token = _keyring.get_password(_metadata, _metadata_access_token)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is awesome!

@kumare3
Copy link
Contributor

kumare3 commented Nov 12, 2019

I think before an approval we should either demonstrate the auth flow?

"""
This is the redirect uri registered with the app which handles authorization for a Flyte deployment.
This is the callback uri registered with the app which handles authorization for a Flyte deployment.
Please note the hardcoded port number. Ideally we would not do this, but some IDPs do not allow wildcards for
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haytham suggested that we implement logic to either accept a wildcard port or a list of ports (i.e. if we register a bunch of hard-coded ports with the IDP.) That way we have some flexibility with choosing an available system port. I'm happy to save that for the next PR though and leave this comment in for now



def _handle_rpc_error(fn):
@_try_three_times
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @matthewphsmith we introduced this to make sure we don't get stuck in a loop if authentication is persistently failing...but i'm not sure that this is the prettiest code 🙃 let us know if a better way stands out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code's been refactored but i think the original comment still stands

katrogan and others added 4 commits November 27, 2019 10:47
…that if the host is specified in the user's ~/.flyte/config file, you don't need to specify it in the flyte-cli command
requirements.txt Outdated Show resolved Hide resolved
auth=true

[credentials]
discovery_endpoint=http://corp.idp.com/.well-known/oauth-authorization-server
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the use case a bit as to why a user would specify the discovery and redirect endpoint? Wouldn’t that config be based server side?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, this discovery endpoint should be pointed at flyte.corp.net or something... and that should redirect to whatever IDP corp.net is using. Or it can return the values itself without redirecting, but the point is, you only hard-code the main flyte server. Thinking about it, maybe we should nix this @katrogan and just assume the extension off of the platform URL. That's probably the cleanest thing to do.

The redirect endpoint is configured in the IDP. It's a hardcoded fully-specified URL. This means that the callback url (including the port) that the local server that's stood up at has to match exactly what the IDP has. This information is not returned by the metadata/discovery endpoint. Why i'm not sure... it may be a security concern.

Ideally, the metadata endpoint would be less restrictive and allow us to return more things. Because of the limitations of the RFC, we also need to hardcode this in for example: https://github.com/lyft/flyteexamples/pull/97/files#diff-2e41be19539d0ba6fda0e5ce814a9123R22. This is the scope that will be requested by pyflyte when it makes the token request. Again, like the callback url, it has to match a hardcoded string set within the IDP, otherwise it'll fail. It doesn't really offer anything, but we have to have it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: assuming the extension - I'm happy to do this if we can safely assume this is the major convention (which I guess sounds like it is?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm very interesting...well seems like you have a handle on it, so I defer to whatever you think is best though I am biased to less config wherever possible.

flytekit/clients/raw.py Outdated Show resolved Hide resolved
_utf_8 = 'utf-8'


class FlyteAuthenticationException(_FlyteException):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIt: should we define this in flyte.exceptions.user?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flytekit/clis/flyte_cli/main.py Show resolved Hide resolved
:return:
"""
auth_endpoints = _credentials_access.get_authorization_endpoints()
token_endpoint = auth_endpoints.token_endpoint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guessing you'll have to do something here to support relative URLs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the token endpoint is "/token", the code doesn't work, I think

wild-endeavor
wild-endeavor previously approved these changes Dec 6, 2019
@jonathanburns
Copy link
Contributor

Any chance you can ask for password over stdin if password is required but not password is passed? Something like this:

https://github.com/lyft/flytekit/pull/56/files#diff-39ff0a124da92894da1f07bd5ce13c67R46-R47

@wild-endeavor wild-endeavor merged commit 439a36d into master Dec 6, 2019
max-hoffman pushed a commit to dolthub/flytekit that referenced this pull request May 11, 2021
This change adds authentication support for flyte-cli and the pyflyte CLIs.

# New authorization code
Specifically this change introduces an **AuthorizationClient** which implements the [PKCE authorization flow](https://www.oauth.com/oauth2-servers/pkce/authorization-code-exchange/) for untrusted clients. This client handles requesting an initial access token, spinning up a callback server to receive the access token and using that to retrieve an authorization code. The client also handles refreshing expired authorization tokens.

This change also includes a lightweight **DiscoveryClient** for retrieving authorization endpoint metadata defined in the [OAuth 2.0 Authorization Server Metadata](https://tools.ietf.org/id/draft-ietf-oauth-discovery-08.html) draft document.

An authorization client singleton is lazily initialized for use by flyte-cli.

# Pyflyte changes (basic auth)
Requests an authorization token using a username and password.

# Flyte-cli changes (standard auth)
Requests an authorization token using the PKCE flow.

# Raw client changes
Wraps RPC calls to flyteadmin in a retry handler that initiates the appropriate authentication flow defined in the flytekit config in response to `HTTP 401 unauthorized` response codes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants