[Core feature] Allow flyteadmin to start even if OIDC is unavailable (Improve flyteadmin startup resiliency) #5701

Open
2 tasks done
ddl-rliu opened this issue Aug 28, 2024 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@ddl-rliu
Contributor

ddl-rliu commented Aug 28, 2024

Motivation: Why do you think this is important?

Today, the flyteadmin pod cannot start until the OIDC provider is healthy and available; until then the pod is stuck in the Error state. In some Kubernetes configurations, this failing pod can cause deployment-wide issues. The current behavior could be made more resilient.

(Note that this applies only to configurations using useAuth=true.)

Goal: What should the final outcome look like, ideally?

A better approach in these configurations is to allow flyte to start up, even if the OIDC provider is unavailable. Then, try to re-initialize the OIDC provider later in the deployment lifespan. This is a more resilient approach, and it can be made configurable.
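
As a rough illustration of the idea above (not the code in flyteadmin or in the proposed fix), the following Go sketch attempts initialization once at startup and, if that fails, keeps retrying in the background instead of exiting. All names here (`Provider`, `InitFunc`, `LazyProvider`, `ErrNotReady`, `flakyInit`) are hypothetical and exist only for this example; a real implementation would plug in the actual OIDC discovery call.

```go
package main

import (
	"context"
	"errors"
	"log"
	"sync"
	"time"
)

// Provider stands in for an initialized OIDC provider handle (hypothetical).
type Provider struct {
	Issuer string
}

// InitFunc performs the OIDC discovery call (hypothetical signature).
type InitFunc func(ctx context.Context) (*Provider, error)

// ErrNotReady is returned while the provider has not been initialized yet.
var ErrNotReady = errors.New("OIDC provider not initialized yet")

// LazyProvider tries to initialize at startup and, on failure, retries in
// the background rather than failing the whole process.
type LazyProvider struct {
	mu       sync.RWMutex
	provider *Provider
	init     InitFunc
	interval time.Duration
}

// NewLazyProvider never returns an error: startup proceeds either way.
func NewLazyProvider(ctx context.Context, init InitFunc, interval time.Duration) *LazyProvider {
	lp := &LazyProvider{init: init, interval: interval}
	if err := lp.tryInit(ctx); err != nil {
		log.Printf("OIDC provider unavailable at startup, will retry: %v", err)
		go lp.retryLoop(ctx)
	}
	return lp
}

// tryInit makes a single initialization attempt and stores the result on success.
func (lp *LazyProvider) tryInit(ctx context.Context) error {
	p, err := lp.init(ctx)
	if err != nil {
		return err
	}
	lp.mu.Lock()
	lp.provider = p
	lp.mu.Unlock()
	return nil
}

// retryLoop retries at a fixed interval until success or context cancellation.
func (lp *LazyProvider) retryLoop(ctx context.Context) {
	ticker := time.NewTicker(lp.interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := lp.tryInit(ctx); err == nil {
				log.Print("OIDC provider initialized")
				return
			}
		}
	}
}

// Get returns the provider, or ErrNotReady if initialization has not succeeded.
func (lp *LazyProvider) Get() (*Provider, error) {
	lp.mu.RLock()
	defer lp.mu.RUnlock()
	if lp.provider == nil {
		return nil, ErrNotReady
	}
	return lp.provider, nil
}

// flakyInit simulates a provider that is unreachable for the first
// `failures` attempts (for illustration only).
func flakyInit(failures int) InitFunc {
	attempts := 0
	return func(ctx context.Context) (*Provider, error) {
		attempts++
		if attempts <= failures {
			return nil, errors.New("simulated: provider unreachable")
		}
		return &Provider{Issuer: "https://issuer.example"}, nil
	}
}
```

A caller could construct this with something like `NewLazyProvider(ctx, flakyInit(2), 5*time.Second)` and have auth-dependent handlers reject requests while `Get` returns `ErrNotReady`. Whether to retry at all, and how often, could sit behind a configuration option, as the issue suggests.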

Describe alternatives you've considered

A workaround is to disable Flyte until the OIDC provider is available.

Propose: Link/Inline OR Additional context

Proposed fix here: #5702

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
ddl-rliu added the enhancement and untriaged labels on Aug 28, 2024

runllm bot commented Aug 28, 2024

The issue of FlyteAdmin being blocked from starting up until the OIDC provider is healthy is known. According to Flyte's documentation, FlyteAdmin requires an OIDC provider for authentication when useAuth=true. This can cause deployment issues if the OIDC provider is unavailable.

A proposed solution is to allow FlyteAdmin to start even if the OIDC provider is unavailable and re-initialize the OIDC provider later. This approach would improve resiliency and can be made configurable. Currently, a workaround is to disable Flyte until the OIDC provider is available.

Would you like more details on how to implement this or any other specific aspect?


eapolinario self-assigned this on Sep 5, 2024
eapolinario removed the untriaged label on Sep 5, 2024
@Sovietaced
Contributor

> A better approach in these configurations is to allow flyte to start up, even if the OIDC provider is unavailable. Then, try to re-initialize the OIDC provider later in the deployment lifespan. This is a more resilient approach, and it can be made configurable.

This is a double-edged sword. If I recall correctly, there was some work done recently to indefinitely cache something related to OIDC on boot. If the OIDC provider is down and you do a normal rolling deployment, you could end up in a worse state: you previously had working pods, but now you have broken ones.

@eapolinario
Contributor

> If I recall correctly there was some work done recently to indefinitely cache something related to OIDC on boot.

@Sovietaced, can you point to this change? Are you thinking of #5621?

@Sovietaced
Contributor

> @Sovietaced, can you point to this change? Are you thinking of #5621?

Yeah, I think so.

Projects
Status: Assigned
Development

No branches or pull requests

3 participants