Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use incluster config fallback for KubernetesProvider #3357

Merged

Conversation

trey-stafford
Copy link
Contributor

Description

This PR supersedes #2503

See kubernetes-client/python#1005 for a good description of the issue.

Changed Behaviour

The KubernetesProvider now falls back to loading config in-cluster if a kube-config file is not found. This allows in-cluster submission of parsl jobs.

Fixes

Fixes # (issue)

Type of change

Choose which options apply, and delete the ones which do not apply.

  • Bug fix

@mfisher87
Copy link
Contributor

@benclifford do you have any thoughts on this CI failure? It also occurred on your PR #3359.

Failed to build radical.pilot
ERROR: Could not build wheels for radical.pilot, which is required to install pyproject.toml-based projects

@benclifford
Copy link
Collaborator

some exciting new build problem with radical's build/packaging. I tried briefly to recreate it on my laptop and did not have a problem, so I'll dig into it more deeply later in the week - I'm not working on Parsl today.

@benclifford
Copy link
Collaborator

(very possibly related to build system changes in Python 3.12 which were/are fairly aggressive)

@mfisher87
Copy link
Contributor

Oof. Thanks so much for providing that context and saving us from spinning our wheels on this! ❤️

@benclifford
Copy link
Collaborator

@trey-stafford I don't have rights to update your branch, so I can't merge the latest parsl master (and in general that will make it hard to get your branch up to date to merge it later). But, recent PR #3360 should fix the radical test failure you reported.

@benclifford benclifford changed the title K8s use incluster config fallback Use incluster config fallback for KubernetesProvider Apr 18, 2024
@trey-stafford trey-stafford force-pushed the k8s-use-incluser-config-fallback branch from 82c02a1 to 9f63b39 Compare April 18, 2024 16:29
@trey-stafford
Copy link
Contributor Author

@benclifford I've merged in master. Happy to do that again as needed!

Note that I also amended my first commit to this PR to properly credit @manning-ncsa for work contributed in the superseded PR, #2503 .

@mfisher87
Copy link
Contributor

@trey-stafford if you edit this PR can you enable "Allow maintainers to push"? Or is that only settable when opening the original PR? 🤔

@benclifford
Copy link
Collaborator

you did something weird perhaps with rebase that makes the diff look weird - I think I know what, and I think that will fix itself once master gets merged again - which is necessary for putting this PR on master anyway as our tests require the PR be up to date.

@mfisher87
Copy link
Contributor

mfisher87 commented Apr 19, 2024

I think I see what happened, we'll get it sorted on Monday! Thanks for your patience :)

@mfisher87 mfisher87 force-pushed the k8s-use-incluser-config-fallback branch from 9f63b39 to 0e1fc95 Compare April 24, 2024 17:38
@mfisher87
Copy link
Contributor

🚀 Sorry didn't get this done on Monday, but I think we're all fixed up now :)

@benclifford
Copy link
Collaborator

Can you comment a bit on what the exception reported to a user looks like? Say the user intends to do something with the config loaded by load_kube_config() but they mess it up in some way that raises an exception... does useful debugging information then get lost because that exception is immediately discarded?

@mfisher87
Copy link
Contributor

mfisher87 commented Apr 24, 2024

I have a couple meetings coming up to prepare for but we will get back to you in a couple hours!

@mfisher87
Copy link
Contributor

Would you like that context in the commit message for this commit? 4833fc2

@benclifford
Copy link
Collaborator

It doesn't really matter where you write it - if you have something really interesting to say, a comment in the code would be ok too, for example. Just more interested that someone thinks about it because I am always suspicious of too-easily-discarded exceptions.

@mfisher87
Copy link
Contributor

mfisher87 commented Apr 26, 2024

We pushed another change that we feel will help prevent a really confusing user experience if both local config and incluster config fail to load. We used an ExceptionGroup to handle this, but perhaps the better solution is to only re-raise the local config exception with a note that we tried both loading local and incluster config. The ExceptionGroup looks like this when raised:

Traceback (most recent call last):
  File "/home/mfisher/Projects/qgreenland-net/parsl-exploration/src/parsl/parsl/providers/kubernetes/kube.py", line 109, in __init__
    config.load_kube_config()
  File "/home/mfisher/.miniconda3/envs/parsl-exploration/lib/python3.12/site-packages/kubernetes/config/kube_config.py", line 815, in load_kube_config
    loader = _get_kube_config_loader(
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mfisher/.miniconda3/envs/parsl-exploration/lib/python3.12/site-packages/kubernetes/config/kube_config.py", line 772, in _get_kube_config_loader
    raise ConfigException(
kubernetes.config.config_exception.ConfigException: Invalid kube-config file. No configuration found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mfisher/Projects/qgreenland-net/parsl-exploration/src/parsl/parsl/providers/kubernetes/kube.py", line 123, in __init__
    config.load_incluster_config()
  File "/home/mfisher/.miniconda3/envs/parsl-exploration/lib/python3.12/site-packages/kubernetes/config/incluster_config.py", line 121, in load_incluster_config
    try_refresh_token=try_refresh_token).load_and_set(client_configuration)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mfisher/.miniconda3/envs/parsl-exploration/lib/python3.12/site-packages/kubernetes/config/incluster_config.py", line 54, in load_and_set
    self._load_config()
  File "/home/mfisher/.miniconda3/envs/parsl-exploration/lib/python3.12/site-packages/kubernetes/config/incluster_config.py", line 62, in _load_config
    raise ConfigException("Service host/port is not set.")
kubernetes.config.config_exception.ConfigException: Service host/port is not set.

During handling of the above exception, another exception occurred:

  + Exception Group Traceback (most recent call last):
  |   File "/home/mfisher/Projects/qgreenland-net/parsl-exploration/run.py", line 104, in <module>
  |     config = get_parsl_config()
  |              ^^^^^^^^^^^^^^^^^^
  |   File "/home/mfisher/Projects/qgreenland-net/parsl-exploration/run.py", line 49, in get_parsl_config
  |     provider=KubernetesProvider(
  |              ^^^^^^^^^^^^^^^^^^^
  |   File "/home/mfisher/Projects/qgreenland-net/parsl-exploration/src/parsl/parsl/providers/kubernetes/kube.py", line 125, in __init__
  |     raise ExceptionGroup(
  | ExceptionGroup: Encountered errors loading kubernetes cluster configuration (2 sub-exceptions)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/home/mfisher/Projects/qgreenland-net/parsl-exploration/src/parsl/parsl/providers/kubernetes/kube.py", line 109, in __init__
    |     config.load_kube_config()
    |   File "/home/mfisher/.miniconda3/envs/parsl-exploration/lib/python3.12/site-packages/kubernetes/config/kube_config.py", line 815, in load_kube_config
    |     loader = _get_kube_config_loader(
    |              ^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/mfisher/.miniconda3/envs/parsl-exploration/lib/python3.12/site-packages/kubernetes/config/kube_config.py", line 772, in _get_kube_config_loader
    |     raise ConfigException(
    | kubernetes.config.config_exception.ConfigException: Invalid kube-config file. No configuration found.
    +---------------- 2 ----------------
    | Traceback (most recent call last):
    |   File "/home/mfisher/Projects/qgreenland-net/parsl-exploration/src/parsl/parsl/providers/kubernetes/kube.py", line 123, in __init__
    |     config.load_incluster_config()
    |   File "/home/mfisher/.miniconda3/envs/parsl-exploration/lib/python3.12/site-packages/kubernetes/config/incluster_config.py", line 121, in load_incluster_config
    |     try_refresh_token=try_refresh_token).load_and_set(client_configuration)
    |                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/mfisher/.miniconda3/envs/parsl-exploration/lib/python3.12/site-packages/kubernetes/config/incluster_config.py", line 54, in load_and_set
    |     self._load_config()
    |   File "/home/mfisher/.miniconda3/envs/parsl-exploration/lib/python3.12/site-packages/kubernetes/config/incluster_config.py", line 62, in _load_config
    |     raise ConfigException("Service host/port is not set.")
    | kubernetes.config.config_exception.ConfigException: Service host/port is not set.
    +------------------------------------

And as we're typing this @trey-stafford called out that parsl needs to support 3.10, and ExceptionGroups are new in 3.11. After the change, you can see both root-cause exceptions in the chain:

Traceback (most recent call last):
  File "/home/mfisher/Projects/qgreenland-net/parsl-exploration/src/parsl/parsl/providers/kubernetes/kube.py", line 109, in __init__
    config.load_kube_config()
  File "/home/mfisher/.miniconda3/envs/parsl-exploration/lib/python3.12/site-packages/kubernetes/config/kube_config.py", line 815, in load_kube_config
    loader = _get_kube_config_loader(
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mfisher/.miniconda3/envs/parsl-exploration/lib/python3.12/site-packages/kubernetes/config/kube_config.py", line 772, in _get_kube_config_loader
    raise ConfigException(
kubernetes.config.config_exception.ConfigException: Invalid kube-config file. No configuration found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mfisher/Projects/qgreenland-net/parsl-exploration/src/parsl/parsl/providers/kubernetes/kube.py", line 123, in __init__
    config.load_incluster_config()
  File "/home/mfisher/.miniconda3/envs/parsl-exploration/lib/python3.12/site-packages/kubernetes/config/incluster_config.py", line 121, in load_incluster_config
    try_refresh_token=try_refresh_token).load_and_set(client_configuration)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mfisher/.miniconda3/envs/parsl-exploration/lib/python3.12/site-packages/kubernetes/config/incluster_config.py", line 54, in load_and_set
    self._load_config()
  File "/home/mfisher/.miniconda3/envs/parsl-exploration/lib/python3.12/site-packages/kubernetes/config/incluster_config.py", line 62, in _load_config
    raise ConfigException("Service host/port is not set.")
kubernetes.config.config_exception.ConfigException: Service host/port is not set.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mfisher/Projects/qgreenland-net/parsl-exploration/run.py", line 104, in <module>
    config = get_parsl_config()
             ^^^^^^^^^^^^^^^^^^
  File "/home/mfisher/Projects/qgreenland-net/parsl-exploration/run.py", line 49, in get_parsl_config
    provider=KubernetesProvider(
             ^^^^^^^^^^^^^^^^^^^
  File "/home/mfisher/Projects/qgreenland-net/parsl-exploration/src/parsl/parsl/providers/kubernetes/kube.py", line 125, in __init__
    raise config.config_exception.ConfigException(
kubernetes.config.config_exception.ConfigException: Failed to load both kube-config file and in-cluster configuration.

@mfisher87 mfisher87 force-pushed the k8s-use-incluser-config-fallback branch from fbbd8d9 to 67cdca4 Compare April 26, 2024 23:40
Co-Authored-By: Trey Stafford <[email protected]>
@mfisher87
Copy link
Contributor

@benclifford we think we've got things ready to go!

@benclifford
Copy link
Collaborator

ok cool. you still need to do something to let me update this branch, to fit in our merge process

@benclifford
Copy link
Collaborator

wrt your comment on using Python 3.11 features, in recent years our informal policy has been to support the same Python versions as are listed here as "bugfix" or "security", so under that model, no 3.11 features until near the end of 2026 - https://devguide.python.org/versions/

We have a low-key project going on to understand our usage a bit better, and my gut feeling is that with people using things like conda more and being less tied to system-level Python, we could probably drop support earlier (and so access newer features like ExceptionGroups earlier) - tagging @NishchayKarle who is working on that usage project to help him understand this use case for that work.

@mfisher87
Copy link
Contributor

mfisher87 commented Apr 27, 2024

ok cool. you still need to do something to let me update this branch, to fit in our merge process

Shoot. I think @trey-stafford has to do that, and he may not be available until Monday. I don't have an edit button even though the PR is from a repo I have write access to. As ugly as it is, I can create another fork and open a new PR and ensure the "allow maintainers to change" checkbox is ticket if you'd like to get this merged sooner than that.

Trey, when you get to it, this checkbox is towards the bottom of the right sidebar:

image

@benclifford
Copy link
Collaborator

@mfisher87 no super rush to mess around to get it merged

@trey-stafford
Copy link
Contributor Author

ok cool. you still need to do something to let me update this branch, to fit in our merge process

Shoot. I think @trey-stafford has to do that, and he may not be available until Monday. I don't have an edit button even though the PR is from a repo I have write access to. As ugly as it is, I can create another fork and open a new PR and ensure the "allow maintainers to change" checkbox is ticket if you'd like to get this merged sooner than that.

Trey, when you get to it, this checkbox is towards the bottom of the right sidebar:

image

This option is not available. I think it's because this is an org-forked repo. This documentation suggests it is possible with "user-owned forks".

Maybe a new fork is the right path?

@trey-stafford
Copy link
Contributor Author

Ok, I transferred the repo to my personal account and was able to enable edits and access by maintainers! @benclifford hopefully that does the trick!

@mfisher87
Copy link
Contributor

mfisher87 commented Apr 29, 2024

In retrospect, that makes sense, as checking that box would give maintainers of other projects access to org-wide secrets! 🤯 (there is no such thing as personal-account-wide secrets)

@benclifford benclifford merged commit 0d1c30a into Parsl:master Apr 29, 2024
6 checks passed
@benclifford
Copy link
Collaborator

ok that worked, so this will be in the next Parsl release happening in the next few hours. Thanks!

@sophie-bui
Copy link
Collaborator

👋 Hi @trey-stafford and @mfisher87! My name is Sophie and I'm Parsl's community manager.

We're trying to broadly understand how people are using Parsl and would appreciate it if either/both of you would fill out our quick survey: https://bit.ly/parsl-use-survey. Thanks in advance!

@mfisher87
Copy link
Contributor

Hey @sophie-bui thanks for asking for our input! I just filled it out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants