Cannot run pipeline samples in GCP IAP Deployment #2773
Comments
When I retried the deployment, the message changed to: AccessDeniedException: 403 Primary: /namespaces/dcard-data.svc.id.goog with additional claims does not have storage.objects.list access to dcard--bruce.
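For context, a hedged diagnostic sketch (not code from this thread) that asks GCS which permissions the pod's current credentials actually hold on the bucket named in the error; the bucket name is copied from the error message above, everything else is an assumption:

```python
# Hedged diagnostic: check whether the identity the pod resolves to holds
# storage.objects.list on the bucket from the AccessDeniedException above.
from google.cloud import storage

client = storage.Client()  # uses Application Default Credentials / workload identity
bucket = client.bucket("dcard--bruce")  # bucket name taken from the error message
granted = bucket.test_iam_permissions(["storage.objects.list", "storage.objects.get"])
print("granted permissions:", granted)  # an empty list means the access is missing
```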
Not sure whether that is related:

    List of buckets:
    Traceback (most recent call last):
      File "<string>", line 6, in <module>
      File "/usr/local/lib/python2.7/dist-packages/google/api_core/page_iterator.py", line 212, in _items_iter
        for page in self._page_iter(increment=False):
      File "/usr/local/lib/python2.7/dist-packages/google/api_core/page_iterator.py", line 243, in _page_iter
        page = self._next_page()
      File "/usr/local/lib/python2.7/dist-packages/google/api_core/page_iterator.py", line 369, in _next_page
        response = self._get_next_page_response()
      File "/usr/local/lib/python2.7/dist-packages/google/api_core/page_iterator.py", line 419, in _get_next_page_response
        method=self._HTTP_METHOD, path=self.path, query_params=params
      File "/usr/local/lib/python2.7/dist-packages/google/cloud/_http.py", line 417, in api_request
        timeout=timeout,
      File "/usr/local/lib/python2.7/dist-packages/google/cloud/_http.py", line 275, in _make_request
        method, url, headers, data, target_object, timeout=timeout
      File "/usr/local/lib/python2.7/dist-packages/google/cloud/_http.py", line 313, in _do_request
        url=url, method=method, headers=headers, data=data, timeout=timeout
      File "/usr/local/lib/python2.7/dist-packages/google/auth/transport/requests.py", line 277, in request
        self.credentials.before_request(auth_request, method, url, request_headers)
      File "/usr/local/lib/python2.7/dist-packages/google/auth/credentials.py", line 124, in before_request
        self.refresh(request)
      File "/usr/local/lib/python2.7/dist-packages/google/auth/compute_engine/credentials.py", line 102, in refresh
        six.raise_from(new_exc, caught_exc)
      File "/usr/lib/python2.7/dist-packages/six.py", line 737, in raise_from
        raise value
    google.auth.exceptions.RefreshError: HTTPConnectionPool(host='metadata.google.internal', port=80): Read timed out. (read timeout=120)
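The traceback points at a plain bucket-listing loop over the google-cloud-storage client, with the credential refresh against the metadata server timing out. A minimal sketch of that kind of call (an assumption, not the reporter's exact script):

```python
# Minimal sketch of the kind of code that produces the traceback above:
# iterating the bucket list forces a credential refresh against the GKE metadata server.
from google.cloud import storage

client = storage.Client()  # compute_engine / workload-identity credentials via ADC
print("List of buckets:")
for bucket in client.list_buckets():  # page iteration triggers before_request -> refresh
    print(bucket.name)
```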
I also found this issue: googleapis/google-auth-library-python#211
After setting workload identity to
Maybe this has to do with how gcloud obtains/refreshes credentials? Even when using the old secret method (e.g.
About the timeout problem, I think that is a GKE problem. It uses the default credential client, and the credentials time out after around an hour. @parthmishra I tried that but it didn't work because of the gcloud SDK implementation.
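Since the RefreshError comes from the metadata server, one way to narrow this down is to query the metadata endpoint directly from a pod and see whether it answers at all, and which service account it reports. A hedged diagnostic sketch (the endpoint is the standard GCE/GKE metadata path; the timeout value is arbitrary):

```python
# Hedged diagnostic: ask the GKE metadata server which service account the
# pod's default credentials resolve to. With workload identity configured
# correctly this should answer quickly; a hang here matches the 120s read
# timeout in the traceback above.
import requests

METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/service-accounts/default/email")
resp = requests.get(METADATA_URL, headers={"Metadata-Flavor": "Google"}, timeout=10)
print(resp.status_code, resp.text)
```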
@bruce3557, also running into this on some training experiments (using Katib outside pipelines). I end up with that same error when trying to download training data:
Please post back if you find a fix.
@wronk I found a workaround to prevent this problem in kubeflow issue 4607. Until GCP fixes the issue, I don't think there is anything else we can do.
As mentioned in the GCP issue, did you try the workarounds?
@Bobgy I get the following when trying to downgrade:

    Master of cluster [xxxxx] will be upgraded from version [1.14.9-gke.2] to version [1.14.8-gke.17]. This operation is long-running and will block other operations on the cluster (including
I don't yet understand how all of Kubeflow is set up, but I am wondering what effect such a change would have on the other components. Would they continue to work, assuming pipelines work?
AFAIK there is an ongoing issue related to a recent GKE release. Will keep this thread updated.
It means a new patch version has been released. The new 1.14.8-gke.x probably already has the fix.
@Bobgy thanks - found the latest in the 1.14.8 series is 1.14.8-gke.33 and used your command to upgrade from the earlier Kubeflow 0.7 default version. Still getting this error though, and cluster-user has the Storage Admin role:

    File "kfp_component/google/dataflow/_launch_python.py", line 58, in launch_python
@yantriks-edi-bice Sorry for the late notice, you probably also need to upgrade your google/cloud-sdk client versions as mentioned in #3069 (comment).
It seems the original issue is a GKE workload identity problem, closing now.
@Bobgy: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What happened:
We cannot run pipeline samples.
It seems that gcloud-related commands cannot obtain the workload identity correctly. The error messages are:
What did you expect to happen:
We should run pipeline samples smoothly.
What steps did you take:
Created a run and an experiment.
Anything else you would like to add:
I tried this implementation and still could not get the correct result:
https://github.com/kubeflow/pipelines/blob/master/samples/core/secret/secret.py
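The linked sample demonstrates applying a GCP service-account secret to a pipeline step. A rough sketch of that pattern with kfp.gcp.use_gcp_secret under the KFP 0.x SDK (the image and command are illustrative assumptions, not the sample's exact contents):

```python
# Rough sketch of the use_gcp_secret pattern from the linked sample.
from kfp import dsl
from kfp.gcp import use_gcp_secret


@dsl.pipeline(name="gcs-access-check")
def gcs_access_check():
    op = dsl.ContainerOp(
        name="list-buckets",
        image="google/cloud-sdk:slim",      # illustrative image
        command=["sh", "-c", "gsutil ls"],  # any command that touches GCS
    )
    # Mounts the user-gcp-sa key and points GOOGLE_APPLICATION_CREDENTIALS at it,
    # so the step authenticates with the key instead of workload identity.
    op.apply(use_gcp_secret("user-gcp-sa"))
```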