Upload file/dir to pod does not work in GKE #5599
Might be the same root cause as for #5527. In any case, the same fix should likely apply.
@manusa hope so...
Well, I don't mean to say that the problem is necessarily a read-only file system. If in your case the target directory is writable, the same approach we can use to fix #5527 should also be valid for this.
@manusa I understand. To be honest, I'm not 100% sure that's the case, because if it was, would kubectl cp work?
OK, this is weird then. Maybe @shawkins has some ideas about why this might be failing in our case and not for kubectl cp.
@cr-liorholtzman @manusa It may take more debugging to see why it's returning false - with debug logging enabled you should see the reason false is returned. To work around a Kubernetes bug with websockets, we added logic that first copies the file to a tmp location and then validates its size. If for some reason your container is always reporting an unexpected size, then we'll consider the operation to have failed.
@shawkins do you have your own debug property or just the regular maven/spring?
The client uses slf4j, so you can configure it with whatever is bound as the logging implementation. The relevant package / logging context here is io.fabric8.kubernetes.client.dsl.internal.uploadable - but you can go much higher up if you want to see more of what is happening.
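For reference, with Logback bound as the SLF4J implementation, enabling that logger could look like the following (a sketch of a logback.xml, assuming Logback; adapt to whatever backend is actually on your classpath):

```xml
<!-- logback.xml: enable debug output for the fabric8 upload internals -->
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <!-- the package mentioned above; raise to io.fabric8 for broader output -->
  <logger name="io.fabric8.kubernetes.client.dsl.internal.uploadable" level="DEBUG"/>
  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```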
@shawkins So I executed the test with debug enabled and below are the logs. From what I can understand, either the webSocket is closed prematurely, or fabric8 fails to transfer the file from /tmp to the destination folder (not sure if the validation happens before or after the file is transferred inside the pod).
@cr-liorholtzman the problem is with the size validation:
DEBUG i.f.k.c.d.i.u.PodUpload.upload:123 - upload file size validation failed, expected 1869312, but was 0
It's running "wc -c < tmpfile" to get the file size - could you try that directly on your pod and see what it produces? kubernetes/kubernetes#89899 is the upstream issue that prevents websocket-based file uploads from working reliably. The underlying fix was merged not that long ago, so eventually this workaround on our side won't be needed.
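As a sanity check, the comparison the client performs can be reproduced outside the pod (a minimal sketch; the file name and the 1869312 byte count are just taken from the failing log line above):

```shell
# Reproduce the client's size check locally: write a file of known size,
# then read it back with `wc -c <` the same way PodUpload does.
f=$(mktemp)
head -c 1869312 /dev/zero > "$f"      # same byte count as in the failing log
if [ "$(wc -c < "$f")" -eq 1869312 ]; then
  echo "size check passed"
else
  echo "size check failed"
fi
rm -f "$f"
```

If the same check on the pod reports 0 for a freshly written file, the upload stream itself never delivered any bytes.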
@shawkins You mean simply running "wc -c < tmpfile" in the pod? The tmpfile is removed shortly after the upload starts, I can see it for just a fraction of a second...
I mean on any file to start with, just to validate that the appropriate size is reported.
@shawkins tested now, wc -c works well...
Meaning that on both GKE and minikube you get an appropriate value back for any given file? If that's the case, it would imply that we're not transferring anything successfully to GKE in the first place.
@shawkins Yup, the command works on both environments... upload fails only in GKE
@shawkins answering here to keep track of the issue: what other logs can I extract? The above logs are after enabling "trace" on f8 at the .internal level. Is there anything else I can enable?
If this is all we have to work with, this is what I see:
The client needs to determine what container to use since it's not set on your upload operation. Based upon this log ordering it appears that the pod is already ready, so it shouldn't need a watch; it can simply perform a list and glean the container id from there.
There are normally 3 websocket invocations expected for an upload: the upload, the size check, and the extraction of the tar - these are currently three different execs/websockets. You expect to see {"metadata":{},"status":"Success"} when a command completes successfully - that is not a message for a watch websocket. Since these ExecWebSocketListener events appear first, they should be the result of the upload - it's atypical to see success here because it's the termination of the websocket that completes the upload - the websocket protocol currently lacks a way to send an end of input. It simply looks like the pod / api-server is signaling it is done with the upload without receiving anything.
In your code, in ContainerHelper or above, are you starting a Watch, Informer, etc.? All of the subsequent AbstractWatchManager logs relate to this Watch.
This is the client performing the size check and getting back 0.
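The three steps described above can be mimicked locally to make the mechanism concrete (a stand-in sketch only: the real client runs each step over a separate exec websocket against the pod, and the file/path names here are invented for the demo):

```shell
# Local stand-in for the three exec steps the client performs during an upload.
work=$(mktemp -d)
cd "$work"
echo "hello" > payload.txt
tar -cf archive.tar payload.txt        # the stream sent over the first websocket

# Step 1: "upload" the tar to a temp location (effectively cat > tmpfile on the pod)
cat archive.tar > tmpfile

# Step 2: the size check that returned 0 in the log above (wc -c < tmpfile)
[ "$(wc -c < tmpfile)" -eq "$(wc -c < archive.tar)" ] && echo "size ok"

# Step 3: extract the tar at the destination directory
mkdir dest && tar -xf tmpfile -C dest
cat dest/payload.txt
rm -rf "$work"
```

In the failing GKE case, step 1 evidently delivers nothing, so step 2 sees 0 bytes and the client reports the upload as failed before step 3 ever runs.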
This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!
Describe the bug
Ever since our clusters were upgraded to a GKE version that works only with containerd, the upload method simply does not work. It does work on my local minikube, but not on GKE (1.25.10-gke.2700).
Trying to debug this, I see that the upload process starts, since I see a fragment of the file in /tmp for less than a second, but it is removed immediately after starting and the upload method returns false.
Tried playing with kubernetes.upload.connection.timeout and kubernetes.upload.request.timeout, but it doesn't help.
It is important to mention that my source is actually a different pod in the same cluster, from which I initiate the kubernetes-client. The tests are executed from this pod, and the API that deploys and tries to upload the file to the remote pod runs from it.
A simple kubectl cp command in the cluster does work.
Any ideas?
Fabric8 Kubernetes Client version
6.9.2
Steps to reproduce
Expected behavior
The upload method will return false and the file/dir will not be copied.
Runtime
other (please specify in additional context)
Kubernetes API Server version
1.25.3@latest
Environment
GKE
Fabric8 Kubernetes Client Logs
Additional context
No response