Issues with upload / failures with PodIT #4910
I keep hitting this issue on an unrelated PR. Should we consider skipping those tests until we have a good solution?
The upload issue is prevalent against 1.26 / containerd, with all client types. It typically looks like an entire packet is being lost, but here's one where just a single byte is off: https://github.com/fabric8io/kubernetes-client/actions/runs/4310764866/jobs/7519523723#step:5:1160 If we disable those tests we should definitely add a warning in the release notes that pod upload seems broken. The guess is that the api server is introducing message ordering issues in either direction - sending or receiving - which would also explain the download issue. The download issue seems to be much rarer; I tried reproducing it more yesterday without success, so I'd probably leave that test running for now.
The copyDir failures with the e2e tests and vertx are in the same vein as the download issue, but seem more reproducible. I modified the code to fully read the input stream before passing it off to the tar utility, so that it shows both the length and an md5 sum of the contents (a rough sketch of that buffering is below); the difference between a successful and an unsuccessful run shows up in that output.
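For reference, a minimal sketch of that kind of instrumentation - not the actual debug patch - assuming plain JDK APIs (Java 9+ for `InputStream#transferTo`): fully buffer the received stream, log its length and md5, and hand an equivalent replayable stream on to the tar handling.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class StreamDebug {

  // Drain everything the server sent, log length + md5, then return a
  // replayable stream carrying the exact same bytes for the tar utility.
  static InputStream bufferAndLog(InputStream raw) throws IOException, NoSuchAlgorithmException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    raw.transferTo(buffer);
    byte[] bytes = buffer.toByteArray();
    String md5 = String.format("%032x",
        new BigInteger(1, MessageDigest.getInstance("MD5").digest(bytes)));
    System.out.printf("received %d bytes, md5=%s%n", bytes.length, md5);
    return new ByteArrayInputStream(bytes);
  }
}
```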
In the failing case, note the output of the length / checksum before the message that contains the remaining bytes - all of which occurs after receiving the close from the server. The workaround for the download handling is that we should be able to expect an errorChannel / exit code message; rather than immediately terminating with onClose, we can wait some amount of time for that to appear. Local testing seemed to confirm that this worked. The upload is thornier. We really don't have any good way to know whether the server has received our data. There is no expected exit code message, and even if one did come, if the api server has misordered the messages it would at best show an error when we use tar. The best I can come up with is that we could compute / request a checksum afterwards and clearly error and/or retry some number of times until there is a match (see the sketch below). Added a comment to the upstream issue and linking to #4923 for good measure.
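As an illustration of that checksum-and-retry idea (not the eventual fix), here is a rough sketch against the fabric8 client: upload the file, then exec `md5sum` in the container and compare against the locally computed sum, retrying a bounded number of times. The namespace/pod/path values are placeholders, `md5sum` is assumed to exist in the container image, and `ExecWatch#exitCode()` is assumed to be available in the client version in use.

```java
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.dsl.ExecWatch;

import java.io.ByteArrayOutputStream;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.concurrent.TimeUnit;

final class VerifiedUpload {

  static void uploadWithRetry(KubernetesClient client, String ns, String pod,
      Path local, String remotePath, int maxAttempts) throws Exception {
    // Checksum of what we intend to send.
    String localSum = String.format("%032x", new BigInteger(1,
        MessageDigest.getInstance("MD5").digest(Files.readAllBytes(local))));

    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      client.pods().inNamespace(ns).withName(pod).file(remotePath).upload(local);

      // Ask the container for its view of the uploaded bytes.
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      try (ExecWatch watch = client.pods().inNamespace(ns).withName(pod)
          .writingOutput(out)
          .exec("md5sum", remotePath)) {
        watch.exitCode().get(30, TimeUnit.SECONDS);
      }
      String remoteSum = new String(out.toByteArray(), StandardCharsets.UTF_8).split("\\s+")[0];

      if (localSum.equals(remoteSum)) {
        return; // server-side contents match what we sent
      }
    }
    throw new IllegalStateException(
        "upload of " + local + " did not verify after " + maxAttempts + " attempts");
  }
}
```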
With kubernetes 1.26 we are still seeing errors with pod upload:
https://github.com/fabric8io/kubernetes-client/actions/runs/4252695615/jobs/7396899761#step:6:579
With additional trace logging, failed test runs showed the same data being sent to the server either way, as measured down at the level at which okhttp writes to the ssl socket.
There doesn't seem to be a good explanation for this, other than that we are more prone to seeing it on the containerd minikube runtime.
The issue does seem to match one reported upstream kubernetes/kubernetes#112834
We may want to add a formal warning about this behavior so that users are aware of potential data loss.