fix #4910: making upload more reliable #4968
Conversation
We can work around 1 by shading the class ourselves if/until an upstream fix is supplied. The only other problem is that commons-compress is currently an optional dependency; it would then become required for all file uploads. I'll decouple the EOF-detection fix until after we hit more of the upload issues - if for some reason they are more problematic with piping cat to a file, then we may not need to do anything else.
Force-pushed from ec2dc05 to c1b6494
This does not work - the tar still writes the unlimited size to the header, in such a way that limiting later results in files padded up to the nearest record block. So we will need two different upload strategies - one for streams and one for tars.
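As background, a small self-contained sketch (not from this PR) of why that happens with commons-compress: the entry size is written into the 512-byte header before any data, and the entry data is padded out to the next record boundary, so truncating the stream afterwards cannot correct an over-stated size. Class and file names here are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;

public class TarPaddingDemo {

  public static void main(String[] args) throws Exception {
    byte[] payload = "hello".getBytes(StandardCharsets.UTF_8); // 5 bytes of content

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (TarArchiveOutputStream tar = new TarArchiveOutputStream(bytes)) {
      TarArchiveEntry entry = new TarArchiveEntry("hello.txt");
      // The size is written into the 512-byte entry header *before* any data,
      // so it cannot be corrected by truncating the stream later.
      entry.setSize(payload.length);
      tar.putArchiveEntry(entry);
      tar.write(payload);
      // closeArchiveEntry pads the data out to the next 512-byte record boundary.
      tar.closeArchiveEntry();
    }

    // Roughly: 512-byte header + data padded to 512 + end-of-archive trailer;
    // the exact total depends on the configured block size.
    System.out.println("archive size: " + bytes.size());
  }
}
```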
These changes now include launching a nanny process to check for the expected tar size before closing. I've had no luck reproducing the error locally to confirm this addresses the failing case - at worst it's the closing of the websocket / stdin that triggers the final writing of bytes. Unfortunately, we won't know that unless/until this prospective change starts failing in GitHub runs. The other downside of the changes so far is that all uploads, not just directories, will require commons-compress. Everything is now using CountingOutputStream for verification.
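For clarity, a minimal sketch (not the PR's actual code) of the kind of byte-count verification described above, assuming commons-compress's CountingOutputStream; the class and method names are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.commons.compress.utils.CountingOutputStream;

final class UploadVerification {

  /**
   * Copies the local content into the remote stdin stream and returns the
   * number of bytes actually written, so the caller can later compare it
   * against the size reported for the file on the server side.
   */
  static long copyAndCount(InputStream local, OutputStream remoteStdin) throws IOException {
    CountingOutputStream counted = new CountingOutputStream(remoteStdin);
    byte[] buffer = new byte[8192];
    int read;
    while ((read = local.read(buffer)) != -1) {
      counted.write(buffer, 0, read);
    }
    counted.flush();
    return counted.getBytesWritten();
  }
}
```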
@manusa @rohanKanojia if no one objects to the approach here, I'll go ahead and fix up the unit tests to work with the new upload verification logic.
Force-pushed from cbd8626 to e7da77b
Based upon the failures in https://github.com/fabric8io/kubernetes-client/actions/runs/4503720351/jobs/7927685644?pr=4968 and the lack of responses back from the server, we can infer that while we had written all of the bytes, the final bytes had not been written on the server side. So the flushing of stdin / processing by cat is not predictable and we cannot simply wait on it. We either need to send more data through and then truncate, or simply keep retrying. It seems to make more sense to do the latter - but that won't work for streams unless we create a copy. The easiest thing to do at this point is to let the upload finish, then verify the size, and if it's incorrect report it as a failed upload. We'll let the user decide for now how to work around it. In our integration tests we can add some simple retry logic to deflake the tests.
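To make the "verify, then report" idea concrete, a hedged sketch of the decision logic; the remote-size supplier is a hypothetical hook (for example an exec of `wc -c` inside the container), not an existing kubernetes-client API.

```java
import java.util.function.LongSupplier;

final class UploadResult {

  /**
   * Once all bytes have been piped through stdin, compare the locally counted
   * byte total with the size the server eventually reports for the target
   * file. If they differ, the upload is reported as failed rather than
   * silently retried; the caller decides how to recover.
   */
  static boolean uploadSucceeded(long bytesWrittenLocally, LongSupplier remoteFileSize) {
    return remoteFileSize.getAsLong() == bytesWrittenLocally;
  }
}
```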
Force-pushed from 8211910 to faee793
Force-pushed from 40c49fe to 57d8d70
SonarCloud Quality Gate failed.
void retryUpload(BooleanSupplier operation) {
  Awaitility.await().atMost(60, TimeUnit.SECONDS).until(operation::getAsBoolean);
}
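For illustration, a hypothetical call site for this helper in an integration test; the client, namespace, pod name, and paths below are invented, while `file(...).upload(Path)` returning a boolean is the existing file-upload API being retried.

```java
import java.nio.file.Paths;

import io.fabric8.kubernetes.client.KubernetesClient;

// ...
void uploadsFileReliably(KubernetesClient client) {
  // Retry the flaky upload until it reports success or the 60-second
  // Awaitility timeout in retryUpload is exceeded.
  retryUpload(() -> client.pods()
      .inNamespace("default")
      .withName("upload-pod")
      .file("/tmp/data.txt")
      .upload(Paths.get("src/test/resources/data.txt")));
}
```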
I'm not really convinced about this; it might eventually hide something else.
Certainly - since we are only dealing with the boolean return, this does not qualify the exact nature of the failed upload. However, most other problems are likely to be deterministic, in which case we'll exceed the timeout. Only other, somewhat rare, non-deterministic problems would be missed.
Description
Fix #4910
Trying to fully address the upload failures.
@manusa and others - any thoughts on the first and last points?
Type of change
Chore (test, version modification, documentation, etc.)
Checklist