Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix #4910 / #4923 addressing inconsistent behavior with exec streams #4959

Merged
merged 1 commit into from
Mar 10, 2023

Conversation

shawkins
Copy link
Contributor

@shawkins shawkins commented Mar 9, 2023

Description

Only 1 real issue was fully tracked down with #4910 / #4923 - vertx closeHandler is immediate. That needs changed to the endHandler, which is notified in order. Reviewing the vertx logic does also highlight that we'll have a problem if the server ever does send a ping - but that should not currently be the case. The only "ping" sent currently is the 0 byte message.

For the common failures some additional debugging has been added to the failure message to narrow in on where the problem lies.

I had thought there was a concurrency issue with PodOperationsImpl.readTo, but that shouldn't be the case - the stream was still being closed in the calling method's try.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change
  • Chore (non-breaking change which doesn't affect codebase;
    test, version modification, documentation, etc.)

Checklist

  • Code contributed by me aligns with current project license: Apache 2.0
  • I Added CHANGELOG entry regarding this change
  • I have implemented unit tests to cover my changes
  • I have added/updated the javadocs and other documentation accordingly
  • No new bugs, code smells, etc. in SonarCloud report
  • I tested my code in Kubernetes
  • I tested my code in OpenShift

@shawkins
Copy link
Contributor Author

shawkins commented Mar 9, 2023

https://github.com/fabric8io/kubernetes-client/actions/runs/4376515845/jobs/7659072908#step:6:450 confirms a fresh instance where some of the stdIn upload is dropped. Options here include:

  • adding a time delay between when the last data is sent and close
  • attempt to check inline with the operation or afterwards. File upload and directory upload have divergent logic that makes this difficult. We may want to consolidate to just tar and a two step process - send the tar.gz, then in a separate exec check the length or checksum, then extract.

https://github.com/fabric8io/kubernetes-client/actions/runs/4268450566/jobs/7430955144#step:5:448 since there is no upload could be an issue with stdOut download - but I could never quite confirm that locally. Initially it seemed like an issue with the stream close, but that turned out to not be the case. I'd like to get another reproduction of that locally or via the github runs before drawing more conclusions about how to proceed.

@shawkins
Copy link
Contributor Author

shawkins commented Mar 9, 2023

@manusa the first commit addresses the vertx close problem and some misc. cleanups. The second commit was speculative change - basically throwing a few more messages at the server and a little wait to help ensure that what was already sent is processed. This was done as a first attempt because I couldn't think of a good way an inline, or a post check that wouldn't require even more code. However it also does not appear to be resilient - at least for github execution - so it was removed.

@shawkins shawkins force-pushed the iss4910 branch 2 times, most recently from cdad995 to 3985d24 Compare March 10, 2023 02:41
@manusa
Copy link
Member

manusa commented Mar 10, 2023

So from the final code changes I see we're addressing 3 things here:

  • Vert.x ping-pong handling: request more frames in case a pong is received (added a handler instead of disabling client-sent pings) - the change / comment are just to being defensive, it does not appear that we'll currently send a client side ping, nor will the server send one.
  • Vert.x replacing closeHandler with endHandler which waits for the previous frames to be processed as opposed to closing the stream immediately
  • Pod (Download) Operations: moved the exitCode blocking retrieval to the try-with-resources block. OutputStream is clearly closed after the server process has completed. This is just a minor cleanup and should not address the problems seen with copyFile.

There still remains a few issues to solve:

  • Delaying the close operation server-side for tar extract operation (might complete before the buffers are flushed) -Upload-
  • Upload/Download checksums

I'll keep #4910 and #4923 open until we create follow-up issues.

please feel free to edit this comment to make it more accurate

@manusa
Copy link
Member

manusa commented Mar 10, 2023

I also think that we might want to add a little complexity to the upload operation (maybe also consolidate into a single implementation regardless of file or directory -like you proposed-). Perform 2 commands, one for the upload, one for the checksum. The upload command is only completed (local close processing is delayed) after the second command is successful. The checksum command is executed after the upload completes (websocket remains open) and is retried n times until success. I understand that this should cover the issue, but maybe I'm misinterpreting something.

@manusa manusa added this to the 6.5.0 milestone Mar 10, 2023
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

44.4% 44.4% Coverage
0.0% 0.0% Duplication

@manusa manusa merged commit 11daf25 into fabric8io:master Mar 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants