Occasional IOException/InterruptedException during startup #11084
Hubert Plociniczak reports a new STANDUP for the provided date (2024-09-17): Progress: Continued trying to reproduce occasional stability issues. Addressing comment for #11092. (partially previous day) Prepared a PR that disables signing on develop and PRs. Fixed #11100. Created a PR that adds more info to occasional assertion failures (#11113). It should be finished by 2024-09-18. Next Day: Next day I will be working on the #11082 task. Continue investigating issues. |
Hubert Plociniczak reports a new STANDUP for the provided date (2024-09-18): Progress: CI problems with #11092. Unable to reproduce #11084 so far. Debugging CI problems. It should be finished by 2024-09-18. Next Day: Next day I will be working on the #11092 task. Continue investigating issues. |
I was just playing with the FIS project and I am seeing the exception on every second run - execute |
Hubert Plociniczak reports a new STANDUP for the provided date (2024-09-25): Progress: Trying to create a test setup that reliably reproduces InterruptedException. No luck so far. Another attempt at fixing notarization issue on Mac (#11169). More testing of #10823. It should be finished by 2024-09-30. Next Day: Next day I will be working on the #11084 task. Continue investigating the issue. |
Hubert Plociniczak reports a new STANDUP for yesterday (2024-09-26): Progress: More tweaks to notarization issue (#11182). While working on #11084 started experiencing visualization problems. Bisected and reported as #11189. It should be finished by 2024-09-30. Next Day: Next day I will be working on the #11084 task. Continue investigating the issue. |
Hubert Plociniczak reports a new STANDUP for the provided date (2024-09-26): Progress: Still unable to consistently reproduce the interruption problem. #11189 appears to be consistently problematic for other users as well. One last attempt at notarization for macOS. It should be finished by 2024-09-30. Next Day: Next day I will be working on the #11084 task. Continue investigating the issue. |
There is no value in sending expression updates involving interrupts to the user: ![Screenshot from 2024-09-30 14-47-17-2](https://github.com/user-attachments/assets/78fca5bf-085d-4c1c-99fb-0acb5f0a31a3) Adding more logging information to see how aborts affect execution. Related to #11084.
Hubert Plociniczak reports a new STANDUP for the provided date (2024-09-30): Progress: Have a workaround for some intermittent interrupt problems - we shouldn't be reporting InterruptedException at all, as a new execution will always be submitted. Struggling with writing a unit test for this asynchronous scenario. It should be finished by 2024-09-30. Next Day: Next day I will be working on the #11084 task. Figure out how to write a test-case |
Hubert Plociniczak reports a new STANDUP for yesterday (2024-10-01): Progress: Figured out the test-case for the problematic case. PR created and passed review. Extending deadline as there are more cases to be handled (involving libraries). It should be finished by 2024-10-03. Next Day: Next day I will be working on the #11084 task. Continue working on the issue |
I tried to test my project on
...and I have never seen the Anyway, I cannot run the whole project's download process anymore. I had to apply a bugfix to
even with that I am getting:
@jdunkerley, do you know what the fix could be? |
Another example of interrupts interfering with the execution where we get temporary
|
Hubert Plociniczak reports a new STANDUP for the provided date (2024-10-02): Progress: Testing in search of more problems with #11084. Filed a number of tickets for problems discovered on the way. Most importantly hit #11237 which reveals problems with excessive recomputation. The problem is reproducible but not on a small example. It should be finished by 2024-10-03. Next Day: Next day I will be working on the #11084 task. Continue testing and debugging problems with interruptions |
Hubert Plociniczak reports a new STANDUP for the provided date (2024-10-03): Progress: Continued to triage #11237 to create a smaller example but it will have to do for now. Discussing remaining issues with interruptions (on libraries' side). During testing still seeing occasional visualization failures with interrupts - created #11250 for that; test case is again problematic. Discussing build time issues. It should be finished by 2024-10-03. Next Day: Next day I will be working on the #11084 task. Create a test case for the fix and continue testing interrupts |
Hubert Plociniczak reports a new STANDUP for the provided date (2024-10-04): Progress: Constructed a test case for the problematic visualizations. Pair debugging set up issues on fresh Windows machine. More testing. It should be finished by 2024-10-04. Next Day: Next day I will be working on the #11084 task. If happy move on to other tasks. |
* Do not run visualizations on InterruptedException

  There is no point in running a visualization for an expression value that is an InterruptedException. Doing so is likely to bubble up the exception or create one that will be confusing to the user. Closes #11243 and partially addresses some of the symptoms of #11084.

* Add a test for confusing visualization failures

  Previously a visualization failure would be reported:
  ```
  Method `+` of type sleep interrupted could not be found.
  ```

* PR review Nit
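The guard described in this commit message can be illustrated with a minimal sketch. It is not the actual Enso runtime code: the class `VisualizationGuard`, the method `visualize`, and the use of a plain `Function` as the visualization are all hypothetical names chosen for illustration, and representing the interrupted value as an `InterruptedException` instance is an assumption.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch, not the Enso implementation.
class VisualizationGuard {
    /**
     * Applies the visualization only when the expression value is not an
     * interrupt marker. Returns an empty Optional when the visualization
     * is skipped, so nothing confusing is reported to the user.
     */
    static Optional<String> visualize(Object value, Function<Object, String> visualization) {
        if (value instanceof InterruptedException) {
            // Assumption: an interrupted computation surfaces as this value.
            // A new execution is expected to follow, so skip the visualization.
            return Optional.empty();
        }
        return Optional.of(visualization.apply(value));
    }

    public static void main(String[] args) {
        Function<Object, String> show = v -> "value: " + v;
        System.out.println(visualize(42, show));
        System.out.println(visualize(new InterruptedException("sleep interrupted"), show));
    }
}
```

Without such a guard, the visualization would be applied to the interrupt value itself, which is one plausible way to end up with messages like the `sleep interrupted` failure quoted above.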
There is no value in sending expression updates involving interrupts to the user: ![Screenshot from 2024-09-30 14-47-17-2](https://github.com/user-attachments/assets/78fca5bf-085d-4c1c-99fb-0acb5f0a31a3) Adding more logging information to see how aborts affect execution. Related to #11084. (cherry picked from commit ad53c82)
* Do not run visualizations on InterruptedException

  There is no point in running a visualization for an expression value that is an InterruptedException. Doing so is likely to bubble up the exception or create one that will be confusing to the user. Closes #11243 and partially addresses some of the symptoms of #11084.

* Add a test for confusing visualization failures

  Previously a visualization failure would be reported:
  ```
  Method `+` of type sleep interrupted could not be found.
  ```

* PR review Nit

(cherry picked from commit 9429a45)
Hubert Plociniczak reports a new STANDUP for the provided date (2024-10-07): Progress: Testing reveals more problems with handling IO exceptions in the library. Adding retries for HTTP requests but we still seem to silently throw more. Needs more investigation. It should be finished by 2024-10-08. Next Day: Next day I will be working on the #11084 task. Need to figure out where in the libraries we are crashing. |
Hubert Plociniczak reports a new STANDUP for yesterday (2024-10-08): Progress: Extended the scope of retries to cover HTTP requests and reading from body's stream. Made retry implementation pretty simple. More testing revealed that this finally fixed notorious Next Day: Next day I will be working on the #11272 task. Continue investigating the issue |
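A "pretty simple" retry of the kind mentioned in this standup might look like the sketch below. It is only an illustration under assumptions: `RetrySketch`, `withRetries`, the attempt count, and the fixed delay are hypothetical and are not taken from the Enso libraries. The idea is that the retried action wraps both sending the request and reading the body's stream, so an `IOException` in either step triggers another attempt.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch, not the Enso library implementation.
class RetrySketch {
    /**
     * Runs the action up to maxAttempts times, retrying only on IOException.
     * Other exceptions, including InterruptedException raised while waiting
     * between attempts, propagate to the caller immediately.
     */
    static <T> T withRetries(int maxAttempts, long delayMillis, Callable<T> action)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // The action is assumed to cover the request and the body read.
                return action.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // fixed delay between attempts
                }
            }
        }
        throw last; // every attempt failed with an IOException
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String body = withRetries(3, 10, () -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated flaky read");
            }
            return "payload";
        });
        System.out.println(body + " after " + calls[0] + " attempts");
    }
}
```

A fixed delay keeps the sketch short; whether the real implementation uses a fixed delay, backoff, or no delay at all is not stated in the thread.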
Rather than cancelling the Future that captures jobs' logic, this change introduces a two-level system:
- interrupt all jobs softly via ThreadInterrupted at a safepoint
- if the safepoint is not executed within some time period, trigger a hard interrupt by cancelling the job explicitly, if possible

Closes #11084.
Hubert Plociniczak reports a new STANDUP for the provided date (2024-10-18): Progress: More testing. Looking into whether we can eliminate the IO errors, but they look likely to be caused by InterruptedException. Reviewing the current state of CI. It should be finished by 2024-10-18. Next Day: Next day I will be working on the #11084 task. Address interrupted exceptions |
Hubert Plociniczak reports a new STANDUP for yesterday (2024-10-21): Progress: Created a draft PR that introduces a 2-level system of interruptions: via safepoints/ThreadInterrupted Exception and if that fails via InterruptedException. Should reduce chances of weird IO exceptions. Adding more info to diagnose weird MacOS e2e failures (#11369). It should be finished by 2024-10-22. Next Day: Next day I will be working on the #11084 task. More testing and undraft PR |
Rather than cancelling Futures that capture jobs' logic, this change introduces a two-level system:
- interrupt all jobs softly via ThreadInterrupted at safepoints
- if the safepoint is not executed within some time period, or it is but the job is still not cancelled, trigger a hard interrupt by cancelling the job explicitly, if possible

Closes #11084.
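The two-level scheme can be sketched in plain Java as follows. This is an illustration under assumptions, not the PR's code: `SoftInterrupt` is a hypothetical stand-in for the soft ThreadInterrupted signal, the safepoint is modelled as polling an `AtomicBoolean`, the 200 ms grace period is arbitrary, and `Future.cancel(true)` is assumed to play the role of the hard interrupt.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not the Enso implementation.
class TwoLevelInterruptSketch {
    /** Hypothetical stand-in for the soft interrupt signal. */
    static final class SoftInterrupt extends RuntimeException {}

    /** Safepoint: a cooperative job polls here and unwinds if asked to stop. */
    static void safepoint(AtomicBoolean softInterrupt) {
        if (softInterrupt.get()) {
            throw new SoftInterrupt();
        }
    }

    /** Level 1: request a soft interrupt. Level 2: cancel after the grace period. */
    static String abort(Future<?> job, AtomicBoolean softInterrupt, long graceMillis)
            throws InterruptedException {
        softInterrupt.set(true);
        try {
            job.get(graceMillis, TimeUnit.MILLISECONDS);
            return "completed"; // the job finished on its own
        } catch (ExecutionException e) {
            return e.getCause() instanceof SoftInterrupt ? "soft" : "failed";
        } catch (TimeoutException e) {
            // The job did not stop in time: interrupt its thread.
            return job.cancel(true) ? "hard" : "completed";
        }
    }

    static String demo(boolean cooperative) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicBoolean softInterrupt = new AtomicBoolean(false);
        try {
            Callable<String> job;
            if (cooperative) {
                job = () -> {
                    for (int i = 0; i < 10_000; i++) {
                        safepoint(softInterrupt);
                        Thread.sleep(1);
                    }
                    return "done";
                };
            } else {
                // Never reaches a safepoint; only a thread interrupt stops it.
                job = () -> {
                    Thread.sleep(60_000);
                    return "done";
                };
            }
            return abort(executor.submit(job), softInterrupt, 200);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo(true));
        System.out.println(demo(false));
    }
}
```

In this sketch the cooperative job is expected to stop at its next safepoint and be reported as `soft`, while the job that never polls is expected to be reported as `hard` once the grace period elapses. The appeal of the soft level is that a job unwinding at a safepoint is not in the middle of a blocking IO call, which is where a thread interrupt tends to show up as an unrelated-looking IO exception.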
Hubert Plociniczak reports a new STANDUP for the provided date (2024-10-22): Progress: Modified initial PR to make it work correctly with Future's cancellation semantics. More testing to ensure that the solution works although thread interruptions may still occur. It should be finished by 2024-10-22. Next Day: Next day I will be working on the #11084 task. Address PR review |
Hubert Plociniczak reports a new STANDUP for the provided date (2024-10-23): Progress: Still fighting with improved interruptions. Somehow visualizations are timing out with the new logic. Reduced hours. It should be finished by 2024-10-25. Next Day: Next day I will be working on the #11084 task. Continue investigating the issue. |
Hubert Plociniczak reports a new STANDUP for the provided date (2024-10-24): Progress: Same status as the day before. Still debugging problems with visualizations. It should be finished by 2024-10-25. Next Day: Next day I will be working on the #11084 task. Continue investigating the issue. |
Hubert Plociniczak reports a new STANDUP for the provided date (2024-10-25): Progress: Continued debugging visualizations when interruptions are handled differently. Debugging sessions with James re widgets. It should be finished by 2024-10-25. Next Day: Next day I will be working on the #11084 task. Continue investigating the issue. |
Hubert Plociniczak reports a new STANDUP for the provided date (2024-10-29): Progress: Minor cleanups and brief planning. Mostly off. It should be finished by 2024-10-29. Next Day: Next day I will be working on the #11265 task. Pick up logging tickets |
Hubert Plociniczak reports a new STANDUP for the provided date (2024-10-28): Progress: Updated PR and performed extensive testing. The setup behaves better and reduces chances of accidental IO exceptions. It should be finished by 2024-10-29. Next Day: Next day I will be working on the #11084 task. Address review and pick up next item |
Our execution can be interrupted, but that should only happen when another execution is already scheduled in the queue.