[BUG] [FlyteAdmin] Notifications SQS subscriber stops processing messages when "connection reset by peer" #376
Comments
@rstanevich thank you for opening the bug; I will add it to the next milestone. Do you think that's OK, or is it affecting you every day and we should fix it ASAP?
Thanks for opening this, @rstanevich. Is this a recent issue?
Thank you for the reply. BTW, we have Flyte's very own |
@rstanevich do you have an alert if FlyteAdmin crashes? Which would be preferable: that you notice a FlyteAdmin crash, or that it continues to limp along and you notice older messages?
@kumare3 |
@rstanevich I think we found a very good way of solving this problem. @katrogan will merge the PR soon. Thank you for raising the issue.
It is merged and will be part of the next release.
* Better Error Signed-off-by: Haytham Abuelfutuh <[email protected]> * lint Signed-off-by: Haytham Abuelfutuh <[email protected]> * fix unit test Signed-off-by: Haytham Abuelfutuh <[email protected]> * Add defensive nil checks Signed-off-by: Haytham Abuelfutuh <[email protected]>
…#376) * wip Signed-off-by: Katrina Rogan <[email protected]> * add a test too Signed-off-by: Katrina Rogan <[email protected]> * Matchable attribute impl Signed-off-by: Katrina Rogan <[email protected]>
* Add missing in_container.mk Signed-off-by: Haytham Abuelfutuh <[email protected]> * Fix serialization for greatexpectations Signed-off-by: Haytham Abuelfutuh <[email protected]> * fix mnist classifier examples Signed-off-by: Haytham Abuelfutuh <[email protected]> * Update requirements for kfpytorch Signed-off-by: Haytham Abuelfutuh <[email protected]> * Update kfpytorch requirements Signed-off-by: Haytham Abuelfutuh <[email protected]> * Update sql requirements Signed-off-by: Haytham Abuelfutuh <[email protected]> * Enable pytorch sagemaker image build Signed-off-by: Haytham Abuelfutuh <[email protected]> * tidy names Signed-off-by: Haytham Abuelfutuh <[email protected]> * Try to silence TERM errors Signed-off-by: Haytham Abuelfutuh <[email protected]> * Install libssl1.0.0 Signed-off-by: Haytham Abuelfutuh <[email protected]> * Updates to pytorch images Signed-off-by: Haytham Abuelfutuh <[email protected]> * Try updates Signed-off-by: Haytham Abuelfutuh <[email protected]> * cleanup Signed-off-by: Haytham Abuelfutuh <[email protected]>
Describe the bug
The notifications SQS subscriber stopped processing messages.
Expected behavior
The subscriber should reconnect gracefully for as long as the application is running.
Flyte component
To Reproduce
Steps to reproduce the behavior:
Environment
Additional context
Logs:
I guess the solution will be similar to this one: flyteorg/flyteadmin#92