[Bug][Priority] RC sometimes not sending zero timer payloads? #1158

ear-dev · 2022-04-27T13:13:56Z

I noticed that some sessions on the Brazil bot were at times getting no response from the DF.agent.

https://ba.chatbot.vega.viasat.com/live/sBkcCy9aeq3ujxdpq/room-info
https://ba.chatbot.vega.viasat.com/live/QBdfwBdq657aza5mC/room-info
https://ba.chatbot.vega.viasat.com/live/gtP8QotNBXLEf746e/room-info

Molly suggested that DF was expecting a zero timer payload which never arrived from RC.

hey, so i am digging into the first session you sent, and https://ba.chatbot.vega.viasat.com/live/sBkcCy9aeq3ujxdpq is getting stuck on a particular page in CX that expects an RC zero timer event - but RC didn’t send the event
8:35
not sure why as that flow has been working fine for me every time i’ve tested
8:37
is there any way to see why RC is not sending that event? it seems like a weird intermittent issue?
8:38

@Shailesh351 I will assign this to you. Maybe there is something in the RC logs?


ear-dev · 2022-06-16T14:29:15Z

NOTE: Molly will provide Shailesh access to the CX logs for debugging. There are a few rooms with this issue pasted above, but I also use this one: Fnz8PEtSYcMR5Zc6Y

@Shailesh351 already has access to the CX agent so he can look at how the boleto intent and boletoPause events are handled. We have other zero timer payloads like registration that do not ever show this behavior where the room gets stuck because CX is not getting the pause event.

One diff might be that the other payloads also include a "please wait" text message, but we currently do not think that RC handles that any differently.

Another possibility might be that RC is somehow not handling an HTTP error message in this case? Basically from the RC side, we are not seeing any errors showing up in our logs around this event.

ear-dev · 2022-06-23T15:10:54Z

@Shailesh351 will find successful sessions where the visitor sent 'boleto' text, and compare them with our frozen sessions. We may need log points to figure this out.

Molly can provide a list of sessions with the issue. NOTE: Debug logging was on June 21-22, so sessions have to be from then.

ear-dev · 2022-07-01T15:11:14Z

NOTE: After our recent upgrade in prod to RC version v4.4.2.widechat-4 and DF version v1.2.3.widechat-6, this issue seems to have fixed itself. We will leave this story under review for another week and close it if we do not see a repro.

ear-dev · 2022-07-06T14:49:19Z

@Shailesh351 @bhardwajaditya looks like this issue may have returned in prod. This time it was 'verificationPause', where CX was waiting for the event. The event is eventually showing up, but 40 minutes late or so. Somehow our task scheduling is getting stuck?

ear-dev · 2022-07-07T16:03:56Z

General Notes:

What would cause the scheduled event to hang? Which process is responsible for sending it?
Load?
Stuck threads?
Are events best effort or guaranteed? How are failures handled?
Retry, timing?
Latency on events? Characterize: min, max, average.
How do events scale? Do they back up?
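
For the latency question above, a minimal sketch of how min, max, and average could be computed, assuming we can extract the scheduled time and the actual execution time for each event from the logs. The `EventTiming` shape and the `latencyStats` name are hypothetical and are not RC or DF types.

```typescript
// Hypothetical record shape built from log lines; not an RC or DF.app type.
interface EventTiming {
  roomId: string;
  scheduledAt: number; // epoch ms when the event was due to fire
  executedAt: number;  // epoch ms when the event actually fired
}

interface LatencyStats {
  count: number;
  minMs: number;
  maxMs: number;
  avgMs: number;
}

// Returns null when there is nothing to measure.
function latencyStats(timings: EventTiming[]): LatencyStats | null {
  if (timings.length === 0) {
    return null;
  }
  let minMs = Infinity;
  let maxMs = -Infinity;
  let sumMs = 0;
  for (const t of timings) {
    const delayMs = t.executedAt - t.scheduledAt;
    minMs = Math.min(minMs, delayMs);
    maxMs = Math.max(maxMs, delayMs);
    sumMs += delayMs;
  }
  return { count: timings.length, minMs, maxMs, avgMs: sumMs / timings.length };
}
```

Events that never executed have no `executedAt` and would need to be counted separately, otherwise the average would look better than it really is.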

Who's responsible for the event scheduling: DF.app -> writes to DB -> appBridge -> RC.server

could appBridge be queueing and blocking?
could we be failing to do DB writes?
Failed scheduled events should throw an error: "The App appID is scheduling an onetime job processor ID ", "The App appId is scheduling a recurring job processor ID"

What is the lifecycle of a scheduled event? Is it covered in the RC docs? @bhardwajaditya can you help me document all the different states that a scheduled event will go through?

ear-dev · 2022-07-07T16:26:25Z

@bhardwajaditya searching for the log point where a job is getting scheduled does not help because there is no identifying data associated with it. I think we should make a story to add the roomID to these log points. What do you think?
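
A rough sketch of what a scheduling log line with a room identifier could look like. The message wording is only modeled on the log text quoted earlier in this thread, and `formatScheduleLog`, its parameters, and the `roomId=` suffix are all hypothetical, not an existing RC or Apps-Engine API.

```typescript
type JobKind = "onetime" | "recurring";

// Hypothetical helper: builds a scheduling log line and appends the roomId
// when one is known, so stuck rooms can be searched for in the logs.
function formatScheduleLog(
  appId: string,
  processorId: string,
  kind: JobKind,
  roomId?: string,
): string {
  const article = kind === "onetime" ? "an" : "a";
  const base = `The App ${appId} is scheduling ${article} ${kind} job ${processorId}`;
  return roomId ? `${base} (roomId=${roomId})` : base;
}

// Example with made-up app and processor names:
// formatScheduleLog("df-app", "boletoPause", "onetime", "Fnz8PEtSYcMR5Zc6Y")
```

With something like this in place, a search for `roomId=Fnz8PEtSYcMR5Zc6Y` would show whether the job for a frozen room was ever scheduled.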

ear-dev · 2022-07-14T16:10:20Z

NOTE: we see three flavors of this bug

The event never gets executed and our visitors are stuck in a blackout window
The event gets triggered late (10 minutes, or even 2 hours, later) and CX is totally confused
The event gets triggered 10 minutes later and gets executed twice (maybe a few minutes apart even)
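
To help sort affected sessions into these three flavors, a minimal sketch, assuming we can collect for each expected event its scheduled time and the list of times it actually executed. The `classifyEvent` function, the `Outcome` labels, and the tolerance parameter are hypothetical names for illustration only.

```typescript
type Outcome = "on_time" | "never_executed" | "late" | "duplicated";

// scheduledAt: epoch ms when the event was due.
// executedAt: epoch ms of every execution seen in the logs (may be empty).
// toleranceMs: how much delay still counts as on time (an assumption to tune).
function classifyEvent(
  scheduledAt: number,
  executedAt: number[],
  toleranceMs: number,
): Outcome {
  if (executedAt.length === 0) {
    return "never_executed"; // flavor 1: visitor stuck in the blackout window
  }
  if (executedAt.length > 1) {
    return "duplicated"; // flavor 3: the same event ran more than once
  }
  const firstRun = Math.min(...executedAt);
  return firstRun - scheduledAt > toleranceMs ? "late" : "on_time"; // flavor 2
}
```

Duplicated events are reported as `duplicated` even when they were also late, since the double execution is the more specific symptom.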

@Shailesh351 can you please look at latest RC server upstream to see if they may have a fix that we're missing? Thanks.

ear-dev · 2022-07-26T14:42:47Z

@Shailesh351 I've been testing this build and it is currently failing our 'Multiple "continue_blackout" message dropping payloads in a row' test, described in this wiki: https://wiki.viasat.com/pages/viewpage.action?pageId=549170025

Can you verify please? thanks...

ear-dev assigned Shailesh351 Apr 27, 2022

ear-dev mentioned this issue Jun 14, 2022

[DF blackout window] timer to release a stuck blackout window. #1225

Open

ear-dev changed the title ~~[Bug] RC sometimes not sending zero timer payloads?~~ [Bug][Priority] RC sometimes not sending zero timer payloads? Jun 23, 2022

ear-dev closed this as completed Jul 5, 2022

ear-dev reopened this Jul 6, 2022

Shailesh351 mentioned this issue Jul 21, 2022

[FIX] Scheduler Issues by updating Agenda #1251

Merged

ear-dev closed this as completed in #1251 Jul 25, 2022

ear-dev reopened this Jul 26, 2022

Shailesh351 mentioned this issue Jul 28, 2022

[FIX] Multiple Blackout Test failing due to race condition in setting isProcessingMessage flag WideChat/Apps.Dialogflow#163

Merged

ear-dev closed this as completed Jul 29, 2022
