[chip-test] alert_handler_reverse_ping_in_deep_sleep #14663

moidx · 2022-08-30T23:01:36Z

Check that escalation receivers located inside always-on blocks do not
auto-escalate due to the reverse ping feature while the system is in deep
sleep.

Signed-off-by: Miguel Osorio [email protected]

engdoreis

LGTM, I just left a few minor comments.

sw/device/tests/alert_handler_reverse_ping_in_deep_sleep_test.c

hw/top_earlgrey/data/chip_testplan.hjson

moidx · 2022-09-07T23:22:42Z

Hi @cindychip, I updated the test to enable and check all local alerts. I also added a flag to the ISR to fail the test if there are any alert interrupts detected. PTAL. Thanks!

cindychip

Thanks Miguel. great test!
I am running in DV and trying to dump waves to see if it is what we expect.

sw/device/tests/alert_handler_reverse_ping_in_deep_sleep_test.c

cindychip

Looks great, thanks Miguel

sw/device/tests/alert_handler_reverse_ping_in_deep_sleep_test.c

msfschaffner

LGTM modulo @cindychip's comments.

msfschaffner · 2022-09-09T19:13:45Z

hw/top_earlgrey/data/chip_testplan.hjson

+              - Ensure reset status is low power exit. A `kDifRstmgrResetInfoEscalation` signals that
+                there was a local escalation and should result in test failure.
+              - Disable AON timer.
+              - Check there are no flagged local alerts.


msfschaffner · 2022-09-09T19:14:41Z

hw/top_earlgrey/dv/chip_sim_cfg.hjson

@@ -778,15 +778,33 @@
      uvm_test_seq: chip_sw_alert_handler_escalation_vseq
      sw_images: ["//sw/device/tests/sim_dv:alert_handler_escalation_test:1"]
      en_run_modes: ["sw_test_mode_test_rom"]
+      // Disable scoreboard to avoid incorrect alert prediction from the alert_monitor. Due to the
+      // cross-domain alert senders and receivers, the monitor from the chip level did not support
+      // processing alerts accurately from both ends.


Is this something that we track in an issue to fix later (if not, should it be?)?

cindychip · 2022-09-10T02:48:46Z

@moidx here is the run result in dv:

Maybe we can set the run_timeout_min to 4 hours?

### Test Results
|  Stage  |                       Name                       | Tests                                            |  Max Job Runtime  |  Simulated Time  |  Passi
ng  |  Total  |  Pass Rate  |
|:-------:|:------------------------------------------------:|:-------------------------------------------------|:-----------------:|:----------------:|:------
---:|:-------:|:-----------:|
|   V2    | chip_sw_alert_handler_reverse_ping_in_deep_sleep | chip_sw_alert_handler_reverse_ping_in_deep_sleep |      3.443h       |    255.146ms     |     1 
    |    1    |  100.00 %   |
|   V2    |                                                  | **TOTAL**                                        |                   |                  |     1 
    |    1    |  100.00 %   |
|         |                                                  | **TOTAL**                                        |                   |                  |     1 
    |    1    |  100.00 %   |

tjaychen · 2022-09-10T05:03:05Z

hey all,
i've been staring at this test a bit, and now i'm not sure how we really should be testing this. Because it's IMO a combination of hardware / software / system behavior.

From what I can see, the most likely way a failure will trigger is a combination of two things:

we screw up the clock / reset controls somehow such that pwrmgr escalation receiver gets out of sync from the alert_handler.
after a low power event, the software for whatever reason takes an enormous amount of time before it enables pings again.

The first event will cause the pwrmgr escalation receiver to think that it should keep counting even though the alert handler has gone offline.
While the latter (if the limit is 175ms..it's not inconceivable software might run into this, since as far as I can tell rom doesn't enable the ping) would mean the alert handler doesn't ping the escalation for some time, causing it to auto-escalate.

The test case we have here I feel like is not very likely to ever flag an error (though it is still good to have) because most of the "time" in the test I think is used in pre-sleep, and it's not very likely things will go out of sync there.

The only way I can think of to test this is to hit the alert handler / pwrmgr combintation repeatedly with different kinds of sleep over and over with real software / or introduce artificial delays on the software, but all of this is going to take way too long in simulation. We could run it in FPGA, but this would be something that should be run for hours.

I think at a minimum I should probably add some assertions to ensure the escalation receiver and alert_handler cannot get out of sync, but beyond that I don't really know how to make this better.... any thoughts?

tjaychen · 2022-09-10T05:14:40Z

i think what i'm trying to say is that the purpose of these tests is probably to look for places where alert handler and pwrmgr get out of sync. But in this test, there's really only one moment where that could happen. And even if things DID get out of sync, it would require software to behave in an expected way to actually manifest an error. So I wonder if it makes more sense to codify what we think are the properties into cross module assertions, and then just try to stimulate it as much as possible.

moidx · 2022-09-10T20:38:50Z

Hi @tjaychen, for the long running test you are describing, we could simulate different pwrmgr transitions in a similar way to pwrmgr_deep_sleep_all_reset_reqs_test.c with additional random delays in software while enabling/checking alerts on each power cycle. I can probably write that as a separate test if it makes sense. wdyt?

cindychip · 2022-09-12T21:46:35Z

Thanks Tim and Miguel for the suggestions.

I think Tim's concern is totally valid, because prim_esc_receiver does not have any status register to tell which state is it at, if reset happened or not, or if this counter is reset to 0.

1). To address pwrmgr and prim_esc_receiver not in sync:
As Tim summarized, we can try the following two things:
a). probing some internal signals, or write assertions.
b). have some long FPGA test to re-iterate this sequence, or wait for longer time after wake up, just to catch if any modules are out-of-sync.
To unblock this PR from reviewing/merging, I will create a separate issue for that.

We could have some long FPGA test to re-iterate this sequence, or wait for longer time after wake up, just to catch if any modules are out-of-sync.
2). To address pwrmgr and alert_handler not in sync:
@moidx's suggestion would work!

@tjaychen please feel free to let me know if I missed any points from our discussion.

msfschaffner · 2022-09-12T22:24:05Z

@tjaychen just to clarify, the out of sync scenario you mention should not be possible if everything is implemented correctly, right? i.e., the vectors you describe would only help to bias the test so that the involved mechanisms are used in a way that a bug (if there was one) would manifest more readily?

tjaychen · 2022-09-12T22:58:15Z

@tjaychen just to clarify, the out of sync scenario you mention should not be possible if everything is implemented correctly, right? i.e., the vectors you describe would only help to bias the test so that the involved mechanisms are used in a way that a bug (if there was one) would manifest more readily?

yeah. But I worry I'm being too specific on just clk/rst, so if you can think of any other areas we should target..we should probably think about that too.

Check that escalation receivers located inside always-on blocks do not auto-escalate due to the reverse ping feature while the system is in deep sleep. Signed-off-by: Miguel Osorio <[email protected]>

moidx requested review from a team and cfrantz as code owners August 30, 2022 23:01

moidx requested review from rasmus-madsen and engdoreis and removed request for a team August 30, 2022 23:01

moidx force-pushed the chip-test-alert-handler-reverse-ping branch from 294c14d to 8518316 Compare August 30, 2022 23:08

moidx removed request for cfrantz, rasmus-madsen and engdoreis August 30, 2022 23:09

engdoreis approved these changes Aug 31, 2022

View reviewed changes

moidx force-pushed the chip-test-alert-handler-reverse-ping branch from 8518316 to 9639385 Compare August 31, 2022 22:22

moidx requested review from a team, ctopal, weicaiyang and msfschaffner and removed request for a team and ctopal September 1, 2022 00:13

moidx changed the title ~~[test] alert_handler_reverse_ping_in_deep_sleep~~ [chip-test] alert_handler_reverse_ping_in_deep_sleep Sep 1, 2022

moidx linked an issue Sep 1, 2022 that may be closed by this pull request

[chip-test] chip_sw_alert_handler_reverse_ping_in_deep_sleep #14134

Closed

6 tasks

moidx force-pushed the chip-test-alert-handler-reverse-ping branch from 9639385 to 9ff9827 Compare September 1, 2022 01:09

moidx requested a review from cindychip September 6, 2022 15:21

cindychip reviewed Sep 6, 2022

View reviewed changes

hw/top_earlgrey/data/chip_testplan.hjson Outdated Show resolved Hide resolved

hw/top_earlgrey/data/chip_testplan.hjson Outdated Show resolved Hide resolved

hw/top_earlgrey/data/chip_testplan.hjson Outdated Show resolved Hide resolved

moidx force-pushed the chip-test-alert-handler-reverse-ping branch 2 times, most recently from 9ff9827 to 4a6d497 Compare September 7, 2022 23:18

moidx requested a review from cindychip September 7, 2022 23:21

cindychip reviewed Sep 8, 2022

View reviewed changes

sw/device/tests/alert_handler_reverse_ping_in_deep_sleep_test.c Outdated Show resolved Hide resolved

moidx force-pushed the chip-test-alert-handler-reverse-ping branch from 4a6d497 to 3020974 Compare September 8, 2022 23:15

cindychip approved these changes Sep 9, 2022

View reviewed changes

sw/device/tests/alert_handler_reverse_ping_in_deep_sleep_test.c Outdated Show resolved Hide resolved

sw/device/tests/alert_handler_reverse_ping_in_deep_sleep_test.c Show resolved Hide resolved

msfschaffner approved these changes Sep 9, 2022

View reviewed changes

cindychip mentioned this pull request Sep 12, 2022

[chip/prim_esc_receiver] check if prim_esc_receiver is out-of-sync #14887

Closed

2 tasks

[test] alert_handler_reverse_ping_in_deep_sleep

ff7d6a0

Check that escalation receivers located inside always-on blocks do not auto-escalate due to the reverse ping feature while the system is in deep sleep. Signed-off-by: Miguel Osorio <[email protected]>

moidx force-pushed the chip-test-alert-handler-reverse-ping branch from 3020974 to ff7d6a0 Compare September 13, 2022 02:07

moidx merged commit db5c836 into lowRISC:master Sep 13, 2022

moidx deleted the chip-test-alert-handler-reverse-ping branch September 13, 2022 05:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[chip-test] alert_handler_reverse_ping_in_deep_sleep #14663

[chip-test] alert_handler_reverse_ping_in_deep_sleep #14663

moidx commented Aug 30, 2022

engdoreis left a comment

moidx commented Sep 7, 2022

cindychip left a comment

cindychip left a comment

msfschaffner left a comment

msfschaffner Sep 9, 2022

msfschaffner Sep 9, 2022

cindychip commented Sep 10, 2022

tjaychen commented Sep 10, 2022

tjaychen commented Sep 10, 2022

moidx commented Sep 10, 2022

cindychip commented Sep 12, 2022 •

edited

Loading

msfschaffner commented Sep 12, 2022

tjaychen commented Sep 12, 2022

[chip-test] alert_handler_reverse_ping_in_deep_sleep #14663

[chip-test] alert_handler_reverse_ping_in_deep_sleep #14663

Conversation

moidx commented Aug 30, 2022

engdoreis left a comment

Choose a reason for hiding this comment

moidx commented Sep 7, 2022

cindychip left a comment

Choose a reason for hiding this comment

cindychip left a comment

Choose a reason for hiding this comment

msfschaffner left a comment

Choose a reason for hiding this comment

msfschaffner Sep 9, 2022

Choose a reason for hiding this comment

msfschaffner Sep 9, 2022

Choose a reason for hiding this comment

cindychip commented Sep 10, 2022

tjaychen commented Sep 10, 2022

tjaychen commented Sep 10, 2022

moidx commented Sep 10, 2022

cindychip commented Sep 12, 2022 • edited Loading

msfschaffner commented Sep 12, 2022

tjaychen commented Sep 12, 2022

cindychip commented Sep 12, 2022 •

edited

Loading