Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[chip/prim_esc_receiver] check if prim_esc_receiver is out-of-sync #14887

Closed
1 of 2 tasks
cindychip opened this issue Sep 12, 2022 · 10 comments · Fixed by #23894
Closed
1 of 2 tasks

[chip/prim_esc_receiver] check if prim_esc_receiver is out-of-sync #14887

cindychip opened this issue Sep 12, 2022 · 10 comments · Fixed by #23894
Assignees
Labels
Component:DV DV issue: testbench, test case, etc. Earlgrey-PROD Candidate Temporary label to triage issues into Earlgrey-PROD Milestones IP:alert_handler IP:prim IP:pwrmgr Priority:P2 Priority: medium TOP:earlgrey

Comments

@cindychip
Copy link
Contributor

cindychip commented Sep 12, 2022

This discussion comes from PR: #14663 (comment)

We would like some checking between prim_esc_receiver and alert_handler, and pwrmgr to make sure they are not out-of-sync.

For example - When alert_handler/pwrmgr reset, will prim_esc_receiver counters clear to 0?
Currently if the prim_esc_receiver did not reset, and we configured the alert_handler to ping after reset, it is hard to find this error.

So the current suggestions are:

  • probing some internal signals, or write assertions.
  • have some long FPGA test to re-iterate this sequence, or wait for longer time after wake up, just to catch if any modules are out-of-sync.

@moidx @tjaychen @msfschaffner

estimate 12
remaining 2023-05-04 8

@andreaskurth
Copy link
Contributor

Triaged for pwrmgr. May be relevant for M2.5, so assigning it there with existing Priority:P2 Priority: medium .

@moidx
Copy link
Contributor

moidx commented Mar 27, 2023

This was mainly to test synchronization corner cases that were pointed out in the alert handler revere ping test cases. Assigning to @msfschaffner to review priority. I suspect this could potentially be lowered to P3 if we are ok with the current coverage.

@moidx moidx self-assigned this Apr 10, 2023
@moidx moidx added TOP:earlgrey Priority:P1 Priority: high and removed Priority:P2 Priority: medium labels Apr 10, 2023
@msfschaffner
Copy link
Contributor

Regarding the assertion, I think adding a check that makes sure that the alert handler and escalation receiver are both reset together (with some small margin) should do the trick here.

@msfschaffner
Copy link
Contributor

msfschaffner commented Apr 10, 2023

estimate update
FPGA update: 0
assertion: 2

@msfschaffner
Copy link
Contributor

Discussed with @matutem - this does not seem to be very important, given existing escalations test, hence moving this to backlog.

@matutem
Copy link
Contributor

matutem commented Mar 4, 2024

We have a number of escalation tests in both simulation and fpga. I think various RTL changes have made this area much more safe than at the time this issue was filed:

  • the alert handler gets notified when both reset and clocks are on or off
  • the escalation network is reset on every lc reset

My preference would be to add SVA to check the alert handler escalation request and pwrmgr detecting it happen no more than a few cycles apart. As for pwrmgr triggering escalation when a ping timeout occurs, we could add the following assertions:

  • the escalation receiver timeout counter never gets past half its range (i.e., the top bit is never set)
  • the alert handler stops ping requests to the escalation receivers when escalation reset is active
  • the escalation receivers reset the timeout counters when escalation reset is active

The assertions should run in both unit and top level tests. I think we have enough stimulus to verify this. I think this doesn't need to block D/V2(S).

@matutem
Copy link
Contributor

matutem commented Jul 1, 2024

More about this: it seems much of this is addressed if we assert each individual esc_sender/receiver pair is connected to the same clk and rst_n signals. That can be run as a connectivity test.

@matutem
Copy link
Contributor

matutem commented Jul 2, 2024

The proposal is to create a connectivity config that checks all prim_esc_sender for the clk_i and rst_ni. The check is expected to fail because pwrmgr's esc_receiver rst_i is NOT connected to the same signal as alert_handler's esc_sender rst_i, which is captured in #23888.

matutem added a commit to matutem/opentitan that referenced this issue Jul 2, 2024
… and rst_n

The escalation sender and receiver pairs need to connect to the same clock and
reset signals, or they can malfunction.

This adds a connectivity configuration for these tests, but it is expected to
fail due to the mis-connection of pwrmgr rst_ni described in lowRISC#23888.

Fixes lowRISC#14887

Signed-off-by: Guillermo Maturana <[email protected]>
@andreaskurth
Copy link
Contributor

Discussed in the triage meeting: postponing this to M7 because the problem described in #23888 is well understood but won't be changed for Earlgrey-PROD (because it doesn't have a significant impact). Thus the test in PR #23894, which currently fails because it tests the eventual behavior, needs to be adapted to test the current behavior and pass.

matutem added a commit to matutem/opentitan that referenced this issue Jul 12, 2024
… and rst_n

The escalation sender and receiver pairs need to connect to the same clock and
reset signals, or they can malfunction.

This adds a connectivity configuration for these tests, but it is expected to
fail due to the mis-connection of pwrmgr rst_ni described in lowRISC#23888.

Fixes lowRISC#14887

Signed-off-by: Guillermo Maturana <[email protected]>
@vogelpi
Copy link
Contributor

vogelpi commented Jul 15, 2024

I've merged PR #23894 which closed this issue (adding connectivity checks) as the checks indeed pass with the current design.

Note that issue #23888 tracking the fixing of the reset connections is still open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component:DV DV issue: testbench, test case, etc. Earlgrey-PROD Candidate Temporary label to triage issues into Earlgrey-PROD Milestones IP:alert_handler IP:prim IP:pwrmgr Priority:P2 Priority: medium TOP:earlgrey
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants