Add Litmus tests #170

ezelioli · 2024-12-03T12:02:37Z

Contributions:

Refactor bootrom source code to be parametric (note no changes in actual bootrom content)
Add SMP support to software runtime
Add simple SMP hello software test
Add PULP's fork of Litmus tests as submodule (in sw/deps)
Add script with utility functions to parse output of Litmus tests (in utils/litmus)
Add make flow to run tests
Extend zero-stage boot-loader for SMP

ezelioli · 2024-12-03T18:58:03Z

The bootrom SMP support consists of pausing all secondary cores after a first common reset sequence and letting the main core run the initialization process. The main (non-SMP) core is statically determined by a macro at the beginning of the bootrom. The secondary cores are then woken up before moving to the next boot stage, i.e. in boot_next_stage.

The wakeup sequence consists of:

1. Send a software interrupt to all cores (here). Note that this includes the main core itself, which sends an IPI to itself.
2. Wait for the IPI to be received in each core. First, each core waits in a WFI loop. When the IPI is received, the core clears its own CLINT IPI register, which clears the interrupt. Then, each core reads all the CLINT IPI registers to check that all other cores have already cleared theirs.
3. Both cores proceed to the next stage.
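
For reference, the three steps of the wakeup sequence can be sketched in C as below. This is only an illustration under assumptions, not the bootrom code: the helper names (ipi_send_all, ipi_ack, ipi_all_cleared) are hypothetical, and the CLINT IPI registers are modelled as one 32-bit word per hart, passed in as a pointer so the logic can also be exercised on a host against a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Step 1: raise the software interrupt of every hart, including the caller.
 * `msip` is assumed to point at an array with one 32-bit IPI word per hart. */
static void ipi_send_all(volatile uint32_t *msip, uint32_t num_harts) {
    assert(num_harts > 0);
    for (uint32_t i = 0; i < num_harts; i++) {
        msip[i] = 1;
    }
}

/* Step 2a: after leaving its WFI loop, a hart clears its own IPI word,
 * which deasserts its software interrupt. */
static void ipi_ack(volatile uint32_t *msip, uint32_t hart_id) {
    msip[hart_id] = 0;
}

/* Step 2b: a hart may proceed once every IPI word reads back as zero,
 * i.e. once all harts have acknowledged their interrupt. */
static bool ipi_all_cleared(const volatile uint32_t *msip, uint32_t num_harts) {
    for (uint32_t i = 0; i < num_harts; i++) {
        if (msip[i] != 0) {
            return false;
        }
    }
    return true;
}
```

On the target, each hart would sit in its WFI loop between step 1 and step 2a, and would spin on ipi_all_cleared before step 3.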

Possible problems:

We could avoid sending interrupts to the main core itself, simply resuming the other cores. This would then have some possible implications on how the "synch" step (point 2 above) happens, since secondary cores would not know when the first core has completed the wakeup sequence (by reading CLINT IPI registers). However, do we need this synchronization? Also, is this synchronization based on IPIs really race-free?

ezelioli · 2024-12-03T19:09:25Z

The SMP support in the software runtime (crt0.S) instead fixes the main core to core 0. All other cores are paused after some common required initialization steps in crt0.S. Non-main cores wait in a WFI loop for software interrupts. The wake-up sequence in this case only sends IPIs to all cores except core 0. The smp_resume routine also waits for the interrupt to be cleared by the secondary cores before proceeding (here). This ensures that when smp_resume returns, the IPIs have been propagated to all cores and the cores have woken up. However, this has the downside of potentially deadlocking if another core does not wake up properly. Also, if another core has not reached the WFI loop for any reason, this will stall core 0 until then. Finally, is this really race-free?
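
To make the deadlock concern concrete, here is a hypothetical bounded variant of the resume loop. It is not what this PR implements: the name smp_resume_bounded, the max_spins parameter and the return convention are assumptions made for illustration, the IPI registers are passed in as a pointer with one 32-bit word per hart, and the fence that precedes the loop in the runtime is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bounded variant of the resume loop (illustration only).
 * Sends an IPI to every hart except hart 0 and polls until the target
 * clears it. Returns -1 if all secondary harts acknowledged, otherwise
 * the index of the first hart that did not clear its IPI word within
 * `max_spins` polls. */
static int smp_resume_bounded(volatile uint32_t *msip, uint32_t num_harts,
                              uint32_t max_spins) {
    assert(num_harts > 0);
    for (uint32_t i = 1; i < num_harts; i++) {
        msip[i] = 0x1; /* raise the software interrupt of hart i */
        uint32_t spins = 0;
        while (msip[i]) { /* hart i clears this word once it wakes up */
            if (++spins >= max_spins) {
                return (int)i; /* give up instead of stalling core 0 */
            }
        }
    }
    return -1;
}
```

Whether a timeout is desirable here is a design question; the sketch only shows where the unbounded wait sits.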

ezelioli · 2024-12-03T21:00:12Z

Zero-stage bootloader also required some adaptations wrt #85 due to the different behavior upon resuming secondary harts in the Cheshire runtime (crt0.S). When calling smp_resume the secondary harts jump to main - exactly as for the primary hart, but skipping some cold init steps - instead of jumping to the point in the code where the smp_resume is placed.

niwis

LGTM. Can we add smp_hello to the Cheshire CI? I remember that we previously had issues with executing from either DRAM or SPM because of the way the stack was set up. Would be great to see if this is working now. Just out of curiosity, did you test the bootloader for a multicore configuration (e.g. SMP Linux?)

Regarding your comments:

We could avoid sending interrupts to the main core itself, simply resuming the other cores.

Why do you think this might be a problem? If synchronisation is needed, we could also add a barrier. Not sure if there would be a reason for it, though.

Zero-stage bootloader also required some adaptations wrt #85 due to the different behavior upon resuming secondary harts in the Cheshire runtime (crt0.S). When calling smp_resume the secondary harts jump to main - exactly as for the primary hart, but skipping some cold init steps - instead of jumping to the point in the code where the smp_resume is placed.

I think this makes sense!

sw/lib/crt0.S

niwis · 2024-12-04T05:07:14Z

sw/lib/smp.c

+    fence();
+    for (uint32_t i = 1; i < num_harts; i++) {
+        *reg32(&__base_clint, i << 2) = 0x1;
+        while (*reg32(&__base_clint, i << 2))


The smp_resume routine also waits for the interrupt to be cleared by the secondary cores before proceeding (here). This ensures that when smp_resume returns, the IPIs have been propagated to all cores and the cores have woken up. However, this has the downside of potentially deadlocking if another core does not wake up properly. Also, if another core has not reached the WFI loop for any reason, this will stall core 0 until then. Finally, is this really race-free?

The main possible drawback that I see here is that it might introduce a delay between waking up cores. Could the same be achieved by adding a barrier after smp_resume if necessary?

Yes, we could remove the CLINT register polling and leave the synchronization up to the programmer (e.g. by adding a barrier) if needed.
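
A barrier of the kind discussed here could look roughly like the following generation-counter sketch. It is an assumption-laden illustration, not code from this PR: the type smp_barrier_t and the functions barrier_init, barrier_arrive and barrier_wait are hypothetical names, and it relies on C11 atomics being usable on the target and on every participating hart calling barrier_wait exactly once per phase.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin barrier shared by all harts (illustration only). */
typedef struct {
    atomic_uint arrived;    /* harts that reached the barrier in this phase */
    atomic_uint generation; /* incremented each time the barrier opens */
    unsigned num_harts;     /* number of participating harts */
} smp_barrier_t;

static void barrier_init(smp_barrier_t *b, unsigned num_harts) {
    assert(num_harts > 0);
    atomic_init(&b->arrived, 0);
    atomic_init(&b->generation, 0);
    b->num_harts = num_harts;
}

/* Registers the caller at the barrier and stores the generation it observed
 * in *my_gen. Returns true if the caller was the last hart to arrive, in
 * which case the barrier has been reset and opened for everyone. */
static bool barrier_arrive(smp_barrier_t *b, unsigned *my_gen) {
    *my_gen = atomic_load_explicit(&b->generation, memory_order_acquire);
    unsigned n = atomic_fetch_add_explicit(&b->arrived, 1, memory_order_acq_rel) + 1;
    if (n == b->num_harts) {
        /* Reset the counter before opening, so the next phase starts at zero. */
        atomic_store_explicit(&b->arrived, 0, memory_order_relaxed);
        atomic_fetch_add_explicit(&b->generation, 1, memory_order_release);
        return true;
    }
    return false;
}

/* Blocks (by spinning) until all harts have arrived. */
static void barrier_wait(smp_barrier_t *b) {
    unsigned gen;
    if (barrier_arrive(b, &gen)) {
        return;
    }
    while (atomic_load_explicit(&b->generation, memory_order_acquire) == gen) {
        /* spin until the last hart opens the barrier */
    }
}
```

With something like this, smp_resume would only send the IPIs, and a program that needs all harts in step would call barrier_wait on each hart afterwards.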

ezelioli · 2024-12-04T12:35:04Z

Regarding CI:

LGTM. Can we add smp_hello to the Cheshire CI? I remember that we previously had issues with executing from either DRAM or SPM because of the way the stack was set up. Would be great to see if this is working now. Just out of curiosity, did you test the bootloader for a multicore configuration (e.g. SMP Linux?)

Yes, that should be possible. I have only tested smp_hello myself; I will add it to CI as well.

Regarding bootrom SMP:

We could avoid sending interrupts to the main core itself, simply resuming the other cores.

Why do you think this might be a problem? If synchronisation is needed, we could also add a barrier. Not sure if there would be a reason for it, though.

I think both approaches would be fine. I am just not sure whether we need to synchronize the cores, and whether this is the proper way of doing it. However, the current approach is working and I don't see major issues with it.


ezelioli mentioned this pull request Dec 3, 2024

Improve SMP support #169

Closed

ezelioli marked this pull request as ready for review December 3, 2024 12:41

ezelioli requested review from paulsc96 and niwis December 3, 2024 12:41

ezelioli self-assigned this Dec 3, 2024

niwis previously approved these changes Dec 4, 2024


ezelioli and others added 2 commits December 5, 2024 10:36

Extend SMP support to runtime and ZSL

270bd1a

Co-authored-by: Emanuele Parisi <[email protected]>

Add Litmus tests

7fc246a

ezelioli dismissed niwis’s stale review via 7fc246a December 5, 2024 09:54

ezelioli force-pushed the ez/litmus branch from e653bd4 to 7fc246a Compare December 5, 2024 09:54

Add SMP tests to Gitlab CI

9c32519

- Add dual-core configuration in testbench
- Add number of cores parameter for consistent CLINT/PLIC generation
- Add PLIC configuration file generation according to number of cores
- Bump nonfree to version with baremetal SMP tests
