[fpga, prim] Add support for Ultrascale and CW340 + CW341 #19295

a-will · 2023-07-26T22:19:48Z

Add primitives for a new Ultrascale library. The clocking architecture and primitive set are new compared with 7 series.

In the interest of moving more of the clocking out of LUTs / keeping it on dedicated clock channels, make prim_clock_div abstract and add Ultrascale-specific clock_div and clock_inv implementations.

Add hardware support for the CW340 + CW341 boards.

Note: The build times are going to be higher for this board than they need to be. We might want to seriously consider removing generic mode (#15452), as the current spi_device structure makes it difficult to close timing and poses various CDC challenges. The timing for generic mode is wildly different from flash and TPM modes, and they mix poorly together.

On my machine, the hour-plus build goes down to 40-45 mins if generic_mode is removed. CI will be much slower than that, though ;)

a-will · 2023-07-26T22:25:06Z

hw/ip/prim_xilinx_ultrascale/lint/prim_xilinx_ultrascale_clock_buf.vlt

Is there any reason to have these empty waivers? Should we just delete them? (carryover from prim_xilinx)

I could go either way on that. Fine with removing them (just have to update / comment the associated lines in the corefiles).

a-will · 2023-07-27T20:25:20Z

Just noting: Build time for CW340 in CI was 1h46m with spi_device still containing generic mode. But that seems fast, since CW310 took only 1h01m.

a-will · 2023-07-28T20:31:31Z

CC @engdoreis

hw/top_earlgrey/data/pins_cw341.xdc

hw/top_earlgrey/rtl/clkgen_xil_ultrascale.sv

hw/bitstream/vivado/BUILD

engdoreis · 2023-07-31T09:34:08Z

CC @GregAC @johngt

engdoreis

Thanks @a-will. This PR seems sensible to me, but I don't know enough to approve it.

hw/bitstream/vivado/BUILD

hw/top_earlgrey/data/pins_cw341.xdc

nbdd0121 · 2023-08-04T13:51:09Z

azure-pipelines.yml

@@ -509,6 +509,50 @@ jobs:
    displayName: Upload artifacts for CW310
    condition: failed()

+- job: chip_earlgrey_cw340


Could this be achieved using matrix? https://learn.microsoft.com/en-us/azure/devops/pipelines/yaml-schema/jobs-job-strategy?view=azure-pipelines

There is actually a template in the integrated repo, and it would be good to bring it over, tweak it, and apply it to all bitstream jobs. However, this PR is rather large already, hehe.

I plan to do that separately.

msfschaffner

Thanks for all the work here - this LGTM! Just a few questions.

msfschaffner · 2023-08-09T22:03:58Z

hw/ip/prim_xilinx_ultrascale/lint/prim_xilinx_ultrascale_clock_buf.vlt

msfschaffner · 2023-08-09T22:08:19Z

hw/ip/prim_xilinx_ultrascale/rtl/prim_xilinx_ultrascale_clock_inv.sv

+    if (HasScanMode) begin : gen_scan
+      BUFGCTRL #(
+        .IS_I0_INVERTED(1'b1),
+        .IS_S0_INVERTED(1'b1)


Do I understand correctly that this takes care of selection inversion below (just because the same clock / selects are assigned to both channels)?

That's right! I0 having inverted sense means that when the I0 pin is selected, O will be inverted. Likewise, S0 is set to have inverted sense, so the same select signal can be used for both S0 and S1.
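The inversion trick described in this reply can be checked with a small behavioural model. This is only a sketch in Python, not the RTL: it assumes the same clock drives both I0 and I1 and the same scan-mode select drives both S0 and S1 (inferred from the discussion, not shown in the hunk), and it ignores the CE/IGNORE pins and the glitch-free switching of the real buffer. The function names are hypothetical.

```python
def bufgctrl_out(i0: bool, i1: bool, s0: bool, s1: bool,
                 is_i0_inverted: bool = False,
                 is_s0_inverted: bool = False) -> bool:
    """Steady-state output of a simplified two-input clock mux.

    Models only the I0/S0 inversion attributes; CE and IGNORE pins and
    the glitch-free switching behaviour are deliberately left out.
    """
    eff_i0 = i0 ^ is_i0_inverted   # I0 as seen after the optional inversion
    eff_s0 = s0 ^ is_s0_inverted   # S0 as seen after the optional inversion
    if eff_s0 and not s1:
        return eff_i0              # channel 0 selected
    if s1 and not eff_s0:
        return i1                  # channel 1 selected
    raise ValueError("this simplified model needs exactly one channel selected")


def clock_inv(clk: bool, scanmode: bool) -> bool:
    """Assumed wiring: clk on both inputs, scanmode on both selects."""
    return bufgctrl_out(clk, clk, scanmode, scanmode,
                        is_i0_inverted=True, is_s0_inverted=True)


if __name__ == "__main__":
    for scanmode in (False, True):
        for clk in (False, True):
            print(f"scanmode={int(scanmode)} clk={int(clk)} -> out={int(clock_inv(clk, scanmode))}")
```

Under these assumptions, a single select line picks the inverted path when it is low and the pass-through path when it is high, and the two channels are never selected at the same time.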

msfschaffner · 2023-08-09T22:09:10Z

hw/ip/prim_generic/rtl/prim_generic_clock_div.sv

@@ -4,7 +4,7 @@

 `include "prim_assert.sv"

-module prim_clock_div #(
+module prim_generic_clock_div #(


Cool idea! I bet this results in better timing since the tool can use proper clocking resources now...

msfschaffner · 2023-08-09T22:10:27Z

hw/ip/prim_xilinx_ultrascale/rtl/prim_xilinx_ultrascale_clock_div.sv

+
+`include "prim_assert.sv"
+
+module prim_xilinx_ultrascale_clock_div #(


Is this also available for the 7 series? Should we add that as well?

We could, but at least for Earl Grey, we wouldn't be able to make use of it, I think. ~~Unlike Ultrascale, 7 series only has 32 BUFGs total, to use across muxes, dividers, general global buffers, etc. And in CW310, we're using almost all of them already.~~

Edit: Actually, I forgot. The dividers are only on regional buffers (BUFR) in 7 series. They're more complicated to use, though.

Yes, we use a BUFH e.g. for AES to enforce a specific AES placement on CW310. The issue is that if you don't enforce the placement when using BUFH, Vivado (or at least the version we are currently using) tends to do irrational things, see #8138 (comment). I expect the same will happen with BUFH. It might be worth checking at some point whether this still happens with newer Vivado releases and, if not, upgrading. Of course not as part of this PR.
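As a rough illustration of the "use the clock dividers for small divisors" idea in this PR, the selection could be sketched as below. This is a hypothetical Python model, not the actual primitive: it assumes the dedicated divider is the UltraScale BUFGCE_DIV buffer (the thread does not name it), whose BUFGCE_DIVIDE attribute accepts values 1 through 8, and it assumes larger divisors fall back to a logic-based divider. The function name and return strings are invented for the example.

```python
BUFGCE_DIV_MAX_DIVIDE = 8  # BUFGCE_DIVIDE accepts 1 through 8 on UltraScale


def pick_divider(divisor: int) -> str:
    """Return which kind of divider a given divisor would map to.

    Hypothetical policy: stay on dedicated clock buffers whenever the
    buffer can implement the divisor, otherwise divide in fabric logic.
    """
    if divisor < 1:
        raise ValueError("divisor must be a positive integer")
    if divisor <= BUFGCE_DIV_MAX_DIVIDE:
        return "clock_buffer"
    return "logic"


if __name__ == "__main__":
    for d in (2, 4, 8, 16):
        print(d, pick_divider(d))
```

The point of such a split is the one made in the PR description: the divided clock stays on dedicated clock routing instead of passing through LUTs and flops.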

a-will · 2023-08-10T00:13:09Z

For the curious about achievable performance, a quick build for CW340 closed timing for 30 MHz CLK_SYS and 33.333 MHz CLK_IO with a lot of slack still on CLK_IO (enough that the build could hit 50 MHz for CLK_IO). The build took no more time than the 10 MHz variant.

CLK_SYS had less headroom; some OTBN paths with 55 layers of logic left it with just over 4 ns of slack. Though, I should say that 4 ns is still large enough that it might not reflect the end of what's easy.

Meanwhile, on that build, SPI passthrough passed timing for better than 24 MHz (full cycle sampling).
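For a feel of what the slack figures above mean in frequency terms, here is a first-order conversion in Python. It is only an estimate under a strong assumption: that the reported setup slack on the worst path of a clock can be subtracted directly from that clock's period. Real timing closure also depends on hold checks, cross-clock paths and how the tool re-optimizes at a new target, so these numbers are not a promise of achievable frequency. The helper names are hypothetical.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


def max_freq_mhz(target_mhz: float, slack_ns: float) -> float:
    """Frequency at which the worst path would have zero setup slack."""
    return 1000.0 / (period_ns(target_mhz) - slack_ns)


def slack_needed_ns(target_mhz: float, goal_mhz: float) -> float:
    """Slack at the current target that a higher goal frequency would consume."""
    return period_ns(target_mhz) - period_ns(goal_mhz)


if __name__ == "__main__":
    # CLK_SYS: 30 MHz target with roughly 4 ns of slack on the worst paths.
    print(f"CLK_SYS estimate: {max_freq_mhz(30.0, 4.0):.1f} MHz")
    # CLK_IO: slack that a 50 MHz goal would use up from a 33.333 MHz target.
    print(f"CLK_IO slack needed for 50 MHz: {slack_needed_ns(33.333, 50.0):.2f} ns")
```

With the numbers from the comment, about 4 ns of slack at 30 MHz corresponds to roughly 34 MHz, and reaching 50 MHz on CLK_IO from a 33.333 MHz target would need about 10 ns of slack.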

vogelpi

Thanks @a-will , LGTM!

vogelpi · 2023-08-10T08:47:57Z

util/topgen/templates/chiplevel.sv.tpl

@@ -1069,7 +1100,7 @@ module chip_${top["name"]}_${target["name"]} #(
 // Also need to add AST simulation and FPGA emulation models for things like entropy source -
 // otherwise Verilator / FPGA will hang.
  top_${top["name"]} #(
-% if target["name"] == "cw310":
+% if target["name"] in ["cw310", "cw340"]:


Eventually, we will want to add a separate branch for the CW340/CW341 here to also enable KMAC masking on FPGA. For this, both KeymgrKmacEnMasking and KmacEnMasking need to be set to 1. But I recommend doing this as a follow-up. It will have a big impact on utilization, timing and maybe also build time.

With those enabled, it consumed another 3% of the available logic, and timing was not materially affected. The build time did go up another 15-20% locally (about 6-8 minutes) for the 30 MHz SYS / 33.333 MHz IO build.

Oh wow! The FPGA must be really big then :-) Thanks for giving it a try and reporting back @a-will !

vogelpi · 2023-08-10T08:52:59Z

hw/ip/prim_xilinx_ultrascale/rtl/prim_xilinx_ultrascale_and2.sv

@@ -0,0 +1,19 @@
+// Copyright lowRISC contributors.


It would have been nice if we had found a way to share the primitives that can be shared between 7 series and Ultrascale - probably everything except for the clock/reset stuff. This would save quite some code duplication. But I understand this would require additional tooling work and it isn't high prio IMHO.

Yeah, we could probably lump that in with a wider primgen overhaul. We kind of need to either fix fusesoc to handle dependency injection after running generators or to move primgen activity outside fusesoc (i.e. something that runs prior to fusesoc).

vogelpi · 2023-08-10T08:56:53Z

hw/ip/prim_xilinx_ultrascale/rtl/prim_xilinx_ultrascale_clock_div.sv

jwnrt

We went through this in the software team and didn't have any concerns, just one nitpick.

Thanks @a-will.

hw/bitstream/vivado/BUILD

Add a library specific to the Xilinx / AMD Ultrascale FPGAs. The clocking architecture is quite different for Ultrascale vs. 7 series, and the available primitives are not the same. Thus, it needed its own set of primitives. Signed-off-by: Alexander Williams <[email protected]>

Create a primitive for clock inverters on Ultrascale devices that use clock buffers. Use the clock buffers for clock inverters where possible, so clocks may continue to be routed on dedicated clock routing tracks. Unlike the case with 7-series, there are many clock buffers available in Ultrascale design. Signed-off-by: Alexander Williams <[email protected]>

Make prim_clock_div an abstract core, allowing the use of technology-specific clock divider primitives that meet the requirements. Add prim_clock_inv to prim:all. fusesoc appears to not gather all dependencies when generated cores depend on abstract cores. prim_generic_clock_div depends on prim_clock_inv, but the core was left behind when building clkmgr sims. Signed-off-by: Alexander Williams <[email protected]>

Use the clock dividers in Ultrascale for small divisors, so most clocking can remain on the dedicated routing channels. Signed-off-by: Alexander Williams <[email protected]>

Add FPGA build support for the CW340 + CW341 with Earl Grey. Port the SPI constraints from the ASIC to the CW340, and move the USB timing constraints out of the physical constraints file. Adjust build scripts for the CW340's primitives for tasks like MMI generation. Use CW310 software for now. The device-side software should be mostly compatible, but this could change in the future. Signed-off-by: Alexander Williams <[email protected]>

This will add CW340 builds to the bitstream caches, so tooling and software development may proceed without each developer needing to build the design. Signed-off-by: Alexander Williams <[email protected]>

a-will · 2023-08-10T20:09:24Z

Since CW310 is now at 24 MHz, I've increased the frequency here to match.

a-will force-pushed the cw340 branch 2 times, most recently from 3a7f4c5 to 4fe8a14 Compare July 27, 2023 02:48

a-will force-pushed the cw340 branch 2 times, most recently from b15ace9 to eca5e1a Compare July 28, 2023 17:29

a-will self-assigned this Jul 28, 2023

a-will marked this pull request as ready for review July 28, 2023 20:31

a-will requested a review from msfschaffner as a code owner July 28, 2023 20:31

engdoreis reviewed Jul 31, 2023

msfschaffner requested review from GregAC, andreaskurth and vogelpi August 3, 2023 18:36

a-will force-pushed the cw340 branch from eca5e1a to 0b98603 Compare August 3, 2023 22:22

a-will requested review from milesdai and rswarbrick as code owners August 3, 2023 22:22

a-will force-pushed the cw340 branch 3 times, most recently from 553b3a4 to d2a1549 Compare August 4, 2023 05:58

nbdd0121 reviewed Aug 4, 2023

a-will force-pushed the cw340 branch from d2a1549 to 4b0bdb5 Compare August 4, 2023 15:23

msfschaffner approved these changes Aug 9, 2023

vogelpi approved these changes Aug 10, 2023

jwnrt approved these changes Aug 10, 2023

a-will force-pushed the cw340 branch 2 times, most recently from d914ed4 to 8ec2dbf Compare August 10, 2023 18:00

a-will added 6 commits August 10, 2023 12:40

[prim] Add clock divider prim for Ultrascale

79eab8a

[ci] Add CW340 build to pipeline for master branch

ce6f04c

a-will force-pushed the cw340 branch from 8ec2dbf to ce6f04c Compare August 10, 2023 19:45

a-will added the kokoro:rebuild label Aug 10, 2023

opentitan-github-bot removed the kokoro:rebuild label Aug 10, 2023

a-will merged commit 0eb1a55 into lowRISC:master Aug 10, 2023
24 checks passed

a-will deleted the cw340 branch August 10, 2023 22:58
