Peripheral API design: exposing bus interfaces #10

Open
jfng opened this issue Feb 19, 2020 · 7 comments

@jfng
Member

jfng commented Feb 19, 2020

Peripherals are a building block that is currently missing from nmigen-soc.

They would provide wrappers to cores by means of a CSR interface (also interrupts, but handling these could be the subject of a separate issue).
For example, an AsyncSerialPeripheral wrapper in nmigen-soc would provide access to an AsyncSerial core in nmigen-stdio. Baudrate, RX/TX data, strobes etc. would be accessed through CSRs.

Integration would be straightforward for peripherals that provide nothing more than CSRs:

  • CSRs are gathered behind a csr.Multiplexer, whose bus interface is exposed by the peripheral
  • all peripheral interfaces are gathered behind a single csr.Decoder
  • the csr.Decoder bus interface is bridged to the SoC interconnect
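The first two steps above can be pictured with a toy address-allocation model. This is only a sketch in plain Python: `ToyMultiplexer` and `ToyDecoder` are hypothetical stand-ins that mimic the roles of csr.Multiplexer and csr.Decoder, not the nmigen-soc API, and the register names, widths and the 16-word alignment are assumptions made for illustration. The bridge to the SoC interconnect is not modeled.

```python
# Toy model only: these classes mimic the *roles* of csr.Multiplexer and
# csr.Decoder. They are hypothetical and are NOT the nmigen-soc API.

class ToyMultiplexer:
    """Collects the CSRs of one peripheral and assigns them local offsets."""

    def __init__(self, data_width=8):
        self.data_width = data_width
        self.registers = {}   # name -> (local offset, size in bus words)
        self._next = 0

    def add(self, name, width):
        # A CSR wider than the bus occupies several consecutive words.
        size = -(-width // self.data_width)  # ceiling division
        self.registers[name] = (self._next, size)
        self._next += size

    @property
    def size(self):
        return self._next


class ToyDecoder:
    """Gathers several multiplexers and gives each one a base address."""

    def __init__(self, alignment=16):
        self.alignment = alignment
        self.windows = {}     # peripheral name -> (base address, multiplexer)
        self._next = 0

    def add(self, name, mux):
        self.windows[name] = (self._next, mux)
        # Round the next base up to the alignment so windows never overlap.
        span = max(mux.size, 1)
        self._next += -(-span // self.alignment) * self.alignment

    def address_of(self, periph, reg):
        base, mux = self.windows[periph]
        offset, _size = mux.registers[reg]
        return base + offset


uart = ToyMultiplexer()
uart.add("divisor", 20)   # three words on an 8-bit CSR bus
uart.add("rx_data", 8)
uart.add("rx_rdy", 1)

timer = ToyMultiplexer()
timer.add("reload", 32)
timer.add("enable", 1)

decoder = ToyDecoder(alignment=16)
decoder.add("uart", uart)
decoder.add("timer", timer)
```

In this model every CSR of the SoC ends up in one contiguous CSR region, and `decoder.address_of("timer", "enable")` resolves a register to its address in that region.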

But what about peripherals that also provide a memory interface (e.g. DRAM controllers, flash controllers, etc.)?
I see two possible approaches:

Approach A: exposing two separate bus interfaces for CSRs and memories

CSRs would be handled the same way as described above, but the peripheral would also provide a separate bus interface to access its memories (e.g. WB4). I think LiteX follows a similar approach.

This has the consequence of locating the CSRs and memories of a given peripheral in separate regions of the SoC address space.

pros:

  • lower resource consumption; all the CSRs of the SoC are still pooled behind a single csr.Decoder, and the WB4 interface of a peripheral is directly connected to its logic.

cons:

  • transactions may be reordered if e.g. the WB4 interface sits behind a FIFO, but not the CSR interface.

Approach B: exposing a single bus interface for both CSRs and memories

Instead of two separate interfaces, a memory-capable peripheral would expose a single bus interface like WB4 or AXI4. This has the consequence of locating all the resources of a peripheral in the same address space region.

  • peripherals would have a local wishbone.Decoder, whose bus interface would be exposed
  • memory interfaces would be added to the decoder
  • CSRs would be grouped into banks, each bank would be bridged to the same decoder
    (e.g. csr.Multiplexer -> WishboneCSRBridge -> wishbone.Decoder)

pros:

  • peripherals with a single standard bus interface are easier to integrate when instantiated alone
    (counterargument: users may prefer just using the bare nmigen-stdio cores instead, if available)
  • the address space layout of a peripheral would be flexible to the point where one could mimic the peripherals of another SoC. This could facilitate porting/reusing drivers.

cons:

  • some layouts may consume significantly more resources, e.g. if many CSR banks are requested.
    (although I assume that the general case consists of a single CSR bank)
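As a rough illustration of Approach B, the layout of a single peripheral-local region can be modeled in plain Python. `ToyLocalDecoder` is a hypothetical stand-in for a peripheral-local wishbone.Decoder, not the nmigen-soc API. The power-of-two window alignment, the window names and the sizes are assumptions. Each CSR bank is counted as needing its own bridge, which is where the extra resource cost mentioned above would come from.

```python
# Hypothetical model of a peripheral-local decoder (Approach B).
# Not nmigen-soc code; names, sizes and alignment policy are assumptions.

class ToyLocalDecoder:
    def __init__(self):
        self.windows = []   # (name, kind, base, span)
        self._next = 0

    def add(self, name, size, kind):
        if size < 1:
            raise ValueError("window size must be at least 1")
        # Round the window up to a power of two and align it to its own size.
        span = 1 << (size - 1).bit_length()
        base = (self._next + span - 1) & ~(span - 1)
        self.windows.append((name, kind, base, span))
        self._next = base + span
        return base

    def lookup(self, addr):
        """Map an address in the peripheral's region to (window, offset)."""
        for name, _kind, base, span in self.windows:
            if base <= addr < base + span:
                return name, addr - base
        raise KeyError(f"address {addr:#x} is not mapped")

    @property
    def csr_bridges(self):
        # One CSR-to-bus bridge per CSR bank.
        return sum(1 for _n, kind, _b, _s in self.windows if kind == "csr")


flash = ToyLocalDecoder()
mem_base = flash.add("buffer", 0x1000, kind="memory")
csr_base = flash.add("control", 0x10, kind="csr")
```

Here the memory window and the CSR bank sit next to each other in one region: `flash.lookup(0x1004)` lands in the "control" bank at offset 4. Adding further CSR banks increases `flash.csr_bridges` by one each.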

Any thoughts on this?
cc @whitequark @awygle @enjoy-digital and others

@awygle

awygle commented Feb 19, 2020

To get my biases out of the way - I am most concerned about the use case where there is no CPU, and possibly no bus. I believe this case is covered by wrapping nmigen-soc around nmigen-stdio, so I'm not too worried about that, but you should know where I'm coming from.

Approach A seems more flexible to me, in that it can be configured to act like Approach B. With Approach A, I can hook up AXI-Lite to the control port and AXI4 to the data port for AXI SoCs, and just hook everything up to the same WB4 bus for Wishbone SoCs. I believe the downside of this approach can be mitigated by requiring the control and data ports to have matched pipelining delays, or at the very least documenting the difference if one exists so that the SoC integrator can match them if desired.

@jfng
Member Author

jfng commented Feb 19, 2020

Approach A seems more flexible to me, in that it can be configured to act like Approach B. With Approach A, I can hook up AXI-Lite to the control port and AXI4 to the data port for AXI SoCs, and just hook everything up to the same WB4 bus for Wishbone SoCs. I believe the downside of this approach can be mitigated by requiring the control and data ports to have matched pipelining delays, or at the very least documenting the difference if one exists so that the SoC integrator can match them if desired.

I think you just changed my mind on this! (I was in favor of approach B)

Both of the use-cases I highlighted for Approach B are actually doable with separate CSR and memory interfaces, namely:

  • peripherals with a single standard bus interface are easier to integrate when instantiated alone

The CSR bus interface could just be bridged by a parent module to the WB4/AXI4 bus, resulting in a "single standard bus interface".

  • the address space layout of a peripheral would be flexible to the point where one could mimic the peripherals of another SoC. This could facilitate porting/reusing drivers.

Similarly, a parent module could wrap the peripheral and reorganize its address space, and expose whatever layout may be needed in order to reuse a particular driver.

@jfng
Member Author

jfng commented Feb 19, 2020

So, in the case of peripherals with CSRs, I'm thinking of a csr.Peripheral mixin that would be used like this (without interrupts, for now):

class AsyncSerialPeripheral(csr.Peripheral, Elaboratable):
    def __init__(self, *, rx_depth=16, tx_depth=16, **kwargs):
        super().__init__()

        self._phy     = AsyncSerial(**kwargs)
        self._rx_fifo = SyncFIFO(width=self._phy.rx.data.width, depth=rx_depth)
        self._tx_fifo = SyncFIFO(width=self._phy.tx.data.width, depth=tx_depth)

        self._divisor = self.csr(self._phy.divisor.width, "rw")
        self._rx_data = self.csr(self._phy.rx.data.width, "r")
        self._rx_rdy  = self.csr(1, "r")
        self._tx_data = self.csr(self._phy.tx.data.width, "w")
        self._tx_rdy  = self.csr(1, "r")

        self._bridge  = self.csr_bridge()
        self.csr_bus  = self._bridge.bus

    def elaborate(self, platform):
        m = Module()
        m.submodules.bridge  = self._bridge

        # ...

        return m

For memory interfaces, a separate wishbone.Peripheral mixin would provide:

  • a self.window() method that would return a wishbone.Interface
  • a self.wb_bridge() method that would return a bridge to all the requested windows.

That way, a peripheral that requires both a CSR bus and a WB4 bus would inherit from both csr.Peripheral and wishbone.Peripheral.
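To make the double-mixin idea concrete, here is a plain-Python mock of how the two mixins might cooperate. Everything in it is hypothetical: `CSRPeripheralMixin` and `WishbonePeripheralMixin` only record what was requested and stand in for the proposed csr.Peripheral and wishbone.Peripheral, which are not defined in this thread beyond the method names `csr()` and `window()`. The `window()` parameters and the `FlashPeripheral` example are assumptions.

```python
from collections import namedtuple

# Hypothetical mock of the proposed mixins. No hardware is described here;
# the classes only record which CSRs and windows a peripheral requested.
CSRRequest = namedtuple("CSRRequest", "width access")
WindowRequest = namedtuple("WindowRequest", "addr_width data_width")


class CSRPeripheralMixin:
    def __init__(self, *args, **kwargs):
        # Cooperative __init__ so that both mixins get initialized.
        super().__init__(*args, **kwargs)
        self._csrs = []

    def csr(self, width, access):
        request = CSRRequest(width, access)
        self._csrs.append(request)
        return request


class WishbonePeripheralMixin:
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._windows = []

    def window(self, *, addr_width, data_width):
        request = WindowRequest(addr_width, data_width)
        self._windows.append(request)
        return request


class FlashPeripheral(CSRPeripheralMixin, WishbonePeripheralMixin):
    """A peripheral that needs both a CSR bus and a memory bus."""

    def __init__(self):
        super().__init__()
        self._program = self.csr(1, "w")
        self._buffer = self.window(addr_width=10, data_width=32)


flash = FlashPeripheral()
```

One design point this mock makes visible: with multiple inheritance, each mixin's `__init__` has to call `super().__init__()`, otherwise the second mixin in the MRO is never set up.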

Would this be acceptable?

@zignig

zignig commented Feb 20, 2020

I think that we have to be careful not to limit the structure of the CSR interface.

Having the interface glom every csr.whatever into a single csr.bus would make a Harvard-style interface difficult.

Perhaps creating a bus instance and then adding elements to that bus interface would work better.

I think @awygle's point about supporting a minimal interface without a CPU or a (Wishbone|AXI|whatever) bus is important. We should be able to make an nmigen-soc design with nothing but zero or more CSR interfaces.

@tannewt

tannewt commented Aug 10, 2020

FYI: I'm working on a library I'm calling systemonachip. Here is an example: https://github.com/tannewt/systemonachip/blob/main/systemonachip/peripheral/timer.py#L12

It is based on LambdaSoC but makes two changes:

  1. Uses data descriptors for CSR definition. These classes then change their behavior based on the bus on the instance. If it's a Record then it produces the csr.Element. If not, it reads its offset from the memory window. This allows the value to be read from the outside for use in higher level driver functions. This works with the simulator too.
  2. Pass in the bus/memory window into the constructor. This makes them an explicit input and can be used for dual-role classes.
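The two changes above can be sketched with a plain Python data descriptor. This is not the systemonachip code: `ToyCSR`, `HardwareBus`, `MemoryWindow` and `Timer` are hypothetical names, `HardwareBus` merely stands in for the Record case, and the returned tuple stands in for a csr.Element.

```python
# Hypothetical sketch of a dual-role CSR descriptor; not systemonachip code.

class HardwareBus:
    """Stand-in for the gateware-side bus (the "Record" case)."""

    def element(self, name, offset, width):
        # A real implementation would create a csr.Element here.
        return (name, offset, width)


class MemoryWindow:
    """Stand-in for driver-side access to an already-built peripheral."""

    def __init__(self):
        self._words = {}

    def read(self, offset):
        return self._words.get(offset, 0)

    def write(self, offset, value):
        self._words[offset] = value


class ToyCSR:
    """Data descriptor whose behavior depends on the owner's bus."""

    def __init__(self, offset, width):
        self.offset = offset
        self.width = width

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if isinstance(obj.bus, HardwareBus):
            return obj.bus.element(self.name, self.offset, self.width)
        return obj.bus.read(self.offset) & ((1 << self.width) - 1)

    def __set__(self, obj, value):
        if isinstance(obj.bus, HardwareBus):
            raise AttributeError("cannot assign a value on the gateware side")
        obj.bus.write(self.offset, value & ((1 << self.width) - 1))


class Timer:
    reload = ToyCSR(offset=0, width=32)
    enable = ToyCSR(offset=4, width=1)

    def __init__(self, bus):
        # The bus or memory window is an explicit constructor input.
        self.bus = bus


driver = Timer(MemoryWindow())
driver.reload = 0x1_0000_0005   # truncated to 32 bits on write
gateware = Timer(HardwareBus())
```

The same `Timer` class then serves both roles: `driver.reload` reads a value through the memory window, while `gateware.reload` yields the element description.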

@jfng
Member Author

jfng commented Jun 6, 2023

This issue was discussed in two IRC meetings three years ago, but I forgot to summarize their conclusions.

20/07/20 : https://freenode.irclog.whitequark.org/nmigen/2020-07-20#1595274233-1595276561;

There is consensus for Approach A. PeripheralInfo must be modified to hold the memory map of every bus interface of a peripheral.

While a bus-agnostic API (consisting of memory ports and CSR elements) could automate compatibility with multiple bus protocols, some performance-critical features such as bursts would be hard to abstract over, if not impossible. Feature support would be limited to a common denominator.

27/07/20 : https://freenode.irclog.whitequark.org/nmigen/2020-07-27#1595876880-1595885191;

Consider a hypothetical flash controller peripheral. It has a memory and a CSR element with a "program" bit. Setting this bit has the side effect of programming the flash storage with the contents of the memory.

Without further assumptions, this interaction is susceptible to data hazards, regardless of how many bus interfaces the peripheral has. Writes may be reordered such that the "program" bit is set before the last word of data reaches its destination.

Memory accesses may be delayed, combined or reordered at every step between the initiator, cache hierarchy, interconnect, and the peripheral:

  • If the initiator is a CPU, changes to memory ordering can be made by both the compiler and the CPU.
  • Memory-like regions would likely be cached by the initiator; writes may be delayed or combined before becoming bus transactions.
  • The interconnect topology, buffering primitives, and arbitration may introduce latencies. These are problematic if the peripheral has multiple bus interfaces.

The detection of memory reorderings from the compiler or the CPU is outside the scope of amaranth-soc. Therefore, adding synchronization primitives to the interconnect or peripheral isn't enough to mitigate them.

To be effective, synchronization needs to be implemented end-to-end. In such cases, the BSP generated by amaranth-soc should provide constraints to the compiler and the CPU's memory controller.
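The flash controller hazard and the effect of end-to-end synchronization can be illustrated with a small cycle-based model. This is a sketch under stated assumptions, not amaranth-soc code: the latencies are made up, writes arriving in the same cycle are applied in issue order, and the `wait_for_data` flag stands in for whatever barrier the BSP would actually provide. Compiler and CPU reordering are not modeled; only interconnect latency is.

```python
# Toy model of the "program bit" hazard. All names and latencies are
# assumptions chosen for illustration.

def program_flash(data, data_latency, csr_latency, wait_for_data):
    """Return the memory contents the peripheral sees when "program" is set."""
    events = []   # (arrival cycle, issue cycle, kind, index, value)
    cycle = 0

    # The initiator issues one data write per cycle.
    for index, value in enumerate(data):
        events.append((cycle + data_latency, cycle, "data", index, value))
        cycle += 1

    if wait_for_data and events:
        # End-to-end synchronization: hold back the "program" write until
        # the last data write has reached the peripheral.
        last_arrival = max(event[0] for event in events)
        cycle = max(cycle, last_arrival + 1)

    events.append((cycle + csr_latency, cycle, "program", None, None))

    memory = [0] * len(data)
    programmed = None
    # Apply writes in arrival order; ties are resolved in issue order.
    for _arrival, _issued, kind, index, value in sorted(
            events, key=lambda event: (event[0], event[1])):
        if kind == "data":
            memory[index] = value
        else:
            programmed = list(memory)
    return programmed


words = [0xA, 0xB, 0xC, 0xD]
hazard = program_flash(words, data_latency=3, csr_latency=1, wait_for_data=False)
safe = program_flash(words, data_latency=3, csr_latency=1, wait_for_data=True)
```

With a slower data path, `hazard` is missing the last word because the "program" write overtakes it, while `safe` contains all four words. With equal latencies on both paths the hazard does not appear in this model, which matches the suggestion earlier in the thread to match the pipelining delays of the two ports.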

@whitequark
Member

Thanks for summarizing this, JF! All of this makes sense to me.


5 participants