diff --git a/contributors.adoc b/contributors.adoc index 789925e5..3ab4f0fa 100644 --- a/contributors.adoc +++ b/contributors.adoc @@ -4,3 +4,4 @@ This RISC-V specification has been contributed to directly or indirectly by (in [%hardbreaks] Aaron Durbin, Allen Baum, Anup Patel, Daniel Gracia PĂ©rez, David Kruckemyer, Greg Favor, Ahmad Fawal, Guerney D Hunt, John Hauser, Josh Scheid, Matt Evans, Manuel Rodriguez, Nick Kossifidis, Paul Donahue, Paul Walmsley, Perrine Peresse, Philipp Tomsich, Rieul Ducousso, Scott Nelson, Siqi Zhao, Sunil V.L, Tomasz Jeznach, Vassilis Papaefstathiou, Vedvyas Shanbhogue + diff --git a/header.adoc b/header.adoc index 8c735d2c..0734e15c 100644 --- a/header.adoc +++ b/header.adoc @@ -22,8 +22,9 @@ :lang: en :listing-caption: Listing :sectnums: +:sectnumlevels: 5 :toc: left -:toclevels: 4 +:toclevels: 5 :source-highlighter: pygments ifdef::backend-pdf[] :source-highlighter: coderay @@ -56,6 +57,7 @@ Copyright 2023 by RISC-V International. [preface] include::contributors.adoc[] +include::iommu_preface.adoc[] include::iommu_intro.adoc[] include::iommu_data_structures.adoc[] include::iommu_in_memory_queues.adoc[] diff --git a/iommu.bib b/iommu.bib index 55e462c3..49ef1d72 100644 --- a/iommu.bib +++ b/iommu.bib @@ -17,3 +17,11 @@ @electronic{AIA title = {RISC-V Advanced Interrupt Architecture}, url = {https://github.com/riscv/riscv-aia} } +@electronic{CFI, + title = {RISC-V Shadow Stacks and Landing Pads}, + url = {https://github.com/riscv/riscv-cfi} +} +@electronic{PR243, + title = {Clarification updates to IOMMU v1.0.0}, + url = {https://github.com/riscv-non-isa/riscv-iommu/pull/243/commits} +} diff --git a/iommu_data_structures.adoc b/iommu_data_structures.adoc index 68d926eb..dab992bc 100644 --- a/iommu_data_structures.adoc +++ b/iommu_data_structures.adoc @@ -197,6 +197,8 @@ image::ddt-base.svg[width=800,height=400] //ddtp--->+--+ +->+--+ +->+--+ ddtp--->+--+ +->+--+ ddtp--->+--+ //.... 
+<<< + ==== Non-leaf DDT entry A valid (`V==1`) non-leaf DDT entry provides the PPN of the next level DDT. @@ -374,6 +376,8 @@ that supports multiple process contexts and thus generates a valid `process_id` with its memory accesses. For PCIe, for example, if the request has a PASID then the PASID is used as the `process_id`. +<<< + When `PDTV` is 1, the `DPE` bit may set to 1 to enable the use of 0 as the default value of `process_id` for translating requests without a valid `process_id`. When `PDTV` is 0, the `DPE` bit is reserved for future standard @@ -544,6 +548,8 @@ address. | 9-15 | -- | Reserved for standard use. |=== +<<< + When `DC.tc.PDTV` is 1, the `DC.fsc` field holds the process-directory table pointer (`pdtp`). When the device supports multiple process contexts, selected by the `process_id`, the PDT is used to determine the first-stage page table and @@ -906,6 +912,8 @@ translation and protection process. When address translation caches (<>) are implemented, the translation process may use the `GSCID` and `PSCID` to associate the cached translations with their address spaces. +<<< + The process to translate an `IOVA` is as follows: @@ -991,7 +999,11 @@ The process to translate an `IOVA` is as follows: . Translation process is complete When checking the `U` bit in a second-stage PTE, the transaction is treated as -not requesting supervisor privilege. +not requesting supervisor privilege. The `pte.xwr=010` encoding, as specified by +the Zicfiss cite:[CFI] extension for the Shadow Stack page type in single-stage +and VS-stage page tables, remains a reserved encoding for IO transactions. + +<<< When the translation process reports a fault, and the request is an Untranslated request or a Translated request, the IOMMU requests the IO bridge to abort the @@ -1151,8 +1163,8 @@ file and translating the address using the MSI page table is as follows: process are equivalent to that of a regular RISC-V second-stage PTE with `R`=`W`=`U`=1 and `X`=0. 
Similar to a second-stage PTE, when checking the `U` bit, the transaction is treated as not requesting supervisor privilege. -. If the transaction is an Untranslated or Translated read-for-execute then stop - and report "Instruction access fault" (cause = 1). +.. If the transaction is an Untranslated or Translated read-for-execute then stop + and report "Instruction access fault" (cause = 1). . MSI address translation process is complete. [NOTE] @@ -1251,13 +1263,18 @@ no-write not requested and no write permission; no read permission) then a Success response is returned with the denied permission (R, W or X) set to 0 and the other permission bits set to the value determined from the page tables. The X permission is granted only if the R permission is also -granted. Execute-only translations are not compatible with PCIe ATS as PCIe -requires read permission to be granted if the execute permission is granted. +granted and the execute permission was requested. Execute-only translations are +not compatible with PCIe ATS as PCIe requires read permission to be granted +if the execute permission is granted. + +<<< When a Success response is generated for an ATS translation request, no fault records are reported to software through the fault/event reporting mechanism, even when the response indicates no access was granted or some permissions were -denied. +denied. Conversely, when a UR or CA response is generated for an ATS translation +request, the corresponding fault is reported to software through the fault/event +reporting mechanism. If the translation request has an address determined to be an MSI address using the rules defined by the <> but the MSI PTE is configured in MRIF @@ -1346,11 +1363,12 @@ of "Page Request". a "Page Request Group Response" message to the device. When the IOMMU generates the response, the status field of the response depends -on the cause of the error. +on the cause of the error. 
If a fault condition prevents locating a valid device +context then the `PRPR` value assumed is 0. The status is set to Response Failure if the following faults are encountered: -* `ddtp.iommu_mode` is `Off` +* `ddtp.iommu_mode` is `Off` (cause = 256) * DDT entry load access fault (cause = 257) * DDT entry misconfigured (cause = 259) * DDT entry not valid (cause = 258) @@ -1359,8 +1377,9 @@ The status is set to Response Failure if the following faults are encountered: The status is set to Invalid Request if the following faults are encountered: -* `ddtp.iommu_mode` is `Bare` -* `EN_PRI` is set to 0 +* `device_id` is wider than supported by the IOMMU mode (cause = 260) +* `ddtp.iommu_mode` is `Bare` (cause = 260) +* `EN_PRI` is set to 0 (cause = 260) The status is set to Success if no other faults were encountered but the "Page Request" could not be queued due to the page-request queue being full diff --git a/iommu_hw_guidelines.adoc b/iommu_hw_guidelines.adoc index 3d359905..44730556 100644 --- a/iommu_hw_guidelines.adoc +++ b/iommu_hw_guidelines.adoc @@ -46,6 +46,8 @@ indicate this condition. For AXI, for example, the completion status is provided by SLVERR on RRESP (Read Data channel). For PCIe, for example, the completion status field may be set to "Unsupported Request" (UR) or "Completer Abort" (CA). +<<< + [[RAS]] === Reliability, Availability, and Serviceability (RAS) The IOMMU may support a RAS architecture that specifies the methods for diff --git a/iommu_in_memory_queues.adoc b/iommu_in_memory_queues.adoc index 6fdf4c4c..0fcabfcb 100644 --- a/iommu_in_memory_queues.adoc +++ b/iommu_in_memory_queues.adoc @@ -180,7 +180,17 @@ operand is valid. Setting `PSCV` to 1 is allowed only for `IOTINVAL.VMA`. The the translations associated with the host (i.e. those where the second-stage is Bare) are operated on. When `GV` is 0, the `GSCID` operand is ignored. When `AV` is 0, the `ADDR` operand is ignored. When `PSCV` operand is 0, the -`PSCID` operand is ignored. 
+`PSCID` operand is ignored. When the `AV` operand is set to 1, if the `ADDR` +operand specifies an invalid address, the command may or may not perform any +invalidations. + +[NOTE] +==== +When an invalid address is specified, an implementation may either complete the +command with no effect or may complete the command using an alternate, yet +`UNSPECIFIED`, legal value for the address. Note that entries may generally be +invalidated from the address translation cache at any time. +==== `IOTINVAL.VMA` ensures that previous stores made to the first-stage page tables by the harts are observed by the IOMMU before all subsequent implicit @@ -189,8 +199,8 @@ reads from IOMMU to the corresponding first-stage page tables. [[IVMA]] .`IOTINVAL.VMA` operands and operations -[width=75%] -[%header, cols="2,2,3,20"] +[width=100%] +[%header, cols="2,2,3,30"] |=== |`GV`|`AV`|`PSCV`| Operation |0 |0 |0 | Invalidates all address-translation cache entries, including @@ -234,8 +244,8 @@ is illegal. [[IGVMA]] .`IOTINVAL.GVMA` operands and operations -[width=75%] -[%header, cols="2,2,20"] +[width=100%] +[%header, cols="2,2,30"] |=== | `GV` | `AV` | Operation | 0 | ignored| Invalidates information cached from any level of the @@ -245,8 +255,8 @@ is illegal. identified by the `GSCID` operand. | 1 | 1 | Invalidates information cached from leaf second-stage page table entries corresponding to the guest-physical-address in - `ADDR` operand, for only for VM address spaces identified - `GSCID` operand. + `ADDR` operand, but only for VM address spaces identified + by the `GSCID` operand. |=== [NOTE] @@ -270,6 +280,10 @@ match the `GSCID` argument, regardless of the address argument. Simpler implementations may ignore the operand of `IOTINVAL.VMA` and/or `IOTINVAL.GVMA` and always perform a global invalidation of all address-translation entries. + +Some implementations may cache an identity-mapped translation for the stage of +address translation operating in `Bare` mode. 
Since these identity mappings +are invariably correct, an explicit invalidation is unnecessary. ==== [NOTE] @@ -558,6 +572,8 @@ If the `DSV` operand is 1, then a valid destination segment number is specified by the `DSEG` operand. If the `DSV` operand is 0, then the `DSEG` operand is ignored. +<<< + [NOTE] ==== A Hierarchy is a PCI Express I/O interconnect topology, wherein the diff --git a/iommu_intro.adoc b/iommu_intro.adoc index 3548b61f..7b094cb8 100644 --- a/iommu_intro.adoc +++ b/iommu_intro.adoc @@ -52,8 +52,10 @@ management complexity for DMA. Use of an identical format also allows the same page tables to be used simultaneously by both the CPU MMU and the IOMMU. Although there is no option to disable two-stage address translation, either -stage may be effectivly disabled by configuring the virtual memory scheme for -that stage to be `Bare` i.e. perfom no address translation or memory protection. +stage may be effectively disabled by configuring the virtual memory scheme for +that stage to be `Bare` i.e. perform no address translation or memory protection. + +<<< The virtual memory scheme employed by the IOMMU may be configured individually per device in the IOMMU. Devices perform DMA using an I/O virtual address (IOVA). @@ -87,7 +89,7 @@ is a VA. Two-stage address translation is in effect. The first-stage translates the VA to a GPA and the second-stage translates the GPA to a SPA. Each stage enforces the configured memory protections. Such a configuration would be typically be employed when the device control is passed-through to a virtual -machine and the Guest OS in the VM uses the first-stage addresss translation to +machine and the Guest OS in the VM uses the first-stage address translation to further constrain the memory accessed by such devices and associated privileges and memory protections. 
Comparing to a RISC-V hart, this configuration is analogous to two-stage address translation being in effect on a RISC-V hart with @@ -112,6 +114,8 @@ collection of processes that share a common virtual address space. The IOMMU may use the `GSCID` and `PSCID` to tag entries in the IOATC to avoid duplication and simplify invalidation operations. +<<< + Some devices may participate in the translation process and provide a device side ATC (DevATC) for its own memory accesses. By providing a DevATC, the device shares the translation caching responsibility and thereby reduce @@ -147,7 +151,7 @@ in the device context. === Glossary .Terms and definitions [width=90%] -[%header, cols="5,20"] +[%header, cols="5,25"] |=== | Term ^| Definition | AIA | RISC-V Advanced Interrupt Architecture cite:[AIA]. @@ -190,10 +194,6 @@ in the device context. address space of a process. The PASID value is provided in the PASID TLP prefix of the request. | PBMT | Page-Based Memory Types. -| PPN | Physical Page Number. -| PRI | Page Request Interface - a PCIe protocol that enables - devices to request OS memory manager services to make pages - resident cite:[PCI]. | PC | Process Context. | PCIe | Peripheral Component Interconnect Express bus standard cite:[PCI]. @@ -293,29 +293,9 @@ in <> the OS may configure the IOMMU with a page table to translate the IOVA and thereby limit the addresses that may be accessed to those allowed by the page table. -Legacy 32-bit devices cannot access the memory above 4 GiB. The IOMMU, through -its address remapping capability, offers a simple mechanism for the device to -directly access any address in the system (with appropriate access permission). -Without an IOMMU, the OS must resort to copying data through buffers (also -known as bounce buffers) allocated in memory below 4 GiB. In this scenario the -IOMMU improves the system performance. 
- -The IOMMU can be useful to perform scatter/gather DMA as it permits to allocate -large regions of memory for I/O without the need for all of the memory to be -contiguous. A contiguous virtual address range can map to such fragmented -physical addresses and the device programmed with the virtual address range. - -The IOMMU can be used to support shared virtual addressing which is the ability -to share a process address space with devices. The virtual addresses used for -DMA are then translated by the IOMMU to an SPA. - -When the IOMMU is used by a non-virtualized OS, the first-stage suffices to -provide the required address translation and protection function and the -second-stage may be set to Bare. - [[fig:device-isolation]] .Device isolation in non-virtualized OS -image::non-virt-OS.svg[width=300,height=300] +image::non-virt-OS.svg[width=300,height=300, align="center"] //["ditaa",shadows=false, separation=false, fontsize: 16] //.... @@ -340,6 +320,26 @@ image::non-virt-OS.svg[width=300,height=300] // +--------+ //.... +Legacy 32-bit devices cannot access the memory above 4 GiB. The IOMMU, through +its address remapping capability, offers a simple mechanism for the device to +directly access any address in the system (with appropriate access permission). +Without an IOMMU, the OS must resort to copying data through buffers (also +known as bounce buffers) allocated in memory below 4 GiB. In this scenario the +IOMMU improves the system performance. + +The IOMMU can be useful to perform scatter/gather DMA as it permits to allocate +large regions of memory for I/O without the need for all of the memory to be +contiguous. A contiguous virtual address range can map to such fragmented +physical addresses and the device programmed with the virtual address range. + +The IOMMU can be used to support shared virtual addressing which is the ability +to share a process address space with devices. The virtual addresses used for +DMA are then translated by the IOMMU to an SPA. 
+ +When the IOMMU is used by a non-virtualized OS, the first-stage suffices to +provide the required address translation and protection function and the +second-stage may be set to Bare. + ==== Hypervisor IOMMU makes it possible for a guest operating system, running in a virtual @@ -360,7 +360,7 @@ and from D2 to VM-2 associated memory. [[fig:dma-translation-direct-device-assignment]] .DMA translation to enable direct device assignment -image::hypervisor.svg[width=300,height=300] +image::hypervisor.svg[width=300,height=300, align="center"] //["ditaa",shadows=false, separation=false, fontsize: 16] //.... //+----------------+ +----------------+ @@ -394,7 +394,7 @@ address, the same as supported by regular RISC-V page-based address translation. [[MSI_REDIR]] .MSI address translation to direct guest programmed MSI to IMSIC guest interrupt files -image::msi-imsic.svg[width=500,height=400] +image::msi-imsic.svg[width=500,height=400, align="center"] //["ditaa",shadows=false, separation=false, font=courier, fontsize: 16] //.... // +-----------------------+ @@ -440,21 +440,9 @@ hypervisor. <> illustrates the concept. -The IOMMU is configured to perform address translation using a first-stage -and second-stage page table for device D1. The second-stage is typically used by -the hypervisor to translate GPA to SPA and limit the device D1 to memory -associated with VM-1. The first-stage is typically configured by the Guest OS to -translate a VA to a GPA and contain device D1 access to a subset of VM-1 memory. - -For device D2 only the second-stage is active and the first-stage is set to Bare. - -The host OS or hypervisor may also retain a device, such as D3, for its own use. -The first-stage suffices to provide the required address translation and -protection function for device D3 and the second-stage is set to Bare. 
- [[fig:iommu-for-guest-os]] .Address translation in IOMMU for Guest OS -image::guest-OS.svg[width=500,height=400] +image::guest-OS.svg[width=500,height=400, align="center"] //["ditaa",shadows=false, separation=false, fontsize: 16] //.... //+---------------------------------------------------+ @@ -484,6 +472,18 @@ image::guest-OS.svg[width=500,height=400] // +-----------+ +-----------+ +-----------+ //.... +The IOMMU is configured to perform address translation using a first-stage +and second-stage page table for device D1. The second-stage is typically used by +the hypervisor to translate GPA to SPA and limit the device D1 to memory +associated with VM-1. The first-stage is typically configured by the Guest OS to +translate a VA to a GPA and contain device D1 access to a subset of VM-1 memory. + +For device D2 only the second-stage is active and the first-stage is set to Bare. + +The host OS or hypervisor may also retain a device, such as D3, for its own use. +The first-stage suffices to provide the required address translation and +protection function for device D3 and the second-stage is set to Bare. + === Placement and data flow <> shows an example of a typical system on a chip @@ -496,6 +496,10 @@ for example, the Root Port is a PCIe port that maps a portion of a hierarchy through an associated virtual PCI-PCI bridge and maps the PCIe IO protocol transactions to the system interconnect transactions. +[[fig:example-soc-with-iommu]] +.Example of IOMMUs integration in SoC. +image::placement.svg[width=800, align="center"] + The first IOMMU instance, IOMMU 0 (associated with the IO Bridge 0), interfaces a Root Port to the system fabric/interconnect. One or more endpoint devices are interfaced to the SoC through this Root Port. In the case of PCIe, the Root Port @@ -517,10 +521,6 @@ to translate the IOVA to a Supervisor Physical Addresses (SPA). The IOMMU is not invoked for outbound transactions. 
-[[fig:example-soc-with-iommu]] -.Example of IOMMUs integration in SoC. -image::placement.svg[width=800] - The IOMMU is invoked by the IO Bridge for address translation and protection for inbound transactions. The data associated with the inbound transactions is not processed by the IOMMU. The IOMMU behaves like a look-aside IP to the IO Bridge @@ -583,16 +583,16 @@ and has several interfaces (see <>): .. To receive "Page Request" and "Stop Marker" messages from the endpoints and to send "Page Request Group Response" messages to the endpoints. +[[fig:iommu-interfaces]] +.IOMMU interfaces. +image::interfaces.svg[width=800, align="center"] + The interfaces related to recording an incoming MSI in a memory-resident interrupt file (MRIF) (See RISC-V Advanced Interrupt Architecture cite:[AIA]) are implementation-specific. The partitioning of responsibility between the IOMMU and the IO bridge for recording the incoming MSI in an MRIF and generating the associated _notice_ MSI are implementation-specific. -[[fig:iommu-interfaces]] -.IOMMU interfaces. -image::interfaces.svg[width=800] - Similar to the RISC-V harts, physical memory attributes (PMA) and physical memory protection (PMP) checks must be completed on all inbound IO transactions even when the IOMMU is in bypass (`Bare` mode). The placement and integration of @@ -643,6 +643,8 @@ Other implementation-specific methods in the IO bridge may be provided to perform such authentication. ==== +<<< + === IOMMU features Version 1.0 of the RISC-V IOMMU specification supports the following features: diff --git a/iommu_preface.adoc b/iommu_preface.adoc new file mode 100644 index 00000000..52ebe53e --- /dev/null +++ b/iommu_preface.adoc @@ -0,0 +1,34 @@ +== Preface + +[.big]*_Preface to Version 1.0.1_* + +The following backward-compatible changes, comprising a set of clarifications +and corrections, have been made since version 1.0.0: + +* A set of typographic errors and editorial updates were made. 
+* Clarified that identity-mapped translations cached for an address-translation + stage operating in Bare mode do not require explicit invalidation. +* Clarified that memory faults encountered by commands also set the `cqmf` flag. +* Clarified that values tested by the algorithm in the SW Guidelines section + are those before any modifications made by the algorithm. +* Included SW guidelines for modifying non-leaf PDT entries. +* Clarified the behavior for in-flight transactions observed at the time of `ddtp` + write operations. +* Clarified the behavior when `IOTINVAL` is invoked with an invalid address. +* Stated that faults leading to UR/CA ATS responses are reported in the Fault Queue. +* Added a detailed description of the `capabilities.PAS` field. +* Included software guidelines for changing IOMMU modes and provided + RV32-specific guidelines for programming tr_req_ctl and HPM counters. +* Stated that the PCIe specification requires granting execute permission + in translation responses only if explicitly requested. +* Clarified the handling of hardware implementations that internally split + 8-byte transactions. +* Noted that the shadow stack encoding introduced by Zicfiss remains reserved + for IO transactions. +* Listed the fault codes reported for faults detected by Page Request. + +These changes were made through PR#243 cite:[PR243]. + +[.big]*_Preface to Version 1.0.0_* + +* Ratified version of the RISC-V IOMMU Architecture Specification. diff --git a/iommu_ref_model/libiommu/src/iommu_command_queue.c b/iommu_ref_model/libiommu/src/iommu_command_queue.c index 082fad81..d25d6565 100644 --- a/iommu_ref_model/libiommu/src/iommu_command_queue.c +++ b/iommu_ref_model/libiommu/src/iommu_command_queue.c @@ -129,6 +129,10 @@ process_commands( // the DID operand must not be wider than that supported by // the ddtp.iommu_mode.
if ( command.iodir.dv != 1 ) goto command_illegal; + // When DV operand is 1, the value of the DID operand must not + // be wider than that supported by the ddtp.iommu_mode. + if ( command.iodir.did & ~g_max_devid_mask ) + goto command_illegal; // The PID operand of IODIR.INVAL_PDT must not be wider than // the width supported by the IOMMU (see Section 5.3) if ( g_reg_file.capabilities.pd20 == 0 && diff --git a/iommu_ref_model/test/test_app.c b/iommu_ref_model/test/test_app.c index 597a6496..9c97dbb1 100644 --- a/iommu_ref_model/test/test_app.c +++ b/iommu_ref_model/test/test_app.c @@ -2246,6 +2246,14 @@ main(void) { write_memory((char *)&cmd, ((cqb.ppn * PAGESIZE) | (cqh.index * 16)), 16); write_register(CQCSR_OFFSET, 4, cqcsr.raw); + // Invalidate PC - DID must not be too wide + g_max_devid_mask = 0x3F; + process_commands(); + cqcsr.raw = read_register(CQCSR_OFFSET, 4); + fail_if( ( cqcsr.cmd_ill != 1 ) ); + g_max_devid_mask = 0xFFFFFF; + write_register(CQCSR_OFFSET, 4, cqcsr.raw); + // Process the fixed up command process_commands(); diff --git a/iommu_registers.adoc b/iommu_registers.adoc index a990daf5..193f6061 100644 --- a/iommu_registers.adoc +++ b/iommu_registers.adoc @@ -9,15 +9,15 @@ the size of the access, or if the access spans multiple registers, or if the size of the access is not 4 bytes or 8 bytes, is `UNSPECIFIED`. A 4 byte access to an IOMMU register must be single-copy atomic. Whether an 8 byte access to an IOMMU register is single-copy atomic is `UNSPECIFIED`, and such an access may -appear, internally to the IOMMU, as if two separate 4 byte accesses were -performed. +appear, internally to the IOMMU, as if two separate 4 byte accesses -- first to +the high half and second to the low half -- were performed. 
[NOTE] ==== -The 8 byte IOMMU registers are defined in such a way that software can perform -two individual 4 byte accesses, or hardware can perform two independent 4 byte -transactions resulting from an 8 byte access, to the high and low halves of the -register as long as the register semantics, with regards to side-effects, are +The 8-byte IOMMU registers are defined in such a way that software can perform +two individual 4-byte accesses, or hardware can perform two independent 4-byte +transactions resulting from an 8-byte access, to the high and low halves of the +register, in that order, as long as the register semantics, with regard to side-effects, are respected between the two software accesses, or two hardware transactions, respectively. ==== @@ -68,8 +68,10 @@ the register returns 0 and writes to that offset are ignored. CSR >> | if `capabilities.ATS==0` |84 |`ipsr` |4 |<>| No -|88 |`iocntovf` |4 |<> | if `capabilities.HPM==0` -|92 |`iocntinh` |4 |<> | if `capabilities.HPM==0` +|88 |`iocountovf` |4 |<> | if `capabilities.HPM==0` +|92 |`iocountinh` |4 |<> | if `capabilities.HPM==0` |96 |`iohpmcycles` |8 |<> | if `capabilities.HPM==0` |104 |`iohpmctr1-31` |248 |<> | if `capabilities.HPM==0` |352 |`iohpmevt1-31` |248 |<> | if `capabilities.HPM==0` @@ -99,6 +101,8 @@ The reset value is 0 for the following registers fields. * `tr_req_ctl.Go/Busy` * `ddtp.busy` +<<< + The reset value is 0 for the following registers. * `ipsr` @@ -234,6 +238,9 @@ must be supported. IOMMU implementations must support the Svnapot standard extension for NAPOT Translation Contiguity. +The physical address space addressable by the IOMMU ranges from 0 to +stem:[2^{capabilities.PAS} - 1]. + [NOTE] ==== Hypervisor may provide an SW emulated IOMMU to allow the guest to manage @@ -356,10 +363,10 @@ are enabled (i.e. `cqcsr.cqon/cqen == 1`, `fqcsr.fqon/cqen == 1`, or !5-13 ! reserved ! Reserved for standard use. !14-15 ! custom ! Designated for custom use. 
!=== -|4 |`busy` |RO | A write to `ddtp` may require the IOMMU to +|4 |`busy` |RO | A write to `ddtp.iommu_mode` may require the IOMMU to perform many operations that may not occur synchronously to the write. When a write is - observed by the `ddtp`, the `busy` bit is set + observed by the `ddtp.iommu_mode`, the `busy` bit is set to 1. When the `busy` bit is 1, behavior of additional writes to the `ddtp` is `UNSPECIFIED`. Some implementations @@ -370,7 +377,7 @@ are enabled (i.e. `cqcsr.cqon/cqen == 1`, `fqcsr.fqon/cqen == 1`, or + If the `busy` bit reads 0 then the IOMMU has completed the operations associated with the - previous write to `ddtp`. + + previous write to `ddtp.iommu_mode`. + + An IOMMU that can complete these operations synchronously may hard-wire this bit to 0. @@ -392,18 +399,24 @@ subset of directory-table levels and device-context widths. At a minimum one of the modes must be supported. When the `iommu_mode` field value is changed to `Off` the IOMMU guarantees that -in-flight transactions from devices connected to the IOMMU will be processed -with the configurations applicable to the old value of the `iommu_mode` field -and that all transactions and previous requests from devices that have already -been processed by the IOMMU be committed to a global ordering point such that -they can be observed by all RISC-V harts, devices, and IOMMUs in the platform. +in-flight transactions, observed at the time of the write to this field, from devices +connected to the IOMMU will either be processed with the configurations +applicable to the old value of the `iommu_mode` field or be aborted +(<>). It also ensures that all transactions and previous +requests from devices that have already been processed by the IOMMU are committed +to a global ordering point such that they can be observed by all RISC-V harts, +devices, and IOMMUs in the platform. Software must not change the `PPN` field +value when transitioning the `iommu_mode` to `Off`. 
The IOMMU behavior of writing `iommu_mode` to `1LVL`, `2LVL`, or `3LVL`, when the previous value of the `iommu_mode` is not `Off` or `Bare` is `UNSPECIFIED`. To change DDT levels, the IOMMU must first be transitioned to `Bare` or `Off` -state. +state. The behavior resulting from changing the `iommu_mode` to `Bare` when the +previous value of the `iommu_mode` was not `Off` is `UNSPECIFIED`. + +<<< -When an IOMMU is transitioned to `Bare` of `Off` state, the IOMMU may retain +When an IOMMU is transitioned to `Bare` or `Off` state, the IOMMU may retain information cached from in-memory data structures such as page tables, DDT, PDT, etc. Software must use suitable invalidation commands to invalidate cached entries. @@ -677,6 +690,8 @@ In RV32, only the low order 32-bits of the register (22-bit `PPN` and 5-bit `LOG2SZ-1`) need to be written. ==== +<<< + [[PQH]] === Page-request-queue head (`pqh`) @@ -779,11 +794,12 @@ status of the command-queue. generation of interrupts from command-queue when set to 1. |7:2 |reserved|WPRI | Reserved for standard use -|8 |`cqmf` |RW1C | If command-queue access leads to a memory fault then - the command-queue-memory-fault bit is set to 1 and - the command-queue stalls until this bit is cleared. - To re-enable command processing, software should - clear this bit by writing 1. +|8 |`cqmf` |RW1C | If command-queue access to fetch a command or a + memory access made by a command leads to a memory + fault, then the command-queue-memory-fault bit is set + to 1, and the command-queue stalls until this bit is + cleared. To re-enable command processing, software + should clear this bit by writing 1. |9 |`cmd_to`|RW1C | If the execution of a command leads to a timeout (e.g. a command to invalidate device ATC may timeout waiting for a completion), then the @@ -796,7 +812,7 @@ status of the command-queue. sets the `cmd_ill` bit and stops processing from the command-queue. To re-enable command processing software should clear this bit by writing 1. 
-|11 |`fence_w_ip`|RW1C | An IOMMU that supports only wire-signaled-interrupts +|11 |`fence_w_ip`|RW1C | An IOMMU that supports wire-signaled-interrupts sets the `fence_w_ip` bit to indicate completion of an `IOFENCE.C` command. To re-enable interrupts on `IOFENCE.C` completion, @@ -852,6 +868,8 @@ to wait for all previous commands to be committed, if so desired, before turning off the command-queue. ==== +<<< + [[FQCSR]] === Fault queue CSR (`fqcsr`) @@ -967,7 +985,7 @@ status of the page-request-queue. |0 |`pqen` |RW | The page-request-enable bit enables the page-request-queue when set to 1. + + - Changing `pqen` from 0 to 1, sets the `pqh` + Changing `pqen` from 0 to 1, sets the `pqt` register and the `pqcsr` bits `pqmf` and `pqof` to 0. The page-request-queue may take some time to be active following setting the `pqen` to 1. @@ -1117,9 +1135,11 @@ interrupt-pending bit. |=== If a bit in `ipsr` is 1 then a write of 1 to the bit transitions the bit from 1->0. -If the conditions to set that bit are still present (See <>) or if +If the conditions to set that bit are still present (See <>) or if they occur after the bit is cleared then that bit transitions again from 0->1. +<<< + [[OVF]] === Performance-monitoring counter overflow status (`iocountovf`) The performance-monitoring counter overflow status is a 32-bit read-only @@ -1182,6 +1202,11 @@ When the `iohpmcycles` counter is not needed, it is desirable to conditionally inhibit it to reduce energy consumption. Providing a single register to inhibit all counters allows a) one or more counters to be atomically programmed with events to count b) one or more counters to be sampled atomically. + +To initialize an event counter or the cycles counter to a desired value, it +should be first inhibited if it is enabled to count. This measure ensures that +it does not count during the update process. The inhibition should be removed +after the register has been programmed with the desired value. 
==== [[CYC]] @@ -1411,6 +1436,8 @@ during the update and the inhibit removed after the register has been programmed with the desired value. ==== +<<< + [NOTE] ==== If `capabilities.HPM` is 1 then a minimum of one programmable event counter @@ -1510,6 +1537,11 @@ translation-request interface for debug. This register is present when this translation request. |=== +[NOTE] +==== +In RV32, the high half of the register should be written first, followed by the +low half, which includes the `Go/Busy` bit, to initiate a translation. +==== [[TRR_RSP]] === Translation-response (`tr_response`) @@ -1663,6 +1695,8 @@ to determine a MSI table entry. Each MSI table entry for interrupt vector `x` has three registers `msi_addr_x`, `msi_data_x`, and `msi_vec_ctl_x`. These registers are hardwired to 0 if `capabilities.IGS == WSI`. +<<< + If an access fault is detected on a MSI write using `msi_addr_x`, then the IOMMU reports a "IOMMU MSI write access fault" (cause 273) fault, with `TTYP` set to 0 and `iotval` set to the value of `msi_addr_x`. diff --git a/iommu_sw_guidelines.adoc b/iommu_sw_guidelines.adoc index 8fe90c49..f91098ec 100644 --- a/iommu_sw_guidelines.adoc +++ b/iommu_sw_guidelines.adoc @@ -138,21 +138,24 @@ previous read and/or write requests, that have already been processed by the IOMMU, be committed to a global ordering point as part of the `IOFENCE.C` command. +In subsequent sections, when an algorithm step tests values in the in-memory +data structures to determine the type of invalidation operation to perform, the +data values tested are the old values, i.e., the values before a change is made.
+ [[DC_CHANGE]] ==== Changing device directory table entry If software changes a leaf-level DDT entry (i.e, a device context (`DC`), of device with `device_id = D`) then the following invalidations must be performed: * `IODIR.INVAL_DDT` with `DV=1` and `DID=D` -* If `DC.tc.PDTV==1` then `IODIR.INVAL_PDT` with `DV=1`, `PV=0`, and `DID=D` * If `DC.iohgatp.MODE != Bare` ** `IOTINVAL.VMA` with `GV=1`, `AV=PSCV=0`, and `GSCID=DC.iohgatp.GSCID` ** `IOTINVAL.GVMA` with `GV=1`, `AV=0`, and `GSCID=DC.iohgatp.GSCID` * else -** If `DC.tc.PDTV==1 || DC.tc.PDTV == 0 && DC.fsc.MODE == Bare` +** If `DC.tc.PDTV==1` *** `IOTINVAL.VMA` with `GV=AV=PSCV=0` -** else +** else if `DC.fsc.MODE != Bare` *** `IOTINVAL.VMA` with `GV=AV=0` and `PSCV=1`, and `PSCID=DC.ta.PSCID` If software changes a non-leaf-level DDT entry the following invalidations @@ -170,13 +173,18 @@ If software changes a leaf-level PDT entry (i.e, a process context (`PC`), for `device_id=D` and `process_id=P`) then the following invalidations must be performed: -* `IODIR.INVAL_PDT` with `DV=1`, `PV=1`, `DID=D` and `PID=P` +* `IODIR.INVAL_PDT` with `DV=1`, `DID=D` and `PID=P` * If `DC.iohgatp.MODE != Bare` ** `IOTINVAL.VMA` with `GV=1`, `AV=0`, `PV=1`, `GSCID=DC.iohgatp.GSCID`, and `PSCID=PC.PSCID` * else ** `IOTINVAL.VMA` with `GV=0`, `AV=0`, `PV=1`, and `PSCID=PC.PSCID` +If software changes a non-leaf-level PDT entry the following invalidations +must be performed: + +* `IODIR.INVAL_DDT` with `DV=1` and `DID=D` + Between a change to the PDT entry and when an invalidation command to invalidate the cached entry is processed by the IOMMU, the IOMMU may use the old value or the new value of the entry. @@ -302,6 +310,8 @@ the DevATC may be satisfied by the IOMMU from the IOATC, to ensure correct operation software must first invalidate the IOATC before sending invalidations to the DevATC. +<<< + ==== Caching invalid entries This specification does not allow the caching of first/second-stage PTEs whose