Merge pull request openhwgroup#1020 from pascalgouedo/dev_dd_pgo_doc
User Manual final updates.
pascalgouedo authored Jul 3, 2024
2 parents 8f24b1d + 4e7b2ec commit c998590
Showing 5 changed files with 105 additions and 51 deletions.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -44,7 +44,7 @@
# The short X.Y version
version = u''
# The full version, including alpha/beta/rc tags
release = u'v1.8.0'
release = u'v1.8.3'


# -- General configuration ---------------------------------------------------
2 changes: 1 addition & 1 deletion docs/source/corev_hw_loop.rst
@@ -181,7 +181,7 @@ If ebreak is used to enter in Debug Mode (:ref:`ebreak_scenario_2`) and put at t

When ebreak instruction is used as Software Breakpoint by a debugger when in debug mode and is placed at the last instruction location of an HWLoop in instruction memory, no special management is foreseen.
When executing the Software Breakpoint/ebreak instruction, control is given back to the debugger which will manage the different cases.
For instance in Single-Step case, original instruction is put back in instruction memory, a Single-Step command is executed on this last instruction (with desgin updating PC and lpcountX to correct values) and Software Breakpoint/ebreak is put back by the debugger in memory.
For instance in Single-Step case, original instruction is put back in instruction memory, a Single-Step command is executed on this last instruction (with design updating PC and lpcountX to correct values) and Software Breakpoint/ebreak is put back by the debugger in memory.

When ecall instruction is used by a debugger to execute System Calls and is placed at the last instruction location of an HWLoop in instruction memory, debugger ecall handler in debug program should do the same than described above for application case.

8 changes: 5 additions & 3 deletions docs/source/fpu.rst
@@ -163,9 +163,6 @@ host the floating-point operands.

The latency of the individual instructions are explained in :ref:`instructions_latency_table` table.

To allow FPU unit to be put in sleep mode at the same time the core is doing so, a clock gating cell is instantiated in ``cv32e40p_top`` top level module as well
with its enable signal being inverted ``core_sleep_o`` core output.

FP CSR
------

@@ -175,6 +172,11 @@ exceptions that occurred since it was last reset and the rounding mode.
:ref:`csr-fflags` and :ref:`csr-frm` can be accessed directly or via :ref:`csr-fcsr` which is mapped to
those two registers.

FPU Sleeping mode
-----------------

To reduce power consumption, the FPU clock is stopped when no FP instruction is being executed. To do so, a dedicated clock gating cell is instantiated in the ``cv32e40p_top`` top-level module, with its enable signal depending on both the ``apu_req_o`` and ``apu_busy_o`` core outputs.
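
The gating behaviour described above can be illustrated with a small behavioural model. This is only a sketch: the text states that the enable depends on both ``apu_req_o`` and ``apu_busy_o`` but does not give the exact expression, so the logical OR used here is an assumption, and the function names and the cycle trace are hypothetical.

```python
def fpu_clock_enable(apu_req_o: bool, apu_busy_o: bool) -> bool:
    """Assumed enable: the FPU clock runs while a request is issued or the FPU is busy."""
    return apu_req_o or apu_busy_o


def count_active_cycles(trace):
    """Count the cycles in which the gated FPU clock would be running.

    `trace` is an iterable of (apu_req_o, apu_busy_o) pairs, one per core cycle.
    """
    return sum(1 for req, busy in trace if fpu_clock_enable(req, busy))


# Illustrative trace (not measured data): idle, request, busy, busy, idle, idle
trace = [
    (False, False),
    (True, False),
    (False, True),
    (False, True),
    (False, False),
    (False, False),
]
active = count_active_cycles(trace)
print(f"FPU clock running in {active} of {len(trace)} cycles")
```

Under this assumption the FPU clock toggles only in the cycles where an FP operation is requested or in flight, which is where the power saving comes from.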

Reminder for programmers
------------------------

56 changes: 49 additions & 7 deletions docs/source/integration.rst
@@ -248,13 +248,55 @@ The ``constraints/cv32e40p_core.sdc`` file provides an example of synthesis cons
ASIC Synthesis
^^^^^^^^^^^^^^

ASIC synthesis is supported for CV32E40P. The whole design is completely
synchronous and uses positive-edge triggered flip-flops. The
core occupies an area of about XX kGE.
With the FPU, the area increases to about XX kGE (XX kGE
FPU, XX kGE additional register file). A technology specific implementation
of a clock gating cell as described in :ref:`clock-gating-cell` needs to
be provided.
ASIC synthesis is supported for CV32E40P. The whole design is completely synchronous and uses positive-edge triggered flip-flops.

To give some size numbers, the core was synthesized at 100 MHz with a 32 KB memory connected to each of its OBI interfaces. DFT scan chains were implemented and the design went through full back-end implementation, including clock tree synthesis.
No memory BIST was inserted and no scan compression was used for DFT.

A technology-specific implementation of a clock gating cell, as described in :ref:`clock-gating-cell`, has been provided.

The following table gives the CV32E40P size in kilo-gates (KG), counted in equivalent 2-input NAND gates with X1 drive, for different top-level parameter settings (COREV_CLUSTER = 0 in all cases).

.. table:: CV32E40P size
  :name: CV32E40P size
  :widths: 45 45 10
  :class: no-scrollbar-table

  +-----------------------+--------------------+--------+
  | **Configuration**     | **Top Parameters** | **KG** |
  +=======================+====================+========+
  | V1                    | COREV_PULP = 0     | 40     |
  |                       |                    |        |
  |                       | FPU = 0            |        |
  |                       |                    |        |
  |                       | ZFINX = 0          |        |
  +-----------------------+--------------------+--------+
  | V2 PULP               | COREV_PULP = 1     | 57     |
  |                       |                    |        |
  |                       | FPU = 0            |        |
  |                       |                    |        |
  |                       | ZFINX = 0          |        |
  +-----------------------+--------------------+--------+
  | V2 PULP & FPU         | COREV_PULP = 1     | 93     |
  |                       |                    |        |
  |                       | FPU = 1            |        |
  |                       |                    |        |
  |                       | ZFINX = 0          |        |
  |                       |                    |        |
  |                       | FPU_ADDMUL_LAT = 0 |        |
  |                       |                    |        |
  |                       | FPU_OTHERS_LAT = 0 |        |
  +-----------------------+--------------------+--------+
  | V2 PULP & FPU & ZFINX | COREV_PULP = 1     | 77     |
  |                       |                    |        |
  |                       | FPU = 1            |        |
  |                       |                    |        |
  |                       | ZFINX = 1          |        |
  |                       |                    |        |
  |                       | FPU_ADDMUL_LAT = 0 |        |
  |                       |                    |        |
  |                       | FPU_OTHERS_LAT = 0 |        |
  +-----------------------+--------------------+--------+
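
To compare configurations, the size differences can be computed directly from the KG figures in the table. The sketch below only does arithmetic on those published numbers; the dictionary and helper names are hypothetical, and it makes no claim about which block accounts for a given difference.

```python
# KG figures copied from the "CV32E40P size" table.
SIZE_KG = {
    "V1": 40,
    "V2 PULP": 57,
    "V2 PULP & FPU": 93,
    "V2 PULP & FPU & ZFINX": 77,
}


def delta_kg(base: str, other: str) -> int:
    """Return the size difference in KG when moving from `base` to `other`."""
    return SIZE_KG[other] - SIZE_KG[base]


comparisons = [
    ("V1", "V2 PULP"),
    ("V2 PULP", "V2 PULP & FPU"),
    ("V2 PULP", "V2 PULP & FPU & ZFINX"),
    ("V2 PULP & FPU", "V2 PULP & FPU & ZFINX"),
]
for base, other in comparisons:
    d = delta_kg(base, other)
    # Report the absolute difference and the change relative to the base size.
    print(f"{base} -> {other}: {d:+d} KG ({100 * d / SIZE_KG[base]:+.1f} %)")
```

For instance, enabling the FPU on top of V2 PULP adds 36 KG with ZFINX = 0 and 20 KG with ZFINX = 1.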

FPGA Synthesis
^^^^^^^^^^^^^^^