Merge branch 'intel:main' into main
desmonddak authored Oct 29, 2024
2 parents dce9512 + 8bf5d23 commit 8b266d0
Showing 39 changed files with 3,138 additions and 632 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -18,6 +18,7 @@ tmp*
confapp/.vscode/*
*tracker.json
*tracker.log
*.sv

# Exceptions
!.vscode/extensions.json
2 changes: 2 additions & 0 deletions doc/README.md
@@ -73,6 +73,8 @@ Some in-development items will have opened issues, as well. Feel free to create
- CRC
- [Parity](./components/parity.md)
- Interleaving
- Clocking
- [Clock gating](./components/clock_gating.md)
- Data flow
- Ready/Valid
- Connect/Disconnect
Binary file added doc/components/clock_gate.png
84 changes: 84 additions & 0 deletions doc/components/clock_gating.md
@@ -0,0 +1,84 @@
# Clock Gating

ROHD-HCL includes a generic clock gating component for enabling and disabling clocks to save power. The implementation supports multiple scenarios and use cases:

- Easily control whether clock gating `isPresent` or not without modifying the implementation.
- Delay (or don't) controlled signals that are sampled in the gated clock domain, depending on your timing needs.
- Optionally use an override to force all clock gates to be enabled.
- Bring your own clock gating implementation and propagate the instantiation and any additionally required ports through an entire hierarchy without modifying any lower levels of the design.
- Automatically handle some tricky situations (e.g. keeping clocks enabled during reset for synchronous reset).

![Diagram of the clock gating component](clock_gate.png)

A very simple counter design is shown below with clock gating included via the component.

```dart
class CounterWithSimpleClockGate extends Module {
  Logic get count => output('count');

  CounterWithSimpleClockGate({
    required Logic clk,
    required Logic incr,
    required Logic reset,
    required ClockGateControlInterface cgIntf,
  }) : super(name: 'clk_gated_counter') {
    clk = addInput('clk', clk);
    incr = addInput('incr', incr);
    reset = addInput('reset', reset);

    // We clone the incoming interface, receiving all config information with it
    cgIntf = ClockGateControlInterface.clone(cgIntf)
      ..pairConnectIO(this, cgIntf, PairRole.consumer);

    // In this case, we want to enable the clock any time we're incrementing
    final clkEnable = incr;

    // Build the actual clock gate component.
    final clkGate = ClockGate(
      clk,
      enable: clkEnable,
      reset: reset,
      controlIntf: cgIntf,
    );

    final count = addOutput('count', width: 8);
    count <=
        flop(
          // access the gated clock from the component
          clkGate.gatedClk,
          // by default, `controlled` signals are delayed by 1 cycle
          count + clkGate.controlled(incr).zeroExtend(count.width),
          reset: reset,
        );
  }
}
```

Some important pieces to note here are:

- The clock gate component is instantiated like any other component
- We pass it a `ClockGateControlInterface` which brings with it any potential custom control. When we punch ports for this design, we use the `clone` constructor, which carries said configuration information.
- We enable the clock any time `incr` is asserted to increment the counter.
- Use the gated clock on the downstream flop for the counter.
- Use a "controlled" version of `incr`, which by default is delayed by one cycle.

The `ClockGateControlInterface` comes with an optional `enableOverride` which can force the clocks to always be enabled. It also contains a boolean `isPresent` which can control whether clock gating should be generated at all. Since configuration information is automatically carried down through the hierarchy, this means you *can turn on or off clock gating generation through an entire hierarchy without modifying your design*.
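
To make the roles of `isPresent`, `enableOverride`, and reset concrete, here is a minimal behavioral sketch in plain Dart. It is not the ROHD-HCL implementation: `ClockGateModel` and `countGatedPulses` are hypothetical names, and the latch-based structure (enable captured while the clock is low) is an assumption modeled on a conventional clock gate cell.

```dart
// Hypothetical behavioral model of a latch-based clock gate in plain Dart.
// It is NOT the ROHD-HCL implementation; all names are illustrative only.
class ClockGateModel {
  ClockGateModel({this.isPresent = true});

  /// When false, the clock passes through ungated.
  final bool isPresent;

  bool _latchedEnable = false;

  /// Evaluates one clock phase and returns the gated clock level.
  bool step({
    required bool clk,
    required bool enable,
    bool enableOverride = false,
    bool reset = false,
  }) {
    if (!isPresent) {
      return clk;
    }
    if (!clk) {
      // The enable latch is transparent only while the clock is low.
      // Reset and the override both force the clock to stay enabled.
      _latchedEnable = enable || enableOverride || reset;
    }
    return clk && _latchedEnable;
  }
}

/// Counts gated-clock high phases over [enables], one entry per cycle.
int countGatedPulses(
  List<bool> enables, {
  bool isPresent = true,
  bool enableOverride = false,
}) {
  final gate = ClockGateModel(isPresent: isPresent);
  var pulses = 0;
  for (final enable in enables) {
    // Low phase first (latch captures the enable), then the high phase.
    gate.step(clk: false, enable: enable, enableOverride: enableOverride);
    final high =
        gate.step(clk: true, enable: enable, enableOverride: enableOverride);
    if (high) {
      pulses++;
    }
  }
  return pulses;
}

void main() {
  final enables = [false, true, true, false, false];
  print(countGatedPulses(enables)); // 2
  print(countGatedPulses(enables, enableOverride: true)); // 5
  print(countGatedPulses(enables, isPresent: false)); // 5
}
```

In this model, only the two enabled cycles produce a clock pulse, while either the override or removing the gate entirely lets all five pulses through.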

Suppose now we wanted to add our own custom clock gating module implementation. This implementation may require some additional signals as well. When we pass a control interface we can provide some additional arguments to achieve this. For example:

```dart
ClockGateControlInterface(
  additionalPorts: [
    Port('anotherOverride'),
  ],
  gatedClockGenerator: (intf, clk, enable) => CustomClockGatingModule(
    clk: clk,
    en: enable,
    anotherOverride: intf.port('anotherOverride'),
  ).gatedClk,
);
```

Passing in an interface configured like this would mean that any consumers would automatically get the additional ports and new clock gating implementation. Our counter example could get this new method for clock gating and a new port without changing the design of the counter at all.

An executable version of this example is available in `example/clock_gating_example.dart`.
7 changes: 7 additions & 0 deletions doc/components/fixed_point.md
@@ -0,0 +1,7 @@
# Fixed-Point Arithmetic

Fixed-point binary representation of numbers is useful in several applications, including digital signal processing and embedded systems. As a first step towards enabling fixed-point components, we created a new value system, `FixedPointValue`, similar to `LogicValue`.

## FixedPointValue

A `FixedPointValue` represents a signed or unsigned fixed-point value following the Q notation (Qm.n format) as introduced by [Texas Instruments](https://www.ti.com/lit/ug/spru565b/spru565b.pdf). It comprises an optional sign, an integer part, and/or a fractional part. A `FixedPointValue` can be constructed from individual fields or from a Dart `double`, converted back to a Dart `double`, compared, and operated on (`+`, `-`, `*`, `/`).
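
As a rough illustration of the Qm.n format, the plain-Dart sketch below encodes a `double` into a two's-complement bit pattern and decodes it again. It does not use the `FixedPointValue` API: `encodeQ` and `decodeQ` are hypothetical helpers, and the sketch assumes the convention in which the sign bit is counted separately from the `m` integer bits.

```dart
// Hypothetical helpers illustrating Qm.n encoding in plain Dart.
// These are NOT part of the FixedPointValue API.

/// Encodes [value] as a Qm.n bit pattern: one sign bit when [signed],
/// then [m] integer bits and [n] fraction bits (assumed convention).
int encodeQ(
  double value, {
  required int m,
  required int n,
  bool signed = true,
}) {
  final width = (signed ? 1 : 0) + m + n;
  // Scaling by 2^n moves the binary point to the right of the LSB.
  final scaled = (value * (1 << n)).round();
  final min = signed ? -(1 << (m + n)) : 0;
  final max = (1 << (m + n)) - 1;
  if (scaled < min || scaled > max) {
    throw RangeError('$value does not fit in Q$m.$n');
  }
  // Masking to `width` bits turns negative values into two's complement.
  return scaled & ((1 << width) - 1);
}

/// Decodes a Qm.n bit pattern produced by [encodeQ] back to a double.
double decodeQ(
  int bits, {
  required int m,
  required int n,
  bool signed = true,
}) {
  final width = (signed ? 1 : 0) + m + n;
  var raw = bits;
  if (signed && (bits & (1 << (width - 1))) != 0) {
    // Sign bit set: undo the two's-complement wrap.
    raw = bits - (1 << width);
  }
  return raw / (1 << n);
}

void main() {
  print(encodeQ(2.75, m: 3, n: 4)); // 44
  print(encodeQ(-1.5, m: 3, n: 4).toRadixString(2)); // 11101000
  print(decodeQ(232, m: 3, n: 4)); // -1.5
}
```

Values that are not exact multiples of 2^-n are rounded to the nearest representable step, so a round trip through `encodeQ` and `decodeQ` is only exact for representable inputs.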
6 changes: 4 additions & 2 deletions doc/components/floating_point.md
@@ -32,6 +32,8 @@ Again, like `FloatingPointValue`, `FloatingPoint64` and `FloatingPoint32` subcla

## FloatingPointAdder

A very basic `FloatingPointAdder` component is available which does not perform any rounding. It takes two `FloatingPoint` `LogicStructure`s and adds them, returning a normalized `FloatingPoint` on the output. An option on input is the type of `ParallelPrefixTree` used in the internal addition of the mantissas.
A very basic `FloatingPointAdderSimple` component is available which does not perform any rounding. It takes two `FloatingPoint` `LogicStructure`s and adds them, returning a normalized `FloatingPoint` on the output. An option on input is the type of `ParallelPrefixTree` used in the internal addition of the mantissas.

Currently, the `FloatingPointAdder` is close in accuracy (as it has no rounding) and is not optimized for circuit performance, but only provides the key functionalities of alignment, addition, and normalization. Still, this component is a starting point for more realistic floating-point components that leverage the logical `FloatingPoint` and literal `FloatingPointValue` type abstractions.
Currently, the `FloatingPointAdderSimple` is only approximately accurate (as it has no rounding) and is not optimized for circuit performance; it provides only the key functionalities of alignment, addition, and normalization. Still, this component is a starting point for more realistic floating-point components that leverage the logical `FloatingPoint` and literal `FloatingPointValue` type abstractions.

A second `FloatingPointAdderRound` component is available which does perform rounding. It is based on "Delay-Optimized Implementation of IEEE Floating-Point Addition" by Peter-Michael Seidel and Guy Even, using an R-path to process far-apart exponents with rounding, and an N-path for exponents within 2 and subtraction, which is exact.
24 changes: 14 additions & 10 deletions doc/components/multiplier_components.md
@@ -124,7 +124,7 @@ Our `RadixEncoder` module is general, creating selection tables for arbitrary Bo

The `PartialProductGenerator` class also provides for sign extension with multiple options including `SignExtension.none` which is no sign extension for help in debugging, as well as `SignExtension.compactRect` which is a compact form which works for rectangular products where the multiplicand and multiplier can be of different widths.

If customization is needed beyond sign extension options, routines are provided that allow for fixed customization of bit positions, or conditional (mux based on a Logic) form.
The `PartialProductGenerator` creates a set of addends in its base class `PartialProductArray` which is simply a `List<List<Logic>>` to represent addends and a `rowShift[row]` to represent the shifts in the partial product matrix. If customization is needed beyond sign extension options, routines are provided that allow for fixed customization of bit positions or conditional (mux based on a Logic) form in the `PartialProductArray`.

```dart
final ppg = PartialProductGenerator(a,b);
@@ -167,7 +167,7 @@ You can also generate a Markdown form of the same matrix:

Once you have a partial product matrix, you would like to add up the addends. Traditionally this is done using compression trees which instantiate 2:1 and 3:2 column compressors (or carry-save adders) to reduce the matrix to two addends. The final two addends are often added with an efficient final adder.
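
The reduction described above can be sketched on plain integers. This is not the `ColumnCompressor` API and it ignores delay-driven scheduling: `compress3to2` and `reduceToTwo` are hypothetical helpers that apply a 3:2 compressor (full adder) bitwise across whole addends, assuming non-negative values.

```dart
// A minimal carry-save sketch on plain integers (hypothetical helpers,
// NOT the ColumnCompressor API; assumes non-negative addends).

/// Applies a 3:2 compressor bitwise across three addends, returning a
/// sum word and a carry word whose total equals a + b + c.
List<int> compress3to2(int a, int b, int c) {
  final sum = a ^ b ^ c;
  // A carry is generated wherever at least two inputs are set, and it
  // lands one column to the left.
  final carry = ((a & b) | (a & c) | (b & c)) << 1;
  return [sum, carry];
}

/// Repeatedly compresses [addends] until at most two remain.
List<int> reduceToTwo(List<int> addends) {
  final rows = List<int>.of(addends);
  while (rows.length > 2) {
    final compressed = compress3to2(rows[0], rows[1], rows[2]);
    rows
      ..removeRange(0, 3)
      ..addAll(compressed);
  }
  return rows;
}

void main() {
  // Shifted partial products of 13 * 11 (11 is binary 1011):
  // 13 << 0, 13 << 1, and 13 << 3.
  final finalAddends = reduceToTwo([13, 13 << 1, 13 << 3]);
  print(finalAddends); // [127, 16]

  // The "final adder" step combines the two remaining addends.
  print(finalAddends.reduce((x, y) => x + y)); // 143
}
```

No carry propagates along a row during compression; the only carry-propagating addition is the final one, which is why a fast final adder is typically used there.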

Our `ColumnCompressor` class uses a delay-driven approach to efficiently compress the rows of the partial product matrix. Its only argument is a `PartialProductGenerator`, and it creates a list of `ColumnQueue`s containing the final two addends stored by column after compression. An `extractRow`routine can be used to extract the columns. `ColumnCompressor` currently has an extension `EvaluateColumnCompressor` which can be used to print out the compression progress. Here is the legend for these printouts.
Our `ColumnCompressor` class uses a delay-driven approach to efficiently compress the rows of the partial product matrix. Its only argument is a `PartialProductArray` (base class of `PartialProductGenerator`), and it creates a list of `ColumnQueue`s containing the final two addends stored by column after compression. An `extractRow` routine can be used to extract the columns. `ColumnCompressor` currently has an extension `EvaluateColumnCompressor` which can be used to print out the compression progress. Here is the legend for these printouts.

- `ppR,C` = partial product entry at row R, column C
- `sR,C` = sum term coming last from row R, column C
@@ -183,12 +183,14 @@ Compression Tree before:
pp2,6 pp2,5 pp2,4 pp2,3
pp1,6 pp1,5 pp1,4
1 1 0 0 0 0 0 0 0 1 1 0 110000000110 (3078)
1 1 0 0 0 1 1 1 0 0 001100011100 (796)
0 0 0 0 0 1 0 0 000000001000 (8)
0 1 0 0 0 0 000001000000 (64)
1 1 1 1 000001111000 (120)
0 1 1 000000110000 (48) Total=18
11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 0 0 s s s S S = 3075 (-1021)
1 1 0 0 0 0 0 0 0 1 = 769 (769)
0 0 0 0 0 1 1 1 = 14 (14)
1 i S 1 1 0 = 184 (184)
0 0 1 1 = 24 (24)
0 1 1 = 48 (48)
p 0 0 0 0 0 0 0 1 0 0 1 0 = 18 (18)
```

Compression Tree after compression:
@@ -197,8 +199,10 @@ Compression Tree after compression:
pp5,11 pp5,10 s0,9 s0,8 s0,7 c0,5 c0,4 c0,3 s0,3 s0,2 pp0,1 pp1,0
c0,9 c0,8 c0,7 c0,6 s0,6 s0,5 s0,4 s0,3 s0,2 s0,1 pp0,0
1 1 1 1 1 0 1 0 0 1 0 0 111110100100 (4004)
0 0 0 0 1 1 0 1 1 1 0 000001101110 (110) Total=18
11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 1 0 0 1 1 0 S = 4045 (-51)
0 0 0 0 1 0 0 0 1 0 1 = 69 (69)
p 0 0 0 0 0 0 0 1 0 0 1 0 = 18 (18)
```

## Final Adder
176 changes: 176 additions & 0 deletions example/clock_gating_example.dart
@@ -0,0 +1,176 @@
// Copyright (C) 2024 Intel Corporation
// SPDX-License-Identifier: BSD-3-Clause
//
// clock_gating_example.dart
// Example of how to use clock gating.
//
// 2024 September 24
// Author: Max Korbel <[email protected]>

// ignore_for_file: avoid_print

import 'dart:async';

import 'package:rohd/rohd.dart';
import 'package:rohd_hcl/rohd_hcl.dart';
import 'package:rohd_vf/rohd_vf.dart';

/// A very simple counter that has clock gating internally.
class CounterWithSimpleClockGate extends Module {
  Logic get count => output('count');

  CounterWithSimpleClockGate({
    required Logic clk,
    required Logic incr,
    required Logic reset,
    required ClockGateControlInterface cgIntf,
  }) : super(name: 'clk_gated_counter') {
    clk = addInput('clk', clk);
    incr = addInput('incr', incr);
    reset = addInput('reset', reset);

    // We clone the incoming interface, receiving all config information with it
    cgIntf = ClockGateControlInterface.clone(cgIntf)
      ..pairConnectIO(this, cgIntf, PairRole.consumer);

    // In this case, we want to enable the clock any time we're incrementing
    final clkEnable = incr;

    // Build the actual clock gate component.
    final clkGate = ClockGate(
      clk,
      enable: clkEnable,
      reset: reset,
      controlIntf: cgIntf,
      delayControlledSignals: true,
    );

    final count = addOutput('count', width: 8);
    count <=
        flop(
          // access the gated clock from the component
          clkGate.gatedClk,

          // depending on configuration default, `controlled` signals are
          // delayed by 1 cycle (in this case we enable it)
          count + clkGate.controlled(incr).zeroExtend(count.width),

          reset: reset,
        );
  }
}
}

/// A reference to an external SystemVerilog clock-gating macro.
class CustomClockGateMacro extends Module with CustomSystemVerilog {
  Logic get gatedClk => output('gatedClk');

  CustomClockGateMacro({
    required Logic clk,
    required Logic en,
    required Logic override,
    required Logic anotherOverride,
  }) : super(name: 'custom_clock_gate_macro') {
    // make sure ports match the SystemVerilog
    clk = addInput('clk', clk);
    en = addInput('en', en);
    override = addInput('override', override);
    anotherOverride = addInput('another_override', anotherOverride);
    addOutput('gatedClk');

    // simulation-only behavior
    gatedClk <= clk & flop(~clk, en | override | anotherOverride);
  }

  // define how to instantiate this custom SystemVerilog
  @override
  String instantiationVerilog(String instanceType, String instanceName,
          Map<String, String> inputs, Map<String, String> outputs) =>
      '`CUSTOM_CLOCK_GATE('
      '${outputs['gatedClk']}, '
      '${inputs['clk']}, '
      '${inputs['en']}, '
      '${inputs['override']}, '
      '${inputs['another_override']}'
      ')';
}

Future<void> main({bool noPrint = false}) async {
  // Build a custom version of the clock gating control interface which uses our
  // custom macro.
  final customClockGateControlIntf = ClockGateControlInterface(
    hasEnableOverride: true,
    additionalPorts: [
      // we add an additional override port, for example, which is passed
      // automatically down the hierarchy
      Port('anotherOverride'),
    ],
    gatedClockGenerator: (intf, clk, enable) => CustomClockGateMacro(
      clk: clk,
      en: enable,
      override: intf.enableOverride!,
      anotherOverride: intf.port('anotherOverride'),
    ).gatedClk,
  );

  // Generate a simple clock. This will run along by itself as
  // the Simulator goes.
  final clk = SimpleClockGenerator(10).clk;

  // ... and some additional signals
  final reset = Logic();
  final incr = Logic();

  final counter = CounterWithSimpleClockGate(
    clk: clk,
    reset: reset,
    incr: incr,
    cgIntf: customClockGateControlIntf,
  );

  // build the module and attach a waveform viewer for debug
  await counter.build();

  // Let's see what this module looks like as SystemVerilog, so we can pass it
  // to other tools.
  final systemVerilogCode = counter.generateSynth();
  if (!noPrint) {
    print(systemVerilogCode);
  }

  // Now let's try simulating!

  // Attach a waveform dumper so we can see what happens.
  if (!noPrint) {
    WaveDumper(counter);
  }

  // Start off with a disabled counter and asserting reset at the start.
  incr.inject(0);
  reset.inject(1);

  // leave overrides turned off
  customClockGateControlIntf.enableOverride!.inject(0);
  customClockGateControlIntf.port('anotherOverride').inject(0);

  Simulator.setMaxSimTime(1000);
  unawaited(Simulator.run());

  // wait a bit before dropping reset
  await clk.waitCycles(3);
  reset.inject(0);

  // wait a bit before raising incr
  await clk.waitCycles(5);
  incr.inject(1);

  // leave it high for a bit, then drop it
  await clk.waitCycles(5);
  incr.inject(0);

  // wait a little longer, then end the test
  await clk.waitCycles(5);
  await Simulator.endSimulation();

  // Now we can review the waves to see how the gated clock does not toggle
  // while gated!
}
14 changes: 9 additions & 5 deletions example/example.dart
@@ -12,7 +12,7 @@
import 'package:rohd/rohd.dart';
import 'package:rohd_hcl/rohd_hcl.dart';

Future<void> main() async {
Future<void> main({bool noPrint = false}) async {
// Build a module that rotates a 16-bit signal by an 8-bit signal, which
// we guarantee will never see more than 10 as the rotate amount.
final original = Logic(width: 16);
@@ -23,11 +23,15 @@ Future<void> main() async {
// Do a quick little simulation with some inputs
original.put(0x4321);
rotateAmount.put(4);
print('Shifting ${original.value} by ${rotateAmount.value} '
'yields ${rotated.value}');
if (!noPrint) {
print('Shifting ${original.value} by ${rotateAmount.value} '
'yields ${rotated.value}');
}

// Generate verilog for it and print it out
await mod.build();
print('Generating verilog...');
print(mod.generateSynth());
final sv = mod.generateSynth();
if (!noPrint) {
print(sv);
}
}
1 change: 1 addition & 0 deletions lib/rohd_hcl.dart
@@ -4,6 +4,7 @@
export 'src/arbiters/arbiters.dart';
export 'src/arithmetic/arithmetic.dart';
export 'src/binary_gray.dart';
export 'src/clock_gating.dart';
export 'src/component_config/component_config.dart';
export 'src/count.dart';
export 'src/edge_detector.dart';
4 changes: 2 additions & 2 deletions lib/src/arithmetic/addend_compressor.dart
@@ -167,8 +167,8 @@ class ColumnCompressor {
late final List<ColumnQueue> columns;

/// The partial product generator to be compressed
final PartialProductGenerator pp;
/// The partial product array to be compressed
final PartialProductArray pp;

/// Initialize a ColumnCompressor for a set of partial products
ColumnCompressor(this.pp) {