From c810f50c2be7bd691973c1c7280e00ca3c0f29a4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jannis=20Sch=C3=B6nleber?= Date: Wed, 29 Nov 2023 10:52:10 +0100 Subject: [PATCH] docs: update readme --- README.md | 34 +++++++++++++++++++--------------- 1 file changed, 19 insertions(+), 15 deletions(-) diff --git a/README.md b/README.md index c547e279..0a964f3b 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,9 @@ -# Stella Nera - Halutmatmul +
+ +# Stella Nera: A halutmatmul based accelerator +
+ +
### Algorithmic CI [![PyTorch Layer Test | PyTest](https://github.com/joennlae/halutmatmul/actions/workflows/python_testing.yaml/badge.svg)](https://github.com/joennlae/halutmatmul/actions/workflows/python_testing.yaml) @@ -13,7 +18,9 @@ [![RTL Linting](https://github.com/joennlae/halutmatmul/actions/workflows/hw_linting.yaml/badge.svg)](https://github.com/joennlae/halutmatmul/actions/workflows/hw_linting.yaml) [![HW Design Verification](https://github.com/joennlae/halutmatmul/actions/workflows/hw_dv.yaml/badge.svg)](https://github.com/joennlae/halutmatmul/actions/workflows/hw_dv.yaml) -# Paper +
+ +## Paper * [Stella Nera: Achieving 161 TOp/s/W with Multiplier-free DNN Acceleration based on Approximate Matrix Multiplication](https://arxiv.org/abs/2311.10207) @@ -22,6 +29,7 @@ *The recent Maddness method approximates Matrix Multiplication (MatMul) without the need for multiplication by using a hash-based version of product quantization (PQ). The hash function is a decision tree, allowing for efficient hardware implementation, as multiply-accumulate operations are replaced by decision tree passes and LUT lookups. Stella Nera is the first Maddness accelerator achieving 15x higher area efficiency (GMAC/s/mm^2) and 25x higher energy efficiency (TMAC/s/W) than direct MatMul accelerators in the same technology. In a commercial 14 nm technology and scaled to 3 nm, we achieve an energy efficiency of 161 TOp/s/W@0.55V with a Top-1 accuracy on CIFAR-10 of over 92.5% using ResNet9.* + # Algorithmic - Maddness ![Maddness Animation](/docs/images/maddness_animation.webp) @@ -56,9 +64,9 @@ mse = np.square(C_halut - C).mean() print(mse) ``` -# Hardware - OpenROAD flow results from CI +# Hardware - OpenROAD flow results from CI - NOT OPTIMIZED -All NanGate45 results are NOT OPTIMIZED! The results are only for reference and to show the flow works. +All completely open hardware results are NOT OPTIMIZED! The results are only for reference and to show the flow works. | All Designs | NanGate45 | | ------------- | ------------- | @@ -86,8 +94,8 @@ ACC_TYPE=INT DATA_WIDTH=8 NUM_M=8 NUM_DECODER_UNITS=4 NUM_C=16 make halut-open-s | Voltage [V] | 1.1 | | Util [%] | 50.4 | | TNS | 0 | -| Clock Net | ![Clock_net](https://raw.githubusercontent.com/joennlae/halutmatmul-openroad-reports/main/latest/nangate45/halut_matmul/reports/final_clocks.webp) | -| Routing | ![Routing](https://raw.githubusercontent.com/joennlae/halutmatmul-openroad-reports/main/latest/nangate45/halut_matmul/reports/final_routing.webp) | +| Clock Net | Clock Net | +| Routing | Routing | | GDS | [GDS Download](https://raw.githubusercontent.com/joennlae/halutmatmul-openroad-reports/main/latest/nangate45/halut_matmul/results/6_final.gds) | @@ -101,8 +109,8 @@ ACC_TYPE=INT DATA_WIDTH=8 NUM_M=8 NUM_DECODER_UNITS=4 NUM_C=16 make halut-open-s | Voltage [V] | 1.1 | | Util [%] | 48.7 | | TNS | 0 | -| Clock Net | ![Clock_net](https://raw.githubusercontent.com/joennlae/halutmatmul-openroad-reports/main/latest/nangate45/halut_encoder_4/reports/final_clocks.webp) | -| Routing | ![Routing](https://raw.githubusercontent.com/joennlae/halutmatmul-openroad-reports/main/latest/nangate45/halut_encoder_4/reports/final_routing.webp) | +| Clock Net | Clock Net | +| Routing | Routing | | GDS | [GDS Download](https://raw.githubusercontent.com/joennlae/halutmatmul-openroad-reports/main/latest/nangate45/halut_encoder_4/results/6_final.gds) | @@ -116,8 +124,8 @@ ACC_TYPE=INT DATA_WIDTH=8 NUM_M=8 NUM_DECODER_UNITS=4 NUM_C=16 make halut-open-s | Voltage [V] | 1.1 | | Util [%] | 52.1 | | TNS | 0 | -| Clock Net | ![Clock_net](https://raw.githubusercontent.com/joennlae/halutmatmul-openroad-reports/main/latest/nangate45/halut_decoder/reports/final_clocks.webp) | -| Routing | ![Routing](https://raw.githubusercontent.com/joennlae/halutmatmul-openroad-reports/main/latest/nangate45/halut_decoder/reports/final_routing.webp) | +| Clock Net | Clock Net | +| Routing | Routing | | GDS | [GDS Download](https://raw.githubusercontent.com/joennlae/halutmatmul-openroad-reports/main/latest/nangate45/halut_decoder/results/6_final.gds) | @@ -133,11 +141,7 @@ conda activate halutmatmul conda env create -f environment_gpu.yml --prefix /scratch/janniss/conda/halutmatmul_gpu ``` -## References +### References * [arXiv](https://arxiv.org/abs/2106.10860) Maddness paper * Based on [MADDness/Bolt](https://github.com/dblalock/bolt). - -## Hackernews mention (comments only) and discussion - -* [HN: Bolt: Faster matrix and vector operations that run on compressed data](https://news.ycombinator.com/item?id=31792206)