PUSCH receiver kernels #108

Merged
merged 41 commits into from
Jan 6, 2025
10ba276
[software] Add mimo_mmse_f32/f16 kernels
mbertuletti Feb 13, 2023
5f667a3
[software] Add jacobi_f32 kernel for linear system solution
mbertuletti Mar 17, 2023
d0482e9
[software] Fix complex conjugate multiplications in LTtrisol_f32/f16
mbertuletti Apr 4, 2023
bc8952c
[software] Add parallel hermitian_f32/f16 for mimo_mmse_f32/f16
mbertuletti Apr 25, 2023
7337c60
[software] Clean up main of cholesky_f16 and mimo_mmse_f16
mbertuletti Jul 17, 2023
ce412ab
[software] Move kernels and data generation scripts to runtime folder
mbertuletti Jul 17, 2023
8df39cc
[software] Add cfft_radix4_f16 kernel
mbertuletti Jul 17, 2023
52237dc
[software] Add chest_f16 (block-type channel estimation) kernel
mbertuletti Sep 4, 2023
a2819be
[software] Add cmatmul_f16 kernel (complex matrix-multiplication)
mbertuletti Sep 13, 2023
d93ffd0
[software] Add mimo_mmse_f16 with wDotp extensions
mbertuletti Sep 29, 2023
363fa4e
[software] Add function descriptions
mbertuletti Sep 29, 2023
cab0290
[software] Update data generation scripts
mbertuletti Oct 2, 2023
a039b79
[software] Transfer data using DMA
mbertuletti Dec 12, 2023
edc2c6c
[software] Add cholesky_f16 with wDotp extensions
mbertuletti Dec 12, 2023
b36b2f7
[software] Add OFDM application
mbertuletti Dec 12, 2023
b7a0c84
[software] Add mimo_mmse_q16 kernels (fixed-point precision)
mbertuletti Jan 4, 2024
a170b74
[software] Fix cfft_radix4_f16 butterfly operations
mbertuletti Jan 8, 2024
380729c
[software] Adapt data generation to folder structure in #PR96
mbertuletti Jan 8, 2024
bb58026
[software] Add complex instructions to cmatmul_f16
mbertuletti Jan 8, 2024
962c313
[software] Handle multiple beamgroups in mimo_mmse_f16
mbertuletti Jan 11, 2024
6ec634e
[software] Add and compile mimo_mmse_f16 with soft-divsqrt
mbertuletti Feb 2, 2024
d8d29b6
[software] Add cmatmul_q16 (complex fixed-point matrix-multiplication)
mbertuletti Feb 20, 2024
7c6c1b6
[software] Modify channel estimation with multiplication by pilots
mbertuletti Mar 5, 2024
ac589b2
[software] Fix data loading in cfft_radix4_f16
mbertuletti Mar 5, 2024
8bed4ad
[software] Adapt to new folder structure in #PR96
mbertuletti Apr 25, 2024
0e6b37c
[software] Remove load of che inputs from inner loop
mbertuletti Apr 25, 2024
0e3894f
[software] Add shuffle instruction in cfft_radix4_f16
mbertuletti Jul 5, 2024
f0270f8
[software] Clean-up complex matmuls
mbertuletti Aug 22, 2024
5f3c750
[software] Add f32 and f16 dotp/axpy kernels
mbertuletti Aug 26, 2024
33701fa
[software] Clean-up data transfers in mimo_mmse_f16
mbertuletti Sep 5, 2024
f0570a5
[software] Add mimo_mmse_f16 with fcdotp extensions
mbertuletti Sep 20, 2024
5984c35
[software] Add mimo_mmse_f8 kernels
mbertuletti Sep 20, 2024
0596309
[software] Clean up folded mimo_mmse_f16 and Ltrisol_f16
mbertuletti Oct 16, 2024
bbab0ca
[software] Adapt generation of data to #PR103
mbertuletti Nov 26, 2024
3b5886b
[github] Change Ubuntu version to 22.04
mbertuletti Dec 6, 2024
3ea70e0
[software] Add matmul kernel with the conflict optimization scheme
yichao-zh Oct 21, 2022
5bee548
[software] Move the port-conflict optimized matmul to matmul_i32p
mbertuletti Dec 10, 2024
0f1de6f
Update CHANGELOG.md
mbertuletti Dec 10, 2024
ea4ad72
[software] Add explanation for the use of defines
mbertuletti Dec 19, 2024
ca65aa6
[software] Cross-out defines for Banshee Monte-Carlo simulation
mbertuletti Dec 19, 2024
d00e81b
[software] Add flag to fold q16 MMSE
mbertuletti Jan 6, 2025
14 changes: 7 additions & 7 deletions .github/workflows/ci.yml
@@ -46,7 +46,7 @@ jobs:
path: riscv-gnu-toolchain.tzst

tc-llvm:
runs-on: ubuntu-20.04
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Recover the submodule commit hash
@@ -240,7 +240,7 @@ jobs:
git diff --exit-code

check-control-registers:
runs-on: ubuntu-20.04
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Install Python requirements
@@ -266,7 +266,7 @@ jobs:
# Build Software #
####################
build-apps-gcc:
runs-on: ubuntu-20.04
runs-on: ubuntu-22.04
needs: tc-gcc
strategy:
matrix:
@@ -297,7 +297,7 @@ jobs:
path: apps-gcc.tzst

build-apps-llvm:
runs-on: ubuntu-20.04
runs-on: ubuntu-22.04
needs: [tc-gcc, tc-llvm]
strategy:
matrix:
@@ -377,7 +377,7 @@ jobs:
# Run Software #
##################
run-apps-gcc:
runs-on: ubuntu-20.04
runs-on: ubuntu-22.04
timeout-minutes: 20
needs: [build-apps-gcc, riscv-isa-sim, verilator-model]
strategy:
@@ -415,7 +415,7 @@ jobs:
make trace

run-apps-llvm:
runs-on: ubuntu-20.04
runs-on: ubuntu-22.04
timeout-minutes: 20
needs: [build-apps-llvm, riscv-isa-sim, verilator-model]
strategy:
@@ -453,7 +453,7 @@ jobs:
make trace

run-apps-halide:
runs-on: ubuntu-20.04
runs-on: ubuntu-22.04
timeout-minutes: 20
needs: [build-apps-halide, riscv-isa-sim, verilator-model]
strategy:
2 changes: 1 addition & 1 deletion .github/workflows/lint.yml
@@ -11,7 +11,7 @@ jobs:
# Check License #
#################
check-license:
runs-on: ubuntu-20.04
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Install Python requirements
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -51,6 +51,10 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- Add pv.pack.h xpulpv2 instruction
- Add a script to generate random data to preload the L2 memory
- Add stack overflow simulator warning using dedicated CSR
- Add mimo_mmse_f16 kernels
- Add cmatmul_f16 kernels
- Add cfft_radix4_f16 kernels
- Add chest_f16 kernels

### Fixed
- Measure the `wfi` stalls and stalls caused by `opc` properly
1 change: 1 addition & 0 deletions python-requirements.txt
@@ -15,3 +15,4 @@ progressbar2
tabulate
sympy
scipy
pyflexfloat
9 changes: 7 additions & 2 deletions software/apps/baremetal/Makefile
@@ -20,8 +20,13 @@ APPS := $(patsubst $(APPS_DIR)/%/main.c,%,$(shell find $(APPS_DIR) -name "main.c
BINARIES := $(addprefix $(BIN_DIR)/,$(APPS))
ALL := $(APPS)

ALL_GCC := $(filter-out matmul_f16 matmul_f32, $(ALL))
ALL_LLVM := $(filter-out synth_i32 chest_q16 cfft_radix2_q16 cfft_radix4_q16, $(ALL))
FP_SUFFIXES := f16 f32 f8
I_SUFFIXES := q16 q32 i16 i32 i8
FP_APPS := $(foreach suf, $(FP_SUFFIXES), $(filter %_$(suf), $(ALL)))
I_APPS := $(foreach suf, $(I_SUFFIXES), $(filter %_$(suf), $(ALL)))
# Filter out applications: GCC skips floating-point apps, LLVM skips integer and fixed-point apps
ALL_GCC := $(filter-out $(FP_APPS), $(ALL))
ALL_LLVM := $(filter-out $(I_APPS), $(ALL))

# Make all applications
all: $(ALL_GCC)
58 changes: 58 additions & 0 deletions software/apps/baremetal/axpy_f16/main.c
@@ -0,0 +1,58 @@
// Copyright 2021 ETH Zurich and University of Bologna.
// Licensed under the Apache License, Version 2.0, see LICENSE for details.
// SPDX-License-Identifier: Apache-2.0

// Author: Marco Bertuletti, ETH Zurich

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#include "dma.h"
#include "encoding.h"
#include "printf.h"
#include "runtime.h"
#include "synchronization.h"

#include "data_axpy_f16.h"

#include "baremetal/mempool_axpy_f16.h"
#include "baremetal/mempool_checks.h"

__fp16 l1_X[array_N] __attribute__((aligned(NUM_BANKS), section(".l1_prio")));
__fp16 l1_Y[array_N] __attribute__((aligned(NUM_BANKS), section(".l1_prio")));

int main() {

uint32_t core_id = mempool_get_core_id();
uint32_t num_cores = mempool_get_core_count();
uint32_t time_init, time_end;
mempool_barrier_init(core_id);

time_init = 0;
time_end = 0;
if (core_id == 0) {
dma_memcpy_blocking(l1_X, l2_X, array_N * sizeof(int16_t));
dma_memcpy_blocking(l1_Y, l2_Y, array_N * sizeof(int16_t));
}
uint32_t register volatile a = *(uint32_t *)&(l2_A)&0x0000FFFF;
mempool_barrier(num_cores);

// PARALLEL, LOCAL ACCESSES
time_init = mempool_get_timer();
mempool_start_benchmark();
axpy_f16vecp_local_unrolled4(a, l1_X, l1_Y, array_N);
mempool_stop_benchmark();
time_end = mempool_get_timer();

mempool_barrier(num_cores);
// Check results
if (core_id == 0) {
uint32_t clock_cycles = (time_end - time_init);
printf("\nKernel execution takes %d clock cycles\n", clock_cycles);
}
mempool_check_f16(l1_Y, l2_Z, 100, 0.1f, 0);
mempool_barrier(num_cores);

return 0;
}
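The scalar load in `main` above reads `l2_A` through a 32-bit pointer and masks the low halfword, so the half-precision bit pattern reaches the kernel in a general-purpose register. A minimal sketch of the same idea follows, assuming `l2_A` holds a 16-bit half-precision value. `load_f16_bits` is a hypothetical helper and is not part of this PR; it uses `memcpy` instead of a pointer cast to sidestep aliasing and alignment concerns.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not in the PR): copy the 16-bit pattern stored at
 * `src` and zero-extend it to 32 bits. The result matches what the
 * cast-and-mask expression produces on a little-endian target. */
static inline uint32_t load_f16_bits(const void *src) {
  uint16_t bits;
  memcpy(&bits, src, sizeof(bits)); /* copy exactly two bytes */
  return (uint32_t)bits;            /* upper halfword is zero */
}
```

For instance, a halfword holding `0x3C00` (1.0 in IEEE 754 binary16) is returned as `0x00003C00`.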
57 changes: 57 additions & 0 deletions software/apps/baremetal/axpy_f32/main.c
@@ -0,0 +1,57 @@
// Copyright 2021 ETH Zurich and University of Bologna.
// Licensed under the Apache License, Version 2.0, see LICENSE for details.
// SPDX-License-Identifier: Apache-2.0

// Author: Marco Bertuletti, ETH Zurich

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#include "dma.h"
#include "encoding.h"
#include "printf.h"
#include "runtime.h"
#include "synchronization.h"

#include "data_axpy_f32.h"

#include "baremetal/mempool_axpy_f32.h"
#include "baremetal/mempool_checks.h"

float l1_X[array_N] __attribute__((aligned(NUM_BANKS), section(".l1_prio")));
float l1_Y[array_N] __attribute__((aligned(NUM_BANKS), section(".l1_prio")));

int main() {

uint32_t core_id = mempool_get_core_id();
uint32_t num_cores = mempool_get_core_count();
uint32_t time_init, time_end;
mempool_barrier_init(core_id);

time_init = 0;
time_end = 0;
if (core_id == 0) {
dma_memcpy_blocking(l1_X, l2_X, array_N * sizeof(int32_t));
dma_memcpy_blocking(l1_Y, l2_Y, array_N * sizeof(int32_t));
}
float register volatile a = l2_A;
mempool_barrier(num_cores);

// PARALLEL
time_init = mempool_get_timer();
mempool_start_benchmark();
axpy_f32p_local_unrolled4(a, l1_X, l1_Y, array_N);
mempool_stop_benchmark();
time_end = mempool_get_timer();

// Check results
if (core_id == 0) {
uint32_t clock_cycles = (time_end - time_init);
printf("\nKernel execution takes %d clock cycles\n", clock_cycles);
}
mempool_check_f32(l1_Y, l2_Z, 100, 0.1f, 0);
mempool_barrier(num_cores);

return 0;
}
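For reference, AXPY computes `y[i] = a * x[i] + y[i]`, and the result is then compared against a golden vector within a tolerance. The sketch below is a plain single-core version of both steps. It is not the `axpy_f32p_local_unrolled4` kernel, `axpy_ref` and `count_mismatches` are hypothetical names, and the exact semantics of `mempool_check_f32` may differ from this absolute-error check.

```c
#include <stddef.h>

/* Reference AXPY, single core, no unrolling: y[i] = a * x[i] + y[i]. */
static inline void axpy_ref(float a, const float *x, float *y, size_t n) {
  for (size_t i = 0; i < n; i++) {
    y[i] = a * x[i] + y[i];
  }
}

/* Hypothetical checker: count elements whose absolute error exceeds tol. */
static inline size_t count_mismatches(const float *out, const float *golden,
                                      size_t n, float tol) {
  size_t errors = 0;
  for (size_t i = 0; i < n; i++) {
    float diff = out[i] - golden[i];
    if (diff < 0.0f) {
      diff = -diff; /* absolute value without pulling in math.h */
    }
    if (diff > tol) {
      errors++;
    }
  }
  return errors;
}
```

A wide tolerance such as the `0.1f` used above is a reasonable choice when the golden data comes from a different floating-point implementation than the hardware.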
16 changes: 7 additions & 9 deletions software/apps/baremetal/axpy_i32/main.c
@@ -8,22 +8,19 @@
#include <stdlib.h>
#include <string.h>

/* Mempool runtime libraries */
#include "builtins_v2.h"
#include "dma.h"
#include "encoding.h"
#include "printf.h"
#include "runtime.h"
#include "synchronization.h"

#include "baremetal/mempool_axpy_i32p.h"
#include "baremetal/mempool_checks.h"
#include "data_axpy_i32.h"

int32_t l1_X[array_N]
__attribute__((aligned(NUM_CORES * sizeof(uint32_t)), section(".l1")));
int32_t l1_Y[array_N]
__attribute__((aligned(NUM_CORES * sizeof(uint32_t)), section(".l1")));
#include "baremetal/mempool_axpy_i32.h"
#include "baremetal/mempool_checks.h"

int32_t l1_X[array_N] __attribute__((aligned(NUM_BANKS), section(".l1")));
int32_t l1_Y[array_N] __attribute__((aligned(NUM_BANKS), section(".l1")));
int volatile error __attribute__((section(".l1")));

int main() {
@@ -38,11 +35,12 @@ int main() {
dma_memcpy_blocking(l1_Y, l2_Y, array_N * sizeof(int32_t));
error = 0;
}
register volatile int32_t a = l2_A;
mempool_barrier(num_cores);

// Benchmark
mempool_start_benchmark();
calc_axpy_unloop_x4_localbank(l1_X, l1_Y, ALPHA, array_N, core_id, num_cores);
calc_axpy_unloop_x4_localbank(l1_X, l1_Y, a, array_N, core_id, num_cores);
mempool_barrier(num_cores);
mempool_stop_benchmark();

15 changes: 10 additions & 5 deletions software/apps/baremetal/cfft_radix2_q16/main.c
@@ -10,7 +10,6 @@
#include <stdlib.h>
#include <string.h>

/* Mempool runtime libraries */
#include "builtins_v2.h"
#include "dma.h"
#include "encoding.h"
@@ -19,15 +18,21 @@
#include "synchronization.h"

#include "data_cfft_radix2_q16.h"
#define N_BANKS (NUM_CORES * BANKING_FACTOR)

/* CFFT mempool libraries */
/*
======================
Parameters and defines

SINGLE: When defined runs single-core CFFT.
PARALLEL: When defined runs parallel CFFT.
*/

#define PARALLEL

#include "baremetal/mempool_cfft_q16_bitreversal.h"
#include "baremetal/mempool_checks.h"
#include "baremetal/mempool_radix2_cfft_q16.h"

#define PARALLEL

/* CFFT mempool data */
int16_t l1_pSrc[2 * N_CSAMPLES]
__attribute__((aligned(sizeof(int32_t)), section(".l1_prio")));
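The hunk above moves `#define PARALLEL` ahead of the kernel includes, presumably so the headers can see the define. A minimal sketch of how such a compile-time switch is typically consumed follows. `run_single`, `run_parallel`, and `run_selected` are hypothetical stand-ins, not the CFFT entry points from this PR.

```c
/* Select the multi-core path; replace with SINGLE for a one-core run. */
#define PARALLEL

/* Hypothetical stand-ins for the single-core and parallel kernels. */
static inline int run_single(void) { return 1; }
static inline int run_parallel(void) { return 2; }

/* Dispatch at compile time: only the selected call survives preprocessing. */
static inline int run_selected(void) {
#if defined(SINGLE)
  return run_single();
#elif defined(PARALLEL)
  return run_parallel();
#else
#error "Define either SINGLE or PARALLEL before this point"
#endif
}
```

The `#error` branch makes a missing define fail at build time rather than silently skipping the benchmark.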