PUSCH receiver kernels #108

Open: wants to merge 40 commits into base: main
Changes from 38 commits
Commits
10ba276
[software] Add mimo_mmse_f32/f16 kernels
mbertuletti Feb 13, 2023
5f667a3
[software] Add jacobi_f32 kernel for linear system solution
mbertuletti Mar 17, 2023
d0482e9
[software] Fix complex conjugate multiplications in LTtrisol_f32/f16
mbertuletti Apr 4, 2023
bc8952c
[software] Add parallel hermitian_f32/f16 for mimo_mmse_f32/f16
mbertuletti Apr 25, 2023
7337c60
[software] Clean up main of cholesky_f16 and mimo_mmse_f16
mbertuletti Jul 17, 2023
ce412ab
[software] Move kernels and data generation scripts to runtime folder
mbertuletti Jul 17, 2023
8df39cc
[software] Add cfft_radix4_f16 kernel
mbertuletti Jul 17, 2023
52237dc
[software] Add chest_f16 (block-type channel estimation) kernel
mbertuletti Sep 4, 2023
a2819be
[software] Add cmatmul_f16 kernel (complex matrix-multiplication)
mbertuletti Sep 13, 2023
d93ffd0
[software] Add mimo_mmse_f16 with wDotp extensions
mbertuletti Sep 29, 2023
363fa4e
[software] Add function descriptions
mbertuletti Sep 29, 2023
cab0290
[software] Update data generation scripts
mbertuletti Oct 2, 2023
a039b79
[software] Transfer data using DMA
mbertuletti Dec 12, 2023
edc2c6c
[software] Add cholesky_f16 with wDotp extensions
mbertuletti Dec 12, 2023
b36b2f7
[software] Add OFDM application
mbertuletti Dec 12, 2023
b7a0c84
[software] Add mimo_mmse_q16 kernels (fixed-point precision)
mbertuletti Jan 4, 2024
a170b74
[software] Fix cfft_radix4_f16 butterfly operations
mbertuletti Jan 8, 2024
380729c
[software] Adapt data generation to folder structure in #PR96
mbertuletti Jan 8, 2024
bb58026
[software] Add complex instructions to cmatmul_f16
mbertuletti Jan 8, 2024
962c313
[software] Handle multiple beamgroups in mimo_mmse_f16
mbertuletti Jan 11, 2024
6ec634e
[software] Add and compile mimo_mmse_f16 with soft-divsqrt
mbertuletti Feb 2, 2024
d8d29b6
[software] Add cmatmul_q16 (complex fixed-point matrix-multiplication)
mbertuletti Feb 20, 2024
7c6c1b6
[software] Modify channel estimation with multiplication by pilots
mbertuletti Mar 5, 2024
ac589b2
[software] Fix data loading in cfft_radix4_f16
mbertuletti Mar 5, 2024
8bed4ad
[software] Adapt to new folder structure in #PR96
mbertuletti Apr 25, 2024
0e6b37c
[software] Remove load of che inputs from inner loop
mbertuletti Apr 25, 2024
0e3894f
[software] Add shuffle instruction in cfft_radix4_f16
mbertuletti Jul 5, 2024
f0270f8
[software] Clean-up complex matmuls
mbertuletti Aug 22, 2024
5f3c750
[software] Add f32 and f16 dotp/axpy kernels
mbertuletti Aug 26, 2024
33701fa
[software] Clean-up data transfers in mimo_mmse_f16
mbertuletti Sep 5, 2024
f0570a5
[software] Add mimo_mmse_f16 with fcdotp extensions
mbertuletti Sep 20, 2024
5984c35
[software] Add mimo_mmse_f8 kernels
mbertuletti Sep 20, 2024
0596309
[software] Clean up folded mimo_mmse_f16 and Ltrisol_f16
mbertuletti Oct 16, 2024
bbab0ca
[software] Adapt generation of data to #PR103
mbertuletti Nov 26, 2024
3b5886b
[github] Change Ubuntu version to 22.04
mbertuletti Dec 6, 2024
3ea70e0
[software] Add matmul kernel with the conflict optimization scheme
yichao-zh Oct 21, 2022
5bee548
[software] Move the port-conflict optimized matmul to matmul_i32p
mbertuletti Dec 10, 2024
0f1de6f
Update CHANGELOG.md
mbertuletti Dec 10, 2024
c53ec74
[software] Add explanation for the use of defines
mbertuletti Dec 19, 2024
264879e
[software] Cross-out defines for Banshee Monte-Carlo simulation
mbertuletti Dec 19, 2024
14 changes: 7 additions & 7 deletions .github/workflows/ci.yml
@@ -46,7 +46,7 @@ jobs:
path: riscv-gnu-toolchain.tzst

tc-llvm:
- runs-on: ubuntu-20.04
+ runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Recover the submodule commit hash
@@ -240,7 +240,7 @@ jobs:
git diff --exit-code

check-control-registers:
- runs-on: ubuntu-20.04
+ runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Install Python requirements
@@ -266,7 +266,7 @@ jobs:
# Build Software #
####################
build-apps-gcc:
- runs-on: ubuntu-20.04
+ runs-on: ubuntu-22.04
needs: tc-gcc
strategy:
matrix:
@@ -297,7 +297,7 @@ jobs:
path: apps-gcc.tzst

build-apps-llvm:
- runs-on: ubuntu-20.04
+ runs-on: ubuntu-22.04
needs: [tc-gcc, tc-llvm]
strategy:
matrix:
@@ -377,7 +377,7 @@ jobs:
# Run Software #
##################
run-apps-gcc:
- runs-on: ubuntu-20.04
+ runs-on: ubuntu-22.04
timeout-minutes: 20
needs: [build-apps-gcc, riscv-isa-sim, verilator-model]
strategy:
@@ -415,7 +415,7 @@ jobs:
make trace

run-apps-llvm:
- runs-on: ubuntu-20.04
+ runs-on: ubuntu-22.04
timeout-minutes: 20
needs: [build-apps-llvm, riscv-isa-sim, verilator-model]
strategy:
@@ -453,7 +453,7 @@ jobs:
make trace

run-apps-halide:
- runs-on: ubuntu-20.04
+ runs-on: ubuntu-22.04
timeout-minutes: 20
needs: [build-apps-halide, riscv-isa-sim, verilator-model]
strategy:
2 changes: 1 addition & 1 deletion .github/workflows/lint.yml
@@ -11,7 +11,7 @@ jobs:
# Check License #
#################
check-license:
- runs-on: ubuntu-20.04
+ runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Install Python requirements
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -51,6 +51,10 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- Add pv.pack.h xpulpv2 instruction
- Add a script to generate random data to preload the L2 memory
- Add stack overflow simulator warning using dedicated CSR
+ - Add mimo_mmse_f16 kernels
+ - Add cmatmul_f16 kernels
+ - Add cfft_radix4_f16 kernels
+ - Add chest_f16 kernels

### Fixed
- Measure the `wfi` stalls and stalls caused by `opc` properly
1 change: 1 addition & 0 deletions python-requirements.txt
@@ -15,3 +15,4 @@ progressbar2
tabulate
sympy
scipy
+ pyflexfloat
14 changes: 12 additions & 2 deletions software/apps/baremetal/Makefile
@@ -20,8 +20,18 @@ APPS := $(patsubst $(APPS_DIR)/%/main.c,%,$(shell find $(APPS_DIR) -name "main.c
BINARIES := $(addprefix $(BIN_DIR)/,$(APPS))
ALL := $(APPS)

- ALL_GCC := $(filter-out matmul_f16 matmul_f32, $(ALL))
- ALL_LLVM := $(filter-out synth_i32 chest_q16 cfft_radix2_q16 cfft_radix4_q16, $(ALL))
+ FP_APPS := axpy_f16 axpy_f32
+ FP_APPS += cfft_radix4_f16 chest_f16 cholesky_f16
+ FP_APPS += cmatmul_f16 matmul_f16 matmul_f32
+ FP_APPS += dotp_f16 dotp_f32
+ FP_APPS += mimo_mmse_f32 mimo_mmse_f16 mimo_mmse_f8 ofdm_f16
+
+ I_APPS := synth_i32
+ I_APPS += cfft_radix2_q16 cfft_radix4_q16 chest_q16 cholesky_q16 cholesky_q32
+ I_APPS += cmatmul_q16 mimo_mmse_q16
+
+ ALL_GCC := $(filter-out $(FP_APPS), $(ALL))
+ ALL_LLVM := $(filter-out $(I_APPS), $(ALL))
Review comment (Member):
Since we follow the convention of adding the i32/f16/... suffix, we could find all FP and I apps automatically with a wildcard, right?
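
The suffix-based selection suggested in this comment could be sketched as follows. This is only an illustration of the classification rule, written in C rather than as Makefile logic; in the Makefile itself the same idea would presumably use `filter` with `%` patterns. The helper name `classify_app` and the rule "letter `f`, `i`, or `q` followed by a digit" are assumptions derived from the app names listed in the hunk above, not code from the PR.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not part of the PR): classify an app by its name suffix.
// Returns 'F' for floating-point apps (e.g. _f8, _f16, _f32),
// 'I' for integer or fixed-point apps (e.g. _i32, _q16, _q32),
// and '?' when the name does not follow the suffix convention.
static inline char classify_app(const char *name) {
  assert(name != NULL);
  // The type suffix is whatever follows the last underscore.
  const char *suffix = strrchr(name, '_');
  if (suffix == NULL)
    return '?';
  char kind = suffix[1];
  if (kind != 'f' && kind != 'i' && kind != 'q')
    return '?';
  // Require a bit width after the letter so that names such as
  // "hello_world" or "test_init" are not misclassified.
  if (!isdigit((unsigned char)suffix[2]))
    return '?';
  return (kind == 'f') ? 'F' : 'I';
}
```

With this rule, `axpy_f16` and `mimo_mmse_f8` land in the FP group, `synth_i32` and `cmatmul_q16` in the I group, and names without a type suffix are left unclassified so they can be handled explicitly.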


# Make all applications
all: $(ALL_GCC)
60 changes: 60 additions & 0 deletions software/apps/baremetal/axpy_f16/main.c
Review comment (Collaborator):
I think it is better to add a "#define local_parallel" instead of commenting the others out.

Reply (Collaborator, Author):
Okay, thanks.
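
A minimal sketch of the `#define`-based selection discussed in this thread, assuming two stand-in kernels. The macro names `AXPY_SIMPLE` and `AXPY_UNROLLED4` and the functions `axpy_simple`, `axpy_unrolled4`, and `run_axpy` are hypothetical and use plain `float` on the host; they are not the MemPool kernels from this PR.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical variant selector (names are illustrative, not from the PR).
// Pass exactly one of -DAXPY_SIMPLE or -DAXPY_UNROLLED4 at build time;
// if neither is given, the unrolled variant is used by default.
#if !defined(AXPY_SIMPLE) && !defined(AXPY_UNROLLED4)
#define AXPY_UNROLLED4
#endif

#if defined(AXPY_SIMPLE) && defined(AXPY_UNROLLED4)
#error "Select only one AXPY variant"
#endif

// Stand-in kernel: y[i] += a * x[i], one element per iteration.
static inline void axpy_simple(float a, const float *x, float *y, size_t n) {
  for (size_t i = 0; i < n; i++)
    y[i] += a * x[i];
}

// Stand-in kernel: same computation, unrolled by four.
// Assumes n is a multiple of four.
static inline void axpy_unrolled4(float a, const float *x, float *y,
                                  size_t n) {
  assert(n % 4 == 0);
  for (size_t i = 0; i < n; i += 4) {
    y[i + 0] += a * x[i + 0];
    y[i + 1] += a * x[i + 1];
    y[i + 2] += a * x[i + 2];
    y[i + 3] += a * x[i + 3];
  }
}

// Single call site: the preprocessor picks the variant, so no kernel call
// has to be commented out in the benchmark body.
static inline void run_axpy(float a, const float *x, float *y, size_t n) {
#if defined(AXPY_SIMPLE)
  axpy_simple(a, x, y, n);
#else
  axpy_unrolled4(a, x, y, n);
#endif
}
```

The benchmark body then calls `run_axpy` once, and switching variants becomes a build flag rather than an edit to `main.c`.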

@@ -0,0 +1,60 @@
// Copyright 2021 ETH Zurich and University of Bologna.
// Licensed under the Apache License, Version 2.0, see LICENSE for details.
// SPDX-License-Identifier: Apache-2.0

// Author: Marco Bertuletti, ETH Zurich

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#include "dma.h"
#include "encoding.h"
#include "printf.h"
#include "runtime.h"
#include "synchronization.h"

#include "data_axpy_f16.h"
#define NUM_BANKS (NUM_CORES * BANKING_FACTOR)

// Vectors for kernel computation
__fp16 l1_X[array_N] __attribute__((aligned(NUM_BANKS), section(".l1_prio")));
__fp16 l1_Y[array_N] __attribute__((aligned(NUM_BANKS), section(".l1_prio")));

#include "baremetal/mempool_axpy_f16.h"
#include "baremetal/mempool_checks.h"

int main() {

uint32_t core_id = mempool_get_core_id();
uint32_t num_cores = mempool_get_core_count();
uint32_t time_init, time_end;
mempool_barrier_init(core_id);

time_init = 0;
time_end = 0;
if (core_id == 0) {
dma_memcpy_blocking(l1_X, l2_X, array_N * sizeof(int16_t));
dma_memcpy_blocking(l1_Y, l2_Y, array_N * sizeof(int16_t));
}
uint32_t register volatile a = *(uint32_t *)&(l2_A)&0x0000FFFF;
mempool_barrier(num_cores);

// PARALLEL, LOCAL ACCESSES
time_init = mempool_get_timer();
mempool_start_benchmark();
axpy_f16vecp_local_unrolled4(a, l1_X, l1_Y, array_N);
mempool_stop_benchmark();
time_end = mempool_get_timer();

mempool_barrier(num_cores);
// Check results
if (core_id == 0) {
uint32_t clock_cycles = (time_end - time_init);
printf("\nKernel execution takes %d clock cycles\n", clock_cycles);
}
mempool_check_f16(l1_Y, l2_Z, 100, 0.1f, 0);
mempool_barrier(num_cores);

return 0;
}
59 changes: 59 additions & 0 deletions software/apps/baremetal/axpy_f32/main.c
Review comment (Collaborator):
Same for this kernel: if we keep all of the kernels in this "main.c", we should select between them with a #define. Otherwise, we should leave only one kernel instead of commenting the others out. What do you think?

Reply (Collaborator, Author):
Thank you.

@@ -0,0 +1,59 @@
// Copyright 2021 ETH Zurich and University of Bologna.
// Licensed under the Apache License, Version 2.0, see LICENSE for details.
// SPDX-License-Identifier: Apache-2.0

// Author: Marco Bertuletti, ETH Zurich

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#include "dma.h"
#include "encoding.h"
#include "printf.h"
#include "runtime.h"
#include "synchronization.h"

#include "data_axpy_f32.h"
#define NUM_BANKS (NUM_CORES * BANKING_FACTOR)

// Vectors for kernel computation
float l1_X[array_N] __attribute__((aligned(NUM_BANKS), section(".l1_prio")));
float l1_Y[array_N] __attribute__((aligned(NUM_BANKS), section(".l1_prio")));

#include "baremetal/mempool_axpy_f32.h"
#include "baremetal/mempool_checks.h"

int main() {

uint32_t core_id = mempool_get_core_id();
uint32_t num_cores = mempool_get_core_count();
uint32_t time_init, time_end;
mempool_barrier_init(core_id);

time_init = 0;
time_end = 0;
if (core_id == 0) {
dma_memcpy_blocking(l1_X, l2_X, array_N * sizeof(int32_t));
dma_memcpy_blocking(l1_Y, l2_Y, array_N * sizeof(int32_t));
}
float register volatile a = l2_A;
mempool_barrier(num_cores);

// PARALLEL
time_init = mempool_get_timer();
mempool_start_benchmark();
axpy_f32p_local_unrolled4(a, l1_X, l1_Y, array_N);
mempool_stop_benchmark();
time_end = mempool_get_timer();

// Check results
if (core_id == 0) {
uint32_t clock_cycles = (time_end - time_init);
printf("\nKernel execution takes %d clock cycles\n", clock_cycles);
}
mempool_check_f32(l1_Y, l2_Z, 100, 0.1f, 0);
mempool_barrier(num_cores);

return 0;
}
5 changes: 3 additions & 2 deletions software/apps/baremetal/axpy_i32/main.c
@@ -16,7 +16,7 @@
#include "runtime.h"
#include "synchronization.h"

- #include "baremetal/mempool_axpy_i32p.h"
+ #include "baremetal/mempool_axpy_i32.h"
#include "baremetal/mempool_checks.h"
#include "data_axpy_i32.h"

@@ -38,11 +38,12 @@ int main() {
dma_memcpy_blocking(l1_Y, l2_Y, array_N * sizeof(int32_t));
error = 0;
}
+ register volatile int32_t a = l2_A;
mempool_barrier(num_cores);

// Benchmark
mempool_start_benchmark();
- calc_axpy_unloop_x4_localbank(l1_X, l1_Y, ALPHA, array_N, core_id, num_cores);
+ calc_axpy_unloop_x4_localbank(l1_X, l1_Y, a, array_N, core_id, num_cores);
mempool_barrier(num_cores);
mempool_stop_benchmark();
