CAPIPrecis is an abstraction layer (AFU-Control) that simplifies communication and buffering with IBM CAPI Power Service Layer (PSL). Each control unit handling different aspects of communication with the PSL, it simplifies the interface for sending and receiving memory transactions, and preserves the fine-grain random or sequential memory access pattern. Furthermore our layer differentiate its self from other CAPI frameworks, by keeping the PSL cache support.
- Design your Compute Units (CUs) without the need to interface with the PSL directly (CU-centric).
- Supports PSL cache access.
- Supports fine grain random, or streaming access, for it keeps the command-buffer exposed and flexible for any CAPI-PSL supported commands.
- You will only be concerned with sending PSL supported commands, for example you can send reads/writes without the need to check parity, error reporting, and latency requirements. Just push commands to their corresponding buffers, and wait for the response.
- Each sent command can be coupled with meta-data (for example CU_ID, request_size, etc.), and then receive data or responses coupled with those elements for an easier multi-CU design.
- Open-source and minimalistic design.
- OpenMP is already a feature of the compiler, so this step is not necessary.
CAPI@Precis:~$ sudo apt-get install libomp-dev
- Simulation and Synthesis
- This framework was developed on Ubuntu 18.04 LTS.
- ModelSim is used for simulation and installed along side Quartus II 18.1.
- Synthesis requires ALTERA Quartus, starting from release 15.0 of Quartus II should be fine.
- Nallatech P385-A7 card with the Altera Stratix-V-GX-A7 FPGA is supported.
- Environment Variable setup,
HOME
andALTERAPATH
depend on where you clone the repository and install ModelSim.
#quartus 18.1 env-variables
export ALTERAPATH="${HOME}/intelFPGA/18.1"
export QUARTUS_INSTALL_DIR="${ALTERAPATH}/quartus"
export LM_LICENSE_FILE="${ALTERAPATH}/licenses/psl_A000_license.dat:${ALTERAPATH}/licenses/common_license.dat"
export QSYS_ROOTDIR="${ALTERAPATH}/quartus/sopc_builder/bin"
export PATH=$PATH:${ALTERAPATH}/quartus/bin
export PATH=$PATH:${ALTERAPATH}/nios2eds/bin
#modelsim env-variables
export PATH=$PATH:${ALTERAPATH}/modelsim_ase/bin
#CAPIPrecis project folder
export CAPI_PROJECT=00_CAPIPrecis
#CAPI framework env variables
export PSLSE_INSTALL_DIR="${HOME}/Documents/github_repos/${CAPI_PROJECT}/01_capi_integration/pslse"
export VPI_USER_H_DIR="${ALTERAPATH}/modelsim_ase/include"
export PSLVER=8
export BIT32=n
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$PSLSE_INSTALL_DIR/libcxl:$PSLSE_INSTALL_DIR/afu_driver/src"
#PSLSE env variables
export PSLSE_SERVER_DIR="${HOME}/Documents/github_repos/${CAPI_PROJECT}/01_capi_integration/accelerator_sim/server"
export PSLSE_SERVER_DAT="${PSLSE_SERVER_DIR}/pslse_server.dat"
export SHIM_HOST_DAT="${PSLSE_SERVER_DIR}/shim_host.dat"
export PSLSE_PARMS="${PSLSE_SERVER_DIR}/pslse.parms"
export DEBUG_LOG_PATH="${PSLSE_SERVER_DIR}/debug.log"
- AFU Communication with PSL
- please check (CAPI User's Manual).
- Clone CAPIPrecis.
CAPI@Precis:~$ git clone https://github.com/atmughrabi/CAPIPrecis.git
- From the home directory go to the CAPIPrecis directory:
CAPI@Precis:~$ cd CAPIPrecis/
- Setup the CAPI submodules.
CAPI@Precis:~CAPIPrecis$ git submodule update --init --recursive
- (Optional) From the root directory go to benchmark directory:
CAPI@Precis:~CAPIPrecis$ cd 00_bench/
- The default compilation is
openmp
mode:
CAPI@Precis:~CAPIPrecis/00_bench$ make
- From the root directory you can modify the Makefile with the directories you need for you custom project:
CAPI@Precis:~CAPIPrecis/00_bench$ make run
- OR
CAPI@Precis:~CAPIPrecis/00_bench$ make run-openmp
- Example output:
*-----------------------------------------------------*
| Number of Threads : 8 |
-----------------------------------------------------
*-----------------------------------------------------*
| Allocating Data Arrays (SIZE) 131072 |
-----------------------------------------------------
*-----------------------------------------------------*
| Populating Data Arrays (Seed) 27491095 |
-----------------------------------------------------
*-----------------------------------------------------*
| Copy data (SIZE) 131072 |
-----------------------------------------------------
| Time (Seconds) | 0.00000900000000000000 |
| BandWidth MB/s | 55555.55555555555474711582 |
| BandWidth GB/s | 54.25347222222222143273 |
*-----------------------------------------------------*
| Data Missmatched (#) | 0 |
-----------------------------------------------------
| PASS |
*-----------------------------------------------------*
| Freeing Data Arrays (SIZE) 131072 |
-----------------------------------------------------
- NOTE: You need CAPI environment setup on your machine (tested on Power8 8247-22L).
- CAPI Education Videos
- We are not supporting CAPI-SNAP since our processing suite supports accelerator-cache. SNAP does not support this feature yet. So if you are interested in streaming applications or do not benefit from caches SNAP is also good candidate.
- To check the SNAP framework: https://github.com/open-power/snap.
- NOTE: You need three open terminals, for running vsim, pslse, and the application.
- (Optional) From the root directory go to benchmark directory:
CAPI@Precis:~CAPIPrecis$ cd 00_bench/
- On terminal 1. Run [ModelSim vsim] for
simulation
this step is not needed when running on real hardware, this just simulates the AFU that resides on your (CAPI supported) FPGA :
CAPI@Precis:~CAPIPrecis/00_bench$ make run-vsim
- The previous step will execute vsim.tcl script to compile the design, to start the running the simulation just execute the following command at the transcript terminal of ModelSim :
r #recompile design
,c #run simulation
ModelSim> r
ModelSim> c
- On Terminal 2. Run PSL Simulation Engine (PSLSE) for
simulation
this step is not needed when running on real hardware, this just emulates the PSL that resides on your (CAPI supported) IBM-PowerPC machine :
CAPI@Precis:~CAPIPrecis/00_bench$ make run-pslse
- On Terminal 3. Run the algorithm that communicates with the PSLSE (simulation):
CAPI@Precis:~CAPIPrecis/00_bench$ make run-capi-sim
- On Terminal 3. Run the algorithm that communicates with the PSLSE (simulation) printing out stats based on the responses received to the AFU-Control layer:
CAPI@Precis:~CAPIPrecis/00_bench$ make run-capi-sim-verbose
- Example output: please check (CAPI User's Manual), for each response explanation. The stats are labeled
RESPONSE_COMMANADTYPE_count
.
*-----------------------------------------------------*
| WEDStruct structure |
-----------------------------------------------------
| wed | 0x557ba26c1600 |
| wed->size_send | 131072 |
| wed->size_recive | 131072 |
| wed->array_send | 0x7fc19b290080 |
| wed->array_receive | 0x7fc19b20f080 |
-----------------------------------------------------
*-----------------------------------------------------*
| AFU configuration START |
-----------------------------------------------------
| status | 0 |
*-----------------------------------------------------*
| AFU configuration DONE |
-----------------------------------------------------
| status | 1111000000000001 |
*-----------------------------------------------------*
*-----------------------------------------------------*
| CU configuration START |
-----------------------------------------------------
| status | 0 |
*-----------------------------------------------------*
| CU configuration DONE |
-----------------------------------------------------
| status | 333b1000008 |
*-----------------------------------------------------*
*-----------------------------------------------------*
| AFU Stats |
-----------------------------------------------------
| CYCLE_count | 55060 |
| Time (Seconds) | 0.00022023999999999999 |
-----------------------------------------------------
*-----------------------------------------------------*
| Total BW |
-----------------------------------------------------
| Data MB | 1.00000000000000000000 |
| Data GB | 0.00097656250000000000 |
-----------------------------------------------------
| BandWidth MB/s | 4540.50127134035574272275 |
| BandWidth GB/s | 4.43408327279331615500 |
*-----------------------------------------------------*
| Total Read BW |
-----------------------------------------------------
| Data MB | 0.50000000000000000000 |
| Data GB | 0.00048828125000000000 |
-----------------------------------------------------
| BandWidth MB/s | 2270.25063567017787136137 |
| BandWidth GB/s | 2.21704163639665807750 |
*-----------------------------------------------------*
| Total Write BW |
-----------------------------------------------------
| Data MB | 0.50000000000000000000 |
| Data GB | 0.00048828125000000000 |
-----------------------------------------------------
| BandWidth MB/s | 2270.25063567017787136137 |
| BandWidth GB/s | 2.21704163639665807750 |
*-----------------------------------------------------*
| Effective total BW |
-----------------------------------------------------
| Data MB | 1.00000000000000000000 |
| Data GB | 0.00097656250000000000 |
-----------------------------------------------------
| BandWidth MB/s | 4540.50127134035574272275 |
| BandWidth GB/s | 4.43408327279331615500 |
*-----------------------------------------------------*
| Effective Read BW |
-----------------------------------------------------
| Data MB | 0.50000000000000000000 |
| Data GB | 0.00048828125000000000 |
-----------------------------------------------------
| BandWidth MB/s | 2270.25063567017787136137 |
| BandWidth GB/s | 2.21704163639665807750 |
*-----------------------------------------------------*
| Effective Write BW |
-----------------------------------------------------
| Data MB | 0.50000000000000000000 |
| Data GB | 0.00048828125000000000 |
-----------------------------------------------------
| BandWidth MB/s | 2270.25063567017787136137 |
| BandWidth GB/s | 2.21704163639665807750 |
*-----------------------------------------------------*
| Byte Transfer Stats |
-----------------------------------------------------
| READ_BYTE_count | 524288 |
| WRITE_BYTE_count | 524288 |
-----------------------------------------------------
| PREFETCH_READ_BYTE_count | 0 |
| PREFETCH_WRITE_BYTE_count | 0 |
*-----------------------------------------------------*
| Responses Stats |
-----------------------------------------------------
| DONE_count | 8415 |
-----------------------------------------------------
| DONE_READ_count | 4096 |
| DONE_WRITE_count | 4096 |
-----------------------------------------------------
| DONE_RESTART_count | 206 |
-----------------------------------------------------
| DONE_PREFETCH_READ_count | 8 |
| DONE_PREFETCH_WRITE_count | 8 |
-----------------------------------------------------
| PAGED_count | 206 |
| FLUSHED_count | 0 |
| AERROR_count | 0 |
| DERROR_count | 0 |
| FAILED_count | 0 |
| NRES_count | 0 |
| NLOCK_count | 0 |
*-----------------------------------------------------*
These steps require ALTERA Quartus synthesis tool, starting from release 15.0 of Quartus II should be fine.
- From the root directory (using terminal)
CAPI@Precis:~CAPIPrecis$ make run-synth
or
CAPI@Precis:~CAPIPrecis$ cd 01_capi_integration/accelerator_synth/
CAPI@Precis:~CAPIPrecis/01_capi_integration/accelerator_synth$ make
- Check CAPIPrecis.sta.rpt for timing requirements violations
- From the root directory (using terminal)
CAPI@Precis:~CAPIPrecis$ make run-synth-gui
or
CAPI@Precis:~CAPIPrecis$ cd 01_capi_integration/accelerator_synth/
CAPI@Precis:~CAPIPrecis/01_capi_integration/accelerator_synth$ make gui
- Synthesize using Quartus GUI
- From the root directory (using terminal) runs a list of seeds synthesizing for each.
CAPI@Precis:~CAPIPrecis$ make run-synth-sweep
or
CAPI@Precis:~CAPIPrecis$ cd 01_capi_integration/accelerator_synth/
CAPI@Precis:~CAPIPrecis/01_capi_integration/accelerator_synth$ make sweep
- From the root directory go to CAPI integration directory -> CAPIPrecis binary images:
CAPI@Precis:~CAPIPrecis$ cd 01_capi_integration/accelerator_bin/
- Flash the image to the corresponding
#define DEVICE
you can modify it according to your Power8 system from00_bench/include/capi_utils/capienv.h
CAPI@Precis:~CAPIPrecis/01_capi_integration/accelerator_bin$ sudo capi-flash-script capi-precis_GITCOMMIT#_DATETIME.rbf
- (Optional) From the root directory go to benchmark directory:
CAPI@Precis:~CAPIPrecis$ cd 00_bench/
- Runs algorithm that communicates with the or PSL (real HW):
CAPI@Precis:~CAPIPrecis/00_bench$ make run-capi-fpga
This run outputs different AFU-Control stats based on the responses received from the PSL
- Runs algorithm that communicates with the or PSL (real HW):
CAPI@Precis:~CAPIPrecis/00_bench$ make run-capi-fpga-verbose
Usage: capi-precis-openmp [OPTION...]
-s <size> -n [num threads] -a [afu config] -c [cu config]
CAPIPrecis is an open source CAPI enabled FPGA processing framework, it is
designed to abstract the PSL layer for a faster development cycles
-a, --afu-config=[DEFAULT:0x1]
AFU-Control buffers(read/write/prefetcher)
arbitration 0x01 round robin 0x10 fixed priority
-b, --afu-config2=[DEFAULT:0x0]
AFU-Control MMIO register for extensible features
-c, --cu-config=[DEFAULT:0x01]
CU configurations for requests cached/non
cached/prefetcher active or not check Makefile for
more examples
-d, --cu-config2=[DEFAULT:0x00]
CU-Control MMIO register for extensible features
-m, --cu-mode=[DEFAULT:0x03]
CU configurations for read/write engines.
disable-both-engines-[0] write-engine-[1]
read-engine-[2] enable-both-engines-[3]
-n, --num-threads=[DEFAULT:MAX]
Default: MAX number of threads the system has
-s, --size=SIZE:512
Size of array to be sent and copied back
-?, --help Give this help list
--usage Give a short usage message
-V, --version Print program version
Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.
00_bench
- The SW side that runs on the host(CPU)include
src
algorithms
openmp
algorithm.c
- Contains a version of the code that runs on CPU.
capi
algorithm.c
- Contains a version of the code that runs on FPGA.
capi-utils
capienv.c
- Has the functions for setting up CAPI with our application (setup/start/wait/error).
main
capi-precis.c
- Our main program execution starts from here.
tests
test_afu.c
- test file to try things before integration.test_capi-precis.c
- another test bed to try some functionalities.
utils
mt19937.c
- Random number generator.myMalloc.c
- Custom malloc wrapper for aligned allocations.timer.c
- simple time measurement library.
Makefile
- This makefile handles the compilation/and simulation of CAPIPrecis
01_capi_integration
- The SW side that runs on the Device(FPGA)/ModelSimaccelerator_rtl
cu_control
- CU Units reside in this folder (read/write engines)afu_pkgs
- global packagesafu_control
- AFU Control units in this folder
accelerator_bin
- Binary images of CAPIPrecis (passed time requirements)capi-precis_GITCOMMIT#_DATETIME.rbf
- flash binary imagesynthesis_reports_capi-precis_GITCOMMIT#_DATETIME
- synthesis reports for that binary image
accelerator_sim
server
- files for PSLSE layerpslse.parms
pslse_server.dat
shim_host.dat
sim
- ModelSim file and tcl scriptsvsim.tcl
- when adding files to you RTL project you need to update this scriptinerface.do
- Wave files for ModelSim simulation
accelerator_synth
- synthesis scriptscapi
- This folder contains helper scripts that generated the files necessary for synthesizing the project.psl_fpga
- This folder contains the RTL for the PSL layer, IPs, and the AFU topcapi-precis.tcl
Makefile
- Synthesis Makefile that invokes Quartus.
pslse
libcxl
capi-utils
Makefile
- Global makefile
Report bugs to: