Wengan Shao · 5 min read

To enhance the AI inference capability of the Xuantie processors, T-Head has proposed a matrix extension instruction set. The following walks through an AI inference example using the T-Head open source AI deployment package.

1 Docker

Pull the hhb:2.1-matrix image, which supports the matrix extension, from Docker Hub, start a container, and open an interactive terminal in the container:

docker pull hhb4tools/hhb:2.1-matrix
docker run -itd --name=your.hhb2.1-matrix -p 22 -v /your_mount_dir/:/mnt "hhb4tools/hhb:2.1-matrix"
docker exec -it your.hhb2.1-matrix /bin/bash

After entering the container, you can use the hhb --version command to confirm the version:

root@c14249f8243c:/# hhb --version
HHB version: 2.1.x-matrix, build 20230131

Docker installation guide: Docker Engine installation overview

2 Model deployment

Take the deployment of MobileNet as an example. A complete Makefile is already provided in /home/rvm_caffe_mv1_int8; executing the make command converts the model into the sample program, which can run on RISC-V architecture chips that support the matrix extension.

cd /home/rvm_caffe_mv1_int8
make

The key steps in the model deployment process are described below:

2.1 Model compilation

HHB is an offline AI model compilation and optimization tool. Executing the following command quantizes the original model, applies optimizations such as operator fusion, and generates a C code model that executes efficiently on the target chip.

hhb -C --calibrate-dataset ./cat.jpg --model-file ./mobilenetv1.prototxt ./mobilenetv1.caffemodel --data-scale 0.017 --data-mean '104 117 124' --output . --board rvm --quantization-scheme="int8_asym_w_sym" --pixel-format BGR --fuse-conv-relu --channel-quantization --target-layout NHWC

The model compilation options are described as follows:

  • -C: specifies that the main command runs through to C code generation.
  • --calibrate-dataset: specifies the calibration image used for quantization.
  • --model-file: specifies the MobileNet model downloaded to the current directory. A Caffe model is split into two files; the order in which they follow this option does not matter.
  • --data-mean: specifies the per-channel mean subtracted from the input (see the preprocessing sketch after this list).
  • --data-scale: specifies the scale applied after mean subtraction (see the preprocessing sketch after this list).
  • --output: specifies the current directory as the path for the generated files.
  • --board: specifies the target platform.
  • --quantization-scheme: specifies the quantization scheme.
  • --pixel-format: specifies the input image format expected by the model. The default is RGB; set it to BGR if the model was trained on BGR images.
  • --fuse-conv-relu: fuses ReLU into the preceding convolution layer.
  • --channel-quantization: enables per-channel quantization of weights.
  • --target-layout NHWC: specifies the tensor layout.
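
Together, --pixel-format, --data-mean, and --data-scale define the input normalization. Below is a minimal C sketch of that normalization, assuming each channel value is transformed as (pixel - mean) * scale with the means listed in BGR order to match the input format; the actual process.c generated by HHB may differ:

#include <stdint.h>
#include <stddef.h>

/* Per-channel means, matching --data-mean '104 117 124';
 * BGR channel order is assumed here to match --pixel-format BGR. */
static const float mean_bgr[3] = {104.0f, 117.0f, 124.0f};
static const float scale = 0.017f;   /* --data-scale 0.017 */

/* src: H x W x 3 uint8 BGR image; dst: H x W x 3 float tensor (NHWC). */
void normalize_bgr(const uint8_t *src, float *dst, size_t h, size_t w)
{
    for (size_t i = 0; i < h * w; i++)
        for (size_t c = 0; c < 3; c++)
            dst[i * 3 + c] = ((float)src[i * 3 + c] - mean_bgr[c]) * scale;
}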

After the command is executed, multiple files such as main.c and model.c will be generated in the current directory:

  • data.0.tensor: the tensor obtained by decoding and preprocessing cat.jpg.
  • data.0.bin: the binary form of data.0.tensor.
  • main.c: the reference entry point of the sample program.
  • model.c: the C code describing the model structure.
  • hhb.bm: the model file in HHB format.
  • model.params: the weights converted to int8.
  • io.c: helper functions for reading and writing files (a sketch follows this list).
  • io.h: declarations of the file read/write helpers.
  • process.c: the image preprocessing functions.
  • process.h: declarations of the image preprocessing functions.
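
As a reference point for the generated helpers, here is a minimal C sketch of a whole-file binary read in the spirit of io.c; the function name and signature are illustrative, and the code HHB actually generates may differ:

#include <stdio.h>
#include <stdlib.h>

/* Read a whole binary file (e.g. model.params or data.0.bin) into a
 * freshly allocated buffer; returns NULL on failure. */
char *read_binary_file(const char *path, long *out_size)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return NULL;

    fseek(fp, 0, SEEK_END);
    long size = ftell(fp);
    fseek(fp, 0, SEEK_SET);

    char *buf = malloc(size);
    if (buf && fread(buf, 1, size, fp) != (size_t)size) {
        free(buf);
        buf = NULL;
    }
    fclose(fp);

    if (buf && out_size)
        *out_size = size;
    return buf;
}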

2.2 SHL library

SHL is a neural network library API for the Xuantie CPU platform that provides a series of optimized binary libraries. SHL uses the matrix extension for convolution-focused optimizations. In this example, the prebuilt inference library has been placed in the /home/install_nn2 directory; the source code can also be downloaded and rebuilt with the following steps:

git clone -b matrix https://github.com/T-head-Semi/csi-nn2.git
cd csi-nn2
make nn2_rvm
make install

2.3 Executable program

After hhb completes code generation, execute the following compilation command to link against the RVM high-performance library and generate the c_runtime program in the current directory:

riscv64-unknown-linux-gnu-gcc -O2 -g3 -march=rv64gcv_zfh_xtheadc -mabi=lp64d -I/home -I/home/install_nn2/include -I/home/decode/install/include -o c_runtime  main.c model.c io.c process.c -L/home/install_nn2/lib -L/home/decode/install/lib/rv -ljpeg -lpng -lz -lstdc++ -lshl_rvm -lm -static -Wl,--gc-sections

The compilation options are described as follows:

  • -O2 -g3: specifies the optimization level and debug level.
  • -march: specifies the architecture options for the RISC-V matrix extension chip.
  • -mabi: specifies the application binary interface (ABI) for the RISC-V matrix extension chip.
  • -I: specifies directories to search for header files during compilation.
  • main.c model.c io.c process.c: the source files to compile.
  • -L: specifies directories to search for the libraries below.
  • -ljpeg: links the JPEG decoding library.
  • -lpng: links the PNG decoding library.
  • -lz: links zlib.
  • -lstdc++: links the standard C++ library.
  • -lshl_rvm: links the RVM-optimized variant of SHL.
  • -lm: links the standard math library.
  • -static: links statically.
  • -Wl,--gc-sections: discards unused sections at link time.

The GCC version used in this example is V2.6.1; you can check it with the following command:

riscv64-unknown-linux-gnu-gcc -v

3 Simulation

After the compilation is complete, run the program with T-Head's QEMU; the top-5 results are printed to the terminal (a sketch of such a top-5 selection follows the command):

qemu-riscv64 -cpu  rv64,x-v=true,vext_spec=v1.0,vlen=128,x-matrix=on,mlen=128 c_runtime model.params data.0.bin
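
For reference, here is a minimal C sketch of such a top-5 selection over the classifier's output scores; this is illustrative only and not the exact code in the generated main.c:

#include <stdio.h>

/* Print the five highest-scoring classes by repeated selection;
 * O(5 * n), which is cheap for a 1000-class classifier like MobileNet. */
void print_top5(const float *scores, int n)
{
    int used[5] = {-1, -1, -1, -1, -1};

    for (int k = 0; k < 5 && k < n; k++) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            int taken = 0;
            for (int j = 0; j < k; j++)
                if (used[j] == i)
                    taken = 1;
            if (!taken && (best < 0 || scores[i] > scores[best]))
                best = i;
        }
        used[k] = best;
        printf("%d: class %d, score %f\n", k + 1, best, scores[best]);
    }
}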

The QEMU version used in this example is V6.0.94; you can check it with the following command:

qemu-riscv64 -version

4 Other

The RISC-V matrix extension also supports the fp16 data type. Modify the hhb compilation command as follows and keep the other steps unchanged to run inference in fp16:

hhb -C --calibrate-dataset ./cat.jpg --model-file ./mobilenetv1.prototxt ./mobilenetv1.caffemodel \
--data-scale 0.017 --data-mean '104 117 124' --output . --board rvm --quantization-scheme="float16" \
--pixel-format BGR --target-layout NHWC