Electronics & Packaging

• Circuits and Systems •

Hardware Implementation of Neural Network Accelerator Based on RISC-V*

JU Hu, GAO Ying, TIAN Qing, ZHOU Ying

  1. The 58th Institute of China Electronics Technology Group Corporation, Wuxi, Jiangsu 214035, China
  • Received: 2021-12-17  Revised: 2022-05-05  Online: 2022-05-17  Published: 2022-05-17
  • Corresponding author: TIAN Qing
  • Supported by:
    Jiangsu Province Industry Foresight and Key Core Technology R&D Project (BE2021003)


Abstract: Few artificial intelligence (AI) processors are built on the open RISC-V instruction set, while the Advanced RISC Machines (ARM) architecture suffers from an unstable supply chain and offers weak autonomy and controllability. To address these problems, a system-on-chip (SoC) architecture for neural network inference acceleration centered on a RISC-V processor is designed. The SoC is assembled from open-source projects; the instruction set of the deep neural network accelerator is designed on the basis of the versatile tensor accelerator (VTA) architecture; the processor and the VTA are connected through the Advanced eXtensible Interface (AXI), with data transferred via shared memory; and convolution operations and neural network deployment are realized with a deep learning compiler stack. Experimental results show that the proposed architecture can flexibly run a variety of mainstream deep neural network inference tasks: the number of multiply-accumulate (MAC) units reaches 1024, the quantization word length is signed 8-bit integer (INT8), and the compiler stack supports the compilation of mainstream neural networks. Image classification demonstrations with the M-ZFNet and M-ResNet20 networks achieve overall accuracies of 78.95% and 84.81% respectively on a field programmable gate array (FPGA).
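
The abstract states that the accelerator operates on signed 8-bit integer (INT8) operands. The NumPy sketch below shows one way such a quantization step can look on the software side; the symmetric per-tensor scaling (maximum absolute value mapped to 127) and the function names are illustrative assumptions, since the calibration scheme is not specified here.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization of a float tensor to signed INT8.

    Returns the INT8 tensor and the scale needed to map it back to floats.
    (Illustrative only; the paper does not detail its calibration scheme.)
    """
    scale = np.max(np.abs(x)) / 127.0 if np.any(x) else 1.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map an INT8 tensor back to floating point."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight tensor and check the round-trip error.
w = np.random.randn(64, 64).astype(np.float32)
w_q, w_scale = quantize_int8(w)
err = np.max(np.abs(dequantize(w_q, w_scale) - w))
print(f"scale={w_scale:.6f}, max round-trip error={err:.6f}")
```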

Key words: RISC-V, neural network, versatile tensor accelerator, general matrix multiplication, deep learning compiler
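
The key words list general matrix multiplication (GEMM): accelerators built around a MAC array, such as the VTA, typically execute convolutions by lowering them to a matrix multiply. The NumPy sketch below illustrates that lowering with the common im2col transform, using INT8 operands accumulated into INT32; the stride-1, no-padding simplification and all names are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def im2col(x: np.ndarray, kh: int, kw: int) -> np.ndarray:
    """Unfold a (C, H, W) input into a (H_out*W_out, C*kh*kw) matrix (stride 1, no padding)."""
    c, h, w = x.shape
    h_out, w_out = h - kh + 1, w - kw + 1
    cols = np.empty((h_out * w_out, c * kh * kw), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + kh, j:j + kw]          # one receptive field
            cols[i * w_out + j, :] = patch.reshape(-1)
    return cols

def conv2d_int8_gemm(x_q: np.ndarray, w_q: np.ndarray) -> np.ndarray:
    """Convolution as GEMM: INT8 activations (C, H, W) and weights (K, C, kh, kw),
    accumulated in INT32 the way a MAC array would."""
    k, c, kh, kw = w_q.shape
    cols = im2col(x_q, kh, kw).astype(np.int32)            # (P, C*kh*kw)
    w_mat = w_q.reshape(k, -1).astype(np.int32).T          # (C*kh*kw, K)
    out = cols @ w_mat                                     # INT32 accumulation
    h_out = x_q.shape[1] - kh + 1
    w_out = x_q.shape[2] - kw + 1
    return out.T.reshape(k, h_out, w_out)

# Example: random INT8 tensors standing in for quantized activations and weights.
rng = np.random.default_rng(0)
x_q = rng.integers(-128, 128, size=(3, 8, 8), dtype=np.int8)
w_q = rng.integers(-128, 128, size=(16, 3, 3, 3), dtype=np.int8)
y = conv2d_int8_gemm(x_q, w_q)
print(y.shape, y.dtype)   # (16, 6, 6) int32
```

On the actual hardware, the inner matrix product would be tiled onto the 1024-MAC array and driven by the VTA instruction stream rather than computed in NumPy; the sketch only shows the data layout the compiler stack has to produce.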
