Electronics & Packaging

• Circuits and Systems •

Hardware Implementation of Neural Network Accelerator Based on RISC-V*

JU Hu, GAO Ying, TIAN Qing, ZHOU Ying

  1. The 58th Institute of China Electronics Technology Group Corporation, Wuxi, Jiangsu 214035, China
  • Received: 2021-12-17  Revised: 2022-05-05  Online: 2022-05-17  Published: 2022-05-17
  • Corresponding author: TIAN Qing
  • Supported by:
    Jiangsu Province Industry Foresight and Key Core Technology R&D Project (BE2021003)


Abstract: Few artificial intelligence (AI) processors are built on the open RISC-V instruction set, while the Advanced RISC Machines (ARM) architecture suffers from an unstable supply chain and offers weak autonomy and controllability. To address these problems, a system-on-chip (SoC) architecture for neural network inference acceleration centered on a RISC-V processor is designed. The SoC is assembled from open-source projects; the instruction set of the deep neural network accelerator is designed on the basis of the versatile tensor accelerator (VTA) architecture; the processor and the VTA are connected through the Advanced eXtensible Interface (AXI), with data transferred via shared memory; and convolution operations and neural network deployment are realized with a deep learning compiler stack. Experimental results show that the proposed architecture can flexibly run a variety of mainstream deep neural network inference tasks: the number of multiply-accumulate (MAC) units reaches 1024, the quantization word length is signed 8-bit integer (INT8), and the compiler stack supports the compilation of mainstream neural networks. Image classification demonstrations with the M-ZFNet and M-ResNet20 networks achieve overall accuracies of 78.95% and 84.81% respectively on a field programmable gate array (FPGA).
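
The abstract states that the accelerator operates on signed 8-bit integer (INT8) operands. The NumPy sketch below shows one way such a quantization step can look on the software side; the symmetric per-tensor scaling (maximum absolute value mapped to 127) and the function names are illustrative assumptions, since the calibration scheme is not specified here.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization of a float tensor to signed INT8.

    Returns the INT8 tensor and the scale needed to map it back to floats.
    (Illustrative only; the paper does not detail its calibration scheme.)
    """
    scale = np.max(np.abs(x)) / 127.0 if np.any(x) else 1.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map an INT8 tensor back to floating point."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight tensor and check the round-trip error.
w = np.random.randn(64, 64).astype(np.float32)
w_q, w_scale = quantize_int8(w)
err = np.max(np.abs(dequantize(w_q, w_scale) - w))
print(f"scale={w_scale:.6f}, max round-trip error={err:.6f}")
```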

Key words: RISC-V, neural network, versatile tensor accelerator, general matrix multiplication, deep learning compiler
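
The key words list general matrix multiplication (GEMM): accelerators built around a MAC array, such as the VTA, typically execute convolutions by lowering them to a matrix multiply. The NumPy sketch below illustrates that lowering with the common im2col transform, using INT8 operands accumulated into INT32; the stride-1, no-padding simplification and all names are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def im2col(x: np.ndarray, kh: int, kw: int) -> np.ndarray:
    """Unfold a (C, H, W) input into a (H_out*W_out, C*kh*kw) matrix (stride 1, no padding)."""
    c, h, w = x.shape
    h_out, w_out = h - kh + 1, w - kw + 1
    cols = np.empty((h_out * w_out, c * kh * kw), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + kh, j:j + kw]          # one receptive field
            cols[i * w_out + j, :] = patch.reshape(-1)
    return cols

def conv2d_int8_gemm(x_q: np.ndarray, w_q: np.ndarray) -> np.ndarray:
    """Convolution as GEMM: INT8 activations (C, H, W) and weights (K, C, kh, kw),
    accumulated in INT32 the way a MAC array would."""
    k, c, kh, kw = w_q.shape
    cols = im2col(x_q, kh, kw).astype(np.int32)            # (P, C*kh*kw)
    w_mat = w_q.reshape(k, -1).astype(np.int32).T          # (C*kh*kw, K)
    out = cols @ w_mat                                     # INT32 accumulation
    h_out = x_q.shape[1] - kh + 1
    w_out = x_q.shape[2] - kw + 1
    return out.T.reshape(k, h_out, w_out)

# Example: random INT8 tensors standing in for quantized activations and weights.
rng = np.random.default_rng(0)
x_q = rng.integers(-128, 128, size=(3, 8, 8), dtype=np.int8)
w_q = rng.integers(-128, 128, size=(16, 3, 3, 3), dtype=np.int8)
y = conv2d_int8_gemm(x_q, w_q)
print(y.shape, y.dtype)   # (16, 6, 6) int32
```

On the actual hardware, the inner matrix product would be tiled onto the 1024-MAC array and driven by the VTA instruction stream rather than computed in NumPy; the sketch only shows the data layout the compiler stack has to produce.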
