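Journal of the Packaging Branch, China Semiconductor Industry Association

Journal of the Electronic Manufacturing and Packaging Technology Branch, Chinese Institute of Electronics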


Electronics & Packaging (电子与封装) ›› 2024, Vol. 24 ›› Issue (12): 120306. doi: 10.16257/j.cnki.1681-1070.2024.0172

• Circuits and Systems •

Design of High Performance Hardware Accelerator for Large-Scale Matrix Covariance Computation

CHEN Kai1, LIU Chuanzhu2, FENG Jianzhe1, TENG Ziheng1, LI Shiping1, FU Yuxiang2, LI Li2, HE Guoqiang1

  1. Jiangsu Huachuang Microsystem Co., Ltd., Nanjing 211800, China; 2. School of Electronic Science and Engineering, Nanjing University, Nanjing 210023, China
  • Received: 2024-07-04 Online: 2024-12-25 Published: 2024-12-25
  • About the author: CHEN Kai (born 1979), male, from Nanjing, Jiangsu; master's degree, senior engineer. His main research interests are the high-efficiency hardware implementation of signal processing and artificial intelligence algorithms, and the architecture design of heterogeneous multi-core SoC chips.

Abstract: As radar systems evolve toward more channels and higher bandwidth, the real-time cost of covariance computation on large matrices limits the application of space-time adaptive processing (STAP) in advanced airborne radar platforms. This paper proposes a high-performance hardware accelerator design that addresses the growing demand for large-scale matrix covariance processing while improving computational efficiency under low-power constraints. The accelerator consists of a computing unit, a control module, a storage module, and a DMA controller. By processing the matrix in column segments, it supports covariance computation on matrices of up to 256×8192 with limited hardware storage resources. Lower-triangular control logic reduces the amount of computation, and a high-concurrency ping-pong storage scheme combined with a pipelined multiply-accumulate tree improves processing efficiency. Tape-out test results show that, for large-scale matrix covariance computation, the accelerator delivers more than 70 times the performance of a CPU core with comparable computing power.

Key words: covariance, hardware accelerator, pipelined computation, multiply-accumulate tree, ping-pong storage
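
The column-segment and lower-triangle ideas in the abstract can be illustrated with a small software sketch. It is not the authors' hardware design: it assumes the usual sample covariance R = X X^H / K for an N×K complex matrix X, and the names `tree_sum` and `segmented_covariance` are hypothetical. Ping-pong buffering and pipelining are not modelled; the pairwise reduction only mimics the shape of a multiply-accumulate tree.

```python
def tree_sum(terms):
    """Add values pairwise, level by level, as an adder tree would."""
    terms = list(terms)
    while len(terms) > 1:
        if len(terms) % 2:      # odd count: pad with zero so every value pairs up
            terms.append(0)
        terms = [terms[i] + terms[i + 1] for i in range(0, len(terms), 2)]
    return terms[0]


def segmented_covariance(x, seg_cols):
    """Return R = X X^H / K (assumed definition), accumulated per column segment.

    x is a list of rows, each a list of complex samples. Only entries with
    j <= i are computed; the rest follow from Hermitian symmetry.
    """
    n_rows, n_cols = len(x), len(x[0])
    acc = [[0j] * n_rows for _ in range(n_rows)]
    for start in range(0, n_cols, seg_cols):
        seg = [row[start:start + seg_cols] for row in x]   # one column segment
        for i in range(n_rows):
            for j in range(i + 1):                          # lower triangle only
                products = [a * b.conjugate() for a, b in zip(seg[i], seg[j])]
                acc[i][j] += tree_sum(products)
    for i in range(n_rows):
        for j in range(i + 1):
            acc[i][j] /= n_cols
            acc[j][i] = acc[i][j].conjugate()               # mirror to upper half
    return acc
```

Calling `segmented_covariance(x, seg_cols=3)` on a matrix with four columns processes one segment of three columns and one of a single column. Because partial products are accumulated across segments, only one segment needs to be held at a time, which is the property that lets a memory-limited design handle a wide matrix.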
