Progress of Research and Application of Object Detection Based on Deep Learning

doi:10.16257/j.cnki.1681-1070.2022.0114

Abstract

Compared with traditional object detection algorithms, object detection algorithms based on deep learning are more robust in complex scenes and are currently an active research direction. This paper divides them, according to the characteristics of their processing pipeline, into two-stage and one-stage detection algorithms. It focuses on the problems addressed by several classic algorithms and on their advantages and disadvantages, and it surveys their applications in industry. Finally, it discusses the problems that remain open and outlines possible future development trends.

Key words: computer vision, deep learning, object detection, industrial application

CLC number: TP391.4

LYU Lu, CHENG Hu, ZHU Hongtai, DAI Nianshu. Progress of Research and Application of Object Detection Based on Deep Learning[J]. Electronics & Packaging, 2022, 22(1): 010307.


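Journal of the Packaging Branch of the China Semiconductor Industry Association

Journal of the Electronic Manufacturing and Packaging Technology Branch of the Chinese Institute of Electronics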