[1] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[J]. Lecture Notes in Computer Science, 2020, 12346: 213-229.
[2] CHEN H T, WANG Y H, GUO T Y, et al. Pre-trained image processing transformer[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 2021: 12294-12305.
[3] ZHENG S X, LU J C, ZHAO H S, et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 2021: 6877-6886.
[4] ZHU X Z, SU W J, LU L W, et al. Deformable DETR: deformable transformers for end-to-end object detection[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2010.04159.
[5] ZHOU L W, ZHOU Y B, CORSO J J, et al. End-to-end dense video captioning with masked transformer[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018: 8739-8748.
[6] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016: 770-778.
[7] HOWARD A G, ZHU M L, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. [2025-07-17]. https://arxiv.org/abs/1704.04861.
[8] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems, Long Beach, CA, USA, 2017, 30: 5998-6008.
[9] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2010.11929.
[10] HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2006.11239.
[11] SONG J M, MENG C L, ERMON S. Denoising diffusion implicit models[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2010.02502.
[12] NICHOL A, DHARIWAL P, RAMESH A, et al. GLIDE: towards photorealistic image generation and editing with text-guided diffusion models[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2112.10741.
[13] RAMESH A, DHARIWAL P, NICHOL A, et al. Hierarchical text-conditional image generation with CLIP latents[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2204.06125.
[14] ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 2022: 10674-10685.
[15] GUO Y W, YANG C Y, RAO A Y, et al. AnimateDiff: animate your personalized text-to-image diffusion models without specific tuning[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2307.04725.
[16] PEEBLES W, XIE S N. Scalable diffusion models with transformers[C]//2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023: 4172-4182.
[17] ZHOU Z X, NING X F, HONG K, et al. A survey on efficient inference for large language models[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2404.14294.
[18] BAI G J, CHAI Z, LING C, et al. Beyond efficiency: a systematic survey of resource-efficient large language models[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2401.00625.
[19] CROITORU F A, HONDRU V, IONESCU R T, et al. Diffusion models in vision: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(9): 10850-10869.
[20] ZENG Q, HU C G, SONG M L, et al. Diffusion model quantization: a review[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2505.05215.
[21] SHANG Y Z, YUAN Z H, XIE B, et al. Post-training quantization on diffusion models[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 2023: 1972-1981.
[22] LI X Y, LIU Y J, LIAN L, et al. Q-Diffusion: quantizing diffusion models[C]//2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023: 17489-17499.
[23] HE Y F, LIU L P, LIU J, et al. PTQD: accurate post-training quantization for diffusion models[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2305.10657.
[24] BOLYA D, HOFFMAN J. Token merging for fast stable diffusion[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, BC, Canada, 2023: 4599-4603.
[25] KIM M, GAO S Q, HSU Y C, et al. Token fusion: bridging the gap between token pruning and token merging[C]//2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2024: 1372-1381.
[26] GUO R Q, WANG L, CHEN X F, et al. 20.2 A 28nm 74.34TFLOPS/W BF16 heterogenous CIM-based accelerator exploiting denoising-similarity for diffusion models[C]//2024 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2024: 362-364.
[27] KONG W H, HAO Y F, GUO Q, et al. Cambricon-D: full-network differential acceleration for diffusion models[C]//2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), Buenos Aires, Argentina, 2024: 903-914.
[28] KIM S, LEE H, CHO W, et al. Ditto: accelerating diffusion model via temporal value similarity[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2501.11211.
[29] QIN E, SAMAJDAR A, KWON H, et al. SIGMA: a sparse and irregular GEMM accelerator with flexible interconnects for DNN training[C]//2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), San Diego, CA, USA, 2020: 58-70.
[30] LEE T T, LIEW S Y. Parallel routing algorithms in Benes-Clos networks[C]//Proceedings of IEEE INFOCOM '96. Conference on Computer Communications, San Francisco, CA, USA, 1996: 279-286.
[31] ZHANG Z K, WANG H R, HAN S, et al. SpArch: efficient architecture for sparse matrix multiplication[C]//2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), San Diego, CA, USA, 2020: 261-274.
[32] ZHANG G W, ATTALURI N, EMER J S, et al. Gamma: leveraging Gustavson's algorithm to accelerate sparse matrix multiplication[C]//Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Virtual Event, USA, 2021: 687-701.
[33] YANG Y F, EMER J S, SANCHEZ D. Trapezoid: a versatile accelerator for dense and sparse matrix multiplications[C]//2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), Buenos Aires, Argentina, 2024: 931-945.
[34] HEO J, PUTRA A, YOON J, et al. EXION: exploiting inter- and intra-iteration output sparsity for diffusion models[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2501.05680.
[35] CHEN M Z, SHAO W Q, XU P, et al. EfficientQAT: efficient quantization-aware training for large language models[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2407.11062.
[36] TSENG A, CHEE J, SUN Q Y, et al. QuIP#: even better LLM quantization with Hadamard incoherence and lattice codebooks[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2402.04396.
[37] SHAO W Q, CHEN M Z, ZHANG Z Y, et al. OmniQuant: omnidirectionally calibrated quantization for large language models[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2308.13137.
[38] YOUNG S I, ZHE W, TAUBMAN D, et al. Transform quantization for CNN compression[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(9): 5700-5714.
[39] CHENG J, WU J X, LENG C, et al. Quantized CNN: a unified approach to accelerate and compress convolutional networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(10): 4730-4743.
[40] FRANTAR E, ASHKBOOS S, HOEFLER T, et al. GPTQ: accurate post-training quantization for generative pre-trained transformers[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2210.17323.
[41] TAI Y S, WU A Y. MPTQ-ViT: mixed-precision post-training quantization for vision transformer[EB/OL]. [2025-07-17]. https://arxiv.org/abs/2401.14895.
[42] LI M Y, LIN J, MENG C L, et al. Efficient spatially sparse inference for conditional GANs and diffusion models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(12): 14465-14480.
[43] ZOU S Y, TANG J J, ZHOU Y Y, et al. Towards efficient diffusion-based image editing with instant attention masks[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(7): 7864-7872.
[44] KANG X L, WEI Q W, LI N Y, et al. SUArch: accelerating layer-wise N:M sparse pattern with a unified architecture for deep-learning edge device[C]//Proceedings of the 30th Asia and South Pacific Design Automation Conference, Tokyo, Japan, 2025: 700-705.
[45] QU Z, LIU L, TU F B, et al. DOTA: detect and omit weak attentions for scalable transformer acceleration[C]//Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 2022: 14-26.