Enhancing CNN Inference Time and Reducing Latency on Edge and Resource-Constrained Systems through Quantization

Abstract

Systems that use Deep Learning (DL) models rely heavily on cloud computing for inference in domains such as traffic monitoring, healthcare, and IoT. However, applications like autonomous vehicles, surveillance systems, and spacecraft are transitioning towards edge computing because of bandwidth limitations, transmission delays, and network connectivity issues. Edge computing mitigates these challenges by processing data and models locally on the device, reducing latency. Deploying Deep Neural Networks (DNNs) on edge devices, however, is constrained by limited memory and computing power. DNNs typically use 32-bit floating-point precision for accuracy, which inflates model size. Quantization addresses this by converting high-precision floating-point (FP) values to lower-precision or integer (INT) values, improving throughput and latency. This paper presents a comparative study of the accuracy and performance of 64-bit, 32-bit, and 16-bit floating-point precision, along with 8-bit integer precision, using Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT), on multiple networks including CustomNets, with inference performed on a GPU as well as a Xilinx Deep Processing Unit (DPU). The models were evaluated on a sample of the EuroSAT remote sensing dataset. Quantizing models to FP16 and INT8 yielded 2-3x and 4x faster inference, respectively, with a negligible accuracy drop of 1-4%. FP64 exhibited a 2-3x decrease in speed but a slight accuracy improvement (2%). On the DPU, models showed minimal accuracy degradation of about 1%. Overall, model size decreased by a constant 2x and 4x from FP32 to FP16 and INT8, respectively, while increasing by 2x for FP64. This reduction in size, with negligible loss in accuracy, enables onboard storage along with faster and accurate inference on resource-constrained systems. © 2024 IEEE.
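The abstract describes FP16 and INT8 quantization only at a high level. The sketch below illustrates the general post-training quantization (PTQ) workflow in PyTorch eager mode; the SmallCNN architecture, input size, calibration loop, and random calibration data are illustrative assumptions and do not reproduce the paper's actual models, GPU/DPU toolchain, or evaluation setup.

# Minimal PTQ sketch in PyTorch (eager mode). SmallCNN and the calibration
# data are hypothetical stand-ins, not the models used in the paper.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # FP32 -> INT8 at input
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)
        self.dequant = torch.quantization.DeQuantStub()  # INT8 -> FP32 at output

    def forward(self, x):
        x = self.quant(x)
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return self.dequant(x)

# FP16: a simple half-precision cast (speedups are typically realized on GPU).
model_fp16 = SmallCNN().eval().half()

# INT8 PTQ: attach a quantization config, observe activation ranges on a small
# calibration set, then convert weights and activations to INT8 kernels.
model_int8 = SmallCNN().eval()
model_int8.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model_int8, inplace=True)
with torch.no_grad():
    for _ in range(32):                           # hypothetical calibration batches
        model_int8(torch.randn(8, 3, 64, 64))     # placeholder for 64x64 RGB tiles
torch.quantization.convert(model_int8, inplace=True)

In this kind of workflow, the roughly 2x (FP16) and 4x (INT8) size reductions reported in the abstract follow directly from the narrower weight storage, while the accuracy impact depends on the calibration data and quantization scheme.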

Year of Conference
2024
Conference Name
2nd IEEE International Conference on Networks, Multimedia and Information Technology, NMITCON 2024
Publisher
Institute of Electrical and Electronics Engineers Inc.
ISBN Number
979-835037289-2
DOI
10.1109/NMITCON62075.2024.10699069
Conference Proceedings
