1. Jiang Y, Li X, Luo H, Yin S, Kaynak O. Quo vadis artificial intelligence? Discover Artificial Intelligence 2022. [DOI: 10.1007/s44163-022-00022-8]
Abstract
The study of artificial intelligence (AI) has been a continuous endeavor of scientists and engineers for over 65 years. The simple contention is that human-created machines can do more than just labor-intensive work; they can develop human-like intelligence. Whether we are aware of it or not, AI has penetrated our daily lives, playing novel roles in industry, healthcare, transportation, education, and many other areas close to the general public. AI is believed to be one of the major drivers of socio-economic change. At the same time, AI contributes to the advancement of state-of-the-art technologies across many fields of study, serving as a helpful tool for groundbreaking research. However, the prosperity AI enjoys today was not established smoothly. Over the past decades, AI has struggled through several historical stages, including multiple winters. Therefore, at this juncture, to inform future development, it is time to discuss the past and present of AI and to offer an outlook. In this article, we discuss from a historical perspective the challenges faced along the evolutionary path of both AI tools and AI systems. In particular, beyond the technical development of AI in the short to mid term, we also present thoughts and insights on the symbiotic relationship between AI and humans in the long run.
2. Valenzuela W, Saavedra A, Zarkesh-Ha P, Figueroa M. Motion-Based Object Location on a Smart Image Sensor Using On-Pixel Memory. Sensors (Basel) 2022; 22(17):6538. [PMID: 36080999] [PMCID: PMC9460117] [DOI: 10.3390/s22176538]
Abstract
Object location is a crucial computer vision method, often used as a stage preceding object classification. Object-location algorithms require high computational and memory resources, which poses a difficult challenge for portable and low-power devices, even when the algorithm is implemented using dedicated digital hardware. Moving part of the computation to the imager can reduce the memory requirements of the digital post-processor and exploit the parallelism available in the algorithm. This paper presents the architecture of a Smart Imaging Sensor (SIS) that performs object location using pixel-level parallelism. The SIS is based on a custom smart pixel, capable of computing frame differences in the analog domain, and a digital coprocessor that performs morphological operations and connected-component analysis to determine the bounding boxes of the detected objects. The smart-pixel array computes temporal differences on-pixel, using analog memories to detect motion between consecutive frames. Our SIS can operate in two modes: (1) as a conventional image sensor, or (2) as a smart sensor that delivers a binary image highlighting the pixels where movement was detected between consecutive frames, together with the object bounding boxes. In this paper, we present the design of the smart pixel and evaluate its performance using post-layout parasitic extraction on a 0.35 µm mixed-signal CMOS process. With a pixel pitch of 32 µm × 32 µm, we achieved a fill factor of 28%. To evaluate the scalability of the design, we ported the layout to a 0.18 µm process, achieving a fill factor of 74%. On an array of 320 × 240 smart pixels, the circuit operates at a maximum frame rate of 3846 frames per second. The digital coprocessor was implemented and validated on a Xilinx Artix-7 XC7A35T field-programmable gate array that runs at 125 MHz, locates objects in a video frame in 0.614 µs, and consumes 58 mW.
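As an aid to intuition, the following Python sketch mirrors the processing chain described above in software: temporal frame differencing, thresholding, morphological clean-up, and connected-component bounding-box extraction. It is a minimal approximation of the SIS pipeline, not the authors' mixed-signal implementation; the threshold value and the 3 × 3 structuring element are illustrative assumptions.

```python
# Minimal software sketch of the SIS pipeline: frame differencing,
# thresholding, morphological clean-up, connected-component bounding boxes.
import numpy as np
from scipy import ndimage

def locate_moving_objects(prev_frame, curr_frame, thresh=25):
    """Return bounding boxes (row0, row1, col0, col1) of moving regions."""
    # Temporal difference (computed in the analog domain on the smart pixel).
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    motion = diff > thresh  # binary motion mask

    # Morphological opening suppresses isolated noisy pixels
    # (the digital coprocessor performs the morphological operations).
    motion = ndimage.binary_opening(motion, structure=np.ones((3, 3)))

    # Connected-component labelling yields one bounding box per object.
    labels, _ = ndimage.label(motion)
    return [(sl[0].start, sl[0].stop, sl[1].start, sl[1].stop)
            for sl in ndimage.find_objects(labels)]

# Example on synthetic 320x240 frames:
prev = np.zeros((240, 320), dtype=np.uint8)
curr = prev.copy()
curr[100:120, 150:180] = 200  # a "moving" bright patch
print(locate_moving_objects(prev, curr))  # -> [(100, 120, 150, 180)]
```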
Affiliation(s)
- Wladimir Valenzuela
- Department of Electrical Engineering, Faculty of Engineering, Universidad de Concepción, Concepción 4070386, Chile
- Antonio Saavedra
- Embedded Systems Architecture Group, Institute of Computer Engineering and Microelectronics, Electrical Engineering and Computer Science Faculty, Technische Universität Berlin, 10623 Berlin, Germany
- Payman Zarkesh-Ha
- Department of Electrical and Computer Engineering (ECE), School of Engineering, University of New Mexico, Albuquerque, NM 87131-1070, USA
- Miguel Figueroa
- Department of Electrical Engineering, Faculty of Engineering, Universidad de Concepción, Concepción 4070386, Chile
3. Research on the Lightweight Deployment Method of Integration of Training and Inference in Artificial Intelligence. Applied Sciences (Basel) 2022. [DOI: 10.3390/app12136616]
Abstract
In recent years, the continuous development of artificial intelligence has largely been driven by algorithms and computing power. This paper discusses the training and inference of artificial intelligence models from the perspective of computing power. Addressing the computing-power problem requires considering performance, cost, power consumption, flexibility, and robustness together. At present, the training of artificial intelligence models is mostly based on GPU platforms. Although GPUs offer high computing performance, their power consumption and cost are relatively high, making them unsuitable as an implementation platform in application scenarios with strict power and cost constraints. The emergence of high-performance heterogeneous-architecture devices provides a new path for integrating artificial intelligence training and inference. Typical multi-core heterogeneous devices from Xilinx and Intel integrate multiple high-performance processors and FPGA fabric on a single chip. Compared with the current practice of separating training and inference, heterogeneous architectures integrate both on a single chip, balancing the differing demands of the two workloads, reducing the cost and power consumption of training and of deploying AI inference, achieving the goal of lightweight computation, and improving the flexibility and robustness of the system. In this paper, based on the LeNet-5 network structure, we first introduce the process of training the network on the multi-core CPU of Xilinx's latest multi-core heterogeneous-architecture device, the MPSoC. We then study how to convert the network model into a hardware-logic implementation, transferring the model parameters from the device's processing system to a hardware accelerator built from programmable logic over the chip's AXI bus interface. Finally, the integrated implementation is tested and verified on a Xilinx MPSoC. According to the test results, the recognition accuracy of this lightweight deployment scheme reaches 99.5% on the MNIST dataset and 75.4% on the CIFAR-10 dataset, while the average processing time per frame is only 2.2 ms. In addition, the power consumption of the network within the SoC hardware accelerator is only 1.363 W at 100 MHz.
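For orientation, the sketch below gives a minimal PyTorch definition of the classic LeNet-5 structure the paper builds on. The layer sizes follow LeCun's original network; the input shape and any training details here are our assumptions, not taken from the paper.

```python
# Minimal sketch of the classic LeNet-5 structure (channel counts per
# LeCun's original network); input shape and activations are assumptions.
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, in_channels=1, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 6, kernel_size=5), nn.Tanh(),  # C1
            nn.AvgPool2d(2),                                      # S2
            nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),           # C3
            nn.AvgPool2d(2),                                      # S4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),                # C5
            nn.Linear(120, 84), nn.Tanh(),                        # F6
            nn.Linear(84, num_classes),                           # output
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# 32x32 input (MNIST images are conventionally padded from 28x28 to 32x32):
model = LeNet5()
out = model(torch.randn(1, 1, 32, 32))
print(out.shape)  # torch.Size([1, 10])
```

Once trained on the processing system, the weights of such a network are what would be streamed over AXI into the programmable-logic accelerator.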
4. Embedded Object Detection with Custom LittleNet, FINN and Vitis AI DCNN Accelerators. Journal of Low Power Electronics and Applications 2022. [DOI: 10.3390/jlpea12020030]
Abstract
Object detection is an essential component of many systems, used, for example, in advanced driver assistance systems (ADAS) and advanced video surveillance systems (AVSS). Currently, the highest detection accuracy is achieved by solutions using deep convolutional neural networks (DCNNs). Unfortunately, this comes at the cost of high computational complexity; hence, work on accelerating these algorithms is important and timely. In this work, we compare three DCNN hardware-accelerator implementation methods: coarse-grained (a custom accelerator called LittleNet), fine-grained (FINN) and sequential (Vitis AI). We evaluate the approaches in terms of object detection accuracy, throughput and energy usage on the VOT and VTB datasets, and we discuss the limitations of each method. We describe the whole DNN implementation process, including architecture design, training, quantisation and hardware implementation. We used two custom DNN architectures to obtain higher accuracy, higher throughput and lower energy consumption. The first was implemented in SystemVerilog and the second with the FINN tool from AMD Xilinx. Both approaches were then compared with the Vitis AI tool from AMD Xilinx. The final implementations were tested on the Avnet Ultra96-V2 development board with the Zynq UltraScale+ MPSoC ZU3EG device. For the two custom DNN architectures, we achieved throughputs of 196 fps for our custom accelerator and 111 fps for FINN. The same networks implemented with Vitis AI achieved 123.3 fps and 53.3 fps, respectively.
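Since the described flow hinges on quantisation before hardware implementation, here is a minimal sketch of one common scheme: 8-bit symmetric per-tensor post-training quantisation. It is illustrative only; LittleNet, FINN and Vitis AI each apply their own quantisation strategies.

```python
# Minimal sketch of 8-bit symmetric per-tensor quantisation, the kind of
# step that precedes mapping a DCNN to FPGA logic. Illustrative only.
import numpy as np

def quantize_symmetric(w, bits=8):
    """Map float weights to signed integers with a per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1             # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax       # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(16, 3, 3, 3).astype(np.float32)  # a conv weight tensor
q, s = quantize_symmetric(w)
err = np.abs(w - dequantize(q, s)).max()
print(f"max quantisation error: {err:.4f}")
```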
5. Pandelea V, Ragusa E, Apicella T, Gastaldo P, Cambria E. Emotion Recognition on Edge Devices: Training and Deployment. Sensors (Basel) 2021; 21(13):4496. [PMID: 34209251] [PMCID: PMC8271649] [DOI: 10.3390/s21134496]
Abstract
Emotion recognition, among other natural language processing tasks, has greatly benefited from the use of large transformer models. Deploying these models on resource-constrained devices, however, is a major challenge due to their computational cost. In this paper, we show that combining large transformers, used as high-quality feature extractors, with simple hardware-friendly classifiers based on linear separators can achieve competitive performance while allowing real-time inference and fast training. Various solutions, including batch and Online Sequential Learning, are analyzed. Additionally, our experiments show that latency can be further improved via dimensionality reduction, and performance via pre-training. The resulting system is implemented on two types of edge devices, namely an edge accelerator and two smartphones.
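To illustrate the recipe, the sketch below pairs stand-in transformer embeddings with a linear classifier trained sample by sample via recursive least squares, in the spirit of the online sequential learning the abstract mentions. The feature dimension, class count, and regularisation constant are illustrative assumptions, and the random features merely stand in for real transformer outputs.

```python
# Sketch: frozen-transformer features + linear classifier trained online
# via recursive least squares (in the spirit of OS-ELM). Assumptions:
# 768-d features, 4 emotion classes, regularisation constant reg=1e2.
import numpy as np

class OnlineLinearClassifier:
    def __init__(self, dim, num_classes, reg=1e2):
        self.W = np.zeros((dim, num_classes))
        self.P = np.eye(dim) / reg           # inverse covariance estimate

    def partial_fit(self, x, y_onehot):
        """Recursive least-squares update for one sample."""
        x = x.reshape(-1, 1)                 # column vector
        Px = self.P @ x
        k = Px / (1.0 + x.T @ Px)            # gain vector
        self.P -= k @ Px.T
        err = y_onehot - (x.T @ self.W).ravel()
        self.W += k @ err.reshape(1, -1)

    def predict(self, X):
        return np.argmax(X @ self.W, axis=1)

# Random vectors standing in for transformer embeddings of utterances:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 768))
y = rng.integers(0, 4, size=200)             # four emotion classes
clf = OnlineLinearClassifier(dim=768, num_classes=4)
for xi, yi in zip(X, y):                     # one pass, sample by sample
    clf.partial_fit(xi, np.eye(4)[yi])
print("train accuracy:", (clf.predict(X) == y).mean())
```

Because the linear separator is trained in closed form, sample by sample, it avoids backpropagation entirely, which is what makes on-device training fast on edge hardware.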
Affiliation(s)
- Vlad Pandelea
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore 639798, Singapore
- Edoardo Ragusa
- Department of Naval, Electric, Electronic and Telecommunications Engineering, University of Genoa, 16145 Genova, Italy
- Tommaso Apicella
- Department of Naval, Electric, Electronic and Telecommunications Engineering, University of Genoa, 16145 Genova, Italy
- Paolo Gastaldo
- Department of Naval, Electric, Electronic and Telecommunications Engineering, University of Genoa, 16145 Genova, Italy
- Erik Cambria (corresponding author)
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore 639798, Singapore