1. Jiang Y, Li X, Luo H, Yin S, Kaynak O. Quo vadis artificial intelligence? Discover Artificial Intelligence 2022. [DOI: 10.1007/s44163-022-00022-8]
Abstract
The study of artificial intelligence (AI) has been a continuous endeavor of scientists and engineers for over 65 years. The simple contention is that human-created machines can do more than just labor-intensive work; they can develop human-like intelligence. Whether we are aware of it or not, AI has penetrated our daily lives, playing novel roles in industry, healthcare, transportation, education, and many other areas close to the general public. AI is believed to be one of the major drivers of socio-economic change. At the same time, AI contributes to the advancement of state-of-the-art technologies across many fields of study, serving as a helpful tool for groundbreaking research. However, the prosperity AI enjoys today was not established smoothly. Over the past decades, AI has struggled through several historical stages, including multiple winters. Therefore, at this juncture, to inform future development, it is time to discuss the past and present of AI and to offer an outlook. In this article, we discuss from a historical perspective the challenges faced along the evolutionary path of both AI tools and AI systems. In particular, beyond the technical development of AI in the short to mid term, we also present thoughts and insights on the symbiotic relationship between AI and humans in the long run.
2. Valenzuela W, Saavedra A, Zarkesh-Ha P, Figueroa M. Motion-Based Object Location on a Smart Image Sensor Using On-Pixel Memory. Sensors (Basel) 2022; 22(17):6538. [PMID: 36080999] [PMCID: PMC9460117] [DOI: 10.3390/s22176538]
Abstract
Object location is a crucial computer vision method, often used as a stage preceding object classification. Object-location algorithms require high computational and memory resources, which poses a difficult challenge for portable and low-power devices, even when the algorithm is implemented using dedicated digital hardware. Moving part of the computation to the imager can reduce the memory requirements of the digital post-processor and exploit the parallelism available in the algorithm. This paper presents the architecture of a Smart Imaging Sensor (SIS) that performs object location using pixel-level parallelism. The SIS is based on a custom smart pixel, capable of computing frame differences in the analog domain, and a digital coprocessor that performs morphological operations and connected-component analysis to determine the bounding boxes of the detected objects. The smart-pixel array computes temporal differences on-pixel, using analog memories to detect motion between consecutive frames. Our SIS can operate in two modes: (1) as a conventional image sensor, or (2) as a smart sensor that delivers a binary image highlighting the pixels where movement was detected between consecutive frames, together with the object bounding boxes. In this paper, we present the design of the smart pixel and evaluate its performance using post-layout parasitic extraction on a 0.35 µm mixed-signal CMOS process. With a pixel pitch of 32 µm × 32 µm, we achieved a fill factor of 28%. To evaluate the scalability of the design, we ported the layout to a 0.18 µm process, achieving a fill factor of 74%. On an array of 320 × 240 smart pixels, the circuit operates at a maximum frame rate of 3846 frames per second. The digital coprocessor was implemented and validated on a Xilinx Artix-7 XC7A35T field-programmable gate array that runs at 125 MHz, locates objects in a video frame in 0.614 µs, and consumes 58 mW.
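As an aid to intuition, the following Python sketch mirrors the processing chain described above in software: temporal frame differencing, thresholding, morphological clean-up, and connected-component bounding-box extraction. It is a minimal approximation of the SIS pipeline, not the authors' mixed-signal implementation; the threshold value and the 3 × 3 structuring element are illustrative assumptions.

```python
# Minimal software sketch of the SIS pipeline: frame differencing,
# thresholding, morphological clean-up, connected-component bounding boxes.
import numpy as np
from scipy import ndimage

def locate_moving_objects(prev_frame, curr_frame, thresh=25):
    """Return bounding boxes (row0, row1, col0, col1) of moving regions."""
    # Temporal difference (computed in the analog domain on the smart pixel).
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    motion = diff > thresh  # binary motion mask

    # Morphological opening suppresses isolated noisy pixels
    # (the digital coprocessor performs the morphological operations).
    motion = ndimage.binary_opening(motion, structure=np.ones((3, 3)))

    # Connected-component labelling yields one bounding box per object.
    labels, _ = ndimage.label(motion)
    return [(sl[0].start, sl[0].stop, sl[1].start, sl[1].stop)
            for sl in ndimage.find_objects(labels)]

# Example on synthetic 320x240 frames:
prev = np.zeros((240, 320), dtype=np.uint8)
curr = prev.copy()
curr[100:120, 150:180] = 200  # a "moving" bright patch
print(locate_moving_objects(prev, curr))  # -> [(100, 120, 150, 180)]
```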
Affiliation(s)
- Wladimir Valenzuela
- Department of Electrical Engineering, Faculty of Engineering, Universidad de Concepción, Concepción 4070386, Chile
- Antonio Saavedra
- Embedded Systems Architecture Group, Institute of Computer Engineering and Microelectronics, Electrical Engineering and Computer Science Faculty, Technische Universität Berlin, 10623 Berlin, Germany
- Payman Zarkesh-Ha
- Department of Electrical and Computer Engineering (ECE), School of Engineering, University of New Mexico, Albuquerque, NM 87131-1070, USA
- Miguel Figueroa
- Department of Electrical Engineering, Faculty of Engineering, Universidad de Concepción, Concepción 4070386, Chile
3. Research on the Lightweight Deployment Method of Integration of Training and Inference in Artificial Intelligence. Applied Sciences (Basel) 2022. [DOI: 10.3390/app12136616]
Abstract
In recent years, the continuous development of artificial intelligence has largely been driven by algorithms and computing power. This paper discusses the training and inference of artificial intelligence models from the perspective of computing power. Addressing the computing-power problem requires considering performance, cost, power consumption, flexibility, and robustness together. At present, the training of artificial intelligence models is mostly based on GPU platforms. Although GPUs offer high computing performance, their power consumption and cost are relatively high, making them unsuitable as an implementation platform in application scenarios with strict power and cost constraints. The emergence of high-performance heterogeneous-architecture devices provides a new path for integrating artificial intelligence training and inference. Typical multi-core heterogeneous devices from Xilinx and Intel integrate multiple high-performance processors and FPGA fabric on a single chip. Compared with the current practice of separating training and inference, heterogeneous architectures integrate both on a single chip, balancing the differing demands of the two workloads, reducing the cost and power consumption of training and of deploying AI inference, achieving the goal of lightweight computation, and improving the flexibility and robustness of the system. In this paper, based on the LeNet-5 network structure, we first introduce the process of training the network on the multi-core CPU of Xilinx's latest multi-core heterogeneous-architecture device, the MPSoC. We then study how to convert the network model into a hardware-logic implementation, transferring the model parameters from the device's processing system to a hardware accelerator built from programmable logic over the chip's AXI bus interface. Finally, the integrated implementation is tested and verified on a Xilinx MPSoC. According to the test results, the recognition accuracy of this lightweight deployment scheme reaches 99.5% on the MNIST dataset and 75.4% on the CIFAR-10 dataset, while the average processing time per frame is only 2.2 ms. In addition, the power consumption of the network within the SoC hardware accelerator is only 1.363 W at 100 MHz.
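For orientation, the sketch below gives a minimal PyTorch definition of the classic LeNet-5 structure the paper builds on. The layer sizes follow LeCun's original network; the input shape and any training details here are our assumptions, not taken from the paper.

```python
# Minimal sketch of the classic LeNet-5 structure (channel counts per
# LeCun's original network); input shape and activations are assumptions.
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, in_channels=1, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 6, kernel_size=5), nn.Tanh(),  # C1
            nn.AvgPool2d(2),                                      # S2
            nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),           # C3
            nn.AvgPool2d(2),                                      # S4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),                # C5
            nn.Linear(120, 84), nn.Tanh(),                        # F6
            nn.Linear(84, num_classes),                           # output
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# 32x32 input (MNIST images are conventionally padded from 28x28 to 32x32):
model = LeNet5()
out = model(torch.randn(1, 1, 32, 32))
print(out.shape)  # torch.Size([1, 10])
```

Once trained on the processing system, the weights of such a network are what would be streamed over AXI into the programmable-logic accelerator.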
4. Embedded Object Detection with Custom LittleNet, FINN and Vitis AI DCNN Accelerators. Journal of Low Power Electronics and Applications 2022. [DOI: 10.3390/jlpea12020030]
Abstract
Object detection is an essential component of many systems, used, for example, in advanced driver assistance systems (ADAS) and advanced video surveillance systems (AVSS). Currently, the highest detection accuracy is achieved by solutions using deep convolutional neural networks (DCNNs). Unfortunately, this comes at the cost of high computational complexity; hence, work on accelerating these algorithms is important and timely. In this work, we compare three DCNN hardware-accelerator implementation methods: coarse-grained (a custom accelerator called LittleNet), fine-grained (FINN) and sequential (Vitis AI). We evaluate the approaches in terms of object detection accuracy, throughput and energy usage on the VOT and VTB datasets, and we discuss the limitations of each method. We describe the whole DNN implementation process, including architecture design, training, quantisation and hardware implementation. We used two custom DNN architectures to obtain higher accuracy, higher throughput and lower energy consumption. The first was implemented in SystemVerilog and the second with the FINN tool from AMD Xilinx. Both approaches were then compared with the Vitis AI tool from AMD Xilinx. The final implementations were tested on the Avnet Ultra96-V2 development board with the Zynq UltraScale+ MPSoC ZU3EG device. For the two custom DNN architectures, we achieved throughputs of 196 fps for our custom accelerator and 111 fps for FINN. The same networks implemented with Vitis AI achieved 123.3 fps and 53.3 fps, respectively.
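Since the described flow hinges on quantisation before hardware implementation, here is a minimal sketch of one common scheme: 8-bit symmetric per-tensor post-training quantisation. It is illustrative only; LittleNet, FINN and Vitis AI each apply their own quantisation strategies.

```python
# Minimal sketch of 8-bit symmetric per-tensor quantisation, the kind of
# step that precedes mapping a DCNN to FPGA logic. Illustrative only.
import numpy as np

def quantize_symmetric(w, bits=8):
    """Map float weights to signed integers with a per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1             # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax       # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(16, 3, 3, 3).astype(np.float32)  # a conv weight tensor
q, s = quantize_symmetric(w)
err = np.abs(w - dequantize(q, s)).max()
print(f"max quantisation error: {err:.4f}")
```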
5. Pandelea V, Ragusa E, Apicella T, Gastaldo P, Cambria E. Emotion Recognition on Edge Devices: Training and Deployment. Sensors (Basel) 2021; 21(13):4496. [PMID: 34209251] [PMCID: PMC8271649] [DOI: 10.3390/s21134496]
Abstract
Emotion recognition, among other natural language processing tasks, has greatly benefited from the use of large transformer models. Deploying these models on resource-constrained devices, however, is a major challenge due to their computational cost. In this paper, we show that combining large transformers, used as high-quality feature extractors, with simple hardware-friendly classifiers based on linear separators can achieve competitive performance while allowing real-time inference and fast training. Various solutions, including batch and Online Sequential Learning, are analyzed. Additionally, our experiments show that latency can be further improved via dimensionality reduction, and performance via pre-training. The resulting system is implemented on two types of edge devices, namely an edge accelerator and two smartphones.
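To illustrate the recipe, the sketch below pairs stand-in transformer embeddings with a linear classifier trained sample by sample via recursive least squares, in the spirit of the online sequential learning the abstract mentions. The feature dimension, class count, and regularisation constant are illustrative assumptions, and the random features merely stand in for real transformer outputs.

```python
# Sketch: frozen-transformer features + linear classifier trained online
# via recursive least squares (in the spirit of OS-ELM). Assumptions:
# 768-d features, 4 emotion classes, regularisation constant reg=1e2.
import numpy as np

class OnlineLinearClassifier:
    def __init__(self, dim, num_classes, reg=1e2):
        self.W = np.zeros((dim, num_classes))
        self.P = np.eye(dim) / reg           # inverse covariance estimate

    def partial_fit(self, x, y_onehot):
        """Recursive least-squares update for one sample."""
        x = x.reshape(-1, 1)                 # column vector
        Px = self.P @ x
        k = Px / (1.0 + x.T @ Px)            # gain vector
        self.P -= k @ Px.T
        err = y_onehot - (x.T @ self.W).ravel()
        self.W += k @ err.reshape(1, -1)

    def predict(self, X):
        return np.argmax(X @ self.W, axis=1)

# Random vectors standing in for transformer embeddings of utterances:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 768))
y = rng.integers(0, 4, size=200)             # four emotion classes
clf = OnlineLinearClassifier(dim=768, num_classes=4)
for xi, yi in zip(X, y):                     # one pass, sample by sample
    clf.partial_fit(xi, np.eye(4)[yi])
print("train accuracy:", (clf.predict(X) == y).mean())
```

Because the linear separator is trained in closed form, sample by sample, it avoids backpropagation entirely, which is what makes on-device training fast on edge hardware.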
Affiliation(s)
- Vlad Pandelea
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore 639798, Singapore
- Edoardo Ragusa
- Department of Naval, Electric, Electronic and Telecommunications Engineering, University of Genoa, 16145 Genova, Italy
- Tommaso Apicella
- Department of Naval, Electric, Electronic and Telecommunications Engineering, University of Genoa, 16145 Genova, Italy
- Paolo Gastaldo
- Department of Naval, Electric, Electronic and Telecommunications Engineering, University of Genoa, 16145 Genova, Italy
- Erik Cambria (corresponding author)
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore 639798, Singapore