1. Sun H, Liu R, Cai W, Wang J, Wang Y, Tang H, Cui Y, Yao D, Guo D. Reliable object tracking by multimodal hybrid feature extraction and transformer-based fusion. Neural Netw 2024; 178:106493. PMID: 38970946. DOI: 10.1016/j.neunet.2024.106493.
Abstract
Visual object tracking, which is primarily based on visible light image sequences, encounters numerous challenges in complicated scenarios, such as low light conditions, high dynamic ranges, and background clutter. To address these challenges, incorporating the advantages of multiple visual modalities is a promising solution for achieving reliable object tracking. However, the existing approaches usually integrate multimodal inputs through adaptive local feature interactions, which cannot leverage the full potential of visual cues, thus resulting in insufficient feature modeling. In this study, we propose a novel multimodal hybrid tracker (MMHT) that utilizes frame-event-based data for reliable single object tracking. The MMHT model employs a hybrid backbone consisting of an artificial neural network (ANN) and a spiking neural network (SNN) to extract dominant features from different visual modalities and then uses a unified encoder to align the features across different domains. Moreover, we propose an enhanced transformer-based module to fuse multimodal features using attention mechanisms. With these methods, the MMHT model can effectively construct a multiscale and multidimensional visual feature space and achieve discriminative feature modeling. Extensive experiments demonstrate that the MMHT model exhibits competitive performance in comparison with that of other state-of-the-art methods. Overall, our results highlight the effectiveness of the MMHT model in terms of addressing the challenges faced in visual object tracking tasks.
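The abstract names attention-based fusion of frame and event features but does not specify the module here. As a rough illustration of the general mechanism only, the following NumPy sketch fuses two modality token sets with scaled dot-product cross-attention; the function names, token counts, dimensions, and residual connection are illustrative assumptions, not the authors' MMHT architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(frame_tokens, event_tokens):
    """Fuse event-domain features into frame-domain features with scaled
    dot-product cross-attention: queries come from the frame modality,
    keys/values from the event modality, plus a residual connection."""
    d_k = frame_tokens.shape[-1]
    scores = frame_tokens @ event_tokens.T / np.sqrt(d_k)  # (Nf, Ne)
    weights = softmax(scores, axis=-1)                     # each row sums to 1
    return frame_tokens + weights @ event_tokens           # (Nf, d)

rng = np.random.default_rng(0)
frame_tokens = rng.standard_normal((4, 8))  # 4 frame-domain tokens, dim 8
event_tokens = rng.standard_normal((6, 8))  # 6 event-domain tokens, dim 8
fused = cross_attention_fuse(frame_tokens, event_tokens)
```

The output keeps the frame-token shape, so the fused features can drop into a frame-only pipeline unchanged, which is one common motivation for query-from-frame cross-attention.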
Affiliation(s)
- Hongze Sun
  Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for NeuroInformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Rui Liu
  Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for NeuroInformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Wuque Cai
  Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for NeuroInformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Jun Wang
  Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for NeuroInformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Yue Wang
  Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for NeuroInformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Huajin Tang
  College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
- Yan Cui
  Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for NeuroInformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China; Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, Chengdu 611731, China
- Dezhong Yao
  Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for NeuroInformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China; Research Unit of NeuroInformation (2019RU035), Chinese Academy of Medical Sciences, Chengdu 611731, China
- Daqing Guo
  Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for NeuroInformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
2. Wu Y, Shi B, Zheng Z, Zheng H, Yu F, Liu X, Luo G, Deng L. Adaptive spatiotemporal neural networks through complementary hybridization. Nat Commun 2024; 15:7355. PMID: 39191782. PMCID: PMC11350166. DOI: 10.1038/s41467-024-51641-x.
Abstract
Processing spatiotemporal data sources with both high spatial dimension and rich temporal information is a ubiquitous need in machine intelligence. Recurrent neural networks in the machine learning domain and bio-inspired spiking neural networks in the neuromorphic computing domain are two promising candidate models for dealing with spatiotemporal data via extrinsic dynamics and intrinsic dynamics, respectively. Nevertheless, these networks have disparate modeling paradigms, which leads to different performance results, making it hard for them to cover diverse data sources and performance requirements in practice. Constructing a unified modeling framework that can effectively and adaptively process variable spatiotemporal data in different situations remains quite challenging. In this work, we propose hybrid spatiotemporal neural networks created by combining the recurrent neural networks and spiking neural networks under a unified surrogate gradient learning framework and a Hessian-aware neuron selection method. By flexibly tuning the ratio between two types of neurons, the hybrid model demonstrates better adaptive ability in balancing different performance metrics, including accuracy, robustness, and efficiency on several typical benchmarks, and generally outperforms conventional single-paradigm recurrent neural networks and spiking neural networks. Furthermore, we evidence the great potential of the proposed network with a robotic task in varying environments. With our proof of concept, the proposed hybrid model provides a generic modeling route to process spatiotemporal data sources in the open world.
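Two of the ingredients named above, spiking-neuron dynamics and surrogate gradient learning, can be made concrete with a minimal sketch. The time constant, threshold, reset rule, and rectangular surrogate shape below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def lif_forward(inputs, tau=0.9, v_th=1.0):
    """Run a layer of leaky integrate-and-fire (LIF) neurons over T steps.
    inputs: array of shape (T, N) with input currents.
    Returns a binary spike train of the same shape."""
    v = np.zeros(inputs.shape[1])
    spikes = np.zeros_like(inputs)
    for t, x in enumerate(inputs):
        v = tau * v + x                      # leaky membrane integration
        spikes[t] = (v >= v_th).astype(float)
        v = v * (1.0 - spikes[t])            # hard reset after a spike
    return spikes

def rect_surrogate(v, v_th=1.0, alpha=2.0):
    """Rectangular surrogate for d(spike)/d(v): the Heaviside spike
    function is non-differentiable, so backprop-style training replaces
    its gradient with a finite window around the threshold."""
    return alpha * (np.abs(v - v_th) < 0.5 / alpha).astype(float)

# A constant subthreshold current makes the neuron spike periodically:
# v = 0.6, then 1.14 (spike, reset), then 0.6, then 1.14 (spike).
spk = lif_forward(np.full((4, 1), 0.6))
```

The surrogate is what lets a hybrid recurrent/spiking network be trained under one gradient framework: recurrent units use their ordinary gradients while spiking units substitute `rect_surrogate` at the threshold nonlinearity.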
Affiliation(s)
- Yujie Wu
  Center for Brain Inspired Computing Research (CBICR), Department of Precision Instrument, Tsinghua University, Beijing, China
  Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
  Institute of Theoretical Computer Science, Graz University of Technology, Graz, Austria
- Bizhao Shi
  School of Computer Science, Peking University, Beijing, China
  Center for Energy-Efficient Computing and Applications, Peking University, Beijing, China
- Zhong Zheng
  Center for Brain Inspired Computing Research (CBICR), Department of Precision Instrument, Tsinghua University, Beijing, China
- Hanle Zheng
  Center for Brain Inspired Computing Research (CBICR), Department of Precision Instrument, Tsinghua University, Beijing, China
- Fangwen Yu
  Center for Brain Inspired Computing Research (CBICR), Department of Precision Instrument, Tsinghua University, Beijing, China
- Xue Liu
  Center for Brain Inspired Computing Research (CBICR), Department of Precision Instrument, Tsinghua University, Beijing, China
- Guojie Luo
  School of Computer Science, Peking University, Beijing, China
  Center for Energy-Efficient Computing and Applications, Peking University, Beijing, China
- Lei Deng
  Center for Brain Inspired Computing Research (CBICR), Department of Precision Instrument, Tsinghua University, Beijing, China
3. Li Z, Li Z, Tang W, Yao J, Dou Z, Gong J, Li Y, Zhang B, Dong Y, Xia J, Sun L, Jiang P, Cao X, Yang R, Miao X, Yang R. Crossmodal sensory neurons based on high-performance flexible memristors for human-machine in-sensor computing system. Nat Commun 2024; 15:7275. PMID: 39179548. PMCID: PMC11344147. DOI: 10.1038/s41467-024-51609-x.
Abstract
Constructing crossmodal in-sensor processing system based on high-performance flexible devices is of great significance for the development of wearable human-machine interfaces. A bio-inspired crossmodal in-sensor computing system can perform real-time energy-efficient processing of multimodal signals, alleviating data conversion and transmission between different modules in conventional chips. Here, we report a bio-inspired crossmodal spiking sensory neuron (CSSN) based on a flexible VO₂ memristor, and demonstrate a crossmodal in-sensor encoding and computing system for wearable human-machine interfaces. We demonstrate excellent performance in the VO₂ memristor including endurance (>10¹²), uniformity (0.72% for cycle-to-cycle variations and 3.73% for device-to-device variations), speed (<30 ns), and flexibility (bendable to a curvature radius of 1 mm). A flexible hardware processing system is implemented based on the CSSN, which can directly perceive and encode pressure and temperature bimodal information into spikes, and then enables the real-time haptic-feedback for human-machine interaction. We successfully construct a crossmodal in-sensor spiking reservoir computing system via the CSSNs, which can achieve dynamic objects identification with a high accuracy of 98.1% and real-time signal feedback. This work provides a feasible approach for constructing flexible bio-inspired crossmodal in-sensor computing systems for wearable human-machine interfaces.
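The reservoir in this work is physical (memristor dynamics act as the reservoir), but the computational pattern is the standard one: a fixed dynamical system expands inputs into a high-dimensional state, and only a linear readout is trained. A software analogue of that pattern, with every size and constant an illustrative assumption rather than anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def run_reservoir(u, n_res=50, rho=0.9, leak=0.5):
    """Drive a fixed random recurrent reservoir with a scalar input
    sequence u of length T. The recurrent weights are never trained;
    rho rescales them to the desired spectral radius."""
    w_in = rng.uniform(-1.0, 1.0, n_res)
    w = rng.standard_normal((n_res, n_res))
    w *= rho / np.max(np.abs(np.linalg.eigvals(w)))  # set spectral radius
    x = np.zeros(n_res)
    states = np.zeros((len(u), n_res))
    for t, ut in enumerate(u):
        x = (1 - leak) * x + leak * np.tanh(w @ x + w_in * ut)
        states[t] = x
    return states

def train_readout(states, targets, ridge=1e-6):
    """Closed-form ridge-regression readout: the only trained weights."""
    a = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(a, states.T @ targets)

# Toy memory task: recover the input delayed by one step from the states.
u = np.sin(0.2 * np.arange(300))
states = run_reservoir(u)
w_out = train_readout(states[1:], u[:-1])
mse = np.mean((states[1:] @ w_out - u[:-1]) ** 2)
```

Because training reduces to one linear solve, this scheme maps naturally onto sensor hardware: the expensive nonlinear dynamics happen in the device, and only a small readout needs digital computation.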
Affiliation(s)
- Zhiyuan Li
  School of Integrated Circuits, Huazhong University of Science and Technology, Wuhan, China
  Hubei Yangtze Memory Laboratories, Wuhan, China
- Zhongshao Li
  State Key Laboratory of High Performance Ceramics and Superfine Microstructure, Shanghai Institute of Ceramics, Chinese Academy of Sciences, Shanghai, China
  Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing, China
- Wei Tang
  School of Integrated Circuits, Huazhong University of Science and Technology, Wuhan, China
- Jiaping Yao
  School of Integrated Circuits, Huazhong University of Science and Technology, Wuhan, China
- Zhipeng Dou
  State Key Laboratory of Catalysis, CAS Center for Excellence in Nanoscience, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, China
- Junjie Gong
  School of Integrated Circuits, Huazhong University of Science and Technology, Wuhan, China
- Yongfei Li
  School of Integrated Circuits, Huazhong University of Science and Technology, Wuhan, China
- Beining Zhang
  School of Integrated Circuits, Huazhong University of Science and Technology, Wuhan, China
- Yunxiao Dong
  School of Integrated Circuits, Huazhong University of Science and Technology, Wuhan, China
- Jian Xia
  School of Integrated Circuits, Huazhong University of Science and Technology, Wuhan, China
- Lin Sun
  State Key Laboratory of Catalysis, CAS Center for Excellence in Nanoscience, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, China
- Peng Jiang
  State Key Laboratory of Catalysis, CAS Center for Excellence in Nanoscience, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, China
- Xun Cao
  State Key Laboratory of High Performance Ceramics and Superfine Microstructure, Shanghai Institute of Ceramics, Chinese Academy of Sciences, Shanghai, China
  Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing, China
- Rui Yang
  School of Integrated Circuits, Huazhong University of Science and Technology, Wuhan, China
  Hubei Yangtze Memory Laboratories, Wuhan, China
- Xiangshui Miao
  School of Integrated Circuits, Huazhong University of Science and Technology, Wuhan, China
  Hubei Yangtze Memory Laboratories, Wuhan, China
- Ronggui Yang
  State Key Laboratory of Coal Combustion, School of Energy and Power Engineering, Huazhong University of Science and Technology, Wuhan, China
4. Yan T, Zhou T, Guo Y, Zhao Y, Shao G, Wu J, Huang R, Dai Q, Fang L. Nanowatt all-optical 3D perception for mobile robotics. Sci Adv 2024; 10:eadn2031. PMID: 38968351. PMCID: PMC11225784. DOI: 10.1126/sciadv.adn2031.
Abstract
Three-dimensional (3D) perception is vital to drive mobile robotics' progress toward intelligence. However, state-of-the-art 3D perception solutions require complicated postprocessing or point-by-point scanning, suffering from computational burden, latency of tens of milliseconds, and additional power consumption. Here, we propose a parallel all-optical computational chipset 3D perception architecture (Aop3D) with nanowatt power and light speed. The 3D perception is executed during the light propagation over the passive chipset, and the captured light intensity distribution provides a direct reflection of the depth map, eliminating the need for extensive postprocessing. The prototype system of Aop3D is tested in various scenarios and deployed to a mobile robot, demonstrating unprecedented performance in distance detection and obstacle avoidance. Moreover, Aop3D works at a frame rate of 600 hertz and a power consumption of 33.3 nanowatts per meta-pixel experimentally. Our work is promising toward next-generation direct 3D perception techniques with light speed and high energy efficiency.
Affiliation(s)
- Tao Yan
  Department of Automation, Tsinghua University, Beijing 100084, China
- Tiankuang Zhou
  Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
  Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
- Yanchen Guo
  Department of Automation, Tsinghua University, Beijing 100084, China
  Shenzhen International Graduate School, Tsinghua University, Shenzhen 518071, China
- Yun Zhao
  Department of Automation, Tsinghua University, Beijing 100084, China
  Shenzhen International Graduate School, Tsinghua University, Shenzhen 518071, China
- Guocheng Shao
  Department of Automation, Tsinghua University, Beijing 100084, China
  Shenzhen International Graduate School, Tsinghua University, Shenzhen 518071, China
- Jiamin Wu
  Department of Automation, Tsinghua University, Beijing 100084, China
  Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
  Institute for Brain and Cognitive Sciences, Tsinghua University, Beijing 100084, China
- Ruqi Huang
  Shenzhen International Graduate School, Tsinghua University, Shenzhen 518071, China
- Qionghai Dai
  Department of Automation, Tsinghua University, Beijing 100084, China
  Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
  Institute for Brain and Cognitive Sciences, Tsinghua University, Beijing 100084, China
- Lu Fang
  Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
  Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
  Institute for Brain and Cognitive Sciences, Tsinghua University, Beijing 100084, China
5. Wang K, Liao Y, Li W, Li J, Su H, Chen R, Park JH, Zhang Y, Zhou X, Wu C, Liu Z, Guo T, Kim TW. Memory-electroluminescence for multiple action-potentials combination in bio-inspired afferent nerves. Nat Commun 2024; 15:3505. PMID: 38664383. PMCID: PMC11045776. DOI: 10.1038/s41467-024-47641-6.
Abstract
The development of optoelectronics mimicking the functions of the biological nervous system is important to artificial intelligence. This work demonstrates an optoelectronic, artificial, afferent-nerve strategy based on memory-electroluminescence spikes, which can realize multiple action-potentials combination through a single optical channel. The memory-electroluminescence spikes have diverse morphologies due to their history-dependent characteristics and can be used to encode distributed sensor signals. As the key to successful functioning of the optoelectronic, artificial afferent nerve, a driving mode for light-emitting diodes, namely, the non-carrier injection mode, is proposed, allowing it to drive nanoscale light-emitting diodes to generate memory-electroluminescence spikes that have multiple sub-peaks. Moreover, multiplexing of the spikes can be obtained by using optical signals with different wavelengths, allowing for a large signal bandwidth, and the multiple action-potentials transmission process in afferent nerves can be demonstrated. Finally, sensor-position recognition with the bio-inspired afferent nerve is developed and shown to have a high recognition accuracy of 98.88%. This work demonstrates a strategy for mimicking biological afferent nerves and offers insights into the construction of artificial perception systems.
Affiliation(s)
- Kun Wang
  College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350108, China
- Yitao Liao
  College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350108, China
- Wenhao Li
  College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350108, China
- Junlong Li
  College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350108, China
- Hao Su
  College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350108, China
- Rong Chen
  Fujian Science & Technology Innovation Laboratory for Optoelectronic Information of China, Fuzhou, 350108, China
- Jae Hyeon Park
  Department of Electronic and Computer Engineering, Hanyang University, Seoul, 133-791, Korea
- Yongai Zhang
  College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350108, China
  Fujian Science & Technology Innovation Laboratory for Optoelectronic Information of China, Fuzhou, 350108, China
- Xiongtu Zhou
  College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350108, China
  Fujian Science & Technology Innovation Laboratory for Optoelectronic Information of China, Fuzhou, 350108, China
- Chaoxing Wu
  College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350108, China
  Fujian Science & Technology Innovation Laboratory for Optoelectronic Information of China, Fuzhou, 350108, China
- Zhiqiang Liu
  Research and Development Center for Semiconductor Lighting Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
- Tailiang Guo
  College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350108, China
  Fujian Science & Technology Innovation Laboratory for Optoelectronic Information of China, Fuzhou, 350108, China
- Tae Whan Kim
  Department of Electronic and Computer Engineering, Hanyang University, Seoul, 133-791, Korea
6. Xu C, Solomon SA, Gao W. Artificial Intelligence-Powered Electronic Skin. Nat Mach Intell 2023; 5:1344-1355. PMID: 38370145. PMCID: PMC10868719. DOI: 10.1038/s42256-023-00760-z.
Abstract
Skin-interfaced electronics is gradually changing medical practices by enabling continuous and noninvasive tracking of physiological and biochemical information. With the rise of big data and digital medicine, next-generation electronic skin (e-skin) will be able to use artificial intelligence (AI) to optimize its design as well as uncover user-personalized health profiles. Recent multimodal e-skin platforms have already employed machine learning (ML) algorithms for autonomous data analytics. Unfortunately, there is a lack of appropriate AI protocols and guidelines for e-skin devices, resulting in overly complex models and non-reproducible conclusions for simple applications. This review aims to present AI technologies in e-skin hardware and assess their potential for new inspired integrated platform solutions. We outline recent breakthroughs in AI strategies and their applications in engineering e-skins as well as understanding health information collected by e-skins, highlighting the transformative deployment of AI in robotics, prosthetics, virtual reality, and personalized healthcare. We also discuss the challenges and prospects of AI-powered e-skins as well as predictions for the future trajectory of smart e-skins.
Affiliation(s)
- Changhao Xu
  Andrew and Peggy Cherng Department of Medical Engineering, Division of Engineering and Applied Science, California Institute of Technology, Pasadena, CA, USA
- Samuel A. Solomon
  Andrew and Peggy Cherng Department of Medical Engineering, Division of Engineering and Applied Science, California Institute of Technology, Pasadena, CA, USA
- Wei Gao
  Andrew and Peggy Cherng Department of Medical Engineering, Division of Engineering and Applied Science, California Institute of Technology, Pasadena, CA, USA
7. Zhang C, Yang Z, Xue B, Zhuo H, Liao L, Yang X, Zhu Z. Perceiving like a Bat: Hierarchical 3D Geometric-Semantic Scene Understanding Inspired by a Biomimetic Mechanism. Biomimetics (Basel) 2023; 8:436. PMID: 37754187. PMCID: PMC10526479. DOI: 10.3390/biomimetics8050436.
Abstract
Geometric-semantic scene understanding is a spatial intelligence capability that is essential for robots to perceive and navigate the world. However, understanding a natural scene remains challenging for robots because of restricted sensors and time-varying situations. In contrast, humans and animals are able to form a complex neuromorphic concept of the scene they move in. This neuromorphic concept captures geometric and semantic aspects of the scenario and reconstructs the scene at multiple levels of abstraction. This article seeks to reduce the gap between robot and animal perception by proposing an ingenious scene-understanding approach that seamlessly captures geometric and semantic aspects in an unexplored environment. We propose two types of biologically inspired environment perception methods, i.e., a set of elaborate biomimetic sensors and a brain-inspired parsing algorithm related to scene understanding, that enable robots to perceive their surroundings like bats. Our evaluations show that the proposed scene-understanding system achieves competitive performance in image semantic segmentation and volumetric-semantic scene reconstruction. Moreover, to verify the practicability of our proposed scene-understanding method, we also conducted real-world geometric-semantic scene reconstruction in an indoor environment with our self-developed drone.
Affiliation(s)
- Zhong Yang
  College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; (C.Z.)