1
Astorayme MA, Vázquez-Rowe I, Kahhat R. The use of artificial intelligence algorithms to detect macroplastics in aquatic environments: A critical review. Sci Total Environ 2024; 945:173843. [PMID: 38871326] [DOI: 10.1016/j.scitotenv.2024.173843]
Abstract
The presence of macroplastics (MPs) has serious consequences for natural ecosystems, directly affecting biota and human wellbeing. Given this scenario, estimating MP abundance is crucial for assessing the issue and formulating effective waste management strategies. In this context, the main objective of this critical review is to analyze the use of machine learning (ML) techniques, with a particular interest in deep learning (DL) approaches, to detect, classify and quantify MPs in aquatic environments, supported by datasets such as satellite or aerial images and video recordings taken by unmanned aerial vehicles. This article provides a concise overview of artificial intelligence concepts, followed by a bibliometric analysis and a critical review. The search methodology categorized the scientific contributions by temporal and spatial criteria for the bibliometric analysis, whereas the critical review was based on generating homogeneous groups according to the complexity of the ML and DL methods and the type of dataset. In light of the review carried out, classical ML techniques, such as random forests or support vector machines, showed robustness in MP detection; however, achieving optimal efficiency in multiclass classification remains a limitation for these methods. Consequently, more advanced DL approaches are taking the lead for the detection and multiclass classification of MPs. A series of architectures based on convolutional neural networks, and the use of complex pre-trained models through transfer learning (e.g., VGG16 and YOLO models), are currently being explored, although the computational expense is currently high due to the need to process large volumes of data. Additionally, there is a trend towards detecting smaller plastics, which requires higher-resolution images. Finally, it is important to stress that since 2020 there has been a significant increase in scientific research focusing on transformer-based architectures for object detection. Although these can be considered the current state of the art, no studies have been identified that apply them to MP detection.
Affiliation(s)
- Miguel Angel Astorayme
- Peruvian Life Cycle Assessment & Industrial Ecology Network (PELCAN), Department of Engineering, Pontificia Universidad Católica del Perú, Av. Universitaria 1801, San Miguel 15074, Lima, Peru; Dept. of Fluid Mechanics Engineering, Universidad Nacional Mayor de San Marcos, Av. Universitaria/Av. Germán Amézaga s/n., Lima 1508, Lima, Peru.
- Ian Vázquez-Rowe
- Peruvian Life Cycle Assessment & Industrial Ecology Network (PELCAN), Department of Engineering, Pontificia Universidad Católica del Perú, Av. Universitaria 1801, San Miguel 15074, Lima, Peru
- Ramzy Kahhat
- Peruvian Life Cycle Assessment & Industrial Ecology Network (PELCAN), Department of Engineering, Pontificia Universidad Católica del Perú, Av. Universitaria 1801, San Miguel 15074, Lima, Peru
2
Halder S, Islam N, Ray B, Andrews E, Hettiarachchi P, Jackson E. AI-based seagrass morphology measurement. J Environ Manage 2024; 369:122246. [PMID: 39241598] [DOI: 10.1016/j.jenvman.2024.122246]
Abstract
Seagrass meadows are an essential part of the Great Barrier Reef ecosystem, providing benefits such as filtering nutrients and sediment, serving as a nursery for fish and shellfish, and capturing atmospheric carbon as blue carbon. Understanding the phenotypic plasticity of seagrasses and their ability to acclimate their morphology in response to environmental stressors is crucial. Investigating these morphological changes can provide valuable insights into ecosystem health and inform conservation strategies aimed at mitigating seagrass decline. Measuring morphological parameters such as the length and width of leaves, rhizomes, and roots is therefore essential. Because manual measurement of these parameters is time-consuming, inaccurate and costly, the researchers developed a machine learning model that uses image processing and artificial intelligence to measure morphological parameters from digital imagery. The study uses a deep learning model, YOLO-v6, to classify three distinct seagrass object types and determine their dimensions. The results suggest that the proposed model is highly effective, with an average recall of 97.5%, an average precision of 83.7%, and an average F1-score of 90.1%. The model code is publicly available on GitHub (https://github.com/sajalhalder/AI-ASMM).
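The measurement step described above can be illustrated with a minimal sketch: once a detector such as YOLO-v6 returns a bounding box in pixels, a calibration scale converts it into physical leaf dimensions. The function name, box format, and the 0.5 mm/px scale below are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: converting a detected bounding box (pixels) into
# physical leaf dimensions, assuming a known scale factor (mm per pixel),
# e.g. obtained from a calibration object in the frame.

def box_dimensions_mm(box, mm_per_px):
    """box = (x1, y1, x2, y2) in pixels; returns (length_mm, width_mm)."""
    w_px = abs(box[2] - box[0])
    h_px = abs(box[3] - box[1])
    # Treat the longer side as leaf length, the shorter as width.
    length_px, width_px = max(w_px, h_px), min(w_px, h_px)
    return length_px * mm_per_px, width_px * mm_per_px

# Example: a 50 x 400 px detection at 0.5 mm/px -> 200 mm long, 25 mm wide.
length, width = box_dimensions_mm((10, 20, 60, 420), 0.5)
```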
Affiliation(s)
- Sajal Halder
- College of ICT, School of Engineering and Technology, Central Queensland University, Melbourne, Australia; Data61, CSIRO, Melbourne, Australia.
- Nahina Islam
- College of ICT, School of Engineering and Technology, Central Queensland University, Melbourne, Australia; Centre of Machine Learning, Networking and Education Technology (CML-NET), Central Queensland University, Rockhampton, Australia.
- Biplob Ray
- College of ICT, School of Engineering and Technology, Central Queensland University, Melbourne, Australia; Centre of Machine Learning, Networking and Education Technology (CML-NET), Central Queensland University, Rockhampton, Australia.
- Elizabeth Andrews
- Coastal Marine Ecosystems Research Centre (CMERC), Central Queensland University, Gladstone, QLD, Australia.
- Pushpika Hettiarachchi
- College of ICT, School of Engineering and Technology, Central Queensland University, Melbourne, Australia.
- Emma Jackson
- Coastal Marine Ecosystems Research Centre (CMERC), Central Queensland University, Gladstone, QLD, Australia.
3
Chen Q, Li M, Lai Z, Zhu J, Guan L. A Multi-Scale Target Detection Method Using an Improved Faster Region Convolutional Neural Network Based on Enhanced Backbone and Optimized Mechanisms. J Imaging 2024; 10:197. [PMID: 39194986] [DOI: 10.3390/jimaging10080197]
Abstract
Existing deep learning methods exhibit many limitations in multi-target detection, such as low accuracy and high rates of false and missed detections. This paper proposes an improved Faster R-CNN algorithm aimed at enhancing the detection of multi-scale targets. The algorithm introduces three improvements over Faster R-CNN. First, it uses the ResNet101 network for feature extraction from the detection image, which provides stronger feature extraction capability. Second, it integrates Online Hard Example Mining (OHEM), Soft Non-Maximum Suppression (Soft-NMS), and Distance Intersection over Union (DIoU) modules, which mitigate the positive-negative sample imbalance and the tendency for small targets to be missed during training. Finally, the Region Proposal Network (RPN) is simplified to achieve faster detection and a lower miss rate. A multi-scale training (MST) strategy is also used to train the improved Faster R-CNN, balancing detection accuracy and efficiency. Compared with other detection models, the improved Faster R-CNN demonstrates significant advantages in mAP@0.5, F1-score, and log-average miss rate (LAMR). The model proposed in this paper provides valuable insights for fields such as smart agriculture, medical diagnosis, and face recognition.
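The Soft-NMS module mentioned above can be sketched in a few lines: instead of discarding boxes that overlap a higher-scoring detection, Gaussian Soft-NMS decays their scores by exp(-IoU²/σ). This is a minimal sketch of the published Soft-NMS idea, not the authors' implementation; the box format, σ, and threshold values are assumptions.

```python
import math

def iou(a, b):
    """a, b = (x1, y1, x2, y2). Intersection over union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay, rather than discard, overlapping boxes."""
    dets = sorted(zip(boxes, scores), key=lambda d: -d[1])
    keep = []
    while dets:
        best, best_s = dets.pop(0)
        keep.append((best, best_s))
        # Decay each remaining score by exp(-iou^2 / sigma).
        dets = [(b, s * math.exp(-iou(best, b) ** 2 / sigma)) for b, s in dets]
        dets = sorted((d for d in dets if d[1] > score_thresh), key=lambda d: -d[1])
    return keep

# Two heavily overlapping boxes: the weaker one is kept with a decayed
# score rather than being suppressed outright; the far box is untouched.
kept = soft_nms([(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)], [0.9, 0.8, 0.7])
```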
Affiliation(s)
- Qianyong Chen
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, China
- Mengshan Li
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, China
- Zhenghui Lai
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, China
- Jihong Zhu
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, China
- Lixin Guan
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, China
4
Zhou H. Multiscale apple recognition method based on improved CenterNet. Heliyon 2024; 10:e29035. [PMID: 38633658] [PMCID: PMC11021973] [DOI: 10.1016/j.heliyon.2024.e29035]
Abstract
Traditional apple-picking robots are unable to detect apples in real time in complex environments. To improve detection efficiency, a fast CenterNet-based recognition method for multiple apple targets in dense scenes is proposed, which can quickly and accurately identify multiple apple targets. The backbone mainly consists of a ResNet-44 fully convolutional network, a region proposal network (RPN), and a region of interest (ROI) module. The experimental results show that the improved model achieves recognition accuracies of 94.1% and 95.8% for apples in night environments, improving recognition of occluded features and features in dim light, and the model is more robust on the real-world dataset.
Affiliation(s)
- Han Zhou
- College of Mechanical and Electrical Engineering, Hainan Vocational University of Science and Technology, Haikou, 571126, Hainan, China
5
Chen BL, Cheng TH, Huang YC, Hsieh YL, Hsu HC, Lu CY, Huang MH, Nien SY, Kuo YF. Developing an automatic warning system for anomalous chicken dispersion and movement using deep learning and machine learning. Poult Sci 2023; 102:103040. [PMID: 37769488] [PMCID: PMC10539969] [DOI: 10.1016/j.psj.2023.103040]
Abstract
Chicken is a major source of dietary protein worldwide. The dispersion and movement of chickens are vital indicators of their health and status. This is especially evident in Taiwanese native chickens (TNCs), a local variety that is highly physically active when healthy. Conventionally, the dispersion and movement of chicken flocks are observed in patrols. However, manual patrolling is laborious and time-consuming, and frequent patrols increase the risk of carrying pathogens into chicken farms. To address these issues, this study proposes an approach for developing an automatic warning system for anomalous dispersion and movement of chicken flocks in commercial chicken farms. Embedded systems were developed to acquire videos of chickens from an overhead view in a chicken house, in which approximately 20,000 TNCs were raised over a period of 10 wk. Each video was 5 min long. The videos were transmitted to a remote cloud server and converted into images. A You Only Look Once version 7 tiny (YOLOv7-tiny) object detection model was trained to detect chickens in the images. The dispersion of the chicken flocks in each 5-min video was calculated using the nearest neighbor index (NNI), and their movement was quantified using the simple online and real-time tracking (SORT) algorithm. The normal ranges (i.e., 95% confidence intervals) of chicken dispersion and movement were established using an autoregressive integrated moving average (ARIMA) model and a seasonal autoregressive integrated moving average with exogenous factors (SARIMAX) model, respectively. The system allows farmers to check on the chicken farm only when the dispersion or movement values fall outside the normal ranges, saving labor time and reducing the risk of carrying pathogens into the farm. The trained YOLOv7-tiny model achieved an average precision of 98.2% in chicken detection. SORT achieved a multiple object tracking accuracy of 95.3%. The ARIMA and SARIMAX models achieved mean absolute percentage errors of 3.71% and 13.39%, respectively, in forecasting dispersion and movement. The proposed approach can serve as a solution for automatic monitoring of anomalous chicken dispersion and movement in chicken farming, alerting farmers to potential health risks and environmental hazards.
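The nearest neighbor index used above to quantify dispersion can be sketched briefly: the Clark-Evans NNI divides the observed mean nearest-neighbor distance by the distance expected under complete spatial randomness, 0.5/√(n/A), with values below 1 indicating clustering. The point coordinates and pen area below are toy values, not data from the study.

```python
import math

def nearest_neighbor_index(points, area):
    """Clark-Evans nearest neighbor index: observed mean nearest-neighbor
    distance over the expected distance 0.5 / sqrt(n / area) for a random
    pattern. NNI < 1 suggests clustering; NNI > 1 suggests dispersion."""
    n = len(points)
    d_obs = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    d_exp = 0.5 / math.sqrt(n / area)  # expectation under spatial randomness
    return d_obs / d_exp

# Four birds huddled in one corner of a 10 x 10 m pen: strongly clustered.
cluster = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
nni = nearest_neighbor_index(cluster, area=100.0)
```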
Affiliation(s)
- Bo-Lin Chen
- Department of Biomechatronics Engineering, National Taiwan University, Taipei, Taiwan
- Ting-Hui Cheng
- Department of Biomechatronics Engineering, National Taiwan University, Taipei, Taiwan
- Yi-Che Huang
- Department of Biomechatronics Engineering, National Taiwan University, Taipei, Taiwan
- Yu-Lun Hsieh
- Department of Biomechatronics Engineering, National Taiwan University, Taipei, Taiwan
- Hao-Chun Hsu
- Department of Biomechatronics Engineering, National Taiwan University, Taipei, Taiwan
- Chen-Yi Lu
- Department of Biomechatronics Engineering, National Taiwan University, Taipei, Taiwan
- Mao-Hsiang Huang
- Department of Biomechatronics Engineering, National Taiwan University, Taipei, Taiwan
- Shu-Yao Nien
- Department of Biomechatronics Engineering, National Taiwan University, Taipei, Taiwan
- Yan-Fu Kuo
- Department of Biomechatronics Engineering, National Taiwan University, Taipei, Taiwan
6
Chai B, Efstathiou C, Yue H, Draviam VM. Opportunities and challenges for deep learning in cell dynamics research. Trends Cell Biol 2023:S0962-8924(23)00228-3. [PMID: 38030542] [DOI: 10.1016/j.tcb.2023.10.010]
Abstract
The growth of artificial intelligence (AI) has led to an increase in the adoption of computer vision and deep learning (DL) techniques for the evaluation of microscopy images and movies. This adoption has not only addressed hurdles in quantitative analysis of dynamic cell biological processes but has also started to support advances in drug development, precision medicine, and genome-phenome mapping. We survey existing AI-based techniques and tools, as well as open-source datasets, with a specific focus on the computational tasks of segmentation, classification, and tracking of cellular and subcellular structures and dynamics. We summarise long-standing challenges in microscopy video analysis from a computational perspective and review emerging research frontiers and innovative applications for DL-guided automation in cell dynamics research.
Affiliation(s)
- Binghao Chai
- School of Biological and Behavioural Sciences, Queen Mary University of London (QMUL), London E1 4NS, UK
- Christoforos Efstathiou
- School of Biological and Behavioural Sciences, Queen Mary University of London (QMUL), London E1 4NS, UK
- Haoran Yue
- School of Biological and Behavioural Sciences, Queen Mary University of London (QMUL), London E1 4NS, UK
- Viji M Draviam
- School of Biological and Behavioural Sciences, Queen Mary University of London (QMUL), London E1 4NS, UK; The Alan Turing Institute, London NW1 2DB, UK.
7
Malta A, Farinha T, Mendes M. Augmented Reality in Maintenance: History and Perspectives. J Imaging 2023; 9:142. [PMID: 37504819] [PMCID: PMC10381749] [DOI: 10.3390/jimaging9070142]
Abstract
Augmented Reality (AR) is a technology that superimposes virtual elements, whether text, graphics, or other types of objects, over images of real contexts. Smart AR glasses are increasingly optimized, and modern ones offer features such as a Global Positioning System (GPS) receiver, a microphone, and gesture recognition, among others. These devices leave users' hands free to perform tasks while they receive instructions in real time through the glasses, allowing maintenance professionals to carry out interventions more efficiently and in less time than would be possible without the technology. In the present work, a timeline of important achievements is established, including key findings in object recognition, real-time operation, and integration of technologies for shop-floor use. Perspectives on future research and related recommendations are also proposed.
Affiliation(s)
- Ana Malta
- Coimbra Institute of Engineering, Rua Pedro Nunes-Quinta da Nora, Polytechnic Institute of Coimbra, 3030-199 Coimbra, Portugal
- RCM2+ Research Centre for Asset Management and Systems Engineering, ISEC/IPC, Rua Pedro Nunes, 3030-199 Coimbra, Portugal
- Torres Farinha
- Coimbra Institute of Engineering, Rua Pedro Nunes-Quinta da Nora, Polytechnic Institute of Coimbra, 3030-199 Coimbra, Portugal
- RCM2+ Research Centre for Asset Management and Systems Engineering, ISEC/IPC, Rua Pedro Nunes, 3030-199 Coimbra, Portugal
- Mateus Mendes
- Coimbra Institute of Engineering, Rua Pedro Nunes-Quinta da Nora, Polytechnic Institute of Coimbra, 3030-199 Coimbra, Portugal
- RCM2+ Research Centre for Asset Management and Systems Engineering, ISEC/IPC, Rua Pedro Nunes, 3030-199 Coimbra, Portugal
8
Berwo MA, Khan A, Fang Y, Fahim H, Javaid S, Mahmood J, Abideen ZU, M S S. Deep Learning Techniques for Vehicle Detection and Classification from Images/Videos: A Survey. Sensors (Basel) 2023; 23:4832. [PMID: 37430745] [DOI: 10.3390/s23104832]
Abstract
Detecting and classifying vehicles as objects from images and videos is challenging for appearance-based representations, yet plays a significant role in substantial real-time applications of Intelligent Transportation Systems (ITSs). The rapid development of Deep Learning (DL) has led the computer-vision community to demand efficient, robust, and outstanding services in various fields. This paper covers a wide range of vehicle detection and classification approaches and their application to estimating traffic density, real-time targets, toll management, and other areas using DL architectures. Moreover, the paper presents a detailed analysis of DL techniques, benchmark datasets, and preliminaries. A survey of vital detection and classification applications and their performance is conducted, with a detailed investigation of the challenges faced. The paper also addresses the promising technological advancements of the last few years.
Affiliation(s)
- Michael Abebe Berwo
- School of Information and Engineering, Chang'an University, Xi'an 710064, China
- Asad Khan
- School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou 510006, China
- Yong Fang
- School of Information and Engineering, Chang'an University, Xi'an 710064, China
- Hamza Fahim
- School of Electronics and Information, Tongji University, Shanghai 200070, China
- Shumaila Javaid
- School of Electronics and Information, Tongji University, Shanghai 200070, China
- Jabar Mahmood
- School of Information and Engineering, Chang'an University, Xi'an 710064, China
- Zain Ul Abideen
- Research Institute of Automotive Engineering, Jiangsu University, Zhenjiang 212013, China
- Syam M S
- IOT Research Center, Shenzhen University, Shenzhen 518060, China
9
Gragnaniello D, Greco A, Saggese A, Vento M, Vicinanza A. Benchmarking 2D Multi-Object Detection and Tracking Algorithms in Autonomous Vehicle Driving Scenarios. Sensors (Basel) 2023; 23:4024. [PMID: 37112365] [PMCID: PMC10141924] [DOI: 10.3390/s23084024]
Abstract
Self-driving vehicles must be controlled by navigation algorithms that ensure safe driving for passengers, pedestrians, and other drivers. A key factor in achieving this goal is the availability of effective multi-object detection and tracking algorithms, which make it possible to estimate the position, orientation, and speed of pedestrians and other vehicles on the road. The experimental analyses conducted so far have not thoroughly evaluated the effectiveness of these methods in road driving scenarios. To this end, this paper proposes a benchmark of modern multi-object detection and tracking methods applied to image sequences acquired by a camera installed on board the vehicle, namely the videos available in the BDD100K dataset. The proposed experimental framework evaluates 22 different combinations of multi-object detection and tracking methods using metrics that highlight the positive contributions and limitations of each module of the considered algorithms. The analysis of the experimental results points out that the best method currently available is the combination of ConvNeXt and QDTrack, but also that multi-object tracking methods applied to road images must be substantially improved. The authors conclude that the evaluation metrics should be extended to consider specific aspects of autonomous driving scenarios, such as the multi-class problem formulation and the distance from the targets, and that the effectiveness of the methods should be evaluated by simulating the impact of errors on driving safety.
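For context, the standard CLEAR-MOT accuracy score that tracking benchmarks of this kind typically report can be sketched as a single formula over accumulated per-frame error counts. The counts below are invented, and the paper's evaluation uses a richer set of metrics than this one number.

```python
def mota(frames):
    """MOTA = 1 - (FN + FP + IDSW) / total ground-truth objects, accumulated
    over all frames. Each frame is a dict of per-frame counts: false
    negatives, false positives, identity switches, ground-truth objects."""
    fn = sum(f["fn"] for f in frames)
    fp = sum(f["fp"] for f in frames)
    idsw = sum(f["idsw"] for f in frames)
    gt = sum(f["gt"] for f in frames)
    return 1.0 - (fn + fp + idsw) / gt

# Toy two-frame sequence: 4 total errors over 20 ground-truth objects.
frames = [
    {"fn": 1, "fp": 0, "idsw": 0, "gt": 10},
    {"fn": 0, "fp": 2, "idsw": 1, "gt": 10},
]
score = mota(frames)  # 1 - 4/20 = 0.8
```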
10
Ma TJ, Anderson RJ. Remote Sensing Low Signal-to-Noise-Ratio Target Detection Enhancement. Sensors (Basel) 2023; 23:3314. [PMID: 36992025] [PMCID: PMC10054736] [DOI: 10.3390/s23063314]
Abstract
In real-time remote sensing applications, frames of data flow continuously into the processing system. The capability to detect objects of interest and track them as they move is crucial to many critical surveillance and monitoring missions. Detecting small objects with remote sensors is an ongoing, challenging problem: since the objects are located far from the sensor, the target's Signal-to-Noise Ratio (SNR) is low, and the Limit of Detection (LOD) for remote sensors is bounded by what is observable in each image frame. In this paper, we present a new method, the Multi-frame Moving Object Detection System (MMODS), to detect small, low-SNR objects beyond what a human can observe in a single video frame. This is demonstrated using simulated data, in which our technology detected objects as small as one pixel with a target SNR close to 1:1. We also demonstrate a similar improvement using live data collected with a remote camera. The MMODS technology fills a major technology gap in remote sensing surveillance for small-target detection. Our method requires no prior knowledge of the environment, pre-labeled targets, or training data to effectively detect and track slow- and fast-moving targets, regardless of size or distance.
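The core intuition behind multi-frame detection of sub-noise targets can be shown with a toy static-scene example: averaging N aligned frames shrinks zero-mean noise by roughly √N, lifting a 1:1-SNR target above the noise floor. This toy ignores target motion, which MMODS itself handles; all values are synthetic.

```python
import random

random.seed(42)
N_FRAMES, N_PIX, TARGET = 64, 100, 5          # target sits at pixel index 5
signal = [1.0 if i == TARGET else 0.0 for i in range(N_PIX)]

# Each frame adds unit-variance Gaussian noise, so the single-frame SNR ~ 1:1
# and the target is indistinguishable from noise in any one frame.
frames = [[s + random.gauss(0.0, 1.0) for s in signal] for _ in range(N_FRAMES)]

# Averaging 64 frames reduces the noise std to ~1/8, so the unit-amplitude
# target now stands out clearly as the brightest pixel.
stacked = [sum(f[i] for f in frames) / N_FRAMES for i in range(N_PIX)]
brightest = max(range(N_PIX), key=lambda i: stacked[i])
```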
11
Li Y, Liu M, Yi Y, Li Q, Ren D, Zuo W. Two-stage single image reflection removal with reflection-aware guidance. Appl Intell 2023. [DOI: 10.1007/s10489-022-04391-6]
12
Kim T, Goh TS, Lee JS, Lee JH, Kim H, Jung ID. Transfer learning-based ensemble convolutional neural network for accelerated diagnosis of foot fractures. Phys Eng Sci Med 2023; 46:265-277. [PMID: 36625995] [DOI: 10.1007/s13246-023-01215-w]
Abstract
The complex shape of the foot, consisting of 26 bones, variable ligaments, tendons, and muscles, leads to misdiagnosis of foot fractures. Despite the introduction of artificial intelligence (AI) for fracture diagnosis, the accuracy of foot fracture diagnosis is lower than that of conventional methods. We developed an AI assistant system that supports consistent diagnosis and helps interns or non-experts improve their diagnosis of foot fractures, and compared the effectiveness of AI assistance across groups with different proficiency. Contrast-limited adaptive histogram equalization was used to improve the visibility of the original radiographs, and data augmentation was applied to prevent overfitting. Preprocessed radiographs were fed to an ensemble of transfer learning-based convolutional neural networks (CNNs) developed for foot fracture detection with three models: InceptionResNetV2, MobileNetV1, and ResNet152V2. After training, score class activation mapping was applied to visualize the fracture based on the model prediction. The predictions were evaluated by the receiver operating characteristic (ROC) curve, its area under the curve (AUC), and the F1-score. On the test set, the ensemble model exhibited better classification ability (F1-score: 0.837, AUC: 0.95, accuracy: 86.1%) than the single models, which showed an accuracy of 82.4%. With AI assistance, the accuracy of the orthopedic fellow, resident, intern, and student groups improved by 3.75%, 7.25%, 6.25%, and 7%, respectively, and diagnosis time was reduced by 21.9%, 14.7%, 24.4%, and 34.6%, respectively.
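The ensemble step can be illustrated with a minimal soft-voting sketch, in which the per-class probabilities from the three backbones are averaged and the argmax taken. The combination rule and the probability vectors below are assumptions for illustration; the paper does not state that it uses exactly this rule.

```python
def soft_vote(prob_lists):
    """Average per-class probabilities across models; return the winning
    class index and the averaged probability vector (soft voting)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg

# Invented outputs over two classes (fracture, normal): two backbones lean
# "fracture" (class 0), one leans "normal" (class 1); the ensemble picks 0.
preds = [[0.7, 0.3], [0.6, 0.4], [0.4, 0.6]]
cls, avg = soft_vote(preds)
```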
Affiliation(s)
- Taekyeong Kim
- Department of Mechanical Engineering, Ulsan National Institute of Science and Technology, Ulsan, 44919, Republic of Korea
- Tae Sik Goh
- Department of Orthopaedic Surgery, Biomedical Research Institute, Pusan National University Hospital, Pusan National University School of Medicine, Busan, 49241, Republic of Korea
- Jung Sub Lee
- Department of Orthopaedic Surgery, Biomedical Research Institute, Pusan National University Hospital, Pusan National University School of Medicine, Busan, 49241, Republic of Korea
- Ji Hyun Lee
- Health Insurance Review & Assessment Service, Wonju, 26465, Republic of Korea
- Hayeol Kim
- Department of Mechanical Engineering, Ulsan National Institute of Science and Technology, Ulsan, 44919, Republic of Korea
- Im Doo Jung
- Department of Mechanical Engineering, Ulsan National Institute of Science and Technology, Ulsan, 44919, Republic of Korea
13
Cui Z, Lu N. Feature-comparison network for visual tracking. Appl Intell 2023. [DOI: 10.1007/s10489-023-04466-y]
14
Gullapelly A, Banik BG. Multiple object tracking with behavior detection in crowded scenes using deep learning. J Intell Fuzzy Syst 2022. [DOI: 10.3233/jifs-223516]
Abstract
Multi-object tracking (MOT) is essential for solving the majority of computer vision issues related to crowd analytics. Designing an MOT system involves two main steps: object detection and association. In the first step, every frame of the video stream is examined to find the desired objects. In the second step, trajectories are determined by matching the detections in the current frame to those in the previous frame. An object detection system with high accuracy produces fewer missed detections and, as a result, fewer fragmented tracks. We propose a new deep learning-based model for improving the performance of object detection and tracking in this research. First, object detection is performed using an adaptive Mask R-CNN model. Then, the ResNet-50 model is used to extract more reliable and significant object features, and an adaptive feature channel selection method is employed to select the feature channels that determine the final response map. Finally, an adaptive combination kernel correlation filter is used for multiple object tracking. Extensive experiments were conducted on the large object tracking databases MOT-20 and KITTI-MOTS. According to the experimental results, the proposed tracker performs better than other state-of-the-art trackers under various challenging conditions. The experimental simulation is implemented in Python. The overall success rate and precision of the proposed algorithm are 95.36% and 93.27%, respectively.
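The association step described above, matching current-frame detections to existing tracks, can be illustrated generically with a greedy IoU matcher. This is a deliberately simpler stand-in for the paper's adaptive combination kernel correlation filter, not its method; box formats and the 0.3 threshold are assumptions.

```python
def iou(a, b):
    """a, b = (x1, y1, x2, y2). Intersection over union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, iou_thresh=0.3):
    """Greedily pair each track with its best-overlapping detection above
    a threshold. Returns {track_id: detection_index}."""
    pairs = sorted(
        ((iou(t_box, d), tid, di)
         for tid, t_box in tracks.items()
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_t, used_d = {}, set(), set()
    for score, tid, di in pairs:
        if score < iou_thresh or tid in used_t or di in used_d:
            continue
        matches[tid] = di
        used_t.add(tid)
        used_d.add(di)
    return matches

# Two tracks, two detections that have each drifted by a pixel or two.
tracks = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
dets = [(49, 51, 59, 61), (1, 0, 11, 10)]
m = associate(tracks, dets)  # track 1 -> det 1, track 2 -> det 0
```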
Affiliation(s)
- Aparna Gullapelly
- Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Deemed to be University, Hyderabad, Telangana, India
- Barnali Gupta Banik
- Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Deemed to be University, Hyderabad, Telangana, India
15
Dutta D, Pal SK. Prediction and assessment of the impact of COVID-19 lockdown on air quality over Kolkata: a deep transfer learning approach. Environ Monit Assess 2022; 195:223. [PMID: 36544059] [PMCID: PMC9771789] [DOI: 10.1007/s10661-022-10761-x]
Abstract
The present study focuses on predicting and assessing the impact of the coronavirus lockdown on air quality during three different phases: normal periods (1 January 2018-23 March 2020), complete lockdown (24 March 2020-31 May 2020), and partial lockdown (1 June 2020-30 September 2020). We identify the most important air pollutants influencing the air quality of Kolkata during these three periods using Random Forest, a tree-based machine learning (ML) algorithm. We find that the ambient air quality of Kolkata is mainly affected by particulate matter (PM10 and PM2.5). The effect of the lockdown is most prominent on PM2.5, which enters the air of Kolkata through diesel-driven vehicles, domestic and commercial combustion activities, road dust, and open burning. To predict urban PM2.5 and PM10 concentrations 24 h in advance, we use a deep learning (DL) model, namely a stacked bidirectional long short-term memory network (stacked-BDLSTM). The model is trained on the normal periods and outperforms several supervised ML models, such as the support vector machine, K-nearest neighbor classifier, multilayer perceptron, and long short-term memory, as well as the statistical time-series forecasting model autoregressive integrated moving average (ARIMA). This pre-trained stacked-BDLSTM is then applied to predict PM2.5 and PM10 concentrations during the two pandemic phases, complete lockdown and partial lockdown, using a deep-model-based transfer learning (TL) approach (TLS-BDLSTM). Transfer learning aims to utilize the information gained from one problem to improve the predictive performance of a learning model on a different but related problem. Our work demonstrates how TL is useful when data are scarce, as during the COVID-19 pandemic with its drastic change in pollutant concentrations. The results reveal the best prediction performance for TLS-BDLSTM at a lead time of 24 h compared with several well-known traditional ML and statistical models and the pre-trained stacked-BDLSTM. The predictions are then validated against real-time data obtained during the complete lockdown of the COVID second wave (16 May-15 June 2021) at different time steps, e.g., 24 h, 48 h, 72 h, and 96-120 h. TLS-BDLSTM outperforms the comparison methods in modeling the long-term temporal dependencies of multivariate time-series data and boosts forecast accuracy not only in single-step but also in multi-step prediction. The proposed methodologies are effective and consistent and can be used by operational organizations for air quality monitoring and management.
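The core transfer-learning recipe in this entry (pretrain on the data-rich normal periods, then adapt to scarce target data) can be illustrated in miniature. The sketch below is not the authors' stacked-BDLSTM; it is a hypothetical linear model in plain NumPy, where only the bias is fine-tuned on a handful of target samples:

```python
import numpy as np

def train_linear(x, y, w=0.0, b=0.0, lr=0.01, steps=2000, freeze_w=False):
    """Fit y ~ w*x + b by gradient descent; optionally freeze the slope."""
    for _ in range(steps):
        err = w * x + b - y
        if not freeze_w:
            w -= lr * 2 * np.mean(err * x)
        b -= lr * 2 * np.mean(err)
    return w, b

# Source task: plenty of data.
x_src = np.linspace(0, 4, 50)
y_src = 2.0 * x_src + 1.0
w, b = train_linear(x_src, y_src)

# Target task: same slope, shifted offset, only 3 samples (data scarcity).
x_tgt = np.array([0.0, 1.0, 2.0])
y_tgt = 2.0 * x_tgt + 3.0

mse_zero_shot = np.mean((w * x_tgt + b - y_tgt) ** 2)
# Transfer learning step: reuse the pretrained weights, fine-tune only the bias.
w_ft, b_ft = train_linear(x_tgt, y_tgt, w=w, b=b, freeze_w=True)
mse_finetuned = np.mean((w_ft * x_tgt + b_ft - y_tgt) ** 2)
```

Even this toy setup shows the pattern: the zero-shot error on the shifted target task is large, while fine-tuning a small subset of parameters on three samples recovers it.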
Affiliation(s)
- Debashree Dutta
- Center for Soft Computing Research, Indian Statistical Institute, Kolkata, 700108 India
- Sankar K. Pal
- Center for Soft Computing Research, Indian Statistical Institute, Kolkata, 700108 India
16
Sánchez-Ferrer A, Valero-Mas JJ, Gallego AJ, Calvo-Zaragoza J. An Experimental Study on Marine Debris Location and Recognition using Object Detection. Pattern Recognit Lett 2022. [DOI: 10.1016/j.patrec.2022.12.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]
17
Smooth momentum: improving Lipschitzness in gradient descent. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04207-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
18
Zong G, Wei L, Guo S, Wang Y. A cascaded refined RGB-D salient object detection network based on the attention mechanism. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04186-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
19
Ferreira R, José de Castro Ferreira J, José Ribeiro Neves A. Object Tracking Using Adapted Optical Flow. ARTIF INTELL 2022. [DOI: 10.5772/intechopen.102863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Abstract
The objective of this work is to present an object tracking algorithm developed from the combination of random tree techniques and optical flow adapted in terms of Gaussian curvature. This defines a minimal surface bounded by the contour of a two-dimensional image, which may or may not contain a minimum number of optical flow vectors associated with the movement of an object. The random tree serves to identify and discard superfluous optical flow vectors, leaving a minimal set of vectors that characterizes the object's movement. The results were compared with those of the Lucas-Kanade algorithm (with and without a Gaussian filter), Horn-Schunck, and Farneback. The items evaluated were precision and processing time, which made it possible to validate the results despite the distinct nature of the algorithms: precision was comparable to Lucas-Kanade (with or without a Gaussian filter) and Horn-Schunck, and better than Farneback. This work allows the optical flow over small regions to be analyzed optimally with respect to precision (and computational cost), enabling applications in areas such as cardiology, e.g., infarction prediction.
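For context, the classical Lucas-Kanade step that the authors compare against solves the brightness-constancy constraint by least squares over a window. A minimal NumPy sketch on a synthetic quadratic pattern shifted one pixel to the right (this is the textbook baseline, not the paper's curvature-adapted variant):

```python
import numpy as np

def lucas_kanade_window(I1, I2):
    """Estimate one (u, v) flow vector for a window by least squares
    on the brightness-constancy constraint Ix*u + Iy*v = -It."""
    Iy, Ix = np.gradient(I1.astype(float))   # gradients along rows (y) and cols (x)
    It = I2.astype(float) - I1.astype(float)
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic window: a smooth bowl-shaped pattern moved by (u, v) = (1, 0).
ys, xs = np.mgrid[0:12, 0:12]
I1 = (xs - 5.5) ** 2 + (ys - 5.5) ** 2
I2 = (xs - 1 - 5.5) ** 2 + (ys - 5.5) ** 2
u, v = lucas_kanade_window(I1, I2)
```

The recovered flow is close to (1, 0); the small deviation comes from the one-sided gradient approximation at the window border.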
20
Alcántara A, Galván IM, Aler R. Deep neural networks for the quantile estimation of regional renewable energy production. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03958-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
Abstract
Wind and solar energy forecasting have become crucial for the inclusion of renewable energy in electrical power systems. Although most works have focused on point prediction, it is currently becoming important to also estimate the forecast uncertainty. With regard to forecasting methods, deep neural networks have shown good performance in many fields. However, the use of these networks for comparative studies of probabilistic forecasts of renewable energies, especially for regional forecasts, has not yet received much attention. The aim of this article is to study the performance of deep networks for estimating multiple conditional quantiles on regional renewable electricity production and compare them with widely used quantile regression methods such as the linear, support vector quantile regression, gradient boosting quantile regression, natural gradient boosting and quantile regression forest methods. A grid of numerical weather prediction variables covers the region of interest. These variables act as the predictors of the regional model. In addition to quantiles, prediction intervals are also constructed, and the models are evaluated using different metrics. These prediction intervals are further improved through an adapted conformalized quantile regression methodology. Overall, the results show that deep networks are the best performing method for both solar and wind energy regions, producing narrow prediction intervals with good coverage.
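The conditional quantiles discussed above are typically learned by minimizing the pinball (quantile) loss, whose minimizer over constant predictions is the empirical quantile. A small NumPy illustration (the network itself is omitted; the constant candidates below are hypothetical stand-ins for model predictions):

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss: asymmetric penalty that is minimized,
    in expectation, by the q-th conditional quantile."""
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

# The empirical q-quantile (approximately) minimizes the pinball loss
# over constant predictions.
y = np.arange(1.0, 101.0)             # observations 1, 2, ..., 100
candidates = np.arange(1.0, 101.0)
losses = [pinball_loss(y, c, 0.9) for c in candidates]
best = candidates[int(np.argmin(losses))]   # lands near the 90th percentile
```

Training one network head per quantile with this loss, one pinball term per q, yields the multiple conditional quantiles the paper evaluates.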
21
Mispronunciation Detection and Diagnosis with Articulatory-Level Feedback Generation for Non-Native Arabic Speech. MATHEMATICS 2022. [DOI: 10.3390/math10152727] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
A high-performance, versatile computer-assisted pronunciation training (CAPT) system that gives the learner immediate feedback on whether their pronunciation is correct is very helpful for learning correct pronunciation, as it allows learners to practice at any time, with unlimited repetitions, without the presence of an instructor. In this paper, we propose deep learning-based techniques to build such a CAPT system for mispronunciation detection and diagnosis (MDD) and articulatory feedback generation for non-native Arabic learners. The proposed system can locate the error in pronunciation, recognize the mispronounced phonemes, and detect the corresponding articulatory features (AFs), not only in words but even in sentences. We formulate the recognition of phonemes and corresponding AFs as a multi-label object recognition problem, where the objects are the phonemes and their AFs in a spectral image. Moreover, we investigate the use of cutting-edge neural text-to-speech (TTS) technology to generate a new corpus of high-quality speech from predefined text containing the most common substitution errors among Arabic learners. The proposed model and its various enhanced versions achieved excellent results. We compared the performance of the different proposed models with the state-of-the-art end-to-end MDD technique, and our system performed better. In addition, we fused the proposed model with the end-to-end model and obtained a further improvement. Our best model achieved a 3.83% phoneme error rate (PER) in the phoneme recognition task, a 70.53% F1-score in the MDD task, and a detection error rate (DER) of 2.6% in the AF detection task.
22
Udaya Mohanan K, Cho S, Park BG. Optimization of the structural complexity of artificial neural network for hardware-driven neuromorphic computing application. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03783-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
Abstract
This work focuses on the optimization of the structural complexity of a single-layer feedforward neural network (SLFN) for neuromorphic hardware implementation. The singular value decomposition (SVD) method is used for the determination of the effective number of neurons in the hidden layer for Modified National Institute of Standards and Technology (MNIST) dataset classification. The proposed method is also verified on a SLFN using weights derived from a synaptic transistor device. The effectiveness of this methodology in estimating the reduced number of neurons in the hidden layer makes this method highly useful in optimizing complex neural network architectures for their hardware realization.
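The SVD-based estimate of the effective number of hidden neurons can be sketched directly: count the singular values of the hidden-layer weight matrix that are non-negligible relative to the largest one. The threshold and matrix sizes below are assumed choices for illustration, not the paper's:

```python
import numpy as np

def effective_neurons(W, tol=1e-6):
    """Count singular values of a hidden-layer weight matrix that carry
    information, relative to the largest singular value."""
    s = np.linalg.svd(W, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# A 64x32 weight matrix built from only 3 independent directions:
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 3)) @ rng.normal(size=(3, 32))
k = effective_neurons(W)   # → 3
```

Here the matrix has 32 columns but only 3 informative directions, so the method would suggest a hidden layer of 3 neurons suffices.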
23
Zheng Y, Cui L. Defect detection on new samples with Siamese defect-aware attention network. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03595-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
24
Cai Y, Li L, Wang D, Liu X. MFNet: Multi-level fusion aware feature pyramid based multi-view stereo network for 3D reconstruction. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03754-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
25
Krenzer A, Makowski K, Hekalo A, Fitting D, Troya J, Zoller WG, Hann A, Puppe F. Fast machine learning annotation in the medical domain: a semi-automated video annotation tool for gastroenterologists. Biomed Eng Online 2022; 21:33. [PMID: 35614504 PMCID: PMC9134702 DOI: 10.1186/s12938-022-01001-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2021] [Accepted: 04/25/2022] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Machine learning, especially deep learning, is becoming more and more relevant in research and development in the medical domain. For all supervised deep learning applications, data is the most critical factor in securing successful implementation and sustaining the progress of the machine learning model. Gastroenterological data in particular, which often involve endoscopic videos, are cumbersome to annotate. Domain experts are needed to interpret and annotate the videos. To support those domain experts, we developed a framework. With this framework, instead of annotating every frame in the video sequence, experts only perform key annotations at the beginning and the end of sequences with pathologies, e.g., visible polyps. Subsequently, non-expert annotators supported by machine learning add the missing annotations for the frames in between. METHODS In our framework, an expert reviews the video and annotates a few video frames to verify the object's annotations for the non-expert. In a second step, a non-expert has visual confirmation of the given object and can annotate all following and preceding frames with AI assistance. After the expert has finished, relevant frames are selected and passed on to an AI model. This information allows the AI model to detect and mark the desired object on all following and preceding frames with an annotation. The non-expert can then adjust and modify the AI predictions and export the results, which can be used to train the AI model. RESULTS Using this framework, we were able to reduce the workload of domain experts by a factor of 20 on average on our data. This is primarily due to the structure of the framework, which is designed to minimize the workload of the domain expert. Pairing this framework with a state-of-the-art semi-automated AI model enhances the annotation speed further. Through a prospective study with 10 participants, we show that semi-automated annotation using our tool doubles the annotation speed of non-expert annotators compared to a well-known state-of-the-art annotation tool. CONCLUSION In summary, we introduce a framework for fast expert annotation for gastroenterologists, which reduces the workload of the domain expert considerably while maintaining very high annotation quality. The framework incorporates a semi-automated annotation system utilizing trained object detection models. The software and framework are open-source.
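As a rough baseline for the in-between annotations that the framework fills in automatically, one can linearly interpolate the expert's keyframe boxes. This is a deliberately simplified stand-in (the actual tool uses a trained object detection model, not interpolation):

```python
def interpolate_box(box_start, box_end, t):
    """Linearly interpolate two (x, y, w, h) boxes; t in [0, 1].
    A crude stand-in for the AI-assisted in-between annotations."""
    return tuple(a + t * (b - a) for a, b in zip(box_start, box_end))

# Expert annotates frame 0 and frame 10; fill frame 5 automatically.
start = (10.0, 20.0, 50.0, 40.0)
end = (30.0, 24.0, 54.0, 44.0)
mid = interpolate_box(start, end, 5 / 10)   # → (20.0, 22.0, 52.0, 42.0)
```

The non-expert then only corrects frames where such automatic proposals drift from the true object, which is where most of the reported speedup comes from.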
Affiliation(s)
- Adrian Krenzer
- Department of Artificial Intelligence and Knowledge Systems, Sanderring 2, 97070, Würzburg, Germany
- Kevin Makowski
- Department of Artificial Intelligence and Knowledge Systems, Sanderring 2, 97070, Würzburg, Germany
- Amar Hekalo
- Department of Artificial Intelligence and Knowledge Systems, Sanderring 2, 97070, Würzburg, Germany
- Daniel Fitting
- Interventional and Experimental Endoscopy (InExEn), Department of Internal Medicine II, University Hospital Würzburg, Oberdürrbacher Straße 6, 97080, Würzburg, Germany
- Joel Troya
- Interventional and Experimental Endoscopy (InExEn), Department of Internal Medicine II, University Hospital Würzburg, Oberdürrbacher Straße 6, 97080, Würzburg, Germany
- Wolfram G Zoller
- Department of Internal Medicine and Gastroenterology, Katharinenhospital, Kriegsbergstrasse 60, 70174, Stuttgart, Germany
- Alexander Hann
- Interventional and Experimental Endoscopy (InExEn), Department of Internal Medicine II, University Hospital Würzburg, Oberdürrbacher Straße 6, 97080, Würzburg, Germany
- Frank Puppe
- Department of Artificial Intelligence and Knowledge Systems, Sanderring 2, 97070, Würzburg, Germany
26
27
Cores D, Brea VM, Mucientes M. Spatiotemporal tubelet feature aggregation and object linking for small object detection in videos. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03529-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
This paper addresses the problem of exploiting spatiotemporal information to improve small object detection precision in video. We propose a two-stage object detector called FANet based on short-term spatiotemporal feature aggregation and long-term object linking to refine object detections. First, we generate a set of short tubelet proposals. Then, we aggregate RoI pooled deep features throughout the tubelet using a new temporal pooling operator that summarizes the information with a fixed output size independent of the tubelet length. In addition, we define a double head implementation that we feed with spatiotemporal information for spatiotemporal classification and with spatial information for object localization and spatial classification. Finally, a long-term linking method builds long tubes with the previously calculated short tubelets to overcome detection errors. The association strategy addresses the generally low overlap between instances of small objects in consecutive frames by reducing the influence of the overlap in the final linking score. We evaluated our model in three different datasets with small objects, outperforming previous state-of-the-art spatiotemporal object detectors and our spatial baseline.
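The key property of the temporal pooling operator described above (a fixed output size regardless of tubelet length T) can be demonstrated with a simple mean-and-max pool over the time axis. This is an illustrative operator, not FANet's actual one:

```python
import numpy as np

def temporal_pool(tubelet_feats):
    """Summarize per-frame RoI features of shape (T, D) into a fixed-size
    vector of shape (2*D,), independent of tubelet length T, by
    concatenating mean and max over the time axis."""
    return np.concatenate([tubelet_feats.mean(axis=0),
                           tubelet_feats.max(axis=0)])

pooled_short = temporal_pool(np.random.rand(3, 128))    # 3-frame tubelet
pooled_long = temporal_pool(np.random.rand(11, 128))    # 11-frame tubelet
```

Both tubelets yield a 256-dimensional vector, which is what lets a fixed-size classification head consume tubelets of varying length.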
28
Delving into monocular 3D vehicle tracking: a decoupled framework and a dedicated metric. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03432-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
29
Yu J, Oh H. Graph-structure based multi-label prediction and classification for unsupervised person re-identification. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03163-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
30
Computational Intelligence-Based Harmony Search Algorithm for Real-Time Object Detection and Tracking in Video Surveillance Systems. MATHEMATICS 2022. [DOI: 10.3390/math10050733] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
Abstract
Recently, video surveillance systems have gained significant interest in several application areas. The examination of video sequences for the detection and tracking of objects remains a major issue in the field of image processing and computer vision. The object detection and tracking process includes the extraction of moving objects from the frames and continual tracking over time. The latest advances in computational intelligence (CI) techniques have become popular in the field of image processing and computer vision. In this context, this study introduces a novel computational intelligence-based harmony search algorithm for real-time object detection and tracking (CIHSA-RTODT) technique for video surveillance systems. The CIHSA-RTODT technique mainly focuses on detecting and tracking the objects that exist in the video frame. It incorporates an improved RefineDet-based object detection module, which can effectually recognize multiple objects in the video frame. In addition, the hyperparameter values of the improved RefineDet model are adjusted by the use of the Adagrad optimizer. Moreover, a harmony search algorithm (HSA) with a twin support vector machine (TWSVM) model is employed for object classification. The novelty of the work lies in the design of optimal RefineDet feature extraction combined with the application of the HSA to appropriately adjust the parameters of the TWSVM model for object detection and tracking. A wide range of experimental analyses were carried out on an open-access dataset, and the results were inspected in several ways. The simulation outcomes show the superiority of the CIHSA-RTODT technique over other existing techniques.
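A bare-bones harmony search, the metaheuristic at the core of CIHSA-RTODT, can be written in a few lines: keep a memory of candidate solutions, improvise new ones from memory with occasional pitch adjustment, and replace the worst member when the improvisation improves on it. All parameter names and values below are illustrative defaults, not the paper's, and the objective is a toy sphere function rather than TWSVM parameters:

```python
import random

def harmony_search(f, dim, lo, hi, hms=10, hmcr=0.9, par=0.3, bw=0.1,
                   iters=300, seed=0):
    """Minimal harmony search minimizing f over [lo, hi]^dim.
    hms: harmony memory size, hmcr: memory considering rate,
    par: pitch adjusting rate, bw: pitch bandwidth."""
    rng = random.Random(seed)
    memory = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(hms)]
    memory.sort(key=f)
    for _ in range(iters):
        new = []
        for d in range(dim):
            if rng.random() < hmcr:
                x = rng.choice(memory)[d]          # draw from memory
                if rng.random() < par:
                    x += rng.uniform(-bw, bw)      # pitch adjustment
            else:
                x = rng.uniform(lo, hi)            # random improvisation
            new.append(min(max(x, lo), hi))
        if f(new) < f(memory[-1]):                 # replace the worst harmony
            memory[-1] = new
            memory.sort(key=f)
    return memory[0]

sphere = lambda v: sum(x * x for x in v)
best = harmony_search(sphere, dim=2, lo=-5.0, hi=5.0)
```

In the paper this search would tune TWSVM parameters against a classification objective instead of the sphere function.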
31
Deep Learning-Based Small Object Detection and Classification Model for Garbage Waste Management in Smart Cities and IoT Environment. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12052281] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
In recent years, object detection has gained significant interest and is considered a challenging problem in computer vision. Object detection is mainly employed for several applications, such as instance segmentation, object tracking, image captioning, healthcare, etc. Recent studies have reported that deep learning (DL) models can be employed for effective object detection compared to traditional methods. The rapid urbanization of smart cities necessitates the design of intelligent and automated waste management techniques for effective recycling of waste. In this view, this study develops a novel deep learning-based small object detection and classification model for garbage waste management (DLSODC-GWM) technique. The proposed DLSODC-GWM technique mainly focuses on detecting and classifying small garbage waste objects to assist intelligent waste management systems. The DLSODC-GWM technique follows two major processes, namely, object detection and classification. For object detection, an arithmetic optimization algorithm (AOA) with an improved RefineDet (IRD) model is applied, where the hyperparameters of the IRD model are optimally chosen by the AOA. Secondly, the functional link neural network (FLNN) technique is applied for the classification of waste objects into multiple classes. The design of IRD for waste classification and AOA-based hyperparameter tuning demonstrates the novelty of the work. The performance validation of the DLSODC-GWM technique is performed using benchmark datasets, and the experimental results show the promising performance of the DLSODC-GWM method over existing approaches, with a maximum accuracy of 98.61%.
32
Continuous Viewpoint Planning in Conjunction with Dynamic Exploration for Active Object Recognition. ENTROPY 2021; 23:e23121702. [PMID: 34946008 PMCID: PMC8701023 DOI: 10.3390/e23121702] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/08/2021] [Revised: 12/13/2021] [Accepted: 12/13/2021] [Indexed: 12/03/2022]
Abstract
Active object recognition (AOR) aims at collecting additional information to improve recognition performance by purposefully adjusting the viewpoint of an agent. How to determine the next best viewpoint of the agent, i.e., viewpoint planning (VP), is a research focus. Most existing VP methods perform viewpoint exploration in a discrete viewpoint space; they have to sample the viewpoint space and may introduce significant quantization error. To address this challenge, a continuous VP approach for AOR based on reinforcement learning is proposed. Specifically, we use two separate neural networks to model the VP policy as a parameterized Gaussian distribution and resort to the proximal policy optimization framework to learn the policy. Furthermore, an adaptive entropy-regularization-based dynamic exploration scheme is presented to automatically adjust the viewpoint exploration ability during learning. Finally, experimental results on the public GERMS dataset demonstrate the superiority of our proposed VP method.
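Modeling the policy as a parameterized Gaussian, as described above, means sampling a continuous viewpoint action and scoring its log-probability for the policy-gradient update. A minimal stdlib sketch in one dimension, with assumed parameter values (in the paper, mean and log-std would come from the two policy networks):

```python
import math
import random

def gaussian_policy(mean, log_std, rng):
    """Sample a continuous viewpoint action from a parameterized Gaussian
    and return its log-probability, as used in policy-gradient updates."""
    std = math.exp(log_std)
    action = rng.gauss(mean, std)
    log_prob = (-0.5 * ((action - mean) / std) ** 2
                - log_std - 0.5 * math.log(2 * math.pi))
    return action, log_prob

rng = random.Random(42)
action, logp = gaussian_policy(mean=0.3, log_std=-1.0, rng=rng)
```

The entropy of this Gaussian, log_std + 0.5*log(2*pi*e), is what the adaptive entropy regularization scheme would control to keep exploration alive.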
33
Abstract
Object tracking is a fundamental computer vision problem that refers to a set of methods proposed to precisely track the motion trajectory of an object in a video. Multiple Object Tracking (MOT) is a subclass of object tracking that has received growing interest due to its academic and commercial potential. Although numerous methods have been introduced to cope with this problem, many challenges remain to be solved, such as severe object occlusion and abrupt appearance changes. This paper focuses on giving a thorough review of the evolution of MOT in recent decades, investigating the recent advances in MOT, and showing some potential directions for future work. The primary contributions include: (1) a detailed description of MOT's main problems and solutions, (2) a categorization of previous MOT algorithms into 12 approaches and a discussion of the main procedures for each category, (3) a review of the benchmark datasets and standard evaluation methods for MOT, (4) a discussion of various MOT challenges and solutions by analyzing the related references, and (5) a summary of the latest MOT technologies and recent MOT trends using the mentioned categories.
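Many of the association steps surveyed in MOT pipelines rely on intersection-over-union (IoU) to match detections across frames. A self-contained reference implementation of this standard similarity score:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes, the standard
    score used when associating detections across frames in MOT."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes overlapping in a 5x5 patch: 25 / (100 + 100 - 25)
score = iou((0, 0, 10, 10), (5, 5, 15, 15))   # ≈ 0.143
```

A tracker would greedily (or via the Hungarian algorithm) link each track to the new detection with the highest IoU above a threshold.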
34
Azuri I, Rosenhek-Goldian I, Regev-Rudzki N, Fantner G, Cohen SR. The role of convolutional neural networks in scanning probe microscopy: a review. BEILSTEIN JOURNAL OF NANOTECHNOLOGY 2021; 12:878-901. [PMID: 34476169 PMCID: PMC8372315 DOI: 10.3762/bjnano.12.66] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/27/2021] [Accepted: 07/23/2021] [Indexed: 05/13/2023]
Abstract
Progress in computing capabilities has enhanced science in many ways. In recent years, various branches of machine learning have been the key facilitators in forging new paths, ranging from categorizing big data to instrumental control, from materials design through image analysis. Deep learning has the ability to identify abstract characteristics embedded within a data set, subsequently using that association to categorize, identify, and isolate subsets of the data. Scanning probe microscopy measures multimodal surface properties, combining morphology with electronic, mechanical, and other characteristics. In this review, we focus on a subset of deep learning algorithms, that is, convolutional neural networks, and how it is transforming the acquisition and analysis of scanning probe data.
Affiliation(s)
- Ido Azuri
- Weizmann Institute of Science, Department of Life Sciences Core Facilities, Rehovot 76100, Israel
- Irit Rosenhek-Goldian
- Weizmann Institute of Science, Department of Chemical Research Support, Rehovot 76100, Israel
- Neta Regev-Rudzki
- Weizmann Institute of Science, Department of Biomolecular Sciences, Rehovot 76100, Israel
- Georg Fantner
- École Polytechnique Fédérale de Lausanne, Laboratory for Bio- and Nano-Instrumentation, CH1015 Lausanne, Switzerland
- Sidney R Cohen
- Weizmann Institute of Science, Department of Chemical Research Support, Rehovot 76100, Israel
35