101
Fukuda N, Konda S, Umehara J, Hirashima M. Efficient musculoskeletal annotation using free-form deformation. Sci Rep 2024; 14:16077. PMID: 38992241; PMCID: PMC11239816; DOI: 10.1038/s41598-024-67125-3.
Abstract
Traditionally, constructing training datasets for automatic muscle segmentation from medical images involved skilled operators, leading to high labor costs and limited scalability. To address this issue, we developed a tool that enables efficient annotation by non-experts and assessed its effectiveness for training an automatic segmentation network. Our system allows users to deform a template three-dimensional (3D) anatomical model to fit a target magnetic-resonance image using free-form deformation with independent control points for axial, sagittal, and coronal directions. This method simplifies the annotation process by allowing non-experts to intuitively adjust the model, enabling simultaneous annotation of all muscles in the template. We evaluated the quality of the tool-assisted segmentation performed by non-experts, which achieved a Dice coefficient greater than 0.75 compared to expert segmentation, without significant errors such as mislabeling adjacent muscles or omitting musculature. An automatic segmentation network trained with datasets created using this tool demonstrated performance comparable to or superior to that of networks trained with expert-generated datasets. This innovative tool significantly reduces the time and labor costs associated with dataset creation for automatic muscle segmentation, potentially revolutionizing medical image annotation and accelerating the development of deep learning-based segmentation networks in various clinical applications.
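As a concrete reference point for the Dice comparison reported above, a minimal NumPy sketch (not the authors' code; mask shapes are illustrative) of the coefficient and the 0.75 acceptance threshold:

```python
import numpy as np

def dice_coefficient(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice similarity between two binary segmentation masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    total = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / total if total > 0 else 1.0

# Toy masks: a non-expert mask is judged acceptable here if Dice > 0.75
# against the expert reference, mirroring the threshold reported above.
expert = np.zeros((64, 64, 64), dtype=bool)
expert[20:40, 20:40, 20:40] = True
novice = np.zeros_like(expert)
novice[22:42, 20:40, 20:40] = True
print(dice_coefficient(novice, expert) > 0.75)  # True (Dice = 0.9)
```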
Affiliation(s)
- Norio Fukuda
- Center for Information and Neural Networks (CiNet), Advanced ICT Research Institute, National Institute of Information and Communications Technology (NICT), 1-4 Yamadaoka, Suita, Osaka, 565-0871, Japan
- Shoji Konda
- Center for Information and Neural Networks (CiNet), Advanced ICT Research Institute, National Institute of Information and Communications Technology (NICT), 1-4 Yamadaoka, Suita, Osaka, 565-0871, Japan
- Department of Health and Sport Sciences, Graduate School of Medicine, Osaka University, 1-17 Machikaneyama-Cho, Toyonaka, Osaka, 560-0043, Japan
- Jun Umehara
- Center for Information and Neural Networks (CiNet), Advanced ICT Research Institute, National Institute of Information and Communications Technology (NICT), 1-4 Yamadaoka, Suita, Osaka, 565-0871, Japan
- Faculty of Rehabilitation, Kansai Medical University, 18-89 Uyama-Higashi, Hirakata, Osaka, 573-1136, Japan
- Masaya Hirashima
- Center for Information and Neural Networks (CiNet), Advanced ICT Research Institute, National Institute of Information and Communications Technology (NICT), 1-4 Yamadaoka, Suita, Osaka, 565-0871, Japan.
- Graduate School of Frontier Biosciences, Osaka University, 1-3 Yamadaoka, Suita, Osaka, 565-0871, Japan.
102
Yan Y, Shi Z, Zhang Y. Hierarchical multi-task deep learning-assisted construction of human gut microbiota reactive oxygen species-scavenging enzymes database. mSphere 2024:e0034624. PMID: 38995053; DOI: 10.1128/msphere.00346-24.
Abstract
In the process of oxygen reduction, reactive oxygen species (ROS) are generated as intermediates, including the superoxide anion (O2·-), hydrogen peroxide (H2O2), and hydroxyl radicals (·OH). ROS can be destructive, and an imbalance between oxidants and antioxidants in the body can lead to pathological inflammation. Inappropriate ROS production can cause oxidative damage, disrupting the balance in the body and potentially leading to DNA damage in intestinal epithelial cells and beneficial bacteria. Microorganisms have evolved various enzymes to mitigate the harmful effects of ROS. Accurately predicting the types of ROS-scavenging enzymes (ROSes) is crucial for understanding oxidative stress mechanisms and formulating strategies to combat diseases related to the "gut-organ axis." Currently, no ROSes database (DB) is available. In this study, we propose a systematic workflow comprising three modules and employ a hierarchical multi-task deep learning approach to collect, expand, and explore ROSes-related entries. On this basis, we developed the human gut microbiota ROSes DB (http://39.101.72.186/), which includes 7,689 entries. The DB provides user-friendly browsing and search features to support various applications. With the assistance of the ROSes DB, various communication-based microbial interactions can be explored, further enabling the construction and analysis of evolutionary and complex networks of ROSes across human gut microbiota species.

IMPORTANCE Reactive oxygen species (ROS) are generated during the process of oxygen reduction and include the superoxide anion, hydrogen peroxide, and hydroxyl radicals. ROS can potentially damage cells and DNA, leading to pathological inflammation within the body. Microorganisms have evolved various enzymes to mitigate the harmful effects of ROS, thereby maintaining microbial balance within the host. This study highlights the current absence of a ROSes DB, emphasizing the importance of accurately predicting the types of ROSes for understanding oxidative stress mechanisms and developing strategies for diseases related to the "gut-organ axis." It proposes a systematic workflow and employs a multi-task deep learning approach to establish the human gut microbiota ROSes DB. The DB comprises 7,689 entries and serves as a valuable tool for researchers to delve into the role of ROSes in the human gut microbiota.
Affiliation(s)
- Yueyang Yan
- College of Veterinary Medicine, Jilin University, Changchun, China
- Zhanpeng Shi
- College of Veterinary Medicine, Jilin University, Changchun, China
- Yongrui Zhang
- Department of Urology, The First Hospital of Jilin University, Changchun, Jilin, China
103
Verma S, Kumar P, Singh JP. MLNAS: Meta-learning based neural architecture search for automated generation of deep neural networks for plant disease detection tasks. Network (Bristol, England) 2024:1-24. PMID: 38994690; DOI: 10.1080/0954898X.2024.2374852.
Abstract
Plant diseases pose a significant threat to agricultural productivity worldwide. Convolutional neural networks (CNNs) have achieved state-of-the-art performance on several plant disease detection tasks. However, manually developing CNN models through exhaustive search is a resource-intensive task. Neural Architecture Search (NAS) has emerged as an innovative paradigm that seeks to automate model generation without human intervention. However, the application of NAS to plant disease detection has received limited attention. In this work, we propose a two-stage meta-learning-based neural architecture search system (MLNAS) to automate the generation of CNN models for unseen plant disease detection tasks. The first stage recommends the most suitable benchmark models for an unseen plant disease detection task based on prior evaluations of benchmark models on existing plant disease datasets. In the second stage, the proposed NAS operators are employed to optimize the recommended model for the target task. Experimental results showed that the MLNAS system's model outperformed state-of-the-art models on the fruit disease dataset, achieving an accuracy of 99.61%. Furthermore, the MLNAS-generated model outperformed the Progressive NAS model on the 8-class plant disease dataset, achieving an accuracy of 99.8%. Hence, the proposed MLNAS system facilitates faster model development with reduced computational costs.
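For orientation only, a toy sketch of the stage-one idea: recommend benchmark models for an unseen dataset from their prior results on the most similar known dataset. The meta-features (sample count, class count, mean image entropy) and all numbers are hypothetical, not the paper's actual meta-learning setup:

```python
import numpy as np

known_meta = np.array([[5000.0, 8, 6.1], [54000.0, 38, 5.2], [2000.0, 4, 7.0]])
prior_acc = {"ResNet50": [0.97, 0.95, 0.91], "MobileNetV2": [0.96, 0.93, 0.94]}

def recommend(target_meta):
    # Standardize meta-features, find the nearest known dataset, and rank
    # benchmark models by their recorded accuracy on that dataset.
    mu, sd = known_meta.mean(axis=0), known_meta.std(axis=0)
    z_known = (known_meta - mu) / sd
    z_target = (np.asarray(target_meta) - mu) / sd
    nearest = int(np.argmin(np.linalg.norm(z_known - z_target, axis=1)))
    return sorted(prior_acc, key=lambda m: prior_acc[m][nearest], reverse=True)

print(recommend([4800.0, 10, 6.0]))  # e.g., ['ResNet50', 'MobileNetV2']
```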
Affiliation(s)
- Sahil Verma
- Department of Computer Science and Engineering, National Institute of Technology Patna, Bihar, India
- Prabhat Kumar
- Department of Computer Science and Engineering, National Institute of Technology Patna, Bihar, India
- Jyoti Prakash Singh
- Department of Computer Science and Engineering, National Institute of Technology Patna, Bihar, India
104
Maruyama S, Watanabe H, Shimosegawa M. An image quality assessment index based on image features and keypoints for X-ray CT images. PLoS One 2024; 19:e0304860. PMID: 38990930; PMCID: PMC11238976; DOI: 10.1371/journal.pone.0304860.
Abstract
Optimization tasks in diagnostic radiological imaging require objective quantitative metrics that correlate with the subjective perception of observers. However, although one such metric, the structural similarity index (SSIM), is popular, it has limitations across various aspects of its application to medical images. In this study, we introduce a novel image quality evaluation approach based on keypoints and their associated unique image feature values, focusing on developing a framework that addresses the need for robustness and interpretability lacking in conventional methodologies. The proposed index quantifies and visualizes the distance between feature vectors associated with keypoints, which varies with changes in image quality. This metric was validated on images with varying noise levels and resolution characteristics, and its applicability and effectiveness were examined by evaluating images subjected to various affine transformations. In the verification of X-ray computed tomography imaging using a head phantom, the distances between feature descriptors for each keypoint increased as image quality degraded, exhibiting a strong correlation with changes in the SSIM. Notably, the proposed index outperformed conventional full-reference metrics in terms of robustness to various transformations that leave image quality unchanged. Overall, the results suggest that image analysis performed using the proposed framework can effectively visualize the corresponding feature points, potentially capturing feature information that is lost as image quality changes. These findings demonstrate the feasibility of applying the novel index to analyze changes in image quality. This method may overcome limitations inherent in conventional evaluation methodologies and contribute to the broader domain of medical image analysis.
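A minimal sketch of the core idea, using ORB keypoints as a generic stand-in (the paper's specific detector and descriptor are not assumed here): degradation is scored as the mean descriptor distance between matched keypoints of a reference image and its degraded version:

```python
import cv2
import numpy as np

def descriptor_shift(ref: np.ndarray, degraded: np.ndarray) -> float:
    # Larger mean distance between matched descriptors = stronger degradation.
    orb = cv2.ORB_create(nfeatures=500)
    _, des_ref = orb.detectAndCompute(ref, None)
    _, des_deg = orb.detectAndCompute(degraded, None)
    if des_ref is None or des_deg is None:
        return float("nan")  # no keypoints found
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_ref, des_deg)
    return float(np.mean([m.distance for m in matches]))

rng = np.random.default_rng(0)
ref = cv2.GaussianBlur((rng.random((256, 256)) * 255).astype(np.uint8), (5, 5), 0)
noisy = np.clip(ref + rng.normal(0, 15, ref.shape), 0, 255).astype(np.uint8)
print(descriptor_shift(ref, noisy))  # grows as the added noise gets stronger
```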
Affiliation(s)
- Sho Maruyama
- Department of Radiological Technology, Gunma Prefectural College of Health Sciences, Maebashi, Gunma, Japan
- Haruyuki Watanabe
- Department of Radiological Technology, Gunma Prefectural College of Health Sciences, Maebashi, Gunma, Japan
- Masayuki Shimosegawa
- Department of Radiological Technology, Gunma Prefectural College of Health Sciences, Maebashi, Gunma, Japan
105
Karim MJ, Goni MOF, Nahiduzzaman M, Ahsan M, Haider J, Kowalski M. Enhancing agriculture through real-time grape leaf disease classification via an edge device with a lightweight CNN architecture and Grad-CAM. Sci Rep 2024; 14:16022. PMID: 38992069; PMCID: PMC11239930; DOI: 10.1038/s41598-024-66989-9.
Abstract
Crop diseases can significantly affect various aspects of crop cultivation, including crop yield, quality, production costs, and crop loss. The utilization of modern technologies such as image analysis via machine learning enables early and precise detection of crop diseases, empowering farmers to effectively manage and avoid their occurrence. The proposed methodology uses a modified MobileNetV3Large model deployed on an edge device for real-time monitoring of grape leaf disease while reducing computational memory demands and ensuring satisfactory classification performance. To enhance the applicability of MobileNetV3Large, custom layers consisting of two dense layers, each followed by a dropout layer, were added; this helped mitigate overfitting and ensured that the model remains efficient. Comparisons with other models showed that the proposed model outperformed them, with average train and test accuracies of 99.66% and 99.42%, and a precision, recall, and F1 score of approximately 99.42%. The model was deployed on an edge device (Nvidia Jetson Nano) using a custom-developed GUI app and produced predictions from both saved and real-time data with high confidence values. Grad-CAM visualization was used to identify and represent the image areas that affect the convolutional neural network (CNN) classification decision-making process with high accuracy. This research contributes to the development of plant disease classification technologies for edge devices, which have the potential to enhance autonomous farming and the ability of farmers, agronomists, and researchers to monitor and mitigate plant diseases efficiently and effectively, with a positive impact on global food security.
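A hedged Keras sketch of the kind of modification described, with layer widths and dropout rates as illustrative guesses rather than the authors' exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(num_classes: int, img_size: int = 224) -> tf.keras.Model:
    base = tf.keras.applications.MobileNetV3Large(
        input_shape=(img_size, img_size, 3), include_top=False, weights="imagenet")
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dense(256, activation="relu")(x)   # custom dense layer 1
    x = layers.Dropout(0.3)(x)                    # dropout to curb overfitting
    x = layers.Dense(128, activation="relu")(x)   # custom dense layer 2
    x = layers.Dropout(0.3)(x)
    out = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(base.input, out)

model = build_model(num_classes=4)  # e.g., healthy + three grape diseases
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```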
Affiliation(s)
- Md Jawadul Karim
- Department of Electrical & Computer Engineering, Rajshahi University of Engineering & Technology, Rajshahi, 6204, Bangladesh
- Md Omaer Faruq Goni
- Department of Electrical & Computer Engineering, Rajshahi University of Engineering & Technology, Rajshahi, 6204, Bangladesh
- Md Nahiduzzaman
- Department of Electrical & Computer Engineering, Rajshahi University of Engineering & Technology, Rajshahi, 6204, Bangladesh
- Mominul Ahsan
- Department of Computer Science, University of York, Deramore Lane, Heslington, York, YO10 5GH, UK
- Julfikar Haider
- Department of Engineering, Manchester Metropolitan University, Chester Street, Manchester, M1 5GD, UK
- Marcin Kowalski
- Institute of Optoelectronics, Military University of Technology, Gen. S. Kaliskiego 2, 00-908, Warsaw, Poland.
106
Soneji P, Challita EJ, Bhamla S. Trackoscope: A low-cost, open, autonomous tracking microscope for long-term observations of microscale organisms. PLoS One 2024; 19:e0306700. PMID: 38990841; PMCID: PMC11239018; DOI: 10.1371/journal.pone.0306700.
Abstract
Cells and microorganisms are motile, yet the stationary nature of conventional microscopes impedes comprehensive, long-term behavioral and biomechanical analysis. The limitations are twofold: a narrow focus permits high-resolution imaging but sacrifices the broader context of organism behavior, while a wider focus compromises microscopic detail. This trade-off is especially problematic when investigating rapidly motile ciliates, which often have to be confined to small volumes between coverslips, affecting their natural behavior. To address this challenge, we introduce Trackoscope, a 2-axis autonomous tracking microscope designed to follow swimming organisms ranging from 10 μm to 2 mm across a 325 cm² area (equivalent to an A5 sheet) for extended durations, from hours to days, at high resolution. Utilizing Trackoscope, we captured a diverse array of behaviors, from the air-water swimming locomotion of Amoeba to bacterial hunting dynamics in Actinosphaerium, walking gait in Tardigrada, and binary fission in motile Blepharisma. Trackoscope is a cost-effective solution well suited for diverse settings, from high school labs to resource-constrained research environments. Its capability to capture diverse behaviors in larger, more realistic ecosystems extends our understanding of the physics of living systems. The low-cost, open architecture democratizes scientific discovery, offering a dynamic window into the lives of previously inaccessible small aquatic organisms.
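A minimal sketch of the tracking loop such a system relies on; `move_stage` is a hypothetical stand-in for the stepper-motor interface, and the threshold-and-centroid detector is illustrative:

```python
import cv2
import numpy as np

def move_stage(dx_steps: int, dy_steps: int) -> None:
    # Hypothetical stand-in for the 2-axis stepper-motor interface.
    print(f"stage move: dx={dx_steps}, dy={dy_steps}")

def track_frame(frame_gray: np.ndarray, gain: float = 0.1) -> None:
    # Segment the bright organism, then nudge the stage toward its centroid.
    _, mask = cv2.threshold(frame_gray, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    m = cv2.moments(mask)
    if m["m00"] == 0:
        return  # nothing detected in this frame
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
    err_x = cx - frame_gray.shape[1] / 2   # pixel error from frame center
    err_y = cy - frame_gray.shape[0] / 2
    move_stage(int(gain * err_x), int(gain * err_y))

frame = np.zeros((480, 640), np.uint8)
cv2.circle(frame, (420, 300), 20, 255, -1)  # bright organism off-center
track_frame(frame)                          # -> stage move: dx=10, dy=6
```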
Affiliation(s)
- Priya Soneji
- George W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, GA, United States of America
- Elio J Challita
- George W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, GA, United States of America
- Saad Bhamla
- School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA, United States of America
107
Choudhary R, Mahadevan R. FOCUS on NOD2: Advancing IBD Drug Discovery with a User-Informed Machine Learning Framework. ACS Med Chem Lett 2024; 15:1057-1070. PMID: 39015268; PMCID: PMC11247655; DOI: 10.1021/acsmedchemlett.4c00148.
Abstract
In this study, we introduce the Framework for Optimized Customizable User-Informed Synthesis (FOCUS), a generative machine learning model tailored for drug discovery. FOCUS integrates domain expertise and uses Proximal Policy Optimization (PPO) to guide Monte Carlo Tree Search (MCTS) to efficiently explore chemical space. It generates SMILES representations of potential drug candidates, optimizing for druggability and binding efficacy to NOD2, PEP, and MCT1 receptors. The model is highly interpretable, allowing for user feedback and expert-driven adjustments based on detailed cycle reports. Employing tools like SHAP and LIME, FOCUS provides a transparent analysis of decision-making processes, emphasizing features such as docking scores and interaction fingerprints. Comparative studies with Muramyl Dipeptide (MDP) demonstrate improved interaction profiles. FOCUS merges advanced machine learning with expert insight, accelerating the drug discovery pipeline.
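As a loosely related illustration (not the paper's NOD2-binding objective), a sketch of scoring SMILES candidates with a generic druggability proxy, assuming RDKit is available:

```python
from rdkit import Chem
from rdkit.Chem import QED

def druggability_reward(smiles: str) -> float:
    # QED as a generic druggability proxy; invalid SMILES score zero.
    mol = Chem.MolFromSmiles(smiles)
    return QED.qed(mol) if mol is not None else 0.0

for s in ["CCO", "c1ccccc1C(=O)O", "not-a-smiles"]:
    print(s, round(druggability_reward(s), 3))
```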
Affiliation(s)
- Ruhi Choudhary
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
- Radhakrishnan Mahadevan
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
108
Hsu J, Nguyen KT, Bujnowska M, Janes KA, Fallahi-Sichani M. Protocol for iterative indirect immunofluorescence imaging in cultured cells, tissue sections, and metaphase chromosome spreads. STAR Protoc 2024; 5:103190. PMID: 39002133; DOI: 10.1016/j.xpro.2024.103190.
Abstract
We present a protocol to generate highly multiplexed spatial data at cellular and subcellular resolutions using iterative indirect immunofluorescence imaging (4i). We describe streamlined steps for using 4i across fixed cultured cells, formalin-fixed paraffin-embedded (FFPE) tissue sections, and metaphase chromosome spreads. We detail procedures for sample preparation, antibody and DNA staining, immunofluorescence imaging, antibody elution, and image processing. This protocol is adapted for high-throughput analysis of fixed cultured cells and addresses sample-specific challenges such as intrinsic tissue autofluorescence and chromosome fragility. For complete details on the use and execution of this protocol for fixed cultured cells, please refer to Comandante-Lou et al.1.
Affiliation(s)
- Jeffrey Hsu
- Medical Scientist Training Program, University of Virginia, Charlottesville, VA 22908, USA; Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA
- Kimberly T Nguyen
- Medical Scientist Training Program, University of Virginia, Charlottesville, VA 22908, USA; Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA
- Magda Bujnowska
- Medical Scientist Training Program, University of Virginia, Charlottesville, VA 22908, USA; Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA
- Kevin A Janes
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA; Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA; UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA 22908, USA
- Mohammad Fallahi-Sichani
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA; UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA 22908, USA.
109
Fu DS, Huang J, Hazra D, Dwivedi AK, Gupta SK, Shivahare BD, Garg D. Enhancing sports image data classification in federated learning through genetic algorithm-based optimization of base architecture. PLoS One 2024; 19:e0303462. PMID: 38990969; PMCID: PMC11239052; DOI: 10.1371/journal.pone.0303462.
Abstract
Nowadays, federated learning is one of the most prominent choices for decentralized model training. A significant benefit of federated learning is that, unlike centralized deep learning, data samples need not be shared with the model owner. In traditional federated learning, the weights of the global model are created by averaging the weights of all clients or sites. In the proposed work, a novel genetic algorithm-based method is presented to generate an optimized base model without hampering its performance. All intermediate operations of the genetic algorithm (chromosome representation, crossover, and mutation) are illustrated with useful examples. After applying the genetic algorithm, there is a significant improvement in inference time and a large reduction in storage space, so the model can be easily deployed on resource-constrained devices. For the experimental work, sports data were used in balanced and unbalanced scenarios with various numbers of clients in a federated learning environment. In addition, four well-known deep learning architectures, AlexNet, VGG19, ResNet50, and EfficientNetB3, were used as base models. Using the GA-based approach with EfficientNetB3 as the base model, we achieved 92.34% accuracy with 9 clients on the balanced dataset. Moreover, after applying the genetic algorithm to optimize EfficientNetB3, inference time and storage space improved by 20% and 2.35%, respectively.
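A minimal sketch of the genetic-algorithm machinery described above, treating a chromosome as a bit-mask over prunable blocks of a base network; `evaluate` is a hypothetical stand-in for actually scoring the pruned model:

```python
import random

def evaluate(mask):
    # Hypothetical stand-in: returns (accuracy, size) for the pruned model.
    return random.random(), sum(mask)

def fitness(mask, alpha=0.01):
    acc, size = evaluate(mask)
    return acc - alpha * size        # reward accuracy, penalize kept blocks

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(mask, rate=0.05):
    return [bit ^ (random.random() < rate) for bit in mask]

pop = [[random.randint(0, 1) for _ in range(16)] for _ in range(20)]
for _ in range(30):                  # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]               # elitist selection
    pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                     for _ in range(10)]
print(max(pop, key=fitness))         # best surviving chromosome (bit-mask)
```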
Affiliation(s)
- De Sheng Fu
- College of Public Education, Zhejiang Institute of Economics and Trade, Hangzhou, Zhejiang, China
- Jie Huang
- College of Business Administration, Zhejiang Institute of Economics and Trade, Hangzhou, Zhejiang, China
- Dibyanarayan Hazra
- School of Computer Science Engineering and Technology, Bennett University, Greater Noida, India
- Amit Kumar Dwivedi
- School of Computer Science Engineering and Technology, Bennett University, Greater Noida, India
110
Du C, Yan Z, Xiong Z, Yu L. Boosting integral-based human pose estimation through implicit heatmap learning. Neural Netw 2024; 179:106524. PMID: 39029299; DOI: 10.1016/j.neunet.2024.106524.
Abstract
Human pose estimation methods typically fall into three categories: heatmap-, regression-, and integral-based. While integral-based methods offer advantages such as end-to-end learning, fully convolutional learning, and freedom from quantization errors, they have garnered comparatively less attention due to inferior performance. In this paper, we revisit integral-based approaches for human pose estimation and propose a novel implicit heatmap learning framework. The framework learns the true distribution of keypoints from the perspective of maximum likelihood estimation, aiming to mitigate the inherent shape and variance ambiguity associated with implicit heatmaps. Specifically, Simple Implicit Heatmap Normalization (SIHN) is first introduced to calculate implicit heatmaps as an efficient and effective representation for keypoint localization, replacing the vanilla softmax normalization method. Because implicit heatmaps may introduce challenges related to variance and shape ambiguity, we then propose a Differentiable Spatial-to-Distributive Transform (DSDT) to map these implicit heatmaps onto the transformation coefficients of a deformed distribution. The deformed distribution is predicted by a likelihood-based generative model to resolve the shape ambiguity effectively, and the transformation coefficients are learned by a regression model to resolve the variance ambiguity. Additionally, to expedite the acquisition of precise shape representations during training, we introduce a Wasserstein Distance-based Constraint (WDC) to ensure stable and reasonable supervision during the initial generation of implicit heatmaps. Experimental results on both the MSCOCO and MPII datasets demonstrate the effectiveness of our proposed method, which achieves competitive performance against heatmap-based approaches while maintaining the advantages of integral-based approaches. Our source code and pre-trained models are available at https://github.com/ducongju/IHL.
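For context, a sketch of the integral-based (soft-argmax) baseline that such frameworks build on, using plain softmax normalization rather than the paper's SIHN:

```python
import torch
import torch.nn.functional as F

def soft_argmax_2d(heatmap: torch.Tensor) -> torch.Tensor:
    """heatmap: (N, K, H, W) raw logits -> (N, K, 2) (x, y) coordinates."""
    n, k, h, w = heatmap.shape
    probs = F.softmax(heatmap.reshape(n, k, -1), dim=-1).reshape(n, k, h, w)
    ys = torch.arange(h, dtype=probs.dtype)
    xs = torch.arange(w, dtype=probs.dtype)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)   # expectation over columns
    y = (probs.sum(dim=3) * ys).sum(dim=-1)   # expectation over rows
    return torch.stack([x, y], dim=-1)

coords = soft_argmax_2d(torch.randn(2, 17, 64, 48))  # e.g., 17 COCO keypoints
print(coords.shape)  # torch.Size([2, 17, 2])
```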
Affiliation(s)
- Congju Du
- School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
- Zengqiang Yan
- School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
- Zixiang Xiong
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77483, USA
- Li Yu
- School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China.
111
Guo Y, Nath P, Mahadevan S, Witherell P. Active learning for adaptive surrogate model improvement in high-dimensional problems. Structural and Multidisciplinary Optimization 2024; 67:122. PMID: 39006128; PMCID: PMC11236939; DOI: 10.1007/s00158-024-03816-9.
Abstract
This paper investigates a novel approach to efficiently construct and improve surrogate models in problems with high-dimensional input and output. In this approach, the principal components and corresponding features of the high-dimensional output are first identified. For each feature, the active subspace technique is used to identify a corresponding low-dimensional subspace of the input domain; then a surrogate model is built for each feature in its corresponding active subspace. A low-dimensional adaptive learning strategy is proposed to identify training samples to improve the surrogate model. In contrast to existing adaptive learning methods that focus on a scalar output or a small number of outputs, this paper addresses adaptive learning with high-dimensional input and output, with a novel learning function that balances exploration and exploitation, i.e., considering unexplored regions and high-error regions, respectively. The adaptive learning is in terms of the active variables in the low-dimensional space, and the newly added training samples can be easily mapped back to the original space for running the expensive physics model. The proposed method is demonstrated for the numerical simulation of an additive manufacturing part, with a high-dimensional field output quantity of interest (residual stress) in the component that has spatial variability due to the stochastic nature of multiple input variables (including process variables and material properties). Various factors in the adaptive learning process are investigated, including the number of training samples, range and distribution of the adaptive training samples, contributions of various errors, and the importance of exploration versus exploitation in the learning function.
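A toy NumPy sketch of the active subspace step described above, with a quadratic function standing in for the expensive physics model:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 10
w = rng.normal(size=dim)                      # hidden dominant direction

def grad_f(x):                                # gradient of f(x) = (w.x)^2
    return 2.0 * (w @ x) * w

# Eigendecompose the average outer product of gradients; the top eigenvector
# spans the one-dimensional active subspace of this toy feature.
samples = rng.normal(size=(200, dim))
C = np.mean([np.outer(g, g) for g in map(grad_f, samples)], axis=0)
eigvals, eigvecs = np.linalg.eigh(C)
active_dir = eigvecs[:, -1]                   # top eigenvector (unit norm)
print("cosine with true direction:",
      abs(active_dir @ w) / np.linalg.norm(w))
```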
Affiliation(s)
- Yulin Guo
- Department of Civil and Environmental Engineering, Vanderbilt University, Nashville, TN 37235 USA
- Paromita Nath
- Department of Mechanical Engineering, Rowan University, Glassboro, NJ 08028 USA
- Sankaran Mahadevan
- Department of Civil and Environmental Engineering, Vanderbilt University, Nashville, TN 37235 USA
- Paul Witherell
- Engineering Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899 USA
112
Zhang H, Yang YF, Song XL, Hu HJ, Yang YY, Zhu X, Yang C. An interpretable artificial intelligence model based on CT for prognosis of intracerebral hemorrhage: a multicenter study. BMC Med Imaging 2024; 24:170. PMID: 38982357; PMCID: PMC11234657; DOI: 10.1186/s12880-024-01352-y.
Abstract
OBJECTIVES To develop and validate a novel interpretable artificial intelligence (AI) model that integrates radiomic features, deep learning features, and imaging features at multiple semantic levels to predict the prognosis of intracerebral hemorrhage (ICH) patients at 6 months post-onset.

MATERIALS AND METHODS We retrospectively enrolled 222 patients with ICH, with non-contrast computed tomography (NCCT) images and clinical data, who were divided into a training cohort (n = 186, medical center 1) and an external testing cohort (n = 36, medical center 2). Following image preprocessing, the entire hematoma region was segmented by two radiologists as the volume of interest (VOI). The PyRadiomics library was utilized to extract 1762 radiomics features, while a deep convolutional neural network (EfficientNetV2-L) was employed to extract 1000 deep learning features. Additionally, radiologists evaluated imaging features. Based on these three feature modalities, Random Forest (RF) models were trained, resulting in three models (Radiomics Model, Radiomics-Clinical Model, and DL-Radiomics-Clinical Model). The performance and clinical utility of the models were assessed using the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA), with AUCs compared using the DeLong test. Furthermore, this study employs three methods, Shapley Additive Explanations (SHAP), Grad-CAM, and Guided Grad-CAM, to conduct a multidimensional interpretability analysis of model decisions.

RESULTS The Radiomics-Clinical Model and DL-Radiomics-Clinical Model exhibited relatively good predictive performance, with AUCs of 0.86 [95% confidence interval (CI): 0.71, 0.95; P < 0.01] and 0.89 (95% CI: 0.74, 0.97; P < 0.01), respectively, in the external testing cohort.

CONCLUSION The multimodal explainable AI model proposed in this study can accurately predict the prognosis of ICH. Interpretability methods such as SHAP, Grad-CAM, and Guided Grad-CAM partially address the interpretability limitations of AI models. Integrating multimodal imaging features can effectively improve model performance.

CLINICAL RELEVANCE STATEMENT Predicting the prognosis of patients with ICH is a key objective in emergency care. Accurate and efficient prognostic tools can effectively prevent, manage, and monitor adverse events in ICH patients, maximizing treatment outcomes.
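A hedged sketch of the feature-fusion step, concatenating radiomics, deep, and clinical blocks into a random forest; shapes and random data are illustrative only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 186                                        # training-cohort size above
radiomics = rng.normal(size=(n, 1762))         # PyRadiomics feature block
deep      = rng.normal(size=(n, 1000))         # CNN feature block
clinical  = rng.normal(size=(n, 12))           # hypothetical clinical features
X = np.hstack([radiomics, deep, clinical])
y = rng.integers(0, 2, size=n)                 # 6-month outcome label

clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
print("in-sample AUC:", roc_auc_score(y, clf.predict_proba(X)[:, 1]))
```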
Affiliation(s)
- Hao Zhang
- Department of Radiology, The First Affiliated Hospital of Dalian Medical University, Dalian, 116000, Liaoning, China
- Yun-Feng Yang
- Laboratory for Medical Imaging Informatics, Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai, 200083, China
- Laboratory for Medical Imaging Informatics, University of Chinese Academy of Sciences, Beijing, 100049, China
- Xue-Lin Song
- Department of Radiology, The Second Affiliated Hospital of Dalian Medical University, Dalian, 116027, Liaoning, China
- Hai-Jian Hu
- Department of Hemato-oncology, The First Hospital of Changsha, Changsha, 410005, Hunan, China
- Yuan-Yuan Yang
- Laboratory for Medical Imaging Informatics, Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai, 200083, China
- Laboratory for Medical Imaging Informatics, University of Chinese Academy of Sciences, Beijing, 100049, China
- Xia Zhu
- Department of Gynecology, Hunan Provincial Maternal and Child Health Care Hospital, Changsha, 410028, Hunan, China
- Chao Yang
- Department of Radiology, The First Affiliated Hospital of Dalian Medical University, Dalian, 116000, Liaoning, China.
113
Maus J, Nikulin P, Hofheinz F, Petr J, Braune A, Kotzerke J, van den Hoff J. Deep learning based bilateral filtering for edge-preserving denoising of respiratory-gated PET. EJNMMI Phys 2024; 11:58. PMID: 38977533; PMCID: PMC11231129; DOI: 10.1186/s40658-024-00661-z.
Abstract
BACKGROUND Residual image noise is substantial in positron emission tomography (PET) and is one of the factors limiting lesion detection, quantification, and overall image quality. Thus, improving noise reduction remains of considerable interest. This is especially true for respiratory-gated PET investigations. The only broadly used approach for noise reduction in PET imaging has been the application of low-pass filters, usually Gaussians, which however leads to loss of spatial resolution and increased partial-volume effects, affecting the detectability of small lesions and quantitative data evaluation. The bilateral filter (BF), a locally adaptive image filter, allows image noise to be reduced while preserving well-defined object edges, but manual optimization of the filter parameters for a given PET scan can be tedious and time-consuming, hampering its clinical use. In this work, we investigated to what extent a suitable deep learning-based approach can resolve this issue by training a network to reproduce the results of manually adjusted, case-specific bilateral filtering.

METHODS Altogether, 69 respiratory-gated clinical PET/CT scans with three different tracers ([18F]FDG, [18F]L-DOPA, [68Ga]DOTATATE) were used for the present investigation. Prior to data processing, the gated data sets were split, resulting in a total of 552 single-gate image volumes. For each of these image volumes, four 3D ROIs were delineated: one ROI for image noise assessment and three ROIs for focal uptake (e.g., tumor lesions) measurements at different target/background contrast levels. An automated procedure was used to perform a brute-force search of the two-dimensional BF parameter space for each data set to identify the "optimal" filter parameters, generating user-approved ground-truth input data consisting of pairs of original and optimally BF-filtered images. To reproduce the optimal BF filtering, we employed a modified 3D U-Net CNN incorporating the residual learning principle. Network training and evaluation were performed using a 5-fold cross-validation scheme. The influence of filtering on lesion SUV quantification and image noise level was assessed by calculating absolute and fractional differences between the CNN, manual BF, or original (STD) data sets in the previously defined ROIs.

RESULTS The automated procedure chose adequate filter parameters for the majority of the data sets, with only 19 patient data sets requiring manual tuning. Evaluation of the focal uptake ROIs revealed that CNN- as well as BF-based filtering essentially maintains the focal SUVmax values of the unfiltered images, with low mean ± SD differences of δSUVmax(CNN, STD) = (-3.9 ± 5.2)% and δSUVmax(BF, STD) = (-4.4 ± 5.3)%. Regarding the relative performance of CNN versus BF, both methods lead to very similar SUVmax values in the vast majority of cases, with an overall average difference of δSUVmax(CNN, BF) = (0.5 ± 4.8)%. Evaluation of the noise properties showed that CNN filtering mostly satisfactorily reproduces the noise level and characteristics of BF, with δNoise(CNN, BF) = (5.6 ± 10.5)%. No significant tracer-dependent differences between CNN and BF were observed.

CONCLUSIONS Our results show that a neural network-based denoising can reproduce the results of a case-by-case optimized BF in a fully automated way. Apart from rare cases, it led to images of practically identical quality regarding noise level, edge preservation, and signal recovery. We believe such a network might prove especially useful in the context of improved motion correction of respiratory-gated PET studies, but it could also help to establish BF-equivalent edge-preserving CNN filtering in clinical PET, since it obviates time-consuming manual BF parameter tuning.
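A hedged sketch of the brute-force stage, sweeping the two bilateral-filter parameters and scoring each pair; the scoring rule here is illustrative, not the authors' acceptance criterion:

```python
import cv2
import numpy as np

img = np.random.poisson(100, (128, 128)).astype(np.float32)  # toy PET slice
img[60:68, 60:68] += 80.0                                    # synthetic lesion
noise_roi = (slice(5, 30), slice(5, 30))
lesion_roi = (slice(60, 68), slice(60, 68))

best = None
for sigma_space in [1.0, 2.0, 3.0, 4.0]:
    for sigma_range in [5.0, 10.0, 20.0, 40.0]:
        f = cv2.bilateralFilter(img, d=-1, sigmaColor=sigma_range,
                                sigmaSpace=sigma_space)
        noise = f[noise_roi].std() / f[noise_roi].mean()        # residual noise
        recovery = f[lesion_roi].max() / img[lesion_roi].max()  # peak retained
        score = recovery - noise                                # illustrative
        if best is None or score > best[0]:
            best = (score, sigma_space, sigma_range)
print("chosen sigmas (space, range):", best[1:])
```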
Affiliation(s)
- Jens Maus
- Department of Positron Emission Tomography, Institute of Radiopharmaceutical Cancer Research, Helmholtz-Zentrum Dresden-Rossendorf, Bautzner Landstraße 400, 01314, Dresden, Germany.
- Pavel Nikulin
- Department of Positron Emission Tomography, Institute of Radiopharmaceutical Cancer Research, Helmholtz-Zentrum Dresden-Rossendorf, Bautzner Landstraße 400, 01314, Dresden, Germany
- Frank Hofheinz
- Department of Positron Emission Tomography, Institute of Radiopharmaceutical Cancer Research, Helmholtz-Zentrum Dresden-Rossendorf, Bautzner Landstraße 400, 01314, Dresden, Germany
- Jan Petr
- Department of Positron Emission Tomography, Institute of Radiopharmaceutical Cancer Research, Helmholtz-Zentrum Dresden-Rossendorf, Bautzner Landstraße 400, 01314, Dresden, Germany
- Anja Braune
- Klinik und Poliklinik für Nuklearmedizin, Universitätsklinikum Carl Gustav Carus, Fetscherstraße 74, 01307, Dresden, Germany
- Jörg Kotzerke
- Klinik und Poliklinik für Nuklearmedizin, Universitätsklinikum Carl Gustav Carus, Fetscherstraße 74, 01307, Dresden, Germany
- Jörg van den Hoff
- Department of Positron Emission Tomography, Institute of Radiopharmaceutical Cancer Research, Helmholtz-Zentrum Dresden-Rossendorf, Bautzner Landstraße 400, 01314, Dresden, Germany
- Klinik und Poliklinik für Nuklearmedizin, Universitätsklinikum Carl Gustav Carus, Fetscherstraße 74, 01307, Dresden, Germany
114
Hermanson VR, Cutter GR, Hinke JT, Dawkins M, Watters GM. A method to estimate prey density from single-camera images: A case study with chinstrap penguins and Antarctic krill. PLoS One 2024; 19:e0303633. PMID: 38980882; PMCID: PMC11232977; DOI: 10.1371/journal.pone.0303633.
Abstract
Estimating the densities of marine prey observed in animal-borne video loggers when encountered by foraging predators represents an important challenge for understanding predator-prey interactions in the marine environment. We used video images collected during the foraging trip of one chinstrap penguin (Pygoscelis antarcticus) from Cape Shirreff, Livingston Island, Antarctica to develop a novel approach for estimating the density of Antarctic krill (Euphausia superba) encountered during foraging activities. Using the open-source Video and Image Analytics for a Marine Environment (VIAME), we trained a neural network model to identify video frames containing krill. Our image classifier has an overall accuracy of 73%, with a positive predictive value of 83% for prediction of frames containing krill. We then developed a method to estimate the volume of water imaged, thus the density (N·m⁻³) of krill, in the 2-dimensional images. The method is based on the maximum range from the camera where krill remain visibly resolvable and assumes that mean krill length is known, and that the distribution of orientation angles of krill is uniform. From 1,932 images identified as containing krill, we manually identified a subset of 124 images from across the video record that contained resolvable and unresolvable krill necessary to estimate the resolvable range and imaged volume for the video sensor. Krill swarm density encountered by the penguins ranged from 2 to 307 krill·m⁻³ and mean density of krill was 48 krill·m⁻³ (SD = 61 krill·m⁻³). Mean krill biomass density was 25 g·m⁻³. Our frame-level image classifier model and krill density estimation method provide a new approach to efficiently process video-logger data and estimate krill density from 2D imagery, providing key information on prey aggregations that may affect predator foraging performance. The approach should be directly applicable to other marine predators feeding on aggregations of prey.
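A minimal geometry sketch of the imaged-volume idea, treating the viewed water as a pyramidal frustum bounded by the field of view and the maximum resolvable range; all numbers are illustrative:

```python
import math

def imaged_volume_m3(hfov_deg, vfov_deg, max_range_m, near_m=0.05):
    def cross_section(r):  # width * height of the view at range r
        w = 2 * r * math.tan(math.radians(hfov_deg) / 2)
        h = 2 * r * math.tan(math.radians(vfov_deg) / 2)
        return w * h
    # Exact frustum volume via the prismatoid formula for similar sections.
    a1, a2 = cross_section(near_m), cross_section(max_range_m)
    return (max_range_m - near_m) / 3 * (a1 + a2 + math.sqrt(a1 * a2))

vol = imaged_volume_m3(hfov_deg=120, vfov_deg=90, max_range_m=0.5)
n_krill = 12                     # krill counted in one frame
print(f"density: {n_krill / vol:.1f} krill per cubic metre")
```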
Affiliation(s)
- Victoria R. Hermanson
- Antarctic Ecosystem Research Division, Southwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, La Jolla, CA, United States of America
- George R. Cutter
- Antarctic Ecosystem Research Division, Southwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, La Jolla, CA, United States of America
- Jefferson T. Hinke
- Antarctic Ecosystem Research Division, Southwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, La Jolla, CA, United States of America
- George M. Watters
- Antarctic Ecosystem Research Division, Southwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, La Jolla, CA, United States of America
115
Park HG. Bayesian estimation of covariate assisted principal regression for brain functional connectivity. Biostatistics 2024:kxae023. PMID: 38981041; DOI: 10.1093/biostatistics/kxae023.
Abstract
This paper presents a Bayesian reformulation of covariate-assisted principal regression for covariance matrix outcomes to identify low-dimensional components in the covariance associated with covariates. By introducing a geometric approach to the covariance matrices and leveraging Euclidean geometry, we estimate dimension reduction parameters and model covariance heterogeneity based on covariates. This method enables joint estimation and uncertainty quantification of relevant model parameters associated with heteroscedasticity. We demonstrate our approach through simulation studies and apply it to analyze associations between covariates and brain functional connectivity using data from the Human Connectome Project.
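For orientation, a sketch of the model family being estimated, written in the notation of the original covariate-assisted principal regression formulation (this paper's exact Bayesian parameterization may differ):

```latex
% Subject i has covariance outcome Sigma_i and covariates x_i; the goal is a
% projection gamma whose variance along it is log-linear in the covariates:
\[
  y_{it} \sim \mathcal{N}\!\left(0,\, \Sigma_i\right), \qquad
  \log\!\left(\gamma^\top \Sigma_i \gamma\right) = \beta_0 + x_i^\top \beta_1,
\]
% with gamma identified only up to scale, hence a normalization such as
% gamma^T * bar(Sigma) * gamma = 1 is imposed.
```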
Affiliation(s)
- Hyung G Park
- Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine, 180 Madison Ave., New York, NY 10016, USA
116
Bowen Z, Huacai L, Shengbo Z, Xinqiang C, Hongwei X. Night target detection algorithm based on improved YOLOv7. Sci Rep 2024; 14:15771. PMID: 38982192; PMCID: PMC11233500; DOI: 10.1038/s41598-024-66842-z.
Abstract
To address false detections and missed detections in night target detection, this paper proposes a night target detection algorithm based on YOLOv7 (You Only Look Once v7). The proposed algorithm preprocesses images by means of square equalization and Gamma transform. The GSConv (Group Separable Convolution) module is introduced to reduce the number of parameters and the amount of computation while improving the detection effect. ShuffleNetV2 (×1.5) is introduced as the feature extraction network to reduce the number of network parameters while maintaining high tracking accuracy. The hard-swish activation function is adopted to greatly reduce latency cost. Finally, the Scylla Intersection over Union (SIoU) loss is used in place of the Efficient Intersection over Union (EIoU) loss to optimize the loss function and improve robustness. Experimental results demonstrate that the average detection accuracy of the proposed improved YOLOv7 model is 88.1%, effectively improving the accuracy of night target detection.
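A minimal sketch of the hard-swish activation adopted above:

```python
import torch
import torch.nn.functional as F

def hard_swish(x: torch.Tensor) -> torch.Tensor:
    # Piecewise-linear approximation of swish; cheaper than a sigmoid.
    return x * F.relu6(x + 3.0) / 6.0

x = torch.linspace(-4, 4, 9)
print(hard_swish(x))                    # matches torch.nn.Hardswish()(x)
```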
Affiliation(s)
- Zheng Bowen
- Key Laboratory of Electric Drive and Control of Anhui Province, Anhui Polytechnic University, Wuhu, China
- Lu Huacai
- Key Laboratory of Electric Drive and Control of Anhui Province, Anhui Polytechnic University, Wuhu, China
- Zhu Shengbo
- Key Laboratory of Electric Drive and Control of Anhui Province, Anhui Polytechnic University, Wuhu, China
- Chen Xinqiang
- Key Laboratory of Electric Drive and Control of Anhui Province, Anhui Polytechnic University, Wuhu, China
- Xing Hongwei
- Key Laboratory of Electric Drive and Control of Anhui Province, Anhui Polytechnic University, Wuhu, China
117
Zhu D, Xin Z, Zheng S, Wang Y, Yang X. Addressing the Accuracy-Cost Trade-off in Material Property Prediction Using a Teacher-Student Strategy. J Chem Theory Comput 2024; 20:5743-5750. PMID: 38875176; DOI: 10.1021/acs.jctc.4c00625.
Abstract
Deep learning has catalyzed a transformative shift in material discovery, offering a key advantage over traditional experimental and theoretical methods by significantly reducing associated costs. Models adept at predicting properties from chemical compositions alone do not require structural information. However, this cost-efficient approach compromises model precision, particularly in Chemical Composition-based Property Prediction Models (CPMs), which are notably less accurate than Structure-based Property Prediction Models (SPMs). Addressing this challenge, our study introduces a novel Teacher-Student (TS) strategy, where a pretrained SPM serves as an instructive 'teacher' to enhance the CPM's precision. This TS strategy successfully harmonizes low-cost exploration with high accuracy, achieving a significant 47.1% reduction in relative error in scenarios involving 100 data entries. We also evaluate the effectiveness of the proposed strategy by employing perovskites as a case study. This method represents a significant advancement in the exploration and identification of valuable materials, leveraging CPM's potential while overcoming its precision limitations.
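A hedged PyTorch sketch of a teacher-student setup for property regression, with a frozen structure-based teacher guiding a composition-only student; architectures and the distillation weight are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1)).eval()
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
mse = nn.MSELoss()

def train_step(x_comp, x_struct, y):
    with torch.no_grad():
        soft_target = teacher(x_struct)       # frozen teacher's guidance
    pred = student(x_comp)                    # composition-only prediction
    loss = mse(pred, y) + 0.5 * mse(pred, soft_target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

print(train_step(torch.randn(8, 32), torch.randn(8, 128), torch.randn(8, 1)))
```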
Affiliation(s)
- Dong Zhu
- Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zhikuang Xin
- Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Siming Zheng
- Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yangang Wang
- Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaoyu Yang
- Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
118
Islam MS, Kalmady SV, Hindle A, Sandhu R, Sun W, Sepehrvand N, Greiner R, Kaul P. Diagnostic and Prognostic Electrocardiogram-Based Models for Rapid Clinical Applications. Can J Cardiol 2024:S0828-282X(24)00523-3. PMID: 38992812; DOI: 10.1016/j.cjca.2024.07.003.
Abstract
Leveraging artificial intelligence (AI) for the analysis of electrocardiograms (ECGs) has the potential to transform diagnosis and estimate the prognosis of not only cardiac but, increasingly, noncardiac conditions. In this review, we summarize clinical studies and AI-enhanced ECG-based clinical applications in the early detection, diagnosis, and estimating prognosis of cardiovascular diseases in the past 5 years (2019-2023). With advancements in deep learning and the rapid increased use of ECG technologies, a large number of clinical studies have been published. However, most of these studies are single-centre, retrospective, proof-of-concept studies that lack external validation. Prospective studies that progress from development toward deployment in clinical settings account for < 15% of the studies. Successful implementations of ECG-based AI applications that have received approval from the Food and Drug Administration have been developed through commercial collaborations, with approximately half of them being for mobile or wearable devices. The field is in its early stages, and overcoming several obstacles is essential, such as prospective validation in multicentre large data sets, addressing technical issues, bias, privacy, data security, model generalizability, and global scalability. This review concludes with a discussion of these challenges and potential solutions. By providing a holistic view of the state of AI in ECG analysis, this review aims to set a foundation for future research directions, emphasizing the need for comprehensive, clinically integrated, and globally deployable AI solutions in cardiovascular disease management.
Affiliation(s)
- Md Saiful Islam
- Canadian VIGOUR Centre, University of Alberta, Edmonton, Alberta, Canada; Department of Medicine, University of Alberta, Edmonton, Alberta, Canada
- Sunil Vasu Kalmady
- Canadian VIGOUR Centre, University of Alberta, Edmonton, Alberta, Canada; Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
- Abram Hindle
- Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
- Roopinder Sandhu
- Canadian VIGOUR Centre, University of Alberta, Edmonton, Alberta, Canada; Smidt Heart Institute, Cedars-Sinai Medical Center Hospital System, Los Angeles, California, USA
- Weijie Sun
- Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
- Nariman Sepehrvand
- Canadian VIGOUR Centre, University of Alberta, Edmonton, Alberta, Canada; Department of Medicine, University of Calgary, Calgary, Alberta, Canada
- Russel Greiner
- Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada; Alberta Machine Intelligence Institute, Edmonton, Alberta, Canada
- Padma Kaul
- Canadian VIGOUR Centre, University of Alberta, Edmonton, Alberta, Canada; Department of Medicine, University of Alberta, Edmonton, Alberta, Canada.
119
Xu Y, Sun R, Hu M, Zeng H. A Dual-Modal Fusion Network Using Optical Coherence Tomography and Fundus Images in Detection of Glaucomatous Optic Neuropathy. Curr Eye Res 2024:1-7. PMID: 38979787; DOI: 10.1080/02713683.2024.2375401.
Abstract
PURPOSE We designed a dual-modal fusion network to detect glaucomatous optic neuropathy, which utilizes both retinal nerve fiber layer (RNFL) thickness from optical coherence tomography (OCT) reports and fundus images.

METHODS A total of 327 healthy subjects (410 eyes) and 87 glaucomatous optic neuropathy patients (113 eyes) were included. The RNFL thickness from OCT reports and fundus images were used as predictors in the dual-modal fusion network to diagnose glaucoma. The area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity were measured to compare our method with other approaches.

RESULTS The accuracy of our dual-modal fusion network using both RNFL thickness from OCT reports and fundus images was 0.935, and our method achieved a significantly larger AUC of 0.968 (95% confidence interval [CI]: 0.937-0.999). Using RNFL thickness only, we compared the AUCs between our network and three other approaches: 0.916 (95% CI: 0.855, 0.977) with our OCT Net; 0.841 (95% CI: 0.749, 0.933) with clock-sector division; 0.862 (95% CI: 0.757, 0.968) with inferior, superior, nasal, temporal sector division; and 0.886 (95% CI: 0.815, 0.957) with optic disc sector division. Using fundus images only, we compared the AUCs between our network and two other approaches: 0.867 (95% CI: 0.781-0.952) with our Image Net; 0.774 (95% CI: 0.670, 0.878) with ResNet50; and 0.747 (95% CI: 0.628, 0.866) with VGG16.

CONCLUSION Our dual-modal fusion network utilizing both RNFL thickness from OCT reports and fundus images can diagnose glaucoma with much better performance than current approaches based on OCT only or fundus images only.
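A hedged sketch of a dual-branch fusion design of this kind, with one branch embedding the RNFL thickness vector and one embedding the fundus image; all sizes are illustrative:

```python
import torch
import torch.nn as nn

class DualModalNet(nn.Module):
    def __init__(self, rnfl_dim: int = 256):
        super().__init__()
        self.oct_branch = nn.Sequential(nn.Linear(rnfl_dim, 64), nn.ReLU())
        self.img_branch = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 64), nn.ReLU())
        self.head = nn.Linear(128, 2)          # glaucoma vs. healthy

    def forward(self, rnfl, fundus):
        # Concatenate the two modality embeddings before classification.
        fused = torch.cat([self.oct_branch(rnfl), self.img_branch(fundus)], dim=1)
        return self.head(fused)

net = DualModalNet()
logits = net(torch.randn(4, 256), torch.randn(4, 3, 224, 224))
print(logits.shape)  # torch.Size([4, 2])
```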
Affiliation(s)
- Yongli Xu
- College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing, China
- College of Statistics and Data Science, Faculty of Science, Beijing University of Technology, Beijing, China
- Run Sun
- College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing, China
- Man Hu
- Department of Ophthalmology, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
- Hui Zeng
- College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing, China
120
Okawa J, Hori K, Izuno H, Fukuda M, Ujihashi T, Kodama S, Yoshimoto T, Sato R, Ono T. Developing tongue coating status assessment using image recognition with deep learning. J Prosthodont Res 2024; 68:425-431. PMID: 37766551; DOI: 10.2186/jpr.jpr_d_23_00117.
Abstract
PURPOSE To build an image recognition network to evaluate tongue coating status. METHODS Two image recognition networks were built: one for tongue detection and another for tongue coating classification. Digital tongue photographs were used to develop both networks; images from 251 older adults (178 women, 74.7±6.6 years) and 144 older adults (83 women, 73.8±7.3 years) who volunteered to participate were used for the tongue detection network and the coating classification network, respectively. The learning objective of the tongue detection network was to extract a rectangular region that includes the tongue. You-Only-Look-Once (YOLO) v2 was used as the detection network, and transfer learning was performed using ResNet-50. Accuracy was evaluated by calculating the intersection over union. For tongue coating classification, the rectangular area including the tongue was divided into a 7×7 grid. Five experienced panelists scored the tongue coating in each area using one of five grades, and the tongue coating index (TCI) was calculated. Transfer learning for tongue coating grades was performed using ResNet-18, and the TCI was calculated. Agreement between the panelists and the network was evaluated using the kappa coefficient for the per-area tongue coating grades and the intraclass correlation coefficient for the TCI. RESULTS The tongue detection network recognized the tongue with a high intersection over union (0.885±0.081). The tongue coating classification network showed high agreement with the panelists, with a kappa coefficient of 0.826 for the coating grades and an intraclass correlation coefficient of 0.807 for the TCI. CONCLUSIONS Image recognition enables simple and detailed assessment of tongue coating status.
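The two overlap/aggregation quantities used in this evaluation can be illustrated with a short sketch: intersection over union between a predicted and a ground-truth tongue box, and a toy tongue coating index aggregated from a 7×7 grid of per-cell grades. The 0-4 grade scale and the mean-based TCI normalization are assumptions for illustration, not the paper's exact definitions.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def tongue_coating_index(grades: np.ndarray) -> float:
    """Normalize a 7x7 grid of 0-4 coating grades to a 0-100 index.
    (The paper's exact TCI formula may differ; this is an assumed variant.)"""
    assert grades.shape == (7, 7)
    return 100.0 * grades.sum() / (grades.size * 4)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))        # 0.1428...
print(tongue_coating_index(np.full((7, 7), 2)))   # 50.0
```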
Affiliation(s)
- Jumpei Okawa
- Division of Comprehensive Prosthodontics, Faculty of Dentistry & Graduate School of Medical and Dental Sciences, Niigata University, Niigata, Japan
| | - Kazuhiro Hori
- Division of Comprehensive Prosthodontics, Faculty of Dentistry & Graduate School of Medical and Dental Sciences, Niigata University, Niigata, Japan
| | - Hiromi Izuno
- Department of Oral Health Sciences, Faculty of Nursing and Health Care, BAIKA Women's University, Ibaraki, Japan
| | - Masayo Fukuda
- Department of Oral Health Science, Faculty of Health Science, Kobe Tokiwa University, Kobe, Japan
| | - Takako Ujihashi
- Division of Comprehensive Prosthodontics, Faculty of Dentistry & Graduate School of Medical and Dental Sciences, Niigata University, Niigata, Japan
- Department of Oral Health Science, Faculty of Health Science, Kobe Tokiwa University, Kobe, Japan
| | - Shohei Kodama
- Division of Comprehensive Prosthodontics, Faculty of Dentistry & Graduate School of Medical and Dental Sciences, Niigata University, Niigata, Japan
| | - Tasuku Yoshimoto
- Division of Comprehensive Prosthodontics, Faculty of Dentistry & Graduate School of Medical and Dental Sciences, Niigata University, Niigata, Japan
| | - Rikako Sato
- Division of Comprehensive Prosthodontics, Faculty of Dentistry & Graduate School of Medical and Dental Sciences, Niigata University, Niigata, Japan
| | - Takahiro Ono
- Division of Comprehensive Prosthodontics, Faculty of Dentistry & Graduate School of Medical and Dental Sciences, Niigata University, Niigata, Japan
- Department of Geriatric Dentistry, Osaka Dental University, Osaka, Japan
| |
|
121
|
Kaneko T, Matsumoto J, Lu W, Zhao X, Ueno-Nigh LR, Oishi T, Kimura K, Otsuka Y, Zheng A, Ikenaka K, Baba K, Mochizuki H, Nishijo H, Inoue KI, Takada M. Deciphering social traits and pathophysiological conditions from natural behaviors in common marmosets. Curr Biol 2024; 34:2854-2867.e5. [PMID: 38889723 DOI: 10.1016/j.cub.2024.05.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2024] [Revised: 05/15/2024] [Accepted: 05/17/2024] [Indexed: 06/20/2024]
Abstract
Nonhuman primates (NHPs) are indispensable animal models by virtue of the continuity of behavioral repertoires across primates, including humans. However, behavioral assessment at the laboratory level has so far been limited. Employing the application of three-dimensional (3D) pose estimation and the optimal integration of subsequent analytic methodologies, we demonstrate that our artificial intelligence (AI)-based approach has successfully deciphered the ethological, cognitive, and pathological traits of common marmosets from their natural behaviors. By applying multiple deep neural networks trained with large-scale datasets, we established an evaluation system that could reconstruct and estimate the 3D poses of the marmosets, a small NHP that is suitable for analyzing complex natural behaviors in laboratory setups. We further developed downstream analytic methodologies to quantify a variety of behavioral parameters beyond motion kinematics. We revealed the distinct parental roles of male and female marmosets through automated detections of food-sharing behaviors using a spatial-temporal filter on 3D poses. Employing a recurrent neural network to analyze 3D pose time series data during social interactions, we additionally discovered that marmosets adjusted their behaviors based on others' internal state, which is not directly observable but can be inferred from the sequence of others' actions. Moreover, a fully unsupervised approach enabled us to detect progressively appearing symptomatic behaviors over a year in a Parkinson's disease model. The high-throughput and versatile nature of an AI-driven approach to analyze natural behaviors will open a new avenue for neuroscience research dealing with big-data analyses of social and pathophysiological behaviors in NHPs.
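The "spatial-temporal filter on 3D poses" used to detect food-sharing can be pictured with a toy sketch like the following: it flags spans where two animals' mouth keypoints stay within a distance threshold (the spatial test) for a minimum number of consecutive frames (the temporal test). The threshold, keypoint choice, and minimum duration are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def proximity_events(mouth_a, mouth_b, dist_thresh=0.05, min_frames=30):
    """Toy spatial-temporal filter: return (start, end) frame spans where two
    3D mouth keypoints ((T, 3) arrays) stay closer than dist_thresh for at
    least min_frames consecutive frames."""
    close = np.linalg.norm(mouth_a - mouth_b, axis=1) < dist_thresh  # spatial test
    events, start = [], None
    for t, flag in enumerate(close):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            if t - start >= min_frames:        # temporal test
                events.append((start, t))
            start = None
    if start is not None and len(close) - start >= min_frames:
        events.append((start, len(close)))
    return events

T = 200
a = np.zeros((T, 3)); b = np.ones((T, 3)) * 0.5
b[60:120] = 0.01                       # the animals come close for 60 frames
print(proximity_events(a, b))          # [(60, 120)]
```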
Affiliation(s)
- Takaaki Kaneko
- Center for the Evolutionary Origins of Human Behavior, Kyoto University, Inuyama, Aichi 484-8506, Japan.
| | - Jumpei Matsumoto
- Department of System Emotional Science, Faculty of Medicine, University of Toyama, Toyama 930-0194, Japan; Research Center for Idling Brain Science, University of Toyama, Toyama 930-0194, Japan
| | - Wanyi Lu
- Center for the Evolutionary Origins of Human Behavior, Kyoto University, Inuyama, Aichi 484-8506, Japan
| | - Xincheng Zhao
- Center for the Evolutionary Origins of Human Behavior, Kyoto University, Inuyama, Aichi 484-8506, Japan
| | - Louie Richard Ueno-Nigh
- Center for the Evolutionary Origins of Human Behavior, Kyoto University, Inuyama, Aichi 484-8506, Japan
| | - Takao Oishi
- Center for the Evolutionary Origins of Human Behavior, Kyoto University, Inuyama, Aichi 484-8506, Japan
| | - Kei Kimura
- Center for the Evolutionary Origins of Human Behavior, Kyoto University, Inuyama, Aichi 484-8506, Japan
| | - Yukiko Otsuka
- Center for the Evolutionary Origins of Human Behavior, Kyoto University, Inuyama, Aichi 484-8506, Japan
| | - Andi Zheng
- Center for the Evolutionary Origins of Human Behavior, Kyoto University, Inuyama, Aichi 484-8506, Japan
| | - Kensuke Ikenaka
- Department of Neurology, Osaka University Graduate School of Medicine, Suita, Osaka 565-0871, Japan
| | - Kousuke Baba
- Department of Neurology, Osaka University Graduate School of Medicine, Suita, Osaka 565-0871, Japan
| | - Hideki Mochizuki
- Department of Neurology, Osaka University Graduate School of Medicine, Suita, Osaka 565-0871, Japan
| | - Hisao Nishijo
- Department of System Emotional Science, Faculty of Medicine, University of Toyama, Toyama 930-0194, Japan; Research Center for Idling Brain Science, University of Toyama, Toyama 930-0194, Japan; Faculty of Human Sciences, University of East Asia, Shimonoseki, Yamaguchi 751-8503, Japan
| | - Ken-Ichi Inoue
- Center for the Evolutionary Origins of Human Behavior, Kyoto University, Inuyama, Aichi 484-8506, Japan
| | - Masahiko Takada
- Center for the Evolutionary Origins of Human Behavior, Kyoto University, Inuyama, Aichi 484-8506, Japan; Department of Neurology, Osaka University Graduate School of Medicine, Suita, Osaka 565-0871, Japan.
| |
|
122
|
Su F, Wu O, Zhu W. Multi-Label Adversarial Attack With New Measures and Self-Paced Constraint Weighting. IEEE Trans Image Process 2024; 33:3809-3822. [PMID: 38875089 DOI: 10.1109/tip.2024.3411927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2024]
Abstract
An adversarial attack is typically implemented by solving a constrained optimization problem. When implementing top-k adversarial attacks for multi-label learning, the attack failure degree (AFD) and attack cost (AC) of a possible attack are major concerns. According to our experimental and theoretical analysis, existing methods are negatively impacted by the coarse measures for AFD/AC and the indiscriminate treatment of all constraints, particularly when there is no ideal solution. Hence, this study first develops a refined measure based on the Jaccard index appropriate for AFD and AC, distinguishing the failure degrees/costs of two possible attacks better than the existing indicator-function-based scheme. Furthermore, we formulate novel optimization problems with the least constraint violation via the new measures for AFD and AC, and theoretically demonstrate the effectiveness of weighting slack variables for constraints. Finally, a self-paced weighting strategy is proposed to assign different priorities to various constraints during optimization, resulting in larger attack gains compared to previous indiscriminate schemes. Meanwhile, our method avoids fluctuations during optimization, especially in the presence of highly conflicting constraints. Extensive experiments on four benchmark datasets validate the effectiveness of our method across different evaluation metrics.
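Since the refined measure is built on the Jaccard index, a minimal sketch of how a set-overlap measure grades attack failure more finely than an indicator function is given below. The plain Jaccard index over predicted vs. target top-k label sets is used here; the paper's exact formula may differ.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| between two label sets."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

def attack_failure_degree(topk_after_attack: set, target_labels: set) -> float:
    """Refined failure measure: 0 when the attacked top-k equals the target
    set, approaching 1 as the overlap shrinks. An indicator-based measure
    would score every partial success identically as a failure (1)."""
    return 1.0 - jaccard(topk_after_attack, target_labels)

target = {1, 2, 3}
print(attack_failure_degree({1, 2, 3}, target))  # 0.0  (full success)
print(attack_failure_degree({1, 2, 9}, target))  # 0.5  (partial success)
print(attack_failure_degree({7, 8, 9}, target))  # 1.0  (total failure)
```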
|
123
|
Chen B, Ding F, Ma B, Wang L, Ning S. A Method for Real-Time Recognition of Safflower Filaments in Unstructured Environments Using the YOLO-SaFi Model. Sensors (Basel) 2024; 24:4410. [PMID: 39001189 PMCID: PMC11244584 DOI: 10.3390/s24134410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/18/2024] [Revised: 07/03/2024] [Accepted: 07/04/2024] [Indexed: 07/16/2024]
Abstract
The identification of safflower filament targets and the precise localization of picking points are fundamental prerequisites for achieving automated filament retrieval. In light of challenges such as severe occlusion of targets, low recognition accuracy, and the considerable size of models in unstructured environments, this paper introduces a novel lightweight YOLO-SaFi model. The architectural design of this model features a Backbone layer incorporating the StarNet network; a Neck layer introducing a novel ELC convolution module to refine the C2f module; and a Head layer implementing a new lightweight shared convolution detection head, Detect_EL. Furthermore, the loss function is enhanced by upgrading CIoU to PIoUv2. These enhancements significantly augment the model's capability to perceive spatial information and facilitate multi-feature fusion, consequently enhancing detection performance and rendering the model more lightweight. Performance evaluations conducted via comparative experiments with the baseline model reveal that YOLO-SaFi achieved a reduction of parameters, computational load, and weight files by 50.0%, 40.7%, and 48.2%, respectively, compared to the YOLOv8 baseline model. Moreover, YOLO-SaFi demonstrated improvements in recall, mean average precision, and detection speed by 1.9%, 0.3%, and 88.4 frames per second, respectively. Finally, the deployment of the YOLO-SaFi model on the Jetson Orin Nano device corroborates the superior performance of the enhanced model, thereby establishing a robust visual detection framework for the advancement of intelligent safflower filament retrieval robots in unstructured environments.
Affiliation(s)
- Bangbang Chen
- School of Mechatronic Engineering, Xi’an Technological University, Xi’an 710021, China; (B.C.); (S.N.)
- School of Mechatronic Engineering, Xinjiang Institute of Technology, Aksu 843100, China; (B.M.); (L.W.)
| | - Feng Ding
- School of Mechatronic Engineering, Xi’an Technological University, Xi’an 710021, China; (B.C.); (S.N.)
| | - Baojian Ma
- School of Mechatronic Engineering, Xinjiang Institute of Technology, Aksu 843100, China; (B.M.); (L.W.)
| | - Liqiang Wang
- School of Mechatronic Engineering, Xinjiang Institute of Technology, Aksu 843100, China; (B.M.); (L.W.)
| | - Shanping Ning
- School of Mechatronic Engineering, Xi’an Technological University, Xi’an 710021, China; (B.C.); (S.N.)
| |
|
124
|
Wang H, Quan W, Zhao R, Zhang M, Jiang N. Learning Temporal-Spatial Contextual Adaptation for Three-Dimensional Human Pose Estimation. Sensors (Basel) 2024; 24:4422. [PMID: 39001202 PMCID: PMC11244605 DOI: 10.3390/s24134422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/03/2024] [Revised: 06/20/2024] [Accepted: 07/04/2024] [Indexed: 07/16/2024]
Abstract
Three-dimensional human pose estimation focuses on generating 3D pose sequences from 2D videos. It has enormous potential in the fields of human-robot interaction, remote sensing, virtual reality, and computer vision. Existing methods primarily focus on exploring spatial or temporal encoding to achieve 3D pose inference. However, these architectures exploit the independent effects of spatial and temporal cues on 3D pose estimation while neglecting their spatial-temporal synergy. To address this issue, this paper proposes a novel 3D pose estimation method with a dual-adaptive spatial-temporal former (DASTFormer) and additional supervised training. The DASTFormer contains attention-adaptive (AtA) and pure-adaptive (PuA) modes, which enhance pose inference from 2D to 3D by adaptively learning spatial-temporal effects, considering both their cooperative and independent influences. In addition, additional supervised training with a batch variance loss is proposed in this work. Unlike the common training strategy, a two-round parameter update is conducted on the same batch of data. Not only can this better explore the potential relationship between spatial-temporal encoding and 3D poses, but it can also alleviate the batch-size limitations imposed by graphics cards on transformer-based frameworks. Extensive experimental results show that the proposed method significantly outperforms most state-of-the-art approaches on the Human3.6M and HumanEva datasets.
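The "two-round parameter update on the same batch" can be sketched as follows. The loss composition shown (a pose error term plus a batch variance term with weight var_weight) and the toy model are assumptions for illustration, not the paper's exact objective.

```python
# Sketch of a two-round update on the same batch (loss terms are assumptions).
import torch

def train_step(model, batch_2d, batch_3d, optimizer, var_weight=0.1):
    for _ in range(2):                          # two rounds on the same batch
        optimizer.zero_grad()
        pred = model(batch_2d)                  # (B, J, 3) predicted 3D poses
        err = (pred - batch_3d).norm(dim=-1)    # (B, J) per-joint errors
        pose_loss = err.mean()
        # Batch variance term: penalize per-joint error variance across the batch.
        var_loss = err.var(dim=0).mean()
        loss = pose_loss + var_weight * var_loss
        loss.backward()
        optimizer.step()
    return loss.item()

# Toy usage with a linear lifter over 17 joints (stand-in for DASTFormer).
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(17 * 2, 17 * 3),
                            torch.nn.Unflatten(1, (17, 3)))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
print(train_step(model, torch.randn(8, 17, 2), torch.randn(8, 17, 3), opt))
```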
Affiliation(s)
| | | | | | | | - Na Jiang
- College of Information Engineering, Capital Normal University, Beijing 100048, China; (H.W.); (W.Q.); (R.Z.); (M.Z.)
| |
|
125
|
Reale-Nosei G, Amador-Domínguez E, Serrano E. From vision to text: A comprehensive review of natural image captioning in medical diagnosis and radiology report generation. Med Image Anal 2024; 97:103264. [PMID: 39013207 DOI: 10.1016/j.media.2024.103264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2023] [Revised: 04/25/2024] [Accepted: 07/01/2024] [Indexed: 07/18/2024]
Abstract
Natural Image Captioning (NIC) is an interdisciplinary research area that lies at the intersection of Computer Vision (CV) and Natural Language Processing (NLP). Several works have been presented on the subject, ranging from the early template-based approaches to the more recent deep learning-based methods. This paper conducts a survey of NIC, especially focusing on its applications to Medical Image Captioning (MIC) and Diagnostic Captioning (DC) in the field of radiology. A review of the state of the art is conducted, summarizing key research works in NIC and DC to provide a wide overview of the subject. These works include existing NIC and MIC models, datasets, evaluation metrics, and previous reviews in the specialized literature. The reviewed work is thoroughly analyzed and discussed, highlighting the limitations of existing approaches and their potential implications for real clinical practice. Similarly, potential future research lines are outlined on the basis of the detected limitations.
Affiliation(s)
- Gabriel Reale-Nosei
- ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain.
| | - Elvira Amador-Domínguez
- Ontology Engineering Group, Departamento de Inteligencia Artificial, ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Departamento de Sistemas Informáticos, ETSI Sistemas Informáticos, Universidad Politécnica de Madrid, 28031 Madrid, Spain.
| | - Emilio Serrano
- Ontology Engineering Group, Departamento de Inteligencia Artificial, ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain.
| |
|
126
|
Khodadadzadeh M, Sloan AT, Jones NA, Coyle D, Kelso JAS. Artificial intelligence detects awareness of functional relation with the environment in 3 month old babies. Sci Rep 2024; 14:15580. [PMID: 38971875 PMCID: PMC11227524 DOI: 10.1038/s41598-024-66312-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2023] [Accepted: 07/01/2024] [Indexed: 07/08/2024] Open
Abstract
A recent experiment probed how purposeful action emerges in early life by manipulating infants' functional connection to an object in the environment (i.e., tethering an infant's foot to a colorful mobile). Vicon motion capture data from multiple infant joints were used here to create Histograms of Joint Displacements (HJDs) to generate pose-based descriptors for 3D infant spatial trajectories. Using HJDs as inputs, machine and deep learning systems were tasked with classifying the experimental state from which snippets of movement data were sampled. The architectures tested included k-Nearest Neighbour (kNN), Linear Discriminant Analysis (LDA), Fully connected network (FCNet), 1D-Convolutional Neural Network (1D-Conv), 1D-Capsule Network (1D-CapsNet), 2D-Conv and 2D-CapsNet. Sliding window scenarios were used for temporal analysis to search for topological changes in infant movement related to functional context. kNN and LDA achieved higher classification accuracy with single joint features, while deep learning approaches, particularly 2D-CapsNet, achieved higher accuracy on full-body features. For each AI architecture tested, measures of foot activity displayed the most distinct and coherent pattern alterations across different experimental stages (reflected in the highest classification accuracy rate), indicating that interaction with the world impacts the infant behaviour most at the site of organism~world connection.
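A minimal version of a Histogram of Joint Displacements is sketched below: frame-to-frame displacement magnitudes of one joint's 3D trajectory are binned into a fixed-length descriptor. The bin count, range, and normalization are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def hjd(trajectory: np.ndarray, n_bins: int = 16, max_disp: float = 0.1) -> np.ndarray:
    """Histogram of Joint Displacements for one joint.
    trajectory: (T, 3) array of 3D positions over T frames.
    Returns a normalized n_bins-long descriptor of per-frame displacement magnitudes."""
    disp = np.linalg.norm(np.diff(trajectory, axis=0), axis=1)    # (T-1,)
    hist, _ = np.histogram(disp, bins=n_bins, range=(0.0, max_disp))
    return hist / max(hist.sum(), 1)                              # normalize to sum 1

rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(scale=0.01, size=(300, 3)), axis=0)   # toy random walk
descriptor = hjd(traj)
print(descriptor.shape, round(descriptor.sum(), 3))               # (16,) ~1.0
```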
Affiliation(s)
- Massoud Khodadadzadeh
- School of Computer Science and Technology, University of Bedfordshire, Luton, LU1 3JU, UK.
- The Bath Institute for the Augmented Human, University of Bath, Bath, BA2 7AY, UK.
- Intelligent Systems Research Centre, Ulster University, Derry, Londonderry, BT48 7JL, UK.
| | - Aliza T Sloan
- Human Brain and Behaviour Laboratory, Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, FL, 33431, US
| | - Nancy Aaron Jones
- Human Brain and Behaviour Laboratory, Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, FL, 33431, US
| | - Damien Coyle
- The Bath Institute for the Augmented Human, University of Bath, Bath, BA2 7AY, UK
- Intelligent Systems Research Centre, Ulster University, Derry, Londonderry, BT48 7JL, UK
| | - J A Scott Kelso
- Human Brain and Behaviour Laboratory, Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, FL, 33431, US
- Intelligent Systems Research Centre, Ulster University, Derry, Londonderry, BT48 7JL, UK
| |
|
127
|
Li Z, Xie H, Wang Z, Li D, Chen K, Zong X, Qiang W, Wen F, Deng Z, Chen L, Li H, Dong H, Wu P, Sun T, Cheng Y, Yang Y, Xue J, Zheng Q, Jiang J, Chen W. Deep learning for multi-type infectious keratitis diagnosis: A nationwide, cross-sectional, multicenter study. NPJ Digit Med 2024; 7:181. [PMID: 38971902 PMCID: PMC11227533 DOI: 10.1038/s41746-024-01174-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2023] [Accepted: 06/21/2024] [Indexed: 07/08/2024] Open
Abstract
The main cause of corneal blindness worldwide is keratitis, especially the infectious form caused by bacteria, fungi, viruses, and Acanthamoeba. The key to effective management of infectious keratitis hinges on prompt and precise diagnosis. Nevertheless, the current gold standard, such as cultures of corneal scrapings, remains time-consuming and frequently yields false-negative results. Here, using 23,055 slit-lamp images collected from 12 clinical centers nationwide, this study constructed a clinically feasible deep learning system, DeepIK, that could emulate the diagnostic process of a human expert to identify and differentiate bacterial, fungal, viral, amebic, and noninfectious keratitis. DeepIK exhibited remarkable performance in internal, external, and prospective datasets (all areas under the receiver operating characteristic curves > 0.96) and outperformed three other state-of-the-art algorithms (DenseNet121, InceptionResNetV2, and Swin-Transformer). Our study indicates that DeepIK possesses the capability to assist ophthalmologists in accurately and swiftly identifying various infectious keratitis types from slit-lamp images, thereby facilitating timely and targeted treatment.
Affiliation(s)
- Zhongwen Li
- Ningbo Key Laboratory of Medical Research on Blinding Eye Diseases, Ningbo Eye Institute, Ningbo Eye Hospital, Wenzhou Medical University, Ningbo, 315000, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
| | - He Xie
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
| | - Zhouqian Wang
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
| | - Daoyuan Li
- Department of Ophthalmology, The Affiliated Hospital of Guizhou Medical University, Guiyang, 550004, China
| | - Kuan Chen
- Department of Ophthalmology, Cangnan Hospital, Wenzhou Medical University, Wenzhou, 325000, China
| | - Xihang Zong
- Ningbo Key Laboratory of Medical Research on Blinding Eye Diseases, Ningbo Eye Institute, Ningbo Eye Hospital, Wenzhou Medical University, Ningbo, 315000, China
| | - Wei Qiang
- Ningbo Key Laboratory of Medical Research on Blinding Eye Diseases, Ningbo Eye Institute, Ningbo Eye Hospital, Wenzhou Medical University, Ningbo, 315000, China
| | - Feng Wen
- Ningbo Key Laboratory of Medical Research on Blinding Eye Diseases, Ningbo Eye Institute, Ningbo Eye Hospital, Wenzhou Medical University, Ningbo, 315000, China
| | - Zhihong Deng
- Department of Ophthalmology, The Third Xiangya Hospital, Central South University, Changsha, 410013, China
| | - Limin Chen
- Department of Ophthalmology, The First Affiliated Hospital of Fujian Medical University, Fuzhou, 350000, China
| | - Huiping Li
- Department of Ophthalmology, People's Hospital of Ningxia Hui Autonomous Region, Ningxia Medical University, Yinchuan, 750001, China
| | - He Dong
- The Third People's Hospital of Dalian & Dalian Municipal Eye Hospital, Dalian, 116033, China
| | - Pengcheng Wu
- Department of Ophthalmology, The Second Hospital of Lanzhou University, Lanzhou, 730030, China
| | - Tao Sun
- The Affiliated Eye Hospital of Nanchang University, Jiangxi Clinical Research Center for Ophthalmic Disease, Jiangxi Research Institute of Ophthalmology and Visual Science, Jiangxi Provincial Key Laboratory for Ophthalmology, Nanchang, 330006, China
| | - Yan Cheng
- Xi'an No.1 Hospital, Shaanxi Institute of Ophthalmology, Shaanxi Key Laboratory of Ophthalmology, The First Affiliated Hospital of Northwestern University, Xi'an, 710002, China
| | - Yanning Yang
- Department of Ophthalmology, Renmin Hospital of Wuhan University, Wuhan, 430060, China
| | - Jinsong Xue
- Affiliated Eye Hospital of Nanjing Medical University, Nanjing, 210029, China
| | - Qinxiang Zheng
- Ningbo Key Laboratory of Medical Research on Blinding Eye Diseases, Ningbo Eye Institute, Ningbo Eye Hospital, Wenzhou Medical University, Ningbo, 315000, China.
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China.
| | - Jiewei Jiang
- School of Electronic Engineering, Xi'an University of Posts and Telecommunications, Xi'an, 710121, China.
| | - Wei Chen
- Ningbo Key Laboratory of Medical Research on Blinding Eye Diseases, Ningbo Eye Institute, Ningbo Eye Hospital, Wenzhou Medical University, Ningbo, 315000, China.
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China.
| |
|
128
|
Gao Z, Xu K, Zhuang H, Liu L, Mao X, Ding B, Feng D, Wang H. Less confidence, less forgetting: Learning with a humbler teacher in exemplar-free Class-Incremental learning. Neural Netw 2024; 179:106513. [PMID: 39018945 DOI: 10.1016/j.neunet.2024.106513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2023] [Revised: 06/26/2024] [Accepted: 07/04/2024] [Indexed: 07/19/2024]
Abstract
Class-Incremental Learning (CIL) is challenging due to catastrophic forgetting (CF), which escalates in exemplar-free scenarios. To mitigate CF, Knowledge Distillation (KD), which leverages old models as teacher models, has been widely employed in CIL. However, based on a case study, our investigation reveals that the teacher model exhibits over-confidence on unseen new samples. In this article, we conduct empirical experiments and provide theoretical analysis to investigate the over-confidence phenomenon and the impact of KD in exemplar-free CIL, where access to old samples is unavailable. Building on our analysis, we propose a novel approach, Learning with a Humbler Teacher (LwHT), which systematically selects an appropriate checkpoint model as a humbler teacher to mitigate CF. Furthermore, we explore utilizing the nuclear norm to obtain an appropriate temporal ensemble to enhance model stability. Notably, LwHT outperforms the state-of-the-art approach by significant margins of 10.41%, 6.56%, and 4.31% in various settings while demonstrating superior model plasticity.
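A sketch of the overall idea follows: pick the saved checkpoint that is least over-confident on new-task data and distill from it with a standard temperature-softened KD loss. The selection criterion shown (lowest mean max-softmax confidence) is an illustrative stand-in for the paper's rule, and the KD loss is the classic Hinton-style formulation rather than the authors' exact objective.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def pick_humbler_teacher(checkpoints, new_task_loader, device="cpu"):
    """Pick the checkpoint least over-confident on unseen new-task data.
    (Illustrative criterion: lowest mean max-softmax confidence.)"""
    best, best_conf = None, float("inf")
    for ckpt in checkpoints:
        ckpt.eval()
        confs = []
        for x, _ in new_task_loader:
            probs = F.softmax(ckpt(x.to(device)), dim=1)
            confs.append(probs.max(dim=1).values.mean().item())
        mean_conf = sum(confs) / len(confs)
        if mean_conf < best_conf:
            best, best_conf = ckpt, mean_conf
    return best

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Standard KD: KL divergence between temperature-softened distributions."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T

ckpts = [torch.nn.Linear(10, 5) for _ in range(3)]   # stand-ins for saved models
loader = [(torch.randn(16, 10), None)]               # stand-in for a DataLoader
teacher = pick_humbler_teacher(ckpts, loader)
```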
Affiliation(s)
- Zijian Gao
- National University of Defense Technology, Changsha 410000, China; State Key Laboratory of Complex & Critical Software Environment, Changsha 410000, China
| | - Kele Xu
- National University of Defense Technology, Changsha 410000, China; State Key Laboratory of Complex & Critical Software Environment, Changsha 410000, China.
| | - Huiping Zhuang
- South China University of Technology, Guangzhou 510000, China
| | - Li Liu
- National University of Defense Technology, Changsha 410000, China; University of Oulu, 02150 Oulu, Finland
| | - Xinjun Mao
- National University of Defense Technology, Changsha 410000, China; State Key Laboratory of Complex & Critical Software Environment, Changsha 410000, China
| | - Bo Ding
- National University of Defense Technology, Changsha 410000, China; State Key Laboratory of Complex & Critical Software Environment, Changsha 410000, China
| | - Dawei Feng
- National University of Defense Technology, Changsha 410000, China; State Key Laboratory of Complex & Critical Software Environment, Changsha 410000, China
| | - Huaimin Wang
- National University of Defense Technology, Changsha 410000, China; State Key Laboratory of Complex & Critical Software Environment, Changsha 410000, China
| |
|
129
|
Hu L, Hu L, Chen M. Edge-enhanced infrared image super-resolution reconstruction model under transformer. Sci Rep 2024; 14:15585. [PMID: 38971844 PMCID: PMC11227526 DOI: 10.1038/s41598-024-66302-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2024] [Accepted: 07/01/2024] [Indexed: 07/08/2024] Open
Abstract
Infrared images have important applications in the military, security, and surveillance fields. However, limited by technical factors, the resolution of infrared images is generally low, which seriously restricts their application and development in various fields. To address the difficulty of recovering edge information and the tendency toward ringing artifacts in infrared image super-resolution reconstruction, a transformer-based edge-enhanced infrared image super-resolution reconstruction model, TESR, is proposed. First, to tackle the difficulty of recovering edge information in infrared images, an edge detection auxiliary network is designed that obtains more accurate edge information from the input low-resolution images and enhances edge details during image reconstruction. Then, the CSWin Transformer is introduced to compute the self-attention of horizontal and vertical stripes in parallel, increasing the receptive field of the model and enabling it to utilize features at higher semantic levels. The proposed super-resolution reconstruction model extracts more comprehensive image information and obtains more accurate edge information to enhance the texture details of super-resolution images, achieving better reconstruction results.
Affiliation(s)
- Lei Hu
- School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, 330022, China.
| | - Long Hu
- School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, 330022, China
| | - MingHui Chen
- School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, 330022, China
| |
|
130
|
Jiao J, Alsharid M, Drukker L, Papageorghiou AT, Zisserman A, Noble JA. Audio-visual modelling in a clinical setting. Sci Rep 2024; 14:15569. [PMID: 38971838 PMCID: PMC11227581 DOI: 10.1038/s41598-024-66160-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2024] [Accepted: 06/27/2024] [Indexed: 07/08/2024] Open
Abstract
Auditory and visual signals are two primary perception modalities that are usually present together and correlate with each other, not only in natural environments but also in clinical settings. However, audio-visual modelling in the latter case can be more challenging, due to the different sources of audio/video signals and the noise (both signal-level and semantic-level) in auditory signals, usually speech audio. In this study, we consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations that benefit various clinical tasks, without relying on dense supervisory annotations from human experts for the model training. A simple yet effective multi-modal self-supervised learning framework is presented for this purpose. The proposed approach is able to help find standard anatomical planes, predict the focusing position of the sonographer's eyes, and localise anatomical regions of interest during ultrasound imaging. Experimental analysis on a large-scale clinical multi-modal ultrasound video dataset shows that the proposed novel representation learning method provides good transferable anatomical representations that boost the performance of automated downstream clinical tasks, even outperforming fully-supervised solutions. Being able to learn such medical representations in a self-supervised manner will contribute to several aspects including a better understanding of obstetric imaging, training new sonographers, more effective assistive tools for human experts, and enhancement of the clinical workflow.
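The abstract does not spell out the training objective; a common way to realize audio-visual correspondence learning without annotations is a symmetric InfoNCE contrastive loss over paired clip embeddings, sketched below purely as an assumed illustration (the authors' actual objective may differ).

```python
import torch
import torch.nn.functional as F

def audio_visual_nce(audio_emb, video_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired audio/video embeddings (B, D):
    the i-th audio clip should match the i-th video clip against all other
    pairings in the batch."""
    a = F.normalize(audio_emb, dim=1)
    v = F.normalize(video_emb, dim=1)
    logits = a @ v.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(a.size(0))          # matching pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = audio_visual_nce(torch.randn(32, 128), torch.randn(32, 128))
print(loss.item())
```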
Affiliation(s)
- Jianbo Jiao
- Department of Engineering Science, University of Oxford, Oxford, UK.
- School of Computer Science, University of Birmingham, Birmingham, UK.
| | - Mohammad Alsharid
- Department of Engineering Science, University of Oxford, Oxford, UK
- Department of Electrical Engineering and Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates
| | - Lior Drukker
- Nuffield Department of Women's and Reproductive Health, University of Oxford, Oxford, UK
- Rabin Medical Center, Tel-Aviv University Faculty of Medicine, Tel Aviv, Israel
| | - Aris T Papageorghiou
- Nuffield Department of Women's and Reproductive Health, University of Oxford, Oxford, UK
| | - Andrew Zisserman
- Department of Engineering Science, University of Oxford, Oxford, UK
| | - J Alison Noble
- Department of Engineering Science, University of Oxford, Oxford, UK.
| |
|
131
|
Cheng P, Mao C, Tang J, Yang S, Cheng Y, Wang W, Gu Q, Han W, Chen H, Li S, Chen Y, Zhou J, Li W, Pan A, Zhao S, Huang X, Zhu S, Zhang J, Shu W, Wang S. Zero-shot prediction of mutation effects with multimodal deep representation learning guides protein engineering. Cell Res 2024:10.1038/s41422-024-00989-2. [PMID: 38969803 DOI: 10.1038/s41422-024-00989-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2024] [Accepted: 06/03/2024] [Indexed: 07/07/2024] Open
Abstract
Mutations in amino acid sequences can provoke changes in protein function. Accurate and unsupervised prediction of mutation effects is critical in biotechnology and biomedicine, but remains a fundamental challenge. To resolve this challenge, here we present Protein Mutational Effect Predictor (ProMEP), a general and multiple sequence alignment-free method that enables zero-shot prediction of mutation effects. A multimodal deep representation learning model embedded in ProMEP was developed to comprehensively learn both sequence and structure contexts from ~160 million proteins. ProMEP achieves state-of-the-art performance in mutational effect prediction and accomplishes a tremendous improvement in speed, enabling efficient and intelligent protein engineering. Specifically, ProMEP accurately forecasts mutational consequences on the gene-editing enzymes TnpB and TadA, and successfully guides the development of high-performance gene-editing tools with their engineered variants. The gene-editing efficiency of a 5-site mutant of TnpB reaches up to 74.04% (vs 24.66% for the wild type); and the base editing tool developed on the basis of a TadA 15-site mutant (in addition to the A106V/D108N double mutation that renders deoxyadenosine deaminase activity to TadA) exhibits an A-to-G conversion frequency of up to 77.27% (vs 69.80% for ABE8e, a previous TadA-based adenine base editor) with significantly reduced bystander and off-target effects compared to ABE8e. ProMEP not only showcases superior performance in predicting mutational effects on proteins but also demonstrates a great capability to guide protein engineering. Therefore, ProMEP enables efficient exploration of the gigantic protein space and facilitates practical design of proteins, thereby advancing studies in biomedicine and synthetic biology.
Affiliation(s)
- Peng Cheng
- Bioinformatics Center of AMMS, Beijing, China
| | - Cong Mao
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Jin Tang
- Zhejiang Lab, Hangzhou, Zhejiang, China
| | - Sen Yang
- Bioinformatics Center of AMMS, Beijing, China
| | - Yu Cheng
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Wuke Wang
- Zhejiang Lab, Hangzhou, Zhejiang, China
| | - Qiuxi Gu
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Wei Han
- Zhejiang Lab, Hangzhou, Zhejiang, China
| | - Hao Chen
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Sihan Li
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | | | | | - Wuju Li
- Bioinformatics Center of AMMS, Beijing, China
| | - Aimin Pan
- Zhejiang Lab, Hangzhou, Zhejiang, China
| | - Suwen Zhao
- iHuman Institute, ShanghaiTech University, Shanghai, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
| | - Xingxu Huang
- Zhejiang Lab, Hangzhou, Zhejiang, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
| | | | - Jun Zhang
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China.
| | - Wenjie Shu
- Bioinformatics Center of AMMS, Beijing, China.
| | | |
|
132
|
Yang Q, Wang C, Pan K, Xia B, Xie R, Shi J. An improved 3D-UNet-based brain hippocampus segmentation model based on MR images. BMC Med Imaging 2024; 24:166. [PMID: 38970025 PMCID: PMC11225132 DOI: 10.1186/s12880-024-01346-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2023] [Accepted: 06/24/2024] [Indexed: 07/07/2024] Open
Abstract
OBJECTIVE Accurate delineation of the hippocampal region via magnetic resonance imaging (MRI) is crucial for the prevention and early diagnosis of neurosystemic diseases. Determining how to accurately and quickly delineate the hippocampus from MRI results has become a serious issue. In this study, a pixel-level semantic segmentation method using 3D-UNet is proposed to realize the automatic segmentation of the brain hippocampus from MRI results. METHODS Two hundred three-dimensional T1-weighted (3D-T1) non-gadolinium contrast-enhanced magnetic resonance (MR) images were acquired at Hangzhou Cancer Hospital from June 2020 to December 2022. These samples were divided into two groups, containing 175 and 25 samples, respectively. In the first group, 145 cases were used to train the hippocampus segmentation model, and the remaining 30 cases were used to fine-tune the hyperparameters of the model. Images of the 25 patients in the second group were used as the test set to evaluate the performance of the model. The training images were processed via rotation, scaling, grey value augmentation and transformation with a smooth dense deformation field for both image data and ground truth labels. A filling technique was introduced into the segmentation network to establish the hippocampus segmentation model. In addition, the performance of models established with the original networks, such as VNet, SegResNet, UNetR and 3D-UNet, was compared with that of models constructed by combining the filling technique with the original segmentation networks. RESULTS The results showed that the performance of the segmentation models improved after the filling technique was introduced. Specifically, when the filling technique was introduced into VNet, SegResNet, 3D-UNet and UNetR, the segmentation performance of the models trained with an input image size of 48 × 48 × 48 improved. Among them, the 3D-UNet-based model with the filling technique achieved the best performance, with a Dice score of 0.7989 ± 0.0398 and a mean intersection over union (mIoU) of 0.6669 ± 0.0540, both greater than those of the original 3D-UNet-based model. In addition, the oversegmentation ratio (OSR), average surface distance (ASD) and Hausdorff distance (HD) were 0.0666 ± 0.0351, 0.5733 ± 0.1018 and 5.1235 ± 1.4397, respectively, which were better than those of the other models. When the size of the input image was set to 48 × 48 × 48, 64 × 64 × 64 and 96 × 96 × 96, model performance gradually improved, with Dice scores of 0.7989 ± 0.0398, 0.8371 ± 0.0254 and 0.8674 ± 0.0257, respectively, and mIoUs of 0.6669 ± 0.0540, 0.7207 ± 0.0370 and 0.7668 ± 0.0392, respectively. CONCLUSION The proposed hippocampus segmentation model, constructed by introducing the filling technique into a segmentation network, performed better than models built solely on the original networks and can improve the efficiency of diagnostic analysis.
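For reference, the two headline metrics can be computed from binary 3D masks as below; these are the plain definitions, with a small smoothing constant added as a common implementation choice (not taken from the paper).

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice score 2|P∩G| / (|P|+|G|) for binary masks of any shape."""
    inter = np.logical_and(pred, gt).sum()
    return (2 * inter + eps) / (pred.sum() + gt.sum() + eps)

def miou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Mean IoU over the foreground and background classes."""
    ious = []
    for cls in (0, 1):
        p, g = pred == cls, gt == cls
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        ious.append((inter + eps) / (union + eps))
    return float(np.mean(ious))

pred = np.zeros((48, 48, 48), dtype=np.uint8); pred[10:30, 10:30, 10:30] = 1
gt   = np.zeros((48, 48, 48), dtype=np.uint8); gt[12:32, 12:32, 12:32] = 1
print(round(dice(pred, gt), 4), round(miou(pred, gt), 4))   # 0.729 0.7661
```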
Affiliation(s)
- Qian Yang
- Information Technology Center, Taizhou University, 1139 Shifu Dadao, Taizhou City, Zhejiang Province, China
| | - Chengfeng Wang
- College of Mathematics and Computer Science, Zhejiang A & F University, 666 Wusu Street, Hangzhou, 311300, China
| | - Kaicheng Pan
- Hangzhou Cancer Hospital, 34 YanGuan Lane, Hangzhou, 310002, China
| | - Bing Xia
- Hangzhou Cancer Hospital, 34 YanGuan Lane, Hangzhou, 310002, China.
| | - Ruifei Xie
- Hangzhou Cancer Hospital, 34 YanGuan Lane, Hangzhou, 310002, China.
| | - Jiankai Shi
- School of Computer Science, Hangzhou Dianzi University, Xiasha Higher Education Zone, Hangzhou, Zhejiang, 310018, People's Republic of China.
| |
|
133
|
Qu W, Li X, Jin X. Knowledge enhanced bottom-up affordance grounding for robotic interaction. PeerJ Comput Sci 2024; 10:e2097. [PMID: 38983207 PMCID: PMC11232630 DOI: 10.7717/peerj-cs.2097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2024] [Accepted: 05/13/2024] [Indexed: 07/11/2024]
Abstract
With the rapid advancement of robotics technology, an increasing number of researchers are exploring the use of natural language as a communication channel between humans and robots. In language-conditioned manipulation grounding scenarios, prevailing methods rely heavily on supervised multimodal deep learning. In this paradigm, robots assimilate knowledge from both language instructions and visual input. However, these approaches lack external knowledge for comprehending natural language instructions and are hindered by the substantial demand for paired data, where vision and language are usually linked through manual annotation to create realistic datasets. To address the above problems, we propose the knowledge-enhanced bottom-up affordance grounding network (KBAG-Net), which enhances natural language understanding through external knowledge, improving accuracy in object grasping affordance segmentation. In addition, we introduce a semi-automatic data generation method aimed at facilitating the quick establishment of the language-following manipulation grounding dataset. The experimental results on two standard datasets demonstrate that our method outperforms existing methods by exploiting external knowledge. Specifically, our method outperforms the two-stage method by 12.98% and 1.22% in mIoU on the two datasets, respectively. For broader community engagement, we will make the semi-automatic data construction method publicly available at https://github.com/wmqu/Automated-Dataset-Construction4LGM.
Affiliation(s)
- Wen Qu
- Computer Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
| | - Xiao Li
- Computer Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
| | - Xiao Jin
- Computer Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
| |
|
134
|
Butt MA, Kaleem MF, Bilal M, Hanif MS. Using multi-label ensemble CNN classifiers to mitigate labelling inconsistencies in patch-level Gleason grading. PLoS One 2024; 19:e0304847. [PMID: 38968206 PMCID: PMC11226137 DOI: 10.1371/journal.pone.0304847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2024] [Accepted: 05/21/2024] [Indexed: 07/07/2024] Open
Abstract
This paper presents a novel approach to enhance the accuracy of patch-level Gleason grading in prostate histopathology images, a critical task in the diagnosis and prognosis of prostate cancer. This study shows that Gleason grading accuracy can be improved by addressing the prevalent issue of label inconsistencies in the SICAPv2 prostate dataset, which employs a majority voting scheme for patch-level labels. We propose a multi-label ensemble deep-learning classifier that effectively mitigates these inconsistencies and yields more accurate results than state-of-the-art works. Specifically, our approach leverages the strengths of three different one-vs-all deep learning models in an ensemble to learn diverse features from the histopathology images and individually indicate the presence of one or more Gleason grades (G3, G4, and G5) in each patch. These deep learning models were trained using transfer learning to fine-tune a variant of the ResNet18 CNN classifier chosen after an extensive ablation study. Experimental results demonstrate that our multi-label ensemble classifier significantly outperforms traditional single-label classifiers reported in the literature by at least 14% and 4% in accuracy and F1-score, respectively. These results underscore the potential of our proposed machine learning approach to improve the accuracy and consistency of prostate cancer grading.
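The ensemble idea, three one-vs-all binary CNNs whose sigmoid outputs jointly indicate which Gleason grades are present in a patch, can be sketched as follows. The toy backbone, embedding width, and 0.5 decision threshold are assumptions standing in for the fine-tuned ResNet18 variant described in the abstract.

```python
import torch
import torch.nn as nn

class OneVsAllEnsemble(nn.Module):
    """Three independent binary classifiers, one per Gleason grade (G3, G4, G5).
    Each head answers 'is this grade present in the patch?', so a patch can
    carry several labels at once (multi-label), unlike a softmax classifier."""
    def __init__(self, make_backbone, emb_dim=512):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(make_backbone(), nn.Linear(emb_dim, 1)) for _ in range(3)
        )

    def forward(self, x):
        logits = torch.cat([b(x) for b in self.branches], dim=1)  # (B, 3)
        return torch.sigmoid(logits)                              # per-grade probs

def toy_backbone():  # stand-in for a fine-tuned ResNet18 feature extractor
    return nn.Sequential(nn.Conv2d(3, 8, 3, 2, 1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 512))

model = OneVsAllEnsemble(toy_backbone)
probs = model(torch.randn(2, 3, 224, 224))
present = probs > 0.5              # multi-label decision per grade
print(probs.shape, present.shape)  # torch.Size([2, 3]) twice
```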
Affiliation(s)
- Muhammad Asim Butt
- Department of Electrical Engineering, University of Management and Technology, Lahore, Pakistan
| | | | - Muhammad Bilal
- Center of Excellence in Intelligent Engineering Systems, King Abdulaziz University, Jeddah, Saudi Arabia
- Department of Electrical and Computer Engineering, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Muhammad Shehzad Hanif
- Center of Excellence in Intelligent Engineering Systems, King Abdulaziz University, Jeddah, Saudi Arabia
- Department of Electrical and Computer Engineering, King Abdulaziz University, Jeddah, Saudi Arabia
| |
|
135
|
Xiong S, Tan Y, Wang G, Yan P, Xiang X. Learning feature relationships in CNN model via relational embedding convolution layer. Neural Netw 2024; 179:106510. [PMID: 39024707 DOI: 10.1016/j.neunet.2024.106510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2024] [Revised: 06/28/2024] [Accepted: 07/03/2024] [Indexed: 07/20/2024]
Abstract
Establishing the relationships among hierarchical visual attributes of objects in the visual world is crucial for human cognition. The classic convolutional neural network (CNN) can successfully extract hierarchical features but ignores the relationships among features, resulting in shortcomings compared to humans in areas like interpretability and domain generalization. Recently, algorithms have introduced feature relationships via external prior knowledge and special auxiliary modules, which have been proven to bring multiple improvements in many computer vision tasks. However, prior knowledge is often difficult to obtain, and auxiliary modules bring additional consumption of computing and storage resources, which limits the flexibility and practicality of the algorithm. In this paper, we aim to drive the CNN model to learn the relationships among hierarchical deep features without prior knowledge or increased resource consumption, while enhancing fundamental performance in several aspects. First, the task of learning the relationships among hierarchical features in a CNN is defined, and three key problems related to this task are pointed out: the quantitative metric of connection intensity, the threshold for useless connections, and the updating strategy for the relation graph. Second, a Relational Embedding Convolution (RE-Conv) layer is proposed for representing feature relationships in a convolution layer, followed by a use-and-disuse strategy that aims to address these three problems of feature relation learning. Finally, the improvements brought by the proposed feature relation learning scheme are demonstrated through numerous experiments, covering interpretability, domain generalization, noise robustness, and inference efficiency. In particular, the proposed scheme outperforms many state-of-the-art methods in the domain generalization community and can be seamlessly integrated with existing methods for further improvement. Meanwhile, it maintains precision comparable to the original CNN model while reducing floating point operations (FLOPs) by approximately 50%.
Affiliation(s)
- Shengzhou Xiong
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China; National Key Laboratory of Multispectral Information Intelligent Processing Technology, Wuhan, 430074, China.
| | - Yihua Tan
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China; National Key Laboratory of Multispectral Information Intelligent Processing Technology, Wuhan, 430074, China.
| | - Guoyou Wang
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China; National Key Laboratory of Multispectral Information Intelligent Processing Technology, Wuhan, 430074, China.
| | - Pei Yan
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China; National Key Laboratory of Multispectral Information Intelligent Processing Technology, Wuhan, 430074, China.
| | - Xuanyu Xiang
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China; National Key Laboratory of Multispectral Information Intelligent Processing Technology, Wuhan, 430074, China.
| |
|
136
|
Das P, Nath S, Gupta R, Roy SD, Bhowmik MK. Infrared thermogram image guided discontinuous appearances of hyaluronic acid for classification of arthritic knee joints. J Therm Biol 2024; 123:103915. [PMID: 38981303 DOI: 10.1016/j.jtherbio.2024.103915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2024] [Revised: 06/25/2024] [Accepted: 07/01/2024] [Indexed: 07/11/2024]
Abstract
A person's quality of life depends largely on the ability to move smoothly: to accomplish the tasks of daily life, the joints of the body must be healthy. However, rheumatoid arthritis (RA) and osteoarthritis (OA) are highly prevalent causes of lost mobility. RA and OA mostly affect the joints of the hand and knee, resulting in lifelong pain and an inability to climb, walk, and so on. In the early stages, these diseases attack the synovial membrane and synovial fluid, and later destroy the soft tissues and bone structure. Early diagnosis allows treatment to begin before such extreme consequences develop. Previous clinical studies have observed that a synovial fluid imbalance appears in the early stage of these diseases and that the hyaluronic acid (HA) concentration decreases accordingly. Therefore, estimating HA is a significant key to arthritis classification and grading. In this paper, we propose a hybrid framework for the classification of arthritic knee joints based on analysis of the discontinuous appearance of the HA concentration using infrared imaging technology. To meet the specific requirements, we first propose a modified K-means clustering algorithm for extraction of the region of interest (ROI), i.e., the knee joint surface. Second, a mathematical formulation is proposed to calculate the HA concentration from the segmented ROIs. The experimental process was implemented on the publicly available IR (infrared) Knee Joint Dataset, and to further evaluate the novelty of the mathematical formulation, we extended the proposed work to the classification of healthy and arthritis-affected knee joints based on the discriminative characteristics of the HA concentration relative to existing imaging features. Experimental results and analysis demonstrate that the HA concentration has strong potential for classifying healthy and arthritic knee joints from holistic infrared images. Our analysis reveals that estimating the HA concentration and combining it with conventional handcrafted and deep features increases classification performance, with average accuracies of 91% and 97.22%, respectively, compared with each individual feature set.
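As an illustration of the first step (ROI extraction by clustering thermogram intensities), a plain K-means sketch follows. The paper's modified K-means is not reproduced here, so this shows only the unmodified baseline applied to pixel intensities; the cluster count and the warmest-cluster heuristic are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_warm_roi(thermogram: np.ndarray, n_clusters: int = 3) -> np.ndarray:
    """Cluster pixel intensities of an (H, W) thermogram with standard K-means
    and return a binary mask of the warmest cluster as a crude ROI.
    (The paper uses a modified K-means; this is the unmodified baseline.)"""
    h, w = thermogram.shape
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(thermogram.reshape(-1, 1)).reshape(h, w)
    warmest = int(np.argmax(km.cluster_centers_.ravel()))
    return labels == warmest

rng = np.random.default_rng(1)
img = rng.normal(30.0, 0.5, size=(64, 64))   # ~30 °C background
img[20:40, 20:40] += 4.0                     # warmer joint region
mask = extract_warm_roi(img)
print(mask.sum())                            # ~400 pixels (the warm patch)
```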
Affiliation(s)
- Puja Das
- Department of Computer Science & Engineering, Tripura University (A Central University), Suryamaninagar, 799022, India.
| | - Satyabrata Nath
- Department of PMR, Agartala Government Medical College and G B Pant Hospital Campus, Kunjaban Agartala, 799006, Tripura, India.
| | - Ranjan Gupta
- Department of Rheumatology, All India Institute of Medical Sciences (AIIMS)-New Delhi, Ansari Nagar, New Delhi, 110029, India.
| | - Sourav Dey Roy
- Department of Computer Science & Engineering, Tripura University (A Central University), Suryamaninagar, 799022, India.
| | - Mrinal Kanti Bhowmik
- Department of Computer Science & Engineering, Tripura University (A Central University), Suryamaninagar, 799022, India.
| |
|
137
|
Li Y, Hu J, Wu H, Wei Y, Shan H, Song X, Hua X, Xu W, Jiang Y. An appearance quality classification method for Auricularia auricula based on deep learning. Sci Rep 2024; 14:15516. [PMID: 38969651 PMCID: PMC11226435 DOI: 10.1038/s41598-023-50739-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2023] [Accepted: 12/24/2023] [Indexed: 07/07/2024] Open
Abstract
The intelligent appearance quality classification of Auricularia auricula is of great significance for promoting this industry. This paper proposes an appearance quality classification method for Auricularia auricula based on an improved Faster Region-based Convolutional Neural Network (improved Faster RCNN) framework. The original Faster RCNN is improved by establishing a multiscale feature fusion detection model to increase the accuracy and real-time performance of the model. The multiscale feature fusion detection model makes full use of shallow feature information to complete target detection, fusing shallow features rich in detailed information with deep features rich in strong semantic information. Since the fusion algorithm directly uses the existing information of the feature extraction network, there is no additional computation. The fused features contain more of the original detailed feature information. Therefore, the improved Faster RCNN can improve the final detection rate without sacrificing speed. Compared with the original Faster RCNN model, the mean average precision (mAP) of the improved Faster RCNN is increased by 2.13%. The average precision (AP) for first-level Auricularia auricula remains almost unchanged at a high level, the AP for second-level Auricularia auricula is increased by nearly 5%, and the AP for third-level Auricularia auricula is increased by 1%. The improved Faster RCNN raises the frame rate from 6.81 to 13.5 frames per second. The influence of complex environments and image resolution on Auricularia auricula detection is also explored.
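The fusion described, upsampling a deep, semantically strong feature map and combining it with a shallow, detail-rich one before detection, can be sketched as follows. The channel counts and the concatenate-then-1x1-convolution choice are illustrative assumptions rather than the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShallowDeepFusion(nn.Module):
    """Fuse a shallow feature map (fine detail) with a deep one (semantics)."""
    def __init__(self, c_shallow=256, c_deep=512, c_out=256):
        super().__init__()
        # 1x1 conv to merge the concatenated maps back to a fixed width.
        self.merge = nn.Conv2d(c_shallow + c_deep, c_out, kernel_size=1)

    def forward(self, shallow, deep):
        # Upsample the deep map to the shallow map's spatial size, then concat.
        deep_up = F.interpolate(deep, size=shallow.shape[2:], mode="nearest")
        return self.merge(torch.cat([shallow, deep_up], dim=1))

fuse = ShallowDeepFusion()
shallow = torch.randn(1, 256, 80, 80)   # early-stage features, fine detail
deep = torch.randn(1, 512, 20, 20)      # late-stage features, strong semantics
print(fuse(shallow, deep).shape)        # torch.Size([1, 256, 80, 80])
```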
Collapse
Affiliation(s)
- Yang Li
- College of Engineering and Technology, Tianjin Agricultural University, Tianjin, 300392, China
- Key Laboratory of Smart Breeding (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs (TJAU), Tianjin, 300392, China
| | - Jiajun Hu
- College of Mechanical Engineering, Jiamusi University, Jiamusi, 154007, China
| | - Haiyun Wu
- College of Engineering and Technology, Tianjin Agricultural University, Tianjin, 300392, China
- Key Laboratory of Smart Breeding (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs (TJAU), Tianjin, 300392, China
| | - Yong Wei
- College of Engineering and Technology, Tianjin Agricultural University, Tianjin, 300392, China
- Key Laboratory of Smart Breeding (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs (TJAU), Tianjin, 300392, China
| | - Huiyong Shan
- College of Engineering and Technology, Tianjin Agricultural University, Tianjin, 300392, China
- Key Laboratory of Smart Breeding (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs (TJAU), Tianjin, 300392, China
| | - Xin Song
- College of Engineering and Technology, Tianjin Agricultural University, Tianjin, 300392, China
- Key Laboratory of Smart Breeding (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs (TJAU), Tianjin, 300392, China
| | - Xiuping Hua
- College of Engineering and Technology, Tianjin Agricultural University, Tianjin, 300392, China
- Key Laboratory of Smart Breeding (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs (TJAU), Tianjin, 300392, China
| | - Wei Xu
- College of Engineering and Technology, Tianjin Agricultural University, Tianjin, 300392, China
- Key Laboratory of Smart Breeding (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs (TJAU), Tianjin, 300392, China
| | - Yongcheng Jiang
- College of Engineering and Technology, Tianjin Agricultural University, Tianjin, 300392, China.
- Key Laboratory of Smart Breeding (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs (TJAU), Tianjin, 300392, China.
- College of Mechanical Engineering, Jiamusi University, Jiamusi, 154007, China.
| |
Collapse
|
138
|
Hou M, Chen Y, Li J, Yi F. Single 5-centimeter-aperture metalens enabled intelligent lightweight mid-infrared thermographic camera. SCIENCE ADVANCES 2024; 10:eado4847. [PMID: 38968354 PMCID: PMC11225786 DOI: 10.1126/sciadv.ado4847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/07/2024] [Accepted: 06/04/2024] [Indexed: 07/07/2024]
Abstract
Existing mid-infrared thermographic cameras rely on a stack of refractive lenses, resulting in bulky and heavy imaging systems that restrict their broader utility. Here, we demonstrate a lightweight metalens-based thermographic camera (MTC) enabled by a single 0.5-mm-thick, 3.7-g, flat, and mass-producible metalens. The large aperture size (5 cm) of our metalens, combined with an uncooled focal plane array, enables thermal imaging at distances of tens of meters. By computationally removing the veiling glare, our MTC realizes temperature mapping with an inaccuracy of less than ±0.7% within the range of 35°C to 700°C and shows exceptional environmental adaptability. Furthermore, by using intelligent algorithms and spectral filtering, our uncooled MTC enables visualization and quantification of SF6 gas leakage at a long distance of 5 m, with a remarkable minimum detectable leak rate of 0.2 sccm. Our work opens the door to lightweight and multifunctional intelligent thermal imaging systems.
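The veiling-glare removal is described only at a high level; a heavily simplified sketch, under the assumption that glare behaves as a smooth additive haze estimated by a wide Gaussian low-pass, might look like this. The paper's actual calibration-based procedure is likely more sophisticated, and the sigma and strength parameters here are purely illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def remove_veiling_glare(raw: np.ndarray, sigma: float = 50.0,
                         strength: float = 0.3) -> np.ndarray:
    """Subtract a smooth glare estimate (low-pass of the frame).

    Treats veiling glare as a broad, slowly varying additive haze;
    sigma and strength would be calibrated per lens in practice.
    """
    glare = strength * gaussian_filter(raw.astype(np.float64), sigma=sigma)
    return np.clip(raw - glare, 0.0, None)
```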
Collapse
Affiliation(s)
- Mingming Hou
- School of Optical and Electronic Information and Wuhan National Research Center for Optoelectronics (WNLO), Huazhong University of Science and Technology, Hubei, Wuhan 430074, China
| | - Yan Chen
- School of Optical and Electronic Information and Wuhan National Research Center for Optoelectronics (WNLO), Huazhong University of Science and Technology, Hubei, Wuhan 430074, China
| | - Junyu Li
- IRay Technology Co. Ltd., Yantai 264006, China
| | - Fei Yi
- School of Optical and Electronic Information and Wuhan National Research Center for Optoelectronics (WNLO), Huazhong University of Science and Technology, Hubei, Wuhan 430074, China
- Optics Valley Laboratory, Hubei 430074, China
- Shenzhen Huazhong University of Science and Technology Research Institute, Shenzhen 518000, China
| |
Collapse
|
139
|
Liu T, Zhang S, Yu Z. Redefining Accuracy: Underwater Depth Estimation for Irregular Illumination Scenes. SENSORS (BASEL, SWITZERLAND) 2024; 24:4353. [PMID: 39001132 PMCID: PMC11244248 DOI: 10.3390/s24134353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/08/2024] [Revised: 06/30/2024] [Accepted: 07/02/2024] [Indexed: 07/16/2024]
Abstract
Acquiring underwater depth maps is essential, as they provide indispensable three-dimensional spatial information for visualizing the underwater environment. These depth maps serve various purposes, including underwater navigation, environmental monitoring, and resource exploration. While most current depth estimation methods work well in ideal underwater environments with homogeneous illumination, few consider the risks posed by irregular illumination, which is common in practical underwater environments. On the one hand, low-light underwater conditions reduce image contrast, making it challenging for depth estimation models to accurately differentiate among objects. On the other hand, overexposure caused by reflection or artificial illumination can degrade the textures of underwater objects, which are crucial to the geometric constraints between frames. To address these issues, we propose an underwater self-supervised monocular depth estimation network integrating image enhancement and auxiliary depth information. We use the Monte Carlo image enhancement module (MC-IEM) to tackle the inherent uncertainty of low-light underwater images through probabilistic estimation. With enhanced pixel values, objects become easier to recognize, allowing more precise acquisition of distance information and thus more accurate depth estimation. Next, we extract additional geometric features through transfer learning, infusing prior knowledge from a supervised large-scale model into the self-supervised depth estimation network to refine the loss functions and the depth network, addressing the overexposure issue. Experiments on two public datasets show that our approach outperforms existing methods in underwater depth estimation.
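A toy sketch of the Monte Carlo enhancement idea follows: drawing random exposure hypotheses for a low-light frame and averaging them as a probabilistic estimate. The gamma-sampling scheme and parameter range are assumptions, not the published MC-IEM.

```python
import numpy as np

def mc_enhance(img: np.ndarray, n_samples: int = 32, rng=None) -> np.ndarray:
    """Average randomly sampled gamma corrections of a low-light frame.

    Each draw proposes a plausible exposure; the mean over draws acts
    as a probabilistic estimate of the enhanced image.
    """
    rng = rng or np.random.default_rng(0)
    x = np.clip(img.astype(np.float64) / 255.0, 1e-6, 1.0)
    gammas = rng.uniform(0.3, 0.8, size=n_samples)  # gamma < 1 brightens
    samples = np.stack([x ** g for g in gammas])
    return (samples.mean(axis=0) * 255.0).astype(np.uint8)
```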
Collapse
Affiliation(s)
- Tong Liu
- Key Laboratory of Ocean Observation and Information of Hainan Province, Sanya Oceanographic Institution, Ocean University of China, Sanya 572024, China; (T.L.); (S.Z.)
- Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
| | - Sainan Zhang
- Key Laboratory of Ocean Observation and Information of Hainan Province, Sanya Oceanographic Institution, Ocean University of China, Sanya 572024, China; (T.L.); (S.Z.)
- Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
| | - Zhibin Yu
- Key Laboratory of Ocean Observation and Information of Hainan Province, Sanya Oceanographic Institution, Ocean University of China, Sanya 572024, China; (T.L.); (S.Z.)
- Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
| |
Collapse
|
140
|
Zu W, Xie S, Zhao Q, Li G, Ma L. Embedded prompt tuning: Towards enhanced calibration of pretrained models for medical images. Med Image Anal 2024; 97:103258. [PMID: 38996667 DOI: 10.1016/j.media.2024.103258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2023] [Revised: 06/21/2024] [Accepted: 06/24/2024] [Indexed: 07/14/2024]
Abstract
Foundation models pre-trained on large-scale data have been widely shown to succeed in various natural-imaging downstream tasks. Parameter-efficient fine-tuning (PEFT) methods aim to adapt foundation models to new domains by updating only a small portion of parameters in order to reduce computational overhead. However, the effectiveness of these PEFT methods, especially in cross-domain few-shot scenarios such as medical image analysis, has not been fully explored. In this work, we study the performance of PEFT when adapting foundation models to medical image classification tasks. Furthermore, to alleviate the limitations of mainstream prompt tuning methods, both in how prompts are introduced and in their approximation capability on Transformer architectures, we propose the Embedded Prompt Tuning (EPT) method, which embeds prompt tokens into the expanded channels. We also find that there are anomalies in the feature-space distribution of foundation models during the pre-training process, and that prompt tuning can help mitigate this negative impact. To explain this phenomenon, we introduce a novel perspective for understanding prompt tuning: prompt tuning is a distribution calibrator. We support this view by analysing the patch-wise scaling and feature separation operations contained in EPT. Our experiments show that EPT outperforms several state-of-the-art fine-tuning methods by a significant margin on few-shot medical image classification tasks and completes the fine-tuning process within a highly competitive time, indicating that EPT is an effective PEFT method. The source code is available at github.com/zuwenqiang/EPT.
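A sketch of the channel-expansion idea behind EPT, appending learnable prompt channels to every token embedding and projecting back to the original width, under assumed shapes; the published module may place and shape the prompts differently.

```python
import torch
import torch.nn as nn

class EmbeddedPrompt(nn.Module):
    """Append learnable prompt channels to each token, then project back.

    A sketch of channel-wise prompt embedding; dimensions are illustrative.
    """

    def __init__(self, dim: int, prompt_dim: int = 16):
        super().__init__()
        self.prompt = nn.Parameter(torch.zeros(prompt_dim))
        self.proj = nn.Linear(dim + prompt_dim, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, dim)
        b, n, _ = tokens.shape
        p = self.prompt.expand(b, n, -1)  # broadcast prompt to every token
        return self.proj(torch.cat([tokens, p], dim=-1))
```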
Collapse
Affiliation(s)
- Wenqiang Zu
- School of Artificial Intelligence, University of Chinese Academy of Sciences, China; Institute of Automation, Chinese Academy of Sciences, China; Beijing Academy of Artificial Intelligence, China
| | - Shenghao Xie
- Academy for Advanced Interdisciplinary Studies, Peking University, China; School of Cyber Science and Engineering, Wuhan University, China; Beijing Academy of Artificial Intelligence, China
| | - Qing Zhao
- College of Future Technology, National Biomedical Imaging Center, Peking University, China
| | - Guoqi Li
- Institute of Automation, Chinese Academy of Sciences, China
| | - Lei Ma
- College of Future Technology, National Biomedical Imaging Center, Peking University, China; Beijing Academy of Artificial Intelligence, China.
| |
Collapse
|
141
|
Hao J, Chen S. Language-aware multiple datasets detection pretraining for DETRs. Neural Netw 2024; 179:106506. [PMID: 38996689 DOI: 10.1016/j.neunet.2024.106506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2024] [Revised: 05/17/2024] [Accepted: 07/02/2024] [Indexed: 07/14/2024]
Abstract
Pretraining on large-scale datasets can boost the performance of object detectors, but annotated datasets for object detection are hard to scale up due to high labor costs. What we possess instead are numerous isolated field-specific datasets; thus, it is appealing to jointly pretrain models across an aggregation of datasets to enhance data volume and diversity. In this paper, we propose a strong framework for utilizing Multiple datasets to pretrain DETR-like detectors, termed METR, without the need for manual label-space integration. It converts the typical multi-class classification in object detection into binary classification by introducing a pre-trained language model. Specifically, we design a category extraction module that extracts the potential categories involved in an image and assigns these categories to different queries via language embeddings. Each query is then responsible for predicting only a class-specific object. Besides, to suit our novel detection paradigm, we propose a Class-wise Bipartite Matching strategy that limits ground truths to matching queries assigned to the same category. Extensive experiments demonstrate that METR achieves extraordinary results under either multi-task joint training or the pretrain-and-finetune paradigm. Notably, our pre-trained models have highly flexible transferability and improve the performance of various DETR-like detectors on the COCO val2017 benchmark. Our code is publicly available at: https://github.com/isbrycee/METR.
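Class-wise bipartite matching can be sketched directly with SciPy's Hungarian solver by restricting the assignment to query/ground-truth pairs that share a category. The construction of the cost matrix is assumed given; this is a generic stand-in, not the METR implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def classwise_match(cost: np.ndarray, query_cls: np.ndarray,
                    gt_cls: np.ndarray) -> list:
    """Hungarian matching restricted to same-class query/GT pairs.

    cost:      (num_queries, num_gt) matching-cost matrix.
    query_cls: (num_queries,) category assigned to each query.
    gt_cls:    (num_gt,) category of each ground-truth box.
    Returns a list of (query_index, gt_index) pairs.
    """
    pairs = []
    for c in np.unique(gt_cls):
        q_idx = np.where(query_cls == c)[0]
        g_idx = np.where(gt_cls == c)[0]
        if len(q_idx) == 0 or len(g_idx) == 0:
            continue  # no queries (or no GT) for this category
        rows, cols = linear_sum_assignment(cost[np.ix_(q_idx, g_idx)])
        pairs.extend((q_idx[r], g_idx[k]) for r, k in zip(rows, cols))
    return pairs
```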
Collapse
Affiliation(s)
- Jing Hao
- VIS, Baidu Inc., Beijing, 100000, China.
| | - Song Chen
- VIS, Baidu Inc., Beijing, 100000, China.
| |
Collapse
|
142
|
Gryshchuk V, Singh D, Teipel S, Dyrba M. Contrastive Self-supervised Learning for Neurodegenerative Disorder Classification. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.07.03.24309882. [PMID: 39006425 PMCID: PMC11245060 DOI: 10.1101/2024.07.03.24309882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/16/2024]
Abstract
Neurodegenerative diseases such as Alzheimer's disease (AD) or frontotemporal lobar degeneration (FTLD) involve specific loss of brain volume, detectable in vivo using T1-weighted MRI scans. Supervised machine learning approaches for classifying neurodegenerative diseases require diagnostic labels for each sample. However, it can be difficult to obtain expert labels for a large amount of data. Self-supervised learning (SSL) offers an alternative for training machine learning models without data labels. We investigated whether SSL models can be applied to distinguish between different neurodegenerative disorders in an interpretable manner. Our method comprises a feature extractor and a downstream classification head. A deep convolutional neural network trained in a contrastive self-supervised way serves as the feature extractor, learning latent representations, while the classification head is a single-layer perceptron. We used N=2694 T1-weighted MRI scans from four data cohorts: two ADNI datasets, AIBL, and FTLDNI, including cognitively normal controls (CN), cases with prodromal and clinical AD, and FTLD cases differentiated into its sub-types. Our results showed that the feature extractor trained in a self-supervised way provides generalizable and robust representations for the downstream classification. For AD vs. CN, our model achieves 82% balanced accuracy on the test subset and 80% on an independent holdout dataset. Similarly, the behavioral-variant frontotemporal dementia (BV) vs. CN model attains 88% balanced accuracy on the test subset. The average feature-attribution heatmaps obtained with the Integrated Gradients method highlighted hallmark regions, i.e., temporal gray matter atrophy for AD and insular atrophy for BV. In conclusion, our models perform comparably to state-of-the-art supervised deep learning approaches. This suggests that the SSL methodology can successfully make use of unannotated neuroimaging datasets as training data while remaining robust and interpretable.
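A minimal sketch of the symmetric InfoNCE objective commonly used for contrastive self-supervised pretraining of such feature extractors. The paper's exact loss and augmentation pipeline are not specified here, so this is a generic stand-in assuming two augmented views per scan.

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE loss for paired embeddings.

    z1, z2: (batch, dim) embeddings of two augmented views of the same scans.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau  # (batch, batch) cosine-similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    # Matching row/column pairs are positives; all others are negatives.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```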
Collapse
|
143
|
Zhang J, Qin Y, Tian R, Bai X, Liu J. Similarity measure method of near-infrared spectrum combined with multi-attribute information. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2024; 322:124783. [PMID: 38972098 DOI: 10.1016/j.saa.2024.124783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/18/2024] [Revised: 07/01/2024] [Accepted: 07/03/2024] [Indexed: 07/09/2024]
Abstract
The high dimensionality, redundancy, and non-linearity of near-infrared (NIR) spectral data, together with sample attributes such as producing area and grade, all affect the similarity measure between samples. This paper proposes a t-distributed stochastic neighbor embedding algorithm based on the Sinkhorn distance (St-SNE) that incorporates multi-attribute data information. First, the Sinkhorn distance is introduced to address problems such as the asymmetry of KL divergence and sparse data distributions in high-dimensional space, thereby constructing low-dimensional probability distributions that resemble their high-dimensional counterparts. In addition, to account for the impact of multi-attribute sample features on the similarity measure, a multi-attribute distance matrix is constructed using information entropy and combined with the numerical matrix of spectral data to obtain a mixed data matrix. To validate the effectiveness of the St-SNE algorithm, dimensionality-reduction projection was performed on NIR spectral data and compared with the PCA, LPP, and t-SNE algorithms. The results demonstrate that St-SNE effectively distinguishes samples with different attribute information and produces more distinct projection boundaries between sample categories in low-dimensional space. We then tested the classification performance of St-SNE for different attributes using the tobacco and mango datasets and compared it with the LPP, t-SNE, UMAP, and Fisher t-SNE algorithms. The results show that St-SNE achieves the highest classification accuracy for the different attributes. Finally, we compared the results of searching for the sample most similar to a target tobacco in cigarette formulas; experiments showed that St-SNE had the highest consistency with the experts' recommendations among all algorithms. This can provide strong support for the maintenance and design of product formulas.
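The Sinkhorn distance at the core of St-SNE can be sketched with a few Sinkhorn-Knopp iterations on entropy-regularized optimal transport; the regularization weight and iteration count below are illustrative, not the paper's settings.

```python
import numpy as np

def sinkhorn_distance(a: np.ndarray, b: np.ndarray, M: np.ndarray,
                      reg: float = 0.1, n_iter: int = 200) -> float:
    """Entropy-regularized OT cost between histograms a and b.

    a, b: nonnegative weight vectors summing to 1.
    M:    pairwise ground-cost matrix of shape (len(a), len(b)).
    reg:  entropy regularization weight.
    """
    K = np.exp(-M / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):  # alternate row/column scaling (Sinkhorn-Knopp)
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # optimal transport plan
    return float((P * M).sum())
```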
Collapse
Affiliation(s)
- Jinfeng Zhang
- College of Information Science and Technology, Qingdao University of Science and Technology, China
| | - Yuhua Qin
- College of Information Science and Technology, Qingdao University of Science and Technology, China.
| | - Rongkun Tian
- College of Information Science and Technology, Qingdao University of Science and Technology, China
| | - Xiaoli Bai
- R&D Center, China Tobacco Yunnan Industrial Co., Ltd, No. 367 Hongjin Road, Kunming 650231, China
| | - Jing Liu
- R&D Center, China Tobacco Yunnan Industrial Co., Ltd, No. 367 Hongjin Road, Kunming 650231, China
| |
Collapse
|
144
|
Joris P, Moermans R, Jenar E, Vandermeulen D, Claes P. Area of origin estimation from multiple arbitrarily oriented surfaces using marker-guided structure from motion. Forensic Sci Int 2024; 361:112140. [PMID: 39024802 DOI: 10.1016/j.forsciint.2024.112140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2024] [Revised: 07/01/2024] [Accepted: 07/03/2024] [Indexed: 07/20/2024]
Abstract
Bloodstain pattern analysis plays a crucial role in forensic investigations. Projected patterns can offer valuable insights into the dynamics of crime scenes. In this paper, we propose and validate a novel approach that extends existing software, HemoVision, to analyze impact patterns distributed across multiple arbitrarily oriented surfaces. The proposed method integrates HemoVision's marker-based system with structure-from-motion (SfM) techniques to reconstruct the three-dimensional geometry of impact patterns using only two-dimensional photographs. Controlled experiments were used to validate the proposed approach, demonstrating robust reconstruction accuracy with median translation errors below 3 mm and median angular errors below 0.2°, irrespective of imaging device or image resolution. Comparing the estimated areas of origin to their known ground truth, the proposed method achieved an average total error of 8.12 cm, with the primary source of error being the vertical dimension. Despite this, the overall error remains well within the ranges reported in prior work. This study demonstrates that HemoVision can be used to analyze complex impact patterns using only two-dimensional photographs, providing forensic experts with an efficient and accessible tool for investigating intricate crime scenes involving multi-surface impact patterns.
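Marker-guided registration of this kind ultimately reduces to estimating rigid transforms from matched 3D marker coordinates. Below is a standard Kabsch/Procrustes sketch, not HemoVision's implementation; the correspondence between source and destination markers is assumed already established.

```python
import numpy as np

def rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares rotation R and translation t with dst ≈ src @ R.T + t.

    src, dst: (n, 3) matched 3D marker coordinates.
    """
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)      # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t
```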
Collapse
Affiliation(s)
- Philip Joris
- Forentrics, Antwerp, Belgium; Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium; Medical Imaging Research Center, MIRC, KU Leuven, Leuven, Belgium.
| | - Ruben Moermans
- Forentrics, Antwerp, Belgium; Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium; Medical Imaging Research Center, MIRC, KU Leuven, Leuven, Belgium.
| | - Els Jenar
- Department of Forensic Medicine, University Hospitals UZ Leuven, Leuven, Belgium.
| | - Dirk Vandermeulen
- Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium; Medical Imaging Research Center, MIRC, KU Leuven, Leuven, Belgium.
| | - Peter Claes
- Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium; Medical Imaging Research Center, MIRC, KU Leuven, Leuven, Belgium; Department of Human Genetics, KU Leuven, Leuven, Belgium.
| |
Collapse
|
145
|
Li S, Zhu Y, Spencer BA, Wang G. Single-Subject Deep-Learning Image Reconstruction With a Neural Optimization Transfer Algorithm for PET-Enabled Dual-Energy CT Imaging. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2024; 33:4075-4089. [PMID: 38941203 DOI: 10.1109/tip.2024.3418347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/30/2024]
Abstract
Combining dual-energy computed tomography (DECT) with positron emission tomography (PET) offers many potential clinical applications but typically requires expensive hardware upgrades or increases radiation doses on PET/CT scanners due to an extra X-ray CT scan. The recent PET-enabled DECT method allows DECT imaging on PET/CT without requiring a second X-ray CT scan. It combines the already existing X-ray CT image with a 511 keV γ-ray CT (gCT) image reconstructed from time-of-flight PET emission data. A kernelized framework has been developed for reconstructing the gCT image, but this method has not fully exploited the potential of prior knowledge. Deep neural networks may bring the power of deep learning to this application. However, common approaches require a large database for training, which is impractical for a new imaging method like PET-enabled DECT. Here, we propose a single-subject method that uses a neural-network representation as a deep coefficient prior to improve gCT image reconstruction without population-based pre-training. The resulting optimization problem becomes the tomographic estimation of nonlinear neural-network parameters from gCT projection data. This complicated problem can be solved efficiently by utilizing the optimization transfer strategy with quadratic surrogates. Each iteration of the proposed neural optimization transfer algorithm includes: a PET activity image update; a gCT image update; and least-squares neural-network learning in the gCT image domain. This algorithm is guaranteed to monotonically increase the data likelihood. Results from computer simulation, real phantom data, and real patient data demonstrate that the proposed method can significantly improve gCT image quality and the consequent multi-material decomposition compared with other methods.
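The optimization-transfer (majorize-minimize) principle invoked here can be illustrated in miniature: for an L-smooth objective, minimizing a quadratic surrogate at each iterate gives a monotone update. The sketch below states the generic principle for minimization, not the paper's PET/gCT-specific surrogates.

```python
import numpy as np

def mm_minimize(grad, x0: np.ndarray, lipschitz: float,
                n_iter: int = 100) -> np.ndarray:
    """Majorize-minimize with a quadratic surrogate.

    For an L-smooth objective f, the surrogate
        q(x | x_k) = f(x_k) + grad(x_k)·(x - x_k) + (L/2)||x - x_k||^2
    majorizes f, so minimizing q at each step monotonically decreases f.
    """
    x = x0.copy()
    for _ in range(n_iter):
        x = x - grad(x) / lipschitz  # exact minimizer of the surrogate
    return x

# Example: for f(x) = ||A x - b||^2, grad(x) = 2 A.T (A x - b) and
# lipschitz = 2 * largest eigenvalue of A.T A.
```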
Collapse
|
146
|
Nieradzik L, Sieburg-Rockel J, Helmling S, Keuper J, Weibel T, Olbrich A, Stephani H. Automating Wood Species Detection and Classification in Microscopic Images of Fibrous Materials with Deep Learning. MICROSCOPY AND MICROANALYSIS : THE OFFICIAL JOURNAL OF MICROSCOPY SOCIETY OF AMERICA, MICROBEAM ANALYSIS SOCIETY, MICROSCOPICAL SOCIETY OF CANADA 2024; 30:508-520. [PMID: 38709570 DOI: 10.1093/mam/ozae038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/22/2023] [Revised: 03/14/2024] [Accepted: 03/31/2024] [Indexed: 05/08/2024]
Abstract
We have developed a methodology for the systematic generation of a large image dataset of macerated wood references, which we used to generate image data for nine hardwood genera. This dataset forms the basis of a substantial approach that automates, for the first time, the identification of hardwood species in microscopic images of fibrous materials using deep learning. Our methodology includes a flexible pipeline for easy annotation of vessel elements. We compare the performance of different neural network architectures and hyperparameters. Our proposed method performs similarly well to human experts. In the future, this will improve controls on global wood-fiber product flows to protect forests.
Collapse
Affiliation(s)
- Lars Nieradzik
- Image Processing Department, Fraunhofer ITWM, Fraunhofer Platz 1, Kaiserslautern 67663, Rhineland-Palatinate, Germany
| | | | - Stephanie Helmling
- Thünen Institute of Wood Research, Leuschnerstraße 91, Hamburg 21031, Germany
| | - Janis Keuper
- Institute for Machine Learning and Analysis (IMLA), Offenburg University, Badstr. 24, Offenburg 77652, Baden-Wuerttemberg, Germany
| | - Thomas Weibel
- Image Processing Department, Fraunhofer ITWM, Fraunhofer Platz 1, Kaiserslautern 67663, Rhineland-Palatinate, Germany
| | - Andrea Olbrich
- Thünen Institute of Wood Research, Leuschnerstraße 91, Hamburg 21031, Germany
| | - Henrike Stephani
- Image Processing Department, Fraunhofer ITWM, Fraunhofer Platz 1, Kaiserslautern 67663, Rhineland-Palatinate, Germany
| |
Collapse
|
147
|
Groisser BN, Thakur A, Hillstrom HJ, Adhiyaman A, Zucker C, Du J, Cunningham M, Hresko MT, Haddas R, Blanco J, Potter HG, Mintz DN, Breighner RE, Heyer JH, Widmann RF. Fully automated determination of robotic pedicle screw accuracy and precision utilizing computer vision algorithms. J Robot Surg 2024; 18:278. [PMID: 38960985 PMCID: PMC11222209 DOI: 10.1007/s11701-024-02001-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2024] [Accepted: 05/27/2024] [Indexed: 07/05/2024]
Abstract
Historically, pedicle screw accuracy measurements have relied on CT and expert visual assessment of the position of pedicle screws relative to preoperative plans. Proper pedicle screw placement is necessary to avoid the complications, cost, and morbidity of revision procedures. The aim of this study was to determine the accuracy and precision of pedicle screw insertion via a novel computer vision algorithm using preoperative and postoperative computed tomography (CT) scans. Three cadaveric specimens were utilized. Screw placement planning on preoperative CT was performed according to standard clinical practice. Two experienced surgeons performed bilateral T2-L4 instrumentation using robotic-assisted navigation. Postoperative CT scans of the instrumented levels were obtained. Automated segmentation and computer vision techniques were employed to align each preoperative vertebra with its postoperative counterpart and then compare screw positions along all three axes. Registration accuracy was assessed by preoperatively embedding spherical markers (tantalum beads) to measure discrepancies in landmark alignment. Eighty-eight pedicle screws were placed in the three cadaveric spines. Automated registrations between pre- and postoperative CT achieved sub-voxel accuracy. For the screw tip and tail, the mean three-dimensional errors were 1.67 mm and 1.78 mm, respectively. The mean angular deviation of the screw axes from plan was 1.58°. For screw mid-pedicular accuracy, the mean absolute errors in the medial-lateral and superior-inferior directions were 0.75 mm and 0.60 mm, respectively. This study introduces automated algorithms for determining the accuracy and precision of planned pedicle screws. Our accuracy outcomes are comparable or superior to those of recent robotic-assisted in vivo and cadaver studies. This computerized workflow establishes a standardized protocol for assessing pedicle screw placement accuracy and precision and provides detailed 3D translational and angular accuracy and precision for baseline comparison.
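The reported metrics, 3D tip/tail errors and angular deviation from plan, can be computed from planned and actual screw endpoints as sketched below; coordinate conventions and the availability of registered endpoint coordinates are assumptions.

```python
import numpy as np

def screw_errors(plan_tip, plan_tail, act_tip, act_tail):
    """3D tip/tail errors (mm) and angular deviation (degrees) from plan.

    Inputs are 3-vectors of registered screw endpoints in a common frame.
    """
    plan_tip, plan_tail = np.asarray(plan_tip), np.asarray(plan_tail)
    act_tip, act_tail = np.asarray(act_tip), np.asarray(act_tail)
    tip_err = np.linalg.norm(act_tip - plan_tip)
    tail_err = np.linalg.norm(act_tail - plan_tail)
    v_plan, v_act = plan_tip - plan_tail, act_tip - act_tail
    cosang = np.dot(v_plan, v_act) / (np.linalg.norm(v_plan) *
                                      np.linalg.norm(v_act))
    angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return tip_err, tail_err, angle
```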
Collapse
Affiliation(s)
- Benjamin N Groisser
- Hospital for Special Surgery, 535 East 70th Street, New York, NY, 10021, USA
| | - Ankush Thakur
- Hospital for Special Surgery, 535 East 70th Street, New York, NY, 10021, USA
| | - Howard J Hillstrom
- Hospital for Special Surgery, 535 East 70th Street, New York, NY, 10021, USA
| | - Akshitha Adhiyaman
- Hospital for Special Surgery, 535 East 70th Street, New York, NY, 10021, USA
| | - Colson Zucker
- Hospital for Special Surgery, 535 East 70th Street, New York, NY, 10021, USA
| | - Jerry Du
- Hospital for Special Surgery, 535 East 70th Street, New York, NY, 10021, USA
| | - Matthew Cunningham
- Hospital for Special Surgery, 535 East 70th Street, New York, NY, 10021, USA
| | | | - Ram Haddas
- University of Rochester Medical Center, Rochester, NY, USA
| | - John Blanco
- Hospital for Special Surgery, 535 East 70th Street, New York, NY, 10021, USA
| | - Hollis G Potter
- Hospital for Special Surgery, 535 East 70th Street, New York, NY, 10021, USA
| | - Douglas N Mintz
- Hospital for Special Surgery, 535 East 70th Street, New York, NY, 10021, USA
| | - Ryan E Breighner
- Hospital for Special Surgery, 535 East 70th Street, New York, NY, 10021, USA
| | - Jessica H Heyer
- Hospital for Special Surgery, 535 East 70th Street, New York, NY, 10021, USA.
| | - Roger F Widmann
- Hospital for Special Surgery, 535 East 70th Street, New York, NY, 10021, USA
| |
Collapse
|
148
|
Ferreira IJMCF, Simões JMS, Pereira B, Correia JNGCC, de Amaral Areia ALF. Ensemble learning for fetal ultrasound and maternal-fetal data to predict mode of delivery after labor induction. Sci Rep 2024; 14:15275. [PMID: 38961231 PMCID: PMC11222528 DOI: 10.1038/s41598-024-65394-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2024] [Accepted: 06/19/2024] [Indexed: 07/05/2024] Open
Abstract
Providing adequate counseling on mode of delivery after induction of labor (IOL) is of utmost importance. Various AI algorithms have been developed for this purpose but rely on maternal-fetal data without ultrasound (US) imaging. We used retrospectively collected clinical data from 808 subjects who underwent IOL, totaling 2024 US images, to train AI models to predict vaginal delivery (VD) and cesarean section (CS) outcomes after IOL. The best overall model used only clinical data (F1-score: 0.736; positive predictive value (PPV): 0.734). The imaging models employed fetal head, abdomen, and femur US images but showed limited discriminative results; the best of them used femur images (F1-score: 0.594; PPV: 0.580). Consequently, we constructed ensemble models to test whether US imaging could enhance the clinical-data model. The best ensemble model included clinical data and US femur images (F1-score: 0.689; PPV: 0.693), presenting an interesting trade-off between false positives and false negatives: it correctly predicted CS in 4 additional cases, despite misclassifying 20 additional VD cases, resulting in a 6.0% decrease in average accuracy compared with the clinical-data model. Hence, integrating US imaging into the latter model can be a new development in assisting mode-of-delivery counseling.
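One simple way to ensemble the clinical and imaging models is probability-level soft voting, sketched below with an assumed weight favoring the stronger clinical-data model; the paper's actual ensembling strategy is not detailed here and may differ.

```python
import numpy as np

def soft_vote(p_clinical: np.ndarray, p_imaging: np.ndarray,
              w_clinical: float = 0.7) -> np.ndarray:
    """Weighted average of per-class probabilities from two models.

    p_clinical, p_imaging: (n_samples, 2) probability arrays for [VD, CS].
    w_clinical is an assumed weight reflecting the stronger clinical model.
    Returns predicted class indices (0 = VD, 1 = CS).
    """
    probs = w_clinical * p_clinical + (1.0 - w_clinical) * p_imaging
    return probs.argmax(axis=1)
```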
Collapse
Affiliation(s)
- Iolanda João Mora Cruz Freitas Ferreira
- Faculty of Medicine of University of Coimbra, Obstetrics Department, University and Hospitalar Centre of Coimbra, Coimbra, Portugal.
- Maternidade Doutor Daniel de Matos, R. Miguel Torga, 3030-165, Coimbra, Portugal.
| | - Joana Maria Silva Simões
- Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Coimbra, Portugal
| | - Beatriz Pereira
- Department of Physics, University of Coimbra, Coimbra, Portugal
| | | | - Ana Luísa Fialho de Amaral Areia
- Faculty of Medicine of University of Coimbra, Obstetrics Department, University and Hospitalar Centre of Coimbra, Coimbra, Portugal
| |
Collapse
|
149
|
You J, Cai H, Wang Y, Bian A, Cheng K, Meng L, Wang X, Gao P, Chen S, Cai Y, Peng B. Artificial intelligence automated surgical phases recognition in intraoperative videos of laparoscopic pancreatoduodenectomy. Surg Endosc 2024:10.1007/s00464-024-10916-6. [PMID: 38958719 DOI: 10.1007/s00464-024-10916-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2024] [Accepted: 05/05/2024] [Indexed: 07/04/2024]
Abstract
BACKGROUND Laparoscopic pancreatoduodenectomy (LPD) is one of the most challenging operations and has a long learning curve. Artificial intelligence (AI) automated surgical phase recognition in intraoperative videos has many potential applications in surgical education and could help shorten the learning curve, but no study has made this breakthrough in LPD. Herein, we aimed to build AI models to recognize the surgical phases in LPD and explore the performance characteristics of these models. METHODS Among 69 LPD videos from a single surgical team, we used 42 in the building group to establish the models and the remaining 27 videos in the analysis group to assess the models' performance characteristics. We annotated 13 surgical phases of LPD, including 4 key phases and 9 necessary phases. Two minimally invasive pancreatic surgeons annotated all the videos. We built two AI models, based on convolutional neural networks, for key-phase and necessary-phase recognition. The overall performance of the AI models was determined mainly by mean average precision (mAP). RESULTS The overall mAPs of the AI models on the test set of the building group were 89.7% and 84.7% for key phases and necessary phases, respectively. In the 27-video analysis group, the overall mAPs were 86.8% and 71.2%, with maximum mAPs of 98.1% and 93.9%. We found commonalities between model recognition errors and differences between surgeon annotations, and the AI models performed poorly in cases with anatomic variation or lesion involvement of adjacent organs. CONCLUSIONS AI automated surgical phase recognition can be achieved in LPD, with outstanding performance in selected cases. This breakthrough may be the first step toward AI- and video-based surgical education in more complex surgeries.
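The mAP evaluation over surgical phases can be sketched as macro-averaged per-phase average precision, assuming binary per-frame phase labels and per-frame model confidences; the paper's exact evaluation protocol may differ.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def phase_map(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Mean average precision over surgical phases.

    y_true:  (n_frames, n_phases) binary phase labels.
    y_score: (n_frames, n_phases) per-frame model confidences.
    """
    aps = [average_precision_score(y_true[:, k], y_score[:, k])
           for k in range(y_true.shape[1])]
    return float(np.mean(aps))
```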
Collapse
Affiliation(s)
- Jiaying You
- WestChina-California Research Center for Predictive Intervention, Sichuan University West China Hospital, Chengdu, China
- Division of Pancreatic Surgery, Department of General Surgery, Sichuan University West China Hospital, No. 37, Guoxue Alley, Chengdu, 610041, China
| | - He Cai
- Division of Pancreatic Surgery, Department of General Surgery, Sichuan University West China Hospital, No. 37, Guoxue Alley, Chengdu, 610041, China
| | - Yuxian Wang
- Chengdu Withai Innovations Technology Company, Chengdu, China
| | - Ang Bian
- College of Computer Science, Sichuan University, Chengdu, China
| | - Ke Cheng
- Division of Pancreatic Surgery, Department of General Surgery, Sichuan University West China Hospital, No. 37, Guoxue Alley, Chengdu, 610041, China
| | - Lingwei Meng
- Division of Pancreatic Surgery, Department of General Surgery, Sichuan University West China Hospital, No. 37, Guoxue Alley, Chengdu, 610041, China
| | - Xin Wang
- Division of Pancreatic Surgery, Department of General Surgery, Sichuan University West China Hospital, No. 37, Guoxue Alley, Chengdu, 610041, China
| | - Pan Gao
- Division of Pancreatic Surgery, Department of General Surgery, Sichuan University West China Hospital, No. 37, Guoxue Alley, Chengdu, 610041, China
| | - Sirui Chen
- Mianyang Central Hospital, School of Medicine University of Electronic Science and Technology of China, Mianyang, China
| | - Yunqiang Cai
- Division of Pancreatic Surgery, Department of General Surgery, Sichuan University West China Hospital, No. 37, Guoxue Alley, Chengdu, 610041, China.
| | - Bing Peng
- Division of Pancreatic Surgery, Department of General Surgery, Sichuan University West China Hospital, No. 37, Guoxue Alley, Chengdu, 610041, China.
| |
Collapse
|
150
|
Wen R, Yao Y, Li Z, Liu Q, Wang Y, Chen Y. LESM-YOLO: An Improved Aircraft Ducts Defect Detection Model. SENSORS (BASEL, SWITZERLAND) 2024; 24:4331. [PMID: 39001110 PMCID: PMC11244086 DOI: 10.3390/s24134331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/29/2024] [Revised: 06/21/2024] [Accepted: 07/01/2024] [Indexed: 07/16/2024]
Abstract
Aircraft ducts play an indispensable role in various systems of an aircraft. Regular inspection and maintenance of aircraft ducts are of great significance for preventing potential failures and ensuring the normal operation of the aircraft. Traditional manual inspection methods are costly and inefficient, especially under low-light conditions. To address these issues, we propose a new defect detection model called LESM-YOLO. In this study, we integrate a lighting enhancement module to improve the model's recognition accuracy under low-light conditions. Additionally, to reduce the model's parameter count, we employ space-to-depth convolution, making the model more lightweight and suitable for deployment on edge detection devices. Furthermore, we introduce Mixed Local Channel Attention (MLCA), which balances complexity and accuracy by combining local channel and spatial attention mechanisms, enhancing the overall performance of the model and improving the accuracy and robustness of defect detection. Finally, we compare the proposed model with other existing models to validate the effectiveness of LESM-YOLO. Test results show that our model achieves an mAP of 96.3%, a 5.4% improvement over the original model, while maintaining a detection speed of 138.7 frames per second, meeting real-time monitoring requirements. The model proposed in this paper provides valuable technical support for the detection of defects in dark aircraft duct environments.
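A sketch of the space-to-depth convolution mentioned above, using PyTorch's pixel_unshuffle so that downsampling rearranges pixels into channels instead of discarding them; layer sizes are illustrative, not LESM-YOLO's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPDConv(nn.Module):
    """Space-to-depth followed by a stride-1 convolution.

    Replaces a strided conv: pixel_unshuffle moves each 2x2 spatial block
    into channels, so the 2x downsampling loses no pixel information.
    """

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # After unshuffle by 2, channel count grows by a factor of 4.
        self.conv = nn.Conv2d(in_ch * 4, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(F.pixel_unshuffle(x, 2))
```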
Collapse
Affiliation(s)
- Runyuan Wen
- School of Computer Science and Technology, Xidian University, Xi’an 710126, China; (R.W.); (Z.L.); (Q.L.); (Y.W.)
| | - Yong Yao
- School of Computer Science and Technology, Xidian University, Xi’an 710126, China; (R.W.); (Z.L.); (Q.L.); (Y.W.)
| | - Zijian Li
- School of Computer Science and Technology, Xidian University, Xi’an 710126, China; (R.W.); (Z.L.); (Q.L.); (Y.W.)
| | - Qiyang Liu
- School of Computer Science and Technology, Xidian University, Xi’an 710126, China; (R.W.); (Z.L.); (Q.L.); (Y.W.)
| | - Yijing Wang
- School of Computer Science and Technology, Xidian University, Xi’an 710126, China; (R.W.); (Z.L.); (Q.L.); (Y.W.)
| | - Yizhuo Chen
- Guangzhou Institute of Technology, Xidian University, Guangzhou 510530, China;
| |
Collapse
|