201
Diaz-Pinto A, Alle S, Nath V, Tang Y, Ihsani A, Asad M, Pérez-García F, Mehta P, Li W, Flores M, Roth HR, Vercauteren T, Xu D, Dogra P, Ourselin S, Feng A, Cardoso MJ. MONAI Label: A framework for AI-assisted interactive labeling of 3D medical images. Med Image Anal 2024; 95:103207. [PMID: 38776843] [DOI: 10.1016/j.media.2024.103207]
Abstract
The lack of annotated datasets is a major bottleneck for training new task-specific supervised machine learning models, considering that manual annotation is extremely expensive and time-consuming. To address this problem, we present MONAI Label, a free and open-source framework that facilitates the development of applications based on artificial intelligence (AI) models that aim to reduce the time required to annotate radiology datasets. Through MONAI Label, researchers can develop AI annotation applications focusing on their domain of expertise. It allows researchers to readily deploy their apps as services, which can be made available to clinicians via their preferred user interface. Currently, MONAI Label readily supports locally installed (3D Slicer) and web-based (OHIF) frontends and offers two active learning strategies to facilitate and speed up the training of segmentation algorithms. MONAI Label allows researchers to make incremental improvements to their AI-based annotation applications by making them available to other researchers and clinicians alike. Additionally, MONAI Label provides sample AI-based interactive and non-interactive labeling applications that can be used off the shelf, plug-and-play, on any given dataset. Significantly reduced annotation times using the interactive model were observed on two public datasets.
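The active learning component can be pictured with a short, generic sketch: rank unlabeled volumes by predictive entropy and surface the most uncertain one for annotation first. This is a minimal NumPy illustration of the idea, not MONAI Label's actual API; the scan names and the entropy criterion are our assumptions.

```python
import numpy as np

def entropy(p, eps=1e-8):
    # per-voxel binary predictive entropy from foreground probabilities
    return -(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))

rng = np.random.default_rng(0)
# stand-in foreground probability maps for unlabeled scans (hypothetical names)
unlabeled = {f"scan_{i:03d}": rng.random((32, 32, 32)) for i in range(10)}

# rank scans by mean uncertainty; annotate the most uncertain first
scores = {name: float(entropy(p).mean()) for name, p in unlabeled.items()}
next_scan = max(scores, key=scores.get)
print("annotate next:", next_scan)
```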
Affiliation(s)
- Andres Diaz-Pinto
- School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK; NVIDIA, Santa Clara, CA, USA.
- Muhammad Asad
- School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK.
- Fernando Pérez-García
- School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK; Department of Medical Physics and Biomedical Engineering, University College London, London, UK.
- Pritesh Mehta
- School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK; Department of Medical Physics and Biomedical Engineering, University College London, London, UK.
- Tom Vercauteren
- School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK.
- Sebastien Ourselin
- School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK.
- M Jorge Cardoso
- School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK.
202
Hermoza R, Nascimento JC, Carneiro G. Weakly-supervised preclinical tumor localization associated with survival prediction from lung cancer screening Chest X-ray images. Comput Med Imaging Graph 2024; 115:102395. [PMID: 38729092] [DOI: 10.1016/j.compmedimag.2024.102395]
Abstract
In this paper, we hypothesize that it is possible to localize image regions of preclinical tumors in a Chest X-ray (CXR) image through weakly-supervised training of a survival prediction model using a dataset containing CXR images of healthy patients and their time-to-death labels. These visual explanations can empower clinicians in early lung cancer detection and increase patient awareness of their susceptibility to the disease. To test this hypothesis, we train a censor-aware multi-class survival prediction deep learning classifier that is robust to imbalanced training, where classes represent quantized numbers of days for time-to-death prediction. Such a multi-class model allows us to use post-hoc interpretability methods, such as Grad-CAM, to localize image regions of preclinical tumors. For the experiments, we propose a new benchmark based on the National Lung Cancer Screening Trial (NLST) dataset to test weakly-supervised preclinical tumor localization and survival prediction models, and results suggest that our proposed method achieves state-of-the-art C-index survival prediction and weakly-supervised preclinical tumor localization. To our knowledge, this constitutes a pioneering approach in the field, able to produce visual explanations of preclinical events associated with survival prediction results.
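Grad-CAM, which the authors use for localization, can be reproduced with a pair of forward/backward hooks. The sketch below uses a generic ResNet-50 and a random tensor as a stand-in for a preprocessed CXR; it illustrates the technique only, not the paper's survival model.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights=None).eval()   # pretrained weights in practice
feats, grads = {}, {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(x=o))
model.layer4.register_full_backward_hook(lambda m, gi, go: grads.update(x=go[0]))

img = torch.randn(1, 3, 224, 224)              # stand-in for a preprocessed CXR
logits = model(img)
cls = logits.argmax(dim=1).item()              # e.g., the predicted time-to-death bin
logits[0, cls].backward()

w = grads["x"].mean(dim=(2, 3), keepdim=True)  # channel weights: pooled gradients
cam = F.relu((w * feats["x"]).sum(dim=1))      # weighted sum of feature maps
cam = F.interpolate(cam[None], size=img.shape[2:], mode="bilinear")[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalized heatmap
```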
Affiliation(s)
- Renato Hermoza
- Australian Institute for Machine Learning, The University of Adelaide, Australia.
- Jacinto C Nascimento
- Institute for Systems and Robotics (ISR/IST), LARSyS, Instituto Superior Técnico, Universidade de Lisboa, Portugal.
- Gustavo Carneiro
- Centre for Vision, Speech and Signal Processing (CVSSP), The University of Surrey, UK.
203
Zhang D, Huang H, Zhao Q, Zhou G. Generalized latent multi-view clustering with tensorized bipartite graph. Neural Netw 2024; 175:106282. [PMID: 38599137] [DOI: 10.1016/j.neunet.2024.106282]
Abstract
Tensor-based multi-view spectral clustering algorithms use tensors to model the structure of multi-dimensional data to take advantage of the complementary information and high-order correlations embedded in the graph, thus achieving impressive clustering performance. However, these algorithms use linear models to obtain consensus, which prevents the learned consensus from adequately representing the nonlinear structure of complex data. In order to address this issue, we propose a method called Generalized Latent Multi-View Clustering with Tensorized Bipartite Graph (GLMC-TBG). Specifically, in this paper we introduce neural networks to learn highly nonlinear mappings that encode nonlinear structures in graphs into latent representations. In addition, multiple views share the same latent consensus through nonlinear interactions. In this way, a more comprehensive common representation from multiple views can be achieved. An Augmented Lagrangian Multiplier with Alternating Direction Minimization (ALM-ADM) framework is designed to optimize the model. Experiments on seven real-world data sets verify that the proposed algorithm is superior to state-of-the-art algorithms.
Affiliation(s)
- Dongping Zhang
- School of Automation, Guangdong University of Technology, Guangzhou 510006, China; Guangdong Key Laboratory of IoT Information Technology, Guangdong University of Technology, Guangzhou 510006, China.
- Haonan Huang
- School of Automation, Guangdong University of Technology, Guangzhou 510006, China; Key Laboratory of Intelligent Information Processing and System Integration of IoT, Ministry of Education, Guangzhou 510006, China; Guangdong-HongKong-Macao Joint Laboratory for Smart Discrete Manufacturing, Guangzhou 510006, China.
- Qibin Zhao
- School of Automation, Guangdong University of Technology, Guangzhou 510006, China; Center for Advanced Intelligence Project (AIP), RIKEN, Tokyo 103-0027, Japan.
- Guoxu Zhou
- School of Automation, Guangdong University of Technology, Guangzhou 510006, China; Key Laboratory of Intelligent Detection and The Internet of Things in Manufacturing, Ministry of Education, Guangzhou 510006, China.
204
Du M, Zhang J, Zhi Y, Zhang J, Liu R, Zhang G, Wang J. A method for extracting corneal reflection images from multiple eye images. Comput Biol Med 2024; 177:108631. [PMID: 38824787] [DOI: 10.1016/j.compbiomed.2024.108631]
Abstract
The incident light reflected from the cornea is rich in information about a person's surroundings; once imaged by a camera, these reflections can be used for research on human consciousness and gaze analysis and can aid fields such as psychology, human-computer interaction, and disease diagnosis. However, because of the cornea's low reflectivity, when a high-definition camera captures the corneal reflections, a large amount of color and texture interference from the iris can seriously contaminate the corneal reflection images, resulting in low usability and ubiquity of corneal reflection images. In this paper, we propose a corneal reflection image extraction method that takes multiple eye images as input. We align the iris regions of multiple eye images with the help of an iris localization method and, by comparing multiple iris regions, obtain complementary iris regions, so that the iris interference in the corneal reflection region can be stripped out completely. Extensive experiments demonstrate that our method effectively mitigates iris interference and improves the quality of corneal reflection images.
Affiliation(s)
- Mengqi Du
- College of Computer Science and Technology, Zhejiang University of Technology, Liuhe 288, 310023, Hangzhou, China.
- Jiayu Zhang
- Department of Ophthalmology, The Third Affiliated Hospital of Wenzhou Medical University, Ruifeng 168, 325200, Ruian, China.
- Yuyi Zhi
- Jianxing Honors College, Zhejiang University of Technology, Liuhe 288, 310023, Hangzhou, China.
- Jianhua Zhang
- School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, 300384, China.
- Ruyu Liu
- School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121, China.
- Guodao Zhang
- Institute of Intelligent Media Computing, Hangzhou Dianzi University, Hangzhou, 310018, China.
- Jing Wang
- Department of Ophthalmology, Zhongshan Hospital, Fudan University, Shanghai, 200032, China.
205
Marhamati M, Dorry B, Imannezhad S, Hussain MA, Neshat AA, Kalmishi A, Momeny M. Patient's airway monitoring during cardiopulmonary resuscitation using deep networks. Med Eng Phys 2024; 129:104179. [PMID: 38906566] [DOI: 10.1016/j.medengphy.2024.104179]
Abstract
Cardiopulmonary resuscitation (CPR) is a crucial life-saving technique commonly administered to individuals experiencing cardiac arrest. Among the important aspects of CPR is ensuring the correct airway position of the patient, which is typically monitored by human tutors or supervisors. This study aims to utilize deep transfer learning for the detection of the patient's correct and incorrect airway position during cardiopulmonary resuscitation. To address the challenge of identifying the airway position, we curated a dataset consisting of 198 recorded video sequences, each lasting 6-8 s, showcasing both correct and incorrect airway positions during mouth-to-mouth breathing and breathing with an Ambu Bag. We employed six cutting-edge deep networks, namely DarkNet19, EfficientNetB0, GoogleNet, MobileNet-v2, ResNet50, and NasnetMobile. These networks were initially pre-trained on computer vision data and subsequently fine-tuned using the CPR dataset. The validation of the fine-tuned networks in detecting the patient's correct airway position during mouth-to-mouth breathing achieved impressive results, with the best sensitivity (98.8 %), specificity (100 %), and F-measure (97.2 %). Similarly, the detection of the patient's correct airway position during breathing with an Ambu Bag exhibited excellent performance, with the best sensitivity (100 %), specificity (99.8 %), and F-measure (99.7 %).
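The transfer-learning recipe described here, taking an ImageNet-pretrained backbone, swapping the classification head for the two airway-position classes, and fine-tuning, looks roughly like the following sketch (MobileNetV2 shown; the batch and hyperparameters are placeholders, not the paper's settings).

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.mobilenet_v2(weights=None)               # pretrained weights in practice
model.classifier[1] = nn.Linear(model.last_channel, 2)  # correct vs incorrect airway

frames = torch.randn(4, 3, 224, 224)   # stand-in batch of video frames
labels = torch.randint(0, 2, (4,))     # stand-in labels

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

optimizer.zero_grad()
loss = criterion(model(frames), labels)  # one fine-tuning step
loss.backward()
optimizer.step()
```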
Affiliation(s)
- Mahmoud Marhamati
- Department of Nursing, Esfarayen Faculty of Medical Science, Esfarayen, Iran.
- Behnam Dorry
- Department of Computer Engineering, Islamic Azad University, Babol Branch, Babol, Iran.
- Shima Imannezhad
- Department of Pediatrics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
- Ali Asghar Neshat
- Department of Environmental Health, Esfarayen Faculty of Medical Science, Esfarayen, Iran.
- Abulfazl Kalmishi
- Department of Internal and Surgical Nursing, Faculty of Nursing and Midwifery, Sabzevar University of Medical Sciences, Sabzevar, Iran.
- Mohammad Momeny
- Department of Geosciences and Geography, University of Helsinki, FI-00014, Finland.
206
Liu A, Guo Y, Yong JH, Xu F. Multi-Grained Radiology Report Generation With Sentence-Level Image-Language Contrastive Learning. IEEE Trans Med Imaging 2024; 43:2657-2669. [PMID: 38437149] [DOI: 10.1109/tmi.2024.3372638]
Abstract
The automatic generation of accurate radiology reports is of great clinical importance and has drawn growing research interest. However, it is still a challenging task due to the imbalance between normal and abnormal descriptions and the multi-sentence, multi-topic nature of radiology reports. These features result in significant challenges to generating accurate descriptions for medical images, especially the important abnormal findings. Previous methods to tackle these problems rely heavily on extra manual annotations, which are expensive to acquire. We propose a multi-grained report generation framework incorporating sentence-level image-language contrastive learning, which does not require any extra labeling but effectively learns knowledge from the image-report pairs. We first introduce contrastive learning as an auxiliary task for image feature learning. Different from previous contrastive methods, we exploit the multi-topic nature of imaging reports and perform fine-grained contrastive learning by extracting sentence topics and contents and contrasting between sentence contents and refined image contents guided by sentence topics. This forces the model to learn distinct abnormal image features for each specific topic. During generation, we use two decoders to first generate coarse sentence topics and then the fine-grained text of each sentence. We directly supervise the intermediate topics using sentence topics learned by our contrastive objective. This strengthens the generation constraint and enables independent fine-tuning of the decoders using reinforcement learning, which further boosts model performance. Experiments on two large-scale datasets, MIMIC-CXR and IU-Xray, demonstrate that our approach outperforms existing state-of-the-art methods, as evaluated by both language generation metrics and clinical accuracy.
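The image-sentence contrastive objective can be illustrated with a generic symmetric InfoNCE loss over paired embeddings; the paper's topic-guided refinement is more involved, so treat this purely as a sketch of the contrastive building block.

```python
import torch
import torch.nn.functional as F

def info_nce(img_emb, txt_emb, temperature=0.07):
    # symmetric InfoNCE: matched image/sentence pairs sit on the diagonal
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature
    targets = torch.arange(img.size(0))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))  # stand-in embeddings
```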
207
Singh S, Singh R, Kumar S, Suri A. A Narrative Review on 3-Dimensional Visualization Techniques in Neurosurgical Education, Simulation, and Planning. World Neurosurg 2024; 187:46-64. [PMID: 38580090] [DOI: 10.1016/j.wneu.2024.03.134]
Abstract
BACKGROUND High-fidelity visualization of anatomical organs is crucial for neurosurgical education, simulation, and planning. This becomes much more important for minimally invasive neurosurgical procedures. Realistic anatomical visualization can allow resident surgeons to learn visual cues and orient themselves with the complex 3-dimensional (3D) anatomy. Achieving full fidelity in 3D medical visualization is an active area of research; however, the prior reviews focus on the application area and lack the underlying technical principles. Accordingly, the present study attempts to bridge this gap by providing a narrative review of the techniques used for 3D visualization. METHODS We conducted a literature review on 3D medical visualization technology from 2018 to 2023 using the PubMed and Google Scholar search engines. The cross-referenced manuscripts were extensively studied to find literature that discusses technology relevant to 3D medical visualization. We also compiled and ran software applications that were accessible to us in order to better understand them. RESULTS We present the underlying fundamental technology used in 3D medical visualization in the context of neurosurgical education, simulation, and planning. Further, we discuss and categorize a few important applications based on the 3D visualization techniques they use. CONCLUSIONS The visualization of virtual human organs has not yet achieved a level of realism close to reality. This gap is largely due to the interdisciplinary nature of this research, population diversity, and validation complexities. With the advancements in computational resources and automation of 3D visualization pipelines, next-gen applications may offer enhanced medical 3D visualization fidelity.
Affiliation(s)
- Sukhraj Singh
- Amar Nath and Shashi Khosla School of Information Technology, Indian Institute of Technology Delhi, New Delhi, India.
- Ramandeep Singh
- Department of Neurosurgery, All India Institute of Medical Sciences, New Delhi, India.
- Subodh Kumar
- Department of Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India.
- Ashish Suri
- Department of Neurosurgery, All India Institute of Medical Sciences, New Delhi, India.
208
Dudas D, Saghand PG, Dilling TJ, Perez BA, Rosenberg SA, El Naqa I. Deep Learning-Guided Dosimetry for Mitigating Local Failure of Patients With Non-Small Cell Lung Cancer Receiving Stereotactic Body Radiation Therapy. Int J Radiat Oncol Biol Phys 2024; 119:990-1000. [PMID: 38056778] [DOI: 10.1016/j.ijrobp.2023.11.059]
Abstract
PURPOSE Non-small cell lung cancer (NSCLC) stereotactic body radiation therapy with 50 Gy/5 fractions is sometimes considered controversial, as the nominal biologically effective dose (BED) of 100 Gy is felt by some to be insufficient for long-term local control of some lesions. In this study, we analyzed such patients using explainable deep learning techniques and consequently proposed appropriate treatment planning criteria. These novel criteria could help planners achieve optimized treatment plans for maximal local control. METHODS AND MATERIALS A total of 535 patients treated with 50 Gy/5 fractions were used to develop a novel deep learning local response model. A multimodality approach, incorporating computed tomography images, 3-dimensional dose distribution, and patient demographics, combined with a discrete-time survival model, was applied to predict time to failure and the probability of local control. Subsequently, an integrated gradient-weighted class activation mapping method was used to identify the most significant dose-volume metrics predictive of local failure and their optimal cut-points. RESULTS The model was cross-validated, showing an acceptable performance (c-index: 0.72; 95% CI, 0.68-0.75); the testing c-index was 0.69. The model's spatial attention was concentrated mostly in the tumors' periphery (planning target volume [PTV] - internal gross target volume [IGTV]) region. Statistically significant dose-volume metrics for improved local control were BED Dnear-min ≥ 103.8 Gy in IGTV (hazard ratio [HR], 0.31; 95% CI, 0.15-0.63), V104 ≥ 98% in IGTV (HR, 0.30; 95% CI, 0.15-0.60), gEUD ≥ 103.8 Gy in PTV-IGTV (HR, 0.25; 95% CI, 0.12-0.50), and Dmean ≥ 104.5 Gy in PTV-IGTV (HR, 0.25; 95% CI, 0.12-0.51). CONCLUSIONS Deep learning-identified dose-volume metrics have shown significant prognostic power (log-rank, P = .003) and could be used as additional actionable criteria for treatment planning in NSCLC stereotactic body radiation therapy patients receiving 50 Gy in 5 fractions. Although our data do not confirm or refute that a significantly higher BED for the prescription dose is necessary for tumor control in NSCLC, it might be clinically effective to escalate the nominal prescribed dose from BED 100 to 105 Gy.
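The "nominal BED of 100 Gy" follows from the standard linear-quadratic biologically effective dose formula; with the conventional tumor α/β of 10 Gy (the α/β value is the usual assumption, not stated in this abstract), n fractions of dose d give

```latex
\mathrm{BED} = n\,d\left(1 + \frac{d}{\alpha/\beta}\right)
             = 5 \times 10\,\mathrm{Gy} \times \left(1 + \frac{10\,\mathrm{Gy}}{10\,\mathrm{Gy}}\right)
             = 100\,\mathrm{Gy}.
```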
Affiliation(s)
- Thomas J Dilling
- Radiation Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida.
- Bradford A Perez
- Radiation Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida.
- Stephen A Rosenberg
- Radiation Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida.
- Issam El Naqa
- Departments of Machine Learning and Radiation Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida.
209
Mattia GM, Villain E, Nemmi F, Le Lann MV, Franceries X, Péran P. Investigating the discrimination ability of 3D convolutional neural networks applied to altered brain MRI parametric maps. Artif Intell Med 2024; 153:102897. [PMID: 38810471] [DOI: 10.1016/j.artmed.2024.102897]
Abstract
Convolutional neural networks (CNNs) are gradually being recognized in the neuroimaging community as a powerful tool for image analysis. Despite their outstanding performances, some aspects of CNN functioning are still not fully understood by human operators. We postulated that the interpretability of CNNs applied to neuroimaging data could be improved by investigating their behavior when they are fed data with known characteristics. We analyzed the ability of 3D CNNs to discriminate between original and altered whole-brain parametric maps derived from diffusion-weighted magnetic resonance imaging. The alteration consisted in linearly changing the voxel intensity of either one (monoregion) or two (biregion) anatomical regions in each brain volume, but without mimicking any neuropathology. Performing ten-fold cross-validation and using a hold-out set for testing, we assessed the CNNs' discrimination ability according to the intensity of the altered regions, comparing the latter's size and relative position. Monoregion CNNs showed that the larger the modified region, the smaller the intensity increase needed to achieve good performances. Biregion CNNs systematically outperformed monoregion CNNs, but could only detect one of the two target regions when tested on the corresponding monoregion images. Exploiting prior information on training data allowed for a better understanding of CNN behavior, especially when altered regions were combined. This can inform about the complexity of CNN pattern retrieval and elucidate misclassified examples, particularly relevant for pathological data. The proposed analytical approach may serve to gain insights into CNN behavior and guide the design of enhanced detection systems exploiting our prior knowledge.
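The synthetic-alteration idea, linearly scaling voxel intensities inside one anatomical region while leaving the rest untouched, is simple to sketch (function names and shapes below are ours, not the authors'):

```python
import numpy as np

def alter_region(volume, mask, factor):
    # linearly scale voxel intensities inside a region mask
    out = volume.copy()
    out[mask] *= factor
    return out

rng = np.random.default_rng(0)
vol = rng.random((64, 64, 64)).astype(np.float32)  # stand-in parametric map
mask = np.zeros_like(vol, dtype=bool)
mask[20:30, 20:30, 20:30] = True                   # one "anatomical" region
monoregion = alter_region(vol, mask, 1.10)         # +10% intensity alteration
```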
Affiliation(s)
- Giulia Maria Mattia
- ToNIC, Toulouse NeuroImaging Center, Université de Toulouse, Inserm, UPS, Toulouse, France.
- Edouard Villain
- ToNIC, Toulouse NeuroImaging Center, Université de Toulouse, Inserm, UPS, Toulouse, France; LAAS CNRS, Université de Toulouse, CNRS, INSA, UPS, Toulouse, France.
- Federico Nemmi
- ToNIC, Toulouse NeuroImaging Center, Université de Toulouse, Inserm, UPS, Toulouse, France.
- Xavier Franceries
- CRCT, Centre de Recherche en Cancérologie de Toulouse, Inserm, UPS, Toulouse, France.
- Patrice Péran
- ToNIC, Toulouse NeuroImaging Center, Université de Toulouse, Inserm, UPS, Toulouse, France.
210
Yu D, Zhong Q, Xiao Y, Feng Z, Tang F, Feng S, Cai Y, Gao Y, Lan T, Li M, Yu F, Wang Z, Gao X, Li Z. Combination of MRI-based prediction and CRISPR/Cas12a-based detection for IDH genotyping in glioma. NPJ Precis Oncol 2024; 8:140. [PMID: 38951603] [PMCID: PMC11217299] [DOI: 10.1038/s41698-024-00632-8]
Abstract
Early identification of IDH mutation status is of great significance for clinical therapeutic decision-making in the treatment of glioma. We demonstrate a technological solution to improve the accuracy and reliability of IDH mutation detection by combining MRI-based prediction with a CRISPR-based automatic integrated gene detection system (AIGS). A model was constructed to predict IDH mutation status from whole slices in MRI scans with a Transformer neural network, and the predictive model achieved accuracies of 0.93, 0.87, and 0.84 on the internal and two external test sets, respectively. Additionally, a CRISPR/Cas12a-based AIGS was constructed, and AIGS achieved 100% diagnostic accuracy for IDH detection using both frozen tissue and FFPE samples in one hour. Moreover, the feature attribution of our predictive model was assessed using GradCAM, and the highest correlations between GradCAM importance and tumor cell percentage were found in enhancing and IDH-wildtype gliomas (0.65 and 0.5, respectively). This MRI-based predictive model could, therefore, guide biopsy toward tumor-enriched regions, which would ensure the veracity and stability of the rapid detection results. The combination of our predictive model and AIGS improved the early determination of IDH mutation status in glioma patients. This combined system of MRI-based prediction and CRISPR/Cas12a-based detection can be used to guide biopsy, resection, and radiation for glioma patients to improve patient outcomes.
Affiliation(s)
- Donghu Yu
- Brain Glioma Center & Department of Neurosurgery, Zhongnan Hospital of Wuhan University, Wuhan, China.
- Qisheng Zhong
- Department of Neurosurgery, 960 Hospital of PLA, Jinan, Shandong, China.
- Yilei Xiao
- Department of Neurosurgery, Liaocheng People's Hospital, Liaocheng, China.
- Zhebin Feng
- Department of Neurosurgery, PLA General Hospital, Beijing, China.
- Feng Tang
- Brain Glioma Center & Department of Neurosurgery, Zhongnan Hospital of Wuhan University, Wuhan, China.
- Shiyu Feng
- Department of Neurosurgery, PLA General Hospital, Beijing, China.
- Yuxiang Cai
- Department of Pathology, Zhongnan Hospital of Wuhan University, Wuhan, China.
- Yutong Gao
- Department of Prosthodontics, Wuhan University Hospital of Stomatology, Wuhan, China.
- Tian Lan
- Brain Glioma Center & Department of Neurosurgery, Zhongnan Hospital of Wuhan University, Wuhan, China.
- Mingjun Li
- Department of Radiology, Liaocheng People's Hospital, Liaocheng, China.
- Fuhua Yu
- Department of Neurosurgery, Liaocheng People's Hospital, Liaocheng, China.
- Zefen Wang
- Department of Physiology, Wuhan University School of Basic Medical Sciences, Wuhan, China.
- Xu Gao
- Department of Neurosurgery, General Hospital of Northern Theater Command, Shenyang, China.
- Zhiqiang Li
- Brain Glioma Center & Department of Neurosurgery, Zhongnan Hospital of Wuhan University, Wuhan, China.
211
Nikitin V, Wildenberg G, Mittone A, Shevchenko P, Deriy A, De Carlo F. Laminography as a tool for imaging large-size samples with high resolution. J Synchrotron Radiat 2024; 31:851-866. [PMID: 38771775] [PMCID: PMC11226144] [DOI: 10.1107/s1600577524002923]
Abstract
Despite the increased brilliance of new-generation synchrotron sources, high-resolution scanning of very thick and absorbing samples, such as a whole mouse brain stained with heavy elements and, extending further, the brains of primates, remains a challenge. Samples are typically cut into smaller parts to ensure sufficient X-ray transmission and scanned separately. Compared with the standard tomography setup, where the sample would be cut into many pillars, the laminographic geometry operates with slab-shaped sections, significantly reducing the number of sample parts to be prepared, the cutting damage, and data-stitching problems. In this work, a laminography pipeline for imaging large samples (>1 cm) at micrometre resolution is presented. The implementation includes a low-cost instrument setup installed at the 2-BM micro-CT beamline of the Advanced Photon Source. Additionally, sample mounting, scanning techniques, data stitching procedures, a fast reconstruction algorithm with low computational complexity, and accelerated reconstruction on multi-GPU systems for processing large-scale datasets are presented. The applicability of the whole laminography pipeline was demonstrated by imaging four sequential slabs throughout an entire mouse brain sample stained with osmium, in total generating approximately 12 TB of raw data for reconstruction.
Affiliation(s)
- Viktor Nikitin
- Advanced Photon Source, Argonne National Laboratory, Lemont, IL 60439, USA.
- Alberto Mittone
- Advanced Photon Source, Argonne National Laboratory, Lemont, IL 60439, USA.
- Pavel Shevchenko
- Advanced Photon Source, Argonne National Laboratory, Lemont, IL 60439, USA.
- Alex Deriy
- Advanced Photon Source, Argonne National Laboratory, Lemont, IL 60439, USA.
212
Zhang H, Gu C, Lan Q, Zhang W, Liu C, Yang J. Learning-based distortion correction enables proximal-scanning endoscopic OCT elastography. Biomed Opt Express 2024; 15:4345-4364. [PMID: 39022540] [PMCID: PMC11249688] [DOI: 10.1364/boe.528522]
Abstract
Proximal rotary scanning is predominantly used in the clinical practice of endoscopic and intravascular OCT, mainly because of the much lower manufacturing cost of the probe compared to distal scanning. However, proximal scanning causes severe beam stability issues (also known as non-uniform rotational distortion, NURD), which hinders the extension of its applications to functional imaging, such as OCT elastography (OCE). In this work, we demonstrate the abilities of learning-based NURD correction methods to enable the imaging stability required for intensity-based OCE. Compared with the previous learning-based NURD correction methods that use pseudo distortion vectors for model training, we propose a method to extract real distortion vectors from a specific endoscopic OCT system, and validate its superiority in accuracy under both convolutional-neural-network- and transformer-based learning architectures. We further verify its effectiveness in elastography calculations (digital image correlation and optical flow) and the advantages of our method over other NURD correction methods. Using the air pressure of a balloon catheter as a mechanical stimulus, our proximal-scanning endoscopic OCE could effectively differentiate between areas of varying stiffness of atherosclerotic vascular phantoms. Compared with the existing endoscopic OCE methods that measure only in the radial direction, our method could achieve 2D displacement/strain distribution in both radial and circumferential directions.
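Of the two elastography calculations the authors evaluate, optical flow is easy to sketch with OpenCV's dense Farneback implementation (assuming the opencv-python package; the arrays below stand in for co-registered B-scans before and after a pressure step, and the parameter values are generic defaults, not the paper's settings).

```python
import numpy as np
import cv2

rng = np.random.default_rng(0)
pre = rng.integers(0, 255, (256, 256), dtype=np.uint8)  # stand-in B-scan, pre-load
post = np.roll(pre, shift=2, axis=0)                    # stand-in B-scan, post-load

# dense optical flow -> per-pixel (dx, dy) displacement field
flow = cv2.calcOpticalFlowFarneback(pre, post, None, pyr_scale=0.5, levels=3,
                                    winsize=15, iterations=3, poly_n=5,
                                    poly_sigma=1.2, flags=0)
dudx = np.gradient(flow[..., 0], axis=1)  # strain follows from displacement gradients
```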
Affiliation(s)
- Haoran Zhang
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Chengfu Gu
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Qi Lan
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Weiyi Zhang
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Chang Liu
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Jianlong Yang
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
213
Biderman D, Whiteway MR, Hurwitz C, Greenspan N, Lee RS, Vishnubhotla A, Warren R, Pedraja F, Noone D, Schartner MM, Huntenburg JM, Khanal A, Meijer GT, Noel JP, Pan-Vazquez A, Socha KZ, Urai AE, Cunningham JP, Sawtell NB, Paninski L. Lightning Pose: improved animal pose estimation via semi-supervised learning, Bayesian ensembling and cloud-native open-source tools. Nat Methods 2024; 21:1316-1328. [PMID: 38918605] [DOI: 10.1038/s41592-024-02319-1]
Abstract
Contemporary pose estimation methods enable precise measurements of behavior via supervised deep learning with hand-labeled video frames. Although effective in many cases, the supervised approach requires extensive labeling and often produces outputs that are unreliable for downstream analyses. Here, we introduce 'Lightning Pose', an efficient pose estimation package with three algorithmic contributions. First, in addition to training on a few labeled video frames, we use many unlabeled videos and penalize the network whenever its predictions violate motion continuity, multiple-view geometry and posture plausibility (semi-supervised learning). Second, we introduce a network architecture that resolves occlusions by predicting pose on any given frame using surrounding unlabeled frames. Third, we refine the pose predictions post hoc by combining ensembling and Kalman smoothing. Together, these components render pose trajectories more accurate and scientifically usable. We released a cloud application that allows users to label data, train networks and process new videos directly from the browser.
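The "ensembling plus Kalman smoothing" post-processing can be pictured with a one-coordinate sketch: average the ensemble trajectories, then run a constant-velocity Kalman filter with an RTS backward pass. This is a generic illustration, not Lightning Pose's exact ensemble Kalman smoother; the motion model and noise parameters are assumptions.

```python
import numpy as np

def kalman_smooth(z, q=1e-3, r=1e-2):
    # constant-velocity Kalman filter + RTS smoother for one keypoint coordinate
    F = np.array([[1.0, 1.0], [0.0, 1.0]])  # state: (position, velocity)
    H = np.array([[1.0, 0.0]])              # only position is observed
    Q, R = q * np.eye(2), np.array([[r]])
    n = len(z)
    xf = np.zeros((n, 2)); Pf = np.zeros((n, 2, 2))  # filtered estimates
    xp = np.zeros((n, 2)); Pp = np.zeros((n, 2, 2))  # one-step predictions
    x, P = np.array([z[0], 0.0]), np.eye(2)
    for t in range(n):                               # forward pass
        xpt, Ppt = F @ x, F @ P @ F.T + Q
        K = Ppt @ H.T @ np.linalg.inv(H @ Ppt @ H.T + R)
        x = xpt + K @ (z[t] - H @ xpt)
        P = (np.eye(2) - K @ H) @ Ppt
        xf[t], Pf[t], xp[t], Pp[t] = x, P, xpt, Ppt
    xs = xf.copy()
    for t in range(n - 2, -1, -1):                   # RTS backward pass
        C = Pf[t] @ F.T @ np.linalg.inv(Pp[t + 1])
        xs[t] = xf[t] + C @ (xs[t + 1] - xp[t + 1])
    return xs[:, 0]

ensemble = np.random.default_rng(0).random((5, 100))  # 5 networks x 100 frames
smoothed = kalman_smooth(ensemble.mean(axis=0))       # smooth the ensemble mean
```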
Affiliation(s)
- Anup Khanal
- University of California, Los Angeles, Los Angeles, CA, USA.
214
Arian R, Aghababaei A, Soltanipour A, Khodabandeh Z, Rakhshani S, Iyer SB, Ashtari F, Rabbani H, Kafieh R. SLO-Net: Enhancing Multiple Sclerosis Diagnosis Beyond Optical Coherence Tomography Using Infrared Reflectance Scanning Laser Ophthalmoscopy Images. Transl Vis Sci Technol 2024; 13:13. [PMID: 39017629] [PMCID: PMC11262482] [DOI: 10.1167/tvst.13.7.13]
Abstract
Purpose Several machine learning studies have used optical coherence tomography (OCT) for multiple sclerosis (MS) classification with promising outcomes. Infrared reflectance scanning laser ophthalmoscopy (IR-SLO) captures high-resolution fundus images, commonly combined with OCT for fixed B-scan positions. However, no machine learning research has utilized IR-SLO images for automated MS diagnosis. Methods This study utilized a dataset comprising IR-SLO images and OCT data from Isfahan, Iran, encompassing 32 MS and 70 healthy individuals. A number of convolutional neural networks (CNNs), namely VGG-16, VGG-19, ResNet-50, ResNet-101, and a custom architecture, were trained with both IR-SLO images and OCT thickness maps as two separate input datasets. The highest performing models for each modality were then integrated to create a bimodal model that receives the combination of OCT thickness maps and IR-SLO images. Subject-wise data splitting was employed to prevent data leakage among training, validation, and testing sets. Results Overall, images of the 102 patients from the internal dataset were divided into test, validation, and training subsets. Subsequently, we employed a bootstrapping approach on the training data through iterative sampling with replacement. The performance of the proposed bimodal model was evaluated on the internal test dataset, demonstrating an accuracy of 92.40% ± 4.1% (95% confidence interval [CI], 83.61-98.08), sensitivity of 95.43% ± 5.75% (95% CI, 83.71-100.0), specificity of 92.82% ± 3.72% (95% CI, 81.15-96.77), area under the receiver operating characteristic (AUROC) curve of 96.99% ± 2.99% (95% CI, 86.11-99.78), and area under the precision-recall curve (AUPRC) of 97.27% ± 2.94% (95% CI, 86.83-99.83). Furthermore, to assess the model's generalization ability, we examined its performance on an external test dataset following the same bootstrap methodology, achieving promising results, with accuracy of 85.43% ± 0.08% (95% CI, 71.43-100.0), sensitivity of 97.33% ± 0.06% (95% CI, 83.33-100.0), specificity of 84.6% ± 0.10% (95% CI, 71.43-100.0), AUROC of 99.67% ± 0.02% (95% CI, 95.63-100.0), and AUPRC of 99.65% ± 0.02% (95% CI, 94.90-100.0). Conclusions Incorporating both modalities improves the performance of automated diagnosis of MS, showcasing the potential of utilizing IR-SLO as a complementary tool alongside OCT. Translational Relevance Should the results of our proposed bimodal model be validated in future work with larger and more diverse datasets, diagnosis of MS based on both OCT and IR-SLO could be reliably integrated into routine clinical practice.
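The bootstrap evaluation used above, resampling subjects with replacement, recomputing the metric, and taking percentile bounds, takes only a few lines (the correctness flags below are synthetic stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
correct = rng.random(100) < 0.9            # stand-in per-subject correctness flags

boot = [correct[rng.integers(0, len(correct), len(correct))].mean()
        for _ in range(2000)]              # resample subjects with replacement
lo, hi = np.percentile(boot, [2.5, 97.5])  # 95% percentile confidence interval
print(f"accuracy {correct.mean():.3f} (95% CI {lo:.3f}-{hi:.3f})")
```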
Affiliation(s)
- Roya Arian
- Medical Image and Signal Processing Research Center, Isfahan University of Medical Sciences, Isfahan, Iran.
- Ali Aghababaei
- Medical Image and Signal Processing Research Center, Isfahan University of Medical Sciences, Isfahan, Iran.
- School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.
- Asieh Soltanipour
- Medical Image and Signal Processing Research Center, Isfahan University of Medical Sciences, Isfahan, Iran.
- Zahra Khodabandeh
- Medical Image and Signal Processing Research Center, Isfahan University of Medical Sciences, Isfahan, Iran.
- Sajed Rakhshani
- Medical Image and Signal Processing Research Center, Isfahan University of Medical Sciences, Isfahan, Iran.
- Shwasa B. Iyer
- Department of Engineering, Durham University, Durham, UK.
- Fereshteh Ashtari
- Isfahan Neurosciences Research Center, Isfahan University of Medical Sciences, Isfahan, Iran.
- Hossein Rabbani
- Medical Image and Signal Processing Research Center, Isfahan University of Medical Sciences, Isfahan, Iran.
- Raheleh Kafieh
- Department of Engineering, Durham University, Durham, UK.
215
Küstner T, Hammernik K, Rueckert D, Hepp T, Gatidis S. Predictive uncertainty in deep learning-based MR image reconstruction using deep ensembles: Evaluation on the fastMRI data set. Magn Reson Med 2024; 92:289-302. [PMID: 38282254] [DOI: 10.1002/mrm.30030]
Abstract
PURPOSE To estimate pixel-wise predictive uncertainty for deep learning-based MR image reconstruction and to examine the impact of domain shifts and architecture robustness. METHODS Uncertainty prediction could provide a measure of the robustness of deep learning (DL)-based MR image reconstruction from undersampled data. DL methods bear the risk of inducing reconstruction errors such as in-painting of unrealistic structures or missing pathologies. These errors may be obscured by the visual realism of DL reconstructions and thus remain undiscovered. Furthermore, most methods are task-agnostic and not well calibrated to domain shifts. We propose a strategy that estimates aleatoric (data) and epistemic (model) uncertainty, which entails training a deep ensemble (epistemic) with a negative log-likelihood (aleatoric) loss in addition to the conventionally applied loss terms. The proposed procedure can be paired with any DL reconstruction, enabling investigation of its predictive uncertainty on a pixel level. Five different architectures were investigated on the fastMRI database. The impact on the examined uncertainty of in-distribution and out-of-distribution data with changes to undersampling pattern, imaging contrast, imaging orientation, anatomy, and pathology was explored. RESULTS Predictive uncertainty could be captured and showed good correlation to the normalized mean squared error. Uncertainty was primarily focused along the aliased anatomies and on hyperintense and hypointense regions. The proposed uncertainty measure was able to detect disease prevalence shifts. Distinct predictive uncertainty patterns were observed for changing network architectures. CONCLUSION The proposed approach enables aleatoric and epistemic uncertainty prediction for DL-based MR reconstruction with an interpretable examination on a pixel level.
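The aleatoric/epistemic split described here has a compact form: each ensemble member predicts a per-pixel mean and variance trained with the Gaussian negative log-likelihood; at inference, the mean of the predicted variances approximates aleatoric uncertainty and the variance of the predicted means approximates epistemic uncertainty. A toy-shaped sketch (not the paper's reconstruction networks):

```python
import torch

def gaussian_nll(mean, log_var, target):
    # per-pixel Gaussian negative log-likelihood (trains the aleatoric head)
    return 0.5 * (log_var + (target - mean) ** 2 / log_var.exp()).mean()

M, H, W = 5, 8, 8                     # ensemble size, toy image size
means = torch.randn(M, H, W)          # each member's reconstruction
log_vars = torch.randn(M, H, W)       # each member's predicted log-variance
loss = gaussian_nll(means[0], log_vars[0], torch.randn(H, W))  # per-member loss

aleatoric = log_vars.exp().mean(dim=0)        # average predicted variance
epistemic = means.var(dim=0, unbiased=False)  # disagreement across members
total = aleatoric + epistemic                 # pixel-wise predictive uncertainty
```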
Affiliation(s)
- Thomas Küstner
- Medical Image and Data Analysis (MIDAS.lab), Department of Diagnostic and Interventional Radiology, University Hospital of Tuebingen, Tübingen, Germany.
- Kerstin Hammernik
- School of Computation, Information and Technology, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany.
- School of Medicine, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany.
- Daniel Rueckert
- School of Computation, Information and Technology, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany.
- School of Medicine, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany.
- Department of Computing, Imperial College London, London, UK.
- Tobias Hepp
- Medical Image and Data Analysis (MIDAS.lab), Department of Diagnostic and Interventional Radiology, University Hospital of Tuebingen, Tübingen, Germany.
- Sergios Gatidis
- Medical Image and Data Analysis (MIDAS.lab), Department of Diagnostic and Interventional Radiology, University Hospital of Tuebingen, Tübingen, Germany.
216
Baldeon-Calisto M, Rivera-Velastegui F, Lai-Yuen SK, Riofrío D, Pérez-Pérez N, Benítez D, Flores-Moyano R. DistilIQA: Distilling Vision Transformers for no-reference perceptual CT image quality assessment. Comput Biol Med 2024; 177:108670. [PMID: 38838558] [DOI: 10.1016/j.compbiomed.2024.108670]
Abstract
No-reference image quality assessment (IQA) is a critical step in medical image analysis, with the objective of predicting perceptual image quality without the need for a pristine reference image. The application of no-reference IQA to CT scans is valuable in providing an automated and objective approach to assessing scan quality, optimizing radiation dose, and improving overall healthcare efficiency. In this paper, we introduce DistilIQA, a novel distilled Vision Transformer network designed for no-reference CT image quality assessment. DistilIQA integrates convolutional operations and multi-head self-attention mechanisms by incorporating a powerful convolutional stem at the beginning of the traditional ViT network. Additionally, we present a two-step distillation methodology aimed at improving network performance and efficiency. In the initial step, a "teacher ensemble network" is constructed by training five Vision Transformer networks using a five-fold division schema. In the second step, a "student network", comprising a single Vision Transformer, is trained using the original labeled dataset and the predictions generated by the teacher network as new labels. DistilIQA is evaluated in the task of quality score prediction from low-dose chest CT scans obtained from the LDCT and Projection data of the Cancer Imaging Archive, along with low-dose abdominal CT images from the LDCTIQAC2023 Grand Challenge. Our results demonstrate DistilIQA's remarkable performance in both benchmarks, surpassing the capabilities of various CNN and Transformer architectures. Moreover, our comprehensive experimental analysis demonstrates the effectiveness of incorporating convolutional operations within the ViT architecture and highlights the advantages of our distillation methodology.
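The two-step distillation reads as: average the five teachers' scores into soft labels, then train the student against both the human labels and the soft labels. A linear-model sketch of that scheme (the equal loss weighting, shapes, and stand-in models are assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(16, 32)          # stand-in image features
y = torch.rand(16, 1)            # human quality scores in [0, 1]

teachers = [nn.Linear(32, 1) for _ in range(5)]  # stand-ins for 5 trained ViTs
student = nn.Linear(32, 1)

with torch.no_grad():            # step 1: ensemble predictions become soft labels
    soft = torch.stack([t(x) for t in teachers]).mean(dim=0)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
mse = nn.MSELoss()
for _ in range(100):             # step 2: student fits labels + soft labels
    opt.zero_grad()
    pred = student(x)
    loss = mse(pred, y) + mse(pred, soft)  # equal weighting is an assumption
    loss.backward()
    opt.step()
```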
Affiliation(s)
- Maria Baldeon-Calisto
- Departamento de Ingeniería Industrial and Instituto de Innovación en Productividad y Logística CATENA-USFQ, Universidad San Francisco de Quito USFQ, Quito, 170157, Ecuador; Colegio de Ciencias e Ingenierías "El Politécnico", Universidad San Francisco de Quito USFQ, Quito, 170157, Ecuador.
- Susana K Lai-Yuen
- Department of Industrial and Management Systems Engineering, University of South Florida, Tampa, 33620, FL, USA.
- Daniel Riofrío
- Colegio de Ciencias e Ingenierías "El Politécnico", Universidad San Francisco de Quito USFQ, Quito, 170157, Ecuador.
- Noel Pérez-Pérez
- Colegio de Ciencias e Ingenierías "El Politécnico", Universidad San Francisco de Quito USFQ, Quito, 170157, Ecuador.
- Diego Benítez
- Colegio de Ciencias e Ingenierías "El Politécnico", Universidad San Francisco de Quito USFQ, Quito, 170157, Ecuador.
- Ricardo Flores-Moyano
- Colegio de Ciencias e Ingenierías "El Politécnico", Universidad San Francisco de Quito USFQ, Quito, 170157, Ecuador.
217
Qi J, Zhou P, Ran G, Gao H, Wang P, Li D, Gao Y, Navarro-Alarcon D. Model predictive manipulation of compliant objects with multi-objective optimizer and adversarial network for occlusion compensation. ISA Trans 2024; 150:359-373. [PMID: 38797650] [DOI: 10.1016/j.isatra.2024.05.015]
Abstract
BACKGROUND The manipulation of compliant objects by robotic systems remains a challenging task, largely due to their variable shapes and the complex, high-dimensional nature of their interaction dynamics. Traditional robotic manipulation strategies struggle with the accurate modeling and control necessary to handle such materials, especially in the presence of the visual occlusions that frequently occur in dynamic environments. Meanwhile, in most unstructured environments, robots are required to interact autonomously with their surroundings. METHODS To solve the shape manipulation of compliant objects in an unstructured environment, we begin by exploring a regression-based algorithm that represents the high-dimensional configuration space of deformable objects in a compressed form enabling efficient and effective manipulation. Simultaneously, we address the issue of visual occlusions by integrating an adversarial network, which guides the shaping task even with partial observations of the object. Afterwards, we propose a receding-time estimator to coordinate the robot's actions with the computed shape features while satisfying various performance criteria. Finally, a model predictive controller is used to compute the robot's shaping motions subject to safety constraints. Detailed experiments are presented to evaluate the proposed manipulation framework. SIGNIFICANT FINDINGS Our MPC framework utilizes the compressed representation and occlusion-compensated information to predict the object's behavior, while the multi-objective optimizer ensures that the resulting control actions meet multiple performance criteria. Through rigorous experimental validation, our approach demonstrates superior manipulation capabilities in scenarios with visual obstructions, outperforming existing methods in terms of precision and operational reliability. The findings highlight the potential of our integrated approach to significantly enhance the manipulation of compliant objects in real-world robotic applications.
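The receding-horizon loop at the core of MPC can be sketched with a toy linear model: optimize a short input sequence, apply only the first action, then re-plan. This generic SciPy example (dynamics, costs, and bounds are all assumed) is far simpler than the paper's multi-objective, learned-model setup:

```python
import numpy as np
from scipy.optimize import minimize

A, B = np.eye(2), 0.1 * np.eye(2)  # toy linear feature dynamics
x0, target, horizon = np.array([1.0, -0.5]), np.zeros(2), 5

def cost(u_flat):
    u, x, c = u_flat.reshape(horizon, 2), x0.copy(), 0.0
    for k in range(horizon):       # roll the model out over the horizon
        x = A @ x + B @ u[k]
        c += np.sum((x - target) ** 2) + 0.01 * np.sum(u[k] ** 2)
    return c

res = minimize(cost, np.zeros(2 * horizon), bounds=[(-1, 1)] * (2 * horizon))
u_now = res.x[:2]  # apply only the first action, then re-plan (receding horizon)
```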
Affiliation(s)
- Jiaming Qi
- Centre for Transformative Garment Production, The Hong Kong University, NT, Hong Kong.
- Peng Zhou
- Centre for Transformative Garment Production, The Hong Kong University, NT, Hong Kong.
- Guangtao Ran
- Department of Control Science and Engineering, Harbin Institute of Technology, Heilongjiang, China.
- Han Gao
- School of Automation, Beijing Institute of Technology, Beijing, China.
- Pengyu Wang
- Department of Aerospace Engineering, Korea Advanced Institute of Science and Technology, Republic of Korea.
- Dongyu Li
- School of Cyber Science and Technology, Beihang University, Beijing, China.
- Yufeng Gao
- Department of Control Science and Engineering, Harbin Institute of Technology, Heilongjiang, China.
- David Navarro-Alarcon
- Department of Mechanical Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong.
218
Wang Z, Wu M, Liu Q, Wang X, Yan C, Song T. Multiclassification of Hepatic Cystic Echinococcosis by Using Multiple Kernel Learning Framework and Ultrasound Images. Ultrasound Med Biol 2024; 50:1034-1044. [PMID: 38679514] [DOI: 10.1016/j.ultrasmedbio.2024.03.018]
Abstract
To properly treat and care for hepatic cystic echinococcosis (HCE), it is essential to make an accurate diagnosis before treatment. OBJECTIVE The objective of this study was to assess the diagnostic accuracy of computer-aided diagnosis techniques in classifying HCE ultrasound images into five subtypes. METHODS A total of 1820 HCE ultrasound images collected from 967 patients were included in the study. A multi-kernel learning method was developed to learn the texture and depth features of the ultrasound images. Combined kernel functions were built into a multi-kernel support vector machine (MK-SVM) for the classification task. The experimental results were evaluated using five-fold cross-validation. Finally, our approach was compared with three other machine learning algorithms: the decision tree classifier, random forest, and gradient boosting decision tree. RESULTS Among all the methods used in the study, the MK-SVM achieved the highest accuracy of 96.6% on the fused feature set. CONCLUSION The multi-kernel learning method effectively learns different image features from ultrasound images by utilizing various kernels. The MK-SVM method, which combines the learning of texture features and depth features separately, has significant application value in HCE classification tasks.
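The multiple-kernel idea, one kernel per feature set and a weighted combination fed to an SVM, can be sketched with scikit-learn's precomputed-kernel interface (the kernel choices, weights, and synthetic features below are assumptions, not the paper's configuration):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

rng = np.random.default_rng(0)
X_tex = rng.random((60, 16))   # stand-in texture features
X_dep = rng.random((60, 8))    # stand-in depth features
y = rng.integers(0, 5, 60)     # five HCE subtypes

# weighted sum of per-feature-set kernels (equal weights are an assumption)
K = 0.5 * rbf_kernel(X_tex) + 0.5 * linear_kernel(X_dep)

clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(K[:5]))      # training-set predictions, for illustration only
```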
Affiliation(s)
- Zhengye Wang
- Center for Disease Control and Prevention, Xinjiang Production and Construction Corps, Urumqi, China; Ultrasound Department, State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Disease in Central Asia, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China.
- Miao Wu
- College of Medical Engineering and Technology, Xinjiang Medical University, Urumqi, China.
- Qian Liu
- Basic Medical College, Xinjiang Medical University, Urumqi, China.
- Xiaorong Wang
- Ultrasound Department, State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Disease in Central Asia, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China.
- Chuanbo Yan
- College of Medical Engineering and Technology, Xinjiang Medical University, Urumqi, China.
- Tao Song
- Ultrasound Department, State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Disease in Central Asia, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China.
219
Zhao Z, Xie W, Zuo B, Wang Y. Skeleton Extraction for Articulated Objects With the Spherical Unwrapping Profiles. IEEE Trans Vis Comput Graph 2024; 30:3731-3748. [PMID: 37022000] [DOI: 10.1109/tvcg.2023.3239370]
Abstract
Embedding unified skeletons into unregistered scans is fundamental to finding correspondences, depicting motions, and capturing underlying structures among the articulated objects in the same category. Some existing approaches rely on laborious registration to adapt a predefined LBS model to each input, while others require the input to be set to a canonical pose, e.g., T-pose or A-pose. However, their effectiveness is always influenced by the water-tightness, face topology, and vertex density of the input mesh. At the core of our approach lies a novel unwrapping method, named SUPPLE (Spherical UnwraPping ProfiLEs), which maps a surface into image planes independent of mesh topologies. Based on this lower-dimensional representation, a learning-based framework is further designed to localize and connect skeletal joints with fully convolutional architectures. Experiments demonstrate that our framework yields reliable skeleton extractions across a broad range of articulated categories, from raw scans to online CADs.
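The unwrapping step can be pictured as mapping surface points to spherical angles around a center, with radius as the profile value; SUPPLE's actual multi-profile construction and image rasterization are more elaborate, so treat this as a schematic only (all names are ours):

```python
import numpy as np

def spherical_unwrap(points, center):
    # project 3D surface points onto (theta, phi) with radius as the profile value
    p = points - center
    r = np.linalg.norm(p, axis=1)
    theta = np.arccos(np.clip(p[:, 2] / np.maximum(r, 1e-9), -1.0, 1.0))  # polar
    phi = np.arctan2(p[:, 1], p[:, 0])                                    # azimuth
    return theta, phi, r

pts = np.random.default_rng(0).normal(size=(1000, 3))   # stand-in scan vertices
theta, phi, r = spherical_unwrap(pts, pts.mean(axis=0))
# binning (theta, phi) into a 2D grid with r as the pixel value yields an
# unwrapped image independent of mesh topology
```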
220
Culley S, Caballero AC, Burden JJ, Uhlmann V. Made to measure: An introduction to quantifying microscopy data in the life sciences. J Microsc 2024; 295:61-82. [PMID: 37269048] [DOI: 10.1111/jmi.13208]
Abstract
Images are at the core of most modern biological experiments and are used as a major source of quantitative information. Numerous algorithms are available to process images and make them more amenable to be measured. Yet the nature of the quantitative output that is useful for a given biological experiment is uniquely dependent upon the question being investigated. Here, we discuss the 3 main types of information that can be extracted from microscopy data: intensity, morphology, and object counts or categorical labels. For each, we describe where they come from, how they can be measured, and what may affect the relevance of these measurements in downstream data analysis. Acknowledging that what makes a measurement 'good' is ultimately down to the biological question being investigated, this review aims at providing readers with a toolkit to challenge how they quantify their own data and be critical of conclusions drawn from quantitative bioimage analysis experiments.
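All three output types discussed, intensity, morphology, and object counts, fall out of a standard threshold-label-measure pipeline, for example with scikit-image (the image below is synthetic and the parameter choices are illustrative):

```python
import numpy as np
from skimage import measure, filters

img = np.zeros((64, 64))
img[10:20, 10:25] = 1.5                       # two synthetic "objects"
img[40:50, 30:38] = 0.8
img += np.random.default_rng(0).normal(0, 0.05, img.shape)

mask = img > filters.threshold_otsu(img)      # segment objects
labels = measure.label(mask)                  # object counts: connected components
props = measure.regionprops(labels, intensity_image=img)
for p in props:                               # intensity and morphology per object
    print(p.label, p.area, p.mean_intensity, p.eccentricity)
```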
Affiliation(s)
- Siân Culley
- Randall Centre for Cell and Molecular Biophysics, King's College London, London, UK.
- Virginie Uhlmann
- European Bioinformatics Institute (EMBL-EBI), EMBL, Cambridge, UK.
221
Liu Y, Xia K, Cen Y, Ying S, Zhao Z. Artificial intelligence for caries detection: a novel diagnostic tool using deep learning algorithms. Oral Radiol 2024; 40:375-384. [PMID: 38498223] [DOI: 10.1007/s11282-024-00741-x]
Abstract
OBJECTIVES The aim of this study was to develop an assessment tool for automatic detection of dental caries in periapical radiographs using convolutional neural network (CNN) architecture. METHODS A novel diagnostic model named ResNet + SAM was established using numerous periapical radiographs (4278 images) annotated by medical experts to automatically detect dental caries. The performance of the model was compared with that of traditional CNNs (VGG19, ResNet-50) and of dentists. The Gradient-weighted Class Activation Mapping (Grad-CAM) technique was used to show the regions of interest in the image for the CNNs. RESULTS ResNet + SAM demonstrated significantly improved performance compared with the modified ResNet-50 model, with an average F1 score of 0.886 (95% CI 0.855-0.918), accuracy of 0.885 (95% CI 0.862-0.901), and AUC of 0.954 (95% CI 0.924-0.980). The comparison between the performance of the model and the dentists revealed that the model achieved higher accuracy than the junior dentists. With the assistance of the tool, the dentists achieved superior metrics, with a mean F1 score of 0.827, and the interobserver agreement for dental caries was improved from 0.592/0.610 to 0.706/0.723. CONCLUSIONS According to the results obtained from the experiments, the automatic assessment tool using the ResNet + SAM model shows remarkable performance and excellent potential for identifying dental caries. The use of the assessment tool in clinical practice can be of great benefit as clinical decision-making support in dentistry and can reduce the workload of dentists.
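The Grad-CAM visualization mentioned above can be sketched generically as follows; this is the standard Grad-CAM recipe on a stock ResNet-50 backbone, shown only to clarify how such regions of interest are produced. It is not the authors' ResNet + SAM implementation.

```python
# Generic Grad-CAM sketch: a heat map is built from gradients of the class
# score with respect to the last convolutional feature maps.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights=None).eval()
feats, grads = {}, {}

def fwd_hook(module, inputs, output):
    feats["a"] = output.detach()

def bwd_hook(module, grad_in, grad_out):
    grads["a"] = grad_out[0].detach()

model.layer4.register_forward_hook(fwd_hook)       # last conv stage
model.layer4.register_full_backward_hook(bwd_hook)

def grad_cam(x: torch.Tensor, class_idx: int) -> torch.Tensor:
    """x: (1, 3, H, W) image tensor; returns an (H, W) heat map in [0, 1]."""
    score = model(x)[0, class_idx]
    model.zero_grad()
    score.backward()
    weights = grads["a"].mean(dim=(2, 3), keepdim=True)  # pooled gradients
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0]
```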
Affiliation(s)
- Yiliang Liu
- College of Computer Science, Sichuan University, No.24 South Section 1, Yihuan Road, Chengdu, 610065, China
- State Key Laboratory of Fundamental Science on Synthetic Vision, College of Computer Science, Sichuan University, Chengdu, 610064, Sichuan, China
| | - Kai Xia
- State Key Laboratory of Oral Diseases and National Clinical Research Center for Oral Diseases, Department of Orthodontics, West China Hospital of Stomatology, Sichuan University, No. 14, 3rd section, South Renmin Road, Chengdu, 610041, Sichuan, China
| | - Yueyan Cen
- State Key Laboratory of Oral Diseases and National Clinical Research Center for Oral Diseases, West China Hospital of Stomatology, Sichuan University, No. 14, 3rd section, South Renmin Road, Chengdu, 610041, Sichuan, China
| | - Sancong Ying
- College of Computer Science, Sichuan University, No.24 South Section 1, Yihuan Road, Chengdu, 610065, China.
- State Key Laboratory of Fundamental Science on Synthetic Vision, College of Computer Science, Sichuan University, Chengdu, 610064, Sichuan, China.
| | - Zhihe Zhao
- State Key Laboratory of Oral Diseases and National Clinical Research Center for Oral Diseases, Department of Orthodontics, West China Hospital of Stomatology, Sichuan University, No. 14, 3rd section, South Renmin Road, Chengdu, 610041, Sichuan, China
| |
|
222
|
Hsiao JHW. Understanding Human Cognition Through Computational Modeling. Top Cogn Sci 2024; 16:349-376. [PMID: 38781432 DOI: 10.1111/tops.12737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2023] [Revised: 05/07/2024] [Accepted: 05/08/2024] [Indexed: 05/25/2024]
Abstract
One important goal of cognitive science is to understand the mind in terms of its representational and computational capacities, where computational modeling plays an essential role in providing theoretical explanations and predictions of human behavior and mental phenomena. In my research, I have been using computational modeling, together with behavioral experiments and cognitive neuroscience methods, to investigate the information processing mechanisms underlying learning and visual cognition in terms of perceptual representation and attention strategy. In perceptual representation, I have used neural network models to understand how the split architecture in the human visual system influences visual cognition, and to examine the development of perceptual representation as a result of expertise. In attention strategy, I have developed the Eye Movement analysis with Hidden Markov Models (EMHMM) method for quantifying eye movement patterns and consistency using both spatial and temporal information, which has led to novel findings across disciplines that are not discoverable using traditional methods. By integrating it with deep neural networks (DNN), I have developed DNN+HMM to account for eye movement strategy learning in human visual cognition. The understanding of the human mind through computational modeling also facilitates research on artificial intelligence's (AI) comparability with human cognition, which can in turn help explainable AI systems infer humans' beliefs about AI's operations and provide human-centered explanations to enhance human-AI interaction and mutual understanding. Together, these demonstrate the essential role of computational modeling methods in providing theoretical accounts of the human mind as well as its interaction with its environment and AI systems.
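The hidden-Markov-model component described above can be illustrated with a short sketch: hidden states act as regions of interest (spatial information) and the transition matrix captures the order in which they are visited (temporal information). The sketch uses the hmmlearn library with made-up data; it is not the EMHMM toolbox itself.

```python
# Hedged sketch of modeling eye-movement scanpaths with a Gaussian HMM,
# in the spirit of the EMHMM approach (illustrative data and parameters).
import numpy as np
from hmmlearn import hmm

# Illustrative data: fixation (x, y) coordinates from two viewing trials.
trial1 = np.random.rand(20, 2) * 100
trial2 = np.random.rand(15, 2) * 100
X = np.vstack([trial1, trial2])
lengths = [len(trial1), len(trial2)]

model = hmm.GaussianHMM(n_components=3, covariance_type="full", n_iter=100)
model.fit(X, lengths)

print("ROI centers:", model.means_)        # spatial information
print("Transitions:", model.transmat_)     # temporal information
print("Log-likelihood:", model.score(X, lengths))  # e.g., to compare strategies
```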
|
223
|
Wang J, Qiao L, Zhou S, Zhou J, Wang J, Li J, Ying S, Chang C, Shi J. Weakly Supervised Lesion Detection and Diagnosis for Breast Cancers With Partially Annotated Ultrasound Images. IEEE TRANSACTIONS ON MEDICAL IMAGING 2024; 43:2509-2521. [PMID: 38373131 DOI: 10.1109/tmi.2024.3366940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/21/2024]
Abstract
Deep learning (DL) has proven highly effective for ultrasound-based computer-aided diagnosis (CAD) of breast cancers. In an automatic CAD system, lesion detection is critical for the subsequent diagnosis. However, existing DL-based methods generally require voluminous manually annotated region of interest (ROI) labels and class labels to train both the lesion detection and diagnosis models. In clinical practice, the ROI labels, i.e., ground truths, may not always be optimal for the classification task due to the individual experience of sonologists, resulting in an issue of coarse annotation that limits the diagnosis performance of a CAD model. To address this issue, a novel Two-Stage Detection and Diagnosis Network (TSDDNet) is proposed based on weakly supervised learning to improve the diagnostic accuracy of ultrasound-based CAD for breast cancers. In particular, all initial ROI-level labels are considered coarse annotations before model training. In the first training stage, a candidate selection mechanism is designed to refine manual ROIs in the fully annotated images and generate accurate pseudo-ROIs for the partially annotated images under the guidance of class labels. The training set is updated with more accurate ROI labels for the second training stage, in which a fusion network is developed to integrate the detection network and the classification network into a unified end-to-end framework as the final CAD model. A self-distillation strategy is designed on this model for joint optimization to further improve its diagnosis performance. The proposed TSDDNet is evaluated on three B-mode ultrasound datasets, and the experimental results indicate that it achieves the best performance on both the lesion detection and diagnosis tasks, suggesting promising application potential.
|
224
|
Chen JS, Goubran M, Kim G, Kim MJ, Willmann JK, Zeineh M, Hristov D, Kaffas AE. Motion correction of 3D dynamic contrast-enhanced ultrasound imaging without anatomical B-Mode images: Pilot evaluation in eight patients. Med Phys 2024; 51:4827-4837. [PMID: 38377383 DOI: 10.1002/mp.16995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2023] [Revised: 12/05/2023] [Accepted: 01/05/2024] [Indexed: 02/22/2024] Open
Abstract
BACKGROUND Dynamic contrast-enhanced ultrasound (DCE-US) is highly susceptible to motion artifacts arising from patient movement, respiration, and operator handling and experience. Motion artifacts can be especially problematic in the context of perfusion quantification. In conventional 2D DCE-US, motion correction (MC) algorithms take advantage of accompanying side-by-side anatomical B-Mode images that contain time-stable features. However, current commercial models of 3D DCE-US do not provide side-by-side B-Mode images, which makes MC challenging. PURPOSE This work introduces a novel MC algorithm for 3D DCE-US and assesses its efficacy when handling clinical data sets. METHODS In brief, the algorithm uses a pyramidal approach whereby short temporal windows consisting of three consecutive frames are created to perform local registrations, which are then registered to a master reference derived from a weighted average of all frames. We applied the algorithm to imaging studies from eight patients with metastatic lesions in the liver and assessed improvements in original versus motion-corrected 3D DCE-US cines using: (i) frame-to-frame volumetric overlap of segmented lesions, (ii) normalized correlation coefficient (NCC) between frames (similarity analysis), and (iii) sum of squared errors (SSE), root-mean-squared error (RMSE), and r-squared (R2) quality-of-fit from fitted time-intensity curves (TICs) extracted from a segmented lesion. RESULTS We noted improvements in frame-to-frame lesion overlap across all patients, from 68% ± 13% without correction to 83% ± 3% with MC (p = 0.023). Frame-to-frame similarity as assessed by NCC also improved on two different sets of time points, from 0.694 ± 0.057 (original cine) to 0.862 ± 0.049 (corresponding MC cine) and from 0.723 ± 0.066 to 0.886 ± 0.036 (p ≤ 0.001 for both). TIC analysis displayed a significant decrease in RMSE (p = 0.018) and a significant increase in R2 goodness-of-fit (p = 0.029) for the patient cohort. CONCLUSIONS Overall, the results suggest decreases in 3D DCE-US motion after applying the proposed algorithm.
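The NCC similarity metric used in the evaluation can be written compactly. The sketch below is a generic zero-mean NCC over two equally shaped frames or volumes, with illustrative names; it is not taken from the paper's code.

```python
# Hedged sketch of the normalized correlation coefficient (NCC) used above to
# quantify frame-to-frame similarity.
import numpy as np

def ncc(frame_a: np.ndarray, frame_b: np.ndarray) -> float:
    """Zero-mean NCC between two equally shaped arrays, in [-1, 1]."""
    a = frame_a.astype(np.float64).ravel()
    b = frame_b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

# A higher mean NCC across consecutive frames indicates less residual motion.
```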
Affiliation(s)
- Jia-Shu Chen
- Department of Neuroscience, Brown University, Providence, Rhode Island, USA
- The Warren Alpert Medical School, Brown University, Providence, Rhode Island, USA
| | - Maged Goubran
- Sunnybrook Health Sciences Center, Toronto, Ontario, Canada
- Department of Radiology, Stanford University, Stanford, California, USA
| | - Gaeun Kim
- Department of Radiology, Stanford University, Stanford, California, USA
| | - Matthew J Kim
- Department of Radiation Oncology - Radiation Physics, Stanford School of Medicine, Stanford University, Stanford, California, USA
| | - Jürgen K Willmann
- Department of Radiology, Molecular Imaging Program, Stanford School of Medicine, Stanford University, Stanford, California, USA
| | - Michael Zeineh
- Department of Radiology, Stanford University, Stanford, California, USA
| | - Dimitre Hristov
- Department of Radiation Oncology - Radiation Physics, Stanford School of Medicine, Stanford University, Stanford, California, USA
| | - Ahmed El Kaffas
- Department of Radiology, Molecular Imaging Program, Stanford School of Medicine, Stanford University, Stanford, California, USA
| |
|
225
|
Dai F, Liu Q, Guo Y, Xie R, Wu J, Deng T, Zhu H, Deng L, Song L. Convolutional neural networks combined with classification algorithms for the diagnosis of periodontitis. Oral Radiol 2024; 40:357-366. [PMID: 38393548 DOI: 10.1007/s11282-024-00739-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2023] [Accepted: 01/03/2024] [Indexed: 02/25/2024]
Abstract
OBJECTIVES We aim to develop a deep learning model based on a convolutional neural network (CNN) combined with a classification algorithm (CA) to assist dentists in quickly and accurately diagnosing the stage of periodontitis. MATERIALS AND METHODS Periapical radiographs (PERs) and clinical data were collected. The CNNs, including AlexNet, VGG16, and ResNet18, were trained on PERs to establish the PER-CNN models for no periodontal bone loss (PBL) and PBL. The CAs, including random forest (RF), support vector machine (SVM), naive Bayes (NB), logistic regression (LR), and k-nearest neighbor (KNN), were then added to the PER-CNN models to distinguish control, stage I, stage II, and stage III/IV periodontitis. Heat maps were produced using a gradient-weighted class activation mapping method to visualize the regions of interest of the PER-AlexNet model. Clustering analysis was performed based on the ten PER-CNN scores and the clinical characteristics. RESULTS The accuracy of the PER-AlexNet and PER-VGG16 models, which performed best, was 0.872 and 0.853, respectively. The accuracy of the PER-AlexNet + RF model with the highest performance for control, stage I, stage II, and stage III/IV was 0.968, 0.960, 0.835, and 0.842, respectively. The heat maps showed that the regions of interest predicted by the model were periodontitis bone lesions. We found that age and smoking were significantly related to periodontitis based on the PER-AlexNet scores. CONCLUSION The PER-AlexNet + RF model has reached high performance for whole-case periodontal diagnosis. CNN models combined with CAs can assist dentists in quickly and accurately diagnosing the stage of periodontitis.
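The CNN-plus-classifier pattern described above can be sketched as follows: deep features are taken from a CNN and handed to a random forest. This is a generic illustration under assumed names; the authors' exact feature choice (e.g., CNN output scores versus penultimate-layer features) is not reproduced here.

```python
# Hedged sketch of the general CNN + classification-algorithm pattern:
# extract deep features with a CNN, then train a random forest on them.
import torch
from torchvision.models import alexnet
from sklearn.ensemble import RandomForestClassifier

backbone = alexnet(weights=None).eval()

def deep_features(images: torch.Tensor) -> torch.Tensor:
    """images: (N, 3, 224, 224); returns (N, 4096) penultimate-layer features."""
    with torch.no_grad():
        x = backbone.features(images)
        x = backbone.avgpool(x).flatten(1)
        x = backbone.classifier[:-1](x)  # drop the final class-score layer
    return x

# Illustrative training call; X_img and y_stage stand in for radiograph tensors
# and periodontitis stage labels (control, I, II, III/IV).
# clf = RandomForestClassifier(n_estimators=300).fit(deep_features(X_img), y_stage)
```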
Affiliation(s)
- Fang Dai
- Center of Stomatology, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, No.1, Minde Road, Nanchang, 330000, Jiangxi, China
- The Institute of Periodontal Disease, Nanchang University, Nanchang, China
- JXHC Key Laboratory of Periodontology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
| | - Qiangdong Liu
- Center of Stomatology, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, No.1, Minde Road, Nanchang, 330000, Jiangxi, China
- The Second Clinical Medical School, Nanchang University, Nanchang, China
- The Institute of Periodontal Disease, Nanchang University, Nanchang, China
- JXHC Key Laboratory of Periodontology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
| | - Yuchen Guo
- The Second Clinical Medical School, Nanchang University, Nanchang, China
| | - Ruixiang Xie
- School of Life Sciences, Nanchang University, Nanchang, China
| | - Jingting Wu
- Center of Stomatology, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, No.1, Minde Road, Nanchang, 330000, Jiangxi, China
- The Institute of Periodontal Disease, Nanchang University, Nanchang, China
- JXHC Key Laboratory of Periodontology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
| | - Tian Deng
- Center of Stomatology, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, No.1, Minde Road, Nanchang, 330000, Jiangxi, China
- The Institute of Periodontal Disease, Nanchang University, Nanchang, China
- JXHC Key Laboratory of Periodontology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
| | - Hongbiao Zhu
- Center of Stomatology, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, No.1, Minde Road, Nanchang, 330000, Jiangxi, China
- The Institute of Periodontal Disease, Nanchang University, Nanchang, China
- JXHC Key Laboratory of Periodontology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
| | - Libin Deng
- School of Public Health, Nanchang University, No.1299, Xuefu Avenue, Nanchang, 330000, Jiangxi, China.
- Jiangxi Provincial Key Laboratory of Preventive Medicine, Nanchang University, Nanchang, China.
- The Institute of Periodontal Disease, Nanchang University, Nanchang, China.
- JXHC Key Laboratory of Periodontology, The Second Affiliated Hospital of Nanchang University, Nanchang, China.
| | - Li Song
- Center of Stomatology, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, No.1, Minde Road, Nanchang, 330000, Jiangxi, China.
- The Institute of Periodontal Disease, Nanchang University, Nanchang, China.
- JXHC Key Laboratory of Periodontology, The Second Affiliated Hospital of Nanchang University, Nanchang, China.
| |
|
226
|
Yang X, Li R, Yang X, Zhou Y, Liu Y, Han JDJ. Coordinate-wise monotonic transformations enable privacy-preserving age estimation with 3D face point cloud. SCIENCE CHINA. LIFE SCIENCES 2024; 67:1489-1501. [PMID: 38573362 DOI: 10.1007/s11427-023-2518-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/26/2023] [Accepted: 12/25/2023] [Indexed: 04/05/2024]
Abstract
The human face is a valuable biomarker of aging, but the collection and use of its image raise significant privacy concerns. Here we present an approach for facial data masking that preserves age-related features using coordinate-wise monotonic transformations. We first develop a deep learning model that estimates age directly from non-registered face point clouds with high accuracy and generalizability. We show that the model learns a highly indistinguishable mapping using faces treated with coordinate-wise monotonic transformations, indicating that the relative positioning of facial information is a low-level biomarker of facial aging. Through visual perception tests and computational 3D face verification experiments, we demonstrate that transformed faces are significantly more difficult to perceive for humans but not for machines, except when only the face shape information is accessible. Our study leads to a facial data protection guideline that has the potential to broaden public access to face datasets with minimized privacy risks.
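To make the masking idea concrete, the sketch below applies an independent, strictly increasing function to each coordinate of a point cloud; monotonicity preserves the relative ordering of points along every axis while distorting the absolute geometry. The specific functions are illustrative assumptions, not the paper's transformations.

```python
# Hedged sketch of a coordinate-wise monotonic transformation of a 3D face
# point cloud: each axis passes through its own strictly increasing function,
# so relative positioning along every axis is preserved.
import numpy as np

def monotonic_mask(points: np.ndarray) -> np.ndarray:
    """points: (N, 3). Returns a masked cloud of the same shape."""
    # Illustrative strictly increasing functions, one per coordinate.
    fx = lambda x: np.sign(x) * np.abs(x) ** 1.5  # odd power keeps ordering
    fy = lambda y: y + 0.3 * np.tanh(2.0 * y)     # derivative >= 1 > 0
    fz = lambda z: np.exp(0.8 * z)                # strictly increasing
    out = np.empty_like(points, dtype=np.float64)
    out[:, 0] = fx(points[:, 0])
    out[:, 1] = fy(points[:, 1])
    out[:, 2] = fz(points[:, 2])
    return out
```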
Affiliation(s)
- Xinyu Yang
- School of Life Sciences, Peking University, Beijing, 100871, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
| | - Runhan Li
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
| | - Xindi Yang
- Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China
| | - Yong Zhou
- Clinical Research Institute, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
| | - Yi Liu
- Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China
| | - Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
| |
|
227
|
Merrouchi M, Benyoussef Y, Skittou M, Atifi K, Gadi T. ConvCoroNet: a deep convolutional neural network optimized with iterative thresholding algorithm for Covid-19 detection using chest X-ray images. J Biomol Struct Dyn 2024; 42:5699-5712. [PMID: 37354142 DOI: 10.1080/07391102.2023.2227726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2022] [Accepted: 06/15/2023] [Indexed: 06/26/2023]
Abstract
Covid-19 is a global pandemic. Early and accurate detection of positive cases prevents the further spread of the epidemic and helps to treat infected patients rapidly. During the peak of the epidemic, Covid-19 test kits were in short supply; in addition, RT-PCR-based testing takes considerable time to yield a diagnosis. Hence the need for fast, accurate, and low-cost methods to replace or supplement RT-PCR-based methods. Since Covid-19 is a respiratory disease and chest X-ray images are often used to diagnose pneumonia, these images can play an important role in Covid-19 detection. In this article, we propose ConvCoroNet, a deep convolutional neural network model optimized with a new method based on an iterative thresholding algorithm to detect coronavirus automatically from chest X-ray images. ConvCoroNet is trained on a dataset prepared by collecting chest X-ray images of Covid-19, pneumonia, and normal cases from publicly available datasets. The experimental results of our proposed model show a high accuracy of 99.50%, sensitivity of 98.80%, and specificity of 99.85% when detecting Covid-19 from chest X-ray images. ConvCoroNet achieves promising results in the automatic detection of Covid-19 from chest X-ray images and may be able to help radiologists in Covid-19 detection by reducing the examination time of X-ray images.
Affiliation(s)
- M Merrouchi
- Faculty of Science and Technology, Hassan First, Settat, Morocco
| | - Y Benyoussef
- National School of Applied Sciences, Hassan First, Berrechid, Morocco
| | - M Skittou
- Faculty of Science and Technology, Hassan First, Settat, Morocco
| | - K Atifi
- Faculty of Science and Technology, Hassan First, Settat, Morocco
| | - T Gadi
- Faculty of Science and Technology, Hassan First, Settat, Morocco
| |
|
228
|
Meng Y, Zhang Y, Xie J, Duan J, Joddrell M, Madhusudhan S, Peto T, Zhao Y, Zheng Y. Multi-granularity learning of explicit geometric constraint and contrast for label-efficient medical image segmentation and differentiable clinical function assessment. Med Image Anal 2024; 95:103183. [PMID: 38692098 DOI: 10.1016/j.media.2024.103183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2023] [Revised: 01/26/2024] [Accepted: 04/18/2024] [Indexed: 05/03/2024]
Abstract
Automated segmentation is a challenging task in medical image analysis that usually requires a large amount of manually labeled data. However, most current supervised learning based algorithms suffer from insufficient manual annotations, posing a significant difficulty for accurate and robust segmentation. In addition, most current semi-supervised methods lack explicit representations of geometric structure and semantic information, restricting segmentation accuracy. In this work, we propose a hybrid framework to learn polygon vertices, region masks, and their boundaries in a weakly/semi-supervised manner that significantly advances geometric and semantic representations. Firstly, we propose multi-granularity learning of explicit geometric structure constraints via polygon vertices (PolyV) and pixel-wise region (PixelR) segmentation masks in a semi-supervised manner. Secondly, we propose eliminating boundary ambiguity by using an explicit contrastive objective to learn a discriminative feature space of boundary contours at the pixel level with limited annotations. Thirdly, we exploit task-specific clinical domain knowledge to make the clinical function assessment differentiable end-to-end. The ground truth of the clinical function assessment, in turn, can serve as auxiliary weak supervision for PolyV and PixelR learning. We evaluate the proposed framework on two tasks: optic disc (OD) and cup (OC) segmentation along with vertical cup-to-disc ratio (vCDR) estimation in fundus images, and left ventricle (LV) segmentation at end-diastolic and end-systolic frames along with ejection fraction (LVEF) estimation in two-dimensional echocardiography images. Experiments on nine large-scale datasets for the two tasks under different label settings demonstrate our model's superior performance on segmentation and clinical function assessment.
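As an example of the clinical function assessments named above, the vCDR can be computed from binary disc and cup masks as below. This is the standard non-differentiable definition with assumed variable names, not the paper's end-to-end differentiable formulation.

```python
# Hedged sketch of the vertical cup-to-disc ratio (vCDR) from binary
# optic-disc (OD) and optic-cup (OC) segmentation masks.
import numpy as np

def vertical_extent(mask: np.ndarray) -> int:
    """Height in pixels of a binary mask along the vertical (row) axis."""
    rows = np.any(mask > 0, axis=1)
    idx = np.flatnonzero(rows)
    return int(idx[-1] - idx[0] + 1) if idx.size else 0

def vcdr(od_mask: np.ndarray, oc_mask: np.ndarray) -> float:
    disc_h = vertical_extent(od_mask)
    cup_h = vertical_extent(oc_mask)
    return cup_h / disc_h if disc_h > 0 else 0.0
```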
Affiliation(s)
- Yanda Meng
- Department of Eye and Vision Sciences, University of Liverpool, Liverpool, United Kingdom
| | - Yuchen Zhang
- Center for Bioinformatics, Peking University, Beijing, China
| | - Jianyang Xie
- Department of Eye and Vision Sciences, University of Liverpool, Liverpool, United Kingdom
| | - Jinming Duan
- School of Computer Science, University of Birmingham, Birmingham, United Kingdom
| | - Martha Joddrell
- Liverpool Centre for Cardiovascular Science, University of Liverpool and Liverpool Heart & Chest Hospital, Liverpool, United Kingdom; Department of Cardiovascular and Metabolic Medicine, University of Liverpool, Liverpool, United Kingdom
| | - Savita Madhusudhan
- St Paul's Eye Unit, Liverpool University Hospitals NHS Foundation Trust, Liverpool, United Kingdom
| | - Tunde Peto
- School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, United Kingdom
| | - Yitian Zhao
- Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Science, Ningbo, China; Ningbo Eye Hospital, Ningbo, China.
| | - Yalin Zheng
- Department of Eye and Vision Sciences, University of Liverpool, Liverpool, United Kingdom; Liverpool Centre for Cardiovascular Science, University of Liverpool and Liverpool Heart & Chest Hospital, Liverpool, United Kingdom.
| |
|
229
|
Mo Y, Liu F, Yang G, Wang S, Zheng J, Wu F, Papież BW, McIlwraith D, He T, Guo Y. Labelling with dynamics: A data-efficient learning paradigm for medical image segmentation. Med Image Anal 2024; 95:103196. [PMID: 38781755 DOI: 10.1016/j.media.2024.103196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2023] [Revised: 02/20/2024] [Accepted: 05/02/2024] [Indexed: 05/25/2024]
Abstract
The success of deep learning on image classification and recognition tasks has led to new applications in diverse contexts, including the field of medical imaging. However, two properties of deep neural networks (DNNs) may limit their future use in medical applications. The first is that DNNs require a large amount of labeled training data, and the second is that deep learning-based models lack interpretability. In this paper, we propose and investigate a data-efficient framework for the task of general medical image segmentation. We address the two aforementioned challenges by introducing domain knowledge in the form of a strong prior into a deep learning framework. This prior is expressed by a customized dynamical system. We performed experiments on two different datasets, namely JSRT and ISIC2016 (heart and lung segmentation on chest X-ray images and skin lesion segmentation on dermoscopy images). We have achieved competitive results using the same amount of training data compared to the state-of-the-art methods. More importantly, we demonstrate that our framework is extremely data-efficient and can achieve reliable results using extremely limited training data. Furthermore, the proposed method is rotationally invariant and insensitive to initialization.
Affiliation(s)
- Yuanhan Mo
- Big Data Institute, University of Oxford, UK; Data Science Institute, Imperial College London, UK
| | - Fangde Liu
- Data Science Institute, Imperial College London, UK
| | - Guang Yang
- Department of Bioengineering and Imperial-X, Imperial College London, UK
| | - Shuo Wang
- Data Science Institute, Imperial College London, UK
| | - Jianqing Zheng
- Chinese Academy for Medical Sciences Oxford Institute, Nuffield Department of Medicine, University of Oxford, UK
| | - Fuping Wu
- Big Data Institute, University of Oxford, UK
| | | | | | | | - Yike Guo
- Data Science Institute, Imperial College London, UK; Hong Kong University of Science and Technology, Hong Kong.
| |
|
230
|
Wang J, Wu S, Yuan Z, Tong Q, Xu K. Frequency compensated diffusion model for real-scene dehazing. Neural Netw 2024; 175:106281. [PMID: 38579573 DOI: 10.1016/j.neunet.2024.106281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2023] [Revised: 02/29/2024] [Accepted: 03/27/2024] [Indexed: 04/07/2024]
Abstract
Due to distribution shift, deep learning based methods for image dehazing suffer from performance degradation when applied to real-world hazy images. In this paper, we consider a dehazing framework based on conditional diffusion models for improved generalization to real haze. First, we find that optimizing the training objective of diffusion models, i.e., predicting the Gaussian noise vectors, is non-trivial: the spectral bias of deep networks hinders the higher frequency modes in the Gaussian vectors from being learned and hence impairs the reconstruction of image details. To tackle this issue, we design a network unit, named the Frequency Compensation block (FCB), with a bank of filters that jointly emphasize the mid-to-high frequencies of an input signal. We demonstrate that diffusion models with the FCB achieve significant gains in both perceptual and distortion metrics. Second, to further boost generalization performance, we propose a novel data synthesis pipeline, HazeAug, to augment haze in terms of degree and diversity. Within the framework, a solid baseline for blind dehazing is set up in which models are trained on synthetic hazy-clean pairs and directly generalize to real data. Extensive evaluations on real dehazing datasets demonstrate the superior performance of the proposed dehazing diffusion model in distortion metrics. Compared to recent methods pre-trained on large-scale, high-quality image datasets, our model achieves a significant PSNR improvement of over 1 dB on challenging databases such as Dense-Haze and NH-Haze.
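One way to realize a filter bank that emphasizes mid-to-high frequencies is a residual unit built from fixed high-pass kernels mixed by a learned 1x1 convolution, as sketched below. This follows the idea stated in the abstract, not the paper's exact FCB design; the kernels and layer sizes are assumptions.

```python
# Hedged sketch of a frequency-compensation-style unit: fixed high-pass
# filters applied depthwise, mixed by a learned 1x1 conv, added residually.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FreqCompensation(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        lap = torch.tensor([[0., -1., 0.], [-1., 4., -1.], [0., -1., 0.]])
        sob = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        bank = torch.stack([lap, sob, sob.t()]).unsqueeze(1)  # (3, 1, 3, 3)
        self.register_buffer("bank", bank)
        self.mix = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        weight = self.bank.repeat(c, 1, 1, 1)  # apply each filter per channel
        hf = F.conv2d(x, weight, padding=1, groups=c)
        return x + self.mix(hf)  # residual high-frequency compensation
```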
Affiliation(s)
- Jing Wang
- Sony Research and Development Center Beijing Lab, Chao-Yang District, Beijing, 100027, China
| | - Songtao Wu
- Sony Research and Development Center Beijing Lab, Chao-Yang District, Beijing, 100027, China.
| | - Zhiqiang Yuan
- Aerospace Information Research Institute, Chinese Academy of Science, Hai-dian District, Beijing, 100094, China
| | - Qiang Tong
- Sony Research and Development Center Beijing Lab, Chao-Yang District, Beijing, 100027, China
| | - Kuanhong Xu
- Sony Research and Development Center Beijing Lab, Chao-Yang District, Beijing, 100027, China
| |
|
231
|
Zhang S, Yu W, Jiang F, Nie L, Yao H, Huang Q, Tao D. Stereo Image Restoration via Attention-Guided Correspondence Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:4850-4865. [PMID: 38261483 DOI: 10.1109/tpami.2024.3357709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/25/2024]
Abstract
Although stereo image restoration has been extensively studied, most existing work focuses on restoring stereo images with limited horizontal parallax due to the binocular symmetry constraint. Stereo images with unlimited parallax (e.g., large ranges and asymmetrical types) are more challenging in real-world applications and have rarely been explored so far. To restore high-quality stereo images with unlimited parallax, this paper proposes an attention-guided correspondence learning method, which learns both self-view and cross-view feature correspondences guided by parallax and omnidirectional attention. To learn cross-view feature correspondence, a Selective Parallax Attention Module (SPAM) is proposed to interact with cross-view features under the guidance of parallax attention that adaptively selects receptive fields for different parallax ranges. Furthermore, to handle asymmetrical parallax, we propose a Non-local Omnidirectional Attention Module (NOAM) to learn the non-local correlation of both self- and cross-view contexts, which guides the aggregation of global contextual features. Finally, we propose an Attention-guided Correspondence Learning Restoration Network (ACLRNet) built upon SPAMs and NOAMs to restore stereo images by associating the features of the two views based on the learned correspondence. Extensive experiments on five benchmark datasets demonstrate the effectiveness and generalization of the proposed method on three stereo image restoration tasks including super-resolution, denoising, and compression artifact reduction.
|
232
|
Bülow RD, Lan YC, Amann K, Boor P. [Artificial intelligence in kidney transplant pathology]. PATHOLOGIE (HEIDELBERG, GERMANY) 2024; 45:277-283. [PMID: 38598097 DOI: 10.1007/s00292-024-01324-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Accepted: 03/12/2024] [Indexed: 04/11/2024]
Abstract
BACKGROUND Artificial intelligence (AI) systems have shown promising results in digital pathology, including digital nephropathology and, specifically, kidney transplant pathology. AIM To summarize the current state of research and the limitations in the field of AI in kidney transplant pathology diagnostics and to provide a future outlook. MATERIALS AND METHODS A literature search was performed in PubMed and Web of Science using the search terms "deep learning", "transplant", and "kidney". Based on these results and the studies cited in the identified literature, a selection was made of studies that have a histopathological focus and use AI to improve kidney transplant diagnostics. RESULTS AND CONCLUSION Many studies have already made important contributions, particularly to the automation of the quantification of some histopathological lesions in nephropathology. This can likely be extended to automatically quantify all relevant lesions for a kidney transplant, such as Banff lesions. Important limitations and challenges exist in the collection of representative data sets and in updates to the Banff classification, making large-scale studies challenging. The already positive study results make future AI support in kidney transplant pathology appear likely.
Affiliation(s)
- Roman David Bülow
- Institut für Pathologie, Sektion Nephropathologie, Universitätsklinikum RWTH Aachen, Pauwelsstraße 30, 52074, Aachen, Deutschland
| | - Yu-Chia Lan
- Institut für Pathologie, Sektion Nephropathologie, Universitätsklinikum RWTH Aachen, Pauwelsstraße 30, 52074, Aachen, Deutschland
| | - Kerstin Amann
- Abteilung Nephropathologie, Institut für Pathologie, Universitätsklinikum Erlangen, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Deutschland
| | - Peter Boor
- Institut für Pathologie, Sektion Nephropathologie, Universitätsklinikum RWTH Aachen, Pauwelsstraße 30, 52074, Aachen, Deutschland.
- Medizinische Klinik II, Universitätsklinikum RWTH Aachen, Aachen, Deutschland.
| |
|
233
|
He Y, Ji Y, Li S, Shen Y, Ye L, Li Z, Huang W, Du Q. Age and sex estimation in cephalometric radiographs based on multitask convolutional neural networks. Oral Surg Oral Med Oral Pathol Oral Radiol 2024; 138:225-231. [PMID: 38614872 DOI: 10.1016/j.oooo.2024.02.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2023] [Revised: 01/27/2024] [Accepted: 02/10/2024] [Indexed: 04/15/2024]
Abstract
OBJECTIVES Age and sex characteristics are evident in cephalometric radiographs (CRs), yet their accurate estimation remains challenging due to the complexity of these images. This study aimed to harness deep learning to automate age and sex estimation from CRs, potentially simplifying their interpretation. STUDY DESIGN Using 4,557 CRs, we compared the performance of four models (SVM, R-net, VGG16-SingleTask, and our proposed VGG16-MultiTask) in estimating age and sex on the testing dataset. Gradient-weighted class activation mapping (Grad-CAM) was incorporated to visualize the regions used for sex identification. Performance was assessed using mean absolute error (MAE), specificity, sensitivity, F1 score, and the area under the curve (AUC) in receiver operating characteristic analysis. RESULTS The VGG16-MultiTask model outperformed the others, with the lowest MAE (0.864±1.602) and the highest sensitivity (0.85), specificity (0.88), F1 score (0.863), and AUC (0.93), demonstrating superior efficacy and robust performance. CONCLUSIONS The VGG16 multitask model demonstrates significant potential in enhancing age and sex estimation from cephalometric analysis, underscoring the role of AI in improving biomedical interpretations.
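A multitask head of the kind described above can be sketched as a shared backbone with one regression output for age and one classification output for sex. The sketch below uses a stock VGG16 backbone; the layer sizes and loss choices are assumptions, not the authors' exact architecture.

```python
# Hedged sketch of a VGG16-based multitask model: shared features feed an
# age-regression head and a sex-classification head.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class MultiTaskVGG16(nn.Module):
    def __init__(self):
        super().__init__()
        base = vgg16(weights=None)
        self.backbone = base.features
        self.pool = base.avgpool
        self.trunk = nn.Sequential(nn.Flatten(), nn.Linear(512 * 7 * 7, 512), nn.ReLU())
        self.age_head = nn.Linear(512, 1)  # regression: age in years
        self.sex_head = nn.Linear(512, 2)  # classification: two sex classes

    def forward(self, x: torch.Tensor):
        z = self.trunk(self.pool(self.backbone(x)))
        return self.age_head(z), self.sex_head(z)

# Training would typically combine an L1/MSE age loss with a cross-entropy
# sex loss, weighted into a single multitask objective.
```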
Affiliation(s)
- Yun He
- College of Preclinical Medicine of Chengdu University, Chengdu, Sichuan, China
| | - Yixuan Ji
- State Key Laboratory of Oral Diseases & National Center for Stomatology & National Clinical Research Center for Oral Diseases & Other Research Platforms, West China Hospital of Stomatology, Sichuan University, Chengdu, Sichuan, China
| | - Shihao Li
- Department of Biotherapy, Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan, China
| | - Yu Shen
- College of Preclinical Medicine of Chengdu University, Chengdu, Sichuan, China
| | - Lu Ye
- College of Preclinical Medicine of Chengdu University, Chengdu, Sichuan, China
| | - Ziyan Li
- Hospital of Chengdu Office of People's Government of Tibetan Autonomous Region (Hospital.C.T.), Chengdu, Sichuan, China
| | - Wenting Huang
- Hospital of Chengdu Office of People's Government of Tibetan Autonomous Region (Hospital.C.T.), Chengdu, Sichuan, China
| | - Qilian Du
- Hospital of Chengdu Office of People's Government of Tibetan Autonomous Region (Hospital.C.T.), Chengdu, Sichuan, China.
| |
|
234
|
Tohidi F, Paul M, Ulhaq A, Chakraborty S. Improved Video-Based Point Cloud Compression via Segmentation. SENSORS (BASEL, SWITZERLAND) 2024; 24:4285. [PMID: 39001064 PMCID: PMC11243880 DOI: 10.3390/s24134285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/23/2024] [Revised: 06/19/2024] [Accepted: 06/28/2024] [Indexed: 07/16/2024]
Abstract
A point cloud is a representation of objects or scenes utilising unordered points comprising 3D positions and attributes. The ability of point clouds to mimic natural forms has gained significant attention from diverse applied fields, such as virtual reality and augmented reality. However, point clouds, especially those representing dynamic scenes or objects in motion, must be compressed efficiently due to their huge data volume. The latest video-based point cloud compression (V-PCC) standard for dynamic point clouds divides the 3D point cloud into many patches using computationally expensive normal estimation, segmentation, and refinement. The patches are projected onto a 2D plane to apply existing video coding techniques. This process often results in losing proximity information and some original points. This loss induces artefacts that adversely affect user perception. The proposed method segments dynamic point clouds based on shape similarity and occlusion before patch generation. This segmentation strategy helps maintain the points' proximity and retain more original points by exploiting the density and occlusion of the points. The experimental results establish that the proposed method significantly outperforms the V-PCC standard and other relevant methods regarding rate-distortion performance and subjective quality testing for both the geometry and texture data of several benchmark video sequences.
Affiliation(s)
- Faranak Tohidi
- School of Computing Mathematics and Engineering, Charles Sturt University, Bathurst, NSW 2795, Australia
| | - Manoranjan Paul
- School of Computing Mathematics and Engineering, Charles Sturt University, Bathurst, NSW 2795, Australia
| | - Anwaar Ulhaq
- School of Engineering and Technology, Centre for Intelligent Systems, Central Queensland University, Sydney Campus, Rockhampton, QLD 4701, Australia
| | - Subrata Chakraborty
- Faculty of Science, Agriculture, Business and Law, University of New England, Armidale, NSW 2351, Australia
- Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW 2007, Australia
- Griffith Business School, Griffith University, Brisbane, QLD 4111, Australia
| |
|
235
|
Li Z, Wang M. Rigid point cloud registration based on correspondence cloud for image-to-patient registration in image-guided surgery. Med Phys 2024; 51:4554-4566. [PMID: 38856158 DOI: 10.1002/mp.17243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2023] [Revised: 04/30/2024] [Accepted: 05/21/2024] [Indexed: 06/11/2024] Open
Abstract
BACKGROUND Image-to-patient registration aligns preoperative images to intra-operative anatomical structures, and it is a critical step in image-guided surgery (IGS). The accuracy and speed of this step significantly influence the performance of IGS systems. Rigid registration based on paired points has been widely used in IGS, but studies have shown its limitations in terms of cost, accuracy, and registration time. Therefore, rigid registration of point clouds representing human anatomical surfaces has become an alternative way to perform image-to-patient registration in IGS systems. PURPOSE We propose a novel correspondence-based rigid point cloud registration method that can achieve global registration without the need for pose initialization. The proposed method is less sensitive to outliers than the widely used RANSAC-based registration methods and achieves high accuracy at high speed, which makes it particularly suitable for image-to-patient registration in IGS. METHODS We use the rotation axis and angle to represent the rigid spatial transformation between two coordinate systems. Given a set of correspondences between two point clouds in two coordinate systems, we first construct a 3D correspondence cloud (CC) from the inlier correspondences and prove that the CC lies on a plane whose normal is the rotation axis between the two point clouds. Thus, the rotation axis can be estimated by fitting this plane. We then further show that when the normals of a pair of corresponding points are projected onto this plane, the angle between the projected normal pair is equal to the rotation angle. Therefore, the rotation angle can be estimated from the angle histogram. Besides, this two-stage estimation also produces a high-quality correspondence subset with a high inlier rate. With the estimated rotation axis, rotation angle, and correspondence subset, the spatial transformation can be computed directly, or it can be estimated using RANSAC in a fast and robust way within only 100 iterations. RESULTS To validate the performance of the proposed registration method, we conducted experiments on the CT-Skull dataset. We first conducted a simulation experiment by controlling the initial inlier rate of the correspondence set, and the results showed that the proposed method can effectively obtain a correspondence subset with a much higher inlier rate. We then compared our method with traditional approaches such as ICP, Go-ICP, and RANSAC, as well as recently proposed methods like TEASER, SC2-PCR, and MAC. Our method outperformed all traditional methods in terms of registration accuracy and speed. While achieving registration accuracy comparable to that of the recently proposed methods, our method demonstrated superior speed, being almost three times faster than TEASER. CONCLUSIONS Experiments on the CT-Skull dataset demonstrate that the proposed method can effectively obtain a high-quality correspondence subset with a high inlier rate, and a tiny RANSAC with 100 iterations is sufficient to estimate the optimal transformation for point cloud registration. Our method achieves higher registration accuracy and faster speed than existing widely used methods, demonstrating great potential for image-to-patient registration, where a rigid spatial transformation is needed to align preoperative images to intra-operative patient anatomy.
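The plane property above has a short derivation and an equally short implementation. For a rotation R about unit axis u with translation t, R^T u = u, so u . (q - p) = u . ((R - I)p + t) = u . t is constant across inliers; the difference vectors of corresponding points therefore lie on a plane with normal u. Assuming the correspondence cloud is this set of difference vectors (our reading of the abstract), the axis-estimation step can be sketched as:

```python
# Hedged sketch of the rotation-axis estimation step: fit a plane to the
# correspondence cloud via SVD; the plane normal is the rotation axis.
import numpy as np

def rotation_axis_from_cc(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """p, q: (N, 3) corresponding points with q ~ R p + t. Returns a unit axis."""
    cc = q - p                       # correspondence cloud (assumed construction)
    centered = cc - cc.mean(axis=0)  # centering removes the constant u.t offset
    # The plane normal is the right singular vector associated with the
    # smallest singular value of the centered cloud.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[-1]
    return axis / np.linalg.norm(axis)
```

The rotation angle would then be recovered from the histogram of angles between corresponding normals projected onto this plane, as the abstract describes.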
Affiliation(s)
- Zhihao Li
- Digital Medical Research Center of School of Basic Medical Sciences, Fudan University, Shanghai, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Shanghai, China
| | - Manning Wang
- Digital Medical Research Center of School of Basic Medical Sciences, Fudan University, Shanghai, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Shanghai, China
| |
|
236
|
Yuan K, Kattel M, Lavanchy JL, Navab N, Srivastav V, Padoy N. Advancing surgical VQA with scene graph knowledge. Int J Comput Assist Radiol Surg 2024; 19:1409-1417. [PMID: 38780829 PMCID: PMC11231006 DOI: 10.1007/s11548-024-03141-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2024] [Accepted: 04/04/2024] [Indexed: 05/25/2024]
Abstract
PURPOSE The modern operating room is becoming increasingly complex, requiring innovative intra-operative support systems. While the focus of surgical data science has largely been on video analysis, integrating surgical computer vision with natural language capabilities is emerging as a necessity. Our work aims to advance visual question answering (VQA) in the surgical context with scene graph knowledge, addressing two main challenges in current surgical VQA systems: removing question-condition bias in the surgical VQA dataset and incorporating scene-aware reasoning in the surgical VQA model design. METHODS First, we propose a surgical scene graph-based dataset, SSG-VQA, generated by employing segmentation and detection models on publicly available datasets. We build surgical scene graphs using the spatial and action information of instruments and anatomies. These graphs are fed into a question engine, generating diverse QA pairs. We then propose SSG-VQA-Net, a novel surgical VQA model incorporating a lightweight Scene-embedded Interaction Module, which integrates geometric scene knowledge into the VQA model design by employing cross-attention between the textual and the scene features. RESULTS Our comprehensive analysis shows that our SSG-VQA dataset provides a more complex, diverse, geometrically grounded, unbiased, and surgical-action-oriented dataset compared to existing surgical VQA datasets, and that SSG-VQA-Net outperforms existing methods across different question types and complexities. We highlight that the primary limitation of current surgical VQA systems is the lack of scene knowledge needed to answer complex queries. CONCLUSION We present a novel surgical VQA dataset and model and show that results can be significantly improved by incorporating geometric scene features into the VQA model design. We point out that the bottleneck of current surgical visual question-answering models lies in learning the encoded representation rather than decoding the sequence. Our SSG-VQA dataset provides a diagnostic benchmark for testing the scene understanding and reasoning capabilities of models. The source code and the dataset will be made publicly available at: https://github.com/CAMMA-public/SSG-VQA .
Affiliation(s)
- Kun Yuan
- University of Strasbourg, CNRS, INSERM, ICube, UMR7357, Strasbourg, France.
- IHU, Strasbourg, France.
- CAMP, Technische Universität München, Munich, Germany.
| | - Manasi Kattel
- University of Strasbourg, CNRS, INSERM, ICube, UMR7357, Strasbourg, France
- IHU, Strasbourg, France
| | | | - Nassir Navab
- CAMP, Technische Universität München, Munich, Germany
| | - Vinkle Srivastav
- University of Strasbourg, CNRS, INSERM, ICube, UMR7357, Strasbourg, France
- IHU, Strasbourg, France
| | - Nicolas Padoy
- University of Strasbourg, CNRS, INSERM, ICube, UMR7357, Strasbourg, France
- IHU, Strasbourg, France
| |
|
237
|
Murphy KM, Ludwig E, Gutierrez J, Gehan MA. Deep Learning in Image-Based Plant Phenotyping. ANNUAL REVIEW OF PLANT BIOLOGY 2024; 75:771-795. [PMID: 38382904 DOI: 10.1146/annurev-arplant-070523-042828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/23/2024]
Abstract
A major bottleneck in the crop improvement pipeline is our ability to phenotype crops quickly and efficiently. Image-based, high-throughput phenotyping has a number of advantages because it is nondestructive and reduces human labor, but a new challenge arises in extracting meaningful information from large quantities of image data. Deep learning, a type of artificial intelligence, is an approach used to analyze image data and make predictions on unseen images that ultimately reduces the need for human input in computation. Here, we review the basics of deep learning, assessments of deep learning success, examples of applications of deep learning in plant phenomics, best practices, and open challenges.
Affiliation(s)
| | - Ella Ludwig
- Donald Danforth Plant Science Center, St. Louis, Missouri, USA;
| | - Jorge Gutierrez
- Donald Danforth Plant Science Center, St. Louis, Missouri, USA;
| | - Malia A Gehan
- Donald Danforth Plant Science Center, St. Louis, Missouri, USA;
| |
|
238
|
Jiang Z, Seyedi S, Griner E, Abbasi A, Rad AB, Kwon H, Cotes RO, Clifford GD. Evaluating and mitigating unfairness in multimodal remote mental health assessments. PLOS DIGITAL HEALTH 2024; 3:e0000413. [PMID: 39046989 DOI: 10.1371/journal.pdig.0000413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/17/2023] [Accepted: 06/13/2024] [Indexed: 07/27/2024]
Abstract
Research on automated mental health assessment tools has been growing in recent years, often aiming to address the subjectivity and bias that exist in current clinical practice of the psychiatric evaluation process. Despite the substantial health and economic ramifications, the potential unfairness of those automated tools has been understudied and requires more attention. In this work, we systematically evaluated the fairness level in a multimodal remote mental health dataset and an assessment system, comparing fairness across race, gender, education level, and age. The demographic parity ratio (DPR) and equalized odds ratio (EOR) of classifiers using different modalities were compared, along with the F1 scores in different demographic groups. Post-training classifier threshold optimization was employed to mitigate unfairness. No statistically significant unfairness was found in the composition of the dataset. Varying degrees of unfairness were identified among modalities, with no single modality consistently demonstrating better fairness across all demographic variables. Post-training mitigation effectively improved both the DPR and EOR metrics at the expense of a decrease in F1 scores. Addressing and mitigating unfairness in these automated tools are essential steps in fostering trust among clinicians, gaining deeper insights into their use cases, and facilitating their appropriate utilization.
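The two fairness metrics can be stated compactly: DPR is the min/max ratio of positive-prediction rates across groups, and EOR is the worse of the analogous ratios for true-positive and false-positive rates. A minimal sketch with illustrative names, assuming binary labels and non-degenerate groups:

```python
# Hedged sketch of the group-fairness metrics discussed above; a ratio of 1
# indicates parity across demographic groups.
import numpy as np

def demographic_parity_ratio(y_pred: np.ndarray, group: np.ndarray) -> float:
    """min/max ratio of positive-prediction rates across groups."""
    rates = [np.mean(y_pred[group == g]) for g in np.unique(group)]
    return min(rates) / max(rates)

def equalized_odds_ratio(y_true: np.ndarray, y_pred: np.ndarray,
                         group: np.ndarray) -> float:
    """Worst-case min/max ratio over TPR and FPR across groups."""
    tprs, fprs = [], []
    for g in np.unique(group):
        m = group == g
        tprs.append(np.mean(y_pred[m & (y_true == 1)]))  # true positive rate
        fprs.append(np.mean(y_pred[m & (y_true == 0)]))  # false positive rate
    return min(min(tprs) / max(tprs), min(fprs) / max(fprs))
```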
Affiliation(s)
- Zifan Jiang
- Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, Georgia, United States of America
- Department of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, Georgia, United States of America
| | - Salman Seyedi
- Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, Georgia, United States of America
| | - Emily Griner
- Department of Psychiatry and Behavioral Sciences, Emory University School of Medicine, Atlanta, Georgia, United States of America
| | - Ahmed Abbasi
- Department of IT, Analytics, and Operations, University of Notre Dame, Notre Dame, Indiana, United States of America
| | - Ali Bahrami Rad
- Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, Georgia, United States of America
| | - Hyeokhyen Kwon
- Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, Georgia, United States of America
| | - Robert O Cotes
- Department of Psychiatry and Behavioral Sciences, Emory University School of Medicine, Atlanta, Georgia, United States of America
| | - Gari D Clifford
- Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, Georgia, United States of America
- Department of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, Georgia, United States of America
| |
|
239
|
Qiao T, Xie S, Chen Y, Retraint F, Luo X. Fully Unsupervised Deepfake Video Detection Via Enhanced Contrastive Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:4654-4668. [PMID: 38252582 DOI: 10.1109/tpami.2024.3356814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/24/2024]
Abstract
Nowadays, Deepfake videos are widely spread over the Internet, severely impairing public trust and social security. Although more and more reliable detectors have recently sprung up to resist this new-emerging tampering technique, some challenging issues still need to be addressed; in particular, most Deepfake video detectors built on a supervised mechanism require a large number of accurately labeled samples for training. When the labeled training samples are insufficient or the training data are maliciously poisoned by adversaries, a supervised classifier is probably not reliable for detection. To tackle that tough issue, we propose a fully unsupervised Deepfake detector: throughout the whole procedure of training and testing, no information about the true labels of samples is used. First, we design a novel pseudo-label generator for labeling the training samples, where traditional hand-crafted features are used to characterize both types of samples. Second, the training samples with pseudo-labels are fed into the proposed enhanced contrastive learner, in which discriminative features are further extracted and continually refined by iteration under the guidance of the contrastive loss. Last, relying on inter-frame correlation, we complete the final binary classification between real and fake videos. A large set of experimental results empirically verifies the effectiveness of our proposed unsupervised Deepfake detector on benchmark datasets including FF++, Celeb-DF, DFD, DFDC, and UADFV. Furthermore, our detector is superior to the current unsupervised method and comparable to the baseline supervised methods. More importantly, when facing labeled data poisoned by malicious adversaries or insufficient training data, our unsupervised Deepfake detector retains its powerful superiority.
|
240
|
Wu Z, Weng Z, Peng W, Yang X, Li A, Davis LS, Jiang YG. Building an Open-Vocabulary Video CLIP Model With Better Architectures, Optimization and Data. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:4747-4762. [PMID: 38261478 DOI: 10.1109/tpami.2024.3357503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/25/2024]
Abstract
Despite significant results achieved by Contrastive Language-Image Pretraining (CLIP) in zero-shot image recognition, limited effort has been made to explore its potential for zero-shot video recognition. This paper presents Open-VCLIP++, a simple yet effective framework that adapts CLIP into a strong zero-shot video classifier capable of identifying novel actions and events during testing. Open-VCLIP++ minimally modifies CLIP to capture spatial-temporal relationships in videos, thereby creating a specialized video classifier while striving for generalization. We formally demonstrate that training Open-VCLIP++ is tantamount to continual learning with zero historical data. To address this problem, we introduce Interpolated Weight Optimization, a technique that leverages the advantages of weight interpolation during both training and testing. Furthermore, we build upon large language models to produce fine-grained video descriptions. These detailed descriptions are further aligned with video features, facilitating a better transfer of CLIP to the video domain. Our approach is evaluated on three widely used action recognition datasets, following a variety of zero-shot evaluation protocols. The results demonstrate that our method surpasses existing state-of-the-art techniques by significant margins. Specifically, we achieve zero-shot accuracy scores of 88.1%, 58.7%, and 81.2% on the UCF, HMDB, and Kinetics-600 datasets respectively, outpacing the best-performing alternative methods by 8.5%, 8.2%, and 12.3%. We also evaluate our approach on the MSR-VTT video-text retrieval dataset, where it delivers competitive video-to-text and text-to-video retrieval performance while utilizing substantially less fine-tuning data compared to other methods.
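The weight-interpolation idea can be sketched in a few lines: the fine-tuned weights are blended with the frozen pretrained weights by a convex combination. The sketch below assumes plain PyTorch state dicts and a single mixing coefficient; Open-VCLIP++'s actual optimization schedule is more involved.

```python
# Hedged sketch of weight interpolation in the spirit of Interpolated Weight
# Optimization: blend pretrained and fine-tuned weights to trade
# specialization against zero-shot generalization.
import torch

def interpolate_weights(pretrained: dict, finetuned: dict, alpha: float) -> dict:
    """Convex combination (1 - alpha) * pretrained + alpha * finetuned."""
    assert pretrained.keys() == finetuned.keys()
    return {
        k: torch.lerp(v, finetuned[k], alpha) if v.is_floating_point()
        else v.clone()  # integer buffers (e.g., counters) are copied as-is
        for k, v in pretrained.items()
    }

# Usage (illustrative): model.load_state_dict(interpolate_weights(w0, w_ft, 0.5))
```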
|
241
|
Kim S, Chae DK. What Does a Model Really Look at?: Extracting Model-Oriented Concepts for Explaining Deep Neural Networks. IEEE Trans Pattern Anal Mach Intell 2024; 46:4612-4624. [PMID: 38261481 DOI: 10.1109/tpami.2024.3357717]
Abstract
Model explainability is a crucial ingredient for building trustworthy AI systems, especially in applications that require reliability, such as automated driving and diagnosis. Many explainability methods have been studied in the literature. Among them, this article focuses on a line of research that visually explains a pre-trained image classification model, such as a Convolutional Neural Network, by discovering the concepts the model has learned, the so-called concept-based explanation. Previous concept-based explanation methods rely on human definitions of concepts (e.g., the Broden dataset) or on superpixel segmentation techniques such as SLIC (Simple Linear Iterative Clustering). However, we argue that the concepts identified by those methods may show image parts that align with a human perspective, or that were cropped by a segmentation method, rather than purely reflecting the model's own perspective. We propose Model-Oriented Concept Extraction (MOCE), a novel approach that extracts key concepts based solely on the model itself, thereby capturing the model's unique perspective unaffected by any external factors. Experimental results on various pre-trained models confirm the advantages of extracting concepts that truly represent the model's point of view.
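One common way to extract concepts from a model's own activations, with no human concept dataset or superpixel step, is to factorize a layer's post-ReLU feature maps so each factor behaves as a candidate concept direction. The NMF-based sketch below illustrates that general idea; the layer choice, shapes, and factorization method are assumptions, not the exact MOCE procedure.

# Sketch: factorize a layer's non-negative activations so each factor acts as
# a candidate "concept" direction learned by the model itself. The shapes and
# the use of NMF are illustrative assumptions.
import numpy as np
from sklearn.decomposition import NMF

n_images, h, w, c = 32, 7, 7, 512
acts = np.abs(np.random.default_rng(0).normal(size=(n_images, h, w, c)))  # placeholder activations

A = acts.reshape(-1, c)                      # (n_images*h*w, channels)
nmf = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
presence = nmf.fit_transform(A)              # how strongly each location expresses each concept
concepts = nmf.components_                   # (10, channels): concept directions in feature space

maps = presence.reshape(n_images, h, w, 10)  # per-image, per-concept presence maps for visualization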
|
242
|
Bannone E, Collins T, Esposito A, Cinelli L, De Pastena M, Pessaux P, Felli E, Andreotti E, Okamoto N, Barberio M, Felli E, Montorsi RM, Ingaglio N, Rodríguez-Luna MR, Nkusi R, Marescaux J, Hostettler A, Salvia R, Diana M. Surgical optomics: hyperspectral imaging and deep learning towards precision intraoperative automatic tissue recognition-results from the EX-MACHYNA trial. Surg Endosc 2024; 38:3758-3772. [PMID: 38789623 DOI: 10.1007/s00464-024-10880-1]
Abstract
BACKGROUND Hyperspectral imaging (HSI), combined with machine learning, can identify characteristic tissue signatures, enabling automatic tissue recognition during surgery. This study aims to develop the first HSI-based automatic abdominal tissue recognition with human data in a prospective bi-center setting. METHODS Data were collected from patients undergoing elective open abdominal surgery at two international tertiary referral hospitals from September 2020 to June 2021. HS images were captured at various time points throughout the surgical procedure. The resulting RGB images were annotated with 13 distinct organ labels. Convolutional Neural Networks (CNNs) were employed for the analysis, with both external and internal validation settings utilized. RESULTS A total of 169 patients were included, 73 (43.2%) from Strasbourg and 96 (56.8%) from Verona. The internal validation setting combined patients from both centers into a single cohort, randomly allocated to the training (127 patients, 75.1%, 585 images) and test sets (42 patients, 24.9%, 181 images); this setting showed the best performance. The highest true positive rates were achieved for the skin (100%) and the liver (97%). Misclassifications involved tissues with a similar embryological origin (omentum and mesentery: 32%) or with overlapping boundaries (liver and hepatic ligament: 22%). The median Dice score for ten tissue classes exceeded 80%. CONCLUSION To improve automatic surgical scene segmentation and drive clinical translation, accurate multicenter HSI datasets are essential, but further work is needed to quantify the clinical value of HSI. HSI might become part of a new omics science, namely surgical optomics, which uses light to extract quantifiable tissue features during surgery.
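The per-class Dice scores reported here can be computed directly from predicted and annotated label maps; the sketch below assumes integer label maps with one id per organ class.

# Sketch: per-class Dice between a predicted label map and the annotation.
# Label maps as integer arrays with one id per organ class is an assumption.
import numpy as np

def dice_per_class(pred, truth, n_classes):
    scores = {}
    for k in range(n_classes):
        p, t = (pred == k), (truth == k)
        denom = p.sum() + t.sum()
        scores[k] = 2.0 * np.logical_and(p, t).sum() / denom if denom else float("nan")
    return scores

pred  = np.random.default_rng(0).integers(0, 13, size=(480, 640))  # placeholder predictions
truth = np.random.default_rng(1).integers(0, 13, size=(480, 640))
print(dice_per_class(pred, truth, n_classes=13))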
Affiliation(s)
- Elisa Bannone
- Research Institute Against Digestive Cancer (IRCAD), 67000, Strasbourg, France.
- Department of General and Pancreatic Surgery, The Pancreas Institute, University of Verona Hospital Trust, P.Le Scuro 10, 37134, Verona, Italy.
| | - Toby Collins
- Research Institute Against Digestive Cancer (IRCAD), 67000, Strasbourg, France
| | - Alessandro Esposito
- Department of General and Pancreatic Surgery, The Pancreas Institute, University of Verona Hospital Trust, P.Le Scuro 10, 37134, Verona, Italy
| | - Lorenzo Cinelli
- Research Institute Against Digestive Cancer (IRCAD), 67000, Strasbourg, France
- Department of Gastrointestinal Surgery, San Raffaele Hospital IRCCS, Milan, Italy
| | - Matteo De Pastena
- Department of General and Pancreatic Surgery, The Pancreas Institute, University of Verona Hospital Trust, P.Le Scuro 10, 37134, Verona, Italy
| | - Patrick Pessaux
- Research Institute Against Digestive Cancer (IRCAD), 67000, Strasbourg, France
- Department of General, Digestive, and Endocrine Surgery, University Hospital of Strasbourg, Strasbourg, France
- Institute of Viral and Liver Disease, Inserm U1110, University of Strasbourg, Strasbourg, France
| | - Emanuele Felli
- Research Institute Against Digestive Cancer (IRCAD), 67000, Strasbourg, France
- Department of General, Digestive, and Endocrine Surgery, University Hospital of Strasbourg, Strasbourg, France
- Institute of Viral and Liver Disease, Inserm U1110, University of Strasbourg, Strasbourg, France
| | - Elena Andreotti
- Department of General and Pancreatic Surgery, The Pancreas Institute, University of Verona Hospital Trust, P.Le Scuro 10, 37134, Verona, Italy
| | - Nariaki Okamoto
- Research Institute Against Digestive Cancer (IRCAD), 67000, Strasbourg, France
- Photonics Instrumentation for Health, iCube Laboratory, University of Strasbourg, Strasbourg, France
| | - Manuel Barberio
- Research Institute Against Digestive Cancer (IRCAD), 67000, Strasbourg, France
- General Surgery Department, Ospedale Cardinale G. Panico, Tricase, Italy
| | - Eric Felli
- Research Institute Against Digestive Cancer (IRCAD), 67000, Strasbourg, France
- Department of Visceral Surgery and Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
| | - Roberto Maria Montorsi
- Department of General and Pancreatic Surgery, The Pancreas Institute, University of Verona Hospital Trust, P.Le Scuro 10, 37134, Verona, Italy
| | - Naomi Ingaglio
- Department of General and Pancreatic Surgery, The Pancreas Institute, University of Verona Hospital Trust, P.Le Scuro 10, 37134, Verona, Italy
| | - María Rita Rodríguez-Luna
- Research Institute Against Digestive Cancer (IRCAD), 67000, Strasbourg, France
- Photonics Instrumentation for Health, iCube Laboratory, University of Strasbourg, Strasbourg, France
| | - Richard Nkusi
- Research Institute Against Digestive Cancer (IRCAD), 67000, Strasbourg, France
| | - Jacques Marescaux
- Research Institute Against Digestive Cancer (IRCAD), 67000, Strasbourg, France
| | | | - Roberto Salvia
- Department of General and Pancreatic Surgery, The Pancreas Institute, University of Verona Hospital Trust, P.Le Scuro 10, 37134, Verona, Italy
| | - Michele Diana
- Photonics Instrumentation for Health, iCube Laboratory, University of Strasbourg, Strasbourg, France
- Department of Surgery, University Hospital of Geneva, Geneva, Switzerland
| |
|
243
|
Kuo JC, Chan W, Leon-Novelo L, Lairson DR, Brown A, Fujimoto K. Latent classification model for censored longitudinal binary outcome. Stat Med 2024. [PMID: 38951953 DOI: 10.1002/sim.10156]
Abstract
Latent classification models are a class of statistical methods for identifying unobserved class membership among study samples using observed data. In this study, we propose a latent classification model that takes a censored longitudinal binary outcome variable and uses its changing pattern over time to predict individuals' latent class membership. Assuming the time-dependent outcome variables follow a continuous-time Markov chain, the proposed method has two primary goals: (1) estimate the distribution of the latent classes and predict individuals' class membership, and (2) estimate the class-specific transition rates and rate ratios. To assess the model's performance, we conducted a simulation study and verified that our algorithm produces accurate model estimates (ie, small bias) with reasonable confidence intervals (ie, achieving approximately 95% coverage probability). Furthermore, we compared our model to four existing latent class models and demonstrated that our approach yields higher prediction accuracy for the latent classes. We applied the proposed method to COVID-19 data collected in Houston, Texas, US between January 1, 2021 and December 31, 2021. Early reports on the COVID-19 pandemic showed that the severity of a SARS-CoV-2 infection tends to vary greatly across cases. We found that while demographic characteristics explain some of the differences in individuals' experience with COVID-19, some unaccounted-for latent variables were also associated with the disease.
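For a continuous-time Markov chain, the transition probabilities over an elapsed interval t follow from the matrix exponential of the rate matrix, which is the quantity the class-specific transition rates parameterize. The two-state rates below are placeholders, not estimates from the study.

# Sketch: for a continuous-time Markov chain with rate matrix Q (rows sum to
# zero), the transition probabilities over time t are P(t) = expm(Q * t).
# The two-state rates below are placeholder values.
import numpy as np
from scipy.linalg import expm

Q = np.array([[-0.3,  0.3],   # e.g., negative to positive at rate 0.3
              [ 0.5, -0.5]])  # positive to negative at rate 0.5

P_30 = expm(Q * 30.0)         # rows: current state; columns: state 30 time units later
assert np.allclose(P_30.sum(axis=1), 1.0)
print(P_30)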
Affiliation(s)
- Jacky C Kuo
- Department of Biostatistics and Data Science, University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Wenyaw Chan
- Department of Biostatistics and Data Science, University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Luis Leon-Novelo
- Department of Biostatistics and Data Science, University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - David R Lairson
- Department of Management, Policy and Community Health, University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Armand Brown
- Bureau of Epidemiology, Houston Health Department, Houston, Texas, USA
| | - Kayo Fujimoto
- Department of Health Promotion and Behavioral Sciences, University of Texas Health Science Center at Houston, Houston, Texas, USA
| |
|
244
|
Strong JS, Furube T, Takeuchi M, Kawakubo H, Maeda Y, Matsuda S, Fukuda K, Nakamura R, Kitagawa Y. Evaluating surgical expertise with AI-based automated instrument recognition for robotic distal gastrectomy. Ann Gastroenterol Surg 2024; 8:611-619. [PMID: 38957567 PMCID: PMC11216797 DOI: 10.1002/ags3.12784]
Abstract
Introduction The complexity of robotic distal gastrectomy (RDG) gives reason to assess surgeons' surgical skill, as varying levels of surgical skill affect patient outcomes. We aim to investigate how a novel artificial intelligence (AI) model can be used to evaluate surgical skill in RDG by recognizing surgical instruments. Methods Fifty-five consecutive robotic surgical videos of RDG for gastric cancer were analyzed. We used DeepLab, a deep learning-based segmentation model, trained on 1234 manually annotated images and tested on 149 annotated images for accuracy. Deep learning metrics such as Intersection over Union (IoU) and accuracy were assessed, and experienced and non-experienced surgeons were compared based on their usage of instruments during infrapyloric lymph node dissection. Results We annotated 540 Cadiere forceps, 898 Fenestrated bipolars, 359 Suction tubes, 307 Maryland bipolars, 688 Harmonic scalpels, 400 Staplers, and 59 Large clips. The average IoU and accuracy were 0.82 ± 0.12 and 87.2 ± 11.9%, respectively. Moreover, the AI-predicted percentage of each instrument's usage relative to the overall infrapyloric lymphadenectomy duration was compared between groups: usage of the Stapler and Large clip was significantly shorter in the experienced group than in the non-experienced group. Conclusions This study is the first to report that surgical skill in RDG can be successfully and accurately assessed by an AI model. Our AI automatically generates instance segmentations of the surgical instruments present in this procedure. This technology enables unbiased, more accessible assessment of RDG surgical skill.
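The two quantities driving this analysis, per-frame Intersection over Union and each instrument's share of the procedure duration, can both be sketched in a few lines; the mask shapes, frame counts, and label format below are assumptions.

# Sketch: mean IoU for one instrument class across frames, plus each
# instrument's share of the procedure from per-frame predictions.
import numpy as np

def iou(mask_pred, mask_true):
    inter = np.logical_and(mask_pred, mask_true).sum()
    union = np.logical_or(mask_pred, mask_true).sum()
    return inter / union if union else float("nan")

rng = np.random.default_rng(0)
pred = rng.random((10, 120, 160)) > 0.5   # placeholder per-frame binary masks
true = rng.random((10, 120, 160)) > 0.5
print("mean IoU:", np.nanmean([iou(p, t) for p, t in zip(pred, true)]))

# Usage share: fraction of frames in which each of 7 instruments appears.
frame_labels = rng.integers(0, 7, size=3000)  # predicted instrument id per frame (placeholder)
usage = np.bincount(frame_labels, minlength=7) / frame_labels.size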
Affiliation(s)
- James S. Strong
- Department of Surgery, Keio University School of Medicine, Tokyo, Japan
- Harvard College, Harvard University, Cambridge, Massachusetts, USA
| | - Tasuku Furube
- Department of Surgery, Keio University School of Medicine, Tokyo, Japan
| | - Masashi Takeuchi
- Department of Surgery, Keio University School of Medicine, Tokyo, Japan
| | | | - Yusuke Maeda
- Department of Surgery, Keio University School of Medicine, Tokyo, Japan
| | - Satoru Matsuda
- Department of Surgery, Keio University School of Medicine, Tokyo, Japan
| | - Kazumasa Fukuda
- Department of Surgery, Keio University School of Medicine, Tokyo, Japan
| | - Rieko Nakamura
- Department of Surgery, Keio University School of Medicine, Tokyo, Japan
| | - Yuko Kitagawa
- Department of Surgery, Keio University School of Medicine, Tokyo, Japan
| |
|
245
|
Hiremath A, Viswanathan VS, Bera K, Shiradkar R, Yuan L, Armitage K, Gilkeson R, Ji M, Fu P, Gupta A, Lu C, Madabhushi A. Deep learning reveals lung shape differences on baseline chest CT between mild and severe COVID-19: A multi-site retrospective study. Comput Biol Med 2024; 177:108643. [PMID: 38815485 PMCID: PMC11188049 DOI: 10.1016/j.compbiomed.2024.108643]
Abstract
Severe COVID-19 can lead to extensive lung disease causing lung architectural distortion. In this study we employed machine learning and statistical atlas-based approaches to explore possible changes in lung shape among COVID-19 patients and evaluated whether the extent of these changes was associated with COVID-19 severity. On a large multi-institutional dataset (N = 3443), three populations were defined: (a) healthy (no COVID-19), (b) mild COVID-19 (no ventilator required), and (c) severe COVID-19 (ventilator required), and lung shape differences between them were explored using baseline chest CT. Significant lung shape differences were observed along the mediastinal surfaces of the lungs across all severities of COVID-19 disease. Additionally, differences were seen on the basal surfaces of the lung when healthy and severe COVID-19 patients were compared. Finally, an AI model (a 3D residual convolutional network) characterizing these shape differences, coupled with lung infiltrates (ground-glass opacities and consolidation regions), was found to be associated with COVID-19 severity.
Affiliation(s)
- Amogh Hiremath
- Case Western Reserve University, Department of Biomedical Engineering, Cleveland, OH, USA; Picture Health, Cleveland, OH, USA
| | | | - Kaustav Bera
- University Hospitals Cleveland Medical Center, Department of Radiology, Cleveland, OH, USA
| | | | - Lei Yuan
- Renmin Hospital of Wuhan University, Department of Information Center, Wuhan, Hubei, China
| | - Keith Armitage
- University Hospitals Cleveland Medical Center, Department of Infectious Diseases, Cleveland, OH, USA
| | - Robert Gilkeson
- University Hospitals Cleveland Medical Center, Department of Radiology, Cleveland, OH, USA
| | - Mengyao Ji
- Renmin Hospital of Wuhan University, Department of Gastroenterology, Wuhan, Hubei, China
| | - Pingfu Fu
- Case Western Reserve University, Department of Population and Quantitative Health Sciences, Cleveland, OH, USA
| | - Amit Gupta
- University Hospitals Cleveland Medical Center, Department of Radiology, Cleveland, OH, USA
| | - Cheng Lu
- Guangdong Provincial People's Hospital, Department of Radiology, Guangdong Academy of Medical Sciences, Guangzhou, China; Guangdong Provincial People's Hospital, Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Academy of Medical Sciences, Guangzhou, China; Guangdong Provincial People's Hospital, Medical Research Center, Guangdong Academy of Medical Sciences, China
| | - Anant Madabhushi
- Georgia Institute of Technology and Emory University, Radiology and Imaging Sciences, Biomedical Informatics (BMI) and Pathology, GA, USA; Atlanta Veterans Administration Medical Center, GA, USA.
| |
|
246
|
Stewart EEM, Fleming RW, Schütz AC. A simple optical flow model explains why certain object viewpoints are special. Proc Biol Sci 2024; 291:20240577. [PMID: 38981528 DOI: 10.1098/rspb.2024.0577]
Abstract
A core challenge in perception is recognizing objects across the highly variable retinal input that occurs when objects are viewed from different directions (e.g. front versus side views). It has long been known that certain views are of particular importance, but it remains unclear why. We reasoned that characterizing the computations underlying visual comparisons between objects could explain the privileged status of certain qualitatively special views. We measured pose discrimination for a wide range of objects, finding large variations in performance depending on the object and the viewing angle, with front and back views yielding particularly good discrimination. Strikingly, a simple and biologically plausible computational model based on measuring the projected three-dimensional optical flow between views of objects accurately predicted both successes and failures of discrimination performance. This provides a computational account of why certain views have a privileged status.
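The model's core quantity, the projected displacement of an object's 3D surface points between two views, can be approximated as in the sketch below; the orthographic projection and the use of mean flow magnitude as the discriminability proxy are simplifying assumptions.

# Sketch: a proxy for the paper's view-comparison signal. Rotate an object's
# 3D points by a small pose change, project to the image plane, and measure
# the mean displacement ("flow"). Orthographic projection is an assumption.
import numpy as np

def mean_flow(points3d, angle_rad):
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])  # rotation about the vertical axis
    rotated = points3d @ R.T
    flow = rotated[:, :2] - points3d[:, :2]           # image-plane displacement per point
    return np.linalg.norm(flow, axis=1).mean()

pts = np.random.default_rng(0).normal(size=(500, 3))  # placeholder object surface points
print(mean_flow(pts, np.deg2rad(10)))                 # larger flow ~ easier pose discrimination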
Affiliation(s)
- Emma E M Stewart
- School of Biological and Behavioural Sciences, Queen Mary University of London, London E1 4NS, UK
- Department of Experimental and Biological Psychology, Queen Mary University of London, London E1 4NS, UK
- Centre for Brain and Behaviour, Queen Mary University of London, London E1 4NS, UK
| | - Roland W Fleming
- Department of Experimental Psychology, Justus Liebig University Giessen, Giessen 35394, Germany
- Centre for Mind, Brain, and Behaviour (CMBB), University of Marburg and Justus Liebig University Giessen, Giessen 35032, Germany
| | - Alexander C Schütz
- Centre for Mind, Brain, and Behaviour (CMBB), University of Marburg and Justus Liebig University Giessen, Giessen 35032, Germany
- General and Experimental Psychology, University of Marburg, Marburg 35032, Germany
| |
|
247
|
Jeong HK, Park C, Jiang SW, Nicholas M, Chen S, Henao R, Kheterpal M. Image Quality Assessment Using Convolutional Neural Network in Clinical Skin Images. JID Innov 2024; 4:100285. [PMID: 39036289 PMCID: PMC11260318 DOI: 10.1016/j.xjidi.2024.100285]
Abstract
The image quality received for clinical evaluation is often suboptimal. Our goal was to develop an image quality analysis tool to assess patient- and primary care physician-derived images using a deep learning model. The dataset included patient- and primary care physician-derived images from August 21, 2018 to June 30, 2022 with 4 unique quality labels. A VGG16 model was fine-tuned on the input data, and the optimal threshold was determined by Youden's index. Ordinal labels were transformed to binary labels using a majority vote because the model distinguishes between 2 categories (good vs bad). At a threshold of 0.587, the area under the curve for the test set was 0.885 (95% confidence interval = 0.838-0.933); sensitivity, specificity, positive predictive value, and negative predictive value were 0.829, 0.784, 0.906, and 0.645, respectively. Independent validation on 300 additional images (from patients and primary care physicians) demonstrated areas under the curve of 0.864 (95% confidence interval = 0.818-0.909) and 0.902 (95% confidence interval = 0.85-0.95), respectively. The sensitivity, specificity, positive predictive value, and negative predictive value for the 300 images were 0.827, 0.800, 0.959, and 0.450, respectively. We demonstrate a practical approach to improving image quality in the clinical workflow. Although users may have to capture additional images, this is offset by improved workload and efficiency for clinical teams.
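Choosing the operating threshold by Youden's index, as done here, maximizes TPR minus FPR over the ROC curve; in the sketch below the scores and labels are placeholders.

# Sketch: selecting the operating threshold by Youden's index (J = TPR - FPR).
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)                                    # placeholder labels
y_score = np.clip(y_true * 0.3 + rng.normal(0.4, 0.2, size=500), 0, 1)   # placeholder model outputs

fpr, tpr, thresholds = roc_curve(y_true, y_score)
best = np.argmax(tpr - fpr)        # index maximizing Youden's J statistic
print("optimal threshold:", thresholds[best])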
Affiliation(s)
- Hyeon Ki Jeong
- Department of Biostatistics & Bioinformatics, Duke University School of Medicine, Durham, North Carolina, USA
| | - Christine Park
- Duke University School of Medicine, Durham, North Carolina, USA
| | - Simon W. Jiang
- Duke University School of Medicine, Durham, North Carolina, USA
| | - Matilda Nicholas
- Department of Dermatology, Duke University School of Medicine, Durham, North Carolina, USA
| | - Suephy Chen
- Department of Dermatology, Duke University School of Medicine, Durham, North Carolina, USA
- Durham VA Medical Center, Durham, North Carolina, USA
| | - Ricardo Henao
- Department of Biostatistics & Bioinformatics, Duke University School of Medicine, Durham, North Carolina, USA
| | - Meenal Kheterpal
- Department of Dermatology, Duke University School of Medicine, Durham, North Carolina, USA
| |
|
248
|
Shen J, Zhao H, Deng W. Broad Learning System under Label Noise: A Novel Reweighting Framework with Logarithm Kernel and Mixture Autoencoder. Sensors (Basel) 2024; 24:4268. [PMID: 39001047 PMCID: PMC11244421 DOI: 10.3390/s24134268]
Abstract
The Broad Learning System (BLS) has demonstrated strong performance across a variety of problems. However, BLS based on the Minimum Mean Square Error (MMSE) criterion is highly sensitive to label noise. To enhance the robustness of BLS in environments with label noise, this paper designs a function called the Logarithm Kernel (LK) to reweight the samples when computing the output weights during BLS training, yielding a Logarithm Kernel-based BLS (L-BLS). Additionally, for image databases with numerous features, a Mixture Autoencoder (MAE) is designed to construct more representative feature nodes for BLS in complex label-noise environments. Two corresponding versions of BLS, MAEBLS and L-MAEBLS, were also developed for the MAE. Extensive experiments validate the robustness and effectiveness of the proposed L-BLS, and the MAE provides more representative feature nodes for the corresponding versions of BLS.
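The reweighting idea can be pictured as iteratively reweighted ridge regression in which a sample's weight shrinks logarithmically with its residual, so likely mislabeled samples contribute less to the output weights. The specific weight function below is a hypothetical stand-in for the paper's Logarithm Kernel, not its published form.

# Sketch: label-noise-robust output weights via iteratively reweighted ridge
# regression. The logarithmic down-weighting of large residuals is a
# hypothetical stand-in for the paper's Logarithm Kernel.
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(300, 50))   # feature/enhancement node outputs (placeholder)
y = rng.normal(size=(300, 1))    # possibly noisy labels (placeholder)

lam = 1e-2
w = np.ones(300)                 # per-sample weights, initially uniform
for _ in range(10):
    D = np.diag(w)
    W = np.linalg.solve(H.T @ D @ H + lam * np.eye(50), H.T @ D @ y)
    r = np.abs(H @ W - y).ravel()
    w = 1.0 / np.log(np.e + r**2)  # large residuals (likely noisy labels) get low weight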
Affiliation(s)
- Jiuru Shen
- College of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China
| | - Huimin Zhao
- College of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China
| | - Wu Deng
- College of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China
| |
|
249
|
Lee YW, Kim BG. Attention-based scale sequence network for small object detection. Heliyon 2024; 10:e32931. [PMID: 39021898 PMCID: PMC11253262 DOI: 10.1016/j.heliyon.2024.e32931]
Abstract
The remarkable recent development of deep learning technology has driven progress across many computer vision fields, with object recognition receiving particular attention. Nevertheless, recognition performance for small objects remains challenging, even though it is of utmost importance in realistic applications such as searching for missing persons through aerial photography. The core structure of most object recognition neural networks is the feature pyramid network (FPN), and You Only Look Once (YOLO) is the most widely used model following this structure. In this study, we propose an attention-based scale sequence network (ASSN), which improves on the scale sequence feature pyramid network (ssFPN) to enhance the performance of FPN-based detectors on small objects. ASSN is a lightweight attention module optimized for FPN-based detectors and is versatile enough to be applied to any model with a corresponding structure. The proposed ASSN improves average precision (AP) by up to 0.6% over the baselines (YOLOv7 and YOLOv8), and the AP for small objects (AP_S) improves by up to 1.9%. Furthermore, ASSN outperforms ssFPN while being lighter and more optimized, improving computational complexity and processing speed. ASSN is open-source, based on YOLO versions 7 and 8, and can be found in our public repository: https://github.com/smu-ivpl/ASSN.git.
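In the spirit of scale-sequence attention, the sketch below fuses feature-pyramid levels with softmax weights computed over the scale dimension; it is a generic illustration under simplifying assumptions, not the released ASSN module (see the authors' repository for the actual code).

# Sketch: fuse feature-pyramid levels with attention weights over the scale
# dimension. Nearest-neighbor upsampling and global-average scoring are
# simplifying assumptions, not the released ASSN module.
import numpy as np

def scale_attention_fuse(pyramid):
    """pyramid: list of (C, H_l, W_l) maps; returns a (C, H0, W0) fused map."""
    C, H0, W0 = pyramid[0].shape
    resized = []
    for f in pyramid:   # naive nearest-neighbor upsampling to the finest level
        ry, rx = H0 // f.shape[1], W0 // f.shape[2]
        resized.append(np.repeat(np.repeat(f, ry, axis=1), rx, axis=2))
    stack = np.stack(resized)               # (L, C, H0, W0)
    scores = stack.mean(axis=(1, 2, 3))     # one score per level (global average pooling)
    attn = np.exp(scores) / np.exp(scores).sum()  # softmax over the scale dimension
    return np.tensordot(attn, stack, axes=1)      # attention-weighted sum of levels

pyr = [np.random.rand(8, 32, 32), np.random.rand(8, 16, 16), np.random.rand(8, 8, 8)]
fused = scale_attention_fuse(pyr)           # (8, 32, 32)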
Affiliation(s)
- Young-Woon Lee
- Department of Computer Engineering, Sunmoon University, Asan, Republic of Korea
| | - Byung-Gyu Kim
- Division of Artificial Intelligence Engineering, Sookmyung Women's University, Seoul, Republic of Korea
| |
|
250
|
Park J, Soucy E, Segawa J, Mair R, Konkle T. Immersive scene representation in human visual cortex with ultra-wide-angle neuroimaging. Nat Commun 2024; 15:5477. [PMID: 38942766 PMCID: PMC11213904 DOI: 10.1038/s41467-024-49669-0]
Abstract
While human vision spans 220°, traditional functional MRI setups display images only within the central 10-15°. Thus, it remains unknown how the brain represents a scene perceived across the full visual field. Here, we introduce a method for ultra-wide-angle display and probe signatures of immersive scene representation. An unobstructed view of 175° is achieved by bouncing the projected image off angled mirrors onto a custom-built curved screen. To avoid perceptual distortion, scenes are created with a wide field of view from custom virtual environments. We find that immersive scene representation drives medial cortex with far-peripheral preferences but shows minimal modulation in classic scene regions. Further, scene- and face-selective regions maintain their content preferences even with extreme far-peripheral stimulation, highlighting that not all far-peripheral information is automatically integrated into the computations of scene regions. This work provides clarifying evidence on content versus peripheral preferences in scene representation and opens new avenues for research on immersive vision.
Affiliation(s)
- Jeongho Park
- Department of Psychology, Harvard University, Cambridge, MA, USA.
| | - Edward Soucy
- Center for Brain Science, Harvard University, Cambridge, MA, USA
| | - Jennifer Segawa
- Center for Brain Science, Harvard University, Cambridge, MA, USA
| | - Ross Mair
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Department of Radiology, Harvard Medical School, Boston, MA, USA
- Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
| | - Talia Konkle
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Kempner Institute for Biological and Artificial Intelligence, Harvard University, Boston, MA, USA
| |
|