1. Estimation of wheat protein content and wet gluten content based on fusion of hyperspectral and RGB sensors using machine learning algorithms. Food Chem 2024; 448:139103. PMID: 38547708. DOI: 10.1016/j.foodchem.2024.139103.
Abstract
Protein content (PC) and wet gluten content (WGC) are crucial indicators of wheat quality, playing a pivotal role in evaluating processing and baking performance. Original reflectance (OR), wavelet features (WF), and color indices (CI) were extracted from hyperspectral and RGB sensors. A Pearson correlation, competitive adaptive reweighted sampling (CARS), and variance inflation factor (VIF) screening pipeline was combined with four machine learning (ML) algorithms to model PC and WGC. As a result, three CIs, six ORs, and twelve WFs were selected for the PC and WGC datasets. For single-modal data, the back-propagation neural network exhibited superior accuracy, with estimation accuracy ranked WF > OR > CI. For multi-modal data, random forest regression paired with OR + WF + CI showed the highest validation accuracy. By Gini impurity, WF outweighed OR and CI in both the PC and WGC models. Combining ML with multi-modal data harnessed the synergies among remote sensing sources, substantially improving model precision and stability.
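To make the screening pipeline concrete, here is a minimal Python sketch of the Pearson and VIF stages under stated assumptions; the CARS step is omitted for brevity, and the thresholds `r_min` and `vif_max` are illustrative defaults, not values from the paper.

```python
import numpy as np

def pearson_screen(X, y, r_min=0.3):
    """Keep features whose absolute Pearson r with the target exceeds r_min."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.where(np.abs(r) >= r_min)[0]

def vif_screen(X, vif_max=10.0):
    """Iteratively drop the feature with the highest variance inflation factor."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        Xk = X[:, keep]
        vifs = []
        for i in range(len(keep)):
            # VIF_i = 1 / (1 - R^2_i), R^2_i from regressing feature i on the rest
            others = np.column_stack([np.delete(Xk, i, axis=1), np.ones(len(Xk))])
            beta, *_ = np.linalg.lstsq(others, Xk[:, i], rcond=None)
            resid = Xk[:, i] - others @ beta
            r2 = 1.0 - resid.var() / Xk[:, i].var()
            vifs.append(1.0 / max(1e-12, 1.0 - r2))
        worst = int(np.argmax(vifs))
        if vifs[worst] <= vif_max:
            break
        keep.pop(worst)
    return keep

# selected = vif_screen(X[:, pearson_screen(X, y)])  # indices into the screened set
```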
2. A double-branch convolutional neural network model for species identification based on multi-modal data. Spectrochim Acta A Mol Biomol Spectrosc 2024; 318:124454. PMID: 38788500. DOI: 10.1016/j.saa.2024.124454.
Abstract
For species identification, methods based on deep learning are becoming prevalent due to their data-driven and task-oriented nature. The most commonly used convolutional neural network (CNN) model has been applied successfully to Raman spectrum recognition. However, when faced with similar molecules or functional groups, the features of overlapping and weak peaks may not be fully extracted by a CNN, which can hinder accurate species identification. Given these practical challenges, fusing multi-modal data enables more comprehensive and accurate analysis of real samples than single-modal data. In this study, we propose a double-branch CNN, named SI-DBNet, that integrates Raman and image multi-modal data. For the spectral branch, we developed a one-dimensional convolutional neural network combining dilated convolutions with an efficient channel attention mechanism. The effectiveness of the model was demonstrated using the Grad-CAM method to visualize the key regions the model attends to. Compared to single-modal and multi-modal classification methods, SI-DBNet achieved superior performance with a classification accuracy of 98.8%. The proposed method provides a new reference for species identification based on multi-modal data fusion.
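As a rough illustration of the spectral branch described above, the following PyTorch sketch combines dilated one-dimensional convolutions with an efficient channel attention (ECA) block; the layer sizes are assumptions for illustration, not the authors' exact SI-DBNet configuration.

```python
import torch
import torch.nn as nn

class ECA1d(nn.Module):
    """Efficient channel attention for 1-D feature maps of shape (B, C, L)."""
    def __init__(self, k=5):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):
        w = x.mean(dim=-1, keepdim=True)        # (B, C, 1) global average pool
        w = self.conv(w.transpose(1, 2))        # (B, 1, C) conv across channels
        return x * torch.sigmoid(w.transpose(1, 2))

spectral_branch = nn.Sequential(
    nn.Conv1d(1, 16, 7, padding=3), nn.ReLU(),
    # dilation widens the receptive field to catch overlapping/weak peaks
    nn.Conv1d(16, 32, 7, dilation=2, padding=6), nn.ReLU(),
    ECA1d(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),      # -> (B, 32) spectral embedding
)

# embedding = spectral_branch(torch.randn(4, 1, 1024))  # 4 spectra, 1024 points
```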
3. Machine learning integration of multi-modal analytical data for distinguishing abnormal botanical drugs and its application in Guhong injection. Chin Med 2024; 19:2. PMID: 38163913. PMCID: PMC10759515. DOI: 10.1186/s13020-023-00873-y.
Abstract
BACKGROUND Determining batch-to-batch consistency of botanical drugs (BDs) has long been the bottleneck in quality evaluation, primarily due to the chemical diversity inherent in BDs. This diversity is an obstacle to comprehensive standardization, and a single detection mode is likely to yield substandard analysis results because different classes of structures possess distinct physicochemical properties. While multi-modal data offer a workaround for multi-target standardization, processing the information from diverse sources is critical to classification accuracy. METHODS In this research, multi-modal data for 78 batches of Guhong injections (GHIs), consisting of 52 normal and 26 abnormal samples, were acquired by HPLC-UV, HPLC-ELSD, and quantitative 1H NMR (q1HNMR). The data from each mode were individually used for Pearson correlation coefficient (PCC) calculation and partial least squares-discriminant analysis (PLS-DA). A mid-level data fusion method combining the qualitative and quantitative information was then used to establish a support vector machine (SVM) model for evaluating the batch-to-batch consistency of GHIs. RESULTS The outcomes showed that datasets from a single detection mode (e.g., data from UV detectors only) are inadequate for accurately assessing product quality. The mid-level data fusion strategy enabled classification of normal and abnormal batches of GHIs with 100% accuracy. CONCLUSIONS A quality assessment strategy leveraging mid-level data fusion was successfully developed for batch-to-batch consistency evaluation of GHIs. This study highlights the utility of data from different detection modes for the quality evaluation of BDs and the advantages of data fusion for handling multi-modal data. Used jointly, these modes can significantly increase classification accuracy and serve as a capable tool for studies of other BDs.
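A minimal sketch of the mid-level fusion idea, assuming PCA scores stand in for the per-mode qualitative/quantitative descriptors actually used; the block names and component count are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def mid_level_fuse(blocks, n_components=5):
    """blocks: list of (n_samples, n_vars) arrays, one per detection mode.
    Extract per-mode features first, then concatenate (mid-level fusion)."""
    feats = [PCA(n_components).fit_transform(StandardScaler().fit_transform(b))
             for b in blocks]
    return np.hstack(feats)

# X_uv, X_elsd, X_nmr: per-mode data matrices; y: normal/abnormal labels
# clf = SVC(kernel="rbf").fit(mid_level_fuse([X_uv, X_elsd, X_nmr]), y)
```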
4. Integration of incomplete multi-omics data using Knowledge Distillation and Supervised Variational Autoencoders for disease progression prediction. J Biomed Inform 2023; 147:104512. PMID: 37813325. DOI: 10.1016/j.jbi.2023.104512.
Abstract
OBJECTIVE The rapid advancement of high-throughput technologies in the biomedical field has resulted in the accumulation of diverse omics data types, such as mRNA expression, DNA methylation, and microRNA expression, for studying various diseases. Integrating these multi-omics datasets enables a comprehensive understanding of the molecular basis of cancer and facilitates accurate prediction of disease progression. METHODS Conventional approaches, however, face challenges due to the curse of dimensionality. This paper introduces a novel framework called Knowledge Distillation and Supervised Variational AutoEncoders utilizing View Correlation Discovery Network (KD-SVAE-VCDN) to address the integration of high-dimensional multi-omics data with limited common samples. Through our experimental evaluation, we demonstrate that the proposed KD-SVAE-VCDN architecture accurately predicts the progression of breast and kidney carcinoma by effectively classifying patients as long- or short-term survivors. Furthermore, our approach outperforms other state-of-the-art multi-omics integration models. RESULTS Our findings highlight the efficacy of the KD-SVAE-VCDN architecture in predicting the disease progression of breast and kidney carcinoma. By enabling the classification of patients based on survival outcomes, our model contributes to personalized and targeted treatment. Its favorable performance compared with several existing models suggests its potential to advance cancer understanding and management. CONCLUSION The development of a robust predictive model capable of accurately forecasting disease progression at the time of diagnosis holds immense promise for advancing personalized medicine. By leveraging multi-omics data integration, the proposed KD-SVAE-VCDN framework offers an effective solution to this challenge, paving the way for more precise and tailored treatment strategies for patients with different types of cancer.
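The supervised variational autoencoder component can be summarized by its training objective. The sketch below shows a generic supervised-VAE loss (reconstruction + KL + classification); the weights `beta` and `gamma` are illustrative assumptions, and the distillation and VCDN stages are not shown.

```python
import torch
import torch.nn.functional as F

def svae_loss(x_hat, x, mu, logvar, logits, y, beta=1.0, gamma=1.0):
    """Supervised-VAE objective: reconstruction + KL divergence + classification.
    beta/gamma weight the KL and supervision terms (illustrative defaults)."""
    rec = F.mse_loss(x_hat, x)                                     # reconstruction
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL to N(0, I)
    ce = F.cross_entropy(logits, y)                                # survival-class head
    return rec + beta * kl + gamma * ce
```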
5. DE-JANet: A unified network based on dual encoder and joint attention for Alzheimer's disease classification using multi-modal data. Comput Biol Med 2023; 165:107396. PMID: 37703717. DOI: 10.1016/j.compbiomed.2023.107396.
Abstract
Structural magnetic resonance imaging (sMRI), which can reflect cerebral atrophy, plays an important role in the early detection of Alzheimer's disease (AD). However, the information provided by analyzing only the morphological changes in sMRI is relatively limited, and assessment of the degree of atrophy is subjective. It is therefore valuable to combine sMRI with other clinical information to acquire complementary diagnostic information and achieve more accurate AD classification. Nevertheless, effectively fusing such multi-modal data remains challenging. In this paper, we propose DE-JANet, a unified AD classification network that integrates sMRI image data with non-image clinical data, such as age and Mini-Mental State Examination (MMSE) score, for more effective multi-modal analysis. DE-JANet consists of three key components: (1) a dual encoder module that extracts low-level features from the image and non-image data according to modality-specific encoding regularity, (2) a joint attention module that fuses the multi-modal features, and (3) a token classification module that performs AD-related classification on the fused multi-modal features. Evaluated on the ADNI dataset, DE-JANet achieves mean accuracies of 0.9722 for AD classification and 0.9538 for mild cognitive impairment (MCI) classification, superior to existing methods and indicating advanced performance on AD-related diagnosis tasks.
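As a hedged sketch of the joint attention idea, the PyTorch module below lets image tokens and clinical tokens attend to each other with standard multi-head cross-attention; the dimensions and pooling are assumptions, not the published DE-JANet design.

```python
import torch
import torch.nn as nn

class JointAttentionFusion(nn.Module):
    """Image tokens attend to clinical tokens and vice versa, then pool."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.img2clin = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.clin2img = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_tok, clin_tok):   # (B, N_img, dim), (B, N_clin, dim)
        a, _ = self.img2clin(img_tok, clin_tok, clin_tok)  # image queries
        b, _ = self.clin2img(clin_tok, img_tok, img_tok)   # clinical queries
        return torch.cat([a.mean(1), b.mean(1)], dim=-1)   # (B, 2 * dim)
```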
6. Replication and Refinement of Brain Age Model for adolescent development. bioRxiv [Preprint] 2023:2023.08.16.553472. PMID: 37645839. PMCID: PMC10462059. DOI: 10.1101/2023.08.16.553472.
Abstract
The discrepancy between chronological age and estimated brain age, known as the brain age gap, may serve as a biomarker revealing brain development and neuropsychiatric problems. This has motivated many studies focusing on accurate estimation of brain age using different features and models, whose generalizability is yet to be tested. Our recent study demonstrated that conventional machine learning models can achieve high accuracy on brain age prediction during development using only a small set of selected features from multimodal brain imaging data. In the current study, we tested the replicability of various brain age models on the Adolescent Brain Cognitive Development (ABCD) cohort and proposed a new refined model to improve the robustness of brain age prediction. Directly replicating existing brain age models, derived from the 8-22 year age range, on ABCD participants at baseline (9-10 years old) and the year-two follow-up (11-12 years old) indicates that pre-trained models capture the overall mean age but fail to precisely estimate brain age variation within a narrow age range. The refined model, which combines the broad prediction of the pre-trained model with granular information within the narrow age range, achieved the best performance, with mean absolute errors of 0.49 and 0.48 years on the baseline and year-two data, respectively. The brain age gap yielded by the refined model showed significant associations with participants' information processing speed and verbal comprehension ability on the baseline data.
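One simple way to realize the refinement described above is stacking: feed the pre-trained model's broad prediction, together with features from the narrow age range, into a light correction model. The sketch below is a hypothetical illustration of that idea, not the authors' exact procedure.

```python
import numpy as np
from sklearn.linear_model import Ridge

def refine_brain_age(pred_broad_tr, X_tr, age_tr, pred_broad_te, X_te):
    """Fit a light correction model on the pre-trained (broad) prediction plus
    narrow-range features, then predict refined brain age for test subjects."""
    corr = Ridge(alpha=1.0).fit(np.column_stack([pred_broad_tr, X_tr]), age_tr)
    return corr.predict(np.column_stack([pred_broad_te, X_te]))

# brain_age_gap = refine_brain_age(...) - chronological_age
```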
7. Predicting time-to-conversion for dementia of Alzheimer's type using multi-modal deep survival analysis. Neurobiol Aging 2023; 121:139-156. PMID: 36442416. PMCID: PMC10535369. DOI: 10.1016/j.neurobiolaging.2022.10.005.
Abstract
Dementia of Alzheimer's Type (DAT) is a complex disorder influenced by numerous factors, and it is difficult to predict an individual's progression trajectory from normal or mildly impaired cognition to DAT. An in-depth examination of multiple modalities of data may yield an accurate estimate of time-to-conversion to DAT for preclinical subjects at various stages of disease development. We used a deep-learning model designed for survival analysis to predict subjects' time-to-conversion to DAT using the baseline data of 401 subjects with 63 features from MRI, genetic, and CDC (Cognitive tests, Demographic, and CSF) data in the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Our study demonstrated that CDC data outperform genetic or MRI data in predicting DAT time-to-conversion for subjects with Mild Cognitive Impairment (MCI). On the other hand, genetic data provided the most predictive power for subjects with Normal Cognition (NC) at the time of the visit. Furthermore, combining MRI and genetic features improved time-to-event prediction over using either modality alone. Finally, adding CDC to any combination of features performed no better than using the CDC features alone.
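Deep survival models of this kind are commonly trained with the negative Cox partial log-likelihood over censored data. A minimal PyTorch version is sketched below, assuming `risk` is the network output; this is a generic formulation (Breslow handling of ties), not necessarily the paper's exact loss.

```python
import torch

def neg_cox_partial_log_likelihood(risk, time, event):
    """risk: (N,) model outputs; time: (N,) follow-up; event: (N,) 1 = converted.
    Breslow approximation for tied event times."""
    order = torch.argsort(time, descending=True)    # build risk sets via sorting
    risk, event = risk[order], event[order].float()
    log_risk_set = torch.logcumsumexp(risk, dim=0)  # log sum_{t_j >= t_i} e^risk_j
    return -torch.sum((risk - log_risk_set) * event) / event.sum().clamp(min=1.0)
```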
8. SurvivalCNN: A deep learning-based method for gastric cancer survival prediction using radiological imaging data and clinicopathological variables. Artif Intell Med 2022; 134:102424. PMID: 36462894. DOI: 10.1016/j.artmed.2022.102424.
Abstract
Radiological images have shown promising effects in patient prognostication. Deep learning provides a powerful approach for in-depth analysis of imaging data and for integrating multi-modal data for modeling. In this work, we propose SurvivalCNN, a deep learning structure for cancer patient survival prediction using CT imaging data and non-imaging clinical data. In SurvivalCNN, a supervised convolutional neural network is designed to extract volumetric image features, and radiomics features are also integrated to provide potentially complementary imaging information. Within SurvivalCNN, a novel multi-thread multi-layer perceptron module, SurvivalMLP, is proposed to perform survival prediction from censored survival data. We evaluate the proposed SurvivalCNN framework on a large clinical dataset of 1061 gastric cancer patients for both overall survival (OS) and progression-free survival (PFS) prediction. We compare SurvivalCNN to three different modeling methods and examine the effects of various sets of data/features used individually or in combination. With five-fold cross validation, our experimental results show that SurvivalCNN achieves average concordance indices of 0.849 and 0.783 for predicting OS and PFS, respectively, outperforming the compared state-of-the-art methods and the clinical model. After further validation, the proposed SurvivalCNN model may serve as a clinical tool to improve gastric cancer patient survival estimation and prognosis analysis.
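The concordance index reported above measures how often the model orders comparable patient pairs correctly. A plain, O(n^2) reference implementation is sketched below for clarity; production code would use an optimized library routine.

```python
import numpy as np

def concordance_index(time, event, risk):
    """Fraction of comparable pairs ordered correctly: the patient with the
    higher predicted risk should have the shorter observed survival."""
    num = den = 0.0
    for i in range(len(time)):
        if not event[i]:
            continue                    # pair anchor must be an observed event
        for j in range(len(time)):
            if time[j] > time[i]:       # j outlived i: a comparable pair
                den += 1
                num += 1.0 if risk[i] > risk[j] else 0.5 if risk[i] == risk[j] else 0.0
    return num / den
```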
9.
Abstract
Type 2 diabetes, a prevalent chronic disease worldwide, increases the risk of serious health consequences including heart and kidney disease. Forecasting diabetes progression can inform disease management strategies, thereby potentially reducing the likelihood or severity of these consequences. We use continuous glucose monitoring and actigraphy data from 54 individuals with Type 2 diabetes to predict their hemoglobin A1c, HDL cholesterol, LDL cholesterol, and triglyceride levels one year later. We use a combination of convolutional and recurrent neural networks to develop a deep neural network architecture that can learn the dynamic patterns in different sensors' data and combine those patterns with additional demographic and lab data. To demonstrate the generalizability of our models, we also evaluate their performance on an independent public dataset of individuals with Type 1 diabetes. Beyond diabetes, our approach could be useful for other serious and chronic physical illnesses where dynamic (e.g., multi-sensor) and static (e.g., demographic) data are combined to build predictive models.
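A minimal PyTorch sketch of the convolutional-plus-recurrent pattern described above, with static demographic/lab features concatenated before the output head; the channel counts, kernel sizes, and feature dimensions are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SensorForecaster(nn.Module):
    """Conv1d layers learn local patterns in CGM/actigraphy traces, an LSTM
    summarizes their dynamics, and static features join before the head."""
    def __init__(self, n_channels=2, n_static=8, n_targets=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, 9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, 9, padding=4), nn.ReLU(),
        )
        self.rnn = nn.LSTM(64, 64, batch_first=True)
        self.head = nn.Linear(64 + n_static, n_targets)

    def forward(self, x, static):            # x: (B, channels, T)
        h = self.conv(x).transpose(1, 2)     # (B, T/4, 64)
        _, (hn, _) = self.rnn(h)
        return self.head(torch.cat([hn[-1], static], dim=-1))
```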
10. Machine learning-based analysis of operator pupillary response to assess cognitive workload in clinical ultrasound imaging. Comput Biol Med 2021; 135:104589. PMID: 34198044. PMCID: PMC8404042. DOI: 10.1016/j.compbiomed.2021.104589.
Abstract
Introduction Pupillometry, the measurement of eye pupil diameter, is a well-established and objective modality correlated with cognitive workload. In this paper, we analyse the pupillary response of ultrasound imaging operators to assess their cognitive workload, captured while they undertake routine fetal ultrasound examinations. Our experiments and analysis are performed on real-world datasets obtained using remote eye-tracking under natural clinical conditions. Methods Our analysis pipeline involves careful temporal sequence (time-series) extraction, retrospectively matching the pupil diameter data with the tasks captured in the corresponding ultrasound scan video in a multi-modal data acquisition setup, followed by pre-processing of the pupil diameter data and calculation of pupillary response sequences. Exploratory statistical analysis of the operator pupillary responses compares their distributions between ultrasonographic tasks (fetal heart versus fetal brain) and levels of operator expertise (newly-qualified versus experienced operators). Machine learning is explored to automatically classify the temporal sequences by ultrasonographic task and operator experience, using temporal, spectral, and time-frequency features with classical (shallow) models and convolutional neural networks as deep learning models. Results Statistical analysis of the extracted pupillary responses shows significant variation across ultrasonographic tasks and operator expertise, suggesting different extents of cognitive workload in each case, as measured by pupillometry. The best-performing machine learning models achieve receiver operating characteristic (ROC) area under curve (AUC) values of 0.98 and 0.80 for ultrasonographic task classification and operator experience classification, respectively. Conclusion Cognitive workload can be successfully assessed from pupil diameter changes measured while ultrasound operators perform routine scans, and machine learning can discriminate both the undertaken task and scanning expertise from the pupillary response sequences as an index of the operators' cognitive workload. Because high cognitive workload can reduce operator efficiency and constrain decision-making, the ability to assess it objectively is a first step towards understanding these effects on operator performance in biomedical applications such as medical imaging.
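To illustrate the kind of temporal and spectral descriptors such a pipeline might compute from one pupillary response sequence, here is a small hypothetical example; the specific features and the 0.5 Hz band are assumptions, not the paper's feature set.

```python
import numpy as np

def pupil_features(diam, fs=60.0):
    """diam: 1-D pupil diameter sequence sampled at fs Hz."""
    t = np.arange(len(diam)) / fs
    power = np.abs(np.fft.rfft(diam - diam.mean())) ** 2
    freqs = np.fft.rfftfreq(len(diam), d=1.0 / fs)
    return {
        "mean_diameter": float(diam.mean()),
        "peak_dilation": float(diam.max() - diam.min()),
        "trend_slope": float(np.polyfit(t, diam, 1)[0]),    # temporal feature
        "low_freq_power": float(power[(freqs > 0) & (freqs <= 0.5)].sum()),
    }
```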
11. Improvement of deep cross-modal retrieval by generating real-valued representation. PeerJ Comput Sci 2021; 7:e491. PMID: 33987458. PMCID: PMC8093956. DOI: 10.7717/peerj-cs.491.
Abstract
Cross-modal retrieval (CMR) has attracted much attention in the research community because it enables flexible and comprehensive retrieval. The core challenge in CMR is the heterogeneity gap, which arises from the different statistical properties of multi-modal data. The most common solution for bridging the heterogeneity gap is representation learning, which generates a common sub-space. In this work, we propose a framework called Improvement of Deep Cross-Modal Retrieval (IDCMR), which generates real-valued representations. IDCMR preserves both intra-modal and inter-modal similarity: intra-modal similarity is preserved by selecting an appropriate training model for the text and image modalities, while inter-modal similarity is preserved by reducing a modality-invariance loss. Mean average precision (mAP) is used as the performance measure. Extensive experiments show that IDCMR outperforms state-of-the-art methods, with relative mAP gains of 4% and 2% on text-to-image and image-to-text retrieval on the MSCOCO and Xmedia datasets, respectively.
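The modality-invariance term can be sketched simply: paired image and text embeddings are pulled together in the common sub-space. The snippet below is one generic form of such a loss, not necessarily the exact IDCMR objective.

```python
import torch
import torch.nn.functional as F

def modality_invariance_loss(img_emb, txt_emb):
    """Pull paired image/text embeddings together in the common sub-space;
    a full CMR objective would add intra-modal similarity terms."""
    return F.mse_loss(F.normalize(img_emb, dim=-1), F.normalize(txt_emb, dim=-1))
```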
12. High-resolution connectomic fingerprints: Mapping neural identity and behavior. Neuroimage 2021; 229:117695. PMID: 33422711. DOI: 10.1016/j.neuroimage.2020.117695.
Abstract
Connectomes are typically mapped at low resolution based on a specific brain parcellation atlas. Here, we investigate high-resolution connectomes independent of any atlas, propose new methodologies to facilitate their mapping and demonstrate their utility in predicting behavior and identifying individuals. Using structural, functional and diffusion-weighted MRI acquired in 1000 healthy adults, we aimed to map the cortical correlates of identity and behavior at ultra-high spatial resolution. Using methods based on sparse matrix representations, we propose a computationally feasible high-resolution connectomic approach that improves neural fingerprinting and behavior prediction. Using this high-resolution approach, we find that the multimodal cortical gradients of individual uniqueness reside in the association cortices. Furthermore, our analyses identified a striking dichotomy between the facets of a person's neural identity that best predict their behavior and cognition, compared to those that best differentiate them from other individuals. Functional connectivity was one of the most accurate predictors of behavior, yet resided among the weakest differentiators of identity; whereas the converse was found for morphological properties, such as cortical curvature. This study provides new insights into the neural basis of personal identity and new tools to facilitate ultra-high-resolution connectomics.
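A toy sketch of the sparse-representation idea: keep only the strongest fraction of vertex-to-vertex correlations so a high-resolution functional connectome fits in memory. This is a simplified illustration under stated assumptions, not the paper's method.

```python
import numpy as np
from scipy import sparse

def sparse_connectome(ts, keep_top=0.01):
    """ts: (n_vertices, n_timepoints) signals. Keep only the strongest
    fraction of correlations so the vertex-level connectome stays sparse.
    (A dense matrix is formed here for brevity; at full resolution this
    would be done block-wise.)"""
    z = (ts - ts.mean(1, keepdims=True)) / ts.std(1, keepdims=True)
    C = (z @ z.T) / ts.shape[1]
    np.fill_diagonal(C, 0.0)
    C[np.abs(C) < np.quantile(np.abs(C), 1 - keep_top)] = 0.0
    return sparse.csr_matrix(C)
```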
13. Neonatal morphometric similarity mapping for predicting brain age and characterizing neuroanatomic variation associated with preterm birth. Neuroimage Clin 2020; 25:102195. PMID: 32044713. PMCID: PMC7016043. DOI: 10.1016/j.nicl.2020.102195.
Abstract
Multi-contrast MRI captures information about brain macro- and micro-structure which can be combined in an integrated model to obtain a detailed "fingerprint" of the anatomical properties of an individual's brain. Inter-regional similarities between features derived from structural and diffusion MRI, including regional volumes, diffusion tensor metrics, and neurite orientation dispersion and density imaging measures, can be modelled as morphometric similarity networks (MSNs). Here, individual MSNs were derived from 105 neonates (59 preterm and 46 term) who were scanned between 38 and 45 weeks postmenstrual age (PMA). Inter-regional similarities were used as predictors in a regression model of age at the time of scanning and in a classification model to discriminate between preterm and term infant brains. When tested on unseen data, the regression model predicted PMA at scan with a mean absolute error of 0.70 ± 0.56 weeks, and the classification model achieved 92% accuracy. We conclude that MSNs predict chronological brain age accurately, and that they provide a data-driven approach to identify networks that characterise typical maturation and those that contribute most to neuroanatomic variation associated with preterm birth.
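Constructing an MSN is conceptually compact: z-score each regional metric across regions, then correlate regional feature vectors to define edges. The sketch below illustrates that idea; the metric set listed in the comment and the use of Pearson correlation are assumptions for illustration.

```python
import numpy as np

def morphometric_similarity_network(feats):
    """feats: (n_regions, n_metrics) per-region metrics (e.g., volume, FA,
    NODDI measures). Z-score each metric, then correlate regional vectors."""
    z = (feats - feats.mean(0)) / feats.std(0)
    msn = np.corrcoef(z)                  # (n_regions, n_regions) network
    iu = np.triu_indices_from(msn, k=1)
    return msn, msn[iu]                   # network + vectorized edge predictors
```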
14. Multi-modal brain fingerprinting: A manifold approximation based framework. Neuroimage 2018; 183:212-226. PMID: 30099077. DOI: 10.1016/j.neuroimage.2018.08.006.
Abstract
This work presents an efficient framework, based on manifold approximation, for generating brain fingerprints from multi-modal data. The proposed framework represents images as bags of local features which are used to build a subject proximity graph. Compact fingerprints are obtained by projecting this graph in a low-dimensional manifold using spectral embedding. Experiments using the T1/T2-weighted MRI, diffusion MRI, and resting-state fMRI data of 945 Human Connectome Project subjects demonstrate the benefit of combining multiple modalities, with multi-modal fingerprints more discriminative than those generated from individual modalities. Results also highlight the link between fingerprint similarity and genetic proximity, monozygotic twins having more similar fingerprints than dizygotic or non-twin siblings. This link is also reflected in the differences of feature correspondences between twin/sibling pairs, occurring in major brain structures and across hemispheres. The robustness of the proposed framework to factors like image alignment and scan resolution, as well as the reproducibility of results on retest scans, suggest the potential of multi-modal brain fingerprinting for characterizing individuals in a large cohort analysis.
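Given a precomputed subject proximity (affinity) matrix, the spectral-embedding step that produces compact fingerprints can be sketched with scikit-learn; the embedding dimension below is an illustrative assumption.

```python
from sklearn.manifold import SpectralEmbedding

def fingerprints(subject_affinity, dim=32):
    """subject_affinity: (n_subjects, n_subjects) proximity matrix built from
    matched bags of local features (assumed precomputed)."""
    emb = SpectralEmbedding(n_components=dim, affinity="precomputed")
    return emb.fit_transform(subject_affinity)   # (n_subjects, dim) fingerprints
```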
15. Video and accelerometer-based motion analysis for automated surgical skills assessment. Int J Comput Assist Radiol Surg 2018; 13:443-455. PMID: 29380122. DOI: 10.1007/s11548-018-1704-z.
Abstract
PURPOSE The basic surgical skills of suturing and knot tying are an essential part of medical training. An automated system for surgical skills assessment could save experts' time and improve training efficiency. There have been some recent attempts at automated surgical skills assessment using either video analysis or acceleration data. In this paper, we present a novel approach for automated assessment of OSATS-like surgical skills and provide an analysis of different features on multi-modal data (video and accelerometer data). METHODS We conducted a large study of basic surgical skill assessment on a dataset containing video and accelerometer data for suturing and knot-tying tasks. We introduce two "entropy-based" features, approximate entropy and cross-approximate entropy, which quantify the predictability and regularity of fluctuations in time-series data. The proposed features are compared to the existing Sequential Motion Texture, Discrete Cosine Transform, and Discrete Fourier Transform methods for surgical skills assessment. RESULTS We report the average performance of the different features across all applicable OSATS-like criteria for the suturing and knot-tying tasks. Our analysis shows that the proposed entropy-based features outperform previous state-of-the-art methods using video data, achieving average classification accuracies of 95.1% and 92.2% for suturing and knot tying, respectively. For accelerometer data, our method performs better for suturing, achieving 86.8% average accuracy. We also show that fusing video and acceleration features can improve overall skill assessment performance. CONCLUSION Automated surgical skills assessment can be achieved with high accuracy using the proposed entropy features. Such a system could significantly improve the efficiency of surgical training in medical schools and teaching hospitals.
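Approximate entropy has a standard definition that fits in a few lines. The NumPy sketch below follows the conventional formulation (template length m, tolerance r = 0.2 x standard deviation); it illustrates the feature itself, not the authors' exact code.

```python
import numpy as np

def approximate_entropy(x, m=2, r=None):
    """ApEn of a 1-D series: low values indicate regular, predictable motion.
    r defaults to 0.2 * std, the conventional tolerance."""
    x = np.asarray(x, dtype=float)
    r = 0.2 * x.std() if r is None else r

    def phi(m):
        n = len(x) - m + 1
        templates = np.array([x[i:i + m] for i in range(n)])
        # Chebyshev distance between every pair of length-m templates
        d = np.max(np.abs(templates[:, None] - templates[None, :]), axis=2)
        return np.log((d <= r).mean(axis=1)).mean()   # self-matches included

    return phi(m) - phi(m + 1)
```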
16. Modeling eye movement patterns to characterize perceptual skill in image-based diagnostic reasoning processes. Comput Vis Image Underst 2016; 151:138-152. PMID: 36046501. PMCID: PMC9426376. DOI: 10.1016/j.cviu.2016.03.001.
Abstract
Experts have a remarkable capability of locating, perceptually organizing, identifying, and categorizing objects in images specific to their domains of expertise. In this article, we present a hierarchical probabilistic framework to discover the stereotypical and idiosyncratic viewing behaviors exhibited within expertise-specific groups. Through these patterned eye movement behaviors, we elicit the domain-specific knowledge and perceptual skills of subjects whose eye movements are recorded during diagnostic reasoning on medical images. Analyzing experts' eye movement patterns provides insight into the cognitive strategies used to solve complex perceptual reasoning tasks. An experiment was conducted to collect both eye movement and verbal narrative data from three groups of subjects with different levels of medical training or none (eleven board-certified dermatologists, four dermatologists in training, and thirteen undergraduates) while they examined and described 50 photographic dermatological images. We use a hidden Markov model to describe each subject's eye movement sequence, combined with hierarchical stochastic processes to capture and differentiate the discovered eye movement patterns shared by multiple subjects within and among the three groups. Independent experts' annotations of diagnostic conceptual units of thought in the transcribed verbal narratives are time-aligned with the discovered eye movement patterns to help interpret the patterns' meanings. By mapping eye movement patterns to thought units, we uncover the relationships between the visual and linguistic elements of subjects' reasoning and perceptual processes, and show how subjects varied their behaviors while parsing the images. We also show that the inferred eye movement patterns characterize groups with similar temporal and spatial properties, and we identify a subset of distinctive eye movement patterns that are commonly exhibited across multiple images. Based on the combinations of occurrences of these patterns, we are able to categorize the images from the perspective of experts' viewing strategies in a novel way; within each category, images share similar lesion distributions and configurations. Our results show that modeling with multi-modal data, representative of physicians' diagnostic viewing behaviors and thought processes, is feasible and informative for gaining insight into physicians' cognitive strategies, as well as into medical image understanding.
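The per-subject sequence model can be sketched with the hmmlearn package: a Gaussian HMM fit to a fixation sequence yields hidden states that approximate recurring viewing patterns. The feature choice and state count below are illustrative assumptions; the paper's hierarchical extensions are not shown.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def viewing_patterns(fixations, n_states=3):
    """fixations: (n_fixations, 3) array of [x, y, duration] per fixation.
    Hidden states approximate recurring viewing patterns in the sequence."""
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=100)
    model.fit(fixations)
    return model.predict(fixations)     # per-fixation pattern labels
```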