1. Improving the Prognostic Evaluation Precision of Hospital Outcomes for Heart Failure Using Admission Notes and Clinical Tabular Data: Multimodal Deep Learning Model. J Med Internet Res 2024; 26:e54363. PMID: 38696251; PMCID: PMC11099809; DOI: 10.2196/54363.
Abstract
BACKGROUND: Clinical notes contain contextualized information beyond structured data related to patients' past and current health status.
OBJECTIVE: This study aimed to design a multimodal deep learning approach to improve the evaluation precision of hospital outcomes for heart failure (HF) using admission clinical notes and easily collected tabular data.
METHODS: Data for the development and validation of the multimodal model were retrospectively derived from 3 open-access US databases, including the Medical Information Mart for Intensive Care III v1.4 (MIMIC-III) and MIMIC-IV v1.0, collected from a teaching hospital from 2001 to 2019, and the eICU Collaborative Research Database v1.2, collected from 208 hospitals from 2014 to 2015. The study cohorts consisted of all patients with critical HF. The clinical notes, including chief complaint, history of present illness, physical examination, medical history, and admission medication, as well as clinical variables recorded in electronic health records, were analyzed. We developed a deep learning mortality prediction model for in-hospital patients, which underwent complete internal, prospective, and external evaluation. The Integrated Gradients and SHapley Additive exPlanations (SHAP) methods were used to analyze the importance of risk factors.
RESULTS: The study included 9989 (16.4%) patients in the development set, 2497 (14.1%) patients in the internal validation set, 1896 (18.3%) in the prospective validation set, and 7432 (15%) patients in the external validation set. The area under the receiver operating characteristic curve of the models was 0.838 (95% CI 0.827-0.851), 0.849 (95% CI 0.841-0.856), and 0.767 (95% CI 0.762-0.772) for the internal, prospective, and external validation sets, respectively. The multimodal model's area under the receiver operating characteristic curve was higher than that of the unimodal models in all test sets, and tabular data contributed to higher discrimination. The medical history and physical examination were more useful than other factors in early assessments.
CONCLUSIONS: The multimodal deep learning model combining admission notes and clinical tabular data showed promising efficacy as a potentially novel method for evaluating the risk of mortality in patients with HF, providing more accurate and timely decision support.
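For orientation, a minimal sketch of the general pattern this abstract describes (an encoder for free-text admission notes fused with an encoder for tabular variables, feeding a single mortality head) is shown below. The GRU text encoder, vocabulary size, layer widths, and feature counts are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class NotesTabularFusion(nn.Module):
    """Toy multimodal classifier: free-text admission notes + tabular data."""
    def __init__(self, vocab_size=5000, text_dim=64, n_tabular=20, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, text_dim, padding_idx=0)
        self.text_encoder = nn.GRU(text_dim, hidden, batch_first=True)
        self.tab_encoder = nn.Sequential(
            nn.Linear(n_tabular, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, token_ids, tabular):
        _, h = self.text_encoder(self.embed(token_ids))  # h: (1, B, hidden)
        fused = torch.cat([h.squeeze(0), self.tab_encoder(tabular)], dim=-1)
        return self.head(fused)                          # mortality logit

model = NotesTabularFusion()
# 4 fake notes of 120 tokens each, plus 20 tabular features per patient.
logit = model(torch.randint(1, 5000, (4, 120)), torch.randn(4, 20))
print(torch.sigmoid(logit).shape)  # (4, 1) predicted mortality risk
```

Attribution methods such as Integrated Gradients and SHAP (implemented, for example, in the captum and shap Python libraries) can then be run on a trained model of this kind to rank the contribution of individual note tokens and tabular variables.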
2. Classifying Chinese Medicine Constitution Using Multimodal Deep-Learning Model. Chin J Integr Med 2024; 30:163-170. PMID: 36374441; DOI: 10.1007/s11655-022-3541-8.
Abstract
OBJECTIVE: To develop a multimodal deep-learning model for classifying Chinese medicine constitution, i.e., the balanced and unbalanced constitutions, based on inspection of tongue and face images, pulse waves from palpation, and health information from a total of 540 subjects.
METHODS: The study data consisted of tongue and face images, pulse waves obtained by palpation, and health information, including personal information, life habits, medical history, and current symptoms, from 540 subjects (202 males and 338 females). Convolutional neural networks, recurrent neural networks, and fully connected neural networks were used to extract deep features from the data. Feature fusion and decision fusion models were constructed for the multimodal data.
RESULTS: The optimal models for tongue and face images, pulse waves, and health information were ResNet18, the Gated Recurrent Unit, and entity embedding, respectively. Feature fusion was superior to decision fusion. The multimodal analysis revealed that multimodal data compensated for the loss of information from a single modality, resulting in improved classification performance.
CONCLUSIONS: Multimodal data fusion can supplement single-modality information and improve classification performance. Our research underscores the effectiveness of multimodal deep learning technology in identifying body constitution for modernizing and improving the intelligent application of Chinese medicine.
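The contrast between the two fusion strategies compared here can be made concrete with a short sketch; the feature dimensions, the two-class output, and the stand-in feature vectors below are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

# Assume each unimodal branch already produces a feature vector:
#   f_img   from a ResNet-18 over tongue/face images,
#   f_pulse from a GRU over the pulse waveform,
#   f_info  from entity embeddings of the categorical health information.
B, d = 8, 32
f_img, f_pulse, f_info = torch.randn(B, d), torch.randn(B, d), torch.randn(B, d)

# Feature-level fusion: concatenate features, then one shared classifier.
feature_fusion = nn.Sequential(nn.Linear(3 * d, 64), nn.ReLU(), nn.Linear(64, 2))
logits_feature = feature_fusion(torch.cat([f_img, f_pulse, f_info], dim=-1))

# Decision-level fusion: one classifier per modality, then average the logits.
heads = nn.ModuleList([nn.Linear(d, 2) for _ in range(3)])
logits_decision = torch.stack(
    [h(f) for h, f in zip(heads, (f_img, f_pulse, f_info))]
).mean(dim=0)

print(logits_feature.shape, logits_decision.shape)  # (8, 2) each
```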
3. Natural-Language-Driven Multimodal Representation Learning for Audio-Visual Scene-Aware Dialog System. Sensors (Basel) 2023; 23:7875. PMID: 37765933; PMCID: PMC10536977; DOI: 10.3390/s23187875.
Abstract
With the development of multimedia systems in wireless environments, a rising need in artificial intelligence is to design a system that can communicate with humans in a human-like manner, with a comprehensive understanding of various types of information. Therefore, this paper addresses an audio-visual scene-aware dialog system that can communicate with users about audio-visual scenes. It is essential to understand not only visual and textual information but also audio information in a comprehensive way. Despite the substantial progress in multimodal representation learning with language and visual modalities, there are still two caveats: ineffective use of auditory information and the lack of interpretability of the deep learning systems' reasoning. To address these issues, we propose a novel audio-visual scene-aware dialog system that utilizes a set of explicit information from each modality in the form of natural language, which can be fused into a language model in a natural way. It leverages a transformer-based decoder to generate a coherent and correct response based on multimodal knowledge in a multitask learning setting. In addition, we address model interpretability with a response-driven temporal moment localization method that verifies how the system generates its responses: the system provides the user with the evidence used during response generation in the form of timestamps of the relevant scene. We show the superiority of the proposed model over the baseline in all quantitative and qualitative measurements. In particular, the proposed model achieved robust performance even in environments using all three modalities, including audio. We also conducted extensive experiments to investigate the proposed model and obtained state-of-the-art performance in the system response reasoning task.
4. HF-DDI: Predicting Drug-Drug Interaction Events Based on Multimodal Hybrid Fusion. J Comput Biol 2023; 30:961-971. PMID: 37594774; DOI: 10.1089/cmb.2023.0068.
Abstract
Drug-drug interactions (DDIs) can have a significant impact on patient safety and health. Predicting potential DDIs before administering drugs to patients is a critical step in drug development and can help prevent adverse drug events. In this study, we propose a novel method called HF-DDI for predicting DDI events based on various drug features, including molecular structure, target, and enzyme information. Specifically, we design our model with both early fusion and late fusion strategies and utilize a score calculation module to predict the likelihood of interactions between drugs. Our model was trained and tested on a large data set of known DDIs, achieving an overall accuracy of 0.948. The results suggest that incorporating multiple drug features can improve the accuracy of DDI event prediction and may be useful for improving drug safety and patient outcomes.
5. XMR: an explainable multimodal neural network for drug response prediction. Front Bioinform 2023; 3:1164482. PMID: 37600972; PMCID: PMC10433751; DOI: 10.3389/fbinf.2023.1164482.
Abstract
Introduction: Existing large-scale preclinical cancer drug response databases provide a great opportunity to identify and predict potentially effective drugs to combat cancers. Deep learning models built on these databases have been developed and applied to the cancer drug-response prediction task, and their predictions have been shown to significantly outperform traditional machine learning methods. However, because of their "black box" character, biologically faithful explanations are hard to derive from these deep learning models. Interpretable deep learning models that rely on visible neural networks (VNNs) have been proposed to provide biological justification for the predicted outcomes, but their performance does not yet meet the expectations for clinical practice.
Methods: In this paper, we develop XMR, an eXplainable Multimodal neural network for drug Response prediction. XMR is a new compact multimodal neural network consisting of two sub-networks: a visible neural network for learning genomic features and a graph neural network (GNN) for learning drugs' structural features. Both sub-networks are integrated into a multimodal fusion layer to model the drug response for the given gene mutations and the drug's molecular structure. Furthermore, a pruning approach is applied to provide better interpretations of the XMR model. We use five pathway hierarchies (cell cycle, DNA repair, diseases, signal transduction, and metabolism), obtained from the Reactome Pathway Database, as the VNN architecture of our XMR model to predict drug responses in triple-negative breast cancer.
Results: Our model outperforms other state-of-the-art interpretable deep learning models in predictive performance and can provide biological insights into drug responses for triple-negative breast cancer.
Discussion: Overall, by combining a VNN and a GNN in a multimodal fusion layer, XMR captures key genomic and molecular features and offers reasonable biological interpretability, thereby better predicting drug responses in cancer patients. Our model could also benefit personalized cancer therapy in the future.
6. A Comprehensive and Versatile Multimodal Deep-Learning Approach for Predicting Diverse Properties of Advanced Materials. Adv Sci (Weinh) 2023; 10:e2302508. PMID: 37357977; PMCID: PMC10460884; DOI: 10.1002/advs.202302508.
Abstract
A multimodal deep-learning (MDL) framework is presented for predicting physical properties of a ten-dimensional acrylic polymer composite material by merging physical attributes and chemical data. The MDL model comprises four modules, including three generative deep-learning models for material structure characterization and a fourth model for property prediction. The approach handles an 18-dimensional problem, with ten compositional inputs and eight property outputs, successfully predicting 913,680 property data points across 114,210 composition conditions. This level of complexity is unprecedented in computational materials science, particularly for materials with undefined structures. A framework is proposed to analyze the high-dimensional information space for inverse material design, demonstrating flexibility and adaptability to various materials and scales, provided sufficient data are available. This study advances future research on different materials and the development of more sophisticated models, drawing the authors closer to the ultimate goal of predicting all properties of all materials.
7. Study on the detection of water status of tomato (Solanum lycopersicum L.) by multimodal deep learning. Front Plant Sci 2023; 14:1094142. PMID: 37324706; PMCID: PMC10264697; DOI: 10.3389/fpls.2023.1094142.
Abstract
Water plays a very important role in the growth of tomato (Solanum lycopersicum L.), and detecting the water status of tomato is key to precise irrigation. The objective of this study was to detect the water status of tomato by fusing RGB, NIR, and depth image information through deep learning. Five irrigation levels were set to cultivate tomatoes in different water states, with irrigation amounts of 150%, 125%, 100%, 75%, and 50% of the reference evapotranspiration calculated by a modified Penman-Monteith equation. The water status of tomatoes was divided into five categories: severe irrigation deficit, slight irrigation deficit, moderate irrigation, slight over-irrigation, and severe over-irrigation. RGB images, depth images, and NIR images of the upper part of the tomato plant were collected as data sets. The data sets were used to train and test tomato water status detection models built with single-modal and multimodal deep learning networks. In the single-modal networks, two CNNs, VGG-16 and ResNet-50, were each trained on RGB images, depth images, or NIR images alone, for a total of six cases. In the multimodal networks, two or more of the RGB, depth, and NIR images were combined, with each input branch using VGG-16 or ResNet-50, for a total of 20 combinations. Results showed that the accuracy of tomato water status detection based on single-modal deep learning ranged from 88.97% to 93.09%, while the accuracy based on multimodal deep learning ranged from 93.09% to 99.18%. Multimodal deep learning significantly outperformed single-modal deep learning. The optimal model used a multimodal deep learning network with ResNet-50 for the RGB images and VGG-16 for the depth and NIR images. This study provides a novel method for non-destructive detection of the water status of tomato and gives a reference for precise irrigation management.
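As a rough illustration of the kind of multi-branch network compared here, the sketch below fuses a ResNet-50 branch for RGB with VGG-16 branches for depth and NIR. The input sizes, the channel-replication trick for single-channel maps, and the fusion head are assumptions made for the sake of a runnable example, not the study's exact setup.

```python
import torch
import torch.nn as nn
from torchvision import models

class WaterStatusNet(nn.Module):
    """Toy 3-branch fusion: ResNet-50 for RGB, VGG-16 for depth and NIR."""
    def __init__(self, n_classes=5):
        super().__init__()
        self.rgb_net = models.resnet50(weights=None)
        self.depth_net = models.vgg16(weights=None)
        self.nir_net = models.vgg16(weights=None)
        # Each backbone ends in a 1000-dim layer; fuse and classify 5 states.
        self.classifier = nn.Linear(3 * 1000, n_classes)

    def forward(self, rgb, depth, nir):
        feats = torch.cat(
            [self.rgb_net(rgb), self.depth_net(depth), self.nir_net(nir)], dim=-1
        )
        return self.classifier(feats)

model = WaterStatusNet()
# Single-channel depth/NIR maps are simply repeated to 3 channels here.
x_rgb = torch.randn(2, 3, 224, 224)
x_depth = torch.randn(2, 1, 224, 224).repeat(1, 3, 1, 1)
x_nir = torch.randn(2, 1, 224, 224).repeat(1, 3, 1, 1)
print(model(x_rgb, x_depth, x_nir).shape)  # (2, 5) water-status logits
```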
8. Data augmentation and multimodal learning for predicting drug response in patient-derived xenografts from gene expressions and histology images. Front Med (Lausanne) 2023; 10:1058919. PMID: 36960342; PMCID: PMC10027779; DOI: 10.3389/fmed.2023.1058919.
Abstract
Patient-derived xenografts (PDXs) are an appealing platform for preclinical drug studies. A primary challenge in modeling drug response prediction (DRP) with PDXs and neural networks (NNs) is the limited number of drug response samples. We investigate a multimodal neural network (MM-Net) and data augmentation for DRP in PDXs. The MM-Net learns to predict response using drug descriptors, gene expressions (GE), and histology whole-slide images (WSIs). We explore whether combining WSIs with GE improves predictions compared with models that use GE alone. We propose two data augmentation methods which allow us to train multimodal and unimodal NNs on a single larger dataset without changing architectures: 1) combine single-drug and drug-pair treatments by homogenizing drug representations, and 2) augment drug pairs, which doubles the sample size of all drug-pair samples. Unimodal NNs which use GE are compared to assess the contribution of data augmentation. The NN that uses the original and the augmented drug-pair treatments as well as single-drug treatments outperforms NNs that ignore either the augmented drug pairs or the single-drug treatments. In assessing the multimodal learning based on the Matthews correlation coefficient (MCC), MM-Net outperforms all the baselines. Our results show that data augmentation and the integration of histology images with GE can improve the prediction performance of drug response in PDXs.
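The abstract does not spell out the augmentation procedure; one plausible reading of "augment drug pairs, which doubles the sample size" is order-swapping of the two drugs in each pair, sketched below with made-up drug and sample names purely for illustration.

```python
import pandas as pd

# Toy drug-pair response table: each row is (drug_a, drug_b, sample, response).
pairs = pd.DataFrame({
    "drug_a": ["cisplatin", "paclitaxel"],
    "drug_b": ["gemcitabine", "carboplatin"],
    "sample": ["PDX_001", "PDX_002"],
    "response": [0.42, 0.71],
})

# Augment drug pairs by swapping the two drugs: the treatment is unchanged,
# so the response label is kept, and the drug-pair sample size doubles.
swapped = pairs.rename(columns={"drug_a": "drug_b", "drug_b": "drug_a"})
augmented = pd.concat([pairs, swapped], ignore_index=True)
print(len(pairs), "->", len(augmented))  # 2 -> 4
```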
9. A Multimodal Deep Learning Approach to Predicting Systemic Diseases from Oral Conditions. Diagnostics (Basel) 2022; 12:3192. PMID: 36553200; PMCID: PMC9777898; DOI: 10.3390/diagnostics12123192.
Abstract
Background: It is known that oral diseases such as periodontal (gum) disease are closely linked to various systemic diseases and disorders. Deep learning advances have the potential to make major contributions to healthcare, particularly in domains that rely on medical imaging. Incorporating non-imaging information based on clinical and laboratory data may allow clinicians to make more comprehensive and accurate decisions.
Methods: Here, we developed a multimodal deep learning method to predict systemic diseases and disorders from oral health conditions. A dual-loss autoencoder was used in the first phase to extract periodontal disease-related features from 1188 panoramic radiographs. Then, in the second phase, we fused the image features with demographic data and clinical information taken from electronic health records (EHR) to predict systemic diseases. We used receiver operating characteristic (ROC) analysis and accuracy to evaluate our model. The model was further validated on an unseen test dataset.
Findings: The three most accurately predicted chapters, in order, were Chapters III, VI, and IX. The results indicated that the proposed model could predict systemic diseases belonging to Chapters III, VI, and IX with AUC values of 0.92 (95% CI 0.90-0.94), 0.87 (95% CI 0.84-0.89), and 0.78 (95% CI 0.75-0.81), respectively. To assess the robustness of the models, we performed the evaluation on the unseen test dataset for these chapters, and the results showed accuracies of 0.88, 0.82, and 0.72 for Chapters III, VI, and IX, respectively.
Interpretation: The present study shows that the combination of panoramic radiographs and clinical oral features can be used to train a fusion deep learning model for predicting systemic diseases and disorders.
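The term "dual-loss autoencoder" is not unpacked in the abstract; a common construction of this kind, sketched below under that assumption, pairs a reconstruction loss with a supervised loss on the latent code so that the learned image features stay disease-related. The dimensions and the binary periodontal label are illustrative, not the authors' specification.

```python
import torch
import torch.nn as nn

class DualLossAutoencoder(nn.Module):
    """Toy dual-loss autoencoder: reconstruct radiograph features while also
    supervising the latent code with a periodontal-disease label."""
    def __init__(self, in_dim=256, latent=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))
        self.aux_head = nn.Linear(latent, 1)   # periodontal-status head

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.aux_head(z)

model = DualLossAutoencoder()
x = torch.randn(8, 256)                     # flattened radiograph features
y = torch.randint(0, 2, (8, 1)).float()     # periodontal disease present?
x_hat, logit = model(x)
# Total loss = reconstruction term + supervised (label) term.
loss = nn.functional.mse_loss(x_hat, x) + \
       nn.functional.binary_cross_entropy_with_logits(logit, y)
loss.backward()
```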
10. Multimodal attention-based deep learning for Alzheimer's disease diagnosis. J Am Med Inform Assoc 2022; 29:2014-2022. PMID: 36149257; PMCID: PMC9667156; DOI: 10.1093/jamia/ocac168.
Abstract
OBJECTIVE: Alzheimer's disease (AD) is the most common neurodegenerative disorder and has one of the most complex pathogeneses, making effective and clinically actionable decision support difficult. The objective of this study was to develop a novel multimodal deep learning framework to aid medical professionals in AD diagnosis.
MATERIALS AND METHODS: We present a Multimodal Alzheimer's Disease Diagnosis framework (MADDi) to accurately detect the presence of AD and mild cognitive impairment (MCI) from imaging, genetic, and clinical data. MADDi is novel in that we use cross-modal attention, which captures interactions between modalities, a method not previously explored in this domain. We perform multi-class classification, a challenging task considering the strong similarities between MCI and AD. We compare with previous state-of-the-art models, evaluate the importance of attention, and examine the contribution of each modality to the model's performance.
RESULTS: MADDi classifies MCI, AD, and controls with 96.88% accuracy on a held-out test set. When examining the contribution of different attention schemes, we found that the combination of cross-modal attention with self-attention performed best, while the model with no attention layers performed worst, with a 7.9% difference in F1-scores.
DISCUSSION: Our experiments underlined the importance of structured clinical data in helping machine learning models contextualize and interpret the remaining modalities. Extensive ablation studies showed that any multimodal mixture of input features without access to structured clinical information suffered marked performance losses.
CONCLUSION: This study demonstrates the merit of combining multiple input modalities via cross-modal attention to deliver highly accurate AD diagnostic decision support.
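Cross-modal attention, as described here, lets one modality's features act as queries over another modality's features so that the attention weights capture interactions between modalities. A minimal sketch of that mechanism follows; the token counts, dimensions, and the choice of which modality queries which are illustrative assumptions rather than MADDi's actual configuration.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Toy cross-modal attention: clinical features attend over imaging features."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_feats, context_feats):
        # Queries come from one modality, keys/values from another, so the
        # attention weights model interactions *between* modalities.
        attended, _ = self.attn(query_feats, context_feats, context_feats)
        return self.norm(query_feats + attended)

B, dim = 4, 64
clinical = torch.randn(B, 1, dim)    # one token of structured clinical data
imaging = torch.randn(B, 16, dim)    # 16 tokens of imaging-derived features
fused = CrossModalAttention()(clinical, imaging)
print(fused.shape)  # (4, 1, 64)
```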
11. Deep multimodal predictome for studying mental disorders. Hum Brain Mapp 2022; 44:509-522. PMID: 36574598; PMCID: PMC9842924; DOI: 10.1002/hbm.26077.
Abstract
Characterizing neuropsychiatric disorders is challenging due to heterogeneity in the population. We propose combining structural and functional neuroimaging and genomic data in a multimodal classification framework to leverage their complementary information. Our objectives are two-fold: (i) to improve the classification of disorders and (ii) to introspect the concepts learned to explore underlying neural and biological mechanisms linked to mental disorders. Previous multimodal studies have focused on naïve neural networks, mostly perceptrons, to learn modality-wise features and often assume an equal contribution from each modality. Our focus is on the development of neural networks for feature learning and the implementation of an adaptive control unit for the fusion phase. Our mid-fusion-with-attention model includes a multilayer feed-forward network, an autoencoder, a bi-directional long short-term memory unit with attention as the feature extractor, and a linear attention module for controlling modality-specific influence. The proposed model achieved 92% (p < .0001) accuracy in schizophrenia prediction, outperforming several other state-of-the-art models applied to unimodal or multimodal data. Post hoc feature analyses uncovered critical neural features and genes/biological pathways associated with schizophrenia. The proposed model effectively combines multimodal neuroimaging and genomics data for predicting mental disorders. Interpreting the salient features identified by the model may advance our understanding of their underlying etiological mechanisms.
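The "linear attention module for controlling modality-specific influence" mentioned here corresponds to a generic pattern in which a learned scalar weight scales each modality's feature vector before fusion. The sketch below shows only that pattern, with feature dimensions, a two-class output, and random stand-ins for the structural MRI, functional MRI, and genomic features assumed for illustration.

```python
import torch
import torch.nn as nn

class ModalityAttentionFusion(nn.Module):
    """Toy mid-fusion: learn a scalar weight per modality, then mix features."""
    def __init__(self, dim=32):
        super().__init__()
        self.score = nn.Linear(dim, 1)          # linear attention scorer
        self.classifier = nn.Linear(dim, 2)

    def forward(self, feats):                   # feats: (B, n_modalities, dim)
        weights = torch.softmax(self.score(feats), dim=1)   # (B, M, 1)
        fused = (weights * feats).sum(dim=1)                # weighted sum
        return self.classifier(fused), weights.squeeze(-1)

# Stand-ins for sMRI, fMRI, and genomic feature vectors from earlier encoders.
feats = torch.stack([torch.randn(4, 32) for _ in range(3)], dim=1)
logits, modality_weights = ModalityAttentionFusion()(feats)
print(logits.shape, modality_weights.shape)  # (4, 2) (4, 3)
```

Inspecting modality_weights after training is one way to read off how much each modality influenced a given prediction.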
12. Multimodal deep learning with feature level fusion for identification of choroidal neovascularization activity in age-related macular degeneration. Acta Ophthalmol 2022; 100:e512-e520. PMID: 34159761; DOI: 10.1111/aos.14928.
Abstract
PURPOSE: This study aimed to determine the efficacy of a multimodal deep learning (DL) model using optical coherence tomography (OCT) and optical coherence tomography angiography (OCTA) images for the assessment of choroidal neovascularization (CNV) in neovascular age-related macular degeneration (AMD).
METHODS: This retrospective, cross-sectional, multicentre study included patients aged >50 years with a diagnosis of typical neovascular AMD. OCT and OCTA data were collected for an internal data set and two external data sets. A DL model was developed with a novel feature-level fusion (FLF) method used to combine the multimodal data. The results were compared with identification performed by an ophthalmologist. The best model was tested on the two external data sets to show its potential for clinical use.
RESULTS: Our best model achieved an accuracy of 95.5% and an area under the curve (AUC) of 0.9796 on multimodal inputs for the internal data set, which is comparable to the performance of retinal specialists. The proposed model reached an accuracy of 100.00% and an AUC of 1.0 for the Ningbo data set, and an accuracy of 90.48% and an AUC of 0.9727 for the Jinhua data set.
CONCLUSION: The FLF method is feasible and highly accurate and could enhance the power of existing computer-aided diagnosis systems. The bi-modal computer-aided diagnosis (CADx) system for the automated identification of CNV activity is an accurate and promising tool in the realm of public health.
13. Multimodal, multitask, multiattention (M3) deep learning detection of reticular pseudodrusen: Toward automated and accessible classification of age-related macular degeneration. J Am Med Inform Assoc 2021; 28:1135-1148. PMID: 33792724; PMCID: PMC8200273; DOI: 10.1093/jamia/ocaa302.
Abstract
OBJECTIVE: Reticular pseudodrusen (RPD), a key feature of age-related macular degeneration (AMD), are poorly detected by human experts on standard color fundus photography (CFP) and typically require advanced imaging modalities such as fundus autofluorescence (FAF). The objective was to develop and evaluate the performance of a novel multimodal, multitask, multiattention (M3) deep learning framework for RPD detection.
MATERIALS AND METHODS: A deep learning framework (M3) was developed to detect RPD presence accurately using CFP alone, FAF alone, or both, employing >8000 CFP-FAF image pairs obtained prospectively (Age-Related Eye Disease Study 2). The M3 framework includes multimodal (detection from single or multiple image modalities), multitask (training different tasks simultaneously to improve generalizability), and multiattention (improving ensembled feature representation) operation. Performance on RPD detection was compared with state-of-the-art deep learning models and 13 ophthalmologists; performance on detection of 2 other AMD features (geographic atrophy and pigmentary abnormalities) was also evaluated.
RESULTS: For RPD detection, M3 achieved an area under the receiver-operating characteristic curve (AUROC) of 0.832, 0.931, and 0.933 for CFP alone, FAF alone, and both, respectively. M3 performance on CFP was substantially superior to that of human retinal specialists (median F1 score, 0.644 vs 0.350). External validation (the Rotterdam Study) demonstrated high accuracy on CFP alone (AUROC, 0.965). The M3 framework also accurately detected geographic atrophy and pigmentary abnormalities (AUROC, 0.909 and 0.912, respectively), demonstrating its generalizability.
CONCLUSIONS: This study demonstrates the successful development, robust evaluation, and external validation of a novel deep learning framework that enables accessible, accurate, and automated AMD diagnosis and prognosis.
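The multitask component described here, training RPD detection jointly with related AMD features, follows a standard shared-backbone pattern. The sketch below shows that pattern with an off-the-shelf ResNet-18 and three binary heads; it is an illustrative stand-in rather than the M3 architecture, which additionally combines CFP and FAF inputs and attention modules.

```python
import torch
import torch.nn as nn
from torchvision import models

# Multitask setup: one shared image backbone with separate heads for RPD,
# geographic atrophy, and pigmentary abnormalities, trained jointly so the
# related tasks regularise each other.
backbone = models.resnet18(weights=None)
backbone.fc = nn.Identity()                      # expose 512-d features
heads = nn.ModuleDict({
    "rpd": nn.Linear(512, 1),
    "geographic_atrophy": nn.Linear(512, 1),
    "pigmentary_abnormalities": nn.Linear(512, 1),
})

images = torch.randn(2, 3, 224, 224)             # dummy fundus images
labels = {k: torch.randint(0, 2, (2, 1)).float() for k in heads}
features = backbone(images)
# Sum the per-task binary cross-entropy losses and backpropagate jointly.
loss = sum(
    nn.functional.binary_cross_entropy_with_logits(head(features), labels[k])
    for k, head in heads.items()
)
loss.backward()
print(float(loss))
```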
14. Can We Ditch Feature Engineering? End-to-End Deep Learning for Affect Recognition from Physiological Sensor Data. Sensors (Basel) 2020; 20:6535. PMID: 33207564; PMCID: PMC7697590; DOI: 10.3390/s20226535.
Abstract
To further extend the applicability of wearable sensors in various domains such as mobile health systems and the automotive industry, new methods for accurately extracting subtle physiological information from these wearable sensors are required. However, the extraction of valuable information from physiological signals is still challenging: smartphones can count steps and compute heart rate, but they cannot recognize emotions and related affective states. This study analyzes the possibility of using end-to-end multimodal deep learning (DL) methods for affect recognition. Ten end-to-end DL architectures are compared on four different datasets with diverse raw physiological signals used for affect recognition, including emotional and stress states. The DL architectures specialized for time-series classification were enhanced to simultaneously facilitate learning from multiple sensors, each having its own sampling frequency. To enable a fair comparison among the different DL architectures, Bayesian optimization was used for hyperparameter tuning. The experimental results showed that the performance of the models depends on the intensity of the physiological response induced by the affective stimuli, i.e., the DL models recognize stress induced by the Trier Social Stress Test more successfully than they recognize emotional changes induced by watching affective content, e.g., funny videos. Additionally, the results showed that CNN-based architectures might be more suitable than LSTM-based architectures for affect recognition from physiological sensors.
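One simple way to let a single network learn from multiple sensors with different sampling frequencies is to give each raw signal its own convolutional branch and remove the length dependence with global pooling. The sketch below shows that idea with invented sensor names, rates, and layer sizes; it is not the paper's specific architecture.

```python
import torch
import torch.nn as nn

class SensorBranch(nn.Module):
    """1D-CNN branch for one raw physiological signal, any sampling rate."""
    def __init__(self, channels=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),     # pooling removes the length dependence
        )

    def forward(self, x):                # x: (B, 1, T), T differs per sensor
        return self.conv(x).squeeze(-1)  # (B, channels)

# Signals sampled at different rates simply yield different lengths T;
# global pooling lets one model consume all of them without resampling.
eda = torch.randn(8, 1, 4 * 60)          # e.g. 4 Hz electrodermal activity
bvp = torch.randn(8, 1, 64 * 60)         # e.g. 64 Hz blood volume pulse
branches = nn.ModuleList([SensorBranch(), SensorBranch()])
features = torch.cat([b(x) for b, x in zip(branches, (eda, bvp))], dim=-1)
logits = nn.Linear(features.size(-1), 3)(features)  # 3 affect classes
print(logits.shape)  # (8, 3)
```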
15. MildInt: Deep Learning-Based Multimodal Longitudinal Data Integration Framework. Front Genet 2019; 10:617. PMID: 31316553; PMCID: PMC6611503; DOI: 10.3389/fgene.2019.00617.
Abstract
As large amounts of heterogeneous biomedical data become available, numerous methods for integrating such datasets have been developed to extract complementary knowledge from multiple source domains. Recently, deep learning approaches have shown promising results in a variety of research areas; however, applying them requires expertise in constructing a deep architecture that can handle multimodal longitudinal data. Thus, in this paper, a deep learning-based Python package for data integration is developed. The package, MildInt (multimodal longitudinal data integration framework), provides a preconstructed deep learning architecture for classification tasks. MildInt comprises two learning phases: learning a feature representation from each modality of data, and training a classifier for the final decision. Adopting a deep architecture in the first phase leads to learning a more task-relevant feature representation than a linear model. In the second phase, a linear regression classifier is used for detecting and investigating biomarkers from the multimodal data. Thus, by combining the linear model and the deep learning model, higher accuracy and better interpretability can be achieved. We validated the performance of our package using simulated and real data. For the real data, as a pilot study, we used clinical and multimodal neuroimaging datasets in Alzheimer's disease to predict disease progression. MildInt is capable of integrating multiple forms of numerical data, including time-series and non-time-series data, for extracting complementary features from multimodal datasets.
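The two-phase design described here (per-modality sequence encoders followed by a simple linear decision layer) can be illustrated with a generic sketch. This is not the MildInt package's actual API; the modality names, visit counts, and dimensions are assumptions.

```python
import torch
import torch.nn as nn

# Phase 1: one GRU encoder per longitudinal modality (e.g. cognitive scores
# over visits, imaging summaries over visits), giving fixed-size features.
class ModalityEncoder(nn.Module):
    def __init__(self, n_features, hidden=16):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)

    def forward(self, x):            # x: (B, visits, n_features)
        _, h = self.gru(x)
        return h.squeeze(0)          # (B, hidden)

enc_cog, enc_img = ModalityEncoder(5), ModalityEncoder(10)

# Phase 2: concatenate the learned representations and train a simple
# linear classifier on top for the final progression decision.
clf = nn.Linear(16 + 16, 2)

cog = torch.randn(4, 3, 5)           # 4 patients, 3 visits, 5 cognitive scores
img = torch.randn(4, 3, 10)          # 4 patients, 3 visits, 10 imaging features
logits = clf(torch.cat([enc_cog(cog), enc_img(img)], dim=-1))
print(logits.shape)                  # (4, 2)
```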
16. Multimodal 3D DenseNet for IDH Genotype Prediction in Gliomas. Genes (Basel) 2018; 9:E382. PMID: 30061525; PMCID: PMC6115744; DOI: 10.3390/genes9080382.
Abstract
Non-invasive prediction of isocitrate dehydrogenase (IDH) genotype plays an important role in glioma diagnosis and prognosis. Recent research has shown that radiology images can be a potential tool for genotype prediction, and fusion of multi-modality data by deep learning methods can provide complementary information that further enhances prediction accuracy. However, an effective deep learning architecture for predicting IDH genotype from three-dimensional (3D) multimodal medical images has been lacking. In this paper, we propose a novel multimodal 3D DenseNet (M3D-DenseNet) model to predict IDH genotype from multimodal magnetic resonance imaging (MRI) data. To evaluate its performance, we conducted experiments on the BRATS-2017 and The Cancer Genome Atlas breast invasive carcinoma (TCGA-BRCA) datasets to obtain image data as input and gene mutation information as the target, respectively. We achieved 84.6% accuracy (area under the curve (AUC) = 85.7%) on the validation dataset. To evaluate its generalizability, we applied transfer learning techniques to predict World Health Organization (WHO) grade status, which also achieved a high accuracy of 91.4% (AUC = 94.8%) on the validation dataset. With its automatic feature extraction, effectiveness, and high generalizability, M3D-DenseNet can serve as a useful method for other multimodal radiogenomics problems and has the potential to be applied in clinical decision making.