1
de Lacy N, Lam WY, Ramshaw M. RiskPath: Explainable deep learning for multistep biomedical prediction in longitudinal data. medRxiv 2024:2024.09.19.24313909. [PMID: 39371168 PMCID: PMC11451668 DOI: 10.1101/2024.09.19.24313909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2024]
Abstract
Predicting individual and population risk for disease outcomes and identifying persons at elevated risk is a key prerequisite for targeting interventions to improve health. However, current risk stratification tools for the common, chronic diseases that develop over the life course, and that account for the majority of disease morbidity, mortality and healthcare costs, are outdated and achieve only moderate predictive performance. For some common, highly morbid conditions, such as mental illness, no risk stratification tools are yet available. There is an urgent need to improve predictive performance for chronic diseases and to understand how cumulative, multifactorial risks aggregate over time, so that intervention programs can be targeted earlier and more effectively in the disease course. Chronic diseases are the end outcomes of multifactorial risks that accumulate over years and form cumulative, temporally sensitive risk pathways. However, tools in current clinical use were constructed from older data and use inputs from a single data collection step. Here, we present RiskPath, a multistep deep learning method for temporally sensitive biomedical risk prediction, tailored to the constraints and demands of biomedical practice, that achieves very strong performance and full translational explainability. RiskPath delineates and quantifies cumulative multifactorial risk pathways and allows the user to explore performance-complexity tradeoffs and to constrain models as required by clinical use cases. Our results highlight the potential for developing a new generation of risk stratification tools and risk pathway mapping for time-dependent diseases and health outcomes by applying powerful time-series deep learning methods to the wealth of biomedical data now appearing in large, longitudinal open science datasets.
Affiliation(s)
- Nina de Lacy
- Department of Psychiatry, University of Utah, Salt Lake City, Utah
- Wai Yin Lam
- Scientific Computing Institute, University of Utah, Salt Lake City, Utah
- Michael Ramshaw
- Department of Psychiatry, University of Utah, Salt Lake City, Utah
2
de Lacy N, Ramshaw MJ. Selectively predicting the onset of ADHD, oppositional defiant disorder, and conduct disorder in early adolescence with high accuracy. Front Psychiatry 2023; 14:1280326. [PMID: 38144472 PMCID: PMC10739523 DOI: 10.3389/fpsyt.2023.1280326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2023] [Accepted: 11/13/2023] [Indexed: 12/26/2023]
Abstract
Introduction: The externalizing disorders of attention deficit hyperactivity disorder (ADHD), oppositional defiant disorder (ODD), and conduct disorder (CD) are common in adolescence and are strong predictors of adult psychopathology. Although these disorders are treatable, substantial diagnostic overlap complicates intervention planning. Understanding which factors predict the onset of each disorder, and disambiguating their different predictors, is of substantial translational interest.
Materials and methods: We analyzed 5,777 multimodal candidate predictors from children aged 9-10 years and their parents in the ABCD cohort to predict the future onset of ADHD, ODD, and CD at 2-year follow-up. We used deep learning optimized with an innovative AI algorithm to jointly optimize model training, perform automated feature selection, and construct individual-level predictions of illness onset and of all prevailing cases at 11-12 years. We also examined relative predictive performance when candidate predictors were restricted to neural metrics only.
Results: Multimodal models achieved ~86-97% accuracy, 0.919-0.996 AUROC, and ~82-97% precision and recall when tested on held-out, unseen data. In neural-only models, predictive performance dropped substantially, but accuracy and AUROC nonetheless reached ~80%. Parent aggressive and externalizing traits uniquely differentiated the onset of ODD, while structural MRI metrics in the limbic system were specific to CD. Psychosocial measures of sleep disorders, parent mental health and behavioral traits, and school performance proved valuable across all disorders. In neural-only models, structural and functional MRI metrics in subcortical regions and cortical-subcortical connectivity were emphasized. Overall, we identified a strong correlation between accuracy and final predictor importance.
Conclusion: Deep learning optimized with AI can generate highly accurate individual-level predictions of the onset of early adolescent externalizing disorders using multimodal features. While externalizing disorders are frequently comorbid in adolescents, certain predictors were specific to the onset of ODD or CD vs. ADHD. To our knowledge, this is the first machine learning study to predict the onset of all three major adolescent externalizing disorders with the same design and participant cohort to enable direct comparisons, analyze >200 multimodal features, and include many types of neuroimaging metrics. Future studies testing our observations in external validation data will help establish the generalizability of these findings.
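The metrics quoted in this abstract (accuracy, AUROC, precision and recall) are computed from held-out labels and model outputs. The Python sketch below is illustrative only and is not the authors' code; the function names, the 0.5 decision threshold and the toy labels and scores are assumptions made for the example. AUROC is computed here in its rank (Mann-Whitney) form: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = case, 0 = non-case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out labels and predicted probabilities (toy data).
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold of 0.5

print(classification_metrics(y_true, y_pred))
print(round(auroc(y_true, scores), 3))  # 0.889
```

With these toy values, 8 of the 9 positive-negative pairs are ranked correctly, so AUROC is 8/9. Accuracy, precision and recall depend on the chosen threshold, whereas AUROC uses only the ordering of the scores.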
Affiliation(s)
- Nina de Lacy
- Huntsman Mental Health Institute, Salt Lake City, UT, United States
- Department of Psychiatry, University of Utah, Salt Lake City, UT, United States
- Michael J. Ramshaw
- Huntsman Mental Health Institute, Salt Lake City, UT, United States
- Department of Psychiatry, University of Utah, Salt Lake City, UT, United States
3
Harris L, Fondrie WE, Oh S, Noble WS. Evaluating Proteomics Imputation Methods with Improved Criteria. J Proteome Res 2023; 22:3427-3438. [PMID: 37861703 PMCID: PMC10949645 DOI: 10.1021/acs.jproteome.3c00205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2023]
Abstract
Quantitative measurements produced by tandem mass spectrometry proteomics experiments typically contain a large proportion of missing values. Missing values hinder reproducibility, reduce statistical power, and make it difficult to compare samples or experiments. Although many methods exist for imputing missing values, in practice the most commonly used methods are among the worst-performing. Furthermore, previous benchmarking studies have focused on relatively simple measures of error, such as the mean-squared error between imputed and held-out values. Here we evaluate the performance of commonly used imputation methods using three practical, "downstream-centric" criteria. These criteria measure the ability to identify differentially expressed peptides, generate new quantitative peptides, and improve the peptide lower limit of quantification. Our evaluation comprises several experiment types and acquisition strategies, including data-dependent and data-independent acquisition. We find that imputation does not necessarily improve the ability to identify differentially expressed peptides, but that it can identify new quantitative peptides and improve the peptide lower limit of quantification. We find that MissForest is generally the best-performing method according to our downstream-centric criteria. We also argue that existing imputation methods do not properly account for the variance of peptide quantifications, and we highlight the need for methods that do.
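The abstract contrasts its downstream-centric criteria with the simpler criterion used in earlier benchmarks: hide some observed values, impute them, and score the mean-squared error against the held-out truth. The Python sketch below illustrates only that baseline criterion. The per-peptide (row) mean imputer, the 20% masking fraction, the toy matrix and the function names are assumptions for illustration; the sketch does not implement MissForest or the paper's downstream-centric criteria.

```python
import random


def row_mean_impute(matrix):
    """Fill None entries with the mean of the observed values in the same row (peptide)."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0  # all-missing row -> 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def heldout_mse(matrix, impute, fraction=0.2, seed=0):
    """Mask a random fraction of observed entries, impute them, and return the MSE on those entries."""
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(matrix) for j, v in enumerate(row) if v is not None]
    k = max(1, int(len(observed) * fraction))
    held_out = rng.sample(observed, k)
    masked = [list(row) for row in matrix]  # copy so the input matrix is not modified
    for i, j in held_out:
        masked[i][j] = None
    imputed = impute(masked)
    return sum((imputed[i][j] - matrix[i][j]) ** 2 for i, j in held_out) / k


# Toy log-intensity matrix: rows are peptides, columns are runs; None marks a missing value.
quant = [
    [20.1, 19.8, None, 20.4],
    [15.2, None, 15.9, 15.5],
    [None, 22.0, 21.7, 22.3],
]
print(heldout_mse(quant, row_mean_impute))
```

As the abstract argues, a low error on this kind of check does not by itself show that imputation helps downstream analyses such as differential expression testing, which is why the paper scores methods on downstream tasks instead.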
Affiliation(s)
- Lincoln Harris
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Sewoong Oh
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- William S Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
4
de Lacy N, Ramshaw MJ. Predicting new onset thought disorder in early adolescence with optimized deep learning implicates environmental-putamen interactions. medRxiv 2023:2023.10.23.23297438. [PMID: 37961085 PMCID: PMC10635181 DOI: 10.1101/2023.10.23.23297438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Background: Thought disorder (TD) is a sensitive and specific marker of risk for schizophrenia onset. Specifying the factors that predict TD onset in adolescence is important for the early identification of youth at risk. However, few studies have prospectively predicted TD onset in unstratified youth populations.
Study Design: We used deep learning optimized with artificial intelligence (AI) to analyze 5,777 multimodal features obtained at 9-10 years from youth and their parents in the ABCD study, including 5,014 neural metrics, to prospectively predict new onset TD cases at 11-12 years. The design was replicated for all prevailing TD cases at 11-12 years.
Study Results: Optimizing performance with AI, we achieved an accuracy and F1 score of 92% and an AUROC of 0.96 in prospectively predicting the onset of TD in early adolescence. Structural differences in the left putamen, sleep disturbances and the level of parental externalizing behaviors were specific predictors of new onset TD at 11-12 years, interacting with low youth prosociality, total parental behavioral problems, parent-child conflict, and whether the youth had already come to clinical attention. More important predictors showed greater inter-individual variability.
Conclusions: This study provides robust person-level, multivariable signatures of early adolescent TD, which suggest that structural differences in the left putamen in late childhood are a candidate biomarker that interacts with psychosocial stressors to increase risk for TD onset. Our work also suggests that interventions to improve sleep and reduce parent-child psychosocial stressors are worth further exploration as ways to modulate risk for TD onset.
Affiliation(s)
- Nina de Lacy
- Huntsman Mental Health Institute, Salt Lake City, UT 84103
- Department of Psychiatry, University of Utah, Salt Lake City, UT 84103
- Michael J. Ramshaw
- Huntsman Mental Health Institute, Salt Lake City, UT 84103
- Department of Psychiatry, University of Utah, Salt Lake City, UT 84103
5
de Lacy N, Ramshaw MJ, McCauley E, Kerr KF, Kaufman J, Nathan Kutz J. Predicting individual cases of major adolescent psychiatric conditions with artificial intelligence. Transl Psychiatry 2023; 13:314. [PMID: 37816706 PMCID: PMC10564881 DOI: 10.1038/s41398-023-02599-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Revised: 09/10/2023] [Accepted: 09/20/2023] [Indexed: 10/12/2023]
Abstract
Three-quarters of lifetime mental illness occurs by the age of 24, but relatively little is known about how to robustly identify youth at risk in order to target intervention efforts known to improve outcomes. Barriers to knowledge have included obtaining robust predictions while simultaneously analyzing large numbers of different types of candidate predictors. In a new, large, transdiagnostic youth sample with multidomain, high-dimensional data, we used 160 candidate predictors encompassing neural, prenatal, developmental, physiologic, sociocultural, environmental, emotional and cognitive features and leveraged three different machine learning algorithms optimized with a novel artificial intelligence meta-learning technique to predict individual cases of anxiety, depression, attention deficit, disruptive behaviors and post-traumatic stress. Our models performed well when tested on unseen, held-out data (AUC ≥ 0.94). By using a large-scale design and advanced computational approaches, we were able to compare the relative predictive ability of neural versus psychosocial features in a principled manner, and found that psychosocial features consistently outperformed neural metrics in their ability to deliver robust predictions of individual cases. We found that deep learning with artificial neural networks and tree-based learning with XGBoost outperformed logistic regression with ElasticNet, supporting the conceptualization of mental illnesses as multifactorial disease processes with non-linear relationships among predictors that can be robustly modeled with computational psychiatry techniques. To our knowledge, this is the first study to test the relative predictive ability of these gold-standard algorithms from different classes across multiple mental health conditions in youth within the same study design in multidomain data utilizing >100 candidate predictors. Further research is suggested to explore these findings in longitudinal data and to validate the results in an external dataset.
Affiliation(s)
- Nina de Lacy
- Huntsman Mental Health Institute, Salt Lake City, UT, 84103, USA
- Department of Psychiatry, University of Utah, Salt Lake City, UT, 84103, USA
- Michael J Ramshaw
- Huntsman Mental Health Institute, Salt Lake City, UT, 84103, USA
- Department of Psychiatry, University of Utah, Salt Lake City, UT, 84103, USA
- Elizabeth McCauley
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Kathleen F Kerr
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- J Nathan Kutz
- Department of Applied Mathematics, University of Washington, Seattle, WA, USA
- AI Institute for Dynamical Systems, Seattle, WA, USA
6
Feng J, Duan T, Zhou Y, Chang X, Li Y. An improved nonnegative matrix factorization with the imputation method model for pollution source apportionment during rainstorm events. J Environ Manage 2023; 328:116888. [PMID: 36516713 DOI: 10.1016/j.jenvman.2022.116888] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/12/2022] [Revised: 11/11/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
Data scarcity caused by extreme conditions during storms makes pollution source apportionment difficult. This study integrated nonnegative matrix factorization with an imputation method (NMF-IM) to fill in missing data (NAs) and conduct source apportionment. A total of 367 river samples and 35 runoff samples were collected from the Banqiao and Nanfei River basins in Hefei, China, during four rainfall events from June to August 2020. Sixteen indicators were quantified and used for source diagnostics with NMF-IM. The results showed that total phosphorus (TP) had higher concentrations and larger fluctuations than total nitrogen (TN) in river samples collected during rainfall. NMF-IM approximately recovered the value distribution of the NAs. The source profiles and contribution rates calculated by NMF-IM with NAs were close to the original results calculated by NMF without NAs, with a root mean square error of less than 2.3% and differences of less than 9.5%. Including multiple forms of nitrogen and phosphorus indicators helped produce reasonable source diagnostics. At least four indicators were needed to reproduce the contribution rates obtained with all 16 indicators. Two effective indicator combinations were nitrate (NO3-N), nitrite (NO2-N), ammonia nitrogen (NH3-N), and total suspended solids (TSS); and NO3-N, NO2-N, phosphate (PO4-P), and TSS. The pollution source contributions varied with the antecedent dry period (ADP) of each rain event. Treated tailwater and untreated sewage were the major sources, contributing more than 80% of the total pollution in rainstorm events with short ADPs. Dust wash became the dominant contributor after 60 min and contributed 36% of the total pollution in rainstorm events with long ADPs. The average source contribution rates for rainfall events in the Banqiao River were ranked treated tailwater (41%) > untreated sewage (27%) > dust wash (19%) > other sources (16%). Simulation with measured runoff data and comparison with literature results indicated that the pollution source diagnostics were reasonable.
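The abstract does not give the update rules of NMF-IM, so the following is only a generic illustration of how a nonnegative factorization can be fitted while ignoring missing entries and then used to fill them. The Python sketch fits X ≈ W H with a masked version of the Lee-Seung multiplicative updates, using only observed entries, and fills each missing entry from W H. The function names, random initialization, iteration count and toy matrix are assumptions; this is not the authors' NMF-IM. Under the usual receptor-model reading, rows of X are samples and columns are indicators, so rows of W hold per-sample source contributions and rows of H hold source profiles.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of A (n x k) and B (k x m)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def masked_nmf(X, rank, iters=500, eps=1e-9, seed=0):
    """Fit X ~ W H on observed (non-None) entries only, then fill the gaps from W H."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    M = [[0.0 if v is None else 1.0 for v in row] for row in X]  # 1 = observed
    Xo = [[0.0 if v is None else v for v in row] for row in X]   # equals M * X elementwise
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # W <- W * ((M*X) H^T) / ((M*(W H)) H^T)
        WH = matmul(W, H)
        R = [[M[i][j] * WH[i][j] for j in range(m)] for i in range(n)]
        Ht = transpose(H)
        num, den = matmul(Xo, Ht), matmul(R, Ht)
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)] for i in range(n)]
        # H <- H * (W^T (M*X)) / (W^T (M*(W H)))
        WH = matmul(W, H)
        R = [[M[i][j] * WH[i][j] for j in range(m)] for i in range(n)]
        Wt = transpose(W)
        num, den = matmul(Wt, Xo), matmul(Wt, R)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)] for k in range(rank)]
    WH = matmul(W, H)
    filled = [[WH[i][j] if X[i][j] is None else X[i][j] for j in range(m)] for i in range(n)]
    return W, H, filled


# Toy rank-1 matrix (outer product of [1, 2, 3, 4] and [1, 2, 3]) with one value hidden.
X = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, None],  # true value is 6.0
    [3.0, 6.0, 9.0],
    [4.0, 8.0, 12.0],
]
W, H, filled = masked_nmf(X, rank=1)
print(round(filled[1][2], 2))
```

Because the masked entries contribute nothing to either numerator or denominator, the fit is driven only by measured values, and the multiplicative form keeps W and H nonnegative. On this toy rank-1 matrix the hidden entry should come back close to 6.0.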
Affiliation(s)
- Jiashen Feng
- State Key Joint Laboratory of Environment Simulation and Pollution Control, School of the Environment, Beijing Normal University, Beijing, China
- Tingting Duan
- State Key Joint Laboratory of Environment Simulation and Pollution Control, School of the Environment, Beijing Normal University, Beijing, China
- Yanqing Zhou
- State Key Joint Laboratory of Environment Simulation and Pollution Control, School of the Environment, Beijing Normal University, Beijing, China
- Xuan Chang
- State Key Joint Laboratory of Environment Simulation and Pollution Control, School of the Environment, Beijing Normal University, Beijing, China
- Yingxia Li
- State Key Joint Laboratory of Environment Simulation and Pollution Control, School of the Environment, Beijing Normal University, Beijing, China
7
Freeman BA, Jaro S, Park T, Keene S, Tansey W, Reznik E. MIRTH: Metabolite Imputation via Rank-Transformation and Harmonization. Genome Biol 2022; 23:184. [PMID: 36050754 PMCID: PMC9438248 DOI: 10.1186/s13059-022-02738-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/14/2022] [Accepted: 07/23/2022] [Indexed: 12/12/2022]
Abstract
Out of the thousands of metabolites in a given specimen, most metabolomics experiments measure only hundreds, with poor overlap across experimental platforms. Here, we describe Metabolite Imputation via Rank-Transformation and Harmonization (MIRTH), a method to impute unmeasured metabolite abundances by jointly modeling metabolite covariation across datasets that have heterogeneous coverage of metabolite features. MIRTH successfully recovers masked metabolite abundances both within single datasets and across multiple, independently profiled datasets. MIRTH demonstrates that latent information about otherwise unmeasured metabolites is embedded within existing metabolomics data and can be used to generate novel hypotheses and simplify existing metabolomic workflows.
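The rank-transformation step named in MIRTH's title can be illustrated on its own: replacing raw abundances with scaled ranks puts datasets measured on different platforms and intensity scales onto a common 0-1 scale. The Python sketch below is a generic per-feature rank transformation within one dataset, with tied values given their average rank and missing values left missing; that orientation, the tie rule, the toy data and the function name are assumptions. It does not reproduce MIRTH's harmonization or its joint modeling of metabolite covariation across datasets.

```python
def rank_transform(values):
    """Scale observed values to (0, 1] by rank; ties share their average rank, None stays None."""
    observed = sorted((v, idx) for idx, v in enumerate(values) if v is not None)
    n = len(observed)
    out = [None] * len(values)
    start = 0
    while start < n:
        # Find the end of the block of tied values that begins at `start`.
        end = start
        while end + 1 < n and observed[end + 1][0] == observed[start][0]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the tied block
        for k in range(start, end + 1):
            out[observed[k][1]] = avg_rank / n
        start = end + 1
    return out


# One metabolite measured in two hypothetical datasets on very different scales.
dataset_a = [1.0, 100.0, 10.0]
dataset_b = [0.5, 40.0, 2.0]
print(rank_transform(dataset_a))  # identical to the next line despite the scale difference
print(rank_transform(dataset_b))
print(rank_transform([10.0, None, 3.0, 7.0, 3.0]))  # [1.0, None, 0.375, 0.75, 0.375]
```

Ranks discard absolute scale, which is what makes values from independently profiled datasets comparable; the trade-off is that information about the magnitude of differences between samples is lost.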
Affiliation(s)
- Benjamin A Freeman
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
- Sophie Jaro
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
- Electrical Engineering Department, The Cooper Union, New York, USA
- Tricia Park
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
- Sam Keene
- Electrical Engineering Department, The Cooper Union, New York, USA
- Wesley Tansey
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
- Ed Reznik
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA