1
Borgmästars E, Ulfenborg B, Johansson M, Jonsson P, Billing O, Franklin O, Lundin C, Jacobson S, Simm M, Lubovac-Pilav Z, Sund M. Multi-omics profiling to identify early plasma biomarkers in pre-diagnostic pancreatic ductal adenocarcinoma: a nested case-control study. Transl Oncol 2024; 48:102059. [PMID: 39018772 DOI: 10.1016/j.tranon.2024.102059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 05/20/2024] [Accepted: 07/05/2024] [Indexed: 07/19/2024] Open
Abstract
Pancreatic ductal adenocarcinoma (PDAC) is an aggressive disease with poor survival. Novel biomarkers are urgently needed to improve the outcome through early detection. Here, we aimed to discover novel biomarkers for early PDAC detection using multi-omics profiling in pre-diagnostic plasma samples biobanked after routine health examinations. A nested case-control study within the Northern Sweden Health and Disease Study was designed. Pre-diagnostic plasma samples from 37 future PDAC patients collected within 2.3 years before diagnosis and 37 matched healthy controls were included. We analyzed metabolites using liquid chromatography mass spectrometry and gas chromatography mass spectrometry, microRNAs by HTG edgeseq, proteins by multiplex proximity extension assays, as well as three clinical biomarkers using milliplex technology. Supervised and unsupervised multi-omics integration were performed as well as univariate analyses for the different omics types and clinical biomarkers. Multiple hypothesis testing was corrected using Benjamini-Hochberg's method and a false discovery rate (FDR) below 0.1 was considered statistically significant. Carbohydrate antigen (CA) 19-9 was associated with PDAC risk (OR [95 % CI] = 3.09 [1.31-7.29], FDR = 0.03) and increased closer to PDAC diagnosis. Supervised multi-omics models resulted in poor discrimination between future PDAC cases and healthy controls with obtained accuracies between 0.429-0.500. No single metabolite, microRNA, or protein was differentially altered (FDR < 0.1) between future PDAC cases and healthy controls. CA 19-9 levels increase up to two years prior to PDAC diagnosis but extensive multi-omics analysis including metabolomics, microRNAomics and proteomics in this cohort did not identify novel early biomarkers for PDAC.
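The multiple-testing step named in the abstract can be illustrated with a minimal sketch of Benjamini-Hochberg adjustment. The function name `benjamini_hochberg` and the example p-values are assumptions made for illustration only; the abstract does not state which software the authors used. The 0.1 threshold is the one given in the abstract.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, not taken from the study.
example_p = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(example_p)
# Flag features below the FDR threshold of 0.1 used in the abstract.
significant = [q < 0.1 for q in adjusted]
```

Each adjusted value is the smallest FDR level at which that feature would be called significant, so comparing it with 0.1 reproduces the decision rule described above.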
Affiliation(s)
- Emmy Borgmästars
  - Department of Diagnostics and Intervention/ Surgery, Umeå University, Umeå, Sweden.
- Benjamin Ulfenborg
  - School of Bioscience, Department of Biology and Bioinformatics, University of Skövde, Skövde, Sweden
- Mattias Johansson
  - Genomic Epidemiology Branch, International Agency for Research on Cancer, Lyon, France
- Pär Jonsson
  - Department of Chemistry, Umeå University, Umeå, Sweden
- Ola Billing
  - Department of Diagnostics and Intervention/ Surgery, Umeå University, Umeå, Sweden
- Oskar Franklin
  - Department of Diagnostics and Intervention/ Surgery, Umeå University, Umeå, Sweden; Division of Surgical Oncology, Department of Surgery, University of Colorado School of Medicine, Aurora, CO, USA
- Christina Lundin
  - Department of Diagnostics and Intervention/ Surgery, Umeå University, Umeå, Sweden
- Sara Jacobson
  - Department of Diagnostics and Intervention/ Surgery, Umeå University, Umeå, Sweden
- Maja Simm
  - Department of Diagnostics and Intervention/ Surgery, Umeå University, Umeå, Sweden; Department of Clinical Sciences/ Obstetrics and Gynecology, Umeå University, Umeå, Sweden
- Zelmina Lubovac-Pilav
  - School of Bioscience, Department of Biology and Bioinformatics, University of Skövde, Skövde, Sweden
- Malin Sund
  - Department of Diagnostics and Intervention/ Surgery, Umeå University, Umeå, Sweden; Department of Surgery, University of Helsinki and Helsinki University Hospital, Helsinki, Finland

2
Novoloaca A, Broc C, Beloeil L, Yu WH, Becker J. Comparative analysis of integrative classification methods for multi-omics data. Brief Bioinform 2024; 25:bbae331. [PMID: 38985929 PMCID: PMC11234228 DOI: 10.1093/bib/bbae331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 05/31/2024] [Indexed: 07/12/2024] Open
Abstract
Recent advances in sequencing, mass spectrometry, and cytometry technologies have enabled researchers to collect multiple 'omics data types from a single sample. These large datasets have led to a growing consensus that a holistic approach is needed to identify new candidate biomarkers and unveil mechanisms underlying disease etiology, a key to precision medicine. While many reviews and benchmarks have been conducted on unsupervised approaches, their supervised counterparts have received less attention in the literature and no gold standard has emerged yet. In this work, we present a thorough comparison of a selection of six methods, representative of the main families of intermediate integrative approaches (matrix factorization, multiple kernel methods, ensemble learning, and graph-based methods). As non-integrative control, random forest was performed on concatenated and separated data types. Methods were evaluated for classification performance on both simulated and real-world datasets, the latter being carefully selected to cover different medical applications (infectious diseases, oncology, and vaccines) and data modalities. A total of 15 simulation scenarios were designed from the real-world datasets to explore a large and realistic parameter space (e.g. sample size, dimensionality, class imbalance, effect size). On real data, the method comparison showed that integrative approaches performed better or equally well than their non-integrative counterpart. By contrast, DIABLO and the four random forest alternatives outperform the others across the majority of simulation scenarios. The strengths and limitations of these methods are discussed in detail as well as guidelines for future applications.
Affiliation(s)
- Alexei Novoloaca
  - BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Camilo Broc
  - BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Laurent Beloeil
  - BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Wen-Han Yu
  - Bill & Melinda Gates Medical Research Institute, Cambridge, Massachusetts, MA 02139, United States
- Jérémie Becker
  - BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France

3
Drouard G, Mykkänen J, Heiskanen J, Pohjonen J, Ruohonen S, Pahkala K, Lehtimäki T, Wang X, Ollikainen M, Ripatti S, Pirinen M, Raitakari O, Kaprio J. Exploring machine learning strategies for predicting cardiovascular disease risk factors from multi-omic data. BMC Med Inform Decis Mak 2024; 24:116. [PMID: 38698395 PMCID: PMC11064347 DOI: 10.1186/s12911-024-02521-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 04/29/2024] [Indexed: 05/05/2024] Open
Abstract
BACKGROUND Machine learning (ML) classifiers are increasingly used for predicting cardiovascular disease (CVD) and related risk factors using omics data, although these outcomes often exhibit categorical nature and class imbalances. However, little is known about which ML classifier, omics data, or upstream dimension reduction strategy has the strongest influence on prediction quality in such settings. Our study aimed to illustrate and compare different machine learning strategies to predict CVD risk factors under different scenarios. METHODS We compared the use of six ML classifiers in predicting CVD risk factors using blood-derived metabolomics, epigenetics and transcriptomics data. Upstream omic dimension reduction was performed using either unsupervised or semi-supervised autoencoders, whose downstream ML classifier performance we compared. CVD risk factors included systolic and diastolic blood pressure measurements and ultrasound-based biomarkers of left ventricular diastolic dysfunction (LVDD; E/e' ratio, E/A ratio, LAVI) collected from 1,249 Finnish participants, of which 80% were used for model fitting. We predicted individuals with low, high or average levels of CVD risk factors, the latter class being the most common. We constructed multi-omic predictions using a meta-learner that weighted single-omic predictions. Model performance comparisons were based on the F1 score. Finally, we investigated whether learned omic representations from pre-trained semi-supervised autoencoders could improve outcome prediction in an external cohort using transfer learning. RESULTS Depending on the ML classifier or omic used, the quality of single-omic predictions varied. Multi-omics predictions outperformed single-omics predictions in most cases, particularly in the prediction of individuals with high or low CVD risk factor levels. Semi-supervised autoencoders improved downstream predictions compared to the use of unsupervised autoencoders. 
In addition, median gains in Area Under the Curve by transfer learning compared to modelling from scratch ranged from 0.09 to 0.14 and 0.07 to 0.11 units for transcriptomic and metabolomic data, respectively. CONCLUSIONS By illustrating the use of different machine learning strategies in different scenarios, our study provides a platform for researchers to evaluate how the choice of omics, ML classifiers, and dimension reduction can influence the quality of CVD risk factor predictions.
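Because the abstract compares models by F1 score on a three-level outcome with a dominant "average" class, a per-class F1 calculation may help make the metric concrete. This is a minimal sketch: the function name `f1_for_class`, the label strings, and the toy labels are assumptions for illustration and are not taken from the study.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score for one class, treating that class as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: define F1 as 0 to avoid division by zero.
        return 0.0
    # Equivalent to the harmonic mean of precision and recall.
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical labels with "average" as the most common class.
y_true = ["low", "average", "average", "average", "high", "high"]
y_pred = ["low", "average", "average", "high", "high", "average"]
per_class_f1 = {c: f1_for_class(y_true, y_pred, c) for c in ("low", "average", "high")}
```

Reporting F1 per class, rather than overall accuracy, keeps performance on the rarer "low" and "high" classes visible when the "average" class dominates.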
Affiliation(s)
- Gabin Drouard
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.
- Juha Mykkänen
  - Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
  - Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
- Jarkko Heiskanen
  - Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
  - Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
- Joona Pohjonen
  - Research Program in Systems Oncology, University of Helsinki, Helsinki, Finland
- Saku Ruohonen
  - Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
- Katja Pahkala
  - Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
  - Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
  - Paavo Nurmi Centre & Unit for Health and Physical Activity, University of Turku, Turku, Finland
- Terho Lehtimäki
  - Department of Clinical Chemistry, Fimlab Laboratories, and Finnish Cardiovascular Research Center - Tampere, Faculty of Medicine and Health Technology, Tampere University, 33520, Tampere, Finland
- Xiaoling Wang
  - Georgia Prevention Institute, Medical College of Georgia, Augusta University, Augusta, GA, USA
- Miina Ollikainen
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
  - Minerva Foundation Institute for Medical Research, Helsinki, Finland
- Samuli Ripatti
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
  - Public Health, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Matti Pirinen
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
  - Public Health, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Olli Raitakari
  - Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
  - Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
  - Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, Turku, Finland
- Jaakko Kaprio
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.

4
Lac L, Leung CK, Hu P. Computational frameworks integrating deep learning and statistical models in mining multimodal omics data. J Biomed Inform 2024; 152:104629. [PMID: 38552994 DOI: 10.1016/j.jbi.2024.104629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 02/26/2024] [Accepted: 03/25/2024] [Indexed: 04/04/2024]
Abstract
BACKGROUND In health research, multimodal omics data analysis is widely used to address important clinical and biological questions. Traditional statistical methods rely on the strong assumptions of distribution. Statistical methods such as testing and differential expression are commonly used in omics analysis. Deep learning, on the other hand, is an advanced computer science technique that is powerful in mining high-dimensional omics data for prediction tasks. Recently, integrative frameworks or methods have been developed for omics studies that combine statistical models and deep learning algorithms. METHODS AND RESULTS The aim of these integrative frameworks is to combine the strengths of both statistical methods and deep learning algorithms to improve prediction accuracy while also providing interpretability and explainability. This review report discusses the current state-of-the-art integrative frameworks, their limitations, and potential future directions in survival and time-to-event longitudinal analysis, dimension reduction and clustering, regression and classification, feature selection, and causal and transfer learning.
Affiliation(s)
- Leann Lac
  - Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Statistics, University of Manitoba, Winnipeg, Manitoba, Canada
- Carson K Leung
  - Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada
- Pingzhao Hu
  - Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Biochemistry, Western University, London, Ontario, Canada; Department of Computer Science, Western University, London, Ontario, Canada; Department of Oncology, Western University, London, Ontario, Canada; Department of Epidemiology and Biostatistics, Western University, London, Ontario, Canada; The Children's Health Research Institute, Lawson Health Research Institute, London, Ontario, Canada.

5
Young T, Laroche O, Walker SP, Miller MR, Casanovas P, Steiner K, Esmaeili N, Zhao R, Bowman JP, Wilson R, Bridle A, Carter CG, Nowak BF, Alfaro AC, Symonds JE. Prediction of Feed Efficiency and Performance-Based Traits in Fish via Integration of Multiple Omics and Clinical Covariates. BIOLOGY 2023; 12:1135. [PMID: 37627019 PMCID: PMC10452023 DOI: 10.3390/biology12081135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 08/07/2023] [Accepted: 08/08/2023] [Indexed: 08/27/2023]
Abstract
Fish aquaculture is a rapidly expanding global industry, set to support growing demands for sources of marine protein. Enhancing feed efficiency (FE) in farmed fish is required to reduce production costs and improve sector sustainability. Recognising that organisms are complex systems whose emerging phenotypes are the product of multiple interacting molecular processes, systems-based approaches are expected to deliver new biological insights into FE and growth performance. Here, we establish 14 diverse layers of multi-omics and clinical covariates to assess their capacities to predict FE and associated performance traits in a fish model (Oncorhynchus tshawytscha) and uncover the influential variables. Inter-omic relatedness between the different layers revealed several significant concordances, particularly between datasets originating from similar material/tissue and between blood indicators and some of the proteomic (liver), metabolomic (liver), and microbiomic layers. Single- and multi-layer random forest (RF) regression models showed that integration of all data layers provide greater FE prediction power than any single-layer model alone. Although FE was among the most challenging of the traits we attempted to predict, the mean accuracy of 40 different FE models in terms of root-mean square errors normalized to percentage was 30.4%, supporting RF as a feature selection tool and approach for complex trait prediction. Major contributions to the integrated FE models were derived from layers of proteomic and metabolomic data, with substantial influence also provided by the lipid composition layer. A correlation matrix of the top 27 variables in the models highlighted FE trait-associations with faecal bacteria (Serratia spp.), palmitic and nervonic acid moieties in whole body lipids, levels of free glycerol in muscle, and N-acetylglutamic acid content in liver. 
In summary, we identified subsets of molecular characteristics for the assessment of commercially relevant performance-based metrics in farmed Chinook salmon.
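The abstract reports model accuracy as a root-mean-square error normalized to a percentage, without stating the normalizing quantity. The sketch below assumes normalization by the mean of the observed values; that choice, the function name `nrmse_percent`, and the toy numbers are assumptions for illustration, and the authors may have normalized differently (for example by the observed range).

```python
import math


def nrmse_percent(observed, predicted):
    """RMSE divided by the mean observed value, expressed as a percentage.

    Assumption: normalization by the observed mean; the source does not
    specify which normalization was used.
    """
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(mse)
    mean_observed = sum(observed) / n
    return 100.0 * rmse / mean_observed


# Hypothetical trait values and predictions, not taken from the study.
example_nrmse = nrmse_percent([2.0, 4.0, 6.0], [3.0, 4.0, 5.0])
```

Expressing the error relative to the scale of the trait makes models for different traits comparable, which is the apparent purpose of the percentage in the abstract.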
Affiliation(s)
- Tim Young
  - Aquaculture Biotechnology Research Group, Department of Environmental Science, School of Science, Private Bag 92006, Auckland 1142, New Zealand
  - The Centre for Biomedical and Chemical Sciences, School of Science, Auckland University of Technology, Private Bag 92006, Auckland 1142, New Zealand
- Matthew R. Miller
  - Cawthron Institute, Nelson 7010, New Zealand
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Noah Esmaeili
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Ruixiang Zhao
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- John P. Bowman
  - Tasmanian Institute of Agricultural Research, University of Tasmania, Hobart 7005, Australia
- Richard Wilson
  - Central Science Laboratory, Research Division, University of Tasmania, Hobart 7001, Australia
- Andrew Bridle
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Chris G. Carter
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
  - Blue Economy Cooperative Research Centre, Launceston 7250, Australia
- Barbara F. Nowak
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Andrea C. Alfaro
  - Aquaculture Biotechnology Research Group, Department of Environmental Science, School of Science, Private Bag 92006, Auckland 1142, New Zealand
- Jane E. Symonds
  - Cawthron Institute, Nelson 7010, New Zealand
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia

6
Wissel D, Rowson D, Boeva V. Systematic comparison of multi-omics survival models reveals a widespread lack of noise resistance. CELL REPORTS METHODS 2023; 3:100461. [PMID: 37159669 PMCID: PMC10162996 DOI: 10.1016/j.crmeth.2023.100461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Revised: 02/01/2023] [Accepted: 03/30/2023] [Indexed: 05/11/2023]
Abstract
As observed in several previous studies, integrating more molecular modalities in multi-omics cancer survival models may not always improve model accuracy. In this study, we compared eight deep learning and four statistical integration techniques for survival prediction on 17 multi-omics datasets, examining model performance in terms of overall accuracy and noise resistance. We found that one deep learning method, mean late fusion, and two statistical methods, PriorityLasso and BlockForest, performed best in terms of both noise resistance and overall discriminative and calibration performance. Nevertheless, all methods struggled to adequately handle noise when too many modalities were added. In summary, we confirmed that current multi-omics survival methods are not sufficiently noise resistant. We recommend relying on only modalities for which there is known predictive value for a particular cancer type until models that have stronger noise-resistance properties are developed.
Affiliation(s)
- David Wissel
  - ETH Zurich, Department of Computer Science, Zurich, Switzerland
  - University of Zurich, Department of Molecular Life Sciences, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Daniel Rowson
  - ETH Zurich, Department of Computer Science, Zurich, Switzerland
- Valentina Boeva
  - ETH Zurich, Department of Computer Science, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Université de Paris UMR-S1016, Institut Cochin, Inserm U1016, Paris, France
  - Corresponding author

7
Predicting prediction: A systematic workflow to analyze factors affecting the classification performance in genomic biomarker discovery. PLoS One 2022; 17:e0276607. [DOI: 10.1371/journal.pone.0276607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Accepted: 10/11/2022] [Indexed: 11/11/2022] Open
Abstract
High throughput technologies in genomics enable the analysis of small alterations in gene expression levels. Patterns of such deviations are an important starting point for the discovery and verification of new biomarker candidates. Identifying such patterns is a challenging task that requires sophisticated machine learning approaches. Currently, there are a variety of classification models, and a common approach is to compare the performance and select the best one for a given classification problem. Since the association between the features of a data set and the performance of a particular classification method is still not fully understood, the main contribution of this work is to provide a new methodology for predicting the prediction results of different classifiers in the field of biomarker discovery. We propose here a three-steps computational workflow that includes an analysis of the data set characteristics, the calculation of the classification accuracy and, finally, the prediction of the resulting classification error. The experiments were carried out on synthetic and microarray datasets. Using this method, we showed that the predictability strongly depends on the discriminatory ability of the features, e.g., sets of genes, in two or multi-class datasets. If a dataset has a certain discriminatory ability, this method enables prediction of the classification performance before applying a learning model. Thus, our results contribute to a better understanding of the relationship between dataset characteristics and the corresponding performance of a machine learning method, and suggest the optimal classification method for a given dataset based on its discriminatory ability.

8
Li Y, Mansmann U, Du S, Hornung R. Benchmark study of feature selection strategies for multi-omics data. BMC Bioinformatics 2022; 23:412. [PMID: 36199022 PMCID: PMC9533501 DOI: 10.1186/s12859-022-04962-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/22/2022] [Accepted: 09/21/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In the last few years, multi-omics data, that is, datasets containing different types of high-dimensional molecular variables for the same samples, have become increasingly available. To date, several comparison studies focused on feature selection methods for omics data, but to our knowledge, none compared these methods for the special case of multi-omics data. Given that these data have specific structures that differentiate them from single-omics data, it is unclear whether different feature selection strategies may be optimal for such data. In this paper, using 15 cancer multi-omics datasets we compared four filter methods, two embedded methods, and two wrapper methods with respect to their performance in the prediction of a binary outcome in several situations that may affect the prediction results. As classifiers, we used support vector machines and random forests. The methods were compared using repeated fivefold cross-validation. The accuracy, the AUC, and the Brier score served as performance metrics. RESULTS The results suggested that, first, the chosen number of selected features affects the predictive performance for many feature selection methods but not all. Second, whether the features were selected by data type or from all data types concurrently did not considerably affect the predictive performance, but for some methods, concurrent selection took more time. Third, regardless of which performance measure was considered, the feature selection methods mRMR, the permutation importance of random forests, and the Lasso tended to outperform the other considered methods. Here, mRMR and the permutation importance of random forests already delivered strong predictive performance when considering only a few selected features. Finally, the wrapper methods were computationally much more expensive than the filter and embedded methods. 
CONCLUSIONS We recommend the permutation importance of random forests and the filter method mRMR for feature selection using multi-omics data, where, however, mRMR is considerably more computationally costly.
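Of the three performance metrics named in the abstract, the Brier score is perhaps the least familiar, so a minimal sketch for a binary outcome is given here. The function name `brier_score` and the toy probabilities are assumptions for illustration; the benchmark's own implementation is not described in the abstract.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome.

    Lower is better; 0 means perfectly confident and correct predictions.
    """
    n = len(y_true)
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n


# Hypothetical outcomes (1 = event) and predicted event probabilities.
outcomes = [1, 0, 0]
probabilities = [0.9, 0.2, 0.6]
example_brier = brier_score(outcomes, probabilities)
```

Unlike accuracy, this metric penalizes overconfident wrong probabilities, which is one reason to report it alongside accuracy and AUC.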
Affiliation(s)
- Yingxia Li
  - Institute for Medical Information Processing, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, 81377, Munich, Germany.
- Ulrich Mansmann
  - Institute for Medical Information Processing, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, 81377, Munich, Germany
- Shangming Du
  - Institute for Medical Information Processing, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, 81377, Munich, Germany
- Roman Hornung
  - Institute for Medical Information Processing, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, 81377, Munich, Germany

9
Viswanathan VS, Toro P, Corredor G, Mukhopadhyay S, Madabhushi A. The state of the art for artificial intelligence in lung digital pathology. J Pathol 2022; 257:413-429. [PMID: 35579955 PMCID: PMC9254900 DOI: 10.1002/path.5966] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 02/22/2022] [Revised: 04/26/2022] [Accepted: 05/15/2022] [Indexed: 12/03/2022]
Abstract
Lung diseases carry a significant burden of morbidity and mortality worldwide. The advent of digital pathology (DP) and an increase in computational power have led to the development of artificial intelligence (AI)-based tools that can assist pathologists and pulmonologists in improving clinical workflow and patient management. While previous works have explored the advances in computational approaches for breast, prostate, and head and neck cancers, there has been a growing interest in applying these technologies to lung diseases as well. The application of AI tools on radiology images for better characterization of indeterminate lung nodules, fibrotic lung disease, and lung cancer risk stratification has been well documented. In this article, we discuss methodologies used to build AI tools in lung DP, describing the various hand-crafted and deep learning-based unsupervised feature approaches. Next, we review AI tools across a wide spectrum of lung diseases including cancer, tuberculosis, idiopathic pulmonary fibrosis, and COVID-19. We discuss the utility of novel imaging biomarkers for different types of clinical problems including quantification of biomarkers like PD-L1, lung disease diagnosis, risk stratification, and prediction of response to treatments such as immune checkpoint inhibitors. We also look briefly at some emerging applications of AI tools in lung DP such as multimodal data analysis, 3D pathology, and transplant rejection. Lastly, we discuss the future of DP-based AI tools, describing the challenges with regulatory approval, developing reimbursement models, planning clinical deployment, and addressing AI biases. © 2022 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.
Affiliation(s)
- Paula Toro
  - Department of Pathology, Cleveland Clinic, Cleveland, OH, USA
- Germán Corredor
  - Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH, USA
  - Louis Stokes Cleveland VA Medical Center, Cleveland, OH, USA
- Anant Madabhushi
  - Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH, USA
  - Louis Stokes Cleveland VA Medical Center, Cleveland, OH, USA

10
Synergistic Effects of Different Levels of Genomic Data for the Staging of Lung Adenocarcinoma: An Illustrative Study. Genes (Basel) 2021; 12:genes12121872. [PMID: 34946821 PMCID: PMC8700916 DOI: 10.3390/genes12121872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Revised: 11/18/2021] [Accepted: 11/24/2021] [Indexed: 11/17/2022] Open
Abstract
Lung adenocarcinoma (LUAD) is a common and very lethal cancer. Accurate staging is a prerequisite for its effective diagnosis and treatment. Therefore, improving the accuracy of the stage prediction of LUAD patients is of great clinical relevance. Previous works have mainly focused on single genomic data information or a small number of different omics data types concurrently for generating predictive models. A few of them have considered multi-omics data from genome to proteome. We used a publicly available dataset to illustrate the potential of multi-omics data for stage prediction in LUAD. In particular, we investigated the roles of the specific omics data types in the prediction process. We used a self-developed method, Omics-MKL, for stage prediction that combines an existing feature ranking technique Minimum Redundancy and Maximum Relevance (mRMR), which avoids redundancy among the selected features, and multiple kernel learning (MKL), applying different kernels for different omics data types. Each of the considered omics data types individually provided useful prediction results. Moreover, using multi-omics data delivered notably better results than using single-omics data. Gene expression and methylation information seem to play vital roles in the staging of LUAD. The Omics-MKL method retained 70 features after the selection process. Of these, 21 (30%) were methylation features and 34 (48.57%) were gene expression features. Moreover, 18 (25.71%) of the selected features are known to be related to LUAD, and 29 (41.43%) to lung cancer in general. Using multi-omics data from genome to proteome for predicting the stage of LUAD seems promising because each omics data type may improve the accuracy of the predictions. Here, methylation and gene expression data may play particularly important roles.
|
11
|
Bouranis JA, Beaver LM, Ho E. Metabolic Fate of Dietary Glucosinolates and Their Metabolites: A Role for the Microbiome. Front Nutr 2021; 8:748433. [PMID: 34631775 PMCID: PMC8492924 DOI: 10.3389/fnut.2021.748433] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/27/2021] [Accepted: 08/27/2021] [Indexed: 01/08/2023] Open
Abstract
Robust evidence shows that phytochemicals from cruciferous vegetables, like broccoli, are associated with numerous health benefits. The anti-cancer properties of these foods are attributed to bioactive isothiocyanates (ITCs) and indoles, phytochemicals generated from biological precursor compounds called glucosinolates. ITCs, and particularly sulforaphane (SFN), are of intense interest as they block the initiation and suppress the progression of cancer through genetic and epigenetic mechanisms. The efficacy of these compounds is well demonstrated in cell culture and animal models; however, high levels of inter-individual variation in the absorption and excretion of ITCs are a significant barrier to the use of dietary glucosinolates to prevent and treat disease. The source of inter-individual ITC variation has yet to be fully elucidated, and the gut microbiome may play a key role. This review highlights evidence that the gut microbiome influences the metabolic fate and activity of ITCs. Human feeding trials have shown that inter-individual variations in gut microbiome composition coincide with variations in ITC absorption and excretion, and that some bacteria produce ITCs from glucosinolates. Additionally, consumption of cruciferous vegetables can alter the composition of the gut microbiome and shift the physicochemical environment of the gut lumen, influencing the production of phytochemicals. Microbiome- and diet-induced changes to ITC metabolism may decrease the production of cancer-fighting phytochemicals such as SFN and increase the production of biologically inert ones like SFN-nitrile. We conclude by offering perspective on the use of novel “omics” technologies to elucidate the interplay of the gut microbiome and ITC formation.
Affiliation(s)
- John A Bouranis
  - Linus Pauling Institute, Oregon State University, Corvallis, OR, United States; School of Biological and Population Health Sciences, Oregon State University, Corvallis, OR, United States
- Laura M Beaver
  - Linus Pauling Institute, Oregon State University, Corvallis, OR, United States; School of Biological and Population Health Sciences, Oregon State University, Corvallis, OR, United States
- Emily Ho
  - Linus Pauling Institute, Oregon State University, Corvallis, OR, United States; School of Biological and Population Health Sciences, Oregon State University, Corvallis, OR, United States
|
12
|
Vahabi N, McDonough CW, Desai AA, Cavallari LH, Duarte JD, Michailidis G. Cox-sMBPLS: An Algorithm for Disease Survival Prediction and Multi-Omics Module Discovery Incorporating Cis-Regulatory Quantitative Effects. Front Genet 2021; 12:701405. [PMID: 34408773 PMCID: PMC8366414 DOI: 10.3389/fgene.2021.701405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2021] [Accepted: 07/07/2021] [Indexed: 12/03/2022] Open
Abstract
Background: The development of high-throughput techniques has enabled profiling a large number of biomolecules across a number of molecular compartments. The challenge then becomes to integrate such multimodal Omics data to gain insights into biological processes and disease onset and progression mechanisms. Further, given the high dimensionality of such data, incorporating prior biological information on interactions between molecular compartments when developing statistical models for data integration is beneficial, especially in settings involving a small number of samples. Results: We develop a supervised model for time to event data (e.g., death, biochemical recurrence) that simultaneously accounts for redundant information within Omics profiles and leverages prior biological associations between them through a multi-block PLS framework. The interactions between data from different molecular compartments (e.g., epigenome, transcriptome, methylome, etc.) were captured by using cis-regulatory quantitative effects in the proposed model. The model, coined Cox-sMBPLS, exhibits superior prediction performance and improved feature selection based on both simulation studies and analysis of data from heart failure patients. Conclusion: The proposed supervised Cox-sMBPLS model can effectively incorporate prior biological information in the survival prediction system, leading to improved prediction performance and feature selection. It also enables the identification of multi-Omics modules of biomolecules that impact the patients’ survival probability and also provides insights into potential relevant risk factors that merit further investigation.
Affiliation(s)
- Nasim Vahabi
  - Informatics Institute, University of Florida, Gainesville, FL, United States
- Caitrin W McDonough
  - Department of Pharmacotherapy and Translational Research, Center for Pharmacogenomics and Precision Medicine, University of Florida, Gainesville, FL, United States
- Ankit A Desai
  - Department of Medicine, Indiana University, Indianapolis, IN, United States
- Larisa H Cavallari
  - Department of Pharmacotherapy and Translational Research, Center for Pharmacogenomics and Precision Medicine, University of Florida, Gainesville, FL, United States
- Julio D Duarte
  - Department of Pharmacotherapy and Translational Research, Center for Pharmacogenomics and Precision Medicine, University of Florida, Gainesville, FL, United States
- George Michailidis
  - Informatics Institute, University of Florida, Gainesville, FL, United States
|
13
|
Zhao L, Dong Q, Luo C, Wu Y, Bu D, Qi X, Luo Y, Zhao Y. DeepOmix: A scalable and interpretable multi-omics deep learning framework and application in cancer survival analysis. Comput Struct Biotechnol J 2021; 19:2719-2725. [PMID: 34093987 PMCID: PMC8131983 DOI: 10.1016/j.csbj.2021.04.067] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 03/01/2021] [Revised: 04/26/2021] [Accepted: 04/27/2021] [Indexed: 01/23/2023] Open
Abstract
Integrative analysis of multi-omics data can yield valuable insights into the complex molecular mechanisms of various diseases. However, because omics data types differ in modality and are high-dimensional, using and integrating them is challenging. There is an urgent need for a powerful method to improve survival prediction and detect functional gene modules from multi-omics data. To address these problems, we present DeepOmix (a scalable and interpretable multi-Omics Deep learning framework and application in cancer survival analysis), a flexible, scalable, and interpretable method for extracting relationships between clinical survival time and multi-omics data based on a deep learning framework. DeepOmix enables the non-linear combination of variables from different omics datasets and incorporates prior biological information defined by users (such as signaling pathways and tissue networks). Benchmark experiments demonstrate that DeepOmix outperforms five other cutting-edge prediction methods. In addition, Lower Grade Glioma (LGG) is taken as a case study to perform prognosis prediction and to illustrate the functional module nodes that are associated with the prognostic result in the prediction model.
Affiliation(s)
- Lianhe Zhao
  - Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Qiongye Dong
  - Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Chunlong Luo
  - Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yang Wu
  - Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Dechao Bu
  - Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Xiaoning Qi
  - Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yufan Luo
  - Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yi Zhao
  - Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Hwa Mei Hospital, University of Chinese Academy of Sciences, Ningbo 315000, China
|
14
|
Herrmann M, Probst P, Hornung R, Jurinovic V, Boulesteix AL. Large-scale benchmark study of survival prediction methods using multi-omics data. Brief Bioinform 2020; 22:5895463. [PMID: 32823283 PMCID: PMC8138887 DOI: 10.1093/bib/bbaa167] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 03/07/2020] [Revised: 06/25/2020] [Accepted: 07/03/2020] [Indexed: 12/18/2022] Open
Abstract
Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables, are increasingly often generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are most appropriate to derive such prediction models. We aim to give some answers to these questions through a large-scale benchmark study using real data. Different prediction methods from machine learning and statistics were applied on 18 multi-omics cancer datasets (35 to 1000 observations, up to 100 000 variables) from the database 'The Cancer Genome Atlas' (TCGA). The considered outcome was the (censored) survival time. Eleven methods based on boosting, penalized regression and random forest were compared, comprising both methods that do and that do not take the group structure of the omics variables into account. The Kaplan-Meier estimate and a Cox model using only clinical variables were used as reference methods. The methods were compared using several repetitions of 5-fold cross-validation. Uno's C-index and the integrated Brier score served as performance metrics. The results indicate that methods taking into account the multi-omics structure have a slightly better prediction performance. Taking this structure into account can protect the predictive information in low-dimensional groups (especially clinical variables) from not being exploited during prediction. Moreover, only the block forest method outperformed the Cox model on average, and only slightly. This indicates, as a by-product of our study, that in the considered TCGA studies the utility of multi-omics data for prediction purposes was limited. Contact: moritz.herrmann@stat.uni-muenchen.de, +49 89 2180 3198. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online. All analyses are reproducible using R code freely available on GitHub.
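To make the concordance metric named in the abstract more concrete, here is a minimal sketch of Harrell's C-index, a simpler relative of Uno's C-index. It is an assumption-laden illustration rather than the benchmark's R code: it omits the inverse-probability-of-censoring weights that Uno's estimator applies, it ignores pairs with tied survival times, and the function name `harrell_c_index` is hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher-risk subject fails first.

    times:       observed follow-up time per subject
    events:      1 (or True) if the event was observed, 0 (or False) if censored
    risk_scores: model output where a larger value means higher predicted risk

    A pair (i, j) is treated as comparable when subject i had an observed
    event and a strictly shorter time than subject j. Pairs with tied times
    are skipped (a simplification of this sketch).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable
```

A value of 0.5 corresponds to uninformative predictions and 1.0 to a perfect ranking, which is the scale on which small differences between methods in such a benchmark would be read.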
Affiliation(s)
- Moritz Herrmann
  - Department of Statistics, Ludwig Maximilian University, Munich, 80539, Germany
- Philipp Probst
  - Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany
- Roman Hornung
  - Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany
- Vindi Jurinovic
  - Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany
- Anne-Laure Boulesteix
  - Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany
|
15
|
Zhou J, Wang L, Yuan R, Yu X, Chen Z, Yang F, Sun G, Dong Q. Signatures of Mucosal Microbiome in Oral Squamous Cell Carcinoma Identified Using a Random Forest Model. Cancer Manag Res 2020; 12:5353-5363. [PMID: 32753953 PMCID: PMC7342497 DOI: 10.2147/cmar.s251021] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/24/2020] [Accepted: 06/12/2020] [Indexed: 12/27/2022] Open
Abstract
Objective: The aim of this study was to explore the signatures of the oral microbiome associated with oral squamous cell carcinoma (OSCC) using a random forest (RF) model. Patients and Methods: A total of 24 patients with OSCC were enrolled in the study. The oral microbiome was assessed in cancerous lesions and matched paracancerous tissues from each patient using 16S rRNA gene sequencing. Signatures of the mucosal microbiome in OSCC were identified using an RF model. Results: Significant differences were found between OSCC lesions and matched paracancerous tissues with respect to the microbial profile and composition. Linear discriminant analysis effect size (LEfSe) analyses identified 15 bacterial genera associated with cancerous lesions. Fusobacterium, Treponema, Streptococcus, Peptostreptococcus, Carnobacterium, Tannerella, Parvimonas and Filifactor were enriched. A classifier based on the RF model identified a microbial signature comprising 12 bacteria, which was capable of distinguishing cancerous lesions from paracancerous tissues (AUC = 0.82). The network of the oral microbiome in cancerous lesions appeared to be simplified and fragmented. Functional analyses of the oral microbiome showed altered functions in amino acid metabolism and increased capacity of glucose utilization in OSCC. Conclusion: The identified microbial signatures may potentially be used as a biomarker for predicting OSCC or for clinical assessment of oral cancer risk.
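For readers unfamiliar with the AUC figure quoted above, the following minimal sketch shows one standard way to compute it: as the probability that a randomly chosen positive sample (here, a cancerous lesion) receives a higher classifier score than a randomly chosen negative sample (paracancerous tissue), with ties counted as half. The function name `auc` and the toy inputs are hypothetical and are not taken from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for positive samples, 0 for negative samples
    scores: classifier output where a larger value means "more likely positive"
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))
```

On this reading, an AUC of 0.82 would mean that in roughly 82% of lesion and paracancerous pairs the signature scores the lesion higher.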
Affiliation(s)
- Jianhua Zhou
  - Department of Stomatology, Qingdao Municipal Hospital, Qingdao University, Qingdao 266071, Shandong, People's Republic of China
- Lili Wang
  - Central Laboratories and Department of Gastroenterology, Qingdao Municipal Hospital, Qingdao University, Qingdao 266071, Shandong, People's Republic of China
- Rongtao Yuan
  - Department of Stomatology, Qingdao Municipal Hospital, Qingdao University, Qingdao 266071, Shandong, People's Republic of China
- Xinjuan Yu
  - Central Laboratories and Department of Gastroenterology, Qingdao Municipal Hospital, Qingdao University, Qingdao 266071, Shandong, People's Republic of China
- Zhenggang Chen
  - Department of Stomatology, Qingdao Municipal Hospital, Qingdao University, Qingdao 266071, Shandong, People's Republic of China
- Fang Yang
  - Department of Stomatology, Qingdao Municipal Hospital, Qingdao University, Qingdao 266071, Shandong, People's Republic of China
- Guirong Sun
  - Clinical Laboratory, The Affiliated Hospital, Qingdao University, Qingdao 266011, Shandong, People's Republic of China
- Quanjiang Dong
  - Central Laboratories and Department of Gastroenterology, Qingdao Municipal Hospital, Qingdao University, Qingdao 266071, Shandong, People's Republic of China
|