1
Zhang Q, Chang C, Shen L, Long Q. Incorporating graph information in Bayesian factor analysis with robust and adaptive shrinkage priors. Biometrics 2024; 80:ujad014. [PMID: 38281768 PMCID: PMC10826885 DOI: 10.1093/biomtc/ujad014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Revised: 10/20/2023] [Accepted: 11/16/2023] [Indexed: 01/30/2024]
Abstract
There has been increasing interest in decomposing high-dimensional multi-omics data into a product of low-rank and sparse matrices for dimension reduction and feature engineering. Bayesian factor models achieve such low-dimensional representations of the original data through various sparsity-inducing priors. However, few of these models can efficiently incorporate the information encoded in biological graphs, which has already proven useful in many analysis tasks. In this work, we propose a Bayesian factor model with novel hierarchical priors that incorporate biological graph knowledge as a tool for identifying groups of genes that function collaboratively. The proposed model enables sparsity within networks by allowing each factor loading to be shrunk adaptively and by adding layers that relate individual shrinkage parameters to the underlying graph information, both of which yield more accurate recovery of the factor-loading structure. Further, in contrast to existing graph-incorporated approaches, the new priors overcome the phase transition phenomenon, so the model is robust to noisy edges that are inconsistent with the actual sparsity structure of the factor loadings. Finally, our model can handle both continuous and discrete data types. Simulation experiments and real data analyses show that the proposed method outperforms several existing factor analysis methods.
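The abstract refers to sparsity-inducing, adaptive shrinkage priors on factor loadings without stating their form. As a rough illustration only, the Python sketch below simulates data from a generic sparse factor model, X = Z W^T + E, with a horseshoe-style local-global prior on the loadings (Gaussian loadings whose local scales are half-Cauchy). This prior is an assumption chosen for illustration; it is not the authors' graph-informed prior, and the function names and the fixed global scale `tau` are hypothetical.

```python
import math
import random

def half_cauchy(rng: random.Random, scale: float = 1.0) -> float:
    """Draw from a half-Cauchy(0, scale) distribution via the inverse CDF."""
    return abs(scale * math.tan(math.pi * (rng.random() - 0.5)))

def simulate_sparse_factor_model(n: int, p: int, k: int, tau: float = 0.1,
                                 noise_sd: float = 0.5, seed: int = 0):
    """Simulate X = Z W^T + E (hypothetical helper, illustration only).

    Z is n x k with standard normal factors; W is p x k with loadings
    w[j][l] ~ N(0, (tau * lambda_jl)^2), lambda_jl ~ half-Cauchy(0, 1);
    E is i.i.d. Gaussian noise with standard deviation noise_sd.
    """
    rng = random.Random(seed)
    Z = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    # Heavy-tailed local scales let a few loadings escape the global shrinkage tau.
    W = [[rng.gauss(0.0, tau * half_cauchy(rng)) for _ in range(k)]
         for _ in range(p)]
    X = [[sum(Z[i][l] * W[j][l] for l in range(k)) + rng.gauss(0.0, noise_sd)
          for j in range(p)]
         for i in range(n)]
    return X, Z, W

if __name__ == "__main__":
    X, Z, W = simulate_sparse_factor_model(n=50, p=200, k=3)
    flat = [abs(w) for row in W for w in row]
    near_zero = sum(w < 0.05 for w in flat) / len(flat)
    print(f"share of |loading| < 0.05: {near_zero:.2f}")
```

Under this kind of prior most simulated loadings sit near zero while a few are large, which is the adaptive-shrinkage behaviour the abstract describes; a graph-informed prior would additionally link the local scales of connected genes.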
Affiliation(s)
- Qiyiwen Zhang
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Changgee Chang
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 47405, United States
- Li Shen
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Qi Long
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
2
Jiang S, Wang T, Zhang KH. Data-driven decision-making for precision diagnosis of digestive diseases. Biomed Eng Online 2023; 22:87. [PMID: 37658345 PMCID: PMC10472739 DOI: 10.1186/s12938-023-01148-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2023] [Accepted: 08/15/2023] [Indexed: 09/03/2023]
Abstract
Modern omics technologies can generate massive amounts of biomedical data, providing unprecedented opportunities for individualized precision medicine. However, traditional statistical methods cannot effectively process and utilize such big data. To meet this challenge, machine learning algorithms capable of reducing dimensionality, extracting features, organizing data, and forming automatable data-driven clinical decision systems have been rapidly developed and applied in recent years. Data-driven clinical decision-making has promising applications in precision medicine and has been studied in digestive diseases, including early diagnosis and screening, molecular typing, staging and stratification of digestive malignancies, as well as precise diagnosis of Crohn's disease, auxiliary diagnosis in imaging and endoscopy, differential diagnosis of cystic lesions, etiological discrimination of acute abdominal pain, stratification of upper gastrointestinal bleeding (UGIB), and real-time diagnosis of esophageal motility function, with good application prospects. Herein, after briefly introducing methods for data-driven decision-making, we review recent progress of data-driven clinical decision-making in the precision diagnosis of digestive diseases and discuss its limitations.
Affiliation(s)
- Song Jiang
- Department of Gastroenterology, The First Affiliated Hospital of Nanchang University, No. 17, Yongwai Zheng Street, Nanchang, 330006 China
- Jiangxi Institute of Gastroenterology and Hepatology, Nanchang, 330006 China
- Ting Wang
- Department of Gastroenterology, The First Affiliated Hospital of Nanchang University, No. 17, Yongwai Zheng Street, Nanchang, 330006 China
- Jiangxi Institute of Gastroenterology and Hepatology, Nanchang, 330006 China
- Kun-He Zhang
- Department of Gastroenterology, The First Affiliated Hospital of Nanchang University, No. 17, Yongwai Zheng Street, Nanchang, 330006 China
- Jiangxi Institute of Gastroenterology and Hepatology, Nanchang, 330006 China
3
Jihad M, Yet İ. Multiomics Integration at Single-Cell Resolution Using Bayesian Networks: A Case Study in Hepatocellular Carcinoma. OMICS 2023; 27:24-33. [PMID: 36602810 DOI: 10.1089/omi.2022.0170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
Abstract
Multiomics data integration is one of the leading frontiers of complex disease research and integrative biology. Advances in single-cell sequencing technologies add yet another crucial dimension to multiomics research: single-cell studies enable multiomics data to be studied and integrated simultaneously in the same cell. In this study, we report multiomics data integration at single-cell resolution using Bayesian networks (BNs) in a case study of hepatocellular carcinoma (HCC). A BN encodes the conditional dependencies and independencies among variables as a graphical model with an accompanying joint probability distribution. RNA-seq and Reduced Representation Bisulfite Sequencing data were analyzed separately, and copy number variations were estimated with a hidden Markov model method. Several BN models were constructed to reveal causal and associational relationships among the omics layers. These models were then validated on an independent data set. By identifying the best-fitting BN models for 295 genes, we show the heterogeneity of the multiple cellular layers of HCC at single-cell omics resolution. We also provide novel insights into the multiomics mechanistic relationships of the human leukocyte antigen (HLA) class I genes in HCC. To the best of our knowledge, this is the first study to integrate omics data at single-cell resolution with a machine learning algorithm, BNs, using a case study of HCC.
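To make the BN description concrete, the sketch below encodes a toy three-node network in which copy number (C) and methylation (M) are both parents of expression (E), so the joint distribution factorizes as P(C, M, E) = P(C) P(M) P(E | C, M). The network structure, state names, and all probabilities are hypothetical and are not taken from the study; the sketch only shows how a BN turns local conditional tables into a joint distribution that supports inference by enumeration.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only).
P_CNV = {"gain": 0.3, "neutral": 0.7}    # P(C)
P_METH = {"hypo": 0.4, "normal": 0.6}    # P(M)
P_EXPR_HIGH = {                          # P(E = high | C, M)
    ("gain", "hypo"): 0.9,
    ("gain", "normal"): 0.7,
    ("neutral", "hypo"): 0.5,
    ("neutral", "normal"): 0.1,
}

def joint(cnv: str, meth: str, expr: str) -> float:
    """P(C, M, E) = P(C) * P(M) * P(E | C, M) for the toy network."""
    p_high = P_EXPR_HIGH[(cnv, meth)]
    p_expr = p_high if expr == "high" else 1.0 - p_high
    return P_CNV[cnv] * P_METH[meth] * p_expr

def marginal_expr_high() -> float:
    """P(E = high), summing the joint over all parent states."""
    return sum(joint(c, m, "high") for c, m in product(P_CNV, P_METH))

def posterior_cnv_gain_given_high() -> float:
    """P(C = gain | E = high) by enumeration and Bayes' rule."""
    numerator = sum(joint("gain", m, "high") for m in P_METH)
    return numerator / marginal_expr_high()

if __name__ == "__main__":
    print(f"P(E=high) = {marginal_expr_high():.3f}")
    print(f"P(C=gain | E=high) = {posterior_cnv_gain_given_high():.4f}")
```

With these made-up tables the script prints P(E=high) = 0.416 and P(C=gain | E=high) = 0.5625; observing high expression raises the probability of a copy-number gain from the prior 0.3.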
Affiliation(s)
- Muntadher Jihad
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
- İdil Yet
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
4
Hao X, Cheng S, Jiang B, Xin S. Applying multi-omics techniques to the discovery of biomarkers for acute aortic dissection. Front Cardiovasc Med 2022; 9:961991. [PMID: 36588568 PMCID: PMC9797526 DOI: 10.3389/fcvm.2022.961991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2022] [Accepted: 11/28/2022] [Indexed: 12/23/2022]
Abstract
Acute aortic dissection (AAD) is a cardiovascular disease with sudden onset and high fatality. Because specific early symptoms are lacking, AAD is often overlooked or misdiagnosed, with catastrophic consequences for patients. The specific pathogenic mechanism of AAD remains unknown, which makes clinical pharmacological therapy extremely difficult. It is therefore crucial to identify and apply specific biomarkers for AAD as soon as possible in clinical practice and research. Such biomarkers would aid the early detection of AAD and guide the development of targeted therapeutic agents. Over the past 20 years, the rapid advancement of omics technologies and the development of high-throughput biomarker screening in tissue specimens have made this goal attainable. Data from the major omics layers support and complement one another, creating a more complete, three-dimensional picture of the disease. After introducing the main omics technologies, this review summarizes the current state and most recent developments in applying multi-omics technologies to AAD biomarker discovery and emphasizes the importance of concepts for integrating multi-omics data. In this context, we seek to offer fresh concepts and recommendations for fundamental investigation, perspective innovation, and therapeutic development in AAD.
Affiliation(s)
- Xinyu Hao
- Department of Vascular Surgery, The First Affiliated Hospital of China Medical University, China Medical University, Shenyang, China
- Key Laboratory of Pathogenesis, Prevention and Therapeutics of Aortic Aneurysm, Shenyang, Liaoning, China
- Shuai Cheng
- Department of Vascular Surgery, The First Affiliated Hospital of China Medical University, China Medical University, Shenyang, China
- Key Laboratory of Pathogenesis, Prevention and Therapeutics of Aortic Aneurysm, Shenyang, Liaoning, China
- Bo Jiang
- Department of Vascular Surgery, The First Affiliated Hospital of China Medical University, China Medical University, Shenyang, China
- Key Laboratory of Pathogenesis, Prevention and Therapeutics of Aortic Aneurysm, Shenyang, Liaoning, China
- Shijie Xin
- Department of Vascular Surgery, The First Affiliated Hospital of China Medical University, China Medical University, Shenyang, China
- Key Laboratory of Pathogenesis, Prevention and Therapeutics of Aortic Aneurysm, Shenyang, Liaoning, China
- Correspondence: Shijie Xin
5
Minatel BC, Cohn DE, Pewarchuk ME, Barros-Filho MC, Sage AP, Stewart GL, Marshall EA, Telkar N, Martinez VD, Reis PP, Robinson WP, Lam WL. Genetic and Epigenetic Mechanisms Deregulate the CRL2pVHL Complex in Hepatocellular Carcinoma. Front Genet 2022; 13:910221. [PMID: 35664333 PMCID: PMC9159809 DOI: 10.3389/fgene.2022.910221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Accepted: 05/02/2022] [Indexed: 12/02/2022]
Abstract
Dysregulation of ubiquitin-proteasome pathway genes through copy number alteration, promoter hypomethylation, and miRNA deregulation is involved in cancer development and progression. Further characterizing alterations in these genes may uncover novel drug targets across a range of diseases in which druggable alterations are uncommon, including hepatocellular carcinoma (HCC). We analyzed 377 HCC and 59 adjacent non-malignant liver tissue samples, focusing on alterations to component genes of the widely studied CRL2pVHL E3 ubiquitin ligase complex. mRNA upregulation of the component genes was common and correlated with DNA hypomethylation and copy number increase, but many tumours displayed overexpression that neither mechanism explained. We also found 66 miRNAs, including 39 previously unannotated miRNAs, that were downregulated in HCC and predicted to target one or more CRL2pVHL components. Several miRNAs, including hsa-miR-101-3p and hsa-miR-139-5p, were negatively correlated with multiple component genes, suggesting that miRNA deregulation may contribute to CRL2pVHL overexpression. Combining miRNA and mRNA expression, DNA copy number, and methylation status in a single multidimensional survival analysis, we found a significant association between a greater number of alterations and poorer overall survival for multiple component genes. While the intricacies of CRL2pVHL complex gene regulation require further research, multiple causes of the deregulation of these genes, including non-traditional mechanisms, must clearly be considered in HCC.
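Negative miRNA-mRNA correlations like those reported above are commonly screened with a rank correlation. The sketch below computes Spearman's rho (the Pearson correlation of average ranks, so ties are handled) for every miRNA-gene pair and keeps pairs at or below a threshold. The choice of Spearman's rho, the -0.3 cutoff, the dictionary layout, and the feature names in the usage are assumptions for illustration, not the study's stated procedure.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def negatively_correlated_pairs(mirna, mrna, threshold=-0.3):
    """Return (miRNA, gene, rho) with rho <= threshold, most negative first.

    mirna and mrna map (hypothetical) feature names to expression values
    measured over the same samples in the same order.
    """
    hits = []
    for mi_name, mi_values in mirna.items():
        for gene, gene_values in mrna.items():
            rho = spearman(mi_values, gene_values)
            if rho <= threshold:
                hits.append((mi_name, gene, rho))
    return sorted(hits, key=lambda hit: hit[2])

if __name__ == "__main__":
    mirna = {"miR_x": [1.0, 2.0, 3.0, 4.0]}
    mrna = {"GENE_A": [8.0, 6.0, 4.0, 2.0], "GENE_B": [1.0, 2.0, 3.0, 5.0]}
    print(negatively_correlated_pairs(mirna, mrna))
```

In a real screen the resulting p-values would also need multiple-testing correction, since every miRNA is tested against every gene.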
Affiliation(s)
- Brenda C. Minatel
- Department of Integrative Oncology, British Columbia Cancer Research Institute, Vancouver, BC, Canada
- David E. Cohn
- Department of Integrative Oncology, British Columbia Cancer Research Institute, Vancouver, BC, Canada
- Correspondence: David E. Cohn
- Michelle E. Pewarchuk
- Department of Integrative Oncology, British Columbia Cancer Research Institute, Vancouver, BC, Canada
- Mateus C. Barros-Filho
- Department of Integrative Oncology, British Columbia Cancer Research Institute, Vancouver, BC, Canada
- Department of Oncology, Hospital Sírio-Libanês, São Paulo, Brazil
- Adam P. Sage
- Department of Integrative Oncology, British Columbia Cancer Research Institute, Vancouver, BC, Canada
- Greg L. Stewart
- Department of Integrative Oncology, British Columbia Cancer Research Institute, Vancouver, BC, Canada
- Erin A. Marshall
- Department of Integrative Oncology, British Columbia Cancer Research Institute, Vancouver, BC, Canada
- Nikita Telkar
- Department of Integrative Oncology, British Columbia Cancer Research Institute, Vancouver, BC, Canada
- British Columbia Children’s Hospital Research Institute, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Victor D. Martinez
- Department of Integrative Oncology, British Columbia Cancer Research Institute, Vancouver, BC, Canada
- Department of Pathology and Laboratory Medicine, IWK Health Centre, Halifax, NS, Canada
- Department of Pathology, Faculty of Medicine, Dalhousie University, Halifax, NS, Canada
- Beatrice Hunter Cancer Research Institute, Halifax, NS, Canada
- Patricia P. Reis
- Department of Surgery and Orthopedics and Experimental Research Unit (UNIPEX), Faculty of Medicine, São Paulo State University (UNESP), Botucatu, Brazil
- Wendy P. Robinson
- British Columbia Children’s Hospital Research Institute, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Wan L. Lam
- Department of Integrative Oncology, British Columbia Cancer Research Institute, Vancouver, BC, Canada
6
Stanton JE, Malijauskaite S, McGourty K, Grabrucker AM. The Metallome as a Link Between the "Omes" in Autism Spectrum Disorders. Front Mol Neurosci 2021; 14:695873. [PMID: 34290588 PMCID: PMC8289253 DOI: 10.3389/fnmol.2021.695873] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/22/2021] [Accepted: 06/14/2021] [Indexed: 12/26/2022]
Abstract
Metal dyshomeostasis plays a significant role in various neurological diseases, such as Alzheimer's disease, Parkinson's disease, and Autism Spectrum Disorders (ASD). For years, metallomics studies, like studies of the proteome, transcriptome, epigenome, and microbiome, have focused only on data from their own domain, i.e., trace metal composition. Few have considered the links with other "omes," which together may produce an individual's specific pathologies. ASD in particular has been linked to a multitude of possible causes. Metallomics data on metal deficiencies and dyshomeostasis can be linked to the functions of metalloenzymes, metal transporters, and transcription factors, and thus to effects on the proteome and transcriptome. Furthermore, recent ASD studies have emphasized the gut-brain axis, linking alterations in the microbiome to changes in the metabolome and to inflammatory processes. However, the microbiome and other "omes" are heavily influenced by the metallome. Here, we therefore summarize the known implications of an altered metallome for other "omes" in the body in the context of "omics" studies in ASD. We highlight possible connections and propose a model that may explain the pathologies in ASD that have so far been reported independently.
Affiliation(s)
- Janelle E Stanton
- Department of Biological Sciences, University of Limerick, Limerick, Ireland
- Bernal Institute, University of Limerick, Limerick, Ireland
- Sigita Malijauskaite
- Bernal Institute, University of Limerick, Limerick, Ireland
- Department of Chemical Sciences, University of Limerick, Limerick, Ireland
- Kieran McGourty
- Bernal Institute, University of Limerick, Limerick, Ireland
- Department of Chemical Sciences, University of Limerick, Limerick, Ireland
- Health Research Institute, University of Limerick, Limerick, Ireland
- Andreas M Grabrucker
- Department of Biological Sciences, University of Limerick, Limerick, Ireland
- Bernal Institute, University of Limerick, Limerick, Ireland
- Health Research Institute, University of Limerick, Limerick, Ireland
7
Ahmad F, Mahmood A, Muhmood T. Machine learning-integrated omics for the risk and safety assessment of nanomaterials. Biomater Sci 2021; 9:1598-1608. [PMID: 33443512 DOI: 10.1039/d0bm01672a] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 12/14/2022]
Abstract
With advances in nanotechnology, the world is being transformed as nanoproducts penetrate everything from basic necessities to advanced electronics, health care products, and medicines. However, nanoproducts can have negative side effects and must be strictly monitored to avoid adverse outcomes. Future toxicity and safety challenges of incorporating nanomaterials into consumer products, including the rapid addition of nanomaterials with diverse functionalities and attributes, highlight the limitations of traditional safety evaluation tools. Artificial intelligence and machine learning algorithms are currently envisioned to enhance the simulation and modeling of nano-bio interactions and to extend to post-marketing surveillance of nanomaterials in the real world. Coupling machine learning with biology and nanomaterials could thus provide unique insights into the perturbation of delicate biological functions by nanomaterials. In this review, we discuss the potential of combining integrative omics with machine learning for nanomaterial safety profiling and risk assessment, and we also provide guidance for regulatory authorities.
Affiliation(s)
- Farooq Ahmad
- College of Engineering and Applied Sciences, Nanjing National Laboratory of Microstructures, Jiangsu Key Laboratory of Artificial Functional Materials, Nanjing University, Nanjing, Jiangsu 210093, China.
- Asif Mahmood
- Beijing Key Laboratory of Photoelectronic/Electrophotonic Conversion Materials, School of Chemistry and Chemical Engineering, Beijing Institute of Technology, Beijing, 100081, China
- Tahir Muhmood
- State Key Lab of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
8
Niño-Ramírez S, Jaramillo-Arroyave D, Ardila O, Guevara-Casallas LG. Reducing the heterogeneity in hepatocellular carcinoma. A cluster analysis based on clinical variables in patients treated at a quaternary care hospital. Rev Gastroenterol Mex (Engl Ed) 2021; 86:S0375-0906(21)00011-2. [PMID: 33745755 DOI: 10.1016/j.rgmx.2020.07.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/19/2020] [Revised: 06/28/2020] [Accepted: 07/19/2020] [Indexed: 12/23/2022]
Abstract
INTRODUCTION AND OBJECTIVE Although the term hepatocellular carcinoma designates the most common type of primary liver cancer, the disease is highly heterogeneous in its etiology, geographic variation, behavior, and association with specific genetic alterations. The aim of the present study was to use cluster analysis to establish the clinical characteristics that define homogeneous conglomerates. MATERIALS AND METHODS An exploratory cluster analysis using the K-means method was carried out to sub-classify 119 cases of patients with hepatocellular carcinoma. Sixty-two of those patients met the inclusion criteria and none of the exclusion criteria. For the cluster analysis, an n-dimensional space was defined, in which n was equal to the number of variables included in the study (n = 17). The spatial coordinates could take any magnitude between the minimum and maximum values of the variables analyzed (age, sex, tumor volume, AFP, AST, DB, Alb, Na, INR, Cr, HBV, HCV, OH, NASH, cirrhosis, multiple tumors, and neotumor). RESULTS Four patterns with homogeneous clinical characteristics were identified, in which age at presentation, history of hepatitis B virus infection, an altered liver profile with cholestatic dominance, and low albumin levels were associated with an apparently worse outcome. CONCLUSION Using an unsupervised learning method to define specific subgroups was shown to reduce heterogeneity in hepatocellular carcinoma; within these subgroups, known pathophysiologic mechanisms could better explain tumor behavior and define the determining prognostic factors.
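The clustering step can be sketched as plain Lloyd-style K-means on variables rescaled to [0, 1], which mirrors the description of coordinates bounded by each variable's minimum and maximum. Min-max scaling, 0/1 coding of binary variables such as HBV or cirrhosis, random initialization, and the function names are assumptions; the abstract does not state these details.

```python
import random

def min_max_scale(rows):
    """Rescale each column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)]
            for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

if __name__ == "__main__":
    # Hypothetical patients: (age, albumin) pairs, not data from the study.
    patients = [[50, 4.1], [52, 4.0], [49, 4.2], [75, 2.6], [78, 2.5], [73, 2.7]]
    labels, _ = kmeans(min_max_scale(patients), k=2)
    print(labels)
```

A single K-means run can stop at a local optimum, so in practice it is usually restarted from several seeds and the solution with the lowest within-cluster sum of squares is kept; the number of clusters (four in the study) is chosen separately.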
Affiliation(s)
- D Jaramillo-Arroyave
- Hospital Universitario San Vicente Fundación, Universidad de Antioquia, Facultad de Medicina Universidad CES, Medellín, Colombia
- O Ardila
- Hospital Universitario San Vicente Fundación, Universidad de Antioquia, Facultad de Medicina Universidad CES, Medellín, Colombia
9
Harnessing big 'omics' data and AI for drug discovery in hepatocellular carcinoma. Nat Rev Gastroenterol Hepatol 2020; 17:238-251. [PMID: 31900465 PMCID: PMC7401304 DOI: 10.1038/s41575-019-0240-9] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Accepted: 11/06/2019] [Indexed: 12/13/2022]
Abstract
Hepatocellular carcinoma (HCC) is the most common form of primary adult liver cancer. After nearly a decade in which sorafenib was the only approved treatment, multiple new agents have demonstrated efficacy in clinical trials, including the targeted therapies regorafenib, lenvatinib and cabozantinib, the anti-angiogenic antibody ramucirumab, and the immune checkpoint inhibitors nivolumab and pembrolizumab. Although these agents offer new promise to patients with HCC, the optimal choice and sequence of therapies remain unknown, no biomarkers have been established, and many patients do not respond to treatment. Advances in, and the decreasing costs of, molecular measurement technologies enable profiling of HCC molecular features (such as the genome, transcriptome, proteome and metabolome) at different levels, including bulk tissues, animal models and single cells. The public release of such data sets makes information from these legacy studies easier to search and provides the opportunity to leverage them to understand HCC mechanisms, rationally develop new therapeutics and identify candidate biomarkers of treatment response. Here, we provide a comprehensive review of public data sets related to HCC and discuss how emerging artificial intelligence methods can be applied to identify new targets and drugs and to guide therapeutic choices for improved HCC treatment.
10
Tong D, Tian Y, Zhou T, Ye Q, Li J, Ding K, Li J. Improving prediction performance of colon cancer prognosis based on the integration of clinical and multi-omics data. BMC Med Inform Decis Mak 2020; 20:22. [PMID: 32033604 PMCID: PMC7006213 DOI: 10.1186/s12911-020-1043-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/06/2019] [Accepted: 01/31/2020] [Indexed: 12/16/2022]
Abstract
Background Colon cancer is common worldwide and is a leading cause of cancer-related death. Multiple levels of omics data are available owing to the development of sequencing technologies. In this study, we proposed an integrative prognostic model for colon cancer based on the integration of clinical and multi-omics data. Methods In total, 344 patients were included in this study. Clinical, gene expression, DNA methylation and miRNA expression data were retrieved from The Cancer Genome Atlas (TCGA). To accommodate the high dimensionality of omics data, unsupervised clustering was used as a dimension reduction method. The bias-corrected Harrell’s concordance index was used to determine which clustering result provided the best prognostic performance. Finally, we proposed a prognostic prediction model based on the integration of clinical data and multi-omics data. Uno’s concordance index with cross-validation was used to compare the discriminative performance of prognostic models constructed with different covariates. Results Combining clinical and multi-omics data improved prognostic performance: the bias-corrected Harrell’s concordance of the prognostic model increased from 0.7424 (clinical features only) to 0.7604 (clinical features and three types of omics features). Additionally, the 2-year, 3-year and 5-year Uno’s concordance statistics increased from 0.7329, 0.7043, and 0.7002 (clinical features only) to 0.7639, 0.7474 and 0.7597 (clinical features and three types of omics features), respectively. Conclusion This study successfully combined clinical and multi-omics data for better prediction of colon cancer prognosis.
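Harrell's concordance index, used above to compare clustering results, is the fraction of comparable patient pairs whose predicted risks are ordered consistently with their survival times. The sketch below is the plain estimator only (not the bias-corrected version the study used): a pair counts as comparable when the shorter time is an observed event, tied risk scores count as one half, and pairs with tied times are skipped. Uno's censoring-weighted variant is not shown, and the function name is hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Plain Harrell's C: share of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]; a higher risk score for i counts as concordant.
    Tied risk scores count 0.5; pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

if __name__ == "__main__":
    # Hypothetical follow-up times (months), event flags, and model risk scores.
    times = [5, 8, 12, 20]
    events = [1, 0, 1, 1]
    risks = [0.9, 0.4, 0.6, 0.1]
    print(harrell_c_index(times, events, risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.7424 and 0.7604 should be read.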
Affiliation(s)
- Danyang Tong
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, Zhejiang Province, China
- Yu Tian
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, Zhejiang Province, China
- Tianshu Zhou
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, Zhejiang Province, China
- Qiancheng Ye
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, Zhejiang Province, China
- Jun Li
- Department of Surgical Oncology, Second Affiliated Hospital, Zhejiang University School of Medicine, No. 88 Jiefang Road, Hangzhou, 310009, Zhejiang Province, China
- Kefeng Ding
- Department of Surgical Oncology, Second Affiliated Hospital, Zhejiang University School of Medicine, No. 88 Jiefang Road, Hangzhou, 310009, Zhejiang Province, China
- Jingsong Li
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, Zhejiang Province, China
- Research Center for Healthcare Data Science, Zhejiang Lab, Hangzhou, China
11
Subramanian I, Verma S, Kumar S, Jere A, Anamika K. Multi-omics Data Integration, Interpretation, and Its Application. Bioinform Biol Insights 2020; 14:1177932219899051. [PMID: 32076369 PMCID: PMC7003173 DOI: 10.1177/1177932219899051] [Citation(s) in RCA: 518] [Impact Index Per Article: 129.5] [Received: 11/07/2019] [Accepted: 11/09/2019] [Indexed: 12/22/2022]
Abstract
To study complex biological processes holistically, it is imperative to take an integrative approach that combines multi-omics data to highlight the interrelationships of the biomolecules involved and their functions. With the advent of high-throughput techniques and the availability of multi-omics data generated from large sets of samples, several promising tools and methods have been developed for data integration and interpretation. In this review, we collected the tools and methods that adopt an integrative approach to analyze multiple omics data and summarized their ability to address applications such as disease subtyping, biomarker prediction, and deriving insights into the data. We provide the methodology, use-cases, and limitations of these tools; a brief account of multi-omics data repositories and visualization portals; and the challenges associated with multi-omics data integration.
Affiliation(s)
- Abhay Jere
- Innovation Cell, Ministry of Human Resource Development, New Delhi, India
12
Coretto P, Serra A, Tagliaferri R. Robust clustering of noisy high-dimensional gene expression data for patients subtyping. Bioinformatics 2018; 34:4064-4072. [PMID: 29939219 DOI: 10.1093/bioinformatics/bty502] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/03/2018] [Accepted: 06/19/2018] [Indexed: 12/12/2022]
Abstract
Motivation One of the most important research areas in personalized medicine is the discovery of disease sub-types with relevance in clinical applications. This is usually accomplished by exploring gene expression data with unsupervised clustering methodologies. With the advent of multiple omics technologies, data integration methodologies have been further developed to obtain better patient separability. However, these methods do not guarantee that patients in different clusters are separable by survival. Results We propose a new methodology that first computes a robust and sparse correlation matrix of the genes, then decomposes it and projects the patient data onto the first m spectral components of the correlation matrix. A clustering algorithm that is robust and adaptive to noise is then applied. The clustering is set up to optimize the separation between the survival curves estimated for each cluster. The method is able to identify clusters that have different omics signatures as well as statistically significant differences in survival time. The proposed methodology is tested on five cancer datasets downloaded from The Cancer Genome Atlas repository and is compared with the Similarity Network Fusion (SNF) approach and with model-based clustering based on Student's t-distribution (TMIX). Our method obtains better survival separability, even though it uses a single gene expression view, compared with the multi-view approach of the SNF method. Finally, a pathway-based analysis is performed to highlight the biological processes that differentiate the obtained patient groups. Availability and implementation Our R source code is available online at https://github.com/angy89/RobustClusteringPatientSubtyping. Supplementary information Supplementary data are available at Bioinformatics online.
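The first stage of the method (gene correlation matrix, decomposition, projection of patients onto the first m components) can be sketched in plain Python. This sketch substitutes an ordinary Pearson correlation matrix for the robust, sparse estimator the authors use and finds eigenvectors by power iteration with Hotelling deflation; both substitutions and all function names are assumptions, and the noise-robust, survival-optimized clustering step is not shown.

```python
import math

def standardize(data):
    """Z-score each column (gene) of an n x p matrix using the sample SD."""
    n, p = len(data), len(data[0])
    cols = list(zip(*data))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - mu) ** 2 for x in c) / (n - 1))
           for c, mu in zip(cols, means)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]

def correlation_matrix(z):
    """Pearson correlation between columns of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)]
            for a in range(p)]

def _matvec(mat, v):
    return [sum(a * b for a, b in zip(row, v)) for row in mat]

def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("matrix has no remaining variance to extract")
    return [x / norm for x in v]

def top_eigenpairs(mat, m, n_iter=1000):
    """Leading m eigenpairs of a symmetric PSD matrix via power iteration."""
    p = len(mat)
    work = [row[:] for row in mat]
    pairs = []
    for _ in range(m):
        v = _normalize([1.0 + 0.1 * i for i in range(p)])  # non-uniform start
        for _ in range(n_iter):
            v = _normalize(_matvec(work, v))
        lam = sum(a * b for a, b in zip(v, _matvec(work, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Hotelling deflation removes the component just found.
        work = [[work[i][j] - lam * v[i] * v[j] for j in range(p)]
                for i in range(p)]
    return pairs

def spectral_scores(data, m):
    """Project standardized patient data onto the first m components."""
    z = standardize(data)
    pairs = top_eigenpairs(correlation_matrix(z), m)
    return [[sum(x * y for x, y in zip(row, v)) for _, v in pairs] for row in z]

if __name__ == "__main__":
    # Hypothetical expression matrix: 4 patients x 3 genes.
    expr = [[1, 2, 5], [2, 4, 3], [3, 6, 4], [4, 8, 1]]
    print(spectral_scores(expr, m=2))
```

For real gene expression data (thousands of genes) a library eigensolver would replace the power iteration; the low-dimensional scores are what a clustering algorithm would then operate on.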
Affiliation(s)
- Pietro Coretto
- Department of Economics and Statistics, STATLAB, University of Salerno, Fisciano, SA, Italy
- Angela Serra
- Department of Management and Innovation Systems, NeuRoNeLab, University of Salerno, Fisciano, SA, Italy
- Roberto Tagliaferri
- Department of Management and Innovation Systems, NeuRoNeLab, University of Salerno, Fisciano, SA, Italy
13
Huang Z, Zhan X, Xiang S, Johnson TS, Helm B, Yu CY, Zhang J, Salama P, Rizkalla M, Han Z, Huang K. SALMON: Survival Analysis Learning With Multi-Omics Neural Networks on Breast Cancer. Front Genet 2019; 10:166. [PMID: 30906311 PMCID: PMC6419526 DOI: 10.3389/fgene.2019.00166] [Citation(s) in RCA: 113] [Impact Index Per Article: 22.6] [Received: 12/01/2018] [Accepted: 02/14/2019] [Indexed: 12/22/2022]
Abstract
Improved cancer prognosis is a central goal for precision health medicine. Though many models can predict differential survival from data, there is a strong need for sophisticated algorithms that can aggregate and filter relevant predictors from increasingly complex data inputs. In turn, these models should provide deeper insight into which types of data are most relevant to improve prognosis. Deep Learning-based neural networks offer a potential solution for both problems because they are highly flexible and account for data complexity in a non-linear fashion. In this study, we implement Deep Learning-based networks to determine how gene expression data predict Cox regression survival in breast cancer. We accomplish this through an algorithm called SALMON (Survival Analysis Learning with Multi-Omics Neural Networks), which aggregates and simplifies gene expression data and cancer biomarkers to enable prognosis prediction. The results revealed improved performance when more omics data were used in model construction. Rather than using raw gene expression values as model inputs, we innovatively use eigengene modules from the result of gene co-expression network analysis. The corresponding high-impact co-expression modules and other omics data are identified by a feature selection technique and then examined through enrichment analysis and exploration of their biological functions, raising the interpretation of input features from the gene level to the co-expression module level. Our study shows the feasibility of discovering breast cancer-related co-expression modules and sketches a blueprint for future endeavors in Deep Learning-based survival analysis. SALMON source code is available at https://github.com/huangzhii/SALMON/.
Affiliation(s)
- Zhi Huang
- School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, United States; Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, United States; Department of Electrical and Computer Engineering, Indiana University-Purdue University Indianapolis, Indianapolis, IN, United States
- Xiaohui Zhan
- Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, United States; National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China
- Shunian Xiang
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China; Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, United States
- Travis S Johnson
- Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, United States; Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States
- Bryan Helm
- Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, United States
- Christina Y Yu
- Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, United States; Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States
- Jie Zhang
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, United States
- Paul Salama
- Department of Electrical and Computer Engineering, Indiana University-Purdue University Indianapolis, Indianapolis, IN, United States
- Maher Rizkalla
- Department of Electrical and Computer Engineering, Indiana University-Purdue University Indianapolis, Indianapolis, IN, United States
- Zhi Han
- Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, United States; Regenstrief Institute, Indianapolis, IN, United States
- Kun Huang
- Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, United States; Department of Electrical and Computer Engineering, Indiana University-Purdue University Indianapolis, Indianapolis, IN, United States; Regenstrief Institute, Indianapolis, IN, United States
14
Misra BB, Langefeld CD, Olivier M, Cox LA. Integrated Omics: Tools, Advances, and Future Approaches. J Mol Endocrinol 2018; 62:JME-18-0055. [PMID: 30006342 DOI: 10.1530/jme-18-0055] [Citation(s) in RCA: 206] [Impact Index Per Article: 34.3] [Received: 02/24/2018] [Revised: 07/02/2018] [Accepted: 07/12/2018] [Indexed: 12/13/2022]
Abstract
With the rapid adoption of high-throughput omic approaches such as genomics, transcriptomics, proteomics, and metabolomics to analyze biological samples, each analysis can generate terabyte- to petabyte-sized data files on a daily basis. These data file sizes, together with differences in nomenclature among these data types, make the integration of these multi-dimensional omics data into a biologically meaningful context challenging. Variously named integrated omics, multi-omics, poly-omics, trans-omics, pan-omics, or shortened to just 'omics', the field faces challenges that include differences in data cleaning, normalization, biomolecule identification, data dimensionality reduction, biological contextualization, statistical validation, data storage and handling, sharing, and data archiving. The ultimate goal is the holistic realization of a 'systems biology' understanding of the biological question at hand. Commonly used approaches in these efforts are currently limited by the 3 i's: integration, interpretation, and insights. Post integration, these very large datasets aim to yield unprecedented views of cellular systems at exquisite resolution for transformative insights into processes, events, and diseases through various computational and informatics frameworks. With the continued reduction in costs and processing time for sample analyses, and the increasing types of omics datasets generated, such as glycomics, lipidomics, microbiomics, and phenomics, an increasing number of scientists in this interdisciplinary domain of bioinformatics face these challenges. We discuss recent approaches, existing tools, and potential caveats in the integration of omics datasets for the development of standardized analytical pipelines that could be adopted by the global omics research community.
Affiliation(s)
- Biswapriya B Misra
- Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Carl D Langefeld
- Biostatistical Sciences, Wake Forest University School of Medicine, Winston-Salem, United States
- Michael Olivier
- Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Laura A Cox
- Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
15
Chaudhary K, Poirion OB, Lu L, Garmire LX. Deep Learning-Based Multi-Omics Integration Robustly Predicts Survival in Liver Cancer. Clin Cancer Res 2018; 24:1248-1259. [PMID: 28982688 PMCID: PMC6050171 DOI: 10.1158/1078-0432.ccr-17-0853] [Citation(s) in RCA: 490] [Impact Index Per Article: 81.7] [Received: 03/23/2017] [Revised: 06/18/2017] [Accepted: 10/02/2017] [Indexed: 02/07/2023]
Abstract
Identifying robust survival subgroups of hepatocellular carcinoma (HCC) will significantly improve patient care. Currently, efforts to integrate multi-omics data to explicitly predict HCC survival across multiple patient cohorts are lacking. To fill this gap, we present a deep learning (DL)-based model on HCC that robustly differentiates survival subpopulations of patients in six cohorts. We built the DL-based, survival-sensitive model on data from 360 HCC patients using RNA sequencing (RNA-Seq), miRNA sequencing (miRNA-Seq), and methylation data from The Cancer Genome Atlas (TCGA); it predicts prognosis as well as an alternative model in which both genomic and clinical data are considered. This DL-based model provides two optimal subgroups of patients with significant survival differences (P = 7.13e-6) and good model fitness [concordance index (C-index) = 0.68]. The more aggressive subtype is associated with frequent TP53 inactivation mutations, higher expression of stemness markers (KRT19 and EPCAM) and the tumor marker BIRC5, and activated Wnt and Akt signaling pathways. We validated this multi-omics model on five external datasets of various omics types: LIRI-JP cohort (n = 230, C-index = 0.75), NCI cohort (n = 221, C-index = 0.67), Chinese cohort (n = 166, C-index = 0.69), E-TABM-36 cohort (n = 40, C-index = 0.77), and Hawaiian cohort (n = 27, C-index = 0.82). This is the first study to employ DL to identify multi-omics features linked to the differential survival of patients with HCC. Given its robustness across multiple cohorts, we expect this workflow to be useful for predicting HCC prognosis. Clin Cancer Res; 24(6); 1248-59. ©2017 AACR.
Affiliation(s)
- Olivier B Poirion
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii
- Liangqun Lu
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii
- Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, Hawaii
- Lana X Garmire
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii
- Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, Hawaii
16
Chen YC, Gotea V, Margolin G, Elnitski L. Significant associations between driver gene mutations and DNA methylation alterations across many cancer types. PLoS Comput Biol 2017; 13:e1005840. [PMID: 29125844 PMCID: PMC5709060 DOI: 10.1371/journal.pcbi.1005840] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 06/02/2017] [Revised: 11/30/2017] [Accepted: 10/23/2017] [Indexed: 12/15/2022]
Abstract
Recent evidence shows that mutations in several driver genes can cause aberrant methylation patterns, a hallmark of cancer. In light of these findings, we hypothesized that the landscapes of tumor genomes and epigenomes are tightly interconnected. We measured this relationship using principal component analyses and methylation-mutation associations applied at the nucleotide level and with respect to genome-wide trends. We found that a few mutated driver genes were associated with genome-wide patterns of aberrant hypomethylation or CpG island hypermethylation in specific cancer types. In addition, we identified associations between 737 mutated driver genes and site-specific methylation changes. Moreover, using these mutation-methylation associations, we were able to distinguish between two uterine and two thyroid cancer subtypes. The driver gene mutation–associated methylation differences between the thyroid cancer subtypes were linked to differential gene expression in JAK-STAT signaling, NADPH oxidation, and other cancer-related pathways. These results establish that driver gene mutations are associated with methylation alterations capable of shaping regulatory network functions. In addition, the methodology presented here can be used to subdivide tumors into more homogeneous subsets corresponding to underlying molecular characteristics, which could improve treatment efficacy.
Mutations that alter the function of driver genes by changing DNA nucleotides have been recognized as key players in cancer progression. However, recent evidence has shown that DNA methylation, which can control gene expression, is also highly dysregulated in cancer and contributes to carcinogenesis. Whether methylation alterations correspond to mutated driver genes in cancer remains unclear. In this study, we analyzed 4,302 tumors from 18 cancer types and demonstrated that driver gene mutations are inherently connected with the aberrant DNA methylation landscape in cancer. We showed that driver gene–associated methylation patterns can classify heterogeneous tumors within a cancer type into homogeneous subtypes and have the potential to influence genes that contribute to tumor growth. This finding could help us better understand the fundamental connection between driver gene mutations and DNA methylation alterations in cancer, and further improve cancer treatment.
Affiliation(s)
- Yun-Ching Chen
- Genomic Functional Analysis Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States
- Valer Gotea
- Genomic Functional Analysis Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States
- Gennady Margolin
- Genomic Functional Analysis Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States
- Laura Elnitski
- Genomic Functional Analysis Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States
17
Moore DA, Young CA, Morris HT, Oien KA, Lee JL, Jones JL, Salto-Tellez M. Time for change: a new training programme for morpho-molecular pathologists? J Clin Pathol 2017; 71:285-290. [PMID: 29113995 PMCID: PMC5868526 DOI: 10.1136/jclinpath-2017-204821] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 10/02/2017] [Accepted: 10/03/2017] [Indexed: 12/20/2022]
Abstract
The evolution of cellular pathology as a specialty has always been driven by technological developments and the clinical relevance of incorporating novel investigations into diagnostic practice. In recent years, the molecular characterisation of cancer has become of crucial relevance in patient treatment, both for predictive testing and for the subclassification of certain tumours. Much of this has become possible due to the availability of next-generation sequencing technologies, and whole-genome sequencing of tumours is now being rolled out into clinical practice in England via the 100,000 Genomes Project. The effective integration of cellular pathology reporting and genomic characterisation is crucial to ensure that morphological and genomic data are interpreted in the relevant context; despite this, in many UK centres molecular testing is entirely detached from cellular pathology departments. The CM-Path initiative recognises that there is a genomics knowledge and skills gap within cellular pathology that needs to be bridged through upskilling of the current workforce and a redesign of pathology training. Bridging this gap will allow the development of an integrated ‘morphomolecular pathology’ specialty, which can maintain the relevance of cellular pathology at the centre of cancer patient management and allow the pathology community to continue to be a major influence in cancer discovery, as well as to play a driving role in the delivery of precision medicine approaches. Here, several alternative models of pathology training, designed to address this challenge, are presented and appraised.
Affiliation(s)
- David A Moore
- Department of Cancer Studies, University of Leicester, Leicester, UK
- Hayley T Morris
- Institute of Cancer Sciences - Pathology, University of Glasgow, Glasgow, UK
- Karin A Oien
- Institute of Cancer Sciences - Pathology, University of Glasgow, Glasgow, UK
- Jessica L Lee
- Strategy and Initiatives, National Cancer Research Institute, London, UK
- J Louise Jones
- Centre for Tumour Biology, Barts Cancer Institute, Barts and the London School of Medicine and Dentistry, London, UK
- Manuel Salto-Tellez
- Northern Ireland Molecular Pathology Laboratory, Centre for Cancer Research and Cell Biology, Queen's University Belfast, Belfast, UK