1
Yang Y, Shao X, Li Z, Zhang L, Yang B, Jin B, Hu X, Qu X, Che X, Liu Y. Prognostic heterogeneity of Ki67 in non-small cell lung cancer: A comprehensive reappraisal on immunohistochemistry and transcriptional data. J Cell Mol Med 2024; 28:e18521. [PMID: 39021279 PMCID: PMC11255407 DOI: 10.1111/jcmm.18521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 05/26/2024] [Accepted: 06/12/2024] [Indexed: 07/20/2024] Open
Abstract
In the present study, the debatable prognostic value of Ki67 in patients with non-small cell lung cancer (NSCLC) was attributed to the heterogeneity between lung adenocarcinoma (LUAD) and lung squamous carcinoma (LUSC). Based on meta-analyses of 29 studies, a retrospective immunohistochemical cohort of 1479 patients from our center, eight transcriptional datasets and a single-cell dataset with 40 patients, we found that high Ki67 expression suggests a poor outcome in LUAD but, conversely, low Ki67 expression indicates a worse prognosis in LUSC. Furthermore, low proliferation in LUSC is associated with higher metastatic capacity, which is related to stronger epithelial-mesenchymal transition potential, an immunosuppressive microenvironment and angiogenesis. Finally, a nomogram model incorporating clinical risk factors and Ki67 expression outperformed the basic clinical model for the accurate prognostic prediction of LUSC. With the largest prognostic assessment of Ki67 from protein to mRNA level, our study highlights that Ki67 also has an important prognostic value in NSCLC, but separate evaluation of LUAD and LUSC is necessary to provide more valuable information for clinical decision-making in NSCLC.
Affiliation(s)
- Yujing Yang
- Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China
- Key Laboratory of Anticancer Drugs and Biotherapy of Liaoning Province, The First Hospital of China Medical University, Shenyang, China
- Clinical Cancer Research Center of Shenyang, The First Hospital of China Medical University, Shenyang, China
- Department of Oncology, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Xinye Shao
- Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China
- Key Laboratory of Anticancer Drugs and Biotherapy of Liaoning Province, The First Hospital of China Medical University, Shenyang, China
- Clinical Cancer Research Center of Shenyang, The First Hospital of China Medical University, Shenyang, China
- Zhi Li
- Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China
- Lingyun Zhang
- Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China
- Clinical Cancer Research Center of Shenyang, The First Hospital of China Medical University, Shenyang, China
- Bowen Yang
- Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China
- Bo Jin
- Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China
- Clinical Cancer Research Center of Shenyang, The First Hospital of China Medical University, Shenyang, China
- Xuejun Hu
- Department of Respiratory and Infectious Disease of Geriatrics, The First Hospital of China Medical University, Shenyang, China
- Xiujuan Qu
- Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China
- Key Laboratory of Anticancer Drugs and Biotherapy of Liaoning Province, The First Hospital of China Medical University, Shenyang, China
- Clinical Cancer Research Center of Shenyang, The First Hospital of China Medical University, Shenyang, China
- Xiaofang Che
- Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China
- Key Laboratory of Anticancer Drugs and Biotherapy of Liaoning Province, The First Hospital of China Medical University, Shenyang, China
- Clinical Cancer Research Center of Shenyang, The First Hospital of China Medical University, Shenyang, China
- Yunpeng Liu
- Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China
- Key Laboratory of Anticancer Drugs and Biotherapy of Liaoning Province, The First Hospital of China Medical University, Shenyang, China
- Clinical Cancer Research Center of Shenyang, The First Hospital of China Medical University, Shenyang, China
2
Huang F, Welner RS, Chen JY, Yue Z. PAGER-scFGA: unveiling cell functions and molecular mechanisms in cell trajectories through single-cell functional genomics analysis. Frontiers in Bioinformatics 2024; 4:1336135. [PMID: 38690527 PMCID: PMC11058213 DOI: 10.3389/fbinf.2024.1336135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 04/01/2024] [Indexed: 05/02/2024] Open
Abstract
Background: Understanding how cells and tissues respond to stress factors and perturbations during disease processes is crucial for developing effective prevention, diagnosis, and treatment strategies. Single-cell RNA sequencing (scRNA-seq) enables high-resolution identification of cells and exploration of cell heterogeneity, shedding light on cell differentiation/maturation and functional differences. Recent advancements in multimodal sequencing technologies have focused on improving access to cell-specific subgroups for functional genomics analysis. To facilitate the functional annotation of cell groups and characterization of molecular mechanisms underlying cell trajectories, we introduce the Pathways, Annotated Gene Lists, and Gene Signatures Electronic Repository for Single-Cell Functional Genomics Analysis (PAGER-scFGA). Results: We have developed PAGER-scFGA, which integrates cell functional annotations and gene-set enrichment analysis into popular single-cell analysis pipelines such as Scanpy. Using differentially expressed genes (DEGs) from pairwise cell clusters, PAGER-scFGA infers cell functions through the enrichment of potential cell-marker genesets. Moreover, PAGER-scFGA provides pathways, annotated gene lists, and gene signatures (PAGs) enriched in specific cell subsets with tissue compositions and continuous transitions along cell trajectories. Additionally, PAGER-scFGA enables the construction of a gene subcellular map based on DEGs and allows examination of the gene functional compartments (GFCs) underlying cell maturation/differentiation. In a real-world case study of mouse natural killer (mNK) cells, PAGER-scFGA revealed two major stages of natural killer (NK) cells and three trajectories from the precursor stage to NK T-like mature stage within blood, spleen, and bone marrow tissues. As the trajectories progress to later stages, the DEGs exhibit greater divergence and variability. 
However, the DEGs in different trajectories still interact within a network during NK cell maturation. Notably, PAGER-scFGA unveiled cell cytotoxicity, exocytosis, and the response to interleukin (IL) signaling pathways and associated network models during the progression from precursor NK cells to mature NK cells. Conclusion: PAGER-scFGA enables in-depth exploration of functional insights and presents a comprehensive knowledge map of gene networks and GFCs, which can be utilized for future studies and hypothesis generation. It is expected to become an indispensable tool for inferring cell functions and detecting molecular mechanisms within cell trajectories in single-cell studies. The web app (accessible at https://au-singlecell.streamlit.app/) is publicly available.
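The abstract above does not say which enrichment statistic PAGER-scFGA uses. As a rough illustration of the general idea of testing a DEG list against candidate gene sets, the following minimal sketch assumes a one-sided hypergeometric over-representation test. The function names (`hypergeom_enrichment_p`, `enrich`) and the toy gene identifiers are hypothetical and are not part of PAGER-scFGA or Scanpy.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, deg_count, overlap):
    """P(X >= overlap) when deg_count genes are drawn without replacement
    from a universe that contains set_size members of the gene set."""
    max_overlap = min(set_size, deg_count)
    total = comb(universe_size, deg_count)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, deg_count - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def enrich(degs, gene_sets, universe):
    """Rank gene sets by hypergeometric p-value for overlap with the DEGs."""
    universe = set(universe)
    degs = set(degs) & universe
    results = []
    for name, members in gene_sets.items():
        members = set(members) & universe
        hits = degs & members
        p = hypergeom_enrichment_p(len(universe), len(members), len(degs), len(hits))
        results.append((name, len(hits), p))
    # Smallest p-value first.
    return sorted(results, key=lambda r: r[2])


# Toy data (hypothetical gene identifiers, assumption for illustration only).
universe = [f"g{i}" for i in range(20)]
degs = ["g0", "g1", "g2", "g3", "g4"]
gene_sets = {"setA": ["g0", "g1", "g2", "g3"], "setB": ["g10", "g11", "g12", "g13"]}
ranking = enrich(degs, gene_sets, universe)
```

In practice, p-values across many gene sets would also need multiple-testing correction, which this sketch omits.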
Affiliation(s)
- Fengyuan Huang
- Department of Biomedical Informatics and Data Science, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
- Robert S. Welner
- Hematology & Oncology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
- Jake Y. Chen
- Department of Biomedical Informatics and Data Science, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
- Zongliang Yue
- Health Outcome Research and Policy Department, Harrison College of Pharmacy, Auburn University, Auburn, AL, United States
3
Chafai N, Bonizzi L, Botti S, Badaoui B. Emerging applications of machine learning in genomic medicine and healthcare. Crit Rev Clin Lab Sci 2024; 61:140-163. [PMID: 37815417 DOI: 10.1080/10408363.2023.2259466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 09/12/2023] [Indexed: 10/11/2023]
Abstract
The integration of artificial intelligence technologies has propelled the progress of clinical and genomic medicine in recent years. The significant increase in computing power has facilitated the ability of artificial intelligence models to analyze and extract features from extensive medical data and images, thereby contributing to the advancement of intelligent diagnostic tools. Artificial intelligence (AI) models have been utilized in the field of personalized medicine to integrate clinical data and genomic information of patients. This integration allows for the identification of customized treatment recommendations, ultimately leading to enhanced patient outcomes. Notwithstanding the notable advancements, the application of artificial intelligence (AI) in the field of medicine is impeded by various obstacles such as the limited availability of clinical and genomic data, the diversity of datasets, ethical implications, and the inconclusive interpretation of AI models' results. In this review, a comprehensive evaluation of multiple machine learning algorithms utilized in the fields of clinical and genomic medicine is conducted. Furthermore, we present an overview of the implementation of artificial intelligence (AI) in the fields of clinical medicine, drug discovery, and genomic medicine. Finally, a number of constraints pertaining to the implementation of artificial intelligence within the healthcare industry are examined.
Affiliation(s)
- Narjice Chafai
- Laboratory of Biodiversity, Ecology, and Genome, Faculty of Sciences, Department of Biology, Mohammed V University in Rabat, Rabat, Morocco
- Luigi Bonizzi
- Department of Biomedical, Surgical and Dental Science, University of Milan, Milan, Italy
- Sara Botti
- PTP Science Park, Via Einstein - Loc. Cascina Codazza, Lodi, Italy
- Bouabid Badaoui
- Laboratory of Biodiversity, Ecology, and Genome, Faculty of Sciences, Department of Biology, Mohammed V University in Rabat, Rabat, Morocco
- African Sustainable Agriculture Research Institute (ASARI), Mohammed VI Polytechnic University (UM6P), Laâyoune, Morocco
4
Jiang L, Xu C, Bai Y, Liu A, Gong Y, Wang YP, Deng HW. Autosurv: interpretable deep learning framework for cancer survival analysis incorporating clinical and multi-omics data. NPJ Precis Oncol 2024; 8:4. [PMID: 38182734 PMCID: PMC10770412 DOI: 10.1038/s41698-023-00494-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Accepted: 12/05/2023] [Indexed: 01/07/2024] Open
Abstract
Accurate prognosis for cancer patients can provide critical information for optimizing treatment plans and improving life quality. Combining omics data and demographic/clinical information can offer a more comprehensive view of cancer prognosis than using omics or clinical data alone and can also reveal the underlying disease mechanisms at the molecular level. In this study, we developed and validated a deep learning framework to extract information from high-dimensional gene expression and miRNA expression data and conduct prognosis prediction for breast cancer and ovarian-cancer patients using multiple independent multi-omics datasets. Our model achieved significantly better prognosis prediction than the current machine learning and deep learning approaches in various settings. Moreover, an interpretation method was applied to tackle the "black-box" nature of deep neural networks and we identified features (i.e., genes, miRNA, demographic/clinical variables) that were important to distinguish predicted high- and low-risk patients. The significance of the identified features was partially supported by previous studies.
Affiliation(s)
- Lindong Jiang
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Chao Xu
- Department of Biostatistics and Epidemiology, University of Oklahoma Health Sciences Center, Oklahoma City, OK, 73104, USA
- Yuntong Bai
- Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA, 70118, USA
- Anqi Liu
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Yun Gong
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Yu-Ping Wang
- Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA, 70118, USA
- Hong-Wen Deng
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, 70112, USA
5
Jiang L, Xu C, Bai Y, Liu A, Gong Y, Wang YP, Deng HW. AutoSurv: interpretable deep learning framework for cancer survival analysis incorporating clinical and multi-omics data. Research Square 2023:rs.3.rs-2486756. [PMID: 37609286 PMCID: PMC10441464 DOI: 10.21203/rs.3.rs-2486756/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
Accurate prognosis for cancer patients can provide critical information for optimizing treatment plans and improving life quality. Combining omics data and demographic/clinical information can offer a more comprehensive view of cancer prognosis than using omics or clinical data alone and can reveal the underlying disease mechanisms at the molecular level. In this study, we developed a novel deep learning framework to extract information from high-dimensional gene expression and miRNA expression data and conduct prognosis prediction for breast cancer and ovarian cancer patients. Our model achieved significantly better prognosis prediction than the conventional Cox Proportional Hazard model and other competitive deep learning approaches in various settings. Moreover, an interpretation approach was applied to tackle the "black-box" nature of deep neural networks and we identified features (i.e., genes, miRNA, demographic/clinical variables) that made important contributions to distinguishing predicted high- and low-risk patients. The identified associations were partially supported by previous studies.
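For readers unfamiliar with the Cox proportional hazards baseline named in this abstract, the sketch below evaluates the negative log partial likelihood for a set of risk scores. It is a generic illustration, assuming no tied event times, and is not AutoSurv's training objective, which the abstract does not specify. The function name and the toy data are hypothetical.

```python
import math


def cox_neg_log_partial_likelihood(times, events, risk_scores):
    """Negative log partial likelihood of the Cox model (no tied event times).

    times: follow-up time per patient
    events: 1 if the event was observed, 0 if censored
    risk_scores: model output (log relative hazard) per patient
    """
    nll = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients only contribute through risk sets
        # Risk set: everyone still under observation at time t_i.
        denom = sum(math.exp(r) for t, r in zip(times, risk_scores) if t >= t_i)
        nll -= risk_scores[i] - math.log(denom)
    return nll


# Toy data: three patients, all with observed events (assumption for illustration).
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
well_ranked = cox_neg_log_partial_likelihood(times, events, [2.0, 1.0, 0.0])
badly_ranked = cox_neg_log_partial_likelihood(times, events, [0.0, 1.0, 2.0])
```

Scores that assign higher risk to patients with earlier events yield a lower value, which is why this quantity is commonly minimized when fitting survival models.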
Affiliation(s)
- Lindong Jiang
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, 70112
- Chao Xu
- Department of Biostatistics and Epidemiology, University of Oklahoma Health Sciences Center, Oklahoma City, OK, 73104
- Yuntong Bai
- Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA, 70118
- Anqi Liu
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, 70112
- Yun Gong
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, 70112
- Yu-Ping Wang
- Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA, 70118
- Hong-Wen Deng
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, 70112
6
Liang B, Gong H, Lu L, Xu J. Risk stratification and pathway analysis based on graph neural network and interpretable algorithm. BMC Bioinformatics 2022; 23:394. [PMID: 36167504 PMCID: PMC9516820 DOI: 10.1186/s12859-022-04950-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/23/2022] [Accepted: 09/19/2022] [Indexed: 12/01/2022] Open
Abstract
Background: Pathway-based analysis of transcriptomic data has shown greater stability and better performance than traditional gene-based analysis. Some pathway-based deep learning models have been developed for bioinformatic analysis, but these models have not fully considered the topological features of pathways, which limits the performance of the final prediction. Results: To address this issue, we propose a novel model, called PathGNN, which uses Graph Neural Networks (GNNs) to capture the topological features of pathways. As a case study, PathGNN was applied to predict long-term survival in four types of cancer and achieved promising predictive performance compared with other common methods. Furthermore, the adoption of an interpretation algorithm enabled the identification of plausible pathways associated with survival. Conclusion: PathGNN demonstrates that GNNs can be effectively applied to build a pathway-based model, resulting in promising predictive power. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04950-1.
Affiliation(s)
- Bilin Liang
- Shanghai Artificial Intelligence Laboratory, Yunjing Road 701, Shanghai, China
- Haifan Gong
- Shanghai Artificial Intelligence Laboratory, Yunjing Road 701, Shanghai, China
- Lu Lu
- Shanghai Artificial Intelligence Laboratory, Yunjing Road 701, Shanghai, China
- Jie Xu
- Shanghai Artificial Intelligence Laboratory, Yunjing Road 701, Shanghai, China
7
Diaz-Uriarte R, Gómez de Lope E, Giugno R, Fröhlich H, Nazarov PV, Nepomuceno-Chamorro IA, Rauschenberger A, Glaab E. Ten quick tips for biomarker discovery and validation analyses using machine learning. PLoS Comput Biol 2022; 18:e1010357. [PMID: 35951526 PMCID: PMC9371329 DOI: 10.1371/journal.pcbi.1010357] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 11/19/2022] Open
Affiliation(s)
- Ramon Diaz-Uriarte
- Department of Biochemistry, School of Medicine, Universidad Autónoma de Madrid, Instituto de Investigaciones Biomédicas ‘Alberto Sols’ (UAM-CSIC), Madrid, Spain
- Elisa Gómez de Lope
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg
- Rosalba Giugno
- Department of Computer Science, University of Verona, Verona, Italy
- Holger Fröhlich
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Centre for IT (b-it), Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Petr V. Nazarov
- Department of Cancer Research, Luxembourg Institute of Health, Strassen, Luxembourg
- Armin Rauschenberger
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg
- Enrico Glaab
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg
- * E-mail:
8
Weng Z, Yue Z, Zhu Y, Chen JY. DEMA: a distance-bounded energy-field minimization algorithm to model and layout biomolecular networks with quantitative features. Bioinformatics 2022; 38:i359-i368. [PMID: 35758816 PMCID: PMC9235497 DOI: 10.1093/bioinformatics/btac261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
Abstract
Summary: In biology, graph layout algorithms can reveal comprehensive biological contexts by visually positioning graph nodes in their relevant neighborhoods. A layout software algorithm/engine commonly takes a set of nodes and edges and produces layout coordinates of nodes according to edge constraints. However, current layout engines normally do not consider node, edge or node-set properties during layout and only curate these properties after the layout is created. Here, we propose a new layout algorithm, the distance-bounded energy-field minimization algorithm (DEMA), to natively consider various biological factors, i.e., the strength of gene-to-gene association, the gene's relative contribution weight and the functional groups of genes, to enhance the interpretation of complex network graphs. In DEMA, we introduce a parameterized energy model where nodes are repelled by the network topology and attracted by a few biological factors, i.e., interaction coefficient, effect coefficient and fold change of gene expression. We generalize these factors as gene weights, protein-protein interaction weights, gene-to-gene correlations and gene set annotations: four parameterized functional properties used in DEMA. Moreover, DEMA considers further attraction/repulsion/grouping coefficients to enable different preferences in generating network views. Applying DEMA, we performed two case studies using genetic data in autism spectrum disorder and Alzheimer's disease, respectively, for gene candidate discovery. Furthermore, we implement our algorithm as a plugin to Cytoscape, an open-source software platform for visualizing networks, which makes it convenient to use. Our software and demo can be freely accessed at http://discovery.informatics.uab.edu/dema. Supplementary information: Supplementary data are available at Bioinformatics online.
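To give a feel for energy-based layout in general, the sketch below runs gradient descent on a toy energy in which every node pair repels and weighted edges attract. It is not the DEMA energy model, whose exact terms and distance bounds are not given in the abstract. The energy form, the function name `layout`, and all parameter values are assumptions chosen for illustration.

```python
import math


def layout(nodes, weighted_edges, steps=500, lr=0.01, repulsion=1.0, max_move=0.1):
    """Gradient descent on E = sum_edges w * d^2 + sum_pairs repulsion / d.

    nodes: list of node names
    weighted_edges: list of (u, v, weight) tuples; heavier edges pull harder
    """
    n = len(nodes)
    # Deterministic start: nodes evenly spaced on the unit circle.
    pos = {
        v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
        for i, v in enumerate(nodes)
    }
    weight = {frozenset(e[:2]): e[2] for e in weighted_edges}
    for _ in range(steps):
        force = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                d = max(math.hypot(dx, dy), 1e-9)
                w = weight.get(frozenset((u, v)), 0.0)
                # Net pull of u toward v: attraction 2*w*d minus repulsion k/d^2.
                pull = 2.0 * w * d - repulsion / (d * d)
                fx, fy = pull * dx / d, pull * dy / d
                force[u][0] += fx
                force[u][1] += fy
                force[v][0] -= fx
                force[v][1] -= fy
        for v in nodes:
            mx, my = lr * force[v][0], lr * force[v][1]
            norm = math.hypot(mx, my)
            if norm > max_move:  # cap the step so early iterations stay stable
                mx, my = mx * max_move / norm, my * max_move / norm
            pos[v][0] += mx
            pos[v][1] += my
    return pos


# Toy network: a strongly associated pair (a, b) and a weakly associated pair (a, c).
pos = layout(["a", "b", "c"], [("a", "b", 5.0), ("a", "c", 0.2)])
```

In this toy setting the strongly weighted pair settles closer together than the weakly weighted pair, which is the qualitative behavior a weight-aware layout aims for.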
Affiliation(s)
- Zhenyu Weng
- Communication and Information Security Lab, Institute of Big Data Technologies, Shenzhen Graduate School, Peking University, Shenzhen 518055, China
- Zongliang Yue
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Yuesheng Zhu
- Communication and Information Security Lab, Institute of Big Data Technologies, Shenzhen Graduate School, Peking University, Shenzhen 518055, China
- Jake Yue Chen
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
9
Quazi S. Artificial intelligence and machine learning in precision and genomic medicine. Med Oncol 2022; 39:120. [PMID: 35704152 PMCID: PMC9198206 DOI: 10.1007/s12032-022-01711-1] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 03/06/2022] [Accepted: 03/14/2022] [Indexed: 10/28/2022]
Abstract
The advancement of precision medicine in medical care has left behind the conventional symptom-driven treatment process by allowing early risk prediction of disease through improved diagnostics and customization of more effective treatments. It is necessary to scrutinize overall patient data alongside broad factors to observe and differentiate between ill and relatively healthy people to take the most appropriate path toward precision medicine, resulting in an improved vision of biological indicators that can signal health changes. Precision and genomic medicine combined with artificial intelligence have the potential to improve patient healthcare. Patients with less common therapeutic responses or unique healthcare demands are using genomic medicine technologies. AI provides insights through advanced computation and inference, enabling the system to reason and learn while enhancing physician decision making. Many cell characteristics, including gene up-regulation, proteins binding to nucleic acids, and splicing, can be measured at high throughput and used as training objectives for predictive models. Researchers can create a new era of effective genomic medicine with the improved availability of a broad range of datasets and modern computer techniques such as machine learning. This review article has elucidated the contributions of ML algorithms in precision and genome medicine.
Affiliation(s)
- Sameer Quazi
- GenLab Biosolutions Private Limited, Bangalore, Karnataka, 560043, India
- Department of Biomedical Sciences, School of Life Sciences, Anglia Ruskin University, Cambridge, UK
11
Yue Z, Slominski R, Bharti S, Chen JY. PAGER Web APP: An Interactive, Online Gene Set and Network Interpretation Tool for Functional Genomics. Front Genet 2022; 13:820361. [PMID: 35495152 PMCID: PMC9039620 DOI: 10.3389/fgene.2022.820361] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/22/2021] [Accepted: 03/17/2022] [Indexed: 12/30/2022] Open
Abstract
Functional genomics studies have helped researchers annotate differentially expressed gene lists, extract gene expression signatures, and identify biological pathways from omics profiling experiments conducted on biological samples. The current geneset, network, and pathway analysis (GNPA) web servers, e.g., DAVID, EnrichR, WebGestaltR, or PAGER, do not allow automated integrative functional genomic downstream analysis. In this study, we developed a new web-based interactive application, “PAGER Web APP”, which supports online R scripting of integrative GNPA. In a case study of melanoma drug resistance, we showed that the new PAGER Web APP enabled us to discover highly relevant pathways and network modules, leading to novel biological insights. We also compared PAGER Web APP’s pathway analysis results retrieved among PAGER, EnrichR, and WebGestaltR to show its advantages in integrative GNPA. The interactive online web APP is publicly accessible from the link, https://aimed-lab.shinyapps.io/PAGERwebapp/.
Affiliation(s)
- Zongliang Yue
- Informatics Institute in the School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
- Radomir Slominski
- Informatics Institute in the School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
- Graduate Biomedical Sciences Program, The University of Alabama at Birmingham, Birmingham, AL, United States
- Samuel Bharti
- Informatics Institute in the School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
- Jake Y. Chen
- Informatics Institute in the School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
- *Correspondence: Jake Y. Chen
12
Artificial Intelligence for Precision Oncology. Advances in Experimental Medicine and Biology 2022; 1361:249-268. [DOI: 10.1007/978-3-030-91836-1_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
13
Rintala TJ, Federico A, Latonen L, Greco D, Fortino V. A systematic comparison of data- and knowledge-driven approaches to disease subtype discovery. Brief Bioinform 2021; 22:6350885. [PMID: 34396389 PMCID: PMC8575038 DOI: 10.1093/bib/bbab314] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/29/2021] [Revised: 07/05/2021] [Accepted: 07/20/2021] [Indexed: 12/14/2022] Open
Abstract
Typical clustering analysis for large-scale genomics data combines two unsupervised learning techniques: dimensionality reduction and clustering (DR-CL) methods. It has been demonstrated that transforming gene expression to pathway-level information can improve the robustness and interpretability of disease grouping results. This approach, referred to as the biological knowledge-driven clustering (BK-CL) approach, is often neglected due to a lack of tools enabling systematic comparisons with more established DR-based methods. Moreover, classic clustering metrics based on group separability tend to favor the DR-CL paradigm, which may increase the risk of identifying less actionable disease subtypes that have ambiguous biological and clinical explanations. Hence, there is a need for developing metrics that assess biological and clinical relevance. To facilitate the systematic analysis of BK-CL methods, we propose a computational protocol for quantitative analysis of clustering results derived from both DR-CL and BK-CL methods. Moreover, we propose a new BK-CL method that combines prior knowledge of disease relevant genes, network diffusion algorithms and gene set enrichment analysis to generate robust pathway-level information. Benchmarking studies were conducted to compare the grouping results from different DR-CL and BK-CL approaches with respect to standard clustering evaluation metrics, concordance with known subtypes, association with clinical outcomes and disease modules in co-expression networks of genes. No single approach dominated every metric, showing the importance of multi-objective evaluation in clustering analysis. However, we demonstrated that, on gene expression data sets derived from TCGA samples, the BK-CL approach can find groupings that provide significant prognostic value in both breast and prostate cancers.
Affiliation(s)
- Teemu J Rintala
- Institute of Biomedicine, University of Eastern Finland, Yliopistonranta 1 E, 70210 Kuopio, Finland
- Antonio Federico
- Faculty of Medicine and Health Technology, Tampere University, Kalevantie 4, 33100 Tampere, Finland; BioMediTech Institute, Tampere University, Kalevantie 4, 33100 Tampere, Finland
- Leena Latonen
- Institute of Biomedicine, University of Eastern Finland, Yliopistonranta 1 E, 70210 Kuopio, Finland
- Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, Kalevantie 4, 33100 Tampere, Finland; BioMediTech Institute, Tampere University, Kalevantie 4, 33100 Tampere, Finland; Institute of Biotechnology, University of Helsinki, Viikinkaari 5d, 00014 Helsinki, Finland
- Vittorio Fortino
- Institute of Biomedicine, University of Eastern Finland, Yliopistonranta 1 E, 70210 Kuopio, Finland
14
|
Duan R, Gao L, Gao Y, Hu Y, Xu H, Huang M, Song K, Wang H, Dong Y, Jiang C, Zhang C, Jia S. Evaluation and comparison of multi-omics data integration methods for cancer subtyping. PLoS Comput Biol 2021; 17:e1009224. [PMID: 34383739 PMCID: PMC8384175 DOI: 10.1371/journal.pcbi.1009224] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 02/27/2021] [Revised: 08/24/2021] [Accepted: 06/28/2021] [Indexed: 11/18/2022] Open
Abstract
Computational integrative analysis has become a significant approach in the data-driven exploration of biological problems. Many integration methods for cancer subtyping have been proposed, but evaluating these methods has become a complicated problem due to the lack of gold standards. Moreover, questions of practical importance remain to be addressed regarding the impact of selecting appropriate data types and combinations on the performance of integrative studies. Here, we constructed three classes of benchmarking datasets of nine cancers in TCGA by considering all eleven combinations of four multi-omics data types. Using these datasets, we conducted a comprehensive evaluation of ten representative integration methods for cancer subtyping in terms of accuracy (measured by combining clustering accuracy and clinical significance), robustness, and computational efficiency. We subsequently investigated the influence of different omics data on cancer subtyping and the effectiveness of their combinations. Refuting the widely held intuition that incorporating more types of omics data always produces better results, our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. Our analyses also suggested several effective combinations for most of the cancers studied, which may be of particular interest to researchers in omics data analysis.
Cancer is one of the most heterogeneous diseases, characterized by diverse morphological, phenotypic, and genomic profiles between tumors and their subtypes. Identifying cancer subtypes can help patients receive precise treatments. With the development of high-throughput technologies, genomics, epigenomics, and transcriptomics data have been generated for large cancer patient cohorts. It is believed that the more omics data we use, the more accurate the identification of cancer subtypes. To examine this assumption, we first constructed three classes of benchmarking datasets to conduct a comprehensive evaluation and comparison of ten representative multi-omics data integration methods for cancer subtyping by considering their accuracy, robustness, and computational efficiency. Then, we investigated the influence of different omics data and their various combinations on the effectiveness of cancer subtyping. Our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. We hope that our work may help researchers choose a proper method and an effective data combination when identifying cancer subtypes using data integration methods.
Affiliation(s)
- Ran Duan
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Yong Gao
- Department of Computer Science, The University of British Columbia Okanagan, Kelowna, British Columbia, Canada
- Yuxuan Hu
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Han Xu
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Mingfeng Huang
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Kuo Song
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Hongda Wang
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Yongqiang Dong
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Chaoqun Jiang
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Chenxing Zhang
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Songwei Jia
- School of Computer Science and Technology, Xidian University, Xi’an, China
|
15
|
Oh JH, Choi W, Ko E, Kang M, Tannenbaum A, Deasy JO. PathCNN: interpretable convolutional neural networks for survival prediction and pathway analysis applied to glioblastoma. Bioinformatics 2021; 37:i443-i450. [PMID: 34252964 PMCID: PMC8336441 DOI: 10.1093/bioinformatics/btab285] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 01/20/2023] Open
Abstract
MOTIVATION: Convolutional neural networks (CNNs) have achieved great success in the areas of image processing and computer vision, handling grid-structured inputs and efficiently capturing local dependencies through multiple levels of abstraction. However, a lack of interpretability remains a key barrier to the adoption of deep neural networks, particularly in predictive modeling of disease outcomes. Moreover, because biological array data are generally represented in a non-grid structured format, CNNs cannot be applied directly.
RESULTS: To address these issues, we propose a novel method, called PathCNN, that constructs an interpretable CNN model on integrated multi-omics data using a newly defined pathway image. PathCNN showed promising predictive performance in differentiating between long-term survival (LTS) and non-LTS when applied to glioblastoma multiforme (GBM). The adoption of a visualization tool coupled with statistical analysis enabled the identification of plausible pathways associated with survival in GBM. In summary, PathCNN demonstrates that CNNs can be effectively applied to multi-omics data in an interpretable manner, resulting in promising predictive power while identifying key biological correlates of disease.
AVAILABILITY AND IMPLEMENTATION: The source code is freely available at: https://github.com/mskspi/PathCNN.
Affiliation(s)
- Jung Hun Oh
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Wookjin Choi
- Department of Computer Science, Virginia State University, Petersburg, VA 23806, USA
- Euiseong Ko
- Department of Computer Science, University of Nevada, Las Vegas, NV 89154, USA
- Mingon Kang
- Department of Computer Science, University of Nevada, Las Vegas, NV 89154, USA
- Allen Tannenbaum
- Departments of Computer Science and Applied Mathematics & Statistics, Stony Brook University, New York, NY 11794, USA
- Joseph O Deasy
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
|
16
|
Liñares-Blanco J, Pazos A, Fernandez-Lozano C. Machine learning analysis of TCGA cancer data. PeerJ Comput Sci 2021; 7:e584. [PMID: 34322589 PMCID: PMC8293929 DOI: 10.7717/peerj-cs.584] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/03/2021] [Accepted: 05/17/2021] [Indexed: 06/13/2023]
Abstract
In recent years, machine learning (ML) researchers have shifted their focus towards biological problems that are difficult to analyse with standard approaches. Large initiatives such as The Cancer Genome Atlas (TCGA) have allowed the use of omic data for the training of these algorithms. To study the state of the art, this review covers the main works that have used ML with TCGA data. Firstly, the principal discoveries made by the TCGA consortium are presented. Once these bases have been established, we turn to the main objective of this study: the identification and discussion of those works that have used TCGA data to train different ML approaches. After a review of more than 100 different papers, it has been possible to classify them according to the following three pillars: the type of tumour, the type of algorithm and the predicted biological problem. One of the conclusions drawn in this work is the high density of studies based on two major algorithms: Random Forest and Support Vector Machines. We also observe a rise in the use of deep artificial neural networks. The increase in integrative models of multi-omic data analysis is also worth emphasizing. The different biological conditions are a consequence of molecular homeostasis, driven by protein-coding regions, regulatory elements and the surrounding environment. It is notable that a large number of works make use of gene expression data, which researchers have preferred when training the different models. The biological problems addressed have been classified into five types: prognosis prediction, tumour subtypes, microsatellite instability (MSI), immunological aspects and certain pathways of interest. A clear trend was detected in the prediction of these conditions according to the type of tumour. For this reason, a greater number of works have focused on the BRCA cohort, while works on survival, for example, centred on the GBM cohort owing to its large number of events. This review examines in depth the works and methodologies used to study TCGA cancer data. Finally, it is intended that this work will serve as a basis for future research in this field of study.
Affiliation(s)
- Jose Liñares-Blanco
- CITIC-Research Center of Information and Communication Technologies, University of A Coruna, A Coruña, Spain
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruna, A Coruña, Spain
- Alejandro Pazos
- CITIC-Research Center of Information and Communication Technologies, University of A Coruna, A Coruña, Spain
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruna, A Coruña, Spain
- Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR). Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
- Carlos Fernandez-Lozano
- CITIC-Research Center of Information and Communication Technologies, University of A Coruna, A Coruña, Spain
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruna, A Coruña, Spain
- Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR). Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
|
17
|
Patel SK, George B, Rai V. Artificial Intelligence to Decode Cancer Mechanism: Beyond Patient Stratification for Precision Oncology. Front Pharmacol 2020; 11:1177. [PMID: 32903628 PMCID: PMC7438594 DOI: 10.3389/fphar.2020.01177] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 04/25/2020] [Accepted: 07/20/2020] [Indexed: 12/13/2022] Open
Abstract
The multitude of multi-omics data generated cost-effectively using advanced high-throughput technologies has created a challenging domain for research in Artificial Intelligence (AI). Data curation poses a significant challenge, as different parameters, instruments, and sample preparation approaches are employed for generating these big data sets. AI could reduce the fuzziness and randomness in data handling and build a platform for the data ecosystem, and thus serve as the primary choice for data mining and big data analysis to make informed decisions. However, the application of AI remains intricate for researchers and clinicians lacking specific training in computational tools and informatics. Cancer is a major cause of death worldwide, accounting for an estimated 9.6 million deaths in 2018. Certain cancers, such as pancreatic and gastric cancers, are detected only after they have reached their advanced stages with frequent relapses. Cancer is one of the most complex diseases, affecting a range of organs with diverse disease progression mechanisms and effectors ranging from gene-epigenetics to a wide array of metabolites. Hence a comprehensive study, including genomics, epi-genomics, transcriptomics, proteomics, and metabolomics, along with medical/mass-spectrometry imaging, patient clinical history, treatments provided, genetics, and disease endemicity, is essential. The Cancer Moonshot℠ Research Initiatives by the NIH National Cancer Institute aim to collect as much information as possible from different regions of the world and build a cancer data repository. AI could play an immense role in (a) analysis of complex and heterogeneous data sets (multi-omics and/or inter-omics), (b) data integration to provide a holistic disease molecular mechanism, (c) identification of diagnostic and prognostic markers, and (d) monitoring of patients' response to drugs/treatments and recovery. AI enables precision disease management well beyond the prevalent disease stratification patterns, such as differential expression and supervised classification. This review highlights critical advances and challenges in omics data analysis, dealing with lab-to-lab data variability, and data integration. We also describe data mining and AI methods used to obtain robust results for precision medicine from "big" data. In the future, AI could be expanded to achieve ground-breaking progress in disease management.
Affiliation(s)
- Sandip Kumar Patel
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
- Buck Institute for Research on Aging, Novato, CA, United States
- Bhawana George
- Department of Hematopathology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Vineeta Rai
- Department of Entomology & Plant Pathology, North Carolina State University, Raleigh, NC, United States
|
18
|
Hao J, Kim Y, Mallavarapu T, Oh JH, Kang M. Interpretable deep neural network for cancer survival analysis by integrating genomic and clinical data. BMC Med Genomics 2019; 12:189. [PMID: 31865908 PMCID: PMC6927105 DOI: 10.1186/s12920-019-0624-2] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 12/29/2022] Open
Abstract
Background: Understanding the complex biological mechanisms of cancer patient survival using genomic and clinical data is vital, not only to develop new treatments for patients, but also to improve survival prediction. However, highly nonlinear and high-dimension, low-sample size (HDLSS) data pose computational challenges for conventional survival analysis.
Results: We propose a novel biologically interpretable pathway-based sparse deep neural network, named Cox-PASNet, which integrates high-dimensional gene expression data and clinical data on a simple neural network architecture for survival analysis. Cox-PASNet is biologically interpretable in that nodes in the neural network correspond to biological genes and pathways, while capturing the nonlinear and hierarchical effects of biological pathways associated with cancer patient survival. We also propose a heuristic optimization solution to train Cox-PASNet with HDLSS data. Cox-PASNet was intensively evaluated by comparing its predictive performance with that of current state-of-the-art methods on glioblastoma multiforme (GBM) and ovarian serous cystadenocarcinoma (OV) cancer. In the experiments, Cox-PASNet outperformed the benchmarking methods. Moreover, the neural network architecture of Cox-PASNet was biologically interpreted, and several significant prognostic factors of genes and biological pathways were identified.
Conclusions: Cox-PASNet models biological mechanisms in the neural network by incorporating biological pathway databases and sparse coding. The neural network of Cox-PASNet can identify nonlinear and hierarchical associations of genomic and clinical data to cancer patient survival. The open-source code of Cox-PASNet, implemented in PyTorch for training, evaluation, and model interpretation, is available at: https://github.com/DataX-JieHao/Cox-PASNet.
Affiliation(s)
- Jie Hao
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Youngsoon Kim
- Department of Computer Science, Kennesaw State University, Marietta, GA, USA
- Jung Hun Oh
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Mingon Kang
- Department of Computer Science, University of Nevada, Las Vegas, Las Vegas, NV, USA
|