1
Du L, Gao P, Liu Z, Yin N, Wang X. TMODINET: A trustworthy multi-omics dynamic learning integration network for cancer diagnostic. Comput Biol Chem 2024; 113:108202. [PMID: 39243551 DOI: 10.1016/j.compbiolchem.2024.108202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2024] [Revised: 07/23/2024] [Accepted: 08/31/2024] [Indexed: 09/09/2024]
Abstract
Multiple types of omics data contain a wealth of biomedical information that reflects different aspects of clinical samples. Integrated multi-omics analysis is therefore more likely to lead to accurate clinical decisions. Existing cancer diagnostic methods based on multi-omics data integration focus mainly on the classification accuracy of the model while neglecting the interpretability of its internal mechanism and the reliability of its results, both of which are crucial in domains such as precision medicine and the life sciences. To overcome this limitation, we propose a trustworthy multi-omics dynamic learning framework (TMODINET) for cancer diagnosis. The framework employs multi-omics adaptive dynamic learning to process each sample and provide patient-centered, personalized diagnosis through self-attention learning over features and modalities. To characterize the correlation between samples, we introduce a graph dynamic learning method that adaptively adjusts the graph structure according to the classification results for graph convolutional network (GCN) learning. Moreover, we employ an uncertainty mechanism based on the Dirichlet distribution and Dempster-Shafer theory to estimate uncertainty and integrate multi-omics data at the decision level, making the cancer diagnosis trustworthy. Extensive experiments are conducted on four real-world multimodal medical datasets. Compared with state-of-the-art methods, the proposed algorithm shows superior performance and trustworthiness. Our model has great potential for clinical diagnosis.
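The abstract does not spell out how the Dirichlet distribution and Dempster-Shafer theory are combined. The sketch below illustrates one common formulation from the trusted multi-view classification literature, which is an assumption here and not necessarily what TMODINET implements: per-class evidence is turned into Dirichlet parameters, belief masses and an uncertainty mass are derived from them, and two modalities are fused with a reduced form of Dempster's rule. The function names and the example evidence values are hypothetical.

```python
from typing import List, Tuple

Opinion = Tuple[List[float], float]  # (belief mass per class, uncertainty mass)


def opinion_from_evidence(evidence: List[float]) -> Opinion:
    """Map non-negative per-class evidence to belief masses and uncertainty.

    Assumed formulation: alpha_k = e_k + 1 are Dirichlet parameters,
    S = sum(alpha) is the Dirichlet strength, b_k = e_k / S, u = K / S.
    """
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    belief = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return belief, uncertainty


def combine(op1: Opinion, op2: Opinion) -> Opinion:
    """Fuse two opinions over the same classes with a reduced Dempster's rule."""
    b1, u1 = op1
    b2, u2 = op2
    n = len(b1)
    # Conflict: mass that the two sources assign to different classes.
    conflict = sum(b1[i] * b2[j] for i in range(n) for j in range(n) if i != j)
    scale = 1.0 / (1.0 - conflict)
    belief = [scale * (b1[k] * b2[k] + b1[k] * u2 + b2[k] * u1) for k in range(n)]
    uncertainty = scale * u1 * u2
    return belief, uncertainty


# Hypothetical evidence from two omics modalities for a two-class problem.
confident_view = opinion_from_evidence([9.0, 1.0])   # strong evidence for class 0
uncertain_view = opinion_from_evidence([1.0, 1.0])   # weak, ambiguous evidence
fused_belief, fused_uncertainty = combine(confident_view, uncertain_view)
print(fused_belief, fused_uncertainty)
```

In this formulation the fused belief masses and the fused uncertainty still sum to one, and a modality with little evidence (high uncertainty) has correspondingly little influence on the fused decision.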
Affiliation(s)
- Ling Du
  - Department of Software, Tiangong University, Tianjin, China.
- Peipei Gao
  - Department of Computer Science and Technology, Tiangong University, Tianjin, China.
- Zhuang Liu
  - School of FinTech, Research Center of Applied Finance, Dongbei University of Finance & Economics, Dalian, China.
- Nan Yin
  - Department of Machine Learning, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates.
- Xiaochao Wang
  - Department of Mathematical Sciences, Tiangong University, Tianjin, China.
2
Oh VKS, Li RW. Wise Roles and Future Visionary Endeavors of Current Emperor: Advancing Dynamic Methods for Longitudinal Microbiome Meta-Omics Data in Personalized and Precision Medicine. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2400458. [PMID: 39535493 DOI: 10.1002/advs.202400458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 09/16/2024] [Indexed: 11/16/2024]
Abstract
Understanding the etiological complexity of diseases requires identifying biomarkers longitudinally associated with specific phenotypes. Advanced sequencing tools generate dynamic microbiome data, providing insights into microbial community functions and their impact on health. This review aims to explore the current roles and future visionary endeavors of dynamic methods for integrating longitudinal microbiome multi-omics data in personalized and precision medicine. This work seeks to synthesize existing research, propose best practices, and highlight innovative techniques. The development and application of advanced dynamic methods, including the unified analytical frameworks and deep learning tools in artificial intelligence, are critically examined. Aggregating data on microbes, metabolites, genes, and other entities offers profound insights into the interactions among microorganisms, host physiology, and external stimuli. Despite progress, the absence of gold standards for validating analytical protocols and data resources of various longitudinal multi-omics studies remains a significant challenge. The interdependence of workflow steps critically affects overall outcomes. This work provides a comprehensive roadmap for best practices, addressing current challenges with advanced dynamic methods. The review underscores the biological effects of clinical, experimental, and analytical protocol settings on outcomes. Establishing consensus on dynamic microbiome inter-studies and advancing reliable analytical protocols are pivotal for the future of personalized and precision medicine.
Affiliation(s)
- Vera-Khlara S Oh
  - Big Biomedical Data Integration and Statistical Analysis (DIANA) Research Center, Department of Data Science, College of Natural Sciences, Jeju National University, Jeju City, Jeju Do, 63243, South Korea
- Robert W Li
  - United States Department of Agriculture, Agricultural Research Service, Animal Genomics and Improvement Laboratory, Beltsville, MD, 20705, USA
3
Cai Z, Apolinário S, Baião AR, Pacini C, Sousa MD, Vinga S, Reddel RR, Robinson PJ, Garnett MJ, Zhong Q, Gonçalves E. Synthetic augmentation of cancer cell line multi-omic datasets using unsupervised deep learning. Nat Commun 2024; 15:10390. [PMID: 39614072 DOI: 10.1038/s41467-024-54771-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 11/18/2024] [Indexed: 12/01/2024] Open
Abstract
Integrating diverse types of biological data is essential for a holistic understanding of cancer biology, yet it remains challenging due to data heterogeneity, complexity, and sparsity. Addressing this, our study introduces an unsupervised deep learning model, MOSA (Multi-Omic Synthetic Augmentation), specifically designed to integrate and augment the Cancer Dependency Map (DepMap). Harnessing orthogonal multi-omic information, this model successfully generates molecular and phenotypic profiles, resulting in an increase of 32.7% in the number of multi-omic profiles and thereby generating a complete DepMap for 1523 cancer cell lines. The synthetically enhanced data increases statistical power, uncovering less studied mechanisms associated with drug resistance, and refines the identification of genetic associations and clustering of cancer cell lines. By applying SHapley Additive exPlanations (SHAP) for model interpretation, MOSA reveals multi-omic features essential for cell clustering and biomarker identification related to drug and gene dependencies. This understanding is crucial for developing much-needed effective strategies to prioritize cancer targets.
Affiliation(s)
- Zhaoxiang Cai
  - ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
- Sofia Apolinário
  - INESC-ID, 1000-029, Lisboa, Portugal
  - Instituto Superior Técnico (IST), Universidade de Lisboa, 1049-001, Lisboa, Portugal
- Ana R Baião
  - INESC-ID, 1000-029, Lisboa, Portugal
  - Instituto Superior Técnico (IST), Universidade de Lisboa, 1049-001, Lisboa, Portugal
- Clare Pacini
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
- Miguel D Sousa
  - INESC-ID, 1000-029, Lisboa, Portugal
  - Instituto Superior Técnico (IST), Universidade de Lisboa, 1049-001, Lisboa, Portugal
- Susana Vinga
  - INESC-ID, 1000-029, Lisboa, Portugal
  - Instituto Superior Técnico (IST), Universidade de Lisboa, 1049-001, Lisboa, Portugal
- Roger R Reddel
  - ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
- Phillip J Robinson
  - ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
- Mathew J Garnett
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
- Qing Zhong
  - ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia.
- Emanuel Gonçalves
  - INESC-ID, 1000-029, Lisboa, Portugal.
  - Instituto Superior Técnico (IST), Universidade de Lisboa, 1049-001, Lisboa, Portugal.
4
Tang X, Prodduturi N, Thompson K, Weinshilboum R, O’Sullivan C, Boughey J, Tizhoosh H, Klee E, Wang L, Goetz M, Suman V, Kalari K. OmicsFootPrint: a framework to integrate and interpret multi-omics data using circular images and deep neural networks. Nucleic Acids Res 2024; 52:e99. [PMID: 39445795 PMCID: PMC11602161 DOI: 10.1093/nar/gkae915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Revised: 08/14/2024] [Accepted: 10/07/2024] [Indexed: 10/25/2024] Open
Abstract
The OmicsFootPrint framework addresses the need for advanced multi-omics data analysis methodologies by transforming data into intuitive two-dimensional circular images and facilitating the interpretation of complex diseases. Utilizing deep neural networks and incorporating the SHapley Additive exPlanations algorithm, the framework enhances model interpretability. Tested with The Cancer Genome Atlas data, OmicsFootPrint effectively classified lung and breast cancer subtypes, achieving high area under the curve (AUC) scores: 0.98 ± 0.02 for lung cancer subtype differentiation and 0.83 ± 0.07 for breast cancer PAM50 subtypes. It also successfully distinguished between invasive lobular and ductal carcinomas in breast cancer, showcasing its robustness. It further demonstrated notable performance in predicting drug responses in cancer cell lines, with a median AUC of 0.74, surpassing nine existing methods. Furthermore, its effectiveness persists even with reduced training sample sizes. OmicsFootPrint marks an enhancement in multi-omics research, offering a novel, efficient and interpretable approach that contributes to a deeper understanding of disease mechanisms.
Affiliation(s)
- Xiaojia Tang
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Naresh Prodduturi
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Kevin J Thompson
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Richard Weinshilboum
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA
- Judy C Boughey
  - Department of Surgery, Mayo Clinic, Rochester, MN 55905, USA
- Hamid R Tizhoosh
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN 55905, USA
- Eric W Klee
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Liewei Wang
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA
- Matthew P Goetz
  - Department of Oncology, Mayo Clinic, Rochester, MN 55905, USA
- Vera Suman
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Krishna R Kalari
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
5
Vidanagamachchi SM, Waidyarathna KMGTR. Opportunities, challenges and future perspectives of using bioinformatics and artificial intelligence techniques on tropical disease identification using omics data. Front Digit Health 2024; 6:1471200. [PMID: 39654982 PMCID: PMC11625773 DOI: 10.3389/fdgth.2024.1471200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Accepted: 11/06/2024] [Indexed: 12/12/2024] Open
Abstract
Tropical diseases are often caused by viruses, bacteria, parasites, and fungi, and can be spread by vectors. Analysis of multiple omics data types can provide comprehensive insights into biological system functions and disease progression. To this end, bioinformatics tools and diverse AI techniques are pivotal in identifying and understanding tropical diseases through the analysis of omics data. In this article, we provide a thorough review of the opportunities, challenges, and future directions of using bioinformatics tools and AI-assisted models for tropical disease identification with various omics data types. The review covers 2015 to 2024 and draws on reliable databases of peer-reviewed journal and conference articles. Several keywords were used for the article search, and around 40 articles were reviewed. We observed that using omics data with bioinformatics tools such as BLAST and Clustal Omega can yield significant outcomes in tropical disease identification. Further, the integration of multiple omics data improves biomarker identification and disease prediction, including disease outbreak prediction. Moreover, AI-assisted models can improve the precision, cost-effectiveness, and efficiency of CRISPR-based gene editing by optimizing gRNA design and supporting advanced genetic correction. Several AI-assisted models, including XAI, can be used to identify diseases and repurpose therapeutic targets and biomarkers efficiently. Furthermore, recent Transformer-based models such as BERT and GPT-4 have mainly been applied to sequence analysis and functional genomics. Finally, the most recent GeneViT model, which uses Vision Transformers, and other AI techniques such as Generative Adversarial Networks, Federated Learning, Transfer Learning, Reinforcement Learning, Automated ML and Attention Mechanisms have shown significant performance in disease classification using omics data.
Affiliation(s)
- S. M. Vidanagamachchi
  - Department of Computer Science, Faculty of Science, University of Ruhuna, Matara, Sri Lanka
- K. M. G. T. R. Waidyarathna
  - Department of Information Technology, Sri Lanka Institute of Advanced Technological Education, Galle, Sri Lanka
6
Ali SS, Li Q, Agrawal PB. Implementation of multi-omics in diagnosis of pediatric rare diseases. Pediatr Res 2024:10.1038/s41390-024-03728-w. [PMID: 39562738 DOI: 10.1038/s41390-024-03728-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 10/24/2024] [Accepted: 10/28/2024] [Indexed: 11/21/2024]
Abstract
The rapid and accurate diagnosis of rare diseases is paramount in directing clinical management. In recent years, the integration of multi-omics approaches has emerged as a potential strategy to overcome diagnostic hurdles. This review examines the application of multi-omics technologies, including genomics, epigenomics, transcriptomics, proteomics, and metabolomics, in relation to the diagnostic journey of rare diseases. We explore how these combined approaches enhance the detection of pathogenic genetic variants and decipher molecular mechanisms. This review highlights the groundbreaking potential of multi-omics in advancing the precision medicine paradigm for rare diseases, offering insights into future directions and clinical applications. IMPACT: This review discusses using current tests and emerging technologies to diagnose pediatric rare diseases. We describe the next steps after inconclusive molecular testing and a structure for using multi-omics in further investigations. The use of multi-omics is expanding, and it is essential to incorporate it into clinical practice to enhance individualized patient care.
Affiliation(s)
- Sara S Ali
  - Division of Neonatology, Department of Pediatrics, University of Miami Miller School of Medicine and Holtz Children's Hospital, Jackson Health System, Miami, FL, USA
- Qifei Li
  - Division of Neonatology, Department of Pediatrics, University of Miami Miller School of Medicine and Holtz Children's Hospital, Jackson Health System, Miami, FL, USA
- Pankaj B Agrawal
  - Division of Neonatology, Department of Pediatrics, University of Miami Miller School of Medicine and Holtz Children's Hospital, Jackson Health System, Miami, FL, USA.
7
Sanches PHG, de Melo NC, Porcari AM, de Carvalho LM. Integrating Molecular Perspectives: Strategies for Comprehensive Multi-Omics Integrative Data Analysis and Machine Learning Applications in Transcriptomics, Proteomics, and Metabolomics. Biology 2024; 13:848. [PMID: 39596803 PMCID: PMC11592251 DOI: 10.3390/biology13110848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Revised: 07/19/2024] [Accepted: 07/25/2024] [Indexed: 11/29/2024]
Abstract
With the advent of high-throughput technologies, the field of omics has made significant strides in characterizing biological systems at various levels of complexity. Transcriptomics, proteomics, and metabolomics are the three most widely used omics technologies, each providing unique insights into different layers of a biological system. However, analyzing each omics data set separately may not provide a comprehensive understanding of the subject under study. Therefore, integrating multi-omics data has become increasingly important in bioinformatics research. In this article, we review strategies for integrating transcriptomics, proteomics, and metabolomics data, including co-expression analysis, metabolite-gene networks, constraint-based models, pathway enrichment analysis, and interactome analysis. We discuss combined omics integration approaches, correlation-based strategies, and machine learning techniques that utilize one or more types of omics data. By presenting these methods, we aim to provide researchers with a better understanding of how to integrate omics data to gain a more comprehensive view of a biological system, facilitating the identification of complex patterns and interactions that might be missed by single-omics analyses.
Affiliation(s)
- Pedro H. Godoy Sanches
  - MS4Life Laboratory of Mass Spectrometry, Health Sciences Postgraduate Program, São Francisco University, Bragança Paulista 12916-900, SP, Brazil
- Nicolly Clemente de Melo
  - Graduate Program in Biomedicine, São Francisco University, Bragança Paulista 12916-900, SP, Brazil
- Andreia M. Porcari
  - MS4Life Laboratory of Mass Spectrometry, Health Sciences Postgraduate Program, São Francisco University, Bragança Paulista 12916-900, SP, Brazil
- Lucas Miguel de Carvalho
  - Post Graduate Program in Health Sciences, São Francisco University, Bragança Paulista 12916-900, SP, Brazil
8
Ballard JL, Wang Z, Li W, Shen L, Long Q. Deep learning-based approaches for multi-omics data integration and analysis. BioData Min 2024; 17:38. [PMID: 39358793 PMCID: PMC11446004 DOI: 10.1186/s13040-024-00391-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Accepted: 09/06/2024] [Indexed: 10/04/2024] Open
Abstract
BACKGROUND: The rapid growth of deep learning, as well as the vast and ever-growing amount of available data, has provided ample opportunity for advances in fusion and analysis of complex and heterogeneous data types. Different data modalities provide complementary information that can be leveraged to gain a more complete understanding of each subject. In the biomedical domain, multi-omics data includes molecular (genomics, transcriptomics, proteomics, epigenomics, metabolomics, etc.) and imaging (radiomics, pathomics) modalities which, when combined, have the potential to improve performance on prediction, classification, clustering and other tasks. Deep learning encompasses a wide variety of methods, each of which has certain strengths and weaknesses for multi-omics integration. METHOD: In this review, we categorize recent deep learning-based approaches by their basic architectures and discuss their unique capabilities in relation to one another. We also discuss some emerging themes advancing the field of multi-omics integration. RESULTS: Deep learning-based multi-omics integration methods were categorized broadly into non-generative (feedforward neural networks, graph convolutional neural networks, and autoencoders) and generative (variational methods, generative adversarial models, and a generative pretrained model). Generative methods have the advantage of being able to impose constraints on the shared representations to enforce certain properties or incorporate prior knowledge. They can also be used to generate or impute missing modalities. Recent advances achieved by these methods include the ability to handle incomplete data as well as going beyond the traditional molecular omics data types to integrate other modalities such as imaging data. CONCLUSION: We expect to see further growth in methods that can handle missingness, as this is a common challenge in working with complex and heterogeneous data. Additionally, methods that integrate more data types are expected to improve performance on downstream tasks by capturing a comprehensive view of each sample.
Affiliation(s)
- Jenna L Ballard
  - Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA.
- Zexuan Wang
  - Graduate Group in Applied Mathematics and Computational Science, University of Pennsylvania, 209 S. 33rd Street, Philadelphia, PA, 19104, USA
- Wenrui Li
  - Department of Statistics, University of Connecticut, 215 Glenbrook Road, Storrs, CT, 06269, USA
- Li Shen
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, 19104, USA.
- Qi Long
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, 19104, USA.
9
Cirinciani M, Da Pozzo E, Trincavelli ML, Milazzo P, Martini C. Drug Mechanism: A bioinformatic update. Biochem Pharmacol 2024; 228:116078. [PMID: 38402909 DOI: 10.1016/j.bcp.2024.116078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 02/01/2024] [Accepted: 02/22/2024] [Indexed: 02/27/2024]
Abstract
A drug Mechanism of Action (MoA) is a complex biological phenomenon that describes how a bioactive compound produces a pharmacological effect. The complete knowledge of MoA is fundamental to fully understanding the drug activity. Over the years, many experimental methods have been developed and a huge quantity of data has been produced. Nowadays, considering the increasing omics data availability and the improvement of the accessible computational resources, the study of a drug MoA is conducted by integrating experimental and bioinformatics approaches. The development of new in silico solutions for this type of analysis is continuously ongoing; herein, an updating review on such bioinformatic methods is presented. The methodologies cited are based on multi-omics data integration in biochemical networks and Machine Learning (ML). The multiple types of usable input data and the advantages and disadvantages of each method have been analyzed, with a focus on their applications. Three specific research areas (i.e. cancer drug development, antibiotics discovery, and drug repurposing) have been chosen for their importance in the drug discovery fields in which the study of drug MoA, through novel bioinformatics approaches, is particularly productive.
Affiliation(s)
- Martina Cirinciani
  - Department of Pharmacy, University of Pisa, via Bonanno 6, 56126 Pisa, Italy
- Eleonora Da Pozzo
  - Department of Pharmacy, University of Pisa, via Bonanno 6, 56126 Pisa, Italy; Center for Instrument Sharing University of Pisa (CISUP), Lungarno Pacinotti, 43/44, 56126 Pisa, Italy
- Maria Letizia Trincavelli
  - Department of Pharmacy, University of Pisa, via Bonanno 6, 56126 Pisa, Italy; Center for Instrument Sharing University of Pisa (CISUP), Lungarno Pacinotti, 43/44, 56126 Pisa, Italy
- Paolo Milazzo
  - Center for Instrument Sharing University of Pisa (CISUP), Lungarno Pacinotti, 43/44, 56126 Pisa, Italy; Department of Computer Science, University of Pisa, Largo Pontecorvo, 3, 56127 Pisa, Italy
- Claudia Martini
  - Department of Pharmacy, University of Pisa, via Bonanno 6, 56126 Pisa, Italy; Center for Instrument Sharing University of Pisa (CISUP), Lungarno Pacinotti, 43/44, 56126 Pisa, Italy.
10
Wang Y, Hong J, Lu Y, Sheng N, Fu Y, Yang L, Meng L, Huang L, Wang H. A Controllability Reinforcement Learning Method for Pancreatic Cancer Biomarker Identification. IEEE Trans Nanobioscience 2024; 23:556-563. [PMID: 39133596 DOI: 10.1109/tnb.2024.3441689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2024]
Abstract
Pancreatic cancer is one of the most malignant cancers, with rapid progression and poor prognosis. Transcriptional data can be effective in finding new biomarkers for pancreatic cancer. Many network-based methods have been proposed to identify cancer biomarkers, some of which incorporate network controllability. However, most existing methods do not study RNA, rely on prior knowledge and mutation information, or can only perform classification tasks. In this study, we propose RDDriver, a method that combines a Relational Graph Convolutional Network and a Deep Q-Network to identify pancreatic cancer biomarkers based on a multi-layer heterogeneous transcriptional regulation network. First, we construct a regulation network containing long non-coding RNA, microRNA, and messenger RNA. Second, the Relational Graph Convolutional Network is used to learn the node representations. Finally, we use the idea of the Deep Q-Network to build a model that scores and prioritizes each RNA using the Popov-Belevitch-Hautus criterion. We train RDDriver on three small simulated networks and calculate the average score after applying the model parameters to the regulation networks separately. To demonstrate the effectiveness of the method, we compare RDDriver with eight other methods on approximate benchmarks of three types of cancer driver RNAs.
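For readers unfamiliar with the criterion named in this abstract, the standard textbook statement of the Popov-Belevitch-Hautus (PBH) test for a linear time-invariant system is given below. The symbols $A$, $B$, $n$ and $m$ are generic control-theory notation and do not come from the paper; how RDDriver maps the RNA regulation network onto $A$ and $B$ is not described in the abstract.

```latex
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad A \in \mathbb{R}^{n \times n},\; B \in \mathbb{R}^{n \times m}

(A, B)\ \text{is controllable} \iff
\operatorname{rank}\begin{bmatrix} \lambda I - A & B \end{bmatrix} = n
\quad \text{for every eigenvalue } \lambda \text{ of } A
```

In network terms, $A$ is typically read as a weighted adjacency matrix and the columns of $B$ select the driver nodes that receive external input.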
11
Vitorino R. Transforming Clinical Research: The Power of High-Throughput Omics Integration. Proteomes 2024; 12:25. [PMID: 39311198 PMCID: PMC11417901 DOI: 10.3390/proteomes12030025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Revised: 08/31/2024] [Accepted: 09/02/2024] [Indexed: 09/26/2024] Open
Abstract
High-throughput omics technologies have dramatically changed biological research, providing unprecedented insights into the complexity of living systems. This review presents a comprehensive examination of the current landscape of high-throughput omics pipelines, covering key technologies, data integration techniques and their diverse applications. It looks at advances in next-generation sequencing, mass spectrometry and microarray platforms and highlights their contribution to data volume and precision. In addition, this review looks at the critical role of bioinformatics tools and statistical methods in managing the large datasets generated by these technologies. By integrating multi-omics data, researchers can gain a holistic understanding of biological systems, leading to the identification of new biomarkers and therapeutic targets, particularly in complex diseases such as cancer. The review also looks at the integration of omics data into electronic health records (EHRs) and the potential for cloud computing and big data analytics to improve data storage, analysis and sharing. Despite significant advances, there are still challenges such as data complexity, technical limitations and ethical issues. Future directions include the development of more sophisticated computational tools and the application of advanced machine learning techniques, which are critical for addressing the complexity and heterogeneity of omics datasets. This review aims to serve as a valuable resource for researchers and practitioners, highlighting the transformative potential of high-throughput omics technologies in advancing personalized medicine and improving clinical outcomes.
Affiliation(s)
- Rui Vitorino
  - iBiMED, Department of Medical Sciences, University of Aveiro, 3810-193 Aveiro, Portugal
  - Department of Surgery and Physiology, Cardiovascular R&D Centre—UnIC@RISE, Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal
12
Abbasi AF, Asim MN, Ahmed S, Vollmer S, Dengel A. Survival prediction landscape: an in-depth systematic literature review on activities, methods, tools, diseases, and databases. Front Artif Intell 2024; 7:1428501. [PMID: 39021434 PMCID: PMC11252047 DOI: 10.3389/frai.2024.1428501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Accepted: 06/12/2024] [Indexed: 07/20/2024] Open
Abstract
Survival prediction integrates patient-specific molecular information and clinical signatures to forecast the anticipated time of an event, such as recurrence, death, or disease progression. Survival prediction proves valuable in guiding treatment decisions, optimizing resource allocation, and interventions of precision medicine. The wide range of diseases, the existence of various variants within the same disease, and the reliance on available data necessitate disease-specific computational survival predictors. The widespread adoption of artificial intelligence (AI) methods in crafting survival predictors has undoubtedly revolutionized this field. However, the ever-increasing demand for more sophisticated and effective prediction models necessitates the continued creation of innovative advancements. To catalyze these advancements, it is crucial to bring existing survival predictors knowledge and insights into a centralized platform. The paper in hand thoroughly examines 23 existing review studies and provides a concise overview of their scope and limitations. Focusing on a comprehensive set of 90 most recent survival predictors across 44 diverse diseases, it delves into insights of diverse types of methods that are used in the development of disease-specific predictors. This exhaustive analysis encompasses the utilized data modalities along with a detailed analysis of subsets of clinical features, feature engineering methods, and the specific statistical, machine or deep learning approaches that have been employed. It also provides insights about survival prediction data sources, open-source predictors, and survival prediction frameworks.
Collapse
Affiliation(s)
- Ahtisham Fazeel Abbasi
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
  - Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Muhammad Nabeel Asim
  - Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Sheraz Ahmed
  - Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Sebastian Vollmer
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
  - Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Andreas Dengel
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
  - Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
13
Lyu C, Joehanes R, Huan T, Levy D, Li Y, Wang M, Liu X, Liu C, Ma J. Enhancing selection of alcohol consumption-associated genes by random forest. Br J Nutr 2024; 131:2058-2067. [PMID: 38606596 PMCID: PMC11216877 DOI: 10.1017/s0007114524000795] [Indexed: 04/13/2024]
Abstract
Machine learning methods have been used in identifying omics markers for a variety of phenotypes. We aimed to examine whether a supervised machine learning algorithm can improve identification of alcohol-associated transcriptomic markers. In this study, we analysed array-based, whole-blood derived expression data for 17 873 gene transcripts in 5508 Framingham Heart Study participants. By using the Boruta algorithm, a supervised random forest (RF)-based feature selection method, we selected twenty-five alcohol-associated transcripts. In a testing set (30 % of entire study participants), AUC (area under the receiver operating characteristics curve) of these twenty-five transcripts were 0·73, 0·69 and 0·66 for non-drinkers v. moderate drinkers, non-drinkers v. heavy drinkers and moderate drinkers v. heavy drinkers, respectively. The AUC of the selected transcripts by the Boruta method were comparable to those identified using conventional linear regression models, for example, AUC of 1958 transcripts identified by conventional linear regression models (false discovery rate < 0·2) were 0·74, 0·66 and 0·65, respectively. With Bonferroni correction for the twenty-five Boruta method-selected transcripts and three CVD risk factors (i.e. at P < 6·7e-4), we observed thirteen transcripts were associated with obesity, three transcripts with type 2 diabetes and one transcript with hypertension. For example, we observed that alcohol consumption was inversely associated with the expression of DOCK4, IL4R, and SORT1, and DOCK4 and SORT1 were positively associated with obesity, and IL4R was inversely associated with hypertension. In conclusion, using a supervised machine learning method, the RF-based Boruta algorithm, we identified novel alcohol-associated gene transcripts.
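The Bonferroni threshold quoted in the abstract can be checked with a short calculation. The sketch below assumes a family-wise alpha of 0.05, which the abstract does not state; the counts of twenty-five transcripts and three risk factors come from the text, and the variable names are illustrative only.

```python
# Bonferroni threshold for the transcript-by-risk-factor association tests.
n_transcripts = 25    # Boruta-selected transcripts (from the abstract)
n_risk_factors = 3    # obesity, type 2 diabetes, hypertension (from the abstract)
alpha = 0.05          # ASSUMPTION: family-wise error rate, not stated in the abstract

# Every transcript is tested against every risk factor.
n_tests = n_transcripts * n_risk_factors

# Bonferroni correction divides alpha by the number of tests.
threshold = alpha / n_tests

print(f"{n_tests} tests, per-test threshold = {threshold:.1e}")
```

Under that assumption, 75 tests give a per-test threshold of about 6.7e-4, which agrees with the cutoff reported in the abstract.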
Affiliation(s)
- Chenglin Lyu
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA
  - Department of Anatomy and Neurobiology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA
- Roby Joehanes
  - Framingham Heart Study and Population Sciences Branch, NHLBI, Framingham, MA
- Tianxiao Huan
  - Framingham Heart Study and Population Sciences Branch, NHLBI, Framingham, MA
- Daniel Levy
  - Framingham Heart Study and Population Sciences Branch, NHLBI, Framingham, MA
- Yi Li
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA
- Mengyao Wang
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA
- Xue Liu
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA
- Chunyu Liu
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA
- Jiantao Ma
  - Nutrition Epidemiology and Data Science, Friedman School of Nutrition Science and Policy, Tufts University, Boston, MA
14
Aljarallah NA, Dutta AK, Sait ARW. A Systematic Review of Genetics- and Molecular-Pathway-Based Machine Learning Models for Neurological Disorder Diagnosis. Int J Mol Sci 2024; 25:6422. [PMID: 38928128 PMCID: PMC11203850 DOI: 10.3390/ijms25126422] [Received: 04/24/2024] [Revised: 05/29/2024] [Accepted: 06/08/2024] [Indexed: 06/28/2024] Open
Abstract
The process of identification and management of neurological disorder conditions faces challenges, prompting the investigation of novel methods in order to improve diagnostic accuracy. In this study, we conducted a systematic literature review to identify the significance of genetics- and molecular-pathway-based machine learning (ML) models in treating neurological disorder conditions. According to the study's objectives, search strategies were developed to extract the research studies using digital libraries. We followed rigorous study selection criteria. A total of 24 studies met the inclusion criteria and were included in the review. We classified the studies based on neurological disorders. The included studies highlighted multiple methodologies and exceptional results in treating neurological disorders. The study findings underscore the potential of the existing models, presenting personalized interventions based on the individual's conditions. The findings offer better-performing approaches that handle genetics and molecular data to generate effective outcomes. Moreover, we discuss the future research directions and challenges, emphasizing the demand for generalizing existing models in real-world clinical settings. This study contributes to advancing knowledge in the field of diagnosis and management of neurological disorders.
Affiliation(s)
- Nasser Ali Aljarallah
  - Department of Computer Science and Information Systems, College of Applied Sciences, AlMaarefa University, Ad Diriyah, Riyadh 13713, Saudi Arabia
- Ashit Kumar Dutta
  - Department of Computer Science and Information Systems, College of Applied Sciences, AlMaarefa University, Ad Diriyah, Riyadh 13713, Saudi Arabia
- Abdul Rahaman Wahab Sait
  - Department of Documents and Archive, Center of Documents and Administrative Communication, King Faisal University, Al-Ahsa, Al Hofuf 31982, Saudi Arabia
15
Chakraborty S, Sharma G, Karmakar S, Banerjee S. Multi-OMICS approaches in cancer biology: New era in cancer therapy. Biochim Biophys Acta Mol Basis Dis 2024; 1870:167120. [PMID: 38484941 DOI: 10.1016/j.bbadis.2024.167120] [Received: 01/16/2024] [Revised: 03/06/2024] [Accepted: 03/06/2024] [Indexed: 04/01/2024]
Abstract
Innovative multi-omics frameworks integrate diverse datasets from the same patients to enhance our understanding of the molecular and clinical aspects of cancers. Advanced omics and multi-view clustering algorithms present unprecedented opportunities for classifying cancers into subtypes, refining survival predictions and treatment outcomes, and unravelling key pathophysiological processes across various molecular layers. However, with the increasing availability of cost-effective high-throughput technologies (HTT) that generate vast amounts of data, analyzing single layers often falls short of establishing causal relations. Integrating multi-omics data spanning genomes, epigenomes, transcriptomes, proteomes, metabolomes, and microbiomes offers unique prospects to comprehend the underlying biology of complex diseases like cancer. This discussion explores algorithmic frameworks designed to uncover cancer subtypes, disease mechanisms, and methods for identifying pivotal genomic alterations. It also underscores the significance of multi-omics in tumor classifications, diagnostics, and prognostications. Despite its unparalleled advantages, the integration of multi-omics data has been slow to find its way into everyday clinics. A major hurdle is the uneven maturity of different omics approaches and the widening gap between the generation of large datasets and the capacity to process this data. Initiatives promoting the standardization of sample processing and analytical pipelines, as well as multidisciplinary training for experts in data analysis and interpretation, are crucial for translating theoretical findings into practical applications.
Affiliation(s)
- Sohini Chakraborty
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Gaurav Sharma
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Sricheta Karmakar
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Satarupa Banerjee
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
16
Labarga A, Martínez-Gonzalez J, Barajas M. Integrative Multi-Omics Analysis for Etiology Classification and Biomarker Discovery in Stroke: Advancing towards Precision Medicine. BIOLOGY 2024; 13:338. [PMID: 38785820 PMCID: PMC11149453 DOI: 10.3390/biology13050338] [Received: 03/23/2024] [Revised: 05/02/2024] [Accepted: 05/06/2024] [Indexed: 05/25/2024]
Abstract
Recent advancements in high-throughput omics technologies have opened new avenues for investigating stroke at the molecular level and elucidating the intricate interactions among various molecular components. We present a novel approach for multi-omics data integration on knowledge graphs and have applied it to a stroke etiology classification task in 30 stroke patients through the integrative analysis of DNA methylation, mRNA, miRNA, and circRNA. This approach has demonstrated promising performance compared with existing single-technology approaches.
Affiliation(s)
- Alberto Labarga
  - Health Science Department, Public University of Navarra, 31006 Pamplona, Spain
- Miguel Barajas
  - Health Science Department, Public University of Navarra, 31006 Pamplona, Spain
17
Tang X, Prodduturi N, Thompson KJ, Weinshilboum RM, O'Sullivan CC, Boughey JC, Tizhoosh H, Klee EW, Wang L, Goetz MP, Suman V, Kalari KR. OmicsFootPrint: a framework to integrate and interpret multi-omics data using circular images and deep neural networks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.21.586001. [PMID: 38585820 PMCID: PMC10996492 DOI: 10.1101/2024.03.21.586001] [Indexed: 04/09/2024]
Abstract
The OmicsFootPrint framework addresses the need for advanced multi-omics data analysis methodologies by transforming data into intuitive two-dimensional circular images and facilitating the interpretation of complex diseases. Utilizing Deep Neural Networks and incorporating the SHapley Additive exPlanations (SHAP) algorithm, the framework enhances model interpretability. Tested with The Cancer Genome Atlas (TCGA) data, OmicsFootPrint effectively classified lung and breast cancer subtypes, achieving high Area Under Curve (AUC) scores of 0.98±0.02 for lung cancer subtype differentiation and 0.83±0.07 for breast cancer PAM50 subtypes, and it successfully distinguished between invasive lobular and ductal carcinomas in breast cancer, showcasing its robustness. It also demonstrated notable performance in predicting drug responses in cancer cell lines, with a median AUC of 0.74, surpassing existing algorithms. Furthermore, its effectiveness persists even with reduced training sample sizes. OmicsFootPrint marks an enhancement in multi-omics research, offering a novel, efficient, and interpretable approach that contributes to a deeper understanding of disease mechanisms.