1
Zhao YX, Yu CQ, Li LP, Wang DW, Song HF, Wei Y. BJLD-CMI: a predictive circRNA-miRNA interactions model combining multi-angle feature information. Front Genet 2024; 15:1399810. [PMID: 38798699 PMCID: PMC11116695 DOI: 10.3389/fgene.2024.1399810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 04/03/2024] [Indexed: 05/29/2024]
Abstract
A growing body of research suggests that circular RNA (circRNA) plays a crucial role in the pathogenesis of complex human diseases by binding to miRNA. Identifying their potential interactions is of paramount importance for the diagnosis and treatment of diseases. However, traditional biological wet experiments are characterized by long cycles, small scales, and time-consuming processes. Consequently, the use of efficient computational models to forecast the interactions between circRNA and miRNA is gradually becoming mainstream. In this study, we present a new prediction model named BJLD-CMI. The model extracts circRNA and miRNA sequence features using the Jaccard and BERT methods and integrates them to obtain CMI attribute features. It then uses the graph embedding method LINE to extract CMI behavioral features from the known circRNA-miRNA association graph. Finally, it predicts potential circRNA-miRNA interactions by fusing the multi-angle feature information, such as attribute and behavior, through Autoencoder in Autoencoder Networks. BJLD-CMI attained areas under the ROC curve of 94.95% and 90.69% on the CMI-9589 and CMI-9905 datasets, respectively. Compared with existing models, BJLD-CMI exhibits the best overall performance. In the case study, a PubMed literature search confirmed seven of the top 10 predicted CMIs. These results suggest that BJLD-CMI is an effective method for predicting interactions between circRNAs and miRNAs. It provides valuable candidates for biological wet experiments and can reduce the burden on researchers.
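As a rough illustration of the Jaccard step mentioned in the abstract, the sketch below compares two RNA sequences by the overlap of their k-mer sets. The abstract does not say how the authors apply Jaccard, so the k-mer representation, the choice of k = 3, the function names `kmers` and `jaccard`, and the toy sequences are all assumptions made for illustration, not the BJLD-CMI implementation.

```python
def kmers(seq: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-mers in a sequence (empty if len(seq) < k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined here as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy sequences, not taken from CMI-9589 or CMI-9905.
seq_a = "AUGGCUA"
seq_b = "AUGGCAA"

# The two sequences share 3 of 7 distinct 3-mers (AUG, UGG, GGC).
similarity = jaccard(kmers(seq_a), kmers(seq_b))
print(f"Jaccard similarity: {similarity:.4f}")
```

A set-based similarity like this ignores k-mer order and multiplicity, which keeps it cheap but loses positional information; that trade-off is one plausible reason to combine it with a learned sequence embedding.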
Affiliation(s)
- Yi-Xin Zhao
  - School of Information Engineering, Xijing University, Xi’an, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi’an, China
- Li-Ping Li
  - School of Information Engineering, Xijing University, Xi’an, China
  - College of Grassland and Environment Sciences, Xinjiang Agricultural University, Ürümqi, China
- Deng-Wu Wang
  - School of Information Engineering, Xijing University, Xi’an, China
- Hui-Fan Song
  - School of Information Engineering, Xijing University, Xi’an, China
- Yu Wei
  - School of Information Engineering, Xijing University, Xi’an, China
2
Cho H, She J, De Marchi D, El-Zaatari H, Barnes EL, Kahkoska AR, Kosorok MR, Virkud AV. Machine Learning and Health Science Research: Tutorial. J Med Internet Res 2024; 26:e50890. [PMID: 38289657 PMCID: PMC10865203 DOI: 10.2196/50890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Revised: 11/30/2023] [Accepted: 12/21/2023] [Indexed: 02/01/2024]
Abstract
Machine learning (ML) has seen impressive growth in health science research due to its capacity for handling complex data to perform a range of tasks, including unsupervised learning, supervised learning, and reinforcement learning. To aid health science researchers in understanding the strengths and limitations of ML and to facilitate its integration into their studies, we present here a guideline for integrating ML into an analysis through a structured framework, covering steps from framing a research question to study design and analysis techniques for specialized data types.
Affiliation(s)
- Hunyong Cho
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Jane She
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Daniel De Marchi
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Helal El-Zaatari
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Edward L Barnes
  - Division of Gastroenterology and Hepatology, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
  - Center for Gastrointestinal Biology and Diseases, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Anna R Kahkoska
  - Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
  - Division of Endocrinology and Metabolism, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
  - Center for Aging and Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Michael R Kosorok
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Arti V Virkud
  - Kidney Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
3
Wieder C, Cooke J, Frainay C, Poupin N, Bowler R, Jourdan F, Kechris KJ, Lai RP, Ebbels T. PathIntegrate: Multivariate modelling approaches for pathway-based multi-omics data integration. bioRxiv: The Preprint Server for Biology 2024:2024.01.09.574780. [PMID: 38260498 PMCID: PMC10802464 DOI: 10.1101/2024.01.09.574780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
As terabytes of multi-omics data are being generated, there is an ever-increasing need for methods facilitating the integration and interpretation of such data. Current multi-omics integration methods typically output lists, clusters, or subnetworks of molecules related to an outcome. Even with expert domain knowledge, discerning the biological processes involved is a time-consuming activity. Here we propose PathIntegrate, a method for integrating multi-omics datasets based on pathways, designed to exploit knowledge of biological systems and thus provide interpretable models for such studies. PathIntegrate employs single-sample pathway analysis to transform multi-omics datasets from the molecular to the pathway-level, and applies a predictive single-view or multi-view model to integrate the data. Model outputs include multi-omics pathways ranked by their contribution to the outcome prediction, the contribution of each omics layer, and the importance of each molecule in a pathway. Using semi-synthetic data we demonstrate the benefit of grouping molecules into pathways to detect signals in low signal-to-noise scenarios, as well as the ability of PathIntegrate to precisely identify important pathways at low effect sizes. Finally, using COPD and COVID-19 data we showcase how PathIntegrate enables convenient integration and interpretation of complex high-dimensional multi-omics datasets. The PathIntegrate Python package is available at https://github.com/cwieder/PathIntegrate.
Affiliation(s)
- Cecilia Wieder
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
- Juliette Cooke
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Clement Frainay
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Nathalie Poupin
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Russell Bowler
  - National Jewish Health, 1400 Jackson Street, Denver, CO, 80206, USA
- Fabien Jourdan
  - MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Toulouse, France
- Katerina J Kechris
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Rachel PJ Lai
  - Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, United Kingdom
- Timothy Ebbels
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
4
Rahman A, Debnath T, Kundu D, Khan MSI, Aishi AA, Sazzad S, Sayduzzaman M, Band SS. Machine learning and deep learning-based approach in smart healthcare: Recent advances, applications, challenges and opportunities. AIMS Public Health 2024; 11:58-109. [PMID: 38617415 PMCID: PMC11007421 DOI: 10.3934/publichealth.2024004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Accepted: 12/18/2023] [Indexed: 04/16/2024]
Abstract
In recent years, machine learning (ML) and deep learning (DL) have been the leading approaches to solving various challenges in intelligent healthcare applications, such as disease prediction, drug discovery, and medical image analysis. Further, given the current progress in ML and DL, both show promising potential to support healthcare. This study offered an exhaustive survey of ML and DL for the healthcare system, concentrating on vital state-of-the-art features, integration benefits, applications, prospects and future guidelines. To conduct the research, we searched the most prominent journal and conference databases using distinct keywords to find relevant scholarly work. First, we concisely summarized the most recent cutting-edge progress in ML-DL-based analysis in smart healthcare. Next, we reviewed the advancement of various services for ML and DL, including ML-healthcare, DL-healthcare, and ML-DL-healthcare. We then presented ML- and DL-based applications in the healthcare industry. Finally, we highlighted research challenges and recommendations for further studies based on our observations.
Affiliation(s)
- Anichur Rahman
  - Department of CSE, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka-1350
  - Department of CSE, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
- Tanoy Debnath
  - Department of CSE, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
  - Department of CSE, Green University of Bangladesh, 220/D, Begum Rokeya Sarani, Dhaka-1207, Bangladesh
- Dipanjali Kundu
  - Department of CSE, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka-1350
- Md. Saikat Islam Khan
  - Department of CSE, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
- Airin Afroj Aishi
  - Department of Computing and Information System, Daffodil International University, Savar, Dhaka, Bangladesh
- Sadia Sazzad
  - Department of CSE, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka-1350
- Mohammad Sayduzzaman
  - Department of CSE, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka-1350
- Shahab S. Band
  - Department of Information Management, International Graduate School of Artificial Intelligence, National Yunlin University of Science and Technology, Taiwan
5
Yue T, Wang Y, Zhang L, Gu C, Xue H, Wang W, Lyu Q, Dun Y. Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models. Int J Mol Sci 2023; 24:15858. [PMID: 37958843 PMCID: PMC10649223 DOI: 10.3390/ijms242115858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 10/24/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023]
Abstract
The data explosion driven by advancements in genomic research, such as high-throughput sequencing techniques, is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in various fields such as vision, speech, and text processing. Yet genomics entails unique challenges to deep learning, since we expect a superhuman intelligence that explores beyond our knowledge to interpret the genome from deep learning. A powerful deep learning model should rely on the insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with proper deep learning-based architecture, and we remark on practical considerations of developing deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research and point out current challenges and potential research directions for future genomics applications. We believe the collaborative use of ever-growing diverse data and the fast iteration of deep learning models will continue to contribute to the future of genomics.
Affiliation(s)
- Tianwei Yue
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yuanxin Wang
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Longxiang Zhang
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Chunming Gu
  - Department of Biomedical Engineering, School of Medicine, Johns Hopkins University, Baltimore, MD 21218, USA
- Haoru Xue
  - The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Wenping Wang
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Qi Lyu
  - Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yujie Dun
  - School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an 710049, China
6
Chen X, Feng B, Xu K, Chen Y, Duan X, Jin Z, Li K, Li R, Long W, Liu X. Development and validation of a deep learning radiomics nomogram for preoperatively differentiating thymic epithelial tumor histologic subtypes. Eur Radiol 2023; 33:6804-6816. [PMID: 37148352 DOI: 10.1007/s00330-023-09690-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Revised: 02/20/2023] [Accepted: 02/27/2023] [Indexed: 05/08/2023]
Abstract
OBJECTIVES To develop a deep learning radiomics nomogram (DLRN), using contrast-enhanced computed tomography (CECT) and deep learning technology, to preoperatively predict the risk status of patients with thymic epithelial tumors (TETs).
METHODS Between October 2008 and May 2020, 257 consecutive patients with surgically and pathologically confirmed TETs were enrolled from three medical centers. We extracted deep learning features from all lesions using a transformer-based convolutional neural network and created a deep learning signature (DLS) using least absolute shrinkage and selection operator (LASSO) regression. The predictive capability of a DLRN incorporating clinical characteristics, subjective CT findings and DLS was evaluated by the area under the curve (AUC) of a receiver operating characteristic curve.
RESULTS To construct a DLS, 25 deep learning features with non-zero coefficients were selected from 116 low-risk TETs (subtypes A, AB, and B1) and 141 high-risk TETs (subtypes B2, B3, and C). The combination of subjective CT features such as infiltration and DLS demonstrated the best performance in differentiating TET risk status. The AUCs in the training, internal validation, external validation 1 and external validation 2 cohorts were 0.959 (95% confidence interval [CI]: 0.924-0.993), 0.868 (95% CI: 0.765-0.970), 0.846 (95% CI: 0.750-0.942), and 0.846 (95% CI: 0.735-0.957), respectively. The DeLong test and decision curve analysis revealed that the DLRN was the most predictive and clinically useful model.
CONCLUSIONS The DLRN, comprising the CECT-derived DLS and subjective CT findings, showed high performance in predicting the risk status of patients with TETs.
CLINICAL RELEVANCE STATEMENT Accurate risk status assessment of thymic epithelial tumors (TETs) may aid in determining whether preoperative neoadjuvant treatment is necessary. A deep learning radiomics nomogram incorporating enhancement CT-based deep learning features, clinical characteristics, and subjective CT findings has the potential to predict the histologic subtypes of TETs, which can facilitate decision-making and personalized therapy in clinical practice.
KEY POINTS
• A non-invasive diagnostic method that can predict the pathological risk status may be useful for pretreatment stratification and prognostic evaluation in TET patients.
• DLRN demonstrated superior performance in differentiating the risk status of TETs when compared to the deep learning signature, radiomics signature, or clinical model.
• The DeLong test and decision curve analysis revealed that the DLRN was the most predictive and clinically useful in differentiating the risk status of TETs.
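The AUC figures quoted above summarize how well a model's scores rank high-risk cases above low-risk ones. The sketch below shows one standard way to compute it, as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical and unrelated to the study's cohorts; the abstract does not state which software the authors used.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count pairwise wins; a tie contributes half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = high-risk, 0 = low-risk.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

# 8 of the 9 positive/negative pairs are ranked correctly.
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of cases, which is fine for a few hundred patients; a rank-based formulation gives the same value in O(n log n) for larger datasets.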
Affiliation(s)
- Xiangmeng Chen
  - Department of Radiology, Jiangmen Central Hospital, Jiangmen, Guangdong Province, 529030, People's Republic of China
- Bao Feng
  - Department of Radiology, Jiangmen Central Hospital, Jiangmen, Guangdong Province, 529030, People's Republic of China
  - Laboratory of Artificial Intelligence of Biomedicine, Guilin University of Aerospace Technology, Guilin, Guangxi Province, 541004, People's Republic of China
- Kuncai Xu
  - Laboratory of Artificial Intelligence of Biomedicine, Guilin University of Aerospace Technology, Guilin, Guangxi Province, 541004, People's Republic of China
- Yehang Chen
  - Laboratory of Artificial Intelligence of Biomedicine, Guilin University of Aerospace Technology, Guilin, Guangxi Province, 541004, People's Republic of China
- Xiaobei Duan
  - Department of Nuclear Medicine, Jiangmen Central Hospital, Jiangmen, Guangdong Province, 529030, People's Republic of China
- Zhifa Jin
  - Department of Radiology, Jiangmen Central Hospital, Jiangmen, Guangdong Province, 529030, People's Republic of China
- Kunwei Li
  - Department of Radiology, The Fifth Affiliated Hospital of Sun Yat-Sen University, Zhuhai, Guangdong Province, 519000, People's Republic of China
- Ronggang Li
  - Department of Pathology, Jiangmen Central Hospital, Jiangmen, Guangdong Province, 529030, People's Republic of China
- Wansheng Long
  - Department of Radiology, Jiangmen Central Hospital, Jiangmen, Guangdong Province, 529030, People's Republic of China
- Xueguo Liu
  - Department of Radiology, The Seventh Affiliated Hospital of Sun Yat-Sen University, Shenzhen, Guangdong Province, 518107, People's Republic of China
7
Chafai N, Hayah I, Houaga I, Badaoui B. A review of machine learning models applied to genomic prediction in animal breeding. Front Genet 2023; 14:1150596. [PMID: 37745853 PMCID: PMC10516561 DOI: 10.3389/fgene.2023.1150596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Accepted: 08/22/2023] [Indexed: 09/26/2023]
Abstract
The advent of modern genotyping technologies has revolutionized genomic selection in animal breeding. Large marker datasets have shown several drawbacks for traditional genomic prediction methods in terms of flexibility, accuracy, and computational power. Recently, the application of machine learning models in animal breeding has gained a lot of interest due to their tremendous flexibility and their ability to capture patterns in large noisy datasets. Here, we present a general overview of a handful of machine learning algorithms and their application in genomic prediction to provide a meta-picture of their performance in genomic estimated breeding values estimation, genotype imputation, and feature selection. Finally, we discuss a potential adoption of machine learning models in genomic prediction in developing countries. The results of the reviewed studies showed that machine learning models have indeed performed well in fitting large noisy data sets and modeling minor nonadditive effects in some of the studies. However, sometimes conventional methods outperformed machine learning models, which confirms that there's no universal method for genomic prediction. In summary, machine learning models have great potential for extracting patterns from single nucleotide polymorphism datasets. Nonetheless, the level of their adoption in animal breeding is still low due to data limitations, complex genetic interactions, a lack of standardization and reproducibility, and the lack of interpretability of machine learning models when trained with biological data. Consequently, there is no remarkable outperformance of machine learning methods compared to traditional methods in genomic prediction. Therefore, more research should be conducted to discover new insights that could enhance livestock breeding programs.
Affiliation(s)
- Narjice Chafai
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
- Ichrak Hayah
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
- Isidore Houaga
  - Centre for Tropical Livestock Genetics and Health, The Roslin Institute, Royal (Dick) School of Veterinary Medicine, The University of Edinburgh, Edinburgh, United Kingdom
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Bouabid Badaoui
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
  - African Sustainable Agriculture Research Institute (ASARI), Mohammed VI Polytechnic University (UM6P), Laayoune, Morocco
8
Morabito F, Adornetto C, Monti P, Amaro A, Reggiani F, Colombo M, Rodriguez-Aldana Y, Tripepi G, D’Arrigo G, Vener C, Torricelli F, Rossi T, Neri A, Ferrarini M, Cutrona G, Gentile M, Greco G. Genes selection using deep learning and explainable artificial intelligence for chronic lymphocytic leukemia predicting the need and time to therapy. Front Oncol 2023; 13:1198992. [PMID: 37719021 PMCID: PMC10501728 DOI: 10.3389/fonc.2023.1198992] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/02/2023] [Accepted: 07/31/2023] [Indexed: 09/19/2023]
Abstract
Analyzing gene expression profiles (GEP) through artificial intelligence provides meaningful insight into cancer disease. This study introduces DeepSHAP Autoencoder Filter for Genes Selection (DSAF-GS), a novel deep learning and explainable artificial intelligence-based approach for feature selection in genomics-scale data. DSAF-GS exploits the autoencoder's reconstruction capabilities without changing the original feature space, enhancing the interpretation of the results. Explainable artificial intelligence is then used to select the informative genes for chronic lymphocytic leukemia prognosis of 217 cases from a GEP database comprising roughly 20,000 genes. The model for prognosis prediction achieved an accuracy of 86.4%, a sensitivity of 85.0%, and a specificity of 87.5%. According to the proposed approach, predictions were strongly influenced by CEACAM19 and PIGP, moderately influenced by MKL1 and GNE, and poorly influenced by other genes. The 10 most influential genes were selected for further analysis. Among them, FADD, FIBP, FIBP, GNE, IGF1R, MKL1, PIGP, and SLC39A6 were identified in the Reactome pathway database as involved in signal transduction, transcription, protein metabolism, immune system, cell cycle, and apoptosis. Moreover, according to the network model of the 3D protein-protein interaction (PPI) explored using the NetworkAnalyst tool, FADD, FIBP, IGF1R, QTRT1, GNE, SLC39A6, and MKL1 appear coupled into a complex network. Finally, all 10 selected genes showed a predictive power on time to first treatment (TTFT) in univariate analyses on a basic prognostic model including IGHV mutational status, del(11q) and del(17p), NOTCH1 mutations, β2-microglobulin, Rai stage, and B-lymphocytosis known to predict TTFT in CLL. 
However, only the IGF1R (hazard ratio [HR] 1.41, 95% CI 1.08-1.84, P=0.013), COL28A1 (HR 0.32, 95% CI 0.10-0.97, P=0.045), and QTRT1 (HR 7.73, 95% CI 2.48-24.04, P<0.001) genes were significantly associated with TTFT in multivariable analyses when combined with the prognostic factors of the basic model, ultimately increasing Harrell's c-index and the explained variation to 78.6% (versus 76.5% for the basic prognostic model) and 52.6% (versus 42.2% for the basic prognostic model), respectively. The goodness of model fit was also enhanced (χ2 = 20.1, P=0.002), indicating improved performance over the basic prognostic model. In conclusion, DSAF-GS identified a group of significant genes for CLL prognosis, suggesting future directions for bio-molecular research.
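For readers unfamiliar with the accuracy, sensitivity, and specificity figures quoted in this abstract, the sketch below shows how the three metrics follow from confusion-matrix counts. The counts are hypothetical: they were chosen only so that the arithmetic rounds to the reported 86.4%, 85.0%, and 87.5%, and they are not taken from the paper, whose abstract gives neither the confusion matrix nor the test-set size. The function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),  # correct calls over all cases
        "sensitivity": tp / (tp + fn),                # true-positive rate
        "specificity": tn / (tn + fp),                # true-negative rate
    }


# Hypothetical counts for illustration only (not reported in the abstract).
metrics = classification_metrics(tp=17, fn=3, tn=21, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```

Reporting sensitivity and specificity alongside accuracy matters when the two classes are unbalanced, because accuracy alone can look high even if one class is predicted poorly.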
Affiliation(s)
- Carlo Adornetto
  - Department of Mathematics and Computer Science, University of Calabria, Cosenza, Italy
- Paola Monti
  - Mutagenesis and Cancer Prevention Unit, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ospedale Policlinico San Martino, Genoa, Italy
- Adriana Amaro
  - Tumor Epigenetics Unit, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ospedale Policlinico San Martino, Genoa, Italy
- Francesco Reggiani
  - Tumor Epigenetics Unit, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ospedale Policlinico San Martino, Genoa, Italy
- Monica Colombo
  - Molecular Pathology Unit, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ospedale Policlinico San Martino, Genoa, Italy
- Giovanni Tripepi
  - Consiglio Nazionale delle Ricerche, Istituto di Fisiologia Clinica del Consiglio Nazionale delle Ricerche (CNR), Reggio Calabria, Italy
- Graziella D’Arrigo
  - Consiglio Nazionale delle Ricerche, Istituto di Fisiologia Clinica del Consiglio Nazionale delle Ricerche (CNR), Reggio Calabria, Italy
- Claudia Vener
  - Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
- Federica Torricelli
  - Laboratory of Translational Research, Azienda Unità Sanitaria Locale - Istituto di Ricovero e Cura a Carattere Scientifico (USL-IRCCS) of Reggio Emilia, Reggio Emilia, Italy
- Teresa Rossi
  - Laboratory of Translational Research, Azienda Unità Sanitaria Locale - Istituto di Ricovero e Cura a Carattere Scientifico (USL-IRCCS) of Reggio Emilia, Reggio Emilia, Italy
- Antonino Neri
  - Scientific Directorate, Azienda Unità Sanitaria Locale - Istituto di Ricovero e Cura a Carattere Scientifico (USL-IRCCS) of Reggio Emilia, Reggio Emilia, Italy
- Manlio Ferrarini
  - Unità Operativa (UO) Molecular Pathology, Ospedale Policlinico San Martino Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Genoa, Italy
- Giovanna Cutrona
  - Molecular Pathology Unit, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ospedale Policlinico San Martino, Genoa, Italy
- Massimo Gentile
  - Hematology Unit, Department of Onco-Hematology, Azienda Ospedaliera (A.O.) of Cosenza, Cosenza, Italy
  - Department of Pharmacy and Health and Nutritional Sciences, University of Calabria, Cosenza, Italy
- Gianluigi Greco
  - Department of Mathematics and Computer Science, University of Calabria, Cosenza, Italy
9
Curti PDF, Selli A, Pinto DL, Merlos-Ruiz A, Balieiro JCDC, Ventura RV. Applications of livestock monitoring devices and machine learning algorithms in animal production and reproduction: an overview. Anim Reprod 2023; 20:e20230077. [PMID: 37700909 PMCID: PMC10494883 DOI: 10.1590/1984-3143-ar2023-0077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 07/10/2023] [Indexed: 09/14/2023]
Abstract
Some sectors of animal production and reproduction have shown great technological advances due to the development of research areas such as Precision Livestock Farming (PLF). PLF is an innovative approach that allows animals to be monitored, through the adoption of cutting-edge technologies that continuously collect real-time data by combining the use of sensors with advanced algorithms to provide decision tools for farmers. Artificial Intelligence (AI) is a field that merges computer science and large datasets to create expert systems that are able to generate predictions and classifications similarly to human intelligence. In a simplified manner, Machine Learning (ML) is a branch of AI, and can be considered as a broader field that encompasses Deep Learning (DL, a Neural Network formed by at least three layers), generating a hierarchy of subsets formed by AI, ML and DL, respectively. Both ML and DL provide innovative methods for analyzing data, especially beneficial for large datasets commonly found in livestock-related activities. These approaches enable the extraction of valuable insights to address issues related to behavior, health, reproduction, production, and the environment, facilitating informed decision-making. In order to create the referred technologies, studies generally go through five steps involving data processing: acquisition, transferring, storage, analysis and delivery of results. Although the data collection and analysis steps are usually thoroughly reported by the scientific community, a good execution of each step is essential to achieve good and credible results, which impacts the degree of acceptance of the proposed technologies in real life practical circumstances. 
In this context, the present work aims to provide an overview of the current implementations of ML/DL in livestock reproduction and production, as well as to identify potential challenges and critical points in each of the five steps mentioned, which can affect the results and the application of AI techniques by farmers in practical situations.
Collapse
Affiliation(s)
- Paula de Freitas Curti
- Departamento de Nutrição e Produção Animal, Faculdade de Medicina Veterinária e Zootecnia, Universidade de São Paulo, Pirassununga, SP, Brasil
| | - Alana Selli
- Departamento de Nutrição e Produção Animal, Faculdade de Medicina Veterinária e Zootecnia, Universidade de São Paulo, Pirassununga, SP, Brasil
| | - Diógenes Lodi Pinto
- Departamento de Nutrição e Produção Animal, Faculdade de Medicina Veterinária e Zootecnia, Universidade de São Paulo, Pirassununga, SP, Brasil
| | - Alexandre Merlos-Ruiz
- Departamento de Nutrição e Produção Animal, Faculdade de Medicina Veterinária e Zootecnia, Universidade de São Paulo, Pirassununga, SP, Brasil
| | - Julio Cesar de Carvalho Balieiro
- Departamento de Nutrição e Produção Animal, Faculdade de Medicina Veterinária e Zootecnia, Universidade de São Paulo, Pirassununga, SP, Brasil
| | - Ricardo Vieira Ventura
- Departamento de Nutrição e Produção Animal, Faculdade de Medicina Veterinária e Zootecnia, Universidade de São Paulo, Pirassununga, SP, Brasil
| |
|
10
|
Gupta NS, Kumar P. Perspective of artificial intelligence in healthcare data management: A journey towards precision medicine. Comput Biol Med 2023; 162:107051. [PMID: 37271113 DOI: 10.1016/j.compbiomed.2023.107051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Revised: 05/06/2023] [Accepted: 05/20/2023] [Indexed: 06/06/2023]
Abstract
Mounting evidence has highlighted the implementation of big data handling and management in the healthcare industry to improve clinical services. Various private and public companies have generated, stored, and analyzed different types of big healthcare data, such as omics data, clinical data, electronic health records, personal health records, and sensing data, with the aim of moving in the direction of precision medicine. Additionally, with the advancement in technologies, researchers are curious to explore the potential of artificial intelligence and machine learning on big healthcare data to enhance the quality of patients' lives. However, seeking solutions from big healthcare data requires proper management, storage, and analysis, which imposes hindrances associated with big data handling. Herein, we briefly discuss the implications of big data handling and the role of artificial intelligence in precision medicine. Further, we highlight the potential of artificial intelligence in integrating and analyzing big data to offer personalized treatment. In addition, we briefly discuss the applications of artificial intelligence in personalized treatment, especially in neurological diseases. Lastly, we discuss the challenges and limitations of artificial intelligence in big data management and analysis that hinder precision medicine.
Affiliation(s)
- Nancy Sanjay Gupta
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India
| | - Pravir Kumar
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India.
| |
|
11
|
Kuzudisli C, Bakir-Gungor B, Bulut N, Qaqish B, Yousef M. Review of feature selection approaches based on grouping of features. PeerJ 2023; 11:e15666. [PMID: 37483989 PMCID: PMC10358338 DOI: 10.7717/peerj.15666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 06/08/2023] [Indexed: 07/25/2023] Open
Abstract
With the rapid development in technology, large amounts of high-dimensional data have been generated. This high dimensionality including redundancy and irrelevancy poses a great challenge in data analysis and decision making. Feature selection (FS) is an effective way to reduce dimensionality by eliminating redundant and irrelevant data. Most traditional FS approaches score and rank each feature individually; and then perform FS either by eliminating lower ranked features or by retaining highly-ranked features. In this review, we discuss an emerging approach to FS that is based on initially grouping features, then scoring groups of features rather than scoring individual features. Despite the presence of reviews on clustering and FS algorithms, to the best of our knowledge, this is the first review focusing on FS techniques based on grouping. The typical idea behind FS through grouping is to generate groups of similar features with dissimilarity between groups, then select representative features from each cluster. Approaches under supervised, unsupervised, semi supervised and integrative frameworks are explored. The comparison of experimental results indicates the effectiveness of sequential, optimization-based (i.e., fuzzy or evolutionary), hybrid and multi-method approaches. When it comes to biological data, the involvement of external biological sources can improve analysis results. We hope this work's findings can guide effective design of new FS approaches using feature grouping.
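The grouping idea described in this abstract (form groups of similar features, then keep one representative per group) can be illustrated with a minimal sketch. It is not taken from the review: the greedy correlation-based grouping, the `threshold` value, and the function names `group_features` and `select_representatives` are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def group_features(columns, threshold=0.9):
    """Greedy grouping (assumed scheme): a feature joins the first group
    whose seed feature it correlates with at |r| >= threshold."""
    groups = []  # each group is a list of column indices; g[0] is the seed
    for j, col in enumerate(columns):
        for g in groups:
            if abs(pearson(columns[g[0]], col)) >= threshold:
                g.append(j)
                break
        else:
            groups.append([j])
    return groups


def select_representatives(columns, labels, threshold=0.9):
    """From each group, keep the feature most correlated with the labels."""
    groups = group_features(columns, threshold)
    return [max(g, key=lambda j: abs(pearson(columns[j], labels))) for g in groups]


# Toy data: features 0 and 1 are nearly collinear, feature 2 is not.
columns = [[1, 2, 3, 4], [2, 4, 6, 9], [4, 1, 3, 2]]
labels = [0, 0, 1, 1]
selected = select_representatives(columns, labels)
```

In this toy case the two collinear features fall into one group and only one of them is retained, which is the redundancy reduction that the grouping approach aims for.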
Affiliation(s)
- Cihan Kuzudisli
- Department of Computer Engineering, Hasan Kalyoncu University, Gaziantep, Turkey
- Department of Electrical and Computer Engineering, Abdullah Gul University, Kayseri, Turkey
| | - Burcu Bakir-Gungor
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
| | - Nurten Bulut
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
| | - Bahjat Qaqish
- Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina, Chapel Hill, United States of America
| | - Malik Yousef
- Department of Information Systems, Zefat Academic College, Zefat, Israel
- Galilee Digital Health Research Center, Zefat Academic College, Zefat, Israel
| |
|
12
|
Li C, Dubbelaar ML, Zhang X, Zheng JC. Editorial: Understanding the heterogeneity and spatial brain environment of neurodegenerative diseases through conventional and future methods. Front Cell Neurosci 2023; 17:1211273. [PMID: 37287510 PMCID: PMC10242171 DOI: 10.3389/fncel.2023.1211273] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/24/2023] [Accepted: 05/02/2023] [Indexed: 06/09/2023] Open
Affiliation(s)
- Cui Li
- Center for Translational Neurodegeneration and Regenerative Therapy, Shanghai Tongji Hospital Affiliated to Tongji University School of Medicine, Shanghai, China
| | - Marissa L. Dubbelaar
- Department of Peptide-Based Immunotherapy, University Hospital Tübingen, Tübingen, Germany
- Cluster of Excellence iFIT (EXC2180) “Image-Guided and Functionally Instructed Tumor Therapies”, University of Tübingen, Tübingen, Germany
- Department of Immunology, Institute for Cell Biology, University of Tübingen, Tübingen, Germany
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
| | - Xiaoming Zhang
- Center for Translational Neurodegeneration and Regenerative Therapy, Shanghai Tongji Hospital Affiliated to Tongji University School of Medicine, Shanghai, China
| | - Jialin C. Zheng
- Center for Translational Neurodegeneration and Regenerative Therapy, Shanghai Tongji Hospital Affiliated to Tongji University School of Medicine, Shanghai, China
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Orthopedic Department of Tongji Hospital, School of Medicine, Tongji University, Shanghai, China
- Translational Research Institute of Brain and Brain-Like Intelligence, Shanghai Fourth People's Hospital Affiliated to Tongji University School of Medicine, Shanghai, China
- Collaborative Innovation Center for Brain Science, Tongji University, Shanghai, China
| |
|
13
|
Hauptmann T, Kramer S. A fair experimental comparison of neural network architectures for latent representations of multi-omics for drug response prediction. BMC Bioinformatics 2023; 24:45. [PMID: 36788531 PMCID: PMC9926634 DOI: 10.1186/s12859-023-05166-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 01/31/2023] [Indexed: 02/16/2023] Open
Abstract
BACKGROUND Recent years have seen a surge of novel neural network architectures for the integration of multi-omics data for prediction. Most of the architectures include either encoders alone or encoders and decoders, i.e., autoencoders of various sorts, to transform multi-omics data into latent representations. One important parameter is the depth of integration: the point at which the latent representations are computed or merged, which can be either early, intermediate, or late. The literature on integration methods is growing steadily, however, close to nothing is known about the relative performance of these methods under fair experimental conditions and under consideration of different use cases. RESULTS We developed a comparison framework that trains and optimizes multi-omics integration methods under equal conditions. We incorporated early integration, PCA and four recently published deep learning methods: MOLI, Super.FELT, OmiEmbed, and MOMA. Further, we devised a novel method, Omics Stacking, that combines the advantages of intermediate and late integration. Experiments were conducted on a public drug response data set with multiple omics data (somatic point mutations, somatic copy number profiles and gene expression profiles) that was obtained from cell lines, patient-derived xenografts, and patient samples. Our experiments confirmed that early integration has the lowest predictive performance. Overall, architectures that integrate triplet loss achieved the best results. Statistical differences can, overall, rarely be observed, however, in terms of the average ranks of methods, Super.FELT is consistently performing best in a cross-validation setting and Omics Stacking best in an external test set setting. CONCLUSIONS We recommend researchers to follow fair comparison protocols, as suggested in the paper. When faced with a new data set, Super.FELT is a good option in the cross-validation setting as well as Omics Stacking in the external test set setting. 
Statistical significances are hardly observable, despite trends in the algorithms' rankings. Future work on refined methods for transfer learning tailored for this domain may improve the situation for external test sets. The source code of all experiments is available under https://github.com/kramerlab/Multi-Omics_analysis.
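The three integration depths named in this abstract (early, intermediate, late) can be contrasted in a short sketch. This is an illustrative assumption, not the code from the cited repository: the function names, the toy "encoders", and the averaging rule for late integration are all hypothetical.

```python
def early_integration(views):
    """Early: concatenate the raw per-sample features of every view
    before any modelling."""
    n_samples = len(views[0])
    return [sum((list(view[i]) for view in views), []) for i in range(n_samples)]


def intermediate_integration(views, encoders):
    """Intermediate: encode each view separately, then concatenate the
    resulting latent vectors per sample."""
    encoded = [[enc(row) for row in view] for view, enc in zip(views, encoders)]
    return early_integration(encoded)


def late_integration(per_view_scores):
    """Late: train one model per view and combine only the predictions
    (here, a plain average, which is an assumed combination rule)."""
    n_views = len(per_view_scores)
    return [sum(scores) / n_views for scores in zip(*per_view_scores)]


# Two toy views over the same two samples (values are made up).
mutations = [[0, 1], [1, 1]]
expression = [[2.0, 4.0, 6.0], [1.0, 3.0, 5.0]]

early = early_integration([mutations, expression])
# Stand-in encoders: a real pipeline would use trained networks here.
latent = intermediate_integration(
    [mutations, expression],
    [lambda row: [sum(row)], lambda row: [max(row)]],
)
late = late_integration([[0.25, 1.0], [0.75, 0.5]])
```

The only difference between the three functions is the point at which the views meet: raw features, latent vectors, or final predictions.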
Affiliation(s)
- Tony Hauptmann
- Department of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany.
| | - Stefan Kramer
- Department of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany
| |
|
14
|
Flores JE, Claborne DM, Weller ZD, Webb-Robertson BJM, Waters KM, Bramer LM. Missing data in multi-omics integration: Recent advances through artificial intelligence. Front Artif Intell 2023; 6:1098308. [PMID: 36844425 PMCID: PMC9949722 DOI: 10.3389/frai.2023.1098308] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 11/14/2022] [Accepted: 01/23/2023] [Indexed: 02/11/2023] Open
Abstract
Biological systems function through complex interactions between various 'omics (biomolecules), and a more complete understanding of these systems is only possible through an integrated, multi-omic perspective. This has presented the need for the development of integration approaches that are able to capture the complex, often non-linear, interactions that define these biological systems and are adapted to the challenges of combining the heterogeneous data across 'omic views. A principal challenge to multi-omic integration is missing data, because not all biomolecules are measured in all samples. Due to cost, instrument sensitivity, or other experimental factors, data for a biological sample may be missing for one or more 'omic technologies. Recent methodological developments in artificial intelligence and statistical learning have greatly facilitated the analyses of multi-omics data; however, many of these techniques assume access to completely observed data. A subset of these methods incorporate mechanisms for handling partially observed samples, and these methods are the focus of this review. We describe recently developed approaches, noting their primary use cases and highlighting each method's approach to handling missing data. We additionally provide an overview of the more traditional missing data workflows and their limitations, and we discuss potential avenues for further developments as well as how the missing data issue and its current solutions may generalize beyond the multi-omics context.
Affiliation(s)
- Javier E. Flores
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
| | - Daniel M. Claborne
- Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
| | - Zachary D. Weller
- Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
| | - Bobbie-Jo M. Webb-Robertson
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
| | - Katrina M. Waters
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
| | - Lisa M. Bramer
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Correspondence: Lisa M. Bramer
| |
|
15
|
Katta MR, Kalluru PKR, Bavishi DA, Hameed M, Valisekka SS. Artificial intelligence in pancreatic cancer: diagnosis, limitations, and the future prospects-a narrative review. J Cancer Res Clin Oncol 2023:10.1007/s00432-023-04625-1. [PMID: 36739356 DOI: 10.1007/s00432-023-04625-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Accepted: 01/27/2023] [Indexed: 02/06/2023]
Abstract
PURPOSE This review aims to explore the role of AI in the application of pancreatic cancer management and make recommendations to minimize the impact of the limitations to provide further benefits from AI use in the future. METHODS A comprehensive review of the literature was conducted using a combination of MeSH keywords, including "Artificial intelligence", "Pancreatic cancer", "Diagnosis", and "Limitations". RESULTS The beneficial implications of AI in the detection of biomarkers, diagnosis, and prognosis of pancreatic cancer have been explored. In addition, current drawbacks of AI use have been divided into subcategories encompassing statistical, training, and knowledge limitations; data handling, ethical and medicolegal aspects; and clinical integration and implementation. CONCLUSION Artificial intelligence (AI) refers to computational machine systems that accomplish a set of given tasks by imitating human intelligence in an exponential learning pattern. AI in gastrointestinal oncology has continued to provide significant advancements in the clinical, molecular, and radiological diagnosis and intervention techniques required to improve the prognosis of many gastrointestinal cancer types, particularly pancreatic cancer.
Affiliation(s)
- Maha Hameed
- Clinical Research Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia.
| |
|
16
|
Lorefice L, Pitzalis M, Murgia F, Fenu G, Atzori L, Cocco E. Omics approaches to understanding the efficacy and safety of disease-modifying treatments in multiple sclerosis. Front Genet 2023; 14:1076421. [PMID: 36793897 PMCID: PMC9922720 DOI: 10.3389/fgene.2023.1076421] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/21/2022] [Accepted: 01/09/2023] [Indexed: 02/03/2023] Open
Abstract
From the perspective of precision medicine, the challenge for the future is to improve the accuracy of diagnosis, prognosis, and prediction of therapeutic responses through the identification of biomarkers. In this framework, the omics sciences (genomics, transcriptomics, proteomics, and metabolomics) and their combined use represent innovative approaches for the exploration of the complexity and heterogeneity of multiple sclerosis (MS). This review examines the evidence currently available on the application of omics sciences to MS, analyses the methods, their limitations, the samples used, and their characteristics, with a particular focus on biomarkers associated with the disease state, exposure to disease-modifying treatments (DMTs), and drug efficacies and safety profiles.
Affiliation(s)
- Lorena Lorefice
- Multiple Sclerosis Center, Binaghi Hospital, ASL Cagliari, Department of Medical Sciences and Public Health, University of Cagliari, Cagliari, Italy
- Correspondence: Lorena Lorefice
| | - Maristella Pitzalis
- Institute for Genetic and Biomedical Research, National Research Council, Cagliari, Italy
| | - Federica Murgia
- Dpt of Biomedical Sciences, University of Cagliari, Cagliari, Italy
| | - Giuseppe Fenu
- Department of Neurosciences, ARNAS Brotzu, Cagliari, Italy
| | - Luigi Atzori
- Multiple Sclerosis Center, Binaghi Hospital, ASL Cagliari, Department of Medical Sciences and Public Health, University of Cagliari, Cagliari, Italy
| | - Eleonora Cocco
- Multiple Sclerosis Center, Binaghi Hospital, ASL Cagliari, Department of Medical Sciences and Public Health, University of Cagliari, Cagliari, Italy
| |
|
17
|
Devarajan AK, Truu M, Gopalasubramaniam SK, Muthukrishanan G, Truu J. Application of data integration for rice bacterial strain selection by combining their osmotic stress response and plant growth-promoting traits. Front Microbiol 2022; 13:1058772. [PMID: 36590400 PMCID: PMC9797599 DOI: 10.3389/fmicb.2022.1058772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 11/29/2022] [Indexed: 12/23/2022] Open
Abstract
Agricultural application of plant-beneficial bacteria to improve crop yield and alleviate the stress caused by environmental conditions, pests, and pathogens is gaining popularity. However, before using these bacterial strains in plant experiments, their environmental stress responses and plant health improvement potential should be examined. In this study, we explored the applicability of three unsupervised machine learning-based data integration methods, including principal component analysis (PCA) of concatenated data, multiple co-inertia analysis (MCIA), and multiple kernel learning (MKL), to select osmotic stress-tolerant plant growth-promoting (PGP) bacterial strains isolated from the rice phyllosphere. The studied datasets consisted of direct and indirect PGP activity measurements and osmotic stress responses of eight bacterial strains previously isolated from the phyllosphere of drought-tolerant rice cultivar. The production of phytohormones, such as indole-acetic acid (IAA), gibberellic acid (GA), abscisic acid (ABA), and cytokinin, were used as direct PGP traits, whereas the production of hydrogen cyanide and siderophore and antagonistic activity against the foliar pathogens Pyricularia oryzae and Helminthosporium oryzae were evaluated as measures of indirect PGP activity. The strains were subjected to a range of osmotic stress levels by adding PEG 6000 (0, 11, 21, and 32.6%) to their growth medium. The results of the osmotic stress response experiments showed that all bacterial strains accumulated endogenous proline and glycine betaine (GB) and exhibited an increase in growth, when osmotic stress levels were increased to a specific degree, while the production of IAA and GA considerably decreased. The three applied data integration methods did not provide a similar grouping of the strains. Especially deviant was the ordination of microbial strains based on the PCA of concatenated data. 
However, all three data integration methods indicated that the strains Bacillus altitudinis PB46 and B. megaterium PB50 shared high similarity in PGP traits and osmotic stress response. Overall, our results indicate that data integration methods complement the single-table data analysis approach and improve the selection process for PGP microbial strains.
Affiliation(s)
- Arun Kumar Devarajan
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Correspondence: Arun Kumar Devarajan
| | - Marika Truu
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
| | - Sabarinathan Kuttalingam Gopalasubramaniam
- Department of Plant Pathology, Agricultural College and Research Institute, Tamil Nadu Agricultural University, Killikulam, Tuticorin, India
| | - Gomathy Muthukrishanan
- Department of Soil Science and Agricultural Chemistry, Agricultural College and Research Institute, Tamil Nadu Agricultural University, Killikulam, Tuticorin, India
| | - Jaak Truu
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
| |
|
18
|
Taguchi YH, Turki T. A tensor decomposition-based integrated analysis applicable to multiple gene expression profiles without sample matching. Sci Rep 2022; 12:21242. [PMID: 36481877 PMCID: PMC9732005 DOI: 10.1038/s41598-022-25524-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2021] [Accepted: 11/30/2022] [Indexed: 12/13/2022] Open
Abstract
The integrated analysis of multiple gene expression profiles previously measured in distinct studies is problematic since missing both sample matches and common labels prevent their integration in fully data-driven, unsupervised training. In this study, we propose a strategy to enable the integration of multiple gene expression profiles among multiple independent studies with neither labeling nor sample matching using tensor decomposition unsupervised feature extraction. We apply this strategy to Alzheimer's disease (AD)-related gene expression profiles that lack precise correspondence among samples, including AD single-cell RNA sequence (scRNA-seq) data. We were able to select biologically reasonable genes using the integrated analysis. Overall, integrated gene expression profiles can function analogously to prior- and/or transfer-learning strategies in other machine-learning applications. For scRNA-seq, the proposed approach significantly reduces the required computational memory.
Affiliation(s)
- Y-h. Taguchi
- Department of Physics, Chuo University, Tokyo, 112-8551 Japan
| | - Turki Turki
- Department of Computer Science, King Abdulaziz University, Jeddah, 21589 Saudi Arabia
| |
|
19
|
Zhang Y, Deng Y, Zhou Z, Zhang X, Jiao P, Zhao Z. Multimodal learning for fetal distress diagnosis using a multimodal medical information fusion framework. Front Physiol 2022; 13:1021400. [DOI: 10.3389/fphys.2022.1021400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Accepted: 10/25/2022] [Indexed: 11/09/2022] Open
Abstract
Cardiotocography (CTG) monitoring is an important medical diagnostic tool for fetal well-being evaluation in late pregnancy. In this regard, intelligent CTG classification based on Fetal Heart Rate (FHR) signals is a challenging research area that can assist obstetricians in making clinical decisions, thereby improving the efficiency and accuracy of pregnancy management. Most existing methods focus on one specific modality, that is, they only detect one type of modality and inevitably have limitations such as incomplete or redundant source domain feature extraction, and poor repeatability. This study focuses on modeling multimodal learning for Fetal Distress Diagnosis (FDD), for which three major challenges exist: unaligned multimodalities; failure to learn and fuse the causality and inclusion between multimodal biomedical data; and modality sensitivity, that is, difficulty in implementing a task in the absence of modalities. To address these three issues, we propose a Multimodal Medical Information Fusion framework named MMIF, where the Category Constrained-Parallel ViT model (CCPViT) was first proposed to explore multimodal learning tasks and address the misalignment between multimodalities. Based on CCPViT, a cross-attention-based image-text joint component is introduced to establish a Multimodal Representation Alignment Network model (MRAN), explore the deep-level interactive representation between cross-modal data, and assist multimodal learning. Furthermore, we designed a simple-structured FDD test model based on the highly modality-aligned MMIF, realizing task delegation from multimodal model training (image and text) to unimodal pathological diagnosis (image). Extensive experiments, including model parameter sensitivity analysis, cross-modal alignment assessment, and pathological diagnostic accuracy evaluation, were conducted to show our models' superior performance and effectiveness.
|
20
|
Zhanpeng H, Jiekang W. A Multiview Clustering Method With Low-Rank and Sparsity Constraints for Cancer Subtyping. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3213-3223. [PMID: 34705654 DOI: 10.1109/tcbb.2021.3122917] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/13/2023]
Abstract
Multiomics data clustering is one of the major challenges in the field of precision medicine. Integration of multiomics data for cancer subtyping can improve the understanding of cancer and reveal systems-level insights. How to integrate multiomics data for accurate cancer subtyping is an interesting and challenging research problem. To capture the global and the local structure of omics data, a novel framework for integrating multiomics data is proposed for cancer subtyping. Multiview clustering with low-rank and sparsity constraints (MVCLRS) can measure the local similarities of samples in each omics data type and obtain global consensus structures by integrating the multiomics data. The main insight provided by MVCLRS is that low-rank sparse subspace clustering for the construction of an affinity matrix can best capture the local similarities in omics data. Extensive testing is conducted on 10 real-world cancer datasets with multiomics from The Cancer Genome Atlas. Compared with 10 state-of-the-art multiomics clustering algorithms, MVCLRS performs better on the 10 cancer datasets, providing clustering results with at least one enriched clinical label in nine of ten cancer subtypes, the most of any method.
|
21
|
IoMT-Based Mitochondrial and Multifactorial Genetic Inheritance Disorder Prediction Using Machine Learning. Computational Intelligence and Neuroscience 2022; 2022:2650742. [PMID: 35909844 PMCID: PMC9334098 DOI: 10.1155/2022/2650742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2022] [Accepted: 07/04/2022] [Indexed: 11/18/2022]
Abstract
A genetic disorder is a serious disease that affects a large number of individuals around the world. There are various types of genetic illnesses; here, we focus on predicting mitochondrial and multifactorial genetic disorders. Genetic illness is caused by a number of factors, including a defective maternal or paternal gene, excessive abortions, a lack of blood cells, and low white blood cell count. For premature or teenage life development, early detection of genetic diseases is crucial. Although it is difficult to forecast genetic disorders ahead of time, this prediction is critical since a person's life progress depends on it. Machine learning algorithms are used to diagnose genetic disorders with high accuracy utilizing datasets collected and constructed from a large number of patient medical reports. Many recent studies have employed genome sequencing for illness detection, but fewer have used patient medical history, and the accuracy of existing studies that use a patient's history is limited. The internet of medical things (IoMT)-based model proposed in this article for genetic disease prediction uses two separate machine learning algorithms: support vector machine (SVM) and K-Nearest Neighbor (KNN). Experimental results show that SVM outperformed KNN and existing prediction methods in terms of accuracy. SVM achieved an accuracy of 94.99% and 86.6% for training and testing, respectively.
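Of the two classifiers compared in this abstract, K-Nearest Neighbor is simple enough to sketch in a few lines. The block below is a generic textbook version, not the article's model: the Euclidean distance, `k=3`, the toy coordinates, and the labels `"a"`/`"b"` are all assumptions, and no patient data or reported accuracy is reproduced.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points
    (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, X, y, k=3):
    """Fraction of points in X whose predicted label matches y."""
    hits = sum(knn_predict(train_X, train_y, x, k) == t for x, t in zip(X, y))
    return hits / len(y)


# Made-up two-cluster data, only to exercise the functions.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
test_acc = accuracy(train_X, train_y, [(1, 1), (6, 6)], ["a", "b"])
```

Evaluating `accuracy` once on the training set and once on held-out points gives the kind of training/testing pair of figures that the abstract reports.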
|
22
|
Feng H, Xiang Y, Wang X, Xue W, Yue Z. MTAGCN: predicting miRNA-target associations in Camellia sinensis var. assamica through graph convolution neural network. BMC Bioinformatics 2022; 23:271. [PMID: 35820798 PMCID: PMC9275082 DOI: 10.1186/s12859-022-04819-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Accepted: 07/01/2022] [Indexed: 11/10/2022] Open
Abstract
Background MicroRNAs (miRNAs) play a central role in diverse biological processes of Camellia sinensis var. assamica (CSA), including growth, development and stress response, through their associations with target mRNAs. However, although experimental methods for identifying CSA miRNA targets are costly and time-consuming, few computational methods have been developed to tackle the CSA miRNA-target association prediction problem. Results In this paper, we constructed a heterogeneous network for CSA miRNAs and targets by integrating rich biological information, including a miRNA similarity network, a target similarity network, and a miRNA-target association network. We then proposed a deep learning framework of graph convolution networks with a layer attention mechanism, named MTAGCN. In particular, MTAGCN uses the attention mechanism to combine embeddings of multiple graph convolution layers, employing the integrated embedding to score the unobserved CSA miRNA-target associations. Discussion Comprehensive experimental results on two tasks (balanced task and unbalanced task) demonstrated that our proposed model achieved better performance than classic machine learning and existing graph convolution network-based methods. The analysis of these results could offer valuable information for understanding complex CSA miRNA-target association mechanisms and would contribute to precision plant breeding. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04819-3.
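The layer-attention step mentioned in this abstract (weighting the embeddings from several graph convolution layers and using the combined embedding to score a candidate pair) might look roughly like the sketch below. It is a hedged illustration only: the softmax weighting, the dot-product score, and the names `combine_layers` and `association_score` are assumptions, not MTAGCN's published formulation.

```python
from math import exp


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_layers(layer_embeddings, layer_scores):
    """Attention-weighted sum of one node's embeddings across layers.
    layer_scores stand in for learned attention logits (assumed given)."""
    weights = softmax(layer_scores)
    dim = len(layer_embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(weights, layer_embeddings))
        for d in range(dim)
    ]


def association_score(mirna_vec, target_vec):
    """Score a miRNA-target pair by the dot product of their combined
    embeddings (an assumed scoring function)."""
    return sum(a * b for a, b in zip(mirna_vec, target_vec))


# Two layers with equal attention logits give a plain average.
mirna = combine_layers([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
target = combine_layers([[2.0, 2.0], [2.0, 2.0]], [0.0, 0.0])
score = association_score(mirna, target)
```

In a trained model the attention logits would be learned parameters, so that layers carrying more useful neighbourhood information receive larger weights.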
Affiliation(s)
- Haisong Feng
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Ying Xiang
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Xiaosong Wang
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Wei Xue
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Zhenyu Yue
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China.
| |
|
23
|
Combining Molecular, Imaging, and Clinical Data Analysis for Predicting Cancer Prognosis. Cancers (Basel) 2022; 14:cancers14133215. [PMID: 35804988 PMCID: PMC9265023 DOI: 10.3390/cancers14133215] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/26/2022] [Revised: 06/24/2022] [Accepted: 06/27/2022] [Indexed: 02/04/2023] Open
Abstract
Simple Summary The rise of Big Data, the widespread use of Machine Learning, and the cheapening of omics techniques have allowed for the creation of more sophisticated and accurate models in biomedical research. This article presents the state-of-the-art predictive models of cancer prognosis that use multimodal data, considering clinical, molecular (omics and non-omics), and image data. The subject of study, the data modalities used, the data processing and modelling methods applied, the validation strategies involved, the integration strategies encompassed, and the evolution of prognostic predictive models are discussed. Finally, we discuss challenges and opportunities in this field of cancer research, with great potential impact on the clinical management of patients and, by extension, on the implementation of personalised and precision medicine. Abstract Cancer is one of the most detrimental diseases globally. Accordingly, the prognosis prediction of cancer patients has become a field of interest. In this review, we have gathered 43 state-of-the-art scientific papers published in the last 6 years that built cancer prognosis predictive models using multimodal data. We have defined the multimodality of data as four main types: clinical, anatomopathological, molecular, and medical imaging; and we have expanded on the information that each modality provides. The 43 studies were divided into three categories based on the modelling approach taken, and their characteristics were further discussed together with current issues and future trends. Research in this area has evolved from survival analysis through statistical modelling using mainly clinical and anatomopathological data to the prediction of cancer prognosis through a multi-faceted data-driven approach by the integration of complex, multimodal, and high-dimensional data containing multi-omics and medical imaging information and by applying Machine Learning and, more recently, Deep Learning techniques. 
This review concludes that cancer prognosis predictive multimodal models are capable of better stratifying patients, which can improve clinical management and contribute to the implementation of personalised medicine as well as provide new and valuable knowledge on cancer biology and its progression.
|
24
|
Caligola S, De Sanctis F, Canè S, Ugel S. Breaking the Immune Complexity of the Tumor Microenvironment Using Single-Cell Technologies. Front Genet 2022; 13:867880. [PMID: 35651929 PMCID: PMC9149246 DOI: 10.3389/fgene.2022.867880] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/01/2022] [Accepted: 04/27/2022] [Indexed: 12/31/2022] Open
Abstract
Tumors are not a simple aggregate of transformed cells but rather a complicated ecosystem containing various components, including infiltrating immune cells, tumor-related stromal cells, endothelial cells, soluble factors, and extracellular matrix proteins. Profiling the immune contexture of this intricate framework is now mandatory to develop more effective cancer therapies and precise immunotherapeutic approaches by identifying exact targets or predictive biomarkers, respectively. Conventional technologies are limited in reaching this goal because they lack high resolution. Recent developments in single-cell technologies, such as single-cell RNA transcriptomics, mass cytometry, and multiparameter immunofluorescence, have revolutionized the cancer immunology field, capturing the heterogeneity of tumor-infiltrating immune cells and the dynamic complexity of tenets that regulate cell networks in the tumor microenvironment. In this review, we describe some of the current single-cell technologies and computational techniques applied for immune-profiling the cancer landscape and discuss future directions of how integrating multi-omics data can guide a new "precision oncology" advancement.
Affiliation(s)
- Simone Caligola
- Immunology Section, Department of Medicine, University of Verona, Verona, Italy
| | | | - Stefania Canè
- Immunology Section, Department of Medicine, University of Verona, Verona, Italy
| | - Stefano Ugel
- Immunology Section, Department of Medicine, University of Verona, Verona, Italy
| |
|
25
|
van Loon W, de Vos F, Fokkema M, Szabo B, Koini M, Schmidt R, de Rooij M. Analyzing Hierarchical Multi-View MRI Data With StaPLR: An Application to Alzheimer's Disease Classification. Front Neurosci 2022; 16:830630. [PMID: 35546881 PMCID: PMC9082949 DOI: 10.3389/fnins.2022.830630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Accepted: 03/23/2022] [Indexed: 11/16/2022] Open
Abstract
Multi-view data refers to a setting where features are divided into feature sets, for example because they correspond to different sources. Stacked penalized logistic regression (StaPLR) is a recently introduced method that can be used for classification and automatically selecting the views that are most important for prediction. We introduce an extension of this method to a setting where the data has a hierarchical multi-view structure. We also introduce a new view importance measure for StaPLR, which allows us to compare the importance of views at any level of the hierarchy. We apply our extended StaPLR algorithm to Alzheimer's disease classification where different MRI measures have been calculated from three scan types: structural MRI, diffusion-weighted MRI, and resting-state fMRI. StaPLR can identify which scan types and which derived MRI measures are most important for classification, and it outperforms elastic net regression in classification performance.
Affiliation(s)
- Wouter van Loon
- Department of Methodology and Statistics, Leiden University, Leiden, Netherlands
| | - Frank de Vos
- Department of Methodology and Statistics, Leiden University, Leiden, Netherlands.,Department of Radiology, Leiden University Medical Center, Leiden, Netherlands.,Leiden Institute for Brain and Cognition, Leiden, Netherlands
| | - Marjolein Fokkema
- Department of Methodology and Statistics, Leiden University, Leiden, Netherlands
| | - Botond Szabo
- Department of Decision Sciences, Bocconi University, Milan, Italy.,Bocconi Institute for Data Science and Analytics, Bocconi University, Milan, Italy
| | - Marisa Koini
- Division of Neurogeriatrics, Department of Neurology, Medical University of Graz, Graz, Austria
| | - Reinhold Schmidt
- Division of Neurogeriatrics, Department of Neurology, Medical University of Graz, Graz, Austria
| | - Mark de Rooij
- Department of Methodology and Statistics, Leiden University, Leiden, Netherlands.,Leiden Institute for Brain and Cognition, Leiden, Netherlands
| |
|
26
|
Watson ER, Taherian Fard A, Mar JC. Computational Methods for Single-Cell Imaging and Omics Data Integration. Front Mol Biosci 2022; 8:768106. [PMID: 35111809 PMCID: PMC8801747 DOI: 10.3389/fmolb.2021.768106] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/14/2021] [Accepted: 11/29/2021] [Indexed: 12/12/2022] Open
Abstract
Integrating single cell omics and single cell imaging allows for a more effective characterisation of the underlying mechanisms that drive a phenotype at the tissue level, creating a comprehensive profile at the cellular level. Although the use of imaging data is well established in biomedical research, its primary application has been to observe phenotypes at the tissue or organ level, often using medical imaging techniques such as MRI, CT, and PET. These imaging technologies complement omics-based data in biomedical research because they are helpful for identifying associations between genotype and phenotype, along with functional changes occurring at the tissue level. Single cell imaging can act as an intermediary between these levels. Meanwhile new technologies continue to arrive that can be used to interrogate the genome of single cells and its related omics datasets. As these two areas, single cell imaging and single cell omics, each advance independently with the development of novel techniques, the opportunity to integrate these data types becomes more and more attractive. This review outlines some of the technologies and methods currently available for generating, processing, and analysing single-cell omics- and imaging data, and how they could be integrated to further our understanding of complex biological phenomena like ageing. We include an emphasis on machine learning algorithms because of their ability to identify complex patterns in large multidimensional data.
Affiliation(s)
| | - Atefeh Taherian Fard
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
| | - Jessica Cara Mar
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
| |
|
27
|
Viaud G, Mayilvahanan P, Cournede PH. Representation Learning for the Clustering of Multi-Omics Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:135-145. [PMID: 33600320 DOI: 10.1109/tcbb.2021.3060340] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/12/2023]
Abstract
The integration of several sources of data for the identification of subtypes of diseases has gained attention over the past few years. The heterogeneity and the high dimensions of the data sets call for an adequate representation of the data. We summarize the field of representation learning for the multi-omics clustering problem and we investigate several techniques to learn relevant combined representations, using methods from group factor analysis (PCA, MFA and extensions) and from machine learning with autoencoders. We highlight the importance of appropriately designing and training the latter, notably with a novel combination of a disjointed deep autoencoder (DDAE) architecture and a layer-wise reconstruction loss. These different representations can then be clustered to identify biologically meaningful clusters of patients. We provide a unifying framework for model comparison between statistical and deep learning approaches with the introduction of a new weighted internal clustering index that evaluates how well the clustering information is retained from each source, favoring contributions from all data sets. We apply our methodology to two case studies for which previous works of integrative clustering exist, TCGA Breast Cancer and TARGET Neuroblastoma, and show how our method can yield good and well-balanced clusters across the different data sources.
|
28
|
Huminiecki Ł. Virtual Gene Concept and a Corresponding Pragmatic Research Program in Genetical Data Science. ENTROPY (BASEL, SWITZERLAND) 2021; 24:17. [PMID: 35052043 PMCID: PMC8774939 DOI: 10.3390/e24010017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 12/02/2021] [Accepted: 12/14/2021] [Indexed: 06/14/2023]
Abstract
Mendel proposed an experimentally verifiable paradigm of particle-based heredity that has been influential for over 150 years. The historical arguments have been reflected in the near past as Mendel's concept has been diversified by new types of omics data. As an effect of the accumulation of omics data, a virtual gene concept forms, giving rise to genetical data science. The concept integrates genetical, functional, and molecular features of the Mendelian paradigm. I argue that the virtual gene concept should be deployed pragmatically. Indeed, the concept has already inspired a practical research program related to systems genetics. The program includes questions about functionality of structural and categorical gene variants, about regulation of gene expression, and about roles of epigenetic modifications. The methodology of the program includes bioinformatics, machine learning, and deep learning. Education, funding, careers, standards, benchmarks, and tools to monitor research progress should be provided to support the research program.
Affiliation(s)
- Łukasz Huminiecki
- Evolutionary, Computational, and Statistical Genetics, Department of Molecular Biology, Institute of Genetics and Animal Biotechnology, Polish Academy of Sciences, Postępu 36A, Jastrzębiec, 05-552 Warsaw, Poland
| |
|
29
|
Vijayakumar S, Angione C. Protocol for hybrid flux balance, statistical, and machine learning analysis of multi-omic data from the cyanobacterium Synechococcus sp. PCC 7002. STAR Protoc 2021; 2:100837. [PMID: 34632416 PMCID: PMC8488602 DOI: 10.1016/j.xpro.2021.100837] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/15/2022] Open
Abstract
Combining a computational framework for flux balance analysis with machine learning improves the accuracy of predicting metabolic activity across conditions, while enabling mechanistic interpretation. This protocol presents a guide to condition-specific metabolic modeling that integrates regularized flux balance analysis with machine learning approaches to extract key features from transcriptomic and fluxomic data. We demonstrate the protocol as applied to Synechococcus sp. PCC 7002; we also outline how it can be adapted to any species or community with available multi-omic data. For complete details on the use and execution of this protocol, please refer to Vijayakumar et al. (2020).
Affiliation(s)
- Supreeta Vijayakumar
- School of Computing, Engineering & Digital Technologies, Teesside University, Middlesbrough, North Yorkshire TS1 3BX, UK
| | - Claudio Angione
- School of Computing, Engineering & Digital Technologies, Teesside University, Middlesbrough, North Yorkshire TS1 3BX, UK
- Centre for Digital Innovation, Teesside University, Middlesbrough TS1 3BX, UK
- Healthcare Innovation Centre, Teesside University, Middlesbrough TS1 3BX, UK
| |
|
30
|
Mourragui SMC, Loog M, Vis DJ, Moore K, Manjon AG, van de Wiel MA, Reinders MJT, Wessels LFA. Predicting patient response with models trained on cell lines and patient-derived xenografts by nonlinear transfer learning. Proc Natl Acad Sci U S A 2021; 118:e2106682118. [PMID: 34873056 PMCID: PMC8670522 DOI: 10.1073/pnas.2106682118] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Accepted: 10/18/2021] [Indexed: 12/13/2022] Open
Abstract
Preclinical models have been the workhorse of cancer research, producing massive amounts of drug response data. Unfortunately, translating response biomarkers derived from these datasets to human tumors has proven to be particularly challenging. To address this challenge, we developed TRANSACT, a computational framework that builds a consensus space to capture biological processes common to preclinical models and human tumors and exploits this space to construct drug response predictors that robustly transfer from preclinical models to human tumors. TRANSACT performs favorably compared to four competing approaches, including two deep learning approaches, on a set of 23 drug prediction challenges on The Cancer Genome Atlas and 226 metastatic tumors from the Hartwig Medical Foundation. We demonstrate that response predictions deliver a robust performance for a number of therapies of high clinical importance: platinum-based chemotherapies, gemcitabine, and paclitaxel. In contrast to other approaches, we demonstrate the interpretability of the TRANSACT predictors by correctly identifying known biomarkers of targeted therapies, and we propose potential mechanisms that mediate the resistance to two chemotherapeutic agents.
Affiliation(s)
- Soufiane M C Mourragui
- Division of Molecular Carcinogenesis, Oncode Institute, The Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
- Department of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, 2628 XE Delft, The Netherlands
| | - Marco Loog
- Department of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, 2628 XE Delft, The Netherlands
- Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark
| | - Daniel J Vis
- Division of Molecular Carcinogenesis, Oncode Institute, The Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
| | - Kat Moore
- Division of Molecular Carcinogenesis, Oncode Institute, The Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
| | - Anna G Manjon
- Division of Cell Biology, Oncode Institute, The Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
| | - Mark A van de Wiel
- Epidemiology and Biostatistics, Amsterdam University Medical Center, 1105 AZ Amsterdam, The Netherlands
- Medical Research Council Biostatistics Unit, Cambridge University, Cambridge CB2 0SR, United Kingdom
| | - Marcel J T Reinders
- Department of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, 2628 XE Delft, The Netherlands;
- Leiden Computational Biology Center, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
| | - Lodewyk F A Wessels
- Division of Molecular Carcinogenesis, Oncode Institute, The Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands;
- Department of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, 2628 XE Delft, The Netherlands
| |
|
31
|
Ferré Q, Chèneby J, Puthier D, Capponi C, Ballester B. Anomaly detection in genomic catalogues using unsupervised multi-view autoencoders. BMC Bioinformatics 2021; 22:460. [PMID: 34563116 PMCID: PMC8467021 DOI: 10.1186/s12859-021-04359-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2020] [Revised: 06/04/2021] [Accepted: 08/09/2021] [Indexed: 11/13/2022] Open
Abstract
Background Accurate identification of Transcriptional Regulator binding locations is essential for analysis of genomic regions, including Cis Regulatory Elements. The customary NGS approaches, predominantly ChIP-Seq, can be obscured by data anomalies and biases which are difficult to detect without supervision. Results Here, we develop a method to leverage the usual combinations between many experimental series to mark such atypical peaks. We use deep learning to perform a lossy compression of the genomic regions’ representations with multiview convolutions. Using artificial data, we show that our method correctly identifies groups of correlating series and evaluates CRE according to group completeness. It is then applied to the ReMap database’s large volume of curated ChIP-seq data. We show that peaks lacking known biological correlators are singled out and less confirmed in real data. We propose normalization approaches useful in interpreting black-box models. Conclusion Our approach detects peaks that are less corroborated than average. It can be extended to other similar problems, and can be interpreted to identify correlation groups. It is implemented in an open-source tool called atyPeak. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04359-2.
Affiliation(s)
- Quentin Ferré
- INSERM, TAGC, Aix Marseille University, Marseille, France.,Université de Toulon, CNRS, LIS, Aix Marseille University, Marseille, France
| | - Jeanne Chèneby
- INSERM, TAGC, Aix Marseille University, Marseille, France
| | - Denis Puthier
- INSERM, TAGC, Aix Marseille University, Marseille, France
| | - Cécile Capponi
- Université de Toulon, CNRS, LIS, Aix Marseille University, Marseille, France.
| | | |
|
32
|
Liu J, Ge S, Cheng Y, Wang X. Multi-View Spectral Clustering Based on Multi-Smooth Representation Fusion for Cancer Subtype Prediction. Front Genet 2021; 12:718915. [PMID: 34552619 PMCID: PMC8450448 DOI: 10.3389/fgene.2021.718915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2021] [Accepted: 08/05/2021] [Indexed: 11/24/2022] Open
Abstract
It is a vital task to design an integrated machine learning model to discover cancer subtypes and understand the heterogeneity of cancer based on multiple omics data. In recent years, some multi-view clustering algorithms have been proposed and applied to the prediction of cancer subtypes. Among them, the multi-view clustering methods based on graph learning have attracted wide attention. These multi-view approaches usually have one or more of the following problems. Many multi-view algorithms use the original omics data matrix to construct the similarity matrix and ignore the learning of the similarity matrix. They separate the data clustering process from the graph learning process, which makes clustering performance highly dependent on the predefined graph. In the process of graph fusion, these methods simply take the average value of the affinity graphs of multiple views as the fusion graph, so the rich heterogeneous information is not fully utilized. To solve the above problems, this paper proposes a Multi-view Spectral Clustering Based on Multi-smooth Representation Fusion (MRF-MSC) method. Firstly, MRF-MSC constructs a smooth representation for each data type, which can be viewed as a sample (patient) similarity matrix. The smooth representation can explicitly enhance the grouping effect. Secondly, MRF-MSC integrates the smooth representation of multiple omics data to form a similarity matrix containing all biological data information through graph fusion. In addition, MRF-MSC adaptively gives weight factors to the smooth regularization representation of each omics data by using the self-weighting method. Finally, MRF-MSC imposes a constrained Laplacian rank on the fusion similarity matrix to get a better cluster structure. The above problems can be transformed into spectral clustering for solving, and the clustering results can be obtained.
MRF-MSC unifies the above process of graph construction, graph fusion and spectral clustering under one framework, which can learn better data representation and high-quality graphs, so as to achieve better clustering effect. In the experiment, MRF-MSC obtained good experimental results on the TCGA cancer data sets.
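The contrast drawn in this abstract between plain averaging of per-view affinity graphs and a weighted fusion can be sketched as follows. This is a hedged illustration only: the weights here are supplied by hand rather than learned by the self-weighting scheme, the constrained Laplacian rank step is not implemented, and the function names are hypothetical.

```python
def fuse_graphs(similarities, weights=None):
    """Fuse per-view similarity matrices into one matrix.

    With weights=None this is the plain average the abstract criticises;
    otherwise each view contributes in proportion to its (normalised) weight.
    """
    n_views = len(similarities)
    if weights is None:
        weights = [1.0] * n_views
    total = sum(weights)
    weights = [w / total for w in weights]
    n = len(similarities[0])
    return [
        [sum(w * s[i][j] for w, s in zip(weights, similarities)) for j in range(n)]
        for i in range(n)
    ]

def laplacian(similarity):
    """Unnormalised graph Laplacian L = D - S, the input to spectral clustering."""
    n = len(similarity)
    degrees = [sum(row) for row in similarity]
    return [
        [(degrees[i] if i == j else 0.0) - similarity[i][j] for j in range(n)]
        for i in range(n)
    ]

# Two toy views over two patients; view_a is trusted three times as much as view_b.
view_a = [[0.0, 1.0], [1.0, 0.0]]
view_b = [[0.0, 0.5], [0.5, 0.0]]
fused = fuse_graphs([view_a, view_b], weights=[3.0, 1.0])
print(fused)
print(laplacian(fused))
```

Spectral clustering would then take the eigenvectors of this Laplacian belonging to its smallest eigenvalues and cluster the patients in that space.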
Affiliation(s)
- Jian Liu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, China
| | - Shuguang Ge
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, China
| | - Yuhu Cheng
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, China
| | - Xuesong Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, China
| |
|
33
|
Krantz M, Zimmer D, Adler SO, Kitashova A, Klipp E, Mühlhaus T, Nägele T. Data Management and Modeling in Plant Biology. FRONTIERS IN PLANT SCIENCE 2021; 12:717958. [PMID: 34539712 PMCID: PMC8446634 DOI: 10.3389/fpls.2021.717958] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/31/2021] [Accepted: 07/29/2021] [Indexed: 05/25/2023]
Abstract
The study of plant-environment interactions is a multidisciplinary research field. With the emergence of quantitative large-scale and high-throughput techniques, the amount and dimensionality of experimental data have strongly increased. Appropriate strategies for data storage, management, and evaluation are needed to make efficient use of experimental findings. Computational approaches of data mining are essential for deriving statistical trends and signatures contained in data matrices. Although current biology is challenged by high data dimensionality in general, this is particularly true for plant biology. Plants as sessile organisms have to cope with environmental fluctuations. This typically results in strong dynamics of metabolite and protein concentrations which are often challenging to quantify. Summarizing experimental output results in complex data arrays, which need computational statistics and numerical methods for building quantitative models. Experimental findings need to be combined by computational models to gain a mechanistic understanding of plant metabolism. For this, bioinformatics and mathematics need to be combined with experimental setups in physiology, biochemistry, and molecular biology. This review presents and discusses concepts at the interface of experiment and computation, which are likely to shape current and future plant biology. Finally, this interface is discussed with regard to its capabilities and limitations to develop a quantitative model of plant-environment interactions.
Affiliation(s)
- Maria Krantz
- Theoretical Biophysics, Institute of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
| | - David Zimmer
- Computational Systems Biology, Technische Universität Kaiserslautern, Kaiserslautern, Germany
| | - Stephan O. Adler
- Theoretical Biophysics, Institute of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Anastasia Kitashova
- Plant Evolutionary Cell Biology, Faculty of Biology, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
| | - Edda Klipp
- Theoretical Biophysics, Institute of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Timo Mühlhaus
- Computational Systems Biology, Technische Universität Kaiserslautern, Kaiserslautern, Germany
| | - Thomas Nägele
- Plant Evolutionary Cell Biology, Faculty of Biology, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
| |
|
34
|
Zeng P, Tang X, Wu T, Tian Q, Li M, Ding J. [Identification of potential regulatory genes for embryonic stem cell self-renewal and pluripotency by random forest]. NAN FANG YI KE DA XUE XUE BAO = JOURNAL OF SOUTHERN MEDICAL UNIVERSITY 2021; 41:1234-1238. [PMID: 34549716 DOI: 10.12122/j.issn.1673-4254.2021.08.16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
OBJECTIVE To identify novel genes associated with self-renewal and pluripotency of mouse embryonic stem cells (mESCs) by integrating multiomics data based on machine learning methods. METHODS We integrated multiomics information of mESCs involving transcriptome, histone modifications, chromatin accessibility, transcription factor binding and architectural protein binding, and compared the signal differences between known stem cell self-renewal and pluripotency genes and other genes. By integrating these multiomics data, we established prediction models based on several machine learning classifiers including random forests and performed 5-fold cross validations. The model was trained using the training dataset containing two thirds of the input samples, and the remaining one third of the input samples were used as the test dataset to assess the performance of the model in independent tests. Finally, the results predicted by the model were validated through gene function annotation and cell function experiments including cell viability assay, colony formation assay and cell cycle analysis.
RESULTS Compared with the random genes, the genes known to be associated with self-renewal and pluripotency of mESCs in the multiomics data showed significantly different features. Random forest outperformed the other machine learning algorithms tested on these multiomics data, with an area under the curve (AUC) of 0.883±0.018 for cross validation and an AUC of 0.880±0.028 for independent test. Based on this model, we identified 893 potential regulatory genes associated with self-renewal and pluripotency of mESCs, which were similar to the known genes in functional annotation. Knockdown of the predicted novel regulator gene Cct6a resulted in significant decreases in the cell viability of mESCs (P < 0.0001) and the number of cell clones (P < 0.01), significantly increased the number of cells in G1 phase (P < 0.01) and decreased the number of S phase cells (P < 0.05). Knockdown of Cct6a also led to failure of positive alkaline phosphatase staining of the mESCs. CONCLUSION A machine learning model based on multiomics data can be used to predict potential self-renewal and pluripotency regulators with high performance. By using this model, we predicted potential self-renewal and pluripotency regulatory genes including Cct6a and applied experimental validation. This model provides new insights into the regulatory mechanism of mESCs and contributes to stem cell research.
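The evaluation protocol named in this abstract (a two-thirds training set, a one-third independent test set, and 5-fold cross validation) can be sketched with index bookkeeping alone. The abstract does not say how the folds relate to the split, so running the folds inside the training portion is an assumption here, as are the function names and the fixed random seed.

```python
import random

def split_two_thirds(n_samples, seed=0):
    """Shuffle sample indices and hold out one third as an independent test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = (2 * n_samples) // 3
    return indices[:cut], indices[cut:]

def k_fold(indices, k=5):
    """Yield (training, validation) index lists for k-fold cross validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

train_idx, test_idx = split_two_thirds(30)
for fold_number, (fit_idx, val_idx) in enumerate(k_fold(train_idx), start=1):
    # A classifier such as a random forest would be fitted on fit_idx
    # and scored (for example by AUC) on val_idx.
    print(fold_number, len(fit_idx), len(val_idx))
print("held-out test samples:", len(test_idx))
```

Keeping the test indices untouched until the cross-validated model is final is what makes the second AUC figure an independent estimate.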
Affiliation(s)
- P Zeng
  - School of Basic Medical Science, Southern Medical University, Guangzhou 510515, China
- X Tang
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
- T Wu
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
- Q Tian
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
- M Li
  - School of Basic Medical Science, Southern Medical University, Guangzhou 510515, China
- J Ding
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
|
35
|
Hulot A, Laloë D, Jaffrézic F. A unified framework for the integration of multiple hierarchical clusterings or networks from multi-source data. BMC Bioinformatics 2021; 22:392. [PMID: 34348641 PMCID: PMC8336092 DOI: 10.1186/s12859-021-04303-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/15/2021] [Accepted: 07/13/2021] [Indexed: 11/30/2022] Open
Abstract
Background Integrating data from different sources is a recurring question in computational biology. Much effort has been devoted to the integration of data sets of the same type, typically multiple numerical data tables. However, data types are generally heterogeneous: it is commonplace to gather data in the form of trees, networks or factorial maps, as these representations all have an appealing visual interpretation that helps to study grouping patterns and interactions between entities. The question we aim to answer in this paper is that of the integration of such representations. Results To this end, we provide a simple procedure to compare data of various types, in particular trees or networks, that relies essentially on two steps: the first step projects the representations into a common coordinate system; the second step then uses a multi-table integration approach to compare the projected data. We rely on efficient and well-known methodologies for each step: the projection step is achieved by retrieving a distance matrix for each representation form and then applying multidimensional scaling to provide a new set of coordinates from all the pairwise distances. The integration step is then achieved by applying a multiple factor analysis to the multiple tables of the new coordinates. This procedure provides tools to integrate and compare data available, for instance, as tree or network structures. Our approach is complementary to kernel methods, traditionally used to answer the same question. Conclusion Our approach is evaluated on simulated data and used to analyze two real-world data sets: first, we compare several clusterings for different cell types obtained from a transcriptomics single-cell data set in mouse embryos; second, we use our procedure to aggregate a multi-table data set from the TCGA breast cancer database, in order to compare several protein networks inferred for different breast cancer subtypes.
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04303-4.
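The projection step described in this abstract (distance matrix, then multidimensional scaling) can be illustrated with classical (Torgerson) MDS. This is a sketch under assumptions: NumPy is assumed to be installed, `classical_mds` and `toy_dist` are hypothetical names, the abstract does not say which MDS variant the authors use, and the subsequent multiple factor analysis step is not shown.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Embed items in n_components dimensions from a matrix of pairwise distances."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram (inner-product) matrix.
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(top_vals)

# Toy distance matrix for three items lying on a line at positions 0, 1 and 3.
toy_dist = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
coords = classical_mds(toy_dist, n_components=1)
```

Each tree or network would yield its own distance matrix and hence its own coordinate table; those tables are what a multi-table method would then compare.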
Affiliation(s)
- Audrey Hulot
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
  - Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA-Paris, 75005, Paris, France
  - Université Paris-Saclay, UVSQ, Inserm, Infection et inflammation, 78180, Montigny-le-Bretonneux, France
- Denis Laloë
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Florence Jaffrézic
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
|
36
|
Yu N, Wu MJ, Liu JX, Zheng CH, Xu Y. Correntropy-Based Hypergraph Regularized NMF for Clustering and Feature Selection on Multi-Cancer Integrated Data. IEEE Transactions on Cybernetics 2021; 51:3952-3963. [PMID: 32603306 DOI: 10.1109/tcyb.2020.3000799] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 06/11/2023]
Abstract
Non-negative matrix factorization (NMF) has become one of the most powerful methods for clustering and feature selection. However, the performance of the traditional NMF method severely degrades when the data contain noises and outliers or the manifold structure of the data is not taken into account. In this article, a novel method called correntropy-based hypergraph regularized NMF (CHNMF) is proposed to solve the above problem. Specifically, we use the correntropy instead of the Euclidean norm in the loss term of CHNMF, which will improve the robustness of the algorithm. And the hypergraph regularization term is also applied to the objective function, which can explore the high-order geometric information in more sample points. Then, the half-quadratic (HQ) optimization technique is adopted to solve the complex optimization problem of CHNMF. Finally, extensive experimental results on multi-cancer integrated data indicate that the proposed CHNMF method is superior to other state-of-the-art methods for clustering and feature selection.
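Why replacing the Euclidean norm with correntropy improves robustness can be seen in a small numerical sketch. The code below is an assumption-laden illustration, not the CHNMF objective: it uses an unnormalised Gaussian kernel, a hypothetical bandwidth `sigma`, and hypothetical function names, and it compares the two losses on a fit with and without one outlier.

```python
import math

def squared_error(x, x_hat):
    """Euclidean (squared) loss: grows without bound as a residual grows."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def correntropy_loss(x, x_hat, sigma=1.0):
    """Sum of 1 - exp(-e^2 / (2 sigma^2)); each residual contributes at most 1."""
    return sum(
        1.0 - math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))
        for a, b in zip(x, x_hat)
    )

fit = [1.1, 1.9, 3.0]           # a reconstruction of the clean data
clean = [1.0, 2.0, 3.0]
corrupted = [1.0, 2.0, 103.0]   # last entry is an outlier

clean_sq, corrupted_sq = squared_error(clean, fit), squared_error(corrupted, fit)
clean_ce, corrupted_ce = correntropy_loss(clean, fit), correntropy_loss(corrupted, fit)
```

The single outlier raises the squared error by about 10,000, while the correntropy-based loss rises by at most 1, so the outlier cannot dominate the fit.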
|
37
|
Genome wide analysis implicates upregulation of proteasome pathway in major depressive disorder. Transl Psychiatry 2021; 11:409. [PMID: 34321460 PMCID: PMC8319154 DOI: 10.1038/s41398-021-01529-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/24/2020] [Revised: 02/27/2021] [Accepted: 06/21/2021] [Indexed: 12/02/2022] Open
|
38
|
Stanton JE, Malijauskaite S, McGourty K, Grabrucker AM. The Metallome as a Link Between the "Omes" in Autism Spectrum Disorders. Front Mol Neurosci 2021; 14:695873. [PMID: 34290588 PMCID: PMC8289253 DOI: 10.3389/fnmol.2021.695873] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/22/2021] [Accepted: 06/14/2021] [Indexed: 12/26/2022] Open
Abstract
Metal dyshomeostasis plays a significant role in various neurological diseases such as Alzheimer's disease, Parkinson's disease, Autism Spectrum Disorders (ASD), and many more. Like studies investigating the proteome, transcriptome, epigenome, microbiome, etc., for years, metallomics studies have focused on data from their domain, i.e., trace metal composition, only. Still, few have considered the links between other "omes," which may together result in an individual's specific pathologies. In particular, ASD have been reported to have multitudes of possible causal effects. Metallomics data focusing on metal deficiencies and dyshomeostasis can be linked to functions of metalloenzymes, metal transporters, and transcription factors, thus affecting the proteome and transcriptome. Furthermore, recent studies in ASD have emphasized the gut-brain axis, with alterations in the microbiome being linked to changes in the metabolome and inflammatory processes. However, the microbiome and other "omes" are heavily influenced by the metallome. Thus, here, we will summarize the known implications of a changed metallome for other "omes" in the body in the context of "omics" studies in ASD. We will highlight possible connections and propose a model that may explain the so far independently reported pathologies in ASD.
Affiliation(s)
- Janelle E Stanton
  - Department of Biological Sciences, University of Limerick, Limerick, Ireland
  - Bernal Institute, University of Limerick, Limerick, Ireland
- Sigita Malijauskaite
  - Bernal Institute, University of Limerick, Limerick, Ireland
  - Department of Chemical Sciences, University of Limerick, Limerick, Ireland
- Kieran McGourty
  - Bernal Institute, University of Limerick, Limerick, Ireland
  - Department of Chemical Sciences, University of Limerick, Limerick, Ireland
  - Health Research Institute, University of Limerick, Limerick, Ireland
- Andreas M Grabrucker
  - Department of Biological Sciences, University of Limerick, Limerick, Ireland
  - Bernal Institute, University of Limerick, Limerick, Ireland
  - Health Research Institute, University of Limerick, Limerick, Ireland
|
39
|
Picard M, Scott-Boyer MP, Bodein A, Périn O, Droit A. Integration strategies of multi-omics data for machine learning analysis. Comput Struct Biotechnol J 2021; 19:3735-3746. [PMID: 34285775 PMCID: PMC8258788 DOI: 10.1016/j.csbj.2021.06.030] [Citation(s) in RCA: 148] [Impact Index Per Article: 49.3] [Received: 03/30/2021] [Revised: 06/17/2021] [Accepted: 06/21/2021] [Indexed: 12/25/2022] Open
Abstract
Increased availability of high-throughput technologies has generated an ever-growing number of omics data that seek to portray many different but complementary biological layers including genomics, epigenomics, transcriptomics, proteomics, and metabolomics. New insights from these data have been obtained by machine learning algorithms that have produced diagnostic and classification biomarkers. Most biomarkers obtained to date, however, include only one omic measurement at a time and thus do not take full advantage of recent multi-omics experiments that now capture the entire complexity of biological systems. Multi-omics data integration strategies are needed to combine the complementary knowledge brought by each omics layer. We have summarized the most recent data integration methods/frameworks into five different integration strategies: early, mixed, intermediate, late and hierarchical. In this mini-review, we focus on challenges and existing multi-omics integration strategies by paying special attention to machine learning applications.
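Two of the five strategies named in this abstract are easy to contrast in code: early integration concatenates the omics blocks before modelling, while late integration combines predictions made separately per block. The sketch below uses hypothetical function names and toy values; score averaging is only one possible way to combine late-stage predictions, and the mixed, intermediate and hierarchical strategies are not shown.

```python
def early_integration(blocks):
    """Concatenate per-sample feature vectors from every omics block (sorted by block name)."""
    n_samples = len(next(iter(blocks.values())))
    return [
        [value for name in sorted(blocks) for value in blocks[name][i]]
        for i in range(n_samples)
    ]

def late_integration(per_block_scores):
    """Average per-sample prediction scores that were produced independently for each block."""
    n_blocks = len(per_block_scores)
    return [sum(scores) / n_blocks for scores in zip(*per_block_scores.values())]

# Toy data: two samples measured in two omics layers.
blocks = {
    "transcriptomics": [[1.0, 2.0], [3.0, 4.0]],
    "methylation": [[0.1], [0.2]],
}
merged = early_integration(blocks)  # one wide feature table for a single model

# Toy per-block prediction scores, as if from one model trained per layer.
scores = {"transcriptomics": [0.8, 0.2], "methylation": [0.6, 0.4]}
combined = late_integration(scores)
```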
Affiliation(s)
- Milan Picard
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Marie-Pier Scott-Boyer
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Antoine Bodein
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Olivier Périn
  - Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
- Arnaud Droit (corresponding author)
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
|
40
|
Choi HJ, Wang C, Pan X, Jang J, Cao M, Brazzo JA, Bae Y, Lee K. Emerging machine learning approaches to phenotyping cellular motility and morphodynamics. Phys Biol 2021; 18:10.1088/1478-3975/abffbe. [PMID: 33971636 PMCID: PMC9131244 DOI: 10.1088/1478-3975/abffbe] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/22/2020] [Accepted: 05/10/2021] [Indexed: 12/22/2022]
Abstract
Cells respond heterogeneously to molecular and environmental perturbations. Phenotypic heterogeneity, wherein multiple phenotypes coexist in the same conditions, presents challenges when interpreting the observed heterogeneity. Advances in live cell microscopy allow researchers to acquire an unprecedented amount of live cell image data at high spatiotemporal resolutions. Phenotyping cellular dynamics, however, is a nontrivial task and requires machine learning (ML) approaches to discern phenotypic heterogeneity from live cell images. In recent years, ML has proven instrumental in biomedical research, allowing scientists to implement sophisticated computation in which computers learn and effectively perform specific analyses with minimal human instruction or intervention. In this review, we discuss how ML has been recently employed in the study of cell motility and morphodynamics to identify phenotypes from computer vision analysis. We focus on new approaches to extract and learn meaningful spatiotemporal features from complex live cell images for cellular and subcellular phenotyping.
Affiliation(s)
- Hee June Choi
  - Department of Biomedical Engineering, Worcester Polytechnic Institute, Worcester, MA 01609, United States of America
  - Vascular Biology Program and Department of Surgery, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115, United States of America
- Chuangqi Wang
  - Department of Biomedical Engineering, Worcester Polytechnic Institute, Worcester, MA 01609, United States of America
  - Present address: Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Xiang Pan
  - Department of Biomedical Engineering, Worcester Polytechnic Institute, Worcester, MA 01609, United States of America
  - Vascular Biology Program and Department of Surgery, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115, United States of America
- Junbong Jang
  - Department of Biomedical Engineering, Worcester Polytechnic Institute, Worcester, MA 01609, United States of America
  - Vascular Biology Program and Department of Surgery, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115, United States of America
- Mengzhi Cao
  - Data Science Program, Worcester Polytechnic Institute, Worcester, MA 01609, United States of America
- Joseph A Brazzo
  - Department of Pathology and Anatomical Sciences, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, Buffalo, NY 14203, United States of America
- Yongho Bae
  - Department of Pathology and Anatomical Sciences, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, Buffalo, NY 14203, United States of America
- Kwonmoo Lee
  - Department of Biomedical Engineering, Worcester Polytechnic Institute, Worcester, MA 01609, United States of America
  - Vascular Biology Program and Department of Surgery, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115, United States of America
|
41
|
Wang T, Shao W, Huang Z, Tang H, Zhang J, Ding Z, Huang K. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat Commun 2021; 12:3445. [PMID: 34103512 PMCID: PMC8187432 DOI: 10.1038/s41467-021-23774-w] [Citation(s) in RCA: 105] [Impact Index Per Article: 35.0] [Received: 10/13/2020] [Accepted: 05/04/2021] [Indexed: 12/18/2022] Open
Abstract
To fully utilize the advances in omics technologies and achieve a more comprehensive understanding of human diseases, novel computational methods are required for integrative analysis of multiple types of omics data. Here, we present a novel multi-omics integrative method named Multi-Omics Graph cOnvolutional NETworks (MOGONET) for biomedical classification. MOGONET jointly explores omics-specific learning and cross-omics correlation learning for effective multi-omics data classification. We demonstrate that MOGONET outperforms other state-of-the-art supervised multi-omics integrative analysis approaches from different biomedical classification applications using mRNA expression data, DNA methylation data, and microRNA expression data. Furthermore, MOGONET can identify important biomarkers from different omics data types related to the investigated biomedical problems.
Affiliation(s)
- Tongxin Wang
  - Department of Computer Science, Indiana University Bloomington, Bloomington, IN, USA
- Wei Shao
  - Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
- Zhi Huang
  - Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
  - School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA
- Haixu Tang
  - Department of Computer Science, Indiana University Bloomington, Bloomington, IN, USA
- Jie Zhang
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Zhengming Ding
  - Department of Computer Science, Tulane University, New Orleans, LA, USA
- Kun Huang
  - Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
  - Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
  - Regenstrief Institute, Indianapolis, IN, USA
|
42
|
Jin T, Rehani P, Ying M, Huang J, Liu S, Roussos P, Wang D. scGRNom: a computational pipeline of integrative multi-omics analyses for predicting cell-type disease genes and regulatory networks. Genome Med 2021; 13:95. [PMID: 34044854 PMCID: PMC8161957 DOI: 10.1186/s13073-021-00908-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/04/2020] [Accepted: 05/13/2021] [Indexed: 02/06/2023] Open
Abstract
Understanding cell-type-specific gene regulatory mechanisms from genetic variants to diseases remains challenging. To address this, we developed a computational pipeline, scGRNom (single-cell Gene Regulatory Network prediction from multi-omics), to predict cell-type disease genes and regulatory networks including transcription factors and regulatory elements. With applications to schizophrenia and Alzheimer's disease, we predicted disease genes and regulatory networks for excitatory and inhibitory neurons, microglia, and oligodendrocytes. Further enrichment analyses revealed cross-disease and disease-specific functions and pathways at the cell-type level. Our machine learning analysis also found that cell-type disease genes improved clinical phenotype predictions. scGRNom is a general-purpose tool available at https://github.com/daifengwanglab/scGRNom .
Affiliation(s)
- Ting Jin
  - Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, WI, 53706, USA
  - Waisman Center, University of Wisconsin - Madison, Madison, WI, 53705, USA
- Peter Rehani
  - Waisman Center, University of Wisconsin - Madison, Madison, WI, 53705, USA
  - Department of Integrative Biology, University of Wisconsin - Madison, Madison, WI, 53706, USA
  - Present address: Morgridge Institute for Research, Madison, WI, 53715, USA
- Mufang Ying
  - Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA
  - Present address: Department of Statistics, Rutgers University, Piscataway, NJ, 08854, USA
- Jiawei Huang
  - Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA
- Shuang Liu
  - Waisman Center, University of Wisconsin - Madison, Madison, WI, 53705, USA
- Panagiotis Roussos
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Daifeng Wang
  - Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, WI, 53706, USA
  - Waisman Center, University of Wisconsin - Madison, Madison, WI, 53705, USA
  - Department of Computer Sciences, University of Wisconsin - Madison, Madison, WI, 53706, USA
|
43
|
Guo Y, Wang Q, Guo Y, Zhang Y, Fu Y, Zhang H. Preoperative prediction of perineural invasion with multi-modality radiomics in rectal cancer. Sci Rep 2021; 11:9429. [PMID: 33941817 PMCID: PMC8093213 DOI: 10.1038/s41598-021-88831-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/18/2020] [Accepted: 04/14/2021] [Indexed: 02/06/2023] Open
Abstract
Perineural invasion (PNI), a grossly underreported independent risk predictor in rectal cancer, is hard to identify preoperatively. We aim to predict PNI status in rectal cancer using multi-modality radiomics. In total, 396 radiomics features were extracted from each of T2-weighted images (T2WIs), diffusion-weighted images (DWIs), and the portal venous phase of contrast-enhanced CT (CE-CT) for 94 consecutive patients with histologically confirmed rectal cancer. T2WI score, DWI score, and CT score were calculated via radiomics feature selection and optimization. Discrimination, calibration, and clinical benefit were used to evaluate the performance of the radiomics scores in both training and testing datasets. CT score and T2WI score were independent risk predictors [CT score, OR (95% CI) = 4.218 (1.070–16.620); T2WI score, OR (95% CI) = 105.721 (3.091–3615.790)]. The concise score, which combined CT score and T2WI score, showed the best performance [training dataset, AUC (95% CI) = 0.906 (0.833–0.979); testing dataset, AUC (95% CI) = 0.884 (0.761–1.000)] and good calibration (P > 0.05 in the Hosmer–Lemeshow test for the training and testing datasets). Decision curve analysis showed that the multi-modality radiomics nomogram had a higher clinical net benefit. The multi-modality radiomics score could be used to preoperatively assess PNI status in rectal cancer.
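The reported odds ratios and confidence intervals can be read on the log-odds scale. The sketch below assumes the intervals are Wald-type (symmetric around the coefficient on the log scale, z = 1.96), which the abstract does not state; `wald_summary` is a hypothetical helper. Under that assumption the geometric midpoint of each interval should reproduce the reported OR, which gives a quick consistency check.

```python
import math

def wald_summary(odds_ratio, ci_low, ci_high, z=1.96):
    """Back-calculate the log-odds coefficient and its standard error from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)
    implied_or = math.sqrt(ci_low * ci_high)  # geometric midpoint of the interval
    return beta, se, implied_or

# Values quoted in the abstract.
ct_beta, ct_se, ct_implied = wald_summary(4.218, 1.070, 16.620)
t2_beta, t2_se, t2_implied = wald_summary(105.721, 3.091, 3615.790)
```

Both implied values agree with the reported ORs to within rounding. The much larger standard error for the T2WI score explains its very wide interval.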
Affiliation(s)
- Yu Guo
  - Department of Radiology, The First Hospital of Jilin University, Jilin Provincial Key Laboratory of Medical Imaging and Big Data, Changchun, China
- Quan Wang
  - Department of Gastric and Colorectal Surgery, The First Hospital of Jilin University, Changchun, China
- Yan Guo
  - GE Healthcare, Shanghai, China
- Yiying Zhang
  - Department of Radiology, The First Hospital of Jilin University, Jilin Provincial Key Laboratory of Medical Imaging and Big Data, Changchun, China
- Yu Fu
  - Department of Radiology, The First Hospital of Jilin University, Jilin Provincial Key Laboratory of Medical Imaging and Big Data, Changchun, China
- Huimao Zhang
  - Department of Radiology, The First Hospital of Jilin University, Jilin Provincial Key Laboratory of Medical Imaging and Big Data, Changchun, China
|
44
|
|
45
|
Tian S, Wang C. An ensemble of the iCluster method to analyze longitudinal lncRNA expression data for psoriasis patients. Hum Genomics 2021; 15:23. [PMID: 33879268 PMCID: PMC8056592 DOI: 10.1186/s40246-021-00323-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2020] [Accepted: 04/12/2021] [Indexed: 11/17/2022] Open
Abstract
Background Psoriasis is an immune-mediated, inflammatory disorder of the skin with chronic inflammation and hyper-proliferation of the epidermis. Since psoriasis has genetic components and the diseased tissue of psoriasis is very easily accessible, it is natural to use high-throughput technologies to characterize psoriasis and thus seek targeted therapies. Transcriptional profiles change correspondingly after an intervention. Unlike cross-sectional gene expression data, longitudinal gene expression data can capture the dynamic changes and thus facilitate causal inference. Methods Using the iCluster method as a building block, an ensemble method was proposed and applied to a longitudinal gene expression dataset for psoriasis, with the objective of identifying key lncRNAs that can discriminate the responders from the non-responders to two immune treatments of psoriasis. Results Using support vector machine models, the leave-one-out predictive accuracy of the 20-lncRNA signature identified by this ensemble was estimated as 80%, which outperforms several competing methods. Furthermore, pathway enrichment analysis was performed on the target mRNAs of the identified lncRNAs. The enriched GO terms and KEGG pathways include proteasome and protein deubiquitination. The ubiquitination-proteasome system is regarded as a key player in psoriasis, and a proteasome inhibitor targeting the ubiquitination pathway holds promise for treating psoriasis. Conclusions An integrative method such as iCluster for multiple data integration can be adopted directly to analyze longitudinal gene expression data, which offers more promising options for longitudinal big data analysis. A comprehensive evaluation and validation of the resulting 20-lncRNA signature is highly desirable. Supplementary Information The online version contains supplementary material available at 10.1186/s40246-021-00323-6.
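Leave-one-out accuracy, the metric quoted in this abstract, holds out each sample in turn, trains on the rest, and counts correct predictions. The sketch below shows that loop only. To stay dependency-free it swaps in a nearest-centroid classifier instead of the support vector machine used in the study, and the toy features, labels and function names are all hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to the sample (stand-in for an SVM)."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(centroids[label], sample))

def leave_one_out_accuracy(features, labels):
    """Hold out each sample once, train on the remainder, and return the fraction predicted correctly."""
    hits = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]
    return hits / len(features)

# Toy data: the last responder sits inside the non-responder cluster and is misclassified.
toy_x = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [0.5, 0.5]]
toy_y = ["non-responder"] * 3 + ["responder"] * 3
accuracy = leave_one_out_accuracy(toy_x, toy_y)
```

On this toy set five of the six held-out samples are predicted correctly.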
Affiliation(s)
- Suyan Tian
  - Division of Clinical Research, The First Hospital of Jilin University, 1 Xinmin Street, Changchun, Jilin, 130021, People's Republic of China
- Chi Wang
  - Department of Internal Medicine, College of Medicine, University of Kentucky, 800 Rose St, Lexington, KY, 40536, USA
  - Markey Cancer Center, University of Kentucky, 800 Rose St, Lexington, KY, 40536, USA
|
46
|
A New Era of Neuro-Oncology Research Pioneered by Multi-Omics Analysis and Machine Learning. Biomolecules 2021; 11:biom11040565. [PMID: 33921457 PMCID: PMC8070530 DOI: 10.3390/biom11040565] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/11/2021] [Revised: 04/02/2021] [Accepted: 04/07/2021] [Indexed: 02/06/2023] Open
Abstract
Although the incidence of central nervous system (CNS) cancers is not high, these cancers significantly reduce a patient’s quality of life and result in high mortality rates. A low incidence also means a low number of cases, which in turn means a low amount of information. To compensate, researchers have tried to increase the amount of information available from a single test using high-throughput technologies. This approach, referred to as single-omics analysis, has only been partially successful, as one type of data may not be able to appropriately describe all the characteristics of a tumor. It is presently unclear what type of data can describe a particular clinical situation. One way to solve this problem is to use multi-omics data. When using many types of data, a selected data type or a combination of them may effectively resolve a clinical question. Hence, we conducted a comprehensive survey of papers in the field of neuro-oncology that used multi-omics data for analysis and found that most of the papers utilized machine learning techniques. This finding suggests that machine learning techniques are useful in multi-omics analysis. In this review, we discuss the current status of multi-omics analysis in the field of neuro-oncology and the importance of using machine learning techniques.
|
47
|
Termine A, Fabrizio C, Strafella C, Caputo V, Petrosini L, Caltagirone C, Giardina E, Cascella R. Multi-Layer Picture of Neurodegenerative Diseases: Lessons from the Use of Big Data through Artificial Intelligence. J Pers Med 2021; 11:280. [PMID: 33917161 PMCID: PMC8067806 DOI: 10.3390/jpm11040280] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/05/2021] [Revised: 04/05/2021] [Accepted: 04/06/2021] [Indexed: 12/13/2022] Open
Abstract
In the big data era, artificial intelligence techniques have been applied to tackle traditional issues in the study of neurodegenerative diseases. Despite the progress made in understanding the complex (epi)genetic signatures underlying neurodegenerative disorders, performing early diagnosis and developing drug repurposing strategies remain serious challenges for such conditions. In this context, the integration of multi-omics, neuroimaging, and electronic health records data can be exploited using deep learning methods to provide the most accurate representation of patients possible. Deep learning allows researchers to find multi-modal biomarkers to develop more effective and personalized treatments, early diagnosis tools, as well as useful information for drug discovery and repurposing in neurodegenerative pathologies. In this review, we will describe how relevant studies have been able to demonstrate the potential of deep learning to enhance the knowledge of neurodegenerative disorders such as Alzheimer's and Parkinson's diseases through the integration of all sources of biomedical data.
Affiliation(s)
- Andrea Termine
  - IRCCS Santa Lucia Foundation, Genomic Medicine Laboratory UILDM, 00179 Rome, Italy
- Carlo Fabrizio
  - IRCCS Santa Lucia Foundation, Laboratory of Experimental and Behavioral Neurophysiology, 00143 Rome, Italy
- Claudia Strafella
  - IRCCS Santa Lucia Foundation, Genomic Medicine Laboratory UILDM, 00179 Rome, Italy
  - Department of Biomedicine and Prevention, Tor Vergata University of Rome, 00133 Rome, Italy
- Valerio Caputo
  - IRCCS Santa Lucia Foundation, Genomic Medicine Laboratory UILDM, 00179 Rome, Italy
  - Department of Biomedicine and Prevention, Tor Vergata University of Rome, 00133 Rome, Italy
- Laura Petrosini
  - IRCCS Santa Lucia Foundation, Laboratory of Experimental and Behavioral Neurophysiology, 00143 Rome, Italy
- Carlo Caltagirone
  - IRCCS Santa Lucia Foundation, Department of Clinical and Behavioral Neurology, 00179 Rome, Italy
- Emiliano Giardina
  - IRCCS Santa Lucia Foundation, Genomic Medicine Laboratory UILDM, 00179 Rome, Italy
  - UILDM Lazio ONLUS Foundation, Department of Biomedicine and Prevention, Tor Vergata University, 00133 Rome, Italy
- Raffaella Cascella
  - IRCCS Santa Lucia Foundation, Genomic Medicine Laboratory UILDM, 00179 Rome, Italy
  - Department of Biomedical Sciences, Catholic University Our Lady of Good Counsel, 1000 Tirana, Albania
|
48
|
ORN: Inferring patient-specific dysregulation status of pathway modules in cancer with OR-gate Network. PLoS Comput Biol 2021; 17:e1008792. [PMID: 33819263 PMCID: PMC8049496 DOI: 10.1371/journal.pcbi.1008792] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/22/2020] [Revised: 04/15/2021] [Accepted: 02/15/2021] [Indexed: 01/26/2023] Open
Abstract
Pathway-level understanding of cancer plays a key role in precision oncology. However, the current amount of high-throughput data cannot support the elucidation of full pathway topology. In this study, instead of directly learning the pathway network, we adapted the probabilistic OR gate to model the modular structure of pathways and regulon. The resulting model, OR-gate Network (ORN), can simultaneously infer pathway modules of somatic alterations, patient-specific pathway dysregulation status, and downstream regulon. In a trained ORN, the differentially expressed genes (DEGs) in each tumour can be explained by somatic mutations perturbing a pathway module. Furthermore, the ORN handles one of the most important properties of pathway perturbation in tumours, mutual exclusivity. We have applied the ORN to lower-grade glioma (LGG) samples and liver hepatocellular carcinoma (LIHC) samples in TCGA and breast cancer samples from METABRIC. Both datasets have shown abnormal pathway activities related to immune response and cell cycles. In LGG samples, ORN identified pathway modules closely related to glioma development and revealed two pathways closely related to patient survival. We had similar results with LIHC samples. Additional results from the METABRIC datasets showed that ORN could characterize critical mechanisms of cancer and connect them to less studied somatic mutations (e.g., BAP1, MIR604, MICAL3, and telomere activities), which may generate novel hypotheses for targeted therapy.
Cellular functions are carried out by a set of gene products. Mutation of a single gene is often sufficient to disrupt certain biological functions and promote tumorigenesis. Therefore, genes participating in the same function are less likely to mutate in the same sample. This phenomenon is called “mutual exclusivity”. In this study, our algorithm (ORN) has utilized this property to identify gene-level mutations that affect similar biological functions. It also considers the mutations’ impact on mRNA expression. Functional modules identified by ORN tend to be mutually exclusive while causing similar differential expression profiles. When we applied ORN to lower-grade glioma and liver cancer datasets, we identified gene modules significantly related to patient survival. Furthermore, across different types of cancer, ORN has connected well-known cancer driver mutations with genes whose functions remain unclear. These connections, once validated, can generate novel hypotheses for biologists to further investigate cancer mechanisms and develop targeted therapy.
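As a rough illustration of the probabilistic OR gate mentioned in the abstract, the sketch below uses the textbook noisy-OR form, in which a module is "on" unless every mutated member gene independently fails to switch it on. The abstract does not give the paper's exact parameterisation, so the function name `noisy_or`, the `strengths` values, and the `leak` term are all assumptions made for this example and not the authors' implementation.

```python
def noisy_or(mutated, strengths, leak=0.0):
    """Probability that a pathway module is dysregulated under a noisy-OR gate.

    mutated   -- 0/1 flags, one per gene in the module (hypothetical input)
    strengths -- per-gene probability that a mutation alone perturbs the module
    leak      -- probability of dysregulation with no mutated gene (assumed)
    """
    p_not_perturbed = 1.0 - leak
    for is_mutated, strength in zip(mutated, strengths):
        if is_mutated:
            # Each mutated gene independently fails to perturb with prob 1 - strength.
            p_not_perturbed *= 1.0 - strength
    return 1.0 - p_not_perturbed


# Illustrative values only: three genes in one module, equally strong.
one_hit = noisy_or([1, 0, 0], [0.9, 0.9, 0.9])
two_hits = noisy_or([1, 1, 0], [0.9, 0.9, 0.9])
```

Under this form a single mutated gene already accounts for most of the dysregulation probability and a second hit adds comparatively little, which is one intuitive reading of why an OR-style gate sits well with mutual exclusivity.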
|
49
|
Cancer Subtype Recognition Based on Laplacian Rank Constrained Multiview Clustering. Genes (Basel) 2021; 12:genes12040526. [PMID: 33916856 PMCID: PMC8065670 DOI: 10.3390/genes12040526] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/03/2021] [Revised: 03/28/2021] [Accepted: 03/31/2021] [Indexed: 12/13/2022]
Abstract
Integrating multigenomic data to recognize cancer subtypes is an important task in bioinformatics. In recent years, some multiview clustering algorithms have been proposed and applied to identify cancer subtypes. However, these clustering algorithms ignore that each data type contributes differently to the clustering results during the fusion process, and they require additional clustering steps to generate the final labels. In this paper, a new one-step method for cancer subtype recognition based on a graph learning framework is designed, called Laplacian Rank Constrained Multiview Clustering (LRCMC). LRCMC first forms a graph for each biological data type to reveal the relationship between data points and uses an affinity matrix to encode the graph structure. Then, it adds weights to measure the contribution of each graph and finally merges these individual graphs into a consensus graph. In addition, LRCMC constructs adaptive neighbors to adjust the similarity of sample points, and it uses a rank constraint on the Laplacian matrix to ensure that each graph structure has the same connected components. Experiments on several benchmark datasets and The Cancer Genome Atlas (TCGA) datasets have demonstrated the effectiveness of the proposed algorithm compared to state-of-the-art methods.
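The pipeline described above (one graph per data type, a weighted merge into a consensus graph, and a Laplacian whose rank fixes the number of connected components) can be sketched as follows. This is a minimal illustration, not the LRCMC optimisation itself: the Gaussian k-nearest-neighbour affinity, the fixed view weights, and the helper names `knn_affinity`, `consensus_graph`, and `n_components` are all assumptions introduced here. The one fact it relies on is standard: the number of zero eigenvalues of a graph Laplacian equals the number of connected components, so a graph on n samples with c components has a Laplacian of rank n - c.

```python
import numpy as np


def knn_affinity(X, k):
    """Symmetric k-nearest-neighbour affinity matrix for one view.

    Gaussian weights are an illustrative choice, not the paper's adaptive scheme.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nearest = np.argsort(d2[i])[1:k + 1]  # position 0 is the point itself
        W[i, nearest] = np.exp(-d2[i, nearest])
    return (W + W.T) / 2  # symmetrise


def consensus_graph(affinities, weights):
    """Weighted average of per-view affinity matrices (weights are fixed here)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * A for w, A in zip(weights, affinities))


def n_components(W, tol=1e-8):
    """Count connected components as the number of ~zero Laplacian eigenvalues."""
    laplacian = np.diag(W.sum(axis=1)) - W
    eigenvalues = np.linalg.eigvalsh(laplacian)
    return int((eigenvalues < tol).sum())


# Toy data: six samples, two views, two well-separated groups in each view.
view_a = np.array([[0.0], [1.0], [2.0], [100.0], [101.0], [102.0]])
view_b = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
                   [50.0, 50.0], [50.5, 50.0], [50.0, 50.5]])
merged = consensus_graph([knn_affinity(view_a, 2), knn_affinity(view_b, 2)],
                         weights=[0.7, 0.3])
clusters_found = n_components(merged)
```

In a one-step method of this kind the component structure of the consensus graph gives the cluster labels directly, which is why no separate clustering pass (for example k-means on a spectral embedding) is needed afterwards.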
|
50
|
Vlachavas EI, Bohn J, Ückert F, Nürnberg S. A Detailed Catalogue of Multi-Omics Methodologies for Identification of Putative Biomarkers and Causal Molecular Networks in Translational Cancer Research. Int J Mol Sci 2021; 22:2822. [PMID: 33802234 PMCID: PMC8000236 DOI: 10.3390/ijms22062822] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/31/2021] [Revised: 03/05/2021] [Accepted: 03/05/2021] [Indexed: 02/06/2023]
Abstract
Recent advances in sequencing and biotechnological methodologies have led to the generation of large volumes of molecular data of different omics layers, such as genomics, transcriptomics, proteomics and metabolomics. Integration of these data with clinical information provides new opportunities to discover how perturbations in biological processes lead to disease. Using data-driven approaches for the integration and interpretation of multi-omics data could stably identify links between structural and functional information and propose causal molecular networks with potential impact on cancer pathophysiology. This knowledge can then be used to improve disease diagnosis, prognosis, prevention, and therapy. This review will summarize and categorize the most current computational methodologies and tools for integration of distinct molecular layers in the context of translational cancer research and personalized therapy. Additionally, the bioinformatics tools Multi-Omics Factor Analysis (MOFA) and netDX will be tested using omics data from public cancer resources, to assess their overall robustness, provide reproducible workflows for gaining biological knowledge from multi-omics data, and to comprehensively understand the significantly perturbed biological entities in distinct cancer types. We show that the performed supervised and unsupervised analyses result in meaningful and novel findings.
Affiliation(s)
- Efstathios Iason Vlachavas
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Jonas Bohn
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Frank Ückert
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
- Sylvia Nürnberg
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
|