1
Chen JY, Niu SH, Li HY, Liao XD, Xing SC. Multiomics analysis of the effects of manure-borne doxycycline combined with oversized fiber microplastics on pak choi growth and the risk of antibiotic resistance gene transmission. J Hazard Mater 2024; 475:134931. [PMID: 38889467 DOI: 10.1016/j.jhazmat.2024.134931] [Received: 02/08/2024] [Revised: 05/23/2024] [Accepted: 06/13/2024] [Indexed: 06/20/2024]
Abstract
In this study, oversized microplastics (OMPs) were introduced into soil containing manure-borne doxycycline (DOX) to systematically examine the effects of combined OMP and DOX pollution on the growth of pak choi, analyze alterations in soil environmental metabolites, and explore the potential migration of antibiotic resistance genes (ARGs). The results revealed a more pronounced impact of DOX than of OMPs. Slender-fiber OMPs (SF OMPs) had a more substantial influence on the growth of pak choi than did coarse-fiber OMPs (CF OMPs). Conversely, CF OMPs had a more significant effect on the migration of ARGs within the system. When DOX was combined with OMPs, the negative effects of DOX on pak choi growth were mitigated by indole synthesis, which resulted from adjustments to carbon metabolism and amino acid metabolism in pak choi roots. In this process, Pseudohongiellaceae and Xanthomonadaceae were key bacteria. During the migration of ARGs, the potential host bacterium Limnobacter should be considered. Additionally, the majority of potential host bacteria in the pak choi endophytic environment were associated with tetG. This study provides insights into the intricate interplay among DOX, OMPs, ARGs, plant growth, soil metabolism, and the microbiome.
Affiliation(s)
- Jing-Yuan Chen
- College of Animal Science, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Shi-Hua Niu
- College of Animal Science, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Hai-Yang Li
- Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou 510642, China
- Xin-Di Liao
- College of Animal Science, South China Agricultural University, Guangzhou, Guangdong 510642, China; Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, and Key Laboratory of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou, Guangdong 510642, China; National-Local Joint Engineering Research Center for Livestock Breeding, Guangzhou, Guangdong 510642, China
- Si-Cheng Xing
- Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou 510642, China; Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, and Key Laboratory of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou, Guangdong 510642, China; National-Local Joint Engineering Research Center for Livestock Breeding, Guangzhou, Guangdong 510642, China.
2
Zhao C, Su KJ, Wu C, Cao X, Sha Q, Li W, Luo Z, Qing T, Qiu C, Zhao LJ, Liu A, Jiang L, Zhang X, Shen H, Zhou W, Deng HW. Multi-scale variational autoencoder for imputation of missing values in untargeted metabolomics using whole-genome sequencing data. Comput Biol Med 2024; 179:108813. [PMID: 38955127 DOI: 10.1016/j.compbiomed.2024.108813] [Received: 03/13/2024] [Revised: 06/18/2024] [Accepted: 06/24/2024] [Indexed: 07/04/2024]
Abstract
BACKGROUND Missing data is a common challenge in mass spectrometry-based metabolomics and can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. METHOD In this study, we propose a novel method that leverages the information from WGS data and reference metabolites to impute unknown metabolites. Our approach utilizes a multi-scale variational autoencoder to jointly model the burden score, polygenic risk score (PGS), and linkage disequilibrium (LD) pruned single nucleotide polymorphisms (SNPs) for feature extraction and missing metabolomics data imputation. By learning the latent representations of both omics data, our method can effectively impute missing metabolomics values based on genomic information. RESULTS We evaluate the performance of our method on empirical metabolomics datasets with missing values and demonstrate its superiority compared to conventional imputation techniques. Using 35 template-metabolite-derived burden scores, PGS, and LD-pruned SNPs, the proposed method achieved R² scores > 0.01 for 71.55% of metabolites. CONCLUSION The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research.
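The headline result above is a per-metabolite R² score and the share of metabolites that clear a threshold of 0.01. The abstract does not say how R² was computed, so the sketch below assumes the standard coefficient of determination evaluated on held-out (masked) values. The function names, the metabolite names, and the toy numbers are all hypothetical and are not taken from the study.

```python
def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def summarize(observed_by_metabolite, imputed_by_metabolite, threshold=0.01):
    """Return per-metabolite R^2 and the fraction of metabolites above threshold."""
    scores = {
        name: r2_score(values, imputed_by_metabolite[name])
        for name, values in observed_by_metabolite.items()
    }
    passing = sum(1 for score in scores.values() if score > threshold)
    return scores, passing / len(scores)


# Toy held-out values (hypothetical, not data from the study).
observed = {"met_a": [1.0, 2.0, 3.0, 4.0], "met_b": [2.0, 2.5, 3.5, 4.0]}
imputed = {"met_a": [1.1, 1.9, 3.2, 3.8], "met_b": [3.0, 3.0, 3.0, 3.0]}

scores, fraction = summarize(observed, imputed)
print(scores, fraction)
```

In this toy case the second metabolite is imputed with its own mean, which gives R² = 0 and therefore does not pass the 0.01 threshold. This shows why a low threshold still separates informative imputations from mean-filling.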
Affiliation(s)
- Chen Zhao
- Department of Computer Science, Kennesaw State University, 680 Arntson Dr, Marietta, GA, 30060, USA
- Kuan-Jui Su
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Chong Wu
- Department of Biostatistics, University of Texas MD Anderson, Pickens Academic Tower, 1400 Pressler St., Houston, TX, 77030, USA
- Xuewei Cao
- Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Dr, Houghton, MI, 49931, USA
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Dr, Houghton, MI, 49931, USA
- Wu Li
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Zhe Luo
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Tian Qing
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Chuan Qiu
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Lan Juan Zhao
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Anqi Liu
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Lindong Jiang
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Xiao Zhang
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Hui Shen
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Weihua Zhou
- Department of Applied Computing, Michigan Technological University, 1400 Townsend Dr, Houghton, MI, 49931, USA; Center for Biocomputing and Digital Health, Institute of Computing and Cybersystems, and Health Research Institute, Michigan Technological University, Houghton, MI, 49931, USA.
- Hong-Wen Deng
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA
3
Curion F, Theis FJ. Machine learning integrative approaches to advance computational immunology. Genome Med 2024; 16:80. [PMID: 38862979 PMCID: PMC11165829 DOI: 10.1186/s13073-024-01350-3] [Received: 06/29/2023] [Accepted: 05/23/2024] [Indexed: 06/13/2024] Open
Abstract
The study of immunology, traditionally reliant on proteomics to evaluate individual immune cells, has been revolutionized by single-cell RNA sequencing. Computational immunologists play a crucial role in analysing these datasets, moving beyond traditional protein marker identification to encompass a more detailed view of cellular phenotypes and their functional roles. Recent technological advancements allow the simultaneous measurement of multiple cellular components (transcriptome, proteome, chromatin, epigenetic modifications and metabolites) within single cells, including in spatial contexts within tissues. This has led to the generation of complex multiscale datasets that can include multimodal measurements from the same cells or a mix of paired and unpaired modalities. Modern machine learning (ML) techniques allow for the integration of multiple "omics" data without the need for extensive independent modelling of each modality. This review focuses on recent advancements in ML integrative approaches applied to immunological studies. We highlight the importance of these methods in creating a unified representation of multiscale data collections, particularly for single-cell and spatial profiling technologies. Finally, we discuss the challenges of these holistic approaches and how they will be instrumental in the development of a common coordinate framework for multiscale studies, thereby accelerating research and enabling discoveries in the computational immunology field.
Affiliation(s)
- Fabiola Curion
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
- Fabian J Theis
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany.
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany.
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany.
4
Braytee A, He S, Tang S, Sun Y, Jiang X, Yu X, Khatri I, Chaturvedi K, Prasad M, Anaissi A. Identification of cancer risk groups through multi-omics integration using autoencoder and tensor analysis. Sci Rep 2024; 14:11263. [PMID: 38760420 PMCID: PMC11101416 DOI: 10.1038/s41598-024-59670-8] [Received: 12/18/2023] [Accepted: 04/12/2024] [Indexed: 05/19/2024] Open
Abstract
Identifying cancer risk groups from multi-omics data has attracted researchers seeking biomarkers across diverse risk-related omics. Stratifying patients into cancer risk groups using genomics is essential for clinicians to plan preventive treatment, improve patient survival time, and identify appropriate therapy strategies. This study proposes a multi-omics framework that can extract features from various omics simultaneously. The framework employs autoencoders to learn a non-linear representation of the data and applies tensor analysis for feature learning. Further, a clustering method is used to stratify the patients into multiple cancer risk groups. Several omics were included in the experiments, namely methylation, somatic copy-number variation (SCNV), micro RNA (miRNA) and RNA sequencing (RNAseq), from two cancer types, Glioma and Breast Invasive Carcinoma, in the TCGA dataset. The results of this study are promising, as evidenced by the survival analysis and classification models, which outperformed the state of the art. The patients can be divided into significantly different risk groups (p-value < 0.05) using latent variables extracted from the fused multi-omics data. The pipeline is open source to help researchers and clinicians identify patients' risk groups using genomics.
Affiliation(s)
- Ali Braytee
- School of Computer Science, University of Technology Sydney, Ultimo, 2007, Australia.
- Sam He
- School of Computer Science, The University of Sydney, Camperdown, 2006, Australia
- Shuxian Tang
- School of Computer Science, The University of Sydney, Camperdown, 2006, Australia
- Yuxuan Sun
- School of Computer Science, The University of Sydney, Camperdown, 2006, Australia
- Xiaoying Jiang
- School of Computer Science, The University of Sydney, Camperdown, 2006, Australia
- Xuanding Yu
- School of Computer Science, The University of Sydney, Camperdown, 2006, Australia
- Inder Khatri
- Department of Applied Mathematics, Delhi Technological University, Delhi, 110042, India
- Kunal Chaturvedi
- School of Computer Science, University of Technology Sydney, Ultimo, 2007, Australia
- Mukesh Prasad
- School of Computer Science, University of Technology Sydney, Ultimo, 2007, Australia
- Ali Anaissi
- School of Computer Science, The University of Sydney, Camperdown, 2006, Australia
- TD School, University of Technology Sydney, Ultimo, 2007, Australia
5
Lac L, Leung CK, Hu P. Computational frameworks integrating deep learning and statistical models in mining multimodal omics data. J Biomed Inform 2024; 152:104629. [PMID: 38552994 DOI: 10.1016/j.jbi.2024.104629] [Received: 01/03/2024] [Revised: 02/26/2024] [Accepted: 03/25/2024] [Indexed: 04/04/2024]
Abstract
BACKGROUND In health research, multimodal omics data analysis is widely used to address important clinical and biological questions. Traditional statistical methods rely on strong distributional assumptions. Statistical methods such as hypothesis testing and differential expression analysis are commonly used in omics analysis. Deep learning, on the other hand, is an advanced computer science technique that is powerful in mining high-dimensional omics data for prediction tasks. Recently, integrative frameworks or methods that combine statistical models and deep learning algorithms have been developed for omics studies. METHODS AND RESULTS The aim of these integrative frameworks is to combine the strengths of both statistical methods and deep learning algorithms to improve prediction accuracy while also providing interpretability and explainability. This review discusses the current state-of-the-art integrative frameworks, their limitations, and potential future directions in survival and time-to-event longitudinal analysis, dimension reduction and clustering, regression and classification, feature selection, and causal and transfer learning.
Affiliation(s)
- Leann Lac
- Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Statistics, University of Manitoba, Winnipeg, Manitoba, Canada
- Carson K Leung
- Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada
- Pingzhao Hu
- Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Biochemistry, Western University, London, Ontario, Canada; Department of Computer Science, Western University, London, Ontario, Canada; Department of Oncology, Western University, London, Ontario, Canada; Department of Epidemiology and Biostatistics, Western University, London, Ontario, Canada; The Children's Health Research Institute, Lawson Health Research Institute, London, Ontario, Canada.
6
Tanvir RB, Islam MM, Sobhan M, Luo D, Mondal AM. MOGAT: A Multi-Omics Integration Framework Using Graph Attention Networks for Cancer Subtype Prediction. Int J Mol Sci 2024; 25:2788. [PMID: 38474033 DOI: 10.3390/ijms25052788] [Received: 10/01/2023] [Revised: 02/15/2024] [Accepted: 02/22/2024] [Indexed: 03/14/2024] Open
Abstract
Accurate cancer subtype prediction is crucial for personalized medicine. Integrating multi-omics data represents a viable approach to comprehending the intricate pathophysiology of complex diseases like cancer. Conventional machine learning techniques are not ideal for analyzing the complex interrelationships among different categories of omics data. Numerous graph-based learning models have been proposed to uncover hidden representations and network structures unique to each type of omics data, with the aim of improving cancer prediction and patient profiling, among other applications intended to improve disease management. The existing graph-based state-of-the-art multi-omics integration approaches for cancer subtype prediction, MOGONET and SUPREME, use a graph convolutional network (GCN), which fails to consider the level of importance of neighboring nodes on a particular node. To address this gap, we hypothesize that paying attention to each neighbor or providing appropriate weights to neighbors based on their importance might improve the cancer subtype prediction. The natural choice to determine the importance of each neighbor of a node in a graph is to explore the graph attention network (GAT). Here, we propose MOGAT, a novel multi-omics integration approach, leveraging GAT models that incorporate graph-based learning with an attention mechanism. MOGAT utilizes a multi-head attention mechanism to extract appropriate information for a specific sample by assigning unique attention coefficients to neighboring samples. To our knowledge, our group is the first to explore GAT in multi-omics integration for cancer subtype prediction. To evaluate the performance of MOGAT in predicting cancer subtypes, we explored two sets of breast cancer data from TCGA and METABRIC. Our proposed approach, MOGAT, outperforms MOGONET by 32% to 46% and SUPREME by 2% to 16% in cancer subtype prediction in different scenarios, supporting our hypothesis. Our results also showed that GAT embeddings provide a better prognosis in differentiating the high-risk group from the low-risk group than raw features.
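The key idea in the abstract above is that each sample weights its neighbors by learned attention coefficients instead of treating them equally. The following is a minimal single-head sketch of the standard graph attention computation (a score from a LeakyReLU over concatenated projected features, then a softmax over the neighborhood). It is not the MOGAT implementation: the weight matrix, attention vector, graph, and feature values are hypothetical, and the multi-head concatenation and output nonlinearity are omitted for brevity.

```python
import math


def matvec(weight, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_coefficients(features, neighbors, weight, attn, i):
    """Softmax-normalized attention of node i over the nodes in neighbors[i]."""
    z_i = matvec(weight, features[i])
    raw = {}
    for j in neighbors[i]:
        z_j = matvec(weight, features[j])
        # Score the concatenation [z_i || z_j] with the attention vector.
        raw[j] = leaky_relu(sum(a * c for a, c in zip(attn, z_i + z_j)))
    peak = max(raw.values())  # subtract the max for numerical stability
    exps = {j: math.exp(score - peak) for j, score in raw.items()}
    total = sum(exps.values())
    return {j: value / total for j, value in exps.items()}


def attend(features, neighbors, weight, attn, i):
    """New embedding of node i: attention-weighted sum of projected neighbors."""
    alpha = attention_coefficients(features, neighbors, weight, attn, i)
    out = [0.0] * len(weight)
    for j, a_ij in alpha.items():
        z_j = matvec(weight, features[j])
        for k in range(len(out)):
            out[k] += a_ij * z_j[k]
    return out


# Hypothetical toy graph: three samples, node 0 attends to itself and two others.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [0, 1, 2]}
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for readability
attn = [0.0, 0.0, 1.0, 0.0]        # scores depend only on the neighbor's first feature

alpha = attention_coefficients(features, neighbors, weight, attn, 0)
embedding = attend(features, neighbors, weight, attn, 0)
print(alpha, embedding)
```

With these toy values, neighbors 0 and 2 receive equal weight and neighbor 1 receives less, which is the behavior a GCN with fixed normalization cannot express.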
Affiliation(s)
- Raihanul Bari Tanvir
- Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Md Mezbahul Islam
- Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Masrur Sobhan
- Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Dongsheng Luo
- Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Ananda Mohan Mondal
- Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
7
Mardoc E, Sow MD, Déjean S, Salse J. Genomic data integration tutorial, a plant case study. BMC Genomics 2024; 25:66. [PMID: 38233804 PMCID: PMC10792847 DOI: 10.1186/s12864-023-09833-0] [Received: 07/21/2023] [Accepted: 11/22/2023] [Indexed: 01/19/2024] Open
Abstract
BACKGROUND The ongoing evolution of Next Generation Sequencing (NGS) technologies has led to the production of genomic data on a massive scale. While tools for genomic data integration and analysis are becoming increasingly available, the conceptual and analytical complexities still represent a great challenge in many biological contexts. RESULTS To address this issue, we describe a six-step tutorial on best practices in genomic data integration, consisting of (1) designing a data matrix; (2) formulating a specific biological question toward data description, selection and prediction; (3) selecting a tool adapted to the targeted questions; (4) preprocessing the data; (5) conducting preliminary analysis; and finally (6) executing genomic data integration. CONCLUSION The tutorial has been tested and demonstrated on publicly available genomic data generated from poplar (Populus L.), a woody plant model. We also developed a new graphical output for the unsupervised multi-block analysis, cimDiablo_v2, available at https://forgemia.inra.fr/umr-gdec/omics-integration-on-poplar, which allows the selection of master drivers in genomic data variation and interplay.
Affiliation(s)
- Emile Mardoc
- UCA-INRAE UMR 1095 Genetics, Diversity and Ecophysiology of Cereals (GDEC), 5 Chemin de Beaulieu, 63000, Clermont-Ferrand, France
- Mamadou Dia Sow
- UCA-INRAE UMR 1095 Genetics, Diversity and Ecophysiology of Cereals (GDEC), 5 Chemin de Beaulieu, 63000, Clermont-Ferrand, France
- Sébastien Déjean
- Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, CNRS, Université Paul Sabatier, Toulouse, France
- Jérôme Salse
- UCA-INRAE UMR 1095 Genetics, Diversity and Ecophysiology of Cereals (GDEC), 5 Chemin de Beaulieu, 63000, Clermont-Ferrand, France.
8
Zhang W, Zhang M, Sun M, Hu M, Yu M, Sun J, Zhang X, Du B. Metabolomics-transcriptomics joint analysis: unveiling the dysregulated cell death network and developing a diagnostic model for high-grade neuroblastoma. Front Immunol 2024; 14:1345734. [PMID: 38239355 PMCID: PMC10794662 DOI: 10.3389/fimmu.2023.1345734] [Received: 11/28/2023] [Accepted: 12/14/2023] [Indexed: 01/22/2024] Open
Abstract
High-grade neuroblastoma (HG-NB) exhibits a significantly lower survival rate than low-grade neuroblastoma (LG-NB), primarily because the mechanism of HG-NB is unclear and effective therapeutic targets and diagnostic models are lacking. Therefore, the current investigation aims to study the dysregulated network between HG-NB and LG-NB based on joint transcriptomics and metabolomics analysis. A risk diagnostic model to distinguish HG-NB from LG-NB was also developed. Metabolomics analysis was conducted using plasma samples obtained from 48 HG-NB patients and 36 LG-NB patients. A total of 39 metabolites exhibited alterations, with 20 showing an increase and 19 displaying a decrease in HG-NB. Additionally, transcriptomics analysis was performed on NB tissue samples collected from 31 HG-NB patients and 20 LG-NB patients. A total of 1,199 mRNAs were significantly altered in HG-NB, of which 893 were upregulated and the remaining 306 were downregulated. In particular, the joint analysis of both omics data revealed three aberrant pathways, namely the cAMP signaling pathway, PI3K-Akt signaling pathway, and TNF signaling pathway, which were found to be associated with cell death. Notably, a diagnostic model for HG-NB risk classification was developed based on the genes MGST1, SERPINE1, and ERBB3, with an area under the receiver operating characteristic curve of 0.915. In the validation set, the sensitivity and specificity were determined to be 75.0% and 80.0%, respectively.
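The diagnostic model above is summarized by three quantities: area under the ROC curve, sensitivity, and specificity. As a reminder of how these are defined, here is a minimal sketch using a pairwise (Mann-Whitney) estimate of the AUC and a fixed decision threshold. The risk scores, labels, and the 0.5 threshold are hypothetical toy values and do not come from the study, whose model and validation-set counts are not given in the abstract.

```python
def roc_auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity_specificity(scores_pos, scores_neg, threshold):
    """Classify score >= threshold as positive; return (sensitivity, specificity)."""
    true_pos = sum(1 for s in scores_pos if s >= threshold)
    true_neg = sum(1 for s in scores_neg if s < threshold)
    return true_pos / len(scores_pos), true_neg / len(scores_neg)


# Hypothetical risk scores: positives stand in for HG-NB, negatives for LG-NB.
hg_scores = [0.9, 0.8, 0.4, 0.3]
lg_scores = [0.1, 0.2, 0.35, 0.7]

auc = roc_auc(hg_scores, lg_scores)
sens, spec = sensitivity_specificity(hg_scores, lg_scores, threshold=0.5)
print(auc, sens, spec)
```

The AUC is threshold-free, while sensitivity and specificity depend on the chosen cutoff, which is why a model can report a high AUC alongside more modest sensitivity and specificity at one operating point.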
Affiliation(s)
- Wancun Zhang
- Health Commission of Henan Province Key Laboratory for Precision Diagnosis and Treatment of Pediatric Tumor, Children’s Hospital Affiliated to Zhengzhou University, Zhengzhou, China
- Henan International Joint Laboratory for Prevention and Treatment of Pediatric Disease, Children’s Hospital Affiliated to Zhengzhou University, Zhengzhou, China
- Henan Key Laboratory of Children’s Genetics and Metabolic Diseases, Children’s Hospital Affiliated to Zhengzhou University, Zhengzhou, China
- Mengxin Zhang
- Health Commission of Henan Province Key Laboratory for Precision Diagnosis and Treatment of Pediatric Tumor, Children’s Hospital Affiliated to Zhengzhou University, Zhengzhou, China
- Meng Sun
- Henan Key Laboratory of Children’s Genetics and Metabolic Diseases, Children’s Hospital Affiliated to Zhengzhou University, Zhengzhou, China
- Minghui Hu
- Health Commission of Henan Province Key Laboratory for Precision Diagnosis and Treatment of Pediatric Tumor, Children’s Hospital Affiliated to Zhengzhou University, Zhengzhou, China
- Muchun Yu
- Henan International Joint Laboratory for Prevention and Treatment of Pediatric Disease, Children’s Hospital Affiliated to Zhengzhou University, Zhengzhou, China
- Jushan Sun
- Health Commission of Henan Province Key Laboratory for Precision Diagnosis and Treatment of Pediatric Tumor, Children’s Hospital Affiliated to Zhengzhou University, Zhengzhou, China
- Xianwei Zhang
- Health Commission of Henan Province Key Laboratory for Precision Diagnosis and Treatment of Pediatric Tumor, Children’s Hospital Affiliated to Zhengzhou University, Zhengzhou, China
- Bang Du
- Health Commission of Henan Province Key Laboratory for Precision Diagnosis and Treatment of Pediatric Tumor, Children’s Hospital Affiliated to Zhengzhou University, Zhengzhou, China
- Henan International Joint Laboratory for Prevention and Treatment of Pediatric Disease, Children’s Hospital Affiliated to Zhengzhou University, Zhengzhou, China
- Henan Key Laboratory of Children’s Genetics and Metabolic Diseases, Children’s Hospital Affiliated to Zhengzhou University, Zhengzhou, China
9
Li Z, Melograna F, Hoskens H, Duroux D, Marazita ML, Walsh S, Weinberg SM, Shriver MD, Müller-Myhsok B, Claes P, Van Steen K. netMUG: a novel network-guided multi-view clustering workflow for dissecting genetic and facial heterogeneity. Front Genet 2023; 14:1286800. [PMID: 38125750 PMCID: PMC10731261 DOI: 10.3389/fgene.2023.1286800] [Received: 08/31/2023] [Accepted: 11/14/2023] [Indexed: 12/23/2023] Open
Abstract
Introduction: Multi-view data offer advantages over single-view data for characterizing individuals, which is crucial in precision medicine toward personalized prevention, diagnosis, or treatment follow-up. Methods: Here, we develop a network-guided multi-view clustering framework named netMUG to identify actionable subgroups of individuals. This pipeline first adopts sparse multiple canonical correlation analysis to select multi-view features possibly informed by extraneous data, which are then used to construct individual-specific networks (ISNs). Finally, the individual subtypes are automatically derived by hierarchical clustering on these network representations. Results: We applied netMUG to a dataset containing genomic data and facial images to obtain BMI-informed multi-view strata and showed how it could be used for a refined obesity characterization. Benchmark analysis of netMUG on synthetic data with known strata of individuals indicated its superior performance compared with both baseline and benchmark methods for multi-view clustering. The clustering derived from netMUG achieved an adjusted Rand index of 1 with respect to the synthesized true labels. In addition, the real-data analysis revealed subgroups strongly linked to BMI and genetic and facial determinants of these subgroups. Discussion: netMUG provides a powerful strategy, exploiting individual-specific networks to identify meaningful and actionable strata. Moreover, the implementation is easy to generalize to accommodate heterogeneous data sources or highlight data structures.
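The benchmark result above is reported as an adjusted Rand index (ARI) of 1 against the synthetic true labels. For readers unfamiliar with the metric, the sketch below computes the ARI from the pair-counting contingency table using only the standard library. It assumes at least two samples, and the label vectors are hypothetical toy data, not the netMUG benchmark.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two partitions of the same samples."""
    n = len(labels_true)  # assumed to be at least 2
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(count, 2) for count in cells.values())
    sum_rows = sum(comb(count, 2) for count in Counter(labels_true).values())
    sum_cols = sum(comb(count, 2) for count in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions put everything in one cluster
    return (sum_cells - expected) / (max_index - expected)


# Same partition under different cluster names: perfect agreement.
ari_perfect = adjusted_rand_index([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1])
# Partitions that cut across each other: worse than chance.
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(ari_perfect, ari_crossed)
```

The first call illustrates that the ARI ignores how clusters are named, so a value of 1 means the recovered strata match the true strata exactly up to relabeling.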
Affiliation(s)
- Zuqi Li
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Federico Melograna
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
- Hanne Hoskens
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Diane Duroux
- BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
- Mary L. Marazita
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, United States
- Susan Walsh
- Department of Biology, Indiana University Indianapolis, Indianapolis, IN, United States
- Seth M. Weinberg
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, United States
- Mark D. Shriver
- Department of Anthropology, Pennsylvania State University, State College, PA, United States
- Peter Claes
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Department of Electrical Engineering, KU Leuven, Leuven, Belgium
- Murdoch Children’s Research Institute, Melbourne, VIC, Australia
- Kristel Van Steen
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
- BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
10
Gammaldi N, Pezzini F, Michelucci E, Di Giorgi N, Simonati A, Rocchiccioli S, Santorelli FM, Doccini S. Integrative human and murine multi-omics: Highlighting shared biomarkers in the neuronal ceroid lipofuscinoses. Neurobiol Dis 2023; 189:106349. [PMID: 37952681 DOI: 10.1016/j.nbd.2023.106349] [Received: 10/03/2023] [Revised: 11/08/2023] [Accepted: 11/09/2023] [Indexed: 11/14/2023] Open
Abstract
Neuronal ceroid lipofuscinosis (NCL) is a group of neurodegenerative disorders whose molecular mechanisms remain largely unknown. Omics approaches are among the methods that generate new information on modifying factors and molecular signatures. Moreover, omics data integration can address the need to progressively expand knowledge around the disease and pinpoint specific proteins to promote as candidate biomarkers. In this work, we integrated a total of 62 proteomic and transcriptomic datasets originating from humans and mice, employing a new approach able to define dysregulated processes across species, stages and NCL forms. Moreover, we selected a pool of differentially expressed proteins and genes as species- and form-related biomarkers of disease status/progression and evaluated local and spatial differences in most affected brain regions. Our results offer promising targets for potential new therapeutic strategies and reinforce the hypothesis of a connection between NCLs and other forms of dementia, particularly Alzheimer's disease.
Affiliation(s)
- N Gammaldi
- Department of Neurosciences, Psychology, Drug Research and Child Health (NEUROFARBA), University of Florence, Florence, Italy; Molecular Medicine for Neurodegenerative and Neuromuscular Diseases Unit, IRCCS Stella Maris Foundation - Pisa, Italy
- F Pezzini
- Department of Surgery, Dentistry, Paediatrics and Gynaecology, University of Verona, Verona, Italy
- E Michelucci
- Clinical Physiology-National Research Council (IFC-CNR), Pisa, Italy
- N Di Giorgi
- Clinical Physiology-National Research Council (IFC-CNR), Pisa, Italy
- A Simonati
- Department of Surgery, Dentistry, Paediatrics and Gynaecology, University of Verona, Verona, Italy
- S Rocchiccioli
- Clinical Physiology-National Research Council (IFC-CNR), Pisa, Italy
- F M Santorelli
- Molecular Medicine for Neurodegenerative and Neuromuscular Diseases Unit, IRCCS Stella Maris Foundation - Pisa, Italy
- S Doccini
- Molecular Medicine for Neurodegenerative and Neuromuscular Diseases Unit, IRCCS Stella Maris Foundation - Pisa, Italy.
|
11
|
Reitz CJ, Kuzmanov U, Gramolini AO. Multi-omic analyses and network biology in cardiovascular disease. Proteomics 2023; 23:e2200289. [PMID: 37691071 DOI: 10.1002/pmic.202200289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Revised: 08/11/2023] [Accepted: 08/22/2023] [Indexed: 09/12/2023]
Abstract
Heart disease remains a leading cause of death in North America and worldwide. Despite advances in therapies, the chronic nature of cardiovascular diseases ultimately results in frequent hospitalizations and steady rates of mortality. Systems biology approaches have provided a new frontier toward unraveling the underlying mechanisms of cell, tissue, and organ dysfunction in disease. Mapping the complex networks of molecular functions across the genome, transcriptome, proteome, and metabolome has enormous potential to advance our understanding of cardiovascular disease, discover new disease biomarkers, and develop novel therapies. Computational workflows to interpret these data-intensive analyses as well as integration between different levels of interrogation remain important challenges in the advancement and application of systems biology-based analyses in cardiovascular research. This review will focus on summarizing the recent developments in network biology-level profiling in the heart, with particular emphasis on modeling of human heart failure. We will provide new perspectives on integration between different levels of large "omics" datasets, including integration of gene regulatory networks, protein-protein interactions, signaling networks, and metabolic networks in the heart.
Affiliation(s)
- Cristine J Reitz
- Department of Physiology, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Translational Biology and Engineering Program, Ted Rogers Centre for Heart Research, Toronto, Ontario, Canada
- Uros Kuzmanov
- Department of Physiology, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Translational Biology and Engineering Program, Ted Rogers Centre for Heart Research, Toronto, Ontario, Canada
- Anthony O Gramolini
- Department of Physiology, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Translational Biology and Engineering Program, Ted Rogers Centre for Heart Research, Toronto, Ontario, Canada
|
12
|
Huang L, Song M, Shen H, Hong H, Gong P, Deng HW, Zhang C. Deep Learning Methods for Omics Data Imputation. Biology 2023; 12:1313. [PMID: 37887023 PMCID: PMC10604785 DOI: 10.3390/biology12101313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 09/28/2023] [Accepted: 10/02/2023] [Indexed: 10/28/2023]
Abstract
One common problem in omics data analysis is missing values, which can arise due to various reasons, such as poor tissue quality and insufficient sample volumes. Instead of discarding missing values and related data, imputation approaches offer an alternative means of handling missing data. However, the imputation of missing omics data is a non-trivial task. Difficulties mainly come from high dimensionality, non-linear or non-monotonic relationships within features, technical variations introduced by sampling methods, sample heterogeneity, and the non-random missingness mechanism. Several advanced imputation methods, including deep learning-based methods, have been proposed to address these challenges. Due to its capability of modeling complex patterns and relationships in large and high-dimensional datasets, many researchers have adopted deep learning models to impute missing omics data. This review provides a comprehensive overview of the currently available deep learning-based methods for omics imputation from the perspective of deep generative model architectures such as autoencoder, variational autoencoder, generative adversarial networks, and Transformer, with an emphasis on multi-omics data imputation. In addition, this review also discusses the opportunities that deep learning brings and the challenges that it might face in this field.
Affiliation(s)
- Lei Huang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA
- Meng Song
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA
- Hui Shen
- Center for Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Ping Gong
- Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS 39180, USA
- Hong-Wen Deng
- Center for Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA
|
13
|
Velten B, Stegle O. Principles and challenges of modeling temporal and spatial omics data. Nat Methods 2023; 20:1462-1474. [PMID: 37710019 DOI: 10.1038/s41592-023-01992-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 05/06/2022] [Accepted: 07/31/2023] [Indexed: 09/16/2023]
Abstract
Studies with temporal or spatial resolution are crucial to understand the molecular dynamics and spatial dependencies underlying a biological process or system. With advances in high-throughput omic technologies, time- and space-resolved molecular measurements at scale are increasingly accessible, providing new opportunities to study the role of timing or structure in a wide range of biological questions. At the same time, analyses of the data being generated in the context of spatiotemporal studies entail new challenges that need to be considered, including the need to account for temporal and spatial dependencies and compare them across different scales, biological samples or conditions. In this Review, we provide an overview of common principles and challenges in the analysis of temporal and spatial omics data. We discuss statistical concepts to model temporal and spatial dependencies and highlight opportunities for adapting existing analysis methods to data with temporal and spatial dimensions.
Affiliation(s)
- Britta Velten
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
- Cellular Genetics Programme, Wellcome Sanger Institute, Hinxton, Cambridge, UK.
- Centre for Organismal Studies (COS) and Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany.
- Oliver Stegle
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
- Cellular Genetics Programme, Wellcome Sanger Institute, Hinxton, Cambridge, UK.
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
|
14
|
Athaya T, Ripan RC, Li X, Hu H. Multimodal deep learning approaches for single-cell multi-omics data integration. Brief Bioinform 2023; 24:bbad313. [PMID: 37651607 PMCID: PMC10516349 DOI: 10.1093/bib/bbad313] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/08/2023] [Revised: 06/23/2023] [Accepted: 07/18/2023] [Indexed: 09/02/2023]
Abstract
Integrating single-cell multi-omics data is a challenging task that has led to new insights into complex cellular systems. Various computational methods have been proposed to effectively integrate these rapidly accumulating datasets, including deep learning. However, despite the proven success of deep learning in integrating multi-omics data and its better performance over classical computational methods, there has been no systematic study of its application to single-cell multi-omics data integration. To fill this gap, we conducted a literature review to explore the use of multimodal deep learning techniques in single-cell multi-omics data integration, taking into account recent studies from multiple perspectives. Specifically, we first summarized different modalities found in single-cell multi-omics data. We then reviewed current deep learning techniques for processing multimodal data and categorized deep learning-based integration methods for single-cell multi-omics data according to data modality, deep learning architecture, fusion strategy, key tasks and downstream analysis. Finally, we provided insights into using these deep learning models to integrate multi-omics data and better understand single-cell biological mechanisms.
Affiliation(s)
- Tasbiraha Athaya
- Department of Computer Science, University of Central Florida, Orlando, Florida, United States of America
- Rony Chowdhury Ripan
- Department of Computer Science, University of Central Florida, Orlando, Florida, United States of America
- Xiaoman Li
- Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, Florida, United States of America
- Haiyan Hu
- Department of Computer Science, University of Central Florida, Orlando, Florida, United States of America
|
15
|
Ye X, Shang Y, Shi T, Zhang W, Sakurai T. Multi-omics clustering for cancer subtyping based on latent subspace learning. Comput Biol Med 2023; 164:107223. [PMID: 37490833 DOI: 10.1016/j.compbiomed.2023.107223] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/30/2023] [Revised: 06/07/2023] [Accepted: 06/30/2023] [Indexed: 07/27/2023]
Abstract
The increased availability of high-throughput technologies has enabled biomedical researchers to learn about disease etiology across multiple omics layers, which shows promise for improving cancer subtype identification. Many computational methods have been developed to perform clustering on multi-omics data; however, only a few of them are applicable to partial multi-omics data, in which some samples lack data for some omics types. In this study, we propose a novel multi-omics clustering method based on latent subspace learning (MCLS), which can handle missing multi-omics data during clustering. We utilize the data with complete omics to construct a latent subspace using PCA-based feature extraction and singular value decomposition (SVD). The data with incomplete multi-omics are then projected to the latent subspace, and spectral clustering is performed to find the clusters. The proposed MCLS method is evaluated on seven different cancer datasets on three levels of omics in both full and partial cases compared to several state-of-the-art methods. The experimental results show that the proposed MCLS method is more efficient and effective than the compared methods for cancer subtype identification in multi-omics data analysis, which provides important references to a comprehensive understanding of cancer and biological mechanisms. AVAILABILITY: The proposed method is freely accessible at https://github.com/ShangCS/MCLS.
Affiliation(s)
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan; Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan.
- Yifan Shang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Tianyi Shi
- Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan
- Weihang Zhang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan; Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan
|
16
|
Shi Y, Wan J, Zhang X, Yin Y. CL-Impute: A contrastive learning-based imputation for dropout single-cell RNA-seq data. Comput Biol Med 2023; 164:107263. [PMID: 37531858 DOI: 10.1016/j.compbiomed.2023.107263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 06/27/2023] [Accepted: 07/16/2023] [Indexed: 08/04/2023]
Abstract
BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) technology has revolutionized the study of cell heterogeneity and biological interpretation at the single-cell level. However, the dropout events commonly present in scRNA-seq data can markedly reduce the reliability of downstream analysis. Existing imputation methods often overlook the discrepancy between the established cell relationship from dropout noisy data and reality, which limits their performances due to the learned untrustworthy cell representations. METHOD: Here, we propose a novel approach called the CL-Impute (Contrastive Learning-based Impute) model for estimating missing genes without relying on preconstructed cell relationships. CL-Impute utilizes contrastive learning and a self-attention network to address this challenge. Specifically, the proposed CL-Impute model leverages contrastive learning to learn cell representations from the self-perspective of dropout events, whereas the self-attention network captures cell relationships from the global-perspective. RESULTS: Experimental results on four benchmark datasets, including quantitative assessment, cell clustering, gene identification, and trajectory inference, demonstrate the superior performance of CL-Impute compared with that of existing state-of-the-art imputation methods. Furthermore, our experiment reveals that combining contrastive learning and masking cell augmentation enables the model to learn actual latent features from noisy data with a high rate of dropout events, enhancing the reliability of imputed values. CONCLUSIONS: CL-Impute is a novel contrastive learning-based method to impute scRNA-seq data in the context of high dropout rate. The source code of CL-Impute is available at https://github.com/yuchen21-web/Imputation-for-scRNA-seq.
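The contrastive idea in this abstract can be illustrated with a generic InfoNCE-style loss. This is only a sketch under stated assumptions and not CL-Impute's implementation, which lives in the repository linked above: the function names, the temperature value, and the toy two-dimensional "embeddings" are all hypothetical. It shows that the loss is small when the embedding of a cell and the embedding of its masked copy agree, and larger when they do not.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """Generic InfoNCE loss for one anchor (illustrative, not CL-Impute's loss).

    anchor    -- embedding of a cell
    positive  -- embedding of an augmented (e.g. masked) view of the same cell
    negatives -- embeddings of other cells
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings (hypothetical values chosen for illustration only).
anchor = [1.0, 0.0]
aligned = [0.9, 0.1]      # masked view that still resembles the anchor
unrelated = [0.0, 1.0]    # view that no longer resembles the anchor
negatives = [[-1.0, 0.0], [0.0, -1.0]]

loss_aligned = info_nce(anchor, aligned, negatives)
loss_unrelated = info_nce(anchor, unrelated, negatives)
print(loss_aligned < loss_unrelated)
```

Minimizing such a loss pulls the two views of a cell together and pushes other cells away, which is the general mechanism by which contrastive training can learn representations that are robust to dropout-like masking.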
Affiliation(s)
- Yuchen Shi
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China; Key Laboratory of Complex Systems Modeling and Simulation Ministry of Education, Ministry of Education, China
- Jian Wan
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China; School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, 310023, China
- Xin Zhang
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China; Key Laboratory of Complex Systems Modeling and Simulation Ministry of Education, Ministry of Education, China.
- Yuyu Yin
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China; Key Laboratory of Complex Systems Modeling and Simulation Ministry of Education, Ministry of Education, China.
|
17
|
Goh WWB, Hui HWH, Wong L. How missing value imputation is confounded with batch effects and what you can do about it. Drug Discov Today 2023; 28:103661. [PMID: 37301250 DOI: 10.1016/j.drudis.2023.103661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 05/31/2023] [Accepted: 06/05/2023] [Indexed: 06/12/2023]
Abstract
In data-processing pipelines, upstream steps can influence downstream processes because of their sequential nature. Among these data-processing steps, batch effect (BE) correction (BEC) and missing value imputation (MVI) are crucial for ensuring data suitability for advanced modeling and reducing the likelihood of false discoveries. Although BEC-MVI interactions are not well studied, they are ultimately interdependent. Batch sensitization can improve the quality of MVI. Conversely, accounting for missingness also improves proper BE estimation in BEC. Here, we discuss how BEC and MVI are interconnected and interdependent. We show how batch sensitization can improve any MVI and bring attention to the idea of BE-associated missing values (BEAMs). Finally, we discuss how batch-class imbalance problems can be mitigated by borrowing ideas from machine learning.
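One plausible reading of "batch sensitization" is that a missing value is filled using only observed values from the same batch, rather than from the whole dataset. The sketch below contrasts the two with plain mean imputation. The function names and toy values are hypothetical, and mean imputation is a stand-in assumption rather than the method the authors evaluate.

```python
from collections import defaultdict
from statistics import mean


def impute_global_mean(values):
    """Fill missing entries (None) with the mean of all observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_batch_mean(values, batches):
    """Fill missing entries with the mean of observed values in the same batch.

    Falls back to the global mean if a batch has no observed values.
    """
    observed_by_batch = defaultdict(list)
    for v, b in zip(values, batches):
        if v is not None:
            observed_by_batch[b].append(v)
    global_fill = mean(v for v in values if v is not None)

    imputed = []
    for v, b in zip(values, batches):
        if v is not None:
            imputed.append(v)
        elif observed_by_batch[b]:
            imputed.append(mean(observed_by_batch[b]))
        else:
            imputed.append(global_fill)
    return imputed


# One feature measured in two batches with a strong batch shift (toy data).
values = [1.0, 3.0, None, 10.0, None, 14.0]
batches = ["A", "A", "A", "B", "B", "B"]

print(impute_global_mean(values))          # fills both gaps with 7.0
print(impute_batch_mean(values, batches))  # fills with 2.0 (batch A) and 12.0 (batch B)
```

With a batch shift this large, the global fill of 7.0 sits between the two batches and resembles neither, which blurs the batch effect that a later correction step would need to estimate.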
Affiliation(s)
- Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore; Center for Biomedical Informatics, Nanyang Technological University, Singapore.
- Harvard Wai Hann Hui
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore
- Limsoon Wong
- Department of Computer Science, National University of Singapore, Singapore; Department of Pathology, National University of Singapore, Singapore.
|
18
|
Blutt SE, Coarfa C, Neu J, Pammi M. Multiomic Investigations into Lung Health and Disease. Microorganisms 2023; 11:2116. [PMID: 37630676 PMCID: PMC10459661 DOI: 10.3390/microorganisms11082116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 08/08/2023] [Accepted: 08/13/2023] [Indexed: 08/27/2023]
Abstract
Diseases of the lung account for more than 5 million deaths worldwide and are a healthcare burden. Improving clinical outcomes, including mortality and quality of life, involves a holistic understanding of the disease, which can be provided by the integration of lung multi-omics data. An enhanced understanding of comprehensive multiomic datasets provides opportunities to leverage those datasets to inform the treatment and prevention of lung diseases by classifying severity, prognostication, and discovery of biomarkers. The main objective of this review is to summarize the use of multiomics investigations in lung disease, including multiomics integration and the use of machine learning computational methods. This review also discusses lung disease models, including animal models, organoids, and single-cell lines, to study multiomics in lung health and disease. We provide examples of lung diseases where multi-omics investigations have provided deeper insight into etiopathogenesis and have resulted in improved preventative and therapeutic interventions.
Affiliation(s)
- Sarah E. Blutt
- Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Cristian Coarfa
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
- Josef Neu
- Department of Pediatrics, Section of Neonatology, University of Florida, Gainesville, FL 32611, USA
- Mohan Pammi
- Department of Pediatrics, Section of Neonatology, Baylor College of Medicine and Texas Children’s Hospital, Houston, TX 77030, USA
|
19
|
Wang XH, Zhang SF, Wu HY, Gao J, Wang L, Wang XH, Gao TH. miRNA326-5p Targets DKC1 Gene to Regulate Apoptosis-Related Proteins and Intervene in the Development of Neuroblastoma. Anal Cell Pathol (Amst) 2023; 2023:6761894. [PMID: 37426487 PMCID: PMC10329557 DOI: 10.1155/2023/6761894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Revised: 05/25/2023] [Accepted: 06/07/2023] [Indexed: 07/11/2023]
Abstract
Objective: To study the effect of congenital dyskeratosis 1 (DKC1) on neuroblastoma and its regulation mechanism. Methods: The expression of DKC1 in neuroblastoma was analyzed using the TCGA database and molecular assays. NB cells were transfected with siDKC1 to observe the effects of DKC1 on proliferation, cloning, metastasis, invasion, apoptosis, and apoptosis-related proteins. A tumor-bearing mouse model was constructed and shDKC1 was transfected to observe tumor growth and tumor tissue changes, and the expression of DKC1 and Ki-67 was detected. miRNA326-5p targeting DKC1 was screened and identified. NB cells were treated with miRNA326-5p mimic or inhibitors to detect the expression of DKC1. NB cells were transfected with miRNA326-5p and DKC1 mimics to detect cell proliferation, apoptosis, and apoptotic protein expression. Results: DKC1 was highly expressed in NB cells and tissues. The activity, proliferation, invasion, and migration of NB cells were significantly decreased by DKC1 gene knockout, while apoptosis was significantly increased. The expression level of B-cell lymphoma-2 in the shDKC1 group was significantly lower than that of the control group, while the expression levels of BAK, BAX, and caspase-3 were significantly higher than those of the control group. The results of experiments on tumor-bearing mice were consistent with the above results. The results of the miRNA assay showed that miRNA326-5p could bind DKC1 mRNA to inhibit the protein expression, thereby inhibiting the proliferation of NB cells, promoting their apoptosis, and regulating the expression of apoptotic proteins. Conclusion: miRNA326-5p targeting DKC1 mRNA regulates apoptosis-related proteins to inhibit neuroblastoma proliferation and promote the apoptotic process.
Affiliation(s)
- Xiao-Hui Wang
- Department of Pediatric Surgery, Henan Provincial People's Hospital, Zhengzhou 450000, China
- Shu-Feng Zhang
- Department of Pediatric Surgery, Henan Provincial People's Hospital, Zhengzhou 450000, China
- Hai-Ying Wu
- Department of Obstetrics, Henan Provincial People's Hospital, Zhengzhou 450000, China
- Jian Gao
- Department of Pediatric Surgery, Henan Provincial People's Hospital, Zhengzhou 450000, China
- Lin Wang
- Department of Pediatric Surgery, Henan Provincial People's Hospital, Zhengzhou 450000, China
- Xu-Hui Wang
- Department of Pediatric Surgery, Henan Provincial People's Hospital, Zhengzhou 450000, China
- Tian-Hui Gao
- Department of Medical Oncology, Henan Provincial People's Hospital, Zhengzhou 450000, China
|
20
|
Li Z, Melograna F, Hoskens H, Duroux D, Marazita ML, Walsh S, Weinberg SM, Shriver MD, Müller-Myhsok B, Claes P, Van Steen K. netMUG: a novel network-guided multi-view clustering workflow for dissecting genetic and facial heterogeneity. bioRxiv 2023:2023.05.04.539350. [PMID: 37205363 PMCID: PMC10187283 DOI: 10.1101/2023.05.04.539350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Multi-view data offer advantages over single-view data for characterizing individuals, which is crucial in precision medicine toward personalized prevention, diagnosis, or treatment follow-up. Here, we develop a network-guided multi-view clustering framework named netMUG to identify actionable subgroups of individuals. This pipeline first adopts sparse multiple canonical correlation analysis to select multi-view features possibly informed by extraneous data, which are then used to construct individual-specific networks (ISNs). Finally, the individual subtypes are automatically derived by hierarchical clustering on these network representations. We applied netMUG to a dataset containing genomic data and facial images to obtain BMI-informed multi-view strata and showed how it could be used for a refined obesity characterization. Benchmark analysis of netMUG on synthetic data with known strata of individuals indicated its superior performance compared with both baseline and benchmark methods for multi-view clustering. In addition, the real-data analysis revealed subgroups strongly linked to BMI and genetic and facial determinants of these classes. NetMUG provides a powerful strategy, exploiting individual-specific networks to identify meaningful and actionable strata. Moreover, the implementation is easy to generalize to accommodate heterogeneous data sources or highlight data structures.
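The individual-specific network (ISN) step can be sketched with a leave-one-out perturbation of a correlation edge: the edge for individual i is estimated from how much the population-level correlation changes when i is removed. This LIONESS-style formula is an assumption made here for illustration; the abstract does not state how netMUG computes its ISNs, and the function names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def individual_edges(x, y):
    """Leave-one-out edge weight between two features, one value per individual.

    Assumed formula: edge_i = n * (r_all - r_without_i) + r_without_i
    """
    n = len(x)
    r_all = pearson(x, y)
    edges = []
    for i in range(n):
        r_without_i = pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        edges.append(n * (r_all - r_without_i) + r_without_i)
    return edges


# Two selected features across five individuals; the last one breaks the trend.
feature_a = [1.0, 2.0, 3.0, 4.0, 5.0]
feature_b = [2.0, 4.0, 6.0, 8.0, 3.0]

edges = individual_edges(feature_a, feature_b)
print(edges.index(min(edges)))  # index of the individual with the most atypical edge
```

Repeating this for every pair of selected features gives each individual a vector of edge weights, and those vectors could then be passed to a hierarchical clustering routine to derive subgroups, in the spirit of the final step the abstract describes.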
Affiliation(s)
- Zuqi Li
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Hanne Hoskens
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Diane Duroux
- GIGA-R Medical Genomics, University of Liège, Liège, Belgium
- Mary L. Marazita
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA 15219, USA
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Susan Walsh
- Department of Biology, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
- Seth M. Weinberg
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA 15219, USA
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Mark D. Shriver
- Department of Anthropology, Pennsylvania State University, State College, PA 16801, USA
- Peter Claes
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium
- Murdoch Children’s Research Institute, Melbourne, Victoria, Australia
- Kristel Van Steen
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- GIGA-R Medical Genomics, University of Liège, Liège, Belgium
|
21
|
Wu Z, Lohmöller J, Kuhl C, Wehrle K, Jankowski J. Use of Computation Ecosystems to Analyze the Kidney-Heart Crosstalk. Circ Res 2023; 132:1084-1100. [PMID: 37053282 DOI: 10.1161/circresaha.123.321765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/15/2023]
Abstract
The identification of mediators for physiologic processes, correlation of molecular processes, or even pathophysiological processes within a single organ such as the kidney or heart has been extensively studied to answer specific research questions using organ-centered approaches in the past 50 years. However, it has become evident that these approaches do not adequately complement each other and display a distorted single-disease progression, lacking holistic multilevel/multidimensional correlations. Holistic approaches have become increasingly significant in understanding and uncovering high dimensional interactions and molecular overlaps between different organ systems in the pathophysiology of multimorbid and systemic diseases like cardiorenal syndrome because of pathological heart-kidney crosstalk. Holistic approaches to unraveling multimorbid diseases are based on the integration, merging, and correlation of extensive, heterogeneous, and multidimensional data from different data sources, both -omics and nonomics databases. These approaches aimed at generating viable and translatable disease models using mathematical, statistical, and computational tools, thereby creating first computational ecosystems. As part of these computational ecosystems, systems medicine solutions focus on the analysis of -omics data in single-organ diseases. However, the data-scientific requirements to address the complexity of multimodality and multimorbidity reach far beyond what is currently available and require multiphased and cross-sectional approaches. These approaches break down complexity into small and comprehensible challenges. Such holistic computational ecosystems encompass data, methods, processes, and interdisciplinary knowledge to manage the complexity of multiorgan crosstalk. 
Therefore, this review summarizes the current knowledge of kidney-heart crosstalk, along with methods and opportunities that arise from the novel application of computational ecosystems providing a holistic analysis on the example of kidney-heart crosstalk.
Affiliation(s)
- Zhuojun Wu
- Institute of Molecular Cardiovascular Research (Z.W., J.J.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
- Department of Radiology (C.K.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
- Johannes Lohmöller
- Medical Faculty, and Department of Computer Science, Communication and Distributed Systems (COMSYS) (J.L., K.W.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
- Christiane Kuhl
- Department of Radiology (C.K.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
- Klaus Wehrle
- Institute of Molecular Cardiovascular Research (Z.W., J.J.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
- Medical Faculty, and Department of Computer Science, Communication and Distributed Systems (COMSYS) (J.L., K.W.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
- Joachim Jankowski
- Institute of Molecular Cardiovascular Research (Z.W., J.J.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
- Department of Pathology, Cardiovascular Research Institute Maastricht (CARIM), University of Maastricht, The Netherlands (J.J.)
- Aachen-Maastricht Institute for Cardiorenal Disease (AMICARE), University Hospital Rheinisch-Westfälische Technische Hochschule Aachen, Germany (J.J.)
|
22
|
Mohammed MA, Abdulkareem KH, Dinar AM, Zapirain BG. Rise of Deep Learning Clinical Applications and Challenges in Omics Data: A Systematic Review. Diagnostics (Basel) 2023; 13:664. [PMID: 36832152 PMCID: PMC9955380 DOI: 10.3390/diagnostics13040664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2022] [Revised: 02/05/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023]
Abstract
This research aims to review and evaluate the most relevant scientific studies about deep learning (DL) models in the omics field. It also aims to fully realize the potential of DL techniques in omics data analysis by demonstrating this potential and identifying the key challenges that must be addressed. Surveying the existing literature reveals several elements that are essential for understanding these studies; for example, the clinical applications and datasets reported in the literature. The published literature also highlights the difficulties encountered by other researchers. In addition to looking for other studies, such as guidelines, comparative studies, and review papers, a systematic approach is used to search all relevant publications on omics and DL using different keyword variants. From 2018 to 2022, the search procedure was conducted on four Internet search engines: IEEE Xplore, Web of Science, ScienceDirect, and PubMed. These indexes were chosen because they offer enough coverage and linkages to numerous papers in the biological field. A total of 65 articles were added to the final list. The inclusion and exclusion criteria were specified. Of the 65 publications, 42 are clinical applications of DL in omics data. Furthermore, 16 out of 65 articles comprised the review publications based on single- and multi-omics data from the proposed taxonomy. Finally, only a small number of articles (7/65) were included in papers focusing on comparative analysis and guidelines. The use of DL in studying omics data presented several obstacles related to DL itself, preprocessing procedures, datasets, model validation, and testbed applications. Numerous relevant investigations were performed to address these issues. Unlike other review papers, our study distinctly reflects different observations on omics with DL model areas. We believe that the result of this study can be a useful guideline for practitioners who look for a comprehensive view of the role of DL in omics data analysis.
Affiliation(s)
- Mazin Abed Mohammed
- College of Computer Science and Information Technology, University of Anbar, Anbar 31001, Iraq
- eVIDA Lab, University of Deusto, 48007 Bilbao, Spain
- Correspondence: (M.A.M.); (B.G.Z.)
- Karrar Hameed Abdulkareem
- College of Agriculture, Al-Muthanna University, Samawah 66001, Iraq
- College of Engineering, University of Warith Al-Anbiyaa, Karbala 56001, Iraq
- Ahmed M. Dinar
- Computer Engineering Department, University of Technology-Iraq, Baghdad 19006, Iraq

23
Flores JE, Claborne DM, Weller ZD, Webb-Robertson BJM, Waters KM, Bramer LM. Missing data in multi-omics integration: Recent advances through artificial intelligence. Front Artif Intell 2023; 6:1098308. [PMID: 36844425 PMCID: PMC9949722 DOI: 10.3389/frai.2023.1098308] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 11/14/2022] [Accepted: 01/23/2023] [Indexed: 02/11/2023]
Abstract
Biological systems function through complex interactions between various 'omics (biomolecules), and a more complete understanding of these systems is only possible through an integrated, multi-omic perspective. This has presented the need for the development of integration approaches that are able to capture the complex, often non-linear, interactions that define these biological systems and are adapted to the challenges of combining the heterogeneous data across 'omic views. A principal challenge to multi-omic integration is missing data because not all biomolecules are measured in all samples. Due to either cost, instrument sensitivity, or other experimental factors, data for a biological sample may be missing for one or more 'omic technologies. Recent methodological developments in artificial intelligence and statistical learning have greatly facilitated the analyses of multi-omics data; however, many of these techniques assume access to completely observed data. A subset of these methods incorporate mechanisms for handling partially observed samples, and these methods are the focus of this review. We describe recently developed approaches, noting their primary use cases and highlighting each method's approach to handling missing data. We additionally provide an overview of the more traditional missing data workflows and their limitations; and we discuss potential avenues for further developments as well as how the missing data issue and its current solutions may generalize beyond the multi-omics context.
Affiliation(s)
- Javier E. Flores
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Daniel M. Claborne
- Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Zachary D. Weller
- Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Bobbie-Jo M. Webb-Robertson
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Katrina M. Waters
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Lisa M. Bramer
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States; *Correspondence: Lisa M. Bramer

24
Paquette AG, Lapehn S, Freije S, MacDonald J, Bammler T, Day DB, Loftus CT, Kannan K, Alex Mason W, Bush NR, LeWinn KZ, Enquobahrie DA, Marsit C, Sathyanarayana S. Placental transcriptomic signatures of prenatal exposure to hydroxy-polycyclic aromatic hydrocarbons. Environment International 2023; 172:107763. [PMID: 36689866 PMCID: PMC10211546 DOI: 10.1016/j.envint.2023.107763] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/05/2022] [Revised: 01/16/2023] [Accepted: 01/17/2023] [Indexed: 05/27/2023]
Abstract
BACKGROUND: Polycyclic aromatic hydrocarbons (PAHs) are ubiquitous pollutants originating from petrogenic and pyrogenic sources. PAH compounds can cross the placenta, and prenatal PAH exposure is linked to adverse infant and childhood health outcomes.
OBJECTIVE: In this first human transcriptomic assessment of PAHs in the placenta, we examined associations between prenatal PAH exposure and placental gene expression to gain insight into mechanisms by which PAHs may disrupt placental function.
METHODS: The ECHO PATHWAYS Consortium quantified prenatal PAH exposure and the placental transcriptome from 629 pregnant participants enrolled in the CANDLE study. Concentrations of 12 monohydroxy-PAH (OH-PAH) metabolites were measured in mid-pregnancy urine using high performance liquid chromatography tandem mass spectrometry. Placental transcriptomic data were obtained using paired-end RNA sequencing. Linear models were fitted to estimate covariate-adjusted associations between maternal urinary OH-PAHs and placental gene expression. We performed sex-stratified analyses to evaluate whether associations varied by fetal sex. Selected PAH/gene expression analyses were validated by treating HTR-8/SVneo cells with phenanthrene, and quantifying expression via qPCR.
RESULTS: Urinary concentrations of 6 OH-PAHs were associated with placental expression of 8 genes. Three biological pathways were associated with 4 OH-PAHs. Placental expression of SGF29 and TRIP13 as well as the vitamin digestion and absorption pathway were positively associated with multiple metabolites. HTR-8/SVneo cells treated with phenanthrene also exhibited 23% increased TRIP13 expression compared to vehicle controls (p = 0.04). Fetal sex may modify the relationship between prenatal OH-PAHs and placental gene expression, as more associations were identified in females than males (45 vs 28 associations).
DISCUSSION: Our study highlights novel genes whose placental expression may be disrupted by OH-PAHs. Increased expression of DNA damage repair gene TRIP13 may represent a response to double-stranded DNA breaks. Increased expression of genes involved in vitamin digestion and metabolism may reflect dietary exposures or represent a compensatory mechanism to combat damage related to OH-PAH toxicity. Further work is needed to study the role of these genes in placental function and their links to perinatal outcomes and lifelong health.
Affiliation(s)
- Alison G Paquette
- Seattle Children's Research Institute, Seattle, WA, USA; University of Washington, Seattle, WA, USA.
- Drew B Day
- Seattle Children's Research Institute, Seattle, WA, USA
- W Alex Mason
- University of Tennessee Health Sciences Center, Memphis, TN, USA
- Nicole R Bush
- University of California San Francisco, San Francisco, CA, USA
- Kaja Z LeWinn
- University of California San Francisco, San Francisco, CA, USA
- Sheela Sathyanarayana
- Seattle Children's Research Institute, Seattle, WA, USA; University of Washington, Seattle, WA, USA

25
Sun Q, Cheng L, Meng A, Ge S, Chen J, Zhang L, Gong P. SADLN: Self-attention based deep learning network of integrating multi-omics data for cancer subtype recognition. Front Genet 2023; 13:1032768. [PMID: 36685873 PMCID: PMC9846505 DOI: 10.3389/fgene.2022.1032768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 12/15/2022] [Indexed: 01/05/2023]
Abstract
Integrating multi-omics data for cancer subtype recognition is an important task in bioinformatics. Recently, deep learning has been applied to recognize the subtype of cancers. However, most existing studies integrate the multi-omics data simply by concatenating them into a single dataset and then learn a latent low-dimensional representation through a deep learning model, which does not account for the different distributions of the omics data. Moreover, these methods ignore the relationships among samples. To tackle these problems, we proposed SADLN: A self-attention based deep learning network of integrating multi-omics data for cancer subtype recognition. SADLN combined encoder, self-attention, decoder, and discriminator into a unified framework, which can not only integrate multi-omics data but also adaptively model the relationships among samples to learn an accurate latent low-dimensional representation. With the integrated representation learned from the network, SADLN used a Gaussian Mixture Model to identify cancer subtypes. Experiments on ten cancer datasets of TCGA demonstrated the advantages of SADLN compared to ten methods. The Self-Attention Based Deep Learning Network (SADLN) is an effective method of integrating multi-omics data for cancer subtype recognition.
Affiliation(s)
- Qiuwen Sun
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
- Lei Cheng
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
- Ao Meng
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
- Shuguang Ge
- School of Information and Control Engineering, University of Mining and Technology, Xuzhou, China
- Jie Chen
- Department of Radiation Oncology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Longzhen Zhang
- Department of Radiation Oncology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Ping Gong
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, China; *Correspondence: Ping Gong

26
Liao J, Li X, Gan Y, Han S, Rong P, Wang W, Li W, Zhou L. Artificial intelligence assists precision medicine in cancer treatment. Front Oncol 2023; 12:998222. [PMID: 36686757 PMCID: PMC9846804 DOI: 10.3389/fonc.2022.998222] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/19/2022] [Accepted: 11/22/2022] [Indexed: 01/06/2023]
Abstract
Cancer is a major medical problem worldwide. Due to its high heterogeneity, the use of the same drugs or surgical methods in patients with the same tumor may have different curative effects, leading to the need for more accurate treatment methods for tumors and personalized treatments for patients. The precise treatment of tumors is essential, which makes it urgent to obtain an in-depth understanding of the changes that tumors undergo, including changes in their genes, proteins and cancer cell phenotypes, in order to develop targeted treatment strategies for patients. Artificial intelligence (AI) based on big data can extract the hidden patterns, important information, and corresponding knowledge behind the enormous amount of data. For example, machine learning (ML) and deep learning, which are subsets of AI, can be used to mine the deep-level information in genomics, transcriptomics, proteomics, radiomics, digital pathological images, and other data, giving clinicians a comprehensive, integrated understanding of tumors. In addition, AI can find new biomarkers from data to assist tumor screening, detection, diagnosis, treatment and prognosis prediction, so as to provide the best treatment for individual patients and improve their clinical outcomes.
Affiliation(s)
- Jinzhuang Liao
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Xiaoying Li
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Yu Gan
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Shuangze Han
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Pengfei Rong
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China; Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China; *Correspondence: Pengfei Rong, Wei Wang, Wei Li, Li Zhou
- Wei Wang
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China; Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China; *Correspondence: Pengfei Rong, Wei Wang, Wei Li, Li Zhou
- Wei Li
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China; Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China; *Correspondence: Pengfei Rong, Wei Wang, Wei Li, Li Zhou
- Li Zhou
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China; Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China; Department of Pathology, The Xiangya Hospital of Central South University, Changsha, Hunan, China; *Correspondence: Pengfei Rong, Wei Wang, Wei Li, Li Zhou

27
Ermini L, Mallo D, Kleftogiannis D, Acar A. Editorial: Cancer evolution. Front Genet 2023; 14:1187687. [PMID: 37124613 PMCID: PMC10141315 DOI: 10.3389/fgene.2023.1187687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 04/05/2023] [Indexed: 05/02/2023]
Affiliation(s)
- Luca Ermini
- NORLUX NeuroOncology Laboratory, Department of Cancer Research, Luxembourg Institute of Health, Luxembourg, Luxembourg
- *Correspondence: Luca Ermini, Diego Mallo, Dimitrios Kleftogiannis, Ahmet Acar
- Diego Mallo
- Arizona Cancer Evolution Center, Biodesign Institute and School of Life Sciences, Arizona State University, Tempe, AZ, United States
- *Correspondence: Luca Ermini, Diego Mallo, Dimitrios Kleftogiannis, Ahmet Acar
- Dimitrios Kleftogiannis
- Department of Informatics, Computational Biology Unit and Centre for Cancer Biomarkers, University of Bergen, Bergen, Norway
- *Correspondence: Luca Ermini, Diego Mallo, Dimitrios Kleftogiannis, Ahmet Acar
- Ahmet Acar
- Department of Biological Sciences, Middle East Technical University, Universiteler Mah, Ankara, Turkiye
- *Correspondence: Luca Ermini, Diego Mallo, Dimitrios Kleftogiannis, Ahmet Acar

28
Tanvir Ahmed K, Cheng S, Li Q, Yong J, Zhang W. Incomplete time-series gene expression in integrative study for islet autoimmunity prediction. Brief Bioinform 2022; 24:6895461. [PMID: 36513375 PMCID: PMC9851333 DOI: 10.1093/bib/bbac537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Revised: 10/27/2022] [Accepted: 11/08/2022] [Indexed: 12/15/2022]
Abstract
Type 1 diabetes (T1D) outcome prediction plays a vital role in identifying novel risk factors, ensuring early patient care and designing cohort studies. TEDDY is a longitudinal cohort study that collects a vast amount of multi-omics and clinical data from its participants to explore the progression and markers of T1D. However, missing data in the omics profiles make the outcome prediction a difficult task. TEDDY collected time series gene expression for less than 6% of enrolled participants. Additionally, for the participants whose gene expressions are collected, 79% time steps are missing. This study introduces an advanced bioinformatics framework for gene expression imputation and islet autoimmunity (IA) prediction. The imputation model generates synthetic data for participants with partially or entirely missing gene expression. The prediction model integrates the synthetic gene expression with other risk factors to achieve better predictive performance. Comprehensive experiments on TEDDY datasets show that: (1) Our pipeline can effectively integrate synthetic gene expression with family history, HLA genotype and SNPs to better predict IA status at 2 years (sensitivity 0.622, AUC 0.715) compared with the individual datasets and state-of-the-art results in the literature (AUC 0.682). (2) The synthetic gene expression contains predictive signals as strong as the true gene expression, reducing reliance on expensive and long-term longitudinal data collection. (3) Time series gene expression is crucial to the proposed improvement and shows significantly better predictive ability than cross-sectional gene expression. (4) Our pipeline is robust to limited data availability. Availability: Code is available at https://github.com/compbiolabucf/TEDDY.
Affiliation(s)
- Sze Cheng
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota Twin Cities, Minneapolis, MN 55455, USA
- Qian Li
- Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Jeongsik Yong
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota Twin Cities, Minneapolis, MN 55455, USA
- Wei Zhang
- Corresponding author. Wei Zhang, Computer Science Department, University of Central Florida. Tel.: 407-823-2763

29
Kong W, Hui HWH, Peng H, Goh WWB. Dealing with missing values in proteomics data. Proteomics 2022; 22:e2200092. [PMID: 36349819 DOI: 10.1002/pmic.202200092] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 07/14/2022] [Revised: 09/15/2022] [Accepted: 10/11/2022] [Indexed: 11/10/2022]
Abstract
Proteomics data are often plagued with missingness issues. These missing values (MVs) threaten the integrity of subsequent statistical analyses by reduction of statistical power, introduction of bias, and failure to represent the true sample. Over the years, several categories of missing value imputation (MVI) methods have been developed and adapted for proteomics data. These MVI methods perform their tasks based on different prior assumptions (e.g., data is normally or independently distributed) and operating principles (e.g., the algorithm is built to address random missingness only), resulting in varying levels of performance even when dealing with the same dataset. Thus, to achieve a satisfactory outcome, a suitable MVI method must be selected. To guide decision making on suitable MVI method, we provide a decision chart which facilitates strategic considerations on datasets presenting different characteristics. We also bring attention to other issues that can impact proper MVI such as the presence of confounders (e.g., batch effects) which can influence MVI performance. Thus, these too, should be considered during or before MVI.
Affiliation(s)
- Weijia Kong
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Harvard Wai Hann Hui
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Hui Peng
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore, Singapore; Centre for Biomedical Informatics, Nanyang Technological University, Singapore, Singapore

30
Xu Y, Zhang X, Li H, Zheng H, Zhang J, Olsen MS, Varshney RK, Prasanna BM, Qian Q. Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. Molecular Plant 2022; 15:1664-1695. [PMID: 36081348 DOI: 10.1016/j.molp.2022.09.001] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 04/04/2022] [Revised: 08/20/2022] [Accepted: 09/02/2022] [Indexed: 05/12/2023]
Abstract
The first paradigm of plant breeding involves direct selection-based phenotypic observation, followed by predictive breeding using statistical models for quantitative traits constructed based on genetic experimental design and, more recently, by incorporation of molecular marker genotypes. However, plant performance or phenotype (P) is determined by the combined effects of genotype (G), envirotype (E), and genotype by environment interaction (GEI). Phenotypes can be predicted more precisely by training a model using data collected from multiple sources, including spatiotemporal omics (genomics, phenomics, and enviromics across time and space). Integration of 3D information profiles (G-P-E), each with multidimensionality, provides predictive breeding with both tremendous opportunities and great challenges. Here, we first review innovative technologies for predictive breeding. We then evaluate multidimensional information profiles that can be integrated with a predictive breeding strategy, particularly envirotypic data, which have largely been neglected in data collection and are nearly untouched in model construction. We propose a smart breeding scheme, integrated genomic-enviromic prediction (iGEP), as an extension of genomic prediction, using integrated multiomics information, big data technology, and artificial intelligence (mainly focused on machine and deep learning). We discuss how to implement iGEP, including spatiotemporal models, environmental indices, factorial and spatiotemporal structure of plant breeding data, and cross-species prediction. A strategy is then proposed for prediction-based crop redesign at both the macro (individual, population, and species) and micro (gene, metabolism, and network) scales. Finally, we provide perspectives on translating smart breeding into genetic gain through integrative breeding platforms and open-source breeding initiatives. We call for coordinated efforts in smart breeding through iGEP, institutional partnerships, and innovative technological support.
Affiliation(s)
- Yunbi Xu
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; CIMMYT-China Tropical Maize Research Center, School of Food Science and Engineering, Foshan University, Foshan, Guangdong 528231, China; Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China.
- Xingping Zhang
- Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
- Huihui Li
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan 572024, China
- Hongjian Zheng
- CIMMYT-China Specialty Maize Research Center, Shanghai Academy of Agricultural Sciences, Shanghai 201400, China
- Jianan Zhang
- MolBreeding Biotechnology Co., Ltd., Shijiazhuang, Hebei 050035, China
- Michael S Olsen
- CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
- Rajeev K Varshney
- State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch, Australia
- Boddupalli M Prasanna
- CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
- Qian Qian
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China

31
Song M, Greenbaum J, Luttrell J, Zhou W, Wu C, Luo Z, Qiu C, Zhao LJ, Su KJ, Tian Q, Shen H, Hong H, Gong P, Shi X, Deng HW, Zhang C. An autoencoder-based deep learning method for genotype imputation. Front Artif Intell 2022; 5:1028978. [PMID: 36406474 PMCID: PMC9671213 DOI: 10.3389/frai.2022.1028978] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/26/2022] [Accepted: 09/29/2022] [Indexed: 11/06/2022]
Abstract
Genotype imputation has a wide range of applications in genome-wide association study (GWAS), including increasing the statistical power of association tests, discovering trait-associated loci in meta-analyses, and prioritizing causal variants with fine-mapping. In recent years, deep learning (DL) based methods, such as sparse convolutional denoising autoencoder (SCDA), have been developed for genotype imputation. However, it remains a challenging task to optimize the learning process in DL-based methods to achieve high imputation accuracy. To address this challenge, we have developed a convolutional autoencoder (AE) model for genotype imputation and implemented a customized training loop by modifying the training process with a single batch loss rather than the average loss over batches. This modified AE imputation model was evaluated using a yeast dataset, the human leukocyte antigen (HLA) data from the 1,000 Genomes Project (1KGP), and our in-house genotype data from the Louisiana Osteoporosis Study (LOS). Our modified AE imputation model has achieved comparable or better performance than the existing SCDA model in terms of evaluation metrics such as the concordance rate (CR), the Hellinger score, the scaled Euclidean norm (SEN) score, and the imputation quality score (IQS) in all three datasets. Taking the imputation results from the HLA data as an example, the AE model achieved an average CR of 0.9468 and 0.9459, Hellinger score of 0.9765 and 0.9518, SEN score of 0.9977 and 0.9953, and IQS of 0.9515 and 0.9044 at missing ratios of 10% and 20%, respectively. As for the results of LOS data, it achieved an average CR of 0.9005, Hellinger score of 0.9384, SEN score of 0.9940, and IQS of 0.8681 at the missing ratio of 20%. In summary, our proposed method for genotype imputation has a great potential to increase the statistical power of GWAS and improve downstream post-GWAS analyses.
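As an illustration of how an evaluation at a given missing ratio can be set up, the following is a minimal sketch, not the authors' code. It assumes the concordance rate (CR) is the fraction of masked genotypes that are imputed correctly, which is a common definition; the abstract itself does not spell it out. The function names, the toy genotype vector, and the most-frequent-value baseline are all hypothetical stand-ins, and the baseline replaces the paper's autoencoder purely for demonstration.

```python
import random


def mask_genotypes(genotypes, missing_ratio, seed=0):
    """Hide a fraction of entries; return the masked copy and the hidden indices."""
    rng = random.Random(seed)
    n_hidden = round(len(genotypes) * missing_ratio)
    hidden = sorted(rng.sample(range(len(genotypes)), n_hidden))
    masked = list(genotypes)
    for i in hidden:
        masked[i] = None  # None marks a missing genotype
    return masked, hidden


def concordance_rate(truth, imputed, hidden):
    """Fraction of hidden positions where the imputed value equals the truth."""
    if not hidden:
        raise ValueError("no masked positions to evaluate")
    matches = sum(1 for i in hidden if truth[i] == imputed[i])
    return matches / len(hidden)


def impute_with_mode(masked):
    """Toy baseline (NOT the paper's autoencoder): fill gaps with the most common observed value."""
    observed = [g for g in masked if g is not None]
    mode = max(set(observed), key=observed.count)
    return [mode if g is None else g for g in masked]


# Hypothetical genotype vector coded as 0/1/2 allele counts.
truth = [0, 0, 1, 0, 2, 0, 0, 1, 0, 0]
masked, hidden = mask_genotypes(truth, missing_ratio=0.2)
imputed = impute_with_mode(masked)
cr = concordance_rate(truth, imputed, hidden)
print(f"CR on {len(hidden)} masked sites: {cr:.2f}")
```

Scoring only the masked positions keeps observed genotypes from inflating the result, which is why the hidden indices are returned alongside the masked data.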
Affiliation(s)
- Meng Song
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Jonathan Greenbaum
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Joseph Luttrell
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Weihua Zhou
- College of Computing, Michigan Technological University, Houghton, MI, United States
- Chong Wu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Zhe Luo
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Chuan Qiu
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Lan Juan Zhao
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Kuan-Jui Su
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Qing Tian
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Hui Shen
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, United States
- Ping Gong
- Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, United States
- Xinghua Shi
- Department of Computer & Information Sciences, Temple University, Philadelphia, PA, United States
- Hong-Wen Deng
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States; *Correspondence: Hong-Wen Deng
- Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States; Chaoyang Zhang

32
Ten simple rules for a successful international consortium in big data omics. PLoS Comput Biol 2022; 18:e1010546. [PMID: 36264838 PMCID: PMC9584380 DOI: 10.1371/journal.pcbi.1010546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]

33
Raufaste-Cazavieille V, Santiago R, Droit A. Multi-omics analysis: Paving the path toward achieving precision medicine in cancer treatment and immuno-oncology. Front Mol Biosci 2022; 9:962743. [PMID: 36304921 PMCID: PMC9595279 DOI: 10.3389/fmolb.2022.962743] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 06/06/2022] [Accepted: 09/21/2022] [Indexed: 11/13/2022]
Abstract
The acceleration of large-scale sequencing and the progress in high-throughput computational analyses, defined as omics, have been a hallmark for the comprehension of the biological processes in human health and diseases. In cancerology, the omics approach, initiated by genomics and transcriptomics studies, has revealed an incredible complexity with unsuspected molecular diversity within the same tumor type as well as spatial and temporal heterogeneity of tumors. The integration of multiple biological layers of omics studies brought oncology to a new paradigm, from tumor site classification to pan-cancer molecular classification, offering new therapeutic opportunities for precision medicine. In this review, we will provide a comprehensive overview of the latest innovations for multi-omics integration in oncology and summarize the largest multi-omics datasets available for adult and pediatric cancers. We will present multi-omics techniques for characterizing cancer biology and show how multi-omics data can be combined with clinical data for the identification of prognostic and treatment-specific biomarkers, opening the way to personalized therapy. To conclude, we will detail the newest strategies for dissecting the tumor immune environment and host–tumor interaction. We will explore the advances in immunomics and microbiomics for biomarker identification to guide therapeutic decisions in immuno-oncology.
Affiliation(s)
| | - Raoul Santiago
- CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Division of Pediatric Hematology-Oncology, Centre Hospitalier Universitaire de L’Université Laval, Charles Bruneau Cancer Center, Québec, QC, Canada
- *Correspondence: Raoul Santiago; Arnaud Droit
| | - Arnaud Droit
- CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- *Correspondence: Raoul Santiago; Arnaud Droit
|
34
|
Yang HT, Crawford DC, Abazeed ME. Editorial: Translating clinical genomics and health informatics into precision oncology. Front Genet 2022; 13:1029212. [PMID: 36263433 PMCID: PMC9574329 DOI: 10.3389/fgene.2022.1029212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Accepted: 09/20/2022] [Indexed: 12/02/2022]
Affiliation(s)
| | | | - Mohamed E. Abazeed
- Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Robert H. Lurie Comprehensive Cancer Center, Northwestern University, Chicago, IL, United States
- *Correspondence: Mohamed E. Abazeed
|
35
|
Hill C, Avila-Palencia I, Maxwell AP, Hunter RF, McKnight AJ. Harnessing the Full Potential of Multi-Omic Analyses to Advance the Study and Treatment of Chronic Kidney Disease. Front Nephrol 2022; 2:923068. [PMID: 37674991 PMCID: PMC10479694 DOI: 10.3389/fneph.2022.923068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Accepted: 05/30/2022] [Indexed: 09/08/2023]
Abstract
Chronic kidney disease (CKD) was the 12th leading cause of death globally in 2017, with the prevalence of CKD estimated at ~9%. Early detection and intervention for CKD may improve patient outcomes, but standard testing approaches, even in developed countries, do not facilitate identification of patients at high risk of developing CKD, nor of those progressing to end-stage kidney disease (ESKD). Recent advances in CKD research are moving towards a more personalised approach for CKD. Heritability for CKD ranges from 30% to 75%, yet identified genetic risk factors account for only a small proportion of the inherited contribution to CKD. More in-depth analysis of genomic sequencing data in large cohorts is revealing new genetic risk factors for common diagnoses of CKD and providing novel diagnoses for rare forms of CKD. Multi-omic approaches are now being harnessed to improve our understanding of CKD and explain some of the so-called 'missing heritability'. The most common omic analyses employed for CKD are genomics, epigenomics, transcriptomics, metabolomics, proteomics and phenomics. While each of these omics has been reviewed individually, considering integrated multi-omic analysis offers considerable scope to improve our understanding and treatment of CKD. This narrative review summarises current understanding of multi-omic research alongside recent experimental and analytical approaches, discusses current challenges and future perspectives, and offers new insights for CKD.
Affiliation(s)
| | | | | | | | - Amy Jayne McKnight
- Centre for Public Health, Queen’s University Belfast, Belfast, United Kingdom
|
36
|
De La Toba EA, Bell SE, Romanova EV, Sweedler JV. Mass Spectrometry Measurements of Neuropeptides: From Identification to Quantitation. Annu Rev Anal Chem (Palo Alto Calif) 2022; 15:83-106. [PMID: 35324254 DOI: 10.1146/annurev-anchem-061020-022048] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/14/2023]
Abstract
Neuropeptides (NPs), a unique class of neuronal signaling molecules, participate in a variety of physiological processes and diseases. Quantitative measurements of NPs provide valuable information regarding how these molecules are differentially regulated in a multitude of neurological, metabolic, and mental disorders. Mass spectrometry (MS) has evolved to become a powerful technique for measuring trace levels of NPs in complex biological tissues and individual cells using both targeted and exploratory approaches. There are inherent challenges to measuring NPs, including their wide endogenous concentration range, transport and postmortem degradation, complex sample matrices, and statistical processing of MS data required for accurate NP quantitation. This review highlights techniques developed to address these challenges and presents an overview of quantitative MS-based measurement approaches for NPs, including the incorporation of separation methods for high-throughput analysis, MS imaging for spatial measurements, and methods for NP quantitation in single neurons.
Affiliation(s)
- Eduardo A De La Toba
- Department of Chemistry, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
| | - Sara E Bell
- Department of Chemistry, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
| | - Elena V Romanova
- Department of Chemistry, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
| | - Jonathan V Sweedler
- Department of Chemistry, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
|
37
|
When artificial intelligence meets PD-1/PD-L1 inhibitors: Population screening, response prediction and efficacy evaluation. Comput Biol Med 2022; 145:105499. [DOI: 10.1016/j.compbiomed.2022.105499] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/10/2022] [Revised: 03/26/2022] [Accepted: 04/03/2022] [Indexed: 02/07/2023]
|
38
|
Wolc A, Dekkers JCM. Application of Bayesian genomic prediction methods to genome-wide association analyses. Genet Sel Evol 2022; 54:31. [PMID: 35562659 PMCID: PMC9103490 DOI: 10.1186/s12711-022-00724-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/28/2022] [Accepted: 04/27/2022] [Indexed: 11/19/2022]
Abstract
Background: Bayesian genomic prediction methods were developed to simultaneously fit all genotyped markers to a set of available phenotypes for prediction of breeding values for quantitative traits, allowing for differences in the genetic architecture (distribution of marker effects) of traits. These methods also provide a flexible and reliable framework for genome-wide association (GWA) studies. The objective here was to review developments in Bayesian hierarchical and variable selection models for GWA analyses. Results: By fitting all genotyped markers simultaneously, Bayesian GWA methods implicitly account for population structure and the multiple-testing problem of classical single-marker GWA. Implemented using Markov chain Monte Carlo methods, Bayesian GWA methods allow for control of error rates using probabilities obtained from posterior distributions. Power of GWA studies using Bayesian methods can be enhanced by using informative priors based on previous association studies, gene expression analyses, or functional annotation information. Applied to multiple traits, Bayesian GWA analyses can give insight into pleiotropic effects by multi-trait, structural equation, or graphical models. Bayesian methods can also be used to combine genomic, transcriptomic, proteomic, and other -omics data to infer causal genotype to phenotype relationships and to suggest external interventions that can improve performance. Conclusions: Bayesian hierarchical and variable selection methods provide a unified and powerful framework for genomic prediction, GWA, integration of prior information, and integration of information from other -omics platforms to identify causal mutations for complex quantitative traits.
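The abstract's point about controlling error rates with posterior probabilities can be illustrated with a small sketch. It is not taken from the paper: it assumes a variable-selection model whose MCMC output stores a zero effect whenever a marker is excluded from the model, and the function names, the 0.9 threshold, and the toy samples are all hypothetical.

```python
def posterior_inclusion_probabilities(effect_samples):
    """Fraction of MCMC samples in which each marker has a nonzero effect.

    effect_samples: list of MCMC draws, each a list of marker effects,
    where 0.0 means the marker was excluded from the model in that draw.
    """
    n_samples = len(effect_samples)
    n_markers = len(effect_samples[0])
    return [
        sum(1 for sample in effect_samples if sample[j] != 0.0) / n_samples
        for j in range(n_markers)
    ]


def select_markers(pips, threshold=0.9):
    """Indices of markers whose inclusion probability meets the threshold."""
    return [j for j, p in enumerate(pips) if p >= threshold]


# Hypothetical posterior samples: 5 MCMC draws x 3 markers.
samples = [
    [0.8, 0.0, 0.0],
    [0.7, 0.0, 0.1],
    [0.9, 0.0, 0.0],
    [0.6, 0.2, 0.0],
    [0.8, 0.0, 0.0],
]
pips = posterior_inclusion_probabilities(samples)
selected = select_markers(pips, threshold=0.9)
```

Under these assumptions, one minus a marker's inclusion probability can be read as the posterior probability that declaring it associated is a false positive, which is why a threshold on this quantity acts as an error-rate control.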
Affiliation(s)
- Anna Wolc
- Department of Animal Science, Iowa State University, 806 Stange Road, 239 Kildee Hall, Ames, IA, 50010, USA
- Hy-Line International, 2583 240th Street, Dallas Center, IA, 50063, USA
| | - Jack C M Dekkers
- Department of Animal Science, Iowa State University, 806 Stange Road, 239 Kildee Hall, Ames, IA, 50010, USA
|
39
|
Vahabi N, Michailidis G. Unsupervised Multi-Omics Data Integration Methods: A Comprehensive Review. Front Genet 2022; 13:854752. [PMID: 35391796 PMCID: PMC8981526 DOI: 10.3389/fgene.2022.854752] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 01/14/2022] [Accepted: 02/28/2022] [Indexed: 12/26/2022]
Abstract
Through the developments of Omics technologies and dissemination of large-scale datasets, such as those from The Cancer Genome Atlas, Alzheimer’s Disease Neuroimaging Initiative, and Genotype-Tissue Expression, it is becoming increasingly possible to study complex biological processes and disease mechanisms more holistically. However, to obtain a comprehensive view of these complex systems, it is crucial to integrate data across various Omics modalities, and also leverage external knowledge available in biological databases. This review aims to provide an overview of multi-Omics data integration methods with different statistical approaches, focusing on unsupervised learning tasks, including disease onset prediction, biomarker discovery, disease subtyping, module discovery, and network/pathway analysis. We also briefly review feature selection methods, multi-Omics data sets, and resources/tools that constitute critical components for carrying out the integration.
Affiliation(s)
- Nasim Vahabi
- Informatics Institute, University of Florida, Gainesville, FL, United States
| | - George Michailidis
- Informatics Institute, University of Florida, Gainesville, FL, United States
|
40
|
Suomi T, Elo LL. Statistical and machine learning methods to study human CD4+ T cell proteome profiles. Immunol Lett 2022; 245:8-17. [DOI: 10.1016/j.imlet.2022.03.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Revised: 03/11/2022] [Accepted: 03/15/2022] [Indexed: 11/05/2022]
|
41
|
Drouard G, Ollikainen M, Mykkänen J, Raitakari O, Lehtimäki T, Kähönen M, Mishra PP, Wang X, Kaprio J. Multi-Omics Integration in a Twin Cohort and Predictive Modeling of Blood Pressure Values. OMICS 2022; 26:130-141. [PMID: 35259029 PMCID: PMC8978565 DOI: 10.1089/omi.2021.0201] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 04/21/2023]
Abstract
Abnormal blood pressure is strongly associated with risk of high-prevalence diseases, making the study of blood pressure a major public health challenge. Although biological mechanisms underlying hypertension at the single omic level have been discovered, multi-omics integrative analyses using continuous variations in blood pressure values remain limited. We used a multi-omics regression-based method, called sparse multi-block partial least squares, for integrative, explanatory, and predictive purposes in the study of systolic and diastolic blood pressure values. Various datasets were obtained from the Finnish Twin Cohort for up to 444 twins. Blocks of omics data (including transcriptomic, methylation, and metabolomic data) as well as polygenic risk scores and clinical data were integrated into the modeling and supported by cross-validation. The predictive contribution of each omics block when predicting blood pressure values was investigated using external participants from the Young Finns Study. In addition to revealing interesting inter-omics associations, we found that each block of omics heterogeneously improved the predictions of blood pressure values once the multi-omics data were integrated. The modeling revealed a plurality of clinical, transcriptomic, and metabolomic factors that are consistent with the literature and that play a leading role in explaining unit variations in blood pressure. These findings demonstrate (1) the robustness of our integrative method to harness results obtained by single omics discriminant analyses, and (2) the added value of predictive and exploratory gains of a multi-omics approach in studies of complex phenotypes such as blood pressure.
Affiliation(s)
- Gabin Drouard
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Address correspondence to: Gabin Drouard, MSc, Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Tukholmankatu 8, Helsinki 00014, Finland
| | - Miina Ollikainen
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
| | - Juha Mykkänen
- Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
| | - Olli Raitakari
- Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
- Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, Turku, Finland
| | - Terho Lehtimäki
- Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center-Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
| | - Mika Kähönen
- Department of Clinical Physiology, Tampere University Hospital, and Finnish Cardiovascular Research Center-Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
| | - Pashupati P. Mishra
- Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center-Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
| | - Xiaoling Wang
- Georgia Prevention Institute (GPI), Medical College of Georgia, Augusta University, Augusta, Georgia, USA
| | - Jaakko Kaprio
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
|
42
|
Kang M, Ko E, Mersha TB. A roadmap for multi-omics data integration using deep learning. Brief Bioinform 2022; 23:bbab454. [PMID: 34791014 PMCID: PMC8769688 DOI: 10.1093/bib/bbab454] [Citation(s) in RCA: 79] [Impact Index Per Article: 39.5] [Received: 07/27/2021] [Revised: 09/30/2021] [Accepted: 10/05/2021] [Indexed: 12/18/2022]
Abstract
High-throughput next-generation sequencing now makes it possible to generate a vast amount of multi-omics data for various applications. These data have revolutionized biomedical research by providing a more comprehensive understanding of the biological systems and molecular mechanisms of disease development. Recently, deep learning (DL) algorithms have become one of the most promising methods in multi-omics data analysis, due to their predictive performance and capability of capturing nonlinear and hierarchical features. While integrating and translating multi-omics data into useful functional insights remains the biggest bottleneck, there is a clear trend towards incorporating multi-omics analysis in biomedical research to help explain the complex relationships between molecular layers. Multi-omics data can help to improve prevention, early detection and prediction; monitor progression; interpret patterns and endotypes; and design personalized treatments. In this review, we outline a roadmap of multi-omics integration using DL and offer a practical perspective into the advantages, challenges and barriers to the implementation of DL in multi-omics data.
Affiliation(s)
- Mingon Kang
- Department of Computer Science at the University of Nevada, Las Vegas, NV, USA
| | - Euiseong Ko
- Department of Computer Science at the University of Nevada, Las Vegas, NV, USA
| | - Tesfaye B Mersha
- Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
|
43
|
Yoon SJ, Kwon W, Lee OJ, Jung JH, Shin YC, Lim CS, Kim H, Jang JY, Shin SH, Heo JS, Han IW. External validation of risk prediction platforms for pancreatic fistula after pancreatoduodenectomy using nomograms and artificial intelligence. Ann Surg Treat Res 2022; 102:147-152. [PMID: 35317357 PMCID: PMC8914522 DOI: 10.4174/astr.2022.102.3.147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2021] [Revised: 12/24/2021] [Accepted: 01/11/2022] [Indexed: 11/30/2022]
Affiliation(s)
- So Jeong Yoon
- Division of Hepatobiliary-Pancreatic Surgery, Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
| | - Wooil Kwon
- Department of Surgery, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
| | - Ok Joo Lee
- Division of Hepatobiliary-Pancreatic Surgery, Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
| | - Ji Hye Jung
- Division of Hepatobiliary-Pancreatic Surgery, Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
| | - Yong Chan Shin
- Department of Surgery, Ilsan Paik Hospital, Inje University College of Medicine, Goyang, Korea
| | - Chang-Sup Lim
- Department of Surgery, Seoul Metropolitan Government - Seoul National University Boramae Medical Center, Seoul National University College of Medicine, Seoul, Korea
| | - Hongbeom Kim
- Department of Surgery, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
| | - Jin-Young Jang
- Department of Surgery, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
| | - Sang Hyun Shin
- Division of Hepatobiliary-Pancreatic Surgery, Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
| | - Jin Seok Heo
- Division of Hepatobiliary-Pancreatic Surgery, Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
| | - In Woong Han
- Division of Hepatobiliary-Pancreatic Surgery, Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
|
44
|
Ahmed KT, Sun J, Cheng S, Yong J, Zhang W. Multi-omics data integration by generative adversarial network. Bioinformatics 2021; 38:179-186. [PMID: 34415323 DOI: 10.1093/bioinformatics/btab608] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 06/08/2021] [Revised: 07/27/2021] [Accepted: 08/18/2021] [Indexed: 02/03/2023]
Abstract
Motivation: Accurate disease phenotype prediction plays an important role in the treatment of heterogeneous diseases like cancer in the era of precision medicine. With the advent of high-throughput technologies, more comprehensive multi-omics data are now available that can effectively link genotype to phenotype. However, the interactive relation of multi-omics datasets makes it particularly challenging to incorporate different biological layers to discover coherent biological signatures and predict phenotypic outcomes. In this study, we introduce omicsGAN, a generative adversarial network model to integrate two omics datasets and their interaction network. The model captures information from the interaction network as well as the two omics datasets and fuses them to generate synthetic data with better predictive signals. Results: Large-scale experiments on The Cancer Genome Atlas breast cancer, lung cancer and ovarian cancer datasets validate that (i) the model can effectively integrate two omics datasets (e.g. mRNA and microRNA expression data) and their interaction network (e.g. microRNA-mRNA interaction network), and the synthetic omics data generated by the proposed model perform better on cancer outcome classification and patient survival prediction than the original omics datasets; and (ii) the integrity of the interaction network plays a vital role in the generation of synthetic data with higher predictive quality. Using a random interaction network does not allow the framework to learn meaningful information from the omics datasets and therefore results in synthetic data with weaker predictive signals. Availability and implementation: Source code is available at: https://github.com/CompbioLabUCF/omicsGAN. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Khandakar Tanvir Ahmed
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
| | - Jiao Sun
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
| | - Sze Cheng
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota Twin Cities, Minneapolis, MN 55455, USA
| | - Jeongsik Yong
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota Twin Cities, Minneapolis, MN 55455, USA
| | - Wei Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
|
45
|
Li Y, Li H, Sun T, Ding C. Pathogen-Host Interaction Repertoire at Proteome and Posttranslational Modification Levels During Fungal Infections. Front Cell Infect Microbiol 2021; 11:774340. [PMID: 34926320 PMCID: PMC8674643 DOI: 10.3389/fcimb.2021.774340] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/11/2021] [Accepted: 11/15/2021] [Indexed: 12/22/2022]
Abstract
The prevalence of fungal diseases has increased globally in recent years, which is often associated with growing numbers of immunocompromised patients, aging populations, and the novel coronavirus pandemic. Furthermore, because available antifungal agents are limited, mortality and morbidity rates of invasive fungal disease remain stubbornly high, and the emergence of multidrug-resistant fungi exacerbates the problem. Fungal pathogenicity and interactions between fungi and host have been the focus of many studies; as a result, many pathogenic mechanisms and fungal virulence factors have been identified. Mass spectrometry (MS)-based proteomics is a novel approach to better understand fungal pathogenicity and host–pathogen interactions at protein and protein posttranslational modification (PTM) levels. The approach has successfully elucidated interactions between pathogens and hosts by examining, for example, samples of fungal cells under different conditions, body fluids from infected patients, and exosomes. Many studies conclude that protein and PTM levels in both pathogens and hosts play important roles in the progression of fungal diseases. This review summarizes mass spectrometry studies of protein and PTM levels from the perspectives of both pathogens and hosts and provides an integrative conceptual outlook on fungal pathogenesis, antifungal agent development, and host–pathogen interactions.
Affiliation(s)
- Yanjian Li
- College of Life and Health Sciences, Northeastern University, Shenyang, China
| | - Hailong Li
- NHC Key Laboratory of AIDS Immunology (China Medical University), National Clinical Research Center for Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, China
| | - Tianshu Sun
- Medical Research Centre, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Science, Beijing, China
- Beijing Key Laboratory for Mechanisms Research and Precision Diagnosis of Invasive Fungal Diseases, Beijing, China
| | - Chen Ding
- College of Life and Health Sciences, Northeastern University, Shenyang, China
|
46
|
Pierre-Jean M, Mauger F, Deleuze JF, Le Floch E. PIntMF: Penalized Integrative Matrix Factorization method for multi-omics data. Bioinformatics 2021; 38:900-907. [PMID: 34849583 PMCID: PMC8796362 DOI: 10.1093/bioinformatics/btab786] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/11/2021] [Revised: 09/30/2021] [Accepted: 11/11/2021] [Indexed: 02/03/2023]
Abstract
Motivation: It is increasingly common to perform multi-omics analyses to explore the genome at diverse levels and not only at a single level. Through integrative statistical methods, multi-omics data have the power to reveal new biological processes, potential biomarkers and subgroups in a cohort. Matrix factorization (MF) is an unsupervised statistical method that allows a clustering of individuals, but also reveals relevant omics variables from the various blocks. Results: Here, we present PIntMF (Penalized Integrative Matrix Factorization), an MF model with sparsity, positivity and equality constraints. To induce sparsity in the model, we used a classical Lasso penalization on variable and individual matrices. For the matrix of samples, sparsity helps in the clustering, while normalization (matching an equality constraint) of inferred coefficients is added to improve interpretation. Moreover, we added an automatic tuning of the sparsity parameters using the widely used glmnet package. We also proposed three criteria to help the user to choose the number of latent variables. PIntMF was compared with other state-of-the-art integrative methods, including feature selection techniques, in both synthetic and real data. PIntMF succeeds in finding relevant clusters as well as variables in two types of simulated data (correlated and uncorrelated). Next, PIntMF was applied to two real datasets (diet and cancer), and it revealed interpretable clusters linked to available clinical data. Our method outperforms the existing ones on two criteria (clustering and variable selection). We show that PIntMF is an easy, fast and powerful tool to extract patterns and cluster samples from multi-omics data. Availability and implementation: An R package is available at https://github.com/mpierrejean/pintmf. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Florence Mauger
- Centre National de Recherche en Génomique Humaine, CEA, Université de Paris-Saclay, Evry, France
| | - Jean-François Deleuze
- Centre National de Recherche en Génomique Humaine, CEA, Université de Paris-Saclay, Evry, France
| | - Edith Le Floch
- Centre National de Recherche en Génomique Humaine, CEA, Université de Paris-Saclay, Evry, France
|
47
|
|
48
|
Park Y, Heider D, Hauschild AC. Integrative Analysis of Next-Generation Sequencing for Next-Generation Cancer Research toward Artificial Intelligence. Cancers (Basel) 2021; 13:3148. [PMID: 34202427 PMCID: PMC8269018 DOI: 10.3390/cancers13133148] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/20/2021] [Revised: 06/16/2021] [Accepted: 06/21/2021] [Indexed: 12/18/2022]
Abstract
The rapid improvement of next-generation sequencing (NGS) technologies and their application in large-scale cohorts in cancer research led to common challenges of big data. It opened a new research area incorporating systems biology and machine learning. As large-scale NGS data accumulated, sophisticated data analysis methods became indispensable. In addition, NGS data have been integrated with systems biology to build better predictive models to determine the characteristics of tumors and tumor subtypes. Therefore, various machine learning algorithms were introduced to identify underlying biological mechanisms. In this work, we review novel technologies developed for NGS data analysis, and we describe how these computational methodologies integrate systems biology and omics data. Subsequently, we discuss how deep neural networks outperform other approaches, the potential of graph neural networks (GNN) in systems biology, and the limitations in NGS biomedical research. To reflect on the various challenges and corresponding computational solutions, we will discuss the following three topics: (i) molecular characteristics, (ii) tumor heterogeneity, and (iii) drug discovery. We conclude that machine learning and network-based approaches can add valuable insights and build highly accurate models. However, a well-informed choice of learning algorithm and biological network information is crucial for the success of each specific research question.
Affiliation(s)
- Youngjun Park
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
- Dominik Heider
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
- Anne-Christin Hauschild
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
- Department of Medical Informatics, University Medical Center Göttingen, 37075 Göttingen, Germany
49
Picard M, Scott-Boyer MP, Bodein A, Périn O, Droit A. Integration strategies of multi-omics data for machine learning analysis. Comput Struct Biotechnol J 2021; 19:3735-3746. [PMID: 34285775 PMCID: PMC8258788 DOI: 10.1016/j.csbj.2021.06.030] [Citation(s) in RCA: 148] [Impact Index Per Article: 49.3] [Received: 03/30/2021] [Revised: 06/17/2021] [Accepted: 06/21/2021] [Indexed: 12/25/2022]
Abstract
Increased availability of high-throughput technologies has generated an ever-growing volume of omics data that seek to portray many different but complementary biological layers, including genomics, epigenomics, transcriptomics, proteomics, and metabolomics. New insights from these data have been obtained by machine learning algorithms that have produced diagnostic and classification biomarkers. Most biomarkers obtained to date, however, include only one omic measurement at a time and thus do not take full advantage of recent multi-omics experiments that now capture the entire complexity of biological systems. Multi-omics data integration strategies are needed to combine the complementary knowledge brought by each omics layer. We have summarized the most recent data integration methods/frameworks into five different integration strategies: early, mixed, intermediate, late, and hierarchical. In this mini-review, we focus on challenges and existing multi-omics integration strategies, paying special attention to machine learning applications.
Affiliation(s)
- Milan Picard
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Marie-Pier Scott-Boyer
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Antoine Bodein
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Olivier Périn
- Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
- Arnaud Droit
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Corresponding author.
50
Chu X, Zhang B, Koeken VACM, Gupta MK, Li Y. Multi-Omics Approaches in Immunological Research. Front Immunol 2021; 12:668045. [PMID: 34177908 PMCID: PMC8226116 DOI: 10.3389/fimmu.2021.668045] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 02/15/2021] [Accepted: 05/28/2021] [Indexed: 12/14/2022]
Abstract
The immune system plays a vital role in health and disease, and is regulated through a complex interactive network of many different immune cells and mediators. To understand the complexity of the immune system, we propose to apply a multi-omics approach in immunological research. This review provides a complete overview of available methodological approaches for the different omics data layers relevant for immunological research, including genetics, epigenetics, transcriptomics, proteomics, metabolomics, and cellomics. Thereafter, we describe the various methods for data analysis as well as how to integrate different layers of omics data. Finally, we discuss the possible applications of multi-omics studies and opportunities they provide for understanding the complex regulatory networks as well as immune variation in various immune-related diseases.
Affiliation(s)
- Xiaojing Chu
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, Netherlands
- Department of Computational Biology for Individualised Medicine, Centre for Individualised Infection Medicine (CiiM), a joint venture between the Hannover Medical School and the Helmholtz Centre for Infection Research, Hannover, Germany
- TWINCORE, Centre for Experimental and Clinical Infection Research, a joint venture between the Hannover Medical School and the Helmholtz Centre for Infection Research, Hannover, Germany
- Bowen Zhang
- Department of Computational Biology for Individualised Medicine, Centre for Individualised Infection Medicine (CiiM), a joint venture between the Hannover Medical School and the Helmholtz Centre for Infection Research, Hannover, Germany
- TWINCORE, Centre for Experimental and Clinical Infection Research, a joint venture between the Hannover Medical School and the Helmholtz Centre for Infection Research, Hannover, Germany
- Valerie A. C. M. Koeken
- Department of Computational Biology for Individualised Medicine, Centre for Individualised Infection Medicine (CiiM), a joint venture between the Hannover Medical School and the Helmholtz Centre for Infection Research, Hannover, Germany
- TWINCORE, Centre for Experimental and Clinical Infection Research, a joint venture between the Hannover Medical School and the Helmholtz Centre for Infection Research, Hannover, Germany
- Department of Internal Medicine and Radboud Center for Infectious Diseases, Radboud University Medical Center, Nijmegen, Netherlands
- Manoj Kumar Gupta
- Department of Computational Biology for Individualised Medicine, Centre for Individualised Infection Medicine (CiiM), a joint venture between the Hannover Medical School and the Helmholtz Centre for Infection Research, Hannover, Germany
- TWINCORE, Centre for Experimental and Clinical Infection Research, a joint venture between the Hannover Medical School and the Helmholtz Centre for Infection Research, Hannover, Germany
- Yang Li
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, Netherlands
- Department of Computational Biology for Individualised Medicine, Centre for Individualised Infection Medicine (CiiM), a joint venture between the Hannover Medical School and the Helmholtz Centre for Infection Research, Hannover, Germany
- TWINCORE, Centre for Experimental and Clinical Infection Research, a joint venture between the Hannover Medical School and the Helmholtz Centre for Infection Research, Hannover, Germany
- Department of Internal Medicine and Radboud Center for Infectious Diseases, Radboud University Medical Center, Nijmegen, Netherlands