1
Gliozzo J, Soto-Gomez M, Guarino V, Bonometti A, Cabri A, Cavalleri E, Reese J, Robinson PN, Mesiti M, Valentini G, Casiraghi E. Intrinsic-dimension analysis for guiding dimensionality reduction and data fusion in multi-omics data processing. Artif Intell Med 2025; 160:103049. [PMID: 39673960 DOI: 10.1016/j.artmed.2024.103049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 12/03/2024] [Accepted: 12/04/2024] [Indexed: 12/16/2024]
Abstract
Multi-omics data have revolutionized biomedical research by providing a comprehensive understanding of biological systems and the molecular mechanisms of disease development. However, analyzing multi-omics data is challenging due to high dimensionality and limited sample sizes, necessitating proper data-reduction pipelines to ensure reliable analyses. Additionally, the multimodal nature of these data requires effective data-integration pipelines. While several dimensionality reduction and data fusion algorithms have been proposed, crucial aspects are often overlooked. Specifically, the choice of projection space dimension is typically heuristic and uniformly applied across all omics, neglecting the unique high-dimension, small-sample-size challenges faced by individual omics. This paper introduces a novel multi-modal dimensionality reduction pipeline tailored to individual views. By leveraging intrinsic dimensionality estimators, we assess the curse-of-dimensionality impact on each view and propose a two-step reduction strategy for significantly affected views, combining feature selection with feature extraction. Compared to traditional uniform reduction pipelines in a crucial and supervised multi-omics analysis setting, our approach shows significant improvement. Additionally, we explore three effective unsupervised multi-omics data fusion methods rooted in the main data fusion strategies to gain insights into their performance under crucial, yet overlooked, settings.
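The abstract does not name the intrinsic-dimensionality estimators it relies on. As a rough illustration of what such an estimator computes, the sketch below implements the TwoNN maximum-likelihood estimate, a well-known estimator chosen here as an assumption rather than taken from the paper. The `sample_view` helper and the toy data are hypothetical; the point is only that a view with many features can have a far lower intrinsic dimension.

```python
import math
import random


def two_nn_intrinsic_dimension(points):
    """TwoNN maximum-likelihood estimate of intrinsic dimension.

    For each point, mu = r2 / r1 is the ratio of the distances to its second
    and first nearest neighbours; the estimate is N / sum(log(mu)).
    """
    log_ratios = []
    for i, p in enumerate(points):
        # Brute-force distances to every other point (fine for small N).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0:  # skip exact duplicates, which carry no scale information
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


def sample_view(n_samples, latent_dim, ambient_dim, rng):
    """Hypothetical toy 'view': latent_dim informative features, rest zeros."""
    return [
        [rng.random() for _ in range(latent_dim)]
        + [0.0] * (ambient_dim - latent_dim)
        for _ in range(n_samples)
    ]


rng = random.Random(0)
# Both views have 50 features, but their data lie on a line and a plane.
line_view = sample_view(400, latent_dim=1, ambient_dim=50, rng=rng)
plane_view = sample_view(400, latent_dim=2, ambient_dim=50, rng=rng)

id_line = two_nn_intrinsic_dimension(line_view)
id_plane = two_nn_intrinsic_dimension(plane_view)
print(f"line view: {id_line:.2f}, plane view: {id_plane:.2f}")
```

In a per-view pipeline of the kind the abstract describes, an estimate like this could plausibly inform how far each view is reduced, although the exact rule the authors use is not stated here.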
Affiliation(s)
- Jessica Gliozzo
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy; European Commission, Joint Research Centre (JRC), Ispra, Italy
- Mauricio Soto-Gomez
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy
- Valentina Guarino
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy
- Arturo Bonometti
  - Department of Biomedical Sciences, Humanitas University, Milan, Italy; Department of Pathology, IRCCS Humanitas Clinical and Research Hospital, Milan, Italy
- Alberto Cabri
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy
- Emanuele Cavalleri
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy
- Justin Reese
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Marco Mesiti
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Giorgio Valentini
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy; CINI, Infolife National Laboratory, Roma, Italy
- Elena Casiraghi
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; CINI, Infolife National Laboratory, Roma, Italy; Department of Computer Science, Aalto University, Espoo, Finland.
2
Muthukrishanan G, Munisamy J, Gopalasubramaniam SK, Subramanian KS, Dharmaraj R, Nath DJ, Dutta P, Devarajan AK. Impact of foliar application of phyllosphere yeast strains combined with soil fertilizer application on rice growth and yield. Environmental Microbiome 2024; 19:102. [PMID: 39695904 DOI: 10.1186/s40793-024-00635-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Accepted: 11/04/2024] [Indexed: 12/20/2024]
Abstract
BACKGROUND The application of beneficial microbes in agriculture is gaining increasing attention as a means to reduce reliance on chemical fertilizers. This approach can potentially mitigate negative impacts on soil, animal, and human health, as well as decrease climate-changing factors. Among these microbes, yeast has been the least explored, particularly within the phyllosphere compartment. This study addresses this knowledge gap by investigating the potential of phyllosphere yeast to improve rice yield while reducing fertilizer dosage.
RESULTS From fifty-two rice phyllosphere yeast isolates, we identified three yeast strains (Rhodotorula paludigena Y1, Pseudozyma sp. Y71, and Cryptococcus sp. Y72) that could thrive at 36 °C and possessed significant multifarious plant growth-promoting traits, enhancing rice root and shoot length upon seed inoculation. These three strains demonstrated favorable compatibility, leading to the creation of a yeast consortium. We assessed the combined effect of foliar application of this yeast consortium and individual strains with two distinct recommended doses of chemical fertilizers (RDCFs) (75 and 100%), as well as RDCFs alone (75 and 100%), in rice maintained in pot-culture and field experiments. The pot-culture experiment investigated the leaf microbial community, plant biochemicals, root and shoot length during the stem elongation, flowering, and dough phases, and yield-related parameters at harvest. The field experiment determined the actual yield. Integrated results from both experiments revealed that the yeast consortium with 75% RDCFs was more effective than the yeast consortium with 100% RDCFs, single strain applications with RDCFs (75 and 100%), and RDCFs alone (75 and 100%). Additionally, this treatment improved leaf metabolite levels compared to control rice plants.
CONCLUSIONS Overall, a 25% reduction in soil chemical fertilizers combined with yeast consortium foliar application improved rice growth, biochemicals, and yield. This study also advances the field of phyllosphere yeast research in agriculture.
Affiliation(s)
- Gomathy Muthukrishanan
  - Department of Soil Science and Agricultural Chemistry, Agricultural College and Research Institute, Tamil Nadu Agricultural University, Killikulam, Tuticorin, 628252, India.
- Jeyashri Munisamy
  - Department of Soil Science and Agricultural Chemistry, Agricultural College and Research Institute, Tamil Nadu Agricultural University, Killikulam, Tuticorin, 628252, India
- Pranab Dutta
  - Central Agricultural University, Umiam, Meghalaya, 793122, India
3
Briscik M, Tazza G, Vidács L, Dillies MA, Déjean S. Supervised multiple kernel learning approaches for multi-omics data integration. BioData Min 2024; 17:53. [PMID: 39580456 PMCID: PMC11585117 DOI: 10.1186/s13040-024-00406-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Accepted: 11/14/2024] [Indexed: 11/25/2024] Open
Abstract
BACKGROUND Advances in high-throughput technologies have led to an ever-increasing availability of omics datasets. The integration of multiple heterogeneous data sources is currently an issue for biology and bioinformatics. Multiple kernel learning (MKL) has been shown to be a flexible and valid approach to consider the diverse nature of multi-omics inputs, despite being an underused tool in genomic data mining.
RESULTS We provide novel MKL approaches based on different kernel fusion strategies. To learn from the meta-kernel of input kernels, we adapted unsupervised integration algorithms for supervised tasks with support vector machines. We also tested deep learning architectures for kernel fusion and classification. The results show that MKL-based models can outperform more complex, state-of-the-art, supervised multi-omics integrative approaches.
CONCLUSION Multiple kernel learning offers a natural framework for predictive models in multi-omics data. It proved to be a fast and reliable solution that can compete with and outperform more complex architectures. Our results offer a direction for bio-data mining research, biomarker discovery and further development of methods for heterogeneous data integration.
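The abstract does not spell out its kernel fusion strategies. As a minimal sketch of the general idea of a meta-kernel, the example below builds one kernel per omics view and fuses them by a fixed convex combination. The equal weights, the linear kernel, the cosine normalisation, and the toy `expression` and `methylation` matrices are all assumptions made for illustration, not the authors' method.

```python
def linear_kernel(view):
    """Gram matrix K[i][j] = <x_i, x_j> for one omics view (rows = samples)."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in view] for xi in view]


def normalize_kernel(K):
    """Cosine normalisation so every sample has unit self-similarity."""
    n = len(K)
    return [
        [K[i][j] / (K[i][i] * K[j][j]) ** 0.5 for j in range(n)]
        for i in range(n)
    ]


def fuse_kernels(kernels, weights):
    """Meta-kernel as a convex combination of per-view kernels."""
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) < 0:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: 3 samples measured in two views of different width.
expression = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0]]
methylation = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]

kernels = [normalize_kernel(linear_kernel(v)) for v in (expression, methylation)]
meta_kernel = fuse_kernels(kernels, weights=[0.5, 0.5])
```

Because each view contributes only a samples-by-samples matrix, views with very different feature counts can be combined directly. A fused matrix like `meta_kernel` could then be handed to a classifier that accepts precomputed kernels, such as a support vector machine.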
Affiliation(s)
- Mitja Briscik
  - Institut de Mathématiques de Toulouse, UMR5219, CNRS, UPS, Université de Toulouse, Cedex 9, Toulouse, 31062, France.
- Gabriele Tazza
  - Department of Computer Science, Applied Artificial Intelligence Group, University of Szeged, Szeged, 6720, Hungary.
- László Vidács
  - Department of Computer Science, Applied Artificial Intelligence Group, University of Szeged, Szeged, 6720, Hungary
- Marie-Agnès Dillies
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015, Paris, France
- Sébastien Déjean
  - Institut de Mathématiques de Toulouse, UMR5219, CNRS, UPS, Université de Toulouse, Cedex 9, Toulouse, 31062, France
4
Paries M, Vigneau E, Huneau A, Lantz O, Bougeard S. MBPCA-OS: an exploratory multiblock method for variables of different measurement levels. Application to study the immune response to SARS-CoV-2 infection and vaccination. Int J Biostat 2024; 20:389-406. [PMID: 38083810 DOI: 10.1515/ijb-2023-0062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Accepted: 11/02/2023] [Indexed: 12/21/2024]
Abstract
Studying a large number of variables measured on the same observations and organized in blocks - denoted multiblock data - is becoming standard in several domains, especially in biology. To explore the relationships between all these variables - at the block- and the variable-level - several exploratory multiblock methods have been proposed. However, most of them are only designed for numeric variables. In reality, some data sets contain variables of different measurement levels (i.e., numeric, nominal, ordinal). In this article, we focus on exploratory multiblock methods that handle variables at their appropriate measurement level. Multi-Block Principal Component Analysis with Optimal Scaling (MBPCA-OS) is proposed and applied to multiblock data from the CURIE-O-SA French cohort. In this study, variables are of different measurement levels and organized in four blocks. The objective is to study the immune responses according to the SARS-CoV-2 infection and vaccination statuses, the symptoms and the participants' characteristics.
Affiliation(s)
- Martin Paries
  - Oniris, INRAE, StatSC, 44300 Nantes, France
  - Anses, Epidemiology, Health and Welfare, Laboratory of Ploufragan-Plouzané-Niort, Ploufragan, France
- Adeline Huneau
  - Anses, Epidemiology, Health and Welfare, Laboratory of Ploufragan-Plouzané-Niort, Ploufragan, France
- Olivier Lantz
  - Clinical Immunology Laboratory, Institute Curie, Paris, France
- Stéphanie Bougeard
  - Anses, Epidemiology, Health and Welfare, Laboratory of Ploufragan-Plouzané-Niort, Ploufragan, France
5
Ballard JL, Wang Z, Li W, Shen L, Long Q. Deep learning-based approaches for multi-omics data integration and analysis. BioData Min 2024; 17:38. [PMID: 39358793 PMCID: PMC11446004 DOI: 10.1186/s13040-024-00391-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Accepted: 09/06/2024] [Indexed: 10/04/2024] Open
Abstract
BACKGROUND The rapid growth of deep learning, as well as the vast and ever-growing amount of available data, has provided ample opportunity for advances in fusion and analysis of complex and heterogeneous data types. Different data modalities provide complementary information that can be leveraged to gain a more complete understanding of each subject. In the biomedical domain, multi-omics data includes molecular (genomics, transcriptomics, proteomics, epigenomics, metabolomics, etc.) and imaging (radiomics, pathomics) modalities which, when combined, have the potential to improve performance on prediction, classification, clustering and other tasks. Deep learning encompasses a wide variety of methods, each of which has certain strengths and weaknesses for multi-omics integration.
METHOD In this review, we categorize recent deep learning-based approaches by their basic architectures and discuss their unique capabilities in relation to one another. We also discuss some emerging themes advancing the field of multi-omics integration.
RESULTS Deep learning-based multi-omics integration methods were categorized broadly into non-generative (feedforward neural networks, graph convolutional neural networks, and autoencoders) and generative (variational methods, generative adversarial models, and a generative pretrained model). Generative methods have the advantage of being able to impose constraints on the shared representations to enforce certain properties or incorporate prior knowledge. They can also be used to generate or impute missing modalities. Recent advances achieved by these methods include the ability to handle incomplete data as well as going beyond the traditional molecular omics data types to integrate other modalities such as imaging data.
CONCLUSION We expect to see further growth in methods that can handle missingness, as this is a common challenge in working with complex and heterogeneous data. Additionally, methods that integrate more data types are expected to improve performance on downstream tasks by capturing a comprehensive view of each sample.
Affiliation(s)
- Jenna L Ballard
  - Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA.
- Zexuan Wang
  - Graduate Group in Applied Mathematics and Computational Science, University of Pennsylvania, 209 S. 33rd Street, Philadelphia, PA, 19104, USA
- Wenrui Li
  - Department of Statistics, University of Connecticut, 215 Glenbrook Road, Storrs, CT, 06269, USA
- Li Shen
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, 19104, USA.
- Qi Long
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, 19104, USA.
6
Zhang W, Sen A, Pena JK, Reitsma A, Alexander OC, Tajima T, Martinez OM, Krams SM. Application of Mass Cytometry Platforms to Solid Organ Transplantation. Transplantation 2024; 108:2034-2044. [PMID: 38467594 PMCID: PMC11390974 DOI: 10.1097/tp.0000000000004925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2024]
Abstract
Transplantation serves as the cornerstone of treatment for patients with end-stage organ disease. The prevalence of complications, such as allograft rejection, infection, and malignancies, underscores the need to dissect the complex interactions of the immune system at the single-cell level. In this review, we discuss studies using mass cytometry or cytometry by time-of-flight, a cutting-edge technology enabling the characterization of immune populations and cell-to-cell interactions in granular detail. We review the application of mass cytometry in human and experimental animal studies in the context of transplantation, uncovering invaluable contributions of the tool to understanding rejection and other transplant-related complications. We discuss recent innovations that have the potential to streamline and standardize mass cytometry workflows for application to multisite clinical trials. Additionally, we introduce imaging mass cytometry, a technique that couples the power of mass cytometry with spatial context, thereby mapping cellular interactions within tissue microenvironments. The synergistic integration of mass cytometry and imaging mass cytometry data with other omics data sets and high-dimensional data platforms to further define immune dynamics is discussed. In conclusion, mass cytometry technologies, when integrated with other tools and data, shed light on the intricate landscape of the immune response in transplantation. This approach holds significant potential for enhancing patient outcomes by advancing our understanding and facilitating the development of new diagnostics and therapeutics.
Affiliation(s)
- Wenming Zhang
  - Department of Surgery, Stanford University, Stanford, CA, United States
- Ayantika Sen
  - Department of Surgery, Stanford University, Stanford, CA, United States
- Josselyn K. Pena
  - Department of Surgery, Stanford University, Stanford, CA, United States
- Andrea Reitsma
  - Department of Surgery, Stanford University, Stanford, CA, United States
- Oliver C. Alexander
  - Department of Surgery, Stanford University, Stanford, CA, United States
  - Meharry Medical College, School of Medicine, Nashville, TN, United States
- Tetsuya Tajima
  - Department of Surgery, Stanford University, Stanford, CA, United States
- Sheri M. Krams
  - Department of Surgery, Stanford University, Stanford, CA, United States
7
Tripathy RK, Frohock Z, Wang H, Cary GA, Keegan S, Carter GW, Li Y. An explainable graph neural network approach for effectively integrating multi-omics with prior knowledge to identify biomarkers from interacting biological domains. bioRxiv 2024:2024.08.23.609465. [PMID: 39253523 PMCID: PMC11383059 DOI: 10.1101/2024.08.23.609465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
The rapid growth of multi-omics datasets, in addition to the wealth of existing biological prior knowledge, necessitates the development of effective methods for their integration. Such methods are essential for building predictive models and identifying disease-related molecular markers. We propose a framework for supervised integration of multi-omics data with biological priors represented as knowledge graphs. Our framework leverages graph neural networks (GNNs) to model the relationships among features from high-dimensional 'omics data and set transformers to integrate low-dimensional representations of 'omics features. Furthermore, our framework incorporates explainability methods to elucidate important biomarkers and extract interaction relationships between biological quantities of interest. We demonstrate the effectiveness of our approach by applying it to Alzheimer's disease (AD) multi-omics data from the ROSMAP cohort, showing that the integration of transcriptomics and proteomics data with AD biological domain network priors improves the prediction accuracy of AD status and highlights functional AD biomarkers.
Affiliation(s)
- Zachary Frohock
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Hong Wang
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Yi Li
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
8
Kobel CM, Merkesvik J, Burgos IMT, Lai W, Øyås O, Pope PB, Hvidsten TR, Aho VTE. Integrating host and microbiome biology using holo-omics. Mol Omics 2024; 20:438-452. [PMID: 38963125 DOI: 10.1039/d4mo00017j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/05/2024]
Abstract
Holo-omics is the use of omics data to study a host and its inherent microbiomes - a biological system known as a "holobiont". A microbiome that exists in such a space often encounters habitat stability and in return provides metabolic capacities that can benefit its host. Here we present an overview of beneficial host-microbiome systems and propose and discuss several methodological frameworks that can be used to investigate the intricacies of the many as yet undefined host-microbiome interactions that influence holobiont homeostasis. While this is an emerging field, we anticipate that ongoing methodological advancements will enhance the biological resolution that is necessary to improve our understanding of host-microbiome interplay and to enable meaningful interpretations and biotechnological applications.
Affiliation(s)
- Carl M Kobel
  - Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway.
- Jenny Merkesvik
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Wanxin Lai
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Ove Øyås
  - Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway.
- Phillip B Pope
  - Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway.
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
  - Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology (QUT), Translational Research Institute, Woolloongabba, Queensland, Australia
- Torgeir R Hvidsten
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Velma T E Aho
  - Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway.
9
Chakraborty S, Sharma G, Karmakar S, Banerjee S. Multi-OMICS approaches in cancer biology: New era in cancer therapy. Biochim Biophys Acta Mol Basis Dis 2024; 1870:167120. [PMID: 38484941 DOI: 10.1016/j.bbadis.2024.167120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 03/06/2024] [Accepted: 03/06/2024] [Indexed: 04/01/2024]
Abstract
Innovative multi-omics frameworks integrate diverse datasets from the same patients to enhance our understanding of the molecular and clinical aspects of cancers. Advanced omics and multi-view clustering algorithms present unprecedented opportunities for classifying cancers into subtypes, refining survival predictions and treatment outcomes, and unravelling key pathophysiological processes across various molecular layers. However, with the increasing availability of cost-effective high-throughput technologies (HTT) that generate vast amounts of data, analyzing single layers often falls short of establishing causal relations. Integrating multi-omics data spanning genomes, epigenomes, transcriptomes, proteomes, metabolomes, and microbiomes offers unique prospects to comprehend the underlying biology of complex diseases like cancer. This discussion explores algorithmic frameworks designed to uncover cancer subtypes, disease mechanisms, and methods for identifying pivotal genomic alterations. It also underscores the significance of multi-omics in tumor classifications, diagnostics, and prognostications. Despite its unparalleled advantages, the integration of multi-omics data has been slow to find its way into everyday clinics. A major hurdle is the uneven maturity of different omics approaches and the widening gap between the generation of large datasets and the capacity to process this data. Initiatives promoting the standardization of sample processing and analytical pipelines, as well as multidisciplinary training for experts in data analysis and interpretation, are crucial for translating theoretical findings into practical applications.
Affiliation(s)
- Sohini Chakraborty
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Gaurav Sharma
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Sricheta Karmakar
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Satarupa Banerjee
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India.
10
Lobón-Iglesias MJ, Andrianteranagna M, Han ZY, Chauvin C, Masliah-Planchon J, Manriquez V, Tauziede-Espariat A, Turczynski S, Bouarich-Bourimi R, Frah M, Dufour C, Blauwblomme T, Cardoen L, Pierron G, Maillot L, Guillemot D, Reynaud S, Bourneix C, Pouponnot C, Surdez D, Bohec M, Baulande S, Delattre O, Piaggio E, Ayrault O, Waterfall JJ, Servant N, Beccaria K, Dangouloff-Ros V, Bourdeaut F. Imaging and multi-omics datasets converge to define different neural progenitor origins for ATRT-SHH subgroups. Nat Commun 2023; 14:6669. [PMID: 37863903 PMCID: PMC10589300 DOI: 10.1038/s41467-023-42371-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/10/2022] [Accepted: 10/09/2023] [Indexed: 10/22/2023] Open
Abstract
Atypical teratoid rhabdoid tumors (ATRT) are divided into MYC, TYR and SHH subgroups, suggesting diverse lineages of origin. Here, we investigate the imaging of human ATRT at diagnosis and the precise anatomic origin of brain tumors in the Rosa26-CreERT2::Smarcb1flox/flox model. This cross-species analysis points to an extra-cerebral origin for MYC tumors. Additionally, we clearly distinguish SHH ATRT emerging from the cerebellar anterior lobe (CAL) from those emerging from the basal ganglia (BG) and intra-ventricular (IV) regions. Molecular characteristics point to the midbrain-hindbrain boundary as the origin of CAL SHH ATRT, and to the ganglionic eminence as the origin of BG/IV SHH ATRT. Single-cell RNA sequencing on SHH ATRT supports these hypotheses. Trajectory analyses suggest that SMARCB1 loss induces a de-differentiation process mediated by repressors of the neuronal program such as REST, ID and the NOTCH pathway.
Affiliation(s)
- María-Jesús Lobón-Iglesias
- INSERM U830, Laboratory of Translational Research In Pediatric Oncology, PSL Research University, SIREDO Oncology center, Institut Curie Research Center, Paris, France
| | - Mamy Andrianteranagna
- INSERM U830, Laboratory of Translational Research In Pediatric Oncology, PSL Research University, SIREDO Oncology center, Institut Curie Research Center, Paris, France
- INSERM U900, Bioinformatics, Biostatistics, Epidemiology and Computational Systems Unit, Institut Curie, Mines Paris Tech, PSL Research University, Institut Curie Research Center, Paris, France
| | - Zhi-Yan Han
- INSERM U830, Laboratory of Translational Research In Pediatric Oncology, PSL Research University, SIREDO Oncology center, Institut Curie Research Center, Paris, France
| | - Céline Chauvin
- INSERM U830, Laboratory of Translational Research In Pediatric Oncology, PSL Research University, SIREDO Oncology center, Institut Curie Research Center, Paris, France
| | - Julien Masliah-Planchon
- Somatic Genetic Unit, Department of Pathology and Diagnostic and Theranostic Medecine, Institut Curie Hospital, Paris, France
| | - Valeria Manriquez
- INSERM U932, Immunity and Cancer, PSL Research University, Institut Curie Research Center, Paris, France
| | - Arnault Tauziede-Espariat
- Department of Neuropathology, GHU Paris-Psychiatry and Neurosciences, Sainte-Anne Hospital, Paris, France
- Paris Psychiatry and Neurosciences Institute (IPNP), UMR S1266, INSERM, IMA-BRAIN, Paris, France
| | - Sandrina Turczynski
- INSERM U830, Laboratory of Translational Research In Pediatric Oncology, PSL Research University, SIREDO Oncology center, Institut Curie Research Center, Paris, France
| | - Rachida Bouarich-Bourimi
- INSERM U830, Laboratory of Translational Research In Pediatric Oncology, PSL Research University, SIREDO Oncology center, Institut Curie Research Center, Paris, France
| | - Magali Frah
- INSERM U830, Laboratory of Translational Research In Pediatric Oncology, PSL Research University, SIREDO Oncology center, Institut Curie Research Center, Paris, France
| | - Christelle Dufour
- Department of Children and Adolescents Oncology, Gustave Roussy, Paris Saclay University, Villejuif, France
| | - Thomas Blauwblomme
- Department of Pediatric Neurosurgery-AP-HP, Necker Sick Kids Hospital, Université de Paris, Paris, France
| | | | - Gaelle Pierron
- Somatic Genetic Unit, Department of Pathology and Diagnostic and Theranostic Medecine, Institut Curie Hospital, Paris, France
| | - Laetitia Maillot
- Somatic Genetic Unit, Department of Pathology and Diagnostic and Theranostic Medecine, Institut Curie Hospital, Paris, France
| | - Delphine Guillemot
- Somatic Genetic Unit, Department of Pathology and Diagnostic and Theranostic Medecine, Institut Curie Hospital, Paris, France
| | - Stéphanie Reynaud
- Somatic Genetic Unit, Department of Pathology and Diagnostic and Theranostic Medicine, Institut Curie Hospital, Paris, France
- Christine Bourneix
- Somatic Genetic Unit, Department of Pathology and Diagnostic and Theranostic Medicine, Institut Curie Hospital, Paris, France
- Célio Pouponnot
- CNRS UMR 3347, INSERM U1021, Institut Curie, PSL Research University, Université Paris-Saclay, Orsay, France
- Didier Surdez
- INSERM U830, Diversity and Plasticity of Childhood Tumors Lab, PSL Research University, SIREDO Oncology Center, Institut Curie Research Center, Paris, France
- Balgrist University Hospital, Faculty of Medicine, University of Zurich (UZH), Zurich, Switzerland
- Mylene Bohec
- Institut Curie, PSL University, Single Cell Initiative, ICGex Next-Generation Sequencing Platform, PSL University, 75005, Paris, France
- Sylvain Baulande
- Institut Curie, PSL University, Single Cell Initiative, ICGex Next-Generation Sequencing Platform, PSL University, 75005, Paris, France
- Olivier Delattre
- Somatic Genetic Unit, Department of Pathology and Diagnostic and Theranostic Medicine, Institut Curie Hospital, Paris, France
- INSERM U830, Diversity and Plasticity of Childhood Tumors Lab, PSL Research University, SIREDO Oncology Center, Institut Curie Research Center, Paris, France
- Eliane Piaggio
- INSERM U932, Immunity and Cancer, PSL Research University, Institut Curie Research Center, Paris, France
- Olivier Ayrault
- CNRS UMR 3347, INSERM U1021, Institut Curie, PSL Research University, Université Paris-Saclay, Orsay, France
- Joshua J Waterfall
- INSERM U830, Integrative Functional Genomics of Cancer Lab, PSL Research University, Institut Curie Research Center, Paris, France
- Department of Translational Research, PSL Research University, Institut Curie Research Center, Paris, France
- Nicolas Servant
- INSERM U900, Bioinformatics, Biostatistics, Epidemiology and Computational Systems Unit, Institut Curie, Mines Paris Tech, PSL Research University, Institut Curie Research Center, Paris, France
- Kevin Beccaria
- Department of Pediatric Neurosurgery-AP-HP, Necker Sick Kids Hospital, Université de Paris, Paris, France
- Volodia Dangouloff-Ros
- Pediatric Radiology Department, AP-HP, Necker Sick Kids Hospital and Paris Cité University INSERM 1299 and UMR 1163, Institut Imagine, Paris, France
- Franck Bourdeaut
- INSERM U830, Laboratory of Translational Research In Pediatric Oncology, PSL Research University, SIREDO Oncology Center, Institut Curie Research Center, Paris, France
- Department of Pediatric Oncology, SIREDO Oncology Center, Institut Curie Hospital, Paris, and Université de Paris, Paris, France
11
Maigné É, Noirot C, Henry J, Adu Kesewaah Y, Badin L, Déjean S, Guilmineau C, Krebs A, Mathevet F, Segalini A, Thomassin L, Colongo D, Gaspin C, Liaubet L, Vialaneix N. Asterics: a simple tool for the ExploRation and Integration of omiCS data. BMC Bioinformatics 2023; 24:391. [PMID: 37853347 PMCID: PMC10583411 DOI: 10.1186/s12859-023-05504-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/30/2023] [Accepted: 09/28/2023] [Indexed: 10/20/2023] Open
Abstract
BACKGROUND: The rapid development of omics acquisition techniques has led to the production of a large volume of heterogeneous and multi-level omics datasets, which require specific and sometimes complex analyses to obtain relevant biological information. Here, we present ASTERICS (version 2.5), a publicly available web interface for the analysis of omics datasets. RESULTS: ASTERICS is designed to make both standard and complex exploratory and integration analysis workflows easily available to biologists and to provide high-quality interactive plots. Special care has been taken to provide comprehensive documentation of the implemented analyses and to guide users toward sound analysis choices for specific types of omics data. Data and analyses are organized in a comprehensive graphical workflow within the ASTERICS workspace to facilitate the understanding of the successive data edits and analyses leading to a given result. CONCLUSION: ASTERICS provides an easy-to-use platform for omics data exploration and integration. The modular organization of its open-source code makes it easy for external contributors to incorporate new workflows and analyses. ASTERICS is available at https://asterics.miat.inrae.fr and can also be deployed using the provided Docker images.
Affiliation(s)
- Élise Maigné
- Université de Toulouse, INRAE, UR MIAT, 31326, Castanet-Tolosan, France
- Céline Noirot
- Université de Toulouse, INRAE, UR MIAT, 31326, Castanet-Tolosan, France
- Université Fédérale de Toulouse, INRAE, Bioinfomics, Genotoul Bioinformatics Facility, 31326, Castanet-Tolosan, France
- Julien Henry
- Université de Toulouse, INRAE, UR MIAT, 31326, Castanet-Tolosan, France
- Plateforme Biostatistique, Genotoul, Toulouse, France
- Yaa Adu Kesewaah
- Université de Toulouse, INRAE, UR MIAT, 31326, Castanet-Tolosan, France
- Plateforme Biostatistique, Genotoul, Toulouse, France
- Sébastien Déjean
- Plateforme Biostatistique, Genotoul, Toulouse, France
- IMT, UMR 5219, Université de Toulouse, CNRS, UPS, 31062, Toulouse, France
- Camille Guilmineau
- Université de Toulouse, INRAE, UR MIAT, 31326, Castanet-Tolosan, France
- Plateforme Biostatistique, Genotoul, Toulouse, France
- Arielle Krebs
- Université de Toulouse, INRAE, UR MIAT, 31326, Castanet-Tolosan, France
- Université Fédérale de Toulouse, INRAE, Bioinfomics, Genotoul Bioinformatics Facility, 31326, Castanet-Tolosan, France
- Fanny Mathevet
- Université de Toulouse, INRAE, UR MIAT, 31326, Castanet-Tolosan, France
- Plateforme Biostatistique, Genotoul, Toulouse, France
- Christine Gaspin
- Université de Toulouse, INRAE, UR MIAT, 31326, Castanet-Tolosan, France
- Université Fédérale de Toulouse, INRAE, Bioinfomics, Genotoul Bioinformatics Facility, 31326, Castanet-Tolosan, France
- Laurence Liaubet
- GenPhySE, Université de Toulouse, INRAE, ENVT, 31326, Castanet-Tolosan, France
- Nathalie Vialaneix
- Université de Toulouse, INRAE, UR MIAT, 31326, Castanet-Tolosan, France
- Plateforme Biostatistique, Genotoul, Toulouse, France
12
Wang Q, He M, Guo L, Chai H. AFEI: adaptive optimized vertical federated learning for heterogeneous multi-omics data integration. Brief Bioinform 2023; 24:bbad269. [PMID: 37497720 DOI: 10.1093/bib/bbad269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 06/26/2023] [Accepted: 07/04/2023] [Indexed: 07/28/2023] Open
Abstract
Vertical federated learning has gained popularity as a means of enabling collaboration and information sharing between different entities while maintaining data privacy and security. This approach has potential applications in healthcare, cancer prognosis prediction, and other industries where data privacy is a major concern. Although using multi-omics data for cancer prognosis prediction provides more information for treatment selection, collecting different types of omics data can be challenging because they are produced in different medical institutions. Data owners must comply with strict data protection regulations such as the European Union (EU) General Data Protection Regulation. To share patient data across multiple institutions, privacy and security issues must be addressed. Therefore, we propose AFEI (adaptive optimized vertical federated learning for heterogeneous multi-omics data integration), a framework that integrates multi-omics data collected from multiple institutions for cancer prognosis prediction. AFEI enables participating parties to build an accurate joint evaluation model that learns more information about cancer patients from different perspectives, based on the distributed and encrypted multi-omics features shared by multiple institutions. The experimental results demonstrate that, by using the encrypted multi-omics data from different institutions, AFEI achieves higher prediction accuracy (6.5% on average) than models using single-omics data, and it performs almost as well as prognosis prediction that directly integrates multi-omics data. Overall, AFEI can be seen as an efficient solution for breaking down barriers to multi-institutional collaboration and promoting the development of cancer prognosis prediction.
Affiliation(s)
- Qingyong Wang
- School of Information and Computer, Anhui Agricultural University, Hefei 230000, China
- Minfan He
- School of Mathematics and Big Data, Foshan University, Foshan 528000, China
- Longyi Guo
- Guangdong Provincial Hospital of Traditional Chinese Medicine, Guangzhou 510000, China
- Hua Chai
- School of Mathematics and Big Data, Foshan University, Foshan 528000, China
13
Briscik M, Dillies MA, Déjean S. Improvement of variables interpretability in kernel PCA. BMC Bioinformatics 2023; 24:282. [PMID: 37438763 DOI: 10.1186/s12859-023-05404-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 06/27/2023] [Indexed: 07/14/2023] Open
Abstract
BACKGROUND: Kernel methods have proven to be a powerful tool for the integration and analysis of data generated by high-throughput technologies. Kernels offer a nonlinear version of any linear algorithm that is based solely on dot products. The kernelized version of principal component analysis is a valid nonlinear alternative for tackling the nonlinearity of biological sample spaces. This paper proposes a novel methodology to obtain a data-driven feature importance based on the kernel PCA representation of the data. RESULTS: The proposed method, kernel PCA Interpretable Gradient (KPCA-IG), provides a data-driven feature importance that is computationally fast and based solely on linear algebra calculations. It has been compared with existing methods on three benchmark datasets. The accuracy obtained using the features selected by KPCA-IG is equal to or greater than the average of the other methods. The observed computational cost also demonstrates the high efficiency of the method. An exhaustive literature search has been conducted on the genes selected from a publicly available hepatocellular carcinoma dataset to validate the retained features from a biological point of view. The results again support the appropriateness of the computed ranking. CONCLUSIONS: The black-box nature of kernel PCA calls for new methods to interpret the original features. Our proposed methodology, KPCA-IG, proved to be a valid alternative for selecting influential variables in high-dimensional high-throughput datasets, potentially unravelling new biological and medical biomarkers.
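The statement that kernels give a nonlinear version of any algorithm built only on dot products can be illustrated with a small sketch. It is not taken from the paper and does not implement KPCA-IG: it assumes a homogeneous degree-2 polynomial kernel on 2-D inputs, and all function names are hypothetical. It checks that a squared distance in feature space, computed from kernel evaluations alone, matches the same distance computed with the explicit feature map.

```python
import math


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def poly2_features(x):
    """Explicit feature map for 2-D inputs; phi(x) . phi(y) equals poly2_kernel(x, y)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def feature_space_sq_distance(x, y, kernel):
    """||phi(x) - phi(y)||^2 expanded into dot products, i.e. kernel calls only."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)


def explicit_sq_distance(x, y):
    """The same quantity, computed by mapping both points explicitly."""
    px, py = poly2_features(x), poly2_features(y)
    return sum((a - b) ** 2 for a, b in zip(px, py))


# Illustrative points (assumed values, not data from the paper).
x, y = (1.0, 2.0), (3.0, -1.0)
implicit = feature_space_sq_distance(x, y, poly2_kernel)
explicit = explicit_sq_distance(x, y)
print(implicit, explicit)
```

Kernel PCA relies on the same substitution: every dot product in linear PCA is replaced by a kernel evaluation, so the feature map never has to be computed explicitly.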
Affiliation(s)
- Mitja Briscik
- Institut de Mathématiques de Toulouse, UMR5219, CNRS, UPS, Université de Toulouse, Cedex 9, 31062, Toulouse, France
- Marie-Agnès Dillies
- Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015, Paris, France
- Sébastien Déjean
- Institut de Mathématiques de Toulouse, UMR5219, CNRS, UPS, Université de Toulouse, Cedex 9, 31062, Toulouse, France
14
Erdem C, Gross SM, Heiser LM, Birtwistle MR. MOBILE pipeline enables identification of context-specific networks and regulatory mechanisms. Nat Commun 2023; 14:3991. [PMID: 37414767 PMCID: PMC10326020 DOI: 10.1038/s41467-023-39729-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Accepted: 06/27/2023] [Indexed: 07/08/2023] Open
Abstract
Robust identification of context-specific network features that control cellular phenotypes remains a challenge. We here introduce MOBILE (Multi-Omics Binary Integration via Lasso Ensembles) to nominate molecular features associated with cellular phenotypes and pathways. First, we use MOBILE to nominate mechanisms of interferon-γ (IFNγ) regulated PD-L1 expression. Our analyses suggest that IFNγ-controlled PD-L1 expression involves BST2, CLIC2, FAM83D, ACSL5, and HIST2H2AA3 genes, which were supported by prior literature. We also compare networks activated by related family members transforming growth factor-beta 1 (TGFβ1) and bone morphogenetic protein 2 (BMP2) and find that differences in ligand-induced changes in cell size and clustering properties are related to differences in laminin/collagen pathway activity. Finally, we demonstrate the broad applicability and adaptability of MOBILE by analyzing publicly available molecular datasets to investigate breast cancer subtype specific networks. Given the ever-growing availability of multi-omics datasets, we envision that MOBILE will be broadly useful for identification of context-specific molecular features and pathways.
Affiliation(s)
- Cemal Erdem
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA
- Sean M Gross
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- Laura M Heiser
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- Marc R Birtwistle
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA
- Department of Bioengineering, Clemson University, Clemson, SC, USA
15
Nõlvak H, Truu M, Tiirik K, Devarajan AK, Peeb A, Truu J. The effect of synthetic silver nanoparticles on the antibiotic resistome and the removal efficiency of antibiotic resistance genes in a hybrid filter system treating municipal wastewater. Water Res 2023; 237:119986. [PMID: 37098287 DOI: 10.1016/j.watres.2023.119986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2023] [Revised: 04/15/2023] [Accepted: 04/18/2023] [Indexed: 05/09/2023]
Abstract
Engineered nanoparticles, including silver nanoparticles (AgNPs), are released into the environment mainly through wastewater treatment systems. Knowledge of the impact of AgNPs on the abundance and removal efficiency of antibiotic resistance genes (ARGs) in wastewater treatment facilities, including constructed wetlands (CWs), is essential in the context of public health. This study evaluated the effect of increased (100-fold) collargol (protein-coated AgNPs) and ionic Ag+ in municipal wastewater on the structure, abundance, and removal efficiency of the antibiotic resistome, integron-integrase genes, and pathogens in a hybrid CW using quantitative PCR and metagenomic approaches. The abundance of ARGs in wastewater and the removal efficiency of ARGs in the hybrid system were significantly affected by higher Ag concentrations, especially with collargol treatment, resulting in an elevated ARG discharge of system effluent into the environment. The accumulated Ag in the filters had a more profound effect on the absolute and relative abundance of ARGs in the treated water than the Ag content in the water. This study recorded significantly enhanced relative abundance values for tetracycline (tetA, tetC, tetQ), sulfonamide (sul1, sul2), and aminoglycoside (aadA) resistance genes, which are frequently found on mobile genetic elements in collargol- and, to a lesser extent, AgNO3-treated subsystems. Elevated plasmid and integron-integrase gene levels, especially intI1, in response to collargol presence indicated the substantial role of AgNPs in promoting horizontal gene transfer in the treatment system. The pathogenic segment of the prokaryotic community was similar to a typical sewage community, and strong correlations between pathogen and ARG proportions were recorded in vertical subsurface flow filters. Furthermore, the proportion of Salmonella enterica was positively related to the Ag content in these filter effluents. 
The effect of AgNPs on the nature and characteristics of prominent resistance genes carried by mobile genetic elements in CWs requires further investigation.
Affiliation(s)
- Hiie Nõlvak
- Institute of Molecular and Cell Biology, University of Tartu, Riia 23, Tartu 51010, Estonia
- Marika Truu
- Institute of Molecular and Cell Biology, University of Tartu, Riia 23, Tartu 51010, Estonia
- Kertu Tiirik
- Institute of Molecular and Cell Biology, University of Tartu, Riia 23, Tartu 51010, Estonia
- Arun Kumar Devarajan
- Institute of Molecular and Cell Biology, University of Tartu, Riia 23, Tartu 51010, Estonia
- Angela Peeb
- Institute of Molecular and Cell Biology, University of Tartu, Riia 23, Tartu 51010, Estonia
- Jaak Truu
- Institute of Molecular and Cell Biology, University of Tartu, Riia 23, Tartu 51010, Estonia
16
Fu J, Zhu F, Xu CJ, Li Y. Metabolomics meets systems immunology. EMBO Rep 2023; 24:e55747. [PMID: 36916532 PMCID: PMC10074123 DOI: 10.15252/embr.202255747] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/08/2022] [Revised: 12/24/2022] [Accepted: 02/24/2023] [Indexed: 03/16/2023] Open
Abstract
Metabolic processes play a critical role in immune regulation. Metabolomics is the systematic analysis of small molecules (metabolites) in organisms or biological samples, providing an opportunity to comprehensively study interactions between metabolism and immunity in physiology and disease. Integrating metabolomics into systems immunology allows the exploration of the interactions of multilayered features in the biological system and the molecular regulatory mechanisms of these features. Here, we provide an overview of recent technological developments in metabolomic applications in immunological research. First, two widely used metabolomics approaches are compared: targeted and untargeted metabolomics. Second, we provide a comprehensive overview of the analysis workflow and the computational tools available, including sample preparation, raw spectra data preprocessing, data processing, statistical analysis, and interpretation. Third, we describe how to integrate metabolomics with other omics approaches in immunological studies using available tools. Finally, we discuss new developments in metabolomics and its prospects for immunology research. This review provides guidance to researchers using metabolomics and multiomics in immunity research, thus facilitating the application of systems immunology to disease research.
Affiliation(s)
- Jianbo Fu
- Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz Centre for Infection Research (HZI) and Hannover Medical School (MHH), Hannover, Germany; TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Helmholtz Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Cheng-Jian Xu
- Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz Centre for Infection Research (HZI) and Hannover Medical School (MHH), Hannover, Germany; TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Helmholtz Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany; Department of Internal Medicine and Radboud Center for Infectious Diseases, Radboud University Medical Center, Nijmegen, The Netherlands
- Yang Li
- Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz Centre for Infection Research (HZI) and Hannover Medical School (MHH), Hannover, Germany; TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Helmholtz Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany; Department of Internal Medicine and Radboud Center for Infectious Diseases, Radboud University Medical Center, Nijmegen, The Netherlands
17
Erdem C, Birtwistle MR. MEMMAL: A tool for expanding large-scale mechanistic models with machine learned associations and big datasets. Front Syst Biol 2023; 3:1099413. [PMID: 38269333 PMCID: PMC10807051 DOI: 10.3389/fsysb.2023.1099413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]
Abstract
Computational models that can explain and predict complex sub-cellular, cellular, and tissue-level drug response mechanisms could speed drug discovery and prioritize patient-specific treatments (i.e., precision medicine). Some models are mechanistic, with detailed equations describing known (or supposed) physicochemical processes, while others are statistical or machine learning-based approaches that explain datasets but have no mechanistic or causal guarantees. These two types of modeling are rarely combined, missing the opportunity to explore possibly causal but data-driven new knowledge while explaining what is already known. Here, we explore combining machine learned associations with mechanistic models to develop computational models that could more fully represent cellular behavior. In this proposed MEMMAL (MEchanistic Modeling with MAchine Learning) framework, machine learning/statistical models built using omics datasets provide predictions for new interactions between genes and proteins where there is physicochemical uncertainty. These interactions are used as a basis for new reactions in mechanistic models. As a test case, we focused on incorporating novel IFNγ/PD-L1 related associations into a large-scale mechanistic model for cell proliferation and death to better recapitulate the recently released NIH LINCS Consortium MCF10A dataset and enable description of the cellular response to checkpoint inhibitor immunotherapies. This work is a template for combining big-data-inferred interactions with mechanistic models, which could be more broadly applicable for building multi-scale precision medicine and whole cell models.
Affiliation(s)
- Cemal Erdem
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, United States
- Marc R. Birtwistle
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, United States
- Department of Bioengineering, Clemson University, Clemson, SC, United States
18
Flores JE, Claborne DM, Weller ZD, Webb-Robertson BJM, Waters KM, Bramer LM. Missing data in multi-omics integration: Recent advances through artificial intelligence. Front Artif Intell 2023; 6:1098308. [PMID: 36844425 PMCID: PMC9949722 DOI: 10.3389/frai.2023.1098308] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 11/14/2022] [Accepted: 01/23/2023] [Indexed: 02/11/2023] Open
Abstract
Biological systems function through complex interactions between various 'omics (biomolecules), and a more complete understanding of these systems is only possible through an integrated, multi-omic perspective. This has created the need for integration approaches that can capture the complex, often non-linear, interactions that define these biological systems and that are adapted to the challenges of combining heterogeneous data across 'omic views. A principal challenge to multi-omic integration is missing data, because not all biomolecules are measured in all samples. Due to cost, instrument sensitivity, or other experimental factors, data for a biological sample may be missing for one or more 'omic technologies. Recent methodological developments in artificial intelligence and statistical learning have greatly facilitated the analysis of multi-omics data; however, many of these techniques assume access to completely observed data. A subset of these methods incorporate mechanisms for handling partially observed samples, and these methods are the focus of this review. We describe recently developed approaches, noting their primary use cases and highlighting each method's approach to handling missing data. We additionally provide an overview of the more traditional missing data workflows and their limitations, and we discuss potential avenues for further developments as well as how the missing data issue and its current solutions may generalize beyond the multi-omics context.
Affiliation(s)
- Javier E. Flores
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Daniel M. Claborne
- Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Zachary D. Weller
- Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Bobbie-Jo M. Webb-Robertson
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Katrina M. Waters
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Lisa M. Bramer
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
19
Wei Y, Li L, Zhao X, Yang H, Sa J, Cao H, Cui Y. Cancer subtyping with heterogeneous multi-omics data via hierarchical multi-kernel learning. Brief Bioinform 2023; 24:6847203. [PMID: 36433785 DOI: 10.1093/bib/bbac488] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/28/2022] [Revised: 09/14/2022] [Accepted: 10/15/2022] [Indexed: 11/27/2022] Open
Abstract
Differentiating cancer subtypes is crucial to guide personalized treatment and improve the prognosis for patients. Integrating multi-omics data can offer a comprehensive landscape of cancer biological processes and provide promising avenues for cancer diagnosis and treatment. Taking the heterogeneity of different omics data types into account, we propose hierarchical multi-kernel learning (hMKL), a novel cancer molecular subtyping method that identifies cancer subtypes by adopting a two-stage kernel learning strategy. In stage 1, we obtain a composite kernel for each omics data type by optimizing its kernel parameters, borrowing the idea of cancer integration via multi-kernel learning (CIMLR). In stage 2, we obtain a final fused kernel through a weighted linear combination of the individual kernels learned in stage 1, using an unsupervised multiple kernel learning method. Based on the final fused kernel, k-means clustering is applied to identify cancer subtypes. Simulation studies show that hMKL outperforms the one-stage CIMLR method when there is data heterogeneity. hMKL can correctly estimate the number of clusters, which is the key challenge in subtyping. Application to two real data sets shows that hMKL identified meaningful subtypes and key cancer-associated biomarkers. The proposed method provides a novel toolkit for heterogeneous multi-omics data integration and cancer subtype identification.
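The fusion step in stage 2, a weighted linear combination of per-omics kernels, can be sketched as follows. This is only an illustration under stated assumptions, not the hMKL implementation: the Gaussian kernels, the bandwidths, the toy data, and the fixed weights are all hypothetical, whereas the abstract says hMKL learns kernel parameters and weights from the data. Function and variable names are invented for the example.

```python
import math


def gaussian_kernel_matrix(samples, sigma):
    """Gaussian (RBF) kernel matrix for one omics view; sigma is an assumed bandwidth."""
    n = len(samples)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            kernel[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return kernel


def fuse_kernels(kernels, weights):
    """Weighted linear combination of kernel matrices with non-negative weights summing to 1."""
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Toy data (assumed): 4 samples measured in two views with different feature counts.
expression = [(0.0, 0.1), (0.1, 0.0), (2.0, 2.1), (2.1, 2.0)]
methylation = [(1.0,), (1.2,), (5.0,), (5.1,)]

k_expr = gaussian_kernel_matrix(expression, sigma=1.0)
k_meth = gaussian_kernel_matrix(methylation, sigma=2.0)
# Fixed illustrative weights; a multiple kernel learning method would estimate them.
k_fused = fuse_kernels([k_expr, k_meth], weights=[0.6, 0.4])
```

In this toy setup, samples 0 and 1 are close in both views, so their fused similarity is higher than the similarity between samples 0 and 2. A clustering step such as k-means would then operate on a representation derived from the fused kernel.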
Affiliation(s)
- Yifang Wei
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Lingmei Li
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Xin Zhao
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Haitao Yang
- Division of Health Statistics, School of Public Health, Hebei Medical University, Shijiazhuang, Hebei 050017, PR China
- Jian Sa
- Department of Science and Technology, Shanxi Provincial Key Laboratory of Major Disease Risk Assessment, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Hongyan Cao
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China; Department of Mathematics, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Yuehua Cui
- Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
20
Devarajan AK, Truu M, Gopalasubramaniam SK, Muthukrishanan G, Truu J. Application of data integration for rice bacterial strain selection by combining their osmotic stress response and plant growth-promoting traits. Front Microbiol 2022; 13:1058772. [PMID: 36590400 PMCID: PMC9797599 DOI: 10.3389/fmicb.2022.1058772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 11/29/2022] [Indexed: 12/23/2022] Open
Abstract
Agricultural application of plant-beneficial bacteria to improve crop yield and alleviate the stress caused by environmental conditions, pests, and pathogens is gaining popularity. However, before using these bacterial strains in plant experiments, their environmental stress responses and plant health improvement potential should be examined. In this study, we explored the applicability of three unsupervised machine learning-based data integration methods, including principal component analysis (PCA) of concatenated data, multiple co-inertia analysis (MCIA), and multiple kernel learning (MKL), to select osmotic stress-tolerant plant growth-promoting (PGP) bacterial strains isolated from the rice phyllosphere. The studied datasets consisted of direct and indirect PGP activity measurements and osmotic stress responses of eight bacterial strains previously isolated from the phyllosphere of drought-tolerant rice cultivar. The production of phytohormones, such as indole-acetic acid (IAA), gibberellic acid (GA), abscisic acid (ABA), and cytokinin, were used as direct PGP traits, whereas the production of hydrogen cyanide and siderophore and antagonistic activity against the foliar pathogens Pyricularia oryzae and Helminthosporium oryzae were evaluated as measures of indirect PGP activity. The strains were subjected to a range of osmotic stress levels by adding PEG 6000 (0, 11, 21, and 32.6%) to their growth medium. The results of the osmotic stress response experiments showed that all bacterial strains accumulated endogenous proline and glycine betaine (GB) and exhibited an increase in growth, when osmotic stress levels were increased to a specific degree, while the production of IAA and GA considerably decreased. The three applied data integration methods did not provide a similar grouping of the strains. Especially deviant was the ordination of microbial strains based on the PCA of concatenated data. 
However, all three data integration methods indicated that the strains Bacillus altitudinis PB46 and B. megaterium PB50 shared high similarity in PGP traits and osmotic stress response. Overall, our results indicate that data integration methods complement the single-table data analysis approach and improve the selection process for PGP microbial strains.
Affiliation(s)
- Arun Kumar Devarajan
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Marika Truu
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Sabarinathan Kuttalingam Gopalasubramaniam
- Department of Plant Pathology, Agricultural College and Research Institute, Tamil Nadu Agricultural University, Killikulam, Tuticorin, India
- Gomathy Muthukrishanan
- Department of Soil Science and Agricultural Chemistry, Agricultural College and Research Institute, Tamil Nadu Agricultural University, Killikulam, Tuticorin, India
- Jaak Truu
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
21
Athieniti E, Spyrou GM. A guide to multi-omics data collection and integration for translational medicine. Comput Struct Biotechnol J 2022; 21:134-149. [PMID: 36544480 PMCID: PMC9747357 DOI: 10.1016/j.csbj.2022.11.050] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 06/23/2022] [Revised: 11/25/2022] [Accepted: 11/25/2022] [Indexed: 12/02/2022] Open
Abstract
The emerging high-throughput technologies have led to the shift in the design of translational medicine projects towards collecting multi-omics patient samples and, consequently, their integrated analysis. However, the complexity of integrating these datasets has triggered new questions regarding the appropriateness of the available computational methods. Currently, there is no clear consensus on the best combination of omics to include and the data integration methodologies required for their analysis. This article aims to guide the design of multi-omics studies in the field of translational medicine regarding the types of omics and the integration method to choose. We review articles that perform the integration of multiple omics measurements from patient samples. We identify five objectives in translational medicine applications: (i) detect disease-associated molecular patterns, (ii) subtype identification, (iii) diagnosis/prognosis, (iv) drug response prediction, and (v) understand regulatory processes. We describe common trends in the selection of omic types combined for different objectives and diseases. To guide the choice of data integration tools, we group them into the scientific objectives they aim to address. We describe the main computational methods adopted to achieve these objectives and present examples of tools. We compare tools based on how they deal with the computational challenges of data integration and comment on how they perform against predefined objective-specific evaluation criteria. Finally, we discuss examples of tools for downstream analysis and further extraction of novel insights from multi-omics datasets.
Affiliation(s)
- Efi Athieniti
  - Department of Bioinformatics, The Cyprus Institute of Neurology and Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
- George M. Spyrou
  - Department of Bioinformatics, The Cyprus Institute of Neurology and Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
22
Chen ZS, Kulkarni P, Galatzer-Levy IR, Bigio B, Nasca C, Zhang Y. Modern views of machine learning for precision psychiatry. Patterns (N Y) 2022; 3:100602. [PMID: 36419447 PMCID: PMC9676543 DOI: 10.1016/j.patter.2022.100602] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Indexed: 11/13/2022]
Abstract
In light of the National Institute of Mental Health (NIMH)'s Research Domain Criteria (RDoC), the advent of functional neuroimaging, novel technologies and methods provide new opportunities to develop precise and personalized prognosis and diagnosis of mental disorders. Machine learning (ML) and artificial intelligence (AI) technologies are playing an increasingly critical role in the new era of precision psychiatry. Combining ML/AI with neuromodulation technologies can potentially provide explainable solutions in clinical practice and effective therapeutic treatment. Advanced wearable and mobile technologies also call for the new role of ML/AI for digital phenotyping in mobile mental health. In this review, we provide a comprehensive review of ML methodologies and applications by combining neuroimaging, neuromodulation, and advanced mobile technologies in psychiatry practice. We further review the role of ML in molecular phenotyping and cross-species biomarker identification in precision psychiatry. We also discuss explainable AI (XAI) and neuromodulation in a closed human-in-the-loop manner and highlight the ML potential in multi-media information extraction and multi-modal data fusion. Finally, we discuss conceptual and practical challenges in precision psychiatry and highlight ML opportunities in future research.
Affiliation(s)
- Zhe Sage Chen
  - Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA
  - Department of Neuroscience and Physiology, New York University Grossman School of Medicine, New York, NY 10016, USA
  - The Neuroscience Institute, New York University Grossman School of Medicine, New York, NY 10016, USA
  - Department of Biomedical Engineering, New York University Tandon School of Engineering, Brooklyn, NY 11201, USA
- Isaac R. Galatzer-Levy
  - Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA
  - Meta Reality Lab, New York, NY, USA
- Benedetta Bigio
  - Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA
- Carla Nasca
  - Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA
  - The Neuroscience Institute, New York University Grossman School of Medicine, New York, NY 10016, USA
- Yu Zhang
  - Department of Bioengineering, Lehigh University, Bethlehem, PA 18015, USA
  - Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18015, USA
23
Truu M, Ligi T, Nõlvak H, Peeb A, Tiirik K, Devarajan AK, Oopkaup K, Kasemets K, Kõiv-Vainik M, Kasak K, Truu J. Impact of synthetic silver nanoparticles on the biofilm microbial communities and wastewater treatment efficiency in experimental hybrid filter system treating municipal wastewater. J Hazard Mater 2022; 440:129721. [PMID: 35963093 DOI: 10.1016/j.jhazmat.2022.129721] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/18/2022] [Revised: 07/22/2022] [Accepted: 08/04/2022] [Indexed: 06/15/2023]
Abstract
Silver nanoparticles (AgNPs) threaten human and ecosystem health, and are among the most widely used engineered nanomaterials that reach wastewater during production, usage, and disposal phases. This study evaluated the effect of a 100-fold increase in collargol (protein-coated AgNP) and Ag+ ions concentrations in municipal wastewater on the microbial community composition of the filter material biofilms (FMB) and the purification efficiency of the hybrid treatment system consisting of vertical (VF) and horizontal (HF) subsurface flow filters. We found that increased amounts of collargol and AgNO3 in wastewater had a modest effect on the prokaryotic community composition in FMB and did not significantly affect the performance of the studied system. Regardless of how Ag was introduced, 99.9% of it was removed by the system. AgNPs and AgNO3 concentrations did not significantly affect the purification efficiency of the system. AgNO3 induced a higher increase in the genetic potential of certain Ag resistance mechanisms in VFs than collargol; however, the increase in Ag resistance potential was similar for both substances in HF. Hence, the microbial community composition in biofilms of vertical and horizontal flow filters is largely resistant, resilient, or functionally redundant in response to AgNPs addition in the form of collargol.
Affiliation(s)
- Marika Truu
  - Institute of Molecular and Cell Biology, University of Tartu, Riia 23, 51010 Tartu, Estonia.
- Teele Ligi
  - Institute of Molecular and Cell Biology, University of Tartu, Riia 23, 51010 Tartu, Estonia.
- Hiie Nõlvak
  - Institute of Molecular and Cell Biology, University of Tartu, Riia 23, 51010 Tartu, Estonia.
- Angela Peeb
  - Institute of Molecular and Cell Biology, University of Tartu, Riia 23, 51010 Tartu, Estonia.
- Kertu Tiirik
  - Institute of Molecular and Cell Biology, University of Tartu, Riia 23, 51010 Tartu, Estonia.
- Arun Kumar Devarajan
  - Institute of Molecular and Cell Biology, University of Tartu, Riia 23, 51010 Tartu, Estonia.
- Kristjan Oopkaup
  - Institute of Molecular and Cell Biology, University of Tartu, Riia 23, 51010 Tartu, Estonia.
- Kaja Kasemets
  - Laboratory of Environmental Toxicology, National Institute of Chemical Physics and Biophysics, Akadeemia tee 23, 12618 Tallinn, Estonia.
- Margit Kõiv-Vainik
  - Institute of Ecology and Earth Sciences, University of Tartu, Vanemuise 46, 51014 Tartu, Estonia.
- Kuno Kasak
  - Institute of Ecology and Earth Sciences, University of Tartu, Vanemuise 46, 51014 Tartu, Estonia.
- Jaak Truu
  - Institute of Molecular and Cell Biology, University of Tartu, Riia 23, 51010 Tartu, Estonia.
24
Gliozzo J, Mesiti M, Notaro M, Petrini A, Patak A, Puertas-Gallardo A, Paccanaro A, Valentini G, Casiraghi E. Heterogeneous data integration methods for patient similarity networks. Brief Bioinform 2022; 23:6604996. [PMID: 35679533 PMCID: PMC9294435 DOI: 10.1093/bib/bbac207] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/07/2021] [Revised: 04/14/2022] [Accepted: 05/04/2022] [Indexed: 12/29/2022]
Abstract
Patient similarity networks (PSNs), where patients are represented as nodes and their similarities as weighted edges, are being increasingly used in clinical research. These networks provide an insightful summary of the relationships among patients and can be exploited by inductive or transductive learning algorithms for the prediction of patient outcome, phenotype and disease risk. PSNs can also be easily visualized, thus offering a natural way to inspect complex heterogeneous patient data and providing some level of explainability of the predictions obtained by machine learning algorithms. The advent of high-throughput technologies, enabling us to acquire high-dimensional views of the same patients (e.g. omics data, laboratory data, imaging data), calls for the development of data fusion techniques for PSNs in order to leverage this rich heterogeneous information. In this article, we review existing methods for integrating multiple biomedical data views to construct PSNs, together with the different patient similarity measures that have been proposed. We also review methods that have appeared in the machine learning literature but have not yet been applied to PSNs, thus providing a resource to navigate the vast machine learning literature existing on this topic. In particular, we focus on methods that could be used to integrate very heterogeneous datasets, including multi-omics data as well as data derived from clinical information and medical imaging.
Affiliation(s)
- Jessica Gliozzo
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - European Commission, Joint Research Centre (JRC), Ispra (VA), Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Marco Mesiti
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Marco Notaro
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Alessandro Petrini
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Alex Patak
  - European Commission, Joint Research Centre (JRC), Ispra (VA), Italy
- Alberto Paccanaro
  - Department of Computer Science, Royal Holloway, University of London, Egham, TW20 0EX, UK
  - School of Applied Mathematics (EMAp), Fundação Getúlio Vargas, Rio de Janeiro, Brazil
- Giorgio Valentini
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
  - DSRC UNIMI, Data Science Research Center, Milano, 20135, Italy
  - ELLIS, European Laboratory for Learning and Intelligent Systems, Berlin, Germany
- Elena Casiraghi
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
25
Courbariaux M, De Santiago K, Dalmasso C, Danjou F, Bekadar S, Corvol JC, Martinez M, Szafranski M, Ambroise C. A Sparse Mixture-of-Experts Model With Screening of Genetic Associations to Guide Disease Subtyping. Front Genet 2022; 13:859462. [PMID: 35734430 PMCID: PMC9207464 DOI: 10.3389/fgene.2022.859462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Accepted: 04/21/2022] [Indexed: 11/27/2022]
Abstract
Motivation: Identifying new genetic associations in non-Mendelian complex diseases is an increasingly difficult challenge. These diseases sometimes appear to have a significant component of heritability requiring explanation, and this missing heritability may be due to the existence of subtypes involving different genetic factors. Taking genetic information into account in clinical trials might potentially have a role in guiding the process of subtyping a complex disease. Most methods dealing with multiple sources of information rely on data transformation, and in disease subtyping, the two main strategies used are 1) the clustering of clinical data followed by posterior genetic analysis and 2) the concomitant clustering of clinical and genetic variables. Both of these strategies have limitations that we propose to address. Contribution: This work proposes an original method for disease subtyping on the basis of both longitudinal clinical variables and high-dimensional genetic markers via a sparse mixture-of-regressions model. The added value of our approach lies in its interpretability in relation to two aspects. First, our model links both clinical and genetic data with regard to their initial nature (i.e., without transformation) and does not require post-processing where the original information is accessed a second time to interpret the subtypes. Second, it can address large-scale problems because of a variable selection step that is used to discard genetic variables that may not be relevant for subtyping. Results: The proposed method was validated on simulations. A dataset from a cohort of Parkinson's disease patients was also analyzed. Several subtypes of the disease and genetic variants that potentially have a role in this typology were identified. Software availability: The R code for the proposed method, named DiSuGen, and a tutorial are available for download (see the references).
Affiliation(s)
- Marie Courbariaux
  - Université Paris-Saclay, CNRS, Université d’Évry, Laboratoire de Mathématiques et Modélisation d’Évry, Évry-Courcouronnes, France
- Kylliann De Santiago
  - Université Paris-Saclay, CNRS, Université d’Évry, Laboratoire de Mathématiques et Modélisation d’Évry, Évry-Courcouronnes, France
- Cyril Dalmasso
  - Université Paris-Saclay, CNRS, Université d’Évry, Laboratoire de Mathématiques et Modélisation d’Évry, Évry-Courcouronnes, France
- Fabrice Danjou
  - Sorbonne Université, Paris Brain Institute–ICM, Inserm, CNRS, Assistance Publique Hôpitaux de Paris, Pitié-Salpêtrière Hospital, Department of Neurology, Paris, France
- Samir Bekadar
  - Sorbonne Université, Paris Brain Institute–ICM, Inserm, CNRS, Assistance Publique Hôpitaux de Paris, Pitié-Salpêtrière Hospital, Department of Neurology, Paris, France
- Jean-Christophe Corvol
  - Sorbonne Université, Paris Brain Institute–ICM, Inserm, CNRS, Assistance Publique Hôpitaux de Paris, Pitié-Salpêtrière Hospital, Department of Neurology, Paris, France
- Maria Martinez
  - Institut de Recherche en Santé Digestive, Inserm, CHU Purpan, Toulouse, France
- Marie Szafranski
  - Université Paris-Saclay, CNRS, Université d’Évry, Laboratoire de Mathématiques et Modélisation d’Évry, Évry-Courcouronnes, France
  - ENSIIE, Évry-Courcouronnes, France
- Christophe Ambroise
  - Université Paris-Saclay, CNRS, Université d’Évry, Laboratoire de Mathématiques et Modélisation d’Évry, Évry-Courcouronnes, France
26
Hall RD, D'Auria JC, Silva Ferreira AC, Gibon Y, Kruszka D, Mishra P, van de Zedde R. High-throughput plant phenotyping: a role for metabolomics? Trends Plant Sci 2022; 27:549-563. [PMID: 35248492 DOI: 10.1016/j.tplants.2022.02.001] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 09/06/2021] [Revised: 01/18/2022] [Accepted: 02/02/2022] [Indexed: 05/17/2023]
Abstract
High-throughput (HTP) plant phenotyping approaches are developing rapidly and are already helping to bridge the genotype-phenotype gap. However, technologies should be developed beyond current physico-spectral evaluations to extend our analytical capacities to the subcellular level. Metabolites define and determine many key physiological and agronomic features in plants and an ability to integrate a metabolomics approach within current HTP phenotyping platforms has huge potential for added value. While key challenges remain on several fronts, novel technological innovations are upcoming yet under-exploited in a phenotyping context. In this review, we present an overview of the state of the art and how current limitations might be overcome to enable full integration of metabolomics approaches into a generic phenotyping pipeline in the near future.
Affiliation(s)
- Robert D Hall
  - BU Bioscience, Wageningen University & Research, 6700 AA, Wageningen, The Netherlands
  - Laboratory of Plant Physiology, Wageningen University, 6700 AA, Wageningen, The Netherlands
  - Netherlands Metabolomics Centre, Einsteinweg 55, Leiden, The Netherlands.
- John C D'Auria
  - Department of Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK Gatersleben), Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
- Antonio C Silva Ferreira
  - Universidade Católica Portuguesa, CBQF-Centro de Biotecnologia e Química Fina-Laboratório Associado, Escola Superior de Biotecnologia, Rua Arquiteto Lobão Vital, Apartado 2511, 4202-401 Porto, Portugal
  - Faculty of AgriSciences, University of Stellenbosch, Matieland 7602, South Africa
  - Cork Supply Portugal, S.A., Rua Nova do Fial, 4535, Portugal
- Yves Gibon
  - UMR 1332 Biologie du Fruit et Pathologie, INRAE, Univ. Bordeaux, INRAE Nouvelle Aquitaine - Bordeaux, Avenue Edouard Bourlaux, Villenave d'Ornon, France
  - Bordeaux Metabolome, MetaboHUB, INRAE, Univ. Bordeaux, Avenue Edouard Bourlaux, Villenave d'Ornon, France
  - PMB-Metabolome, INRAE, Centre INRAE de Nouvelle, Aquitaine-Bordeaux, Villenave d'Ornon, France
- Dariusz Kruszka
  - Institute of Plant Genetics, Polish Academy of Sciences, 60-479 Poznan, Poland
- Puneet Mishra
  - Food and Biobased Research, Wageningen University & Research, 6708 WE, Wageningen, The Netherlands
- Rick van de Zedde
  - Plant Sciences Group, Wageningen University & Research, 6700 AA, Wageningen, The Netherlands
27
Hesami M, Alizadeh M, Jones AMP, Torkamaneh D. Machine learning: its challenges and opportunities in plant system biology. Appl Microbiol Biotechnol 2022; 106:3507-3530. [PMID: 35575915 DOI: 10.1007/s00253-022-11963-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/14/2022] [Revised: 03/14/2022] [Accepted: 05/07/2022] [Indexed: 12/25/2022]
Abstract
Sequencing technologies are evolving at a rapid pace, enabling the generation of massive amounts of data in multiple dimensions (e.g., genomics, epigenomics, transcriptomics, metabolomics, proteomics, and single-cell omics) in plants. To provide comprehensive insights into the complexity of plant biological systems, it is important to integrate different omics datasets. Although recent advances in computational analytical pipelines have enabled efficient and high-quality exploration and exploitation of single omics data, the integration of multidimensional, heterogeneous, and large datasets (i.e., multi-omics) remains a challenge. In this regard, machine learning (ML) offers promising approaches to integrate large datasets and to recognize fine-grained patterns and relationships. Nevertheless, these approaches require rigorous optimization to process multi-omics-derived datasets. In this review, we discuss the main concepts of machine learning as well as the key challenges and solutions related to the big data derived from plant system biology. We also provide in-depth insight into the principles of data integration using ML, as well as challenges and opportunities in different contexts including multi-omics, single-cell omics, protein function, and protein-protein interaction.
KEY POINTS:
• The key challenges and solutions related to the big data derived from plant system biology have been highlighted.
• Different methods of data integration have been discussed.
• Challenges and opportunities of the application of machine learning in plant system biology have been highlighted and discussed.
Affiliation(s)
- Mohsen Hesami
  - Department of Plant Agriculture, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Milad Alizadeh
  - Department of Botany, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Davoud Torkamaneh
  - Département de Phytologie, Université Laval, Québec City, QC, G1V 0A6, Canada
  - Institut de Biologie Intégrative Et Des Systèmes (IBIS), Université Laval, Québec City, QC, G1V 0A6, Canada.
28
Hwangbo S, Lee S, Lee S, Hwang H, Kim I, Park T. Kernel-based hierarchical structural component models for pathway analysis. Bioinformatics 2022; 38:3078-3086. [PMID: 35460238 DOI: 10.1093/bioinformatics/btac276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 04/08/2022] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Pathway analyses have led to more insight into the underlying biological functions related to the phenotype of interest in various types of omics data. Pathway-based statistical approaches have been actively developed, but most of them do not consider correlations among pathways. Because it is well known that there are quite a few biomarkers that overlap between pathways, these approaches may provide misleading results. In addition, most pathway-based approaches tend to assume that biomarkers within a pathway have linear associations with the phenotype of interest, even though the relationships are more complex.
RESULTS: To model complex effects including nonlinear effects, we propose a new approach, Hierarchical structural CoMponent analysis using Kernel (HisCoM-Kernel). The proposed method models nonlinear associations between biomarkers and phenotype by extending the kernel machine regression and analyzes entire pathways simultaneously by using the biomarker-pathway hierarchical structure. HisCoM-Kernel is a flexible model that can be applied to various omics data. It was successfully applied to three omics datasets generated by different technologies. Our simulation studies showed that HisCoM-Kernel provided higher statistical power than other existing pathway-based methods in all datasets. The application of HisCoM-Kernel to three types of omics dataset showed its superior performance compared to existing methods in identifying more biologically meaningful pathways, including those reported in previous studies.
AVAILABILITY AND IMPLEMENTATION: Freely available at http://statgen.snu.ac.kr/software/HisCom-Kernel/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Suhyun Hwangbo
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-747, Korea
  - Department of Genomic Medicine, Seoul National University Hospital, Seoul, 03080, Korea
- Sungyoung Lee
  - Department of Genomic Medicine, Seoul National University Hospital, Seoul, 03080, Korea
- Seungyeoun Lee
  - Department of Mathematics and Statistics, Sejong University, Sejong, 05006, Korea
- Heungsun Hwang
  - Department of Psychology, McGill University, Montreal, QC, H3A 1B1, Canada
- Inyoung Kim
  - Department of Statistics, Virginia Tech, Blacksburg, Virginia, 24060, U.S.A
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-747, Korea
  - Department of Statistics, Seoul National University, Seoul, 151-747, Korea
29
Vahabi N, Michailidis G. Unsupervised Multi-Omics Data Integration Methods: A Comprehensive Review. Front Genet 2022; 13:854752. [PMID: 35391796 PMCID: PMC8981526 DOI: 10.3389/fgene.2022.854752] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 01/14/2022] [Accepted: 02/28/2022] [Indexed: 12/26/2022]
Abstract
Through the developments of Omics technologies and dissemination of large-scale datasets, such as those from The Cancer Genome Atlas, Alzheimer’s Disease Neuroimaging Initiative, and Genotype-Tissue Expression, it is becoming increasingly possible to study complex biological processes and disease mechanisms more holistically. However, to obtain a comprehensive view of these complex systems, it is crucial to integrate data across various Omics modalities, and also leverage external knowledge available in biological databases. This review aims to provide an overview of multi-Omics data integration methods with different statistical approaches, focusing on unsupervised learning tasks, including disease onset prediction, biomarker discovery, disease subtyping, module discovery, and network/pathway analysis. We also briefly review feature selection methods, multi-Omics data sets, and resources/tools that constitute critical components for carrying out the integration.
Affiliation(s)
- Nasim Vahabi
  - Informatics Institute, University of Florida, Gainesville, FL, United States
- George Michailidis
  - Informatics Institute, University of Florida, Gainesville, FL, United States
30
Sapoval N, Aghazadeh A, Nute MG, Antunes DA, Balaji A, Baraniuk R, Barberan CJ, Dannenfelser R, Dun C, Edrisi M, Elworth RAL, Kille B, Kyrillidis A, Nakhleh L, Wolfe CR, Yan Z, Yao V, Treangen TJ. Current progress and open challenges for applying deep learning across the biosciences. Nat Commun 2022; 13:1728. [PMID: 35365602 PMCID: PMC8976012 DOI: 10.1038/s41467-022-29268-7] [Citation(s) in RCA: 95] [Impact Index Per Article: 31.7] [Received: 08/25/2021] [Accepted: 03/09/2022] [Indexed: 11/19/2022]
Abstract
Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL on five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.
Affiliation(s)
- Nicolae Sapoval
  - Department of Computer Science, Rice University, Houston, TX, USA
- Amirali Aghazadeh
  - Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA
- Michael G Nute
  - Department of Computer Science, Rice University, Houston, TX, USA
- Dinler A Antunes
  - Department of Biology and Biochemistry, University of Houston, Houston, TX, USA
- Advait Balaji
  - Department of Computer Science, Rice University, Houston, TX, USA
- Richard Baraniuk
  - Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- C J Barberan
  - Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Chen Dun
  - Department of Computer Science, Rice University, Houston, TX, USA
- R A Leo Elworth
  - Department of Computer Science, Rice University, Houston, TX, USA
- Bryce Kille
  - Department of Computer Science, Rice University, Houston, TX, USA
- Luay Nakhleh
  - Department of Computer Science, Rice University, Houston, TX, USA
- Cameron R Wolfe
  - Department of Computer Science, Rice University, Houston, TX, USA
- Zhi Yan
  - Department of Computer Science, Rice University, Houston, TX, USA
- Vicky Yao
  - Department of Computer Science, Rice University, Houston, TX, USA
- Todd J Treangen
  - Department of Computer Science, Rice University, Houston, TX, USA.
  - Department of Bioengineering, Rice University, Houston, TX, USA.
31
Brouard C, Mariette J, Flamary R, Vialaneix N. Feature selection for kernel methods in systems biology. NAR Genom Bioinform 2022; 4:lqac014. [PMID: 35265835 PMCID: PMC8900155 DOI: 10.1093/nargab/lqac014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2021] [Revised: 01/20/2022] [Accepted: 02/14/2022] [Indexed: 11/13/2022]
Abstract
The substantial development of high-throughput biotechnologies has rendered large-scale multi-omics datasets increasingly available. New challenges have emerged to process and integrate this large volume of information, often obtained from widely heterogeneous sources. Kernel methods have proven successful in handling the analysis of different types of datasets obtained on the same individuals. However, they usually suffer from a lack of interpretability since the original description of the individuals is lost due to the kernel embedding. We propose novel feature selection methods that are adapted to the kernel framework and go beyond the well-established work in supervised learning by addressing the more difficult tasks of unsupervised learning and kernel output learning. The method is expressed in the form of a non-convex optimization problem with an ℓ1 penalty, which is solved with a proximal gradient descent approach. It is tested on several systems biology datasets and shows good performance in selecting relevant and less redundant features compared to existing alternatives. It also proved relevant for identifying the governmental measures that best explain the time series of the COVID-19 reproduction number during the first months of 2020. The proposed feature selection method is embedded in the R package mixKernel version 0.8, published on CRAN. Installation instructions are available at http://mixkernel.clementine.wf/.
Affiliation(s)
- Céline Brouard
  - Université de Toulouse, INRAE, UR MIAT, F-31320, Castanet-Tolosan, France
- Jérôme Mariette
  - Université de Toulouse, INRAE, UR MIAT, F-31320, Castanet-Tolosan, France
- Rémi Flamary
  - École Polytechnique, CMAP, F-91120, Palaiseau, France
- Nathalie Vialaneix
  - Université de Toulouse, INRAE, UR MIAT, F-31320, Castanet-Tolosan, France
32
Demirel HC, Arici MK, Tuncbag N. Computational approaches leveraging integrated connections of multi-omic data toward clinical applications. Mol Omics 2021; 18:7-18. [PMID: 34734935 DOI: 10.1039/d1mo00158b] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/18/2023]
Abstract
In line with the advances in high-throughput technologies, multiple omic datasets have accumulated to study biological systems and diseases coherently. No single omics data type is capable of fully representing cellular activity. The complexity of the biological processes arises from the interactions between omic entities such as genes, proteins, and metabolites. Therefore, multi-omic data integration is crucial but challenging. The impact of the molecular alterations in multi-omic data is not local in the neighborhood of the altered gene or protein; rather, the impact diffuses in the network and changes the functionality of multiple signaling pathways and regulation of the gene expression. Additionally, multi-omic data is high-dimensional and has background noise. Several integrative approaches have been developed to accurately interpret the multi-omic datasets, including machine learning, network-based methods, and their combination. In this review, we overview the most recent integrative approaches and tools with a focus on network-based methods. We then discuss these approaches according to their specific applications, from disease-network and biomarker identification to patient stratification, drug discovery, and repurposing.
Affiliation(s)
- Habibe Cansu Demirel
  - Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
- Muslum Kaan Arici
  - Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
  - Foot and Mouth Diseases Institute, Ministry of Agriculture and Forestry, Ankara, 06044, Turkey
- Nurcan Tuncbag
  - Chemical and Biological Engineering, College of Engineering, Koc University, Istanbul, 34450, Turkey
  - School of Medicine, Koc University, Istanbul, 34450, Turkey
  - Koc University Research Center for Translational Medicine (KUTTAM), Istanbul, Turkey.
33
|
Shi K, Lin W, Zhao XM. Identifying Molecular Biomarkers for Diseases With Machine Learning Based on Integrative Omics. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2514-2525. [PMID: 32305934 DOI: 10.1109/tcbb.2020.2986387] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 06/11/2023]
Abstract
Molecular biomarkers are molecules or sets of molecules that can help in the diagnosis or prognosis of diseases or disorders. In the past decades, thanks to the advances in high-throughput technologies, a huge amount of molecular 'omics' data, e.g., transcriptomics and proteomics, has been accumulated. The availability of these omics data makes it possible to screen biomarkers for diseases or disorders. Accordingly, a number of computational approaches have been developed to identify biomarkers by exploring the omics data. In this review, we present a comprehensive survey of recent progress in the identification of molecular biomarkers with machine learning approaches. Specifically, we categorize the machine learning approaches into supervised, unsupervised and recommendation approaches, where the biomarkers include single genes, gene sets and small gene networks. In addition, we discuss potential problems underlying biomedical data that may pose challenges for machine learning, and provide possible directions for future biomarker identification.
|
34
|
Drug–disease associations prediction via Multiple Kernel-based Dual Graph Regularized Least Squares. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107811] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/26/2023]
|
35
|
Duan R, Gao L, Gao Y, Hu Y, Xu H, Huang M, Song K, Wang H, Dong Y, Jiang C, Zhang C, Jia S. Evaluation and comparison of multi-omics data integration methods for cancer subtyping. PLoS Comput Biol 2021; 17:e1009224. [PMID: 34383739 PMCID: PMC8384175 DOI: 10.1371/journal.pcbi.1009224] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 02/27/2021] [Revised: 08/24/2021] [Accepted: 06/28/2021] [Indexed: 11/18/2022] Open
Abstract
Computational integrative analysis has become a significant approach in the data-driven exploration of biological problems. Many integration methods for cancer subtyping have been proposed, but evaluating these methods has become a complicated problem due to the lack of gold standards. Moreover, questions of practical importance remain to be addressed regarding the impact of selecting appropriate data types and combinations on the performance of integrative studies. Here, we constructed three classes of benchmarking datasets of nine cancers in TCGA by considering all the eleven combinations of four multi-omics data types. Using these datasets, we conducted a comprehensive evaluation of ten representative integration methods for cancer subtyping in terms of accuracy measured by combining both clustering accuracy and clinical significance, robustness, and computational efficiency. We subsequently investigated the influence of different omics data on cancer subtyping and the effectiveness of their combinations. Refuting the widely held intuition that incorporating more types of omics data always produces better results, our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. Our analyses also suggested several effective combinations for most cancers under our studies, which may be of particular interest to researchers in omics data analysis.
Cancer is one of the most heterogeneous diseases, characterized by diverse morphological, phenotypic, and genomic profiles between tumors and their subtypes. Identifying cancer subtypes can help patients receive precise treatments. With the development of high-throughput technologies, genomics, epigenomics, and transcriptomics data have been generated for large cancer patient cohorts. It is believed that the more omics data we use, the more accurate the identification of cancer subtypes.
To examine this assumption, we first constructed three classes of benchmarking datasets to conduct a comprehensive evaluation and comparison of ten representative multi-omics data integration methods for cancer subtyping by considering their accuracy, robustness, and computational efficiency. Then, we investigated the influence of different omics data and their various combinations on the effectiveness of cancer subtyping. Our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. We hope that our work may help researchers choose a proper method and an effective data combination when identifying cancer subtypes using data integration methods.
Affiliation(s)
- Ran Duan
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Yong Gao
- Department of Computer Science, The University of British Columbia Okanagan, Kelowna, British Columbia, Canada
- Yuxuan Hu
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Han Xu
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Mingfeng Huang
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Kuo Song
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Hongda Wang
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Yongqiang Dong
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Chaoqun Jiang
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Chenxing Zhang
- School of Computer Science and Technology, Xidian University, Xi’an, China
- Songwei Jia
- School of Computer Science and Technology, Xidian University, Xi’an, China
|
36
|
Hulot A, Laloë D, Jaffrézic F. A unified framework for the integration of multiple hierarchical clusterings or networks from multi-source data. BMC Bioinformatics 2021; 22:392. [PMID: 34348641 PMCID: PMC8336092 DOI: 10.1186/s12859-021-04303-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/15/2021] [Accepted: 07/13/2021] [Indexed: 11/30/2022] Open
Abstract
Background: Integrating data from different sources is a recurring question in computational biology. Much effort has been devoted to the integration of data sets of the same type, typically multiple numerical data tables. However, data types are generally heterogeneous: it is commonplace to gather data in the form of trees, networks or factorial maps, as these representations all have an appealing visual interpretation that helps to study grouping patterns and interactions between entities. The question we aim to answer in this paper is that of the integration of such representations. Results: To this end, we provide a simple procedure to compare data of various types, in particular trees or networks, that relies essentially on two steps: the first step projects the representations into a common coordinate system; the second step then uses a multi-table integration approach to compare the projected data. We rely on efficient and well-known methodologies for each step: the projection step is achieved by retrieving a distance matrix for each representation form and then applying multidimensional scaling to provide a new set of coordinates from all the pairwise distances. The integration step is then achieved by applying a multiple factor analysis to the multiple tables of the new coordinates. This procedure provides tools to integrate and compare data available, for instance, as tree or network structures. Our approach is complementary to kernel methods, traditionally used to answer the same question. Conclusion: Our approach is evaluated on simulations and used to analyze two real-world data sets: first, we compare several clusterings for different cell types obtained from a transcriptomics single-cell data set in mouse embryos; second, we use our procedure to aggregate a multi-table data set from the TCGA breast cancer database, in order to compare several protein networks inferred for different breast cancer subtypes.
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04303-4.
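The projection step described in this abstract (distance matrix, then multidimensional scaling) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `classical_mds_1d` is hypothetical, only a single coordinate axis is recovered, and a plain power iteration stands in for the full eigendecomposition that classical MDS normally uses. The multiple factor analysis step is not shown.

```python
import math

def classical_mds_1d(D, iters=200):
    """Recover one coordinate per item from a symmetric distance matrix D.

    Classical MDS: double-center the squared distances to get a Gram
    matrix B, then scale its leading eigenvector by sqrt(eigenvalue).
    """
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J, with J = I - (1/n) * 11^T.
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the leading eigenvector (illustrative stand-in).
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit-length vector v.
    lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
    return [math.sqrt(max(lam, 0.0)) * x for x in v]

# Three items whose distances are consistent with positions on a line.
D = [[0.0, 1.0, 3.0],
     [1.0, 0.0, 2.0],
     [3.0, 2.0, 0.0]]
coords = classical_mds_1d(D)
```

The recovered coordinates are defined only up to sign and translation, so the meaningful check is that pairwise differences reproduce the input distances. In practice a library routine that returns several axes would be the natural choice.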
Affiliation(s)
- Audrey Hulot
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France; Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA-Paris, 75005, Paris, France; Université Paris-Saclay, UVSQ, Inserm, Infection et inflammation, 78180, Montigny-le-Bretonneux, France
- Denis Laloë
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Florence Jaffrézic
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
|
37
|
Alam MA, Qiu C, Shen H, Wang YP, Deng HW. A generalized kernel machine approach to identify higher-order composite effects in multi-view datasets, with application to adolescent brain development and osteoporosis. J Biomed Inform 2021; 120:103854. [PMID: 34237438 DOI: 10.1016/j.jbi.2021.103854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2020] [Revised: 05/28/2021] [Accepted: 06/28/2021] [Indexed: 10/20/2022]
Abstract
In recent years, the comprehensive study of complex diseases with multi-view datasets (e.g., multi-omics and imaging scans) has been at the forefront of biomedical research. State-of-the-art biomedical technologies are enabling us to collect multi-view biomedical datasets for the study of complex diseases. While the different views of the data tend to carry complementary information about a disease, analyzing multi-view data with complex interactions to gain a deeper and holistic understanding of biological systems is challenging. In this paper, we propose a novel generalized kernel machine approach to identify higher-order composite effects in multi-view biomedical datasets (GKMAHCE). This generalized semi-parametric approach (a mixed-effect linear model) includes the marginal and joint Hadamard product of features from different views of data. The proposed kernel machine approach considers multi-view data as predictor variables to allow a more thorough and comprehensive modeling of a complex trait. We applied the GKMAHCE approach to both synthesized datasets and real multi-view datasets from an adolescent brain development study and an osteoporosis study. Our experiments demonstrate that the proposed method can effectively identify higher-order composite effects and suggest that the corresponding features (genes, regions of interest, and chemical taxonomies) function in a concerted effort. We show that the proposed method is more generalizable than existing ones. To promote reproducible research, the source code of the proposed method is available at.
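The "joint Hadamard product" mentioned in this abstract can be read as an element-wise product of per-view kernel matrices, which yields a kernel for between-view interactions. The sketch below illustrates only that construction under this reading; it is not the GKMAHCE code, and the helper names `linear_kernel` and `hadamard` as well as the toy data are hypothetical.

```python
def linear_kernel(X):
    """Gram matrix K[i][j] = <x_i, x_j> for the rows (samples) of X."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def hadamard(K1, K2):
    """Element-wise product of two kernel matrices of equal shape."""
    return [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(K1, K2)]

# Two toy views measured on the same three samples (illustrative values).
view_a = [[1, 0], [0, 1], [1, 1]]
view_b = [[2], [1], [3]]

K_a = linear_kernel(view_a)      # marginal kernel for view A
K_b = linear_kernel(view_b)      # marginal kernel for view B
K_joint = hadamard(K_a, K_b)     # joint (interaction) kernel
```

By the Schur product theorem, the element-wise product of two positive semidefinite matrices is again positive semidefinite, so `K_joint` remains a valid kernel matrix.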
Affiliation(s)
- Md Ashad Alam
- Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA 70112, USA; Division of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA 70112, USA
- Chuan Qiu
- Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA 70112, USA; Division of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA 70112, USA
- Hui Shen
- Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA 70112, USA; Division of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA 70112, USA
- Yu-Ping Wang
- Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA 70112, USA; Department of Biomedical Engineering, Tulane University, New Orleans, LA 70118, USA
- Hong-Wen Deng
- Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA 70112, USA; Division of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA 70112, USA
|
38
|
Picard M, Scott-Boyer MP, Bodein A, Périn O, Droit A. Integration strategies of multi-omics data for machine learning analysis. Comput Struct Biotechnol J 2021; 19:3735-3746. [PMID: 34285775 PMCID: PMC8258788 DOI: 10.1016/j.csbj.2021.06.030] [Citation(s) in RCA: 205] [Impact Index Per Article: 51.3] [Received: 03/30/2021] [Revised: 06/17/2021] [Accepted: 06/21/2021] [Indexed: 12/25/2022] Open
Abstract
Increased availability of high-throughput technologies has generated an ever-growing amount of omics data that seek to portray many different but complementary biological layers, including genomics, epigenomics, transcriptomics, proteomics, and metabolomics. New insights from these data have been obtained by machine learning algorithms that have produced diagnostic and classification biomarkers. Most biomarkers obtained to date, however, include only one omic measurement at a time and thus do not take full advantage of recent multi-omics experiments that now capture the entire complexity of biological systems. Multi-omics data integration strategies are needed to combine the complementary knowledge brought by each omics layer. We have summarized the most recent data integration methods/frameworks into five different integration strategies: early, mixed, intermediate, late and hierarchical. In this mini-review, we focus on challenges and existing multi-omics integration strategies, paying special attention to machine learning applications.
Affiliation(s)
- Milan Picard
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Marie-Pier Scott-Boyer
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Antoine Bodein
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Olivier Périn
- Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
- Arnaud Droit
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Corresponding author.
|
39
|
Statistical Integration of 'Omics Data Increases Biological Knowledge Extracted from Metabolomics Data: Application to Intestinal Exposure to the Mycotoxin Deoxynivalenol. Metabolites 2021; 11:metabo11060407. [PMID: 34205708 PMCID: PMC8233929 DOI: 10.3390/metabo11060407] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/11/2021] [Revised: 06/07/2021] [Accepted: 06/15/2021] [Indexed: 12/18/2022] Open
Abstract
The effects of low doses of toxicants are often subtle, and information extracted from metabolomic data alone may not always be sufficient. As end products of enzymatic reactions, metabolites represent the final phenotypic expression of an organism and can also reflect gene expression changes caused by this exposure. Therefore, the integration of metabolomic and transcriptomic data could improve the biological knowledge extracted on these toxicant-induced disruptions. In the present study, we applied statistical integration tools to metabolomic and transcriptomic data obtained from jejunal explants of pigs exposed to the food contaminant deoxynivalenol (DON). Canonical correlation analysis (CCA) and self-organizing map (SOM) were compared for the identification of correlated transcriptomic and metabolomic features, and O2-PLS was used to model the relationship between exposure and selected features. The integration of both 'omics data increased the number of discriminant metabolites discovered (39) by about 10 times compared to the analysis of the metabolomic dataset alone (3). Besides the disturbance of energy metabolism previously reported, assessing correlations between both functional levels revealed several other types of damage linked to the intestinal exposure to DON, including the alteration of protein synthesis, oxidative stress, and inflammasome activation. This confirms the added value of integration to enrich the biological knowledge extracted from metabolomics.
|
40
|
Parimbelli E, Wilk S, Cornet R, Sniatala P, Sniatala K, Glaser SLC, Fraterman I, Boekhout AH, Ottaviano M, Peleg M. A review of AI and Data Science support for cancer management. Artif Intell Med 2021; 117:102111. [PMID: 34127240 DOI: 10.1016/j.artmed.2021.102111] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/07/2020] [Revised: 12/23/2020] [Accepted: 05/11/2021] [Indexed: 02/09/2023]
Abstract
INTRODUCTION: Thanks to improvement of care, cancer has become a chronic condition. But due to the toxicity of treatment, the importance of supporting the quality of life (QoL) of cancer patients increases. Monitoring and managing QoL relies on data collected by the patient in his/her home environment, its integration, and its analysis, which supports personalization of cancer management recommendations. We review the state-of-the-art of computerized systems that employ AI and Data Science methods to monitor the health status and provide support to cancer patients managed at home. OBJECTIVE: Our main objective is to analyze the literature to identify open research challenges that a novel decision support system for cancer patients and clinicians will need to address, point to potential solutions, and provide a list of established best-practices to adopt. METHODS: We designed a review study, in compliance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, analyzing studies retrieved from PubMed related to monitoring cancer patients in their home environments via sensors and self-reporting: what data is collected, what are the techniques used to collect data, semantically integrate it, infer the patient's state from it and deliver coaching/behavior change interventions. RESULTS: Starting from an initial corpus of 819 unique articles, a total of 180 papers were considered in the full-text analysis and 109 were finally included in the review. Our findings are organized and presented in four main sub-topics consisting of data collection, data integration, predictive modeling and patient coaching. CONCLUSION: Development of modern decision support systems for cancer needs to utilize best practices like the use of validated electronic questionnaires for quality-of-life assessment, adoption of appropriate information modeling standards supplemented by terminologies/ontologies, adherence to FAIR data principles, external validation, stratification of patients in subgroups for better predictive modeling, and adoption of formal behavior change theories. Open research challenges include supporting emotional and social dimensions of well-being, including PROs in predictive modeling, and providing better customization of behavioral interventions for the specific population of cancer patients.
Affiliation(s)
- S Wilk
- Poznan University of Technology, Poland
- R Cornet
- Amsterdam University Medical Centre, the Netherlands
- S L C Glaser
- Amsterdam University Medical Centre, the Netherlands
- I Fraterman
- Netherlands Cancer Institute, Amsterdam, the Netherlands
- A H Boekhout
- Netherlands Cancer Institute, Amsterdam, the Netherlands
|
41
|
Fang R, Yang H, Gao Y, Cao H, Goode EL, Cui Y. Gene-based mediation analysis in epigenetic studies. Brief Bioinform 2021; 22:bbaa113. [PMID: 32608480 PMCID: PMC8660163 DOI: 10.1093/bib/bbaa113] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/16/2020] [Revised: 04/07/2020] [Accepted: 05/12/2020] [Indexed: 12/15/2022] Open
Abstract
Mediation analysis has been a useful tool for investigating the effect of mediators that lie in the path from the independent variable to the outcome. With the increasing dimensionality of mediators, such as in (epi)genomics studies, a high-dimensional mediation model is needed. In this work, we focus on epigenetic studies with the goal of identifying important DNA methylations that act as mediators between an exposure and a disease outcome. Specifically, we focus on gene-based high-dimensional mediation analysis implemented with kernel principal component analysis to capture potential nonlinear mediation effects. We first review the current high-dimensional mediation models and then propose two gene-based analytical approaches: gene-based high-dimensional mediation analysis based on a linearity assumption between mediators and outcome (gHMA-L) and gene-based high-dimensional mediation analysis based on a nonlinearity assumption (gHMA-NL). Since the underlying true mediation relationship is unknown in practice, we further propose an omnibus test of gene-based high-dimensional mediation analysis (gHMA-O) by combining gHMA-L and gHMA-NL. Extensive simulation studies show that gHMA-L performs better under the linear model assumption and gHMA-NL does better under the nonlinear model assumption, while gHMA-O is a more powerful and robust method that combines the two. We apply the proposed methods to two datasets to investigate genes whose methylation levels act as important mediators in the relationship: (1) between alcohol consumption and epithelial ovarian cancer risk using data from the Mayo Clinic Ovarian Cancer Case-Control Study and (2) between childhood maltreatment and comorbid post-traumatic stress disorder and depression in adulthood using data from the Gray Trauma Project.
|
42
|
Chai H, Zhou X, Zhang Z, Rao J, Zhao H, Yang Y. Integrating multi-omics data through deep learning for accurate cancer prognosis prediction. Comput Biol Med 2021; 134:104481. [PMID: 33989895 DOI: 10.1016/j.compbiomed.2021.104481] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 01/04/2021] [Revised: 05/06/2021] [Accepted: 05/06/2021] [Indexed: 12/16/2022]
Abstract
BACKGROUND: Genomic information is nowadays widely used for precise cancer treatments. Since an individual type of omics data represents only a single view that suffers from data noise and bias, multiple types of omics data are required for accurate cancer prognosis prediction. However, it is challenging to effectively integrate multi-omics data due to the large number of redundant variables but relatively small sample size. With the recent progress in deep learning techniques, autoencoders have been used to integrate multi-omics data and extract representative features. Nevertheless, the resulting models are sensitive to data noise. Additionally, previous studies usually focused on individual cancer types without comprehensive pan-cancer tests. Here, we employed a denoising autoencoder to obtain a robust representation of the multi-omics data, and then used the learned representative features to estimate patients' risks. RESULTS: Applied to 15 cancers from The Cancer Genome Atlas (TCGA), our method improved the C-index values over previous methods by 6.5% on average. Considering the difficulty of obtaining multi-omics data in practice, we further used only mRNA data to fit the estimated risks by training XGBoost models, and found that the models could achieve an average C-index value of 0.627. As a case study, the breast cancer prognosis prediction model was independently tested on three datasets from the Gene Expression Omnibus (GEO) and was shown to significantly separate high-risk patients from low-risk ones (C-index>0.6, p-values<0.05). Based on the risk subgroups divided by our method, we identified nine prognostic markers highly associated with breast cancer, among which seven genes are supported by the literature. CONCLUSION: Our comprehensive tests indicated that we have constructed an accurate and robust framework to integrate multi-omics data for cancer prognosis prediction. Moreover, it is an effective way to discover cancer prognosis-related genes.
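Since the results above are reported as C-index values, a small worked example of how a concordance index is computed may help. The sketch below follows the usual definition of Harrell's C-index for right-censored survival data; the function name `concordance_index` and the toy values are illustrative and are not taken from the paper.

```python
from itertools import combinations

def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs whose predicted risks are
    ordered consistently with their observed survival times.

    times  : observed follow-up times
    events : 1 if the event (death) was observed, 0 if censored
    risks  : predicted risk scores (higher means worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the times differ and the shorter one
        # ended in an observed event rather than censoring.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risks)
```

In this toy example five pairs are comparable and four of them are ranked correctly, giving a C-index of 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.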
Affiliation(s)
- Hua Chai
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510000, China
- Xiang Zhou
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510000, China
- Zhongyue Zhang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510000, China
- Jiahua Rao
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510000, China
- Huiying Zhao
- Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, 510000, China
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510000, China; Key Laboratory of Machine Intelligence and Advanced Computing (MOE), Sun Yat-sen University, Guangzhou, 510000, China
|
43
|
Guo X, Zhou W, Shi B, Wang X, Du A, Ding Y, Tang J, Guo F. An Efficient Multiple Kernel Support Vector Regression Model for Assessing Dry Weight of Hemodialysis Patients. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200614172536] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/22/2022]
Abstract
Background:
Dry Weight (DW) is the lowest weight after dialysis, and patients with lower weight usually have symptoms of hypotension and shock. Several clinical-based approaches have been presented to assess the dry weight of hemodialysis patients. However, these traditional methods all depend on special instruments and professional technicians.
Objective:
To avoid this limitation, we need a machine-independent way to assess dry weight. We therefore collected clinical data on influencing characteristics and constructed a Machine Learning-based (ML) model to predict the dry weight of hemodialysis patients.
Methods:
In this paper, demographic data, anthropometric measurements, and Bioimpedance Spectroscopy (BIS) data were collected from 476 hemodialysis patients. Among them, the patients' age, sex, Body Mass Index (BMI), Blood Pressure (BP), Heart Rate (HR) and Years of Dialysis (YD) were closely related to their dry weight. All these relevant data were entered into the regression equation. A Multiple Kernel Support Vector Regression model based on Maximizing the Average Similarity (MKSVRMAS) was proposed to predict the dry weight of hemodialysis patients.
Result:
The experimental results show that dry weight is positively correlated with BMI and HR, whereas age, sex, systolic blood pressure, diastolic blood pressure and hemodialysis time are negatively correlated with dry weight. Moreover, the Root Mean Square Error (RMSE) of our model was 1.3817.
Conclusion:
Our proposed model could serve as a viable alternative for dry weight estimation of hemodialysis patients, thus providing a new way for clinical practice.
Affiliation(s)
- Xiaoyi Guo
- Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, 214000, Wuxi, China
- Wei Zhou
- Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, 214000, Wuxi, China
- Bin Shi
- Hemodialysis Center, Northern Jiangsu People's Hospital, 225001, Yangzhou, China
- Xiaohua Wang
- Department of Urology, the First Affiliated Hospital of Soochow University, 215006, Suzhou, China
- Aiyan Du
- Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, 214000, Wuxi, China
- Yijie Ding
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, 215009, Suzhou, China
- Jijun Tang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, 300350, Tianjin, China
- Fei Guo
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, 300350, Tianjin, China
|
44
|
Kokova D, Verhoeven A, Perina EA, Ivanov VV, Heijink M, Yazdanbakhsh M, Mayboroda OA. Metabolic Homeostasis in Chronic Helminth Infection Is Sustained by Organ-Specific Metabolic Rewiring. ACS Infect Dis 2021; 7:906-916. [PMID: 33764039 PMCID: PMC8154418 DOI: 10.1021/acsinfecdis.1c00026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/19/2021] [Indexed: 11/28/2022]
Abstract
Opisthorchiasis is a hepatobiliary disease caused by flukes of the trematode family Opisthorchiidae. A chronic form of the disease implies a prolonged coexistence of the host and the parasite. The pathological changes inflicted by the worm on the host's hepatobiliary system are well documented. Yet, the response to the infection also triggers a deep remodeling of the host's systemic metabolism, reaching a new homeostasis and affecting organs beyond the worm's location. Understanding the metabolic alterations in chronic opisthorchiasis could help us pinpoint the pathways that underlie infection, opening possibilities for the development of more selective treatment strategies. Here, we apply an integrative, multicompartment metabolomics analysis, using multiple biofluids, stool samples and tissue extracts, to describe metabolic changes in Opisthorchis felineus-infected animals at the chronic stage. We show that a shift in lipid metabolism in the serum, a depletion of the amino acid pool, an alteration of the ketogenic pathways in the jejunum and a suppressed metabolic activity of the spleen are the key features of the host's metabolic adaptation at the chronic stage of O. felineus infection. We describe this combination of metabolic changes as a "metabolically mediated immunosuppressive status of organism" which develops during a chronic infection. This status, in combination with other factors (e.g., parasite-derived immunomodulators), might increase the risk of infection-related malignancy.
Collapse
Affiliation(s)
- Daria Kokova
  - Department of Parasitology, Leiden University Medical Center, Leiden, 2333ZA, The Netherlands
  - Laboratory of Clinical Metabolomics, Tomsk State University, Tomsk 634050, Russian Federation
- Aswin Verhoeven
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, 2333ZA, The Netherlands
- Ekaterina A. Perina
  - Central Research Laboratory, Siberian State Medical University, Tomsk 634050, Russian Federation
- Vladimir V. Ivanov
  - Central Research Laboratory, Siberian State Medical University, Tomsk 634050, Russian Federation
- Marieke Heijink
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, 2333ZA, The Netherlands
- Maria Yazdanbakhsh
  - Department of Parasitology, Leiden University Medical Center, Leiden, 2333ZA, The Netherlands
- Oleg A. Mayboroda
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, 2333ZA, The Netherlands
|
45
|
Wu Y, Wang H, Li Z, Cheng J, Fang R, Cao H, Cui Y. Subtypes identification on heart failure with preserved ejection fraction via network enhancement fusion using multi-omics data. Comput Struct Biotechnol J 2021; 19:1567-1578. [PMID: 33868594 PMCID: PMC8039555 DOI: 10.1016/j.csbj.2021.03.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/02/2020] [Revised: 03/03/2021] [Accepted: 03/06/2021] [Indexed: 11/24/2022] Open
Abstract
Heart failure with preserved ejection fraction (HFpEF) is associated with multiple etiologic and pathophysiologic factors. HFpEF leads to significant cardiovascular morbidity and mortality. Effective therapeutic interventions for HFpEF have yet to be identified, primarily because its clinical heterogeneity makes it difficult to determine the physiologic and prognostic implications of this syndrome. Thus, identifying clinical subtypes using multi-omics data has great implications for efficient treatment and prognosis of HFpEF patients. Here we proposed to integrate mRNA, DNA methylation and microRNA (miRNA) expression data of HFpEF with a similarity network fusion (SNF) method following a network enhancement (ne-SNF) denoising technique to form a fused network. A spectral clustering method was then used to obtain clusters of patient subtypes. Experiments on HFpEF datasets demonstrated that ne-SNF significantly outperforms single data subtype analysis and other integrated methods. The identified subgroups were shown to have statistically significant differences in survival. Two HFpEF subtypes were defined: a high-risk group (16.8%) and a low-risk group (83.2%). The 5-year mortality rates were 63.3% and 33.0% for the high- and low-risk groups, respectively. After adjusting for the effects of clinical covariates, HFpEF patients in the high-risk group were 2.43 times more likely to die than those in the low-risk group. A total of 157 differentially expressed (DE) mRNAs, 2199 abnormal methylations and 121 DE miRNAs were identified between the two subtypes. They were also enriched in many HFpEF-related biological processes or pathways. The ne-SNF method provides a novel pipeline for subtype identification in integrated analysis of multi-omics data.
Affiliation(s)
- Yongqing Wu
  - Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Huihui Wang
  - Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Zhi Li
  - Department of Hematology, Taiyuan Central Hospital of Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Jinfang Cheng
  - Department of Cardiology, Bethune Hospital, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Ruiling Fang
  - Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Hongyan Cao
  - Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
  - Shanxi Provincial Key Laboratory of Major Disease Risk Assessment, Taiyuan, Shanxi 030001, PR China
- Yuehua Cui
  - Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
|
46
|
Vlachavas EI, Bohn J, Ückert F, Nürnberg S. A Detailed Catalogue of Multi-Omics Methodologies for Identification of Putative Biomarkers and Causal Molecular Networks in Translational Cancer Research. Int J Mol Sci 2021; 22:2822. [PMID: 33802234 PMCID: PMC8000236 DOI: 10.3390/ijms22062822] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/31/2021] [Revised: 03/05/2021] [Accepted: 03/05/2021] [Indexed: 02/06/2023] Open
Abstract
Recent advances in sequencing and biotechnological methodologies have led to the generation of large volumes of molecular data of different omics layers, such as genomics, transcriptomics, proteomics and metabolomics. Integration of these data with clinical information provides new opportunities to discover how perturbations in biological processes lead to disease. Using data-driven approaches for the integration and interpretation of multi-omics data could stably identify links between structural and functional information and propose causal molecular networks with potential impact on cancer pathophysiology. This knowledge can then be used to improve disease diagnosis, prognosis, prevention, and therapy. This review will summarize and categorize the most current computational methodologies and tools for integration of distinct molecular layers in the context of translational cancer research and personalized therapy. Additionally, the bioinformatics tools Multi-Omics Factor Analysis (MOFA) and netDX will be tested using omics data from public cancer resources, to assess their overall robustness, provide reproducible workflows for gaining biological knowledge from multi-omics data, and to comprehensively understand the significantly perturbed biological entities in distinct cancer types. We show that the performed supervised and unsupervised analyses result in meaningful and novel findings.
Affiliation(s)
- Efstathios Iason Vlachavas
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Jonas Bohn
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Frank Ückert
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
- Sylvia Nürnberg
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
|
47
|
Ramon E, Belanche-Muñoz L, Molist F, Quintanilla R, Perez-Enciso M, Ramayo-Caldas Y. kernInt: A Kernel Framework for Integrating Supervised and Unsupervised Analyses in Spatio-Temporal Metagenomic Datasets. Front Microbiol 2021; 12:609048. [PMID: 33584612 PMCID: PMC7876079 DOI: 10.3389/fmicb.2021.609048] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/22/2020] [Accepted: 01/07/2021] [Indexed: 12/20/2022] Open
Abstract
The advent of next-generation sequencing technologies allowed relative quantification of microbiome communities and their spatial and temporal variation. In recent years, supervised learning (i.e., prediction of a phenotype of interest) from taxonomic abundances has become increasingly common in the microbiome field. However, a gap exists between supervised and classical unsupervised analyses, based on computing ecological dissimilarities for visualization or clustering. Despite this, both approaches face common challenges, like the compositional nature of next-generation sequencing data or the integration of the spatial and temporal dimensions. Here we propose a kernel framework to place on a common ground the unsupervised and supervised microbiome analyses, including the retrieval of microbial signatures (taxa importances). We define two compositional kernels (Aitchison-RBF and compositional linear) and discuss how to transform non-compositional beta-dissimilarity measures into kernels. Spatial data is integrated with multiple kernel learning, while longitudinal data is evaluated by specific kernels. We illustrate our framework through a single point soil dataset, a human dataset with a spatial component, and a previously unpublished longitudinal dataset concerning pig production. The proposed framework and the case studies are freely available in the kernInt package at https://github.com/elies-ramon/kernInt.
Affiliation(s)
- Elies Ramon
  - Plant and Animal Genomics, Statistical and Population Genomics Group, CSIC-IRTA-UAB-UB Consortium, Centre for Research in Agricultural Genomics (CRAG), Bellaterra, Spain
- Lluís Belanche-Muñoz
  - Department of Computer Science, Polytechnic University of Catalonia, Barcelona, Spain
- Miguel Perez-Enciso
  - Plant and Animal Genomics, Statistical and Population Genomics Group, CSIC-IRTA-UAB-UB Consortium, Centre for Research in Agricultural Genomics (CRAG), Bellaterra, Spain
  - Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
|
48
|
Rodosthenous T, Shahrezaei V, Evangelou M. Integrating multi-OMICS data through sparse canonical correlation analysis for the prediction of complex traits: a comparison study. Bioinformatics 2020; 36:4616-4625. [PMID: 32437529 PMCID: PMC7750936 DOI: 10.1093/bioinformatics/btaa530] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 12/05/2019] [Revised: 04/22/2020] [Accepted: 05/16/2020] [Indexed: 01/08/2023] Open
Abstract
Motivation: Recent developments in technology have enabled researchers to collect multiple OMICS datasets for the same individuals. The conventional approach for understanding the relationships between the collected datasets and the complex trait of interest would be through the analysis of each OMIC dataset separately from the rest, or to test for associations between the OMICS datasets. In this work we show that integrating multiple OMICS datasets together, instead of analysing them separately, improves our understanding of their in-between relationships as well as the predictive accuracy for the tested trait. Several approaches have been proposed for the integration of heterogeneous and high-dimensional (p≫n) data, such as OMICS. The sparse variant of canonical correlation analysis (CCA) approach is a promising one that seeks to penalize the canonical variables for producing sparse latent variables while achieving maximal correlation between the datasets. Over the last years, a number of approaches for implementing sparse CCA (sCCA) have been proposed, where they differ on their objective functions, iterative algorithm for obtaining the sparse latent variables and make different assumptions about the original datasets.
Results: Through a comparative study we have explored the performance of the conventional CCA proposed by Parkhomenko et al., penalized matrix decomposition CCA proposed by Witten and Tibshirani and its extension proposed by Suo et al. The aforementioned methods were modified to allow for different penalty functions. Although sCCA is an unsupervised learning approach for understanding of the in-between relationships, we have twisted the problem as a supervised learning one and investigated how the computed latent variables can be used for predicting complex traits. The approaches were extended to allow for multiple (more than two) datasets where the trait was included as one of the input datasets. Both ways have shown improvement over conventional predictive models that include one or multiple datasets.
Availability and implementation: https://github.com/theorod93/sCCA.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Vahid Shahrezaei
  - Department of Mathematics, Imperial College London, London SW7 2AZ, UK
- Marina Evangelou
  - Department of Mathematics, Imperial College London, London SW7 2AZ, UK
|
49
|
Biswas N, Chakrabarti S. Artificial Intelligence (AI)-Based Systems Biology Approaches in Multi-Omics Data Analysis of Cancer. Front Oncol 2020; 10:588221. [PMID: 33154949 PMCID: PMC7591760 DOI: 10.3389/fonc.2020.588221] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 07/28/2020] [Accepted: 09/21/2020] [Indexed: 12/13/2022] Open
Abstract
Cancer is the manifestation of abnormalities of different physiological processes involving genes, DNAs, RNAs, proteins, and other biomolecules whose profiles are reflected in different omics data types. As these bio-entities are highly correlated, integrative analysis of different types of omics data (multi-omics data) is required to understand the disease from tumorigenesis to disease progression. Artificial intelligence (AI), specifically machine learning algorithms, has the ability to make decisive interpretation of "big"-sized complex data and, hence, appears as the most effective tool for the analysis and understanding of multi-omics data for patient-specific observations. In this review, we discuss the recent outcomes of employing AI in multi-omics data analysis of different types of cancer. Based on the research trends and significance in patient treatment, we primarily focus on AI-based analysis for determining cancer subtypes, disease prognosis, and therapeutic targets. We also discuss AI analysis of some non-canonical types of omics data, as they can play a determining role in cancer patient care. Additionally, we briefly discuss the data repositories because of their pivotal role in multi-omics data storage, processing, and analysis.
Affiliation(s)
- Nupur Biswas
  - Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, IICB TRUE Campus, Kolkata, India
- Saikat Chakrabarti
  - Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, IICB TRUE Campus, Kolkata, India
|
50
|
Ren S, Liu F, Zhou W, Feng X, Siddique CN. Group-based local adaptive deep multiple kernel learning with lp norm. PLoS One 2020; 15:e0238535. [PMID: 32941468 PMCID: PMC7498035 DOI: 10.1371/journal.pone.0238535] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/01/2020] [Accepted: 08/18/2020] [Indexed: 12/21/2022] Open
Abstract
The deep multiple kernel learning (DMKL) method has attracted wide attention due to its better classification performance than shallow multiple kernel learning. However, existing DMKL methods struggle to find suitable global model parameters that improve classification accuracy across numerous datasets, and they do not take into account inter-class correlation and intra-class diversity. In this paper, we present a group-based local adaptive deep multiple kernel learning (GLDMKL) method with lp norm. Our GLDMKL method divides samples into multiple groups according to the multiple kernel k-means clustering algorithm. The learning process in each well-grouped local space is adaptive deep multiple kernel learning. Our structure is adaptive, so there is no fixed number of layers. The learning model in each group is trained independently, so the number of layers of the learning model may differ between groups. In each local space, the model is adapted by optimizing the SVM model parameter α and the local kernel weight β in turn, and the proportion of each base kernel in the combined kernel of each layer is changed through the local kernel weight, which is constrained by the lp norm to avoid sparsity of the base kernels. The hyperparameters of the kernel are optimized by the grid search method. Experiments on UCI and Caltech 256 datasets demonstrate that the proposed method achieves higher classification accuracy than other deep multiple kernel learning methods, especially for datasets with relatively complex data.
Affiliation(s)
- Shengbing Ren
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Fa Liu
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Weijia Zhou
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Xian Feng
  - School of Computer Science and Engineering, Central South University, Changsha, China
|