51
Vahabi N, Michailidis G. Unsupervised Multi-Omics Data Integration Methods: A Comprehensive Review. Front Genet 2022; 13:854752. [PMID: 35391796 PMCID: PMC8981526 DOI: 10.3389/fgene.2022.854752] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 01/14/2022] [Accepted: 02/28/2022] [Indexed: 12/26/2022] Open
Abstract
Through the developments of Omics technologies and dissemination of large-scale datasets, such as those from The Cancer Genome Atlas, Alzheimer’s Disease Neuroimaging Initiative, and Genotype-Tissue Expression, it is becoming increasingly possible to study complex biological processes and disease mechanisms more holistically. However, to obtain a comprehensive view of these complex systems, it is crucial to integrate data across various Omics modalities, and also leverage external knowledge available in biological databases. This review aims to provide an overview of multi-Omics data integration methods with different statistical approaches, focusing on unsupervised learning tasks, including disease onset prediction, biomarker discovery, disease subtyping, module discovery, and network/pathway analysis. We also briefly review feature selection methods, multi-Omics data sets, and resources/tools that constitute critical components for carrying out the integration.
Affiliation(s)
- Nasim Vahabi
- Informatics Institute, University of Florida, Gainesville, FL, United States
- George Michailidis
- Informatics Institute, University of Florida, Gainesville, FL, United States
52
Zenere A, Rundquist O, Gustafsson M, Altafini C. Multi-omics protein-coding units as massively parallel Bayesian networks: empirical validation of causality structure. iScience 2022; 25:104048. [PMID: 35355520 PMCID: PMC8958332 DOI: 10.1016/j.isci.2022.104048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Revised: 01/17/2022] [Accepted: 03/08/2022] [Indexed: 11/29/2022] Open
Abstract
In this article we use high-throughput epigenomics, transcriptomics, and proteomics data to construct fine-graded models of the “protein-coding units” gathering all transcript isoforms and chromatin accessibility peaks associated with more than 4000 genes in humans. Each protein-coding unit has the structure of a directed acyclic graph (DAG) and can be represented as a Bayesian network. The factorization of the joint probability distribution induced by the DAGs imposes a number of conditional independence relationships among the variables forming a protein-coding unit, corresponding to the missing edges in the DAGs. We show that a large fraction of these conditional independencies are indeed verified by the data. Factors driving this verification appear to be the structural and functional annotation of the transcript isoforms, as well as a notion of structural balance (or frustration-free) of the corresponding sample correlation graph, which naturally leads to reduction of correlation (and hence to independence) upon conditioning.
- Protein coding unit: DAG associated with epigenetic and gene information of a protein
- DAGs correspond to Bayesian networks
- Edge absence on a DAG corresponds to conditional independence
- Multi-omics data (ATAC-seq, RNA-seq and mass-spec) are used for DAG validation
53
Ganugi P, Fiorini A, Ardenti F, Caffi T, Bonini P, Taskin E, Puglisi E, Tabaglio V, Trevisan M, Lucini L. Nitrogen use efficiency, rhizosphere bacterial community, and root metabolome reprogramming due to maize seed treatment with microbial biostimulants. Physiologia Plantarum 2022; 174:e13679. [PMID: 35362106 PMCID: PMC9324912 DOI: 10.1111/ppl.13679] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/02/2021] [Revised: 02/26/2022] [Accepted: 03/25/2022] [Indexed: 06/14/2023]
Abstract
Seed inoculation with beneficial microorganisms has gained importance as it has been proven to show biostimulant activity in plants, especially in terms of abiotic/biotic stress tolerance and plant growth promotion, representing a sustainable way to ensure yield stability under low-input sustainable agriculture. Nevertheless, limited knowledge is available concerning the molecular and physiological processes underlying the root-inoculant symbiosis or plant response at the root system level. Our work aimed to integrate the interrelationship between agronomic traits, rhizosphere microbial population and metabolic processes in roots, following seed treatment with either arbuscular mycorrhizal fungi (AMF) or Plant Growth-Promoting Rhizobacteria (PGPR). To this aim, maize was grown under open field conditions with either optimal or reduced nitrogen availability. Both seed treatments increased nitrogen uptake efficiency under reduced nitrogen supply and produced some microbial community changes among treatments at the root microbiome level and limited yield increases, while significant changes were observed at the metabolome level. Amino acid, lipid, flavone, lignan, and phenylpropanoid concentrations were mostly modulated. Integrative analysis of multi-omics datasets (Multiple Co-Inertia Analysis) highlighted a strong correlation between the metagenomics and the untargeted metabolomics datasets, suggesting a coordinated modulation of root physiological traits.
Affiliation(s)
- Paola Ganugi
- Department for Sustainable Food Process, Università Cattolica del Sacro Cuore, Piacenza, Italy
- Andrea Fiorini
- Department of Sustainable Crop Production, Università Cattolica del Sacro Cuore, Piacenza, Italy
- Federico Ardenti
- Department of Sustainable Crop Production, Università Cattolica del Sacro Cuore, Piacenza, Italy
- Tito Caffi
- Department of Sustainable Crop Production, Università Cattolica del Sacro Cuore, Piacenza, Italy
- Eren Taskin
- Department for Sustainable Food Process, Università Cattolica del Sacro Cuore, Piacenza, Italy
- Edoardo Puglisi
- Department for Sustainable Food Process, Università Cattolica del Sacro Cuore, Piacenza, Italy
- Vincenzo Tabaglio
- Department of Sustainable Crop Production, Università Cattolica del Sacro Cuore, Piacenza, Italy
- Marco Trevisan
- Department for Sustainable Food Process, Università Cattolica del Sacro Cuore, Piacenza, Italy
- Luigi Lucini
- Department for Sustainable Food Process, Università Cattolica del Sacro Cuore, Piacenza, Italy
54
Macías-Pérez LA, Levard C, Barakat M, Angeletti B, Borschneck D, Poizat L, Achouak W, Auffan M. Contrasted microbial community colonization of a bauxite residue deposit marked by a complex geochemical context. Journal of Hazardous Materials 2022; 424:127470. [PMID: 34687997 DOI: 10.1016/j.jhazmat.2021.127470] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 08/04/2021] [Revised: 09/24/2021] [Accepted: 10/06/2021] [Indexed: 06/13/2023]
Abstract
Bauxite residue is the alkaline byproduct generated during alumina extraction and is commonly landfilled in open-air deposits. The growth in global alumina production has raised environmental concerns about these deposits since no large-scale reuses exist to date. Microbial-driven techniques including bioremediation and critical metal bio-recovery are now considered sustainable and cost-effective methods to revalorize bauxite residues. However, the establishment of microbial communities and their active role in these strategies are still poorly understood. We thus determined the geochemical composition of different bauxite residues produced in southern France and explored the development of bacterial and fungal communities using Illumina high-throughput sequencing. Physicochemical parameters were influenced differently by the deposit age and the bauxite origin. Taxonomical analysis revealed an early-stage microbial community dominated by haloalkaliphilic microorganisms and strongly influenced by chemical gradients. Microbial richness, diversity and network complexity increased significantly with the deposit age, reaching an equilibrium community composition similar to typical soils after decades of natural weathering. Our results suggested that salinity, pH, and toxic metals affected the bacterial community structure, while fungal community composition showed no clear correlations with chemical variations.
Affiliation(s)
- Luis Alberto Macías-Pérez
- Aix Marseille Université, CNRS, IRD, INRAE, Collège de France, CEREGE, Technopôle de l'Arbois-Méditerranée, BP80, 13545 Aix-en-Provence, France; Aix Marseille Univ, CEA, CNRS, BIAM, LEMIRE, Laboratory of Microbial Ecology of the Rhizosphere, ECCOREV FR 3098, F-13108 St-Paul-lez-Durance, France.
- Clément Levard
- Aix Marseille Université, CNRS, IRD, INRAE, Collège de France, CEREGE, Technopôle de l'Arbois-Méditerranée, BP80, 13545 Aix-en-Provence, France.
- Mohamed Barakat
- Aix Marseille Univ, CEA, CNRS, BIAM, LEMIRE, Laboratory of Microbial Ecology of the Rhizosphere, ECCOREV FR 3098, F-13108 St-Paul-lez-Durance, France.
- Bernard Angeletti
- Aix Marseille Université, CNRS, IRD, INRAE, Collège de France, CEREGE, Technopôle de l'Arbois-Méditerranée, BP80, 13545 Aix-en-Provence, France.
- Daniel Borschneck
- Aix Marseille Université, CNRS, IRD, INRAE, Collège de France, CEREGE, Technopôle de l'Arbois-Méditerranée, BP80, 13545 Aix-en-Provence, France.
- Wafa Achouak
- Aix Marseille Univ, CEA, CNRS, BIAM, LEMIRE, Laboratory of Microbial Ecology of the Rhizosphere, ECCOREV FR 3098, F-13108 St-Paul-lez-Durance, France.
- Mélanie Auffan
- Aix Marseille Université, CNRS, IRD, INRAE, Collège de France, CEREGE, Technopôle de l'Arbois-Méditerranée, BP80, 13545 Aix-en-Provence, France; Civil and Environmental Engineering, Duke University, Durham, NC 27708, USA.
55
Martínez-García M, Hernández-Lemus E. Data Integration Challenges for Machine Learning in Precision Medicine. Front Med (Lausanne) 2022; 8:784455. [PMID: 35145977 PMCID: PMC8821900 DOI: 10.3389/fmed.2021.784455] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 09/27/2021] [Accepted: 12/28/2021] [Indexed: 12/19/2022] Open
Abstract
A main goal of Precision Medicine is to incorporate and integrate the vast corpora in different databases about the molecular and environmental origins of disease into analytic frameworks, allowing the development of individualized, context-dependent diagnostics and therapeutic approaches. In this regard, artificial intelligence and machine learning approaches can be used to build analytical models of complex disease aimed at prediction of personalized health conditions and outcomes. Such models must handle the wide heterogeneity of individuals in both their genetic predisposition and their social and environmental determinants. Computational approaches to medicine need to be able to efficiently manage, visualize and integrate large datasets combining structured and unstructured formats. This needs to be done while constrained by different levels of confidentiality, ideally within a unified analytical architecture. Efficient data integration and management is key to the successful application of computational intelligence approaches to medicine. A number of challenges arise in designing successful approaches to medical data analytics under the currently demanding performance conditions of personalized medicine, while also subject to time, computational power, and bioethical constraints. Here, we will review some of these constraints and discuss possible avenues to overcome current challenges.
Affiliation(s)
- Mireya Martínez-García
- Clinical Research Division, National Institute of Cardiology ‘Ignacio Chávez’, Mexico City, Mexico
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine (INMEGEN), Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
56
Identifying temporal and spatial patterns of variation from multimodal data using MEFISTO. Nat Methods 2022; 19:179-186. [PMID: 35027765 PMCID: PMC8828471 DOI: 10.1038/s41592-021-01343-9] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 11/03/2020] [Accepted: 11/05/2021] [Indexed: 01/04/2023]
Abstract
Factor analysis is a widely used method for dimensionality reduction in genome biology, with applications from personalized health to single-cell biology. Existing factor analysis models assume independence of the observed samples, an assumption that fails in spatio-temporal profiling studies. Here we present MEFISTO, a flexible and versatile toolbox for modeling high-dimensional data when spatial or temporal dependencies between the samples are known. MEFISTO maintains the established benefits of factor analysis for multimodal data, but enables the performance of spatio-temporally informed dimensionality reduction, interpolation, and separation of smooth from non-smooth patterns of variation. Moreover, MEFISTO can integrate multiple related datasets by simultaneously identifying and aligning the underlying patterns of variation in a data-driven manner. To illustrate MEFISTO, we apply the model to different datasets with spatial or temporal resolution, including an evolutionary atlas of organ development, a longitudinal microbiome study, a single-cell multi-omics atlas of mouse gastrulation and spatially resolved transcriptomics. MEFISTO models bulk and single-cell multi-omics data with temporal or spatial dependencies for interpretable pattern discovery and integration.
57
Odom GJ, Colaprico A, Silva TC, Chen XS, Wang L. PathwayMultiomics: An R Package for Efficient Integrative Analysis of Multi-Omics Datasets With Matched or Un-matched Samples. Front Genet 2022; 12:783713. [PMID: 35003218 PMCID: PMC8729182 DOI: 10.3389/fgene.2021.783713] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/26/2021] [Accepted: 12/07/2021] [Indexed: 01/27/2023] Open
Abstract
Recent advances in technology have made multi-omics datasets increasingly available to researchers. To leverage the wealth of information in multi-omics data, a number of integrative analysis strategies have been proposed recently. However, effectively extracting biological insights from these large, complex datasets remains challenging. In particular, matched samples with multiple types of omics data measured on each sample are often required for multi-omics analysis tools, which can significantly reduce the sample size. Another challenge is that analysis techniques such as dimension reductions, which extract association signals in high dimensional datasets by estimating a few variables that explain most of the variations in the samples, are typically applied to whole-genome data, which can be computationally demanding. Here we present pathwayMultiomics, a pathway-based approach for integrative analysis of multi-omics data with categorical, continuous, or survival outcome variables. The input of pathwayMultiomics is pathway p-values for individual omics data types, which are then integrated using a novel statistic, the MiniMax statistic, to prioritize pathways dysregulated in multiple types of omics datasets. Importantly, pathwayMultiomics is computationally efficient and does not require matched samples in multi-omics data. We performed a comprehensive simulation study to show that pathwayMultiomics significantly outperformed currently available multi-omics tools with improved power and well-controlled false-positive rates. In addition, we also analyzed real multi-omics datasets to show that pathwayMultiomics was able to recover known biology by nominating biologically meaningful pathways in complex diseases such as Alzheimer's disease.
Affiliation(s)
- Gabriel J Odom
- Department of Biostatistics, Stempel College of Public Health, Florida International University, Miami, FL, United States; Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States
- Antonio Colaprico
- Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States
- Tiago C Silva
- Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States
- X Steven Chen
- Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States; Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL, United States
- Lily Wang
- Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States; Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL, United States; Dr. John T Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL, United States; John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL, United States
58
Xu Y, Das P, McCord RP. SMILE: mutual information learning for integration of single-cell omics data. Bioinformatics 2022; 38:476-486. [PMID: 34623402 DOI: 10.1093/bioinformatics/btab706] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 03/10/2021] [Revised: 09/15/2021] [Accepted: 10/06/2021] [Indexed: 02/03/2023] Open
Abstract
Motivation: Deep learning approaches have empowered single-cell omics data analysis in many ways and generated new insights from complex cellular systems. As there is an increasing need for single-cell omics data to be integrated across sources, types and features of data, the challenges of integrating single-cell omics data are rising. Here, we present an unsupervised deep learning algorithm that learns discriminative representations for single-cell data via maximizing mutual information, SMILE (Single-cell Mutual Information Learning).
Results: Using a unique cell-pairing design, SMILE successfully integrates multisource single-cell transcriptome data, removing batch effects and projecting similar cell types, even from different tissues, into the shared space. SMILE can also integrate data from two or more modalities, such as joint-profiling technologies using single-cell ATAC-seq, RNA-seq, DNA methylation, Hi-C and ChIP data. When paired cells are known, SMILE can integrate data with unmatched features, such as genes for RNA-seq and genome-wide peaks for ATAC-seq. Integrated representations learned from joint-profiling technologies can then be used as a framework for comparing independent single source data.
Availability and implementation: The source code of SMILE including analyses of key results in the study can be found at: https://github.com/rpmccordlab/SMILE, implemented in Python.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yang Xu
- UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37996, USA
- Priyojit Das
- UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37996, USA
- Rachel Patton McCord
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA
59
Tripp BA, Otu HH. Integration of Multi-Omics Data Using Probabilistic Graph Models and External Knowledge. Curr Bioinform 2022. [DOI: 10.2174/1574893616666210906141545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background: High-throughput sequencing technologies have revolutionized the ability to perform systems-level biology and elucidate molecular mechanisms of disease through the comprehensive characterization of different layers of biological information. Integration of these heterogeneous layers can provide insight into the underlying biology but is challenged by modeling complex interactions.
Objective: We introduce OBaNK: omics integration using Bayesian networks and external knowledge, an algorithm to model interactions between heterogeneous high-dimensional biological data to elucidate complex functional clusters and emergent relationships associated with an observed phenotype.
Method: Using Bayesian network learning, we modeled the statistical dependencies and interactions between lipidomics, proteomics, and metabolomics data. The strength of a learned interaction between molecules was altered based on external knowledge.
Results: Networks learned from synthetic datasets based on real pathways achieved an average area under the curve score of ~0.85, an improvement of ~0.23 from baseline methods. When applied to real multi-omics data collected during pregnancy, five distinct functional networks of heterogeneous biological data were identified, and the results were compared to other multi-omics integration approaches.
Conclusion: OBaNK successfully improved the accuracy of learning interaction networks from data integrating external knowledge, identified heterogeneous functional networks from real data, and suggested potential novel interactions associated with the phenotype. These findings can guide future hypothesis generation. OBaNK source code is available at: https://github.com/bridgettripp/OBaNK.git, and a graphical user interface is available at: http://otulab.unl.edu/OBaNK.
Affiliation(s)
- Bridget A. Tripp
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- PhD Program of Complex Biosystems, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Hasan H. Otu
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
60
De Vito R, Bellio R, Trippa L, Parmigiani G. Bayesian multistudy factor analysis for high-throughput biological data. Ann Appl Stat 2021. [DOI: 10.1214/21-aoas1456] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
Affiliation(s)
- Ruggero Bellio
- Department of Economics and Statistics, University of Udine
- Lorenzo Trippa
- Department of Data Science, Dana Farber Cancer Institute
61
62
Microbial Community Dynamics during Biodegradation of Crude Oil and Its Response to Biostimulation in Svalbard Seawater at Low Temperature. Microorganisms 2021; 9:2425. [PMID: 34946026 PMCID: PMC8707851 DOI: 10.3390/microorganisms9122425] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/21/2021] [Revised: 11/19/2021] [Accepted: 11/22/2021] [Indexed: 11/17/2022] Open
Abstract
The development of oil exploration activities and an increase in shipping in Arctic areas have increased the risk of oil spills in this cold marine environment. The objective of this experimental study was to assess the effect of biostimulation on microbial community abundance, structure, dynamics, and metabolic potential for oil hydrocarbon degradation in oil-contaminated Arctic seawater. The combination of amplicon-based and shotgun sequencing, together with the integration of genome-resolved metagenomics and omics data, was applied to assess microbial community structure and metabolic properties in naphthenic crude oil-amended microcosms. The comparison of estimates for oil-degrading microbial taxa obtained with different sequencing and taxonomic assignment methods showed substantial discrepancies between applied methods. Consequently, the data acquired with different methods was integrated for the analysis of microbial community structure, and amended with quantitative PCR, producing a more objective description of microbial community dynamics and evaluation of the effect of biostimulation on particular microbial taxa. Implementing biostimulation of the seawater microbial community with the addition of nutrients resulted in substantially elevated prokaryotic community abundance (103-fold), a distinctly different bacterial community structure from that in the initial seawater, 1.3-fold elevation in the normalized abundance of hydrocarbon degradation genes, and 12% enhancement of crude oil biodegradation. The bacterial communities in biostimulated microcosms after four months of incubation were dominated by Gammaproteobacterial genera Pseudomonas, Marinomonas, and Oleispira, which were succeeded by Cycloclasticus and Paraperlucidibaca after eight months of incubation. The majority of 195 compiled good-quality metagenome-assembled genomes (MAGs) exhibited diverse hydrocarbon degradation gene profiles. 
The results reveal that biostimulation with nutrients promotes naphthenic oil degradation in Arctic seawater, but this strategy alone might not be sufficient to effectively achieve bioremediation goals within a reasonable timeframe.
63
Nguyen H, Tran D, Tran B, Roy M, Cassell A, Dascalu S, Draghici S, Nguyen T. SMRT: Randomized Data Transformation for Cancer Subtyping and Big Data Analysis. Front Oncol 2021; 11:725133. [PMID: 34745946 PMCID: PMC8563705 DOI: 10.3389/fonc.2021.725133] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/14/2021] [Accepted: 09/28/2021] [Indexed: 12/25/2022] Open
Abstract
Cancer is an umbrella term that includes a range of disorders, from those that are fast-growing and lethal to indolent lesions with low or delayed potential for progression to death. The treatment options, as well as treatment success, are highly dependent on the correct subtyping of individual patients. With the advancement of high-throughput platforms, we have the opportunity to differentiate among cancer subtypes from a holistic perspective that takes into consideration phenomena at different molecular levels (mRNA, methylation, etc.). This demands powerful integrative methods to leverage large multi-omics datasets for a better subtyping. Here we introduce Subtyping Multi-omics using a Randomized Transformation (SMRT), a new method for multi-omics integration and cancer subtyping. SMRT offers the following advantages over existing approaches: (i) the scalable analysis pipeline allows researchers to integrate multi-omics data and analyze hundreds of thousands of samples in minutes, (ii) the ability to integrate data types with different numbers of patients, (iii) the ability to analyze un-matched data of different types, and (iv) the ability to offer users a convenient data analysis pipeline through a web application. We also improve the efficiency of our ensemble-based, perturbation clustering to support analysis on machines with memory constraints. In an extensive analysis, we compare SMRT with eight state-of-the-art subtyping methods using 37 TCGA and two METABRIC datasets comprising a total of almost 12,000 patient samples from 28 different types of cancer. We also performed a number of simulation studies. We demonstrate that SMRT outperforms other methods in identifying subtypes with significantly different survival profiles. In addition, SMRT is extremely fast, being able to analyze hundreds of thousands of samples in minutes. The web application is available at http://SMRT.tinnguyen-lab.com. 
The R package will be deposited to CRAN as part of our PINSPlus software suite.
Affiliation(s)
- Hung Nguyen
- Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, United States
- Duc Tran
- Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, United States
- Bang Tran
- Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, United States
- Monikrishna Roy
- Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, United States
- Adam Cassell
- Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, United States
- Sergiu Dascalu
- Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, United States
- Sorin Draghici
- Department of Computer Science, Wayne State University, Detroit, MI, United States
- Tin Nguyen
- Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, United States
64
Demirel HC, Arici MK, Tuncbag N. Computational approaches leveraging integrated connections of multi-omic data toward clinical applications. Mol Omics 2021; 18:7-18. [PMID: 34734935 DOI: 10.1039/d1mo00158b] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/18/2023]
Abstract
In line with the advances in high-throughput technologies, multiple omic datasets have accumulated to study biological systems and diseases coherently. No single omics data type is capable of fully representing cellular activity. The complexity of the biological processes arises from the interactions between omic entities such as genes, proteins, and metabolites. Therefore, multi-omic data integration is crucial but challenging. The impact of the molecular alterations in multi-omic data is not local in the neighborhood of the altered gene or protein; rather, the impact diffuses in the network and changes the functionality of multiple signaling pathways and regulation of the gene expression. Additionally, multi-omic data is high-dimensional and has background noise. Several integrative approaches have been developed to accurately interpret the multi-omic datasets, including machine learning, network-based methods, and their combination. In this review, we overview the most recent integrative approaches and tools with a focus on network-based methods. We then discuss these approaches according to their specific applications, from disease-network and biomarker identification to patient stratification, drug discovery, and repurposing.
Affiliation(s)
- Habibe Cansu Demirel
- Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
| | - Muslum Kaan Arici
- Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey.,Foot and Mouth Diseases Institute, Ministry of Agriculture and Forestry, Ankara, 06044, Turkey
| | - Nurcan Tuncbag
- Chemical and Biological Engineering, College of Engineering, Koc University, Istanbul, 34450, Turkey.,School of Medicine, Koc University, Istanbul, 34450, Turkey.,Koc University Research Center for Translational Medicine (KUTTAM), Istanbul, Turkey.
| |
|
65
|
Berlow NE. Probabilistic Boolean Modeling of Pre-clinical Tumor Models for Biomarker Identification in Cancer Drug Development. Curr Protoc 2021; 1:e269. [PMID: 34661991 DOI: 10.1002/cpz1.269] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022]
Abstract
As high-throughput sequencing experiments become more widely used in pre-clinical and clinical settings, pharmacogenetic and pharmacogenomic biomarker development plays an increasingly important role in oncology drug development pipelines and programs. Consequently, computer-based learning approaches have entered into use at multiple stages in pre-clinical and clinical pipelines. However, few approaches are available to identify interpretable and implementable biomarkers of response early in the drug development process when only small pre-clinical data packages are available. To address the need for early-stage biomarker development using pre-clinical tumor models, we have adapted the previously published Probabilistic Target Inhibitor Map (PTIM) platform to the challenge of biomarker hypothesis development, and denoted this approach the Probabilistic Target Map-Biomarker (PTM-Biomarker). In this article, we detail the history and design philosophy of PTM-Biomarker, and present two case studies using the biomarker discovery tool to illustrate its utility in guiding cancer drug development. © 2021 Wiley Periodicals LLC.
|
66
|
Ding P, Ouyang W, Luo J, Kwoh CK. Heterogeneous information network and its application to human health and disease. Brief Bioinform 2021; 21:1327-1346. [PMID: 31566212 DOI: 10.1093/bib/bbz091] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/06/2019] [Revised: 06/29/2019] [Accepted: 06/30/2019] [Indexed: 12/11/2022] Open
Abstract
The molecular components of the human cell, with their functional interdependencies, form a complicated biological network. Diseases are mostly caused by perturbations of groups of interacting biomolecules rather than by an abnormality of a single biomolecule. Furthermore, new biological functions and processes could be revealed by discovering novel relationships between biological entities. Hence, more and more biologists focus on studying the complex biological system instead of its individual components. The emergence of the heterogeneous information network (HIN) offers a promising way to systematically explore complicated and heterogeneous relationships between various molecules for apparently distinct phenotypes. In this review, we first present the basic definition of HIN and describe the biological system as a complex HIN. Then, we discuss the topological properties of HIN and how these can be applied to detect network motifs and functional modules. Afterwards, methodologies for discovering relationships between diseases and biomolecules are presented. Useful insights on how HIN aids drug development and exploration of the human interactome are provided. Finally, we analyze the challenges and opportunities for uncovering combinatorial patterns in pharmacogenomics and for cell-type detection based on single-cell genomic data.
Affiliation(s)
- Pingjian Ding
- School of Computer Science, University of South China, Hengyang, China
| | - Wenjue Ouyang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Chee-Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
| |
|
67
|
Chai J, Lv X, Diao Q, Usdrowski H, Zhuang Y, Huang W, Cui K, Zhang N. Solid diet manipulates rumen epithelial microbiota and its interactions with host transcriptomic in young ruminants. Environ Microbiol 2021; 23:6557-6568. [PMID: 34490978 PMCID: PMC9292864 DOI: 10.1111/1462-2920.15757] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/20/2021] [Accepted: 09/03/2021] [Indexed: 11/28/2022]
Abstract
Solid diet supplementation in the early life stages of ruminants could improve rumen microbiota and tissue development. However, most studies focus on bacteria in the rumen content community. The microbiota attached to the rumen epithelium are rarely investigated, and their correlations with rumen content bacteria and host transcripts are unknown. In this study, rumen digesta attached to the epithelium of goats on three diet regimes (milk replacer only, milk replacer supplemented with concentrate, and milk replacer supplemented with concentrate plus alfalfa pellets) were collected for measurement of the epithelial microbiota using next generation sequencing. Correspondingly, the host transcriptome was measured in the rumen tissues of the same animals. The distinct microbial structures and compositions between rumen content and epithelial communities were associated with solid diet supplementation. Regarding rumen development in pre‐weaning ruminants, a solid diet, especially its accompanying neutral detergent fibre nutrients, was the most significant driver that influenced the rumen microbiota and epithelium gene expression. Compared with content bacteria, rumen epithelial microbiota had a stronger association with the host transcriptome. The host transcripts that correlated with host phenotypes were associated with rumen epithelial microbiota and solid diet. This study reveals that the epithelial microbiota is crucial for proper rumen development, and that a solid diet could improve rumen development through both the rumen content and epithelial microbiota.
Affiliation(s)
- Jianmin Chai
- Feed Research Institute of Chinese Academy of Agricultural Sciences, Key Laboratory of Feed Biotechnology of the Ministry of Agriculture and Rural Affairs, Beijing, 100081, China.,Department of Animal Science, Division of Agriculture, University of Arkansas, Fayetteville, AR, 72701, USA
| | - Xiaokang Lv
- Feed Research Institute of Chinese Academy of Agricultural Sciences, Key Laboratory of Feed Biotechnology of the Ministry of Agriculture and Rural Affairs, Beijing, 100081, China
| | - Qiyu Diao
- Feed Research Institute of Chinese Academy of Agricultural Sciences, Key Laboratory of Feed Biotechnology of the Ministry of Agriculture and Rural Affairs, Beijing, 100081, China
| | - Hunter Usdrowski
- Department of Animal Science, Division of Agriculture, University of Arkansas, Fayetteville, AR, 72701, USA
| | - Yimin Zhuang
- Feed Research Institute of Chinese Academy of Agricultural Sciences, Key Laboratory of Feed Biotechnology of the Ministry of Agriculture and Rural Affairs, Beijing, 100081, China
| | - Wenqin Huang
- Feed Research Institute of Chinese Academy of Agricultural Sciences, Key Laboratory of Feed Biotechnology of the Ministry of Agriculture and Rural Affairs, Beijing, 100081, China
| | - Kai Cui
- Feed Research Institute of Chinese Academy of Agricultural Sciences, Key Laboratory of Feed Biotechnology of the Ministry of Agriculture and Rural Affairs, Beijing, 100081, China
| | - Naifeng Zhang
- Feed Research Institute of Chinese Academy of Agricultural Sciences, Key Laboratory of Feed Biotechnology of the Ministry of Agriculture and Rural Affairs, Beijing, 100081, China
| |
|
68
|
Immunophenotyping assessment in a COVID-19 cohort (IMPACC): A prospective longitudinal study. Sci Immunol 2021; 6:6/62/eabf3733. [PMID: 34376480 PMCID: PMC8713959 DOI: 10.1126/sciimmunol.abf3733] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 10/22/2020] [Accepted: 08/05/2021] [Indexed: 12/13/2022]
Abstract
The Immunophenotyping Assessment in a COVID-19 Cohort (IMPACC) is a prospective longitudinal study designed to enroll 1000 hospitalized patients with COVID-19 (NCT04378777). IMPACC collects detailed clinical, laboratory, and radiographic data along with longitudinal biologic sampling of blood and respiratory secretions for in-depth testing. Clinical and laboratory data are integrated to identify immunologic, virologic, proteomic, metabolomic, and genomic features of COVID-19–related susceptibility, severity, and disease progression. The goals of IMPACC are to better understand the contributions of pathogen dynamics and host immune responses to the severity and course of COVID-19 and to generate hypotheses for identification of biomarkers and effective therapeutics, including optimal timing of such interventions. In this report, we summarize the IMPACC study design and protocols including clinical criteria and recruitment, multisite standardized sample collection and processing, virologic and immunologic assays, harmonization of assay protocols, high-level analyses, and the data sharing plans.
|
69
|
Herrmann HA, Rusz M, Baier D, Jakupec MA, Keppler BK, Berger W, Koellensperger G, Zanghellini J. Thermodynamic Genome-Scale Metabolic Modeling of Metallodrug Resistance in Colorectal Cancer. Cancers (Basel) 2021; 13:cancers13164130. [PMID: 34439283 PMCID: PMC8391396 DOI: 10.3390/cancers13164130] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/24/2021] [Revised: 07/23/2021] [Accepted: 08/03/2021] [Indexed: 12/11/2022] Open
Abstract
Simple Summary: Cancer, but also its treatment, can lead to a reprogramming of cellular metabolism. These changes are observable in metabolite abundances, which can be unbiasedly measured via mass spectrometry metabolomics. However, even when the metabolome changes strongly, a (mechanistic) interpretation is difficult as metabolite levels do not necessarily directly correspond to pathway activities. Here we measure the changes of the cellular metabolome in colorectal cancer cell lines sensitive and resistant to the ruthenium-based drug BOLD-100/KP1339 and the platinum-based drug oxaliplatin. We map these changes onto a cancer-specific genome-scale metabolic model, which allows us not only to compute intracellular flux distributions, but also to disentangle drug-specific effects from growth differences from differences in metabolic adaptations due to resistance. Specifically, we find that resistance to BOLD-100/KP1339 induces more extensive reprogramming than oxaliplatin, especially with respect to fatty acid and amino acid metabolism.
Background: Mass spectrometry-based metabolomics approaches provide an immense opportunity to enhance our understanding of the mechanisms that underpin the cellular reprogramming of cancers. Accurate comparative metabolic profiling of heterogeneous conditions, however, is still a challenge.
Methods: Measuring both intracellular and extracellular metabolite concentrations, we constrain four instances of a thermodynamic genome-scale metabolic model of the HCT116 colorectal carcinoma cell line to compare the metabolic flux profiles of cells that are either sensitive or resistant to ruthenium- or platinum-based treatments with BOLD-100/KP1339 and oxaliplatin, respectively.
Results: Normalizing according to growth rate and normalizing resistant cells according to their respective sensitive controls, we are able to dissect metabolic responses specific to the drug and to the resistance states. We find the normalization steps to be crucial in the interpretation of the metabolomics data and show that the metabolic reprogramming in resistant cells is limited to a select number of pathways.
Conclusions: Here, we elucidate the key importance of normalization steps in the interpretation of metabolomics data, allowing us to uncover drug-specific metabolic reprogramming during acquired metal-drug resistance.
Affiliation(s)
- Helena A. Herrmann
- Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria; (H.A.H.); (M.R.)
| | - Mate Rusz
- Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria; (H.A.H.); (M.R.)
- Institute of Inorganic Chemistry, University of Vienna, 1090 Vienna, Austria; (D.B.); (M.A.J.); (B.K.K.)
| | - Dina Baier
- Institute of Inorganic Chemistry, University of Vienna, 1090 Vienna, Austria; (D.B.); (M.A.J.); (B.K.K.)
| | - Michael A. Jakupec
- Institute of Inorganic Chemistry, University of Vienna, 1090 Vienna, Austria; (D.B.); (M.A.J.); (B.K.K.)
- Research Cluster Translational Cancer Therapy Research, University of Vienna and Medical University of Vienna, 1090 Vienna, Austria;
| | - Bernhard K. Keppler
- Institute of Inorganic Chemistry, University of Vienna, 1090 Vienna, Austria; (D.B.); (M.A.J.); (B.K.K.)
- Research Cluster Translational Cancer Therapy Research, University of Vienna and Medical University of Vienna, 1090 Vienna, Austria;
| | - Walter Berger
- Research Cluster Translational Cancer Therapy Research, University of Vienna and Medical University of Vienna, 1090 Vienna, Austria;
- Institute of Cancer Research and Comprehensive Cancer Center, Medical University of Vienna, 1090 Vienna, Austria
| | - Gunda Koellensperger
- Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria; (H.A.H.); (M.R.)
- Vienna Metabolomics Center (VIME), University of Vienna, 1090 Vienna, Austria
- Research Network Chemistry Meets Microbiology, University of Vienna, 1090 Vienna, Austria
- Correspondence: (G.K.); (J.Z.)
| | - Jürgen Zanghellini
- Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria; (H.A.H.); (M.R.)
- Correspondence: (G.K.); (J.Z.)
| |
|
70
|
Duan R, Gao L, Gao Y, Hu Y, Xu H, Huang M, Song K, Wang H, Dong Y, Jiang C, Zhang C, Jia S. Evaluation and comparison of multi-omics data integration methods for cancer subtyping. PLoS Comput Biol 2021; 17:e1009224. [PMID: 34383739 PMCID: PMC8384175 DOI: 10.1371/journal.pcbi.1009224] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 02/27/2021] [Revised: 08/24/2021] [Accepted: 06/28/2021] [Indexed: 11/18/2022] Open
Abstract
Computational integrative analysis has become a significant approach in the data-driven exploration of biological problems. Many integration methods for cancer subtyping have been proposed, but evaluating these methods has become a complicated problem due to the lack of gold standards. Moreover, questions of practical importance remain to be addressed regarding the impact of selecting appropriate data types and combinations on the performance of integrative studies. Here, we constructed three classes of benchmarking datasets of nine cancers in TCGA by considering all the eleven combinations of four multi-omics data types. Using these datasets, we conducted a comprehensive evaluation of ten representative integration methods for cancer subtyping in terms of accuracy measured by combining both clustering accuracy and clinical significance, robustness, and computational efficiency. We subsequently investigated the influence of different omics data on cancer subtyping and the effectiveness of their combinations. Refuting the widely held intuition that incorporating more types of omics data always produces better results, our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. Our analyses also suggested several effective combinations for most cancers under our studies, which may be of particular interest to researchers in omics data analysis.
Cancer is one of the most heterogeneous diseases, characterized by diverse morphological, phenotypic, and genomic profiles between tumors and their subtypes. Identifying cancer subtypes can help patients receive precise treatments. With the development of high-throughput technologies, genomics, epigenomics, and transcriptomics data have been generated for large cancer patient cohorts. It is believed that the more omics data we use, the more accurately cancer subtypes can be identified. To examine this assumption, we first constructed three classes of benchmarking datasets to conduct a comprehensive evaluation and comparison of ten representative multi-omics data integration methods for cancer subtyping by considering their accuracy, robustness, and computational efficiency. Then, we investigated the influence of different omics data and their various combinations on the effectiveness of cancer subtyping. Our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. We hope that our work may help researchers choose a proper method and an effective data combination when identifying cancer subtypes using data integration methods.
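The figure of eleven combinations in the abstract above is consistent with choosing any two, three, or all four of the four data types: C(4,2) + C(4,3) + C(4,4) = 6 + 4 + 1 = 11. A minimal Python sketch of that enumeration follows. The labels omics_1 to omics_4 are placeholders, since the abstract does not name the data types, and the exclusion of single data types is an assumption inferred from the count.

```python
from itertools import combinations

# Placeholder labels: the abstract does not name the four omics data types.
data_types = ["omics_1", "omics_2", "omics_3", "omics_4"]

# Assumption: a "combination" means two or more data types used together.
multi_omics_combos = [
    combo
    for size in range(2, len(data_types) + 1)
    for combo in combinations(data_types, size)
]

print(len(multi_omics_combos))  # 11
```

Each tuple in multi_omics_combos would correspond to one benchmarking configuration per cancer type under this reading.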
Affiliation(s)
- Ran Duan
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Lin Gao
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Yong Gao
- Department of Computer Science, The University of British Columbia Okanagan, Kelowna, British Columbia, Canada
| | - Yuxuan Hu
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Han Xu
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Mingfeng Huang
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Kuo Song
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Hongda Wang
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Yongqiang Dong
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Chaoqun Jiang
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Chenxing Zhang
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Songwei Jia
- School of Computer Science and Technology, Xidian University, Xi’an, China
| |
|
71
|
Tepeli YI, Ünal AB, Akdemir FM, Tastan O. PAMOGK: a pathway graph kernel-based multiomics approach for patient clustering. Bioinformatics 2021; 36:5237-5246. [PMID: 32730565 DOI: 10.1093/bioinformatics/btaa655] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/06/2020] [Revised: 06/30/2020] [Accepted: 07/20/2020] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION: Accurate classification of patients into molecular subgroups is critical for the development of effective therapeutics and for deciphering what drives these subgroups to cancer. The availability of multiomics data catalogs for large cohorts of cancer patients provides multiple views into the molecular biology of the tumors with unprecedented resolution.
RESULTS: We develop Pathway-based MultiOmic Graph Kernel clustering (PAMOGK) that integrates multiomics patient data with existing biological knowledge on pathways. We develop a novel graph kernel that evaluates patient similarities based on a single molecular alteration type in the context of a pathway. To corroborate multiple views of patients evaluated by hundreds of pathways and molecular alteration combinations, we use multiview kernel clustering. Applying PAMOGK to kidney renal clear cell carcinoma (KIRC) patients results in four clusters with significantly different survival times (P-value = 1.24e-11). When we compare PAMOGK to eight other state-of-the-art multiomics clustering methods, PAMOGK consistently outperforms these in terms of its ability to partition KIRC patients into groups with different survival distributions. The discovered patient subgroups also differ with respect to other clinical parameters such as tumor stage and grade, and primary tumor and metastasis tumor spreads. The pathways identified as important are highly relevant to KIRC.
AVAILABILITY AND IMPLEMENTATION: github.com/tastanlab/pamogk.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yasin Ilkagan Tepeli
- Department of Computer Science and Engineering, Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
| | - Ali Burak Ünal
- Department of Computer Science and Engineering, Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey.,Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey
| | | | - Oznur Tastan
- Department of Computer Science and Engineering, Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
| |
|
72
|
Lê Cao KA, Abadi AJ, Davis-Marcisak EF, Hsu L, Arora A, Coullomb A, Deshpande A, Feng Y, Jeganathan P, Loth M, Meng C, Mu W, Pancaldi V, Sankaran K, Righelli D, Singh A, Sodicoff JS, Stein-O’Brien GL, Subramanian A, Welch JD, You Y, Argelaguet R, Carey VJ, Dries R, Greene CS, Holmes S, Love MI, Ritchie ME, Yuan GC, Culhane AC, Fertig E. Community-wide hackathons to identify central themes in single-cell multi-omics. Genome Biol 2021; 22:220. [PMID: 34353350 PMCID: PMC8340473 DOI: 10.1186/s13059-021-02433-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/16/2022] Open
Affiliation(s)
- Kim-Anh Lê Cao
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
| | - Al J. Abadi
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
| | - Emily F. Davis-Marcisak
- McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD USA
| | - Lauren Hsu
- Data Science, Dana-Farber Cancer Institute, Boston, MA USA
- Department of Genetics, UNC, Chapel Hill, NC USA
| | - Arshi Arora
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY USA
| | - Alexis Coullomb
- Centre de Recherches en Cancérologie de Toulouse (INSERM), Université Paul Sabatier III, Toulouse, France
| | - Atul Deshpande
- Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD USA
| | - Yuzhou Feng
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
| | | | - Melanie Loth
- Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD USA
| | - Chen Meng
- Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), School of Life Sciences, Technical University of Munich, Munich, Germany
| | - Wancen Mu
- Department of Biostatistics, UNC, Chapel Hill, NC USA
| | - Vera Pancaldi
- Centre de Recherches en Cancérologie de Toulouse (INSERM), Université Paul Sabatier III, Toulouse, France
- Barcelona Supercomputing Center, Barcelona, Spain
| | - Kris Sankaran
- Department of Statistics, University of Wisconsin, Madison, WI USA
| | - Dario Righelli
- Department of Statistical Sciences, University of Padova, Padova, PD Italy
| | - Amrit Singh
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC Canada
- PROOF Centre of Excellence, Vancouver, BC Canada
| | - Joshua S. Sodicoff
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI USA
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI USA
| | - Genevieve L. Stein-O’Brien
- McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD USA
- Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD USA
- Department of Neuroscience, Johns Hopkins University, Baltimore, MD USA
- Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD USA
| | | | - Joshua D. Welch
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI USA
- Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI USA
| | - Yue You
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, University of Melbourne, Melbourne, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, Australia
| | | | - Vincent J. Carey
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA USA
| | - Ruben Dries
- Department of Hematology and Oncology, Boston Medical Center, Boston, MA USA
- Department of Computational Biomedicine, Boston University School of Medicine, Boston, MA USA
- Center for Regenerative Medicine (CReM), Boston University, Boston, MA USA
| | - Casey S. Greene
- Center for Health AI and Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO USA
| | - Susan Holmes
- Department of Statistics, Stanford University, Stanford, CA USA
| | - Michael I. Love
- Department of Biostatistics, UNC, Chapel Hill, NC USA
- Department of Genetics, UNC, Chapel Hill, NC USA
| | - Matthew E. Ritchie
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, University of Melbourne, Melbourne, Australia
- School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, Australia
| | - Guo-Cheng Yuan
- Department of Genetics and Genomic Sciences, Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY USA
| | - Aedin C. Culhane
- Data Science, Dana-Farber Cancer Institute, Boston, MA USA
- Biostatistics, Harvard TH Chan School of Public Health, Boston, MA USA
| | - Elana Fertig
- Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD USA
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD USA
- Department of Applied Mathematics and Statistics, Johns Hopkins University Whiting School of Engineering, Baltimore, MD USA
| |
|
73
|
Zhou G, Ewald J, Xia J. OmicsAnalyst: a comprehensive web-based platform for visual analytics of multi-omics data. Nucleic Acids Res 2021; 49:W476-W482. [PMID: 34019646 PMCID: PMC8262745 DOI: 10.1093/nar/gkab394] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 03/10/2021] [Revised: 04/26/2021] [Accepted: 04/28/2021] [Indexed: 01/03/2023] Open
Abstract
Data analysis and interpretation remain a critical bottleneck in current multi-omics studies. Here, we introduce OmicsAnalyst, a user-friendly, web-based platform that allows users to perform a wide range of well-established data-driven approaches for multi-omics integration, and visually explore their results in a clear and meaningful manner. To help navigate complex landscapes of multi-omics analysis, these approaches are organized into three visual analytics tracks: (i) the correlation network analysis track, where users choose among univariate and multivariate methods to identify important features and explore their relationships in 2D or 3D networks; (ii) the cluster heatmap analysis track, where users apply several cutting-edge multi-view clustering algorithms and explore their results via interactive heatmaps; and (iii) the dimension reduction analysis track, where users choose among several recent multivariate techniques to reveal global data structures, and explore corresponding scores, loadings and biplots in interactive 3D scatter plots. The three visual analytics tracks are equipped with comprehensive options for parameter customization, view customization and targeted analysis. OmicsAnalyst lowers the access barriers to many well-established methods for multi-omics integration via novel visual analytics. It is freely available at https://www.omicsanalyst.ca.
Affiliation(s)
- Guangyan Zhou
- Institute of Parasitology, McGill University, Montreal, Quebec, Canada
| | - Jessica Ewald
- Department of Natural Resource Sciences, McGill University, Montreal, Quebec, Canada
| | - Jianguo Xia
- Institute of Parasitology, McGill University, Montreal, Quebec, Canada.,Department of Animal Science, McGill University, Montreal, Quebec, Canada
| |
|
74
|
Stanton JE, Malijauskaite S, McGourty K, Grabrucker AM. The Metallome as a Link Between the "Omes" in Autism Spectrum Disorders. Front Mol Neurosci 2021; 14:695873. [PMID: 34290588 PMCID: PMC8289253 DOI: 10.3389/fnmol.2021.695873] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/22/2021] [Accepted: 06/14/2021] [Indexed: 12/26/2022] Open
Abstract
Metal dyshomeostasis plays a significant role in various neurological diseases such as Alzheimer's disease, Parkinson's disease, Autism Spectrum Disorders (ASD), and many more. Like studies investigating the proteome, transcriptome, epigenome, microbiome, etc., for years, metallomics studies have focused on data from their domain, i.e., trace metal composition, only. Still, few have considered the links between other "omes," which may together result in an individual's specific pathologies. In particular, ASD have been reported to have multitudes of possible causal effects. Metallomics data focusing on metal deficiencies and dyshomeostasis can be linked to functions of metalloenzymes, metal transporters, and transcription factors, thus affecting the proteome and transcriptome. Furthermore, recent studies in ASD have emphasized the gut-brain axis, with alterations in the microbiome being linked to changes in the metabolome and inflammatory processes. However, the microbiome and other "omes" are heavily influenced by the metallome. Thus, here, we will summarize the known implications of a changed metallome for other "omes" in the body in the context of "omics" studies in ASD. We will highlight possible connections and propose a model that may explain the so far independently reported pathologies in ASD.
Affiliation(s)
- Janelle E Stanton
- Department of Biological Sciences, University of Limerick, Limerick, Ireland.,Bernal Institute, University of Limerick, Limerick, Ireland
| | - Sigita Malijauskaite
- Bernal Institute, University of Limerick, Limerick, Ireland.,Department of Chemical Sciences, University of Limerick, Limerick, Ireland
| | - Kieran McGourty
- Bernal Institute, University of Limerick, Limerick, Ireland.,Department of Chemical Sciences, University of Limerick, Limerick, Ireland.,Health Research Institute, University of Limerick, Limerick, Ireland
| | - Andreas M Grabrucker
- Department of Biological Sciences, University of Limerick, Limerick, Ireland.,Bernal Institute, University of Limerick, Limerick, Ireland.,Health Research Institute, University of Limerick, Limerick, Ireland
|
75
|
Picard M, Scott-Boyer MP, Bodein A, Périn O, Droit A. Integration strategies of multi-omics data for machine learning analysis. Comput Struct Biotechnol J 2021; 19:3735-3746. [PMID: 34285775 PMCID: PMC8258788 DOI: 10.1016/j.csbj.2021.06.030] [Citation(s) in RCA: 154] [Impact Index Per Article: 51.3] [Received: 03/30/2021] [Revised: 06/17/2021] [Accepted: 06/21/2021] [Indexed: 12/25/2022] Open
Abstract
Increased availability of high-throughput technologies has generated an ever-growing number of omics data that seek to portray many different but complementary biological layers including genomics, epigenomics, transcriptomics, proteomics, and metabolomics. New insights from these data have been obtained by machine learning algorithms that have produced diagnostic and classification biomarkers. Most biomarkers obtained to date, however, only include one omic measurement at a time and thus do not take full advantage of recent multi-omics experiments that now capture the entire complexity of biological systems. Multi-omics data integration strategies are needed to combine the complementary knowledge brought by each omics layer. We have summarized the most recent data integration methods/frameworks into five different integration strategies: early, mixed, intermediate, late and hierarchical. In this mini-review, we focus on challenges and existing multi-omics integration strategies by paying special attention to machine learning applications.
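As a rough illustration of two of the five strategies named in this abstract, the sketch below contrasts early integration (concatenate all omics blocks, then fit one model) with late integration (fit one model per block, then combine their predictions). It is a minimal sketch and not the authors' implementation: the nearest-centroid classifier, the majority vote, the function names, and the toy data are all assumptions chosen for brevity.

```python
from collections import Counter
from statistics import mean


def fit_centroids(X, y):
    """Return {label: centroid} for sample rows X and class labels y."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest (squared Euclidean) to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def early_integration(blocks, y, new_sample):
    """Early: concatenate the features of every block, then fit one model."""
    X = [sum(rows, []) for rows in zip(*blocks)]
    return predict(fit_centroids(X, y), sum(new_sample, []))


def late_integration(blocks, y, new_sample):
    """Late: fit one model per block, then combine by majority vote."""
    votes = [predict(fit_centroids(X, y), x) for X, x in zip(blocks, new_sample)]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data: three omics blocks measured on the same four samples.
transcriptome = [[1.0, 2.0], [1.2, 1.8], [3.0, 4.0], [3.2, 4.1]]
proteome = [[0.1], [0.2], [0.9], [1.0]]
metabolome = [[5.0], [5.5], [9.0], [8.5]]
blocks = [transcriptome, proteome, metabolome]
labels = ["A", "A", "B", "B"]

# One new sample, given as one feature vector per block.
new_sample = [[3.1, 3.9], [0.95], [8.8]]

print(early_integration(blocks, labels, new_sample))
print(late_integration(blocks, labels, new_sample))
```

In practice the blocks would be scaled before concatenation, since an unscaled block with large values can dominate an early-integration model; the per-block vote in the late variant sidesteps that at the cost of ignoring cross-block interactions.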
Affiliation(s)
- Milan Picard
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| | - Marie-Pier Scott-Boyer
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| | - Antoine Bodein
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| | - Olivier Périn
- Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
| | - Arnaud Droit
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Corresponding author.
|
76
|
Alvarez-Franco A, Rouco R, Ramirez RJ, Guerrero-Serna G, Tiana M, Cogliati S, Kaur K, Saeed M, Magni R, Enriquez JA, Sanchez-Cabo F, Jalife J, Manzanares M. Transcriptome and proteome mapping in the sheep atria reveal molecular featurets of atrial fibrillation progression. Cardiovasc Res 2021; 117:1760-1775. [PMID: 33119050 PMCID: PMC8208739 DOI: 10.1093/cvr/cvaa307] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/31/2020] [Accepted: 10/15/2020] [Indexed: 12/14/2022] Open
Abstract
Aims: Atrial fibrillation (AF) is a progressive cardiac arrhythmia that increases the risk of hospitalization and adverse cardiovascular events. There is a clear demand for more inclusive and large-scale approaches to understand the molecular drivers responsible for AF, as well as the fundamental mechanisms governing the transition from paroxysmal to persistent and permanent forms. In this study, we aimed to create a molecular map of AF and find the distinct molecular programmes underlying cell type-specific atrial remodelling and AF progression.
Methods and results: We used a sheep model of long-standing, tachypacing-induced AF, sampled right and left atrial tissue, and isolated cardiomyocytes (CMs) from control, intermediate (transition), and late time points during AF progression, and performed transcriptomic and proteome profiling. We have merged all these layers of information into a meaningful three-component space in which we explored the genes and proteins detected and their common patterns of expression. Our data-driven analysis points at extracellular matrix remodelling, inflammation, ion channel, myofibril structure, mitochondrial complexes, chromatin remodelling, and genes related to neural function, as well as critical regulators of cell proliferation as hallmarks of AF progression. Most important, we prove that these changes occur at early transitional stages of the disease, but not at later stages, and that the left atrium undergoes significantly more profound changes than the right atrium in its expression programme. The pattern of dynamic changes in gene and protein expression replicate the electrical and structural remodelling demonstrated previously in the sheep and in humans, and uncover novel mechanisms potentially relevant for disease treatment.
Conclusions: Transcriptomic and proteomic analysis of AF progression in a large animal model shows that significant changes occur at early stages, and that among others involve previously undescribed increase in mitochondria, changes to the chromatin of atrial CMs, and genes related to neural function and cell proliferation.
Affiliation(s)
- Alba Alvarez-Franco
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
| | - Raquel Rouco
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
| | - Rafael J Ramirez
- Department of Internal Medicine, Center for Arrhythmia Research, University of Michigan, Ann Arbor, MI, USA
| | - Guadalupe Guerrero-Serna
- Department of Internal Medicine, Center for Arrhythmia Research, University of Michigan, Ann Arbor, MI, USA
| | - Maria Tiana
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
| | - Sara Cogliati
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
- Department of Physiology, Institute of Nutrition and Food Technology, Biomedical Research Centre, University of Granada, Granada, Spain
| | - Kuljeet Kaur
- Department of Internal Medicine, Center for Arrhythmia Research, University of Michigan, Ann Arbor, MI, USA
| | - Mohammed Saeed
- Department of Internal Medicine, Center for Arrhythmia Research, University of Michigan, Ann Arbor, MI, USA
| | - Ricardo Magni
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
| | - Jose Antonio Enriquez
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
| | - Fatima Sanchez-Cabo
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
| | - José Jalife
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
- Department of Internal Medicine, Center for Arrhythmia Research, University of Michigan, Ann Arbor, MI, USA
- Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares (CIBERCV), Spain
| | - Miguel Manzanares
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
- Centro de Biología Molecular Severo Ochoa, CSIC-UAM, Madrid, Spain
|
77
|
Tarazona S, Arzalluz-Luque A, Conesa A. Undisclosed, unmet and neglected challenges in multi-omics studies. NATURE COMPUTATIONAL SCIENCE 2021; 1:395-402. [PMID: 38217236 DOI: 10.1038/s43588-021-00086-z] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 03/18/2021] [Accepted: 05/17/2021] [Indexed: 01/15/2024]
Abstract
Multi-omics approaches have become a reality in both large genomics projects and small laboratories. However, the multi-omics research community still faces a number of issues that have either not been sufficiently discussed or for which current solutions are still limited. In this Perspective, we elaborate on these limitations and suggest points of attention for future research. We finally discuss new opportunities and challenges brought to the field by the rapid development of single-cell high-throughput molecular technologies.
Affiliation(s)
- Sonia Tarazona
- Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain
| | - Angeles Arzalluz-Luque
- Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain
| | - Ana Conesa
- Microbiology and Cell Science Department, Institute for Food and Agricultural Research, University of Florida, Gainesville, FL, USA.
- Genetics Institute, University of Florida, Gainesville, FL, USA.
- Institute for Integrative Systems Biology, Spanish National Research Council, Valencia, Spain.
|
78
|
Stevenson DK, Aghaeepour N, Maric I, Angst MS, Darmstadt GL, Druzin ML, Gaudilliere B, Ling XB, Moufarrej MN, Peterson LS, Quake SR, Relman DA, Snyder MP, Sylvester KG, Shaw GM, Wong RJ. Understanding how biologic and social determinants affect disparities in preterm birth and outcomes of preterm infants in the NICU. Semin Perinatol 2021; 45:151408. [PMID: 33875265 PMCID: PMC9159791 DOI: 10.1016/j.semperi.2021.151408] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/21/2022]
Abstract
To understand the disparities in spontaneous preterm birth (sPTB) and/or its outcomes, biologic and social determinants as well as healthcare practice (such as those in neonatal intensive care units) should be considered. Disparities in sPTB have been largely intractable and remain obscure in most cases, despite a myriad of identified risk factors for and causes of sPTB. We still do not know how they lead to the different outcomes at different gestational ages and if they are independent of NICU practices. Here we describe an integrated approach to study the interplay between the genome and exposome, which may drive biochemistry and physiology and lead to health disparities.
Affiliation(s)
- David K. Stevenson
- Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, 1265 Welch Rd, X157, Stanford, CA 94305-5415, USA. Corresponding author (D.K. Stevenson).
| | - Nima Aghaeepour
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Ivana Maric
- Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, 1265 Welch Rd, X157, Stanford, CA 94305-5415, USA
| | - Martin S. Angst
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Gary L. Darmstadt
- Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, 1265 Welch Rd, X157, Stanford, CA 94305-5415, USA
| | - Maurice L. Druzin
- Department of Obstetrics and Gynecology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Brice Gaudilliere
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Xuefeng B. Ling
- Department of Surgery, Stanford University School of Medicine, Stanford, CA 94305, USA; Clinical and Translational Research Program, Betty Irene Moore Children’s Heart Center, Lucile Packard Children’s Hospital, Palo Alto, CA 94306, USA
| | - Mira N. Moufarrej
- Department of Bioengineering and Applied Physics, Stanford University and Chan Zuckerberg Biohub, Stanford, CA 94305, USA
| | - Laura S. Peterson
- Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, 1265 Welch Rd, X157, Stanford, CA 94305-5415, USA
| | - Stephen R. Quake
- Department of Bioengineering and Applied Physics, Stanford University and Chan Zuckerberg Biohub, Stanford, CA 94305, USA
| | - David A. Relman
- Department of Medicine, Stanford University School of Medicine and the Chan Zuckerberg Biohub, Stanford, CA 94305, USA; Infectious Diseases Section, Veterans Affairs Palo Alto Health Care System, Palo Alto, CA 94304, USA
| | - Michael P. Snyder
- Stanford Center for Genomics and Personalized Medicine, Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Karl G. Sylvester
- Department of Surgery, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Gary M. Shaw
- Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, 1265 Welch Rd, X157, Stanford, CA 94305-5415, USA
| | - Ronald J. Wong
- Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, 1265 Welch Rd, X157, Stanford, CA 94305-5415, USA
|
79
|
Sillner N, Walker A, Lucio M, Maier TV, Bazanella M, Rychlik M, Haller D, Schmitt-Kopplin P. Longitudinal Profiles of Dietary and Microbial Metabolites in Formula- and Breastfed Infants. Front Mol Biosci 2021; 8:660456. [PMID: 34124150 PMCID: PMC8195334 DOI: 10.3389/fmolb.2021.660456] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 01/29/2021] [Accepted: 05/13/2021] [Indexed: 01/02/2023] Open
Abstract
The early-life metabolome of the intestinal tract is dynamically influenced by colonization of gut microbiota which in turn is affected by nutrition, i.e. breast milk or formula. A detailed examination of fecal metabolites was performed to investigate the effect of probiotics in formula compared to control formula and breast milk within the first months of life in healthy neonates. A broad metabolomics approach was conceptualized to describe fecal polar and semi-polar metabolites affected by feeding type within the first year of life. Fecal metabolomes were clearly distinct between formula- and breastfed infants, mainly originating from diet and microbial metabolism. Unsaturated fatty acids and human milk oligosaccharides were increased in breastfed, whereas Maillard products were found in feces of formula-fed children. Altered microbial metabolism was represented by bile acids and aromatic amino acid metabolites. Elevated levels of sulfated bile acids were detected in stool samples of breastfed infants, whereas secondary bile acids were increased in formula-fed infants. Microbial co-metabolism was supported by significant correlation between chenodeoxycholic or lithocholic acid and members of Clostridia. Fecal metabolites showed strong inter- and intra-individual behavior with features uniquely present in certain infants and at specific time points. Nevertheless, metabolite profiles converged at the end of the first year, coinciding with solid food introduction.
Affiliation(s)
- Nina Sillner
- Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Neuherberg, Germany; ZIEL Institute for Food and Health, Technical University of Munich, Freising, Germany
| | - Alesia Walker
- Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Neuherberg, Germany
| | - Marianna Lucio
- Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Neuherberg, Germany
| | - Tanja V Maier
- Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Neuherberg, Germany
| | - Monika Bazanella
- Chair of Nutrition and Immunology, Technical University of Munich, Freising, Germany
| | - Michael Rychlik
- Chair of Analytical Food Chemistry, Technical University of Munich, Freising, Germany
| | - Dirk Haller
- ZIEL Institute for Food and Health, Technical University of Munich, Freising, Germany; Chair of Nutrition and Immunology, Technical University of Munich, Freising, Germany
| | - Philippe Schmitt-Kopplin
- Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Neuherberg, Germany; ZIEL Institute for Food and Health, Technical University of Munich, Freising, Germany; Chair of Analytical Food Chemistry, Technical University of Munich, Freising, Germany
|
80
|
Pettini F, Visibelli A, Cicaloni V, Iovinelli D, Spiga O. Multi-Omics Model Applied to Cancer Genetics. Int J Mol Sci 2021; 22:ijms22115751. [PMID: 34072237 PMCID: PMC8199287 DOI: 10.3390/ijms22115751] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/19/2021] [Revised: 05/18/2021] [Accepted: 05/26/2021] [Indexed: 12/29/2022] Open
Abstract
In this review, we focus on bioinformatic oncology as an integrative discipline that incorporates knowledge from the mathematical, physical, and computational fields to further the biomedical understanding of cancer. Before providing a deeper insight into the bioinformatics approach and utilities involved in oncology, we must understand what a systems biology framework and the genetic connection are, because of the high heterogeneity of the backgrounds of people approaching precision medicine. In fact, it is essential to provide general theoretical information on genomics, epigenomics, and transcriptomics to understand the phases of the multi-omics approach. We consider how to create a multi-omics model. In the last section, we describe the new frontiers and future perspectives of this field.
Affiliation(s)
- Francesco Pettini
- Department of Medical Biotechnology, University of Siena, Via M. Bracci 2, 53100 Siena, Italy
- Correspondence: Tel.: +39-3755461426
| | - Anna Visibelli
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Via A. Moro 2, 53100 Siena, Italy
| | - Vittoria Cicaloni
- Toscana Life Sciences Foundation, Via Fiorentina 1, 53100 Siena, Italy
| | - Daniele Iovinelli
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Via A. Moro 2, 53100 Siena, Italy
| | - Ottavia Spiga
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Via A. Moro 2, 53100 Siena, Italy
|
81
|
Computational principles and challenges in single-cell data integration. Nat Biotechnol 2021; 39:1202-1215. [PMID: 33941931 DOI: 10.1038/s41587-021-00895-7] [Citation(s) in RCA: 158] [Impact Index Per Article: 52.7] [Received: 11/06/2020] [Accepted: 03/16/2021] [Indexed: 02/07/2023]
Abstract
The development of single-cell multimodal assays provides a powerful tool for investigating multiple dimensions of cellular heterogeneity, enabling new insights into development, tissue homeostasis and disease. A key challenge in the analysis of single-cell multimodal data is to devise appropriate strategies for tying together data across different modalities. The term 'data integration' has been used to describe this task, encompassing a broad collection of approaches ranging from batch correction of individual omics datasets to association of chromatin accessibility and genetic variation with transcription. Although existing integration strategies exploit similar mathematical ideas, they typically have distinct goals and rely on different principles and assumptions. Consequently, new definitions and concepts are needed to contextualize existing methods and to enable development of new methods.
|
82
|
Integrative Multi-omics Analysis to Characterize Human Brain Ischemia. Mol Neurobiol 2021; 58:4107-4121. [PMID: 33939164 DOI: 10.1007/s12035-021-02401-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/11/2021] [Accepted: 04/21/2021] [Indexed: 01/14/2023]
Abstract
Stroke is a major cause of death and disability. A better comprehension of stroke pathophysiology is fundamental to reduce its dramatic outcome. The use of high-throughput unbiased omics approaches and the integration of these data might deepen the knowledge of stroke at the molecular level, depicting the interaction between different molecular units. We aimed to identify protein and gene expression changes in the human brain after ischemia through an integrative approach to join the information of both omics analyses. The translational potential of our results was explored in a pilot study with blood samples from ischemic stroke patients. Proteomics and transcriptomics discovery studies were performed in human brain samples from six deceased stroke patients, comparing the infarct core with the corresponding contralateral brain region, unveiling 128 proteins and 2716 genes significantly dysregulated after stroke. Integrative bioinformatics analyses joining both datasets exposed canonical pathways altered in the ischemic area, highlighting the most influential molecules. Among the molecules with the highest fold-change, 28 genes and 9 proteins were selected to be validated in five independent human brain samples using orthogonal techniques. Our results were confirmed for NCDN, RAB3C, ST4A1, DNM1L, A1AG1, A1AT, JAM3, VTDB, ANXA1, ANXA2, and IL8. Finally, circulating levels of the validated proteins were explored in ischemic stroke patients. Fluctuations of A1AG1 and A1AT, both up-regulated in the ischemic brain, were detected in blood along the first week after onset. In summary, our results expand the knowledge of ischemic stroke pathology, revealing key molecules to be further explored as biomarkers and/or therapeutic targets.
|
83
|
Coates JTT, Pirovano G, El Naqa I. Radiomic and radiogenomic modeling for radiotherapy: strategies, pitfalls, and challenges. J Med Imaging (Bellingham) 2021; 8:031902. [PMID: 33768134 PMCID: PMC7985651 DOI: 10.1117/1.jmi.8.3.031902] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/01/2020] [Accepted: 01/12/2021] [Indexed: 12/14/2022] Open
Abstract
The power of predictive modeling for radiotherapy outcomes has historically been limited by an inability to adequately capture patient-specific variabilities; however, next-generation platforms together with imaging technologies and powerful bioinformatic tools have facilitated strategies and provided optimism. Integrating clinical, biological, imaging, and treatment-specific data for more accurate prediction of tumor control probabilities or risk of radiation-induced side effects is a high-dimensional problem whose solution could have widespread benefits to a diverse patient population; we discuss technical approaches toward this objective. Increasing interest in the above is specifically reflected by the emergence of two nascent fields, which are distinct but complementary: radiogenomics, which broadly seeks to integrate biological risk factors together with treatment and diagnostic information to generate individualized patient risk profiles, and radiomics, which further leverages large-scale imaging correlates and extracted features for the same purpose. We review classical analytical and data-driven approaches for outcomes prediction that serve as antecedents to both radiomic and radiogenomic strategies. Discussion then focuses on uses of conventional and deep machine learning in radiomics. We further consider promising strategies for the harmonization of high-dimensional, heterogeneous multiomics datasets (panomics) and techniques for nonparametric validation of best-fit models. Strategies to overcome common pitfalls that are unique to data-intensive radiomics are also discussed.
Affiliation(s)
- James T. T. Coates
- Massachusetts General Hospital & Harvard Medical School, Center for Cancer Research, Boston, Massachusetts, United States
| | - Giacomo Pirovano
- Memorial Sloan Kettering Cancer Center, Department of Radiology, New York, New York, United States
| | - Issam El Naqa
- Moffitt Cancer Center and Research Institute, Department of Machine Learning, Tampa, Florida, United States
|
84
|
Kim DY, Kim JM. Multi-omics integration strategies for animal epigenetic studies - A review. Anim Biosci 2021; 34:1271-1282. [PMID: 33902167 PMCID: PMC8255897 DOI: 10.5713/ab.21.0042] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/27/2021] [Accepted: 04/21/2021] [Indexed: 12/15/2022] Open
Abstract
Genome-wide studies provide considerable insights into the genetic background of animals; however, the inheritance of several heritable factors cannot be elucidated. Epigenetics explains these heritabilities, including those of genes influenced by environmental factors. Knowledge of the mechanisms underlying epigenetics enables understanding the processes of gene regulation through interactions with the environment. Recently developed next-generation sequencing (NGS) technologies help understand the interactional changes in epigenetic mechanisms. There are large sets of NGS data available; however, the integrative data analysis approaches still have limitations with regard to reliably interpreting the epigenetic changes. This review focuses on the epigenetic mechanisms and profiling methods and multi-omics integration methods that can provide comprehensive biological insights in animal genetic studies.
Affiliation(s)
- Do-Young Kim
- Department of Animal Science and Technology, Chung-Ang University, Anseong, Gyeonggi 17546, Korea
| | - Jun-Mo Kim
- Department of Animal Science and Technology, Chung-Ang University, Anseong, Gyeonggi 17546, Korea
|
85
|
The Effect of the Effluent from a Small-Scale Conventional Wastewater Treatment Plant Treating Municipal Wastewater on the Composition and Abundance of the Microbial Community, Antibiotic Resistome, and Pathogens in the Sediment and Water of a Receiving Stream. WATER 2021. [DOI: 10.3390/w13060865] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/27/2022]
Abstract
The effluents of wastewater treatment plants (WWTPs) are major contributors of nutrients, microbes—including those carrying antibiotic resistance genes (ARGs)—and pathogens to receiving waterbodies. The effect of the effluent of a small-scale activated sludge WWTP treating municipal wastewater on the composition and abundance of the microbial community as well as the antibiotic resistome and pathogens in the sediment and water of the receiving stream and river was studied using metagenome sequencing and a quantitative approach. Elevated Bacteroidetes proportions in the prokaryotic community, heightened sulfonamide and aminoglycoside resistance determinants proportions, and an increase of up to three orders of magnitude of sul1–sul2–aadA–blaOXA2 gene cluster abundances were recorded in stream water and sediments 0.3 km downstream of a WWTP discharge point. Further downstream, a gradual recovery of affected microbial communities along a distance gradient from WWTP was recorded, culminating in the mostly comparable state of river water and sediment parameters 3.7 km downstream of WWTP and stream water and sediments upstream of the WWTP discharge point. Archaea, especially Methanosarcina, Methanothrix, and Methanoregula, formed a substantial proportion of the microbial community of WWTP effluent as well as receiving stream water and sediment, and were linked to the spread of ARGs. Opportunistic environmental-origin pathogens were predominant in WWTP effluent and receiving stream bacterial communities, with Citrobacter freundii proportion being especially elevated in the close vicinity downstream of the WWTP discharge point.
|
86
|
Abstract
The evolution of next-generation sequencing and high-throughput technologies has created new opportunities and challenges in data science. Currently, a classic proteomics analysis can be complemented by going a step beyond the individual analysis of the proteome by using integrative approaches. These integrations can be focused either on inferring relationships among proteins themselves, with other molecular levels, phenotype, or even environmental data, giving the researcher new tools to extract and determine the most relevant information in biological terms. Furthermore, it is also important the employ of visualization methods that allow a correct and deep interpretation of data. To carry out these analyses, several bioinformatics and biostatistical tools are required. In this chapter, different workflows that enable the creation of interaction networks are proposed. Resulting networks reduce the complexity of original datasets, depicting complex statistical relationships (through PLS analysis and variants), functional networks (STRING, shinyGO), and a combination of both approaches. Recently developed methods for integrating different omics levels, such as coinertial analyses or DIABLO, are also described. Finally, the use of Cytoscape or Gephi was described for the representation and mining of the different networks. This approach constitutes a new way of acquiring a deeper knowledge of the function of proteins, such as the search for specific connections of each group to identify differentially connected modules, which may reflect involved protein complexes and key pathways.
|
87
|
Vlachavas EI, Bohn J, Ückert F, Nürnberg S. A Detailed Catalogue of Multi-Omics Methodologies for Identification of Putative Biomarkers and Causal Molecular Networks in Translational Cancer Research. Int J Mol Sci 2021; 22:2822. [PMID: 33802234 PMCID: PMC8000236 DOI: 10.3390/ijms22062822] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/31/2021] [Revised: 03/05/2021] [Accepted: 03/05/2021] [Indexed: 02/06/2023] Open
Abstract
Recent advances in sequencing and biotechnological methodologies have led to the generation of large volumes of molecular data of different omics layers, such as genomics, transcriptomics, proteomics and metabolomics. Integration of these data with clinical information provides new opportunities to discover how perturbations in biological processes lead to disease. Using data-driven approaches for the integration and interpretation of multi-omics data could stably identify links between structural and functional information and propose causal molecular networks with potential impact on cancer pathophysiology. This knowledge can then be used to improve disease diagnosis, prognosis, prevention, and therapy. This review will summarize and categorize the most current computational methodologies and tools for integration of distinct molecular layers in the context of translational cancer research and personalized therapy. Additionally, the bioinformatics tools Multi-Omics Factor Analysis (MOFA) and netDX will be tested using omics data from public cancer resources, to assess their overall robustness, provide reproducible workflows for gaining biological knowledge from multi-omics data, and to comprehensively understand the significantly perturbed biological entities in distinct cancer types. We show that the performed supervised and unsupervised analyses result in meaningful and novel findings.
Affiliation(s)
- Efstathios Iason Vlachavas
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Jonas Bohn
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Frank Ückert
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
| | - Sylvia Nürnberg
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
| |
|
88
|
Grélard F, Legland D, Fanuel M, Arnaud B, Foucat L, Rogniaux H. Esmraldi: efficient methods for the fusion of mass spectrometry and magnetic resonance images. BMC Bioinformatics 2021; 22:56. [PMID: 33557761 PMCID: PMC7869484 DOI: 10.1186/s12859-020-03954-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/22/2020] [Accepted: 12/30/2020] [Indexed: 11/29/2022] Open
Abstract
Background: Mass spectrometry imaging (MSI) is a family of acquisition techniques producing images of the distribution of molecules in a sample, without any prior tagging of the molecules. This makes it a very interesting technique for exploratory research. However, the images are difficult to analyze because the enclosed data has high dimensionality, and their content does not necessarily reflect the shape of the object of interest. Conversely, magnetic resonance imaging (MRI) scans reflect the anatomy of the tissue. MRI also provides complementary information to MSI, such as the content and distribution of water. Results: We propose a new workflow to merge the information from 2D MALDI–MSI and MRI images. Our workflow can be applied to large MSI datasets in a limited amount of time. Moreover, the workflow is fully automated and based on deterministic methods, which ensures the reproducibility of the results. Our methods were evaluated and compared with state-of-the-art methods. Results show that the images are combined precisely and in a time-efficient manner. Conclusion: Our workflow reveals molecules that co-localize with water in biological images. It can be applied to any MSI and MRI datasets that satisfy a few conditions: the same regions of the shape enclosed in the images and similar intensity distributions.
Affiliation(s)
- Florent Grélard
- UR BIA, INRAE, 44316, Nantes, France; BIBS Facility, INRAE, 44316, Nantes, France
| | - David Legland
- UR BIA, INRAE, 44316, Nantes, France; BIBS Facility, INRAE, 44316, Nantes, France
| | - Mathieu Fanuel
- UR BIA, INRAE, 44316, Nantes, France; BIBS Facility, INRAE, 44316, Nantes, France
| | - Bastien Arnaud
- UR BIA, INRAE, 44316, Nantes, France; BIBS Facility, INRAE, 44316, Nantes, France
| | - Loïc Foucat
- UR BIA, INRAE, 44316, Nantes, France; BIBS Facility, INRAE, 44316, Nantes, France
| | - Hélène Rogniaux
- UR BIA, INRAE, 44316, Nantes, France; BIBS Facility, INRAE, 44316, Nantes, France
| |
|
89
|
Revilla L, Mayorgas A, Corraliza AM, Masamunt MC, Metwaly A, Haller D, Tristán E, Carrasco A, Esteve M, Panés J, Ricart E, Lozano JJ, Salas A. Multi-omic modelling of inflammatory bowel disease with regularized canonical correlation analysis. PLoS One 2021; 16:e0246367. [PMID: 33556098 PMCID: PMC7870068 DOI: 10.1371/journal.pone.0246367] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/23/2020] [Accepted: 01/18/2021] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND: Personalized medicine requires finding relationships between variables that influence a patient's phenotype and predicting an outcome. Sparse generalized canonical correlation analysis identifies relationships between different groups of variables. This method requires establishing a model of the expected interaction between those variables. Describing these interactions is challenging when the relationship is unknown or when there is no pre-established hypothesis. Thus, our aim was to develop a method to find the relationships between microbiome and host transcriptome data and the relevant clinical variables in a complex disease, such as Crohn's disease. RESULTS: We present here a method to identify interactions based on canonical correlation analysis. Using a dataset of Crohn's disease patients with longitudinal sampling, we show that the model is the most important factor in identifying relationships between blocks. The analysis was first tested in two previously published datasets, a glioma dataset and a Crohn's disease and ulcerative colitis dataset, for which we describe how to select the optimum parameters. Using such parameters, we analyzed our Crohn's disease dataset. We selected the model with the highest inner average variance explained to identify relationships between transcriptome, gut microbiome and clinically relevant variables. Adding the clinically relevant variables improved the average variance explained by the model compared to multiple co-inertia analysis. CONCLUSIONS: The methodology described herein provides a general framework for identifying interactions between sets of omic data and clinically relevant variables. Following this method, we found genes and microorganisms that were related to each other independently of the model, while others were specific to the model used. Thus, model selection proved crucial to finding the existing relationships in multi-omics datasets.
Affiliation(s)
- Lluís Revilla
- Centro de Investigación Biomédica en Red de Enfermedades Hepática y Digestivas (CIBERehd), Barcelona, Spain
- Department of Gastroenterology, IDIBAPS, Hospital Clínic, Barcelona, Spain
| | - Aida Mayorgas
- Department of Gastroenterology, IDIBAPS, Hospital Clínic, Barcelona, Spain
| | - Ana M. Corraliza
- Department of Gastroenterology, IDIBAPS, Hospital Clínic, Barcelona, Spain
| | - Maria C. Masamunt
- Department of Gastroenterology, IDIBAPS, Hospital Clínic, Barcelona, Spain
| | - Amira Metwaly
- Chair of Nutrition and Immunology, Technical University of Munich, Freising-Weihenstephan, Germany
| | - Dirk Haller
- Chair of Nutrition and Immunology, Technical University of Munich, Freising-Weihenstephan, Germany
- ZIEL Institute for Food and Health, Technical University of Munich, Freising-Weihenstephan, Germany
| | - Eva Tristán
- Centro de Investigación Biomédica en Red de Enfermedades Hepática y Digestivas (CIBERehd), Barcelona, Spain
- Department of Gastroenterology, Hospital Universitari Mútua Terrassa, Barcelona, Spain
| | - Anna Carrasco
- Centro de Investigación Biomédica en Red de Enfermedades Hepática y Digestivas (CIBERehd), Barcelona, Spain
- Department of Gastroenterology, Hospital Universitari Mútua Terrassa, Barcelona, Spain
| | - Maria Esteve
- Centro de Investigación Biomédica en Red de Enfermedades Hepática y Digestivas (CIBERehd), Barcelona, Spain
- Department of Gastroenterology, Hospital Universitari Mútua Terrassa, Barcelona, Spain
| | - Julian Panés
- Centro de Investigación Biomédica en Red de Enfermedades Hepática y Digestivas (CIBERehd), Barcelona, Spain
- Department of Gastroenterology, IDIBAPS, Hospital Clínic, Barcelona, Spain
| | - Elena Ricart
- Centro de Investigación Biomédica en Red de Enfermedades Hepática y Digestivas (CIBERehd), Barcelona, Spain
- Department of Gastroenterology, IDIBAPS, Hospital Clínic, Barcelona, Spain
| | - Juan J. Lozano
- Centro de Investigación Biomédica en Red de Enfermedades Hepática y Digestivas (CIBERehd), Barcelona, Spain
| | - Azucena Salas
- Department of Gastroenterology, IDIBAPS, Hospital Clínic, Barcelona, Spain
| |
|
90
|
Zoppi J, Guillaume JF, Neunlist M, Chaffron S. MiBiOmics: an interactive web application for multi-omics data exploration and integration. BMC Bioinformatics 2021; 22:6. [PMID: 33407076 PMCID: PMC7789220 DOI: 10.1186/s12859-020-03921-8] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 06/26/2020] [Accepted: 12/02/2020] [Indexed: 12/12/2022] Open
Abstract
Background: Multi-omics experimental approaches are becoming common practice in biological and medical sciences, underlining the need to design new integrative techniques and applications to enable the multi-scale characterization of biological systems. The integrative analysis of heterogeneous datasets generally yields additional insights and novel hypotheses about a given biological system. However, it can become challenging given the often large size of omics datasets and the diversity of existing techniques. Moreover, visualization tools for interpretation are usually inaccessible to biologists without programming skills. Results: Here, we present MiBiOmics, a web-based and standalone application that facilitates multi-omics data visualization, exploration, integration, and analysis by providing easy access to dedicated and interactive protocols. It implements classical ordination techniques and the inference of omics-based (multilayer) networks to mine complex biological systems and identify robust biomarkers linked to specific contextual parameters or biological states. Conclusions: MiBiOmics provides easy access to exploratory ordination techniques and to a network-based approach for integrative multi-omics analyses through an intuitive and interactive interface. MiBiOmics is currently available as a Shiny app at https://shiny-bird.univ-nantes.fr/app/Mibiomics and as a standalone application at https://gitlab.univ-nantes.fr/combi-ls2n/mibiomics.
Affiliation(s)
| | - Jean-François Guillaume
- CHU Nantes, Inserm, CNRS, SFR Santé, Inserm UMS016, CNRS UMS 3556, Université de Nantes, 44000, Nantes, France
| | | | - Samuel Chaffron
- CNRS UMR6004, LS2N, Université de Nantes, 44000, Nantes, France; Research Federation (FR2022) Tara Oceans GO-SEE, Paris, France
| |
|
91
|
Wörheide MA, Krumsiek J, Kastenmüller G, Arnold M. Multi-omics integration in biomedical research - A metabolomics-centric review. Anal Chim Acta 2021; 1141:144-162. [PMID: 33248648 PMCID: PMC7701361 DOI: 10.1016/j.aca.2020.10.038] [Citation(s) in RCA: 101] [Impact Index Per Article: 33.7] [Received: 07/15/2020] [Revised: 10/09/2020] [Accepted: 10/19/2020] [Indexed: 02/07/2023]
Abstract
Recent advances in high-throughput technologies have enabled the profiling of multiple layers of a biological system, including DNA sequence data (genomics), RNA expression levels (transcriptomics), and metabolite levels (metabolomics). This has led to the generation of vast amounts of biological data that can be integrated in so-called multi-omics studies to examine the complex molecular underpinnings of health and disease. Integrative analysis of such datasets is not straightforward and is particularly complicated by the high dimensionality and heterogeneity of the data and by the lack of universal analysis protocols. Previous reviews have discussed various strategies to address the challenges of data integration, elaborating on specific aspects, such as network inference or feature selection techniques. In those reviews, the main focus has been on the integration of two omics layers in relation to a phenotype of interest. In this review we provide an overview of a typical multi-omics workflow, focusing on integration methods that have the potential to combine metabolomics data with two or more other omics layers. We discuss multiple integration concepts including data-driven, knowledge-based, simultaneous and step-wise approaches. We highlight the application of these methods in recent multi-omics studies, including large-scale integration efforts aiming at a global depiction of the complex relationships within and between different biological layers without focusing on a particular phenotype.
Affiliation(s)
- Maria A Wörheide
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
| | - Jan Krumsiek
- Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
| | - Gabi Kastenmüller
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), Neuherberg, Germany
| | - Matthias Arnold
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany; Department of Psychiatry and Behavioral Sciences, Duke University, Durham, NC, USA
| |
|
92
|
Abstract
In recent biomedical studies, multidimensional profiling, which collects proteomics as well as other types of omics data on the same subjects, is becoming increasingly popular. Proteomics, transcriptomics, genomics, epigenomics, and other types of data contain overlapping as well as independent information, which suggests the possibility of integrating multiple types of data to generate more reliable findings/models with better classification/prediction performance. In this chapter, a selective review is conducted on recent data integration techniques for both unsupervised and supervised analysis. The main objective is to provide the "big picture" of data integration that involves proteomics data and to discuss the "intuition" behind the recently developed approaches without invoking too many mathematical details. Potential pitfalls and possible directions for future developments are also discussed.
Affiliation(s)
- Mengyun Wu
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
| | - Yu Jiang
- School of Public Health, University of Memphis, Memphis, TN, USA
| | - Shuangge Ma
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT, USA
| |
|
93
|
Qin G, Liu Z, Xie L. Multiple Omics Data Integration. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11508-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022] Open
|
94
|
Taguchi YH, Turki T. Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data. Genes (Basel) 2020; 11:genes11121493. [PMID: 33322492 PMCID: PMC7763286 DOI: 10.3390/genes11121493] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/01/2020] [Revised: 12/03/2020] [Accepted: 12/07/2020] [Indexed: 01/06/2023] Open
Abstract
The large p small n problem is a challenge for which no de facto standard method is available. In this study, we propose a tensor-decomposition (TD)-based unsupervised feature extraction (FE) formalism applied to multiomics datasets, in which the number of features is more than 100,000 whereas the number of samples is as small as about 100, hence constituting a typical large p small n problem. When applied to synthetic datasets, the proposed TD-based unsupervised FE outperformed conventional supervised feature selection methods (random forest, categorical regression (also known as analysis of variance, or ANOVA), and penalized linear discriminant analysis) and two unsupervised methods (multiple non-negative matrix factorization and principal component analysis (PCA)-based unsupervised FE); when applied to multiomics datasets, it outperformed the four methods other than PCA-based unsupervised FE. The genes selected by TD-based unsupervised FE were enriched in genes known to be related to the tissues and transcription factors measured. TD-based unsupervised FE was demonstrated to be not only the superior feature selection method but also a method that can select biologically reliable genes. To our knowledge, this is the first study in which TD-based unsupervised FE has been successfully applied to the integration of this variety of multiomics measurements.
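To make the large p small n setting in this abstract concrete, the following is a minimal sketch of unsupervised feature ranking when features far outnumber samples. It is a simplified PCA-style stand-in, not the authors' TD-based method: it finds the leading sample-space component by power iteration on the small n x n Gram matrix and ranks features by their squared projection onto it. The function names, the toy data generator, and all parameter values are assumptions made for illustration only.

```python
import math
import random


def center_rows(X):
    """Subtract each feature's mean across samples (rows are features)."""
    centered = []
    for row in X:
        mean = sum(row) / len(row)
        centered.append([x - mean for x in row])
    return centered


def leading_sample_vector(Xc, iters=100):
    """Leading eigenvector of the n x n Gram matrix, found by power iteration.

    Working in sample space keeps the cost low when p >> n.
    """
    n = len(Xc[0])
    gram = [[sum(row[j] * row[k] for row in Xc) for k in range(n)]
            for j in range(n)]
    # Non-constant start: a constant vector is annihilated by centered data.
    v = [float(j + 1) for j in range(n)]
    for _ in range(iters):
        w = [sum(gram[j][k] * v[k] for k in range(n)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def rank_features(X, top_k):
    """Return indices of the top_k features by squared projection score."""
    Xc = center_rows(X)
    v = leading_sample_vector(Xc)
    scores = [sum(a * b for a, b in zip(row, v)) ** 2 for row in Xc]
    order = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


def make_toy_data(p=200, n=12, n_signal=5, shift=5.0, seed=0):
    """Hypothetical toy data: the first n_signal features separate two groups."""
    rng = random.Random(seed)
    X = []
    for i in range(p):
        row = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if i < n_signal:
            row = [x + (shift if j < n // 2 else -shift)
                   for j, x in enumerate(row)]
        X.append(row)
    return X


if __name__ == "__main__":
    X = make_toy_data()
    print(sorted(rank_features(X, top_k=5)))
```

No class labels are used at any point, which is what makes the ranking unsupervised; in real multiomics data the signal is far weaker than in this toy example, so the choice of component and of a selection threshold would need much more care.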
Affiliation(s)
- Y-h. Taguchi
- Department of Physics, Chuo University, Tokyo 112-8551, Japan
| | - Turki Turki
- Department of Computer Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| |
|
95
|
Krassowski M, Das V, Sahu SK, Misra BB. State of the Field in Multi-Omics Research: From Computational Needs to Data Mining and Sharing. Front Genet 2020; 11:610798. [PMID: 33362867 PMCID: PMC7758509 DOI: 10.3389/fgene.2020.610798] [Citation(s) in RCA: 139] [Impact Index Per Article: 34.8] [Received: 09/27/2020] [Accepted: 11/20/2020] [Indexed: 12/24/2022] Open
Abstract
Multi-omics, variously called integrated omics, pan-omics, and trans-omics, aims to combine two or more omics data sets to aid in data analysis, visualization and interpretation to determine the mechanism of a biological process. Multi-omics efforts have taken center stage in biomedical research leading to the development of new insights into biological events and processes. However, the mushrooming of a myriad of tools, datasets, and approaches tends to inundate the literature and overwhelm researchers new to the field. The aims of this review are to provide an overview of the current state of the field, inform on available reliable resources, discuss the application of statistics and machine/deep learning in multi-omics analyses, discuss findable, accessible, interoperable, reusable (FAIR) research, and point to best practices in benchmarking. Thus, we provide guidance to interested users of the domain by addressing challenges of the underlying biology, giving an overview of the available toolset, addressing common pitfalls, and acknowledging current methods' limitations. We conclude with practical advice and recommendations on software engineering and reproducibility practices to share a comprehensive awareness with new researchers in multi-omics for end-to-end workflow.
Affiliation(s)
- Michal Krassowski
- Nuffield Department of Women’s & Reproductive Health, University of Oxford, Oxford, United Kingdom
| | - Vivek Das
- Novo Nordisk Research Center Seattle, Inc, Seattle, WA, United States
| | | | | |
|
96
|
Simats A, Ramiro L, García-Berrocoso T, Briansó F, Gonzalo R, Martín L, Sabé A, Gill N, Penalba A, Colomé N, Sánchez A, Canals F, Bustamante A, Rosell A, Montaner J. A Mouse Brain-based Multi-omics Integrative Approach Reveals Potential Blood Biomarkers for Ischemic Stroke. Mol Cell Proteomics 2020; 19:1921-1936. [PMID: 32868372 PMCID: PMC7710142 DOI: 10.1074/mcp.ra120.002283] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 08/10/2020] [Indexed: 12/14/2022] Open
Abstract
Stroke remains a leading cause of death and disability worldwide. Despite continuous advances, the identification of key molecular signatures in the hyper-acute phase of ischemic stroke is still a primary interest for translational research on stroke diagnosis, prognosis, and treatment. Data integration from high-throughput -omics techniques has become crucial to unraveling key interactions among different molecular elements in complex biological contexts, such as ischemic stroke. Thus, we used advanced data integration methods for a multi-level joint analysis of transcriptomics and proteomics data sets obtained from mouse brains at 2 h after cerebral ischemia. By modeling net-like correlation structures, we identified an integrated network of genes and proteins that are differentially expressed at a very early stage after stroke. We validated 10 of these deregulated elements in acute stroke, and changes in their expression pattern over time after cerebral ischemia were described. Of these, CLDN20, GADD45G, RGS2, BAG5, and CTNND2 were next evaluated as blood biomarkers of cerebral ischemia in mice and human blood samples, which were obtained from stroke patients and patients presenting stroke-mimicking conditions. Our findings indicate that CTNND2 levels in blood might potentially be useful for distinguishing ischemic strokes from stroke-mimicking conditions in the hyper-acute phase of the disease. Furthermore, circulating GADD45G content within the first 6 h after stroke could also play a key role in predicting poor outcomes in stroke patients. For the first time, we have used an integrative biostatistical approach to elucidate key molecules in the initial stages of stroke pathophysiology and highlight new notable molecules that might be further considered as blood biomarkers of ischemic stroke.
Affiliation(s)
- Alba Simats
- Neurovascular Research Laboratory, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Laura Ramiro
- Neurovascular Research Laboratory, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Teresa García-Berrocoso
- Neurovascular Research Laboratory, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Ferran Briansó
- Bioinformatics and Biostatistics Unit, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain; Genetics, Microbiology and Statistics Dept., Universitat de Barcelona, Barcelona, Spain
| | - Ricardo Gonzalo
- Bioinformatics and Biostatistics Unit, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Luna Martín
- Proteomics Laboratory, Vall d'Hebron Institute of Oncology (VHIO), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Anna Sabé
- Proteomics Laboratory, Vall d'Hebron Institute of Oncology (VHIO), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Natalia Gill
- Neurovascular Research Laboratory, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Anna Penalba
- Neurovascular Research Laboratory, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Nuria Colomé
- Proteomics Laboratory, Vall d'Hebron Institute of Oncology (VHIO), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Alex Sánchez
- Bioinformatics and Biostatistics Unit, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain; Genetics, Microbiology and Statistics Dept., Universitat de Barcelona, Barcelona, Spain
| | - Francesc Canals
- Proteomics Laboratory, Vall d'Hebron Institute of Oncology (VHIO), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Alejandro Bustamante
- Neurovascular Research Laboratory, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Anna Rosell
- Neurovascular Research Laboratory, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Joan Montaner
- Neurovascular Research Laboratory, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| |
|
97
|
Genome-Wide Characterization of OFP Family Genes in Wheat ( Triticum aestivum L.) Reveals That TaOPF29a-A Promotes Drought Tolerance. BIOMED RESEARCH INTERNATIONAL 2020; 2020:9708324. [PMID: 33224986 DOI: 10.1155/2020/9708324] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/30/2020] [Revised: 09/15/2020] [Accepted: 09/22/2020] [Indexed: 12/21/2022]
Abstract
OVATE family proteins (OFPs) are plant-specific transcription factors that play important roles in plant development. Although common wheat (Triticum aestivum L.) is a major staple food worldwide, OFPs have not been systematically analyzed in this important crop. Here, we performed a genome-wide survey of OFP genes in wheat and identified 100 genes belonging to 34 homoeologous groups. Arabidopsis thaliana, rice (Oryza sativa), and wheat OFP genes were divided into four subgroups based on their phylogenetic relationships. Structural analysis indicated that only four TaOFPs contain introns. We mapped the TaOFP genes onto the wheat chromosomes and determined that TaOFP17 was duplicated in this crop. A survey of cis-acting elements along the promoter regions of TaOFP genes suggested that subfunctionalization of homoeologous genes might have occurred during evolution. The TaOFPs were highly expressed in wheat, with tissue- or organ-specific expression patterns. In addition, these genes were induced by various hormone and stress treatments. For instance, TaOPF29a-A was highly expressed in roots in response to drought stress. Wheat plants overexpressing TaOPF29a-A had longer roots and higher dry weights than nontransgenic plants under drought conditions, suggesting that this gene improves drought tolerance. Our findings provide a starting point for further functional analysis of this important transcription factor family and highlight the potential of using TaOPF29a-A to genetically engineer drought-tolerant crops.
|
98
|
Labory J, Fierville M, Ait-El-Mkadem S, Bannwarth S, Paquis-Flucklinger V, Bottini S. Multi-Omics Approaches to Improve Mitochondrial Disease Diagnosis: Challenges, Advances, and Perspectives. Front Mol Biosci 2020; 7:590842. [PMID: 33240932 PMCID: PMC7667268 DOI: 10.3389/fmolb.2020.590842] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/03/2020] [Accepted: 10/14/2020] [Indexed: 01/06/2023] Open
Abstract
Mitochondrial diseases (MD) are rare disorders caused by deficiency of the mitochondrial respiratory chain, which provides energy in each cell. They are characterized by high clinical and genetic heterogeneity, and in most patients the responsible gene is unknown. Diagnosis is based on the identification of the causative gene, which allows genetic counseling, prenatal diagnosis, understanding of pathological mechanisms, and personalized therapeutic approaches. Despite the emergence of Next Generation Sequencing (NGS), to date, more than one out of two patients has no diagnosis because the responsible gene has not been identified. The technologies currently used for detecting causal variants (genetic alterations) are far from complete: they leave many variants of unknown significance (VUS) and are mainly based on whole exome sequencing, thus neglecting the identification of non-coding variants. The complexity of the human genome and its regulation at multiple levels has led biologists to develop several assays to interrogate the different aspects of biological processes. While a one-dimensional, single-omics investigation offers a glimpse of this complex system, the combination of different omics data allows the discovery of coherent signatures. To integrate data from different omics, the community of computational biologists and bioinformaticians has developed several approaches and tools. However, it is difficult to determine which is best suited to predicting diverse phenotypic outcomes. First attempts to use multi-omics approaches showed an improvement in diagnostic power. However, we are far from a complete understanding of MD and their diagnosis. After reviewing multi-omics algorithms developed in recent years, we propose here a novel data-driven classification and discuss how multi-omics will change and improve the diagnosis of MD.
Due to the growing use of multi-omics approaches in MD, we foresee that this work will help establish good practices for multi-omics data integration to improve the prediction of phenotypic outcomes and the diagnostic power for MD.
Affiliation(s)
- Justine Labory
- Université Côte d'Azur, Center of Modeling, Simulation and Interactions, Nice, France
| | - Morgane Fierville
- Université Côte d'Azur, Center of Modeling, Simulation and Interactions, Nice, France
| | - Samira Ait-El-Mkadem
- Université Côte d'Azur, Inserm U1081, CNRS UMR 7284, Institute for Research on Cancer and Aging, Nice (IRCAN), Centre hospitalier universitaire (CHU) de Nice, Nice, France
| | - Sylvie Bannwarth
- Université Côte d'Azur, Inserm U1081, CNRS UMR 7284, Institute for Research on Cancer and Aging, Nice (IRCAN), Centre hospitalier universitaire (CHU) de Nice, Nice, France
| | - Véronique Paquis-Flucklinger
- Université Côte d'Azur, Center of Modeling, Simulation and Interactions, Nice, France; Université Côte d'Azur, Inserm U1081, CNRS UMR 7284, Institute for Research on Cancer and Aging, Nice (IRCAN), Centre hospitalier universitaire (CHU) de Nice, Nice, France
| | - Silvia Bottini
- Université Côte d'Azur, Center of Modeling, Simulation and Interactions, Nice, France
| |
|
99
|
Mantini G, Pham TV, Piersma SR, Jimenez CR. Computational Analysis of Phosphoproteomics Data in Multi-Omics Cancer Studies. Proteomics 2020; 21:e1900312. [PMID: 32875713 DOI: 10.1002/pmic.201900312] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/24/2020] [Revised: 07/09/2020] [Indexed: 12/24/2022]
Abstract
Multiple types of molecular data for the same set of clinical samples are increasingly available and may be analyzed jointly in an integrative analysis to maximize comprehensive biological insight. This analysis is important as separate analyses of individual omics data types usually do not fully explain disease phenotypes. An increasing number of studies now focus on multi-omics data integration, yet few have included phosphoproteomics data, an important layer for understanding signaling pathways. Multi-omics integration methods applied to phosphoproteomics data are reviewed in the context of cancer research, along with multi-omics methods papers that would be promising to apply to phosphoproteomics data. Analysis of individual data types is still the major approach even in large cohort proteogenomics studies. Hence, a section is dedicated to possible integrative methods for multi-omics and phosphoproteomics data. In summary, this review provides readers with both the integrative methods previously applied to phosphoproteomics and multi-omics data integration and other multi-omics integration algorithms that are promising for future application to phosphoproteomics data.
Affiliation(s)
- Giulia Mantini
- Department of Medical Oncology, OncoProteomics Laboratory, CCA 1-60, Amsterdam UMC VUmc-location, De Boelelaan 1117, Amsterdam, 1081 HV, The Netherlands
| | - Thang V Pham
- Department of Medical Oncology, OncoProteomics Laboratory, CCA 1-60, Amsterdam UMC VUmc-location, De Boelelaan 1117, Amsterdam, 1081 HV, The Netherlands
| | - Sander R Piersma
- Department of Medical Oncology, OncoProteomics Laboratory, CCA 1-60, Amsterdam UMC VUmc-location, De Boelelaan 1117, Amsterdam, 1081 HV, The Netherlands
| | - Connie R Jimenez
- Department of Medical Oncology, OncoProteomics Laboratory, CCA 1-60, Amsterdam UMC VUmc-location, De Boelelaan 1117, Amsterdam, 1081 HV, The Netherlands
| |
|
100
|
Abstract
In this chapter we discuss the past, present and future of clinical biomarker development. We explore the advent of new technologies that are changing the way in which health, medicine and disease are understood. This review includes the identification of physicochemical assays, current regulations, the development and reproducibility of clinical trials, as well as the revolution of omics technologies and state-of-the-art integration and analysis approaches.
|