1
Rashid MM, Selvarajoo K. Advancing drug-response prediction using multi-modal and -omics machine learning integration (MOMLIN): a case study on breast cancer clinical data. Brief Bioinform 2024; 25:bbae300. [PMID: 38904542 PMCID: PMC11190965 DOI: 10.1093/bib/bbae300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 05/30/2024] [Accepted: 06/11/2024] [Indexed: 06/22/2024] Open
Abstract
The inherent heterogeneity of cancer contributes to highly variable responses to any anticancer treatments. This underscores the need to first identify precise biomarkers through complex multi-omics datasets that are now available. Although much research has focused on this aspect, identifying biomarkers associated with distinct drug responders still remains a major challenge. Here, we develop MOMLIN, a multi-modal and -omics machine learning integration framework, to enhance drug-response prediction. MOMLIN jointly utilizes sparse correlation algorithms and class-specific feature selection algorithms, which identify multi-modal and -omics-associated interpretable components. MOMLIN was applied to 147 patients' breast cancer datasets (clinical, mutation, gene expression, tumor microenvironment cells and molecular pathways) to analyze drug-response class predictions for non-responders and variable responders. Notably, MOMLIN achieves an average AUC of 0.989, which is at least 10% greater when compared with current state-of-the-art (data integration analysis for biomarker discovery using latent components, multi-omics factor analysis, sparse canonical correlation analysis). Moreover, MOMLIN not only detects known individual biomarkers such as genes at mutation/expression level but, most importantly, also correlates multi-modal and -omics network biomarkers for each response class. For example, an interaction between ER-negative-HMCN1-COL5A1 mutations-FBXO2-CSF3R expression-CD8 emerges as a multimodal biomarker for responders, potentially affecting antimicrobial peptides and FLT3 signaling pathways. In contrast, for resistance cases, a distinct combination of lymph node-TP53 mutation-PON3-ENSG00000261116 lncRNA expression-HLA-E-T-cell exclusions emerged as multimodal biomarkers, possibly impacting the neurotransmitter release cycle pathway. MOMLIN, therefore, is expected to advance precision medicine, such as to detect context-specific multi-omics network biomarkers and better predict drug-response classifications.
Affiliation(s)
- Md Mamunur Rashid
  - Biomolecular Sequence to Function Division, BII, (ASTAR), Singapore 138671, Republic of Singapore
- Kumar Selvarajoo
  - Biomolecular Sequence to Function Division, BII, (ASTAR), Singapore 138671, Republic of Singapore
  - Synthetic Biology Translational Research Program, Yong Loo Lin School of Medicine, NUS, Singapore 117456, Republic of Singapore
  - School of Biological Sciences, Nanyang Technological University (NTU), Singapore 639798, Republic of Singapore
2
Liu W, Vu T, Konigsberg I, Pratte K, Zhuang Y, Kechris K. SmCCNet 2.0: A Comprehensive Tool for Multi-omics Network Inference with Shiny Visualization. bioRxiv 2024:2023.11.20.567893. [PMID: 38045372 PMCID: PMC10690212 DOI: 10.1101/2023.11.20.567893] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/05/2023]
Abstract
Summary: Sparse multiple canonical correlation network analysis (SmCCNet) is a machine learning technique for integrating omics data along with a variable of interest (e.g., phenotype of complex disease), and reconstructing multi-omics networks that are specific to this variable. We present the second-generation SmCCNet (SmCCNet 2.0) that adeptly integrates single or multiple omics data types along with a quantitative or binary phenotype of interest. In addition, this new package offers a streamlined setup process that can be configured manually or automatically, ensuring a flexible and user-friendly experience. Availability: This package is available on both CRAN (https://cran.r-project.org/web/packages/SmCCNet/index.html) and GitHub (https://github.com/KechrisLab/SmCCNet) under the MIT license. The network visualization tool is available at https://smccnet.shinyapps.io/smccnetnetwork/.
Affiliation(s)
- Weixuan Liu
  - Department of Biostatistics and Informatics, School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, 80045, CO, USA
- Thao Vu
  - Department of Biostatistics and Informatics, School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, 80045, CO, USA
- Iain Konigsberg
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, 80045, CO, USA
- Katherine Pratte
  - Department of Biostatistics, National Jewish Health, Denver, 80206, CO, USA
- Yonghua Zhuang
  - Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, 80045, CO, USA
- Katerina Kechris
  - Department of Biostatistics and Informatics, School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, 80045, CO, USA
3
Muller E, Shiryan I, Borenstein E. Multi-omic integration of microbiome data for identifying disease-associated modules. Nat Commun 2024; 15:2621. [PMID: 38521774 PMCID: PMC10960825 DOI: 10.1038/s41467-024-46888-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 03/08/2024] [Indexed: 03/25/2024] Open
Abstract
Multi-omic studies of the human gut microbiome are crucial for understanding its role in disease across multiple functional layers. Nevertheless, integrating and analyzing such complex datasets poses significant challenges. Most notably, current analysis methods often yield extensive lists of disease-associated features (e.g., species, pathways, or metabolites), without capturing the multi-layered structure of the data. Here, we address this challenge by introducing "MintTea", an intermediate integration-based approach combining canonical correlation analysis extensions, consensus analysis, and an evaluation protocol. MintTea identifies "disease-associated multi-omic modules", comprising features from multiple omics that shift in concord and that collectively associate with the disease. Applied to diverse cohorts, MintTea captures modules with high predictive power, significant cross-omic correlations, and alignment with known microbiome-disease associations. For example, analyzing samples from a metabolic syndrome study, MintTea identifies a module with serum glutamate- and TCA cycle-related metabolites, along with bacterial species linked to insulin resistance. In another dataset, MintTea identifies a module associated with late-stage colorectal cancer, including Peptostreptococcus and Gemella species and fecal amino acids, in line with these species' metabolic activity and their coordinated gradual increase with cancer development. This work demonstrates the potential of advanced integration methods in generating systems-level, multifaceted hypotheses underlying microbiome-disease interactions.
Affiliation(s)
- Efrat Muller
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Itamar Shiryan
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Elhanan Borenstein
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
  - Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel
  - Santa Fe Institute, Santa Fe, NM, USA
4
Bernatchez L, Ferchaud AL, Berger CS, Venney CJ, Xuereb A. Genomics for monitoring and understanding species responses to global climate change. Nat Rev Genet 2024; 25:165-183. [PMID: 37863940 DOI: 10.1038/s41576-023-00657-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/29/2023] [Indexed: 10/22/2023]
Abstract
All life forms across the globe are experiencing drastic changes in environmental conditions as a result of global climate change. These environmental changes are happening rapidly, incur substantial socioeconomic costs, pose threats to biodiversity and diminish a species' potential to adapt to future environments. Understanding and monitoring how organisms respond to human-driven climate change is therefore a major priority for the conservation of biodiversity in a rapidly changing environment. Recent developments in genomic, transcriptomic and epigenomic technologies are enabling unprecedented insights into the evolutionary processes and molecular bases of adaptation. This Review summarizes methods that apply and integrate omics tools to experimentally investigate, monitor and predict how species and communities in the wild cope with global climate change, namely by genetically adapting to new environmental conditions, through range shifts or through phenotypic plasticity. We identify advantages and limitations of each method and discuss future research avenues that would improve our understanding of species' evolutionary responses to global climate change, highlighting the need for holistic, multi-omics approaches to ecosystem monitoring during global climate change.
Affiliation(s)
- Louis Bernatchez
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, Quebec, Canada
- Anne-Laure Ferchaud
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, Quebec, Canada
  - Parks Canada, Office of the Chief Ecosystem Scientist, Protected Areas Establishment, Quebec City, Quebec, Canada
- Chloé Suzanne Berger
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, Quebec, Canada
- Clare J Venney
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, Quebec, Canada
- Amanda Xuereb
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, Quebec, Canada
5
Konigsberg IR, Vu T, Liu W, Litkowski EM, Pratte KA, Vargas LB, Gilmore N, Abdel-Hafiz M, Manichaikul AW, Cho MH, Hersh CP, DeMeo DL, Banaei-Kashani F, Bowler RP, Lange LA, Kechris KJ. Proteomic Networks and Related Genetic Variants Associated with Smoking and Chronic Obstructive Pulmonary Disease. medRxiv 2024:2024.02.26.24303069. [PMID: 38464285 PMCID: PMC10925350 DOI: 10.1101/2024.02.26.24303069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Background: Studies have identified individual blood biomarkers associated with chronic obstructive pulmonary disease (COPD) and related phenotypes. However, complex diseases such as COPD typically involve changes in multiple molecules with interconnections that may not be captured when considering single molecular features. Methods: Leveraging proteomic data from 3,173 COPDGene Non-Hispanic White (NHW) and African American (AA) participants, we applied sparse multiple canonical correlation network analysis (SmCCNet) to 4,776 proteins assayed on the SomaScan v4.0 platform to derive sparse networks of proteins associated with current vs. former smoking status, airflow obstruction, and emphysema quantitated from high-resolution computed tomography scans. We then used NetSHy, a dimension reduction technique leveraging network topology, to produce summary scores of each proteomic network, referred to as NetSHy scores. We next performed a genome-wide association study (GWAS) to identify variants associated with the NetSHy scores, or network quantitative trait loci (nQTLs). Finally, we evaluated the replicability of the networks in an independent cohort, SPIROMICS. Results: We identified networks of 13 to 104 proteins for each phenotype and exposure in NHW and AA, and the derived NetSHy scores were significantly associated with the variables of interest. Networks included known (sRAGE, ALPP, MIP1) and novel molecules (CA10, CPB1, HIS3, PXDN) and interactions involved in COPD pathogenesis. We observed 7 nQTL loci associated with NetSHy scores, 4 of which remained after conditional analysis. Networks for smoking status and emphysema, but not airflow obstruction, demonstrated a high degree of replicability across race groups and cohorts. Conclusions: In this work, we apply state-of-the-art molecular network generation and summarization approaches to proteomic data from COPDGene participants to uncover protein networks associated with COPD phenotypes. We further identify genetic associations with networks. This work discovers protein networks containing known and novel proteins and protein interactions associated with clinically relevant COPD phenotypes across race groups and cohorts.
Affiliation(s)
- Iain R Konigsberg
  - Department of Biomedical Informatics, University of Colorado - Anschutz Medical Campus, Aurora, CO
- Thao Vu
  - Department of Biostatistics and Informatics, University of Colorado - Anschutz Medical Campus, Aurora, CO
- Weixuan Liu
  - Department of Biostatistics and Informatics, University of Colorado - Anschutz Medical Campus, Aurora, CO
- Elizabeth M Litkowski
  - Department of Biomedical Informatics, University of Colorado - Anschutz Medical Campus, Aurora, CO
  - Department of Medicine, University of Michigan, Ann Arbor, MI
- Luciana B Vargas
  - Department of Biomedical Informatics, University of Colorado - Anschutz Medical Campus, Aurora, CO
- Niles Gilmore
  - Department of Biomedical Informatics, University of Colorado - Anschutz Medical Campus, Aurora, CO
- Mohamed Abdel-Hafiz
  - Department of Computer Science and Engineering, University of Colorado - Denver, Denver, CO
- Ani W Manichaikul
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA
- Michael H Cho
  - Channing Division of Network Medicine and Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Craig P Hersh
  - Channing Division of Network Medicine and Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Dawn L DeMeo
  - Channing Division of Network Medicine and Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Leslie A Lange
  - Department of Biomedical Informatics, University of Colorado - Anschutz Medical Campus, Aurora, CO
- Katerina J Kechris
  - Department of Biostatistics and Informatics, University of Colorado - Anschutz Medical Campus, Aurora, CO
6
Liu W, Pratte KA, Castaldi PJ, Hersh C, Bowler RP, Banaei-Kashani F, Kechris KJ. A Generalized Higher-order Correlation Analysis Framework for Multi-Omics Network Inference. bioRxiv 2024:2024.01.22.576667. [PMID: 38328226 PMCID: PMC10849540 DOI: 10.1101/2024.01.22.576667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Multiple -omics (genomics, proteomics, etc.) profiles are commonly generated to gain insight into a disease or physiological system. Constructing multi-omics networks with respect to the trait(s) of interest provides an opportunity to understand relationships between molecular features but integration is challenging due to multiple data sets with high dimensionality. One approach is to use canonical correlation to integrate one or two omics types and a single trait of interest. However, these types of methods may be limited due to (1) not accounting for higher-order correlations existing among features, (2) computational inefficiency when extending to more than two omics data when using a penalty term-based sparsity method, and (3) lack of flexibility for focusing on specific correlations (e.g., omics-to-phenotype correlation versus omics-to-omics correlations). In this work, we have developed a novel multi-omics network analysis pipeline called Sparse Generalized Tensor Canonical Correlation Analysis Network Inference (SGTCCA-Net) that can effectively overcome these limitations. We also introduce an implementation to improve the summarization of networks for downstream analyses. Simulation and real-data experiments demonstrate the effectiveness of our novel method for inferring omics networks and features of interest.
Affiliation(s)
- Weixuan Liu
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Peter J. Castaldi
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, United States
- Craig Hersh
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, United States
- Russell P. Bowler
  - Division of Pulmonary Medicine, Department of Medicine, National Jewish Health, Denver, CO, USA
- Farnoush Banaei-Kashani
  - Department of Computer Science and Engineering, College of Engineering, Design and Computing, University of Colorado Denver, Denver, CO, USA
- Katerina J. Kechris
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
7
Li Z, Melograna F, Hoskens H, Duroux D, Marazita ML, Walsh S, Weinberg SM, Shriver MD, Müller-Myhsok B, Claes P, Van Steen K. netMUG: a novel network-guided multi-view clustering workflow for dissecting genetic and facial heterogeneity. Front Genet 2023; 14:1286800. [PMID: 38125750 PMCID: PMC10731261 DOI: 10.3389/fgene.2023.1286800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 11/14/2023] [Indexed: 12/23/2023] Open
Abstract
Introduction: Multi-view data offer advantages over single-view data for characterizing individuals, which is crucial in precision medicine toward personalized prevention, diagnosis, or treatment follow-up. Methods: Here, we develop a network-guided multi-view clustering framework named netMUG to identify actionable subgroups of individuals. This pipeline first adopts sparse multiple canonical correlation analysis to select multi-view features possibly informed by extraneous data, which are then used to construct individual-specific networks (ISNs). Finally, the individual subtypes are automatically derived by hierarchical clustering on these network representations. Results: We applied netMUG to a dataset containing genomic data and facial images to obtain BMI-informed multi-view strata and showed how it could be used for a refined obesity characterization. Benchmark analysis of netMUG on synthetic data with known strata of individuals indicated its superior performance compared with both baseline and benchmark methods for multi-view clustering. The clustering derived from netMUG achieved an adjusted Rand index of 1 with respect to the synthesized true labels. In addition, the real-data analysis revealed subgroups strongly linked to BMI and genetic and facial determinants of these subgroups. Discussion: netMUG provides a powerful strategy, exploiting individual-specific networks to identify meaningful and actionable strata. Moreover, the implementation is easy to generalize to accommodate heterogeneous data sources or highlight data structures.
Affiliation(s)
- Zuqi Li
  - BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Federico Melograna
  - BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
- Hanne Hoskens
  - BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Diane Duroux
  - BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
- Mary L. Marazita
  - Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA, United States
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, United States
- Susan Walsh
  - Department of Biology, Indiana University Indianapolis, Indianapolis, IN, United States
- Seth M. Weinberg
  - Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA, United States
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, United States
- Mark D. Shriver
  - Department of Anthropology, Pennsylvania State University, State College, PA, United States
- Peter Claes
  - BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
  - Department of Electrical Engineering, KU Leuven, Leuven, Belgium
  - Murdoch Children’s Research Institute, Melbourne, VIC, Australia
- Kristel Van Steen
  - BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
  - BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
8
Graham BI, Harris JK, Zemanick ET, Wagner BD. Integrating airway microbiome and blood proteomics data to identify multi-omic networks associated with response to pulmonary infection. The Microbe 2023; 1:100023. [PMID: 38264413 PMCID: PMC10805068 DOI: 10.1016/j.microb.2023.100023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2024]
Abstract
Host response to airway infections can vary widely. Cystic fibrosis (CF) pulmonary exacerbations provide an opportunity to better understand the interplay between respiratory microbes and the host. This study aimed to investigate the observed heterogeneity in airway infection recovery by analyzing microbiome and host response (i.e., blood proteome) data collected during the onset of 33 pulmonary infection events. We used sparse multiple canonical correlation network (SmCCNet) analysis to integrate these two types of -omics data along with a clinical measure of recovery. Four microbe-protein SmCCNet subnetworks at infection onset were identified that strongly correlate with recovery. Our findings support existing knowledge regarding CF airway infections. Additionally, we discovered novel microbe-protein subnetworks that are associated with recovery and merit further investigation.
Affiliation(s)
- Brenton I.M. Graham
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- J. Kirk Harris
  - Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Edith T. Zemanick
  - Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Brandie D. Wagner
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
  - Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
9
Downing T, Angelopoulos N. A primer on correlation-based dimension reduction methods for multi-omics analysis. J R Soc Interface 2023; 20:20230344. [PMID: 37817584 PMCID: PMC10565429 DOI: 10.1098/rsif.2023.0344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 09/19/2023] [Indexed: 10/12/2023] Open
Abstract
The continuing advances of omic technologies mean that it is now more tangible to measure the numerous features collectively reflecting the molecular properties of a sample. When multiple omic methods are used, statistical and computational approaches can exploit these large, connected profiles. Multi-omics is the integration of different omic data sources from the same biological sample. In this review, we focus on correlation-based dimension reduction approaches for single omic datasets, followed by methods for pairs of omics datasets, before detailing further techniques for three or more omic datasets. We also briefly detail network methods when three or more omic datasets are available and which complement correlation-oriented tools. To aid readers new to this area, these are all linked to relevant R packages that can implement these procedures. Finally, we discuss scenarios of experimental design and present road maps that simplify the selection of appropriate analysis methods. This review will help researchers navigate emerging methods for multi-omics and integrating diverse omic datasets appropriately. This raises the opportunity of implementing population multi-omics with large sample sizes as omics technologies and our understanding improve.
Affiliation(s)
- Tim Downing
  - Pirbright Institute, Pirbright, Surrey, UK
  - Department of Biotechnology, Dublin City University, Dublin, Ireland
10
Blutt SE, Coarfa C, Neu J, Pammi M. Multiomic Investigations into Lung Health and Disease. Microorganisms 2023; 11:2116. [PMID: 37630676 PMCID: PMC10459661 DOI: 10.3390/microorganisms11082116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 08/08/2023] [Accepted: 08/13/2023] [Indexed: 08/27/2023] Open
Abstract
Diseases of the lung account for more than 5 million deaths worldwide and are a healthcare burden. Improving clinical outcomes, including mortality and quality of life, involves a holistic understanding of the disease, which can be provided by the integration of lung multi-omics data. An enhanced understanding of comprehensive multiomic datasets provides opportunities to leverage those datasets to inform the treatment and prevention of lung diseases by classifying severity, prognostication, and discovery of biomarkers. The main objective of this review is to summarize the use of multiomics investigations in lung disease, including multiomics integration and the use of machine learning computational methods. This review also discusses lung disease models, including animal models, organoids, and single-cell lines, to study multiomics in lung health and disease. We provide examples of lung diseases where multi-omics investigations have provided deeper insight into etiopathogenesis and have resulted in improved preventative and therapeutic interventions.
Affiliation(s)
- Sarah E. Blutt
  - Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Cristian Coarfa
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
  - Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
- Josef Neu
  - Department of Pediatrics, Section of Neonatology, University of Florida, Gainesville, FL 32611, USA
- Mohan Pammi
  - Department of Pediatrics, Section of Neonatology, Baylor College of Medicine and Texas Children’s Hospital, Houston, TX 77030, USA
11
Zhang YH, Cho MH, Morrow JD, Castaldi PJ, Hersh CP, Midha MK, Hoopmann MR, Lutz SM, Moritz RL, Silverman EK. Integrating Genetics, Transcriptomics, and Proteomics in Lung Tissue to Investigate Chronic Obstructive Pulmonary Disease. Am J Respir Cell Mol Biol 2023; 68:651-663. [PMID: 36780661 PMCID: PMC10257075 DOI: 10.1165/rcmb.2022-0302oc] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/03/2022] [Accepted: 02/13/2023] [Indexed: 02/15/2023] Open
Abstract
The integration of transcriptomic and proteomic data from lung tissue with chronic obstructive pulmonary disease (COPD)-associated genetic variants could provide insight into the biological mechanisms of COPD. Here, we assessed associations between lung transcriptomics and proteomics with COPD in 98 subjects from the Lung Tissue Research Consortium. Low correlations between transcriptomics and proteomics were generally observed, but higher correlations were found for COPD-associated proteins. We integrated COPD risk SNPs or SNPs near COPD-associated proteins with lung transcripts and proteins to identify regulatory cis-quantitative trait loci (QTLs). Significant expression QTLs (eQTLs) and protein QTLs (pQTLs) were found regulating multiple COPD-associated biomarkers. We investigated mediated associations from significant pQTLs through transcripts to protein levels of COPD-associated proteins. We also attempted to identify colocalized effects between COPD genome-wide association studies and eQTL and pQTL signals. Evidence was found for colocalization between COPD genome-wide association study signals and a pQTL for RHOB and an eQTL for DSP. We applied weighted gene co-expression network analysis to find consensus COPD-associated network modules. Two network modules generated by consensus weighted gene co-expression network analysis were associated with COPD with a false discovery rate lower than 0.05. One network module is related to the catenin complex, and the other module is related to plasma membrane components. In summary, multiple cis-acting determinants of transcripts and proteins associated with COPD were identified. Colocalization analysis, mediation analysis, and correlation-based network analysis of multiple omics data may identify key genes and proteins that work together to influence COPD pathogenesis.
Affiliation(s)
- Yu-Hang Zhang
  - Channing Division of Network Medicine, Harvard Medical School
- Michael H Cho
  - Channing Division of Network Medicine, Harvard Medical School
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, Massachusetts
- Jarrett D Morrow
  - Channing Division of Network Medicine, Harvard Medical School
- Peter J Castaldi
  - Channing Division of Network Medicine, Harvard Medical School
- Craig P Hersh
  - Channing Division of Network Medicine, Harvard Medical School
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, Massachusetts
- Sharon M Lutz
  - Channing Division of Network Medicine, Harvard Medical School
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, Massachusetts
- Edwin K Silverman
  - Channing Division of Network Medicine, Harvard Medical School
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, Massachusetts
12
Li Z, Melograna F, Hoskens H, Duroux D, Marazita ML, Walsh S, Weinberg SM, Shriver MD, Müller-Myhsok B, Claes P, Van Steen K. netMUG: a novel network-guided multi-view clustering workflow for dissecting genetic and facial heterogeneity. bioRxiv 2023:2023.05.04.539350. [PMID: 37205363 PMCID: PMC10187283 DOI: 10.1101/2023.05.04.539350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Multi-view data offer advantages over single-view data for characterizing individuals, which is crucial in precision medicine toward personalized prevention, diagnosis, or treatment follow-up. Here, we develop a network-guided multi-view clustering framework named netMUG to identify actionable subgroups of individuals. This pipeline first adopts sparse multiple canonical correlation analysis to select multi-view features possibly informed by extraneous data, which are then used to construct individual-specific networks (ISNs). Finally, the individual subtypes are automatically derived by hierarchical clustering on these network representations. We applied netMUG to a dataset containing genomic data and facial images to obtain BMI-informed multi-view strata and showed how it could be used for a refined obesity characterization. Benchmark analysis of netMUG on synthetic data with known strata of individuals indicated its superior performance compared with both baseline and benchmark methods for multi-view clustering. In addition, the real-data analysis revealed subgroups strongly linked to BMI and genetic and facial determinants of these classes. NetMUG provides a powerful strategy, exploiting individual-specific networks to identify meaningful and actionable strata. Moreover, the implementation is easy to generalize to accommodate heterogeneous data sources or highlight data structures.
Affiliation(s)
- Zuqi Li
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Hanne Hoskens
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Diane Duroux
  - GIGA-R Medical Genomics, University of Liège, Liège, Belgium
- Mary L. Marazita
  - Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA 15219, USA
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Susan Walsh
  - Department of Biology, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
- Seth M. Weinberg
  - Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA 15219, USA
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Mark D. Shriver
  - Department of Anthropology, Pennsylvania State University, State College, PA 16801, USA
- Peter Claes
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
  - Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium
  - Murdoch Children’s Research Institute, Melbourne, Victoria, Australia
- Kristel Van Steen
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - GIGA-R Medical Genomics, University of Liège, Liège, Belgium
|
13
|
Poverennaya EV, Pyatnitskiy MA, Dolgalev GV, Arzumanian VA, Kiseleva OI, Kurbatov IY, Kurbatov LK, Vakhrushev IV, Romashin DD, Kim YS, Ponomarenko EA. Exploiting Multi-Omics Profiling and Systems Biology to Investigate Functions of TOMM34. Biology 2023; 12:biology12020198. [PMID: 36829477 PMCID: PMC9952762 DOI: 10.3390/biology12020198] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/06/2022] [Revised: 01/17/2023] [Accepted: 01/25/2023] [Indexed: 01/31/2023]
Abstract
Although modern biology is now in the post-genomic era with vastly increased access to high-quality data, the set of human genes with a known function remains far from complete. This is especially true for hundreds of mitochondria-associated genes, which are under-characterized and lack clear functional annotation. However, with the advent of multi-omics profiling methods coupled with systems biology algorithms, the cellular role of many such genes can be elucidated. Here, we report genes and pathways associated with TOMM34 (Translocase of Outer Mitochondrial Membrane 34), which plays a role in mitochondrial protein import as part of a cytosolic complex together with Hsp70/Hsp90 and is upregulated in various cancers. We identified genes, proteins, and metabolites altered in TOMM34-/- HepG2 cells. To our knowledge, this is the first attempt to study the functional capacity of TOMM34 using a multi-omics strategy. We demonstrate that TOMM34 affects various processes, including oxidative phosphorylation, the citric acid cycle, and the metabolism of purine and several amino acids. Besides analyzing already known pathways, we used a de novo network enrichment algorithm to extract novel perturbed subnetworks, obtaining evidence that TOMM34 potentially plays a role in several other cellular processes, including NOTCH, MAPK, and STAT3 signaling. Collectively, our findings provide new insights into TOMM34's cellular functions.
Affiliation(s)
- Mikhail A. Pyatnitskiy
  - Institute of Biomedical Chemistry, Moscow 119121, Russia
  - Faculty of Computer Science, National Research University Higher School of Economics, Moscow 101000, Russia
- Yan S. Kim
  - Institute of Biomedical Chemistry, Moscow 119121, Russia
|
14
|
Eicher T, Spencer KD, Siddiqui JK, Machiraju R, Mathé EA. IntLIM 2.0: identifying multi-omic relationships dependent on discrete or continuous phenotypic measurements. Bioinformatics Advances 2023; 3:vbad009. [PMID: 36922980 PMCID: PMC10010601 DOI: 10.1093/bioadv/vbad009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Accepted: 02/01/2023] [Indexed: 02/04/2023]
Abstract
Motivation: IntLIM uncovers phenotype-dependent linear associations between two types of analytes (e.g. genes and metabolites) in a multi-omic dataset, which may reflect chemically or biologically relevant relationships. Results: The new IntLIM R package includes newly added support for generalized data types, covariate correction, continuous phenotypic measurements, model validation and unit testing. IntLIM analysis uncovered biologically relevant gene-metabolite associations in two separate datasets, and the run time is improved over baseline R functions by multiple orders of magnitude. Availability and implementation: IntLIM is available as an R package with a detailed vignette (https://github.com/ncats/IntLIM) and as an R Shiny app (see Supplementary Figs S1-S6) (https://intlim.ncats.io/). Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Tara Eicher
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences, NIH, Rockville, MD 20892, USA
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, USA
- Kyle D Spencer
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences, NIH, Rockville, MD 20892, USA
  - Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH 43210, USA
- Jalal K Siddiqui
  - Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
- Raghu Machiraju
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, USA
  - Biomedical Informatics Department, The Ohio State University, Columbus, OH 43210, USA
  - Department of Pathology, The Ohio State University, Columbus, OH 43210, USA
- Ewy A Mathé
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences, NIH, Rockville, MD 20892, USA
|
15
|
Vu T, Litkowski EM, Liu W, Pratte KA, Lange L, Bowler RP, Banaei-Kashani F, Kechris KJ. NetSHy: network summarization via a hybrid approach leveraging topological properties. Bioinformatics 2023; 39:6957083. [PMID: 36548341 PMCID: PMC9831052 DOI: 10.1093/bioinformatics/btac818] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/26/2022] [Revised: 08/30/2022] [Accepted: 12/20/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Biological networks can provide a system-level understanding of underlying processes. In many contexts, networks have a high degree of modularity, i.e. they consist of subsets of nodes, often known as subnetworks or modules, which are highly interconnected and may perform separate functions. To investigate the association between an identified module and a variable of interest in subsequent analyses, a module summarization that best explains the module's information and reduces dimensionality is often needed. Conventional approaches for obtaining network representation typically rely only on the profiles of the nodes within the network while disregarding the inherent network topological information. RESULTS In this article, we propose NetSHy, a hybrid approach which is capable of reducing the dimension of a network while incorporating topological properties to aid the interpretation of the downstream analyses. In particular, NetSHy applies principal component analysis (PCA) on a combination of the node profiles and the well-known Laplacian matrix derived directly from the network similarity matrix to extract a summarization at a subject level. Simulation scenarios based on random and empirical networks at varying network sizes and sparsity levels show that NetSHy outperforms the conventional PCA approach applied directly on node profiles, in terms of recovering the true correlation with a phenotype of interest and maintaining a higher amount of explained variation in the data when networks are relatively sparse. The robustness of NetSHy is also demonstrated by a more consistent correlation with the observed phenotype as the sample size decreases. Lastly, a genome-wide association study is performed as an application of a downstream analysis, where NetSHy summarization scores on the biological networks identify more significant single nucleotide polymorphisms than the conventional network representation.
AVAILABILITY AND IMPLEMENTATION R code implementation of NetSHy is available at https://github.com/thaovu1/NetSHy. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
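The idea of applying PCA to node profiles combined with a graph Laplacian can be sketched in a few lines of Python. This is not the NetSHy implementation (that is the R code linked above): the abstract does not state how the profiles and the Laplacian are combined, so the matrix product used here, the unnormalized Laplacian L = D - A, the toy data, and all function names are assumptions made for illustration.

```python
from math import sqrt

def laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adjacency)
    return [[(sum(adjacency[i]) if i == j else 0.0) - adjacency[i][j]
             for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def center_columns(x):
    """Subtract each column's mean so PCA operates on centered data."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in x]

def first_pc_scores(x, iterations=500):
    """Subject scores on the first principal component, via power iteration."""
    xc = center_columns(x)
    p = len(xc[0])
    # Start from a non-constant vector: rows of X @ L sum to zero, so a
    # constant start would be annihilated in the first step.
    v = [float(j + 1) for j in range(p)]
    norm = sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    for _ in range(iterations):
        s = [sum(row[j] * v[j] for j in range(p)) for row in xc]
        w = [sum(xc[i][j] * s[i] for i in range(len(xc))) for j in range(p)]
        norm = sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return [sum(row[j] * v[j] for j in range(p)) for row in xc]

def topology_aware_summary(profiles, adjacency):
    """Assumed combination: weight profiles by the Laplacian, then take PC1."""
    return first_pc_scores(matmul(profiles, laplacian(adjacency)))

# Hypothetical toy module: 4 subjects, 3 nodes connected as a path 0-1-2.
profiles = [[1.0, 2.0, 3.0], [2.0, 1.0, 0.0], [0.0, 1.0, 4.0], [3.0, 3.0, 1.0]]
adjacency = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
scores = topology_aware_summary(profiles, adjacency)
```

The result is one summary score per subject, which could then be correlated with a phenotype. The sign of a principal component is arbitrary, so only relative differences between subjects are meaningful.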
Affiliation(s)
- Thao Vu
- Elizabeth M Litkowski
  - Department of Epidemiology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Division of Biomedical Informatics & Personalized Medicine, School of Medicine, Colorado University Anschutz Medical Campus, Aurora, CO 80045, USA
- Weixuan Liu
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Katherine A Pratte
  - Department of Biostatistics, National Jewish Health, Denver, CO 80206, USA
- Leslie Lange
  - Division of Biomedical Informatics & Personalized Medicine, School of Medicine, Colorado University Anschutz Medical Campus, Aurora, CO 80045, USA
- Russell P Bowler
  - Division of Pulmonary Medicine, Department of Medicine, National Jewish Health, Denver, CO 80206, USA
- Farnoush Banaei-Kashani
  - Department of Computer Science and Engineering, College of Engineering, Design and Computing, University of Colorado Denver, Denver, CO 80204, USA
|
16
|
Hussein S, Vu T, Lange L, Bowler RP, Kechris KJ, Banaei-Kashani F. Effective Subject Representation based on Multi-omics Disease Networks using Graph Embedding. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2022; 2022:1911-1918. [PMID: 36776768 PMCID: PMC9916186 DOI: 10.1109/bibm55620.2022.9995707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]
Abstract
The study of the complex behavior of biological systems has become increasingly dependent on evolutionary network modeling. In particular, multi-omics networks capture interactions between biomolecules such as proteins and metabolites, providing a basis for predicting relationships between such biomolecules and various phenotypic traits of complex diseases. In this paper, we introduce an integrative framework that, given a multi-omics network representing a cohort of subjects, learns expressive representations for network nodes and combines the learned node representations with the biological profiles of individual subjects to produce enriched subject representations. With extensive empirical evaluation using real-world multi-omics networks, we show that our proposed framework significantly outperforms existing and baseline methods in terms of subject representation accuracy, particularly when the multi-omics network representing the cohort is sparse and structured and, therefore, more informative.
Affiliation(s)
- Sundous Hussein
  - Department of Computer Science and Engineering, University of Colorado Denver, Denver, CO, USA
- Thao Vu
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Leslie Lange
  - Division of Biomedical Informatics and Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Russell P Bowler
  - Division of Pulmonary, Critical Care and Sleep Medicine, National Jewish Health, Denver, CO, USA
- Katerina J Kechris
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Farnoush Banaei-Kashani
  - Department of Computer Science and Engineering, University of Colorado Denver, Denver, CO, USA
|
17
|
Loers JU, Vermeirssen V. SUBATOMIC: a SUbgraph BAsed mulTi-OMIcs clustering framework to analyze integrated multi-edge networks. BMC Bioinformatics 2022; 23:363. [PMID: 36064320 PMCID: PMC9442970 DOI: 10.1186/s12859-022-04908-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/17/2022] [Accepted: 08/24/2022] [Indexed: 11/02/2022] Open
Abstract
BACKGROUND Representing the complex interplay between different types of biomolecules across different omics layers in multi-omics networks bears great potential to gain a deep mechanistic understanding of gene regulation and disease. However, multi-omics networks easily grow into giant hairball structures that hamper biological interpretation. Module detection methods can decompose these networks into smaller interpretable modules. However, these methods are not adapted to deal with multi-omics data nor consider topological features. When deriving very large modules or ignoring the broader network context, interpretability remains limited. To address these issues, we developed a SUbgraph BAsed mulTi-OMIcs Clustering framework (SUBATOMIC), which infers small and interpretable modules with a specific topology while keeping track of connections to other modules and regulators. RESULTS SUBATOMIC groups specific molecular interactions in composite network subgraphs of two and three nodes and clusters them into topological modules. These are functionally annotated, visualized and overlaid with expression profiles to go from static to dynamic modules. To preserve the larger network context, SUBATOMIC investigates statistically the connections in between modules as well as between modules and regulators such as miRNAs and transcription factors. We applied SUBATOMIC to analyze a composite Homo sapiens network containing transcription factor-target gene, miRNA-target gene, protein-protein, homologous and co-functional interactions from different databases. We derived and annotated 5586 modules with diverse topological, functional and regulatory properties. We created novel functional hypotheses for unannotated genes. Furthermore, we integrated modules with condition specific expression data to study the influence of hypoxia in three cancer cell lines. 
We developed two prioritization strategies to identify the most relevant modules in specific biological contexts: one considering GO term enrichments and one calculating an activity score reflecting the degree of differential expression. Both strategies yielded modules specifically reacting to low oxygen levels. CONCLUSIONS We developed the SUBATOMIC framework that generates interpretable modules from integrated multi-omics networks and applied it to hypoxia in cancer. SUBATOMIC can infer and contextualize modules, explore condition or disease specific modules, identify regulators and functionally related modules, and derive novel gene functions for uncharacterized genes. The software is available at https://github.com/CBIGR/SUBATOMIC .
Affiliation(s)
- Jens Uwe Loers
  - Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Ghent, Belgium
  - Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Vanessa Vermeirssen
  - Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Ghent, Belgium
  - Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
|
18
|
Xu X, Qi J, Yang J, Pan T, Han H, Yang M, Han Y. Up-Regulation of TRIM32 Associated With the Poor Prognosis of Acute Myeloid Leukemia by Integrated Bioinformatics Analysis With External Validation. Front Oncol 2022; 12:848395. [PMID: 35756612 PMCID: PMC9213666 DOI: 10.3389/fonc.2022.848395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Accepted: 05/23/2022] [Indexed: 11/13/2022] Open
Abstract
Background: Acute myeloid leukemia (AML) is a malignant and molecularly heterogeneous disease. It is essential to clarify the molecular mechanisms of AML and develop targeted treatment strategies to improve patient prognosis. Methods: AML mRNA expression data and survival status were extracted from TCGA and GEO databases (GSE37642, GSE76009, GSE16432, GSE12417, GSE71014). Weighted gene co-expression network analysis (WGCNA) and differential gene expression analysis were performed. Functional enrichment analysis and a protein-protein interaction (PPI) network were used to screen out hub genes. In addition, we validated the expression levels and prognostic value of the hub genes and externally validated TRIM32 with clinical data from our center. AML cell lines transfected with TRIM32 shRNA were also established to assess proliferation in vitro. Results: A total of 2192 AML patients from TCGA and GEO datasets were included in this study, and 20 differentially co-expressed genes were screened by WGCNA and differential gene expression analysis. These genes were mainly enriched in phospholipid metabolic processes (biological processes, BP), secretory granule membranes (cellular components, CC), and protein serine/threonine kinase activity (molecular functions, MF). In addition, the PPI network contained 15 nodes and 15 edges, and 10 hub genes (TLE1, GLI2, HDAC9, MICALL2, DOCK1, PDPN, RAB27B, SIX3, TRIM32 and TBX1) were identified. The expression of all 10 hub genes except TLE1 was associated with survival status in AML patients (p<0.05). High expression of TRIM32 was tightly associated with poor relapse-free survival (RFS) and overall survival (OS) in AML patients, which was verified in the bone marrow samples from our center. In vitro, knockdown of TRIM32 inhibited the proliferation of AML cell lines.
Conclusion: TRIM32 was associated with the progression and prognosis of AML patients and could be a potential therapeutic target and biomarker for AML in the future.
Affiliation(s)
- Xiaoyan Xu
  - National clinical research center for hematologic diseases, Jiangsu Institute of Hematology, The First Affiliated Hospital of Soochow University, Suzhou, China
  - Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
  - Department of Hematology, Key Laboratory of Thrombosis and Hemostasis of Ministry of Health, Suzhou, China
  - State Key Laboratory of Radiation Medicine and Protection, Soochow University, Suzhou, China
- Jiaqian Qi
  - National clinical research center for hematologic diseases, Jiangsu Institute of Hematology, The First Affiliated Hospital of Soochow University, Suzhou, China
  - Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
  - Department of Hematology, Key Laboratory of Thrombosis and Hemostasis of Ministry of Health, Suzhou, China
  - State Key Laboratory of Radiation Medicine and Protection, Soochow University, Suzhou, China
- Jingyi Yang
  - National clinical research center for hematologic diseases, Jiangsu Institute of Hematology, The First Affiliated Hospital of Soochow University, Suzhou, China
  - Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
  - Department of Hematology, Key Laboratory of Thrombosis and Hemostasis of Ministry of Health, Suzhou, China
  - State Key Laboratory of Radiation Medicine and Protection, Soochow University, Suzhou, China
- Tingting Pan
  - National clinical research center for hematologic diseases, Jiangsu Institute of Hematology, The First Affiliated Hospital of Soochow University, Suzhou, China
  - Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
  - Department of Hematology, Key Laboratory of Thrombosis and Hemostasis of Ministry of Health, Suzhou, China
  - State Key Laboratory of Radiation Medicine and Protection, Soochow University, Suzhou, China
- Haohao Han
  - National clinical research center for hematologic diseases, Jiangsu Institute of Hematology, The First Affiliated Hospital of Soochow University, Suzhou, China
  - Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
  - Department of Hematology, Key Laboratory of Thrombosis and Hemostasis of Ministry of Health, Suzhou, China
  - State Key Laboratory of Radiation Medicine and Protection, Soochow University, Suzhou, China
- Meng Yang
  - National clinical research center for hematologic diseases, Jiangsu Institute of Hematology, The First Affiliated Hospital of Soochow University, Suzhou, China
  - Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
  - Department of Hematology, Key Laboratory of Thrombosis and Hemostasis of Ministry of Health, Suzhou, China
  - State Key Laboratory of Radiation Medicine and Protection, Soochow University, Suzhou, China
- Yue Han
  - National clinical research center for hematologic diseases, Jiangsu Institute of Hematology, The First Affiliated Hospital of Soochow University, Suzhou, China
  - Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
  - Department of Hematology, Key Laboratory of Thrombosis and Hemostasis of Ministry of Health, Suzhou, China
  - State Key Laboratory of Radiation Medicine and Protection, Soochow University, Suzhou, China
|
19
|
Abdel-Hafiz M, Najafi M, Helmi S, Pratte KA, Zhuang Y, Liu W, Kechris KJ, Bowler RP, Lange L, Banaei-Kashani F. Significant Subgraph Detection in Multi-omics Networks for Disease Pathway Identification. Front Big Data 2022; 5:894632. [PMID: 35811829 PMCID: PMC9256965 DOI: 10.3389/fdata.2022.894632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2022] [Accepted: 05/27/2022] [Indexed: 01/21/2023] Open
Abstract
Chronic obstructive pulmonary disease (COPD) is one of the leading causes of death in the United States. COPD represents one of many areas of research where identifying complex pathways and networks of interacting biomarkers is an important avenue toward studying disease progression and potentially discovering cures. Recently, sparse multiple canonical correlation network analysis (SmCCNet) was developed to identify complex relationships between omics associated with a disease phenotype, such as lung function. SmCCNet uses two omics datasets and an associated output phenotype to generate a multi-omics graph, which can then be used to explore relationships between omics in the context of a disease. Detecting significant subgraphs within this multi-omics network, i.e., subgraphs which exhibit high correlation to a disease phenotype and high inter-connectivity, can help clinicians identify complex biological relationships involved in disease progression. The current approach to identifying significant subgraphs relies on hierarchical clustering, which can be used to inform clinicians about important pathways involved in the disease or phenotype of interest. The reliance on a hierarchical clustering approach can hinder subgraph quality by biasing toward finding more compact subgraphs and removing larger significant subgraphs. This study aims to introduce new significant subgraph detection techniques. In particular, we introduce two subgraph detection methods, dubbed Correlated PageRank and Correlated Louvain, by extending the Personalized PageRank Clustering and Louvain algorithms, as well as a hybrid approach combining the two proposed methods, and compare them to the hierarchical method currently in use. The proposed methods show significant improvement in the quality of the subgraphs produced when compared to the current state of the art.
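To make the PageRank-based idea concrete, the following Python sketch runs a generic personalized PageRank in which the teleport vector is weighted by each node's absolute correlation with the phenotype. The abstract does not describe how Correlated PageRank actually incorporates correlation, so this weighting, the damping factor, the toy graph, and the function name are all assumptions; it should not be read as the authors' algorithm.

```python
def personalized_pagerank(adjacency, teleport, damping=0.85, iterations=200):
    """Power-iteration personalized PageRank on a weighted adjacency matrix.

    `teleport` holds non-negative node weights (here: assumed |correlation|
    with the phenotype); it is normalized to sum to 1 before use.
    """
    n = len(adjacency)
    total = sum(teleport)
    t = [w / total for w in teleport]
    rank = t[:]
    out_weight = [sum(row) for row in adjacency]
    for _ in range(iterations):
        new = [(1.0 - damping) * t[j] for j in range(n)]
        for i in range(n):
            if out_weight[i] == 0.0:
                # Dangling node: hand its mass back via the teleport vector.
                for j in range(n):
                    new[j] += damping * rank[i] * t[j]
            else:
                for j in range(n):
                    new[j] += damping * rank[i] * adjacency[i][j] / out_weight[i]
        rank = new
    return rank

# Hypothetical toy network: four nodes on a cycle 0-1-2-3-0.
adjacency = [[0.0, 1.0, 0.0, 1.0],
             [1.0, 0.0, 1.0, 0.0],
             [0.0, 1.0, 0.0, 1.0],
             [1.0, 0.0, 1.0, 0.0]]
# Hypothetical absolute correlations of each node with the phenotype.
phenotype_correlation = [0.9, 0.1, 0.1, 0.1]
scores = personalized_pagerank(adjacency, phenotype_correlation)
top_node = max(range(len(scores)), key=lambda i: scores[i])
```

In this toy cycle every node has the same degree, so any difference in score comes from the correlation-weighted teleport vector; a candidate subgraph could then be grown from the highest-scoring nodes.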
Affiliation(s)
- Mohamed Abdel-Hafiz
  - Big Data Management and Mining Laboratory, Department of Computer Science and Engineering, College of Engineering, Design and Computing, University of Colorado Denver, Denver, CO, United States
- Mesbah Najafi
  - Department of Mathematics, College of Liberal Arts and Sciences, University of Colorado Denver, Denver, CO, United States
- Shahab Helmi
  - Big Data Management and Mining Laboratory, Department of Computer Science and Engineering, College of Engineering, Design and Computing, University of Colorado Denver, Denver, CO, United States
- Yonghua Zhuang
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Weixuan Liu
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Katerina J. Kechris
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Russell P. Bowler
  - National Jewish Health, Denver, CO, United States
  - School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Leslie Lange
  - Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Farnoush Banaei-Kashani
  - Big Data Management and Mining Laboratory, Department of Computer Science and Engineering, College of Engineering, Design and Computing, University of Colorado Denver, Denver, CO, United States
|
20
|
O’Connor JB, Mottlowitz M, Kruk ME, Mickelson A, Wagner BD, Harris JK, Wendt CH, Laguna TA. Network Analysis to Identify Multi-Omic Correlations in the Lower Airways of Children With Cystic Fibrosis. Front Cell Infect Microbiol 2022; 12:805170. [PMID: 35360097 PMCID: PMC8960254 DOI: 10.3389/fcimb.2022.805170] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/29/2021] [Accepted: 02/16/2022] [Indexed: 11/13/2022] Open
Abstract
The leading cause of morbidity and mortality in cystic fibrosis (CF) is progressive lung disease secondary to chronic airway infection and inflammation; however, what drives CF airway infection and inflammation is not well understood. By providing a physiological snapshot of the airway, metabolomics can provide insight into these processes. Linking metabolomic data with microbiome data and phenotypic measures can reveal complex relationships between metabolites, lower airway bacterial communities, and disease outcomes. In this study, we characterize the airway metabolome in bronchoalveolar lavage fluid (BALF) samples from persons with CF (PWCF) and disease control (DC) subjects and use multi-omic network analysis to identify correlations with the airway microbiome. The Biocrates targeted liquid chromatography mass spectrometry (LC-MS) platform was used to measure 409 metabolomic features in BALF obtained during clinically indicated bronchoscopy. Total bacterial load (TBL) was measured using quantitative polymerase chain reaction (qPCR). The Qiagen EZ1 Advanced automated extraction platform was used to extract DNA, and bacterial profiling was performed using 16S sequencing. Differences in metabolomic features across disease groups were assessed univariately using Wilcoxon rank sum tests, and Random forest (RF) was used to identify features that discriminated across the groups. Features were compared to TBL and markers of inflammation, including white blood cell count (WBC) and percent neutrophils. Sparse supervised canonical correlation network analysis (SsCCNet) was used to assess multi-omic correlations. The CF metabolome was characterized by increased amino acids and decreased acylcarnitines. Amino acids and acylcarnitines were also among the features most strongly correlated with inflammation and bacterial burden. RF identified strong metabolomic predictors of CF status, including L-methionine-S-oxide. 
SsCCNet identified correlations between the metabolome and the microbiome, including correlations between a traditional CF pathogen, Staphylococcus, a group of nontraditional taxa, including Prevotella, and a subnetwork of specific metabolomic markers. In conclusion, our work identified metabolomic characteristics unique to the CF airway and uncovered multi-omic correlations that merit additional study.
Collapse
Affiliation(s)
- John B. O’Connor
- Department of Pediatrics, Division of Pulmonary and Sleep Medicine, Ann & Robert H. Lurie Children’s Hospital of Chicago, Chicago, IL, United States
- *Correspondence: John B. O’Connor,
- Madison Mottlowitz
- Department of Pediatrics, Division of Pulmonary and Sleep Medicine, Ann & Robert H. Lurie Children’s Hospital of Chicago, Chicago, IL, United States
- Monica E. Kruk
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, United States
- Alan Mickelson
- Department of Medicine, University of Minnesota, Minneapolis VA Medical Center, Minneapolis, MN, United States
- Brandie D. Wagner
- School of Medicine, University of Colorado, Aurora, CO, United States
- Colorado School of Public Health, University of Colorado Denver, Aurora, CO, United States
- Christine H. Wendt
- Department of Medicine, University of Minnesota, Minneapolis VA Medical Center, Minneapolis, MN, United States
- Theresa A. Laguna
- Department of Pediatrics, Division of Pulmonary and Sleep Medicine, Ann & Robert H. Lurie Children’s Hospital of Chicago, Chicago, IL, United States
- Northwestern University Feinberg School of Medicine, Chicago, IL, United States
|
21
|
Zhuang Y, Hobbs BD, Hersh CP, Kechris K. Identifying miRNA-mRNA Networks Associated With COPD Phenotypes. Front Genet 2021; 12:748356. [PMID: 34777474 PMCID: PMC8581181 DOI: 10.3389/fgene.2021.748356] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/27/2021] [Accepted: 09/27/2021] [Indexed: 11/21/2022] Open
Abstract
Chronic obstructive pulmonary disease (COPD) is characterized by expiratory airflow limitation and symptoms such as shortness of breath. Although many studies have demonstrated dysregulated microRNA (miRNA) and gene (mRNA) expression in the pathogenesis of COPD, how miRNAs and mRNAs systematically interact and contribute to COPD development is still not clear. To gain a deeper understanding of the gene regulatory network underlying COPD pathogenesis, we used Sparse Multiple Canonical Correlation Network (SmCCNet) to integrate whole blood miRNA and RNA-sequencing data from 404 participants in the COPDGene study to identify novel miRNA-mRNA networks associated with COPD-related phenotypes including lung function and emphysema. We hypothesized that phenotype-directed interpretable miRNA-mRNA networks from SmCCNet would assist in the discovery of novel biomarkers that traditional single biomarker discovery methods (such as differential expression) might fail to discover. Additionally, we investigated whether adjusting -omics and clinical phenotypes data for covariates prior to integration would increase the statistical power for network identification. Our study demonstrated that partial covariate adjustment for age, sex, race, and CT scanner model (in the quantitative emphysema networks) improved network identification when compared with no covariate adjustment. However, further adjustment for current smoking status and relative white blood cell (WBC) proportions sometimes weakened the power for identifying lung function and emphysema networks, a phenomenon which may be due to the correlation of smoking status and WBC counts with the COPD-related phenotypes. With partial covariate adjustment, we found six miRNA-mRNA networks associated with COPD-related phenotypes. One network consists of 2 miRNAs and 28 mRNAs which had a 0.33 correlation (p = 5.40E-12) to forced expiratory volume in 1 s (FEV1) percent predicted. 
We also found a network of 5 miRNAs and 81 mRNAs that had a 0.45 correlation (p = 8.80E-22) to percent emphysema. The miRNA-mRNA networks associated with COPD traits provide a systems view of COPD pathogenesis and complement biomarker identification with individual miRNA or mRNA expression data.
Affiliation(s)
- Yonghua Zhuang
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Brian D Hobbs
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Craig P Hersh
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Katerina Kechris
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
|
22
|
Carpenter CM, Zhang W, Gillenwater L, Severn C, Ghosh T, Bowler R, Kechris K, Ghosh D. PaIRKAT: A pathway integrated regression-based kernel association test with applications to metabolomics and COPD phenotypes. PLoS Comput Biol 2021; 17:e1008986. [PMID: 34679079 PMCID: PMC8565741 DOI: 10.1371/journal.pcbi.1008986] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/23/2021] [Revised: 11/03/2021] [Accepted: 10/13/2021] [Indexed: 02/02/2023] Open
Abstract
High-throughput data such as metabolomics, genomics, transcriptomics, and proteomics have become familiar data types within the "-omics" family. For this work, we focus on subsets that interact with one another and represent these "pathways" as graphs. Observed pathways often have disjoint components, i.e., nodes or sets of nodes (metabolites, etc.) not connected to any other within the pathway, which notably lessens testing power. In this paper we propose the Pathway Integrated Regression-based Kernel Association Test (PaIRKAT), a new kernel machine regression method for incorporating known pathway information into the semi-parametric kernel regression framework. This work extends previous kernel machine approaches. This paper also contributes an application of a graph kernel regularization method for overcoming disconnected pathways. By incorporating a regularized or "smoothed" graph into a score test, PaIRKAT can provide more powerful tests for associations between biological pathways and phenotypes of interest and will be helpful in identifying novel pathways for targeted clinical research. We evaluate this method through several simulation studies and an application to real metabolomics data from the COPDGene study. Our simulation studies illustrate the robustness of this method to incorrect and incomplete pathway knowledge, and the real data analysis shows meaningful improvements of testing power in pathways. PaIRKAT was developed for application to metabolomic pathway data, but the techniques are easily generalizable to other data sources with a graph-like structure.
Affiliation(s)
- Charlie M. Carpenter
- Department of Biostatistics and Informatics, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Weiming Zhang
- Syneos Health, Morrisville, North Carolina, United States of America
- Lucas Gillenwater
- Computational Bioscience Program, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Cameron Severn
- Department of Biostatistics and Informatics, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Tusharkanti Ghosh
- Department of Biostatistics and Informatics, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Russell Bowler
- Department of Medicine, National Jewish Health, Denver; University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Katerina Kechris
- Department of Biostatistics and Informatics, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Debashis Ghosh
- Department of Biostatistics and Informatics, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
|
23
|
Kosvyra A, Ntzioni E, Chouvarda I. Network analysis with biological data of cancer patients: A scoping review. J Biomed Inform 2021; 120:103873. [PMID: 34298154 DOI: 10.1016/j.jbi.2021.103873] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/08/2020] [Revised: 06/30/2021] [Accepted: 07/18/2021] [Indexed: 12/25/2022]
Abstract
BACKGROUND & OBJECTIVE Network Analysis (NA) is a mathematical method for exploring relations between units and representing them as a graph. Although NA was initially associated with the social sciences, it was introduced into bioinformatics over the past two decades. The recent growth in the use of networks for biological data analysis reveals the need to investigate this area further. In this work, we attempt to characterize the use of NA with biological data, specifically in cancer: (a) what types of data are used and whether they are integrated, (b) whether the purpose of the analysis is predictive or descriptive, and (c) the outcome of such analyses.

METHODS & MATERIALS The literature review was conducted on two databases, PubMed and IEEE, and was restricted to journal articles of the last decade (January 2010 - December 2019). All articles were first screened by title and abstract and then by reading the full text, following the predefined inclusion and exclusion criteria, leading to 131 articles of interest. A table was created with the information of interest and was used for the classification of the articles. The articles were initially classified into analysis studies and studies that propose a new algorithm or methodology. Each of these categories was further classified by the following criteria: (a) data used, (b) study purpose, (c) study outcome. For the studies proposing a new algorithm, the novelty presented in each was identified.

RESULTS & CONCLUSIONS In the past five years, researchers have focused on creating new algorithms and methodologies to enhance this field. The classification revealed that only 25% of the analyses integrate multi-omics data, although 50% of the newly developed algorithms follow this integrative direction. Moreover, only 20% of the analyses and 10% of the newly developed methodologies have a predictive purpose. Regarding the results of the works reviewed, 75% of the studies focus on identifying gene signatures, prognostic or not. In conclusion, this review revealed the need for predictive and multi-omics integrative algorithms and methodologies that can be used to enhance cancer diagnosis, prognosis and treatment.
Affiliation(s)
- A Kosvyra
- Laboratory of Computing, Medical Informatics and Biomedical Imaging Technologies, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- E Ntzioni
- Laboratory of Computing, Medical Informatics and Biomedical Imaging Technologies, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- I Chouvarda
- Laboratory of Computing, Medical Informatics and Biomedical Imaging Technologies, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
|
24
|
TSCCA: A tensor sparse CCA method for detecting microRNA-gene patterns from multiple cancers. PLoS Comput Biol 2021; 17:e1009044. [PMID: 34061840 PMCID: PMC8195367 DOI: 10.1371/journal.pcbi.1009044] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/05/2020] [Revised: 06/11/2021] [Accepted: 05/05/2021] [Indexed: 12/22/2022] Open
Abstract
Existing studies have demonstrated that dysregulation of microRNAs (miRNAs or miRs) is involved in the initiation and progression of cancer. Many efforts have been devoted to identifying microRNAs as potential biomarkers for cancer diagnosis, prognosis and therapeutic targets. With the rapid development of miRNA sequencing technology, a vast amount of miRNA expression data for multiple cancers has been collected. These invaluable data repositories provide new paradigms to explore the relationship between miRNAs and cancer. Thus, there is an urgent need to explore the complex cancer-related miRNA-gene patterns by integrating multi-omics data in a pan-cancer paradigm. In this study, we present a tensor sparse canonical correlation analysis (TSCCA) method for identifying cancer-related miRNA-gene modules across multiple cancers. TSCCA is able to overcome the drawbacks of existing solutions and capture both the cancer-shared and specific miRNA-gene co-expressed modules with better biological interpretations. We comprehensively evaluate the performance of TSCCA using a set of simulated data and matched miRNA/gene expression data across 33 cancer types from the TCGA database. We uncover several dysfunctional miRNA-gene modules with important biological functions and statistical significance. These modules can advance our understanding of miRNA regulatory mechanisms of cancer and provide insights into miRNA-based treatments for cancer.

MicroRNAs (miRNAs) are a class of small non-coding RNAs. Previous studies have revealed that miRNA-gene regulatory modules play key roles in the occurrence and development of cancer. However, little has been done to discover miRNA-gene regulatory modules from a pan-cancer view. Thus, new methods are urgently needed to explore the complex cancer-related miRNA-gene patterns by integrating multi-omics data across multiple cancers. To build the connections between miRNA-gene regulatory modules across different cancer types, we propose a tensor sparse canonical correlation analysis (TSCCA) method. Our specific contributions are two-fold: (1) We propose a sparse statistical learning model TSCCA and an efficient block-coordinate descent algorithm to solve it. (2) We apply TSCCA to a multi-omics data set of 33 cancer types from TCGA and identify some cancer-related miRNA-gene modules with important biological functions and statistical significance.
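The idea of sparse canonical weights fitted by block-coordinate updates can be sketched in a simpler two-way (matrix) setting. The code below is not TSCCA and does not handle tensors: it is an assumed, simplified analogue that alternates soft-thresholded updates of two weight vectors on a small cross-correlation matrix. The function names, penalty values, and the example matrix are all hypothetical.

```python
import math

def soft_threshold(vec, lam):
    """Shrink each entry toward zero by lam, setting small entries to zero."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in vec]

def normalize(vec):
    """Scale to unit Euclidean norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

def sparse_rank1(C, lam_u, lam_v, n_iter=50):
    """Alternate sparse updates of u (rows) and v (columns) for the matrix C."""
    n_rows, n_cols = len(C), len(C[0])
    v = normalize([1.0] * n_cols)
    u = [0.0] * n_rows
    for _ in range(n_iter):
        # Update u with v held fixed, then v with u held fixed.
        Cv = [sum(C[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        u = normalize(soft_threshold(Cv, lam_u))
        Ctu = [sum(C[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        v = normalize(soft_threshold(Ctu, lam_v))
    return u, v

# Hypothetical miRNA-by-gene cross-correlation matrix with one strong 2x2 block.
C = [
    [0.90, 0.80, 0.00],
    [0.85, 0.90, 0.05],
    [0.00, 0.05, 0.10],
    [0.02, 0.00, 0.05],
]
u, v = sparse_rank1(C, lam_u=0.2, lam_v=0.2)
```

The nonzero entries of `u` and `v` pick out the rows and columns of the strong block, which is the sense in which such weights define a "module"; larger penalties give smaller modules.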
|
25
|
Arbet J, Zhuang Y, Litkowski E, Saba L, Kechris K. Comparing Statistical Tests for Differential Network Analysis of Gene Modules. Front Genet 2021; 12:630215. [PMID: 34093641 PMCID: PMC8170128 DOI: 10.3389/fgene.2021.630215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2020] [Accepted: 04/19/2021] [Indexed: 11/13/2022] Open
Abstract
Genes often work together to perform complex biological processes, and "networks" provide a versatile framework for representing the interactions between multiple genes. Differential network analysis (DiNA) quantifies how this network structure differs between two or more groups/phenotypes (e.g., disease subjects and healthy controls), with the goal of determining whether differences in network structure can help explain differences between phenotypes. In this paper, we focus on gene co-expression networks, although in principle, the methods studied can be used for DiNA for other types of features (e.g., metabolome, epigenome, microbiome, proteome, etc.). Three common applications of DiNA involve (1) testing whether the connections to a single gene differ between groups, (2) testing whether the connection between a pair of genes differs between groups, or (3) testing whether the connections within a "module" (a subset of 3 or more genes) differ between groups. This article focuses on the last of these, as there is a lack of studies comparing statistical methods for identifying differentially co-expressed modules (DCMs). Through extensive simulations, we compare several previously proposed test statistics and a new p-norm difference test (PND). We demonstrate that the true positive rate of the proposed PND test is competitive with and often higher than the other methods, while controlling the false positive rate. The R package discoMod (differentially co-expressed modules) implements the proposed method and provides a full pipeline for identifying DCMs: clustering tools to derive gene modules, tests to identify DCMs, and methods for visualizing the results.
Affiliation(s)
- Jaron Arbet
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Yaxu Zhuang
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Elizabeth Litkowski
- Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Laura Saba
- Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Katerina Kechris
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
|
26
|
Eicher T, Kinnebrew G, Patt A, Spencer K, Ying K, Ma Q, Machiraju R, Mathé EA. Metabolomics and Multi-Omics Integration: A Survey of Computational Methods and Resources. Metabolites 2020; 10:E202. [PMID: 32429287 PMCID: PMC7281435 DOI: 10.3390/metabo10050202] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 04/15/2020] [Revised: 05/07/2020] [Accepted: 05/13/2020] [Indexed: 02/06/2023] Open
Abstract
As researchers are increasingly able to collect data on a large scale from multiple clinical and omics modalities, multi-omics integration is becoming a critical component of metabolomics research. This introduces a need for increased understanding by the metabolomics researcher of computational and statistical analysis methods relevant to multi-omics studies. In this review, we discuss common types of analyses performed in multi-omics studies and the computational and statistical methods that can be used for each type of analysis. We pinpoint the caveats and considerations for analysis methods, including required parameters, sample size and data distribution requirements, sources of a priori knowledge, and techniques for the evaluation of model accuracy. Finally, for the types of analyses discussed, we provide examples of the applications of corresponding methods to clinical and basic research. We intend that our review may be used as a guide for metabolomics researchers to choose effective techniques for multi-omics analyses relevant to their field of study.
Affiliation(s)
- Tara Eicher
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Computer Science and Engineering Department, The Ohio State University College of Engineering, Columbus, OH 43210, USA
- Garrett Kinnebrew
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Comprehensive Cancer Center, The Ohio State University and James Cancer Hospital, Columbus, OH 43210, USA
- Bioinformatics Shared Resource Group, The Ohio State University, Columbus, OH 43210, USA
- Andrew Patt
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences, NIH, 9800 Medical Center Dr., Rockville, MD, 20892, USA
- Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH 43210, USA
- Kyle Spencer
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH 43210, USA
- Nationwide Children’s Research Hospital, Columbus, OH 43210, USA
- Kevin Ying
- Comprehensive Cancer Center, The Ohio State University and James Cancer Hospital, Columbus, OH 43210, USA
- Molecular, Cellular and Developmental Biology Program, The Ohio State University, Columbus, OH 43210, USA
- Qin Ma
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Raghu Machiraju
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Computer Science and Engineering Department, The Ohio State University College of Engineering, Columbus, OH 43210, USA
- Department of Pathology, Wexner Medical Center, The Ohio State University, Columbus, OH 43210, USA
- Translational Data Analytics Institute, The Ohio State University, Columbus, OH 43210, USA
- Ewy A. Mathé
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences, NIH, 9800 Medical Center Dr., Rockville, MD, 20892, USA
|
27
|
Mastej E, Gillenwater L, Zhuang Y, Pratte KA, Bowler RP, Kechris K. Identifying Protein-metabolite Networks Associated with COPD Phenotypes. Metabolites 2020; 10:metabo10040124. [PMID: 32218378 PMCID: PMC7241079 DOI: 10.3390/metabo10040124] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 01/31/2020] [Revised: 03/06/2020] [Accepted: 03/23/2020] [Indexed: 02/02/2023] Open
Abstract
Chronic obstructive pulmonary disease (COPD) is a disease in which airflow obstruction in the lung makes it difficult for patients to breathe. Although COPD occurs predominantly in smokers, there are still deficits in our understanding of the additional risk factors in smokers. To gain a deeper understanding of the COPD molecular signatures, we used Sparse Multiple Canonical Correlation Network (SmCCNet), a recently developed tool that uses sparse multiple canonical correlation analysis, to integrate proteomic and metabolomic data from the blood of 1008 participants of the COPDGene study to identify novel protein-metabolite networks associated with lung function and emphysema. Our aim was to integrate -omic data through SmCCNet to build interpretable networks that could assist in the discovery of novel biomarkers that may have been overlooked in alternative biomarker discovery methods. We found a protein-metabolite network consisting of 13 proteins and 7 metabolites which had a -0.34 correlation (p-value = 2.5 × 10^-28) to lung function. We also found a network of 13 proteins and 10 metabolites that had a -0.27 correlation (p-value = 2.6 × 10^-17) to percent emphysema. Protein-metabolite networks can provide additional information on the progression of COPD that complements single biomarker or single -omic analyses.
Affiliation(s)
- Emily Mastej
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Correspondence:
- Yonghua Zhuang
- Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Russell P. Bowler
- National Jewish Health, Denver, CO 80206, USA
- School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Katerina Kechris
- Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
|