1
Clark T, Mohan J, Schaffer L, Obernier K, Al Manir S, Churas CP, Dailamy A, Doctor Y, Forget A, Hansen JN, Hu M, Lenkiewicz J, Levinson MA, Marquez C, Nourreddine S, Niestroy J, Pratt D, Qian G, Thaker S, Bélisle-Pipon JC, Brandt C, Chen J, Ding Y, Fodeh S, Krogan N, Lundberg E, Mali P, Payne-Foster P, Ratcliffe S, Ravitsky V, Sali A, Schulz W, Ideker T. Cell Maps for Artificial Intelligence: AI-Ready Maps of Human Cell Architecture from Disease-Relevant Cell Lines. bioRxiv 2024:2024.05.21.589311. [PMID: 38826258 PMCID: PMC11142054 DOI: 10.1101/2024.05.21.589311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
This article describes the Cell Maps for Artificial Intelligence (CM4AI) project and its goals, methods, standards, current datasets, software tools, status, and future directions. CM4AI is the Functional Genomics Data Generation Project in the U.S. National Institutes of Health (NIH) Bridge2AI program. Its overarching mission is to produce ethical, AI-ready datasets of cell architecture, inferred from multimodal data collected for human cell lines, to enable transformative biomedical AI research.
2
Doria-Belenguer S, Xenos A, Ceddia G, Malod-Dognin N, Pržulj N. The axes of biology: a novel axes-based network embedding paradigm to decipher the functional mechanisms of the cell. Bioinformatics Advances 2024; 4:vbae075. [PMID: 38827411 PMCID: PMC11142626 DOI: 10.1093/bioadv/vbae075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 04/15/2024] [Accepted: 05/22/2024] [Indexed: 06/04/2024]
Abstract
Summary: Common approaches for deciphering biological networks involve network embedding algorithms. These approaches focus strictly on clustering the genes' embedding vectors and interpreting the resulting clusters to reveal the information hidden in the networks. However, the difficulty of interpreting gene clusters and the limitations of functional annotation resources hinder the identification of the cell's currently unknown functional mechanisms. We propose a new approach that shifts this functional exploration from the embedding vectors of genes in space to the axes of the space itself. Our methodology better disentangles biological information from the embedding space than the classic gene-centric approach. Moreover, it uncovers new data-driven functional interactions that are unregistered in the functional ontologies but are biologically coherent. Furthermore, we exploit these interactions to define new higher-level annotations that we term Axes-Specific Functional Annotations and validate them through literature curation. Finally, we leverage our methodology to discover evolutionary connections between cellular functions and the evolution of species.
Availability and implementation: Data and source code can be accessed at https://gitlab.bsc.es/sdoria/axes-of-biology.git.
Affiliation(s)
- Gaia Ceddia
  - Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Nataša Pržulj
  - Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
  - Department of Computer Science, University College London, London, WC1E 6BT, United Kingdom
  - ICREA, Barcelona 08010, Spain
3
Konno N, Kijima Y, Watano K, Ishiguro S, Ono K, Tanaka M, Mori H, Masuyama N, Pratt D, Ideker T, Iwasaki W, Yachie N. Deep distributed computing to reconstruct extremely large lineage trees. Nat Biotechnol 2022; 40:566-575. [PMID: 34992246 PMCID: PMC9934975 DOI: 10.1038/s41587-021-01111-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/04/2020] [Accepted: 10/01/2021] [Indexed: 02/07/2023]
Abstract
Phylogeny estimation (the reconstruction of evolutionary trees) has recently been applied to CRISPR-based cell lineage tracing, allowing the developmental history of an individual tissue or organism to be inferred from a large number of mutated sequences in somatic cells. However, current computational methods are not able to construct phylogenetic trees from extremely large numbers of input sequences. Here, we present a deep distributed computing framework to comprehensively trace accurate large lineages (FRACTAL) that substantially enhances the scalability of current lineage estimation software tools. FRACTAL first reconstructs only an upstream lineage of the input sequences and recursively iterates the same procedure for its downstream lineages using independent computing nodes. We demonstrate the utility of FRACTAL by reconstructing lineages from >235 million simulated sequences and from >16 million cells from a simulated experiment with a CRISPR system that accumulates mutations during cell proliferation. We also successfully applied FRACTAL to evolutionary tree reconstructions and to an experiment using error-prone PCR (EP-PCR) for large-scale sequence diversification.
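The divide-and-recurse idea in this abstract can be illustrated with a toy sketch. This is not the FRACTAL implementation: the function names (`reconstruct`, `build_small_tree`, `hamming`, `leaves`), the Hamming-distance grouping, and the parameters `threshold` and `n_anchors` are all assumptions made for illustration, and the recursive calls run sequentially here rather than on independent computing nodes.

```python
import random

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def build_small_tree(seqs):
    """Hypothetical stand-in for a conventional tree builder: one flat clade."""
    return tuple(sorted(seqs))

def reconstruct(seqs, threshold=4, n_anchors=2, rng=None):
    """Toy recursion: resolve only the top-level split, then recurse per group."""
    rng = rng or random.Random(0)
    seqs = sorted(set(seqs))
    if len(seqs) <= threshold:
        # Small enough to hand to an ordinary (non-distributed) method.
        return build_small_tree(seqs)
    # "Upstream" step: pick a few anchor sequences and assign every
    # sequence to its nearest anchor (each anchor lands in its own group).
    anchors = rng.sample(seqs, n_anchors)
    groups = {a: [] for a in anchors}
    for s in seqs:
        nearest = min(anchors, key=lambda a: hamming(a, s))
        groups[nearest].append(s)
    # "Downstream" step: each group is an independent subproblem, so in a
    # distributed setting each call could be dispatched to a separate node.
    return tuple(reconstruct(g, threshold, n_anchors, rng) for g in groups.values())

def leaves(tree):
    """Flatten a nested-tuple tree back into its list of sequences."""
    if all(isinstance(x, str) for x in tree):
        return list(tree)
    out = []
    for sub in tree:
        out.extend(leaves(sub))
    return out

seqs = ["AAAA", "AAAT", "AATT", "TTTT", "TTTA", "TTAA", "CCCC", "CCCG"]
tree = reconstruct(seqs)
```

Because each subproblem only needs its own subset of sequences, no single call ever has to hold the full tree in memory, which is the property the abstract attributes to the distributed design.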
Affiliation(s)
- Naoki Konno
  - Synthetic Biology Division, Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
  - Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan
- Yusuke Kijima
  - Synthetic Biology Division, Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
  - Department of Aquatic Bioscience, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
  - School of Biomedical Engineering, Faculty of Applied Science and Faculty of Medicine, The University of British Columbia, Vancouver, British Columbia, Canada
  - These authors contributed equally: Yusuke Kijima, Keito Watano, Soh Ishiguro
- Keito Watano
  - Synthetic Biology Division, Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
  - Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan
  - These authors contributed equally: Yusuke Kijima, Keito Watano, Soh Ishiguro
- Soh Ishiguro
  - Synthetic Biology Division, Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
  - School of Biomedical Engineering, Faculty of Applied Science and Faculty of Medicine, The University of British Columbia, Vancouver, British Columbia, Canada
  - These authors contributed equally: Yusuke Kijima, Keito Watano, Soh Ishiguro
- Keiichiro Ono
  - Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Mamoru Tanaka
  - Synthetic Biology Division, Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
- Hideto Mori
  - Synthetic Biology Division, Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
  - Graduate School of Media and Governance, Keio University, Fujisawa, Japan
- Nanami Masuyama
  - Synthetic Biology Division, Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
  - School of Biomedical Engineering, Faculty of Applied Science and Faculty of Medicine, The University of British Columbia, Vancouver, British Columbia, Canada
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
  - Graduate School of Media and Governance, Keio University, Fujisawa, Japan
- Dexter Pratt
  - Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Trey Ideker
  - Department of Medicine, University of California, San Diego, La Jolla, CA, USA
  - Departments of Bioengineering and Computer Science, University of California San Diego, La Jolla, CA, USA
- Wataru Iwasaki
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
  - Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan
- Nozomu Yachie
  - Synthetic Biology Division, Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
  - School of Biomedical Engineering, Faculty of Applied Science and Faculty of Medicine, The University of British Columbia, Vancouver, British Columbia, Canada
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
  - Graduate School of Media and Governance, Keio University, Fujisawa, Japan
4
Rosenthal SB, Willsey HR, Xu Y, Mei Y, Dea J, Wang S, Curtis C, Sempou E, Khokha MK, Chi NC, Willsey AJ, Fisch KM, Ideker T. A convergent molecular network underlying autism and congenital heart disease. Cell Syst 2021; 12:1094-1107.e6. [PMID: 34411509 PMCID: PMC8602730 DOI: 10.1016/j.cels.2021.07.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/23/2020] [Revised: 05/10/2021] [Accepted: 07/28/2021] [Indexed: 12/29/2022]
Abstract
Patients with neurodevelopmental disorders, including autism, have an elevated incidence of congenital heart disease, but the extent to which these conditions share molecular mechanisms remains unknown. Here, we use network genetics to identify a convergent molecular network underlying autism and congenital heart disease. This network is impacted by damaging genetic variants from both disorders in multiple independent cohorts of patients, pinpointing 101 genes with shared genetic risk. Network analysis also implicates risk genes for each disorder separately, including 27 previously unidentified genes for autism and 46 for congenital heart disease. For 7 genes with shared risk, we create engineered disruptions in Xenopus tropicalis, confirming both heart and brain developmental abnormalities. The network includes a family of ion channels, such as the sodium transporter SCN2A, linking these functions to early heart and brain development. This study provides a road map for identifying risk genes and pathways involved in co-morbid conditions.
Affiliation(s)
- Sara Brin Rosenthal
  - Center for Computational Biology & Bioinformatics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Helen Rankin Willsey
  - Department of Psychiatry and Behavioral Sciences, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Yuxiao Xu
  - Department of Psychiatry and Behavioral Sciences, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Yuan Mei
  - Division of Genetics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Jeanselle Dea
  - Department of Psychiatry and Behavioral Sciences, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Sheng Wang
  - Quantitative Biosciences Institute (QBI), University of California, San Francisco, San Francisco, CA 94158, USA
- Charlotte Curtis
  - Division of Genetics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Emily Sempou
  - Pediatric Genomics Discovery Program, Department of Pediatrics and Genetics, Yale University School of Medicine, New Haven, CT 06510, USA
- Mustafa K Khokha
  - Pediatric Genomics Discovery Program, Department of Pediatrics and Genetics, Yale University School of Medicine, New Haven, CT 06510, USA
- Neil C Chi
  - Division of Cardiology, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Arthur Jeremy Willsey
  - Department of Psychiatry and Behavioral Sciences, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
  - Quantitative Biosciences Institute (QBI), University of California, San Francisco, San Francisco, CA 94158, USA
- Kathleen M Fisch
  - Center for Computational Biology & Bioinformatics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Trey Ideker
  - Division of Genetics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
5
Her HL, Lin PT, Wu YW. PangenomeNet: a pan-genome-based network reveals functional modules on antimicrobial resistome for Escherichia coli strains. BMC Bioinformatics 2021; 22:548. [PMID: 34758735 PMCID: PMC8579557 DOI: 10.1186/s12859-021-04459-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/31/2021] [Accepted: 10/19/2021] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND: Discerning genes crucial to antimicrobial resistance (AMR) mechanisms is becoming increasingly important for accurately and swiftly identifying AMR pathogenic strains. Pangenome-wide association studies (e.g. Scoary) have identified numerous putative AMR genes. However, only a tiny proportion of the putative resistance genes are annotated by AMR databases or Gene Ontology. In addition, many putative resistance genes are of unknown function (termed hypothetical proteins). An annotation tool is crucially needed to reveal the functional organization of the resistome and expand our knowledge of the AMR gene repertoire.
RESULTS: We developed an approach (PangenomeNet) for building co-functional networks from pan-genomes to infer functions for hypothetical genes. Using Escherichia coli as an example, we demonstrated that it is possible to build a co-functional network from its pan-genome using co-inheritance, domain-sharing, and protein-protein-interaction information. Investigation of the network revealed that it fits the characteristics of biological networks and can be used for functional inference. The subgraph of putative meropenem resistance genes consists of clusters of stress response genes and resistance gene acquisition pathways. Resistome subgraphs also reveal drug-specific AMR genes such as beta-lactamase, as well as functional roles shared among multiple classes of drugs, mostly in the stress-related pathways.
CONCLUSIONS: By demonstrating a pan-genome-based co-functional network on the E. coli species, we showed that the network can infer functional roles of genes, including those without functional annotations, and provides a holistic view of the putative antimicrobial resistomes. We hope that the pan-genome network idea can help formulate hypotheses for targeted experimental work.
Affiliation(s)
- Hsuan-Lin Her
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, 92093, USA
- Po-Ting Lin
  - Department of Mechanical Engineering, National Taiwan University of Science and Technology, No.43, Keelung Rd., Sec.4, Da'an Dist., Taipei City, 10609, Taiwan
  - Center for Cyber-Physical System Innovation, National Taiwan University of Science and Technology, Taipei, 10609, Taiwan
- Yu-Wei Wu
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, 250, Wuxing St., Sinyi District, Taipei, 11031, Taiwan
  - Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei, 11031, Taiwan
6
Zheng F, Kelly MR, Ramms DJ, Heintschel ML, Tao K, Tutuncuoglu B, Lee JJ, Ono K, Foussard H, Chen M, Herrington KA, Silva E, Liu S, Chen J, Churas C, Wilson N, Kratz A, Pillich RT, Patel DN, Park J, Kuenzi B, Yu MK, Licon K, Pratt D, Kreisberg JF, Kim M, Swaney DL, Nan X, Fraley SI, Gutkind JS, Krogan NJ, Ideker T. Interpretation of cancer mutations using a multiscale map of protein systems. Science 2021; 374:eabf3067. [PMID: 34591613 PMCID: PMC9126298 DOI: 10.1126/science.abf3067] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/14/2022]
Abstract
A major goal of cancer research is to understand how mutations distributed across diverse genes affect common cellular systems, including multiprotein complexes and assemblies. Two challenges—how to comprehensively map such systems and how to identify which are under mutational selection—have hindered this understanding. Accordingly, we created a comprehensive map of cancer protein systems integrating both new and published multi-omic interaction data at multiple scales of analysis. We then developed a unified statistical model that pinpoints 395 specific systems under mutational selection across 13 cancer types. This map, called NeST (Nested Systems in Tumors), incorporates canonical processes and notable discoveries, including a PIK3CA-actomyosin complex that inhibits phosphatidylinositol 3-kinase signaling and recurrent mutations in collagen complexes that promote tumor proliferation. These systems can be used as clinical biomarkers and implicate a total of 548 genes in cancer evolution and progression. This work shows how disparate tumor mutations converge on protein assemblies at different scales.
Affiliation(s)
- Fan Zheng
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
  - Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Marcus R. Kelly
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
  - Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Dana J. Ramms
  - Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
  - Moores Cancer Center, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Pharmacology, University of California San Diego, La Jolla, CA 92093, USA
- Marissa L. Heintschel
  - Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
- Kai Tao
  - Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR, 97239, USA
  - Center for Spatial Systems Biomedicine, Oregon Health and Science University, Portland, OR, 97201, USA
- Beril Tutuncuoglu
  - Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
  - Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
  - The J. David Gladstone Institutes, San Francisco, CA 94158, USA
  - Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
- John J. Lee
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Keiichiro Ono
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Helene Foussard
  - Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
  - The J. David Gladstone Institutes, San Francisco, CA 94158, USA
  - Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
- Michael Chen
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Kari A. Herrington
  - Department of Biochemistry and Biophysics Center for Advanced Light Microscopy at UCSF, University of California San Francisco, San Francisco, CA, 94158, USA
- Erica Silva
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Sophie Liu
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Jing Chen
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Christopher Churas
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Nicholas Wilson
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Anton Kratz
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
  - Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Rudolf T. Pillich
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
  - Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Devin N. Patel
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
  - Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Jisoo Park
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
  - Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Brent Kuenzi
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
  - Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Michael K. Yu
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Katherine Licon
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
  - Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Dexter Pratt
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Jason F. Kreisberg
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
  - Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Minkyu Kim
  - Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
  - Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
  - The J. David Gladstone Institutes, San Francisco, CA 94158, USA
  - Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
- Danielle L. Swaney
  - Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
  - Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
  - The J. David Gladstone Institutes, San Francisco, CA 94158, USA
  - Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
- Xiaolin Nan
  - Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR, 97239, USA
  - Center for Spatial Systems Biomedicine, Oregon Health and Science University, Portland, OR, 97201, USA
  - Knight Cancer Early Detection Advanced Research Center, Oregon Health and Science University, Portland, OR, 97201, USA
- Stephanie I. Fraley
  - Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
- J. Silvio Gutkind
  - Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
  - Moores Cancer Center, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Pharmacology, University of California San Diego, La Jolla, CA 92093, USA
- Nevan J. Krogan
  - Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
  - Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
  - The J. David Gladstone Institutes, San Francisco, CA 94158, USA
  - Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
- Trey Ideker
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
  - Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
7
Tanaka H, Kreisberg JF, Ideker T. Genetic dissection of complex traits using hierarchical biological knowledge. PLoS Comput Biol 2021; 17:e1009373. [PMID: 34534210 PMCID: PMC8480841 DOI: 10.1371/journal.pcbi.1009373] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/29/2020] [Revised: 09/29/2021] [Accepted: 08/23/2021] [Indexed: 11/18/2022] Open
Abstract
Despite the growing constellation of genetic loci linked to common traits, these loci have yet to account for most heritable variation, and most act through poorly understood mechanisms. Recent machine learning (ML) systems have used hierarchical biological knowledge to associate genetic mutations with phenotypic outcomes, yielding substantial predictive power and mechanistic insight. Here, we use an ontology-guided ML system to map single nucleotide variants (SNVs) focusing on 6 classic phenotypic traits in natural yeast populations. The 29 identified loci are largely novel and account for ~17% of the phenotypic variance, versus <3% for standard genetic analysis. Representative results show that sensitivity to hydroxyurea is linked to SNVs in two alternative purine biosynthesis pathways, and that sensitivity to copper arises through failure to detoxify reactive oxygen species in fatty acid metabolism. This work demonstrates a knowledge-based approach to amplifying and interpreting signals in population genetic studies.
Author summary
Genome-wide association studies (GWAS) have identified many important loci for common diseases and other traits. However, the loci identified by these studies are almost always many steps away from an understanding of underlying biological mechanisms. Here we develop an approach using hierarchical biological knowledge to identify genes and pathways responsible for phenotypic traits. Variants identified by the new method could explain a substantially greater fraction of heritability than previously reported. Moreover, we identified mechanistic pathways by which each causal variant affects cellular function. For example, we find that sensitivity to hydroxyurea is tied to genetic variants in two alternative purine biosynthesis pathways, and that sensitivity to copper arises through failure to detoxify reactive oxygen species in fatty acid metabolism. The new approach is a potentially transformative concept for understanding the genetic drivers of phenotypic variance, with potential applications in understanding traits in biomedicine and agriculture.
Affiliation(s)
- Hidenori Tanaka
  - Department of Medicine, University of California San Diego, La Jolla, California, United States of America
- Jason F. Kreisberg
  - Department of Medicine, University of California San Diego, La Jolla, California, United States of America
  - Corresponding author
- Trey Ideker
  - Department of Medicine, University of California San Diego, La Jolla, California, United States of America
  - Corresponding author
8
Schaffer LV, Ideker T. Mapping the multiscale structure of biological systems. Cell Syst 2021; 12:622-635. [PMID: 34139169 PMCID: PMC8245186 DOI: 10.1016/j.cels.2021.05.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/05/2021] [Revised: 05/04/2021] [Accepted: 05/14/2021] [Indexed: 01/14/2023]
Abstract
Biological systems are by nature multiscale, consisting of subsystems that factor into progressively smaller units in a deeply hierarchical structure. At any level of the hierarchy, an ever-increasing diversity of technologies can be applied to characterize the corresponding biological units and their relations, resulting in large networks of physical or functional proximities (e.g., proximities of amino acids within a protein, of proteins within a complex, or of cell types within a tissue). Here, we review general concepts and progress in using network proximity measures as a basis for creation of multiscale hierarchical maps of biological systems. We discuss the functionalization of these maps to create predictive models, including those useful in translation of genotype to phenotype, along with strategies for model visualization and challenges faced by multiscale modeling in the near future. Collectively, these approaches enable a unified hierarchical approach to biological data, with application from the molecular to the macroscopic.
Affiliation(s)
- Leah V Schaffer
  - Division of Genetics, Department of Medicine, University of California San Diego, San Diego, La Jolla, CA 92093, USA
- Trey Ideker
  - Division of Genetics, Department of Medicine, University of California San Diego, San Diego, La Jolla, CA 92093, USA
9
Zheng F, Zhang S, Churas C, Pratt D, Bahar I, Ideker T. HiDeF: identifying persistent structures in multiscale 'omics data. Genome Biol 2021; 22:21. [PMID: 33413539 PMCID: PMC7789082 DOI: 10.1186/s13059-020-02228-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 06/25/2020] [Accepted: 12/08/2020] [Indexed: 01/14/2023] Open
Abstract
In any 'omics study, the scale of analysis can dramatically affect the outcome. For instance, when clustering single-cell transcriptomes, is the analysis tuned to discover broad or specific cell types? Likewise, protein communities revealed from protein networks can vary widely in sizes depending on the method. Here, we use the concept of persistent homology, drawn from mathematical topology, to identify robust structures in data at all scales simultaneously. Application to mouse single-cell transcriptomes significantly expands the catalog of identified cell types, while analysis of SARS-CoV-2 protein interactions suggests hijacking of WNT. The method, HiDeF, is available via Python and Cytoscape.
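The notion of a structure that persists across scales can be illustrated with a small sketch. This is a simplified persistence count, not the HiDeF algorithm and not a persistent homology computation: it assumes the data have already been clustered at several successive resolutions, and the function name `persistent_communities`, the Jaccard threshold, and the minimum run length are illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)

def persistent_communities(partitions, min_jaccard=0.75, min_persistence=2):
    """Return (community, run_length) pairs for communities that recur,
    up to the Jaccard threshold, over consecutive resolutions.

    `partitions` is a list of clusterings ordered from coarse to fine;
    each clustering is a list of sets of item identifiers.
    """
    active = []   # (community, run length) pairs alive at the previous scale
    results = []
    for communities in partitions:
        next_active = []
        for c in communities:
            c = frozenset(c)
            run = 1
            # Extend the longest matching chain from the previous scale.
            for prev, prev_run in active:
                if jaccard(c, prev) >= min_jaccard:
                    run = max(run, prev_run + 1)
            next_active.append((c, run))
        # A chain ends when nothing at the current scale matches it.
        for prev, prev_run in active:
            continued = any(jaccard(prev, c) >= min_jaccard for c, _ in next_active)
            if not continued and prev_run >= min_persistence:
                results.append((prev, prev_run))
        active = next_active
    results.extend((c, r) for c, r in active if r >= min_persistence)
    return results

# Four clusterings of items 1..6 at increasing resolution.
scales = [
    [{1, 2, 3, 4, 5, 6}],
    [{1, 2, 3}, {4, 5, 6}],
    [{1, 2, 3}, {4, 5}, {6}],
    [{1, 2, 3}, {4}, {5}, {6}],
]
result = persistent_communities(scales)
```

In this toy input the group {1, 2, 3} survives three consecutive resolutions, whereas {4, 5, 6} is split as soon as the resolution increases, so only the former would be reported as a robust structure under a stricter `min_persistence`.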
Affiliation(s)
- Fan Zheng
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
- She Zhang
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Christopher Churas
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
- Dexter Pratt
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
- Ivet Bahar
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Trey Ideker
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
10
Singhal A, Cao S, Churas C, Pratt D, Fortunato S, Zheng F, Ideker T. Multiscale community detection in Cytoscape. PLoS Comput Biol 2020; 16:e1008239. [PMID: 33095781 PMCID: PMC7584444 DOI: 10.1371/journal.pcbi.1008239] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 04/14/2020] [Accepted: 08/12/2020] [Indexed: 02/08/2023] Open
Abstract
Detection of community structure has become a fundamental step in the analysis of biological networks with application to protein function annotation, disease gene prediction, and drug discovery. This recent impact creates a need to make these techniques and their accompanying visualization schemes available to a broad range of biologists. Here we present a service-oriented, end-to-end software framework, CDAPS (Community Detection APplication and Service), that integrates the identification, annotation, visualization, and interrogation of multiscale network communities, accessible within the popular Cytoscape network analysis platform. With novel design principles, CDAPS addresses unmet new challenges, such as identifying hierarchical community structures, comparison of outputs generated from diverse network resources, and easy deployment of new algorithms, to facilitate community-sourced science. We demonstrate that the CDAPS framework can be applied to high-throughput protein-protein interaction networks to gain novel insights, such as the identification of putative new members of known protein complexes.
Affiliation(s)
- Akshat Singhal
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California, United States of America
| | - Song Cao
- Department of Medicine, University of California, San Diego, La Jolla, California, United States of America
| | - Christopher Churas
- Department of Medicine, University of California, San Diego, La Jolla, California, United States of America
| | - Dexter Pratt
- Department of Medicine, University of California, San Diego, La Jolla, California, United States of America
| | - Santo Fortunato
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, United States of America
| | - Fan Zheng
- Department of Medicine, University of California, San Diego, La Jolla, California, United States of America
- * E-mail: (FZ); (TI)
| | - Trey Ideker
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California, United States of America
- Department of Medicine, University of California, San Diego, La Jolla, California, United States of America
- * E-mail: (FZ); (TI)
| |
|
11
|
Zheng F, Zhang S, Churas C, Pratt D, Bahar I, Ideker T. Identifying persistent structures in multiscale 'omics data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2020:2020.06.16.151555. [PMID: 32587977 PMCID: PMC7310637 DOI: 10.1101/2020.06.16.151555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
In any 'omics study, the scale of analysis can dramatically affect the outcome. For instance, when clustering single-cell transcriptomes, is the analysis tuned to discover broad or specific cell types? Likewise, protein communities revealed from protein networks can vary widely in size depending on the method. Here we use the concept of "persistent homology", drawn from mathematical topology, to identify robust structures in data at all scales simultaneously. Application to mouse single-cell transcriptomes significantly expands the catalog of identified cell types, while analysis of SARS-CoV-2 protein interactions suggests hijacking of WNT. The method, HiDeF, is available via Python and Cytoscape.
Affiliation(s)
- Fan Zheng
- Division of Genetics, Department of Medicine, University of California, San Diego, CA 92093, USA
- These authors contributed equally to this work
| | - She Zhang
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- These authors contributed equally to this work
| | - Christopher Churas
- Division of Genetics, Department of Medicine, University of California, San Diego, CA 92093, USA
| | - Dexter Pratt
- Division of Genetics, Department of Medicine, University of California, San Diego, CA 92093, USA
| | - Ivet Bahar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Trey Ideker
- Division of Genetics, Department of Medicine, University of California, San Diego, CA 92093, USA
| |
|
12
|
Thomas G, Bain JM, Budge S, Brown AJP, Ames RM. Identifying Candida albicans Gene Networks Involved in Pathogenicity. Front Genet 2020; 11:375. [PMID: 32391057 PMCID: PMC7193023 DOI: 10.3389/fgene.2020.00375] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/29/2020] [Accepted: 03/26/2020] [Indexed: 11/17/2022] Open
Abstract
Candida albicans is a normal member of the human microbiome. It is also an opportunistic pathogen, which can cause life-threatening systemic infections in severely immunocompromised individuals. Despite the availability of antifungal drugs, mortality rates of systemic infections are high and new drugs are needed to overcome therapeutic challenges including the emergence of drug resistance. Targeting known disease pathways has been suggested as a promising avenue for the development of new antifungals. However, <30% of C. albicans genes are verified with experimental evidence of a gene product, and the full complement of genes involved in important disease processes is currently unknown. Tools to predict the function of partially or uncharacterized genes and generate testable hypotheses will, therefore, help to identify potential targets for new antifungal development. Here, we employ a network-extracted ontology to leverage publicly available transcriptomics data and identify potential candidate genes involved in disease processes. A subset of these genes has been phenotypically screened using available deletion strains and we present preliminary data that one candidate, PEP8, is involved in hyphal development and immune evasion. This work demonstrates the utility of network-extracted ontologies in predicting gene function to generate testable hypotheses that can be applied to pathogenic systems. This could represent a novel first step to identifying targets for new antifungal therapies.
Affiliation(s)
- Graham Thomas
- Biosciences, University of Exeter, Exeter, United Kingdom
| | - Judith M Bain
- Aberdeen Fungal Group, Institute of Medical Sciences, University of Aberdeen, Aberdeen, United Kingdom
| | - Susan Budge
- Aberdeen Fungal Group, Institute of Medical Sciences, University of Aberdeen, Aberdeen, United Kingdom
| | - Alistair J P Brown
- Aberdeen Fungal Group, Institute of Medical Sciences, University of Aberdeen, Aberdeen, United Kingdom
- MRC Centre for Medical Mycology at the University of Exeter, Biosciences, University of Exeter, Exeter, United Kingdom
| | - Ryan M Ames
- Biosciences, University of Exeter, Exeter, United Kingdom
| |
|
13
|
Mechanistic integration of exposure and effects: advances to apply systems toxicology in support of regulatory decision-making. CURRENT OPINION IN TOXICOLOGY 2019. [DOI: 10.1016/j.cotox.2019.09.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/21/2022]
|