1
Li MM, Huang Y, Sumathipala M, Liang MQ, Valdeolivas A, Ananthakrishnan AN, Liao K, Marbach D, Zitnik M. Contextual AI models for single-cell protein biology. Nat Methods 2024; 21:1546-1557. [PMID: 39039335 PMCID: PMC11310085 DOI: 10.1038/s41592-024-02341-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 06/10/2024] [Indexed: 07/24/2024]
Abstract
Understanding protein function and developing molecular therapies require deciphering the cell types in which proteins act as well as the interactions between proteins. However, modeling protein interactions across biological contexts remains challenging for existing algorithms. Here we introduce PINNACLE, a geometric deep learning approach that generates context-aware protein representations. Leveraging a multiorgan single-cell atlas, PINNACLE learns on contextualized protein interaction networks to produce 394,760 protein representations from 156 cell type contexts across 24 tissues. PINNACLE's embedding space reflects cellular and tissue organization, enabling zero-shot retrieval of the tissue hierarchy. Pretrained protein representations can be adapted for downstream tasks: enhancing 3D structure-based representations for resolving immuno-oncological protein interactions, and investigating drugs' effects across cell types. PINNACLE outperforms state-of-the-art models in nominating therapeutic targets for rheumatoid arthritis and inflammatory bowel diseases and pinpoints cell type contexts with higher predictive capability than context-free models. PINNACLE's ability to adjust its outputs on the basis of the context in which it operates paves the way for large-scale context-specific predictions in biology.
Affiliation(s)
- Michelle M Li
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Yepeng Huang
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Marissa Sumathipala
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Man Qing Liang
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Alberto Valdeolivas
  - Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
- Ashwin N Ananthakrishnan
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Division of Gastroenterology, Massachusetts General Hospital, Boston, MA, USA
- Katherine Liao
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
- Daniel Marbach
  - Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Data Science Initiative, Cambridge, MA, USA
2
Li MM, Huang Y, Sumathipala M, Liang MQ, Valdeolivas A, Ananthakrishnan AN, Liao K, Marbach D, Zitnik M. Contextual AI models for single-cell protein biology. bioRxiv 2024:2023.07.18.549602. [PMID: 37503080 PMCID: PMC10370131 DOI: 10.1101/2023.07.18.549602] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/29/2023]
Abstract
Understanding protein function and developing molecular therapies require deciphering the cell types in which proteins act as well as the interactions between proteins. However, modeling protein interactions across biological contexts remains challenging for existing algorithms. Here, we introduce Pinnacle, a geometric deep learning approach that generates context-aware protein representations. Leveraging a multi-organ single-cell atlas, Pinnacle learns on contextualized protein interaction networks to produce 394,760 protein representations from 156 cell type contexts across 24 tissues. Pinnacle's embedding space reflects cellular and tissue organization, enabling zero-shot retrieval of the tissue hierarchy. Pretrained protein representations can be adapted for downstream tasks: enhancing 3D structure-based representations for resolving immuno-oncological protein interactions, and investigating drugs' effects across cell types. Pinnacle outperforms state-of-the-art models in nominating therapeutic targets for rheumatoid arthritis and inflammatory bowel diseases, and pinpoints cell type contexts with higher predictive capability than context-free models. Pinnacle's ability to adjust its outputs based on the context in which it operates paves the way for large-scale context-specific predictions in biology.
Affiliation(s)
- Michelle M. Li
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Yepeng Huang
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Marissa Sumathipala
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Man Qing Liang
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Alberto Valdeolivas
  - Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
- Ashwin N. Ananthakrishnan
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Division of Gastroenterology, Massachusetts General Hospital, Boston, MA, USA
- Katherine Liao
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women’s Hospital, Boston, MA, USA
- Daniel Marbach
  - Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Data Science Initiative, Cambridge, MA, USA
3
Inge MM, Miller R, Hook H, Bray D, Keenan JL, Zhao R, Gilmore TD, Siggers T. Rapid profiling of transcription factor-cofactor interaction networks reveals principles of epigenetic regulation. bioRxiv 2024:2024.04.05.588333. [PMID: 38617258 PMCID: PMC11014505 DOI: 10.1101/2024.04.05.588333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
Transcription factor (TF)-cofactor (COF) interactions define dynamic, cell-specific networks that govern gene expression; however, these networks are understudied due to a lack of methods for high-throughput profiling of DNA-bound TF-COF complexes. Here we describe the Cofactor Recruitment (CoRec) method for rapid profiling of cell-specific TF-COF complexes. We define a lysine acetyltransferase (KAT)-TF network in resting and stimulated T cells. We find promiscuous recruitment of KATs for many TFs and that 35% of KAT-TF interactions are condition specific. KAT-TF interactions identify NF-κB as a primary regulator of acutely induced H3K27ac. Finally, we find that heterotypic clustering of CBP/P300-recruiting TFs is a strong predictor of total promoter H3K27ac. Our data support clustering of TF sites that broadly recruit KATs as a mechanism for widespread co-occurring histone acetylation marks. CoRec can be readily applied to different cell systems and provides a powerful approach to define TF-COF networks impacting chromatin state and gene regulation.
Affiliation(s)
- MM Inge
  - Department of Biology, Boston University, Boston, MA, USA
  - Biological Design Center, Boston University, Boston, MA, USA
  - These authors contributed equally
- R Miller
  - Department of Biology, Boston University, Boston, MA, USA
  - Bioinformatics Program, Boston University, Boston, MA, USA
  - Biological Design Center, Boston University, Boston, MA, USA
  - These authors contributed equally
- H Hook
  - Department of Biology, Boston University, Boston, MA, USA
- D Bray
  - Department of Biology, Boston University, Boston, MA, USA
  - Bioinformatics Program, Boston University, Boston, MA, USA
- JL Keenan
  - Department of Biology, Boston University, Boston, MA, USA
  - Bioinformatics Program, Boston University, Boston, MA, USA
- R Zhao
  - Department of Biology, Boston University, Boston, MA, USA
- TD Gilmore
  - Department of Biology, Boston University, Boston, MA, USA
- T Siggers
  - Department of Biology, Boston University, Boston, MA, USA
  - Bioinformatics Program, Boston University, Boston, MA, USA
  - Biological Design Center, Boston University, Boston, MA, USA
4
Kim JY, Hong N, Park S, Ham SW, Kim EJ, Kim SO, Jang J, Kim Y, Kim JK, Kim SC, Park JW, Kim H. Jagged1 intracellular domain/SMAD3 complex transcriptionally regulates TWIST1 to drive glioma invasion. Cell Death Dis 2023; 14:822. [PMID: 38092725 PMCID: PMC10719344 DOI: 10.1038/s41419-023-06356-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 11/25/2023] [Accepted: 11/30/2023] [Indexed: 12/17/2023]
Abstract
Jagged1 (JAG1) is a Notch ligand that correlates with tumor progression. Not limited to its function as a ligand, JAG1 can be cleaved, and its intracellular domain translocates to the nucleus, where it functions as a transcriptional cofactor. Previously, we showed that JAG1 intracellular domain (JICD1) forms a protein complex with DDX17/SMAD3/TGIF2. However, the molecular mechanisms underlying JICD1-mediated tumor aggressiveness remain unclear. Here, we demonstrate that JICD1 enhances the invasive phenotypes of glioblastoma cells by transcriptionally activating epithelial-to-mesenchymal transition (EMT)-related genes, especially TWIST1. The inhibition of TWIST1 reduced JICD1-driven tumor aggressiveness. Although SMAD3 is an important component of transforming growth factor (TGF)-β signaling, the JICD1/SMAD3 transcriptional complex was shown to govern brain tumor invasion independent of TGF-β signaling. Moreover, JICD1-TWIST1-MMP2 and MMP9 axes were significantly correlated with clinical outcome of glioblastoma patients. Collectively, we identified the JICD1/SMAD3-TWIST1 axis as a novel inducer of invasive phenotypes in cancer cells.
Affiliation(s)
- Jung Yun Kim
  - Department of Biotechnology, College of Life Sciences and Biotechnology, Korea University, Seoul, 02841, Republic of Korea
  - Institute of Animal Molecular Biotechnology, Korea University, Seoul, 02841, Republic of Korea
- Nayoung Hong
  - Department of Biotechnology, College of Life Sciences and Biotechnology, Korea University, Seoul, 02841, Republic of Korea
  - Institute of Animal Molecular Biotechnology, Korea University, Seoul, 02841, Republic of Korea
- Sehyeon Park
  - Department of Biotechnology, College of Life Sciences and Biotechnology, Korea University, Seoul, 02841, Republic of Korea
  - Institute of Animal Molecular Biotechnology, Korea University, Seoul, 02841, Republic of Korea
- Seok Won Ham
  - MEDIFIC Inc., Hwaseong-si, Gyeonggi-do, 18469, Republic of Korea
- Eun-Jung Kim
  - MEDIFIC Inc., Hwaseong-si, Gyeonggi-do, 18469, Republic of Korea
- Sung-Ok Kim
  - Department of Biochemistry, College of Medicine, Hallym University, Chuncheon, 24252, Republic of Korea
- Junseok Jang
  - Department of Biotechnology, College of Life Sciences and Biotechnology, Korea University, Seoul, 02841, Republic of Korea
  - Institute of Animal Molecular Biotechnology, Korea University, Seoul, 02841, Republic of Korea
- Yoonji Kim
  - Department of Biotechnology, College of Life Sciences and Biotechnology, Korea University, Seoul, 02841, Republic of Korea
  - Institute of Animal Molecular Biotechnology, Korea University, Seoul, 02841, Republic of Korea
- Jun-Kyum Kim
  - MEDIFIC Inc., Hwaseong-si, Gyeonggi-do, 18469, Republic of Korea
- Sung-Chan Kim
  - Department of Biochemistry, College of Medicine, Hallym University, Chuncheon, 24252, Republic of Korea
- Jong-Whi Park
  - Department of Life Sciences, Gachon University, Incheon, 21999, Republic of Korea
- Hyunggee Kim
  - Department of Biotechnology, College of Life Sciences and Biotechnology, Korea University, Seoul, 02841, Republic of Korea
  - Institute of Animal Molecular Biotechnology, Korea University, Seoul, 02841, Republic of Korea
5
Hao B, Kovács IA. A positive statistical benchmark to assess network agreement. Nat Commun 2023; 14:2988. [PMID: 37225699 DOI: 10.1038/s41467-023-38625-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Accepted: 05/09/2023] [Indexed: 05/26/2023]
Abstract
Current computational methods for validating experimental network datasets compare overlap, i.e., shared links, with a reference network using a negative benchmark. However, this fails to quantify the level of agreement between the two networks. To address this, we propose a positive statistical benchmark to determine the maximum possible overlap between networks. Our approach can efficiently generate this benchmark in a maximum entropy framework and provides a way to assess whether the observed overlap is significantly different from the best-case scenario. We introduce a normalized overlap score, Normlap, to enhance comparisons between experimental networks. As an application, we compare molecular and functional networks, resulting in an agreement network of human as well as yeast network datasets. The Normlap score can improve the comparison between experimental networks by providing a computational alternative to network thresholding and validation.
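To make the idea of overlap concrete, the following is a minimal Python sketch, not the authors' implementation. It counts shared links between an experimental network and a reference network, then rescales the observed overlap between a negative and a positive benchmark value. The rescaling formula, the function names, and the benchmark numbers are all illustrative assumptions; the abstract does not state how Normlap is computed, and the maximum entropy procedure that generates the benchmarks is not reproduced here.

```python
def normalize_edges(edges):
    """Treat links as undirected: (a, b) and (b, a) count as the same link.
    Self-loops are dropped."""
    return {frozenset(edge) for edge in edges if edge[0] != edge[1]}


def overlap(network, reference):
    """Number of links shared by the two networks."""
    return len(normalize_edges(network) & normalize_edges(reference))


def rescaled_overlap(observed, negative, positive):
    """Hypothetical normalization (an assumption, not the published Normlap
    definition): 0 means the overlap matches the negative benchmark,
    1 means it matches the positive (best-case) benchmark."""
    if positive == negative:
        raise ValueError("positive and negative benchmarks must differ")
    return (observed - negative) / (positive - negative)


# Toy networks; node labels are placeholders.
experimental = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
reference = [("B", "A"), ("C", "D"), ("E", "F")]

observed = overlap(experimental, reference)

# Placeholder benchmark values; in practice these would come from
# randomized network ensembles, which this sketch does not generate.
score = rescaled_overlap(observed, negative=0.5, positive=3.0)
print(observed, score)
```

The point of the rescaling step is that a raw count of shared links is hard to interpret on its own, whereas a value placed between a worst-case and a best-case reference can be compared across network pairs of different sizes.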
Affiliation(s)
- Bingjie Hao
  - Department of Physics and Astronomy, Northwestern University, Evanston, IL, 60208, USA
- István A Kovács
  - Department of Physics and Astronomy, Northwestern University, Evanston, IL, 60208, USA
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, 60208, USA
6
Federico A, Kern J, Varelas X, Monti S. Structure Learning for Gene Regulatory Networks. PLoS Comput Biol 2023; 19:e1011118. [PMID: 37200395 DOI: 10.1371/journal.pcbi.1011118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2022] [Revised: 05/31/2023] [Accepted: 04/20/2023] [Indexed: 05/20/2023]
Abstract
Inference of biological network structures is often performed on high-dimensional data, yet is hindered by the limited sample size of high-throughput "omics" data typically available. To overcome this challenge, often referred to as the "small n, large p problem," we exploit known organizing principles of biological networks that are sparse, modular, and likely share a large portion of their underlying architecture. We present SHINE (Structure Learning for Hierarchical Networks), a framework for defining data-driven structural constraints and incorporating a shared learning paradigm for efficiently learning multiple Markov networks from high-dimensional data at large p/n ratios not previously feasible. We evaluated SHINE on Pan-Cancer data comprising 23 tumor types, and found that learned tumor-specific networks exhibit expected graph properties of real biological networks, recapture previously validated interactions, and recapitulate findings in literature. Application of SHINE to the analysis of subtype-specific breast cancer networks identified key genes and biological processes for tumor maintenance and survival as well as potential therapeutic targets for modulating known breast cancer disease genes.
Affiliation(s)
- Anthony Federico
  - Section of Computational Biomedicine, Boston University School of Medicine, Boston, Massachusetts, United States of America
  - Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
- Joseph Kern
  - Department of Biochemistry, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Xaralabos Varelas
  - Department of Biochemistry, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Stefano Monti
  - Section of Computational Biomedicine, Boston University School of Medicine, Boston, Massachusetts, United States of America
  - Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
7
The RNA helicase DDX5 cooperates with EHMT2 to sustain alveolar rhabdomyosarcoma growth. Cell Rep 2022; 40:111267. [PMID: 36044855 DOI: 10.1016/j.celrep.2022.111267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2021] [Revised: 06/14/2022] [Accepted: 08/05/2022] [Indexed: 11/24/2022]
Abstract
Rhabdomyosarcoma (RMS) is the most common soft-tissue sarcoma of childhood, characterized by the inability to exit the proliferative myoblast-like stage. The alveolar fusion-positive subtype (FP-RMS) is the most aggressive and is mainly caused by the expression of PAX3/7-FOXO1 oncoproteins, which are challenging pharmacological targets. Here, we show that the DEAD box RNA helicase 5 (DDX5) is overexpressed in alveolar RMS cells and that its depletion and pharmacological inhibition decrease FP-RMS viability and slow tumor growth in xenograft models. Mechanistically, we provide evidence that DDX5 functions upstream of the EHMT2/AKT survival signaling pathway, by directly interacting with EHMT2 mRNA, modulating its stability and consequent protein expression. We show that EHMT2 in turn regulates PAX3-FOXO1 activity in a methylation-dependent manner, thus sustaining the FP-RMS myoblastic state. Together, our findings identify another survival-promoting loop in FP-RMS and highlight DDX5 as a potential therapeutic target to arrest RMS growth.
8
Balabin H, Hoyt CT, Birkenbihl C, Gyori BM, Bachman J, Kodamullil AT, Plöger PG, Hofmann-Apitius M, Domingo-Fernández D. STonKGs: a sophisticated transformer trained on biomedical text and knowledge graphs. Bioinformatics 2022; 38:1648-1656. [PMID: 34986221 PMCID: PMC8896635 DOI: 10.1093/bioinformatics/btac001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/18/2021] [Revised: 12/09/2021] [Accepted: 01/03/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION: The majority of biomedical knowledge is stored in structured databases or as unstructured text in scientific publications. This vast amount of information has led to numerous machine learning-based biological applications using either text through natural language processing (NLP) or structured data through knowledge graph embedding models. However, representations based on a single modality are inherently limited. RESULTS: To generate better representations of biological knowledge, we propose STonKGs, a Sophisticated Transformer trained on biomedical text and Knowledge Graphs (KGs). This multimodal Transformer uses combined input sequences of structured information from KGs and unstructured text data from biomedical literature to learn joint representations in a shared embedding space. First, we pre-trained STonKGs on a knowledge base assembled by the Integrated Network and Dynamical Reasoning Assembler consisting of millions of text-triple pairs extracted from biomedical literature by multiple NLP systems. Then, we benchmarked STonKGs against three baseline models trained on either one of the modalities (i.e. text or KG) across eight different classification tasks, each corresponding to a different biological application. Our results demonstrate that STonKGs outperforms both baselines, especially on the more challenging tasks with respect to the number of classes, improving upon the F1-score of the best baseline by up to 0.084 (i.e. from 0.881 to 0.965). Finally, our pre-trained model as well as the model architecture can be adapted to various other transfer learning applications. AVAILABILITY AND IMPLEMENTATION: We make the source code and the Python package of STonKGs available at GitHub (https://github.com/stonkgs/stonkgs) and PyPI (https://pypi.org/project/stonkgs/). The pre-trained STonKGs models and the task-specific classification models are respectively available at https://huggingface.co/stonkgs/stonkgs-150k and https://zenodo.org/communities/stonkgs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Charles Tapley Hoyt
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Colin Birkenbihl
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53757 Sankt Augustin, Germany
- Benjamin M Gyori
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- John Bachman
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Alpha Tom Kodamullil
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53757 Sankt Augustin, Germany
- Paul G Plöger
  - Department of Bonn-Rhein-Sieg, University of Applied Sciences, 53757 Sankt Augustin, Germany
- Martin Hofmann-Apitius
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53757 Sankt Augustin, Germany