1
|
Argov CM, Shneyour A, Jubran J, Sabag E, Mansbach A, Sepunaru Y, Filtzer E, Gruber G, Volozhinsky M, Yogev Y, Birk O, Chalifa-Caspi V, Rokach L, Yeger-Lotem E. Tissue-aware interpretation of genetic variants advances the etiology of rare diseases. Mol Syst Biol 2024; 20:1187-1206. [PMID: 39285047 PMCID: PMC11535248 DOI: 10.1038/s44320-024-00061-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2023] [Revised: 08/08/2024] [Accepted: 08/09/2024] [Indexed: 09/19/2024] Open
Abstract
Pathogenic variants underlying Mendelian diseases often disrupt the normal physiology of a few tissues and organs. However, variant effect prediction tools that aim to identify pathogenic variants are typically oblivious to tissue contexts. Here we report a machine-learning framework, denoted "Tissue Risk Assessment of Causality by Expression for variants" (TRACEvar, https://netbio.bgu.ac.il/TRACEvar/ ), that offers two advancements. First, TRACEvar predicts pathogenic variants that disrupt the normal physiology of specific tissues. This was achieved by creating 14 tissue-specific models that were trained on over 14,000 variants and combined 84 attributes of genetic variants with 495 attributes derived from tissue omics. TRACEvar outperformed 10 well-established and tissue-oblivious variant effect prediction tools. Second, the resulting models are interpretable, thereby illuminating variants' mode of action. Application of TRACEvar to variants of 52 rare-disease patients highlighted pathogenicity mechanisms and relevant disease processes. Lastly, the interpretation of all tissue models revealed that top-ranking determinants of pathogenicity included attributes of disease-affected tissues, particularly cellular process activities. Collectively, these results show that tissue contexts and interpretable machine-learning models can greatly enhance the etiology of rare diseases.
Collapse
Affiliation(s)
- Chanan M Argov
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
| | - Ariel Shneyour
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
| | - Juman Jubran
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
| | - Eric Sabag
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
| | - Avigdor Mansbach
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
| | - Yair Sepunaru
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
| | - Emmi Filtzer
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
| | - Gil Gruber
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
| | - Miri Volozhinsky
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
| | - Yuval Yogev
- Morris Kahn Laboratory of Human Genetics and the Genetics Institute at Soroka Medical Center, Faculty of Health Sciences, Ben Gurion University of the Negev, Beer Sheva, 84105, Israel
| | - Ohad Birk
- Morris Kahn Laboratory of Human Genetics and the Genetics Institute at Soroka Medical Center, Faculty of Health Sciences, Ben Gurion University of the Negev, Beer Sheva, 84105, Israel
- The National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
| | - Vered Chalifa-Caspi
- Ilse Katz Institute for Nanoscale Science & Technology, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel
| | - Lior Rokach
- Department of Software & Information Systems Engineering, Faculty of Engineering Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
| | - Esti Yeger-Lotem
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel.
- The National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel.
| |
Collapse
|
2
|
Kupershmidt Y, Kasif S, Sharan R. SPIDER: constructing cell-type-specific protein-protein interaction networks. BIOINFORMATICS ADVANCES 2024; 4:vbae130. [PMID: 39346952 PMCID: PMC11438548 DOI: 10.1093/bioadv/vbae130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/04/2024] [Revised: 08/11/2024] [Accepted: 08/28/2024] [Indexed: 10/01/2024]
Abstract
Motivation Protein-protein interactions (PPIs) play essential roles in the buildup of cellular machinery and provide the skeleton for cellular signaling. However, these biochemical roles are context dependent and interactions may change across cell type, time, and space. In contrast, PPI detection assays are run in a single condition that may not even be an endogenous condition of the organism, resulting in static networks that do not reflect full cellular complexity. Thus, there is a need for computational methods to predict cell-type-specific interactions. Results Here we present SPIDER (Supervised Protein Interaction DEtectoR), a graph attention-based model for predicting cell-type-specific PPI networks. In contrast to previous attempts at this problem, which were unsupervised in nature, our model's training is guided by experimentally measured cell-type-specific networks, enhancing its performance. We evaluate our method using experimental data of cell-type-specific networks from both humans and mice, and show that it outperforms current approaches by a large margin. We further demonstrate the ability of our method to generalize the predictions to datasets of tissues lacking prior PPI experimental data. We leverage the networks predicted by the model to facilitate the identification of tissue-specific disease genes. Availability and implementation Our code and data are available at https://github.com/Kuper994/SPIDER.
Collapse
Affiliation(s)
- Yael Kupershmidt
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
| | - Simon Kasif
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, United States
| | - Roded Sharan
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
| |
Collapse
|
3
|
Li MM, Huang Y, Sumathipala M, Liang MQ, Valdeolivas A, Ananthakrishnan AN, Liao K, Marbach D, Zitnik M. Contextual AI models for single-cell protein biology. Nat Methods 2024; 21:1546-1557. [PMID: 39039335 PMCID: PMC11310085 DOI: 10.1038/s41592-024-02341-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2023] [Accepted: 06/10/2024] [Indexed: 07/24/2024]
Abstract
Understanding protein function and developing molecular therapies require deciphering the cell types in which proteins act as well as the interactions between proteins. However, modeling protein interactions across biological contexts remains challenging for existing algorithms. Here we introduce PINNACLE, a geometric deep learning approach that generates context-aware protein representations. Leveraging a multiorgan single-cell atlas, PINNACLE learns on contextualized protein interaction networks to produce 394,760 protein representations from 156 cell type contexts across 24 tissues. PINNACLE's embedding space reflects cellular and tissue organization, enabling zero-shot retrieval of the tissue hierarchy. Pretrained protein representations can be adapted for downstream tasks: enhancing 3D structure-based representations for resolving immuno-oncological protein interactions, and investigating drugs' effects across cell types. PINNACLE outperforms state-of-the-art models in nominating therapeutic targets for rheumatoid arthritis and inflammatory bowel diseases and pinpoints cell type contexts with higher predictive capability than context-free models. PINNACLE's ability to adjust its outputs on the basis of the context in which it operates paves the way for large-scale context-specific predictions in biology.
Collapse
Affiliation(s)
- Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Yepeng Huang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Marissa Sumathipala
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Man Qing Liang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Alberto Valdeolivas
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
| | - Ashwin N Ananthakrishnan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Gastroenterology, Massachusetts General Hospital, Boston, MA, USA
| | - Katherine Liao
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
| | - Daniel Marbach
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
| | - Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Harvard Data Science Initiative, Cambridge, MA, USA.
| |
Collapse
|
4
|
Li MM, Huang Y, Sumathipala M, Liang MQ, Valdeolivas A, Ananthakrishnan AN, Liao K, Marbach D, Zitnik M. Contextual AI models for single-cell protein biology. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.07.18.549602. [PMID: 37503080 PMCID: PMC10370131 DOI: 10.1101/2023.07.18.549602] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/29/2023]
Abstract
Understanding protein function and developing molecular therapies require deciphering the cell types in which proteins act as well as the interactions between proteins. However, modeling protein interactions across biological contexts remains challenging for existing algorithms. Here, we introduce Pinnacle, a geometric deep learning approach that generates context-aware protein representations. Leveraging a multi-organ single-cell atlas, Pinnacle learns on contextualized protein interaction networks to produce 394,760 protein representations from 156 cell type contexts across 24 tissues. Pinnacle's embedding space reflects cellular and tissue organization, enabling zero-shot retrieval of the tissue hierarchy. Pretrained protein representations can be adapted for downstream tasks: enhancing 3D structure-based representations for resolving immuno-oncological protein interactions, and investigating drugs' effects across cell types. Pinnacle outperforms state-of-the-art models in nominating therapeutic targets for rheumatoid arthritis and inflammatory bowel diseases, and pinpoints cell type contexts with higher predictive capability than context-free models. Pinnacle's ability to adjust its outputs based on the context in which it operates paves way for large-scale context-specific predictions in biology.
Collapse
Affiliation(s)
- Michelle M. Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Yepeng Huang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Marissa Sumathipala
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Man Qing Liang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Alberto Valdeolivas
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
| | - Ashwin N. Ananthakrishnan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Gastroenterology, Massachusetts General Hospital, Boston, MA, USA
| | - Katherine Liao
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women’s Hospital, Boston, MA, USA
| | - Daniel Marbach
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
| | - Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Data Science Initiative, Cambridge, MA, USA
| |
Collapse
|
5
|
Zhao Y, Xiang J, Shi X, Jia P, Zhang Y, Li M. MDDOmics: multi-omics resource of major depressive disorder. Database (Oxford) 2024; 2024:baae042. [PMID: 38917209 PMCID: PMC11197964 DOI: 10.1093/database/baae042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2023] [Revised: 03/02/2024] [Accepted: 05/29/2024] [Indexed: 06/27/2024]
Abstract
Major depressive disorder (MDD) is a pressing global health issue. Its pathogenesis remains elusive, but numerous studies have revealed its intricate associations with various biological factors. Consequently, there is an urgent need for a comprehensive multi-omics resource to help researchers in conducting multi-omics data analysis for MDD. To address this issue, we constructed the MDDOmics database (Major Depressive Disorder Omics, (https://www.csuligroup.com/MDDOmics/), which integrates an extensive collection of published multi-omics data related to MDD. The database contains 41 222 entries of MDD research results and several original datasets, including Single Nucleotide Polymorphisms, genes, non-coding RNAs, DNA methylations, metabolites and proteins, and offers various interfaces for searching and visualization. We also provide extensive downstream analyses of the collected MDD data, including differential analysis, enrichment analysis and disease-gene prediction. Moreover, the database also incorporates multi-omics data for bipolar disorder, schizophrenia and anxiety disorder, due to the challenge in differentiating MDD from similar psychiatric disorders. In conclusion, by leveraging the rich content and online interfaces from MDDOmics, researchers can conduct more comprehensive analyses of MDD and its similar disorders from various perspectives, thereby gaining a deeper understanding of potential MDD biomarkers and intricate disease pathogenesis. Database URL: https://www.csuligroup.com/MDDOmics/.
Collapse
Affiliation(s)
- Yichao Zhao
- School of Computer Science and Engineering, Central South University, No.932 South Lushan Road, Changsha 410083, China
| | - Ju Xiang
- School of Computer and Communication Engineering, Changsha University of Science and Technology, No.45 Chiling Road, Changsha 410114, China
| | - Xingyuan Shi
- School of Computer Science and Engineering, Central South University, No.932 South Lushan Road, Changsha 410083, China
| | - Pengzhen Jia
- School of Computer Science and Engineering, Central South University, No.932 South Lushan Road, Changsha 410083, China
| | - Yan Zhang
- Department of Psychiatry, and National Clinical Research Center for Mental Disorders, The Second Xiangya Hospital of Central South University, No.139 Renmin Road Central, Changsha 410011, China
| | - Min Li
- School of Computer Science and Engineering, Central South University, No.932 South Lushan Road, Changsha 410083, China
| |
Collapse
|
6
|
Hadar N, Dolgin V, Oustinov K, Yogev Y, Poleg T, Safran A, Freund O, Agam N, Jean MM, Proskorovski-Ohayon R, Wormser O, Drabkin M, Halperin D, Eskin-Schwartz M, Narkis G, Sued-Hendrickson S, Aminov I, Gombosh M, Aharoni S, Birk OS. VARista: a free web platform for streamlined whole-genome variant analysis across T2T, hg38, and hg19. Hum Genet 2024; 143:695-701. [PMID: 38607411 DOI: 10.1007/s00439-024-02671-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2023] [Accepted: 03/24/2024] [Indexed: 04/13/2024]
Abstract
With the increasing importance of genomic data in understanding genetic diseases, there is an essential need for efficient and user-friendly tools that simplify variant analysis. Although multiple tools exist, many present barriers such as steep learning curves, limited reference genome compatibility, or costs. We developed VARista, a free web-based tool, to address these challenges and provide a streamlined solution for researchers, particularly those focusing on rare monogenic diseases. VARista offers a user-centric interface that eliminates much of the technical complexity typically associated with variant analysis. The tool directly supports VCF files generated using reference genomes hg19, hg38, and the emerging T2T, with seamless remapping capabilities between them. Features such as gene summaries and links, tissue and cell-specific gene expression data for both adults and fetuses, as well as automated PCR design and integration with tools such as SpliceAI and AlphaMissense, enable users to focus on the biology and the case itself. As we demonstrate, VARista proved effective in narrowing down potential disease-causing variants, prioritizing them effectively, and providing meaningful biological context, facilitating rapid decision-making. VARista stands out as a freely available and comprehensive tool that consolidates various aspects of variant analysis into a single platform that embraces the forefront of genomic advancements. Its design inherently supports a shift in focus from technicalities to critical thinking, thereby promoting better-informed decisions in genetic disease research. Given its unique capabilities and user-centric design, VARista has the potential to become an essential asset for the genomic research community. https://VARista.link.
Collapse
Affiliation(s)
- Noam Hadar
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Vadim Dolgin
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Katya Oustinov
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Yuval Yogev
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Tomer Poleg
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Amit Safran
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Ofek Freund
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Nadav Agam
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Matan M Jean
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Regina Proskorovski-Ohayon
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Ohad Wormser
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Max Drabkin
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Daniel Halperin
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Marina Eskin-Schwartz
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Genetics Institute, Soroka University Medical Center, Beer-Sheva, Israel
| | - Ginat Narkis
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Genetics Institute, Soroka University Medical Center, Beer-Sheva, Israel
| | - Sufa Sued-Hendrickson
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Ilana Aminov
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Maya Gombosh
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Sarit Aharoni
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Ohad S Birk
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel.
- Genetics Institute, Soroka University Medical Center, Beer-Sheva, Israel.
| |
Collapse
|
7
|
Computational Resources for Molecular Biology 2022. J Mol Biol 2022; 434:167625. [PMID: 35569508 DOI: 10.1016/j.jmb.2022.167625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
|