1
Kwon JJ, Pan J, Gonzalez G, Hahn WC, Zitnik M. On knowing a gene: A distributional hypothesis of gene function. Cell Syst 2024; 15:488-496. [PMID: 38810640 PMCID: PMC11189734 DOI: 10.1016/j.cels.2024.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 02/25/2024] [Accepted: 04/30/2024] [Indexed: 05/31/2024]
Abstract
As words can have multiple meanings that depend on sentence context, genes can have various functions that depend on the surrounding biological system. This pleiotropic nature of gene function is limited by ontologies, which annotate gene functions without considering biological contexts. We contend that the gene function problem in genetics may be informed by recent technological leaps in natural language processing, in which representations of word semantics can be automatically learned from diverse language contexts. In contrast to efforts to model semantics as "is-a" relationships in the 1990s, modern distributional semantics represents words as vectors in a learned semantic space and fuels current advances in transformer-based models such as large language models and generative pre-trained transformers. A similar shift in thinking of gene functions as distributions over cellular contexts may enable a similar breakthrough in data-driven learning from large biological datasets to inform gene function.
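The distributional view described in this abstract can be illustrated with a toy sketch: each gene is represented by a vector of activity across contexts, and genes are compared by the similarity of those vectors. The gene names, context values, and the choice of cosine similarity below are assumptions for illustration only; they are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical activity of three placeholder genes across four cellular contexts.
profiles = {
    "GENE_X": [0.9, 0.1, 0.8, 0.0],
    "GENE_Y": [0.8, 0.2, 0.9, 0.1],
    "GENE_Z": [0.0, 0.9, 0.1, 0.8],
}


def nearest(gene, table):
    """Return the other gene whose context profile is most similar to `gene`."""
    others = (g for g in table if g != gene)
    return max(others, key=lambda g: cosine(table[gene], table[g]))


print(nearest("GENE_X", profiles))
```

In this toy table, genes with similar profiles across contexts end up close together, which is the sense in which function is treated as a distribution over contexts rather than a fixed label.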
Affiliation(s)
- Jason J Kwon
- Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Joshua Pan
- Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Guadalupe Gonzalez
- Department of Computing, Faculty of Engineering, Imperial College, London SW7 2AZ, UK
- William C Hahn
- Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
- Marinka Zitnik
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Department of Biomedical Informatics, Boston, MA 02115, USA; Harvard Data Science Initiative, Harvard University, Cambridge, MA 02138, USA; Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA 02134, USA.
2
Razew M, Fraudeau A, Pfleiderer MM, Linares R, Galej WP. Structural basis of the Integrator complex assembly and association with transcription factors. Mol Cell 2024:S1097-2765(24)00431-3. [PMID: 38823386 DOI: 10.1016/j.molcel.2024.05.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 03/18/2024] [Accepted: 05/09/2024] [Indexed: 06/03/2024]
Abstract
Integrator is a multi-subunit protein complex responsible for premature transcription termination of coding and non-coding RNAs. This is achieved via two enzymatic activities, RNA endonuclease and protein phosphatase, acting on the promoter-proximally paused RNA polymerase II (RNAPII). Yet, it remains unclear how Integrator assembly and recruitment are regulated and what the functions of many of its core subunits are. Here, we report the structures of two human Integrator sub-complexes: INTS10/13/14/15 and INTS5/8/10/15, and an integrative model of the fully assembled Integrator bound to the RNAPII paused elongating complex (PEC). An in silico protein-protein interaction screen of over 1,500 human transcription factors (TFs) identified ZNF655 as a direct interacting partner of INTS13 within the fully assembled Integrator. We propose a model wherein INTS13 acts as a platform for the recruitment of TFs that could modulate the stability of the Integrator's association at specific loci and regulate transcription attenuation of the target genes.
Affiliation(s)
- Michal Razew
- European Molecular Biology Laboratory, EMBL Grenoble, 71 Avenue des Martyrs, 38042 Grenoble, France
- Angelique Fraudeau
- European Molecular Biology Laboratory, EMBL Grenoble, 71 Avenue des Martyrs, 38042 Grenoble, France
- Moritz M Pfleiderer
- European Molecular Biology Laboratory, EMBL Grenoble, 71 Avenue des Martyrs, 38042 Grenoble, France
- Romain Linares
- European Molecular Biology Laboratory, EMBL Grenoble, 71 Avenue des Martyrs, 38042 Grenoble, France
- Wojciech P Galej
- European Molecular Biology Laboratory, EMBL Grenoble, 71 Avenue des Martyrs, 38042 Grenoble, France.
3
Li MM, Huang Y, Sumathipala M, Liang MQ, Valdeolivas A, Ananthakrishnan AN, Liao K, Marbach D, Zitnik M. Contextualizing protein representations using deep learning on protein networks and single-cell data. bioRxiv 2024:2023.07.18.549602. [PMID: 37503080 PMCID: PMC10370131 DOI: 10.1101/2023.07.18.549602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Understanding protein function and developing molecular therapies require deciphering the cell types in which proteins act as well as the interactions between proteins. However, modeling protein interactions across diverse biological contexts, such as tissues and cell types, remains a significant challenge for existing algorithms. We introduce Pinnacle, a flexible geometric deep learning approach that is trained on contextualized protein interaction networks to generate context-aware protein representations. Leveraging a human multi-organ single-cell transcriptomic atlas, Pinnacle provides 394,760 protein representations split across 156 cell type contexts from 24 tissues and organs. Pinnacle's contextualized representations of proteins reflect cellular and tissue organization and Pinnacle's tissue representations enable zero-shot retrieval of the tissue hierarchy. Pretrained Pinnacle's protein representations can be adapted for downstream tasks: to enhance 3D structure-based protein representations for important protein interactions in immuno-oncology (PD-1/PD-L1 and B7-1/CTLA-4) and to study the effects of drugs across cell type contexts. Pinnacle outperforms state-of-the-art, yet context-free, models in nominating therapeutic targets for rheumatoid arthritis and inflammatory bowel diseases, and can pinpoint cell type contexts that predict therapeutic targets better than context-free models (29 out of 156 cell types in rheumatoid arthritis; 13 out of 152 cell types in inflammatory bowel diseases). Pinnacle is a graph-based contextual AI model that dynamically adjusts its outputs based on biological contexts in which it operates.
Affiliation(s)
- Alberto Valdeolivas
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, Basel, Switzerland
- Ashwin N Ananthakrishnan
- Harvard Medical School, Boston, MA, USA
- Division of Gastroenterology, Massachusetts General Hospital, Boston, MA, USA
- Katherine Liao
- Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
- Daniel Marbach
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, Basel, Switzerland
- Marinka Zitnik
- Harvard Medical School, Boston, MA, USA
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Data Science Initiative, Cambridge, MA, USA
4
Gonzalez G, Herath I, Veselkov K, Bronstein M, Zitnik M. Combinatorial prediction of therapeutic perturbations using causally-inspired neural networks. bioRxiv 2024:2024.01.03.573985. [PMID: 38260532 PMCID: PMC10802439 DOI: 10.1101/2024.01.03.573985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
As an alternative to target-driven drug discovery, phenotype-driven approaches identify compounds that counteract the overall disease effects by analyzing phenotypic signatures. Our study introduces a novel approach to this field, aiming to expand the search space for new therapeutic agents. We introduce PDGrapher, a causally-inspired graph neural network model designed to predict arbitrary perturbagens - sets of therapeutic targets - capable of reversing disease effects. Unlike existing methods that learn responses to perturbations, PDGrapher solves the inverse problem, which is to infer the perturbagens necessary to achieve a specific response - i.e., directly predicting perturbagens by learning which perturbations elicit a desired response. Experiments across eight datasets of genetic and chemical perturbations show that PDGrapher successfully predicted effective perturbagens in up to 9% additional test samples and ranked therapeutic targets up to 35% higher than competing methods. A key innovation of PDGrapher is its direct prediction capability, which contrasts with the indirect, computationally intensive models traditionally used in phenotype-driven drug discovery that only predict changes in phenotypes due to perturbations. The direct approach enables PDGrapher to train up to 30 times faster, representing a significant leap in efficiency. Our results suggest that PDGrapher can advance phenotype-driven drug discovery, offering a fast and comprehensive approach to identifying therapeutically useful perturbations.
Affiliation(s)
- Guadalupe Gonzalez
- Imperial College London, London, UK
- Prescient Design, Genentech, South San Francisco, CA, USA
- F. Hoffmann-La Roche Ltd, Basel, Switzerland
- Isuru Herath
- Merck & Co., South San Francisco, CA, USA
- Cornell University, Ithaca, NY, USA
- Marinka Zitnik
- Harvard Medical School, Boston, MA, USA
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Data Science Initiative, Cambridge, MA, USA
5
Brechtmann F, Bechtler T, Londhe S, Mertes C, Gagneur J. Evaluation of input data modality choices on functional gene embeddings. NAR Genom Bioinform 2023; 5:lqad095. [PMID: 37942285 PMCID: PMC10629286 DOI: 10.1093/nargab/lqad095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 09/07/2023] [Accepted: 09/28/2023] [Indexed: 11/10/2023] Open
Abstract
Functional gene embeddings, numerical vectors capturing gene function, provide a promising way to integrate functional gene information into machine learning models. These embeddings are learnt by applying self-supervised machine-learning algorithms on various data types including quantitative omics measurements, protein-protein interaction networks and literature. However, downstream evaluations comparing alternative data modalities used to construct functional gene embeddings have been lacking. Here we benchmarked functional gene embeddings obtained from various data modalities for predicting disease-gene lists, cancer drivers, phenotype-gene associations and scores from genome-wide association studies. Off-the-shelf predictors trained on precomputed embeddings matched or outperformed dedicated state-of-the-art predictors, demonstrating their high utility. Embeddings based on literature and protein-protein interactions inferred from low-throughput experiments outperformed embeddings derived from genome-wide experimental data (transcriptomics, deletion screens and protein sequence) when predicting curated gene lists. In contrast, they did not perform better when predicting genome-wide association signals and were biased towards highly-studied genes. These results indicate that embeddings derived from literature and low-throughput experiments appear favourable in many existing benchmarks because they are biased towards well-studied genes and should therefore be considered with caution. Altogether, our study and precomputed embeddings will facilitate the development of machine-learning models in genetics and related fields.
Affiliation(s)
- Felix Brechtmann
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Munich Center for Machine Learning, Munich, Germany
- Thibault Bechtler
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Shubhankar Londhe
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Christian Mertes
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Munich Data Science Institute, Technical University of Munich, Garching, Germany
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
- Julien Gagneur
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
6
Hassan AZ, Ward HN, Rahman M, Billmann M, Lee Y, Myers CL. Dimensionality reduction methods for extracting functional networks from large-scale CRISPR screens. Mol Syst Biol 2023; 19:e11657. [PMID: 37750448 PMCID: PMC10632734 DOI: 10.15252/msb.202311657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2023] [Revised: 08/28/2023] [Accepted: 09/05/2023] [Indexed: 09/27/2023] Open
Abstract
CRISPR-Cas9 screens facilitate the discovery of gene functional relationships and phenotype-specific dependencies. The Cancer Dependency Map (DepMap) is the largest compendium of whole-genome CRISPR screens aimed at identifying cancer-specific genetic dependencies across human cell lines. A mitochondria-associated bias has been previously reported to mask signals for genes involved in other functions, and thus, methods for normalizing this dominant signal to improve co-essentiality networks are of interest. In this study, we explore three unsupervised dimensionality reduction methods (autoencoders, robust principal component analysis, and classical principal component analysis [PCA]) for normalizing the DepMap to improve functional networks extracted from these data. We propose a novel "onion" normalization technique to combine several normalized data layers into a single network. Benchmarking analyses reveal that robust PCA combined with onion normalization outperforms existing methods for normalizing the DepMap. Our work demonstrates the value of removing low-dimensional signals from the DepMap before constructing functional gene networks and provides generalizable dimensionality reduction-based normalization tools.
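The idea of removing a dominant low-dimensional signal before building a network can be sketched as follows. This is a minimal illustration using classical PCA with a single component found by power iteration; it is an assumption-laden toy, not the authors' pipeline, and it does not implement robust PCA, autoencoders, or onion normalization. The matrix layout (rows as genes, columns as cell lines), the function names, and the toy values are all hypothetical.

```python
import math


def center_columns(matrix):
    """Subtract each column's mean from that column."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in matrix]


def leading_component(matrix, n_iter=200):
    """Unit-norm leading right singular vector of `matrix`, by power iteration.

    The start vector is deterministic; in the rare case that it is orthogonal
    to the leading component, this simple version would not find it.
    """
    n_cols = len(matrix[0])
    v = [float(j + 1) for j in range(n_cols)]
    norm = math.sqrt(sum(w * w for w in v))
    v = [w / norm for w in v]
    for _ in range(n_iter):
        # scores = X v, then new_v = X^T scores
        scores = [sum(x * w for x, w in zip(row, v)) for row in matrix]
        new_v = [sum(row[j] * s for row, s in zip(matrix, scores))
                 for j in range(n_cols)]
        norm = math.sqrt(sum(w * w for w in new_v))
        if norm == 0.0:  # all-zero matrix: keep the current vector
            break
        v = [w / norm for w in new_v]
    return v


def remove_leading_component(matrix):
    """Center the columns, then project the leading component out of every row."""
    centered = center_columns(matrix)
    v = leading_component(centered)
    cleaned = []
    for row in centered:
        score = sum(x * w for x, w in zip(row, v))
        cleaned.append([x - score * w for x, w in zip(row, v)])
    return cleaned, v


# Toy matrix in which every gene follows one shared pattern across cell lines.
toy = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0], [0.5, 1.0, 1.5]]
cleaned, component = remove_leading_component(toy)
```

Because the toy matrix contains only the shared pattern, the cleaned matrix is numerically zero; with real data, the residual would carry whatever signal remains once the dominant component is removed.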
Affiliation(s)
- Arshia Zernab Hassan
- Department of Computer Science and Engineering, University of Minnesota – Twin Cities, Minneapolis, MN, USA
- Henry N Ward
- Bioinformatics and Computational Biology Graduate Program, University of Minnesota – Twin Cities, Minneapolis, MN, USA
- Mahfuzur Rahman
- Department of Computer Science and Engineering, University of Minnesota – Twin Cities, Minneapolis, MN, USA
- Maximilian Billmann
- Department of Computer Science and Engineering, University of Minnesota – Twin Cities, Minneapolis, MN, USA
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn, Germany
- Yoonkyu Lee
- Bioinformatics and Computational Biology Graduate Program, University of Minnesota – Twin Cities, Minneapolis, MN, USA
- Chad L Myers
- Department of Computer Science and Engineering, University of Minnesota – Twin Cities, Minneapolis, MN, USA
- Bioinformatics and Computational Biology Graduate Program, University of Minnesota – Twin Cities, Minneapolis, MN, USA
7
Petti S, Reddy G, Desai MM. Inferring sparse structure in genotype-phenotype maps. Genetics 2023; 225:iyad127. [PMID: 37437111 PMCID: PMC10471201 DOI: 10.1093/genetics/iyad127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 05/24/2023] [Accepted: 06/21/2023] [Indexed: 07/14/2023] Open
Abstract
Correlation among multiple phenotypes across related individuals may reflect some pattern of shared genetic architecture: individual genetic loci affect multiple phenotypes (an effect known as pleiotropy), creating observable relationships between phenotypes. A natural hypothesis is that pleiotropic effects reflect a relatively small set of common "core" cellular processes: each genetic locus affects one or a few core processes, and these core processes in turn determine the observed phenotypes. Here, we propose a method to infer such structure in genotype-phenotype data. Our approach, sparse structure discovery (SSD), is based on a penalized matrix decomposition designed to identify latent structure that is low-dimensional (many fewer core processes than phenotypes and genetic loci), locus-sparse (each locus affects few core processes), and/or phenotype-sparse (each phenotype is influenced by few core processes). Our use of sparsity as a guide in the matrix decomposition is motivated by the results of a novel empirical test indicating evidence of sparse structure in several recent genotype-phenotype datasets. First, we use synthetic data to show that our SSD approach can accurately recover core processes if each genetic locus affects few core processes or if each phenotype is affected by few core processes. Next, we apply the method to three datasets spanning adaptive mutations in yeast, a genotoxin robustness assay in human cell lines, and genetic loci identified from a yeast cross, and evaluate the biological plausibility of the core processes identified. More generally, we propose sparsity as a guiding prior for resolving latent structure in empirical genotype-phenotype maps.
Affiliation(s)
- Samantha Petti
- NSF-Simons Center for the Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, MA 02138, USA
- Gautam Reddy
- NSF-Simons Center for the Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, MA 02138, USA
- Physics & Informatics Laboratories, NTT Research, Inc., Sunnyvale, CA 94085, USA
- Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Michael M Desai
- Department of Organismic and Evolutionary Biology and Department of Physics, Harvard University, Cambridge, MA 02138, USA
8
Chen X, Li Y, Zhu F, Xu X, Estrella B, Pazos MA, McGuire JT, Karagiannis D, Sahu V, Mustafokulov M, Scuoppo C, Sánchez-Rivera FJ, Soto-Feliciano YM, Pasqualucci L, Ciccia A, Amengual JE, Lu C. Context-defined cancer co-dependency mapping identifies a functional interplay between PRC2 and MLL-MEN1 complex in lymphoma. Nat Commun 2023; 14:4259. [PMID: 37460547 DOI: 10.1038/s41467-023-39990-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/01/2022] [Accepted: 07/06/2023] [Indexed: 07/20/2023] Open
Abstract
Interplay between chromatin-associated complexes and modifications critically contributes to the partitioning of the epigenome into stable and functionally distinct domains. Yet there is a lack of systematic identification of chromatin crosstalk mechanisms, limiting our understanding of the dynamic transition between chromatin states during development and disease. Here we perform co-dependency mapping of genes using CRISPR-Cas9-mediated fitness screens in pan-cancer cell lines to quantify gene-gene functional relationships. We identify 145 co-dependency modules and further define the molecular context underlying the essentiality of these modules by incorporating mutational, epigenome, gene expression and drug sensitivity profiles of cell lines. These analyses assign new protein complex composition and function, and predict new functional interactions, including an unexpected co-dependency between two transcriptionally counteracting chromatin complexes - polycomb repressive complex 2 (PRC2) and MLL-MEN1 complex. We show that PRC2-mediated H3K27 tri-methylation regulates the genome-wide distribution of MLL1 and MEN1. In lymphoma cells with EZH2 gain-of-function mutations, the re-localization of MLL-MEN1 complex drives oncogenic gene expression and results in a hypersensitivity to pharmacologic inhibition of MEN1. Together, our findings provide a resource for discovery of trans-regulatory interactions as mechanisms of chromatin regulation and potential targets of synthetic lethality.
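One common way to quantify co-dependency between two genes is to correlate their fitness profiles across cell lines. The sketch below assumes Pearson correlation and a fixed threshold, neither of which is specified in the abstract; the gene names, fitness scores, and the `codependency_pairs` helper are hypothetical and serve only to illustrate the idea.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def codependency_pairs(fitness, threshold=0.8):
    """Return (gene1, gene2, r) for every gene pair with correlation >= threshold."""
    genes = sorted(fitness)
    pairs = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(fitness[g1], fitness[g2])
            if r >= threshold:
                pairs.append((g1, g2, r))
    return pairs


# Hypothetical fitness scores (more negative = stronger dependency) in four cell lines.
fitness = {
    "GENE_A": [-1.0, -0.2, -0.9, 0.0],
    "GENE_B": [-0.8, -0.1, -1.0, 0.1],
    "GENE_C": [0.0, -1.0, 0.1, -0.9],
}
pairs = codependency_pairs(fitness)
```

Here only the two placeholder genes that are required in the same cell lines pass the threshold; grouping such pairs into modules would need an additional clustering step that is not shown.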
Affiliation(s)
- Xiao Chen
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Marine College, Shandong University, 264209, Weihai, China
- Yinglu Li
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Fang Zhu
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Union Hospital Cancer Center, Tongji Medical College, Huazhong University of Science and Technology, 430022, Wuhan, China
- Xinjing Xu
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Brian Estrella
- Division of Hematology and Oncology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Manuel A Pazos
- Division of Hematology and Oncology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, 10032, USA
- John T McGuire
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Dimitris Karagiannis
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Varun Sahu
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Mustafo Mustafokulov
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Claudio Scuoppo
- Institute for Cancer Genetics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Francisco J Sánchez-Rivera
- David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
- Yadira M Soto-Feliciano
- David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
- Laura Pasqualucci
- Institute for Cancer Genetics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Alberto Ciccia
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Institute for Cancer Genetics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Jennifer E Amengual
- Division of Hematology and Oncology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Chao Lu
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY, 10032, USA.
- Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, 10032, USA.
9
Kratz A, Kim M, Kelly MR, Zheng F, Koczor CA, Li J, Ono K, Qin Y, Churas C, Chen J, Pillich RT, Park J, Modak M, Collier R, Licon K, Pratt D, Sobol RW, Krogan NJ, Ideker T. A multi-scale map of protein assemblies in the DNA damage response. Cell Syst 2023; 14:447-463.e8. [PMID: 37220749 PMCID: PMC10330685 DOI: 10.1016/j.cels.2023.04.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/15/2021] [Revised: 01/30/2023] [Accepted: 04/25/2023] [Indexed: 05/25/2023]
Abstract
The DNA damage response (DDR) ensures error-free DNA replication and transcription and is disrupted in numerous diseases. An ongoing challenge is to determine the proteins orchestrating DDR and their organization into complexes, including constitutive interactions and those responding to genomic insult. Here, we use multi-conditional network analysis to systematically map DDR assemblies at multiple scales. Affinity purifications of 21 DDR proteins, with/without genotoxin exposure, are combined with multi-omics data to reveal a hierarchical organization of 605 proteins into 109 assemblies. The map captures canonical repair mechanisms and proposes new DDR-associated proteins extending to stress, transport, and chromatin functions. We find that protein assemblies closely align with genetic dependencies in processing specific genotoxins and that proteins in multiple assemblies typically act in multiple genotoxin responses. Follow-up by DDR functional readouts newly implicates 12 assembly members in double-strand-break repair. The DNA damage response assemblies map is available for interactive visualization and query (ccmi.org/ddram/).
Affiliation(s)
- Anton Kratz
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA; The Cancer Cell Map Initiative, San Francisco and La Jolla, CA, USA
- Minkyu Kim
- University of California San Francisco, Department of Cellular and Molecular Pharmacology, San Francisco, CA 94158, USA; The J. David Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA; Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA; The Cancer Cell Map Initiative, San Francisco and La Jolla, CA, USA; University of Texas Health Science Center San Antonio, Department of Biochemistry and Structural Biology, San Antonio, TX 78229, USA
- Marcus R Kelly
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
- Fan Zheng
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA; The Cancer Cell Map Initiative, San Francisco and La Jolla, CA, USA
- Christopher A Koczor
- University of South Alabama, Department of Pharmacology and Mitchell Cancer Institute, Mobile, AL 36604, USA
- Jianfeng Li
- University of South Alabama, Department of Pharmacology and Mitchell Cancer Institute, Mobile, AL 36604, USA
- Keiichiro Ono
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
- Yue Qin
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
- Christopher Churas
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
- Jing Chen
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
- Rudolf T Pillich
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
- Jisoo Park
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA; The Cancer Cell Map Initiative, San Francisco and La Jolla, CA, USA
- Maya Modak
- University of California San Francisco, Department of Cellular and Molecular Pharmacology, San Francisco, CA 94158, USA; The J. David Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA; Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA; The Cancer Cell Map Initiative, San Francisco and La Jolla, CA, USA
- Rachel Collier
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
- Kate Licon
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
- Dexter Pratt
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
- Robert W Sobol
- University of South Alabama, Department of Pharmacology and Mitchell Cancer Institute, Mobile, AL 36604, USA; Brown University, Department of Pathology and Laboratory Medicine and Legorreta Cancer Center, Providence, RI 02903, USA.
- Nevan J Krogan
- University of California San Francisco, Department of Cellular and Molecular Pharmacology, San Francisco, CA 94158, USA; The J. David Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA; Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA; The Cancer Cell Map Initiative, San Francisco and La Jolla, CA, USA.
- Trey Ideker
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA; The Cancer Cell Map Initiative, San Francisco and La Jolla, CA, USA.
10
Offley SR, Pfleiderer MM, Zucco A, Fraudeau A, Welsh SA, Razew M, Galej WP, Gardini A. A combinatorial approach to uncover an additional Integrator subunit. Cell Rep 2023; 42:112244. [PMID: 36920904 DOI: 10.1016/j.celrep.2023.112244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Revised: 11/15/2022] [Accepted: 02/23/2023] [Indexed: 03/16/2023] Open
Abstract
RNA polymerase II (RNAPII) controls expression of all protein-coding genes and most noncoding loci in higher eukaryotes. Calibrating RNAPII activity requires an assortment of polymerase-associated factors that are recruited at sites of active transcription. The Integrator complex is one of the most elusive transcriptional regulators in metazoans, deemed to be recruited after initiation to help establish and modulate paused RNAPII. Integrator is known to be composed of 14 subunits that assemble and operate in a modular fashion. We employed proteomics and machine-learning structure prediction (AlphaFold2) to identify an additional Integrator subunit, INTS15. We report that INTS15 assembles primarily with the INTS13/14/10 module and interfaces with the Int-PP2A module. Functional genomics analysis further reveals a role for INTS15 in modulating RNAPII pausing at a subset of genes. Our study shows that omics approaches combined with AlphaFold2-based predictions provide additional insights into the molecular architecture of large and dynamic multiprotein complexes.
Affiliation(s)
- Sarah R Offley
- The Wistar Institute, Philadelphia, PA 19103, USA; Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Moritz M Pfleiderer
- European Molecular Biology Laboratory, 71 Avenue des Martyrs, 38042 Grenoble, France
| | - Avery Zucco
- The Wistar Institute, Philadelphia, PA 19103, USA
| | - Angelique Fraudeau
- European Molecular Biology Laboratory, 71 Avenue des Martyrs, 38042 Grenoble, France
| | - Michal Razew
- European Molecular Biology Laboratory, 71 Avenue des Martyrs, 38042 Grenoble, France
| | - Wojciech P Galej
- European Molecular Biology Laboratory, 71 Avenue des Martyrs, 38042 Grenoble, France.
| | | |
|
11
|
Wagner EJ, Tong L, Adelman K. Integrator is a global promoter-proximal termination complex. Mol Cell 2023; 83:416-427. [PMID: 36634676 PMCID: PMC10866050 DOI: 10.1016/j.molcel.2022.11.012] [Citation(s) in RCA: 31] [Impact Index Per Article: 31.0] [Received: 10/07/2022] [Revised: 11/08/2022] [Accepted: 11/14/2022] [Indexed: 01/13/2023]
Abstract
Integrator is a metazoan-specific protein complex capable of inducing termination at all RNAPII-transcribed loci. Integrator recognizes paused, promoter-proximal RNAPII and drives premature termination using dual enzymatic activities: an endonuclease that cleaves nascent RNA and a protein phosphatase that removes stimulatory phosphorylation associated with RNAPII pause release and productive elongation. Recent breakthroughs in structural biology have revealed the overall architecture of Integrator and provided insights into how multiple Integrator modules are coordinated to elicit termination effectively. Furthermore, functional genomics and biochemical studies have unraveled how Integrator-mediated termination impacts protein-coding and noncoding loci. Here, we review the current knowledge about the assembly and activity of Integrator and describe the role of Integrator in gene regulation, highlighting the importance of this complex for human health.
Affiliation(s)
- Eric J Wagner
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA.
| | - Liang Tong
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA.
| | - Karen Adelman
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA.
| |
|
12
|
Gheorghe V, Hart T. Optimal construction of a functional interaction network from pooled library CRISPR fitness screens. BMC Bioinformatics 2022; 23:510. [PMID: 36443674 PMCID: PMC9707256 DOI: 10.1186/s12859-022-05078-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Accepted: 11/23/2022] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND: Functional interaction networks, where edges connect genes likely to operate in the same biological process or pathway, can be inferred from CRISPR knockout screens in cancer cell lines. Genes with similar knockout fitness profiles across a sufficiently diverse set of cell line screens are likely to be co-functional, and these "coessentiality" networks are increasingly powerful predictors of gene function and biological modularity. While several such networks have been published, most use different algorithms for each step of the network construction process. RESULTS: In this study, we identify an optimal measure of functional interaction and test all combinations of options at each step (essentiality scoring, sample variance and covariance normalization, and similarity measurement) to identify best practices for generating a functional interaction network from CRISPR knockout data. We show that Bayes Factor and Ceres scores give the best results, that Ceres outperforms the newer Chronos scoring scheme, and that covariance normalization is a critical step in network construction. We further show that Pearson correlation, mathematically identical to ordinary least squares after covariance normalization, can be extended by using partial correlation to detect and amplify signals from "moonlighting" proteins which show context-dependent interaction with different partners. CONCLUSIONS: We describe a systematic survey of methods for generating coessentiality networks from the Cancer Dependency Map data and provide a partial correlation-based approach for exploring context-dependent interactions.
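The similarity step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a small synthetic gene-by-cell-line matrix of fitness scores, omits the covariance normalization step the abstract calls critical, and uses hypothetical function names (`pearson_coessentiality`, `partial_correlation`) and an arbitrary ridge term chosen only to keep the matrix inversion stable.

```python
import numpy as np

def pearson_coessentiality(scores):
    """Gene-by-gene Pearson correlation of fitness profiles.

    scores: array of shape (n_genes, n_cell_lines); rows are genes.
    """
    return np.corrcoef(scores)

def partial_correlation(corr, ridge=1e-3):
    """Partial correlations from the inverse of a correlation matrix.

    The ridge term is an illustrative assumption that keeps the inverse
    well conditioned; it is not a value taken from the paper.
    """
    n = corr.shape[0]
    precision = np.linalg.inv(corr + ridge * np.eye(n))
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

# Synthetic data (assumption): genes A and B share a fitness signal, C does not.
rng = np.random.default_rng(0)
n_lines = 200
shared = rng.normal(size=n_lines)
scores = np.vstack([
    shared + 0.3 * rng.normal(size=n_lines),  # gene A
    shared + 0.3 * rng.normal(size=n_lines),  # gene B, co-functional with A
    rng.normal(size=n_lines),                 # gene C, unrelated
])

corr = pearson_coessentiality(scores)
pcorr = partial_correlation(corr)
```

In this toy setting the A-B entry of `corr` is large while the entries involving C stay near zero; `pcorr` reports each pairwise association after conditioning on the remaining genes, which is the quantity the abstract proposes for exploring context-dependent interactions.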
Affiliation(s)
- Veronica Gheorghe
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center UTHealth, Houston, TX, USA
| | - Traver Hart
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Cancer Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| |
|
13
|
Funk L, Su KC, Ly J, Feldman D, Singh A, Moodie B, Blainey PC, Cheeseman IM. The phenotypic landscape of essential human genes. Cell 2022; 185:4634-4653.e22. [PMID: 36347254 PMCID: PMC10482496 DOI: 10.1016/j.cell.2022.10.017] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 06/07/2022] [Revised: 09/01/2022] [Accepted: 10/14/2022] [Indexed: 11/09/2022]
Abstract
Understanding the basis for cellular growth, proliferation, and function requires determining the roles of essential genes in diverse cellular processes, including visualizing their contributions to cellular organization and morphology. Here, we combined pooled CRISPR-Cas9-based functional screening of 5,072 fitness-conferring genes in human HeLa cells with microscopy-based imaging of DNA, the DNA damage response, actin, and microtubules. Analysis of >31 million individual cells identified measurable phenotypes for >90% of gene knockouts, implicating gene targets in specific cellular processes. Clustering of phenotypic similarities based on hundreds of quantitative parameters further revealed co-functional genes across diverse cellular activities, providing predictions for gene functions and associations. By conducting pooled live-cell screening of ∼450,000 cell division events for 239 genes, we additionally identified diverse genes with functional contributions to chromosome segregation. Our work establishes a resource detailing the consequences of disrupting core cellular processes that represents the functional landscape of essential human genes.
Affiliation(s)
- Luke Funk
- Broad Institute of MIT and Harvard, 415 Main St., Cambridge, MA 02142, USA; Harvard-MIT Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Kuan-Chung Su
- Whitehead Institute for Biomedical Research, 455 Main Street, Cambridge, MA 02142, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Jimmy Ly
- Whitehead Institute for Biomedical Research, 455 Main Street, Cambridge, MA 02142, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - David Feldman
- Broad Institute of MIT and Harvard, 415 Main St., Cambridge, MA 02142, USA
| | - Avtar Singh
- Broad Institute of MIT and Harvard, 415 Main St., Cambridge, MA 02142, USA
| | - Brittania Moodie
- Whitehead Institute for Biomedical Research, 455 Main Street, Cambridge, MA 02142, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Paul C Blainey
- Broad Institute of MIT and Harvard, 415 Main St., Cambridge, MA 02142, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Koch Institute for Integrative Cancer Research at MIT, Cambridge, MA 02142, USA.
| | - Iain M Cheeseman
- Whitehead Institute for Biomedical Research, 455 Main Street, Cambridge, MA 02142, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142, USA.
| |
|
14
|
Stein CB, Field AR, Mimoso CA, Zhao C, Huang KL, Wagner EJ, Adelman K. Integrator endonuclease drives promoter-proximal termination at all RNA polymerase II-transcribed loci. Mol Cell 2022; 82:4232-4245.e11. [PMID: 36309014 PMCID: PMC9680917 DOI: 10.1016/j.molcel.2022.10.004] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 04/16/2022] [Revised: 08/28/2022] [Accepted: 10/04/2022] [Indexed: 11/07/2022]
Abstract
RNA polymerase II (RNAPII) pausing in early elongation is critical for gene regulation. Paused RNAPII can be released into productive elongation by the kinase P-TEFb or targeted for premature termination by the Integrator complex. Integrator comprises endonuclease and phosphatase activities, driving termination by cleavage of nascent RNA and removal of stimulatory phosphorylation. We generated a degron system for rapid Integrator endonuclease (INTS11) depletion to probe the direct consequences of Integrator-mediated RNA cleavage. Degradation of INTS11 elicits nearly universal increases in active early elongation complexes. However, these RNAPII complexes fail to achieve optimal elongation rates and exhibit persistent Integrator phosphatase activity. Thus, only short transcripts are significantly upregulated following INTS11 loss, including transcription factors, signaling regulators, and non-coding RNAs. We propose a uniform molecular function for INTS11 across all RNAPII-transcribed loci, with differential effects on particular genes, pathways, or RNA biotypes reflective of transcript lengths rather than specificity of Integrator activity.
Affiliation(s)
- Chad B Stein
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
| | - Andrew R Field
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA; Ludwig Center at Harvard, Boston, MA 02115, USA
| | - Claudia A Mimoso
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
| | - ChenCheng Zhao
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
| | - Kai-Lieh Huang
- Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, NY 14642, USA
| | - Eric J Wagner
- Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, NY 14642, USA
| | - Karen Adelman
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA; Ludwig Center at Harvard, Boston, MA 02115, USA; Broad Institute, Cambridge, MA 02142, USA.
| |
|
15
|
Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq. Cell 2022; 185:2559-2575.e28. [PMID: 35688146 PMCID: PMC9380471 DOI: 10.1016/j.cell.2022.05.013] [Citation(s) in RCA: 149] [Impact Index Per Article: 74.5] [Received: 01/10/2022] [Revised: 03/07/2022] [Accepted: 05/16/2022] [Indexed: 11/23/2022]
Abstract
A central goal of genetics is to define the relationships between genotypes and phenotypes. High-content phenotypic screens such as Perturb-seq (CRISPR-based screens with single-cell RNA-sequencing readouts) enable massively parallel functional genomic mapping but, to date, have been used at limited scales. Here, we perform genome-scale Perturb-seq targeting all expressed genes with CRISPR interference (CRISPRi) across >2.5 million human cells. We use transcriptional phenotypes to predict the function of poorly characterized genes, uncovering new regulators of ribosome biogenesis (including CCDC86, ZNF236, and SPATA5L1), transcription (C7orf26), and mitochondrial respiration (TMEM242). In addition to assigning gene function, single-cell transcriptional phenotypes allow for in-depth dissection of complex cellular phenomena, from RNA processing to differentiation. We leverage this ability to systematically identify genetic drivers and consequences of aneuploidy and to discover an unanticipated layer of stress-specific regulation of the mitochondrial genome. Our information-rich genotype-phenotype map reveals a multidimensional portrait of gene and cellular function. Unbiased, genome-scale profiling of genetic perturbations via single-cell RNA sequencing enables systematic assignment of function to genes and in-depth study of complex cellular phenotypes such as aneuploidy and stress-specific regulation of the mitochondrial genome.
|
16
|
Bondeson DP, Paolella BR, Asfaw A, Rothberg MV, Skipper TA, Langan C, Mesa G, Gonzalez A, Surface LE, Ito K, Kazachkova M, Colgan WN, Warren A, Dempster JM, Krill-Burger JM, Ericsson M, Tang AA, Fung I, Chambers ES, Abdusamad M, Dumont N, Doench JG, Piccioni F, Root DE, Boehm J, Hahn WC, Mannstadt M, McFarland JM, Vazquez F, Golub TR. Phosphate dysregulation via the XPR1-KIDINS220 protein complex is a therapeutic vulnerability in ovarian cancer. Nat Cancer 2022; 3:681-695. [PMID: 35437317 PMCID: PMC9246846 DOI: 10.1038/s43018-022-00360-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 07/17/2020] [Accepted: 03/04/2022] [Indexed: 12/13/2022]
Abstract
Despite advances in precision medicine, the clinical prospects for patients with ovarian and uterine cancers have not substantially improved. Here, we analyzed genome-scale CRISPR/Cas9 loss-of-function screens across 851 human cancer cell lines and found that frequent overexpression of SLC34A2, which encodes a phosphate importer, is correlated with sensitivity to loss of the phosphate exporter XPR1 in vitro and in vivo. In patient-derived tumor samples, we observed frequent PAX8-dependent overexpression of SLC34A2, XPR1 copy number amplifications, and XPR1 mRNA overexpression. Mechanistically, in SLC34A2-high cancer cell lines, genetic or pharmacologic inhibition of XPR1-dependent phosphate efflux leads to the toxic accumulation of intracellular phosphate. Finally, we show that XPR1 requires the novel partner protein KIDINS220 for proper cellular localization and activity, and that disruption of this protein complex results in acidic vacuolar structures preceding cell death. These data point to the XPR1:KIDINS220 complex and phosphate dysregulation as a therapeutic vulnerability in ovarian cancer. Golub and colleagues identify the phosphate exporter XPR1 as a therapeutic vulnerability in ovarian and uterine cancers, and show that phosphate efflux inhibition reduces tumor cell viability through accumulation of intracellular phosphate.
Affiliation(s)
| | - Brenton R Paolella
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Merck Research Laboratories, Cambridge, MA, USA
| | - Adhana Asfaw
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Carly Langan
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Gabriel Mesa
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Lauren E Surface
- Endocrine Unit, Massachusetts General Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
| | - Kentaro Ito
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Andrew A Tang
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Iris Fung
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Mai Abdusamad
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Nancy Dumont
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - John G Doench
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Federica Piccioni
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Merck Research Laboratories, Cambridge, MA, USA
| | - David E Root
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Jesse Boehm
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - William C Hahn
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA; Departments of Pediatric and Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
| | - Michael Mannstadt
- Endocrine Unit, Massachusetts General Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
| | - Todd R Golub
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA; Departments of Pediatric and Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA.
| |
|