1
Andreani V, South EJ, Dunlop MJ. Generating information-dense promoter sequences with optimal string packing. PLoS Comput Biol 2024; 20:e1012276. [PMID: 39047028 PMCID: PMC11268586 DOI: 10.1371/journal.pcbi.1012276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Accepted: 06/25/2024] [Indexed: 07/27/2024] [Open Access]
Abstract
Dense arrangements of binding sites within nucleotide sequences can collectively influence downstream transcription rates or initiate biomolecular interactions. For example, natural promoter regions can harbor many overlapping transcription factor binding sites that influence the rate of transcription initiation. Despite the prevalence of overlapping binding sites in nature, rapid design of nucleotide sequences with many overlapping sites remains a challenge. Here, we show that this is an NP-hard problem, coined here as the nucleotide String Packing Problem (SPP). We then introduce a computational technique that efficiently assembles sets of DNA-protein binding sites into dense, contiguous stretches of double-stranded DNA. For the efficient design of nucleotide sequences spanning hundreds of base pairs, we reduce the SPP to an Orienteering Problem with integer distances, and then leverage modern integer linear programming solvers. Our method optimally packs sets of 20-100 binding sites into dense nucleotide arrays of 50-300 base pairs in 0.05-10 seconds. Unlike approximation algorithms or meta-heuristics, our approach finds provably optimal solutions. We demonstrate how our method can generate large sets of diverse sequences suitable for library generation, where the frequency of binding site usage across the returned sequences can be controlled by modulating the objective function. As an example, we then show how adding additional constraints, like the inclusion of sequence elements with fixed positions, allows for the design of bacterial promoters. The nucleotide string packing approach we present can accelerate the design of sequences with complex DNA-protein interactions. When used in combination with synthesis and high-throughput screening, this design strategy could help interrogate how complex binding site arrangements impact either gene expression or biomolecular mechanisms in varied cellular contexts.
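The packing idea in this abstract can be illustrated with a toy sketch. This is not the authors' integer linear programming formulation: it assumes single-stranded, exact-match sites, chains sites only through the overlap of consecutive pairs, and searches exhaustively, which is feasible only for a handful of sites. The function names (`overlap`, `append_cost`, `pack_brute_force`) are hypothetical. `append_cost` shows one plausible integer distance between two sites of the kind an orienteering-style formulation could place on graph edges.

```python
from itertools import permutations

def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def append_cost(a: str, b: str) -> int:
    """Nucleotides added when `b` follows `a` with maximal overlap (an integer distance)."""
    return len(b) - overlap(a, b)

def pack_brute_force(sites, max_len):
    """Try every ordering of every subset size; return (site count, sequence) of the best fit.

    Assumption: only consecutive-site overlaps are exploited and the
    reverse-complement strand is ignored.
    """
    best = (0, "")
    for r in range(1, len(sites) + 1):
        for order in permutations(sites, r):
            seq = order[0]
            for prev, nxt in zip(order, order[1:]):
                seq += nxt[overlap(prev, nxt):]
            if len(seq) <= max_len and r > best[0]:
                best = (r, seq)
    return best

# Hypothetical 4-nt "binding sites" packed into at most 8 nucleotides.
count, packed = pack_brute_force(["ACGT", "GTAA", "AACC"], max_len=8)
print(count, packed)
```

Exhaustive search scales factorially with the number of sites, which is why an exact solver-based approach such as the one the abstract describes matters for sets of 20-100 sites.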
Affiliation(s)
- Virgile Andreani
  - Biomedical Engineering Department, Boston University, Boston, Massachusetts, United States of America
  - Biological Design Center, Boston University, Boston, Massachusetts, United States of America
- Eric J. South
  - Biological Design Center, Boston University, Boston, Massachusetts, United States of America
  - Molecular Biology, Cell Biology & Biochemistry Program, Boston University, Boston, Massachusetts, United States of America
- Mary J. Dunlop
  - Biomedical Engineering Department, Boston University, Boston, Massachusetts, United States of America
  - Biological Design Center, Boston University, Boston, Massachusetts, United States of America
  - Molecular Biology, Cell Biology & Biochemistry Program, Boston University, Boston, Massachusetts, United States of America
2
Posfai A, Zhou J, McCandlish DM, Kinney JB. Gauge fixing for sequence-function relationships. bioRxiv 2024:2024.05.12.593772. [PMID: 38798671 PMCID: PMC11118547 DOI: 10.1101/2024.05.12.593772] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/29/2024]
Abstract
Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
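A small worked example may make the idea of a gauge freedom concrete. The sketch below assumes the simplest case, an additive model over the DNA alphabet, and applies the zero-sum gauge, a commonly used choice for such models; it is not code from the paper, and the parameter values and function names are hypothetical. Adding a constant to every parameter at one position and subtracting it from the constant term leaves all predictions unchanged, so the gauge is fixed by requiring each position's parameters to sum to zero.

```python
ALPHABET = "ACGT"

def predict(theta0, theta, seq):
    """Additive model: constant term plus one parameter per position and character."""
    return theta0 + sum(theta[pos][ALPHABET.index(ch)] for pos, ch in enumerate(seq))

def zero_sum_gauge(theta0, theta):
    """Shift each position's parameters to sum to zero, absorbing the means into theta0."""
    fixed_theta0 = theta0
    fixed_theta = []
    for row in theta:
        mean = sum(row) / len(row)
        fixed_theta0 += mean
        fixed_theta.append([value - mean for value in row])
    return fixed_theta0, fixed_theta

# Hypothetical parameters for a length-2 sequence (rows = positions, columns = A, C, G, T).
theta0 = 1.0
theta = [[0.5, 1.5, -0.5, 2.5], [3.0, 0.0, 1.0, 0.0]]
g0, g = zero_sum_gauge(theta0, theta)

# Predictions are identical before and after gauge fixing.
print(predict(theta0, theta, "CA"), predict(g0, g, "CA"))
```

After gauge fixing, the constant term is the mean prediction over all sequences and each remaining parameter reads as a deviation from that mean, which is what makes the values interpretable.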
Affiliation(s)
- Anna Posfai
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Juannan Zhou
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
  - Department of Biology, University of Florida, Gainesville, FL, 32611
- David M McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Justin B Kinney
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
3
He J, Huo X, Pei G, Jia Z, Yan Y, Yu J, Qu H, Xie Y, Yuan J, Zheng Y, Hu Y, Shi M, You K, Li T, Ma T, Zhang MQ, Ding S, Li P, Li Y. Dual-role transcription factors stabilize intermediate expression levels. Cell 2024; 187:2746-2766.e25. [PMID: 38631355 DOI: 10.1016/j.cell.2024.03.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 12/08/2023] [Accepted: 03/18/2024] [Indexed: 04/19/2024]
Abstract
Precise control of gene expression levels is essential for normal cell functions, yet how they are defined and tightly maintained, particularly at intermediate levels, remains elusive. Here, using a series of newly developed sequencing, imaging, and functional assays, we uncover a class of transcription factors with dual roles as activators and repressors, referred to as condensate-forming level-regulating dual-action transcription factors (TFs). They reduce high expression but increase low expression to achieve stable intermediate levels. Dual-action TFs directly exert activating and repressing functions via condensate-forming domains that compartmentalize the core transcriptional unit selectively. Clinically relevant mutations in these domains, which are linked to a range of developmental disorders, impair condensate selectivity and dual-action TF activity. These results collectively address a fundamental question in expression regulation and demonstrate the potential of level-regulating dual-action TFs as powerful effectors for engineering controlled expression levels.
Affiliation(s)
- Jinnan He
  - The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Xiangru Huo
  - The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Gaofeng Pei
  - State Key Laboratory of Membrane Biology, Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing 100084, China; Tsinghua University-Peking University Joint Center for Life Sciences, Beijing 100084, China
- Zeran Jia
  - The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Yiming Yan
  - The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Jiawei Yu
  - The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Haozhi Qu
  - The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Yunxin Xie
  - The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Junsong Yuan
  - The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Yuan Zheng
  - The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Yanyan Hu
  - School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China; Tsinghua University-Peking University Joint Center for Life Sciences, Beijing 100084, China
- Minglei Shi
  - Bioinformatics Division, National Research Center for Information Science and Technology, School of Medicine, Tsinghua University, Beijing 100084, China
- Kaiqiang You
  - Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
- Tingting Li
  - Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
- Tianhua Ma
  - School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China; Tsinghua University-Peking University Joint Center for Life Sciences, Beijing 100084, China
- Michael Q Zhang
  - Bioinformatics Division, National Research Center for Information Science and Technology, School of Medicine, Tsinghua University, Beijing 100084, China; Department of Biological Sciences, Center for Systems Biology, The University of Texas, Dallas, TX 75080-3021, USA
- Sheng Ding
  - School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China; Tsinghua University-Peking University Joint Center for Life Sciences, Beijing 100084, China
- Pilong Li
  - State Key Laboratory of Membrane Biology, Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing 100084, China; Tsinghua University-Peking University Joint Center for Life Sciences, Beijing 100084, China
- Yinqing Li
  - The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
4
Liu J, Ashuach T, Inoue F, Ahituv N, Yosef N, Kreimer A. Optimizing sequence design strategies for perturbation MPRAs: a computational evaluation framework. Nucleic Acids Res 2024; 52:1613-1627. [PMID: 38296821 PMCID: PMC10939410 DOI: 10.1093/nar/gkae012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 12/26/2023] [Accepted: 01/12/2024] [Indexed: 02/02/2024] [Open Access]
Abstract
The advent of the perturbation-based massively parallel reporter assay (MPRA) technique has facilitated the delineation of the roles of non-coding regulatory elements in orchestrating gene expression. However, computational efforts to evaluate and establish guidelines for sequence design strategies for perturbation MPRAs remain scant. In this study, we propose a framework for evaluating and comparing various perturbation strategies for MPRA experiments. Within this framework, we benchmark three different perturbation approaches from the perspectives of alteration in motif-based profiles, consistency of MPRA outputs, and robustness of models that predict the activities of putative regulatory motifs. While our analyses show very similar results across multiple benchmarking metrics, the predictive modeling for the approach involving random nucleotide shuffling shows significant robustness compared with the other two approaches. Thus, we recommend designing sequences by randomly shuffling the nucleotides of the perturbed site in perturbation-MPRA, followed by a coherence check to prevent the introduction of other variations of the target motifs. In summary, our evaluation framework and the benchmarking findings create a resource of computational pipelines and highlight the potential of perturbation-MPRA in predicting non-coding regulatory activities.
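The recommended design step, shuffling the perturbed site and then checking that the target motif has not been reintroduced, can be sketched as follows. This is a minimal illustration under strong assumptions, not the authors' pipeline: motifs are represented as exact strings rather than position weight matrices, the check scans the whole sequence (so it assumes the motif occurs only at the perturbed site), and the function name `shuffle_site` and the example sequence are hypothetical.

```python
import random

def shuffle_site(seq, start, end, motifs, rng, max_tries=100):
    """Shuffle seq[start:end]; retry until no listed motif string occurs in the result."""
    site = list(seq[start:end])
    for _ in range(max_tries):
        rng.shuffle(site)
        candidate = seq[:start] + "".join(site) + seq[end:]
        # Coherence check: reject shuffles that recreate the motif on either strand.
        if not any(motif in candidate for motif in motifs):
            return candidate
    raise RuntimeError("no motif-free shuffle found")

rng = random.Random(0)  # seeded only to make the example repeatable
original = "TTTTGATAAGCCCC"
# "GATA" and its reverse complement "TATC" stand in for the target motif.
perturbed = shuffle_site(original, 4, 10, ["GATA", "TATC"], rng)
print(original, perturbed)
```

Shuffling preserves the nucleotide composition of the site and leaves the flanking sequence untouched, so any change in measured activity can be attributed to the loss of the motif arrangement rather than to a shift in base content.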
Affiliation(s)
- Jiayi Liu
  - Graduate Program in Cell & Developmental Biology, Rutgers, The State University of New Jersey, 604 Allison Rd, Piscataway, NJ 08854, USA
  - Department of Biochemistry and Molecular Biology, Rutgers, The State University of New Jersey, 604 Allison Road, Piscataway, NJ 08854, USA
  - Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, 679 Hoes Lane West, Piscataway, NJ 08854, USA
- Tal Ashuach
  - Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, 387 Soda Hall, Berkeley, CA 94720, USA
- Fumitaka Inoue
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Faculty of Medicine Building B, Yoshidatachibanacho, Sakyo Ward, Kyoto 606-8303, Japan
- Nadav Ahituv
  - Department of Bioengineering and Therapeutic Sciences, University of California, 1700 4th Street, San Francisco, CA 94158, USA
  - Institute for Human Genetics, University of California, 513 Parnassus Ave, San Francisco, CA 94143, USA
- Nir Yosef
  - Department of Systems Immunology, Weizmann Institute of Science, 234 Herzl Street, Rehovot 7610001, Israel
  - Chan-Zuckerberg Biohub, 499 Illinois St, San Francisco, CA 94158, USA
  - Department of Systems Immunology, Ragon Institute of MGH, MIT, and Harvard Institute of Science, 400 Technology Square, Cambridge, MA 02139, USA
- Anat Kreimer
  - Department of Biochemistry and Molecular Biology, Rutgers, The State University of New Jersey, 604 Allison Road, Piscataway, NJ 08854, USA
  - Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, 679 Hoes Lane West, Piscataway, NJ 08854, USA
5
Kwak IY, Kim BC, Lee J, Kang T, Garry DJ, Zhang J, Gong W. Proformer: a hybrid macaron transformer model predicts expression values from promoter sequences. BMC Bioinformatics 2024; 25:81. [PMID: 38378442 PMCID: PMC10877777 DOI: 10.1186/s12859-024-05645-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 01/08/2024] [Indexed: 02/22/2024] [Open Access]
Abstract
The breakthrough high-throughput measurement of the cis-regulatory activity of millions of randomly generated promoters provides an unprecedented opportunity to systematically decode the cis-regulatory logic that determines the expression values. We developed an end-to-end transformer encoder architecture named Proformer to predict the expression values from DNA sequences. Proformer used a Macaron-like Transformer encoder architecture, where two half-step feed forward (FFN) layers were placed at the beginning and the end of each encoder block, and a separable 1D convolution layer was inserted after the first FFN layer and in front of the multi-head attention layer. The sliding k-mers from one-hot encoded sequences were mapped onto a continuous embedding, combined with the learned positional embedding and strand embedding (forward strand vs. reverse complemented strand) as the sequence input. Moreover, Proformer introduced multiple expression heads with mask filling to prevent the transformer models from collapsing when training on a relatively small amount of data. We empirically determined that this design had significantly better performance than conventional designs, such as using the global pooling layer as the output layer for the regression task. These analyses support the notion that Proformer provides a novel method of learning and enhances our understanding of how cis-regulatory sequences determine the expression values.
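The input encoding described in this abstract (sliding k-mers, position, and strand) can be sketched without any deep learning library. The code below is an illustrative assumption, not Proformer's implementation: it maps each k-mer to an integer id using a base-4 encoding and tags it with its position and strand, which are the three indices a model would use to look up k-mer, positional, and strand embeddings. All function names are hypothetical.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Reverse the sequence and complement each base."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def kmer_ids(seq, k):
    """Integer id (base-4 encoding) of every sliding k-mer in seq."""
    ids = []
    for i in range(len(seq) - k + 1):
        value = 0
        for base in seq[i:i + k]:
            value = value * 4 + BASES.index(base)
        ids.append(value)
    return ids

def tokenize_both_strands(seq, k):
    """Return (k-mer id, position, strand) triples; strand 0 = forward, 1 = reverse complement."""
    forward = [(kid, pos, 0) for pos, kid in enumerate(kmer_ids(seq, k))]
    reverse = [(kid, pos, 1)
               for pos, kid in enumerate(kmer_ids(reverse_complement(seq), k))]
    return forward + reverse

tokens = tokenize_both_strands("AACG", 2)
print(tokens)
```

With `k` nucleotides per token there are `4**k` possible ids, so the k-mer embedding table in a real model would need that many rows.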
Affiliation(s)
- Il-Youp Kwak
  - Department of Applied Statistics, Chung‑Ang University, Seoul, Republic of Korea
- Byeong-Chan Kim
  - Department of Applied Statistics, Chung‑Ang University, Seoul, Republic of Korea
- Juhyun Lee
  - Department of Applied Statistics, Chung‑Ang University, Seoul, Republic of Korea
- Taein Kang
  - Department of Applied Statistics, Chung‑Ang University, Seoul, Republic of Korea
- Daniel J Garry
  - Cardiovascular Division, Department of Medicine, Lillehei Heart Institute, University of Minnesota, 2231 6th St SE, Minneapolis, MN, 55455, USA
  - Stem Cell Institute, University of Minnesota, Minneapolis, MN, 55455, USA
  - Paul and Sheila Wellstone Muscular Dystrophy Center, University of Minnesota, Minneapolis, MN, 55455, USA
- Jianyi Zhang
  - Department of Biomedical Engineering, The University of Alabama at Birmingham, Birmingham, AL, 35233, USA
- Wuming Gong
  - Cardiovascular Division, Department of Medicine, Lillehei Heart Institute, University of Minnesota, 2231 6th St SE, Minneapolis, MN, 55455, USA
6
Andreani V, South EJ, Dunlop MJ. Generating information-dense promoter sequences with optimal string packing. bioRxiv 2024:2023.11.01.565124. [PMID: 37961203 PMCID: PMC10635063 DOI: 10.1101/2023.11.01.565124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Dense arrangements of binding sites within nucleotide sequences can collectively influence downstream transcription rates or initiate biomolecular interactions. For example, natural promoter regions can harbor many overlapping transcription factor binding sites that influence the rate of transcription initiation. Despite the prevalence of overlapping binding sites in nature, rapid design of nucleotide sequences with many overlapping sites remains a challenge. Here, we show that this is an NP-hard problem, coined here as the nucleotide String Packing Problem (SPP). We then introduce a computational technique that efficiently assembles sets of DNA-protein binding sites into dense, contiguous stretches of double-stranded DNA. For the efficient design of nucleotide sequences spanning hundreds of base pairs, we reduce the SPP to an Orienteering Problem with integer distances, and then leverage modern integer linear programming solvers. Our method optimally packs libraries of 20-100 binding sites into dense nucleotide arrays of 50-300 base pairs in 0.05-10 seconds. Unlike approximation algorithms or meta-heuristics, our approach finds provably optimal solutions. We demonstrate how our method can generate large sets of diverse sequences suitable for library generation, where the frequency of binding site usage across the returned sequences can be controlled by modulating the objective function. As an example, we then show how adding additional constraints, like the inclusion of sequence elements with fixed positions, allows for the design of bacterial promoters. The nucleotide string packing approach we present can accelerate the design of sequences with complex DNA-protein interactions. When used in combination with synthesis and high-throughput screening, this design strategy could help interrogate how complex binding site arrangements impact either gene expression or biomolecular mechanisms in varied cellular contexts. 
Author Summary
The way protein binding sites are arranged on DNA can control the regulation and transcription of downstream genes. Areas with a high concentration of binding sites can enable complex interplay between transcription factors, a feature that is exploited by natural promoters. However, designing synthetic promoters that contain dense arrangements of binding sites is a challenge. The task involves overlapping many binding sites, each typically about 10 nucleotides long, within a constrained sequence area, which becomes increasingly difficult as sequence length decreases and binding site variety increases. We introduce an approach to design nucleotide sequences with optimally packed protein binding sites, which we call the nucleotide String Packing Problem (SPP). We show that the SPP can be solved efficiently using integer linear programming to identify the densest arrangements of binding sites for a specified sequence length. We show how adding additional constraints, like the inclusion of sequence elements with fixed positions, allows for the design of bacterial promoters. The presented approach enables the rapid design and study of nucleotide sequences with complex, dense binding site architectures.
7
Loell KJ, Friedman RZ, Myers CA, Corbo JC, Cohen BA, White MA. Transcription factor interactions explain the context-dependent activity of CRX binding sites. PLoS Comput Biol 2024; 20:e1011802. [PMID: 38227575 PMCID: PMC10817189 DOI: 10.1371/journal.pcbi.1011802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 01/26/2024] [Accepted: 01/06/2024] [Indexed: 01/18/2024] [Open Access]
Abstract
The effects of transcription factor binding sites (TFBSs) on the activity of a cis-regulatory element (CRE) depend on the local sequence context. In rod photoreceptors, binding sites for the transcription factor (TF) Cone-rod homeobox (CRX) occur in both enhancers and silencers, but the sequence context that determines whether CRX binding sites contribute to activation or repression of transcription is not understood. To investigate the context-dependent activity of CRX sites, we fit neural network-based models to the activities of synthetic CREs composed of photoreceptor TFBSs. The models revealed that CRX binding sites consistently make positive, independent contributions to CRE activity, while negative homotypic interactions between sites cause CREs composed of multiple CRX sites to function as silencers. The effects of negative homotypic interactions can be overcome by the presence of other TFBSs that either interact cooperatively with CRX sites or make independent positive contributions to activity. The context-dependent activity of CRX sites is thus determined by the balance between positive heterotypic interactions, independent contributions of TFBSs, and negative homotypic interactions. Our findings explain observed patterns of activity among genomic CRX-bound enhancers and silencers, and suggest that enhancers may require diverse TFBSs to overcome negative homotypic interactions between TFBSs.
Affiliation(s)
- Kaiser J. Loell
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- Ryan Z. Friedman
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- Connie A. Myers
  - Department of Pathology and Immunology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- Joseph C. Corbo
  - Department of Pathology and Immunology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- Barak A. Cohen
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- Michael A. White
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
8
Kleinschmidt H, Xu C, Bai L. Using Synthetic DNA Libraries to Investigate Chromatin and Gene Regulation. Chromosoma 2023; 132:167-189. [PMID: 37184694 PMCID: PMC10542970 DOI: 10.1007/s00412-023-00796-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/16/2023]
Abstract
Despite the recent explosion in genome-wide studies in chromatin and gene regulation, we are still far from extracting a set of genetic rules that can predict the function of the regulatory genome. One major reason for this deficiency is that gene regulation is a multi-layered process that involves an enormous variable space, which cannot be fully explored using native genomes. This problem can be partially solved by introducing synthetic DNA libraries into cells, a method that can test the regulatory roles of thousands to millions of sequences with limited variables. Here, we review recent applications of this method to study transcription factor (TF) binding, nucleosome positioning, and transcriptional activity. We discuss the design principles, experimental procedures, and major findings from these studies and compare the pros and cons of different approaches.
Affiliation(s)
- Holly Kleinschmidt
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Cheng Xu
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Lu Bai
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
  - Department of Physics, The Pennsylvania State University, University Park, PA, 16802, USA
9
Tack DS, Tonner PD, Pressman A, Olson ND, Levy SF, Romantseva EF, Alperovich N, Vasilyeva O, Ross D. Precision engineering of biological function with large-scale measurements and machine learning. PLoS One 2023; 18:e0283548. [PMID: 36989327 PMCID: PMC10057847 DOI: 10.1371/journal.pone.0283548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 03/11/2023] [Indexed: 03/30/2023] [Open Access]
Abstract
As synthetic biology expands and accelerates into real-world applications, methods for quantitatively and precisely engineering biological function become increasingly relevant. This is particularly true for applications that require programmed sensing to dynamically regulate gene expression in response to stimuli. However, few methods have been described that can engineer biological sensing with any level of quantitative precision. Here, we present two complementary methods for precision engineering of genetic sensors: in silico selection and machine-learning-enabled forward engineering. Both methods use a large-scale genotype-phenotype dataset to identify DNA sequences that encode sensors with quantitatively specified dose response. First, we show that in silico selection can be used to engineer sensors with a wide range of dose-response curves. To demonstrate in silico selection for precise, multi-objective engineering, we simultaneously tune a genetic sensor's sensitivity (EC50) and saturating output to meet quantitative specifications. In addition, we engineer sensors with inverted dose-response and specified EC50. Second, we demonstrate a machine-learning-enabled approach to predictively engineer genetic sensors with mutation combinations that are not present in the large-scale dataset. We show that the interpretable machine learning results can be combined with a biophysical model to engineer sensors with improved inverted dose-response curves.
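The quantities named in this abstract (EC50, saturating output, inverted dose response) can be made concrete with a short sketch. It assumes a Hill-type dose-response curve and a simple tolerance filter as a stand-in for in silico selection; neither the functional form as written here, the tolerance rule, nor the variant values are taken from the paper, and all names are hypothetical.

```python
def hill_response(x, g0, ginf, ec50, n=1.0):
    """Output at ligand concentration x; g0 is the output at x = 0, ginf at saturation."""
    frac = x**n / (ec50**n + x**n)
    return g0 + (ginf - g0) * frac

def meets_spec(ec50, ginf, target_ec50, target_ginf, rel_tol=0.2):
    """Selection filter: both parameters within a relative tolerance of their targets."""
    return (abs(ec50 - target_ec50) <= rel_tol * target_ec50
            and abs(ginf - target_ginf) <= rel_tol * target_ginf)

# Hypothetical variant library: name -> (EC50, saturating output).
library = {"v1": (10.0, 2000.0), "v2": (95.0, 9500.0), "v3": (100.0, 500.0)}
selected = [name for name, (ec50, ginf) in library.items()
            if meets_spec(ec50, ginf, target_ec50=100.0, target_ginf=10000.0)]
print(selected)

# At x = EC50 the output sits halfway between g0 and ginf.
print(hill_response(100.0, g0=100.0, ginf=1100.0, ec50=100.0))
```

An inverted dose response corresponds to `ginf < g0` in this parameterization: the output starts high and falls as the ligand concentration rises.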
Affiliation(s)
- Drew S Tack
  - National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- Peter D Tonner
  - National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- Abe Pressman
  - National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- Nathan D Olson
  - National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- Sasha F Levy
  - SLAC National Accelerator Laboratory, Menlo Park, CA, United States of America
  - Joint Initiative for Metrology in Biology, Stanford, CA, United States of America
- Eugenia F Romantseva
  - National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- Nina Alperovich
  - National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- Olga Vasilyeva
  - National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- David Ross
  - National Institute of Standards and Technology, Gaithersburg, MD, United States of America
10
Cooper YA, Guo Q, Geschwind DH. Multiplexed functional genomic assays to decipher the noncoding genome. Hum Mol Genet 2022; 31:R84-R96. [PMID: 36057282 PMCID: PMC9585676 DOI: 10.1093/hmg/ddac194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 11/14/2022] [Open Access]
Abstract
Linkage disequilibrium and the incomplete regulatory annotation of the noncoding genome complicate the identification of functional noncoding genetic variants and their causal association with disease. Current computational methods for variant prioritization have limited predictive value, necessitating the application of highly parallelized experimental assays to efficiently identify functional noncoding variation. Here, we summarize two distinct approaches, massively parallel reporter assays and CRISPR-based pooled screens, and describe their flexible implementation to characterize human noncoding genetic variation at unprecedented scale. Each approach provides unique advantages and limitations, highlighting the importance of multimodal methodological integration. These multiplexed assays of variant effects are undoubtedly poised to play a key role in the experimental characterization of noncoding genetic risk, informing our understanding of the underlying mechanisms of disease-associated loci and the development of more robust predictive classification algorithms.
Affiliation(s)
- Yonatan A Cooper
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Medical Scientist Training Program, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
| | - Qiuyu Guo
- Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
| | - Daniel H Geschwind
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Program in Neurogenetics, Department of Neurology, University of California Los Angeles, Los Angeles, CA, USA
- Center for Autism Research and Treatment, Semel Institute, University of California Los Angeles, Los Angeles, CA, USA
- Institute of Precision Health, University of California Los Angeles, Los Angeles, CA, USA
| |
|
11
|
Shahein A, López-Malo M, Istomin I, Olson EJ, Cheng S, Maerkl SJ. Systematic analysis of low-affinity transcription factor binding site clusters in vitro and in vivo establishes their functional relevance. Nat Commun 2022; 13:5273. [PMID: 36071116 PMCID: PMC9452512 DOI: 10.1038/s41467-022-32971-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/22/2022] [Accepted: 08/25/2022] [Indexed: 11/10/2022] Open
Abstract
Binding to binding site clusters has yet to be characterized in depth, and the functional relevance of low-affinity clusters remains uncertain. We characterized transcription factor binding to low-affinity clusters in vitro and found that transcription factors can bind concurrently to overlapping sites, challenging the notion of binding exclusivity. Furthermore, small clusters with binding sites an order of magnitude lower in affinity give rise to high mean occupancies at physiologically relevant transcription factor concentrations. To assess whether the observed in vitro occupancies translate to transcriptional activation in vivo, we tested low-affinity binding site clusters in a synthetic and native gene regulatory network in S. cerevisiae. In both systems, clusters of low-affinity binding sites generated transcriptional output comparable to single or even multiple consensus sites. This systematic characterization demonstrates that clusters of low-affinity binding sites achieve substantial occupancies, and that this occupancy can drive expression in eukaryotic promoters.
Affiliation(s)
- Amir Shahein
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Maria López-Malo
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Ivan Istomin
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Evan J Olson
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Shiyu Cheng
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Sebastian J Maerkl
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.
| |
|
12
|
Perkins ML, Gandara L, Crocker J. A synthetic synthesis to explore animal evolution and development. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200517. [PMID: 35634925 PMCID: PMC9149795 DOI: 10.1098/rstb.2020.0517] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022] Open
Abstract
Identifying the general principles by which genotypes are converted into phenotypes remains a challenge in the post-genomic era. We still lack a predictive understanding of how genes shape interactions among cells and tissues in response to signalling and environmental cues, and hence how regulatory networks generate the phenotypic variation required for adaptive evolution. Here, we discuss how techniques borrowed from synthetic biology may facilitate a systematic exploration of evolvability across biological scales. Synthetic approaches permit controlled manipulation of both endogenous and fully engineered systems, providing a flexible platform for investigating causal mechanisms in vivo. Combining synthetic approaches with multi-level phenotyping (phenomics) will supply a detailed, quantitative characterization of how internal and external stimuli shape the morphology and behaviour of living organisms. We advocate integrating high-throughput experimental data with mathematical and computational techniques from a variety of disciplines in order to pursue a comprehensive theory of evolution. This article is part of the theme issue ‘Genetic basis of adaptation and speciation: from loci to causative mutations’.
Affiliation(s)
- Mindy Liu Perkins
- Developmental Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Lautaro Gandara
- Developmental Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Justin Crocker
- Developmental Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| |
|
13
|
Tareen A, Kooshkbaghi M, Posfai A, Ireland WT, McCandlish DM, Kinney JB. MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect. Genome Biol 2022; 23:98. [PMID: 35428271 PMCID: PMC9011994 DOI: 10.1186/s13059-022-02661-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 07/16/2021] [Accepted: 03/24/2022] [Indexed: 12/17/2022] Open
Abstract
Multiplex assays of variant effect (MAVEs) are a family of methods that includes deep mutational scanning experiments on proteins and massively parallel reporter assays on gene regulatory sequences. Despite their increasing popularity, a general strategy for inferring quantitative models of genotype-phenotype maps from MAVE data is lacking. Here we introduce MAVE-NN, a neural-network-based Python package that implements a broadly applicable information-theoretic framework for learning genotype-phenotype maps—including biophysically interpretable models—from MAVE datasets. We demonstrate MAVE-NN in multiple biological contexts, and highlight the ability of our approach to deconvolve mutational effects from otherwise confounding experimental nonlinearities and noise.
|
14
|
He N, Wang W, Fang C, Tan Y, Li L, Hou C. Integration of Count Difference and Curve Similarity in Negative Regulatory Element Detection. Front Genet 2022; 13:818344. [PMID: 35251128 PMCID: PMC8896116 DOI: 10.3389/fgene.2022.818344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Accepted: 01/20/2022] [Indexed: 12/05/2022] Open
Abstract
Negative regulatory elements (NREs) down-regulate gene expression by inhibiting the activities of promoters or enhancers. The repressing activity of NREs can be measured globally by massively parallel reporter assays (MPRAs). However, most existing algorithms are designed for the statistical detection of positively enriched signals in MPRA datasets. To identify reduced signals in MPRA experiments, we designed an NRE identification program, fast-NR, which integrates the count and graphic features of sequenced reads to detect NREs in datasets generated by self-transcribing active regulatory region sequencing (STARR-seq) experiments. Fast-NR identified hundreds of silencers in human K562 cells that can be validated by independent methods.
Affiliation(s)
- Na He
- Harbin Institute of Technology, Harbin, China
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- *Correspondence: Chunhui Hou; Na He
| | - Wenjing Wang
- School of Life Science and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR, China
| | - Chao Fang
- Cancer Centre, Faculty of Health Sciences, University of Macau, Macao, China
| | - Yongjian Tan
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
| | - Li Li
- Department of Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
| | - Chunhui Hou
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- *Correspondence: Chunhui Hou; Na He
| |
|
15
|
Anderson DA, Voigt CA. Competitive dCas9 binding as a mechanism for transcriptional control. Mol Syst Biol 2021; 17:e10512. [PMID: 34747560 PMCID: PMC8574044 DOI: 10.15252/msb.202110512] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/17/2021] [Revised: 10/10/2021] [Accepted: 10/11/2021] [Indexed: 12/24/2022] Open
Abstract
Catalytically dead Cas9 (dCas9) is a programmable transcription factor that can be targeted to promoters through the design of small guide RNAs (sgRNAs), where it can function as an activator or repressor. Natural promoters use overlapping binding sites as a mechanism for signal integration, where the binding of one can block, displace, or augment the activity of the other. Here, we implemented this strategy in Escherichia coli using pairs of sgRNAs designed to repress and then derepress transcription through competitive binding. When designed to target a promoter, this led to 27-fold repression and complete derepression. This system was also capable of ratiometric input comparison over two orders of magnitude. Additionally, we used this mechanism for promoter sequence-independent control by adopting it for elongation control, achieving 8-fold repression and 4-fold derepression. This work demonstrates a new genetic control mechanism that could be used to build analog circuits or implement cis-regulatory logic on CRISPRi-targeted native genes.
Affiliation(s)
- Daniel A Anderson
- Synthetic Biology Center, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Christopher A Voigt
- Synthetic Biology Center, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| |
|
16
|
Shih CH, Fay J. Cis-regulatory variants affect gene expression dynamics in yeast. eLife 2021; 10:e68469. [PMID: 34369376 PMCID: PMC8367379 DOI: 10.7554/elife.68469] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/17/2021] [Accepted: 08/06/2021] [Indexed: 12/14/2022] Open
Abstract
Evolution of cis-regulatory sequences depends on how they affect gene expression and motivates both the identification and prediction of cis-regulatory variants responsible for expression differences within and between species. While much progress has been made in relating cis-regulatory variants to expression levels, the timing of gene activation and repression may also be important to the evolution of cis-regulatory sequences. We investigated allele-specific expression (ASE) dynamics within and between Saccharomyces species during the diauxic shift and found appreciable cis-acting variation in gene expression dynamics. Within-species ASE is associated with intergenic variants, and ASE dynamics are more strongly associated with insertions and deletions than ASE levels. To refine these associations, we used a high-throughput reporter assay to test promoter regions and individual variants. Within the subset of regions that recapitulated endogenous expression, we identified and characterized cis-regulatory variants that affect expression dynamics. Between species, chimeric promoter regions generate novel patterns and indicate constraints on the evolution of gene expression dynamics. We conclude that changes in cis-regulatory sequences can tune gene expression dynamics and that the interplay between expression dynamics and other aspects of expression is relevant to the evolution of cis-regulatory sequences.
Affiliation(s)
- Ching-Hua Shih
- Department of Biology, University of Rochester, Rochester, United States
| | - Justin Fay
- Department of Biology, University of Rochester, Rochester, United States
| |
|
17
|
Lee D, Kapoor A, Lee C, Mudgett M, Beer MA, Chakravarti A. Sequence-based correction of barcode bias in massively parallel reporter assays. Genome Res 2021; 31:1638-1645. [PMID: 34285053 PMCID: PMC8415370 DOI: 10.1101/gr.268599.120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/18/2020] [Accepted: 07/07/2021] [Indexed: 11/24/2022]
Abstract
Massively parallel reporter assays (MPRAs) are a high-throughput method for evaluating in vitro activities of thousands of candidate cis-regulatory elements (CREs). In these assays, candidate sequences are cloned upstream or downstream from a reporter gene tagged by unique DNA sequences. However, tag sequences may themselves affect reporter gene expression and lead to major potential biases in the measured cis-regulatory activity. Here, we present a sequence-based method for correcting tag-sequence-specific effects and show that our method can significantly reduce this source of variation and improve the identification of functional regulatory variants by MPRAs. We also show that our model captures sequence features associated with post-transcriptional regulation of mRNA. Thus, this new method helps not only to improve detection of regulatory signals in MPRA experiments but also to design better MPRA protocols.
Affiliation(s)
| | - Ashish Kapoor
- University of Texas Health Science Center at Houston
| | | | | | | | | |
|
18
|
Letiagina AE, Omelina ES, Ivankin AV, Pindyurin AV. MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes. Front Genet 2021; 12:618189. [PMID: 34046055 PMCID: PMC8148044 DOI: 10.3389/fgene.2021.618189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2020] [Accepted: 03/25/2021] [Indexed: 11/13/2022] Open
Abstract
Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC-ROI association. The reporter constructs are delivered to target cells and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluating the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC-ROI associations have to be identified, whereas all uncertain cases have to be excluded from the analysis. Besides the preparation of additional "mapping" samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and ROIs associated with them, calculates the normalized expression level for each BC and the averaged values for each ROI, and provides a graphical visualization of the processed data.
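The normalization and averaging steps described in this abstract can be illustrated with a minimal Python sketch. This is not the MPRAdecoder code: the function names, the dictionary inputs, and the choice to normalize by read fractions (rather than raw counts) are assumptions made here for illustration only.

```python
from collections import defaultdict
from statistics import mean


def normalized_bc_expression(cdna_counts, plasmid_counts):
    """Hypothetical helper: per-barcode expression normalized to plasmid abundance.

    Both arguments map barcode sequence -> read count. Counts are converted to
    fractions of their sample total (an assumption) before taking the ratio.
    """
    cdna_total = sum(cdna_counts.values())
    plasmid_total = sum(plasmid_counts.values())
    if cdna_total == 0 or plasmid_total == 0:
        raise ValueError("both samples must contain at least one read")

    expression = {}
    for bc, plasmid_reads in plasmid_counts.items():
        if plasmid_reads == 0:
            continue  # barcode absent from the plasmid sample: cannot normalize
        cdna_fraction = cdna_counts.get(bc, 0) / cdna_total
        plasmid_fraction = plasmid_reads / plasmid_total
        expression[bc] = cdna_fraction / plasmid_fraction
    return expression


def average_per_roi(bc_expression, bc_to_roi):
    """Hypothetical helper: average normalized expression over barcodes per ROI.

    Barcodes missing from bc_to_roi (no unambiguous association) are excluded.
    """
    grouped = defaultdict(list)
    for bc, value in bc_expression.items():
        roi = bc_to_roi.get(bc)
        if roi is not None:
            grouped[roi].append(value)
    return {roi: mean(values) for roi, values in grouped.items()}
```

Given read counts for a cDNA sample and a plasmid sample plus a barcode-to-ROI mapping, calling `normalized_bc_expression` and then `average_per_roi` yields one value per ROI; barcodes without a mapped ROI are dropped, mirroring the exclusion of uncertain cases mentioned above.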
Affiliation(s)
- Anna E Letiagina
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Faculty of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
| | - Evgeniya S Omelina
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Anton V Ivankin
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Alexey V Pindyurin
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| |
|
19
|
Yu TC, Liu WL, Brinck MS, Davis JE, Shek J, Bower G, Einav T, Insigne KD, Phillips R, Kosuri S, Urtecho G. Multiplexed characterization of rationally designed promoter architectures deconstructs combinatorial logic for IPTG-inducible systems. Nat Commun 2021; 12:325. [PMID: 33436562 PMCID: PMC7804116 DOI: 10.1038/s41467-020-20094-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 02/14/2020] [Accepted: 11/04/2020] [Indexed: 12/21/2022] Open
Abstract
A crucial step towards engineering biological systems is the ability to precisely tune the genetic response to environmental stimuli. In the case of Escherichia coli inducible promoters, our incomplete understanding of the relationship between sequence composition and gene expression hinders our ability to predictably control transcriptional responses. Here, we profile the expression dynamics of 8269 rationally designed, IPTG-inducible promoters that collectively explore the individual and combinatorial effects of RNA polymerase and LacI repressor binding site strengths. We then fit a statistical mechanics model to measured expression that accurately models gene expression and reveals properties of theoretically optimal inducible promoters. Furthermore, we characterize three alternative promoter architectures and show that repositioning binding sites within promoters influences the types of combinatorial effects observed between promoter elements. In total, this approach enables us to deconstruct relationships between inducible promoter elements and discover practical insights for engineering inducible promoters with desirable characteristics.
Affiliation(s)
- Timothy C Yu
- Department of Bioengineering, University of California, Los Angeles, CA, 90095, USA
| | - Winnie L Liu
- Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA, 90095, USA
| | - Marcia S Brinck
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA, 90095, USA
| | - Jessica E Davis
- Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095, USA
| | - Jeremy Shek
- Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095, USA
| | - Grace Bower
- Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA, 90095, USA
| | - Tal Einav
- Department of Physics, California Institute of Technology, Pasadena, CA, 91125, USA
| | - Kimberly D Insigne
- Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, CA, 90095, USA
| | - Rob Phillips
- Department of Physics, California Institute of Technology, Pasadena, CA, 91125, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
- Department of Applied Physics, California Institute of Technology, Pasadena, CA, 91125, USA
| | - Sriram Kosuri
- Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095, USA.
- UCLA-DOE Institute for Genomics and Proteomics, Los Angeles, CA, 90095, USA.
- Institute for Quantitative and Computational Biosciences (QCB), University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, 90095, USA.
- Molecular Biology Interdepartmental Doctoral Program, University of California, Los Angeles, CA, 90095, USA.
| | - Guillaume Urtecho
- Molecular Biology Interdepartmental Doctoral Program, University of California, Los Angeles, CA, 90095, USA.
| |
|
20
|
Mulvey B, Lagunas T, Dougherty JD. Massively Parallel Reporter Assays: Defining Functional Psychiatric Genetic Variants Across Biological Contexts. Biol Psychiatry 2021; 89:76-89. [PMID: 32843144 PMCID: PMC7938388 DOI: 10.1016/j.biopsych.2020.06.011] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 02/11/2020] [Revised: 06/09/2020] [Accepted: 06/10/2020] [Indexed: 12/18/2022]
Abstract
Neuropsychiatric phenotypes have long been known to be influenced by heritable risk factors, directly confirmed by the past decade of genetic studies that have revealed specific genetic variants enriched in disease cohorts. However, the initial hope that a small set of genes would be responsible for a given disorder proved false. The more complex reality is that a given disorder may be influenced by myriad small-effect noncoding variants and/or by rare but severe coding variants, many de novo. Noncoding genomic sequences, for which molecular functions cannot usually be inferred, harbor a large portion of these variants, creating a substantial barrier to understanding higher-order molecular and biological systems of disease. Fortunately, novel genetic technologies (scalable oligonucleotide synthesis, RNA sequencing, and CRISPR [clustered regularly interspaced short palindromic repeats]) have opened novel avenues to experimentally identify biologically significant variants en masse. Massively parallel reporter assays (MPRAs) are an especially versatile technique resulting from such innovations. MPRAs are powerful molecular genetics tools that can be used to screen thousands of untranscribed or untranslated sequences and their variants for functional effects in a single experiment. This approach, though underutilized in psychiatric genetics, has several useful features for the field. We review methods for assaying putatively functional genetic variants and regions, emphasizing MPRAs and the opportunities they hold for dissection of psychiatric polygenicity. We discuss literature applying functional assays in neurogenetics, highlighting strengths, caveats, and design considerations, especially regarding disease-relevant variables (cell type, neurodevelopment, and sex), and we ultimately propose applications of MPRA to both computational and experimental neurogenetics of polygenic disease risk.
Affiliation(s)
- Bernard Mulvey
- Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
| | - Tomás Lagunas
- Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
| | - Joseph D Dougherty
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri.
| |
|
21
|
Renganaath K, Chong R, Day L, Kosuri S, Kruglyak L, Albert FW. Systematic identification of cis-regulatory variants that cause gene expression differences in a yeast cross. eLife 2020; 9:e62669. [PMID: 33179598 PMCID: PMC7685706 DOI: 10.7554/elife.62669] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/02/2020] [Accepted: 11/11/2020] [Indexed: 02/06/2023] Open
Abstract
Sequence variation in regulatory DNA alters gene expression and shapes genetically complex traits. However, the identification of individual, causal regulatory variants is challenging. Here, we used a massively parallel reporter assay to measure the cis-regulatory consequences of 5832 natural DNA variants in the promoters of 2503 genes in the yeast Saccharomyces cerevisiae. We identified 451 causal variants, which underlie genetic loci known to affect gene expression. Several promoters harbored multiple causal variants. In five promoters, pairs of variants showed non-additive, epistatic interactions. Causal variants were enriched at conserved nucleotides, tended to have low derived allele frequency, and were depleted from promoters of essential genes, which is consistent with the action of negative selection. Causal variants were also enriched for alterations in transcription factor binding sites. Models integrating these features provided modest, but statistically significant, ability to predict causal variants. This work revealed a complex molecular basis for cis-acting regulatory variation.
Affiliation(s)
- Kaushik Renganaath
- Department of Genetics, Cell Biology, & Development, University of Minnesota, Minneapolis, United States
| | - Rockie Chong
- Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, United States
| | - Laura Day
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
| | - Sriram Kosuri
- Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, United States
| | - Leonid Kruglyak
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
| | - Frank W Albert
- Department of Genetics, Cell Biology, & Development, University of Minnesota, Minneapolis, United States
| |
|
22
|
Fuqua T, Jordan J, van Breugel ME, Halavatyi A, Tischer C, Polidoro P, Abe N, Tsai A, Mann RS, Stern DL, Crocker J. Dense and pleiotropic regulatory information in a developmental enhancer. Nature 2020; 587:235-239. [PMID: 33057197 DOI: 10.1038/s41586-020-2816-5] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 03/09/2020] [Accepted: 07/22/2020] [Indexed: 01/08/2023]
Abstract
Changes in gene regulation underlie much of phenotypic evolution [1]. However, our understanding of the potential for regulatory evolution is biased, because most evidence comes from either natural variation or limited experimental perturbations [2]. Using an automated robotics pipeline, we surveyed an unbiased mutation library for a developmental enhancer in Drosophila melanogaster. We found that almost all mutations altered gene expression and that parameters of gene expression (levels, location, and state) were convolved. The widespread pleiotropic effects of most mutations may constrain the evolvability of developmental enhancers. Consistent with these observations, comparisons of diverse Drosophila larvae revealed apparent biases in the phenotypes influenced by the enhancer. Developmental enhancers may encode a higher density of regulatory information than has been appreciated previously, imposing constraints on regulatory evolution.
Affiliation(s)
- Timothy Fuqua
- European Molecular Biology Laboratory, Heidelberg, Germany; Joint PhD Collaboration, EMBL and Faculty of Biosciences Heidelberg University, Heidelberg, Germany
| | | | | | | | | | | | - Namiko Abe
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| | - Albert Tsai
- European Molecular Biology Laboratory, Heidelberg, Germany
| | - Richard S Mann
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| | | | - Justin Crocker
- European Molecular Biology Laboratory, Heidelberg, Germany.
| |
|
23
|
Hammelman J, Krismer K, Banerjee B, Gifford DK, Sherwood RI. Identification of determinants of differential chromatin accessibility through a massively parallel genome-integrated reporter assay. Genome Res 2020; 30:1468-1480. [PMID: 32973041 PMCID: PMC7605270 DOI: 10.1101/gr.263228.120] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 03/05/2020] [Accepted: 08/26/2020] [Indexed: 12/20/2022]
Abstract
A key mechanism in cellular regulation is the ability of the transcriptional machinery to physically access DNA. Transcription factors interact with DNA to alter the accessibility of chromatin, which enables changes to gene expression during development or disease or as a response to environmental stimuli. However, the regulation of DNA accessibility via the recruitment of transcription factors is difficult to study in the context of the native genome because every genomic site is distinct in multiple ways. Here we introduce the multiplexed integrated accessibility assay (MIAA), an assay that measures chromatin accessibility of synthetic oligonucleotide sequence libraries integrated into a controlled genomic context with low native accessibility. We apply MIAA to measure the effects of sequence motifs on cell type-specific accessibility between mouse embryonic stem cells and embryonic stem cell-derived definitive endoderm cells, screening 7905 distinct DNA sequences. MIAA recapitulates differential accessibility patterns of 100-nt sequences derived from natively differential genomic regions, identifying E-box motifs common to epithelial-mesenchymal transition driver transcription factors in stem cell-specific accessible regions that become repressed in endoderm. We show that a single binding motif for a key regulatory transcription factor is sufficient to open chromatin, and classify sets of stem cell-specific, endoderm-specific, and shared accessibility-modifying transcription factor motifs. We also show that overexpression of two definitive endoderm transcription factors, T and Foxa2, results in changes to accessibility in DNA sequences containing their respective DNA-binding motifs and identify preferential motif arrangements that influence accessibility.
Affiliation(s)
- Jennifer Hammelman
- Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Konstantin Krismer
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Budhaditya Banerjee
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
| | - David K Gifford
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Richard I Sherwood
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
- Hubrecht Institute, 3584 CT Utrecht, Netherlands
|
24
|
Ray JP, de Boer CG, Fulco CP, Lareau CA, Kanai M, Ulirsch JC, Tewhey R, Ludwig LS, Reilly SK, Bergman DT, Engreitz JM, Issner R, Finucane HK, Lander ES, Regev A, Hacohen N. Prioritizing disease and trait causal variants at the TNFAIP3 locus using functional and genomic features. Nat Commun 2020; 11:1237. [PMID: 32144282 PMCID: PMC7060350 DOI: 10.1038/s41467-020-15022-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 02/03/2020] [Accepted: 02/17/2020] [Indexed: 12/19/2022] Open
Abstract
Genome-wide association studies have associated thousands of genetic variants with complex traits and diseases, but pinpointing the causal variant(s) among those in tight linkage disequilibrium with each associated variant remains a major challenge. Here, we use seven experimental assays to characterize all common variants at the multiple disease-associated TNFAIP3 locus in five disease-relevant immune cell lines, based on a set of features related to regulatory potential. Trait/disease-associated variants are enriched among SNPs prioritized based on either: (1) residing within CRISPRi-sensitive regulatory regions, or (2) localizing in a chromatin accessible region while displaying allele-specific reporter activity. Of the 15 trait/disease-associated haplotypes at TNFAIP3, 9 have at least one variant meeting one or both of these criteria, 5 of which are further supported by genetic fine-mapping. Our work provides a comprehensive strategy to characterize genetic variation at important disease-associated loci, and aids in the effort to identify trait causal genetic variants.
Affiliation(s)
- John P Ray
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Carl G de Boer
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Charles P Fulco
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
| | - Caleb A Lareau
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, 02115, USA
| | - Masahiro Kanai
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, 02115, USA
| | - Jacob C Ulirsch
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, 02115, USA
| | - Ryan Tewhey
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
| | - Leif S Ludwig
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Steven K Reilly
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
| | - Drew T Bergman
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Jesse M Engreitz
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Harvard Society of Fellows, Harvard University, Cambridge, MA, 02138, USA
| | - Robbyn Issner
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Hilary K Finucane
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
| | - Eric S Lander
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
| | - Aviv Regev
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA.
- Howard Hughes Medical Institute, Cambridge, MA, 02142, USA.
| | - Nir Hacohen
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
- Center for Cancer Research, Massachusetts General Hospital, Boston, MA, 02114, USA.
|
25
|
King DM, Hong CKY, Shepherdson JL, Granas DM, Maricque BB, Cohen BA. Synthetic and genomic regulatory elements reveal aspects of cis-regulatory grammar in mouse embryonic stem cells. eLife 2020; 9:41279. [PMID: 32043966 PMCID: PMC7077988 DOI: 10.7554/elife.41279] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 08/20/2018] [Accepted: 02/07/2020] [Indexed: 01/08/2023] Open
Abstract
In embryonic stem cells (ESCs), a core transcription factor (TF) network establishes the gene expression program necessary for pluripotency. To address how interactions between four key TFs contribute to cis-regulation in mouse ESCs, we assayed two massively parallel reporter assay (MPRA) libraries composed of binding sites for SOX2, POU5F1 (OCT4), KLF4, and ESRRB. Comparisons between synthetic cis-regulatory elements and genomic sequences with comparable binding site configurations revealed some aspects of a regulatory grammar. The expression of synthetic elements is influenced by both the number and arrangement of binding sites. This grammar plays only a small role for genomic sequences, as the relative activities of genomic sequences are best explained by the predicted occupancy of binding sites, regardless of binding site identity and positioning. Our results suggest that the effects of transcription factor binding sites (TFBS) are influenced by the order and orientation of sites, but that in the genome the overall occupancy of TFs is the primary determinant of activity.
Transcription factors are proteins that flip genetic switches; their role is to control when and where genes are active. They do this by binding to short stretches of DNA called cis-regulatory sequences. Each sequence can have several binding sites for different transcription factors, but it is largely unclear whether the transcription factors binding to the same regulatory sequence actually work together. It is possible that each transcription factor works independently and that there only needs to be a critical mass of transcription factors bound to throw the genetic switch. If this is the case, the most important features of a cis-regulatory sequence should be the number of binding sites it contains and how tightly the transcription factors bind to those sites. The more transcription factors and the more strongly they bind, the more active the gene should be.
An alternative option is that certain transcription factors may work better together, enhancing each other's effects such that the total effect is more than the sum of its parts. If this is true, the order, orientation and spacing of the binding sites within a sequence should matter more than the number. One way to distinguish between these possibilities is to study mouse embryonic stem cells, which have a core set of four transcription factors. Looking directly at a real genome, however, can be confusing, and it is difficult to measure the effects of different cis-regulatory sequences because genes differ in so many other ways. To tackle this problem, King et al. created a synthetic set of cis-regulatory sequences based on the four core transcription factors found in mouse stem cells. The synthetic set had every combination of two, three or four of the binding sites, with each site either facing forwards or backwards along the DNA strand. King et al. attached each of the synthetic cis-regulatory sequences to a reporter gene to find out how well each sequence performed. This revealed that the cis-regulatory sequences with the most binding sites and the tightest binding affinities work best, suggesting that transcription factors mainly work independently. There was evidence of some interaction between some transcription factors because, of the synthetic sequences with four binding sites, some worked better than others, and there were patterns in the most effective binding site combinations. However, these effects were small, and when King et al. went on to test sequences from the real mouse genome, the most important factor by far was the number of binding sites. Synthetic libraries of DNA sequences allow researchers to examine gene regulation more clearly than is possible in real genomes. Yet this approach does have its limitations, and it is impossible to capture every type of cis-regulatory sequence in one library.
The next step to extend this work is to combine the two approaches, taking sequences from the real genome and manipulating them one by one. This could help to unravel the rules that govern how cis-regulatory sequences work in real cells.
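The synthetic library described in the abstract above (two, three or four binding sites, each in forward or reverse orientation) can be enumerated with a short sketch. This is an illustration under stated assumptions, not the authors' design code: it assumes that the order of sites matters and that no factor appears twice in one element, and the function name `enumerate_designs` and the "+"/"-" strand labels are hypothetical.

```python
from itertools import permutations, product

# The four transcription factors named in the abstract.
FACTORS = ("SOX2", "POU5F1", "KLF4", "ESRRB")


def enumerate_designs(factors=FACTORS, sizes=(2, 3, 4)):
    """Yield each design as a tuple of (factor, orientation) pairs.

    Assumptions (not taken from the source): site order matters and
    no factor is repeated within a single design.
    """
    for k in sizes:
        # Every ordered selection of k distinct factors.
        for order in permutations(factors, k):
            # Every forward ("+") / reverse ("-") assignment for those k sites.
            for strands in product("+-", repeat=k):
                yield tuple(zip(order, strands))


designs = list(enumerate_designs())
print(len(designs))
```

Under these assumptions the count is 12 x 4 + 24 x 8 + 24 x 16 = 624 designs; a different reading of "every combination" (for example, ignoring site order) would give a smaller number.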
Affiliation(s)
- Dana M King
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States
- Department of Genetics, Washington University in St. Louis, St. Louis, United States
| | - Clarice Kit Yee Hong
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States
- Department of Genetics, Washington University in St. Louis, St. Louis, United States
| | - James L Shepherdson
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States
- Department of Genetics, Washington University in St. Louis, St. Louis, United States
| | - David M Granas
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States
- Department of Genetics, Washington University in St. Louis, St. Louis, United States
| | - Brett B Maricque
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States
- Department of Genetics, Washington University in St. Louis, St. Louis, United States
| | - Barak A Cohen
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States
- Department of Genetics, Washington University in St. Louis, St. Louis, United States
|
26
|
Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT, Roth FP, Fowler DM, Rubin AF. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol 2019; 20:223. [PMID: 31679514 PMCID: PMC6827219 DOI: 10.1186/s13059-019-1845-6] [Citation(s) in RCA: 112] [Impact Index Per Article: 22.4] [Received: 03/31/2019] [Accepted: 10/01/2019] [Indexed: 11/10/2022] Open
Abstract
Multiplex assays of variant effect (MAVEs), such as deep mutational scans and massively parallel reporter assays, test thousands of sequence variants in a single experiment. Despite the importance of MAVE data for basic and clinical research, there is no standard resource for their discovery and distribution. Here, we present MaveDB ( https://www.mavedb.org ), a public repository for large-scale measurements of sequence variant impact, designed for interoperability with applications to interpret these datasets. We also describe the first such application, MaveVis, which retrieves, visualizes, and contextualizes variant effect maps. Together, the database and applications will empower the community to mine these powerful datasets.
Affiliation(s)
- Daniel Esposito
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
| | - Jochen Weile
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Lea M Starita
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Anthony T Papenfuss
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
- Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
| | - Frederick P Roth
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada.
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
- Department of Computer Science, University of Toronto, Toronto, ON, Canada.
- Canadian Institute for Advanced Research, Toronto, ON, Canada.
| | - Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Canadian Institute for Advanced Research, Toronto, ON, Canada.
- Department of Bioengineering, University of Washington, Seattle, WA, USA.
| | - Alan F Rubin
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia.
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia.
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.
|
27
|
Penzar DD, Zinkevich AO, Vorontsov IE, Sitnik VV, Favorov AV, Makeev VJ, Kulakovskiy IV. What Do Neighbors Tell About You: The Local Context of Cis-Regulatory Modules Complicates Prediction of Regulatory Variants. Front Genet 2019; 10:1078. [PMID: 31737053 PMCID: PMC6834773 DOI: 10.3389/fgene.2019.01078] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/15/2019] [Accepted: 10/09/2019] [Indexed: 02/05/2023] Open
Abstract
Many problems of modern genetics and functional genomics require the assessment of functional effects of sequence variants, including gene expression changes. Machine learning is considered to be a promising approach for solving this task, but its practical applications remain a challenge due to the insufficient volume and diversity of training data. A promising source of valuable data is a saturation mutagenesis massively parallel reporter assay, which quantitatively measures changes in transcription activity caused by sequence variants. Here, we explore the computational predictions of the effects of individual single-nucleotide variants on gene transcription measured in the massively parallel reporter assays, based on the data from the recent "Regulation Saturation" Critical Assessment of Genome Interpretation challenge. We show that the estimated prediction quality strongly depends on the structure of the training and validation data. Particularly, training on the sequence segments located next to the validation data results in the "information leakage" caused by the local context. This information leakage allows reproducing the prediction quality of the best CAGI challenge submissions with a fairly simple machine learning approach, and even obtaining notably better-than-random predictions using irrelevant genomic regions. Validation scenarios preventing such information leakage dramatically reduce the measured prediction quality. The performance at independent regulatory regions entirely excluded from the training set appears to be much lower than needed for practical applications, and even the performance estimation will become reliable only in the future with richer data from multiple reporters. The source code and data are available at https://bitbucket.org/autosomeru_cagi2018/cagi2018_regsat and https://genomeinterpretation.org/content/expression-variants.
Affiliation(s)
- Dmitry D. Penzar
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Department of Medical and Biological Physics, Moscow Institute of Physics and Technology (State University), Dolgoprudny, Russia
| | - Arsenii O. Zinkevich
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
| | - Ilya E. Vorontsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
| | - Vasily V. Sitnik
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
| | - Alexander V. Favorov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD, United States
| | - Vsevolod J. Makeev
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Department of Medical and Biological Physics, Moscow Institute of Physics and Technology (State University), Dolgoprudny, Russia
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - Ivan V. Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Institute of Mathematical Problems of Biology RAS - the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Pushchino, Russia
|
28
|
Empirical measures of mutational effects define neutral models of regulatory evolution in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 2019; 116:21085-21093. [PMID: 31570626 DOI: 10.1073/pnas.1902823116] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 11/18/2022] Open
Abstract
Understanding how phenotypes evolve requires disentangling the effects of mutation generating new variation from the effects of selection filtering it. Tests for selection frequently assume that mutation introduces phenotypic variation symmetrically around the population mean, yet few studies have tested this assumption by deeply sampling the distributions of mutational effects for particular traits. Here, we examine distributions of mutational effects for gene expression in the budding yeast Saccharomyces cerevisiae by measuring the effects of thousands of point mutations introduced randomly throughout the genome. We find that the distributions of mutational effects differ for the 10 genes surveyed and are inconsistent with normality. For example, all 10 distributions of mutational effects included more mutations with large effects than expected for normally distributed phenotypes. In addition, some genes also showed asymmetries in their distribution of mutational effects, with new mutations more likely to increase than decrease the gene's expression or vice versa. Neutral models of regulatory evolution that take these empirically determined distributions into account suggest that neutral processes may explain more expression variation within natural populations than currently appreciated.
|
29
|
Vainberg Slutskin I, Weinberger A, Segal E. Sequence determinants of polyadenylation-mediated regulation. Genome Res 2019; 29:1635-1647. [PMID: 31530582 PMCID: PMC6771402 DOI: 10.1101/gr.247312.118] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/09/2018] [Accepted: 08/13/2019] [Indexed: 12/31/2022]
Abstract
The cleavage and polyadenylation reaction is a crucial step in transcription termination and pre-mRNA maturation in human cells. Despite extensive research, the encoding of polyadenylation-mediated regulation of gene expression within the DNA sequence is not well understood. Here, we utilized a massively parallel reporter assay to inspect the effect of over 12,000 rationally designed polyadenylation sequences (PASs) on reporter gene expression and cleavage efficiency. We find that the PAS sequence can modulate gene expression by over five orders of magnitude. By using a uniquely designed scanning mutagenesis data set, we gain mechanistic insight into various modes of action by which the cleavage efficiency affects the sensitivity or robustness of the PAS to mutation. Furthermore, we employ motif discovery to identify both known and novel sequence motifs associated with PAS-mediated regulation. By leveraging the large scale of our data, we train a deep learning model for the highly accurate prediction of RNA levels from DNA sequence alone (R = 0.83). Moreover, we devise unique approaches for predicting exact cleavage sites for our reporter constructs and for endogenous transcripts. Taken together, our results expand our understanding of PAS-mediated regulation, and provide an unprecedented resource for analyzing and predicting PAS for regulatory genomics applications.
Affiliation(s)
- Ilya Vainberg Slutskin
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
| | - Adina Weinberger
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
| | - Eran Segal
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
|
30
|
Kreimer A, Yan Z, Ahituv N, Yosef N. Meta-analysis of massively parallel reporter assays enables prediction of regulatory function across cell types. Hum Mutat 2019; 40:1299-1313. [PMID: 31131957 PMCID: PMC6771677 DOI: 10.1002/humu.23820] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/16/2019] [Revised: 05/18/2019] [Accepted: 05/24/2019] [Indexed: 01/01/2023]
Abstract
Deciphering the potential of noncoding loci to influence gene regulation has been the subject of intense research, with important implications in understanding genetic underpinnings of human diseases. Massively parallel reporter assays (MPRAs) can measure regulatory activity of thousands of DNA sequences and their variants in a single experiment. With an increasing number of publicly available MPRA data sets, one can now develop data-driven models which, given a DNA sequence, predict its regulatory activity. Here, we performed a comprehensive meta-analysis of several MPRA data sets in a variety of cellular contexts. We first applied an ensemble of methods to predict MPRA output in each context and observed that the most predictive features are consistent across data sets. We then demonstrate that predictive models trained in one cellular context can be used to predict MPRA output in another, with loss of accuracy attributed to cell-type-specific features. Finally, we show that our approach achieves top performance in the Fifth Critical Assessment of Genome Interpretation "Regulation Saturation" Challenge for predicting effects of single-nucleotide variants. Overall, our analysis provides insights into how MPRA data can be leveraged to highlight functional regulatory regions throughout the genome and can guide effective design of future experiments by better prioritizing regions of interest.
Affiliation(s)
- Anat Kreimer
- Department of Electrical Engineering and Computer Sciences, Center for Computational Biology, University of California, Berkeley, California
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California
| | - Zhongxia Yan
- Department of Electrical Engineering and Computer Sciences, Center for Computational Biology, University of California, Berkeley, California
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California
| | - Nir Yosef
- Department of Electrical Engineering and Computer Sciences, Center for Computational Biology, University of California, Berkeley, California
- Ragon Institute of MGH, MIT and Harvard, Cambridge, Massachusetts
- Chan Zuckerberg Biohub, San Francisco, California
|
31
|
Wollman AJM, Hedlund EG, Shashkova S, Leake MC. Towards mapping the 3D genome through high speed single-molecule tracking of functional transcription factors in single living cells. Methods 2019; 170:82-89. [PMID: 31252059 PMCID: PMC6971689 DOI: 10.1016/j.ymeth.2019.06.021] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/05/2019] [Accepted: 06/22/2019] [Indexed: 10/26/2022] Open
Abstract
How genomic DNA is organized in the nucleus is a long-standing question. We describe a single-molecule bioimaging method utilizing super-localization precision coupled to fully quantitative image analysis tools, towards determining snapshots of parts of the 3D genome architecture of the model eukaryote budding yeast Saccharomyces cerevisiae with exceptional millisecond time resolution. We employ astigmatism imaging to enable robust extraction of 3D position data on genomically encoded fluorescent protein reporters that bind to DNA. Our relatively straightforward method enables snippets of 3D architectures of likely single genome conformations to be resolved, as captured via DNA-sequence specific binding proteins in single functional living cells.
Affiliation(s)
- Adam J M Wollman
- Biological Physical Science Institute, Departments of Physics and Biology, University of York, YO10 5DD York, UK.
| | - Erik G Hedlund
- Biological Physical Science Institute, Departments of Physics and Biology, University of York, YO10 5DD York, UK.
| | - Sviatlana Shashkova
- Biological Physical Science Institute, Departments of Physics and Biology, University of York, YO10 5DD York, UK.
| | - Mark C Leake
- Biological Physical Science Institute, Departments of Physics and Biology, University of York, YO10 5DD York, UK.
|
32
|
Wang X, Zhou T, Wunderlich Z, Maurano MT, DePace AH, Nuzhdin SV, Rohs R. Analysis of Genetic Variation Indicates DNA Shape Involvement in Purifying Selection. Mol Biol Evol 2019; 35:1958-1967. [PMID: 29850830 PMCID: PMC6063282 DOI: 10.1093/molbev/msy099] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/18/2022] Open
Abstract
Noncoding DNA sequences, which play various roles in gene expression and regulation, are under evolutionary pressure. Gene regulation requires specific protein–DNA binding events, and our previous studies showed that both DNA sequence and shape readout are employed by transcription factors (TFs) to achieve DNA binding specificity. By investigating the shape-disrupting properties of single nucleotide polymorphisms (SNPs) in human regulatory regions, we established a link between disruptive local DNA shape changes and loss of specific TF binding. Furthermore, we described cases where disease-associated SNPs may alter TF binding through DNA shape changes. This link led us to hypothesize that local DNA shape within and around TF binding sites is under selection pressure. To verify this hypothesis, we analyzed SNP data derived from 216 natural strains of Drosophila melanogaster. Comparing SNPs located in functional and nonfunctional regions within experimentally validated cis-regulatory modules (CRMs) from D. melanogaster that are active in the blastoderm stage of development, we found that SNPs within functional regions tended to cause smaller DNA shape variations. Furthermore, SNPs with higher minor allele frequency were more likely to result in smaller DNA shape variations. The same analysis based on a large number of SNPs in putative CRMs of the D. melanogaster genome derived from DNase I accessibility data confirmed these observations. Taken together, our results indicate that common SNPs in functional regions tend to maintain DNA shape, whereas shape-disrupting SNPs are more likely to be eliminated through purifying selection.
Affiliation(s)
- Xiaofei Wang
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA
| | - Tianyin Zhou
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA
| | - Zeba Wunderlich
- Department of Developmental and Cell Biology, University of California Irvine, Irvine, CA
| | - Matthew T Maurano
- Institute for Systems Genetics, New York University Medical Center, New York, NY
| | - Angela H DePace
- Department of Systems Biology, Harvard Medical School, Boston, MA
| | - Sergey V Nuzhdin
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA
| | - Remo Rohs
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA.,Departments of Chemistry, Physics and Astronomy, and Computer Science, University of Southern California, Los Angeles, CA
| |
Collapse
|
33
|
Kinney JB, McCandlish DM. Massively Parallel Assays and Quantitative Sequence-Function Relationships. Annu Rev Genomics Hum Genet 2019; 20:99-127. [PMID: 31091417 DOI: 10.1146/annurev-genom-083118-014845] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Indexed: 11/09/2022]
Abstract
Over the last decade, a rich variety of massively parallel assays have revolutionized our understanding of how biological sequences encode quantitative molecular phenotypes. These assays include deep mutational scanning, high-throughput SELEX, and massively parallel reporter assays. Here, we review these experimental methods and how the data they produce can be used to quantitatively model sequence-function relationships. In doing so, we touch on a diverse range of topics, including the identification of clinically relevant genomic variants, the modeling of transcription factor binding to DNA, the functional and evolutionary landscapes of proteins, and cis-regulatory mechanisms in both transcription and mRNA splicing. We further describe a unified conceptual framework and a core set of mathematical modeling strategies that studies in these diverse areas can make use of. Finally, we highlight key aspects of experimental design and mathematical modeling that are important for the results of such studies to be interpretable and reproducible.
Affiliation(s)
- Justin B Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
|
34
|
Qiu C, Kaplan CD. Functional assays for transcription mechanisms in high-throughput. Methods 2019; 159-160:115-123. [PMID: 30797033 PMCID: PMC6589137 DOI: 10.1016/j.ymeth.2019.02.017] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/29/2019] [Accepted: 02/18/2019] [Indexed: 01/12/2023] Open
Abstract
Dramatic increases in the scale of programmed synthesis of nucleic acid libraries coupled with deep sequencing have powered advances in understanding nucleic acid and protein biology. Biological systems centering on nucleic acids or encoded proteins greatly benefit from such high-throughput studies, given that large DNA variant pools can be synthesized and DNA, or RNA products of transcription, can be easily analyzed by deep sequencing. Here we review the scope of various high-throughput functional assays for studies of nucleic acids and proteins in general, followed by discussion of how these types of study have yielded insights into the RNA Polymerase II (Pol II) active site as an example. We discuss methodological considerations in the design and execution of these experiments that should be valuable to studies in any system.
Affiliation(s)
- Chenxi Qiu
- Department of Medicine, Division of Translational Therapeutics, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA; Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
- Craig D Kaplan
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA.
|
35
|
Swank Z, Laohakunakorn N, Maerkl SJ. Cell-free gene-regulatory network engineering with synthetic transcription factors. Proc Natl Acad Sci U S A 2019; 116:5892-5901. [PMID: 30850530 PMCID: PMC6442555 DOI: 10.1073/pnas.1816591116] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Indexed: 12/26/2022] Open
Abstract
Gene-regulatory networks are ubiquitous in nature and critical for bottom-up engineering of synthetic networks. Transcriptional repression is a fundamental function that can be tuned at the level of DNA, protein, and cooperative protein-protein interactions, necessitating high-throughput experimental approaches for in-depth characterization. Here, we used a cell-free system in combination with a high-throughput microfluidic device to comprehensively study the different tuning mechanisms of a synthetic zinc-finger repressor library, whose affinity and cooperativity can be rationally engineered. The device is integrated into a comprehensive workflow that includes determination of transcription-factor binding-energy landscapes and mechanistic modeling, enabling us to generate a library of well-characterized synthetic transcription factors and corresponding promoters, which we then used to build gene-regulatory networks de novo. The well-characterized synthetic parts and insights gained should be useful for rationally engineering gene-regulatory networks and for studying the biophysics of transcriptional regulation.
Affiliation(s)
- Zoe Swank
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Nadanai Laohakunakorn
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Sebastian J Maerkl
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
|
36
|
Myint L, Avramopoulos DG, Goff LA, Hansen KD. Linear models enable powerful differential activity analysis in massively parallel reporter assays. BMC Genomics 2019; 20:209. [PMID: 30866806 PMCID: PMC6417258 DOI: 10.1186/s12864-019-5556-x] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 11/30/2018] [Accepted: 02/22/2019] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND: Massively parallel reporter assays (MPRAs) have emerged as a popular means for understanding noncoding variation in a variety of conditions. While a large number of experiments have been described in the literature, analysis typically uses ad-hoc methods. There has been little attention to comparing performance of methods across datasets. RESULTS: We present the mpralm method, which we show is calibrated and powerful, by analyzing its performance on multiple MPRA datasets. We show that it outperforms existing statistical methods for analysis of this data type, in the first comprehensive evaluation of statistical methods on several datasets. We investigate theoretical and real-data properties of barcode summarization methods and show an unappreciated impact of summarization method for some datasets. Finally, we use our model to conduct a power analysis for this assay and show substantial improvements in power by performing up to 6 replicates per condition, whereas sequencing depth has a smaller impact; we recommend always using at least 4 replicates. An R package is available from the Bioconductor project. CONCLUSIONS: Together, these results inform recommendations for differential analysis, general group comparisons, and power analysis and will help improve design and analysis of MPRA experiments.
Affiliation(s)
- Leslie Myint
- Department of Mathematics, Statistics, and Computer Science, Macalester College, 1600 Grand Ave, Saint Paul, MN 55105 USA
- Loyal A. Goff
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, USA
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, USA
- Kasper D. Hansen
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe St, E3527, Baltimore, MD 21212 USA
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, USA
|
37
|
Hartl D, Krebs AR, Grand RS, Baubec T, Isbel L, Wirbelauer C, Burger L, Schübeler D. CG dinucleotides enhance promoter activity independent of DNA methylation. Genome Res 2019; 29:554-563. [PMID: 30709850 PMCID: PMC6442381 DOI: 10.1101/gr.241653.118] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 07/16/2018] [Accepted: 01/24/2019] [Indexed: 11/24/2022]
Abstract
Most mammalian RNA polymerase II initiation events occur at CpG islands, which are rich in CpGs and devoid of DNA methylation. Despite their relevance for gene regulation, it is unknown to what extent the CpG dinucleotide itself actually contributes to promoter activity. To address this question, we determined the transcriptional activity of a large number of chromosomally integrated promoter constructs and monitored binding of transcription factors assumed to play a role in CpG island activity. This revealed that CpG density significantly improves motif-based prediction of transcription factor binding. Our experiments also show that high CpG density alone is insufficient for transcriptional activity, yet results in increased transcriptional output when combined with particular transcription factor motifs. However, this CpG contribution to promoter activity is independent of DNA methyltransferase activity. Together, this refines our understanding of mammalian promoter regulation as it shows that high CpG density within CpG islands directly contributes to an environment permissive for full transcriptional activity.
Affiliation(s)
- Dominik Hartl
- Friedrich Miescher Institute for Biomedical Research, CH 4058 Basel, Switzerland; Faculty of Sciences, University of Basel, CH 4003 Basel, Switzerland
- Arnaud R Krebs
- Friedrich Miescher Institute for Biomedical Research, CH 4058 Basel, Switzerland
- Ralph S Grand
- Friedrich Miescher Institute for Biomedical Research, CH 4058 Basel, Switzerland
- Tuncay Baubec
- Friedrich Miescher Institute for Biomedical Research, CH 4058 Basel, Switzerland
- Luke Isbel
- Friedrich Miescher Institute for Biomedical Research, CH 4058 Basel, Switzerland
- Lukas Burger
- Friedrich Miescher Institute for Biomedical Research, CH 4058 Basel, Switzerland; Swiss Institute of Bioinformatics, CH 4058 Basel, Switzerland
- Dirk Schübeler
- Friedrich Miescher Institute for Biomedical Research, CH 4058 Basel, Switzerland; Faculty of Sciences, University of Basel, CH 4003 Basel, Switzerland
|
38
|
Shapshak P, Balaji S, Kangueane P, Chiappelli F, Somboonwit C, Menezes LJ, Sinnott JT. Innovative Technologies for Advancement of WHO Risk Group 4 Pathogens Research. GLOBAL VIROLOGY III: VIROLOGY IN THE 21ST CENTURY 2019. [PMCID: PMC7122670 DOI: 10.1007/978-3-030-29022-1_15] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/25/2022]
Affiliation(s)
- Paul Shapshak
- Department of Internal Medicine, University of South Florida, Tampa, FL USA
- Seetharaman Balaji
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka India
- Francesco Chiappelli
- Oral Biology and Medicine, CHS 63-090, UCLA School of Dentistry, Los Angeles, CA USA
- Lynette J. Menezes
- Department of Internal Medicine, University of South Florida, Tampa, FL USA
- John T. Sinnott
- Department of Internal Medicine, University of South Florida, Tampa, FL USA
|
39
|
Forcier TL, Ayaz A, Gill MS, Jones D, Phillips R, Kinney JB. Measuring cis-regulatory energetics in living cells using allelic manifolds. eLife 2018; 7:40618. [PMID: 30570483 PMCID: PMC6301791 DOI: 10.7554/elife.40618] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 07/31/2018] [Accepted: 11/27/2018] [Indexed: 12/04/2022] Open
Abstract
Gene expression in all organisms is controlled by cooperative interactions between DNA-bound transcription factors (TFs), but quantitatively measuring TF-DNA and TF-TF interactions remains difficult. Here we introduce a strategy for precisely measuring the Gibbs free energy of such interactions in living cells. This strategy centers on the measurement and modeling of ‘allelic manifolds’, a multidimensional generalization of the classical genetics concept of allelic series. Allelic manifolds are measured using reporter assays performed on strategically designed cis-regulatory sequences. Quantitative biophysical models are then fit to the resulting data. We used this strategy to study regulation by two Escherichia coli TFs, CRP and σ70 RNA polymerase. Doing so, we consistently obtained energetic measurements precise to ∼0.1 kcal/mol. We also obtained multiple results that deviate from the prior literature. Our strategy is compatible with massively parallel reporter assays in both prokaryotes and eukaryotes, and should therefore be highly scalable and broadly applicable. Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that minor issues remain unresolved (see decision letter).
Affiliation(s)
- Talitha L Forcier
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, United States
- Andalus Ayaz
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, United States
- Manraj S Gill
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, United States
- Daniel Jones
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, United States; Department of Applied Physics, California Institute of Technology, Pasadena, United States
- Rob Phillips
- Department of Applied Physics, California Institute of Technology, Pasadena, United States
- Justin B Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, United States
|
40
|
Chakravorty S, Hegde M. Inferring the effect of genomic variation in the new era of genomics. Hum Mutat 2018; 39:756-773. [PMID: 29633501 DOI: 10.1002/humu.23427] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 11/13/2017] [Revised: 03/20/2018] [Accepted: 03/28/2018] [Indexed: 12/11/2022]
Abstract
Accurate and detailed understanding of the effects of variants in the coding and noncoding regions of the genome is the next big challenge in the new genomic era of personalized medicine, especially to tackle newer findings of genetic and phenotypic heterogeneity of diseases. This is necessary to resolve the gene-variant-disease relationship, the pathogenic variant spectrum of genes, pathogenic variants with variable clinical consequences, and multiloci diseases. In turn, this will facilitate patient recruitment for relevant clinical trials. In this review, we describe the trends in research at the intersection of basic and clinical genomics aiming to (a) overcome molecular diagnostic challenges and increase the clinical utility of next-generation sequencing (NGS) platforms, (b) elucidate variants associated with disease, and (c) determine overall genomic complexity including epistasis, complex inheritance patterns such as "synergistic heterozygosity," digenic/multigenic inheritance, modifier effect, and rare variant load. We describe the newly emerging field of integrated functional genomics, in vivo or in vitro large-scale functional approaches, and statistical bioinformatics algorithms that support NGS genomics data to interpret variants for timely clinical diagnostics and disease management, thus facilitating the discovery of new therapeutic or biomarker options and their roles in the future of personalized medicine.
Affiliation(s)
- Samya Chakravorty
- Department of Human Genetics, Emory University School of Medicine, Whitehead Biomedical Research Building Suite 301, Atlanta, Georgia
- Madhuri Hegde
- Department of Human Genetics, Emory University School of Medicine, Whitehead Biomedical Research Building Suite 301, Atlanta, Georgia
|
41
|
Park J, Wang HH. Systematic and synthetic approaches to rewire regulatory networks. CURRENT OPINION IN SYSTEMS BIOLOGY 2018; 8:90-96. [PMID: 30637352 PMCID: PMC6329604 DOI: 10.1016/j.coisb.2017.12.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/11/2022]
Abstract
Microbial gene regulatory networks are composed of cis- and trans-components that in concert act to control essential and adaptive cellular functions. Regulatory components and interactions evolve to adopt new configurations through mutations and network rewiring events, resulting in novel phenotypes that may benefit the cell. Advances in high-throughput DNA synthesis and sequencing have enabled the development of new tools and approaches to better characterize and perturb various elements of regulatory networks. Here, we highlight key recent approaches to systematically dissect the sequence space of cis-regulatory elements and trans-regulators as well as their inter-connections. These efforts yield fundamental insights into the architecture, robustness, and dynamics of gene regulation and provide models and design principles for building synthetic regulatory networks for a variety of practical applications.
Affiliation(s)
- Jimin Park
- Department of Systems Biology, Columbia University Medical Center, New York, USA
- Integrated Program in Cellular, Molecular and Biomedical Studies, Columbia University Medical Center, New York, USA
- Harris H Wang
- Department of Systems Biology, Columbia University Medical Center, New York, USA
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, USA
|
42
|
Iyer S, Acharya KR, Subramanian V. A comparative bioinformatic analysis of C9orf72. PeerJ 2018; 6:e4391. [PMID: 29479499 PMCID: PMC5822839 DOI: 10.7717/peerj.4391] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/05/2017] [Accepted: 01/29/2018] [Indexed: 12/12/2022] Open
Abstract
C9orf72 is associated with frontotemporal dementia (FTD) and Amyotrophic Lateral Sclerosis (ALS), both of which are devastating neurodegenerative diseases. Findings suggest that an expanded hexanucleotide repeat in the non-coding region of the C9orf72 gene is the most common cause of familial FTD and ALS. Despite considerable efforts being made towards discerning the possible disease-causing mechanism(s) of this repeat expansion mutation, the biological function of C9orf72 remains unclear. Here, we present the first comprehensive genomic study on the C9orf72 gene. Analysis of the genomic level organization of C9orf72 across select species revealed architectural similarity of syntenic regions between human and mouse but a lack of conservation of the repeat-harboring intron 1 sequence. Information generated in this study provides a broad genomic perspective of C9orf72 which would form a basis for subsequent experimental approaches and facilitate future mechanistic and functional studies on this gene.
Affiliation(s)
- Shalini Iyer
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- K Ravi Acharya
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- Vasanta Subramanian
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
|
43
|
Unraveling the determinants of microRNA mediated regulation using a massively parallel reporter assay. Nat Commun 2018; 9:529. [PMID: 29410437 PMCID: PMC5802814 DOI: 10.1038/s41467-018-02980-z] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 04/28/2017] [Accepted: 01/11/2018] [Indexed: 12/16/2022] Open
Abstract
Despite extensive research, the sequence features affecting microRNA-mediated regulation are not well understood, limiting our ability to predict gene expression levels in both native and synthetic sequences. Here we employed a massively parallel reporter assay to investigate the effect of over 14,000 rationally designed 3′ UTR sequences on reporter construct repression. We found that multiple factors, including microRNA identity, hybridization energy, target accessibility, and target multiplicity, can be manipulated to achieve a predictable, up to 57-fold, change in protein repression. Moreover, we predict protein repression and RNA levels with high accuracy (R = 0.84 and R = 0.80, respectively) using only 3′ UTR sequence, as well as the effect of mutation in native 3′ UTRs on protein repression (R = 0.63). Taken together, our results elucidate the effect of different sequence features on miRNA-mediated regulation and demonstrate the predictability of their effect on gene expression with applications in regulatory genomics and synthetic biology. MiRNAs are known regulators of gene expression. Here the authors perform a large-scale massively parallel reporter assay to investigate the effect of a large number of designed 3′ UTR sequences on reporter expression and assess how features of miRNA regulatory elements affect miRNA-mediated repression.
|
44
|
Gan KA, Carrasco Pro S, Sewell JA, Fuxman Bass JI. Identification of Single Nucleotide Non-coding Driver Mutations in Cancer. Front Genet 2018; 9:16. [PMID: 29456552 PMCID: PMC5801294 DOI: 10.3389/fgene.2018.00016] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 12/05/2017] [Accepted: 01/12/2018] [Indexed: 12/14/2022] Open
Abstract
Recent whole-genome sequencing studies have identified millions of somatic variants present in tumor samples. Most of these variants reside in non-coding regions of the genome potentially affecting transcriptional and post-transcriptional gene regulation. Although a few hallmark examples of driver mutations in non-coding regions have been reported, the functional role of the vast majority of somatic non-coding variants remains to be determined. This is because the few driver variants in each sample must be distinguished from the thousands of passenger variants and because the logic of regulatory element function has not yet been fully elucidated. Thus, variants prioritized based on mutational burden and location within regulatory elements need to be validated experimentally. This is generally achieved by combining assays that measure physical binding, such as chromatin immunoprecipitation, with those that determine regulatory activity, such as luciferase reporter assays. Here, we present an overview of in silico approaches used to prioritize somatic non-coding variants and the experimental methods used for functional validation and characterization.
Affiliation(s)
- Kok A Gan
- Department of Biology, Boston University, Boston, MA, United States
- Jared A Sewell
- Department of Biology, Boston University, Boston, MA, United States
|
45
|
A Simple Grammar Defines Activating and Repressing cis-Regulatory Elements in Photoreceptors. Cell Rep 2017; 17:1247-1254. [PMID: 27783940 DOI: 10.1016/j.celrep.2016.09.066] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 05/05/2016] [Revised: 08/06/2016] [Accepted: 09/20/2016] [Indexed: 12/22/2022] Open
Abstract
Transcription factors often activate and repress different target genes in the same cell. How activation and repression are encoded by different arrangements of transcription factor binding sites in cis-regulatory elements is poorly understood. We investigated how sites for the transcription factor CRX encode both activation and repression in photoreceptors by assaying thousands of genomic and synthetic cis-regulatory elements in wild-type and Crx-/- retinas. We found that sequences with high affinity for CRX repress transcription, whereas sequences with lower affinity activate. This rule is modified by a cooperative interaction between CRX sites and sites for the transcription factor NRL, which overrides the repressive effect of high affinity for CRX. Our results show how simple rearrangements of transcription factor binding sites encode qualitatively different responses to a single transcription factor and explain how CRX plays multiple cis-regulatory roles in the same cell.
|
46
|
Hartl D, Krebs AR, Jüttner J, Roska B, Schübeler D. Cis-regulatory landscapes of four cell types of the retina. Nucleic Acids Res 2017; 45:11607-11621. [PMID: 29059322 PMCID: PMC5714137 DOI: 10.1093/nar/gkx923] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 05/17/2017] [Revised: 07/28/2017] [Accepted: 10/02/2017] [Indexed: 12/18/2022] Open
Abstract
The retina is composed of ∼50 cell-types with specific functions for the process of vision. Identification of the cis-regulatory elements active in retinal cell-types is key to elucidating the networks controlling this diversity. Here, we combined transcriptome and epigenome profiling to map the regulatory landscape of four cell-types isolated from mouse retinas including rod and cone photoreceptors as well as rare inter-neuron populations such as horizontal and starburst amacrine cells. Integration of this information reveals sequence determinants and candidate transcription factors for controlling cellular specialization. Additionally, we refined parallel reporter assays to enable studying the transcriptional activity of large collections of sequences in individual cell-types isolated from a tissue. We provide proof of concept for this approach and its scalability by characterizing the transcriptional capacity of several hundred putative regulatory sequences within individual retinal cell-types. This generates a catalogue of cis-regulatory regions active in retinal cell types and we further demonstrate their utility as a potential resource for cellular tagging and manipulation.
Affiliation(s)
- Dominik Hartl
- Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, CH 4058 Basel, Switzerland
- University of Basel, Faculty of Sciences, Petersplatz 1, CH 4003 Basel, Switzerland
- Arnaud R. Krebs
- Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, CH 4058 Basel, Switzerland
- Josephine Jüttner
- Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, CH 4058 Basel, Switzerland
- Botond Roska
- Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, CH 4058 Basel, Switzerland
- University of Basel, Department of Ophthalmology, Mittlere Strasse 91, CH 4031 Basel, Switzerland
- Dirk Schübeler
- Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, CH 4058 Basel, Switzerland
- University of Basel, Faculty of Sciences, Petersplatz 1, CH 4003 Basel, Switzerland
|
47
|
Brown AJ, Gibson SJ, Hatton D, James DC. In silico design of context-responsive mammalian promoters with user-defined functionality. Nucleic Acids Res 2017; 45:10906-10919. [PMID: 28977454 PMCID: PMC5737543 DOI: 10.1093/nar/gkx768] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 05/23/2017] [Accepted: 08/22/2017] [Indexed: 12/19/2022] Open
Abstract
Comprehensive de novo-design of complex mammalian promoters is restricted by unpredictable combinatorial interactions between constituent transcription factor regulatory elements (TFREs). In this study, we show that modular binding sites that do not function cooperatively can be identified by analyzing host cell transcription factor expression profiles, and subsequently testing cognate TFRE activities in varying homotypic and heterotypic promoter architectures. TFREs that displayed position-insensitive, additive function within a specific expression context could be rationally combined together in silico to create promoters with highly predictable activities. As TFRE order and spacing did not affect the performance of these TFRE-combinations, compositions could be specifically arranged to preclude the formation of undesirable sequence features. This facilitated simple in silico-design of promoters with context-required, user-defined functionalities. To demonstrate this, we de novo-created promoters for biopharmaceutical production in CHO cells that exhibited precisely designed activity dynamics and long-term expression-stability, without causing observable retroactive effects on cellular performance. The design process described can be utilized for applications requiring context-responsive, customizable promoter function, particularly where co-expression of synthetic TFs is not suitable. Although the synthetic promoter structure utilized does not closely resemble native mammalian architectures, our findings also provide additional support for a flexible billboard model of promoter regulation.
Affiliation(s)
- Adam J Brown
- Department of Chemical and Biological Engineering, University of Sheffield, Mappin St., Sheffield S1 3JD, UK
- Suzanne J Gibson
- Biopharmaceutical Development, MedImmune, Cambridge CB21 6GH, UK
- Diane Hatton
- Biopharmaceutical Development, MedImmune, Cambridge CB21 6GH, UK
- David C James
- Department of Chemical and Biological Engineering, University of Sheffield, Mappin St., Sheffield S1 3JD, UK
|
48
|
Kreimer A, Zeng H, Edwards MD, Guo Y, Tian K, Shin S, Welch R, Wainberg M, Mohan R, Sinnott-Armstrong NA, Li Y, Eraslan G, Amin TB, Goke J, Mueller NS, Kellis M, Kundaje A, Beer MA, Keles S, Gifford DK, Yosef N. Predicting gene expression in massively parallel reporter assays: A comparative study. Hum Mutat 2017; 38:1240-1250. [PMID: 28220625 PMCID: PMC5560998 DOI: 10.1002/humu.23197] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 08/26/2016] [Revised: 01/19/2017] [Accepted: 02/12/2017] [Indexed: 02/03/2023]
Abstract
In many human diseases, associated genetic changes tend to occur within noncoding regions, whose effect might be related to transcriptional control. A central goal in human genetics is to understand the function of such noncoding regions: given a region that is statistically associated with changes in gene expression (expression quantitative trait locus [eQTL]), does it in fact play a regulatory role? And if so, how is this role "coded" in its sequence? These questions were the subject of the Critical Assessment of Genome Interpretation eQTL challenge. Participants were given a set of sequences that flank eQTLs in humans and were asked to predict whether these are capable of regulating transcription (as evaluated by massively parallel reporter assays), and whether this capability changes between alternative alleles. Here, we report lessons learned from this community effort. By inspecting predictive properties in isolation, and conducting meta-analysis over the competing methods, we find that using chromatin accessibility and transcription factor binding as features in an ensemble of classifiers or regression models leads to the most accurate results. We then characterize the loci that are harder to predict, putting the spotlight on areas of weakness, which we expect to be the subject of future studies.
Affiliation(s)
- Anat Kreimer
  - Department of Electrical Engineering and Computer Science and Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA
  - Department of Bioengineering and Therapeutic Sciences, Institute for Human Genetics, University of California, San Francisco, San Francisco, California, USA
- Haoyang Zeng
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Matthew D. Edwards
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Yuchun Guo
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Kevin Tian
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Sunyoung Shin
  - Department of Statistics, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Rene Welch
  - Department of Statistics, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Michael Wainberg
  - Department of Genetics, Stanford University School of Medicine, Department of Computer Science, Stanford, California 94305, USA
- Rahul Mohan
  - Department of Genetics, Stanford University School of Medicine, Department of Computer Science, Stanford, California 94305, USA
- Nicholas A. Sinnott-Armstrong
  - Department of Genetics, Stanford University School of Medicine, Department of Computer Science, Stanford, California 94305, USA
- Yue Li
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, Massachusetts 02139, USA
- Gökcen Eraslan
  - Computational Cell Maps, Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Talal Bin Amin
  - Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore
- Jonathan Goke
  - Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore
- Nikola S. Mueller
  - Computational Cell Maps, Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Manolis Kellis
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, Massachusetts 02139, USA
- Anshul Kundaje
  - Department of Genetics, Stanford University School of Medicine, Department of Computer Science, Stanford, California 94305, USA
- Michael A Beer
  - McKusick-Nathans Institute of Genetic Medicine, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Sunduz Keles
  - Department of Statistics, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- David K. Gifford
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Nir Yosef
  - Department of Electrical Engineering and Computer Science and Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA
  - Ragon Institute of Massachusetts General Hospital, MIT and Harvard, Cambridge, MA 02139
|
49
|
Huminiecki Ł, Horbańczuk J. Can We Predict Gene Expression by Understanding Proximal Promoter Architecture? Trends Biotechnol 2017; 35:530-546. [PMID: 28377102 DOI: 10.1016/j.tibtech.2017.03.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 12/07/2016] [Revised: 02/14/2017] [Accepted: 03/09/2017] [Indexed: 10/19/2022]
Abstract
We review computational predictions of expression from the promoter architecture - the set of transcription factors that can bind the proximal promoter. We focus on spatial expression patterns in animals with complex body plans and many distinct tissue types. This field is ripe for change as functional genomics datasets accumulate for both expression and protein-DNA interactions. While there has been some success in predicting the breadth of expression (i.e., the fraction of tissue types a gene is expressed in), predicting tissue specificity remains challenging. We discuss how progress can be achieved through either machine learning or complementary combinatorial data mining. The likely impact of single-cell expression data is considered. Finally, we discuss the design of artificial promoters as a practical application.
Affiliation(s)
- Łukasz Huminiecki
  - Institute of Genetics and Animal Breeding, Polish Academy of Sciences, ul. Postępu 36A, Jastrzębiec, 05-552 Magdalenka, Poland
- Jarosław Horbańczuk
  - Institute of Genetics and Animal Breeding, Polish Academy of Sciences, ul. Postępu 36A, Jastrzębiec, 05-552 Magdalenka, Poland
|
50
|
Inukai S, Kock KH, Bulyk ML. Transcription factor-DNA binding: beyond binding site motifs. Curr Opin Genet Dev 2017; 43:110-119. [PMID: 28359978 PMCID: PMC5447501 DOI: 10.1016/j.gde.2017.02.007] [Citation(s) in RCA: 189] [Impact Index Per Article: 27.0] [Received: 09/28/2016] [Revised: 02/02/2017] [Accepted: 02/07/2017] [Indexed: 12/12/2022]
Abstract
Sequence-specific transcription factors (TFs) regulate gene expression by binding to cis-regulatory elements in promoter and enhancer DNA. While studies of TF-DNA binding have focused on TFs' intrinsic preferences for primary nucleotide sequence motifs, recent studies have elucidated additional layers of complexity that modulate TF-DNA binding. In this review, we discuss technological developments for identifying TF binding preferences and highlight recent discoveries that elaborate how TF interactions, local DNA structure, and genomic features influence TF-DNA binding. We highlight novel approaches for characterizing functional binding site motifs that promise to inform our understanding of how TF binding controls gene expression and ultimately contributes to phenotype.
Affiliation(s)
- Sachi Inukai
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Kian Hong Kock
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA 02138, USA
- Martha L Bulyk
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA 02138, USA; Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
|