1
The evolution, evolvability and engineering of gene regulatory DNA. Nature 2022; 603:455-463. [PMID: 35264797 DOI: 10.1038/s41586-022-04506-6] [Citation(s) in RCA: 92] [Impact Index Per Article: 46.0] [Received: 02/08/2021] [Accepted: 02/02/2022] [Indexed: 11/08/2022]
Abstract
Mutations in non-coding regulatory DNA sequences can alter gene expression, organismal phenotype and fitness [1-3]. Constructing complete fitness landscapes, in which DNA sequences are mapped to fitness, is a long-standing goal in biology, but has remained elusive because it is challenging to generalize reliably to vast sequence spaces [4-6]. Here we build sequence-to-expression models that capture fitness landscapes and use them to decipher principles of regulatory evolution. Using millions of randomly sampled promoter DNA sequences and their measured expression levels in the yeast Saccharomyces cerevisiae, we learn deep neural network models that generalize with excellent prediction performance, and enable sequence design for expression engineering. Using our models, we study expression divergence under genetic drift and strong-selection weak-mutation regimes to find that regulatory evolution is rapid and subject to diminishing returns epistasis; that conflicting expression objectives in different environments constrain expression adaptation; and that stabilizing selection on gene expression leads to the moderation of regulatory complexity. We present an approach for using such models to detect signatures of selection on expression from natural variation in regulatory sequences and use it to discover an instance of convergent regulatory evolution. We assess mutational robustness, finding that regulatory mutation effect sizes follow a power law, characterize regulatory evolvability, visualize promoter fitness landscapes, discover evolvability archetypes and illustrate the mutational robustness of natural regulatory sequence populations. Our work provides a general framework for designing regulatory sequences and addressing fundamental questions in regulatory evolution.
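As an illustration of the two regimes named in the abstract, here is a minimal Python sketch. It is not the authors' implementation. `toy_expression` is a hypothetical stand-in (GC fraction) for a trained sequence-to-expression model, and the greedy rule of fixing the best single mutant at each step is a simplifying assumption about how a strong-selection weak-mutation walk could be simulated.

```python
import random

BASES = "ACGT"


def toy_expression(seq: str) -> float:
    """Hypothetical stand-in for a trained model: returns the GC fraction."""
    return sum(b in "GC" for b in seq) / len(seq)


def single_mutants(seq: str):
    """Yield every sequence exactly one substitution away from seq."""
    for i, old in enumerate(seq):
        for new in BASES:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]


def sswm_walk(seq: str, predict, steps: int):
    """Greedy SSWM-style walk (assumption): fix the best single mutant
    at each step and stop early when no mutant improves expression."""
    trajectory = [(seq, predict(seq))]
    for _ in range(steps):
        best = max(single_mutants(seq), key=predict)
        if predict(best) <= predict(seq):
            break  # local peak reached
        seq = best
        trajectory.append((seq, predict(seq)))
    return trajectory


def drift_walk(seq: str, steps: int, rng: random.Random):
    """Drift-style walk: fix a uniformly random single mutant each step,
    regardless of its predicted expression."""
    path = [seq]
    for _ in range(steps):
        seq = rng.choice(list(single_mutants(seq)))
        path.append(seq)
    return path


if __name__ == "__main__":
    for s, score in sswm_walk("AAAA", toy_expression, steps=10):
        print(s, score)
```

Swapping `toy_expression` for a real predictor would leave the walk logic unchanged, which is the point of treating the model as a fitness-landscape oracle.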
2
Amariuta T, Ishigaki K, Sugishita H, Ohta T, Koido M, Dey KK, Matsuda K, Murakami Y, Price AL, Kawakami E, Terao C, Raychaudhuri S. Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements. Nat Genet 2020; 52:1346-1354. [PMID: 33257898 PMCID: PMC8049522 DOI: 10.1038/s41588-020-00740-8] [Citation(s) in RCA: 104] [Impact Index Per Article: 26.0] [Received: 02/21/2020] [Accepted: 10/19/2020] [Indexed: 12/15/2022]
Abstract
Poor trans-ancestry portability of polygenic risk scores is a consequence of Eurocentric genetic studies and limited knowledge of shared causal variants. Leveraging regulatory annotations may improve portability by prioritizing functional over tagging variants. We constructed a resource of 707 cell-type-specific IMPACT regulatory annotations by aggregating 5,345 epigenetic datasets to predict binding patterns of 142 transcription factors across 245 cell types. We then partitioned the common SNP heritability of 111 genome-wide association study summary statistics of European (average n ≈ 189,000) and East Asian (average n ≈ 157,000) origin. IMPACT annotations captured consistent SNP heritability between populations, suggesting prioritization of shared functional variants. Variant prioritization using IMPACT resulted in increased trans-ancestry portability of polygenic risk scores from Europeans to East Asians across all 21 phenotypes analyzed (49.9% mean relative increase in R²). Our study identifies a crucial role for functional annotations such as IMPACT to improve the trans-ancestry portability of genetic data.
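To make the reported metric concrete, the sketch below shows how a polygenic score and a mean relative increase in R² could be computed. It is a hedged illustration only: the function names are hypothetical, the example numbers are invented, and it assumes the mean is taken over per-phenotype relative increases, which the abstract does not spell out.

```python
def polygenic_score(effect_sizes, dosages):
    """Weighted sum of allele dosages (0, 1 or 2) by per-variant effect size."""
    return sum(beta * g for beta, g in zip(effect_sizes, dosages))


def relative_increase(r2_baseline, r2_prioritized):
    """Relative change in R^2 when moving from the baseline score
    to the score built from prioritized variants."""
    return (r2_prioritized - r2_baseline) / r2_baseline


def mean_relative_increase(r2_pairs):
    """Average the per-phenotype relative increases.
    r2_pairs is a list of (baseline R^2, prioritized R^2) tuples."""
    return sum(relative_increase(b, p) for b, p in r2_pairs) / len(r2_pairs)


if __name__ == "__main__":
    # Invented values for two phenotypes, purely for illustration.
    pairs = [(0.02, 0.03), (0.10, 0.12)]
    print(f"{mean_relative_increase(pairs):.1%}")
```

Averaging relative increases weights every phenotype equally, so a phenotype with a small baseline R² can dominate the mean; comparing the ratio of summed R² values would be a different summary.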
Affiliation(s)
- Tiffany Amariuta
  - Center for Data Sciences, Harvard Medical School, Boston, MA, USA
  - Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Graduate School of Arts and Sciences, Harvard University, Cambridge, MA, USA
- Kazuyoshi Ishigaki
  - Center for Data Sciences, Harvard Medical School, Boston, MA, USA
  - Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
- Hiroki Sugishita
  - Laboratory for Developmental Genetics, RIKEN Center for Integrative Medical Sciences (IMS), Kanagawa, Japan
- Tazro Ohta
  - Medical Sciences Innovation Hub Program, RIKEN, Kanagawa, Japan
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Shizuoka, Japan
- Masaru Koido
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
  - Division of Molecular Pathology, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Kushal K Dey
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Koichi Matsuda
  - Laboratory of Genome Technology, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
  - Laboratory of Clinical Genome Sequencing, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Yoshinori Murakami
  - Division of Molecular Pathology, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Alkes L Price
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Eiryo Kawakami
  - Medical Sciences Innovation Hub Program, RIKEN, Kanagawa, Japan
  - Artificial Intelligence Medicine, Graduate School of Medicine, Chiba University, Chiba, Japan
- Chikashi Terao
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
  - Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
- Soumya Raychaudhuri
  - Center for Data Sciences, Harvard Medical School, Boston, MA, USA
  - Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK
3
Evaluating the informativeness of deep learning annotations for human complex diseases. Nat Commun 2020; 11:4703. [PMID: 32943643 PMCID: PMC7499261 DOI: 10.1038/s41467-020-18515-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 04/16/2020] [Accepted: 08/25/2020] [Indexed: 12/12/2022]
Abstract
Deep learning models have shown great promise in predicting regulatory effects from DNA sequence, but their informativeness for human complex diseases is not fully understood. Here, we evaluate genome-wide SNP annotations from two previous deep learning models, DeepSEA and Basenji, by applying stratified LD score regression to 41 diseases and traits (average N = 320K), conditioning on a broad set of coding, conserved and regulatory annotations. We aggregated annotations across all (respectively blood or brain) tissues/cell-types in meta-analyses across all (respectively 11 blood or 8 brain) traits. The annotations were highly enriched for disease heritability, but produced only limited conditionally significant results: non-tissue-specific and brain-specific Basenji-H3K4me3 for all traits and brain traits respectively. We conclude that deep learning models have yet to achieve their full potential to provide considerable unique information for complex disease, and that their conditional informativeness for disease cannot be inferred from their accuracy in predicting regulatory annotations.
Deep learning models have shown great promise in predicting regulatory effects from DNA sequence. Here the authors evaluate sequence-based epigenomic deep learning models and conclude that these models are not yet ready to inform our knowledge of human disease.
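For readers unfamiliar with the term, heritability enrichment of an annotation is commonly defined, for a binary annotation, as the share of SNP heritability it explains divided by the share of SNPs it contains. The sketch below encodes only that ratio. The function name and example numbers are hypothetical, and it does not reproduce stratified LD score regression itself or its handling of continuous-valued annotations.

```python
def heritability_enrichment(h2_annotation, h2_total, n_snps_annotation, n_snps_total):
    """Enrichment of a binary annotation:
    (proportion of heritability) / (proportion of SNPs).
    A value above 1 means the annotation explains more heritability
    than expected from its size alone."""
    prop_h2 = h2_annotation / h2_total
    prop_snps = n_snps_annotation / n_snps_total
    return prop_h2 / prop_snps


if __name__ == "__main__":
    # Invented numbers: 5% of SNPs explaining 25% of heritability.
    print(heritability_enrichment(0.1, 0.4, 50_000, 1_000_000))
```

A high value of this ratio does not by itself show that an annotation adds information beyond other annotations, which is why the abstract distinguishes enrichment from conditionally significant results.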