1
Breeze CE, Haugen E, Gutierrez-Arcelus M, Yao X, Teschendorff A, Beck S, Dunham I, Stamatoyannopoulos J, Franceschini N, Machiela MJ, Berndt SI. FORGEdb: a tool for identifying candidate functional variants and uncovering target genes and mechanisms for complex diseases. Genome Biol 2024; 25:3. [PMID: 38167104 PMCID: PMC10763681 DOI: 10.1186/s13059-023-03126-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2023] [Accepted: 11/27/2023] [Indexed: 01/05/2024] Open
Abstract
The majority of disease-associated variants identified through genome-wide association studies are located outside of protein-coding regions. Prioritizing candidate regulatory variants and gene targets to identify potential biological mechanisms for further functional experiments can be challenging. To address this challenge, we developed FORGEdb (https://forgedb.cancer.gov/; https://forge2.altiusinstitute.org/files/forgedb.html; and https://doi.org/10.5281/zenodo.10067458), a standalone and web-based tool that integrates multiple datasets, delivering information on associated regulatory elements, transcription factor binding sites, and target genes for over 37 million variants. FORGEdb scores provide researchers with a quantitative assessment of the relative importance of each variant for targeted functional experiments.
Affiliation(s)
- Charles E Breeze
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
  - Altius Institute for Biomedical Sciences, 2211 Elliott Avenue 98121, Seattle, USA
  - UCL Cancer Institute, University College London, 72 Huntley Street, London, WC1E 6BT, UK
- Eric Haugen
  - Altius Institute for Biomedical Sciences, 2211 Elliott Avenue 98121, Seattle, USA
- María Gutierrez-Arcelus
  - Division of Immunology, Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Xiaozheng Yao
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Andrew Teschendorff
  - CAS Key Lab of Computational Biology, Shanghai Institute for Biological Sciences, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
- Stephan Beck
  - UCL Cancer Institute, University College London, 72 Huntley Street, London, WC1E 6BT, UK
- Ian Dunham
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Nora Franceschini
  - Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- Mitchell J Machiela
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Sonja I Berndt
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
2
Jain PR, Burch M, Martinez M, Mir P, Fichna JP, Zekanowski C, Rizzo R, Tümer Z, Barta C, Yannaki E, Stamatoyannopoulos J, Drineas P, Paschou P. Can polygenic risk scores help explain disease prevalence differences around the world? A worldwide investigation. BMC Genom Data 2023; 24:70. [PMID: 37986041 PMCID: PMC10662565 DOI: 10.1186/s12863-023-01168-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 10/20/2023] [Indexed: 11/22/2023] Open
Abstract
Complex disorders are caused by a combination of genetic, environmental and lifestyle factors, and their prevalence can vary greatly across different populations. The extent to which genetic risk, as identified by Genome Wide Association Study (GWAS), correlates to disease prevalence in different populations has not been investigated systematically. Here, we studied 14 different complex disorders and explored whether polygenic risk scores (PRS) based on current GWAS correlate to disease prevalence within Europe and around the world. A clear variation in GWAS-based genetic risk was observed based on ancestry and we identified populations that have a higher genetic liability for developing certain disorders. We found that for four out of the 14 studied disorders, PRS significantly correlates to disease prevalence within Europe. We also found significant correlations between worldwide disease prevalence and PRS for eight of the studied disorders with Multiple Sclerosis genetic risk having the highest correlation to disease prevalence. Based on current GWAS results, the across population differences in genetic risk for certain disorders can potentially be used to understand differences in disease prevalence and identify populations with the highest genetic liability. The study highlights both the limitations of PRS based on current GWAS but also the fact that in some cases, PRS may already have high predictive power. This could be due to the genetic architecture of specific disorders or increased GWAS power in some cases.
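As a rough illustration of the kind of analysis this abstract describes, the sketch below computes a polygenic risk score as a weighted sum of risk-allele dosages and then correlates per-population mean scores with prevalence. The abstract does not say which correlation statistic or scoring pipeline the authors used, so the Pearson coefficient, the function names, and the input layout are all assumptions made for illustration.

```python
from math import sqrt


def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2) by per-variant effect sizes.

    Both arguments are hypothetical inputs: one dosage and one GWAS effect
    size per variant, in the same order.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

In use, one would compute a mean score per population and pass the list of means together with the matching list of prevalence estimates to `pearson_r`; a rank-based statistic could be swapped in if the relationship is not expected to be linear.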
Affiliation(s)
- Pritesh R Jain
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Myson Burch
  - Department of Computer Sciences, Purdue University, West Lafayette, IN, USA
- Melanie Martinez
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Pablo Mir
  - Unidad de Trastornos del Movimiento, Instituto de Biomedicina de Sevilla (IBiS). Hospital Universitario Virgen del Rocío/CSIC/Universidad de Sevilla, Seville, Spain
  - Centro de Investigación Biomédica en Red Sobre Enfermedades Neurodegenerativas (CIBERNED), Madrid, Spain
- Jakub P Fichna
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
  - Department of Neurogenetics and Functional Genomics, Mossakowski Medical Research Institute, Polish Academy of Sciences, Warsaw, Poland
- Cezary Zekanowski
  - Department of Neurogenetics and Functional Genomics, Mossakowski Medical Research Institute, Polish Academy of Sciences, Warsaw, Poland
- Renata Rizzo
  - Child and Adolescent Neurology and Psychiatry, Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
- Zeynep Tümer
  - Department of Clinical Genetics, Kennedy Center, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
  - Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Csaba Barta
  - Department of Molecular Biology, Institute of Biochemistry and Molecular Biology, Semmelweis University, Budapest, Hungary
- Evangelia Yannaki
  - Hematology Department- Hematopoietic Cell Transplantation Unit, Gene and Cell Therapy Center, George Papanikolaou Hospital, Thessaloniki, Greece
  - Department of Medicine, University of Washington, Seattle, WA, USA
- John Stamatoyannopoulos
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Department of Medicine, Division of Oncology, University of Washington, Seattle, WA, USA
- Petros Drineas
  - Department of Computer Sciences, Purdue University, West Lafayette, IN, USA
- Peristera Paschou
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
3
Connally NJ, Nazeen S, Lee D, Shi H, Stamatoyannopoulos J, Chun S, Cotsapas C, Cassa CA, Sunyaev SR. The missing link between genetic association and regulatory function. eLife 2022; 11:74970. [PMID: 36515579 PMCID: PMC9842386 DOI: 10.7554/elife.74970] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 10/25/2021] [Accepted: 12/02/2022] [Indexed: 12/15/2022] Open
Abstract
The genetic basis of most traits is highly polygenic and dominated by non-coding alleles. It is widely assumed that such alleles exert small regulatory effects on the expression of cis-linked genes. However, despite the availability of gene expression and epigenomic datasets, few variant-to-gene links have emerged. It is unclear whether these sparse results are due to limitations in available data and methods, or to deficiencies in the underlying assumed model. To better distinguish between these possibilities, we identified 220 gene-trait pairs in which protein-coding variants influence a complex trait or its Mendelian cognate. Despite the presence of expression quantitative trait loci near most GWAS associations, by applying a gene-based approach we found limited evidence that the baseline expression of trait-related genes explains GWAS associations, whether using colocalization methods (8% of genes implicated), transcription-wide association (2% of genes implicated), or a combination of regulatory annotations and distance (4% of genes implicated). These results contradict the hypothesis that most complex trait-associated variants coincide with homeostatic expression QTLs, suggesting that better models are needed. The field must confront this deficit and pursue this 'missing regulation.'
Affiliation(s)
- Noah J Connally
  - Department of Biomedical Informatics, Harvard Medical School, Boston, United States
  - Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Sumaiya Nazeen
  - Department of Biomedical Informatics, Harvard Medical School, Boston, United States
  - Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States
  - Brigham and Women’s Hospital, Department of Neurology, Harvard Medical School, Boston, United States
- Daniel Lee
  - Department of Biomedical Informatics, Harvard Medical School, Boston, United States
  - Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Huwenbo Shi
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, United States
- Sung Chun
  - Division of Pulmonary Medicine, Boston Children’s Hospital, Boston, United States
- Chris Cotsapas
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
  - Department of Neurology, Yale Medical School, New Haven, United States
  - Department of Genetics, Yale Medical School, New Haven, United States
- Christopher A Cassa
  - Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Shamil R Sunyaev
  - Department of Biomedical Informatics, Harvard Medical School, Boston, United States
  - Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
4
Breeze CE, Haugen E, Reynolds A, Teschendorff A, van Dongen J, Lan Q, Rothman N, Bourque G, Dunham I, Beck S, Stamatoyannopoulos J, Franceschini N, Berndt SI. Integrative analysis of 3604 GWAS reveals multiple novel cell type-specific regulatory associations. Genome Biol 2022; 23:13. [PMID: 34996498 PMCID: PMC8742386 DOI: 10.1186/s13059-021-02560-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 10/31/2020] [Accepted: 11/26/2021] [Indexed: 01/02/2023] Open
Abstract
Background: Genome-wide association study (GWAS) single nucleotide polymorphisms (SNPs) are known to preferentially co-locate to active regulatory elements in tissues and cell types relevant to disease aetiology. Further characterisation of associated cell type-specific regulation can broaden our understanding of how GWAS signals may contribute to disease risk. Results: To gain insight into potential functional mechanisms underlying GWAS associations, we developed FORGE2 (https://forge2.altiusinstitute.org/), which is an updated version of the FORGE web tool. FORGE2 uses an expanded atlas of cell type-specific regulatory element annotations, including DNase I hotspots, five histone mark categories and 15 hidden Markov model (HMM) chromatin states, to identify tissue- and cell type-specific signals. An analysis of 3,604 GWAS from the NHGRI-EBI GWAS catalogue yielded at least one significant disease/trait-tissue association for 2,057 GWAS, including > 400 associations specific to epigenomic marks in immune tissues and cell types, > 30 associations specific to heart tissue, and > 60 associations specific to brain tissue, highlighting the key potential of tissue- and cell type-specific regulatory elements. Importantly, we demonstrate that FORGE2 analysis can separate previously observed accessible chromatin enrichments into different chromatin states, such as enhancers or active transcription start sites, providing a greater understanding of underlying regulatory mechanisms. Interestingly, tissue-specific enrichments for repressive chromatin states and histone marks were also detected, suggesting a role for tissue-specific repressed regions in GWAS-mediated disease aetiology. Conclusion: In summary, we demonstrate that FORGE2 has the potential to uncover previously unreported disease-tissue associations and identify new candidate mechanisms. FORGE2 is a transparent, user-friendly web tool for the integrative analysis of loci discovered from GWAS.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-021-02560-3.
Affiliation(s)
- Charles E Breeze
  - National Cancer Institute, NIH, Bethesda, MD, 20892, USA
  - Altius Institute for Biomedical Sciences, Seattle, WA, 98121, USA
  - UCL Cancer Institute, University College London, WC1E 6BT, London, UK
- Eric Haugen
  - Altius Institute for Biomedical Sciences, Seattle, WA, 98121, USA
- Alex Reynolds
  - Altius Institute for Biomedical Sciences, Seattle, WA, 98121, USA
- Andrew Teschendorff
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institute for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
- Jenny van Dongen
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, 1081BT, The Netherlands
- Qing Lan
  - National Cancer Institute, NIH, Bethesda, MD, 20892, USA
- Guillaume Bourque
  - Department of Human Genetics, McGill University and Génome Québec Innovation Center, Montréal, H3A 0G1, Canada
- Ian Dunham
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Stephan Beck
  - UCL Cancer Institute, University College London, WC1E 6BT, London, UK
- Nora Franceschini
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Sonja I Berndt
  - National Cancer Institute, NIH, Bethesda, MD, 20892, USA
5
Topaloudi A, Zagoriti Z, Flint AC, Martinez MB, Yang Z, Tsetsos F, Christou YP, Lagoumintzis G, Yannaki E, Zamba-Papanicolaou E, Tzartos J, Tsekmekidou X, Kotsa K, Maltezos E, Papanas N, Papazoglou D, Passadakis P, Roumeliotis A, Roumeliotis S, Theodoridis M, Thodis E, Panagoutsos S, Yovos J, Stamatoyannopoulos J, Poulas K, Kleopa K, Tzartos S, Georgitsi M, Paschou P. Myasthenia gravis genome-wide association study implicates AGRN as a risk locus. J Med Genet 2021; 59:801-809. [PMID: 34400559 DOI: 10.1136/jmedgenet-2021-107953] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/01/2021] [Accepted: 07/20/2021] [Indexed: 11/04/2022]
Abstract
BACKGROUND: Myasthenia gravis (MG) is a rare autoimmune disorder affecting the neuromuscular junction (NMJ). Here, we investigate the genetic architecture of MG via a genome-wide association study (GWAS) of the largest MG data set analysed to date. METHODS: We performed GWAS meta-analysis integrating three different data sets (total of 1401 cases and 3508 controls). We carried out human leucocyte antigen (HLA) fine-mapping, gene-based and tissue enrichment analyses and investigated genetic correlation with 13 other autoimmune disorders as well as pleiotropy across MG and correlated disorders. RESULTS: We confirmed the previously reported MG association with TNFRSF11A (rs4369774; p=1.09×10⁻¹³, OR=1.4). Furthermore, gene-based analysis revealed AGRN as a novel MG susceptibility gene. HLA fine-mapping pointed to two independent MG loci: HLA-DRB1 and HLA-B. MG onset-specific analysis reveals differences in the genetic architecture of early-onset MG (EOMG) versus late-onset MG (LOMG). Furthermore, we find MG to be genetically correlated with type 1 diabetes (T1D), rheumatoid arthritis (RA), late-onset vitiligo and autoimmune thyroid disease (ATD). Cross-disorder meta-analysis reveals multiple risk loci that appear pleiotropic across MG and correlated disorders. DISCUSSION: Our gene-based analysis identifies AGRN as a novel MG susceptibility gene, implicating for the first time a locus encoding a protein (agrin) that is directly relevant to NMJ activation. Mutations in AGRN have been found to underlie congenital myasthenic syndrome. Our results are also consistent with previous studies highlighting the role of HLA and TNFRSF11A in MG aetiology and the different risk genes in EOMG versus LOMG. Finally, we uncover the genetic correlation of MG with T1D, RA, ATD and late-onset vitiligo, pointing to shared underlying genetic mechanisms.
Affiliation(s)
- Apostolia Topaloudi
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Zoi Zagoriti
  - Department of Pharmacy, University of Patras, Rio, Greece
- Alyssa Camille Flint
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Zhiyu Yang
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Fotis Tsetsos
  - Department of Molecular Biology and Genetics, Democritus University of Thrace, Alexandroupoli, Greece
- Evangelia Yannaki
  - Department of Hematology, George Papanicolaou Hospital, Thessaloniki, Greece
- Eleni Zamba-Papanicolaou
  - The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
  - Department of Neuroepidemiology and Centre for Neuromuscular Disorders, The Cyprus Institute of Neurology and Genetics and Cyprus School of Molecular Medicine, Nicosia, Cyprus
- Xanthippi Tsekmekidou
  - Division of Endocrinology and Metabolism-Diabetes Center, 1st Department of Internal Medicine, AHEPA University Hospital, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Kalliopi Kotsa
  - Division of Endocrinology and Metabolism-Diabetes Center, 1st Department of Internal Medicine, AHEPA University Hospital, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Efstratios Maltezos
  - Diabetes Center, 2nd Department of Internal Medicine, Alexandroupolis University General Hospital, Democritus University of Thrace, Alexandroupoli, Greece
- Nikolaos Papanas
  - Diabetes Center, 2nd Department of Internal Medicine, Alexandroupolis University General Hospital, Democritus University of Thrace, Alexandroupoli, Greece
- Dimitrios Papazoglou
  - Diabetes Center, 2nd Department of Internal Medicine, Alexandroupolis University General Hospital, Democritus University of Thrace, Alexandroupoli, Greece
- Ploumis Passadakis
  - Department of Nephrology, Alexandroupolis University General Hospital, Democritus University of Thrace, Alexandroupoli, Greece
- Athanasios Roumeliotis
  - Division of Nephrology and Hypertension, 1st Department of Internal Medicine, AHEPA University Hospital, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Stefanos Roumeliotis
  - Division of Nephrology and Hypertension, 1st Department of Internal Medicine, AHEPA University Hospital, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Marios Theodoridis
  - Department of Nephrology, Alexandroupolis University General Hospital, Democritus University of Thrace, Alexandroupoli, Greece
- Elias Thodis
  - Department of Nephrology, Alexandroupolis University General Hospital, Democritus University of Thrace, Alexandroupoli, Greece
- Stylianos Panagoutsos
  - Department of Nephrology, Alexandroupolis University General Hospital, Democritus University of Thrace, Alexandroupoli, Greece
- John Yovos
  - Division of Endocrinology and Metabolism-Diabetes Center, 1st Department of Internal Medicine, AHEPA University Hospital, Aristotle University of Thessaloniki, Thessaloniki, Greece
- John Stamatoyannopoulos
  - Departments of Medicine and Genome Sciences, University of Washington, Seattle, Washington, USA
- Kleopas Kleopa
  - The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
  - Department of Neuroscience and Centre for Neuromuscular Disorders, The Cyprus Institute of Neurology and Genetics and Cyprus School of Molecular Medicine, Nicosia, Cyprus
- Socrates Tzartos
  - Department of Pharmacy, University of Patras, Rio, Greece
  - Hellenic Pasteur Institute, Athens, Greece
- Marianthi Georgitsi
  - 1st Laboratory of Medical Biology-Genetics, School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Peristera Paschou
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
6
Nandakumar V, Stehling-Sun S, Kerwin W, Funnell A, Stamatoyannopoulos J. Abstract PO-048: Visual nucleotyping identifies chromatin phenotypes triggered by genome editing. Clin Cancer Res 2021. [DOI: 10.1158/1557-3265.adi21-po-048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Optical microscopy has the potential to provide rapid phenotypic readout of cellular states. Here we show that automated optical phenotyping of nuclei is capable of rapidly and reliably discriminating the effects of targeted mutations to chromatin remodelers, which are major players in oncogenesis and targets for therapeutic intervention. Smarca4 is a chromatin remodeler frequently mutated in cancer. We investigated whether nuclear morphometry and concomitant supervised machine learning approaches on standard optical microscopy images of nuclear-DNA stained cells could reliably discern Smarca4-/- lung cancer cells from isogenic Smarca4+/+ clones that were derived by precision genome editing. We used TALEN-based editing on A549 lung cancer cells to generate isogenic Smarca4+/+ clones, performed conventional 2D 40x optical imaging of DAPI-stained cell nuclei from parental (Smarca4-/-) and edited (Smarca4+/+) cells, quantified a set of numerical parameters (features) that reflected various intuitive attributes of nuclear morphology and chromatin texture of the cell nuclei, and finally tested the efficacy of supervised machine learning algorithms containing these features to distinguish between the two cell categories. We found that the parental and edited A549 cells exhibited similar nuclear size and shape attributes but showed measurable differences in subtle chromatin texture. Additionally, classifiers based on these chromatin texture features were able to accurately distinguish the smarca4-rescued cells from the smarca4-mutant cells. Our results thus demonstrate the promise of quantitative chromatin phenotyping for genome editing applications in oncology.
Citation Format: Vivek Nandakumar, Sandra Stehling-Sun, William Kerwin, Alister Funnell, John Stamatoyannopoulos. Visual nucleotyping identifies chromatin phenotypes triggered by genome editing [abstract]. In: Proceedings of the AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging; 2021 Jan 13-14. Philadelphia (PA): AACR; Clin Cancer Res 2021;27(5_Suppl):Abstract nr PO-048.
7
Meuleman W, Muratov A, Rynes E, Halow J, Lee K, Bates D, Diegel M, Dunn D, Neri F, Teodosiadis A, Reynolds A, Haugen E, Nelson J, Johnson A, Frerker M, Buckley M, Sandstrom R, Vierstra J, Kaul R, Stamatoyannopoulos J. Index and biological spectrum of human DNase I hypersensitive sites. Nature 2020; 584:244-251. [PMID: 32728217 PMCID: PMC7422677 DOI: 10.1038/s41586-020-2559-3] [Citation(s) in RCA: 134] [Impact Index Per Article: 33.5] [Received: 11/15/2019] [Accepted: 07/01/2020] [Indexed: 01/08/2023]
Abstract
DNase I hypersensitive sites (DHSs) are generic markers of regulatory DNA [1–5] and contain genetic variations associated with diseases and phenotypic traits [6–8]. We created high-resolution maps of DHSs from 733 human biosamples encompassing 438 cell and tissue types and states, and integrated these to delineate and numerically index approximately 3.6 million DHSs within the human genome sequence, providing a common coordinate system for regulatory DNA. Here we show that these maps highly resolve the cis-regulatory compartment of the human genome, which encodes unexpectedly diverse cell- and tissue-selective regulatory programs at very high density. These programs can be captured comprehensively by a simple vocabulary that enables the assignment to each DHS of a regulatory barcode that encapsulates its tissue manifestations, and global annotation of protein-coding and non-coding RNA genes in a manner orthogonal to gene expression. Finally, we show that sharply resolved DHSs markedly enhance the genetic association and heritability signals of diseases and traits. Rather than being confined to a small number of distal elements or promoters, we find that genetic signals converge on congruently regulated sets of DHSs that decorate entire gene bodies. Together, our results create a universal, extensible coordinate system and vocabulary for human regulatory DNA marked by DHSs, and provide a new global perspective on the architecture of human gene regulation. High-resolution maps of DNase I hypersensitive sites from 733 human biosamples are used to identify and index regulatory DNA within the human genome.
Affiliation(s)
- Eric Rynes
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Jessica Halow
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Kristen Lee
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Daniel Bates
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Morgan Diegel
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Douglas Dunn
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Fidencio Neri
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Alex Reynolds
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Eric Haugen
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Jemma Nelson
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Audra Johnson
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Mark Frerker
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Jeff Vierstra
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Rajinder Kaul
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
- John Stamatoyannopoulos
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Division of Oncology, Department of Medicine, University of Washington, Seattle, WA, USA
8
Chen Z, Kibler RD, Hunt A, Busch F, Pearl J, Jia M, VanAernum ZL, Wicky BIM, Dods G, Liao H, Wilken MS, Ciarlo C, Green S, El-Samad H, Stamatoyannopoulos J, Wysocki VH, Jewett MC, Boyken SE, Baker D. De novo design of protein logic gates. Science 2020; 368:78-84. [PMID: 32241946 DOI: 10.1126/science.aay2790] [Citation(s) in RCA: 115] [Impact Index Per Article: 28.8] [Received: 06/05/2019] [Accepted: 03/05/2020] [Indexed: 12/16/2022]
Abstract
The design of modular protein logic for regulating protein function at the posttranscriptional level is a challenge for synthetic biology. Here, we describe the design of two-input AND, OR, NAND, NOR, XNOR, and NOT gates built from de novo-designed proteins. These gates regulate the association of arbitrary protein units ranging from split enzymes to transcriptional machinery in vitro, in yeast and in primary human T cells, where they control the expression of the TIM3 gene related to T cell exhaustion. Designed binding interaction cooperativity, confirmed by native mass spectrometry, makes the gates largely insensitive to stoichiometric imbalances in the inputs, and the modularity of the approach enables ready extension to three-input OR, AND, and disjunctive normal form gates. The modularity and cooperativity of the control elements, coupled with the ability to de novo design an essentially unlimited number of protein components, should enable the design of sophisticated posttranslational control logic over a wide range of biological functions.
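For readers less familiar with the Boolean operations named in this abstract, the sketch below tabulates the standard two-input gates and evaluates a disjunctive normal form (an OR over AND terms). It illustrates only the abstract logic, not the protein designs themselves; the names `GATES`, `truth_table`, and `eval_dnf` are illustrative choices and do not come from the paper.

```python
from itertools import product

# Standard Boolean definitions of the two-input gates named in the abstract.
GATES = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "NAND": lambda a, b: not (a and b),
    "NOR": lambda a, b: not (a or b),
    "XNOR": lambda a, b: a == b,
}


def logical_not(a):
    """Boolean NOT, shown separately because it takes a single input."""
    return not a


def truth_table(gate):
    """Map every (a, b) input pair to the gate's Boolean output."""
    return {(a, b): bool(gate(a, b)) for a, b in product([False, True], repeat=2)}


def eval_dnf(terms, inputs):
    """Evaluate a disjunctive normal form over a tuple of Boolean inputs.

    `terms` is a list of AND-terms; each term is a list of
    (input_index, required_value) literals. The result is True when
    at least one term has all of its literals satisfied.
    """
    return any(all(inputs[i] == value for i, value in term) for term in terms)


if __name__ == "__main__":
    for name, gate in GATES.items():
        print(name, truth_table(gate))
```

For example, `eval_dnf([[(0, True), (1, True)], [(2, True)]], inputs)` encodes "(input 0 AND input 1) OR input 2" over three inputs.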
Collapse
Affiliation(s)
- Zibo Chen
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA.,Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
| | - Ryan D Kibler
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA.,Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
| | - Andrew Hunt
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60208, USA
| | - Florian Busch
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, OH 43210, USA.,Resource for Native Mass Spectrometry Guided Structural Biology, The Ohio State University, Columbus, OH 43210, USA
| | - Jocelynn Pearl
- Altius Institute for Biomedical Sciences, Seattle, WA 98195, USA
| | - Mengxuan Jia
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, OH 43210, USA.,Resource for Native Mass Spectrometry Guided Structural Biology, The Ohio State University, Columbus, OH 43210, USA
| | - Zachary L VanAernum
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, OH 43210, USA.,Resource for Native Mass Spectrometry Guided Structural Biology, The Ohio State University, Columbus, OH 43210, USA
| | - Basile I M Wicky
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA.,Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
| | - Galen Dods
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Hanna Liao
- Altius Institute for Biomedical Sciences, Seattle, WA 98195, USA
| | - Matthew S Wilken
- Altius Institute for Biomedical Sciences, Seattle, WA 98195, USA
| | - Christie Ciarlo
- Altius Institute for Biomedical Sciences, Seattle, WA 98195, USA
| | - Shon Green
- Altius Institute for Biomedical Sciences, Seattle, WA 98195, USA
| | - Hana El-Samad
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94158, USA.,Chan-Zuckerberg Biohub, San Francisco, CA 94158, USA
| | - John Stamatoyannopoulos
- Altius Institute for Biomedical Sciences, Seattle, WA 98195, USA.,Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.,Department of Medicine, Division of Oncology, University of Washington, Seattle, WA 98109, USA
| | - Vicki H Wysocki
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, OH 43210, USA.,Resource for Native Mass Spectrometry Guided Structural Biology, The Ohio State University, Columbus, OH 43210, USA
| | - Michael C Jewett
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60208, USA.,Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA.,Center for Synthetic Biology, Northwestern University, Evanston, IL 60208, USA
| | - Scott E Boyken
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA.,Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA.,Institute for Protein Design, University of Washington, Seattle, WA 98195, USA.,Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
| |
|
9
|
Abstract
Determining how the evolving genome-wide map of distal regulatory elements is connected with target genes has remained a significant challenge, despite progress in understanding chromatin architecture and regulation. A new study presents a computational approach for predicting distal element-gene interactions.
Affiliation(s)
- John Stamatoyannopoulos
- Departments of Genome Sciences and Medicine at the University of Washington and at the Altius Institute for Biomedical Sciences, Seattle, Washington, USA
| |
|
10
|
McGeachie MJ, Yates KP, Zhou X, Guo F, Sternberg AL, Van Natta ML, Wise RA, Szefler SJ, Sharma S, Kho AT, Cho MH, Croteau-Chonka DC, Castaldi PJ, Jain G, Sanyal A, Zhan Y, Lajoie BR, Dekker J, Stamatoyannopoulos J, Covar RA, Zeiger RS, Adkinson NF, Williams PV, Kelly HW, Grasemann H, Vonk JM, Koppelman GH, Postma DS, Raby BA, Houston I, Lu Q, Fuhlbrigge AL, Tantisira KG, Silverman EK, Tonascia J, Strunk RC, Weiss ST. Genetics and Genomics of Longitudinal Lung Function Patterns in Individuals with Asthma. Am J Respir Crit Care Med 2017; 194:1465-1474. [PMID: 27367781 DOI: 10.1164/rccm.201602-0250oc] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 11/16/2022]
Abstract
RATIONALE Patterns of longitudinal lung function growth and decline in childhood asthma have been shown to be important in determining risk for future respiratory ailments including chronic airway obstruction and chronic obstructive pulmonary disease. OBJECTIVES To determine the genetic underpinnings of lung function patterns in subjects with childhood asthma. METHODS We performed a genome-wide association study of 581 non-Hispanic white individuals with asthma that were previously classified by patterns of lung function growth and decline (normal growth, normal growth with early decline, reduced growth, and reduced growth with early decline). The strongest association was also measured in two additional cohorts: a small asthma cohort and a large chronic obstructive pulmonary disease metaanalysis cohort. Interaction between the genomic region encompassing the most strongly associated single-nucleotide polymorphism and nearby genes was assessed by two chromosome conformation capture assays. MEASUREMENTS AND MAIN RESULTS An intergenic single-nucleotide polymorphism (rs4445257) on chromosome 8 was strongly associated with the normal growth with early decline pattern compared with all other pattern groups (P = 6.7 × 10⁻⁹; odds ratio, 2.8; 95% confidence interval, 2.0-4.0); replication analysis suggested this variant had opposite effects in normal growth with early decline and reduced growth with early decline pattern groups. Chromosome conformation capture experiments indicated a chromatin interaction between rs4445257 and the promoter of the distal CSMD3 gene. CONCLUSIONS Early decline in lung function after normal growth is associated with a genetic polymorphism that may also protect against early decline in reduced growth groups. Clinical trial registered with www.clinicaltrials.gov (NCT00000575).
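As a rough consistency check on the reported effect size, the odds ratio and its 95% confidence interval can be converted back into an approximate z-score and two-sided p-value. This sketch assumes the interval is a Wald interval that is symmetric on the log-odds scale, which the abstract does not state, so the result should only be expected to land in the same order of magnitude as the reported P value.

```python
import math

# Values as reported in the abstract.
odds_ratio = 2.8
ci_low, ci_high = 2.0, 4.0

# Assumption: a Wald 95% interval, symmetric on the log-odds scale,
# so its width equals 2 * 1.96 standard errors.
se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)

# Approximate z-score for the log odds ratio.
z = math.log(odds_ratio) / se_log_or

# Two-sided p-value under a standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"SE(log OR) = {se_log_or:.3f}, z = {z:.2f}, p = {p_value:.1e}")
```

Because the abstract rounds the odds ratio and interval to one decimal place, an exact match with the published P value is not expected.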
Affiliation(s)
- Michael J McGeachie
- 1 Channing Division of Network Medicine and.,2 Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | | | - Xiaobo Zhou
- 1 Channing Division of Network Medicine and.,2 Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - Feng Guo
- 1 Channing Division of Network Medicine and.,2 Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | | | | | - Robert A Wise
- 4 School of Medicine, Johns Hopkins University, Baltimore, Maryland
| | - Stanley J Szefler
- 5 National Jewish Health and Research Center, Denver, Colorado.,6 Children's Hospital Colorado and
| | - Sunita Sharma
- 7 Division of Pulmonary Sciences and Critical Care Medicine, Department of Medicine, University of Colorado, Denver, Colorado
| | - Alvin T Kho
- 1 Channing Division of Network Medicine and.,2 Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts.,8 Boston Children's Hospital, Boston, Massachusetts
| | - Michael H Cho
- 1 Channing Division of Network Medicine and.,2 Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - Damien C Croteau-Chonka
- 1 Channing Division of Network Medicine and.,2 Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - Peter J Castaldi
- 1 Channing Division of Network Medicine and.,2 Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - Gaurav Jain
- 9 Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, and
| | - Amartya Sanyal
- 9 Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, and.,10 School of Biological Sciences, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
| | - Ye Zhan
- 9 Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, and
| | - Bryan R Lajoie
- 9 Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, and
| | - Job Dekker
- 11 Howard Hughes Medical Institute, Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts
| | | | - Ronina A Covar
- 5 National Jewish Health and Research Center, Denver, Colorado.,6 Children's Hospital Colorado and.,13 University of Colorado, Denver, Colorado
| | - Robert S Zeiger
- 14 Department of Pediatrics, University of California at San Diego, La Jolla, California.,15 Kaiser Permanente Southern California Region, San Diego, California
| | | | - Paul V Williams
- 16 ASTHMA, Inc., Clinical Research Center and Northwest Asthma & Allergy Center, Seattle, Washington
| | - H William Kelly
- 17 University of New Mexico Health Sciences Center, Albuquerque, New Mexico
| | - Hartmut Grasemann
- 18 Division of Respiratory Medicine, Department of Pediatrics, The Hospital for Sick Children and University of Toronto, Toronto, Canada
| | | | - Gerard H Koppelman
- 20 Department of Pediatric Pulmonology and Pediatric Allergology, Beatrix Children's Hospital, and
| | - Dirkje S Postma
- 21 Department of Pulmonology, University of Groningen, University Medical Center Groningen, Groningen Research Institute for Asthma and COPD, Groningen, the Netherlands
| | - Benjamin A Raby
- 1 Channing Division of Network Medicine and.,2 Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - Isaac Houston
- 1 Channing Division of Network Medicine and.,2 Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - Quan Lu
- 22 Program in Molecular and Integrative Physiological Sciences, Departments of Environmental Health and Genetics & Complex Diseases, Harvard T.H. Chan School of Public Health, Boston, Massachusetts; and
| | - Anne L Fuhlbrigge
- 1 Channing Division of Network Medicine and.,23 Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, Massachusetts.,2 Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - Kelan G Tantisira
- 1 Channing Division of Network Medicine and.,2 Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - Edwin K Silverman
- 1 Channing Division of Network Medicine and.,2 Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | | | - Robert C Strunk
- 24 Division of Allergy, Immunology, and Pulmonary Medicine, Washington University School of Medicine, St. Louis, Missouri
| | - Scott T Weiss
- 1 Channing Division of Network Medicine and.,2 Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | | |
|
11
|
McGeachie MJ, Yates KP, Zhou X, Guo F, Sternberg AL, Van Natta ML, Wise RA, Szefler SJ, Sharma S, Kho AT, Cho MH, Croteau-Chonka DC, Castaldi PJ, Jain G, Sanyal A, Zhan Y, Lajoie BR, Dekker J, Stamatoyannopoulos J, Covar RA, Zeiger RS, Adkinson NF, Williams PV, Kelly HW, Grasemann H, Vonk JM, Koppelman GH, Postma DS, Raby BA, Houston I, Lu Q, Fuhlbrigge AL, Tantisira KG, Silverman EK, Tonascia J, Weiss ST, Strunk RC. Patterns of Growth and Decline in Lung Function in Persistent Childhood Asthma. N Engl J Med 2016; 374:1842-1852. [PMID: 27168434 PMCID: PMC5032024 DOI: 10.1056/nejmoa1513737] [Citation(s) in RCA: 373] [Impact Index Per Article: 46.6] [Indexed: 01/02/2023]
Abstract
BACKGROUND Tracking longitudinal measurements of growth and decline in lung function in patients with persistent childhood asthma may reveal links between asthma and subsequent chronic airflow obstruction. METHODS We classified children with asthma according to four characteristic patterns of lung-function growth and decline on the basis of graphs showing forced expiratory volume in 1 second (FEV1), representing spirometric measurements performed from childhood into adulthood. Risk factors associated with abnormal patterns were also examined. To define normal values, we used FEV1 values from participants in the National Health and Nutrition Examination Survey who did not have asthma. RESULTS Of the 684 study participants, 170 (25%) had a normal pattern of lung-function growth without early decline, and 514 (75%) had abnormal patterns: 176 (26%) had reduced growth and an early decline, 160 (23%) had reduced growth only, and 178 (26%) had normal growth and an early decline. Lower baseline values for FEV1, smaller bronchodilator response, airway hyperresponsiveness at baseline, and male sex were associated with reduced growth (P<0.001 for all comparisons). At the last spirometric measurement (mean [±SD] age, 26.0±1.8 years), 73 participants (11%) met Global Initiative for Chronic Obstructive Lung Disease spirometric criteria for lung-function impairment that was consistent with chronic obstructive pulmonary disease (COPD); these participants were more likely to have a reduced pattern of growth than a normal pattern (18% vs. 3%, P<0.001). CONCLUSIONS Childhood impairment of lung function and male sex were the most significant predictors of abnormal longitudinal patterns of lung-function growth and decline. Children with persistent asthma and reduced growth of lung function are at increased risk for fixed airflow obstruction and possibly COPD in early adulthood. (Funded by the Parker B. Francis Foundation and others; ClinicalTrials.gov number, NCT00000575.).
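The participant counts and percentages in the RESULTS can be checked with a few lines of arithmetic. The sketch below uses only the numbers reported in the abstract. The dictionary keys are shorthand labels chosen here, not the study's official category names.

```python
# Counts as reported in the abstract.
total = 684
patterns = {
    "normal growth, no early decline": 170,
    "reduced growth and early decline": 176,
    "reduced growth only": 160,
    "normal growth and early decline": 178,
}

# Percentage of all participants in each pattern, rounded to whole numbers.
percentages = {name: round(100 * n / total) for name, n in patterns.items()}

# Everyone outside the normal pattern counts as having an abnormal pattern.
abnormal = total - patterns["normal growth, no early decline"]
abnormal_pct = round(100 * abnormal / total)

# Participants meeting the spirometric criteria at the last measurement.
copd_pct = round(100 * 73 / total)

for name, pct in percentages.items():
    print(f"{name}: {patterns[name]} ({pct}%)")
print(f"abnormal patterns: {abnormal} ({abnormal_pct}%)")
```

The four groups sum to the 684 participants, and the three abnormal groups sum to the 514 reported.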
|
12
|
Turner T, Hormozdiari F, Duyzend M, McClymont S, Hook P, Iossifov I, Raja A, Baker C, Hoekzema K, Stessman H, Zody M, Nelson B, Huddleston J, Sandstrom R, Smith J, Hanna D, Swanson J, Faustman E, Bamshad M, Stamatoyannopoulos J, Nickerson D, McCallion A, Darnell R, Eichler E. Genome Sequencing of Autism-Affected Families Reveals Disruption of Putative Noncoding Regulatory DNA. Am J Hum Genet 2016; 98:58-74. [PMID: 26749308 DOI: 10.1016/j.ajhg.2015.11.023] [Citation(s) in RCA: 189] [Impact Index Per Article: 23.6] [Received: 08/05/2015] [Accepted: 11/25/2015] [Indexed: 12/17/2022]
Abstract
We performed whole-genome sequencing (WGS) of 208 genomes from 53 families affected by simplex autism. For the majority of these families, no copy-number variant (CNV) or candidate de novo gene-disruptive single-nucleotide variant (SNV) had been detected by microarray or whole-exome sequencing (WES). We integrated multiple CNV and SNV analyses and extensive experimental validation to identify additional candidate mutations in eight families. We report that compared to control individuals, probands showed a significant (p = 0.03) enrichment of de novo and private disruptive mutations within fetal CNS DNase I hypersensitive sites (i.e., putative regulatory regions). This effect was only observed within 50 kb of genes that have been previously associated with autism risk, including genes where dosage sensitivity has already been established by recurrent disruptive de novo protein-coding mutations (ARID1B, SCN2A, NR3C2, PRKCA, and DSCAM). In addition, we provide evidence of gene-disruptive CNVs (in DISC1, WNT7A, RBFOX1, and MBD5), as well as smaller de novo CNVs and exon-specific SNVs missed by exome sequencing in neurodevelopmental genes (e.g., CANX, SAE1, and PIK3CA). Our results suggest that the detection of smaller, often multiple CNVs affecting putative regulatory elements might help explain additional risk of simplex autism.
|
13
|
Kazanov MD, Roberts SA, Polak P, Stamatoyannopoulos J, Klimczak LJ, Gordenin DA, Sunyaev SR. APOBEC-Induced Cancer Mutations Are Uniquely Enriched in Early-Replicating, Gene-Dense, and Active Chromatin Regions. Cell Rep 2015; 13:1103-1109. [PMID: 26527001 DOI: 10.1016/j.celrep.2015.09.077] [Citation(s) in RCA: 65] [Impact Index Per Article: 7.2] [Received: 03/17/2015] [Revised: 08/11/2015] [Accepted: 09/25/2015] [Indexed: 10/22/2022]
Abstract
An antiviral component of the human innate immune system, the APOBEC cytidine deaminases, was recently identified as a prominent source of mutations in cancers. Here, we investigated the distribution of APOBEC-induced mutations across the genomes of 119 breast and 24 lung cancer samples. While the rate of most mutations is known to be elevated in late-replicating regions that are characterized by reduced chromatin accessibility and low gene density, we observed a marked enrichment of APOBEC mutations in early-replicating regions. This unusual mutagenesis profile may be associated with a higher propensity to form single-strand DNA substrates for APOBEC enzymes in early-replicating regions and should be accounted for in statistical analyses of cancer genome mutation catalogs aimed at understanding the mechanisms of carcinogenesis as well as highlighting genes that are significantly mutated in cancer.
Affiliation(s)
- Marat D Kazanov
- Research and Training Center on Bioinformatics, A.A. Kharkevich Institute for Information Transmission Problems, RAS, Moscow 127051, Russia
| | - Steven A Roberts
- National Institute of Environmental Health Sciences, Durham, NC 27709, USA; School of Molecular Biosciences, Washington State University, Pullman, WA 99164, USA
| | - Paz Polak
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - John Stamatoyannopoulos
- Departments of Genome Sciences and Medicine, University of Washington, Seattle, WA 98195, USA
| | - Leszek J Klimczak
- National Institute of Environmental Health Sciences, Durham, NC 27709, USA
| | - Dmitry A Gordenin
- National Institute of Environmental Health Sciences, Durham, NC 27709, USA.
| | - Shamil R Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA.
| |
|
14
|
He X, Tillo D, Vierstra J, Syed KS, Deng C, Ray GJ, Stamatoyannopoulos J, FitzGerald PC, Vinson C. Methylated Cytosines Mutate to Transcription Factor Binding Sites that Drive Tetrapod Evolution. Genome Biol Evol 2015; 7:3155-69. [PMID: 26507798 PMCID: PMC4994754 DOI: 10.1093/gbe/evv205] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 01/09/2023]
Abstract
In mammals, the cytosine in CG dinucleotides is typically methylated producing 5-methylcytosine (5mC), a chemically less stable form of cytosine that can spontaneously deaminate to thymidine resulting in a T•G mismatched base pair. Unlike other eukaryotes that efficiently repair this mismatched base pair back to C•G, in mammals, 5mCG deamination is mutagenic, sometimes producing TG dinucleotides, explaining the depletion of CG dinucleotides in mammalian genomes. It was suggested that new TG dinucleotides generate genetic diversity that may be critical for evolutionary change. We tested this conjecture by examining the DNA sequence properties of regulatory sequences identified by DNase I hypersensitive sites (DHSs) in human and mouse genomes. We hypothesized that the new TG dinucleotides generate transcription factor binding sites (TFBS) that become tissue-specific DHSs (TS-DHSs). We find that 8-mers containing the CG dinucleotide are enriched in DHSs in both species. However, 8-mers containing a TG and no CG dinucleotide are preferentially enriched in TS-DHSs when compared with 8-mers with neither a TG nor a CG dinucleotide. The most enriched 8-mer with a TG and no CG dinucleotide in tissue-specific regulatory regions in both genomes is the AP-1 motif (TGAC/GTCAN), and we find evidence that TG dinucleotides in the AP-1 motif arose from CG dinucleotides. Additional TS-DHS-enriched TFBS containing the TG/CA dinucleotide are the E-Box motif (GCAGCTGC), the NF-1 motif (GGCA—TGCC), and the GR (glucocorticoid receptor) motif (G-ACA—TGT-C). Our results support the suggestion that cytosine methylation is mutagenic in tetrapods producing TG dinucleotides that create TFBS that drive evolution.
Affiliation(s)
- Ximiao He
- Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
| | - Desiree Tillo
- Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
| | - Jeff Vierstra
- Department of Genome Sciences, University of Washington
| | - Khund-Sayeed Syed
- Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
| | - Callie Deng
- Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
| | - G Jordan Ray
- Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
| | | | - Peter C FitzGerald
- Genome Analysis Unit, Genetics Branch, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
| | - Charles Vinson
- Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
| |
|
15
|
Hochner H, Allard C, Granot-Hershkovitz E, Chen J, Sitlani CM, Sazdovska S, Lumley T, McKnight B, Rice K, Enquobahrie DA, Meigs JB, Kwok P, Hivert MF, Borecki IB, Gomez F, Wang T, van Duijn C, Amin N, Rotter JI, Stamatoyannopoulos J, Meiner V, Manor O, Dupuis J, Friedlander Y, Siscovick DS. Parent-of-Origin Effects of the APOB Gene on Adiposity in Young Adults. PLoS Genet 2015; 11:e1005573. [PMID: 26451733 PMCID: PMC4599806 DOI: 10.1371/journal.pgen.1005573] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 03/07/2015] [Accepted: 09/15/2015] [Indexed: 01/23/2023]
Abstract
Loci identified in genome-wide association studies (GWAS) of cardio-metabolic traits account for a small proportion of the traits' heritability. To date, most association studies have not considered parent-of-origin effects (POEs). Here we report investigation of POEs on adiposity and glycemic traits in young adults. The Jerusalem Perinatal Family Follow-Up Study (JPS), comprising 1250 young adults and their mothers was used for discovery. Focusing on 18 genes identified by previous GWAS as associated with cardio-metabolic traits, we used linear regression to examine the associations of maternally- and paternally-derived offspring minor alleles with body mass index (BMI), waist circumference (WC), fasting glucose and insulin. We replicated and meta-analyzed JPS findings in individuals of European ancestry aged ≤50 belonging to pedigrees from the Framingham Heart Study, Family Heart Study and Erasmus Rucphen Family study (total N≅4800). We considered p<2.7×10⁻⁴ statistically significant to account for multiple testing. We identified a common coding variant in the 4th exon of APOB (rs1367117) with a significant maternally-derived effect on BMI (β = 0.8; 95%CI:0.4,1.1; p = 3.1×10⁻⁵) and WC (β = 2.7; 95%CI:1.7,3.7; p = 2.1×10⁻⁷). The corresponding paternally-derived effects were non-significant (p>0.6). Suggestive maternally-derived associations of rs1367117 were observed with fasting glucose (β = 0.9; 95%CI:0.3,1.5; p = 4.0×10⁻³) and insulin (ln-transformed, β = 0.06; 95%CI:0.03,0.1; p = 7.4×10⁻⁴). Bioinformatic annotation for rs1367117 revealed a variety of regulatory functions in this region in liver and adipose tissues and a 50% methylation pattern in liver only, consistent with allelic-specific methylation, which may indicate tissue-specific POE. Our findings demonstrate a maternal-specific association between a common APOB variant and adiposity, an association that was not previously detected in GWAS.
These results provide evidence for the role of regulatory mechanisms, POEs specifically, in adiposity. In addition this study highlights the benefit of utilizing family studies for deciphering the genetic architecture of complex traits. To date, genetic variants identified in large-scale genetic studies using recent technical and methodological advances explain only a small proportion of the genetic basis of obesity, diabetes and other cardiovascular risk factors. These studies were typically conducted in samples of unrelated individuals. Here we utilize a family-based approach to identify genetic variants associated with obesity-related traits. Specifically, we examined the separate contribution of maternally- vs. paternally-inherited common genetic variants to these traits. By examining 1250 young adults and their mothers from Jerusalem, we show that a specific genetic variant, rs1367117, located in the APOB gene on chromosome 2 is related to body mass index and waist circumference when inherited from mother and not from father. This maternal effect is not restricted to Jerusalemites, but is also seen in a large sample of individuals of European descent from independent family studies worldwide. Our findings provide support of the role of complex genetic mechanisms in obesity, and highlight the benefit of utilizing family studies for uncovering genetic pathways underlying common risk factors and diseases.
Affiliation(s)
- Hagit Hochner
- Braun School of Public Health, Hebrew University-Hadassah Medical Center, Jerusalem, Israel
| | - Catherine Allard
- Département de Mathématiques, Université de Sherbrooke and Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, Quebec, Canada
| | | | - Jinbo Chen
- Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
| | - Colleen M. Sitlani
- Department of Medicine, University of Washington, Seattle, Washington, United States of America
- Cardiovascular Health Research Unit, University of Washington, Seattle, Washington, United States of America
| | - Sandra Sazdovska
- Braun School of Public Health, Hebrew University-Hadassah Medical Center, Jerusalem, Israel
| | - Thomas Lumley
- Department of Statistics, University of Auckland, Auckland, New Zealand
| | - Barbara McKnight
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
| | - Kenneth Rice
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
| | - Daniel A. Enquobahrie
- Department of Epidemiology, University of Washington, Seattle, Washington, United States of America
| | - James B. Meigs
- Harvard Medical School and General Medicine Division, Massachusetts General Hospital, Boston, Massachusetts, United States of America
| | - Pui Kwok
- Institute of Human Genetics, University of California, San Francisco, California, United States of America
- Cardiovascular Research Institute, University of California, San Francisco, California, United States of America
- Department of Dermatology, University of California, San Francisco, California, United States of America
| | - Marie-France Hivert
- Department of Population Medicine, Harvard Pilgrim Health Care Institute, Boston, Massachusetts, United States of America
| | - Ingrid B. Borecki
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Felicia Gomez
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Ting Wang
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Cornelia van Duijn
- Genetic Epidemiology Unit, Department of Epidemiology, Erasmus Medical Center, University Medical Center, Rotterdam, the Netherlands
| | - Najaf Amin
- Genetic Epidemiology Unit, Department of Epidemiology, Erasmus Medical Center, University Medical Center, Rotterdam, the Netherlands
| | - Jerome I. Rotter
- Institute for Translational Genomics and Population Sciences and Department of Pediatrics, Los Angeles BioMedical Research Institute at Harbor-UCLA Medical Center, Torrance, California, United States of America
| | - John Stamatoyannopoulos
- Department of Medicine, University of Washington, Seattle, Washington, United States of America
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Vardiella Meiner
- Department of Genetics and Metabolism, Hebrew University-Hadassah Medical Center, Jerusalem, Israel
| | - Orly Manor
- Braun School of Public Health, Hebrew University-Hadassah Medical Center, Jerusalem, Israel
| | - Josée Dupuis
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, United States of America
| | - Yechiel Friedlander
- Braun School of Public Health, Hebrew University-Hadassah Medical Center, Jerusalem, Israel
| | - David S. Siscovick
- New York Academy of Medicine, New York, New York, United States of America
| |
|
16
|
Mayer A, Iulio J, Maleri S, Eser U, Reynolds A, Vierstra J, Sandstrom R, Stamatoyannopoulos J, Churchman LS. High Resolution Architecture of Human Transcriptional Activity Revealed by Native Elongating Transcript Sequencing. FASEB J 2015. [DOI: 10.1096/fasebj.29.1_supplement.562.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Affiliation(s)
- Andreas Mayer
- Department of Genetics, Harvard Medical School, Boston, MA, United States
| | - Julia Iulio
- Department of Genetics, Harvard Medical School, Boston, MA, United States
| | - Seth Maleri
- Department of Genetics, Harvard Medical School, Boston, MA, United States
| | - Umut Eser
- Department of Genetics, Harvard Medical School, Boston, MA, United States
| | - Alex Reynolds
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States
| | - Jeff Vierstra
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States
| | - Richard Sandstrom
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States
| | | | | |
|
17
|
Wilken MS, Brzezinski JA, La Torre A, Siebenthall K, Thurman R, Sabo P, Sandstrom RS, Vierstra J, Canfield TK, Hansen RS, Bender MA, Stamatoyannopoulos J, Reh TA. DNase I hypersensitivity analysis of the mouse brain and retina identifies region-specific regulatory elements. Epigenetics Chromatin 2015; 8:8. [PMID: 25972927 PMCID: PMC4429822 DOI: 10.1186/1756-8935-8-8] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 12/29/2014] [Accepted: 01/27/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND The brain, spinal cord, and neural retina comprise the central nervous system (CNS) of vertebrates. Understanding the regulatory mechanisms that underlie the enormous cell-type diversity of the CNS is a significant challenge. Whole-genome mapping of DNase I-hypersensitive sites (DHSs) has been used to identify cis-regulatory elements in many tissues. We have applied this approach to the mouse CNS, including developing and mature neural retina, whole brain, and two well-characterized brain regions, the cerebellum and the cerebral cortex. RESULTS For the various regions and developmental stages of the CNS that we analyzed, there were approximately the same number of DHSs; however, there were many DHSs unique to each CNS region and developmental stage. Many of the DHSs are likely to mark enhancers that are specific to a given CNS region and developmental stage. We validated the DNase I mapping approach for identification of CNS enhancers using the existing VISTA Browser database and with in vivo and in vitro electroporation of the retina. Analysis of transcription factor consensus sites within the DHSs shows distinct profiles of transcriptional regulators particular to each region. Clustering developmentally dynamic DHSs in the retina revealed enrichment of developmental stage-specific transcriptional regulators. Additionally, we found reporter gene activity in the retina driven from several previously uncharacterized regulatory elements surrounding the neurodevelopmental gene Otx2. Identification of DHSs shared between mouse and human showed region-specific differences in the evolution of cis-regulatory elements. CONCLUSIONS Overall, our results demonstrate the potential of genome-wide DNase I mapping to address cis-regulatory questions regarding the regional diversity within the CNS. These data represent an extensive catalogue of potential cis-regulatory elements within the CNS that display regional and temporal specificity, as well as a set of DHSs common to CNS tissues. Further examination of evolutionary conservation of DHSs between CNS regions and different species may reveal important cis-regulatory elements in the evolution of the mammalian CNS.
Affiliation(s)
- Matthew S Wilken
- Department of Biological Structure, University of Washington, 1959 NE Pacific Street, Box 357420, Seattle, WA 98195 USA ; Molecular and Cellular Biology Program, University of Washington, MCB Program Office, T-466 Health Sciences Building, Box 357275, Seattle, WA 98195 USA
- Joseph A Brzezinski
- Department of Biological Structure, University of Washington, 1959 NE Pacific Street, Box 357420, Seattle, WA 98195 USA ; Department of Ophthalmology, University of Colorado School of Medicine, 1675 Aurora Court, Aurora, CO 80045 USA
- Anna La Torre
- Department of Biological Structure, University of Washington, 1959 NE Pacific Street, Box 357420, Seattle, WA 98195 USA
- Kyle Siebenthall
- Department of Genome Sciences, University of Washington, Foege Building S-250, 3720 15th Ave NE, Box 355065, Seattle, WA 98195 USA
- Robert Thurman
- Department of Genome Sciences, University of Washington, Foege Building S-250, 3720 15th Ave NE, Box 355065, Seattle, WA 98195 USA
- Peter Sabo
- Department of Genome Sciences, University of Washington, Foege Building S-250, 3720 15th Ave NE, Box 355065, Seattle, WA 98195 USA
- Richard S Sandstrom
- Department of Genome Sciences, University of Washington, Foege Building S-250, 3720 15th Ave NE, Box 355065, Seattle, WA 98195 USA
- Jeff Vierstra
- Department of Genome Sciences, University of Washington, Foege Building S-250, 3720 15th Ave NE, Box 355065, Seattle, WA 98195 USA
- Theresa K Canfield
- Department of Genome Sciences, University of Washington, Foege Building S-250, 3720 15th Ave NE, Box 355065, Seattle, WA 98195 USA
- R Scott Hansen
- Department of Genome Sciences, University of Washington, Foege Building S-250, 3720 15th Ave NE, Box 355065, Seattle, WA 98195 USA
- Michael A Bender
- Department of Pediatrics, University of Washington, 1959 NE Pacific St, Health Sciences Building, Seattle, WA Box 356320, 98195 USA ; Clinical Research Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109 USA
- John Stamatoyannopoulos
- Department of Genome Sciences, University of Washington, Foege Building S-250, 3720 15th Ave NE, Box 355065, Seattle, WA 98195 USA
- Thomas A Reh
- Department of Biological Structure, University of Washington, 1959 NE Pacific Street, Box 357420, Seattle, WA 98195 USA
|
18
|
Shao J, He K, Wang H, Ho WS, Ren X, An X, Wong MK, Yan B, Xie D, Stamatoyannopoulos J, Zhao Z. Collaborative regulation of development but independent control of metabolism by two epidermis-specific transcription factors in Caenorhabditis elegans. J Biol Chem 2013; 288:33411-26. [PMID: 24097988 DOI: 10.1074/jbc.m113.487975] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 11/06/2022] Open
Abstract
Cell fate specification is typically initiated by a master regulator, which is relayed by tissue-specific regulatory proteins (usually transcription factors) for further enforcement of cell identities, but how the factors are coordinated among each other to "finish up" the specification remains poorly understood. Caenorhabditis elegans epidermis specification is initiated by a master regulator, ELT-1, that activates its targets, NHR-25 and ELT-3, two epidermis-specific transcription factors that are important for development but not for initial specification of epidermis, thus providing a unique paradigm for illustrating how the tissue-specific regulatory proteins work together to enforce cell fate specification. Here we addressed the question through contrasting genome-wide in vivo binding targets between NHR-25 and ELT-3. We demonstrate that the two factors bind discrete but conserved DNA motifs, most of which remain in proximity, suggesting formation of a complex between the two. In agreement with this, gene ontology analysis of putative target genes suggested differential regulation of metabolism but coordinated control of epidermal development between the two factors, which is supported by quantitative analysis of expression of their specific or common targets in the presence or absence of either protein. Functional validation of a subset of the target genes showed both activating and inhibitory roles of NHR-25 and ELT-3 in regulating their targets. We further demonstrated differential control of specification of AB and C lineage-derived epidermis. The results allow us to assemble a comprehensive gene network underlying C. elegans epidermis development that is likely to be widely used across species and provides insights into how tissue-specific transcription factors coordinate with one another to enforce cell fate specification initiated by its master regulator.
Affiliation(s)
- Jiaofang Shao
- From the Department of Biology, Hong Kong Baptist University, Hong Kong, China and
|
19
|
Tseng HHE, Hullar MAJ, Li F, Lampe JW, Sandstrom R, Johnson AK, Strate LL, Ruzzo WL, Stamatoyannopoulos J. A microbial profiling method for the human microbiota using high-throughput sequencing. Metagenomics (Cairo) 2013; 2:235646. [PMID: 24013439 DOI: 10.4303/mg/235646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Study of the human microbiota in relation to human health and disease is a rapidly expanding field. To fully understand the complex relationship between the human gut microbiota and disease risks, study designs that capture the variation within and between human subjects at the population level are required, but this has been hampered by the lack of cost-effective methods to characterize this variation. Illumina sequencing is inexpensive and produces millions of reads per run, but it is unclear whether short reads can adequately represent the microbial community of a human host. In this study, we examined the utility of a profiling method, microbial nucleotide signatures (MNS), focused on low-depth sampling of the human microbiota using Illumina short reads. This method is intended to aid in human population-based studies where large sample sizes are required to adequately capture variation in disease or phenotype differences. We found that, by calculating the nucleotide diversities along the sequenced 16S rRNA gene region, which did not require assembly or phylogenetic identification, we were able to differentiate the gut microbial nucleotide signatures of 9 healthy individuals. When we further subsampled the reads down to 40,000 reads (51 bp long) per sample, the diversity profiles were relatively unchanged. Applying MNS to public datasets showed that it could differentiate body site differences. The scalability of our approach offers rapid classification of study participants for studies with the sample sizes required for epidemiological studies. Using MNS to classify the microbiome associated with a disease state followed by targeted in-depth sequencing will give a comprehensive understanding of the role of the microbiome in human health.
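The abstract above describes profiling a sample by the nucleotide diversity at each position of the sequenced 16S rRNA region, without assembly. The following is a minimal illustrative sketch of that idea, not the authors' implementation: it assumes equal-length reads that all start at the same position, and it assumes diversity is measured as 1 minus the sum of squared base frequencies, since the abstract does not give the exact definition. The function names `positional_diversity` and `profile_distance` are hypothetical.

```python
from collections import Counter


def positional_diversity(reads):
    """Per-position diversity profile for equal-length, co-located reads.

    Diversity at a position is 1 - sum(p_b ** 2) over bases b in ACGT
    (an assumed measure; the published method may define it differently).
    """
    if not reads:
        return []
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("all reads must have the same length")
    profile = []
    for column in zip(*reads):
        # Ignore ambiguous calls such as N when counting bases.
        counts = Counter(base for base in column if base in "ACGT")
        total = sum(counts.values())
        if total == 0:
            profile.append(0.0)
            continue
        profile.append(1.0 - sum((n / total) ** 2 for n in counts.values()))
    return profile


def profile_distance(profile_a, profile_b):
    """Mean absolute difference between two diversity profiles (hypothetical comparison metric)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b)) / len(profile_a)


sample = ["ACGT", "ACGA", "ACCA", "ACCT"]
signature = positional_diversity(sample)
```

Two samples could then be compared by passing their profiles to `profile_distance`; a real analysis would also need read alignment and quality filtering, which this sketch omits.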
|
20
|
Rajagopal N, Xie W, Li Y, Wagner U, Wang W, Stamatoyannopoulos J, Ernst J, Kellis M, Ren B. RFECS: a random-forest based algorithm for enhancer identification from chromatin state. PLoS Comput Biol 2013; 9:e1002968. [PMID: 23526891 PMCID: PMC3597546 DOI: 10.1371/journal.pcbi.1002968] [Citation(s) in RCA: 151] [Impact Index Per Article: 13.7] [Received: 08/16/2012] [Accepted: 01/20/2013] [Indexed: 01/08/2023] Open
Abstract
Transcriptional enhancers play critical roles in regulation of gene expression, but their identification in the eukaryotic genome has been challenging. Recently, it was shown that enhancers in the mammalian genome are associated with characteristic histone modification patterns, which have been increasingly exploited for enhancer identification. However, only a limited number of cell types or chromatin marks have previously been investigated for this purpose, leaving the question unanswered whether there exists an optimal set of histone modifications for enhancer prediction in different cell types. Here, we address this issue by exploring genome-wide profiles of 24 histone modifications in two distinct human cell types, embryonic stem cells and lung fibroblasts. We developed a Random-Forest based algorithm, RFECS (Random Forest based Enhancer identification from Chromatin States) to integrate histone modification profiles for identification of enhancers, and used it to identify enhancers in a number of cell-types. We show that RFECS not only leads to more accurate and precise prediction of enhancers than previous methods, but also helps identify the most informative and robust set of three chromatin marks for enhancer prediction.
Enhancers are regions in the genome that can activate the expression of a gene irrespective of their location with respect to the gene. Identifying these elements is critical in understanding regulatory differences between different cell-types. Since enhancers lack characteristic sequence features and can be far away from the gene they regulate, their identification is not trivial. Experimentally determining the genome-wide binding sites of transcriptional co-activator p300 is one way of finding enhancers but it can only identify a subset of enhancers. A few years ago, it was observed that the binding sites of p300 are marked by distinctive, post-translational histone modifications. Several groups have exploited this discovery to predict genome-wide enhancers based on their similarity to the histone modification profiles of p300 binding sites. We here report a novel algorithm for this purpose and show that it has much greater accuracy than existing methods. Another unique feature of our algorithm is the ability to automatically deduce the most informative set of histone modifications required for enhancer prediction. We expect that this method will become increasingly useful with the expanding number of known histone modifications and rapid accumulation of epigenomic datasets for various cell types and species.
Affiliation(s)
- Nisha Rajagopal
- Ludwig Institute for Cancer Research, University of California at San Diego, La Jolla, California, United States of America
- Bioinformatics and Systems Biology program, University of California at San Diego, La Jolla, California, United States of America
- Wei Xie
- Ludwig Institute for Cancer Research, University of California at San Diego, La Jolla, California, United States of America
- Yan Li
- Ludwig Institute for Cancer Research, University of California at San Diego, La Jolla, California, United States of America
- Uli Wagner
- Ludwig Institute for Cancer Research, University of California at San Diego, La Jolla, California, United States of America
- Wei Wang
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, California, United States of America
- John Stamatoyannopoulos
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Jason Ernst
- Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California, United States of America
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Bing Ren
- Ludwig Institute for Cancer Research, University of California at San Diego, La Jolla, California, United States of America
- Bioinformatics and Systems Biology program, University of California at San Diego, La Jolla, California, United States of America
- Department of Cellular and Molecular Medicine, Institute of Genomic Medicine, and Moores Cancer Center, University of California at San Diego, La Jolla, California, United States of America
|
21
|
Paige SL, Thomas S, Stoick-Cooper CL, Wang H, Maves L, Sandstrom R, Pabon L, Reinecke H, Pratt G, Keller G, Moon RT, Stamatoyannopoulos J, Murry CE. A temporal chromatin signature in human embryonic stem cells identifies regulators of cardiac development. Cell 2012; 151:221-32. [PMID: 22981225 DOI: 10.1016/j.cell.2012.08.027] [Citation(s) in RCA: 224] [Impact Index Per Article: 18.7] [Received: 04/03/2012] [Revised: 06/26/2012] [Accepted: 08/15/2012] [Indexed: 12/19/2022]
Abstract
Directed differentiation of human embryonic stem cells (ESCs) into cardiovascular cells provides a model for studying molecular mechanisms of human cardiovascular development. Although it is known that chromatin modification patterns in ESCs differ markedly from those in lineage-committed progenitors and differentiated cells, the temporal dynamics of chromatin alterations during differentiation along a defined lineage have not been studied. We show that differentiation of human ESCs into cardiovascular cells is accompanied by programmed temporal alterations in chromatin structure that distinguish key regulators of cardiovascular development from other genes. We used this temporal chromatin signature to identify regulators of cardiac development, including the homeobox gene MEIS2. Using the zebrafish model, we demonstrate that MEIS2 is critical for proper heart tube formation and subsequent cardiac looping. Temporal chromatin signatures should be broadly applicable to other models of stem cell differentiation to identify regulators and provide key insights into major developmental decisions.
Affiliation(s)
- Sharon L Paige
- Department of Pathology, University of Washington, Seattle, 98109, USA
|
22
|
Djebali S, Lagarde J, Kapranov P, Lacroix V, Borel C, Mudge JM, Howald C, Foissac S, Ucla C, Chrast J, Ribeca P, Martin D, Murray RR, Yang X, Ghamsari L, Lin C, Bell I, Dumais E, Drenkow J, Tress ML, Gelpí JL, Orozco M, Valencia A, van Berkum NL, Lajoie BR, Vidal M, Stamatoyannopoulos J, Batut P, Dobin A, Harrow J, Hubbard T, Dekker J, Frankish A, Salehi-Ashtiani K, Reymond A, Antonarakis SE, Guigó R, Gingeras TR. Evidence for transcript networks composed of chimeric RNAs in human cells. PLoS One 2012; 7:e28213. [PMID: 22238572 PMCID: PMC3251577 DOI: 10.1371/journal.pone.0028213] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Received: 07/06/2011] [Accepted: 11/03/2011] [Indexed: 12/03/2022] Open
Abstract
The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found in organisms from yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5′ and 3′ transcriptional termini of 492 protein coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordination of the expression of connected genes and (4) the close in vivo and three-dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connection of the genes involved suggests that chimeric transcripts should not be studied in isolation, but together, as an RNA network.
Affiliation(s)
- Sarah Djebali
- Bioinformatics and Genomics, Centre for Genomic Regulation and Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
|
23
|
Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, Sandstrom R, Bernstein B, Bender MA, Groudine M, Gnirke A, Stamatoyannopoulos J, Mirny LA, Lander ES, Dekker J. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 2009; 326:289-93. [PMID: 19815776 DOI: 10.1126/science.1181369] [Citation(s) in RCA: 5314] [Impact Index Per Article: 354.3] [Indexed: 12/12/2022]
Abstract
We describe Hi-C, a method that probes the three-dimensional architecture of whole genomes by coupling proximity-based ligation with massively parallel sequencing. We constructed spatial proximity maps of the human genome with Hi-C at a resolution of 1 megabase. These maps confirm the presence of chromosome territories and the spatial proximity of small, gene-rich chromosomes. We identified an additional level of genome organization that is characterized by the spatial segregation of open and closed chromatin to form two genome-wide compartments. At the megabase scale, the chromatin conformation is consistent with a fractal globule, a knot-free, polymer conformation that enables maximally dense packing while preserving the ability to easily fold and unfold any genomic locus. The fractal globule is distinct from the more commonly used globular equilibrium model. Our results demonstrate the power of Hi-C to map the dynamic conformations of whole genomes.
Affiliation(s)
- Erez Lieberman-Aiden
- Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), MA 02139, USA
|
24
|
Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, Sandstrom R, Bernstein B, Bender MA, Groudine M, Gnirke A, Stamatoyannopoulos J, Mirny LA, Lander ES, Dekker J. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 2009. [PMID: 19815776 DOI: 10.1126/science.1181369/suppl_file/lieberman-aiden.som.pdf] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 04/01/2023]
|
25
|
Abstract
We developed a primer design method, Pythia, in which state-of-the-art DNA binding affinity computations are directly integrated into the primer design process. We use chemical reaction equilibrium analysis to integrate multiple binding energy calculations into a conservative measure of polymerase chain reaction (PCR) efficiency, and a precomputed index on genomic sequences to evaluate primer specificity. We show that Pythia can design primers with success rates comparable with those of current methods, but yields much higher coverage in difficult genomic regions. For example, in RepeatMasked sequences in the human genome, Pythia achieved a median coverage of 89% as compared with a median coverage of 51% for Primer3. For parameter settings yielding sensitivities of 81%, our method has a recall of 97%, compared with the Primer3 recall of 48%. Because our primer design approach is based on the chemistry of DNA interactions, it has fewer and more physically meaningful parameters than current methods, and is therefore easier to adjust to specific experimental requirements. Our software is freely available at http://pythia.sourceforge.net.
Affiliation(s)
- Tobias Mann
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
|
26
|
Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, Teague B, Alkan C, Antonacci F, Haugen E, Zerr T, Yamada NA, Tsang P, Newman TL, Tüzün E, Cheng Z, Ebling HM, Tusneem N, David R, Gillett W, Phelps KA, Weaver M, Saranga D, Brand A, Tao W, Gustafson E, McKernan K, Chen L, Malig M, Smith JD, Korn JM, McCarroll SA, Altshuler DA, Peiffer DA, Dorschner M, Stamatoyannopoulos J, Schwartz D, Nickerson DA, Mullikin JC, Wilson RK, Bruhn L, Olson MV, Kaul R, Smith DR, Eichler EE. Mapping and sequencing of structural variation from eight human genomes. Nature 2008; 453:56-64. [PMID: 18451855 PMCID: PMC2424287 DOI: 10.1038/nature06862] [Citation(s) in RCA: 877] [Impact Index Per Article: 54.8] [Received: 11/07/2007] [Accepted: 02/15/2008] [Indexed: 11/08/2022]
Abstract
Genetic variation among individual humans occurs on many different scales, ranging from gross alterations in the human karyotype to single nucleotide changes. Here we explore variation on an intermediate scale--particularly insertions, deletions and inversions affecting from a few thousand to a few million base pairs. We employed a clone-based method to interrogate this intermediate structural variation in eight individuals of diverse geographic ancestry. Our analysis provides a comprehensive overview of the normal pattern of structural variation present in these genomes, refining the location of 1,695 structural variants. We find that 50% were seen in more than one individual and that nearly half lay outside regions of the genome previously described as structurally variant. We discover 525 new insertion sequences that are not present in the human reference genome and show that many of these are variable in copy number between individuals. Complete sequencing of 261 structural variants reveals considerable locus complexity and provides insights into the different mutational processes that have shaped the human genome. These data provide the first high-resolution sequence map of human structural variation--a standard for genotyping platforms and a prelude to future individual genome sequencing projects.
Affiliation(s)
- Jeffrey M Kidd
- Department of Genome Sciences and Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
|
27
|
Abstract
One of the major goals of comparative genomics is to understand the evolutionary history of each nucleotide in the human genome sequence, and the degree to which it is under selective pressure. Ascertainment of selective constraint at nucleotide resolution is particularly important for predicting the functional significance of human genetic variation and for analyzing the sequence substructure of cis-regulatory sequences and other functional elements. Current methods for analysis of sequence conservation are focused on delineation of conserved regions comprising tens or even hundreds of consecutive nucleotides. We therefore developed a novel computational approach designed specifically for scoring evolutionary conservation at individual base-pair resolution. Our approach estimates the rate at which each nucleotide position is evolving, computes the probability of neutrality given this rate estimate, and summarizes the result in a Sequence CONservation Evaluation (SCONE) score. We computed SCONE scores in a continuous fashion across 1% of the human genome for which high-quality sequence information from up to 23 genomes is available. We show that SCONE scores are clearly correlated with the allele frequency of human polymorphisms in both coding and noncoding regions. We find that the majority of noncoding conserved nucleotides lie outside of longer conserved elements predicted by other conservation analyses, and are experiencing ongoing selection in modern humans as evident from the allele frequency spectrum of human polymorphism. We also applied SCONE to analyze the distribution of conserved nucleotides within functional regions. These regions are markedly enriched in individually conserved positions and short (<15 bp) conserved “chunks.” Our results collectively suggest that the majority of functionally important noncoding conserved positions are highly fragmented and reside outside of canonically defined long conserved noncoding sequences. A small subset of these fragmented positions may be identified with high confidence.
The structure of the human genome remains largely unknown, including which parts of the genome are functionally relevant and which parts are “junk.” The availability of genomic sequence from a large number of mammals allows a more detailed exploration of this structure, using comparison of related sequences from different species to identify portions of the genome that have remained unchanged, conserved by the action of natural selection, and thus likely to be functionally significant. To date, most efforts focused on localizing the functional fraction of the human genome have been based on identifying contiguous stretches of positions conserved in multiple species. Here, we present an analysis that is based instead on a single-position measure of conservation called SCONE. Our analysis suggests that the majority of conserved and putatively functional positions are highly fragmented and lie outside contiguous regions of conserved sequence. A subset of these fragmented positions may be identified based on local clustering.
Affiliation(s)
- Saurabh Asthana
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Mikhail Roytberg
- Computational Biology Group, Institute of Mathematical Problems in Biology, Russian Academy of Sciences, Pushchino, Russia
- John Stamatoyannopoulos
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- * To whom correspondence should be addressed. E-mail: (SS), (JS)
- Shamil Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- * To whom correspondence should be addressed. E-mail: (SS), (JS)
|
28
|
Stamatoyannopoulos J. Large-scale analysis of erythroid chromatin structure. Blood Cells Mol Dis 2007. [DOI: 10.1016/j.bcmd.2006.10.132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
29
|
Abstract
MOTIVATION In the living cell nucleus, genomic DNA is packaged into chromatin. DNA sequences that regulate transcription and other chromosomal processes are associated with local disruptions, or 'openings', in chromatin structure caused by the cooperative action of regulatory proteins. Such perturbations are extremely specific for cis-regulatory elements and occur over short stretches of DNA (typically approximately 250 bp). They can be detected experimentally as DNaseI hypersensitive sites (HSs) in vivo, though the process is extremely laborious and costly. The ability to discriminate DNaseI HSs computationally would have a major impact on the annotation and utilization of the human genome. RESULTS We found that a supervised pattern recognition algorithm, trained using a set of 280 DNaseI HS and 737 non-HS control sequences from erythroid cells, was capable of de novo prediction of HSs across the human genome with surprisingly high accuracy determined by prospective in vivo validation. Systematic application of this computational approach will greatly facilitate the discovery and analysis of functional non-coding elements in the human and other complex genomes. AVAILABILITY Supplementary data is available at noble.gs.washington.edu/proj/hs
|