1
Delaney A, Burkholder AB, Lavender CA, Plummer L, Mericq V, Merino PM, Quinton R, Lewis KL, Meader BN, Albano A, Shaw ND, Welt CK, Martin KA, Seminara SB, Biesecker LG, Bailey-Wilson JE, Hall JE. Increased Burden of Rare Sequence Variants in GnRH-Associated Genes in Women With Hypothalamic Amenorrhea. J Clin Endocrinol Metab 2021; 106:e1441-e1452. [PMID: 32870266 PMCID: PMC7947783 DOI: 10.1210/clinem/dgaa609] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/30/2020] [Accepted: 08/28/2020] [Indexed: 12/17/2022]
Abstract
CONTEXT Functional hypothalamic amenorrhea (HA) is a common, acquired form of hypogonadotropic hypogonadism that occurs in the setting of energy deficits and/or stress. Variability in individual susceptibility to these stressors, HA heritability, and previous identification of several rare sequence variants (RSVs) in genes associated with the rare disorder, isolated hypogonadotropic hypogonadism (IHH), in individuals with HA suggest a possible genetic contribution to HA susceptibility. OBJECTIVE We sought to determine whether the burden of RSVs in IHH-related genes is greater in women with HA than controls. DESIGN We compared patients with HA to control women. SETTING The study was conducted at secondary referral centers. PATIENTS AND OTHER PARTICIPANTS Women with HA (n = 106) and control women (ClinSeq study; n = 468). INTERVENTIONS We performed exome sequencing in all patients and controls. MAIN OUTCOME MEASURE(S) The frequency of RSVs in 53 IHH-associated genes was determined using rare variant burden and association tests. RESULTS RSVs were overrepresented in women with HA compared with controls (P = .007). Seventy-eight heterozygous RSVs in 33 genes were identified in 58 women with HA (36.8% of alleles) compared to 255 RSVs in 41 genes among 200 control women (27.2%). CONCLUSIONS Women with HA are enriched for RSVs in genes that cause IHH, suggesting that variation in genes associated with gonadotropin-releasing hormone neuronal ontogeny and function may be a major determinant of individual susceptibility to developing HA in the face of diet, exercise, and/or stress.
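The allele percentages in the results can be checked with simple arithmetic. This minimal sketch assumes that "% of alleles" means the number of heterozygous RSVs divided by two alleles per woman across each full cohort (106 women with HA, 468 controls). That reading reproduces both reported figures. The function name is illustrative only, and the calculation does not reproduce the study's burden and association tests or the reported P value.

```python
def rsv_allele_fraction(n_rsvs: int, n_women: int) -> float:
    """Share of screened alleles carrying an RSV, assuming two alleles per woman
    and at most one RSV per allele (both assumptions of this sketch)."""
    return n_rsvs / (2 * n_women)


ha_fraction = rsv_allele_fraction(78, 106)        # women with HA
control_fraction = rsv_allele_fraction(255, 468)  # ClinSeq control women

print(f"HA:       {ha_fraction:.1%}")       # HA:       36.8%
print(f"Controls: {control_fraction:.1%}")  # Controls: 27.2%
```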
Affiliation(s)
- Angela Delaney
- National Institute of Environmental Health Sciences, National Institutes of Health (NIH), Research Triangle Park, North Carolina
- Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Bethesda, Maryland
- Adam B Burkholder
- National Institute of Environmental Health Sciences, National Institutes of Health (NIH), Research Triangle Park, North Carolina
- Christopher A Lavender
- National Institute of Environmental Health Sciences, National Institutes of Health (NIH), Research Triangle Park, North Carolina
- Lacey Plummer
- Harvard Reproductive Sciences Center and Reproductive Endocrine Unit, Massachusetts General Hospital, Boston, Massachusetts
- Veronica Mericq
- Institute of Maternal and Child Research, University of Chile, Santiago, Chile
- Department of Pediatrics, Clínica Las Condes, Santiago, Chile
- Paulina M Merino
- Institute of Maternal and Child Research, University of Chile, Santiago, Chile
- Richard Quinton
- Institute of Genetic Medicine, Newcastle University, Newcastle-upon-Tyne, UK
- Katie L Lewis
- Medical Genomics & Metabolic Genetics Branch, National Human Genome Research Institute, NIH, Bethesda, Maryland
- Brooke N Meader
- National Institute of Environmental Health Sciences, National Institutes of Health (NIH), Research Triangle Park, North Carolina
- Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Bethesda, Maryland
- Alessandro Albano
- National Institute of Environmental Health Sciences, National Institutes of Health (NIH), Research Triangle Park, North Carolina
- Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Bethesda, Maryland
- Natalie D Shaw
- National Institute of Environmental Health Sciences, National Institutes of Health (NIH), Research Triangle Park, North Carolina
- Corrine K Welt
- Division of Endocrinology, Metabolism and Diabetes, University of Utah, Salt Lake City, Utah
- Kathryn A Martin
- Harvard Reproductive Sciences Center and Reproductive Endocrine Unit, Massachusetts General Hospital, Boston, Massachusetts
- Stephanie B Seminara
- Harvard Reproductive Sciences Center and Reproductive Endocrine Unit, Massachusetts General Hospital, Boston, Massachusetts
- Leslie G Biesecker
- Medical Genomics & Metabolic Genetics Branch, National Human Genome Research Institute, NIH, Bethesda, Maryland
- Joan E Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, Maryland
- Janet E Hall
- National Institute of Environmental Health Sciences, National Institutes of Health (NIH), Research Triangle Park, North Carolina
2
Lujan SA, Longley MJ, Humble MH, Lavender CA, Burkholder A, Blakely EL, Alston CL, Gorman GS, Turnbull DM, McFarland R, Taylor RW, Kunkel TA, Copeland WC. Ultrasensitive deletion detection links mitochondrial DNA replication, disease, and aging. Genome Biol 2020; 21:248. [PMID: 32943091 PMCID: PMC7500033 DOI: 10.1186/s13059-020-02138-5] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 04/07/2020] [Accepted: 08/07/2020] [Indexed: 12/31/2022]
Abstract
BACKGROUND Acquired human mitochondrial genome (mtDNA) deletions are symptoms and drivers of focal mitochondrial respiratory deficiency, a pathological hallmark of aging and late-onset mitochondrial disease. RESULTS To decipher connections between these processes, we create LostArc, an ultrasensitive method for quantifying deletions in circular mtDNA molecules. LostArc reveals 35 million deletions (~470,000 unique spans) in skeletal muscle from 22 individuals with and 19 individuals without pathogenic variants in POLG. This nuclear gene encodes the catalytic subunit of replicative mitochondrial DNA polymerase γ. Ablation, the deleted mtDNA fraction, suffices to explain skeletal muscle phenotypes of aging and POLG-derived disease. Unsupervised bioinformatic analyses reveal distinct age- and disease-correlated deletion patterns. CONCLUSIONS These patterns implicate replication by DNA polymerase γ as the deletion driver and suggest little purifying selection against mtDNA deletions by mitophagy in postmitotic muscle fibers. Observed deletion patterns are best modeled as mtDNA deletions initiated by replication fork stalling during strand displacement mtDNA synthesis.
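A deletion in a circular molecule removes an arc that may wrap past position 0, so span length has to be computed modulo the genome length. The following is a hypothetical illustration and not LostArc's implementation. It assumes 0-based, half-open breakpoints, a (start, end, molecule count) input format, and a simple definition of the deleted fraction as deleted bases divided by total bases sampled. All function names and toy inputs are invented for the example.

```python
MTDNA_LENGTH = 16569  # length of the human mtDNA reference sequence (rCRS), in bp


def deleted_arc_length(start: int, end: int, genome_length: int = MTDNA_LENGTH) -> int:
    """Bases removed by a deletion on a circular genome.

    Breakpoints are 0-based and half-open: `start` is the first deleted base and
    `end` is the first retained base. When end < start, the deleted arc wraps
    through position 0. end == start is treated as an empty deletion.
    """
    return (end - start) % genome_length


def deleted_fraction(deletions, total_molecules: int, genome_length: int = MTDNA_LENGTH) -> float:
    """Hypothetical deleted-mtDNA fraction: deleted bases / total bases sampled.

    `deletions` is an iterable of (start, end, n_molecules) tuples, an input
    format assumed for this sketch.
    """
    deleted_bases = sum(
        deleted_arc_length(start, end, genome_length) * n for start, end, n in deletions
    )
    return deleted_bases / (total_molecules * genome_length)


toy_deletions = [
    (8469, 13446, 5),  # 4,977-bp arc seen in 5 molecules
    (16000, 500, 2),   # arc wrapping past position 0, seen in 2 molecules
]
print(deleted_arc_length(8469, 13446))  # 4977
print(deleted_arc_length(16000, 500))   # 1069
print(deleted_fraction(toy_deletions, total_molecules=1000))
```

The wrapping arc contributes 569 bases before the origin plus 500 after it. This is why a simple `end - start` would give a wrong (negative) length for circular mtDNA.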
Affiliation(s)
- Scott A Lujan
- Genome Integrity and Structural Biology Laboratory, DNA Replication Fidelity Group, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, 27709, USA
- Matthew J Longley
- Genome Integrity and Structural Biology Laboratory, Mitochondrial DNA Replication Group, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, 27709, USA
- Margaret H Humble
- Genome Integrity and Structural Biology Laboratory, Mitochondrial DNA Replication Group, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, 27709, USA
- Christopher A Lavender
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, 27709, USA
- Adam Burkholder
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, 27709, USA
- Emma L Blakely
- Wellcome Centre for Mitochondrial Research, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- NHS Highly Specialised Mitochondrial Diagnostic Laboratory, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, NE1 4LP, UK
- Charlotte L Alston
- Wellcome Centre for Mitochondrial Research, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- NHS Highly Specialised Mitochondrial Diagnostic Laboratory, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, NE1 4LP, UK
- Grainne S Gorman
- Wellcome Centre for Mitochondrial Research, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- Doug M Turnbull
- Wellcome Centre for Mitochondrial Research, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- Robert McFarland
- Wellcome Centre for Mitochondrial Research, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- Robert W Taylor
- Wellcome Centre for Mitochondrial Research, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- NHS Highly Specialised Mitochondrial Diagnostic Laboratory, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, NE1 4LP, UK
- Thomas A Kunkel
- Genome Integrity and Structural Biology Laboratory, DNA Replication Fidelity Group, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, 27709, USA
- William C Copeland
- Genome Integrity and Structural Biology Laboratory, Mitochondrial DNA Replication Group, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, 27709, USA
3
Lavender CA, Shapiro AJ, Day FS, Fargo DC. ORSO (Online Resource for Social Omics): A data-driven social network connecting scientists to genomics datasets. PLoS Comput Biol 2020; 16:e1007571. [PMID: 31978042 PMCID: PMC7001987 DOI: 10.1371/journal.pcbi.1007571] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/17/2018] [Revised: 02/05/2020] [Accepted: 11/29/2019] [Indexed: 11/17/2022]
Abstract
High-throughput sequencing has become ubiquitous in biomedical sciences. As new technologies emerge and sequencing costs decline, the diversity and volume of available data increase exponentially, and successfully navigating the data becomes more challenging. Though datasets are often hosted by public repositories, scientists must rely on inconsistent annotation to identify and interpret meaningful data. Moreover, the experimental heterogeneity and wide-ranging quality of high-throughput biological data mean that even data with desired cell lines, tissue types, or molecular targets may not be readily interpretable or integrated. We have developed ORSO (Online Resource for Social Omics) as an easy-to-use web application to connect life scientists with genomics data. In ORSO, users interact within a data-driven social network, where they can favorite datasets and follow other users. In addition to more than 30,000 datasets hosted from major biomedical consortia, users may contribute their own data to ORSO, facilitating its discovery by other users. Leveraging user interactions, ORSO provides a novel recommendation system to automatically connect users with hosted data. In addition to social interactions, the recommendation system considers primary read coverage information and annotated metadata. Similarities used by the recommendation system are presented by ORSO in a graph display, allowing exploration of dataset associations. The topology of the network graph reflects established biology, with samples from related systems grouped together. We tested the recommendation system using an RNA-seq time course dataset from differentiation of embryonic stem cells to cardiomyocytes. The ORSO recommendation system correctly predicted early data point sources as embryonic stem cells and late data point sources as heart and muscle samples, resulting in recommendation of related datasets.
By connecting scientists with relevant data, ORSO provides a critical new service that facilitates wide-ranging research interests. New sequencing technologies have rapidly transformed biomedical research. Public data repositories now contain millions of datasets, which have the potential to accelerate and bolster research projects. However, the sheer magnitude of available data makes navigation difficult. We created ORSO (Online Resource for Social Omics) to address these challenges. ORSO is a social network where entries are not status updates or tweets, but biological datasets. Users may add their own data to ORSO, joining 30,000 validated datasets that are already hosted, and other users may find these data through intuitive search functions and informative analytics. Users can then favorite datasets relevant to their interests or follow contributing users. ORSO also uses a recommendation system like those used on commercial websites to automatically recommend data to users based on user interactions and dataset similarities. By making data more accessible and by connecting users to relevant data, we anticipate that ORSO will be an important resource for scientists. ORSO may be the first of many applications that use methods originating in social media and ecommerce to enhance and further research projects in the life sciences.
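The abstract describes recommendations built from three kinds of signal: social interactions, read coverage, and annotated metadata. One way such signals could be blended is sketched below. The dataset schema, the choice of cosine similarity for coverage and Jaccard similarity for metadata and shared favorites, the weights, and the toy catalog are all assumptions made for illustration; this is not ORSO's published scoring method.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    """Shared items divided by all items across two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(ds_a, ds_b, weights=(0.4, 0.3, 0.3)):
    """Hypothetical weighted blend of coverage, metadata, and shared-favorite similarity."""
    w_cov, w_meta, w_social = weights
    return (
        w_cov * cosine(ds_a["coverage"], ds_b["coverage"])
        + w_meta * jaccard(ds_a["metadata"], ds_b["metadata"])
        + w_social * jaccard(ds_a["favorited_by"], ds_b["favorited_by"])
    )


def recommend(user_favorites, catalog, top_n=3):
    """Rank datasets the user has not favorited by their best match to any favorite."""
    if not user_favorites:
        return []
    scores = {
        name: max(combined_similarity(catalog[fav], ds) for fav in user_favorites)
        for name, ds in catalog.items()
        if name not in user_favorites
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy catalog; coverage values are binned read counts invented for illustration.
catalog = {
    "heart": {"coverage": [0, 1, 5, 9], "metadata": {"RNA-seq", "heart", "mouse"}, "favorited_by": {"u1", "u2"}},
    "muscle": {"coverage": [0, 2, 4, 8], "metadata": {"RNA-seq", "muscle", "mouse"}, "favorited_by": {"u2"}},
    "esc": {"coverage": [9, 5, 1, 0], "metadata": {"RNA-seq", "ESC", "mouse"}, "favorited_by": {"u3"}},
}
print(recommend({"heart"}, catalog))  # ['muscle', 'esc']
```

Here "muscle" outranks "esc" for a user who favorited "heart" because its coverage profile is nearly parallel and it shares a favoriting user. Metadata overlap alone would tie the two candidates.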
Affiliation(s)
- Christopher A Lavender
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
- Andrew J Shapiro
- Program Operations Branch, Division of the National Toxicology Program, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
- Frank S Day
- Office of Scientific Computing, Division of Intramural Research, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
- David C Fargo
- Office of Scientific Computing, Division of Intramural Research, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
4
Oldfield AJ, Henriques T, Kumar D, Burkholder AB, Cinghu S, Paulet D, Bennett BD, Yang P, Scruggs BS, Lavender CA, Rivals E, Adelman K, Jothi R. NF-Y controls fidelity of transcription initiation at gene promoters through maintenance of the nucleosome-depleted region. Nat Commun 2019; 10:3072. [PMID: 31296853 PMCID: PMC6624317 DOI: 10.1038/s41467-019-10905-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 12/21/2018] [Accepted: 05/27/2019] [Indexed: 12/22/2022]
Abstract
Faithful transcription initiation is critical for accurate gene expression, yet the mechanisms underlying specific transcription start site (TSS) selection in mammals remain unclear. Here, we show that the histone-fold domain protein NF-Y, a ubiquitously expressed transcription factor, controls the fidelity of transcription initiation at gene promoters in mouse embryonic stem cells. We report that NF-Y maintains the region upstream of TSSs in a nucleosome-depleted state while simultaneously protecting this accessible region against aberrant and/or ectopic transcription initiation. We find that loss of NF-Y binding in mammalian cells disrupts the promoter chromatin landscape, leading to nucleosomal encroachment over the canonical TSS. Importantly, this chromatin rearrangement is accompanied by upstream relocation of the transcription pre-initiation complex and ectopic transcription initiation. Further, this phenomenon generates aberrant extended transcripts that undergo translation, disrupting gene expression profiles. These results suggest NF-Y is a central player in TSS selection in metazoans and highlight the deleterious consequences of inaccurate transcription initiation. The mechanisms underlying specific TSS selection in mammals remain unclear. Here the authors show that the ubiquitously expressed transcription factor NF-Y regulates fidelity of transcription initiation at gene promoters, maintaining the region upstream of TSSs in a nucleosome-depleted state while protecting this region from ectopic transcription initiation.
Affiliation(s)
- Andrew J Oldfield
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, Durham, NC, 27709, USA
- Institute of Human Genetics, CNRS, University of Montpellier, Montpellier, 34396, France
- Telmo Henriques
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, Durham, NC, 27709, USA
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
- Dhirendra Kumar
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, Durham, NC, 27709, USA
- Adam B Burkholder
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, Durham, NC, 27709, USA
- Senthilkumar Cinghu
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, Durham, NC, 27709, USA
- Damien Paulet
- Department of Computer Science, LIRMM, CNRS et Université de Montpellier, Montpellier, 34095, France
- Institut de Biologie Computationnelle (IBC), Université de Montpellier, Montpellier, 34095, France
- Brian D Bennett
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, Durham, NC, 27709, USA
- Pengyi Yang
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, Durham, NC, 27709, USA
- Charles Perkins Centre and School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- Benjamin S Scruggs
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, Durham, NC, 27709, USA
- Christopher A Lavender
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, Durham, NC, 27709, USA
- Eric Rivals
- Department of Computer Science, LIRMM, CNRS et Université de Montpellier, Montpellier, 34095, France
- Institut de Biologie Computationnelle (IBC), Université de Montpellier, Montpellier, 34095, France
- Karen Adelman
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, Durham, NC, 27709, USA
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
- Raja Jothi
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, Durham, NC, 27709, USA
5
Nguyen TAT, Grimm SA, Bushel PR, Li J, Li Y, Bennett BD, Lavender CA, Ward JM, Fargo DC, Anderson CW, Li L, Resnick MA, Menendez D. Revealing a human p53 universe. Nucleic Acids Res 2018; 46:8153-8167. [PMID: 30107566 PMCID: PMC6144829 DOI: 10.1093/nar/gky720] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 03/16/2018] [Accepted: 07/27/2018] [Indexed: 12/13/2022]
Abstract
p53 transcriptional networks are well-characterized in many organisms. However, a global understanding of requirements for in vivo p53 interactions with DNA and relationships with transcription across human biological systems in response to various p53 activating situations remains limited. Using a common analysis pipeline, we analyzed 41 data sets from genome-wide ChIP-seq studies of which 16 have associated gene expression data, including our recent primary data with normal human lymphocytes. The resulting extensive analysis, accessible at p53 BAER hub via the UCSC browser, provides a robust platform to characterize p53 binding throughout the human genome including direct influence on gene expression and underlying mechanisms. We establish the impact of spacers and mismatches from consensus on p53 binding in vivo and propose that once bound, neither significantly influences the likelihood of expression. Our rigorous approach revealed a large p53 genome-wide cistrome composed of >900 genes directly targeted by p53. Importantly, we identify a core cistrome signature composed of genes appearing in over half the data sets, and we identify signatures that are treatment- or cell-specific, demonstrating new functions for p53 in cell biology. Our analysis reveals a broad homeostatic role for human p53 that is relevant to both basic and translational studies.
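The abstract mentions spacers and mismatches from consensus. The p53 response element is commonly described as two copies of the 10-bp half-site RRRCWWGYYY (R = A/G, W = A/T, Y = C/T) separated by a spacer of 0 to 13 bp. The sketch below scores candidate sites by counting mismatches to that consensus while allowing a variable spacer. The function names, the default mismatch threshold, and the demo sequence are hypothetical; only the given strand is scanned; and this is not the analysis pipeline used in the study.

```python
# IUPAC codes that appear in the p53 half-site consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "W": "AT"}
HALF_SITE = "RRRCWWGYYY"


def half_site_mismatches(seq10):
    """Number of positions in a 10-mer that do not match RRRCWWGYYY."""
    return sum(base not in IUPAC[code] for base, code in zip(seq10.upper(), HALF_SITE))


def scan_p53_sites(seq, max_spacer=13, max_mismatches=2):
    """Return (start, spacer, total_mismatches) for every candidate full site:
    two half-sites separated by 0..max_spacer bp with at most max_mismatches
    mismatches in total. Only the given strand is scanned."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 19):  # a full site with no spacer spans 20 bp
        first = half_site_mismatches(seq[i:i + 10])
        if first > max_mismatches:
            continue
        for spacer in range(max_spacer + 1):
            j = i + 10 + spacer
            if j + 10 > len(seq):
                break
            total = first + half_site_mismatches(seq[j:j + 10])
            if total <= max_mismatches:
                hits.append((i, spacer, total))
    return hits


demo = "TTTT" + "GGGCATGTCC" * 2 + "TTTT"
print(half_site_mismatches("GGGCATGTCC"))      # 0
print(scan_p53_sites(demo, max_mismatches=0))  # [(4, 0, 0)]
```

Reporting spacer length and mismatch count separately for each hit keeps both quantities available for downstream comparison against binding or expression data.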
Affiliation(s)
- Thuy-Ai T Nguyen
- Genome Integrity & Structural Biology Laboratory, National Institute of Environmental Health Sciences/National Institutes of Health, Research Triangle Park, NC 27709, USA
- Sara A Grimm
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences/National Institutes of Health, Research Triangle Park, NC 27709, USA
- Pierre R Bushel
- Biostatistics & Computational Biology Branch, National Institute of Environmental Health Sciences/National Institutes of Health, Research Triangle Park, NC 27709, USA
- Jianying Li
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences/National Institutes of Health, Research Triangle Park, NC 27709, USA
- Yuanyuan Li
- Biostatistics & Computational Biology Branch, National Institute of Environmental Health Sciences/National Institutes of Health, Research Triangle Park, NC 27709, USA
- Brian D Bennett
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences/National Institutes of Health, Research Triangle Park, NC 27709, USA
- Christopher A Lavender
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences/National Institutes of Health, Research Triangle Park, NC 27709, USA
- James M Ward
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences/National Institutes of Health, Research Triangle Park, NC 27709, USA
- David C Fargo
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences/National Institutes of Health, Research Triangle Park, NC 27709, USA
- Office of Scientific Computing, National Institute of Environmental Health Sciences/National Institutes of Health, Research Triangle Park, NC 27709, USA
- Carl W Anderson
- Genome Integrity & Structural Biology Laboratory, National Institute of Environmental Health Sciences/National Institutes of Health, Research Triangle Park, NC 27709, USA
- Leping Li
- Biostatistics & Computational Biology Branch, National Institute of Environmental Health Sciences/National Institutes of Health, Research Triangle Park, NC 27709, USA
- Michael A Resnick
- Genome Integrity & Structural Biology Laboratory, National Institute of Environmental Health Sciences/National Institutes of Health, Research Triangle Park, NC 27709, USA
- Daniel Menendez
- Genome Integrity & Structural Biology Laboratory, National Institute of Environmental Health Sciences/National Institutes of Health, Research Triangle Park, NC 27709, USA
6
Burkholder AB, Lujan SA, Lavender CA, Grimm SA, Kunkel TA, Fargo DC. Muver, a computational framework for accurately calling accumulated mutations. BMC Genomics 2018; 19:345. [PMID: 29743009 PMCID: PMC5944071 DOI: 10.1186/s12864-018-4753-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 01/22/2018] [Accepted: 05/02/2018] [Indexed: 01/01/2023]
Abstract
BACKGROUND Identification of mutations from next-generation sequencing data typically requires a balance between sensitivity and accuracy. This is particularly true of DNA insertions and deletions (indels), which can impart significant phenotypic consequences on cells but are harder than substitution mutations to call from whole-genome mutation accumulation experiments. To overcome these difficulties, we present muver, a computational framework that integrates established bioinformatics tools with novel analytical methods to generate mutation calls with the extremely low false positive rates and high sensitivity required for accurate mutation rate determination and comparison. RESULTS Muver uses statistical comparison of ancestral and descendant allelic frequencies to identify variant loci and assigns genotypes with models that include per-sample assessments of sequencing errors by mutation type and repeat context. Muver identifies maximally parsimonious mutation pathways that connect these genotypes, differentiating potential allelic conversion events and delineating ambiguities in mutation location, type, and size. Benchmarking with a human gold standard father-son pair demonstrates muver's sensitivity and low false positive rates. In DNA mismatch repair (MMR) deficient Saccharomyces cerevisiae, muver detects multi-base deletions in homopolymers longer than the replicative polymerase footprint at rates greater than predicted for sequential single-base deletions, implying a novel multi-repeat-unit slippage mechanism. CONCLUSIONS Benchmarking results demonstrate the high accuracy and sensitivity achieved with muver, particularly for indels, relative to available tools. Applied to an MMR-deficient Saccharomyces cerevisiae system, muver mutation calls facilitate mechanistic insights into DNA replication fidelity.
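Muver statistically compares ancestral and descendant allelic frequencies to find variant loci. As a generic stand-in, the sketch below applies a two-sided Fisher's exact test to reference and alternate read counts at a single locus. This is not muver's actual model, which also includes per-sample sequencing-error assessments by mutation type and repeat context. The helper `allele_shift_p` and the read counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):  # probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    total = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p
    return min(1.0, total)


def allele_shift_p(ancestor_ref, ancestor_alt, descendant_ref, descendant_alt):
    """Hypothetical helper: P value for a difference in alternate-allele
    frequency between ancestor and descendant reads at one locus."""
    return fisher_exact_two_sided(ancestor_ref, ancestor_alt, descendant_ref, descendant_alt)


print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
print(allele_shift_p(40, 0, 20, 20) < 1e-6)          # True
```

A descendant that gains a heterozygous allele absent from its ancestor produces a very small P value. Identical allele splits return a P value of about 1.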
Affiliation(s)
- Adam B Burkholder
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, NIH, DHHS, Research Triangle Park, Durham, NC, 27709, USA
- Scott A Lujan
- Laboratory of Genomic Integrity and Structural Biology, National Institute of Environmental Health Sciences, NIH, DHHS, Research Triangle Park, Durham, NC, 27709, USA
- Christopher A Lavender
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, NIH, DHHS, Research Triangle Park, Durham, NC, 27709, USA
- Sara A Grimm
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, NIH, DHHS, Research Triangle Park, Durham, NC, 27709, USA
- Thomas A Kunkel
- Laboratory of Genomic Integrity and Structural Biology, National Institute of Environmental Health Sciences, NIH, DHHS, Research Triangle Park, Durham, NC, 27709, USA
- David C Fargo
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, NIH, DHHS, Research Triangle Park, Durham, NC, 27709, USA
7
Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow PM, Zietz M, Hoffman MM, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Lavender CA, Turaga SC, Alexandari AM, Lu Z, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Boca SM, Swamidass SJ, Huang A, Gitter A, Greene CS. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 2018; 15:20170387. [PMID: 29618526 PMCID: PMC5938574 DOI: 10.1098/rsif.2017.0387] [Citation(s) in RCA: 764] [Impact Index Per Article: 127.3] [Received: 05/26/2017] [Accepted: 03/07/2018] [Indexed: 11/12/2022]
Abstract
Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solving problems in these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes, and treatment of patients) and discuss whether deep learning will be able to transform these tasks or whether the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.
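The phrase "combining raw inputs into layers of intermediate features" can be illustrated with a tiny fully connected network in plain Python. The weights are arbitrary placeholders rather than a trained model. Applying ReLU activations to hidden layers while leaving the final layer linear is one common design choice among several.

```python
def dense(inputs, weights, biases):
    """Fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values):
    """Rectified linear activation: negative values become 0.0."""
    return [max(0.0, v) for v in values]


def forward(raw_input, layers):
    """Return the raw input followed by each layer's output (the intermediate features).

    ReLU is applied to hidden layers; the final layer is left linear.
    """
    features = [list(raw_input)]
    for i, (weights, biases) in enumerate(layers):
        z = dense(features[-1], weights, biases)
        features.append(relu(z) if i < len(layers) - 1 else z)
    return features


# Arbitrary placeholder weights: 2 raw inputs -> 3 hidden features -> 1 output.
layers = [
    ([[0.5, -1.0], [1.0, 1.0], [-0.5, 0.25]], [0.0, 0.0, 1.0]),
    ([[1.0, 0.5, -2.0]], [1.0]),
]
for depth, feats in enumerate(forward([1.0, 2.0], layers)):
    print(depth, feats)
# 0 [1.0, 2.0]
# 1 [0.0, 3.0, 1.0]
# 2 [0.5]
```

Each hidden value depends on all raw inputs, and the output depends only on the hidden features. Training would adjust the placeholder weights so that those intermediate features become useful for the task.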
Affiliation(s)
- Travers Ching
- Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, USA
- Daniel S Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Brett K Beaulieu-Jones
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Alexandr A Kalinin
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Gregory P Way
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Enrico Ferrero
- Computational Biology and Stats, Target Sciences, GlaxoSmithKline, Stevenage, UK
| | | | - Michael Zietz
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Michael M Hoffman
- Princess Margaret Cancer Centre, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
| | - Wei Xie
- Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
| | - Gail L Rosen
- Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
| | - Benjamin J Lengerich
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Johnny Israeli
- Biophysics Program, Stanford University, Stanford, CA, USA
| | - Jack Lanchantin
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
| | - Stephen Woloszynek
- Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
| | - Anne E Carpenter
- Imaging Platform, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, USA
| | - Evan M Cofer
- Department of Computer Science, Trinity University, San Antonio, TX, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
| | - Christopher A Lavender
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
| | - Srinivas C Turaga
- Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA
| | - Amr M Alexandari
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Zhiyong Lu
- National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - David J Harris
- Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA
| | | | - Yanjun Qi
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
| | - Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Yifan Peng
- National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Laura K Wiley
- Division of Biomedical Informatics and Personalized Medicine, University of Colorado School of Medicine, Aurora, CO, USA
| | - Marwin H S Segler
- Institute of Organic Chemistry, Westfälische Wilhelms-Universität Münster, Münster, Germany
| | - Simina M Boca
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
| | - S Joshua Swamidass
- Department of Pathology and Immunology, Washington University in Saint Louis, St Louis, MO, USA
| | - Austin Huang
- Department of Medicine, Brown University, Providence, RI, USA
| | - Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Morgridge Institute for Research, Madison, WI, USA
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
8
Henriques T, Scruggs BS, Inouye MO, Muse GW, Williams LH, Burkholder AB, Lavender CA, Fargo DC, Adelman K. Widespread transcriptional pausing and elongation control at enhancers. Genes Dev 2018; 32:26-41. [PMID: 29378787 PMCID: PMC5828392 DOI: 10.1101/gad.309351.117] [Citation(s) in RCA: 215] [Impact Index Per Article: 35.8] [Received: 11/07/2017] [Accepted: 12/21/2017] [Indexed: 02/07/2023]
Abstract
In this study, Henriques et al. demonstrate that transcription is a nearly universal feature of enhancers in Drosophila and mammalian cells and that nascent RNA sequencing strategies are optimal for identification of both enhancers and superenhancers. Their findings provide insights into the unique characteristics of superenhancers, which stimulate high-level gene expression through rapid pause release; interestingly, this property renders associated genes resistant to loss of factors that stabilize paused RNAPII. Regulation by gene-distal enhancers is critical for cell type-specific and condition-specific patterns of gene expression. Thus, to understand the basis of gene activity in a given cell type or tissue, we must identify the precise locations of enhancers and functionally characterize their behaviors. Here, we demonstrate that transcription is a nearly universal feature of enhancers in Drosophila and mammalian cells and that nascent RNA sequencing strategies are optimal for identification of both enhancers and superenhancers. We dissect the mechanisms governing enhancer transcription and discover remarkable similarities to transcription at protein-coding genes. We show that RNA polymerase II (RNAPII) undergoes regulated pausing and release at enhancers. However, as compared with mRNA genes, RNAPII at enhancers is less stable and more prone to early termination. Furthermore, we found that the level of histone H3 Lys4 (H3K4) methylation at enhancers corresponds to transcriptional activity such that highly active enhancers display H3K4 trimethylation rather than the H3K4 monomethylation considered a hallmark of enhancers. Finally, our work provides insights into the unique characteristics of superenhancers, which stimulate high-level gene expression through rapid pause release; interestingly, this property renders associated genes resistant to the loss of factors that stabilize paused RNAPII.
Affiliation(s)
- Telmo Henriques
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts 02115, USA
- Benjamin S Scruggs
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA
- Michiko O Inouye
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts 02115, USA
- Ginger W Muse
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA
- Lucy H Williams
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA
- Adam B Burkholder
- Center for Integrative Bioinformatics, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA
- Christopher A Lavender
- Center for Integrative Bioinformatics, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA
- David C Fargo
- Center for Integrative Bioinformatics, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA
- Karen Adelman
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts 02115, USA
9
Lavender CA, Shapiro AJ, Burkholder AB, Bennett BD, Adelman K, Fargo DC. ORIO (Online Resource for Integrative Omics): a web-based platform for rapid integration of next generation sequencing data. Nucleic Acids Res 2017; 45:5678-5690. [PMID: 28402545 PMCID: PMC5449597 DOI: 10.1093/nar/gkx270] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 01/10/2017] [Accepted: 04/05/2017] [Indexed: 11/14/2022]
Abstract
Established and emerging next generation sequencing (NGS)-based technologies allow for genome-wide interrogation of diverse biological processes. However, accessibility of NGS data remains a problem, and few user-friendly resources exist for integrative analysis of NGS data from different sources and experimental techniques. Here, we present Online Resource for Integrative Omics (ORIO; https://orio.niehs.nih.gov/), a web-based resource with an intuitive user interface for rapid analysis and integration of NGS data. To use ORIO, the user specifies NGS data of interest along with a list of genomic coordinates. Genomic coordinates may be biologically relevant features from a variety of sources, such as ChIP-seq peaks for a given protein or transcription start sites from known gene models. ORIO first iteratively finds read coverage values at each genomic feature for each NGS dataset. Data are then integrated using clustering-based approaches, giving hierarchical relationships across NGS datasets and separating individual genomic features into groups. Because ORIO focuses its analysis on read coverage, it makes limited assumptions about the analyzed data; this allows the tool to be applied across data from a variety of experiments and techniques. Results from analysis are presented in dynamic displays alongside user-controlled statistical tests, supporting rapid statistical validation of observed results. We emphasize the versatility of ORIO through diverse examples, ranging from NGS data quality control to characterization of enhancer regions and integration of gene expression information. Because ORIO is easily accessible on a public web server, we anticipate its wide use in genome-wide investigations by life scientists.
Affiliation(s)
- Christopher A Lavender
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA
- Andrew J Shapiro
- Program Operations Branch, Division of the National Toxicology Program, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA
- Adam B Burkholder
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA
- Brian D Bennett
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA
- Karen Adelman
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- David C Fargo
- Office of Scientific Computing, Division of Intramural Research, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA
10
Meers MP, Henriques T, Lavender CA, McKay DJ, Strahl BD, Duronio RJ, Adelman K, Matera AG. Histone gene replacement reveals a post-transcriptional role for H3K36 in maintaining metazoan transcriptome fidelity. eLife 2017; 6. [PMID: 28346137 PMCID: PMC5404926 DOI: 10.7554/elife.23249] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 11/12/2016] [Accepted: 03/23/2017] [Indexed: 12/17/2022]
Abstract
Histone H3 lysine 36 methylation (H3K36me) is thought to participate in a host of co-transcriptional regulatory events. To study the function of this residue independent from the enzymes that modify it, we used a ‘histone replacement’ system in Drosophila to generate a non-modifiable H3K36 lysine-to-arginine (H3K36R) mutant. We observed global dysregulation of mRNA levels in H3K36R animals that correlates with the incidence of H3K36me3. Similar to previous studies, we found that mutation of H3K36 also resulted in H4 hyperacetylation. However, neither cryptic transcription initiation nor alternative pre-mRNA splicing contributed to the observed changes in expression, in contrast with previously reported roles for H3K36me. Interestingly, knockdown of the RNA surveillance nuclease Xrn1 and of members of the CCR4-Not deadenylase complex restored mRNA levels for a class of downregulated, H3K36me3-rich genes. We propose a post-transcriptional role for modification of replication-dependent H3K36 in the control of metazoan gene expression.
In a single human cell there is enough DNA to stretch over a meter if laid end to end. To fit this DNA inside the cell, which is less than 20 micrometers in diameter, the DNA is tightly wrapped around millions of proteins known as histones, which look like “beads” along a “string” of DNA. These histones can prevent other proteins from binding to DNA and activating specific genes. Therefore, cells use enzymes to chemically modify histones so that particular stretches of DNA can be unwrapped at specific times. Proteins are made up of building blocks called amino acids. A specific amino acid on histones, known as H3K36, is modified in certain sections of DNA in a pattern that suggests it affects the activities of many genes. However, the precise role of this amino acid remains unclear. Previous studies have tried to investigate this by removing the enzymes that modify it, but these enzymes can also modify many other proteins, making it difficult to know what exactly causes the changes in gene activity. Fruit flies are often used in experiments as models of how genetic processes work in humans and other animals. Like us, fruit flies package their DNA using histones. To investigate the role of H3K36, Meers et al. generated a mutant fruit fly with a version of this amino acid that cannot be chemically modified by the normal enzymes. Unexpectedly, the experiments suggest that some changes in gene activity previously reported to be caused by modifying H3K36 might actually be due to other factors. Meers et al. found that H3K36 modifications may instead “mark” certain genes to be more active than they otherwise would be. These findings provide a starting point for understanding exactly how H3K36 regulates gene activity. The next challenge is to refine our understanding of how H3K36 modification affects genes in cancer and other diseases, which may aid the development of new therapies to treat these conditions.
Affiliation(s)
- Michael P Meers
- Curriculum in Genetics and Molecular Biology, The University of North Carolina at Chapel Hill, Chapel Hill, United States; Integrative Program for Biological and Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, United States
- Telmo Henriques
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, Durham, United States
- Christopher A Lavender
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, Durham, United States
- Daniel J McKay
- Curriculum in Genetics and Molecular Biology, The University of North Carolina at Chapel Hill, Chapel Hill, United States; Integrative Program for Biological and Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, United States; Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, United States; Department of Biology, The University of North Carolina at Chapel Hill, Chapel Hill, United States
- Brian D Strahl
- Department of Biochemistry and Biophysics, The University of North Carolina at Chapel Hill, Chapel Hill, United States; Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, United States
- Robert J Duronio
- Curriculum in Genetics and Molecular Biology, The University of North Carolina at Chapel Hill, Chapel Hill, United States; Integrative Program for Biological and Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, United States; Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, United States; Department of Biology, The University of North Carolina at Chapel Hill, Chapel Hill, United States; Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, United States
- Karen Adelman
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, Durham, United States
- A Gregory Matera
- Curriculum in Genetics and Molecular Biology, The University of North Carolina at Chapel Hill, Chapel Hill, United States; Integrative Program for Biological and Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, United States; Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, United States; Department of Biology, The University of North Carolina at Chapel Hill, Chapel Hill, United States; Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, United States
11
Lavender CA, Cannady KR, Hoffman JA, Trotter KW, Gilchrist DA, Bennett BD, Burkholder AB, Burd CJ, Fargo DC, Archer TK. Downstream Antisense Transcription Predicts Genomic Features That Define the Specific Chromatin Environment at Mammalian Promoters. PLoS Genet 2016; 12:e1006224. [PMID: 27487356 PMCID: PMC4972320 DOI: 10.1371/journal.pgen.1006224] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 01/14/2016] [Accepted: 07/06/2016] [Indexed: 01/23/2023]
Abstract
Antisense transcription is a prevalent feature at mammalian promoters. Previous studies have primarily focused on antisense transcription initiating upstream of genes. Here, we characterize promoter-proximal antisense transcription downstream of gene transcription start sites in human breast cancer cells, investigating the genomic context of downstream antisense transcription. We find extensive correlations between antisense transcription and features associated with the chromatin environment at gene promoters. Antisense transcription downstream of promoters is widespread, with antisense transcription initiation observed within 2 kb of 28% of gene transcription start sites. Antisense transcription initiates between nucleosomes regularly positioned downstream of these promoters. The nucleosomes between gene and downstream antisense transcription start sites carry histone modifications associated with active promoters, such as H3K4me3 and H3K27ac. This region is bound by chromatin remodeling and histone modifying complexes including SWI/SNF subunits and HDACs, suggesting that antisense transcription or resulting RNA transcripts contribute to the creation and maintenance of a promoter-associated chromatin environment. Downstream antisense transcription overlays additional regulatory features, such as transcription factor binding, DNA accessibility, and the downstream edge of promoter-associated CpG islands. These features suggest an important role for antisense transcription in the regulation of gene expression and the maintenance of a promoter-associated chromatin environment.
Gene transcription is regulated by the coordinated interaction of genetic, epigenetic and trans-acting factors. The chromatin environment at gene promoters, including positioned nucleosomes that may display functional histone modifications, is a key regulator of gene expression, contributing to transcriptional activation and repression. In addition to sense-strand transcription of gene sequences, antisense transcription is prevalent at gene promoters. Antisense transcription often produces a short-lived non-coding RNA transcript, and its function is poorly understood. Using next-generation sequencing techniques, we characterized transcription in human breast cancer cells and found extensive correlations between antisense transcription and the chromatin environment at promoters. We found that downstream antisense transcription initiates between regularly positioned nucleosomes and that the nucleosomes between sense and downstream antisense transcription start sites display histone modifications associated with active gene promoters. Chromatin remodelers and other protein complexes responsible for creation and maintenance of the promoter chromatin environment associate with this same region, suggesting an important role for antisense transcription in the regulation of gene expression.
Affiliation(s)
- Christopher A Lavender
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America; Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
- Kimberly R Cannady
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
- Jackson A Hoffman
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
- Kevin W Trotter
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
- Daniel A Gilchrist
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
- Brian D Bennett
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
- Adam B Burkholder
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
- Craig J Burd
- Department of Molecular Genetics, The Ohio State University, Columbus, Ohio, United States of America; Wexner Medical Center, The Ohio State University, Columbus, Ohio, United States of America
- David C Fargo
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
- Trevor K Archer
- Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
12
Lavender CA, Gorelick RJ, Weeks KM. Structure-Based Alignment and Consensus Secondary Structures for Three HIV-Related RNA Genomes. PLoS Comput Biol 2015; 11:e1004230. [PMID: 25992893 PMCID: PMC4439019 DOI: 10.1371/journal.pcbi.1004230] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 06/30/2014] [Accepted: 03/08/2015] [Indexed: 11/30/2022]
Abstract
HIV and related primate lentiviruses possess single-stranded RNA genomes. Multiple regions of these genomes participate in critical steps in the viral replication cycle, and the functions of many RNA elements depend on the formation of defined structures. The structures of these elements are still not fully understood, and additional functional elements that have not yet been identified likely exist. In this work, we compared three full-length HIV-related viral genomes: HIV-1 NL4-3, SIVcpz, and SIVmac (the latter two strains are progenitors for all HIV-1 and HIV-2 strains, respectively). Model-free RNA structure comparisons were performed using whole-genome structure information experimentally derived from nucleotide-resolution SHAPE reactivities. Consensus secondary structures were constructed for strongly correlated regions by taking into account both SHAPE probing structural data and nucleotide covariation information from structure-based alignments. In these consensus models, all known functional RNA elements were recapitulated with high accuracy. In addition, we identified multiple previously unannotated structural elements in the HIV-1 genome likely to function in translation, splicing and other replication cycle processes; these are compelling targets for future functional analyses. The structure-informed alignment strategy developed here will be broadly useful for efficient RNA motif discovery.
Human immunodeficiency virus (HIV) is a persistent and critical threat to human health. Replication and pathogenesis of HIV are governed by information encoded in its single-stranded RNA genome. In addition to coding for viral proteins, the HIV genomic RNA forms base-paired and higher-order structures that are critical for viral replication. It is likely that only a subset of functional RNA motifs has been identified. Here, we interrogate the structures of three diverse HIV-related viral genomes by nucleotide-resolution chemical probing. The three genomes include HIV-1, the virus that infects humans, and SIVcpz and SIVmac, which are progenitors for the main branches of the two HIV evolutionary groups. We used a structure-informed alignment approach to generate consensus models for base-paired secondary structures that are shared by these three HIV-related genomes. With this approach, we were able to recapitulate all known RNA structures and, additionally, discovered multiple previously undescribed structural elements that are clearly conserved among major HIV groups. We anticipate that the methods described here will be broadly useful for RNA structure motif discovery and, more immediately, for identification of RNA targets in HIV that are promising sites for therapeutic intervention.
Affiliation(s)
- Christopher A. Lavender
- Department of Chemistry, University of North Carolina, Chapel Hill, Chapel Hill, North Carolina, United States of America
- Robert J. Gorelick
- AIDS and Cancer Virus Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, Frederick, Maryland, United States of America
- Kevin M. Weeks
- Department of Chemistry, University of North Carolina, Chapel Hill, Chapel Hill, North Carolina, United States of America
13
Ding F, Lavender CA, Weeks KM, Dokholyan NV. Three-dimensional RNA structure refinement by hydroxyl radical probing. Nat Methods 2012; 9:603-8. [PMID: 22504587 PMCID: PMC3422565 DOI: 10.1038/nmeth.1976] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Received: 01/06/2012] [Accepted: 03/20/2012] [Indexed: 01/08/2023]
Abstract
Molecular modeling guided by experimentally derived structural information is an attractive approach for three-dimensional structure determination of complex RNAs that are not amenable to study by high-resolution methods. Hydroxyl radical probing (HRP), performed routinely in many laboratories, provides a measure of solvent accessibility at individual nucleotides. HRP measurements have, to date, only been used to evaluate RNA models qualitatively. Here, we report development of a quantitative structure refinement approach using HRP measurements to drive discrete molecular dynamics simulations for RNAs ranging in size from 80 to 230 nucleotides. HRP reactivities were first used to identify RNAs that form extensive helical packing interactions. For these RNAs, we achieved highly significant structure predictions, given inputs of RNA sequence and base pairing. This HRP-directed tertiary structure refinement approach generates robust structural hypotheses useful for guiding explorations of structure-function interrelationships in RNA.
Affiliation(s)
- Feng Ding
- Department of Biochemistry and Biophysics, University of North Carolina, USA
14
Cruz JA, Blanchet MF, Boniecki M, Bujnicki JM, Chen SJ, Cao S, Das R, Ding F, Dokholyan NV, Flores SC, Huang L, Lavender CA, Lisi V, Major F, Mikolajczak K, Patel DJ, Philips A, Puton T, Santalucia J, Sijenyi F, Hermann T, Rother K, Rother M, Serganov A, Skorupski M, Soltysinski T, Sripakdeevong P, Tuszynska I, Weeks KM, Waldsich C, Wildauer M, Leontis NB, Westhof E. RNA-Puzzles: a CASP-like evaluation of RNA three-dimensional structure prediction. RNA 2012; 18:610-25. [PMID: 22361291 PMCID: PMC3312550 DOI: 10.1261/rna.031054.111] [Citation(s) in RCA: 156] [Impact Index Per Article: 13.0] [Indexed: 05/04/2023]
Abstract
We report the results of a first, collective, blind experiment in RNA three-dimensional (3D) structure prediction, encompassing three prediction puzzles. The goals are to assess the leading edge of RNA structure prediction techniques; compare existing methods and tools; and evaluate their relative strengths, weaknesses, and limitations in terms of sequence length and structural complexity. The results should give potential users insight into the suitability of available methods for different applications and support the RNA structure prediction community's ongoing efforts to improve prediction tools. We also report the creation of an automated evaluation pipeline to facilitate the analysis of future RNA structure prediction exercises.
Affiliation(s)
- José Almeida Cruz
- Architecture et Réactivité de l'ARN, Université de Strasbourg, IBMC-CNRS, F-67084 Strasbourg, France
- Marc-Frédérick Blanchet
- Institute for Research in Immunology and Cancer (IRIC), Department of Computer Science and Operations Research, Université de Montréal, Montréal, Québec H3C 3J7, Canada
- Michal Boniecki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland
- Janusz M. Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland
- Laboratory of Bioinformatics, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, 61-614 Poznan, Poland
- Shi-Jie Chen
- Department of Physics and Department of Biochemistry, University of Missouri, Columbia, Missouri 65211, USA
- Song Cao
- Department of Physics and Department of Biochemistry, University of Missouri, Columbia, Missouri 65211, USA
- Rhiju Das
- Department of Biochemistry
- Department of Physics, Stanford University, Stanford, California 94305, USA
| | - Feng Ding
- Department of Biochemistry and Biophysics, University of North Carolina, School of Medicine, Chapel Hill, North Carolina 27599, USA
| | - Nikolay V. Dokholyan
- Department of Biochemistry and Biophysics, University of North Carolina, School of Medicine, Chapel Hill, North Carolina 27599, USA
| | - Samuel Coulbourn Flores
- Computational & Systems Biology Program, Institute for Cell and Molecular Biology, Uppsala University, 751 05 Uppsala, Sweden
| | - Lili Huang
- Structural Biology Program, Memorial Sloan-Kettering Cancer Center, New York, New York 10021, USA
| | - Christopher A. Lavender
- Department of Chemistry, University of North Carolina, Chapel Hill, North Carolina 27599, USA
| | - Véronique Lisi
- Institute for Research in Immunology and Cancer (IRIC), Department of Computer Science and Operations Research, Université de Montréal, Montréal, Québec H3C 3J7, Canada
| | - François Major
- Institute for Research in Immunology and Cancer (IRIC), Department of Computer Science and Operations Research, Université de Montréal, Montréal, Québec H3C 3J7, Canada
| | - Katarzyna Mikolajczak
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland
| | - Dinshaw J. Patel
- Structural Biology Program, Memorial Sloan-Kettering Cancer Center, New York, New York 10021, USA
| | - Anna Philips
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland
- Laboratory of Bioinformatics, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, 61-614 Poznan, Poland
| | - Tomasz Puton
- Laboratory of Bioinformatics, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, 61-614 Poznan, Poland
- John SantaLucia
- Department of Chemistry, Wayne State University, Detroit, Michigan 48202, USA
- DNA Software, Ann Arbor, Michigan 48104, USA
- Thomas Hermann
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, California 92093, USA
- Kristian Rother
- Laboratory of Bioinformatics, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, 61-614 Poznan, Poland
- Magdalena Rother
- Laboratory of Bioinformatics, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, 61-614 Poznan, Poland
- Alexander Serganov
- Structural Biology Program, Memorial Sloan-Kettering Cancer Center, New York, New York 10021, USA
- Marcin Skorupski
- Laboratory of Bioinformatics, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, 61-614 Poznan, Poland
- Tomasz Soltysinski
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland
- Parin Sripakdeevong
- Department of Biochemistry
- Department of Physics, Stanford University, Stanford, California 94305, USA
- Irina Tuszynska
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland
- Kevin M. Weeks
- Department of Chemistry, University of North Carolina, Chapel Hill, North Carolina 27599, USA
- Christina Waldsich
- Max F. Perutz Laboratories, Department of Biochemistry, University of Vienna, Vienna 1030, Austria
- Michael Wildauer
- Max F. Perutz Laboratories, Department of Biochemistry, University of Vienna, Vienna 1030, Austria
- Neocles B. Leontis
- Department of Chemistry and Center for Biomolecular Sciences, Bowling Green State University, Bowling Green, Ohio 43403, USA
- Eric Westhof
- Architecture et Réactivité de l'ARN, Université de Strasbourg, IBMC-CNRS, F-67084 Strasbourg, France
- Corresponding author.
15
Lavender CA, Ding F, Dokholyan NV, Weeks KM. Correction to "Robust and Generic RNA Modeling Using Inferred Constraints: A Structure for the Hepatitis C Virus IRES Pseudoknot Domain". Biochemistry 2010. [DOI: 10.1021/bi100935c] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022]
16
Lavender CA, Ding F, Dokholyan NV, Weeks KM. Robust and generic RNA modeling using inferred constraints: a structure for the hepatitis C virus IRES pseudoknot domain. Biochemistry 2010; 49:4931-3. [PMID: 20545364 PMCID: PMC2889920 DOI: 10.1021/bi100142y] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 12/29/2022]
Abstract
RNA function is dependent on its structure, yet three-dimensional folds for most biologically important RNAs are unknown. We develop a generic discrete molecular dynamics-based modeling system that uses long-range constraints inferred from diverse biochemical or bioinformatic analyses to create statistically significant (p < 0.01) nativelike folds for RNAs of known structure ranging from 45 to 158 nucleotides. We then predict the unknown structure of the hepatitis C virus internal ribosome entry site (IRES) pseudoknot domain. The resulting RNA model rationalizes independent solvent accessibility and cryo-electron microscopy structure information. The pseudoknot domain positions the AUG start codon near the mRNA channel and is tRNA-like, suggesting the IRES employs molecular mimicry as a functional strategy.
Affiliation(s)
- Christopher A. Lavender
- Department of Chemistry, University of North Carolina, Chapel Hill, North Carolina 27599-3290
- Feng Ding
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina 27599-7260
- Nikolay V. Dokholyan
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina 27599-7260
- Kevin M. Weeks
- Department of Chemistry, University of North Carolina, Chapel Hill, North Carolina 27599-3290
17
Jiang X, Lavender CA, Woodcock JW, Zhao B. Multiple Micellization and Dissociation Transitions of Thermo- and Light-Sensitive Poly(ethylene oxide)-b-poly(ethoxytri(ethylene glycol) acrylate-co-o-nitrobenzyl acrylate) in Water. Macromolecules 2008. [DOI: 10.1021/ma7028105] [Citation(s) in RCA: 178] [Impact Index Per Article: 11.1] [Indexed: 11/29/2022]
Affiliation(s)
- Xueguang Jiang
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996
- Bin Zhao
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996