1
Hramyka D, Sczakiel HL, Zhao MX, Stolpe O, Nieminen M, Adam R, Danyel M, Einicke L, Hägerling R, Knaus A, Mundlos S, Schwartzmann S, Seelow D, Ehmke N, Mensah M, Boschann F, Beule D, Holtgrewe M. REEV: review, evaluate and explain variants. Nucleic Acids Res 2024; 52:W148-W158. [PMID: 38769069 PMCID: PMC11223839 DOI: 10.1093/nar/gkae366] [Received: 02/13/2024] [Revised: 04/07/2024] [Accepted: 05/03/2024] [Indexed: 05/22/2024]
Abstract
In the era of high-throughput sequencing, special software is required for the clinical evaluation of genetic variants. We developed REEV (Review, Evaluate and Explain Variants), a user-friendly platform for clinicians and researchers in the field of rare disease genetics. Supporting data was aggregated from public data sources. We compared REEV with seven other tools for clinical variant evaluation. REEV (semi-)automatically fills individual ACMG criteria, facilitating variant interpretation. REEV can store disease and phenotype data related to a case and use them for phenotype similarity measures. Users can create public permanent links for individual variants that can be saved as browser bookmarks and shared. REEV may help in the fast diagnostic assessment of genetic variants in a clinical as well as in a research context. REEV (https://reev.bihealth.org/) is free and open to all users and there is no login requirement.
Affiliation(s)
- Dzmitry Hramyka
- Berlin Institute of Health, Core Unit Bioinformatics, Berlin, Germany
- Henrike Lisa Sczakiel
- Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- BIH Biomedical Innovation Academy, Clinician Scientist Program, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Max Xiaohang Zhao
- Berlin Institute of Health, Core Unit Bioinformatics, Berlin, Germany
- Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Berlin Institute of Health, Berlin, Germany
- Oliver Stolpe
- Berlin Institute of Health, Core Unit Bioinformatics, Berlin, Germany
- Mikko Nieminen
- Berlin Institute of Health, Core Unit Bioinformatics, Berlin, Germany
- Ronja Adam
- Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Berlin Institute of Health, Berlin, Germany
- Magdalena Danyel
- Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- BIH Biomedical Innovation Academy, Clinician Scientist Program, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Lara Einicke
- Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Berlin Institute of Health, Berlin, Germany
- René Hägerling
- Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- BIH Biomedical Innovation Academy, Clinician Scientist Program, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Berlin Institute of Health, BIH Center for Regenerative Therapies, Berlin, Germany
- Alexej Knaus
- Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Germany
- Stefan Mundlos
- Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Sarina Schwartzmann
- Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Dominik Seelow
- Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Berlin Institute of Health, Berlin, Germany
- Nadja Ehmke
- Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Berlin Institute of Health, Berlin, Germany
- Martin Atta Mensah
- Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- BIH Biomedical Innovation Academy, Digital Clinician Scientist Program, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Felix Boschann
- Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- BIH Biomedical Innovation Academy, Clinician Scientist Program, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Berlin Institute of Health, Berlin, Germany
- Dieter Beule
- Berlin Institute of Health, Core Unit Bioinformatics, Berlin, Germany
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Manuel Holtgrewe
- Berlin Institute of Health, Core Unit Bioinformatics, Berlin, Germany
2
Cummins M, Watson C, Edwards RJ, Mattick JS. The Evolution of Ultraconserved Elements in Vertebrates. Mol Biol Evol 2024; 41:msae146. [PMID: 39058500 PMCID: PMC11276968 DOI: 10.1093/molbev/msae146] [Received: 03/26/2024] [Revised: 06/29/2024] [Accepted: 07/08/2024] [Indexed: 07/18/2024]
Abstract
Ultraconserved elements were discovered two decades ago, arbitrarily defined as sequences that are identical over a length ≥ 200 bp in the human, mouse, and rat genomes. The definition was subsequently extended to sequences ≥ 100 bp identical in at least three of five mammalian genomes (including dog and cow), and shown to have undergone rapid expansion from ancestors in fish and strong negative selection in birds and mammals. Since then, many more genomes have become available, allowing better definition and more thorough examination of ultraconserved element distribution and evolutionary history. We developed a fast and flexible analytical pipeline for identifying ultraconserved elements in multiple genomes, dedUCE, which allows manipulation of minimum length, sequence identity, and number of species with a detectable ultraconserved element according to specified parameters. We suggest an updated definition of ultraconserved elements as sequences ≥ 100 bp and ≥97% sequence identity in ≥50% of placental mammal orders (12,813 ultraconserved elements). By mapping ultraconserved elements to ∼200 species, we find that placental ultraconserved elements appeared early in vertebrate evolution, well before land colonization, suggesting that the evolutionary pressures driving ultraconserved element selection were present in aquatic environments in the Cambrian-Devonian periods. Most (>90%) ultraconserved elements likely appeared after the divergence of gnathostomes from jawless predecessors, were largely established in sequence identity by early Sarcopterygii evolution (before the divergence of lobe-finned fishes from tetrapods), and became near fixed in the amniotes. Ultraconserved elements are mainly located in the introns of protein-coding and noncoding genes involved in neurological and skeletomuscular development, enriched in regulatory elements, and dynamically expressed throughout embryonic development.
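The updated definition is a filter over three parameters: minimum length, minimum sequence identity, and the minimum fraction of lineages that must carry the element. The Python sketch below shows one way such a filter could be written; it is not dedUCE's implementation. It assumes, as simplifications, a gap-free alignment in which every sequence has the same length as the reference, identity measured against the reference only, and each sequence counted as its own lineage rather than grouped into placental mammal orders. The function names `find_ultraconserved`, `identity`, and `merge_windows` are hypothetical.

```python
from typing import Dict, List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences match."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def merge_windows(windows: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching half-open windows sorted by start."""
    merged: List[Tuple[int, int]] = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def find_ultraconserved(
    alignment: Dict[str, str],
    reference: str,
    min_length: int = 100,
    min_identity: float = 0.97,
    min_fraction: float = 0.5,
) -> List[Tuple[int, int]]:
    """Return merged half-open (start, end) regions in reference coordinates.

    Hypothetical sketch: a window of min_length bases passes when at least
    min_fraction of the non-reference sequences match it with identity
    >= min_identity. Assumes a gap-free alignment of equal-length sequences.
    """
    ref_seq = alignment[reference]
    others = [name for name in alignment if name != reference]
    if not others:
        raise ValueError("need at least one non-reference sequence")
    passing: List[Tuple[int, int]] = []
    for start in range(len(ref_seq) - min_length + 1):
        end = start + min_length
        window = ref_seq[start:end]
        support = sum(
            1
            for name in others
            if identity(window, alignment[name][start:end]) >= min_identity
        )
        if support / len(others) >= min_fraction:
            passing.append((start, end))
    # Overlapping passing windows are reported as one longer region.
    return merge_windows(passing)
```

For example, three identical 150-bp sequences yield the single region `(0, 150)`, while a 100-bp window with 3 mismatches in its only comparison species still passes the 97% threshold. Merging overlapping windows means a reported region can be longer than `min_length`, with every `min_length`-long sub-window satisfying the filter.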
Affiliation(s)
- Mitchell Cummins
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, NSW 2052, Australia
- Cadel Watson
- School of Engineering, UNSW Sydney, Sydney, NSW 2052, Australia
- Richard J Edwards
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, NSW 2052, Australia
- John S Mattick
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, NSW 2052, Australia
3
Li Y, Tan M, Akkari-Henić A, Zhang L, Kip M, Sun S, Sepers JJ, Xu N, Ariyurek Y, Kloet SL, Davis RP, Mikkers H, Gruber JJ, Snyder MP, Li X, Pang B. Genome-wide Cas9-mediated screening of essential non-coding regulatory elements via libraries of paired single-guide RNAs. Nat Biomed Eng 2024; 8:890-908. [PMID: 38778183 PMCID: PMC11310080 DOI: 10.1038/s41551-024-01204-8] [Received: 09/05/2023] [Accepted: 03/27/2024] [Indexed: 05/25/2024]
Abstract
The functions of non-coding regulatory elements (NCREs), which constitute a major fraction of the human genome, have not been systematically studied. Here we report a method involving libraries of paired single-guide RNAs targeting both ends of an NCRE as a screening system for the Cas9-mediated deletion of thousands of NCREs genome-wide to study their functions in distinct biological contexts. By using K562 and 293T cell lines and human embryonic stem cells, we show that NCREs can have redundant functions, and that many ultra-conserved elements have silencer activity and play essential roles in cell growth and in cellular responses to drugs (notably, the ultra-conserved element PAX6_Tarzan may be critical for heart development, as removing it from human embryonic stem cells led to defects in cardiomyocyte differentiation). The high-throughput screen, which is compatible with single-cell sequencing, may allow for the identification of druggable NCREs.
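The core design step of a paired-guide deletion screen is choosing two guides whose cut sites flank the element, so that cutting at both sites removes the region between them. The Python sketch below illustrates only that step, under assumptions that are not taken from the paper: SpCas9 with an NGG PAM, 20-nt protospacers on the forward strand only, a blunt cut 3 bp upstream of the PAM, and a fixed-size search window on each side of the element. The helpers `forward_guides` and `design_pair` are hypothetical; a real library design would also scan the reverse strand and score guides for on-target efficiency and off-target risk.

```python
from typing import List, Optional, Tuple

PROTOSPACER_LEN = 20  # typical SpCas9 protospacer length (assumption)


def forward_guides(seq: str) -> List[Tuple[str, int]]:
    """List (protospacer, cut_index) for each forward-strand NGG PAM.

    cut_index is the 0-based position of the first base 3' of the
    blunt cut, which SpCas9 makes 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    guides: List[Tuple[str, int]] = []
    for pam_start in range(PROTOSPACER_LEN, len(seq) - 2):
        if seq[pam_start + 1 : pam_start + 3] == "GG":
            protospacer = seq[pam_start - PROTOSPACER_LEN : pam_start]
            guides.append((protospacer, pam_start - 3))
    return guides


def design_pair(
    seq: str, element_start: int, element_end: int, flank: int = 50
) -> Optional[Tuple[Tuple[str, int], Tuple[str, int]]]:
    """Pick one guide cutting 5' of the element and one cutting 3' of it.

    The element is the half-open interval [element_start, element_end);
    cutting at both sites deletes seq[left_cut:right_cut], which contains it.
    Returns None when either flank has no usable guide.
    """
    guides = forward_guides(seq)
    left = [g for g in guides if element_start - flank <= g[1] <= element_start]
    right = [g for g in guides if element_end <= g[1] <= element_end + flank]
    if not left or not right:
        return None
    # Choose the cuts closest to the element to keep the deletion small.
    return max(left, key=lambda g: g[1]), min(right, key=lambda g: g[1])
```

Choosing the cuts nearest the element boundaries keeps the deletion tight; in practice that preference would be traded off against guide quality, and pairs lacking a usable guide on either side would be dropped from the library.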
Affiliation(s)
- Yufeng Li
- Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Minkang Tan
- Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Almira Akkari-Henić
- Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Limin Zhang
- Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Maarten Kip
- Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Shengnan Sun
- Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Jorian J Sepers
- Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Ningning Xu
- Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Yavuz Ariyurek
- Leiden Genome Technology Center, Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
- Susan L Kloet
- Leiden Genome Technology Center, Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
- Richard P Davis
- Department of Anatomy and Embryology, The Novo Nordisk Foundation Center for Stem Cell Medicine (reNEW), Leiden University Medical Center, Leiden, the Netherlands
- Harald Mikkers
- Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Joshua J Gruber
- Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Xiao Li
- Department of Biochemistry, The Center for RNA Science and Therapeutics, Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH, USA.
- Baoxu Pang
- Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands.
4
Singh PP, Reeves GA, Contrepois K, Papsdorf K, Miklas JW, Ellenberger M, Hu CK, Snyder MP, Brunet A. Evolution of diapause in the African turquoise killifish by remodeling the ancient gene regulatory landscape. Cell 2024; 187:3338-3356.e30. [PMID: 38810644 DOI: 10.1016/j.cell.2024.04.048] [Received: 11/24/2021] [Revised: 11/30/2023] [Accepted: 04/30/2024] [Indexed: 05/31/2024]
Abstract
Suspended animation states allow organisms to survive extreme environments. The African turquoise killifish has evolved diapause as a form of suspended development to survive a complete drought. However, the mechanisms underlying the evolution of extreme survival states are unknown. To understand diapause evolution, we performed integrative multi-omics (gene expression, chromatin accessibility, and lipidomics) in the embryos of multiple killifish species. We find that diapause evolved by a recent remodeling of regulatory elements at very ancient gene duplicates (paralogs) present in all vertebrates. CRISPR-Cas9-based perturbations identify the transcription factors REST/NRSF and FOXOs as critical for the diapause gene expression program, including genes involved in lipid metabolism. Indeed, diapause shows a distinct lipid profile, with an increase in triglycerides with very-long-chain fatty acids. Our work suggests a mechanism for the evolution of complex adaptations and offers strategies to promote long-term survival by activating suspended animation programs in other species.
Affiliation(s)
- G Adam Reeves
- Department of Genetics, Stanford University, Stanford, CA, USA
- Kévin Contrepois
- Department of Genetics, Stanford University, Stanford, CA, USA; Stanford Cardiovascular Institute, Stanford University, Stanford, CA, USA
- Jason W Miklas
- Department of Genetics, Stanford University, Stanford, CA, USA
- Chi-Kuo Hu
- Department of Genetics, Stanford University, Stanford, CA, USA
- Michael P Snyder
- Department of Genetics, Stanford University, Stanford, CA, USA; Stanford Cardiovascular Institute, Stanford University, Stanford, CA, USA; Stanford Diabetes Research Center, Stanford University, Stanford, CA, USA
- Anne Brunet
- Department of Genetics, Stanford University, Stanford, CA, USA; Glenn Center for the Biology of Aging, Stanford University, Stanford, CA, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA; Chan Zuckerberg Biohub, San Francisco, San Francisco, CA, USA.
5
Hickey AJ, Maloney SE, Kuehl PJ, Phillips JE, Wolff RK. Practical Considerations in Dose Extrapolation from Animals to Humans. J Aerosol Med Pulm Drug Deliv 2024; 37:77-89. [PMID: 38237032 DOI: 10.1089/jamp.2023.0041] [Indexed: 04/21/2024]
Abstract
Animal studies have been an important component of drug product development and the regulatory review process ever since modern practices were established, almost a century ago. A variety of experimental systems are available to generate aerosols in both liquid and solid forms for delivery to animals. The extrapolation of deposited lung dose from laboratory animals to humans is challenging because of genetic, anatomical, physiological, pharmacological, and other biological differences between species. Extrapolation of inhaled drug delivery requires scrutiny because the aerodynamic behavior of an aerosol, and hence its lung deposition, is influenced not only by the properties of the drug aerosol but also by the anatomy and pulmonary function of the species in which it is being evaluated. Sources of variability between species include the formulation, the delivery system, and species-specific biological factors. It is important to acknowledge the underlying variables that contribute to estimates of dose scaling between species.
Affiliation(s)
- Anthony J Hickey
- Department of Technology Advancement and Commercialization, RTI International, Research Triangle Park, North Carolina, USA
- Sara E Maloney
- Department of Technology Advancement and Commercialization, RTI International, Research Triangle Park, North Carolina, USA
- Phillip J Kuehl
- Division: Scientific Core Laboratories; Lovelace Respiratory Research Institute, Albuquerque, New Mexico, USA
- Jonathan E Phillips
- Amgen, Inc., Inflammation Discovery Research, Thousand Oaks, California, USA
6
Zhang X, Xia F, Zhang X, Blumenthal RM, Cheng X. C2H2 Zinc Finger Transcription Factors Associated with Hemoglobinopathies. J Mol Biol 2024; 436:168343. [PMID: 37924864 PMCID: PMC11185177 DOI: 10.1016/j.jmb.2023.168343] [Received: 09/04/2023] [Revised: 10/23/2023] [Accepted: 10/30/2023] [Indexed: 11/06/2023]
Abstract
In humans, specific aberrations in β-globin result in sickle cell disease and β-thalassemia, symptoms of which can be ameliorated by increased expression of fetal globin (HbF). Two recent CRISPR-Cas9 screens, centered on ∼1500 annotated sequence-specific DNA binding proteins and performed in a human erythroid cell line that expresses adult hemoglobin, uncovered four groups of candidate regulators of HbF gene expression. They are (1) members of the nucleosome remodeling and deacetylase (NuRD) complex that are already known for HbF control; (2) seven C2H2 zinc finger (ZF) proteins, including some (ZBTB7A and BCL11A) already known for directly silencing the fetal γ-globin genes in adult human erythroid cells; (3) a few other transcription factors of different structural classes that might indirectly influence HbF gene expression; and (4) DNA methyltransferase 1 (DNMT1), which maintains the DNA methylation marks that attract the MBD2-associated NuRD complex to DNA as well as associated histone H3 lysine 9 methylation. Here we briefly discuss the effects of these regulators, particularly C2H2 ZFs, in inducing HbF expression for treating β-hemoglobin disorders, together with recent advances in developing safe and effective small-molecule therapeutics for the regulation of this well-conserved hemoglobin switch.
Affiliation(s)
- Xing Zhang
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.
- Fangfang Xia
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Xiaotian Zhang
- Department of Biochemistry and Molecular Biology, The University of Texas Health Science Center Houston, McGovern Medical School, Houston, TX 77030, USA
- Robert M Blumenthal
- Department of Medical Microbiology and Immunology, and Program in Bioinformatics, The University of Toledo College of Medicine and Life Sciences, Toledo, OH 43614, USA
- Xiaodong Cheng
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.
7
Lim D, Baek C, Blanchette M. Graphylo: A deep learning approach for predicting regulatory DNA and RNA sites from whole-genome multiple alignments. iScience 2024; 27:109002. [PMID: 38362268 PMCID: PMC10867641 DOI: 10.1016/j.isci.2024.109002] [Received: 11/04/2023] [Revised: 12/17/2023] [Accepted: 01/19/2024] [Indexed: 02/17/2024]
Abstract
This study focuses on enhancing the prediction of regulatory functional sites in DNA and RNA sequences, a crucial aspect of gene regulation. Current methods, such as motif overrepresentation and machine learning, often lack specificity. To address this issue, the study leverages evolutionary information and introduces Graphylo, a deep-learning approach for predicting transcription factor binding sites in the human genome. Graphylo combines Convolutional Neural Networks for DNA sequences with Graph Convolutional Networks on phylogenetic trees, using information from placental mammals' genomes and evolutionary history. The research demonstrates that Graphylo consistently outperforms both single-species deep learning techniques and methods that incorporate inter-species conservation scores on a wide range of datasets. It achieves this by utilizing a species-based attention model for evolutionary insights and an integrated gradient approach for nucleotide-level model interpretability. This innovative approach offers a promising avenue for improving the accuracy of regulatory site prediction in genomics.
8
Verruma CG, Santos RS, Marchesi JAP, Sales SLA, Vila RA, Rios ÁFL, Furtado CLM, Ramos ES. Dynamic methylation pattern of H19DMR and KvDMR1 in bovine oocytes and preimplantation embryos. J Assist Reprod Genet 2024; 41:333-345. [PMID: 38231285 PMCID: PMC10894807 DOI: 10.1007/s10815-023-03011-7] [Received: 08/09/2023] [Accepted: 12/19/2023] [Indexed: 01/18/2024]
Abstract
PURPOSE: This study aimed to evaluate the epigenetic reprogramming of ICR1 (H19DMR) and ICR2 (KvDMR1) and the expression of genes controlled by them, as well as of genes involved in methylation, demethylation, and pluripotency.
METHODS: We collected germinal vesicle (GV) and metaphase II (MII) oocytes and preimplantation embryos at five stages [zygote, 4-8 cells, 8-16 cells, morula, and expanded blastocysts (ExB)]. DNA methylation was assessed by BiSeq, and gene expression was evaluated using qPCR.
RESULTS: H19DMR showed increased DNA methylation from GV to MII oocytes (68.04% and 98.05%, respectively), decreasing from zygotes (85.83%) to morula (61.65%) and ExB (63.63%). H19 and IGF2 showed increased expression in zygotes, which decreased in later stages. KvDMR1 was hypermethylated in both GV (71.82%) and MII (69.43%) oocytes and from zygotes (73.70%) up to morula (77.84%), with a loss of methylation at the ExB stage (36.64%). The zygote had higher expression of most genes, except for CDKN1C and PHLDA2, which were highly expressed in MII and GV oocytes, respectively. DNMTs showed increased expression in oocytes, followed by a reduction in the earliest stages of embryo development. TET1 was downregulated until the 4-8-cell stage and upregulated in 8-16-cell embryos. TET2 and TET3 showed higher expression in oocytes, with downregulation in MII oocytes and 4-8-cell embryos.
CONCLUSION: We highlighted the heterogeneity in the DNA methylation of H19DMR and KvDMR1 and a dynamic expression pattern of the genes they control. The expression of DNMT and TET genes was also dynamic, owing to epigenetic reprogramming.
Affiliation(s)
- Carolina G Verruma
- Department of Genetics, Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, SP, 14049-900, Brazil
- Renan S Santos
- Postgraduate Program in Physiology and Pharmacology, Drug Research and Development Center (NPDM), Federal University of Ceara (UFC), Fortaleza, CE, 60430-275, Brazil
- Jorge A P Marchesi
- Department of Genetics, Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, SP, 14049-900, Brazil
- Sarah L A Sales
- Postgraduate Program in Physiology and Pharmacology, Drug Research and Development Center (NPDM), Federal University of Ceara (UFC), Fortaleza, CE, 60430-275, Brazil
- Reginaldo A Vila
- Department of Genetics, Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, SP, 14049-900, Brazil
- Álvaro F L Rios
- Biotechnology Laboratory, Center of Bioscience and Biotechnology, State University of North Fluminense Darcy Ribeiro, Goitacazes Campus, Rio de Janeiro, Brazil
- Cristiana L M Furtado
- Experimental Biology Center, Graduate Program in Medical Sciences, University of Fortaleza - UNIFOR, Fortaleza, CE, 60811-905, Brazil
- Drug Research and Development Center (NPDM), Postgraduate Program in Translational Medicine, Federal University of Ceara (UFC), Fortaleza, CE, 60430-275, Brazil
- Ester S Ramos
- Department of Genetics, Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, SP, 14049-900, Brazil.
9
Liu X, Chen M, Qu X, Liu W, Dou Y, Liu Q, Shi D, Jiang M, Li H. Cis-Regulatory Elements in Mammals. Int J Mol Sci 2023; 25:343. [PMID: 38203513 PMCID: PMC10779164 DOI: 10.3390/ijms25010343] [Received: 11/29/2023] [Revised: 12/21/2023] [Accepted: 12/23/2023] [Indexed: 01/12/2024]
Abstract
Among cis-regulatory elements, enhancers and promoters coordinate gene transcription through complex molecular interactions involving physical proximity and chemical modifications. These processes in turn influence the phenotypic characteristics of an organism. An in-depth exploration of enhancers and promoters can substantially improve our understanding of gene regulatory networks, shedding new light on mammalian development, evolution and disease pathways. In this review, we provide a comprehensive overview of the intrinsic structural attributes, detection methodologies and operational mechanisms of enhancers and promoters, together with the novel investigative techniques used to explore their actions. We further discuss state-of-the-art research on the roles of enhancers and promoters in mammalian development, evolution and disease, and conclude with forward-looking insights into prospective research avenues.
Affiliation(s)
- Mingsheng Jiang
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi Key Laboratory of Animal Breeding, Disease Control and Prevention, College of Animal Science and Technology, Guangxi University, Nanning 530005, China
- Hui Li
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi Key Laboratory of Animal Breeding, Disease Control and Prevention, College of Animal Science and Technology, Guangxi University, Nanning 530005, China
10
Liu A, Wang N, Xie G, Li Y, Yan X, Li X, Zhu Z, Li Z, Yang J, Meng F, Dou M, Chen W, Ma N, Jiang Y, Gao Y, Wang Y. GC-biased gene conversion drives accelerated evolution of ultraconserved elements in mammalian and avian genomes. Genome Res 2023; 33:1673-1689. [PMID: 37884342 PMCID: PMC10691551 DOI: 10.1101/gr.277784.123] [Received: 02/08/2023] [Accepted: 08/23/2023] [Indexed: 10/28/2023]
Abstract
Ultraconserved elements (UCEs) are the most conserved regions among the genomes of evolutionarily distant species and are thought to play critical biological functions. However, some UCEs rapidly evolved in specific lineages, and whether they contributed to adaptive evolution is still controversial. Here, using an increased number of sequenced genomes with high taxonomic coverage, we identified 2191 mammalian UCEs and 5938 avian UCEs from 95 mammal and 94 bird genomes, respectively. Our results show that these UCEs are functionally constrained and that their adjacent genes are prone to widespread expression with low expression diversity across tissues. Functional enrichment of mammalian and avian UCEs shows different trends, indicating that UCEs may contribute to adaptive evolution of taxa. Focusing on lineage-specific accelerated evolution, we discover that the proportion of fast-evolving UCEs in nine mammalian and 10 avian test lineages ranges from 0.19% to 13.2%. Notably, up to 62.1% of fast-evolving UCEs in test lineages are much more likely to result from GC-biased gene conversion (gBGC). A single cervid-specific gBGC region embracing the uc.359 allele significantly alters the expression of Nova1 and other neural-related genes in the rat brain. Combined with the altered regulatory activity of ancient gBGC-induced fast-evolving UCEs in eutherians, our results provide evidence that synergy between gBGC and selection shaped lineage-specific substitution patterns, even in the most constrained regulatory elements. In summary, our results show that gBGC played an important role in facilitating lineage-specific accelerated evolution of UCEs, and further support the idea that a combination of multiple evolutionary forces shapes adaptive evolution.
Affiliation(s)
- Anguo Liu
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Livestock Biology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Nini Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Faculty of Mathematics and Natural Sciences, University of Cologne, and Cologne Excellence Cluster for Cellular Stress Responses in Aging-Associated Diseases (CECAD), University Hospital Cologne, Cologne 50931, Germany
- Guoxiang Xie
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Livestock Biology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Yang Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Livestock Biology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Xixi Yan
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Livestock Biology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Xinmei Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Livestock Biology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Zhenliang Zhu
- Key Laboratory of Livestock Biology, Northwest A&F University, Yangling, Shaanxi 712100, China
- College of Veterinary Medicine, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Animal Biotechnology, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Zhuohui Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Livestock Biology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Jing Yang
- Key Laboratory of Livestock Biology, Northwest A&F University, Yangling, Shaanxi 712100, China
- College of Veterinary Medicine, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Animal Biotechnology, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Fanxin Meng
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Mingle Dou
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Livestock Biology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Weihuang Chen
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Nange Ma
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Livestock Biology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Yu Jiang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Livestock Biology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Center for Functional Genomics, Institute of Future Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Yuanpeng Gao
- Key Laboratory of Livestock Biology, Northwest A&F University, Yangling, Shaanxi 712100, China
- College of Veterinary Medicine, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Animal Biotechnology, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Yu Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Livestock Biology, Northwest A&F University, Yangling, Shaanxi 712100, China
11
Li G, Su G, Wang Y, Wang W, Shi J, Li D, Sui G. Integrative genomic analyses of promoter G-quadruplexes reveal their selective constraint and association with gene activation. Commun Biol 2023; 6:625. [PMID: 37301913 PMCID: PMC10257653 DOI: 10.1038/s42003-023-05015-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/25/2023] [Accepted: 06/05/2023] [Indexed: 06/12/2023]
Abstract
G-quadruplexes (G4s) regulate DNA replication and gene transcription and are enriched in promoters, although their functional relevance there is not fully appreciated. Here we show high selection pressure on putative G4 (pG4)-forming sequences in promoters by investigating genetic and genomic data. Analyses of 76,156 whole-genome sequences reveal that G-tracts and connecting loops in promoter pG4s display lower and higher allele frequencies, respectively, than pG4-flanking regions, and that central guanines (Gs) in G-tracts show higher selection pressure than other Gs. Additionally, pG4-promoters produce over 72.4% of transcripts, and promoter G4-containing genes are expressed at relatively high levels. Most genes repressed by TMPyP4, a G4 ligand, regulate epigenetic processes, and promoter G4s are enriched in gene activation histone marks and in chromatin remodeler and transcription factor binding sites. Consistently, cis-expression quantitative trait loci (cis-eQTLs) are enriched in promoter pG4s and their G-tracts. Overall, our study demonstrates selective constraint of promoter G4s and reinforces their stimulative role in gene expression.
Affiliation(s)
- Guangyue Li
- College of Life Science, Northeast Forestry University, Harbin, 150040, China
- Gongbo Su
- College of Life Science, Northeast Forestry University, Harbin, 150040, China
- Yunxuan Wang
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, 150081, China
- Wenmeng Wang
- College of Life Science, Northeast Forestry University, Harbin, 150040, China
- Jinming Shi
- College of Life Science, Northeast Forestry University, Harbin, 150040, China
- Dangdang Li
- College of Life Science, Northeast Forestry University, Harbin, 150040, China
- Guangchao Sui
- College of Life Science, Northeast Forestry University, Harbin, 150040, China
12
Won MM, Mladenov GD, Raymond SL, Khan FA, Radulescu A. What animal model should I use to study necrotizing enterocolitis? Semin Pediatr Surg 2023; 32:151313. [PMID: 37276781 DOI: 10.1016/j.sempedsurg.2023.151313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Unfortunately, we are all too familiar with the statement: "Necrotizing enterocolitis remains the leading cause of gastrointestinal surgical emergency in preterm neonates". It has been five decades since the first animal models of necrotizing enterocolitis (NEC) were described. Much investigative work remains to be done on various aspects of NEC, ranging from its underlying mechanisms to treatment modalities. Experimental NEC research focuses mainly on rat, mouse, and piglet models. Our aim is not only to highlight the pros and cons of these three main models but also to present some of the less-used animal models that have contributed to the body of knowledge about NEC. Choosing an appropriate model is essential to conducting effective research and answering the questions asked. As such, this paper reviews some of the variations that come with each model.
Affiliation(s)
- Mitchell M Won
- School of Medicine, Loma Linda University, Loma Linda, CA, USA
- Georgi D Mladenov
- Division of Pediatric Surgery, Loma Linda University Children's Hospital, Loma Linda, CA, USA
- Steven L Raymond
- School of Medicine, Loma Linda University, Loma Linda, CA, USA
- Division of Pediatric Surgery, Loma Linda University Children's Hospital, Loma Linda, CA, USA
- Faraz A Khan
- School of Medicine, Loma Linda University, Loma Linda, CA, USA
- Division of Pediatric Surgery, Loma Linda University Children's Hospital, Loma Linda, CA, USA
- Andrei Radulescu
- School of Medicine, Loma Linda University, Loma Linda, CA, USA
- Division of Pediatric Surgery, Loma Linda University Children's Hospital, Loma Linda, CA, USA
13
Nabi A, Dilekoglu B, Adebali O, Tastan O. Discovering misannotated lncRNAs using deep learning training dynamics. Bioinformatics 2023; 39:6960922. [PMID: 36571493 PMCID: PMC9825752 DOI: 10.1093/bioinformatics/btac821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 10/05/2022] [Accepted: 12/23/2022] [Indexed: 12/27/2022]
Abstract
MOTIVATION Recent experimental evidence has shown that some long non-coding RNAs (lncRNAs) contain small open reading frames (sORFs) that are translated into functional micropeptides, suggesting that these lncRNAs are misannotated as non-coding. Current methods to detect misannotated lncRNAs rely on ribosome-profiling (Ribo-Seq) and mass-spectrometry experiments, which are cell-type dependent and expensive. RESULTS Here, we propose a computational method to identify possible misannotated lncRNAs from sequence information alone. Our approach first builds deep learning models to discriminate coding and non-coding transcripts and then leverages these models' training dynamics to identify misannotated lncRNAs, i.e., lncRNAs with coding potential. The set of misannotated lncRNAs we identified significantly overlaps with experimentally validated ones and closely resembles protein-coding sequences, as evidenced by significant BLAST hits. Our analysis of a subset of misannotated lncRNA candidates also shows that some of the ORFs they contain yield high-confidence folded structures as predicted by AlphaFold2. This methodology shows promise for assisting experimental efforts to characterize the hidden proteome encoded by misannotated lncRNAs and for curating better datasets for building coding potential predictors. AVAILABILITY AND IMPLEMENTATION Source code is available at https://github.com/nabiafshan/DetectingMisannotatedLncRNAs. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Afshan Nabi
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
- Berke Dilekoglu
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
- Ogun Adebali
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
14
Smeds L, Ellegren H. From high masked to high realized genetic load in inbred Scandinavian wolves. Mol Ecol 2022; 32:1567-1580. [PMID: 36458895 DOI: 10.1111/mec.16802] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 07/28/2022] [Revised: 11/17/2022] [Accepted: 11/28/2022] [Indexed: 12/03/2022]
Abstract
When new mutations arise at functional sites, they are more likely to impair than improve fitness. If not removed by purifying selection, such deleterious mutations generate a genetic load that can have negative fitness effects in small populations and increase the risk of extinction. This is relevant for the highly inbred Scandinavian wolf (Canis lupus) population, which was founded by only three wolves in the 1980s and suffers from inbreeding depression. We used functional annotation and evolutionary conservation scores to study deleterious variation in a total of 209 genomes from both the Scandinavian and neighbouring wolf populations in northern Europe. The masked load (deleterious mutations in the heterozygous state) was highest in Russia and Finland, with deleterious alleles segregating at lower frequency than neutral variation. Genetic drift in the Scandinavian population led to the loss of ancestral alleles, fixation of deleterious variants and a significant increase in the per-individual realized load (deleterious mutations in the homozygous state; an increase of 45% in protein-coding genes) over five generations of inbreeding. The arrival of immigrants gave a temporary genetic rescue effect, with ancestral alleles re-entering the population and thereby shifting deleterious alleles from homozygous into heterozygous genotypes. However, in the absence of permanent connectivity to the Finnish and Russian populations, inbreeding has again led to the exposure of deleterious mutations. These observations provide genome-wide insight into the magnitude of genetic load and genetic rescue at the molecular level, and in relation to population history. They emphasize the importance of securing gene flow in the management of endangered populations.
Affiliation(s)
- Linnéa Smeds
- Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
- Hans Ellegren
- Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
15
Kitai H, Kato N, Ogami K, Komatsu S, Watanabe Y, Yoshino S, Koshi E, Tsubota S, Funahashi Y, Maeda T, Furuhashi K, Ishimoto T, Kosugi T, Maruyama S, Kadomatsu K, Suzuki HI. Systematic characterization of seed overlap microRNA cotargeting associated with lupus pathogenesis. BMC Biol 2022; 20:248. [PMID: 36357926 PMCID: PMC9650897 DOI: 10.1186/s12915-022-01447-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/22/2022] [Accepted: 10/21/2022] [Indexed: 11/12/2022]
Abstract
BACKGROUND Combinatorial gene regulation by multiple microRNAs (miRNAs) is widespread, and closely spaced target sites often act cooperatively to achieve stronger repression ("neighborhood" miRNA cotargeting). While miRNA cotarget sites are suggested to be more conserved and have been implicated in developmental control, the pathological significance of miRNA cotargeting remains elusive. RESULTS Here, we report the pathogenic impacts of combinatorial miRNA regulation on inflammation in systemic lupus erythematosus (SLE). In an SLE mouse model, we identified the downregulation of two miRNAs, miR-128 and miR-148a, by TLR7 stimulation in plasmacytoid dendritic cells. Functional analyses using human cell lines demonstrated that miR-128 and miR-148a additively target KLF4 via extensively overlapping target sites ("seed overlap" miRNA cotargeting) and suppress the inflammatory responses. At the transcriptome level, "seed overlap" miRNA cotargeting increases susceptibility to downregulation by the two miRNAs, consistent with additive but not cooperative recruitment of the two miRNAs. Systematic characterization further revealed that extensive "seed overlap" is a prevalent feature among broadly conserved miRNAs. Highly conserved target sites of broadly conserved miRNAs fall largely into two classes, those conserved among eutherian mammals and those conserved from human to coelacanth; the latter class, which includes the KLF4-cotargeting sites, has a stronger association with both "seed overlap" and "neighborhood" miRNA cotargeting. Furthermore, a deeply conserved miRNA target class is more likely to include haploinsufficient genes. CONCLUSIONS Collectively, our study suggests the complexity of distinct modes of miRNA cotargeting and the importance of their perturbation in human diseases.
Affiliation(s)
- Hiroki Kitai
- Department of Nephrology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Noritoshi Kato
- Department of Nephrology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Koichi Ogami
- Division of Molecular Oncology, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Shintaro Komatsu
- Department of Nephrology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Division of Molecular Oncology, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Yu Watanabe
- Department of Nephrology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Division of Molecular Oncology, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Seiko Yoshino
- Division of Molecular Oncology, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Eri Koshi
- Department of Nephrology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Division of Molecular Oncology, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Shoma Tsubota
- Department of Biochemistry, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Yoshio Funahashi
- Department of Nephrology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Present Address: Yoshio Funahashi, Department of Anesthesiology and Perioperative Medicine, Oregon Health and Science University, 3181 S.W. Sam Jackson Park Road, Portland, OR 97239 USA
- Takahiro Maeda
- Department of General Medicine, Nagasaki University Graduate School of Biomedical Sciences, 1-7-1 Sakamoto, Nagasaki, Nagasaki 852-8501 Japan
- Kazuhiro Furuhashi
- Department of Nephrology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Takuji Ishimoto
- Department of Nephrology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Present Address: Takuji Ishimoto, Department of Nephrology and Rheumatology, Aichi Medical University, 1-1 Yazakokarimata, Nagakute, Aichi 480-1195 Japan
- Tomoki Kosugi
- Department of Nephrology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Shoichi Maruyama
- Department of Nephrology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Kenji Kadomatsu
- Department of Biochemistry, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Institute for Glyco-core Research (iGCORE), Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi 464-8601 Japan
- Hiroshi I. Suzuki
- Division of Molecular Oncology, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550 Japan
- Institute for Glyco-core Research (iGCORE), Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi 464-8601 Japan
16
Lee U, Stuelsatz P, Karaz S, McKellar DW, Russeil J, Deak M, De Vlaminck I, Lepper C, Deplancke B, Cosgrove BD, Feige JN. A Tead1-Apelin axis directs paracrine communication from myogenic to endothelial cells in skeletal muscle. iScience 2022; 25:104589. [PMID: 35789856 PMCID: PMC9250016 DOI: 10.1016/j.isci.2022.104589] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/10/2021] [Revised: 03/10/2022] [Accepted: 06/08/2022] [Indexed: 11/23/2022]
Abstract
Apelin (Apln) is a myokine that regulates skeletal muscle plasticity and metabolism and declines during aging. Through a yeast one-hybrid transcription factor binding screen, we identified the TEA domain transcription factor 1 (Tead1) as a novel regulator of the Apln promoter. Single-cell analysis of regenerating muscle revealed that the apelin receptor (Aplnr) is enriched in endothelial cells, whereas Tead1 is enriched in myogenic cells. Knock-down of Tead1 stimulates Apln secretion from muscle cells in vitro and myofiber-specific overexpression of Tead1 suppresses Apln secretion in vivo. Apln secretion via Tead1 knock-down in muscle cells stimulates endothelial cell expansion via endothelial Aplnr. In vivo, Apln peptide supplementation enhances endothelial cell expansion while Tead1 muscle overexpression delays endothelial remodeling following muscle injury. Our work describes a novel paracrine crosstalk in which Apln secretion is controlled by Tead1 in myogenic cells and influences endothelial remodeling during muscle repair.
Affiliation(s)
- Umji Lee
- Nestlé Institute of Health Sciences, Nestlé Research, Lausanne, Switzerland
- Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, USA
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Pascal Stuelsatz
- Nestlé Institute of Health Sciences, Nestlé Research, Lausanne, Switzerland
- Sonia Karaz
- Nestlé Institute of Health Sciences, Nestlé Research, Lausanne, Switzerland
- David W. McKellar
- Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, USA
- Julie Russeil
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Maria Deak
- Nestlé Institute of Health Sciences, Nestlé Research, Lausanne, Switzerland
- Iwijn De Vlaminck
- Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, USA
- Christoph Lepper
- Department of Physiology and Cell Biology, The Ohio State University College of Medicine, Columbus, OH, USA
- Bart Deplancke
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Jerome N. Feige
- Nestlé Institute of Health Sciences, Nestlé Research, Lausanne, Switzerland
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
17
Campitelli LF, Yellan I, Albu M, Barazandeh M, Patel ZM, Blanchette M, Hughes TR. Reconstruction of full-length LINE-1 progenitors from ancestral genomes. Genetics 2022; 221:6584822. [PMID: 35552404 PMCID: PMC9252281 DOI: 10.1093/genetics/iyac074] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/25/2022] [Accepted: 04/27/2022] [Indexed: 11/24/2022]
Abstract
Sequences derived from the Long INterspersed Element-1 (L1) family of retrotransposons occupy at least 17% of the human genome, with 67 distinct subfamilies representing successive waves of expansion and extinction in mammalian lineages. L1s contribute extensively to gene regulation, but their molecular history is difficult to trace, because most are present only as truncated and highly mutated fossils. Consequently, L1 entries in current databases of repeat sequences are composed mainly of short diagnostic subsequences, rather than full functional progenitor sequences for each subfamily. Here, we have coupled 2 levels of sequence reconstruction (at the level of whole genomes and L1 subfamilies) to reconstruct progenitor sequences for all human L1 subfamilies that are more functionally and phylogenetically plausible than existing models. Most of the reconstructed sequences are at or near the canonical length of L1s and encode uninterrupted ORFs with expected protein domains. We also show that the presence or absence of binding sites for KRAB-C2H2 Zinc Finger Proteins, even in ancient-reconstructed progenitor L1s, mirrors binding observed in human ChIP-exo experiments, thus extending the arms race and domestication model. RepeatMasker searches of the modern human genome suggest that the new models may be able to assign subfamily resolution identities to previously ambiguous L1 instances. The reconstructed L1 sequences will be useful for genome annotation and functional study of both L1 evolution and L1 contributions to host regulatory networks.
Affiliation(s)
- Laura F Campitelli
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A1, Canada
- Donnelly Centre, University of Toronto, Toronto, ON M5S 1A1, Canada
- Isaac Yellan
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A1, Canada
- Donnelly Centre, University of Toronto, Toronto, ON M5S 1A1, Canada
- Mihai Albu
- Donnelly Centre, University of Toronto, Toronto, ON M5S 1A1, Canada
- Marjan Barazandeh
- Donnelly Centre, University of Toronto, Toronto, ON M5S 1A1, Canada
- Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Zain M Patel
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A1, Canada
- Donnelly Centre, University of Toronto, Toronto, ON M5S 1A1, Canada
- Mathieu Blanchette
- Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Department of Computer Science, McGill University, Montreal, Quebec H3A 0G4, Canada
- Timothy R Hughes
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A1, Canada
- Donnelly Centre, University of Toronto, Toronto, ON M5S 1A1, Canada
18
Mechaly A, Diamant E, Alcalay R, Ben David A, Dor E, Torgeman A, Barnea A, Girshengorn M, Levin L, Epstein E, Tennenhouse A, Fleishman SJ, Zichel R, Mazor O. Highly Specific Monoclonal Antibody Targeting the Botulinum Neurotoxin Type E Exposed SNAP-25 Neoepitope. Antibodies (Basel) 2022; 11:21. [PMID: 35323195 PMCID: PMC8944829 DOI: 10.3390/antib11010021] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/01/2022] [Revised: 03/07/2022] [Accepted: 03/13/2022] [Indexed: 12/31/2022]
Abstract
Botulinum neurotoxin type E (BoNT/E), the fastest-acting of all BoNTs, cleaves the 25 kDa synaptosomal-associated protein (SNAP-25) in motor neurons, leading to flaccid paralysis. The specific detection and quantification of the BoNT/E-cleaved SNAP-25 neoepitope can facilitate the development of cell-based assays for the characterization of anti-BoNT/E antibody preparations. To isolate highly specific monoclonal antibodies suitable for the in vitro immunodetection of the exposed neoepitope, mice and rabbits were immunized with an eight-amino-acid peptide composed of the C-terminus of the cleaved SNAP-25. The immunized rabbits developed a specific and robust polyclonal antibody response, whereas the immunized mice mostly demonstrated a weak antibody response that could not discriminate between the two forms of SNAP-25. An immune scFv phage-display library was constructed from the immunized rabbits, and a panel of antibodies was isolated. Sequence alignment of the isolated clones revealed high similarity in both heavy and light chains, with exceptionally short HCDR3 sequences. A chimeric scFv-Fc antibody was further expressed and characterized, exhibiting selective, ultra-high (pM) affinity towards the SNAP-25 neoepitope. Moreover, this antibody enabled the sensitive detection of cleaved SNAP-25 in BoNT/E-treated SiMa cells with no cross-reactivity with intact SNAP-25. Thus, by applying an immunization and selection procedure, we have isolated a novel, specific and high-affinity antibody against the BoNT/E-derived SNAP-25 neoepitope. This novel antibody can be applied in in vitro assays that determine the potency of antitoxin preparations and reduce the use of laboratory animals for these purposes.
Affiliation(s)
- Adva Mechaly
- Department of Infectious Diseases, Israel Institute for Biological Research, Ness-Ziona 7410001, Israel
- Eran Diamant
- Department of Biotechnology, Israel Institute for Biological Research, Ness-Ziona 7410001, Israel
- Ron Alcalay
- Department of Biochemistry and Molecular Genetics, Israel Institute for Biological Research, Ness-Ziona 7410001, Israel
- Alon Ben David
- Department of Biotechnology, Israel Institute for Biological Research, Ness-Ziona 7410001, Israel
- Eyal Dor
- Department of Biotechnology, Israel Institute for Biological Research, Ness-Ziona 7410001, Israel
- Amram Torgeman
- Department of Biotechnology, Israel Institute for Biological Research, Ness-Ziona 7410001, Israel
- Ada Barnea
- Department of Biotechnology, Israel Institute for Biological Research, Ness-Ziona 7410001, Israel
- Meni Girshengorn
- Department of Biotechnology, Israel Institute for Biological Research, Ness-Ziona 7410001, Israel
- Lilach Levin
- Department of Biotechnology, Israel Institute for Biological Research, Ness-Ziona 7410001, Israel
- Eyal Epstein
- Department of Biotechnology, Israel Institute for Biological Research, Ness-Ziona 7410001, Israel
- Ariel Tennenhouse
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7600001, Israel
- Sarel J. Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7600001, Israel
- Ran Zichel
- Department of Biotechnology, Israel Institute for Biological Research, Ness-Ziona 7410001, Israel
- Ohad Mazor
- Department of Infectious Diseases, Israel Institute for Biological Research, Ness-Ziona 7410001, Israel
19
Yuanyuan J, Xinqiang Y. Micropeptides Identified from Human Genomes. J Proteome Res 2022; 21:865-873. [DOI: 10.1021/acs.jproteome.1c00889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Jing Yuanyuan
- School of Public Health, North Sichuan Medical College, Nanchong 637000, China
- Yin Xinqiang
- School of Basic Medicine and Forensics, North Sichuan Medical College, Nanchong 637000, China
20
White ND, Batz ZA, Braun EL, Braun MJ, Carleton KL, Kimball RT, Swaroop A. A novel exome probe set captures phototransduction genes across birds (Aves) enabling efficient analysis of vision evolution. Mol Ecol Resour 2021; 22:587-601. [PMID: 34652059 DOI: 10.1111/1755-0998.13496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2021] [Revised: 08/17/2021] [Accepted: 08/20/2021] [Indexed: 11/27/2022]
Abstract
The diversity of avian visual phenotypes provides a framework for studying mechanisms of trait diversification generally, and the evolution of vertebrate vision specifically. Previous research has focused on opsins, but to fully understand visual adaptation, we must study the complete phototransduction cascade (PTC). Here, we developed a probe set that captures exonic regions of 46 genes representing the PTC and other light responses. For a subset of species, we directly compared gene capture between our probe set and low-coverage whole genome sequencing (WGS), and we discuss considerations for choosing between these methods. We also developed a unique strategy to avoid chimeric assembly by using "decoy" reference sequences. We successfully captured an average of 64% of our targeted exome in 46 species across 14 orders using the probe set and had similar recovery using the WGS data. Compared to WGS or transcriptomes, our probe set: (1) reduces sequencing requirements by efficiently capturing vision genes, (2) employs a simpler bioinformatic pipeline by limiting required assembly and negating annotation, and (3) eliminates the need for fresh tissues, enabling researchers to leverage existing museum collections. We then used our vision exome data to identify positively selected genes in two evolutionary scenarios: the evolution of night vision in nocturnal birds and the evolution of high-speed vision specific to manakins (Pipridae). We found parallel positive selection of SLC24A1 in both scenarios, implicating the alteration of rod response kinetics, which could improve color discrimination in dim light conditions and/or facilitate higher temporal resolution.
Affiliation(s)
- Noor D White
- Neurobiology Neurodegeneration and Repair Laboratory, National Eye Institute, National Institutes of Health, Bethesda, Maryland, USA
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, District of Columbia, USA
- Behavior, Ecology, Evolution and Systematics Program, University of Maryland, College Park, Maryland, USA
- Zachary A Batz
- Neurobiology Neurodegeneration and Repair Laboratory, National Eye Institute, National Institutes of Health, Bethesda, Maryland, USA
- Edward L Braun
- Department of Biology, University of Florida, Gainesville, Florida, USA
- Michael J Braun
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, District of Columbia, USA
- Behavior, Ecology, Evolution and Systematics Program, University of Maryland, College Park, Maryland, USA
- Department of Biology, University of Maryland, College Park, Maryland, USA
- Karen L Carleton
- Behavior, Ecology, Evolution and Systematics Program, University of Maryland, College Park, Maryland, USA
- Department of Biology, University of Maryland, College Park, Maryland, USA
- Rebecca T Kimball
- Department of Biology, University of Florida, Gainesville, Florida, USA
- Anand Swaroop
- Neurobiology Neurodegeneration and Repair Laboratory, National Eye Institute, National Institutes of Health, Bethesda, Maryland, USA
21
Guiblet WM, DeGiorgio M, Cheng X, Chiaromonte F, Eckert KA, Huang YF, Makova KD. Selection and thermostability suggest G-quadruplexes are novel functional elements of the human genome. Genome Res 2021; 31:1136-1149. [PMID: 34187812 PMCID: PMC8256861 DOI: 10.1101/gr.269589.120] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 07/30/2020] [Accepted: 05/24/2021] [Indexed: 12/11/2022]
Abstract
Approximately 1% of the human genome has the ability to fold into G-quadruplexes (G4s), noncanonical strand-specific DNA structures that form at G-rich motifs. G4s regulate several key cellular processes (e.g., transcription) and have been hypothesized to participate in others (e.g., firing of replication origins). Moreover, G4s differ in their thermostability, and this may affect their function. Yet G4s may also hinder replication, transcription, and translation and may increase genome instability and mutation rates. Therefore, depending on their genomic location, thermostability, and functionality, G4 loci might evolve under different selective pressures, a possibility that had never been investigated. Here we conducted the first genome-wide analysis of G4 distribution, thermostability, and selection. We found an overrepresentation, high thermostability, and purifying selection for G4s within genic components in which they are expected to be functional: promoters, CpG islands, and 5' and 3' UTRs. A similar pattern was observed for G4s within replication origins, enhancers, eQTLs, and TAD boundary regions, strongly suggesting their functionality. In contrast, G4s on the nontranscribed strand of exons were underrepresented, were unstable, and evolved neutrally. In general, G4s on the nontranscribed strand of genic components had lower density and were less stable than those on the transcribed strand, suggesting that the former are avoided at the RNA level. Across the genome, purifying selection was stronger at stable G4s. Our results suggest that purifying selection preserves the sequences of functional G4s, whereas nonfunctional G4s are too costly to be tolerated in the genome. Thus, G4s are emerging as fundamental, functional genomic elements.
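The abstract does not state how G4 loci were annotated. As an illustration only, the sketch below assumes the widely used motif definition of four runs of at least three guanines separated by loops of 1-7 nucleotides, and reports a C-rich match as a G4 on the opposite strand. It uses Python's standard `re` module, returns non-overlapping matches only, and is not the authors' pipeline; thermostability is not modeled.

```python
import re

# Assumed motif definition (not taken from the abstract): four runs of >= 3 G
# separated by loops of 1-7 nucleotides. A C-rich match on the given strand
# corresponds to a G-rich motif on the complementary strand.
G4_PLUS = re.compile(r"G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}")
G4_MINUS = re.compile(r"C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}")


def find_g4_motifs(seq: str) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, strand) tuples for non-overlapping G4 motif matches.

    Coordinates are 0-based and end-exclusive on the input strand; strand "+" means
    the G-rich runs are on the input strand, "-" means they are on the complement.
    """
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    print(find_g4_motifs("ttGGGaGGGaGGGaGGGtt"))  # [(2, 17, '+')]
```

Reporting the strand separately matters here because the abstract's findings differ between G4s on the transcribed and nontranscribed strands.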
Affiliation(s)
- Wilfried M Guiblet
- Bioinformatics and Genomics Graduate Program, Penn State University, University Park, Pennsylvania 16802, USA
- Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida 33431, USA
- Xiaoheng Cheng
- Department of Biology, Penn State University, University Park, Pennsylvania 16802, USA
- Francesca Chiaromonte
- Department of Statistics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, Pennsylvania 16802, USA
- Sant'Anna School of Advanced Studies, 56127 Pisa, Italy
- Kristin A Eckert
- Center for Medical Genomics, Penn State University, University Park and Hershey, Pennsylvania 16802, USA
- Department of Pathology, Penn State University, College of Medicine, Hershey, Pennsylvania 17033, USA
- Yi-Fei Huang
- Department of Biology, Penn State University, University Park, Pennsylvania 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, Pennsylvania 16802, USA
- Kateryna D Makova
- Department of Biology, Penn State University, University Park, Pennsylvania 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, Pennsylvania 16802, USA
22
Abstract
Phylogenomics, the study of phylogenetic relationships among taxa based on their genome sequences, has emerged as the preferred phylogenetic method because of the wealth of phylogenetic information contained in genome sequences. Genome sequencing, however, can be prohibitively expensive, especially for taxa with very large genomes and when many taxa need sequencing. Consequently, the less costly phylotranscriptomics has seen increased use in recent years. Phylotranscriptomics reconstructs phylogenies using DNA sequences derived from transcriptomes, which are often orders of magnitude smaller than genomes. However, in the absence of corresponding genome sequences, comparative analyses of transcriptomes can be challenging, and it is unclear whether phylotranscriptomics is as reliable as phylogenomics. Here, we compare the phylogenomic and phylotranscriptomic trees of 22 mammals and, separately, of 15 plants, all of which have both sequenced nuclear genomes and publicly available RNA sequencing data from multiple tissues. We found that phylotranscriptomic analysis can be sensitive to orthologous gene identification. When a rigorous method for identifying orthologs is employed, phylogenomic and phylotranscriptomic trees are virtually identical to each other, regardless of the tissue of origin of the transcriptomes and whether the same tissue is used across species. These findings validate phylotranscriptomics, brighten its prospects, and illustrate the importance of reliable ortholog detection in such practices.
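The abstract reports that the results hinge on ortholog identification but does not name the rigorous method used. Reciprocal best hits (RBH) is one common baseline; the sketch below computes RBH pairs from precomputed similarity scores (for example BLAST bit scores, assumed here as input). It is an illustration, not necessarily the method the authors applied, and the gene IDs and scores are hypothetical.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Map each query gene to its highest-scoring subject (ties: smallest subject ID)."""
    best: dict[str, tuple[float, str]] = {}
    for (query, subject), score in scores.items():
        current = best.get(query)
        if current is None or score > current[0] or (score == current[0] and subject < current[1]):
            best[query] = (score, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(
    a_vs_b: dict[tuple[str, str], float],
    b_vs_a: dict[tuple[str, str], float],
) -> list[tuple[str, str]]:
    """Return (gene_a, gene_b) pairs in which each gene is the other's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores: a2's best hit is b1, but b1's best hit is a1,
    # so only (a1, b1) is called as an ortholog pair.
    a_vs_b = {("a1", "b1"): 100.0, ("a1", "b2"): 50.0, ("a2", "b1"): 90.0, ("a2", "b2"): 80.0}
    b_vs_a = {("b1", "a1"): 100.0, ("b1", "a2"): 90.0, ("b2", "a2"): 80.0, ("b2", "a1"): 50.0}
    print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Strict RBH drops the (a2, b2) pair above even though b2's best hit is a2, which illustrates how the choice of ortholog-calling rule can change the gene set that enters a tree.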
Affiliation(s)
- Seongmin Cheon
- School of Biological Sciences and Technology, Chonnam National University, Gwangju, Republic of Korea
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
- Chungoo Park
- School of Biological Sciences and Technology, Chonnam National University, Gwangju, Republic of Korea
23
Lee S, Lee T, Noh YK, Kim S. Ranked k-Spectrum Kernel for Comparative and Evolutionary Comparison of Exons, Introns, and CpG Islands. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1174-1183. [PMID: 31494555 DOI: 10.1109/tcbb.2019.2938949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
MOTIVATION Existing k-mer based string kernel methods have been successfully used for sequence comparison. However, they have limitations for comparative and evolutionary comparisons of genomes because of their sensitivity to over-represented k-mers and to variable sequence lengths. RESULTS In this study, we propose a novel ranked k-spectrum string (RKSS) kernel. 1) The RKSS kernel uses common k-mer sets across species, named landmarks, that can be used for comparing multiple genomes. 2) Based on the landmarks, we can use ranks of k-mers, rather than frequencies, which produce more robust distances between genomes. To show the power of the RKSS kernel, we conducted two experiments using 10 mammalian species with exon, intron, and CpG island sequences. The RKSS kernel reconstructed more consistent evolutionary trees than the k-spectrum string kernel. In the subsequent experiment, for each sequence, kernel distance was calculated from 30 landmarks representing exon, intron, and CpG island sequences of 10 genomes. Based on kernel distances, concordance tests were performed, and the results suggested that more information is conserved in CpG islands across species than in introns. In conclusion, our analysis suggests the relational order exon > CpG island > intron in terms of evolutionary information content.
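The two ideas named in the abstract, a landmark k-mer set shared by all compared sequences and ranks of k-mers in place of raw frequencies, can be illustrated with a short Python sketch. The landmark rule (k-mers present in every sequence), the average-rank tie handling and the Spearman footrule distance are assumptions chosen for illustration; this is not the published RKSS kernel.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers in an upper-cased sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def landmark_kmers(seqs: list[str], k: int) -> list[str]:
    """Hypothetical landmark rule: k-mers present in every sequence, in sorted order."""
    common = set(kmer_counts(seqs[0], k))
    for s in seqs[1:]:
        common &= set(kmer_counts(s, k))
    return sorted(common)


def rank_vector(seq: str, landmarks: list[str], k: int) -> list[float]:
    """Rank landmark k-mers by count in seq (1 = most frequent); ties get their average rank."""
    counts = kmer_counts(seq, k)
    values = [counts[m] for m in landmarks]
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def rank_distance(a: list[float], b: list[float]) -> float:
    """Spearman footrule: sum of absolute rank differences over the landmarks."""
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    seqs = ["AAAC", "AAACAAAC", "AACC"]
    marks = landmark_kmers(seqs, 2)
    vecs = [rank_vector(s, marks, 2) for s in seqs]
    print(rank_distance(vecs[0], vecs[1]))  # 0.0
    print(rank_distance(vecs[0], vecs[2]))  # 1.0
```

Because only the ordering of landmark counts matters, a sequence and a twofold-longer copy of it receive identical rank vectors, which is the kind of length robustness the abstract attributes to ranks.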
24
Lu YY, Bai J, Wang Y, Wang Y, Sun F. CRAFT: Compact genome Representation toward large-scale Alignment-Free daTabase. Bioinformatics 2021; 37:155-161. [PMID: 32766810 DOI: 10.1093/bioinformatics/btaa699] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/18/2019] [Revised: 03/11/2020] [Accepted: 07/28/2020] [Indexed: 01/02/2023]
Abstract
MOTIVATION Rapid developments in sequencing technologies have boosted the generation of high volumes of sequence data. To archive and analyze those data, one primary step is sequence comparison. Alignment-free sequence comparison based on k-mer frequencies offers a computationally efficient solution, yet in practice, the k-mer frequency vectors for large k of practical interest lead to excessive memory and storage consumption. RESULTS We report CRAFT, a general genomic/metagenomic search engine that learns compact representations of sequences and performs fast comparison between DNA sequences. Specifically, given genome or high-throughput sequencing data as input, CRAFT maps the data into a much smaller embedding space and locates the best matching genome in the archived massive sequence repositories. With a 10^2- to 10^4-fold reduction of storage space, CRAFT performs fast queries for gigabytes of data within seconds or minutes, achieving performance comparable to six state-of-the-art alignment-free measures. AVAILABILITY AND IMPLEMENTATION CRAFT offers a user-friendly graphical user interface with one-click installation on Windows and Linux operating systems, freely available at https://github.com/jiaxingbai/CRAFT. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
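To see why the abstract singles out memory, note that a dense k-mer frequency vector has 4^k entries, so k = 15 already means about 1.07 billion entries per sequence. The sketch below builds such vectors and compares them with cosine similarity, which is one common alignment-free measure chosen here for illustration; it shows the uncompressed baseline representation, not CRAFT's learned embedding.

```python
import math
from itertools import product


def kmer_frequency_vector(seq: str, k: int) -> list[float]:
    """Dense vector of k-mer relative frequencies with 4**k entries (A, C, G, T order).

    k-mers containing other symbols (e.g. N) are skipped.
    """
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    vec = [0.0] * len(index)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            vec[j] += 1.0
            total += 1
    return [v / total for v in vec] if total else vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two frequency vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    for k in (3, 8, 15):
        print(k, 4 ** k)  # vector length grows as 4**k; 4**15 = 1073741824
    u = kmer_frequency_vector("ACGTACGTTT", 3)
    v = kmer_frequency_vector("ACGTACGTAA", 3)
    print(len(u), round(cosine_similarity(u, v), 3))
```

Building the full index is only practical for small k, which is exactly the limitation that motivates compact representations.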
Affiliation(s)
- Yang Young Lu
- Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Jiaxing Bai
- Department of Automation, Xiamen University, Xiamen 361000, China
- Yiwen Wang
- Department of Automation, Xiamen University, Xiamen 361000, China
- Ying Wang
- Department of Automation, Xiamen University, Xiamen 361000, China
- Xiamen Key Lab. of Big Data Intelligent Analysis and Decision, Xiamen 361000, China
- Fengzhu Sun
- Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
25
Lim D, Blanchette M. EvoLSTM: context-dependent models of sequence evolution using a sequence-to-sequence LSTM. Bioinformatics 2021; 36:i353-i361. [PMID: 32657367 PMCID: PMC7355264 DOI: 10.1093/bioinformatics/btaa447] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
Motivation Accurate probabilistic models of sequence evolution are essential for a wide variety of bioinformatics tasks, including sequence alignment and phylogenetic inference. The ability to realistically simulate sequence evolution is also at the core of many benchmarking strategies. Yet, mutational processes have complex context dependencies that remain poorly modeled and understood. Results We introduce EvoLSTM, a recurrent neural network-based evolution simulator that captures mutational context dependencies. EvoLSTM uses a sequence-to-sequence long short-term memory model trained to predict mutation probabilities at each position of a given sequence, taking into consideration the 14 flanking nucleotides. EvoLSTM can realistically simulate mammalian and plant DNA sequence evolution and reveals unexpectedly strong long-range context dependencies in mutation probabilities. EvoLSTM brings modern machine-learning approaches to bear on sequence evolution. It will serve as a useful tool to study and simulate complex mutational processes. Availability and implementation Code and dataset are available at https://github.com/DongjoonLim/EvoLSTM. Supplementary information Supplementary data are available at Bioinformatics online.
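EvoLSTM learns how mutation probabilities depend on 14 flanking nucleotides. The toy simulator below is not an LSTM and not the authors' model: it hard-codes a single assumed context effect (faster mutation at CpG sites, with made-up rates and transitions only) simply to show what a context-dependent substitution process looks like in code.

```python
import random

# Transition partner of each base (A<->G, C<->T); transversions are ignored in this toy.
TRANSITIONS = {"A": "G", "G": "A", "C": "T", "T": "C"}


def site_mutation_prob(left: str, base: str, right: str,
                       base_rate: float = 0.01, cpg_factor: float = 10.0) -> float:
    """Hypothetical one-nucleotide context model: C or G inside a CpG mutates cpg_factor times faster."""
    in_cpg = (base == "C" and right == "G") or (base == "G" and left == "C")
    return min(1.0, base_rate * cpg_factor) if in_cpg else base_rate


def evolve_one_step(seq: str, rng: random.Random,
                    base_rate: float = 0.01, cpg_factor: float = 10.0) -> str:
    """Apply one round of transitions to an uppercase A/C/G/T sequence.

    Contexts are read from the parent sequence, so all sites update simultaneously.
    """
    out = []
    for i, base in enumerate(seq):
        left = seq[i - 1] if i > 0 else ""
        right = seq[i + 1] if i + 1 < len(seq) else ""
        p = site_mutation_prob(left, base, right, base_rate, cpg_factor)
        out.append(TRANSITIONS[base] if rng.random() < p else base)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(42)
    seq = "ACGTTACGCGATCG"
    for _ in range(3):
        seq = evolve_one_step(seq, rng, base_rate=0.05)
        print(seq)
```

In a learned approach, the hand-written `site_mutation_prob` rule would be replaced by probabilities predicted from a much wider context window.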
Affiliation(s)
- Dongjoon Lim
- School of Computer Science, McGill University, Montreal, Quebec H3A 0G4, Canada
- Mathieu Blanchette
- School of Computer Science, McGill University, Montreal, Quebec H3A 0G4, Canada
26
Guiblet WM, Cremona MA, Harris RS, Chen D, Eckert KA, Chiaromonte F, Huang YF, Makova KD. Non-B DNA: a major contributor to small- and large-scale variation in nucleotide substitution frequencies across the genome. Nucleic Acids Res 2021; 49:1497-1516. [PMID: 33450015 PMCID: PMC7897504 DOI: 10.1093/nar/gkaa1269] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Received: 06/05/2020] [Revised: 12/14/2020] [Accepted: 01/11/2021] [Indexed: 12/12/2022]
Abstract
Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes and Z-DNA), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, their ±2 kb flanking regions, and 1-Megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at the 1-Megabase scale. Thus, non-B DNA substantially contributes to variation in substitution frequencies at small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis, with implications for evolution and genetic diseases.
Affiliation(s)
- Wilfried M Guiblet
- Bioinformatics and Genomics Graduate Program, Penn State University, University Park, PA 16802, USA
- Marzia A Cremona
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Operations and Decision Systems, Université Laval, Canada
- CHU de Québec – Université Laval Research Center, Canada
- Robert S Harris
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Di Chen
- Intercollege Graduate Degree Program in Genetics, Huck Institutes of the Life Sciences, Penn State University, University Park, PA 16802, USA
- Kristin A Eckert
- Department of Pathology, Penn State University, College of Medicine, Hershey, PA 17033, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- Francesca Chiaromonte
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- EMbeDS, Sant’Anna School of Advanced Studies, 56127 Pisa, Italy
- Yi-Fei Huang
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- Kateryna D Makova
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
27
Groß C, Bortoluzzi C, de Ridder D, Megens HJ, Groenen MAM, Reinders M, Bosse M. Prioritizing sequence variants in conserved non-coding elements in the chicken genome using chCADD. PLoS Genet 2020; 16:e1009027. [PMID: 32966296 PMCID: PMC7535126 DOI: 10.1371/journal.pgen.1009027] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/18/2020] [Revised: 10/05/2020] [Accepted: 08/05/2020] [Indexed: 11/30/2022]
Abstract
The availability of genomes for many species has advanced our understanding of the non-protein-coding fraction of the genome. Comparative genomics has proven to be an invaluable approach for the systematic, genome-wide identification of conserved non-protein-coding elements (CNEs). However, for many non-mammalian model species, including chicken, our ability to interpret the functional importance of variants overlapping CNEs has been limited by current genomic annotations, which rely on a single type of information (e.g. conservation). Here we studied CNEs in chicken using a combination of population genomics and comparative genomics. To investigate the functional importance of variants found in CNEs, we developed a ch(icken) Combined Annotation-Dependent Depletion (chCADD) model, a variant effect prediction tool first introduced for humans and later for mouse and pig. We show that 73 Mb of the chicken genome has been conserved across more than 280 million years of vertebrate evolution. The vast majority of the conserved elements are in non-protein-coding regions, which display SNP densities and allele frequency distributions characteristic of genomic regions constrained by purifying selection. By annotating SNPs with the chCADD score, we are able to pinpoint specific subregions of the CNEs that are of higher functional importance, as supported by the finding that SNPs in these subregions are associated with known disease genes in humans, mice, and rats. Taken together, our findings indicate that CNEs harbor variants of functional significance that should be the object of further investigation along with protein-coding mutations. We therefore anticipate chCADD to be of great use to the scientific community and breeding companies in future functional studies in chicken.
Affiliation(s)
- Christian Groß
- Bioinformatics Group, Wageningen University & Research, 6708 PB, Wageningen, The Netherlands
- Delft Bioinformatics Lab, University of Technology Delft, 2600 GA, Delft, The Netherlands
- Chiara Bortoluzzi
- Animal Breeding and Genomics Group, Wageningen University & Research, 6708 PB, Wageningen, The Netherlands
- Dick de Ridder
- Bioinformatics Group, Wageningen University & Research, 6708 PB, Wageningen, The Netherlands
- Hendrik-Jan Megens
- Animal Breeding and Genomics Group, Wageningen University & Research, 6708 PB, Wageningen, The Netherlands
- Martien A. M. Groenen
- Animal Breeding and Genomics Group, Wageningen University & Research, 6708 PB, Wageningen, The Netherlands
- Marcel Reinders
- Delft Bioinformatics Lab, University of Technology Delft, 2600 GA, Delft, The Netherlands
- Mirte Bosse
- Animal Breeding and Genomics Group, Wageningen University & Research, 6708 PB, Wageningen, The Netherlands
28
Zhou Y, Pozo PN, Oh S, Stone HM, Cook JG. Distinct and sequential re-replication barriers ensure precise genome duplication. PLoS Genet 2020; 16:e1008988. [PMID: 32841231 PMCID: PMC7473519 DOI: 10.1371/journal.pgen.1008988] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 01/21/2020] [Revised: 09/04/2020] [Accepted: 07/12/2020] [Indexed: 01/19/2023]
Abstract
Achieving complete and precise genome duplication requires that each genomic segment be replicated only once per cell division cycle. Protecting large eukaryotic genomes from re-replication requires an overlapping set of molecular mechanisms that prevent the first DNA replication step, the DNA loading of MCM helicase complexes to license replication origins, after S phase begins. Previous reports have defined many such origin licensing inhibition mechanisms, but the temporal relationships among them are not clear, particularly with respect to preventing re-replication in G2 and M phases. Using a combination of mutagenesis, biochemistry, and single cell analyses in human cells, we define a new mechanism that prevents re-replication through hyperphosphorylation of the essential MCM loading protein, Cdt1. We demonstrate that Cyclin A/CDK1 can hyperphosphorylate Cdt1 to inhibit MCM re-loading in G2 phase. The mechanism of inhibition is to block Cdt1 binding to MCM independently of other known Cdt1 inactivation mechanisms such as Cdt1 degradation during S phase or Geminin binding. Moreover, our findings suggest that Cdt1 dephosphorylation at the mitosis-to-G1 phase transition re-activates Cdt1. We propose that multiple distinct, non-redundant licensing inhibition mechanisms act in a series of sequential relays through each cell cycle phase to ensure precise genome duplication. The initial step of DNA replication is loading the DNA helicase, MCM, onto DNA during the first phase of the cell division cycle. If MCM loading occurs inappropriately onto DNA that has already been replicated, then cells risk DNA re-replication, a source of endogenous DNA damage and genome instability. How mammalian cells prevent any sections of their very large genomes from re-replicating is still not fully understood. We found that the Cdt1 protein, one of the critical MCM loading factors, is inhibited specifically in late cell cycle stages through a mechanism involving protein phosphorylation. 
This phosphorylation prevents Cdt1 from binding MCM; when Cdt1 cannot be phosphorylated, MCM is inappropriately re-loaded onto DNA and cells are prone to re-replication. When cells divide and transition into G1 phase, Cdt1 is then dephosphorylated, which re-activates it for MCM loading. Based on these findings, we assert that the different mechanisms that cooperate to avoid re-replication are not redundant. Instead, different cell cycle phases are dominated by different re-replication control mechanisms. These findings have implications for understanding how genomes are duplicated precisely once per cell cycle and shed light on how that process is perturbed by changes in Cdt1 levels or phosphorylation activity.
Affiliation(s)
- Yizhuo Zhou
- Department of Biochemistry and Biophysics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Pedro N. Pozo
- Curriculum in Genetics and Molecular Biology, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States of America
- Seeun Oh
- F. Widjaja Foundation Inflammatory Bowel and Immunobiology Research Institute and the Research Division of Immunology, Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA, United States of America
- Haley M. Stone
- Department of Biochemistry and Biophysics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Jeanette Gowen Cook
- Department of Biochemistry and Biophysics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Curriculum in Genetics and Molecular Biology, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States of America
- Lineberger Comprehensive Cancer, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States of America
29
A Cambrian origin for globin gene regulation. Blood 2020; 136:261-262. [PMID: 32673391 PMCID: PMC9710423 DOI: 10.1182/blood.2020006649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
30
Heterogeneous phenotype of Hereditary Xerocytosis in association with PIEZO1 variants. Blood Cells Mol Dis 2020; 82:102413. [DOI: 10.1016/j.bcmd.2020.102413] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/16/2020] [Revised: 02/06/2020] [Accepted: 02/06/2020] [Indexed: 02/02/2023]
31
Nekrutenko A, Schatz MC. In memory of James Taylor: the birth of Galaxy. Genome Biol 2020; 21:105. [PMID: 32354350 PMCID: PMC7193333 DOI: 10.1186/s13059-020-02016-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2020] [Accepted: 04/13/2020] [Indexed: 11/14/2022]
Affiliation(s)
- Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
- Michael C Schatz
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
32
Price N, Lopez L, Platts AE, Lasky JR. In the presence of population structure: From genomics to candidate genes underlying local adaptation. Ecol Evol 2020; 10:1889-1904. [PMID: 32128123 DOI: 10.1101/642306] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/08/2019] [Revised: 12/19/2019] [Accepted: 12/23/2019] [Indexed: 05/26/2023]
Abstract
Understanding the genomic signatures, genes, and traits underlying local adaptation of organisms to heterogeneous environments is of central importance to the field of evolutionary biology. To identify loci underlying local adaptation, models that combine allelic and environmental variation while controlling for the effects of population structure have emerged as the method of choice. Although these models have been evaluated in simulation studies, there has not been a thorough investigation of empirical evidence supporting local adaptation at the alleles they identify. To evaluate these methods, we use 875 Arabidopsis thaliana Eurasian accessions and two mixed models (GEMMA and LFMM) to identify candidate SNPs underlying local adaptation to climate. Subsequently, to assess evidence of local adaptation and function among significant SNPs, we examine allele frequency differentiation and recent selection across Eurasian populations, as well as their distribution along quantitative trait loci (QTL) explaining fitness variation between Italian and Swedish populations and along cis-regulatory/nonsynonymous sites showing significant selective constraint. Our results indicate that significant LFMM/GEMMA SNPs show low allele frequency differentiation and linkage disequilibrium across locally adapted Italian and Swedish populations, in addition to a poor association with fitness QTL peaks (highest logarithm of odds score). Furthermore, when examining derived allele frequencies across the Eurasian range, we find that these SNPs are enriched in low-frequency variants that show very large climatic differentiation but low levels of linkage disequilibrium. These results suggest that their enrichment along putative functional sites most likely represents deleterious variation that is independent of local adaptation.
Among all the genomic signatures examined, only SNPs showing high absolute allele frequency differentiation (AFD) and linkage disequilibrium (LD) between Italian and Swedish populations showed a strong association with fitness QTL peaks and were enriched along selectively constrained cis-regulatory/nonsynonymous sites. Using these SNPs, we find strong evidence linking flowering time, freezing tolerance, and the abscisic-acid pathway to local adaptation.
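The abstract singles out SNPs with high absolute allele frequency differentiation (AFD) and LD between the Italian and Swedish populations. Assuming AFD is the per-SNP absolute difference in allele frequency between two populations, |p_A - p_B|, it can be computed from allele counts as sketched below. The counts and the 0.5 cutoff are hypothetical, and the LD component of the authors' filter is not modeled.

```python
def allele_frequency(alt_count: int, total_alleles: int) -> float:
    """Frequency of the counted allele in one population."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return alt_count / total_alleles


def absolute_afd(alt_a: int, n_a: int, alt_b: int, n_b: int) -> float:
    """Absolute allele frequency difference |p_A - p_B| for one SNP."""
    return abs(allele_frequency(alt_a, n_a) - allele_frequency(alt_b, n_b))


def high_afd_snps(snps: dict[str, tuple[int, int, int, int]], threshold: float) -> list[str]:
    """SNP IDs with absolute AFD >= threshold, highest AFD first.

    Each value is (alt count in A, alleles sampled in A, alt count in B, alleles sampled in B).
    """
    afd = {snp: absolute_afd(*counts) for snp, counts in snps.items()}
    return sorted((s for s, d in afd.items() if d >= threshold), key=lambda s: -afd[s])


if __name__ == "__main__":
    # Hypothetical counts for populations A (e.g. Italy) and B (e.g. Sweden).
    snps = {"snp1": (18, 20, 2, 20), "snp2": (10, 20, 11, 20), "snp3": (15, 20, 3, 20)}
    print(high_afd_snps(snps, threshold=0.5))  # ['snp1', 'snp3']
```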
Affiliation(s)
- Nicholas Price
- Department of Bioagricultural Sciences & Pest Management, Colorado State University, Fort Collins, CO, USA
- Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
- Lua Lopez
- Department of Biology, Binghamton University (State University of New York), Binghamton, NY, USA
- Adrian E Platts
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Jesse R Lasky
- Department of Biology, Pennsylvania State University, University Park, PA, USA
33
Price N, Lopez L, Platts AE, Lasky JR. In the presence of population structure: From genomics to candidate genes underlying local adaptation. Ecol Evol 2020; 10:1889-1904. [PMID: 32128123 PMCID: PMC7042746 DOI: 10.1002/ece3.6002] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/08/2019] [Revised: 12/19/2019] [Accepted: 12/23/2019] [Indexed: 12/25/2022]
Abstract
Understanding the genomic signatures, genes, and traits underlying local adaptation of organisms to heterogeneous environments is of central importance to the field evolutionary biology. To identify loci underlying local adaptation, models that combine allelic and environmental variation while controlling for the effects of population structure have emerged as the method of choice. Despite being evaluated in simulation studies, there has not been a thorough investigation of empirical evidence supporting local adaptation across these alleles. To evaluate these methods, we use 875 Arabidopsis thaliana Eurasian accessions and two mixed models (GEMMA and LFMM) to identify candidate SNPs underlying local adaptation to climate. Subsequently, to assess evidence of local adaptation and function among significant SNPs, we examine allele frequency differentiation and recent selection across Eurasian populations, in addition to their distribution along quantitative trait loci (QTL) explaining fitness variation between Italy and Sweden populations and cis-regulatory/nonsynonymous sites showing significant selective constraint. Our results indicate that significant LFMM/GEMMA SNPs show low allele frequency differentiation and linkage disequilibrium across locally adapted Italy and Sweden populations, in addition to a poor association with fitness QTL peaks (highest logarithm of odds score). Furthermore, when examining derived allele frequencies across the Eurasian range, we find that these SNPs are enriched in low-frequency variants that show very large climatic differentiation but low levels of linkage disequilibrium. These results suggest that their enrichment along putative functional sites most likely represents deleterious variation that is independent of local adaptation. 
Among all the genomic signatures examined, only SNPs showing high absolute allele frequency differentiation (AFD) and linkage disequilibrium (LD) between Italy and Sweden populations showed a strong association with fitness QTL peaks and were enriched along selectively constrained cis-regulatory/nonsynonymous sites. Using these SNPs, we find strong evidence linking flowering time, freezing tolerance, and the abscisic-acid pathway to local adaptation.
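The absolute allele frequency differentiation (AFD) highlighted above is the absolute difference in allele frequency between two populations at a SNP. A minimal Python sketch follows, assuming that each accession contributes a single allele call (0 = reference, 1 = alternate), which is a simplification for largely homozygous inbred accessions. The SNP identifiers, toy calls and the 0.5 cut-off are hypothetical, not values from the study, and the linkage disequilibrium criterion the authors also apply is not modelled.

```python
from typing import Dict, List, Sequence, Tuple


def allele_frequency(calls: Sequence[int]) -> float:
    """Alternate-allele frequency from per-accession calls (0 = ref, 1 = alt)."""
    if not calls:
        raise ValueError("at least one accession is required")
    if any(c not in (0, 1) for c in calls):
        raise ValueError("calls must be 0 or 1")
    return sum(calls) / len(calls)


def absolute_afd(pop_a: Sequence[int], pop_b: Sequence[int]) -> float:
    """Absolute allele frequency differentiation |p_A - p_B| at one SNP."""
    return abs(allele_frequency(pop_a) - allele_frequency(pop_b))


def high_afd_snps(
    snps: Dict[str, Tuple[Sequence[int], Sequence[int]]], threshold: float
) -> List[str]:
    """IDs of SNPs whose AFD meets a (hypothetical) threshold, sorted by ID."""
    return sorted(
        snp_id for snp_id, (a, b) in snps.items() if absolute_afd(a, b) >= threshold
    )


if __name__ == "__main__":
    # Toy allele calls for four accessions per population (hypothetical data).
    snps = {
        "snp1": ([1, 1, 1, 0], [0, 0, 0, 1]),  # AFD = |0.75 - 0.25| = 0.5
        "snp2": ([1, 0, 1, 0], [0, 1, 0, 1]),  # AFD = 0.0
    }
    print(high_afd_snps(snps, threshold=0.5))  # ['snp1']
```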
Affiliation(s)
- Nicholas Price
- Department of Bioagricultural Sciences & Pest Management, Colorado State University, Fort Collins, CO, USA
- Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
- Lua Lopez
- Department of Biology, Binghamton University (State University of New York), Binghamton, NY, USA
- Adrian E. Platts
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Jesse R. Lasky
- Department of Biology, Pennsylvania State University, University Park, PA, USA
34
Hecker N, Hiller M. A genome alignment of 120 mammals highlights ultraconserved element variability and placenta-associated enhancers. Gigascience 2020; 9:giz159. [PMID: 31899510 PMCID: PMC6941714 DOI: 10.1093/gigascience/giz159] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 09/06/2019] [Revised: 11/29/2019] [Accepted: 12/13/2019] [Indexed: 01/02/2023] [Open access]
Abstract
BACKGROUND Multiple alignments of mammalian genomes have been the basis of many comparative genomic studies aimed at annotating genes, detecting regions under evolutionary constraint, and studying genome evolution. A key factor that affects the power of comparative analyses is the number of species included in a genome alignment. RESULTS To utilize the increased number of sequenced genomes and to provide an accessible resource for genomic studies, we generated a mammalian genome alignment comprising 120 species. We used this alignment and the CESAR method to provide protein-coding gene annotations for 119 non-human mammals. Furthermore, we illustrate the utility of this alignment with two exemplary analyses. First, we quantified how variable ultraconserved elements (UCEs) are among placental mammals. Leveraging the high taxonomic coverage in our alignment, we estimate that UCEs contain on average 4.7%-15.6% variable alignment columns. We also show that the central regions of UCEs are generally the most constrained. Second, we identified enhancer sequences that are only conserved in placental mammals. We found that these enhancers are significantly associated with placenta-related genes, suggesting that some of these enhancers may be involved in the evolution of placental mammal-specific aspects of the placenta. CONCLUSION The 120-mammal alignment and all other data are available for analysis and visualization in a genome browser at https://genome-public.pks.mpg.de/ and for download at https://bds.mpi-cbg.de/hillerlab/120MammalAlignment/.
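To make the quantity behind the 4.7%-15.6% estimate concrete, the Python sketch below computes the fraction of variable columns in an alignment block. It is a minimal illustration under stated assumptions, not the authors' pipeline: a column is counted as variable if it contains more than one distinct nucleotide, and gaps (`-`) and ambiguous bases (`N`) are ignored; the toy block is hypothetical.

```python
from typing import Sequence

# Gaps and ambiguous bases are not counted as variants (assumption).
IGNORED = set("-N")


def is_variable_column(column: Sequence[str]) -> bool:
    """True if a column shows more than one distinct nucleotide."""
    bases = {c.upper() for c in column} - IGNORED
    return len(bases) > 1


def variable_column_fraction(alignment: Sequence[str]) -> float:
    """Fraction of alignment columns that are variable across the rows."""
    if not alignment:
        raise ValueError("alignment is empty")
    lengths = {len(row) for row in alignment}
    if len(lengths) != 1:
        raise ValueError("all rows must have the same length")
    n_columns = lengths.pop()
    if n_columns == 0:
        raise ValueError("alignment has no columns")
    # zip(*rows) yields the alignment column by column.
    n_variable = sum(is_variable_column(col) for col in zip(*alignment))
    return n_variable / n_columns


if __name__ == "__main__":
    toy_uce = ["ACGTA", "ACGTA", "ACCTA", "ACGT-"]  # hypothetical 4-species block
    print(variable_column_fraction(toy_uce))  # 0.2 (only column 3 varies)
```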
Affiliation(s)
- Nikolai Hecker
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, Noethnitzer Str. 38, 01187 Dresden, Germany
- Center for Systems Biology Dresden, Pfotenhauerstr. 108, 01307 Dresden, Germany
- Michael Hiller
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, Noethnitzer Str. 38, 01187 Dresden, Germany
- Center for Systems Biology Dresden, Pfotenhauerstr. 108, 01307 Dresden, Germany
35
Tang K, Ren J, Sun F. Afann: bias adjustment for alignment-free sequence comparison based on sequencing data using neural network regression. Genome Biol 2019; 20:266. [PMID: 31801606 PMCID: PMC6891986 DOI: 10.1186/s13059-019-1872-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 05/02/2019] [Accepted: 10/29/2019] [Indexed: 11/27/2022] [Open access]
Abstract
Alignment-free methods, which are more time- and memory-efficient than alignment-based methods, have been widely used for comparing genome sequences or raw sequencing samples without assembly. However, in this study, we show that alignment-free dissimilarity calculated from sequencing samples can be overestimated compared with the dissimilarity calculated from their genomes, and that this bias can significantly decrease the performance of alignment-free analysis. Here, we introduce a new alignment-free tool, Alignment-Free methods Adjusted by Neural Network (Afann), that successfully corrects this bias and achieves excellent performance on various independent datasets. Afann is freely available at https://github.com/GeniusTang/Afann.
Affiliation(s)
- Kujin Tang
- Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
- Jie Ren
- Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
- Fengzhu Sun
- Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
36
Dattilo M, Read AT, Samuels BC, Ethier CR. Detection and characterization of tree shrew retinal venous pulsations: An animal model to study human retinal venous pulsations. Exp Eye Res 2019; 185:107689. [PMID: 31175860 PMCID: PMC6698406 DOI: 10.1016/j.exer.2019.06.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/24/2019] [Revised: 05/15/2019] [Accepted: 06/04/2019] [Indexed: 11/26/2022]
Abstract
Spontaneous retinal venous pulsations (SRVPs), pulsations of branches of the central retinal vein, are affected by intraocular pressure (IOP) and intracranial pressure (ICP) and thus convey potentially useful information about ICP. However, the exact relationship between SRVPs, IOP, and ICP is unknown. Because this relationship is not easily studied in humans, an animal model is needed. Here we propose tree shrews as a suitable animal model to study the complex relationship between SRVPs, IOP, and ICP. Tree shrew SRVP incidence was determined in a population of animals. Following validation of a modified system for controlling IOP accurately and quickly, IOP and/or ICP were manipulated in two tree shrews with SRVPs and the effects on SRVP properties were quantified. SRVPs were present in 75% of tree shrews at physiologic IOP and ICP. Altering IOP or ICP produced changes in tree shrew SRVP properties; specifically, increasing IOP caused SRVP amplitude to increase, while increasing ICP caused SRVP amplitude to decrease. In addition, a higher IOP was necessary to generate SRVPs at a higher ICP than at a lower ICP. SRVPs occur with a similar incidence in tree shrews as in humans, and tree shrew SRVPs are affected by changes in IOP and ICP in a manner qualitatively similar to that reported in humans. In view of anatomic similarities, tree shrews are a promising animal model system to further study the complex relationship between SRVPs, IOP, and ICP.
Affiliation(s)
- Michael Dattilo
- Department of Ophthalmology, Emory University School of Medicine, 1365-B Clifton Road, Atlanta, 30322, GA, USA; Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive NW, Atlanta, 30332, GA, USA
- A Thomas Read
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive NW, Atlanta, 30332, GA, USA
- Brian C Samuels
- Department of Ophthalmology, University of Alabama at Birmingham School of Medicine, 1670 University Boulevard, Birmingham, 35294, AL, USA
- C Ross Ethier
- Department of Ophthalmology, Emory University School of Medicine, 1365-B Clifton Road, Atlanta, 30322, GA, USA; Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive NW, Atlanta, 30332, GA, USA
37
Pérez-Wohlfeil E, Diaz-Del-Pino S, Trelles O. Ultra-fast genome comparison for large-scale genomic experiments. Sci Rep 2019; 9:10274. [PMID: 31312019 PMCID: PMC6635410 DOI: 10.1038/s41598-019-46773-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/11/2019] [Accepted: 06/07/2019] [Indexed: 01/23/2023] [Open access]
Abstract
In the last decade, a technological shift has occurred in the bioinformatics field: larger genomes can now be sequenced quickly and cost-effectively, creating a computational need to compare large and abundant sequences efficiently. Furthermore, detecting conserved similarities across large collections of genomes remains a problem. The size of chromosomes, along with the substantial amount of noise and number of repeats found in DNA sequences (particularly in mammals and plants), leads to a scenario where executing a comparison and waiting for complete outputs is both time- and resource-consuming. Filtering steps, manual examination and annotation, very long execution times and a high demand for computational resources are a few of the many difficulties faced in large genome comparisons. In this work, we provide a method designed for comparing considerable numbers of very long sequences that employs a heuristic algorithm capable of separating noise and repeats from conserved fragments in pairwise genomic comparisons. We provide a software implementation that runs in linear time, requires as little as one core, and has a small, constant memory footprint. The method produces both a previsualization of the comparison and a collection of indices that drastically reduce computational complexity when performing exhaustive comparisons. Last, the method scores the comparison to automate classification of sequences and produces a list of detected synteny blocks to enable new evolutionary studies.
Affiliation(s)
- Esteban Pérez-Wohlfeil
- Computer Architecture Department, University of Málaga - Instituto de Investigación Biomédica de Málaga-IBIMA, Málaga, Spain
- Sergio Diaz-Del-Pino
- Computer Architecture Department, University of Málaga - Instituto de Investigación Biomédica de Málaga-IBIMA, Málaga, Spain
- Oswaldo Trelles
- Computer Architecture Department, University of Málaga - Instituto de Investigación Biomédica de Málaga-IBIMA, Málaga, Spain
38
Lu YY, Tang K, Ren J, Fuhrman JA, Waterman MS, Sun F. CAFE: aCcelerated Alignment-FrEe sequence analysis. Nucleic Acids Res 2017; 45:W554-W559. [PMID: 28472388 PMCID: PMC5793812 DOI: 10.1093/nar/gkx351] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 02/27/2017] [Accepted: 04/20/2017] [Indexed: 12/13/2022] [Open access]
Abstract
Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer-based alignment-free dissimilarity measures, including CVTree, $d_2^*$ and $d_2^S$, are more computationally expensive than measures based solely on k-mer frequencies. Here, we report a standalone software tool, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE accepts both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmaps, principal coordinate analysis and network display. CAFE serves as a general k-mer-based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE.
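CAFE's abstract contrasts standardized measures such as $d_2^*$ and $d_2^S$ with cheaper measures based solely on k-mer frequencies. As a minimal sketch of the latter family (our Python illustration, not CAFE's code), the block below counts overlapping k-mers, computes the $D_2$ statistic (the number of k-mer matches shared by two sequences, $D_2 = \sum_w x_w y_w$) and normalizes it into a cosine-style dissimilarity, $d_2 = \tfrac{1}{2}\left(1 - D_2 / (\lVert x\rVert\,\lVert y\rVert)\right)$, where $x$ and $y$ are the k-mer count vectors. It does not model a background sequence distribution, which is what $d_2^*$ and $d_2^S$ add, and it ignores reverse complements.

```python
from collections import Counter
from math import sqrt

NUCLEOTIDES = set("ACGT")


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping any window with a non-ACGT symbol."""
    if k < 1:
        raise ValueError("k must be at least 1")
    seq = seq.upper()
    return Counter(
        seq[i : i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i : i + k]) <= NUCLEOTIDES
    )


def d2_dissimilarity(seq_a: str, seq_b: str, k: int) -> float:
    """Cosine-normalised D2 dissimilarity in [0, 1] (0 = identical k-mer profiles)."""
    x, y = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    d2 = sum(x[w] * y[w] for w in x.keys() & y.keys())  # D2: shared k-mer matches
    norm = sqrt(sum(v * v for v in x.values())) * sqrt(sum(v * v for v in y.values()))
    if norm == 0:
        raise ValueError("a sequence is shorter than k or has no valid k-mers")
    return 0.5 * (1.0 - d2 / norm)


if __name__ == "__main__":
    print(d2_dissimilarity("AAAAAA", "CCCCCC", 2))  # 0.5: no shared 2-mers
```

With no shared k-mers the measure is 0.5 rather than 1, because count vectors are non-negative and their cosine similarity cannot drop below zero.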
Affiliation(s)
- Yang Young Lu
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, CA 90089, USA
- Kujin Tang
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, CA 90089, USA
- Jie Ren
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, CA 90089, USA
- Jed A Fuhrman
- Department of Biological Sciences and Wrigley Institute for Environmental Studies, University of Southern California, Los Angeles, CA 90089, USA
- Michael S Waterman
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, CA 90089, USA; Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, 200433 Shanghai, China
- Fengzhu Sun
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, CA 90089, USA; Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, 200433 Shanghai, China
39
Patel R, Kumar S. On estimating evolutionary probabilities of population variants. BMC Evol Biol 2019; 19:133. [PMID: 31238981 PMCID: PMC6593550 DOI: 10.1186/s12862-019-1455-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/20/2018] [Accepted: 06/06/2019] [Indexed: 11/10/2022] [Open access]
Abstract
BACKGROUND The evolutionary probability (EP) of an allele in a DNA or protein sequence predicts evolutionarily permissible (ePerm; EP ≥ 0.05) and forbidden (eForb; EP < 0.05) variants. The EP of an allele represents an independent evolutionary expectation of observing that allele in a population, based solely on the long-term substitution patterns captured in a multiple sequence alignment. Under the neutral theory, EP and population frequencies can be compared to identify neutral and non-neutral alleles. This approach has been used to discover candidate adaptive polymorphisms in humans, which are eForbs segregating at high frequencies. The original method to compute EP requires the evolutionary relationships and divergence times of the species in the sequence alignment (a timetree), which are not known with certainty for most datasets. This requirement impedes general use of the original EP formulation. Here, we present an approach in which the phylogeny and times are inferred from the sequence alignment itself prior to the EP calculation. We evaluate whether the modified EP approach produces results similar to those from the original method. RESULTS We compared EP estimates from the original and the modified approaches using more than 18,000 protein sequence alignments containing orthologous sequences from 46 vertebrate species. For the original EP calculations, we used species relationships from UCSC and divergence times from the TimeTree web resource, and the resulting EP estimates were considered to be the ground truth. We found that the modified approaches produced reasonable EP estimates for the HGMD disease missense variant and 1000 Genomes Project missense variant datasets. Our results showed that reliable estimates of EP can be obtained without a priori knowledge of the sequence phylogeny and divergence times. 
We also found that, in order to obtain robust EP estimates, it is important to assemble a dataset with many sequences, sampling from a diversity of species groups. CONCLUSION We conclude that the modified EP approach will be generally applicable for alignments and enable the detection of potentially neutral, deleterious, and adaptive alleles in populations.
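The ePerm/eForb rule and its comparison with population frequency can be sketched in a few lines of Python. This is a minimal illustration that takes EP values as given; it does not implement the EP computation itself, which requires a timetree (supplied or inferred from the alignment). The `high_freq` cut-off is a hypothetical parameter, because the abstract does not state what counts as a high frequency.

```python
EFORB_CUTOFF = 0.05  # from the abstract: ePerm if EP >= 0.05, eForb if EP < 0.05


def ep_class(ep: float) -> str:
    """Label an allele as evolutionarily permissible (ePerm) or forbidden (eForb)."""
    if not 0.0 <= ep <= 1.0:
        raise ValueError("EP is a probability and must lie in [0, 1]")
    return "ePerm" if ep >= EFORB_CUTOFF else "eForb"


def is_candidate_adaptive(ep: float, population_freq: float, high_freq: float) -> bool:
    """True for an eForb allele segregating at or above a chosen 'high' frequency.

    `high_freq` is hypothetical: the abstract gives no numeric cut-off.
    """
    if not 0.0 <= population_freq <= 1.0:
        raise ValueError("population frequency must lie in [0, 1]")
    return ep_class(ep) == "eForb" and population_freq >= high_freq


if __name__ == "__main__":
    print(ep_class(0.05), ep_class(0.01))  # ePerm eForb
    print(is_candidate_adaptive(0.01, 0.30, high_freq=0.2))  # True
```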
Affiliation(s)
- Ravi Patel
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA; Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA; Department of Biology, Temple University, Philadelphia, PA, 19122, USA; Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
40
Wang S, Chim B, Su Y, Khil P, Wong M, Wang X, Foroushani A, Smith PT, Liu X, Li R, Ganesan S, Kanellopoulou C, Hafner M, Muljo SA. Enhancement of LIN28B-induced hematopoietic reprogramming by IGF2BP3. Genes Dev 2019; 33:1048-1068. [PMID: 31221665 PMCID: PMC6672051 DOI: 10.1101/gad.325100.119] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 02/05/2019] [Accepted: 05/16/2019] [Indexed: 01/07/2023]
Abstract
Fetal hematopoietic stem and progenitor cells (HSPCs) hold promise to cure a wide array of hematological diseases, and we previously found a role for the RNA-binding protein (RBP) Lin28b in respecifying adult HSPCs to resemble their fetal counterparts. Here we show by single-cell RNA sequencing that Lin28b alone was insufficient for complete reprogramming of gene expression from the adult toward the fetal pattern. Using proteomics and in situ analyses, we found that Lin28b (and its closely related paralog, Lin28a) directly interacted with Igf2bp3, another RBP, and their enforced co-expression in adult HSPCs reactivated fetal-like B-cell development in vivo more efficiently than either factor alone. In B-cell progenitors, Lin28b and Igf2bp3 jointly stabilized thousands of mRNAs by binding at the same sites, including those of the B-cell regulators Pax5 and Arid3a as well as Igf2bp3 mRNA itself, forming an autoregulatory loop. Our results suggest that Lin28b and Igf2bp3 are at the center of a gene regulatory network that mediates the fetal-adult hematopoietic switch. A method to efficiently generate induced fetal-like hematopoietic stem cells (ifHSCs) will facilitate basic studies of their biology and possibly pave a path toward their clinical application.
Affiliation(s)
- Saifeng Wang
- Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
- Bryan Chim
- Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
- Yijun Su
- Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
- Pavel Khil
- Department of Laboratory Medicine, Clinical Center, National Institutes of Health, Bethesda, Maryland 20892, USA
- Madeline Wong
- Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
- Xiantao Wang
- Laboratory of Muscle Stem Cells and Gene Regulation, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
- Amir Foroushani
- Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
- Patrick T Smith
- Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
- Xiuhuai Liu
- Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
- Rui Li
- Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
- Sundar Ganesan
- Biological Imaging Section, Research Technologies Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
- Chrysi Kanellopoulou
- Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
- Markus Hafner
- Laboratory of Muscle Stem Cells and Gene Regulation, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
- Stefan A Muljo
- Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
41
Abstract
Background It has been shown that most of the human genome is transcribed, producing a large number of long noncoding RNAs (lncRNAs). There is now considerable evidence that lncRNAs play an important role in the regulation of gene expression during different cellular processes. Moreover, lncRNAs are involved in the development of various human diseases. However, the function of most annotated transcripts is currently unknown, and different lncRNA annotations tend to have low overlap. Recent studies revealed that some lncRNAs have small open reading frames (smORFs) that produce functional microproteins. However, the question of whether the function of such genes is determined by the microprotein, the RNA itself, or both remains open. Thus, the study of new lncRNA genes is important for understanding the functional role of such a heterogeneous class of genes. Results In the present study, we used reverse transcription PCR and rapid amplification of cDNA ends (RACE) analysis to determine the structure of the LINC01420 transcript. We revealed that LINC01420 has two isoforms that differ in the length of the last exon and are localized predominantly in the cytoplasm. We showed that expression of the short isoform is much higher than that of the long isoform. In addition, MTT and wound-healing assays revealed that LINC01420 inhibits cell migration in the human melanoma cell line A375 but does not influence cell viability. Conclusion While this work was in progress, D’Lima et al. found a smORF in the first exon of the LINC01420 gene. This smORF produces a functional microprotein named non-annotated P-body dissociating polypeptide (NoBody). Nevertheless, our results provide new information about the LINC01420 transcript and its function.
Affiliation(s)
- Daria O Konina
- Department of Biological and Medical Physics, Moscow Institute of Physics and Technology (State University), Dolgoprudny, 141701, Russian Federation
- Mikhail Yu Skoblov
- Research Centre for Medical Genetics, Moscow, Russian Federation, 115522; Far Eastern Federal University, Vladivostok, 690090, Russian Federation
42
Gain of transcription factor binding sites is associated to changes in the expression signature of human brain and testis and is correlated to genes with higher expression breadth. SCIENCE CHINA-LIFE SCIENCES 2019; 62:526-534. [PMID: 30919278 DOI: 10.1007/s11427-018-9454-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2018] [Accepted: 10/15/2018] [Indexed: 11/26/2022]
Abstract
The gain of transcription factor binding sites (TFBS) is believed to represent one of the major causes of biological innovation. Here we used strategies based on comparative genomics to identify 21,822 TFBS specific to the human lineage (TFBS-HS), when compared to chimpanzee and gorilla genomes. More than 40% (9,206) of these TFBS-HS are in the vicinity of 1,283 genes. A comparison of the expression pattern of these genes and the corresponding orthologs in chimpanzee and gorilla identified genes differentially expressed in human tissues. These genes show a more divergent expression pattern in the human testis and brain, suggesting a role for positive selection in the fixation of TFBS gains. Genes associated with TFBS-HS were enriched in gene ontology categories related to transcriptional regulation, signaling, differentiation/development and nervous system. Furthermore, genes associated with TFBS-HS present a higher expression breadth when compared to genes in general. This biased distribution is due to a preferential gain of TFBS in genes with higher expression breadth rather than a shift in the expression pattern after the gain of TFBS.
43
Liu L, Sanderford MD, Patel R, Chandrashekar P, Gibson G, Kumar S. Biological relevance of computationally predicted pathogenicity of noncoding variants. Nat Commun 2019; 10:330. [PMID: 30659175 PMCID: PMC6338804 DOI: 10.1038/s41467-018-08270-y] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 04/19/2018] [Accepted: 12/19/2018] [Indexed: 11/15/2022] [Open access]
Abstract
Computational prediction of the phenotypic propensities of noncoding single nucleotide variants typically combines annotation of genomic, functional and evolutionary attributes into a single score. Here, we evaluate whether the claimed excellent accuracies of these predictions translate into high rates of success in addressing questions important in biological research, such as fine mapping causal variants, distinguishing pathogenic allele(s) at a given position, and prioritizing variants for genetic risk assessment. We find a significant disconnect between the statistical modelling and the biological performance of predictive approaches. We discuss fundamental reasons underlying these deficiencies and suggest that future improvements of computational predictions need to address confounding of allelic, positional and regional effects as well as imbalance in the proportion of true positive variants in candidate lists. Researchers can make use of a variety of computational tools to prioritize genetic variants and predict their pathogenicity. Here, the authors evaluate the performance of six of these tools in three typical biological tasks and find generally low concordance between predictions and experimental confirmation.
Affiliation(s)
- Li Liu
- College of Health Solutions, Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Maxwell D Sanderford
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Ravi Patel
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA; Department of Biology, Temple University, Philadelphia, PA, USA
- Pramod Chandrashekar
- College of Health Solutions, Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Greg Gibson
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA; Department of Biology, Temple University, Philadelphia, PA, USA
44
Seim I, Baker AM, Chopin LK. RadAA: A Command-line Tool for Identification of Radical Amino Acid Changes in Multiple Sequence Alignments. Mol Inform 2019; 38:e1800057. [PMID: 30019526 PMCID: PMC6585820 DOI: 10.1002/minf.201800057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2018] [Accepted: 06/24/2018] [Indexed: 11/09/2022]
Abstract
High-throughput sequencing has revolutionised biology and medicine. Numerous genomes and transcriptome assemblies are now available, and these genomic data sets lend themselves to comparisons between species, strains, and other strata. Researchers often need to rapidly identify changes, in particular amino acid substitutions that could confer biological function in their system of interest. However, we are not aware of an easy-to-use tool that can detect such changes, and researchers currently rely on idiosyncratic computer code. We present RadAA, a command-line tool that screens multiple sequence alignments for radical amino acid changes in one or more strata by classifying residues into groups by charge (with cysteine in its own group). RadAA is easy to use, even for researchers with little experience in computational biology. It can be run on most operating systems, including macOS, Windows, and Linux, and integrated into high-performance computing environments. The RadAA source code and executable binaries are freely available at https://github.com/sciseim/RadAA.
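RadAA's core idea, grouping residues by charge (with cysteine in its own group) and flagging alignment columns where a stratum differs, can be sketched as below. This is a hypothetical Python illustration, not RadAA's source: the exact group membership (for example, placing histidine with the positively charged residues) and the rule that a column is radical when none of the charge groups seen inside the stratum occur outside it are our assumptions, and the sequence names are placeholders.

```python
from typing import Dict, List, Sequence

# Charge groups (assumption): cysteine is kept separate, as the abstract states;
# the remaining membership is illustrative and may differ from RadAA's.
CHARGE_GROUP = {
    **{aa: "positive" for aa in "KRH"},
    **{aa: "negative" for aa in "DE"},
    "C": "cysteine",
}
SKIP = set("-X?*")  # gaps, unknown residues and stops are not compared


def charge_group(residue: str) -> str:
    return CHARGE_GROUP.get(residue.upper(), "uncharged")


def radical_columns(alignment: Dict[str, str], stratum: Sequence[str]) -> List[int]:
    """1-based columns where the stratum's charge groups never occur outside it."""
    members = set(stratum)
    focal = [alignment[name] for name in stratum]
    others = [seq for name, seq in alignment.items() if name not in members]
    if not focal or not others:
        raise ValueError("need sequences both inside and outside the stratum")
    length = len(focal[0])
    if any(len(s) != length for s in focal + others):
        raise ValueError("sequences must be aligned to equal length")
    hits = []
    for i in range(length):
        inside = {charge_group(s[i]) for s in focal if s[i].upper() not in SKIP}
        outside = {charge_group(s[i]) for s in others if s[i].upper() not in SKIP}
        if inside and outside and inside.isdisjoint(outside):
            hits.append(i + 1)
    return hits


if __name__ == "__main__":
    aln = {"seqA": "MKDC", "seqB": "MEDC", "seqC": "MEDC"}  # hypothetical alignment
    print(radical_columns(aln, ["seqA"]))  # [2]: K (positive) vs E (negative)
```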
Affiliation(s)
- Inge Seim
- Comparative and Endocrine Biology Laboratory, Translational Research Institute - Institute of Health and Biomedical Innovation, School of Biomedical Sciences, Queensland University of Technology, 37 Kent St, 4102 Woolloongabba, Australia
- Integrative Biology Laboratory, College of Life Sciences, Nanjing Normal University, 1 Wenyuan Road, 210023 Nanjing, China
- Andrew M. Baker
- School of Earth, Environmental and Biological Sciences, Science and Engineering Faculty, Queensland University of Technology, 2 George St, 4001 Brisbane, Australia
- Lisa K. Chopin
- Comparative and Endocrine Biology Laboratory, Translational Research Institute - Institute of Health and Biomedical Innovation, School of Biomedical Sciences, Queensland University of Technology, 37 Kent St, 4102 Woolloongabba, Australia
45
Hatlen A, Helmy M, Marco A. PopTargs: a database for studying population evolutionary genetics of human microRNA target sites. Database (Oxford) 2019; 2019:baz102. [PMID: 31608947 PMCID: PMC6790967 DOI: 10.1093/database/baz102] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/25/2019] [Revised: 06/07/2019] [Accepted: 08/01/2019] [Indexed: 01/03/2023]
Abstract
There is an increasing interest in the study of polymorphic variants at gene regulatory motifs, including microRNA target sites. Understanding the effects of selective forces at specific microRNA target sites, together with other factors like expression levels or evolutionary conservation, requires the joint study of multiple datasets. We have compiled information from multiple sources and compared it with predicted microRNA target sites to build a comprehensive database for the study of microRNA targets in human populations. PopTargs is a web-based tool that allows the easy extraction of multiple datasets and the joint analyses of them, including allele frequencies, ancestral status, population differentiation statistics and site conservation. The user can also compare the allele frequency spectrum between two groups of target sites and conveniently produce plots. The database can be easily expanded as new data becomes available and the raw database as well as code for creating new custom-made databases is available for downloading. We also describe a few illustrative examples.
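Comparing the allele frequency spectrum of two groups of target sites amounts to binning per-site frequencies (for example, derived-allele frequencies) and contrasting the resulting distributions. The sketch below is a hypothetical Python illustration, not PopTargs code: the ten equal-width bins and the use of the total variation distance as a summary of the difference are our choices, and the toy frequencies are made up.

```python
from typing import List, Sequence


def frequency_spectrum(freqs: Sequence[float], n_bins: int = 10) -> List[float]:
    """Proportion of sites falling in each of n_bins equal-width frequency bins."""
    if not freqs:
        raise ValueError("no sites given")
    if n_bins < 1:
        raise ValueError("n_bins must be positive")
    counts = [0] * n_bins
    for f in freqs:
        if not 0.0 <= f <= 1.0:
            raise ValueError(f"frequency out of range: {f}")
        counts[min(int(f * n_bins), n_bins - 1)] += 1  # f == 1.0 goes in the top bin
    return [c / len(freqs) for c in counts]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two binned spectra (0 = identical)."""
    if len(p) != len(q):
        raise ValueError("spectra must use the same bins")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    group_a = frequency_spectrum([0.05, 0.15, 0.95, 1.0])  # hypothetical sites
    group_b = frequency_spectrum([0.05, 0.05, 0.05, 0.15])
    print(total_variation(group_a, group_b))  # 0.5
```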
Affiliation(s)
- Andrea Hatlen
- School of Life Sciences, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK
- Mohab Helmy
- School of Life Sciences, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK
- Antonio Marco
- School of Life Sciences, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK

46
Brooks SA, Stick J, Braman A, Palermo K, Robinson NE, Ainsworth DM. Identification of loci affecting sexually dimorphic patterns for height and recurrent laryngeal neuropathy risk in American Belgian Draft Horses. Physiol Genomics 2018; 50:1051-1058. [DOI: 10.1152/physiolgenomics.00068.2018] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 01/15/2023]
Abstract
Equine recurrent laryngeal neuropathy (RLN) is a bilateral mononeuropathy with an unknown etiology. In Thoroughbreds (TB), we previously demonstrated that the haplotype associations with height (the LCORL/NCAPG locus on ECA3, which affects body size) and with RLN were coincident. In the present study, we performed a genome-wide association scan (GWAS) for RLN in 458 American Belgian Draft Horses, a breed fixed for the LCORL/NCAPG risk allele. In this breed, RLN risk is associated with sexually dimorphic differences in height, and we identified a novel locus contributing to height in a sex-specific manner: MYPN (ECA1). Yet this locus contributes little to RLN risk, suggesting that other growth traits correlated with height may underlie the correlation between height and this disease. Controlling for height, we identified a locus on ECA15 contributing to RLN risk specifically in males. These results suggest that loci with sex-specific gene expression play an important role in altering growth traits that affect RLN etiology, but not necessarily adult height. These newly identified genes are promising targets for novel preventive and treatment strategies.
Affiliation(s)
- Samantha A. Brooks
- Department of Animal Science, UF Genetics Institute, University of Florida, Gainesville, Florida
- John Stick
- Department of Large Animal Clinical Sciences, Michigan State University, East Lansing, Michigan
- Ashley Braman
- Department of Large Animal Clinical Sciences, Michigan State University, East Lansing, Michigan
- Katelyn Palermo
- Department of Animal Science, UF Genetics Institute, University of Florida, Gainesville, Florida
- N. Edward Robinson
- Department of Large Animal Clinical Sciences, Michigan State University, East Lansing, Michigan

47
Malinsky M, Svardal H, Tyers AM, Miska EA, Genner MJ, Turner GF, Durbin R. Whole-genome sequences of Malawi cichlids reveal multiple radiations interconnected by gene flow. Nat Ecol Evol 2018; 2:1940-1955. [PMID: 30455444 PMCID: PMC6443041 DOI: 10.1038/s41559-018-0717-x] [Citation(s) in RCA: 257] [Impact Index Per Article: 42.8] [Received: 12/04/2017] [Accepted: 10/10/2018] [Indexed: 12/30/2022]
Abstract
The hundreds of cichlid fish species in Lake Malawi constitute the most extensive recent vertebrate adaptive radiation. Here we characterize its genomic diversity by sequencing 134 individuals covering 73 species across all major lineages. The average sequence divergence between species pairs is only 0.1-0.25%. These divergence values overlap diversity within species, with 82% of heterozygosity shared between species. Phylogenetic analyses suggest that diversification initially proceeded by serial branching from a generalist Astatotilapia-like ancestor. However, no single species tree adequately represents all species relationships, with evidence for substantial gene flow at multiple times. Common signatures of selection on visual and oxygen transport genes shared by distantly related deep-water species point to both adaptive introgression and independent selection. These findings enhance our understanding of genomic processes underlying rapid species diversification, and provide a platform for future genetic analysis of the Malawi radiation.
Affiliation(s)
- Milan Malinsky
- Wellcome Sanger Institute, Cambridge, UK
- Zoological Institute, University of Basel, Basel, Switzerland
- Hannes Svardal
- Wellcome Sanger Institute, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
- Department of Biology, University of Antwerp, Antwerp, Belgium
- Naturalis Biodiversity Center, Leiden, The Netherlands
- Alexandra M Tyers
- School of Natural Sciences, Bangor University, Bangor, UK
- Max Planck Institute for Biology of Ageing, Cologne, Germany
- Eric A Miska
- Wellcome Sanger Institute, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
- Gurdon Institute, University of Cambridge, Cambridge, UK
- Martin J Genner
- School of Biological Sciences, University of Bristol, Bristol, UK
- Richard Durbin
- Wellcome Sanger Institute, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK

48
Abstract
Ribosome profiling involves sequencing approximately 30-nucleotide-long stretches of ribosome-protected mRNA. The technique enables genome-wide mapping of RNA undergoing active translation. Numerous small open reading frames (sORFs) have been identified using ribosome profiling, which has led researchers to question the assumed non-functional character of sORFs and to identify various important sORF translation products. sORFs.org (https://www.sorfs.org) is a public repository of small open reading frames identified by ribosome profiling, containing over 3 million sORFs across 78 datasets from six species. sORFs.org is a multi-omics endeavor providing tools and metrics to assess the coding potential of the delineated sORFs. A pipeline is also in place to systematically rescan public mass spectrometry datasets to acquire new experimental evidence for sORF-encoded polypeptides. sORFs.org provides two distinct query interfaces, export functionality, and various visualization tools to enable inspection of the available information. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Volodimir Olexiouk
- Department of Mathematical Modelling, Statistics and Bioinformatics, Universiteit Gent Faculteit Bio-Ingenieurswetenschappen, Gent, Belgium
- Gerben Menschaert
- Department of Mathematical Modelling, Statistics and Bioinformatics, Universiteit Gent Faculteit Bio-Ingenieurswetenschappen, Gent, Belgium

49
Fiddes IT, Armstrong J, Diekhans M, Nachtweide S, Kronenberg ZN, Underwood JG, Gordon D, Earl D, Keane T, Eichler EE, Haussler D, Stanke M, Paten B. Comparative Annotation Toolkit (CAT)-simultaneous clade and personal genome annotation. Genome Res 2018; 28:1029-1038. [PMID: 29884752 PMCID: PMC6028123 DOI: 10.1101/gr.233460.117] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Received: 12/07/2017] [Accepted: 05/03/2018] [Indexed: 01/13/2023]
Abstract
The recent introduction of low-cost, long-read, and read-cloud sequencing technologies, coupled with intense efforts to develop efficient algorithms, has made affordable, high-quality de novo sequence assembly a realistic proposition. The result is an explosion of new, ultracontiguous genome assemblies. To compare these genomes, we need robust methods for genome annotation. We describe the fully open-source Comparative Annotation Toolkit (CAT), which provides a flexible way to simultaneously annotate entire clades and identify orthology relationships. We show that CAT can be used to improve annotations on the rat genome, annotate the great apes, annotate a diverse set of mammals, and annotate personal, diploid human genomes. We demonstrate the resulting discovery of novel genes, isoforms, and structural variants, even in genomes as well studied as rat and the great apes, and show how these annotations improve cross-species RNA expression experiments.
Affiliation(s)
- Ian T Fiddes
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, California 95064, USA
- 10x Genomics, Pleasanton, California 94566, USA
- Joel Armstrong
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, California 95064, USA
- Mark Diekhans
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, California 95064, USA
- Stefanie Nachtweide
- Institute of Mathematics and Computer Science, University of Greifswald, 17489 Greifswald, Germany
- Zev N Kronenberg
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Jason G Underwood
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Pacific Biosciences of California, Incorporated, Menlo Park, California 94025, USA
- David Gordon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Dent Earl
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, California 95064, USA
- Thomas Keane
- European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- David Haussler
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, California 95064, USA
- Mario Stanke
- Institute of Mathematics and Computer Science, University of Greifswald, 17489 Greifswald, Germany
- Benedict Paten
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, California 95064, USA

50
Ren J, Bai X, Lu YY, Tang K, Wang Y, Reinert G, Sun F. Alignment-Free Sequence Analysis and Applications. Annu Rev Biomed Data Sci 2018; 1:93-114. [PMID: 31828235 PMCID: PMC6905628 DOI: 10.1146/annurev-biodatasci-080917-013431] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Indexed: 01/06/2023]
Abstract
Genome and metagenome comparisons based on large amounts of next generation sequencing (NGS) data pose significant challenges for alignment-based approaches due to the huge data size and the relatively short length of the reads. Alignment-free approaches based on the counts of word patterns in NGS data do not depend on the complete genome and are generally computationally efficient. Thus, they contribute significantly to genome and metagenome comparison. Recently, novel statistical approaches have been developed for the comparison of both long and shotgun sequences. These approaches have been applied to many problems including the comparison of gene regulatory regions, genome sequences, metagenomes, binning contigs in metagenomic data, identification of virus-host interactions, and detection of horizontal gene transfers. We provide an updated review of these applications and other related developments of word-count based approaches for alignment-free sequence analysis.
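The word-count idea behind these alignment-free methods can be sketched in a few lines. The snippet below, a minimal illustration under stated assumptions, counts overlapping k-words (k-mers) and computes the classic D2 statistic, the sum over all words of the product of their counts in two sequences, plus a simple cosine-normalized dissimilarity. The function names are hypothetical, words containing non-ACGT letters are simply skipped, and related statistics such as D2* and D2S, which correct for background word frequencies, are not implemented here.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-words in seq; words containing non-ACGT letters are skipped."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def d2(x: Counter, y: Counter) -> int:
    """D2 statistic: sum over words w of X_w * Y_w (words absent from y contribute 0)."""
    return sum(c * y[w] for w, c in x.items())


def d2_cosine_dissimilarity(x: Counter, y: Counter) -> float:
    """1 - D2 / (||X|| * ||Y||): 0 for proportional count vectors, 1 if no words are shared."""
    sq_x = sum(c * c for c in x.values())
    sq_y = sum(c * c for c in y.values())
    if sq_x == 0 or sq_y == 0:
        return 1.0
    return 1.0 - d2(x, y) / math.sqrt(sq_x * sq_y)


a = kmer_counts("ACGTACGTTGCA", 3)
b = kmer_counts("ACGTTGCAACGT", 3)
print(d2(a, b), d2_cosine_dissimilarity(a, b))
```

Because only word counts are compared, no alignment of the two sequences is ever computed, which is what makes this family of methods cheap on large NGS datasets; raw D2 grows with sequence length, hence the normalization.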
Affiliation(s)
- Jie Ren
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
- Xin Bai
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
- Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, China
- Yang Young Lu
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
- Kujin Tang
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
- Ying Wang
- Department of Automation, Xiamen University, Xiamen, Fujian, China
- Gesine Reinert
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Fengzhu Sun
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
- Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, China