1
Benegas G, Ye C, Albors C, Li JC, Song YS. Genomic Language Models: Opportunities and Challenges. arXiv 2024:arXiv:2407.11435v2. [PMID: 39070037 PMCID: PMC11275703] [Indexed: 07/30/2024]
Abstract
Large language models (LLMs) are having transformative impacts across a wide range of scientific fields, particularly in the biomedical sciences. Just as the goal of Natural Language Processing is to understand sequences of words, a major objective in biology is to understand biological sequences. Genomic Language Models (gLMs), which are LLMs trained on DNA sequences, have the potential to significantly advance our understanding of genomes and how DNA elements at various scales interact to give rise to complex functions. To showcase this potential, we highlight key applications of gLMs, including functional constraint prediction, sequence design, and transfer learning. Despite notable recent progress, however, developing effective and efficient gLMs presents numerous challenges, especially for species with large, complex genomes. Here, we discuss major considerations for developing and evaluating gLMs.
Affiliation(s)
- Gonzalo Benegas
  - Computer Science Division, University of California, Berkeley
- Chengzhong Ye
  - Department of Statistics, University of California, Berkeley
- Carlos Albors
  - Computer Science Division, University of California, Berkeley
- Jianan Canal Li
  - Computer Science Division, University of California, Berkeley
- Yun S. Song
  - Computer Science Division, University of California, Berkeley
  - Department of Statistics, University of California, Berkeley
  - Center for Computational Biology, University of California, Berkeley

2
Smith TA, Srikanth K, Huson HJ. Comparative Population Genomics of Arctic Sled Dogs Reveals a Deep and Complex History. Genome Biol Evol 2024; 16:evae190. [PMID: 39193769 PMCID: PMC11403282 DOI: 10.1093/gbe/evae190] [Received: 05/08/2024] [Revised: 08/14/2024] [Accepted: 08/21/2024] [Indexed: 08/29/2024]
Abstract
Recent evidence demonstrates genomic and morphological continuity in the Arctic ancestral lineage of dogs. Here, we use the Siberian Husky to investigate the genomic legacy of the northeast Eurasian Arctic lineage and model the deep population history using genome-wide single nucleotide polymorphisms. Utilizing ancient dog-calibrated molecular clocks, we found that at least two distinct lineages of Arctic dogs existed in ancient Eurasia at the end of the Pleistocene. This pushes back the origin of sled dogs in the northeast Siberian Arctic, with humans likely intentionally selecting dogs to perform different functions and keeping breeding populations that overlap in time and space relatively reproductively isolated. In modern Siberian Huskies, we found significant population structure based on how they are used by humans, recent European breed introgression in about half of the dogs that participate in races, moderate levels of inbreeding, and fewer potentially harmful variants in populations under strong selection for form and function (show, sled show, and racing populations of Siberian Huskies). As the struggle to preserve unique evolutionary lineages while maintaining genetic health intensifies across pedigreed dogs, understanding the genomic history to guide policies and best practices for breed management is crucial to sustain these ancient lineages and their unique evolutionary identity.
Affiliation(s)
- Tracy A Smith
  - Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD 21250, USA
- Krishnamoorthy Srikanth
  - Department of Animal Sciences, Cornell University College of Agriculture and Life Sciences, Ithaca, NY 14853, USA
- Heather Jay Huson
  - Department of Animal Sciences, Cornell University College of Agriculture and Life Sciences, Ithaca, NY 14853, USA

3
Sullivan PF, Yao S, Hjerling-Leffler J. Schizophrenia genomics: genetic complexity and functional insights. Nat Rev Neurosci 2024; 25:611-624. [PMID: 39030273 DOI: 10.1038/s41583-024-00837-7] [Accepted: 06/04/2024] [Indexed: 07/21/2024]
Abstract
Determining the causes of schizophrenia has been a notoriously intractable problem, resistant to a multitude of investigative approaches over centuries. In recent decades, genomic studies have delivered hundreds of robust findings that implicate nearly 300 common genetic variants (via genome-wide association studies) and more than 20 rare variants (via whole-exome sequencing and copy number variant studies) as risk factors for schizophrenia. In parallel, functional genomic and neurobiological studies have provided exceptionally detailed information about the cellular composition of the brain and its interconnections in neurotypical individuals and, increasingly, in those with schizophrenia. Taken together, these results suggest unexpected complexity in the mechanisms that drive schizophrenia, pointing to the involvement of ensembles of genes (polygenicity) rather than single-gene causation. In this Review, we describe what we now know about the genetics of schizophrenia and consider the neurobiological implications of this information.
Affiliation(s)
- Patrick F Sullivan
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
  - Department of Psychiatry, University of North Carolina, Chapel Hill, NC, USA
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Shuyang Yao
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Jens Hjerling-Leffler
  - Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden

4
Zhang R, Wang Y, Wang C, Sun X, Mergny JL. G-quadruplexes as pivotal components of cis-regulatory elements in the human genome. BMC Biol 2024; 22:177. [PMID: 39183303 PMCID: PMC11346177 DOI: 10.1186/s12915-024-01971-5] [Received: 03/21/2024] [Accepted: 08/05/2024] [Indexed: 08/27/2024]
Abstract
BACKGROUND: Cis-regulatory elements (CREs) are crucial for regulating gene expression, and G-quadruplexes (G4s), as prototypal non-canonical DNA structures, may play a role in this regulation. However, the relationship between G4s and CREs, especially with non-promoter-like functional elements, requires further systematic investigation. We aimed to investigate the associations between G4s and human cCREs (candidate CREs) inferred from the Encyclopedia of DNA Elements (ENCODE) data.
RESULTS: We found that G4s are prominently enriched in most types of cCREs, especially those with promoter-like signatures (PLS). The co-occurrence of CTCF signals with H3K4me3 or H3K27ac signals strengthens the association between cCREs and G4s. Genetic variants in G4s, particularly within their G-runs, exhibit higher regulatory potential and deleterious effects compared to cCREs. The G-runs within G4s near transcriptional start sites (TSSs) are more evolutionarily constrained compared to G-runs in cCREs, while those far from the TSS are relatively less conserved. The presence of G4s is often linked to a more favorable local chromatin environment for the activation and execution of regulatory function of cCREs, potentially attributable to the formation of G4 secondary structures. Finally, we discovered that G4-associated cCREs exhibit widespread activation in a variety of cancers.
CONCLUSIONS: Our study suggests that G4s are integral components of human cis-regulatory elements, extending beyond their potential role in promoters. The G4 primary sequences are associated with the localization of CREs, while the G4 structures are linked to the activation of these elements. Therefore, we propose defining G4s as pivotal regulatory elements in the human genome.
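The abstract refers to G4s and their G-runs without giving a sequence definition. As a minimal sketch, assuming the common putative-quadruplex rule of four runs of at least three guanines separated by loops of 1 to 7 nt (an assumption; the study may have used experimentally mapped G4s or another rule), the following Python scans one strand for such motifs and reports the G-runs inside each hit. The function and constant names are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed putative-G4 rule (illustrative only): four runs of >= 3 guanines
# separated by loops of 1-7 nucleotides. The study may define G4s differently.
G4_REGEX = re.compile(r"G{3,}(?:[ACGTN]{1,7}?G{3,}){3}")
G_RUN = re.compile(r"G{3,}")


def find_putative_g4s(seq: str) -> List[Tuple[int, int, List[str]]]:
    """Return (start, end, g_runs) for non-overlapping putative G4s.

    Only the given strand is scanned; G4s on the opposite strand appear as
    C-rich motifs here and would require scanning the reverse complement.
    """
    hits = []
    for match in G4_REGEX.finditer(seq.upper()):
        # Collect the guanine runs that make up this candidate quadruplex.
        g_runs = G_RUN.findall(match.group())
        hits.append((match.start(), match.end(), g_runs))
    return hits


# Example: a telomere-like TTAGGG repeat containing one putative G4.
print(find_putative_g4s("AAGGGTTAGGGTTAGGGTTAGGGAA"))
```

Because `finditer` returns non-overlapping matches, overlapping candidates in long G-rich stretches collapse into the first match; tools that enumerate every overlapping candidate typically wrap the pattern in a lookahead instead.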
Affiliation(s)
- Rongxin Zhang
  - Laboratoire d'Optique et Biosciences (LOB), Ecole Polytechnique, CNRS, INSERM, Institut Polytechnique de Paris, 91120, Palaiseau, France
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Yuqi Wang
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Cheng Wang
  - Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China
  - Department of Bioinformatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Xiao Sun
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Jean-Louis Mergny
  - Laboratoire d'Optique et Biosciences (LOB), Ecole Polytechnique, CNRS, INSERM, Institut Polytechnique de Paris, 91120, Palaiseau, France

5
Okamoto AS, Capellini TD. Parallel Evolution at the Regulatory Base-Pair Level Contributes to Mammalian Interspecific Differences in Polygenic Traits. Mol Biol Evol 2024; 41:msae157. [PMID: 39073613 PMCID: PMC11321361 DOI: 10.1093/molbev/msae157] [Received: 04/22/2024] [Revised: 07/02/2024] [Accepted: 07/23/2024] [Indexed: 07/30/2024]
Abstract
Parallel evolution occurs when distinct lineages with similar ancestral states converge on a new phenotype. Parallel evolution has been well documented at the organ, gene pathway, and amino acid sequence levels, but in theory it can also occur at individual nucleotides within noncoding regions. To examine the role of parallel evolution in shaping the biology of mammalian complex traits, we used data on single-nucleotide polymorphisms (SNPs) influencing human intraspecific variation to predict trait values in other species for 11 complex traits. We found that the alleles at SNP positions associated with human intraspecific height and red blood cell (RBC) count variation are associated with interspecific variation in the corresponding traits across mammals. These associations hold for deeper branches of mammalian evolution as well as between strains of collaborative cross mice. While variation in RBC count between primates uses both ancient and more recently evolved genomic regions, we found that only primate-specific elements were correlated with primate body size. We show that the SNP positions driving these signals are flanked by conserved sequences, maintain synteny with target genes, and overlap transcription factor binding sites. This work highlights the potential of conserved but tunable regulatory elements to be reused in parallel to facilitate evolutionary adaptation in mammals.
Affiliation(s)
- Alexander S Okamoto
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Terence D Capellini
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA

6
Zhu Y, Watson C, Safonova Y, Pennell M, Bankevich A. Assessing Assembly Errors in Immunoglobulin Loci: A Comprehensive Evaluation of Long-read Genome Assemblies Across Vertebrates. bioRxiv: The Preprint Server for Biology 2024:2024.07.19.604360. [PMID: 39091785 PMCID: PMC11291089 DOI: 10.1101/2024.07.19.604360] [Indexed: 08/04/2024]
Abstract
Long-read sequencing technologies have revolutionized genome assembly, producing near-complete chromosome assemblies for numerous organisms, which are invaluable to research in many fields. However, regions with complex repetitive structure continue to represent a challenge for genome assembly algorithms, particularly in areas with high heterozygosity. Robust and comprehensive solutions for the assessment of assembly accuracy and completeness in these regions do not exist. In this study we focus on the assembly of biomedically important antibody-encoding immunoglobulin (IG) loci, which are characterized by complex duplications and repeat structures. High-quality full-length assemblies for these loci are critical for resolving haplotype-level annotations of IG genes, without which functional and evolutionary studies of antibody immunity across vertebrates are not tractable. To address these challenges, we developed a pipeline, "CloseRead", that generates multiple assembly verification metrics for analysis and visualization. These metrics expand upon those of existing quality assessment tools and specifically target complex and highly heterozygous regions. Using CloseRead, we systematically assessed the accuracy and completeness of IG loci in publicly available assemblies of 74 vertebrate species, identifying problematic regions. We also demonstrated that inspecting assembly graphs for problematic regions can both identify the root cause of assembly errors and illuminate solutions for improving erroneous assemblies. For a subset of species, we were able to correct assembly errors through targeted reassembly. Together, our analysis demonstrated the utility of assembly assessment in improving the completeness and accuracy of IG loci across species.
Affiliation(s)
- Yixin Zhu
  - Department of Quantitative and Computational Biology and Biological Sciences, University of Southern California, Los Angeles, CA, United States
- Corey Watson
  - Department of Biochemistry and Molecular Biology, University of Louisville School of Medicine, Louisville, KY, United States
- Yana Safonova
  - Department of Computer Science and Engineering, Pennsylvania State University, PA, United States
- Matt Pennell
  - Department of Quantitative and Computational Biology and Biological Sciences, University of Southern California, Los Angeles, CA, United States
- Anton Bankevich
  - Department of Computer Science and Engineering, Pennsylvania State University, PA, United States

7
Oh JW, Beer MA. Gapped-kmer sequence modeling robustly identifies regulatory vocabularies and distal enhancers conserved between evolutionarily distant mammals. Nat Commun 2024; 15:6464. [PMID: 39085231 PMCID: PMC11291912 DOI: 10.1038/s41467-024-50708-z] [Received: 10/27/2023] [Accepted: 07/17/2024] [Indexed: 08/02/2024]
Abstract
Gene regulatory elements drive complex biological phenomena, and their mutations are associated with common human diseases. The impacts of human regulatory variants are often tested using model organisms such as mice. However, mapping human enhancers to conserved elements in mice remains a challenge, due to both rapid enhancer evolution and limitations of current computational methods. We analyze distal enhancers across 45 matched human/mouse cell/tissue pairs from a comprehensive dataset of DNase-seq experiments, and show that while cell-specific regulatory vocabulary is conserved, enhancers evolve more rapidly than promoters and CTCF binding sites. Enhancer conservation rates vary across cell types, in part explainable by tissue-specific transposable element activity. We present an improved genome alignment algorithm using gapped-kmer features, called gkm-align, and make genome-wide predictions for 1,401,803 orthologous regulatory elements. We show that gkm-align discovers 23,660 novel human/mouse conserved enhancers missed by previous algorithms, with strong evidence of conserved functional activity.
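The abstract names gapped-kmer features without defining them. The sketch below illustrates what such features look like, assuming the usual gkm-SVM-style definition: each length-l window of a sequence is reduced to every pattern that keeps k informative positions and masks the remaining l - k as gaps. It is not the gkm-align alignment algorithm, and `gapped_kmer_counts` is a hypothetical helper written for illustration.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq: str, l: int, k: int) -> Counter:
    """Count gapped k-mers in seq (hypothetical helper, not code from gkm-align).

    Each length-l window contributes one feature per choice of k informative
    positions; the remaining l - k positions are masked with '.'.
    Reverse-complement merging, used by many gkm tools, is omitted here.
    """
    if not 0 < k <= l:
        raise ValueError("require 0 < k <= l")
    seq = seq.upper()
    # Precompute every placement of the k informative positions in a window.
    masks = [set(kept) for kept in combinations(range(l), k)]
    counts: Counter = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for kept in masks:
            feature = "".join(base if i in kept else "." for i, base in enumerate(window))
            counts[feature] += 1
    return counts


# Example: two 3-mer windows ("ACG", "CGT"), each yielding C(3, 2) = 3 features.
print(sorted(gapped_kmer_counts("ACGT", l=3, k=2).items()))
```

Each window yields C(l, k) features, for example C(10, 6) = 210 with l = 10 and k = 6. Because gaps tolerate mismatches at masked positions, two enhancers can share many gapped k-mers even when their exact k-mers differ, which is one intuition for why such features can help match rapidly evolving regulatory sequence.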
Affiliation(s)
- Jin Woo Oh
  - Department of Biomedical Engineering and McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
- Michael A Beer
  - Department of Biomedical Engineering and McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA

8
Buckley RM, Ostrander EA. Large-scale genomic analysis of the domestic dog informs biological discovery. Genome Res 2024; 34:811-821. [PMID: 38955465 PMCID: PMC11293549 DOI: 10.1101/gr.278569.123] [Indexed: 07/04/2024]
Abstract
Recent advances in genomics, coupled with a unique population structure and remarkable levels of variation, have propelled the domestic dog to new levels as a system for understanding fundamental principles in mammalian biology. Central to this advance are more than 350 recognized breeds, each a closed population that has undergone selection for unique features. Genetic variation in the domestic dog is particularly well characterized compared with other domestic mammals, with almost 3000 high-coverage genomes publicly available. Importantly, as the number of sequenced genomes increases, new avenues for analysis are becoming available. Herein, we discuss recent discoveries in canine genomics regarding behavior, morphology, and disease susceptibility. We explore the limitations of current data sets for variant interpretation, tradeoffs between sequencing strategies, and the burgeoning role of long-read genomes for capturing structural variants. In addition, we consider how large-scale collections of whole-genome sequence data drive rare variant discovery and assess the geographic distribution of canine diversity, which identifies Asia as a major source of missing variation. Finally, we review recent comparative genomic analyses that will facilitate annotation of the noncoding genome in dogs.
Affiliation(s)
- Reuben M Buckley
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Elaine A Ostrander
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA

9
Iñiguez-Muñoz S, Llinàs-Arias P, Ensenyat-Mendez M, Bedoya-López AF, Orozco JIJ, Cortés J, Roy A, Forsberg-Nilsson K, DiNome ML, Marzese DM. Hidden secrets of the cancer genome: unlocking the impact of non-coding mutations in gene regulatory elements. Cell Mol Life Sci 2024; 81:274. [PMID: 38902506 PMCID: PMC11335195 DOI: 10.1007/s00018-024-05314-z] [Received: 07/06/2023] [Revised: 12/07/2023] [Accepted: 06/06/2024] [Indexed: 06/22/2024]
Abstract
Discoveries in the field of genomics have revealed that non-coding genomic regions are not merely "junk DNA", but rather comprise critical elements involved in gene expression. These gene regulatory elements (GREs) include enhancers, insulators, silencers, and gene promoters. Notably, new evidence shows how mutations within these regions substantially influence gene expression programs, especially in the context of cancer. Advances in high-throughput sequencing technologies have accelerated the identification of somatic and germline single nucleotide mutations in non-coding genomic regions. This review provides an overview of somatic and germline non-coding single nucleotide alterations affecting transcription factor binding sites in GREs, specifically involved in cancer biology. It also summarizes the technologies available for exploring GREs and the challenges associated with studying and characterizing non-coding single nucleotide mutations. Understanding the role of GRE alterations in cancer is essential for improving diagnostic and prognostic capabilities in the precision medicine era, leading to enhanced patient-centered clinical outcomes.
Affiliation(s)
- Sandra Iñiguez-Muñoz
  - Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
- Pere Llinàs-Arias
  - Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
- Miquel Ensenyat-Mendez
  - Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
- Andrés F Bedoya-López
  - Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
- Javier I J Orozco
  - Saint John's Cancer Institute, Providence Saint John's Health Center, Santa Monica, CA, USA
- Javier Cortés
  - International Breast Cancer Center (IBCC), Pangaea Oncology, Quiron Group, 08017, Barcelona, Spain
  - Medica Scientia Innovation Research SL (MEDSIR), 08018, Barcelona, Spain
  - Faculty of Biomedical and Health Sciences, Department of Medicine, Universidad Europea de Madrid, 28670, Madrid, Spain
- Ananya Roy
  - Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Karin Forsberg-Nilsson
  - Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
  - University of Nottingham Biodiscovery Institute, Nottingham, UK
- Maggie L DiNome
  - Department of Surgery, Duke University School of Medicine, Durham, NC, USA
- Diego M Marzese
  - Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
  - Department of Surgery, Duke University School of Medicine, Durham, NC, USA

10
Fegraeus K, Rosengren MK, Naboulsi R, Orlando L, Åbrink M, Jouni A, Velie BD, Raine A, Egner B, Mattsson CM, Lång K, Zhigulev A, Björck HM, Franco-Cereceda A, Eriksson P, Andersson G, Sahlén P, Meadows JRS, Lindgren G. An endothelial regulatory module links blood pressure regulation with elite athletic performance. PLoS Genet 2024; 20:e1011285. [PMID: 38885195 PMCID: PMC11182536 DOI: 10.1371/journal.pgen.1011285] [Received: 08/10/2023] [Accepted: 05/02/2024] [Indexed: 06/20/2024]
Abstract
The control of transcription is crucial for homeostasis in mammals. A previous selective sweep analysis of horse racing performance revealed a 19.6 kb candidate regulatory region 50 kb downstream of the Endothelin3 (EDN3) gene. Here, the region was narrowed to a 5.5 kb span of 14 SNVs, with elite and sub-elite haplotypes analyzed for association to racing performance, blood pressure and plasma levels of EDN3 in Coldblooded trotters and Standardbreds. Comparative analysis of human HiCap data identified the span as an enhancer cluster active in endothelial cells, interacting with genes relevant to blood pressure regulation. Coldblooded trotters with the sub-elite haplotype had significantly higher blood pressure compared to horses with the elite performing haplotype during exercise. Alleles within the elite haplotype were part of the standing variation in pre-domestication horses, and have risen in frequency during the era of breed development and selection. These results advance our understanding of the molecular genetics of athletic performance and vascular traits in both horses and humans.
Affiliation(s)
- Kim Fegraeus
  - Department of Medical Sciences, Science for Life Laboratory, Uppsala University, Sweden
- Maria K. Rosengren
  - Department of Animal Biosciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Rakan Naboulsi
  - Department of Animal Biosciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
  - Childhood Cancer Research Unit, Department of Women’s and Children’s Health, Karolinska Institute, Stockholm
- Ludovic Orlando
  - Centre d’Anthropobiologie et de Génomique de Toulouse (CNRS UMR 5288), Université Paul Sabatier, Toulouse, France
- Magnus Åbrink
  - Department of Biomedical Sciences and Veterinary Public Health, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Ahmad Jouni
  - Department of Animal Biosciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Brandon D. Velie
  - School of Life & Environmental Sciences, University of Sydney, Sydney, Australia
- Amanda Raine
  - Department of Medical Sciences, Science for Life Laboratory, Uppsala University, Sweden
- Beate Egner
  - Department of Cardio-Vascular Research, Veterinary Academy of Higher Learning, Babenhausen, Germany
- C Mikael Mattsson
  - Silicon Valley Exercise Analytics (svexa), Menlo Park, CA, United States of America
- Karin Lång
  - Division of Cardiovascular Medicine, Center for Molecular Medicine, Department of Medicine, Karolinska Institutet, Stockholm, Karolinska University Hospital, Solna, Sweden
- Artemy Zhigulev
  - KTH Royal Institute of Technology, School of Chemistry, Biotechnology and Health, Science for Life Laboratory, Stockholm, Sweden
- Hanna M. Björck
  - Division of Cardiovascular Medicine, Center for Molecular Medicine, Department of Medicine, Karolinska Institutet, Stockholm, Karolinska University Hospital, Solna, Sweden
- Anders Franco-Cereceda
  - Section of Cardiothoracic Surgery, Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Per Eriksson
  - Division of Cardiovascular Medicine, Center for Molecular Medicine, Department of Medicine, Karolinska Institutet, Stockholm, Karolinska University Hospital, Solna, Sweden
- Göran Andersson
  - Department of Animal Biosciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Pelin Sahlén
  - KTH Royal Institute of Technology, School of Chemistry, Biotechnology and Health, Science for Life Laboratory, Stockholm, Sweden
- Jennifer R. S. Meadows
  - Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Gabriella Lindgren
  - Department of Animal Biosciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
  - Center for Animal Breeding and Genetics, Department of Biosystems, KU Leuven, Leuven, Belgium

11
Pratt HE, Andrews G, Shedd N, Phalke N, Li T, Pampari A, Jensen M, Wen C, PsychENCODE Consortium, Gandal MJ, Geschwind DH, Gerstein M, Moore J, Kundaje A, Colubri A, Weng Z. Using a comprehensive atlas and predictive models to reveal the complexity and evolution of brain-active regulatory elements. Sci Adv 2024; 10:eadj4452. [PMID: 38781344 PMCID: PMC11114231 DOI: 10.1126/sciadv.adj4452] [Received: 07/03/2023] [Accepted: 04/25/2024] [Indexed: 05/25/2024]
Abstract
Most genetic variants associated with psychiatric disorders are located in noncoding regions of the genome. To investigate their functional implications, we integrate epigenetic data from the PsychENCODE Consortium and other published sources to construct a comprehensive atlas of candidate brain cis-regulatory elements. Using deep learning, we model these elements' sequence syntax and predict how binding sites for lineage-specific transcription factors contribute to cell type-specific gene regulation in various types of glia and neurons. The elements' evolutionary history suggests that new regulatory information in the brain emerges primarily via smaller sequence mutations within conserved mammalian elements rather than entirely new human- or primate-specific sequences. However, primate-specific candidate elements, particularly those active during fetal brain development and in excitatory neurons and astrocytes, are implicated in the heritability of brain-related human traits. Additionally, we introduce PsychSCREEN, a web-based platform offering interactive visualization of PsychENCODE-generated genetic and epigenetic data from diverse brain cell types in individuals with psychiatric disorders and healthy controls.
Collapse
Affiliation(s)
- Henry E. Pratt
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Gregory Andrews
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Nicole Shedd
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Nishigandha Phalke
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Tongxin Li
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Khoury College of Computer Science, Northeastern University, Boston, MA 02115, USA
| | - Anusri Pampari
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Matthew Jensen
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Cindy Wen
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | | | - Michael J. Gandal
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lifespan Brain Institute, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Daniel H. Geschwind
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Institute of Precision Health, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, New Haven, CT 06520, USA
- Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
- Jill Moore
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Anshul Kundaje
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Andrés Colubri
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Zhiping Weng
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
12
Parker HG, Harris AC, Plassais J, Dhawan D, Kim EM, Knapp DW, Ostrander EA. Genome-wide analyses reveals an association between invasive urothelial carcinoma in the Shetland sheepdog and NIPAL1. NPJ Precis Oncol 2024; 8:112. [PMID: 38778091 PMCID: PMC11111773 DOI: 10.1038/s41698-024-00591-0] [Received: 08/29/2023] [Accepted: 04/14/2024] [Indexed: 05/25/2024]
Abstract
Naturally occurring canine invasive urinary carcinoma (iUC) closely resembles human muscle-invasive bladder cancer in histopathology, metastases, response to therapy, and low survival rate. The heterogeneous nature of the disease has led to the association of large numbers of risk loci in humans; however, most are of small effect. New and accurate animal models of invasive bladder cancer are needed. In dogs, distinct breeds show markedly different rates of iUC, presenting an opportunity to identify additional risk factors and to overcome the locus heterogeneity encountered in human mapping studies. In the association study presented here, which includes 100 Shetland sheepdogs and 58 dogs of other breeds, we identify a homozygous protein-altering point mutation within the NIPAL1 gene that increases risk eight-fold (OR = 8.42, CI = 3.12-22.71), accounting for nearly 30% of iUC risk in the Shetland sheepdog. Including six additional loci accounts for most of the disease risk in the breed and explains nearly 75% of the phenotypes in this study. Combining these results with sequence data from tumors, we show that variation in the MAPK signaling pathway is an overarching cause of iUC susceptibility in dogs.
Affiliation(s)
- Heidi G Parker
- Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Center, National Institutes of Health, Bethesda, MD, USA
- Alexander C Harris
- Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Center, National Institutes of Health, Bethesda, MD, USA
- Jocelyn Plassais
- Institut de Génétique et Développement de Rennes, CNRS-UMR6290, University of Rennes, 35000, Rennes, France
- Deepika Dhawan
- Department of Veterinary Clinical Sciences, College of Veterinary Medicine, Purdue University, West Lafayette, IN, USA
- Erika M Kim
- Center for Biomedical Informatics & Information Technology, National Cancer Institute, National Institutes of Health, Frederick, MD, USA
- Deborah W Knapp
- Department of Veterinary Clinical Sciences, College of Veterinary Medicine, Purdue University, West Lafayette, IN, USA
- Purdue University Center for Cancer Research, West Lafayette, IN, USA
- Elaine A Ostrander
- Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Center, National Institutes of Health, Bethesda, MD, USA.
13
Siraj L, Castro RI, Dewey H, Kales S, Nguyen TTL, Kanai M, Berenzy D, Mouri K, Wang QS, McCaw ZR, Gosai SJ, Aguet F, Cui R, Vockley CM, Lareau CA, Okada Y, Gusev A, Jones TR, Lander ES, Sabeti PC, Finucane HK, Reilly SK, Ulirsch JC, Tewhey R. Functional dissection of complex and molecular trait variants at single nucleotide resolution. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.05.592437. [PMID: 38766054 PMCID: PMC11100724 DOI: 10.1101/2024.05.05.592437] [Indexed: 05/22/2024]
Abstract
Identifying the causal variants and mechanisms that drive complex traits and diseases remains a core problem in human genetics. Most of these variants have individually weak effects and lie in non-coding gene-regulatory elements, where we lack a complete understanding of how single-nucleotide alterations modulate transcriptional processes to affect human phenotypes. To address this, we used a Massively Parallel Reporter Assay (MPRA) to measure the activity of 221,412 statistically fine-mapped trait-associated variants in 5 diverse cell types. We show that MPRA can discriminate between likely causal variants and controls, identifying 12,025 regulatory variants with high precision. Although the effects of these variants largely agree with orthogonal measures of function, only 69% can plausibly be explained by the disruption of a known transcription factor (TF) binding motif. We dissect the mechanisms of 136 variants using saturation mutagenesis and assign impacted TFs for 91% of variants without a clear canonical mechanism. Finally, we provide evidence that epistasis is prevalent for variants in close proximity and identify multiple functional variants on the same haplotype at a small but important subset of trait-associated loci. Overall, our study provides a systematic functional characterization of likely causal common variants underlying complex and molecular human traits, enabling new insights into the regulatory grammar underlying disease risk.
Affiliation(s)
- Layla Siraj
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Program in Biophysics, Harvard Graduate School of Arts and Sciences, Boston, MA, USA
- Harvard-Massachusetts Institute of Technology MD/PhD Program, Harvard Medical School, Boston, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA USA
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, MA, USA
- Qingbo S. Wang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo, Japan
- Sager J. Gosai
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- François Aguet
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Ran Cui
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Caleb A. Lareau
- Program in Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
- Alexander Gusev
- Harvard Medical School and Dana-Farber Cancer Institute, Boston, MA, USA
- Thouis R. Jones
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Eric S. Lander
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biology, MIT, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Pardis C. Sabeti
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Hilary K. Finucane
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA USA
- Steven K. Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Wu Tsai Institute, Yale University, New Haven, CT, USA
- Jacob C. Ulirsch
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Ryan Tewhey
- The Jackson Laboratory, Bar Harbor, ME, USA
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
- Graduate School of Biomedical Sciences, Tufts University School of Medicine, Boston, MA, USA
14
Zeng T, Spence JP, Mostafavi H, Pritchard JK. Bayesian estimation of gene constraint from an evolutionary model with gene features. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.05.19.541520. [PMID: 37292653 PMCID: PMC10245655 DOI: 10.1101/2023.05.19.541520] [Indexed: 06/10/2023]
Abstract
Measures of selective constraint on genes have been used for many applications, including clinical interpretation of rare coding variants, disease gene discovery, and studies of genome evolution. However, widely used metrics are severely underpowered at detecting constraint for the shortest ∼25% of genes, potentially causing important pathogenic mutations to be overlooked. We developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease, and other phenotypes, especially for short genes. Our new estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve estimation of many gene-level properties, such as rare variant burden or gene expression differences.
Affiliation(s)
- Tony Zeng
- Department of Genetics, Stanford University, Stanford CA
- Jonathan K. Pritchard
- Department of Genetics, Stanford University, Stanford CA
- Department of Biology, Stanford University, Stanford CA
15
Benegas G, Albors C, Aw AJ, Ye C, Song YS. GPN-MSA: an alignment-based DNA language model for genome-wide variant effect prediction. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.10.10.561776. [PMID: 37873118 PMCID: PMC10592768 DOI: 10.1101/2023.10.10.561776] [Indexed: 10/25/2023]
Abstract
Whereas protein language models have demonstrated remarkable efficacy in predicting the effects of missense variants, DNA counterparts have not yet achieved a similar competitive edge for genome-wide variant effect predictions, especially in complex genomes such as that of humans. To address this challenge, we here introduce GPN-MSA, a novel framework for DNA language models that leverages whole-genome sequence alignments across multiple species and takes only a few hours to train. Across several benchmarks on clinical databases (ClinVar, COSMIC, OMIM), experimental functional assays (DMS, DepMap), and population genomic data (gnomAD), our model for the human genome achieves outstanding performance on deleteriousness prediction for both coding and non-coding variants.
Affiliation(s)
- Gonzalo Benegas
- Graduate Group in Computational Biology, University of California, Berkeley
- Carlos Albors
- Computer Science Division, University of California, Berkeley
- Alan J. Aw
- Department of Statistics, University of California, Berkeley
- Chengzhong Ye
- Department of Statistics, University of California, Berkeley
- Yun S. Song
- Computer Science Division, University of California, Berkeley
- Department of Statistics, University of California, Berkeley
- Center for Computational Biology, University of California, Berkeley
16
Rivas-González I, Tung J. A multi-million-year natural experiment: Comparative genomics on a massive scale and its implications for human health. Evol Med Public Health 2024; 12:67-70. [PMID: 38601345 PMCID: PMC11005778 DOI: 10.1093/emph/eoae006] [Received: 01/10/2024] [Revised: 03/18/2024] [Indexed: 04/12/2024]
Abstract
Improving the diversity and quality of genome assemblies for non-human mammals has been a long-standing goal of comparative genomics. The last year saw substantial progress towards this goal, including the release of genome alignments for 240 mammals and nearly half the primate order. These resources have increased our ability to identify evolutionarily constrained regions of the genome, and together strongly support the importance of these regions to biomedically relevant trait variation in humans. They also provide new strategies for identifying the genetic basis of changes unique to individual lineages, illustrating the value of evolutionary comparative approaches for understanding human health.
Affiliation(s)
- Iker Rivas-González
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Jenny Tung
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Department of Evolutionary Anthropology, Duke University, Durham, NC, USA
- Department of Biology, Duke University, Durham, NC, USA
- Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
17
Wirthlin ME, Schmid TA, Elie JE, Zhang X, Kowalczyk A, Redlich R, Shvareva VA, Rakuljic A, Ji MB, Bhat NS, Kaplow IM, Schäffer DE, Lawler AJ, Wang AZ, Phan BN, Annaldasula S, Brown AR, Lu T, Lim BK, Azim E, Clark NL, Meyer WK, Pond SLK, Chikina M, Yartsev MM, Pfenning AR. Vocal learning-associated convergent evolution in mammalian proteins and regulatory elements. Science 2024; 383:eabn3263. [PMID: 38422184 PMCID: PMC11313673 DOI: 10.1126/science.abn3263] [Received: 11/18/2021] [Accepted: 02/20/2024] [Indexed: 03/02/2024]
Abstract
Vocal production learning ("vocal learning") is a convergently evolved trait in vertebrates. To identify brain genomic elements associated with mammalian vocal learning, we integrated genomic, anatomical, and neurophysiological data from the Egyptian fruit bat (Rousettus aegyptiacus) with analyses of the genomes of 215 placental mammals. First, we identified a set of proteins evolving more slowly in vocal learners. Then, we discovered a vocal motor cortical region in the Egyptian fruit bat, an emergent vocal learner, and leveraged that knowledge to identify active cis-regulatory elements in the motor cortex of vocal learners. Machine learning methods applied to motor cortex open chromatin revealed 50 enhancers robustly associated with vocal learning whose activity tended to be lower in vocal learners. Our research implicates convergent losses of motor cortex regulatory elements in mammalian vocal learning evolution.
Affiliation(s)
- Morgan E. Wirthlin
- Department of Computational Biology, Carnegie Mellon University; Pittsburgh, PA 15213, USA
- Present address: Department of Biomedical Engineering, Duke University; Durham, NC 27705
- Tobias A. Schmid
- Helen Wills Neuroscience Institute, University of California, Berkeley; Berkeley, CA 94708, USA
- Julie E. Elie
- Helen Wills Neuroscience Institute, University of California, Berkeley; Berkeley, CA 94708, USA
- Department of Bioengineering, University of California, Berkeley; Berkeley, CA 94708, USA
- Xiaomeng Zhang
- Department of Computational Biology, Carnegie Mellon University; Pittsburgh, PA 15213, USA
- Amanda Kowalczyk
- Department of Computational Biology, Carnegie Mellon University; Pittsburgh, PA 15213, USA
- Present address: Department of Biomedical Engineering, Duke University; Durham, NC 27705
- Ruby Redlich
- Department of Computational Biology, Carnegie Mellon University; Pittsburgh, PA 15213, USA
- Varvara A. Shvareva
- Department of Molecular and Cell Biology, University of California, Berkeley; Berkeley, CA 94708, USA
- Ashley Rakuljic
- Department of Molecular and Cell Biology, University of California, Berkeley; Berkeley, CA 94708, USA
- Maria B. Ji
- Department of Psychology, University of California, Berkeley; Berkeley, CA 94708, USA
- Ninad S. Bhat
- Department of Molecular and Cell Biology, University of California, Berkeley; Berkeley, CA 94708, USA
- Irene M. Kaplow
- Department of Computational Biology, Carnegie Mellon University; Pittsburgh, PA 15213, USA
- Present address: Department of Biomedical Engineering, Duke University; Durham, NC 27705
- Daniel E. Schäffer
- Department of Computational Biology, Carnegie Mellon University; Pittsburgh, PA 15213, USA
- Alyssa J. Lawler
- Present address: Department of Biomedical Engineering, Duke University; Durham, NC 27705
- Department of Biological Sciences, Carnegie Mellon University; Pittsburgh, PA 15213, USA
- Andrew Z. Wang
- Department of Computational Biology, Carnegie Mellon University; Pittsburgh, PA 15213, USA
- BaDoi N. Phan
- Department of Computational Biology, Carnegie Mellon University; Pittsburgh, PA 15213, USA
- Present address: Department of Biomedical Engineering, Duke University; Durham, NC 27705
- Siddharth Annaldasula
- Department of Computational Biology, Carnegie Mellon University; Pittsburgh, PA 15213, USA
- Ashley R. Brown
- Department of Computational Biology, Carnegie Mellon University; Pittsburgh, PA 15213, USA
- Present address: Department of Biomedical Engineering, Duke University; Durham, NC 27705
- Tianyu Lu
- Department of Computational Biology, Carnegie Mellon University; Pittsburgh, PA 15213, USA
- Byung Kook Lim
- Neurobiology section, Division of Biological Science, University of California, San Diego; La Jolla, CA 92093, USA
- Eiman Azim
- Molecular Neurobiology Laboratory, Salk Institute for Biological Studies; La Jolla, CA 92037, USA
- Nathan L. Clark
- Department of Biological Sciences, University of Pittsburgh; Pittsburgh, PA 15213, USA
- Wynn K. Meyer
- Department of Biological Sciences, Lehigh University; Bethlehem, PA 18015, USA
- Maria Chikina
- Department of Computational and Systems Biology, University of Pittsburgh; Pittsburgh, PA 15213, USA
- Michael M. Yartsev
- Helen Wills Neuroscience Institute, University of California, Berkeley; Berkeley, CA 94708, USA
- Department of Bioengineering, University of California, Berkeley; Berkeley, CA 94708, USA
- Andreas R. Pfenning
- Department of Computational Biology, Carnegie Mellon University; Pittsburgh, PA 15213, USA
18
Fleck K, Luria V, Garag N, Karger A, Hunter T, Marten D, Phu W, Nam KM, Sestan N, O’Donnell-Luria AH, Erceg J. Functional associations of evolutionarily recent human genes exhibit sensitivity to the 3D genome landscape and disease. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.17.585403. [PMID: 38559085 PMCID: PMC10980080 DOI: 10.1101/2024.03.17.585403] [Indexed: 04/04/2024]
Abstract
Genome organization is intricately tied to gene regulation and associated cell fate decisions. In this study, we examine the positioning and functional significance of human genes, grouped by their evolutionary age, within the 3D organization of the genome. We reveal that genes of different evolutionary origin have distinct positioning relationships with both domains and loop anchors, and remarkably consistent relationships with boundaries across cell types. While the functional associations of each group of genes are primarily cell type-specific, the associations of conserved genes remain more stable across 3D genomic features and disease than those of recently evolved genes. Furthermore, the expression of these genes across various tissues follows an evolutionary progression, such that RNA levels increase from young genes to ancient genes. Thus, the distinct relationships of gene evolutionary age, function, and positioning within 3D genomic features contribute to tissue-specific gene regulation in development and disease.
Affiliation(s)
- Katherine Fleck
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269
- Victor Luria
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115
- Nitanta Garag
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269
- Amir Karger
- IT-Research Computing, Harvard Medical School, Boston, MA 02115
- Trevor Hunter
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269
- Daniel Marten
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142
- William Phu
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142
- Kee-Myoung Nam
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06510
- Nenad Sestan
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510
- Anne H. O’Donnell-Luria
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142
- Department of Pediatrics, Harvard Medical School, Boston, MA 02115
- Jelena Erceg
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06030
19
Schraiber JG, Edge MD, Pennell M. Unifying approaches from statistical genetics and phylogenetics for mapping phenotypes in structured populations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.10.579721. [PMID: 38496530 PMCID: PMC10942266 DOI: 10.1101/2024.02.10.579721] [Indexed: 03/19/2024]
Abstract
In both statistical genetics and phylogenetics, a major goal is to identify correlations between a focal trait and genetic loci or other aspects of the phenotype or environment. In these two fields, there are sophisticated but disparate statistical traditions aimed at these tasks. The disconnect between their respective approaches is becoming untenable as questions in medicine, conservation biology, and evolutionary biology increasingly rely on integrating data from within and among species, and once-clear conceptual divisions are becoming increasingly blurred. To help bridge this divide, we derive a general model describing the covariance between the genetic contributions to the quantitative phenotypes of different individuals. Taking this approach shows that standard models in both statistical genetics (e.g., Genome-Wide Association Studies; GWAS) and phylogenetic comparative biology (e.g., phylogenetic regression) can be interpreted as special cases of this more general quantitative-genetic model. The fact that these models share the same core architecture means that we can build a unified understanding of the strengths and limitations of different methods for controlling for genetic structure when testing for associations. We develop intuition for why and when spurious correlations may occur using analytical theory and conduct population-genetic and phylogenetic simulations of quantitative traits. The structural similarity of problems in statistical genetics and phylogenetics enables us to take methodological advances from one field and apply them in the other. We demonstrate this by showing how a standard GWAS technique, including both the genetic relatedness matrix (GRM) and its leading eigenvectors (corresponding to the principal components of the genotype matrix) in a regression model, can mitigate spurious correlations in phylogenetic analyses.
As a case study of this, we re-examine an analysis testing for co-evolution of expression levels between genes across a fungal phylogeny, and show that including covariance matrix eigenvectors as covariates decreases the false positive rate while simultaneously increasing the true positive rate. More generally, this work provides a foundation for more integrative approaches for understanding the genetic architecture of phenotypes and how evolutionary processes shape it.
20
Sakamoto F, Kanamori S, Díaz LM, Cádiz A, Ishii Y, Yamaguchi K, Shigenobu S, Nakayama T, Makino T, Kawata M. Detection of evolutionary conserved and accelerated genomic regions related to adaptation to thermal niches in Anolis lizards. Ecol Evol 2024; 14:e11117. [PMID: 38455144 PMCID: PMC10920033 DOI: 10.1002/ece3.11117] [Received: 06/13/2023] [Revised: 02/18/2024] [Accepted: 02/22/2024] [Indexed: 03/09/2024]
Abstract
Understanding the genetic basis of adaptation to thermal environments is important because of the serious effects of global warming on ectothermic species. Various genes associated with thermal adaptation in lizards have been identified, mainly by focusing on changes in gene expression or by detecting positively selected genes in coding regions. Only a few comprehensive genome-wide analyses have included noncoding regions. This study aimed to identify evolutionarily conserved and accelerated genomic regions using the whole genomes of eight Anolis lizard species that have repeatedly adapted to similar thermal environments in multiple lineages. Evolutionarily conserved genomic regions were extracted as regions with overall sequence conservation (fewer base substitutions than expected under the neutral model) across all lineages. Genomic regions that underwent accelerated evolution in the lineage of interest were identified as those with more base substitutions in the target branch than in the entire background branch. Among noncoding regions, conserved elements across all branches were relatively abundant in intergenic regions. Accelerated regions (ARs) of each lineage contained a significantly greater proportion of noncoding RNA genes than the entire multiple alignment. Genes with ARs within 5 kb that were shared among lineages from similar thermal habitats were identified. Many genes associated with circadian rhythms and behavior were found in the hot-open and cool-shaded habitat lineages. These genes might contribute to thermal adaptation and could guide future genome-editing studies of the function of genes involved in thermal adaptation.
Affiliation(s)
- Fuku Sakamoto
- Graduate School of Life Sciences, Tohoku University, Sendai, Japan
- Luis M. Díaz
- National Museum of Natural History of Cuba, Havana, Cuba
- Antonio Cádiz
- Faculty of Biology, University of Havana, Havana, Cuba
- Present address: Department of Biology, University of Miami, Coral Gables, Florida, USA
- Yuu Ishii
- Graduate School of Life Sciences, Tohoku University, Sendai, Japan
- Shuji Shigenobu
- Trans-Omics Facility, National Institute for Basic Biology, Okazaki, Japan
- Department of Basic Biology, School of Life Science, The Graduate University for Advanced Studies, SOKENDAI, Okazaki, Japan
- Takuro Nakayama
- Division of Life Sciences, Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan
- Takashi Makino
- Graduate School of Life Sciences, Tohoku University, Sendai, Japan
- Masakado Kawata
- Graduate School of Life Sciences, Tohoku University, Sendai, Japan
21
Xu H, Zhang S, Duan Q, Lou M, Ling Y. Comprehensive analyses of 435 goat transcriptomes provides insight into male reproduction. Int J Biol Macromol 2024; 255:127942. [PMID: 37979751 DOI: 10.1016/j.ijbiomac.2023.127942] [Received: 08/02/2023] [Revised: 09/25/2023] [Accepted: 11/01/2023] [Indexed: 11/20/2023]
Abstract
A systematic analysis of reproduction-related genes is crucial for a comprehensive understanding of the molecular mechanisms underlying male reproductive traits in mammals. Here, we used 435 goat transcriptome datasets to identify testicular tissue-specific genes (TSGs), allele-specific expression (ASE) genes, and their previously uncharacterized transcriptional features related to male goat reproduction. In total, 1790 TSGs were identified in goat testis, the largest number among all tissues. GO enrichment analyses suggested that testicular TSGs were mainly involved in spermatogenesis, multicellular organism development, spermatid development, and flagellated sperm motility. Subsequently, 95 highly conserved TSGs (HCTSGs), 508 moderately conserved TSGs (MCTSGs), and 42 non-conserved TSGs (NCTSGs) were identified in goat testis. GO enrichment analyses suggested that HCTSGs and MCTSGs are more strongly associated with male reproduction than NCTSGs. Additionally, we identified 644 ASE genes, including 88 tissue-specific ASE (TS-ASE) genes (e.g., FSIP2, TDRD9). GO enrichment analyses indicated that both ASE genes and TS-ASE genes were associated with male goat reproduction. Overall, this study revealed an extensive gene set involved in regulating male goat reproduction, together with its dynamic transcription patterns. These data provide valuable insights for further improving the economic benefits of goats and for developing future treatments for male infertility.
Affiliation(s)
- Han Xu
- College of Animal Science and Technology, Anhui Agricultural University, Hefei 230036, Anhui, China
- Sihuan Zhang
- College of Animal Science and Technology, Anhui Agricultural University, Hefei 230036, Anhui, China
- Qin Duan
- College of Animal Science and Technology, Anhui Agricultural University, Hefei 230036, Anhui, China
- Mengyu Lou
- College of Animal Science and Technology, Anhui Agricultural University, Hefei 230036, Anhui, China
- Yinghui Ling
- College of Animal Science and Technology, Anhui Agricultural University, Hefei 230036, Anhui, China; Anhui Province Key Laboratory of Local Livestock and Poultry Genetic Resource Conservation and Bio-Breeding, Anhui Agricultural University, Hefei 230036, Anhui, China.
22
Kuderna LFK, Ulirsch JC, Rashid S, Ameen M, Sundaram L, Hickey G, Cox AJ, Gao H, Kumar A, Aguet F, Christmas MJ, Clawson H, Haeussler M, Janiak MC, Kuhlwilm M, Orkin JD, Bataillon T, Manu S, Valenzuela A, Bergman J, Rouselle M, Silva FE, Agueda L, Blanc J, Gut M, de Vries D, Goodhead I, Harris RA, Raveendran M, Jensen A, Chuma IS, Horvath JE, Hvilsom C, Juan D, Frandsen P, Schraiber JG, de Melo FR, Bertuol F, Byrne H, Sampaio I, Farias I, Valsecchi J, Messias M, da Silva MNF, Trivedi M, Rossi R, Hrbek T, Andriaholinirina N, Rabarivola CJ, Zaramody A, Jolly CJ, Phillips-Conroy J, Wilkerson G, Abee C, Simmons JH, Fernandez-Duque E, Kanthaswamy S, Shiferaw F, Wu D, Zhou L, Shao Y, Zhang G, Keyyu JD, Knauf S, Le MD, Lizano E, Merker S, Navarro A, Nadler T, Khor CC, Lee J, Tan P, Lim WK, Kitchener AC, Zinner D, Gut I, Melin AD, Guschanski K, Schierup MH, Beck RMD, Karakikes I, Wang KC, Umapathy G, Roos C, Boubli JP, Siepel A, Kundaje A, Paten B, Lindblad-Toh K, Rogers J, Marques Bonet T, Farh KKH. Identification of constrained sequence elements across 239 primate genomes. Nature 2024; 625:735-742. [PMID: 38030727 PMCID: PMC10808062 DOI: 10.1038/s41586-023-06798-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Accepted: 10/30/2023] [Indexed: 12/01/2023]
Abstract
Noncoding DNA is central to our understanding of human gene regulation and complex diseases [1,2], and measuring evolutionary sequence constraint can establish the functional relevance of putative regulatory elements in the human genome [3-9]. Identifying the genomic elements that have become constrained specifically in primates has been hampered by the faster evolution of noncoding DNA compared with protein-coding DNA [10], the relatively short timescales separating primate species [11], and the previously limited availability of whole-genome sequences [12]. Here we construct a whole-genome alignment of 239 species, representing nearly half of all extant species in the primate order. Using this resource, we identified human regulatory elements that are under selective constraint across primates and other mammals at a 5% false discovery rate. We detected 111,318 DNase I hypersensitivity sites and 267,410 transcription factor binding sites that are constrained specifically in primates but not across other placental mammals, and we validated their cis-regulatory effects on gene expression. These regulatory elements are enriched for human genetic variants that affect gene expression and complex traits and diseases. Our results highlight the important role of recent evolution in regulatory sequence elements differentiating primates, including humans, from other placental mammals.
Affiliation(s)
- Lukas F K Kuderna
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Jacob C Ulirsch
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Sabrina Rashid
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Mohamed Ameen
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Laksshman Sundaram
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Anthony J Cox
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Hong Gao
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Arvind Kumar
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Francois Aguet
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Matthew J Christmas
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Hiram Clawson
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Mareike C Janiak
- School of Science, Engineering and Environment, University of Salford, Salford, UK
- Martin Kuhlwilm
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
- Joseph D Orkin
- Département d'Anthropologie, Université de Montréal, Montréal, Quebec, Canada
- Thomas Bataillon
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Shivakumara Manu
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India
- Alejandro Valenzuela
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Juraj Bergman
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Section for Ecoinformatics and Biodiversity, Department of Biology, Aarhus University, Aarhus, Denmark
- Felipe Ennes Silva
- Research Group on Primate Biology and Conservation, Mamirauá Institute for Sustainable Development, Tefé, Brazil
- Evolutionary Biology and Ecology (EBE), Département de Biologie des Organismes, Université libre de Bruxelles (ULB), Brussels, Belgium
- Lidia Agueda
- Centro Nacional de Analisis Genomico (CNAG), Barcelona, Spain
- Julie Blanc
- Centro Nacional de Analisis Genomico (CNAG), Barcelona, Spain
- Marta Gut
- Centro Nacional de Analisis Genomico (CNAG), Barcelona, Spain
- Dorien de Vries
- School of Science, Engineering and Environment, University of Salford, Salford, UK
- Ian Goodhead
- School of Science, Engineering and Environment, University of Salford, Salford, UK
- R Alan Harris
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Muthuswamy Raveendran
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Axel Jensen
- Department of Ecology and Genetics, Animal Ecology, Uppsala University, Uppsala, Sweden
- Julie E Horvath
- North Carolina Museum of Natural Sciences, Raleigh, NC, USA
- Department of Biological and Biomedical Sciences, North Carolina Central University, Durham, NC, USA
- Department of Biological Sciences, North Carolina State University, Raleigh, NC, USA
- Department of Evolutionary Anthropology, Duke University, Durham, NC, USA
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- David Juan
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Joshua G Schraiber
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Fabrício Bertuol
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Brazil
- Hazel Byrne
- Department of Anthropology, University of Utah, Salt Lake City, UT, USA
- Izeni Farias
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Brazil
- João Valsecchi
- Research Group on Terrestrial Vertebrate Ecology, Mamirauá Institute for Sustainable Development, Tefé, Brazil
- Rede de Pesquisa em Diversidade, Conservação e Uso da Fauna da Amazônia - RedeFauna, Manaus, Brazil
- Comunidad de Manejo de Fauna Silvestre en la Amazonía y en Latinoamérica-ComFauna, Iquitos, Peru
- Malu Messias
- Universidade Federal de Rondônia, Porto Velho, Brazil
- Mihir Trivedi
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India
- Rogerio Rossi
- Instituto de Biociências, Universidade Federal do Mato Grosso, Cuiabá, Brazil
- Tomas Hrbek
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Brazil
- Department of Biology, Trinity University, San Antonio, TX, USA
- Nicole Andriaholinirina
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, Madagascar
- Clément J Rabarivola
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, Madagascar
- Alphonse Zaramody
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, Madagascar
- Clifford J Jolly
- Department of Anthropology, New York University, New York, NY, USA
- Jane Phillips-Conroy
- Department of Neuroscience, Washington University School of Medicine in St Louis, St Louis, MO, USA
- Gregory Wilkerson
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Bastrop, TX, USA
- Christian Abee
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Bastrop, TX, USA
- Joe H Simmons
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Bastrop, TX, USA
- Sree Kanthaswamy
- School of Interdisciplinary Forensics, Arizona State University, Phoenix, AZ, USA
- California National Primate Research Center, University of California, Davis, CA, USA
- Fekadu Shiferaw
- Guinea Worm Eradication Program, The Carter Center Ethiopia, Addis Ababa, Ethiopia
- Dongdong Wu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Long Zhou
- Center for Evolutionary and Organismal Biology, Zhejiang University School of Medicine, Hangzhou, China
- Yong Shao
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Guojie Zhang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Center for Evolutionary and Organismal Biology, Zhejiang University School of Medicine, Hangzhou, China
- Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, China
- Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Julius D Keyyu
- Tanzania Wildlife Research Institute (TAWIRI), Arusha, Tanzania
- Sascha Knauf
- Institute of International Animal Health/One Health, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Greifswald-Insel Riems, Germany
- Professorship for International Animal Health/One Health, Faculty of Veterinary Medicine, Justus Liebig University, Giessen, Germany
- Minh D Le
- Department of Environmental Ecology, Faculty of Environmental Sciences, University of Science and Central Institute for Natural Resources and Environmental Studies, Vietnam National University, Hanoi, Vietnam
- Esther Lizano
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain
- Stefan Merker
- Department of Zoology, State Museum of Natural History Stuttgart, Stuttgart, Germany
- Arcadi Navarro
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Barcelonaβeta Brain Research Center, Pasqual Maragall Foundation, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Tilo Nadler
- Cuc Phuong Commune, Nho Quan District, Vietnam
- Chiea Chuen Khor
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Patrick Tan
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- SingHealth Duke-NUS Institute of Precision Medicine (PRISM), Singapore, Singapore
- Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore, Singapore
- Weng Khong Lim
- SingHealth Duke-NUS Institute of Precision Medicine (PRISM), Singapore, Singapore
- Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore, Singapore
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
- Andrew C Kitchener
- Department of Natural Sciences, National Museums Scotland, Edinburgh, UK
- School of Geosciences, Edinburgh, UK
- Dietmar Zinner
- Cognitive Ethology Laboratory, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany
- Department of Primate Cognition, Georg-August-Universität Göttingen, Göttingen, Germany
- Leibniz ScienceCampus Primate Cognition, Göttingen, Germany
- Ivo Gut
- Centro Nacional de Analisis Genomico (CNAG), Barcelona, Spain
- Amanda D Melin
- Department of Anthropology and Archaeology, University of Calgary, Calgary, Alberta, Canada
- Department of Medical Genetics, University of Calgary, Calgary, Alberta, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Alberta, Canada
- Katerina Guschanski
- Department of Ecology and Genetics, Animal Ecology, Uppsala University, Uppsala, Sweden
- Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Robin M D Beck
- School of Science, Engineering and Environment, University of Salford, Salford, UK
- Ioannis Karakikes
- Cardiovascular Institute, Stanford University, Stanford, CA, USA
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, USA
- Kevin C Wang
- Department of Cancer Biology, Stanford University, Stanford, CA, USA
- Department of Dermatology, Stanford University School of Medicine, Stanford, CA, USA
- Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA, USA
- Govindhaswamy Umapathy
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India
- Christian Roos
- Gene Bank of Primates and Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany
- Jean P Boubli
- School of Science, Engineering and Environment, University of Salford, Salford, UK
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Kerstin Lindblad-Toh
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jeffrey Rogers
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Tomas Marques Bonet
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Centro Nacional de Analisis Genomico (CNAG), Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Kyle Kai-How Farh
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
23
Estabrooks T, Gurinovich A, Pietruska J, Lewis B, Harvey G, Post G, Lambert L, Miller A, Rodrigues L, White ME, Lopes C, London CA, Megquier K. Identification of genomic alterations with clinical impact in canine splenic hemangiosarcoma. Vet Comp Oncol 2023; 21:623-633. [PMID: 37734854 DOI: 10.1111/vco.12925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 07/11/2023] [Accepted: 07/12/2023] [Indexed: 09/23/2023]
Abstract
Canine hemangiosarcoma (HSA) is an aggressive cancer of endothelial cells with short survival times. Understanding the genomic landscape of HSA may aid in developing therapeutic strategies for dogs and may also inform therapies for the rare and aggressive human cancer angiosarcoma. The objectives of this study were to build a framework for leveraging real-world genomic and clinical data that could provide the foundation for precision medicine in veterinary oncology, and to determine the relationships between genomic and clinical features in canine splenic HSA. One hundred and nine dogs with primary splenic HSA that were treated by splenectomy and had tumour sequencing via the FidoCure® Precision Medicine Platform targeted sequencing panel were enrolled. Patient signalment, weight, metastasis at diagnosis and overall survival time were retrospectively evaluated. The incidence of genomic alterations in individual genes and their relationship to patient variables, including outcome, were assessed. Somatic mutations in TP53 (n = 44), NRAS (n = 20) and PIK3CA (n = 19) were most common. Survival was associated with the presence of metastases at diagnosis and with germline variants in SETD2 and NOTCH1. Age at diagnosis was associated with somatic NRAS mutations and breed. TP53 and PIK3CA somatic mutations were found in larger dogs, while germline SETD2 variants were found in smaller dogs. We identified both somatic mutations and germline variants associated with clinical variables including age, breed and overall survival. These genetic changes may be useful prognostic factors and provide insight into the genomic landscape of hemangiosarcoma.
Affiliation(s)
- Timothy Estabrooks
- Cummings School of Veterinary Medicine, Tufts University, North Grafton, Massachusetts, USA
- Jodie Pietruska
- MassBio, UMass Chan Medical School, Worcester, Massachusetts, USA
- Gerald Post
- One Health Company, Palo Alto, California, USA
- Cheryl A London
- Cummings School of Veterinary Medicine, Tufts University, North Grafton, Massachusetts, USA
- Kate Megquier
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
24
Benegas G, Batra SS, Song YS. DNA language models are powerful predictors of genome-wide variant effects. Proc Natl Acad Sci U S A 2023; 120:e2311219120. [PMID: 37883436 PMCID: PMC10622914 DOI: 10.1073/pnas.2311219120] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 07/03/2023] [Accepted: 09/08/2023] [Indexed: 10/28/2023]
Abstract
The expanding catalog of genome-wide association studies (GWAS) provides biological insights across a variety of species, but identifying the causal variants behind these associations remains a significant challenge. Experimental validation is both labor-intensive and costly, highlighting the need for accurate, scalable computational methods to predict the effects of genetic variants across the entire genome. Inspired by recent progress in natural language processing, unsupervised pretraining on large protein sequence databases has proven successful at extracting complex information related to proteins, and the resulting models can learn variant effects in coding regions without supervision. Building on this idea, here we introduce the Genomic Pre-trained Network (GPN), a model designed to learn genome-wide variant effects through unsupervised pretraining on genomic DNA sequences. Our model also learns gene structure and DNA motifs without any supervision. To demonstrate its utility, we train GPN on unaligned reference genomes of Arabidopsis thaliana and seven related species within the Brassicales order and evaluate its ability to predict the functional impact of genetic variants in A. thaliana using allele frequencies from the 1001 Genomes Project and a comprehensive database of GWAS. Notably, GPN outperforms predictors based on popular conservation scores such as phyloP and phastCons. Our predictions for A. thaliana can be visualized as sequence logos in the UCSC Genome Browser (https://genome.ucsc.edu/s/gbenegas/gpn-arabidopsis). We provide code (https://github.com/songlab-cal/gpn) to train GPN for any given species using its DNA sequence alone, enabling unsupervised prediction of variant effects across the entire genome.
Affiliation(s)
- Gonzalo Benegas
- Graduate Group in Computational Biology, University of California, Berkeley, CA 94720
- Yun S. Song
- Computer Science Division, University of California, Berkeley, CA 94720
- Department of Statistics, University of California, Berkeley, CA 94720
- Center for Computational Biology, University of California, Berkeley, CA 94720
25
Smith C, Burugula BB, Dunn I, Aradhya S, Kitzman JO, Yee JL. High-Throughput Splicing Assays Identify Known and Novel WT1 Exon 9 Variants in Nephrotic Syndrome. Kidney Int Rep 2023; 8:2117-2125. [PMID: 37850022 PMCID: PMC10577367 DOI: 10.1016/j.ekir.2023.07.033] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/09/2023] [Accepted: 07/31/2023] [Indexed: 10/19/2023]
Abstract
Introduction: Frasier syndrome (FS) is a rare Mendelian form of nephrotic syndrome (NS) caused by variants that disrupt the proper splicing of WT1. This key transcription factor gene is alternatively spliced at exon 9 to produce 2 isoforms ("KTS+" and "KTS-"), which are normally expressed in the kidney at a ∼2:1 (KTS+:KTS-) ratio. FS results from variants that reduce this ratio by disrupting the splice donor of the KTS+ isoform. FS is extremely rare, and it is unclear whether any variants beyond the 8 already known could cause FS.
Methods: To prospectively identify other splicing-disruptive variants, we leveraged a massively parallel splicing assay. We tested every possible single nucleotide variant (n = 519) in and around WT1 exon 9 for effects on exon inclusion and on the KTS+/- ratio.
Results: Splice-disruptive variants (SDVs) made up 11% of the tested point variants overall and were tightly concentrated near the canonical acceptor and the KTS+/- alternate donors. Our map identified all 8 known FS or focal segmental glomerulosclerosis (FSGS) variants and 16 additional novel variants that were comparably disruptive to these known pathogenic variants. We also identified 19 variants that, conversely, increased the KTS+/KTS- ratio, of which 2 are observed in unrelated individuals with 46,XX ovotesticular disorder of sex development (46,XX OTDSD).
Conclusion: This splicing effect map can serve as functional evidence to guide the clinical interpretation of newly observed variants in and around WT1 exon 9.
Affiliation(s)
- Cathy Smith
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Bala Bharathi Burugula
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Ian Dunn
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Jacob O. Kitzman
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Jennifer Lai Yee
- Department of Pediatrics, Division of Nephrology, University of Michigan, Ann Arbor, Michigan, USA
26
Caglayan E, Konopka G. Decoding DNA sequence-driven evolution of the human brain epigenome at cellular resolution. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.14.557820. [PMID: 37745404 PMCID: PMC10515917 DOI: 10.1101/2023.09.14.557820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
DNA-based evolutionary comparisons of regulatory genomic elements enable insight into functional changes while overcoming tissue inaccessibility. Here, we harnessed adult and fetal cortex single-cell ATAC-seq datasets to uncover DNA substitutions specific to the human and human-ancestral lineages within apes. We found that fetal microglia identity is evolutionarily divergent in all lineages, whereas other cell types are conserved. Using multiomic datasets, we further identified genes linked to multiple lineage-divergent gene regulatory elements and implicated biological pathways associated with these divergent features. We also uncovered patterns of transcription factor binding site evolution across lineages and identified an expansion of bHLH-PAS factor targets in human-hominin lineages and of MEF2 factor targets in the ape lineage. Finally, conserved features were more enriched in brain disease variants, whereas the human lineage showed no distinct enrichment compared with its ancestral lineages. Our study identifies major evolutionary patterns in the human brain epigenome at cellular resolution.
Affiliation(s)
- Emre Caglayan
- Department of Neuroscience, UT Southwestern Medical Center, Dallas, TX 75390, USA
- Peter O’Donnell Jr. Brain Institute, UT Southwestern Medical Center, Dallas, TX 75390, USA
- Genevieve Konopka
- Department of Neuroscience, UT Southwestern Medical Center, Dallas, TX 75390, USA
- Peter O’Donnell Jr. Brain Institute, UT Southwestern Medical Center, Dallas, TX 75390, USA
27
Roy A, Sakthikumar S, Kozyrev SV, Nordin J, Pensch R, Mäkeläinen S, Pettersson M, Karlsson EK, Lindblad-Toh K, Forsberg-Nilsson K. Using evolutionary constraint to define novel candidate driver genes in medulloblastoma. Proc Natl Acad Sci U S A 2023; 120:e2300984120. [PMID: 37549291 PMCID: PMC10438395 DOI: 10.1073/pnas.2300984120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 07/07/2023] [Indexed: 08/09/2023]
Abstract
Current knowledge of cancer genomics remains biased against noncoding mutations. To systematically search for regulatory noncoding mutations, we assessed mutations at conserved positions in the genome, under the assumption that these are more likely to be functional than mutations at positions with low conservation. To this end, we used whole-genome sequencing data from the International Cancer Genome Consortium and combined them with evolutionary constraint inferred from 240 mammals to identify genes enriched in noncoding constraint mutations (NCCMs), i.e., mutations likely to be regulatory in nature. We compared medulloblastoma (MB), which is malignant, with pilocytic astrocytoma (PA), a primarily benign tumor, and found markedly different NCCM frequencies between the two, consistent with the observation that malignant cancers tend to carry more mutations. In PA, a high NCCM frequency affects only the BRAF locus, which is the most commonly mutated gene in PA. In contrast, in MB, >500 genes have high levels of NCCMs. Intriguingly, several loci with NCCMs in MB are associated with different ages of onset, such as the HOXB cluster in young MB patients. In adult patients, NCCMs occurred in, e.g., the WASF-2/AHDC1/FGR locus. One of these NCCMs led to increased expression of the SRC kinase FGR and augmented responsiveness of MB cells to dasatinib, a SRC kinase inhibitor. Our analysis thus points to different molecular pathways in different patient groups. These newly identified putative candidate driver mutations may aid in patient stratification in MB and could be valuable for future selection of personalized treatment options.
Affiliation(s)
- Ananya Roy
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, 751 85 Uppsala, Sweden
- Sharadha Sakthikumar
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 23 Uppsala, Sweden
- Broad Institute, Cambridge, MA 02142
- Sergey V. Kozyrev
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 23 Uppsala, Sweden
- Jessika Nordin
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, 751 85 Uppsala, Sweden
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 23 Uppsala, Sweden
- Raphaela Pensch
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 23 Uppsala, Sweden
- Suvi Mäkeläinen
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 23 Uppsala, Sweden
- Mats Pettersson
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 23 Uppsala, Sweden
- Elinor K. Karlsson
- Broad Institute, Cambridge, MA 02142
- Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA 01605
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605
- Kerstin Lindblad-Toh
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 23 Uppsala, Sweden
- Broad Institute, Cambridge, MA 02142
- Karin Forsberg-Nilsson
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, 751 85 Uppsala, Sweden
- Division of Cancer and Stem Cells, University of Nottingham Biodiscovery Institute, Nottingham NG7 2RD, United Kingdom
28
Lim ET, Chan Y. Editorial for the Neurogenetics and Neurogenomics special issue. Hum Genet 2023; 142:997-999. [PMID: 37474752 DOI: 10.1007/s00439-023-02585-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/22/2023]
29
Ye M, Zhang DF. Understanding and modeling human traits and diseases: Insights from the comparative genomics resources of Zoonomia. Innovation (N Y) 2023; 4:100444. [PMID: 37305857 PMCID: PMC10251142 DOI: 10.1016/j.xinn.2023.100444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2023] [Accepted: 05/17/2023] [Indexed: 06/13/2023]
Affiliation(s)
- Maosen Ye
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650201, China
- Deng-Feng Zhang
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650201, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming 650204, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650204, China
- National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650107, China
- KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650201, China
30
Zeng T, Spence JP, Mostafavi H, Pritchard JK. Bayesian estimation of gene constraint from an evolutionary model with gene features. RESEARCH SQUARE 2023:rs.3.rs-3012879. [PMID: 37398424 PMCID: PMC10312940 DOI: 10.21203/rs.3.rs-3012879/v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/04/2023]
Abstract
Measures of selective constraint on genes have been used for many applications, including clinical interpretation of rare coding variants, disease gene discovery, and studies of genome evolution. However, widely used metrics are severely underpowered at detecting constraint for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. We developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease, and other phenotypes, especially for short genes. Our new estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve estimation of many gene-level properties, such as rare variant burden or gene expression differences.
Affiliation(s)
- Tony Zeng
- Department of Genetics, Stanford University, Stanford CA
- Jonathan K. Pritchard
- Department of Genetics, Stanford University, Stanford CA
- Department of Biology, Stanford University, Stanford CA
31
Ponting CP. Human genetics seen through an evolutionary lens. CELL GENOMICS 2023; 3:100323. [PMID: 37228753 PMCID: PMC10203256 DOI: 10.1016/j.xgen.2023.100323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2023]
Abstract
Most DNA bases crucial for species perpetuation are marked by a dearth of sequence change among species related over long evolutionary time. Recently, Christmas et al. [1] and Sullivan et al. [2] cast light on human DNA and its variants through comparison with 239 other mammalian species' genomes.
Affiliation(s)
- Chris P. Ponting
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK
32
Romero IG. Seeing humans through an evolutionary lens. Science 2023; 380:360-361. [PMID: 37104588 DOI: 10.1126/science.adh0745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
A collection of mammalian genomes provides insights into human biology and evolution.
Affiliation(s)
- Irene Gallego Romero
- Melbourne Integrative Genomics, University of Melbourne, Parkville, VIC, Australia
- School of BioSciences, University of Melbourne, Parkville, VIC, Australia