1
Vespasiani DM, Jacobs GS, Cook LE, Brucato N, Leavesley M, Kinipi C, Ricaut FX, Cox MP, Gallego Romero I. Denisovan introgression has shaped the immune system of present-day Papuans. PLoS Genet 2022; 18:e1010470. [PMID: 36480515 PMCID: PMC9731433 DOI: 10.1371/journal.pgen.1010470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Accepted: 10/10/2022] [Indexed: 12/13/2022]
Abstract
Modern humans have admixed with multiple archaic hominins. Papuans, in particular, owe up to 5% of their genome to Denisovans, a sister group to Neanderthals whose remains have only been identified in Siberia and Tibet. Unfortunately, the biological and evolutionary significance of these introgression events remains poorly understood. Here we investigate the function of both Denisovan and Neanderthal alleles characterised within a set of 56 genomes from Papuan individuals. By comparing the distribution of archaic and non-archaic variants we assess the consequences of archaic admixture across a multitude of different cell types and functional elements. We observe an enrichment of archaic alleles within cis-regulatory elements and transcribed regions of the genome, with Denisovan variants strongly affecting elements active within immune-related cells. We identify 16,048 and 10,032 high-confidence Denisovan and Neanderthal variants that fall within annotated cis-regulatory elements and with the potential to alter the affinity of multiple transcription factors to their cognate DNA motifs, highlighting a likely mechanism by which introgressed DNA can impact phenotypes. Lastly, we experimentally validate these predictions by testing the regulatory potential of five Denisovan variants segregating within Papuan individuals, and find that two are associated with a significant reduction of transcriptional activity in plasmid reporter assays. Together, these data provide support for a widespread contribution of archaic DNA in shaping the present levels of modern human genetic diversity, with different archaic ancestries potentially affecting multiple phenotypic traits within non-Africans.
Affiliation(s)
- Davide M. Vespasiani
  - Melbourne Integrative Genomics, University of Melbourne, Parkville, Australia
  - School of Biosciences, University of Melbourne, Parkville, Australia
- Guy S. Jacobs
  - Department of Archaeology, University of Cambridge, Cambridge, United Kingdom
- Laura E. Cook
  - Melbourne Integrative Genomics, University of Melbourne, Parkville, Australia
  - School of Biosciences, University of Melbourne, Parkville, Australia
- Nicolas Brucato
  - Laboratoire de Evolution et Diversite Biologique, Université de Toulouse Midi-Pyrénées, Toulouse, France
- Matthew Leavesley
  - School of Humanities and Social Sciences, University of Papua New Guinea, Port Moresby, Papua New Guinea
  - College of Arts, Society and Education, James Cook University, Cairns, Australia
  - ARC Centre of Excellence for Australian Biodiversity and Heritage, University of Wollongong, Wollongong, Australia
- Christopher Kinipi
  - School of Humanities and Social Sciences, University of Papua New Guinea, Port Moresby, Papua New Guinea
- François-Xavier Ricaut
  - Laboratoire de Evolution et Diversite Biologique, Université de Toulouse Midi-Pyrénées, Toulouse, France
- Murray P. Cox
  - School of Natural Sciences, Massey University, Palmerston North, New Zealand
- Irene Gallego Romero
  - Melbourne Integrative Genomics, University of Melbourne, Parkville, Australia
  - School of Biosciences, University of Melbourne, Parkville, Australia
  - Center for Stem Cell Systems, University of Melbourne, Parkville, Australia
  - Center for Genomics, Evolution and Medicine, University of Tartu, Tartu, Estonia
2
Yang Z, Chen H, Lu Y, Gao Y, Sun H, Wang J, Jin L, Chu J, Xu S. Genetic evidence of tri-genealogy hypothesis on the origin of ethnic minorities in Yunnan. BMC Biol 2022; 20:166. [PMID: 35864541 PMCID: PMC9306206 DOI: 10.1186/s12915-022-01367-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2021] [Accepted: 07/05/2022] [Indexed: 11/14/2022]
Abstract
BACKGROUND: Yunnan is located in Southwest China and consists of great cultural, linguistic, and genetic diversity. However, the genomic diversity of ethnic minorities in Yunnan is largely under-investigated. To gain insights into population history and local adaptation of Yunnan minorities, we analyzed 242 whole-exome sequencing data with high coverage (~100-150×) of Yunnan minorities representing Achang, Jingpo, Dai, and Deang, who were linguistically assumed to be derived from three ancient lineages (the tri-genealogy hypothesis), i.e., Di-Qiang, Bai-Yue, and Bai-Pu.
RESULTS: Yunnan minorities show considerable genetic differences. Di-Qiang populations likely migrated from the Tibetan area about 6700 years ago. Genetic divergence between Bai-Yue and Di-Qiang was estimated to be 7000 years, and that between Bai-Yue and Bai-Pu was estimated to be 5500 years. Bai-Pu is relatively isolated, but gene flow from surrounding Di-Qiang and Bai-Yue populations was also found. Furthermore, we identified genetic variants that are differentiated within Yunnan minorities possibly due to the living circumstances and habits. Notably, we found that adaptive variants related to malaria and glucose metabolism suggest the adaptation to thalassemia and G6PD deficiency resulting from malaria resistance in the Dai population.
CONCLUSIONS: We provided genetic evidence of the tri-genealogy hypothesis as well as new insights into the genetic history and local adaptation of the Yunnan minorities.
Affiliation(s)
- Zhaoqing Yang
  - Department of Medical Genetics, Institute of Medical Biology, Chinese Academy of Medical Sciences, Kunming, 650118, China
- Hao Chen
  - Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Yan Lu
  - State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, 200438, China
- Yang Gao
  - Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, 201203, China
- Hao Sun
  - Department of Medical Genetics, Institute of Medical Biology, Chinese Academy of Medical Sciences, Kunming, 650118, China
- Jiucun Wang
  - State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, 200438, China
  - Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, 201203, China
- Li Jin
  - State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, 200438, China
  - Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, 201203, China
- Jiayou Chu
  - Department of Medical Genetics, Institute of Medical Biology, Chinese Academy of Medical Sciences, Kunming, 650118, China
- Shuhua Xu
  - State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, 200438, China
  - Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, 201203, China
  - Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, 200032, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China
3
Kumar P, Choudhary M, Halder T, Prakash NR, Singh V, V. VT, Sheoran S, T. RK, Longmei N, Rakshit S, Siddique KHM. Salinity stress tolerance and omics approaches: revisiting the progress and achievements in major cereal crops. Heredity (Edinb) 2022; 128:497-518. [DOI: 10.1038/s41437-022-00516-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/12/2021] [Revised: 02/12/2022] [Accepted: 02/14/2022] [Indexed: 02/07/2023]
4
Schweizer G, Wagner A. Both Binding Strength and Evolutionary Accessibility Affect the Population Frequency of Transcription Factor Binding Sequences in Arabidopsis thaliana. Genome Biol Evol 2021; 13:6459646. [PMID: 34894231 PMCID: PMC8712246 DOI: 10.1093/gbe/evab273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2021] [Indexed: 11/22/2022]
Abstract
Mutations in DNA sequences that bind transcription factors and thus modulate gene expression are a source of adaptive variation in gene expression. To understand how transcription factor binding sequences evolve in natural populations of the thale cress Arabidopsis thaliana, we integrated genomic polymorphism data for loci bound by transcription factors with in vitro data on binding affinity for these transcription factors. Specifically, we studied 19 different transcription factors, and the allele frequencies of 8,333 genomic loci bound in vivo by these transcription factors in 1,135 A. thaliana accessions. We find that transcription factor binding sequences show very low genetic diversity, suggesting that they are subject to purifying selection. High frequency alleles of such binding sequences tend to bind transcription factors strongly. Conversely, alleles that are absent from the population tend to bind them weakly. In addition, alleles with high frequencies also tend to be the endpoints of many accessible evolutionary paths leading to these alleles. We show that both high affinity and high evolutionary accessibility contribute to high allele frequency for at least some transcription factors. Although binding sequences with stronger affinity are more frequent, we did not find them to be associated with higher gene expression levels. Epistatic interactions among individual mutations that alter binding affinity are pervasive and can help explain variation in accessibility among binding sequences. In summary, combining in vitro binding affinity data with in vivo binding sequence data can help understand the forces that affect the evolution of transcription factor binding sequences in natural populations.
Affiliation(s)
- Gabriel Schweizer
  - Department of Evolutionary Biology and Environmental Studies, University of Zürich, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland
- Andreas Wagner
  - Department of Evolutionary Biology and Environmental Studies, University of Zürich, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland
  - Santa Fe Institute, Santa Fe, New Mexico, USA
  - Stellenbosch Institute for Advanced Study (STIAS), Wallenberg Research Centre at Stellenbosch University, South Africa
5
Joshi M, Kapopoulou A, Laurent S. Impact of Genetic Variation in Gene Regulatory Sequences: A Population Genomics Perspective. Front Genet 2021; 12:660899. [PMID: 34276769 PMCID: PMC8282999 DOI: 10.3389/fgene.2021.660899] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/29/2021] [Accepted: 05/31/2021] [Indexed: 01/06/2023]
Abstract
The unprecedented rise of high-throughput sequencing and assay technologies has provided a detailed insight into the non-coding sequences and their potential role as gene expression regulators. These regulatory non-coding sequences are also referred to as cis-regulatory elements (CREs). Genetic variants occurring within CREs have been shown to be associated with altered gene expression and phenotypic changes. Such variants are known to occur spontaneously and ultimately get fixed, due to selection and genetic drift, in natural populations and, in some cases, pave the way for speciation. Hence, the study of genetic variation at CREs has improved our overall understanding of the processes of local adaptation and evolution. Recent advances in high-throughput sequencing and better annotations of CREs have enabled the evaluation of the impact of such variation on gene expression, phenotypic alteration and fitness. Here, we review recent research on the evolution of CREs and concentrate on studies that have investigated genetic variation occurring in these regulatory sequences within the context of population genetics.
Affiliation(s)
- Manas Joshi
  - Department of Comparative Development and Genetics, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Stefan Laurent
  - Department of Comparative Development and Genetics, Max Planck Institute for Plant Breeding Research, Cologne, Germany
6
Zrimec J, Buric F, Kokina M, Garcia V, Zelezniak A. Learning the Regulatory Code of Gene Expression. Front Mol Biosci 2021; 8:673363. [PMID: 34179082 PMCID: PMC8223075 DOI: 10.3389/fmolb.2021.673363] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/27/2021] [Accepted: 05/24/2021] [Indexed: 11/13/2022]
Abstract
Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.
Affiliation(s)
- Jan Zrimec
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Filip Buric
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Mariia Kokina
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- Victor Garcia
  - School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Aleksej Zelezniak
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Science for Life Laboratory, Stockholm, Sweden
7
Ávila-Arcos MC, McManus KF, Sandoval K, Rodríguez-Rodríguez JE, Villa-Islas V, Martin AR, Luisi P, Peñaloza-Espinosa RI, Eng C, Huntsman S, Burchard EG, Gignoux CR, Bustamante CD, Moreno-Estrada A. Population History and Gene Divergence in Native Mexicans Inferred from 76 Human Exomes. Mol Biol Evol 2020; 37:994-1006. [PMID: 31848607 PMCID: PMC7086176 DOI: 10.1093/molbev/msz282] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Indexed: 12/27/2022]
Abstract
Native American genetic variation remains underrepresented in most catalogs of human genome sequencing data. Previous genotyping efforts have revealed that Mexico’s Indigenous population is highly differentiated and substructured, thus potentially harboring higher proportions of private genetic variants of functional and biomedical relevance. Here we have targeted the coding fraction of the genome and characterized its full site frequency spectrum by sequencing 76 exomes from five Indigenous populations across Mexico. Using diffusion approximations, we modeled the demographic history of Indigenous populations from Mexico with northern and southern ethnic groups splitting 7.2 KYA and subsequently diverging locally 6.5 and 5.7 KYA, respectively. Selection scans for positive selection revealed BCL2L13 and KBTBD8 genes as potential candidates for adaptive evolution in Rarámuris and Triquis, respectively. BCL2L13 is highly expressed in skeletal muscle and could be related to physical endurance, a well-known phenotype of the northern Mexico Rarámuri. The KBTBD8 gene has been associated with idiopathic short stature and we found it to be highly differentiated in Triqui, a southern Indigenous group from Oaxaca whose height is extremely low compared to other Native populations.
Affiliation(s)
- María C Ávila-Arcos
  - International Laboratory for Human Genome Research (LIIGH), UNAM Juriquilla, Queretaro, Mexico
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA
- Kimberly F McManus
  - Department of Biology, Stanford University, Stanford, CA
  - Department of Biomedical Informatics, Stanford School of Medicine, Stanford, CA
- Karla Sandoval
  - National Laboratory of Genomics for Biodiversity (LANGEBIO), UGA, CINVESTAV, Irapuato, Guanajuato 36821, Mexico
- Viridiana Villa-Islas
  - International Laboratory for Human Genome Research (LIIGH), UNAM Juriquilla, Queretaro, Mexico
- Alicia R Martin
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA
- Pierre Luisi
  - Centro de Investigación y Desarrollo en Inmunología y Enfermedades Infecciosas, Consejo Nacional de Investigaciones Científicas y Técnicas, Córdoba, Argentina
  - Facultad de Filosofía y Humanidades, Universidad Nacional de Córdoba, Córdoba, Argentina
- Rosenda I Peñaloza-Espinosa
  - Division of Biological and Health Sciences, Department of Biological Systems, Universidad Autónoma Metropolitana-Xochimilco, Mexico City, Mexico
- Celeste Eng
  - Department Bioengineering & Therapeutic Sciences and Medicine, University of California San Francisco, San Francisco, CA
- Scott Huntsman
  - Department Bioengineering & Therapeutic Sciences and Medicine, University of California San Francisco, San Francisco, CA
- Esteban G Burchard
  - Department Bioengineering & Therapeutic Sciences and Medicine, University of California San Francisco, San Francisco, CA
- Christopher R Gignoux
  - Division of Biomedical Informatics and Personalized Medicine, University of Colorado, Denver, CO
- Carlos D Bustamante
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA
- Andrés Moreno-Estrada
  - National Laboratory of Genomics for Biodiversity (LANGEBIO), UGA, CINVESTAV, Irapuato, Guanajuato 36821, Mexico
8
Zrimec J, Börlin CS, Buric F, Muhammad AS, Chen R, Siewers V, Verendel V, Nielsen J, Töpel M, Zelezniak A. Deep learning suggests that gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure. Nat Commun 2020; 11:6141. [PMID: 33262328 PMCID: PMC7708451 DOI: 10.1038/s41467-020-19921-4] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 12/18/2019] [Accepted: 11/02/2020] [Indexed: 12/31/2022]
Abstract
Understanding the genetic regulatory code governing gene expression is an important challenge in molecular biology. However, how individual coding and non-coding regions of the gene regulatory structure interact and contribute to mRNA expression levels remains unclear. Here we apply deep learning on over 20,000 mRNA datasets to examine the genetic regulatory code controlling mRNA abundance in 7 model organisms ranging from bacteria to Human. In all organisms, we can predict mRNA abundance directly from DNA sequence, with up to 82% of the variation of transcript levels encoded in the gene regulatory structure. By searching for DNA regulatory motifs across the gene regulatory structure, we discover that motif interactions could explain the whole dynamic range of mRNA levels. Co-evolution across coding and non-coding regions suggests that it is not single motifs or regions, but the entire gene regulatory structure and specific combination of regulatory elements that define gene expression levels.
Affiliation(s)
- Jan Zrimec
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Christoph S Börlin
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
  - Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Filip Buric
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Azam Sheikh Muhammad
  - Computer Science and Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Rhongzen Chen
  - Computer Science and Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Verena Siewers
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
  - Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Vilhelm Verendel
  - Computer Science and Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Jens Nielsen
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
  - Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Mats Töpel
  - Department of Marine Sciences, University of Gothenburg, Box 461, SE-405 30, Gothenburg, Sweden
  - Gothenburg Global Biodiversity Center (GGBC), Box 461, 40530, Gothenburg, Sweden
- Aleksej Zelezniak
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
  - Science for Life Laboratory, Tomtebodavägen 23a, SE-171 65, Stockholm, Sweden