1
Colucci M, Wetton JH, Rolf B, Sheehan N, Jobling MA. Evaluating genome-wide and targeted forensic sequencing approaches to kinship determination. Forensic Sci Int Genet 2025; 76:103228. [PMID: 39848204 DOI: 10.1016/j.fsigen.2025.103228] [Received: 04/25/2024] [Revised: 01/17/2025] [Accepted: 01/19/2025] [Indexed: 01/25/2025]
Abstract
Kinship determination is a valuable tool in forensic genetics, with applications including familial searching, disaster victim identification, and investigative genetic genealogy. Conventional typing of small numbers of autosomal short tandem repeats (STRs) confidently identifies only first-degree relatives. Massively parallel sequencing (MPS) can access more STRs and resolve alleles identical by length but differing in sequence (isoalleles), which may increase the power of kinship estimation, particularly when combined with additional sequenced single nucleotide polymorphism (SNP) loci, as in the ForenSeq DNA Signature Prep kit. MPS of ∼10,000 SNPs is available in the ForenSeq Kintelligence kit, promising detection of more distant kin, while SNP chips carrying hundreds of thousands of markers increase resolution still further. Here we evaluate these different resolutions in a set of pedigrees and via simulations. As expected, the key factor influencing the precision of kinship estimation is the number of markers analysed, and MPS-based analysis of STRs increases resolution, with the full set of ForenSeq DNA Signature Prep kit markers allowing detection of third-degree relatives. Since SNP chips include non-autosomal (X- and Y-chromosomal, and mitochondrial [mtDNA]) markers, we ask how these perform within the pedigrees, cross-referencing to Y-STR sequence data. We highlight the importance of understanding haplogroup resolutions in the increasingly complex Y and mtDNA phylogenies, to avoid false exclusions. Incorporation of X-SNPs allows tracing of X-chromosome segments within families. These different approaches can add value to kinship estimation, but some require simpler bioinformatic interfaces to make them more widely accessible in practice, as well as access to appropriate allele frequency data to avoid problems associated with ancestry mis-specification.
Affiliation(s)
- Margherita Colucci
- Department of Genetics, Genomics & Cancer Sciences, University of Leicester, University Road, Leicester, UK
- Jon H Wetton
- Department of Genetics, Genomics & Cancer Sciences, University of Leicester, University Road, Leicester, UK
- Burkhard Rolf
- Eurofins Genomics and Forensics Campus, Ebersberg, Germany
- Nuala Sheehan
- Department of Population Health Sciences, University of Leicester, University Road, Leicester, UK
- Mark A Jobling
- Department of Genetics, Genomics & Cancer Sciences, University of Leicester, University Road, Leicester, UK.
2
Freudiger A, Jovanovic VM, Huang Y, Snyder-Mackler N, Conrad DF, Miller B, Montague MJ, Westphal H, Stadler PF, Bley S, Horvath JE, Brent LJN, Platt ML, Ruiz-Lambides A, Tung J, Nowick K, Ringbauer H, Widdig A. Estimating realized relatedness in free-ranging macaques by inferring identity-by-descent segments. Proc Natl Acad Sci U S A 2025; 122:e2401106122. [PMID: 39808663 PMCID: PMC11760927 DOI: 10.1073/pnas.2401106122] [Received: 01/26/2024] [Accepted: 12/04/2024] [Indexed: 01/16/2025] Open
Abstract
Biological relatedness is a key consideration in studies of behavior, population structure, and trait evolution. Except for parent-offspring dyads, pedigrees capture relatedness imperfectly. The number and length of identical-by-descent DNA segments (IBD) yield the most precise relatedness estimates. Here, we leverage different methods for estimating IBD segments from low-depth whole genome resequencing data to demonstrate the feasibility and value of resolving fine-scaled gradients of relatedness in free-living animals. Using primarily 4 to 6× depth data from a rhesus macaque (Macaca mulatta) population with long-term pedigree data, we show that we can infer the number and length of IBD segments across the genome with high accuracy even at 0.5× sequencing depth. In line with expectations based on simulation, the resulting estimates demonstrate substantial variation in genetic relatedness within kin classes, leading to overlapping distributions between kin classes. By comparing the IBD-based estimates with pedigree and short tandem repeat-based methods, we show that IBD estimates are more reliable and provide more detailed information on kinship. The inferred IBD segments also identify cryptic genetic relatives not represented in the pedigree and reveal elevated recombination rates in females relative to males, which enables the majority of close maternal and paternal kin to be distinguished with genotype data alone. Our findings represent a breakthrough in the ability to study the predictors and consequences of genetic relatedness in natural populations, contributing to our understanding of a fundamental component of population structure in the wild.
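As a rough illustration of how inferred IBD segments translate into a pairwise relatedness value, the sketch below sums segment lengths by IBD state and applies the standard convention r = k1/2 + k2, where k1 and k2 are the genome fractions shared in IBD1 and IBD2. The segment tuple layout, the function name, and the toy genome length are illustrative assumptions, not the authors' pipeline.

```python
from typing import Iterable, Tuple

# Hypothetical segment layout: (start_cM, end_cM, ibd_state), where
# ibd_state is 1 (one haplotype shared) or 2 (both haplotypes shared).
Segment = Tuple[float, float, int]


def realized_relatedness(segments: Iterable[Segment], genome_length_cm: float) -> float:
    """Return r = k1/2 + k2, with k1 and k2 the genome fractions in IBD1 and IBD2."""
    if genome_length_cm <= 0:
        raise ValueError("genome_length_cm must be positive")
    ibd1 = 0.0
    ibd2 = 0.0
    for start, end, state in segments:
        length = end - start
        if length < 0:
            raise ValueError("segment end precedes segment start")
        if state == 1:
            ibd1 += length
        elif state == 2:
            ibd2 += length
        else:
            raise ValueError("ibd_state must be 1 or 2")
    return 0.5 * ibd1 / genome_length_cm + ibd2 / genome_length_cm


# Toy pair on an assumed 100 cM genome: 50 cM in IBD1 and 25 cM in IBD2.
toy_segments = [(0.0, 50.0, 1), (50.0, 75.0, 2)]
r_toy = realized_relatedness(toy_segments, 100.0)
```

Because recombination makes segment sharing differ between pairs of the same pedigree class, values computed this way scatter around the pedigree expectation, which is the within-class variation the abstract describes.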
Affiliation(s)
- Annika Freudiger
- Department of Primate Behavioral Ecology, Institute of Biology, Leipzig University, Leipzig 04103, Germany
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany
- Vladimir M. Jovanovic
- Department of Biology, Chemistry and Pharmacy, Human Biology and Primate Evolution, Freie Universität Berlin, Berlin 14195, Germany
- Department of Mathematics and Computer Science, Bioinformatics Solution Center, Freie Universität Berlin, Berlin 14195, Germany
- Yilei Huang
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig 04107, Germany
- Noah Snyder-Mackler
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ 85281
- Donald F. Conrad
- Division of Genetics, Oregon National Primate Research Center, Portland, OR 97006
- Brian Miller
- Division of Genetics, Oregon National Primate Research Center, Portland, OR 97006
- Michael J. Montague
- Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Hendrikje Westphal
- Department of Primate Behavioral Ecology, Institute of Biology, Leipzig University, Leipzig 04103, Germany
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig 04107, Germany
- Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig 04107, Germany
- Max Planck Institute for Mathematics in the Sciences, Leipzig 04103, Germany
- Institute for Theoretical Chemistry, University of Vienna, Vienna 1090, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá 111311, Colombia
- Santa Fe Institute, Santa Fe, NM 87501
- Stefanie Bley
- Department of Primate Behavioral Ecology, Institute of Biology, Leipzig University, Leipzig 04103, Germany
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany
- Julie E. Horvath
- Research and Collections Section, North Carolina Museum of Natural Sciences, Raleigh, NC 27601
- Department of Biological Sciences, North Carolina State University, Raleigh, NC 27607
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27517
- Lauren J. N. Brent
- Centre for Research in Animal Behavior, University of Exeter, Exeter EX4 4QD, United Kingdom
- Michael L. Platt
- Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Marketing Department, Wharton School of Business, University of Pennsylvania, Philadelphia, PA 19104
- Department of Psychology, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA 19104
- Angelina Ruiz-Lambides
- Cayo Santiago Field Station, Caribbean Primate Research Center, University of Puerto Rico, Punta Santiago 00741, Puerto Rico
- Jenny Tung
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany
- Department of Evolutionary Anthropology, Duke University, Durham, NC 27710
- Department of Biology, Duke University, Durham, NC 27710
- Duke University Population Research Institute, Durham, NC 27710
- Katja Nowick
- Department of Biology, Chemistry and Pharmacy, Human Biology and Primate Evolution, Freie Universität Berlin, Berlin 14195, Germany
- Department of Mathematics and Computer Science, Bioinformatics Solution Center, Freie Universität Berlin, Berlin 14195, Germany
- Harald Ringbauer
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany
- Anja Widdig
- Department of Primate Behavioral Ecology, Institute of Biology, Leipzig University, Leipzig 04103, Germany
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany
- German Centre for Integrative Biodiversity Research, Leipzig 04103, Germany
3
Lavanchy E, Cumer T, Topaloudis A, Ducrest AL, Simon C, Roulin A, Goudet J. Too big to purge: persistence of deleterious Mutations in Island populations of the European Barn Owl (Tyto alba). Heredity (Edinb) 2024; 133:437-449. [PMID: 39397112 PMCID: PMC11589586 DOI: 10.1038/s41437-024-00728-8] [Received: 04/25/2024] [Revised: 09/24/2024] [Accepted: 09/25/2024] [Indexed: 10/15/2024] Open
Abstract
A key aspect of assessing the risk of extinction/extirpation for a particular wild species or population is the status of inbreeding, but the origin of inbreeding and the current mutational load are also two crucial factors to consider when determining survival probability of a population. In this study, we used samples from 502 barn owls from continental and island populations across Europe, with the aim of quantifying and comparing the level of inbreeding between populations with differing demographic histories. In addition to comparing inbreeding status, we determined whether inbreeding is due to non-random mating or high co-ancestry within the population. We show that islands have higher levels of inbreeding than continental populations, and that this is mainly due to small effective population sizes rather than recent consanguineous mating. We assess the probability that a region is autozygous along the genome and show that this probability decreased as the number of genes present in that region increased. Finally, we looked for evidence of reduced selection efficiency and purging in island populations. Among island populations, we found an increase in numbers of both neutral and deleterious minor alleles, possibly as a result of drift and decreased selection efficiency but we found no evidence of purging.
Affiliation(s)
- Eléonore Lavanchy
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Tristan Cumer
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Alexandros Topaloudis
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Anne-Lyse Ducrest
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Céline Simon
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Alexandre Roulin
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Jérôme Goudet
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland.
4
Zhou Y, Wang Q, Wang Q, Yan Y, Li G, Wu G, Yang N, Wen C. Pedigree reconstruction based on genotype data in chickens. Poult Sci 2024; 103:104327. [PMID: 39357237 PMCID: PMC11474194 DOI: 10.1016/j.psj.2024.104327] [Received: 07/02/2024] [Revised: 08/22/2024] [Accepted: 09/09/2024] [Indexed: 10/04/2024] Open
Abstract
A reliable pedigree serves as the backbone of genetic evolution in domesticated animals, providing guidance for daily management and breeding strategies. However, in commercial chicken breeding, pedigree errors and omissions are common. The large-scale application of genomic selection provides an opportunity to reconstruct chicken pedigrees using SNP markers. Here, to reconstruct pedigrees in chickens, we detected high-quality SNPs from 2866 parent-offspring pairs and calculated their genomic relationship and identity by descent (IBD). The results showed that the IBD values for parent-offspring pairs ranged from 0.48 to 0.58, clearly distinguishing them from nonparent-offspring pairs and demonstrating robustness in parentage assignment. In contrast, the genomic relatedness coefficients varied from 0.32 to 0.65. The accuracy of pedigree reconstruction significantly improved as the SNP number and minor allele frequency (MAF) increased. When the number of SNPs exceeded 200, better inference power was exhibited with IBD than with genomic relatedness. Upon reaching an effective SNP quantity of 350, despite a MAF of 0.01, the accuracy of the pedigrees inferred reached a remarkable level of 99%. Furthermore, with a doubled SNP quantity of 700 and a MAF of 0.05, the accuracy increased to a perfect 100%. This study demonstrated the feasibility of accurately constructing pedigrees in chickens using low-density SNP markers and emphasized the importance of considering the number and MAFs of these markers to achieve optimal outcomes. The adoption of the IBD as a suitable metric for pedigree inference is promising for improving the efficiency and accuracy of genetic breeding programs. These findings are paramount for the development of cost-effective yet accurate parentage verification systems.
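The abstract does not say which estimator was used for the genomic relatedness coefficient. As one common choice, the sketch below computes a VanRaden-style coefficient for a single pair from 0/1/2 genotype codes after filtering SNPs by minor allele frequency (MAF). The function names, toy genotypes, and allele frequencies are illustrative assumptions, and IBD estimation itself is not shown.

```python
from typing import List, Sequence


def maf_filter(freqs: Sequence[float], min_maf: float) -> List[int]:
    """Indices of SNPs whose minor allele frequency is at least min_maf."""
    return [i for i, p in enumerate(freqs) if min(p, 1.0 - p) >= min_maf]


def genomic_relationship(x: Sequence[int], y: Sequence[int],
                         freqs: Sequence[float], min_maf: float = 0.0) -> float:
    """VanRaden-style coefficient: sum((x - 2p)(y - 2p)) / (2 * sum(p(1 - p)))."""
    if not (len(x) == len(y) == len(freqs)):
        raise ValueError("genotype and frequency vectors must have equal length")
    keep = maf_filter(freqs, min_maf)
    denom = 2.0 * sum(freqs[i] * (1.0 - freqs[i]) for i in keep)
    if denom == 0.0:
        raise ValueError("no informative SNPs left after filtering")
    numer = sum((x[i] - 2.0 * freqs[i]) * (y[i] - 2.0 * freqs[i]) for i in keep)
    return numer / denom


# Toy data (assumed): counts of the reference allele at five SNPs.
toy_freqs = [0.5, 0.5, 0.5, 0.5, 0.01]
toy_parent = [0, 2, 1, 2, 0]
toy_offspring = [1, 1, 1, 2, 0]

# The last SNP (MAF 0.01) is dropped by the 0.05 threshold.
g_pair = genomic_relationship(toy_parent, toy_offspring, toy_freqs, min_maf=0.05)
```

With only a handful of SNPs the coefficient is very noisy, which is consistent with the abstract's point that accuracy depends on both the number of SNPs and their MAF.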
Affiliation(s)
- Yan Zhou
- State Key Laboratory of Animal Biotech Breeding and Frontier Science Center for Molecular Design Breeding, China Agricultural University, Beijing, 100193, China; Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Qunpu Wang
- State Key Laboratory of Animal Biotech Breeding and Frontier Science Center for Molecular Design Breeding, China Agricultural University, Beijing, 100193, China; Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Qiulian Wang
- State Key Laboratory of Animal Biotech Breeding and Frontier Science Center for Molecular Design Breeding, China Agricultural University, Beijing, 100193, China; Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Yiyuan Yan
- Beijing Engineering Research Center of Layer, Beijing, 101206, China
- Guangqi Li
- Beijing Engineering Research Center of Layer, Beijing, 101206, China
- Guiqin Wu
- Beijing Engineering Research Center of Layer, Beijing, 101206, China
- Ning Yang
- State Key Laboratory of Animal Biotech Breeding and Frontier Science Center for Molecular Design Breeding, China Agricultural University, Beijing, 100193, China; Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China; Sanya Institute of China Agricultural University, Hainan, 572025, China
- Chaoliang Wen
- State Key Laboratory of Animal Biotech Breeding and Frontier Science Center for Molecular Design Breeding, China Agricultural University, Beijing, 100193, China; Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China; Sanya Institute of China Agricultural University, Hainan, 572025, China.
5
Ko BS, Lee SB, Kim TK. A brief guide to analyzing expression quantitative trait loci. Mol Cells 2024; 47:100139. [PMID: 39447874 PMCID: PMC11600780 DOI: 10.1016/j.mocell.2024.100139] [Received: 07/23/2024] [Revised: 10/14/2024] [Accepted: 10/17/2024] [Indexed: 10/26/2024] Open
Abstract
Molecular quantitative trait locus (molQTL) mapping has emerged as an important approach for elucidating the functional consequences of genetic variants and unraveling the causal mechanisms underlying diseases or complex traits. However, the variety of analysis tools and sophisticated methodologies available for molQTL studies can be overwhelming for researchers with limited computational expertise. Here, we provide a brief guideline with a curated list of methods and software tools for analyzing expression quantitative trait loci, the most widely studied type of molQTL.
Affiliation(s)
- Byung Su Ko
- Department of Brain Sciences, DGIST, Daegu 42988, Republic of Korea
- Sung Bae Lee
- Department of Brain Sciences, DGIST, Daegu 42988, Republic of Korea
- Tae-Kyung Kim
- Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang 37673, Republic of Korea; Institute for Convergence Research and Education in Advanced Technology, Yonsei University, Seoul 03722, Republic of Korea.
6
Schraiber JG, Edge MD, Pennell M. Unifying approaches from statistical genetics and phylogenetics for mapping phenotypes in structured populations. PLoS Biol 2024; 22:e3002847. [PMID: 39383205 PMCID: PMC11493298 DOI: 10.1371/journal.pbio.3002847] [Received: 03/07/2024] [Revised: 10/21/2024] [Accepted: 09/17/2024] [Indexed: 10/11/2024] Open
Abstract
In both statistical genetics and phylogenetics, a major goal is to identify correlations between genetic loci or other aspects of the phenotype or environment and a focal trait. In these 2 fields, there are sophisticated but disparate statistical traditions aimed at these tasks. The disconnect between their respective approaches is becoming untenable as questions in medicine, conservation biology, and evolutionary biology increasingly rely on integrating data from within and among species, and once-clear conceptual divisions are becoming increasingly blurred. To help bridge this divide, we lay out a general model describing the covariance between the genetic contributions to the quantitative phenotypes of different individuals. Taking this approach shows that standard models in both statistical genetics (e.g., genome-wide association studies; GWAS) and phylogenetic comparative biology (e.g., phylogenetic regression) can be interpreted as special cases of this more general quantitative-genetic model. The fact that these models share the same core architecture means that we can build a unified understanding of the strengths and limitations of different methods for controlling for genetic structure when testing for associations. We analytically develop intuition for why and when spurious correlations may occur, and conduct population-genetic and phylogenetic simulations of quantitative traits. The structural similarity of problems in statistical genetics and phylogenetics enables us to take methodological advances from one field and apply them in the other. We demonstrate this by showing how a standard GWAS technique, namely including both the genetic relatedness matrix (GRM) and its leading eigenvectors (corresponding to the principal components of the genotype matrix) in a regression model, can mitigate spurious correlations in phylogenetic analyses. As a case study, we re-examine an analysis testing for coevolution of expression levels between genes across a fungal phylogeny and show that including eigenvectors of the covariance matrix as covariates decreases the false positive rate while simultaneously increasing the true positive rate. More generally, this work provides a foundation for more integrative approaches for understanding the genetic architecture of phenotypes and how evolutionary processes shape it.
Affiliation(s)
- Joshua G. Schraiber
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Michael D. Edge
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Matt Pennell
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
7
Balinova N, Hudjašov G, Pankratov V, Pennarun E, Reidla M, Metspalu E, Batyrov V, Khomyakova I, Reisberg T, Parik J, Dzhaubermezov M, Aiyzhy E, Balinova A, El'chinova G, Spitsyna N, Khusnutdinova E, Metspalu M, Tambets K, Villems R, Kushniarevich A. Gene pool preservation across time and space In Mongolian-speaking Oirats. Eur J Hum Genet 2024; 32:1150-1158. [PMID: 38605123 PMCID: PMC11369229 DOI: 10.1038/s41431-024-01588-w] [Received: 12/07/2023] [Revised: 02/27/2024] [Accepted: 03/04/2024] [Indexed: 04/13/2024] Open
Abstract
The Oirats are a group of Mongolian-speaking peoples residing in Russia, China, and Mongolia, who speak Oirat dialects of the Mongolian language. Migrations of nomadic ethnopolitical formations of the Oirats across the Eurasian Steppe during the Late Middle Ages and early Modern times resulted in a wide geographic spread of Oirat ethnic groups from present-day northwestern China in East Asia to the Lower Volga region in Eastern Europe. In this study, we generate new genome-wide and mitochondrial DNA data for present-day Oirat-speaking populations from Kalmykia in Eastern Europe, Western Mongolia, and the Xinjiang region of China, as well as Issyk-Kul Sart-Kalmaks from Central Asia, and historically related ethnic groups from Altai, Tuva, and Northern Mongolia to study the genetic structure and history of the Oirats. Despite their spatial and temporal separation and small current census size, both the Kalmyks of Eastern Europe and the Oirats of Western Mongolia in East Asia are characterized by strong genetic similarity, high effective population size, and low levels of interpopulation structure. This contrasts with the fine genetic structure observed today at a smaller geographic scale in traditionally sedentary populations, and is conditioned by high mobility and marriage practices (traditional strict exogamy) in nomadic groups. Conversely, the genetic profile of the Issyk-Kul Sart-Kalmaks suggests one or more distinct sources of genetic ancestry, along with indications of isolation and genetic drift compared to other Oirats. Our results also show that there was limited gene flow between the ancestors of the Oirats and the Altaians during the late Middle Ages. Source of the yurt image: https://www.vecteezy.com/free-vector/yurt
Affiliation(s)
- Natalia Balinova
- Research Centre for Medical Genetics, Moskvorechye Str. 1, 115522, Moscow, Russia.
- Georgi Hudjašov
- Core Facility of Genomics, Institute of Genomics, University of Tartu, Riia 23B, 51010, Tartu, Estonia
- Vasili Pankratov
- Institute of Genomics, University of Tartu, Riia 23B, 51010, Tartu, Estonia
- Erwan Pennarun
- Institute of Genomics, University of Tartu, Riia 23B, 51010, Tartu, Estonia
- Maere Reidla
- Institute of Genomics, University of Tartu, Riia 23B, 51010, Tartu, Estonia
- Ene Metspalu
- Institute of Genomics, University of Tartu, Riia 23B, 51010, Tartu, Estonia
- Valery Batyrov
- Kalmyk State University named after B. B. Gorodovikov, Pushkina Str. 11, 358000, Elista, Russia
- Irina Khomyakova
- Anuchin Research Institute and Museum of Anthropology, Lomonosov Moscow State University, Mokhovaya Str., 11, 125009, Moscow, Russia
- Tuuli Reisberg
- Core Facility of Genomics, Institute of Genomics, University of Tartu, Riia 23B, 51010, Tartu, Estonia
- Jüri Parik
- Core Facility of Genomics, Institute of Genomics, University of Tartu, Riia 23B, 51010, Tartu, Estonia
- Murat Dzhaubermezov
- Institute of Biochemistry and Genetics, Ufa Federal Research Center of the Russian Academy of Sciences, 71 Prospekt Oktyabrya Str., 450054, Ufa, Russia
- Federal State Educational Institution of Higher Education "Ufa University of Science and Technology", 32 Zaki Validi Str., 450076, Ufa, Russia
- Elena Aiyzhy
- Tuvan State University, Kyzyl, Russian Federation, Lenina Str., 36, 667000, Kyzyl, Republic of Tuva, Russia
- Altana Balinova
- Institute of Linguistics, Russian Academy of Sciences, Bolshoi Kislovsky Pereulok, 1, 125009, Moscow, Russia
- Galina El'chinova
- Research Centre for Medical Genetics, Moskvorechye Str. 1, 115522, Moscow, Russia
- Nailya Spitsyna
- Institute of Ethnology and Anthropology, Russian Academy of Sciences, Leninsky Prospekt, 32 A, 119334, Moscow, Russia
- Elza Khusnutdinova
- Institute of Biochemistry and Genetics, Ufa Federal Research Center of the Russian Academy of Sciences, 71 Prospekt Oktyabrya Str., 450054, Ufa, Russia
- Federal State Educational Institution of Higher Education "Ufa University of Science and Technology", 32 Zaki Validi Str., 450076, Ufa, Russia
- Mait Metspalu
- Institute of Genomics, University of Tartu, Riia 23B, 51010, Tartu, Estonia
- Kristiina Tambets
- Estonian Biocentre, Institute of Genomics, University of Tartu, Riia 23B, 51010, Tartu, Estonia
- Richard Villems
- Estonian Biocentre, Institute of Genomics, University of Tartu, Riia 23B, 51010, Tartu, Estonia
- Alena Kushniarevich
- Estonian Biocentre, Institute of Genomics, University of Tartu, Riia 23B, 51010, Tartu, Estonia.
8
Tiedje KE, Zhan Q, Ruybal-Pesantez S, Tonkin-Hill G, He Q, Tan MH, Argyropoulos DC, Deed SL, Ghansah A, Bangre O, Oduro AR, Koram KA, Pascual M, Day KP. Measuring changes in Plasmodium falciparum census population size in response to sequential malaria control interventions. medRxiv 2024:2023.05.18.23290210. [PMID: 37292908 PMCID: PMC10246142 DOI: 10.1101/2023.05.18.23290210] [Indexed: 06/10/2023]
Abstract
Here we introduce a new endpoint, "census population size", to evaluate the epidemiology and control of Plasmodium falciparum infections, where the parasite, rather than the infected human host, is the unit of measurement. To calculate census population size, we rely on a definition of parasite variation known as multiplicity of infection (MOIvar), based on the hyper-diversity of the var multigene family. We present a Bayesian approach to estimate MOIvar from sequencing and counting the number of unique DBLα tags (or DBLα types) of var genes, and derive from it census population size by summation of MOIvar in the human population. We track changes in this parasite population size and structure through sequential malaria interventions by indoor residual spraying (IRS) and seasonal malaria chemoprevention (SMC) from 2012 to 2017 in an area of high-seasonal malaria transmission in northern Ghana. Following IRS, which reduced transmission intensity by >90% and decreased parasite prevalence by ~40-50%, significant reductions in var diversity, MOIvar, and population size were observed in ~2,000 humans across all ages. These changes, consistent with the loss of diverse parasite genomes, were short-lived: 32 months after IRS was discontinued and SMC was introduced, var diversity and population size had rebounded in all age groups except for the younger children (1-5 years) targeted by SMC. Despite major perturbations from IRS and SMC interventions, the parasite population remained very large and retained the var population genetic characteristics of a high-transmission system (high var diversity; low var repertoire similarity), demonstrating the resilience of P. falciparum to short-term interventions in high-burden countries of sub-Saharan Africa.
9
Pagnamenta AT, Yu J, Walker S, Noble AJ, Lord J, Dutta P, Hashim M, Camps C, Green H, Devaiah S, Nashef L, Parr J, Fratter C, Ibnouf Hussein R, Lindsay SJ, Lalloo F, Banos-Pinero B, Evans D, Mallin L, Waite A, Evans J, Newman A, Allen Z, Perez-Becerril C, Ryan G, Hart R, Taylor J, Bedenham T, Clement E, Blair E, Hay E, Forzano F, Higgs J, Canham N, Majumdar A, McEntagart M, Lahiri N, Stewart H, Smithson S, Calpena E, Jackson A, Banka S, Titheradge H, McGowan R, Rankin J, Shaw-Smith C, Evans DG, Burghel GJ, Smith MJ, Anderson E, Madhu R, Firth H, Ellard S, Brennan P, Anderson C, Taupin D, Rogers MT, Cook JA, Durkie M, East JE, Fowler D, Wilson L, Igbokwe R, Gardham A, Tomlinson I, Baralle D, Uhlig HH, Taylor JC. The impact of inversions across 33,924 families with rare disease from a national genome sequencing project. Am J Hum Genet 2024; 111:1140-1164. [PMID: 38776926 PMCID: PMC11179413 DOI: 10.1016/j.ajhg.2024.04.018] [Received: 11/12/2023] [Revised: 04/25/2024] [Accepted: 04/25/2024] [Indexed: 05/25/2024] Open
Abstract
Detection of structural variants (SVs) is currently biased toward those that alter copy number. The relative contribution of inversions toward genetic disease is unclear. In this study, we analyzed genome sequencing data for 33,924 families with rare disease from the 100,000 Genomes Project. From a database hosting >500 million SVs, we focused on 351 genes where haploinsufficiency is a confirmed disease mechanism and identified 47 ultra-rare rearrangements that included an inversion (24 bp to 36.4 Mb, 20/47 de novo). Validation utilized a number of orthogonal approaches, including retrospective exome analysis. RNA-seq data supported the respective diagnoses for six participants. Phenotypic blending was apparent in four probands. Diagnostic odysseys were a common theme (>50 years for one individual), and targeted analysis for the specific gene had already been performed for 30% of these individuals but with no findings. We provide formal confirmation of a European founder origin for an intragenic MSH2 inversion. For two individuals with complex SVs involving the MECP2 mutational hotspot, ambiguous SV structures were resolved using long-read sequencing, influencing clinical interpretation. A de novo inversion of HOXD11-13 was uncovered in a family with Kantaputra-type mesomelic dysplasia. Lastly, a complex translocation disrupting APC and involving nine rearranged segments confirmed a clinical diagnosis for three family members and resolved a conundrum for a sibling with a single polyp. Overall, inversions play a small but notable role in rare disease, likely explaining the etiology in around 1/750 families across heterogeneous clinical cohorts.
Affiliation(s)
- Alistair T Pagnamenta
- Oxford Biomedical Research Centre, Centre for Human Genetics, University of Oxford, Oxford, UK.
| | - Jing Yu
- Oxford Biomedical Research Centre, Centre for Human Genetics, University of Oxford, Oxford, UK; Novo Nordisk Oxford Research Centre, Oxford, UK
| | | | - Alexandra J Noble
- Translational Gastroenterology Unit, John Radcliffe Hospital, Oxford, UK
| | - Jenny Lord
- School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK; Sheffield Institute for Translational Neuroscience, The University of Sheffield, Sheffield, UK
| | - Prasun Dutta
- Oxford Biomedical Research Centre, Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Mona Hashim
- Oxford Biomedical Research Centre, Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Carme Camps
- Oxford Biomedical Research Centre, Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Hannah Green
- Oxford Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - Smrithi Devaiah
- Oxford Centre for Genomic Medicine, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - Lina Nashef
- Department of Neurology, King's College Hospital, London, UK
| | - Jason Parr
- Manchester Centre for Genomic Medicine, Manchester University Hospitals NHS Foundation Trust, Health Innovation Manchester, Manchester, UK
| | - Carl Fratter
- Oxford Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - Rana Ibnouf Hussein
- Manchester Centre for Genomic Medicine, Manchester University Hospitals NHS Foundation Trust, Health Innovation Manchester, Manchester, UK
| | - Sarah J Lindsay
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Fiona Lalloo
- Manchester Centre for Genomic Medicine, Manchester University Hospitals NHS Foundation Trust, Health Innovation Manchester, Manchester, UK
| | - Benito Banos-Pinero
- Oxford Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - David Evans
- Exeter Genomics Laboratory, Royal Devon University Healthcare NHS Foundation Trust, Exeter, UK
| | - Lucy Mallin
- Exeter Genomics Laboratory, Royal Devon University Healthcare NHS Foundation Trust, Exeter, UK
| | - Adrian Waite
- Bristol Genetics Laboratory, North Bristol NHS Trust, Bristol, UK
| | - Julie Evans
- Bristol Genetics Laboratory, North Bristol NHS Trust, Bristol, UK
| | - Andrew Newman
- The All Wales Medical Genomics Service, University Hospital of Wales, Cardiff, UK
| | - Zoe Allen
- North Thames Rare and Inherited Disease Laboratory, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
| | - Cristina Perez-Becerril
- Manchester Centre for Genomic Medicine, Manchester University Hospitals NHS Foundation Trust, Health Innovation Manchester, Manchester, UK
| | - Gavin Ryan
- West Midlands Regional Genetics Laboratory, Central and South Genomic Laboratory Hub, Birmingham, UK
| | - Rachel Hart
- Liverpool Centre for Genomic Medicine, Liverpool Women's NHS Foundation Trust, Liverpool, UK
| | - John Taylor
- Oxford Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - Tina Bedenham
- Oxford Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - Emma Clement
- North East Thames Regional Genetic Service, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
| | - Ed Blair
- Oxford Centre for Genomic Medicine, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - Eleanor Hay
- North East Thames Regional Genetic Service, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
| | - Francesca Forzano
- Clinical Genetics Department, Guy's and St Thomas' NHS Foundation Trust, London, UK
| | - Jenny Higgs
- Liverpool Centre for Genomic Medicine, Liverpool Women's NHS Foundation Trust, Liverpool, UK
| | - Natalie Canham
- Liverpool Centre for Genomic Medicine, Liverpool Women's NHS Foundation Trust, Liverpool, UK
| | - Anirban Majumdar
- Department of Paediatric Neurology, Bristol Children's Hospital, Bristol, UK
| | - Meriel McEntagart
- SW Thames Centre for Genomic Medicine, University of London & St George's University Hospitals NHS Foundation Trust, St George's, London, UK
| | - Nayana Lahiri
- SW Thames Centre for Genomic Medicine, University of London & St George's University Hospitals NHS Foundation Trust, St George's, London, UK
| | - Helen Stewart
- Oxford Centre for Genomic Medicine, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - Sarah Smithson
- Department of Clinical Genetics, University Hospitals Bristol NHS Foundation Trust, Bristol, UK
| | - Eduardo Calpena
- Clinical Genetics Group, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK; Grupo de Investigación en Biomedicina Molecular, Celular y Genómica, Unidad CIBERER (CB06/07/1030), Instituto de Investigación Sanitaria La Fe (IIS La Fe), Valencia, Spain
| | - Adam Jackson
- Manchester Centre for Genomic Medicine, Manchester University Hospitals NHS Foundation Trust, Health Innovation Manchester, Manchester, UK; Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
| | - Siddharth Banka
- Manchester Centre for Genomic Medicine, Manchester University Hospitals NHS Foundation Trust, Health Innovation Manchester, Manchester, UK; Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
| | - Hannah Titheradge
- Department of Clinical Genetics, Birmingham Women's and Children's NHS Foundation Trust, Birmingham, UK
| | - Ruth McGowan
- West of Scotland Centre for Genomic Medicine, Glasgow, UK
| | - Julia Rankin
- Department of Clinical Genetics, Royal Devon University Healthcare NHS Trust, Exeter, UK
| | - Charles Shaw-Smith
- Department of Clinical Genetics, Royal Devon University Healthcare NHS Trust, Exeter, UK
| | - D Gareth Evans
- Manchester Centre for Genomic Medicine, Manchester University Hospitals NHS Foundation Trust, Health Innovation Manchester, Manchester, UK; Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
| | - George J Burghel
- Manchester Centre for Genomic Medicine, Manchester University Hospitals NHS Foundation Trust, Health Innovation Manchester, Manchester, UK
| | - Miriam J Smith
- Manchester Centre for Genomic Medicine, Manchester University Hospitals NHS Foundation Trust, Health Innovation Manchester, Manchester, UK
| | - Emily Anderson
- Liverpool Centre for Genomic Medicine, Liverpool Women's NHS Foundation Trust, Liverpool, UK
| | - Rajesh Madhu
- Paediatric Neurosciences Department, Alder Hey Children's Hospital NHS Foundation Trust, Liverpool, UK
| | - Helen Firth
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Sian Ellard
- Exeter Genomics Laboratory, Royal Devon University Healthcare NHS Foundation Trust, Exeter, UK
| | - Paul Brennan
- Institute of Genetic Medicine, Newcastle University, International Centre for Life, Newcastle University, Newcastle, UK
| | - Claire Anderson
- Canberra Clinical Genomics, Canberra Health Services and The Australian National University, Canberra, ACT, Australia
| | - Doug Taupin
- Cancer Research, Canberra Hospital, Canberra, ACT, Australia
| | - Mark T Rogers
- The All Wales Medical Genomics Service, University Hospital of Wales, Cardiff, UK
| | - Jackie A Cook
- Department of Clinical Genetics, Sheffield Children's NHS Foundation Trust, Sheffield, UK
| | - Miranda Durkie
- Sheffield Diagnostic Genetics Service, Sheffield Children's NHS Foundation Trust, North East and Yorkshire Genomic Laboratory Hub, Sheffield, UK
| | - James E East
- Translational Gastroenterology Unit, John Radcliffe Hospital, Oxford, UK
| | - Darren Fowler
- Department of Cellular Pathology, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - Louise Wilson
- North East Thames Regional Genetic Service, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
| | - Rebecca Igbokwe
- Department of Clinical Genetics, Birmingham Women's and Children's NHS Foundation Trust, Birmingham, UK
| | - Alice Gardham
- North East Thames Regional Genetic Service, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
| | - Ian Tomlinson
- Department of Oncology, University of Oxford, Oxford, UK
| | - Diana Baralle
- School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK
| | - Holm H Uhlig
- Oxford Biomedical Research Centre, Centre for Human Genetics, University of Oxford, Oxford, UK; Translational Gastroenterology Unit, John Radcliffe Hospital, Oxford, UK
| | - Jenny C Taylor
- Oxford Biomedical Research Centre, Centre for Human Genetics, University of Oxford, Oxford, UK.
| |
|
10
|
Nantongo Z, Birungi J, Opiyo SO, Shirima G, Mugerwa S, Mutai C, Kyalo M, Munishi L, Agaba M, Mrode R. Genetic diversity, population structure and kinship relationships highlight the environmental influence on Uganda's indigenous goat populations. Front Genet 2024; 15:1385611. [PMID: 38873114 PMCID: PMC11169577 DOI: 10.3389/fgene.2024.1385611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Accepted: 04/22/2024] [Indexed: 06/15/2024] Open
Abstract
Knowledge about genetic diversity and population structure among goat populations is essential for understanding environmental adaptation and fostering efficient utilization, development, and conservation of goat breeds. Uganda's indigenous goats exist in three phenotypic groups: Mubende, Kigezi, and Small East African. However, a limited understanding of their genetic attributes and population structure hinders the development and sustainable utilization of the goats. Using the Goat Illumina 60k chip (International Goat Genome Consortium V2), whole-genome data for 1,021 indigenous goats sourced from 10 agroecological zones in Uganda were analyzed for genetic diversity and population structure. A total of 49,337 (82.6%) single-nucleotide polymorphism markers were aligned to the ARS-1 goat genome and used to assess the genetic diversity, population structure, and kinship relationships of Uganda's indigenous goats. Moderate genetic diversity was observed. The observed and expected heterozygosities were 0.378 and 0.383, the average genetic distance was 0.390, and the average minor allele frequency was 0.30. The average inbreeding coefficient (Fis) was 0.014, and the average fixation index (Fst) was 0.016. Principal component analysis, admixture analysis, and discriminant analysis of principal components grouped the 1,021 goat genotypes into three genetically distinct populations that did not conform to the known phenotypic populations but varied across environmental conditions. Population 1, comprising Mubende (90%) and Kigezi (8.1%) goats, is located in southwest and central Uganda, a warm and humid environment. Population 2, which is 59% Mubende and 49% Small East African goats, is located along the Nile Delta in northwestern Uganda and around the Albertine region, a hot and humid savannah grassland. Population 3, comprising 78.4% Small East African and 21.1% Mubende goats, is found in northeastern to eastern Uganda, a hot and dry Commiphora woodland.
Genetic diversity and population structure information from this study will be a basis for future development, conservation, and sustainable utilization of Uganda's goat genetic resources.
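As a rough illustration of the diversity statistics quoted in this abstract, the sketch below computes observed heterozygosity, expected heterozygosity, and an inbreeding coefficient from biallelic SNP genotypes. It assumes genotypes are coded as 0/1/2 copies of one allele, that there are no missing calls, and that at least one SNP is polymorphic; the function name and the toy genotype matrix are hypothetical, and this is not the pipeline used in the study.

```python
def heterozygosity(genotypes):
    """Return (Ho, He, Fis) averaged over SNPs.

    genotypes: list of individuals, each a list of 0/1/2 allele counts per SNP.
    """
    n_ind = len(genotypes)
    n_snps = len(genotypes[0])
    ho_sum = 0.0
    he_sum = 0.0
    for j in range(n_snps):
        column = [individual[j] for individual in genotypes]
        p = sum(column) / (2 * n_ind)                       # allele frequency
        ho_sum += sum(1 for g in column if g == 1) / n_ind  # observed heterozygotes
        he_sum += 2 * p * (1 - p)                           # Hardy-Weinberg expectation
    ho = ho_sum / n_snps
    he = he_sum / n_snps
    fis = 1 - ho / he                                       # assumes he > 0
    return ho, he, fis


# Toy data: 4 individuals x 2 SNPs (illustrative values only).
toy_genotypes = [[0, 1], [1, 1], [2, 1], [1, 0]]
ho, he, fis = heterozygosity(toy_genotypes)
print(ho, he, fis)
```

Here Fis is taken as one minus the ratio of the mean observed to the mean expected heterozygosity; averaging per-SNP Fis values instead is another common choice and can give a slightly different number.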
Affiliation(s)
- Ziwena Nantongo
- Biosciences Eastern and Central Africa, International Livestock Research Institute, Consortium of International Agricultural Research Centers (CGIAR), Nairobi, Kenya
- School of Life Sciences, Nelson Mandela African Institution of Science and Technology, Arusha, Tanzania
- National Livestock Resources Research Institute, National Agricultural Research Organization, Kampala, Uganda
| | - Josephine Birungi
- Biosciences Eastern and Central Africa, International Livestock Research Institute, Consortium of International Agricultural Research Centers (CGIAR), Nairobi, Kenya
| | - Stephen Obol Opiyo
- Molecular and Cellular Imaging Center, The Ohio State University, Columbus, OH, United States
- Patira Data Science, Kampala, Uganda
| | - Gabriel Shirima
- School of Life Sciences, Nelson Mandela African Institution of Science and Technology, Arusha, Tanzania
| | - Swidiq Mugerwa
- National Livestock Resources Research Institute, National Agricultural Research Organization, Kampala, Uganda
| | - Collins Mutai
- Biosciences Eastern and Central Africa, International Livestock Research Institute, Consortium of International Agricultural Research Centers (CGIAR), Nairobi, Kenya
| | - Martina Kyalo
- Biosciences Eastern and Central Africa, International Livestock Research Institute, Consortium of International Agricultural Research Centers (CGIAR), Nairobi, Kenya
| | - Linus Munishi
- School of Life Sciences, Nelson Mandela African Institution of Science and Technology, Arusha, Tanzania
| | - Morris Agaba
- School of Life Sciences, Nelson Mandela African Institution of Science and Technology, Arusha, Tanzania
| | - Raphael Mrode
- Biosciences Eastern and Central Africa, International Livestock Research Institute, Consortium of International Agricultural Research Centers (CGIAR), Nairobi, Kenya
- Scotland Rural College, Edinburgh, United Kingdom
| |
|
11
|
Bjørnstad PM, Aaløkken R, Åsheim J, Sundaram AYM, Felde CN, Østby GH, Dalland M, Sjursen W, Carrizosa C, Vigeland MD, Sorte HS, Sheng Y, Ariansen SL, Grindedal EM, Gilfillan GD. A 39 kb structural variant causing Lynch Syndrome detected by optical genome mapping and nanopore sequencing. Eur J Hum Genet 2024; 32:513-520. [PMID: 38030917 PMCID: PMC11061271 DOI: 10.1038/s41431-023-01494-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 10/19/2023] [Accepted: 11/06/2023] [Indexed: 12/01/2023] Open
Abstract
Lynch Syndrome (LS) is a hereditary cancer syndrome caused by pathogenic germline variants in one of the four mismatch repair (MMR) genes MLH1, MSH2, MSH6 and PMS2. It is characterized by a significantly increased risk of multiple cancer types, particularly colorectal and endometrial cancer, with autosomal dominant inheritance. Access to precise and sensitive methods for genetic testing is important, as early detection and prevention of cancer is possible when the variant is known. We present here two unrelated Norwegian families with family histories strongly suggestive of LS, where immunohistochemical and microsatellite instability analyses indicated presence of a pathogenic variant in MSH2, but targeted exon sequencing and multiplex ligation-dependent probe amplification (MLPA) were negative. Using Bionano optical genome mapping, we detected a 39 kb insertion in the MSH2 gene. Precise mapping of the insertion breakpoints and inserted sequence was performed by low-coverage whole-genome sequencing with an Oxford Nanopore MinION. The same variant was present in both families, and later found in other families from the same region of Norway, indicative of a founder event. To our knowledge, this is the first diagnosis of LS caused by a structural variant using these technologies. We suggest that structural variant detection be performed when LS is suspected but not confirmed with first-tier standard genetic testing.
Affiliation(s)
- Pål Marius Bjørnstad
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Ragnhild Aaløkken
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - June Åsheim
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Arvind Y M Sundaram
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Caroline N Felde
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - G Henriette Østby
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Marianne Dalland
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Wenche Sjursen
- Department of Clinical & Molecular Medicine, NTNU and Department of Medical Genetics, St Olavs Hospital, Trondheim, Norway
| | - Christian Carrizosa
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Magnus D Vigeland
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
- Department of Forensic Sciences, Oslo University Hospital, 0372, Oslo, Norway
| | - Hanne S Sorte
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Ying Sheng
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Sarah L Ariansen
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Eli Marie Grindedal
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Gregor D Gilfillan
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway.
| |
|
12
|
Lawson DJ, Howard-McCombe J, Beaumont M, Senn H. How admixed captive breeding populations could be rescued using local ancestry information. Mol Ecol 2024:e17349. [PMID: 38634332 DOI: 10.1111/mec.17349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 12/21/2023] [Accepted: 02/26/2024] [Indexed: 04/19/2024]
Abstract
This paper asks the question: can genomic information be used to recover a species that is already on the pathway to extinction due to genetic swamping from a related and more numerous population? We show that a breeding strategy in a captive breeding program can use whole genome sequencing to identify and remove segments of DNA introgressed through hybridisation. The proposed policy uses a generalized measure of kinship or heterozygosity accounting for local ancestry, that is, whether a specific genetic location was inherited from the target of conservation. We then show that optimizing these measures would minimize undesired ancestry while also controlling kinship and/or heterozygosity, in a simulated breeding population. The process is applied to real data representing the hybridized Scottish wildcat breeding population, with the result that it should be possible to breed out domestic cat ancestry. The ability to reverse introgression is a powerful tool brought about through the combination of sequencing with computational advances in ancestry estimation. Since it works best when applied early in the process, important decisions need to be made about which genetically distinct populations should benefit from it and which should be left to reform into a single population.
Affiliation(s)
- Daniel J Lawson
- Institute of Statistical Sciences, School of Mathematics, University of Bristol, Bristol, UK
| | - Jo Howard-McCombe
- RZSS WildGenes Laboratory, Conservation Department, Royal Zoological Society of Scotland, Edinburgh, UK
| | - Mark Beaumont
- School of Biological Sciences, University of Bristol, Bristol, UK
| | - Helen Senn
- RZSS WildGenes Laboratory, Conservation Department, Royal Zoological Society of Scotland, Edinburgh, UK
| |
|
13
|
Li Q, Bian J, Qian Y, Kossinna P, Gau C, Gordon PMK, Zhou X, Guo X, Yan J, Wu J, Long Q. An expression-directed linear mixed model discovering low-effect genetic variants. Genetics 2024; 226:iyae018. [PMID: 38314848 PMCID: PMC11630775 DOI: 10.1093/genetics/iyae018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 11/29/2023] [Accepted: 01/05/2024] [Indexed: 02/07/2024] Open
Abstract
Detecting genetic variants with low effect sizes using a moderate sample size is difficult, hindering downstream efforts to learn pathology and estimate heritability. In this work, by utilizing informative weights learned from training genetically predicted gene expression models, we formed an alternative approach to estimate the polygenic term in a linear mixed model. Our linear mixed model estimates the genetic background by incorporating the relevance of variants to gene expression. Our protocol, expression-directed linear mixed model, enables the discovery of subtle signals of low-effect variants using moderate sample size. By applying expression-directed linear mixed model to cohorts of around 5,000 individuals with either binary (WTCCC) or quantitative (NFBC1966) traits, we demonstrated its power gain at the low-effect end of the genetic etiology spectrum. In aggregate, the additional low-effect variants detected by expression-directed linear mixed model substantially improved estimation of missing heritability. Expression-directed linear mixed model moves precision medicine forward by accurately detecting the contribution of low-effect genetic variants to human diseases.
Affiliation(s)
- Qing Li
- Department of Biochemistry & Molecular Biology, University of Calgary, Calgary T2N 1N4, Canada
| | - Jiayi Bian
- Department of Mathematics and Statistics, University of Calgary, Calgary T2N 1N4, Canada
| | - Yanzhao Qian
- Department of Mathematics and Statistics, University of Calgary, Calgary T2N 1N4, Canada
| | - Pathum Kossinna
- Department of Biochemistry & Molecular Biology, University of Calgary, Calgary T2N 1N4, Canada
| | - Cooper Gau
- Department of Mathematics and Statistics, University of Calgary, Calgary T2N 1N4, Canada
| | - Paul M K Gordon
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary T2N 1N4, Canada
| | - Xiang Zhou
- School of Public Health, University of Michigan, Ann Arbor 48109, USA
| | - Xingyi Guo
- Department of Medicine & Biomedical Informatics, Vanderbilt University Medical Center, Nashville 37203, USA
| | - Jun Yan
- Physiology and Pharmacology, University of Calgary, Calgary T2N 1N4, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary T2N 1N4, Canada
| | - Jingjing Wu
- Department of Mathematics and Statistics, University of Calgary, Calgary T2N 1N4, Canada
| | - Quan Long
- Department of Biochemistry & Molecular Biology, University of Calgary, Calgary T2N 1N4, Canada
- Department of Mathematics and Statistics, University of Calgary, Calgary T2N 1N4, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary T2N 1N4, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary T2N 1N4, Canada
- Department of Medical Genetics, University of Calgary, Calgary T2N 1N4, Canada
| |
|
14
|
Schraiber JG, Edge MD, Pennell M. Unifying approaches from statistical genetics and phylogenetics for mapping phenotypes in structured populations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.10.579721. [PMID: 38496530 PMCID: PMC10942266 DOI: 10.1101/2024.02.10.579721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
In both statistical genetics and phylogenetics, a major goal is to identify correlations between genetic loci or other aspects of the phenotype or environment and a focal trait. In these two fields, there are sophisticated but disparate statistical traditions aimed at these tasks. The disconnect between their respective approaches is becoming untenable as questions in medicine, conservation biology, and evolutionary biology increasingly rely on integrating data from within and among species, and once-clear conceptual divisions are becoming increasingly blurred. To help bridge this divide, we derive a general model describing the covariance between the genetic contributions to the quantitative phenotypes of different individuals. Taking this approach shows that standard models in both statistical genetics (e.g., Genome-Wide Association Studies; GWAS) and phylogenetic comparative biology (e.g., phylogenetic regression) can be interpreted as special cases of this more general quantitative-genetic model. The fact that these models share the same core architecture means that we can build a unified understanding of the strengths and limitations of different methods for controlling for genetic structure when testing for associations. We develop intuition for why and when spurious correlations may occur using analytical theory and conduct population-genetic and phylogenetic simulations of quantitative traits. The structural similarity of problems in statistical genetics and phylogenetics enables us to take methodological advances from one field and apply them in the other. We demonstrate this by showing how a standard GWAS technique, namely including both the genetic relatedness matrix (GRM) and its leading eigenvectors (corresponding to the principal components of the genotype matrix) in a regression model, can mitigate spurious correlations in phylogenetic analyses.
As a case study of this, we re-examine an analysis testing for co-evolution of expression levels between genes across a fungal phylogeny, and show that including covariance matrix eigenvectors as covariates decreases the false positive rate while simultaneously increasing the true positive rate. More generally, this work provides a foundation for more integrative approaches for understanding the genetic architecture of phenotypes and how evolutionary processes shape it.
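The two ingredients named in this abstract, a genetic relatedness matrix and its leading eigenvector, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: genotypes are coded 0/1/2, every SNP is polymorphic, the GRM is built from standardized genotypes (one common construction), and the leading eigenvector is found by plain power iteration. All function names and the toy data are hypothetical.

```python
import math


def grm(genotypes):
    """Genetic relatedness matrix from 0/1/2 genotypes (rows are individuals)."""
    n, m = len(genotypes), len(genotypes[0])
    z = [[0.0] * m for _ in range(n)]
    for j in range(m):
        column = [row[j] for row in genotypes]
        p = sum(column) / (2 * n)
        sd = math.sqrt(2 * p * (1 - p))          # assumes 0 < p < 1
        for i in range(n):
            z[i][j] = (column[i] - 2 * p) / sd   # centre and scale each SNP
    return [[sum(z[a][k] * z[b][k] for k in range(m)) / m for b in range(n)]
            for a in range(n)]


def leading_eigenvector(matrix, iterations=200):
    """Power iteration for the eigenvector with the largest eigenvalue."""
    n = len(matrix)
    v = [1.0 + i for i in range(n)]              # non-constant starting vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data: individuals 0-1 and 2-3 form two loosely distinct groups.
toy_genotypes = [
    [0, 0, 2, 2, 1],
    [0, 1, 2, 2, 0],
    [2, 2, 0, 0, 1],
    [2, 1, 0, 1, 2],
]
kinship = grm(toy_genotypes)
pc1 = leading_eigenvector(kinship)
print(pc1)
```

In a full analysis the matrix would typically enter as the covariance of a random effect and the eigenvector(s) as fixed-effect covariates in the association model; that fitting step is omitted here.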
|
15
|
Fraimout A, Guillaume F, Li Z, Sillanpää MJ, Rastas P, Merilä J. Dissecting the genetic architecture of quantitative traits using genome-wide identity-by-descent sharing. Mol Ecol 2024; 33:e17299. [PMID: 38380534 DOI: 10.1111/mec.17299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 01/08/2024] [Accepted: 01/22/2024] [Indexed: 02/22/2024]
Abstract
Additive and dominance genetic variances underlying the expression of quantitative traits are important quantities for predicting short-term responses to selection, but they are notoriously challenging to estimate in most non-model wild populations. Specifically, large or panmictic populations may be characterized by low variance in genetic relatedness among individuals which, in turn, can prevent accurate estimation of quantitative genetic parameters. We used estimates of genome-wide identity-by-descent (IBD) sharing from autosomal SNP loci to estimate quantitative genetic parameters for ecologically important traits in nine-spined sticklebacks (Pungitius pungitius) from a large, outbred population. Using empirical and simulated datasets, with varying sample sizes and pedigree complexity, we assessed the performance of different crossing schemes in estimating additive genetic variance and heritability for all traits. We found that the low variance in relatedness characteristic of wild outbred populations with a high migration rate can impair the estimation of quantitative genetic parameters and bias heritability estimates downwards. On the other hand, the use of a half-sib/full-sib design allowed precise estimation of genetic variance components and revealed significant additive variance and heritability for all measured traits, with negligible dominance contributions. Genome-partitioning and QTL mapping analyses revealed that most traits had a polygenic basis and were controlled by genes on multiple chromosomes. Furthermore, different QTL contributed to variation in the same traits in different populations, suggesting heterogeneous underpinnings of parallel evolution at the phenotypic level. Our results provide important guidelines for future studies aimed at estimating adaptive potential in the wild, particularly for those conducted in large outbred populations.
Affiliation(s)
- Antoine Fraimout
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, FI-00014 University of Helsinki, Helsinki, Finland
| | - Frédéric Guillaume
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, FI-00014 University of Helsinki, Helsinki, Finland
| | - Zitong Li
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, FI-00014 University of Helsinki, Helsinki, Finland
| | - Mikko J Sillanpää
- Research Unit of Mathematical Sciences, FI-90014 University of Oulu, Oulu, Finland
| | - Pasi Rastas
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, FI-00014 University of Helsinki, Helsinki, Finland
- Institute of Biotechnology, FI-00014 University of Helsinki, Helsinki, Finland
| | - Juha Merilä
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, FI-00014 University of Helsinki, Helsinki, Finland
- Area of Ecology and Biodiversity, School of Biological Sciences, The University of Hong Kong, Hong Kong SAR, China
| |
|
16
|
Freudiger A, Jovanovic VM, Huang Y, Snyder-Mackler N, Conrad DF, Miller B, Montague MJ, Westphal H, Stadler PF, Bley S, Horvath JE, Brent LJN, Platt ML, Ruiz-Lambides A, Tung J, Nowick K, Ringbauer H, Widdig A. Taking identity-by-descent analysis into the wild: Estimating realized relatedness in free-ranging macaques. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.09.574911. [PMID: 38260273 PMCID: PMC10802400 DOI: 10.1101/2024.01.09.574911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Biological relatedness is a key consideration in studies of behavior, population structure, and trait evolution. Except for parent-offspring dyads, pedigrees capture relatedness imperfectly. The number and length of DNA segments that are identical-by-descent (IBD) yield the most precise estimates of relatedness. Here, we leverage novel methods for estimating locus-specific IBD from low coverage whole genome resequencing data to demonstrate the feasibility and value of resolving fine-scaled gradients of relatedness in free-living animals. Using primarily 4-6× coverage data from a rhesus macaque (Macaca mulatta) population with available long-term pedigree data, we show that we can call the number and length of IBD segments across the genome with high accuracy even at 0.5× coverage. The resulting estimates demonstrate substantial variation in genetic relatedness within kin classes, leading to overlapping distributions between kin classes. They identify cryptic genetic relatives that are not represented in the pedigree and reveal elevated recombination rates in females relative to males, which allows us to discriminate maternal and paternal kin using genotype data alone. Our findings represent a breakthrough in the ability to understand the predictors and consequences of genetic relatedness in natural populations, contributing to our understanding of a fundamental component of population structure in the wild.
Affiliation(s)
- Annika Freudiger
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Vladimir M Jovanovic
  - Human Biology and Primate Evolution, Institut für Zoologie, Freie Universität Berlin, Berlin, Germany
  - Bioinformatics Solution Center, Freie Universität Berlin, Berlin, Germany
- Yilei Huang
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
- Noah Snyder-Mackler
  - Center for Evolution & Medicine, School of Life Sciences, Arizona State University, Tempe, USA
- Donald F Conrad
  - Division of Genetics, Oregon National Primate Research Center, Portland, Oregon, USA
- Brian Miller
  - Division of Genetics, Oregon National Primate Research Center, Portland, Oregon, USA
- Michael J Montague
  - Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Hendrikje Westphal
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
- Peter F Stadler
  - Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  - Institute for Theoretical Chemistry, University of Vienna, Austria
  - Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
  - Santa Fe Institute, Santa Fe, NM, USA
- Stefanie Bley
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Julie E Horvath
  - Department of Biological and Biomedical Sciences, North Carolina Central University, Durham, North Carolina, USA
  - Research and Collections Section, North Carolina Museum of Natural Sciences, Raleigh, North Carolina, USA
  - Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, USA
  - Department of Evolutionary Anthropology, Duke University, Durham, North Carolina, USA
  - Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Lauren J N Brent
  - Centre for Research in Animal Behaviour, University of Exeter, Exeter, UK
- Michael L Platt
  - Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Marketing Department, the Wharton School of Business, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Psychology, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Angelina Ruiz-Lambides
  - Cayo Santiago Field Station, Caribbean Primate Research Center, University of Puerto Rico, Punta Santiago, Puerto Rico
- Jenny Tung
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Department of Evolutionary Anthropology, Duke University, Durham, North Carolina, USA
  - Department of Biology, Duke University, Durham, North Carolina, USA
  - Duke University Population Research Institute, Durham, North Carolina, USA
- Katja Nowick
  - Human Biology and Primate Evolution, Institut für Zoologie, Freie Universität Berlin, Berlin, Germany
  - Bioinformatics Solution Center, Freie Universität Berlin, Berlin, Germany
- Harald Ringbauer
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Anja Widdig
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
17
Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CWK, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. Am J Hum Genet 2023; 110:2077-2091. [PMID: 38065072 PMCID: PMC10716520 DOI: 10.1016/j.ajhg.2023.10.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/08/2023] [Revised: 10/26/2023] [Accepted: 10/27/2023] [Indexed: 12/18/2023] Open
Abstract
Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide association studies (GWASs) are a powerful way to find genetic loci associated with phenotypes. GWASs are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix (local eGRM) given the ARG. Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to analyze two chromosomes containing known body size loci in a sample of Native Hawaiians. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.
Affiliation(s)
- Vivian Link
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Joshua G Schraiber
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Caoqi Fan
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Bryan Dinh
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Nicholas Mancuso
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Charleston W K Chiang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Michael D Edge
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
18
Gislason H. SNP heterozygosity, relatedness and inbreeding of whole genomes from the isolated population of the Faroe Islands. BMC Genomics 2023; 24:707. [PMID: 37996805 PMCID: PMC10666429 DOI: 10.1186/s12864-023-09763-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Accepted: 10/23/2023] [Indexed: 11/25/2023] Open
Abstract
BACKGROUND The population of the Faroe Islands is isolated, but very little is known about it from whole genome sequencing. The population of about 50,000 people has a high incidence of rare diseases, e.g., 1:300 for Primary Carnitine Deficiency. A screening programme was implemented, and eleven persons were also whole genome sequenced at 37× coverage for diagnostic purposes in those cases that were not affected by the known mutations. The purpose of our study is to utilize the high coverage data to explore the genomic variation and the ancestral history of the population. We study the SNP heterozygosity, the pairwise relatedness from kinship, the inbreeding from runs of homozygosity (ROH), and we find the minor allele frequency distribution. We estimate the population ancestry and the timing of the founding event by using the whole genomes from eight consenting individuals. RESULTS We find the number of SNPs and the heterozygosity for the eight individual samples, and for merged samples, for which we also study the relatedness. We find close relatedness between the supposedly unrelated individuals. From ROH, we interpret the high relatedness as an ancient property of the isolated population. A bottleneck event is estimated starting between years [Formula: see text] with a maximum consanguineous population in year [Formula: see text] and similarly consanguineous between years [Formula: see text]. The ancestry analysis shows the population descends from founders of [Formula: see text] European and [Formula: see text] Admixed American ancestry. A distinct clustering near the central European and British populations of the 1000 Genomes Project is likely the result of the population isolation and genetic drift. The minor allele frequency distribution suggests many rare variants. CONCLUSIONS The ancestry is mainly European while the inbreeding is higher compared to European populations and population isolates. The Faroese population has inbreeding more like ancient Europeans. We discovered a bottlenecked and consanguineous population event and estimated it starting in the 1st-4th century as compared to the oldest archaeological findings from the 4th-6th century.
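The abstract derives inbreeding from runs of homozygosity but does not say which estimator was used. One widely used summary is F_ROH, the fraction of the genome covered by ROH segments above a length threshold. The sketch below illustrates that quantity only; the function name, the 1 Mb threshold, and the toy genome length are assumptions for illustration, not values from the study.

```python
from typing import Iterable, Tuple


def f_roh(
    segments: Iterable[Tuple[int, int]],
    genome_length_bp: int,
    min_length_bp: int = 1_000_000,  # assumed threshold, not from the paper
) -> float:
    """Fraction of the genome covered by ROH segments of at least min_length_bp.

    Segments are (start, end) positions in base pairs and are assumed
    not to overlap one another.
    """
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    covered = sum(
        end - start
        for start, end in segments
        if end - start >= min_length_bp
    )
    return covered / genome_length_bp


# Toy example: the 0.5 Mb segment falls below the threshold and is ignored.
toy_segments = [(0, 2_000_000), (5_000_000, 5_500_000), (10_000_000, 13_000_000)]
toy_f = f_roh(toy_segments, genome_length_bp=100_000_000)
print(f"F_ROH = {toy_f:.3f}")
```

In practice the ROH segments would come from an ROH caller, and the denominator would be the callable autosomal length rather than a round number.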
Affiliation(s)
- Hannes Gislason
  - Faculty of Science and Technology, University of the Faroe Islands, Tórshavn, Faroe Islands
19
Cao TV, Sutherland HG, Benton MC, Haupt LM, Lea RA, Griffiths LR. Exploring the Functional Basis of Epigenetic Aging in Relation to Body Fat Phenotypes in the Norfolk Island Cohort. Curr Issues Mol Biol 2023; 45:7862-7877. [PMID: 37886940 PMCID: PMC10605526 DOI: 10.3390/cimb45100497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 09/18/2023] [Accepted: 09/25/2023] [Indexed: 10/28/2023] Open
Abstract
DNA methylation is an epigenetic factor that is modifiable and can change over a lifespan. While many studies have identified methylation sites (CpGs) related to aging, the relationship of these to gene function and age-related disease phenotypes remains unclear. This research explores this question by testing for the conjoint association of age-related CpGs with gene expression and the relation of these to body fat phenotypes. The study included blood-based gene transcripts and intragenic CpG methylation data from Illumina 450K arrays in 74 healthy adults from the Norfolk Island population. First, a series of regression analyses were performed to detect associations between gene transcript level and intragenic CpGs and their conjoint relationship with age. Second, we explored how these age-related expression CpGs (eCpGs) correlated with obesity-related phenotypes, including body fat percentage, body mass index, and waist-to-hip ratio. We identified 35 age-related eCpGs. Of these, ten eCpGs were associated with at least one body fat phenotype. Collagen Type XI Alpha 2 Chain (COL11A2), Complement C1s (C1s), and four and a half LIM domains 2 (FHL2) genes were among the most significant genes with multiple eCpGs associated with both age and multiple body fat phenotypes. The COL11A2 gene contributes to the correct assembly of the extracellular matrix in maintaining the healthy structural arrangement of various components, with the C1s gene part of complement systems functioning in inflammation. Moreover, FHL2 expression was upregulated under hypermethylation in both blood and adipose tissue with aging. These results suggest new targets for future studies and require further validation to confirm the specific function of these genes on body fat regulation.
Affiliation(s)
- Thao Van Cao
  - Centre for Genomics and Personalised Health, School of Biomedical Sciences, Queensland University of Technology (QUT), Kelvin Grove, QLD 4059, Australia
- Heidi G. Sutherland
  - Centre for Genomics and Personalised Health, School of Biomedical Sciences, Queensland University of Technology (QUT), Kelvin Grove, QLD 4059, Australia
- Miles C. Benton
  - Centre for Genomics and Personalised Health, School of Biomedical Sciences, Queensland University of Technology (QUT), Kelvin Grove, QLD 4059, Australia
- Larisa M. Haupt
  - Centre for Genomics and Personalised Health, School of Biomedical Sciences, Queensland University of Technology (QUT), Kelvin Grove, QLD 4059, Australia
  - ARC Training Centre for Cell and Tissue Engineering Technologies, Queensland University of Technology (QUT), Kelvin Grove, QLD 4059, Australia
  - Max Planck Queensland Centre for the Materials Sciences of Extracellular Matrices, Queensland University of Technology (QUT), Kelvin Grove, QLD 4059, Australia
- Rodney A. Lea
  - Centre for Genomics and Personalised Health, School of Biomedical Sciences, Queensland University of Technology (QUT), Kelvin Grove, QLD 4059, Australia
- Lyn R. Griffiths
  - Centre for Genomics and Personalised Health, School of Biomedical Sciences, Queensland University of Technology (QUT), Kelvin Grove, QLD 4059, Australia
20
Ashwath MN, Lavale SA, Santhoshkumar AV, Mohapatra SR, Bhardwaj A, Dash U, Shiran K, Samantara K, Wani SH. Genome-wide association studies: an intuitive solution for SNP identification and gene mapping in trees. Funct Integr Genomics 2023; 23:297. [PMID: 37700096 DOI: 10.1007/s10142-023-01224-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/10/2023] [Revised: 04/26/2023] [Accepted: 08/31/2023] [Indexed: 09/14/2023]
Abstract
Analysis of natural diversity in wild/cultivated plants can be used to understand the genetic basis for plant breeding programs. Recent advancements in DNA sequencing have expanded the possibilities for genetically altering essential features. Several statistical genetic methods for discovering the genes affecting target traits have been published recently. One of these useful methods is the genome-wide association study (GWAS), which effectively identifies candidate genes for a variety of plant properties by examining the relationship between a molecular marker (such as a SNP) and a target trait. Conventional QTL mapping with highly structured populations has major limitations. The limited number of recombination events results in poor resolution for quantitative traits. Only two alleles at any given locus can be studied simultaneously. The conventional mapping approach fails to work in perennial plants and vegetatively propagated crops. These limitations are sidestepped by association mapping or GWAS. The flexibility of GWAS comes from the fact that the individuals being examined need not be related to one another, allowing for the use of all meiotic and recombination events to increase resolution. Phenotyping, genotyping, population structure analysis, kinship analysis, and marker-trait association analysis are the fundamental phases of GWAS. With the rapid development of sequencing technologies and computational methods, GWAS is becoming a potent tool for identifying the natural variations that underlie complex traits in crops. The use of high-throughput sequencing technologies along with genotyping approaches like genotyping-by-sequencing (GBS) and restriction site associated DNA (RAD) sequencing may be highly useful in fast-forward mapping approaches such as GWAS. Breeders may use GWAS to quickly unravel the genomes through QTL and association mapping by taking advantage of natural variation. The drawbacks of conventional linkage mapping can be successfully overcome with the use of high-resolution mapping and the inclusion of multiple alleles in GWAS.
Affiliation(s)
- M N Ashwath
  - Department of Forest Biology and Tree Improvement, Kerala Agricultural University, Thrissur, Kerala, 680 656, India
- Shivaji Ajinath Lavale
  - Centre for Plant Biotechnology and Molecular Biology, Kerala Agricultural University, Thrissur, Kerala, 680 656, India
- A V Santhoshkumar
  - Department of Forest Biology and Tree Improvement, Kerala Agricultural University, Thrissur, Kerala, 680 656, India
- Sourav Ranjan Mohapatra
  - Department of Forest Biology and Tree Improvement, Odisha University of Agriculture and Technology, Bhubaneswar, Odisha, 751 003, India
- Ankita Bhardwaj
  - Department of Silviculture and Agroforestry, Kerala Agricultural University, Thrissur, Kerala, 680 656, India
- Umakanta Dash
  - Department of Silviculture and Agroforestry, Kerala Agricultural University, Thrissur, Kerala, 680 656, India
- K Shiran
  - Department of Forest Biology and Tree Improvement, Kerala Agricultural University, Thrissur, Kerala, 680 656, India
- Kajal Samantara
  - Institute of Technology, University of Tartu, 50411, Tartu, Estonia
- Shabir Hussain Wani
  - Mountain Research Center for Field crops, Sher-e-Kashmir University of Agricultural Sciences and Technology Srinagar, Khudwani, Srinagar, Jammu and Kashmir, India
21
Feldmann MJ, Covarrubias-Pazaran G, Piepho HP. Complex traits and candidate genes: estimation of genetic variance components across multiple genetic architectures. G3 (Bethesda) 2023; 13:jkad148. [PMID: 37405459 PMCID: PMC10468314 DOI: 10.1093/g3journal/jkad148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 06/09/2023] [Accepted: 06/12/2023] [Indexed: 07/06/2023]
Abstract
Large-effect loci (those statistically significant loci discovered by genome-wide association studies or linkage mapping) associated with key traits segregate amidst a background of minor, often undetectable, genetic effects in wild and domesticated plants and animals. Accurately attributing mean differences and variance explained to the correct components in the linear mixed model analysis is vital for selecting superior progeny and parents in plant and animal breeding, gene therapy, and medical genetics in humans. Marker-assisted prediction and its successor, genomic prediction, have many advantages for selecting superior individuals and understanding disease risk. However, these two approaches are less often integrated to study complex traits with different genetic architectures. This simulation study demonstrates that the average semivariance can be applied to models incorporating Mendelian, oligogenic, and polygenic terms simultaneously and yields accurate estimates of the variance explained for all relevant variables. Our previous research focused on large-effect loci and polygenic variance separately. This work aims to synthesize and expand the average semivariance framework to various genetic architectures and the corresponding mixed models. This framework independently accounts for the effects of large-effect loci and the polygenic genetic background and is universally applicable to genetics studies in humans, plants, animals, and microbes.
Affiliation(s)
- Mitchell J Feldmann
  - Department of Plant Sciences, University of California Davis, One Shields Ave, Davis, CA 95616, USA
- Giovanny Covarrubias-Pazaran
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, El Batán, 56130 Texcoco, Edo. de México, México
- Hans-Peter Piepho
  - Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart 70599, Germany
22
Zhao F, Zhang P, Wang X, Akdemir D, Garrick D, He J, Wang L. Genetic gain and inbreeding from simulation of different genomic mating schemes for pig improvement. J Anim Sci Biotechnol 2023; 14:87. [PMID: 37309010 DOI: 10.1186/s40104-023-00872-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Accepted: 04/02/2023] [Indexed: 06/14/2023] Open
Abstract
BACKGROUND Genomic selection involves choosing as parents those elite individuals with the higher genomic estimated breeding values (GEBV) to accelerate the speed of genetic improvement in domestic animals. But after multi-generation selection, the rate of inbreeding and the occurrence of homozygous harmful alleles might increase, which would reduce performance and genetic diversity. To mitigate the above problems, we can utilize genomic mating (GM) based upon optimal mate allocation to construct the best genotypic combinations in the next generation. In this study, we used stochastic simulation to investigate the impact of various factors on the efficiencies of GM to optimize pairing combinations after genomic selection of candidates in a pig population. These factors included: the algorithm used to derive inbreeding coefficients; the trait heritability (0.1, 0.3 or 0.5); the kind of GM scheme (focused average GEBV or inbreeding); the approach for computing the genomic relationship matrix (by SNP or runs of homozygosity (ROH)). The outcomes were compared to three traditional mating schemes (random, positive assortative or negative assortative matings). In addition, the performance of the GM approach was tested on real datasets obtained from a Large White pig breeding population. RESULTS Genomic mating outperforms other approaches in limiting the inbreeding accumulation for the same expected genetic gain. The use of ROH-based genealogical relatedness in GM achieved faster genetic gains than using relatedness based on individual SNPs. The GROH-based GM schemes with the maximum genetic gain resulted in 0.9%-2.6% higher rates of genetic gain ΔG, and 13%-83.3% lower ΔF than positive assortative mating regardless of heritability. The rates of inbreeding were always the fastest with positive assortative mating. Results from a purebred Large White pig population confirmed that GM with ROH-based GRM was more efficient than traditional mating schemes.
CONCLUSION Compared with traditional mating schemes, genomic mating can not only achieve sustainable genetic progress but also effectively control the rates of inbreeding accumulation in the population. Our findings demonstrated that breeders should consider using genomic mating for genetic improvement of pigs.
Affiliation(s)
- Fuping Zhao
  - Key Laboratory of Animal Genetics, Breeding and Reproduction (Poultry) of Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Pengfei Zhang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction (Poultry) of Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Xiaoqing Wang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction (Poultry) of Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Deniz Akdemir
  - Center for Blood and Marrow Transplant Research, Minneapolis, MN, USA
- Dorian Garrick
  - AL Rae Centre for Genetics and Breeding, Massey University, Hamilton, 3240, New Zealand
- Jun He
  - College of Animal Science and Biotechnology, Hunan Agricultural University, Changsha, 410128, China
- Lixian Wang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction (Poultry) of Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
23
Wei Y, Naseri A, Zhi D, Zhang S. RaPID-Query for fast identity by descent search and genealogical analysis. Bioinformatics 2023; 39:btad312. [PMID: 37166451 PMCID: PMC10244210 DOI: 10.1093/bioinformatics/btad312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 04/26/2023] [Accepted: 05/09/2023] [Indexed: 05/12/2023] Open
Abstract
MOTIVATION Due to the rapid growth in the size of genetic databases, genealogical search, a process of inferring familial relatedness by identifying DNA matches, has become a viable approach to help individuals find missing family members or law enforcement agencies locate suspects. A fast and accurate method is needed to search an out-of-database individual against millions of individuals. Most existing approaches only offer all-versus-all matching within a panel. Some prototype algorithms offer one-versus-all queries from an out-of-panel individual, but they do not tolerate errors. RESULTS A new method, random projection-based identity-by-descent (IBD) detection (RaPID) query, is introduced to make fast genealogical search possible. RaPID-Query identifies IBD segments between a query haplotype and a panel of haplotypes. By integrating matches over multiple PBWT indexes, RaPID-Query manages to locate IBD segments quickly with a given cutoff length while allowing mismatched sites. A single query against all UK biobank autosomal chromosomes was completed within 2.76 seconds on average, with the minimum length 7 cM and 700 markers. RaPID-Query achieved a 0.016 false negative rate and a 0.012 false positive rate simultaneously on a chromosome 20 sequencing panel having 86 265 sites. This is comparable to the state-of-the-art IBD detection methods TPBWT (out-of-sample) and Hap-IBD. The high-quality IBD segments yielded by RaPID-Query were able to distinguish up to the fourth degree of familial relatedness for a given individual pair, and the area under the receiver operating characteristic curve values are at least 97.28%. AVAILABILITY AND IMPLEMENTATION The RaPID-Query program is available at https://github.com/ucfcbb/RaPID-Query.
Affiliation(s)
- Yuan Wei
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
- Ardalan Naseri
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Degui Zhi
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Shaojie Zhang
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
24
Argyropoulos DC, Tan MH, Adobor C, Mensah B, Labbé F, Tiedje KE, Koram KA, Ghansah A, Day KP. Performance of SNP barcodes to determine genetic diversity and population structure of Plasmodium falciparum in Africa. Front Genet 2023; 14:1071896. [PMID: 37323661 PMCID: PMC10267394 DOI: 10.3389/fgene.2023.1071896] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/17/2022] [Accepted: 05/17/2023] [Indexed: 06/17/2023] Open
Abstract
Panels of informative biallelic single nucleotide polymorphisms (SNPs) have been proposed to be an economical method to fast-track the population genetic analysis of Plasmodium falciparum in malaria-endemic areas. Whilst used successfully in low-transmission areas where infections are monoclonal and highly related, we present the first study to evaluate the performance of these 24- and 96-SNP molecular barcodes in African countries, characterised by moderate-to-high transmission, where multiclonal infections are prevalent. For SNP barcodes it is generally recommended that the SNPs chosen i) are biallelic, ii) have a minor allele frequency greater than 0.10, and iii) are independently segregating, to minimise bias in the analysis of genetic diversity and population structure. Further, to be standardised and used in many population genetic studies, these barcodes should maintain characteristics i) to iii) across various iv) geographies and v) time points. Using haplotypes generated from the MalariaGEN P. falciparum Community Project version six database, we investigated the ability of these two barcodes to fulfil these criteria in moderate-to-high transmission African populations in 25 sites across 10 countries. Predominantly clinical infections were analysed, with 52.3% found to be multiclonal, generating high proportions of mixed-allele calls (MACs) per isolate thereby impeding haplotype construction. Of the 24- and 96-SNPs, loci were removed if they were not biallelic and had low minor allele frequencies in all study populations, resulting in 20- and 75-SNP barcodes respectively for downstream population genetics analysis. Both SNP barcodes had low expected heterozygosity estimates in these African settings and consequently biased analyses of similarity. Both minor and major allele frequencies were temporally unstable. These SNP barcodes were also shown to identify weak genetic differentiation across large geographic distances based on Mantel Test and DAPC. 
These results demonstrate that these SNP barcodes are vulnerable to ascertainment bias and as such cannot be used as a standardised approach for malaria surveillance in moderate-to-high transmission areas in Africa, where the greatest genomic diversity of P. falciparum exists at local, regional and country levels.
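Criteria i) and ii) in the abstract above, together with expected heterozygosity, can be checked per locus along the following lines. This is a minimal sketch, not the authors' pipeline: the function names, the use of `None` for a missing or mixed-allele call, and the sample-size-corrected heterozygosity formula are assumptions made for illustration.

```python
from collections import Counter


def allele_frequencies(calls):
    """Return ({allele: frequency}, n_called) for one locus.

    `calls` holds one allele per isolate; None marks a missing or
    mixed-allele call and is ignored (an assumption of this sketch).
    """
    observed = [c for c in calls if c is not None]
    n = len(observed)
    counts = Counter(observed)
    return {allele: k / n for allele, k in counts.items()}, n


def minor_allele_frequency(calls):
    """Frequency of the rarer allele, or None if the locus is not biallelic."""
    freqs, _ = allele_frequencies(calls)
    if len(freqs) != 2:
        return None
    return min(freqs.values())


def expected_heterozygosity(calls):
    """Sample-size-corrected expected heterozygosity: n/(n-1) * (1 - sum p_i^2)."""
    freqs, n = allele_frequencies(calls)
    if n < 2:
        return None
    return (n / (n - 1)) * (1.0 - sum(p * p for p in freqs.values()))


def passes_barcode_criteria(calls, maf_threshold=0.10):
    """True if the locus is biallelic with MAF above the threshold."""
    maf = minor_allele_frequency(calls)
    return maf is not None and maf > maf_threshold
```

A locus would be retained in one population only if `passes_barcode_criteria` holds there. Criteria iv) and v) amount to repeating the check per site and per time point and comparing the results.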
Affiliation(s)
- Dionne C. Argyropoulos
- Department of Microbiology and Immunology, Bio21 Institute and Peter Doherty Institute, The University of Melbourne, Melbourne, VIC, Australia
| | - Mun Hua Tan
- Department of Microbiology and Immunology, Bio21 Institute and Peter Doherty Institute, The University of Melbourne, Melbourne, VIC, Australia
| | - Courage Adobor
- Department of Parasitology, Noguchi Memorial Institute for Medical Research, College of Health Sciences, University of Ghana, Accra, Ghana
| | - Benedicta Mensah
- Department of Parasitology, Noguchi Memorial Institute for Medical Research, College of Health Sciences, University of Ghana, Accra, Ghana
| | - Frédéric Labbé
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, United States
| | - Kathryn E. Tiedje
- Department of Microbiology and Immunology, Bio21 Institute and Peter Doherty Institute, The University of Melbourne, Melbourne, VIC, Australia
| | - Kwadwo A. Koram
- Epidemiology Department, Noguchi Memorial Institute for Medical Research, University of Ghana, Accra, Ghana
| | - Anita Ghansah
- Department of Parasitology, Noguchi Memorial Institute for Medical Research, College of Health Sciences, University of Ghana, Accra, Ghana
| | - Karen P. Day
- Department of Microbiology and Immunology, Bio21 Institute and Peter Doherty Institute, The University of Melbourne, Melbourne, VIC, Australia
|
25
|
Hou Z, Ochoa A. Genetic association models are robust to common population kinship estimation biases. Genetics 2023; 224:iyad030. [PMID: 36843304 PMCID: PMC10474929 DOI: 10.1093/genetics/iyad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 11/08/2022] [Accepted: 02/17/2023] [Indexed: 02/28/2023] Open
Abstract
Common genetic association models for structured populations, including principal component analysis (PCA) and linear mixed-effects models (LMMs), model the correlation structure between individuals using population kinship matrices, also known as genetic relatedness matrices. However, the most common kinship estimators can have severe biases that were only recently determined. Here we characterize the effect of these kinship biases on genetic association. We employ a large simulated admixed family and genotypes from the 1000 Genomes Project, both with simulated traits, to evaluate key kinship estimators. Remarkably, we find practically invariant association statistics for kinship matrices of different bias types (matching all other features). We then prove using statistical theory and linear algebra that LMM association tests are invariant to these kinship biases, and PCA approximately so. Our proof shows that the intercept and relatedness effect coefficients compensate for the kinship bias, an argument that extends to generalized linear models. As a corollary, association testing is also invariant to changing the reference ancestral population of the kinship matrix. Lastly, we observed that all kinship estimators, except for popkin ratio-of-means, can give improper non-positive semidefinite matrices, which can be problematic although some LMMs handle them surprisingly well, and condition numbers can be used to choose kinship estimators. Overall, we find that existing association studies are robust to kinship estimation bias, and our calculations may help improve association methods by taking advantage of this unexpected robustness, as well as help determine the effects of kinship bias in related problems.
Affiliation(s)
- Zhuoran Hou
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27705, USA
| | - Alejandro Ochoa
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27705, USA
- Duke Center for Statistical Genetics and Genomics, Duke University, Durham, NC 27705, USA
|
26
|
Ghoreishifar M, Vahedi SM, Salek Ardestani S, Khansefid M, Pryce JE. Genome-wide assessment and mapping of inbreeding depression identifies candidate genes associated with semen traits in Holstein bulls. BMC Genomics 2023; 24:230. [PMID: 37138201 PMCID: PMC10157977 DOI: 10.1186/s12864-023-09298-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/06/2023] [Accepted: 04/05/2023] [Indexed: 05/05/2023] Open
Abstract
BACKGROUND The reduction in phenotypic performance of a population due to mating between close relatives is called inbreeding depression. The genetic background of inbreeding depression for semen traits is poorly understood. Thus, the objectives were to estimate the effect of inbreeding and to identify genomic regions underlying inbreeding depression of semen traits including ejaculate volume (EV), sperm concentration (SC), and sperm motility (SM). The dataset comprised ~330 K semen records from ~1.5 K Holstein bulls genotyped with a 50 K single nucleotide polymorphism (SNP) BeadChip. Genomic inbreeding coefficients were estimated using runs of homozygosity (i.e., FROH > 1 Mb) and excess of SNP homozygosity (FSNP). The effect of inbreeding was estimated by regressing phenotypes of semen traits on inbreeding coefficients. Variants associated with inbreeding depression were also detected by regressing phenotypes on the ROH state of the variants. RESULTS Significant inbreeding depression was observed for SC and SM (p < 0.01). A 1% increase in FROH reduced SM and SC by 0.28% and 0.42% of the population mean, respectively. By splitting FROH into different lengths, we found a significant reduction in SC and SM due to longer ROH, which is indicative of more recent inbreeding. A genome-wide association study revealed two signals positioned on BTA 8 associated with inbreeding depression of SC (p < 0.00001; FDR < 0.02). Three candidate genes, GALNTL6, HMGB2, and ADAM29, located in these regions, have established and conserved connections with reproduction and/or male fertility. Moreover, six genomic regions on BTA 3, 9, 21 and 28 were associated with SM (p < 0.0001; FDR < 0.08). These genomic regions contained genes including PRMT6, SCAPER, EDC3, and LIN28B with established connections to spermatogenesis or fertility. CONCLUSIONS Inbreeding depression adversely affects SC and SM, with evidence that longer ROH, or more recent inbreeding, is especially detrimental.
There are genomic regions associated with semen traits that seem to be especially sensitive to homozygosity, and evidence from other studies supports some of them. Breeding companies may wish to consider avoiding homozygosity in these regions for potential artificial insemination sires.
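The ROH-based inbreeding coefficient mentioned in the abstract is commonly computed as the summed length of qualifying ROH segments divided by the length of genome assessed. The sketch below assumes that definition; the function name `f_roh`, its parameters, and the segment representation are hypothetical, and ROH detection itself is not shown.

```python
def f_roh(roh_segments, genome_length_bp, min_length_bp=1_000_000):
    """Fraction of the assessed genome covered by ROH longer than min_length_bp.

    roh_segments: iterable of (start_bp, end_bp) pairs for one animal,
                  assumed non-overlapping (hypothetical input format).
    genome_length_bp: length of the genome covered by markers; which
                  denominator to use is an assumption left to the caller.
    """
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    covered = sum(
        end - start
        for start, end in roh_segments
        if end - start > min_length_bp  # keep only ROH above the threshold
    )
    return covered / genome_length_bp
```

Calling the function with different `min_length_bp` values gives a rough way to split the coefficient by ROH length, with longer segments reflecting more recent inbreeding as the abstract notes.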
Affiliation(s)
- Mohammad Ghoreishifar
- Agriculture Victoria Research, AgriBio, Centre for AgriBioscience, 5 Ring Road, Bundoora, Victoria, 3083, Australia.
- School of Applied Systems Biology, La Trobe University, Bundoora, Victoria, 3083, Australia.
| | - Seyed Milad Vahedi
- Department of Animal Science and Aquaculture, Dalhousie University, Truro, NS, B2N5E3, Canada
| | | | - Majid Khansefid
- Agriculture Victoria Research, AgriBio, Centre for AgriBioscience, 5 Ring Road, Bundoora, Victoria, 3083, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, Victoria, 3083, Australia
| | - Jennie E Pryce
- Agriculture Victoria Research, AgriBio, Centre for AgriBioscience, 5 Ring Road, Bundoora, Victoria, 3083, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, Victoria, 3083, Australia
|
27
|
Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CW, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. bioRxiv 2023:2023.04.07.536093. [PMID: 37066144 PMCID: PMC10104234 DOI: 10.1101/2023.04.07.536093] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 04/18/2023]
Abstract
Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide Association Studies (GWAS) are a powerful way to find genetic loci associated with phenotypes. GWAS are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix given the ARG (local eGRM). Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to identify a large-effect BMI locus, the CREBRF gene, in a sample of Native Hawaiians in which it was not previously detectable by GWAS because of a lack of population-specific imputation resources. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.
Affiliation(s)
- Vivian Link
- Department of Quantitative and Computational Biology, University of Southern California
| | - Joshua G. Schraiber
- Department of Quantitative and Computational Biology, University of Southern California
| | - Caoqi Fan
- Department of Quantitative and Computational Biology, University of Southern California
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
| | - Bryan Dinh
- Department of Quantitative and Computational Biology, University of Southern California
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
| | - Nicholas Mancuso
- Department of Quantitative and Computational Biology, University of Southern California
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
| | - Charleston W.K. Chiang
- Department of Quantitative and Computational Biology, University of Southern California
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
| | - Michael D. Edge
- Department of Quantitative and Computational Biology, University of Southern California
|
28
|
Solovieva E, Sakai H. PSReliP: an integrated pipeline for analysis and visualization of population structure and relatedness based on genome-wide genetic variant data. BMC Bioinformatics 2023; 24:135. [PMID: 37020193 PMCID: PMC10074814 DOI: 10.1186/s12859-023-05169-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 02/02/2023] [Indexed: 04/07/2023] Open
Abstract
BACKGROUND Population structure and cryptic relatedness between individuals (samples) are two major factors affecting false positives in genome-wide association studies (GWAS). In addition, population stratification and genetic relatedness in genomic selection in animal and plant breeding can affect prediction accuracy. The methods commonly used for solving these problems are principal component analysis (to adjust for population stratification) and marker-based kinship estimates (to correct for the confounding effects of genetic relatedness). Currently, many tools and software are available that analyze genetic variation among individuals to determine population structure and genetic relationships. However, none of these tools or pipelines perform such analyses in a single workflow and visualize all the various results in a single interactive web application. RESULTS We developed PSReliP, a standalone, freely available pipeline for the analysis and visualization of population structure and relatedness between individuals in a user-specified genetic variant dataset. The analysis stage of PSReliP is responsible for executing all steps of data filtering and analysis and contains an ordered sequence of commands from PLINK, a whole-genome association analysis toolset, along with in-house shell scripts and Perl programs that support data pipelining. The visualization stage is provided by Shiny apps, an R-based interactive web application. In this study, we describe the characteristics and features of PSReliP and demonstrate how it can be applied to real genome-wide genetic variant data. CONCLUSIONS The PSReliP pipeline allows users to quickly analyze genetic variants such as single nucleotide polymorphisms and small insertions or deletions at the genome level to estimate population structure and cryptic relatedness using PLINK software and to visualize the analysis results in interactive tables, plots, and charts using Shiny technology. 
The analysis and assessment of population stratification and genetic relatedness can aid in choosing an appropriate approach for the statistical analysis of GWAS data and predictions in genomic selection. The various outputs from PLINK can be used for further downstream analysis. The code and manual for PSReliP are available at https://github.com/solelena/PSReliP .
Affiliation(s)
- Elena Solovieva
- Research Center for Advanced Analysis, National Agriculture and Food Research Organization, Tsukuba, Ibaraki, Japan
| | - Hiroaki Sakai
- Research Center for Advanced Analysis, National Agriculture and Food Research Organization, Tsukuba, Ibaraki, Japan.
|
29
|
Ghansah A, Tiedje KE, Argyropoulos DC, Onwona CO, Deed SL, Labbé F, Oduro AR, Koram KA, Pascual M, Day KP. Comparison of molecular surveillance methods to assess changes in the population genetics of Plasmodium falciparum in high transmission. Frontiers in Parasitology 2023; 2:1067966. [PMID: 38031549 PMCID: PMC10686283 DOI: 10.3389/fpara.2023.1067966] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/12/2022] [Accepted: 03/14/2023] [Indexed: 12/01/2023]
Abstract
A major motivation for developing molecular methods for malaria surveillance is to measure the impact of control interventions on the population genetics of Plasmodium falciparum as a potential marker of progress towards elimination. Here we assess three established methods (i) single nucleotide polymorphism (SNP) barcoding (panel of 24-biallelic loci), (ii) microsatellite genotyping (panel of 12-multiallelic loci), and (iii) varcoding (fingerprinting var gene diversity, akin to microhaplotyping) to identify changes in parasite population genetics in response to a short-term indoor residual spraying (IRS) intervention. Typical of high seasonal transmission in Africa, multiclonal infections were found in 82.3% (median 3; range 1-18) and 57.8% (median 2; range 1-12) of asymptomatic individuals pre- and post-IRS, respectively, in Bongo District, Ghana. Since directly phasing multilocus haplotypes for population genetic analysis is not possible for biallelic SNPs and microsatellites, we chose ~200 low-complexity infections biased to single and double clone infections for analysis. Each genotyping method presented a different pattern of change in diversity and population structure as a consequence of variability in usable data and the relative polymorphism of the molecular markers (i.e., SNPs < microsatellites < var). Varcoding and microsatellite genotyping showed the overall failure of the IRS intervention to significantly change the population structure from pre-IRS characteristics (i.e., many diverse genomes of low genetic similarity). The 24-SNP barcode provided limited information for analysis, largely due to the biallelic nature of SNPs leading to a high proportion of double-allele calls and a view of more isolate relatedness compared to microsatellites and varcoding. Relative performance, suitability, and cost-effectiveness of the methods relevant to sample size and local malaria elimination in high-transmission endemic areas are discussed.
Affiliation(s)
- Anita Ghansah
- Department of Parasitology, Noguchi Memorial Institute for Medical Research, University of Ghana, Legon, Ghana
| | - Kathryn E. Tiedje
- Department of Microbiology and Immunology, The University of Melbourne, Bio21 Institute and Peter Doherty Institute, Melbourne, VIC, Australia
| | - Dionne C. Argyropoulos
- Department of Microbiology and Immunology, The University of Melbourne, Bio21 Institute and Peter Doherty Institute, Melbourne, VIC, Australia
| | - Christiana O. Onwona
- Department of Parasitology, Noguchi Memorial Institute for Medical Research, University of Ghana, Legon, Ghana
| | - Samantha L. Deed
- Department of Microbiology and Immunology, The University of Melbourne, Bio21 Institute and Peter Doherty Institute, Melbourne, VIC, Australia
| | - Frédéric Labbé
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, United States
| | - Abraham R. Oduro
- Navrongo Health Research Centre, Ghana Health Service, Navrongo, Ghana
| | - Kwadwo A. Koram
- Epidemiology Department, Noguchi Memorial Institute for Medical Research, University of Ghana, Legon, Ghana
| | - Mercedes Pascual
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, United States
- Santa Fe Institute, Santa Fe, NM, United States
| | - Karen P. Day
- Department of Microbiology and Immunology, The University of Melbourne, Bio21 Institute and Peter Doherty Institute, Melbourne, VIC, Australia
|
30
|
Abstract
Prior to the development of genome-wide arrays and whole genome sequencing technologies, heritability estimation mainly relied on the study of related individuals. Over the past decade, various approaches have been developed to estimate SNP-based narrow-sense heritability ($h_{\mathrm{SNP}}^2$) in unrelated individuals. These latter approaches use either individual-level genetic variations or summary results from genome-wide association studies (GWAS). Recently, several studies compared these approaches using extensive simulations and empirical datasets. However, sparse information on hands-on training necessitates revisiting these approaches from the perspective of a stepwise guide for practical applications. Here, we provide an overview of the commonly used SNP-heritability estimation approaches utilizing genome-wide array, imputed or whole genome data from unrelated individuals, or summary results. We not only discuss these approaches based on their statistical concepts, utility, advantages, and limitations, but also provide step-by-step protocols to apply these approaches. For illustration purposes, we estimate $h_{\mathrm{SNP}}^2$ of height and BMI utilizing individual-level data from The Northern Finland Birth Cohort (NFBC) and summary results from the Genetic Investigation of ANthropometric Traits (GIANT) consortium. We present this review as a template for the researchers who estimate and use heritability in their studies and as a reference for geneticists who develop or extend heritability estimation approaches. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol 1: GREML (GCTA); Alternate Protocol 1: Stratified GREML; Basic Protocol 2: LDAK; Alternate Protocol 2: Stratified LDAK; Basic Protocol 3: Threshold GREML; Basic Protocol 4: LD score (LDSC) regression; Basic Protocol 5: SumHer.
Affiliation(s)
- Amit K. Srivastava
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, USA; The Center for Prevention of Preterm Birth, Perinatal Institute, Cincinnati Children’s Hospital Medical Center, USA; March of Dimes Prematurity Research Center Ohio Collaborative, USA; Department of Pediatrics, University of Cincinnati College of Medicine, USA
| | - Scott M. Williams
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, USA; Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, USA; Institute of Computational Biology, Case Western Reserve University, USA
| | - Ge Zhang
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, USA; The Center for Prevention of Preterm Birth, Perinatal Institute, Cincinnati Children’s Hospital Medical Center, USA; March of Dimes Prematurity Research Center Ohio Collaborative, USA; Department of Pediatrics, University of Cincinnati College of Medicine, USA
|
31
|
Putra AR, Yen JDL, Fournier-Level A. Forecasting trait responses in novel environments to aid seed provenancing under climate change. Mol Ecol Resour 2023; 23:565-580. [PMID: 36308465 DOI: 10.1111/1755-0998.13728] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/03/2021] [Revised: 10/23/2022] [Accepted: 10/27/2022] [Indexed: 11/28/2022]
Abstract
Revegetation projects face the major challenge of sourcing optimal plant material. This is often done with limited information about plant performance and increasingly requires factoring resilience to climate change. Functional traits can be used as quantitative indices of plant performance and guide seed provenancing, but trait values expected under novel conditions are often unknown. To support climate-resilient provenancing efforts, we develop a trait prediction model that integrates the effect of genetic variation with fine-scale temperature variation. We train our model on multiple field plantings of Arabidopsis thaliana and predict two relevant fitness traits-days-to-bolting and fecundity-across the species' European range. Prediction accuracy was high for days-to-bolting and moderate for fecundity, with the majority of trait variation explained by temperature differences between plantings. Projection under future climate predicted a decline in fecundity, although this response was heterogeneous across the range. In response, we identified novel genotypes that could be introduced to genetically offset the fitness decay. Our study highlights the value of predictive models to aid seed provenancing and improve the success of revegetation projects.
Affiliation(s)
- Andhika R Putra
- School of BioSciences, The University of Melbourne, Parkville, Victoria, Australia
| | - Jian D L Yen
- Arthur Rylah Institute for Environmental Research, Heidelberg, Victoria, Australia
| | | |
|
32
|
Mostad P, Tillmar A, Kling D. Improved computations for relationship inference using low-coverage sequencing data. BMC Bioinformatics 2023; 24:90. [PMID: 36894920 PMCID: PMC9999603 DOI: 10.1186/s12859-023-05217-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/06/2022] [Accepted: 03/01/2023] [Indexed: 03/11/2023] Open
Abstract
Pedigree inference, for example determining whether two persons are second cousins or unrelated, can be done by comparing their genotypes at a selection of genetic markers. When the data for one or more of the persons are from low-coverage next generation sequencing (lcNGS), currently available computational methods either ignore genetic linkage or do not take advantage of the probabilistic nature of lcNGS data, relying instead on first estimating the genotype. We provide a method and software (see familias.name/lcNGS) bridging the above gap. Simulations indicate that our results are considerably more accurate than some previously available alternatives. Our method, utilizing a version of the Lander-Green algorithm, uses a group of symmetries to speed up calculations. This group may be of further interest in other calculations involving linked loci.
Affiliation(s)
- Petter Mostad
- Mathematical Sciences, Chalmers University of Technology and the University of Gothenburg, Göteborg, Sweden.
| | - Andreas Tillmar
- Department of Forensic Genetics and Toxicology, National Board of Forensic Medicine, Linköping, Sweden; Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden
| | - Daniel Kling
- Department of Forensic Genetics and Toxicology, National Board of Forensic Medicine, Linköping, Sweden; Department of Forensic Sciences, Oslo University Hospital, Oslo, Norway; Biostatistics (BIAS), Norwegian University of Life Sciences, Ås, Norway
|
33
|
Nyerki E, Kalmár T, Schütz O, Lima RM, Neparáczki E, Török T, Maróti Z. correctKin: an optimized method to infer relatedness up to the 4th degree from low-coverage ancient human genomes. Genome Biol 2023; 24:38. [PMID: 36855115 PMCID: PMC9972692 DOI: 10.1186/s13059-023-02882-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/20/2022] [Accepted: 02/17/2023] [Indexed: 03/02/2023] Open
Abstract
Kinship analysis from very low-coverage ancient sequences has been possible up to the second degree with large uncertainties. We propose a new, accurate, and fast method, correctKin, to estimate the kinship coefficient and the confidence interval using low-coverage ancient data. We perform simulations and also validate correctKin on experimental modern and ancient data with widely different genome coverages (0.12×-11.9×) using samples with known family relations and known/unknown population structure. Based on our results, correctKin allows for the reliable identification of relatedness up to the 4th degree from variable/low-coverage ancient or badly degraded forensic whole genome sequencing data.
Affiliation(s)
- Emil Nyerki
- Department of Pediatrics, University of Szeged Albert Szent-Györgyi Medical Center Faculty of Medicine, Szeged, Hungary
- Department of Archaeogenetics, Institute of Hungarian Research, Budapest, Hungary
| | - Tibor Kalmár
- Department of Pediatrics, University of Szeged Albert Szent-Györgyi Medical Center Faculty of Medicine, Szeged, Hungary
| | - Oszkár Schütz
- Department of Genetics, University of Szeged, Szeged, Hungary
| | - Rui M Lima
- Institute of Plant Biology, Biological Research Centre, Szeged, Hungary
| | - Endre Neparáczki
- Department of Archaeogenetics, Institute of Hungarian Research, Budapest, Hungary
- Department of Genetics, University of Szeged, Szeged, Hungary
| | - Tibor Török
- Department of Archaeogenetics, Institute of Hungarian Research, Budapest, Hungary
- Department of Genetics, University of Szeged, Szeged, Hungary
| | - Zoltán Maróti
- Department of Pediatrics, University of Szeged Albert Szent-Györgyi Medical Center Faculty of Medicine, Szeged, Hungary.
- Department of Archaeogenetics, Institute of Hungarian Research, Budapest, Hungary.
|
34
|
Arambepola R, Bérubé S, Freedman B, Taylor SM, Prudhomme O’Meara W, Obala AA, Wesolowski A. Exploring how space, time, and sampling impact our ability to measure genetic structure across Plasmodium falciparum populations. Frontiers in Epidemiology 2023; 3:1058871. [PMID: 38516334 PMCID: PMC10956351 DOI: 10.3389/fepid.2023.1058871] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/30/2022] [Accepted: 01/18/2023] [Indexed: 03/23/2024]
Abstract
A primary use of malaria parasite genomics is identifying highly related infections to quantify epidemiological, spatial, or temporal factors associated with patterns of transmission. For example, spatial clustering of highly related parasites can indicate foci of transmission and temporal differences in relatedness can serve as evidence for changes in transmission over time. However, for infections in settings of moderate to high endemicity, understanding patterns of relatedness is compromised by complex infections, overall high forces of infection, and a highly diverse parasite population. It is not clear how much these factors limit the utility of using genomic data to better understand transmission in these settings. In particular, further investigation is required to determine which patterns of relatedness we expect to see with high quality, densely sampled genomic data in a high transmission setting and how these observations change under different study designs, missingness, and biases in sample collection. Here we investigate two identity-by-state measures of relatedness and apply them to amplicon deep sequencing data collected as part of a longitudinal cohort in Western Kenya that has previously been analysed to identify individual factors associated with sharing parasites with infected mosquitoes. With these data we use permutation tests to evaluate several hypotheses about spatiotemporal patterns of relatedness compared to a null distribution. We observe evidence of temporal structure, but not of fine-scale spatial structure in the cohort data. To explore factors associated with the lack of spatial structure in these data, we construct a series of simplified simulation scenarios using an agent based model calibrated to entomological, epidemiological and genomic data from this cohort study to investigate whether the lack of spatial structure observed in the cohort could be due to inherent power limitations of this analytical method.
We further investigate how our hypothesis testing behaves under different sampling schemes, levels of completely random and systematic missingness, and different transmission intensities.
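The general idea of testing an identity-by-state (IBS) relatedness pattern against a permutation null can be sketched as follows. This is a simplified illustration, not the study's method: the IBS definition, the function names, and the label-shuffling scheme over pairs are assumptions, and shuffling pair labels ignores the dependence between pairs that share a sample.

```python
import random


def ibs_similarity(hap_a, hap_b):
    """Proportion of loci with identical alleles; loci missing (None) in either sample are skipped."""
    pairs = [(a, b) for a, b in zip(hap_a, hap_b) if a is not None and b is not None]
    if not pairs:
        return None
    return sum(a == b for a, b in pairs) / len(pairs)


def permutation_p_value(values, is_close, n_perm=2000, seed=0):
    """One-sided permutation test: are 'close' pairs more related than 'far' pairs?

    values:   relatedness value for each sample pair
    is_close: True if the pair is close in space or time (hypothetical labels);
              both groups must be non-empty
    """
    def mean_diff(labels):
        close = [v for v, flag in zip(values, labels) if flag]
        far = [v for v, flag in zip(values, labels) if not flag]
        return sum(close) / len(close) - sum(far) / len(far)

    observed = mean_diff(is_close)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    labels = list(is_close)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between label and relatedness
        if mean_diff(labels) >= observed:
            hits += 1
    # add-one correction keeps the estimated p-value strictly positive
    return (hits + 1) / (n_perm + 1)
```

A small p-value suggests that close pairs are more related than expected under random labelling. As the abstract stresses, a large p-value may reflect limited power and does not by itself show that structure is absent.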
Affiliation(s)
- Rohan Arambepola
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, United States
| | - Sophie Bérubé
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, United States
| | - Betsy Freedman
- Division of Infectious Diseases, Duke University Medical Center, Durham, NC, United States
| | - Steve M. Taylor
- Division of Infectious Diseases, Duke University Medical Center, Durham, NC, United States
- Duke Global Health Institute, Durham, NC, United States
| | - Wendy Prudhomme O’Meara
- Division of Infectious Diseases, Duke University Medical Center, Durham, NC, United States
- Duke Global Health Institute, Durham, NC, United States
| | | | - Amy Wesolowski
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, United States
|
35
|
Mary-Huard T, Balding D. Fast and accurate joint inference of coancestry parameters for populations and/or individuals. PLoS Genet 2023; 19:e1010054. [PMID: 36656906 PMCID: PMC9888729 DOI: 10.1371/journal.pgen.1010054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2022] [Revised: 01/31/2023] [Accepted: 12/01/2022] [Indexed: 01/20/2023] Open
Abstract
We introduce a fast, new algorithm for inferring from allele count data the FST parameters describing genetic distances among a set of populations and/or unrelated diploid individuals, and a tree with branch lengths corresponding to FST values. The tree can reflect historical processes of splitting and divergence, but seeks to represent the actual genetic variance as accurately as possible with a tree structure. We generalise two major approaches to defining FST, via correlations and mismatch probabilities of sampled allele pairs, which measure shared and non-shared components of genetic variance. A diploid individual can be treated as a population of two gametes, which allows inference of coancestry coefficients for individuals as well as for populations, or a combination of the two. A simulation study illustrates that our fast method-of-moments estimation of FST values, simultaneously for multiple populations/individuals, gains statistical efficiency over pairwise approaches when the population structure is close to tree-like. We apply our approach to genome-wide genotypes from the 26 worldwide human populations of the 1000 Genomes Project. We first analyse at the population level, then a subset of individuals and in a final analysis we pool individuals from the more homogeneous populations. This flexible analysis approach gives advantages over traditional approaches to population structure/coancestry, including visual and quantitative assessments of long-standing questions about the relative magnitudes of within- and between-population genetic differences.
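As a point of reference for the "mismatch probability" view of FST mentioned in this abstract, the sketch below computes a simple pairwise, ratio-of-sums estimator from allele counts. It is an assumption-laden illustration of the pairwise baseline, not the joint tree-based algorithm the paper introduces. The function names and the input layout `(x1, n1, x2, n2)` (count of one allele and number of sampled alleles in each of two populations, per biallelic locus) are hypothetical.

```python
def mismatch_within(x, n):
    """Probability that two distinct alleles sampled from one population differ."""
    return 2.0 * x * (n - x) / (n * (n - 1))


def mismatch_between(x1, n1, x2, n2):
    """Probability that one allele sampled from each population differ."""
    return (x1 * (n2 - x2) + x2 * (n1 - x1)) / (n1 * n2)


def pairwise_fst(loci):
    """Ratio-of-sums F_ST over loci given as (x1, n1, x2, n2) allele counts.

    Each population needs at least two sampled alleles (n >= 2), and at least
    one locus must show a between-population mismatch, or the ratio is undefined.
    """
    within = between = 0.0
    for x1, n1, x2, n2 in loci:
        within += 0.5 * (mismatch_within(x1, n1) + mismatch_within(x2, n2))
        between += mismatch_between(x1, n1, x2, n2)
    return 1.0 - within / between


# Two populations fixed for different alleles at both loci.
fixed = pairwise_fst([(10, 10, 0, 10), (0, 10, 10, 10)])

# Two diploid individuals, each treated as a "population" of two gametes:
# opposite homozygotes at locus 1, both heterozygous at locus 2.
individuals = pairwise_fst([(2, 2, 0, 2), (1, 2, 1, 2)])
print(fixed, individuals)
```

The second call mirrors the abstract's remark that a diploid individual can be handled as a population of size two. With so few alleles per "population" the estimate is very noisy, which is one motivation for joint inference across many populations or individuals.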
Affiliation(s)
- Tristan Mary-Huard
  - MIA-Paris, INRAE, AgroParisTech, Université Paris-Saclay, Palaiseau, France
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution - Le Moulon, Gif-sur-Yvette, France
- David Balding
  - Melbourne Integrative Genomics, School of BioSciences and School of Mathematics & Statistics, University of Melbourne, Parkville, Victoria, Australia
|
36
|
Lavanchy E, Goudet J. Effect of reduced genomic representation on using runs of homozygosity for inbreeding characterization. Mol Ecol Resour 2023; 23:787-802. [PMID: 36626297 DOI: 10.1111/1755-0998.13755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/01/2022] [Revised: 12/22/2022] [Accepted: 01/05/2023] [Indexed: 01/11/2023]
Abstract
Genomic measures of inbreeding based on identical-by-descent (IBD) segments are increasingly used and are mostly estimated from SNP array and whole-genome sequencing (WGS) data. However, some software packages commonly used for their estimation assume that genomic positions that have not been genotyped are nonvariant. This might be true for WGS data, but not for reduced genomic representations, and can lead to spurious estimates of IBD segments. In this project, we simulated the outputs of WGS, two SNP arrays of different sizes and RAD-sequencing for three populations with different sizes and histories. We compare the results of IBD segment estimation with two software packages: runs of homozygosity (ROHs) estimated with PLINK and homozygous-by-descent (HBD) segments estimated with RZooRoH. We demonstrate that to obtain meaningful estimates of inbreeding, RZooRoH requires a SNP density 11 times smaller than PLINK: ranks of inbreeding coefficients were conserved among individuals above 22 SNPs/Mb for PLINK and 2 SNPs/Mb for RZooRoH. We also show that in populations with simple demographic histories, the distributions of ROHs and HBD segments are correctly estimated with both SNP arrays and WGS. PLINK correctly estimated the distribution of ROHs with SNP densities above 22 SNPs/Mb, while RZooRoH correctly estimated the distribution of HBD segments with SNP densities above 11 SNPs/Mb. However, in a population with a more complex demographic history, RZooRoH estimated the distribution of IBD segments better than PLINK even with WGS data. Consequently, we advise researchers to use either methods relying on excess homozygosity averaged across SNPs or model-based HBD segment calling methods for inbreeding estimation.
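To make the density issue concrete, here is a deliberately simplified run-of-homozygosity scan. It is a sketch under stated assumptions: it is neither PLINK's sliding-window procedure nor RZooRoH's model-based approach, and the thresholds `min_snps` and `min_length`, the function names, and the toy data are hypothetical.

```python
def find_rohs(positions, genotypes, min_snps=3, min_length=1000):
    """Return (start_bp, end_bp) for each run of consecutive homozygous SNPs.

    genotypes are alternate-allele counts: 0 or 2 = homozygous, 1 = heterozygous.
    """
    runs, start = [], None
    for idx, g in enumerate(genotypes):
        homozygous = g in (0, 2)
        if homozygous and start is None:
            start = idx                      # a new run opens here
        elif not homozygous and start is not None:
            runs.append((start, idx - 1))    # a heterozygote closes the run
            start = None
    if start is not None:
        runs.append((start, len(genotypes) - 1))
    # Keep only runs that are long enough in both SNP count and base pairs.
    return [
        (positions[s], positions[e])
        for s, e in runs
        if e - s + 1 >= min_snps and positions[e] - positions[s] >= min_length
    ]


def f_roh(rohs, genome_length):
    """Fraction of the genome covered by the detected runs."""
    return sum(end - start for start, end in rohs) / genome_length


positions = [100, 600, 1200, 2000, 2500, 3100, 4000, 4800]
genotypes = [0, 2, 2, 0, 1, 2, 2, 1]
rohs = find_rohs(positions, genotypes)
froh = f_roh(rohs, genome_length=5000)
print(rohs, froh)
```

Note that every base between the first and last SNP of a run is counted as homozygous. With sparse markers, that is exactly the "ungenotyped positions are nonvariant" assumption the abstract warns about.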
Affiliation(s)
- Eléonore Lavanchy
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Jérôme Goudet
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
|
37
|
Angarita Barajas BK, Cantet RJC, Steibel JP, Schrauf MF, Forneris NS. Heritability estimates and predictive ability for pig meat quality traits using identity-by-state and identity-by-descent relationships in an F2 population. J Anim Breed Genet 2023; 140:13-27. [PMID: 36300585 DOI: 10.1111/jbg.12742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Accepted: 10/05/2022] [Indexed: 12/13/2022]
Abstract
Genomic relationships can be computed with dense genome-wide genotypes through different methods, either based on identity-by-state (IBS) or identity-by-descent (IBD). The latter has been shown to increase the accuracy of both estimated relationships and predicted breeding values. However, it is not clear whether an IBD approach would achieve greater heritability ($h^2$) and predictive ability ($\hat{r}_{y,\hat{y}}$) than its IBS counterpart for data with low-depth pedigrees. Here, we compare both approaches in terms of the estimates of $h^2$ and $\hat{r}_{y,\hat{y}}$, using data on meat quality and carcass traits recorded in experimental crossbred pigs, with a pedigree constrained to only three generations. Three animal models were fitted, which differed in the relationship matrix: an IBS model ($G_{IBS}$), an IBD (defined within the known pedigree) model ($G_{IBD}$), and a pedigree model ($A_{22}$). In 9 of 20 traits, the estimates of $\sigma_u^2$ and $h^2$ were 1.2-2.9 times greater with the $G_{IBS}$ and $G_{IBD}$ models than with $A_{22}$, whereas for all traits both parameters were similar between the genomic models. The $\hat{r}_{y,\hat{y}}$ of the genomic models was higher than that of $A_{22}$. A slight increase in $\hat{r}_{y,\hat{y}}$ was found with $G_{IBS}$ compared to $G_{IBD}$, most likely due to the former recovering sizeable relationships among founder F0 animals.
Affiliation(s)
- Rodolfo J C Cantet
  - Instituto de Investigaciones en Producción Animal (INPA-CONICET-UBA), Buenos Aires, Argentina
  - Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina
- Juan P Steibel
  - Department of Animal Science, Michigan State University, East Lansing, Michigan, USA
  - Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan, USA
- Matias F Schrauf
  - Departamento de Métodos Cuantitativos y Sistemas de Información, Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina
  - Animal Breeding & Genomics, Wageningen Livestock Research, Wageningen University & Research, Wageningen, The Netherlands
- Natalia S Forneris
  - Instituto de Investigaciones en Producción Animal (INPA-CONICET-UBA), Buenos Aires, Argentina
  - Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina
|
38
|
Vigeland MD. Two-locus identity coefficients in pedigrees. G3 (Bethesda, Md.) 2022; 13:6917066. [PMID: 36525359 PMCID: PMC9911075 DOI: 10.1093/g3journal/jkac326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Revised: 09/08/2022] [Accepted: 11/24/2022] [Indexed: 12/23/2022]
Abstract
This paper proposes a solution to a long-standing problem concerning the joint distribution of allelic identity by descent between two individuals at two linked loci. Such distributions have important applications across various fields of genetics, and detailed formulas for selected relationships appear scattered throughout the literature. However, these results were obtained essentially by brute force, with no efficient method available for general pedigrees. The recursive algorithm described in this paper, and its implementation in R, allow efficient calculation of two-locus identity coefficients in any pedigree. As a result, many existing procedures and techniques may, for the first time, be applied to complex and inbred relationships. Two such applications are discussed, concerning the expected likelihood ratio in forensic kinship testing, and variances in realized relatedness.
Affiliation(s)
- Magnus Dehli Vigeland
  - Corresponding author: Department of Medical Genetics, Oslo University Hospital and the University of Oslo, PO Box 4950 Nydalen, 0424 Oslo, Norway.
|
39
|
Davidović S, Marinković S, Hribšek I, Patenković A, Stamenković-Radak M, Tanasković M. Sex ratio and relatedness in the Griffon vulture (Gyps fulvus) population of Serbia. PeerJ 2022; 10:e14477. [PMID: 36523455 PMCID: PMC9745909 DOI: 10.7717/peerj.14477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Accepted: 11/07/2022] [Indexed: 12/12/2022] Open
Abstract
Background Once a widespread species across the region of Southeast Europe, the Griffon vulture is now confined to small and isolated populations across the Balkan Peninsula. The population from Serbia represents its biggest and most viable population that can serve as an important reservoir of genetic diversity from which the birds can be used for the region's reintroduction programmes. The available genetic data for this valuable population are scarce and as a protected species that belongs to the highly endangered vulture group, it needs to be well described so that it can be properly managed and used as a restocking population. Considering the serious recent bottleneck event that the Griffon vulture population from Serbia experienced we estimated the overall relatedness among the birds from this population. Sex ratio, another important parameter that shows the vitality and strength of the population was evaluated as well. Methods During the annual monitoring that was performed in the period from 2013-2021, we collected blood samples from individual birds that were marked in the nests. In total, 169 samples were collected and each was used for molecular sexing while 58 presumably unrelated birds from different nests were used for inbreeding and relatedness analyses. The relatedness was estimated using both biparentally (10 microsatellite loci) and uniparentally (Cytb and D-loop I of mitochondrial DNA) inherited markers. Results The level of inbreeding was relatively high and on average it was 8.3% while the mean number of relatives for each bird was close to three. The sex ratio was close to 1:1 and for the analysed period of 9 years, it didn't demonstrate a statistically significant deviation from the expected ratio of 1:1, suggesting that this is a stable and healthy population. 
Our data suggest that, even though a relatively high level of inbreeding can be detected among the individual birds, the Griffon vulture population from Serbia can be used as a source population for restocking and reintroduction programmes in the region. These data combined with previously observed genetic differentiation between the populations from the Iberian and Balkan Peninsulas suggest that the introduction of foreign birds should be avoided and that local birds should be used instead.
Affiliation(s)
- Slobodan Davidović
  - Department of Genetics of Populations and Ecogenotoxicology, Institute for Biological Research “Siniša Stanković” - National Institute of the Republic of Serbia, University of Belgrade, Belgrade, Serbia
  - Birds of Prey Protection Foundation, Belgrade, Serbia
- Saša Marinković
  - Birds of Prey Protection Foundation, Belgrade, Serbia
  - Department of Ecology, Institute for Biological Research “Siniša Stanković” - National Institute of the Republic of Serbia, University of Belgrade, Belgrade, Serbia
- Irena Hribšek
  - Birds of Prey Protection Foundation, Belgrade, Serbia
  - Natural History Museum Belgrade, Belgrade, Serbia
- Aleksandra Patenković
  - Department of Genetics of Populations and Ecogenotoxicology, Institute for Biological Research “Siniša Stanković” - National Institute of the Republic of Serbia, University of Belgrade, Belgrade, Serbia
- Marina Stamenković-Radak
  - Department of Genetics of Populations and Ecogenotoxicology, Institute for Biological Research “Siniša Stanković” - National Institute of the Republic of Serbia, University of Belgrade, Belgrade, Serbia
  - Faculty of Biology, University of Belgrade, Belgrade, Serbia
- Marija Tanasković
  - Department of Genetics of Populations and Ecogenotoxicology, Institute for Biological Research “Siniša Stanković” - National Institute of the Republic of Serbia, University of Belgrade, Belgrade, Serbia
|
40
|
Wang S, Kim M, Li W, Jiang X, Chen H, Harmanci A. Privacy-aware estimation of relatedness in admixed populations. Brief Bioinform 2022; 23:bbac473. [PMID: 36384083 PMCID: PMC10144692 DOI: 10.1093/bib/bbac473] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/27/2022] [Revised: 09/07/2022] [Accepted: 10/02/2022] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Estimation of genetic relatedness, or kinship, is used occasionally for recreational purposes and in forensic applications. While numerous methods were developed to estimate kinship, they suffer from high computational requirements and often make an untenable assumption of homogeneous population ancestry of the samples. Moreover, genetic privacy is generally overlooked in the usage of kinship estimation methods. There can be ethical concerns about finding unknown familial relationships in third-party databases. Similar ethical concerns may arise while estimating and reporting sensitive population-level statistics such as inbreeding coefficients for the concerns around marginalization and stigmatization. RESULTS Here, we present SIGFRIED, which makes use of existing reference panels with a projection-based approach that simplifies kinship estimation in the admixed populations. We use simulated and real datasets to demonstrate the accuracy and efficiency of kinship estimation. We present a secure federated kinship estimation framework and implement a secure kinship estimator using homomorphic encryption-based primitives for computing relatedness between samples in two different sites while genotype data are kept confidential. Source code and documentation for our methods can be found at https://doi.org/10.5281/zenodo.7053352. CONCLUSIONS Analysis of relatedness is fundamentally important for identifying relatives, in association studies, and for estimation of population-level estimates of inbreeding. As the awareness of individual and group genomic privacy is growing, privacy-preserving methods for the estimation of relatedness are needed. Presented methods alleviate the ethical and privacy concerns in the analysis of relatedness in admixed, historically isolated and underrepresented populations. 
SHORT ABSTRACT Genetic relatedness is a central quantity used for finding relatives in databases, correcting biases in genome wide association studies and for estimating population-level statistics. Methods for estimating genetic relatedness have high computational requirements, and occasionally do not consider individuals from admixed ancestries. Furthermore, the ethical concerns around using genetic data and calculating relatedness are not considered. We present a projection-based approach that can efficiently and accurately estimate kinship. We implement our method using encryption-based techniques that provide provable security guarantees to protect genetic data while kinship statistics are computed among multiple sites.
Affiliation(s)
- Su Wang
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Miran Kim
  - Department of Mathematics, Hanyang University, Seoul, 04763, Republic of Korea
- Wentao Li
  - Center for Secure Artificial intelligence For hEalthcare (SAFE), School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
- Xiaoqian Jiang
  - Center for Secure Artificial intelligence For hEalthcare (SAFE), School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
- Han Chen
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Arif Harmanci
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
|
41
|
Duru S, Altınçekiç ŞÖ, Hanoğlu Oral H. Effectiveness of genetic grouping with different strategies for estimation of genetic parameters in growth traits in Merino lambs. Small Rumin Res 2022. [DOI: 10.1016/j.smallrumres.2022.106835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
42
|
Nilforooshan MA, Ruíz-Flores A. Understanding factors influencing the estimated genetic variance and the distribution of breeding values. Front Genet 2022; 13:1000228. [PMID: 36313459 PMCID: PMC9606665 DOI: 10.3389/fgene.2022.1000228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 09/26/2022] [Indexed: 11/13/2022] Open
Abstract
This study investigated the main factors influencing the genetic variance and the variance of breeding values (EBV). The first is the variance of genetic values in the base population, and the latter is the variance of genetic values in the population under evaluation. These variances are important as improper variances can lead to systematic bias. The inverse of the genetic relationship matrix ($K^{-1}$) and the phenotypic variance are the main factors influencing the genetic variance and heritability ($h^2$). These factors and $h^2$ are also the main factors influencing the variance of EBVs. Pedigree- and genomic-based relationship matrices ($A$ and $G$ as $K$) and phenotypes on 599 wheat lines were used. Also, data were simulated, and a hybrid (genomic-pedigree) relationship matrix ($H$ as $K$) and phenotypes were used. First, matrix $K$ underwent a transformation ($K^* = wK + \alpha \mathbf{1}\mathbf{1}' + \beta I$), and the responses in the mean and variation of diag($K^{-1}$) and offdiag($K^{-1}$) elements, and genetic variance in the form of $h^2$ were recorded. Then, the original $K$ was inverted, and matrix $K^{-1}$ underwent the same transformations as $K$, and the responses in the $h^2$ estimate and the variance of EBVs in the forms of correlation and regression coefficients with the EBVs estimated based on the original $K^{-1}$ were recorded. In response to weighting $K$ by $w$, the estimated genetic variance changed by $1/w$. We found that $\mu(\text{diag}(K)) - \mu(\text{offdiag}(K))$ influences the genetic variance. As such, $\alpha$ did not change the genetic variance, and increasing $\beta$ increased the estimated genetic variance. Weighting $K^{-1}$ by $w$ was equivalent to weighting $K$ by $1/w$. Using the weighted $K^{-1}$ together with its corresponding $h^2$, EBVs remained unchanged, which shows the importance of using variance components that are compatible with the $K^{-1}$. Increasing $\beta I$ added to $K^{-1}$ increased the estimated genetic variance, and the effect of $\alpha \mathbf{1}\mathbf{1}'$ was minor. 
We found that larger variation of diag($K^{-1}$) and higher concentration of offdiag($K^{-1}$) around the mean (0) are responsible for lower $h^2$ estimate and variance of EBVs.
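The role of the quantity μ(diag(K)) − μ(offdiag(K)) under the transformation described in this abstract can be checked with a few lines of arithmetic. The sketch below is an illustration only: the function names and the 3×3 toy matrix are hypothetical, and it tracks just this one summary of K, not the variance-component estimates themselves.

```python
def transform(K, w=1.0, alpha=0.0, beta=0.0):
    """Return K* = w*K + alpha*11' + beta*I for a square matrix given as nested lists."""
    n = len(K)
    return [
        [w * K[i][j] + alpha + (beta if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


def diag_minus_offdiag(K):
    """Mean of the diagonal elements minus mean of the off-diagonal elements."""
    n = len(K)
    mean_diag = sum(K[i][i] for i in range(n)) / n
    mean_off = sum(K[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    return mean_diag - mean_off


# Toy relationship matrix (hypothetical values).
K = [[1.0, 0.5, 0.25],
     [0.5, 1.0, 0.5],
     [0.25, 0.5, 1.0]]

base = diag_minus_offdiag(K)
shifted = diag_minus_offdiag(transform(K, alpha=0.3))        # alpha cancels out
scaled = diag_minus_offdiag(transform(K, w=2.0, beta=0.1))   # becomes w*base + beta
print(base, shifted, scaled)
```

Adding a constant α to every element shifts the diagonal and off-diagonal means equally, so the difference is unchanged. Scaling by w multiplies it by w, and β adds to the diagonal mean only.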
|
43
|
Xu J, Shin J, McGee M, Unger S, Bando N, Sato J, Vandewouw M, Patel Y, Branson HM, Paus T, Pausova Z, O'Connor DL. Intake of mother's milk by very-low-birth-weight infants and variation in DNA methylation of genes involved in neurodevelopment at 5.5 years of age. Am J Clin Nutr 2022; 116:1038-1048. [PMID: 35977396 PMCID: PMC9535521 DOI: 10.1093/ajcn/nqac221] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/27/2021] [Accepted: 08/09/2022] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND Mechanisms responsible for associations between intake of mother's milk in very-low-birth-weight (VLBW, <1500 g) infants and later neurodevelopment are poorly understood. It is proposed that early nutrition may affect neurodevelopmental pathways by altering gene expression through epigenetic modification. Variation in DNA methylation (DNAm) at cytosine-guanine dinucleotides (CpGs) is a commonly studied epigenetic modification. OBJECTIVES We aimed to assess whether early mother's milk intake by VLBW infants is associated with variations in DNAm at 5.5 y, and whether these variations correlate with neurodevelopmental phenotypes. METHODS This cohort study was a 5.5-y follow-up (2016-2018) of VLBW infants born in Ontario, Canada who participated in the Donor Milk for Improved Neurodevelopmental Outcomes trial. We performed an epigenome-wide association study (EWAS) to test whether percentage mother's milk (not including supplemental donor milk) during hospitalization was associated with DNAm in buccal cells during early childhood (n = 143; mean ± SD age: 5.7 ± 0.2 y; birth weight: 1008 ± 517 g). DNAm was assessed with the Illumina Infinium MethylationEPIC array at 814,583 CpGs. In secondary analyses, we tested associations between top-ranked CpGs and measures of early childhood neurodevelopment, e.g., total surface area of the cerebral cortex (n = 41, MRI) and Full-Scale IQ (n = 133, Wechsler Preschool and Primary Scale of Intelligence-IV). RESULTS EWAS analysis demonstrated percentage mother's milk intake by VLBW infants during hospitalization was associated with DNAm at 2 CpGs, cg03744440 [myosin XVB (MYO15B)] and cg00851389 [metallothionein 1A (MT1A)], at 5.5 y (P < 9E-08). Gene set enrichment analysis indicated that top-ranked CpGs (P < 0.001) were annotated to genes enriched in neurodevelopmental biological processes. 
Corroborating these findings, DNAm at several top identified CpGs from the EWAS was associated with cortical surface area and IQ at 5.5 y (P < 0.05). CONCLUSIONS In-hospital percentage mother's milk intake by VLBW infants was associated with variations in DNAm of neurodevelopmental genes at 5.5 y; some of these DNAm variations are associated with brain structure and IQ. This trial was registered at isrctn.com as ISRCTN35317141 and at clinicaltrials.gov as NCT02759809.
Affiliation(s)
- Jingxiong Xu
  - Translational Medicine Program, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- Jean Shin
  - Translational Medicine Program, The Hospital for Sick Children, Toronto, Ontario, Canada
- Meghan McGee
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Sharon Unger
  - Department of Nutritional Sciences, University of Toronto, Toronto, Ontario, Canada
  - Department of Pediatrics, University of Toronto, Toronto, Ontario, Canada
  - Department of Pediatrics, Sinai Health, Toronto, Ontario, Canada
  - Division of Neonatology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Nicole Bando
  - Translational Medicine Program, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Department of Nutritional Sciences, University of Toronto, Toronto, Ontario, Canada
- Julie Sato
  - Department of Diagnostic Imaging, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Department of Psychology, University of Toronto, Toronto, Ontario, Canada
  - Neuroscience & Mental Health Program, The Hospital for Sick Children, Toronto, Ontario, Canada
- Marlee Vandewouw
  - Department of Diagnostic Imaging, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Neuroscience & Mental Health Program, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Autism Research Centre, Bloorview Research Institute, Holland Bloorview Kids Rehabilitation Hospital, Toronto, Ontario, Canada
  - Institute of Biomedical Engineering, University of Toronto, Toronto, Ontario, Canada
- Yash Patel
  - Institute of Medical Sciences, University of Toronto, Toronto, Ontario, Canada
- Helen M Branson
  - Division of Neuroradiology, Department of Medical Imaging, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Department of Medical Imaging, University of Toronto, Toronto, Ontario, Canada
- Tomas Paus
  - Department of Psychology, University of Toronto, Toronto, Ontario, Canada
  - Institute of Medical Sciences, University of Toronto, Toronto, Ontario, Canada
  - Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
  - Department of Psychiatry, Faculty of Medicine and CHU Sainte-Justine, University of Montreal, Montreal, Quebec, Canada
  - Department of Neuroscience, Faculty of Medicine and CHU Sainte-Justine, University of Montreal, Montreal, Quebec, Canada
- Zdenka Pausova
  - Translational Medicine Program, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Department of Nutritional Sciences, University of Toronto, Toronto, Ontario, Canada
  - Department of Physiology, University of Toronto, Toronto, Ontario, Canada
- Deborah L O'Connor
  - Translational Medicine Program, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Department of Nutritional Sciences, University of Toronto, Toronto, Ontario, Canada
  - Department of Pediatrics, Sinai Health, Toronto, Ontario, Canada
|
44
|
Caliebe A, Tekola‐Ayele F, Darst BF, Wang X, Song YE, Gui J, Sebro RA, Balding DJ, Saad M, Dubé M. Including diverse and admixed populations in genetic epidemiology research. Genet Epidemiol 2022; 46:347-371. [PMID: 35842778 PMCID: PMC9452464 DOI: 10.1002/gepi.22492] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/08/2022] [Revised: 05/31/2022] [Accepted: 06/06/2022] [Indexed: 11/25/2022]
Abstract
The inclusion of ancestrally diverse participants in genetic studies can lead to new discoveries and is important to ensure equitable health care benefit from research advances. Here, members of the Ethical, Legal, Social, Implications (ELSI) committee of the International Genetic Epidemiology Society (IGES) offer perspectives on methods and analysis tools for the conduct of inclusive genetic epidemiology research, with a focus on admixed and ancestrally diverse populations in support of reproducible research practices. We emphasize the importance of distinguishing socially defined population categorizations from genetic ancestry in the design, analysis, reporting, and interpretation of genetic epidemiology research findings. Finally, we discuss the current state of genomic resources used in genetic association studies, functional interpretation, and clinical and public health translation of genomic findings with respect to diverse populations.
Affiliation(s)
- Amke Caliebe
  - Institute of Medical Informatics and Statistics, Kiel University and University Hospital Schleswig‐Holstein, Kiel, Germany
- Fasil Tekola‐Ayele
  - Epidemiology Branch, Division of Population Health Research, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, USA
- Burcu F. Darst
  - Center for Genetic Epidemiology, University of Southern California, Los Angeles, California, USA
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Xuexia Wang
  - Department of Mathematics, University of North Texas, Denton, Texas, USA
- Yeunjoo E. Song
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
- Jiang Gui
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, One Medical Center Dr., Lebanon, New Hampshire, USA
- David J. Balding
  - Melbourne Integrative Genomics, Schools of BioSciences and of Mathematics & Statistics, University of Melbourne, Melbourne, Australia
- Mohamad Saad
  - Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
  - Neuroscience Research Center, Faculty of Medical Sciences, Lebanese University, Beirut, Lebanon
- Marie‐Pierre Dubé
  - Department of Medicine, and Social and Preventive Medicine, Université de Montréal, Montréal, Québec, Canada
  - Beaulieu‐Saucier Pharmacogenomics Centre, Montreal Heart Institute, Montreal, Canada
|
45
|
Herzig AF, Ciullo M, Leutenegger AL, Perdry H. Moment estimators of relatedness from low-depth whole-genome sequencing data. BMC Bioinformatics 2022; 23:254. [PMID: 35751014 PMCID: PMC9233360 DOI: 10.1186/s12859-022-04795-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Accepted: 06/09/2022] [Indexed: 11/29/2022] Open
Abstract
Background: Estimating relatedness is an important step for many genetic study designs. A variety of methods for estimating coefficients of pairwise relatedness from genotype data have been proposed. Both the kinship coefficient $\varphi$ and the fraternity coefficient $\psi$ for all pairs of individuals are of interest. However, when dealing with low-depth sequencing or imputation data, individual level genotypes cannot be confidently called. To ignore such uncertainty is known to result in biased estimates. Accordingly, methods have recently been developed to estimate kinship from uncertain genotypes. Results: We present new method-of-moment estimators of both the coefficients $\varphi$ and $\psi$ calculated directly from genotype likelihoods. We have simulated low-depth genetic data for a sample of individuals with extensive relatedness by using the complex pedigree of the known genetic isolates of Cilento in South Italy. Through this simulation, we explore the behaviour of our estimators, demonstrate their properties, and show advantages over alternative methods. A demonstration of our method is given for a sample of 150 French individuals with down-sampled sequencing data. Conclusions: We find that our method can provide accurate relatedness estimates whilst holding advantages over existing methods in terms of robustness, independence from external software, and required computation time. The method presented in this paper is referred to as LowKi (Low-depth Kinship) and has been made available in an R package (https://github.com/genostats/LowKi). Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04795-8.
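For orientation, the classical moment estimator of kinship from confidently called genotypes can be sketched as follows. This is not the LowKi estimator, which the abstract says works directly from genotype likelihoods; it is the standard hard-genotype baseline, written under the assumption that allele frequencies are known. The function name `kinship_moment` and its arguments are hypothetical.

```python
def kinship_moment(g_i, g_j, freqs):
    """Moment estimate of the kinship coefficient phi for one pair.

    g_i, g_j: genotype dosages per SNP (0, 1 or 2 copies of the counted allele).
    freqs:    frequency p of the counted allele per SNP (assumed known).
    """
    total = 0.0
    used = 0
    for a, b, p in zip(g_i, g_j, freqs):
        if not 0.0 < p < 1.0:
            continue  # monomorphic SNPs carry no information and are skipped
        # Standardized product of centred dosages for this SNP.
        total += (a - 2 * p) * (b - 2 * p) / (2 * p * (1 - p))
        used += 1
    if used == 0:
        raise ValueError("no polymorphic SNPs available")
    # The averaged standardized product estimates 2 * phi, hence the factor 1/2.
    return 0.5 * total / used


if __name__ == "__main__":
    # Toy, made-up genotypes at three SNPs; not data from the study.
    print(kinship_moment([2, 1, 0], [2, 1, 1], [0.5, 0.3, 0.2]))
```

With low-depth data the dosages themselves are uncertain, which is the situation the abstract identifies as a source of bias for estimators of this kind.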
Affiliation(s)
- M Ciullo
- Institute of Genetics and Biophysics A. Buzzati-Traverso - CNR, Naples, Italy; IRCCS Neuromed, Pozzilli, Isernia, Italy
- A-L Leutenegger
- Inserm, Université Paris Cité, UMR 1141, NeuroDiderot, 75019, Paris, France
- H Perdry
- CESP Inserm U1018, Université Paris-Saclay, UVSQ, Villejuif, France
|
46
|
Feldmann MJ, Piepho HP, Knapp SJ. Average semivariance directly yields accurate estimates of the genomic variance in complex trait analyses. G3 GENES|GENOMES|GENETICS 2022; 12:6571389. [PMID: 35442424 PMCID: PMC9157152 DOI: 10.1093/g3journal/jkac080] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/27/2021] [Accepted: 03/17/2022] [Indexed: 11/23/2022]
Abstract
Many important traits in plants, animals, and microbes are polygenic and challenging to improve through traditional marker-assisted selection. Genomic prediction addresses this by incorporating all genetic data in a mixed model framework. The primary method for predicting breeding values is genomic best linear unbiased prediction, which uses the realized genomic relationship or kinship matrix (K) to connect genotype to phenotype. Genomic relationship matrices share information among entries to estimate the observed entries’ genetic values and predict unobserved entries’ genetic values. One of the main parameters of such models is genomic variance ($\sigma_g^2$), or the variance of a trait associated with a genome-wide sample of DNA polymorphisms, and genomic heritability ($h_g^2$); however, the seminal papers introducing different forms of K often do not discuss their effects on the model estimated variance components despite their importance in genetic research and breeding. Here, we discuss the effect of several standard methods for calculating the genomic relationship matrix on estimates of $\sigma_g^2$ and $h_g^2$. With current approaches, we found that the genomic variance tends to be either overestimated or underestimated depending on the scaling and centering applied to the marker matrix (Z), the value of the average diagonal element of K, and the assortment of alleles and heterozygosity (H) in the observed population. Using the average semivariance, we propose a new matrix, $K_{ASV}$, that directly yields accurate estimates of $\sigma_g^2$ and $h_g^2$ in the observed population and produces best linear unbiased predictors equivalent to routine methods in plants and animals.
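The abstract does not spell out how $K_{ASV}$ is built, so the following is only a sketch of the underlying quantity. It assumes the usual definition of the average semivariance of a covariance-like matrix K for n entries, namely half the variance of a pairwise difference averaged over all pairs, which simplifies to (tr(K) - sum(K)/n) / (n - 1). The rescaling step, which divides K by that value so that the result has unit average semivariance, is an illustrative assumption, and both function names are hypothetical.

```python
def average_semivariance(K):
    """Average semivariance of a symmetric n x n matrix given as nested lists.

    Equals the mean over all pairs i < j of (K[i][i] + K[j][j] - 2 * K[i][j]) / 2.
    """
    n = len(K)
    if n < 2:
        raise ValueError("need at least two entries")
    trace = sum(K[i][i] for i in range(n))
    total = sum(sum(row) for row in K)
    return (trace - total / n) / (n - 1)


def rescale_to_unit_asv(K):
    """Divide K by its average semivariance (assumed scaling, for illustration)."""
    asv = average_semivariance(K)
    if asv <= 0:
        raise ValueError("average semivariance must be positive")
    return [[value / asv for value in row] for row in K]


if __name__ == "__main__":
    # Toy 3 x 3 relationship matrix with made-up values.
    K = [[1.1, 0.2, 0.0],
         [0.2, 0.9, 0.1],
         [0.0, 0.1, 1.0]]
    print(average_semivariance(K))
    print(average_semivariance(rescale_to_unit_asv(K)))
```

Under this reading, the dependence on the average diagonal element of K that the abstract mentions shows up through the trace term.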
Affiliation(s)
- Mitchell J Feldmann
- Department of Plant Sciences, University of California, Davis, CA 95616, USA
- Hans-Peter Piepho
- Biostatistics Unit, Institute of Crop Science, University of Hohenheim, 70593 Stuttgart, Germany
- Steven J Knapp
- Department of Plant Sciences, University of California, Davis, CA 95616, USA
|
47
|
Fan C, Mancuso N, Chiang CWK. A genealogical estimate of genetic relationships. Am J Hum Genet 2022; 109:812-824. [PMID: 35417677 PMCID: PMC9118131 DOI: 10.1016/j.ajhg.2022.03.016] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/24/2021] [Accepted: 03/25/2022] [Indexed: 12/23/2022] Open
Abstract
The application of genetic relationships among individuals, characterized by a genetic relationship matrix (GRM), has far-reaching effects in human genetics. However, the current standard to calculate the GRM treats linked markers as independent and does not explicitly model the underlying genealogical history of the study sample. Here, we propose a coalescent-informed framework, namely the expected GRM (eGRM), to infer the expected relatedness between pairs of individuals given an ancestral recombination graph (ARG) of the sample. Through extensive simulations, we show that the eGRM is an unbiased estimate of latent pairwise genome-wide relatedness and is robust when computed with ARG inferred from incomplete genetic data. As a result, the eGRM better captures the structure of a population than the canonical GRM, even when using the same genetic information. More importantly, our framework allows a principled approach to estimate the eGRM at different time depths of the ARG, thereby revealing the time-varying nature of population structure in a sample. When applied to SNP array genotypes from a population sample from Northern and Eastern Finland, we find that clustering analysis with the eGRM reveals population structure driven by subpopulations that would not be apparent via the canonical GRM and that temporally the population model is consistent with recent divergence and expansion. Taken together, our proposed eGRM provides a robust tree-centric estimate of relatedness with wide application to genetic studies.
Affiliation(s)
- Caoqi Fan
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Nicholas Mancuso
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Charleston W K Chiang
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
|
48
|
Speed D, Kaphle A, Balding DJ. SNP-based heritability and selection analyses: Improved models and new results. Bioessays 2022; 44:e2100170. [PMID: 35279859 DOI: 10.1002/bies.202100170] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/07/2021] [Revised: 03/02/2022] [Accepted: 03/03/2022] [Indexed: 01/15/2023]
Abstract
Complex-trait genetics has advanced dramatically through methods to estimate the heritability tagged by SNPs, both genome-wide and in genomic regions of interest such as those defined by functional annotations. The models underlying many of these analyses are inadequate, and consequently many SNP-heritability results published to date are inaccurate. Here, we review the modelling issues, both for analyses based on individual genotype data and association test statistics, highlighting the role of a low-dimensional model for the heritability of each SNP. We use state-of-the-art models to present updated results about how heritability is distributed with respect to functional annotations in the human genome, and how it varies with allele frequency, which can reflect purifying selection. Our results give finer detail to the picture that has emerged in recent years of complex trait heritability widely dispersed across the genome. Confounding due to population structure remains a problem that summary statistic analyses cannot reliably overcome. Also see the video abstract here: https://youtu.be/WC2u03V65MQ.
Affiliation(s)
- Doug Speed
- Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark; Aarhus Institute of Advanced Studies, Aarhus University, Aarhus, Denmark; UCL Genetics Institute, University College London, London, UK
- Anubhav Kaphle
- Melbourne Integrative Genomics, School of BioSciences and School of Mathematics and Statistics, University of Melbourne, Victoria, Australia
- David J Balding
- UCL Genetics Institute, University College London, London, UK; Melbourne Integrative Genomics, School of BioSciences and School of Mathematics and Statistics, University of Melbourne, Victoria, Australia
|
49
|
Building a Calibration Set for Genomic Prediction, Characteristics to Be Considered, and Optimization Approaches. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2467:77-112. [PMID: 35451773 DOI: 10.1007/978-1-0716-2205-6_3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 10/25/2022]
Abstract
The efficiency of genomic selection strongly depends on the prediction accuracy of the genetic merit of candidates. Numerous papers have shown that the composition of the calibration set is a key contributor to prediction accuracy. A poorly defined calibration set can result in low accuracies, whereas an optimized one can considerably increase accuracy compared to random sampling at the same size. Alternatively, optimizing the calibration set can be a way of decreasing the costs of phenotyping by enabling similar levels of accuracy compared to random sampling but with fewer phenotypic units. We present here the different factors that have to be considered when designing a calibration set, and review the different criteria proposed in the literature. We classified these criteria into two groups: model-free criteria based on relatedness, and criteria derived from the linear mixed model. We introduce criteria targeting specific prediction objectives including the prediction of highly diverse panels, biparental families, or hybrids. We also review different ways of updating the calibration set, and different procedures for optimizing phenotyping experimental designs.
|
50
|
Guo Y, Zhang W, He R, Zheng C, Liu X, Ge M, Xu J. Investigating the Association Between rs2439302 Polymorphism and Thyroid Cancer: A Systematic Review and Meta-Analysis. Front Surg 2022; 9:877206. [PMID: 35558387 PMCID: PMC9086625 DOI: 10.3389/fsurg.2022.877206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2022] [Accepted: 03/15/2022] [Indexed: 11/30/2022] Open
Abstract
Background and Aims: The extent of surgical treatment for most patients with thyroid cancer (TC) remains controversial and varies widely. As an emerging technology, genetic testing facilitates tumor typing and disease progression monitoring and is expected to influence the choice of surgical approach for patients with TC. Recent genome-wide association studies (GWASs) have identified that rs2439302 (8p12) variants near NRG1 are associated with TC risk; however, the results remain inconclusive. Therefore, we aimed to perform a meta-analysis to clarify the association between rs2439302 variants and the risk of TC. Methods: We searched PubMed, Scopus, Embase, Web of Science, and the Cochrane Library for eligible studies up to July 2021. We analyzed the pooled OR and the corresponding 95% confidence interval (95% CI) of the included studies and then conducted a subgroup analysis according to ethnicity. We also performed a sensitivity analysis to validate the findings. Results: This meta-analysis finally included 7 studies involving 6,090 cases and 14,461 controls. Results showed that the G allele of the rs2439302 polymorphism was a significant risk factor for TC in the Allele (G/C), Dominant (GG+GC/CC), Recessive (GG/GC+CC), Homozygote (GG/CC), and Heterozygote (GC/CC) models, with pooled ORs of 1.38 (95% CI, 1.31–1.45), 1.51 (95% CI, 1.41–1.62), 1.52 (95% CI, 1.40–1.66), 1.90 (95% CI, 1.71–2.10), and 1.40 (95% CI, 1.30–1.51), respectively. The subgroup analysis showed that the rs2439302 polymorphism was associated with higher TC risk in different ethnicities, with OR > 1. The sensitivity analysis showed that the results remained stable when any included study was omitted. Conclusions: The study revealed that rs2439302 variants were associated with higher TC risk and may have a major influence on the choice of operative approach for patients with TC.
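As a rough illustration of how a pooled OR and its 95% CI can be obtained from per-study results, the sketch below uses fixed-effect inverse-variance pooling on the log odds ratio scale. The abstract does not say which pooling model the authors used, so the choice of a fixed-effect model is an assumption, the function name `pooled_odds_ratio` is hypothetical, and the study values in the usage example are invented for illustration only.

```python
import math


def pooled_odds_ratio(studies, z=1.96):
    """Fixed-effect inverse-variance pooling of odds ratios.

    studies: iterable of (odds_ratio, ci_lower, ci_upper) with 95% CIs.
    Returns (pooled_or, pooled_ci_lower, pooled_ci_upper).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for odds_ratio, lower, upper in studies:
        log_or = math.log(odds_ratio)
        # Recover the standard error of log(OR) from the width of the 95% CI.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1.0 / se ** 2
        weighted_sum += weight * log_or
        weight_total += weight
    pooled_log_or = weighted_sum / weight_total
    pooled_se = math.sqrt(1.0 / weight_total)
    return (
        math.exp(pooled_log_or),
        math.exp(pooled_log_or - z * pooled_se),
        math.exp(pooled_log_or + z * pooled_se),
    )


if __name__ == "__main__":
    # Invented per-study odds ratios with 95% CIs; not the studies in this meta-analysis.
    example = [(1.30, 1.10, 1.54), (1.45, 1.20, 1.75), (1.38, 1.05, 1.81)]
    print(pooled_odds_ratio(example))
```

A leave-one-out sensitivity analysis of the kind mentioned in the abstract would call the same function repeatedly, each time dropping one study from the list.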
Affiliation(s)
- Yawen Guo
- Department of Head and Neck Surgery, Otolaryngology & Head and Neck Center, Cancer Center, Zhejiang Provincial People's Hospital (Affiliated People's Hospital, Hangzhou Medical College), Hangzhou, China
- Department of Public Health, Zhejiang University School of Medicine, Hangzhou, China
- Key Laboratory of Endocrine Gland Diseases of Zhejiang Province, Hangzhou, China
- Wanchen Zhang
- Second Clinical Medical College, Zhejiang Chinese Medical University, Hangzhou, China
- Ru He
- School of Basic Medical Sciences and Forensic Medicine, Hangzhou Medical College, Hangzhou, China
- Chuanming Zheng
- Department of Head and Neck Surgery, Otolaryngology & Head and Neck Center, Cancer Center, Zhejiang Provincial People's Hospital (Affiliated People's Hospital, Hangzhou Medical College), Hangzhou, China
- Key Laboratory of Endocrine Gland Diseases of Zhejiang Province, Hangzhou, China
- Xuefeng Liu
- Neck and Breast Department 3, Tumour Hospital of Mudanjiang City, Mudanjiang, China
- Minghua Ge
- Department of Head and Neck Surgery, Otolaryngology & Head and Neck Center, Cancer Center, Zhejiang Provincial People's Hospital (Affiliated People's Hospital, Hangzhou Medical College), Hangzhou, China
- Key Laboratory of Endocrine Gland Diseases of Zhejiang Province, Hangzhou, China
- Jiajie Xu
- Department of Head and Neck Surgery, Otolaryngology & Head and Neck Center, Cancer Center, Zhejiang Provincial People's Hospital (Affiliated People's Hospital, Hangzhou Medical College), Hangzhou, China
- Key Laboratory of Endocrine Gland Diseases of Zhejiang Province, Hangzhou, China
- *Correspondence: Jiajie Xu
|