1
Peláez P, Lorenzana GP, Baesen K, Montes JR, De La Torre AR. Spatially heterogeneous selection and inter-varietal differentiation maintain population structure and local adaptation in a widespread conifer. BMC Ecol Evol 2024; 24:117. [PMID: 39227766 PMCID: PMC11373507 DOI: 10.1186/s12862-024-02304-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 08/28/2024] [Indexed: 09/05/2024]
Abstract
BACKGROUND Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco) plays a critical role in the ecology and economy of Western North America. This conifer species comprises two distinct varieties: the coastal variety (var. menziesii) along the Pacific coast, and the interior variety (var. glauca) spanning the Rocky Mountains into Mexico, with instances of inter-varietal hybridization in Washington and British Columbia. Recent investigations have focused on assessing environmental pressures shaping Douglas-fir's genomic variation for a better understanding of its evolutionary and adaptive responses. Here, we characterize range-wide population structure, estimate inter-varietal hybridization levels, identify candidate loci for climate adaptation, and forecast shifts in species and variety distribution under future climates. RESULTS Using a custom SNP-array, we genotyped 540 trees revealing four distinct clusters with asymmetric admixture patterns in the hybridization zone. Higher genetic diversity observed in coastal and hybrid populations contrasts with lower diversity in inland populations of the southern Rockies and Mexico, exhibiting a significant isolation by distance pattern, with less marked but still significant isolation by environment. For both varieties, we identified candidate loci associated with local adaptation, with hundreds of genes linked to processes such as stimulus response, reactions to chemical compounds, and metabolic functions. Ecological niche modeling revealed contrasting potential distribution shifts among the varieties in the coming decades, with interior populations projected to lose habitat and become more vulnerable, while coastal populations are expected to gain suitable areas. 
CONCLUSIONS Overall, our findings provide crucial insights into the population structure and adaptive potential of Douglas-fir, with the coastal variety being the most likely to preserve its evolutionary path throughout the present century. These findings carry implications for the conservation and management of this species across its range.
Affiliation(s)
- Pablo Peláez
  - School of Forestry, Northern Arizona University, Flagstaff, AZ, USA
- Kailey Baesen
  - School of Forestry, Northern Arizona University, Flagstaff, AZ, USA
- Jose Ruben Montes
  - Instituto de Biología, Universidad Nacional Autónoma de México, Ciudad de México, México
2
Mendoza-Maya E, Giles-Pérez GI, Vargas-Hernández JJ, Sáenz-Romero C, Martínez-Trujillo M, de Los Angeles Beltrán-Nambo M, Hernández-Díaz JC, Prieto-Ruíz JÁ, Jaramillo-Correa JP, Wehenkel C. Evolutionary drivers of reproductive fitness in two endangered forest trees. The New Phytologist 2024. [PMID: 39187985 DOI: 10.1111/nph.20073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Accepted: 08/06/2024] [Indexed: 08/28/2024]
Abstract
Population genetics theory predicts a relationship between fitness, genetic diversity (H0) and effective population size (Ne), which is often tested through heterozygosity-fitness correlations (HFCs). We tested whether population and individual fertility and heterozygosity are correlated in two endangered Mexican spruces (Picea martinezii and Picea mexicana) by combining genomic, demographic and reproductive data (seed development and germination traits). For both species, there was a positive correlation between population size and seed development traits, but not germination rate. Individual genome-wide heterozygosity and seed traits were only correlated in P. martinezii (general-effects HFC), and none of the candidate single nucleotide polymorphisms (SNPs) associated with individual fertility showed heterozygote advantage in any species (no local-effects HFC). We observed a single and recent (c. 30 thousand years ago (ka)) population decline for P. martinezii; the collapse of P. mexicana occurred in two phases separated by a long period of stability (c. 800 ka). Recruitment always contributed more to total population census than adult trees in P. mexicana, while this was only the case in the largest populations of P. martinezii. Equating fitness to either H0 or Ne, as traditionally proposed in conservation biology, might not always be adequate, as species-specific evolutionary factors can decouple the expected correlation between these parameters.
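As a rough illustration of the general-effects HFC test described in this abstract, the sketch below computes per-individual heterozygosity from genotype calls and correlates it with a seed trait. The genotype coding (0/1/2 alternate-allele counts, `None` for missing), the function names, and the toy values are all assumptions made for illustration; they are not the authors' data or pipeline.

```python
from math import sqrt


def individual_heterozygosity(genotypes):
    """Fraction of called loci that are heterozygous.

    genotypes: list of alternate-allele counts (0, 1 or 2); None marks a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(1 for g in called if g == 1) / len(called)


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: four trees, six SNPs each, and one seed trait per tree.
trees = {
    "t1": [0, 1, 1, 2, 1, None],
    "t2": [0, 0, 1, 2, 2, 0],
    "t3": [1, 1, 1, 1, 0, 2],
    "t4": [0, 0, 0, 2, 2, 2],
}
seed_trait = {"t1": 0.62, "t2": 0.35, "t3": 0.70, "t4": 0.20}

ids = sorted(trees)
het = [individual_heterozygosity(trees[i]) for i in ids]
fit = [seed_trait[i] for i in ids]
r = pearson_r(het, fit)
print(f"heterozygosity-fitness correlation r = {r:.3f}")
```

A real analysis would also need a significance test and some control for population structure, which this sketch omits.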
Affiliation(s)
- Eduardo Mendoza-Maya
  - Programa Institucional de Doctorado en Ciencias Agropecuarias y Forestales, Universidad Juárez del Estado de Durango, 34000, Durango, Mexico
- Gustavo Ibrahim Giles-Pérez
  - Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, 04510, Ciudad de Mexico, Mexico
- J Jesús Vargas-Hernández
  - Postgrado en Ciencias Forestales, Colegio de Postgraduados, Montecillo, Texcoco, 56264, Estado de México, Mexico
- Cuauhtémoc Sáenz-Romero
  - Instituto de Investigaciones sobre los Recursos Naturales, Universidad Michoacana de San Nicolás de Hidalgo, Morelia, 58330, Michoacán, Mexico
- Miguel Martínez-Trujillo
  - Facultad de Biología, Universidad Michoacana de San Nicolás de Hidalgo, Morelia, 58030, Michoacán, Mexico
- José Ciro Hernández-Díaz
  - Instituto de Silvicultura e Industria de la Madera, Universidad Juárez del Estado de Durango, 34120, Durango, Mexico
- José Ángel Prieto-Ruíz
  - Facultad de Ciencias Forestales y Ambientales, Universidad Juárez del Estado de Durango, 34120, Durango, Mexico
- Juan P Jaramillo-Correa
  - Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, 04510, Ciudad de Mexico, Mexico
- Christian Wehenkel
  - Instituto de Silvicultura e Industria de la Madera, Universidad Juárez del Estado de Durango, 34120, Durango, Mexico
3
Tenhunen S, Thomasen JR, Sørensen LP, Berg P, Kargo M. Genomic analysis of inbreeding and coancestry in Nordic Jersey and Holstein dairy cattle populations. J Dairy Sci 2024; 107:5897-5912. [PMID: 38608951 DOI: 10.3168/jds.2023-24553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 03/01/2024] [Indexed: 04/14/2024]
Abstract
In recent years, genomic selection (GS) has accelerated genetic gain in dairy cattle breeds worldwide. Despite the evident genetic progress, several dairy populations have also encountered challenges such as heightened inbreeding rates and reduced effective population sizes. The challenge has been to find a balance between achieving substantial genetic gain while managing genetic diversity within the population, thereby mitigating the negative effects of inbreeding depression. This study aims to elucidate the impact of GS on pedigree and genomic rates of inbreeding (ΔF) and coancestry (ΔC) in Nordic Jersey (NJ) and Holstein (NH) cattle populations. Furthermore, key genetic metrics, including the generation interval (L), effective population size (Ne), and future effective population size (FNe) were assessed between 2 time periods, before and after GS, and across distinct animal cohorts in both breeds: females, bulls, and approved semen-producing bulls (AI-sires). Analysis of ΔF and ΔC revealed distinct trends across the studied periods and animal groups. Notably, there was a consistent increase in yearly ΔF for most animal groups in both breeds. An exception was observed in NH AI-sires, which demonstrated a slight decrease in yearly ΔF. Moreover, NJ displayed minimal changes in yearly ΔC between the periods, whereas NH exhibited elevated ΔC values across all animal groups. Particularly striking was the substantial increase in yearly ΔC within the NH female population, surging from 0.02% to 0.39% between the periods. Implementation of GS resulted in a reduction of the generation interval across all animal cohorts in both NJ and NH breeds. However, the extent of reduction was more pronounced in males compared with females. This reduction in generation interval influenced generational changes in ΔF and ΔC. Bulls and AI-sires of both breeds exhibited reduced generational ΔF between periods, in contrast to females that demonstrated an opposing pattern. 
Between the periods, NJ maintained a relatively stable Ne (29.4 before and 30.3 after GS), whereas NH experienced a notable decline from 54.3 to 42.8. Female groups in both breeds displayed a negative Ne trend, whereas males demonstrated either neutral or positive Ne developments. Regarding FNe, NJ exhibited positive FNe development with an increase from 40.7 to 57.2. The opposite was observed in NH, where FNe decreased from 198.8 to 42.7. In summary, it was evident that the genomic methods could detect differences between the populations and changes in ΔF and ΔC more efficiently than pedigree methods. Implementation of GS yielded positive outcomes within the NJ population regarding the rate of coancestry but the opposite was observed with NH. Moreover, analysis of ΔC data hints at the potential to decrease future ΔF through informed mating strategies. Conversely, NH faces more pressing concerns, even though ΔF remains comparatively modest in contrast to what has been observed in other Holstein populations. These findings underscore the necessity of genomic control of inbreeding and coancestry with strategic changes in the Nordic breeding schemes for dairy to ensure long-term sustainability in the forthcoming years.
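The abstract reports Ne alongside rates of inbreeding but does not state how one was derived from the other. A minimal sketch, assuming the classical relationship Ne = 1/(2ΔF) with ΔF expressed per generation, and assuming a yearly rate is converted to a generational rate by compounding over the generation interval L (function names are hypothetical):

```python
def ne_from_delta_f(delta_f_per_generation):
    """Effective population size from the per-generation rate of inbreeding.

    Uses the classical Ne = 1 / (2 * dF); assumed here, not taken from the paper.
    """
    if delta_f_per_generation <= 0:
        raise ValueError("rate of inbreeding must be positive")
    return 1.0 / (2.0 * delta_f_per_generation)


def delta_f_from_ne(ne):
    """Inverse of ne_from_delta_f: the per-generation rate implied by a given Ne."""
    if ne <= 0:
        raise ValueError("Ne must be positive")
    return 1.0 / (2.0 * ne)


def generational_rate(yearly_rate, generation_interval_years):
    """Compound a yearly rate over a generation interval L given in years."""
    return 1.0 - (1.0 - yearly_rate) ** generation_interval_years


# Per-generation rates implied by the Ne values quoted in the abstract,
# under the assumed formula only.
for label, ne in [("NJ before GS", 29.4), ("NJ after GS", 30.3),
                  ("NH before GS", 54.3), ("NH after GS", 42.8)]:
    print(f"{label}: Ne = {ne}, implied dF per generation = {delta_f_from_ne(ne):.4%}")
```

The same function applies to coancestry: substituting ΔC for ΔF gives the kind of forward-looking quantity the abstract calls FNe, again under the assumption that the same formula was used.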
Affiliation(s)
- S Tenhunen
  - Aarhus University, Center for Quantitative Genetics and Genomics, 8000 Aarhus, Denmark; VikingGenetics, 8960 Randers SØ, Denmark
- P Berg
  - Norwegian University of Life Sciences, NMBU, 1433 Ås, Norway
- M Kargo
  - Aarhus University, Center for Quantitative Genetics and Genomics, 8000 Aarhus, Denmark; VikingGenetics, 8960 Randers SØ, Denmark
4
Zhu Z, Lin R, Zhao B, Shi W, Cai Q, Zhang L, Xin Q, Li L, Miao Z, Zhou S, Huang Z, Huang Q, Zheng N. Whole-genome resequencing revealed the population structure and selection signal of 4 indigenous Chinese laying ducks. Poult Sci 2024; 103:103832. [PMID: 38781766 PMCID: PMC11145554 DOI: 10.1016/j.psj.2024.103832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Revised: 04/20/2024] [Accepted: 05/02/2024] [Indexed: 05/25/2024]
Abstract
The assessment of animal genetic structure is of significant importance for the preservation and breeding of animal germplasm resources. Selection signals are genotype markers generated during the process of biological evolution, and the detection of selection signals can reveal the direction of species evolution. The aim of this study was to generate whole-genome resequencing data from Jinding duck, Shanma duck, Youxian Partridge duck, and Taiwan Brown tsaiya duck to reveal their population structure and selection signals. The population structure analysis revealed significant genetic differences among the 4 indigenous laying ducks, indicating their independent lineage. Specifically, Shanma duck and Youxian Partridge duck were closely related and likely originated from a common ancestor. In addition, selection sweep analysis was performed using the population genetic differentiation coefficient (Fst) and nucleotide diversity ratio (π ratio). The top 5% was used as the threshold for the Fst and π ratio, and the 2 thresholds were combined to identify selected genomic regions. In the selected regions of the 3 comparison groups, 136, 143, and 268 candidate genes were detected. Further screening of all candidate genes revealed that 35 candidate genes appeared simultaneously in the 3 comparison groups, with 16 genes annotated. The 16 genes were analyzed by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses. The results revealed 5 functional genes (AQP3, PIK3C3, NOL6, RPP25, and DCTN3) that may be related to important economic traits in laying ducks and are involved mainly in vasopressin-regulated water reabsorption, ribosome biogenesis, and the PI3K signaling pathway. The results provide insights into the protection and exploitation of genetic resources of Chinese indigenous laying ducks.
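The thresholding step in this abstract (top 5% of Fst combined with top 5% of the π ratio) can be sketched as follows. The window records, field names, and toy values are hypothetical, and the sketch assumes the π ratio is oriented so that larger values indicate reduced diversity in the population under selection; the abstract does not specify window size or ratio direction.

```python
def top_fraction_cutoff(values, fraction=0.05):
    """Smallest value that still belongs to the top `fraction` of `values`."""
    ranked = sorted(values, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # number of windows kept
    return ranked[k - 1]


def candidate_windows(windows, fraction=0.05):
    """IDs of windows that pass both the Fst cutoff and the pi-ratio cutoff."""
    fst_cut = top_fraction_cutoff([w["fst"] for w in windows], fraction)
    pi_cut = top_fraction_cutoff([w["pi_ratio"] for w in windows], fraction)
    return [w["id"] for w in windows
            if w["fst"] >= fst_cut and w["pi_ratio"] >= pi_cut]


# Hypothetical toy scan of 40 genomic windows.
windows = [{"id": i, "fst": i / 100, "pi_ratio": 1 + i / 10} for i in range(40)]
windows[38]["pi_ratio"] = 0.5   # high Fst but unremarkable diversity ratio
windows[10]["pi_ratio"] = 9.0   # extreme diversity ratio but low Fst

selected = candidate_windows(windows)
print("windows passing both cutoffs:", selected)
```

Requiring both statistics to be extreme is stricter than either cutoff alone, which is why windows 10 and 38 in the toy data are excluded.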
Affiliation(s)
- Zhiming Zhu
  - Institute of Animal Husbandry and Veterinary Medicine, Fujian Academy of Agricultural Sciences/ Fujian Key Laboratory of Animal Genetics and Breeding, Fuzhou 350013, China
- Ruiyi Lin
  - College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Bangzhe Zhao
  - Institute of Animal Husbandry and Veterinary Medicine, Fujian Academy of Agricultural Sciences/ Fujian Key Laboratory of Animal Genetics and Breeding, Fuzhou 350013, China; College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Wenli Shi
  - Institute of Animal Husbandry and Veterinary Medicine, Fujian Academy of Agricultural Sciences/ Fujian Key Laboratory of Animal Genetics and Breeding, Fuzhou 350013, China; College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Qiannan Cai
  - Institute of Animal Husbandry and Veterinary Medicine, Fujian Academy of Agricultural Sciences/ Fujian Key Laboratory of Animal Genetics and Breeding, Fuzhou 350013, China; College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Linli Zhang
  - Institute of Animal Husbandry and Veterinary Medicine, Fujian Academy of Agricultural Sciences/ Fujian Key Laboratory of Animal Genetics and Breeding, Fuzhou 350013, China
- Qingwu Xin
  - Institute of Animal Husbandry and Veterinary Medicine, Fujian Academy of Agricultural Sciences/ Fujian Key Laboratory of Animal Genetics and Breeding, Fuzhou 350013, China
- Li Li
  - Institute of Animal Husbandry and Veterinary Medicine, Fujian Academy of Agricultural Sciences/ Fujian Key Laboratory of Animal Genetics and Breeding, Fuzhou 350013, China
- Zhongwei Miao
  - Institute of Animal Husbandry and Veterinary Medicine, Fujian Academy of Agricultural Sciences/ Fujian Key Laboratory of Animal Genetics and Breeding, Fuzhou 350013, China
- Shiyi Zhou
  - Seed Industry Development Center of Shishi, Shishi 362700, China
- Zhongbin Huang
  - Seed Industry Development Center of Shishi, Shishi 362700, China
- Qinlou Huang
  - Institute of Animal Husbandry and Veterinary Medicine, Fujian Academy of Agricultural Sciences/ Fujian Key Laboratory of Animal Genetics and Breeding, Fuzhou 350013, China
- Nenzhu Zheng
  - Institute of Animal Husbandry and Veterinary Medicine, Fujian Academy of Agricultural Sciences/ Fujian Key Laboratory of Animal Genetics and Breeding, Fuzhou 350013, China
5
Lawson DJ, Howard-McCombe J, Beaumont M, Senn H. How admixed captive breeding populations could be rescued using local ancestry information. Mol Ecol 2024:e17349. [PMID: 38634332 DOI: 10.1111/mec.17349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 12/21/2023] [Accepted: 02/26/2024] [Indexed: 04/19/2024]
Abstract
This paper asks the question: can genomic information be used to recover a species that is already on the pathway to extinction due to genetic swamping from a related and more numerous population? We show that a breeding strategy in a captive breeding program can use whole genome sequencing to identify and remove segments of DNA introgressed through hybridisation. The proposed policy uses a generalized measure of kinship or heterozygosity accounting for local ancestry, that is, whether a specific genetic location was inherited from the target of conservation. We then show that optimizing these measures would minimize undesired ancestry while also controlling kinship and/or heterozygosity, in a simulated breeding population. The process is applied to real data representing the hybridized Scottish wildcat breeding population, with the result that it should be possible to breed out domestic cat ancestry. The ability to reverse introgression is a powerful tool brought about through the combination of sequencing with computational advances in ancestry estimation. Since it works best when applied early in the process, important decisions need to be made about which genetically distinct populations should benefit from it and which should be left to reform into a single population.
Affiliation(s)
- Daniel J Lawson
  - Institute of Statistical Sciences, School of Mathematics, University of Bristol, Bristol, UK
- Jo Howard-McCombe
  - RZSS WildGenes Laboratory, Conservation Department, Royal Zoological Society of Scotland, Edinburgh, UK
- Mark Beaumont
  - School of Biological Sciences, University of Bristol, Bristol, UK
- Helen Senn
  - RZSS WildGenes Laboratory, Conservation Department, Royal Zoological Society of Scotland, Edinburgh, UK
6
Baguma JK, Mukasa SB, Nuwamanya E, Alicai T, Omongo CA, Ochwo-Ssemakula M, Ozimati A, Esuma W, Kanaabi M, Wembabazi E, Baguma Y, Kawuki RS. Identification of Genomic Regions for Traits Associated with Flowering in Cassava (Manihot esculenta Crantz). Plants (Basel, Switzerland) 2024; 13:796. [PMID: 38592820 PMCID: PMC10974989 DOI: 10.3390/plants13060796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 01/25/2024] [Accepted: 01/26/2024] [Indexed: 04/11/2024]
Abstract
Flowering in cassava (Manihot esculenta Crantz) is crucial for the generation of botanical seed for breeding. However, genotypes preferred by most farmers are erect and poor at flowering or never flower. To elucidate the genetic basis of flowering, 293 diverse cassava accessions were evaluated for flowering-associated traits at two locations and seasons in Uganda. Genotyping using the Diversity Array Technology Pty Ltd. (DArTseq) platform identified 24,040 single-nucleotide polymorphisms (SNPs) distributed on the 18 cassava chromosomes. Population structure analysis using principal components (PCs) and kinships showed three clusters; the first five PCs accounted for 49.2% of the observed genetic variation. Linkage disequilibrium (LD) estimation averaged 0.32 at a distance of ~2850 kb (kilo base pairs). Polymorphism information content (PIC) and minor allele frequency (MAF) were 0.25 and 0.23, respectively. A genome-wide association study (GWAS) analysis uncovered 53 significant marker-trait associations (MTAs) with flowering-associated traits involving 27 loci. Two loci, SNPs S5_29309724 and S15_11747301, were associated with all the traits. Using five of the 27 SNPs with a Phenotype_Variance_Explained (PVE) ≥ 5%, 44 candidate genes were identified in the peak SNP sites located within 50 kb upstream or downstream, with most associated with branching traits. Eight of the genes, orthologous to Arabidopsis and other plant species, had known functional annotations related to flowering, e.g., eukaryotic translation initiation factor and myb family transcription factor. This study identified genomic regions associated with flowering-associated traits in cassava, and the identified SNPs can be useful in marker-assisted selection to overcome hybridization challenges, like unsynchronized flowering, and candidate gene validation.
Affiliation(s)
- Julius K. Baguma
  - School of Agricultural Sciences, Makerere University, Kampala P.O. Box 7062, Uganda
  - National Crops Resources Research Institute, Namulonge (NaCRRI), Kampala P.O. Box 7084, Uganda
- Settumba B. Mukasa
  - School of Agricultural Sciences, Makerere University, Kampala P.O. Box 7062, Uganda
- Ephraim Nuwamanya
  - School of Agricultural Sciences, Makerere University, Kampala P.O. Box 7062, Uganda
  - National Crops Resources Research Institute, Namulonge (NaCRRI), Kampala P.O. Box 7084, Uganda
- Titus Alicai
  - National Crops Resources Research Institute, Namulonge (NaCRRI), Kampala P.O. Box 7084, Uganda
- Christopher Abu Omongo
  - National Crops Resources Research Institute, Namulonge (NaCRRI), Kampala P.O. Box 7084, Uganda
  - National Agricultural Research Organisation (NARO), Entebbe P.O. Box 295, Uganda
- Mildred Ochwo-Ssemakula
  - School of Agricultural Sciences, Makerere University, Kampala P.O. Box 7062, Uganda
- Alfred Ozimati
  - National Crops Resources Research Institute, Namulonge (NaCRRI), Kampala P.O. Box 7084, Uganda
  - School of Biological Sciences, Makerere University, Kampala P.O. Box 7062, Uganda
- Williams Esuma
  - National Crops Resources Research Institute, Namulonge (NaCRRI), Kampala P.O. Box 7084, Uganda
  - National Agricultural Research Organisation (NARO), Entebbe P.O. Box 295, Uganda
- Michael Kanaabi
  - National Crops Resources Research Institute, Namulonge (NaCRRI), Kampala P.O. Box 7084, Uganda
- Enoch Wembabazi
  - National Crops Resources Research Institute, Namulonge (NaCRRI), Kampala P.O. Box 7084, Uganda
- Yona Baguma
  - National Agricultural Research Organisation (NARO), Entebbe P.O. Box 295, Uganda
- Robert S. Kawuki
  - National Crops Resources Research Institute, Namulonge (NaCRRI), Kampala P.O. Box 7084, Uganda
  - National Agricultural Research Organisation (NARO), Entebbe P.O. Box 295, Uganda
7
Aalbers SE, Weir BS. Sequence-based population structure, relatedness, and inbreeding estimates for forensic autosomal STR markers. Forensic Sci Int Genet 2024; 69:103009. [PMID: 38237274 DOI: 10.1016/j.fsigen.2024.103009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 12/11/2023] [Accepted: 01/11/2024] [Indexed: 01/29/2024]
Abstract
Population data have become available for sequence data to aid forensic investigations and prepare the forensic community in the move towards implementing NGS methods. This comes with a need for updated population genetic parameters estimates to allow DNA evidence evaluations using sequence data. Initial work has been done on a small sample and here we expand this work by providing estimates of population structure and relatedness for autosomal STR data generated by sequencing technologies. We also discuss the effect of inbreeding on forensic calculations and discuss why the use of genotypic-based estimates may be preferred over allelic-based estimates.
Affiliation(s)
- Sanne E Aalbers
  - Institute for Public Health Genetics, University of Washington, Seattle, WA, USA; Department of Biostatistics, University of Washington, Seattle, WA, USA
- Bruce S Weir
  - Institute for Public Health Genetics, University of Washington, Seattle, WA, USA; Department of Biostatistics, University of Washington, Seattle, WA, USA
8
Guan Y, Levy D. Estimation of inbreeding and kinship coefficients via latent identity-by-descent states. Bioinformatics 2024; 40:btae082. [PMID: 38364309 PMCID: PMC10902678 DOI: 10.1093/bioinformatics/btae082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 01/15/2024] [Accepted: 02/12/2024] [Indexed: 02/18/2024]
Abstract
MOTIVATION Estimating the individual inbreeding coefficient and pairwise kinship is an important problem in human genetics (e.g. in disease mapping) and in animal and plant genetics (e.g. inbreeding design). Existing methods, such as sample correlation-based genetic relationship matrix, KING, and UKin, are either biased, or not able to estimate inbreeding coefficients, or produce a large proportion of negative estimates that are difficult to interpret. This limitation of existing methods is partly due to failure to explicitly model inbreeding. Since all humans are inbred to various degrees by virtue of shared ancestries, it is prudent to account for inbreeding when inferring kinship between individuals. RESULTS We present "Kindred," an approach that estimates inbreeding and kinship by modeling latent identity-by-descent states that account for all possible allele sharing, including inbreeding, between two individuals. Kindred uses a non-negative least squares method to fit the model, which not only increases computational efficiency compared to the maximum likelihood method, but also guarantees non-negativity of the kinship estimates. Through simulation, we demonstrate the high accuracy and non-negativity of kinship estimates by Kindred. By selecting a subset of SNPs that are similar in allele frequencies across different continental populations, Kindred can accurately estimate kinship between admixed samples. In addition, we demonstrate that the realized kinship matrix estimated by Kindred is effective in reducing genomic control values via linear mixed model in genome-wide association studies. Finally, we demonstrate that Kindred produces sensible heritability estimates on an Australian height dataset. AVAILABILITY AND IMPLEMENTATION Kindred is implemented in C with multi-threading. It takes a VCF file or stream as input and works seamlessly with bcftools. Kindred is freely available at https://github.com/haplotype/kindred.
Affiliation(s)
- Yongtao Guan
  - Framingham Heart Study, Framingham, MA 01702, United States
  - Population Sciences Branch, National Heart, Lung, and Blood Institute, Bethesda, MD 20892, United States
- Daniel Levy
  - Framingham Heart Study, Framingham, MA 01702, United States
  - Population Sciences Branch, National Heart, Lung, and Blood Institute, Bethesda, MD 20892, United States
9
Cui R, Wu J, Yan K, Luo S, Hu Y, Feng W, Lu B, Wang J. Phased genome assemblies reveal haplotype-specific genetic load in the critically endangered Chinese Bahaba (Teleostei, Sciaenidae). Mol Ecol 2024; 33:e17250. [PMID: 38179694 DOI: 10.1111/mec.17250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Revised: 12/06/2023] [Accepted: 12/11/2023] [Indexed: 01/06/2024]
Abstract
While haplotype-specific genetic load shapes the evolutionary trajectory of natural and captive populations, mixed-haplotype assembly and genotyping hindered its characterization in diploids. Herein, we produced two phased genome assemblies of the critically endangered fish Chinese Bahaba (Bahaba taipingensis, Sciaenidae, Teleostei) and resequenced 20 whole genomes to quantify population genetic load at a haplotype level. We identified frame-shifting variants as the most deleterious type, followed by mutations in the 5'-UTR, 3'-UTR and missense mutations at conserved amino acids. Phased haplotypes revealed gene deletions and high-impact deleterious variants. We estimated ~1.12% of genes missing or interrupted per haplotype, with a significant overlap of disrupted genes (30.35%) between haplotype sets. Relative proportions of deleterious variant categories differed significantly between haplotypes. Simulations suggested that purifying selection struggled to purge slightly deleterious genetic load in captive breeding compared to genotyping interventions, and that higher inter-haplotypic variance of genetic load predicted more efficient purging by artificial selection. Combining the knowledge of haplotype-resolved genetic load with predictive modelling will be immensely useful for understanding the evolution of deleterious variants and guiding conservation planning.
Affiliation(s)
- Rongfeng Cui
  - School of Ecology & State Key Laboratory of Biocontrol, Sun Yat-sen University, Shenzhen, China
  - Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai, China
- Jinxian Wu
  - Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring, Guangdong-Macao Joint Laboratory for Aquaculture Breeding Development and Innovation, School of Life Sciences, South China Normal University, Guangzhou, China
- Kuoqiu Yan
  - Huangjing Marine Biotechnology Co. Ltd., Huizhou, China
- Sujun Luo
  - Dongguan Forestry Affairs Center, Dongguan, China
- Yuting Hu
  - Dongguan Forestry Affairs Center, Dongguan, China
- Wei Feng
  - Dongguan Forestry Affairs Center, Dongguan, China
- Bingqian Lu
  - Dongguan Forestry Affairs Center, Dongguan, China
- Junjie Wang
  - Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring, Guangdong-Macao Joint Laboratory for Aquaculture Breeding Development and Innovation, School of Life Sciences, South China Normal University, Guangzhou, China
10
Tsouris A, Brach G, Schacherer J, Hou J. Non-additive genetic components contribute significantly to population-wide gene expression variation. Cell Genomics 2024; 4:100459. [PMID: 38190102 PMCID: PMC10794783 DOI: 10.1016/j.xgen.2023.100459] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/31/2023] [Revised: 09/19/2023] [Accepted: 11/09/2023] [Indexed: 01/09/2024]
Abstract
Gene expression variation, an essential step between genotype and phenotype, is collectively controlled by local (cis) and distant (trans) regulatory changes. Nevertheless, how these regulatory elements differentially influence gene expression variation remains unclear. Here, we bridge this gap by analyzing the transcriptomes of a large diallel panel consisting of 323 unique hybrids originating from genetically divergent Saccharomyces cerevisiae isolates. Our analysis across 5,087 transcript abundance traits showed that non-additive components account for 36% of the gene expression variance on average. By comparing allele-specific read counts in parent-hybrid trios, we found that trans-regulatory changes underlie the majority of gene expression variation in the population. Remarkably, most cis-regulatory variations are also exaggerated or attenuated by additional trans effects. Overall, we showed that the transcriptome is globally buffered at the genetic level mainly due to trans-regulatory variation in the population.
Affiliation(s)
- Andreas Tsouris
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Gauthier Brach
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Joseph Schacherer
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France; Institut Universitaire de France (IUF), Paris, France
- Jing Hou
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
11
Goudet J, Weir BS. An allele-sharing, moment-based estimator of global, population-specific and population-pair FST under a general model of population structure. PLoS Genet 2023; 19:e1010871. [PMID: 38011288 PMCID: PMC10703327 DOI: 10.1371/journal.pgen.1010871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 12/07/2023] [Accepted: 10/31/2023] [Indexed: 11/29/2023] Open
Abstract
Being able to properly quantify genetic differentiation is key to understanding the evolutionary potential of a species. One central parameter in this context is FST, the mean coancestry within populations relative to the mean coancestry between populations. Researchers have been estimating FST globally or between pairs of populations for a long time. More recently, it has been proposed to estimate population-specific FST values, and population-pair mean relative coancestry. Here, we review the several definitions and estimation methods of FST, and stress that they provide values relative to a reference population. We show the good statistical properties of an allele-sharing, method of moments based estimator of FST (global, population-specific and population-pair) under a very general model of population structure. We point to the limitation of existing likelihood and Bayesian estimators when the populations are not independent. Last, we show that recent attempts to estimate absolute, rather than relative, mean coancestry fail to do so.
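Illustrative sketch (not the authors' implementation): the abstract describes FST as within-population coancestry measured relative to between-population coancestry. One common way to turn that into a moment estimator is to compare allele-sharing (matching) probabilities within and between populations, FST ≈ (M_W − M_B) / (1 − M_B). The function name `allele_sharing_fst`, the input layout, and the restriction to biallelic loci with a ratio of averages across loci are assumptions made here for illustration only.

```python
from itertools import combinations


def allele_sharing_fst(counts):
    """Global FST from allele sharing (simplified, hypothetical helper).

    counts[l][i] = (x, n): x copies of the reference allele among n sampled
    alleles (n >= 2) at biallelic locus l in population i.
    """
    sum_within = 0.0
    sum_between = 0.0
    for locus in counts:
        # Within-population matching: probability that two distinct sampled
        # alleles from the same population are of the same type.
        within = [
            (x * (x - 1) + (n - x) * (n - x - 1)) / (n * (n - 1))
            for x, n in locus
        ]
        # Between-population matching: probability that one allele drawn from
        # each of two different populations are of the same type.
        between = [
            (xi / ni) * (xj / nj) + (1 - xi / ni) * (1 - xj / nj)
            for (xi, ni), (xj, nj) in combinations(locus, 2)
        ]
        sum_within += sum(within) / len(within)
        sum_between += sum(between) / len(between)
    m_w = sum_within / len(counts)
    m_b = sum_between / len(counts)
    # Undefined (division by zero) if every locus is fixed for the same
    # allele in all populations, because then m_b == 1.
    return (m_w - m_b) / (1.0 - m_b)


# Two populations fixed for alternative alleles at one locus: complete
# differentiation, so the estimate is 1.0.
fixed = [[(10, 10), (0, 10)]]
print(allele_sharing_fst(fixed))  # 1.0
```

Because the reference point is the average between-population sharing, the value is relative rather than absolute, and it can be slightly negative when populations share alleles as much between as within.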
Affiliation(s)
- Jerome Goudet
- Dept Ecology & Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of BioInformatics, University of Lausanne, Lausanne, Switzerland
- Bruce S. Weir
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
12
Garcia-Erill G, Hanghøj K, Heller R, Wiuf C, Albrechtsen A. Estimating admixture pedigrees of recent hybrids without a contiguous reference genome. Mol Ecol Resour 2023; 23:1604-1619. [PMID: 37400991 DOI: 10.1111/1755-0998.13830] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/16/2023] [Revised: 05/30/2023] [Accepted: 06/15/2023] [Indexed: 07/05/2023]
Abstract
The genome of recently admixed individuals or hybrids has characteristic genetic patterns that can be used to learn about their recent admixture history. One of these are patterns of interancestry heterozygosity, which can be inferred from SNP data from either called genotypes or genotype likelihoods, without the need for information on genomic location. This makes them applicable to a wide range of data that are often used in evolutionary and conservation genomic studies, such as low-depth sequencing mapped to scaffolds and reduced representation sequencing. Here we implement maximum likelihood estimation of interancestry heterozygosity patterns using two complementary models. We furthermore develop apoh (Admixture Pedigrees of Hybrids), a software that uses estimates of paired ancestry proportions to detect recently admixed individuals or hybrids, and to suggest possible admixture pedigrees. It furthermore calculates several hybrid indices that make it easier to identify and rank possible admixture pedigrees that could give rise to the estimated patterns. We implemented apoh both as a command line tool and as a Graphical User Interface that allows the user to automatically and interactively explore, rank and visualize compatible recent admixture pedigrees, and calculate the different summary indices. We validate the performance of the method using admixed family trios from the 1000 Genomes Project. In addition, we show its applicability on identifying recent hybrids from RAD-seq data of Grant's gazelle (Nanger granti and Nanger petersii) and whole genome low-depth data of waterbuck (Kobus ellipsiprymnus) which shows complex admixture of up to four populations.
Affiliation(s)
- Kristian Hanghøj
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Rasmus Heller
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Carsten Wiuf
- Department of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark
13
Cavedon M, Neufeld L, Finnegan L, Hervieux D, Michalak A, Pelletier A, Polfus J, Schwantje H, Skinner G, Steenweg R, Thacker C, Poissant J, Musiani M. Genomics of founders for conservation breeding: the Jasper caribou case. Conserv Genet 2023; 24:855-867. [PMID: 37969360 PMCID: PMC10638200 DOI: 10.1007/s10592-023-01540-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Accepted: 06/07/2023] [Indexed: 11/17/2023]
Abstract
Conservation breeding programs are increasingly used as recovery actions for wild animals; bringing founders into captivity to rear captive populations for future reintroduction into the wild. The International Union for the Conservation of Nature recommends that founders should come from genetically close populations and should have sufficient genetic diversity to avoid mating among relatives. Genomic data are highly informative for evaluating founders due to their high resolution and ability to capture adaptive divergence, yet, their application in that context remains limited. Woodland caribou are federally listed as a Species at Risk in Canada, with several populations facing extirpation, such as those in the Rocky Mountains of Alberta and British Columbia (BC). To prevent local extirpation, Jasper National Park (JNP) is proposing a conservation breeding program. We examined single nucleotide polymorphisms for 144 caribou from 11 populations encompassing a 200,000 km² area surrounding JNP to provide information useful for identifying appropriate founders for this program. We found that this area likely hosts a caribou metapopulation historically characterized by high levels of gene flow, which indicates that multiple sources of founders would be appropriate for initiating a breeding program. However, population structure and adaptive divergence analyses indicate that JNP caribou are closest to populations in the BC Columbia range, which also have suitable genetic diversity for conservation breeding. We suggest that collaboration among jurisdictions would be beneficial to implement the program to promote recovery of JNP caribou and possibly other caribou populations in the surrounding area, which is strategically at the periphery of the distribution of this endangered species. Supplementary Information The online version contains supplementary material available at 10.1007/s10592-023-01540-3.
Affiliation(s)
- Maria Cavedon
- Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4 Canada
- Lalenia Neufeld
- Jasper National Park of Canada, Parks Canada, Jasper, Canada
- Laura Finnegan
- fRI Research, 1176 Switzer Drive, Hinton, AB T7V 1V3 Canada
- Dave Hervieux
- Fish and Wildlife Stewardship Branch, Alberta Environment and Protected Areas, Grande Prairie, AB T8V 6J4 Canada
- Anita Michalak
- Faculty of Veterinary Medicine, University of Calgary, Calgary, AB T2N 1N4 Canada
- Agnes Pelletier
- Ministry of Land, Water and Resource Stewardship Northeast Region, 400-10003-110th Avenue, Fort St. John, BC V1J 6M7 Canada
- Jean Polfus
- Canadian Wildlife Service – Pacific Region, Environment and Climate Change Canada, 1238 Discovery Ave, Kelowna, BC V1V 1V9 Canada
- Helen Schwantje
- Wildlife and Habitat Branch, Ministry of Forests, Lands, Natural Resource Operations and Rural Development, Government of British Columbia, 2080 Labieux Road, Nanaimo, BC V9T 6J9 Canada
- Geoff Skinner
- Jasper National Park of Canada, Parks Canada, Jasper, Canada
- Robin Steenweg
- Canadian Wildlife Service – Pacific Region, Environment and Climate Change Canada, 1238 Discovery Ave, Kelowna, BC V1V 1V9 Canada
- Caeley Thacker
- Wildlife and Habitat Branch, Ministry of Forests, Lands, Natural Resource Operations and Rural Development, Government of British Columbia, 2080 Labieux Road, Nanaimo, BC V9T 6J9 Canada
- Jocelyn Poissant
- Faculty of Veterinary Medicine, University of Calgary, Calgary, AB T2N 1N4 Canada
- Marco Musiani
- Dipartimento Scienze Biologiche Geologiche Ambientali, Università Di Bologna, Via Zamboni, 33 - 40126 Bologna, Italia
14
LaPierre N, Fu B, Turnbull S, Eskin E, Sankararaman S. Leveraging family data to design Mendelian randomization that is provably robust to population stratification. Genome Res 2023; 33:1032-1041. [PMID: 37197991 PMCID: PMC10538495 DOI: 10.1101/gr.277664.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 04/16/2023] [Indexed: 05/19/2023]
Abstract
Mendelian randomization (MR) has emerged as a powerful approach to leverage genetic instruments to infer causality between pairs of traits in observational studies. However, the results of such studies are susceptible to biases owing to weak instruments, as well as the confounding effects of population stratification and horizontal pleiotropy. Here, we show that family data can be leveraged to design MR tests that are provably robust to confounding from population stratification, assortative mating, and dynastic effects. We show in simulations that our approach, MR-Twin, is robust to confounding from population stratification and is not affected by weak instrument bias, whereas standard MR methods yield inflated false positive rates. We then conduct an exploratory analysis of MR-Twin and other MR methods applied to 121 trait pairs in the UK Biobank data set. Our results suggest that confounding from population stratification can lead to false positives for existing MR methods, whereas MR-Twin is immune to this type of confounding, and that MR-Twin can help assess whether traditional approaches may be inflated owing to confounding from population stratification.
Affiliation(s)
- Nathan LaPierre
- Department of Computer Science, University of California Los Angeles, Los Angeles, California 90095, USA
- Boyang Fu
- Department of Computer Science, University of California Los Angeles, Los Angeles, California 90095, USA
- Steven Turnbull
- Department of Statistics, University of California Los Angeles, Los Angeles, California 90095, USA
- Eleazar Eskin
- Department of Computer Science, University of California Los Angeles, Los Angeles, California 90095, USA
- Department of Computational Medicine, University of California Los Angeles, Los Angeles, California 90095, USA
- Department of Human Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
- Sriram Sankararaman
- Department of Computer Science, University of California Los Angeles, Los Angeles, California 90095, USA
- Department of Computational Medicine, University of California Los Angeles, Los Angeles, California 90095, USA
- Department of Human Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
15
He L, Luo J, Niu S, Bai D, Chen Y. Population structure analysis to explore genetic diversity and geographical distribution characteristics of wild tea plant in Guizhou Plateau. BMC Plant Biology 2023; 23:255. [PMID: 37189087 DOI: 10.1186/s12870-023-04239-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Accepted: 04/21/2023] [Indexed: 05/17/2023]
Abstract
BACKGROUND Tea, the second largest consumer beverage in the world after water, is widely cultivated in tropical and subtropical areas. However, the effect of environmental factors on the distribution of wild tea plants is unclear. RESULTS A total of 159 wild tea plants were collected from different altitudes and geological types of the Guizhou Plateau. Using the genotyping-by-sequencing method, 98,241 high-quality single nucleotide polymorphisms were identified. Genetic diversity, population structure analysis, principal component analysis, phylogenetic analysis, and linkage disequilibrium were performed. The genetic diversity of the wild tea plant population from the Silicate Rock Classes of Camellia gymnogyna was higher than that from the Carbonate Rock Classes of Camellia tachangensis. In addition, the genetic diversity of wild tea plants from the second altitude gradient was significantly higher than that of wild tea plants from the third and first altitude gradients. Two inferred pure groups (GP01 and GP02) and one inferred admixture group (GP03) were identified by population structure analysis and were verified by principal component and phylogenetic analyses. The highest differentiation coefficients were determined for GP01 vs. GP02, while the lowest differentiation coefficients were determined for GP01 vs. GP03. CONCLUSIONS This study revealed the genetic diversity and geographical distribution characteristics of wild tea plants in the Guizhou Plateau. There are significant differences in genetic diversity and evolutionary direction between Camellia tachangensis with Carbonate Rock Classes at the first altitude gradient and Camellia gymnogyna with Silicate Rock Classes at the third altitude gradient. Geological environment, soil mineral element content, soil pH, and altitude markedly contributed to the genetic differentiation between Camellia tachangensis and Camellia gymnogyna.
Affiliation(s)
- Limin He
- College of Tea Science / Institute of Agro-Bioengineering, Guizhou University, Guiyang, Guizhou Province, 550025, People's Republic of China
- Jing Luo
- College of Tea Science / Institute of Agro-Bioengineering, Guizhou University, Guiyang, Guizhou Province, 550025, People's Republic of China
- Suzhen Niu
- College of Tea Science / Institute of Agro-Bioengineering, Guizhou University, Guiyang, Guizhou Province, 550025, People's Republic of China
- Key Laboratory of Plant Resources Conservation and Germplasm Innovation in Mountainous Region, Guizhou University, Ministry of Education, Institute of Agro-Bioengineering, Guiyang, 550025, Guizhou Province, People's Republic of China
- Dingchen Bai
- College of Tea Science / Institute of Agro-Bioengineering, Guizhou University, Guiyang, Guizhou Province, 550025, People's Republic of China
- Yanjun Chen
- College of Tea Science / Institute of Agro-Bioengineering, Guizhou University, Guiyang, Guizhou Province, 550025, People's Republic of China
16
Hou Z, Ochoa A. Genetic association models are robust to common population kinship estimation biases. Genetics 2023; 224:iyad030. [PMID: 36843304 PMCID: PMC10474929 DOI: 10.1093/genetics/iyad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 11/08/2022] [Accepted: 02/17/2023] [Indexed: 02/28/2023] Open
Abstract
Common genetic association models for structured populations, including principal component analysis (PCA) and linear mixed-effects models (LMMs), model the correlation structure between individuals using population kinship matrices, also known as genetic relatedness matrices. However, the most common kinship estimators can have severe biases that were only recently determined. Here we characterize the effect of these kinship biases on genetic association. We employ a large simulated admixed family and genotypes from the 1000 Genomes Project, both with simulated traits, to evaluate key kinship estimators. Remarkably, we find practically invariant association statistics for kinship matrices of different bias types (matching all other features). We then prove using statistical theory and linear algebra that LMM association tests are invariant to these kinship biases, and PCA approximately so. Our proof shows that the intercept and relatedness effect coefficients compensate for the kinship bias, an argument that extends to generalized linear models. As a corollary, association testing is also invariant to changing the reference ancestral population of the kinship matrix. Lastly, we observed that all kinship estimators, except for popkin ratio-of-means, can give improper non-positive semidefinite matrices, which can be problematic although some LMMs handle them surprisingly well, and condition numbers can be used to choose kinship estimators. Overall, we find that existing association studies are robust to kinship estimation bias, and our calculations may help improve association methods by taking advantage of this unexpected robustness, as well as help determine the effects of kinship bias in related problems.
Affiliation(s)
- Zhuoran Hou
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27705, USA
- Alejandro Ochoa
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27705, USA
- Duke Center for Statistical Genetics and Genomics, Duke University, Durham, NC 27705, USA
17
Solovieva E, Sakai H. PSReliP: an integrated pipeline for analysis and visualization of population structure and relatedness based on genome-wide genetic variant data. BMC Bioinformatics 2023; 24:135. [PMID: 37020193 PMCID: PMC10074814 DOI: 10.1186/s12859-023-05169-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 02/02/2023] [Indexed: 04/07/2023] Open
Abstract
BACKGROUND Population structure and cryptic relatedness between individuals (samples) are two major factors affecting false positives in genome-wide association studies (GWAS). In addition, population stratification and genetic relatedness in genomic selection in animal and plant breeding can affect prediction accuracy. The methods commonly used for solving these problems are principal component analysis (to adjust for population stratification) and marker-based kinship estimates (to correct for the confounding effects of genetic relatedness). Currently, many tools and software are available that analyze genetic variation among individuals to determine population structure and genetic relationships. However, none of these tools or pipelines perform such analyses in a single workflow and visualize all the various results in a single interactive web application. RESULTS We developed PSReliP, a standalone, freely available pipeline for the analysis and visualization of population structure and relatedness between individuals in a user-specified genetic variant dataset. The analysis stage of PSReliP is responsible for executing all steps of data filtering and analysis and contains an ordered sequence of commands from PLINK, a whole-genome association analysis toolset, along with in-house shell scripts and Perl programs that support data pipelining. The visualization stage is provided by Shiny apps, an R-based interactive web application. In this study, we describe the characteristics and features of PSReliP and demonstrate how it can be applied to real genome-wide genetic variant data. CONCLUSIONS The PSReliP pipeline allows users to quickly analyze genetic variants such as single nucleotide polymorphisms and small insertions or deletions at the genome level to estimate population structure and cryptic relatedness using PLINK software and to visualize the analysis results in interactive tables, plots, and charts using Shiny technology. 
The analysis and assessment of population stratification and genetic relatedness can aid in choosing an appropriate approach for the statistical analysis of GWAS data and predictions in genomic selection. The various outputs from PLINK can be used for further downstream analysis. The code and manual for PSReliP are available at https://github.com/solelena/PSReliP.
Affiliation(s)
- Elena Solovieva
- Research Center for Advanced Analysis, National Agriculture and Food Research Organization, Tsukuba, Ibaraki, Japan
- Hiroaki Sakai
- Research Center for Advanced Analysis, National Agriculture and Food Research Organization, Tsukuba, Ibaraki, Japan
18
St-Pierre J, Oualkacha K, Bhatnagar SR. Efficient penalized generalized linear mixed models for variable selection and genetic risk prediction in high-dimensional data. Bioinformatics 2023; 39:7008326. [PMID: 36708013 PMCID: PMC9907224 DOI: 10.1093/bioinformatics/btad063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 01/13/2023] [Accepted: 01/25/2023] [Indexed: 01/29/2023] Open
Abstract
MOTIVATION Sparse regularized regression methods are now widely used in genome-wide association studies (GWAS) to address the multiple testing burden that limits discovery of potentially important predictors. Linear mixed models (LMMs) have become an attractive alternative to principal components (PCs) adjustment to account for population structure and relatedness in high-dimensional penalized models. However, their use in binary trait GWAS rely on the invalid assumption that the residual variance does not depend on the estimated regression coefficients. Moreover, LMMs use a single spectral decomposition of the covariance matrix of the responses, which is no longer possible in generalized linear mixed models (GLMMs). RESULTS We introduce a new method called pglmm, a penalized GLMM that allows to simultaneously select genetic markers and estimate their effects, accounting for between-individual correlations and binary nature of the trait. We develop a computationally efficient algorithm based on penalized quasi-likelihood estimation that allows to scale regularized mixed models on high-dimensional binary trait GWAS. We show through simulations that when the dimensionality of the relatedness matrix is high, penalized LMM and logistic regression with PC adjustment fail to select important predictors, and have inferior prediction accuracy compared to pglmm. Further, we demonstrate through the analysis of two polygenic binary traits in a subset of 6731 related individuals from the UK Biobank data with 320K SNPs that our method can achieve higher predictive performance, while also selecting fewer predictors than a sparse regularized logistic lasso with PC adjustment. AVAILABILITY AND IMPLEMENTATION Our Julia package PenalizedGLMM.jl is publicly available on github: https://github.com/julstpierre/PenalizedGLMM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Julien St-Pierre
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, QC H3A 1G1, Canada
- Karim Oualkacha
- Département de Mathématiques, Université du Québec à Montréal, Montréal, QC H2X 3Y7, Canada
- Sahir Rai Bhatnagar
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, QC H3A 1G1, Canada
19
Mary-Huard T, Balding D. Fast and accurate joint inference of coancestry parameters for populations and/or individuals. PLoS Genet 2023; 19:e1010054. [PMID: 36656906 PMCID: PMC9888729 DOI: 10.1371/journal.pgen.1010054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2022] [Revised: 01/31/2023] [Accepted: 12/01/2022] [Indexed: 01/20/2023] Open
Abstract
We introduce a fast, new algorithm for inferring from allele count data the FST parameters describing genetic distances among a set of populations and/or unrelated diploid individuals, and a tree with branch lengths corresponding to FST values. The tree can reflect historical processes of splitting and divergence, but seeks to represent the actual genetic variance as accurately as possible with a tree structure. We generalise two major approaches to defining FST, via correlations and mismatch probabilities of sampled allele pairs, which measure shared and non-shared components of genetic variance. A diploid individual can be treated as a population of two gametes, which allows inference of coancestry coefficients for individuals as well as for populations, or a combination of the two. A simulation study illustrates that our fast method-of-moments estimation of FST values, simultaneously for multiple populations/individuals, gains statistical efficiency over pairwise approaches when the population structure is close to tree-like. We apply our approach to genome-wide genotypes from the 26 worldwide human populations of the 1000 Genomes Project. We first analyse at the population level, then a subset of individuals and in a final analysis we pool individuals from the more homogeneous populations. This flexible analysis approach gives advantages over traditional approaches to population structure/coancestry, including visual and quantitative assessments of long-standing questions about the relative magnitudes of within- and between-population genetic differences.
Affiliation(s)
- Tristan Mary-Huard
- MIA-Paris, INRAE, AgroParisTech, Université Paris-Saclay, Palaiseau, France
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution—Le Moulon, Gif-sur-Yvette, France
- David Balding
- Melbourne Integrative Genomics, School of BioSciences and School of Mathematics & Statistics, University of Melbourne, Parkville, Victoria, Australia
20
LaPierre N, Fu B, Turnbull S, Eskin E, Sankararaman S. Leveraging family data to design Mendelian Randomization that is provably robust to population stratification. bioRxiv 2023:2023.01.05.522936. [PMID: 36711635 PMCID: PMC9881984 DOI: 10.1101/2023.01.05.522936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Mendelian Randomization (MR) has emerged as a powerful approach to leverage genetic instruments to infer causality between pairs of traits in observational studies. However, the results of such studies are susceptible to biases due to weak instruments as well as the confounding effects of population stratification and horizontal pleiotropy. Here, we show that family data can be leveraged to design MR tests that are provably robust to confounding from population stratification, assortative mating, and dynastic effects. We demonstrate in simulations that our approach, MR-Twin, is robust to confounding from population stratification and is not affected by weak instrument bias, while standard MR methods yield inflated false positive rates. We applied MR-Twin to 121 trait pairs in the UK Biobank dataset and found that MR-Twin identifies likely causal trait pairs and does not identify trait pairs that are unlikely to be causal. Our results suggest that confounding from population stratification can lead to false positives for existing MR methods, while MR-Twin is immune to this type of confounding.
Affiliation(s)
- Boyang Fu
- Department of Computer Science, UCLA, Los Angeles CA
- Eleazar Eskin
- Department of Computer Science, UCLA, Los Angeles CA
- Department of Computational Medicine, UCLA, Los Angeles CA
- Department of Human Genetics, UCLA, Los Angeles CA
- Sriram Sankararaman
- Department of Computer Science, UCLA, Los Angeles CA
- Department of Computational Medicine, UCLA, Los Angeles CA
- Department of Human Genetics, UCLA, Los Angeles CA
21
Caliebe A, Tekola‐Ayele F, Darst BF, Wang X, Song YE, Gui J, Sebro RA, Balding DJ, Saad M, Dubé M. Including diverse and admixed populations in genetic epidemiology research. Genet Epidemiol 2022; 46:347-371. [PMID: 35842778 PMCID: PMC9452464 DOI: 10.1002/gepi.22492] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/08/2022] [Revised: 05/31/2022] [Accepted: 06/06/2022] [Indexed: 11/25/2022]
Abstract
The inclusion of ancestrally diverse participants in genetic studies can lead to new discoveries and is important to ensure equitable health care benefit from research advances. Here, members of the Ethical, Legal, Social, Implications (ELSI) committee of the International Genetic Epidemiology Society (IGES) offer perspectives on methods and analysis tools for the conduct of inclusive genetic epidemiology research, with a focus on admixed and ancestrally diverse populations in support of reproducible research practices. We emphasize the importance of distinguishing socially defined population categorizations from genetic ancestry in the design, analysis, reporting, and interpretation of genetic epidemiology research findings. Finally, we discuss the current state of genomic resources used in genetic association studies, functional interpretation, and clinical and public health translation of genomic findings with respect to diverse populations.
Affiliation(s)
- Amke Caliebe
- Institute of Medical Informatics and Statistics, Kiel University and University Hospital Schleswig‐Holstein, Kiel, Germany
- Fasil Tekola‐Ayele
- Epidemiology Branch, Division of Population Health Research, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, USA
- Burcu F. Darst
- Center for Genetic Epidemiology, University of Southern California, Los Angeles, California, USA
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Xuexia Wang
- Department of Mathematics, University of North Texas, Denton, Texas, USA
- Yeunjoo E. Song
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
- Jiang Gui
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, One Medical Center Dr., Lebanon, New Hampshire, USA
- David J. Balding
- Melbourne Integrative Genomics, Schools of BioSciences and of Mathematics & Statistics, University of Melbourne, Melbourne, Australia
- Mohamad Saad
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Neuroscience Research Center, Faculty of Medical Sciences, Lebanese University, Beirut, Lebanon
- Marie‐Pierre Dubé
- Department of Medicine, and Social and Preventive Medicine, Université de Montréal, Montréal, Québec, Canada
- Beaulieu‐Saucier Pharmacogenomics Centre, Montreal Heart Institute, Montreal, Canada
22
Giles‐Pérez GI, Aguirre‐Planter E, Eguiarte LE, Jaramillo‐Correa JP. Demographic modelling helps track the rapid and recent divergence of a conifer species pair from Central Mexico. Mol Ecol 2022; 31:5074-5088. [PMID: 35951172 PMCID: PMC9804182 DOI: 10.1111/mec.16646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2021] [Revised: 07/26/2022] [Accepted: 07/28/2022] [Indexed: 01/05/2023]
Abstract
Secondary contact of recently diverged species may have several outcomes, ranging from rampant hybridization to reinforced reproductive isolation. In plants, selfing tolerance and disjunct reproductive phenology may lead to reproductive isolation at contact zones. However, they may also evolve under either allopatric or parapatric frameworks and originate from adaptive and/or neutral forces. Inferring the historical demography of diverging taxa is thus a crucial step to identify factors that may have led to putative reproductive isolation. We explored various competing demographic hypotheses to account for the rapid divergence of a fir species complex (Abies flinckii-A. religiosa) distributed in "sky-islands" across central Mexico (i.e., along the Trans-Mexican Volcanic Belt; TMVB). Despite co-occurring in two independent sympatric regions (west and centre), these taxa rarely interbreed because of disjunct reproductive phenologies. We genotyped 1147 single nucleotide polymorphisms, generated by GBS (genotyping by sequencing), across 23 populations, and compared multiple scenarios based on the geological history of the TMVB. The best-fitting model revealed one of the most rapid and complete speciation cases for a conifer species-pair, dating back to ~1.2 million years ago. Coupled with the lack of support for stepwise colonization, our coalescent inferences point to an early cessation of interspecific gene flow under parapatric speciation; ancestral gene flow during divergence was asymmetrical (mostly from western firs into A. religiosa) and exclusive to the most ancient (i.e., central) contact zone. Factors promoting rapid reproductive isolation should be explored in other slowly evolving species complexes as they may account for the large tropical and subtropical diversity.
Affiliation(s)
- Gustavo I. Giles‐Pérez
- Programa de Doctorado en Ciencias Biomédicas, Universidad Nacional Autónoma de México, CDMX, Mexico; Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, CDMX, Mexico
| | - Erika Aguirre‐Planter
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, CDMX, Mexico
| | - Luis E. Eguiarte
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, CDMX, Mexico
| | | |
|
23
|
Alptekin B, Erfatpour M, Mangel D, Pauli D, Blake T, Turner H, Lachowiec J, Sherman J, Fischer A. Selection of favorable alleles of genes controlling flowering and senescence improves malt barley quality. Mol Breed 2022; 42:59. [PMID: 37313013 PMCID: PMC10248683 DOI: 10.1007/s11032-022-01331-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/07/2022] [Accepted: 09/14/2022] [Indexed: 06/15/2023]
Abstract
Malt barley (Hordeum vulgare L.) is an important cash crop with stringent grain quality standards. Timing of the switch from vegetative to reproductive growth and timing of whole-plant senescence and nutrient remobilization are critical for cereal grain yield and quality. Understanding the genetic variation in genes associated with these developmental traits can streamline genotypic selection of superior malt barley germplasm. Here, we determined the effects of allelic variation in three genes encoding a glycine-rich RNA-binding protein (HvGR-RBP1) and two NAC transcription factors (HvNAM1 and HvNAM2) on malt barley agronomics and quality using previously developed markers for HvGR-RBP1 and HvNAM1 and a novel marker for HvNAM2. Based on a single-nucleotide polymorphism (SNP) in the first intron, the utilized marker differentiates NAM2 alleles of low-grain protein variety 'Karl' and of higher protein variety 'Lewis'. We demonstrate that the selection of favorable alleles for each gene impacts heading date, senescence timing, grain size, grain protein concentration, and malt quality. Specifically, combining 'Karl' alleles for the two NAC genes with the 'Lewis' HvGR-RBP1 allele extends grain fill duration, increases the percentage of plump kernels, decreases grain protein, and provides malt quality stability. Molecular markers for these genes are therefore highly useful tools in malt barley breeding. Supplementary Information The online version contains supplementary material available at 10.1007/s11032-022-01331-7.
Affiliation(s)
- Burcu Alptekin
- Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT 59717 USA
- Present Address: Department of Bacteriology, University of Wisconsin, Madison, WI 53706 USA
| | - Mohammad Erfatpour
- Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT 59717 USA
- Present Address: Department of Plant Sciences, North Dakota State University, Fargo, ND 58108 USA
| | - Dylan Mangel
- Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT 59717 USA
- Present Address: Department of Plant Pathology, Kansas State University, Manhattan, KS 66506 USA
| | - Duke Pauli
- Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT 59717 USA
- Present Address: School of Plant Sciences, University of Arizona, Tucson, AZ 85721 USA
| | - Tom Blake
- Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT 59717 USA
| | - Hannah Turner
- Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT 59717 USA
| | - Jennifer Lachowiec
- Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT 59717 USA
| | - Jamie Sherman
- Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT 59717 USA
| | - Andreas Fischer
- Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT 59717 USA
| |
|
24
|
Parodi L, Barbier M, Jacoupy M, Pujol C, Lejeune FX, Lallemant-Dudek P, Esteves T, Pennings M, Kamsteeg EJ, Guillaud-Bataille M, Banneau G, Coarelli G, Oumoussa BM, Fraidakis MJ, Stevanin G, Depienne C, van de Warrenburg B, Brice A, Durr A. The mitochondrial seryl-tRNA synthetase SARS2 modifies onset in spastic paraplegia type 4. Genet Med 2022; 24:2308-2317. [PMID: 36056923 DOI: 10.1016/j.gim.2022.07.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Revised: 07/24/2022] [Accepted: 07/25/2022] [Indexed: 11/25/2022] Open
Abstract
PURPOSE Hereditary spastic paraplegia type 4 is extremely variable in age at onset; the same variant can cause onset at birth or in the eighth decade. We recently discovered that missense variants in SPAST, which influences microtubule dynamics, are associated with earlier onset and more severe disease than truncating variants, but even within the early and late-onset groups there remained significant differences in onset. Given the rarity of the condition, we adapted an extreme phenotype approach to identify genetic modifiers of onset. METHODS We performed a genome-wide association study on 134 patients bearing truncating pathogenic variants in SPAST, divided into early- and late-onset groups (aged ≤15 and ≥45 years, respectively). A replication cohort of 419 included patients carrying either truncating or missense variants. Finally, age at onset was analyzed in the merged cohort (N = 553). RESULTS We found 1 signal associated with earlier age at onset (rs10775533, P = 8.73E-6) in 2 independent cohorts and in the merged cohort (N = 553, Mantel-Cox test, P < .0001). Western blotting in lymphocytes of 20 patients showed that this locus tends to upregulate SARS2 expression in earlier-onset patients. CONCLUSION SARS2 overexpression lowers the age of onset in hereditary spastic paraplegia type 4. Lowering SARS2 or improving mitochondrial function could thus present viable approaches to therapy.
Affiliation(s)
- Livia Parodi
- Paris Brain Institute (Institut du Cerveau, ICM), INSERM, CNRS, Assistance Publique-Hôpitaux de Paris (AP-HP), Sorbonne Université, Paris, France
| | - Mathieu Barbier
- Paris Brain Institute (Institut du Cerveau, ICM), INSERM, CNRS, Assistance Publique-Hôpitaux de Paris (AP-HP), Sorbonne Université, Paris, France
| | - Maxime Jacoupy
- Paris Brain Institute (Institut du Cerveau, ICM), INSERM, CNRS, Assistance Publique-Hôpitaux de Paris (AP-HP), Sorbonne Université, Paris, France
| | - Claire Pujol
- Paris Brain Institute (Institut du Cerveau, ICM), INSERM, CNRS, Assistance Publique-Hôpitaux de Paris (AP-HP), Sorbonne Université, Paris, France; Pasteur Institute, Centre National de la Recherche Scientifique UMR 3691, Paris, France
| | - François-Xavier Lejeune
- Paris Brain Institute (Institut du Cerveau, ICM), INSERM, CNRS, Assistance Publique-Hôpitaux de Paris (AP-HP), Sorbonne Université, Paris, France
| | - Pauline Lallemant-Dudek
- Paris Brain Institute (Institut du Cerveau, ICM), INSERM, CNRS, Assistance Publique-Hôpitaux de Paris (AP-HP), Sorbonne Université, Paris, France
| | - Typhaine Esteves
- Paris Brain Institute (Institut du Cerveau, ICM), INSERM, CNRS, Assistance Publique-Hôpitaux de Paris (AP-HP), Sorbonne Université, Paris, France; Université de Bordeaux, CNRS, EPHE, INCIA, UMR 5287, Bordeaux, France
| | - Maartje Pennings
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Erik-Jan Kamsteeg
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, the Netherlands
| | | | - Guillaume Banneau
- Département de Génétique, AP-HP, GH Pitié-Salpêtrière, Sorbonne Université, Paris, France
| | - Giulia Coarelli
- Paris Brain Institute (Institut du Cerveau, ICM), INSERM, CNRS, Assistance Publique-Hôpitaux de Paris (AP-HP), Sorbonne Université, Paris, France
| | - Badreddine Mohand Oumoussa
- Sorbonne Université, Inserm, UMS Production et Analyse des données en Sciences de la vie et en Santé, PASS, Plateforme Post-génomique de la Pitié-Salpêtrière, P3S, Paris, France
| | - Matthew J Fraidakis
- Rare Neurological Diseases Unit, Department of Neurology, Attikon University Hospital, Medical School of the University of Athens, Athens, Greece
| | - Giovanni Stevanin
- Paris Brain Institute (Institut du Cerveau, ICM), INSERM, CNRS, Assistance Publique-Hôpitaux de Paris (AP-HP), Sorbonne Université, Paris, France; Université de Bordeaux, CNRS, EPHE, INCIA, UMR 5287, Bordeaux, France
| | - Christel Depienne
- Paris Brain Institute (Institut du Cerveau, ICM), INSERM, CNRS, Assistance Publique-Hôpitaux de Paris (AP-HP), Sorbonne Université, Paris, France; Institut für Humangenetik, Universitätsklinikum Essen, Essen, Germany
| | - Bart van de Warrenburg
- Department of Neurology, Donders Institute for Brain, Cognition and Behavior, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Alexis Brice
- Paris Brain Institute (Institut du Cerveau, ICM), INSERM, CNRS, Assistance Publique-Hôpitaux de Paris (AP-HP), Sorbonne Université, Paris, France
| | - Alexandra Durr
- Paris Brain Institute (Institut du Cerveau, ICM), INSERM, CNRS, Assistance Publique-Hôpitaux de Paris (AP-HP), Sorbonne Université, Paris, France.
| |
|
25
|
Sherwin WB. Bray-Curtis (AFD) differentiation in molecular ecology: Forecasting, an adjustment ( A A), and comparative performance in selection detection. Ecol Evol 2022; 12:e9176. [PMID: 36110882 PMCID: PMC9465203 DOI: 10.1002/ece3.9176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Revised: 07/04/2022] [Accepted: 07/06/2022] [Indexed: 11/07/2022] Open
Abstract
Geographic genetic differentiation measures are used for purposes such as assessing genetic diversity and connectivity, and searching for signals of selection. Confirmation by unrelated measures can minimize false positives. A popular differentiation measure, Bray-Curtis, has been used increasingly in molecular ecology, renamed AFD (hereafter called BCAFD). Critically, BCAFD is expected to be partially independent of the commonly used Hill "Q-profile" measures. BCAFD needs scrutiny for potential biases, by examining limits on its value, and comparing simulations against expectations. BCAFD has two dependencies on within-population (alpha) variation, undesirable for a between-population (beta) measure. The first dependency is derived from similarity to GST and FST. The second dependency is that BCAFD cannot be larger than the highest allele proportion in either location (alpha variation), which can be overcome by data-filtering or by a modified statistic A A or "Adjusted AFD". The first dependency does not forestall applications such as assessing connectivity or selection, if we know the measure's null behavior under selective neutrality with specified conditions, which is shown in this article for A A, for equilibrium, and nonequilibrium, for the commonly used data type of single-nucleotide-polymorphisms (SNPs) in two locations. Thus, A A can be used in tandem with mathematically contrasting differentiation measures, with the aim of reducing false inferences. For detecting adaptive loci, the relative performance of A A and other measures was evaluated, showing that it is best to use two mathematically different measures simultaneously, and that A A is in one of the best such pairwise criteria. For any application, using A A, rather than BCAFD, avoids the counterintuitive limitation by maximum allele proportion within localities.
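As a rough illustration of the quantity this abstract discusses, the sketch below computes the unadjusted Bray-Curtis allele frequency difference between two locations as half the summed absolute difference in allele proportions at one locus. The function name and the toy frequencies are hypothetical, and the adjusted statistic proposed in the paper is not reproduced here.

```python
def bray_curtis_afd(freqs_a, freqs_b):
    """Unadjusted allele frequency difference between two locations.

    freqs_a and freqs_b hold the proportion of each allele at one locus,
    in the same allele order, and each list is assumed to sum to 1.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both locations must list the same alleles")
    # Half the summed absolute difference; ranges from 0 to 1.
    return 0.5 * sum(abs(a - b) for a, b in zip(freqs_a, freqs_b))


# Hypothetical biallelic SNP: allele proportions in two locations.
location_1 = [0.8, 0.2]
location_2 = [0.3, 0.7]
afd = bray_curtis_afd(location_1, location_2)
print(round(afd, 3))
```

For a biallelic SNP the value reduces to the absolute difference in the frequency of either allele, and in this toy case it stays below the largest allele proportion observed in either location (0.8), in line with the upper limit described in the abstract.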
Affiliation(s)
- William B. Sherwin
- Evolution and Ecology Research Centre, School of BEES, UNSW‐Sydney, Sydney, New South Wales, Australia
| |
|
26
|
Yang CJ, Ladejobi O, Mott R, Powell W, Mackay I. Analysis of historical selection in winter wheat. Theor Appl Genet 2022; 135:3005-3023. [PMID: 35864201 PMCID: PMC9482581 DOI: 10.1007/s00122-022-04163-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2022] [Accepted: 06/22/2022] [Indexed: 06/15/2023]
Abstract
KEY MESSAGE Modeling of the distribution of allele frequency over year of variety release identifies major loci involved in historical breeding of winter wheat. Winter wheat is a major crop with a rich selection history in the modern era of crop breeding. Genetic gains across economically important traits like yield have been well characterized and are the major force driving its production. Winter wheat is also an excellent model for analyzing historical genetic selection. As a proof of concept, we analyze two major collections of winter wheat varieties that were bred in Western Europe from 1916 to 2010, namely the Triticeae Genome (TG) and WAGTAIL panels, which include 333 and 403 varieties, respectively. We develop and apply a selection mapping approach, Regression of Alleles on Years (RALLY), in these panels, as well as in simulated populations. RALLY maps loci under sustained historical selection by using a simple logistic model to regress allele counts on years of variety release. To control for drift-induced allele frequency change, we develop a hybrid approach of genomic control and delta control. Within the TG panel, we identify 22 significant RALLY quantitative selection loci (QSLs) and estimate the local heritabilities for 12 traits across these QSLs. By correlating predicted marker effects with RALLY regression estimates, we show that alleles whose frequencies have increased over time are heavily biased toward conferring positive yield effect, but negative effects in flowering time, lodging, plant height and grain protein content. Altogether, our results (1) demonstrate the use of RALLY to identify selected genomic regions while controlling for drift, and (2) reveal key patterns in the historical selection in winter wheat and guide its future breeding.
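The core idea of regressing allele state on year of variety release can be sketched with a plain logistic regression. The code below is only a minimal illustration under stated assumptions: it treats each variety as carrying (1) or lacking (0) a focal allele, fits the model by Newton-Raphson, and omits the genomic-control and delta-control steps the abstract mentions. The function name and the toy data are hypothetical and do not come from the RALLY software.

```python
import math


def fit_allele_trend(years, alleles, iterations=25):
    """Fit P(allele = 1) = sigmoid(b0 + b1 * (year - mean year)).

    Returns (b0, b1); a positive b1 means the allele became more
    common in later releases. Assumes the data are not perfectly
    separable by year, otherwise the estimates do not converge.
    """
    mean_year = sum(years) / len(years)
    x = [year - mean_year for year in years]  # centre years for stability
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, alleles):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p          # score for the intercept
            g1 += (yi - p) * xi   # score for the slope
            h00 += w              # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: solve the 2x2 system (information * delta = score).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical release years and presence (1) or absence (0) of an allele.
release_years = [1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010]
allele_state = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
intercept, slope = fit_allele_trend(release_years, allele_state)
print(slope > 0)
```

In a genome scan this fit would be repeated per marker, and the slope tested against a null that accounts for drift, which is the part this sketch leaves out.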
Affiliation(s)
- Chin Jian Yang
- Scotland's Rural College (SRUC), Kings Buildings, West Mains Road, Edinburgh, EH9 3JG, UK
| | - Olufunmilayo Ladejobi
- Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
| | - Richard Mott
- Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
| | - Wayne Powell
- Scotland's Rural College (SRUC), Kings Buildings, West Mains Road, Edinburgh, EH9 3JG, UK
| | - Ian Mackay
- Scotland's Rural College (SRUC), Kings Buildings, West Mains Road, Edinburgh, EH9 3JG, UK.
- IMplant Consultancy Ltd, Chelmsford, UK.
| |
|
27
|
Whole blood DNA methylation analysis reveals respiratory environmental traits involved in COVID-19 severity following SARS-CoV-2 infection. Nat Commun 2022; 13:4597. [PMID: 35933486 PMCID: PMC9357033 DOI: 10.1038/s41467-022-32357-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 07/28/2021] [Accepted: 07/26/2022] [Indexed: 02/06/2023] Open
Abstract
SARS-CoV-2 infection can cause an inflammatory syndrome (COVID-19) leading, in many cases, to bilateral pneumonia, severe dyspnea, and in ~5% of these, death. DNA methylation is known to play an important role in the regulation of the immune processes behind COVID-19 progression; however, it has not been studied in depth. In this study, we aim to evaluate the implication of DNA methylation in COVID-19 progression by means of a genome-wide DNA methylation analysis combined with DNA genotyping. The results reveal the existence of epigenomic regulation of functional pathways associated with COVID-19 progression and mediated by genetic loci. We find an environmental trait-related signature that discriminates mild from severe cases and regulates, among other cytokines, IL-6 expression via the transcription factor CEBP. The analyses suggest that an interaction between environmental contribution, genetics, and epigenetics might be playing a role in triggering the cytokine storm described in the most severe cases.
|
28
|
Long PN, Cook VJ, Majumder A, Barbour AG, Long AD. The utility of a closed breeding colony of Peromyscus leucopus for dissecting complex traits. Genetics 2022; 221:iyac026. [PMID: 35143664 PMCID: PMC9071557 DOI: 10.1093/genetics/iyac026] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/14/2021] [Accepted: 02/01/2022] [Indexed: 11/13/2022] Open
Abstract
Deermice of the genus Peromyscus are well suited for addressing several questions of biological interest, including the genetic bases of longevity, behavior, physiology, adaptation, and their ability to serve as disease vectors. Here, we explore a diversity outbred approach for dissecting complex traits in Peromyscus leucopus, a nontraditional genetic model system. We take advantage of a closed colony of deermice founded from 38 individuals and subsequently maintained for ∼40-60 generations. From 405 low-pass short-read sequenced deermice, we accurately impute genotypes at 16 million single nucleotide polymorphisms. Conditional on observed genotypes, simulations were conducted in which three different sized quantitative trait loci contribute to a complex trait under three different genetic models. Using a stringent significance threshold, power was modest, largely a function of the percent variation attributable to the simulated quantitative trait loci, with the underlying genetic model having only a subtle impact. We additionally simulated 2,000 pseudo-individuals, whose genotypes were consistent with those observed in the genotyped cohort, and carried out additional power simulations. In experiments employing more than 1,000 mice, power is high to detect quantitative trait loci contributing greater than 2.5% to a complex trait, with a localization ability of ∼100 kb. We finally carried out a Genome-Wide Association Study on two demonstration traits, bleeding time and body weight, and uncovered one significant region. Our work suggests that complex traits can be dissected in founders-unknown P. leucopus colony mice and similar colonies in other systems using easily obtained genotypes from low-pass sequencing.
Affiliation(s)
- Phillip N Long
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California Irvine, Irvine, CA 92697-2525, USA
| | - Vanessa J Cook
- Departments of Microbiology & Molecular Genetics and Medicine, School of Medical Sciences, University of California Irvine, Irvine, CA 92687-2525, USA
| | - Arundhati Majumder
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California Irvine, Irvine, CA 92697-2525, USA
| | - Alan G Barbour
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California Irvine, Irvine, CA 92697-2525, USA
- Departments of Microbiology & Molecular Genetics and Medicine, School of Medical Sciences, University of California Irvine, Irvine, CA 92687-2525, USA
| | - Anthony D Long
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California Irvine, Irvine, CA 92697-2525, USA
| |
|
29
|
Maróstica AS, Nunes K, Castelli EC, Silva NSB, Weir BS, Goudet J, Meyer D. How HLA diversity is apportioned: influence of selection and relevance to transplantation. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200420. [PMID: 35430892 PMCID: PMC9014195 DOI: 10.1098/rstb.2020.0420] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/19/2022] Open
Abstract
In his 1972 paper ‘The apportionment of human diversity’, Lewontin showed that, when averaged over loci, genetic diversity is predominantly attributable to differences among individuals within populations. However, selection can alter the apportionment of diversity of specific genes or genomic regions. We examine genetic diversity at the human leucocyte antigen (HLA) loci, located within the major histocompatibility complex (MHC) region. HLA genes code for proteins that are critical to adaptive immunity and are well-documented targets of balancing selection. The single-nucleotide polymorphisms (SNPs) within HLA genes show strong signatures of balancing selection on large timescales and are broadly shared among populations, displaying low FST values. However, when we analyse haplotypes defined by these SNPs (which define ‘HLA alleles’), we find marked differences in frequencies between geographic regions. These differences are not reflected in the FST values because of the extreme polymorphism at HLA loci, illustrating challenges in interpreting FST. Differences in the frequency of HLA alleles among geographic regions are relevant to bone-marrow transplantation, which requires genetic identity at HLA loci between patient and donor. We discuss the case of Brazil's bone marrow registry, where a deficit of enrolled volunteers with African ancestry reduces the chance of finding donors for individuals with an MHC region of African ancestry. This article is part of the theme issue ‘Celebrating 50 years since Lewontin's apportionment of human diversity’.
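The remark that extreme polymorphism can hide marked frequency differences from FST can be illustrated with a small calculation. The sketch below uses Nei's GST, (HT − HS) / HT, as a simple stand-in for FST; this is an assumption made for illustration, and the estimator used in the paper may differ. The function name and the toy populations are hypothetical.

```python
def nei_gst(populations):
    """Nei's GST for one locus from per-population allele frequencies.

    populations is a list of equal-length frequency lists (same allele
    order, each summing to 1). Populations are weighted equally.
    """
    n_pops = len(populations)
    n_alleles = len(populations[0])
    # HS: mean expected heterozygosity within populations.
    hs = sum(1.0 - sum(p * p for p in pop) for pop in populations) / n_pops
    # HT: expected heterozygosity of the pooled (mean) frequencies.
    mean_freqs = [sum(pop[i] for pop in populations) / n_pops
                  for i in range(n_alleles)]
    ht = 1.0 - sum(p * p for p in mean_freqs)
    return (ht - hs) / ht


# Two populations fixed for different alleles at a biallelic locus.
fixed = nei_gst([[1.0, 0.0], [0.0, 1.0]])

# Two populations sharing no alleles at all, each with 50 equally
# frequent alleles (a highly polymorphic, HLA-like toy locus).
pop_a = [1 / 50] * 50 + [0.0] * 50
pop_b = [0.0] * 50 + [1 / 50] * 50
polymorphic = nei_gst([pop_a, pop_b])

print(round(fixed, 3), round(polymorphic, 3))
```

Both toy pairs share no alleles, yet the highly polymorphic pair yields a value close to zero because within-population heterozygosity is already near its maximum, which is the kind of interpretive difficulty the abstract points to.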
Affiliation(s)
- André Silva Maróstica
- Departamento de Genética e Biologia Evolutiva, Universidade de São Paulo, São Paulo, SP, Brazil
| | - Kelly Nunes
- Departamento de Genética e Biologia Evolutiva, Universidade de São Paulo, São Paulo, SP, Brazil
| | - Erick C. Castelli
- Departamento de Patologia, Universidade Estadual Paulista - Unesp, Faculdade de Medicina de Botucatu, Botucatu, SP, Brazil
- Molecular Genetics and Bioinformatics Laboratory, Experimental Research Unit, School of Medicine, São Paulo State University - Unesp, Botucatu, SP, Brazil
| | - Nayane S. B. Silva
- Molecular Genetics and Bioinformatics Laboratory, Experimental Research Unit, School of Medicine, São Paulo State University - Unesp, Botucatu, SP, Brazil
| | - Bruce S. Weir
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
| | - Jérôme Goudet
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Diogo Meyer
- Departamento de Genética e Biologia Evolutiva, Universidade de São Paulo, São Paulo, SP, Brazil
| |
|
30
|
Chiu AM, Molloy EK, Tan Z, Talwalkar A, Sankararaman S. Inferring population structure in biobank-scale genomic data. Am J Hum Genet 2022; 109:727-737. [PMID: 35298920 PMCID: PMC9069078 DOI: 10.1016/j.ajhg.2022.02.015] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/08/2021] [Accepted: 02/21/2022] [Indexed: 01/07/2023] Open
Abstract
Inferring the structure of human populations from genetic variation data is a key task in population and medical genomic studies. Although a number of methods for population structure inference have been proposed, current methods are impractical to run on biobank-scale genomic datasets containing millions of individuals and genetic variants. We introduce SCOPE, a method for population structure inference that is orders of magnitude faster than existing methods while achieving comparable accuracy. SCOPE infers population structure in about a day on a dataset containing one million individuals and variants as well as on the UK Biobank dataset containing 488,363 individuals and 569,346 variants. Furthermore, SCOPE can leverage allele frequencies from previous studies to improve the interpretability of population structure estimates.
Affiliation(s)
- Alec M Chiu
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Erin K Molloy
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA; Institute for Advanced Computer Studies, University of Maryland, College Park, College Park, MD 20742, USA
| | - Zilong Tan
- Facebook, Inc., Menlo Park, CA 94025, USA
| | - Ameet Talwalkar
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Sriram Sankararaman
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA.
| |
|
31
|
Lauer E, Holland J, Isik F. Prediction ability of genome-wide markers in Pinus taeda L. within and between population is affected by relatedness to the training population and trait genetic architecture. G3 (Bethesda) 2022; 12:6440053. [PMID: 34849838 PMCID: PMC9210318 DOI: 10.1093/g3journal/jkab405] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/30/2021] [Accepted: 11/08/2021] [Indexed: 11/26/2022]
Abstract
Genomic prediction has the potential to significantly increase the rate of genetic gain in tree breeding programs. In this study, a clonally replicated population (n = 2063) was used to train a genomic prediction model. The model was validated both within the training population and in a separate population (n = 451). The prediction abilities from random (20% vs 80%) cross validation within the training population were 0.56 and 0.78 for height and stem form, respectively. Removal of all full-sib relatives within the training population resulted in a ∼50% reduction in genomic prediction ability for both traits. The average prediction ability for all 451 individual trees was 0.29 for height and 0.57 for stem form. The degree of genetic linkage (full-sib family, half-sib family, unrelated) between the training and validation sets had a strong impact on prediction ability for stem form but not for height. A dominant dwarfing allele, the first to be reported in a conifer species, was discovered via genome-wide association studies on linkage group 5 that conferred a 0.33-m mean height reduction. However, the QTL was family specific. The rapid decay of linkage disequilibrium, large genome size, and inconsistencies in marker-QTL linkage phase suggest that large, diverse training populations are needed for genomic selection in Pinus taeda L.
Affiliation(s)
- Edwin Lauer
- Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC 27695, USA
| | - James Holland
- USDA-ARS Plant Science Research Unit, North Carolina State University, Raleigh, NC 27695, USA
| | - Fikret Isik
- Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC 27695, USA
| |
|
32
|
Zhang QS, Goudet J, Weir BS. Rank-invariant estimation of inbreeding coefficients. Heredity (Edinb) 2022; 128:1-10. [PMID: 34824382 PMCID: PMC8733021 DOI: 10.1038/s41437-021-00471-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/18/2020] [Revised: 09/05/2021] [Accepted: 09/05/2021] [Indexed: 11/18/2022] Open
Abstract
The two alleles an individual carries at a locus are identical by descent (ibd) if they have descended from a single ancestral allele in a reference population, and the probability of such identity is the inbreeding coefficient of the individual. Inbreeding coefficients can be predicted from pedigrees with founders constituting the reference population, but estimation from genetic data is not possible without data from the reference population. Most inbreeding estimators that make explicit use of sample allele frequencies as estimates of allele probabilities in the reference population are confounded by average kinships with other individuals. This means that the ranking of those estimates depends on the scope of the study sample and we show the variation in rankings for common estimators applied to different subdivisions of 1000 Genomes data. Allele-sharing estimators of within-population inbreeding relative to average kinship in a study sample, however, do have invariant rankings across all studies including those individuals. They are unbiased with a large number of SNPs. We discuss how allele sharing estimates are the relevant quantities for a range of empirical applications.
Affiliation(s)
- Qian S Zhang
- Department of Biostatistics, University of Washington, Seattle, WA, 98195-1617, USA
| | - Jérôme Goudet
- Department of Ecology and Evolution, University of Lausanne, CH-1015, Lausanne, Switzerland
| | - Bruce S Weir
- Department of Biostatistics, University of Washington, Seattle, WA, 98195-1617, USA.
| |
|
33
|
Duk M, Kanapin A, Rozhmina T, Bankin M, Surkova S, Samsonova A, Samsonova M. The Genetic Landscape of Fiber Flax. Frontiers in Plant Science 2021; 12:764612. [PMID: 34950165 PMCID: PMC8691122 DOI: 10.3389/fpls.2021.764612] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/25/2021] [Accepted: 11/03/2021] [Indexed: 06/14/2023]
Abstract
Genetic diversity in a breeding program is essential to overcome modern-day environmental challenges faced by humanity and produce robust, resilient crop cultivars with improved agronomic characteristics, as well as to trace crop domestication history. Flax (Linum usitatissimum), one of the first crops domesticated by mankind, has been traditionally cultivated for fiber as well as for medicinal purposes and as a nutritional product. The origins of fiber flax are hidden in the mists of time and can be hypothetically traced back to either the Indo-Afghan region or Fertile Crescent. To shed new light on fiber flax genetic diversity and breeding history, in this study, we presented a comprehensive analysis of the core collection of flax (306 accessions) of different morphotypes and geographic origins maintained by the Russian Federal Research Center for Bast Fiber Crops. We observed significant population differentiation between oilseed and fiber morphotypes, as well as mapped genomic regions affected by recent breeding efforts. We also sought to unravel the origins of kryazhs, Russian heritage landraces, and their genetic relatedness to modern fiber flax cultivars. For the first time, our results provide strong genetic evidence in favor of the hypothesis on kryazh's mixed origin from both the Indo-Afghan diversity center and Fertile Crescent. Finally, we showed predominant contribution from Russian landraces and kryazhs into the ancestry of modern fiber flax varieties. Taken together, these findings may have practical implications on the development of new improved flax varieties with desirable traits that give farmers greater choice in crop management and meet the aspirations of breeders.
Affiliation(s)
- Maria Duk
- Mathematical Biology and Bioinformatics Laboratory, Peter the Great St. Petersburg Polytechnic University, Saint Petersburg, Russia
| | - Alexander Kanapin
- Centre for Computational Biology, Peter the Great St. Petersburg Polytechnic University, Saint Petersburg, Russia
| | - Tatyana Rozhmina
- Laboratory of Breeding Technologies, Federal Research Center for Bast Fiber Crops, Torzhok, Russia
| | - Mikhail Bankin
- Mathematical Biology and Bioinformatics Laboratory, Peter the Great St. Petersburg Polytechnic University, Saint Petersburg, Russia
| | - Svetlana Surkova
- Mathematical Biology and Bioinformatics Laboratory, Peter the Great St. Petersburg Polytechnic University, Saint Petersburg, Russia
| | - Anastasia Samsonova
- Centre for Computational Biology, Peter the Great St. Petersburg Polytechnic University, Saint Petersburg, Russia
- Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | - Maria Samsonova
- Mathematical Biology and Bioinformatics Laboratory, Peter the Great St. Petersburg Polytechnic University, Saint Petersburg, Russia
| |
|
34
|
Laurent FX, Fischer A, Oldt RF, Kanthaswamy S, Buckleton JS, Hitchin S. Streamlining the decision-making process for international DNA kinship matching using Worldwide allele frequencies and tailored cutoff log10LR thresholds. Forensic Sci Int Genet 2021; 57:102634. [PMID: 34871915 DOI: 10.1016/j.fsigen.2021.102634] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/16/2021] [Revised: 10/13/2021] [Accepted: 11/15/2021] [Indexed: 11/30/2022]
Abstract
The identification of human remains belonging to missing persons is one of the main challenges for forensic genetics. Although other means of identification can be applied to missing person investigations, DNA is often extremely valuable to further support or refute potential associations. When reference DNA samples cannot be collected from personal items belonging to a missing person, a direct DNA identification cannot be carried out. However, identifications can be made indirectly using DNA from the missing person's relatives. The ranking of likelihood ratio (LR) values, which measure the fit of a missing person for any given pedigree, is often the first step in selecting candidates in a DNA database. Although implementing DNA kinship matching in a national environment is feasible, many challenges need to be resolved before applying this method to an international configuration. In this study, we present an innovative and intuitive method to perform international DNA kinship matching and facilitate the comparison of DNA profiles when the ancestry is unknown or unsure and/or when different marker sets are used. This straightforward method, which is based on calculations performed with the DNA matching software BONAPARTE, Worldwide allele frequencies and tailored cutoff log10LR thresholds, allows for the classification of potential candidates according to the strength of the DNA evidence and the predicted proportion of adventitious matches. This is a powerful method for streamlining the decision-making process in missing person investigations and DVI processes, especially when there are low numbers of overlapping typed STRs. Intuitive interpretation tables and a decision tree will help strengthen international data comparison for the identification of reported missing individuals discovered outside their national borders.
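The ranking-and-cutoff idea in this abstract can be sketched as follows. This is only an illustration: the function names, the category labels, and the cutoff values are hypothetical, and the abstract does not state the thresholds used with BONAPARTE. The sketch assumes that per-locus likelihood ratios are independent, so that their log10 values can be summed.

```python
import math

def combined_log10_lr(per_locus_lrs):
    """Sum of log10 per-locus likelihood ratios (assumes independent loci)."""
    return sum(math.log10(lr) for lr in per_locus_lrs)

def classify_candidate(log10_lr, lower_cutoff, upper_cutoff):
    """Place a candidate in one of three bins using two hypothetical cutoffs."""
    if log10_lr >= upper_cutoff:
        return "strong support"
    if log10_lr >= lower_cutoff:
        return "follow-up needed"
    return "not prioritized"

# Hypothetical per-locus LRs for three candidates against one pedigree.
candidates = {
    "candidate_A": [12.0, 8.5, 20.0, 3.0],
    "candidate_B": [2.0, 1.5, 0.8, 4.0],
    "candidate_C": [0.5, 0.2, 1.1, 0.9],
}

# Rank candidates by combined log10LR, highest first, then bin them.
ranked = sorted(
    ((name, combined_log10_lr(lrs)) for name, lrs in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    # Cutoffs of 1.0 and 3.0 are placeholders, not values from the study.
    print(name, round(score, 2), classify_candidate(score, 1.0, 3.0))
```

In practice the cutoffs would be tailored to the number of overlapping markers and the tolerated proportion of adventitious matches, which is the part this sketch leaves out.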
Affiliation(s)
- François-Xavier Laurent
- International Criminal Police Organization - INTERPOL, DNA Unit, 200 quai Charles de Gaulle, 69006 Lyon, France.
| | - Andrea Fischer
- International Criminal Police Organization - INTERPOL, DNA Unit, 200 quai Charles de Gaulle, 69006 Lyon, France; Landeskriminalamt Baden-Württemberg, Taubenheimstr. 85, 70372 Stuttgart, Germany
| | - Robert F Oldt
- School of Mathematical and Natural Sciences, Arizona State University, Phoenix, AZ 85004, USA
| | - Sree Kanthaswamy
- School of Mathematical and Natural Sciences, Arizona State University, Phoenix, AZ 85004, USA
| | - John S Buckleton
- University of Auckland, Department of Statistics, Private Bag, 92019 Auckland, New Zealand
| | - Susan Hitchin
- International Criminal Police Organization - INTERPOL, DNA Unit, 200 quai Charles de Gaulle, 69006 Lyon, France.
| |
|
35
|
Calboli FCF, Delahaut V, Deflem I, Hablützel PI, Hellemans B, Kordas A, Raeymaekers JAM, Bervoets L, De Boeck G, Volckaert FAM. Association between Chromosome 4 and mercury accumulation in muscle of the three-spined stickleback (Gasterosteus aculeatus). Evol Appl 2021; 14:2553-2567. [PMID: 34745343 PMCID: PMC8549617 DOI: 10.1111/eva.13298] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/30/2020] [Revised: 08/18/2021] [Accepted: 08/29/2021] [Indexed: 11/29/2022] Open
Abstract
Anthropogenic stressors, such as pollutants, act as selective factors that can leave measurable changes in allele frequencies in the genome. Metals are of particular concern among pollutants, because of interference with vital biological pathways. We use the three-spined stickleback as a model for adaptation to mercury pollution in natural populations. We collected sticklebacks from 21 locations in Flanders (Belgium), measured the accumulated levels of mercury in the skeletal muscle tissue, and genotyped the fish using genotyping by sequencing (GBS). The spread of muscle mercury content across locations was considerable, ranging from 21.5 to 327 ng/g dry weight (DW). We then conducted a genome-wide association study (GWAS) between 28,450 single nucleotide polymorphisms (SNPs) and the accumulated levels of mercury, using different approaches. Based on a linear mixed model analysis, the GWAS yielded multiple hits with a single top hit on Chromosome 4, with eight more SNPs suggestive of association. A second approach, a latent factor mixed model analysis, highlighted one single SNP on Chromosome 11. Finally, an outlier test identified one additional SNP on Chromosome 4 that appeared under selection. Out of all ten SNPs we identified as associated with mercury in muscle, three SNPs are all located on Chromosome 4 and positioned within a 2.5 kb distance of an annotated gene. Based on these results and the genome coverage of our SNPs, we conclude that the selective effect of mercury pollution in Flanders causes a significant association with at least one locus on Chromosome 4 in three-spined stickleback.
Affiliation(s)
- Federico C. F. Calboli
- Laboratory of Biodiversity and Evolutionary Genomics, KU Leuven, Leuven, Belgium
- Present address: Natural Resources Institute Finland (Luke), Helsinki, Finland
| | - Vyshal Delahaut
- Department of Biology, Systemic Physiological and Ecotoxicological Research (SPHERE), University of Antwerp, Antwerpen, Belgium
| | - Io Deflem
- Laboratory of Biodiversity and Evolutionary Genomics, KU Leuven, Leuven, Belgium
| | | | - Bart Hellemans
- Laboratory of Biodiversity and Evolutionary Genomics, KU Leuven, Leuven, Belgium
| | - Anna Kordas
- Laboratory of Biodiversity and Evolutionary Genomics, KU Leuven, Leuven, Belgium
| | | | - Lieven Bervoets
- Department of Biology, Systemic Physiological and Ecotoxicological Research (SPHERE), University of Antwerp, Antwerpen, Belgium
| | - Gudrun De Boeck
- Department of Biology, Systemic Physiological and Ecotoxicological Research (SPHERE), University of Antwerp, Antwerpen, Belgium
| | | |
|
36
|
Genome-Wide SNP Analysis Reveals Multiple Paternity in Burmese Pythons Invasive to the Greater Florida Everglades. J Herpetol 2021. [DOI: 10.1670/20-104] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
|
37
|
A spectral theory for Wright's inbreeding coefficients and related quantities. PLoS Genet 2021; 17:e1009665. [PMID: 34280184 PMCID: PMC8320931 DOI: 10.1371/journal.pgen.1009665] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/09/2020] [Revised: 07/29/2021] [Accepted: 06/13/2021] [Indexed: 12/20/2022] Open
Abstract
Wright’s inbreeding coefficient, FST, is a fundamental measure in population genetics. Assuming a predefined population subdivision, this statistic is classically used to evaluate population structure at a given genomic locus. With large numbers of loci, unsupervised approaches such as principal component analysis (PCA) have, however, become prominent in recent analyses of population structure. In this study, we describe the relationships between Wright’s inbreeding coefficients and PCA for a model of K discrete populations. Our theory provides an equivalent definition of FST based on the decomposition of the genotype matrix into between and within-population matrices. The average value of Wright’s FST over all loci included in the genotype matrix can be obtained from the PCA of the between-population matrix. Assuming that a separation condition is fulfilled and for reasonably large data sets, this value of FST approximates the proportion of genetic variation explained by the first (K − 1) principal components accurately. The new definition of FST is useful for computing inbreeding coefficients from surrogate genotypes, for example, obtained after correction of experimental artifacts or after removing adaptive genetic variation associated with environmental variables. The relationships between inbreeding coefficients and the spectrum of the genotype matrix not only allow interpretations of PCA results in terms of population genetic concepts but extend those concepts to population genetic analyses accounting for temporal, geographical and environmental contexts.
Author summary
Principal component analysis (PCA) is the most frequently used approach to describe population genetic structure from large population genomic data sets. In this study, we show that PCA not only estimates ancestries of sampled individuals, but also computes the average value of Wright’s inbreeding coefficient over the loci included in the genotype matrix. Our result shows that inbreeding coefficients and PCA eigenvalues provide equivalent descriptions of population structure. As a consequence, PCA extends the definition of those coefficients beyond the framework of allelic frequencies. We give examples on how FST can be computed from ancient DNA samples for which genotypes are corrected for coverage, and in an ecological genomic example where a proportion of genetic variation is explained by environmental variables.
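The decomposition into between- and within-population matrices can be sketched numerically. The code below is a simplified illustration under stated assumptions: genotypes are 0/1/2 dosages, loci are centered but not rescaled, and the function name `between_population_variance_share` is hypothetical. It computes the share of total genotypic variance carried by the between-population matrix and checks that this matrix has at most K − 1 non-zero singular values; it does not claim to reproduce the paper's exact FST definition or its separation condition.

```python
import numpy as np

def between_population_variance_share(genotypes, labels):
    """Share of centered genotype variance explained by population means.

    Returns the variance share and the singular values of the
    between-population matrix.
    """
    x = np.asarray(genotypes, dtype=float)
    labels = np.asarray(labels)
    z = x - x.mean(axis=0)  # center each locus across all individuals
    between = np.zeros_like(z)
    for pop in np.unique(labels):
        members = labels == pop
        # Every individual is replaced by its population's mean profile.
        between[members] = z[members].mean(axis=0)
    share = (between ** 2).sum() / (z ** 2).sum()
    singular_values = np.linalg.svd(between, compute_uv=False)
    return share, singular_values

# Simulated example with K = 3 populations (illustrative assumption).
rng = np.random.default_rng(0)
n_pops, n_per_pop, n_loci = 3, 30, 200
base = rng.uniform(0.2, 0.8, size=n_loci)
pop_freqs = np.clip(base + rng.normal(0.0, 0.1, size=(n_pops, n_loci)), 0.05, 0.95)
labels = np.repeat(np.arange(n_pops), n_per_pop)
genotypes = rng.binomial(2, pop_freqs[labels])

share, singular_values = between_population_variance_share(genotypes, labels)
total = ((genotypes - genotypes.mean(axis=0)) ** 2).sum()
# Variance share recovered from the first K - 1 singular values only.
leading_share = (singular_values[: n_pops - 1] ** 2).sum() / total
print(round(share, 4), round(leading_share, 4))
```

The two printed numbers agree because the between-population matrix has K distinct rows whose size-weighted sum is zero, so its rank cannot exceed K − 1.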
|
38
|
Ochoa A, Storey JD. Estimating FST and kinship for arbitrary population structures. PLoS Genet 2021; 17:e1009241. [PMID: 33465078 PMCID: PMC7846127 DOI: 10.1371/journal.pgen.1009241] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 09/11/2019] [Revised: 01/29/2021] [Accepted: 11/02/2020] [Indexed: 12/20/2022] Open
Abstract
FST and kinship are key parameters often estimated in modern population genetics studies in order to quantitatively characterize structure and relatedness. Kinship matrices have also become a fundamental quantity used in genome-wide association studies and heritability estimation. The most frequently-used estimators of FST and kinship are method-of-moments estimators whose accuracies depend strongly on the existence of simple underlying forms of structure, such as the independent subpopulations model of non-overlapping, independently evolving subpopulations. However, modern data sets have revealed that these simple models of structure likely do not hold in many populations, including humans. In this work, we analyze the behavior of these estimators in the presence of arbitrarily-complex population structures, which results in an improved estimation framework specifically designed for arbitrary population structures. After generalizing the definition of FST to arbitrary population structures and establishing a framework for assessing bias and consistency of genome-wide estimators, we calculate the accuracy of existing FST and kinship estimators under arbitrary population structures, characterizing biases and estimation challenges unobserved under their originally-assumed models of structure. We then present our new approach, which consistently estimates kinship and FST when the minimum kinship value in the dataset is estimated consistently. We illustrate our results using simulated genotypes from an admixture model, constructing a one-dimensional geographic scenario that departs nontrivially from the independent subpopulations model. Our simulations reveal the potential for severe biases in estimates of existing approaches that are overcome by our new framework. This work may significantly improve future analyses that rely on accurate kinship and FST estimates. 
Author summary
Kinship coefficients and FST, which measure relatedness and population structure, respectively, are important quantities needed to accurately perform various analyses on genetic data, including genome-wide association studies and heritability estimation. However, existing estimators require restrictive assumptions of independence that are not met by real human and other datasets. In this work we find that existing estimators can be severely biased under reasonable scenarios, first by theoretically determining their properties, and then using an admixture simulation to illustrate our findings. In particular, we find that existing FST estimators are downwardly biased, and that existing kinship matrix estimators have related biases that are on average downward and of similar magnitude but vary for every pair of individuals. These insights led us to a new estimation framework for kinship and FST that is practically unbiased for any population structure, as demonstrated by theory and simulations. Our new approaches—available as open-source R packages—are easy to use and are more widely applicable than existing approaches, and they are likely to improve downstream analyses that require accurate kinship and FST estimates.
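The dependence on the minimum kinship value can be made concrete with a small sketch. It is a method-of-moments estimator anchored at the least related pair, written under explicit assumptions: dosages are coded 0/1/2, the least related pair in the sample is taken to have kinship zero, and the function name `min_anchored_kinship` is hypothetical. The authors' R packages may estimate the minimum differently, for example by averaging over the least related subpopulations, so this is not a substitute for them.

```python
import numpy as np

def min_anchored_kinship(dosages):
    """Kinship and inbreeding estimates anchored at the least related pair.

    dosages: (n_individuals, n_loci) array of 0/1/2 allele counts.
    """
    x = np.asarray(dosages, dtype=float)
    n, n_loci = x.shape
    centered = x - 1.0
    # Pairwise statistic that grows from its minimum as kinship increases.
    a = centered @ centered.T / n_loci - 1.0
    off_diag = ~np.eye(n, dtype=bool)
    # Assumption: the smallest off-diagonal value marks a pair with kinship 0.
    a_min = a[off_diag].min()
    kinship = 1.0 - a / a_min
    # Self-kinship is (1 + f) / 2, so inbreeding is f = 2 * self-kinship - 1.
    inbreeding = 2.0 * np.diag(kinship) - 1.0
    return kinship, inbreeding

# Two simulated subpopulations with drifted allele frequencies (illustrative).
rng = np.random.default_rng(0)
n_loci = 1000
ancestral = rng.uniform(0.2, 0.8, size=n_loci)
drifted = np.clip(ancestral + rng.normal(0.0, 0.1, size=(2, n_loci)), 0.05, 0.95)
labels = np.repeat([0, 1], 15)
genotypes = rng.binomial(2, drifted[labels])

kinship, inbreeding = min_anchored_kinship(genotypes)
# Uniform-weight mean inbreeding as one FST-style summary (a sketch-level choice).
print(round(float(inbreeding.mean()), 4))
```

If the anchoring pair is in fact related, every estimate is shifted downward, which is the kind of bias the abstract attributes to estimators that assume independent subpopulations.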
Affiliation(s)
- Alejandro Ochoa
- Duke Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina, United States of America
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, United States of America
| | - John D. Storey
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
| |
|