1
Ko BS, Lee SB, Kim TK. A brief guide to analyzing expression quantitative trait loci. Mol Cells 2024; 47:100139. [PMID: 39447874 DOI: 10.1016/j.mocell.2024.100139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Revised: 10/14/2024] [Accepted: 10/17/2024] [Indexed: 10/26/2024] Open
Abstract
Molecular quantitative trait locus (molQTL) mapping has emerged as an important approach for elucidating the functional consequences of genetic variants and unraveling the causal mechanisms underlying diseases or complex traits. However, the variety of analysis tools and sophisticated methodologies available for molQTL studies can be overwhelming for researchers with limited computational expertise. Here, we provide a brief guideline with a curated list of methods and software tools for analyzing expression quantitative trait loci, the most widely studied type of molQTL.
Affiliation(s)
- Byung Su Ko
  - Department of Brain Sciences, DGIST, Daegu 42988, Republic of Korea
- Sung Bae Lee
  - Department of Brain Sciences, DGIST, Daegu 42988, Republic of Korea
- Tae-Kyung Kim
  - Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang 37673, Republic of Korea; Institute for Convergence Research and Education in Advanced Technology, Yonsei University, Seoul 03722, Republic of Korea.

2
Karasov TL, Neumann M, Leventhal L, Symeonidi E, Shirsekar G, Hawks A, Monroe G, Exposito-Alonso M, Bergelson J, Weigel D, Schwab R. Continental-scale associations of Arabidopsis thaliana phyllosphere members with host genotype and drought. Nat Microbiol 2024; 9:2748-2758. [PMID: 39242816 PMCID: PMC11457713 DOI: 10.1038/s41564-024-01773-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Accepted: 07/02/2024] [Indexed: 09/09/2024]
Abstract
Plants are colonized by distinct pathogenic and commensal microbiomes across different regions of the globe, but the factors driving their geographic variation are largely unknown. Here, using 16S ribosomal DNA and shotgun sequencing, we characterized the associations of the Arabidopsis thaliana leaf microbiome with host genetics and climate variables from 267 populations in the species' native range across Europe. Comparing the distribution of the 575 major bacterial amplicon variants (phylotypes), we discovered that microbiome composition in A. thaliana segregates along a latitudinal gradient. The latitudinal clines in microbiome composition are predicted by metrics of drought, but also by the spatial genetics of the host. To validate the relative effects of drought and host genotype we conducted a common garden field study, finding 10% of the core bacteria to be affected directly by drought and 20% to be affected by host genetic associations with drought. These data provide a valuable resource for the plant microbiome field, with the identified associations suggesting that drought can directly and indirectly shape genetic variation in A. thaliana via the leaf microbiome.
Affiliation(s)
- Talia L Karasov
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, USA.
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany.
- Manuela Neumann
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
  - Robert Bosch GmbH, Renningen, Germany
- Laura Leventhal
  - Department of Biology, Stanford University, Stanford, CA, USA
  - Department of Plant Biology, Carnegie Institution for Plant Science, Stanford, CA, USA
- Efthymia Symeonidi
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, USA
- Gautam Shirsekar
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
  - Department of Entomology and Plant Pathology, Institute of Agriculture, University of Tennessee, Knoxville, TN, USA
- Aubrey Hawks
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, USA
- Grey Monroe
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
  - Department of Plant Sciences, University of California Davis, Davis, CA, USA
- Moisés Exposito-Alonso
  - Department of Biology, Stanford University, Stanford, CA, USA
  - Department of Plant Biology, Carnegie Institution for Plant Science, Stanford, CA, USA
  - Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
  - Howard Hughes Medical Institute, University of California Berkeley, Berkeley, CA, USA
- Joy Bergelson
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Detlef Weigel
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany.
  - Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany.
- Rebecca Schwab
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany

3
Jiang Y, Qu M, Jiang M, Jiang X, Fernandez S, Porter T, Laws SM, Masters CL, Guo H, Cheng S, Wang C. MethylGenotyper: Accurate Estimation of SNP Genotypes and Genetic Relatedness from DNA Methylation Data. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae044. [PMID: 39353864 DOI: 10.1093/gpbjnl/qzae044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Revised: 05/26/2024] [Accepted: 06/06/2024] [Indexed: 10/04/2024]
Abstract
Epigenome-wide association studies (EWAS) are susceptible to widespread confounding caused by population structure and genetic relatedness. Nevertheless, kinship estimation is challenging in EWAS without genotyping data. Here, we proposed MethylGenotyper, a method that for the first time enables accurate genotyping at thousands of single nucleotide polymorphisms (SNPs) directly from commercial DNA methylation microarrays. We modeled the intensities of methylation probes near SNPs with a mixture of three beta distributions corresponding to different genotypes and estimated parameters with an expectation-maximization algorithm. We conducted extensive simulations to demonstrate the performance of the method. When applying MethylGenotyper to the Infinium EPIC array data of 4662 Chinese samples, we obtained genotypes at 4319 SNPs with a concordance rate of 98.26%, enabling the identification of 255 pairs of close relatedness. Furthermore, we showed that MethylGenotyper allows for the estimation of both population structure and cryptic relatedness among 702 Australians of diverse ancestry. We also implemented MethylGenotyper in a publicly available R package (https://github.com/Yi-Jiang/MethylGenotyper) to facilitate future large-scale EWAS.
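The mixture-model idea in this abstract can be illustrated with a minimal sketch. It assumes each probe yields a signal ratio in (0, 1), simulates three genotype clusters, and fits a three-component beta mixture by expectation-maximization. The M-step below uses weighted method-of-moments updates as a simple stand-in, which is not necessarily the update used by MethylGenotyper; the function names, starting values, and simulation parameters are all hypothetical.

```python
import math
import random


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, computed in log space for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def fit_beta_mixture(values, n_iter=100):
    """EM for a 3-component beta mixture (hypothetical helper, not the package API)."""
    eps = 1e-6
    xs = [min(max(v, eps), 1 - eps) for v in values]
    # Assumed starting values: clusters near 0, 0.5 and 1.
    params = [(2.0, 20.0), (20.0, 20.0), (20.0, 2.0)]
    weights = [1 / 3, 1 / 3, 1 / 3]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in xs:
            dens = [w * beta_pdf(x, a, b) for w, (a, b) in zip(weights, params)]
            total = sum(dens)
            if total <= 0.0:
                resp.append([1 / 3, 1 / 3, 1 / 3])
            else:
                resp.append([d / total for d in dens])
        # M-step: weighted method-of-moments update (a swapped-in simplification).
        for k in range(3):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue  # leave an empty component unchanged
            mean = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, xs)) / nk
            var = min(max(var, 1e-8), 0.999 * mean * (1 - mean))
            common = mean * (1 - mean) / var - 1
            params[k] = (mean * common, (1 - mean) * common)
            weights[k] = nk / len(xs)
    # Genotype call = component with the highest posterior probability.
    calls = [max(range(3), key=lambda k: r[k]) for r in resp]
    return weights, params, calls


def simulate(n=600, alt_freq=0.3, seed=1):
    """Simulate genotypes under Hardy-Weinberg proportions and noisy signal ratios."""
    rng = random.Random(seed)
    shapes = [(2, 18), (20, 20), (18, 2)]  # assumed cluster shapes per genotype
    genotypes = [sum(rng.random() < alt_freq for _ in range(2)) for _ in range(n)]
    values = [rng.betavariate(*shapes[g]) for g in genotypes]
    return genotypes, values


genotypes, values = simulate()
weights, params, calls = fit_beta_mixture(values)
accuracy = sum(c == g for c, g in zip(calls, genotypes)) / len(genotypes)
print("mixture weights:", [round(w, 3) for w in weights])
print("call accuracy:", round(accuracy, 3))
```

Because the simulated clusters are well separated, the posterior-maximizing component usually matches the simulated genotype; real array data would likely need probe-level filtering and better starting values than this sketch provides.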
Affiliation(s)
- Yi Jiang
  - Ministry of Education Key Laboratory of Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
  - Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Minghan Qu
  - Ministry of Education Key Laboratory of Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
  - Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Minghui Jiang
  - Ministry of Education Key Laboratory of Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
  - Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Xuan Jiang
  - Ministry of Education Key Laboratory of Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
  - Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Shane Fernandez
  - Centre for Precision Health, Edith Cowan University, Perth, WA 6027, Australia
  - Collaborative Genomics and Translation Group, School of Medical and Health Sciences, Edith Cowan University, Perth, WA 6027, Australia
- Tenielle Porter
  - Centre for Precision Health, Edith Cowan University, Perth, WA 6027, Australia
  - Collaborative Genomics and Translation Group, School of Medical and Health Sciences, Edith Cowan University, Perth, WA 6027, Australia
  - Curtin Medical School, Bentley, WA 6102, Australia
- Simon M Laws
  - Centre for Precision Health, Edith Cowan University, Perth, WA 6027, Australia
  - Collaborative Genomics and Translation Group, School of Medical and Health Sciences, Edith Cowan University, Perth, WA 6027, Australia
  - Curtin Medical School, Bentley, WA 6102, Australia
- Colin L Masters
  - The Florey Institute of Neuroscience and Mental Health, University of Melbourne, Melbourne, VIC 3052, Australia
- Huan Guo
  - Ministry of Education Key Laboratory of Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
  - Department of Occupational and Environmental Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Shanshan Cheng
  - Ministry of Education Key Laboratory of Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
  - Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Chaolong Wang
  - Ministry of Education Key Laboratory of Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
  - Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China

4
Freudiger A, Jovanovic VM, Huang Y, Snyder-Mackler N, Conrad DF, Miller B, Montague MJ, Westphal H, Stadler PF, Bley S, Horvath JE, Brent LJN, Platt ML, Ruiz-Lambides A, Tung J, Nowick K, Ringbauer H, Widdig A. Taking identity-by-descent analysis into the wild: Estimating realized relatedness in free-ranging macaques. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.09.574911. [PMID: 38260273 PMCID: PMC10802400 DOI: 10.1101/2024.01.09.574911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Biological relatedness is a key consideration in studies of behavior, population structure, and trait evolution. Except for parent-offspring dyads, pedigrees capture relatedness imperfectly. The number and length of DNA segments that are identical-by-descent (IBD) yield the most precise estimates of relatedness. Here, we leverage novel methods for estimating locus-specific IBD from low coverage whole genome resequencing data to demonstrate the feasibility and value of resolving fine-scaled gradients of relatedness in free-living animals. Using primarily 4-6× coverage data from a rhesus macaque (Macaca mulatta) population with available long-term pedigree data, we show that we can call the number and length of IBD segments across the genome with high accuracy even at 0.5× coverage. The resulting estimates demonstrate substantial variation in genetic relatedness within kin classes, leading to overlapping distributions between kin classes. They identify cryptic genetic relatives that are not represented in the pedigree and reveal elevated recombination rates in females relative to males, which allows us to discriminate maternal and paternal kin using genotype data alone. Our findings represent a breakthrough in the ability to understand the predictors and consequences of genetic relatedness in natural populations, contributing to our understanding of a fundamental component of population structure in the wild.
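As a rough illustration of how IBD segments translate into a relatedness value, the sketch below sums segment lengths by sharing state and applies the standard relation r = k1/2 + k2, where k1 and k2 are the genome fractions shared IBD1 and IBD2. The segment tuple layout, the function name, and the toy genome length are assumptions for illustration and are not taken from the preprint.

```python
def realized_relatedness(segments, genome_length):
    """Estimate the relatedness coefficient r from IBD segments.

    segments: iterable of (start, end, state) tuples, with state 1 (IBD1)
    or 2 (IBD2), in the same units as genome_length (hypothetical layout).
    """
    shared = {1: 0.0, 2: 0.0}
    for start, end, state in segments:
        if state not in shared:
            raise ValueError(f"unexpected IBD state: {state}")
        shared[state] += end - start
    k1 = shared[1] / genome_length  # fraction of the genome shared IBD1
    k2 = shared[2] / genome_length  # fraction of the genome shared IBD2
    return 0.5 * k1 + k2


# Toy genome of 100 units.
parent_offspring = [(0, 100, 1)]            # IBD1 along the whole genome
sibling_like = [(0, 50, 1), (50, 75, 2)]    # k1 = 0.5, k2 = 0.25
print(realized_relatedness(parent_offspring, 100))
print(realized_relatedness(sibling_like, 100))
```

Two dyads with the same expected pedigree relatedness can differ in this realized value because the number and length of shared segments vary with recombination, which is the within-kin-class variation the abstract describes.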
Affiliation(s)
- Annika Freudiger
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Vladimir M Jovanovic
  - Human Biology and Primate Evolution, Institut für Zoologie, Freie Universität Berlin, Berlin, Germany
  - Bioinformatics Solution Center, Freie Universität Berlin, Berlin, Germany
- Yilei Huang
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
- Noah Snyder-Mackler
  - Center for Evolution & Medicine, School of Life Sciences, Arizona State University, Tempe, USA
- Donald F Conrad
  - Division of Genetics, Oregon National Primate Research Center, Portland, Oregon, USA
- Brian Miller
  - Division of Genetics, Oregon National Primate Research Center, Portland, Oregon, USA
- Michael J Montague
  - Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Hendrikje Westphal
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
- Peter F Stadler
  - Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  - Institute for Theoretical Chemistry, University of Vienna, Austria
  - Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
  - Santa Fe Institute, Santa Fe, NM, USA
- Stefanie Bley
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Julie E Horvath
  - Department of Biological and Biomedical Sciences, North Carolina Central University, North Carolina, Durham, USA
  - Research and Collections Section, North Carolina Museum of Natural Sciences, North Carolina, Raleigh, USA
  - Department of Biological Sciences, North Carolina State University, North Carolina, Raleigh, USA
  - Department of Evolutionary Anthropology, Duke University, North Carolina, Durham, USA
  - Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Lauren J N Brent
  - Centre for Research in Animal Behaviour, University of Exeter, Exeter, UK
- Michael L Platt
  - Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Marketing Department, the Wharton School of Business, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Psychology, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Angelina Ruiz-Lambides
  - Cayo Santiago Field Station, Caribbean Primate Research Center, University of Puerto Rico, Punta Santiago, Puerto Rico
- Jenny Tung
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Department of Evolutionary Anthropology, Duke University, North Carolina, Durham, USA
  - Department of Biology, Duke University, Durham, North Carolina, USA
  - Duke University Population Research Institute, Durham, North Carolina, USA
- Katja Nowick
  - Human Biology and Primate Evolution, Institut für Zoologie, Freie Universität Berlin, Berlin, Germany
  - Bioinformatics Solution Center, Freie Universität Berlin, Berlin, Germany
- Harald Ringbauer
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Anja Widdig
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany

5
Arora A, Jack K, Kumar AV, Borad M, Girardo ME, De Filippis E, Yang P, Dinu V. Genome-Wide Association Study of Gallstone Disease Identifies Novel Candidate Genomic Variants in a Latino Community of Southwest USA. J Racial Ethn Health Disparities 2023:10.1007/s40615-023-01867-0. [PMID: 38015333 DOI: 10.1007/s40615-023-01867-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 11/05/2023] [Accepted: 11/06/2023] [Indexed: 11/29/2023]
Abstract
Gallstone disease (GSD) is a prevalent health condition that affects many adults and is associated with the presence of stones in the gallbladder cavity, resulting in inflammation, pain, fever, nausea and vomiting. Several genome-wide association studies (GWAS) have identified genes associated with GSD, but only a few focused on Latino populations. To identify genetic risk factors for GSD in the Latino population living in the Southwest USA, we used self-reported clinical history and physical and laboratory measurement data in the Sangre Por Salud (SPS) cohort and identified participants with and without a diagnosis of GSD. We performed a GWAS on this phenotype using GSD cases matched to normal controls based on a tight criterion. We identified several novel loci associated with GSD as well as loci that were identified in previous GWAS. The top 3 loci (MATN2, GPRIN3, GPC6) were strongly associated with the GSD phenotype in our combined analysis, and in a sex-stratified analysis the results in females were closest to the overall results, reflecting a generally higher disease prevalence in females. The top identified variants in MATN2, GPRIN3, and GPC6 remained unchanged after local ancestry adjustment in the SPS Latino population. Follow-up pathway enrichment analysis suggests enrichment of GO terms associated with immunological pathways; enzymatic processes in the gallbladder, liver, and gastrointestinal tract; and GSD pathology. Our findings provide a starting point towards a better and deeper understanding of differences in gallstone disease pathology, biological mechanisms, and disease progression among the Southwest US Latino population.
Affiliation(s)
- Amit Arora
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA.
- Khadijah Jack
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
- Ashok V Kumar
  - Department of Quantitative Health Science, Mayo Clinic, Scottsdale, AZ, 85259, USA
- Mitesh Borad
  - Division of Hematology and Medical Oncology, Mayo Clinic, Scottsdale, AZ, 85259, USA
- Marlene E Girardo
  - Department of Quantitative Health Science, Mayo Clinic, Scottsdale, AZ, 85259, USA
- Ping Yang
  - Department of Quantitative Health Science, Mayo Clinic, Scottsdale, AZ, 85259, USA
- Valentin Dinu
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA

6
Jiang W, Zhang X, Li S, Song S, Zhao H. An unbiased kinship estimation method for genetic data analysis. BMC Bioinformatics 2022; 23:525. [PMID: 36474154 PMCID: PMC9727941 DOI: 10.1186/s12859-022-05082-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/15/2022] [Accepted: 11/25/2022] [Indexed: 12/13/2022] Open
Abstract
Accurate estimate of relatedness is important for genetic data analyses, such as heritability estimation and association mapping based on data collected from genome-wide association studies. Inaccurate relatedness estimates may lead to biased heritability estimations and spurious associations. Individual-level genotype data are often used to estimate kinship coefficient between individuals. The commonly used sample correlation-based genomic relationship matrix (scGRM) method estimates kinship coefficient by calculating the average sample correlation coefficient among all single nucleotide polymorphisms (SNPs), where the observed allele frequencies are used to calculate both the expectations and variances of genotypes. Although this method is widely used, a substantial proportion of estimated kinship coefficients are negative, which are difficult to interpret. In this paper, through mathematical derivation, we show that there indeed exists bias in the estimated kinship coefficient using the scGRM method when the observed allele frequencies are regarded as true frequencies. This leads to negative bias for the average estimate of kinship among all individuals, which explains the estimated negative kinship coefficients. Based on this observation, we propose an unbiased estimation method, UKin, which can reduce kinship estimation bias. We justify our improved method with rigorous mathematical proof. We have conducted simulations as well as two real data analyses to compare UKin with scGRM and three other kinship estimating methods: rGRM, tsGRM, and KING. Our results demonstrate that both bias and root mean square error in kinship coefficient estimation could be reduced by using UKin. We further investigated the performance of UKin, KING, and three GRM-based methods in calculating the SNP-based heritability, and show that UKin can improve estimation accuracy for heritability regardless of the scale of SNP panel.
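The sample-correlation estimator described in this abstract can be sketched in a few lines. The example below builds a genomic relationship matrix from simulated unrelated individuals, using the observed allele frequencies for both centering and scaling, and reports the mean off-diagonal entry. It is a minimal illustration under assumed simulation settings, not the authors' code and not the UKin correction; the entries are on the relationship scale, which under the common convention is twice the kinship coefficient.

```python
import random


def sc_grm(genotypes):
    """Sample-correlation GRM from 0/1/2 genotypes (rows: individuals, columns: SNPs)."""
    n = len(genotypes)
    m = len(genotypes[0])
    # Observed allele frequency per SNP, treated as if it were the true frequency.
    freqs = [sum(row[s] for row in genotypes) / (2 * n) for s in range(m)]
    used = [s for s in range(m) if 0.0 < freqs[s] < 1.0]  # drop monomorphic SNPs
    grm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            total = 0.0
            for s in used:
                p = freqs[s]
                total += ((genotypes[i][s] - 2 * p) * (genotypes[j][s] - 2 * p)
                          / (2 * p * (1 - p)))
            grm[i][j] = grm[j][i] = total / len(used)
    return grm


def mean_off_diagonal(matrix):
    """Average of all entries outside the main diagonal."""
    n = len(matrix)
    return sum(matrix[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))


# Assumed toy setting: 20 unrelated individuals, 200 independent SNPs.
rng = random.Random(0)
n_ind, n_snp = 20, 200
true_freqs = [rng.uniform(0.1, 0.9) for _ in range(n_snp)]
genotypes = [[sum(rng.random() < p for _ in range(2)) for p in true_freqs]
             for _ in range(n_ind)]
grm = sc_grm(genotypes)
print("mean off-diagonal entry:", mean_off_diagonal(grm))
```

With this construction the centered genotypes sum to zero within each SNP, so all matrix entries sum to zero and the off-diagonal mean is forced below zero even for unrelated individuals, consistent with the negative bias the abstract describes.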
Affiliation(s)
- Wei Jiang
  - Department of Biostatistics, School of Public Health, Yale University, New Haven, USA
- Xiangyu Zhang
  - Department of Biostatistics, School of Public Health, Yale University, New Haven, USA
- Siting Li
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, USA
- Shuang Song
  - Center for Statistical Science, Tsinghua University, Beijing, China
  - Department of Industrial Engineering, Tsinghua University, Beijing, China
- Hongyu Zhao
  - Department of Biostatistics, School of Public Health, Yale University, New Haven, USA.

7
Wang S, Kim M, Li W, Jiang X, Chen H, Harmanci A. Privacy-aware estimation of relatedness in admixed populations. Brief Bioinform 2022; 23:bbac473. [PMID: 36384083 PMCID: PMC10144692 DOI: 10.1093/bib/bbac473] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/27/2022] [Revised: 09/07/2022] [Accepted: 10/02/2022] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Estimation of genetic relatedness, or kinship, is used occasionally for recreational purposes and in forensic applications. While numerous methods were developed to estimate kinship, they suffer from high computational requirements and often make an untenable assumption of homogeneous population ancestry of the samples. Moreover, genetic privacy is generally overlooked in the usage of kinship estimation methods. There can be ethical concerns about finding unknown familial relationships in third-party databases. Similar ethical concerns may arise while estimating and reporting sensitive population-level statistics such as inbreeding coefficients for the concerns around marginalization and stigmatization. RESULTS Here, we present SIGFRIED, which makes use of existing reference panels with a projection-based approach that simplifies kinship estimation in the admixed populations. We use simulated and real datasets to demonstrate the accuracy and efficiency of kinship estimation. We present a secure federated kinship estimation framework and implement a secure kinship estimator using homomorphic encryption-based primitives for computing relatedness between samples in two different sites while genotype data are kept confidential. Source code and documentation for our methods can be found at https://doi.org/10.5281/zenodo.7053352. CONCLUSIONS Analysis of relatedness is fundamentally important for identifying relatives, in association studies, and for estimation of population-level estimates of inbreeding. As the awareness of individual and group genomic privacy is growing, privacy-preserving methods for the estimation of relatedness are needed. Presented methods alleviate the ethical and privacy concerns in the analysis of relatedness in admixed, historically isolated and underrepresented populations. 
SHORT ABSTRACT Genetic relatedness is a central quantity used for finding relatives in databases, correcting biases in genome wide association studies and for estimating population-level statistics. Methods for estimating genetic relatedness have high computational requirements, and occasionally do not consider individuals from admixed ancestries. Furthermore, the ethical concerns around using genetic data and calculating relatedness are not considered. We present a projection-based approach that can efficiently and accurately estimate kinship. We implement our method using encryption-based techniques that provide provable security guarantees to protect genetic data while kinship statistics are computed among multiple sites.
Affiliation(s)
- Su Wang
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Miran Kim
  - Department of Mathematics, Hanyang University, Seoul, 04763, Republic of Korea
- Wentao Li
  - Center for Secure Artificial intelligence For hEalthcare (SAFE), School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
- Xiaoqian Jiang
  - Center for Secure Artificial intelligence For hEalthcare (SAFE), School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
- Han Chen
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Arif Harmanci
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA

8
Mbebi AJ, Breitler JC, Bordeaux M, Sulpice R, McHale M, Tong H, Toniutti L, Castillo JA, Bertrand B, Nikoloski Z. A comparative analysis of genomic and phenomic predictions of growth-related traits in 3-way coffee hybrids. G3 GENES|GENOMES|GENETICS 2022; 12:6632664. [PMID: 35792875 PMCID: PMC9434219 DOI: 10.1093/g3journal/jkac170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Accepted: 06/14/2022] [Indexed: 11/14/2022]
Abstract
Genomic prediction has revolutionized crop breeding despite remaining issues of transferability of models to unseen environmental conditions and environments. Usage of endophenotypes rather than genomic markers leads to the possibility of building phenomic prediction models that can account, in part, for this challenge. Here, we compare and contrast genomic prediction and phenomic prediction models for 3 growth-related traits, namely, leaf count, tree height, and trunk diameter, from 2 coffee 3-way hybrid populations exposed to a series of treatment-inducing environmental conditions. The models are based on 7 different statistical methods built with genomic markers and ChlF data used as predictors. This comparative analysis demonstrates that the best-performing phenomic prediction models show higher predictability than the best genomic prediction models for the considered traits and environments in the vast majority of comparisons within 3-way hybrid populations. In addition, we show that phenomic prediction models are transferrable between conditions but to a lower extent between populations and we conclude that chlorophyll a fluorescence data can serve as alternative predictors in statistical models of coffee hybrid performance. Future directions will explore their combination with other endophenotypes to further improve the prediction of growth-related traits for crops.
Affiliation(s)
- Alain J Mbebi
  - Bioinformatics Group, Institute of Biochemistry and Biology, University of Potsdam, Potsdam-Golm 14476, Germany
  - Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Potsdam-Golm 14476, Germany
- Jean-Christophe Breitler
  - Centre de Coopération Internationale en Recherche Agronomique pour le Développement, Montpellier 34398, France
- Mélanie Bordeaux
  - Fundación Nicafrance, Finca La Cumplida Km. 147 Carretera Matagalpa - La Dalia, 3 Km al Noreste, Matagalpa, Nicaragua
- Ronan Sulpice
  - National University Ireland Galway, Plant Systems Biology Laboratory, Ryan Institute, School of Natural Sciences, Galway H91 TK33, Ireland
- Marcus McHale
  - National University Ireland Galway, Plant Systems Biology Laboratory, Ryan Institute, School of Natural Sciences, Galway H91 TK33, Ireland
- Hao Tong
  - Bioinformatics Group, Institute of Biochemistry and Biology, University of Potsdam, Potsdam-Golm 14476, Germany
  - Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Potsdam-Golm 14476, Germany
  - Center for Plant Systems Biology and Biotechnology, Plovdiv 4000, Bulgaria
- Lucile Toniutti
  - Centre de Coopération Internationale en Recherche Agronomique pour le Développement, Montpellier 34398, France
- Jonny Alonso Castillo
  - Fundación Nicafrance, Finca La Cumplida Km. 147 Carretera Matagalpa - La Dalia, 3 Km al Noreste, Matagalpa, Nicaragua
- Benoît Bertrand
  - Centre de Coopération Internationale en Recherche Agronomique pour le Développement, Montpellier 34398, France
- Zoran Nikoloski
  - Bioinformatics Group, Institute of Biochemistry and Biology, University of Potsdam, Potsdam-Golm 14476, Germany
  - Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Potsdam-Golm 14476, Germany
  - Center for Plant Systems Biology and Biotechnology, Plovdiv 4000, Bulgaria

9
Herzig AF, Ciullo M, Leutenegger AL, Perdry H. Moment estimators of relatedness from low-depth whole-genome sequencing data. BMC Bioinformatics 2022; 23:254. [PMID: 35751014 PMCID: PMC9233360 DOI: 10.1186/s12859-022-04795-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Accepted: 06/09/2022] [Indexed: 11/29/2022] Open
Abstract
Background Estimating relatedness is an important step for many genetic study designs. A variety of methods for estimating coefficients of pairwise relatedness from genotype data have been proposed. Both the kinship coefficient φ and the fraternity coefficient ψ for all pairs of individuals are of interest. However, when dealing with low-depth sequencing or imputation data, individual level genotypes cannot be confidently called. To ignore such uncertainty is known to result in biased estimates. Accordingly, methods have recently been developed to estimate kinship from uncertain genotypes. Results We present new method-of-moment estimators of both the coefficients φ and ψ calculated directly from genotype likelihoods. We have simulated low-depth genetic data for a sample of individuals with extensive relatedness by using the complex pedigree of the known genetic isolates of Cilento in South Italy. Through this simulation, we explore the behaviour of our estimators, demonstrate their properties, and show advantages over alternative methods. A demonstration of our method is given for a sample of 150 French individuals with down-sampled sequencing data. Conclusions We find that our method can provide accurate relatedness estimates whilst holding advantages over existing methods in terms of robustness, independence from external software, and required computation time. The method presented in this paper is referred to as LowKi (Low-depth Kinship) and has been made available in an R package (https://github.com/genostats/LowKi). Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04795-8.
Affiliation(s)
| | - M Ciullo
- Institute of Genetics and Biophysics A. Buzzati-Traverso - CNR, Naples, Italy; IRCCS Neuromed, Pozzilli, Isernia, Italy
| | | | - A-L Leutenegger
- Inserm, Université Paris Cité, UMR 1141, NeuroDiderot, 75019, Paris, France
| | - H Perdry
- CESP Inserm U1018, Université Paris-Saclay, UVSQ, Villejuif, France
| |
|
10
|
Laurent FX, Fischer A, Oldt RF, Kanthaswamy S, Buckleton JS, Hitchin S. Streamlining the decision-making process for international DNA kinship matching using Worldwide allele frequencies and tailored cutoff log10LR thresholds. Forensic Sci Int Genet 2021; 57:102634. [PMID: 34871915 DOI: 10.1016/j.fsigen.2021.102634] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/16/2021] [Revised: 10/13/2021] [Accepted: 11/15/2021] [Indexed: 11/30/2022]
Abstract
The identification of human remains belonging to missing persons is one of the main challenges for forensic genetics. Although other means of identification can be applied to missing person investigations, DNA is often extremely valuable to further support or refute potential associations. When reference DNA samples cannot be collected from personal items belonging to a missing person, a direct DNA identification cannot be carried out. However, identifications can be made indirectly using DNA from the missing person's relatives. The ranking of likelihood ratio (LR) values, which measure the fit of a missing person for any given pedigree, is often the first step in selecting candidates in a DNA database. Although implementing DNA kinship matching in a national environment is feasible, many challenges need to be resolved before applying this method to an international configuration. In this study, we present an innovative and intuitive method to perform international DNA kinship matching and facilitate the comparison of DNA profiles when the ancestry is unknown or unsure and/or when different marker sets are used. This straightforward method, which is based on calculations performed with the DNA matching software BONAPARTE, Worldwide allele frequencies and tailored cutoff log10LR thresholds, allows for the classification of potential candidates according to the strength of the DNA evidence and the predicted proportion of adventitious matches. This is a powerful method for streamlining the decision-making process in missing person investigations and DVI processes, especially when there are low numbers of overlapping typed STRs. Intuitive interpretation tables and a decision tree will help strengthen international data comparison for the identification of reported missing individuals discovered outside their national borders.
Affiliation(s)
- François-Xavier Laurent
- International Criminal Police Organization - INTERPOL, DNA Unit, 200 quai Charles de Gaulle, 69006 Lyon, France.
| | - Andrea Fischer
- International Criminal Police Organization - INTERPOL, DNA Unit, 200 quai Charles de Gaulle, 69006 Lyon, France; Landeskriminalamt Baden-Württemberg, Taubenheimstr. 85, 70372 Stuttgart, Germany
| | - Robert F Oldt
- School of Mathematical and Natural Sciences, Arizona State University, Phoenix, AZ 85004, USA
| | - Sree Kanthaswamy
- School of Mathematical and Natural Sciences, Arizona State University, Phoenix, AZ 85004, USA
| | - John S Buckleton
- University of Auckland, Department of Statistics, Private Bag, 92019 Auckland, New Zealand
| | - Susan Hitchin
- International Criminal Police Organization - INTERPOL, DNA Unit, 200 quai Charles de Gaulle, 69006 Lyon, France.
| |
|
11
|
Vai S, Diroma MA, Cannariato C, Budnik A, Lari M, Caramelli D, Pilli E. How a Paleogenomic Approach Can Provide Details on Bioarchaeological Reconstruction: A Case Study from the Globular Amphorae Culture. Genes (Basel) 2021; 12:genes12060910. [PMID: 34208224 PMCID: PMC8230892 DOI: 10.3390/genes12060910] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/28/2021] [Revised: 06/02/2021] [Accepted: 06/08/2021] [Indexed: 12/14/2022] Open
Abstract
Ancient human remains have the potential to explain a great deal about the prehistory of humankind. Due to recent technological and bioinformatics advances, their study, at the palaeogenomic level, can provide important information about population dynamics, culture changes, and the lifestyles of our ancestors. In this study, mitochondrial and nuclear genome data obtained from human bone remains associated with the Neolithic Globular Amphorae culture, which were recovered in the Megalithic barrow of Kierzkowo (Poland), were reanalysed to gain insight into the social organisation and use of the archaeological site and to provide information at the individual level. We were able to successfully estimate the minimum number of individuals, sex, kin relationships, and phenotypic traits of the buried individuals, despite the low level of preservation of the bone samples and the intricate taphonomic conditions. In addition, the evaluation of damage patterns allowed us to highlight the presence of “intruders”—that is, of more recent skeletal remains that did not belong to the original burial. Due to its characteristics, the study of the Kierzkowo barrow represented a challenge for the reconstruction of the biological profile of the human community who exploited it and an excellent example of the contribution that ancient genomic analysis can provide to archaeological reconstruction.
Affiliation(s)
- Stefania Vai
- Department of Biology, University of Florence, 50122 Florence, Italy; (M.A.D.); (C.C.); (M.L.); (D.C.); (E.P.)
- Correspondence:
| | - Maria Angela Diroma
- Department of Biology, University of Florence, 50122 Florence, Italy; (M.A.D.); (C.C.); (M.L.); (D.C.); (E.P.)
| | - Costanza Cannariato
- Department of Biology, University of Florence, 50122 Florence, Italy; (M.A.D.); (C.C.); (M.L.); (D.C.); (E.P.)
| | - Alicja Budnik
- Department of Human Biology, Institute of Biological Sciences, Cardinal Stefan Wyszyński University, 01-938 Warsaw, Poland;
| | - Martina Lari
- Department of Biology, University of Florence, 50122 Florence, Italy; (M.A.D.); (C.C.); (M.L.); (D.C.); (E.P.)
| | - David Caramelli
- Department of Biology, University of Florence, 50122 Florence, Italy; (M.A.D.); (C.C.); (M.L.); (D.C.); (E.P.)
| | - Elena Pilli
- Department of Biology, University of Florence, 50122 Florence, Italy; (M.A.D.); (C.C.); (M.L.); (D.C.); (E.P.)
| |
|
12
|
Nøhr AK, Hanghøj K, Erill GG, Li Z, Moltke I, Albrechtsen A. NGSremix: A software tool for estimating pairwise relatedness between admixed individuals from next-generation sequencing data. G3-GENES GENOMES GENETICS 2021; 11:6279082. [PMID: 34015083 PMCID: PMC8496226 DOI: 10.1093/g3journal/jkab174] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/01/2021] [Accepted: 05/03/2021] [Indexed: 12/04/2022]
Abstract
Estimation of relatedness between pairs of individuals is important in many genetic research areas. When estimating relatedness, it is important to account for admixture if this is present. However, the methods that can account for admixture are all based on genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data from which genotypes are called with high uncertainty. Here, we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes the uncertainty of the genotypes into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below we show that our method works well and clearly outperforms all the commonly used state-of-the-art relatedness estimation methods PLINK, KING, relateAdmix, and ngsRelate that all perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented in C/C++ in a multi-threaded software and is freely available on Github https://github.com/KHanghoj/NGSremix.
Affiliation(s)
- Anne Krogh Nøhr
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark; H. Lundbeck A/S, 2500 Valby, Denmark
| | - Kristian Hanghøj
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Genis Garcia Erill
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Zilong Li
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Ida Moltke
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Anders Albrechtsen
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
| |
|
13
|
DeVogel N, Auer PL, Manansala R, Rau A, Wang T. A unified linear mixed model for familial relatedness and population structure in genetic association studies. Genet Epidemiol 2020; 45:305-315. [PMID: 33175443 DOI: 10.1002/gepi.22371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2020] [Revised: 09/14/2020] [Accepted: 10/20/2020] [Indexed: 11/10/2022]
Abstract
Familial relatedness (FR) and population structure (PS) are two major sources for genetic correlation. In the human population, both FR and PS can further break down into additive and dominant components to account for potential additive and dominant genetic effects. In this study, besides the classical additive genomic relationship matrix, a dominant genomic relationship matrix is introduced. A link between the additive/dominant genomic relationship matrices and the coancestry (or kinship)/double coancestry coefficients is also established. In addition, a way to separate the FR and PS correlations based on the estimates of coancestry and double coancestry coefficients from the genomic relationship matrices is proposed. A unified linear mixed model is also developed, which can account for both the additive and dominance effects of FR and PS correlations as well as their possible random interactions. Finally, this unified linear mixed model is applied to analyze two study cohorts from UK Biobank.
Affiliation(s)
- Nicholas DeVogel
- Division of Biostatistics, Institute for Health and Equity, Milwaukee, Wisconsin, USA
| | - Paul L Auer
- Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA
| | - Regina Manansala
- Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA
| | - Andrea Rau
- Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA; INRAE, AgroParisTech, GABI, Université Paris-Saclay, Jouy-en-Josas, France
| | - Tao Wang
- Division of Biostatistics, Institute for Health and Equity, Milwaukee, Wisconsin, USA
| |
|
14
|
Kumar S, Yadav N, Pandey S, Muthane UB, Govindappa ST, Abbas MM, Behari M, Thelma BK. Novel and reported variants in Parkinson's disease genes confer high disease burden among Indians. Parkinsonism Relat Disord 2020; 78:46-52. [PMID: 32707456 DOI: 10.1016/j.parkreldis.2020.07.014] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 01/31/2020] [Revised: 06/24/2020] [Accepted: 07/13/2020] [Indexed: 12/20/2022]
Abstract
BACKGROUND Genetic heterogeneity in Parkinson's disease (PD) has been unambiguously reported across different populations. Assuming a higher genetic load, we tested variant burden in PD genes in an early-onset PD cohort from India. METHODS Whole exome sequencing was performed in 250 PD patients recruited following MDS-UPDRS criteria. The number of rare variants in the 20 known PD genes per exome was used to calculate average rare variant burden, with the 616 non-PD exomes available in-house as a comparison group. SKAT-O test was used for gene level analysis. RESULTS 80 patients harboured rare variants in 20 PD genes, of which six had known pathogenic variants accounting for 2.4% of the cohort. Of 80 patients, 12 had homozygous and nine had likely compound heterozygous variants in recessive PD genes and 59 had heterozygous variants in only dominant PD genes. Of the 16 novel variants of as yet unknown significance identified, four homozygous across ATP13A2, PRKN, SYNJ1 and PARK7; and 12 heterozygous among LRRK2, VPS35, EIF4G1 and CHCHD2 were observed. SKAT-O test suggested a higher burden in GBA (unadjusted p = 0.002). Aggregate rare variant analysis including 75 more individuals with only heterozygous variants in recessive PD genes (excluding GBA), with an average of 0.85 protein-altering rare variants per PD patient exome versus 0.51 in the non-PD group, revealed a significant enrichment (p < 0.0001). CONCLUSION This first study in an early-onset PD cohort among Indians identified 16 novel variants in known genes and also provides evidence for a high genetic burden in this ethnically distinct population.
Affiliation(s)
- Sumeet Kumar
- Department of Genetics, University of Delhi South Campus, New Delhi, 110021, India
| | - Navneesh Yadav
- Department of Genetics, University of Delhi South Campus, New Delhi, 110021, India
| | - Sanjay Pandey
- Govind Ballabh Pant Postgraduate Institute of Medical Education and Research, New Delhi, India
| | - Uday B Muthane
- Parkinson's and Aging Research Foundation, Bengaluru, India
| | | | - Masoom M Abbas
- Parkinson's and Aging Research Foundation, Bengaluru, India
| | - Madhuri Behari
- All India Institute of Medical Sciences, New Delhi, India
| | - B K Thelma
- Department of Genetics, University of Delhi South Campus, New Delhi, 110021, India.
| |
|
15
|
Dou J, Wu D, Ding L, Wang K, Jiang M, Chai X, Reilly DF, Tai ES, Liu J, Sim X, Cheng S, Wang C. Using off-target data from whole-exome sequencing to improve genotyping accuracy, association analysis and polygenic risk prediction. Brief Bioinform 2020; 22:5857014. [PMID: 32591784 DOI: 10.1093/bib/bbaa084] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/17/2020] [Revised: 04/09/2020] [Accepted: 04/21/2020] [Indexed: 12/12/2022] Open
Abstract
Whole-exome sequencing (WES) has been widely used to study the role of protein-coding variants in genetic diseases. Non-coding regions, typically covered by sparse off-target data, are often discarded by conventional WES analyses. Here, we develop a genotype calling pipeline named WEScall to analyse both target and off-target data. We leverage linkage disequilibrium shared within study samples and from an external reference panel to improve genotyping accuracy. In an application to WES of 2527 Chinese and Malays, WEScall can reduce the genotype discordance rate from 0.26% (SE = 6.4 × 10⁻⁶) to 0.08% (SE = 3.6 × 10⁻⁶) across 1.1 million single nucleotide polymorphisms (SNPs) in the deeply sequenced target regions. Furthermore, we obtain genotypes at 0.70% (SE = 3.0 × 10⁻⁶) discordance rate across 5.2 million off-target SNPs, which had ~1.2× mean sequencing depth. Using this dataset, we perform genome-wide association studies of 10 metabolic traits. Despite our small sample size, we identify 10 loci at genome-wide significance (P < 5 × 10⁻⁸), including eight well-established loci. The two novel loci, both associated with glycated haemoglobin levels, are GPATCH8-SLC4A1 (rs369762319, P = 2.56 × 10⁻¹²) and ROR2 (rs1201042, P = 3.24 × 10⁻⁸). Finally, using summary statistics from UK Biobank and Biobank Japan, we show that polygenic risk prediction can be significantly improved for six out of nine traits by incorporating off-target data (P < 0.01). These results demonstrate WEScall as a useful tool to facilitate WES studies with decent amounts of off-target data.
Affiliation(s)
- Jinzhuang Dou
- School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Degang Wu
- School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Lin Ding
- School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Kai Wang
- School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Minghui Jiang
- School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | | | | | - E Shyong Tai
- Saw Swee Hock School of Public Health, Duke-NUS Medical School, and Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Jianjun Liu
- Genome Institute of Singapore and a professor at Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Xueling Sim
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
| | - Shanshan Cheng
- Ministry of Education Key Laboratory for Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Chaolong Wang
- Ministry of Education Key Laboratory for Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| |
|
16
|
Leong A, Lim VJY, Wang C, Chai JF, Dorajoo R, Heng CK, van Dam RM, Koh WP, Yuan JM, Jonas JB, Wang YX, Wei WB, Liu J, Reilly DF, Wong TY, Cheng CY, Sim X. Association of G6PD variants with hemoglobin A1c and impact on diabetes diagnosis in East Asian individuals. BMJ Open Diabetes Res Care 2020; 8:8/1/e001091. [PMID: 32209585 PMCID: PMC7103857 DOI: 10.1136/bmjdrc-2019-001091] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/02/2019] [Revised: 01/20/2020] [Accepted: 02/14/2020] [Indexed: 12/13/2022] Open
Abstract
OBJECTIVE Hemoglobin A1c (HbA1c) accuracy is important for diabetes diagnosis and estimation of overall glycemia. The G6PD-Asahi variant which causes glucose-6-phosphate dehydrogenase (G6PD) deficiency has been shown to lower HbA1c independently of glycemia in African ancestry populations. As different G6PD variants occur in Asian ancestry, we sought to identify Asian-specific G6PD variants associated with HbA1c. RESEARCH DESIGN AND METHODS In eight Asian population-based cohorts, we performed imputation on the X chromosome using the 1000 Genomes reference panel and tested for association with HbA1c (10 005 East Asians and 2051 South Asians). Results were meta-analyzed across studies. We compared the proportion of individuals classified as having diabetes/pre-diabetes by fasting glucose ≥100 mg/dL or HbA1c ≥5.7% units among carriers and non-carriers of HbA1c-associated variants. RESULTS The strongest association was a missense variant (G6PD-Canton, rs72554665, minor allele frequency=2.2%, effect in men=-0.76% unit, 95% CI -0.88 to -0.64, p=1.25×10⁻²⁷, n=2844). Conditional analyses identified a secondary distinct signal, missense variant (G6PD-Kaiping, rs72554664, minor allele frequency=1.6%, effect in men=-1.12% unit, 95% CI -1.32 to -0.92, p=3.12×10⁻¹⁵, p conditional on Canton=7.57×10⁻¹¹). Adjusting for glucose did not attenuate their effects. The proportion of individuals with fasting glucose ≥100 mg/dL did not differ by carrier status of G6PD-Canton (p=0.21), whereas the proportion of individuals with HbA1c ≥5.7% units was lower in carriers (5%) compared with non-carriers of G6PD-Canton (30%, p=0.03). CONCLUSIONS We identified two G6PD variants in East Asian men associated with non-glycemic lowering of HbA1c. Carriers of these variants are more likely to be underdiagnosed for diabetes or pre-diabetes than non-carriers if screened by HbA1c without confirmation by direct glucose measurements.
Affiliation(s)
- Aaron Leong
- Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
| | - Victor Jun Yu Lim
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
| | - Chaolong Wang
- Department of Epidemiology and Biostatistics, Key Laboratory for Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore
| | - Jin-Fang Chai
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
| | - Rajkumar Dorajoo
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore
| | - Chew-Kiat Heng
- Department of Paediatrics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Khoo Teck Puat-National University Children's Medical Institute, National University Health System, Singapore
| | - Rob M van Dam
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
| | - Woon-Puay Koh
- Health Services and Systems Research, Duke NUS Medical School, Singapore
| | - Jian-Min Yuan
- Division of Cancer Control and Population Sciences, UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Department of Epidemiology, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
| | - Jost B Jonas
- Department of Ophthalmology, Medical Faculty Mannheim, University of Heidelberg, Heidelberg, Baden-Württemberg, Germany
- Ophthalmology and Visual Sciences Key Laboratory, Beijing Institute of Ophthalmology, Beijing Tongren Hospital, Capital Medical University, Beijing, China
| | - Ya Xing Wang
- Ophthalmology and Visual Sciences Key Laboratory, Beijing Institute of Ophthalmology, Beijing Tongren Hospital, Capital Medical University, Beijing, China
| | - Wen-Bin Wei
- Beijing Key Laboratory of Intraocular Tumor Diagnosis and Treatment, Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, Beijing, China
| | - Jianjun Liu
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Dermot F Reilly
- Genetics, Merck Sharp and Dohme IA, Kenilworth, New Jersey, USA
| | - Tien-Yin Wong
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore
- Ophthalmology and Visual Sciences Academic Clinical Program (Eye ACP), Duke-NUS Medical School, Singapore
| | - Ching-Yu Cheng
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore
- Ophthalmology and Visual Sciences Academic Clinical Program (Eye ACP), Duke-NUS Medical School, Singapore
| | - Xueling Sim
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
| |
|
17
|
Waples RK, Albrechtsen A, Moltke I. Allele frequency-free inference of close familial relationships from genotypes or low-depth sequencing data. Mol Ecol 2019; 28:35-48. [PMID: 30462358 PMCID: PMC6850436 DOI: 10.1111/mec.14954] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 03/21/2018] [Accepted: 10/12/2018] [Indexed: 01/03/2023]
Abstract
Knowledge of how individuals are related is important in many areas of research, and numerous methods for inferring pairwise relatedness from genetic data have been developed. However, the majority of these methods were not developed for situations where data are limited. Specifically, most methods rely on the availability of population allele frequencies, the relative genomic position of variants and accurate genotype data. But in studies of non‐model organisms or ancient samples, such data are not always available. Motivated by this, we present a new method for pairwise relatedness inference, which requires neither allele frequency information nor information on genomic position. Furthermore, it can be applied not only to accurate genotype data but also to low‐depth sequencing data from which genotypes cannot be accurately called. We evaluate it using data from a range of human populations and show that it can be used to infer close familial relationships with a similar accuracy as a widely used method that relies on population allele frequencies. Additionally, we show that our method is robust to SNP ascertainment and applicable to low‐depth sequencing data generated using different strategies, including resequencing and RADseq, which is important for application to a diverse range of populations and species.
Affiliation(s)
- Ryan K Waples
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
| | - Anders Albrechtsen
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
| | - Ida Moltke
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
| |
|
18
|
Large-Scale Whole-Genome Sequencing of Three Diverse Asian Populations in Singapore. Cell 2019; 179:736-749.e15. [DOI: 10.1016/j.cell.2019.09.019] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Received: 01/05/2019] [Revised: 06/24/2019] [Accepted: 09/19/2019] [Indexed: 12/19/2022]
|
19
|
Tan ALM, Langley SR, Tan CF, Chai JF, Khoo CM, Leow MKS, Khoo EYH, Moreno-Moral A, Pravenec M, Rotival M, Sadananthan SA, Velan SS, Venkataraman K, Chong YS, Lee YS, Sim X, Stunkel W, Liu MH, Tai ES, Petretto E. Ethnicity-Specific Skeletal Muscle Transcriptional Signatures and Their Relevance to Insulin Resistance in Singapore. J Clin Endocrinol Metab 2019; 104:465-486. [PMID: 30137523 DOI: 10.1210/jc.2018-00309] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/07/2018] [Accepted: 08/14/2018] [Indexed: 11/19/2022]
Abstract
CONTEXT Insulin resistance (IR) and obesity differ among ethnic groups in Singapore, with the Malays more obese yet less IR than Asian-Indians. However, the molecular basis underlying these differences is not clear. OBJECTIVE As the skeletal muscle (SM) is metabolically relevant to IR, we investigated molecular pathways in SM that are associated with ethnic differences in IR, obesity, and related traits. DESIGN, SETTING, AND MAIN OUTCOME MEASURES We integrated transcriptomic, genomic, and phenotypic analyses in 156 healthy subjects representing three major ethnicities in the Singapore Adult Metabolism Study. PATIENTS This study contains Chinese (n = 63), Malay (n = 51), and Asian-Indian (n = 42) men, aged 21 to 40 years, without systemic diseases. RESULTS We found remarkable diversity in the SM transcriptome among the three ethnicities, with >8000 differentially expressed genes (40% of all genes expressed in SM). Comparison with blood transcriptome from a separate Singaporean cohort showed that >95% of SM expression differences among ethnicities were unique to SM. We identified a network of 46 genes that were specifically downregulated in Malays, suggesting dysregulation of components of cellular respiration in SM of Malay individuals. We also report 28 differentially expressed gene clusters, four of which were also enriched for genes that were found in genome-wide association studies of metabolic traits and disease and correlated with variation in IR, obesity, and related traits. CONCLUSION We identified extensive gene-expression changes in SM among the three Singaporean ethnicities and report specific genes and molecular pathways that might underpin and explain the differences in IR among these ethnic groups.
Affiliation(s)
- Amelia Li Min Tan
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Duke-National University of Singapore Medical School, Singapore
| | - Sarah R Langley
- Duke-National University of Singapore Medical School, Singapore
- National Heart Centre Singapore, Singapore
| | - Chee Fan Tan
- Nanyang Institute of Technology in Health and Medicine, Nanyang Technological University, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore
| | - Jin Fang Chai
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
| | - Chin Meng Khoo
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Duke-National University of Singapore Medical School, Singapore
- Division of Endocrinology, Department of Medicine, National University Health System, Singapore
| | - Melvin Khee-Shing Leow
- Duke-National University of Singapore Medical School, Singapore
- Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research (A*STAR), Singapore
- Department of Endocrinology, Tan Tock Seng Hospital, Singapore
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
| | - Eric Yin Hao Khoo
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Division of Endocrinology, Department of Medicine, National University Health System, Singapore
| | | | - Michal Pravenec
- Institute Of Physiology, Czech Academy Of Sciences, Prague, Czech Republic
| | - Maxime Rotival
- Unit of Human Evolutionary Genetics, Institut Pasteur, Paris, France
| | - Suresh Anand Sadananthan
- Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research (A*STAR), Singapore
| | - S Sendhil Velan
- Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research (A*STAR), Singapore
- Singapore Bioimaging Consortium, Agency for Science, Technology and Research (A*STAR), Singapore
| | - Kavita Venkataraman
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
| | - Yap Seng Chong
- Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research (A*STAR), Singapore
- Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Yung Seng Lee
- Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research (A*STAR), Singapore
- Department of Paediatrics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Division of Paediatrics Endocrinology, Khoo Teck Puat-National University Children's Medical Institute, National University Hospital, National University Health System, Singapore
| | - Xueling Sim
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
| | - Walter Stunkel
- Experimental Biotherapeutics Centre, Agency for Science, Technology and Research (A*STAR), Singapore
| | - Mei Hui Liu
- Department of Chemistry, Food Science & Technology Programme, National University of Singapore, Singapore
| | - E Shyong Tai
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Duke-National University of Singapore Medical School, Singapore
- Division of Endocrinology, Department of Medicine, National University Health System, Singapore
| | - Enrico Petretto
- Duke-National University of Singapore Medical School, Singapore
| |
|
20
|
Kim J, Edge MD, Algee-Hewitt BFB, Li JZ, Rosenberg NA. Statistical Detection of Relatives Typed with Disjoint Forensic and Biomedical Loci. Cell 2018; 175:848-858.e6. [PMID: 30318150 DOI: 10.1016/j.cell.2018.09.008] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 06/05/2018] [Revised: 08/10/2018] [Accepted: 09/04/2018] [Indexed: 10/28/2022]
Abstract
In familial searching in forensic genetics, a query DNA profile is tested against a database to determine whether it represents a relative of a database entrant. We examine the potential for using linkage disequilibrium to identify pairs of profiles as belonging to relatives when the query and database rely on nonoverlapping genetic markers. Considering data on individuals genotyped with both microsatellites used in forensic applications and genome-wide SNPs, we find that ∼30%-32% of parent-offspring pairs and ∼35%-36% of sib pairs can be identified from the SNPs of one member of the pair and the microsatellites of the other. The method suggests the possibility of performing familial searches of microsatellite databases using query SNP profiles, or vice versa. It also reveals that privacy concerns arising from computations across multiple databases that share no genetic markers in common entail risks, not only for database entrants, but for their close relatives as well.
Affiliation(s)
- Jaehee Kim
- Department of Biology, Stanford University, Stanford, CA 94305, USA
| | - Michael D Edge
- Department of Evolution and Ecology, University of California, Davis, Davis, CA 95616, USA
| | | | - Jun Z Li
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305, USA.
| |
|