1
Dowell JA, Bowsher AW, Jamshad A, Shah R, Burke JM, Donovan LA, Mason CM. Historic breeding practices contribute to germplasm divergence in leaf specialized metabolism and ecophysiology in cultivated sunflower (Helianthus annuus). American Journal of Botany 2024:e16420. [PMID: 39483110 DOI: 10.1002/ajb2.16420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 07/09/2024] [Accepted: 07/09/2024] [Indexed: 11/03/2024]
Abstract
PREMISE The use of hybrid breeding systems to increase crop yields has been the cornerstone of modern agriculture and is exemplified in the breeding and improvement of cultivated sunflower (Helianthus annuus). However, it is poorly understood what effect supporting separate breeding pools in such systems, combined with continued selection for yield, may have on leaf ecophysiology and specialized metabolite variation. METHODS We analyzed 288 lines of cultivated H. annuus to examine the genomic basis of several specialized metabolites and agronomically important traits across major heterotic groups. RESULTS Heterotic group identity supports phenotypic divergences between fertility restoring and cytoplasmic male-sterility maintainer lines in leaf ecophysiology and specialized metabolism. However, the divergence is not associated with physical linkage to nuclear genes that support current hybrid breeding practices in cultivated H. annuus. Additionally, we identified four genomic regions associated with leaf ecophysiology and specialized metabolism that colocalize with previously identified QTLs for quantitative self-compatibility traits and with S-protein homolog (SPH) proteins, a recently discovered family of proteins associated with self-incompatibility and self/nonself recognition in Papaver rhoeas (common poppy) with suggested conserved downstream mechanisms among eudicots. CONCLUSIONS Further work is necessary to confirm the self-incompatibility mechanisms in cultivated H. annuus and their relationship to the integrative and polygenic architecture of leaf ecophysiology and specialized metabolism in cultivated sunflower. However, because self-compatibility is a derived quantitative trait in cultivated H. annuus, trait linkage to divergent phenotypic traits may have partially arisen as a potential unintended consequence of historical breeding practices and selection for yield.
Affiliation(s)
- Jordan A Dowell
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, 70802, LA, USA
  - Department of Biology, University of Central Florida, Orlando, 32816, FL, USA
- Alan W Bowsher
  - Department of Plant Biology, University of Georgia, Athens, 30602, GA, USA
- Amna Jamshad
  - Department of Plant Biology, University of Georgia, Athens, 30602, GA, USA
- Rahul Shah
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, 37232, TN, USA
- John M Burke
  - Department of Plant Biology, University of Georgia, Athens, 30602, GA, USA
  - The Plant Center, University of Georgia, Athens, 30602, GA, USA
- Lisa A Donovan
  - Department of Plant Biology, University of Georgia, Athens, 30602, GA, USA
- Chase M Mason
  - Department of Biology, University of Central Florida, Orlando, 32816, FL, USA
  - Department of Plant Biology, University of Georgia, Athens, 30602, GA, USA
  - Department of Biology, University of British Columbia Okanagan, Kelowna, B.C. V1V 1V7, Canada
2
Zhao Z, Yang X, Dorn S, Miao J, Barcellos SH, Fletcher JM, Lu Q. Controlling for polygenic genetic confounding in epidemiologic association studies. Proc Natl Acad Sci U S A 2024; 121:e2408715121. [PMID: 39432782 PMCID: PMC11536117 DOI: 10.1073/pnas.2408715121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Accepted: 09/20/2024] [Indexed: 10/23/2024]
Abstract
Epidemiologic associations estimated from observational data are often confounded by genetics due to pervasive pleiotropy among complex traits. Many studies either neglect genetic confounding altogether or rely on adjusting for polygenic scores (PGS) in regression analysis. In this study, we unveil that the commonly employed PGS approach is inadequate for removing genetic confounding due to measurement error and model misspecification. To tackle this challenge, we introduce PENGUIN, a principled framework for polygenic genetic confounding control based on variance component estimation. In addition, we present extensions of this approach that can estimate genetically unconfounded associations using GWAS summary statistics alone as input and between multiple generations of study samples. Through simulations, we demonstrate superior statistical properties of PENGUIN compared to the existing approaches. Applying our method to multiple population cohorts, we reveal and remove substantial genetic confounding in the associations of educational attainment with various complex traits and between parental and offspring education. Our results show that PENGUIN is an effective solution for genetic confounding control in observational data analysis with broad applications in future epidemiologic association studies.
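The measurement-error problem described in this abstract can be illustrated with a small simulation. The sketch below is not PENGUIN and not the authors' code; it is a toy model under stated assumptions: a shared genetic component `g` drives both an exposure `x` and an outcome `y`, the true effect of `x` on `y` is zero, and the polygenic score `pgs` equals `g` plus independent noise. All names, variances, and the sample size are hypothetical choices made for illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def slope(x, y, c=None):
    """OLS coefficient of x when regressing y on x, optionally adjusting for one covariate c."""
    if c is None:
        return cov(x, y) / cov(x, x)
    num = cov(x, y) * cov(c, c) - cov(x, c) * cov(c, y)
    den = cov(x, x) * cov(c, c) - cov(x, c) ** 2
    return num / den


def simulate(n=20000, seed=1):
    """Toy model (assumption): x and y share g; x has no causal effect on y."""
    rng = random.Random(seed)
    g = [rng.gauss(0, 1) for _ in range(n)]        # true genetic component
    x = [gi + rng.gauss(0, 1) for gi in g]         # exposure
    y = [gi + rng.gauss(0, 1) for gi in g]         # outcome, independent of x given g
    pgs = [gi + rng.gauss(0, 1) for gi in g]       # noisy polygenic score
    return {
        "unadjusted": slope(x, y),
        "adjusted_noisy_pgs": slope(x, y, pgs),
        "adjusted_true_g": slope(x, y, g),
    }


result = simulate()
for name, estimate in result.items():
    print(f"{name}: {estimate:.3f}")
```

Under these assumptions the population value of the unadjusted slope is 1/2, adjusting for the noisy score still leaves 1/3, and only adjusting for the true genetic component brings the estimate to zero, which is the gap a variance-component approach is meant to close.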
Affiliation(s)
- Zijie Zhao
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706
- Xiaoyu Yang
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706
- Stephen Dorn
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706
- Jiacheng Miao
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706
- Silvia H. Barcellos
  - Center for Economic and Social Research, University of Southern California, Los Angeles, CA 90089
  - Department of Economics, University of Southern California, Los Angeles, CA 90089
- Jason M. Fletcher
  - La Follette School of Public Affairs, University of Wisconsin-Madison, Madison, WI 53706
- Qiongshi Lu
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706
3
Zhao Z, Gruenloh T, Yan M, Wu Y, Sun Z, Miao J, Wu Y, Song J, Lu Q. Optimizing and benchmarking polygenic risk scores with GWAS summary statistics. Genome Biol 2024; 25:260. [PMID: 39379999 PMCID: PMC11462675 DOI: 10.1186/s13059-024-03400-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/11/2023] [Accepted: 09/23/2024] [Indexed: 10/10/2024]
Abstract
BACKGROUND Polygenic risk score (PRS) is a major research topic in human genetics. However, a significant gap exists between PRS methodology and applications in practice due to often unavailable individual-level data for various PRS tasks including model fine-tuning, benchmarking, and ensemble learning. RESULTS We introduce an innovative statistical framework to optimize and benchmark PRS models using summary statistics of genome-wide association studies. This framework builds upon our previous work and can fine-tune virtually all existing PRS models while accounting for linkage disequilibrium. In addition, we provide an ensemble learning strategy named PUMAS-ensemble to combine multiple PRS models into an ensemble score without requiring external data for model fitting. Through extensive simulations and analysis of many complex traits in the UK Biobank, we demonstrate that this approach closely approximates gold-standard analytical strategies based on external validation, and substantially outperforms state-of-the-art PRS methods. CONCLUSIONS Our method is a powerful and general modeling technique that can continue to combine the best-performing PRS methods out there through ensemble learning and could become an integral component for all future PRS applications.
Affiliation(s)
- Zijie Zhao
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Tim Gruenloh
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Meiyi Yan
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA
- Yixuan Wu
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Zhongxuan Sun
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Jiacheng Miao
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Yuchang Wu
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
  - Center for Demography of Health and Aging, University of Wisconsin-Madison, Madison, WI, USA
- Jie Song
  - Center for Demography of Health and Aging, University of Wisconsin-Madison, Madison, WI, USA
- Qiongshi Lu
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA
  - Center for Demography of Health and Aging, University of Wisconsin-Madison, Madison, WI, USA
4
Karasov TL, Neumann M, Leventhal L, Symeonidi E, Shirsekar G, Hawks A, Monroe G, Exposito-Alonso M, Bergelson J, Weigel D, Schwab R. Continental-scale associations of Arabidopsis thaliana phyllosphere members with host genotype and drought. Nat Microbiol 2024; 9:2748-2758. [PMID: 39242816 PMCID: PMC11457713 DOI: 10.1038/s41564-024-01773-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Accepted: 07/02/2024] [Indexed: 09/09/2024]
Abstract
Plants are colonized by distinct pathogenic and commensal microbiomes across different regions of the globe, but the factors driving their geographic variation are largely unknown. Here, using 16S ribosomal DNA and shotgun sequencing, we characterized the associations of the Arabidopsis thaliana leaf microbiome with host genetics and climate variables from 267 populations in the species' native range across Europe. Comparing the distribution of the 575 major bacterial amplicon variants (phylotypes), we discovered that microbiome composition in A. thaliana segregates along a latitudinal gradient. The latitudinal clines in microbiome composition are predicted by metrics of drought, but also by the spatial genetics of the host. To validate the relative effects of drought and host genotype we conducted a common garden field study, finding 10% of the core bacteria to be affected directly by drought and 20% to be affected by host genetic associations with drought. These data provide a valuable resource for the plant microbiome field, with the identified associations suggesting that drought can directly and indirectly shape genetic variation in A. thaliana via the leaf microbiome.
Affiliation(s)
- Talia L Karasov
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, USA
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Manuela Neumann
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
  - Robert Bosch GmbH, Renningen, Germany
- Laura Leventhal
  - Department of Biology, Stanford University, Stanford, CA, USA
  - Department of Plant Biology, Carnegie Institution for Plant Science, Stanford, CA, USA
- Efthymia Symeonidi
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, USA
- Gautam Shirsekar
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
  - Department of Entomology and Plant Pathology, Institute of Agriculture, University of Tennessee, Knoxville, TN, USA
- Aubrey Hawks
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, USA
- Grey Monroe
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
  - Department of Plant Sciences, University of California Davis, Davis, CA, USA
- Moisés Exposito-Alonso
  - Department of Biology, Stanford University, Stanford, CA, USA
  - Department of Plant Biology, Carnegie Institution for Plant Science, Stanford, CA, USA
  - Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
  - Howard Hughes Medical Institute, University of California Berkeley, Berkeley, CA, USA
- Joy Bergelson
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Detlef Weigel
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Rebecca Schwab
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
5
Xiang X, Liu S, He Y, Li D, Ofori AD, Ghani Kandhro A, Zheng T, Yi X, Li P, Huang F, Zheng A. Genome wide association study reveals new genes for resistance to striped stem borer in rice (Oryza sativa L.). Frontiers in Plant Science 2024; 15:1466857. [PMID: 39345976 PMCID: PMC11427250 DOI: 10.3389/fpls.2024.1466857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2024] [Accepted: 08/27/2024] [Indexed: 10/01/2024]
Abstract
Rice is one of the most important food crops in the world and is important for global food security. However, damage caused by striped stem borer (SSB) seriously threatens rice production and can cause significant yield losses. The development and use of resistant rice varieties or genes is currently the most effective strategy for controlling SSB. We genotyped 201 rice samples using 2,849,855 high-confidence single nucleotide polymorphisms (SNPs). We conducted a genome-wide association study (GWAS) based on observed variation data of 201 rice cultivars resistant to SSB. We obtained a quantitative trait locus (QTL), qRSSB4, that confers resistance to SSB. Through annotation and analysis of genes within the qRSSB4 locus, as well as qRT-PCR detection in resistant rice cultivars, we ultimately selected the candidate gene LOC_Os04g34140 (named OsRSSB4) for further analysis. Next, we overexpressed the candidate gene OsRSSB4 in Nipponbare through transgenic methods, resulting in OsRSSB4-overexpressing lines (OsRSSB4OE). In addition, we evaluated the insect resistance of OsRSSB4OE lines using wild type (Nipponbare) as a control. The live-plant bioassay showed that, 20 days after inoculation with SSB, the withering heart rates of the OsRSSB4OE-34 and OsRSSB4OE-39 lines were only 8.3% and 0%, with resistance levels of 1 and 0, respectively; however, the withering heart rate of the wild type reached 100%, with a resistance level of 9. The in vitro stem bioassay showed that, compared with the wild type, the average corrected mortality rate of SSB fed on the OsRSSB4OE line reached 94.3%, and the resistance reached a high level. In summary, we preliminarily confirmed that OsRSSB4 positively regulates the defense of rice against SSB. These findings reveal new SSB resistance gene resources, providing an important genetic basis for SSB resistance breeding in rice crops.
Affiliation(s)
- Xing Xiang
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, China
  - College of Agronomy, Sichuan Agricultural University, Chengdu, China
- Shuhua Liu
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, China
  - College of Agronomy, Sichuan Agricultural University, Chengdu, China
- Yuewen He
  - Guangan Vocational & Technical College, Guangan, China
- Deqiang Li
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, China
- Andrews Danso Ofori
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, China
  - College of Agronomy, Sichuan Agricultural University, Chengdu, China
- Abdul Ghani Kandhro
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, China
  - College of Agronomy, Sichuan Agricultural University, Chengdu, China
- Tengda Zheng
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, China
  - College of Agronomy, Sichuan Agricultural University, Chengdu, China
- Xiaoqun Yi
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, China
  - College of Agronomy, Sichuan Agricultural University, Chengdu, China
- Ping Li
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, China
- Fu Huang
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, China
  - College of Agronomy, Sichuan Agricultural University, Chengdu, China
- Aiping Zheng
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, China
  - College of Agronomy, Sichuan Agricultural University, Chengdu, China
6
Hu T, Parrish RL, Dai Q, Buchman AS, Tasaki S, Bennett DA, Seyfried NT, Epstein MP, Yang J. Omnibus proteome-wide association study identifies 43 risk genes for Alzheimer disease dementia. Am J Hum Genet 2024; 111:1848-1863. [PMID: 39079537 PMCID: PMC11393696 DOI: 10.1016/j.ajhg.2024.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 06/28/2024] [Accepted: 07/02/2024] [Indexed: 09/08/2024]
Abstract
Transcriptome-wide association study (TWAS) tools have been applied to conduct proteome-wide association studies (PWASs) by integrating proteomics data with genome-wide association study (GWAS) summary data. The genetic effects of PWAS-identified significant genes are potentially mediated through genetically regulated protein abundance, thus informing the underlying disease mechanisms better than GWAS loci. However, existing TWAS/PWAS tools are limited by considering only one statistical model. We propose an omnibus PWAS pipeline to account for multiple statistical models and demonstrate improved performance by simulation and application studies of Alzheimer disease (AD) dementia. We employ the Aggregated Cauchy Association Test to derive omnibus PWAS (PWAS-O) p values from PWAS p values obtained by three existing tools assuming complementary statistical models: TIGAR, PrediXcan, and FUSION. Our simulation studies demonstrated improved power, with well-calibrated type I error, for PWAS-O over all three individual tools. We applied PWAS-O to studying AD dementia with reference proteomic data profiled from dorsolateral prefrontal cortex of postmortem brains from individuals of European ancestry. We identified 43 risk genes, including 5 not identified by previous studies, which are interconnected through a protein-protein interaction network that includes the well-known AD risk genes TOMM40, APOC1, and APOC2. We also validated causal genetic effects mediated through the proteome for 27 (63%) PWAS-O risk genes, providing insights into the underlying biological mechanisms of AD dementia and highlighting promising targets for therapeutic development. PWAS-O can be easily applied to studying other complex diseases.
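The Cauchy combination step named in this abstract has a simple closed form, sketched below. The sketch assumes the standard ACAT statistic, T = sum of w_i * tan((0.5 - p_i) * pi) with weights normalized to sum to one, and a combined p value of 0.5 - arctan(T) / pi. The equal weights and the three input p values are hypothetical illustrations, not results from the paper, and the authors' pipeline may weight the tools differently.

```python
import math


def acat(p_values, weights=None):
    """Combine p values with the Cauchy combination (ACAT) statistic.

    Weights default to equal and are normalized to sum to one.
    """
    if not p_values:
        raise ValueError("at least one p value is required")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    # Each p value is mapped to a standard Cauchy variate, then averaged.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)) / total
    # The tail of the standard Cauchy distribution gives the combined p value.
    return 0.5 - math.atan(t) / math.pi


# Hypothetical per-tool p values for a single gene (illustration only).
per_tool = {"TIGAR": 0.01, "PrediXcan": 0.20, "FUSION": 0.50}
pwas_o_p = acat(list(per_tool.values()))
print(f"omnibus p value: {pwas_o_p:.4f}")
```

Because the transform is heavy-tailed, the combined value is dominated by the smallest input p value, which is why an omnibus test of this kind can retain power when only one of the underlying models fits a given gene well.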
Affiliation(s)
- Tingyang Hu
  - Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
  - Division of Biostatistics and Bioinformatics, Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
- Randy L Parrish
  - Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
  - Department of Biostatistics and Bioinformatics, Emory University School of Public Health, Atlanta, GA 30322, USA
- Qile Dai
  - Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
  - Department of Biostatistics and Bioinformatics, Emory University School of Public Health, Atlanta, GA 30322, USA
- Aron S Buchman
  - Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL 60612, USA
- Shinya Tasaki
  - Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL 60612, USA
- David A Bennett
  - Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL 60612, USA
- Nicholas T Seyfried
  - Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
- Michael P Epstein
  - Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Jingjing Yang
  - Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
7
Huang C, Cheng Y, Hu Y, Zhang X, Chen J, Zhao T, Si Z, Cao Y, Li Y, Fang L, Guan X, Zhang T. Impacts of parental genomic divergence in non-syntenic regions on cotton heterosis. J Adv Res 2024:S2090-1232(24)00331-X. [PMID: 39111623 DOI: 10.1016/j.jare.2024.08.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Revised: 08/04/2024] [Accepted: 08/04/2024] [Indexed: 08/10/2024]
Abstract
INTRODUCTION Heterosis has revolutionized crop breeding, enhancing global agricultural production. However, the mechanisms underlying heterosis remain obscure. Xiangzamian 2# (XZM2), a super hybrid upland cotton (Gossypium hirsutum L.) characterized by high-yield heterosis, has been developed and extensively planted in China. OBJECTIVES We conducted a systematic analysis of CRI12 and J8891, two parents of XZM2. We aimed to reveal the precise genetic information and the role of non-syntenic divergence in shaping heterosis, laying a foundation for advancing understanding of heterosis. METHODS We de novo assembled high-quality genomes of CRI12 and J8891, and further uncovered abundant genetic variations and non-syntenic regions between the parents. Whole-genome comparison, association analysis, transcriptomic analysis and relative identity-by-descent (rIBD) estimation were conducted to identify structural variations (SVs) and introgressions within non-syntenic blocks and to analyze their impacts on promoting heterosis. RESULTS Parental genetic divergence increased in non-syntenic regions. Furthermore, these regions, accounting for only 16.71% of the total genome, contained more loci with significantly higher heterotic effects, far exceeding the syntenic background. SVs covered 97.26% of non-syntenic sequences and caused widespread gene expression differences in these regions, driving dynamic complementation of gene expression in the hybrid. A set of SVs were responsible for trait improvement and had positive effects on heterosis, contributing larger heritability than short variations. We characterized numerous parental-specific introgressions from G. barbadense. Specifically, a functional introgression segment within non-syntenic blocks introduced an elite haplotype, which significantly increased lint yield and enhanced heterosis. 
CONCLUSION Our study clarified non-syntenic regions to harbor more loci with higher heterotic effects, revealed their importance in promoting heterosis and supported the crucial role of genetic complementation in heterosis. SVs and introgressions were identified as key factors responsible for non-syntenic divergence between the parents. They had important effects on gene expression and trait improvement, positively contributing to heterosis.
Affiliation(s)
- Chujun Huang
  - Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Yu Cheng
  - Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Yan Hu
  - Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
  - Hainan Institute of Zhejiang University, Sanya 572025, China
- Xuemei Zhang
  - Annoroad Gene Technology (Beijing) Co., Ltd., Beijing 100176, China
- Jinwen Chen
  - Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Ting Zhao
  - Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Zhanfeng Si
  - Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Yiwen Cao
  - Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
  - Hainan Institute of Zhejiang University, Sanya 572025, China
- Yiqian Li
  - Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Lei Fang
  - Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
  - Hainan Institute of Zhejiang University, Sanya 572025, China
- Xueying Guan
  - Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
  - Hainan Institute of Zhejiang University, Sanya 572025, China
- Tianzhen Zhang
  - Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
  - Hainan Institute of Zhejiang University, Sanya 572025, China
8
Qi T, Song L, Guo Y, Chen C, Yang J. From genetic associations to genes: methods, applications, and challenges. Trends Genet 2024; 40:642-667. [PMID: 38734482 DOI: 10.1016/j.tig.2024.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 04/15/2024] [Accepted: 04/16/2024] [Indexed: 05/13/2024]
Abstract
Genome-wide association studies (GWASs) have identified numerous genetic loci associated with human traits and diseases. However, pinpointing the causal genes remains a challenge, which impedes the translation of GWAS findings into biological insights and medical applications. In this review, we provide an in-depth overview of the methods and technologies used for prioritizing genes from GWAS loci, including gene-based association tests, integrative analysis of GWAS and molecular quantitative trait loci (xQTL) data, linking GWAS variants to target genes through enhancer-gene connection maps, and network-based prioritization. We also outline strategies for generating context-dependent xQTL data and their applications in gene prioritization. We further highlight the potential of gene prioritization in drug repurposing. Lastly, we discuss future challenges and opportunities in this field.
Affiliation(s)
- Ting Qi
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou 310024, China
  - School of Life Sciences, Westlake University, Hangzhou 310024, China
- Liyang Song
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou 310024, China
  - School of Life Sciences, Westlake University, Hangzhou 310024, China
- Yazhou Guo
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou 310024, China
  - School of Life Sciences, Westlake University, Hangzhou 310024, China
- Chang Chen
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou 310024, China
  - School of Life Sciences, Westlake University, Hangzhou 310024, China
- Jian Yang
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou 310024, China
  - School of Life Sciences, Westlake University, Hangzhou 310024, China
9
Santiago-Lamelas L, Dos Santos-Sobrín R, Carracedo Á, Castro-Santos P, Díaz-Peña R. Utility of polygenic risk scores to aid in the diagnosis of rheumatic diseases. Best Pract Res Clin Rheumatol 2024:101973. [PMID: 38997822 DOI: 10.1016/j.berh.2024.101973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Revised: 07/04/2024] [Accepted: 07/05/2024] [Indexed: 07/14/2024]
Abstract
Rheumatic diseases (RDs) are characterized by autoimmunity and autoinflammation and are recognized as complex due to the interplay of multiple genetic, environmental, and lifestyle factors in their pathogenesis. The rapid advancement of genome-wide association studies (GWASs) has enabled the identification of numerous single nucleotide polymorphisms (SNPs) associated with RD susceptibility. Based on these SNPs, polygenic risk scores (PRSs) have emerged as promising tools for quantifying genetic risk in this disease group. This chapter reviews the current status of PRSs in assessing the risk of RDs and discusses their potential to improve the accuracy of the diagnosis of these complex diseases through their ability to discriminate among different RDs. PRSs demonstrate a high discriminatory capacity for various RDs and show potential clinical utility. As GWASs continue to evolve, PRSs are expected to enable more precise risk stratification by integrating genetic, environmental, and lifestyle factors, thereby refining individual risk predictions and advancing disease management strategies.
Affiliation(s)
- Lucía Santiago-Lamelas
- Fundación Pública Galega de Medicina Xenómica (SERGAS), Centro Nacional de Genotipado, Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
- Raquel Dos Santos-Sobrín
- Reumatología, Hospital Clínico Universitario, Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
- Ángel Carracedo
- Fundación Pública Galega de Medicina Xenómica (SERGAS), Centro Nacional de Genotipado, Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain; Grupo de Medicina Xenómica, CIMUS, Universidade de Santiago de Compostela, Santiago de Compostela, Spain; Centre for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
- Patricia Castro-Santos
- Fundación Pública Galega de Medicina Xenómica (SERGAS), Centro Nacional de Genotipado, Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain; Faculty of Health Sciences, Universidad Autónoma de Chile, Talca, Chile.
- Roberto Díaz-Peña
- Fundación Pública Galega de Medicina Xenómica (SERGAS), Centro Nacional de Genotipado, Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain; Faculty of Health Sciences, Universidad Autónoma de Chile, Talca, Chile.
10
Wang X, Shi S, Ali Khan MY, Zhang Z, Zhang Y. Improving the accuracy of genomic prediction in dairy cattle using the biologically annotated neural networks framework. J Anim Sci Biotechnol 2024; 15:87. [PMID: 38945998 PMCID: PMC11215832 DOI: 10.1186/s40104-024-01044-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 05/05/2024] [Indexed: 07/02/2024] Open
Abstract
BACKGROUND Biologically annotated neural networks (BANNs) are feedforward Bayesian neural network models that utilize partially connected architectures based on SNP-set annotations. As an interpretable neural network, BANNs model SNP and SNP-set effects in their input and hidden layers, respectively. Furthermore, the weights and connections of the network are regarded as random variables with prior distributions reflecting the manifestation of genetic effects at various genomic scales. However, its application in genomic prediction has yet to be explored. RESULTS This study extended the BANNs framework to the area of genomic selection and explored the optimal SNP-set partitioning strategies by using dairy cattle datasets. The SNP-sets were partitioned based on two strategies: gene annotations and 100 kb windows, denoted as BANN_gene and BANN_100kb, respectively. The BANNs model was compared with GBLUP, random forest (RF), BayesB and BayesCπ through five replicates of five-fold cross-validation using genotypic and phenotypic data on milk production traits, type traits, and one health trait of 6,558, 6,210 and 5,962 Chinese Holsteins, respectively. Results showed that the BANNs framework achieves higher genomic prediction accuracy compared to GBLUP, RF and Bayesian methods. Specifically, the BANN_100kb demonstrated superior accuracy and the BANN_gene exhibited generally suboptimal accuracy compared to GBLUP, RF, BayesB and BayesCπ across all traits. The average accuracy improvements of BANN_100kb over GBLUP, RF, BayesB and BayesCπ were 4.86%, 3.95%, 3.84% and 1.92%, and the accuracy of BANN_gene was improved by 3.75%, 2.86%, 2.73% and 0.85% compared to GBLUP, RF, BayesB and BayesCπ, respectively across all seven traits. Meanwhile, both BANN_100kb and BANN_gene yielded lower overall mean square error values than GBLUP, RF and Bayesian methods.
CONCLUSION Our findings demonstrated that the BANNs framework performed better than traditional genomic prediction methods in our tested scenarios, and might serve as a promising alternative approach for genomic prediction in dairy cattle.
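The evaluation scheme mentioned in the abstract, five replicates of five-fold cross-validation, can be sketched as follows. This is only a generic illustration of how such splits might be generated; the function name, the seed, and the sample count are assumptions, and the sketch does not reproduce the authors' pipeline or any model fitting.

```python
import random


def repeated_kfold(n_samples, k=5, replicates=5, seed=0):
    """Yield (replicate, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper: each replicate reshuffles the sample indices,
    then deals them into k folds of near-equal size.
    """
    rng = random.Random(seed)
    for rep in range(replicates):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]
        for held_out in range(k):
            test_idx = folds[held_out]
            train_idx = [
                idx
                for f, fold in enumerate(folds)
                if f != held_out
                for idx in fold
            ]
            yield rep, held_out, train_idx, test_idx


# 23 is an arbitrary toy sample size, not a figure from the study.
splits = list(repeated_kfold(23))
print(len(splits))  # 5 replicates x 5 folds
```

Prediction accuracy would then be averaged over all 25 train/test splits, which reduces the dependence of the estimate on any single random partition.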
Affiliation(s)
- Xue Wang
- State Key Laboratory of Animal Biotech Breeding, National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Shaolei Shi
- State Key Laboratory of Animal Biotech Breeding, National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Md Yousuf Ali Khan
- State Key Laboratory of Animal Biotech Breeding, National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Bangladesh Livestock Research Institute, Dhaka 1341, Bangladesh
- Zhe Zhang
- Guangdong Laboratory of Lingnan Modern Agriculture, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
- Yi Zhang
- State Key Laboratory of Animal Biotech Breeding, National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China.
11
Pattillo Smith S, Darnell G, Udwin D, Stamp J, Harpak A, Ramachandran S, Crawford L. Discovering non-additive heritability using additive GWAS summary statistics. eLife 2024; 13:e90459. [PMID: 38913556 PMCID: PMC11196113 DOI: 10.7554/elife.90459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Accepted: 04/22/2024] [Indexed: 06/26/2024] Open
Abstract
LD score regression (LDSC) is a method to estimate narrow-sense heritability from genome-wide association study (GWAS) summary statistics alone, making it a fast and popular approach. In this work, we present interaction-LD score (i-LDSC) regression: an extension of the original LDSC framework that accounts for interactions between genetic variants. By studying a wide range of generative models in simulations, and by re-analyzing 25 well-studied quantitative phenotypes from 349,468 individuals in the UK Biobank and up to 159,095 individuals in BioBank Japan, we show that the inclusion of a cis-interaction score (i.e. interactions between a focal variant and proximal variants) recovers genetic variance that is not captured by LDSC. For each of the 25 traits analyzed in the UK Biobank and BioBank Japan, i-LDSC detects additional variation contributed by genetic interactions. The i-LDSC software and its application to these biobanks represent a step towards resolving further genetic contributions of sources of non-additive genetic effects to complex trait variation.
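For orientation, the standard LDSC relationship that i-LDSC extends is a linear regression of per-SNP association statistics on LD scores: under the usual model the expected chi-square statistic of SNP j is approximately N·h²·ℓ_j/M + intercept, so the slope recovers heritability. The sketch below is a deliberately simplified, unweighted version on noise-free toy numbers; the function names and all values are assumptions, and it does not implement the cis-interaction score that i-LDSC adds as a further regressor.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ldsc_heritability(chi2, ld_scores, n_samples, n_snps):
    """Estimate h2 from the slope of chi-square statistics on LD scores.

    Simplified sketch: slope = N * h2 / M, hence h2 = slope * M / N.
    Real analyses use weighted regression and jackknife standard errors.
    """
    slope, intercept = ols_slope_intercept(ld_scores, chi2)
    return slope * n_snps / n_samples, intercept


# Toy, noise-free inputs generated with slope 0.02 and intercept 1.
ld_scores = [10.0, 50.0, 100.0, 200.0]
chi2 = [1.0 + 0.02 * score for score in ld_scores]

h2, intercept = ldsc_heritability(
    chi2, ld_scores, n_samples=50_000, n_snps=1_000_000
)
print(round(h2, 3), round(intercept, 3))
```

With real summary statistics the points scatter around the line, and an intercept well above 1 is usually read as a sign of confounding rather than polygenic signal.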
Affiliation(s)
- Samuel Pattillo Smith
- Center for Computational Molecular Biology, Brown University, Providence, United States
- Department of Ecology and Evolutionary Biology, Brown University, Providence, United States
- Department of Integrative Biology, The University of Texas at Austin, Austin, United States
- Department of Population Health, The University of Texas at Austin, Austin, United States
- Gregory Darnell
- Center for Computational Molecular Biology, Brown University, Providence, United States
- Institute for Computational and Experimental Research in Mathematics, Brown University, Providence, United States
- Dana Udwin
- Department of Biostatistics, Brown University, Providence, United States
- Julian Stamp
- Center for Computational Molecular Biology, Brown University, Providence, United States
- Arbel Harpak
- Department of Integrative Biology, The University of Texas at Austin, Austin, United States
- Department of Population Health, The University of Texas at Austin, Austin, United States
- Sohini Ramachandran
- Center for Computational Molecular Biology, Brown University, Providence, United States
- Department of Ecology and Evolutionary Biology, Brown University, Providence, United States
- Data Science Institute, Brown University, Providence, United States
- Lorin Crawford
- Center for Computational Molecular Biology, Brown University, Providence, United States
- Department of Biostatistics, Brown University, Providence, United States
- Microsoft, Cambridge, United States
12
Zebardast N, Wiggs JL. How Genome-Wide Association Studies Transform Care for Patients at Risk of Glaucoma. EXPERT REVIEW OF OPHTHALMOLOGY 2024; 19:243-246. [PMID: 39464630 PMCID: PMC11507518 DOI: 10.1080/17469899.2024.2365736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 06/05/2024] [Indexed: 10/29/2024]
Affiliation(s)
- Nazlee Zebardast
- Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Janey L. Wiggs
- Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
13
Bastias CC, Estarague A, Vile D, Gaignon E, Lee CR, Exposito-Alonso M, Violle C, Vasseur F. Ecological trade-offs drive phenotypic and genetic differentiation of Arabidopsis thaliana in Europe. Nat Commun 2024; 15:5185. [PMID: 38890286 PMCID: PMC11189578 DOI: 10.1038/s41467-024-49267-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Accepted: 05/23/2024] [Indexed: 06/20/2024] Open
Abstract
Plant diversity is shaped by trade-offs between traits related to competitive ability, propagule dispersal, and stress resistance. However, we still lack a clear understanding of how these trade-offs influence species distribution and population dynamics. In Arabidopsis thaliana, recent genetic analyses revealed a group of cosmopolitan genotypes that successfully recolonized Europe from its center after the last glaciation, excluding older (relict) lineages from the distribution except for their north and south margins. Here, we tested the hypothesis that cosmopolitans expanded due to higher colonization ability, while relicts persisted at the margins due to higher tolerance to competition and/or stress. We compared the phenotypic and genetic differentiation between 71 European genotypes originating from the center, and the south and north margins. We showed that a trade-off between plant fecundity and seed mass shapes the differentiation of A. thaliana in Europe, suggesting that the success of the cosmopolitan groups could be explained by their high dispersal ability. However, at both north and south margins, we found evidence of selection for alleles conferring low dispersal but highly competitive and stress-resistance abilities. This study sheds light on the role of ecological trade-offs as evolutionary drivers of the distribution and dynamics of plant populations.
Affiliation(s)
- Cristina C Bastias
- CEFE, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France.
- Área de Ecología, Facultad de Ciencias, Universidad de Córdoba, Campus de Rabanales, Córdoba, Spain.
- Aurélien Estarague
- CEFE, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
- LEPSE, Univ Montpellier, INRAE, Institut Agro Montpellier, Montpellier, France
- Denis Vile
- LEPSE, Univ Montpellier, INRAE, Institut Agro Montpellier, Montpellier, France
- Elza Gaignon
- CEFE, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Cheng-Ruei Lee
- Institute of Ecology and Evolutionary Biology & Institute of Plant Biology, National Taiwan University, Taipei, Taiwan
- Cyrille Violle
- CEFE, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
14
Zou Y, Carbonetto P, Xie D, Wang G, Stephens M. Fast and flexible joint fine-mapping of multiple traits via the Sum of Single Effects model. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.04.14.536893. [PMID: 37425935 PMCID: PMC10327118 DOI: 10.1101/2023.04.14.536893] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/11/2023]
Abstract
We introduce mvSuSiE, a multi-trait fine-mapping method for identifying putative causal variants from genetic association data (individual-level or summary data). mvSuSiE learns patterns of shared genetic effects from data, and exploits these patterns to improve power to identify causal SNPs. Comparisons on simulated data show that mvSuSiE is competitive in speed, power and precision with existing multi-trait methods, and uniformly improves on single-trait fine-mapping (SuSiE) in each trait separately. We applied mvSuSiE to jointly fine-map 16 blood cell traits using data from the UK Biobank. By jointly analyzing the traits and modeling heterogeneous effect sharing patterns, we discovered a much larger number of causal SNPs (>3,000) compared with single-trait fine-mapping, and with narrower credible sets. mvSuSiE also more comprehensively characterized the ways in which the genetic variants affect one or more blood cell traits; 68% of causal SNPs showed significant effects in more than one blood cell type.
Affiliation(s)
- Yuxin Zou
- Department of Statistics, University of Chicago, Chicago, IL, USA
- Regeneron Genetics Center, Regeneron Pharmaceuticals, Inc., Tarrytown, NY, USA
- Peter Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Dongyue Xie
- Department of Statistics, University of Chicago, Chicago, IL, USA
- Gao Wang
- Gertrude H. Sergievsky Center, Department of Neurology, Columbia University, New York, NY, USA
- Matthew Stephens
- Department of Statistics, University of Chicago, Chicago, IL, USA
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
15
Strober BJ, Zhang MJ, Amariuta T, Rossen J, Price AL. Fine-mapping causal tissues and genes at disease-associated loci. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2023.11.01.23297909. [PMID: 37961337 PMCID: PMC10635248 DOI: 10.1101/2023.11.01.23297909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Heritable diseases often manifest in a highly tissue-specific manner, with different disease loci mediated by genes in distinct tissues or cell types. We propose Tissue-Gene Fine-Mapping (TGFM), a fine-mapping method that infers the posterior probability (PIP) for each gene-tissue pair to mediate a disease locus by analyzing GWAS summary statistics (and in-sample LD) and leveraging eQTL data from diverse tissues to build cis-predicted expression models; TGFM also assigns PIPs to causal variants that are not mediated by gene expression in assayed genes and tissues. TGFM accounts for both co-regulation across genes and tissues and LD between SNPs (generalizing existing fine-mapping methods), and incorporates genome-wide estimates of each tissue's contribution to disease as tissue-level priors. TGFM was well-calibrated and moderately well-powered in simulations; unlike previous methods, TGFM was able to attain correct calibration by modeling uncertainty in cis-predicted expression models. We applied TGFM to 45 UK Biobank diseases/traits (average N = 316K) using eQTL data from 38 GTEx tissues. TGFM identified an average of 147 PIP > 0.5 causal genetic elements per disease/trait, of which 11% were gene-tissue pairs. Implicated gene-tissue pairs were concentrated in known disease-critical tissues, and causal genes were strongly enriched in disease-relevant gene sets. Causal gene-tissue pairs identified by TGFM recapitulated known biology (e.g., TPO-thyroid for Hypothyroidism), but also included biologically plausible novel findings (e.g., SLC20A2-artery aorta for Diastolic blood pressure). Further application of TGFM to single-cell eQTL data from 9 cell types in peripheral blood mononuclear cells (PBMC), analyzed jointly with GTEx tissues, identified 30 additional causal gene-PBMC cell type pairs at PIP > 0.5, primarily for autoimmune disease and blood cell traits, including the biologically plausible example of CD52 in classical monocyte cells for Monocyte count.
In conclusion, TGFM is a robust and powerful method for fine-mapping causal tissues and genes at disease-associated loci.
Affiliation(s)
- Benjamin J. Strober
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Martin Jinye Zhang
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Tiffany Amariuta
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Jordan Rossen
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Alkes L. Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
16
Li R, Gao J, Zhou G, Zuo D, Sun Y. SABO-ILSTSVR: a genomic prediction method based on improved least squares twin support vector regression. Front Genet 2024; 15:1415249. [PMID: 38948357 PMCID: PMC11211513 DOI: 10.3389/fgene.2024.1415249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Accepted: 05/29/2024] [Indexed: 07/02/2024] Open
Abstract
In modern breeding practices, genomic prediction (GP) uses high-density single nucleotide polymorphism (SNP) markers to predict genomic estimated breeding values (GEBVs) for crucial phenotypes, thereby speeding up the selective breeding process and shortening generation intervals. However, because genotype data typically contain far fewer samples than SNP markers, overfitting commonly arises during model training. To address this, the present study builds upon the Least Squares Twin Support Vector Regression (LSTSVR) model by incorporating a Lasso regularization term, yielding a model named ILSTSVR. Because parameter tuning is complex and differs across datasets, the subtraction average based optimizer (SABO) is further introduced to optimize ILSTSVR, yielding the GP model named SABO-ILSTSVR. Experiments conducted on four different crop datasets demonstrate that SABO-ILSTSVR outperforms or is equivalent in efficiency to widely-used genomic prediction methods. Source codes and data are available at: https://github.com/MLBreeding/SABO-ILSTSVR.
Affiliation(s)
- Rui Li
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Hohhot, China
- Jing Gao
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Hohhot, China
- Inner Mongolia Autonomous Region Big Data Center, Hohhot, China
- Ganghui Zhou
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Hohhot, China
- Dongshi Zuo
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Hohhot, China
- Yao Sun
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Hohhot, China
17
Wang JT, Chang XY, Zhao Q, Zhang YM. FastBiCmrMLM: a fast and powerful compressed variance component mixed logistic model for big genomic case-control genome-wide association study. Brief Bioinform 2024; 25:bbae290. [PMID: 38888457 PMCID: PMC11184901 DOI: 10.1093/bib/bbae290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 05/19/2024] [Accepted: 06/09/2024] [Indexed: 06/20/2024] Open
Abstract
Large sample datasets have been regarded as the primary basis for innovative discoveries and the solution to missing heritability in genome-wide association studies. However, their computational complexity cannot consider all comprehensive effects and all polygenic backgrounds, which reduces the effectiveness of large datasets. To address these challenges, we included all effects and polygenic backgrounds in a mixed logistic model for binary traits and compressed four variance components into two. The compressed model combined three computational algorithms to develop an innovative method, called FastBiCmrMLM, for large data analysis. These algorithms were tailored to sample size, computational speed, and reduced memory requirements. To mine additional genes, linkage disequilibrium markers were replaced by bin-based haplotypes, which are analyzed by FastBiCmrMLM, named FastBiCmrMLM-Hap. Simulation studies highlighted the superiority of FastBiCmrMLM over GMMAT, SAIGE and fastGWA-GLMM in identifying dominant, small α (allele substitution effect), and rare variants. In the UK Biobank-scale dataset, we demonstrated that FastBiCmrMLM could detect variants as small as 0.03% and with α ≈ 0. In re-analyses of seven diseases in the WTCCC datasets, 29 candidate genes, with both functional and TWAS evidence, around 36 variants identified only by the new methods, strongly validated the new methods. These methods offer a new way to decipher the genetic architecture of binary traits and address the challenges outlined above.
Affiliation(s)
- Yuan-Ming Zhang
- Corresponding author. College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China.
18
Justen HC, Easton WE, Delmore KE. Mapping seasonal migration in a songbird hybrid zone -- heritability, genetic correlations, and genomic patterns linked to speciation. Proc Natl Acad Sci U S A 2024; 121:e2313442121. [PMID: 38648483 PMCID: PMC11067064 DOI: 10.1073/pnas.2313442121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Accepted: 03/19/2024] [Indexed: 04/25/2024] Open
Abstract
Seasonal migration is a widespread behavior relevant for adaptation and speciation, yet knowledge of its genetic basis is limited. We leveraged advances in tracking and sequencing technologies to bridge this gap in a well-characterized hybrid zone between songbirds that differ in migratory behavior. Migration requires the coordinated action of many traits, including orientation, timing, and wing morphology. We used genetic mapping to show these traits are highly heritable and genetically correlated, explaining how migration has evolved so rapidly in the past and suggesting future responses to climate change may be possible. Many of these traits mapped to the same genomic regions and small structural variants, indicating that the same, or tightly linked, genes underlie them. Analyses integrating transcriptomic data indicate cholinergic receptors could control multiple traits. Furthermore, analyses integrating genomic differentiation suggested that genes underlying migratory traits help maintain reproductive isolation in this hybrid zone.
Affiliation(s)
- Hannah C. Justen
- Biology Department, Texas Agricultural and Mechanical University, TAMU, College Station, TX 3528
- Wendy E. Easton
- Environment and Climate Change Canada, Canadian Wildlife Service-Pacific Region, Delta, BC V4K 3N2, Canada
- Kira E. Delmore
- Biology Department, Texas Agricultural and Mechanical University, TAMU, College Station, TX 3528
19
Agha HI, Endelman JB, Chitwood-Brown J, Clough M, Coombs J, De Jong WS, Douches DS, Higgins CR, Holm DG, Novy R, Resende MFR, Sathuvalli V, Thompson AL, Yencho GC, Zotarelli L, Shannon LM. Genotype-by-environment interactions and local adaptation shape selection in the US National Chip Processing Trial. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2024; 137:99. [PMID: 38598016 PMCID: PMC11006776 DOI: 10.1007/s00122-024-04610-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Accepted: 03/25/2024] [Indexed: 04/11/2024]
Abstract
KEY MESSAGE We find evidence of selection for local adaptation and extensive genotype-by-environment interaction in the potato National Chip Processing Trial (NCPT). We present a novel method for dissecting the interplay between selection, local adaptation and environmental response in plant breeding schemes. Balancing local adaptation and the desire for widely adapted cultivars is challenging for plant breeders and makes genotype-by-environment interactions (GxE) an important target of selection. Selecting for GxE requires plant breeders to evaluate plants across multiple environments. One way breeders have accomplished this is to test advanced materials across many locations. Public potato breeders test advanced breeding material in the National Chip Processing Trial (NCPT), a public-private partnership where breeders from ten institutions submit advanced chip lines to be evaluated in up to ten locations across the country. These clones are genotyped and phenotyped for important agronomic traits. We used these data to interrogate the NCPT for GxE. Further, because breeders submitting clones to the NCPT select in a relatively small geographic range for the first 3 years of selection, we examined these data for evidence of incidental selection for local adaptation, and the alleles underlying it, using an environmental genome-wide association study (envGWAS). We found genomic regions associated with continuous environmental variables and discrete breeding programs, as well as regions of the genome potentially underlying GxE for yield.
Affiliation(s)
- Husain I Agha
- Department of Horticultural Science, University of Minnesota, Saint Paul, MN, USA
- Jeffrey B Endelman
- Department of Plant & Agroecosystem Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Jessica Chitwood-Brown
- Department of Horticulture and Landscape Architecture, Colorado State University, Fort Collins, CO, USA
- Mark Clough
- Department of Horticultural Science, North Carolina State University, Raleigh, NC, USA
- Joseph Coombs
- Department of Plant Soil and Microbial Sciences, Michigan State University, East Lansing, MI, USA
- Walter S De Jong
- School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
- David S Douches
- Department of Plant Soil and Microbial Sciences, Michigan State University, East Lansing, MI, USA
- David G Holm
- Department of Horticulture and Landscape Architecture, Colorado State University, Fort Collins, CO, USA
- Richard Novy
- Small Grains and Potato Germplasm Research, USDA-ARS, Aberdeen, ID, USA
- Marcio F R Resende
- Horticultural Sciences Department, University of Florida, Gainesville, FL, USA
- Vidyasagar Sathuvalli
- Hermiston Agricultural Research and Extension Center, Oregon State University, Hermiston, OR, USA
- Asunta L Thompson
- Department of Plant Sciences, North Dakota State University, Fargo, ND, USA
- G Craig Yencho
- Department of Horticultural Science, North Carolina State University, Raleigh, NC, USA
- Lincoln Zotarelli
- Horticultural Sciences Department, University of Florida, Gainesville, FL, USA
- Laura M Shannon
- Department of Horticultural Science, University of Minnesota, Saint Paul, MN, USA.
20
Li Q, Bian J, Qian Y, Kossinna P, Gau C, Gordon PMK, Zhou X, Guo X, Yan J, Wu J, Long Q. An expression-directed linear mixed model discovering low-effect genetic variants. Genetics 2024; 226:iyae018. [PMID: 38314848 DOI: 10.1093/genetics/iyae018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 11/29/2023] [Accepted: 01/05/2024] [Indexed: 02/07/2024] Open
Abstract
Detecting genetic variants with low effect sizes using a moderate sample size is difficult, hindering downstream efforts to learn pathology and estimate heritability. In this work, by utilizing informative weights learned from training genetically predicted gene expression models, we formed an alternative approach to estimate the polygenic term in a linear mixed model. Our linear mixed model estimates the genetic background by incorporating the variants' relevance to gene expression. Our protocol, the expression-directed linear mixed model, enables the discovery of subtle signals of low-effect variants using a moderate sample size. By applying the expression-directed linear mixed model to cohorts of around 5,000 individuals with either binary (WTCCC) or quantitative (NFBC1966) traits, we demonstrated its power gain at the low-effect end of the genetic etiology spectrum. In aggregate, the additional low-effect variants detected by the expression-directed linear mixed model substantially improved estimation of missing heritability. The expression-directed linear mixed model moves precision medicine forward by accurately detecting the contribution of low-effect genetic variants to human diseases.
Affiliation(s)
- Qing Li
- Department of Biochemistry & Molecular Biology, University of Calgary, Calgary T2N 1N4, Canada
- Jiayi Bian
- Department of Mathematics and Statistics, University of Calgary, Calgary T2N 1N4, Canada
- Yanzhao Qian
- Department of Mathematics and Statistics, University of Calgary, Calgary T2N 1N4, Canada
- Pathum Kossinna
- Department of Biochemistry & Molecular Biology, University of Calgary, Calgary T2N 1N4, Canada
- Cooper Gau
- Department of Mathematics and Statistics, University of Calgary, Calgary T2N 1N4, Canada
- Paul M K Gordon
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary T2N 1N4, Canada
- Xiang Zhou
- School of Public Health, University of Michigan, Ann Arbor 48109, USA
- Xingyi Guo
- Department of Medicine & Biomedical Informatics, Vanderbilt University Medical Center, Nashville 37203, USA
- Jun Yan
- Physiology and Pharmacology, University of Calgary, Calgary T2N 1N4, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary T2N 1N4, Canada
- Jingjing Wu
- Department of Mathematics and Statistics, University of Calgary, Calgary T2N 1N4, Canada
- Quan Long
- Department of Biochemistry & Molecular Biology, University of Calgary, Calgary T2N 1N4, Canada
- Department of Mathematics and Statistics, University of Calgary, Calgary T2N 1N4, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary T2N 1N4, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary T2N 1N4, Canada
- Department of Medical Genetics, University of Calgary, Calgary T2N 1N4, Canada
| |
|
21
|
Fenton S, Jacobs A, Bean CW, Adams CE, Elmer KR. Genomic underpinnings of head and body shape in Arctic charr ecomorph pairs. Mol Ecol 2024; 33:e17305. [PMID: 38421099 DOI: 10.1111/mec.17305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 01/31/2024] [Accepted: 02/05/2024] [Indexed: 03/02/2024]
Abstract
Across its Holarctic range, Arctic charr (Salvelinus alpinus) populations have diverged into distinct trophic specialists across independent replicate lakes. The major aspect of divergence between ecomorphs is in head shape and body shape, which are ecomorphological traits reflecting niche use. However, whether the genomic underpinnings of these parallel divergences are consistent across replicates was unknown but key for resolving the substrate of parallel evolution. We investigated the genomic basis of head shape and body shape morphology across four benthivore-planktivore ecomorph pairs of Arctic charr in Scotland. Through genome-wide association analyses, we found genomic regions associated with head shape (89 SNPs) or body shape (180 SNPs) separately and 50 of these SNPs were strongly associated with both body and head shape morphology. For each trait separately, only a small number of SNPs were shared across all ecomorph pairs (3 SNPs for head shape and 10 SNPs for body shape). Signs of selection on the associated genomic regions varied across pairs, consistent with evolutionary demography differing considerably across lakes. Using a comprehensive database of salmonid QTLs newly augmented and mapped to a charr genome, we found several of the head- and body-shape-associated SNPs were within or near morphology QTLs from other salmonid species, reflecting a shared genetic basis for these phenotypes across species. Overall, our results demonstrate how parallel ecotype divergences can have both population-specific and deeply shared genomic underpinnings across replicates, influenced by differences in their environments and demographic histories.
Affiliation(s)
- Sam Fenton
- School of Biodiversity, One Health & Veterinary Medicine, University of Glasgow, Glasgow, UK
| | - Arne Jacobs
- School of Biodiversity, One Health & Veterinary Medicine, University of Glasgow, Glasgow, UK
| | - Colin W Bean
- School of Biodiversity, One Health & Veterinary Medicine, University of Glasgow, Glasgow, UK
- NatureScot, Clydebank, UK
| | - Colin E Adams
- School of Biodiversity, One Health & Veterinary Medicine, University of Glasgow, Glasgow, UK
- Scottish Centre for Ecology and the Natural Environment, University of Glasgow, Glasgow, UK
| | - Kathryn R Elmer
- School of Biodiversity, One Health & Veterinary Medicine, University of Glasgow, Glasgow, UK
| |
|
22
|
Wu YS, Zheng WH, Liu TH, Sun Y, Xu YT, Shao LZ, Cai QY, Tang YQ. Joint-tissue integrative analysis identifies high-risk genes for Parkinson's disease. Front Neurosci 2024; 18:1309684. [PMID: 38576865 PMCID: PMC10991821 DOI: 10.3389/fnins.2024.1309684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Accepted: 02/22/2024] [Indexed: 04/06/2024] Open
Abstract
The loss of dopaminergic neurons in the substantia nigra and the abnormal accumulation of synuclein proteins and neurotransmitters in Lewy bodies constitute the primary symptoms of Parkinson's disease (PD). Besides environmental factors, scholars are in the early stages of comprehending the genetic factors involved in the pathogenic mechanism of PD. Although genome-wide association studies (GWAS) have unveiled numerous genetic variants associated with PD, precisely pinpointing the causal variants remains challenging due to strong linkage disequilibrium (LD) among them. Addressing this issue, expression quantitative trait locus (eQTL) cohorts were employed in a transcriptome-wide association study (TWAS) to infer the genetic correlation between gene expression and a particular trait. Utilizing the TWAS theory alongside the enhanced Joint-Tissue Imputation (JTI) technique and Mendelian Randomization (MR) framework (MR-JTI), we identified a total of 159 PD-associated genes by amalgamating LD score, GTEx eQTL data, and GWAS summary statistic data from a substantial cohort. Subsequently, Fisher's exact test was conducted on these PD-associated genes using 5,152 differentially expressed genes sourced from 12 PD-related datasets. Ultimately, 29 highly credible PD-associated genes, including CTX1B, SCNA, and ARSA, were uncovered. Furthermore, GO and KEGG enrichment analyses indicated that these genes primarily function in tissue synthesis, regulation of neuron projection development, vesicle organization and transportation, and lysosomal impact. The potential PD-associated genes identified in this study not only offer fresh insights into the disease's pathophysiology but also suggest potential biomarkers for early disease detection.
Affiliation(s)
- Ya-Shi Wu
- Department of Bioinformatics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
- Department of Cell Biology and Medical Genetics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Wen-Han Zheng
- Department of Cell Biology and Medical Genetics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Tai-Hang Liu
- Department of Bioinformatics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Yan Sun
- Department of Cell Biology and Medical Genetics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Yu-Ting Xu
- Department of Cell Biology and Medical Genetics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Li-Zhen Shao
- Department of Bioinformatics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Qin-Yu Cai
- Department of Bioinformatics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Ya Qin Tang
- Department of Bioinformatics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| |
|
23
|
Gallinson DG, Kozakiewicz CP, Rautsaw RM, Beer MA, Ruiz-Aravena M, Comte S, Hamilton DG, Kerlin DH, McCallum HI, Hamede R, Jones ME, Storfer A, McMinds R, Margres MJ. Intergenomic signatures of coevolution between Tasmanian devils and an infectious cancer. Proc Natl Acad Sci U S A 2024; 121:e2307780121. [PMID: 38466855 PMCID: PMC10962979 DOI: 10.1073/pnas.2307780121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 01/17/2024] [Indexed: 03/13/2024] Open
Abstract
Coevolution is common and frequently governs host-pathogen interaction outcomes. Phenotypes underlying these interactions often manifest as the combined products of the genomes of interacting species, yet traditional quantitative trait mapping approaches ignore these intergenomic interactions. Devil facial tumor disease (DFTD), an infectious cancer afflicting Tasmanian devils (Sarcophilus harrisii), has decimated devil populations due to universal host susceptibility and a fatality rate approaching 100%. Here, we used a recently developed joint genome-wide association study (i.e., co-GWAS) approach, 15 y of mark-recapture data, and 960 genomes to identify intergenomic signatures of coevolution between devils and DFTD. Using a traditional GWA approach, we found that both devil and DFTD genomes explained a substantial proportion of variance in how quickly susceptible devils became infected, although genomic architectures differed across devils and DFTD; the devil genome had fewer loci of large effect whereas the DFTD genome had a more polygenic architecture. Using a co-GWA approach, devil-DFTD intergenomic interactions explained ~3× more variation in how quickly susceptible devils became infected than either genome alone, and the top genotype-by-genotype interactions were significantly enriched for cancer genes and signatures of selection. A devil regulatory mutation was associated with differential expression of a candidate cancer gene and showed putative allele matching effects with two DFTD coding sequence variants. Our results highlight the need to account for intergenomic interactions when investigating host-pathogen (co)evolution and emphasize the importance of such interactions when considering devil management strategies.
Affiliation(s)
- Dylan G. Gallinson
- Department of Integrative Biology, University of South Florida, Tampa, FL 33620
- College of Public Health, University of South Florida, Tampa, FL 33620
| | - Christopher P. Kozakiewicz
- School of Biological Sciences, Washington State University, Pullman, WA 99163
- W.K. Kellogg Biological Station, Department of Integrative Biology, Michigan State University, Hickory Corners, MI 49060
| | - Rhett M. Rautsaw
- Department of Integrative Biology, University of South Florida, Tampa, FL 33620
- School of Biological Sciences, Washington State University, Pullman, WA 99163
| | - Marc A. Beer
- School of Biological Sciences, Washington State University, Pullman, WA 99163
| | - Manuel Ruiz-Aravena
- School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
- Department of Public and Ecosystem Health, Cornell University, Ithaca, NY 14853
| | - Sebastien Comte
- School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
- New South Wales Department of Primary Industries, Vertebrate Pest Research Unit, Orange, NSW 2800, Australia
| | - David G. Hamilton
- School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
| | - Douglas H. Kerlin
- Centre for Planetary Health and Food Security, Griffith University, Nathan, QLD 4111, Australia
| | - Hamish I. McCallum
- Centre for Planetary Health and Food Security, Griffith University, Nathan, QLD 4111, Australia
| | - Rodrigo Hamede
- School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
- CANECEV Centre de Recherches Ecologiques et Evolutives sur le Cancer, Montpellier 34394, France
| | - Menna E. Jones
- School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
| | - Andrew Storfer
- School of Biological Sciences, Washington State University, Pullman, WA 99163
| | - Ryan McMinds
- Department of Integrative Biology, University of South Florida, Tampa, FL 33620
- College of Public Health, University of South Florida, Tampa, FL 33620
| | - Mark J. Margres
- Department of Integrative Biology, University of South Florida, Tampa, FL 33620
| |
|
24
|
Xu H, Kang Y, Liang T, Lu S, Xia X, Lu Z, Hu L, Guo L, Zhang L, Huang J, Ye L, Jiang P, Liu Y, Xinyi L, Zhai J, Wang Z, Liu Y. SNP-based and haplotype-based genome-wide association on drug dependence in Han Chinese. BMC Genomics 2024; 25:255. [PMID: 38448893 PMCID: PMC10919046 DOI: 10.1186/s12864-024-10117-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 02/13/2024] [Indexed: 03/08/2024] Open
Abstract
BACKGROUND Drug addiction is a serious problem worldwide and is influenced by genetic factors. The present study aimed to investigate the association between genetics and drug addiction among Han Chinese. METHODS A total of 1000 Chinese users of illicit drugs and 9693 healthy controls were enrolled and underwent single nucleotide polymorphism (SNP)-based and haplotype-based association analyses via whole-genome genotyping. RESULTS Both single-SNP and haplotype tests revealed associations between illicit drug use and several immune-related genes in the major histocompatibility complex (MHC) region (SNP association: log10BF = 15.135, p = 1.054e-18; haplotype association: log10BF = 20.925, p = 2.065e-24). These genes may affect the risk of drug addiction via modulation of the neuroimmune system. The single-SNP test exclusively reported genome-wide significant associations between rs3782886 (SNP association: log10BF = 8.726, p = 4.842e-11) in BRAP and rs671 (SNP association: log10BF = 7.406, p = 9.333e-10) in ALDH2 and drug addiction. The haplotype test exclusively reported a genome-wide significant association (haplotype association: log10BF = 7.607, p = 3.342e-11) between a region with allelic heterogeneity on chromosome 22 and drug addiction, which may be involved in the pathway of vitamin B12 transport and metabolism, indicating a causal link between lower vitamin B12 levels and methamphetamine addiction. CONCLUSIONS These findings provide new insights into risk-modeling and the prevention and treatment of methamphetamine and heroin dependence, which may further contribute to potential novel therapeutic approaches.
Affiliation(s)
- Hanli Xu
- College of Life Sciences and Bioengineering, School of Science, Beijing Jiaotong University, Beijing, 100028, China
| | - Yulin Kang
- Chinese Research Academy of Environmental Sciences, Beijing, 100012, China.
| | - Tingming Liang
- Jiangsu Key Laboratory for Molecular and Medical Biotechnology, School of Life Science, Nanjing Normal University, Nanjing, 210023, China
| | - Sifen Lu
- Precision Medicine Key Laboratory of Sichuan Province and Precision Medicine Center, West China Hospital, Sichuan University, Chengdu, 610041, China
| | - Xiaolin Xia
- Office of Academic Affairs, The National Police University for Criminal Justice, Baoding, 071000, China
| | - Zuhong Lu
- School of Biological Science & Medical Engineering, Southeast University, Nanjing, 211189, China
| | - Lingming Hu
- Chinese Research Academy of Environmental Sciences, Beijing, 100012, China
| | - Li Guo
- School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing, 210003, China
| | - Lishu Zhang
- College of Life Sciences and Bioengineering, School of Science, Beijing Jiaotong University, Beijing, 100028, China
| | - Jiaqiang Huang
- College of Life Sciences and Bioengineering, School of Science, Beijing Jiaotong University, Beijing, 100028, China
| | - Lin Ye
- Cheung Hong School of Journalism and Communication, Shantou University, Shantou, 515060, China
| | - Peiye Jiang
- Office of International Cooperation and Exchanges, Nanjing University, Nanjing, 210023, China
| | - Yi Liu
- Jiangsu Taihu Institute of Addiction Rehabilitation, Suzhou, 215111, China
| | - Li Xinyi
- College of Life Sciences and Bioengineering, School of Science, Beijing Jiaotong University, Beijing, 100028, China
| | - Jin Zhai
- Department of Social Work, Changzhou University, Changzhou, 213164, China
| | - Zi Wang
- School of Music, Nanjing Normal University, Nanjing, 210097, China
| | - Yangyang Liu
- Department of Psychology, Nanjing University, Nanjing, 210023, China.
- School of Education, Tianjin University, Tianjin, 200350, China.
| |
|
25
|
Ferrão MAG, da Fonseca AFA, Volpi PS, de Souza LC, Comério M, Filho ACV, Riva-Souza EM, Munoz PR, Ferrão RG, Ferrão LFV. Genomic-assisted breeding for climate-smart coffee. The Plant Genome 2024; 17:e20321. [PMID: 36946358 DOI: 10.1002/tpg2.20321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Revised: 01/25/2023] [Accepted: 02/12/2023] [Indexed: 06/18/2023]
Abstract
Coffee is a universal beverage that drives a multi-industry market on a global basis. Today, the sustainability of coffee production is threatened by accelerated climate changes. In this work, we propose the implementation of genomic-assisted breeding for climate-smart coffee in Coffea canephora. This species is adapted to higher temperatures and is more resilient to biotic and abiotic stresses. After evaluating two populations, over multiple harvests, and under severe drought conditions, we dissected the genetic architecture of yield, disease resistance, and quality-related traits. By integrating genome-wide association studies and diallel analyses, our contribution is four-fold: (i) we identified a set of molecular markers with major effects associated with disease resistance and post-harvest traits, while yield and plant architecture presented a polygenic background; (ii) we demonstrated the relevance of nonadditive gene actions and projected hybrid vigor when genotypes from different geographically botanical groups are crossed; (iii) we computed medium-to-large heritability values for most of the traits, representing potential for fast genetic progress; and (iv) we provided a first step toward implementing molecular breeding to accelerate improvements in C. canephora. Altogether, this work is a blueprint for how quantitative genetics and genomics can assist coffee breeding and support the supply chain in the face of the current global changes.
Affiliation(s)
- Maria Amélia G Ferrão
- Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural-Incaper, ES, Brazil
- Empresa Brasileira de Pesquisa Agropecuária-Embrapa Café, Brasília, Brazil
| | - Aymbire F A da Fonseca
- Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural-Incaper, ES, Brazil
- Empresa Brasileira de Pesquisa Agropecuária-Embrapa Café, Brasília, Brazil
| | - Paulo S Volpi
- Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural-Incaper, ES, Brazil
| | - Lucimara C de Souza
- Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural-Incaper, ES, Brazil
| | - Marcone Comério
- Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural-Incaper, ES, Brazil
| | - Abraão C Verdin Filho
- Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural-Incaper, ES, Brazil
| | - Elaine M Riva-Souza
- Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural-Incaper, ES, Brazil
| | - Patricio R Munoz
- Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, USA
| | - Romário G Ferrão
- Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural-Incaper, ES, Brazil
- Multivix Group, ES, Brazil
| | - Luís Felipe V Ferrão
- Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, USA
| |
|
26
|
Nwizu C, Hughes M, Ramseier ML, Navia AW, Shalek AK, Fusi N, Raghavan S, Winter PS, Amini AP, Crawford L. Scalable nonparametric clustering with unified marker gene selection for single-cell RNA-seq data. bioRxiv: The Preprint Server for Biology 2024:2024.02.11.579839. [PMID: 38405697 PMCID: PMC10888887 DOI: 10.1101/2024.02.11.579839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Clustering is commonly used in single-cell RNA-sequencing (scRNA-seq) pipelines to characterize cellular heterogeneity. However, current methods face two main limitations. First, they require user-specified heuristics which add time and complexity to bioinformatic workflows; second, they rely on post-selective differential expression analyses to identify marker genes driving cluster differences, which has been shown to be subject to inflated false discovery rates. We address these challenges by introducing nonparametric clustering of single-cell populations (NCLUSION): an infinite mixture model that leverages Bayesian sparse priors to identify marker genes while simultaneously performing clustering on single-cell expression data. NCLUSION uses a scalable variational inference algorithm to perform these analyses on datasets with up to millions of cells. By analyzing publicly available scRNA-seq studies, we demonstrate that NCLUSION (i) matches the performance of other state-of-the-art clustering techniques with significantly reduced runtime and (ii) provides statistically robust and biologically relevant transcriptomic signatures for each of the clusters it identifies. Overall, NCLUSION represents a reliable hypothesis-generating tool for understanding patterns of expression variation present in single-cell populations.
Affiliation(s)
- Chibuikem Nwizu
- Center for Computational Molecular Biology, Brown University, Providence, RI, USA
- Warren Alpert Medical School of Brown University, Providence, RI, USA
| | | | - Michelle L. Ramseier
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Andrew W. Navia
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Alex K. Shalek
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, USA
| | | | - Srivatsan Raghavan
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
| | - Peter S. Winter
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | | | - Lorin Crawford
- Center for Computational Molecular Biology, Brown University, Providence, RI, USA
- Microsoft Research, Cambridge, MA, USA
- Department of Biostatistics, Brown University, Providence, RI, USA
| |
|
27
|
Liu L, Yan R, Guo P, Ji J, Gong W, Xue F, Yuan Z, Zhou X. Conditional transcriptome-wide association study for fine-mapping candidate causal genes. Nat Genet 2024; 56:348-356. [PMID: 38279040 DOI: 10.1038/s41588-023-01645-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Accepted: 12/08/2023] [Indexed: 01/28/2024]
Abstract
Transcriptome-wide association studies (TWASs) aim to integrate genome-wide association studies with expression-mapping studies to identify genes with genetically predicted expression (GReX) associated with a complex trait. In the present report, we develop a method, GIFT (gene-based integrative fine-mapping through conditional TWAS), that performs conditional TWAS analysis by explicitly controlling for GReX of all other genes residing in a local region to fine-map putatively causal genes. GIFT is frequentist in nature, explicitly models both expression correlation and cis-single nucleotide polymorphism linkage disequilibrium across multiple genes and uses a likelihood framework to account for expression prediction uncertainty. As a result, GIFT produces calibrated P values and is effective for fine-mapping. We apply GIFT to analyze six traits in the UK Biobank, where GIFT narrows down the set size of putatively causal genes by 32.16-91.32% compared with existing TWAS fine-mapping approaches. The genes identified by GIFT highlight the importance of vessel regulation in determining blood pressures and lipid metabolism for regulating lipid levels.
Affiliation(s)
- Lu Liu
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
| | - Ran Yan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
| | - Ping Guo
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
| | - Jiadong Ji
- Institute for Financial Studies, Shandong University, Jinan, China
| | - Weiming Gong
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
| | - Fuzhong Xue
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
| | - Zhongshang Yuan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China.
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China.
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA.
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA.
| |
|
28
|
Rowan TN, Schnabel RD, Decker JE. Uncovering the architecture of selection in two Bos taurus cattle breeds. Evol Appl 2024; 17:e13666. [PMID: 38405336 PMCID: PMC10883790 DOI: 10.1111/eva.13666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 12/18/2023] [Accepted: 01/26/2024] [Indexed: 02/27/2024] Open
Abstract
Directional selection alters the genome via hard sweeps, soft sweeps, and polygenic selection. However, mapping polygenic selection is difficult because it does not leave clear signatures on the genome like a selective sweep. In populations with temporally stratified genotypes, the Generation Proxy Selection Mapping (GPSM) method identifies variants associated with generation number (or appropriate proxy) and thus variants undergoing directional allele frequency changes. Here, we use GPSM on two large datasets of beef cattle to detect associations between an animal's generation and 11 million imputed SNPs. Using these datasets with high power and dense mapping resolution, GPSM detected a total of 294 unique loci actively under selection in two cattle breeds. We observed that GPSM has a high power to detect selection in the very recent past (<10 years), even when allele frequency changes are small. Variants identified by GPSM reside in genomic regions associated with known breed-specific selection objectives, such as fertility and maternal ability in Red Angus, and carcass merit and coat color in Simmental. Over 60% of the selected loci reside in or near (<50 kb) annotated genes. Using haplotype-based and composite selective sweep statistics, we identify hundreds of putative selective sweeps that likely occurred earlier in the evolution of these breeds; however, these sweeps have little overlap with recent polygenic selection. This makes GPSM a complementary approach to sweep detection methods when temporal genotype data are available. The selected loci that we identify across methods demonstrate the complex architecture of selection in domesticated cattle.
Affiliation(s)
- Troy N. Rowan
- Division of Animal Sciences, University of Missouri, Columbia, Missouri, USA
- Genetics Area Program, University of Missouri, Columbia, Missouri, USA
- Department of Animal Science, University of Tennessee Institute of Agriculture, Knoxville, Tennessee, USA
- Department of Large Animal Clinical Sciences, College of Veterinary Medicine, University of Tennessee, Knoxville, Tennessee, USA
| | - Robert D. Schnabel
- Division of Animal Sciences, University of Missouri, Columbia, Missouri, USA
- Genetics Area Program, University of Missouri, Columbia, Missouri, USA
- Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
| | - Jared E. Decker
- Division of Animal Sciences, University of Missouri, Columbia, Missouri, USA
- Genetics Area Program, University of Missouri, Columbia, Missouri, USA
- Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
| |
|
29
|
Agrawal V, Manouchehri A, Vaitinadin NS, Shi M, Bagheri M, Gupta DK, Kullo IJ, Luo Y, McNally EM, Puckelwartz MJ, Ferguson JF, Wells QS, Mosley JD. Identification of Clinical Drivers of Left Atrial Enlargement Through Genomics of Left Atrial Size. Circ Heart Fail 2024; 17:e010557. [PMID: 38126226 PMCID: PMC10842187 DOI: 10.1161/circheartfailure.123.010557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Accepted: 10/24/2023] [Indexed: 12/23/2023]
Abstract
BACKGROUND Greater left atrial size is associated with a higher incidence of cardiovascular disease and mortality, but the full spectrum of diagnoses associated with left atrial enlargement in sex-stratified clinical populations is not well known. Our study sought to identify genetic risk mechanisms affecting left atrial diameter (LAD) in a clinical cohort. METHODS Using Vanderbilt deidentified electronic health record, we studied 6163 females and 5993 males of European ancestry who had at least 1 LAD measure and available genotyping. A sex-stratified polygenic score was constructed for LAD variation and tested for association against 1680 International Classification of Diseases code-based phenotypes. Two-sample univariable and multivariable Mendelian randomization approaches were used to assess etiologic relationships between candidate associations and LAD. RESULTS A phenome-wide association study identified 25 International Classification of Diseases code-based diagnoses in females and 11 in males associated with a polygenic score of LAD (false discovery rate q<0.01), 5 of which were further evaluated by Mendelian randomization (waist circumference [WC], atrial fibrillation, heart failure, systolic blood pressure, and coronary artery disease). Sex-stratified differences in the genetic associations between risk factors and a polygenic score for LAD were observed (WC for females; heart failure, systolic blood pressure, atrial fibrillation, and WC for males). By multivariable Mendelian randomization, higher WC remained significantly associated with larger LAD in females, whereas coronary artery disease, WC, and atrial fibrillation remained significantly associated with larger LAD in males. CONCLUSIONS In a clinical population, we identified, by genomic approaches, potential etiologic risk factors for larger LAD. Further studies are needed to confirm the extent to which these risk factors may be modified to prevent or reverse adverse left atrial remodeling and the extent to which sex modifies these risk factors.
Affiliation(s)
- Vineet Agrawal
  - Vanderbilt Translational and Clinical Cardiovascular Research Center and Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Veterans Affairs, Nashville, TN, USA
- Ali Manouchehri
  - Vanderbilt Translational and Clinical Cardiovascular Research Center and Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Nataraja Sarma Vaitinadin
  - Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Mingjian Shi
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Minoo Bagheri
  - Vanderbilt Translational and Clinical Cardiovascular Research Center and Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Deepak K. Gupta
  - Vanderbilt Translational and Clinical Cardiovascular Research Center and Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Iftikhar J. Kullo
  - Department of Cardiovascular Medicine, Mayo Clinic College of Medicine, Rochester, MN, USA
- Yuan Luo
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Elizabeth M. McNally
  - Center for Genetic Medicine, Northwestern Feinberg School of Medicine, Chicago, IL, USA
- Megan J. Puckelwartz
  - Center for Genetic Medicine, Northwestern Feinberg School of Medicine, Chicago, IL, USA
  - Department of Pharmacology, Northwestern Feinberg School of Medicine, Chicago, IL, USA
- Jane F. Ferguson
  - Vanderbilt Translational and Clinical Cardiovascular Research Center and Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Quinn S. Wells
  - Vanderbilt Translational and Clinical Cardiovascular Research Center and Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Jonathan D. Mosley
  - Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
30
Duan YY, Ke X, Wu H, Yao S, Shi W, Han JZ, Zhu RJ, Wang JH, Jia YY, Yang TL, Li M, Guo Y. Multi-tissue transcriptome-wide association study reveals susceptibility genes and drug targets for insulin resistance-relevant phenotypes. Diabetes Obes Metab 2024; 26:135-147. [PMID: 37779362 DOI: 10.1111/dom.15298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Revised: 09/04/2023] [Accepted: 09/11/2023] [Indexed: 10/03/2023]
Abstract
AIM Genome-wide association studies (GWAS) have identified multiple susceptibility loci associated with insulin resistance (IR)-relevant phenotypes. However, the genes responsible for these associations remain largely unknown. We aim to identify susceptibility genes for IR-relevant phenotypes via a transcriptome-wide association study. MATERIALS AND METHODS We conducted a large-scale multi-tissue transcriptome-wide association study for IR (Insulin Sensitivity Index, homeostasis model assessment-IR, fasting insulin) and lipid-relevant traits (high-density lipoprotein cholesterol, triglycerides, low-density lipoprotein cholesterol and total cholesterol) using the largest GWAS summary statistics and precomputed gene expression weights of 49 human tissues. Conditional and joint analyses were implemented to identify significantly independent genes. Furthermore, we estimated the causal effects of independent genes by Mendelian randomization causal inference analysis. RESULTS We identified 1190 susceptibility genes causally associated with IR-relevant phenotypes, including 58 genes that were not implicated in the original GWAS. Among them, 11 genes were further supported by differential expression analyses or a knockout mouse database; for example, KRIT1 showed both significantly differential expression and IR-related phenotypic effects in knockout mice. Meanwhile, seven proteins encoded by susceptibility genes were targeted by clinically approved drugs, and three of these genes (H6PD, CACNB2 and DRD2) have served as drug targets for IR-related diseases/traits. Moreover, drug repurposing analysis identified four compounds with profiles opposing the expression of genes associated with IR risk. CONCLUSIONS Our study provided new insights into IR aetiology and avenues for therapeutic development.
Affiliation(s)
- Yuan-Yuan Duan
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Xin Ke
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Hao Wu
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Shi Yao
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Wei Shi
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Ji-Zhou Han
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Ren-Jie Zhu
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Jia-Hao Wang
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Ying-Ying Jia
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Tie-Lin Yang
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Meng Li
  - Department of Orthopedics, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Yan Guo
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
31
Cui R, Elzur RA, Kanai M, Ulirsch JC, Weissbrod O, Daly MJ, Neale BM, Fan Z, Finucane HK. Improving fine-mapping by modeling infinitesimal effects. Nat Genet 2024; 56:162-169. [PMID: 38036779 PMCID: PMC11056999 DOI: 10.1038/s41588-023-01597-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/24/2022] [Accepted: 10/26/2023] [Indexed: 12/02/2023]
Abstract
Fine-mapping aims to identify causal genetic variants for phenotypes. Bayesian fine-mapping algorithms (for example, SuSiE, FINEMAP, ABF and COJO-ABF) are widely used, but assessing posterior probability calibration remains challenging in real data, where model misspecification probably exists, and true causal variants are unknown. We introduce replication failure rate (RFR), a metric to assess fine-mapping consistency by downsampling. SuSiE, FINEMAP and COJO-ABF show high RFR, indicating potential overconfidence in their output. Simulations reveal that nonsparse genetic architecture can lead to miscalibration, while imputation noise, nonuniform distribution of causal variants and quality control filters have minimal impact. Here we present SuSiE-inf and FINEMAP-inf, fine-mapping methods modeling infinitesimal effects alongside fewer larger causal effects. Our methods show improved calibration, RFR and functional enrichment, competitive recall and computational efficiency. Notably, using our methods' posterior effect sizes substantially increases polygenic risk score accuracy over SuSiE and FINEMAP. Our work improves causal variant identification for complex traits, a fundamental goal of human genetics.
Affiliation(s)
- Ran Cui
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  - The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Roy A Elzur
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Masahiro Kanai
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Jacob C Ulirsch
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Omer Weissbrod
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Mark J Daly
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Benjamin M Neale
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Zhou Fan
  - Department of Statistics and Data Science, Yale University, New Haven, CT, USA.
- Hilary K Finucane
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
32
Wang Z, Chen Q, Wang Y, Wang Y, Liu R. Refine localizations of functional variants affecting eggshell color of Lueyang black-boned chicken in the SLCO1B3. Poult Sci 2024; 103:103212. [PMID: 37980747 PMCID: PMC10685018 DOI: 10.1016/j.psj.2023.103212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 10/13/2023] [Accepted: 10/13/2023] [Indexed: 11/21/2023] Open
Abstract
Table eggs with uniformly colored shells are visually attractive to consumers. Lueyang black-boned chicken (LBC) lays eggs of varied colors, which is undesirable for the sale of table eggs but provides a segregating population for mapping functional variants affecting eggshell color. SLCO1B3 was identified as the causative gene for blue eggs in the Dongxiang and Araucana chickens. The aim of this study is to map functional variants in SLCO1B3 associated with chicken eggshell color. Eggshell color of LBC (n = 383) was measured using the L*a*b color space. SLCO1B3 was resequenced in a subset (n = 30) of the 383 samples. Linkage disequilibrium among 139 SNPs was analyzed. Associations of 16 SNPs in SLCO1B3 and 8 SNPs in the CPOX, ALAS1, and ABCG2 genes with L*a*b were tested by a polygenic model (LMM) and a polygenic/oligogenic mixed model (BSLMM). Chromatin state annotations were retrieved from the UCSC database. The effects of SLCO1B3 variants located in the mapping region and the upstream 1.6-kb region on promoter activity were analyzed using a dual-luciferase reporter assay. The 139 variants maintained low linkage disequilibrium, with 80% of r² values less than 0.226. Fifteen SLCO1B3 variants were significantly associated with a*, of which 1B3_SNP108 showed the strongest association and the largest effect on a*. In the BSLMM, 1B3_SNP108 alone appeared in the Markov chain Monte Carlo sampling as a major variant, with a posterior inclusion probability of 100%. None of the variants in CPOX, ALAS1, and ABCG2 were significantly associated with color indexes, except that 2 ALAS1 variants were associated with L*. 1B3_SNP108 is located in intron 4, where 6 active enhancers and 1 ATAC island were enriched. However, 1B3_SNP108-containing constructs showed negligible activity in the reporter assay. No significant differences in activity between haplotypes were found for five 5'-deleted promoter constructs. The data identify 1B3_SNP108 as a valuable marker for breeding for eggshell color. Functional variants are localized in the region adjacent to 1B3_SNP108 because of the low linkage disequilibrium in the LBC. Our findings extend the role of SLCO1B3 from a causative gene for blue eggs to a major regulator driving continuous variation of LBC eggshell color.
Affiliation(s)
- Zhepeng Wang
  - College of Animal Science and Technology, Northwest A&F University, Yangling 712100, Shaanxi, China.
- Qiu Chen
  - College of Animal Science and Technology, Northwest A&F University, Yangling 712100, Shaanxi, China
- Yiwei Wang
  - College of Animal Science and Technology, Northwest A&F University, Yangling 712100, Shaanxi, China
- Yulu Wang
  - College of Animal Science and Technology, Northwest A&F University, Yangling 712100, Shaanxi, China
- Ruifang Liu
  - College of Animal Science and Technology, Northwest A&F University, Yangling 712100, Shaanxi, China
33
Zhang MJ, Durvasula A, Chiang C, Koch EM, Strober BJ, Shi H, Barton AR, Kim SS, Weissbrod O, Loh PR, Gazal S, Sunyaev S, Price AL. Pervasive correlations between causal disease effects of proximal SNPs vary with functional annotations and implicate stabilizing selection. RESEARCH SQUARE 2023:rs.3.rs-3707248. [PMID: 38168385 PMCID: PMC10760228 DOI: 10.21203/rs.3.rs-3707248/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
The genetic architecture of human diseases and complex traits has been extensively studied, but little is known about the relationship of causal disease effect sizes between proximal SNPs, which have largely been assumed to be independent. We introduce a new method, LD SNP-pair effect correlation regression (LDSPEC), to estimate the correlation of causal disease effect sizes of derived alleles between proximal SNPs, depending on their allele frequencies, LD, and functional annotations; LDSPEC produced robust estimates in simulations across various genetic architectures. We applied LDSPEC to 70 diseases and complex traits from the UK Biobank (average N=306K), meta-analyzing results across diseases/traits. We detected significantly nonzero effect correlations for proximal SNP pairs (e.g., -0.37±0.09 for low-frequency positive-LD 0-100bp SNP pairs) that decayed with distance (e.g., -0.07±0.01 for low-frequency positive-LD 1-10kb), varied with allele frequency (e.g., -0.15±0.04 for common positive-LD 0-100bp), and varied with LD between SNPs (e.g., +0.12±0.05 for common negative-LD 0-100bp); because we consider derived alleles, positive-LD and negative-LD SNP pairs may yield very different results. We further determined that SNP pairs with shared functions had stronger effect correlations that spanned longer genomic distances, e.g., -0.37±0.08 for low-frequency positive-LD same-gene promoter SNP pairs (average genomic distance of 47kb, due to alternative splicing) and -0.32±0.04 for low-frequency positive-LD H3K27ac 0-1kb SNP pairs. Consequently, SNP-heritability estimates were substantially smaller than estimates of the sum of causal effect size variances across all SNPs (ratio of 0.87±0.02 across diseases/traits), particularly for certain functional annotations (e.g., 0.78±0.01 for common Super enhancer SNPs), even though these quantities are widely assumed to be equal. We recapitulated our findings via forward simulations with an evolutionary model involving stabilizing selection, implicating the action of linkage masking, whereby haplotypes containing linked SNPs with opposite effects on disease have reduced effects on fitness and escape negative selection.
Affiliation(s)
- Martin Jinye Zhang
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Arun Durvasula
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Colby Chiang
  - Department of Pediatrics, Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA
- Evan M. Koch
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Benjamin J. Strober
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Huwenbo Shi
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Alison R. Barton
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Samuel S. Kim
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Omer Weissbrod
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Po-Ru Loh
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Steven Gazal
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
  - Department of Quantitative and Computational Biology, University of Southern California
  - Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California
- Shamil Sunyaev
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Alkes L. Price
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
34
Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CWK, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. Am J Hum Genet 2023; 110:2077-2091. [PMID: 38065072 PMCID: PMC10716520 DOI: 10.1016/j.ajhg.2023.10.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/08/2023] [Revised: 10/26/2023] [Accepted: 10/27/2023] [Indexed: 12/18/2023] Open
Abstract
Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide association studies (GWASs) are a powerful way to find genetic loci associated with phenotypes. GWASs are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix (local eGRM) given the ARG. Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to analyze two chromosomes containing known body size loci in a sample of Native Hawaiians. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.
Affiliation(s)
- Vivian Link
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Joshua G Schraiber
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Caoqi Fan
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Bryan Dinh
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Nicholas Mancuso
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Charleston W K Chiang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Michael D Edge
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
35
Zhang MJ, Durvasula A, Chiang C, Koch EM, Strober BJ, Shi H, Barton AR, Kim SS, Weissbrod O, Loh PR, Gazal S, Sunyaev S, Price AL. Pervasive correlations between causal disease effects of proximal SNPs vary with functional annotations and implicate stabilizing selection. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.12.04.23299391. [PMID: 38106023 PMCID: PMC10723494 DOI: 10.1101/2023.12.04.23299391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
The genetic architecture of human diseases and complex traits has been extensively studied, but little is known about the relationship of causal disease effect sizes between proximal SNPs, which have largely been assumed to be independent. We introduce a new method, LD SNP-pair effect correlation regression (LDSPEC), to estimate the correlation of causal disease effect sizes of derived alleles between proximal SNPs, depending on their allele frequencies, LD, and functional annotations; LDSPEC produced robust estimates in simulations across various genetic architectures. We applied LDSPEC to 70 diseases and complex traits from the UK Biobank (average N=306K), meta-analyzing results across diseases/traits. We detected significantly nonzero effect correlations for proximal SNP pairs (e.g., -0.37±0.09 for low-frequency positive-LD 0-100bp SNP pairs) that decayed with distance (e.g., -0.07±0.01 for low-frequency positive-LD 1-10kb), varied with allele frequency (e.g., -0.15±0.04 for common positive-LD 0-100bp), and varied with LD between SNPs (e.g., +0.12±0.05 for common negative-LD 0-100bp); because we consider derived alleles, positive-LD and negative-LD SNP pairs may yield very different results. We further determined that SNP pairs with shared functions had stronger effect correlations that spanned longer genomic distances, e.g., -0.37±0.08 for low-frequency positive-LD same-gene promoter SNP pairs (average genomic distance of 47kb, due to alternative splicing) and -0.32±0.04 for low-frequency positive-LD H3K27ac 0-1kb SNP pairs. Consequently, SNP-heritability estimates were substantially smaller than estimates of the sum of causal effect size variances across all SNPs (ratio of 0.87±0.02 across diseases/traits), particularly for certain functional annotations (e.g., 0.78±0.01 for common Super enhancer SNPs), even though these quantities are widely assumed to be equal. We recapitulated our findings via forward simulations with an evolutionary model involving stabilizing selection, implicating the action of linkage masking, whereby haplotypes containing linked SNPs with opposite effects on disease have reduced effects on fitness and escape negative selection.
Affiliation(s)
- Martin Jinye Zhang
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Arun Durvasula
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Colby Chiang
  - Department of Pediatrics, Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA
- Evan M Koch
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Benjamin J Strober
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Huwenbo Shi
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Alison R Barton
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Samuel S Kim
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Omer Weissbrod
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Po-Ru Loh
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Steven Gazal
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
  - Department of Quantitative and Computational Biology, University of Southern California
  - Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California
- Shamil Sunyaev
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Alkes L Price
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
36
Li H, Mazumder R, Lin X. Accurate and efficient estimation of local heritability using summary statistics and the linkage disequilibrium matrix. Nat Commun 2023; 14:7954. [PMID: 38040712 PMCID: PMC10692177 DOI: 10.1038/s41467-023-43565-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 11/14/2023] [Indexed: 12/03/2023] Open
Abstract
Existing SNP-heritability estimators that leverage summary statistics from genome-wide association studies (GWAS) are much less efficient (i.e., have larger standard errors) than the restricted maximum likelihood (REML) estimators, which require access to individual-level data. We introduce a new method for local heritability estimation, Heritability Estimation with high Efficiency using LD and association Summary Statistics (HEELS), that significantly improves the statistical efficiency of summary-statistics-based heritability estimators and attains statistical efficiency comparable to that of REML (with a relative statistical efficiency >92%). Moreover, we propose representing the empirical LD matrix as the sum of a low-rank matrix and a banded matrix. We show that this way of modeling the LD can not only reduce the storage and memory cost, but also improve the computational efficiency of heritability estimation. We demonstrate the statistical efficiency of HEELS and the advantages of our proposed LD approximation strategies both in simulations and through empirical analyses of the UK Biobank data.
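To make the "low-rank plus banded" idea in the abstract above concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the HEELS algorithm: the one-pass scheme (keep a band around the diagonal, then take the leading eigencomponents of what is left), the function name `banded_plus_low_rank`, the parameters `bandwidth` and `rank`, and the simulated genotypes are all hypothetical choices made for this example.

```python
import numpy as np


def banded_plus_low_rank(ld, bandwidth, rank):
    """Split a symmetric LD matrix into a banded part and a low-rank part.

    Hypothetical one-pass scheme for illustration only; it is not the
    procedure used by HEELS.
    """
    p = ld.shape[0]
    idx = np.arange(p)
    # Banded part: keep entries within `bandwidth` positions of the diagonal.
    in_band = np.abs(idx[:, None] - idx[None, :]) <= bandwidth
    banded = np.where(in_band, ld, 0.0)
    # Low-rank part: leading eigencomponents (by magnitude) of the remainder.
    residual = ld - banded
    eigvals, eigvecs = np.linalg.eigh(residual)
    top = np.argsort(np.abs(eigvals))[::-1][:rank]
    low_rank = (eigvecs[:, top] * eigvals[top]) @ eigvecs[:, top].T
    return banded, low_rank


# Toy LD matrix: correlations among 30 simulated SNPs in 200 individuals.
rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.3, size=(200, 30)).astype(float)
ld = np.corrcoef(genotypes, rowvar=False)

banded, low_rank = banded_plus_low_rank(ld, bandwidth=3, rank=5)
error = np.linalg.norm(ld - (banded + low_rank))
print(f"Frobenius error of the approximation: {error:.4f}")
```

In this sketch the banded part needs on the order of p × bandwidth stored values and the low-rank part on the order of p × rank, compared with p² for the full matrix, which is the kind of storage saving the abstract refers to.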
Affiliation(s)
- Hui Li
  - Harvard T.H. Chan School of Public Health, Department of Biostatistics, Boston, MA, USA
- Rahul Mazumder
  - Massachusetts Institute of Technology, Operations Research and Statistics group, Cambridge, MA, USA
- Xihong Lin
  - Harvard T.H. Chan School of Public Health, Department of Biostatistics, Boston, MA, USA
  - Harvard University, Department of Statistics, Cambridge, MA, USA

37
Meher PK, Gupta A, Rustgi S, Mir RR, Kumar A, Kumar J, Balyan HS, Gupta PK. Evaluation of eight Bayesian genomic prediction models for three micronutrient traits in bread wheat (Triticum aestivum L.). THE PLANT GENOME 2023; 16:e20332. [PMID: 37122189 DOI: 10.1002/tpg2.20332] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/21/2022] [Revised: 02/21/2023] [Accepted: 03/13/2023] [Indexed: 06/19/2023]
Abstract
In wheat, genomic prediction accuracy (GPA) was assessed for three micronutrient traits (grain iron, grain zinc, and β-carotenoid concentrations) using eight Bayesian regression models. For this purpose, data on 246 accessions, each genotyped with 17,937 DArT markers, were utilized. The phenotypic data on traits were available for 2013-2014 from Powerkheda (Madhya Pradesh) and for 2014-2015 from Meerut (Uttar Pradesh), India. The accuracy of the models was measured in terms of reliability, which was computed following a repeated cross-validation approach. The predictions were obtained independently for each of the two environments after adjusting for the local effects and across environments after adjusting for the environmental effects. The Bayes ridge regression (BayesRR) model outperformed the other seven models, whereas BayesLASSO (BayesL) was the least efficient. The GPA increased with an increase in the size of the training set as well as with an increase in marker density. The GPA values differed for the three traits and were higher for the best linear unbiased estimate (BLUE) (obtained after adjusting for the environmental effects) relative to those for the two environments. The GPA also remained unaffected after accounting for the population structure. The results of the present study suggest that only the best model should be used for the estimations of genomic estimated breeding values (GEBVs) before their use for genomic selection to improve the grain micronutrient contents.
Affiliation(s)
- Prabina Kumar Meher
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Ajit Gupta
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Sachin Rustgi
  - Department of Plant and Environmental Sciences, Pee Dee Research and Education Centre, Clemson University, Florence, South Carolina, USA
- Reyazul Rouf Mir
  - Division of Genetics and Plant Breeding, SKUAST-Kashmir, Kashmir, India
- Anuj Kumar
  - Department of Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Laboratory of Immunity, Shantou University Medical College, Shantou, People's Republic of China
- Jitendra Kumar
  - National Agri-Food Biotechnology Institute (NABI), Ajitgarh, India
- Harindra Singh Balyan
  - Department of Genetics and Plant Breeding, Chaudhary Charan Singh University, Meerut, India
- Pushpendra Kumar Gupta
  - Department of Genetics and Plant Breeding, Chaudhary Charan Singh University, Meerut, India

38
de Carvalho CF, Slate J, Villoutreix R, Soria-Carrasco V, Riesch R, Feder JL, Gompert Z, Nosil P. DNA methylation differences between stick insect ecotypes. Mol Ecol 2023; 32:6809-6823. [PMID: 37864542 DOI: 10.1111/mec.17165] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/15/2023] [Revised: 09/12/2023] [Accepted: 09/25/2023] [Indexed: 10/23/2023]
Abstract
Epigenetic mechanisms, such as DNA methylation, can influence gene regulation and affect phenotypic variation, raising the possibility that they contribute to ecological adaptation. Beginning to address this issue requires high-resolution sequencing studies of natural populations to pinpoint epigenetic regions of potential ecological and evolutionary significance. However, such studies are still relatively uncommon, especially in insects, and are mainly restricted to a few model organisms. Here, we characterize patterns of DNA methylation for natural populations of Timema cristinae adapted to two host plant species (i.e. ecotypes). By integrating results from sequencing of whole transcriptomes, genomes and methylomes, we investigate whether environmental, host and genetic differences of these stick insects are associated with methylation levels of cytosine nucleotides in the CpG context. We report an overall genome-wide methylation level for T. cristinae of ~14%, with methylation being enriched in gene bodies and impoverished in repetitive elements. Genome-wide DNA methylation variation was strongly positively correlated with genetic distance (relatedness), but also exhibited significant host-plant effects. Using methylome-environment association analysis, we pinpointed specific genomic regions that are differentially methylated between ecotypes, with these regions being enriched for genes with functions in membrane processes. The observed association between methylation variation and genetic relatedness, and with the ecologically important variable of host plant, suggests a potential role for epigenetic modification in T. cristinae adaptation. To substantiate such adaptive significance, future studies could test whether methylation can be transmitted across generations and the extent to which it responds to experimental manipulation in field and laboratory studies.
Affiliation(s)
- Jon Slate
  - School of Biosciences, University of Sheffield, Sheffield, UK
- Rüdiger Riesch
  - University of Montpellier, CEFE, CNRS, EPHE, IRD, Montpellier, France
  - Department of Biological Sciences, Centre for Ecology, Evolution and Behaviour, Royal Holloway University of London, Egham, UK
- Jeffrey L Feder
  - Department of Biology, Notre Dame University, South Bend, Indiana, USA
- Patrik Nosil
  - School of Biosciences, University of Sheffield, Sheffield, UK
  - University of Montpellier, CEFE, CNRS, EPHE, IRD, Montpellier, France

39
Kuang T, Hu C, Shaw RK, Zhang Y, Fan J, Bi Y, Jiang F, Guo R, Fan X. A potential candidate gene associated with the angles of the ear leaf and the second leaf above the ear leaf in maize. BMC PLANT BIOLOGY 2023; 23:540. [PMID: 37924003 PMCID: PMC10625212 DOI: 10.1186/s12870-023-04553-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/08/2023] [Accepted: 10/22/2023] [Indexed: 11/06/2023]
Abstract
BACKGROUND Leaf angle is a key trait of maize plant architecture that plays a significant role in morphological development and ultimately affects maize grain yield. Although many studies have been conducted on the association and localization of genes regulating leaf angle in maize, most of the candidate genes identified are associated with the regulation of ligule-ear development and phytohormone pathways, and only a few candidate genes have been reported to enhance the mechanical strength of leaf midrib and vascular tissues. RESULTS To address this gap, we conducted a genome-wide association study (GWAS) using the leaf angle phenotype and genotyping-by-sequencing data generated from three recombinant inbred line (RIL) populations of maize. Through GWAS analysis, we identified 156 SNPs significantly associated with the leaf angle trait and detected a total of 68 candidate genes located within 10 kb upstream and downstream of these individual SNPs. Among these candidate genes, Zm00001d045408, located on chromosome 9, emerged as a key gene controlling the angles of both the ear leaf and the second leaf above the ear leaf. Notably, this new gene's homolog in Arabidopsis promotes cell division and vascular tissue development. Further analysis revealed that a SNP transversion (G/T) at 7.536 kb downstream of the candidate gene Zm00001d045408 may have caused a reduction in leaf angles of the ear and the second leaf above the ear leaf. Our analysis of the 10 kb region downstream of this candidate gene revealed a 4.337 kb solo long-terminal reverse transcription transposon (solo LTR), located 3.112 kb downstream of Zm00001d045408, with the SNP located 87 bp upstream of the solo LTR. CONCLUSIONS In summary, we have identified a novel candidate gene, Zm00001d045408, and a solo LTR that are associated with the angles of both the ear leaf and the second leaf above the ear leaf. Future research holds great potential for exploring the precise role of the newly identified candidate gene in leaf angle regulation. Functional characterization of this gene can help in gaining deeper insights into the complex genetic pathways underlying maize plant architecture.
Affiliation(s)
- Tianhui Kuang
  - Institute of Food Crops, Yunnan Academy of Agricultural Sciences, Kunming, China
- Can Hu
  - Institute of Food Crops, Yunnan Academy of Agricultural Sciences, Kunming, China
  - School of Agriculture, Yunnan University, Kunming, China
- Ranjan Kumar Shaw
  - Institute of Food Crops, Yunnan Academy of Agricultural Sciences, Kunming, China
- Yudong Zhang
  - Institute of Food Crops, Yunnan Academy of Agricultural Sciences, Kunming, China
- Jun Fan
  - Institute of Food Crops, Yunnan Academy of Agricultural Sciences, Kunming, China
- Yaqi Bi
  - Institute of Food Crops, Yunnan Academy of Agricultural Sciences, Kunming, China
- Fuyan Jiang
  - Institute of Food Crops, Yunnan Academy of Agricultural Sciences, Kunming, China
- Ruijia Guo
  - Institute of Food Crops, Yunnan Academy of Agricultural Sciences, Kunming, China
- Xingming Fan
  - Institute of Food Crops, Yunnan Academy of Agricultural Sciences, Kunming, China

40
Zhou Y, Luo K, Liang L, Chen M, He X. A new Bayesian factor analysis method improves detection of genes and biological processes affected by perturbations in single-cell CRISPR screening. Nat Methods 2023; 20:1693-1703. [PMID: 37770710 PMCID: PMC10630124 DOI: 10.1038/s41592-023-02017-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/11/2022] [Accepted: 08/18/2023] [Indexed: 09/30/2023]
Abstract
Clustered regularly interspaced short palindromic repeats (CRISPR) screening coupled with single-cell RNA sequencing has emerged as a powerful tool to characterize the effects of genetic perturbations on the whole transcriptome at a single-cell level. However, due to its sparsity and complex structure, analysis of single-cell CRISPR screening data is challenging. In particular, standard differential expression analysis methods are often underpowered to detect genes affected by CRISPR perturbations. We developed a statistical method for such data, called guided sparse factor analysis (GSFA). GSFA infers latent factors that represent coregulated genes or gene modules; by borrowing information from these factors, it infers the effects of genetic perturbations on individual genes. We demonstrated through extensive simulation studies that GSFA detects perturbation effects with much higher power than state-of-the-art methods. Using single-cell CRISPR data from human CD8+ T cells and neural progenitor cells, we showed that GSFA identified biologically relevant gene modules and specific genes affected by CRISPR perturbations, many of which were missed by existing methods, providing new insights into the functions of genes involved in T cell activation and neurodevelopment.
Affiliation(s)
- Yifan Zhou
  - Graduate Program of Biophysical Sciences, University of Chicago, Chicago, IL, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Kaixuan Luo
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Lifan Liang
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Mengjie Chen
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
  - Department of Medicine, University of Chicago, Chicago, IL, USA
- Xin He
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA

41
Dowell JA, Mason C. Candidate pathway association and genome-wide association approaches reveal alternative genetic architectures of carotenoid content in cultivated sunflower (Helianthus annuus). APPLICATIONS IN PLANT SCIENCES 2023; 11:e11558. [PMID: 38106540 PMCID: PMC10719882 DOI: 10.1002/aps3.11558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Revised: 05/10/2023] [Accepted: 05/19/2023] [Indexed: 12/19/2023]
Abstract
Premise The explosion of available genomic data poses significant opportunities and challenges for genome-wide association studies. Current approaches via linear mixed models (LMM) are straightforward but prevent flexible assumptions of an a priori genomic architecture, while Bayesian sparse LMMs (BSLMMs) allow this flexibility. Complex traits, such as specialized metabolites, are subject to various hierarchical effects, including gene regulation, enzyme efficiency, and the availability of reactants. Methods To identify alternative genetic architectures, we examined the genetic architecture underlying the carotenoid content of an association mapping panel of Helianthus annuus individuals using multiple BSLMM and LMM frameworks. Results The LMMs of genome-wide single-nucleotide polymorphisms (SNPs) identified a single transcription factor responsible for the observed variations in the carotenoid content; however, a BSLMM of the SNPs with the bottom 1% of effect sizes from the results of the LMM identified multiple biologically relevant quantitative trait loci (QTLs) for carotenoid content external to the known (annotated) carotenoid pathway. A candidate pathway analysis (CPA) suggested a β-carotene isomerase to be the enzyme with the highest impact on the observed carotenoid content within the carotenoid pathway. Discussion While traditional LMM approaches suggested a single unknown transcription factor associated with carotenoid content variation in sunflower petals, BSLMM proposed several QTLs with interpretable biological relevance to this trait. In addition, the CPA allowed for the dissection of the regulatory vs. biosynthetic genetic architectures underlying this metabolic trait.
Affiliation(s)
- Jordan A. Dowell
  - Department of Plant Sciences, University of California, Davis, California 95616, USA
  - Present address: Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana 70803, USA
- Chase Mason
  - Department of Biology, University of Central Florida, Orlando, Florida 32816, USA

42
John M, Lencz T. Potential application of elastic nets for shared polygenicity detection with adapted threshold selection. Int J Biostat 2023; 19:417-438. [PMID: 36327464 PMCID: PMC10154439 DOI: 10.1515/ijb-2020-0108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2020] [Accepted: 10/05/2022] [Indexed: 11/06/2022]
Abstract
Current research suggests that hundreds to thousands of single nucleotide polymorphisms (SNPs) with small to modest effect sizes contribute to the genetic basis of many disorders, a phenomenon labeled as polygenicity. Additionally, many such disorders demonstrate polygenic overlap, in which risk alleles are shared at associated genetic loci. A simple strategy to detect polygenic overlap between two phenotypes is based on rank-ordering the univariate p-values from two genome-wide association studies (GWASs). Although high-dimensional variable selection strategies such as Lasso and elastic nets have been utilized in other GWAS analysis settings, they are yet to be utilized for detecting shared polygenicity. In this paper, we illustrate how elastic nets, with polygenic scores as the dependent variable and with appropriate adaptation in selecting the penalty parameter, may be utilized for detecting a subset of SNPs involved in shared polygenicity. We provide theory to better understand our approaches, and illustrate their utility using synthetic datasets. Results from extensive simulations are presented comparing the elastic net approaches with the rank-ordering approach in various scenarios. The simulation results show one of the elastic net approaches to be superior when the correlations among the SNPs are high. Finally, we apply the methods to two real datasets to further illustrate the capabilities, limitations and differences among the methods.
Affiliation(s)
- Majnu John
  - Institute of Behavioral Science, Feinstein Institutes of Medical Research, Manhasset, NY
  - Division of Psychiatry Research, The Zucker Hillside Hospital, Northwell Health System, Glen Oaks, NY
  - Departments of Psychiatry and of Mathematics, Hofstra University, Hempstead, NY
- Todd Lencz
  - Institute of Behavioral Science, Feinstein Institutes of Medical Research, Manhasset, NY
  - Division of Psychiatry Research, The Zucker Hillside Hospital, Northwell Health System, Glen Oaks, NY
  - Departments of Psychiatry and of Molecular Medicine, Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY

43
Hai Y, Zhao W, Meng Q, Liu L, Wen Y. Bayesian linear mixed model with multiple random effects for family-based genetic studies. Front Genet 2023; 14:1267704. [PMID: 37928242 PMCID: PMC10620972 DOI: 10.3389/fgene.2023.1267704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 09/25/2023] [Indexed: 11/07/2023]
Abstract
Motivation: Family-based study design is one of the popular designs used in genetic research, and the whole-genome sequencing data obtained from family-based studies offer many unique features for risk prediction studies. They can not only provide a more comprehensive view of many complex diseases, but also utilize information in the design to further improve the prediction accuracy. While promising, existing analytical methods often ignore the information embedded in the study design and overlook the predictive effects of rare variants, leading to a prediction model with sub-optimal performance. Results: We proposed a Bayesian linear mixed model for the prediction analysis of sequencing data obtained from family-based studies. Our method can not only capture predictive effects from both common and rare variants, but also easily accommodate various disease model assumptions. It uses information embedded in the study design to form surrogates, where the predictive effects from unmeasured/unknown genetic and environmental risk factors can be modelled. Through extensive simulation studies and the analysis of sequencing data obtained from the Michigan State University Twin Registry study, we have demonstrated that the proposed method outperforms commonly adopted techniques. Availability: R package is available at https://github.com/yhai943/FBLMM.
Affiliation(s)
- Yang Hai
  - Department of Statistics, University of Auckland, Auckland, New Zealand
- Wenxuan Zhao
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Qingyu Meng
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Long Liu
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Yalu Wen
  - Department of Statistics, University of Auckland, Auckland, New Zealand

44
Zhu D, Zhao Y, Zhang R, Wu H, Cai G, Wu Z, Wang Y, Hu X. Genomic prediction based on selective linkage disequilibrium pruning of low-coverage whole-genome sequence variants in a pure Duroc population. Genet Sel Evol 2023; 55:72. [PMID: 37853325 PMCID: PMC10583454 DOI: 10.1186/s12711-023-00843-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Accepted: 09/14/2023] [Indexed: 10/20/2023]
Abstract
BACKGROUND Although the accumulation of whole-genome sequencing (WGS) data has accelerated the identification of mutations underlying complex traits, its impact on the accuracy of genomic predictions is limited. Reliable genotyping data and pre-selected beneficial loci can be used to improve prediction accuracy. Previously, we reported a low-coverage sequencing genotyping method that yielded 11.3 million highly accurate single-nucleotide polymorphisms (SNPs) in pigs. Here, we introduce a method termed selective linkage disequilibrium pruning (SLDP), which refines the set of SNPs that show a large gain during prediction of complex traits using whole-genome SNP data. RESULTS We used the SLDP method to identify and select markers among millions of SNPs based on genome-wide association study (GWAS) prior information. We evaluated the performance of SLDP with respect to three real traits and six simulated traits with varying genetic architectures using two representative models (genomic best linear unbiased prediction and BayesR) on samples from 3579 Duroc boars. SLDP was determined by testing 180 combinations of two core parameters (GWAS P-value thresholds and linkage disequilibrium r²). The parameters for each trait were optimized in the training population by fivefold cross-validation and then tested in the validation population. Similar to previous GWAS prior-based methods, the performance of SLDP was mainly affected by the genetic architecture of the traits analyzed. Specifically, SLDP performed better for traits controlled by major quantitative trait loci (QTL) or a small number of quantitative trait nucleotides (QTN). Compared with two commercial SNP chips, genotyping-by-sequencing data, and an unselected whole-genome SNP panel, the SLDP strategy led to significant improvements in prediction accuracy, which ranged from 0.84 to 3.22% for real traits controlled by major or moderate QTL and from 1.23 to 11.47% for simulated traits controlled by a small number of QTN. CONCLUSIONS The SLDP marker selection method can be incorporated into mainstream prediction models to yield accuracy improvements for traits with a relatively simple genetic architecture; however, it has no significant advantage for traits not controlled by major QTL. The main factors that affect its performance are the genetic architecture of traits and the reliability of GWAS prior information. Our findings can facilitate the application of WGS-based genomic selection.
Affiliation(s)
- Di Zhu
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Yiqiang Zhao
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Ran Zhang
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Hanyu Wu
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
  - National Research Facility for Phenotypic and Genotypic Analysis of Model Animals (Beijing), China Agricultural University, Beijing, China
- Gengyuan Cai
  - National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
- Zhenfang Wu
  - National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
- Yuzhe Wang
  - National Research Facility for Phenotypic and Genotypic Analysis of Model Animals (Beijing), China Agricultural University, Beijing, China
- Xiaoxiang Hu
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China

45
Xu C, Ganesh SK, Zhou X. mtPGS: Leverage multiple correlated traits for accurate polygenic score construction. Am J Hum Genet 2023; 110:1673-1689. [PMID: 37716346 PMCID: PMC10577082 DOI: 10.1016/j.ajhg.2023.08.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/04/2023] [Revised: 08/18/2023] [Accepted: 08/27/2023] [Indexed: 09/18/2023]
Abstract
Accurate polygenic scores (PGSs) facilitate the genetic prediction of complex traits and aid in the development of personalized medicine. Here, we develop a statistical method called multi-trait assisted PGS (mtPGS), which can construct accurate PGSs for a target trait of interest by leveraging multiple traits relevant to the target trait. Specifically, mtPGS borrows SNP effect size similarity information between the target trait and its relevant traits to improve the effect size estimation on the target trait, thus achieving accurate PGSs. In the process, mtPGS flexibly models the shared genetic architecture between the target and the relevant traits to achieve robust performance, while explicitly accounting for the environmental covariance among them to accommodate different study designs with various sample overlap patterns. In addition, mtPGS uses only summary statistics as input and relies on a deterministic algorithm with several algebraic techniques for scalable computation. We evaluate the performance of mtPGS through comprehensive simulations and applications to 25 traits in the UK Biobank, where in the real data mtPGS achieves an average of 0.90%-52.91% accuracy gain compared to the state-of-the-art PGS methods. Overall, mtPGS represents an accurate, fast, and robust solution for PGS construction in biobank-scale datasets.
Affiliation(s)
- Chang Xu
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Santhi K Ganesh
  - Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, MI, USA; Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Xiang Zhou
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA

46
Riehl JFL, Cole CT, Morrow CJ, Barker HL, Bernhardsson C, Rubert‐Nason K, Ingvarsson PK, Lindroth RL. Genomic and transcriptomic analyses reveal polygenic architecture for ecologically important traits in aspen (Populus tremuloides Michx.). Ecol Evol 2023; 13:e10541. [PMID: 37780087 PMCID: PMC10534199 DOI: 10.1002/ece3.10541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 08/30/2023] [Accepted: 09/04/2023] [Indexed: 10/03/2023]
Abstract
Intraspecific genetic variation in foundation species such as aspen (Populus tremuloides Michx.) shapes their impact on forest structure and function. Identifying genes underlying ecologically important traits is key to understanding that impact. Previous studies, using single-locus genome-wide association (GWA) analyses to identify candidate genes, have identified fewer genes than anticipated for highly heritable quantitative traits. Mounting evidence suggests that polygenic control of quantitative traits is largely responsible for this "missing heritability" phenomenon. Our research characterized the genetic architecture of 30 ecologically important traits using a common garden of aspen through genomic and transcriptomic analyses. A multilocus association model revealed that most traits displayed a highly polygenic architecture, with most variation explained by loci with small effects (likely below the detection levels of single-locus GWA methods). Consistent with a polygenic architecture, our single-locus GWA analyses found only 38 significant SNPs in 22 genes across 15 traits. Next, we used differential expression analysis on a subset of aspen genets with divergent concentrations of salicinoid phenolic glycosides (key defense traits). This complementary method to traditional GWA discovered 1243 differentially expressed genes for a polygenic trait. Soft clustering analysis revealed three gene clusters (241 candidate genes) involved in secondary metabolite biosynthesis and regulation. Our work reveals that ecologically important traits governing higher-order community- and ecosystem-level attributes of a foundation forest tree species have complex underlying genetic structures and will require methods beyond traditional GWA analyses to unravel.
Affiliation(s)
- Clay J. Morrow
  - Department of Forest and Wildlife Ecology, University of Wisconsin‐Madison, Madison, Wisconsin, USA
- Hilary L. Barker
  - Department of Entomology, University of Wisconsin‐Madison, Madison, Wisconsin, USA
  - Present address: Office of Student Success, Wisconsin Technical College System, Madison, Wisconsin, USA
- Carolina Bernhardsson
  - Department of Ecology and Environmental Science, Umeå University, Umeå, Sweden
  - Present address: Department of Organismal Biology, Center for Evolutionary Biology, Uppsala University, Uppsala, Sweden
- Kennedy Rubert‐Nason
  - Department of Entomology, University of Wisconsin‐Madison, Madison, Wisconsin, USA
  - Present address: Division of Natural Sciences, University of Maine at Fort Kent, Fort Kent, Maine, USA
- Pär K. Ingvarsson
  - Department of Plant Biology, Swedish University of Agricultural Sciences, Uppsala BioCenter, Uppsala, Sweden

47
Xie L, Qin J, Rao L, Cui D, Tang X, Chen L, Xiao S, Zhang Z, Huang L. Genetic dissection and genomic prediction for pork cuts and carcass morphology traits in pig. J Anim Sci Biotechnol 2023; 14:116. [PMID: 37660101 PMCID: PMC10475202 DOI: 10.1186/s40104-023-00914-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/11/2023] [Accepted: 07/02/2023] [Indexed: 09/04/2023]
Abstract
BACKGROUND As pre-cut and pre-packaged chilled meat becomes increasingly popular, integrating the carcass-cutting process into the pig industry chain has become a trend. Identifying quantitative trait loci (QTLs) of pork cuts would facilitate the selection of pigs with a higher overall value. However, previous studies focused solely on evaluating the phenotypic and genetic parameters of pork cuts and did not investigate the QTLs influencing these traits. This study involved 17 pork cuts and 12 morphology traits from 2,012 pigs across four populations genotyped using CC1 PorcineSNP50 BeadChips. Our aim was to identify QTLs and evaluate the accuracy of genomic estimated breeding values (GEBVs) for pork cuts. RESULTS We identified 14 QTLs and 112 QTLs for 17 pork cuts by GWAS using haplotype and imputation genotypes, respectively. Specifically, we found that HMGA1, VRTN and BMP2 were associated with body length and weight. Subsequent analysis revealed that HMGA1 primarily affects the size of fore leg bones, VRTN primarily affects the number of vertebrae, and BMP2 primarily affects the length of vertebrae and the size of hind leg bones. The prediction accuracy was defined as the correlation between the adjusted phenotype and GEBVs in the validation population, divided by the square root of the trait's heritability. The prediction accuracy of GEBVs for pork cuts varied from 0.342 to 0.693. Notably, ribs, boneless picnic shoulder, tenderloin, hind leg bones, and scapula bones exhibited prediction accuracies exceeding 0.600. Employing better models, increasing marker density through genotype imputation, and pre-selecting markers significantly improved the prediction accuracy of GEBVs. CONCLUSIONS We performed the first study to dissect the genetic mechanism of pork cuts and identified a large number of significant QTLs and potential candidate genes. These findings carry significant implications for the breeding of pork cuts through marker-assisted and genomic selection. Additionally, we have constructed the first reference populations for genomic selection of pork cuts in pigs.
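The prediction accuracy defined in this abstract (the correlation between adjusted phenotypes and GEBVs in the validation population, divided by the square root of the trait's heritability) can be illustrated with a minimal sketch. The function name, argument names, and toy data below are hypothetical and are not taken from the paper; only the formula itself follows the abstract's wording.

```python
import numpy as np


def prediction_accuracy(adjusted_phenotype, gebv, heritability):
    """Return cor(adjusted phenotype, GEBV) / sqrt(h2) for a validation set.

    All names here are illustrative assumptions, not the authors' code.
    """
    y = np.asarray(adjusted_phenotype, dtype=float)
    g = np.asarray(gebv, dtype=float)
    if y.shape != g.shape or y.ndim != 1:
        raise ValueError("phenotypes and GEBVs must be 1-D arrays of equal length")
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must lie in (0, 1]")
    # Pearson correlation between adjusted phenotypes and predicted breeding values
    r = np.corrcoef(y, g)[0, 1]
    # Scale by sqrt(h2), as described in the abstract
    return r / np.sqrt(heritability)


if __name__ == "__main__":
    # Toy validation set: simulated values, purely for illustration
    rng = np.random.default_rng(0)
    true_bv = rng.normal(size=200)
    phenotype = true_bv + rng.normal(size=200)           # breeding value plus noise
    gebv = true_bv + rng.normal(scale=0.5, size=200)     # imperfect prediction
    print(prediction_accuracy(phenotype, gebv, heritability=0.5))
```

Dividing by the square root of heritability rescales the phenotype-based correlation, since phenotypes are only a noisy proxy for the breeding values that GEBVs aim to predict.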
Affiliation(s)
- Lei Xie
  - State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
- Jiangtao Qin
  - State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
- Lin Rao
  - State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
- Dengshuai Cui
  - State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
- Xi Tang
  - State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
- Liqing Chen
  - State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
- Shijun Xiao
  - State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
- Zhiyan Zhang
  - State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
- Lusheng Huang
  - State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
48
Mai J, Lu M, Gao Q, Zeng J, Xiao J. Transcriptome-wide association studies: recent advances in methods, applications and available databases. Commun Biol 2023; 6:899. [PMID: 37658226 PMCID: PMC10474133 DOI: 10.1038/s42003-023-05279-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/21/2023] [Accepted: 08/24/2023] [Indexed: 09/03/2023]
Abstract
Genome-wide association studies have identified many variants impacting heritable traits. Nevertheless, identifying the critical genes underlying those significant variants remains a major challenge. Transcriptome-wide association study (TWAS) is an instrumental post-analysis for detecting significant gene-trait associations by modeling transcription-level regulation, and it has made considerable progress in recent years. By leveraging expression quantitative trait loci (eQTL) regulation information, TWAS has advantages in detecting functional genes regulated by disease-associated variants, thus providing insight into the mechanisms of diseases and other phenotypes. Considering its vast potential, this review article comprehensively summarizes TWAS, including the methodology, applications and available resources.
Affiliation(s)
- Jialin Mai
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Mingming Lu
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Qianwen Gao
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Jingyao Zeng
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- Jingfa Xiao
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
49
Stamp J, DenAdel A, Weinreich D, Crawford L. Leveraging the genetic correlation between traits improves the detection of epistasis in genome-wide association studies. G3 (BETHESDA, MD.) 2023; 13:jkad118. [PMID: 37243672 PMCID: PMC10484060 DOI: 10.1093/g3journal/jkad118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 01/11/2023] [Accepted: 05/23/2023] [Indexed: 05/29/2023]
Abstract
Epistasis, commonly defined as the interaction between genetic loci, is known to play an important role in the phenotypic variation of complex traits. As a result, many statistical methods have been developed to identify genetic variants that are involved in epistasis, and nearly all of these approaches carry out this task by analyzing one trait at a time. Previous studies have shown that jointly modeling multiple phenotypes can often dramatically increase statistical power for association mapping. In this study, we present the "multivariate MArginal ePIstasis Test" (mvMAPIT), a multioutcome generalization of a recently proposed epistatic detection method which seeks to detect marginal epistasis, or the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with conventional explicit search-based methods. Our proposed mvMAPIT builds upon this strategy by taking advantage of the correlation structure between traits to improve the identification of variants involved in epistasis. We formulate mvMAPIT as a multivariate linear mixed model and develop a multitrait variance component estimation algorithm for efficient parameter inference and P-value computation. Together with reasonable model approximations, our proposed approach is scalable to moderately sized genome-wide association studies. With simulations, we illustrate the benefits of mvMAPIT over univariate (or single-trait) epistatic mapping strategies. We also apply the mvMAPIT framework to protein sequence data from two broadly neutralizing anti-influenza antibodies and to approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics. The mvMAPIT R package can be downloaded at https://github.com/lcrawlab/mvMAPIT.
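The idea of marginal epistasis described in this abstract (the combined pairwise interaction effects between one variant and all other variants, without enumerating partners) can be sketched for a single trait. This is a rough illustration under assumptions: it assumes the summed interaction term for a focal variant k is treated as a random effect whose covariance is D_k K D_k, where D_k is the diagonal matrix of the focal genotype and K is a relatedness matrix built from the remaining variants. It does not reproduce the mvMAPIT R package API or its multivariate estimation algorithm, and all function names are hypothetical.

```python
import numpy as np


def marginal_epistasis_kernel(X, k):
    """Covariance of the summed interactions between variant k and all others.

    X is an (individuals x variants) genotype matrix. Computes D_k K D_k
    without forming any pairwise interaction column. Hypothetical helper,
    not part of the mvMAPIT package.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if p < 2:
        raise ValueError("need at least two variants")
    others = np.delete(X, k, axis=1)          # genotypes of all variants except k
    K = others @ others.T / (p - 1)           # relatedness from the remaining variants
    x_k = X[:, k]
    return x_k[:, None] * K * x_k[None, :]    # equivalent to diag(x_k) @ K @ diag(x_k)


def explicit_interaction_kernel(X, k):
    """Same covariance, built the slow way from explicit interaction columns."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    others = np.delete(X, k, axis=1)
    interactions = X[:, [k]] * others         # columns x_k * x_l for every l != k
    return interactions @ interactions.T / (p - 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 20))             # simulated, standardized genotypes
    G_fast = marginal_epistasis_kernel(X, k=3)
    G_slow = explicit_interaction_kernel(X, k=3)
    print(np.allclose(G_fast, G_slow))
```

The two functions agree because summing the outer products of the interaction columns factors into the focal genotype times a shared relatedness matrix, which is why testing one variance component per variant avoids an explicit search over all pairs.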
Affiliation(s)
- Julian Stamp
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Alan DenAdel
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Daniel Weinreich
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
  - Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02906, USA
- Lorin Crawford
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
  - Department of Biostatistics, Brown University, Providence, RI 02903, USA
  - Microsoft Research New England, Cambridge, MA 02142, USA
50
Neto C, Hancock A. Genetic Architecture of Flowering Time Differs Between Populations With Contrasting Demographic and Selective Histories. Mol Biol Evol 2023; 40:msad185. [PMID: 37603463 PMCID: PMC10461413 DOI: 10.1093/molbev/msad185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 08/09/2023] [Accepted: 08/10/2023] [Indexed: 08/23/2023]
Abstract
Understanding the evolutionary factors that impact the genetic architecture of traits is a central goal of evolutionary genetics. Here, we investigate how quantitative trait variation accumulated over time in populations that colonized a novel environment. We compare the genetic architecture of flowering time in Arabidopsis populations from the drought-prone Cape Verde Islands and their closest outgroup population from North Africa. We find that trait polygenicity is severely reduced in the island populations compared to the continental North African population. Further, trait architectures and reconstructed allelic histories best fit a model of strong directional selection in the islands, in accord with a Fisher-Orr adaptive walk. Consistent with this, we find that large-effect variants that disrupt major flowering time genes (FRI and FLC) arose first, followed by smaller-effect variants, including ATX2 L125F, which is associated with a 4-day reduction in flowering time. The most recently arising flowering time-associated loci are not known to be directly involved in flowering time, consistent with an omnigenic signature developing as the population approaches its trait optimum. Surprisingly, we find no effect in the natural population of EDI-Cvi-0 (CRY2 V367M), an allele for which an effect was previously validated by introgression into a Eurasian line. Instead, our results suggest that the previously observed effect of the EDI-Cvi-0 allele on flowering time likely depends on genetic background, due to an epistatic interaction. Altogether, our results provide an empirical example of the effects that demographic history and selection have on trait architecture.
Affiliation(s)
- Célia Neto
  - Molecular Basis of Adaptation Research Group, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Angela Hancock
  - Molecular Basis of Adaptation Research Group, Max Planck Institute for Plant Breeding Research, Cologne, Germany