1
Kunkel D, Sørensen P, Shankar V, Morgante F. Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes. PLoS Genet 2025; 21:e1011519. [PMID: 39775068 DOI: 10.1371/journal.pgen.1011519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Accepted: 11/27/2024] [Indexed: 01/11/2025] Open
Abstract
Polygenic prediction of complex trait phenotypes has become important in human genetics, especially in the context of precision medicine. Recently, mr.mash, a flexible and computationally efficient method that models multiple phenotypes jointly and leverages sharing of effects across such phenotypes to improve prediction accuracy, was introduced. However, a drawback of mr.mash is that it requires individual-level data, which are often not publicly available. In this work, we introduce mr.mash-rss, an extension of the mr.mash model that requires only summary statistics from Genome-Wide Association Studies (GWAS) and linkage disequilibrium (LD) estimates from a reference panel. By using summary data, we achieve the twin goal of increasing the applicability of the mr.mash model to data sets that are not publicly available and making it scalable to biobank-size data. Through simulations, we show that mr.mash-rss is competitive with, and often outperforms, current state-of-the-art methods for single- and multi-phenotype polygenic prediction in a variety of scenarios that differ in the pattern of effect sharing across phenotypes, the number of phenotypes, the number of causal variants, and the genomic heritability. We also present a real data analysis of 16 blood cell phenotypes in the UK Biobank, showing that mr.mash-rss achieves higher prediction accuracy than competing methods for the majority of traits, especially when the data set has smaller sample size.
Affiliation(s)
- Deborah Kunkel
  - School of Mathematical and Statistical Sciences, Clemson University, Clemson, South Carolina, United States of America
- Peter Sørensen
  - Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Vijay Shankar
  - Center for Human Genetics, Clemson University, Greenwood, South Carolina, United States of America
- Fabio Morgante
  - Center for Human Genetics, Clemson University, Greenwood, South Carolina, United States of America
  - Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina, United States of America
2
Strober BJ, Zhang MJ, Amariuta T, Rossen J, Price AL. Fine-mapping causal tissues and genes at disease-associated loci. Nat Genet 2025:10.1038/s41588-024-01994-2. [PMID: 39747598 DOI: 10.1038/s41588-024-01994-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Accepted: 10/18/2024] [Indexed: 01/04/2025]
Abstract
Complex diseases often have distinct mechanisms spanning multiple tissues. We propose tissue-gene fine-mapping (TGFM), which infers the posterior inclusion probability (PIP) for each gene-tissue pair to mediate a disease locus by analyzing summary statistics and expression quantitative trait loci (eQTL) data; TGFM also assigns PIPs to non-mediated variants. TGFM accounts for co-regulation across genes and tissues and models uncertainty in cis-predicted expression models, enabling correct calibration. We applied TGFM to 45 UK Biobank diseases or traits using eQTL data from 38 Genotype-Tissue Expression (GTEx) tissues. TGFM identified an average of 147 PIP > 0.5 causal genetic elements per disease or trait, of which 11% were gene-tissue pairs. Causal gene-tissue pairs identified by TGFM reflected both known biology (for example, TPO-thyroid for hypothyroidism) and biologically plausible findings (for example, SLC20A2-artery aorta for diastolic blood pressure). Application of TGFM to single-cell eQTL data from nine cell types in peripheral blood mononuclear cells (PBMCs), analyzed jointly with GTEx tissues, identified 30 additional causal gene-PBMC cell type pairs.
Affiliation(s)
- Benjamin J Strober
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Martin Jinye Zhang
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Tiffany Amariuta
  - Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Jordan Rossen
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Alkes L Price
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
3
Likasitwatanakul P, Li Z, Doan P, Spisak S, Raghawan AK, Liu Q, Liow P, Lee S, Chen D, Bala P, Sahgal P, Aitymbayev D, Thalappillil JS, Papanastasiou M, Hawkins W, Carr SA, Park H, Cleary JM, Qi J, Sethi NS. Chemical perturbations impacting histone acetylation govern colorectal cancer differentiation. bioRxiv 2024:2024.12.06.626451. [PMID: 39713466 PMCID: PMC11661112 DOI: 10.1101/2024.12.06.626451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2024]
Abstract
Dysregulated epigenetic programs that restrict differentiation, reactivate fetal genes, and confer phenotypic plasticity are critical to colorectal cancer (CRC) development. By screening a small molecule library targeting epigenetic regulators using our dual reporter system, we found that inhibiting histone deacetylase (HDAC) 1/2 promotes CRC differentiation and anti-tumor activity. Comprehensive biochemical, chemical, and genetic experiments revealed that on-target blockade of the HDAC1/2 catalytic domain mediated the differentiated phenotype. Unbiased profiling of histone posttranslational modifications induced by HDAC1/2 inhibition nominated acetylation of specific histone lysine residues as potential regulators of differentiation. Genome-wide assessment of implicated marks indicated that H3K27ac gains at HDAC1/2-bound regions associated with open chromatin and upregulation of differentiation genes upon HDAC1/2 inhibition. Disrupting H3K27ac by degrading acetyltransferase EP300 rescued HDAC1/2 inhibitor-mediated differentiation of a patient-derived CRC model using single cell RNA-sequencing. Genetic screens revealed that DAPK3 contributes to CRC differentiation induced by HDAC1/2 inhibition. These results highlight the importance of specific chemically targetable histone modifications in governing cancer cell states and epigenetic reprogramming as a therapeutic strategy in CRC.
Affiliation(s)
- Pornlada Likasitwatanakul
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, Cambridge, MA, USA
  - Department of Medicine, Faculty of Medicine Siriraj Hospital, Bangkok, Thailand
- Zhixin Li
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, Cambridge, MA, USA
- Paul Doan
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, Cambridge, MA, USA
- Sandor Spisak
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Akhouri Kishore Raghawan
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, Cambridge, MA, USA
- Qi Liu
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Priscilla Liow
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Sunwoo Lee
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- David Chen
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Pratyusha Bala
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, Cambridge, MA, USA
- Pranshu Sahgal
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, Cambridge, MA, USA
- Daulet Aitymbayev
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, Cambridge, MA, USA
- Jennifer S. Thalappillil
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Gastrointestinal Cancer Center, Dana-Farber Cancer Institute, Boston, MA, USA
- Malvina Papanastasiou
  - Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, Cambridge, MA, USA
- William Hawkins
  - Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, Cambridge, MA, USA
- Steven A. Carr
  - Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, Cambridge, MA, USA
- Haeseong Park
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Gastrointestinal Cancer Center, Dana-Farber Cancer Institute, Boston, MA, USA
- James M. Cleary
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Gastrointestinal Cancer Center, Dana-Farber Cancer Institute, Boston, MA, USA
- Jun Qi
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Nilay S. Sethi
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, Cambridge, MA, USA
  - Gastrointestinal Cancer Center, Dana-Farber Cancer Institute, Boston, MA, USA
4
Morris AH, Bohannan BJM. Estimates of microbiome heritability across hosts. Nat Microbiol 2024; 9:3110-3119. [PMID: 39548346 DOI: 10.1038/s41564-024-01865-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 10/15/2024] [Indexed: 11/17/2024]
Abstract
Microbiomes contribute to variation in many plant and animal traits, suggesting that microbiome-mediated traits could evolve through selection on the host. However, for such evolution to occur, microbiomes must exhibit sufficient heritability to contribute to host adaptation. Previous work has attempted to estimate the heritability of a variety of microbiome attributes. Here we show that most published estimates are limited to vertebrate and plant hosts, but significant heritability of microbiome attributes has been frequently reported. This indicates that microbiomes could evolve in response to host-level selection, but studies across a wider range of hosts are necessary before general conclusions can be made. We suggest future studies focus on standardizing heritability measurements for the purpose of meta-analyses and investigate the role of the environment in contributing to heritable microbiome variation. This could have important implications for the use of microbiomes in conservation, agriculture and medicine.
Affiliation(s)
- Andrew H Morris
  - Institute of Ecology & Evolution, University of Oregon, Eugene, OR, USA
5
Zainab A, Anzawa H, Kinoshita K. Identifying key genes in COPD risk via multiple population data integration and gene prioritization. PLoS One 2024; 19:e0305803. [PMID: 39509417 PMCID: PMC11542775 DOI: 10.1371/journal.pone.0305803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Accepted: 10/22/2024] [Indexed: 11/15/2024] Open
Abstract
Chronic obstructive pulmonary disease (COPD) is a highly prevalent disease, making it a leading cause of death worldwide. Several genome-wide association studies (GWAS) have been conducted to identify loci associated with COPD. However, different ancestral genetic compositions for the same disease across various populations present challenges in studies involving multi-population data. In this study, we aimed to identify protein-coding genes associated with COPD by prioritizing genes for each population's GWAS data, and then combining these results instead of performing a common meta-GWAS due to significant sample differences in different population cohorts. Lung function measurements are often used as indicators for COPD risk prediction; therefore, we used lung function GWAS data from two populations, Japanese and European, and re-evaluated them using a multi-population gene prioritization approach. This study identified significant single nucleotide variants (SNPs) in both Japanese and European populations. The Japanese GWAS revealed nine significant SNPs and four lead SNPs in three genomic risk loci. In comparison, the European population showed five lead SNPs and 17 independent significant SNPs in 21 genomic risk loci. A comparative analysis of the results found 28 similar genes in the prioritized gene lists of both populations. We also performed a standard meta-analysis for comparison and identified 18 common genes in both populations. Our approach demonstrated that trans-ethnic linkage disequilibrium (LD) could detect some significant novel associations and genes that have yet to be reported or were missed in previous analyses. The study suggests that a gene prioritization approach for multi-population analysis using GWAS data may be a feasible method to identify new associations in data with genetic diversity across different populations. It also highlights the possibility of identifying generalized and population-specific treatment and diagnostic options.
Affiliation(s)
- Afeefa Zainab
  - Graduate School of Information Sciences, Tohoku University, Sendai, Japan
- Hayato Anzawa
  - Graduate School of Information Sciences, Tohoku University, Sendai, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
- Kengo Kinoshita
  - Graduate School of Information Sciences, Tohoku University, Sendai, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
6
Cao C, Tian M, Li Z, Zhu W, Huang P, Yang S. GWAShug: a comprehensive platform for decoding the shared genetic basis between complex traits based on summary statistics. Nucleic Acids Res 2024:gkae873. [PMID: 39380491 DOI: 10.1093/nar/gkae873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Revised: 09/14/2024] [Accepted: 09/24/2024] [Indexed: 10/10/2024] Open
Abstract
The shared genetic basis offers valuable insights into the etiology, diagnosis and therapy of complex traits. However, a comprehensive resource providing shared genetic basis using the accessible summary statistics is currently lacking. It is challenging to analyze the shared genetic basis due to the difficulty in selecting parameters and the complexity of pipeline implementation. To address these issues, we introduce GWAShug, a platform featuring a standardized best-practice pipeline with four trait level methods and three molecular level methods. Based on stringent quality control, the GWAShug resource module includes 539 high-quality GWAS summary statistics for European and East Asian populations, covering 54 945 pairs between a measurement-based and a disease-based trait and 43 902 pairs between two disease-based traits. Users can easily search for shared genetic basis information by trait name, MeSH term and category, and access detailed gene information across different trait pairs. The platform facilitates interactive visualization and analysis of shared genetic basis results, allowing users to explore data dynamically. Results can be conveniently downloaded via FTP links. Additionally, we offer an online analysis module that allows users to analyze their own summary statistics, providing comprehensive tables, figures and interactive visualization and analysis. GWAShug is freely accessible at http://www.gwashug.com.
Affiliation(s)
- Chen Cao
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Min Tian
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Zhenghui Li
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Wenyan Zhu
  - Department of Biostatistics, Centre for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Peng Huang
  - Department of Epidemiology, Centre for Global Health, School of Public Health, National Vaccine Innovation Platform, Key Laboratory of Public Health Safety and Emergency Prevention and Control Technology of Higher Education Institutions in Jiangsu Province, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Sheng Yang
  - Department of Biostatistics, Centre for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
7
Kontou PI, Bagos PG. The goldmine of GWAS summary statistics: a systematic review of methods and tools. BioData Min 2024; 17:31. [PMID: 39238044 PMCID: PMC11375927 DOI: 10.1186/s13040-024-00385-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 08/27/2024] [Indexed: 09/07/2024] Open
Abstract
Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.
Affiliation(s)
- Pantelis G Bagos
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece
8
Singar S, Nagpal R, Arjmandi BH, Akhavan NS. Personalized Nutrition: Tailoring Dietary Recommendations through Genetic Insights. Nutrients 2024; 16:2673. [PMID: 39203810 PMCID: PMC11357412 DOI: 10.3390/nu16162673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 08/08/2024] [Accepted: 08/09/2024] [Indexed: 09/03/2024] Open
Abstract
Personalized nutrition (PN) represents a transformative approach in dietary science, where individual genetic profiles guide tailored dietary recommendations, thereby optimizing health outcomes and managing chronic diseases more effectively. This review synthesizes key aspects of PN, emphasizing the genetic basis of dietary responses, contemporary research, and practical applications. We explore how individual genetic differences influence dietary metabolisms, thus underscoring the importance of nutrigenomics in developing personalized dietary guidelines. Current research in PN highlights significant gene-diet interactions that affect various conditions, including obesity and diabetes, suggesting that dietary interventions could be more precise and beneficial if they are customized to genetic profiles. Moreover, we discuss practical implementations of PN, including technological advancements in genetic testing that enable real-time dietary customization. Looking forward, this review identifies the robust integration of bioinformatics and genomics as critical for advancing PN. We advocate for multidisciplinary research to overcome current challenges, such as data privacy and ethical concerns associated with genetic testing. The future of PN lies in broader adoption across health and wellness sectors, promising significant advancements in public health and personalized medicine.
Affiliation(s)
- Saiful Singar
  - Department of Health, Nutrition, and Food Sciences, College of Education, Health, and Human Sciences, Florida State University, Tallahassee, FL 32306, USA
- Ravinder Nagpal
  - Department of Health, Nutrition, and Food Sciences, College of Education, Health, and Human Sciences, Florida State University, Tallahassee, FL 32306, USA
- Bahram H. Arjmandi
  - Department of Health, Nutrition, and Food Sciences, College of Education, Health, and Human Sciences, Florida State University, Tallahassee, FL 32306, USA
- Neda S. Akhavan
  - Department of Kinesiology and Nutrition Sciences, School of Integrated Health Sciences, University of Nevada, Las Vegas, NV 89154, USA
9
Cheng J, Meng C, Li J, Kong Z, Zhou A. Integrating polygenic risk scores in the prediction of gestational diabetes risk in China. Front Endocrinol (Lausanne) 2024; 15:1391296. [PMID: 39165511 PMCID: PMC11333217 DOI: 10.3389/fendo.2024.1391296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 07/12/2024] [Indexed: 08/22/2024] Open
Abstract
Background: Polygenic risk scores (PRS) serve as valuable tools for connecting initial genetic discoveries with clinical applications in disease risk estimation. However, limited studies have explored the association between PRS and gestational diabetes mellitus (GDM), particularly in predicting GDM risk among Chinese populations.
Aim: To evaluate the relationship between PRS and GDM and explore the predictive capability of PRS for GDM risk in a Chinese population.
Methods: A prospective cohort study was conducted, which included 283 GDM and 2,258 non-GDM cases based on demographic information on pregnancies. GDM was diagnosed using the oral glucose tolerance test (OGTT) at 24-28 weeks. The strength of the association between PRS and GDM odds was assessed employing odds ratios (ORs) with 95% confidence intervals (CIs) derived from logistic regression. Receiver operating characteristic curves, net reclassification improvement (NRI), and integrated discrimination improvement (IDI) were employed to evaluate the improvement in prediction achieved by the new model.
Results: Women who developed GDM exhibited significantly higher PRS compared to control individuals (OR = 2.01, 95% CI = 1.33-3.07). The PRS value remained positively associated with fasting plasma glucose (FPG), 1-hour post-glucose load (1-h OGTT), and 2-hour post-glucose load (2-h OGTT) (all p < 0.05). The incorporation of PRS led to a statistically significant improvement in the area under the curve (0.71, 95% CI: 0.66-0.75, p = 0.024) and improved discrimination and classification (IDI: 0.007, 95% CI: 0.003-0.012, p < 0.001; NRI: 0.258, 95% CI: 0.135-0.382, p < 0.001).
Conclusions: This study highlights the increased odds of GDM associated with higher PRS values and modest improvements in predictive capability for GDM.
Affiliation(s)
- Jiayi Cheng
  - Department of Obstetrics, Wuhan Children’s Hospital (Wuhan Maternal and Child Health care Hospital), Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Chan Meng
  - Department of Obstetrics, Wuhan Children’s Hospital (Wuhan Maternal and Child Health care Hospital), Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Junwei Li
  - Department of Obstetrics, Wuhan Children’s Hospital (Wuhan Maternal and Child Health care Hospital), Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Ziwen Kong
  - Department of Obstetrics, Wuhan Children’s Hospital (Wuhan Maternal and Child Health care Hospital), Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Aifen Zhou
  - Department of Obstetrics, Wuhan Children’s Hospital (Wuhan Maternal and Child Health care Hospital), Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Institute of Maternal and Child Health, Wuhan Children’s Hospital (Wuhan Maternal and Child Health care Hospital), Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
10
Xu L, Liu Y. Identification, Design, and Application of Noncoding Cis-Regulatory Elements. Biomolecules 2024; 14:945. [PMID: 39199333 PMCID: PMC11352686 DOI: 10.3390/biom14080945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2024] [Revised: 07/25/2024] [Accepted: 07/30/2024] [Indexed: 09/01/2024] Open
Abstract
Cis-regulatory elements (CREs) play a pivotal role in orchestrating interactions with trans-regulatory factors such as transcription factors, RNA-binding proteins, and noncoding RNAs. These interactions are fundamental to the molecular architecture underpinning complex and diverse biological functions in living organisms, facilitating a myriad of sophisticated and dynamic processes. The rapid advancement in the identification and characterization of these regulatory elements has been marked by initiatives such as the Encyclopedia of DNA Elements (ENCODE) project, which represents a significant milestone in the field. Concurrently, the development of CRE detection technologies, exemplified by massively parallel reporter assays, has progressed at an impressive pace, providing powerful tools for CRE discovery. The exponential growth of multimodal functional genomic data has necessitated the application of advanced analytical methods. Deep learning algorithms, particularly large language models, have emerged as invaluable tools for deconstructing the intricate nucleotide sequences governing CRE function. These advancements facilitate precise predictions of CRE activity and enable the de novo design of CREs. A deeper understanding of CRE operational dynamics is crucial for harnessing their versatile regulatory properties. Such insights are instrumental in refining gene therapy techniques, enhancing the efficacy of selective breeding programs, pushing the boundaries of genetic innovation, and opening new possibilities in microbial synthetic biology.
Affiliation(s)
- Lingna Xu
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Yuwen Liu
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Foshan 528226, China
11
Chen DM, Dong R, Kachuri L, Hoffmann TJ, Jiang Y, Berndt SI, Shelley JP, Schaffer KR, Machiela MJ, Freedman ND, Huang WY, Li SA, Lilja H, Justice AC, Madduri RK, Rodriguez AA, Van Den Eeden SK, Chanock SJ, Haiman CA, Conti DV, Klein RJ, Mosley JD, Witte JS, Graff RE. Transcriptome-wide association analysis identifies candidate susceptibility genes for prostate-specific antigen levels in men without prostate cancer. HGG Adv 2024; 5:100315. [PMID: 38845201 PMCID: PMC11262184 DOI: 10.1016/j.xhgg.2024.100315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Revised: 05/31/2024] [Accepted: 06/03/2024] [Indexed: 06/18/2024] Open
Abstract
Deciphering the genetic basis of prostate-specific antigen (PSA) levels may improve their utility for prostate cancer (PCa) screening. Using genome-wide association study (GWAS) summary statistics from 95,768 PCa-free men, we conducted a transcriptome-wide association study (TWAS) to examine impacts of genetically predicted gene expression on PSA. Analyses identified 41 statistically significant (p < 0.05/12,192 = 4.10 × 10⁻⁶) associations in whole blood and 39 statistically significant (p < 0.05/13,844 = 3.61 × 10⁻⁶) associations in prostate tissue, with 18 genes associated in both tissues. Cross-tissue analyses identified 155 statistically significant (p < 0.05/22,249 = 2.25 × 10⁻⁶) genes. Out of 173 unique PSA-associated genes across analyses, we replicated 151 (87.3%) in a TWAS of 209,318 PCa-free individuals from the Million Veteran Program. Based on conditional analyses, we found 20 genes (11 single tissue, nine cross-tissue) that were associated with PSA levels in the discovery TWAS that were not attributable to a lead variant from a GWAS. Ten of these 20 genes replicated, and two of the replicated genes had colocalization probability of >0.5: CCNA2 and HIST1H2BN. Six of the 20 identified genes are not known to impact PCa risk. Fine-mapping based on whole blood and prostate tissue revealed five protein-coding genes with evidence of causal relationships with PSA levels. Of these five genes, four exhibited evidence of colocalization and one was conditionally independent of previous GWAS findings. These results yield hypotheses that should be further explored to improve understanding of genetic factors underlying PSA levels.
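As a rough check on the significance thresholds quoted in this abstract, the sketch below recomputes each Bonferroni cut-off as 0.05 divided by the number of genes tested. The dictionary keys and the helper name `bonferroni_threshold` are illustrative assumptions and do not come from the paper.

```python
# Bonferroni correction: divide the family-wise alpha by the number of tests.
ALPHA = 0.05

# Numbers of genes tested per analysis, as quoted in the abstract.
# The labels are illustrative, not identifiers from the study.
tests = {
    "whole blood": 12_192,
    "prostate tissue": 13_844,
    "cross-tissue": 22_249,
}


def bonferroni_threshold(n_tests: int, alpha: float = ALPHA) -> float:
    """Return the per-test p-value threshold for n_tests comparisons."""
    return alpha / n_tests


thresholds = {name: bonferroni_threshold(n) for name, n in tests.items()}

for name, value in thresholds.items():
    # Two decimals in scientific notation, matching the abstract's precision.
    print(f"{name}: p < {value:.2e}")
```

Rounded to two decimal places, the three values agree with the thresholds reported in the abstract.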
Affiliation(s)
- Dorothy M Chen
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA 94158, USA
- Ruocheng Dong
- Department of Epidemiology and Population Health, Stanford University, Stanford, CA 94305, USA
- Linda Kachuri
- Department of Epidemiology and Population Health, Stanford University, Stanford, CA 94305, USA; Stanford Cancer Institute, Stanford University, Stanford, CA 94305, USA
- Thomas J Hoffmann
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA 94158, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA 94143, USA
- Yu Jiang
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA 94158, USA
- Sonja I Berndt
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20814, USA
- John P Shelley
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Kerry R Schaffer
- Department of Internal Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Mitchell J Machiela
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20814, USA
- Neal D Freedman
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20814, USA
- Wen-Yi Huang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20814, USA
- Shengchao A Li
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20814, USA
- Hans Lilja
- Departments of Pathology and Laboratory Medicine, Surgery, Medicine, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Department of Translational Medicine, Lund University, 21428 Malmö, Sweden
- Stephen J Chanock
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20814, USA
- Christopher A Haiman
- Center for Genetic Epidemiology, Department of Population and Preventive Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90032, USA; Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- David V Conti
- Center for Genetic Epidemiology, Department of Population and Preventive Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90032, USA; Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Robert J Klein
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Jonathan D Mosley
- Departments of Internal Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- John S Witte
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA 94158, USA; Department of Epidemiology and Population Health, Stanford University, Stanford, CA 94305, USA; Departments of Biomedical Data Science and Genetics (by courtesy), Stanford University, Stanford, CA 94305, USA.
- Rebecca E Graff
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA 94158, USA.
12
Zou Y, Carbonetto P, Xie D, Wang G, Stephens M. Fast and flexible joint fine-mapping of multiple traits via the Sum of Single Effects model. bioRxiv 2024:2023.04.14.536893. [PMID: 37425935 PMCID: PMC10327118 DOI: 10.1101/2023.04.14.536893] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/11/2023]
Abstract
We introduce mvSuSiE, a multi-trait fine-mapping method for identifying putative causal variants from genetic association data (individual-level or summary data). mvSuSiE learns patterns of shared genetic effects from data, and exploits these patterns to improve power to identify causal SNPs. Comparisons on simulated data show that mvSuSiE is competitive in speed, power and precision with existing multi-trait methods, and uniformly improves on single-trait fine-mapping (SuSiE) in each trait separately. We applied mvSuSiE to jointly fine-map 16 blood cell traits using data from the UK Biobank. By jointly analyzing the traits and modeling heterogeneous effect sharing patterns, we discovered a much larger number of causal SNPs (>3,000) compared with single-trait fine-mapping, and with narrower credible sets. mvSuSiE also more comprehensively characterized the ways in which the genetic variants affect one or more blood cell traits; 68% of causal SNPs showed significant effects in more than one blood cell type.
Affiliation(s)
- Yuxin Zou
- Department of Statistics, University of Chicago, Chicago, IL, USA
- Regeneron Genetics Center, Regeneron Pharmaceuticals, Inc., Tarrytown, NY, USA
- Peter Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Dongyue Xie
- Department of Statistics, University of Chicago, Chicago, IL, USA
- Gao Wang
- Gertrude H. Sergievsky Center, Department of Neurology, Columbia University, New York, NY, USA
- Matthew Stephens
- Department of Statistics, University of Chicago, Chicago, IL, USA
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
13
Strober BJ, Zhang MJ, Amariuta T, Rossen J, Price AL. Fine-mapping causal tissues and genes at disease-associated loci. medRxiv 2024:2023.11.01.23297909. [PMID: 37961337 PMCID: PMC10635248 DOI: 10.1101/2023.11.01.23297909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Heritable diseases often manifest in a highly tissue-specific manner, with different disease loci mediated by genes in distinct tissues or cell types. We propose Tissue-Gene Fine-Mapping (TGFM), a fine-mapping method that infers the posterior probability (PIP) for each gene-tissue pair to mediate a disease locus by analyzing GWAS summary statistics (and in-sample LD) and leveraging eQTL data from diverse tissues to build cis-predicted expression models; TGFM also assigns PIPs to causal variants that are not mediated by gene expression in assayed genes and tissues. TGFM accounts for both co-regulation across genes and tissues and LD between SNPs (generalizing existing fine-mapping methods), and incorporates genome-wide estimates of each tissue's contribution to disease as tissue-level priors. TGFM was well-calibrated and moderately well-powered in simulations; unlike previous methods, TGFM was able to attain correct calibration by modeling uncertainty in cis-predicted expression models. We applied TGFM to 45 UK Biobank diseases/traits (average N = 316K) using eQTL data from 38 GTEx tissues. TGFM identified an average of 147 PIP > 0.5 causal genetic elements per disease/trait, of which 11% were gene-tissue pairs. Implicated gene-tissue pairs were concentrated in known disease-critical tissues, and causal genes were strongly enriched in disease-relevant gene sets. Causal gene-tissue pairs identified by TGFM recapitulated known biology (e.g., TPO-thyroid for Hypothyroidism), but also included biologically plausible novel findings (e.g., SLC20A2-artery aorta for Diastolic blood pressure). Further application of TGFM to single-cell eQTL data from 9 cell types in peripheral blood mononuclear cells (PBMC), analyzed jointly with GTEx tissues, identified 30 additional causal gene-PBMC cell type pairs at PIP > 0.5, primarily for autoimmune disease and blood cell traits, including the biologically plausible example of CD52 in classical monocyte cells for Monocyte count.
In conclusion, TGFM is a robust and powerful method for fine-mapping causal tissues and genes at disease-associated loci.
Affiliation(s)
- Benjamin J. Strober
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Martin Jinye Zhang
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Tiffany Amariuta
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Jordan Rossen
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Alkes L. Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
14
Zhao B, Zheng S, Zhu H. On blockwise and reference panel-based estimators for genetic data prediction in high dimensions. Ann Stat 2024; 52:948-965. [PMID: 39281348 PMCID: PMC11391480 DOI: 10.1214/24-aos2378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/18/2024]
Abstract
Genetic prediction holds immense promise for translating genetic discoveries into medical advances. As the high-dimensional covariance matrix (or the linkage disequilibrium (LD) pattern) of genetic variants often presents a block-diagonal structure, numerous methods account for the dependence among variants in predetermined local LD blocks. Moreover, due to privacy considerations and data protection concerns, genetic variant dependence in each LD block is typically estimated from external reference panels rather than the original training data set. This paper presents a unified analysis of blockwise and reference panel-based estimators in a high-dimensional prediction framework without sparsity restrictions. We find that, surprisingly, even when the covariance matrix has a block-diagonal structure with well-defined boundaries, blockwise estimation methods adjusting for local dependence can be substantially less accurate than methods controlling for the whole covariance matrix. Further, estimation methods built on the original training data set and external reference panels are likely to have varying performance in high dimensions, which may reflect the cost of having only access to summary level data from the training data set. This analysis is based on novel results in random matrix theory for block-diagonal covariance matrix. We numerically evaluate our results using extensive simulations and real data analysis in the UK Biobank.
Affiliation(s)
- Bingxin Zhao
- Department of Statistics and Data Science, University of Pennsylvania
- Shurong Zheng
- School of Mathematics and Statistics, Northeast Normal University
- Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill
15
Patel A, Gill D, Shungin D, Mantzoros CS, Knudsen LB, Bowden J, Burgess S. Robust use of phenotypic heterogeneity at drug target genes for mechanistic insights: Application of cis-multivariable Mendelian randomization to GLP1R gene region. Genet Epidemiol 2024; 48:151-163. [PMID: 38379245 PMCID: PMC7616158 DOI: 10.1002/gepi.22551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 12/08/2023] [Accepted: 01/30/2024] [Indexed: 02/22/2024]
Abstract
Phenotypic heterogeneity at genomic loci encoding drug targets can be exploited by multivariable Mendelian randomization to provide insight into the pathways by which pharmacological interventions may affect disease risk. However, statistical inference in such investigations may be poor if overdispersion heterogeneity in measured genetic associations is unaccounted for. In this work, we first develop conditional F statistics for dimension-reduced genetic associations that enable more accurate measurement of phenotypic heterogeneity. We then develop a novel extension for two-sample multivariable Mendelian randomization that accounts for overdispersion heterogeneity in dimension-reduced genetic associations. Our empirical focus is to use genetic variants in the GLP1R gene region to understand the mechanism by which GLP1R agonism affects coronary artery disease (CAD) risk. Colocalization analyses indicate that distinct variants in the GLP1R gene region are associated with body mass index and type 2 diabetes (T2D). Multivariable Mendelian randomization analyses that were corrected for overdispersion heterogeneity suggest that the bodyweight-lowering rather than the T2D liability-lowering effects of GLP1R agonism are more likely to contribute to reduced CAD risk. Tissue-specific analyses prioritized brain tissue as the most likely to be relevant for CAD risk, of the tissues considered. We hope the multivariable Mendelian randomization approach illustrated here is widely applicable to better understand mechanisms linking drug targets to disease outcomes, and hence to guide drug development efforts.
Affiliation(s)
- Ashish Patel
- MRC Biostatistics Unit, University of Cambridge, UK
- Dipender Gill
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, UK
- Dmitry Shungin
- Human Genetics Centre of Excellence, AI and Digital Research, Novo Nordisk, Denmark
- Christos S. Mantzoros
- Department of Internal Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, USA
- Department of Internal Medicine, Boston VA Healthcare System, Harvard Medical School, USA
- Lotte Bjerre Knudsen
- Chief Scientific Advisor Office, Research and Early Development, Novo Nordisk, Denmark
- Jack Bowden
- Department of Clinical and Biomedical Sciences, University of Exeter, UK
- Department of Genetics, Novo Nordisk Research Centre Oxford, UK
- Stephen Burgess
- MRC Biostatistics Unit, University of Cambridge, UK
- Cardiovascular Epidemiology Unit, University of Cambridge, UK
16
Zhang Y, Wang M, Li Z, Yang X, Li K, Xie A, Dong F, Wang S, Yan J, Liu J. An overview of detecting gene-trait associations by integrating GWAS summary statistics and eQTLs. Science China Life Sciences 2024; 67:1133-1154. [PMID: 38568343 DOI: 10.1007/s11427-023-2522-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 01/29/2024] [Indexed: 06/07/2024]
Abstract
Detecting genes that affect specific traits (such as human diseases and crop yields) is important for treating complex diseases and improving crop quality. Genome-wide association studies (GWAS) provide insights into complex traits by identifying important single nucleotide polymorphisms, and GWAS summary statistics for many complex traits have been gathered recently. Studies have shown that GWAS risk loci and expression quantitative trait loci (eQTLs) often overlap, which has made gene expression an important intermediary for revealing the regulatory role of GWAS loci. In this review, we cover three types of gene-trait association detection methods that integrate GWAS summary statistics and eQTL data: colocalization methods, transcriptome-wide association study-oriented approaches, and Mendelian randomization-related methods. At the theoretical level, we discuss the differences, relationships, advantages, and disadvantages of the algorithms in each of the three categories. To further compare the methods, we summarize the significant gene sets influencing high-density lipoprotein, low-density lipoprotein, total cholesterol, and triglyceride reported in 16 studies, and we discuss the performance of the algorithms on the datasets for these four lipid traits. The advantages and limitations of the algorithms are analyzed based on experimental results, and we suggest directions for follow-up studies on detecting gene-trait associations.
Affiliation(s)
- Yang Zhang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Mengyao Wang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Zhenguo Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Xuan Yang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Keqin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Ao Xie
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Fang Dong
- College of Life Sciences, Nankai University, Tianjin, 300071, China
- Shihan Wang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jianbing Yan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Jianxiao Liu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China.
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China.
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China.
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China.
17
Zhao B, Yang X, Zhu H. Estimating trans-ancestry genetic correlation with unbalanced data resources. J Am Stat Assoc 2024; 119:839-850. [PMID: 39219674 PMCID: PMC11364214 DOI: 10.1080/01621459.2024.2344703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Accepted: 04/07/2024] [Indexed: 09/04/2024]
Abstract
The aim of this paper is to propose a novel method for estimating trans-ancestry genetic correlations in genome-wide association studies (GWAS) using genetically-predicted observations. These correlations describe how genetic architecture of complex traits varies among populations. Our new estimator corrects for biases arising from prediction errors in high-dimensional weak GWAS signals, while addressing the ethnic diversity inherent in GWAS data, such as linkage disequilibrium (LD) differences. A distinguishing feature of our approach is its flexibility regarding sample sizes: it necessitates a large GWAS sample only from one population, while the secondary population may have a much smaller cohort, even in the hundreds. This design directly addresses the existing imbalance in GWAS data resources, where datasets for European populations typically outnumber those of non-European ancestries. Through extensive simulations and real data analysis from the UK Biobank study encompassing 26 complex traits, we validate the reliability of our method. Our results illuminate the broader implications of transferring genetic findings across diverse populations.
Affiliation(s)
- Bingxin Zhao
- Department of Statistics and Data Science, University of Pennsylvania
- Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill
18
Bagheri M, Bombin A, Shi M, Murthy VL, Shah R, Mosley JD, Ferguson JF. Genotype-based "virtual" metabolomics in a clinical biobank identifies novel metabolite-disease associations. Front Genet 2024; 15:1392622. [PMID: 38812968 PMCID: PMC11133605 DOI: 10.3389/fgene.2024.1392622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Accepted: 05/03/2024] [Indexed: 05/31/2024] Open
Abstract
Introduction: Circulating metabolites act as biomarkers of dysregulated metabolism and may inform disease pathophysiology. A portion of the inter-individual variability in circulating metabolites is influenced by common genetic variation. We evaluated whether a genetics-based "virtual" metabolomics approach can identify novel metabolite-disease associations. Methods: We examined the association between polygenic scores for 724 metabolites with 1,247 clinical phenotypes in the BioVU DNA biobank, comprising 57,735 European ancestry and 15,754 African ancestry participants. We applied Mendelian randomization (MR) to probe significant relationships and validated significant MR associations using independent GWAS of candidate phenotypes. Results and Discussion: We found significant associations between 336 metabolites and 168 phenotypes in European ancestry and 107 metabolites and 56 phenotypes in African ancestry. Of these metabolite-disease pairs, MR analyses confirmed associations between 73 metabolites and 53 phenotypes in European ancestry. Of 22 metabolite-phenotype pairs evaluated for replication in independent GWAS, 16 were significant (false discovery rate p < 0.05). These included associations between bilirubin and X-21796 with cholelithiasis, phosphatidylcholine (16:0/22:5n3,18:1/20:4) and arachidonate with inflammatory bowel disease and Crohn's disease, and campesterol with coronary artery disease and myocardial infarction. These associations may represent biomarkers or potentially targetable mediators of disease risk.
Affiliation(s)
- Minoo Bagheri
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Andrei Bombin
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
- Mingjian Shi
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Venkatesh L. Murthy
- Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, MI, United States
- Ravi Shah
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
- Jonathan D. Mosley
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
- Jane F. Ferguson
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
19
Rossen J, Shi H, Strober BJ, Zhang MJ, Kanai M, McCaw ZR, Liang L, Weissbrod O, Price AL. MultiSuSiE improves multi-ancestry fine-mapping in All of Us whole-genome sequencing data. medRxiv 2024:2024.05.13.24307291. [PMID: 38798542 PMCID: PMC11118590 DOI: 10.1101/2024.05.13.24307291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Leveraging data from multiple ancestries can greatly improve fine-mapping power due to differences in linkage disequilibrium and allele frequencies. We propose MultiSuSiE, an extension of the sum of single effects model (SuSiE) to multiple ancestries that allows causal effect sizes to vary across ancestries based on a multivariate normal prior informed by empirical data. We evaluated MultiSuSiE via simulations and analyses of 14 quantitative traits leveraging whole-genome sequencing data in 47k African-ancestry and 94k European-ancestry individuals from All of Us. In simulations, MultiSuSiE applied to Afr47k+Eur47k was well-calibrated and attained higher power than SuSiE applied to Eur94k; interestingly, higher causal variant PIPs in Afr47k compared to Eur47k were entirely explained by differences in the extent of LD quantified by LD 4th moments. Compared to very recently proposed multi-ancestry fine-mapping methods, MultiSuSiE attained higher power and/or much lower computational costs, making the analysis of large-scale All of Us data feasible. In real trait analyses, MultiSuSiE applied to Afr47k+Eur94k identified 579 fine-mapped variants with PIP > 0.5, and MultiSuSiE applied to Afr47k+Eur47k identified 44% more fine-mapped variants with PIP > 0.5 than SuSiE applied to Eur94k. We validated MultiSuSiE results for real traits via functional enrichment of fine-mapped variants. We highlight several examples where MultiSuSiE implicates well-studied or biologically plausible fine-mapped variants that were not implicated by other methods.
20
Kunkel D, Sørensen P, Shankar V, Morgante F. Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes. bioRxiv 2024:2024.05.06.592745. [PMID: 38766136 PMCID: PMC11100663 DOI: 10.1101/2024.05.06.592745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Polygenic prediction of complex trait phenotypes has become important in human genetics, especially in the context of precision medicine. Recently, Morgante et al. introduced mr.mash, a flexible and computationally efficient method that models multiple phenotypes jointly and leverages sharing of effects across such phenotypes to improve prediction accuracy. However, a drawback of mr.mash is that it requires individual-level data, which are often not publicly available. In this work, we introduce mr.mash-rss, an extension of the mr.mash model that requires only summary statistics from Genome-Wide Association Studies (GWAS) and linkage disequilibrium (LD) estimates from a reference panel. By using summary data, we achieve the twin goal of increasing the applicability of the mr.mash model to data sets that are not publicly available and making it scalable to biobank-size data. Through simulations, we show that mr.mash-rss is competitive with, and often outperforms, current state-of-the-art methods for single- and multi-phenotype polygenic prediction in a variety of scenarios that differ in the pattern of effect sharing across phenotypes, the number of phenotypes, the number of causal variants, and the genomic heritability. We also present a real data analysis of 16 blood cell phenotypes in UK Biobank, showing that mr.mash-rss achieves higher prediction accuracy than competing methods for the majority of traits, especially when the data has smaller sample size.
Affiliation(s)
- Deborah Kunkel
- School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC, United States of America
- Peter Sørensen
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Vijay Shankar
- Center for Human Genetics, Clemson University, Greenwood, SC, United States of America
- Fabio Morgante
- Center for Human Genetics, Clemson University, Greenwood, SC, United States of America
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, United States of America
21
Wang Y, Sun Y, Tan M, Lin X, Tai P, Huang X, Jin Q, Yuan D, Xu T, He B. Association Between Polymorphisms in DNA Damage Repair Pathway Genes and Female Breast Cancer Risk. DNA Cell Biol 2024; 43:219-231. [PMID: 38634815 DOI: 10.1089/dna.2023.0331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024] Open
Abstract
Breast cancer risk has been reported to be associated with gene polymorphisms as well as abnormal DNA damage repair function. This study aims to assess the relationship between single nucleotide polymorphisms (SNPs) in DNA damage repair genes and female breast cancer risk in a Chinese population. A case-control study of 400 patients and 400 healthy controls was conducted. Genotypes were identified using the sequence MassARRAY method, and expression of estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor-2 (HER-2) in tumor tissues was analyzed by immunohistochemistry assay. The results revealed that ATR rs13091637 increased breast cancer risk influenced by ER and PR expression (CT/TT vs. CC: adjusted odds ratio [OR] = 1.54, 95% confidence interval [CI]: 1.04-2.27, p = 0.032; CT/TT vs. CC: adjusted OR = 1.63, 95%CI: 1.14-2.35, p = 0.008). Stratified analysis revealed that PALB2 rs16940342 increased breast cancer risk in relation to menstrual status (AG/GG vs. AA: adjusted OR = 1.72, 95%CI: 1.13-2.62, p = 0.011) and age of menarche (AG/GG vs. AA: adjusted OR = 1.54, 95%CI: 1.03-2.31, p = 0.037), whereas ATM rs611646 and Ku70 rs132793 were associated with reduced breast cancer risk influenced by menarche (GA/AA vs. GG: adjusted OR = 0.50, 95%CI: 0.30-0.95, p = 0.033). In summary, PALB2 rs16940342, ATR rs13091637, ATM rs611646, and Ku70 rs132793 were associated with breast cancer risk.
Affiliation(s)
- Ying Wang
  - School of Basic-Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, China
  - Department of Laboratory Medicine, Nanjing First Hospital, Nanjing Medical University, Nanjing, China
- Yalan Sun
  - School of Basic-Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, China
  - Department of Laboratory Medicine, Nanjing First Hospital, Nanjing Medical University, Nanjing, China
- Mingjuan Tan
  - Department of Laboratory Medicine, Nanjing First Hospital, Nanjing Medical University, Nanjing, China
- Xin Lin
  - Department of Laboratory Medicine, Nanjing First Hospital, Nanjing Medical University, Nanjing, China
- Ping Tai
  - Department of Laboratory Medicine, Nanjing First Hospital, Nanjing Medical University, Nanjing, China
- Xiaoqin Huang
  - Department of Laboratory Medicine, Nanjing First Hospital, Nanjing Medical University, Nanjing, China
- Qing Jin
  - Department of Laboratory Medicine, Nanjing First Hospital, Nanjing Medical University, Nanjing, China
- Dan Yuan
  - Department of Laboratory Medicine, Nanjing First Hospital, Nanjing Medical University, Nanjing, China
- Tao Xu
  - Department of Laboratory Medicine, Nanjing First Hospital, Nanjing Medical University, Nanjing, China
- Bangshun He
  - School of Basic-Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, China
  - Department of Laboratory Medicine, Nanjing First Hospital, Nanjing Medical University, Nanjing, China
22
Cao X, Zhang S, Sha Q. A novel method for multiple phenotype association studies based on genotype and phenotype network. PLoS Genet 2024; 20:e1011245. [PMID: 38728360 PMCID: PMC11111089 DOI: 10.1371/journal.pgen.1011245] [Received: 08/18/2023] [Revised: 05/22/2024] [Accepted: 03/29/2024] [Indexed: 05/12/2024] Open
Abstract
Joint analysis of multiple correlated phenotypes for genome-wide association studies (GWAS) can identify and interpret pleiotropic loci which are essential to understand pleiotropy in diseases and complex traits. Meanwhile, constructing a network based on associations between phenotypes and genotypes provides a new insight to analyze multiple phenotypes, which can explore whether phenotypes and genotypes might be related to each other at a higher level of cellular and organismal organization. In this paper, we first develop a bipartite signed network by linking phenotypes and genotypes into a Genotype and Phenotype Network (GPN). The GPN can be constructed by a mixture of quantitative and qualitative phenotypes and is applicable to binary phenotypes with extremely unbalanced case-control ratios in large-scale biobank datasets. We then apply a powerful community detection method to partition phenotypes into disjoint network modules based on GPN. Finally, we jointly test the association between multiple phenotypes in a network module and a single nucleotide polymorphism (SNP). Simulations and analyses of 72 complex traits in the UK Biobank show that multiple phenotype association tests based on network modules detected by GPN are much more powerful than those without considering network modules. The newly proposed GPN provides a new insight to investigate the genetic architecture among different types of phenotypes. Multiple phenotypes association studies based on GPN are improved by incorporating the genetic information into the phenotype clustering. Notably, it might broaden the understanding of genetic architecture that exists between diagnoses, genes, and pleiotropy.
Affiliation(s)
- Xuewei Cao
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
23
Gedik H, Peterson R, Chatzinakos C, Dozmorov MG, Vladimirov V, Riley BP, Bacanu SA. A novel multi-omics mendelian randomization method for gene set enrichment and its application to psychiatric disorders. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.04.14.24305811. [PMID: 38699366 PMCID: PMC11065030 DOI: 10.1101/2024.04.14.24305811] [Indexed: 05/05/2024]
Abstract
Genome-wide association studies (GWAS) of psychiatric disorders (PD) yield numerous loci with significant signals, but often do not implicate specific genes. Because GWAS risk loci are enriched in expression/protein/methylation quantitative trait loci (e/p/mQTL, hereafter xQTL), transcriptome/proteome/methylome-wide association studies (T/P/MWAS, hereafter XWAS) that integrate xQTL and GWAS information can link GWAS signals to effects on specific genes. To further increase detection power, gene signals are aggregated within relevant gene sets (GS) by performing gene set enrichment (GSE) analyses. GSE methods often test for enrichment of "signal" genes in curated GS while overlooking their linkage disequilibrium (LD) structure, allowing for the possibility of increased false positive rates. Moreover, no GSE tool uses xQTL information to perform Mendelian randomization (MR) analysis. To make causal inference on the association between PD and GS, we develop a novel MR GSE (MR-GSE) procedure. First, we generate a "synthetic" GWAS for each MSigDB GS by aggregating summary statistics for x-levels (mRNA, protein or DNA methylation (DNAm) levels, from the largest xQTL studies available) of genes in a GS. Second, we use the synthetic GS GWAS as the exposure in a generalized summary-data-based MR analysis of complex trait outcomes. We applied MR-GSE to GWAS of nine important PD. When applied to the underpowered opioid use disorder GWAS, none of the four analyses yielded any signals, which suggests good control of false positive rates. For other PD, MR-GSE greatly increased the detection of GO term signals (2,594) when compared to the commonly used (non-MR) GSE method (286). Some of the findings might be easier to adapt for treatment, e.g., our analyses suggest modest positive effects for supplementation with certain vitamins and/or omega-3 for schizophrenia, bipolar disorder and major depressive disorder patients.
Similar to other MR methods, when applying MR-GSE researchers should be mindful of the confounding effects of horizontal pleiotropy on statistical inference.
24
Sutherland HG, Jenkins B, Griffiths LR. Genetics of migraine: complexity, implications, and potential clinical applications. Lancet Neurol 2024; 23:429-446. [PMID: 38508838 DOI: 10.1016/s1474-4422(24)00026-7] [Received: 05/09/2023] [Revised: 01/09/2024] [Accepted: 01/11/2024] [Indexed: 03/22/2024]
Abstract
Migraine is a common neurological disorder with large burden in terms of disability for individuals and costs for society. Accurate diagnosis and effective treatments remain priorities. Understanding the genetic factors that contribute to migraine risk and symptom manifestation could improve individual management. Migraine has a strong genetic basis that includes both monogenic and polygenic forms. Some distinct, rare, familial migraine subtypes are caused by pathogenic variants in genes involved in ion transport and neurotransmitter release, suggesting an underlying vulnerability of the excitatory-inhibitory balance in the brain, which might be exacerbated by disruption of homoeostasis and lead to migraine. For more prevalent migraine subtypes, genetic studies have identified many susceptibility loci, implicating genes involved in both neuronal and vascular pathways. Genetic factors can also reveal the nature of relationships between migraine and its associated biomarkers and comorbidities and could potentially be used to identify new therapeutic targets and predict treatment response.
Affiliation(s)
- Heidi G Sutherland
  - Centre for Genomics and Personalised Health, Genomics Research Centre, School of Biomedical Sciences, Queensland University of Technology, Brisbane, QLD, Australia
- Bronwyn Jenkins
  - Department of Neurology, Royal North Shore Hospital, Sydney, NSW, Australia
- Lyn R Griffiths
  - Centre for Genomics and Personalised Health, Genomics Research Centre, School of Biomedical Sciences, Queensland University of Technology, Brisbane, QLD, Australia.
25
Wang P, Xu X, Li M, Lou XY, Xu S, Wu B, Gao G, Yin P, Liu N. Gene-based association tests in family samples using GWAS summary statistics. Genet Epidemiol 2024; 48:103-113. [PMID: 38317324 DOI: 10.1002/gepi.22548] [Received: 07/12/2023] [Revised: 11/18/2023] [Accepted: 01/08/2024] [Indexed: 02/07/2024]
Abstract
Genome-wide association studies (GWAS) have led to rapid growth in detecting genetic variants associated with various phenotypes. Owing to a great number of publicly accessible GWAS summary statistics, and the difficulty in obtaining individual-level genotype data, many existing gene-based association tests have been adapted to require only GWAS summary statistics rather than individual-level data. However, these association tests are restricted to unrelated individuals and thus do not apply to family samples directly. Moreover, due to its flexibility and effectiveness, the linear mixed model has been increasingly utilized in GWAS to handle correlated data, such as family samples. However, it remains unknown how to perform gene-based association tests in family samples using the GWAS summary statistics estimated from the linear mixed model. In this study, we show that, when family size is negligible compared to the total sample size, the diagonal block structure of the kinship matrix makes it possible to approximate the correlation matrix of marginal Z scores by linkage disequilibrium matrix. Based on this result, current methods utilizing summary statistics for unrelated individuals can be directly applied to family data without any modifications. Our simulation results demonstrate that this proposed strategy controls the type 1 error rate well in various situations. Finally, we exemplify the usefulness of the proposed approach with a dental caries GWAS data set.
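As a rough illustration of the kind of method this result applies to (a sketch under stated assumptions, not the authors' implementation): once the correlation matrix of the marginal Z scores can be approximated by the LD matrix, a standard summary-statistic burden test for a gene needs only the Z scores and that matrix. The function name `burden_test`, the equal default weights, and the toy Z scores and LD values below are all hypothetical.

```python
import math

def burden_test(z, ld, weights=None):
    """Summary-statistic burden test for one gene.

    z       : marginal Z scores of the m variants in the gene
    ld      : m x m LD (correlation) matrix, used as Cov(z) under the null
    weights : per-variant weights (equal weights assumed if omitted)
    Returns (statistic, p_value); the statistic is chi-square with 1 df
    under the null hypothesis of no association.
    """
    m = len(z)
    w = weights if weights is not None else [1.0] * m
    # Numerator: weighted sum of Z scores, w'z
    num = sum(wi * zi for wi, zi in zip(w, z))
    # Null variance of the weighted sum: w'Rw
    var = sum(w[i] * ld[i][j] * w[j] for i in range(m) for j in range(m))
    stat = num * num / var
    # Upper tail of a chi-square(1) variable: P(X > t) = erfc(sqrt(t / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Toy inputs (made up for illustration only)
z = [2.0, 1.5, 1.0]
ld = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
stat, p = burden_test(z, ld)
print(f"burden statistic = {stat:.3f}, p = {p:.4f}")
```

Ignoring the off-diagonal LD entries here would understate the variance of the weighted sum and inflate the statistic, which is why the quality of the LD approximation matters for type 1 error control.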
Affiliation(s)
- Peng Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Hubei, People's Republic of China
- Xiao Xu
  - Department of Epidemiology and Biostatistics, Indiana University School of Public Health-Bloomington, Bloomington, Indiana, USA
- Ming Li
  - Department of Epidemiology and Biostatistics, Indiana University School of Public Health-Bloomington, Bloomington, Indiana, USA
- Xiang-Yang Lou
  - Department of Biostatistics, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, Florida, USA
- Siqi Xu
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, Hong Kong
- Baolin Wu
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, USA
- Guimin Gao
  - Department of Public Health Sciences, University of Chicago, Chicago, Illinois, USA
- Ping Yin
  - Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Hubei, People's Republic of China
- Nianjun Liu
  - Department of Epidemiology and Biostatistics, Indiana University School of Public Health-Bloomington, Bloomington, Indiana, USA
26
Liu S, Luo H, Zhang P, Li Y, Hao D, Zhang S, Song T, Xu T, He S. Adaptive Selection of Cis-regulatory Elements in the Han Chinese. Mol Biol Evol 2024; 41:msae034. [PMID: 38377343 PMCID: PMC10917166 DOI: 10.1093/molbev/msae034] [Received: 10/02/2023] [Revised: 01/18/2024] [Accepted: 02/05/2024] [Indexed: 02/22/2024] Open
Abstract
Cis-regulatory elements play an important role in human adaptation to the living environment. However, the lag in population genomic cohort studies and epigenomic studies hinders research on the adaptive analysis of cis-regulatory elements in human populations. In this study, we collected 4,013 unrelated individuals and performed a comprehensive analysis of adaptive selection of genome-wide cis-regulatory elements in the Han Chinese. In total, 12.34% of genomic regions are under the influence of adaptive selection, where 1.00% of enhancers and 2.06% of promoters are under positive selection, and 0.06% of enhancers and 0.02% of promoters are under balancing selection. Gene ontology enrichment analysis of these cis-regulatory elements under adaptive selection reveals that many positive selection events in the Han Chinese occur in pathways involved in cell-cell adhesion processes, and many balancing selection events are related to immune processes. Two classes of adaptive cis-regulatory elements related to cell adhesion were analyzed in depth. One comprises adaptive enhancers derived from Neanderthal introgression, which lead to lower hyaluronidase levels in skin and confer better UV-radiation resistance on the Han Chinese. The other comprises cis-regulatory elements regulating wound healing, and the results suggest that positive selection inhibits coagulation and promotes angiogenesis and wound healing in the Han Chinese. Finally, we found that many pathogenic alleles, such as risk alleles for type 2 diabetes or schizophrenia, remain in the population due to the hitchhiking effect of positive selection. Our findings will help deepen our understanding of the adaptive evolution of genome regulation in the Han Chinese.
Affiliation(s)
- Shuai Liu
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Huaxia Luo
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Peng Zhang
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Yanyan Li
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Di Hao
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Sijia Zhang
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Tingrui Song
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Tao Xu
  - National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
  - Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan 250117, Shandong, China
- Shunmin He
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
27
Zhu Y, Zhang H, Qi J, Liu Y, Yan Y, Wang T, Zeng P. Evaluating causal influence of maternal educational attainment on offspring birthweight via observational study and Mendelian randomization analyses. SSM Popul Health 2024; 25:101587. [PMID: 38229657 PMCID: PMC10790093 DOI: 10.1016/j.ssmph.2023.101587] [Received: 09/27/2023] [Revised: 11/25/2023] [Accepted: 12/16/2023] [Indexed: 01/18/2024] Open
Abstract
Background Despite extensive discussion of the influence of maternal educational attainment on offspring birthweight, the conclusion remains controversial, and it is challenging to comprehensively assess the causal association between them. Methods To estimate the effect of maternal educational attainment on the birthweight of the first child, we first conducted an individual-level analysis with UK Biobank participants of white ancestry (n = 208,162). We then implemented Mendelian randomization (MR) methods including inverse variance weighted (IVW) MR and multivariable MR to assess the causal relation between maternal education and maternal-specific birthweight. Finally, using the UK Biobank parent-offspring trio data (n = 618), we performed a polygenic score based MR to simultaneously adjust for confounding effects of fetal-specific birthweight and paternal educational attainment. We also conducted simulations for power evaluation and sensitivity analyses for horizontal pleiotropy of instruments. Results We observed that the birthweight of the first child was positively influenced by maternal education, with 7 years of maternal education as the reference, adjusted effect = 44.8 (95%CIs 38.0-51.6, P = 6.15 × 10^-38), 54.9 (95%CIs 47.6-62.2, P = 4.21 × 10^-128), and 89.4 (95%CIs 82.1-96.7, P = 4.28 × 10^-34) for 10, 15 and 20 years of maternal educational attainment, respectively. A causal relation between maternal education and offspring birthweight was revealed by IVW MR (estimated effect = 0.074 for one standard deviation increase in maternal education years, 95%CIs 0.054-0.093, P = 2.56 × 10^-13) and by complementary MR methods. This connection was not substantially affected by paternal education or horizontal pleiotropy.
Further, we found a positive but non-significant causal association (adjusted effect = 24.0, 95%CIs -150.1 to 198.1, P = 0.787) between maternal education and offspring birthweight after simultaneously controlling for fetal genome and paternal education; this null result was largely due to the limited power of the small sample of parent-offspring trios. Conclusion This study offers supportive evidence for a causal association between maternal education and offspring birthweight, highlighting the significance of enhancing maternal education to prevent low birthweight.
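For readers unfamiliar with IVW MR, the following is a minimal sketch of the textbook fixed-effect inverse variance weighted estimator computed from per-variant summary statistics. The abstract does not say which IVW variant or software the authors used, so this is an assumption; the function name `ivw_estimate` and the toy numbers are made up for illustration.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted (IVW) MR estimate.

    beta_exposure : SNP-exposure effect estimates (one per instrument)
    beta_outcome  : SNP-outcome effect estimates
    se_outcome    : standard errors of the SNP-outcome estimates
    Returns (estimate, standard_error, (ci_low, ci_high)).
    """
    num = 0.0
    den = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        w = 1.0 / (se * se)      # inverse-variance weight of the outcome estimate
        num += w * bx * by
        den += w * bx * bx
    estimate = num / den
    std_err = math.sqrt(1.0 / den)
    ci = (estimate - 1.96 * std_err, estimate + 1.96 * std_err)
    return estimate, std_err, ci

# Toy summary statistics (illustrative only): every variant implies a ratio of 0.5
bx = [0.10, 0.20, 0.15]
by = [0.05, 0.10, 0.075]
se = [0.01, 0.02, 0.015]
est, se_est, ci = ivw_estimate(bx, by, se)
print(f"IVW estimate = {est:.3f} (SE {se_est:.3f}), 95% CI {ci[0]:.3f} to {ci[1]:.3f}")
```

The estimator is equivalent to a weighted regression of the SNP-outcome effects on the SNP-exposure effects through the origin, which is why horizontal pleiotropy (a non-zero intercept) is checked separately in sensitivity analyses.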
Affiliation(s)
- Yiyang Zhu
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Hao Zhang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Jike Qi
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Yuxin Liu
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Yu Yan
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ting Wang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ping Zeng
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Xuzhou Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Jiangsu Engineering Research Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
28
Chen Z, He Z, Chu BB, Gu J, Morrison T, Sabatti C, Candès E. Controlled Variable Selection from Summary Statistics Only? A Solution via GhostKnockoffs and Penalized Regression. ARXIV 2024:arXiv:2402.12724v1. [PMID: 38463500 PMCID: PMC10925382] [Indexed: 03/12/2024]
Abstract
Identifying which variables do influence a response while controlling false positives pervades statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the values of marginal empirical correlations between each dependent variable of potential interest and the response. This situation may arise due to privacy concerns, e.g., to avoid the release of sensitive genetic information. We extend GhostKnockoffs (He et al., 2022) and introduce variable selection methods based on penalized regression that achieve false discovery rate (FDR) control. We report empirical results in extensive simulation studies, demonstrating enhanced performance over previous work. We also apply our methods to genome-wide association studies of Alzheimer's disease, and show a significant improvement in power.
Affiliation(s)
- Zihuai He
  - Department of Neurology and Neurological Sciences, Stanford University
  - Department of Medicine (Biomedical Informatics Research), Stanford University
- Benjamin B Chu
  - Department of Biomedical Data Science, Stanford University
- Jiaqi Gu
  - Department of Neurology and Neurological Sciences, Stanford University
- Chiara Sabatti
  - Department of Statistics, Stanford University
  - Department of Biomedical Data Science, Stanford University
- Emmanuel Candès
  - Department of Statistics, Stanford University
  - Department of Mathematics, Stanford University
29
Ren J, Pan W. Statistical inference with large-scale trait imputation. Stat Med 2024; 43:625-641. [PMID: 38038193 PMCID: PMC10848238 DOI: 10.1002/sim.9975] [Received: 06/01/2023] [Revised: 09/26/2023] [Accepted: 11/17/2023] [Indexed: 12/02/2023]
Abstract
Recently, a nonparametric method called LS-imputation was proposed for large-scale trait imputation based on a GWAS summary dataset and a large set of genotyped individuals. The imputed trait values, along with the genotypes, can be treated as an individual-level dataset for downstream genetic analyses, including those that cannot be done with GWAS summary data. However, since the covariance matrix of the imputed trait values is often too large to calculate, the current method imposes a working assumption that the imputed trait values are independently and identically distributed, which does not actually hold. Here we propose a "divide and conquer/combine" strategy to estimate and account for the covariance matrix of the imputed trait values via batches, thus relaxing the incorrect working assumption. Applications of the methods to the UK Biobank data for marginal association analysis showed some improvement by the new method in some cases, but overall the original method performed well, which was explained by nearly constant variances of, and mostly weak correlations among, the imputed trait values.
Affiliation(s)
- Jingchen Ren
  - School of Statistics, University of Minnesota, Minneapolis, MN, 55455
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, 55455
- Wei Pan
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, 55455
30
Zhuang Y, Kim NY, Fritsche LG, Mukherjee B, Lee S. Incorporating functional annotation with bilevel continuous shrinkage for polygenic risk prediction. BMC Bioinformatics 2024; 25:65. [PMID: 38336614 PMCID: PMC11323637 DOI: 10.1186/s12859-024-05664-2] [Received: 03/31/2023] [Accepted: 01/19/2024] [Indexed: 02/12/2024] Open
Abstract
BACKGROUND Genetic variants can contribute differently to trait heritability by their functional categories, and recent studies have shown that incorporating functional annotation can improve the predictive performance of polygenic risk scores (PRSs). In addition, when only a small proportion of variants are causal variants, PRS methods that employ a Bayesian framework with shrinkage can account for such sparsity. It is possible that the annotation group level effect is also sparse. However, the number of PRS methods that incorporate both annotation information and shrinkage on effect sizes is limited. We propose a PRS method, PRSbils, which utilizes the functional annotation information with a bilevel continuous shrinkage prior to accommodate the varying genetic architectures both on the variant-specific level and on the functional annotation level. RESULTS We conducted simulation studies and investigated the predictive performance in settings with different genetic architectures. Results indicated that when there was a relatively large variability of group-wise heritability contribution, the gain in prediction performance from the proposed method was on average 8.0% higher AUC compared to the benchmark method PRS-CS. The proposed method also yielded higher predictive performance compared to PRS-CS in settings with different overlapping patterns of annotation groups and obtained on average 6.4% higher AUC. We applied PRSbils to binary and quantitative traits in three real world data sources (the UK Biobank, the Michigan Genomics Initiative (MGI), and the Korean Genome and Epidemiology Study (KoGES)), and two sources of annotations: ANNOVAR, and pathway information from the Kyoto Encyclopedia of Genes and Genomes (KEGG), and demonstrated that the proposed method holds the potential for improving predictive performance by incorporating functional annotations. 
CONCLUSIONS By utilizing a bilevel shrinkage framework, PRSbils enables the incorporation of both overlapping and non-overlapping annotations into PRS construction to improve the performance of genetic risk prediction. The software is available at https://github.com/styvon/PRSbils.
Affiliation(s)
- Na Yeon Kim
  - Seoul National University, Seoul, Republic of Korea
- Seunggeun Lee
  - Seoul National University, Seoul, Republic of Korea.
31
Dai J, Chen K, Zhu Y, Xia L, Wang T, Yuan Z, Zeng P. Identifying risk loci for obsessive-compulsive disorder and shared genetic component with schizophrenia: A large-scale multi-trait association analysis with summary statistics. Prog Neuropsychopharmacol Biol Psychiatry 2024; 129:110906. [PMID: 38043635 DOI: 10.1016/j.pnpbp.2023.110906] [Received: 06/23/2023] [Revised: 11/26/2023] [Accepted: 11/28/2023] [Indexed: 12/05/2023]
Abstract
Due to limited samples, no genetic loci have been identified for obsessive-compulsive disorder (OCD) in genome-wide association studies. Additionally, although co-morbidities between OCD and schizophrenia (SCZ) have been observed, their common genetic etiology is not completely known. Here, we conducted a comprehensive investigation of the genetic architecture of OCD and the common genetic foundation shared by OCD and SCZ using summary statistics data (2688 cases and 7037 controls for OCD; 53,386 cases and 77,258 controls for SCZ). We discovered a significant genetic correlation between OCD and SCZ (r̂g = 0.296, P = 2.82 × 10^-11). We then performed two multi-trait association analyses to detect OCD-associated loci and a colocalization analysis to detect causal variants. Parallel gene-level analyses were also implemented. We identified 323 OCD-relevant variants located within 12 loci, with four loci sharing the same causal variants between OCD and SCZ. Further, the gene-level analyses discovered 8 OCD-associated genes. Finally, multiple functional analyses at both SNP and gene levels showed that these genetic association signals had significant enrichments in the regions of left ventricle and anterior cingulate cortex, and suggested an important role of pathways involving regulation of telomere maintenance, histone phosphorylation, and GnRH secretion. Overall, this study identified new genetic loci for OCD and provided substantial evidence supporting a common genetic foundation underlying OCD and SCZ. The findings advance our understanding of the genetic architecture and pathophysiology of OCD and shed light on the shared genetic etiology of the two disorders.
Affiliation(s)
- Jing Dai
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Keying Chen
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Yiyang Zhu
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Lei Xia
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Ting Wang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Zhongshang Yuan
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
- Ping Zeng
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
  - Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
  - Xuzhou Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
  - Jiangsu Engineering Research Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China.
32
Jiang H, Tiche SJ, He CJ, Jedoui M, Forgo B, Zhao M, He B, Li Y, Li AM, Truong AT, Ho J, Simmermaker C, Yang Y, Zhou MN, Hu Z, Cuthbertson DJ, Svensson KJ, Hazard FK, Shimada H, Chiu B, Ye J. Mitochondrial uncoupler and retinoic acid synergistically induce differentiation and inhibit proliferation in neuroblastoma. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.22.576741. [PMID: 38328117 PMCID: PMC10849550 DOI: 10.1101/2024.01.22.576741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Neuroblastoma is a leading cause of death in childhood cancer cases. Unlike adult malignancies, which typically develop from aged cells through accumulated damage and mutagenesis, neuroblastoma originates from neural crest cells with disrupted differentiation. This distinct feature provides novel therapeutic opportunities beyond conventional cytotoxic methods. Previously, we reported that the mitochondrial uncoupler NEN (niclosamide ethanolamine) activated mitochondrial respiration to reprogram the epigenome, promoting neuronal differentiation. In the current study, we further combine NEN with retinoic acid (RA) to promote neural differentiation both in vitro and in vivo. The treatment increased the expression of RA signaling and neuron differentiation-related genes, resulting in a global shift in the transcriptome towards a more favorable prognosis. Overall, these results suggest that the combination of a mitochondrial uncoupler and the differentiation agent RA is a promising therapeutic strategy for neuroblastoma.
Affiliation(s)
- Haowen Jiang
  - Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Clifford JiaJun He
  - Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Mohamed Jedoui
  - Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Balint Forgo
  - Department of Pathology, Stanford University, Stanford, CA, USA
- Meng Zhao
  - Department of Pathology, Stanford University, Stanford, CA, USA
  - Stanford Diabetes Research Center, Stanford University School of Medicine, Stanford, CA, USA
  - Stanford Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA, USA
- Bo He
  - Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Yang Li
  - Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Albert M. Li
  - Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Jestine Ho
  - Agilent Technologies, Inc., Santa Clara, CA, USA
- Yanan Yang
  - Agilent Technologies, Inc., Santa Clara, CA, USA
- Meng-Ning Zhou
  - Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Zhen Hu
  - Olivia Consulting Service, Redwood City, CA, USA
- Katrin J. Svensson
  - Department of Pathology, Stanford University, Stanford, CA, USA
  - Stanford Diabetes Research Center, Stanford University School of Medicine, Stanford, CA, USA
  - Stanford Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA, USA
- Bill Chiu
  - Department of Surgery, Stanford University, Stanford, CA, USA
- Jiangbin Ye
  - Department of Radiation Oncology, Stanford University, Stanford, CA, USA
|
33
|
Spisak S, Chen D, Likasitwatanakul P, Doan P, Li Z, Bala P, Vizkeleti L, Tisza V, De Silva P, Giannakis M, Wolpin B, Qi J, Sethi NS. Utilizing a dual endogenous reporter system to identify functional regulators of aberrant stem cell and differentiation activity in colorectal cancer. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.06.21.545895. [PMID: 38293113 PMCID: PMC10827082 DOI: 10.1101/2023.06.21.545895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Aberrant stem cell-like activity and impaired differentiation are central to the development of colorectal cancer (CRC). To identify functional mediators that regulate these key cellular programs in CRC, we developed an endogenous reporter system by genome-editing human CRC cell lines with knock-in fluorescent reporters at the SOX9 and KRT20 loci to report aberrant stem cell-like activity and differentiation, respectively, and then performed pooled genetic perturbation screens. Constructing a dual reporter system that simultaneously monitored aberrant stem cell-like and differentiation activity in the same CRC cell line improved our signal-to-noise discrimination. Using a focused-library CRISPR screen targeting 78 epigenetic regulators with 542 sgRNAs, we identified factors that contribute to stem cell-like activity and differentiation in CRC. Perturbation single-cell RNA sequencing (Perturb-seq) of validated hits nominated SMARCB1 of the BAF complex (also known as SWI/SNF) as a negative regulator of differentiation across an array of neoplastic colon models. SMARCB1 is a dependency in CRC and is required for in vivo growth of human CRC models. These studies highlight the utility of a biologically designed endogenous reporter system to uncover novel therapeutic targets for drug development.
Affiliation(s)
- Sandor Spisak
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Institute of Enzymology, HUN-REN Research Centre for Natural Sciences, Budapest, Hungary
- David Chen
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Pornlada Likasitwatanakul
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, Cambridge, MA, USA
  - Department of Medicine, Faculty of Medicine Siriraj Hospital, Bangkok, Thailand
- Paul Doan
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, Cambridge, MA, USA
- Zhixin Li
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, Cambridge, MA, USA
- Pratyusha Bala
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, Cambridge, MA, USA
- Laura Vizkeleti
  - Department of Bioinformatics, Faculty of Medicine, Semmelweis University, 1094 Budapest, Hungary
- Viktoria Tisza
  - Institute of Enzymology, HUN-REN Research Centre for Natural Sciences, Budapest, Hungary
- Pushpamail De Silva
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Marios Giannakis
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, Cambridge, MA, USA
  - Gastrointestinal Cancer Center, Dana-Farber Cancer Institute, Boston, MA, USA
- Brian Wolpin
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Gastrointestinal Cancer Center, Dana-Farber Cancer Institute, Boston, MA, USA
- Jun Qi
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Nilay S. Sethi
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, Cambridge, MA, USA
  - Gastrointestinal Cancer Center, Dana-Farber Cancer Institute, Boston, MA, USA
|
34
|
Tanaka R, Wu D, Li X, Tibbs-Cortes LE, Wood JC, Magallanes-Lundback M, Bornowski N, Hamilton JP, Vaillancourt B, Li X, Deason NT, Schoenbaum GR, Buell CR, DellaPenna D, Yu J, Gore MA. Leveraging prior biological knowledge improves prediction of tocochromanols in maize grain. THE PLANT GENOME 2023; 16:e20276. [PMID: 36321716 DOI: 10.1002/tpg2.20276] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/29/2022] [Accepted: 09/21/2022] [Indexed: 06/16/2023]
Abstract
With an essential role in human health, tocochromanols are mostly obtained by consuming seed oils; however, the vitamin E content of the most abundant tocochromanols in maize (Zea mays L.) grain is low. Several large-effect genes with cis-acting variants affecting messenger RNA (mRNA) expression are mostly responsible for tocochromanol variation in maize grain, with other relevant associated quantitative trait loci (QTL) yet to be fully resolved. Leveraging existing genomic and transcriptomic information for maize inbreds could improve prediction when selecting for higher vitamin E content. Here, we first evaluated a multikernel genomic best linear unbiased prediction (MK-GBLUP) approach for modeling known QTL in the prediction of nine tocochromanol grain phenotypes (12-21 QTL per trait) within and between two panels of 1,462 and 242 maize inbred lines. On average, MK-GBLUP models improved predictive abilities by 7.0-13.6% when compared with GBLUP. In a second approach with a subset of 545 lines from the larger panel, the highest average improvement in predictive ability relative to GBLUP was achieved with a multi-trait GBLUP model (15.4%) that had a tocochromanol phenotype and transcript abundances in developing grain for a few large-effect candidate causal genes (1-3 genes per trait) as multiple response variables. Taken together, our study illustrates the enhancement of prediction models when informed by existing biological knowledge pertaining to QTL and candidate causal genes.
Affiliation(s)
- Ryokei Tanaka
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Di Wu
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Xiaowei Li
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Joshua C Wood
  - Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Nolan Bornowski
  - Dep. of Plant Biology, Michigan State Univ., East Lansing, MI, 48824, USA
- John P Hamilton
  - Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Brieanne Vaillancourt
  - Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Xianran Li
  - USDA ARS, Wheat Health, Genetics, and Quality Research Unit, Pullman, WA, 99164, USA
- Nicholas T Deason
  - Dep. of Biochemistry and Molecular Biology, Michigan State Univ., East Lansing, MI, 48824, USA
- C Robin Buell
  - Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Dean DellaPenna
  - Dep. of Biochemistry and Molecular Biology, Michigan State Univ., East Lansing, MI, 48824, USA
- Jianming Yu
  - Dep. of Agronomy, Iowa State Univ., Ames, IA, 50011, USA
- Michael A Gore
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
|
35
|
Jain PR, Burch M, Martinez M, Mir P, Fichna JP, Zekanowski C, Rizzo R, Tümer Z, Barta C, Yannaki E, Stamatoyannopoulos J, Drineas P, Paschou P. Can polygenic risk scores help explain disease prevalence differences around the world? A worldwide investigation. BMC Genom Data 2023; 24:70. [PMID: 37986041 PMCID: PMC10662565 DOI: 10.1186/s12863-023-01168-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 10/20/2023] [Indexed: 11/22/2023] Open
Abstract
Complex disorders are caused by a combination of genetic, environmental and lifestyle factors, and their prevalence can vary greatly across different populations. The extent to which genetic risk, as identified by genome-wide association studies (GWAS), correlates with disease prevalence in different populations has not been investigated systematically. Here, we studied 14 different complex disorders and explored whether polygenic risk scores (PRS) based on current GWAS correlate with disease prevalence within Europe and around the world. A clear variation in GWAS-based genetic risk was observed based on ancestry, and we identified populations that have a higher genetic liability for developing certain disorders. We found that for four out of the 14 studied disorders, PRS significantly correlates with disease prevalence within Europe. We also found significant correlations between worldwide disease prevalence and PRS for eight of the studied disorders, with Multiple Sclerosis genetic risk having the highest correlation with disease prevalence. Based on current GWAS results, the across-population differences in genetic risk for certain disorders can potentially be used to understand differences in disease prevalence and identify populations with the highest genetic liability. The study highlights both the limitations of PRS based on current GWAS and the fact that, in some cases, PRS may already have high predictive power. This could be due to the genetic architecture of specific disorders or increased GWAS power in some cases.
Affiliation(s)
- Pritesh R Jain
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Myson Burch
  - Department of Computer Sciences, Purdue University, West Lafayette, IN, USA
- Melanie Martinez
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Pablo Mir
  - Unidad de Trastornos del Movimiento, Instituto de Biomedicina de Sevilla (IBiS). Hospital Universitario Virgen del Rocío/CSIC/Universidad de Sevilla, Seville, Spain
  - Centro de Investigación Biomédica en Red Sobre Enfermedades Neurodegenerativas (CIBERNED), Madrid, Spain
- Jakub P Fichna
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
  - Department of Neurogenetics and Functional Genomics, Mossakowski Medical Research Institute, Polish Academy of Sciences, Warsaw, Poland
- Cezary Zekanowski
  - Department of Neurogenetics and Functional Genomics, Mossakowski Medical Research Institute, Polish Academy of Sciences, Warsaw, Poland
- Renata Rizzo
  - Child and Adolescent Neurology and Psychiatry, Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
- Zeynep Tümer
  - Department of Clinical Genetics, Kennedy Center, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
  - Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Csaba Barta
  - Department of Molecular Biology, Institute of Biochemistry and Molecular Biology, Semmelweis University, Budapest, Hungary
- Evangelia Yannaki
  - Hematology Department- Hematopoietic Cell Transplantation Unit, Gene and Cell Therapy Center, George Papanikolaou Hospital, Thessaloniki, Greece
  - Department of Medicine, University of Washington, Seattle, WA, USA
- John Stamatoyannopoulos
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Department of Medicine, Division of Oncology, University of Washington, Seattle, WA, USA
- Petros Drineas
  - Department of Computer Sciences, Purdue University, West Lafayette, IN, USA
- Peristera Paschou
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
|
36
|
Wang X, Hivert V, Groot S, Wang Y, Yengo L, McGrath JJ, Kemper KE, Visscher PM, Wray NR, Revez JA. Cross-ancestry analyses identify new genetic loci associated with 25-hydroxyvitamin D. PLoS Genet 2023; 19:e1011033. [PMID: 37963177 PMCID: PMC10684098 DOI: 10.1371/journal.pgen.1011033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 11/28/2023] [Accepted: 10/24/2023] [Indexed: 11/16/2023] Open
Abstract
Vitamin D status, a complex trait influenced by environmental and genetic factors, is tightly associated with skin colour and ancestry. Yet very few studies have investigated the genetic underpinnings of vitamin D levels across diverse ancestries, and those that have relied on small sample sizes, yielding inconclusive results. Here, we conduct genome-wide association studies (GWAS) of 25-hydroxyvitamin D (25OHD), the main circulating form of vitamin D, in 442,435 individuals from four broad genetically-determined ancestry groups represented in the UK Biobank: European (N = 421,867), South Asian (N = 9,983), African (N = 8,306) and East Asian (N = 2,279). We identify a new genetic determinant of 25OHD (rs146759773) in individuals of African ancestry, which was not detected in previous analyses of much larger European cohorts due to low minor allele frequency. We show genome-wide significant evidence of dominance effects in 25OHD that protect against vitamin D deficiency. Given that key events in the synthesis of 25OHD occur in the skin and are affected by pigmentation levels, we conduct GWAS of 25OHD stratified by skin colour and identify new associations. Lastly, we test the interaction between skin colour and variants associated with variance in 25OHD levels and identify two loci (rs10832254 and rs1352846) whose association with 25OHD differs in individuals of distinct complexions. Collectively, our results provide new insights into the complex relationship between 25OHD and skin colour and highlight the importance of diversity in genomic studies. Despite the much larger rates of vitamin D deficiency that we and others report for ancestry groups with dark skin (e.g., South Asian), our study highlights the importance of considering ancestral background and/or skin colour when assessing the implications of low vitamin D.
Affiliation(s)
- Xiaotong Wang
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Valentin Hivert
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Shiane Groot
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Ying Wang
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
  - Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- Loic Yengo
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- John J. McGrath
  - National Centre for Register-Based Research, Aarhus University, Aarhus V, Denmark
  - Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Brisbane, Queensland, Australia
  - Queensland Brain Institute, The University of Queensland, Brisbane, Queensland, Australia
- Kathryn E. Kemper
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Peter M. Visscher
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Naomi R. Wray
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
  - Queensland Brain Institute, The University of Queensland, Brisbane, Queensland, Australia
- Joana A. Revez
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
|
37
|
Bagheri M, Bombin A, Shi M, Murthy VL, Shah R, Mosley JD, Ferguson JF. Genotype-based "virtual" metabolomics in a clinical biobank identifies novel metabolite-disease associations. RESEARCH SQUARE 2023:rs.3.rs-3222588. [PMID: 37790512 PMCID: PMC10543429 DOI: 10.21203/rs.3.rs-3222588/v2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Circulating metabolites act as biomarkers of dysregulated metabolism, and may inform disease pathophysiology. A portion of the inter-individual variability in circulating metabolites is influenced by common genetic variation. We evaluated whether a genetics-based "virtual" metabolomics approach can identify novel metabolite-disease associations. We examined the association between polygenic scores for 726 metabolites (derived from OMICSPRED) with 1,247 clinical phenotypes in 57,735 European ancestry and 15,754 African ancestry participants from the BioVU DNA Biobank. We probed significant relationships through Mendelian randomization (MR) using genetic instruments constructed from the METSIM Study, and validated significant MR associations using independent GWAS of candidate phenotypes. We found significant associations between 336 metabolites and 168 phenotypes in European ancestry and 107 metabolites and 56 phenotypes among African ancestry. Of these metabolite-disease pairs, MR analyses confirmed associations between 73 metabolites and 53 phenotypes in European ancestry. Of 22 metabolite-phenotype pairs evaluated for replication in independent GWAS, 16 were significant (false discovery rate p<0.05). Validated findings included the metabolites bilirubin and X-21796 with cholelithiasis, phosphatidylcholine(16:0/22:5n3,18:1/20:4) and arachidonate(20:4n6) with inflammatory bowel disease and Crohn's disease, and campesterol with coronary artery disease and myocardial infarction. These associations may represent biomarkers or potentially targetable mediators of disease risk.
Affiliation(s)
- Ravi Shah
  - Vanderbilt University Medical Center
|
38
|
Bagheri M, Bombin A, Shi M, Murthy VL, Shah R, Mosley JD, Ferguson JF. Genotype-based "virtual" metabolomics in a clinical biobank identifies novel metabolite-disease associations. RESEARCH SQUARE 2023:rs.3.rs-3222588. [PMID: 37790512 PMCID: PMC10543429 DOI: 10.21203/rs.3.rs-3222588/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Circulating metabolites act as biomarkers of dysregulated metabolism, and may inform disease pathophysiology. A portion of the inter-individual variability in circulating metabolites is influenced by common genetic variation. We evaluated whether a genetics-based "virtual" metabolomics approach can identify novel metabolite-disease associations. We examined the association between polygenic scores for 726 metabolites (derived from OMICSPRED) with 1,247 clinical phenotypes in 57,735 European ancestry and 15,754 African ancestry participants from the BioVU DNA Biobank. We probed significant relationships through Mendelian randomization (MR) using genetic instruments constructed from the METSIM Study, and validated significant MR associations using independent GWAS of candidate phenotypes. We found significant associations between 336 metabolites and 168 phenotypes in European ancestry and 107 metabolites and 56 phenotypes among African ancestry. Of these metabolite-disease pairs, MR analyses confirmed associations between 73 metabolites and 53 phenotypes in European ancestry. Of 22 metabolite-phenotype pairs evaluated for replication in independent GWAS, 16 were significant (false discovery rate p<0.05). Validated findings included the metabolites bilirubin and X-21796 with cholelithiasis, phosphatidylcholine(16:0/22:5n3,18:1/20:4) and arachidonate(20:4n6) with inflammatory bowel disease and Crohn's disease, and campesterol with coronary artery disease and myocardial infarction. These associations may represent biomarkers or potentially targetable mediators of disease risk.
Affiliation(s)
- Ravi Shah
  - Vanderbilt University Medical Center
|
39
|
Anwar MY, Graff M, Highland HM, Smit R, Wang Z, Buchanan VL, Young KL, Kenny EE, Fernandez-Rhodes L, Liu S, Assimes T, Garcia DO, Daeeun K, Gignoux CR, Justice AE, Haiman CA, Buyske S, Peters U, Loos RJF, Kooperberg C, North KE. Assessing efficiency of fine-mapping obesity-associated variants through leveraging ancestry architecture and functional annotation using PAGE and UKBB cohorts. Hum Genet 2023; 142:1477-1489. [PMID: 37658231 PMCID: PMC11512743 DOI: 10.1007/s00439-023-02593-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Accepted: 08/10/2023] [Indexed: 09/03/2023]
Abstract
Inadequate representation of non-European ancestry populations in genome-wide association studies (GWAS) has limited opportunities to isolate functional variants. Fine-mapping in multi-ancestry populations should improve the efficiency of prioritizing variants for functional interrogation. To evaluate this hypothesis, we leveraged ancestry architecture to perform comparative GWAS and fine-mapping of obesity-related phenotypes in European ancestry populations from the UK Biobank (UKBB) and multi-ancestry samples from the Population Architecture for Genetic Epidemiology (PAGE) consortium with comparable sample sizes. In the investigated regions with genome-wide significant associations for obesity-related traits, fine-mapping in our ancestrally diverse sample led to 95% and 99% credible sets (CS) with fewer variants than in the European ancestry sample. Lead fine-mapped variants in PAGE regions had higher average coding scores, and higher average posterior probabilities for causality compared to UKBB. Importantly, 99% CS in PAGE loci contained strong expression quantitative trait loci (eQTLs) in adipose tissues or harbored more variants in tighter linkage disequilibrium (LD) with eQTLs. Leveraging ancestrally diverse populations with heterogeneous ancestry architectures, coupled with functional annotation, increased fine-mapping efficiency and performance, and reduced the set of candidate variants for consideration for future functional studies. Significant overlap in genetic causal variants across populations suggests generalizability of genetic mechanisms underpinning obesity-related traits across populations.
Affiliation(s)
- Mohammad Yaser Anwar
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Mariaelisa Graff
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Heather M Highland
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Roelof Smit
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Zhe Wang
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Victoria L Buchanan
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Kristin L Young
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Eimear E Kenny
  - Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Lindsay Fernandez-Rhodes
  - Department of Biobehavioral Health, College of Health and Human Development, Pennsylvania State University, University Park, PA, 16802, USA
- Simin Liu
  - Department of Epidemiology and Center for Global Cardiometabolic Health, School of Public Health, Brown University, Providence, RI, 02903, USA
- Themistocles Assimes
  - Division of Cardiovascular Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, CA, 94305, USA
- David O Garcia
  - Department of Health Promotion Sciences, Mel & Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ, 85724, USA
- Kim Daeeun
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Christopher R Gignoux
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Anne E Justice
  - Department of Population Health Sciences, Geisinger Health, Danville, PA, 17822, USA
- Christopher A Haiman
  - Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90033, USA
- Steve Buyske
  - Department of Statistics, Rutgers University, Piscataway, NJ, 08854, USA
- Ulrike Peters
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Ruth J F Loos
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Kari E North
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
|
40
|
Jullian Fabres P, Lee SH. Phenotypic variance partitioning by transcriptomic gene expression levels and environmental variables for anthropometric traits using GTEx data. Genet Epidemiol 2023; 47:465-474. [PMID: 37318147 DOI: 10.1002/gepi.22531] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/09/2023] [Revised: 04/03/2023] [Accepted: 06/02/2023] [Indexed: 06/16/2023]
Abstract
Phenotypic variation in humans is the result of genetic variation and environmental influences. Understanding the contribution of genetic and environmental components to phenotypic variation is of great interest. The variance explained by genome-wide single nucleotide polymorphisms (SNPs) typically represents a small proportion of the phenotypic variance for complex traits, which may be because the genome is only one part of the whole biological process that shapes the phenotypes. In this study, we propose to partition the phenotypic variance of three anthropometric traits, using gene expression levels and environmental variables from GTEx data. We use the gene expression of four tissues that are deemed relevant for the anthropometric traits (two adipose tissues, skeletal muscle tissue and blood tissue). Additionally, we estimate the transcriptome-environment correlation that partly underlies the phenotypes of the anthropometric traits. We found that genetic factors play a significant role in determining body mass index (BMI), with the proportion of phenotypic variance explained by gene expression levels of visceral adipose tissue being 0.68 (SE = 0.06). However, we also observed that environmental factors such as age, sex, ancestry, smoking status, and alcohol drinking status have a small but significant impact (0.005, SE = 0.001). Interestingly, we found a significant negative correlation between the transcriptomic and environmental effects on BMI (transcriptome-environment correlation = -0.54, SE = 0.14), suggesting an antagonistic relationship. This implies that individuals with lower genetic profiles may be more susceptible to the effects of environmental factors on BMI, while those with higher genetic profiles may be less susceptible. We also show that the estimated transcriptomic variance varies across tissues, e.g., the gene expression levels of whole blood tissue and environmental variables explain a lower proportion of BMI phenotypic variance (0.16, SE = 0.05 and 0.04, SE = 0.004, respectively). We observed a significant positive correlation between transcriptomic and environmental effects (1.21, SE = 0.23) for this tissue. In conclusion, phenotypic variance partitioning can be done using gene expression and environmental data even with a small sample size (n = 838 from GTEx data), which can provide insights into how the transcriptomic and environmental effects contribute to the phenotypes of the anthropometric traits.
Affiliation(s)
- Pastor Jullian Fabres
- Australian Centre for Precision Health, University of South Australia, Adelaide, South Australia, Australia
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, South Australia, Australia
- South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, South Australia, Australia
| | - S Hong Lee
- Australian Centre for Precision Health, University of South Australia, Adelaide, South Australia, Australia
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, South Australia, Australia
- South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, South Australia, Australia
| |
|
41
|
Salehi Nowbandegani P, Wohns AW, Ballard JL, Lander ES, Bloemendal A, Neale BM, O'Connor LJ. Extremely sparse models of linkage disequilibrium in ancestrally diverse association studies. Nat Genet 2023; 55:1494-1502. [PMID: 37640881 DOI: 10.1038/s41588-023-01487-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/29/2022] [Accepted: 07/24/2023] [Indexed: 08/31/2023]
Abstract
Linkage disequilibrium (LD) is the correlation among nearby genetic variants. In genetic association studies, LD is often modeled using large correlation matrices, but this approach is inefficient, especially in ancestrally diverse studies. In the present study, we introduce LD graphical models (LDGMs), which are an extremely sparse and efficient representation of LD. LDGMs are derived from genome-wide genealogies; statistical relationships among alleles in the LDGM correspond to genealogical relationships among haplotypes. We published LDGMs and ancestry-specific LDGM precision matrices for 18 million common variants (minor allele frequency >1%) in five ancestry groups, validated their accuracy and demonstrated order-of-magnitude improvements in runtime for commonly used LD matrix computations. We implemented an extremely fast multiancestry polygenic prediction method, BLUPx-ldgm, which performs better than a similar method based on the reference LD correlation matrix. LDGMs will enable sophisticated methods that scale to ancestrally diverse genetic association data across millions of variants and individuals.
Affiliation(s)
- Pouria Salehi Nowbandegani
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
| | - Anthony Wilder Wohns
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Stanford University School of Medicine, Stanford, CA, USA.
| | - Jenna L Ballard
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA, USA
| | - Eric S Lander
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biology, MIT, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Alex Bloemendal
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Luke J O'Connor
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
| |
|
42
|
Ren J, Lin Z, He R, Shen X, Pan W. Using GWAS summary data to impute traits for genotyped individuals. HGG ADVANCES 2023; 4:100197. [PMID: 37181332 PMCID: PMC10173780 DOI: 10.1016/j.xhgg.2023.100197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 04/07/2023] [Indexed: 05/16/2023] Open
Abstract
Genome-wide association study (GWAS) summary data have become extremely useful in daily routine data analysis, largely facilitating new methods development and new applications. However, a severe limitation with the current use of GWAS summary data is its exclusive restriction to only linear single nucleotide polymorphism (SNP)-trait association analyses. To further expand the use of GWAS summary data, along with a large sample of individual-level genotypes, we propose a nonparametric method for large-scale imputation of the genetic component of the trait for the given genotypes. The imputed individual-level trait values, along with the individual-level genotypes, make it possible to conduct any analysis as with individual-level GWAS data, including nonlinear SNP-trait associations and predictions. We use the UK Biobank data to highlight the usefulness and effectiveness of the proposed method in three applications that currently cannot be done with only GWAS summary data (for SNP-trait associations): marginal SNP-trait association analysis under non-additive genetic models, detection of SNP-SNP interactions, and genetic prediction of a trait using a nonlinear model of SNPs.
Affiliation(s)
- Jingchen Ren
- School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
| | - Zhaotong Lin
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
| | - Ruoyu He
- School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
| | - Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
| | - Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
| |
|
43
|
Morgante F, Carbonetto P, Wang G, Zou Y, Sarkar A, Stephens M. A flexible empirical Bayes approach to multivariate multiple regression, and its improved accuracy in predicting multi-tissue gene expression from genotypes. PLoS Genet 2023; 19:e1010539. [PMID: 37418505 PMCID: PMC10355440 DOI: 10.1371/journal.pgen.1010539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 06/02/2023] [Indexed: 07/09/2023] Open
Abstract
Predicting phenotypes from genotypes is a fundamental task in quantitative genetics. With technological advances, it is now possible to measure multiple phenotypes in large samples. Multiple phenotypes can share their genetic component; therefore, modeling these phenotypes jointly may improve prediction accuracy by leveraging effects that are shared across phenotypes. However, effects can be shared across phenotypes in a variety of ways, so computationally efficient statistical methods are needed that can accurately and flexibly capture patterns of effect sharing. Here, we describe new Bayesian multivariate, multiple regression methods that, by using flexible priors, are able to model and adapt to different patterns of effect sharing and specificity across phenotypes. Simulation results show that these new methods are fast and improve prediction accuracy compared with existing methods in a wide range of settings where effects are shared. Further, in settings where effects are not shared, our methods still perform competitively with state-of-the-art methods. In real data analyses of expression data in the Genotype Tissue Expression (GTEx) project, our methods improve prediction performance on average for all tissues, with the greatest gains in tissues where effects are strongly shared, and in the tissues with smaller sample sizes. While we use gene expression prediction to illustrate our methods, the methods are generally applicable to any multi-phenotype applications, including prediction of polygenic scores and breeding values. Thus, our methods have the potential to provide improvements across fields and organisms.
Affiliation(s)
- Fabio Morgante
- Center for Human Genetics, Clemson University, Greenwood, South Carolina, United States of America
- Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina, United States of America
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
| | - Peter Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Research Computing Center, University of Chicago, Chicago, Illinois, United States of America
| | - Gao Wang
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Department of Neurology, Columbia University, New York, New York, United States of America
- Gertrude H. Sergievsky Center, Columbia University, New York, New York, United States of America
| | - Yuxin Zou
- Department of Statistics, University of Chicago, Chicago, Illinois, United States of America
- Regeneron Genetics Center, Regeneron Pharmaceuticals Inc., Tarrytown, New York, United States of America
| | - Abhishek Sarkar
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
| | - Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Department of Statistics, University of Chicago, Chicago, Illinois, United States of America
| |
|
44
|
Karhunen V, Launonen I, Järvelin MR, Sebert S, Sillanpää MJ. Genetic fine-mapping from summary data using a nonlocal prior improves the detection of multiple causal variants. Bioinformatics 2023; 39:btad396. [PMID: 37348543 PMCID: PMC10326304 DOI: 10.1093/bioinformatics/btad396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 06/09/2023] [Accepted: 06/20/2023] [Indexed: 06/24/2023] Open
Abstract
MOTIVATION Genome-wide association studies (GWAS) have been successful in identifying genomic loci associated with complex traits. Genetic fine-mapping aims to detect independent causal variants from the GWAS-identified loci, adjusting for linkage disequilibrium patterns. RESULTS We present "FiniMOM" (fine-mapping using a product inverse-moment prior), a novel Bayesian fine-mapping method for summarized genetic associations. For causal effects, the method uses a nonlocal inverse-moment prior, which is a natural prior distribution to model non-null effects in finite samples. A beta-binomial prior is set for the number of causal variants, with a parameterization that can be used to control for potential misspecifications in the linkage disequilibrium reference. The results of simulation studies designed to mimic a typical GWAS on circulating protein levels show improved credible set coverage and power of the proposed method over the current state-of-the-art fine-mapping method, SuSiE, especially in the case of multiple causal variants within a locus. AVAILABILITY AND IMPLEMENTATION https://vkarhune.github.io/finimom/.
Affiliation(s)
- Ville Karhunen
- Research Unit of Mathematical Sciences, University of Oulu, Oulu, P.O.Box 8000, FI-90014, Finland
- Research Unit of Population Health, University of Oulu, Oulu, Finland
| | - Ilkka Launonen
- Research Unit of Mathematical Sciences, University of Oulu, Oulu, P.O.Box 8000, FI-90014, Finland
| | - Marjo-Riitta Järvelin
- Research Unit of Population Health, University of Oulu, Oulu, Finland
- Department of Epidemiology and Biostatistics, Imperial College London, London, United Kingdom
- Department of Life Sciences, College of Health and Life Sciences, Brunel University, London, United Kingdom
| | - Sylvain Sebert
- Research Unit of Population Health, University of Oulu, Oulu, Finland
| | - Mikko J Sillanpää
- Research Unit of Mathematical Sciences, University of Oulu, Oulu, P.O.Box 8000, FI-90014, Finland
| |
|
45
|
Fu S, Purdue MP, Zhang H, Qin J, Song L, Berndt SI, Yu K. Improve the model of disease subtype heterogeneity by leveraging external summary data. PLoS Comput Biol 2023; 19:e1011236. [PMID: 37437002 DOI: 10.1371/journal.pcbi.1011236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Accepted: 06/02/2023] [Indexed: 07/14/2023] Open
Abstract
Researchers are often interested in understanding the disease subtype heterogeneity by testing whether a risk exposure has the same level of effect on different disease subtypes. The polytomous logistic regression (PLR) model provides a flexible tool for such an evaluation. Disease subtype heterogeneity can also be investigated with a case-only study that uses a case-case comparison procedure to directly assess the difference between risk effects on two disease subtypes. Motivated by a large consortium project on the genetic basis of non-Hodgkin lymphoma (NHL) subtypes, we develop PolyGIM, a procedure to fit the PLR model by integrating individual-level data with summary data extracted from multiple studies under different designs. The summary data consist of coefficient estimates from working logistic regression models established by external studies. Examples of the working model include the case-case comparison model and the case-control comparison model, which compares the control group with a subtype group or a broad disease group formed by merging several subtypes. PolyGIM efficiently evaluates risk effects and provides a powerful test for disease subtype heterogeneity in situations when only summary data, instead of individual-level data, is available from external studies due to various informatics and privacy constraints. We investigate the theoretic properties of PolyGIM and use simulation studies to demonstrate its advantages. Using data from eight genome-wide association studies within the NHL consortium, we apply it to study the effect of the polygenic risk score defined by a lymphoid malignancy on the risks of four NHL subtypes. These results show that PolyGIM can be a valuable tool for pooling data from multiple sources for a more coherent evaluation of disease subtype heterogeneity.
Affiliation(s)
- Sheng Fu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
| | - Mark P Purdue
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
| | - Han Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
| | - Jing Qin
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Lei Song
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
| | - Sonja I Berndt
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
| | - Kai Yu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
| |
|
46
|
Lu H, Zhang S, Jiang Z, Zeng P. Leveraging trans-ethnic genetic risk scores to improve association power for complex traits in underrepresented populations. Brief Bioinform 2023:bbad232. [PMID: 37332016 DOI: 10.1093/bib/bbad232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 05/06/2023] [Accepted: 06/04/2023] [Indexed: 06/20/2023] Open
Abstract
Trans-ethnic genome-wide association studies have revealed that many loci identified in European populations can be reproducible in non-European populations, indicating widespread trans-ethnic genetic similarity. However, how to leverage such shared information more efficiently in association analysis is less investigated for traits in underrepresented populations. We here propose a statistical framework, trans-ethnic genetic risk score informed gene-based association mixed model (GAMM), by hierarchically modeling single-nucleotide polymorphism effects in the target population as a function of effects of the same trait in well-studied populations. GAMM powerfully integrates genetic similarity across distinct ancestral groups to enhance power in understudied populations, as confirmed by extensive simulations. We illustrate the usefulness of GAMM via the application to 13 blood cell traits (i.e. basophil count, eosinophil count, hematocrit, hemoglobin concentration, lymphocyte count, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, mean corpuscular volume, monocyte count, neutrophil count, platelet count, red blood cell count and total white blood cell count) in Africans of the UK Biobank (n = 3204) while utilizing genetic overlap shared in Europeans (n = 746 667) and East Asians (n = 162 255). We discovered multiple new associated genes, which had otherwise been missed by existing methods, and revealed that the trans-ethnic information indirectly contributed much to the phenotypic variance. Overall, GAMM represents a flexible and powerful statistical framework of association analysis for complex traits in underrepresented populations by integrating trans-ethnic genetic similarity across well-studied populations, and helps attenuate health inequities in current genetics research for people of minority populations.
Affiliation(s)
- Haojie Lu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
| | - Shuo Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
| | - Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
| |
|
47
|
Long E, Wan P, Chen Q, Lu Z, Choi J. From function to translation: Decoding genetic susceptibility to human diseases via artificial intelligence. CELL GENOMICS 2023; 3:100320. [PMID: 37388909 PMCID: PMC10300605 DOI: 10.1016/j.xgen.2023.100320] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 07/01/2023]
Abstract
While genome-wide association studies (GWAS) have discovered thousands of disease-associated loci, molecular mechanisms for a considerable fraction of the loci remain to be explored. The logical next steps for post-GWAS are interpreting these genetic associations to understand disease etiology (GWAS functional studies) and translating this knowledge into clinical benefits for the patients (GWAS translational studies). Although various datasets and approaches using functional genomics have been developed to facilitate these studies, significant challenges remain due to data heterogeneity, multiplicity, and high dimensionality. To address these challenges, artificial intelligence (AI) technology has demonstrated considerable promise in decoding complex functional datasets and providing novel biological insights into GWAS findings. This perspective first describes the landmark progress driven by AI in interpreting and translating GWAS findings and then outlines specific challenges followed by actionable recommendations related to data availability, model optimization, and interpretation, as well as ethical concerns.
Affiliation(s)
- Erping Long
- Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Peixing Wan
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Qingyu Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Jiyeon Choi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| |
|
48
|
Sharapov SZ, Timoshchuk AN, Aulchenko YS. Genetic control of N-glycosylation of human blood plasma proteins. Vavilovskii Zhurnal Genet Selektsii 2023; 27:224-239. [PMID: 37293449 PMCID: PMC10244589 DOI: 10.18699/vjgb-23-29] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Revised: 01/20/2023] [Accepted: 01/23/2022] [Indexed: 06/10/2023] Open
Abstract
Glycosylation is an important protein modification, which influences the physical and chemical properties as well as biological function of these proteins. Large-scale population studies have shown that the levels of various plasma protein N-glycans are associated with many multifactorial human diseases. Observed associations between protein glycosylation levels and human diseases have led to the conclusion that N-glycans can be considered a potential source of biomarkers and therapeutic targets. Although biochemical pathways of glycosylation are well studied, the understanding of the mechanisms underlying general and tissue-specific regulation of these biochemical reactions in vivo is limited. This complicates both the interpretation of the observed associations between protein glycosylation levels and human diseases, and the development of glycan-based biomarkers and therapeutics. By the beginning of the 2010s, high-throughput methods of N-glycome profiling had become available, allowing research into the genetic control of N-glycosylation using quantitative genetics methods, including genome-wide association studies (GWAS). Application of these methods has made it possible to find previously unknown regulators of N-glycosylation and expanded the understanding of the role of N-glycans in the control of multifactorial diseases and human complex traits. The present review considers the current knowledge of the genetic control of variability in the levels of N-glycosylation of plasma proteins in human populations. It briefly describes the most popular physical-chemical methods of N-glycome profiling and the databases that contain genes involved in the biosynthesis of N-glycans. It also reviews the results of studies of environmental and genetic factors contributing to the variability of N-glycans as well as the mapping results of the genomic loci of N-glycans by GWAS. The results of functional in vitro and in silico studies are described. The review summarizes the current progress in human glycogenomics and suggests possible directions for further research.
Affiliation(s)
- S Zh Sharapov
- MSU Institute for Artificial Intelligence, Lomonosov Moscow State University, Moscow, Russia
| | - A N Timoshchuk
- MSU Institute for Artificial Intelligence, Lomonosov Moscow State University, Moscow, Russia
| | - Y S Aulchenko
- MSU Institute for Artificial Intelligence, Lomonosov Moscow State University, Moscow, Russia
- Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| |
|
49
|
Aygün N, Liang D, Crouse WL, Keele GR, Love MI, Stein JL. Inferring cell-type-specific causal gene regulatory networks during human neurogenesis. Genome Biol 2023; 24:130. [PMID: 37254169 PMCID: PMC10230710 DOI: 10.1186/s13059-023-02959-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/26/2022] [Accepted: 05/05/2023] [Indexed: 06/01/2023] Open
Abstract
BACKGROUND Genetic variation influences both chromatin accessibility, assessed in chromatin accessibility quantitative trait loci (caQTL) studies, and gene expression, assessed in expression QTL (eQTL) studies. Genetic variants can impact either nearby genes (cis-eQTLs) or distal genes (trans-eQTLs). Colocalization between caQTL and eQTL, or cis- and trans-eQTLs suggests that they share causal variants. However, pairwise colocalization between these molecular QTLs does not guarantee a causal relationship. Mediation analysis can be applied to assess the evidence supporting causality versus independence between molecular QTLs. Given that the function of QTLs can be cell-type-specific, we performed mediation analyses to find epigenetic and distal regulatory causal pathways for genes within two major cell types of the developing human cortex, progenitors and neurons. RESULTS We find that the expression of 168 and 38 genes is mediated by chromatin accessibility in progenitors and neurons, respectively. We also find that the expression of 11 and 12 downstream genes is mediated by upstream genes in progenitors and neurons. Moreover, we discover that a genetic locus associated with inter-individual differences in brain structure shows evidence for mediation of SLC26A7 through chromatin accessibility, identifying molecular mechanisms of a common variant association to a brain trait. CONCLUSIONS In this study, we identify cell-type-specific causal gene regulatory networks whereby the impacts of variants on gene expression were mediated by chromatin accessibility or distal gene expression. Identification of these causal paths will enable identifying and prioritizing actionable regulatory targets perturbing these key processes during neurodevelopment.
Affiliation(s)
- Nil Aygün
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Dan Liang
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Wesley L Crouse
- Department of Human Genetics, University of Chicago, Chicago, IL, 60637, USA
| | - Gregory R Keele
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, 04609, USA
| | - Michael I Love
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
| | - Jason L Stein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
| |
|
50
|
Vallarino JG, Jun H, Wang S, Wang X, Sade N, Orf I, Zhang D, Shi J, Shen S, Cuadros-Inostroza Á, Xu Q, Luo J, Fernie AR, Brotman Y. Limitations and advantages of using metabolite-based genome-wide association studies: focus on fruit quality traits. PLANT SCIENCE : AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2023; 333:111748. [PMID: 37230189 DOI: 10.1016/j.plantsci.2023.111748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Revised: 05/19/2023] [Accepted: 05/21/2023] [Indexed: 05/27/2023]
Abstract
In recent decades, linkage mapping has helped locate metabolite quantitative trait loci (QTL) in many species; however, this approach has limitations. Recently, thanks to advances in high-throughput genotyping technologies such as next-generation sequencing, the metabolite genome-wide association study (mGWAS) has been proposed as a powerful tool to identify the genetic variants underlying polygenic agronomic traits. Fruit flavor is a complex interaction of aroma volatiles and taste, with the sugar-to-acid ratio being a key parameter for flavor acceptance. Here, we review recent progress of mGWAS in pinpointing gene polymorphisms related to flavor-related metabolites in fruits. Despite clear successes in discovering novel genes or regions associated with metabolite accumulation affecting sensory attributes in fruits, GWAS has several limitations, which are summarized in this review. In addition, in our own work, we performed mGWAS on 194 Citrus grandis accessions to investigate the genetic control of individual primary and lipid metabolites in ripe fruit. We identified a total of 667 associations for 14 primary metabolites, including amino acids, sugars, and organic acids, and 768 associations corresponding to 47 lipids. Furthermore, we discovered candidate genes related to metabolites important for fruit quality, such as sugars, organic acids, and lipids.
Affiliation(s)
- José G Vallarino
- Instituto de Hortofruticultura Subtropical y Mediterránea "La Mayora", Universidad de Málaga-Consejo Superior de Investigaciones Científicas, Departamento de Biología Molecular y Bioquímica, Campus de Teatinos, 29071 Málaga, Spain
- Hong Jun
- Department of Genetics and Developmental Science, Joint International Research Laboratory of Metabolic and Developmental Sciences, State Key Laboratory of Hybrid Rice, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; Waite Research Institute, School of Agriculture, Food and Wine, University of Adelaide, Urrbrae, SA, Australia
- Xia Wang
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), Huazhong Agricultural University, Wuhan, China
- Nir Sade
- School of Plant Sciences and Food Security, Tel Aviv University, P.O.B. 39040, 55 Haim Levanon St., Tel Aviv 6139001, Israel
- Isabel Orf
- Department of Life Sciences, Ben Gurion University of the Negev, Beersheva, Israel
- Dabing Zhang
- Department of Genetics and Developmental Science, Joint International Research Laboratory of Metabolic and Developmental Sciences, State Key Laboratory of Hybrid Rice, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; Waite Research Institute, School of Agriculture, Food and Wine, University of Adelaide, Urrbrae, SA, Australia
- Jianxin Shi
- Department of Genetics and Developmental Science, Joint International Research Laboratory of Metabolic and Developmental Sciences, State Key Laboratory of Hybrid Rice, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Shuangqian Shen
- National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, China
- Qiang Xu
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), Huazhong Agricultural University, Wuhan, China
- Jie Luo
- College of Tropical Crops, Hainan University, Haikou, China; National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, China
- Alisdair R Fernie
- Department of Root Biology and Symbiosis, Max Planck Institute of Molecular Plant Physiology, 1 Am Mühlenberg, Golm, Potsdam 14476, Germany; Department of Plant Metabolomics, Center for Plant Systems Biology and Biotechnology, 139 Ruski Blvd., Plovdiv 4000, Bulgaria
- Yariv Brotman
- Department of Life Sciences, Ben Gurion University of the Negev, Beersheva, Israel
|