1
Tan Q, Xu X, Zhou H, Jia J, Jia Y, Tu H, Zhou D, Wu X. A multi-ancestry cerebral cortex transcriptome-wide association study identifies genes associated with smoking behaviors. Mol Psychiatry 2024:10.1038/s41380-024-02605-6. [PMID: 38816585 DOI: 10.1038/s41380-024-02605-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 04/30/2024] [Accepted: 05/09/2024] [Indexed: 06/01/2024]
Abstract
Transcriptome-wide association studies (TWAS) have provided valuable insight into genes that may influence cigarette smoking. Most previous studies, however, focused on European ancestry, and few TWAS have been conducted across multiple ancestries to explore genes that may affect smoking behaviors. In this study, we used cis-eQTL data of cerebral cortex from multiple ancestries in MetaBrain, including European, East Asian, and African samples, as reference panels to perform multi-ancestry TWAS analyses on ancestry-matched GWASs of four smoking behaviors (smoking initiation, smoking cessation, age of smoking initiation, and number of cigarettes per day) from the GWAS & Sequencing Consortium of Alcohol and Nicotine use (GSCAN). A multi-ancestry fine-mapping approach was used to identify credible gene sets associated with these four traits. Enrichment and module network analyses were then performed to explore the potential roles of the identified gene sets. A total of 719 unique genes were associated with at least one of the four smoking traits across ancestries. Among those, 249 genes were further prioritized as putative causal genes by multi-ancestry fine-mapping. Several well-known smoking-related genes, including PSMA4, IREB2, and CHRNA3, showed high confidence across ancestries. Some novel genes, e.g., TSPAN3 and ANK2, were also identified in the credible sets. The enrichment analysis identified a series of critical pathways related to smoking, such as synaptic transmission and glutamate receptor activity. Leveraging the latest multi-ancestry GWAS and eQTL data sources, this study revealed hundreds of genes and relevant biological processes related to smoking behaviors. These findings provide new insights for future functional studies on smoking behaviors.
Affiliation(s)
- Qilong Tan
  - Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, 310058, China
- Xiaohang Xu
  - Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, 310058, China
- Hanyi Zhou
  - Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, 310058, China
- Junlin Jia
  - Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, 310058, China
- Yubing Jia
  - Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, 310058, China
- Huakang Tu
  - Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, 310058, China
  - National Institute for Data Science in Health and Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Dan Zhou
  - Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, 310058, China
  - Cancer Center, Zhejiang University, Hangzhou, 310058, China
- Xifeng Wu
  - Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, 310058, China
  - School of Medicine and Health Science, George Washington University, Washington, DC, USA
2
Troubat L, Fettahoglu D, Henches L, Aschard H, Julienne H. Multi-trait GWAS for diverse ancestries: mapping the knowledge gap. BMC Genomics 2024; 25:375. [PMID: 38627641 PMCID: PMC11022331 DOI: 10.1186/s12864-024-10293-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 04/09/2024] [Indexed: 04/19/2024] Open
Abstract
BACKGROUND: Approximately 95% of samples analyzed in univariate genome-wide association studies (GWAS) are of European ancestry. This bias toward European ancestry populations in association screening also exists for other analyses and methods, which are often developed and tested on European ancestry only. However, existing data in non-European populations, which are often of modest sample size, could benefit from innovative approaches, as recently illustrated in the context of polygenic risk scores.
METHODS: Here, we extend our multi-trait GWAS pipeline, JASS (Joint Analysis of Summary Statistics), to the analysis of non-European ancestries and assess its potential limitations and gains. To this end, we conducted the joint GWAS of 19 hematological traits and glycemic traits across five ancestries (European (EUR), admixed American (AMR), African (AFR), East Asian (EAS), and South-East Asian (SAS)).
RESULTS: We detected 367 new genome-wide significant associations in non-European populations (15 in Admixed American (AMR), 72 in African (AFR) and 280 in East Asian (EAS)). New associations detected represent 5%, 17% and 13% of associations in the AFR, AMR and EAS populations, respectively. Overall, multi-trait testing increases the replication of European associated loci in non-European ancestry by 15%. Pleiotropic effects were highly similar at significant loci across ancestries (e.g. the mean correlation between multi-trait genetic effects of EUR and EAS ancestries was 0.88). For hematological traits, strong discrepancies in multi-trait genetic effects are tied to known evolutionary divergences: the ACKR1 locus, which is adaptive against P. vivax-induced malaria.
CONCLUSIONS: Multi-trait GWAS can be a valuable tool to narrow the genetic knowledge gap between European and non-European populations.
Affiliation(s)
- Lucie Troubat
  - Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, F-75015, France
- Deniz Fettahoglu
  - Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, F-75015, France
- Léo Henches
  - Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, F-75015, France
- Hugues Aschard
  - Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, F-75015, France
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Hanna Julienne
  - Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, F-75015, France
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, Paris, F-75015, France
3
Lappalainen T, Li YI, Ramachandran S, Gusev A. Genetic and molecular architecture of complex traits. Cell 2024; 187:1059-1075. [PMID: 38428388 DOI: 10.1016/j.cell.2024.01.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/20/2023] [Accepted: 01/16/2024] [Indexed: 03/03/2024]
Abstract
Human genetics has emerged as one of the most dynamic areas of biology, with a broadening societal impact. In this review, we discuss recent achievements, ongoing efforts, and future challenges in the field. Advances in technology, statistical methods, and the growing scale of research efforts have all provided many insights into the processes that have given rise to the current patterns of genetic variation. Vast maps of genetic associations with human traits and diseases have allowed characterization of their genetic architecture. Finally, studies of molecular and cellular effects of genetic variants have provided insights into biological processes underlying disease. Many outstanding questions remain, but the field is well poised for groundbreaking discoveries as it increases the use of genetic data to understand both the history of our species and its applications to improve human health.
Affiliation(s)
- Tuuli Lappalainen
  - New York Genome Center, New York, NY, USA
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Yang I Li
  - Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Sohini Ramachandran
  - Ecology, Evolution and Organismal Biology, Center for Computational Molecular Biology, and the Data Science Institute, Brown University, Providence, RI 029129, USA
- Alexander Gusev
  - Harvard Medical School and Dana-Farber Cancer Institute, Boston, MA, USA
4
Ahlquist KD, Sugden LA, Ramachandran S. Enabling interpretable machine learning for biological data with reliability scores. PLoS Comput Biol 2023; 19:e1011175. [PMID: 37235578 PMCID: PMC10249903 DOI: 10.1371/journal.pcbi.1011175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 06/08/2023] [Accepted: 05/10/2023] [Indexed: 05/28/2023] Open
Abstract
Machine learning tools have proven useful across biological disciplines, allowing researchers to draw conclusions from large datasets, and opening up new opportunities for interpreting complex and heterogeneous biological data. Alongside the rapid growth of machine learning, there have also been growing pains: some models that appear to perform well have later been revealed to rely on features of the data that are artifactual or biased; this feeds into the general criticism that machine learning models are designed to optimize model performance over the creation of new biological insights. A natural question arises: how do we develop machine learning models that are inherently interpretable or explainable? In this manuscript, we describe the SWIF(r) reliability score (SRS), a method building on the SWIF(r) generative framework that reflects the trustworthiness of the classification of a specific instance. The concept of the reliability score has the potential to generalize to other machine learning methods. We demonstrate the utility of the SRS when faced with common challenges in machine learning including: 1) an unknown class present in testing data that was not present in training data, 2) systemic mismatch between training and testing data, and 3) instances of testing data that have missing values for some attributes. We explore these applications of the SRS using a range of biological datasets, from agricultural data on seed morphology, to 22 quantitative traits in the UK Biobank, and population genetic simulations and 1000 Genomes Project data. With each of these examples, we demonstrate how the SRS can allow researchers to interrogate their data and training approach thoroughly, and to pair their domain-specific knowledge with powerful machine-learning frameworks. We also compare the SRS to related tools for outlier and novelty detection, and find that it has comparable performance, with the advantage of being able to operate when some data are missing. 
The SRS, and the broader discussion of interpretable scientific machine learning, will aid researchers in the biological machine learning space as they seek to harness the power of machine learning without sacrificing rigor and biological insight.
Affiliation(s)
- K. D. Ahlquist
  - Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
  - Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island, United States of America
- Lauren A. Sugden
  - Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, Pennsylvania, United States of America
- Sohini Ramachandran
  - Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
  - Department of Ecology, Evolution and Organismal Biology, Brown University, Providence, Rhode Island, United States of America
  - Data Science Initiative, Brown University, Providence, Rhode Island, United States of America
5
Lee CJ, Chen TH, Lim AMW, Chang CC, Sie JJ, Chen PL, Chang SW, Wu SJ, Hsu CL, Hsieh AR, Yang WS, Fann CSJ. Phenome-wide analysis of Taiwan Biobank reveals novel glycemia-related loci and genetic risks for diabetes. Commun Biol 2022; 5:1175. [PMID: 36329257 PMCID: PMC9633758 DOI: 10.1038/s42003-022-04168-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/20/2022] [Accepted: 10/25/2022] [Indexed: 11/05/2022] Open
Abstract
To explore the complex genetic architecture of common diseases and traits, we conducted a comprehensive PheWAS of ten diseases and 34 quantitative traits in the community-based Taiwan Biobank (TWB). We identified 995 significantly associated loci, including 135 novel loci specific to the Taiwanese population. Further analyses highlighted the genetic pleiotropy of loci related to complex diseases and associated quantitative traits. Extensive analysis of glycaemic phenotypes (T2D, fasting glucose and HbA1c) identified 115 significant loci, including four novel genetic variants (HACL1, RAD21, ASH1L and GAK). Transcriptomics data also strengthen the relevance of the findings to metabolic disorders, thus contributing to a better understanding of pathogenesis. In addition, genetic risk scores were constructed and validated for absolute risk prediction of T2D in the Taiwanese population. In conclusion, our data-driven approach without an a priori hypothesis is useful for novel gene discovery and validation, in addition to disease risk prediction, in a unique non-European population.
Affiliation(s)
- Chia-Jung Lee
  - Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan
  - Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, USA
- Ting-Huei Chen
  - Department of Mathematics and Statistics, Laval University, Quebec, QC, G1V0A6, Canada
  - Brain Research Centre (CERVO), Quebec, QC, G1V0A6, Canada
- Aylwin Ming Wee Lim
  - Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan
  - Taiwan International Graduate Program in Molecular Medicine, National Yang Ming Chiao Tung University and Academia Sinica, Taipei, 115, Taiwan
- Chien-Ching Chang
  - Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan
- Jia-Jyun Sie
  - Department of Mathematics, National Changhua University of Education, Changhua, Taiwan
- Pei-Lung Chen
  - Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei, 10617, Taiwan
  - Department of Medical Genetics, National Taiwan University Hospital, Taipei, 100225, Taiwan
  - Graduate Institute of Clinical Medicine, College of Medicine, National Taiwan University, Taipei, 10617, Taiwan
- Su-Wei Chang
  - Clinical Informatics and Medical Statistics Research Center, Chang Gung University, Taoyuan, 333, Taiwan
  - Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan, 333, Taiwan
- Shang-Jung Wu
  - Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan
- Chia-Lin Hsu
  - Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan
- Ai-Ru Hsieh
  - Department of Statistics, Tamkang University, New Taipei City, 251301, Taiwan
- Wei-Shiung Yang
  - Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei, 10617, Taiwan
  - Graduate Institute of Clinical Medicine, College of Medicine, National Taiwan University, Taipei, 10617, Taiwan
  - Department of Internal Medicine, National Taiwan University Hospital, Taipei, 100225, Taiwan
- Cathy S J Fann
  - Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan
6
Lu Z, Gopalan S, Yuan D, Conti DV, Pasaniuc B, Gusev A, Mancuso N. Multi-ancestry fine-mapping improves precision to identify causal genes in transcriptome-wide association studies. Am J Hum Genet 2022; 109:1388-1404. [PMID: 35931050 PMCID: PMC9388396 DOI: 10.1016/j.ajhg.2022.07.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 01/28/2022] [Accepted: 06/30/2022] [Indexed: 02/06/2023] Open
Abstract
Transcriptome-wide association studies (TWASs) are a powerful approach to identify genes whose expression is associated with complex disease risk. However, non-causal genes can exhibit association signals due to confounding by linkage disequilibrium (LD) patterns and eQTL pleiotropy at genomic risk regions, which necessitates fine-mapping of TWAS signals. Here, we present MA-FOCUS, a multi-ancestry framework for the improved identification of genes underlying traits of interest. We demonstrate that by leveraging differences in ancestry-specific patterns of LD and eQTL signals, MA-FOCUS consistently outperforms single-ancestry fine-mapping approaches with equivalent total sample sizes across multiple metrics. We perform TWASs for 15 blood traits using genome-wide summary statistics (average n_EA = 511k, n_AA = 13k) and lymphoblastoid cell line eQTL data from cohorts of primarily European and African continental ancestries. We recapitulate evidence demonstrating shared genetic architectures for eQTL and blood traits between the two ancestry groups and observe that gene-level effects correlate 20% more strongly across ancestries than SNP-level effects. Lastly, we perform fine-mapping using MA-FOCUS and find evidence that genes at TWAS risk regions are more likely to be shared across ancestries than they are to be ancestry specific. Using multiple lines of evidence to validate our findings, we find that gene sets produced by MA-FOCUS are more enriched in hematopoietic categories than alternative approaches (p = 2.36 × 10⁻¹⁵). Our work demonstrates that including and appropriately accounting for genetic diversity can drive more profound insights into the genetic architecture of complex traits.
Affiliation(s)
- Zeyun Lu
  - Biostatistics Division, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA
  - Corresponding author
- Shyamalika Gopalan
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Evolutionary Anthropology, Duke University, Durham, NC, USA
- Dong Yuan
  - Biostatistics Division, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA
- David V. Conti
  - Biostatistics Division, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Bogdan Pasaniuc
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Alexander Gusev
  - Division of Population Sciences, Dana-Farber Cancer Institute & Harvard Medical School, Boston, MA, USA
  - Division of Genetics, Brigham & Women’s Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, The Broad Institute, Cambridge, MA, USA
- Nicholas Mancuso
  - Biostatistics Division, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Corresponding author