Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Okoro PC, Schubert R, Guo X, Johnson WC, Rotter JI, Hoeschele I, Liu Y, Im HK, Luke A, Dugas LR, Wheeler HE. Transcriptome prediction performance across machine learning models and diverse ancestries. HGG Adv 2021;2:100019. [PMID: 33937878 PMCID: PMC8087249 DOI: 10.1016/j.xhgg.2020.100019] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022] Open

Cited by Other Article(s)

Danaeifar M, Najafi A. Artificial Intelligence and Computational Biology in Gene Therapy: A Review. Biochem Genet 2024:10.1007/s10528-024-10799-1. [PMID: 38635012 DOI: 10.1007/s10528-024-10799-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 04/02/2024] [Indexed: 04/19/2024]

Malakhov MM, Dai B, Shen XT, Pan W. A bootstrap model comparison test for identifying genes with context-specific patterns of genetic regulation. bioRxiv 2023:2023.03.06.531446. [PMID: 36945657 PMCID: PMC10028853 DOI: 10.1101/2023.03.06.531446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2023]

Araujo DS, Nguyen C, Hu X, Mikhaylova AV, Gignoux C, Ardlie K, Taylor KD, Durda P, Liu Y, Papanicolaou G, Cho MH, Rich SS, Rotter JI, Im HK, Manichaikul A, Wheeler HE. Multivariate adaptive shrinkage improves cross-population transcriptome prediction and association studies in underrepresented populations. HGG Adv 2023;4:100216. [PMID: 37869564 PMCID: PMC10589725 DOI: 10.1016/j.xhgg.2023.100216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 06/27/2023] [Indexed: 10/24/2023] Open

Abstract

Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions, in this case, across different populations, may improve cross-population transcriptome prediction. To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWASs) using different methods (elastic net, joint-tissue imputation [JTI], matrix expression quantitative trait loci [Matrix eQTL], multivariate adaptive shrinkage in R [MASHR], and transcriptome-integrated genetic association resource [TIGAR]) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWASs, we integrated publicly available multiethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study and Pan-ancestry genetic analysis of the UK Biobank (PanUKBB) with our developed transcriptome prediction models. In regard to transcriptome prediction accuracy, MASHR models performed better or the same as other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multiethnic TWASs, MASHR models yielded more discoveries that replicate in both PAGE and PanUKBB across all methods analyzed, including loci previously mapped in GWASs and loci previously not found in GWASs. Overall, our study demonstrates the importance of using methods that benefit from different populations' effect size estimates in order to improve TWASs for multiethnic or underrepresented populations.

Affiliation(s)

Daniel S. Araujo Program in Bioinformatics, Loyola University Chicago, Chicago, IL 60660, USA
Chris Nguyen Department of Biology, Loyola University Chicago, Chicago, IL 60660, USA
Xiaowei Hu Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
Anna V. Mikhaylova Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
Chris Gignoux Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, UC Denver Anschutz Medical Campus, Aurora, CO 80045, USA
Kristin Ardlie Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
Kent D. Taylor The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, the Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
Peter Durda Laboratory for Clinical Biochemistry Research, University of Vermont, Colchester, VT 05446, USA
Yongmei Liu Department of Medicine, Duke University School of Medicine, Durham, NC 27710, USA
George Papanicolaou Epidemiology Branch, Division of Cardiovascular Sciences, National Heart, Lung and Blood Institute, Bethesda, MD 20892, USA
Michael H. Cho Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
Stephen S. Rich Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
Jerome I. Rotter The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, the Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
NHLBI TOPMed Consortium Program in Bioinformatics, Loyola University Chicago, Chicago, IL 60660, USA Department of Biology, Loyola University Chicago, Chicago, IL 60660, USA Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA Department of Biostatistics, University of Washington, Seattle, WA 98195, USA Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, UC Denver Anschutz Medical Campus, Aurora, CO 80045, USA Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, the Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA Laboratory for Clinical Biochemistry Research, University of Vermont, Colchester, VT 05446, USA Department of Medicine, Duke University School of Medicine, Durham, NC 27710, USA Epidemiology Branch, Division of Cardiovascular Sciences, National Heart, Lung and Blood Institute, Bethesda, MD 20892, USA Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA Section of Genetic Medicine, University of Chicago, Chicago, IL 60637, USA
Hae Kyung Im Section of Genetic Medicine, University of Chicago, Chicago, IL 60637, USA
Ani Manichaikul Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
Heather E. Wheeler Program in Bioinformatics, Loyola University Chicago, Chicago, IL 60660, USA Department of Biology, Loyola University Chicago, Chicago, IL 60660, USA

Mai J, Lu M, Gao Q, Zeng J, Xiao J. Transcriptome-wide association studies: recent advances in methods, applications and available databases. Commun Biol 2023;6:899. [PMID: 37658226 PMCID: PMC10474133 DOI: 10.1038/s42003-023-05279-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/21/2023] [Accepted: 08/24/2023] [Indexed: 09/03/2023] Open

Affiliation(s)

Jialin Mai National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China University of Chinese Academy of Sciences, Beijing, 100049, China
Mingming Lu National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China University of Chinese Academy of Sciences, Beijing, 100049, China
Qianwen Gao National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China University of Chinese Academy of Sciences, Beijing, 100049, China
Jingyao Zeng National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China. CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China.
Jingfa Xiao National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China. CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China. University of Chinese Academy of Sciences, Beijing, 100049, China.

Ren J, Lin Z, He R, Shen X, Pan W. Using GWAS summary data to impute traits for genotyped individuals. HGG Adv 2023;4:100197. [PMID: 37181332 PMCID: PMC10173780 DOI: 10.1016/j.xhgg.2023.100197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 04/07/2023] [Indexed: 05/16/2023] Open

Araujo DS, Nguyen C, Hu X, Mikhaylova AV, Gignoux C, Ardlie K, Taylor KD, Durda P, Liu Y, Papanicolaou G, Cho MH, Rich SS, Rotter JI, Im HK, Manichaikul A, Wheeler HE. Multivariate adaptive shrinkage improves cross-population transcriptome prediction for transcriptome-wide association studies in underrepresented populations. bioRxiv 2023:2023.02.09.527747. [PMID: 36798214 PMCID: PMC9934635 DOI: 10.1101/2023.02.09.527747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]

Affiliation(s)

Daniel S. Araujo Program in Bioinformatics, Loyola University Chicago, Chicago, IL, 60660, USA
Chris Nguyen Department of Biology, Loyola University Chicago, Chicago, IL, 60660, USA
Xiaowei Hu Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, 22908, USA
Anna V. Mikhaylova Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
Chris Gignoux Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, UC Denver Anschutz Medical Campus, Aurora, CO, 80045, USA
Kristin Ardlie Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
Kent D. Taylor The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
Peter Durda Laboratory for Clinical Biochemistry Research, University of Vermont, Colchester, VT, 05446, USA
Yongmei Liu Department of Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
George Papanicolaou Epidemiology Branch, Division of Cardiovascular Sciences, National Heart, Lung and Blood Institute, Bethesda, MD, 20892, USA
Michael H. Cho Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, 02115, USA
Stephen S. Rich Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, 22908, USA
Jerome I. Rotter The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
NHLBI TOPMed Consortium
Hae Kyung Im Section of Genetic Medicine, The University of Chicago, Chicago, IL, 60637, USA
Ani Manichaikul Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, 22908, USA
Heather E. Wheeler Program in Bioinformatics, Loyola University Chicago, Chicago, IL, 60660, USA Department of Biology, Loyola University Chicago, Chicago, IL, 60660, USA

Kim J, Kim H, Lee MS, Lee H, Kim YJ, Lee WY, Yun SH, Kim HC, Hong HK, Hannenhalli S, Cho YB, Park D, Choi SS. Transcriptomes of the tumor-adjacent normal tissues are more informative than tumors in predicting recurrence in colorectal cancer patients. J Transl Med 2023;21:209. [PMID: 36941605 PMCID: PMC10029176 DOI: 10.1186/s12967-023-04053-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2022] [Accepted: 03/10/2023] [Indexed: 03/23/2023] Open

Affiliation(s)

Jinho Kim Division of Biomedical Convergence, College of Biomedical Science, Institute of Bioscience & Biotechnology, Kangwon National University, Chuncheon, 24341, Korea
Hyunjung Kim Precision Medicine Center, Future Innovation Research Division, Seoul National University Bundang Hospital, Seongnam, 13620, Korea
Min-Seok Lee Division of Biomedical Convergence, College of Biomedical Science, Institute of Bioscience & Biotechnology, Kangwon National University, Chuncheon, 24341, Korea
Heetak Lee Precision Medicine Center, Future Innovation Research Division, Seoul National University Bundang Hospital, Seongnam, 13620, Korea Center for Genome Engineering, Institute for Basic Science, 55, Expo-ro, Yuseng-gu, Daejeon, 34126, Korea
Yeon Jeong Kim Samsung Genome Institute, Samsung Medical Center, Seoul, 06351, Korea
Woo Yong Lee Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, 06351, Korea
Seong Hyeon Yun Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, 06351, Korea
Hee Cheol Kim Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, 06351, Korea
Hye Kyung Hong Institute for Future Medicine, Samsung Medical Center, Seoul, 06351, Korea
Sridhar Hannenhalli Cancer Data Science Lab, Center for Cancer Research, National Cancer Institute, Bethesda, 20814, MD, USA
Yong Beom Cho Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, 06351, Korea. Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul, 06351, Korea.
Donghyun Park Geninus Inc., Seoul, 05836, Korea.
Sun Shim Choi Division of Biomedical Convergence, College of Biomedical Science, Institute of Bioscience & Biotechnology, Kangwon National University, Chuncheon, 24341, Korea.

He R, Xue H, Pan W. Statistical power of transcriptome-wide association studies. Genet Epidemiol 2022;46:572-588. [PMID: 35766062 PMCID: PMC9669108 DOI: 10.1002/gepi.22491] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/09/2022] [Revised: 05/27/2022] [Accepted: 05/31/2022] [Indexed: 01/02/2023]

Abstract

Transcriptome-Wide Association Studies (TWASs) have become increasingly popular in identifying genes (or other endophenotypes or exposures) associated with complex traits. In TWAS, one first builds a predictive model for gene expression using an expression quantitative trait loci (eQTL) data set in stage 1, then tests the association between the predicted gene expression and a trait based on a large, independent genome-wide association study (GWAS) data set in stage 2. However, since the sample size of the eQTL data set is usually small and the coefficient of multiple determination (i.e., $R^2$) of the model for many genes is also small, a question of interest is to what extent these factors affect the statistical power of TWAS. In addition, in contrast to a standard (univariate) TWAS (UV-TWAS) considering only a single gene at a time, multivariate TWAS (MV-TWAS) methods have recently emerged to account for the effects of multiple genes, or a gene's nonlinear effects, simultaneously. In the absence of a power analysis for these MV-TWAS methods, it would be of interest to investigate whether one can gain or lose power by using the newly proposed MV-TWAS instead of UV-TWAS. In this paper, we first outline a general method for sample size/power calculations for two-sample TWAS, then use real data, namely the Alzheimer's Disease Neuroimaging Initiative (ADNI) eQTL data and the Genotype-Tissue Expression (GTEx) eQTL data for stage 1 and the International Genomics of Alzheimer's Project Alzheimer's disease (AD) GWAS summary data and UK Biobank (UKB) individual-level data for stage 2, to empirically address these questions. Our most important conclusions are the following. First, a sample size of a few thousand (~8000) would suffice in stage 1, where the power of TWAS would be more determined by cis-heritability of gene expression. Second, as in the general case of simple regression versus multiple regression, the power of MV-TWAS may be higher or lower than that of UV-TWAS, depending on the specific relationships among the GWAS trait and multiple genes (or linear and nonlinear terms of the same gene's expression levels), such as their correlations and effect sizes. Interestingly, several top genes with large power gains in MV-TWAS (over that in UV-TWAS) were known to be (and in our data more significantly) associated with AD. We also reached similar conclusions in an application to the GTEx whole blood gene expression data and UKB GWAS data of high-density lipoprotein cholesterol. The proposed method and the conclusions are expected to be useful in planning and designing future TWAS and other related studies (e.g., Proteome- or Metabolome-Wide Association Studies) when determining the sample sizes for the two stages.
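
The two-stage TWAS design described in this abstract can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration and is not taken from the cited paper: the function names (`simulate`, `fit_ridge`, `association_z`, `run_demo`), the use of ridge regression as a stand-in for the stage 1 predictive model, the sample sizes, and the effect sizes.

```python
import numpy as np


def simulate(n, weights, rng, maf=0.3):
    """Simulate standardized genotypes and expression = genotypes @ weights + noise."""
    g = rng.binomial(2, maf, size=(n, len(weights))).astype(float)
    g = (g - g.mean(axis=0)) / g.std(axis=0)
    expression = g @ weights + rng.normal(size=n)
    return g, expression


def fit_ridge(genotypes, expression, lam=1.0):
    """Stage 1 (illustrative): ridge regression of expression on cis-SNP genotypes."""
    g = genotypes - genotypes.mean(axis=0)
    y = expression - expression.mean()
    p = g.shape[1]
    return np.linalg.solve(g.T @ g + lam * np.eye(p), g.T @ y)


def association_z(predicted, trait):
    """Stage 2 (illustrative): z statistic from regressing the trait on predicted expression."""
    x = predicted - predicted.mean()
    y = trait - trait.mean()
    sxx = x @ x
    slope = (x @ y) / sxx
    resid = y - slope * x
    se = np.sqrt((resid @ resid) / (len(y) - 2) / sxx)
    return slope / se


def run_demo(seed=0, n_eqtl=500, n_gwas=5000):
    rng = np.random.default_rng(seed)
    weights = np.zeros(20)
    weights[:4] = 0.5  # hypothetical true eQTL effects

    # Stage 1: small eQTL data set with measured expression.
    g1, e1 = simulate(n_eqtl, weights, rng)
    beta_hat = fit_ridge(g1, e1)

    # Stage 2: large independent GWAS sample; expression is never observed, only predicted.
    g2, e2 = simulate(n_gwas, weights, rng)
    trait = 0.5 * e2 + rng.normal(size=n_gwas)
    z = association_z(g2 @ beta_hat, trait)
    return beta_hat, z


if __name__ == "__main__":
    beta_hat, z = run_demo()
    print("stage 2 z statistic:", z)
```

Shrinking `n_eqtl` or the entries of `weights` in this toy setup weakens the stage 1 model, which is one way to explore how the stage 1 sample size and cis-heritability feed into stage 2 power.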

Elgart M, Lyons G, Romero-Brufau S, Kurniansyah N, Brody JA, Guo X, Lin HJ, Raffield L, Gao Y, Chen H, de Vries P, Lloyd-Jones DM, Lange LA, Peloso GM, Fornage M, Rotter JI, Rich SS, Morrison AC, Psaty BM, Levy D, Redline S, Sofer T. Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations. Commun Biol 2022;5:856. [PMID: 35995843 PMCID: PMC9395509 DOI: 10.1038/s42003-022-03812-z] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 07/19/2021] [Accepted: 08/05/2022] [Indexed: 01/03/2023] Open

Affiliation(s)

Michael Elgart Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA. Department of Medicine, Harvard Medical School, Boston, MA, USA.
Genevieve Lyons Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Santiago Romero-Brufau Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA Department of Medicine, Mayo Clinic, Rochester, MN, USA
Nuzulul Kurniansyah Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
Jennifer A Brody Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
Xiuqing Guo The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
Henry J Lin The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
Laura Raffield Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
Yan Gao The Jackson Heart Study, University of Mississippi Medical Center, Jackson, MS, USA
Han Chen Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Paul de Vries Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
Donald M Lloyd-Jones Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
Leslie A Lange Department of Medicine, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO, USA
Gina M Peloso Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
Myriam Fornage Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
Jerome I Rotter The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
Stephen S Rich Center for Public Health Genomics, University of Virginia School of Medicine, Charlottesville, VA, USA
Alanna C Morrison Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
Bruce M Psaty Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Services, University of Washington, Seattle, WA, USA
Daniel Levy The Population Sciences Branch of the National Heart, Lung and Blood Institute, Bethesda, MD, USA The Framingham Heart Study, Framingham, MA, USA
Susan Redline Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA Department of Medicine, Harvard Medical School, Boston, MA, USA
Tamar Sofer Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA. Department of Medicine, Harvard Medical School, Boston, MA, USA. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Network Approaches for Charting the Transcriptomic and Epigenetic Landscape of the Developmental Origins of Health and Disease. Genes (Basel) 2022;13:genes13050764. [PMID: 35627149 PMCID: PMC9141211 DOI: 10.3390/genes13050764] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/04/2022] [Revised: 04/04/2022] [Accepted: 04/13/2022] [Indexed: 02/04/2023] Open

Schubert R, Geoffroy E, Gregga I, Mulford AJ, Aguet F, Ardlie K, Gerszten R, Clish C, Van Den Berg D, Taylor KD, Durda P, Johnson WC, Cornell E, Guo X, Liu Y, Tracy R, Conomos M, Blackwell T, Papanicolaou G, Lappalainen T, Mikhaylova AV, Thornton TA, Cho MH, Gignoux CR, Lange L, Lange E, Rich SS, Rotter JI, Manichaikul A, Im HK, Wheeler HE. Protein prediction for trait mapping in diverse populations. PLoS One 2022;17:e0264341. [PMID: 35202437 PMCID: PMC8870552 DOI: 10.1371/journal.pone.0264341] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/17/2021] [Accepted: 02/08/2022] [Indexed: 11/18/2022] Open

Abstract

Genetically regulated gene expression has helped elucidate the biological mechanisms underlying complex traits. Improved high-throughput technology allows similar interrogation of the genetically regulated proteome for understanding complex trait mechanisms. Here, we used the Trans-omics for Precision Medicine (TOPMed) Multi-omics pilot study, which comprises data from Multi-Ethnic Study of Atherosclerosis (MESA), to optimize genetic predictors of the plasma proteome for genetically regulated proteome-wide association studies (PWAS) in diverse populations. We built predictive models for protein abundances using data collected in TOPMed MESA, for which we have measured 1,305 proteins by a SOMAscan assay. We compared predictive models built via elastic net regression to models integrating posterior inclusion probabilities estimated by fine-mapping SNPs prior to elastic net. In order to investigate the transferability of predictive models across ancestries, we built protein prediction models in all four of the TOPMed MESA populations, African American (n = 183), Chinese (n = 71), European (n = 416), and Hispanic/Latino (n = 301), as well as in all populations combined. As expected, fine-mapping produced more significant protein prediction models, especially in African ancestries populations, potentially increasing opportunity for discovery. When we tested our TOPMed MESA models in the independent European INTERVAL study, fine-mapping improved cross-ancestries prediction for some proteins. Using GWAS summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study, which comprises ∼50,000 Hispanic/Latinos, African Americans, Asians, Native Hawaiians, and Native Americans, we applied S-PrediXcan to perform PWAS for 28 complex traits. The most protein-trait associations were discovered, colocalized, and replicated in large independent GWAS using proteome prediction model training populations with similar ancestries to PAGE. 
At current training population sample sizes, performance between baseline and fine-mapped protein prediction models in PWAS was similar, highlighting the utility of elastic net. Our predictive models in diverse populations are publicly available for use in proteome mapping methods at https://doi.org/10.5281/zenodo.4837327.

Affiliation(s)

Ryan Schubert Department of Mathematics and Statistics, Loyola University Chicago, Chicago, IL, United States of America Department of Biology, Loyola University Chicago, Chicago, IL, United States of America Program in Bioinformatics, Loyola University Chicago, Chicago, IL, United States of America
Elyse Geoffroy Program in Bioinformatics, Loyola University Chicago, Chicago, IL, United States of America
Isabelle Gregga Department of Biology, Loyola University Chicago, Chicago, IL, United States of America
Ashley J. Mulford Department of Biology, Loyola University Chicago, Chicago, IL, United States of America Program in Bioinformatics, Loyola University Chicago, Chicago, IL, United States of America
Francois Aguet Broad Institute, Cambridge, MA, United States of America
Kristin Ardlie Broad Institute, Cambridge, MA, United States of America
Robert Gerszten Beth Israel Deaconess Medical Center, Boston, MA, United States of America
Clary Clish Broad Institute, Cambridge, MA, United States of America
David Van Den Berg University of Southern California, Los Angeles, CA, United States of America
Kent D. Taylor The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, United States of America
Peter Durda Laboratory for Clinical Biochemistry Research, University of Vermont, Burlington, VT, United States of America
W. Craig Johnson Collaborative Health Studies Coordinating Center, University of Washington, Seattle, WA, United States of America
Elaine Cornell Laboratory for Clinical Biochemistry Research, University of Vermont, Burlington, VT, United States of America
Xiuqing Guo The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, United States of America
Yongmei Liu Department of Medicine, Duke University School of Medicine, Durham, NC, United States of America
Russell Tracy Laboratory for Clinical Biochemistry Research, University of Vermont, Burlington, VT, United States of America
Matthew Conomos Department of Biostatistics, University of Washington, Seattle, WA, United States of America
Tom Blackwell Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States of America
George Papanicolaou Epidemiology Branch, National Heart, Lung and Blood Institute, Bethesda, MD, United States of America
Tuuli Lappalainen New York Genome Center and Department of Systems Biology, Columbia University, New York, NY United States of America
Anna V. Mikhaylova Department of Biostatistics, University of Washington, Seattle, WA, United States of America
Timothy A. Thornton Department of Biostatistics, University of Washington, Seattle, WA, United States of America
Michael H. Cho Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States of America
Christopher R. Gignoux Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
Leslie Lange Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
Ethan Lange Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
Stephen S. Rich Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States of America
Jerome I. Rotter The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, United States of America
NHLBI TOPMed Consortium
Ani Manichaikul Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States of America
Hae Kyung Im Section of Genetic Medicine, The University of Chicago, Chicago, IL, United States of America
Heather E. Wheeler Department of Biology, Loyola University Chicago, Chicago, IL, United States of America; Program in Bioinformatics, Loyola University Chicago, Chicago, IL, United States of America

Meta-imputation of transcriptome from genotypes across multiple datasets by leveraging publicly available summary-level data. PLoS Genet 2022;18:e1009571. [PMID: 35100255 PMCID: PMC8830793 DOI: 10.1371/journal.pgen.1009571] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2021] [Revised: 02/10/2022] [Accepted: 01/07/2022] [Indexed: 11/22/2022] Open

Abstract

Transcriptome-wide association studies (TWAS) are a powerful method for identifying and interpreting the biological mechanisms underlying GWAS findings by mapping gene expression levels to phenotypes. In TWAS, gene expression is often imputed from individual-level genotypes of regulatory variants identified from external resources, such as the Genotype-Tissue Expression (GTEx) Project. In this setting, a straightforward approach to imputing expression levels of a specific tissue is to use a model trained on the same tissue type. When multiple tissues are available for the same subjects, it has been demonstrated that training imputation models on multiple tissue types improves accuracy because of shared eQTLs between the tissues and an increase in effective sample size. However, existing joint-tissue methods require access to genotype and expression data across all tissues. Moreover, they cannot leverage the abundance of expression datasets across various tissues for non-overlapping individuals. Here, we explore the optimal way to combine imputed levels across training models from multiple tissues and datasets in a flexible manner using summary-level data. Our proposed method (SWAM) combines an arbitrary number of transcriptome imputation models to linearly optimize the imputation accuracy for a given target tissue. By integrating models across tissues and/or individuals, SWAM can improve the accuracy of transcriptome imputation or the power of TWAS while requiring individual-level data from only a single reference cohort. To evaluate the accuracy of SWAM, we combined 49 tissue-specific gene expression imputation models from the GTEx Project as well as from a large eQTL study, the Depression Susceptibility Genes and Networks (DGN) Project, and tested imputation accuracy in GEUVADIS lymphoblastoid cell line samples. We also extend our meta-imputation method to meta-TWAS to leverage multiple tissues in TWAS analysis with summary-level statistics. Our results highlight the importance of integrating multiple tissues to unravel the regulatory impacts of genetic variants on complex traits.

Gene expression levels within a cell are affected by various factors, including DNA variation, cell type, cellular microenvironment, disease status, and other environmental factors surrounding the individual. The genetic component of gene expression is known to explain a substantial fraction of transcriptional variation among individuals and can be imputed from genotypes in a tissue-specific manner by training on population-scale transcriptomic profiles designed to identify expression quantitative trait loci (eQTLs). Imputing gene expression levels has been shown to help in understanding the genetic basis of human disease through transcriptome-wide association studies (TWAS) and Mendelian randomization (MR). However, it has been unclear how to integrate multiple imputation models trained on individual datasets to maximize their accuracy without having to access individual genotypes and expression levels, which are often protected because of privacy concerns. We developed SWAM (Smartly Weighted Averaging across Multiple datasets), a meta-imputation framework that can accurately impute gene expression levels from genotypes by integrating multiple imputation models without requiring individual-level data from the datasets on which those models were trained. Our method examines the similarities and differences between resources and borrows the information most relevant to the tissue of interest. We demonstrate that SWAM outperforms existing single-tissue and multi-tissue imputation models and continues to increase in accuracy as additional imputation models are integrated.
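The idea of linearly combining several imputation models for one target tissue can be illustrated with a small sketch. This is not the SWAM implementation: the abstract does not give the weighting formula, so the code below simply assumes ordinary (optionally ridge-penalized) least squares on a reference cohort, and the function names `fit_weights` and `combine` are hypothetical. Here `predictions[k][i]` is model k's imputed expression of one gene for reference individual i, and `measured[i]` is that individual's observed expression in the target tissue.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a larger ridge penalty")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_weights(
    predictions: Sequence[Sequence[float]],
    measured: Sequence[float],
    ridge: float = 0.0,
) -> List[float]:
    """Least-squares weights, one per imputation model (assumed scheme)."""
    k = len(predictions)
    # Normal equations: (P P^T + ridge * I) w = P y
    gram = [
        [sum(p * q for p, q in zip(predictions[a], predictions[b])) for b in range(k)]
        for a in range(k)
    ]
    for a in range(k):
        gram[a][a] += ridge
    rhs = [sum(p * y for p, y in zip(predictions[a], measured)) for a in range(k)]
    return solve_linear_system(gram, rhs)


def combine(
    predictions: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted average of the per-model imputed expression levels."""
    n = len(predictions[0])
    return [sum(w * p[i] for w, p in zip(weights, predictions)) for i in range(n)]
```

In use, the weights would be fitted once per gene on the single reference cohort that has both genotypes and measured expression, then applied with `combine` to the per-model predictions for new individuals. The toy inputs are per-model predictions only, which mirrors the abstract's point that the other training datasets need not be accessed at the individual level.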

Lin Z, Xue H, Malakhov MM, Knutson KA, Pan W. Accounting for nonlinear effects of gene expression identifies additional associated genes in transcriptome-wide association studies. Hum Mol Genet 2022;31:2462-2470. [PMID: 35043938 PMCID: PMC9307319 DOI: 10.1093/hmg/ddac015] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2021] [Revised: 01/08/2022] [Accepted: 01/10/2022] [Indexed: 01/21/2023] Open