1
Khodayari Moez E, Hajihosseini M, Andrews JL, Dinu I. Longitudinal linear combination test for gene set analysis. BMC Bioinformatics 2019; 20:650. [PMID: 31822265 PMCID: PMC6902471 DOI: 10.1186/s12859-019-3221-7] [Received: 07/17/2019] [Accepted: 11/13/2019]
Abstract
Background: Although microarray studies have greatly contributed to recent genetic advances, lack of replication has been a continuing concern in this area. Complex study designs have the potential to address this concern, but they remain undervalued by investigators because proper analysis methods are lacking. The primary challenge in analyzing data from complex microarray studies is handling the correlation structure within the data while also dealing with the combination of a large number of genetic measurements and a small number of subjects, which is ubiquitous even in standard microarray studies. Motivated by the lack of available methods for analyzing repeatedly measured phenotypic or transcriptomic data, we develop a longitudinal linear combination test (LLCT).

Results: LLCT is a two-step method for analyzing multiple longitudinal phenotypes when there is high dimensionality in the response and/or explanatory variables. Alternating between the calculation of within-subject and between-subject variation across its two steps, LLCT examines whether the maximum possible correlation between a linear combination of the time trends and a linear combination of the predictors given by the gene expressions is statistically significant. A generalization of this method can handle family-based study designs in which the subjects are not independent. The method is also applicable to time-course microarray data, with the ability to identify gene sets that exhibit significantly different expression patterns over time. In a simulation study, LLCT outperformed its alternative, pathway analysis via regression, and was very powerful in the analysis of large gene sets even when the sample size was small.

Conclusions: This self-contained pathway analysis method is applicable to a wide range of longitudinal genomics, proteomics and metabolomics (OMICS) data, allows adjustment for potentially time-dependent covariates, and works well with unbalanced and incomplete data. An important potential application of this method could be time-course linkage of OMICS, an attractive possibility for future genetic researchers.

Availability: an R package for LLCT is available at https://github.com/its-likeli-jeff/LLCT
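The abstract describes LLCT only at a high level. As a rough illustration of the two-step idea, the sketch below summarizes each subject's repeated measurements by a per-phenotype least-squares slope (a within-subject step), then computes the largest correlation between a linear combination of those slopes and a linear combination of gene-set expression values, i.e. the first canonical correlation, and assesses it with a permutation test (a between-subject step). The slope summary, the plain canonical correlation and the permutation p-value are assumptions chosen for simplicity; this is not the algorithm implemented in the LLCT R package, and unlike LLCT it requires more subjects than genes or phenotypes. All function names and the simulated data are hypothetical.

```python
import numpy as np


def subject_slopes(times, values):
    """Within-subject step (assumed): least-squares slope of each phenotype on time.

    times:  (m,) measurement times for one subject (m may differ between subjects)
    values: (m, q) repeated measurements of q phenotypes at those times
    Returns a (q,) vector of per-phenotype time trends.
    """
    t = np.asarray(times, dtype=float)
    design = np.column_stack([np.ones_like(t), t])  # intercept + linear time
    coef, *_ = np.linalg.lstsq(design, np.asarray(values, dtype=float), rcond=None)
    return coef[1]


def first_canonical_corr(Y, X):
    """Largest correlation between a linear combination of Y's columns and of X's.

    Plain CCA with no shrinkage: assumes more subjects (rows) than columns.
    """
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    # Singular values of Qy^T Qx are the canonical correlations, largest first.
    return np.linalg.svd(Qy.T @ Qx, compute_uv=False)[0]


def permutation_test(Y, X, n_perm=999, seed=0):
    """Between-subject step (assumed): permutation p-value for the first canonical correlation."""
    rng = np.random.default_rng(seed)
    observed = first_canonical_corr(Y, X)
    # Shuffling subjects' trends breaks any link between trends and expression.
    hits = sum(
        first_canonical_corr(Y[rng.permutation(len(Y))], X) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical demo: 40 subjects, 4 visits, 2 phenotypes, a 3-gene set.
rng = np.random.default_rng(1)
times = np.arange(4.0)
genes = rng.standard_normal((40, 3))
trends = []
for g in genes:
    true_slope = np.array([0.8 * g[0], 0.0])  # phenotype 1's trend depends on gene 0
    values = np.outer(times, true_slope) + rng.normal(scale=0.3, size=(4, 2))
    trends.append(subject_slopes(times, values))
Y = np.vstack(trends)
r, p = permutation_test(Y, genes)
print(f"first canonical correlation = {r:.3f}, permutation p = {p:.3f}")
```

Because each slope is estimated from that subject's own time points, subjects in this sketch may have different numbers of visits; covariate adjustment and the family-based generalization mentioned in the abstract are not shown.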
2
Blangero J, Teslovich TM, Sim X, Almeida MA, Jun G, Dyer TD, Johnson M, Peralta JM, Manning A, Wood AR, Fuchsberger C, Kent JW, Aguilar DA, Below JE, Farook VS, Arya R, Fowler S, Blackwell TW, Puppala S, Kumar S, Glahn DC, Moses EK, Curran JE, Thameem F, Jenkinson CP, DeFronzo RA, Lehman DM, Hanis C, Abecasis G, Boehnke M, Göring H, Duggirala R, Almasy L. Omics-squared: human genomic, transcriptomic and phenotypic data for genetic analysis workshop 19. BMC Proc 2016; 10:71-77. [PMID: 27980614 PMCID: PMC5133484 DOI: 10.1186/s12919-016-0008-y]
Abstract
Background: The Genetic Analysis Workshops (GAW) are a forum for the development, testing, and comparison of statistical genetic methods and software. Each contribution to the workshop includes an application to a specified data set. Here we describe the data distributed for GAW19, which focused on the analysis of human genomic and transcriptomic data.

Methods: GAW19 data were donated by the T2D-GENES Consortium and the San Antonio Family Heart Study and included whole genome and exome sequences for odd-numbered autosomes, measures of gene expression, systolic and diastolic blood pressures, and related covariates in two Mexican American samples. The two samples were a collection of 20 large families with whole genome sequence and transcriptomic data, and a set of 1943 unrelated individuals with exome sequence. For each sample, simulated phenotypes were constructed based on the real sequence data. ‘Functional’ genes and variants for the simulations were chosen based on observed correlations between gene expression and blood pressure. The simulations focused primarily on additive genetic models but also included a genotype-by-medication interaction. A total of 245 genes were designated as ‘functional’ in the simulations, with a few genes of large effect and most genes explaining < 1% of the trait variation. An additional phenotype, Q1, was simulated to be correlated among related individuals, based on theoretical or empirical kinship matrices, but was not associated with any sequence variants. Two hundred replicates of the phenotypes were simulated. The GAW19 data are an expansion of the data used at GAW18, which included the family-based whole genome sequence, blood pressure, and simulated phenotypes, but not the gene expression data or the set of 1943 unrelated individuals with exome sequence.
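The abstract does not state the model used to generate Q1. One common way to simulate a trait that is correlated among relatives through a kinship matrix, yet associated with no sequence variant, is a purely polygenic multivariate normal model, y ~ N(0, h2 * 2*Phi + (1 - h2) * I), where Phi holds the pairwise kinship coefficients. The NumPy sketch below draws replicates from that model via a Cholesky factorization; the four-person pedigree, the heritability h2 = 0.5 and the function name are hypothetical and are not taken from the GAW19 simulation.

```python
import numpy as np

# Hypothetical nuclear family, ordered father, mother, child 1, child 2.
# Entries are kinship coefficients phi (0.5 on the diagonal for non-inbred people).
PHI = np.array([
    [0.50, 0.00, 0.25, 0.25],
    [0.00, 0.50, 0.25, 0.25],
    [0.25, 0.25, 0.50, 0.25],
    [0.25, 0.25, 0.25, 0.50],
])


def simulate_polygenic_trait(phi, h2, n_rep, seed=0):
    """Draw n_rep replicates of y ~ N(0, h2 * 2*phi + (1 - h2) * I).

    Relatives covary in proportion to 2*phi, but no genotype enters the
    model, so the trait is familial without any variant association.
    """
    n = phi.shape[0]
    cov = h2 * 2.0 * phi + (1.0 - h2) * np.eye(n)
    chol = np.linalg.cholesky(cov)  # cov = chol @ chol.T
    z = np.random.default_rng(seed).standard_normal((n_rep, n))
    return z @ chol.T  # one row per replicate, one column per family member


reps = simulate_polygenic_trait(PHI, h2=0.5, n_rep=200)
print(reps.shape)  # (200, 4)
print(np.corrcoef(reps[:, 2], reps[:, 3])[0, 1])  # sibling correlation; expected 0.25 under this model
```

Under this assumed model every family member has unit variance, first-degree relatives correlate at h2 * 0.5, and the two unrelated parents are uncorrelated; an empirical kinship matrix could be substituted for PHI without changing the code.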
Affiliation(s)
- John Blangero
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Harlingen, TX 78550 USA
- Tanya M Teslovich
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109 USA
- Xueling Sim
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109 USA
- Marcio A Almeida
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Harlingen, TX 78550 USA
- Goo Jun
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109 USA
  - Department of Epidemiology, Human Genetics and Environmental Sciences, University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Thomas D Dyer
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Harlingen, TX 78550 USA
- Matthew Johnson
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Harlingen, TX 78550 USA
- Juan M Peralta
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Harlingen, TX 78550 USA
- Alisa Manning
  - Department of Genetics, Massachusetts General Hospital, Boston, MA 02114 USA
- Andrew R Wood
  - Genetics of Complex Traits, Peninsula College of Medicine and Dentistry, University of Exeter, Exeter, UK
- Christian Fuchsberger
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109 USA
- Jack W Kent
  - Department of Genetics, Texas Biomedical Research Institute, 7620 NW Loop 410, San Antonio, TX 78227 USA
- David A Aguilar
  - Cardiovascular Division, Baylor College of Medicine, Houston, TX 77030 USA
- Jennifer E Below
  - Department of Epidemiology, Human Genetics and Environmental Sciences, University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Vidya S Farook
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Harlingen, TX 78550 USA
- Rector Arya
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Harlingen, TX 78550 USA
- Sharon Fowler
  - Division of Clinical Epidemiology, Department of Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229 USA
- Tom W Blackwell
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109 USA
- Sobha Puppala
  - Department of Genetics, Texas Biomedical Research Institute, 7620 NW Loop 410, San Antonio, TX 78227 USA
- Satish Kumar
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Harlingen, TX 78550 USA
- David C Glahn
  - Department of Psychiatry, Yale University, New Haven, CT 06106 USA
- Eric K Moses
  - Centre for Genetic Origins of Health and Disease, University of Western Australia, Crawley, Australia
- Joanne E Curran
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Harlingen, TX 78550 USA
- Farook Thameem
  - Department of Biochemistry, Faculty of Medicine, Kuwait University, Safat, Kuwait City, 13110 Kuwait
- Christopher P Jenkinson
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Harlingen, TX 78550 USA
- Ralph A DeFronzo
  - Texas Diabetes Institute, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229 USA
- Donna M Lehman
  - Division of Clinical Epidemiology, Department of Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229 USA
- Craig Hanis
  - Department of Epidemiology, Human Genetics and Environmental Sciences, University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Goncalo Abecasis
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109 USA
- Michael Boehnke
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109 USA
- Harald Göring
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Harlingen, TX 78550 USA
- Ravindranath Duggirala
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Harlingen, TX 78550 USA
- Laura Almasy
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Harlingen, TX 78550 USA
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104 USA