1
Beck A, Luedtke A, Liu K, Tintle N. A powerful method for including genotype uncertainty in tests of Hardy-Weinberg equilibrium. Pacific Symposium on Biocomputing 2016; 22:368-379. [PMID: 27896990 DOI: 10.1142/9789813207813_0035] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/17/2023]
Abstract
The use of posterior probabilities to summarize genotype uncertainty is pervasive across genotype, sequencing and imputation platforms. Prior work in many contexts has shown the utility of incorporating genotype uncertainty (posterior probabilities) in downstream statistical tests. Typical approaches to incorporating genotype uncertainty when testing Hardy-Weinberg equilibrium tend to lack calibration in the type I error rate, especially as genotype uncertainty increases. We propose a new approach in the spirit of genomic control that properly calibrates the type I error rate, while yielding improved power to detect deviations from Hardy-Weinberg Equilibrium. We demonstrate the improved performance of our method on both simulated and real genotypes.
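As a rough illustration of the kind of "typical approach" the abstract refers to, the sketch below plugs expected genotype counts (sums of posterior probabilities) into a standard one-degree-of-freedom chi-square test of Hardy-Weinberg equilibrium. This is not the authors' proposed method; the function name `hwe_chi_square` and the input layout are assumptions made for this example, and this naive plug-in test is exactly the sort of procedure the abstract reports as poorly calibrated when genotype uncertainty is high.

```python
from math import erfc, sqrt


def hwe_chi_square(posteriors):
    """Naive HWE chi-square test on expected genotype counts.

    posteriors: list of (p_AA, p_AB, p_BB) tuples, one per individual,
    each summing to 1 (hypothetical input layout for this sketch).
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = len(posteriors)
    # Expected genotype counts: sum the posterior probabilities per genotype.
    n_aa = sum(p[0] for p in posteriors)
    n_ab = sum(p[1] for p in posteriors)
    n_bb = sum(p[2] for p in posteriors)

    # Allele frequency of A estimated from the expected counts.
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    freq_b = 1 - freq_a

    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * freq_a ** 2, 2 * n * freq_a * freq_b, n * freq_b ** 2)
    observed = (n_aa, n_ab, n_bb)

    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value
```

With hard calls (posteriors of 0 or 1) this reduces to the usual Pearson test; with uncertain genotypes the summed posteriors are smoother than true counts, which is one reason the statistic no longer follows its nominal null distribution.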
Affiliation(s)
- Andrew Beck
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
2
Comparing genetic variants detected in the 1000 genomes project with SNPs determined by the International HapMap Consortium. J Genet 2016; 94:731-40. [PMID: 26690529 DOI: 10.1007/s12041-015-0588-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 02/06/2023]
Abstract
Single-nucleotide polymorphisms (SNPs) determined based on SNP arrays from the international HapMap consortium (HapMap) and the genetic variants detected in the 1000 genomes project (1KGP) can serve as two references for genomewide association studies (GWAS). We conducted comparative analyses to provide a means for assessing concerns regarding SNP array-based GWAS findings as well as for realistically bounding expectations for next generation sequencing (NGS)-based GWAS. We calculated and compared base composition, transition-to-transversion ratio, minor allele frequency and heterozygous rate for SNPs from HapMap and 1KGP for the 622 common individuals. We analysed the genotype discordance between HapMap and 1KGP to assess consistency in the SNPs from the two references. In 1KGP, 90.58% of 36,817,799 SNPs detected were not measured in HapMap. More SNPs with minor allele frequencies less than 0.01 were found in 1KGP than HapMap. The two references have low discordance (generally smaller than 0.02) in genotypes of common SNPs, with most discordance from heterozygous SNPs. Our study demonstrated that SNP array-based GWAS findings were reliable and useful, although only a small portion of genetic variance was explained. NGS can detect not only common but also rare variants, supporting the expectation that NGS-based GWAS will be able to incorporate a much larger portion of genetic variance than SNP array-based GWAS.
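Two of the summary statistics named in this abstract, the transition-to-transversion ratio and the genotype discordance between two call sets, can be sketched in a few lines. The function names and the encoding of genotypes (0/1/2 alternate-allele counts, `None` for a missing call) are assumptions for illustration and do not reflect the authors' actual pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps purine<->purine or pyrimidine<->pyrimidine."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snps):
    """snps: iterable of (ref, alt) single-base pairs with ref != alt."""
    snps = list(snps)
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    return transitions / transversions if transversions else float("inf")


def discordance_rate(calls_a, calls_b):
    """Fraction of sites, called in both sets, whose genotypes differ.

    calls_a, calls_b: equal-length sequences of genotype codes
    (hypothetical 0/1/2 encoding; None marks a missing call).
    """
    shared = [(a, b) for a, b in zip(calls_a, calls_b)
              if a is not None and b is not None]
    if not shared:
        return 0.0
    mismatches = sum(1 for a, b in shared if a != b)
    return mismatches / len(shared)
```

Sites missing in either call set are excluded from the denominator here; whether to count them is a design choice that changes the reported rate.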
3
Interaction of BDNF rs6265 variants and energy and protein intake in the risk for glucose intolerance and type 2 diabetes in middle-aged adults. Nutrition 2016; 33:187-194. [PMID: 27553771 DOI: 10.1016/j.nut.2016.07.001] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 05/07/2016] [Revised: 07/01/2016] [Accepted: 07/19/2016] [Indexed: 12/14/2022]
Abstract
OBJECTIVES Brain-derived neurotrophic factor (BDNF) is associated with the risk for Alzheimer's disease and type 2 diabetes. The aim of this study was to examine the association of BDNF variants with type 2 diabetes and the interactions of different BDNF genotypes with dietary habits and food and nutrient intakes in middle-aged adults.
METHODS The study population included 8840 adults ages 40 to 65 y from the Ansan and Ansung areas in the Korean Genome Epidemiology Study, a cross-sectional study of Korean adults, conducted from 2001 to 2002. Adjusted odds ratios for the prevalence of glucose intolerance and type 2 diabetes according to BDNF genotypes were calculated after adjusting for age, sex, residence area, body mass index, physical activity, and smoking and stress status. Nutrient intake was calculated from usual food intake determined by semiquantitative food frequencies using nutrient assessment software.
RESULTS BDNF rs6265 Val/Met and Met/Met variants were negatively associated with the risk for type 2 diabetes after adjusting for covariates. Serum glucose levels after glucose loading and hemoglobin A1c, but not serum insulin levels, also were negatively associated with BDNF Val/Met and Met/Met. In subgroup analysis, sex and stress levels had an interaction with BDNF Val/Met in the risk for type 2 diabetes. Glucose-intolerant and diabetic, but not nondiabetic, patients with BDNF Met/Met had nominally but significantly higher intakes of energy than those with BDNF Val/Val. BDNF rs6265 had consistent gene-diet interactions with energy and protein intake. With low-energy, low-protein, and high-carbohydrate intake, BDNF Val/Met lowered the risk for type 2 diabetes after adjusting for confounding factors. BDNF Val/Met did not compensate for developing type 2 diabetes with high-energy intake. Additionally, indexes of insulin resistance and insulin secretion showed the same gene-energy interaction as type 2 diabetes.
CONCLUSIONS BDNF Val/Met and Met/Met variants (rs6265) decrease the risk for glucose intolerance and type 2 diabetes. BDNF variants interacted with nutrient intake, especially energy and protein intake: middle-aged individuals with BDNF Val/Val are prone to developing type 2 diabetes even with low energy and protein intake.
4
Genomic Discoveries and Personalized Medicine in Neurological Diseases. Pharmaceutics 2015; 7:542-53. [PMID: 26690205 PMCID: PMC4695833 DOI: 10.3390/pharmaceutics7040542] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/23/2015] [Revised: 11/30/2015] [Accepted: 12/02/2015] [Indexed: 12/22/2022]
Abstract
In the past decades, we have witnessed dramatic changes in clinical diagnoses and treatments due to the revolutions of genomics and personalized medicine. Undoubtedly, we have also met many challenges when using those advanced technologies in drug discovery and development. In this review, we describe when genomic information is applied in personal healthcare in general. We illustrate some case examples of genomic discoveries and promising personalized medicine applications in the area of neurological disease in particular. Available data suggest that individual genomics can be applied to better treat patients in the near future.
5
Ye H, Meehan J, Tong W, Hong H. Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine. Pharmaceutics 2015; 7:523-41. [PMID: 26610555 PMCID: PMC4695832 DOI: 10.3390/pharmaceutics7040523] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 10/21/2015] [Revised: 11/14/2015] [Accepted: 11/17/2015] [Indexed: 02/06/2023]
Abstract
Precision medicine or personalized medicine has been proposed as a modernized and promising medical strategy. Genetic variants of patients are the key information for implementation of precision medicine. Next-generation sequencing (NGS) is an emerging technology for deciphering genetic variants. Alignment of raw reads to a reference genome is one of the key steps in NGS data analysis. Many algorithms have been developed for alignment of short read sequences since 2008. Users have to decide which alignment algorithm to use in their studies. Making the right selection means choosing not only the alignment algorithm but also the set of suitable parameters to be used by the algorithm. Understanding these algorithms helps in selecting the appropriate alignment algorithm for different applications in precision medicine. Here, we review currently available algorithms and their major strategies such as seed-and-extend and q-gram filter. We also discuss the challenges in current alignment algorithms, including alignment in multiple repeated regions, long-read alignment and alignment facilitated with known genetic variants.
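The seed-and-extend strategy mentioned in this abstract can be illustrated with a toy example: exact k-mer matches ("seeds") against an index of the reference propose candidate positions, and each candidate is then verified over the full read. The sketch below is a simplification under stated assumptions: the function names are hypothetical, the extension step is an ungapped mismatch count rather than the gapped dynamic programming used by production aligners, and it does not correspond to any specific tool reviewed in the paper.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def seed_and_extend(read, reference, index, k, max_mismatches=2):
    """Return [(ref_start, mismatches), ...] sorted by mismatches, then position.

    Seeding: look up each k-mer of the read in the index.
    Extending: compare the whole read to the reference at the implied
    start position and keep it if the mismatch count is small enough.
    """
    hits = {}
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):
            start = pos - offset  # where the read would begin on the reference
            if start < 0 or start + len(read) > len(reference) or start in hits:
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(1 for r, g in zip(read, window) if r != g)
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items(), key=lambda item: (item[1], item[0]))
```

The seed length `k` trades sensitivity for speed: shorter seeds tolerate more mismatches per read but produce more candidate positions to verify, especially in repeated regions.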
Affiliation(s)
- Hao Ye
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
- Joe Meehan
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
6
Zhang W, Soika V, Meehan J, Su Z, Ge W, Ng HW, Perkins R, Simonyan V, Tong W, Hong H. Quality control metrics improve repeatability and reproducibility of single-nucleotide variants derived from whole-genome sequencing. The Pharmacogenomics Journal 2014; 15:298-309. [PMID: 25384574 DOI: 10.1038/tpj.2014.70] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 03/31/2014] [Revised: 07/16/2014] [Accepted: 09/19/2014] [Indexed: 12/18/2022]
Abstract
Although many quality control (QC) methods have been developed to improve the quality of single-nucleotide variants (SNVs) in SNV-calling, QC methods for use subsequent to single-nucleotide polymorphism-calling have not been reported. We developed five QC metrics to improve the quality of SNVs using the whole-genome-sequencing data of a monozygotic twin pair from the Korean Personal Genome Project. The QC metrics improved both repeatability between the monozygotic twin pair and reproducibility between SNV-calling pipelines. We demonstrated the QC metrics improve reproducibility of SNVs derived from not only whole-genome-sequencing data but also whole-exome-sequencing data. The QC metrics are calculated based on the reference genome used in the alignment without accessing the raw and intermediate data or knowing the SNV-calling details. Therefore, the QC metrics can be easily adopted in downstream association analysis.
Affiliation(s)
- W Zhang
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- V Soika
- Office of The Center Director, Center for Biologics Evaluation and Research, US Food and Drug Administration, Rockville, MD, USA
- J Meehan
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Z Su
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- W Ge
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- H W Ng
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- R Perkins
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- V Simonyan
- Office of The Center Director, Center for Biologics Evaluation and Research, US Food and Drug Administration, Rockville, MD, USA
- W Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- H Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
7
Hong H, Xu L, Liu J, Jones WD, Su Z, Ning B, Perkins R, Ge W, Miclaus K, Zhang L, Park K, Green B, Han T, Fang H, Lambert CG, Vega SC, Lin SM, Jafari N, Czika W, Wolfinger RD, Goodsaid F, Tong W, Shi L. Technical reproducibility of genotyping SNP arrays used in genome-wide association studies. PLoS One 2012; 7:e44483. [PMID: 22970228 PMCID: PMC3436888 DOI: 10.1371/journal.pone.0044483] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 06/08/2012] [Accepted: 08/08/2012] [Indexed: 01/25/2023]
Abstract
During the last several years, high-density genotyping SNP arrays have facilitated genome-wide association studies (GWAS) that successfully identified common genetic variants associated with a variety of phenotypes. However, each of the identified genetic variants only explains a very small fraction of the underlying genetic contribution to the studied phenotypic trait. Moreover, discordance observed in results between independent GWAS indicates the potential for Type I and II errors. High reliability of genotyping technology is needed to have confidence in using SNP data and interpreting GWAS results. Therefore, reproducibility of two widely used genotyping technology platforms from Affymetrix and Illumina was assessed by analyzing four technical replicates from each of the six individuals in five laboratories. Genotype concordance of 99.40% to 99.87% within a laboratory for the same platform, 98.59% to 99.86% across laboratories for the same platform, and 98.80% across genotyping platforms was observed. Moreover, arrays with low quality data were detected when comparing genotyping data from technical replicates, but they could not be detected according to vendors' quality control (QC) suggestions. Our results demonstrated the technical reliability of currently available genotyping platforms but also indicated the importance of incorporating some technical replicates for genotyping QC in order to improve the reliability of GWAS results. The impact of discordant genotypes on association analysis results was simulated and could explain, at least in part, the irreproducibility of some GWAS findings when the effect size (i.e. the odds ratio) and the minor allele frequencies are low.
Affiliation(s)
- Huixiao Hong
- Center of Excellence for Bioinformatics, Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Lei Xu
- Center of Excellence for Bioinformatics, Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Jie Liu
- Division of Personalized Nutrition and Medicine, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Wendell D. Jones
- Expression Analysis Inc., Durham, North Carolina, United States of America
- Zhenqiang Su
- ICF International Company at National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Baitang Ning
- Division of Personalized Nutrition and Medicine, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Roger Perkins
- Center of Excellence for Bioinformatics, Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Weigong Ge
- Center of Excellence for Bioinformatics, Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Kelci Miclaus
- SAS Institute Inc, Cary, North Carolina, United States of America
- Li Zhang
- Office of Clinical Pharmacology, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, United States of America
- Kyunghee Park
- Samsung Advanced Institute of Technology, Giheung-gu, Yongin-si Gyeonggi-do, Republic of Korea
- Bridgett Green
- Division of Personalized Nutrition and Medicine, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Tao Han
- Center of Excellence for Genomics, Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Hong Fang
- ICF International Company at National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Silvia C. Vega
- Rosetta BioSoftware, Health Solutions Group, Microsoft, Seattle, Washington, United States of America
- Simon M. Lin
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, United States of America
- Nadereh Jafari
- Center for Genetic Medicine, Northwestern University, Chicago, Illinois, United States of America
- Wendy Czika
- SAS Institute Inc, Cary, North Carolina, United States of America
- Federico Goodsaid
- Office of Clinical Pharmacology, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, United States of America
- Weida Tong
- Center of Excellence for Bioinformatics, Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Leming Shi
- Center of Excellence for Bioinformatics, Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
8
Fan YH, Song YQ. IPGWAS: an integrated pipeline for rational quality control and association analysis of genome-wide genetic studies. Biochem Biophys Res Commun 2012; 422:363-8. [PMID: 22564732 DOI: 10.1016/j.bbrc.2012.04.117] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/14/2012] [Accepted: 04/21/2012] [Indexed: 01/03/2023]
Abstract
Large numbers of samples and marker loci are tested for association in genome-wide association studies (GWAS). Hence, quality control (QC) by removing individuals or markers with low genotyping quality is of utmost importance to minimize potential false positive associations. IPGWAS was developed to facilitate the identification of rational thresholds in QC of GWAS datasets, association analysis, generation of Manhattan and quantile-quantile (QQ) plots, and format conversion for genetic analyses such as meta-analysis, genotype phasing, and imputation. IPGWAS is a multiplatform application written in Perl with a graphical user interface (GUI) and available for free at http://sourceforge.net/projects/ipgwas/.
Affiliation(s)
- Yan-Hui Fan
- Department of Biochemistry, The University of Hong Kong, Pokfulam, Hong Kong.
9
Tayo BO, Teil M, Tong L, Qin H, Khitrov G, Zhang W, Song Q, Gottesman O, Zhu X, Pereira AC, Cooper RS, Bottinger EP. Genetic background of patients from a university medical center in Manhattan: implications for personalized medicine. PLoS One 2011; 6:e19166. [PMID: 21573225 PMCID: PMC3087725 DOI: 10.1371/journal.pone.0019166] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Received: 12/01/2010] [Accepted: 03/28/2011] [Indexed: 11/29/2022]
Abstract
Background The rapid progress currently being made in genomic science has created interest in potential clinical applications; however, formal translational research has been limited thus far. Studies of population genetics have demonstrated substantial variation in allele frequencies and haplotype structure at loci of medical relevance, and the genetic background of patient cohorts may often be complex.
Methods and Findings To describe the heterogeneity in an unselected clinical sample, we used the Affymetrix 6.0 gene array chip to genotype self-identified European Americans (N = 326), African Americans (N = 324) and Hispanics (N = 327) from the medical practice of Mount Sinai Medical Center in Manhattan, NY. Additional data from US minority groups and Brazil were used for external comparison. Substantial variation in ancestral origin was observed for both African Americans and Hispanics; data from the latter group overlapped with both Mexican Americans and Brazilians in the external data sets. A pooled analysis of the African Americans and Hispanics from NY demonstrated a broad continuum of ancestral origin, making classification by race/ethnicity uninformative. Selected loci harboring variants associated with medical traits and drug response confirmed substantial within- and between-group heterogeneity.
Conclusion As a consequence of these complementary levels of heterogeneity, group labels offered no guidance at the individual level. These findings demonstrate the complexity involved in clinical translation of the results from genome-wide association studies and suggest that in the genomic era conventional racial/ethnic labels are of little value.
Affiliation(s)
- Bamidele O. Tayo
- Department of Preventive Medicine and Epidemiology, Loyola University Chicago Stritch School of Medicine, Maywood, Illinois, United States of America
- Marie Teil
- Charles R. Bronfman Institute for Personalized Medicine, Mount Sinai School of Medicine, New York, New York, United States of America
- Liping Tong
- Department of Preventive Medicine and Epidemiology, Loyola University Chicago Stritch School of Medicine, Maywood, Illinois, United States of America
- Huaizhen Qin
- Department of Biostatistics and Epidemiology, Case Western University, Cleveland, Ohio, United States of America
- Gregory Khitrov
- Charles R. Bronfman Institute for Personalized Medicine, Mount Sinai School of Medicine, New York, New York, United States of America
- Weijia Zhang
- Charles R. Bronfman Institute for Personalized Medicine, Mount Sinai School of Medicine, New York, New York, United States of America
- Quinbin Song
- Charles R. Bronfman Institute for Personalized Medicine, Mount Sinai School of Medicine, New York, New York, United States of America
- Omri Gottesman
- Charles R. Bronfman Institute for Personalized Medicine, Mount Sinai School of Medicine, New York, New York, United States of America
- Xiaofeng Zhu
- Department of Biostatistics and Epidemiology, Case Western University, Cleveland, Ohio, United States of America
- Richard S. Cooper
- Department of Preventive Medicine and Epidemiology, Loyola University Chicago Stritch School of Medicine, Maywood, Illinois, United States of America
- Erwin P. Bottinger
- Charles R. Bronfman Institute for Personalized Medicine, Mount Sinai School of Medicine, New York, New York, United States of America