1
Zheng H, Ye Y, Huang H, Huang C, Gao W, Wang M, Li W, Zhou R, Jiang J, Wang S, Yu C, Lv J, Wu X, Huang X, Cao W, Yan Y, Zheng K, Wu T, Li L. A pedigree-based cohort to study the genetic risk factors for cardiometabolic diseases: study design, baseline characteristics and preliminary results. Front Public Health 2023; 11:1189993. [PMID: 37521988 PMCID: PMC10374840 DOI: 10.3389/fpubh.2023.1189993] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/27/2023] [Accepted: 06/26/2023] [Indexed: 08/01/2023] Open
Abstract
Background: We initiated the Fujian Tulou Pedigree-based Cohort (FTPC), which integrates extended pedigrees with a prospective cohort, to clarify the genetic and environmental risk factors of cardiometabolic diseases. Methods: FTPC was carried out in Nanjing County, Fujian Province, China, from August 2015 to December 2017, recruiting probands with the same surnames and then enrolling their first-degree and more distant relatives. The participants were asked to complete a questionnaire interview, a physical examination, and blood collection. Using the local genealogical booklets and family registry, we reconstructed extended pedigrees to estimate the heritability of cardiometabolic traits. Follow-up of FTPC is scheduled every 5 years. Results: The baseline survey interviewed 2,727 individuals in two clans. A total of 1,563 adult subjects who completed all baseline examinations were used to reconstruct pedigrees, and 452 extended pedigrees were finally identified, including one seven-generation pedigree, two five-generation pedigrees, 23 four-generation pedigrees, 186 three-generation pedigrees, and 240 two-generation pedigrees. The average age of the participants was 57.4 years, and 43.6% were male. The prevalence of hypertension, diabetes, and dyslipidemia in FTPC was 49.2%, 10.0%, and 45.2%, respectively. Based on the pedigree structure, the heritability of systolic blood pressure, diastolic blood pressure, fasting blood glucose, total cholesterol, triglyceride, high-density lipoprotein, and low-density lipoprotein was estimated at 0.379, 0.306, 0.386, 0.452, 0.568, 0.852, and 0.387, respectively. Conclusion: As an extended pedigree cohort in China, FTPC will provide an important resource for studying both genetic and environmental risk factors prospectively.
Affiliation(s)
- Hongchen Zheng
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Laboratory of Genetics, Peking University Cancer Hospital and Institute, Beijing, China
- Ying Ye
- Department of Local Diseases Control and Prevention, Fujian Provincial Center for Disease Control and Prevention, Fuzhou, China
- Hui Huang
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Chunlan Huang
- Department of Hygiene, Nanjing County Center for Disease Control and Prevention, Nanjing, China
- Wenjing Gao
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Mengying Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Wenyong Li
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Ren Zhou
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Jin Jiang
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Siyue Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Canqing Yu
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Center for Public Health and Epidemic Preparedness and Response, Peking University, Beijing, China
- Jun Lv
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Center for Public Health and Epidemic Preparedness and Response, Peking University, Beijing, China
- Xiaoling Wu
- Department of Hygiene, Nanjing County Center for Disease Control and Prevention, Nanjing, China
- Xiaoming Huang
- Department of Hygiene, Nanjing County Center for Disease Control and Prevention, Nanjing, China
- Weihua Cao
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Yansheng Yan
- Department of Local Diseases Control and Prevention, Fujian Provincial Center for Disease Control and Prevention, Fuzhou, China
- Kuicheng Zheng
- Fujian Provincial Center for Disease Control and Prevention, Fuzhou, China
- Tao Wu
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Key Laboratory of Reproductive Health, Ministry of Health, Beijing, China
- Liming Li
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Center for Public Health and Epidemic Preparedness and Response, Peking University, Beijing, China
2
Lee D, Kim Y, Chung Y, Lee D, Seo D, Choi TJ, Lim D, Yoon D, Lee SH. Accuracy of genotype imputation based on reference population size and marker density in Hanwoo cattle. Journal of Animal Science and Technology 2021; 63:1232-1246. [PMID: 34957440 PMCID: PMC8672260 DOI: 10.5187/jast.2021.e117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 10/13/2021] [Accepted: 10/14/2021] [Indexed: 11/20/2022]
Abstract
The cattle genome sequence has recently been completed, followed by the
development of commercial single nucleotide polymorphism (SNP) chip panels in
the animal genome industry. To increase statistical power for detecting
quantitative trait loci (QTL), a large number of animals should be genotyped.
However, genotyping many animals with a high-density chip increases the
genotyping cost. Therefore, genotype imputation (statistical inference of
high-density genotypes from a low-density chip) will be useful in the animal
industry. The purpose of this study is to investigate the effect of the
reference population size and marker density on imputation accuracy and to
suggest an appropriate reference population size for imputation in Hanwoo
cattle. A total of 3,821 Hanwoo cattle were divided into reference and
validation populations. The reference sets consisted of 50k (38,916) marker
data and different population sizes (500, 1,000, 1,500, 2,000, and 3,600). The
validation data consisted of four validation sets (889 in total) at different
marker densities (5k [5,000], 10k [10,000], and 15k [15,000]). The accuracy of
imputation was calculated by direct comparison of the true genotype and the
imputed genotype. When the lowest marker density (5k) was used in the
validation set, the imputation accuracy ranged from 0.793 to 0.929 depending on
the reference population size. When the highest marker density (15k) was used,
the imputation accuracy ranged from 0.904 to 0.967. Moreover, the reference
population size should be more than 1,000 to obtain at least 88% imputation
accuracy in Hanwoo cattle.
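The accuracy measure described in this abstract, a direct comparison of true and imputed genotypes, can be sketched as a simple concordance rate. This is only an illustration, not the authors' code: the function name `concordance`, the 0/1/2 allele-count coding, and the toy genotype vectors are all assumptions.

```python
from typing import Optional, Sequence


def concordance(
    true_gt: Sequence[Optional[int]],
    imputed_gt: Sequence[Optional[int]],
) -> float:
    """Fraction of markers whose imputed genotype equals the true genotype.

    Genotypes are assumed to be coded as allele counts (0, 1, 2).
    None marks a missing call; pairs with a missing value are skipped.
    """
    if len(true_gt) != len(imputed_gt):
        raise ValueError("genotype vectors must have the same length")
    pairs = [
        (t, i)
        for t, i in zip(true_gt, imputed_gt)
        if t is not None and i is not None
    ]
    if not pairs:
        raise ValueError("no comparable genotypes")
    return sum(t == i for t, i in pairs) / len(pairs)


# Hypothetical masked markers for one validation animal.
true_calls = [0, 1, 2, 1, 0, 2, 1, 0, None, 2]
imputed_calls = [0, 1, 2, 2, 0, 2, 1, 0, 1, 1]
print(f"concordance = {concordance(true_calls, imputed_calls):.3f}")
```

In a study like this one, such a rate would presumably be averaged over animals and markers for each combination of reference size and marker density.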
Affiliation(s)
- DooHo Lee
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Yeongkuk Kim
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Yoonji Chung
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Dongjae Lee
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Dongwon Seo
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Tae Jeong Choi
- National Institute of Animal Science, Cheonan 31000, Korea
- Dajeong Lim
- Animal Genomics and Bioinformatics Division, National Institute of Animal Science, Wanju 55365, Korea
- Duhak Yoon
- Department of Animal Science & Biotechnology, Kyungpook National University, Sangju 37224, Korea
- Seung Hwan Lee
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
3
Samuels DC, Below JE, Ness S, Yu H, Leng S, Guo Y. Alternative Applications of Genotyping Array Data Using Multivariant Methods. Trends Genet 2020; 36:857-867. [PMID: 32773169 PMCID: PMC7572808 DOI: 10.1016/j.tig.2020.07.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/30/2020] [Revised: 07/08/2020] [Accepted: 07/09/2020] [Indexed: 10/23/2022]
Abstract
One of the forerunners that pioneered the revolution of high-throughput genomic technologies is the genotyping microarray technology, which can genotype millions of single-nucleotide variants simultaneously. Owing to apparent benefits, such as high speed, low cost, and high throughput, the genotyping array has gained lasting applications in genome-wide association studies (GWAS) and thus accumulated an enormous amount of data. Empowered by continuous manufacturing upgrades and analytical innovation, unconventional applications of genotyping array data have emerged to address more diverse genetic problems, holding promise of boosting genetic research into human diseases through the re-mining of the rich accumulated data. Here, we review several unconventional genotyping array analysis techniques that have been built on the idea of large-scale multivariant analysis and provide empirical application examples. These unconventional outcomes of genotyping arrays include polygenic score, runs of homozygosity (ROH)/heterozygosity ratio, distant pedigree computation, and mitochondrial DNA (mtDNA) copy number inference.
Affiliation(s)
- David C Samuels
- Department of Molecular Physiology and Biophysics, Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37232, USA
- Jennifer E Below
- Division of Genetic Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Scott Ness
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
- Hui Yu
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
- Shuguang Leng
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
- Yan Guo
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
4
Kanzi AM, San JE, Chimukangara B, Wilkinson E, Fish M, Ramsuran V, de Oliveira T. Next Generation Sequencing and Bioinformatics Analysis of Family Genetic Inheritance. Front Genet 2020; 11:544162. [PMID: 33193618 PMCID: PMC7649788 DOI: 10.3389/fgene.2020.544162] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 03/20/2020] [Accepted: 09/21/2020] [Indexed: 12/29/2022] Open
Abstract
Mendelian and complex genetic trait diseases continue to burden society both socially and economically. The lack of effective tests has hampered diagnosis; thus, affected individuals lack a proper prognosis. Mendelian diseases are caused by genetic mutations in a single gene, while complex trait diseases are caused by the accumulation of mutations in either linked or unlinked genomic regions. Significant advances have been made in identifying novel disease-associated mutations, especially with the introduction of next generation and third generation sequencing. Regardless, some diseases are still without diagnosis, as most tests rely on SNP genotyping panels developed from population-based genetic analyses. Analysis of family genetic inheritance using whole genomes, whole exomes or a panel of genes has been shown to be effective in identifying disease-causing mutations. In this review, we discuss next generation and third generation sequencing platforms, bioinformatic tools and genetic resources commonly used to analyze family-based genomic data with a focus on identifying inherited or novel disease-causing mutations. Additionally, we highlight the analytical, ethical and regulatory challenges associated with analyzing personal genomes, which constitute the data used for family genetic inheritance.
Affiliation(s)
- Aquillah M. Kanzi
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
5
Rediscovering the value of families for psychiatric genetics research. Mol Psychiatry 2019; 24:523-535. [PMID: 29955165 PMCID: PMC7028329 DOI: 10.1038/s41380-018-0073-x] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 11/21/2017] [Revised: 01/11/2018] [Accepted: 03/26/2018] [Indexed: 01/09/2023]
Abstract
As it is likely that both common and rare genetic variation are important for complex disease risk, studies that examine the full range of the allelic frequency distribution should be utilized to dissect the genetic influences on mental illness. The rate limiting factor for inferring an association between a variant and a phenotype is inevitably the total number of copies of the minor allele captured in the studied sample. For rare variation, with minor allele frequencies of 0.5% or less, very large samples of unrelated individuals are necessary to unambiguously associate a locus with an illness. Unfortunately, such large samples are often cost prohibitive. However, by using alternative analytic strategies and studying related individuals, particularly those from large multiplex families, it is possible to reduce the required sample size while maintaining statistical power. We contend that using whole genome sequence (WGS) in extended pedigrees provides a cost-effective strategy for psychiatric gene mapping that complements common variant approaches and WGS in unrelated individuals. This was our impetus for forming the "Pedigree-Based Whole Genome Sequencing of Affective and Psychotic Disorders" consortium. In this review, we provide a rationale for the use of WGS with pedigrees in modern psychiatric genetics research. We begin with a focused review of the current literature, followed by a short history of family-based research in psychiatry. Next, we describe several advantages of pedigrees for WGS research, including power estimates, methods for studying the environment, and endophenotypes. We conclude with a brief description of our consortium and its goals.
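The point about minor-allele copies can be made concrete with simple arithmetic: among unrelated diploid individuals, the expected number of copies of an allele is twice the sample size times its frequency. The sketch below is illustrative only; the function name and the sample sizes are assumptions, not values from the review.

```python
def expected_minor_allele_copies(n_individuals: int, maf: float) -> float:
    """Expected minor-allele copies among unrelated diploid individuals.

    Each individual carries two alleles, so the expectation is 2 * N * MAF.
    """
    if n_individuals < 0:
        raise ValueError("n_individuals must be non-negative")
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency lies between 0 and 0.5")
    return 2 * n_individuals * maf


# Hypothetical sample sizes at the 0.5% frequency threshold named in the text.
for n in (500, 5_000, 50_000):
    copies = expected_minor_allele_copies(n, maf=0.005)
    print(f"N = {n:>6}: about {copies:.0f} expected copies at MAF 0.5%")
```

Within a multiplex family the same variant can be far more frequent than this population expectation, because a heterozygous carrier transmits it to each child with probability 1/2; that is the enrichment the abstract appeals to.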
6
Revisit Population-based and Family-based Genotype Imputation. Sci Rep 2019; 9:1800. [PMID: 30755687 PMCID: PMC6372660 DOI: 10.1038/s41598-018-38469-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/29/2018] [Accepted: 12/27/2018] [Indexed: 11/12/2022] Open
Abstract
Genome-Wide Association (GWA) with population-based imputation (PBI) has been successful in identifying common variants associated with complex diseases; however, much heritability remains to be explained and low frequency variants (LFV) may contribute. To identify LFV, a study of unrelated individuals may no longer be as efficient as a family study, where rare population variants can be frequent in families. Family-based imputation (FBI) provides an opportunity to evaluate LFV. To compare the performance of PBI and FBI, we conducted extensive simulations, generating genotypes using SeqSIMLA from various reference panels for families. We masked genotype information for variants unavailable in Framingham 550 K GWA genotype data in less informative subjects selected by GIGI-Pick. We implemented IMPUTE2 with duoHMM in SHAPEIT (Impute2_duoHMM) for PBI, MERLIN and GIGI for FBI and PedBLIMP for a hybrid approach. In general, FBI in both MERLIN and GIGI outperformed other approaches with imputation accuracy greater than 0.99 for the squared correlation and imputation quality scores (IQS) especially for LFV, although imputation accuracy from MERLIN depends on pedigree splitting for larger families. PBI performed worst with the exception of good imputation accuracy for common variants when a closely ancestry matched reference is used. In summary, linkage disequilibrium (LD) information from large available genotype resources provides good imputation for common variants with well-selected reference panels without requiring densely sequenced data in family members, while imputation of LFV with FBI benefits more from information on inheritance patterns within families yielding better imputation.
7
Ullah E, Mall R, Abbas MM, Kunji K, Nato AQ, Bensmail H, Wijsman EM, Saad M. Comparison and assessment of family- and population-based genotype imputation methods in large pedigrees. Genome Res 2018; 29:125-134. [PMID: 30514702 PMCID: PMC6314157 DOI: 10.1101/gr.236315.118] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 02/26/2018] [Accepted: 11/30/2018] [Indexed: 01/19/2023]
Abstract
Genotype imputation is widely used in genome-wide association studies to boost variant density, allowing increased power in association testing. Many studies currently include pedigree data due to increasing interest in rare variants coupled with the availability of appropriate analysis tools. The performance of population-based (subjects are unrelated) imputation methods is well established. However, the performance of family- and population-based imputation methods on family data has been subject to much less scrutiny. Here, we extensively compare several family- and population-based imputation methods on family data of large pedigrees with both European and African ancestry. Our comparison includes many widely used family- and population-based tools and another method, Ped_Pop, which combines family- and population-based imputation results. We also compare four subject selection strategies for full sequencing to serve as the reference panel for imputation: GIGI-Pick, ExomePicks, PRIMUS, and random selection. Moreover, we compare two imputation accuracy metrics: the Imputation Quality Score and Pearson's correlation R² for predicting power of association analysis using imputation results. Our results show that (1) GIGI outperforms Merlin; (2) family-based imputation outperforms population-based imputation for rare variants but not for common ones; (3) combining family- and population-based imputation outperforms all imputation approaches for all minor allele frequencies; (4) GIGI-Pick gives the best selection strategy based on the R² criterion; and (5) R² is the best measure of imputation accuracy. Our study is the first to extensively evaluate the imputation performance of many available family- and population-based tools on the same family data and provides guidelines for future studies.
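The two accuracy metrics named in this abstract can be sketched as follows. The code assumes genotypes coded as allele counts (0, 1, 2) and takes the Imputation Quality Score to be a chance-corrected agreement of the Cohen's kappa form, which is our understanding of the metric rather than something stated in the abstract; function names and data are hypothetical.

```python
from collections import Counter
from typing import Sequence


def squared_correlation(true_gt: Sequence[float], imputed: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes and imputed values."""
    n = len(true_gt)
    mean_t = sum(true_gt) / n
    mean_i = sum(imputed) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_gt, imputed))
    var_t = sum((t - mean_t) ** 2 for t in true_gt)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_t == 0 or var_i == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov * cov / (var_t * var_i)


def imputation_quality_score(true_gt: Sequence[int], imputed_gt: Sequence[int]) -> float:
    """Chance-corrected agreement: (observed - chance) / (1 - chance)."""
    n = len(true_gt)
    observed = sum(t == i for t, i in zip(true_gt, imputed_gt)) / n
    true_counts = Counter(true_gt)
    imputed_counts = Counter(imputed_gt)
    # Agreement expected if the two genotype vectors were independent.
    chance = sum(true_counts[g] * imputed_counts[g] for g in (0, 1, 2)) / (n * n)
    if chance == 1:
        raise ValueError("score is undefined when chance agreement equals 1")
    return (observed - chance) / (1 - chance)


# A rare variant: one carrier in ten, but the imputation calls everyone 0.
true_rare = [0] * 9 + [1]
imputed_rare = [0] * 10
raw_agreement = sum(t == i for t, i in zip(true_rare, imputed_rare)) / 10
print(f"raw agreement = {raw_agreement:.2f}")
print(f"chance-corrected score = {imputation_quality_score(true_rare, imputed_rare):.2f}")
```

The toy example shows why raw concordance is a poor yardstick for rare variants: calling every genotype homozygous reference agrees with the truth at most positions, yet the chance-corrected score gives it no credit.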
Affiliation(s)
- Ehsan Ullah
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Raghvendra Mall
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Mostafa M Abbas
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Khalid Kunji
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Alejandro Q Nato
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington 98195-9460, USA
- Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, West Virginia 25755, USA
- Halima Bensmail
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Ellen M Wijsman
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington 98195-9460, USA
- Department of Biostatistics, University of Washington, Seattle, Washington 98195-9460, USA
- Mohamad Saad
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
8
Dueker ND, Beecham A, Wang L, Blanton SH, Guo S, Rundek T, Sacco RL. Rare Variants in NOD1 Associated with Carotid Bifurcation Intima-Media Thickness in Dominican Republic Families. PLoS One 2016; 11:e0167202. [PMID: 27936005 PMCID: PMC5147882 DOI: 10.1371/journal.pone.0167202] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/27/2016] [Accepted: 11/10/2016] [Indexed: 12/19/2022] Open
Abstract
Cardiovascular disorders including ischemic stroke (IS) and myocardial infarction (MI) are heritable; however, few replicated loci have been identified. One strategy to identify loci influencing these complex disorders is to study subclinical phenotypes, such as carotid bifurcation intima-media thickness (bIMT). We have previously shown bIMT to be heritable and found evidence for linkage and association with common variants on chromosome 7p for bIMT. In this study, we aimed to characterize contributions of rare variants (RVs) in 7p to bIMT. To achieve this aim, we sequenced the 1 LOD unit down region on 7p in nine extended families from the Dominican Republic (DR) with strong evidence for linkage to bIMT. We then performed the family-based sequence kernel association test (famSKAT) on genes within the 7p region. Analyses were restricted to single nucleotide variants (SNVs) with population-based minor allele frequency (MAF) <5%. We first analyzed all exonic RVs and then the subset of only non-synonymous RVs. There were 68 genes in our analyses. Nucleotide-binding oligomerization domain (NOD1) was the most significantly associated gene when analyzing exonic RVs (famSKAT p = 9.2 × 10⁻⁴; number of SNVs = 14). We achieved suggestive replication of NOD1 in an independent sample of twelve extended families from the DR (p = 0.055). Our study provides suggestive statistical evidence for a role of rare variants in NOD1 in bIMT. Studies in mice have shown Nod1 to play a role in heart function and atherosclerosis, providing biologic plausibility for a role in bIMT, thus making NOD1 an excellent bIMT candidate.
Affiliation(s)
- Nicole D. Dueker
- John P. Hussman Institute for Human Genomics, University of Miami, Miami, Florida, United States of America
- Ashley Beecham
- John P. Hussman Institute for Human Genomics, University of Miami, Miami, Florida, United States of America
- Liyong Wang
- John P. Hussman Institute for Human Genomics, University of Miami, Miami, Florida, United States of America
- Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami, Miami, Florida, United States of America
- Susan H. Blanton
- John P. Hussman Institute for Human Genomics, University of Miami, Miami, Florida, United States of America
- Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami, Miami, Florida, United States of America
- Shengru Guo
- John P. Hussman Institute for Human Genomics, University of Miami, Miami, Florida, United States of America
- Tatjana Rundek
- Department of Neurology, Miller School of Medicine, University of Miami, Miami, Florida, United States of America
- Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, Florida, United States of America
- Ralph L. Sacco
- Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami, Miami, Florida, United States of America
- Department of Neurology, Miller School of Medicine, University of Miami, Miami, Florida, United States of America
- Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, Florida, United States of America
9
Darst BF, Engelman CD. Transmission and decorrelation methods for detecting rare variants using sequencing data from related individuals. BMC Proc 2016; 10:203-207. [PMID: 27980637 PMCID: PMC5133523 DOI: 10.1186/s12919-016-0031-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2023] Open
Abstract
BACKGROUND: Advances in whole genome sequencing have enabled the investigation of rare variants, which could explain some of the missing heritability that genome-wide association studies are unable to detect. Most methods to detect associations with rare variants are developed for unrelated individuals; however, several methods exist that utilize family studies and could have better power to detect such associations. METHODS: Using whole genome sequencing data and simulated phenotypes provided by the organizers of the Genetic Analysis Workshop 19 (GAW19), we compared family-based methods that test for associations between rare and common variants with a quantitative trait. This was done using 2 fairly novel methods: family-based association test for rare variants (FBAT-RV), which is a transmission-based method that utilizes the transmission of genetic information from parent to offspring; and Minimum p value Optimized Nuisance parameter Score Test Extended to Relatives (MONSTER), which is a decorrelation method that instead attempts to adjust for relatedness using a regression-based method. We also considered family-based association test linear combination (FBAT-LC) and FBAT-Min P, which are slightly older methods that do not allow for the weighting of rare or common variants, but contrast some of the limitations of FBAT-RV. RESULTS: MONSTER had much higher overall power than FBAT-RV and FBAT-Min P. Interestingly, FBAT-LC had similar overall power to MONSTER. MONSTER had the highest power for a gene accounting for a larger percent of the phenotypic variance, whereas MONSTER and FBAT-LC both had the highest power for a gene accounting for moderate variance. FBAT-LC had the highest power for a gene accounting for the least variance. CONCLUSIONS: Based on the simulated data from GAW19, MONSTER and FBAT-LC were the most powerful of the methods assessed. However, there are limitations to each of these methods that should be carefully considered when conducting an analysis of rare variants in related individuals. This emphasizes the need for methods that can incorporate the advantages of each of these methods into 1 family-based association test for rare variants.
Affiliation(s)
- Burcu F. Darst
- University of Wisconsin, Madison, WI USA
- Department of Population Health Sciences, University of Wisconsin School of Medicine and Public Health, Madison, WI USA
- Corinne D. Engelman
- University of Wisconsin, Madison, WI USA
- Department of Population Health Sciences, University of Wisconsin School of Medicine and Public Health, Madison, WI USA
10
Genome-wide linkage and association analysis of cardiometabolic phenotypes in Hispanic Americans. J Hum Genet 2016; 62:175-184. [PMID: 27535031 PMCID: PMC5266668 DOI: 10.1038/jhg.2016.103] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/08/2016] [Revised: 07/01/2016] [Accepted: 07/11/2016] [Indexed: 01/01/2023]
Abstract
Linkage studies of complex genetic diseases have been largely replaced by genome-wide association studies, due in part to limited success in complex trait discovery. However, recent interest in rare and low-frequency variants motivates re-examination of family-based methods. In this study, we investigated the performance of two-point linkage analysis for over 1.6 million single-nucleotide polymorphisms (SNPs) combined with single variant association analysis to identify high impact variants, which are both strongly linked and associated with cardiometabolic traits in up to 1414 Hispanics from the Insulin Resistance Atherosclerosis Family Study (IRASFS). Evaluation of all 50 phenotypes yielded 83 557 000 LOD (logarithm of the odds) scores, with 9214 LOD scores ⩾3.0, 845 ⩾4.0 and 89 ⩾5.0, with a maximal LOD score of 6.49 (rs12956744 in the LAMA1 gene for tumor necrosis factor-α (TNFα) receptor 2). Twenty-seven variants were associated with P<0.005 as well as having an LOD score >4, including variants in the NFIB gene under a linkage peak with TNFα receptor 2 levels on chromosome 9. Linkage regions of interest included a broad peak (31 Mb) on chromosome 1q with acute insulin response (max LOD=5.37). This region was previously documented with type 2 diabetes in family-based studies, providing support for the validity of these results. Overall, we have demonstrated the utility of two-point linkage and association in comprehensive genome-wide array-based SNP genotypes.
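For readers unfamiliar with LOD scores, the textbook two-point case can be sketched in a few lines: with k recombinants among n phase-known informative meioses, the LOD at recombination fraction θ is the base-10 log of the likelihood at θ over the likelihood under free recombination (θ = 0.5). This is a deliberately simplified illustration, not the pedigree likelihood computation used in the study; the function name and counts are hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for phase-known meioses at recombination fraction theta.

    LOD(theta) = log10( theta^k * (1 - theta)^(n - k) / 0.5^n )
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # Any observed recombinant is impossible under complete linkage.
        if recombinants > 0:
            return float("-inf")
        log_likelihood_theta = 0.0
    else:
        log_likelihood_theta = (
            recombinants * math.log10(theta)
            + non_recombinants * math.log10(1.0 - theta)
        )
    log_likelihood_null = meioses * math.log10(0.5)
    return log_likelihood_theta - log_likelihood_null


# Hypothetical counts: 1 recombinant among 12 informative meioses.
for theta in (0.01, 0.05, 0.1, 0.2, 0.5):
    print(f"theta = {theta:.2f}: LOD = {two_point_lod(1, 12, theta):.2f}")
```

Each fully informative non-recombinant meiosis contributes at most log10(2), about 0.3, to the score, which is why the conventional threshold of 3.0 needs at least ten such meioses.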
11
Staples J, Witherspoon D, Jorde L, Nickerson D, Below J, Huff C, Huff CD. PADRE: Pedigree-Aware Distant-Relationship Estimation. Am J Hum Genet 2016; 99:154-62. [PMID: 27374771 DOI: 10.1016/j.ajhg.2016.05.020] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 01/21/2016] [Accepted: 05/16/2016] [Indexed: 10/21/2022] Open
Abstract
Accurate estimation of shared ancestry is an important component of many genetic studies; current prediction tools accurately estimate pairwise genetic relationships up to the ninth degree. Pedigree-aware distant-relationship estimation (PADRE) combines relationship likelihoods generated by estimation of recent shared ancestry (ERSA) with likelihoods from family networks reconstructed by pedigree reconstruction and identification of a maximum unrelated set (PRIMUS), improving the power to detect distant relationships between pedigrees. Using PADRE, we estimated relationships from simulated pedigrees and three extended pedigrees, correctly predicting 20% more fourth- through ninth-degree simulated relationships than when using ERSA alone. By leveraging pedigree information, PADRE can even identify genealogical relationships between individuals who are genetically unrelated. For example, although 95% of 13th-degree relatives are genetically unrelated, in simulations, PADRE correctly predicted 50% of 13th-degree relationships to within one degree of relatedness. The improvement in prediction accuracy was consistent between simulated and actual pedigrees. We also applied PADRE to the HapMap3 CEU samples and report new cryptic relationships and validation of previously described relationships between families. PADRE greatly expands the range of relationships that can be estimated by using genetic data in pedigrees.
Affiliation(s)
- Chad D Huff
- Department of Epidemiology, The University of Texas M.D. Anderson Cancer Center, Houston, TX 77030, USA.
|
12
|
Chung RH, Tsai WY, Kang CY, Yao PJ, Tsai HJ, Chen CH. FamPipe: An Automatic Analysis Pipeline for Analyzing Sequencing Data in Families for Disease Studies. PLoS Comput Biol 2016; 12:e1004980. [PMID: 27272119 PMCID: PMC4894624 DOI: 10.1371/journal.pcbi.1004980] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/21/2015] [Accepted: 05/12/2016] [Indexed: 11/18/2022] Open
Abstract
In disease studies, family-based designs have become an attractive approach to analyzing next-generation sequencing (NGS) data for the identification of rare mutations enriched in families. Substantial research effort has been devoted to developing pipelines for automating sequence alignment, variant calling, and annotation. However, fewer pipelines have been designed specifically for disease studies. Most of the current analysis pipelines for family-based disease studies using NGS data focus on a specific function, such as identifying variants with Mendelian inheritance or identifying shared chromosomal regions among affected family members. Consequently, some other useful family-based analysis tools, such as imputation, linkage, and association tools, have yet to be integrated and automated. We developed FamPipe, a comprehensive analysis pipeline, which includes several family-specific analysis modules, including the identification of shared chromosomal regions among affected family members, prioritizing variants assuming a disease model, imputation of untyped variants, and linkage and association tests. We used simulation studies to compare properties of some modules implemented in FamPipe, and based on the results, we provided suggestions for the selection of modules to achieve an optimal analysis strategy. The pipeline is under the GNU GPL License and can be downloaded for free at http://fampipe.sourceforge.net.
Affiliation(s)
- Ren-Hua Chung
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Wei-Yun Tsai
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Chen-Yu Kang
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Po-Ju Yao
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Hui-Ju Tsai
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Department of Public Health, China Medical University, Taichung, Taiwan
- Department of Pediatrics, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, United States of America
- Chia-Hsiang Chen
- Department of Psychiatry, Chang Gung Memorial Hospital-Linkou, Gueishan, Taoyuan, Taiwan
- Department and Graduate Institute of Biomedical Sciences, Chang Gung University, Taoyuan, Taiwan
|
13
|
Knowles EEM, Kent JW, McKay DR, Sprooten E, Mathias SR, Curran JE, Carless MA, de Almeida MAA, Goring HHH, Dyer TD, Olvera RL, Fox PT, Duggirala R, Almasy L, Blangero J, Glahn DC. Genome-wide linkage on chromosome 10q26 for a dimensional scale of major depression. J Affect Disord 2016; 191:123-31. [PMID: 26655122 PMCID: PMC4715913 DOI: 10.1016/j.jad.2015.11.012] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 08/25/2015] [Revised: 10/27/2015] [Accepted: 11/09/2015] [Indexed: 12/28/2022]
Abstract
Major depressive disorder (MDD) is a common and potentially life-threatening mood disorder. Identifying genetic markers for depression might provide reliable indicators of depression risk, which would, in turn, substantially improve detection, enabling earlier and more effective treatment. The aim of this study was to identify rare variants for depression, modeled as a continuous trait, using linkage and post-hoc association analysis. The sample comprised 1221 Mexican-American individuals from extended pedigrees. A single dimensional scale of MDD was derived using confirmatory factor analysis applied to all items from the Past Major Depressive Episode section of the Mini-International Neuropsychiatric Interview. Scores on this scale of depression were subjected to linkage analysis followed by QTL region-specific association analysis. Linkage analysis revealed a single genome-wide significant QTL (LOD=3.43) on 10q26.13. QTL-specific association analysis conducted in the entire sample revealed a suggestive variant within an intron of the gene LHPP (rs11245316, p=7.8×10⁻⁴; LD-adjusted Bonferroni-corrected p=8.6×10⁻⁵). This region of the genome has previously been implicated in the etiology of MDD; the present study extends our understanding of the involvement of this region by highlighting a putative gene of interest (LHPP).
Affiliation(s)
- Emma E M Knowles
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA; Olin Neuropsychiatric Research Center, Institute of Living, Hartford Hospital, Hartford, CT, USA.
- Jack W Kent
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, TX, USA
- D Reese McKay
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA; Olin Neuropsychiatric Research Center, Institute of Living, Hartford Hospital, Hartford, CT, USA
- Emma Sprooten
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA; Olin Neuropsychiatric Research Center, Institute of Living, Hartford Hospital, Hartford, CT, USA
- Samuel R Mathias
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA; Olin Neuropsychiatric Research Center, Institute of Living, Hartford Hospital, Hartford, CT, USA
- Joanne E Curran
- South Texas Diabetes and Obesity Institute, University of Texas Health Science Center at San Antonio & University of Texas of the Rio Grande Valley, Brownsville, TX, United States
- Melanie A Carless
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, TX, USA
- Marcio A A de Almeida
- South Texas Diabetes and Obesity Institute, University of Texas Health Science Center at San Antonio & University of Texas of the Rio Grande Valley, Brownsville, TX, United States
- Harald H H Goring
- South Texas Diabetes and Obesity Institute, University of Texas Health Science Center at San Antonio & University of Texas of the Rio Grande Valley, Brownsville, TX, United States
- Tom D Dyer
- South Texas Diabetes and Obesity Institute, University of Texas Health Science Center at San Antonio & University of Texas of the Rio Grande Valley, Brownsville, TX, United States
- Rene L Olvera
- Department of Psychiatry, University of Texas Health Science Center San Antonio, San Antonio, TX, United States
- Peter T Fox
- Research Imaging Institute, University of Texas Health Science Center San Antonio, San Antonio, TX, United States; South Texas Veterans' Healthcare System, 7400 Merton Minter, San Antonio, TX 78229, USA
- Ravi Duggirala
- South Texas Diabetes and Obesity Institute, University of Texas Health Science Center at San Antonio & University of Texas of the Rio Grande Valley, Brownsville, TX, United States
- Laura Almasy
- South Texas Diabetes and Obesity Institute, University of Texas Health Science Center at San Antonio & University of Texas of the Rio Grande Valley, Brownsville, TX, United States
- John Blangero
- South Texas Diabetes and Obesity Institute, University of Texas Health Science Center at San Antonio & University of Texas of the Rio Grande Valley, Brownsville, TX, United States
- David C Glahn
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA; Olin Neuropsychiatric Research Center, Institute of Living, Hartford Hospital, Hartford, CT, USA
|
14
|
Nato AQ, Chapman NH, Sohi HK, Nguyen HD, Brkanac Z, Wijsman EM. PBAP: a pipeline for file processing and quality control of pedigree data with dense genetic markers. Bioinformatics 2015; 31:3790-8. [PMID: 26231429 PMCID: PMC4668752 DOI: 10.1093/bioinformatics/btv444] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 04/02/2015] [Revised: 07/07/2015] [Accepted: 07/25/2015] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Huge genetic datasets with dense marker panels are now common. With the availability of sequence data and recognition of the importance of rare variants, smaller studies based on pedigrees are once again common. Pedigree-based samples often start with a dense marker panel, a subset of which may be used for linkage analysis to reduce computational burden and to limit linkage disequilibrium between single-nucleotide polymorphisms (SNPs). Programs attempting to select markers for linkage panels exist but lack flexibility. RESULTS: We developed a pedigree-based analysis pipeline (PBAP) suite of programs geared towards SNPs and sequence data. PBAP performs quality control, marker selection and file preparation. PBAP sets up files for MORGAN, which can handle analyses for small and large pedigrees, typically human, and results can be used with other programs and for downstream analyses. We evaluate and illustrate its features with two real datasets. AVAILABILITY AND IMPLEMENTATION: PBAP scripts may be downloaded from http://faculty.washington.edu/wijsman/software.shtml. CONTACT: wijsman@uw.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hiep D Nguyen
- Division of Medical Genetics, Department of Medicine
- Ellen M Wijsman
- Division of Medical Genetics, Department of Medicine, Department of Biostatistics and Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
|
15
|
Chapman NH, Nato AQ, Bernier R, Ankenman K, Sohi H, Munson J, Patowary A, Archer M, Blue EM, Webb SJ, Coon H, Raskind WH, Brkanac Z, Wijsman EM. Whole exome sequencing in extended families with autism spectrum disorder implicates four candidate genes. Hum Genet 2015; 134:1055-68. [PMID: 26204995 PMCID: PMC4578871 DOI: 10.1007/s00439-015-1585-y] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 06/12/2015] [Accepted: 07/11/2015] [Indexed: 12/26/2022]
Abstract
Autism spectrum disorders (ASDs) are a group of neurodevelopmental disorders, characterized by impairment in communication and social interactions, and by repetitive behaviors. ASDs are highly heritable, and estimates of the number of risk loci range from hundreds to >1000. We considered 7 extended families (size 12-47 individuals), each with ≥3 individuals affected by ASD. All individuals were genotyped with dense SNP panels. A small subset of each family was typed with whole exome sequence (WES). We used a 3-step approach for variant identification. First, we used family-specific parametric linkage analysis of the SNP data to identify regions of interest. Second, we filtered variants in these regions based on frequency and function, obtaining exactly 200 candidates. Third, we compared two approaches to narrowing this list further. We used information from the SNP data to impute exome variant dosages into those without WES. We regressed affected status on variant allele dosage, using pedigree-based kinship matrices to account for relationships. The p value for the test of the null hypothesis that variant allele dosage is unrelated to phenotype was used to indicate strength of evidence supporting the variant. A cutoff of p = 0.05 gave 28 variants. As an alternative third filter, we required Mendelian inheritance in those with WES, resulting in 70 variants. The imputation- and association-based approach was effective. We identified four strong candidate genes for ASD (SEZ6L, HISPPD1, FEZF1, SAMD11), all of which have been previously implicated in other studies, or have a strong biological argument for their relevance.
Affiliation(s)
- Nicola H Chapman
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA
- Alejandro Q Nato
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA
- Raphael Bernier
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Katy Ankenman
- Department of Psychiatry, University of California, San Francisco, CA, USA
- Harkirat Sohi
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA
- Jeff Munson
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Center on Child Development and Disability, University of Washington, Seattle, WA, USA
- Ashok Patowary
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Marilyn Archer
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Elizabeth M Blue
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA
- Sara Jane Webb
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Center on Child Development and Disability, University of Washington, Seattle, WA, USA
- Hilary Coon
- Department of Internal Medicine, University of Utah, Salt Lake City, UT, USA
- Department of Psychiatry, School of Medicine, University of Utah, Salt Lake City, UT, USA
- Wendy H Raskind
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Zoran Brkanac
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Ellen M Wijsman
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA.
- Department of Biostatistics, University of Washington, Seattle, WA, USA.
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- University of Washington, University of Washington Tower, T15, 4333 Brooklyn Ave, NE, BOX 359460, Seattle, WA, 98195-9460, USA.
|
16
|
Feng S, Pistis G, Zhang H, Zawistowski M, Mulas A, Zoledziewska M, Holmen OL, Busonero F, Sanna S, Hveem K, Willer C, Cucca F, Liu DJ, Abecasis GR. Methods for association analysis and meta-analysis of rare variants in families. Genet Epidemiol 2015; 39:227-38. [PMID: 25740221 DOI: 10.1002/gepi.21892] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 10/27/2014] [Revised: 01/03/2015] [Accepted: 01/26/2015] [Indexed: 11/09/2022]
Abstract
Advances in exome sequencing and the development of exome genotyping arrays are enabling explorations of association between rare coding variants and complex traits. To ensure power for these rare variant analyses, a variety of association tests that group variants by gene or functional unit have been proposed. Here, we extend these tests to family-based studies. We develop family-based burden tests, variable frequency threshold tests and sequence kernel association tests. Through simulations, we compare the performance of different tests. We describe situations where family-based studies provide greater power than studies of unrelated individuals to detect rare variants associated with moderate to large changes in trait values. Broadly speaking, we find that when sample sizes are limited and only a modest fraction of all trait-associated variants can be identified, family samples are more powerful. Finally, we illustrate our approach by analyzing the relationship between coding variants and levels of high-density lipoprotein (HDL) cholesterol in 11,556 individuals from the HUNT and SardiNIA studies, demonstrating association for coding variants in the APOC3, CETP, LIPC, LIPG, and LPL genes and illustrating the value of family samples, meta-analysis, and gene-level tests. Our methods are implemented in freely available C++ code.
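The grouping idea behind burden tests can be sketched as follows: rare-allele counts within a gene are collapsed into one score per individual, which is then tested against the trait. This is a simplified illustration and not the authors' C++ implementation. The function name, the frequency threshold and the toy genotypes are hypothetical, allele frequencies are estimated naively from all samples, and the step that accounts for relatedness in families is not shown.

```python
def burden_scores(genotypes, maf_threshold=0.01):
    """Collapse rare variants in one gene into a per-individual burden score.

    genotypes: list of rows, one per individual, each a list of minor-allele
    counts (0, 1 or 2) for the variants in the gene.
    Returns (scores, rare_variant_indices).
    """
    if not genotypes:
        raise ValueError("genotypes must contain at least one individual")
    n_individuals = len(genotypes)
    n_variants = len(genotypes[0])

    # Naive sample allele frequency per variant (ignores relatedness).
    frequencies = [
        sum(row[j] for row in genotypes) / (2 * n_individuals)
        for j in range(n_variants)
    ]
    # Keep variants that are observed but at or below the frequency threshold.
    rare = [j for j, f in enumerate(frequencies) if 0 < f <= maf_threshold]

    scores = [sum(row[j] for j in rare) for row in genotypes]
    return scores, rare
```

With four individuals and genotypes `[[0, 1, 2], [0, 0, 1], [1, 0, 2], [0, 0, 1]]`, a threshold of 0.2 keeps the first two variants and discards the common third one. A family-based analysis would then regress the trait on these scores while modeling the kinship between relatives.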
Affiliation(s)
- Shuang Feng
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
|
17
|
Peng B. Reproducible simulations of realistic samples for next-generation sequencing studies using Variant Simulation Tools. Genet Epidemiol 2015; 39:45-52. [PMID: 25395236 PMCID: PMC6432799 DOI: 10.1002/gepi.21867] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/16/2014] [Revised: 09/14/2014] [Accepted: 09/26/2014] [Indexed: 12/31/2022]
Abstract
Computer simulations have been widely used to validate and evaluate the power of statistical methods for genetic epidemiological studies. Although a large number of simulation methods and software packages have been developed for genome-wide association studies, methodological and bioinformatics challenges have limited their applications in simulating datasets for whole-genome and whole-exome sequencing studies. With the development of more sophisticated statistical methods that make fuller use of available data and our knowledge of the human genome, there is a pressing need for genetic simulators that capture more features of empirical data (e.g., multiallele variants, indels, use of the Variant Call Format) and the human genome (e.g., functional annotations of genetic variants). This article introduces Variant Simulation Tools (VST), a module of Variant Tools for the simulation of genetic variants for sequencing-based genetic epidemiological studies. Although multiple simulation engines are provided, the core of VST is a novel forward-time simulation engine that simulates real nucleotide sequences of the human genome using DNA mutation models, fine-scale recombination maps, and a selection model based on amino acid changes of translated protein sequences. The design of VST allows users to easily create and distribute simulation methods and simulated datasets for a variety of applications and encourages fair comparison between statistical methods through the use of existing or reproduced simulated datasets.
Affiliation(s)
- Bo Peng
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, 1400 Pressler Street, Unit 1401, Houston, TX, 77030
|
18
|
Lin WY. Adaptive combination of P-values for family-based association testing with sequence data. PLoS One 2014; 9:e115971. [PMID: 25541952 PMCID: PMC4277421 DOI: 10.1371/journal.pone.0115971] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/19/2014] [Accepted: 12/01/2014] [Indexed: 12/24/2022] Open
Abstract
Family-based study design will play a key role in identifying rare causal variants, because rare causal variants can be enriched in families with multiple affected subjects. Furthermore, different from population-based studies, family studies are robust to bias induced by population substructure. It is well known that rare causal variants are difficult to detect from single-locus tests. Therefore, burden tests and non-burden tests have been developed, by combining signals of multiple variants in a chromosomal region or a functional unit. This inevitably incorporates some neutral variants into the test statistics, which can dilute the power of statistical methods. To guard against the noise caused by neutral variants, we here propose an 'adaptive combination of P-values method' (abbreviated as 'ADA'). This method combines per-site P-values of variants that are more likely to be causal. Variants with large P-values (which are more likely to be neutral variants) are discarded from the combined statistic. In addition to performing extensive simulation studies, we applied these tests to the Genetic Analysis Workshop 17 data sets, where real sequence data were generated according to the 1000 Genomes Project. Compared with some existing methods, ADA is more robust to the inclusion of neutral variants. This is a merit especially when dichotomous traits are analyzed. However, there are some limitations for ADA. First, it is more computationally intensive. Second, pedigree structures and founders' sequence data are required for the permutation procedure. Third, unrelated controls cannot be included. We here show that, for family-based studies, the application of ADA is limited to dichotomous trait analyses with full pedigree information.
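The truncation step described above, in which variants with large per-site P-values are discarded before combining, can be illustrated with a minimal unweighted sketch. It assumes a Fisher-type sum of −ln(p) over the retained sites and a hypothetical set of candidate thresholds. It is not the published ADA statistic, which as the abstract notes also requires a permutation procedure to assess significance; that procedure is not shown.

```python
import math


def truncated_log_p_sum(p_values, threshold):
    """Sum -ln(p) over sites whose P-value is below the truncation threshold.

    Sites with p >= threshold (more likely neutral) are discarded.
    """
    return sum(-math.log(p) for p in p_values if 0.0 < p < threshold)


def scan_thresholds(p_values, thresholds=(0.10, 0.15, 0.20)):
    """Evaluate the truncated statistic at several candidate thresholds.

    The threshold values are illustrative assumptions. An adaptive procedure
    would pick the best threshold and correct for that choice by permutation.
    """
    return {t: truncated_log_p_sum(p_values, t) for t in thresholds}
```

For instance, `truncated_log_p_sum([0.01, 0.5, 0.04], 0.05)` keeps only the first and third sites, so the neutral-looking site with p = 0.5 does not dilute the statistic.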
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
|
19
|
Saad M, Wijsman EM. Combining family- and population-based imputation data for association analysis of rare and common variants in large pedigrees. Genet Epidemiol 2014; 38:579-90. [PMID: 25132070 PMCID: PMC4190076 DOI: 10.1002/gepi.21844] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 02/28/2014] [Revised: 05/24/2014] [Accepted: 06/27/2014] [Indexed: 12/27/2022]
Abstract
In the last two decades, complex traits have become the main focus of genetic studies. The hypothesis that both rare and common variants are associated with complex traits is increasingly being discussed. Family-based association studies using relatively large pedigrees are suitable for both rare and common variant identification. Because of the high cost of sequencing technologies, imputation methods are important for increasing the amount of information at low cost. A recent family-based imputation method, Genotype Imputation Given Inheritance (GIGI), is able to handle large pedigrees and accurately impute rare variants, but does less well for common variants where population-based methods perform better. Here, we propose a flexible approach to combine imputation data from both family- and population-based methods. We also extend the Sequence Kernel Association Test for Rare and Common variants (SKAT-RC), originally proposed for data from unrelated subjects, to family data in order to make use of such imputed data. We call this extension "famSKAT-RC." We compare the performance of famSKAT-RC and several other existing burden and kernel association tests. In simulated pedigree sequence data, our results show an increase of imputation accuracy from use of our combining approach. Also, they show an increase of power of the association tests with this approach over the use of either family- or population-based imputation methods alone, in the context of rare and common variants. Moreover, our results show better performance of famSKAT-RC compared to the other considered tests, in most scenarios investigated here.
Affiliation(s)
- Mohamad Saad
- Division of Medical Genetics, Department of Medicine; and Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Ellen M. Wijsman
- Division of Medical Genetics, Department of Medicine; and Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
|
20
|
Grarup N, Sandholt CH, Hansen T, Pedersen O. Genetic susceptibility to type 2 diabetes and obesity: from genome-wide association studies to rare variants and beyond. Diabetologia 2014; 57:1528-41. [PMID: 24859358 DOI: 10.1007/s00125-014-3270-4] [Citation(s) in RCA: 127] [Impact Index Per Article: 11.5] [Received: 12/13/2013] [Accepted: 04/22/2014] [Indexed: 12/29/2022]
Abstract
During the past 7 years, genome-wide association studies have shed light on the contribution of common genomic variants to the genetic architecture of type 2 diabetes, obesity and related intermediate phenotypes. The discoveries have firmly established more than 175 genomic loci associated with these phenotypes. Despite the tight correlation between type 2 diabetes and obesity, these conditions do not appear to share a common genetic background, since they have few genetic risk loci in common. The recent genetic discoveries do however highlight specific details of the interplay between the pathogenesis of type 2 diabetes, insulin resistance and obesity. The focus is currently shifting towards investigations of data from targeted array-based genotyping and exome and genome sequencing to study the individual and combined effect of low-frequency and rare variants in metabolic disease. Here we review recent progress as regards the concepts, methodologies and derived outcomes of studies of the genetics of type 2 diabetes and obesity, and discuss avenues to be investigated in the future within this research field.
Affiliation(s)
- Niels Grarup
- The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, DIKU Building, Universitetsparken 1, 2100, Copenhagen Ø, Denmark
|
21
|
Jiang Y, Conneely KN, Epstein MP. Flexible and robust methods for rare-variant testing of quantitative traits in trios and nuclear families. Genet Epidemiol 2014; 38:542-51. [PMID: 25044337 DOI: 10.1002/gepi.21839] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 03/24/2014] [Revised: 05/21/2014] [Accepted: 05/29/2014] [Indexed: 11/07/2022]
Abstract
Most rare-variant association tests for complex traits are applicable only to population-based or case-control resequencing studies. There are fewer rare-variant association tests for family-based resequencing studies, which is unfortunate because pedigrees possess many attractive characteristics for such analyses. Family-based studies can be more powerful than their population-based counterparts due to increased genetic load and further enable the implementation of rare-variant association tests that, by design, are robust to confounding due to population stratification. With this in mind, we propose a rare-variant association test for quantitative traits in families; this test integrates the QTDT approach of Abecasis et al. into the kernel-based SNP association test KMFAM of Schifano et al. The resulting within-family test enjoys the many benefits of the kernel framework for rare-variant association testing, including rapid evaluation of P-values and preservation of power when a region harbors rare causal variation that acts in different directions on phenotype. Additionally, by design, this within-family test is robust to confounding due to population stratification. Although within-family association tests are generally less powerful than their counterparts that use all genetic information, we show that we can recover much of this power (although still ensuring robustness to population stratification) using a straightforward screening procedure. Our method accommodates covariates and allows for missing parental genotype data, and we have written software implementing the approach in R for public use.
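The within-family idea that gives such tests their robustness to stratification can be sketched as an orthogonal decomposition of each genotype into a between-family component and a within-family deviation. The sketch below assumes sibships without parental genotypes, where the sibship mean serves as the between-family component. It illustrates the general QTDT-style decomposition only, not the authors' R software or the kernel test itself, and the function name and family labels are hypothetical.

```python
def between_within_components(genotypes_by_family):
    """Split genotypes into between-family and within-family components.

    genotypes_by_family: dict mapping a family ID to a list of sibling
    genotypes (minor-allele counts 0, 1 or 2) at one variant.
    Returns a dict mapping family ID to (between, within_deviations).
    Assumption: the sibship mean is used as the between-family component,
    as is common when parental genotypes are unavailable.
    """
    components = {}
    for family_id, genotypes in genotypes_by_family.items():
        if not genotypes:
            raise ValueError(f"family {family_id!r} has no genotypes")
        between = sum(genotypes) / len(genotypes)
        # Within-family deviations sum to zero inside each family, so they
        # are unaffected by allele-frequency differences between families.
        within = [g - between for g in genotypes]
        components[family_id] = (between, within)
    return components
```

Testing the trait against the within-family deviations alone discards between-family information, which is why, as the abstract notes, such tests trade some power for robustness.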
Affiliation(s)
- Yunxuan Jiang
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America
|
22
|
Rubenstein K, Raskind WH, Berninger VW, Matsushita MM, Wijsman EM. Genome scan for cognitive trait loci of dyslexia: Rapid naming and rapid switching of letters, numbers, and colors. Am J Med Genet B Neuropsychiatr Genet 2014; 165B:345-56. [PMID: 24807833 PMCID: PMC4053475 DOI: 10.1002/ajmg.b.32237] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 11/06/2013] [Accepted: 04/14/2014] [Indexed: 12/14/2022]
Abstract
Dyslexia, or specific reading disability, is a common developmental disorder that affects 5-12% of school-aged children. Dyslexia and its component phenotypes, assessed categorically or quantitatively, have complex genetic bases. The ability to rapidly name letters, numbers, and colors from rows presented visually correlates strongly with reading in multiple languages and is a valid predictor of reading and spelling impairment. Performance on measures of rapid naming and switching, RAN and RAS, is stable throughout elementary school years, with slowed performance persisting in adults who still manifest dyslexia. Targeted analyses of dyslexia candidate regions have included RAN measures, but only one other genome-wide linkage study has been reported. As part of a broad effort to identify genetic contributors to dyslexia, we performed combined oligogenic segregation and linkage analyses of measures of RAN and RAS in a family-based cohort ascertained through probands with dyslexia. We obtained strong evidence for linkage of RAN letters to the DYX3 locus on chromosome 2p and RAN colors to chromosome 10q, but were unable to confirm the chromosome 6p21 linkage detected for a composite measure of RAN colors and objects in the previous genome-wide study.
Affiliation(s)
- Kevin Rubenstein
- Department of Biostatistics, University of Washington, Seattle, WA
- Wendy H. Raskind
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA
- Mark M. Matsushita
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA
- Ellen M. Wijsman
- Department of Biostatistics, University of Washington, Seattle, WA
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA
|
23
|
A statistical framework to guide sequencing choices in pedigrees. Am J Hum Genet 2014; 94:257-67. [PMID: 24507777 DOI: 10.1016/j.ajhg.2014.01.005] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 11/14/2013] [Accepted: 01/13/2014] [Indexed: 11/23/2022] Open
Abstract
The use of large pedigrees is an effective design for identifying rare functional variants affecting heritable traits. Cost-effective studies using sequence data can be achieved via pedigree-based genotype imputation in which some subjects are sequenced and missing genotypes are inferred on the remaining subjects. Because of high cost, it is important to carefully prioritize subjects for sequencing. Here, we introduce a statistical framework that enables systematic comparison among subject-selection choices for sequencing. We introduce a metric "local coverage," which allows the use of inferred inheritance vectors to measure genotype-imputation ability specifically in a region of interest, such as one with prior evidence of linkage. In the absence of linkage information, we can instead use a "genome-wide coverage" metric computed with the pedigree structure. These metrics enable the development of a method that identifies efficient selection choices for sequencing. As implemented in GIGI-Pick, this method also flexibly allows initial manual selection of subjects and optimizes selections within the constraint that only some subjects might be available for sequencing. In the present study, we used simulations to compare GIGI-Pick with PRIMUS, ExomePicks, and common ad hoc methods of selecting subjects. In genotype imputation of both common and rare alleles, GIGI-Pick substantially outperformed all other methods considered and had the added advantage of incorporating prior linkage information. We also used a real pedigree to demonstrate the utility of our approach in identifying causal mutations. Our work enables prioritization of subjects for sequencing to facilitate dissection of the genetic basis of heritable traits.
|