1
Brugger M, Lutz M, Müller-Nurasyid M, Lichtner P, Slater EP, Matthäi E, Bartsch DK, Strauch K. Joint Linkage and Association Analysis Using GENEHUNTER-MODSCORE with an Application to Familial Pancreatic Cancer. Hum Hered 2024; 89:8-31. [PMID: 38198765 DOI: 10.1159/000535840] [Received: 09/11/2023] [Accepted: 12/07/2023] [Indexed: 01/12/2024]
Abstract
INTRODUCTION Joint linkage and association (JLA) analysis combines two disease gene mapping strategies: linkage information contained in families and association information contained in populations. Such a JLA analysis can increase mapping power, especially when the evidence for both linkage and association is low to moderate. Similarly, an association analysis based on haplotypes instead of single markers can increase mapping power when the association pattern is complex. METHODS In this paper, we present an extension to the GENEHUNTER-MODSCORE software package that enables a JLA analysis based on haplotypes and uses information from arbitrary pedigree types and unrelated individuals. Our new JLA method is an extension of the MOD score approach for linkage analysis, which allows the estimation of trait-model and linkage disequilibrium (LD) parameters, i.e., penetrance, disease-allele frequency, and haplotype frequencies. LD is modeled between alleles at a single diallelic disease locus and up to three diallelic test markers. Linkage information is contributed by additional multi-allelic flanking markers. We investigated the statistical properties of our JLA implementation using extensive simulations, and we compared our approach to another commonly used single-marker JLA test. To demonstrate the applicability of our new method in practice, we analyzed pedigree data from the German National Case Collection for Familial Pancreatic Cancer (FaPaCa). RESULTS Based on the simulated data, we demonstrated the validity of our JLA-MOD score analysis implementation and identified scenarios in which haplotype-based tests outperformed the single-marker test. The estimated trait-model and LD parameters were in good accordance with the simulated values. Our method outperformed another commonly used JLA single-marker test when the LD pattern was complex. 
The exploratory analysis of the FaPaCa families led to the identification of a promising genetic region on chromosome 22q13.33, which can serve as a starting point for future mutation analysis and molecular research in pancreatic cancer. CONCLUSION Our newly proposed JLA-MOD score method proves to be a valuable gene mapping and characterization tool, especially when either linkage or association information alone provides insufficient power to identify the disease-causing genetic variants.
Affiliation(s)
- Markus Brugger
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany
- Institute of Medical Information Processing, Biometry and Epidemiology - IBE, LMU Munich, Munich, Germany
- Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
- Manuel Lutz
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany
- Institute of Medical Information Processing, Biometry and Epidemiology - IBE, LMU Munich, Munich, Germany
- Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
- Martina Müller-Nurasyid
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany
- Institute of Medical Information Processing, Biometry and Epidemiology - IBE, LMU Munich, Munich, Germany
- Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
- Peter Lichtner
- Institute of Human Genetics, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
- Emily P Slater
- Department of Visceral, Thoracic and Vascular Surgery, Philipps University, Marburg, Germany
- Elvira Matthäi
- Department of Visceral, Thoracic and Vascular Surgery, Philipps University, Marburg, Germany
- Detlef K Bartsch
- Department of Visceral, Thoracic and Vascular Surgery, Philipps University, Marburg, Germany
- Konstantin Strauch
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany
- Institute of Medical Information Processing, Biometry and Epidemiology - IBE, LMU Munich, Munich, Germany
- Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
2
Brugger M, Rospleszcz S, Strauch K. Estimation of Trait-Model Parameters in a MOD Score Linkage Analysis. Hum Hered 2017; 82:103-139. [PMID: 29131067 PMCID: PMC6187844 DOI: 10.1159/000479738] [Received: 03/27/2017] [Accepted: 07/25/2017] [Indexed: 12/02/2022]
Abstract
Background/Aims Theoretically, the trait-model parameters (disease allele frequency and penetrance function) can be estimated without bias in a MOD score linkage analysis. We aimed to practically evaluate the MOD score approach regarding its ability to provide unbiased trait-model parameters for various pedigree-type and trait-model scenarios. We further investigated the ability of the MOD score approach to detect imprinting using affected sib pairs (ASPs) and affected half-sib pairs (AHSPs) when all parental genotypes are missing. Methods Simulated pedigree data were analyzed using the GENEHUNTER-MODSCORE software package. Parameter estimation performance in terms of bias and variability was evaluated with regard to trait-model type and pedigree complexity. Results Generally, parameters were estimated with lower bias and variability with increasing pedigree complexity, especially for recessive and overdominant models. However, dominant and additive models could hardly be distinguished even when using 3-generation pedigrees. Imprinting could clearly be detected for mixtures of mainly ASPs and only few AHSPs with the common parent of the imprinted sex, even though no parental genotypes were available. Conclusion Our results provide guidance to researchers regarding the possibility to estimate trait-model parameters by a MOD score analysis, including the degree of imprinting, with certain types of pedigrees.
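For readers unfamiliar with the term, the MOD score referred to in this abstract can be written compactly as a LOD score maximized over the trait-model parameters. The notation below is an assumption introduced here for illustration and is not taken from the paper: $x$ denotes the genetic position of the putative disease locus, $\theta$ collects the trait-model parameters (penetrances and disease-allele frequency), and $L$ is the pedigree likelihood.

```latex
% Illustrative notation only (symbols x, \theta, L are assumptions, not the authors').
\[
  \mathrm{LOD}(x;\theta)
    = \log_{10}\frac{L(\text{data}\mid \text{disease locus at } x,\ \theta)}
                    {L(\text{data}\mid \text{disease locus unlinked},\ \theta)},
  \qquad
  \mathrm{MOD} = \max_{x,\,\theta}\ \mathrm{LOD}(x;\theta).
\]
```

The values of $\theta$ at which the maximum is attained are the trait-model parameter estimates whose bias and variability the study evaluates.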
Affiliation(s)
- Markus Brugger
- Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, and Chair of Genetic Epidemiology, IBE, Faculty of Medicine, LMU Munich, Germany
3
Künzel T, Strauch K. Parameter estimation and quantitative parametric linkage analysis with GENEHUNTER-QMOD. Hum Hered 2012; 73:208-19. [PMID: 22948723 DOI: 10.1159/000339904] [Received: 03/06/2012] [Accepted: 06/05/2012] [Indexed: 11/19/2022]
Abstract
OBJECTIVE We present a parametric method for linkage analysis of quantitative phenotypes. The method provides a test for linkage as well as estimates of different phenotype parameters. We have implemented our new method in the program GENEHUNTER-QMOD and evaluated its properties by performing simulations. METHODS The phenotype is modeled as a normally distributed variable, with a separate distribution for each genotype. Parameter estimates are obtained by maximizing the LOD score over the normal distribution parameters with a gradient-based optimization method called PGRAD. RESULTS The PGRAD method has lower power to detect linkage than the variance components analysis (VCA) in the case of a normal distribution and small pedigrees. However, it outperforms the VCA and Haseman-Elston regression for extended pedigrees, nonrandomly ascertained data, and non-normally distributed phenotypes. Here, the higher power is even accompanied by conservativeness, while the VCA has an inflated type I error. Parameter estimation tends to underestimate residual variances but performs better for expectation values of the phenotype distributions. CONCLUSION With GENEHUNTER-QMOD, a powerful new tool is provided to explicitly model quantitative phenotypes in the context of linkage analysis. It is freely available at http://www.helmholtz-muenchen.de/genepi/downloads.
Affiliation(s)
- Thomas Künzel
- Institute of Medical Biometry and Epidemiology, Philipps University Marburg, Marburg, Germany.
4
Flaquer A, Strauch K. A comparison of different linkage statistics in small to moderate sized pedigrees with complex diseases. BMC Res Notes 2012; 5:411. [PMID: 22862841 PMCID: PMC3475142 DOI: 10.1186/1756-0500-5-411] [Received: 06/26/2012] [Accepted: 07/25/2012] [Indexed: 11/10/2022]
Abstract
Background In recent years, GWA studies have successfully identified common SNPs associated with complex diseases. However, most of the variants found this way account for only a small portion of the trait variance. This fact has led researchers to focus on rare-variant mapping with large-scale sequencing, which can be facilitated by using linkage information. The question arises why linkage analysis often fails to identify genes when analyzing complex diseases. Using simulations, we investigated the power of parametric and nonparametric linkage statistics (KC-LOD, NPL, LOD and MOD scores) to detect the effect of genes responsible for complex diseases using different pedigree structures. Results As expected, a small number of pedigrees with fewer than three affected individuals has low power to map disease genes with modest effect. Interestingly, the power decreases when unaffected individuals are included in the analysis, irrespective of the true mode of inheritance. Furthermore, we found that the best performing statistic depends not only on the type of pedigrees but also on the true mode of inheritance. Conclusions When applied in a sensible way, linkage analysis is an appropriate and robust technique to map genes for complex diseases. Unlike association analysis, linkage analysis is not hampered by allelic heterogeneity. So, why does linkage analysis often fail with complex diseases? Evidently, when using an insufficient number of small pedigrees, one might miss a true genetic linkage even though a real effect exists. Furthermore, we show that the test statistic has an important effect on the power to detect linkage as well. Therefore, a linkage analysis might fail if an inadequate test statistic is employed. We provide recommendations regarding the most favorable test statistics, in terms of power, for a given mode of inheritance and type of pedigrees under study, in order to reduce the probability of missing a true linkage.
Affiliation(s)
- Antònia Flaquer
- Institute of Medical Informatics, Biometry and Epidemiology, Chair of Genetic Epidemiology, Ludwig-Maximilians-Universität (LMU) Munich, Germany.
5
Axenovich TI, Aulchenko YS. MQScore_SNP software for multipoint parametric linkage analysis of quantitative traits in large pedigrees. Ann Hum Genet 2010; 74:286-9. [PMID: 20529018 DOI: 10.1111/j.1469-1809.2010.00576.x] [Indexed: 12/31/2022]
Abstract
We describe software for multipoint parametric linkage analysis of quantitative traits using information about SNP genotypes. A mixed model of major gene and polygene inheritance is implemented in this software. Implementation of several algorithms to avoid computational underflow and decrease running time permits application of our software to the analysis of very large pedigrees collected in genetically isolated human populations. We tested our software by performing linkage analysis of adult height in a large pedigree from a Dutch isolated population. Three significant and four suggestive loci were identified with the help of our programs, whereas variance-component-based linkage analysis, which requires pedigree fragmentation, demonstrated only three suggestive peaks. The software package MQScore_SNP is available at http://mga.bionet.nsc.ru/soft/index.html.
Affiliation(s)
- Tatiana I Axenovich
- Institute of Cytology & Genetics, Siberian Division, Russian Academy of Sciences, Novosibirsk, 630090, Russia.
6
Hodge SE, Rodriguez-Murillo L, Strug LJ, Greenberg DA. Multipoint lods provide reliable linkage evidence despite unknown limiting distribution: type I error probabilities decrease with sample size for multipoint lods and mods. Genet Epidemiol 2009; 32:800-15. [PMID: 18613118 DOI: 10.1002/gepi.20350] [Indexed: 11/11/2022]
Abstract
We investigate the behavior of type I error rates in model-based multipoint (MP) linkage analysis, as a function of sample size (N). We consider both MP lods (i.e., MP linkage analysis that uses the correct genetic model) and MP mods (maximizing MP lods over 18 dominant and recessive models). Following Xing and Elston (2006 Genet. Epidemiol, 30: 447-458), we first consider MP linkage analysis limited to a single position; then we enlarge the scope and maximize the lods and mods over a span of positions. In all situations we examined, type I error rates decrease with increasing sample size, apparently approaching zero. We show: (a) For MP lods analyzed only at a single position, well-known statistical theory predicts that type I error rates approach zero. (b) For MP lods and mods maximized over position, this result has a different explanation, related to the fact that one maximizes the scores over only a finite portion of the parameter range. The implications of these findings may be far-reaching: Although it is widely accepted that fixed nominal critical values for MP lods and mods are not known, this study shows that whatever the nominal error rates are, the actual error rates appear to decrease with increasing sample size. Moreover, the actual (observed) type I error rate may be quite small for any given study. We conclude that MP lod and mod scores provide reliable linkage evidence for complex diseases, despite the unknown limiting distributions of these MP scores.
Affiliation(s)
- Susan E Hodge
- Division of Epidemiology, NY State Psychiatric Institute, New York, New York 10032, USA.
7
Mattheisen M, Dietter J, Knapp M, Baur MP, Strauch K. Inferential testing for linkage with GENEHUNTER-MODSCORE: The impact of the pedigree structure on the null distribution of multipoint MOD scores. Genet Epidemiol 2008; 32:73-83. [PMID: 17849490 DOI: 10.1002/gepi.20264] [Indexed: 11/08/2022]
Abstract
The asymptotic distribution of [MOD] scores under the null hypothesis of no linkage is only known for affected sib pairs and other types of affected relative pairs. We have extended the GENEHUNTER-MODSCORE program to allow for simulations under the null hypothesis of no linkage to determine the empirical significance of MOD-score results in general situations. We performed simulations with families of different size (one million replicates of 500 families per simulation setting) to thoroughly investigate the impact of the pedigree size on the null distribution of multipoint MOD scores. It is shown that the distribution is dependent on the size and structure of the pedigrees under study. By performing simulations in the context of MOD-score analysis, our new tool efficiently explores the linkage data in a comprehensive way and also provides a valid method to inferentially test for linkage.
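The idea of determining empirical significance from simulations under the null hypothesis can be sketched in a few lines. This is a minimal illustration, not the GENEHUNTER-MODSCORE implementation: the function name `empirical_p_value` is hypothetical, and the add-one correction used here is a common convention assumed for the sketch rather than something stated in the abstract.

```python
from typing import Sequence


def empirical_p_value(observed_mod: float, null_mods: Sequence[float]) -> float:
    """Estimate a p-value for an observed MOD score from null replicates.

    null_mods holds the MOD scores obtained from data sets simulated under
    no linkage (hypothetical input; how they are generated is not shown).
    The add-one correction (r + 1) / (n + 1) is an assumption of this sketch;
    it keeps the estimate strictly positive.
    """
    if len(null_mods) == 0:
        raise ValueError("at least one null replicate is required")
    # Count null replicates whose score is at least as large as the observed one.
    exceed = sum(1 for score in null_mods if score >= observed_mod)
    return (exceed + 1) / (len(null_mods) + 1)


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not results from the paper.
    print(empirical_p_value(3.0, [0.5, 1.2, 3.4, 2.9]))
```

Because the null distribution depends on the size and structure of the pedigrees, the replicates would have to be simulated for the same pedigree set as the observed data.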
Affiliation(s)
- Manuel Mattheisen
- Institute for Medical Biometry, Informatics, and Epidemiology, University of Bonn, Bonn, Germany.