1
Chung RH, Tsai WY, Kang CY, Yao PJ, Tsai HJ, Chen CH. FamPipe: An Automatic Analysis Pipeline for Analyzing Sequencing Data in Families for Disease Studies. PLoS Comput Biol 2016; 12:e1004980. [PMID: 27272119 PMCID: PMC4894624 DOI: 10.1371/journal.pcbi.1004980] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/21/2015] [Accepted: 05/12/2016] [Indexed: 11/18/2022]
Abstract
In disease studies, family-based designs have become an attractive approach to analyzing next-generation sequencing (NGS) data for the identification of rare mutations enriched in families. Substantial research effort has been devoted to developing pipelines for automating sequence alignment, variant calling, and annotation. However, fewer pipelines have been designed specifically for disease studies. Most of the current analysis pipelines for family-based disease studies using NGS data focus on a specific function, such as identifying variants with Mendelian inheritance or identifying shared chromosomal regions among affected family members. Consequently, some other useful family-based analysis tools, such as imputation, linkage, and association tools, have yet to be integrated and automated. We developed FamPipe, a comprehensive analysis pipeline, which includes several family-specific analysis modules, including the identification of shared chromosomal regions among affected family members, prioritizing variants assuming a disease model, imputation of untyped variants, and linkage and association tests. We used simulation studies to compare properties of some modules implemented in FamPipe, and based on the results, we provided suggestions for the selection of modules to achieve an optimal analysis strategy. The pipeline is under the GNU GPL License and can be downloaded for free at http://fampipe.sourceforge.net.
Affiliation(s)
- Ren-Hua Chung
  - Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Wei-Yun Tsai
  - Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Chen-Yu Kang
  - Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Po-Ju Yao
  - Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Hui-Ju Tsai
  - Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
  - Department of Public Health, China Medical University, Taichung, Taiwan
  - Department of Pediatrics, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, United States of America
- Chia-Hsiang Chen
  - Department of Psychiatry, Chang Gung Memorial Hospital-Linkou, Gueishan, Taoyuan, Taiwan
  - Department and Graduate Institute of Biomedical Sciences, Chang Gung University, Taoyuan, Taiwan
2
Han L, Abney M. Identity by descent estimation with dense genome-wide genotype data. Genet Epidemiol 2011; 35:557-67. [PMID: 21769932 DOI: 10.1002/gepi.20606] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Received: 03/28/2011] [Revised: 05/06/2011] [Accepted: 05/31/2011] [Indexed: 11/11/2022]
Abstract
We present a novel method, IBDLD, for estimating the probability of identity by descent (IBD) for a pair of related individuals at a locus, given dense genotype data and a pedigree of arbitrary size and complexity. IBDLD overcomes the challenges of exact multipoint estimation of IBD in pedigrees of potentially large size and eliminates the difficulty of accommodating the background linkage disequilibrium (LD) that is present in high-density genotype data. We show that IBDLD is much more accurate at estimating the true IBD sharing than methods that remove LD by pruning SNPs and is highly robust to pedigree errors or other forms of misspecified relationships. The method is fast and can be used to estimate the probability for each possible IBD sharing state at every SNP from a high-density genotyping array for hundreds of thousands of pairs of individuals. We use it to estimate point-wise and genomewide IBD sharing between 185,745 pairs of subjects all of whom are related through a single, large and complex 13-generation pedigree and genotyped with the Affymetrix 500 k chip. We find that we are able to identify the true pedigree relationship for individuals who were misidentified in the collected data and estimate empirical kinship coefficients that can be used in follow-up QTL mapping studies. IBDLD is implemented as an open source software package and is freely available.
Affiliation(s)
- Lide Han
  - Department of Human Genetics, University of Chicago, Illinois, USA
3
Selmer KK, Grøndahl J, Riise R, Brandal K, Braaten Ø, Bragadottir R, Undlien DE. Autosomal dominant pericentral retinal dystrophy caused by a novel missense mutation in the TOPORS gene. Acta Ophthalmol 2010; 88:323-8. [PMID: 19183411 DOI: 10.1111/j.1755-3768.2008.01465.x] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Abstract
PURPOSE This study aimed to identify the genetic cause of autosomal dominant pericentral retinal dystrophy (adPRD) in a large Norwegian family with 35 affected members. METHODS The family was characterized by clinical ophthalmological examination along with fundus photography, dark adaptometry and electroretinography. We performed a genome-wide linkage analysis followed by sequencing of a candidate gene to identify the mutation causing the disease. RESULTS The ophthalmological examinations revealed an atypical form of retinitis pigmentosa (RP), which we prefer to call adPRD. Compared with classical RP, this phenotype has a favourable prognosis. Linkage analysis showed a linkage peak covering the most recently reported adRP gene TOPORS. This gene was sequenced in 19 family members and a novel missense mutation, c.1205a>c, resulting in an amino acid substitution p.Q402P, was detected in all affected members. The mutation showed complete co-segregation with the disease in this family, with a LOD score of 7.3. It is located in a highly conserved region and alignment with the appropriate DNA sequence from other species shows complete conservation of this amino acid. The mutation was not detected in 207 healthy, unrelated controls of Norwegian origin. CONCLUSIONS We present a novel mutation in the TOPORS gene co-segregating with a distinct phenotype of adPRD in a large Norwegian family.
4
George AW. Estimation of copy number in polyploid plants: the good, the bad, and the ugly. Theor Appl Genet 2009; 119:483-496. [PMID: 19449176 DOI: 10.1007/s00122-009-1054-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 12/21/2008] [Accepted: 04/24/2009] [Indexed: 05/27/2023]
Abstract
Genetic studies in polyploid plants rely heavily on the collection of data from dominant marker loci. A dominant marker locus is a locus for which only the presence or absence of an observable (dominant) allele is recorded. Before these marker loci can be used for genetic exploration, the number of copies of a dominant allele carried by a parent (copy number) must be determined for each marker locus. Copy number in polyploids is estimated using a hypothesis testing procedure. The performance of this estimation procedure has never been evaluated. In this paper, I quantify whether the highly sought after single-copy markers can be accurately identified, if the performance of the estimation procedure improves with increasing sample size, and whether the estimation procedure is capable of accurately estimating the copy number of high copy markers. I found that the probability of incorrectly estimating copy number is quite low and that more data can actually reduce the accuracy of the estimation procedure when the testing assumptions are violated. Fortunately, when a significant result is obtained, it is almost always correct. The challenge often is in obtaining a significant result.
Affiliation(s)
- Andrew W George
  - Mathematical and Information Sciences, CSIRO, Brisbane, QLD 4067, Australia
5
Selmer KK, Brandal K, Olstad OK, Birkenes B, Undlien DE, Egeland T. Genome-wide Linkage Analysis with Clustered SNP Markers. J Biomol Screen 2008; 14:92-6. [DOI: 10.1177/1087057108327327] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
Single nucleotide polymorphisms (SNPs) have recently replaced microsatellites as the genetic markers of choice in linkage analysis, primarily because they are more abundant and the genotypes more amenable for automatic calling. One of the most recently launched linkage mapping sets (LMS) is the Applied Biosystems Human LMS 4K, which is a genome-wide linkage set based on the SNPlex™ technology and the use of clustered SNPs. In this article the authors report on their experience with this set and the associated genotyping software GeneMapper® version 4.0, which they have used for linkage analyses in 17 moderate to large families with assumed monogenic disease. For comparison of methods, they also performed a genome-wide linkage analysis in 1 of the 17 families using the Affymetrix GeneChip® Human Mapping 10K 2.0 array. The conclusion is that both methods performed technically well, with high call rates and comparable and low rates of Mendelian inconsistencies. However, genotyping is less automated in GeneMapper® version 4.0 than in the Affymetrix software and thus more time consuming. (Journal of Biomolecular Screening 2009:92-96)
Affiliation(s)
- Kaja K. Selmer
  - Institute of Medical Genetics, University of Oslo, Oslo, Norway
  - Department of Medical Genetics, Ullevål University Hospital, Oslo, Norway
- Kristin Brandal
  - Institute of Medical Genetics, University of Oslo, Oslo, Norway
- Ole K. Olstad
  - Department of Clinical Chemistry, Ullevål University Hospital, Oslo, Norway
- Bård Birkenes
  - Institute of Medical Genetics, University of Oslo, Oslo, Norway
- Dag E. Undlien
  - Institute of Medical Genetics, University of Oslo, Oslo, Norway
  - Department of Medical Genetics, Ullevål University Hospital, Oslo, Norway
- Thore Egeland
  - Department of Medical Genetics, Ullevål University Hospital, Oslo, Norway
  - Oslo University College, Oslo, Norway
6
Sung YJ, Rao D. Model-based linkage analysis with imprinting for quantitative traits: ignoring imprinting effects can severely jeopardize detection of linkage. Genet Epidemiol 2008; 32:487-96. [DOI: 10.1002/gepi.20321] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]
7
Albers CA, Stankovich J, Thomson R, Bahlo M, Kappen HJ. Multipoint approximations of identity-by-descent probabilities for accurate linkage analysis of distantly related individuals. Am J Hum Genet 2008; 82:607-22. [PMID: 18319071 PMCID: PMC2427226 DOI: 10.1016/j.ajhg.2007.12.016] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 08/15/2007] [Revised: 10/22/2007] [Accepted: 12/11/2007] [Indexed: 12/22/2022]
Abstract
We propose an analytical approximation method for the estimation of multipoint identity by descent (IBD) probabilities in pedigrees containing a moderate number of distantly related individuals. We show that in large pedigrees where cases are related through untyped ancestors only, it is possible to formulate the hidden Markov model of the Lander-Green algorithm in terms of the IBD configurations of the cases. We use a first-order Markov approximation to model the changes in this IBD-configuration variable along the chromosome. In simulated and real data sets, we demonstrate that estimates of parametric and nonparametric linkage statistics based on the first-order Markov approximation are accurate. The computation time is exponential in the number of cases instead of in the number of meioses separating the cases. We have implemented our approach in the computer program ALADIN (accurate linkage analysis of distantly related individuals). ALADIN can be applied to general pedigrees and marker types and has the ability to model marker-marker linkage disequilibrium with a clustered-markers approach. Using ALADIN is straightforward: It requires no parameters to be specified and accepts standard input files.
Affiliation(s)
- Cornelis A Albers
  - Department of Biophysics, Institute for Computing and Information Sciences, Radboud University, 6525 EZ Nijmegen, The Netherlands
8
Libiger O, Schork NJ. A simulation-based analysis of chromosome segment sharing among a group of arbitrarily related individuals. Eur J Hum Genet 2007; 15:1260-8. [PMID: 17700628 DOI: 10.1038/sj.ejhg.5201910] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Abstract
A fundamental set of issues in human genetics research concerns the statistical properties of the DNA sequence or chromosomal segments that are shared between related individuals. Although well-established mathematical formulations exist that consider such sharing via measures such as the kinship coefficient, many of these formulations are derived for entire genomes, individual sequence variations, or small stretches of DNA, and hence, do not consider either the actual size or the number of the genome-wide chromosomal segments that are shared between two or more arbitrarily related individuals. In this paper, we employ a flexible gene-dropping simulation-based approach for estimating the distribution of the size and the number of chromosomal segments shared by any number of arbitrarily related individuals. The approach takes advantage of chromosome- and sex-specific recombination rates adopted from integrated genetic and physical maps, and considers the genome as a whole, rather than specific genomic regions or loci. In addition, our analysis considers the effects of linkage disequilibrium and crossover interference on segment sharing. Our proposed analysis and computational strategy can be used to provide compelling answers to questions concerning variation in the kinship coefficient as well as the distribution of chromosomal sharing over individual chromosomes. We present results that showcase possible application of assessing genomic sharing in gene mapping and apply our analysis to data available from published gene mapping studies.
Affiliation(s)
- Ondrej Libiger
  - Scripps Genomic Medicine, Scripps Health, La Jolla, CA, USA
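The gene-dropping idea named in the abstract of this entry can be illustrated with a minimal sketch. It is not the authors' implementation: it tracks a single locus in one nuclear family rather than genome-wide segments, and it ignores recombination, linkage disequilibrium, and crossover interference. The function names `drop_sib_pair` and `estimate_ibd_distribution` are hypothetical.

```python
import random


def drop_sib_pair(rng):
    """Drop founder alleles to two full sibs at one locus.

    Each founder allele gets a unique label, so alleles with equal labels
    are identical by descent (IBD). Returns the number of alleles the two
    sibs share IBD: 0, 1, or 2.
    """
    father = (0, 1)  # two distinct paternal founder alleles
    mother = (2, 3)  # two distinct maternal founder alleles
    # Each sib receives one randomly chosen allele from each parent.
    sib1 = (rng.choice(father), rng.choice(mother))
    sib2 = (rng.choice(father), rng.choice(mother))
    return (sib1[0] == sib2[0]) + (sib1[1] == sib2[1])


def estimate_ibd_distribution(n_reps=100_000, seed=1):
    """Estimate P(IBD = 0), P(IBD = 1), P(IBD = 2) by repeated gene dropping."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(n_reps):
        counts[drop_sib_pair(rng)] += 1
    return [c / n_reps for c in counts]


ibd = estimate_ibd_distribution()
# Kinship coefficient from the IBD distribution: phi = P(IBD=1)/4 + P(IBD=2)/2.
kinship = ibd[1] / 4 + ibd[2] / 2
print("IBD distribution:", ibd)
print("estimated kinship:", kinship)
```

For full sibs the simulated proportions should settle near the textbook values of 1/4, 1/2, and 1/4, with a kinship coefficient near 1/4. Extending the sketch to arbitrary pedigrees or to segment lengths would require a pedigree traversal and a recombination model, neither of which is shown here.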
9
Lele SR, Dennis B, Lutscher F. Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. Ecol Lett 2007; 10:551-63. [PMID: 17542934 DOI: 10.1111/j.1461-0248.2007.01047.x] [Citation(s) in RCA: 193] [Impact Index Per Article: 11.4] [Indexed: 11/29/2022]
Abstract
We introduce a new statistical computing method, called data cloning, to calculate maximum likelihood estimates and their standard errors for complex ecological models. Although the method uses the Bayesian framework and exploits the computational simplicity of the Markov chain Monte Carlo (MCMC) algorithms, it provides valid frequentist inferences such as the maximum likelihood estimates and their standard errors. The inferences are completely invariant to the choice of the prior distributions and therefore avoid the inherent subjectivity of the Bayesian approach. The data cloning method is easily implemented using standard MCMC software. Data cloning is particularly useful for analysing ecological situations in which hierarchical statistical models, such as state-space models and mixed effects models, are appropriate. We illustrate the method by fitting two nonlinear population dynamics models to data in the presence of process and observation noise.
Affiliation(s)
- Subhash R Lele
  - Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB T6G2G1, Canada
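The prior-invariance claim in the abstract of this entry can be illustrated with a small sketch, under strong simplifying assumptions. It uses a normal mean with known variance, where the posterior is available in closed form, so no MCMC is run; the paper itself applies the idea to complex models through MCMC. The function name `cloned_posterior` and the toy data are hypothetical.

```python
import statistics


def cloned_posterior(data, clones, prior_mean, prior_var, noise_var=1.0):
    """Posterior mean and variance of a normal mean after cloning the data.

    The data set is treated as if it had been observed `clones` times
    independently. A conjugate normal update stands in for the MCMC step
    (an assumption made to keep the sketch short and exact).
    """
    n = len(data) * clones
    total = sum(data) * clones
    precision = 1.0 / prior_var + n / noise_var
    mean = (prior_mean / prior_var + total / noise_var) / precision
    return mean, 1.0 / precision


data = [2.1, 1.7, 2.9, 2.4, 1.9]   # toy observations (illustrative only)
mle = statistics.fmean(data)       # maximum likelihood estimate of the mean
mle_var = 1.0 / len(data)          # variance of the MLE: noise_var / n

for k in (1, 10, 100, 1000):
    # A deliberately poor prior: its influence should fade as k grows.
    mean, var = cloned_posterior(data, k, prior_mean=-5.0, prior_var=4.0)
    # k * var approximates the variance of the MLE for large k.
    print(k, round(mean, 4), round(k * var, 4))
```

As the number of clones grows, the posterior mean approaches the sample mean and the posterior variance multiplied by the number of clones approaches the variance of the maximum likelihood estimate, whatever prior is supplied.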
10
Thomas A. Towards linkage analysis with markers in linkage disequilibrium by graphical modelling. Hum Hered 2007; 64:16-26. [PMID: 17483593 DOI: 10.1159/000101419] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
Abstract
We review recent developments of MCMC integration methods for computations on graphical models for two applications in statistical genetics: modelling allelic association and pedigree based linkage analysis. We discuss and illustrate estimation of graphical models from haploid and diploid genotypes, and the importance of MCMC updating schemes beyond what is strictly necessary for irreducibility. We then outline an approach combining these methods to compute linkage statistics when alleles at the marker loci are in linkage disequilibrium. Other extensions suitable for analysis of SNP genotype data in pedigrees are also discussed and programs that implement these methods, and which are available from the author's web site, are described. We conclude with a discussion of how this still experimental approach might be further developed.
Affiliation(s)
- Alun Thomas
  - Department of Biomedical Informatics, Genetic Epidemiology, University of Utah, Salt Lake City, Utah 84108, USA
11
Thomson R, Quinn S, McKay J, Silver J, Bahlo M, FitzGerald L, Foote S, Dickinson J, Stankovich J. The advantages of dense marker sets for linkage analysis with very large families. Hum Genet 2007; 121:459-68. [PMID: 17252250 DOI: 10.1007/s00439-007-0323-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 11/08/2006] [Accepted: 01/02/2007] [Indexed: 10/23/2022]
Abstract
Dense sets of hundreds of thousands of markers have been developed for genome-wide association studies. These marker sets are also beneficial for linkage analysis of large, deep pedigrees containing distantly related cases. It is impossible to analyse jointly all genotypes in large pedigrees using the Lander-Green Algorithm, however, as marker density increases it becomes less crucial to analyse all individuals' genotypes simultaneously. In this report, an approximate multipoint non-parametric technique is described, where large pedigrees are split into many small pedigrees, each containing just two cases. This technique is demonstrated, using phased data from the International Hapmap Project to simulate sets of 10,000, 50,000 and 250,000 markers, showing that it becomes increasingly accurate as more markers are genotyped. This method allows routine linkage analysis of large families with dense marker sets and represents a more easily applied alternative to Monte Carlo Markov Chain methods.
Affiliation(s)
- Russell Thomson
  - Menzies Research Institute, University of Tasmania, Private Bag 23, Hobart, TAS, 7001, Australia
12
Wijsman EM, Rothstein JH, Thompson EA. Multipoint linkage analysis with many multiallelic or dense diallelic markers: Markov chain-Monte Carlo provides practical approaches for genome scans on general pedigrees. Am J Hum Genet 2006; 79:846-58. [PMID: 17033961 PMCID: PMC1698573 DOI: 10.1086/508472] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.2] [Received: 06/02/2006] [Accepted: 08/11/2006] [Indexed: 11/03/2022]
Abstract
Computations for genome scans need to adapt to the increasing use of dense diallelic markers as well as of full-chromosome multipoint linkage analysis with either diallelic or multiallelic markers. Whereas suitable exact-computation tools are available for use with small pedigrees, equivalent exact computation for larger pedigrees remains infeasible. Markov chain-Monte Carlo (MCMC)-based methods currently provide the only computationally practical option. To date, no systematic comparison of the performance of MCMC-based programs is available, nor have these programs been systematically evaluated for use with dense diallelic markers. Using simulated data, we evaluate the performance of two MCMC-based linkage-analysis programs--lm_markers from the MORGAN package and SimWalk2--under a variety of analysis conditions. Pedigrees consisted of 14, 52, or 98 individuals in 3, 5, or 6 generations, respectively, with increasing amounts of missing data in larger pedigrees. One hundred replicates of markers and trait data were simulated on a 100-cM chromosome, with up to 10 multiallelic and up to 200 diallelic markers used simultaneously for computation of multipoint LOD scores. Exact computation was available for comparison in most situations, and comparison with a perfectly informative marker or interprogram comparison was available in the remaining situations. Our results confirm the accuracy of both programs in multipoint analysis with multiallelic markers on pedigrees of varied sizes and missing-data patterns, but there are some computational differences. In contrast, for large numbers of dense diallelic markers, only the lm_markers program was able to provide accurate results within a computationally practical time. Thus, programs in the MORGAN package are the first available to provide a computationally practical option for accurate linkage analyses in genome scans with both large numbers of diallelic markers and large pedigrees.
Affiliation(s)
- Ellen M Wijsman
  - Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA
13
Albers CA, Leisink MAR, Kappen HJ. The cluster variation method for efficient linkage analysis on extended pedigrees. BMC Bioinformatics 2006; 7 Suppl 1:S1. [PMID: 16723002 PMCID: PMC1810310 DOI: 10.1186/1471-2105-7-s1-s1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]
Abstract
Background Computing exact multipoint LOD scores for extended pedigrees rapidly becomes infeasible as the number of markers and untyped individuals increase. When markers are excluded from the computation, significant power may be lost. Therefore accurate approximate methods which take into account all markers are desirable. Methods We present a novel method for efficient estimation of LOD scores on extended pedigrees. Our approach is based on the Cluster Variation Method, which deterministically estimates likelihoods by performing exact computations on tractable subsets of variables (clusters) of a Bayesian network. First a distribution over inheritances on the marker loci is approximated with the Cluster Variation Method. Then this distribution is used to estimate the LOD score for each location of the trait locus. Results First we demonstrate that significant power may be lost if markers are ignored in the multi-point analysis. On a set of pedigrees where exact computation is possible we compare the estimates of the LOD scores obtained with our method to the exact LOD scores. Secondly, we compare our method to a state of the art MCMC sampler. When both methods are given equal computation time, our method is more efficient. Finally, we show that CVM scales to large problem instances. Conclusion We conclude that the Cluster Variation Method is as accurate as MCMC and generally is more efficient. Our method is a promising alternative to approaches based on MCMC sampling.
Affiliation(s)
- Cornelis A Albers
  - Department of Medical Physics and Biophysics, Radboud University, Nijmegen, The Netherlands
- Martijn AR Leisink
  - Department of Medical Physics and Biophysics, Radboud University, Nijmegen, The Netherlands
- Hilbert J Kappen
  - Department of Medical Physics and Biophysics, Radboud University, Nijmegen, The Netherlands
14
Igo RP, Chapman NH, Berninger VW, Matsushita M, Brkanac Z, Rothstein JH, Holzman T, Nielsen K, Raskind WH, Wijsman EM. Genomewide scan for real-word reading subphenotypes of dyslexia: novel chromosome 13 locus and genetic complexity. Am J Med Genet B Neuropsychiatr Genet 2006; 141B:15-27. [PMID: 16331673 PMCID: PMC2556979 DOI: 10.1002/ajmg.b.30245] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Indexed: 12/12/2022]
Abstract
Dyslexia is a common learning disability exhibited as a delay in acquiring reading skills despite adequate intelligence and instruction. Reading single real words (real-word reading, RWR) is especially impaired in many dyslexics. We performed a genome scan, using variance components (VC) linkage analysis and Bayesian Markov chain Monte Carlo (MCMC) joint segregation and linkage analysis, for three quantitative measures of RWR in 108 multigenerational families, with follow up of the strongest signals with parametric LOD score analyses. We used single-word reading efficiency (SWE) to assess speed and accuracy of RWR, and word identification (WID) to assess accuracy alone. Adjusting SWE for WID provided a third measure of RWR efficiency. All three methods of analysis identified a strong linkage signal for SWE on chromosome 13q. Based on multipoint analysis with 13 markers we obtained a MCMC intensity ratio (IR) of 53.2 (chromosome-wide P < 0.004), a VC LOD score of 2.29, and a parametric LOD score of 2.94, based on a quantitative-trait model from MCMC segregation analysis (SA). A weaker signal for SWE on chromosome 2q occurred in the same location as a significant linkage peak seen previously in a scan for phonological decoding. MCMC oligogenic SA identified three models of transmission for WID, which could be assigned to two distinct linkage peaks on chromosomes 12 and 15. Taken together, these results indicate a locus for efficiency and accuracy of RWR on chromosome 13, and a complex model for inheritance of RWR accuracy with loci on chromosomes 12 and 15.
Affiliation(s)
- Robert P. Igo
  - Department of Medicine, University of Washington, Seattle, WA
  - Department of Biostatistics, University of Washington, Seattle, WA
- Mark Matsushita
  - Department of Medicine, University of Washington, Seattle, WA
- Zoran Brkanac
  - Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA
- Kathleen Nielsen
  - Department of Educational Psychology, University of Washington, Seattle, WA
- Wendy H. Raskind
  - Department of Medicine, University of Washington, Seattle, WA
  - Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA
- Ellen M. Wijsman
  - Department of Medicine, University of Washington, Seattle, WA
  - Department of Biostatistics, University of Washington, Seattle, WA
15
Abstract
The past 25 years has seen an explosion in the number of genetic markers that can be measured on DNA samples at an ever decreasing cost. Although basic statistical methods for analysing such data gathered on samples of either independent individuals or family members, one or two markers at a time, were already well developed before this explosion occurred, there has been a corresponding burst in activity to develop multiple marker models to find disease-causing gene variants, capitalizing on the data that have become available, to increase the power of such methods. This has required the concomitant development of faster algorithms to speed up the computation of various likelihoods. For linkage analysis, to obtain the approximate locations for genes of interest, Mendelian segregation models have been extended to be more realistic and statistical models that do not assume specific modes of inheritance have been extended to allow for the analysis of larger pedigree structures. For association analysis, to obtain more precise locations for genes of interest, the recent completion of the first stage of the HapMap project has spurred the development, still underway, of novel experimental designs and analytical methods to combat the curse of dimensionality and the resulting multiple testing problem. Perhaps the greatest current challenge concerns how best to gather and synthesize the many lines of evidence possible in order to discover the genetic determinants underlying complex diseases.
Affiliation(s)
- Robert C Elston
  - Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
16
Sieh W, Basu S, Fu AQ, Rothstein JH, Scheet PA, Stewart WCL, Sung YJ, Thompson EA, Wijsman EM. Comparison of marker types and map assumptions using Markov chain Monte Carlo-based linkage analysis of COGA data. BMC Genet 2005; 6 Suppl 1:S11. [PMID: 16451566 PMCID: PMC1866829 DOI: 10.1186/1471-2156-6-s1-s11] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
Abstract
We performed multipoint linkage analysis of the electrophysiological trait ECB21 on chromosome 4 in the full pedigrees provided by the Collaborative Study on the Genetics of Alcoholism (COGA). Three Markov chain Monte Carlo (MCMC)-based approaches were applied to the provided and re-estimated genetic maps and to five different marker panels consisting of microsatellite (STRP) and/or SNP markers at various densities. We found evidence of linkage near the GABRB1 STRP using all methods, maps, and marker panels. Difficulties encountered with SNP panels included convergence problems and demanding computations.
Affiliation(s)
- Weiva Sieh
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, 98195, USA
- Saonli Basu
  - Department of Statistics, University of Washington, Seattle, Washington, 98195, USA
- Audrey Q Fu
  - Department of Statistics, University of Washington, Seattle, Washington, 98195, USA
- Joseph H Rothstein
  - Department of Biostatistics, University of Washington, Seattle, Washington, 98195, USA
- Paul A Scheet
  - Department of Statistics, University of Washington, Seattle, Washington, 98195, USA
- William CL Stewart
  - Department of Statistics, University of Washington, Seattle, Washington, 98195, USA
- Yun J Sung
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, 98195, USA
- Elizabeth A Thompson
  - Department of Statistics, University of Washington, Seattle, Washington, 98195, USA
  - Department of Biostatistics, University of Washington, Seattle, Washington, 98195, USA
- Ellen M Wijsman
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, 98195, USA
  - Department of Biostatistics, University of Washington, Seattle, Washington, 98195, USA
17
|
Gagnon F, Jarvik GP, Badzioch MD, Motulsky AG, Brunzell JD, Wijsman EM. Genome scan for quantitative trait loci influencing HDL levels: evidence for multilocus inheritance in familial combined hyperlipidemia. Hum Genet 2005; 117:494-505. [PMID: 15959807 DOI: 10.1007/s00439-005-1338-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 08/09/2004] [Accepted: 04/27/2005] [Indexed: 11/25/2022]
Abstract
Several genome scans in search of high-density lipoprotein (HDL) quantitative trait loci (QTLs) have been performed. However, to date the actual identification of genes implicated in the regulation of common forms of HDL abnormalities remains unsuccessful. This may be due, in part, to the oligogenic and multivariate nature of HDL regulation, and potentially, pleiotropy affecting HDL and other lipid-related traits. Using a Bayesian Markov Chain Monte Carlo (MCMC) approach, we recently provided evidence of linkage of HDL level variation to the APOA1-C3-A4-A5 gene complex, in familial combined hyperlipidemia pedigrees, with an estimated number of two to three large QTLs remaining to be identified. We also presented results consistent with pleiotropy affecting HDL and triglycerides at the APOA1-C3-A4-A5 gene complex. Here we use the same MCMC analytic strategy, which allows for oligogenic trait models, as well as simultaneous incorporation of covariates, in the context of multipoint analysis. We now present results from a genome scan in search of the additional HDL QTLs in these pedigrees. We provide evidence of linkage for additional HDL QTLs on chromosomes 3p14 and 13q32, with results on chromosome 3 further supported by maximum parametric and variance component LOD scores of 3.0 and 2.6, respectively. Weaker evidence of linkage was also obtained for 7q32, 12q12, 14q31-32 and 16q23-24.
Affiliation(s)
- France Gagnon
- Department of Epidemiology and Community Medicine, University of Ottawa, Ottawa, Ontario, Canada
|
18
|
George AW, Wijsman EM, Thompson EA. MCMC Multilocus Lod Scores: Application of a New Approach. Hum Hered 2005; 59:98-108. [PMID: 15838179 DOI: 10.1159/000085224] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 03/02/2004] [Accepted: 12/23/2004] [Indexed: 11/19/2022] Open
Abstract
On extended pedigrees with extensive missing data, the calculation of multilocus likelihoods for linkage analysis is often beyond the computational bounds of exact methods. Growing interest therefore surrounds the implementation of Monte Carlo estimation methods. In this paper, we demonstrate the speed and accuracy of a new Markov chain Monte Carlo method for the estimation of linkage likelihoods through an analysis of real data from a study of early-onset Alzheimer's disease. For those data sets where comparison with exact analysis is possible, we achieved up to a 100-fold increase in speed. Our approach is implemented in the program lm_bayes within the framework of the freely available MORGAN 2.6 package for Monte Carlo genetic analysis (http://www.stat.washington.edu/thompson/Genepi/MORGAN/Morgan.shtml).
Affiliation(s)
- Andrew W George
- Department of Statistics, University of Washington, Seattle, WA 98195, USA
|