26
|
Lin DY, Chiang TY, Huang CC, Lin HD, Tzeng SJ, Kang SR, Sung HM, Wu MC. Polymorphic microsatellite loci isolated from Cervus unicolor (Cervidae) show inbreeding in a domesticated population of Taiwan Sambar deer. Genetics and Molecular Research 2014; 13:3967-71. [PMID: 24938607 DOI: 10.4238/2014.may.23.7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/03/2022]
Abstract
Primers for eight microsatellites were developed; they successfully amplified DNA from 20 domesticated Formosan Sambar deer (Cervus unicolor swinhoei). All loci were polymorphic, with 10-19 alleles per locus. The average observed heterozygosity across loci and samples was 0.310, ranging from 0 to 0.750 at each locus. All loci but one, CU18, deviated from Hardy-Weinberg equilibrium due to excessive homozygosity in these domesticated broodstocks, reflecting inbreeding. These microsatellite loci will be useful, not only for assessment of population structure and genetic variability, but also for conservation of wild deer populations in Taiwan.
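The homozygote excess described in this abstract is usually judged by comparing observed heterozygosity with the value expected under Hardy-Weinberg proportions. The sketch below illustrates that comparison for a single locus. The function name and the genotype data are hypothetical and are not taken from the paper, and the expected value uses the simple 1 minus sum of squared allele frequencies, without a small-sample correction.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    """
    n = len(genotypes)
    # Observed heterozygosity: fraction of individuals carrying two different alleles.
    h_obs = sum(a != b for a, b in genotypes) / n
    # Allele frequencies estimated from the 2n sampled gene copies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    # Expected heterozygosity under Hardy-Weinberg: 1 - sum of squared frequencies.
    h_exp = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return h_obs, h_exp


# Hypothetical genotypes (allele sizes in bp), for illustration only.
sample = [(150, 150), (150, 154), (154, 154), (158, 158)]
h_obs, h_exp = heterozygosity(sample)
print(h_obs, h_exp)
```

An observed value well below the expected one, as in this toy sample, is the pattern the abstract attributes to inbreeding; a formal test of Hardy-Weinberg equilibrium would still be needed to call the deviation significant.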
|
27
|
Zeng D, Lin DY. Efficient Estimation of Semiparametric Transformation Models for Two-Phase Cohort Studies. J Am Stat Assoc 2014; 109:371-383. [PMID: 24659837 DOI: 10.1080/01621459.2013.842172] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Indexed: 10/26/2022]
Abstract
Under two-phase cohort designs, such as case-cohort and nested case-control sampling, information on observed event times, event indicators, and inexpensive covariates is collected in the first phase, and the first-phase information is used to select subjects for measurements of expensive covariates in the second phase; inexpensive covariates are also used in the data analysis to control for confounding and to evaluate interactions. This paper provides efficient estimation of semiparametric transformation models for such designs, accommodating both discrete and continuous covariates and allowing inexpensive and expensive covariates to be correlated. The estimation is based on the maximization of a modified nonparametric likelihood function through a generalization of the expectation-maximization algorithm. The resulting estimators are shown to be consistent, asymptotically normal and asymptotically efficient with easily estimated variances. Simulation studies demonstrate that the asymptotic approximations are accurate in practical situations. Empirical data from Wilms' tumor studies and the Atherosclerosis Risk in Communities (ARIC) study are presented.
|
28
|
Lin DY. Survival analysis with incomplete genetic data. Lifetime Data Analysis 2014; 20:16-22. [PMID: 23722305 PMCID: PMC3806886 DOI: 10.1007/s10985-013-9262-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2013] [Accepted: 05/11/2013] [Indexed: 06/02/2023]
Abstract
Genetic data are now collected frequently in clinical studies and epidemiological cohort studies. For a large study, it may be prohibitively expensive to genotype all study subjects, especially with the next-generation sequencing technology. Two-phase sampling, such as case-cohort and nested case-control sampling, is cost-effective in such settings but entails considerable analysis challenges, especially if efficient estimators are desired. Another type of missing data arises when the investigators are interested in the haplotypes or the genetic markers that are not on the genotyping platform used for the current study. Valid and efficient analysis of such missing data is also interesting and challenging. This article provides an overview of these issues and outlines some directions for future research.
|
29
|
Abstract
Ross Prentice's work has had the most profound impact on the theory and practice of statistics. His research interests range from survival analysis, longitudinal data analysis, epidemiologic designs and analysis, to genomic studies. His contributions are so broad and so deep that it would be impossible to provide a comprehensive review in any limited amount of space. In this commentary, I will attempt to give a brief tour of some of his statistical work, focusing on ten of my favorite papers of his. I will describe the main ideas in those papers and their influence on the directions of statistical research and on the designs and analysis of medical studies. I will mention a few stories along the way.
|
30
|
Abstract
We propose a graphical measure, the generalized negative predictive function, to quantify the predictive accuracy of covariates for survival time or recurrent event times. This new measure characterizes the event-free probabilities over time conditional on a thresholded linear combination of covariates and has direct clinical utility. We show that this function is maximized at the set of covariates truly related to event times and thus can be used to compare the predictive accuracy of different sets of covariates. We construct nonparametric estimators for this function under right censoring and prove that the proposed estimators, upon proper normalization, converge weakly to zero-mean Gaussian processes. To bypass the estimation of complex density functions involved in the asymptotic variances, we adopt the bootstrap approach and establish its validity. Simulation studies demonstrate that the proposed methods perform well in practical situations. Two clinical studies are presented.
|
31
|
Pham MH, Berthouly-Salazar C, Tran XH, Chang WH, Crooijmans RPMA, Lin DY, Hoang VT, Lee YP, Tixier-Boichard M, Chen CF. Genetic diversity of Vietnamese domestic chicken populations as decision-making support for conservation strategies. Anim Genet 2013; 44:509-21. [PMID: 23714019 DOI: 10.1111/age.12045] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Accepted: 02/26/2013] [Indexed: 11/27/2022]
Abstract
The aims of this study were to assess the genetic diversity of 17 populations of Vietnamese local chickens (VNN) and one Red Jungle Fowl population, together with six chicken populations of Chinese origin (CNO), and to provide priorities supporting the conservation of genetic resources using 20 microsatellites. Consequently, the VNN populations exhibited a higher diversity than did CNO populations in terms of number of alleles but showed a slightly lower observed heterozygosity. The VNN populations showed in total seven private alleles, whereas no CNO private alleles were found. The expected heterozygosity of 0.576 in the VNN populations was higher than the observed heterozygosity of 0.490, leading to heterozygote deficiency within populations. This issue could be partly explained by the Wahlund effect due to fragmentation of several populations between chicken flocks. Molecular analysis of variance showed that most of the genetic variation was found within VNN populations. The Bayesian clustering analysis showed that VNN and CNO chickens were separated into two distinct groups with little evidence for gene flow between them. Among the 24 populations, 13 were successfully assigned to their own cluster, whereas the structuring was not clear for the remaining 11 chicken populations. The contributions of the 24 populations to the total genetic diversity were mostly consistent across two approaches, taking into account the within- and between-population genetic diversity and allelic richness. The black H'mong, Lien Minh, Luong Phuong and Red Jungle Fowl were ranked with the highest priorities for conservation according to Caballero and Toro's and Petit's approaches. In conclusion, a national strategy needs to be set up for Vietnamese chicken populations, with three main components: conservation of high-priority breeds, within-breed management with animal exchanges between flocks to avoid the Wahlund effect, and monitoring of the inbreeding rate.
|
32
|
Wu JS, Huang YK, Wu FL, Lin DY. Design and implementation of a versatile and variable-frequency piezoelectric coefficient measurement system. The Review of Scientific Instruments 2012; 83:085110. [PMID: 22938335 DOI: 10.1063/1.4746769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
We present a simple but versatile piezoelectric coefficient measurement system, which can measure the longitudinal and transverse piezoelectric coefficients in the pressing and bending modes, respectively, at different applied forces and a wide range of frequencies. The functionality of this measurement system has been demonstrated on three samples, including a PbZr0.52Ti0.48O3 (PZT) piezoelectric ceramic bulk, a ZnO thin film, and a laminated piezoelectric film sensor. The static longitudinal piezoelectric coefficients of the PZT bulk and the ZnO film are estimated to be around 210 and 8.1 pC/N, respectively. The static transverse piezoelectric coefficients of the ZnO film and the piezoelectric film sensor are determined to be, respectively, -0.284 and -0.031 C/m².
|
33
|
Johnson BA, Lin DY, Zeng D. Penalized Estimating Functions and Variable Selection in Semiparametric Regression Models. J Am Stat Assoc 2012; 103:672-680. [PMID: 20376193 DOI: 10.1198/016214508000000184] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.2] [Indexed: 11/21/2022]
Abstract
We propose a general strategy for variable selection in semiparametric regression models by penalizing appropriate estimating functions. Important applications include semiparametric linear regression with censored responses and semiparametric regression with missing predictors. Unlike the existing penalized maximum likelihood estimators, the proposed penalized estimating functions may not pertain to the derivatives of any objective functions and may be discrete in the regression coefficients. We establish a general asymptotic theory for penalized estimating functions and present suitable numerical algorithms to implement the proposed estimators. In addition, we develop a resampling technique to estimate the variances of the estimated regression coefficients when the asymptotic variances cannot be evaluated directly. Simulation studies demonstrate that the proposed methods perform well in variable selection and variance estimation. We illustrate our methods using data from the Paul Coverdell Stroke Registry.
|
34
|
Lin DY, Zeng D. Correcting for Population Stratification in Genomewide Association Studies. J Am Stat Assoc 2011; 106:997-1008. [PMID: 22467997 DOI: 10.1198/jasa.2011.tm10294] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/21/2022]
Abstract
Genomewide association studies have become the primary tool for discovering the genetic basis of complex human diseases. Such studies are susceptible to the confounding effects of population stratification, in that the combination of allele-frequency heterogeneity with disease-risk heterogeneity among different ancestral subpopulations can induce spurious associations between genetic variants and disease. This article provides a statistically rigorous and computationally feasible solution to this challenging problem of unmeasured confounders. We show that the odds ratio of disease with a genetic variant is identifiable if and only if the genotype is independent of the unknown population substructure conditional on a set of observed ancestry-informative markers in the disease-free population. Under this condition, the odds ratio of interest can be estimated by fitting a semiparametric logistic regression model with an arbitrary function of a propensity score relating the genotype probability to ancestry-informative markers. Approximating the unknown function of the propensity score by B-splines, we derive a consistent and asymptotically normal estimator for the odds ratio of interest with a consistent variance estimator. Simulation studies demonstrate that the proposed inference procedures perform well in realistic settings. An application to the well-known Wellcome Trust Case-Control Study is presented. Supplemental materials are available online.
|
35
|
Chen L, Lin DY, Zeng D. Checking semiparametric transformation models with censored data. Biostatistics 2011; 13:18-31. [PMID: 21785165 DOI: 10.1093/biostatistics/kxr017] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022]
Abstract
Semiparametric transformation models provide a very general framework for studying the effects of (possibly time-dependent) covariates on survival time and recurrent event times. Assessing the adequacy of these models is an important task because model misspecification affects the validity of inference and the accuracy of prediction. In this paper, we introduce appropriate time-dependent residuals for these models and consider the cumulative sums of the residuals. Under the assumed model, the cumulative sum processes converge weakly to zero-mean Gaussian processes whose distributions can be approximated through Monte Carlo simulation. These results enable one to assess, both graphically and numerically, how unusual the observed residual patterns are in reference to their null distributions. The residual patterns can also be used to determine the nature of model misspecification. Extensive simulation studies demonstrate that the proposed methods perform well in practical situations. Three medical studies are provided for illustrations.
|
36
|
Abstract
Analysis of untyped single nucleotide polymorphisms (SNPs) can facilitate the localization of disease-causing variants and permit meta-analysis of association studies with different genotyping platforms. We present two approaches for using the linkage disequilibrium structure of an external reference panel to infer the unknown value of an untyped SNP from the observed genotypes of typed SNPs. The maximum-likelihood approach integrates the prediction of untyped genotypes and estimation of association parameters into a single framework and yields consistent and efficient estimators of genetic effects and gene-environment interactions with proper variance estimators. The imputation approach is a two-stage strategy, which first imputes the untyped genotypes by either the most likely genotypes or the expected genotype counts and then uses the imputed values in a downstream association analysis. The latter approach has proper control of type I error in single-SNP tests with possible covariate adjustments even when the reference panel is misspecified; however, type I error may not be properly controlled in testing multiple-SNP effects or gene-environment interactions. In general, imputation yields biased estimators of genetic effects and gene-environment interactions, and the variances are underestimated. We conduct extensive simulation studies to compare the bias, type I error, power, and confidence interval coverage between the maximum likelihood and imputation approaches in the analysis of single-SNP effects, multiple-SNP effects, and gene-environment interactions under cross-sectional and case-control designs. In addition, we provide an illustration with genome-wide data from the Wellcome Trust Case-Control Consortium (WTCCC) [2007].
|
37
|
Abstract
Attributable fractions are commonly used to measure the impact of risk factors on disease incidence in the population. These static measures can be extended to functions of time when the time to disease occurrence or event time is of interest. The present paper deals with nonparametric and semiparametric estimation of attributable fraction functions for cohort studies with potentially censored event time data. The semiparametric models include the familiar proportional hazards model and a broad class of transformation models. The proposed estimators are shown to be consistent, asymptotically normal and asymptotically efficient. Extensive simulation studies demonstrate that the proposed methods perform well in practical situations. A cardiovascular health study is provided. Connections to causal inference are discussed.
|
38
|
Lin DY, Zeng D. On the relative efficiency of using summary statistics versus individual-level data in meta-analysis. Biometrika 2010; 97:321-332. [PMID: 23049122 DOI: 10.1093/biomet/asq006] [Citation(s) in RCA: 107] [Impact Index Per Article: 7.6] [Indexed: 01/31/2023]
Abstract
Meta-analysis is widely used to synthesize the results of multiple studies. Although meta-analysis is traditionally carried out by combining the summary statistics of relevant studies, advances in technologies and communications have made it increasingly feasible to access the original data on individual participants. In the present paper, we investigate the relative efficiency of analyzing original data versus combining summary statistics. We show that, for all commonly used parametric and semiparametric models, there is no asymptotic efficiency gain by analyzing original data if the parameter of main interest has a common value across studies, the nuisance parameters have distinct values among studies, and the summary statistics are based on maximum likelihood. We also assess the relative efficiency of the two methods when the parameter of main interest has different values among studies or when there are common nuisance parameters across studies. We conduct simulation studies to confirm the theoretical results and provide empirical comparisons from a genetic association study.
|
39
|
Diao G, Lin DY. Variance-components methods for linkage and association analysis of ordinal traits in general pedigrees. Genet Epidemiol 2010; 34:232-7. [PMID: 19918762 PMCID: PMC3003595 DOI: 10.1002/gepi.20453] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/08/2022]
Abstract
Many complex human diseases such as alcoholism and cancer are rated on ordinal scales. Well-developed statistical methods for the genetic mapping of quantitative traits may not be appropriate for ordinal traits. We propose a class of variance-component models for the joint linkage and association analysis of ordinal traits. The proposed models accommodate arbitrary pedigrees and allow covariates and gene-environment interactions. We develop efficient likelihood-based inference procedures under the proposed models. The maximum likelihood estimators are approximately unbiased, normally distributed, and statistically efficient. Extensive simulation studies demonstrate that the proposed methods perform well in practical situations. An application to data from the Collaborative Study on the Genetics of Alcoholism is provided.
|
40
|
Hu YJ, Lin DY, Zeng D. A general framework for studying genetic effects and gene-environment interactions with missing data. Biostatistics 2010; 11:583-98. [PMID: 20348396 DOI: 10.1093/biostatistics/kxq015] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
Missing data arise in genetic association studies when genotypes are unknown or when haplotypes are of direct interest. We provide a general likelihood-based framework for making inference on genetic effects and gene-environment interactions with such missing data. We allow genetic and environmental variables to be correlated while leaving the distribution of environmental variables completely unspecified. We consider 3 major study designs (cross-sectional, case-control, and cohort designs) and construct appropriate likelihood functions for all common phenotypes (e.g. case-control status, quantitative traits, and potentially censored ages at onset of disease). The likelihood functions involve both finite- and infinite-dimensional parameters. The maximum likelihood estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Expectation-Maximization (EM) algorithms are developed to implement the corresponding inference procedures. Extensive simulation studies demonstrate that the proposed inferential and numerical methods perform well in practical settings. Illustration with a genome-wide association study of lung cancer is provided.
|
41
|
Lin DY, Zeng D. Meta-analysis of genome-wide association studies: no efficiency gain in using individual participant data. Genet Epidemiol 2010; 34:60-6. [PMID: 19847795 DOI: 10.1002/gepi.20435] [Citation(s) in RCA: 85] [Impact Index Per Article: 6.1] [Indexed: 11/08/2022]
Abstract
To identify genetic variants with modest effects on complex human diseases, a growing number of networks or consortia are created for sharing data from multiple genome-wide association studies on the same disease or related disorders. A central question in this enterprise is whether to obtain summary results or individual participant data from relevant studies. We show theoretically and numerically that meta-analysis of summary results is statistically as efficient as joint analysis of individual participant data (provided that both analyses are performed properly under the same modeling assumptions). We illustrate this equivalence with case-control data from the Finland-United States Investigation of NIDDM Genetics (FUSION) study. Collating only summary results will increase the number and representativeness of available studies, simplify data collection and analysis, reduce resource utilization, and accelerate discovery.
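As a rough illustration of what combining summary results can look like, the sketch below pools per-study effect estimates by inverse-variance weighting. The abstract does not state which weighting scheme is used, so the standard fixed-effect inverse-variance formula is an assumption here, and the function name and the numbers are hypothetical.

```python
import math


def inverse_variance_meta(estimates, std_errors):
    """Fixed-effect pooled estimate and its standard error from summary statistics."""
    # Each study is weighted by the reciprocal of its sampling variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    # The variance of the pooled estimate is the reciprocal of the total weight.
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


# Hypothetical log odds ratios and standard errors from two studies.
betas = [0.10, 0.20]
ses = [0.1, 0.1]
pooled, pooled_se = inverse_variance_meta(betas, ses)
print(pooled, pooled_se)
```

Only an estimate and a standard error are needed from each study, which is why collecting summary results is so much simpler than pooling individual participant data.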
|
42
|
Lin DY, Villegas MS, Tan PL, Wang S, Shek LP. Severe Kikuchi's disease responsive to immune modulation. Singapore Med J 2010; 51:e18-e21. [PMID: 20200761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Abstract
Kikuchi's disease, although an uncommon entity, has been increasingly reported since it was first discovered in 1972. The most common manifestation of Kikuchi's disease, cervical lymphadenopathy, has no clinically distinguishable features. Therefore, a diagnosis of Kikuchi's disease has largely been based on clinical suspicion and histopathological confirmation. We present a 15-year-old Chinese girl with severe Kikuchi's disease, whose relapsing course was only responsive to high-dose steroid and intravenous immunoglobulin therapy.
|
43
|
Lin DY, Zeng D. Proper analysis of secondary phenotype data in case-control association studies. Genet Epidemiol 2009; 33:256-65. [PMID: 19051285 DOI: 10.1002/gepi.20377] [Citation(s) in RCA: 101] [Impact Index Per Article: 6.7] [Indexed: 11/11/2022]
Abstract
Case-control association studies often collect extensive information on secondary phenotypes, which are quantitative or qualitative traits other than the case-control status. Exploring secondary phenotypes can yield valuable insights into biological pathways and identify genetic variants influencing phenotypes of direct interest. All publications on secondary phenotypes have used standard statistical methods, such as least-squares regression for quantitative traits. Because of unequal selection probabilities between cases and controls, the case-control sample is not a random sample from the general population. As a result, standard statistical analysis of secondary phenotype data can be extremely misleading. Although one may avoid the sampling bias by analyzing cases and controls separately or by including the case-control status as a covariate in the model, the associations between a secondary phenotype and a genetic variant in the case and control groups can be quite different from the association in the general population. In this article, we present novel statistical methods that properly reflect the case-control sampling in the analysis of secondary phenotype data. The new methods provide unbiased estimation of genetic effects and accurate control of false-positive rates while maximizing statistical power. We demonstrate the pitfalls of the standard methods and the advantages of the new methods both analytically and numerically. The relevant software is available at our website.
|
44
|
Zeng D, Lin DY. Semiparametric transformation models with random effects for joint analysis of recurrent and terminal events. Biometrics 2008; 65:746-52. [PMID: 18945267 DOI: 10.1111/j.1541-0420.2008.01126.x] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.1] [Indexed: 11/26/2022]
Abstract
We propose a broad class of semiparametric transformation models with random effects for the joint analysis of recurrent events and a terminal event. The transformation models include proportional hazards/intensity and proportional odds models. We estimate the model parameters by the nonparametric maximum likelihood approach. The estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Simple and stable numerical algorithms are provided to calculate the parameter estimators and to estimate their variances. Extensive simulation studies demonstrate that the proposed inference procedures perform well in realistic settings. Applications to two HIV/AIDS studies are presented.
|
45
|
Huang BE, Amos CI, Lin DY. Detecting haplotype effects in genomewide association studies. Genet Epidemiol 2008; 31:803-12. [PMID: 17549762 DOI: 10.1002/gepi.20242] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Indexed: 01/09/2023]
Abstract
The analysis of genomewide association studies requires methods that are both computationally feasible and statistically powerful. Given the large-scale collection of single nucleotide polymorphisms (SNPs), it is desirable to explore the information contained in their interrelationships. In particular, utilizing haplotypes rather than individual SNPs and accounting for correlations of polymorphisms in adjustment for multiple testing can lead to increased power. We present a statistically powerful and numerically efficient method based on sliding windows of adjacent SNPs to detect haplotype-disease association in genomewide studies. This method consists of an efficient algorithm to calculate a proper likelihood-ratio statistic for any given window of SNPs, along with an accurate and efficient Monte Carlo procedure to adjust for multiple testing. Simulation studies using the HapMap data showed that the proposed method performs well in realistic situations. We applied the new method to a case-control study on rheumatoid arthritis and identified several loci worthy of further investigations.
|
46
|
Diao G, Lin DY. Semiparametric methods for genome-wide linkage analysis of human gene expression data. BMC Proc 2007; 1 Suppl 1:S83. [PMID: 18466586 PMCID: PMC2367566 DOI: 10.1186/1753-6561-1-s1-s83] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
With the availability of high-throughput microarray technologies, investigators can simultaneously measure the expression levels of many thousands of genes in a short period. Although there are rich statistical methods for analyzing microarray data in the literature, limited work has been done in mapping expression quantitative trait loci (eQTL) that influence the variation in levels of gene expression. Most existing eQTL mapping methods assume that the expression phenotypes follow a normal distribution and violation of the normality assumption may lead to inflated type I error and reduced power. QTL analysis of expression data involves the mapping of many expression phenotypes at thousands or hundreds of thousands of marker loci across the whole genome. An appropriate procedure to adjust for multiple testing is essential for guarding against an abundance of false positive results. In this study, we applied a semiparametric quantitative trait loci (SQTL) mapping method to human gene expression data. The SQTL mapping method is rank-based and therefore robust to non-normality and outliers. Furthermore, we apply an efficient Monte Carlo procedure to account for multiple testing and assess the genome-wide significance level. Particularly, we apply the SQTL mapping method and the Monte-Carlo approach to the gene expression data provided by Genetic Analysis Workshop 15.
|
47
|
Lin DY. On the Breslow estimator. Lifetime Data Analysis 2007; 13:471-80. [PMID: 17768681 DOI: 10.1007/s10985-007-9048-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 07/12/2007] [Accepted: 07/16/2007] [Indexed: 05/17/2023]
Abstract
In his discussion of Cox's (1972) paper on proportional hazards regression, Breslow (1972) provided the maximum likelihood estimator for the cumulative baseline hazard function. This estimator is commonly used in practice. The estimator has also been highly valuable in the further development of Cox regression and semiparametric inference with censored data. The present paper describes the Breslow estimator and its tremendous impact on the theory and practice of survival analysis.
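For readers unfamiliar with the estimator, the sketch below shows its usual textbook form: at each distinct event time, the number of events is divided by the sum of exp(beta'Z) over the subjects still at risk, and these increments are accumulated. The function name and the data are hypothetical, and the regression coefficients are assumed to have been estimated beforehand by maximizing the Cox partial likelihood.

```python
import math


def breslow_cumulative_hazard(times, events, covariates, beta):
    """Breslow estimate of the cumulative baseline hazard at each distinct event time.

    times: observed follow-up times; events: 1 = event, 0 = censored;
    covariates: one covariate vector per subject; beta: fitted Cox coefficients.
    Returns a list of (time, cumulative_hazard) pairs.
    """
    # Relative risk exp(beta'Z) for every subject.
    risk_scores = [math.exp(sum(b * z for b, z in zip(beta, row))) for row in covariates]
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    cumulative = 0.0
    curve = []
    for t in event_times:
        # Number of events observed exactly at time t (ties are counted together).
        n_events = sum(1 for u, d in zip(times, events) if d == 1 and u == t)
        # Sum of relative risks over subjects still under observation at time t.
        at_risk = sum(r for u, r in zip(times, risk_scores) if u >= t)
        cumulative += n_events / at_risk
        curve.append((t, cumulative))
    return curve


# Hypothetical data; with beta = 0 the estimator reduces to Nelson-Aalen.
times = [1.0, 2.0, 3.0]
events = [1, 0, 1]
covariates = [[0.0], [1.0], [0.0]]
curve = breslow_cumulative_hazard(times, events, covariates, beta=[0.0])
print(curve)
```

Setting beta to zero is a convenient sanity check, because every subject then has relative risk 1 and the increments become events divided by the number at risk.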
|
48
|
Abstract
We propose a simple and general resampling strategy to estimate variances for parameter estimators derived from nonsmooth estimating functions. This approach applies to a wide variety of semiparametric and nonparametric problems in biostatistics. It does not require solving estimating equations and is thus much faster than the existing resampling procedures. Its usefulness is illustrated with heteroscedastic quantile regression and censored data rank regression. Numerical results based on simulated and real data are provided.
|
49
|
|
50
|
Huang BE, Lin DY. Efficient association mapping of quantitative trait loci with selective genotyping. Am J Hum Genet 2007; 80:567-76. [PMID: 17273979 PMCID: PMC1821103 DOI: 10.1086/512727] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Received: 10/17/2006] [Accepted: 01/09/2007] [Indexed: 11/03/2022]
Abstract
Selective genotyping (i.e., genotyping only those individuals with extreme phenotypes) can greatly improve the power to detect and map quantitative trait loci in genetic association studies. Because selection depends on the phenotype, the resulting data cannot be properly analyzed by standard statistical methods. We provide appropriate likelihoods for assessing the effects of genotypes and haplotypes on quantitative traits under selective-genotyping designs. We demonstrate that the likelihood-based methods are highly effective in identifying causal variants and are substantially more powerful than existing methods.
|