1
Zhang M, Chen Y, Yu D, Zhong W, Zhang J, Ma P. Elucidating dynamic cell lineages and gene networks in time-course single cell differentiation. ARTIFICIAL INTELLIGENCE IN THE LIFE SCIENCES 2023; 3:100068. [PMID: 37426065 PMCID: PMC10328540 DOI: 10.1016/j.ailsci.2023.100068]
Abstract
Single cell RNA sequencing (scRNA-seq) technologies provide researchers with an unprecedented opportunity to explore cell heterogeneity. For example, sequenced stem and progenitor cells may belong to various cell lineages with different cell fates, and those cells may differentiate into various mature cell types during cell differentiation. To trace the behavior of cell differentiation, researchers reconstruct cell lineages and predict cell fates by ordering cells chronologically into a trajectory with a pseudo-time. However, scRNA-seq experiments provide no cell-to-cell correspondences across time points with which to reconstruct cell lineages, which creates a significant challenge for cell lineage tracing and cell fate prediction. Therefore, methods that can accurately reconstruct the dynamic cell lineages and predict cell fates are highly desirable. In this article, we develop an innovative machine-learning framework called Cell Smoothing Transformation (CellST) to elucidate the dynamic cell fate paths and construct gene networks in cell differentiation processes. Unlike existing methods that construct a single bulk cell trajectory, CellST builds cell trajectories and tracks behaviors for each individual cell. Additionally, CellST can predict cell fates even for less frequent cell types. Based on the individual cell fate trajectories, CellST can further construct dynamic gene networks to model gene-gene relationships along the cell differentiation process and discover critical genes that potentially regulate cells into various mature cell types.
Affiliation(s)
- Yongkai Chen
- Department of Statistics, University of Georgia, Athens, GA 30602, United States
- Dingyi Yu
- Department of Industrial Engineering, Center for Statistical Science, Tsinghua University, Beijing, China
- Wenxuan Zhong
- Department of Statistics, University of Georgia, Athens, GA 30602, United States
- Jingyi Zhang
- Department of Industrial Engineering, Center for Statistical Science, Tsinghua University, Beijing, China
- Ping Ma
- Department of Statistics, University of Georgia, Athens, GA 30602, United States
2
Robust Permutation Tests for Penalized Splines. STATS 2022. [DOI: 10.3390/stats5030053]
Abstract
Penalized splines are frequently used in applied research for understanding functional relationships between variables. In most applications, statistical inference for penalized splines is conducted using the random effects or Bayesian interpretation of a smoothing spline. These interpretations can be used to assess the uncertainty of the fitted values and the estimated component functions. However, statistical tests about the nature of the function are more difficult, because such tests often involve testing a null hypothesis that a variance component is equal to zero. Furthermore, valid statistical inference using the random effects or Bayesian interpretation depends on the validity of the utilized parametric assumptions. To overcome these limitations, I propose a flexible and robust permutation testing framework for inference with penalized splines. The proposed approach can be used to test omnibus hypotheses about functional relationships, as well as more flexible hypotheses about conditional relationships. I establish the conditions under which the methods will produce exact results, as well as the asymptotic behavior of the various permutation tests. Additionally, I present extensive simulation results to demonstrate the robustness and superiority of the proposed approach compared to commonly used methods.
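The general logic of a permutation test for the omnibus hypothesis of no functional relationship can be sketched briefly. This is a minimal illustration, not the author's robust procedure: it assumes roughly equally spaced x values, uses a discrete second-difference penalized smoother (a Whittaker-type smoother) in place of a penalized spline basis, takes the smoother's R² as the test statistic, and permutes y relative to x. The names `smoother_matrix`, `r_squared`, and `permutation_test` and the defaults `lam=10.0` and `n_perm=199` are hypothetical choices.

```python
import math
import random


def smoother_matrix(n, lam):
    """Return S = (I + lam * D'D)^(-1), where D is the (n-2) x n second-difference
    operator. S maps data y (ordered by x) to penalized fitted values S y."""
    a = [[float(i == j) for j in range(n)] for i in range(n)]
    c = (1.0, -2.0, 1.0)
    for k in range(n - 2):  # accumulate lam * D'D one difference row at a time
        for i in range(3):
            for j in range(3):
                a[k + i][k + j] += lam * c[i] * c[j]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [A | I]
    aug = [row + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            f = aug[r][col]
            if r != col and f != 0.0:
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def r_squared(S, y):
    """Proportion of the variance of y explained by the smoother fit S y."""
    fit = [sum(s * v for s, v in zip(row, y)) for row in S]
    mean = sum(y) / len(y)
    rss = sum((v - f) ** 2 for v, f in zip(y, fit))
    tss = sum((v - mean) ** 2 for v in y)
    return 1.0 - rss / tss


def permutation_test(x, y, lam=10.0, n_perm=199, seed=0):
    """Test H0: E[y | x] is constant, by permuting y relative to x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]
    S = smoother_matrix(len(y), lam)  # depends only on n, so it is computed once
    t_obs = r_squared(S, y_sorted)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        y_perm = y_sorted[:]
        rng.shuffle(y_perm)
        if r_squared(S, y_perm) >= t_obs:
            exceed += 1
    # The observed arrangement counts as one of the permutations
    return t_obs, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    x = [i / 29 for i in range(30)]
    y = [math.sin(2 * math.pi * xi) + rng.gauss(0, 0.1) for xi in x]
    print(permutation_test(x, y))
```

Because the second-difference penalty leaves constants and straight lines unpenalized, the fitted R² is never negative, and the smallest attainable p-value is 1 / (n_perm + 1).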
3
Wang T, Yu L, Leurgans SE, Wilson RS, Bennett DA, Boyle PA. Conditional functional clustering for longitudinal data with heterogeneous nonlinear patterns. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1542]
Affiliation(s)
- Tianhao Wang
- Rush Alzheimer’s Disease Center, Rush University Medical Center
- Lei Yu
- Rush Alzheimer’s Disease Center, Rush University Medical Center
- Sue E. Leurgans
- Rush Alzheimer’s Disease Center, Rush University Medical Center
4
Aghababaei Jazi O, Pullenayegum E. Variable selection in semiparametric regression models for longitudinal data with informative observation times. Stat Med 2022; 41:3281-3298. [PMID: 35468658 DOI: 10.1002/sim.9417] [Received: 08/12/2021] [Revised: 03/16/2022] [Accepted: 04/06/2022]
Abstract
A common issue in longitudinal studies is that subjects' visits are irregular and may depend on observed outcome values; such data are known as longitudinal data with informative observation times (follow-up). Semiparametric regression modeling for this type of data has received much attention because it provides more flexibility in studying the association between regression factors and a longitudinal outcome. An important problem here is how to select relevant variables and estimate their coefficients in semiparametric regression models when the number of covariates at baseline is large. Current penalization procedures in semiparametric regression models for longitudinal data do not account for informative observation times. We propose a variable selection procedure that is suitable for estimation methods based on pseudo-score functions. We investigate the asymptotic properties of the penalized estimators and conduct simulation studies to illustrate the theoretical results. We also apply the procedure to variable selection in semiparametric regression models for the STAR*D dataset from a multistage randomized clinical trial for treating major depressive disorder.
Affiliation(s)
- Omidali Aghababaei Jazi
- Department of Mathematical and Computational Sciences, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Eleanor Pullenayegum
- Department of Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, Ontario, Canada
5
Yu Q, Li B. Third-variable effect analysis with multilevel additive models. PLoS One 2020; 15:e0241072. [PMID: 33095796 PMCID: PMC7584256 DOI: 10.1371/journal.pone.0241072] [Received: 08/29/2020] [Accepted: 10/08/2020]
Abstract
Third-variable effect refers to the effect transmitted by third-variables that intervene in the relationship between an exposure and a response variable. Third-variable effect analysis has been broadly studied in many fields. However, it remains a challenge for researchers to differentiate the indirect effect of an individual factor among multiple third-variables, especially when the variables involved have a hierarchical structure. Yu et al. (2014) defined third-variable effects that are consistent across all types of response (categorical or continuous), exposure, and third-variables. With these definitions, multiple third-variables can be considered simultaneously, and the indirect effects carried by individual third-variables can be separated from the total effect. In this paper, we extend the definitions of third-variable effects to multilevel data structures, where multilevel additive models are adapted to model the variable relationships, so that third-variable effects can be estimated at different levels. Moreover, transformations of variables are allowed to represent nonlinear relationships among variables. We compiled an R package, mlma, to carry out the proposed multilevel third-variable analysis. Simulations show that the proposed method can effectively differentiate and estimate third-variable effects from different levels. Further, we apply the method to explore the racial disparity in body mass index, accounting for both environmental and individual-level risk factors.
Affiliation(s)
- Qingzhao Yu
- Biostatistics Program, School of Public Health, Louisiana State University Health Science Center, New Orleans, LA, United States of America
- Bin Li
- Department of Experimental Statistics, Louisiana State University, Baton Rouge, LA, United States of America
6
Klein SD, Olman CA, Sponheim SR. Perceptual Mechanisms of Visual Hallucinations and Illusions in Psychosis. JOURNAL OF PSYCHIATRY AND BRAIN SCIENCE 2020; 5:e200020. [PMID: 32944656 PMCID: PMC7494209 DOI: 10.20900/jpbs.20200020]
Abstract
Psychosis has been associated with neural anomalies across a number of brain regions and cortical networks. Nevertheless, the exact pathophysiology of the disorder remains unclear. Aberrant visual perceptions such as hallucinations are evident in psychosis, while the occurrence of visual distortions is elevated in individuals with genetic liability for psychosis. The overall goals of this project are to: (1) use psychophysical tasks and neuroimaging to characterize deficits in visual perception; (2) acquire a mechanistic understanding of these deficits through development and validation of a computational model; and (3) determine whether said mechanisms mark genetic liability for psychosis. Visual tasks tapping both low- and high-level visual processing are being completed as individuals with psychotic disorders (IPDs), first-degree biological siblings of IPDs (SibIPDs), and healthy controls (HCs) undergo 248-channel magnetoencephalography (MEG) recordings followed by 7 Tesla functional magnetic resonance imaging (MRI). By deriving cortical source signals from MEG and MRI data, we will characterize the timing, location and coordination of neural processes. We hypothesize that IPDs prone to visual hallucinations will exhibit deviant functions within early visual cortex, and that aberrant contextual influences on visual perception will involve higher-level visual cortical regions and be associated with visual hallucinations. SibIPDs who experience visual distortions, but not hallucinations, are hypothesized to exhibit deficits in higher-order visual processing reflected in abnormal inter-regional neural synchronization. We hope the results will lead to the development of targeted interventions for psychotic disorders and identify useful biomarkers for aberrant neural functions that give rise to psychosis.
Affiliation(s)
- Samuel D. Klein
- Clinical Science and Psychopathology Research Program, University of Minnesota-Twin Cities, 75 East River Road, Minneapolis, MN 55455, USA
- Cheryl A. Olman
- Department of Psychology, University of Minnesota-Twin Cities, 75 East River Road, Minneapolis, MN 55455, USA
- Center for Magnetic Resonance Research, University of Minnesota-Twin Cities, 2021 6th St SE, Minneapolis, MN 55455, USA
- Scott R. Sponheim
- Minneapolis Veterans Affairs Health Care System, 1 Veterans Dr, Minneapolis, MN 55417, USA
- Department of Psychiatry and Behavioral Science, University of Minnesota, 606 24th Ave S, Minneapolis, MN 55454, USA
7
Hammell AE, Helwig NE, Kaczkurkin AN, Sponheim SR, Lissek S. The temporal course of over-generalized conditioned threat expectancies in posttraumatic stress disorder. Behav Res Ther 2019; 124:103513. [PMID: 31864116 DOI: 10.1016/j.brat.2019.103513] [Received: 04/27/2019] [Revised: 10/28/2019] [Accepted: 11/08/2019]
Abstract
One key conditioning abnormality in posttraumatic stress disorder (PTSD) is heightened generalization of fear from a conditioned danger-cue (CS+) to similarly appearing safe stimuli. The present work represents the first effort to track the time-course of heightened generalization in PTSD with the prediction of heightened PTSD-related over-generalization in earlier but not later trials. This prediction derives from past discriminative fear-conditioning studies providing incidental evidence that over-generalization in PTSD may be reduced with sufficient learning trials. In the current study, we re-analyzed previously published conditioned fear-generalization data (Kaczkurkin et al., 2017) including combat veterans with PTSD (n = 15) or subthreshold PTSD (SubPTSD: n = 18), and trauma controls (TC: n = 19). This re-analysis aimed to identify the trial-by-trial course of group differences in generalized perceived risk across three classes of safe generalization stimuli (GSs) parametrically varying in similarity to a CS+ paired with shock. Those with PTSD and SubPTSD, relative to TC, displayed significantly elevated generalization to all GSs combined in early but not late generalization trials. Additionally, over-generalization in PTSD and SubPTSD persisted across trials to a greater extent for classes of GSs bearing higher resemblance to CS+. Such results suggest that PTSD-related over-generalization of conditioned threat expectancies can be reduced with sufficient exposure to unreinforced GSs and accentuate the importance of analyzing trial-by-trial changes when assessing over-generalization in clinical populations.
Affiliation(s)
- Abbey E Hammell
- Department of Psychology, University of Minnesota, Elliot Hall, 75 East River Parkway, Minneapolis, MN, 55455, USA
- Nathaniel E Helwig
- Department of Psychology, University of Minnesota, Elliot Hall, 75 East River Parkway, Minneapolis, MN, 55455, USA; School of Statistics, University of Minnesota, Ford Hall, 224 Church Street SE, Minneapolis, MN, 55455, USA
- Antonia N Kaczkurkin
- Department of Psychological Sciences, Vanderbilt University, 2301 Vanderbilt Place, Nashville, TN, 37240-7817, USA
- Scott R Sponheim
- Minneapolis Veterans Affairs Health Care System, 1 Veterans Drive, Minneapolis, MN, 55417, USA; Department of Psychiatry, University of Minnesota, F282/2A West Building, 2450 Riverside Avenue S, Minneapolis, MN, 55454, USA
- Shmuel Lissek
- Department of Psychology, University of Minnesota, Elliot Hall, 75 East River Parkway, Minneapolis, MN, 55455, USA
8
Xu G, Shang Z, Cheng G. Distributed Generalized Cross-Validation for Divide-and-Conquer Kernel Ridge Regression and Its Asymptotic Optimality. J Comput Graph Stat 2019. [DOI: 10.1080/10618600.2019.1586714]
Affiliation(s)
- Ganggang Xu
- Department of Management Science, University of Miami, Coral Gables, FL
- Zuofeng Shang
- Department of Mathematical Sciences, Indiana University – Purdue University Indianapolis, Indianapolis, IN
- Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, NJ
- Guang Cheng
- Department of Statistics, Purdue University, West Lafayette, IN
9
10
Luo S, Song R, Styner M, Gilmore JH, Zhu H. FSEM: Functional Structural Equation Models for Twin Functional Data. J Am Stat Assoc 2018; 114:344-357. [PMID: 31057192 DOI: 10.1080/01621459.2017.1407773]
Abstract
The aim of this paper is to develop a novel class of functional structural equation models (FSEMs) for dissecting functional genetic and environmental effects on twin functional data, while characterizing the varying association between functional data and covariates of interest. We propose a three-stage estimation procedure to estimate varying coefficient functions for various covariates (e.g., gender) as well as three covariance operators for the genetic and environmental effects. We develop an inference procedure based on weighted likelihood ratio statistics to test the genetic/environmental effect at either a fixed location or a compact region. We also systematically carry out the theoretical analysis of the estimated varying functions, the weighted likelihood ratio statistics, and the estimated covariance operators. We conduct extensive Monte Carlo simulations to examine the finite-sample performance of the estimation and inference procedures. We apply the proposed FSEM to quantify the degree of genetic and environmental effects on twin white-matter tracts obtained from the UNC early brain development study.
Affiliation(s)
- S Luo
- Department of Statistics, North Carolina State University, Cary, North Carolina, USA
- R Song
- Department of Statistics, North Carolina State University, Cary, North Carolina, USA
- M Styner
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- J H Gilmore
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- H Zhu
- Department of Biostatistics and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA; Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
11
Zhao K, Lian H. Sparsistent and constansistent estimation of the varying-coefficient model with a diverging number of predictors. COMMUN STAT-THEOR M 2016. [DOI: 10.1080/03610926.2014.890224]
12
Helwig NE, Shorter KA, Ma P, Hsiao-Wecksler ET. Smoothing spline analysis of variance models: A new tool for the analysis of cyclic biomechanical data. J Biomech 2016; 49:3216-3222. [DOI: 10.1016/j.jbiomech.2016.07.035] [Received: 10/08/2015] [Revised: 06/09/2016] [Accepted: 07/31/2016]
13
Ding J, Zhang Z. Bayesian regression on non-parametric mixed-effect models with shape-restricted Bernstein polynomials. J Appl Stat 2016; 43:2524-2537. [PMID: 38818091 PMCID: PMC11134047 DOI: 10.1080/02664763.2016.1142940] [Received: 10/15/2014] [Accepted: 01/13/2016]
Abstract
We develop a Bayesian estimation method for non-parametric mixed-effect models under shape constraints. The approach uses a hierarchical Bayesian framework and characterizations of shape-constrained Bernstein polynomials (BPs). We employ Markov chain Monte Carlo methods for model fitting, using a truncated normal distribution as the prior for the coefficients of BPs to ensure the desired shape constraints. The small-sample properties of the Bayesian shape-constrained estimators across a range of functions are examined via simulation studies. Two real data analyses are given to illustrate the application of the proposed method.
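Why Bernstein polynomials suit shape constraints can be checked directly. On [0, 1], B(t) = Σ_k β_k C(n, k) t^k (1 − t)^(n − k) has derivative B'(t) = n Σ_k (β_(k+1) − β_k) C(n − 1, k) t^k (1 − t)^(n − 1 − k), so nondecreasing coefficients give a nondecreasing curve. The sketch below assumes the unit interval and enforces monotonicity by building the coefficients from nonnegative increments; that reparameterization is a stand-in for illustration, not the truncated-normal prior the authors use, and all function names are hypothetical.

```python
from math import comb


def bernstein(beta, t):
    """Evaluate B(t) = sum_k beta[k] * C(n, k) * t**k * (1 - t)**(n - k) on [0, 1]."""
    n = len(beta) - 1
    return sum(b * comb(n, k) * t**k * (1 - t) ** (n - k) for k, b in enumerate(beta))


def bernstein_derivative(beta, t):
    """B'(t) = n * sum_k (beta[k+1] - beta[k]) * C(n-1, k) * t**k * (1 - t)**(n-1-k)."""
    n = len(beta) - 1
    return n * sum(
        (beta[k + 1] - beta[k]) * comb(n - 1, k) * t**k * (1 - t) ** (n - 1 - k)
        for k in range(n)
    )


def monotone_coefficients(start, increments):
    """Build nondecreasing coefficients from a start value and nonnegative steps."""
    if any(d < 0 for d in increments):
        raise ValueError("increments must be nonnegative for a nondecreasing fit")
    beta = [start]
    for d in increments:
        beta.append(beta[-1] + d)
    return beta


beta = monotone_coefficients(-1.0, [0.5, 0.0, 2.0, 0.3])
grid = [i / 100 for i in range(101)]
values = [bernstein(beta, t) for t in grid]
print(all(a <= b for a, b in zip(values, values[1:])))  # nondecreasing on the grid?
```

The endpoints are interpolated exactly (B(0) = β_0 and B(1) = β_n), which is one reason the coefficients are easy to interpret when placing priors on them.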
Affiliation(s)
- Jianhua Ding
- Department of Statistics, Shanxi Datong University, Datong, People's Republic of China
- Zhongzhan Zhang
- College of Applied Sciences, Beijing University of Technology, Beijing, People's Republic of China
14
Lu T, Wang M. Investigate Data Dependency for Dynamic Gene Regulatory Network Identification through High-dimensional Differential Equation Approach. COMMUN STAT-SIMUL C 2014. [DOI: 10.1080/03610918.2014.902224]
15
Using semiparametric-mixed model and functional linear model to detect vulnerable prenatal window to carcinogenic polycyclic aromatic hydrocarbons on fetal growth. Biom J 2013; 56:243-55. [DOI: 10.1002/bimj.201200132] [Received: 06/10/2012] [Revised: 07/14/2013] [Accepted: 07/29/2013]
16
Kim I, Pang H, Zhao H. Statistical properties on semiparametric regression for evaluating pathway effects. J Stat Plan Inference 2013; 143:745-763. [PMID: 24014933 PMCID: PMC3763850 DOI: 10.1016/j.jspi.2012.09.009]
Abstract
Most statistical methods for microarray data analysis consider one gene at a time, and they may miss subtle changes at the single gene level. This limitation may be overcome by considering a set of genes simultaneously, where the gene sets are derived from prior biological knowledge. We define a pathway as a predefined set of genes that serve a particular cellular or physiological function. Limited work has been done in the regression setting to study the effects of clinical covariates and expression levels of genes in a pathway on a continuous clinical outcome. A semiparametric regression approach for identifying pathways related to a continuous outcome was proposed by Liu et al. (2007), who demonstrated the connection between a least squares kernel machine for nonparametric pathway effect and restricted maximum likelihood (REML) for variance components. However, the asymptotic properties of semiparametric regression for identifying pathways have never been studied. In this paper, we study the asymptotic properties of the parameter estimates in semiparametric regression and compare Liu et al.'s REML with our REML obtained from a profile likelihood. We prove that both approaches provide consistent estimators, have [Formula: see text] convergence rate under regularity conditions, and have either an asymptotically normal distribution or a mixture of normal distributions. However, the estimators based on our REML obtained from a profile likelihood have a theoretically smaller mean squared error than those of Liu et al.'s REML. A simulation study supports this theoretical result. A profile restricted likelihood ratio test is also provided for the non-standard testing problem. We apply our approach to a type II diabetes data set (Mootha et al., 2003).
Affiliation(s)
- Inyoung Kim
- Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- Herbert Pang
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27705, USA
- Hongyu Zhao
- Division of Biostatistics, Yale School of Public Health, New Haven, CT 06520, USA
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
17
González Manteiga W, Lombardía MJ, Martínez Miranda MD, Sperlich S. Kernel smoothers and bootstrapping for semiparametric mixed effects models. J MULTIVARIATE ANAL 2013. [DOI: 10.1016/j.jmva.2012.08.005]
18
Xu G, Huang JZ. Asymptotic optimality and efficient computation of the leave-subject-out cross-validation. Ann Stat 2012. [DOI: 10.1214/12-aos1063]
19
Sun X, Ma P, Mumm RH. Nonparametric method for genomics-based prediction of performance of quantitative traits involving epistasis in plant breeding. PLoS One 2012; 7:e50604. [PMID: 23226325 PMCID: PMC3511520 DOI: 10.1371/journal.pone.0050604] [Received: 08/17/2012] [Accepted: 10/25/2012]
Abstract
Genomic selection (GS) procedures have proven useful in estimating breeding value and predicting phenotype with genome-wide molecular marker information. However, issues of high dimensionality, multicollinearity, and the inability to deal effectively with epistasis can jeopardize accuracy and predictive ability. We therefore propose a new nonparametric method, pRKHS, which combines the features of supervised principal component analysis (SPCA) and reproducing kernel Hilbert space (RKHS) regression, with two versions: pRKHS-NE for traits with no or low epistasis and pRKHS-E for traits with high epistasis. Instead of assigning a specific relationship to represent the underlying epistasis, the method maps genotype to phenotype nonparametrically, thus requiring fewer genetic assumptions. SPCA decreases the number of markers needed for prediction by filtering out low-signal markers, with the optimal marker set determined by cross-validation. Principal components are computed from the reduced marker matrix (called supervised principal components, SPC) and included in the smoothing spline ANOVA model as independent variables to fit the data. The new method was evaluated against current popular GS methods, specifically RR-BLUP, BayesA, and BayesB, as well as a newer method by Crossa et al., RKHS-M, using both simulated and real data. Results demonstrate that pRKHS generally delivers greater predictive ability, particularly when epistasis impacts trait expression. Beyond prediction, the new method also facilitates inferences about the extent to which epistasis influences trait expression.
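The screening and principal-component steps of SPCA can be sketched as follows. This is a simplified illustration under stated assumptions: markers are scored by absolute Pearson correlation with the phenotype, a fixed number `keep` is retained instead of choosing the set by cross-validation, only the leading component is extracted (by power iteration), continuous toy data stand in for genotype codes, and the smoothing spline ANOVA fit that pRKHS performs afterwards is omitted. All function names are hypothetical.

```python
import math
import random


def pearson(u, v):
    """Sample Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def screen_markers(X, y, keep):
    """Indices of the `keep` columns of X most correlated (in absolute value) with y."""
    p = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(p)]
    top = sorted(range(p), key=lambda j: scores[j], reverse=True)[:keep]
    return sorted(top)


def first_pc(X, cols, iters=200):
    """Leading principal component (loadings, scores) of the centred columns
    `cols`, computed by power iteration on Z'Z."""
    n, m = len(X), len(cols)
    Z = [[X[i][j] for j in cols] for i in range(n)]
    means = [sum(row[k] for row in Z) / n for k in range(m)]
    Z = [[row[k] - means[k] for k in range(m)] for row in Z]
    v = [1.0] * m  # start vector, assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        s = [sum(a * b for a, b in zip(row, v)) for row in Z]  # Z v
        w = [sum(Z[i][k] * s[i] for i in range(n)) for k in range(m)]  # Z'Z v
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in Z]
    return v, scores


rng = random.Random(0)
y = [rng.gauss(0, 1) for _ in range(50)]
# Toy data: columns 0 and 1 track the phenotype; columns 2 and 3 are pure noise
X = [[yi + rng.gauss(0, 0.1), yi + rng.gauss(0, 0.1), rng.gauss(0, 1), rng.gauss(0, 1)]
     for yi in y]
kept = screen_markers(X, y, keep=2)
loadings, spc = first_pc(X, kept)
print(kept, loadings)
```

Screening before the PCA is what makes the components "supervised": the components summarize only the markers that already carry phenotype signal, rather than the dominant variation across all markers.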
Affiliation(s)
- Xiaochun Sun
- Department of Crop Sciences and the Illinois Plant Breeding Center, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Ping Ma
- Department of Statistics, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Rita H. Mumm
- Department of Crop Sciences and the Illinois Plant Breeding Center, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
20
21
A Comparison of Error Variance Estimates in Nonparametric Mixed Models. COMMUN STAT-THEOR M 2012. [DOI: 10.1080/03610926.2010.529526]
22
Lu T, Liang H, Li H, Wu H. High Dimensional ODEs Coupled with Mixed-Effects Modeling Techniques for Dynamic Gene Regulatory Network Identification. J Am Stat Assoc 2012. [PMID: 23204614 DOI: 10.1198/jasa.2011.ap10194]
Abstract
Gene regulation is a complicated process. The interaction of many genes and their products forms an intricate biological network. Identification of this dynamic network will help us understand the biological process in a systematic way. However, the construction of such a dynamic network is very challenging for a high-dimensional system. In this article we propose to use a set of ordinary differential equations (ODE), coupled with dimensional reduction by clustering and mixed-effects modeling techniques, to model the dynamic gene regulatory network (GRN). The ODE models allow us to quantify both positive and negative gene regulations as well as feedback effects of one set of genes in a functional module on the dynamic expression changes of the genes in another functional module, which results in a directed graph network. A five-step procedure, Clustering, Smoothing, regulation Identification, parameter Estimates refining and Function enrichment analysis (CSIEF), is developed to identify the ODE-based dynamic GRN. In the proposed CSIEF procedure, a series of cutting-edge statistical methods and techniques are employed, including non-parametric mixed-effects models with a mixture distribution for clustering, nonparametric mixed-effects smoothing-based methods for ODE models, the smoothly clipped absolute deviation (SCAD)-based variable selection, and the stochastic approximation EM (SAEM) approach for mixed-effects ODE model parameter estimation. The key step of the proposed procedure, SCAD-based variable selection, is justified by investigating its asymptotic properties and validated by Monte Carlo simulations. We apply the proposed method to identify the dynamic GRN for yeast cell cycle progression data. We are able to annotate the identified modules through function enrichment analyses. Some interesting biological findings are discussed. The proposed procedure is a promising tool for constructing a general dynamic GRN and more complicated dynamic networks.
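The SCAD penalty used in the variable selection step has a closed-form thresholding rule for a single coefficient (as introduced by Fan and Li, 2001), which shows how it differs from the lasso: small estimates are soft-thresholded to zero while large ones are left unshrunk. The sketch assumes the conventional a = 3.7 and the scalar problem of minimizing ½(z − θ)² + p_λ(θ); it is not the authors' full selection step inside the ODE model, and the function names are hypothetical.

```python
import math


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(|theta|): linear near zero, a quadratic blend, then constant."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return -(t * t - 2 * a * lam * t + lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2


def scad_threshold(z, lam, a=3.7):
    """Minimiser over theta of 0.5 * (z - theta)**2 + scad_penalty(theta, lam, a), a > 2."""
    s = math.copysign(1.0, z)
    t = abs(z)
    if t <= 2 * lam:
        return s * max(t - lam, 0.0)  # soft-thresholding zone (same as the lasso)
    if t <= a * lam:
        return ((a - 1) * z - s * a * lam) / (a - 2)  # transition zone
    return z  # large signals are left unshrunk


for z in (0.5, 1.5, 3.0, 5.0):
    print(z, scad_threshold(z, lam=1.0))
```

With a > 2 the penalized objective is convex in θ for this scalar problem, so the three-zone rule is its unique minimizer and is continuous in z, which avoids the instability of hard thresholding.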
Affiliation(s)
- Tao Lu
- Department of Biostatistics and Computational Biology, School of Medicine and Dentistry, University of Rochester, Rochester, New York 14642
23
Shang Z. Convergence rate and Bahadur type representation of general smoothing spline M-estimates. Electron J Stat 2010. [DOI: 10.1214/10-ejs588]
24
25
Du P. Nonparametric modeling of the gap time in recurrent event data. LIFETIME DATA ANALYSIS 2009; 15:256-277. [PMID: 19123038 DOI: 10.1007/s10985-008-9110-4] [Received: 05/20/2008] [Accepted: 12/15/2008]
Abstract
Recurrent event data arise in many biomedical and engineering studies when failure events can occur repeatedly over time for each study subject. In this article, we are interested in nonparametric estimation of the hazard function for gap time. A penalized likelihood model is proposed to estimate the hazard as a function of both gap time and covariate. A method for smoothing parameter selection is developed from subject-wise cross-validation. Confidence intervals for the hazard function are derived using the Bayes model of the penalized likelihood. An eigenvalue analysis establishes the asymptotic convergence rates of the relevant estimates. Empirical studies are performed to evaluate various aspects of the method. The proposed technique is demonstrated through an application to the well-known bladder tumor data.
Affiliation(s)
- Pang Du
- Department of Statistics, Virginia Tech, Blacksburg, VA 24061, USA.
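Subject-wise cross-validation, as used for smoothing-parameter selection in the gap-time abstract, leaves out all observations of a subject together, so that within-subject correlation does not leak into the validation error. The sketch below is hypothetical: it assumes a leave-one-subject-out scheme and replaces the paper's penalized hazard likelihood with a simpler penalized least-squares fit, using a user-supplied basis matrix `X` and penalty matrix `P`.

```python
import numpy as np

def subjectwise_cv_score(X, y, subjects, P, lam):
    """Leave-one-subject-out CV error of beta = argmin ||y - X beta||^2 + lam * beta' P beta."""
    X, y, subjects = np.asarray(X, float), np.asarray(y, float), np.asarray(subjects)
    sq_err = 0.0
    for s in np.unique(subjects):
        test = subjects == s                 # hold out every observation of subject s together
        train = ~test
        A = X[train].T @ X[train] + lam * P  # penalized normal equations on the remaining subjects
        beta = np.linalg.solve(A, X[train].T @ y[train])
        sq_err += np.sum((y[test] - X[test] @ beta) ** 2)
    return sq_err / len(y)                   # mean squared prediction error over all observations

def select_lambda(X, y, subjects, P, grid):
    """Return the grid value with the smallest subject-wise CV error, plus all scores."""
    scores = [subjectwise_cv_score(X, y, subjects, P, lam) for lam in grid]
    return grid[int(np.argmin(scores))], scores
```

In a smoothing application, `X` would hold spline basis evaluations and `P` the roughness penalty matrix; both, like the squared-error loss, are stand-ins chosen for illustration.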
26
Xu W, Zhu L. Kernel-based Generalized Cross-validation in Non-parametric Mixed-effect Models. Scand Stat Theory Appl 2009. [DOI: 10.1111/j.1467-9469.2008.00625.x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
27
Xie B, Pan W, Shen X. Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables. Electron J Stat 2008; 2:168-212. [PMID: 19920875 DOI: 10.1214/08-ejs194] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.6] [Indexed: 11/19/2022]
Abstract
Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying clustering structures. Hence, removing noise variables via variable selection is necessary. For simultaneous variable selection and parameter estimation, existing penalized likelihood approaches in model-based clustering analysis all assume a common diagonal covariance matrix across clusters, which, however, may not hold in practice. To analyze high-dimensional data, particularly those with relatively low sample sizes, this article introduces a novel approach that shrinks the variances together with the means, in a more general situation with cluster-specific (diagonal) covariance matrices. Furthermore, a specific form of penalty permits selection of grouped variables, by including or excluding a group of variables altogether, which facilitates incorporating subject-matter knowledge, such as gene functions in clustering microarray samples for disease subtype discovery. For implementation, EM algorithms are derived for parameter estimation, in which the M-steps clearly demonstrate the effects of shrinkage and thresholding. Numerical examples, including an application to acute leukemia subtype discovery with microarray gene expression data, are provided to demonstrate the utility and advantage of the proposed method.
Affiliation(s)
- Benhuai Xie
- Division of Biostatistics, School of Public Health, University of Minnesota,
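The shrinkage and thresholding in the penalized-clustering M-steps can be illustrated for the cluster means alone. As a minimal sketch, assume centered data, an E-step that has already produced posterior membership probabilities `tau`, fixed cluster-specific variances, and a plain L1 penalty lam * sum |mu_kj|. Under these assumptions, maximizing the penalized expected log-likelihood in mu_kj gives a soft-thresholded weighted mean with threshold lam * sigma2_kj / sum_i tau_ik. The paper's penalties also shrink the variances and act on grouped variables, which this sketch omits.

```python
import numpy as np

def l1_mean_update(X, tau, sigma2, lam):
    """Penalized M-step for cluster means under lam * sum_{k,j} |mu_kj|.

    X: n x p centered data; tau: n x K E-step membership probabilities;
    sigma2: K x p cluster-specific diagonal variances (held fixed here).
    """
    X, tau, sigma2 = np.asarray(X, float), np.asarray(tau, float), np.asarray(sigma2, float)
    n_k = tau.sum(axis=0)                  # effective cluster sizes, length K
    mu_tilde = (tau.T @ X) / n_k[:, None]  # unpenalized weighted means, K x p
    thresh = lam * sigma2 / n_k[:, None]   # variable- and cluster-specific thresholds
    # Soft thresholding: means whose magnitude falls below the threshold are set exactly to zero.
    return np.sign(mu_tilde) * np.maximum(np.abs(mu_tilde) - thresh, 0.0)
```

A variable whose estimated mean is zero in every cluster no longer separates the clusters. This is how a penalty of this kind screens out noise variables.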
28
Liu A, Wang Y. Modeling of hormone secretion-generating mechanisms with splines: a pseudo-likelihood approach. Biometrics 2007; 63:201-8. [PMID: 17447946 DOI: 10.1111/j.1541-0420.2006.00672.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]
Abstract
A flexible and robust approach is proposed for the investigation of underlying hormone secretion-generating mechanisms. Characterizing hormone time series is a difficult task as most hormones are secreted in a pulsatile manner and pulses are often masked by slow decay. We model hormone concentration as a filtered counting process where the intensity function of the counting process is modeled nonparametrically using periodic splines. The intensity function and parameters are estimated using a combination of weighted least squares and pseudo-likelihood based on the first two moments. Our method uses concentration measurements directly, which avoids the difficult task of estimating pulse numbers and locations. Both simulations and applications suggest that our method performs well for estimating the intensity function of the pulse-generating counting processes.
Affiliation(s)
- Anna Liu
- Department of Mathematics and Statistics, University of Massachusetts, Amherst, Massachusetts 01003, USA
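The moment-based fitting in the hormone-secretion abstract relies on the mean and variance of a filtered counting process. As a hypothetical sketch, assume the pulse times form an inhomogeneous Poisson process with intensity lambda(s) on [0, t], and each pulse contributes A * exp(-k (t - s)) to the concentration. Campbell's theorem then gives E C(t) = integral of lambda(s) A exp(-k (t - s)) ds and Var C(t) = integral of lambda(s) A^2 exp(-2k (t - s)) ds. The exponential decay kernel, constant amplitude, and Poisson assumption are illustrative choices, not the paper's exact model.

```python
import numpy as np

def _trapezoid(y, x):
    """Trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def shot_noise_moments(t, intensity, amplitude, decay, n_grid=20001):
    """Mean and variance of C(t) = sum_{tau_i <= t} amplitude * exp(-decay * (t - tau_i)),
    assuming the pulse times tau_i are a Poisson process on [0, t] with rate intensity(s)."""
    s = np.linspace(0.0, t, n_grid)
    kernel = amplitude * np.exp(-decay * (t - s))  # contribution at time t of a pulse at time s
    lam = intensity(s)                             # vectorized intensity function
    mean = _trapezoid(lam * kernel, s)             # Campbell: E C(t) = int lam(s) h(t - s) ds
    var = _trapezoid(lam * kernel**2, s)           # Var C(t) = int lam(s) h(t - s)^2 ds
    return mean, var

# Example with a hypothetical 24-hour periodic intensity.
m, v = shot_noise_moments(48.0, lambda s: 2.0 + np.sin(2 * np.pi * s / 24.0), amplitude=1.5, decay=0.3)
```

Because both moments are linear in the intensity, a spline representation of lambda(s) enters them linearly. This is one reason moment-based estimation is attractive when pulse times are unobserved.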
29
Ma P, Wang P, Tenorio L, de Hoop MV, van der Hilst RD. Imaging of structure at and near the core-mantle boundary using a generalized Radon transform: 2. Statistical inference of singularities. J Geophys Res 2007. [DOI: 10.1029/2006jb004513] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]