1. Dzinza R, Ngwira A. Comparing parametric and Cox regression models using HIV/AIDS survival data from a retrospective study in Ntcheu district in Malawi. J Public Health Res 2022; 11:22799036221125328. [PMID: 36185416] [PMCID: PMC9523851] [DOI: 10.1177/22799036221125328]
Abstract
Background: The study was designed to compare parametric and Cox regression survival models and to determine risk factors for death due to HIV/AIDS. Design and methods: The models were fitted to time from ART initiation to death due to HIV/AIDS, using data collected from the records of 6670 patients who registered for ART from 2007 to 2012 at Ntcheu district hospital in Malawi. The best-fitting model was used to determine risk factors for death due to HIV/AIDS. Results: The exponential and Gompertz models competed very well with the Cox regression. Patients in WHO clinical stage 4 (HR = 1.69, p < 0.001) and male patients (HR = 1.74, p < 0.001) had a higher hazard of death than those in WHO clinical stage 3 and female patients, respectively. Patients with a high body mass index (HR = 0.82, p < 0.001) had a lower hazard of death than those with a lower body mass index. Conclusions: Parametric models may perform as well as the Cox regression, and the plausibility of all models needs to be investigated in order to use the correct model for accurate inference. Furthermore, strategies to limit deaths due to HIV/AIDS should initiate ART early, before WHO clinical stage 4, and males should receive special attention. The strategies should also aim at improving patients' body mass index.
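As a sketch of the simplest parametric competitor in such comparisons, the exponential model's constant hazard has a closed-form MLE under right censoring: events divided by total follow-up time, so a crude between-group hazard ratio is a ratio of rates. All numbers below are hypothetical and `exp_rate` is an illustrative helper, not the study's code.

```python
# Illustrative sketch (not the authors' analysis): under an exponential
# (constant-hazard) model with right censoring, the MLE of the hazard is
#   lambda_hat = (number of deaths) / (total person-time at risk),
# so a crude hazard ratio between two groups is the ratio of their rates.
def exp_rate(times, events):
    """MLE of the exponential hazard: events / total follow-up time."""
    return sum(events) / sum(times)

# Hypothetical follow-up times (months) and death indicators for two groups.
male_t, male_d = [6, 12, 24, 3, 18], [1, 1, 0, 1, 0]
female_t, female_d = [10, 30, 24, 36, 15], [1, 0, 0, 0, 1]

hr = exp_rate(male_t, male_d) / exp_rate(female_t, female_d)  # > 1 here
```

A richer parametric model (Gompertz, Weibull) replaces the constant hazard with a time-varying one, but the censoring-aware likelihood logic is the same.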
Affiliation(s)
- Rabson Dzinza: Mathematics and Statistics Department, Nalikule College of Education, Lilongwe, Malawi
- Alfred Ngwira: Basic Sciences Department, Lilongwe University of Agriculture and Natural Resources, Lilongwe, Malawi
2. Ren J, Tapert S, Fan CC, Thompson WK. A semi-parametric Bayesian model for semi-continuous longitudinal data. Stat Med 2022; 41:2354-2374. [PMID: 35274335] [DOI: 10.1002/sim.9359]
Abstract
Semi-continuous data present challenges in both model fitting and interpretation. Parametric distributions may be inappropriate for the extremely long right tails of the data, and mean effects of covariates, which are susceptible to extreme values, may fail to capture relevant information for most of the sample. We propose a two-component semi-parametric Bayesian mixture model, with the discrete component captured by a probability mass (typically at zero) and the continuous component of the density modeled by a mixture of B-spline densities that can be flexibly fit to any data distribution. The model includes subject-level random effects to allow for application to longitudinal data. We specify prior distributions on parameters and perform model inference using a Markov chain Monte Carlo (MCMC) Gibbs-sampling algorithm programmed in R. Statistical inference can be made for multiple quantiles of the covariate effects simultaneously, providing a comprehensive view. Various MCMC sampling techniques are used to facilitate convergence. We demonstrate the performance and the interpretability of the model via simulations and analyses of data on alcohol binge drinking from the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA) study.
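The two-part structure the abstract describes can be illustrated with a minimal, non-Bayesian summary: a point mass at zero plus quantiles of the positive part, which are less outlier-sensitive than means. `two_part_summary` and the sample counts below are hypothetical.

```python
# A minimal two-part summary of semi-continuous data (a sketch of the idea,
# not the authors' B-spline mixture model): a point mass at zero plus a
# separate summary of the positive part, reported via quantiles rather than
# an outlier-sensitive mean.
def two_part_summary(y, probs=(0.25, 0.5, 0.75)):
    zeros = sum(1 for v in y if v == 0)
    pos = sorted(v for v in y if v > 0)

    def quantile(p):  # simple nearest-rank quantile of the positive part
        return pos[min(int(p * len(pos)), len(pos) - 1)]

    return {"p_zero": zeros / len(y),
            "pos_quantiles": [quantile(p) for p in probs]}

# Hypothetical drinking-day counts: many zeros and a long right tail.
sample = [0, 0, 0, 0, 1, 1, 2, 3, 5, 30]
summary = two_part_summary(sample)
```

The quantile view makes it visible that the extreme value (30) shifts the mean of the positive part far more than it shifts the median.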
Affiliation(s)
- Junting Ren: Division of Biostatistics, Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, California, USA; Population Neuroscience and Genetics Lab, University of California San Diego, La Jolla, California, USA
- Susan Tapert: Department of Psychiatry, University of California San Diego, La Jolla, California, USA
- Chun Chieh Fan: Population Neuroscience and Genetics Lab, University of California San Diego, La Jolla, California, USA; Center for Human Development, University of California San Diego, La Jolla, California, USA
- Wesley K Thompson: Population Neuroscience and Genetics Lab, University of California San Diego, La Jolla, California, USA; Department of Radiology, University of California San Diego, La Jolla, California, USA
3. Rønneberg L, Cremaschi A, Hanes R, Enserink JM, Zucknick M. bayesynergy: flexible Bayesian modelling of synergistic interaction effects in in vitro drug combination experiments. Brief Bioinform 2021; 22:bbab251. [PMID: 34308471] [PMCID: PMC8575029] [DOI: 10.1093/bib/bbab251]
Abstract
The effect of cancer therapies is often tested pre-clinically via in vitro experiments, where the post-treatment viability of the cancer cell population is measured through assays estimating the number of viable cells. In this way, large libraries of compounds can be tested, comparing the efficacy of each treatment. Drug interaction studies focus on the quantification of the additional effect encountered when two drugs are combined, as opposed to using the treatments separately. In the bayesynergy R package, we implement a probabilistic approach for the description of the drug combination experiment, where the observed dose response curve is modelled as a sum of the expected response under a zero-interaction model and an additional interaction effect (synergistic or antagonistic). Although the model formulation makes use of the Bliss independence assumption, we note that the posterior estimates of the dose-response surface can also be used to extract synergy scores based on other reference models, which we illustrate for the Highest Single Agent model. The interaction is modelled in a flexible manner, using a Gaussian process formulation. Since the proposed approach is based on a statistical model, it allows the natural inclusion of replicates, handles missing data and uneven concentration grids, and provides uncertainty quantification around the results. The model is implemented in the open-source Stan programming language providing a computationally efficient sampler, a fast approximation of the posterior through variational inference, and features parallel processing for working with large drug combination screens.
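The Bliss-independence baseline the package builds on is easy to state: if drugs A and B individually leave viability fractions f_a and f_b, the zero-interaction expectation for the combination is f_a * f_b, and the interaction effect is the observed deviation from it. A minimal sketch (not the bayesynergy implementation, which models whole dose-response surfaces with a Gaussian process):

```python
# Sketch of the Bliss-independence reference used as the zero-interaction
# baseline (illustrative, not the bayesynergy package): if drugs A and B
# individually leave fractions f_a and f_b of cells viable, the expected
# combined viability under no interaction is f_a * f_b.
def bliss_interaction(f_a, f_b, f_observed):
    expected = f_a * f_b
    return f_observed - expected  # negative => synergy, positive => antagonism

# Hypothetical single-dose example: each drug alone leaves 60% and 50%
# viability; the combination is observed to leave only 20%.
delta = bliss_interaction(0.6, 0.5, 0.2)  # below the 30% Bliss expectation
```

bayesynergy estimates this kind of deviation jointly across the whole concentration grid, with uncertainty, rather than dose by dose.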
Affiliation(s)
- Leiv Rønneberg: Oslo Centre for Biostatistics and Epidemiology (OCBE), University of Oslo, Norway
- Andrea Cremaschi: Singapore Institute for Clinical Sciences (SICS), A*STAR, Singapore
- Robert Hanes: Department of Molecular Cell Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Montebello, Oslo 0379, Norway; Centre for Cancer Cell Reprogramming, Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway
- Jorrit M Enserink: Department of Molecular Cell Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Montebello, Oslo 0379, Norway; Centre for Cancer Cell Reprogramming, Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway; Department of Biosciences, Faculty of Mathematics and Natural Sciences, University of Oslo, PO Box 1066 Blindern, Oslo 0316, Norway
- Manuela Zucknick: Oslo Centre for Biostatistics and Epidemiology (OCBE), University of Oslo, Norway
4. Enyew BY, Asfaw ZG. Comparison of survival models and assessment of risk factors for survival of cardiovascular patients at Addis Ababa Cardiac Center, Ethiopia: a retrospective study. Afr Health Sci 2021; 21:1201-1213. [PMID: 35222583] [PMCID: PMC8843306] [DOI: 10.4314/ahs.v21i3.29]
Abstract
Background: Cardiovascular diseases (CVDs) are disorders of the heart and blood vessels. They are a major health problem across the world, and 82% of CVD deaths occur in low- and middle-income countries. The aim of this study was to choose an appropriate model for the survival of cardiovascular patients and to identify the factors that affect their survival at the Addis Ababa Cardiac Center. Methods: A retrospective study was conducted on patients under follow-up at the Addis Ababa Cardiac Center between September 2010 and December 2018. The patients included were either pre- or post-operation. Out of 1042 cardiac patients, a sample of 332 was selected using simple random sampling. Non-parametric, semi-parametric and parametric survival models were fitted and compared to select the best-predicting model. Results: Among the sample of 332 cardiac patients, only 67 (20.2%) experienced the event and the remaining 265 (79.8%) were censored. The median and maximum survival times of cardiac patients were 1925 and 1403 days, respectively. The estimated hazard ratio of male to female patients was 1.926214 (95% CI: 1.111917–3.336847; p = 0.019), implying that the risk of death for male patients was 1.926214 times that of female patients, keeping the other covariates constant. Although all semi-parametric and parametric survival models fitted the data well, various model-comparison criteria showed that the parametric Weibull AFT survival model was the best. Conclusions: Governmental and non-governmental stakeholders should provide training on the risk factors identified in this study to improve individuals' knowledge and awareness, so that deaths due to CVDs can be minimized.
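The Weibull AFT form selected in such comparisons can be sketched directly: covariates act multiplicatively on the Weibull scale, so exp(beta) is a survival-time acceleration factor. The parameters below are hypothetical, not fitted values from the study.

```python
import math

# Sketch of the Weibull AFT model form (illustrative parameters, not the
# study's fit): S(t) = exp(-(t / scale)^shape), with covariates acting
# multiplicatively on the scale, scale_x = scale0 * exp(x . beta), so
# exp(beta) is the time "acceleration" factor for a one-unit change in x.
def weibull_survival(t, shape, scale):
    return math.exp(-((t / scale) ** shape))

def aft_scale(scale0, betas, x):
    return scale0 * math.exp(sum(b * xi for b, xi in zip(betas, x)))

# Hypothetical parameters: baseline scale 1000 days, shape 1.2, and one
# covariate (male = 1) with beta = -0.5, i.e. compressed survival times.
s_male = weibull_survival(365, 1.2, aft_scale(1000, [-0.5], [1]))
s_female = weibull_survival(365, 1.2, aft_scale(1000, [-0.5], [0]))
```

A negative coefficient shrinks the scale, so the male one-year survival probability comes out below the female one, mirroring the direction of the reported hazard ratio.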
5. Mújica-Mota RE, Landa P, Pitt M, Allen M, Spencer A. The heterogeneous causal effects of neonatal care: a model of endogenous demand for multiple treatment options based on geographical access to care. Health Econ 2020; 29:46-60. [PMID: 31746059] [DOI: 10.1002/hec.3970]
Abstract
Neonatal units in the UK are organised into three levels: Neonatal Intensive Care Unit (NICU, the highest), Local Neonatal Unit (LNU), and Special Care Unit (SCU, the lowest). We model the endogenous selection of the neonatal care unit of birth to estimate the average and marginal treatment effects of the different unit designations on infant mortality, length of stay and hospital costs. We use prognostic factors, survival and hospital care use data on all preterm births in England for 2014-2015, supplemented by national reimbursement tariffs and instrumental variables of travel time from a geographic information system. The data were consistent with a model of demand for preterm birth care driven by physical access. In-hospital mortality of infants born before 32 weeks was 8.5% overall, and 1.2 (95% CI: -0.7, 3.2) percentage points lower for live births in hospitals with a NICU or SCU compared to those with an LNU, according to instrumental variable estimates. We find imprecise differences in average total hospital costs by unit designation, with positive unobserved selection of those with higher unexplained absolute and incremental costs into NICUs. Our results suggest limited scope for improving infant mortality by increasing in-utero transfers based on unit designation alone.
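The instrumental-variable logic (travel time as an instrument for unit of birth) reduces, in the single-instrument case, to the Wald estimator cov(z, y) / cov(z, x). A toy sketch with made-up numbers, not the paper's estimation procedure:

```python
# Sketch of the instrumental-variable idea (a binary proximity instrument
# standing in for travel time; purely illustrative numbers, not the paper's
# multi-treatment model): with one instrument z, treatment x and outcome y,
# the Wald/IV estimate of the treatment effect is cov(z, y) / cov(z, x).
def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def iv_estimate(z, x, y):
    return cov(z, y) / cov(z, x)

z = [0, 0, 1, 1, 0, 1]               # hypothetical: lives close to a NICU
x = [0, 1, 1, 1, 0, 1]               # hypothetical: delivered at a NICU
y = [3.0, 2.0, 1.0, 1.5, 2.5, 1.2]   # hypothetical outcome score
beta_iv = iv_estimate(z, x, y)
```

The instrument shifts treatment without (by assumption) directly shifting the outcome, which is what licenses the ratio as a causal effect.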
Affiliation(s)
- Rubén E Mújica-Mota: University of Leeds Medical School, Leeds Institute of Health Sciences, Leeds, UK
- Paolo Landa: Department of Economics, University of Genoa, Genoa, Italy
- Martin Pitt: University of Exeter Medical School, Institute of Health Research, Exeter, UK
- Mike Allen: University of Exeter Medical School, Institute of Health Research, Exeter, UK
- Anne Spencer: University of Exeter Medical School, Institute of Health Research, Exeter, UK
6. Balzer LB, Zheng W, van der Laan MJ, Petersen ML. A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure. Stat Methods Med Res 2019; 28:1761-1780. [PMID: 29921160] [PMCID: PMC6173669] [DOI: 10.1177/0962280218774936]
Abstract
We often seek to estimate the impact of an exposure naturally occurring or randomly assigned at the cluster-level. For example, the literature on neighborhood determinants of health continues to grow. Likewise, community randomized trials are applied to learn about real-world implementation, sustainability, and population effects of interventions with proven individual-level efficacy. In these settings, individual-level outcomes are correlated due to shared cluster-level factors, including the exposure, as well as social or biological interactions between individuals. To flexibly and efficiently estimate the effect of a cluster-level exposure, we present two targeted maximum likelihood estimators (TMLEs). The first TMLE is developed under a non-parametric causal model, which allows for arbitrary interactions between individuals within a cluster. These interactions include direct transmission of the outcome (i.e. contagion) and influence of one individual's covariates on another's outcome (i.e. covariate interference). The second TMLE is developed under a causal sub-model assuming the cluster-level and individual-specific covariates are sufficient to control for confounding. Simulations compare the alternative estimators and illustrate the potential gains from pairing individual-level risk factors and outcomes during estimation, while avoiding unwarranted assumptions. Our results suggest that estimation under the sub-model can result in bias and misleading inference in an observational setting. Incorporating working assumptions during estimation is more robust than assuming they hold in the underlying causal model. We illustrate our approach with an application to HIV prevention and treatment.
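The targeting step at the heart of any TMLE can be sketched in the simplest, non-hierarchical case: fluctuate an initial outcome regression along a "clever covariate" until the efficient-score equation is solved, then take the plug-in estimate. Everything below (data, initial fit, propensity) is hypothetical; the paper's estimators add cluster-level machinery on top of this idea.

```python
import math

# Minimal sketch of the TMLE targeting step for the average treatment
# effect with a binary outcome (illustrative only). Inputs: an initial
# outcome regression qbar(a, w) and a propensity g(w) = P(A=1 | W=w).
def expit(x):
    return 1 / (1 + math.exp(-x))

def logit(p):
    return math.log(p / (1 - p))

def tmle_ate(data, qbar, g):
    # clever covariate H(a, w) = a/g(w) - (1-a)/(1-g(w))
    H = [a / g(w) - (1 - a) / (1 - g(w)) for (w, a, y) in data]

    def score(eps):  # efficient-score equation in the fluctuation eps
        return sum(h * (y - expit(logit(qbar(a, w)) + eps * h))
                   for h, (w, a, y) in zip(H, data))

    lo, hi = -5.0, 5.0  # the score is decreasing in eps, so bisect
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if score(mid) > 0 else (lo, mid)
    eps = (lo + hi) / 2
    # targeted plug-in estimate of E[Qbar(1, W)] - E[Qbar(0, W)]
    return sum(expit(logit(qbar(1, w)) + eps / g(w))
               - expit(logit(qbar(0, w)) - eps / (1 - g(w)))
               for (w, a, y) in data) / len(data)

# Toy (w, a, y) triples with a hypothetical initial fit and known propensity.
toy = [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1), (1, 1, 0), (0, 0, 1)]
ate = tmle_ate(toy, qbar=lambda a, w: 0.3 + 0.3 * a, g=lambda w: 0.5)
```

The fluctuation on the logit scale is what makes the final plug-in estimate solve the efficient-score equation, the property that yields the estimator's double robustness and efficiency.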
Affiliation(s)
- Laura B Balzer: Department of Biostatistics & Epidemiology, School of Public Health & Health Sciences, University of Massachusetts, Amherst, MA, USA
- Mark J van der Laan: Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA
- Maya L Petersen: Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA
7. Pirracchio R, Yue JK, Manley GT, van der Laan MJ, Hubbard AE. Collaborative targeted maximum likelihood estimation for variable importance measure: Illustration for functional outcome prediction in mild traumatic brain injuries. Stat Methods Med Res 2018; 27:286-297. [PMID: 27363429] [PMCID: PMC5589499] [DOI: 10.1177/0962280215627335]
Abstract
Standard statistical practice for determining the relative importance of competing causes of disease typically relies on ad hoc methods, often byproducts of machine learning procedures (stepwise regression, random forest, etc.). A causal inference framework and data-adaptive methods may help to tailor parameters to match the clinical question and free one from arbitrary modeling assumptions. Our focus is on implementations of such semiparametric methods for a variable importance measure (VIM). We propose a fully automated procedure for VIM estimation based on collaborative targeted maximum likelihood estimation (cTMLE), a method that optimizes the estimate of an association in the presence of potentially numerous competing causes. We applied the approach to data collected from traumatic brain injury patients in a prospective, observational study including three US Level-1 trauma centers. The primary outcome was a disability score (Glasgow Outcome Scale - Extended (GOSE)) collected three months post-injury. We identified clinically important predictors among a set of risk factors using a variable importance analysis based on targeted maximum likelihood estimators (TMLE) and on cTMLE. Via a parametric bootstrap, we demonstrate that the latter procedure has the potential for robust automated estimation of variable importance measures based upon machine-learning algorithms. The cTMLE estimator was associated with substantially less positivity bias than TMLE and with better coverage of the 95% CI. This study confirms the power of an automated cTMLE procedure that can target model selection via machine learning to estimate VIMs in complicated, high-dimensional data.
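The parametric bootstrap used here to validate the procedure can be sketched generically: fit a parametric model, simulate replicate datasets from the fit, and recompute the statistic on each. The normal model and mean statistic below are illustrative stand-ins, not the paper's setup.

```python
import random
import statistics

# Sketch of a parametric bootstrap (illustrative normal model and mean
# statistic, not the paper's VIM procedure): fit a parametric model to the
# data, simulate many datasets from the fit, and recompute the statistic
# on each replicate to approximate its sampling distribution.
def parametric_bootstrap_ci(data, n_boot=2000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    mu, sd = statistics.mean(data), statistics.stdev(data)
    stats = sorted(
        statistics.mean(rng.gauss(mu, sd) for _ in data)
        for _ in range(n_boot))
    return (stats[int(alpha / 2 * n_boot)],
            stats[int((1 - alpha / 2) * n_boot) - 1])

sample = [4.1, 5.0, 5.2, 3.8, 4.7, 5.5, 4.9, 4.4]  # hypothetical scores
ci = parametric_bootstrap_ci(sample)  # percentile interval for the mean
```

Checking whether such intervals cover the truth at the nominal rate is exactly the kind of diagnostic the paper applies to its cTMLE-based importance measures.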
Affiliation(s)
- Romain Pirracchio: Department of Anesthesia and Perioperative Care, UCSF, San Francisco General Hospital, San Francisco, CA, USA
- John K Yue: Brain and Spinal Injury Center, San Francisco, CA, USA; Department of Neurosurgery, University of California San Francisco, San Francisco, CA, USA
- Geoffrey T Manley: Brain and Spinal Injury Center, San Francisco, CA, USA; Department of Neurosurgery, University of California San Francisco, San Francisco, CA, USA
- Mark J van der Laan: Division of Biostatistics, School of Public Health, University of California Berkeley, Berkeley, CA, USA
- Alan E Hubbard: Division of Biostatistics, School of Public Health, University of California Berkeley, Berkeley, CA, USA
8. Tidwell JW, Dougherty MR, Chrabaszcz JS, Thomas RP. Order-constrained linear optimization. Br J Math Stat Psychol 2017; 70:391-411. [PMID: 28239834] [DOI: 10.1111/bmsp.12090]
Abstract
Despite the fact that data and theories in the social, behavioural, and health sciences are often represented on an ordinal scale, there has been relatively little emphasis on modelling ordinal properties. The most common analytic framework used in psychological science is the general linear model, whose variants include ANOVA, MANOVA, and ordinary linear regression. While these methods are designed to provide the best fit to the metric properties of the data, they are not designed to maximally model ordinal properties. In this paper, we develop an order-constrained linear least-squares (OCLO) optimization algorithm that maximizes the linear least-squares fit to the data conditional on maximizing the ordinal fit based on Kendall's τ. The algorithm builds on the maximum rank correlation estimator (Han, 1987, Journal of Econometrics, 35, 303) and the general monotone model (Dougherty & Thomas, 2012, Psychological Review, 119, 321). Analyses of simulated data indicate that when modelling data that adhere to the assumptions of ordinary least squares, OCLO shows minimal bias, little increase in variance, and almost no loss in out-of-sample predictive accuracy. In contrast, under conditions in which data include a small number of extreme scores (fat-tailed distributions), OCLO shows less bias and variance, and substantially better out-of-sample predictive accuracy, even when the outliers are removed. We show that the advantages of OCLO over ordinary least squares in predicting new observations hold across a variety of scenarios in which researchers must decide to retain or eliminate extreme scores when fitting data.
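The order-constrained idea can be illustrated with a coarse grid search (not the published OCLO algorithm): among candidate coefficient vectors, keep those maximizing Kendall's tau between the linear predictor and y, then break ties by least squares.

```python
import itertools

# Sketch of the order-constrained idea behind OCLO (a coarse grid search
# stand-in, not the published algorithm): rank candidate coefficients by
# ordinal fit (Kendall's tau) first, and by squared error second.
def kendall_tau(u, v):
    n = len(u)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (u[i] - u[j]) * (v[i] - v[j])
            s += 1 if prod > 0 else -1 if prod < 0 else 0
    return 2 * s / (n * (n - 1))

def oclo_grid(x1, x2, y, grid):
    best_key, best_coef = None, None
    for b1, b2 in itertools.product(grid, repeat=2):
        pred = [b1 * a + b2 * b for a, b in zip(x1, x2)]
        sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
        key = (-kendall_tau(pred, y), sse)  # ordinal fit first, then LS
        if best_key is None or key < best_key:
            best_key, best_coef = key, (b1, b2)
    return best_coef

x1 = [1, 2, 3, 4, 5]
x2 = [0, 1, 0, 1, 0]
y = [1.1, 2.0, 2.9, 4.2, 4.8]  # roughly y = x1; x2 is irrelevant noise
coef = oclo_grid(x1, x2, y, grid=[0.0, 0.5, 1.0, 1.5])
```

Because tau depends only on the ordering of the predictions, many coefficient vectors tie on the ordinal criterion; the least-squares tiebreak is what pins down a single estimate, here (1.0, 0.0).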
Affiliation(s)
- Joe W Tidwell: Department of Psychology, University of Maryland, College Park, Maryland, USA
- Michael R Dougherty: Department of Psychology, University of Maryland, College Park, Maryland, USA
- Rick P Thomas: Department of Psychology, Georgia Institute of Technology, Atlanta, Georgia, USA
9
Abstract
Motivated by a genetic investigation on the progressive decline in renal function in a clinical trial study of kidney disease, we develop a practical test for evaluating the group difference in trajectories under a semi-parametric modeling framework. For the temporal patterns or trajectories of longitudinal data, B-splines are used to approximate the function non-parametrically. Such approximation asymptotically converts the problem of testing trajectory difference into the significance test of regression coefficients that can be simply estimated by generalized estimating equations. To select the optimal number of inner knots for B-splines, a cross-validation procedure is performed using the criterion of the generalized residual sum of squares. The new proposed test successfully detects a significant difference of underlying genetic impact on the progression of renal disease, which is not captured by the parametric approach.
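The B-spline approximation underlying the test can be evaluated with the standard Cox-de Boor recursion; a self-contained sketch with a hypothetical knot vector, not the paper's implementation:

```python
# Sketch of the Cox-de Boor recursion that evaluates the B-spline basis
# used for non-parametric trajectory approximation (illustrative; the knot
# vector below is a hypothetical choice, not the paper's).
def bspline_basis(i, k, t, knots):
    """Value at t of the i-th B-spline of order k (degree k - 1)."""
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k - 1] != knots[i]:
        left = ((t - knots[i]) / (knots[i + k - 1] - knots[i])
                * bspline_basis(i, k - 1, t, knots))
    if knots[i + k] != knots[i + 1]:
        right = ((knots[i + k] - t) / (knots[i + k] - knots[i + 1])
                 * bspline_basis(i + 1, k - 1, t, knots))
    return left + right

knots = [0, 0, 0, 1, 2, 3, 3, 3]  # clamped quadratic knot vector on [0, 3]
basis = [bspline_basis(i, 3, 1.5, knots) for i in range(5)]  # 5 functions
```

Regressing the longitudinal outcome on these basis values (e.g., via GEE, as in the abstract) turns "is the trajectory different between groups?" into a test on a handful of spline coefficients.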
Affiliation(s)
- Feiyang Niu: Department of Statistics, University of Virginia, VA, USA
- Jianhui Zhou: Department of Statistics, University of Virginia, VA, USA
- Thu H Le: Division of Nephrology, Department of Medicine, University of Virginia, VA, USA
- Jennie Z Ma: Division of Biostatistics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
10. Zhao Q, Strykowski G, Li J, Pan X, Xu X. Evaluation and Comparison of the Processing Methods of Airborne Gravimetry Concerning the Errors Effects on Downward Continuation Results: Case Studies in Louisiana (USA) and the Tibetan Plateau (China). Sensors (Basel) 2017; 17:E1205. [PMID: 28587086] [DOI: 10.3390/s17061205]
Abstract
Gravity data gaps in mountainous areas are nowadays often filled in with data from airborne gravity surveys. Errors caused by the airborne gravimeter sensors and by rough flight conditions cannot be completely eliminated, and the precision of the gravity disturbances generated by airborne gravimetry is around 3-5 mGal. A major obstacle in using airborne gravimetry is the error introduced by downward continuation. To improve the results, external high-accuracy gravity information, e.g., from surface data, can be used for high-frequency correction, while satellite information can be applied for low-frequency correction. Surface data may be used to reduce the systematic errors, while regularization methods can reduce the random errors in downward continuation. Airborne gravity surveys are sometimes conducted in mountainous areas, and the most extreme area of the world for this type of survey is the Tibetan Plateau. Since no high-accuracy surface gravity data are available for this area, the above error-minimization method involving external gravity data cannot be used. We propose a semi-parametric downward continuation method combined with regularization to suppress both the systematic and the random error effects in the Tibetan Plateau, i.e., without the use of external high-accuracy gravity data. We use a Louisiana airborne gravity dataset from the USA National Oceanic and Atmospheric Administration (NOAA) to demonstrate that the new method works effectively. For the Tibetan Plateau, a numerical experiment is successfully conducted using synthetic Earth Gravitational Model 2008 (EGM08)-derived gravity data contaminated with synthetic errors; the systematic errors estimated by the method are close to the simulated values. In addition, we study the relationship between the downward continuation altitude and the error effect. The analysis shows that the proposed semi-parametric method combined with regularization addresses such modelling problems efficiently.
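The role of regularization here can be sketched with plain Tikhonov regularization on a toy ill-conditioned system (the paper's continuation operator is far more involved): replace the normal equations with (A^T A + alpha I) x = A^T b so that near-singular directions are damped rather than amplified.

```python
# Sketch of Tikhonov regularization, the kind of scheme used to damp
# random-error amplification in downward continuation (toy linear system,
# not the paper's operator): solve (A^T A + alpha * I) x = A^T b instead
# of the unstable unregularized normal equations.
def solve(M, v):
    """Gauss-Jordan elimination with partial pivoting (small systems)."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def tikhonov(A, b, alpha):
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m))
            + (alpha if i == j else 0.0) for j in range(n)] for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    return solve(AtA, Atb)

A = [[1.0, 1.0], [1.0, 1.0001], [1.0, 0.9999]]  # nearly rank-deficient
b = [2.0, 2.1, 1.9]
x = tikhonov(A, b, alpha=0.1)  # both coefficients damped, near 0.98 each
```

Without the alpha term, the tiny singular value of A would blow noise in b up into wildly different coefficients; the penalty trades a small bias for that stability.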
11. Shikha M, Kanika A, Rao AR, Mallikarjuna MG, Gupta HS, Nepolean T. Genomic Selection for Drought Tolerance Using Genome-Wide SNPs in Maize. Front Plant Sci 2017; 8:550. [PMID: 28484471] [PMCID: PMC5399777] [DOI: 10.3389/fpls.2017.00550]
Abstract
Traditional breeding strategies that select superior genotypes based on phenotypic traits have proven to be of limited success, as this direct selection is hindered by low heritability, genetic interactions such as epistasis, genotype-environment interactions, and polygenic effects. With the advent of new genomic tools, breeders have paved a way for selecting superior breeds. Genomic selection (GS) has emerged as one of the most important approaches for predicting genotype performance. Here, we tested the breeding values of 240 subtropical maize lines phenotyped for drought in different environments using 29,619 curated SNPs. The prediction accuracies of seven genomic selection models (ridge regression, LASSO, elastic net, random forest, reproducing kernel Hilbert space (RKHS), Bayes A and Bayes B) were tested for the agronomic traits. Though the prediction accuracies of Bayes B, Bayes A and RKHS were comparable, Bayes B outperformed the other models, yielding the highest Pearson correlation coefficient in all three environments. From Bayes B, a set of the top 1053 significant SNPs with higher marker effects was selected across all datasets to validate the genes and QTLs. Of these 1053 SNPs, 77 were associated with 10 drought-responsive transcription factors, which in turn were associated with different physiological and molecular functions (stomatal closure, root development, hormonal signaling and photosynthesis). Of the models tested, Bayes B showed the highest prediction accuracy for our datasets. Our experiments also highlighted several SNPs based on their performance and relative importance to drought tolerance. These results are important for the selection of superior genotypes and candidate genes for breeding drought-tolerant maize hybrids.
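The prediction accuracy compared across these models is simply the Pearson correlation between genomic predictions and observed phenotypes; a small sketch of the metric with made-up values, not the study's data:

```python
import math

# Prediction accuracy in genomic selection is typically reported as the
# Pearson correlation between predicted breeding values and observed
# phenotypes; a sketch of that metric with hypothetical values.
def pearson_r(pred, obs):
    n = len(pred)
    mp, mo = sum(pred) / n, sum(obs) / n
    num = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    den = math.sqrt(sum((p - mp) ** 2 for p in pred)
                    * sum((o - mo) ** 2 for o in obs))
    return num / den

r = pearson_r([10.2, 11.5, 9.8, 12.1, 10.9],   # hypothetical predictions
              [10.0, 11.9, 9.5, 12.4, 10.7])   # hypothetical phenotypes
```

Model comparisons like "Bayes B gave the highest accuracy" amount to comparing this correlation, usually estimated by cross-validation, across fitted models and environments.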
Affiliation(s)
- Mittal Shikha: Division of Genetics, ICAR-Indian Agricultural Research Institute, New Delhi, India
- Arora Kanika: Division of Genetics, ICAR-Indian Agricultural Research Institute, New Delhi, India
- Atmakuri Ramakrishna Rao: Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Hari Shanker Gupta: Division of Genetics, ICAR-Indian Agricultural Research Institute, New Delhi, India; Office of Director General, Borlaug Institute for South Asia, New Delhi, India
13. Jacquin L, Cao TV, Ahmadi N. A Unified and Comprehensible View of Parametric and Kernel Methods for Genomic Prediction with Application to Rice. Front Genet 2016; 7:145. [PMID: 27555865] [PMCID: PMC4977290] [DOI: 10.3389/fgene.2016.00145]
Abstract
One objective of this study was to provide readers with a clear and unified understanding of parametric statistical and kernel methods used for genomic prediction, and to compare some of these in the context of rice breeding for quantitative traits. A further objective was to provide a simple and user-friendly R package, named KRMM, which allows users to perform RKHS regression with several kernels. After introducing the concept of regularized empirical risk minimization, the connections between well-known parametric and kernel methods such as ridge regression [i.e., genomic best linear unbiased predictor (GBLUP)] and reproducing kernel Hilbert space (RKHS) regression were reviewed. Ridge regression was then reformulated so as to show and emphasize the advantage of the kernel “trick” concept, exploited by kernel methods in the context of epistatic genetic architectures, over the parametric frameworks used by conventional methods. Several parametric and kernel methods, namely the least absolute shrinkage and selection operator (LASSO), GBLUP, support vector machine regression (SVR) and RKHS regression, were thereupon compared for their genomic predictive ability in the context of rice breeding using three real data sets. Among the compared methods, RKHS regression and SVR were often the most accurate for prediction, followed by GBLUP and LASSO. An R function which allows users to perform RR-BLUP of marker effects, GBLUP and RKHS regression, with a Gaussian, Laplacian, polynomial or ANOVA kernel, in a reasonable computation time has been developed. Moreover, a modified version of this function, which allows users to tune kernels for RKHS regression, has also been developed and parallelized for HPC Linux clusters. The corresponding KRMM package and all scripts have been made publicly available.
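The kernel methods compared here share one computational core, kernel ridge regression: solve (K + lambda I) alpha = y and predict with f(x) = sum_i alpha_i k(x, x_i). A toy 1-D sketch with a Gaussian kernel (not the KRMM package, which handles genomic marker matrices and several kernels):

```python
import math

# Sketch of RKHS regression with a Gaussian kernel (kernel ridge) on toy
# 1-D data, not the KRMM package: coefficients solve (K + lam * I) a = y,
# and predictions are f(x) = sum_i a_i * k(x, x_i).
def gauss_kernel(a, b, h=1.0):
    return math.exp(-((a - b) ** 2) / (2 * h * h))

def kernel_ridge_fit(xs, ys, lam=0.1):
    n = len(xs)
    M = [[gauss_kernel(xs[i], xs[j]) + (lam if i == j else 0.0)
          for j in range(n)] + [ys[i]] for i in range(n)]
    for c in range(n):                 # Gauss-Jordan elimination
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [a - M[r][c] * b for a, b in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def predict(x, xs, alpha):
    return sum(a * gauss_kernel(x, xi) for a, xi in zip(alpha, xs))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.8, 0.9, 0.1]              # hump-shaped toy phenotype
alpha = kernel_ridge_fit(xs, ys)
yhat = predict(1.0, xs, alpha)
```

The "trick" the abstract emphasizes is visible here: the fit never touches an explicit (possibly infinite-dimensional) feature expansion, only the n-by-n kernel matrix; with a linear kernel, this machinery collapses back to GBLUP.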
Affiliation(s)
- Laval Jacquin
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement, BIOS, UMR AGAP, Montpellier, France
- Tuong-Vi Cao
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement, BIOS, UMR AGAP, Montpellier, France
- Nourollah Ahmadi
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement, BIOS, UMR AGAP, Montpellier, France
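The ridge/RKHS connection described in the abstract above can be illustrated numerically: predictions from primal ridge regression (the marker-effect, RR-BLUP view) coincide with those of the dual, kernel form using the linear kernel K = XXᵀ (the GBLUP view), and swapping in a Gaussian kernel then gives RKHS regression without ever forming explicit feature maps. A minimal NumPy sketch, not the KRMM package itself; the marker counts, bandwidth, and regularization value are illustrative assumptions:

```python
import numpy as np

def kernel_ridge_predict(K_train, K_cross, y, lam):
    """Dual-form ridge: alpha = (K + lam*I)^-1 y, prediction = K_cross @ alpha."""
    n = K_train.shape[0]
    alpha = np.linalg.solve(K_train + lam * np.eye(n), y)
    return K_cross @ alpha

rng = np.random.default_rng(0)
n_lines, n_markers = 50, 200                   # illustrative sizes
X = rng.standard_normal((n_lines, n_markers))  # training marker matrix
y = X @ rng.standard_normal(n_markers) + rng.standard_normal(n_lines)
X_new = rng.standard_normal((5, n_markers))    # candidates to predict

lam = 1.0
# Primal ridge (marker-effect view): w = (X'X + lam*I)^-1 X'y
w = np.linalg.solve(X.T @ X + lam * np.eye(n_markers), X.T @ y)
pred_primal = X_new @ w

# Dual / kernel view with the linear kernel K = X X' (GBLUP-like)
pred_dual = kernel_ridge_predict(X @ X.T, X_new @ X.T, y, lam)
assert np.allclose(pred_primal, pred_dual)  # identical predictions, two views

# The kernel "trick": replace the linear kernel by a Gaussian one,
# capturing non-additive (epistatic-like) signal implicitly
def gaussian_kernel(A, B, h=10.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * h * h))

pred_rkhs = kernel_ridge_predict(gaussian_kernel(X, X),
                                 gaussian_kernel(X_new, X), y, lam)
```

Only the Gram matrix changes between the last two calls, which is exactly the sense in which ridge/GBLUP and RKHS regression sit in one unified framework.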
14
Austin MD, Simon DK, Betensky RA. Computationally simple estimation and improved efficiency for special cases of double truncation. Lifetime Data Anal 2014; 20:335-354. [PMID: 24347050 PMCID: PMC4058384 DOI: 10.1007/s10985-013-9287-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 02/20/2013] [Accepted: 11/28/2013] [Indexed: 06/03/2023]
Abstract
Doubly truncated survival data arise when event times are observed only if they fall within subject-specific time intervals. Existing iterative estimation procedures for doubly truncated data are computationally intensive (Turnbull 38:290-295, 1976; Efron and Petrosian 94:824-825, 1999; Shen 62:835-853, 2010a). These procedures assume that the event time is independent of the truncation times on the sample space that conforms to their requisite ordering; this type of independence is referred to as quasi-independence. In this paper we identify and consider two special cases of quasi-independence: complete quasi-independence and complete truncation dependence. For the case of complete quasi-independence, we derive the nonparametric maximum likelihood estimator in closed form. For the case of complete truncation dependence, we derive a closed-form nonparametric estimator that requires some external information, and a semiparametric maximum likelihood estimator that achieves improved efficiency relative to the standard nonparametric maximum likelihood estimator in the absence of external information. We demonstrate the consistency and potentially improved efficiency of the estimators in simulation studies, and illustrate their use in application to studies of AIDS incubation and Parkinson's disease age of onset.
Affiliation(s)
- Matthew D. Austin
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
- David K. Simon
- Department of Neurology, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, MA 02115, USA
- Rebecca A. Betensky
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
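The sampling scheme described in the abstract above is easy to reproduce in a simulation: each subject carries an interval [L, L + W], and the event time is recorded only when it falls inside that interval, which biases naive summaries of the observed times. A small illustrative sketch; the distributions and window widths are arbitrary assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
T = rng.exponential(scale=5.0, size=n)  # latent event times (mean 5)
L = rng.uniform(0.0, 6.0, size=n)       # left truncation times
W = rng.uniform(2.0, 4.0, size=n)       # subject-specific window widths
R = L + W                               # right truncation times
observed = (L <= T) & (T <= R)          # event recorded only inside [L, R]

# Very late events (T beyond every window) can never be observed and early
# events are under-sampled, so the naive mean of observed times is biased.
print(f"true mean {T.mean():.2f}, naive observed mean {T[observed].mean():.2f}")
```

Correcting this bias without iterative schemes, under the two special dependence structures, is what the paper's closed-form estimators address.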
15
Abstract
The Cox proportional hazards model, and its discrete-time analogue the logistic failure time model, posit highly restrictive parametric models and attempt to estimate parameters that are specific to the model proposed. Despite their flaws, these methods are the ones typically implemented when assessing effect modification in survival analyses. The targeted maximum likelihood estimation (TMLE) methodology is more robust and allows practitioners to estimate parameters that directly answer the question of interest. In this paper, TMLE is used to estimate two newly proposed parameters of interest that quantify effect modification in the time-to-event setting. These methods are then applied to the Tshepo study to assess whether gender or baseline CD4 level modifies the effect of two cART therapies of interest, efavirenz (EFV) and nevirapine (NVP), on the progression of HIV. The results show that women tend to have more favorable outcomes with EFV, while men tend to have more favorable outcomes with NVP. Furthermore, EFV tends to be favorable compared to NVP for individuals with high CD4 levels.
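The discrete-time analogue mentioned above, the logistic failure time model, works by expanding each subject into one row per period at risk and regressing the event indicator on covariates and time with logistic regression. The expansion step can be sketched as follows; the field names and the three-subject example are hypothetical, not from the Tshepo study:

```python
# Expand (followup, event) survival records into person-period rows for a
# discrete-time (logistic) failure time model.
def person_period(records):
    """records: list of (subject_id, followup_periods, event, covariates)."""
    rows = []
    for sid, periods, event, cov in records:
        for t in range(1, periods + 1):
            # event indicator is 1 only in the final period, and only if the
            # subject actually failed there (rather than being censored)
            y = 1 if (t == periods and event) else 0
            rows.append({"id": sid, "period": t, "y": y, **cov})
    return rows

# three hypothetical subjects: failed at period 3, censored at 2, failed at 1
data = [(1, 3, True, {"arm": "EFV"}),
        (2, 2, False, {"arm": "NVP"}),
        (3, 1, True, {"arm": "EFV"})]
rows = person_period(data)
# the expanded rows would then go to any logistic regression routine, with
# period (or a function of it) as a covariate for the baseline hazard
```

Fitting one such model with an arm-by-covariate interaction is the conventional approach to effect modification that the abstract contrasts with TMLE.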