1
Pua YH, Kang H, Thumboo J, Clark RA, Chew ESX, Poon CLL, Chong HC, Yeo SJ. Machine learning methods are comparable to logistic regression techniques in predicting severe walking limitation following total knee arthroplasty. Knee Surg Sports Traumatol Arthrosc 2020; 28:3207-3216. [PMID: 31832697 DOI: 10.1007/s00167-019-05822-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 10/16/2019] [Accepted: 12/05/2019] [Indexed: 01/13/2023]
Abstract
PURPOSE Machine-learning methods are flexible prediction algorithms with potential advantages over conventional regression. This study aimed to use machine learning methods to predict post-total knee arthroplasty (TKA) walking limitation, and to compare their performance with that of logistic regression. METHODS From the department's clinical registry, a cohort of 4026 patients who underwent elective, primary TKA between July 2013 and July 2017 was identified. Candidate predictors included demographics and preoperative clinical, psychosocial, and outcome measures. The primary outcome was severe walking limitation at 6 months post-TKA, defined as a maximum walk time ≤ 15 min. Eight common regression (logistic, penalized logistic, and ordinal logistic with natural splines) and ensemble machine learning (random forest, extreme gradient boosting, and SuperLearner) methods were implemented to predict the probability of severe walking limitation. Models were compared on discrimination and calibration metrics. RESULTS At 6 months post-TKA, 13% of patients had severe walking limitation. Machine learning and logistic regression models performed moderately [mean area under the ROC curves (AUC) 0.73-0.75]. Overall, the ordinal logistic regression model performed best while the SuperLearner performed best among machine learning methods, with negligible differences between them (Brier score difference, < 0.001; 95% CI [-0.0025, 0.002]). CONCLUSIONS When predicting post-TKA physical function, several machine learning methods did not outperform logistic regression, in particular ordinal logistic regression that does not assume linearity in its predictors. LEVEL OF EVIDENCE Prognostic level II.
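The abstract compares models on a Brier score and an AUC. As a minimal sketch of what those two numbers measure, the Python below computes both from predicted probabilities. The function names and the toy labels and probabilities are illustrative assumptions, not the study's data or the authors' code.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def roc_auc(y_true, p_pred):
    """AUC as the probability that a random positive case is scored above
    a random negative case, counting ties as one half."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up outcomes (1 = severe limitation) and predictions from two hypothetical models.
y = [0, 0, 1, 0, 1, 0, 0, 1]
model_a = [0.1, 0.2, 0.7, 0.3, 0.6, 0.1, 0.4, 0.8]
model_b = [0.2, 0.1, 0.6, 0.4, 0.7, 0.2, 0.3, 0.9]

if __name__ == "__main__":
    for name, preds in (("A", model_a), ("B", model_b)):
        print(name, "Brier:", brier_score(y, preds), "AUC:", roc_auc(y, preds))
    print("Brier difference:", brier_score(y, model_a) - brier_score(y, model_b))
```

A lower Brier score is better and a higher AUC is better. Two models can share the same AUC while differing in Brier score, because AUC depends only on the ranking of the predictions.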
Affiliation(s)
- Yong-Hao Pua
- Department of Physiotherapy, Singapore General Hospital, Singapore, Singapore.
- Hakmook Kang
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Julian Thumboo
- Department of Rheumatology and Immunology, Singapore General Hospital, Singapore, Singapore
- Ross Allan Clark
- Research Health Institute, University of the Sunshine Coast, Sunshine Coast, Australia
- Cheryl Lian-Li Poon
- Department of Physiotherapy, Singapore General Hospital, Singapore, Singapore
- Hwei-Chi Chong
- Department of Physiotherapy, Changi General Hospital, Singapore, Singapore
- Seng-Jin Yeo
- Department of Orthopaedic Surgery, Singapore General Hospital, Singapore, Singapore
2
Li H, Wang J, Bao Z. A novel genomic selection method combining GBLUP and LASSO. Genetica 2015; 143:299-304. [PMID: 25655266 DOI: 10.1007/s10709-015-9826-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 08/03/2014] [Accepted: 02/02/2015] [Indexed: 11/29/2022]
Abstract
Genetic prediction of quantitative traits is a critical task in plant and animal breeding. Genomic selection is an accurate and efficient method of estimating genetic merits by using high-density genome-wide single nucleotide polymorphisms (SNP). In the framework of linear mixed models, we extended genomic best linear unbiased prediction (GBLUP) by including additional quantitative trait locus (QTL) information that was extracted from high-throughput SNPs by using the least absolute shrinkage and selection operator (LASSO). GBLUP was combined with three LASSO methods: standard LASSO (SLGBLUP), adaptive LASSO (ALGBLUP), and elastic net (ENGBLUP). These methods were used for detecting QTLs, and the detected QTLs were fitted as fixed effects; the remaining SNPs were fitted using a realized genetic relationship matrix. Simulations performed under distinct scenarios revealed that (1) the prediction accuracy of SLGBLUP was the lowest; (2) the prediction accuracies of ALGBLUP and ENGBLUP were equivalent to or higher than that of GBLUP, except under scenarios in which the number of QTLs was large; and (3) the persistence of prediction accuracy over generations was strongest in the case of ENGBLUP. Building on the favorable computational characteristics of GBLUP, ENGBLUP enables robust modeling and efficient computation to be performed for genomic selection.
Affiliation(s)
- Hengde Li
- Centre for Applied Aquatic Genomics, Chinese Academy of Fishery Sciences, Beijing, 100141, China
3
de Los Campos G, Hickey JM, Pong-Wong R, Daetwyler HD, Calus MPL. Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics 2013; 193:327-45. [PMID: 22745228 PMCID: PMC3567727 DOI: 10.1534/genetics.112.143313] [Citation(s) in RCA: 495] [Impact Index Per Article: 45.0] [Received: 05/18/2012] [Accepted: 06/11/2012] [Indexed: 11/18/2022] Open
Abstract
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade.
Affiliation(s)
- Gustavo de Los Campos
- Department of Biostatistics, School of Public Health, University of Alabama, Birmingham, AL 35294, USA.
4
Zeng J, Pszczola M, Wolc A, Strabel T, Fernando RL, Garrick DJ, Dekkers JCM. Genomic breeding value prediction and QTL mapping of QTLMAS2011 data using Bayesian and GBLUP methods. BMC Proc 2012; 6 Suppl 2:S7. [PMID: 22640755 PMCID: PMC3363161 DOI: 10.1186/1753-6561-6-s2-s7] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The goal of this study was to apply Bayesian and GBLUP methods to predict genomic breeding values (GEBV), map QTL positions and explore the genetic architecture of the trait simulated for the 15th QTL-MAS workshop. METHODS Three methods with models considering dominance and epistasis were used to fit the data: (i) BayesB with a proportion π = 0.995 of SNPs assumed to have no effect, (ii) BayesCπ, where π is considered as unknown, and (iii) GBLUP, which directly fits animal genetic effects using a genomic relationship matrix. RESULTS BayesB, BayesCπ and GBLUP with various fitted models detected 6, 5, and 4 out of 8 simulated QTL, respectively. All five additive QTL were detected by Bayesian methods. When two QTL were in either coupling or repulsion phase, GBLUP only detected one of them and missed the other. In addition, GBLUP yielded more false positives. One imprinted QTL was detected by BayesB and GBLUP even though only additive gene action was assumed. This QTL was missed by BayesCπ. None of the methods found two simulated additive-by-additive epistatic QTL. Variance component estimation correctly detected no evidence for dominance gene action. Bayesian methods predicted additive genetic merit more accurately than GBLUP, and similar accuracies were observed between BayesB and BayesCπ. CONCLUSIONS Bayesian methods and GBLUP mapped QTL to similar chromosome regions but Bayesian methods gave fewer false positives. Bayesian methods can be superior to GBLUP in GEBV prediction when genomic architecture is unknown.
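The abstract does not say how the genomic relationship matrix used by GBLUP was built. One widely used construction, assumed here purely for illustration, is VanRaden's first method: with genotypes coded 0/1/2 and allele frequencies p_j, centre each marker as Z = M - 2p and set G = ZZ' / (2 Σ p_j(1 - p_j)). The sketch below is plain Python; the function name and the toy genotypes are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Build G = Z Z' / (2 * sum_j p_j (1 - p_j)) from 0/1/2 genotype codes.

    genotypes: list of rows, one row per individual, one column per SNP.
    Allele frequencies are estimated from the sample itself (an assumption;
    base-population frequencies are sometimes used instead).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic; G is undefined")
    # Centre each marker column by twice its allele frequency.
    z = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(z[a][j] * z[b][j] for j in range(m)) / scale for b in range(n)]
        for a in range(n)
    ]


# Toy data: 3 individuals by 3 SNPs, values made up for the example.
toy_genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
]

if __name__ == "__main__":
    for row in genomic_relationship_matrix(toy_genotypes):
        print([round(v, 3) for v in row])
```

In a GBLUP model, a matrix of this kind takes the place of the pedigree relationship matrix as the covariance structure of the random genetic effects.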
Affiliation(s)
- Jian Zeng
- Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, USA
- Marcin Pszczola
- Department of Genetics and Animal Breeding, Poznan University of Life Sciences, Poznan, Poland
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, Lelystad, The Netherlands
- Anna Wolc
- Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, USA
- Department of Genetics and Animal Breeding, Poznan University of Life Sciences, Poznan, Poland
- Tomasz Strabel
- Department of Genetics and Animal Breeding, Poznan University of Life Sciences, Poznan, Poland
- Rohan L Fernando
- Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, USA
- Dorian J Garrick
- Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, USA
- Jack CM Dekkers
- Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, USA
5
Le Roy P, Filangi O, Demeure O, Elsen JM. Comparison of analyses of the XVth QTLMAS common dataset III: Genomic Estimations of Breeding Values. BMC Proc 2012; 6 Suppl 2:S3. [PMID: 22640599 PMCID: PMC3363157 DOI: 10.1186/1753-6561-6-s2-s3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND The QTLMAS XVth dataset consisted of pedigree, marker genotypes and quantitative trait performances of animals with a sib family structure. Pedigree and genotypes were available for 3,000 progeny, of which 2,000 were phenotyped. The trait was regulated by 8 QTLs which displayed additive, imprinting or epistatic effects. The 1,000 unphenotyped progeny were considered as candidates for selection and their Genomic Estimated Breeding Values (GEBV) were evaluated by participants of the XVth QTLMAS workshop. This paper aims at comparing the GEBV estimation results obtained by seven participants in the workshop. METHODS From the known QTL genotypes of each candidate, two "true" genomic values (TV) were estimated by organizers: the genotypic value of the candidate (TGV) and the expectation of its progeny genotypic values (TBV). GEBV were computed by the participants following different statistical methods: random linear models (including BLUP and Ridge Regression), variable selection techniques (LASSO, Elastic Net) and Bayesian methods. Accuracy was evaluated by the correlation between TV (TGV or TBV) and GEBV presented by participants. Rank correlation of the best 10% of individuals and error in predictions were also evaluated. Bias was tested by regression of TV on GEBV. RESULTS Large differences between methods were found for all criteria and type of genetic values (TGV, TBV). In general, the criteria consistently ranked methods belonging to the same family. CONCLUSIONS Bayesian methods - A
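The two evaluation criteria described in the METHODS, accuracy as the correlation between TV and GEBV and bias as the regression of TV on GEBV, can be sketched in a few lines of Python. The function names and the toy values are assumptions made for illustration; they are not the workshop data.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regression_slope(response, predictor):
    """Least-squares slope of response regressed on predictor."""
    n = len(predictor)
    mp, mr = sum(predictor) / n, sum(response) / n
    sxy = sum((p - mp) * (r - mr) for p, r in zip(predictor, response))
    sxx = sum((p - mp) ** 2 for p in predictor)
    return sxy / sxx


# Hypothetical true values (TV) and predictions (GEBV) for four candidates.
true_values = [1.0, 2.0, 3.0, 4.0]
gebv = [0.5, 1.0, 1.5, 2.0]

if __name__ == "__main__":
    print("accuracy:", pearson_correlation(true_values, gebv))
    print("slope of TV on GEBV:", regression_slope(true_values, gebv))
```

In this toy case the predictions rank the candidates perfectly but are compressed toward zero, so the correlation is high while the slope departs from 1. A slope near 1 is the usual sign that predictions are neither inflated nor deflated.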
Affiliation(s)
- Pascale Le Roy
- INRA, UMR1348 PEGASE, Domaine de la Prise, 35590 Saint-Gilles, France
- Agrocampus OUEST, UMR1348 PEGASE, 65 rue de St Brieuc, 35042 Rennes, France
- Olivier Filangi
- INRA, UMR1348 PEGASE, Domaine de la Prise, 35590 Saint-Gilles, France
- Agrocampus OUEST, UMR1348 PEGASE, 65 rue de St Brieuc, 35042 Rennes, France
- Olivier Demeure
- INRA, UMR1348 PEGASE, Domaine de la Prise, 35590 Saint-Gilles, France
- Agrocampus OUEST, UMR1348 PEGASE, 65 rue de St Brieuc, 35042 Rennes, France
- Jean-Michel Elsen
- INRA UR0631 SAGA, chemin de borde rouge, BP 52627, 31326 Castanet-Tolosan, France
6
Bastiaansen JWM, Coster A, Calus MPL, van Arendonk JAM, Bovenhuis H. Long-term response to genomic selection: effects of estimation method and reference population structure for different genetic architectures. Genet Sel Evol 2012; 44:3. [PMID: 22273519 PMCID: PMC3305523 DOI: 10.1186/1297-9686-44-3] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Received: 02/25/2011] [Accepted: 01/24/2012] [Indexed: 11/18/2022] Open
Abstract
Background Genomic selection has become an important tool in the genetic improvement of animals and plants. The objective of this study was to investigate the impacts of breeding value estimation method, reference population structure, and trait genetic architecture, on long-term response to genomic selection without updating marker effects. Methods Three methods were used to estimate genomic breeding values: a BLUP method with relationships estimated from genome-wide markers (GBLUP), a Bayesian method (BM), and a partial least squares regression method (PLSR). A shallow (individuals from one generation) or deep reference population (individuals from five generations) was used with each method. The effects of the different selection approaches were compared under four different genetic architectures for the trait under selection. Selection was based on one of the three genomic breeding values, on pedigree BLUP breeding values, or performed at random. Selection continued for ten generations. Results Differences in long-term selection response were small. For a genetic architecture with a very small number of three to four quantitative trait loci (QTL), the Bayesian method achieved a response that was 0.05 to 0.1 genetic standard deviation higher than other methods in generation 10. For genetic architectures with approximately 30 to 300 QTL, PLSR (shallow reference) or GBLUP (deep reference) had an average advantage of 0.2 genetic standard deviation over the Bayesian method in generation 10. GBLUP resulted in 0.6% and 0.9% less inbreeding than PLSR and BM and on average a one third smaller reduction of genetic variance. Responses in early generations were greater with the shallow reference population while long-term response was not affected by reference population structure. Conclusions The ranking of estimation methods differed with and without selection. Under selection, applying GBLUP led to lower inbreeding and a smaller reduction of genetic variance while a similar response to selection was achieved. The reference population structure had a limited effect on long-term accuracy and response. Use of a shallow reference population, most closely related to the selection candidates, gave early benefits while in later generations, when marker effects were not updated, the estimation of marker effects based on a deeper reference population did not pay off.
Affiliation(s)
- John W M Bastiaansen
- Animal Breeding and Genomics Centre, Wageningen University, Wageningen, the Netherlands.