1. Varoquaux G, Raamana PR, Engemann DA, Hoyos-Idrobo A, Schwartz Y, Thirion B. Assessing and tuning brain decoders: Cross-validation, caveats, and guidelines. Neuroimage 2016; 145:166-179. PMID: 27989847. DOI: 10.1016/j.neuroimage.2016.10.038.
Abstract
Decoding, i.e. prediction from brain images or signals, calls for empirical evaluation of its predictive power. Such evaluation is achieved via cross-validation, a method also used to tune decoders' hyper-parameters. This paper reviews cross-validation procedures for decoding in neuroimaging, including a didactic overview of the relevant theoretical considerations. Practical aspects are highlighted with an extensive empirical study of common decoders in within- and across-subject prediction, on multiple datasets (anatomical and functional MRI and MEG) and simulations. Theory and experiments show that the popular "leave-one-out" strategy leads to unstable and biased estimates, and that a repeated-random-splits method should be preferred. Experiments also reveal the large error bars of cross-validation in neuroimaging settings, with typical confidence intervals of about 10%. Nested cross-validation can tune decoders' parameters while avoiding circularity bias. However, we find that it can be favorable to use sane defaults, in particular for non-sparse decoders.
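The repeated-random-splits scheme the review recommends over leave-one-out can be sketched in a few lines. This is a minimal illustration with hypothetical helper names (scikit-learn's `ShuffleSplit` provides the same scheme), scoring a trivial majority-class "decoder" on synthetic labels:

```python
import random

def repeated_random_splits(n, n_splits=50, test_frac=0.2, seed=0):
    """Repeated random splitting ("shuffle-split"): each iteration holds
    out a random test fraction, unlike leave-one-out's single-sample folds."""
    rng = random.Random(seed)
    idx = list(range(n))
    n_test = max(1, int(round(test_frac * n)))
    for _ in range(n_splits):
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]

# Toy usage: score a majority-class "decoder" on synthetic labels.
labels = [0] * 30 + [1] * 20
scores = []
for train, test in repeated_random_splits(len(labels)):
    train_labels = [labels[i] for i in train]
    majority = max(set(train_labels), key=train_labels.count)
    scores.append(sum(labels[i] == majority for i in test) / len(test))

mean_acc = sum(scores) / len(scores)
spread = max(scores) - min(scores)  # per-split scores vary widely, the paper's point
```

Averaging over many random splits stabilizes the estimate, while the spread of per-split scores makes the large error bars of cross-validation visible.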
2. Ten simple rules for predictive modeling of individual differences in neuroimaging. Neuroimage 2019; 193:35-45. PMID: 30831310. PMCID: PMC6521850. DOI: 10.1016/j.neuroimage.2019.02.057.
Abstract
Establishing brain-behavior associations that map brain organization to phenotypic measures and generalize to novel individuals remains a challenge in neuroimaging. Predictive modeling approaches that define and validate models with independent datasets offer a solution to this problem. While these methods can detect novel and generalizable brain-behavior associations, they can be daunting, which has limited their use by the wider connectivity community. Here, we offer practical advice and examples based on functional magnetic resonance imaging (fMRI) functional connectivity data for implementing these approaches. We hope these ten rules will increase the use of predictive models with neuroimaging data.
3. On Splitting Training and Validation Set: A Comparative Study of Cross-Validation, Bootstrap and Systematic Sampling for Estimating the Generalization Performance of Supervised Learning. Journal of Analysis and Testing 2018; 2:249-262. PMID: 30842888. PMCID: PMC6373628. DOI: 10.1007/s41664-018-0068-2.
Abstract
Model validation is the most important part of building a supervised model. Building a model with good generalization performance requires a sensible data-splitting strategy, which is crucial for model validation. In this study, we conducted a comparative study of various reported data-splitting methods. The MixSim model was employed to generate nine simulated datasets with different probabilities of misclassification and variable sample sizes. Partial least squares for discriminant analysis and support vector machines for classification were then applied to these datasets. The data-splitting methods tested included variants of cross-validation, bootstrapping, bootstrapped Latin partition, the Kennard-Stone algorithm (K-S), and sample set partitioning based on joint X-Y distances (SPXY). These methods were employed to split the data into training and validation sets. The estimated generalization performances from the validation sets were then compared with those obtained from blind test sets, which were generated from the same distribution but were unseen by the training/validation procedure used in model construction. The results showed that the size of the data is the deciding factor in the quality of the generalization performance estimated from the validation set. We found a significant gap between the performance estimated from the validation set and that from the test set for all the data-splitting methods employed on small datasets. This disparity decreased as more samples became available for training/validation, because the models then moved toward the approximations given by the central limit theorem for the simulated datasets used.
We also found that having too many or too few samples in the training set had a negative effect on the estimated model performance, suggesting that a good balance between the sizes of the training and validation sets is necessary for a reliable estimate of model performance. We also found that systematic sampling methods such as K-S and SPXY generally gave very poor estimates of model performance, most likely because they are designed to take the most representative samples first, leaving a rather unrepresentative sample set for performance estimation.
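The Kennard-Stone selection criticized above can be sketched briefly, which makes the failure mode visible: the greedy rule hoards the most mutually distant (most representative) samples for training, so the leftover validation set is unrepresentative. A simplified, hypothetical illustration, not the authors' implementation:

```python
def kennard_stone(points, n_train):
    """Simplified Kennard-Stone split: greedily pick the most mutually
    distant samples for training; the rest form the validation set."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    n = len(points)
    # Seed with the two samples that are farthest apart.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda p: d2(points[p[0]], points[p[1]]))
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    while len(selected) < n_train:
        # Add the sample farthest from its nearest already-selected neighbor.
        nxt = max(remaining,
                  key=lambda k: min(d2(points[k], points[s]) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining

# The extreme points (0, 10, 5) go to training; the middle of the
# distribution (1, 2) is left for validation.
train, valid = kennard_stone([(0.0,), (1.0,), (2.0,), (10.0,), (5.0,)], 3)
```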
4.
Abstract
We present an automated algorithm for unified rejection and repair of bad trials in magnetoencephalography (MEG) and electroencephalography (EEG) signals. Our method capitalizes on cross-validation in conjunction with a robust evaluation metric to estimate the optimal peak-to-peak threshold, a quantity commonly used for identifying bad trials in M/EEG. This approach is then extended to a more sophisticated algorithm that estimates this threshold for each sensor, yielding trial-wise bad sensors. Depending on the number of bad sensors, the trial is then either repaired by interpolation or excluded from subsequent analysis. All steps of the algorithm are fully automated, thus lending itself to the name Autoreject. To assess the practical significance of the algorithm, we conducted extensive validation and comparisons with state-of-the-art methods on four public datasets containing MEG and EEG recordings from more than 200 subjects. The comparisons include qualitative assessments as well as quantitative benchmarking against human-supervised and semi-automated preprocessing pipelines. The algorithm allowed us to automate the preprocessing of MEG data from the Human Connectome Project (HCP) up to the computation of the evoked responses. The automated nature of our method minimizes the burden of human inspection, hence supporting the scalability and reliability demanded by data analysis in modern neuroscience.
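The core idea, cross-validating a peak-to-peak rejection threshold against a robust summary of held-out trials, can be sketched as follows. This is a deliberately simplified, hypothetical illustration (single channel, squared-error metric, median as the robust estimate), not the published Autoreject algorithm:

```python
import math
import random
import statistics

def ptp(trial):
    """Peak-to-peak amplitude of one trial."""
    return max(trial) - min(trial)

def cv_ptp_threshold(trials, candidates, n_folds=4, seed=0):
    """Pick the rejection threshold whose cleaned training average best
    matches a robust (median) summary of each held-out fold."""
    rng = random.Random(seed)
    order = list(range(len(trials)))
    rng.shuffle(order)
    folds = [order[k::n_folds] for k in range(n_folds)]
    best_thr, best_err = None, float("inf")
    for thr in candidates:
        err = 0.0
        for k in range(n_folds):
            val = [trials[i] for i in folds[k]]
            train = [trials[i] for j in range(n_folds) if j != k
                     for i in folds[j]]
            # Reject training trials over threshold (fall back to all
            # trials if the threshold rejects everything).
            kept = [t for t in train if ptp(t) <= thr] or train
            n_t = len(trials[0])
            mean_train = [sum(t[i] for t in kept) / len(kept) for i in range(n_t)]
            med_val = [statistics.median(t[i] for t in val) for i in range(n_t)]
            err += sum((a - b) ** 2 for a, b in zip(mean_train, med_val))
        if err < best_err:
            best_thr, best_err = thr, err
    return best_thr

# Toy data: 20 clean oscillatory trials plus 3 high-amplitude artifacts.
clean = [[math.sin(2 * math.pi * j / 32) for j in range(32)] for _ in range(20)]
artifacts = [[30.0 * v for v in t] for t in clean[:3]]
thr = cv_ptp_threshold(clean + artifacts, candidates=[1.0, 3.0, 100.0])
```

A too-strict threshold (1.0) rejects every trial, a too-lax one (100.0) keeps the artifacts; cross-validation selects the intermediate value that rejects only the artifact trials.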
5.
Abstract
Cross-validation-type methods have been widely used to facilitate model estimation and variable selection. In this work, we suggest a new K-fold cross-validation procedure that selects a candidate 'optimal' model from each hold-out fold and averages the K candidate 'optimal' models to obtain the ultimate model. Due to the averaging effect, the variance of the proposed estimates can be significantly reduced. This new procedure results in more stable and efficient parameter estimation than the classical K-fold cross-validation procedure. In addition, we show the asymptotic equivalence between the proposed and classical cross-validation procedures in the linear regression setting. We also demonstrate the broad applicability of the proposed procedure via two examples: parameter-sparsity regularization and quantile smoothing-splines modeling. We illustrate the promise of the proposed method through simulations and a real data example.
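A minimal sketch of the averaging idea, using ordinary least squares as a fixed per-fold model (illustrative only; the paper couples the averaging with per-fold model selection):

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def kfold_averaged_fit(xs, ys, k=5, seed=0):
    """Fit one model per K-fold training set and average the K
    coefficient estimates to form the ultimate model."""
    rng = random.Random(seed)
    order = list(range(len(xs)))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    coefs = []
    for i in range(k):
        train = [j for m in range(k) if m != i for j in folds[m]]
        coefs.append(fit_line([xs[j] for j in train],
                              [ys[j] for j in train]))
    return (sum(c[0] for c in coefs) / k,  # averaged intercept
            sum(c[1] for c in coefs) / k)  # averaged slope

# Recover y = 2 + 3x from noisy data.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]
a, b = kfold_averaged_fit(xs, ys)
```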
6. Tian L, Zhao L, Wei LJ. Predicting the restricted mean event time with the subject's baseline covariates in survival analysis. Biostatistics 2013; 15:222-33. PMID: 24292992. DOI: 10.1093/biostatistics/kxt050.
Abstract
For designing, monitoring, and analyzing a longitudinal study with an event time as the outcome variable, the restricted mean event time (RMET) is an easily interpretable, clinically meaningful summary of the survival function in the presence of censoring. The RMET is the average of all potential event times measured up to a time point τ and can be estimated consistently by the area under the Kaplan-Meier curve over [0, τ]. In this paper, we study a class of regression models that directly relates the RMET to its "baseline" covariates for predicting future subjects' RMETs. Since the standard Cox and accelerated failure time models can also be used for estimating such RMETs, we utilize a cross-validation procedure to select the "best" among all the working models considered in the model building and evaluation process. Lastly, we draw inferences for the predicted RMETs to assess the performance of the final selected model using an independent dataset or a "hold-out" sample from the original dataset. All the proposals are illustrated with data from an HIV clinical trial conducted by the AIDS Clinical Trials Group and the primary biliary cirrhosis study conducted by the Mayo Clinic.
7. Tanner EM, Bornehag CG, Gennings C. Repeated holdout validation for weighted quantile sum regression. MethodsX 2019; 6:2855-2860. PMID: 31871919. PMCID: PMC6911906. DOI: 10.1016/j.mex.2019.11.008.
Abstract
Weighted Quantile Sum (WQS) regression is a method commonly used in environmental epidemiology to assess the impact of chemical mixtures on a health outcome of interest. Data are partitioned into a single training and test set to reduce sample-specific chemical weights. However, at typical epidemiologic sample sizes, this may produce unstable chemical weights and WQS index estimates, and investigators may resort to training and testing on the same data. To solve this problem, we propose repeated holdout validation, whereby data are randomly partitioned 100 times, producing a distribution of validated results. Taking the mean as the final estimate, confidence estimates may also be calculated for inference. Further, this method helps characterize the variability in chemical weights, aiding in the identification of chemicals of concern. This is important since it may direct future research into specific chemicals. Using data from 718 mother-child pairs in the Swedish Environmental Longitudinal, Mother and Child, Asthma and Allergy (SELMA) study, we assessed the association between prenatal exposure to 26 endocrine-disrupting chemicals and child Intelligence Quotient (IQ). Results using a single partition were unstable, varying by random seed. The WQS index estimate was significant when all data were used (i.e. no partition) (β = −2.2, CI: −3.43 to −0.98), but attenuated and nonsignificant using repeated holdout validation (β = −0.82, CI: −2.11 to 0.45). When implementing WQS in epidemiologic studies with limited sample sizes, repeated holdout validation is a viable alternative to using a single partition, or none at all. Repeated holdout can both stabilize results and help characterize the uncertainty in identifying chemicals of concern, while maintaining some of the rigor of holdout validation.
Highlights: Repeated holdout validation improves the stability of WQS estimates in finite study samples. Uncertainty in identifying toxic chemicals of concern is acknowledged and characterized.
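The repeated-holdout scheme itself is generic and can be sketched independently of WQS (which is not implemented here): partition the data many times, apply an estimator to each held-out set, and summarize the resulting distribution by its mean and a percentile interval. A hedged sketch with a plain sample-mean estimator standing in for the WQS index fit:

```python
import random

def repeated_holdout(data, estimator, n_repeats=100, test_frac=0.4, seed=0):
    """Repeated holdout: partition the data n_repeats times, apply the
    estimator to each held-out set, and report the mean estimate plus a
    95% percentile interval over the repeats."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    n_test = int(test_frac * len(data))
    estimates = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        # In WQS the training part would fit the chemical weights;
        # here the estimator only consumes the held-out part.
        estimates.append(estimator([data[i] for i in idx[:n_test]]))
    estimates.sort()
    mean = sum(estimates) / n_repeats
    return mean, (estimates[int(0.025 * n_repeats)],
                  estimates[int(0.975 * n_repeats) - 1])

rng = random.Random(1)
sample = [rng.gauss(1.0, 1.0) for _ in range(100)]
mean_est, (lo, hi) = repeated_holdout(sample, lambda d: sum(d) / len(d))
```

The interval width makes visible how much any single random partition could have misled, which is the instability the paper documents.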
8. Nguyen CD, Carlin JB, Lee KJ. Model checking in multiple imputation: an overview and case study. Emerg Themes Epidemiol 2017; 14:8. PMID: 28852415. PMCID: PMC5569512. DOI: 10.1186/s12982-017-0062-6.
Abstract
Background: Multiple imputation has become very popular as a general-purpose method for handling missing data. The validity of multiple-imputation-based analyses relies on the use of an appropriate model to impute the missing values. Despite the widespread use of multiple imputation, there are few guidelines available for checking imputation models.
Analysis: In this paper, we provide an overview of currently available methods for checking imputation models. These include graphical checks and numerical summaries, as well as simulation-based methods such as posterior predictive checking. These model checking techniques are illustrated using an analysis affected by missing data from the Longitudinal Study of Australian Children.
Conclusions: As multiple imputation becomes further established as a standard approach for handling missing data, it will become increasingly important that researchers employ appropriate model checking approaches to ensure that reliable results are obtained when using this method.
9. Mateos-Pérez JM, Dadar M, Lacalle-Aurioles M, Iturria-Medina Y, Zeighami Y, Evans AC. Structural neuroimaging as clinical predictor: A review of machine learning applications. Neuroimage Clin 2018; 20:506-522. PMID: 30167371. PMCID: PMC6108077. DOI: 10.1016/j.nicl.2018.08.019.
Abstract
In this paper, we provide an extensive overview of machine learning techniques applied to structural magnetic resonance imaging (MRI) data to obtain clinical classifiers. We specifically address practical problems commonly encountered in the literature, with the aim of helping researchers improve the application of these techniques in future works. Additionally, we survey how these algorithms are applied to a wide range of diseases and disorders (e.g. Alzheimer's disease (AD), Parkinson's disease (PD), autism, multiple sclerosis, traumatic brain injury, etc.) in order to provide a comprehensive view of the state of the art in different fields.
10. Tsamardinos I, Greasidou E, Borboudakis G. Bootstrapping the out-of-sample predictions for efficient and accurate cross-validation. Mach Learn 2018; 107:1895-1922. PMID: 30393425. PMCID: PMC6191021. DOI: 10.1007/s10994-018-5714-4.
Abstract
Cross-Validation (CV), and out-of-sample performance-estimation protocols in general, are often employed both for (a) selecting the optimal combination of algorithms and values of hyper-parameters (called a configuration) for producing the final predictive model, and (b) estimating the predictive performance of the final model. However, the cross-validated performance of the best configuration is optimistically biased. We present an efficient bootstrap method that corrects for this bias, called Bootstrap Bias Corrected CV (BBC-CV). BBC-CV's main idea is to bootstrap the whole process of selecting the best-performing configuration on the out-of-sample predictions of each configuration, without additional training of models. In comparison to the alternatives, namely nested cross-validation (Varma and Simon in BMC Bioinform 7(1):91, 2006) and a method by Tibshirani and Tibshirani (Ann Appl Stat 822-829, 2009), BBC-CV is computationally more efficient, has smaller variance and bias, and is applicable to any metric of performance (accuracy, AUC, concordance index, mean squared error). Subsequently, we again employ the idea of bootstrapping the out-of-sample predictions to speed up the CV process itself: using a bootstrap-based statistical criterion, we stop training models on new folds for configurations that are (with high probability) inferior. We name this method Bootstrap Bias Corrected with Dropping CV (BBCD-CV); it is both efficient and provides accurate performance estimates.
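The BBC-CV correction can be sketched directly from the abstract's description: pool each configuration's out-of-sample predictions, then bootstrap the sample indices; in each iteration, select the best configuration on the in-bag samples and score it on the out-of-bag samples. A simplified sketch (accuracy metric, hypothetical variable names), not the authors' implementation:

```python
import random

def bbc_cv(pooled_preds, y, n_boot=200, seed=0):
    """Bootstrap Bias Corrected CV (sketch): pooled_preds[c][i] is the
    pooled out-of-sample prediction of configuration c for sample i.
    Each bootstrap selects the best configuration on the in-bag samples
    and scores it on the out-of-bag samples; averaging these scores
    removes the optimism of picking the winner, with no model retraining."""
    rng = random.Random(seed)
    n = len(y)

    def acc(c, idx):
        return sum(pooled_preds[c][i] == y[i] for i in idx) / len(idx)

    scores = []
    for _ in range(n_boot):
        in_bag = [rng.randrange(n) for _ in range(n)]
        bag = set(in_bag)
        out_bag = [i for i in range(n) if i not in bag]
        if not out_bag:
            continue
        best = max(range(len(pooled_preds)), key=lambda c: acc(c, in_bag))
        scores.append(acc(best, out_bag))
    return sum(scores) / len(scores)

# 20 chance-level configurations: the naive "best CV accuracy" is
# optimistic, while the BBC estimate stays near chance (0.5).
rng = random.Random(1)
y = [rng.randrange(2) for _ in range(100)]
preds = [[rng.randrange(2) for _ in range(100)] for _ in range(20)]
naive = max(sum(p[i] == y[i] for i in range(100)) / 100 for p in preds)
corrected = bbc_cv(preds, y)
```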
11. Baumann D, Baumann K. Reliable estimation of prediction errors for QSAR models under model uncertainty using double cross-validation. J Cheminform 2014; 6:47. PMID: 25506400. PMCID: PMC4260165. DOI: 10.1186/s13321-014-0047-1.
Abstract
Background: Generally, QSAR modelling requires both model selection and validation, since there is no a priori knowledge about the optimal QSAR model. Prediction errors (PE) are frequently used to select and to assess the models under study. Reliable estimation of prediction errors is challenging, especially under model uncertainty, and requires independent test objects. These test objects must be involved neither in model building nor in model selection. Double cross-validation, sometimes also termed nested cross-validation, offers an attractive possibility to generate test data and to select QSAR models, since it uses the data very efficiently. Nevertheless, there is controversy in the literature with respect to the reliability of double cross-validation under model uncertainty. Moreover, systematic studies investigating the adequate parameterization of double cross-validation are still missing. Here, the cross-validation design in the inner loop and the influence of the test set size in the outer loop are systematically studied for regression models in combination with variable selection.
Methods: Simulated and real data are analysed with double cross-validation to identify important factors for the resulting model quality. For the simulated data, a bias-variance decomposition is provided.
Results: The prediction errors of QSAR/QSPR regression models in combination with variable selection depend to a large degree on the parameterization of double cross-validation. While the parameters for the inner loop mainly influence the bias and variance of the resulting models, the parameters for the outer loop mainly influence the variability of the resulting prediction-error estimate.
Conclusions: Double cross-validation reliably and unbiasedly estimates prediction errors under model uncertainty for regression models. Compared with a single test set, double cross-validation provided a more realistic picture of model quality and should be preferred over a single test set.
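The double (nested) cross-validation structure discussed here can be sketched compactly: the inner loop performs model selection on each outer training set, and the outer loop scores the whole selection procedure on data it never saw. A toy sketch choosing between two candidate regression models (hypothetical helper names; not the paper's variable-selection setup):

```python
import random

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def fit_mean(data):
    m = sum(y for _, y in data) / len(data)
    return lambda x: m

def fit_line(data):
    xs, ys = [x for x, _ in data], [y for _, y in data]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x

def double_cv(data, fitters, k_outer=5, k_inner=4, seed=0):
    """Double (nested) CV: the inner loop selects a model on each outer
    training set; the outer loop scores that whole selection procedure
    on held-out data, estimating its true prediction error."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)
    outer = [order[i::k_outer] for i in range(k_outer)]
    err = 0.0
    for i in range(k_outer):
        train_idx = [j for m in range(k_outer) if m != i for j in outer[m]]
        inner = [train_idx[t::k_inner] for t in range(k_inner)]

        def inner_cv(fit):
            # Inner loop: cross-validated error of one candidate fitter.
            e = 0.0
            for t in range(k_inner):
                tr = [data[j] for s in range(k_inner) if s != t for j in inner[s]]
                va = [data[j] for j in inner[t]]
                e += mse(fit(tr), va)
            return e

        best = min(fitters, key=inner_cv)
        # Outer loop: refit the winner and score it on the held-out fold.
        model = best([data[j] for j in train_idx])
        err += mse(model, [data[j] for j in outer[i]])
    return err / k_outer

rng = random.Random(3)
data = [(x, 1.0 + 2.0 * x + rng.gauss(0, 0.5))
        for x in [rng.uniform(0, 5) for _ in range(120)]]
est_error = double_cv(data, [fit_mean, fit_line])
```

Because the outer folds never touch the inner selection, the returned error estimates the selection procedure itself rather than a single pre-chosen model.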
12. Saini I, Singh D, Khosla A. QRS detection using K-Nearest Neighbor algorithm (KNN) and evaluation on standard ECG databases. J Adv Res 2012; 4:331-44. PMID: 25685438. PMCID: PMC4293876. DOI: 10.1016/j.jare.2012.05.007.
Abstract
The performance of computer-aided ECG analysis depends on the precise and accurate delineation of QRS complexes. This paper presents an application of the K-Nearest Neighbor (KNN) algorithm as a classifier for detection of the QRS complex in ECG. The proposed algorithm is evaluated on two manually annotated standard databases, the CSE and MIT-BIH Arrhythmia databases. In this work, a digital band-pass filter is used to reduce false detections caused by interference present in the ECG signal, and the gradient of the signal is used as a feature for QRS detection. The accuracy of a KNN-based classifier is largely dependent on the value of K and the type of distance metric; a value of K = 3 and the Euclidean distance metric were selected for the KNN classifier using fivefold cross-validation. Detection rates of 99.89% and 99.81% are achieved for the CSE and MIT-BIH databases, respectively. The QRS detector obtained a sensitivity Se = 99.86% and specificity Sp = 99.86% for the CSE database, and Se = 99.81% and Sp = 99.86% for the MIT-BIH Arrhythmia database. A comparison is also made between the proposed algorithm and other published work using the CSE and MIT-BIH Arrhythmia databases. These results clearly establish the KNN algorithm as a reliable and accurate method for QRS detection.
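The classifier at the core of this approach, majority vote among the K = 3 Euclidean-nearest neighbors, can be sketched in a few lines. A toy illustration with a made-up one-dimensional feature standing in for the paper's gradient feature, not the published ECG pipeline:

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k Euclidean-nearest
    training points (K = 3, the value the paper selects via
    fivefold cross-validation)."""
    d2 = [(sum((a - b) ** 2 for a, b in zip(p, x)), label)
          for p, label in zip(train_X, train_y)]
    d2.sort(key=lambda t: t[0])
    votes = Counter(label for _, label in d2[:k])
    return votes.most_common(1)[0][0]

# Hypothetical features standing in for the gradient feature:
# class 1 ("QRS") has large gradient magnitude, class 0 does not.
train_X = [(0.1,), (0.2,), (0.3,), (2.0,), (2.2,), (2.4,)]
train_y = [0, 0, 0, 1, 1, 1]
pred = knn_predict(train_X, train_y, (2.1,))
```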
13. AtbPpred: A Robust Sequence-Based Prediction of Anti-Tubercular Peptides Using Extremely Randomized Trees. Comput Struct Biotechnol J 2019; 17:972-981. PMID: 31372196. PMCID: PMC6658830. DOI: 10.1016/j.csbj.2019.06.024.
Abstract
Mycobacterium tuberculosis is one of the most dangerous pathogens in humans. It acts as the etiological agent of tuberculosis (TB), infecting almost one-third of the world's population. Owing to the high incidence of multidrug-resistant and extensively drug-resistant TB, there is an urgent need for novel and effective alternative therapies. Peptide-based therapy has several advantages, such as diverse mechanisms of action, low immunogenicity, and selective affinity to bacterial cell envelopes. However, the identification of anti-tubercular peptides (AtbPs) via experimentation is laborious and expensive; hence, the development of an efficient computational method is necessary for the prediction of AtbPs prior to both in vitro and in vivo experiments. To this end, we developed a two-layer machine learning (ML)-based predictor called AtbPpred for the identification of AtbPs. In the first layer, we applied a two-step feature selection procedure and identified the optimal feature set individually for nine different feature encodings, whose corresponding models were developed using extremely randomized trees (ERT). In the second layer, the predicted probabilities of AtbPs from the above nine models were used as input features to an ERT classifier to develop the final predictor. AtbPpred achieved average accuracies of 88.3% and 87.3% during cross-validation and an independent evaluation, respectively, which were ~8.7% and 10.0% higher than the state-of-the-art method. Furthermore, we established a user-friendly webserver, currently available at http://thegleelab.org/AtbPpred. We anticipate that this predictor could be useful in the high-throughput prediction of AtbPs and also provide mechanistic insights into their functions.
We developed a novel computational framework for the identification of anti-tubercular peptides using Extremely randomized tree. AtbPpred displayed superior performance compared to the existing method on both benchmark and independent datasets. We constructed a user-friendly web server that implements the proposed AtbPpred method.
14. Schuster C, Hardiman O, Bede P. Survival prediction in Amyotrophic lateral sclerosis based on MRI measures and clinical characteristics. BMC Neurol 2017; 17:73. PMID: 28412941. PMCID: PMC5393027. DOI: 10.1186/s12883-017-0854-x.
Abstract
Background: Amyotrophic lateral sclerosis (ALS) is a highly heterogeneous neurodegenerative condition. Accurate diagnostic, monitoring and prognostic biomarkers are urgently needed both for individualised patient care and for clinical trials. A multimodal magnetic resonance imaging study is presented in which MRI measures of ALS-associated brain regions are utilised to predict 18-month survival.
Methods: A total of 60 ALS patients and 69 healthy controls were included in this study. 20% of the patient sample was set aside as an independent validation sample. Surface-based morphometry and diffusion-tensor white-matter parameters were used to identify anatomical patterns of neurodegeneration in the remaining 80% of the patient sample compared to healthy controls. Binary logistic ridge regressions were carried out to predict 18-month survival based on clinical measures alone, MRI features alone, and a combination of clinical and MRI data. Clinical indices included age at symptom onset, site of disease onset, diagnostic delay from first symptom to diagnosis, and physical disability (ALSFRS-r). MRI features included the average cortical thickness of the precentral and paracentral gyri and the average fractional anisotropy, radial, medial, and axial diffusivity of the superior and inferior corona radiata, internal capsule, cerebral peduncles, and the genu, body and splenium of the corpus callosum.
Results: Clinical data alone had a survival prediction accuracy of 66.67%, with 62.50% sensitivity and 70.84% specificity. MRI data alone resulted in a prediction accuracy of 77.08%, with 79.16% sensitivity and 75% specificity. The combination of clinical and MRI measures led to a survival prediction accuracy of 79.17%, with 75% sensitivity and 83.34% specificity.
Conclusion: Quantitative MRI measures of ALS-specific brain regions enhance survival prediction in ALS and should be incorporated in future clinical trial designs.
15. Hernández B, Parnell A, Pennington SR. Why have so few proteomic biomarkers "survived" validation? (Sample size and independent validation considerations). Proteomics 2014; 14:1587-92. PMID: 24737731. DOI: 10.1002/pmic.201300377.
Abstract
Proteomic biomarker discovery has led to the identification of numerous potential candidates for disease diagnosis, prognosis, and prediction of response to therapy. However, very few of these candidate biomarkers reach clinical validation and go on to be routinely used in clinical practice. One particular issue with biomarker discovery is the identification of significantly changing proteins in the initial discovery experiment that do not validate when subsequently tested on separate patient sample cohorts. Here, we seek to highlight some of the statistical challenges surrounding the analysis of LC-MS proteomic data for biomarker candidate discovery. We show that common statistical algorithms run on data with low sample sizes can overfit and yield misleading misclassification rates and AUC values. A common solution to this problem is to prefilter variables (via, e.g., ANOVA and/or correction methods such as Bonferroni or the false discovery rate) to give a smaller dataset and reduce the size of the apparent statistical challenge. However, we show that this exacerbates the problem, yielding even higher performance metrics while reducing the predictive accuracy of the biomarker panel. To illustrate some of these limitations, we have run simulation analyses with known biomarkers. For our chosen algorithm (random forests), we show that the above problems are substantially reduced if a sufficient number of samples are analyzed and the data are not prefiltered. Our view is that LC-MS proteomic biomarker discovery data should be analyzed without prefiltering and that increasing the sample size in biomarker discovery experiments should be a very high priority.
16. Mutanen TP, Metsomaa J, Liljander S, Ilmoniemi RJ. Automatic and robust noise suppression in EEG and MEG: The SOUND algorithm. Neuroimage 2017; 166:135-151. PMID: 29061529. DOI: 10.1016/j.neuroimage.2017.10.021.
Abstract
Electroencephalography (EEG) and magnetoencephalography (MEG) often suffer from noise- and artifact-contaminated channels and trials. Conventionally, EEG and MEG data are inspected visually and cleaned accordingly, e.g., by identifying and rejecting the so-called "bad" channels. This approach has several shortcomings: data inspection is laborious, the rejection criteria are subjective, and the process does not fully utilize all the information in the collected data. Here, we present noise-cleaning methods based on modeling the multi-sensor and multi-trial data. These approaches offer objective, automatic, and robust removal of noise and disturbances by taking into account the sensor- or trial-specific signal-to-noise ratios. We introduce a method called the source-estimate-utilizing noise-discarding algorithm (the SOUND algorithm). SOUND employs anatomical information of the head to cross-validate the data between the sensors. As a result, we are able to identify and suppress noise and artifacts in EEG and MEG. Furthermore, we discuss the theoretical background of SOUND and show that it is a special case of the well-known Wiener estimators. We explain how a completely data-driven Wiener estimator (DDWiener) can be used when no anatomical information is available. DDWiener is easily applicable to any linear multivariate problem; as a demonstrative example, we show how DDWiener can be utilized when estimating event-related EEG/MEG responses. We validated the performance of SOUND with simulations and by applying SOUND to multiple EEG and MEG datasets. SOUND considerably improved the data quality, exceeding the performance of the widely used channel-rejection and interpolation scheme. SOUND also helped in localizing the underlying neural activity by preventing noise from contaminating the source estimates. SOUND can be used to detect and reject noise in functional brain data, enabling improved identification of active brain areas.
Research Support, Non-U.S. Gov't
17. Yoon JH, Lee JM, Woo HS, Yu MH, Joo I, Lee ES, Sohn JY, Lee KB, Han JK, Choi BI. Staging of hepatic fibrosis: comparison of magnetic resonance elastography and shear wave elastography in the same individuals. Korean J Radiol 2013; 14:202-12. [PMID: 23483022] [PMCID: PMC3590331] [DOI: 10.3348/kjr.2013.14.2.202] [Citation(s) in RCA: 58]
Abstract
Objective To cross-validate liver stiffness (LS) measured on shear wave elastography (SWE) and on magnetic resonance elastography (MRE) in the same individuals. Materials and Methods We included 94 liver transplantation (LT) recipients and 114 liver donors who underwent either MRE or SWE before surgery or biopsy. We determined the technical success rates and the incidence of unreliable LS measurements (LSM) of SWE and MRE. Among the 69 patients who underwent both MRE and SWE, the median and coefficient of variation (CV) of the LSM from each examination were compared and correlated. Areas under the receiver operating characteristic curve for both examinations were calculated in order to exclude the presence of hepatic fibrosis (HF). Results The technical success rates of MRE and SWE were 96.4% and 92.2%, respectively (p = 0.17), and all of the technical failures occurred in LT recipients. SWE showed 13.1% unreliable LSMs, whereas MRE showed none (p < 0.05). There was moderate correlation between the LSMs of the two examinations (r = 0.67). SWE showed a significantly larger median LSM and CV than MRE. Both examinations showed similar diagnostic performance for excluding HF (Az = 0.989 and 1.000, respectively). Conclusion MRE and SWE show moderate correlation in their LSMs, although SWE shows a higher incidence of unreliable LSMs in cirrhotic liver.
Research Support, Non-U.S. Gov't
18. Emura T, Matsui S, Chen HY. compound.Cox: Univariate feature selection and compound covariate for predicting survival. Comput Methods Programs Biomed 2019; 168:21-37. [PMID: 30527130] [DOI: 10.1016/j.cmpb.2018.10.020] [Citation(s) in RCA: 52]
Abstract
BACKGROUND AND OBJECTIVE Univariate feature selection is one of the simplest and most commonly used techniques to develop a multigene predictor for survival. Presently, there is no software tailored to perform univariate feature selection and predictor construction. METHODS We develop the compound.Cox R package that implements univariate significance tests (via the Wald tests or score tests) for feature selection. We provide a cross-validation algorithm to measure the predictive capability of selected genes and a permutation algorithm to assess the false discovery rate. We also provide three algorithms for constructing a multigene predictor (compound covariate, compound shrinkage, and copula-based methods), which are tailored to the subset of genes obtained from univariate feature selection. We demonstrate our package using survival data on lung cancer patients. We examine the predictive capability of the developed algorithms using the lung cancer data and simulated data. RESULTS The developed R package, compound.Cox, is available on the CRAN repository. The statistical tools in compound.Cox allow researchers to determine an optimal significance level of the tests, thus providing researchers an optimal subset of genes for prediction. The package also allows researchers to compute the false discovery rate and various prediction algorithms.
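The univariate-screening-plus-compound-covariate workflow can be sketched in a deliberately simplified form (simulated data; uncensored log survival times and a crude regression z-statistic stand in for the package's Cox-model Wald/score tests, so this illustrates the shape of the procedure, not the actual compound.Cox implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 200 patients x 20 genes; genes 0 and 1 truly drive (log) survival.
n, p = 200, 20
X = rng.standard_normal((n, p))
log_time = 1.5 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * rng.standard_normal(n)

def univariate_screen(X, y, z_cut=4.0):
    """Per-gene slope and a crude z-statistic (censoring ignored here);
    keep genes whose univariate association exceeds the cutoff."""
    selected, betas = [], []
    for j in range(X.shape[1]):
        x = X[:, j]
        c = np.cov(x, y)
        beta = c[0, 1] / c[0, 0]
        resid = y - y.mean() - beta * (x - x.mean())
        se = np.sqrt(np.var(resid) / (len(y) * np.var(x)))
        if abs(beta / se) > z_cut:
            selected.append(j)
            betas.append(beta)
    return selected, np.array(betas)

selected, betas = univariate_screen(X, log_time)

# Compound covariate: a single risk score formed as the weighted sum of the
# selected genes, using their univariate coefficients as weights.
risk_score = X[:, selected] @ betas
```

The significance cutoff plays the role of the optimal significance level that the package helps tune; in practice it would itself be chosen by cross-validation.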
19. From Vivaldi to Beatles and back: predicting lateralized brain responses to music. Neuroimage 2013; 83:627-36. [PMID: 23810975] [DOI: 10.1016/j.neuroimage.2013.06.064] [Citation(s) in RCA: 52]
Abstract
We aimed at predicting the temporal evolution of brain activity in naturalistic music-listening conditions using a combination of neuroimaging and acoustic feature extraction. Participants were scanned using functional magnetic resonance imaging (fMRI) while listening to two musical medleys, including pieces from various genres with and without lyrics. Regression models were built to predict voxel-wise brain activations and were then tested in a cross-validation setting in order to evaluate the robustness of the resulting models across stimuli. To further assess the generalizability of the models, we extended the cross-validation procedure by including another dataset, which comprised continuous fMRI responses of musically trained participants to an Argentinean tango. Individual models for the two musical medleys revealed that activations in several areas of the brain belonging to the auditory, limbic, and motor regions could be predicted. Notably, activations in the medial orbitofrontal region and the anterior cingulate cortex, relevant for self-referential appraisal and aesthetic judgments, could be predicted successfully. Cross-validation across musical stimuli and participant pools helped identify a region of the right superior temporal gyrus, encompassing the planum polare and Heschl's gyrus, as the core structure processing complex acoustic features of musical pieces from various genres, with or without lyrics. Models based on purely instrumental music were able to predict activation in the bilateral auditory cortices, parietal, somatosensory, and left-hemispheric primary and supplementary motor areas. The presence of lyrics, on the other hand, weakened the prediction of activations in the left superior temporal gyrus. Our results suggest spontaneous emotion-related processing during naturalistic listening to music and provide supportive evidence for hemispheric specialization for categorical sounds with realistic stimuli. We herewith introduce a powerful means to predict brain responses to music, speech, or soundscapes across a large variety of contexts.
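A voxel-wise encoding model with cross-stimulus validation of this kind can be sketched as follows (simulated acoustic features and voxel responses; ridge regularization is our assumption for the sketch, not necessarily the regression used in the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy encoding setup: acoustic features (time x features) for two "stimuli",
# and simulated responses of 10 voxels sharing fixed feature weights.
n_time, n_feat, n_vox = 300, 5, 10
W = rng.standard_normal((n_feat, n_vox))
feat_a = rng.standard_normal((n_time, n_feat))
feat_b = rng.standard_normal((n_time, n_feat))
resp_a = feat_a @ W + 0.5 * rng.standard_normal((n_time, n_vox))
resp_b = feat_b @ W + 0.5 * rng.standard_normal((n_time, n_vox))

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression, one weight column per voxel."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ Y)

def cross_stimulus_score(X_tr, Y_tr, X_te, Y_te):
    """Train on one stimulus, test generalization on the other; score each
    voxel by the correlation of predicted and observed time courses."""
    W_hat = ridge_fit(X_tr, Y_tr)
    pred = X_te @ W_hat
    return np.array([np.corrcoef(pred[:, v], Y_te[:, v])[0, 1]
                     for v in range(Y_te.shape[1])])

scores_ab = cross_stimulus_score(feat_a, resp_a, feat_b, resp_b)
```

Voxels whose cross-stimulus correlation stays high are the ones whose responses to acoustic features generalize across pieces, which is exactly how the core right-STG region was identified in the study.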
Research Support, Non-U.S. Gov't
20. Flack KD, Siders WA, Johnson L, Roemmich JN. Cross-Validation of Resting Metabolic Rate Prediction Equations. J Acad Nutr Diet 2016; 116:1413-1422. [PMID: 27138231] [DOI: 10.1016/j.jand.2016.03.018] [Citation(s) in RCA: 50]
Abstract
BACKGROUND Resting metabolic rate (RMR) measurement is time consuming and requires specialized equipment. Prediction equations provide an easy method to estimate RMR; however, their accuracy likely varies across individuals. Understanding the factors that influence the accuracy of RMR predictions will help to revise existing, or develop new and improved, equations. OBJECTIVE Our aim was to test the validity of RMR predicted in healthy adults by the Harris-Benedict, World Health Organization, Mifflin-St Jeor, Nelson, and Wang equations, and three meta-equations of Sabounchi. DESIGN Predicted RMR was tested for agreement with indirect calorimetry. PARTICIPANTS/SETTING Men and women (n=30) aged 18 to 65 years from Grand Forks, ND, were recruited and included for analysis during spring/summer 2014. Participants were nonobese or obese (body mass index range=19 to 39) and primarily white. MAIN OUTCOME MEASURE Agreement between measured (indirect calorimetry) and predicted RMR was assessed. STATISTICAL ANALYSIS The methods of Bland and Altman were employed to determine mean bias (predicted minus measured RMR, kcal/day) and limits of agreement between predicted and measured RMR. Repeated-measures analysis of variance was used to test for bias in RMR predicted from each equation vs the measured RMR. RESULTS Bias (mean±2 standard deviations) was lowest for the Harris-Benedict (-14±378 kcal/24 h) and World Health Organization (-25±394 kcal/24 h) equations. These equations also predicted RMR values that did not differ from measured values. Mean RMR predictions from all other equations differed significantly from indirect calorimetry. The 2 standard deviation limits of agreement were moderate or large for all equations tested, ranging from 314 to 445 kcal/24 h. Prediction bias was inversely associated with the magnitude of RMR and with fat-free mass. CONCLUSIONS At the group level, the traditional Harris-Benedict and World Health Organization equations were the most accurate. However, these equations did not perform well at the individual level. As fat-free mass increased, the prediction equations increasingly underestimated RMR.
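Two of the compared equations are simple closed forms, and the Bland-Altman summary used above is equally simple to compute (coefficients as commonly cited for the original equations; the helper below mirrors the study's mean-bias and 2-SD limits-of-agreement computation on hypothetical values):

```python
import statistics

def mifflin_st_jeor(weight_kg, height_cm, age_yr, male):
    """Mifflin-St Jeor RMR estimate (kcal/day)."""
    return 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_yr + (5.0 if male else -161.0)

def harris_benedict(weight_kg, height_cm, age_yr, male):
    """Original Harris-Benedict RMR estimate (kcal/day), commonly cited coefficients."""
    if male:
        return 66.473 + 13.7516 * weight_kg + 5.0033 * height_cm - 6.755 * age_yr
    return 655.0955 + 9.5634 * weight_kg + 1.8496 * height_cm - 4.6756 * age_yr

def bland_altman(predicted, measured):
    """Mean bias (predicted minus measured) and 2-SD limits of agreement."""
    diffs = [p - m for p, m in zip(predicted, measured)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, (bias - 2.0 * sd, bias + 2.0 * sd)
```

For a 70-kg, 175-cm, 30-year-old man, Mifflin-St Jeor gives 1648.75 kcal/day; the two equations can disagree by tens of kilocalories for the same person, which is the individual-level variability the study quantifies.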
Validation Study
21. Caruso R, Pittella F, Zaghini F, Fida R, Sili A. Development and validation of the Nursing Profession Self-Efficacy Scale. Int Nurs Rev 2016; 63:455-64. [PMID: 27291103] [DOI: 10.1111/inr.12291] [Citation(s) in RCA: 50]
Abstract
AIM This study describes the development and validation of the Nursing Profession Self-Efficacy Scale. BACKGROUND Self-efficacy can be useful in predicting performance, job satisfaction or well-being. In the nursing field, there is a shortage of studies on self-efficacy with regard to nurses' global confidence in coping ability across a range of everyday, challenging work situations. METHODS To define the theoretical framework of nursing professional self-efficacy, two focus groups and a literature review were performed. An empirical study was then conducted to test validity and reliability. Face and content validity, construct validity, concurrent validity, internal consistency and test-retest reliability were examined. The content validity index was evaluated by 12 experts, who suggested deleting 11 redundant items. The final developed tool was tested for construct analysis using a cross-validation approach, randomly splitting the overall sample of 917 nurses into two sub-groups. FINDINGS The construct validity indicated two dimensions. The face and content validity were adequate. Test-retest reliability displayed good stability, and internal consistency (Cronbach's α) was acceptable. Moreover, concurrent validity using the Generalized Self-Efficacy Scale was in line with the theoretical framework. CONCLUSION The scale showed evidence of validity and reliability. The major limitation is the strong influence of the Italian context in the tool development. IMPLICATIONS FOR NURSING AND HEALTH POLICY The Nursing Profession Self-Efficacy Scale could be a fruitful tool that facilitates the application of theories (i.e. social-cognitive theory) in the nursing field and even the development of interventions. Furthermore, a measurement of self-efficacy could be used to predict nursing clinical performance.
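The split-half cross-validation and internal-consistency checks described above can be sketched with simulated responses (Cronbach's α computed from its standard formula; the item data and sample split below are illustrative stand-ins, not the study's data):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents x items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(3)
# Toy responses: one common factor plus item noise -> good internal consistency.
factor = rng.standard_normal((917, 1))
responses = factor + 0.8 * rng.standard_normal((917, 10))

# Random split-half, mirroring the cross-validation of the factor structure:
# each half should yield a similar reliability estimate.
idx = rng.permutation(len(responses))
half_a, half_b = responses[idx[:458]], responses[idx[458:]]
alpha_a, alpha_b = cronbach_alpha(half_a), cronbach_alpha(half_b)
```

Agreement between the two halves (similar α, and in the study, similar factor structure) is what supports the scale's stability.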
Validation Study
22. Charilaou P, Battat R. Machine learning models and over-fitting considerations. World J Gastroenterol 2022; 28:605-607. [PMID: 35316964] [PMCID: PMC8905023] [DOI: 10.3748/wjg.v28.i5.605] [Citation(s) in RCA: 50]
Abstract
Machine learning models may outperform traditional statistical regression algorithms for predicting clinical outcomes. Proper validation when building such models, and careful tuning of their underlying algorithms, is necessary to avoid over-fitting and poor generalizability, to which smaller datasets are particularly prone. In an effort to educate readers interested in artificial intelligence and model-building based on machine-learning algorithms, we outline important details on cross-validation techniques that can enhance the performance and generalizability of such models.
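A minimal k-fold cross-validation sketch of the kind the letter discusses (illustrative scikit-learn example on synthetic data, not taken from the letter itself):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary-outcome dataset standing in for a clinical cohort.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=0)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: each observation is held out exactly once, giving a less
# optimistic performance estimate than training-set accuracy would.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
mean_auc = scores.mean()
```

Hyper-parameter tuning would add an inner loop (nested cross-validation), so that the outer folds never see the data used to choose the parameters.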
Letter to the Editor
23. Wahl S, Boulesteix AL, Zierer A, Thorand B, van de Wiel MA. Assessment of predictive performance in incomplete data by combining internal validation and multiple imputation. BMC Med Res Methodol 2016; 16:144. [PMID: 27782817] [PMCID: PMC5080703] [DOI: 10.1186/s12874-016-0239-7] [Citation(s) in RCA: 45]
Abstract
BACKGROUND Missing values are a frequent issue in human studies. In many situations, multiple imputation (MI) is an appropriate missing data handling strategy, whereby missing values are imputed multiple times, the analysis is performed in every imputed data set, and the obtained estimates are pooled. If the aim is to estimate (added) predictive performance measures, such as (change in) the area under the receiver-operating characteristic curve (AUC), internal validation strategies become desirable in order to correct for optimism. It is not fully understood how internal validation should be combined with multiple imputation. METHODS In a comprehensive simulation study and in a real data set based on blood markers as predictors for mortality, we compare three combination strategies: Val-MI, internal validation followed by MI on the training and test parts separately; MI-Val, MI on the full data set followed by internal validation; and MI(-y)-Val, MI on the full data set omitting the outcome, followed by internal validation. Different validation strategies, including bootstrap and cross-validation, different (added) performance measures, and various data characteristics are considered, and the strategies are evaluated with regard to bias and mean squared error of the obtained performance estimates. In addition, we elaborate on the number of resamples and imputations to be used, and adapt a strategy for confidence-interval construction to incomplete data. RESULTS Internal validation is essential in order to avoid optimism, with the bootstrap 0.632+ estimate representing a reliable method to correct for optimism. While estimates obtained by MI-Val are optimistically biased, those obtained by MI(-y)-Val tend to be pessimistic in the presence of a true underlying effect. Val-MI provides largely unbiased estimates, with a slight pessimistic bias with increasing true effect size, number of covariates and decreasing sample size. In Val-MI, accuracy of the estimate is more strongly improved by increasing the number of bootstrap draws rather than the number of imputations. With a simple integrated approach, valid confidence intervals for performance estimates can be obtained. CONCLUSIONS When prognostic models are developed on incomplete data, Val-MI represents a valid strategy to obtain estimates of predictive performance measures.
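The key ordering issue — imputing inside rather than outside the validation loop — can be sketched as follows (single mean imputation stands in for multiple imputation for brevity; fitting the imputer on the training part only is one leak-avoiding variant in the spirit of Val-MI, and all data are simulated):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = (X[:, 0] + 0.5 * rng.standard_normal(200) > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan  # introduce ~10% missing values

accs = []
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # Imputation is (re)fit within each training fold, so the held-out
    # fold never informs the imputation model -- unlike MI-Val, where the
    # full data set (including future test folds) is imputed first.
    imp = SimpleImputer(strategy="mean").fit(X[train])
    model = LogisticRegression().fit(imp.transform(X[train]), y[train])
    accs.append(model.score(imp.transform(X[test]), y[test]))
mean_acc = float(np.mean(accs))
```

With multiple imputation the inner step would be repeated per imputed data set and the fold-wise performance estimates pooled, as the paper describes.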
research-article
24. Slieker RC, Donnelly LA, Fitipaldi H, Bouland GA, Giordano GN, Åkerlund M, Gerl MJ, Ahlqvist E, Ali A, Dragan I, Festa A, Hansen MK, Mansour Aly D, Kim M, Kuznetsov D, Mehl F, Klose C, Simons K, Pavo I, Pullen TJ, Suvitaival T, Wretlind A, Rossing P, Lyssenko V, Legido-Quigley C, Groop L, Thorens B, Franks PW, Ibberson M, Rutter GA, Beulens JWJ, 't Hart LM, Pearson ER. Replication and cross-validation of type 2 diabetes subtypes based on clinical variables: an IMI-RHAPSODY study. Diabetologia 2021; 64:1982-1989. [PMID: 34110439] [PMCID: PMC8382625] [DOI: 10.1007/s00125-021-05490-8] [Citation(s) in RCA: 44]
Abstract
AIMS/HYPOTHESIS Five clusters based on clinical characteristics have been suggested as diabetes subtypes: one autoimmune and four subtypes of type 2 diabetes. In the current study we replicate and cross-validate these type 2 diabetes clusters in three large cohorts using variables readily measured in the clinic. METHODS In three independent cohorts, in total 15,940 individuals were clustered based on age, BMI, HbA1c, random or fasting C-peptide, and HDL-cholesterol. Clusters were cross-validated against the original clusters based on HOMA measures. In addition, between cohorts, clusters were cross-validated by re-assigning people based on each cohort's cluster centres. Finally, we compared the time to insulin requirement for each cluster. RESULTS Five distinct type 2 diabetes clusters were identified and mapped back to the original four All New Diabetics in Scania (ANDIS) clusters. Using C-peptide and HDL-cholesterol instead of HOMA2-B and HOMA2-IR, three of the clusters mapped with high sensitivity (80.6-90.7%) to the previously identified severe insulin-deficient diabetes (SIDD), severe insulin-resistant diabetes (SIRD) and mild obesity-related diabetes (MOD) clusters. The previously described ANDIS mild age-related diabetes (MARD) cluster could be mapped to the two milder groups in our study: one characterised by high HDL-cholesterol (mild diabetes with high HDL-cholesterol [MDH] cluster), and the other not having any extreme characteristic (mild diabetes [MD]). When these two milder groups were combined, they mapped well to the previously labelled MARD cluster (sensitivity 79.1%). In the cross-validation between cohorts, particularly the SIDD and MDH clusters cross-validated well, with sensitivities ranging from 73.3% to 97.1%. SIRD and MD showed a lower sensitivity, ranging from 36.1% to 92.3%, where individuals shifted from SIRD to MD and vice versa. 
People belonging to the SIDD cluster showed the fastest progression towards insulin requirement, while the MDH cluster showed the slowest progression. CONCLUSIONS/INTERPRETATION Clusters based on C-peptide instead of HOMA2 measures resemble those based on HOMA2 measures, especially for SIDD, SIRD and MOD. By adding HDL-cholesterol, the MARD cluster based upon HOMA2 measures resulted in the current clustering into two clusters, with one cluster having high HDL levels. Cross-validation between cohorts showed generally a good resemblance between cohorts. Together, our results show that the clustering based on clinical variables readily measured in the clinic (age, HbA1c, HDL-cholesterol, BMI and C-peptide) results in informative clusters that are representative of the original ANDIS clusters and stable across cohorts. Adding HDL-cholesterol to the clustering resulted in the identification of a cluster with very slow glycaemic deterioration.
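The between-cohort cross-validation step — re-assigning one cohort's individuals using another cohort's cluster centres — can be sketched as follows (toy two-dimensional simulated data; the study clustered on five clinical variables and used a different clustering pipeline):

```python
import numpy as np
from itertools import permutations
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
centres_true = np.array([[0, 0], [6, 0], [0, 6]], dtype=float)

def simulate_cohort(n):
    """Individuals drawn around three well-separated subtype centres."""
    labels = rng.integers(0, 3, n)
    return centres_true[labels] + rng.standard_normal((n, 2))

cohort_a = simulate_cohort(500)
cohort_b = simulate_cohort(500)

km_a = KMeans(n_clusters=3, n_init=10, random_state=0).fit(cohort_a)
km_b = KMeans(n_clusters=3, n_init=10, random_state=0).fit(cohort_b)

# Re-assign cohort B's individuals to cohort A's cluster centres, then
# compare with B's own clustering (maximizing over label permutations,
# since cluster indices are arbitrary).
reassigned = km_a.predict(cohort_b)
own = km_b.labels_
agreement = max(float(np.mean(np.array(p)[own] == reassigned))
                for p in permutations(range(3)))
```

High agreement (like the 73-97% sensitivities reported for SIDD and MDH) indicates that the clusters are stable structures rather than artifacts of one cohort.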
research-article
25. He B, Dai C, Lang J, Bing P, Tian G, Wang B, Yang J. A machine learning framework to trace tumor tissue-of-origin of 13 types of cancer based on DNA somatic mutation. Biochim Biophys Acta Mol Basis Dis 2020; 1866:165916. [PMID: 32771416] [DOI: 10.1016/j.bbadis.2020.165916] [Citation(s) in RCA: 42]
Abstract
Carcinoma of unknown primary (CUP), defined as metastatic cancer with an unknown site of origin, occurs in 3-5 per 100 cancer patients in the United States. Tumor heterogeneity and metastasis make follow-up diagnosis and treatment of CUP difficult. Multiple methods have been proposed to find the tissue-of-origin (TOO) of a CUP. However, the accuracies of computed tomography (CT) and positron emission tomography (PET) in identifying TOO are 20%-27% and 24%-40%, respectively, which is insufficient for determining targeted therapies. In this study, we provide a machine learning framework to trace tumor tissue origin using gene length-normalized somatic mutation sequencing data. Somatic mutation data were downloaded from the Data Portal (Release 28) of the International Cancer Genome Consortium (ICGC), and 4909 samples across 13 cancers were used to identify the primary site of the cancers. Optimal results were obtained on a 600-gene set using the random forest algorithm with 10-fold cross-validation, with an average accuracy of 0.8822 and an F1-score of 0.8886 across the 13 types of cancer. In conclusion, we provide an effective computational framework to infer cancer tissue-of-origin by combining DNA sequencing and machine learning techniques, which is promising in assisting the clinical diagnosis of cancers.
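The final classification step can be sketched as follows (simulated per-gene mutation counts with tissue-specific rates; the actual study used ~600 selected genes, 4909 ICGC samples, and gene-length normalization, all omitted here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_samples, n_genes, n_tissues = 400, 50, 4
tissue = rng.integers(0, n_tissues, n_samples)

# Each tissue type has its own mean mutation rate per gene; observed
# counts are Poisson draws around those rates.
rates = rng.gamma(2.0, 1.0, (n_tissues, n_genes))
X = rng.poisson(rates[tissue])

# Random forest with 10-fold cross-validation, as in the paper's setup.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, tissue, cv=10)
mean_acc = scores.mean()
```

Because tissues differ in their per-gene mutation profiles, the forest can recover the tissue label from the mutation vector alone, which is the premise behind tracing a CUP's tissue-of-origin.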
Research Support, Non-U.S. Gov't