1
Yoneoka D, Omae K, Henmi M, Eguchi S. Area under the curve-optimized synthesis of prediction models from a meta-analytical perspective. Res Synth Methods 2023; 14:234-246. [PMID: 36424356 DOI: 10.1002/jrsm.1612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2021] [Revised: 08/31/2022] [Accepted: 11/07/2022] [Indexed: 11/27/2022]
Abstract
The number of clinical prediction models sharing the same prediction task has increased in the medical literature. However, evidence synthesis methodologies that use the results of these prediction models have not been sufficiently studied, particularly in meta-analysis settings where only summary statistics are available. In particular, we consider the following situation: we want to predict an outcome Y that is not included in our current data, while the covariate data are fully available. In addition, the summary statistics from prior studies, which share the same prediction task (i.e., the prediction of Y), are available. This study introduces a new method for synthesizing the summary results of binary prediction models reported in the prior studies using a linear predictor under a distributional assumption between the current and prior studies. The method provides an integrated predictor combining all predictors reported in the prior studies with weights. The vector of the weights is designed to achieve the hypothetical improvement of area under the receiver operating characteristic curve (AUC) on the currently available data under a practical situation where there are different sets of covariates in the prior studies. We observe a counterintuitive aspect in typical situations where some of the weight components in the proposed method become negative. This implies that flipping the sign of the prediction results reported in an individual study would improve the overall prediction performance. Finally, numerical experiments and real-world data analyses showed that our method outperformed conventional methods in terms of AUC.
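The remark about negative weights can be illustrated with a minimal sketch. It is not the authors' synthesis method; it only shows, on hypothetical toy data, that a predictor whose empirical AUC is below 0.5 gains from a sign flip, because without tied scores the AUC of the negated score is one minus the original AUC. The function name `auc` and the data are assumptions made for illustration.

```python
from itertools import product


def auc(scores, labels):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for p, n in product(pos, neg):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(pos) * len(neg))


# Hypothetical toy data: this predictor ranks most negatives above positives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.2, 0.4, 0.1, 0.9, 0.3, 0.8]

auc_original = auc(scores, labels)
# Multiplying the score by -1 corresponds to a negative weight.
auc_flipped = auc([-s for s in scores], labels)

print(f"original AUC: {auc_original:.3f}, flipped AUC: {auc_flipped:.3f}")
```

In a weighted combination of several predictors, a negative weight plays the same role for the component it multiplies.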
Affiliation(s)
- Daisuke Yoneoka
- Infectious Disease Surveillance Center, National Institute of Infectious Diseases, Tokyo, Japan
- Katsuhiro Omae
- Department of Data Science, National Cerebral and Cardiovascular Center, Osaka, Japan
- Shinto Eguchi
- The Institute of Statistical Mathematics, Tokyo, Japan
2
Hao L, Huang G. An improved AdaBoost algorithm for identification of lung cancer based on electronic nose. Heliyon 2023; 9:e13633. [PMID: 36915521 PMCID: PMC10006450 DOI: 10.1016/j.heliyon.2023.e13633] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/26/2022] [Revised: 02/01/2023] [Accepted: 02/06/2023] [Indexed: 02/23/2023] Open
Abstract
This research developed an improved intelligent enhancement learning algorithm based on AdaBoost that can be applied to lung cancer breath detection by electronic nose (eNose). First, breath signals were collected by eNose from volunteers, including healthy individuals and people with lung cancer. Next, the signals' features were extracted and optimized. Then, multiple sub-classifiers were obtained, and their coefficients were derived from the training error. To improve generalization performance, K-fold cross-validation was used when constructing each sub-classifier. The prediction results of a sub-classifier on the test set were then obtained by the voting method. Thus, an improved AdaBoost classifier was built through heterogeneous integration. The results show that the average precision of the improved classifier for distinguishing between people with lung cancer and healthy individuals could reach 98.47%, with 98.33% sensitivity and 97% specificity. In 100 independent and randomized tests, the coefficient of variation of the classifier's performance hardly exceeded 4%. Compared with other integrated algorithms, the improved classifier showed better generalization and stability. The improved AdaBoost algorithm may help screen for lung cancer more comprehensively. Additionally, it will significantly advance the use of eNose in the early identification of lung cancer.
Affiliation(s)
- Lijun Hao
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China; Medical Instrumentation College, Shanghai University of Medicine and Health Sciences, Shanghai, 201318, China
- Gang Huang
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China; Shanghai Key Laboratory of Molecular Imaging, Jiading District Central Hospital Affiliated Shanghai University of Medicine and Health Sciences, Shanghai, 201318, China
3
Zhuo C, Chen G, Lin C, Jia F, Yang L, Zhang Q, Chen J, Tian H, Jiang D. A borderline personality assessment for adolescents: Validity and reliability of the Chinese languages borderline personality features scale (short form version) for adolescents/children. Front Psychiatry 2022; 13:1050559. [PMID: 36590618 PMCID: PMC9798434 DOI: 10.3389/fpsyt.2022.1050559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Accepted: 11/21/2022] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Borderline personality disorder (BPD) is characterized by behavioral patterns that promote suffering in many adolescents and their guardians. Currently, early diagnosis of BPD mainly depends on the effective assessment of pathological personality traits (i.e., borderline personality features) using the indicated scales. The Borderline Personality Features Scale for Children-Short Form (BPFSC-SF) is widely used, and the introduction of a Chinese version of the BPFSC-SF can improve the diagnosis and prognosis of Chinese patients with BPD. OBJECTIVE The aim of the present study was to assess the validity and reliability of the Chinese version of the BPFSC-SF. METHOD A total of 120 adolescents with BPD were enrolled in the present study and completed the BPFSC-SF and the Personality Belief Questionnaire-Short Form (PBQ-SF) assessments. Confirmatory factor analysis (CFA) was used to test assessment validity. Test-retest correlations and Cronbach's α coefficients were used to determine reliability. RESULTS CFA identified primary factors of BPFSC, with each item ranging from 0.597 to 0.899. The Spearman rank correlation coefficient was 0.877 between CL-BFSFC-SF and the state vs. trait loneliness scale. The Cronbach's α of the scale was 0.854 in the clinical group. The test-retest reliability correlation coefficient (interclass correlation coefficient, ICC) was 0.937. CONCLUSION The Chinese version of BPFSC-SF is a valid and reliable tool for adolescent Chinese patients with BPD.
Affiliation(s)
- Chuanjun Zhuo
- Department of Psychiatry, Wenzhou Seventh Peoples Hospital, Wenzhou, China; Department of Psychiatry, Tianjin Fourth Center Hospital, Nankai University Affiliated Tianjin Fourth Center Hospital, Tianjin, China; PNGC_Lab, Tianjin Anding Hospital, Tianjin Mental Health Center of Tianjin Medical University, Tianjin, China
- Guangdong Chen
- Department of Psychiatry, Wenzhou Seventh Peoples Hospital, Wenzhou, China
- Chongguang Lin
- Department of Psychiatry, Wenzhou Seventh Peoples Hospital, Wenzhou, China
- Feng Jia
- PNGC_Lab, Tianjin Anding Hospital, Tianjin Mental Health Center of Tianjin Medical University, Tianjin, China
- Lei Yang
- Department of Psychiatry, Tianjin Fourth Center Hospital, Nankai University Affiliated Tianjin Fourth Center Hospital, Tianjin, China
- Qiuyu Zhang
- Department of Psychiatry, Tianjin Fourth Center Hospital, Nankai University Affiliated Tianjin Fourth Center Hospital, Tianjin, China
- Jiayue Chen
- Department of Psychiatry, Tianjin Fourth Center Hospital, Nankai University Affiliated Tianjin Fourth Center Hospital, Tianjin, China
- Hongjun Tian
- Department of Psychiatry, Tianjin Fourth Center Hospital, Nankai University Affiliated Tianjin Fourth Center Hospital, Tianjin, China
- Deguo Jiang
- Department of Psychiatry, Wenzhou Seventh Peoples Hospital, Wenzhou, China
4
Liu L, Chen X, Wong KC. Early Cancer Detection from Genome-wide Cell-free DNA Fragmentation via Shuffled Frog Leaping Algorithm and Support Vector Machine. Bioinformatics 2021; 37:3099-3105. [PMID: 33837381 DOI: 10.1093/bioinformatics/btab236] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/02/2020] [Revised: 03/19/2021] [Accepted: 04/08/2021] [Indexed: 01/09/2023] Open
Abstract
MOTIVATION Early cancer detection is important for reducing patient mortality. Although machine learning has been widely employed in that context, there are still deficiencies. In this work, we studied different machine learning algorithms for early cancer detection and proposed an Adaptive Support Vector Machine (ASVM) method by synergizing the Shuffled Frog Leaping Algorithm (SFLA) and the Support Vector Machine (SVM). RESULTS As ASVM regulates SVM for parameter adaptation based on data characteristics, the experimental results demonstrated the robust generalization capability of ASVM on different datasets under different settings; for instance, ASVM can enhance the sensitivity by over 10% for early cancer detection compared with SVM. In addition, our proposed ASVM outperformed Grid Search + SVM and Random Search + SVM by significant margins in terms of the area under the ROC curve (AUC) (0.938 vs. 0.922 vs. 0.921). AVAILABILITY The proposed algorithm and dataset are available at https://github.com/ElaineLIU-920/ASVM-for-Early-Cancer-Detection. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Linjing Liu
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Xingjian Chen
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
5
Hayashi K, Eguchi S. The power-integrated discriminant improvement: An accurate measure of the incremental predictive value of additional biomarkers. Stat Med 2019; 38:2589-2604. [PMID: 30859601 DOI: 10.1002/sim.8135] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 12/25/2016] [Revised: 10/24/2018] [Accepted: 02/08/2019] [Indexed: 11/07/2022]
Abstract
The predictive performance of biomarkers is a central concern in biomedical research. This is often evaluated by comparing two statistical models: a "new" model incorporating additional biomarkers and an "old" model without them. In 2008, the integrated discrimination improvement (IDI) was proposed for cases when the response variable is binary, and it is now widely applied as a promising alternative to conventional measures, such as the difference of the area under the receiver operating characteristic curve. However, the IDI can erroneously identify a significant improvement in the new model even if no additional information has been provided by new biomarkers. In order to overcome problems with existing measures, in this study, we propose the power-IDI as a measure of incremental predictive value. Our study explains why the IDI cannot avoid false detection of apparent improvements in a new model and we show that our proposed measure is better able to capture improvements in prediction. Numerical simulations and examples using real empirical data reveal that the power-IDI is not only more powerful but also incurs fewer false detections of improvement.
6
Multi-objective evolutionary algorithm for optimizing the partial area under the ROC curve. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.01.029] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/01/2023]
7
Goldstein BA, Polley EC, Briggs FBS, van der Laan MJ, Hubbard A. Testing the Relative Performance of Data Adaptive Prediction Algorithms: A Generalized Test of Conditional Risk Differences. Int J Biostat 2017; 12:117-29. [PMID: 26529567 DOI: 10.1515/ijb-2015-0014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
Comparing the relative fit of competing models can be used to address many different scientific questions. In classical statistics one can, if appropriate, use likelihood ratio tests and information-based criteria, whereas clinical medicine has tended to rely on comparisons of fit metrics like C-statistics. However, for many data adaptive modelling procedures such approaches are not suitable. In these cases, statisticians have used cross-validation, which can make inference challenging. In this paper we propose a general approach that focuses on the "conditional" risk difference (conditional on the model fits being fixed) for the improvement in prediction risk. Specifically, we derive a Wald-type test statistic and associated confidence intervals for cross-validated test sets utilizing the independent validation within cross-validation in conjunction with a test for multiple comparisons. We show that this test maintains proper Type I Error under the null fit, and can be used as a general test of relative fit for any semi-parametric model alternative. We apply the test to a candidate gene study to test for the association of a set of genes in a genetic pathway.
8
Narasimhan H, Agarwal S. Support Vector Algorithms for Optimizing the Partial Area under the ROC Curve. Neural Comput 2017; 29:1919-1963. [PMID: 28562216 DOI: 10.1162/neco_a_00972] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 11/04/2022]
Abstract
The area under the ROC curve (AUC) is a widely used performance measure in machine learning. Increasingly, however, in several applications, ranging from ranking to biometric screening to medicine, performance is measured not in terms of the full area under the ROC curve but in terms of the partial area under the ROC curve between two false-positive rates. In this letter, we develop support vector algorithms for directly optimizing the partial AUC between any two false-positive rates. Our methods are based on minimizing a suitable proxy or surrogate objective for the partial AUC error. In the case of the full AUC, one can readily construct and optimize convex surrogates by expressing the performance measure as a summation of pairwise terms. The partial AUC, on the other hand, does not admit such a simple decomposable structure, making it more challenging to design and optimize (tight) convex surrogates for this measure. Our approach builds on the structural SVM framework of Joachims ( 2005 ) to design convex surrogates for partial AUC and solves the resulting optimization problem using a cutting plane solver. Unlike the full AUC, where the combinatorial optimization needed in each iteration of the cutting plane solver can be decomposed and solved efficiently, the corresponding problem for the partial AUC is harder to decompose. One of our main contributions is a polynomial time algorithm for solving the combinatorial optimization problem associated with partial AUC. We also develop an approach for optimizing a tighter nonconvex hinge loss-based surrogate for the partial AUC using difference-of-convex programming. Our experiments on a variety of real-world and benchmark tasks confirm the efficacy of the proposed methods.
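The statement that the full AUC can be written as a summation of pairwise terms, and then bounded by a convex surrogate, can be sketched as follows. This is an illustrative example for the full AUC only, not the structural SVM or cutting-plane method of the paper; the function names and toy scores are assumptions. Each (positive, negative) pair contributes a 0-1 ranking error, and the pairwise hinge loss is a convex upper bound on that error.

```python
from itertools import product


def pairwise_losses(pos_scores, neg_scores):
    """Return (0-1 pairwise ranking error, pairwise hinge surrogate).

    Both are averages over all (positive, negative) pairs. A pair counts as
    an error when the positive does not score strictly above the negative.
    """
    n_pairs = len(pos_scores) * len(neg_scores)
    zero_one = 0.0
    hinge = 0.0
    for p, n in product(pos_scores, neg_scores):
        margin = p - n
        zero_one += 1.0 if margin <= 0 else 0.0
        hinge += max(0.0, 1.0 - margin)  # convex in the scores
    return zero_one / n_pairs, hinge / n_pairs


# Hypothetical scores for three positives and two negatives.
pos_scores = [2.0, 0.5, -0.3]
neg_scores = [0.0, 1.0]

auc_error, hinge_surrogate = pairwise_losses(pos_scores, neg_scores)
print(f"1 - AUC = {auc_error:.3f}, hinge surrogate = {hinge_surrogate:.3f}")
```

Because the hinge term is at least 1 whenever a pair is misordered, minimizing the surrogate pushes the pairwise error down; the partial AUC lacks this simple pairwise decomposition, which is the difficulty the abstract describes.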
Affiliation(s)
- Harikrishna Narasimhan
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, U.S.A.
- Shivani Agarwal
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, U.S.A.
9
Farah K, Smith JET, Cook EP. ROC-based estimates of neural-behavioral covariations using matched filters. Neural Comput 2014; 26:1667-89. [PMID: 24877731 DOI: 10.1162/neco_a_00616] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/04/2022]
Abstract
Correlations between responses in visual cortex and perceptual performance help draw a functional link between neural activity and visually guided behavior. These correlations are commonly derived with ROC-based neural-behavioral covariances (referred to as choice or detect probability) using boxcar analysis windows. Although boxcar windows capture the covariation between neural activity and behavior during steady-state stimulus presentations, they are not optimized to capture these correlations during short time-varying visual inputs. In this study, we implemented a matched-filter technique, combined with cross-validation, to improve the estimation of ROC-based neural-behavioral covariance under short and dynamic stimulus conditions. We show that this approach maximizes the area under the ROC curve and converges to the true neural-behavioral covariance using a Poisson spiking model. We also demonstrate that the matched filter, combined with cross-validation, reveals the dynamics of the neural-behavioral covariations of individual MT neurons during the detection of a brief motion stimulus.
Affiliation(s)
- Kamal Farah
- Department of Electrical and Computer Engineering, McGill University, Montreal, Quebec H3A 0E9, Canada
10
Ma JJ, Ding H, Xu BH, Xu C, Song LJ, Huang BJ, Wang WP. Diagnostic performances of various gray-scale, color Doppler, and contrast-enhanced ultrasonography findings in predicting malignant thyroid nodules. Thyroid 2014; 24:355-63. [PMID: 23978252 DOI: 10.1089/thy.2013.0150] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.8] [Indexed: 11/13/2022]
Abstract
BACKGROUND Ultrasonography is the most frequently used clinical tool for the identification, assessment, and follow-up of thyroid nodules. The purpose of this research was to evaluate the value of diagnostic ultrasonography indicators, to obtain rankings of the most valuable indicators in the differential diagnosis of thyroid nodules, and to analyze the optimal diagnostic points and clinical values. METHODS One hundred forty-four patients with 172 thyroid nodules underwent preoperative ultrasonography examinations, including gray-scale ultrasonography (GSUS), color Doppler ultrasonography (CDUS), and contrast-enhanced ultrasonography (CEUS). Fourteen indicators of thyroid nodules on GSUS, CDUS, and CEUS were selected to evaluate all thyroid nodules. The differences between the benign and malignant thyroid nodules in all indicators were analyzed by the chi-squared test; the diagnostic ultrasonography values were obtained by logistic regression; and the optimal diagnostic points were explored by receiver operating characteristic curve analysis. RESULTS Of the 172 thyroid nodules that were surgically removed, 78 were benign and 94 were malignant. Ten indicators of GSUS and CEUS showed significant differences between the benign and malignant nodules (p<0.05), whereas four CDUS indicators had no value. The rankings of the valuable indicators were obtained according to their odds ratios (ORs). The top four indicators were ring enhancement and homogeneity of enhancement on CEUS, and microcalcification and halo on GSUS. These indicators were the most valuable, with ORs of greater than 20 in the differential diagnosis of benign and malignant thyroid nodules. The other six indicators (the relative arrival time of the nodule on CEUS, interior echogenicity on GSUS, peak interior echogenicity on CEUS, shape on GSUS, peak peripheral echogenicity on CEUS, and orientation on GSUS) were also valuable, with ORs less than 20. The areas under the receiver operating characteristic curves for GSUS, CEUS, and the combination of GSUS and CEUS in the diagnosis of thyroid nodules were 0.936, 0.910, and 0.966, respectively. Five positive features of the 10 valuable indicators on GSUS and CEUS defined the cut-off for the diagnosis of malignant thyroid nodules, with a sensitivity of 89.4% (84/94), specificity of 93.6% (73/78), and accuracy of 91.3% (157/172). CONCLUSIONS The ring enhancement and homogeneity of enhancement of thyroid nodules on CEUS and the microcalcification and halo on GSUS were the four most valuable indicators in the differential diagnosis of thyroid nodules. Conjoint analysis of specific features of thyroid nodules on GSUS and CEUS could enhance the diagnostic value of thyroid nodules.
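The reported sensitivity, specificity, and accuracy follow directly from the counts given in the abstract. The short check below reproduces that arithmetic; the variable names are chosen for illustration, and the false-negative and false-positive counts are derived by subtraction rather than quoted from the source.

```python
# Counts as stated in the abstract: 94 malignant and 78 benign nodules.
malignant_total = 94
benign_total = 78
true_positives = 84  # malignant nodules correctly flagged at the cut-off
true_negatives = 73  # benign nodules correctly cleared at the cut-off

# Derived by subtraction (not stated explicitly in the abstract).
false_negatives = malignant_total - true_positives
false_positives = benign_total - true_negatives

sensitivity = true_positives / malignant_total
specificity = true_negatives / benign_total
accuracy = (true_positives + true_negatives) / (malignant_total + benign_total)

print(f"sensitivity = {sensitivity:.1%}")  # 89.4%
print(f"specificity = {specificity:.1%}")  # 93.6%
print(f"accuracy    = {accuracy:.1%}")     # 91.3%
```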
Affiliation(s)
- Jiao-jiao Ma
- Department of Ultrasound, Zhongshan Hospital, Fudan University, Shanghai, China
11
Hwang KB, Ha BY, Ju S, Kim S. Partial AUC maximization for essential gene prediction using genetic algorithms. BMB Rep 2013; 46:41-6. [PMID: 23351383 PMCID: PMC4133830 DOI: 10.5483/bmbrep.2013.46.1.159] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/04/2022] Open
Abstract
Identifying genes indispensable for an organism's life and their characteristics is one of the central questions in current biological research, and hence it would be helpful to develop computational approaches towards the prediction of essential genes. The performance of a predictor is usually measured by the area under the receiver operating characteristic curve (AUC). We propose a novel method by implementing genetic algorithms to maximize the partial AUC that is restricted to a specific interval of lower false positive rate (FPR), the region relevant to follow-up experimental validation. Our predictor uses various features based on sequence information, protein-protein interaction network topology, and gene expression profiles. A feature selection wrapper was developed to alleviate the over-fitting problem and to weigh each feature's relevance to prediction. We evaluated our method using the proteome of budding yeast. Our implementation of genetic algorithms maximizing the partial AUC below 0.05 or 0.10 of FPR outperformed other popular classification methods. [BMB Reports 2013; 46(1): 41-46]
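The partial AUC restricted to a low-FPR interval can be computed from the empirical ROC step curve. The sketch below is a minimal illustration of that quantity, not the genetic-algorithm optimizer described in the abstract; the function name `partial_auc`, the toy data, and the assumption of no tied scores are all introduced here for illustration.

```python
def partial_auc(scores, labels, max_fpr):
    """Unnormalized area under the empirical ROC curve for FPR in [0, max_fpr].

    Assumes binary labels (1 = positive, 0 = negative) and no tied scores,
    so the ROC curve is a step function.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = 0
    fp = 0
    area = 0.0
    for i in order:
        if labels[i] == 1:
            tp += 1  # the curve steps up
        else:
            # The curve steps right over [fp / n_neg, (fp + 1) / n_neg];
            # keep only the part of that step lying below max_fpr.
            left = fp / n_neg
            right = (fp + 1) / n_neg
            width = max(0.0, min(right, max_fpr) - left)
            area += width * (tp / n_pos)
            fp += 1
    return area


# Hypothetical toy data: three positives and two negatives.
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5]

full_area = partial_auc(scores, labels, max_fpr=1.0)
low_fpr_area = partial_auc(scores, labels, max_fpr=0.5)
print(f"full AUC = {full_area:.3f}, partial AUC (FPR <= 0.5) = {low_fpr_area:.3f}")
```

With `max_fpr=1.0` the function returns the ordinary AUC, so a search procedure that uses the partial area as its fitness value rewards only the ranking among the highest-scoring candidates.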
Affiliation(s)
- Kyu-Baek Hwang
- School of Computer Science and Engineering, Soongsil University, Seoul, Korea