1
Shu Y, Chen Z, Chi J, Cheng S, Li H, Liu P, Luo J. A Machine Learning Method for Differentiation Crohn's Disease and Intestinal Tuberculosis. J Multidiscip Healthc 2024; 17:3835-3847. [PMID: 39135850 PMCID: PMC11318598 DOI: 10.2147/jmdh.s470429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Accepted: 07/29/2024] [Indexed: 08/15/2024]
Abstract
Background Whether machine learning (ML) can assist in differentiating Crohn's disease (CD) from intestinal tuberculosis (ITB) remains to be explored. Methods We collected clinical data on 51 parameters from 241 patients. Six ML methods were tested: logistic regression, decision tree, k-nearest neighbor, multinomial naive Bayes (NB), multilayer perceptron, and XGBoost. SHAP and LIME were then applied as interpretability methods. The ML model was tested in real-world clinical practice and compared with a multidisciplinary team (MDT) meeting. Results XGBoost performed best of the six ML models, with a diagnostic AUROC of 0.946 and an accuracy of 0.884. The top three clinical features affecting the model's predictions were T-spot, pulmonary tuberculosis, and onset age. In clinical practice, the ML model's accuracy, sensitivity, and specificity were 0.860, 0.833, and 0.871, respectively. The agreement rate and kappa coefficient between the ML and MDT methods were 90.7% and 0.780, respectively (P<0.001). Conclusion We developed an ML model based on XGBoost. The model can provide effective and efficient differential diagnoses of ITB and CD together with the basis for each diagnosis. It performs well in real-world clinical practice, and its agreement with the MDT is strong.
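The evaluation figures quoted above (accuracy, sensitivity, specificity, and the ML-versus-MDT kappa) follow standard definitions. As a hedged illustration only, the sketch below computes them from a 2x2 confusion matrix and a 2x2 agreement table; the function names are our own, and the counts in the demo are hypothetical, not the study's data.

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Accuracy, sensitivity and specificity from a 2x2 confusion matrix.

    Which diagnosis counts as "positive" (e.g. ITB vs CD) is an arbitrary choice here.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


def cohen_kappa(table: list[list[int]]) -> float:
    """Cohen's kappa for a square agreement table between two raters.

    table[i][j] counts cases that rater A (e.g. the ML model) labelled i and
    rater B (e.g. the MDT) labelled j. Undefined when expected agreement is 1.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n  # raw agreement rate
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance if the two raters labelled independently.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(diagnostic_metrics(tp=10, fn=2, fp=4, tn=34))
    print(cohen_kappa([[20, 5], [10, 15]]))
```

With the hypothetical agreement table, raw agreement is 0.70 and chance agreement 0.50, giving a kappa of about 0.40; kappa sits below the raw agreement rate because it discounts agreement expected by chance.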
Affiliation(s)
- Yufeng Shu
- Department of Gastroenterology, Third Xiangya Hospital, Central South University, Changsha, Hunan, People’s Republic of China
- Zhe Chen
- Department of Gerontology, The Affiliated Changsha Central Hospital, Hengyang Medical School, University of South China, Changsha, Hunan, People’s Republic of China
- Jingshu Chi
- Department of Gastroenterology, Third Xiangya Hospital, Central South University, Changsha, Hunan, People’s Republic of China
- Sha Cheng
- Department of Gastroenterology, Third Xiangya Hospital, Central South University, Changsha, Hunan, People’s Republic of China
- Huan Li
- Department of Gastroenterology, Third Xiangya Hospital, Central South University, Changsha, Hunan, People’s Republic of China
- Peng Liu
- Department of Gastroenterology, The Affiliated Changsha Central Hospital, Hengyang Medical School, University of South China, Changsha, Hunan, People’s Republic of China
- Ju Luo
- Department of Gerontology, The Affiliated Changsha Central Hospital, Hengyang Medical School, University of South China, Changsha, Hunan, People’s Republic of China
2
Deng S, Wang L, Guan S, Li M, Wang L. Non-parametric Nearest Neighbor Classification Based on Global Variance Difference. INT J COMPUT INT SYS 2023. [DOI: 10.1007/s44196-023-00200-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Abstract
As technology improves, extracting information from vast datasets is becoming more urgent. k-nearest neighbor classifiers are well known to be conceptually simple and easy to implement. They are not without shortcomings, however: (1) they remain sensitive to the choice of k, and representative attributes of each class are not considered; (2) in some cases, the proximity measure cannot accurately reflect the proximity between a test sample and its nearest neighbors. Here, we propose a non-parametric nearest neighbor classification method based on global variance differences. First, the difference in variance before and after adding the sample to be tested is calculated; this difference is then divided by the variance before adding the sample, and the resulting quotient serves as the objective function. Finally, the sample to be tested is assigned to the class with the smallest objective function. We also discuss the theoretical aspects of this function: using the Lagrange method, it can be shown that the objective function is optimal when each class's sample center is its mean. Twelve real datasets from the University of California, Irvine (UCI) repository are used to compare the proposed algorithm with competitors such as the local mean k-nearest neighbor algorithm and the pseudo-nearest neighbor algorithm. In a comprehensive experimental study, the average accuracy on the 12 datasets reaches 86.27%, which is far higher than that of the other algorithms. The experimental findings verify that the proposed algorithm produces more dependable results than existing algorithms.
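The classification rule described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: we take a class's "global variance" to be the mean squared Euclidean distance of its samples to their centroid, and the names `total_variance` and `gvd_classify` are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def total_variance(points: list[Vector]) -> float:
    """Mean squared Euclidean distance of the points to their centroid.

    This is our assumed definition of a class's "global variance".
    """
    n, dim = len(points), len(points[0])
    centroid = [sum(p[j] for p in points) / n for j in range(dim)]
    return sum(sum((p[j] - centroid[j]) ** 2 for j in range(dim)) for p in points) / n


def gvd_classify(train: dict[str, list[Vector]], query: Vector) -> str:
    """Assign `query` to the class whose variance changes least, relative to
    its original variance, when the query is added to it."""
    scores = {}
    for label, points in train.items():
        before = total_variance(points)
        after = total_variance(list(points) + [query])
        # Objective: (variance after - variance before) / variance before.
        # A tiny constant guards against classes with zero variance.
        scores[label] = (after - before) / (before + 1e-12)
    return min(scores, key=scores.get)
```

Adding a point near a class centroid can lower that class's variance, so the objective may be negative; the smallest value still wins. Note that the rule has no k to tune.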
3
k-relevance vectors: Considering relevancy beside nearness. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
4
Daley MS, Gever D, Posada-Quintero HF, Kong Y, Chon K, Bolkhovsky JB. Machine Learning Models for the Classification of Sleep Deprivation Induced Performance Impairment During a Psychomotor Vigilance Task Using Indices of Eye and Face Tracking. Front Artif Intell 2021; 3:17. [PMID: 33733136 PMCID: PMC7861325 DOI: 10.3389/frai.2020.00017] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/20/2019] [Accepted: 03/13/2020] [Indexed: 11/17/2022]
Abstract
High-risk professions, such as pilots, police officers, and TSA agents, require sustained vigilance over long periods and/or with little sleep. This can impair performance of occupational tasks. Predicting impaired states before performance declines is critical to preventing costly and damaging mistakes. We hypothesize that machine learning models that analyze indices from eye and face tracking technologies can accurately predict impaired states. To test this, we trained 12 types of machine learning algorithms, using five feature selection methods with eye- and face-tracking indices, to predict individual subjects' performance on a psychomotor vigilance task completed at 2-h intervals during a 25-h sleep deprivation protocol. Our results show that (1) indices of eye and face tracking are sensitive to physiological and behavioral changes concomitant with impairment; (2) the feature selection method strongly influences the classification performance of machine learning algorithms; and (3) machine learning models using indices of eye and face tracking can correctly predict whether an individual's performance is “normal” or “impaired” with an accuracy of up to 81.6%. These methods can be used to develop machine learning based systems that prevent operational mishaps due to sleep deprivation by predicting operator impairment from eye- and face-tracking indices.
Affiliation(s)
- Matthew S Daley
- Naval Submarine Medical Research Laboratory, Groton, CT, United States
- David Gever
- Naval Submarine Medical Research Laboratory, Groton, CT, United States
- Hugo F Posada-Quintero
- Department of Biomedical Engineering, University of Connecticut, Storrs, CT, United States
- Youngsun Kong
- Department of Biomedical Engineering, University of Connecticut, Storrs, CT, United States
- Ki Chon
- Department of Biomedical Engineering, University of Connecticut, Storrs, CT, United States
5
6
Sorino P, Caruso MG, Misciagna G, Bonfiglio C, Campanella A, Mirizzi A, Franco I, Bianco A, Buongiorno C, Liuzzi R, Cisternino AM, Notarnicola M, Chiloiro M, Pascoschi G, Osella AR. Selecting the best machine learning algorithm to support the diagnosis of Non-Alcoholic Fatty Liver Disease: A meta learner study. PLoS One 2020; 15:e0240867. [PMID: 33079971 PMCID: PMC7575109 DOI: 10.1371/journal.pone.0240867] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 06/26/2020] [Accepted: 10/03/2020] [Indexed: 02/08/2023]
Abstract
Background & aims Using liver ultrasound (US) scans to diagnose Non-Alcoholic Fatty Liver Disease (NAFLD) incurs costs and overloads waiting lists. We aimed to compare various machine learning algorithms using a meta-learner approach to identify which performs best as a predictor of NAFLD. Methods The study included 2970 subjects: 2920 formed the training set and 50, randomly selected, were used in the test phase, with cross-validation performed. The best predictors were combined to create three models: 1) FLI + glucose + sex + age; 2) AVI + glucose + GGT + sex + age; 3) BRI + glucose + GGT + sex + age. Eight machine learning algorithms were trained with the predictors of each of the three models, and their percent accuracy, variance, and percent weight were compared. Results The SVM algorithm performed best with all models. Model 1 had 68% accuracy, with 1% variance and an algorithm weight of 27.35; Model 2 had 68% accuracy, with 1% variance and an algorithm weight of 33.62; and Model 3 had 77% accuracy, with 1% variance and an algorithm weight of 34.70. Model 2, composed of AVI + glucose + GGT + sex + age, was the best-performing model despite its lower accuracy. Conclusion A machine learning approach can support NAFLD diagnosis and reduce health costs. The SVM algorithm is easy to apply, and the necessary parameters are easily retrieved from databases.
Affiliation(s)
- Paolo Sorino
- Laboratory of Epidemiology and Biostatistics, National Institute of Gastroenterology, “S de Bellis” Research Hospital, Castellana Grotte, Bari, Italy
- Maria Gabriella Caruso
- Laboratory of Nutritional Biochemistry, National Institute of Gastroenterology, “S de Bellis” Research Hospital, Castellana Grotte, Bari, Italy
- Giovanni Misciagna
- Scientific and Ethical Committee, Polyclinic Hospital, University of Bari, Bari, Italy
- Caterina Bonfiglio
- Laboratory of Epidemiology and Biostatistics, National Institute of Gastroenterology, “S de Bellis” Research Hospital, Castellana Grotte, Bari, Italy
- Angelo Campanella
- Laboratory of Epidemiology and Biostatistics, National Institute of Gastroenterology, “S de Bellis” Research Hospital, Castellana Grotte, Bari, Italy
- Antonella Mirizzi
- Laboratory of Epidemiology and Biostatistics, National Institute of Gastroenterology, “S de Bellis” Research Hospital, Castellana Grotte, Bari, Italy
- Isabella Franco
- Laboratory of Epidemiology and Biostatistics, National Institute of Gastroenterology, “S de Bellis” Research Hospital, Castellana Grotte, Bari, Italy
- Antonella Bianco
- Laboratory of Epidemiology and Biostatistics, National Institute of Gastroenterology, “S de Bellis” Research Hospital, Castellana Grotte, Bari, Italy
- Claudia Buongiorno
- Laboratory of Epidemiology and Biostatistics, National Institute of Gastroenterology, “S de Bellis” Research Hospital, Castellana Grotte, Bari, Italy
- Rosalba Liuzzi
- Laboratory of Epidemiology and Biostatistics, National Institute of Gastroenterology, “S de Bellis” Research Hospital, Castellana Grotte, Bari, Italy
- Anna Maria Cisternino
- Clinical Nutrition Outpatient Clinic, National Institute of Gastroenterology, “S de Bellis” Research Hospital, Castellana Grotte, Bari, Italy
- Maria Notarnicola
- Laboratory of Nutritional Biochemistry, National Institute of Gastroenterology, “S de Bellis” Research Hospital, Castellana Grotte, Bari, Italy
- Marisa Chiloiro
- San Giacomo Hospital Largo S. Veneziani, Monopoli, Bari, Italy
- Giovanni Pascoschi
- Department of Electrical and Information Engineering, Polytechnic of Bari, Bari, Italy
- Alberto Rubén Osella
- Laboratory of Epidemiology and Biostatistics, National Institute of Gastroenterology, “S de Bellis” Research Hospital, Castellana Grotte, Bari, Italy
7
Missing Value Estimation Methods Research for Arrhythmia Classification Using the Modified Kernel Difference-Weighted KNN Algorithms. BIOMED RESEARCH INTERNATIONAL 2020; 2020:7141725. [PMID: 32685521 PMCID: PMC7327608 DOI: 10.1155/2020/7141725] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/27/2020] [Revised: 05/07/2020] [Accepted: 05/09/2020] [Indexed: 12/12/2022]
Abstract
The electrocardiogram (ECG) signal is critical to classifying cardiac arrhythmias with machine learning methods. In practice, ECG datasets often contain multiple missing values due to faults or distortion. Unfortunately, many established classification algorithms require a fully complete matrix as input, so missing data must be imputed to improve classification effectiveness on datasets with a few missing values. In this paper, we compare the main methods for estimating missing values in electrocardiogram data, e.g., the “Zero method”, “Mean method”, “PCA-based method”, and “RPCA-based method”, and then propose a novel KNN-based classification algorithm, a modified kernel difference-weighted KNN classifier (MKDF-WKNN), which is suited to classifying imbalanced datasets. Experimental results on the UCI database indicate that the “RPCA-based method” can successfully handle missing values in the arrhythmia dataset regardless of how many values are missing, and that the proposed MKDF-WKNN is superior to other state-of-the-art algorithms such as KNN, DS-WKNN, DF-WKNN, and KDF-WKNN on imbalanced datasets, where the imbalance affects classification accuracy.
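Of the imputation baselines named above, the “Zero method” and “Mean method” are simple enough to sketch directly. The sketch assumes missing entries are encoded as `None` and imputes column by column; the PCA- and RPCA-based methods and MKDF-WKNN itself are not shown, and the function names are hypothetical.

```python
from statistics import mean
from typing import Optional

Row = list[Optional[float]]


def impute_zero(rows: list[Row]) -> list[list[float]]:
    """'Zero method': replace every missing entry with 0."""
    return [[0.0 if v is None else v for v in row] for row in rows]


def impute_mean(rows: list[Row]) -> list[list[float]]:
    """'Mean method': replace a missing entry with the mean of the observed
    values in the same column (feature)."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumption: an all-missing column falls back to 0.
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if v is None else v for j, v in enumerate(row)] for row in rows]
```

For example, `impute_mean([[1.0, None], [3.0, 4.0], [None, 8.0]])` returns `[[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]`. Both baselines ignore correlations between features, which is the structure that low-rank (PCA- and RPCA-based) methods exploit.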
8
A Novel Distance Metric Based on Differential Evolution. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2019. [DOI: 10.1007/s13369-019-04003-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]
9
Gou J, Qiu W, Yi Z, Xu Y, Mao Q, Zhan Y. A Local Mean Representation-based K-Nearest Neighbor Classifier. ACM T INTEL SYST TEC 2019. [DOI: 10.1145/3319532] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 10/27/2022]
Abstract
The K-nearest neighbor (KNN) classification method, one of the top 10 algorithms in data mining, is a very simple yet effective nonparametric technique for pattern recognition. However, because of its sensitivity to the neighborhood size k, the simple majority vote, and the conventional metric measure, KNN-based classification performance can easily degrade, especially when the training sample is small. In this article, to further improve classification performance and overcome the main issues of KNN-based classification, we propose a local mean representation-based k-nearest neighbor classifier (LMRKNN). In LMRKNN, the categorical k-nearest neighbors of a query sample are first chosen to calculate the corresponding categorical k-local mean vectors; the query sample is then represented by a linear combination of the categorical k-local mean vectors; finally, the class-specific representation-based distances between the query sample and the categorical k-local mean vectors are used to determine the class of the query sample. Extensive experiments on many UCI and KEEL datasets and three popular face databases compare LMRKNN with state-of-the-art KNN-based methods. The experimental results demonstrate that LMRKNN outperforms the competing KNN-based methods, with greater robustness and effectiveness.
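The first step above, forming a local mean from each class's own nearest neighbors, is the core of the simpler local mean k-nearest neighbor (LMKNN) rule on which LMRKNN builds. The sketch below implements only that LMKNN baseline, assigning the query to the class whose single k-local mean is closest; it omits LMRKNN's multiple local means and its linear-representation step, and `lmknn_predict` is a hypothetical name.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def lmknn_predict(train: dict[str, list[Vector]], query: Vector, k: int = 3) -> str:
    """Local mean kNN: per class, average the k nearest members of that class
    and pick the class whose local mean lies closest to the query."""
    best_label, best_dist = None, math.inf
    for label, points in train.items():
        # k nearest neighbours of the query within this class only.
        neighbours = sorted(points, key=lambda p: math.dist(p, query))[:k]
        dim = len(query)
        local_mean = [sum(p[j] for p in neighbours) / len(neighbours) for j in range(dim)]
        d = math.dist(local_mean, query)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

Because the decision uses a per-class local mean rather than a majority vote, a single outlier among a class's k neighbors shifts that mean only partially.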
Affiliation(s)
- Jianping Gou
- Jiangsu University, Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, Zhenjiang, PR China
- Wenmo Qiu
- Jiangsu University, Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, Zhenjiang, PR China
- Zhang Yi
- Sichuan University, Chengdu, Sichuan, PR China
- Yong Xu
- Shenzhen Graduate School, Harbin Institute of Technology, Guangdong, PR China
10
Gweon H, Schonlau M, Steiner SH. The k conditional nearest neighbor algorithm for classification and class probability estimation. PeerJ Comput Sci 2019; 5:e194. [PMID: 33816847 PMCID: PMC7924495 DOI: 10.7717/peerj-cs.194] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/10/2018] [Accepted: 04/14/2019] [Indexed: 06/12/2023]
Abstract
The k nearest neighbor (kNN) approach is a simple and effective nonparametric algorithm for classification. One drawback of kNN is that it can only give coarse estimates of class probabilities, particularly for low values of k. To avoid this drawback, we propose a new nonparametric classification method based on nearest neighbors conditional on each class, the k conditional nearest neighbor (kCNN) approach: it calculates the distance between a new instance and its kth nearest neighbor from each class, estimates posterior probabilities of class membership from these distances, and assigns the instance to the class with the largest posterior. We prove that the proposed approach converges to the Bayes classifier as the size of the training data increases. Further, we extend the approach to an ensemble method. Experiments on benchmark data sets show that both the proposed approach and its ensemble version on average outperform kNN, weighted kNN, probabilistic kNN, and two similar algorithms (LMkNN and MLM-kHNN) in terms of error rate. A simulation shows that kCNN may be useful for estimating posterior probabilities when the class distributions overlap.
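The three steps of kCNN (per-class kth-neighbor distance, posterior estimate, argmax) can be sketched as follows. The paper derives its own posterior estimator; here a simple normalised inverse-distance weighting is substituted as a stand-in assumption, so the probabilities illustrate the mechanics rather than reproduce the published method. `kcnn_posteriors` and `kcnn_predict` are hypothetical names.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def kcnn_posteriors(train: dict[str, list[Vector]], query: Vector, k: int = 1) -> dict[str, float]:
    """Score each class by the distance from `query` to that class's k-th
    nearest member, then normalise inverse distances into probabilities."""
    weights = {}
    for label, points in train.items():
        dists = sorted(math.dist(p, query) for p in points)
        d_k = dists[k - 1]  # requires at least k samples in the class
        # Assumed weighting: the closer the k-th neighbour, the larger the weight.
        weights[label] = 1.0 / (d_k + 1e-12)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def kcnn_predict(train: dict[str, list[Vector]], query: Vector, k: int = 1) -> str:
    """Assign the query to the class with the largest estimated posterior."""
    posteriors = kcnn_posteriors(train, query, k)
    return max(posteriors, key=posteriors.get)
```

Unlike plain kNN with k=1, which can only output probabilities of 0 or 1, this per-class scheme yields graded probabilities even for k=1, which is the coarseness issue the abstract raises.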
11
A novel nearest interest point classifier for offline Tamil handwritten character recognition. Pattern Anal Appl 2019. [DOI: 10.1007/s10044-018-00776-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/27/2022]
12
Performance Comparison of WiFi and UWB Fingerprinting Indoor Positioning Systems. TECHNOLOGIES 2018. [DOI: 10.3390/technologies6010014] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 11/17/2022]
13
14
Ertuğrul ÖF, Tağluk ME. A novel version of k nearest neighbor: Dependent nearest neighbor. Appl Soft Comput 2017. [DOI: 10.1016/j.asoc.2017.02.020] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 10/20/2022]
15
Fault Detection Using the Clustering-kNN Rule for Gas Sensor Arrays. SENSORS 2016; 16:s16122069. [PMID: 27929412 PMCID: PMC5191050 DOI: 10.3390/s16122069] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 10/21/2016] [Revised: 11/28/2016] [Accepted: 11/30/2016] [Indexed: 11/17/2022]
Abstract
The k-nearest neighbour (kNN) rule, which naturally handles possible non-linearity in the data, is introduced to solve the fault detection problem of gas sensor arrays. In traditional kNN-based fault detection methods, the detection of each new test sample involves every sample in the training set. These methods can therefore be computationally intensive when monitoring processes with many variables and training samples, and may be infeasible for real-time monitoring. To address this problem, a novel clustering-kNN rule is presented. The landmark-based spectral clustering (LSC) algorithm, which has low computational complexity, is employed to divide the training set into several clusters. The kNN rule is then applied only within the cluster nearest to the test sample, which improves the efficiency of fault detection by reducing the number of training samples involved in detecting each test sample. The performance of the proposed clustering-kNN rule is verified in numerical simulations with both linear and non-linear models and on a real gas sensor array experimental system with different kinds of faults. The simulation and experimental results demonstrate that the clustering-kNN rule can greatly improve both the accuracy and the efficiency of fault detection and provides an excellent solution for reliable, real-time monitoring of gas sensor arrays.
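The detection step above can be sketched as follows, assuming the training set has already been partitioned into clusters (the paper uses landmark-based spectral clustering; any clustering serves for this sketch) and that a control limit `threshold` has been estimated beforehand from normal training data. The statistic used here, the sum of squared distances to the k nearest neighbors, is a common choice in kNN fault detection; all function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def centroid(points: list[Vector]) -> list[float]:
    """Coordinate-wise mean of a cluster."""
    dim = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dim)]


def knn_statistic(sample: Vector, reference: list[Vector], k: int) -> float:
    """Sum of squared distances from `sample` to its k nearest reference points."""
    sq_dists = sorted(math.dist(sample, p) ** 2 for p in reference)
    return sum(sq_dists[:k])


def clustering_knn_is_faulty(
    sample: Vector, clusters: list[list[Vector]], k: int, threshold: float
) -> bool:
    """Flag `sample` as faulty if its kNN statistic, computed only inside the
    nearest cluster, exceeds the (pre-estimated) control limit."""
    # Only the cluster whose centroid is nearest to the sample is searched,
    # instead of the whole training set.
    nearest = min(clusters, key=lambda c: math.dist(centroid(c), sample))
    return knn_statistic(sample, nearest, k) > threshold
```

For simplicity the centroids are recomputed on every call; a monitoring loop would compute them once after clustering.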
16
Yu B, Song X, Guan F, Yang Z, Yao B. k-Nearest Neighbor Model for Multiple-Time-Step Prediction of Short-Term Traffic Condition. ACTA ACUST UNITED AC 2016. [DOI: 10.1061/(asce)te.1943-5436.0000816] [Citation(s) in RCA: 150] [Impact Index Per Article: 18.8] [Indexed: 11/01/2022]
Affiliation(s)
- Bin Yu
- Professor, Transportation Management College, Dalian Maritime Univ., Dalian 116026, P.R. China
- Xiaolin Song
- Transportation Management College, Dalian Maritime Univ., Dalian 116026, P.R. China
- Feng Guan
- Transportation Management College, Dalian Maritime Univ., Dalian 116026, P.R. China
- Zhiming Yang
- Transportation Management College, Dalian Maritime Univ., Dalian 116026, P.R. China
- Baozhen Yao
- Associate Professor, School of Automotive Engineering, Dalian Univ. of Technology, Dalian 116024, P.R. China (corresponding author)
17
Ulutagay G, Kantarci S. An Extension of Fuzzy L-R Data Classification with Fuzzy OWA Distance. INT J INTELL SYST 2015. [DOI: 10.1002/int.21717] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Affiliation(s)
- Gözde Ulutagay
- Department of Industrial Engineering; Izmir University; Izmir 35350 Turkey
- Suzan Kantarci
- FuDeS Group in Department of Computer Science; Dokuz Eylul University; Izmir 35160 Turkey
18
19
20
Gou J, Zhan Y, Rao Y, Shen X, Wang X, He W. Improved pseudo nearest neighbor classification. Knowl Based Syst 2014. [DOI: 10.1016/j.knosys.2014.07.020] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Indexed: 11/30/2022]
21
Pulse waveform classification using support vector machine with Gaussian time warp edit distance kernel. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2014; 2014:947254. [PMID: 24660022 PMCID: PMC3934457 DOI: 10.1155/2014/947254] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 09/13/2013] [Revised: 12/16/2013] [Accepted: 12/23/2013] [Indexed: 11/18/2022]
Abstract
Advances in signal processing techniques have provided effective tools for quantitative research in traditional Chinese pulse diagnosis. However, because of the inevitable intraclass variations of pulse patterns, automatic classification of pulse waveforms has remained a difficult problem. Using a new elastic metric, the time warp edit distance (TWED), this paper addresses the problem within the support vector machine (SVM) framework by means of a Gaussian TWED kernel function. The proposed method, SVM with the GTWED kernel (GTWED-SVM), is evaluated on a dataset of 2470 pulse waveforms of five distinct patterns. The experimental results show that the proposed method achieves a lower average error rate than current pulse waveform classification methods.
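The TWED recurrence and a Gaussian kernel built on it can be sketched as below. Assumptions: the series are uniformly sampled (timestamps 1, 2, ..., n), the stiffness `nu` and deletion penalty `lam` defaults are arbitrary, and the kernel form exp(-TWED²/(2σ²)) is our choice of Gaussian parameterisation, which may differ from the paper's. Gaussian kernels built on elastic distances are not guaranteed to be positive semi-definite, so the resulting Gram matrix is worth checking before use in an SVM.

```python
import math
from typing import Sequence


def twed(a: Sequence[float], b: Sequence[float], nu: float = 0.001, lam: float = 1.0) -> float:
    """Time warp edit distance for uniformly sampled series (timestamps 1..n)."""
    # Pad both series with a dummy sample at time 0 so real samples start at index 1.
    xa, xb = [0.0, *a], [0.0, *b]
    ta, tb = range(len(xa)), range(len(xb))
    n, m = len(xa), len(xb)
    d = [[math.inf] * m for _ in range(n)]
    d[0][0] = 0.0
    for i in range(1, n):
        for j in range(1, m):
            # Delete a sample of `a`: pay its change in value and time plus the penalty.
            del_a = d[i - 1][j] + abs(xa[i] - xa[i - 1]) + nu * (ta[i] - ta[i - 1]) + lam
            # Delete a sample of `b`, symmetrically.
            del_b = d[i][j - 1] + abs(xb[j] - xb[j - 1]) + nu * (tb[j] - tb[j - 1]) + lam
            # Match: compare current and previous samples of both series, plus time gaps.
            match = (d[i - 1][j - 1] + abs(xa[i] - xb[j]) + abs(xa[i - 1] - xb[j - 1])
                     + nu * (abs(ta[i] - tb[j]) + abs(ta[i - 1] - tb[j - 1])))
            d[i][j] = min(del_a, del_b, match)
    return d[n - 1][m - 1]


def gaussian_twed_kernel(a: Sequence[float], b: Sequence[float], sigma: float = 1.0, **twed_kwargs) -> float:
    """Gaussian kernel on TWED (assumed form exp(-d^2 / (2 sigma^2)))."""
    dist = twed(a, b, **twed_kwargs)
    return math.exp(-dist**2 / (2 * sigma**2))
```

The kernel equals 1 for identical series and decays toward 0 as their TWED grows; `sigma` controls how quickly.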
22
Tomašev N, Radovanović M, Mladenić D, Ivanović M. Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification. INT J MACH LEARN CYB 2012. [DOI: 10.1007/s13042-012-0137-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
23
24
25
26
Abstract
Journal impact factor (IF) manipulation has unhealthy effects on the academic community and is attracting increasing attention from scholars. In this paper, an intelligent method is proposed to identify manipulative self-citation behaviour in journals using pattern recognition. Data on IFs, age distributions of total citations, and numbers of self-citations were collected from Journal Citation Reports (JCR) for 18 journals, including known manipulated journals, over 1998 to 2007. The feature variables of the citation distribution functions of the known manipulated journals were extracted using the k-nearest neighbour classifier, and a feature attribute space was established for pattern recognition. MATLAB was used to process, train, and test the data and to develop a suitable matrix model that can serve as a basis for identifying other manipulated journals. To verify the validity and reliability of this method, the authors randomly collected citation distribution data from several journals in JCR, analysed the verification results, and demonstrated the effectiveness of pattern recognition in this context.
Affiliation(s)
- Guang Yu
- School of Management, Harbin Institute of Technology, P. R. China
- Dong-Hui Yang
- School of Management, Harbin Institute of Technology, P. R. China
- Hui-Xin He
- School of Energy, Harbin Institute of Technology, P. R. China