Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For:	Lavrac N. Selected techniques for data mining in medicine. Artif Intell Med 1999;16:3-23. [PMID: 10225344 DOI: 10.1016/s0933-3657(98)00062-1] [Citation(s) in RCA: 118] [Impact Index Per Article: 4.7] [Indexed: 10/18/2022]


Cited by Other Article(s)

Mishra M, Acharjya DP. A hybridized red deer and rough set clinical information retrieval system for hepatitis B diagnosis. Sci Rep 2024;14:3815. [PMID: 38360918 PMCID: PMC10869783 DOI: 10.1038/s41598-024-53170-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 01/29/2024] [Indexed: 02/17/2024] Open

Abstract

Healthcare is a major concern given the rapidly growing population. Many approaches to improving health have been adopted, such as early disease identification, treatment, and prevention. Knowledge acquisition is therefore essential at different stages of decision-making. One technique to address this problem is to infer knowledge from an information system, which requires multiple steps to extract useful information. Handling uncertainty throughout data analysis is another challenging task. Computer intelligence is a step forward to this end in feature selection, classification, clustering, and the development of clinical information retrieval systems. According to recent studies, swarm optimization is a useful technique for discovering key features while resolving real-world issues. However, it is ineffective in managing uncertainty. Conversely, a rough set helps a decision system generate decision rules, and it does so without any additional information. To assess real-world information systems while managing uncertainty, this research presents a hybrid strategy that combines a rough set with the red deer algorithm. Within the red deer optimization algorithm, the suggested method selects the optimal features according to the rough set degree of dependency. A rough set is then used to determine the decision rules. The efficiency of the suggested model is compared with that of the decision tree algorithm and the conventional rough set. An empirical study on hepatitis illustrates the viability of the proposed approach compared with the decision tree and crisp rough set. The proposed hybridization of rough set and red deer algorithm achieves an accuracy of 91.7%, whereas the decision tree and rough set methods achieve 82.9% and 88.9%, respectively. This suggests that the proposed approach is viable.
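
The degree of dependence that this abstract uses as its feature-selection criterion is, in textbook rough set theory, the share of objects whose condition-attribute values determine the decision unambiguously. The following is a minimal sketch under that textbook definition only. The toy table, attribute names, and function name are hypothetical and are not taken from the hepatitis data, and the red deer search itself is not shown.

```python
from collections import defaultdict


def dependency_degree(rows, condition_attrs, decision_attr):
    """Rough set dependency: |positive region| / |universe|."""
    if not rows:
        raise ValueError("empty decision table")
    # Group objects into equivalence classes by their condition-attribute values.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in condition_attrs)
        classes[key].append(row[decision_attr])
    # The positive region holds the classes whose members share one decision.
    positive = sum(len(d) for d in classes.values() if len(set(d)) == 1)
    return positive / len(rows)


# Hypothetical toy decision table (illustrative values only).
table = [
    {"fatigue": "yes", "bilirubin": "high", "label": "positive"},
    {"fatigue": "yes", "bilirubin": "high", "label": "negative"},
    {"fatigue": "no", "bilirubin": "high", "label": "negative"},
    {"fatigue": "no", "bilirubin": "low", "label": "negative"},
]

print(dependency_degree(table, ["fatigue", "bilirubin"], "label"))
print(dependency_degree(table, ["bilirubin"], "label"))
```

A feature subset with a higher dependency degree leaves fewer objects in conflicting equivalence classes, which is why such a score can serve as a fitness value for a search over subsets.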


Vallée R, Vallée JN, Guillevin C, Lallouette A, Thomas C, Rittano G, Wager M, Guillevin R, Vallée A. Machine learning decision tree models for multiclass classification of common malignant brain tumors using perfusion and spectroscopy MRI data. Front Oncol 2023;13:1089998. [PMID: 37614505 PMCID: PMC10442801 DOI: 10.3389/fonc.2023.1089998] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/08/2022] [Accepted: 07/17/2023] [Indexed: 08/25/2023] Open

Abstract

Background

To investigate the contribution of machine learning decision tree models applied to perfusion and spectroscopy MRI for multiclass classification of lymphomas, glioblastomas, and metastases, and then to bring out the underlying key pathophysiological processes involved in the hierarchization of the decision-making algorithms of the models.

Methods

From 2013 to 2020, 180 consecutive patients with histopathologically proved lymphomas (n = 77), glioblastomas (n = 45), and metastases (n = 58) were included in machine learning analysis after undergoing MRI. The perfusion parameters (rCBVmax, PSRmax) and spectroscopic concentration ratios (lac/Cr, Cho/NAA, Cho/Cr, and lip/Cr) were applied to construct Classification and Regression Tree (CART) models for multiclass classification of these brain tumors. A 5-fold random cross validation was performed on the dataset.

Results

The decision tree model thus constructed successfully classified all 3 tumor types with a performance (AUC) of 0.98 for PCNSLs, 0.98 for GBM and 1.00 for METs. The model accuracy was 0.96 with an RSquare of 0.887. Five rules of classifier combinations were extracted, with predicted probabilities from 0.907 to 0.989 for the end nodes of the decision tree for tumor multiclass classification. In hierarchical order of importance, the root node (Cho/NAA) in the decision tree algorithm was primarily based on the proliferative, infiltrative, and neuronal destructive characteristics of the tumor; the internal node (PSRmax), on tumor tissue capillary permeability characteristics; and the end node (Lac/Cr or Cho/Cr), on tumor energy glycolytic metabolism (Warburg effect) or on membrane lipid tumor metabolism.

Conclusion

Our study shows potential implementation of machine learning decision tree model algorithms based on a hierarchical, convenient, and personalized use of perfusion and spectroscopy MRI data for multiclass classification of these brain tumors.
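
A CART model such as the one described above grows its tree by choosing, at each node, the threshold on one variable that most reduces class impurity. The sketch below shows that single step for one numeric feature. It assumes Gini impurity as the criterion, which the abstract does not state, and the function names and sample values are illustrative rather than taken from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`.

    The threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold can separate identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = ((low + high) / 2, score)
    return best


# Hypothetical ratio values for two tumor classes (not study data).
ratios = [1.0, 2.0, 3.0, 4.0]
classes = ["A", "A", "B", "B"]
print(best_threshold(ratios, classes))
```

A full tree repeats this search over every candidate variable and recurses into each child node until a stopping rule is met.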


Affiliation(s)

Rodolphe Vallée Interdisciplinary Laboratory in Neurosciences, Physiology and Psychology (LINP2), Université Paris Lumière (UPL), Paris Nanterre University, Nanterre, France; Laboratory of Mathematics and Applications (LMA) Centre National de la Recherche Scientifique - Unité Mixte de Recherche (CNRS UMR) 7348, i3M-DACTIM-MIH (Data Analysis and Computations Through Imaging Modeling - Mathematics, Image, Health), Poitiers University, Poitiers, France; Glaucoma Research Center, Swiss Visio Network, Lausanne, Switzerland
Jean-Noël Vallée Laboratory of Mathematics and Applications (LMA) Centre National de la Recherche Scientifique - Unité Mixte de Recherche (CNRS UMR) 7348, i3M-DACTIM-MIH (Data Analysis and Computations Through Imaging Modeling - Mathematics, Image, Health), Poitiers University, Poitiers, France; Diagnostic and Functional Neuroradiology and Brain stimulation Department, 15-20 National Vision Hospital of Paris - Paris University Hospital Center, University of PARIS-SACLAY - UVSQ, Paris, France
Carole Guillevin Laboratory of Mathematics and Applications (LMA) Centre National de la Recherche Scientifique - Unité Mixte de Recherche (CNRS UMR) 7348, i3M-DACTIM-MIH (Data Analysis and Computations Through Imaging Modeling - Mathematics, Image, Health), Poitiers University, Poitiers, France; Radiology Department, Poitiers University Hospital, Poitiers University, Poitiers, France
Athéna Lallouette Center of Genève Ophtalmologie, Geneve, Switzerland
Clément Thomas Laboratory of Mathematics and Applications (LMA) Centre National de la Recherche Scientifique - Unité Mixte de Recherche (CNRS UMR) 7348, i3M-DACTIM-MIH (Data Analysis and Computations Through Imaging Modeling - Mathematics, Image, Health), Poitiers University, Poitiers, France; Diagnostic and Functional Neuroradiology and Brain stimulation Department, 15-20 National Vision Hospital of Paris - Paris University Hospital Center, University of PARIS-SACLAY - UVSQ, Paris, France
Guillaume Rittano Radiology Department, Hopital Riveira Chablais, Rennaz, Switzerland
Michel Wager Neurosurgery Department, Poitiers University Hospital, Poitiers University, Poitiers, France
Rémy Guillevin Laboratory of Mathematics and Applications (LMA) Centre National de la Recherche Scientifique - Unité Mixte de Recherche (CNRS UMR) 7348, i3M-DACTIM-MIH (Data Analysis and Computations Through Imaging Modeling - Mathematics, Image, Health), Poitiers University, Poitiers, France; Radiology Department, Poitiers University Hospital, Poitiers University, Poitiers, France
Alexandre Vallée Department of Epidemiology and Public Health, Foch Hospital, Suresnes, France


Vallée A. Arterial stiffness and biological parameters: A decision tree machine learning application in hypertensive participants. PLoS One 2023;18:e0288298. [PMID: 37418473 DOI: 10.1371/journal.pone.0288298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Accepted: 06/23/2023] [Indexed: 07/09/2023] Open

Abstract

Arterial stiffness, measured by the arterial stiffness index (ASI), could be considered a main denominator in target organ damage among hypertensive subjects. Currently, no normal reference values for ASI have been reported. Arterial stiffness is evaluated by calculating a stiffness index. Predicted ASI can be estimated regardless of age, sex, mean blood pressure, and heart rate, to compose an individual stiffness index [(measured ASI - predicted ASI)/predicted ASI]. A stiffness index greater than zero defines arterial stiffness. Thus, the purpose of this study was 1) to identify determinants of the stiffness index, 2) to establish threshold values to discriminate the stiffness index, and 3) to determine hierarchical associations of the determinants by building a decision tree model among hypertensive participants without CV diseases. A study was conducted among 53,363 healthy participants in the UK Biobank survey to determine predicted ASI. The stiffness index was applied to 49,452 hypertensive participants without CV diseases to discriminate determinants of a positive stiffness index (N = 22,453) from a negative index (N = 26,999). The input variables for the models were clinical and biological parameters. The independent classifiers were ranked from the most sensitive (HDL cholesterol ≤ 1.425 mmol/L, smoking ≥ 9.2 pack-years, phosphate ≥ 1.172 mmol/L) to the most specific (cystatin C ≤ 0.901 mg/L, triglycerides ≥ 1.487 mmol/L, urate ≥ 291.9 μmol/L, ALT ≥ 22.13 U/L, AST ≤ 32.5 U/L, albumin ≤ 45.92 g/L, testosterone ≥ 5.181 nmol/L). A decision tree model was built to derive rules highlighting the hierarchization of and interactions between these classifiers, with higher performance than multiple logistic regression (p < 0.001). The stiffness index could be an integrator of CV risk factors and participate in future CV risk management evaluations for preventive strategies. Decision trees can provide accurate and useful classification for clinicians.
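
The stiffness index defined in this abstract is a relative deviation, so it can be written down directly. The sketch below only restates the bracketed formula and the greater-than-zero rule. The function names and the example values are hypothetical, and how the predicted ASI is obtained is outside its scope.

```python
def stiffness_index(measured_asi, predicted_asi):
    """(measured ASI - predicted ASI) / predicted ASI, as given in the abstract."""
    if predicted_asi <= 0:
        raise ValueError("predicted ASI must be positive")
    return (measured_asi - predicted_asi) / predicted_asi


def has_arterial_stiffness(measured_asi, predicted_asi):
    """A stiffness index greater than zero defines arterial stiffness."""
    return stiffness_index(measured_asi, predicted_asi) > 0


# Hypothetical values: measured ASI 10% above the predicted ASI.
print(stiffness_index(11.0, 10.0))
print(has_arterial_stiffness(9.0, 10.0))
```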


Chen N, Fan F, Geng J, Yang Y, Gao Y, Jin H, Chu Q, Yu D, Wang Z, Shi J. Evaluating the risk of hypertension in residents in primary care in Shanghai, China with machine learning algorithms. Front Public Health 2022;10:984621. [PMID: 36267989 PMCID: PMC9577109 DOI: 10.3389/fpubh.2022.984621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2022] [Accepted: 09/12/2022] [Indexed: 01/25/2023] Open

Abstract

Objective

The prevention of hypertension in primary care requires an effective and suitable hypertension risk assessment model. The aim of this study was to develop and compare the performances of three machine learning algorithms in predicting the risk of hypertension for residents in primary care in Shanghai, China.

Methods

A dataset of 40,261 subjects over the age of 35 years was extracted from Electronic Healthcare Records of 47 community health centers from 2017 to 2019 in the Pudong district of Shanghai. Embedded methods were applied for feature selection. Machine learning algorithms, XGBoost, random forest, and logistic regression analyses were adopted in the process of model construction. The performance of models was evaluated by calculating the area under the receiver operating characteristic curve, sensitivity, specificity, positive predictive value, negative predictive value, accuracy and F1-score.

Results

The XGBoost model outperformed the other two models and achieved an AUC of 0.765 in the testing set. Twenty features were selected to construct the model, including age, diabetes status, urinary protein level, BMI, elderly health self-assessment, creatinine level, systolic blood pressure measured on the upper right arm, waist circumference, smoking status, low-density lipoprotein cholesterol level, high-density lipoprotein cholesterol level, frequency of drinking, glucose level, urea nitrogen level, total cholesterol level, diastolic blood pressure measured on the upper right arm, exercise frequency, time spent engaged in exercise, high salt consumption, and triglyceride level.

Conclusions

XGBoost outperformed random forest and logistic regression in predicting the risk of hypertension in primary care. The integration of this risk assessment model into primary care facilities may improve the prevention and management of hypertension in residents.
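
Apart from the AUC, every evaluation measure listed in the Methods above can be derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions. The function name and the counts are hypothetical and are not results from the study, and the AUC is omitted because it needs ranked scores rather than counts.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(tp, fp, tn, fn):
    """Standard threshold-based metrics from confusion-matrix counts."""
    return {
        "sensitivity": _ratio(tp, tp + fn),  # recall, true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value, precision
        "npv": _ratio(tn, tn + fn),          # negative predictive value
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical counts for 100 test subjects.
for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
    print(f"{name}: {value:.3f}")
```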


Affiliation(s)

Ning Chen School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Feng Fan School of Medicine, Tongji University, Shanghai, China
Jinsong Geng School of Medicine, Nantong University, Nantong, China
Yan Yang School of Economics and Management, Tongji University, Shanghai, China
Ya Gao School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Hua Jin Department of General Practice, Yangpu Hospital, Tongji University School of Medicine, Shanghai, China; Shanghai General Practice and Community Health Development Research Center, Shanghai, China; Academic Department of General Practice, Tongji University School of Medicine, Shanghai, China; Clinical Research Center for General Practice, Tongji University, Shanghai, China
Qiao Chu School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Dehua Yu Department of General Practice, Yangpu Hospital, Tongji University School of Medicine, Shanghai, China; Shanghai General Practice and Community Health Development Research Center, Shanghai, China; Academic Department of General Practice, Tongji University School of Medicine, Shanghai, China; Clinical Research Center for General Practice, Tongji University, Shanghai, China; *Correspondence: Dehua Yu
Zhaoxin Wang The First Affiliated Hospital of Hainan Medical University, Haikou, China; Department of Social Medicine and Health Management, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China; School of Management, Hainan Medical University, Haikou, China
Jianwei Shi Department of General Practice, Yangpu Hospital, Tongji University School of Medicine, Shanghai, China; Shanghai General Practice and Community Health Development Research Center, Shanghai, China; Department of Social Medicine and Health Management, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China


Constructing Explainable Classifiers from the Start—Enabling Human-in-the Loop Machine Learning. Information 2022. [DOI: 10.3390/info13100464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open

Vallée A. Association between serum uric acid and arterial stiffness in a large-aged 40-70 years old population. J Clin Hypertens (Greenwich) 2022;24:885-897. [PMID: 35748644 PMCID: PMC9278596 DOI: 10.1111/jch.14527] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 04/25/2022] [Revised: 05/25/2022] [Accepted: 05/28/2022] [Indexed: 12/24/2022]

Vallée A. Arterial Stiffness Determinants for Primary Cardiovascular Prevention among Healthy Participants. J Clin Med 2022;11:jcm11092512. [PMID: 35566636 PMCID: PMC9105622 DOI: 10.3390/jcm11092512] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/31/2022] [Revised: 04/13/2022] [Accepted: 04/27/2022] [Indexed: 12/27/2022] Open

Ji W, Xue M, Zhang Y, Yao H, Wang Y. A Machine Learning Based Framework to Identify and Classify Non-alcoholic Fatty Liver Disease in a Large-Scale Population. Front Public Health 2022;10:846118. [PMID: 35444985 PMCID: PMC9013842 DOI: 10.3389/fpubh.2022.846118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/30/2021] [Accepted: 02/23/2022] [Indexed: 12/12/2022] Open

Haouassi H, Mahdaoui R, Chouhal O, Bakhouche A. An efficient classification rule generation for coronary artery disease diagnosis using a novel discrete equilibrium optimizer algorithm. Journal of Intelligent & Fuzzy Systems 2022. [DOI: 10.3233/jifs-213257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]

Abstract

Many machine learning-based methods have been widely applied to Coronary Artery Disease (CAD) and are achieving high accuracy. However, they are black-box methods that are unable to explain the reasons behind the diagnosis. The trade-off between accuracy and interpretability of diagnosis models is important, especially for human disease. This work aims to propose an approach for generating rule-based models for CAD diagnosis. Classification rule generation is modeled as a combinatorial optimization problem that can be solved by means of metaheuristic algorithms. Swarm intelligence algorithms like the Equilibrium Optimizer Algorithm (EOA) have demonstrated great performance in solving different optimization problems. The present study proposes a Novel Discrete Equilibrium Optimizer Algorithm (NDEOA) for generating classification rules from a training CAD dataset. The proposed NDEOA is a discrete version of EOA that uses a discrete encoding of a particle to represent a classification rule; new discrete operators are also defined for the particle’s position update equation to adapt the real-valued operators to discrete space. To evaluate the proposed approach, the real-world Z-Alizadeh Sani dataset has been employed. The proposed approach generates a diagnosis model composed of 17 rules: five rules for the class “Normal” and 12 rules for the class “CAD”. In comparison to nine black-box and eight white-box state-of-the-art approaches, the results show that the diagnosis model generated by the proposed approach is more accurate and more interpretable than all white-box models and is competitive with the black-box models. It achieved an overall accuracy, sensitivity and specificity of 93.54%, 80% and 100%, respectively, which shows that the proposed approach can be successfully utilized to generate efficient rule-based CAD diagnosis models.

k-relevance vectors: Considering relevancy beside nearness. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]

Kolose S, Stewart T, Hume P, Tomkinson GR. Prediction of military combat clothing size using decision trees and 3D body scan data. Applied Ergonomics 2021;95:103435. [PMID: 33932688 DOI: 10.1016/j.apergo.2021.103435] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/13/2020] [Revised: 04/06/2021] [Accepted: 04/12/2021] [Indexed: 06/12/2023]

A Noninvasive Prediction Model for Hepatitis B Virus Disease in Patients with HIV: Based on the Population of Jiangsu, China. BioMed Research International 2021;2021:6696041. [PMID: 33860053 PMCID: PMC8024075 DOI: 10.1155/2021/6696041] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/27/2020] [Accepted: 03/17/2021] [Indexed: 02/07/2023]

Abstract

Objective

To establish a machine learning model for identifying patients coinfected with hepatitis B virus (HBV) and human immunodeficiency virus (HIV) through two sexual transmission routes in Jiangsu, China.

Methods

A total of 14197 HIV cases transmitted by homosexual and heterosexual routes were recruited. After data processing, 12469 cases (HIV and HBV, 1033; HIV, 11436) were left for further analysis, including 7849 cases with homosexual transmission and 4620 cases with heterosexual transmission. Univariate logistic regression was used to select variables with significant P value and odds ratio for multivariable analysis. In homosexual transmission and heterosexual transmission groups, 10 and 6 variables were selected, respectively. For identifying HIV individuals coinfected with HBV, a machine learning model was constructed with four algorithms, including Decision Tree, Random Forest, AdaBoost with decision tree (AdaBoost), and extreme gradient boosting decision tree (XGBoost). The detective value of each variable was calculated using the optimal machine learning algorithm.

Results

The AdaBoost algorithm showed the highest efficiency in both transmission groups (homosexual transmission group: accuracy = 0.928, precision = 0.915, recall = 0.944, F1 = 0.930, and AUC = 0.96; heterosexual transmission group: accuracy = 0.892, precision = 0.881, recall = 0.905, F1 = 0.893, and AUC = 0.98). Calculated by the AdaBoost algorithm, the detective value of PLA was the highest in the homosexual transmission group, followed by CR, AST, HB, ALT, TBIL, leucocyte, age, marital status, and treatment condition; in the heterosexual transmission group, the detective value of PLA was the highest (consistent with the condition in the homosexual group), followed by ALT, AST, TBIL, leucocyte, and symptom severity.

Conclusions

Univariate logistic regression combined with the AdaBoost algorithm could accurately screen the risk factors of HBV in HIV coinfection without invasive testing. Further studies are needed to evaluate the utility and feasibility of this model in various settings.


de Barros ADMC, Silva AFR, Zibordi M, Spagnolo JD, Corrêa RR, Belli CB, de Camargo MM. Equine simplified acute physiology score: Personalised medicine for the equine emergency patient. Vet Rec 2021;189:e136. [PMID: 33729604 DOI: 10.1002/vetr.136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2020] [Revised: 12/11/2020] [Accepted: 01/26/2021] [Indexed: 11/12/2022]

Prediction of Important Factors for Bleeding in Liver Cirrhosis Disease Using Ensemble Data Mining Approach. Mathematics 2020. [DOI: 10.3390/math8111887] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/12/2022]

AlKaabi LA, Ahmed LS, Al Attiyah MF, Abdel-Rahman ME. Predicting hypertension using machine learning: Findings from Qatar Biobank Study. PLoS One 2020;15:e0240370. [PMID: 33064740 PMCID: PMC7567367 DOI: 10.1371/journal.pone.0240370] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 04/22/2020] [Accepted: 09/08/2020] [Indexed: 12/14/2022] Open

Abstract

Background and objective

Hypertension, a global burden, is associated with several risk factors and can be treated by lifestyle modifications and medications. Prediction and early diagnosis are important to prevent related health complications. The objective is to construct and compare predictive models to identify individuals at high risk of developing hypertension without the need for invasive clinical procedures.

Methods

This is a cross-sectional study using 987 records of Qataris and long-term residents aged 18+ years from Qatar Biobank. Percentages were used to summarize data and chi-square tests to assess associations. Predictive models of hypertension were constructed and compared using three supervised machine learning algorithms (decision tree, random forest, and logistic regression) with 5-fold cross-validation. The performance of the algorithms was assessed using accuracy, positive predictive value (PPV), sensitivity, F-measure, and area under the receiver operating characteristic curve (AUC). Stata and Weka were used for analysis.

Results

Age, gender, education level, employment, tobacco use, physical activity, adequate consumption of fruits and vegetables, abdominal obesity, history of diabetes, history of high cholesterol, and mother’s history of high blood pressure were important predictors of hypertension. All algorithms showed more or less similar performances: random forest (accuracy = 82.1%, PPV = 81.4%, sensitivity = 82.1%), logistic regression (accuracy = 81.1%, PPV = 80.1%, sensitivity = 81.1%) and decision tree (accuracy = 82.1%, PPV = 81.2%, sensitivity = 82.1%). In terms of AUC, compared to logistic regression, while random forest performed similarly, decision tree had a significantly lower discrimination ability (p-value < 0.05), with AUCs equal to 85.0, 86.9, and 79.9, respectively.

Conclusions

Machine learning provides the chance of having a rapid predictive model using non-invasive predictors to screen for hypertension. Future research should consider improving the predictive accuracy of models in larger general populations, including more important predictors and using a variety of algorithms.
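
The 5-fold cross-validation named in the Methods above partitions the records into five disjoint folds and holds out each fold once for testing. The sketch below shows only that index bookkeeping for 987 records. It is a plain, unstratified illustration with a hypothetical function name and an arbitrary seed, and it is not a reproduction of how Weka or Stata build their folds.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k disjoint folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for f, fold in enumerate(folds) if f != held_out for idx in fold]
        yield train, test


for fold_number, (train, test) in enumerate(k_fold_indices(987, k=5), start=1):
    print(fold_number, len(train), len(test))
```

Each record appears in exactly one test fold, so the averaged score uses every record for testing once and never tests on data seen in training for that fold.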


Design of an integrated model for diagnosis and classification of pediatric acute leukemia using machine learning. Proc Inst Mech Eng H 2020;234:1051-1069. [DOI: 10.1177/0954411920938567] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 11/15/2022]

Benmouna Y, Mezmaz MS, Mahmoudi S, Chikh MA. Parallel cycle-based branch-and-bound method for Bayesian network learning. Pattern Anal Appl 2020. [DOI: 10.1007/s10044-019-00815-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]

Geldof T, Van Damme N, Huys I, Van Dyck W. Patient-Level Effectiveness Prediction Modeling for Glioblastoma Using Classification Trees. Front Pharmacol 2020;10:1665. [PMID: 32116674 PMCID: PMC7025482 DOI: 10.3389/fphar.2019.01665] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/20/2019] [Accepted: 12/19/2019] [Indexed: 12/18/2022] Open

Abstract

Objectives

Little research has been done in pharmacoepidemiology on the use of machine learning for exploring medicinal treatment effectiveness in oncology. Therefore, the aim of this study was to explore the added value of machine learning methods to investigate individual treatment responses for glioblastoma patients treated with temozolomide.

Methods

Based on a retrospective observational registry covering 3090 patients with glioblastoma treated with temozolomide, we proposed the use of a two-step iterative exploratory learning process consisting of an initialization phase and a machine learning phase. For initialization, we defined a binary response variable as the target label using one-by-one nearest neighbor propensity score matching. Secondly, a classification tree algorithm was trained and validated for dividing individual patients into treatment response and non-response groups. Theorizing about treatment response was then done by evaluating the tree performance.

Results

The classification tree model has an area under the curve (AUC) classification performance of 67% corresponding to a sensitivity of 0.69 and a specificity of 0.51. This result in predicting patient-level response was slightly better than the logistic regression model featuring an AUC of 64% (0.63 sensitivity and 0.54 specificity). The tree confirms confounding by age and discovers further age-related stratification with chemotherapy-treatment dependency, both not revealed in preceding clinical studies. The model lacked genetic information confounding treatment response.

Conclusions

A classification tree was found to be suitable for understanding patient-level effectiveness for this glioblastoma–temozolomide case because of its high interpretability and capability to deal with covariate interdependencies, essential in a real-world environment. Possible improvements in the model’s classification can be achieved by including genetic information and collecting primary data on treatment response. The model can be valuable in clinical practice for predicting personal treatment pathways.
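
The initialization phase in the Methods above relies on nearest neighbor propensity score matching. The sketch below reads "one-by-one" as one-to-one matching and uses a greedy pass without replacement, both of which are assumptions rather than details confirmed by the abstract. It also assumes the propensity scores have already been estimated, and the identifiers, scores, and optional caliper are hypothetical.

```python
def match_nearest(treated, controls, caliper=None):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    `treated` and `controls` map subject ids to propensity scores.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)
    pairs = []
    # Process treated subjects in ascending score order for determinism.
    for t_id, t_score in sorted(treated.items(), key=lambda item: item[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if caliper is not None and abs(available[c_id] - t_score) > caliper:
            continue  # no control close enough; leave this subject unmatched
        pairs.append((t_id, c_id))
        del available[c_id]  # each control is used at most once
    return pairs


# Hypothetical propensity scores.
treated = {"t1": 0.30, "t2": 0.70}
controls = {"c1": 0.28, "c2": 0.65, "c3": 0.90}
print(match_nearest(treated, controls))
print(match_nearest(treated, controls, caliper=0.03))
```

Greedy matching depends on processing order, so other matching schemes can return different pairs from the same scores.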


Strong approximate Markov blanket and its application on filter-based feature selection. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2019.105957] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/23/2022]

AlMuhaideb S, Alswailem O, Alsubaie N, Ferwana I, Alnajem A. Prediction of hospital no-show appointments through artificial intelligence algorithms. Ann Saudi Med 2019;39:373-381. [PMID: 31804138 PMCID: PMC6894458 DOI: 10.5144/0256-4947.2019.373] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 11/22/2022] Open

Di Noia A, Martino A, Montanari P, Rizzi A. Supervised machine learning techniques and genetic optimization for occupational diseases risk prediction. Soft comput 2019. [DOI: 10.1007/s00500-019-04200-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 11/28/2022]

Itani S, Rossignol M, Lecron F, Fortemps P. Towards interpretable machine learning models for diagnosis aid: A case study on attention deficit/hyperactivity disorder. PLoS One 2019;14:e0215720. [PMID: 31022245 PMCID: PMC6483231 DOI: 10.1371/journal.pone.0215720] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/27/2018] [Accepted: 04/09/2019] [Indexed: 12/31/2022] Open

Pei D, Gong Y, Kang H, Zhang C, Guo Q. Accurate and rapid screening model for potential diabetes mellitus. BMC Med Inform Decis Mak 2019;19:41. [PMID: 30866905 PMCID: PMC6416888 DOI: 10.1186/s12911-019-0790-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/28/2018] [Accepted: 03/03/2019] [Indexed: 11/26/2022] Open

Abstract

Background

Prediction or early diagnosis of diabetes is crucial for populations with high risk of diabetes.

Methods

In this study, we assessed the ability of five popular classifiers (J48, AdaboostM1, SMO, Bayes Net, and Naïve Bayes) to identify individuals with diabetes based on nine non-invasive and easily obtained clinical features, including age, gender, body mass index (BMI), hypertension, history of cardiovascular disease or stroke, family history of diabetes, physical activity, work stress, and salty food preference. A total of 4205 data entries were obtained from annual physical examination reports for adults in the Shengjing Hospital of China Medical University during January–April 2017. Weka data mining software was used to identify the best algorithm for diabetes classification.

Results

The results indicate that decision tree classifier J48 has the best performance (accuracy = 0.9503, precision = 0.950, recall = 0.950, F-measure = 0.948, and AUC = 0.964). The decision tree structure shows that age is the most significant feature, followed by family history of diabetes, work stress, BMI, salty food preference, physical activity, hypertension, gender, and history of cardiovascular disease or stroke.
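As a reminder of how the reported measures relate to one another, the following is a minimal sketch that derives accuracy, precision, recall, and F-measure from a binary confusion matrix. The counts are hypothetical and are not taken from the study; the function name is an assumption made for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification measures from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }

# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=45, fp=5, fn=15, tn=35)
```

Weka reports class-weighted averages of these measures, so a per-class computation like this one would need to be averaged across classes to be comparable.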

Conclusions

Our study shows that decision tree analyses can be applied to screen individuals for early diabetes risk without the need for invasive tests. This procedure will be particularly useful in developing regions with high epidemiological risk and poor socioeconomic status, and enable clinical practitioners to rapidly screen patients for increased risk of diabetes. The key features in the tree structure could further facilitate diabetes prevention through targeted community interventions, which can potentially improve early diabetes diagnosis and reduce burdens on the healthcare system.

A rule-based semantic approach for data integration, standardization and dimensionality reduction utilizing the UMLS: Application to predicting bariatric surgery outcomes. Comput Biol Med 2019;106:84-90. [DOI: 10.1016/j.compbiomed.2019.01.019] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/26/2018] [Revised: 01/21/2019] [Accepted: 01/21/2019] [Indexed: 11/24/2022]

Parrales Bravo F, Del Barrio García AA, Gallego MM, Gago Veiga AB, Ruiz M, Guerrero Peral A, Ayala JL. Prediction of patient's response to OnabotulinumtoxinA treatment for migraine. Heliyon 2019;5:e01043. [PMID: 30886915 PMCID: PMC6401533 DOI: 10.1016/j.heliyon.2018.e01043] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/09/2018] [Revised: 05/15/2018] [Accepted: 12/10/2018] [Indexed: 01/03/2023] Open

Mining Compact Predictive Pattern Sets Using Classification Model. Artif Intell Med 2019;11526:386-396. [DOI: 10.1007/978-3-030-21642-9_49] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]

Vijayakumar R, Cheung MWL. Replicability of Machine Learning Models in the Social Sciences. ZEITSCHRIFT FUR PSYCHOLOGIE-JOURNAL OF PSYCHOLOGY 2018. [DOI: 10.1027/2151-2604/a000344] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/23/2022]

Abstract

Machine learning tools are increasingly used in social sciences and policy fields due to their increase in predictive accuracy. However, little research has been done on how well the models of machine learning methods replicate across samples. We compare machine learning methods with regression on the replicability of variable selection, along with predictive accuracy, using an empirical dataset as well as simulated data with additive, interaction, and non-linear squared terms added as predictors. Methods analyzed include support vector machines (SVM), random forests (RF), multivariate adaptive regression splines (MARS), and the regularized regression variants, least absolute shrinkage and selection operator (LASSO), and elastic net. In simulations with additive and linear interactions, machine learning methods performed similarly to regression in replicating predictors; they also performed mostly equal to or below regression on measures of predictive accuracy. In simulations with square terms, machine learning methods SVM, RF, and MARS improved predictive accuracy and replicated predictors better than regression. Thus, in simulated datasets, the gap between machine learning methods and regression on predictive measures foreshadowed the gap in variable selection. In replications on the empirical dataset, however, improved prediction by machine learning methods was not accompanied by a visible improvement in replicability in variable selection. This disparity is explained by the overall explanatory power of the models. When predictors have small effects and noise predominates, improved global measures of prediction in a sample by machine learning methods may not lead to the robust selection of predictors; thus, in the presence of weak predictors and noise, regression remains a useful tool for model building and replication.

Evaluating of associated risk factors of metabolic syndrome by using decision tree. Comp Clin Pathol 2017. [DOI: 10.1007/s00580-017-2580-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/29/2023]

K G, C R. Heuristic Classifier for Observe Accuracy of Cancer Polyp Using Video Capsule Endoscopy. Asian Pac J Cancer Prev 2017;18:1681-1688. [PMID: 28670889 PMCID: PMC6373793 DOI: 10.22034/apjcp.2017.18.6.1681] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 02/07/2023] Open

Abstract

Background: Colonoscopy is a technique for examining the colon for cancer and polyps. In endoscopy, the video capsule is a universally used mechanism for finding gastrointestinal stages. Both mechanisms are used to find colon cancer or colorectal polyps. The Automatic Polyp Detection sub-challenge was conducted as part of the Endoscopic Vision Challenge (http://endovis.grand-challenge.org). Methods: Colonoscopy may be the primary way of improving colon cancer detection, especially of flat lesions, which may otherwise be difficult to detect. Recently, automatic polyp detection algorithms have been proposed with various degrees of success. Though polyp detection in images from colonoscopy and other traditional endoscopy procedures is becoming a mature field, detecting polyps automatically in colonoscopy remains a hard problem because of its unique imaging characteristics. The proposed video capsule camera therefore helps to diagnose polyps accurately and to identify their pattern easily. Existing methodology has mainly concentrated on high accuracy and low time consumption, and it uses many different types of data mining techniques. To analyse these high-resolution video images, the image is segmented into a pixel-level binary pattern with the help of a mid-pass filter and the relative gray level of neighbours. This work consists of three major steps to improve the accuracy of video capsule endoscopy: missing data imputation, high-dimensionality reduction or feature selection, and classification. These steps are performed using an endoscopy polyp disease dataset with 500 patients. Our binary classification algorithm relieves human analysts by using the video frames. SVM has made a major contribution to processing the dataset. Results: The key results are segmentation and a binary pattern approach with a Genetic Fuzzy based Improved Kernel Support Vector Machine (GF-IKSVM) classifier. The segmented images are mostly round in shape. The result is refined via smooth filtering, computer vision methods and thresholding steps. Conclusion: Our experiments produce 94.4% accuracy with the proposed fuzzy system and genetic fuzzy approach, which is higher than the methods used in the literature. The GF-IKSVM classifier is well organized and provides good accuracy for patched VCE polyp disease diagnosis.

Olivera AR, Roesler V, Iochpe C, Schmidt MI, Vigo Á, Barreto SM, Duncan BB. Comparison of machine-learning algorithms to build a predictive model for detecting undiagnosed diabetes - ELSA-Brasil: accuracy study. SAO PAULO MED J 2017;135:234-246. [PMID: 28746659 PMCID: PMC10019841 DOI: 10.1590/1516-3180.2016.0309010217] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 01/19/2017] [Accepted: 02/01/2017] [Indexed: 01/23/2023] Open

Arabasadi Z, Alizadehsani R, Roshanzamir M, Moosaei H, Yarifard AA. Computer aided decision making for heart disease detection using hybrid neural network-Genetic algorithm. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2017;141:19-26. [PMID: 28241964 DOI: 10.1016/j.cmpb.2017.01.004] [Citation(s) in RCA: 139] [Impact Index Per Article: 19.9] [Received: 09/11/2016] [Revised: 12/18/2016] [Accepted: 01/12/2017] [Indexed: 05/28/2023]

Alanazi HO, Abdullah AH, Qureshi KN. A Critical Review for Developing Accurate and Dynamic Predictive Models Using Machine Learning Methods in Medicine and Health Care. J Med Syst 2017;41:69. [PMID: 28285459 DOI: 10.1007/s10916-017-0715-6] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 12/18/2016] [Accepted: 02/26/2017] [Indexed: 10/20/2022]

Tayefi M, Esmaeili H, Saberi Karimian M, Amirabadi Zadeh A, Ebrahimi M, Safarian M, Nematy M, Parizadeh SMR, Ferns GA, Ghayour-Mobarhan M. The application of a decision tree to establish the parameters associated with hypertension. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2017;139:83-91. [PMID: 28187897 DOI: 10.1016/j.cmpb.2016.10.020] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 04/17/2016] [Revised: 09/13/2016] [Accepted: 10/18/2016] [Indexed: 06/06/2023]

Bamidis PD, Psarouli E, Stilou S. Using modern IT tools to assess the awareness of MDs on radiation issues and plan a continuous education programme. Health Informatics J 2016. [DOI: 10.1177/146045820100700307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]

Umut İ. PSGMiner: A modular software for polysomnographic analysis. Comput Biol Med 2016;73:1-9. [DOI: 10.1016/j.compbiomed.2016.03.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2016] [Revised: 03/26/2016] [Accepted: 03/28/2016] [Indexed: 10/22/2022]

Molina ME, Perez A, Valente JP. Classification of auditory brainstem responses through symbolic pattern discovery. Artif Intell Med 2016;70:12-30. [PMID: 27431034 DOI: 10.1016/j.artmed.2016.05.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 10/21/2015] [Accepted: 05/09/2016] [Indexed: 01/20/2023]

Abstract

INTRODUCTION

Numeric time series are present in a very wide range of domains, including many branches of medicine. Data mining techniques have proved to be useful for knowledge discovery in this type of data and for supporting decision-making processes.

OBJECTIVES

The overall objective is to classify time series based on the discovery of frequent patterns. These patterns will be discovered in symbolic sequences obtained from the time series data by means of a temporal abstraction process.

METHODS

Firstly, we transform numeric time series into symbolic time sequences, where the symbols aim to represent the relevant domain concepts. These symbols can be defined using either public or expert domain knowledge. Then we apply a symbolic pattern discovery technique to the output symbolic sequences. This technique identifies the subsequences frequently found in a population group. These subsequences (patterns) are representative of population groups. Finally, we employ a classification technique based on the identified patterns in order to classify new individuals. Thanks to the inclusion of domain knowledge, the classification results can be explained using domain terminology. This makes the results easier to interpret for the domain specialist (physician).
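The first two steps described above can be sketched roughly as follows. This is an illustrative simplification, not the authors' method: the three-symbol alphabet, the fixed thresholds, the restriction to contiguous fixed-length subsequences, and both function names are assumptions made for the example.

```python
from collections import Counter

def to_symbols(series, low, high):
    """Map each value to 'L', 'M', or 'H' using two hypothetical thresholds."""
    return "".join("L" if x < low else "H" if x > high else "M" for x in series)

def frequent_patterns(sequences, length, min_support):
    """Return contiguous subsequences of the given length that occur
    in at least min_support of the symbolic sequences."""
    counts = Counter()
    for seq in sequences:
        # Count each pattern at most once per sequence.
        seen = {seq[i:i + length] for i in range(len(seq) - length + 1)}
        counts.update(seen)
    return {pattern for pattern, count in counts.items() if count >= min_support}

# Toy numeric series standing in for one population group (not real BAEP data).
group = [
    [0.1, 0.9, 0.5, 0.1],
    [0.2, 0.8, 0.4, 0.9],
    [0.5, 0.1, 0.95, 0.5],
]
symbolic = [to_symbols(s, low=0.3, high=0.7) for s in group]
patterns = frequent_patterns(symbolic, length=2, min_support=2)
```

A classifier could then score a new individual by how many of each group's frequent patterns appear in that individual's symbolic sequence.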

RESULTS

This method has been applied to brainstem auditory evoked potentials (BAEPs) time series. Preliminary experiments were carried out to analyse several aspects of the method including the best configuration of the pattern discovery technique parameters. We then applied the method to the BAEPs of 83 individuals belonging to four classes (healthy, conductive hearing loss, vestibular schwannoma-brainstem involvement and vestibular schwannoma-8th-nerve involvement). According to the results of the cross-validation, overall accuracy was 99.4%, sensitivity (recall) was 97.6% and specificity was 100% (no false positives).

CONCLUSION

The proposed method effectively reduces dimensionality. Additionally, if the symbolic transformation includes the right domain knowledge, the method arguably outputs a data representation that denotes the relevant domain concepts more clearly. The method is capable of finding patterns in BAEPs time series and is very accurate at correctly predicting whether or not new patients have an auditory-related disorder.

Umut İ, Çentik G. Detection of Periodic Leg Movements by Machine Learning Methods Using Polysomnographic Parameters Other Than Leg Electromyography. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2016;2016:2041467. [PMID: 27213008 PMCID: PMC4860221 DOI: 10.1155/2016/2041467] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/29/2016] [Revised: 04/02/2016] [Accepted: 04/04/2016] [Indexed: 11/18/2022]

Prediction of solubility of some statin drugs in supercritical carbon dioxide using classification and regression tree analysis and adaptive neuro-fuzzy inference systems. Russ Chem Bull 2016. [DOI: 10.1007/s11172-016-1424-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]

Gutiérrez S, Tardaguila J, Fernández-Novales J, Diago MP. Data Mining and NIR Spectroscopy in Viticulture: Applications for Plant Phenotyping under Field Conditions. SENSORS 2016;16:236. [PMID: 26891304 PMCID: PMC4801612 DOI: 10.3390/s16020236] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 12/02/2015] [Revised: 01/29/2016] [Accepted: 02/04/2016] [Indexed: 11/16/2022]

Blood type classification using computer vision and machine learning. Neural Comput Appl 2016. [DOI: 10.1007/s00521-015-2151-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/22/2022]

Early-Stage Event Prediction for Longitudinal Data. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING 2016. [DOI: 10.1007/978-3-319-31753-3_12] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/07/2023]

High-dimensional feature selection via feature grouping: A Variable Neighborhood Search approach. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2015.07.041] [Citation(s) in RCA: 76] [Impact Index Per Article: 9.5] [Indexed: 11/19/2022]

Bashir S, Qamar U, Khan FH. IntelliHealth: A medical decision support application using a novel weighted multi-layer classifier ensemble framework. J Biomed Inform 2015;59:185-200. [PMID: 26703093 DOI: 10.1016/j.jbi.2015.12.001] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Received: 08/05/2015] [Revised: 11/01/2015] [Accepted: 12/06/2015] [Indexed: 11/30/2022]

Shahraki AD, Safdari R, Gahfarokhi HH, Tahmasebian S. The Usage of Association Rule Mining to Identify Influencing Factors on Deafness After Birth. Acta Inform Med 2015;23:356-9. [PMID: 26862245 PMCID: PMC4720831 DOI: 10.5455/aim.2015.23.356-359] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/01/2015] [Accepted: 11/17/2015] [Indexed: 11/03/2022] Open

Abstract

BACKGROUND

Providing complete, high-quality health care services plays a very important role in enabling people to understand the factors related to personal and social health and to choose suitable healthy behaviors in order to achieve a healthy life. For this reason, demographic and clinical data on individuals are collected; this huge volume of data is a valuable resource for analysis, exploration and the discovery of valuable information. Using association rule techniques from data mining, this study tried to identify the factors affecting hearing loss after birth in Iran.

MATERIALS AND METHODS

The survey is a data-oriented study. The study population consisted of questionnaires from several provinces of the country. First, all questionnaire data were entered as information tables in SQL Server, using data entry software written in C# .NET; then the Association algorithm in SQL Server Data Tools and Clementine software was applied to determine the rules and hidden patterns in the gathered data.

FINDINGS

Two factors, the number of deaf brothers and the degree of consanguinity of the parents, have a significant impact on the severity of an individual's deafness. Also, when the severity of hearing loss is greater than or equal to moderately severe hearing loss, people use hearing aids, and men are less interested in using hearing aids.

CONCLUSION

In families where the parents' consanguineous marriage is between first-degree (girl/boy cousins) or second-degree relatives (girl/boy cousins), and especially between first-degree relatives, the number of people with severe hearing loss or deafness is higher; in the use of hearing aids, the patient's gender is more important than the severity of the hearing loss.

Diciolla M, Binetti G, Di Noia T, Pesce F, Schena FP, Vågane AM, Bjørneklett R, Suzuki H, Tomino Y, Naso D. Patient classification and outcome prediction in IgA nephropathy. Comput Biol Med 2015;66:278-86. [PMID: 26453758 DOI: 10.1016/j.compbiomed.2015.09.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 06/24/2015] [Revised: 08/08/2015] [Accepted: 09/02/2015] [Indexed: 10/23/2022]

Abstract

OBJECTIVE

IgA Nephropathy (IgAN) is a common kidney disease which may entail renal failure, known as End Stage Kidney Disease (ESKD). One of the major difficulties dealing with this disease is to predict the time of the long-term prognosis for a patient at the time of diagnosis. In fact, the progression of IgAN to ESKD depends on an intricate interrelationship between clinical and laboratory findings. Therefore, the objective of this work has been the selection of the best data mining tool to build a model able to predict (I) if a patient with a biopsy proven IgAN will reach ESKD and (II) if a patient will reach the ESKD before or after 5 years.

MATERIAL AND METHODS

The largest available cohort study worldwide on IgAN has been used to design and compare several data-driven models. The complete dataset was composed of 1174 records collected from Italian, Norwegian, and Japanese IgAN patients, in the last 30 years. The data mining tools considered in this work were artificial neural networks (ANNs), neuro fuzzy systems (NFSs), support vector machines (SVMs), and decision trees (DTs). A 10-fold cross validation was used to evaluate unbiased performances for all the models.
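For readers unfamiliar with the evaluation protocol, the 10-fold cross validation mentioned above can be sketched as an index-splitting routine. This is a generic illustration only: the shuffling seed, the round-robin fold assignment, and the function name are assumptions, and the study's actual partitioning (for example, whether it was stratified by class) is not stated in the abstract.

```python
import random

def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists so that every record is tested exactly once."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 1174 matches the number of records reported for the dataset.
splits = list(k_fold_indices(1174, k=10))
```

Each model would be trained on the nine training folds and scored on the held-out fold, with the ten scores averaged to estimate performance.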

RESULTS

An extensive model comparison based on accuracy, precision, recall, and f-measure was provided. Overall, the results indicate that ANNs can provide superior performance compared to the other models. The ANN for time-to-ESKD prediction is characterized by accuracy, precision, recall, and f-measure greater than 90%. The ANN for ESKD prediction has accuracy greater than 90% as well as precision, recall, and f-measure for the class of patients not reaching ESKD, while precision, recall, and f-measure for the class of patients reaching ESKD are slightly lower. The obtained model has been implemented in a Web-based decision support system (DSS).

CONCLUSIONS

The extraction of novel knowledge from clinical data and the definition of predictive models to support diagnosis, prognosis, and therapy is becoming an essential tool for researchers and clinical practitioners in medicine. The proposed comparative study of several data mining models for the outcome prediction in IgAN patients, using a large dataset of clinical records from three different countries, provides an insight into the relative prediction ability of the considered methods applied to such a disease.

Huang H, Fava A, Guhr T, Cimbro R, Rosen A, Boin F, Ellis H. A methodology for exploring biomarker--phenotype associations: application to flow cytometry data and systemic sclerosis clinical manifestations. BMC Bioinformatics 2015;16:293. [PMID: 26373409 PMCID: PMC4571079 DOI: 10.1186/s12859-015-0722-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 03/09/2015] [Accepted: 08/26/2015] [Indexed: 01/19/2023] Open

Abstract

Background

This work seeks to develop a methodology for identifying reliable biomarkers of disease activity, progression and outcome through the identification of significant associations between high-throughput flow cytometry (FC) data and interstitial lung disease (ILD) - a systemic sclerosis (SSc, or scleroderma) clinical phenotype which is the leading cause of morbidity and mortality in SSc. A specific aim of the work involves developing a clinically useful screening tool that could yield accurate assessments of disease state such as the risk or presence of SSc-ILD, the activity of lung involvement and the likelihood to respond to therapeutic intervention. Ultimately this instrument could facilitate a refined stratification of SSc patients into clinically relevant subsets at the time of diagnosis and subsequently during the course of the disease and thus help in preventing bad outcomes from disease progression or unnecessary treatment side effects.

The methods utilized in the work involve: (1) clinical and peripheral blood flow cytometry data (Immune Response In Scleroderma, IRIS) from consented patients followed at the Johns Hopkins Scleroderma Center; (2) machine learning (Conditional Random Forests - CRF) coupled with Gene Set Enrichment Analysis (GSEA) to identify subsets of FC variables that are highly effective in classifying ILD patients; and (3) stochastic simulation to design, train and validate ILD risk screening tools.

Results

Our hybrid analysis approach (CRF-GSEA) proved successful in predicting SSc patient ILD status with a high degree of success (>82 % correct classification in validation; 79 patients in the training data set, 40 patients in the validation data set).

Conclusions

IRIS flow cytometry data provides useful information in assessing the ILD status of SSc patients. Our new approach combining Conditional Random Forests and Gene Set Enrichment Analysis was successful in identifying a subset of flow cytometry variables to create a screening tool that proved effective in correctly identifying ILD patients in the training and validation data sets. From a somewhat broader perspective, the identification of subsets of flow cytometry variables that exhibit coordinated movement (i.e., multi-variable up or down regulation) may lead to insights into possible effector pathways and thereby improve the state of knowledge of systemic sclerosis pathogenesis.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0722-x) contains supplementary material, which is available to authorized users.

Tomczak JM, Zięba M. Probabilistic combination of classification rules and its application to medical diagnosis. Mach Learn 2015. [DOI: 10.1007/s10994-015-5508-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 11/25/2022]

Al-Hyari AY, Al-Taee AM, Al-Taee MA. Diagnosis and Classification of Chronic Renal Failure Utilising Intelligent Data Mining Classifiers. INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY AND WEB ENGINEERING 2014. [DOI: 10.4018/ijitwe.2014100101] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]

Ramezankhani A, Pournik O, Shahrabi J, Khalili D, Azizi F, Hadaegh F. Applying decision tree for identification of a low risk population for type 2 diabetes. Tehran Lipid and Glucose Study. Diabetes Res Clin Pract 2014;105:391-8. [PMID: 25085758 DOI: 10.1016/j.diabres.2014.07.003] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 06/30/2013] [Revised: 04/15/2014] [Accepted: 07/05/2014] [Indexed: 01/06/2023]

Ho JC, Ghosh J, Steinhubl SR, Stewart WF, Denny JC, Malin BA, Sun J. Limestone: high-throughput candidate phenotype generation via tensor factorization. J Biomed Inform 2014;52:199-211. [PMID: 25038555 DOI: 10.1016/j.jbi.2014.07.001] [Citation(s) in RCA: 72] [Impact Index Per Article: 7.2] [Received: 11/13/2013] [Revised: 05/14/2014] [Accepted: 07/02/2014] [Indexed: 12/22/2022]

Abstract

The rapidly increasing availability of electronic health records (EHRs) from multiple heterogeneous sources has spearheaded the adoption of data-driven approaches for improved clinical research, decision making, prognosis, and patient management. Unfortunately, EHR data do not always directly and reliably map to medical concepts that clinical researchers need or use. Some recent studies have focused on EHR-derived phenotyping, which aims at mapping the EHR data to specific medical concepts; however, most of these approaches require labor intensive supervision from experienced clinical professionals. Furthermore, existing approaches are often disease-centric and specialized to the idiosyncrasies of the information technology and/or business practices of a single healthcare organization. In this paper, we propose Limestone, a nonnegative tensor factorization method to derive phenotype candidates with virtually no human supervision. Limestone represents the data source interactions naturally using tensors (a generalization of matrices). In particular, we investigate the interaction of diagnoses and medications among patients. The resulting tensor factors are reported as phenotype candidates that automatically reveal patient clusters on specific diagnoses and medications. Using the proposed method, multiple phenotypes can be identified simultaneously from data. We demonstrate the capability of Limestone on a cohort of 31,815 patient records from the Geisinger Health System. The dataset spans 7 years of longitudinal patient records and was initially constructed for a heart failure onset prediction study. Our experiments demonstrate the robustness, stability, and the conciseness of Limestone-derived phenotypes. Our results show that using only 40 phenotypes, we can outperform the original 640 features (169 diagnosis categories and 471 medication types) to achieve an area under the receiver operator characteristic curve (AUC) of 0.720 (95% CI 0.715 to 0.725). Moreover, in consultation with a medical expert, we confirmed 82% of the top 50 candidates automatically extracted by Limestone are clinically meaningful.
