1
Bennour A, Ben Aoun N, Khalaf OI, Ghabban F, Wong WK, Algburi S. Contribution to pulmonary diseases diagnostic from X-ray images using innovative deep learning models. Heliyon 2024; 10:e30308. [PMID: 38707425 PMCID: PMC11068804 DOI: 10.1016/j.heliyon.2024.e30308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/09/2024] [Accepted: 04/23/2024] [Indexed: 05/07/2024]
Abstract
Pulmonary disease identification and characterization have been among the most active research topics of recent years, since these diseases require accurate and prompt diagnosis. Although pulmonary radiography has helped in lung disease diagnosis, interpreting the radiographic image remains a major concern for doctors and radiologists seeking to reduce diagnostic errors. Owing to their success in image classification and segmentation tasks, artificial intelligence techniques such as machine learning (ML) and deep learning (DL) are increasingly applied to diagnosing and identifying lung disorders from medical images, particularly radiographic ones. To this end, researchers are working to build systems based on these techniques, deep learning in particular. In this paper, we propose three deep-learning models trained to identify the presence of certain lung diseases using thoracic radiography. The first model, named "CovCXR-Net", identifies COVID-19 (two classes: COVID-19 or normal). The second model, named "MDCXR3-Net", identifies COVID-19 and pneumonia (three classes: COVID-19, pneumonia, or normal). The last model, named "MDCXR4-Net", identifies COVID-19, pneumonia, and pulmonary opacity (four classes: COVID-19, pneumonia, pulmonary opacity, or normal). In our comparison, these models outperformed state-of-the-art models and reached accuracies of 99.09%, 97.74%, and 90.37%, respectively, on three benchmarks.
Affiliation(s)
- Akram Bennour
  - LAMIS Laboratory, Echahid Cheikh Larbi Tebessi University, Tebessa, Algeria
- Najib Ben Aoun
  - College of Computer Science and Information Technology, Al-Baha University, Al Baha, Saudi Arabia
  - REGIM-Lab: Research Groups in Intelligent Machines, National School of Engineers of Sfax (ENIS), University of Sfax, Tunisia
- Osamah Ibrahim Khalaf
  - Department of Solar, Al-Nahrain Research Center for Renewable Energy, Al-Nahrain University, Jadriya, Baghdad, Iraq
- Fahad Ghabban
  - College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia
- Sameer Algburi
  - Al-Kitab University, College of Engineering Techniques, Kirkuk, Iraq
2
Al-Zubayer MA, Alam K, Shanto HH, Maniruzzaman M, Majumder UK, Ahammed B. Machine learning models for prediction of double and triple burdens of non-communicable diseases in Bangladesh. J Biosoc Sci 2024; 56:426-444. [PMID: 38505939 DOI: 10.1017/s0021932024000063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/21/2024]
Abstract
Increasing prevalence of non-communicable diseases (NCDs) has become the leading cause of death and disability in Bangladesh. Therefore, this study aimed to measure the prevalence of and risk factors for double and triple burden of NCDs (DBNCDs and TBNCDs), considering diabetes, hypertension, and overweight and obesity as well as establish a machine learning approach for predicting DBNCDs and TBNCDs. A total of 12,151 respondents from the 2017 to 2018 Bangladesh Demographic and Health Survey were included in this analysis, where 10%, 27.4%, and 24.3% of respondents had diabetes, hypertension, and overweight and obesity, respectively. Chi-square test and multilevel logistic regression (LR) analysis were applied to select factors associated with DBNCDs and TBNCDs. Furthermore, six classifiers including decision tree (DT), LR, naïve Bayes (NB), k-nearest neighbour (KNN), random forest (RF), and extreme gradient boosting (XGBoost) with three cross-validation protocols (K2, K5, and K10) were adopted to predict the status of DBNCDs and TBNCDs. The classification accuracy (ACC) and area under the curve (AUC) were computed for each protocol and repeated 10 times to make them more robust, and then the average ACC and AUC were computed. The prevalence of DBNCDs and TBNCDs was 14.3% and 2.3%, respectively. The findings of this study revealed that DBNCDs and TBNCDs were significantly influenced by age, sex, marital status, wealth index, education and geographic region. Compared to other classifiers, the RF-based classifier provides the highest ACC and AUC for both DBNCDs (ACC = 81.06% and AUC = 0.93) and TBNCDs (ACC = 88.61% and AUC = 0.97) for the K10 protocol. A combination of considered two-step factor selections and RF-based classifier can better predict the burden of NCDs. The findings of this study suggested that decision-makers might adopt suitable decisions to control and prevent the burden of NCDs using RF classifiers.
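The evaluation protocol described in this abstract (K-fold cross-validation with K = 2, 5, or 10, repeated 10 times and averaged) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, and a toy one-dimensional nearest-centroid classifier stands in for the random forest and the other classifiers used in the study.

```python
import random
from statistics import mean


def kfold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_predict(train_x, train_y, test_x):
    """Toy 1-D classifier (an assumption; it stands in for RF, XGBoost, etc.)."""
    centroids = {
        label: mean(x for x, t in zip(train_x, train_y) if t == label)
        for label in set(train_y)
    }
    return [min(centroids, key=lambda c: abs(x - centroids[c])) for x in test_x]


def repeated_kfold_accuracy(xs, ys, k, repeats=10, seed=0):
    """Average held-out accuracy over `repeats` independent K-fold runs."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for fold in kfold_indices(len(xs), k, rng):
            held_out = set(fold)
            train = [i for i in range(len(xs)) if i not in held_out]
            preds = nearest_centroid_predict(
                [xs[i] for i in train], [ys[i] for i in train], [xs[i] for i in fold]
            )
            correct = sum(p == ys[i] for p, i in zip(preds, fold))
            scores.append(correct / len(fold))
    return mean(scores)
```

Calling `repeated_kfold_accuracy(xs, ys, k)` for `k` in `(2, 5, 10)` mirrors the K2, K5, and K10 protocols; repeating each run with a fresh shuffle reduces the dependence of the averaged score on any single partition.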
Affiliation(s)
- Khorshed Alam
  - School of Business, University of Southern Queensland, Toowoomba, QLD, Australia
  - Centre for Health Research, University of Southern Queensland, Toowoomba, QLD, Australia
- Md Maniruzzaman
  - Statistics Discipline, Khulna University, Khulna, Bangladesh
- Benojir Ahammed
  - Statistics Discipline, Khulna University, Khulna, Bangladesh
3
Ma H, Yuan X, Sun X, Lawson G, Wang Q. Seeing Your Stories: Visualization for Narrative Medicine. Health Data Science 2024; 4:0103. [PMID: 38486622 PMCID: PMC10880175 DOI: 10.34133/hds.0103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 11/29/2023] [Indexed: 03/17/2024]
Abstract
Importance: Narrative medicine (NM), in which patient stories play a crucial role in their diagnosis and treatment, can potentially support a more holistic approach to patient care than traditional scientific ones. However, there are some challenges in the implementation of narrative medicine, for example, differences in understanding illnesses between physicians and patients and physicians' increased workloads and overloaded schedules. This paper first presents a review to explore previous visualization research for narrative medicine to bridge the gap between visualization researchers and narrative medicine experts and explore further visualization opportunities. Highlights: The review is conducted from 2 perspectives: (a) the contexts and domains in which visualization has been explored for narrative medicine and (b) the forms and solutions applied in these studies. Four applied domains are defined, including understanding patients from narrative records, medical communication, medical conversation training in education, and psychotherapy and emotional wellness enhancement. Conclusions: A future work framework illustrates some opportunities for future research, including groups of specific directions and future points for the 4 domains and 3 technological exploration opportunities (combination of narrative and medical data visualization, task-audience-based visual storytelling, and user-centered interactive visualization). Specifically, 3 directions of future work in medical communication (asynchronous online physician-patient communication, synchronous face-to-face medical conversation, and medical knowledge dissemination) were concluded.
Affiliation(s)
- Hua Ma
  - Faculty of Science and Engineering, University of Nottingham, Ningbo 315100, China
  - Digital Art Department, Art & Design Technology Institute, Suzhou 215104, China
- Xiaoru Yuan
  - National Key Laboratory of General Artificial Intelligence and School of Intelligence Science and Technology, Peking University, Beijing 100871, China
  - Health Data Visualization and Visual Analytics Research Center, National Institute of Health Data Science at PKU, Beijing 100191, China
- Xu Sun
  - Faculty of Science and Engineering, University of Nottingham, Ningbo 315100, China
  - Nottingham Ningbo China Beacons of Excellence Research and Innovation Institute, University of Nottingham Ningbo China, Ningbo 315100, China
- Glyn Lawson
  - Human Factors Research Group, Faculty of Engineering, University of Nottingham, Nottingham NG7 2RD, UK
- Qingfeng Wang
  - Nottingham University Business School China, University of Nottingham, Ningbo 315100, China
4
Arain Z, Iliodromiti S, Slabaugh G, David AL, Chowdhury TT. Machine learning and disease prediction in obstetrics. Curr Res Physiol 2023; 6:100099. [PMID: 37324652 PMCID: PMC10265477 DOI: 10.1016/j.crphys.2023.100099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Accepted: 05/09/2023] [Indexed: 06/17/2023]
Abstract
Machine learning technologies and the translation of artificial intelligence tools to enhance the patient experience are changing obstetric and maternity care. An increasing number of predictive tools have been developed with data sourced from electronic health records, diagnostic imaging and digital devices. In this review, we explore the latest tools of machine learning, the algorithms used to establish prediction models and the challenges of assessing fetal well-being and of predicting and diagnosing obstetric diseases such as gestational diabetes, pre-eclampsia, preterm birth and fetal growth restriction. We discuss the rapid growth of machine learning approaches and intelligent tools for automated diagnostic imaging of fetal anomalies and for assessing fetoplacental and cervix function using ultrasound and magnetic resonance imaging. In prenatal diagnosis, we discuss intelligent tools for magnetic resonance imaging sequencing of the fetus, placenta and cervix to reduce the risk of preterm birth. Finally, we discuss the use of machine learning to improve safety standards in intrapartum care and the early detection of complications. The demand for technologies to enhance diagnosis and treatment in obstetrics and maternity should improve frameworks for patient safety and enhance clinical practice.
Affiliation(s)
- Zara Arain
  - Centre for Bioengineering, School of Engineering and Materials Science, Queen Mary University of London, Mile End Road, London, E1 4NS, UK
- Stamatina Iliodromiti
  - Women's Health Research Unit, Wolfson Institute of Population Health, Queen Mary University of London, 58 Turner Street, London, E1 2AB, UK
- Gregory Slabaugh
  - Digital Environment Research Institute, School of Electronic Engineering and Computer Science, Queen Mary University of London, London, E1 1HH, UK
- Anna L. David
  - Elizabeth Garrett Anderson Institute for Women's Health, University College London, Medical School Building, Huntley Street, London, WC1E 6AU, UK
- Tina T. Chowdhury
  - Centre for Bioengineering, School of Engineering and Materials Science, Queen Mary University of London, Mile End Road, London, E1 4NS, UK
5
Dai W, Cui Y, Wang P, Wu H, Zhang L, Bian Y, Li Y, Li Y, Hu H, Zhao J, Xu D, Kong D, Wang Y, Xu L. Classification regularized dimensionality reduction improves ultrasound thyroid nodule diagnostic accuracy and inter-observer consistency. Comput Biol Med 2023; 154:106536. [PMID: 36708654 DOI: 10.1016/j.compbiomed.2023.106536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 12/20/2022] [Accepted: 01/10/2023] [Indexed: 01/13/2023]
Abstract
PROBLEM: Convolutional Neural Networks (CNNs) for medical image analysis usually output only a probability value, providing no further information about the original image or the inter-relationships between different images. Dimensionality Reduction Techniques (DRTs) are used to visualize high-dimensional medical image data, but they are not intended for discriminative classification analysis. AIM: We develop an interactive phenotype distribution field visualization system for medical images that accurately reflects the pathological characteristics of lesions and their similarity, to assist radiologists in diagnosis and medical research. METHODS: We propose a novel method, Classification Regularized Uniform Manifold Approximation and Projection (UMAP), referred to as CReUMAP, which combines the advantages of CNNs and DRTs. It projects the extracted feature vector, fused with the malignancy probability predicted by a CNN, to a two-dimensional space, and then applies a spatial segmentation classifier trained on 2614 ultrasound images to predict thyroid nodule malignancy and guide radiologists. RESULTS: The CReUMAP embedding correlates well with the TI-RADS categories of thyroid nodules. The parametric version, which embeds an external test dataset of 303 images in the presence of training data with known pathological diagnoses, significantly improves the benign and malignant nodule diagnostic accuracy (p-value = 0.016) and confidence (p-value = 1.902 × 10⁻⁶) of eight radiologists of different experience levels, as well as their inter-observer agreement (kappa ≥ 0.75). CReUMAP achieves 90.8% accuracy, 92.1% sensitivity and 88.6% specificity on the test set. CONCLUSION: The CReUMAP embedding is well correlated with the pathological diagnosis of thyroid nodules and helps radiologists achieve more accurate, confident and consistent diagnoses. It allows a medical center to generate its own locally adapted embedding from an already-trained classification model, in an updateable manner on an ever-growing local database, as long as the extracted feature vectors and predicted diagnostic probabilities of the corresponding classification model can be output.
Affiliation(s)
- Wenli Dai
  - School of Mathematical Sciences, Zhejiang University, Hangzhou, China
- Yan Cui
  - School of Mathematical Sciences, Zhejiang University, Hangzhou, China
- Peiyi Wang
  - School of Mathematical Sciences, Zhejiang University, Hangzhou, China
- Hao Wu
  - Department of Ultrasound, The Second Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, China
- Lei Zhang
  - Department of Ultrasound, The Second Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, China
- Yeping Bian
  - Department of Ultrasonography, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institute of Basic Medicine and Cancer, Chinese Academy of Sciences, Hangzhou, China
- Yingying Li
  - Department of Special Examinations, Hangzhou Third People's Hospital, Hangzhou, China
- Yutao Li
  - Department of Ultrasound, Hangzhou First People's Hospital Affiliated to Medical College of Zhejiang University, Hangzhou, China
- Hairong Hu
  - Demetics Medical Technology, Hangzhou, China
- Jiaqi Zhao
  - Department of Ultrasound, Shanghai Fourth People's Hospital, School of Medicine, Tongji University, Shanghai, China
- Dong Xu
  - Department of Ultrasonography, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institute of Basic Medicine and Cancer, Chinese Academy of Sciences, Hangzhou, China
- Dexing Kong
  - School of Mathematical Sciences, Zhejiang University, Hangzhou, China; Zhejiang Qiushi Institute for Mathematical Medicine, Hangzhou, China
- Yajuan Wang
  - Department of Geriatric Medicine & Key Laboratory of Cardiovascular Proteomics of Shandong Province, Qilu Hospital of Shandong University, Jinan, China
- Lei Xu
  - Zhejiang Qiushi Institute for Mathematical Medicine, Hangzhou, China
6
A heterogeneous multi-modal medical data fusion framework supporting hybrid data exploration. Health Inf Sci Syst 2022; 10:22. [PMID: 36039096 PMCID: PMC9417071 DOI: 10.1007/s13755-022-00183-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Accepted: 07/02/2022] [Indexed: 12/02/2022]
Abstract
In the Industry 4.0 era, more and more high-tech, precise devices are being applied in the medical field to provide better services. Besides EMRs, medical data include a large and continually growing amount of unstructured data such as X-rays, MRI scans, CT scans and PET scans. These massive, heterogeneous multi-modal data make it a major challenge for healthcare researchers and other users to find valuable data sets. Traditional data warehouses can integrate the data and support interactive data exploration through an ETL process. However, they are costly and do not operate in real time. Furthermore, they lack the ability to deal with multi-modal data in two phases: data fusion and data exploration. In the data fusion phase, it is difficult to unify the multi-modal data under one data model. In the data exploration phase, it is challenging to explore the multi-modal data at the same time, which impedes the extraction of the diverse information underlying multi-modal data. To solve these problems, we propose a highly efficient data fusion framework, based on a data lake, that supports data exploration for heterogeneous multi-modal medical data. This framework provides a novel and efficient method to fuse the fragmented multi-modal medical data and store their metadata in the data lake. It offers a user-friendly interface supporting hybrid graph queries to explore multi-modal data. Indexes are created to accelerate the hybrid data exploration. A prototype has been implemented and tested in a hospital, which demonstrates the effectiveness of our framework.
7
Oubenali N, Messaoud S, Filiot A, Lamer A, Andrey P. Visualization of medical concepts represented using word embeddings: a scoping review. BMC Med Inform Decis Mak 2022; 22:83. [PMID: 35351120 PMCID: PMC8962592 DOI: 10.1186/s12911-022-01822-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/24/2021] [Accepted: 03/07/2022] [Indexed: 11/10/2022]
Abstract
Background
Analyzing the unstructured textual data contained in electronic health records (EHRs) has always been a challenging task. Word embedding methods have become an essential foundation for neural network-based approaches in natural language processing (NLP), to learn dense and low-dimensional word representations from large unlabeled corpora that capture the implicit semantics of words. Models like Word2Vec, GloVe or FastText have been broadly applied and reviewed in the bioinformatics and healthcare fields, most often to embed clinical notes or activity and diagnostic codes. Visualization of the learned embeddings has been used in a subset of these works, whether for exploratory or evaluation purposes. However, visualization practices tend to be heterogeneous, and lack overall guidelines.
Objective
This scoping review aims to describe the methods and strategies used to visualize medical concepts represented using word embedding methods. We aim to understand the objectives of the visualizations and their limits.
Methods
This scoping review summarizes different methods used to visualize word embeddings in healthcare. We followed the methodology proposed by Arksey and O’Malley (Int J Soc Res Methodol 8:19–32, 2005) and by Levac et al. (Implement Sci 5:69, 2010) to better analyze the data and provide a synthesis of the literature on the matter.
Results
We first obtained 471 unique articles from a search conducted in PubMed, MedRxiv and arXiv databases. 30 of these were effectively reviewed, based on our inclusion and exclusion criteria. 23 articles were excluded in the full review stage, resulting in the analysis of 7 papers that fully correspond to our inclusion criteria. Included papers pursued a variety of objectives and used distinct methods to evaluate their embeddings and to visualize them. Visualization also served heterogeneous purposes, being alternatively used as a way to explore the embeddings, to evaluate them or to merely illustrate properties otherwise formally assessed.
Conclusions
Visualization helps to explore embedding results (further dimensionality reduction, synthetic representation). However, it does not exhaust the information conveyed by the embeddings nor constitute a self-sustaining evaluation method of their pertinence.
8
Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022; 23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022]
Abstract
In recent decades, exploring potential relationships between diseases has been an active research field. With the rapid accumulation of disease-related biomedical data, a lot of computational methods and tools/platforms have been developed to reveal intrinsic relationship between diseases, which can provide useful insights to the study of complex diseases, e.g. understanding molecular mechanisms of diseases and discovering new treatment of diseases. Human complex diseases involve both external phenotypic abnormalities and complex internal molecular mechanisms in organisms. Computational methods with different types of biomedical data from phenotype to genotype can evaluate disease-disease associations at different levels, providing a comprehensive perspective for understanding diseases. In this review, available biomedical data and databases for evaluating disease-disease associations are first summarized. Then, existing computational methods for disease-disease associations are reviewed and classified into five groups in terms of the usages of biomedical data, including disease semantic-based, phenotype-based, function-based, representation learning-based and text mining-based methods. Further, we summarize software tools/platforms for computation and analysis of disease-disease associations. Finally, we give a discussion and summary on the research of disease-disease associations. This review provides a systematic overview for current disease association research, which could promote the development and applications of computational methods and tools/platforms for disease-disease associations.
Affiliation(s)
- Ju Xiang
  - School of Computer Science and Engineering, Central South University, China
- Jiashuai Zhang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yichao Zhao
  - School of Computer Science and Engineering, Central South University, China
- Fang-Xiang Wu
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Min Li
  - Division of Biomedical Engineering and Department of Mechanical Engineering at University of Saskatchewan, Saskatoon, Canada
9
Pham T, Tao X, Zhang J, Yong J, Li Y, Xie H. Graph-based multi-label disease prediction model learning from medical data and domain knowledge. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107662] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 10/19/2022]
10
Chen W, Sun Q, Xie G, Xu C. A novel deep learning neural network system for imbalanced heart sounds classification. J Mech Med Biol 2021. [DOI: 10.1142/s0219519421500640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
This study proposed a novel TFNNS method to address the classification of imbalanced phonocardiogram (PCG) signals. TFFNS consisted of three submodules: HeartNet, 2D-Maps transformation, and TF-Mask augmentation. HeartNet, a deep convolutional neural network (CNN), was designed to recognize the categories of PCG signals. Based on the short-time Fourier transform and Mel filtering, the 2D-Maps transformation converted one-dimensional PCG signals into two-dimensional Savitzky-MFSC feature maps that were fed into HeartNet. TF-Mask augmentation augmented the training datasets by randomly masking Savitzky-MFSC maps in the time and frequency domains. We trained our model on the PASCAL heart sounds datasets to classify three categories of heart sounds: normal, murmur, and extrasystole. We also evaluated the model and compared it with the baselines under consistent evaluation protocols. The experimental results showed that the proposed TFFNS method significantly improved PCG signal classification and exceeded the baselines, with a mean precision of 94%, a heart problem specificity of 99%, and a discriminant power of 1.317.
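The idea behind masking a feature map in the time and frequency domains can be sketched as follows. This is an illustrative sketch only: the abstract does not give the details of TF-Mask, so the function name `tf_mask`, its parameters, the choice of one band per axis, and the zero fill value are all assumptions. The feature map is taken to be a list of rows (frequency bins), each a list of time frames.

```python
import random


def tf_mask(feature_map, max_time_width, max_freq_width, rng, fill=0.0):
    """Return a copy of the map with one random frequency band and one random
    time band overwritten by `fill`. The input map is left untouched."""
    n_freq = len(feature_map)
    n_time = len(feature_map[0])
    masked = [row[:] for row in feature_map]

    # Frequency mask: blank out a band of consecutive rows.
    f_width = rng.randint(0, min(max_freq_width, n_freq))
    f_start = rng.randint(0, n_freq - f_width)
    for f in range(f_start, f_start + f_width):
        masked[f] = [fill] * n_time

    # Time mask: blank out a band of consecutive columns in every row.
    t_width = rng.randint(0, min(max_time_width, n_time))
    t_start = rng.randint(0, n_time - t_width)
    for row in masked:
        for t in range(t_start, t_start + t_width):
            row[t] = fill

    return masked
```

Applying `tf_mask` several times to each training map with a fresh random state yields additional, partially occluded training examples, which is one common way to enlarge an under-represented class.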
Affiliation(s)
- Wei Chen
  - Institute of Reproductive Medicine, Medical School, Nantong University, Jiangsu 226001, P. R. China
  - School of Information Science and Technology, Nantong University, Jiangsu 226019, P. R. China
- Qiang Sun
  - School of Information Science and Technology, Nantong University, Jiangsu 226019, P. R. China
- Gangcai Xie
  - Institute of Reproductive Medicine, Medical School, Nantong University, Jiangsu 226001, P. R. China
- Chen Xu
  - School of Information Science and Technology, Nantong University, Jiangsu 226019, P. R. China
11
de Oliveira JM, da Costa CA, Antunes RS. Data structuring of electronic health records: a systematic review. Health and Technology 2021. [DOI: 10.1007/s12553-021-00607-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/10/2023]
12
Maniruzzaman M, Islam MM, Rahman MJ, Hasan MAM, Shin J. Risk prediction of diabetic nephropathy using machine learning techniques: A pilot study with secondary data. Diabetes Metab Syndr 2021; 15:102263. [PMID: 34482122 DOI: 10.1016/j.dsx.2021.102263] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/25/2021] [Revised: 08/21/2021] [Accepted: 08/24/2021] [Indexed: 11/27/2022]
Abstract
AIMS: This work presents a comparative study of machine learning (ML) techniques with two objectives: (i) determining the risk factors of diabetic nephropathy (DN) using principal component analysis (PCA) at different cutoffs; (ii) predicting DN patients using ML-based techniques. METHODS: PCA and ML-based techniques were combined to select the best features at different PCA cutoff values and to choose the optimal PCA cutoff, at which the ML-based techniques give the highest accuracy. These optimal features are fed into six ML-based techniques: linear discriminant analysis, support vector machine (SVM), logistic regression, k-nearest neighbors, naïve Bayes, and artificial neural network. A leave-one-out cross-validation protocol is executed, and the performance of the ML-based techniques is compared using accuracy and area under the curve (AUC). RESULTS: The data used in this work consist of 133 respondents, including 73 DN patients with an average age of 69.6±10.2 years; 54.2% of the DN patients are female. Our findings show that PCA combined with the SVM-RBF classifier yields 88.7% accuracy and 0.91 AUC at a PCA cutoff of 0.96. CONCLUSIONS: This study suggests that PCA combined with the SVM-RBF classifier may classify DN patients with higher accuracy than the models published in the existing research. Prospective studies are warranted to further validate the applicability of our model in clinical settings.
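One common reading of a "PCA cutoff" is a threshold on the cumulative share of variance explained by the leading principal components. The abstract does not define the term, so that reading is an assumption, as are the function names below. Under it, the number of components retained at a given cutoff can be computed from the eigenvalues of the covariance matrix:

```python
def components_for_cutoff(eigenvalues, cutoff):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches `cutoff` (a fraction in (0, 1])."""
    if not 0.0 < cutoff <= 1.0:
        raise ValueError("cutoff must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= cutoff:
            return count
    return len(ordered)


def sweep_cutoffs(eigenvalues, cutoffs):
    """Map each candidate cutoff to the number of components it retains."""
    return {c: components_for_cutoff(eigenvalues, c) for c in cutoffs}
```

In a study like the one described, each candidate cutoff would then be scored by training the classifiers on the retained components and keeping the cutoff that gives the highest cross-validated accuracy.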
Affiliation(s)
- Md Maniruzzaman
  - Statistics Discipline, Khulna University, Khulna, Bangladesh
- Md Merajul Islam
  - Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh
- Md Jahanur Rahman
  - Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh
- Md Al Mehedi Hasan
  - Department of Computer Science & Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh; School of Computer Science and Engineering, University of Aizu, Aizuwakamatsu, Fukushima, Japan
- Jungpil Shin
  - School of Computer Science and Engineering, University of Aizu, Aizuwakamatsu, Fukushima, Japan
13
Soft Computing of a Medically Important Arthropod Vector with Autoregressive Recurrent and Focused Time Delay Artificial Neural Networks. Insects 2021; 12:insects12060503. [PMID: 34072705 PMCID: PMC8227104 DOI: 10.3390/insects12060503] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/26/2021] [Revised: 05/25/2021] [Accepted: 05/27/2021] [Indexed: 12/02/2022]
Abstract
Simple Summary: Arthropod vectors transmit a large number of diseases, and for most of them effective vaccines are still not available. Vector disease control is mostly achieved by sustained prediction of vector populations to support surveillance and control activities. Mathematical models may assist in predicting arthropod population dynamics. However, arthropod populations, and mosquitoes in particular, often exhibit abrupt, non-linear dynamics because of their complex life cycle. Therefore, there is growing interest in describing mosquito population dynamics using new methodologies. In this work, we sought insights into the non-linear population dynamics of Culex sp. adults, aiming to introduce straightforward soft-computing techniques based on artificial neural networks (ANNs). We propose two kinds of models: one autoregressive, handling temperature as an exogenous driver and population as an endogenous one, and a second based only on the exogenous factor. To the best of our knowledge, this is the first study using recurrent neural networks and the most influential environmental variable to predict the population dynamics of the West Nile virus (WNv) vector Culex sp., providing a new framework to be used in arthropod decision-support systems.
Abstract: A central issue of public health strategies is the availability of decision tools to be used in the preventive management of the transmission cycle of vector-borne diseases. In this work, we present, for the first time, a soft-computing modeling approach using two dynamic artificial neural network (ANN) models to describe and predict the non-linear incidence and time evolution of a medically important mosquito species, Culex sp., in Northern Greece. The first model is a non-linear autoregressive recurrent neural network with exogenous inputs (NARX), which takes temperature as an exogenous input variable and mosquito abundance as an endogenous one. The second model is a focused time-delay neural network (FTD), which takes only the temperature variable as input to forecast mosquito abundance as the target variable. Both models behaved well considering the non-linear nature of the adult mosquito abundance data. Although the NARX model predicted slightly better (R = 0.623) than the FTD model (R = 0.534), the advantage of the FTD over the NARX model is that it can be applied when past values of the population system, here mosquito abundance, are not available for forecasting.
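The difference between the two models comes down to what is fed into the network at each time step. The following is a minimal sketch of that input construction only, not the authors' implementation: the function names, the delay length, and the sample values are all hypothetical, and no network is trained here.

```python
def ftd_inputs(temperature, delays):
    """FTD-style samples: each input holds only the last `delays` temperatures."""
    return [temperature[t - delays:t] for t in range(delays, len(temperature))]


def narx_inputs(temperature, abundance, delays):
    """NARX-style samples: lagged temperatures followed by lagged abundances."""
    return [
        temperature[t - delays:t] + abundance[t - delays:t]
        for t in range(delays, len(temperature))
    ]


# Made-up illustrative series (not data from the study).
temperature = [18.0, 21.5, 24.0, 26.5, 25.0]
abundance = [3, 9, 20, 34, 28]

delays = 2
targets = abundance[delays:]                      # abundance to forecast
ftd = ftd_inputs(temperature, delays)             # needs temperature only
narx = narx_inputs(temperature, abundance, delays)  # also needs past abundance
```

The FTD samples can be built from temperature records alone, which mirrors the abstract's point that the FTD model remains usable when past abundance values are missing.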
|
14
|
Mehmood A, Khan IR, Dawood H, Dawood H. A non-uniform quantization scheme for visualization of CT images. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2021; 18:4311-4326. [PMID: 34198438 DOI: 10.3934/mbe.2021216] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/13/2023]
Abstract
Medical science heavily depends on image acquisition and post-processing for accurate diagnosis and treatment planning. Noise introduced during the capturing process degrades the visual quality of medical images, which may result in false perception. Therefore, medical image enhancement is an essential research topic for improving image quality. In this paper, a clustering-based contrast enhancement technique is presented for computed tomography (CT) images. Our approach recursively splits the data into clusters, targeting the maximum error reduction in each cluster. This groups similar pixels in every cluster, maximizing intra-cluster and minimizing inter-cluster similarities. A suitable number of clusters can be chosen to represent high-precision data with the desired bit depth. We use 256 clusters to convert 16-bit CT scans to 8-bit images suitable for visualization on standard low dynamic range displays. We compare our method with several existing contrast enhancement algorithms and show that the proposed technique provides better results in terms of execution efficiency and quality of enhanced images.
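The recursive splitting idea can be illustrated with a small sketch. This is an assumption-laden simplification, not the paper's algorithm: it measures error as the sum of squared deviations from the cluster mean, splits greedily, and uses brute-force search, so it is only suitable for tiny inputs. All function names are hypothetical.

```python
def sse(values):
    """Sum of squared errors of values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(values):
    """Return (error_reduction, index) of the best two-way cut of sorted values."""
    total = sse(values)
    best = (0.0, None)
    for i in range(1, len(values)):
        if values[i] == values[i - 1]:
            continue  # never separate identical intensities
        reduction = total - sse(values[:i]) - sse(values[i:])
        if reduction > best[0]:
            best = (reduction, i)
    return best


def quantize(pixels, n_clusters):
    """Map each pixel to a cluster index after greedy recursive splitting."""
    clusters = [sorted(pixels)]
    while len(clusters) < n_clusters:
        # Choose the cluster whose best cut removes the most error.
        candidates = [(best_split(c), k) for k, c in enumerate(clusters)]
        (reduction, cut), k = max(candidates, key=lambda item: item[0][0])
        if cut is None:  # no cluster can be split further
            break
        clusters[k:k + 1] = [clusters[k][:cut], clusters[k][cut:]]
    # Clusters stay ordered by intensity, so the index serves as output level.
    level_of = {v: level for level, c in enumerate(clusters) for v in c}
    return [level_of[p] for p in pixels]


# Toy example with three output levels instead of the 256 used in the paper.
levels = quantize([1000, 0, 60000, 10, 1010, 20], n_clusters=3)
```

With `n_clusters=256`, the returned indices would fit in 8 bits, which corresponds to the 16-bit to 8-bit conversion described in the abstract.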
Affiliation(s)
- Anam Mehmood: Department of Software Engineering, University of Engineering and Technology, Taxila, Pakistan
- Ishtiaq Rasool Khan: Department of Computer Science and Artificial Intelligence, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
- Hassan Dawood: Department of Software Engineering, University of Engineering and Technology, Taxila, Pakistan
- Hussain Dawood: Department of Computer and Network Engineering, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
|
15
|
Cheerkoot-Jalim S, Khedo KK. A systematic review of text mining approaches applied to various application areas in the biomedical domain. JOURNAL OF KNOWLEDGE MANAGEMENT 2020. [DOI: 10.1108/jkm-09-2019-0524] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/11/2022]
Abstract
Purpose
This work shows the results of a systematic literature review on biomedical text mining. The purpose of this study is to identify the different text mining approaches used in different application areas of the biomedical domain, the common tools used and the challenges of biomedical text mining as compared to generic text mining algorithms. This study will be of value to biomedical researchers by allowing them to correlate text mining approaches to specific biomedical application areas. Implications for future research are also discussed.
Design/methodology/approach
The review was conducted following the principles of the Kitchenham method. A number of research questions were first formulated, followed by the definition of the search strategy. The papers were then selected based on a list of assessment criteria. Each paper was analyzed, and information relevant to the research questions was extracted.
Findings
It was found that researchers have mostly harnessed data sources such as electronic health records, biomedical literature, social media and health-related forums. The most common text mining technique was natural language processing using tools such as MetaMap and Unstructured Information Management Architecture, alongside the use of medical terminologies such as Unified Medical Language System. The main application area was the detection of adverse drug events. Challenges identified included the need to deal with huge amounts of text, the heterogeneity of the different data sources, the duality of meaning of words in biomedical text and the amount of noise introduced mainly from social media and health-related forums.
Originality/value
To the best of the authors’ knowledge, other reviews in this area have focused on either specific techniques, specific application areas or specific data sources. The results of this review will help researchers to correlate the most relevant and recent advances in text mining approaches to specific biomedical application areas by providing an up-to-date and holistic view of work done in this research area. The use of emerging text mining techniques has great potential to spur the development of innovative applications, thus considerably advancing biomedical research.
|
16
|
Pham T, Tao X, Zhang J, Yong J. Constructing a knowledge-based heterogeneous information graph for medical health status classification. Health Inf Sci Syst 2020; 8:10. [PMID: 32117570 DOI: 10.1007/s13755-020-0100-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/21/2019] [Accepted: 01/23/2020] [Indexed: 02/06/2023] Open
Abstract
Applying Pearson correlation and semantic relations to build a heterogeneous information graph (HIG) for a classification model has achieved notable improvements in the accuracy of predicting health risk status. In this study, the approach integrated medical domain knowledge with Pearson correlation and semantic relations to build a classification model for diagnosis. The research mined knowledge extracted from MEDLINE titles and abstracts to discover how to assess the links between objects relating to medical concepts. A knowledge-base HIG model was then developed to predict a patient's health status. The experimental results showed that the knowledge-base model was superior to the baseline model and demonstrated that the knowledge base could help improve the performance of the classification model. The contribution of this study is a framework for applying a knowledge base in the classification model, which helps such models achieve the best prediction performance. This study has also contributed a model to medical practice to help practitioners become more confident in making final decisions when diagnosing illness. Moreover, this study affirmed that biomedical literature can assist in building a classification model. This contribution will be advantageous for future researchers mining the knowledge base to develop different kinds of classification models.
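As an illustration of how Pearson correlation can define links in a graph, the sketch below connects two nodes when the absolute correlation of their feature vectors reaches a threshold. The thresholding rule, the node names, and the vectors are assumptions made for this example; the abstract does not specify how the authors derive edges.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_edges(vectors, threshold):
    """Return (node_a, node_b, r) for every pair with |r| >= threshold."""
    names = sorted(vectors)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(vectors[a], vectors[b])
            if abs(r) >= threshold:
                edges.append((a, b, round(r, 3)))
    return edges


# Hypothetical feature vectors, one value per patient.
vectors = {
    "glucose": [1, 2, 3, 4],
    "bmi": [2, 4, 5, 9],
    "age": [4, 1, 3, 2],
}
edges = correlation_edges(vectors, threshold=0.8)
```

A lower threshold yields a denser graph; in practice the cut-off would have to be tuned against classification performance.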
Affiliation(s)
- Thuan Pham: University of Southern Queensland, Toowoomba, Australia
- Xiaohui Tao: University of Southern Queensland, Toowoomba, Australia
- Ji Zhang: University of Southern Queensland, Toowoomba, Australia
- Jianming Yong: University of Southern Queensland, Toowoomba, Australia
|
17
|
Classification and prediction of diabetes disease using machine learning paradigm. Health Inf Sci Syst 2020; 8:7. [PMID: 31949894 DOI: 10.1007/s13755-019-0095-z] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 08/21/2019] [Accepted: 12/21/2019] [Indexed: 12/19/2022] Open
Abstract
Background and objectives: Diabetes is a chronic disease characterized by high blood sugar. It may cause many complications, such as stroke, kidney failure, and heart attack. About 422 million people worldwide were affected by diabetes in 2014, and the figure will reach 642 million by 2040. The main objective of this study is to develop a machine learning (ML)-based system for predicting diabetic patients. Materials and methods: Logistic regression (LR) is used to identify the risk factors for diabetes based on p value and odds ratio (OR). We adopted four classifiers, namely naïve Bayes (NB), decision tree (DT), AdaBoost (AB), and random forest (RF), to predict diabetic patients. Three partition protocols (K2, K5, and K10) were also adopted, and each protocol was repeated for 20 trials. The performance of these classifiers is evaluated using accuracy (ACC) and area under the curve (AUC). Results: We used a diabetes dataset derived from the National Health and Nutrition Examination Survey, conducted in 2009-2012. The dataset consists of 6561 respondents, with 657 diabetic patients and 5904 controls. The LR model shows that 7 of the 14 factors (age, education, BMI, systolic BP, diastolic BP, direct cholesterol, and total cholesterol) are risk factors for diabetes. The overall ACC of the ML-based system is 90.62%. The combination of LR-based feature selection and an RF-based classifier gives 94.25% ACC and 0.95 AUC for the K10 protocol. Conclusion: The combination of LR and an RF-based classifier performs better. This combination will be very helpful for predicting diabetic patients.
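The partition protocols can be sketched as repeated cross-validation. This sketch assumes that K2, K5, and K10 denote 2-, 5-, and 10-fold cross-validation, which the abstract does not state explicitly; the function names and the per-trial seeding are likewise hypothetical. Only the index splitting is shown, with no classifier attached.

```python
import random


def k_fold_indices(n_samples, k, seed):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def repeated_protocol(n_samples, k, trials):
    """Yield (trial, train_indices, test_indices) for each fold of each trial."""
    for trial in range(trials):
        folds = k_fold_indices(n_samples, k, seed=trial)
        for held_out in range(k):
            test = folds[held_out]
            train = [
                i
                for f, fold in enumerate(folds)
                if f != held_out
                for i in fold
            ]
            yield trial, train, test


# 6561 respondents, K10 protocol, 20 trials, as reported in the abstract.
splits = list(repeated_protocol(n_samples=6561, k=10, trials=20))
```

Each trial reshuffles the data, so averaging ACC and AUC over all 200 splits reduces the dependence of the reported score on any single partition.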
|
18
|
Jose JM, Yilmaz E, Magalhães J, Castells P, Ferro N, Silva MJ, Martins F. DSR: A Collection for the Evaluation of Graded Disease-Symptom Relations. LECTURE NOTES IN COMPUTER SCIENCE 2020. [PMCID: PMC7148057 DOI: 10.1007/978-3-030-45442-5_54] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
The effective extraction of ranked disease-symptom relationships is a critical component in various medical tasks, including computer-assisted medical diagnosis and the discovery of unexpected associations between diseases. While existing disease-symptom relationship extraction methods are used as the foundation for these tasks, no collection is available to systematically evaluate their performance. In this paper, we introduce the Disease-Symptom Relation Collection (dsr-collection), created by five physicians as expert annotators. We provide graded symptom judgments for diseases by differentiating between relevant symptoms and primary symptoms. Further, we provide several strong baselines based on the methods used in previous studies. The first method is based on word embeddings, and the second on co-occurrences of MeSH keywords of medical articles. For the co-occurrence method, we propose an adaptation in which not only keywords but also the full text of medical articles is considered. The evaluation on the dsr-collection shows the effectiveness of the proposed adaptation in terms of nDCG, precision, and recall.
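Graded judgments such as these are what make nDCG applicable. The sketch below computes nDCG with linear gains; the grade values (2 for a primary symptom, 1 for a relevant one), the symptom names, and the ranking are assumptions for illustration and are not taken from the collection.

```python
from math import log2


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_symptoms, judgments, k):
    """nDCG@k of a ranked symptom list against graded judgments."""
    gains = [judgments.get(s, 0) for s in ranked_symptoms[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments for one disease: 2 = primary, 1 = relevant.
judgments = {"fever": 2, "cough": 2, "fatigue": 1}
ranking = ["cough", "headache", "fever", "fatigue"]
score = ndcg(ranking, judgments, k=3)
```

Some evaluations use exponential gains (2^grade - 1) instead of linear ones; that choice changes the scores but not the structure of the computation.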
|
19
|
Xue M, Su Y, Li C, Wang S, Yao H. Identification of Potential Type II Diabetes in a Large-Scale Chinese Population Using a Systematic Machine Learning Framework. J Diabetes Res 2020; 2020:6873891. [PMID: 33029536 PMCID: PMC7532405 DOI: 10.1155/2020/6873891] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/12/2020] [Revised: 08/01/2020] [Accepted: 09/02/2020] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND: An estimated 425 million people globally have diabetes, accounting for 12% of the world's health expenditures, and the number continues to grow, placing a huge burden on the healthcare system, especially in remote, underserved areas. METHODS: A total of 584,168 adult subjects who participated in the national physical examination were enrolled in this study. The risk factors for type II diabetes mellitus (T2DM) were identified by p values and odds ratios, using logistic regression (LR) based on variables from physical measurements and a questionnaire. Using the risk factors selected by LR, we applied a decision tree, a random forest, AdaBoost with a decision tree (AdaBoost), and an extreme gradient boosting decision tree (XGBoost) to identify individuals with T2DM, compared the performance of the four machine learning classifiers, and used the best-performing classifier to output variable importance scores for T2DM. RESULTS: XGBoost had the best performance (accuracy = 0.906, precision = 0.910, recall = 0.902, F-1 = 0.906, and AUC = 0.968). The variable importance scores from XGBoost showed that BMI was the most significant feature, followed by age, waist circumference, systolic pressure, ethnicity, smoking amount, fatty liver, hypertension, physical activity, drinking status, dietary ratio (meat to vegetables), drink amount, smoking status, and diet habit (oil loving). CONCLUSIONS: We proposed an LR-XGBoost classifier that uses fourteen easily obtained, noninvasive patient variables as predictors to identify potential cases of T2DM. The classifier can accurately screen for the risk of diabetes in the early phase, and the variable importance scores give clues for preventing diabetes.
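The reported metrics are related by standard formulas, and the F-1 value can be checked from the reported precision and recall. The sketch below shows those formulas; the confusion-matrix counts in the second example are hypothetical, since the abstract does not report them.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Consistency check using the precision and recall reported for XGBoost.
reported_f1 = round(f1_score(0.910, 0.902), 3)

# Hypothetical counts, only to show the function in use.
example = classification_metrics(tp=90, fp=10, fn=10, tn=90)
```

The value computed from the reported precision and recall rounds to 0.906, which agrees with the F-1 figure in the abstract.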
Affiliation(s)
- Mingyue Xue: Hospital of Traditional Chinese Medicine Affiliated to the Fourth Clinical Medical College of Xinjiang Medical University, Urumqi, China; College of Public Health, Xinjiang Medical University, Urumqi, China
- Yinxia Su: College of Public Health, Xinjiang Medical University, Urumqi, China
- Chen Li: The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
- Shuxia Wang: Center of Health Management, The First Affiliated Hospital, Xinjiang Medical University, Urumqi, China
- Hua Yao: Center of Health Management, The First Affiliated Hospital, Xinjiang Medical University, Urumqi, China
|
20
|
Siuly S, Zhang X. Guest Editorial: Special issue on "Application of artificial intelligence in health research". Health Inf Sci Syst 2019; 8:1. [PMID: 31867102 DOI: 10.1007/s13755-019-0089-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 01/30/2023] Open
Affiliation(s)
- Siuly Siuly: Institute for Sustainable Industries & Liveable Cities, Victoria University, Melbourne, Australia
- Xiangliang Zhang: King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
|
21
|
Liu S, Lee I. Extracting features with medical sentiment lexicon and position encoding for drug reviews. Health Inf Sci Syst 2019; 7:11. [PMID: 31168364 PMCID: PMC6542915 DOI: 10.1007/s13755-019-0072-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/31/2018] [Accepted: 05/15/2019] [Indexed: 11/26/2022] Open
Abstract
Medical sentiment analysis refers to the extraction of sentiments or emotions from documents retrieved from healthcare sources, such as public forums and drug review websites. Previous studies show that sentiment analysis of clinical documents has the potential to assist patients with information for self-assessing treatments, provide health professionals with more insight into patients' health conditions, or even manage relations between patients and doctors. Nevertheless, the lack of data used for empirical experiments in previous research indicates a strong need for a systematic framework to identify sentiments specific to the medical field. We propose a new feature extraction approach utilising position embeddings to generate a medical domain enhanced sentiment lexicon with position encoding representation for drug review sentiment analysis. Experiments on different feature extraction methods using two types of sentiment lexicons with various machine learning classifiers support the superior sentiment classification performance of a medical sentiment lexicon incorporating position encoding on drug review datasets.
Affiliation(s)
- Sisi Liu: Discipline of Computer Science & Information Technology, College of Science & Engineering, James Cook University, PO Box 6811, Cairns, QLD 4870 Australia
- Ickjai Lee: Discipline of Computer Science & Information Technology, College of Science & Engineering, James Cook University, PO Box 6811, Cairns, QLD 4870 Australia
|
22
|
Yazdani A, Safdari R, Golkar A, R Niakan Kalhori S. Words prediction based on N-gram model for free-text entry in electronic health records. Health Inf Sci Syst 2019; 7:6. [PMID: 30886701 DOI: 10.1007/s13755-019-0065-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/13/2017] [Accepted: 02/01/2019] [Indexed: 12/29/2022] Open
Abstract
Documentation is one of the most important parts of electronic health records (EHR). It is time-consuming, and until now, available documentation procedures have not been able to overcome this EHR limitation. Thus, entering information into EHR remains a challenge. In this study, we applied a trigram language model to predict the next words while typing free text. We hypothesized that using this system may save typing time for free text. The word prediction model introduced in this research was trained and tested on free-text reports of colonoscopy, transesophageal echocardiogram, and anterior-cervical-decompression. The typing time required for each of these reports was calculated and compared with manual typing of the same words. The results showed a 33.36% reduction in typing time and a 73.53% reduction in keystrokes. The designed system reduced the time needed to type free text, which might be an approach for improving EHR documentation.
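A trigram model predicts the next word from the two words that precede it. The sketch below shows the idea with plain counts and no smoothing; the function names and the tiny training corpus are invented for illustration, and the authors' system may differ in tokenization, ranking, and handling of unseen contexts.

```python
from collections import Counter, defaultdict


def train_trigrams(sentences):
    """Count which word follows each pair of consecutive words."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            model[(w1, w2)][w3] += 1
    return model


def predict_next(model, text, n=3):
    """Return up to n candidate next words for the last two typed words."""
    words = text.lower().split()
    if len(words) < 2:
        return []
    context = tuple(words[-2:])
    return [word for word, _ in model.get(context, Counter()).most_common(n)]


# Invented report fragments, not data from the study.
corpus = [
    "the colon was normal",
    "the colon was examined",
    "the colon was normal throughout",
]
model = train_trigrams(corpus)
suggestions = predict_next(model, "The colon was")
```

Accepting a suggestion replaces typing the whole word with a single selection, which is the mechanism behind the keystroke reduction the abstract reports.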
Affiliation(s)
- Azita Yazdani: Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Reza Safdari: Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Ali Golkar: Department of Computer Engineering, Yasooj Branch, Islamic Azad University, Yasooj, Iran
- Sharareh R Niakan Kalhori: Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
|