1
Al Meslamani AZ, Sobrino I, de la Fuente J. Machine learning in infectious diseases: potential applications and limitations. Ann Med 2024; 56:2362869. [PMID: 38853633 PMCID: PMC11168216 DOI: 10.1080/07853890.2024.2362869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 05/02/2024] [Indexed: 06/11/2024] Open
Abstract
Infectious diseases are a major threat to human and animal health worldwide. Artificial Intelligence (AI) algorithms, including Machine Learning (ML) and Big Data analytics, have emerged as a potential solution for analysing diverse datasets and facing the challenges posed by infectious diseases. In this commentary we explore the potential applications and limitations of ML in the management of infectious diseases, examining challenges in key areas such as outbreak prediction, pathogen identification, drug discovery, and personalized medicine. We propose potential solutions to mitigate these hurdles and applications of ML to identify biomolecules for effective treatment and prevention of infectious diseases. Beyond the use of ML for management of infectious diseases, potential applications include approaches based on catastrophic evolution events for the identification of biomolecular targets to reduce risks for infectious diseases, and vaccinomics for the discovery and characterization of vaccine protective antigens using intelligent Big Data analytics techniques. These considerations set a foundation for developing effective strategies for managing infectious diseases in the future.
Affiliation(s)
- Ahmad Z. Al Meslamani
  - College of Pharmacy, Al Ain University, Abu Dhabi, United Arab Emirates
  - AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi, United Arab Emirates
- Isidro Sobrino
  - SaBio, Instituto de Investigación en Recursos Cinegéticos (IREC), Consejo Superior de Investigaciones Científicas (CSIC), Universidad de Castilla-La Mancha (UCLM)-Junta de Comunidades de Castilla-La Mancha (JCCM), Ciudad Real, Spain
- José de la Fuente
  - SaBio, Instituto de Investigación en Recursos Cinegéticos (IREC), Consejo Superior de Investigaciones Científicas (CSIC), Universidad de Castilla-La Mancha (UCLM)-Junta de Comunidades de Castilla-La Mancha (JCCM), Ciudad Real, Spain
  - Department of Veterinary Pathobiology, Center for Veterinary Health Sciences, OK State University, Stillwater, Oklahoma, USA
2
Xu N, Cai Y, Tong Y, Tang L, Zhou Y, Gong Y, Huang J, Wang J, Chen Y, Jiang Q, Zheng M, Zhou Y. Prediction on the spatial distribution of the seropositive rate of schistosomiasis in Hunan Province, China: a machine learning model integrated with the Kriging method. Parasitol Res 2024; 123:316. [PMID: 39230789 DOI: 10.1007/s00436-024-08331-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Accepted: 08/19/2024] [Indexed: 09/05/2024]
Abstract
Schistosomiasis remains a formidable challenge to global public health. This study aims to predict the spatial distribution of schistosomiasis seropositive rates in Hunan Province, pinpointing high-risk transmission areas and advocating for tailored control measures in low-endemic regions. Six machine learning models and their corresponding hybrid machine learning-Kriging models were employed to predict the seropositive rate. The optimal model was selected through internal and external validations to simulate the spatial distribution of seropositive rates. Our results showed that the hybrid machine learning-Kriging models demonstrated superior predictive performance compared to the basic machine learning models, and the Cubist-Kriging model emerged as the optimal model for this study. The predictive map revealed elevated seropositive rates around Dongting Lake and its waterways with significant clustering, notably in the central and northern regions of Yiyang City and the northeastern areas of Changde City. The model identified gross domestic product, annual average wind speed, and the nearest distance from the river as the top three predictors of seropositive rates, with annual average daytime surface temperature contributing the least. In conclusion, our research has revealed that integrating the Kriging method significantly enhances the predictive performance of machine learning models. We developed a Cubist-Kriging model with high predictive performance to forecast the spatial distribution of schistosomiasis seropositive rates. These findings provide valuable guidance for the precise prevention and control of schistosomiasis.
Affiliation(s)
- Ning Xu
  - Fudan University School of Public Health, Building 8, 130 Dong'an Road, Shanghai, 200032, China
  - Key Laboratory of Public Health Safety, Ministry of Education, Fudan University, Building 8, 130 Dong'an Road, Shanghai, 200032, China
  - Fudan University Center for Tropical Disease Research, Building 8, 130 Dong'an Road, Shanghai, 200032, China
- Yu Cai
  - Hunan Institute for Schistosomiasis Control, Jin'e Middle Road, Yueyang, 414021, Hunan, China
- Yixin Tong
  - Fudan University School of Public Health, Building 8, 130 Dong'an Road, Shanghai, 200032, China
  - Key Laboratory of Public Health Safety, Ministry of Education, Fudan University, Building 8, 130 Dong'an Road, Shanghai, 200032, China
  - Fudan University Center for Tropical Disease Research, Building 8, 130 Dong'an Road, Shanghai, 200032, China
- Ling Tang
  - Hunan Institute for Schistosomiasis Control, Jin'e Middle Road, Yueyang, 414021, Hunan, China
- Yu Zhou
  - Fudan University School of Public Health, Building 8, 130 Dong'an Road, Shanghai, 200032, China
  - Key Laboratory of Public Health Safety, Ministry of Education, Fudan University, Building 8, 130 Dong'an Road, Shanghai, 200032, China
  - Fudan University Center for Tropical Disease Research, Building 8, 130 Dong'an Road, Shanghai, 200032, China
- Yanfeng Gong
  - Fudan University School of Public Health, Building 8, 130 Dong'an Road, Shanghai, 200032, China
  - Key Laboratory of Public Health Safety, Ministry of Education, Fudan University, Building 8, 130 Dong'an Road, Shanghai, 200032, China
  - Fudan University Center for Tropical Disease Research, Building 8, 130 Dong'an Road, Shanghai, 200032, China
- Junhui Huang
  - Fudan University School of Public Health, Building 8, 130 Dong'an Road, Shanghai, 200032, China
  - Key Laboratory of Public Health Safety, Ministry of Education, Fudan University, Building 8, 130 Dong'an Road, Shanghai, 200032, China
  - Fudan University Center for Tropical Disease Research, Building 8, 130 Dong'an Road, Shanghai, 200032, China
- Jiamin Wang
  - Fudan University School of Public Health, Building 8, 130 Dong'an Road, Shanghai, 200032, China
  - Key Laboratory of Public Health Safety, Ministry of Education, Fudan University, Building 8, 130 Dong'an Road, Shanghai, 200032, China
  - Fudan University Center for Tropical Disease Research, Building 8, 130 Dong'an Road, Shanghai, 200032, China
- Yue Chen
  - School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, 600 Peter Morand Crescent, Ottawa, ON, K1G 5Z3, Canada
- Qingwu Jiang
  - Fudan University School of Public Health, Building 8, 130 Dong'an Road, Shanghai, 200032, China
  - Key Laboratory of Public Health Safety, Ministry of Education, Fudan University, Building 8, 130 Dong'an Road, Shanghai, 200032, China
  - Fudan University Center for Tropical Disease Research, Building 8, 130 Dong'an Road, Shanghai, 200032, China
- Mao Zheng
  - Hunan Institute for Schistosomiasis Control, Jin'e Middle Road, Yueyang, 414021, Hunan, China
- Yibiao Zhou
  - Fudan University School of Public Health, Building 8, 130 Dong'an Road, Shanghai, 200032, China
  - Key Laboratory of Public Health Safety, Ministry of Education, Fudan University, Building 8, 130 Dong'an Road, Shanghai, 200032, China
  - Fudan University Center for Tropical Disease Research, Building 8, 130 Dong'an Road, Shanghai, 200032, China
3
Sidhoum NR, Boucheikhchoukh M, Azzouzi C, Mechouk N, Culda CA, Ionică AM, Balmos OM, Mihalca AD, Deak G. Molecular survey of flea-borne pathogens in fleas associated with carnivores from Algeria and an Artificial Neural Network-based risk analysis of flea-borne diseases. Res Vet Sci 2024; 171:105235. [PMID: 38554609 DOI: 10.1016/j.rvsc.2024.105235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 03/19/2024] [Accepted: 03/19/2024] [Indexed: 04/02/2024]
Abstract
As ectoparasites and efficient vectors of pathogens, fleas constitute a source of nuisance for animals as well as a major public health issue in Algeria. In this study, a molecular survey was conducted to investigate the presence of pathogens in fleas infesting domestic and wild carnivores in the central north and the eastern north and south of Algeria. The molecular screening, which targeted Acanthocheilonema reconditum, Bartonella spp., and Dipylidium caninum, was supplemented by a comprehensive analysis of risk factors related to flea-borne pathogens, drawing data from all documentation across multiple languages and sources from Morocco, Algeria, and Tunisia. In the current study, Bartonella spp. (56/430, 13.02%) and Dipylidium caninum (3/430, 0.7%) were identified. The sequencing results revealed 5/23 (21.74%) B. clarridgeiae, 3/23 (13.04%) B. henselae, and 3/23 (13.04%) B. vinsonii. The two haplotypes of D. caninum, H1 and H2, were identified for the first time in North Africa. The Artificial Neural Network risk analyses revealed that the prevalence of pathogens, the presence of host-generalist fleas, and vectorial competence are the most decisive risk factors for flea-borne diseases in the Maghreb.
Affiliation(s)
- Noureddine Rabah Sidhoum
  - Department of Veterinary Sciences, Chadli Bendjedid El Tarf University, PB 73, El-Tarf 36000, Algeria; Biodiversity and Ecosystems Pollution Laboratory, Faculty of Life and Nature Sciences, Chadli Bendjedid El Tarf University, El Tarf 36000, Algeria
- Mehdi Boucheikhchoukh
  - Department of Veterinary Sciences, Chadli Bendjedid El Tarf University, PB 73, El-Tarf 36000, Algeria
- Chaima Azzouzi
  - Department of Veterinary Sciences, Chadli Bendjedid El Tarf University, PB 73, El-Tarf 36000, Algeria; Biodiversity and Ecosystems Pollution Laboratory, Faculty of Life and Nature Sciences, Chadli Bendjedid El Tarf University, El Tarf 36000, Algeria
- Noureddine Mechouk
  - Ecology of Terrestrial and Aquatics Systems Laboratory (EcoSTAq), Department of Biology, Faculty of Science, Badji Mokhtar University, Annaba 23200, Algeria; Department of Parasitology and Parasitic Diseases, Faculty of Veterinary Medicine, University of Agricultural Sciences and Veterinary Medicine of Cluj-Napoca, Calea Mănăștur 3-5, Cluj-Napoca 400372, Romania
- Carla Andreea Culda
  - Department of Parasitology and Parasitic Diseases, Faculty of Veterinary Medicine, University of Agricultural Sciences and Veterinary Medicine of Cluj-Napoca, Calea Mănăștur 3-5, Cluj-Napoca 400372, Romania
- Angela Monica Ionică
  - Department of Parasitology and Parasitic Diseases, Faculty of Veterinary Medicine, University of Agricultural Sciences and Veterinary Medicine of Cluj-Napoca, Calea Mănăștur 3-5, Cluj-Napoca 400372, Romania; Clinical Hospital of Infectious Diseases of Cluj-Napoca, Iuliu Moldovan 23, Cluj-Napoca 400348, Romania
- Oana-Maria Balmos
  - Department of Parasitology and Parasitic Diseases, Faculty of Veterinary Medicine, University of Agricultural Sciences and Veterinary Medicine of Cluj-Napoca, Calea Mănăștur 3-5, Cluj-Napoca 400372, Romania
- Andrei Daniel Mihalca
  - Department of Parasitology and Parasitic Diseases, Faculty of Veterinary Medicine, University of Agricultural Sciences and Veterinary Medicine of Cluj-Napoca, Calea Mănăștur 3-5, Cluj-Napoca 400372, Romania
- Georgiana Deak
  - Department of Parasitology and Parasitic Diseases, Faculty of Veterinary Medicine, University of Agricultural Sciences and Veterinary Medicine of Cluj-Napoca, Calea Mănăștur 3-5, Cluj-Napoca 400372, Romania
4
Daoud S, Taha M. Protein characteristics substantially influence the propensity of activity cliffs among kinase inhibitors. Sci Rep 2024; 14:9058. [PMID: 38643174 PMCID: PMC11032345 DOI: 10.1038/s41598-024-59501-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2023] [Accepted: 04/11/2024] [Indexed: 04/22/2024] Open
Abstract
Activity cliffs (ACs) are pairs of structurally similar molecules with significantly different affinities for a biotarget, posing a challenge in computer-assisted drug discovery. This study focuses on protein kinases, significant therapeutic targets, with some exhibiting ACs while others do not despite numerous inhibitors. The hypothesis that the presence of ACs is dependent on the target protein and its complete structural context is explored. Machine learning models were developed to link protein properties to ACs, revealing specific tripeptide sequences and overall protein properties as critical factors in ACs occurrence. The study highlights the importance of considering the entire protein matrix rather than just the binding site in understanding ACs. This research provides valuable insights for drug discovery and design, paving the way for addressing ACs-related challenges in modern computational approaches.
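The definition of an activity cliff given in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: fingerprints are represented as plain sets of integer bit indices, similarity is measured with the Tanimoto coefficient, and the cutoffs (`sim_cutoff=0.7`, `potency_gap=2.0` log units) and compound names are hypothetical.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_activity_cliffs(compounds, sim_cutoff=0.7, potency_gap=2.0):
    """Return name pairs that are structurally similar but differ in potency.

    compounds maps a name to (fingerprint bit set, potency on a log scale).
    Both cutoffs are illustrative assumptions, not values from the paper.
    """
    cliffs = []
    for (n1, (fp1, p1)), (n2, (fp2, p2)) in combinations(compounds.items(), 2):
        if tanimoto(fp1, fp2) >= sim_cutoff and abs(p1 - p2) >= potency_gap:
            cliffs.append((n1, n2))
    return cliffs


# Hypothetical toy data: cpd_A and cpd_B share 7 of 9 bits but differ in potency.
compounds = {
    "cpd_A": (frozenset({1, 2, 3, 4, 5, 6, 7, 8}), 8.5),
    "cpd_B": (frozenset({1, 2, 3, 4, 5, 6, 7, 9}), 5.9),
    "cpd_C": (frozenset({10, 11, 12, 13}), 8.4),
}
print(find_activity_cliffs(compounds))
```

In practice the fingerprints would come from a cheminformatics toolkit and the thresholds would be chosen per dataset; the pairwise loop is quadratic and is only meant to show the definition.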
Affiliation(s)
- Safa Daoud
  - Department of Pharmaceutical Chemistry and Pharmacognosy, Faculty of Pharmacy, Applied Sciences Private University, Amman, Jordan
- Mutasem Taha
  - Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman, Jordan
5
Jaradat NJ, Hatmal M, Alqudah D, Taha MO. Computational workflow for discovering small molecular binders for shallow binding sites by integrating molecular dynamics simulation, pharmacophore modeling, and machine learning: STAT3 as case study. J Comput Aided Mol Des 2023; 37:659-678. [PMID: 37597062 DOI: 10.1007/s10822-023-00528-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Accepted: 07/26/2023] [Indexed: 08/21/2023]
Abstract
STAT3 belongs to a family of seven transcription factors. It plays an important role in activating the transcription of various genes involved in a variety of cellular processes. High levels of STAT3 are detected in several types of cancer. Hence, STAT3 inhibition is considered a promising anti-cancer therapeutic strategy. However, since STAT3 inhibitors bind to the shallow SH2 domain of the protein, hydration water molecules are expected to play a significant role in ligand binding, complicating the discovery of potent binders. To remedy this issue, we herein propose to extract pharmacophores from molecular dynamics (MD) frames of a potent co-crystallized ligand complexed within the STAT3 SH2 domain. Subsequently, we employ a genetic function algorithm coupled with machine learning (GFA-ML) to explore the optimal combination of MD-derived pharmacophores that can account for the variations in bioactivity among a list of inhibitors. To enhance the dataset, the training and testing lists were augmented nearly 100-fold by considering multiple conformers of the ligands. A single significant pharmacophore emerged after 188 ns of MD simulation to represent STAT3-ligand binding. Screening the National Cancer Institute (NCI) database with this model identified one low micromolar inhibitor that most likely binds to the SH2 domain of STAT3 and inhibits this pathway.
Affiliation(s)
- Nour Jamal Jaradat
  - Department of Medical Laboratory Sciences, Faculty of Applied Health Sciences, The Hashemite University, P.O. Box 330127, Zarqa, 13133, Jordan
- Mamon Hatmal
  - Department of Medical Laboratory Sciences, Faculty of Applied Health Sciences, The Hashemite University, P.O. Box 330127, Zarqa, 13133, Jordan
- Dana Alqudah
  - Cell Therapy Center, the University of Jordan, Amman, 11942, Jordan
- Mutasem Omar Taha
  - Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman, Jordan
6
Kakarla SG, Kondeti PK, Vavilala HP, Boddeda GSB, Mopuri R, Kumaraswamy S, Kadiri MR, Mutheneni SR. Weather integrated multiple machine learning models for prediction of dengue prevalence in India. International Journal of Biometeorology 2023; 67:285-297. [PMID: 36380258 PMCID: PMC9666965 DOI: 10.1007/s00484-022-02405-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2021] [Revised: 07/21/2022] [Accepted: 11/04/2022] [Indexed: 05/11/2023]
Abstract
Dengue is a rapidly spreading viral disease transmitted to humans by Aedes mosquitoes. Due to global urbanization and climate change, the number of dengue cases has gradually increased in recent decades. Hence, early prediction of dengue continues to be a major concern for public health in countries with a high prevalence of dengue. Creating a robust forecast model for the accurate prediction of dengue is a complex task and can be done through various data modelling approaches. In the present study, we applied vector auto regression, generalized boosted models, support vector regression, and long short-term memory (LSTM) to predict dengue prevalence in Kerala state of the Indian subcontinent. We consider the number of dengue cases as the target variable and the weather variables, viz. relative humidity, soil moisture, mean temperature, precipitation, and NINO3.4, as independent variables. Various analytical models were applied to both datasets to predict dengue cases. Among all the models, the LSTM model showed superior prediction capability (RMSE: 0.345 and R²: 0.86). The other models were able to capture the trend of dengue cases but, unlike LSTM, failed to predict the outbreak periods. The findings of this study will be helpful for public health agencies and policymakers in drawing up appropriate control measures before the onset of dengue. The proposed LSTM model for dengue prediction can be followed by other states of India as well.
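The two scores quoted in this abstract can be made concrete with a minimal sketch. It assumes R² means the coefficient of determination (the abstract does not say which definition was used), and the `observed` and `predicted` values are made-up toy numbers, not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy series (for example, scaled monthly case counts).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(round(rmse(observed, predicted), 4), round(r_squared(observed, predicted), 4))
```

RMSE is expressed in the units of the target variable, so a value such as 0.345 is only interpretable relative to how the case counts were scaled before modelling.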
Affiliation(s)
- Satya Ganesh Kakarla
  - ENVIS Resource Partner On Climate Change and Public Health, Applied Biology Division, CSIR-Indian Institute of Chemical Technology (CSIR-IICT), Tarnaka, Hyderabad, 500007, Telangana, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Phani Krishna Kondeti
  - ENVIS Resource Partner On Climate Change and Public Health, Applied Biology Division, CSIR-Indian Institute of Chemical Technology (CSIR-IICT), Tarnaka, Hyderabad, 500007, Telangana, India
- Hari Prasad Vavilala
  - ENVIS Resource Partner On Climate Change and Public Health, Applied Biology Division, CSIR-Indian Institute of Chemical Technology (CSIR-IICT), Tarnaka, Hyderabad, 500007, Telangana, India
- Gopi Sumanth Bhaskar Boddeda
  - ENVIS Resource Partner On Climate Change and Public Health, Applied Biology Division, CSIR-Indian Institute of Chemical Technology (CSIR-IICT), Tarnaka, Hyderabad, 500007, Telangana, India
- Rajasekhar Mopuri
  - ENVIS Resource Partner On Climate Change and Public Health, Applied Biology Division, CSIR-Indian Institute of Chemical Technology (CSIR-IICT), Tarnaka, Hyderabad, 500007, Telangana, India
- Sriram Kumaraswamy
  - ENVIS Resource Partner On Climate Change and Public Health, Applied Biology Division, CSIR-Indian Institute of Chemical Technology (CSIR-IICT), Tarnaka, Hyderabad, 500007, Telangana, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Madhusudhan Rao Kadiri
  - ENVIS Resource Partner On Climate Change and Public Health, Applied Biology Division, CSIR-Indian Institute of Chemical Technology (CSIR-IICT), Tarnaka, Hyderabad, 500007, Telangana, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Srinivasa Rao Mutheneni
  - ENVIS Resource Partner On Climate Change and Public Health, Applied Biology Division, CSIR-Indian Institute of Chemical Technology (CSIR-IICT), Tarnaka, Hyderabad, 500007, Telangana, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
7
Alabed SJ, Zihlif M, Taha M. Discovery of new potent lysine specific histone demythelase-1 inhibitors (LSD-1) using structure based and ligand based molecular modelling and machine learning. RSC Adv 2022; 12:35873-35895. [PMID: 36545090 PMCID: PMC9751883 DOI: 10.1039/d2ra05102h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Accepted: 12/05/2022] [Indexed: 12/23/2022] Open
Abstract
Lysine-specific histone demethylase 1 (LSD-1) is an epigenetic enzyme that oxidatively cleaves methyl groups from monomethyl and dimethyl Lys4 of histone H3 and is highly overexpressed in different types of cancer. Therefore, it has been widely recognized as a promising therapeutic target for cancer therapy. Towards this end, we employed various Computer Aided Drug Design (CADD) approaches including pharmacophore modelling and machine learning. Pharmacophores generated by structure-based (SB) (either crystallographic-based or docking-based) and ligand-based (LB) (either supervised or unsupervised) modelling methods were allowed to compete within the context of genetic algorithm/machine learning and were assessed by Shapley additive explanation values (SHAP) to end up with three successful pharmacophores that were used to screen the National Cancer Institute (NCI) database. Seventy-five NCI hits were tested for their LSD-1 inhibitory properties against neuroblastoma SH-SY5Y cells, pancreatic carcinoma Panc-1 cells, glioblastoma U-87 MG cells and in vitro enzymatic assay, culminating in 3 nanomolar LSD-1 inhibitors of novel chemotypes.
Affiliation(s)
- Shada J Alabed
  - Department of Pharmacy, Faculty of Pharmacy, Al-Zaytoonah University of Jordan, Amman, Jordan
- Malek Zihlif
  - Department of Pharmacology, Faculty of Medicine, University of Jordan, Amman, Jordan
- Mutasem Taha
  - Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman, Jordan
8
Althomsons SP, Winglee K, Heilig CM, Talarico S, Silk B, Wortham J, Hill AN, Navin TR. Using Machine Learning Techniques and National Tuberculosis Surveillance Data to Predict Excess Growth in Genotyped Tuberculosis Clusters. Am J Epidemiol 2022; 191:1936-1943. [PMID: 35780450 PMCID: PMC10790200 DOI: 10.1093/aje/kwac117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2021] [Revised: 05/05/2022] [Accepted: 06/28/2022] [Indexed: 02/01/2023] Open
Abstract
The early identification of clusters of persons with tuberculosis (TB) that will grow to become outbreaks creates an opportunity for intervention in preventing future TB cases. We used surveillance data (2009-2018) from the United States, statistically derived definitions of unexpected growth, and machine-learning techniques to predict which clusters of genotype-matched TB cases are most likely to continue accumulating cases above expected growth within a 1-year follow-up period. We developed a model to predict which clusters are likely to grow on a training and testing data set that was generalizable to a validation data set. Our model showed that characteristics of clusters were more important than the social, demographic, and clinical characteristics of the patients in those clusters. For instance, the time between cases before unexpected growth was identified as the most important of our predictors. A faster accumulation of cases increased the probability of excess growth being predicted during the follow-up period. We have demonstrated that combining the characteristics of clusters and cases with machine learning can add to existing tools to help prioritize which clusters may benefit most from public health interventions. For example, consideration of an entire cluster, not only an individual patient, may assist in interrupting ongoing transmission.
Affiliation(s)
- Sandy P. Althomsons
  - Division of TB Elimination, National Center for HIV, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, Georgia, United States
- Kathryn Winglee
  - Division of TB Elimination, National Center for HIV, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, Georgia, United States
- Charles M. Heilig
  - Center for Surveillance, Epidemiology, and Laboratory Services, Centers for Disease Control and Prevention, Atlanta, Georgia, United States
- Sarah Talarico
  - Division of TB Elimination, National Center for HIV, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, Georgia, United States
- Benjamin Silk
  - Division of TB Elimination, National Center for HIV, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, Georgia, United States
- Jonathan Wortham
  - Division of TB Elimination, National Center for HIV, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, Georgia, United States
- Andrew N. Hill
  - Division of TB Elimination, National Center for HIV, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, Georgia, United States
- Thomas R. Navin
  - Division of TB Elimination, National Center for HIV, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, Georgia, United States
9
Ghane M, Ang MC, Nilashi M, Sorooshian S. Enhanced decision tree induction using evolutionary techniques for Parkinson's disease classification. Biocybern Biomed Eng 2022. [DOI: 10.1016/j.bbe.2022.07.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
10
Singh SK, Taylor RW, Pradhan B, Shirzadi A, Pham BT. Predicting sustainable arsenic mitigation using machine learning techniques. Ecotoxicology and Environmental Safety 2022; 232:113271. [PMID: 35121252 DOI: 10.1016/j.ecoenv.2022.113271] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/15/2021] [Revised: 01/21/2022] [Accepted: 01/28/2022] [Indexed: 06/14/2023]
Abstract
This study evaluates state-of-the-art machine learning models in predicting the most sustainable arsenic mitigation preference. A Gaussian distribution-based Naïve Bayes (NB) classifier scored the highest Area Under the Curve (AUC) of the Receiver Operating Characteristic curve (0.82), followed by Nu Support Vector Classification (0.80), and K-Neighbors (0.79). Ensemble classifiers scored higher than 70% AUC, with Random Forest being the top performer (0.77), and Decision Tree model ranked fourth with an AUC of 0.77. The multilayer perceptron model also achieved high performance (AUC=0.75). Most linear classifiers underperformed, with the Ridge classifier at the top (AUC=0.73) and perceptron at the bottom (AUC=0.57). A Bernoulli distribution-based Naïve Bayes classifier was the poorest model (AUC=0.50). The Gaussian NB was also the most robust ML model with the slightest variation of Kappa score on training (0.58) and test data (0.64). The results suggest that nonlinear or ensemble classifiers could more accurately understand the complex relationships of socio-environmental data and help develop accurate and robust prediction models of sustainable arsenic mitigation. Furthermore, Gaussian NB is the best option when data is scarce.
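The AUC values used to rank the classifiers in this abstract can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way for a binary problem; the `labels` and `scores` are made-up toy values, and the pairwise loop is an illustrative choice, not the study's implementation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    example scores higher; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one positive (0.4) is ranked below one negative (0.7).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(round(roc_auc(labels, scores), 3))
```

Under this reading, the reported AUC of 0.50 for the Bernoulli Naïve Bayes classifier corresponds to scores that rank positives above negatives no better than chance.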
Affiliation(s)
- Sushant K Singh
  - Department of Earth and Environmental Studies, Montclair State University, New Jersey, USA; The Center for Artificial Intelligence and Environmental Sustainability (CAIES) Foundation, Patna, Bihar, India
- Robert W Taylor
  - Department of Earth and Environmental Studies, Montclair State University, New Jersey, USA
- Biswajeet Pradhan
  - Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), School of Civil and Environmental Engineering, University of Technology Sydney, NSW 2007, Australia; Department of Energy and Mineral Resources Engineering, Sejong University, Choongmu-gwan, 209 Neungdong-ro Gwangjin-gu, Seoul 05006, Republic of Korea; Center of Excellence for Climate Change Research, King Abdulaziz University, P. O. Box 80234, Jeddah 21589, Saudi Arabia; Earth Observation Centre, Institute of Climate Change, Universiti Kebangsaan Malaysia, 43600 UKM, Bangi, Selangor, Malaysia
- Ataollah Shirzadi
  - College of Natural Resources, Department of Rangeland and Watershed Management Sciences, University of Kurdistan, Sanandaj, Iran
- Binh Thai Pham
  - Department of Geotechnical Engineering, University of Transport Technology, 54 Trieu Khuc, Thanh Xuan, Ha Noi, Viet Nam
11
Exploiting activity cliffs for building pharmacophore models and comparison with other pharmacophore generation methods: sphingosine kinase 1 as case study. J Comput Aided Mol Des 2022; 36:39-62. [PMID: 35059939 DOI: 10.1007/s10822-021-00435-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/14/2021] [Accepted: 11/24/2021] [Indexed: 12/20/2022]
12
Hatmal MM, Alshaer W, Mahmoud IS, Al-Hatamleh MAI, Al-Ameer HJ, Abuyaman O, Zihlif M, Mohamud R, Darras M, Al Shhab M, Abu-Raideh R, Ismail H, Al-Hamadi A, Abdelhay A. Investigating the association of CD36 gene polymorphisms (rs1761667 and rs1527483) with T2DM and dyslipidemia: Statistical analysis, machine learning based prediction, and meta-analysis. PLoS One 2021; 16:e0257857. [PMID: 34648514 PMCID: PMC8516279 DOI: 10.1371/journal.pone.0257857] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/14/2021] [Accepted: 09/11/2021] [Indexed: 12/15/2022] Open
Abstract
CD36 (cluster of differentiation 36) is a membrane protein involved in lipid metabolism and has been linked to pathological conditions associated with metabolic disorders, such as diabetes and dyslipidemia. A case-control study including 177 patients with type-2 diabetes mellitus (T2DM) and 173 control subjects was conducted to study the involvement of the CD36 gene rs1761667 (G>A) and rs1527483 (C>T) polymorphisms in the pathogenesis of T2DM and dyslipidemia among the Jordanian population. Lipid profile, blood sugar, gender, and age were measured and recorded. Genotyping analysis for both polymorphisms was also performed. Following statistical analysis, 10 different neural networks and machine learning (ML) tools were used to predict subjects with diabetes or dyslipidemia. To further understand the role of the CD36 protein and gene in T2DM and dyslipidemia, a protein-protein interaction network and meta-analysis were carried out. For both polymorphisms, the genotypic frequencies were not significantly different between the two groups (p > 0.05). On the other hand, some ML tools, such as multilayer perceptron, gave high prediction accuracy (≥ 0.75) and Cohen's kappa (κ) (≥ 0.5). Interestingly, with the K-star tool, the accuracy and Cohen's κ values were enhanced by including the genotyping results as inputs (0.73 and 0.46, respectively, compared to 0.67 and 0.34 without including them). This study confirmed, for the first time, that there is no association between CD36 polymorphisms and T2DM or dyslipidemia among the Jordanian population. Prediction of T2DM and dyslipidemia using these extensive ML tools and based on such input data is a promising approach for developing diagnostic and prognostic prediction models for a wide spectrum of diseases, especially based on large medical databases.
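The abstract reports accuracy alongside Cohen's κ, which corrects observed agreement for the agreement expected by chance. A minimal sketch of that relationship follows; the label vectors are made-up toy values chosen for illustration and are not data from the study.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Note: undefined (division by zero) when expected agreement equals 1.
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels: 6 of 8 predictions are correct (accuracy 0.75).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(cohens_kappa(y_true, y_pred))
```

With balanced classes, as in this toy case, chance agreement is 0.5, so an accuracy of 0.75 maps to κ = 0.5; with imbalanced classes the same accuracy would yield a different κ.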
Affiliation(s)
- Ma’mon M. Hatmal
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Walhan Alshaer
- Cell Therapy Centre, The University of Jordan, Amman, Jordan
- Ismail S. Mahmoud
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Mohammad A. I. Al-Hatamleh
- Department of Immunology, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian, Kelantan, Malaysia
- Hamzeh J. Al-Ameer
- Department of Biology and Biotechnology, American University of Madaba, Madaba, Jordan
- Department of Pharmacology, Faculty of Medicine, The University of Jordan, Amman, Jordan
- Omar Abuyaman
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Malek Zihlif
- Department of Pharmacology, Faculty of Medicine, The University of Jordan, Amman, Jordan
- Rohimah Mohamud
- Department of Immunology, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian, Kelantan, Malaysia
- Mais Darras
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Mohammad Al Shhab
- Department of Pharmacology, Faculty of Medicine, The University of Jordan, Amman, Jordan
- Rand Abu-Raideh
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Hilweh Ismail
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Ali Al-Hamadi
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Ali Abdelhay
- Department of Pharmacology, Faculty of Medicine, The University of Jordan, Amman, Jordan
|
13
|
Hatmal MM, Abuyaman O, Taha M. Docking-generated multiple ligand poses for bootstrapping bioactivity classifying Machine Learning: Repurposing covalent inhibitors for COVID-19-related TMPRSS2 as case study. Comput Struct Biotechnol J 2021; 19:4790-4824. [PMID: 34426763 PMCID: PMC8373588 DOI: 10.1016/j.csbj.2021.08.023] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/22/2021] [Revised: 08/03/2021] [Accepted: 08/16/2021] [Indexed: 01/10/2023]
Abstract
In the present work we introduce the use of multiple docked poses for bootstrapping machine learning-based QSAR modelling. Ligand-receptor contact fingerprints are used as descriptor variables. We applied this method to the discovery of potential inhibitors of the serine protease TMPRSS2, which is involved in the infectivity of coronaviruses. Several machine learners were screened; XGBoost, support vector machines (SVM) and random forests (RF) performed best, with testing set accuracies reaching 90%. Three potential hits were identified when the method was used to screen known, untested FDA-approved drugs against TMPRSS2. Subsequent molecular dynamics simulation and covalent docking supported the results of the new computational approach.
Affiliation(s)
- Ma'mon M. Hatmal
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, PO Box 330127, Zarqa 13133, Jordan
- Omar Abuyaman
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, PO Box 330127, Zarqa 13133, Jordan
- Mutasem Taha
- Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman 11942, Jordan
|
14
|
Machine learning approach to support taxonomic species discrimination based on helminth collections data. Parasit Vectors 2021; 14:230. [PMID: 33933139 PMCID: PMC8088700 DOI: 10.1186/s13071-021-04721-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/27/2020] [Accepted: 04/07/2021] [Indexed: 11/10/2022]
Abstract
Background: There are more than 300 species of capillariids that parasitize various vertebrate groups worldwide. Species identification is hindered by the few taxonomically informative structures available, which makes the task laborious and genus definition controversial. Thus, capillariid taxonomy is one of the most complex among Nematoda. Eggs are the parasitic structures most often seen in coprological analysis of both modern and ancient samples; consequently, their presence is indicative of a positive diagnosis of infection. The structure of the egg could play a role in genus or species discrimination. Institutional biological collections are taxonomic repositories of specimens described and strictly identified by systematics specialists.
Methods: The present work aims to characterize eggs of capillariid species deposited in institutional helminth collections and to process the morphological, morphometric and ecological data using machine learning (ML) as a new approach for taxonomic identification. Specimens of 28 species and 8 genera deposited at Coleção Helmintológica do Instituto Oswaldo Cruz (CHIOC, IOC/FIOCRUZ/Brazil) and Collection de Nématodes Zooparasites du Muséum National d’Histoire Naturelle de Paris (MNHN/France) were examined under light microscopy. In the morphological and morphometric analyses (MM), the total length and width of eggs as well as plugs and shell thickness were considered. In addition, eggshell ornamentations and ecological parameters of the geographical location (GL) and host (H) were included.
Results: The logistic model tree (LMT) algorithm showed the highest values in all metrics compared with the other algorithms. Algorithm J48, alongside REPTree, produced the most reliable decision tree for species identification. The Majority Voting algorithm showed high metric values, but the combined classifiers did not attenuate the errors revealed in each algorithm alone. The statistical evaluation of the dataset indicated a significant difference between trees, with GL + H + MM and MM only achieving the best scores.
Conclusions: The present research proposed a novel procedure for taxonomic species identification, integrating data from centenary biological collections and the logic of artificial intelligence techniques. This study will support future research on taxonomic identification and diagnosis of both modern and archaeological capillariids.
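The Results mention that majority voting did not attenuate the errors of the individual classifiers. As a hedged illustration of how that can happen in general (not a reconstruction of the authors' pipeline), the sketch below combines per-sample predictions by plurality vote. The function name `majority_vote`, the three classifiers and the labels "A", "B", "C" are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by plurality vote.

    predictions: list of lists, one inner list of labels per classifier.
    Ties go to the label encountered first among the tied ones.
    """
    if not predictions:
        raise ValueError("at least one classifier is required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same samples")
    combined = []
    for labels in zip(*predictions):
        # Most frequent label across classifiers for this sample.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Hypothetical species labels for four samples.
truth = ["A", "B", "C", "A"]
clf1 = ["A", "B", "B", "A"]
clf2 = ["A", "C", "B", "A"]
clf3 = ["B", "B", "B", "A"]

voted = majority_vote([clf1, clf2, clf3])
print(voted)
# Samples where only one classifier is wrong are corrected by the vote,
# but the third sample stays wrong because all classifiers share the error.
print([v == t for v, t in zip(voted, truth)])
```

Voting helps when classifiers make different mistakes; when they fail on the same samples, the ensemble inherits the error, which is one possible reading of the result reported above.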
|