1
Cui Y, Moyo S, Pretorius Holme M, Hurwitz KE, Choga W, Bennett K, Chakalisa U, San JE, Manyake K, Kgathi C, Diphoko A, Gaseitsiwe S, Gaolathe T, Essex M, Tchetgen Tchetgen E, Makhema JM, Lockman S. Predictors of HIV seroconversion in Botswana. AIDS 2025; 39:290-297. [PMID: 39497537 DOI: 10.1097/qad.0000000000004055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Accepted: 10/31/2024] [Indexed: 12/17/2024]
Abstract
OBJECTIVE To identify predictors of HIV acquisition in Botswana. DESIGN We applied machine learning approaches to identify HIV risk predictors using existing data from a large, well characterized HIV incidence cohort. METHODS We applied machine learning (randomForestSRC) to analyze data from a large population-based HIV incidence cohort enrolled in a cluster-randomized HIV prevention trial in 30 communities across Botswana. We sought to identify the most important risk factors for HIV acquisition, starting with 110 potential predictors. RESULTS During a median 29-month follow-up of 8551 HIV-negative adults, 147 (1.7%) acquired HIV. Our machine learning analysis found that for females, the most important variables for predicting HIV acquisition were the use of injectable hormonal contraception, frequency of sex in the prior 3 months with the most recent partner and residing in a community with HIV prevalence of 29% or higher. For the small proportion (0.3%) of females who had all three risk factors, their estimated probability of acquiring HIV during 29 months of follow-up was 34% (approximate annual incidence of 14%). For males, nonlong-term relationships with the most recent partner and community HIV prevalence of 34% or higher were the most important HIV risk predictors. The 6% of males who had both risk factors had a 5.1% probability of acquiring HIV during the follow-up period (approximate annual incidence of 2.1%). CONCLUSION Machine learning approaches allowed us to analyze a large number of variables to efficiently identify key factors strongly predictive of HIV risk. These factors could help target HIV prevention interventions in Botswana. CLINICAL TRIALS REGISTRATION NCT01965470.
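The approximate annual incidence figures quoted in this abstract are consistent with a simple proportional scaling of the cumulative probability over the median 29-month follow-up. The sketch below illustrates that arithmetic under this assumption; it is not the authors' stated method, and the function name `approx_annual_incidence` is hypothetical.

```python
def approx_annual_incidence(cumulative_prob: float, followup_months: float) -> float:
    """Scale a cumulative probability to a 12-month figure by simple proportion.

    Assumption: linear scaling over the follow-up period. This ignores
    compounding and censoring, so it is only a rough approximation.
    """
    return cumulative_prob * 12.0 / followup_months


# Inputs taken from the abstract: 34% and 5.1% over a median of 29 months.
females_all_three = approx_annual_incidence(0.34, 29)
males_both = approx_annual_incidence(0.051, 29)

print(f"{females_all_three:.1%}")  # roughly 14%
print(f"{males_both:.1%}")         # roughly 2.1%
```

A constant-hazard conversion, `1 - (1 - p) ** (12 / months)`, would give a somewhat higher annual figure for the 34% group, which is why the proportional form is assumed here.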
Affiliation(s)
- Yifan Cui
  - Center for Data Science, Zhejiang University, Hangzhou, Zhejiang, China
- Sikhulile Moyo
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Botswana Harvard Health Partnership, Gaborone, Botswana
  - Division of Medical Virology, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg, Cape Town, South Africa
- Molly Pretorius Holme
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Wonderful Choga
  - Botswana Harvard Health Partnership, Gaborone, Botswana
  - Division of Medical Virology, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg, Cape Town, South Africa
  - School of Allied Health Sciences, Faculty of Health Sciences, University of Botswana, Gaborone, Botswana
  - Centre for Epidemic Response and Innovation (CERI), School of Data Science and Computational Thinking, Stellenbosch University, Stellenbosch, South Africa
- James Emmanuel San
  - Department of Internal Medicine, University of Botswana, Gaborone, Botswana
  - Department of Statistics and Data Science
- Kutlo Manyake
  - Botswana Harvard Health Partnership, Gaborone, Botswana
- Ame Diphoko
  - Botswana Harvard Health Partnership, Gaborone, Botswana
- Tendani Gaolathe
  - Botswana Harvard Health Partnership, Gaborone, Botswana
  - Department of Internal Medicine, University of Botswana, Gaborone, Botswana
- M Essex
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Botswana Harvard Health Partnership, Gaborone, Botswana
- Eric Tchetgen Tchetgen
  - Department of Statistics and Data Science
  - Department of Biostatistics and Epidemiology, The Wharton School, University of Pennsylvania, Philadelphia, PA
- Joseph M Makhema
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Botswana Harvard Health Partnership, Gaborone, Botswana
- Shahin Lockman
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Botswana Harvard Health Partnership, Gaborone, Botswana
  - Division of Infectious Diseases, Brigham & Women's Hospital, Boston, MA, USA
2
Jaiteh M, Phalane E, Shiferaw YA, Phaswana-Mafuya RN. The Application of Machine Learning Algorithms to Predict HIV Testing in Repeated Adult Population-Based Surveys in South Africa: Protocol for a Multiwave Cross-Sectional Analysis. JMIR Res Protoc 2025; 14:e59916. [PMID: 39870368 DOI: 10.2196/59916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Revised: 08/28/2024] [Accepted: 10/18/2024] [Indexed: 01/29/2025] Open
Abstract
BACKGROUND HIV testing is the cornerstone of HIV prevention and a pivotal step in realizing the Joint United Nations Program on HIV/AIDS (UNAIDS) goal of ending AIDS by 2030. Despite the availability of relevant survey data, there exists a research gap in using machine learning (ML) to analyze and predict HIV testing among adults in South Africa. Further investigation is needed to bridge this knowledge gap and inform evidence-based interventions to improve HIV testing. OBJECTIVE This study aims to determine consistent predictors of HIV testing by applying supervised ML algorithms in repeated adult population-based surveys in South Africa. METHODS A retrospective analysis of multiwave cross-sectional survey data will be conducted to determine the predictors of HIV testing among South African adults aged 18 years and older. A supervised ML technique will be applied across the five cycles of the South African National HIV Prevalence, Incidence, Behavior, and Communication Survey (SABSSM) surveys. The Human Science Research Council (HSRC) conducted the SABSSM surveys in 2002, 2005, 2008, 2012, and 2017. The available SABSSM datasets will be imported to RStudio (version 4.3.2; Posit Software, PBC) to clean and remove outliers. A chi-square test will be conducted to select important predictors of HIV testing. Each dataset will be split into 80% training and 20% test samples. Logistic regression, support vector machines, random forests, and decision trees will be used. A cross-validation technique will be used to divide the training sample into k-folds, including a validation set, and models will be trained on each fold. The models' performance will be evaluated on the validation set using evaluation metrics such as accuracy, precision, recall, F1-score, area under curve-receiver operating characteristics, and confusion matrix. RESULTS The SABSSM datasets are open access datasets available on the HSRC database. 
Ethics approval for this study was obtained from the University of Johannesburg Research and Ethics Committee on April 23, 2024 (REC-2725-2024). The authors were given access to all five SABSSM datasets by the HSRC on August 20, 2024. The datasets were explored to identify the independent variables likely influencing HIV testing uptake. The findings of this study will determine consistent variables predicting HIV testing uptake among the South African adult population over the course of 20 years. Furthermore, this study will evaluate and compare the performance metrics of the 4 different ML algorithms, and the best model will be used to develop an HIV testing predictive model. CONCLUSIONS This study will contribute to existing knowledge and deepen understanding of factors linked to HIV testing beyond traditional methods. Consequently, the findings would inform evidence-based policy recommendations that can guide policy makers to formulate more effective and targeted public health approaches toward strengthening HIV testing. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/59916.
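The evaluation plan in this protocol (an 80/20 train/test split, k-fold cross-validation on the training sample, and confusion-matrix metrics) can be sketched as follows. This is a minimal standard-library illustration in Python, not the authors' R workflow; all function names are hypothetical, the labels are made up, and the area under the ROC curve is omitted for brevity.

```python
import random
from typing import Dict, List, Sequence, Tuple


def train_test_split_indices(n: int, test_fraction: float = 0.2,
                             seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and hold out `test_fraction` of them for testing."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(train_indices: Sequence[int],
                   k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Partition training indices into k folds; each fold is the validation set once."""
    folds = [list(train_indices[i::k]) for i in range(k)]
    splits = []
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, validation))
    return splits


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical example: 100 survey rows, 5-fold cross-validation.
train_idx, test_idx = train_test_split_indices(100)
splits = k_fold_indices(train_idx, k=5)
metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Keeping the 20% test sample out of the cross-validation loop means model selection on the folds does not leak into the final performance estimate.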
Affiliation(s)
- Musa Jaiteh
  - South African Medical Research Council/University of Johannesburg Pan African Centre for Epidemics Research Extramural Unit, Faculty of Health Sciences, University of Johannesburg, Johannesburg, South Africa
- Edith Phalane
  - South African Medical Research Council/University of Johannesburg Pan African Centre for Epidemics Research Extramural Unit, Faculty of Health Sciences, University of Johannesburg, Johannesburg, South Africa
- Yegnanew A Shiferaw
  - Department of Statistics, Faculty of Science, University of Johannesburg, Johannesburg, South Africa
- Refilwe Nancy Phaswana-Mafuya
  - South African Medical Research Council/University of Johannesburg Pan African Centre for Epidemics Research Extramural Unit, Faculty of Health Sciences, University of Johannesburg, Johannesburg, South Africa
3
Schnall R, Kempf MC, Phillips G, Dionne JA, Wingood G, Long DM, Klitzman R, Hughes TL, Liu J, Nassel AF, Corcoran JL, Johnson AK. Protocol: the American Women: Assessing Risk Epidemiologically (AWARE) cohort study. BMC Public Health 2024; 24:3422. [PMID: 39695485 DOI: 10.1186/s12889-024-20810-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2024] [Accepted: 11/20/2024] [Indexed: 12/20/2024] Open
Abstract
BACKGROUND While progress has been made in reducing HIV incidence rates among cisgender women, it continues to fall short of reaching the goal of ending the HIV epidemic with no new cases. OBJECTIVE This study aims to use innovative electronic methods (e.g., social media with community-informed advertisements) to recruit and retain a large (N = 1,800), diverse national sample of women at higher risk for HIV seroconversion who are 14 years of age and older to better understand the predictors of HIV-related sexual risk and HIV incidence within the context of a theoretically-grounded social-ecological framework. METHODS A US-based national longitudinal cohort study was launched among cisgender women with greater likelihood of HIV seroconversion. Participants complete a survey with items related to demographics, substance use, mental health symptoms, interpersonal violence and other social factors. Biospecimens include self-collected vaginal and rectal swabs, and blood in microtainers to test for HIV, syphilis, chlamydia, gonorrhea, and trichomoniasis every 6 months for 2 years. RESULTS Participant recruitment began in June 2023 and baseline enrollment is scheduled to finish in July 2025. DISCUSSION Innovative and culturally sensitive strategies to improve access to HIV prevention and treatment services for cisgender women are vital to curb the burden of the HIV epidemic for this key population. Findings from this study will inform future research, intervention strategies, and public policies.
Affiliation(s)
- Rebecca Schnall
  - School of Nursing, Columbia University, New York, NY, 10032, USA
  - Columbia University Mailman School of Public Health, New York, NY, 10032, USA
  - Disease Prevention and Health Promotion, Columbia University School of Nursing, 560 West 168th Street, New York, NY, 10032, USA
- Mirjam-Colette Kempf
  - School of Public Health, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
  - School of Nursing, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
  - Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- Gregory Phillips
  - Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Jodie A Dionne
  - Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- Gina Wingood
  - Columbia University Mailman School of Public Health, New York, NY, 10032, USA
- Dustin Marsh Long
  - Wake Forest University School of Medicine, Winston-Salem, NC, 27101, USA
- Robert Klitzman
  - Columbia University Mailman School of Public Health, New York, NY, 10032, USA
- Tonda L Hughes
  - School of Nursing, Columbia University, New York, NY, 10032, USA
- Jianfang Liu
  - School of Nursing, Columbia University, New York, NY, 10032, USA
- Ariann F Nassel
  - School of Public Health, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- Jessica Lee Corcoran
  - School of Nursing, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- Amy K Johnson
  - Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, 60611, USA
4
Yu E, Du J, Xiang Y, Hu X, Feng J, Luo X, Schneider JA, Zhi D, Fujimoto K, Tao C. Explainable artificial intelligence and domain adaptation for predicting HIV infection with graph neural networks. Ann Med 2024; 56:2407063. [PMID: 39417227 PMCID: PMC11488171 DOI: 10.1080/07853890.2024.2407063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 05/15/2024] [Accepted: 05/23/2024] [Indexed: 10/19/2024] Open
Abstract
OBJECTIVE To investigate explainable deep learning methods for graph neural networks that predict HIV infections from social network information, and to perform domain adaptation to evaluate model transferability across different datasets. METHODS Network data from two cohorts of younger sexual minority men (SMM) from two U.S. cities (Chicago, IL, and Houston, TX) were collected between 2014 and 2016. Feature importance from graph attention network (GAT) models was determined using GNNExplainer. Domain adaptation was performed to examine model transferability from one city dataset to the other: models were trained on 100% of the source dataset plus 30% of the target dataset and evaluated on the remaining 70% of the target dataset. RESULTS Domain adaptation showed the ability of GAT to improve prediction over training with single city datasets. Feature importance analysis with GAT models in single city training indicated similar features across different cities, reinforcing potential application of GAT models in predicting HIV infections through domain adaptation. CONCLUSION GAT models can be used to address the data sparsity issue in HIV study populations. They are powerful tools for predicting individual risk of HIV that can be further explored for better understanding of HIV transmission.
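The domain-adaptation split described in the METHODS (all of the source city plus 30% of the target city for training, and the remaining 70% of the target city for evaluation) can be sketched as below. This is a simplified illustration on flat records, assuming a random split; the function name and the example records are hypothetical, and splitting real network data would also have to account for edges between participants.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def domain_adaptation_split(source: Sequence[T], target: Sequence[T],
                            target_train_fraction: float = 0.3,
                            seed: int = 0) -> Tuple[List[T], List[T]]:
    """Return (train, test): train is all source records plus a random share
    of target records; test is the held-out remainder of the target records."""
    shuffled = list(target)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(len(shuffled) * target_train_fraction))
    train = list(source) + shuffled[:n_train]
    test = shuffled[n_train:]
    return train, test


# Hypothetical participant records, tagged by city.
chicago = [("chicago", i) for i in range(10)]
houston = [("houston", i) for i in range(10)]

# Chicago as the source domain, Houston as the target domain.
train, test = domain_adaptation_split(chicago, houston)
```

Swapping the two arguments gives the reverse direction (Houston as source, Chicago as target).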
Affiliation(s)
- Evan Yu
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Jingcheng Du
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Yang Xiang
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Xinyue Hu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, USA
- Jingna Feng
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, USA
- Xi Luo
  - School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- John A. Schneider
  - Departments of Medicine and Public Health Sciences, University of Chicago, Chicago, IL, USA
- Degui Zhi
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Kayo Fujimoto
  - School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Cui Tao
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, USA
5
Saldana CS, Burkhardt E, Pennisi A, Oliver K, Olmstead J, Holland DP, Gettings J, Mauck D, Austin D, Wortley P, Ochoa KVS. Development of a Machine Learning Modeling Tool for Predicting HIV Incidence Using Public Health Data From a County in the Southern United States. Clin Infect Dis 2024; 79:717-726. [PMID: 38393832 DOI: 10.1093/cid/ciae100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 01/29/2024] [Accepted: 02/21/2024] [Indexed: 02/25/2024] Open
Abstract
BACKGROUND Advancements in machine learning (ML) have improved the accuracy of models that predict human immunodeficiency virus (HIV) incidence. These models have used electronic medical records and registries. We aim to broaden the application of these tools by using deidentified public health datasets for notifiable sexually transmitted infections (STIs) from a southern US county known for high HIV incidence. The goal is to assess the feasibility and accuracy of ML in predicting HIV incidence, which could inform and enhance public health interventions. METHODS We analyzed 2 deidentified public health datasets from January 2010 to December 2021, focusing on notifiable STIs. Our process involved data processing and feature extraction, including sociodemographic factors, STI cases, and social vulnerability index (SVI) metrics. Various ML models were trained and evaluated for predicting HIV incidence using metrics such as accuracy, precision, recall, and F1 score. RESULTS We included 85 224 individuals; 2027 (2.37%) were newly diagnosed with HIV during the study period. The ML models demonstrated high performance in predicting HIV incidence among males and females. Influential features for males included age at STI diagnosis, previous STI information, provider type, and SVI. For females, predictive features included age, ethnicity, previous STI information, overall SVI, and race. CONCLUSIONS The high accuracy of our ML models in predicting HIV incidence highlights the potential of using public health datasets for public health interventions such as tailored HIV testing and prevention. While these findings are promising, further research is needed to translate these models into practical public health applications.
Affiliation(s)
- Carlos S Saldana
  - Division of Infectious Diseases, Department of Medicine, Emory University School of Medicine, Atlanta, Georgia, USA
- Elizabeth Burkhardt
  - Epidemiology Division, Georgia Department of Public Health, Atlanta, Georgia, USA
- Alfred Pennisi
  - Epidemiology Division, Georgia Department of Public Health, Atlanta, Georgia, USA
- Kirsten Oliver
  - Epidemiology Division, Georgia Department of Public Health, Atlanta, Georgia, USA
- John Olmstead
  - Epidemiology Division, Georgia Department of Public Health, Atlanta, Georgia, USA
- David P Holland
  - Division of Primary Care, Mercy Care Health Systems, Atlanta, Georgia, USA
  - Fulton County Board of Health, Communicable Disease Prevention Branch, Atlanta, Georgia, USA
- Jenna Gettings
  - Epidemiology Division, Georgia Department of Public Health, Atlanta, Georgia, USA
- Daniel Mauck
  - Epidemiology Division, Georgia Department of Public Health, Atlanta, Georgia, USA
- David Austin
  - Epidemiology Division, Georgia Department of Public Health, Atlanta, Georgia, USA
- Pascale Wortley
  - Epidemiology Division, Georgia Department of Public Health, Atlanta, Georgia, USA
- Karla V Saldana Ochoa
  - School of Architecture, College of Design, Construction, and Planning, University of Florida, Gainesville, Florida, USA
6
Alie MS, Negesse Y. Machine learning prediction of adolescent HIV testing services in Ethiopia. Front Public Health 2024; 12:1341279. [PMID: 38560439 PMCID: PMC10981275 DOI: 10.3389/fpubh.2024.1341279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 03/04/2024] [Indexed: 04/04/2024] Open
Abstract
Background Despite endeavors to achieve the Joint United Nations Programme on HIV/AIDS 95-95-95 fast track targets established in 2014 for HIV prevention, progress has fallen short. Hence, it is imperative to identify factors that can serve as predictors of an adolescent's HIV status. This identification would enable the implementation of targeted screening interventions and the enhancement of healthcare services. Our primary objective was to identify these predictors to facilitate the improvement of HIV testing services for adolescents in Ethiopia. Methods A study was conducted by utilizing eight different machine learning techniques to develop models using demographic and health data from 4,502 adolescent respondents. The dataset consisted of 31 variables and variable selection was done using different selection methods. To train and validate the models, the data was randomly split into 80% for training and validation, and 20% for testing. The algorithms were evaluated, and the one with the highest accuracy and mean f1 score was selected for further training using the most predictive variables. Results The J48 decision tree algorithm has proven to be remarkably successful in accurately detecting HIV positivity, outperforming seven other algorithms with an impressive accuracy rate of 81.29% and a Receiver Operating Characteristic (ROC) curve of 86.3%. The algorithm owes its success to its remarkable capability to identify crucial predictor features, with the top five being age, knowledge of HIV testing locations, age at first sexual encounter, recent sexual activity, and exposure to family planning. Interestingly, the model's performance witnessed a significant improvement when utilizing only twenty variables as opposed to including all variables. Conclusion Our research findings indicate that the J48 decision tree algorithm, when combined with demographic and health-related data, is a highly effective tool for identifying potential predictors of HIV testing. 
This approach allows us to accurately predict which adolescents are at a high risk of infection, enabling the implementation of targeted screening strategies for early detection and intervention. To improve the testing status of adolescents in the country, we recommend considering demographic factors such as age, age at first sexual encounter, exposure to family planning, recent sexual activity, and other identified predictors.
Affiliation(s)
- Melsew Setegn Alie
  - Department of Public Health, School of Public Health, College of Medicine and Health Science, Mizan-Tepi University, Mizan-Aman, Ethiopia
- Yilkal Negesse
  - Department of Public Health, College of Medicine and Health Science, Debre-Markos University, Gojjam, Ethiopia
7
Tang Z, Van Nguyen TP, Yang W, Xia X, Chen H, Mullens AB, Dean JA, Osborne SR, Li Y. High security and privacy protection model for STI/HIV risk prediction. Digit Health 2024; 10:20552076241298425. [PMID: 39574801 PMCID: PMC11580078 DOI: 10.1177/20552076241298425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Accepted: 10/07/2024] [Indexed: 11/24/2024] Open
Abstract
Introduction Applying and leveraging artificial intelligence within the healthcare domain has emerged as a fundamental pursuit to advance health. Data-driven models rooted in deep learning have become powerful tools for use in healthcare informatics. Nevertheless, healthcare data are highly sensitive and must be safeguarded, particularly information related to sexually transmissible infections (STIs) and human immunodeficiency virus (HIV). Methods We employed federated learning (FL) in combination with homomorphic encryption (HE) for STI/HIV prediction to train deep learning models on decentralized data while upholding rigorous privacy. The dataset included 168,459 data entries collected from eight countries between 2013 and 2018. The data for each country was split into two groups, with 70% allocated for training and 30% for testing. Our strategy was based on two-step aggregation to enhance model performance and leverage the area under the curve (AUC) and accuracy metrics and involved a secondary aggregation at the local level before utilizing the global model for each client. We introduced a dropout approach as an effective client-side solution to mitigate computational costs. Results Model performance was progressively enhanced from an AUC of 0.78 and an accuracy of 74.4% using the local model to an AUC of 0.94 and an accuracy of 90.7% using the more advanced model. Conclusion Our proposed model for STI/HIV risk prediction surpasses those achieved by local models and those constructed from centralized data sources, highlighting the potential of our approach to improve healthcare outcomes while safeguarding sensitive patient information.
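The core aggregation step of federated learning can be illustrated with plain federated averaging, in which a server combines client model parameters weighted by each client's sample count. The sketch below assumes that standard scheme; it does not reproduce the authors' two-step aggregation, homomorphic encryption, or dropout approach, and the function name and numbers are hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average per-client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical two-parameter models from two clients (e.g. two countries).
clients = [[0.0, 1.0], [1.0, 3.0]]
sizes = [100, 300]
global_weights = federated_average(clients, sizes)
print(global_weights)  # [0.75, 2.5]
```

Only parameter vectors leave each client in this scheme, which is what allows training on decentralized data without pooling the raw records.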
Affiliation(s)
- Zhaohui Tang
  - School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba Campus, QLD, Australia
- Thi Phuoc Van Nguyen
  - School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba Campus, QLD, Australia
  - Department of Information Technology, Thanh Do University, Hanoi, Vietnam
- Wencheng Yang
  - School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba Campus, QLD, Australia
- Xiaoyu Xia
  - School of Computing Technologies, RMIT University, Melbourne, VIC, Australia
- Huaming Chen
  - School of Electrical and Computer Engineering, The University of Sydney, Darlington, NSW, Australia
- Amy B. Mullens
  - School of Psychology and Wellbeing, Centre for Health Research, Institute for Resilient Regions, University of Southern Queensland, Ipswich, QLD, Australia
- Judith A. Dean
  - School of Public Health, Faculty of Medicine, The University of Queensland, Herston Campus, QLD, Australia
- Sonya R Osborne
  - School of Nursing and Midwifery, Centre for Health Research, Institute for Resilient Regions, University of Southern Queensland, Ipswich, QLD, Australia
- Yan Li
  - School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba Campus, QLD, Australia
8
Nadarzynski T, Lunt A, Knights N, Bayley J, Llewellyn C. "But can chatbots understand sex?" Attitudes towards artificial intelligence chatbots amongst sexual and reproductive health professionals: An exploratory mixed-methods study. Int J STD AIDS 2023; 34:809-816. [PMID: 37269292 PMCID: PMC10561522 DOI: 10.1177/09564624231180777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Accepted: 05/22/2023] [Indexed: 06/05/2023]
Abstract
BACKGROUND Artificial Intelligence (AI)-enabled chatbots can offer anonymous education about sexual and reproductive health (SRH). Understanding chatbot acceptability and feasibility allows the identification of barriers to the design and implementation. METHODS In 2020, we conducted an online survey and qualitative interviews with SRH professionals recruited online to explore the views on AI, automation and chatbots. Qualitative data were analysed thematically. RESULTS Amongst 150 respondents (48% specialist doctor/consultant), only 22% perceived chatbots as effective and 24% saw them as ineffective for SRH advice [Mean = 2.91, SD = 0.98, range: 1-5]. Overall, there were mixed attitudes towards SRH chatbots [Mean = 4.03, SD = 0.87, range: 1-7]. Chatbots were most acceptable for appointment booking, general sexual health advice and signposting, but not acceptable for safeguarding, virtual diagnosis, and emotional support. Three themes were identified: "Moving towards a 'digital' age'", "AI improving access and service efficacy", and "Hesitancy towards AI". CONCLUSIONS Half of SRH professionals were hesitant about the use of chatbots in SRH services, attributed to concerns about patient safety, and lack of familiarity with this technology. Future studies should explore the role of AI chatbots as supplementary tools for SRH promotion. Chatbot designers need to address the concerns of health professionals to increase acceptability and engagement with AI-enabled services.
Affiliation(s)
- Alexandria Lunt
  - Brighton and Sussex Medical School, University of Sussex, Brighton
- Carrie Llewellyn
  - Brighton and Sussex Medical School, University of Sussex, Brighton
9
Mutai CK, McSharry PE, Ngaruye I, Musabanganji E. Use of unsupervised machine learning to characterise HIV predictors in sub-Saharan Africa. BMC Infect Dis 2023; 23:482. [PMID: 37468851 DOI: 10.1186/s12879-023-08467-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 07/17/2023] [Indexed: 07/21/2023] Open
Abstract
INTRODUCTION Significant regional variations in the HIV epidemic hinder effective common interventions in sub-Saharan Africa. It is crucial to analyze HIV positivity distributions within clusters and assess the homogeneity of countries. We aim to identify clusters of countries based on socio-behavioural predictors of HIV for screening. METHOD We used an unsupervised machine learning approach, agglomerative hierarchical clustering, to analyse data for 146,733 male and 155,622 female respondents from 13 sub-Saharan African countries with 20 and 26 features, respectively, using Population-based HIV Impact Assessment (PHIA) data from the survey years 2015-2019. We used the optimal silhouette index criterion to identify clusters of countries based on the similarity of socio-behavioural characteristics. We analysed the distribution of HIV positivity with socio-behavioural predictors of HIV within each cluster. RESULTS Two principal components were obtained, with the first explaining 62.3% and 70.1% and the second explaining 18.3% and 20.6% of the total socio-behavioural variance in females and males, respectively. Two clusters per sex were identified, and the most predictive features in both sexes were: relationship with family head, enrolled in school, circumcision status for males, delayed pregnancy, work for payment in last 12 months, urban area indicator, and known HIV status. The HIV positivity distribution with these variables was significant within each cluster. CONCLUSIONS/FINDINGS The findings suggest that unsupervised machine learning approaches can be used to identify clusters of countries based on their underlying socio-behavioural characteristics.
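The silhouette index used to choose the number of clusters compares, for each point, its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), scoring (b - a) / max(a, b). The sketch below is a minimal standard-library illustration assuming Euclidean distance; the function name and the toy one-dimensional points are hypothetical and are not the PHIA data.

```python
import math
from typing import Dict, List, Sequence


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two well-separated groups on a line.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(points, [0, 0, 1, 1])  # natural grouping, close to 1
bad = silhouette_score(points, [0, 1, 0, 1])   # mixed grouping, negative
```

In practice the score would be computed for each candidate number of clusters produced by the hierarchical clustering, and the number with the highest mean silhouette retained.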
Affiliation(s)
- Charles K Mutai
  - African Center of Excellence in Data Science, University of Rwanda, Kigali, BP 4285, Rwanda
  - Department of Mathematics, Physics and Computing, Moi University, Eldoret, Kenya
- Patrick E McSharry
  - African Center of Excellence in Data Science, University of Rwanda, Kigali, BP 4285, Rwanda
  - College of Engineering, Carnegie Mellon University Africa, Kigali, BP 6150, Rwanda
  - Oxford-Man Institute of Quantitative Finance, Oxford University, Oxford, OX2 6ED, UK
- Innocent Ngaruye
  - College of Science and Technology, University of Rwanda, Kigali, Rwanda
10
|
Friedman EE, Shankaran S, Devlin SA, Kishen EB, Mason JA, Sha BE, Ridgway JP. Development of a predictive model for identifying women vulnerable to HIV in Chicago. BMC Womens Health 2023; 23:313. [PMID: 37328764 PMCID: PMC10276380 DOI: 10.1186/s12905-023-02460-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2022] [Accepted: 06/03/2023] [Indexed: 06/18/2023] Open
Abstract
INTRODUCTION Researchers in the United States have created several models to predict persons most at risk for HIV. Many of these predictive models use data from all persons newly diagnosed with HIV, the majority of whom are men, and specifically men who have sex with men (MSM). Consequently, risk factors identified by these models are biased toward features that apply only to men or capture sexual behaviours of MSM. We sought to create a predictive model for women using cohort data from two major hospitals in Chicago with large opt-out HIV screening programs. METHODS We matched 48 newly diagnosed women to 192 HIV-negative women based on number of previous encounters at University of Chicago or Rush University hospitals. We examined data for each woman for the two years prior to either their HIV diagnosis or their last encounter. We assessed risk factors including demographic characteristics and clinical diagnoses taken from patient electronic medical records (EMR) using odds ratios and 95% confidence intervals. We created a multivariable logistic regression model and measured predictive power with the area under the curve (AUC). In the multivariable model, age group, race, and ethnicity were included a priori due to increased risk for HIV among specific demographic groups. RESULTS The following clinical diagnoses were significant at the bivariate level and were included in the model: pregnancy (OR 1.96 (1.00, 3.84)), hepatitis C (OR 5.73 (1.24, 26.51)), substance use (OR 3.12 (1.12, 8.65)) and sexually transmitted infections (STIs) chlamydia, gonorrhoea, or syphilis. We also a priori included demographic factors that are associated with HIV. Our final model had an AUC of 0.74 and included healthcare site, age group, race, ethnicity, pregnancy, hepatitis C, substance use, and STI diagnosis. CONCLUSIONS Our predictive model showed acceptable discrimination between those who were and were not newly diagnosed with HIV. 
We identified risk factors such as recent pregnancy, recent hepatitis C diagnosis, and substance use in addition to the traditionally used recent STI diagnosis that can be incorporated by health systems to detect women who are vulnerable to HIV and would benefit from preexposure prophylaxis (PrEP).
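The odds ratios and 95% confidence intervals quoted in this abstract follow a standard form that can be sketched as below. This is an illustration only: it uses the unadjusted 2x2 odds ratio with a Woolf (log-scale) interval, which may differ from the estimator the authors applied to their matched data. The group sizes (48 cases, 192 controls) come from the abstract, while the exposure counts are hypothetical.

```python
from math import exp, log, sqrt

def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval."""
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical exposure counts: 12 of 48 cases and 20 of 192 controls exposed.
or_value, ci_low, ci_high = odds_ratio_ci(12, 36, 20, 172)
```

An interval whose lower bound sits at or near 1.00, as reported for pregnancy, marks an association at the edge of conventional significance.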
Affiliation(s)
- Eleanor E. Friedman
  - Department of Medicine, University of Chicago, 5841 S. Maryland Ave, MC 5065, Chicago, IL 60637 USA
- Samantha A. Devlin
  - Department of Medicine, University of Chicago, 5841 S. Maryland Ave, MC 5065, Chicago, IL 60637 USA
- Joseph A. Mason
  - Department of Medicine, University of Chicago, 5841 S. Maryland Ave, MC 5065, Chicago, IL 60637 USA
- Jessica P. Ridgway
  - Department of Medicine, University of Chicago, 5841 S. Maryland Ave, MC 5065, Chicago, IL 60637 USA
|
11
|
Birri Makota RB, Musenge E. Predicting HIV infection in the decade (2005-2015) pre-COVID-19 in Zimbabwe: A supervised classification-based machine learning approach. PLOS DIGITAL HEALTH 2023; 2:e0000260. [PMID: 37285368 DOI: 10.1371/journal.pdig.0000260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/23/2022] [Accepted: 04/24/2023] [Indexed: 06/09/2023]
Abstract
The burden of HIV and related diseases has been an area of great concern before and after the emergence of COVID-19 in Zimbabwe. Machine learning models have been used to accurately predict the risk of diseases, including HIV. Therefore, this paper aimed to determine common risk factors of HIV positivity in Zimbabwe in the decade from 2005 to 2015. The data were from three two-stage, five-yearly population surveys conducted between 2005 and 2015. The outcome variable was HIV status. The prediction model was fit by adopting 80% of the data for learning/training and 20% for testing/prediction. Resampling was done using the stratified 5-fold cross-validation procedure repeatedly. Feature selection was done using Lasso regression, and the best combination of selected features was determined using Sequential Forward Floating Selection. We compared six algorithms in both sexes based on the F1 score, which is the harmonic mean of precision and recall. The overall HIV prevalence for the combined dataset was 22.5% and 15.3% for females and males, respectively. The best-performing algorithm to identify individuals with a higher likelihood of HIV infection was XGBoost, with a high F1 score of 91.4% for males and 90.1% for females based on the combined surveys. The results from the prediction model identified six common features associated with HIV, with total number of lifetime sexual partners and cohabitation duration being the most influential variables for females and males, respectively. In addition to other risk reduction techniques, machine learning may aid in identifying those who might require pre-exposure prophylaxis, particularly women who experience intimate partner violence. Furthermore, compared to traditional statistical approaches, machine learning uncovered patterns in predicting HIV infection with comparatively reduced uncertainty and is therefore crucial for effective decision-making.
Affiliation(s)
- Rutendo Beauty Birri Makota
  - Division of Epidemiology and Biostatistics, School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Eustasius Musenge
  - Division of Epidemiology and Biostatistics, School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
|
12
|
Ekundayo TC, Ijabadeniyi OA, Igbinosa EO, Okoh AI. Using machine learning models to predict the effects of seasonal fluxes on Plesiomonas shigelloides population density. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2023; 317:120734. [PMID: 36455774 DOI: 10.1016/j.envpol.2022.120734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/07/2022] [Revised: 11/21/2022] [Accepted: 11/22/2022] [Indexed: 06/17/2023]
Abstract
Seasonal variations (SVs) affect the population density (PD), fate, and fitness of pathogens in environmental water resources and the public health impacts. Therefore, this study is aimed at applying machine learning intelligence (MLI) to predict the impacts of SVs on P. shigelloides population density (PDP) in the aquatic milieu. Physicochemical events (PEs) and PDP from three rivers acquired via standard microbiological and instrumental techniques across seasons were fitted to MLI algorithms (linear regression (LR), multiple linear regression (MR), random forest (RF), gradient boosted machine (GBM), neural network (NN), K-nearest neighbour (KNN), boosted regression tree (BRT), extreme gradient boosting (XGB) regression, support vector regression (SVR), decision tree regression (DTR), M5 pruned regression (M5P), artificial neural network (ANN) regression (with one 10-node hidden layer (ANN10), two 6- and 4-node hidden layers (ANN64), and two 5- and 5-node hidden layers (ANN55)), and elastic net regression (ENR)) to assess the implications of the SVs of PEs on aquatic PDP. The results showed that SVs significantly influenced PDP and PEs in the water (p < 0.0001), exhibiting a site-specific pattern. While MLI algorithms predicted PDP with differing absolute flux magnitudes for the contributing variables, DTR predicted the highest PDP value of 1.707 log unit, followed by XGB (1.637 log unit), but XGB (mean-squared-error (MSE) = 0.0025; root-mean-squared-error (RMSE) = 0.0501; R² = 0.998; medium absolute deviation (MAD) = 0.0275) outperformed other models in terms of regression metrics. Temperature and total suspended solids (TSS) ranked first and second as significant factors in predicting PDP in 53.3% (8/15) and 40% (6/15), respectively, of the models, based on the RMSE loss after permutations.
Additionally, season ranked third among the 7 models, and turbidity (TBS) ranked fourth at 26.7% (4/15), as the primary significant factor for predicting PDP in the aquatic milieu. The results of this investigation demonstrated that MLI predictive modelling techniques can promisingly be exploited to complement the repetitive laboratory-based monitoring of PDP and other pathogens, especially in low-resource settings, in response to seasonal fluxes and can provide insights into the potential public health risks of emerging pathogens and TSS pollution (e.g., nanoparticles and micro- and nanoplastics) in the aquatic milieu. The model outputs provide low-cost and effective early warning information to assist watershed managers and fish farmers in making appropriate decisions about water resource protection, aquaculture management, and sustainable public health protection.
Affiliation(s)
- Temitope C Ekundayo
  - SAMRC Microbial Water Quality Monitoring Centre, University of Fort Hare, Alice, Eastern Cape, South Africa; Department of Biotechnology and Food Science, Durban University of Technology, Steve Biko Campus, Steve Biko Rd, Musgrave, Berea, 4001, Durban, South Africa; Department of Microbiology, University of Medical Sciences, Ondo City, Ondo State, Nigeria
- Oluwatosin A Ijabadeniyi
  - Department of Biotechnology and Food Science, Durban University of Technology, Steve Biko Campus, Steve Biko Rd, Musgrave, Berea, 4001, Durban, South Africa
- Etinosa O Igbinosa
  - SAMRC Microbial Water Quality Monitoring Centre, University of Fort Hare, Alice, Eastern Cape, South Africa; Department of Microbiology, Faculty of Life Sciences, University of Benin, Private Mail Bag 1154, Benin City, 300283, Nigeria
- Anthony I Okoh
  - SAMRC Microbial Water Quality Monitoring Centre, University of Fort Hare, Alice, Eastern Cape, South Africa; Department of Environmental Health Sciences, College of Health Sciences, University of Sharjah, Sharjah, P.O. Box 27272, United Arab Emirates
|
13
|
Majam M, Segal B, Fieggen J, Smith E, Hermans L, Singh L, Phatsoane M, Arora L, Lalla-Edward S. Utility of a machine-guided tool for assessing risk behaviour associated with contracting HIV in three sites in South Africa. INFORMATICS IN MEDICINE UNLOCKED 2023; 37:101192. [PMID: 36911795 PMCID: PMC9993399 DOI: 10.1016/j.imu.2023.101192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2022] [Revised: 02/03/2023] [Accepted: 02/04/2023] [Indexed: 02/11/2023] Open
Abstract
Introduction Digital data collection and the associated mobile health technologies have allowed for the recent exploration of artificial intelligence as a tool for combatting the HIV epidemic. Machine learning has been found to be useful both in HIV risk prediction and as a decision support tool for guiding pre-exposure prophylaxis (PrEP) treatment. This paper reports data from two sequential studies evaluating the viability of using machine learning to predict the susceptibility of adults to HIV infection using responses from a digital survey deployed in a high burden, low-resource setting. Methods 1036 and 593 participants were recruited across two trials. The first trial was a cross-sectional study in one location and the second trial was a cohort study across three trial sites. The data from the studies were merged, partitioned using standard techniques, and then used to train and evaluate multiple different machine learning models and select and evaluate a final model. Variable importance estimates were calculated using the PIMP and SHAP methodologies. Results Characteristics associated with HIV were consistent across both studies. Overall, HIV positive patients had a higher median age (34 [IQR: 29-39] vs 26 [IQR 22-33], p < 0.001), and were more likely to be female (155/703 [22%] vs 107/927 [12%], p < 0.001). HIV positive participants also had more commonly gone a year or more since their last HIV test (183/262 [70%] vs 540/1368 [39%], p < 0.001) and were less likely to report consistent condom usage (113/262 [43%] vs 758/1368 [55%], p < 0.001). Patients who reported TB symptoms were more likely to be HIV positive. The trained models had accuracy values (AUROCs) ranging from 78.5% to 82.8%. A boosted tree model performed best with a sensitivity of 84% (95% CI 72-92), specificity of 71% (95% CI 67-76), and a negative predictive value of 95% (95% CI 93-96) in a hold-out dataset. 
Age, duration since last HIV test, and number of male sexual partners were consistently three of the four most important variables across both variable importance estimates. Conclusions This study has highlighted the synergies present between mobile health and machine learning in HIV. It has been demonstrated that a viable ML model can be built using digital survey data from a low-middle income setting with potential utility in directing health resources.
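The relationship between the reported sensitivity, specificity and negative predictive value can be sketched with Bayes' rule. The hold-out prevalence is not stated in the abstract, so the sketch below assumes it equals the overall share of HIV-positive participants (262 of 1630, taken from the counts quoted above); the result is therefore only an approximate cross-check, not a reproduction of the authors' calculation.

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """NPV from sensitivity, specificity and prevalence via Bayes' rule."""
    true_negative_rate = specificity * (1 - prevalence)
    false_negative_rate = (1 - sensitivity) * prevalence
    return true_negative_rate / (true_negative_rate + false_negative_rate)

# Assumption: hold-out prevalence equals the overall prevalence, 262 / (262 + 1368).
assumed_prevalence = 262 / 1630
npv = negative_predictive_value(0.84, 0.71, assumed_prevalence)
```

Under this assumption the value lands close to the reported 95%, which shows how a moderate specificity can still give a high negative predictive value when prevalence is low.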
Affiliation(s)
- M. Majam
  - Ezintsha, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- B. Segal
  - Phithos Technologies, Johannesburg, South Africa
- J. Fieggen
  - Phithos Technologies, Johannesburg, South Africa
- Eli Smith
  - Phithos Technologies, Johannesburg, South Africa
- L. Hermans
  - Ezintsha, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
  - Department of Microbiology, University Medical Center Utrecht, Utrecht, the Netherlands
  - Infectious Diseases Unit, Department of Medicine, University of Cape Town, Cape Town, South Africa
- L. Singh
  - Ezintsha, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- M. Phatsoane
  - Ezintsha, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- L. Arora
  - Phithos Technologies, Johannesburg, South Africa
- S.T. Lalla-Edward
  - Ezintsha, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
|
14
|
Fieggen J, Smith E, Arora L, Segal B. The role of machine learning in HIV risk prediction. FRONTIERS IN REPRODUCTIVE HEALTH 2022; 4:1062387. [PMID: 36619681 PMCID: PMC9815547 DOI: 10.3389/frph.2022.1062387] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2022] [Accepted: 12/05/2022] [Indexed: 12/24/2022] Open
Abstract
Despite advances in reducing HIV-related mortality, persistently high HIV incidence rates are undermining global efforts to end the epidemic by 2030. The UNAIDS Fast-track targets as well as other preventative strategies, such as pre-exposure prophylaxis, have been identified as priority areas to reduce the ongoing transmission threatening to undermine recent progress. Accurate and granular risk prediction is critical for these campaigns but is often lacking in regions where the burden is highest. Owing to their ability to capture complex interactions between data, machine learning and artificial intelligence algorithms have proven effective at predicting the risk of HIV infection in both high resource and low resource settings. However, interpretability of these algorithms presents a challenge to the understanding and adoption of these algorithms. In this perspectives article, we provide an introduction to machine learning and discuss some of the important considerations when choosing the variables used in model development and when evaluating the performance of different machine learning algorithms, as well as the role emerging tools such as Shapely Additive Explanations may play in helping understand and decompose these models in the context of HIV. Finally, we discuss some of the potential public health and clinical use cases for such decomposed risk assessment models in directing testing and preventative interventions including pre-exposure prophylaxis, as well as highlight the potential integration synergies with algorithms that predict the risk of sexually transmitted infections and tuberculosis.
Affiliation(s)
- Joshua Fieggen
  - School of Public Health and Family Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa; Phithos Technologies, Johannesburg, South Africa; Correspondence: Joshua Fieggen
- Eli Smith
  - Phithos Technologies, Johannesburg, South Africa
- Bradley Segal
  - Phithos Technologies, Johannesburg, South Africa; Department of Biomedical Engineering, University of the Witwatersrand, Johannesburg, South Africa
|
15
|
Predicting HIV Status among Men Who Have Sex with Men in Bulawayo & Harare, Zimbabwe Using Bio-Behavioural Data, Recurrent Neural Networks, and Machine Learning Techniques. Trop Med Infect Dis 2022; 7:tropicalmed7090231. [PMID: 36136641 PMCID: PMC9506312 DOI: 10.3390/tropicalmed7090231] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2022] [Revised: 08/31/2022] [Accepted: 09/02/2022] [Indexed: 11/16/2022] Open
Abstract
HIV and AIDS continue to be major public health concerns globally. Despite significant progress in addressing their impact on the general population and achieving epidemic control, there is a need to improve HIV testing, particularly among men who have sex with men (MSM). This study applied deep and machine learning algorithms such as recurrent neural networks (RNNs), the bagging classifier, gradient boosting classifier, support vector machines, and Naïve Bayes classifier to predict HIV status among MSM using the dataset from the Zimbabwe Ministry of Health and Child Care. RNNs performed better than the bagging classifier, gradient boosting classifier, support vector machines, and Gaussian Naïve Bayes classifier in predicting HIV status. RNNs recorded a high prediction accuracy of 0.98 as compared to the Gaussian Naïve Bayes classifier (0.84), bagging classifier (0.91), support vector machine (0.91), and gradient boosting classifier (0.91). In addition, RNNs achieved a high precision of 0.98 for predicting both HIV-positive and -negative cases, a recall of 1.00 for HIV-negative cases and 0.94 for HIV-positive cases, and an F1-score of 0.99 for HIV-negative cases and 0.96 for positive cases. HIV status prediction models can significantly improve early HIV screening and assist healthcare professionals in effectively providing healthcare services to the MSM community. The results show that integrating HIV status prediction models into clinical software systems can complement indicator condition-guided HIV testing strategies and identify individuals that may require healthcare services, particularly for hard-to-reach vulnerable populations like MSM. Future studies are necessary to optimize machine learning models further to integrate them into primary care. The significance of this manuscript is that it presents results from a study population where very little information is available in Zimbabwe due to the criminalization of MSM activities in the country. 
For this reason, MSM tend to be a hidden sector of the population, frequently harassed and arrested. In almost all communities in Zimbabwe, MSM issues have remained taboo, and stigma exists in all sectors of society.
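The F1-scores reported for the RNN in this abstract can be cross-checked against its precision and recall figures, since F1 is the harmonic mean of the two. The sketch below uses only the values quoted in the abstract; the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision 0.98 for both classes; recall 1.00 (HIV-negative), 0.94 (HIV-positive).
f1_negative = f1_score(0.98, 1.00)
f1_positive = f1_score(0.98, 0.94)
```

Rounded to two decimals, these give 0.99 and 0.96, matching the reported F1-scores.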
|
16
|
Tran V, Saad T, Tesfaye M, Walelign S, Wordofa M, Abera D, Desta K, Tsegaye A, Ay A, Taye B. Helicobacter pylori (H. pylori) risk factor analysis and prevalence prediction: a machine learning-based approach. BMC Infect Dis 2022; 22:655. [PMID: 35902812 PMCID: PMC9330977 DOI: 10.1186/s12879-022-07625-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2021] [Accepted: 07/18/2022] [Indexed: 12/03/2022] Open
Abstract
Background Although previous epidemiological studies have examined the potential risk factors that increase the likelihood of acquiring Helicobacter pylori infections, most of these analyses have utilized conventional statistical models, including logistic regression, and have not benefited from advanced machine learning techniques. Objective We examined H. pylori infection risk factors among school children using machine learning algorithms to identify important risk factors as well as to determine whether machine learning can be used to predict H. pylori infection status. Methods We applied feature selection and classification algorithms to data from a school-based cross-sectional survey in Ethiopia. The data set included 954 school children with 27 sociodemographic and lifestyle variables. We conducted five runs of tenfold cross-validation on the data. We combined the results of these runs for each combination of feature selection (e.g., Information Gain) and classification (e.g., Support Vector Machines) algorithms. Results The XGBoost classifier had the highest accuracy in predicting H. pylori infection status with an accuracy of 77%, a 13 percentage-point improvement over the baseline accuracy of guessing the most frequent class (64% of the samples were H. pylori negative). K-Nearest Neighbors showed the worst performance across all classifiers. A similar performance was observed using the F1-score and area under the receiver operating characteristic curve (AUROC) classifier evaluation metrics. Among all features, place of residence (with urban residence increasing risk) was the most common risk factor for H. pylori infection, regardless of the feature selection method choice. Additionally, our machine learning algorithms identified other important risk factors for H. pylori infection, such as electricity usage in the home, toilet type, and waste disposal location.
Using a 75% cutoff for robustness, machine learning identified five of the eight significant features found by traditional multivariate logistic regression. However, when a lower robustness threshold is used, machine learning approaches identified more H. pylori risk factors than multivariate logistic regression and suggested risk factors not detected by logistic regression. Conclusion This study provides evidence that machine learning approaches are positioned to uncover H. pylori infection risk factors and predict H. pylori infection status. These approaches identify similar risk factors and predict infection with comparable accuracy to logistic regression, thus they could be used as an alternative method. Supplementary Information The online version contains supplementary material available at 10.1186/s12879-022-07625-7.
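The baseline comparison in this abstract (always guessing the most frequent class) can be sketched as follows. The 100-label list is a hypothetical stand-in that mirrors the reported 64% share of H. pylori negative samples; it is not the study data, and the names are illustrative.

```python
from collections import Counter

def majority_class_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# Hypothetical labels mirroring the reported class balance (64% negative).
labels = ["negative"] * 64 + ["positive"] * 36
baseline = majority_class_accuracy(labels)

# Gain of the reported 77% classifier accuracy over the baseline, in percentage points.
gain_points = round((0.77 - baseline) * 100)
```

Reporting the gain over this baseline, rather than raw accuracy alone, makes clear how much of the 77% is attributable to the model instead of class imbalance.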
Affiliation(s)
- Van Tran
  - Department of Mathematics, Colgate University, 13 Oak Dr., Hamilton, NY, USA
- Tazmilur Saad
  - Department of Mathematics, Colgate University, 13 Oak Dr., Hamilton, NY, USA
- Mehret Tesfaye
  - College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa University, Addis Ababa, Ethiopia
- Sosina Walelign
  - College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa University, Addis Ababa, Ethiopia
- Moges Wordofa
  - College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa University, Addis Ababa, Ethiopia
- Dessie Abera
  - College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa University, Addis Ababa, Ethiopia
- Kassu Desta
  - College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa University, Addis Ababa, Ethiopia
- Aster Tsegaye
  - College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa University, Addis Ababa, Ethiopia
- Ahmet Ay
  - Department of Mathematics, Colgate University, 13 Oak Dr., Hamilton, NY, USA; Department of Biology, Colgate University, 13 Oak Dr., Hamilton, NY, USA
- Bineyam Taye
  - Department of Biology, Colgate University, 13 Oak Dr., Hamilton, NY, USA
|
17
|
Linear and Machine Learning modelling for spatiotemporal disease predictions: Force-of-Infection of Chagas disease. PLoS Negl Trop Dis 2022; 16:e0010594. [PMID: 35853042 PMCID: PMC9337653 DOI: 10.1371/journal.pntd.0010594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2022] [Revised: 07/29/2022] [Accepted: 06/18/2022] [Indexed: 12/02/2022] Open
Abstract
Background Chagas disease is a long-lasting disease with a prolonged asymptomatic period. Cumulative indices of infection such as prevalence do not shed light on the current epidemiological situation, as they integrate infection over long periods. Instead, metrics such as the Force-of-Infection (FoI) provide information about the rate at which susceptible people become infected and permit sharper inference about temporal changes in infection rates. FoI is estimated by fitting (catalytic) models to available age-stratified serological (ground-truth) data. Predictive FoI modelling frameworks are then used to understand spatial and temporal trends indicative of heterogeneity in transmission and changes effected by control interventions. Ideally, these frameworks should be able to propagate uncertainty and handle spatiotemporal issues. Methodology/principal findings We compare three methods in their ability to propagate uncertainty and provide reliable estimates of FoI for Chagas disease in Colombia as a case study: two Machine Learning (ML) methods (Boosted Regression Trees (BRT) and Random Forest (RF)), and a Linear Model (LM) framework that we had developed previously. Our analyses show consistent results between the three modelling methods under scrutiny. The predictors (explanatory variables) selected, as well as the location of the most uncertain FoI values, were coherent across frameworks. RF was faster than BRT and LM, and provided estimates with fewer extreme values when extrapolating to areas where no ground-truth data were available. However, BRT and RF were less efficient at propagating uncertainty. Conclusions/significance The choice of FoI predictive models will depend on the objectives of the analysis. ML methods will help characterise the mean behaviour of the estimates, while LM will provide insight into the uncertainty surrounding such estimates. 
Our approach can be extended to the modelling of FoI patterns in other Chagas disease-endemic countries and to other infectious diseases for which serosurveys are regularly conducted for surveillance. Metrics such as the per susceptible rate of infection acquisition (Force-of-Infection) are crucial to understand the current epidemiological situation and the impact of control interventions for long-lasting diseases in which the infection event might have occurred many years previously, such as Chagas disease. FoI values are estimated from serological age profiles, often obtained in a few locations. However, when using predictive models to estimate the FoI over time and space (including areas where serosurveys had not been conducted), methods able to handle and propagate uncertainty must be implemented; otherwise, overconfident predictions may be obtained. Although Machine Learning (ML) methods are powerful tools, they may not be able to entirely handle this challenge. Therefore, the use of ML must be considered in relation to the aims of the analyses. ML will be more relevant to characterise the central trends of the estimates while Linear Models will help identify areas where further serosurveys should be conducted to improve the reliability of the predictions. Our approaches can be used to generate FoI predictions in other Chagas disease-endemic countries as well as in other diseases for which serological surveillance data are collected.
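The link between age-stratified seroprevalence and the Force-of-Infection can be illustrated with the simplest catalytic model, in which a constant FoI and no seroreversion give a prevalence of 1 - exp(-FoI x age) at a given age. This is a sketch under those assumptions, not the time-varying models the authors fitted, and the numeric values are hypothetical.

```python
from math import exp, log

def seroprevalence(foi, age):
    """Expected seroprevalence at a given age under a constant FoI."""
    return 1.0 - exp(-foi * age)

def foi_from_seroprevalence(prevalence, age):
    """Invert the constant-FoI catalytic model for one age group."""
    return -log(1.0 - prevalence) / age

# Hypothetical example: FoI of 0.02 per susceptible per year, observed at age 30.
expected_prevalence = seroprevalence(0.02, 30)
recovered_foi = foi_from_seroprevalence(expected_prevalence, 30)
```

Because prevalence accumulates over a lifetime, the same seroprevalence can arise from very different recent transmission histories, which is why the abstract favours FoI over prevalence for tracking temporal change.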
|
18
|
Albalawi U, Mustafa M. Current Artificial Intelligence (AI) Techniques, Challenges, and Approaches in Controlling and Fighting COVID-19: A Review. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:5901. [PMID: 35627437 PMCID: PMC9140632 DOI: 10.3390/ijerph19105901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/04/2022] [Revised: 05/07/2022] [Accepted: 05/09/2022] [Indexed: 11/17/2022]
Abstract
SARS-CoV-2 (COVID-19) has been one of the worst global health crises in the 21st century. The currently available rollout vaccines are not 100% effective for COVID-19 due to the evolving nature of the virus. There is a real need for a concerted effort to fight the virus, and research from diverse fields must contribute. Artificial intelligence-based approaches have proven to be significantly effective in every branch of our daily lives, including healthcare and medical domains. During the early days of this pandemic, artificial intelligence (AI) was utilized in the fight against this virus outbreak and it has played a major role in containing the spread of the virus. It provided innovative opportunities to speed up the development of disease interventions. Several methods, models, AI-based devices, robotics, and technologies have been proposed and utilized for diverse tasks such as surveillance, spread prediction, peak time prediction, classification, hospitalization, healthcare management, health system capacity, etc. This paper attempts to provide a quick, concise, and precise survey of the state-of-the-art AI-based techniques, technologies, and datasets used in fighting COVID-19. Several domains, including forecasting, surveillance, dynamic time series forecasting, spread prediction, genomics, computer vision, peak time prediction, the classification of medical imaging (including CT and X-ray and how they can be processed), and biological data (genome and protein sequences) have been investigated. An overview of the open-access computational resources and platforms is given and their useful tools are pointed out. The paper presents the potential research areas in AI and will thus encourage researchers to contribute to fighting against the virus and aid global health by slowing down the spread of the virus. This will be a significant contribution to help minimize the high death rate across the globe.
Affiliation(s)
- Umar Albalawi
  - Faculty of Computing and Information Technology, University of Tabuk, KSA, Tabuk 71491, Saudi Arabia
  - Industrial Innovation and Robotics Center, University of Tabuk, KSA, Tabuk 71491, Saudi Arabia
- Mohammed Mustafa
  - Faculty of Computing and Information Technology, University of Tabuk, KSA, Tabuk 71491, Saudi Arabia
  - Industrial Innovation and Robotics Center, University of Tabuk, KSA, Tabuk 71491, Saudi Arabia
|