1
Yang Y, Kwon JW, Yang Y. [Factors Influencing Sexual Experiences in Adolescents Using a Random Forest Model: Secondary Data Analysis of the 2019~2021 Korea Youth Risk Behavior Web-based Survey Data]. J Korean Acad Nurs 2024; 54:193-210. [PMID: 38863188 DOI: 10.4040/jkan.23134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 03/10/2024] [Accepted: 04/08/2024] [Indexed: 06/13/2024]
Abstract
PURPOSE: The objective of this study was to develop a predictive model for the sexual experiences of adolescents using the random forest method and to identify the "variable importance." METHODS: The study utilized data from the 2019 to 2021 Korea Youth Risk Behavior Web-based Survey, which included 86,595 male and 80,504 female participants. There were 44 independent variables. SPSS was used to conduct Rao-Scott χ² tests and complex sample t-tests. Modeling was performed using the random forest algorithm in Python. Each model was evaluated on precision, recall, and F1-score derived from the confusion matrix, together with the receiver operating characteristic curve and the area under the curve. RESULTS: The prevalence of sexual experiences initially decreased during the COVID-19 pandemic, but later increased. "Variable importance" for predicting sexual experiences, ranked in the top six, included week and weekday sedentary time and internet usage time, followed by ease of cigarette purchase, age at first alcohol consumption, smoking initiation, breakfast consumption, and difficulty purchasing alcohol. CONCLUSION: Education and support programs for promoting adolescent sexual health, based on the top-ranking important variables, should be integrated with health behavior intervention programs addressing internet usage, smoking, and alcohol consumption. We recommend active utilization of the random forest analysis method to develop high-performance predictive models for effective disease prevention, treatment, and nursing care.
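As a minimal sketch of how precision, recall, and F1-score follow from a confusion matrix, the function below computes them from the four cell counts. The function name and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1, and accuracy from confusion-matrix counts."""
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity): share of true positives that the model finds.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts for illustration only (not study data).
example = classification_metrics(tp=40, fp=10, fn=20, tn=930)
```

With a rare outcome, accuracy can stay high while recall is modest, which is one reason to report precision, recall, and F1 alongside it.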
Affiliation(s)
- Yoonseok Yang
  - Research Center of Healthcare & Welfare Instrument for the Aged, Division of Biomedical Engineering, College of Engineering, Jeonbuk National University, Jeonju, Korea
- Ju Won Kwon
  - Department of Electrical Engineering and Computer Science, Daegu Gyeongbuk Institute of Science and Technology, Daegu, Korea
- Youngran Yang
  - College of Nursing, Research Institute of Nursing Science, Jeonbuk National University, Jeonju, Korea
2
Alie MS, Negesse Y. Machine learning prediction of adolescent HIV testing services in Ethiopia. Front Public Health 2024; 12:1341279. [PMID: 38560439 PMCID: PMC10981275 DOI: 10.3389/fpubh.2024.1341279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 03/04/2024] [Indexed: 04/04/2024] Open
Abstract
Background: Despite endeavors to achieve the Joint United Nations Programme on HIV/AIDS 95-95-95 fast-track targets established in 2014 for HIV prevention, progress has fallen short. Hence, it is imperative to identify factors that can serve as predictors of an adolescent's HIV status. This identification would enable the implementation of targeted screening interventions and the enhancement of healthcare services. Our primary objective was to identify these predictors to facilitate the improvement of HIV testing services for adolescents in Ethiopia. Methods: Eight machine learning techniques were used to develop models from demographic and health data on 4,502 adolescent respondents. The dataset consisted of 31 variables, and variable selection was done using different selection methods. To train and validate the models, the data were randomly split into 80% for training and validation and 20% for testing. The algorithms were evaluated, and the one with the highest accuracy and mean F1 score was selected for further training using the most predictive variables. Results: The J48 decision tree algorithm detected HIV positivity more accurately than the seven other algorithms, with an accuracy of 81.29% and a Receiver Operating Characteristic (ROC) curve value of 86.3%. Its top five predictor features were age, knowledge of HIV testing locations, age at first sexual encounter, recent sexual activity, and exposure to family planning. The model's performance improved significantly when only twenty variables were used rather than all variables. Conclusion: Our research findings indicate that the J48 decision tree algorithm, when combined with demographic and health-related data, is a highly effective tool for identifying potential predictors of HIV testing. This approach allows us to accurately predict which adolescents are at a high risk of infection, enabling the implementation of targeted screening strategies for early detection and intervention. To improve the testing status of adolescents in the country, we recommend considering demographic factors such as age, age at first sexual encounter, exposure to family planning, recent sexual activity, and other identified predictors.
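The random 80%/20% split described in the Methods can be sketched as follows. This is an illustration using only the Python standard library; the function name, the seed, and the rounding rule are assumptions, and the abstract does not say which tool the authors used for the split.

```python
import random


def train_test_split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and return (train_indices, test_indices)."""
    indices = list(range(n_rows))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


# 4,502 respondents, as reported in the abstract; the seed is arbitrary.
train_idx, test_idx = train_test_split_indices(4502)
```

The test indices are held out until the end; any further validation split or cross-validation would be carved from the training portion only.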
Affiliation(s)
- Melsew Setegn Alie
  - Department of Public Health, School of Public Health, College of Medicine and Health Science, Mizan-Tepi University, Mizan-Aman, Ethiopia
- Yilkal Negesse
  - Department of Public Health, College of Medicine and Health Science, Debre-Markos University, Gojjam, Ethiopia
3
Zhang F, Zhu S, Chen S, Hao Z, Fang Y, Zou H, Cai Y, Cao B, Zhang K, Cao H, Chen Y, Hu T, Wang Z. Application of machine learning for risky sexual behavior interventions among factory workers in China. Front Public Health 2023; 11:1092018. [PMID: 37601175 PMCID: PMC10437811 DOI: 10.3389/fpubh.2023.1092018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Accepted: 07/11/2023] [Indexed: 08/22/2023] Open
Abstract
Introduction: Assessing the likelihood of engaging in high-risk sexual behavior can assist in delivering tailored educational interventions. The objective of this study was to identify the most effective algorithm and assess high-risk sexual behaviors within the last six months through the utilization of machine-learning models. Methods: The survey, conducted in the Longhua District CDC, Shenzhen, involved 2023 participants who were employees of 16 different factories. The data were collected through questionnaires administered between October 2019 and November 2019. We evaluated the model's overall predictive classification performance using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. All analyses were performed using the open-source Python version 3.9.12. Results: About a quarter of the factory workers had engaged in risky sexual behavior in the past 6 months. Most of them were Han Chinese (84.53%), held hukou in other provinces (85.12%) or rural areas (83.19%), had a junior high school education (55.37%), had a personal monthly income between RMB3,000 (US$417.54) and RMB4,999 (US$695.76; 64.71%), and were workers (80.67%). The random forest model (RF) outperformed all other models in assessing risky sexual behavior in the past 6 months and provided acceptable performance (accuracy 78%; sensitivity 11%; specificity 98%; PPV 63%; ROC 84%). Discussion: Machine learning has aided in evaluating risky sexual behavior within the last six months. Our assessment models can be integrated into government or public health departments to guide sexual health promotion and follow-up services.
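The reported figures can be roughly cross-checked from first principles. The sketch below derives accuracy and positive predictive value (PPV) from prevalence, sensitivity, and specificity. The prevalence of 0.25 is an assumption read from "about a quarter" in the abstract, and the function name is hypothetical; the study's test-set prevalence may differ.

```python
def accuracy_and_ppv(prevalence, sensitivity, specificity):
    """Derive accuracy and PPV from prevalence, sensitivity, and specificity."""
    # Express each confusion-matrix cell as a share of all cases.
    tp = prevalence * sensitivity
    tn = (1 - prevalence) * specificity
    fp = (1 - prevalence) * (1 - specificity)
    accuracy = tp + tn
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, ppv


# Assumed prevalence 0.25; sensitivity and specificity as reported in the abstract.
accuracy, ppv = accuracy_and_ppv(prevalence=0.25, sensitivity=0.11, specificity=0.98)
```

Under this assumption the result is an accuracy of about 0.76 and a PPV of about 0.65, close to the reported 78% and 63%. It also shows that the accuracy is driven almost entirely by the high specificity: at 11% sensitivity, most workers with risky behavior would go unflagged.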
Affiliation(s)
- Fang Zhang
  - Department of Science and Education, Shenzhen Baoan Women's and Children's Hospital, Shenzhen, Guangdong, China
- Shiben Zhu
  - Centre for Health Behaviours Research, Jockey Club School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong, China
- Siyu Chen
  - Centre for Health Behaviours Research, Jockey Club School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong, China
- Ziyu Hao
  - Centre for Health Behaviours Research, Jockey Club School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong, China
- Yuan Fang
  - Department of Health and Physical Education, The Education University of Hong Kong, Hong Kong, China
- Huachun Zou
  - School of Public Health, Sun Yat-sen University, Shenzhen, China
  - Kirby Institute, University of New South Wales, Sydney, NSW, Australia
- Yong Cai
  - School of Public Health, Shanghai Jiaotong University School of Medicine, Shanghai, China
- Bolin Cao
  - School of Media and Communication, Shenzhen University, Shenzhen, China
- Kechun Zhang
  - Longhua District Center for Disease Control and Prevention, Shenzhen, China
- He Cao
  - Longhua District Center for Disease Control and Prevention, Shenzhen, China
- Yaqi Chen
  - Longhua District Center for Disease Control and Prevention, Shenzhen, China
- Tian Hu
  - Longhua District Center for Disease Control and Prevention, Shenzhen, China
- Zixin Wang
  - Centre for Health Behaviours Research, Jockey Club School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong, China
4
Ogbechie MD, Fischer Walker C, Lee MT, Abba Gana A, Oduola A, Idemudia A, Edor M, Harris EL, Stephens J, Gao X, Chen PL, Persaud NE. Predicting Treatment Interruption Among People Living With HIV in Nigeria: Machine Learning Approach. JMIR AI 2023; 2:e44432. [PMID: 38875546 PMCID: PMC11041440 DOI: 10.2196/44432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Revised: 03/16/2023] [Accepted: 04/03/2023] [Indexed: 06/16/2024]
Abstract
BACKGROUND: Antiretroviral therapy (ART) has transformed HIV from a fatal illness to a chronic disease. Given the high rate of treatment interruptions, HIV programs use a range of approaches to support individuals in adhering to ART and to re-engage those who interrupt treatment. These interventions can often be time-consuming and costly, and thus providing them for all may not be sustainable. OBJECTIVE: This study aims to describe our experiences developing a machine learning (ML) model to predict interruption in treatment (IIT) at 30 days among people living with HIV newly enrolled on ART in Nigeria and our integration of the model into the routine information system. In addition, we collected health workers' perceptions and use of the model's outputs for case management. METHODS: Routine program data collected from January 2005 through February 2021 were used to train and test an ML model (boosting tree and Extreme Gradient Boosting) to predict future IIT. Data were randomly sampled using an 80/20 split into training and test data sets, respectively. Model performance was estimated using sensitivity, specificity, and positive and negative predictive values. Variables considered to be highly associated with treatment interruption were preselected by a group of HIV prevention researchers, program experts, and biostatisticians for inclusion in the model. Individuals were defined as having IIT if they were provided a 30-day supply of antiretrovirals but did not return for a refill within 28 days of their scheduled follow-up visit date. Outputs from the ML model were shared weekly with health care workers at selected facilities. RESULTS: After data cleaning, complete data for 136,747 clients were used for the analysis. The percentage of IIT cases decreased from 58.6% (36,663/61,864) before 2017 to 14.2% (3690/28,046) from October 2019 through February 2021. Overall IIT was higher among clients who were sicker at enrollment. Other factors that were significantly associated with IIT included pregnancy and breastfeeding status and facility characteristics (location, service level, and service type). Several models were initially developed; the selected model had a sensitivity of 81%, specificity of 88%, positive predictive value of 83%, and negative predictive value of 87%, and was successfully integrated into the national electronic medical records database. During field-testing, the majority of users reported that an IIT prediction tool could lead to proactive steps for preventing IIT and improving patient outcomes. CONCLUSIONS: High-performing ML models to identify patients with HIV at risk of IIT can be developed using routinely collected service delivery data and integrated into routine health management information systems. Machine learning can improve the targeting of interventions through differentiated models of care before patients interrupt treatment, resulting in increased cost-effectiveness and improved patient outcomes.
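The IIT outcome definition in the Methods can be sketched as a labeling rule. This is an illustration only: the function name, the argument names, and the choices to treat the 28-day window as inclusive and to leave clients unlabeled while the window is still open are assumptions, not details stated in the abstract.

```python
from datetime import date, timedelta

GRACE_DAYS = 28  # window after the scheduled follow-up visit, per the abstract


def has_iit(scheduled_visit, refill_date, as_of):
    """Return True if the client counts as interrupted in treatment (IIT).

    scheduled_visit: date of the scheduled follow-up visit
    refill_date: date the client returned for a refill, or None if not yet
    as_of: date on which the label is evaluated
    """
    deadline = scheduled_visit + timedelta(days=GRACE_DAYS)
    # A refill on or before the deadline means no interruption (assumed inclusive).
    if refill_date is not None and refill_date <= deadline:
        return False
    # Otherwise flag IIT only once the grace window has fully elapsed.
    return as_of > deadline


# Hypothetical client: scheduled 1 January 2021, so the deadline is 29 January 2021.
visit = date(2021, 1, 1)
on_time = has_iit(visit, date(2021, 1, 20), as_of=date(2021, 2, 15))
never_returned = has_iit(visit, None, as_of=date(2021, 2, 15))
still_in_window = has_iit(visit, None, as_of=date(2021, 1, 10))
```

A rule like this produces the binary target that the classifier is trained to predict; the features themselves come from data available at enrollment.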
Affiliation(s)
- Emily Lark Harris
  - United States Agency for International Development, Dar es Salaam, United Republic of Tanzania
- Jessica Stephens
  - United States Agency for International Development, Washington, DC, United States
5
He J, Li J, Jiang S, Cheng W, Jiang J, Xu Y, Yang J, Zhou X, Chai C, Wu C. Application of machine learning algorithms in predicting HIV infection among men who have sex with men: Model development and validation. Front Public Health 2022; 10:967681. [PMID: 36091522 PMCID: PMC9452878 DOI: 10.3389/fpubh.2022.967681] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/13/2022] [Accepted: 08/02/2022] [Indexed: 01/25/2023] Open
Abstract
Background: The continuing growth of HIV incidence among men who have sex with men (MSM), together with the low rate of HIV testing among MSM in China, demonstrates a need for innovative strategies to improve the implementation of HIV prevention. Machine learning algorithms are increasingly used in disease diagnosis prediction. We aimed to develop and validate machine learning models for predicting HIV infection among MSM that can identify individuals at increased risk of HIV acquisition for transmission-reduction interventions. Methods: We extracted data from MSM sentinel surveillance in Zhejiang province from 2018 to 2020. Univariate logistic regression was used to select significant variables in 2018-2019 data (P < 0.05). After data processing and feature selection, we divided the model development data into two groups by stratified random sampling: training data (70%) and testing data (30%). The Synthetic Minority Oversampling Technique (SMOTE) was applied to address the class imbalance in the data. Model performance was evaluated using accuracy, precision, recall, F-measure, and the area under the receiver operating characteristic curve (AUC). We then explored three commonly used machine learning algorithms to compare with logistic regression (LR): decision tree (DT), support vector machines (SVM), and random forest (RF). Finally, the four models were validated prospectively with 2020 data from Zhejiang province. Results: A total of 6,346 MSM were included in model development data, 372 of whom were diagnosed with HIV. In feature selection, 12 variables were selected as model predicting indicators. Compared with LR, the DT, SVM, and RF algorithms improved classification performance on SMOTE-processed data, with AUCs of 0.778 for LR and 0.856, 0.887, and 0.942 for DT, SVM, and RF, respectively. RF was the best-performing algorithm (accuracy = 0.871, precision = 0.960, recall = 0.775, F-measure = 0.858, and AUC = 0.942). The RF model still performed well on prospective validation (AUC = 0.846). Conclusion: Machine learning models performed substantially better than the conventional LR model, and RF should be considered in prediction tools for HIV infection in Chinese MSM. Further studies are needed to optimize and promote these algorithms and evaluate their impact on HIV prevention among MSM.
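The idea behind SMOTE can be sketched in a few lines: each synthetic minority case is an interpolation between a real minority case and one of its nearest minority-class neighbors. The code below is a simplified illustration using only the Python standard library, not the implementation used in the study; the function name, the parameter defaults, and the toy points are all assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbours.

    minority: list of equal-length numeric tuples (at least two points)
    n_synthetic: number of synthetic samples to generate
    k: number of nearest minority neighbours considered for each base point
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position on the connecting segment.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class: four points on the unit square (illustrative only).
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_points, n_synthetic=6, k=2)
```

Oversampling of this kind is normally applied to the training portion only, so that the held-out test data keep the original class balance (here, 372 HIV diagnoses among 6,346 participants).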
Affiliation(s)
- Jiajin He
  - School of Public Health, Zhejiang University School of Medicine, Hangzhou, China
- Jinhua Li
  - School of Software Technology, Zhejiang University, Ningbo, China
- Siqing Jiang
  - School of Public Health, Zhejiang University School of Medicine, Hangzhou, China
- Wei Cheng
  - Zhejiang Provincial Center for Disease Control and Prevention, Hangzhou, China
- Jun Jiang
  - Zhejiang Provincial Center for Disease Control and Prevention, Hangzhou, China
- Yun Xu
  - Zhejiang Provincial Center for Disease Control and Prevention, Hangzhou, China
- Jiezhe Yang
  - Zhejiang Provincial Center for Disease Control and Prevention, Hangzhou, China
- Xin Zhou
  - Zhejiang Provincial Center for Disease Control and Prevention, Hangzhou, China
- Chengliang Chai
  - Zhejiang Provincial Center for Disease Control and Prevention, Hangzhou, China (Correspondence: Chengliang Chai)
- Chao Wu
  - School of Public Affairs, Zhejiang University, Hangzhou, China (Correspondence: Chao Wu)
6
Abstract
The articles in this special issue of AIDS focus on Big Data science (BDS) as applied to a variety of HIV research questions in health services and epidemiology. Recent advances in technology mean that a critical mass of HIV-related health data with actionable intelligence is available for optimizing health outcomes and for improving and informing surveillance. Data science will play a key but complementary role in supporting current efforts in prevention, diagnosis, treatment, and response needed to end the HIV epidemic. This collection provides a glimpse of the promise of leveraging improved methods in Big Data science to reimagine HIV treatment and prevention in a digital age.
Affiliation(s)
- Bankole Olatosi
  - Big Data Health Science Center, University of South Carolina, Columbia, SC 29208
  - Department of Health Services Policy and Management, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208
- Sten H. Vermund
  - School of Public Health, Yale University, New Haven, CT 06510
- Xiaoming Li
  - Big Data Health Science Center, University of South Carolina, Columbia, SC 29208
  - Department of Health Promotion, Behavior and Education, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208