1
Lee JJR, Srinivasan R, Ong CS, Alejo D, Schena S, Shpitser I, Sussman M, Whitman GJR, Malinsky D. Causal determinants of postoperative length of stay in cardiac surgery using causal graphical learning. J Thorac Cardiovasc Surg 2023; 166:e446-e462. [PMID: 36154975 PMCID: PMC9968823 DOI: 10.1016/j.jtcvs.2022.08.012] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/13/2022] [Revised: 07/24/2022] [Accepted: 08/18/2022] [Indexed: 11/22/2022]
Abstract
OBJECTIVE We aimed to learn the causal determinants of postoperative length of stay in cardiac surgery patients undergoing isolated coronary artery bypass grafting or aortic valve replacement surgery. METHODS For patients undergoing isolated coronary artery bypass grafting or isolated aortic valve replacement surgeries between 2011 and 2016, we used causal graphical modeling on electronic health record data. The Fast Causal Inference (FCI) algorithm from the Tetrad software was used on data to estimate a Partial Ancestral Graph (PAG) depicting direct and indirect causes of postoperative length of stay, given background clinical knowledge. Then, we used the latent variable intervention-calculus when the directed acyclic graph is absent (LV-IDA) algorithm to estimate strengths of causal effects of interest. Finally, we ran a linear regression for postoperative length of stay to contrast statistical associations with what was learned by our causal analysis. RESULTS In our cohort of 2610 patients, the mean postoperative length of stay was 219 hours compared with the Society of Thoracic Surgeons 2016 national mean postoperative length of stay of approximately 168 hours. Most variables that clinicians believe to be predictors of postoperative length of stay were found to be causes, but some were direct (eg, age, diabetes, hematocrit, total operating time, and postoperative complications), and others were indirect (including gender, race, and operating surgeon). The strongest average causal effects on postoperative length of stay were exhibited by preoperative dialysis (209 hours); neuro-, pulmonary-, and infection-related postoperative complications (315 hours, 89 hours, and 131 hours, respectively); reintubation (61 hours); extubation in operating room (-47 hours); and total operating room duration (48 hours). Linear regression coefficients diverged from causal effects in magnitude (eg, dialysis) and direction (eg, crossclamp time). 
CONCLUSIONS By using retrospective electronic health record data and background clinical knowledge, causal graphical modeling retrieved direct and indirect causes of postoperative length of stay and their relative strengths. These insights will be useful in designing clinical protocols and targeting improvements in patient management.
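The divergence between regression coefficients and causal effects noted in the results can be illustrated with a toy simulation. This is a minimal sketch under assumed linear structural equations with a single observed confounder; it is not the FCI or LV-IDA procedure used in the study, and the variable names, coefficients, and sample size are hypothetical.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after regressing it on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

def simulate(n=20000, true_effect=1.0, seed=0):
    """Return (naive slope, confounder-adjusted slope) for exposure -> outcome."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]    # hypothetical confounder
    x = [ui + rng.gauss(0, 1) for ui in u]     # hypothetical exposure
    y = [true_effect * xi - 3.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    naive = slope(x, y)                        # ignores the confounder
    # Partial out the confounder from both variables, then regress.
    adjusted = slope(residuals(u, x), residuals(u, y))
    return naive, adjusted

naive, adjusted = simulate()
print(f"naive slope: {naive:.2f}, adjusted slope: {adjusted:.2f}")
```

In this assumed setup the unadjusted slope is negative even though the simulated causal effect is positive, which mirrors the kind of sign reversal the abstract reports for crossclamp time.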
Affiliation(s)
- Jaron J R Lee
  - Department of Computer Science, Johns Hopkins University, Baltimore, Md; Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, Md
- Ranjani Srinivasan
  - Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, Md; Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Md
- Chin Siang Ong
  - Division of Surgical Outcomes, Department of Surgery, Yale School of Medicine, New Haven, Conn
- Diane Alejo
  - Division of Cardiac Surgery, Department of Surgery, Johns Hopkins Hospital, Baltimore, Md
- Stefano Schena
  - Division of Cardiac Surgery, Department of Surgery, Johns Hopkins Hospital, Baltimore, Md
- Ilya Shpitser
  - Department of Computer Science, Johns Hopkins University, Baltimore, Md; Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, Md
- Marc Sussman
  - Division of Cardiac Surgery, Department of Surgery, Johns Hopkins Hospital, Baltimore, Md
- Glenn J R Whitman
  - Division of Cardiac Surgery, Department of Surgery, Johns Hopkins Hospital, Baltimore, Md
- Daniel Malinsky
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY
2
Mutai CK, McSharry PE, Ngaruye I, Musabanganji E. Use of unsupervised machine learning to characterise HIV predictors in sub-Saharan Africa. BMC Infect Dis 2023; 23:482. [PMID: 37468851 DOI: 10.1186/s12879-023-08467-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 07/17/2023] [Indexed: 07/21/2023] Open
Abstract
INTRODUCTION Significant regional variations in the HIV epidemic hinder effective common interventions in sub-Saharan Africa. It is crucial to analyze HIV positivity distributions within clusters and assess the homogeneity of countries. We aim to identify clusters of countries based on socio-behavioural predictors of HIV for screening. METHOD We used agglomerative hierarchical clustering, an unsupervised machine learning approach, to analyse data for 146,733 male and 155,622 female respondents from 13 sub-Saharan African countries with 20 and 26 features, respectively, using Population-based HIV Impact Assessment (PHIA) data from the survey years 2015-2019. We used the optimal silhouette index criterion to identify clusters of countries based on the similarity of socio-behavioural characteristics. We analysed the distribution of HIV positivity with socio-behavioural predictors of HIV within each cluster. RESULTS Two principal components were obtained, with the first describing 62.3% and 70.1% and the second explaining 18.3% and 20.6% of the total socio-behavioural variance in females and males, respectively. Two clusters per sex were identified, and the most predictive features in both sexes were: relationship with family head, enrolled in school, circumcision status for males, delayed pregnancy, work for payment in last 12 months, urban area indicator, and known HIV status. The HIV positivity distribution with these variables was significant within each cluster. CONCLUSIONS/FINDINGS The findings suggest a potential use of unsupervised machine learning approaches for identifying clusters of countries based on the underlying socio-behavioural characteristics.
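The clustering step described in the method can be sketched in a few lines. This is a minimal illustration assuming average linkage and Euclidean distance (the abstract does not state either choice), with hypothetical two-feature "country profiles"; it selects the number of clusters by the highest mean silhouette width.

```python
from itertools import combinations
from math import dist

def average_linkage(points, k):
    """Merge the two closest clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        # Mean pairwise distance between members of clusters a and b.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: gap(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]   # a < b, so deleting b is safe
        del clusters[b]
    return clusters

def silhouette(points, clusters):
    """Mean silhouette width; singleton clusters score 0 by convention."""
    scores = []
    for ci, members in enumerate(clusters):
        for i in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            a = sum(dist(points[i], points[j]) for j in members if j != i) / (len(members) - 1)
            b = min(sum(dist(points[i], points[j]) for j in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical standardized profiles for six countries (two features each).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
best_k = max(range(2, 5), key=lambda k: silhouette(points, average_linkage(points, k)))
print(best_k, average_linkage(points, best_k))
```

On real survey data the features would need to be standardized first, and a library implementation would be preferable to this quadratic-time sketch.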
Affiliation(s)
- Charles K Mutai
  - African Center of Excellence in Data Science, University of Rwanda, Kigali, BP 4285, Rwanda
  - Department of Mathematics, Physics and Computing, Moi University, Eldoret, Kenya
- Patrick E McSharry
  - African Center of Excellence in Data Science, University of Rwanda, Kigali, BP 4285, Rwanda
  - College of Engineering, Carnegie Mellon University Africa, Kigali, BP 6150, Rwanda
  - Oxford-Man Institute of Quantitative Finance, Oxford University, Oxford, OX2 6ED, UK
- Innocent Ngaruye
  - College of Science and Technology, University of Rwanda, Kigali, Rwanda
3
Leist AK, Klee M, Kim JH, Rehkopf DH, Bordas SPA, Muniz-Terrera G, Wade S. Mapping of machine learning approaches for description, prediction, and causal inference in the social and health sciences. Science Advances 2022; 8:eabk1942. [PMID: 36260666 PMCID: PMC9581488 DOI: 10.1126/sciadv.abk1942] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/28/2021] [Accepted: 09/01/2022] [Indexed: 05/20/2023]
Abstract
Machine learning (ML) methodology used in the social and health sciences needs to fit the intended research purposes of description, prediction, or causal inference. This paper provides a comprehensive, systematic meta-mapping of research questions in the social and health sciences to appropriate ML approaches by incorporating the necessary requirements for statistical analysis in these disciplines. We map the established classification into description, prediction, counterfactual prediction, and causal structural learning to common research goals, such as estimating prevalence of adverse social or health outcomes, predicting the risk of an event, and identifying risk factors or causes of adverse outcomes, and explain common ML performance metrics. Such mapping may help to fully exploit the benefits of ML while considering domain-specific aspects relevant to the social and health sciences and hopefully contribute to the acceleration of the uptake of ML applications to advance both basic and applied social and health sciences research.
Affiliation(s)
- Anja K. Leist
  - Department of Social Sciences, Institute for Research on Socio-Economic Inequality (IRSEI), University of Luxembourg, Esch-sur-Alzette, Luxembourg
  - Corresponding author.
- Matthias Klee
  - Department of Social Sciences, Institute for Research on Socio-Economic Inequality (IRSEI), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Jung Hyun Kim
  - Department of Social Sciences, Institute for Research on Socio-Economic Inequality (IRSEI), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- David H. Rehkopf
  - Department of Epidemiology and Population Health, Stanford University, Palo Alto, CA, USA
- Graciela Muniz-Terrera
  - Centre for Dementia Prevention, University of Edinburgh, Edinburgh, UK
  - Ohio University, Athens, OH, USA
- Sara Wade
  - School of Mathematics, University of Edinburgh, Edinburgh, UK
4
Sherafat-Kazemzadeh R, Gaumer G, Hariharan D, Sombrio A, Nandakumar A. Between a Rock and a Hard Place: How poverty and lack of agency affect HIV risk behaviors among married women in 25 African countries: A cross-sectional study. J Glob Health 2021; 11:04059. [PMID: 34737859 PMCID: PMC8564885 DOI: 10.7189/jogh.11.04059] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022] Open
Abstract
Background Gender inequality and poverty exacerbate the burden of HIV/AIDS among women in Africa. AIDS awareness and educational campaigns have been inadequate in many countries, and rates of HIV testing and adherence to condom use remain considerably low, especially among married women. We investigate whether higher HIV knowledge is equally effective in lowering risky behaviors among groups of women with different levels of wealth and agency. Methods Pooled data on 113 151 adult married women from Demographic and Health Surveys (DHS) in 25 African countries were used (2010 to 2016). Agency was defined as a woman's ability to refuse sex and ask her partner to use a condom, plus having a role in decision making on household spending and health-related issues. The lowest tertile of the DHS wealth index defined poverty. Questions about HIV prevention and mother-to-child transmission were used to create a scale for knowledge (0-5). Use of condom, HIV testing, absence of sexually transmitted disease (STD), and having one partner were dependent variables. Regression models investigated the effect of agency and knowledge as predictors of behaviors. Separate additional models were run to measure associations of each behavior with knowledge scores on groups of women divided by agency and poverty. Analyses were adjusted for demographic factors, history of pregnancy, wife-beating attitude, and country dummies. Results Significantly higher risk and lower levels of protective factors exist for poor women who lack agency. Knowledge had positive associations with a better behavior score and higher rates of condom use and HIV testing among both poor and not poor women. When examining compound effects of agency and poverty, absence of agency reduces the positive effect of knowledge on lowering STD rate and overall behavior score among poor women. It also nullifies the effect of knowledge on condom use in both wealth groups.
Conclusion Knowledge of HIV does not exert its potential protective effect when women live in poverty compounded with lack of agency. To succeed, anti-HIV programs should be tailored to the dynamics of risk and the sociocultural and economic context of target populations.
Affiliation(s)
- Roya Sherafat-Kazemzadeh
  - Institute for Global Health and Development, The Heller School for Social Policy and Management, Brandeis University, Waltham, Massachusetts, USA
- Gary Gaumer
  - Institute for Global Health and Development, The Heller School for Social Policy and Management, Brandeis University, Waltham, Massachusetts, USA
- Dhwani Hariharan
  - Institute for Global Health and Development, The Heller School for Social Policy and Management, Brandeis University, Waltham, Massachusetts, USA
- Anna Sombrio
  - Institute for Global Health and Development, The Heller School for Social Policy and Management, Brandeis University, Waltham, Massachusetts, USA
- Allyala Nandakumar
  - Institute for Global Health and Development, The Heller School for Social Policy and Management, Brandeis University, Waltham, Massachusetts, USA
5
Use of machine learning techniques to identify HIV predictors for screening in sub-Saharan Africa. BMC Med Res Methodol 2021; 21:159. [PMID: 34332540 PMCID: PMC8325403 DOI: 10.1186/s12874-021-01346-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/30/2020] [Accepted: 07/13/2021] [Indexed: 11/17/2022] Open
Abstract
Aim HIV prevention measures in sub-Saharan Africa are still short of attaining the UNAIDS 90-90-90 fast-track targets set in 2014. Identifying predictors for HIV status may facilitate targeted screening interventions that improve health care. We aimed to identify HIV predictors and to predict persons at high risk of infection. Method We applied machine learning approaches to build models using Population-based HIV Impact Assessment (PHIA) data for 41,939 male and 45,105 female respondents with 30 and 40 variables, respectively, from four sub-Saharan African countries. We trained and validated the algorithms on 80% of the data and tested on the remaining 20%, rotating the left-out country. The algorithm with the best mean F1 score was retained and trained on the most predictive variables. We used the model to identify people living with HIV and individuals with a higher likelihood of contracting the disease. Results The XGBoost algorithm appeared to significantly improve identification of HIV positivity over the other five algorithms, with mean F1 scores of 90% and 92% for males and females, respectively. The eight most predictive features in both sexes were: age, relationship with family head, the highest level of education, highest grade at that school level, work for payment, avoiding pregnancy, age at the first experience of sex, and wealth quintile. Model performance using these variables increased significantly compared to having all the variables included. We identified that five male and 19 female individuals would require testing to find one HIV-positive individual. We also predicted that 4.14% of males and 10.81% of females are at high risk of infection. Conclusion Our findings suggest a potential use of the XGBoost algorithm with socio-behavioural data for identifying HIV predictors and predicting individuals at high risk of infection for targeted screening.
Supplementary Information The online version contains supplementary material available at 10.1186/s12874-021-01346-2.
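The model-selection criterion mentioned in the abstract, the mean F1 score across held-out folds, can be computed as follows. This is a small sketch with hypothetical labels and predictions; it does not reproduce the study's models or data, and the fold layout is assumed for illustration.

```python
def f1_score(y_true, y_pred):
    """F1: harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical (true labels, predicted labels) for two held-out folds.
folds = [
    ([1, 0, 1, 1, 0, 0], [1, 0, 1, 0, 0, 1]),
    ([1, 1, 0, 0, 1, 0], [1, 1, 0, 0, 1, 0]),
]
scores = [f1_score(t, p) for t, p in folds]
mean_f1 = sum(scores) / len(scores)
print(scores, mean_f1)
```

F1 is a common choice when the positive class is rare, as with HIV positivity in a general population survey, because accuracy alone would reward a model that predicts everyone negative.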
6
HIV Incidence Among Women in Sub-Saharan Africa: A Time Trend Analysis of the 2000-2017 Period. J Assoc Nurses AIDS Care 2021; 32:662. [PMID: 33989245 DOI: 10.1097/jnc.0000000000000254] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
Abstract
The aim of this study was to use data from the United Nations Global Indicators Database to analyze the trends in the HIV incidence rate among women in sub-Saharan African countries between 2000 and 2017. The HIV incidence rate is defined as the number of new HIV infections per 1,000 uninfected population, aged 15 to 49 years old. Joinpoint regression analysis was applied to identify periods when there were significant changes in the HIV incidence rate. The results show an overall decreasing trend in HIV incidence rates among women in sub-Saharan Africa: rates decreased in all sub-Saharan African countries except Angola, Equatorial Guinea, and Sudan, where they remained the same, and Madagascar, where the overall trend is increasing. The joinpoint regression statistical method offers an in-depth analysis of the incidence of HIV among women in sub-Saharan Africa.
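The incidence definition and the idea behind joinpoint regression can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it assumes exactly one joinpoint, finds it by grid search over observed years with ordinary least squares, and omits the significance testing that dedicated joinpoint software performs. The data are synthetic.

```python
def incidence_per_1000(new_infections, uninfected_population):
    """New infections per 1,000 uninfected population."""
    return 1000 * new_infections / uninfected_population

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_one_joinpoint(years, rates):
    """Fit rate = b0 + b1*t + b2*max(0, t - tau), t measured from the first year.

    Returns (sse, tau, [b0, b1, b2]) for the candidate year tau with the
    lowest sum of squared errors. The slope after tau is b1 + b2.
    """
    t0 = years[0]
    best = None
    for tau in years[2:-2]:
        rows = [[1.0, t - t0, max(0.0, t - tau)] for t in years]
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        xty = [sum(r[i] * y for r, y in zip(rows, rates)) for i in range(3)]
        beta = solve(xtx, xty)
        sse = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
                  for r, y in zip(rows, rates))
        if best is None or sse < best[0]:
            best = (sse, tau, beta)
    return best

# Synthetic series: decline of 0.5 per year until 2008, then 0.1 per year.
years = list(range(2000, 2018))
rates = [10 - 0.5 * (t - 2000) + 0.4 * max(0, t - 2008) for t in years]
sse, tau, beta = fit_one_joinpoint(years, rates)
print(tau, beta[1], beta[1] + beta[2])
```

The fitted model is continuous at the joinpoint by construction, which is the property that distinguishes joinpoint regression from fitting two unrelated line segments.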
7
Kummerfeld E. A simple interpretation of undirected edges in essential graphs is wrong. PLoS One 2021; 16:e0249415. [PMID: 33831048 PMCID: PMC8031147 DOI: 10.1371/journal.pone.0249415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2020] [Accepted: 03/18/2021] [Indexed: 11/18/2022] Open
Abstract
Artificial intelligence for causal discovery frequently uses Markov equivalence classes of directed acyclic graphs, graphically represented as essential graphs, as a way of representing uncertainty in causal directionality. There has been confusion regarding how to interpret undirected edges in essential graphs, however. In particular, experts and non-experts both have difficulty quantifying the likelihood of uncertain causal arrows being pointed in one direction or another. A simple interpretation of undirected edges treats them as having equal odds of being oriented in either direction, but I show in this paper that any agent interpreting undirected edges in this simple way can be Dutch booked. In other words, I can construct a set of bets that appears rational for the users of the simple interpretation to accept, but for which in all possible outcomes they lose money. I put forward another interpretation, prove this interpretation leads to a bet-taking strategy that is sufficient to avoid all Dutch books of this kind, and conjecture that this strategy is also necessary for avoiding such Dutch books. Finally, I demonstrate that undirected edges that are more likely to be oriented in one direction than the other are common in graphs with 4 nodes and 3 edges.
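The final claim, that an undirected edge can be more likely to point one way than the other in a graph with 4 nodes and 3 edges, can be checked by enumeration. The sketch below assumes, purely for illustration, that every DAG in the Markov equivalence class is weighted equally; the abstract does not specify the paper's own interpretation, and the chain A - B - C - D is a hypothetical example.

```python
from itertools import product

# Essential graph: an undirected chain with 4 nodes and 3 edges.
SKELETON = [("A", "B"), ("B", "C"), ("C", "D")]

def has_collider(directed_edges):
    """True if two arrows share a head. In a chain (no triangles), any such
    pair is an unshielded collider, which would change the equivalence class."""
    heads = [head for _, head in directed_edges]
    return len(heads) != len(set(heads))

def equivalence_class(skeleton):
    """All orientations of the skeleton that introduce no collider.
    Orientations of a tree are always acyclic, so no cycle check is needed."""
    members = []
    for flips in product([False, True], repeat=len(skeleton)):
        dag = [(b, a) if flip else (a, b) for (a, b), flip in zip(skeleton, flips)]
        if not has_collider(dag):
            members.append(dag)
    return members

dags = equivalence_class(SKELETON)
share_a_to_b = sum(("A", "B") in dag for dag in dags) / len(dags)
print(len(dags), share_a_to_b)  # 4 0.25
```

Under the equal-weighting assumption, only one of the four DAGs in the class orients the first edge as A → B, so treating that edge as a 50/50 proposition would misstate the odds.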
Affiliation(s)
- Erich Kummerfeld
  - Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, United States of America
8
Merzouki A, Estill J, Orel E, Tal K, Keiser O. Clusters of sub-Saharan African countries based on sociobehavioural characteristics and associated HIV incidence. PeerJ 2021; 9:e10660. [PMID: 33520455 PMCID: PMC7812934 DOI: 10.7717/peerj.10660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2020] [Accepted: 12/07/2020] [Indexed: 11/20/2022] Open
Abstract
Introduction HIV incidence varies widely between sub-Saharan African (SSA) countries. This variation coincides with a substantial sociobehavioural heterogeneity, which complicates the design of effective interventions. In this study, we investigated how sociobehavioural heterogeneity in sub-Saharan Africa could account for the variance of HIV incidence between countries. Methods We analysed national-level aggregated data from the most recent Demographic and Health Surveys of 29 SSA countries (2010-2017), which included 594,644 persons (183,310 men and 411,334 women). We preselected 48 demographic, socio-economic, behavioural and HIV-related attributes to describe each country. We used Principal Component Analysis to visualize sociobehavioural similarity between countries, and to identify the variables that accounted for most sociobehavioural variance in SSA. We used hierarchical clustering to identify groups of countries with similar sociobehavioural profiles, and we compared the distribution of HIV incidence (estimates from UNAIDS) and sociobehavioural variables within each cluster. Results The most important characteristics, which explained 69% of sociobehavioural variance across SSA among the variables we assessed, were: religion; male circumcision; number of sexual partners; literacy; uptake of HIV testing; women's empowerment; accepting attitude toward people living with HIV/AIDS; rurality; ART coverage; and knowledge about AIDS. Our model revealed three groups of countries, each with characteristic sociobehavioural profiles. HIV incidence was mostly similar within each cluster and different between clusters (median (IQR); 0.5/1000 (0.6/1000), 1.8/1000 (1.3/1000) and 5.0/1000 (4.2/1000)).
Conclusions Our findings suggest that the combination of sociobehavioural factors plays a key role in determining the course of the HIV epidemic, and that similar techniques can help to predict the effects of behavioural change on the HIV epidemic and to design targeted interventions to impede HIV transmission in SSA.
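The notion of "variance explained" by principal components, used in the methods and results above, can be made concrete in two dimensions, where the eigenvalues of the covariance matrix have a closed form. This is a minimal sketch with hypothetical data for two indicators; the study itself used 48 attributes, for which a numerical library would be needed.

```python
from math import sqrt

def explained_variance_2d(xs, ys):
    """Share of total variance captured by each principal component
    of two variables, from the eigenvalues of their covariance matrix."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]].
    spread = sqrt((sxx - syy) ** 2 + 4 * sxy ** 2)
    lam1 = (sxx + syy + spread) / 2
    lam2 = (sxx + syy - spread) / 2
    total = lam1 + lam2
    return lam1 / total, lam2 / total

# Hypothetical indicators: perfectly correlated, then uncorrelated.
correlated = explained_variance_2d([1, 2, 3, 4], [2, 4, 6, 8])
uncorrelated = explained_variance_2d([1, -1, 1, -1], [1, 1, -1, -1])
print(correlated, uncorrelated)
```

When two indicators move together, the first component captures nearly all of the variance; when they are unrelated and equally spread, each component captures half.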
Affiliation(s)
- Aziza Merzouki
  - Institute of Global Health, University of Geneva, Geneva, Switzerland
- Janne Estill
  - Institute of Global Health, University of Geneva, Geneva, Switzerland
  - Institute of Mathematical Statistics and Actuarial Science, University of Bern, Bern, Switzerland
- Erol Orel
  - Institute of Global Health, University of Geneva, Geneva, Switzerland
- Kali Tal
  - Institute of Primary Health Care (BIHAM), University of Bern, Bern, Switzerland
- Olivia Keiser
  - Institute of Global Health, University of Geneva, Geneva, Switzerland
9
Merzouki A, Styles A, Estill J, Orel E, Baranczuk Z, Petrie K, Keiser O. Identifying groups of people with similar sociobehavioural characteristics in Malawi to inform HIV interventions: a latent class analysis. J Int AIDS Soc 2020; 23:e25615. [PMID: 32985772 PMCID: PMC7521110 DOI: 10.1002/jia2.25615] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/07/2020] [Revised: 08/19/2020] [Accepted: 08/26/2020] [Indexed: 01/04/2023] Open
Abstract
INTRODUCTION Within many sub-Saharan African countries including Malawi, HIV prevalence varies widely between regions. This variability may be related to the distribution of population groups with specific sociobehavioural characteristics that influence the transmission of HIV and the uptake of prevention. In this study, we intended to identify groups of people in Malawi with similar risk profiles. METHODS We used data from the Demographic and Health Survey in Malawi (2015 to 2016), and stratified the analysis by sex. We considered demographic, socio-behavioural and HIV-related variables. Using Latent Class Analysis (LCA), we identified groups of people sharing common sociobehavioural characteristics. The optimal number of classes (groups) was selected based on the Bayesian information criterion. We compared the proportions of individuals belonging to the different groups across the three regions and 28 districts of Malawi. RESULTS We found nine groups of women and six groups of men. Most women in the groups with highest risk of being HIV positive were living in female-headed households and were formerly married or in a union. Among men, older men had the highest risk of being HIV positive, followed by young (20 to 25) single men. Generally, low HIV testing uptake correlated with lower risk of having HIV. However, rural adolescent girls had a low probability of being tested (48.7%) despite a relatively high HIV prevalence. Urban districts and the Southern region had a higher percentage of high-prevalence and less tested groups of individuals than other areas. CONCLUSIONS LCA is an efficient method to find groups of people sharing common HIV risk profiles, identify particularly vulnerable sub-populations, and plan targeted interventions focusing on these groups. 
Tailored support, prevention and HIV testing programmes should focus particularly on female household heads, adolescent girls living in rural areas, older married men and young men who have never been married.
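Selecting the number of latent classes by the Bayesian information criterion, as described in the methods, amounts to picking the model with the lowest BIC. The sketch below assumes an unconstrained latent class model with categorical items; the log-likelihood values, item count, and sample size are hypothetical and are not taken from the study.

```python
from math import log

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: lower is better."""
    return n_params * log(n_obs) - 2 * log_likelihood

def lca_param_count(n_classes, levels_per_item):
    """Free parameters of an unconstrained latent class model:
    class proportions plus per-class item-response probabilities."""
    return (n_classes - 1) + n_classes * sum(levels - 1 for levels in levels_per_item)

# Hypothetical fits: maximized log-likelihood for 1 to 4 classes,
# five binary items, 1,000 respondents.
items = [2, 2, 2, 2, 2]
n_obs = 1000
log_likelihoods = {1: -3300.0, 2: -3100.0, 3: -3060.0, 4: -3055.0}

scores = {k: bic(ll, lca_param_count(k, items), n_obs)
          for k, ll in log_likelihoods.items()}
best_k = min(scores, key=scores.get)
print(best_k, round(scores[best_k], 2))
```

The penalty term grows with the number of classes, so a fourth class that barely improves the likelihood is rejected in this example even though its raw fit is slightly better.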
Affiliation(s)
- Aziza Merzouki
  - Institute of Global Health, University of Geneva, Geneva, Switzerland
- Janne Estill
  - Institute of Global Health, University of Geneva, Geneva, Switzerland
  - Institute of Mathematical Statistics and Actuarial Science, University of Bern, Bern, Switzerland
- Erol Orel
  - Institute of Global Health, University of Geneva, Geneva, Switzerland
- Zofia Baranczuk
  - Institute of Global Health, University of Geneva, Geneva, Switzerland
  - Department of Psychology, University of Zurich, Zurich, Switzerland
  - Institute of Mathematics, University of Zurich, Zurich, Switzerland
- Olivia Keiser
  - Institute of Global Health, University of Geneva, Geneva, Switzerland
10
Thiabaud A, Triulzi I, Orel E, Tal K, Keiser O. Social, Behavioral, and Cultural factors of HIV in Malawi: Semi-Automated Systematic Review. J Med Internet Res 2020; 22:e18747. [PMID: 32795992 PMCID: PMC7455873 DOI: 10.2196/18747] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/23/2020] [Revised: 05/20/2020] [Accepted: 06/04/2020] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Demographic and sociobehavioral factors are strong drivers of HIV infection rates in sub-Saharan Africa. These factors are often studied in qualitative research but ignored in quantitative analyses. However, they provide in-depth insight into the local behavior and may help to improve HIV prevention. OBJECTIVE To obtain a comprehensive overview of the sociobehavioral factors influencing HIV prevalence and incidence in Malawi, we systematically reviewed the literature using a newly programmed tool for automating part of the systematic review process. METHODS Due to the choice of broad search terms ("HIV AND Malawi"), our preliminary search revealed many thousands of articles. We, therefore, developed a Python tool to automatically extract, process, and categorize open-access articles published from January 1, 1987 to October 1, 2019 in the PubMed, PubMed Central, JSTOR, Paperity, and arXiv databases. We then used a topic modelling algorithm to classify and identify publications of interest. RESULTS Our tool extracted 22,709 unique articles; 16,942 could be further processed. After topic modelling, 519 of these were clustered into relevant topics, of which 20 were kept after manual screening. We retrieved 7 more publications after examining the references so that 27 publications were finally included in the review. Reducing the 16,942 articles to 519 potentially relevant articles using the software took 5 days. Several factors contributing to the risk of HIV infection were identified, including religion, gender and relationship dynamics, beliefs, and sociobehavioral attitudes. CONCLUSIONS Our software does not replace traditional systematic reviews, but it returns useful results to broad queries of open-access literature in under a week, without a priori knowledge. This produces a "seed dataset" of relevance that could be further developed. It identified known factors and factors that may be specific to Malawi.
In the future, we aim to expand the tool by adding more social science databases and applying it to other sub-Saharan African countries.
Affiliation(s)
- Amaury Thiabaud
  - Institut de Santé Globale, Université de Genève, Genève, Switzerland
- Isotta Triulzi
  - Institut de Santé Globale, Université de Genève, Genève, Switzerland
  - Institute of Management, Scuola Superiore Sant'Anna, Pisa, Italy
- Erol Orel
  - Institut de Santé Globale, Université de Genève, Genève, Switzerland
- Kali Tal
  - Institut de Santé Globale, Université de Genève, Genève, Switzerland
  - Institute of Primary Health Care (BIHAM), University of Bern, Bern, Switzerland
- Olivia Keiser
  - Institut de Santé Globale, Université de Genève, Genève, Switzerland