1
Esber AL, Dear NF, King D, Francisco LV, Sing'oei V, Owuoth J, Maswai J, Iroezindu M, Bahemana E, Kibuuka H, Shah N, Polyak CS, Ake JA, Crowell TA. Achieving the third 95 in sub-Saharan Africa: application of machine learning approaches to predict viral failure. AIDS 2023; 37:1861-1870. [PMID: 37418549 DOI: 10.1097/qad.0000000000003646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/09/2023]
Abstract
OBJECTIVE Viral failure in people with HIV (PWH) may be influenced by multiple sociobehavioral, clinical, and context-specific factors, and supervised learning approaches may identify novel predictors. We compared the performance of two supervised learning algorithms to predict viral failure in four African countries. DESIGN Cohort study. METHODS The African Cohort Study is an ongoing, longitudinal cohort enrolling PWH at 12 sites in Uganda, Kenya, Tanzania, and Nigeria. Participants underwent physical examination, medical history-taking, medical record extraction, sociobehavioral interviews, and laboratory testing. In cross-sectional analyses of enrollment data, viral failure was defined as a viral load of at least 1000 copies/ml among participants on antiretroviral therapy (ART) for at least 6 months. We compared the performance of lasso-type regularized regression and random forests by calculating area under the curve (AUC) and used each to identify factors associated with viral failure; 94 explanatory variables were considered. RESULTS Between January 2013 and December 2020, 2941 PWH were enrolled, 1602 had been on ART for at least 6 months, and 1571 participants with complete case data were included. At enrollment, 190 (12.0%) had viral failure. The lasso regression model was slightly superior to the random forest in its ability to identify PWH with viral failure (AUC: 0.82 vs. 0.75). Both models identified CD4+ count, ART regimen, age, self-reported ART adherence, and duration on ART as important factors associated with viral failure. CONCLUSION These findings corroborate existing literature primarily based on hypothesis-testing statistical approaches and help to generate questions for future investigations that may impact viral failure.
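The AUC used above to compare the two models can be read as the probability that a randomly chosen participant with viral failure receives a higher predicted score than a randomly chosen participant without it. The following is a minimal sketch of that rank-based calculation, not the study's code: the function name `auc`, the toy labels, and both score vectors are hypothetical and chosen only for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the fraction of (positive, negative) pairs in which the positive
    case has the higher score; tied pairs count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = viral failure, 0 = no viral failure.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
model_a = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.2, 0.1]   # made-up scores
model_b = [0.6, 0.3, 0.4, 0.5, 0.35, 0.2, 0.45, 0.1]  # made-up scores

auc_a = auc(y_true, model_a)
auc_b = auc(y_true, model_b)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

The pairwise loop is quadratic in sample size, which is fine for illustration; a sort-based ranking would be the usual choice for larger cohorts.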
Affiliation(s)
- Allahna L Esber
  - U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
  - Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, Maryland, USA
- Nicole F Dear
  - U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
  - Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, Maryland, USA
- David King
  - U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
  - Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, Maryland, USA
- Leilani V Francisco
  - U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
  - Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, Maryland, USA
- Valentine Sing'oei
  - U.S. Army Medical Research Directorate - Africa
  - HJF Medical Research International, Kisumu
- John Owuoth
  - U.S. Army Medical Research Directorate - Africa
  - HJF Medical Research International, Kisumu
- Jonah Maswai
  - U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
  - U.S. Army Medical Research Directorate - Africa, Kericho, Kenya
- Michael Iroezindu
  - U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
  - HJF Medical Research International, Abuja, Nigeria
- Emmanuel Bahemana
  - U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
  - HJF Medical Research International, Mbeya, Tanzania
- Hannah Kibuuka
  - Makerere University-Walter Reed Project, Kampala, Uganda
- Neha Shah
  - U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
- Christina S Polyak
  - U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
  - Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, Maryland, USA
- Julie A Ake
  - U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
- Trevor A Crowell
  - U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring
  - Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, Maryland, USA
2
Wellawatte GP, Seshadri A, White AD. Model agnostic generation of counterfactual explanations for molecules. Chem Sci 2022; 13:3697-3705. [PMID: 35432902 PMCID: PMC8966631 DOI: 10.1039/d1sc05259d] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 09/22/2021] [Accepted: 02/06/2022] [Indexed: 11/25/2022] Open
Abstract
An outstanding challenge in deep learning in chemistry is its lack of interpretability. The inability to explain why a neural network makes a prediction is a major barrier to deployment of AI models. This not only dissuades chemists from using deep learning predictions, but has also led to neural networks learning spurious correlations that are difficult to notice. Counterfactuals are a category of explanations that provide a rationale behind a model prediction with satisfying properties like providing chemical structure insights. Yet, counterfactuals have previously been limited to specific model architectures or required reinforcement learning as a separate process. In this work, we show a universal model-agnostic approach that can explain any black-box model prediction. We demonstrate this method on random forest models, sequence models, and graph neural networks in both classification and regression.
Affiliation(s)
- Aditi Seshadri
  - Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
- Andrew D White
  - Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
3
Bose E, Paintsil E, Ghebremichael M. Minimum redundancy maximal relevance gene selection of apoptosis pathway genes in peripheral blood mononuclear cells of HIV-infected patients with antiretroviral therapy-associated mitochondrial toxicity. BMC Med Genomics 2021; 14:285. [PMID: 34852799 PMCID: PMC8638104 DOI: 10.1186/s12920-021-01136-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2021] [Accepted: 11/16/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND We previously identified differentially expressed genes on the basis of false discovery rate adjusted P value using empirical Bayes moderated tests. However, that approach yielded a subset of differentially expressed genes without accounting for redundancy between the selected genes. METHODS This study is a secondary analysis of a case-control study of the effect of antiretroviral therapy on apoptosis pathway genes comprising 16 cases (HIV infected with mitochondrial toxicity) and 16 controls (uninfected). We applied the maximum relevance minimum redundancy (mRMR) algorithm to the genes that were differentially expressed between the cases and controls. The mRMR algorithm iteratively selects features (genes) that are maximally relevant for class prediction and minimally redundant. We implemented several machine learning classifiers and tested the prediction accuracy of the two mRMR genes. We next used network analysis to estimate and visualize the association among the differentially expressed genes. We employed Markov Random Field or undirected network models to identify gene networks related to mitochondrial toxicity. The Spinglass model was used to identify clusters of gene communities. RESULTS The mRMR algorithm ranked DFFA and TNFRSF1A, two of the upregulated proapoptotic genes, at the top. The overall prediction accuracy was 86%: the two mRMR genes correctly classified 86% of the participants into their respective groups. The estimated network models showed different patterns of gene networks. In the network of the cases, FASLG was the most central gene. However, instead of FASLG, ABL1 and LTBR had the highest centrality in controls. CONCLUSION The mRMR algorithm and network analysis revealed a new correlation of genes associated with mitochondrial toxicity.
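The iterative selection described above (pick the feature most relevant to the class, then repeatedly add the feature that is relevant but least redundant with those already chosen) can be sketched as a greedy loop. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes discretized features, uses mutual information for both relevance and redundancy, and scores candidates by the difference "relevance minus mean redundancy", which is one common mRMR variant. The feature names `gene_a`, `gene_a_dup`, and `gene_b` and all values are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedy mRMR with the difference criterion: relevance - mean redundancy."""
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s])
                for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 1 = case, 0 = control; features already discretized.
target = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "gene_a": [1, 1, 1, 0, 0, 0, 0, 0],      # informative
    "gene_a_dup": [1, 1, 1, 0, 0, 0, 0, 0],  # exact copy of gene_a (redundant)
    "gene_b": [1, 1, 0, 1, 0, 0, 0, 1],      # weaker but complementary
}

chosen = mrmr(features, target, k=2)
print(chosen)
```

In this toy case the duplicate feature is passed over in the second round even though it is individually more relevant than `gene_b`, which is the behaviour the redundancy penalty is meant to produce.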
Affiliation(s)
- Eliezer Bose
  - Massachusetts General Hospital Institute of Health Professions, Boston, MA USA
- Elijah Paintsil
  - Department of Pediatrics, Yale University School of Medicine, New Haven, CT USA
- Musie Ghebremichael
  - Harvard Medical School, Cambridge, MA USA
  - Ragon Institute of MGH, MIT and Harvard, 400 Technology Square, Cambridge, MA 02129 USA
4
Elkhadrawi M, Stevens BA, Wheeler BJ, Akcakaya M, Wheeler S. Machine Learning Classification of False-Positive Human Immunodeficiency Virus Screening Results. J Pathol Inform 2021; 12:46. [PMID: 34934521 PMCID: PMC8652341 DOI: 10.4103/jpi.jpi_7_21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/20/2021] [Revised: 06/29/2021] [Accepted: 07/13/2021] [Indexed: 11/04/2022] Open
Abstract
BACKGROUND Human immunodeficiency virus (HIV) screening has improved significantly in the past decade as we have implemented tests that include antigen detection of p24. Incorporation of p24 detection narrows the window from 4 to 2 weeks between infection acquisition and ability to detect infection, reducing unintentional spread of HIV. The fourth- and fifth-generation HIV (HIV5G) screening tests in low prevalence populations have high numbers of false-positive screens and it is unclear if orthogonal testing improves diagnostic and public health outcomes. METHODS We used a cohort of 60,587 HIV5G screening tests with molecular and clinical correlates collected from 2016 to 2018 and applied machine learning to generate a classifier that could predict likely true and false positivity. RESULTS The best classification was achieved by using support vector machines and transformation of results with principal component analysis. The final classifier had an accuracy of 94% for correct classification of false-positive screens and an accuracy of 92% for classification of true-positive screens. CONCLUSIONS Implementation of this classifier as a screening method for all HIV5G reactive screens allows for improved workflow, with likely true positives reported immediately to reduce infection spread and initiate follow-up testing and treatment, and likely false positives undergoing orthogonal testing utilizing the same specimen already drawn to reduce distress and follow-up visits. Application of machine learning to the clinical laboratory allows for workflow improvement and decision support to provide improved patient care and public health.
Affiliation(s)
- Mahmoud Elkhadrawi
  - Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA
- Bryan A Stevens
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Bradley J Wheeler
  - Department of Pathology, School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, USA
- Murat Akcakaya
  - Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA
- Sarah Wheeler
  - Department of Pathology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
5
Machine Learning Refutes Loss of Smell as a Risk Indicator of Diabetes Mellitus. J Clin Med 2021; 10:4971. [PMID: 34768493 PMCID: PMC8584618 DOI: 10.3390/jcm10214971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 10/19/2021] [Accepted: 10/21/2021] [Indexed: 12/02/2022] Open
Abstract
Because diabetes is associated with central nervous changes, and olfactory dysfunction has been reported with increased prevalence among persons with diabetes, this study addressed the question of whether the risk of developing diabetes in the next 10 years is reflected in olfactory symptoms. In a cross-sectional study of 164 individuals seeking medical consultation for possible diabetes, olfactory function was evaluated using a standardized clinical test assessing olfactory threshold, odor discrimination, and odor identification. Metabolomics parameters were assessed via blood concentrations. The individual diabetes risk was quantified according to the validated German version of the “FINDRISK” diabetes risk score. Machine learning algorithms trained with metabolomics patterns predicted low or high diabetes risk with a balanced accuracy of 63–75%. Similarly, olfactory subtest results predicted the olfactory dysfunction category with a balanced accuracy of 85–94%, occasionally reaching 100%. However, olfactory subtest results failed to improve the prediction of diabetes risk based on metabolomics data, and metabolomics data did not improve the prediction of the olfactory dysfunction category based on olfactory subtest results. Results of the present study suggest that olfactory function is not a useful predictor of diabetes.
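Balanced accuracy, the metric reported above, is the mean of sensitivity and specificity for a two-class problem, so a classifier cannot score well simply by always predicting the larger class. The sketch below illustrates this on made-up data; the function names and the 90/10 class split are hypothetical and are not taken from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def plain_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical imbalanced sample: 90 low-risk (0) and 10 high-risk (1) people,
# scored by a trivial classifier that always predicts "low risk".
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100

acc = plain_accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy = {acc:.2f}, balanced accuracy = {bal_acc:.2f}")
```

Here plain accuracy looks high while balanced accuracy sits at chance level, which is why the latter is the more informative figure when classes are unequal in size.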
6
Hu H, Wang L, Li C, Ge W, Xia J. An improved method for the effect estimation of the intermediate event on the outcome based on the susceptible pre-identification. BMC Med Res Methodol 2021; 21:192. [PMID: 34548029 PMCID: PMC8454140 DOI: 10.1186/s12874-021-01378-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2021] [Accepted: 08/24/2021] [Indexed: 11/17/2022] Open
Abstract
Background In follow-up studies, the occurrence of the intermediate event may influence the risk of the outcome of interest. Existing methods estimate the effect of the intermediate event by including a time-varying covariate in the outcome model. However, the insusceptible fraction to the intermediate event in the study population has not been considered in the literature, leading to effect estimation bias due to the inaccurate dataset. Methods In this paper, we propose a new effect estimation method, in which the susceptible subpopulation is first identified so that the estimation can be conducted in the right population. Then, the effect is estimated via the extended Cox regression and landmark methods in the identified susceptible subpopulation. For susceptibility identification, patients with observed intermediate event time are classified as susceptible. Based on the mixture cure model fitted to the incidence and time of the intermediate event, the susceptibility of a patient with censored intermediate event time is predicted by the residual intermediate event time imputation. The effect estimation performance of the new method was investigated in various scenarios via Monte-Carlo simulations, with the performance of existing methods serving as the comparison. The application of the proposed method to mycosis fungoides data is reported as an example. Results The simulation results show that the estimation bias of the proposed method is smaller than that of the existing methods, especially in the case of a large insusceptible fraction. The results hold for small sample sizes. In addition, the estimation bias of the new method decreases as the number of covariates, especially continuous covariates, in the mixture cure model increases. The heterogeneity of the effect of covariates on the outcome in the insusceptible and susceptible subpopulations, as well as the landmark time, does not affect the estimation performance of the new method.
Conclusions Based on the pre-identification of the susceptible, the proposed new method could improve the effect estimation accuracy of the intermediate event on the outcome when there is an insusceptible fraction to the intermediate event in the study population. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-021-01378-8.
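For readers unfamiliar with the mixture cure model mentioned in the Methods, its standard textbook form is given below as a reminder. This is the generic formulation, not necessarily the exact parametrization used by the authors: the symbols $\pi$, $S_u$, $x$, $z$, and $\gamma$ are notation introduced here, and the logistic form of the incidence part is a common choice assumed for illustration.

```latex
% Population survival for time to the intermediate event:
% a fraction 1 - \pi(z) is insusceptible (never experiences the event),
% and the susceptible fraction \pi(z) follows the latency survival S_u.
S_{\mathrm{pop}}(t \mid x, z) = 1 - \pi(z) + \pi(z)\, S_u(t \mid x),
\qquad
\pi(z) = \frac{\exp(z^{\top}\gamma)}{1 + \exp(z^{\top}\gamma)}
```

Because $S_u(t \mid x) \to 0$ as $t \to \infty$, the population curve levels off at $1 - \pi(z)$, the insusceptible fraction that the abstract argues should be excluded before estimating the effect of the intermediate event.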
Affiliation(s)
- Haixia Hu
  - Department of Health Statistics, Faculty of Preventive Medicine, Air Force Medical University, No.169 Changle West Road, Xi'an, 710032, Shaanxi, China
- Ling Wang
  - Department of Health Statistics, Faculty of Preventive Medicine, Air Force Medical University, No.169 Changle West Road, Xi'an, 710032, Shaanxi, China
- Chen Li
  - Department of Health Statistics, Faculty of Preventive Medicine, Air Force Medical University, No.169 Changle West Road, Xi'an, 710032, Shaanxi, China
- Wei Ge
  - Department of Health Statistics, Faculty of Preventive Medicine, Air Force Medical University, No.169 Changle West Road, Xi'an, 710032, Shaanxi, China
- Jielai Xia
  - Department of Health Statistics, Faculty of Preventive Medicine, Air Force Medical University, No.169 Changle West Road, Xi'an, 710032, Shaanxi, China
7
Lötsch J, Hintschich CA, Petridis P, Pade J, Hummel T. Machine-Learning Points at Endoscopic, Quality of Life, and Olfactory Parameters as Outcome Criteria for Endoscopic Paranasal Sinus Surgery in Chronic Rhinosinusitis. J Clin Med 2021; 10:4245. [PMID: 34575356 PMCID: PMC8465949 DOI: 10.3390/jcm10184245] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/13/2021] [Revised: 09/09/2021] [Accepted: 09/15/2021] [Indexed: 12/26/2022] Open
Abstract
Chronic rhinosinusitis (CRS) is often treated by functional endoscopic paranasal sinus surgery, which improves endoscopic parameters and quality of life, while olfactory function was suggested as a further criterion of treatment success. In a prospective cohort study, 37 parameters from four categories were recorded from 60 men and 98 women before and four months after endoscopic sinus surgery, including endoscopic measures of nasal anatomy/pathology, assessments of olfactory function, quality of life, and socio-demographic or concomitant conditions. Parameters containing relevant information about changes associated with surgery were examined using unsupervised and supervised methods, including machine-learning techniques for feature selection. The analyzed cohort included 52 men and 38 women. Changes in the endoscopic Lildholdt score allowed separation of baseline from postoperative data with a cross-validated accuracy of 85%. Further relevant information included primary nasal symptoms from SNOT-20 assessments, and self-assessments of olfactory function. Overall improvement in these relevant parameters was observed in 95% of patients. A ranked list of criteria was developed as a proposal to assess the outcome of functional endoscopic sinus surgery in CRS patients with nasal polyposis. Three different facets were captured, including the Lildholdt score as an endoscopic measure and, in addition, disease-specific quality of life and subjectively perceived olfactory function.
Affiliation(s)
- Jörn Lötsch
  - Institute of Clinical Pharmacology, Goethe-University, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
  - Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, Theodor-Stern-Kai 7, 60596 Frankfurt am Main, Germany
- Constantin A. Hintschich
  - Department of Otorhinolaryngology, University of Regensburg, Franz-Josef-Strauß-Allee 11, 93053 Regensburg, Germany
  - Smell & Taste Clinic, Department of Otorhinolaryngology, TU Dresden, Fetscherstrasse 74, 01307 Dresden, Germany
- Petros Petridis
  - Department of Otorhinolaryngology, St. Johannes Municipal Hospital, Johannesstraße 9-17, 44137 Dortmund, Germany
- Jürgen Pade
  - Department of Otorhinolaryngology, St. Johannes Municipal Hospital, Johannesstraße 9-17, 44137 Dortmund, Germany
- Thomas Hummel
  - Smell & Taste Clinic, Department of Otorhinolaryngology, TU Dresden, Fetscherstrasse 74, 01307 Dresden, Germany
8
Use of machine learning techniques to identify HIV predictors for screening in sub-Saharan Africa. BMC Med Res Methodol 2021; 21:159. [PMID: 34332540 PMCID: PMC8325403 DOI: 10.1186/s12874-021-01346-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/30/2020] [Accepted: 07/13/2021] [Indexed: 11/17/2022] Open
Abstract
Aim HIV prevention measures in sub-Saharan Africa are still short of attaining the UNAIDS 90–90–90 fast track targets set in 2014. Identifying predictors for HIV status may facilitate targeted screening interventions that improve health care. We aimed to identify HIV predictors and to predict persons at high risk of infection. Method We applied machine learning approaches to build models using population-based HIV Impact Assessment (PHIA) data for 41,939 male and 45,105 female respondents, with 30 and 40 variables respectively, from four sub-Saharan African countries. We trained and validated the algorithms on 80% of the data and tested on the remaining 20%, rotating around the left-out country. The algorithm with the best mean F1 score was retained and trained on the most predictive variables. We used the model to identify people living with HIV and individuals with a higher likelihood of contracting the disease. Results The XGBoost algorithm appeared to significantly improve identification of HIV positivity over the other five algorithms, with mean F1 scores of 90% and 92% for males and females respectively. Among the eight most predictive features in both sexes were: age, relationship with family head, the highest level of education, highest grade at that school level, work for payment, avoiding pregnancy, age at the first experience of sex, and wealth quintile. Model performance using these variables increased significantly compared to having all the variables included. We identified five male and 19 female individuals that would require testing to find one HIV-positive individual. We also predicted that 4.14% of males and 10.81% of females are at high risk of infection. Conclusion Our findings point to a potential use of the XGBoost algorithm with socio-behavioural data for identifying HIV predictors and predicting individuals at high risk of infection for targeted screening.
Supplementary Information The online version contains supplementary material available at 10.1186/s12874-021-01346-2.
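Three quantities in the abstract above lend themselves to a short worked example: the F1 score used to rank algorithms, the number of people who must be tested to find one HIV-positive individual (the reciprocal of precision), and evaluation that rotates a held-out country. The sketch below is illustrative only. The confusion-matrix counts are invented, the country labels "A" to "D" are placeholders because the abstract does not name the countries, and the split generator shows only one possible reading of "rotated around the left-out country"; it does not reproduce the study's 80%/20% scheme.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def number_needed_to_test(tp, fp):
    """People flagged by the model per true positive found (1 / precision)."""
    return (tp + fp) / tp


def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx), holding out each group once."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# Hypothetical counts: 100 people flagged, 20 truly positive, 5 positives missed.
f1 = f1_score(tp=20, fp=80, fn=5)
nnt = number_needed_to_test(tp=20, fp=80)
print(f"F1 = {f1:.2f}, number needed to test = {nnt:.0f}")

# Hypothetical country label for each respondent row.
countries = ["A", "A", "B", "C", "D", "D"]
splits = list(leave_one_group_out(countries))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Holding out a whole country at a time tests whether a model transfers to a population it has not seen, which is a stricter check than a random split of pooled respondents.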