1
Orlenko A, Freda PJ, Ghosh A, Choi H, Matsumoto N, Bright TJ, Walker CT, Obafemi-Ajayi T, Moore JH. Cluster Analysis reveals Socioeconomic Disparities among Elective Spine Surgery Patients. Pac Symp Biocomput 2024; 29:359-373. [PMID: 38160292]
Abstract
This work demonstrates the use of cluster analysis in detecting fair and unbiased novel discoveries. Given a sample population of elective spinal fusion patients, we identify two overarching subgroups driven by insurance type. The Medicare group, associated with lower socioeconomic status, exhibited an over-representation of negative risk factors. The findings provide a compelling depiction of the interwoven socioeconomic and racial disparities present within the healthcare system, highlighting their consequential effects on health inequalities. The results are intended to guide design of fair and precise machine learning models based on intentional integration of population stratification.
Affiliation(s)
- Alena Orlenko
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA. *These authors contributed equally to the paper
2
Cho H, Ren Z, Divaris K, Roach J, Lin BM, Liu C, Azcarate-Peril MA, Simancas-Pallares MA, Shrestha P, Orlenko A, Ginnis J, North KE, Zandona AGF, Ribeiro AA, Wu D, Koo H. Selenomonas sputigena acts as a pathobiont mediating spatial structure and biofilm virulence in early childhood caries. Nat Commun 2023; 14:2919. [PMID: 37217495 PMCID: PMC10202936 DOI: 10.1038/s41467-023-38346-3] [Received: 06/11/2022] [Accepted: 04/21/2023]
Abstract
Streptococcus mutans has been implicated as the primary pathogen in childhood caries (tooth decay). While the role of polymicrobial communities is appreciated, it remains unclear whether other microorganisms are active contributors or interact with pathogens. Here, we integrate multi-omics of supragingival biofilm (dental plaque) from 416 preschool-age children (208 males and 208 females) in a discovery-validation pipeline to identify disease-relevant inter-species interactions. Sixteen taxa associate with childhood caries in metagenomics-metatranscriptomics analyses. Using multiscale/computational imaging and virulence assays, we examine biofilm formation dynamics, spatial arrangement, and metabolic activity of Selenomonas sputigena, Prevotella salivae and Leptotrichia wadei, either individually or with S. mutans. We show that S. sputigena, a flagellated anaerobe with previously unknown role in supragingival biofilm, becomes trapped in streptococcal exoglucans, loses motility but actively proliferates to build a honeycomb-like multicellular-superstructure encapsulating S. mutans, enhancing acidogenesis. Rodent model experiments reveal an unrecognized ability of S. sputigena to colonize supragingival tooth surfaces. While incapable of causing caries on its own, when co-infected with S. mutans, S. sputigena causes extensive tooth enamel lesions and exacerbates disease severity in vivo. In summary, we discover a pathobiont cooperating with a known pathogen to build a unique spatial structure and heighten biofilm virulence in a prevalent human disease.
Affiliation(s)
- Hunyong Cho
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Zhi Ren
- Biofilm Research Laboratories, Center for Innovation & Precision Dentistry, School of Dental Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Kimon Divaris
- Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Epidemiology, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Jeffrey Roach
- UNC Information Technology Services and Research Computing, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- UNC Microbiome Core, Center for Gastrointestinal Biology and Disease, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Bridget M Lin
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Chuwen Liu
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- M Andrea Azcarate-Peril
- UNC Microbiome Core, Center for Gastrointestinal Biology and Disease, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Medicine, Division of Gastroenterology and Hepatology, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Miguel A Simancas-Pallares
- Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Poojan Shrestha
- Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Epidemiology, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Alena Orlenko
- Artificial Intelligence Innovation Lab, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Jeannie Ginnis
- Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Kari E North
- Department of Epidemiology, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Apoena Aguiar Ribeiro
- Division of Diagnostic Sciences, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Di Wu
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Division of Oral and Craniofacial Health Sciences, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Hyun Koo
- Biofilm Research Laboratories, Center for Innovation & Precision Dentistry, School of Dental Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Orthodontics, School of Dental Medicine, University of Pennsylvania, Philadelphia, PA, USA
3
Vidula MK, Orlenko A, Zhao L, Salvador L, Small AM, Horton E, Cohen JB, Adusumalli S, Denduluri S, Kobayashi T, Hyman M, Fiorilli P, Magro C, Singh B, Pourmussa B, Greczylo C, Basso M, Ebert C, Yarde M, Li Z, Cvijic ME, Wang Z, Walsh A, Maranville J, Kick E, Luettgen J, Adam L, Schafer P, Ramirez-Valle F, Seiffert D, Moore JH, Gordon D, Chirinos JA. Plasma biomarkers associated with adverse outcomes in patients with calcific aortic stenosis. Eur J Heart Fail 2021; 23:2021-2032. [PMID: 34632675 DOI: 10.1002/ejhf.2361] [Received: 04/19/2021] [Revised: 09/29/2021] [Accepted: 10/06/2021]
Abstract
AIMS Enhanced risk stratification of patients with aortic stenosis (AS) is necessary to identify patients at high risk for adverse outcomes, and may allow for better management of patient subgroups at high risk of myocardial damage. The objective of this study was to identify plasma biomarkers and multimarker profiles associated with adverse outcomes in AS. METHODS AND RESULTS We studied 708 patients with calcific AS and measured 49 biomarkers using a Luminex platform. We studied the correlation between biomarkers and the risk of (i) death and (ii) death or heart failure-related hospital admission (DHFA). We also utilized machine-learning methods (a tree-based pipeline optimizer platform) to develop multimarker models associated with the risk of death and DHFA. In this cohort with a median follow-up of 2.8 years, multiple biomarkers were significantly predictive of death in analyses adjusted for clinical confounders, including tumour necrosis factor (TNF)-α [hazard ratio (HR) 1.28, P < 0.0001], TNF receptor 1 (TNFRSF1A; HR 1.38, P < 0.0001), fibroblast growth factor (FGF)-23 (HR 1.22, P < 0.0001), N-terminal pro B-type natriuretic peptide (NT-proBNP) (HR 1.58, P < 0.0001), matrix metalloproteinase-7 (HR 1.24, P = 0.0002), syndecan-1 (HR 1.27, P = 0.0002), suppression of tumorigenicity-2 (ST2) (IL1RL1; HR 1.22, P = 0.0002), interleukin (IL)-8 (CXCL8; HR 1.22, P = 0.0005), pentraxin (PTX)-3 (HR 1.17, P = 0.001), neutrophil gelatinase-associated lipocalin (LCN2; HR 1.18, P < 0.0001), osteoprotegerin (OPG) (TNFRSF11B; HR 1.26, P = 0.0002), and endostatin (COL18A1; HR 1.28, P = 0.0012). 
Several biomarkers were also significantly predictive of DHFA in adjusted analyses including FGF-23 (HR 1.36, P < 0.0001), TNF-α (HR 1.26, P < 0.0001), TNFR1 (HR 1.34, P < 0.0001), angiopoietin-2 (HR 1.26, P < 0.0001), syndecan-1 (HR 1.23, P = 0.0006), ST2 (HR 1.27, P < 0.0001), IL-8 (HR 1.18, P = 0.0009), PTX-3 (HR 1.18, P = 0.0002), OPG (HR 1.20, P = 0.0013), and NT-proBNP (HR 1.63, P < 0.0001). Machine-learning multimarker models were strongly associated with adverse outcomes (mean 1-year probability of death of 0%, 2%, and 60%; mean 1-year probability of DHFA of 0%, 4%, 97%; P < 0.0001). In these models, IL-6 (a biomarker of inflammation) and FGF-23 (a biomarker of calcification) emerged as the biomarkers of highest importance. CONCLUSIONS Plasma biomarkers are strongly associated with the risk of adverse outcomes in patients with AS. Biomarkers of inflammation and calcification were most strongly related to prognosis.
Affiliation(s)
- Mahesh K Vidula
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
- Alena Orlenko
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Lei Zhao
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
- Lisa Salvador
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
- Aeron M Small
- Division of Cardiovascular Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Edward Horton
- Department of Internal Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
- Jordana B Cohen
- Renal-Electrolyte and Hypertension Division, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Srinath Adusumalli
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
- Srinivas Denduluri
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
- Taisei Kobayashi
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
- Matthew Hyman
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
- Paul Fiorilli
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
- Caroline Magro
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Bibi Singh
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Bianca Pourmussa
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Candy Greczylo
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Michael Basso
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
- Melissa Yarde
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
- Zhuyin Li
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
- Zhaoqing Wang
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
- Alice Walsh
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
- Ellen Kick
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
- Leonard Adam
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
- Peter Schafer
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
- Jason H Moore
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- David Gordon
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
- Julio A Chirinos
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA; University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
4
Vidula M, Orlenko A, Zhao L, Salvador L, Small A, Horton E, Cohen J, Magro C, Singh B, Pourmussa B, Greczylo C, Yarde M, Li Z, Cvijic ME, Wang Z, Schafer P, Ramirez-Valle F, Seiffert D, Gordon D, Rader D, Chirinos J. PLASMA BIOMARKERS FOR RISK STRATIFICATION OF OUTCOMES IN PATIENTS WITH AORTIC STENOSIS. J Am Coll Cardiol 2021. [DOI: 10.1016/s0735-1097(21)03109-0]
5
Orlenko A, Moore JH. A comparison of methods for interpreting random forest models of genetic association in the presence of non-additive interactions. BioData Min 2021; 14:9. [PMID: 33514397 PMCID: PMC7847145 DOI: 10.1186/s13040-021-00243-0] [Received: 07/01/2020] [Accepted: 01/13/2021]
Abstract
BACKGROUND Non-additive interactions among genes are frequently associated with a number of phenotypes, including known complex diseases such as Alzheimer's, diabetes, and cardiovascular disease. Detecting interactions requires careful selection of analytical methods, and some machine learning algorithms are unable or underpowered to detect or model feature interactions that exhibit non-additivity. The Random Forest method is often employed in these efforts due to its ability to detect and model non-additive interactions. In addition, Random Forest has the built-in ability to estimate feature importance scores, a characteristic that allows the model to be interpreted with the order and effect size of the feature association with the outcome. This characteristic is very important for epidemiological and clinical studies where results of predictive modeling could be used to define the future direction of the research efforts. Alternative ways to interpret the model are the permutation feature importance metric, which employs a permutation approach to calculate a feature contribution coefficient in units of the decrease in the model's performance, and Shapley additive explanations, which employ a cooperative game theory approach. Currently, it is unclear which Random Forest feature importance metric provides a superior estimation of the true informative contribution of features in genetic association analysis. RESULTS To address this issue, and to improve interpretability of Random Forest predictions, we compared different methods for feature importance estimation in real and simulated datasets with non-additive interactions. As a result, we detected a discrepancy between the metrics for the real-world datasets and further established that the permutation feature importance metric provides more precise feature importance rank estimation for the simulated datasets with non-additive interactions.
CONCLUSIONS By analyzing both real and simulated data, we established that the permutation feature importance metric provides more precise feature importance rank estimation in the presence of non-additive interactions.
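The permutation feature importance idea described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy "model" is a hand-written XOR rule standing in for a fitted Random Forest, the data are simulated, and the function names are hypothetical. It is not the authors' code or evaluation setup.

```python
import random

def xor_model(row):
    """Hypothetical fitted model: predicts the XOR of features 0 and 1
    (a purely non-additive interaction) and ignores feature 2."""
    return row[0] ^ row[1]

def accuracy(model, X, y):
    """Fraction of rows the model classifies correctly."""
    return sum(model(r) == t for r, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Importance of feature j = mean drop in accuracy after shuffling column j."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [r[j] for r in X]
            rng.shuffle(col)  # break the link between feature j and the outcome
            X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Simulated data: the outcome depends only on the interaction of features 0 and 1.
data_rng = random.Random(1)
X = [[data_rng.randint(0, 1) for _ in range(3)] for _ in range(400)]
y = [r[0] ^ r[1] for r in X]

scores = permutation_importance(xor_model, X, y)
print(scores)  # features 0 and 1 show a large drop; feature 2 shows none
```

Neither interacting feature has a marginal effect on the outcome here, yet shuffling either one degrades accuracy, which is why a permutation-based score can still rank them above the noise feature.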
Affiliation(s)
- Alena Orlenko
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Jason H Moore
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
6
Heimisdottir LH, Lin BM, Cho H, Orlenko A, Ribeiro AA, Simon-Soro A, Roach J, Shungin D, Ginnis J, Simancas-Pallares MA, Spangler HD, Zandoná AGF, Wright JT, Ramamoorthy P, Moore JH, Koo H, Wu D, Divaris K. Metabolomics Insights in Early Childhood Caries. J Dent Res 2021; 100:615-622. [PMID: 33423574 DOI: 10.1177/0022034520982963]
Abstract
Dental caries is characterized by a dysbiotic shift at the biofilm-tooth surface interface, yet comprehensive biochemical characterizations of the biofilm are scant. We used metabolomics to identify biochemical features of the supragingival biofilm associated with early childhood caries (ECC) prevalence and severity. The study's analytical sample comprised 289 children ages 3 to 5 (51% with ECC) who attended public preschools in North Carolina and were enrolled in a community-based cross-sectional study of early childhood oral health. Clinical examinations were conducted by calibrated examiners in community locations using International Caries Detection and Classification System (ICDAS) criteria. Supragingival plaque collected from the facial/buccal surfaces of all primary teeth in the upper-left quadrant was analyzed using ultra-performance liquid chromatography-tandem mass spectrometry. Associations between individual metabolites and 18 clinical traits (based on different ECC definitions and sets of tooth surfaces) were quantified using Brownian distance correlations (dCor) and linear regression modeling of log2-transformed values, applying a false discovery rate multiple testing correction. A tree-based pipeline optimization tool (TPOT)-machine learning process was used to identify the best-fitting ECC classification metabolite model. There were 503 named metabolites identified, including microbial, host, and exogenous biochemicals. Most significant ECC-metabolite associations were positive (i.e., upregulations/enrichments). The localized ECC case definition (ICDAS ≥1 caries experience within the surfaces from which plaque was collected) had the strongest correlation with the metabolome (dCor P = 8 × 10⁻³).
Sixteen metabolites were significantly associated with ECC after multiple testing correction, including fucose (P = 3.0 × 10⁻⁶) and N-acetylneuraminate (P = 6.8 × 10⁻⁶) with higher ECC prevalence, as well as catechin (P = 4.7 × 10⁻⁶) and epicatechin (P = 2.9 × 10⁻⁶) with lower. Catechin, epicatechin, imidazole propionate, fucose, 9,10-DiHOME, and N-acetylneuraminate were among the top 15 metabolites in terms of ECC classification importance in the automated TPOT model. These supragingival biofilm metabolite findings provide novel insights in ECC biology and can serve as the basis for the development of measures of disease activity or risk assessment.
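The abstract mentions a false discovery rate correction without naming the procedure. As a hedged illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up method; the function name and the p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here as one common FDR procedure; the abstract
    does not state which correction the authors applied.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for five metabolite-trait tests.
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(p)
significant = [qi <= 0.05 for qi in q]
print(q)
print(significant)  # only the first two tests survive at FDR 0.05
```

Note that the third and fourth tests have raw p-values below 0.05 but fail after adjustment, which is the behavior a multiple testing correction is meant to produce when many metabolites are screened at once.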
Affiliation(s)
- L H Heimisdottir
- Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina, Chapel Hill, NC, USA
- B M Lin
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- H Cho
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- A Orlenko
- Department of Biostatistics, Epidemiology and Informatics, Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
- A A Ribeiro
- Division of Diagnostic Sciences, Adams School of Dentistry, University of North Carolina, Chapel Hill, NC, USA
- A Simon-Soro
- Biofilm Research Labs, Center for Innovation and Precision Dentistry, School of Dental Medicine and School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, PA, USA; Department of Orthodontics and Divisions of Pediatric Dentistry and Community Oral Health, School of Dental Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Stomatology, School of Dentistry, University of Sevilla, Sevilla, Spain
- J Roach
- Research Computing, University of North Carolina, Chapel Hill, NC, USA
- D Shungin
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Odontology, Umeå University, Umeå, Sweden
- J Ginnis
- Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina, Chapel Hill, NC, USA
- M A Simancas-Pallares
- Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina, Chapel Hill, NC, USA
- H D Spangler
- Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina, Chapel Hill, NC, USA
- A G Ferreira Zandoná
- Department of Comprehensive Care, School of Dental Medicine, Tufts University, Boston, MA, USA
- J T Wright
- Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina, Chapel Hill, NC, USA
- J H Moore
- Department of Biostatistics, Epidemiology and Informatics, Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
- H Koo
- Biofilm Research Labs, Center for Innovation and Precision Dentistry, School of Dental Medicine and School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, PA, USA; Department of Orthodontics and Divisions of Pediatric Dentistry and Community Oral Health, School of Dental Medicine, University of Pennsylvania, Philadelphia, PA, USA
- D Wu
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA; Division of Oral & Craniofacial Health Sciences, School of Dentistry, University of North Carolina, Chapel Hill, NC, USA
- K Divaris
- Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina, Chapel Hill, NC, USA; Department of Epidemiology, Gillings School of Public Health, University of North Carolina, Chapel Hill, NC, USA
7
Orlenko A, Kofink D, Lyytikäinen LP, Nikus K, Mishra P, Kuukasjärvi P, Karhunen PJ, Kähönen M, Laurikka JO, Lehtimäki T, Asselbergs FW, Moore JH. Model selection for metabolomics: predicting diagnosis of coronary artery disease using automated machine learning. Bioinformatics 2020; 36:1772-1778. [PMID: 31702773 PMCID: PMC7703753 DOI: 10.1093/bioinformatics/btz796] [Received: 03/18/2019] [Revised: 10/03/2019] [Accepted: 10/30/2019]
Abstract
Motivation: Selecting the optimal machine learning (ML) model for a given dataset is often challenging. Automated ML (AutoML) has emerged as a powerful tool for enabling the automatic selection of ML methods and parameter settings for the prediction of biomedical endpoints. Here, we apply the tree-based pipeline optimization tool (TPOT) to predict angiographic diagnoses of coronary artery disease (CAD). With TPOT, ML models are represented as expression trees and optimal pipelines are discovered using a stochastic search method called genetic programming. We provide some guidelines for TPOT-based ML pipeline selection and optimization based on various clinical phenotypes and high-throughput metabolic profiles in the Angiography and Genes Study (ANGES). Results: We analyzed nuclear magnetic resonance-derived lipoprotein and metabolite profiles in the ANGES cohort with the goal of identifying the role of non-obstructive CAD patients in CAD diagnostics. We performed a comparative analysis of TPOT-generated ML pipelines with selected ML classifiers, optimized with a grid search approach, applied to two phenotypic CAD profiles. As a result, TPOT-generated ML pipelines outperformed grid search-optimized models across multiple performance metrics, including balanced accuracy and area under the precision-recall curve. With the selected models, we demonstrated that the phenotypic profile that distinguishes non-obstructive CAD patients from no-CAD patients is associated with higher precision, suggesting a discrepancy in the underlying processes between these phenotypes. Availability and implementation: TPOT is freely available via http://epistasislab.github.io/tpot/. Supplementary information: Supplementary data are available at Bioinformatics online.
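Balanced accuracy, one of the metrics named in this abstract, is the mean of per-class recall. The sketch below shows why it is preferred over plain accuracy when classes are imbalanced. The labels and the function name are hypothetical and for illustration only; they are not ANGES data or TPOT code.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally
    regardless of how many samples it has."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical imbalanced cohort: 90 controls (0) and 10 cases (1).
y_true = [0] * 90 + [1] * 10
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal = balanced_accuracy(y_true, y_pred)
print(plain, bal)  # plain accuracy looks high; balanced accuracy is at chance level
```

A model-selection procedure scored on plain accuracy would reward the trivial classifier above, whereas balanced accuracy exposes that it never identifies a case.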
Affiliation(s)
- Alena Orlenko
- Department of Biostatistics, Epidemiology and Informatics, Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Daniel Kofink
- Department of Cardiology, Division Heart and Lungs, Utrecht, The Netherlands
- Kjell Nikus
- Department of Clinical Chemistry, Fimlab Laboratories, Tampere, Finland; Department of Cardiology, Tampere University Hospital, Tampere, Finland
- Pashupati Mishra
- Department of Clinical Chemistry, Fimlab Laboratories, Tampere, Finland; Department of Cardiology, Tampere University Hospital, Tampere, Finland
- Pekka Kuukasjärvi
- Department of Cardio-Thoracic Surgery, Heart Center, Tampere University Hospital, Tampere, Finland
- Pekka J Karhunen
- Department of Forensic Medicine, Fimlab Laboratories, Tampere, Finland
- Mika Kähönen
- Department of Clinical Physiology, Tampere University Hospital, Tampere, Finland
- Jari O Laurikka
- Department of Cardio-Thoracic Surgery, Heart Center, Tampere University Hospital, Tampere, Finland
- Terho Lehtimäki
- Department of Clinical Chemistry, Fimlab Laboratories, Tampere, Finland; Department of Cardiology, Tampere University Hospital, Tampere, Finland
- Folkert W Asselbergs
- Department of Cardiology, Division Heart and Lungs, Utrecht, The Netherlands; Health Data Research UK London, Institute for Health Informatics, University College London, London, UK; Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London, London, UK
- Jason H Moore
- Department of Biostatistics, Epidemiology and Informatics, Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
8
Chirinos JA, Orlenko A, Zhao L, Basso MD, Cvijic ME, Li Z, Spires TE, Yarde M, Wang Z, Seiffert DA, Prenner S, Zamani P, Bhattacharya P, Kumar A, Margulies KB, Car BD, Gordon DA, Moore JH, Cappola TP. Multiple Plasma Biomarkers for Risk Stratification in Patients With Heart Failure and Preserved Ejection Fraction. J Am Coll Cardiol 2020; 75:1281-1295. [PMID: 32192654 PMCID: PMC7147356 DOI: 10.1016/j.jacc.2019.12.069] [Received: 07/18/2019] [Revised: 12/22/2019] [Accepted: 12/23/2019]
Abstract
BACKGROUND Better risk stratification strategies are needed to enhance clinical care and trial design in heart failure with preserved ejection fraction (HFpEF). OBJECTIVES The purpose of this study was to assess the value of a targeted plasma multi-marker approach to enhance our phenotypic characterization and risk prediction in HFpEF. METHODS In this study, the authors measured 49 plasma biomarkers from TOPCAT (Treatment of Preserved Cardiac Function Heart Failure With an Aldosterone Antagonist) trial participants (n = 379) using a Multiplex assay. The relationship between biomarkers and the risk of all-cause death or heart failure-related hospital admission (DHFA) was assessed. A tree-based pipeline optimizer platform was used to generate a multimarker predictive model for DHFA. We validated the model in an independent cohort of HFpEF patients enrolled in the PHFS (Penn Heart Failure Study) (n = 156). RESULTS Two large, tightly related dominant biomarker clusters were found, which included biomarkers of fibrosis/tissue remodeling, inflammation, renal injury/dysfunction, and liver fibrosis. Other clusters were composed of neurohormonal regulators of mineral metabolism, intermediary metabolism, and biomarkers of myocardial injury. Multiple biomarkers predicted incident DHFA, including 2 biomarkers related to mineral metabolism/calcification (fibroblast growth factor-23 and OPG [osteoprotegerin]), 3 inflammatory biomarkers (tumor necrosis factor-alpha, sTNFRI [soluble tumor necrosis factor-receptor I], and interleukin-6), YKL-40 (related to liver injury and inflammation), 2 biomarkers related to intermediary metabolism and adipocyte biology (fatty acid binding protein-4 and growth differentiation factor-15), angiopoietin-2 (related to angiogenesis), matrix metalloproteinase-7 (related to extracellular matrix turnover), ST-2, and N-terminal pro-B-type natriuretic peptide. 
A machine-learning-derived model using a combination of biomarkers was strongly predictive of the risk of DHFA (standardized hazard ratio: 2.85; 95% confidence interval: 2.03 to 4.02; p < 0.0001) and markedly improved the risk prediction when added to the MAGGIC (Meta-Analysis Global Group in Chronic Heart Failure Risk Score) risk score. In an independent cohort (PHFS), the model strongly predicted the risk of DHFA (standardized hazard ratio: 2.74; 95% confidence interval: 1.93 to 3.90; p < 0.0001), which was also independent of the MAGGIC risk score. CONCLUSIONS Various novel circulating biomarkers in key pathophysiological domains are predictive of outcomes in HFpEF, and a multimarker approach coupled with machine-learning represents a promising strategy for enhancing risk stratification in HFpEF.
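The reported hazard ratios, confidence intervals, and p-values can be cross-checked with a short calculation. The sketch below assumes the intervals are Wald intervals that are symmetric on the log hazard scale, which the abstract does not state; the function name is hypothetical. Only the figures quoted in the abstract are used as inputs.

```python
import math

def wald_check(hr, lower, upper, z_crit=1.959964):
    """Recover the standard error of log(HR) from a 95% CI, then the
    Wald z statistic and two-sided p-value.

    Assumption: the CI was built as exp(log(HR) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

# Derivation cohort (TOPCAT): HR 2.85, 95% CI 2.03 to 4.02.
se_d, z_d, p_d = wald_check(2.85, 2.03, 4.02)
# Validation cohort (PHFS): HR 2.74, 95% CI 1.93 to 3.90.
se_v, z_v, p_v = wald_check(2.74, 1.93, 3.90)

print(round(z_d, 2), p_d < 0.0001)
print(round(z_v, 2), p_v < 0.0001)
```

Under this assumption both z statistics come out well above the roughly 3.9 needed for a two-sided p below 0.0001, which is consistent with the p-values reported for the two cohorts.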
Affiliation(s)
- Julio A Chirinos
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania; University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania.
| | - Alena Orlenko
- University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania
| | - Lei Zhao
- Bristol-Myers Squibb Company, Lawrenceville, New Jersey
| | | | | | - Zhuyin Li
- Bristol-Myers Squibb Company, Lawrenceville, New Jersey
| | | | - Melissa Yarde
- Bristol-Myers Squibb Company, Lawrenceville, New Jersey
| | - Zhaoqing Wang
- Bristol-Myers Squibb Company, Lawrenceville, New Jersey
| | | | - Stuart Prenner
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania; University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania
| | - Payman Zamani
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania; University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania
| | - Priyanka Bhattacharya
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania; University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania
| | - Anupam Kumar
- Vanderbilt University Medical Center, Nashville, Tennessee
| | - Kenneth B Margulies
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania; University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania
| | - Bruce D Car
- Bristol-Myers Squibb Company, Lawrenceville, New Jersey
| | | | - Jason H Moore
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania; University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania
| | - Thomas P Cappola
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania; University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania
|
9
|
Orlenko A, Moore JH, Orzechowski P, Olson RS, Cairns J, Caraballo PJ, Weinshilboum RM, Wang L, Breitenstein MK. Considerations for automated machine learning in clinical metabolic profiling: Altered homocysteine plasma concentration associated with metformin exposure. Pac Symp Biocomput 2018; 23:460-471. [PMID: 29218905 PMCID: PMC5882490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
With the maturation of metabolomics science and proliferation of biobanks, clinical metabolic profiling is an increasingly opportune frontier for advancing translational clinical research. Automated Machine Learning (AutoML) approaches provide an exciting opportunity to guide feature selection in agnostic metabolic profiling endeavors, where potentially thousands of independent data points must be evaluated. In previous research, AutoML using high-dimensional data of varying types has been demonstrably robust, outperforming traditional approaches. However, considerations for application in clinical metabolic profiling remain to be evaluated, particularly regarding the robustness of AutoML in identifying and adjusting for common clinical confounders. In this study, we present a focused case study regarding AutoML considerations for using the Tree-based Pipeline Optimization Tool (TPOT) in metabolic profiling of exposure to metformin in a biobank cohort. First, we propose a tandem rank-accuracy measure to guide agnostic feature selection and corresponding threshold determination in clinical metabolic profiling endeavors. Second, while AutoML with default parameters showed a potential lack of sensitivity to low-effect confounding clinical covariates, we demonstrate residual training and adjustment of metabolite features as an easily applicable approach to ensure that AutoML adjusts for potential confounding characteristics. Finally, we present increased homocysteine with long-term exposure to metformin as a potentially novel, non-replicated metabolite association suggested by TPOT; this association was not identified in parallel clinical metabolic profiling endeavors. While warranting independent replication, our tandem rank-accuracy measure suggests homocysteine to be the metabolite feature with the largest effect and, correspondingly, the highest priority for further translational clinical research.
Residual training and adjustment for a potential confounding effect by BMI only slightly modified the suggested association. Increased homocysteine is thought to be associated with vitamin B12 deficiency; evaluation for potential clinical relevance is suggested. While considerations for clinical metabolic profiling are recommended, including adjustment approaches for clinical confounders, AutoML presents an exciting tool to enhance clinical metabolic profiling and advance translational research endeavors.
Affiliation(s)
- Alena Orlenko
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
|
10
|
Orlenko A, Chi PB, Liberles DA. Characterizing the roles of changing population size and selection on the evolution of flux control in metabolic pathways. BMC Evol Biol 2017; 17:117. [PMID: 28545395 PMCID: PMC5445498 DOI: 10.1186/s12862-017-0962-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 10/21/2016] [Accepted: 05/09/2017] [Indexed: 12/20/2022] Open
Abstract
Background Understanding the genotype-phenotype map is fundamental to our understanding of genomes. Genes do not function independently, but rather as part of networks or pathways. In the case of metabolic pathways, flux through the pathway is an important next layer of biological organization up from the individual gene or protein. Flux control in metabolic pathways, reflecting the importance of mutation to individual enzyme genes, may be evolutionarily variable due to the role of mutation-selection-drift balance. The evolutionary stability of rate limiting steps and the patterns of inter-molecular co-evolution were evaluated in a simulated pathway with a system out of equilibrium due to fluctuating selection, fluctuating population size, or positive directional selection, to contrast with those under stabilizing selection. Results Depending upon the underlying population genetic regime, fluctuating population size was found to increase the evolutionary stability of rate limiting steps in some scenarios. This result was linked to patterns of local adaptation of the population. Further, during positive directional selection, as with more complex mutational scenarios, increased inter-molecular co-evolution was observed. Conclusions Differences in patterns of evolution when systems are in and out of equilibrium, including during positive directional selection, may lead to predictable differences in observed patterns for divergent evolutionary scenarios. In particular, this result might be harnessed to detect differences between compensatory processes and directional processes at the pathway level based upon evolutionary observations in individual proteins. Detecting functional shifts in pathways reflects an important milestone in predicting when changes in genotypes result in changes in phenotypes.
Electronic supplementary material The online version of this article (doi:10.1186/s12862-017-0962-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Alena Orlenko
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA; Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA
| | - Peter B Chi
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA; Department of Mathematics and Computer Science, Ursinus College, Collegeville, PA, 19426, USA
| | - David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA; Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA
|
11
|
Orlenko A, Teufel AI, Chi PB, Liberles DA. Selection on metabolic pathway function in the presence of mutation-selection-drift balance leads to rate-limiting steps that are not evolutionarily stable. Biol Direct 2016; 11:31. [PMID: 27393343 PMCID: PMC4938953 DOI: 10.1186/s13062-016-0133-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/21/2016] [Accepted: 07/02/2016] [Indexed: 11/15/2022] Open
Abstract
Background While it is commonly assumed in the biochemistry community that the control of metabolic pathways is critical to cellular function, it is unclear whether metabolic pathways generally have evolutionarily stable rate limiting (flux controlling) steps. Results A set of evolutionary simulations using a kinetic model of a metabolic pathway was performed under different conditions to evaluate the evolutionary stability of rate limiting steps. Simulations used combinations of selection for steady state flux, selection against the cost of molecular biosynthesis, and selection against the accumulation of high concentrations of a deleterious intermediate. Two mutational regimes were used, one with mutations that on average were neutral to molecular phenotype and a second with a preponderance of activity-destroying mutations. The evolutionary stability of rate limiting steps was low in all simulations with non-neutral mutational processes. Clustering of parameter co-evolution showed divergent inter-molecular evolutionary patterns under different evolutionary regimes. Conclusions This study provides a null model for pathway evolution when compensatory processes dominate, with potential applications to predicting pathway functional change. This result also suggests a possible mechanism by which studies in statistical genetics that aim to associate a genotype with a phenotype, assuming independent action of variants, may be mis-specified through a mis-characterization of the link between individual gene function and pathway function. A better understanding of the genotype-phenotype map has potential applications in differentiating between compensatory changes and directional selection on pathways as well as detecting SNPs and fixed differences that might have phenotypic effects. Reviewers This article was reviewed by Arne Elofsson, David Ardell, and Shamil Sunyaev.
Electronic supplementary material The online version of this article (doi:10.1186/s13062-016-0133-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Alena Orlenko
- Center for Computational Genetics and Genomics and Department of Biology, Temple University, Bio-Life Building, 1900 N. 12th Street, Philadelphia, PA, 19122-1801, USA; Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA
| | - Ashley I Teufel
- Center for Computational Genetics and Genomics and Department of Biology, Temple University, Bio-Life Building, 1900 N. 12th Street, Philadelphia, PA, 19122-1801, USA; Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA
| | - Peter B Chi
- Center for Computational Genetics and Genomics and Department of Biology, Temple University, Bio-Life Building, 1900 N. 12th Street, Philadelphia, PA, 19122-1801, USA; Department of Mathematics and Computer Science, Ursinus College, Collegeville, PA, 19426, USA
| | - David A Liberles
- Center for Computational Genetics and Genomics and Department of Biology, Temple University, Bio-Life Building, 1900 N. 12th Street, Philadelphia, PA, 19122-1801, USA; Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA
|
12
|
Abstract
Biochemical thought posits that rate-limiting steps (defined here as points of flux control) are strongly selected as points of pathway regulation and control and are thus expected to be evolutionarily conserved. Conversely, population genetic thought based upon the concepts of mutation-selection-drift balance at the pathway level might suggest variation in flux controlling steps over evolutionary time. Glycolysis, as one of the most conserved and best characterized pathways, was studied to evaluate its evolutionary conservation. The flux controlling step in glycolysis was found to vary over the tree of life. Further, phylogenetic analysis suggested at least 60 events of gene duplication and additional events of putative positive selection that might alter pathway kinetic properties. Together, these results suggest that even with presumed largely negative selection on pathway output on glycolysis, the co-evolutionary process under the hood is dynamic.
Affiliation(s)
- Alena Orlenko
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA; Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA
| | - Russell A Hermansen
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA; Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA
| | - David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA; Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA
|