1
Velez-Arce A, Huang K, Li MM, Lin X, Gao W, Fu T, Kellis M, Pentelute BL, Zitnik M. TDC-2: Multimodal Foundation for Therapeutic Science. bioRxiv: The Preprint Server for Biology 2024:2024.06.12.598655. [PMID: 38948789 PMCID: PMC11212894 DOI: 10.1101/2024.06.12.598655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Therapeutics Data Commons (tdcommons.ai) is an open science initiative with unified datasets, AI models, and benchmarks to support research across therapeutic modalities and drug discovery and development stages. The Commons 2.0 (TDC-2) is a comprehensive overhaul of Therapeutics Data Commons to catalyze research in multimodal models for drug discovery by unifying single-cell biology of diseases, biochemistry of molecules, and effects of drugs through multimodal datasets, AI-powered API endpoints, new multimodal tasks and model frameworks, and comprehensive benchmarks. TDC-2 introduces over 1,000 multimodal datasets spanning approximately 85 million cells, pre-calculated embeddings from 5 state-of-the-art single-cell models, and a biomedical knowledge graph. TDC-2 drastically expands the coverage of ML tasks across therapeutic pipelines and 10+ new modalities, including but not limited to single-cell gene expression data, clinical trial data, peptide sequence data, peptidomimetic protein-peptide interaction data on newly discovered ligands identified by affinity selection-mass spectrometry (AS-MS), novel 3D structural data for proteins, and cell-type-specific protein-protein interaction networks at single-cell resolution. TDC-2 provides multimodal data access through an API-first design based on the model-view-controller paradigm. TDC-2 introduces 7 novel ML tasks with fine-grained biological contexts, among them contextualized drug-target identification, single-cell chemical/genetic perturbation response prediction, protein-peptide binding affinity prediction, and clinical trial outcome prediction; these tasks introduce antigen-processing-pathway-specific, cell-type-specific, peptide-specific, and patient-specific biological contexts. TDC-2 also releases benchmarks that evaluate 15+ state-of-the-art models across 5+ new learning tasks, covering diverse biological contexts and sampling approaches. Among these, TDC-2 provides the first benchmark for context-specific learning.
TDC-2, to our knowledge, is also the first to introduce a protein-peptide binding interaction benchmark.
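The abstract states only that TDC-2 exposes data through an API-first, model-view-controller (MVC) design. As a hedged illustration of the pattern itself, and not of TDC's actual API, the Python sketch below separates a model that stores and filters records, a view that formats results, and a controller that routes a request between them. Every class, function, dataset name, and record here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Model (hypothetical): holds one dataset's records and answers filtered queries.
@dataclass
class DatasetModel:
    name: str
    rows: List[dict] = field(default_factory=list)

    def query(self, **filters) -> List[dict]:
        """Return the rows whose fields equal every supplied filter value."""
        return [r for r in self.rows
                if all(r.get(key) == value for key, value in filters.items())]


# View (hypothetical): turns row records into a column-oriented table.
def column_view(rows: List[dict]) -> Dict[str, list]:
    """Assumes every row shares the keys of the first row."""
    if not rows:
        return {}
    return {key: [row[key] for row in rows] for key in rows[0]}


# Controller (hypothetical): routes a request to a model, then hands results to a view.
class DataController:
    def __init__(self, view: Callable[[List[dict]], Dict[str, list]]):
        self._models: Dict[str, DatasetModel] = {}
        self._view = view

    def register(self, model: DatasetModel) -> None:
        self._models[model.name] = model

    def get(self, dataset: str, **filters) -> Dict[str, list]:
        if dataset not in self._models:
            raise KeyError(f"unknown dataset: {dataset}")
        return self._view(self._models[dataset].query(**filters))


if __name__ == "__main__":
    # Toy records with made-up values; not real TDC data.
    toy = DatasetModel("toy_perturbation", [
        {"cell_type": "T cell", "perturbation": "drug_A", "response": 0.8},
        {"cell_type": "B cell", "perturbation": "drug_A", "response": 0.1},
        {"cell_type": "T cell", "perturbation": "drug_B", "response": 0.3},
    ])
    api = DataController(column_view)
    api.register(toy)
    # A cell-type-specific request: only T cell rows reach the view.
    print(api.get("toy_perturbation", cell_type="T cell"))
```

Keeping the view separate means the same filtered records could be returned in another format (for example, row-oriented JSON) by swapping one function, without touching storage or routing.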
2
Oikonomou EK, Thangaraj PM, Bhatt DL, Ross JS, Young LH, Krumholz HM, Suchard MA, Khera R. An explainable machine learning-based phenomapping strategy for adaptive predictive enrichment in randomized clinical trials. NPJ Digit Med 2023; 6:217. [PMID: 38001154 PMCID: PMC10673945 DOI: 10.1038/s41746-023-00963-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 11/05/2023] [Indexed: 11/26/2023]
Abstract
Randomized clinical trials (RCT) represent the cornerstone of evidence-based medicine but are resource-intensive. We propose and evaluate a machine learning (ML) strategy of adaptive predictive enrichment through computational trial phenomaps to optimize RCT enrollment. In simulated group sequential analyses of two large cardiovascular outcomes RCTs of (1) a therapeutic drug (pioglitazone versus placebo; Insulin Resistance Intervention after Stroke (IRIS) trial), and (2) a disease management strategy (intensive versus standard systolic blood pressure reduction in the Systolic Blood Pressure Intervention Trial (SPRINT)), we constructed dynamic phenotypic representations to infer response profiles during interim analyses and examined their association with study outcomes. Across three interim timepoints, our strategy learned dynamic phenotypic signatures predictive of individualized cardiovascular benefit. By conditioning a prospective candidate's probability of enrollment on their predicted benefit, we estimate that our approach would have enabled a reduction in the final trial size across ten simulations (IRIS: -14.8% ± 3.1%, one-sample t-test p = 0.001; SPRINT: -17.6% ± 3.6%, one-sample t-test p < 0.001), while preserving the original average treatment effect (IRIS: hazard ratio of 0.73 ± 0.01 for pioglitazone vs placebo, vs 0.76 in the original trial; SPRINT: hazard ratio of 0.72 ± 0.01 for intensive vs standard systolic blood pressure, vs 0.75 in the original trial; all simulations with Cox regression-derived p value of < 0.01 for the effect of the intervention on the respective primary outcome). This adaptive framework has the potential to maximize RCT enrollment efficiency.
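The abstract describes conditioning each candidate's probability of enrollment on their predicted benefit but does not state the mapping from benefit to probability. The sketch below assumes a hypothetical linear rule: predicted benefit is min-max scaled onto the interval from a floor probability up to 1.0, so high-benefit candidates are almost always enrolled while low-benefit candidates still have some chance. It does not reproduce the authors' phenomapping model, which is what produces the benefit scores in the first place; the function names, the `floor` parameter, and the toy scores are all assumptions.

```python
import random
from typing import List


def enrollment_probabilities(predicted_benefit: List[float], floor: float = 0.2) -> List[float]:
    """Hypothetical rule: min-max scale predicted benefit onto [floor, 1.0].

    The candidate with the highest predicted benefit gets probability 1.0;
    the one with the lowest keeps probability `floor`, so low-benefit
    candidates are down-weighted rather than excluded outright.
    """
    if not 0.0 <= floor <= 1.0:
        raise ValueError("floor must lie in [0, 1]")
    lo, hi = min(predicted_benefit), max(predicted_benefit)
    if hi == lo:  # no spread in predicted benefit: treat every candidate alike
        return [1.0] * len(predicted_benefit)
    return [floor + (1.0 - floor) * (b - lo) / (hi - lo) for b in predicted_benefit]


def simulate_enrollment(predicted_benefit: List[float], floor: float = 0.2, seed: int = 0) -> List[int]:
    """Return the indices of candidates enrolled after one Bernoulli draw each."""
    rng = random.Random(seed)  # seeded so a simulation run is reproducible
    probs = enrollment_probabilities(predicted_benefit, floor)
    return [i for i, p in enumerate(probs) if rng.random() < p]


if __name__ == "__main__":
    benefit = [0.00, 0.02, 0.05, 0.10]  # toy predicted risk reductions, not trial data
    probs = enrollment_probabilities(benefit)
    print([round(p, 2) for p in probs])
    print("expected fraction enrolled:", round(sum(probs) / len(probs), 2))
    print("enrolled indices:", simulate_enrollment(benefit))
```

Under this rule the mean probability is the expected fraction of screened candidates who enroll; the floor controls how strongly the trial is enriched for predicted responders.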
Affiliation(s)
- Evangelos K Oikonomou
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Phyllis M Thangaraj
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Deepak L Bhatt
- Mount Sinai Heart, Icahn School of Medicine at Mount Sinai Health System, New York, NY, USA
- Joseph S Ross
- Section of General Internal Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, CT, USA
- Department of Health Policy and Management, Yale School of Public Health, New Haven, CT, USA
- Lawrence H Young
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Harlan M Krumholz
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, CT, USA
- Department of Health Policy and Management, Yale School of Public Health, New Haven, CT, USA
- Marc A Suchard
- Departments of Computational Medicine and Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA, USA
- Rohan Khera
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, CT, USA
- Section of Health Informatics, Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Section of Biomedical Informatics and Data Science, Yale School of Public Health, New Haven, CT, USA
3
Oikonomou EK, Thangaraj PM, Bhatt DL, Ross JS, Young LH, Krumholz HM, Suchard MA, Khera R. An explainable machine learning-based phenomapping strategy for adaptive predictive enrichment in randomized controlled trials. medRxiv: The Preprint Server for Health Sciences 2023:2023.06.18.23291542. [PMID: 37961715 PMCID: PMC10635225 DOI: 10.1101/2023.06.18.23291542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Randomized controlled trials (RCT) represent the cornerstone of evidence-based medicine but are resource-intensive. We propose and evaluate a machine learning (ML) strategy of adaptive predictive enrichment through computational trial phenomaps to optimize RCT enrollment. In simulated group sequential analyses of two large cardiovascular outcomes RCTs of (1) a therapeutic drug (pioglitazone versus placebo; Insulin Resistance Intervention after Stroke (IRIS) trial), and (2) a disease management strategy (intensive versus standard systolic blood pressure reduction in the Systolic Blood Pressure Intervention Trial (SPRINT)), we constructed dynamic phenotypic representations to infer response profiles during interim analyses and examined their association with study outcomes. Across three interim timepoints, our strategy learned dynamic phenotypic signatures predictive of individualized cardiovascular benefit. By conditioning a prospective candidate's probability of enrollment on their predicted benefit, we estimate that our approach would have enabled a reduction in the final trial size across ten simulations (IRIS: -14.8% ± 3.1%, one-sample t-test p = 0.001; SPRINT: -17.6% ± 3.6%, one-sample t-test p < 0.001), while preserving the original average treatment effect (IRIS: hazard ratio of 0.73 ± 0.01 for pioglitazone vs placebo, vs 0.76 in the original trial; SPRINT: hazard ratio of 0.72 ± 0.01 for intensive vs standard systolic blood pressure, vs 0.75 in the original trial; all with one-sample t-test p < 0.01). This adaptive framework has the potential to maximize RCT enrollment efficiency.
Affiliation(s)
- Evangelos K Oikonomou
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Phyllis M Thangaraj
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Deepak L Bhatt
- Mount Sinai Heart, Icahn School of Medicine at Mount Sinai Health System, New York, NY, USA
- Joseph S Ross
- Section of General Internal Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Lawrence H Young
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Harlan M Krumholz
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, CT, USA
- Marc A Suchard
- Departments of Computational Medicine and Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA, USA
- Rohan Khera
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, CT, USA
- Section of Health Informatics, Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Section of Biomedical Informatics and Data Science, Yale School of Public Health, New Haven, CT, USA
4
Niazi SK. The Coming of Age of AI/ML in Drug Discovery, Development, Clinical Testing, and Manufacturing: The FDA Perspectives. Drug Des Devel Ther 2023; 17:2691-2725. [PMID: 37701048 PMCID: PMC10493153 DOI: 10.2147/dddt.s424991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 08/24/2023] [Indexed: 09/14/2023]
Abstract
Artificial intelligence (AI) and machine learning (ML) represent significant advancements in computing, building on technologies that humanity has developed over millions of years, from the abacus to quantum computers. These tools have reached a pivotal moment in their development. In 2021 alone, the U.S. Food and Drug Administration (FDA) received over 100 product registration submissions that relied heavily on AI/ML for applications such as monitoring and improving human performance in compiling dossiers. To ensure the safe and effective use of AI/ML in drug discovery and manufacturing, the FDA and numerous other U.S. federal agencies have issued continuously updated, stringent guidelines. Intriguingly, these guidelines are often generated or updated with the aid of AI/ML tools themselves. The overarching goal is to expedite drug discovery, enhance the safety profiles of existing drugs, introduce novel treatment modalities, and improve manufacturing compliance and robustness. Recent FDA publications offer an encouraging outlook on the potential of these tools, emphasizing the need for their careful deployment. This has expanded market opportunities for retraining personnel handling these technologies and enabled innovative applications in emerging therapies such as gene editing, CRISPR-Cas9, CAR-T cells, mRNA-based treatments, and personalized medicine. In summary, the maturation of AI/ML technologies is a testament to human ingenuity. Far from being autonomous entities, these are tools created by and for humans, designed to solve complex problems now and in the future. This paper aims to present the status of these technologies, along with examples of their present and future applications.
5
Lu Y, Ganz ML, Robinson RL, Zagar AJ, Okala S, Hartrick CT, Johnston B, Dorling P, Slim M, Thakkar S, Berger A. Use of electronic health data to identify patients with moderate-to-severe osteoarthritis of the hip and/or knee and inadequate response to pain medications. BMC Med Res Methodol 2023; 23:156. [PMID: 37391751 PMCID: PMC10311749 DOI: 10.1186/s12874-023-01964-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 06/06/2023] [Indexed: 07/02/2023]
Abstract
BACKGROUND No algorithms exist to identify important osteoarthritis (OA) patient subgroups (i.e., moderate-to-severe disease, inadequate response to pain treatments) in electronic healthcare data, possibly because these characteristics are complex to define and relevant measures are lacking in these data sources. We developed and validated algorithms intended for use with claims and/or electronic medical records (EMR) to identify these patient subgroups. METHODS We obtained claims, EMR, and chart data from two integrated delivery networks. Chart data were used to identify the presence or absence of the three relevant OA-related characteristics (OA of the hip and/or knee, moderate-to-severe disease, inadequate/intolerable response to at least two pain-related medications); the resulting classification served as the benchmark for algorithm validation. We developed two sets of case-identification algorithms: one based on a literature review and clinical input (predefined algorithms), and another using machine learning (ML) methods (logistic regression, classification and regression tree, random forest). Patient classifications based on these algorithms were compared with and validated against the chart data. RESULTS We sampled and analyzed 571 adult patients, of whom 519 had OA of the hip and/or knee, 489 had moderate-to-severe OA, and 431 had inadequate response to at least two pain medications. Individual predefined algorithms had high positive predictive values (all PPVs ≥ 0.83) for identifying each of these OA characteristics, but low negative predictive values (all NPVs between 0.16 and 0.54) and sometimes low sensitivity; their sensitivity and specificity for identifying patients with all three characteristics were 0.95 and 0.26, respectively (NPV 0.65, PPV 0.78, accuracy 0.77). ML-derived algorithms performed better in identifying this patient subgroup (range: sensitivity 0.77-0.86, specificity 0.66-0.75, PPV 0.88-0.92, NPV 0.47-0.62, accuracy 0.75-0.83).
CONCLUSIONS Predefined algorithms adequately identified OA characteristics of interest, but more sophisticated ML-based methods better differentiated between levels of disease severity and identified patients with inadequate response to analgesics. The ML methods performed well, yielding high PPV, NPV, sensitivity, specificity, and accuracy using either claims or EMR data. Use of these algorithms may expand the ability of real-world data to address questions of interest in this underserved patient population.
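All five validation metrics reported above come from the same 2 × 2 comparison between an algorithm's labels and the chart-review benchmark. The sketch below shows those standard definitions for one binary characteristic; the function and class names are illustrative, and the toy labels are invented, not study data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ValidationMetrics:
    sensitivity: float  # TP / (TP + FN): share of benchmark-positive patients the algorithm flags
    specificity: float  # TN / (TN + FP): share of benchmark-negative patients it leaves unflagged
    ppv: float          # TP / (TP + FP): share of flagged patients who are truly positive
    npv: float          # TN / (TN + FN): share of unflagged patients who are truly negative
    accuracy: float     # (TP + TN) / N


def _ratio(num: int, den: int) -> float:
    # An undefined metric (zero denominator) is reported as NaN rather than raising.
    return num / den if den else float("nan")


def validate(predicted: Sequence[bool], benchmark: Sequence[bool]) -> ValidationMetrics:
    """Score case-identification labels against chart-review labels (True = characteristic present)."""
    if len(predicted) != len(benchmark):
        raise ValueError("predicted and benchmark must have the same length")
    pairs = [(bool(p), bool(b)) for p, b in zip(predicted, benchmark)]
    tp = sum(p and b for p, b in pairs)
    fp = sum(p and not b for p, b in pairs)
    fn = sum(b and not p for p, b in pairs)
    tn = sum(not p and not b for p, b in pairs)
    return ValidationMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        accuracy=_ratio(tp + tn, len(pairs)),
    )


if __name__ == "__main__":
    algo = [True, True, True, False, False, True, False, True]    # toy algorithm labels
    chart = [True, True, False, False, True, True, False, True]   # toy chart-review labels
    print(validate(algo, chart))
```

Because 519 of the 571 sampled patients had hip and/or knee OA, only 52 were benchmark-negative for that characteristic, so specificity and NPV for it rest on few patients; this may help explain why those metrics are lower and more variable than PPV.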
6
Encarnação S, Vaz P, Fortunato Á, Forte P, Vaz C, Monteiro AM. Aerobic Fitness as an Important Moderator Risk Factor for Loneliness in Physically Trained Older People: An Explanatory Case Study Using Machine Learning. Life (Basel) 2023; 13:1374. [PMID: 37374156 DOI: 10.3390/life13061374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 05/30/2023] [Accepted: 06/09/2023] [Indexed: 06/29/2023]
Abstract
BACKGROUND Loneliness in older people appears to have become an increasingly prevalent social problem. OBJECTIVE To apply a machine learning (ML) algorithm to understanding the influence of sociodemographic variables, physical fitness, physical activity levels (PAL), and sedentary behavior (SB) on feelings of loneliness in physically trained older people. MATERIALS AND METHODS The UCLA Loneliness Scale was used to evaluate loneliness and the Functional Fitness Test Battery was used to evaluate physical fitness; the associations of sociodemographic variables, physical fitness, PAL, and SB with the loneliness scores of 23 trained older people (19 women and 4 men) were then analyzed with a naive Bayes ML algorithm. RESULTS After analysis, we inferred that aerobic fitness (AF), hand grip strength (HG), and upper limb strength (ULS) formed the most relevant variable panel for predicting high participant loneliness, with 100% accuracy and F1 score. CONCLUSIONS The naive Bayes algorithm with leave-one-out cross-validation (LOOCV) predicted loneliness in trained older people with high precision. In addition, AF was the most potent variable in reducing loneliness risk.
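The abstract names a naive Bayes classifier evaluated with leave-one-out cross-validation (LOOCV) but not the likelihood model. The sketch below assumes Gaussian naive Bayes on continuous fitness scores, written from scratch so the steps are visible; the small variance term `eps`, the function names, and the toy data are assumptions, not the authors' pipeline. LOOCV refits the model once per participant, each time predicting the single participant left out.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# label -> (class prior, per-feature means, per-feature variances)
Model = Dict[str, Tuple[float, List[float], List[float]]]


def fit_gaussian_nb(X: Sequence[Sequence[float]], y: Sequence[str], eps: float = 1e-9) -> Model:
    """Estimate class priors and per-feature Gaussian parameters.

    "Naive" means features are treated as independent given the class.
    """
    by_class: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model: Model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # eps keeps the variance positive when a feature is constant within a class
        variances = [sum((v - m) ** 2 for v in col) / len(col) + eps
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict(model: Model, row: Sequence[float]) -> str:
    """Return the class with the highest log posterior (up to a shared constant)."""
    def log_posterior(params: Tuple[float, List[float], List[float]]) -> float:
        prior, means, variances = params
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances))
    return max(model, key=lambda label: log_posterior(model[label]))


def loocv_predictions(X: List[List[float]], y: List[str]) -> List[str]:
    """Leave-one-out CV: fit on n - 1 participants, predict the held-out one."""
    preds = []
    for i in range(len(X)):
        model = fit_gaussian_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(model, X[i]))
    return preds


if __name__ == "__main__":
    # Invented two-feature scores (think: aerobic fitness, grip strength); not study data.
    X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [5.0, 6.0], [5.2, 5.8], [4.9, 6.1]]
    y = ["high", "high", "high", "low", "low", "low"]
    preds = loocv_predictions(X, y)
    accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
    print(preds, accuracy)
```

With n = 23, each held-out participant shifts LOOCV accuracy by 1/23, about 4.3 percentage points, so a single misclassification would move the reported 100% to roughly 95.7%.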
Affiliation(s)
- Samuel Encarnação
- Department of Sport Sciences, Instituto Politécnico de Bragança (IPB), 5300-253 Bragança, Portugal
- Research Centre in Basic Education (CIEB), Instituto Politécnico de Bragança (IPB), 5300-253 Bragança, Portugal
- Department of Physical Activity and Sport Sciences, Universidad Autónoma de Madrid (UAM), Ciudad Universitaria de Cantoblanco, 28049 Madrid, Spain
- Paula Vaz
- Research Centre in Basic Education (CIEB), Instituto Politécnico de Bragança (IPB), 5300-253 Bragança, Portugal
- Álvaro Fortunato
- Department of Sport Sciences, Instituto Politécnico de Bragança (IPB), 5300-253 Bragança, Portugal
- Research Centre in Sports Sciences, Health Sciences and Human Development (CIDESD), 5001-801 Vila Real, Portugal
- Pedro Forte
- Department of Sport Sciences, Instituto Politécnico de Bragança (IPB), 5300-253 Bragança, Portugal
- Research Centre in Sports Sciences, Health Sciences and Human Development (CIDESD), 5001-801 Vila Real, Portugal
- CI-ISCE, Higher Institute of Educational Sciences of the Douro (ISCE Douro), 4560-708 Penafiel, Portugal
- Cátia Vaz
- CI-ISCE, Higher Institute of Educational Sciences of the Douro (ISCE Douro), 4560-708 Penafiel, Portugal
- Department of Education and Supervision, Instituto Politécnico de Bragança (IPB), 5300-253 Bragança, Portugal
- António Miguel Monteiro
- Department of Sport Sciences, Instituto Politécnico de Bragança (IPB), 5300-253 Bragança, Portugal
- Department of Physical Activity and Sport Sciences, Universidad Autónoma de Madrid (UAM), Ciudad Universitaria de Cantoblanco, 28049 Madrid, Spain
7
Higgins L, Gerdes H, Cutillas PR. Principles of phosphoproteomics and applications in cancer research. Biochem J 2023; 480:403-420. [PMID: 36961757 PMCID: PMC10212522 DOI: 10.1042/bcj20220220] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/13/2022] [Revised: 02/24/2023] [Accepted: 02/28/2023] [Indexed: 03/25/2023]
Abstract
Phosphorylation constitutes the most common and best-studied regulatory post-translational modification in biological systems, and archetypal signalling pathways driven by protein and lipid kinases are disrupted in essentially all cancer types. Thus, the study of the phosphoproteome stands to provide unique biological information on signalling pathway activity and on kinase network circuitry that is not captured by genetic or transcriptomic technologies. Here, we discuss the methods and tools used in phosphoproteomics and highlight how this technique has been used, and can be used in the future, for cancer research. Challenges remain in mass spectrometry phosphoproteomics and in the software required to extract biological information from these datasets. Nevertheless, improvements in mass spectrometers (with enhanced scan rates, separation capabilities and sensitivity), in biochemical methods for sample preparation and in computational pipelines are enabling increasingly deep analysis of the phosphoproteome, relieving previous bottlenecks in data acquisition, processing and interpretation. These powerful hardware and algorithmic innovations are not only providing exciting new mechanistic insights into tumour biology, from which new drug targets may be derived, but are also leading to the discovery of phosphoproteins as mediators of drug sensitivity and resistance and as classifiers of disease subtypes. These studies are therefore uncovering phosphoproteins as a new generation of disruptive biomarkers to improve personalised anti-cancer therapies.
Affiliation(s)
- Luke Higgins
- Cell Signaling and Proteomics Group, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, U.K
- Henry Gerdes
- Cell Signaling and Proteomics Group, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, U.K
- Pedro R Cutillas
- Cell Signaling and Proteomics Group, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, U.K
- Alan Turing Institute, The British Library, London, U.K
- Digital Environment Research Institute, Queen Mary University of London, London, U.K
8
Takkavatakarn K, Hofer IS. Artificial Intelligence and Machine Learning in Perioperative Acute Kidney Injury. Adv Kidney Dis Health 2023; 30:53-60. [PMID: 36723283 DOI: 10.1053/j.akdh.2022.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Revised: 09/30/2022] [Accepted: 10/28/2022] [Indexed: 12/24/2022]
Abstract
Acute kidney injury (AKI) is a common complication after surgery, especially after cardiac and aortic procedures, and has a significant impact on morbidity and mortality. Early identification of high-risk patients and effective preventive and therapeutic approaches are the main strategies for reducing the risk of perioperative AKI. Consequently, several risk-prediction models and risk assessment scores have been developed for the prediction of perioperative AKI. However, most of these risk scores are derived only from preoperative data and do not include intraoperative time-series monitoring data such as heart rate and blood pressure. Moreover, the complexity of the pathophysiology of AKI, as well as its nonlinear and heterogeneous nature, limits the use of linear statistical techniques. The digitization of clinical medicine, the widespread availability of electronic medical records, and the increasing use of continuous monitoring have generated vast quantities of data. Machine learning has recently shown promise as a method for automatically integrating large amounts of data to predict the risk of perioperative outcomes. In this article, we discuss the development and limitations of existing work, and potential future directions for models that use machine learning techniques to predict AKI after surgery.
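The abstract notes that most perioperative AKI scores use only preoperative data and leave out intraoperative time series such as heart rate and blood pressure. One common way to give a machine learning model access to such signals is to condense each trace into summary features and join them with the preoperative variables in a single input row. The sketch below is illustrative only: the 65 mmHg mean arterial pressure (MAP) threshold, the fixed sampling interval, the feature names, and the toy values are all assumptions rather than details from the article.

```python
from typing import Dict, Sequence


def minutes_below(values: Sequence[float], threshold: float, interval_min: float) -> float:
    """Cumulative minutes spent below `threshold`, assuming evenly spaced samples."""
    return interval_min * sum(1 for v in values if v < threshold)


def intraop_features(map_mmhg: Sequence[float], hr_bpm: Sequence[float],
                     interval_min: float = 1.0, map_threshold: float = 65.0) -> Dict[str, float]:
    """Condense intraoperative traces into summary features (names are hypothetical)."""
    if not map_mmhg or not hr_bpm:
        raise ValueError("empty monitoring trace")
    return {
        "map_min": min(map_mmhg),
        "map_mean": sum(map_mmhg) / len(map_mmhg),
        "minutes_map_below_threshold": minutes_below(map_mmhg, map_threshold, interval_min),
        "hr_max": max(hr_bpm),
        "hr_mean": sum(hr_bpm) / len(hr_bpm),
    }


def build_feature_row(preop: Dict[str, float], map_mmhg: Sequence[float],
                      hr_bpm: Sequence[float], interval_min: float = 1.0) -> Dict[str, float]:
    """Join preoperative variables with intraoperative summaries into one model input row."""
    row = dict(preop)  # copy so the caller's dict is not modified
    row.update(intraop_features(map_mmhg, hr_bpm, interval_min))
    return row


if __name__ == "__main__":
    preop = {"age_years": 67.0, "baseline_creatinine_mg_dl": 1.1}  # toy values
    row = build_feature_row(preop, map_mmhg=[70, 64, 60, 80],
                            hr_bpm=[72, 90, 88, 75], interval_min=5.0)
    print(row)
```

Rows like this can then be passed to a nonlinear learner, which addresses the abstract's point that linear techniques struggle with AKI's nonlinear, heterogeneous behaviour; richer approaches may instead model the raw time series directly.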
Affiliation(s)
- Kullaya Takkavatakarn
- Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY; Division of Nephrology, Department of Medicine, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Ira S Hofer
- Department of Anesthesiology, Pain and Perioperative Medicine, Icahn School of Medicine at Mount Sinai, New York, NY