1
|
Farnoush A, Sedighi-Maman Z, Rasoolian B, Heath JJ, Fallah B. Prediction of adverse drug reactions using demographic and non-clinical drug characteristics in FAERS data. Sci Rep 2024; 14:23636. [PMID: 39384938 PMCID: PMC11464664 DOI: 10.1038/s41598-024-74505-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 09/26/2024] [Indexed: 10/11/2024] Open
Abstract
The presence of adverse drug reactions (ADRs) is an ongoing public health concern. While traditional methods to discover ADRs are very costly and limited, it is prudent to predict ADRs through non-invasive methods such as machine learning based on existing data. Although various studies exist regarding ADR prediction using non-clinical data, a process that leverages both demographic and non-clinical data for ADR prediction is missing. In addition, the importance of individual features in ADR prediction has yet to be fully explored. This study aims to develop an ADR prediction model based on demographic and non-clinical data, where we identify the highest contributing factors. We focus our efforts on 30 common and severe ADRs reported to the Food and Drug Administration (FDA) between 2012 and 2023. We have developed a random forest (RF) and deep learning (DL) machine learning model that ingests demographic data (e.g., Age and Gender of patients) and non-clinical data, which includes chemical, molecular, and biological drug characteristics. We successfully unified both demographic and non-clinical data sources within a complete dataset regarding ADR prediction. Model performances were assessed via the area under the receiver operating characteristic curve (AUC) and the mean average precision (MAP). We demonstrated that our parsimonious models, which include only the top 20 most important features comprising 5 demographic features and 15 non-clinical features (13 molecular and 2 biological), achieve ADR prediction performance comparable to a less practical, feature-rich model consisting of all 2,315 features. Specifically, our models achieved an AUC of 0.611 and 0.674 for RF and DL algorithms, respectively. We hope our research provides researchers and clinicians with valuable insights and facilitates future research designs by identifying top ADR predictors (including demographic information) and practical parsimonious models.
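The two evaluation metrics named in this abstract, AUC and mean average precision (MAP), can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names are hypothetical, labels are assumed to be 0/1, and MAP is assumed here to be the mean of per-ADR average precision values.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of precision@rank taken at the rank of each positive example."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


def mean_average_precision(per_label):
    """per_label: list of (y_true, scores) pairs, one pair per ADR label (assumption)."""
    return sum(average_precision(t, s) for t, s in per_label) / len(per_label)


# Toy data for a single ADR label (illustrative only, not from the study).
y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(y_true, scores), average_precision(y_true, scores))
```

A pairwise AUC like this is quadratic in the number of examples, so it is only suitable for small illustrations; a rank-based formulation or a library routine would be the usual choice at the scale of FAERS data.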
Affiliation(s)
- Alireza Farnoush
- Darla Moore School of Business, University of South Carolina, Columbia, SC, 29208, USA.
| | - Zahra Sedighi-Maman
- McDonough School of Business, Georgetown University, Washington, DC, 20057, USA
| | - Behnam Rasoolian
- Department of Industrial and System Engineering, Auburn University, Auburn, AL, 36849, USA
| | - Jonathan J Heath
- School of Business, St. Bonaventure University, St. Bonaventure, NY, 14778, USA
| | - Banafsheh Fallah
- Department of Industrial and System Engineering, Auburn University, Auburn, AL, 36849, USA
|
2
|
Rudrapal M, Kirboga KK, Abdalla M, Maji S. Explainable artificial intelligence-assisted virtual screening and bioinformatics approaches for effective bioactivity prediction of phenolic cyclooxygenase-2 (COX-2) inhibitors using PubChem molecular fingerprints. Mol Divers 2024; 28:2099-2118. [PMID: 38200203 DOI: 10.1007/s11030-023-10782-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/11/2023] [Accepted: 11/22/2023] [Indexed: 01/12/2024]
Abstract
Cyclooxygenase-2 (COX-2) inhibitors are nonsteroidal anti-inflammatory drugs that treat inflammation, pain and fever. This study determined the interaction mechanisms of COX-2 inhibitors and the molecular properties needed to design new drug candidates. Using machine learning and explainable AI methods, the inhibition activity of 1488 molecules was modelled, and essential properties were identified. These properties included aromatic rings, nitrogen-containing functional groups and aliphatic hydrocarbons. They affected the water solubility, hydrophobicity and binding affinity of COX-2 inhibitors. The binding mode, stability and ADME properties of 16 ligands bound to the Cyclooxygenase active site of COX-2 were investigated by molecular docking, molecular dynamics simulation and MM-GBSA analysis. The results showed that ligand 339,222 was the most stable and effective COX-2 inhibitor. It inhibited prostaglandin synthesis by disrupting the protein conformation of COX-2. It had good ADME properties and high clinical potential. This study demonstrated the potential of machine learning and bioinformatics methods in discovering COX-2 inhibitors.
Affiliation(s)
- Mithun Rudrapal
- Department of Pharmaceutical Sciences, School of Biotechnology and Pharmaceutical Sciences, Vignan's Foundation for Science, Technology & Research (Deemed to Be University), Guntur, 522213, India.
| | - Kevser Kübra Kirboga
- Informatics Institute, Istanbul Technical University, 34469, Maslak, Istanbul, Turkey.
- Bioengineering Department, BilecikSeyhEdebali University, 11230, Bilecik, Turkey.
| | - Mohnad Abdalla
- Pediatric Research Institute, Children's Hospital Affiliated to Shandong University, Jinan, 250022, Shandong, People's Republic of China
| | - Siddhartha Maji
- Department of Chemistry, Oklahoma State University, Stillwater, OK, USA
|
3
|
Sinha K, Parwez S, Mv S, Yadav A, Siddiqi MI, Banerjee D. Machine learning and biological evaluation-based identification of a potential MMP-9 inhibitor, effective against ovarian cancer cells SKOV3. J Biomol Struct Dyn 2024; 42:6823-6841. [PMID: 37504963 DOI: 10.1080/07391102.2023.2240416] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/12/2023] [Accepted: 07/08/2023] [Indexed: 07/29/2023]
Abstract
MMP-9, also known as gelatinase B, is a zinc-metalloproteinase family protein that plays a key role in the degradation of the extracellular matrix (ECM). The normal function of MMP-9 includes the breakdown of ECM, a process that aids in normal physiological processes such as embryonic development, angiogenesis, etc. Interruptions in these processes due to the over-expression or downregulation of MMP-9 are reported to cause some pathological conditions like neurodegenerative diseases and cancer. In the present study, an integrated approach for ML-based virtual screening of the Maybridge library was carried out and their biological activity was tested in an attempt to identify novel small molecule scaffolds that can inhibit the activity of MMP-9. The top hits were identified and selected for target-based activity against MMP-9 protein using the kit (Biovision K844). Further, MTT assay was performed in various cancer cell lines such as breast (MCF-7, MDA-MB-231), colorectal (HCT119, DL-D-1), cervical (HeLa), lung (A549) and ovarian cancer (SKOV3). Interestingly, one compound viz., RJF02215 exhibited anti-cancer activity selectively in SKOV3. Wound healing assay and colony formation assay performed on SKOV3 cell line in the presence of RJF02215 confirmed that the compound had a significant inhibitory effect on this cell line. Thus, we have identified a novel molecule that can inhibit MMP-9 activity in vitro and inhibits the proliferation of SKOV3 cells. Novel molecules based on the structure of RJF02215 may become a good value addition for the treatment of ovarian cancer by exhibiting selective MMP-9 activity. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Khushboo Sinha
- Cancer Biology Division, CSIR-Central Drug Research Institute, Lucknow, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
| | - Shahid Parwez
- Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Lucknow, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
| | - Shahana Mv
- Cancer Biology Division, CSIR-Central Drug Research Institute, Lucknow, India
| | - Ananya Yadav
- Cancer Biology Division, CSIR-Central Drug Research Institute, Lucknow, India
| | - Mohammad Imran Siddiqi
- Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Lucknow, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
| | - Dibyendu Banerjee
- Cancer Biology Division, CSIR-Central Drug Research Institute, Lucknow, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
|
4
|
Rashidi HH, Ikram A, Dang LT, Bashir A, Zohra T, Ali A, Tanvir H, Mudassar M, Ravindran R, Akhtar N, Sikandar RI, Umer M, Akhter N, Butt R, Fennell BD, Khan IH. Comparing machine learning screening approaches using clinical data and cytokine profiles for COVID-19 in resource-limited and resource-abundant settings. Sci Rep 2024; 14:14892. [PMID: 38937503 PMCID: PMC11211475 DOI: 10.1038/s41598-024-63707-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 05/31/2024] [Indexed: 06/29/2024] Open
Abstract
Accurate screening of COVID-19 infection status for symptomatic patients is a critical public health task. Although molecular and antigen tests now exist for COVID-19, in resource-limited settings, screening tests are often not available. Furthermore, during the early stages of the pandemic, tests were not available in any capacity. We utilized an automated machine learning (ML) approach to train and evaluate thousands of models on a clinical dataset consisting of commonly available clinical and laboratory data, along with cytokine profiles for patients (n = 150). These models were then further tested for generalizability on an out-of-sample secondary dataset (n = 120). We were able to develop an ML model for rapid and reliable screening of patients as COVID-19 positive or negative using three approaches: commonly available clinical and laboratory data, a cytokine profile, and a combination of the common data and cytokine profile. Of the tens of thousands of models automatically tested for the three approaches, all three approaches demonstrated > 92% sensitivity and > 88% specificity, while our highest performing model achieved 95.6% sensitivity and 98.1% specificity. These models represent a potentially effective deployable solution for COVID-19 status classification for symptomatic patients in resource-limited settings and provide proof-of-concept for rapid development of screening tools for novel emerging infectious diseases.
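For readers unfamiliar with the two screening metrics reported in this abstract, the following minimal sketch shows how sensitivity and specificity are derived from binary predictions. The function name and the toy labels are hypothetical and are not taken from the study; 1 is assumed to mean COVID-19 positive and 0 negative.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    # Sensitivity: share of true positives detected; specificity: share of negatives cleared.
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (illustrative only).
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```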
Affiliation(s)
- Hooman H Rashidi
- Computational Pathology and AI Center of Excellence (CPACE), University of Pittsburgh Medical Center, and University of Pittsburgh School of Medicine, Pittsburgh, USA.
| | - Aamer Ikram
- National Institutes of Health, Islamabad, Pakistan
| | - Luke T Dang
- Department of Pathology and Laboratory Medicine, University of California, Davis, 4400 V Street, Sacramento, CA, 95817, USA
| | - Adnan Bashir
- Health Information Systems Program (HISP), Islamabad, Pakistan
| | | | - Amna Ali
- National Institutes of Health, Islamabad, Pakistan
| | - Hamza Tanvir
- National Institutes of Health, Islamabad, Pakistan
| | | | - Resmi Ravindran
- Department of Pathology and Laboratory Medicine, University of California, Davis, 4400 V Street, Sacramento, CA, 95817, USA
| | - Nasim Akhtar
- Pakistan Institute of Medical Sciences, Islamabad, Pakistan
| | | | - Mohammed Umer
- Rawalpindi Medical University-Rawalpindi, Rawalpindi, Pakistan
| | - Naeem Akhter
- Rawalpindi Medical University-Rawalpindi, Rawalpindi, Pakistan
| | - Rafi Butt
- Isolation Hospital and Infectious Treatment Centre, Islamabad, Pakistan
| | - Brandon D Fennell
- Department of Medicine, University of California, San Francisco, USA
| | - Imran H Khan
- Department of Pathology and Laboratory Medicine, University of California, Davis, 4400 V Street, Sacramento, CA, 95817, USA.
|
5
|
Toni E, Ayatollahi H, Abbaszadeh R, Fotuhi Siahpirani A. Machine Learning Techniques for Predicting Drug-Related Side Effects: A Scoping Review. Pharmaceuticals (Basel) 2024; 17:795. [PMID: 38931462 PMCID: PMC11206653 DOI: 10.3390/ph17060795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2024] [Revised: 06/11/2024] [Accepted: 06/12/2024] [Indexed: 06/28/2024] Open
Abstract
BACKGROUND Drug safety relies on advanced methods for timely and accurate prediction of side effects. To tackle this requirement, this scoping review examines machine-learning approaches for predicting drug-related side effects with a particular focus on chemical, biological, and phenotypical features. METHODS This was a scoping review in which a comprehensive search was conducted in various databases from 1 January 2013 to 31 December 2023. RESULTS The results showed the widespread use of Random Forest, k-nearest neighbor, and support vector machine algorithms. Ensemble methods, particularly random forest, emphasized the significance of integrating chemical and biological features in predicting drug-related side effects. CONCLUSIONS This review article emphasized the significance of considering a variety of features, datasets, and machine learning algorithms for predicting drug-related side effects. Ensemble methods and Random Forest showed the best performance and combining chemical and biological features improved prediction. The results suggested that machine learning techniques have some potential to improve drug development and trials. Future work should focus on specific feature types, selection techniques, and graph-based methods for even better prediction.
Affiliation(s)
- Esmaeel Toni
- Medical Informatics, Student Research Committee, Iran University of Medical Sciences, Tehran, Iran 14496-14535;
| | - Haleh Ayatollahi
- Medical Informatics, Health Management and Economics Research Center, Health Management Research Institute, Iran University of Medical Sciences, Tehran, Iran 1996-713883
| | - Reza Abbaszadeh
- Pediatric Cardiology, Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran 19956-14331;
| | - Alireza Fotuhi Siahpirani
- Systems Biology and Bioinformatics, Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran 14176-14411;
|
6
|
Das P, Mazumder DH. K1K2NN: A novel multi-label classification approach based on neighbors for predicting COVID-19 drug side effects. Comput Biol Chem 2024; 110:108066. [PMID: 38579549 DOI: 10.1016/j.compbiolchem.2024.108066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 03/12/2024] [Accepted: 04/01/2024] [Indexed: 04/07/2024]
Abstract
COVID-19, a novel ailment, has received comparatively fewer drugs for its treatment. Side Effects (SE) of a COVID-19 drug could cause long-term health issues. Hence, SE prediction is essential in COVID-19 drug development. Efficient models are also needed to predict COVID-19 drug SE since most existing research has proposed many classifiers to predict SE for diseases other than COVID-19. This work proposes a novel classifier based on neighbors named K1 K2 Nearest Neighbors (K1K2NN) to predict the SE of the COVID-19 drug from 17 molecules' descriptors and the chemical 1D structure of the drugs. The model is implemented based on the proposition that chemically similar drugs may be assigned similar drug SE, and co-occurring SE may be assigned to chemically similar drugs. The K1K2NN model chooses the first K1 neighbors to the test drug sample by calculating its similarity with the train drug samples. It then assigns the test sample with the SE label having the majority count on the SE labels of these K1 neighbor drugs obtained through a voting mechanism. The model then calculates the SE-SE similarity using the Jaccard similarity measure from the SE co-occurrence values. Finally, the model chooses the most similar K2 SE neighbors for those SE determined by the K1 neighbor drugs and assigns these SE to that test drug sample. The proposed K1K2NN model has showcased promising performance with the highest accuracy of 97.53% on chemical 1D drug structure and outperforms the state-of-the-art multi-label classifiers. In addition, we demonstrate the successful application of the proposed model on gene expression signature datasets, which aided in evaluating its performance and confirming its accuracy and robustness.
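The four steps described in this abstract (pick K1 neighbor drugs, vote on their side effects, compute SE-SE Jaccard similarity from co-occurrence, then add the K2 most similar side effects) can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation. Several details are assumptions: drugs are represented as sets of hypothetical structural features, drug-drug similarity is taken to be Jaccard, "majority" is taken to mean a strict majority of the K1 neighbors, and the function name `k1k2nn_predict` is invented for this sketch.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def k1k2nn_predict(test_features, train_features, train_labels, k1=3, k2=1):
    """Predict a set of side-effect (SE) labels for one test drug.

    train_features: list of feature sets, one per training drug (assumed encoding)
    train_labels:   list of SE label sets, one per training drug
    """
    # Step 1: rank training drugs by similarity to the test drug and keep K1.
    ranked = sorted(
        range(len(train_features)),
        key=lambda i: jaccard(test_features, train_features[i]),
        reverse=True,
    )
    neighbors = ranked[:k1]

    # Step 2: vote; keep SEs carried by a strict majority of the neighbors (assumption).
    votes = Counter(se for i in neighbors for se in train_labels[i])
    voted = {se for se, n in votes.items() if n > len(neighbors) / 2}

    # Step 3: index which training drugs carry each SE, for SE-SE Jaccard similarity.
    drugs_with = {}
    for i, labels in enumerate(train_labels):
        for se in labels:
            drugs_with.setdefault(se, set()).add(i)

    # Step 4: for each voted SE, add its K2 most similar co-occurring SEs.
    predicted = set(voted)
    for se in voted:
        others = sorted(
            (o for o in drugs_with if o != se),
            key=lambda o: (jaccard(drugs_with[se], drugs_with[o]), o),
            reverse=True,
        )
        predicted.update(others[:k2])
    return predicted


# Toy data (illustrative only, not from the paper).
train_features = [{1, 2, 3}, {1, 2, 4}, {7, 8}, {1, 3, 5}, {9}]
train_labels = [{"nausea"}, {"nausea"}, {"rash"},
                {"nausea", "vomiting"}, {"nausea", "vomiting"}]
print(sorted(k1k2nn_predict({1, 2, 3}, train_features, train_labels)))
```

In the toy data, "vomiting" does not win the neighbor vote on its own but is added in the final step because it co-occurs most strongly with "nausea", which is the behavior the abstract attributes to the K2 stage.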
Affiliation(s)
- Pranab Das
- Department of Computer Science & Engineering, National Institute of Technology Nagaland, Chumukedima, Dimapur, Nagaland 797103, India
| | - Dilwar Hussain Mazumder
- Department of Computer Science & Engineering, National Institute of Technology Nagaland, Chumukedima, Dimapur, Nagaland 797103, India.
|
7
|
Torkamannia A, Omidi Y, Ferdousi R. SYNDEEP: a deep learning approach for the prediction of cancer drugs synergy. Sci Rep 2023; 13:6184. [PMID: 37061563 PMCID: PMC10105711 DOI: 10.1038/s41598-023-33271-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2023] [Accepted: 04/11/2023] [Indexed: 04/17/2023] Open
Abstract
Drug combinations can be the prime strategy for increasing the initial treatment options in cancer therapy. However, identifying the combinations through experimental approaches is very laborious and costly. Notably, in vitro and/or in vivo examination of all the possible combinations might not be plausible. This study presented a novel computational approach to predicting synergistic drug combinations. Specifically, the deep neural network-based binary classification was utilized to develop the model. Various physicochemical, genomic, protein-protein interaction and protein-metabolite interaction information were used to predict the synergy effects of the combinations of different drugs. The performance of the constructed model was compared with shallow neural network (SNN), k-nearest neighbors (KNN), random forest (RF), support vector machines (SVMs), and gradient boosting classifiers (GBC). Based on our findings, the proposed deep neural network model was found to be capable of predicting synergistic drug combinations with high accuracy. The prediction accuracy and AUC metrics for this model were 92.21% and 97.32% in tenfold cross-validation. According to the results, the integration of different types of physicochemical and genomics features leads to more accurate prediction of synergy in cancer drugs.
Affiliation(s)
- Anna Torkamannia
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, 51656/65811, Iran
| | - Yadollah Omidi
- Department of Pharmaceutical Sciences, College of Pharmacy, Nova Southeastern University, Fort Lauderdale, FL, 33328, USA
| | - Reza Ferdousi
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, 51656/65811, Iran.
|
8
|
Das P, Mazumder DH. An extensive survey on the use of supervised machine learning techniques in the past two decades for prediction of drug side effects. Artif Intell Rev 2023; 56:1-28. [PMID: 36819660 PMCID: PMC9930028 DOI: 10.1007/s10462-023-10413-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/01/2023] [Indexed: 02/19/2023]
Abstract
Approved drugs for sale must be effective and safe, implying that the drug's advantages outweigh its known harmful side effects. Side effects (SE) of drugs are one of the common reasons for drug failure that may halt the whole drug discovery pipeline. The side effects might vary from minor concerns like a runny nose to potentially life-threatening issues like liver damage, heart attack, and death. Therefore, predicting the side effects of the drug is vital in drug development, discovery, and design. Supervised machine learning-based side effects prediction task has recently received much attention since it reduces time, chemical waste, design complexity, risk of failure, and cost. The advancement of supervised learning approaches for predicting side effects have emerged as essential computational tools. Supervised machine learning technique provides early information on drug side effects to develop an effective drug based on drug properties. Still, there are several challenges to predicting drug side effects. Thus, a near-exhaustive survey is carried out in this paper on the use of supervised machine learning approaches employed in drug side effects prediction tasks in the past two decades. In addition, this paper also summarized the drug descriptor required for the side effects prediction task, commonly utilized drug properties sources, computational models, and their performances. Finally, the research gap, open problems, and challenges for the further supervised learning-based side effects prediction task have been discussed.
Affiliation(s)
- Pranab Das
- Department of Computer Science and Engineering, National Institute of Technology Nagaland, Chumukedima, Dimapur, Nagaland 797103 India
| | - Dilwar Hussain Mazumder
- Department of Computer Science and Engineering, National Institute of Technology Nagaland, Chumukedima, Dimapur, Nagaland 797103 India
|
9
|
Wen J, Zhang X, Rush E, Panickan VA, Li X, Cai T, Zhou D, Ho YL, Costa L, Begoli E, Hong C, Gaziano JM, Cho K, Lu J, Liao KP, Zitnik M, Cai T. Multimodal representation learning for predicting molecule-disease relations. Bioinformatics 2023; 39:btad085. [PMID: 36805623 PMCID: PMC9940625 DOI: 10.1093/bioinformatics/btad085] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/05/2022] [Revised: 12/23/2022] [Accepted: 02/08/2023] [Indexed: 02/22/2023] Open
Abstract
MOTIVATION Predicting molecule-disease indications and side effects is important for drug development and pharmacovigilance. Comprehensively mining molecule-molecule, molecule-disease and disease-disease semantic dependencies can potentially improve prediction performance. METHODS We introduce a Multi-Modal REpresentation Mapping Approach to Predicting molecular-disease relations (M2REMAP) by incorporating clinical semantics learned from electronic health records (EHR) of 12.6 million patients. Specifically, M2REMAP first learns a multimodal molecule representation that synthesizes chemical property and clinical semantic information by mapping molecule chemicals via a deep neural network onto the clinical semantic embedding space shared by drugs, diseases and other common clinical concepts. To infer molecule-disease relations, M2REMAP combines multimodal molecule representation and disease semantic embedding to jointly infer indications and side effects. RESULTS We extensively evaluate M2REMAP on molecule indications, side effects and interactions. Results show that incorporating EHR embeddings improves performance significantly, for example, attaining an improvement over the baseline models by 23.6% in PRC-AUC on indications and 23.9% on side effects. Further, M2REMAP overcomes the limitation of existing methods and effectively predicts drugs for novel diseases and emerging pathogens. AVAILABILITY AND IMPLEMENTATION The code is available at https://github.com/celehs/M2REMAP, and prediction results are provided at https://shiny.parse-health.org/drugs-diseases-dev/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jun Wen
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- VA Boston Healthcare System, Boston, MA 02130, USA
| | - Xiang Zhang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| | - Everett Rush
- Department of Energy, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
| | - Vidul A Panickan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- VA Boston Healthcare System, Boston, MA 02130, USA
| | - Xingyu Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| | - Tianrun Cai
- VA Boston Healthcare System, Boston, MA 02130, USA
- Mass General Brigham, Boston, MA 02130, USA
| | - Doudou Zhou
- Department of Statistics, University of California, Davis, CA 95616, USA
| | - Yuk-Lam Ho
- VA Boston Healthcare System, Boston, MA 02130, USA
| | - Lauren Costa
- VA Boston Healthcare System, Boston, MA 02130, USA
| | - Edmon Begoli
- Department of Energy, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
| | - Chuan Hong
- VA Boston Healthcare System, Boston, MA 02130, USA
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, USA
| | - J Michael Gaziano
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- VA Boston Healthcare System, Boston, MA 02130, USA
- Brigham and Women’s Hospital, Boston, MA 02115, USA
| | - Kelly Cho
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- VA Boston Healthcare System, Boston, MA 02130, USA
- Brigham and Women’s Hospital, Boston, MA 02115, USA
| | - Junwei Lu
- VA Boston Healthcare System, Boston, MA 02130, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Katherine P Liao
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- VA Boston Healthcare System, Boston, MA 02130, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Harvard Data Science Initiative, Cambridge, MA 02138, USA
| | - Tianxi Cai
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- VA Boston Healthcare System, Boston, MA 02130, USA
- Mass General Brigham, Boston, MA 02130, USA
|
10
|
McMaster C, Chan J, Liew DFL, Su E, Frauman AG, Chapman WW, Pires DEV. Developing a deep learning natural language processing algorithm for automated reporting of adverse drug reactions. J Biomed Inform 2023; 137:104265. [PMID: 36464227 DOI: 10.1016/j.jbi.2022.104265] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/06/2022] [Revised: 11/01/2022] [Accepted: 11/29/2022] [Indexed: 12/03/2022]
Abstract
The detection of adverse drug reactions (ADRs) is critical to our understanding of the safety and risk-benefit profile of medications. With an incidence that has not changed over the last 30 years, ADRs are a significant source of patient morbidity, responsible for 5%-10% of acute care hospital admissions worldwide. Spontaneous reporting of ADRs has long been the standard method of reporting, however this approach is known to have high rates of under-reporting, a problem that limits pharmacovigilance efforts. Automated ADR reporting presents an alternative pathway to increase reporting rates, although this may be limited by over-reporting of other drug-related adverse events. We developed a deep learning natural language processing algorithm to identify ADRs in discharge summaries at a single academic hospital centre. Our model was developed in two stages: first, a pre-trained model (DeBERTa) was further pre-trained on 1.1 million unlabelled clinical documents; secondly, this model was fine-tuned to detect ADR mentions in a corpus of 861 annotated discharge summaries. This model was compared to a version without the pre-training step, and a previously published RoBERTa model pretrained on MIMIC III, which has demonstrated strong performance on other pharmacovigilance tasks. To ensure that our algorithm could differentiate ADRs from other drug-related adverse events, the annotated corpus was enriched for both validated ADR reports and confounding drug-related adverse events using. The final model demonstrated good performance with a ROC-AUC of 0.955 (95% CI 0.933 - 0.978) for the task of identifying discharge summaries containing ADR mentions, significantly outperforming the two comparator models.
Affiliation(s)
- Christopher McMaster
- Department of Clinical Pharmacology & Therapeutics, Austin Health, Melbourne, Victoria, Australia; Department of Rheumatology, Austin Health, Melbourne, Victoria, Australia; The Centre for Digital Transformation of Health, University of Melbourne, Melbourne, Victoria, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia.
| | - Julia Chan
- Department of Rheumatology, Austin Health, Melbourne, Victoria, Australia
| | - David F L Liew
- Department of Clinical Pharmacology & Therapeutics, Austin Health, Melbourne, Victoria, Australia; Department of Rheumatology, Austin Health, Melbourne, Victoria, Australia; Department of Medicine, University of Melbourne, Melbourne, Victoria, Australia
| | - Elizabeth Su
- Department of Clinical Pharmacology & Therapeutics, Austin Health, Melbourne, Victoria, Australia
| | - Albert G Frauman
- Department of Clinical Pharmacology & Therapeutics, Austin Health, Melbourne, Victoria, Australia; Department of Medicine, University of Melbourne, Melbourne, Victoria, Australia
| | - Wendy W Chapman
- The Centre for Digital Transformation of Health, University of Melbourne, Melbourne, Victoria, Australia
| | - Douglas E V Pires
- The Centre for Digital Transformation of Health, University of Melbourne, Melbourne, Victoria, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
|
11
|
Alpay BA, Gosink M, Aguiar D. Evaluating molecular fingerprint-based models of drug side effects against a statistical control. Drug Discov Today 2022; 27:103364. [PMID: 36115633 DOI: 10.1016/j.drudis.2022.103364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Revised: 08/24/2022] [Accepted: 09/09/2022] [Indexed: 11/16/2022]
Abstract
There are many machine learning models that use molecular fingerprints of drugs to predict side effects. Characterizing their skill is necessary for understanding their usefulness in pharmaceutical development. Here, we analyze a statistical control of side effect prediction skill, develop a pipeline for benchmarking models, and evaluate how well existing models predict side effects identified in pharmaceutical documentation. We demonstrate that molecular fingerprints are useful for ranking drugs by their likelihood to cause a given side effect. However, the predictions for one or more drugs overall benefit only marginally from molecular fingerprints when ranking the likelihoods of many possible side effects, and display at most modest overall skill at identifying the side effects that do and do not occur.
Affiliation(s)
- Berk A Alpay
- Systems, Synthetic, and Quantitative Biology Program, Harvard University, Cambridge, MA 02138, USA.
| | | | - Derek Aguiar
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
|
12
|
Seo Y, Bang S, Son J, Kim D, Jeong Y, Kim P, Yang J, Eom JH, Choi N, Kim HN. Brain physiome: A concept bridging in vitro 3D brain models and in silico models for predicting drug toxicity in the brain. Bioact Mater 2022; 13:135-148. [PMID: 35224297 PMCID: PMC8843968 DOI: 10.1016/j.bioactmat.2021.11.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/09/2021] [Revised: 11/01/2021] [Accepted: 11/06/2021] [Indexed: 12/12/2022] Open
Abstract
In the last few decades, adverse reactions to pharmaceuticals have been evaluated using 2D in vitro models and animal models. However, with increasing computational power, and as the key drivers of cellular behavior have been identified, in silico models have emerged. These models are time-efficient and cost-effective, but the prediction of adverse reactions to unknown drugs using these models requires relevant experimental input. Accordingly, the physiome concept has emerged to bridge experimental datasets with in silico models. The brain physiome describes the systemic interactions of its components, which are organized into a multilevel hierarchy. Because of the limitations in obtaining experimental data corresponding to each physiome component from 2D in vitro models and animal models, 3D in vitro brain models, including brain organoids and brain-on-a-chip, have been developed. In this review, we present the concept of the brain physiome and its hierarchical organization, including cell- and tissue-level organizations. We also summarize recently developed 3D in vitro brain models and link them with the elements of the brain physiome as a guideline for dataset collection. The connection between in vitro 3D brain models and in silico modeling will lead to the establishment of cost-effective and time-efficient in silico models for the prediction of the safety of unknown drugs.
Affiliation(s)
- Yoojin Seo
- Brain Science Institute, Korea Institute of Science and Technology (KIST), Seoul, 02792, Republic of Korea
| | - Seokyoung Bang
- Brain Science Institute, Korea Institute of Science and Technology (KIST), Seoul, 02792, Republic of Korea
| | - Jeongtae Son
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
| | - Dongsup Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
| | - Yong Jeong
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
| | - Pilnam Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
| | - Jihun Yang
- Next&Bio Inc., Seoul, 02841, Republic of Korea
| | - Joon-Ho Eom
- Medical Device Research Division, National Institute of Food and Drug Safety Evaluation, Cheongju, 28159, Republic of Korea
| | - Nakwon Choi
- Brain Science Institute, Korea Institute of Science and Technology (KIST), Seoul, 02792, Republic of Korea
- Division of Bio-Medical Science & Technology, KIST School, Korea University of Science and Technology (UST), Seoul, 02792, Republic of Korea
- KU-KIST Graduate School of Converging Science and Technology, Korea University, Seoul, 02841, Republic of Korea
| | - Hong Nam Kim
- Brain Science Institute, Korea Institute of Science and Technology (KIST), Seoul, 02792, Republic of Korea
- Division of Bio-Medical Science & Technology, KIST School, Korea University of Science and Technology (UST), Seoul, 02792, Republic of Korea
- School of Mechanical Engineering, Yonsei University, Seoul, 03722, Republic of Korea
- Yonsei-KIST Convergence Research Institute, Yonsei University, Seoul, 03722, Republic of Korea
|
13
|
Das P, Pal V. Integrative analysis of chemical properties and functions of drugs for adverse drug reaction prediction based on multi-label deep neural network. J Integr Bioinform 2022; 19:jib-2022-0007. [PMID: 35585715 DOI: 10.1515/jib-2022-0007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Accepted: 03/16/2022] [Indexed: 11/15/2022] Open
Abstract
The prediction of adverse drug reactions (ADRs) is an important step in the drug discovery and design process. Different drug properties have been employed for ADR prediction, but the predictive capability of drug properties and drug functions used in an integrated manner is yet to be explored. In the present work, a methodology based on a multi-label deep neural network and MLSMOTE is proposed for ADR prediction. The methodology was applied to SMILES string data, data on 17 molecular descriptors, and drug function data, both individually and in an integrated manner. The experimental results show that SMILES strings combined with drug functions outperformed the other data types in ADR prediction capability.
Affiliation(s)
- Pranab Das
- National Institute of Technology Meghalaya, Shillong, India
| | - Vipin Pal
- National Institute of Technology Meghalaya, Shillong, India
|
14
|
Crofton KM, Bassan A, Behl M, Chushak YG, Fritsche E, Gearhart JM, Marty MS, Mumtaz M, Pavan M, Ruiz P, Sachana M, Selvam R, Shafer TJ, Stavitskaya L, Szabo DT, Szabo ST, Tice RR, Wilson D, Woolley D, Myatt GJ. Current status and future directions for a neurotoxicity hazard assessment framework that integrates in silico approaches. COMPUTATIONAL TOXICOLOGY (AMSTERDAM, NETHERLANDS) 2022; 22:100223. [PMID: 35844258 PMCID: PMC9281386 DOI: 10.1016/j.comtox.2022.100223] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 04/27/2023]
Abstract
Neurotoxicology is the study of adverse effects on the structure or function of the developing or mature adult nervous system following exposure to chemical, biological, or physical agents. The development of more informative alternative methods to assess developmental (DNT) and adult (NT) neurotoxicity induced by xenobiotics is critically needed. The use of such alternative methods including in silico approaches that predict DNT or NT from chemical structure (e.g., statistical-based and expert rule-based systems) is ideally based on a comprehensive understanding of the relevant biological mechanisms. This paper discusses known mechanisms alongside the current state of the art in DNT/NT testing. In silico approaches available today that support the assessment of neurotoxicity based on knowledge of chemical structure are reviewed, and a conceptual framework for the integration of in silico methods with experimental information is presented. Establishing this framework is essential for the development of protocols, namely standardized approaches, to ensure that assessments of NT and DNT based on chemical structures are generated in a transparent, consistent, and defendable manner.
Affiliation(s)
| | - Arianna Bassan
- Innovatune srl, Via Giulio Zanon 130/D, 35129 Padova, Italy
| | - Mamta Behl
- Division of the National Toxicology Program, National Institutes of Environmental Health Sciences, Durham, NC 27709, USA
| | - Yaroslav G. Chushak
- Henry M Jackson Foundation for the Advancement of Military Medicine, Wright-Patterson AFB, OH 45433, USA
| | - Ellen Fritsche
- IUF – Leibniz Research Institute for Environmental Medicine & Medical Faculty Heinrich-Heine-University, Düsseldorf, Germany
| | - Jeffery M. Gearhart
- Henry M Jackson Foundation for the Advancement of Military Medicine, Wright-Patterson AFB, OH 45433, USA
| | | | - Moiz Mumtaz
- Agency for Toxic Substances and Disease Registry, US Department of Health and Human Services, Atlanta, GA, USA
| | - Manuela Pavan
- Innovatune srl, Via Giulio Zanon 130/D, 35129 Padova, Italy
| | - Patricia Ruiz
- Agency for Toxic Substances and Disease Registry, US Department of Health and Human Services, Atlanta, GA, USA
| | - Magdalini Sachana
- Environment Health and Safety Division, Environment Directorate, Organisation for Economic Co-Operation and Development (OECD), 75775 Paris Cedex 16, France
| | - Rajamani Selvam
- Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research (CDER), U.S. Food and Drug Administration (FDA), Silver Spring, MD 20993, USA
| | - Timothy J. Shafer
- Biomolecular and Computational Toxicology Division, Center for Computational Toxicology and Exposure, US EPA, Research Triangle Park, NC, USA
| | - Lidiya Stavitskaya
- Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research (CDER), U.S. Food and Drug Administration (FDA), Silver Spring, MD 20993, USA
| | | | | | | | - Dan Wilson
- The Dow Chemical Company, Midland, MI 48667, USA
| | | | - Glenn J. Myatt
- Instem, Columbus, OH 43215, USA
- Corresponding author (G.J. Myatt).
|
15
|
Jiang M, Zhou B, Chen L. Identification of drug side effects with a path-based method. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:5754-5771. [PMID: 35603377 DOI: 10.3934/mbe.2022269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
The study of drug side effects is a significant task in drug discovery. Candidate drugs with unacceptable side effects must be eliminated to prevent risks for both patients and pharmaceutical companies. Thus, all side effects of any candidate drug should be determined. However, this task, when carried out through traditional experiments, is time-consuming and expensive. Computational methods have been increasingly used for the identification of drug side effects. In the present study, a new path-based method was proposed to determine drug side effects. A heterogeneous network, in which drugs and side effects are the nodes, was built to support the method. For any drug and side effect, the proposed path-based method determined all paths of limited length that connect them and then evaluated the association between them based on these paths. A strong association indicates that the drug has the side effect with high probability. In two types of jackknife test, the method yielded good performance and was superior to some other network-based methods. Furthermore, the effects of one parameter in the method and of the heterogeneous network were analyzed.
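The path-based idea in this abstract can be illustrated with a minimal Python sketch on a toy heterogeneous network. The abstract does not state the scoring formula, so everything specific here is an assumption made for illustration: the node names, the `max_edges` cutoff, and the length-decayed path count (`decay ** edges`) used as the association score. It is not the authors' implementation.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def simple_paths(graph, source, target, max_edges):
    """Yield every simple path from source to target with at most max_edges edges."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        if len(path) - 1 == max_edges:
            continue  # length limit reached without hitting the target
        for nxt in graph[node]:
            if nxt not in path:  # keep the path simple (no repeated nodes)
                stack.append((nxt, path + [nxt]))


def association_score(graph, drug, side_effect, max_edges=3, decay=0.5):
    """Hypothetical score: each connecting path contributes decay ** (number of edges)."""
    return sum(
        decay ** (len(path) - 1)
        for path in simple_paths(graph, drug, side_effect, max_edges)
    )


# Toy network: drug-drug edges stand for similarity, drug-effect edges for known side effects.
toy = build_graph([
    ("drug:A", "se:nausea"),
    ("drug:A", "drug:B"),
    ("drug:B", "se:nausea"),
    ("drug:B", "se:rash"),
    ("drug:C", "drug:B"),
])

nausea_score = association_score(toy, "drug:C", "se:nausea")  # two paths: 0.25 + 0.125
rash_score = association_score(toy, "drug:C", "se:rash")      # one path: 0.25
print(nausea_score, rash_score)
```

In this toy example drug:C has no recorded side effects, yet it reaches se:nausea by two short paths and se:rash by only one, so nausea receives the higher score. Any real use would need the network construction and scoring defined in the paper itself.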
Affiliation(s)
- Meng Jiang
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Bo Zhou
- Shanghai University of Medicine & Health Sciences, Shanghai 201318, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
|
16
|
Similarity-Based Method with Multiple-Feature Sampling for Predicting Drug Side Effects. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:9547317. [PMID: 35401786 PMCID: PMC8993545 DOI: 10.1155/2022/9547317] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 07/20/2021] [Revised: 09/18/2021] [Accepted: 03/15/2022] [Indexed: 12/23/2022]
Abstract
Drugs can treat different diseases but also bring side effects. Undetected and unaccepted side effects for approved drugs can greatly harm the human body and bring huge risks for pharmaceutical companies. Traditional experimental methods used to determine the side effects have several drawbacks, such as low efficiency and high cost. One alternative to achieve this purpose is to design computational methods. Previous studies modeled a binary classification problem by pairing drugs and side effects; however, their classifiers can only extract one feature from each type of drug association. The present work proposed a novel multiple-feature sampling scheme that can extract several features from one type of drug association. Thirteen classification algorithms were employed to construct classifiers with features yielded by such scheme. Their performance was greatly improved compared with that of the classifiers that use the features yielded by the original scheme. Best performance was observed for the classifier based on random forest with MCC of 0.8661, AUROC of 0.969, and AUPR of 0.977. Finally, one key parameter in the multiple-feature sampling scheme was analyzed.
|
17
|
Hung CM, Shi HY, Lee PH, Chang CS, Rau KM, Lee HM, Tseng CH, Pei SN, Tsai KJ, Chiu CC. Potential and role of artificial intelligence in current medical healthcare. Artif Intell Cancer 2022; 3:1-10. [DOI: 10.35713/aic.v3.i1.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2021] [Revised: 12/31/2021] [Accepted: 02/20/2022] [Indexed: 02/06/2023] Open
Abstract
Artificial intelligence (AI) is defined as the ability of a digital computer or computer-controlled robot to mimic the intelligent conduct and crucial thinking commonly associated with intelligent beings. The application of AI technology and machine learning in medicine has allowed medical practitioners to provide patients with better quality of services, and current advancements have led to a dramatic change in the healthcare system. However, many efficient applications are still in their initial stages and need further evaluation before they can be improved and developed. Clinicians must recognize and acclimate themselves to developments in AI technology to improve their delivery of healthcare services; for this to be possible, a significant revision of medical education is needed to provide future leaders with the required competencies. This article reviews the potential and limitations of AI in healthcare, as well as current medical application trends, including healthcare administration, clinical decision assistance, patient health monitoring, healthcare resource allocation, medical research, and public health policy development. Future possibilities for further clinical and scientific practice are also summarized.
Affiliation(s)
- Chao-Ming Hung
- Department of General Surgery, E-Da Cancer Hospital, Kaohsiung 82445, Taiwan
- College of Medicine, I-Shou University, Kaohsiung 82445, Taiwan
| | - Hon-Yi Shi
- Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
- Department of Business Management, National Sun Yat-Sen University, Kaohsiung 80420, Taiwan
- Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung 80708, Taiwan
- Department of Medical Research, China Medical University Hospital, China Medical University, Taichung 40402, Taiwan
| | - Po-Huang Lee
- College of Medicine, I-Shou University, Kaohsiung 82445, Taiwan
- Department of Surgery, E-Da Hospital, Kaohsiung 82445, Taiwan
| | - Chao-Sung Chang
- Department of Hematology & Oncology, E-Da Cancer Hospital, Kaohsiung 82445, Taiwan
- School of Medicine for International Students, College of Medicine, I-Shou University, Kaohsiung 82445, Taiwan
| | - Kun-Ming Rau
- Department of Hematology & Oncology, E-Da Cancer Hospital, Kaohsiung 82445, Taiwan
- School of Medicine, College of Medicine, I-Shou University, Kaohsiung 82445, Taiwan
| | - Hui-Ming Lee
- Department of General Surgery, E-Da Cancer Hospital, Kaohsiung 82445, Taiwan
- College of Medicine, I-Shou University, Kaohsiung 82445, Taiwan
| | - Cheng-Hao Tseng
- School of Medicine, College of Medicine, I-Shou University, Kaohsiung 82445, Taiwan
- Department of Gastroenterology and Hepatology, E-Da Cancer Hospital, Kaohsiung 82445, Taiwan
- Department of Gastroenterology and Hepatology, E-Da Hospital, Kaohsiung 82445, Taiwan
| | - Sung-Nan Pei
- Department of Hematology & Oncology, E-Da Cancer Hospital, Kaohsiung 82445, Taiwan
- School of Medicine, College of Medicine, I-Shou University, Kaohsiung 82445, Taiwan
| | - Kuen-Jang Tsai
- Department of General Surgery, E-Da Cancer Hospital, Kaohsiung 82445, Taiwan
| | - Chong-Chi Chiu
- Department of General Surgery, E-Da Cancer Hospital, Kaohsiung 82445, Taiwan
- School of Medicine, College of Medicine, I-Shou University, Kaohsiung 82445, Taiwan
- Department of Medical Education and Research, E-Da Cancer Hospital, Kaohsiung 82445, Taiwan
|
18
|
Rajasekhar S, Karuppasamy R, Chanda K. Exploration of potential inhibitors for tuberculosis via structure-based drug design, molecular docking, and molecular dynamics simulation studies. J Comput Chem 2021; 42:1736-1749. [PMID: 34216033 DOI: 10.1002/jcc.26712] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/28/2021] [Revised: 05/28/2021] [Accepted: 06/21/2021] [Indexed: 12/20/2022]
Abstract
Drug resistance in tuberculosis is a major threat to the human population. In the present investigation, we aimed to identify novel and potent benzimidazole molecules to overcome resistance. A series of 20 benzimidazole derivatives was examined for activity as selective antitubercular agents. Initially, the AutoDock Vina algorithm was used to assess the efficacy of the molecules. The results were further enriched by redocking with the Glide algorithm. The binding free energies of the compounds were then calculated by the MM-generalized Born surface area method. Molecular docking studies showed that the benzimidazole derivatives form hydrogen bonds and bind strongly in the active site of the Mycobacterium tuberculosis protein. Note that the ARG308, GLY189, VAL312, LEU403, and LEU190 amino acid residues of the Mycobacterium tuberculosis protein PrpR are involved in binding the benzimidazole ligands. Interestingly, the ligands exhibited the same binding potential to the active site of the PrpR protein complex in both docking programs. In essence, the results suggest that benzimidazole derivatives such as 1p, 1q, and 1t could be more potent and selective antitubercular agents than the standard drug isoniazid. These compounds were then subjected to molecular dynamics simulation to validate their dynamic activity against PrpR. Finally, the inhibitory behavior of the compounds was predicted using a machine learning algorithm trained on a data collection of 15,000 compounds utilizing graph-based signatures. Overall, the study concludes that the designed benzimidazoles can be employed as antitubercular agents. Indeed, the results should help experimental biologists to develop safe and non-toxic drugs against tuberculosis.
Affiliation(s)
- Sreerama Rajasekhar
- Department of Chemistry, School of Advanced Science, Vellore Institute of Technology, Vellore, India
| | - Ramanathan Karuppasamy
- Department of Biotechnology, School of BioSciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
| | - Kaushik Chanda
- Department of Chemistry, School of Advanced Science, Vellore Institute of Technology, Vellore, India
|
19
|
Rashidi HH, Dang LT, Albahra S, Ravindran R, Khan IH. Automated machine learning for endemic active tuberculosis prediction from multiplex serological data. Sci Rep 2021; 11:17900. [PMID: 34504228 PMCID: PMC8429671 DOI: 10.1038/s41598-021-97453-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/22/2020] [Accepted: 08/25/2021] [Indexed: 11/09/2022] Open
Abstract
Serological diagnosis of active tuberculosis (TB) is enhanced by detection of multiple antibodies due to variable immune responses among patients. Clinical interpretation of these complex datasets requires development of suitable algorithms, a time consuming and tedious undertaking addressed by the automated machine learning platform MILO (Machine Intelligence Learning Optimizer). MILO seamlessly integrates data processing, feature selection, model training, and model validation to simultaneously generate and evaluate thousands of models. These models were then further tested for generalizability on out-of-sample secondary and tertiary datasets. Out of 31 antigens evaluated, a 23-antigen model was the most robust on both the secondary dataset (TB vs healthy) and the tertiary dataset (TB vs COPD) with sensitivity of 90.5% and respective specificities of 100.0% and 74.6%. MILO represents a user-friendly, end-to-end solution for automated generation and deployment of optimized models, ideal for applications where rapid clinical implementation is critical such as emerging infectious diseases.
Affiliation(s)
- Hooman H Rashidi
- Department of Pathology and Laboratory Medicine, University of California Davis, 4400 V Street, Sacramento, CA, 95817, USA.
| | - Luke T Dang
- Department of Pathology and Laboratory Medicine, University of California Davis, 4400 V Street, Sacramento, CA, 95817, USA
| | - Samer Albahra
- Department of Pathology and Laboratory Medicine, University of California Davis, 4400 V Street, Sacramento, CA, 95817, USA
| | - Resmi Ravindran
- Department of Pathology and Laboratory Medicine, University of California Davis, 4400 V Street, Sacramento, CA, 95817, USA
| | - Imran H Khan
- Department of Pathology and Laboratory Medicine, University of California Davis, 4400 V Street, Sacramento, CA, 95817, USA.
|
20
|
Gupta R, Srivastava D, Sahu M, Tiwari S, Ambasta RK, Kumar P. Artificial intelligence to deep learning: machine intelligence approach for drug discovery. Mol Divers 2021; 25:1315-1360. [PMID: 33844136 PMCID: PMC8040371 DOI: 10.1007/s11030-021-10217-3] [Citation(s) in RCA: 302] [Impact Index Per Article: 100.7] [Received: 01/29/2021] [Accepted: 03/22/2021] [Indexed: 02/06/2023]
Abstract
Drug designing and development is an important area of research for pharmaceutical companies and chemical scientists. However, low efficacy, off-target delivery, time consumption, and high cost impose a hurdle and challenges that impact drug design and discovery. Further, complex and big data from genomics, proteomics, microarray data, and clinical trials also impose an obstacle in the drug discovery pipeline. Artificial intelligence and machine learning technology play a crucial role in drug discovery and development. In other words, artificial neural networks and deep learning algorithms have modernized the area. Machine learning and deep learning algorithms have been implemented in several drug discovery processes such as peptide synthesis, structure-based virtual screening, ligand-based virtual screening, toxicity prediction, drug monitoring and release, pharmacophore modeling, quantitative structure-activity relationship, drug repositioning, polypharmacology, and physiochemical activity. Evidence from the past strengthens the implementation of artificial intelligence and deep learning in this field. Moreover, novel data mining, curation, and management techniques provided critical support to recently developed modeling algorithms. In summary, artificial intelligence and deep learning advancements provide an excellent opportunity for rational drug design and discovery process, which will eventually impact mankind. The primary concern associated with drug design and development is time consumption and production cost. Further, inefficiency, inaccurate target delivery, and inappropriate dosage are other hurdles that inhibit the process of drug delivery and development. With advancements in technology, computer-aided drug design integrating artificial intelligence algorithms can eliminate the challenges and hurdles of traditional drug design and development. 
Artificial intelligence is referred to as superset comprising machine learning, whereas machine learning comprises supervised learning, unsupervised learning, and reinforcement learning. Further, deep learning, a subset of machine learning, has been extensively implemented in drug design and development. The artificial neural network, deep neural network, support vector machines, classification and regression, generative adversarial networks, symbolic learning, and meta-learning are examples of the algorithms applied to the drug design and discovery process. Artificial intelligence has been applied to different areas of drug design and development process, such as from peptide synthesis to molecule design, virtual screening to molecular docking, quantitative structure-activity relationship to drug repositioning, protein misfolding to protein-protein interactions, and molecular pathway identification to polypharmacology. Artificial intelligence principles have been applied to the classification of active and inactive, monitoring drug release, pre-clinical and clinical development, primary and secondary drug screening, biomarker development, pharmaceutical manufacturing, bioactivity identification and physiochemical properties, prediction of toxicity, and identification of mode of action.
Affiliation(s)
- Rohan Gupta
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
| | - Devesh Srivastava
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
| | - Mehar Sahu
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
| | - Swati Tiwari
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
| | - Rashmi K Ambasta
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
| | - Pravir Kumar
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India.
|
21
|
Zhang F, Sun B, Diao X, Zhao W, Shu T. Prediction of adverse drug reactions based on knowledge graph embedding. BMC Med Inform Decis Mak 2021; 21:38. [PMID: 33541342 PMCID: PMC7863488 DOI: 10.1186/s12911-021-01402-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/16/2020] [Accepted: 01/19/2021] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND: Adverse drug reactions (ADRs) are an important concern in the medication process and can pose a substantial economic burden for patients and hospitals. Because of the limitations of clinical trials, it is difficult to identify all possible ADRs of a drug before it is marketed. We developed a new model based on data mining technology to predict potential ADRs from available drug data.
METHOD: Based on the Word2Vec model from Natural Language Processing, we propose a new knowledge graph embedding method that embeds drugs and ADRs into their respective vectors, and we build a logistic regression classification model to predict whether a given drug will have ADRs.
RESULT: A new knowledge graph embedding method was proposed, and comparison with similar studies showed that our model not only had high prediction accuracy but also had a simpler structure. In our experiments, the AUC of the classification model reached a maximum of 0.87, and the mean AUC was 0.863.
CONCLUSION: In this paper, we introduce a new method that embeds a knowledge graph to vectorize drugs and ADRs, then uses a logistic regression classification model to predict whether there is a causal relationship between them. The experiment showed that knowledge graph embedding can effectively encode drugs and ADRs, and that the proposed ADR prediction system is also very effective.
Affiliation(s)
- Fei Zhang
- Department of Information Center, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 167 North Lishi Road, Xicheng District, Beijing, 100037 China
| | - Bo Sun
- Department of Information Center, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 167 North Lishi Road, Xicheng District, Beijing, 100037 China
| | - Xiaolin Diao
- Department of Information Center, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 167 North Lishi Road, Xicheng District, Beijing, 100037 China
| | - Wei Zhao
- Department of Information Center, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 167 North Lishi Road, Xicheng District, Beijing, 100037 China
| | - Ting Shu
- National Institute of Hospital Administration, National Health Commission, Building 3, Yard 6, Shouti South Road, Haidian, Beijing, 100044 China
|
22
|
Coveney PV, Highfield RR. From digital hype to analogue reality: Universal simulation beyond the quantum and exascale eras. JOURNAL OF COMPUTATIONAL SCIENCE 2020; 46:101093. [PMID: 33312270 PMCID: PMC7709487 DOI: 10.1016/j.jocs.2020.101093] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/10/2020] [Accepted: 03/03/2020] [Indexed: 05/23/2023]
Abstract
Many believe that the future of innovation lies in simulation. However, as computers are becoming ever more powerful, so does the hyperbole used to discuss their potential in modelling across a vast range of domains, from subatomic physics to chemistry, climate science, epidemiology, economics and cosmology. As we are about to enter the era of quantum and exascale computing, machine learning and artificial intelligence have entered the field in a significant way. In this article we give a brief history of simulation, discuss how machine learning can be more powerful if underpinned by deeper mechanistic understanding, outline the potential of exascale and quantum computing, highlight the limits of digital computing - classical and quantum - and distinguish rhetoric from reality in assessing the future of modelling and simulation, when we believe analogue computing will play an increasingly important role.
Affiliation(s)
- Peter V. Coveney
- Centre for Computational Science, University College London, Gordon Street, London, WC1H 0AJ, UK
- Institute for Informatics, Science Park 904, University of Amsterdam, 1098 XH, Amsterdam, Netherlands
| | | |
Collapse
|
23
|
Prediction of Drug Side Effects with a Refined Negative Sample Selection Strategy. Computational and Mathematical Methods in Medicine 2020; 2020:1573543. [PMID: 32454877 PMCID: PMC7232712 DOI: 10.1155/2020/1573543] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 01/28/2020] [Revised: 04/14/2020] [Accepted: 04/23/2020] [Indexed: 01/07/2023]
Abstract
Drugs are an important way to treat various diseases. However, they inevitably produce side effects, posing great risks to patients and to pharmaceutical companies. Predicting the side effects of drugs has therefore become one of the essential problems in drug research, and designing efficient computational methods is one way to address it. Some studies paired the drug and side effect as a sample, thereby modeling the problem as a binary classification problem. However, the selection of negative samples is a key problem in this case. In this study, a novel negative sample selection strategy was designed for obtaining high-quality negative samples. This strategy applied the random walk with restart (RWR) algorithm on a chemical-chemical interaction network to select, as negative samples, pairs of drugs and side effects in which the drug was less likely to have the corresponding side effect. Through several tests with a fixed feature extraction scheme and different machine-learning algorithms, models with selected negative samples produced high performance. The best model even yielded nearly perfect performance. These models had much higher performance than those without such a strategy or with another selection strategy. Furthermore, it is not necessary to consider the balance of positive and negative samples under such a strategy.
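The negative-sample idea in this abstract can be illustrated with a minimal sketch of random walk with restart. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the restart probability of 0.8, the toy four-drug network, and the rule of picking the lowest-scoring drugs as negatives.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.8, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk that restarts at `seeds`.

    adjacency: dict mapping node -> dict of neighbour -> positive edge weight
               (undirected, so every edge is listed in both directions).
    seeds:     non-empty collection of nodes where the walk starts and restarts.
    restart:   probability of jumping back to a seed at each step (assumed value).
    """
    nodes = list(adjacency)
    seed_set = set(seeds)
    start = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    strength = {n: sum(adjacency[n].values()) for n in nodes}

    prob = dict(start)
    for _ in range(max_iter):
        # Restart mass goes back to the seeds; the rest spreads along edges.
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            if strength[u] == 0:
                continue  # isolated node: it has no edges to spread mass along
            share = (1.0 - restart) * prob[u] / strength[u]
            for v, w in adjacency[u].items():
                nxt[v] += share * w
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


def pick_negative_drugs(adjacency, positive_drugs, n_negatives):
    """Hypothetical selection rule: drugs least reachable from the known positives."""
    positives = set(positive_drugs)
    scores = random_walk_with_restart(adjacency, positives)
    candidates = [d for d in adjacency if d not in positives]
    candidates.sort(key=lambda d: (scores[d], d))
    return candidates[:n_negatives]


# Toy chain network: drugA - drugB - drugC - drugD, with drugA known to cause the side effect.
network = {
    "drugA": {"drugB": 1.0},
    "drugB": {"drugA": 1.0, "drugC": 1.0},
    "drugC": {"drugB": 1.0, "drugD": 1.0},
    "drugD": {"drugC": 1.0},
}
negatives = pick_negative_drugs(network, ["drugA"], n_negatives=1)
```

In this toy network the drug farthest from the seed receives the lowest score, so it is the one chosen as a negative sample for the side effect in question.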
|
24
|
Jamal S, Khubaib M, Gangwar R, Grover S, Grover A, Hasnain SE. Artificial Intelligence and Machine learning based prediction of resistant and susceptible mutations in Mycobacterium tuberculosis. Sci Rep 2020; 10:5487. [PMID: 32218465 PMCID: PMC7099008 DOI: 10.1038/s41598-020-62368-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 09/06/2019] [Accepted: 03/13/2020] [Indexed: 11/09/2022]
Abstract
Tuberculosis (TB), an infectious disease caused by Mycobacterium tuberculosis (M.tb), causes the highest number of deaths globally of any bacterial disease, necessitating novel diagnosis and treatment strategies. High-throughput sequencing methods generate a large amount of data which could be exploited in determining mutations associated with multi-drug resistant TB (MDR-TB). The present work is a computational framework that uses artificial intelligence (AI) based machine learning (ML) approaches for predicting resistance in the genes rpoB, inhA, katG, pncA, gyrA and gyrB for the drugs rifampicin, isoniazid, pyrazinamide and fluoroquinolones. The single nucleotide variations were represented by several sequence and structural features that indicate the influence of mutations on the target protein coded by each gene. We used four ML algorithms (naïve Bayes, k-nearest neighbor, support vector machine, and artificial neural network) to build the prediction models. The classification models had an average accuracy of 85% across all examined genes and were evaluated on an external unseen dataset to demonstrate their application. Further, molecular docking and molecular dynamics simulations were performed for wild-type and predicted resistance-causing mutant protein and anti-TB drug complexes to study their impact on the conformation of proteins and to confirm the observed phenotype.
Affiliation(s)
- Salma Jamal
- Jamia Hamdard Institute of Molecular Medicine, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Mohd Khubaib
- Jamia Hamdard Institute of Molecular Medicine, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Rishabh Gangwar
- Jamia Hamdard Institute of Molecular Medicine, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Sonam Grover
- Jamia Hamdard Institute of Molecular Medicine, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Abhinav Grover
- School of Biotechnology, Jawaharlal Nehru University, New Mehrauli Road, New Delhi, 110 067, India
- Seyed E Hasnain
- Jamia Hamdard Institute of Molecular Medicine, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Dr. Reddy's Institute of Life Sciences, University of Hyderabad Campus, Professor C.R. Rao Road, Hyderabad, 500046, India
|
25
|
Spiro A, Fernández García J, Yanover C. Inferring new relations between medical entities using literature curated term co-occurrences. JAMIA Open 2020; 2:378-385. [PMID: 31984370 PMCID: PMC6951958 DOI: 10.1093/jamiaopen/ooz022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2019] [Revised: 06/05/2019] [Accepted: 06/08/2019] [Indexed: 11/17/2022]
Abstract
Objectives: Identifying new relations between medical entities, such as drugs, diseases, and side effects, is typically a resource-intensive task, involving experimentation and clinical trials. The increased availability of related data and curated knowledge enables a computational approach to this task, notably by training models to predict likely relations. Such models rely on meaningful representations of the medical entities being studied. We propose a generic features vector representation that leverages co-occurrences of medical terms, linked with PubMed citations. Materials and Methods: We demonstrate the usefulness of the proposed representation by inferring two types of relations: a drug causes a side effect and a drug treats an indication. To predict these relations and assess their effectiveness, we applied 2 modeling approaches: multi-task modeling using neural networks and single-task modeling based on gradient boosting machines and logistic regression. Results: These trained models, which predict either side effects or indications, obtained significantly better results than baseline models that use a single direct co-occurrence feature. The results demonstrate the advantage of a comprehensive representation. Discussion: Selecting the appropriate representation has an immense impact on the predictive performance of machine learning models. Our proposed representation is powerful, as it spans multiple medical domains and can be used to predict a wide range of relation types. Conclusion: The discovery of new relations between various medical entities can be translated into meaningful insights, for example, related to drug development or disease understanding. Our representation of medical entities can be used to train models that predict such relations, thus accelerating healthcare-related discoveries.
Affiliation(s)
- Adam Spiro
- Machine Learning for Healthcare and Life Sciences, Department of Health Informatics, IBM Research, Haifa, Israel
- Jonatan Fernández García
- Machine Learning for Healthcare and Life Sciences, Department of Health Informatics, IBM Research, Haifa, Israel
- Chen Yanover
- Machine Learning for Healthcare and Life Sciences, Department of Health Informatics, IBM Research, Haifa, Israel
|
26
|
Caldera M, Müller F, Kaltenbrunner I, Licciardello MP, Lardeau CH, Kubicek S, Menche J. Mapping the perturbome network of cellular perturbations. Nat Commun 2019; 10:5140. [PMID: 31723137 PMCID: PMC6853941 DOI: 10.1038/s41467-019-13058-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 04/26/2019] [Accepted: 10/15/2019] [Indexed: 12/15/2022]
Abstract
Drug combinations provide effective treatments for diverse diseases, but also represent a major cause of adverse reactions. Currently there is no systematic understanding of how the complex cellular perturbations induced by different drugs influence each other. Here, we introduce a mathematical framework for classifying any interaction between perturbations with high-dimensional effects into 12 interaction types. We apply our framework to a large-scale imaging screen of cell morphology changes induced by diverse drugs and their combination, resulting in a perturbome network of 242 drugs and 1832 interactions. Our analysis of the chemical and biological features of the drugs reveals distinct molecular fingerprints for each interaction type. We find a direct link between drug similarities on the cell morphology level and the distance of their respective protein targets within the cellular interactome of molecular interactions. The interactome distance is also predictive for different types of drug interactions.
Affiliation(s)
- Michael Caldera
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14, AKH BT 25.3, A-1090, Vienna, Austria
- Felix Müller
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14, AKH BT 25.3, A-1090, Vienna, Austria
- Isabel Kaltenbrunner
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14, AKH BT 25.3, A-1090, Vienna, Austria
- Marco P Licciardello
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14, AKH BT 25.3, A-1090, Vienna, Austria
- Cancer Research UK Cancer Therapeutics Unit, The Institute of Cancer Research, London, UK
- Charles-Hugues Lardeau
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14, AKH BT 25.3, A-1090, Vienna, Austria
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Alderley Park, Macclesfield, UK
- Stefan Kubicek
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14, AKH BT 25.3, A-1090, Vienna, Austria
- Jörg Menche
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14, AKH BT 25.3, A-1090, Vienna, Austria
|
27
|
Meng HY, Jin WL, Yan CK, Yang H. The Application of Machine Learning Techniques in Clinical Drug Therapy. Curr Comput Aided Drug Des 2019; 15:111-119. [PMID: 29804538 DOI: 10.2174/1573409914666180525124608] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/17/2018] [Revised: 05/15/2018] [Accepted: 05/22/2018] [Indexed: 12/19/2022]
Abstract
Introduction: The development of a novel drug is an extremely complicated process that includes the target identification, design and manufacture, and proper therapy of the novel drug, as well as drug dose selection, drug efficacy evaluation, and adverse drug reaction control. Due to the limited resources, high costs, long duration, and low hit-to-lead ratio in the development of pharmacogenetics and computer technology, machine learning techniques have assisted novel drug development and have gradually received more attention from researchers. Methods: According to current research, machine learning techniques are widely applied in the process of the discovery of new drugs and novel drug targets, the decision surrounding proper therapy and drug dose, and the prediction of drug efficacy and adverse drug reactions. Results and Conclusion: In this article, we discussed the history, workflow, and advantages and disadvantages of machine learning techniques in the processes mentioned above. Although the advantages of machine learning techniques are fairly obvious, the application of machine learning techniques is currently limited. With further research, the application of machine learning techniques in drug development could be much more widespread and could potentially be one of the major methods used in drug development.
Affiliation(s)
- Huan-Yu Meng
- Department of Neurology, Xiangya Hospital of Central South University, Changsha, China
- Wan-Lin Jin
- Department of Neurology, Xiangya Hospital of Central South University, Changsha, China
- Cheng-Kai Yan
- Department of Neurology, Xiangya Hospital of Central South University, Changsha, China
- Huan Yang
- Department of Neurology, Xiangya Hospital of Central South University, Changsha, China
|
28
|
Jamal S, Ali W, Nagpal P, Grover S, Grover A. Computational models for the prediction of adverse cardiovascular drug reactions. J Transl Med 2019; 17:171. [PMID: 31118067 PMCID: PMC6530172 DOI: 10.1186/s12967-019-1918-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/24/2019] [Accepted: 05/10/2019] [Indexed: 02/06/2023]
Abstract
Background: Predicting adverse drug reactions (ADRs) has become very important owing to the huge global health burden and failure of drugs. This indicates a need for prior prediction of probable ADRs in preclinical stages, which can reduce drug failures and the time and cost of development, thus providing efficient and safer therapeutic options for patients. Though several approaches have been put forward for in silico ADR prediction, there is still room for improvement. Methods: In the present work, we used a machine learning-based approach for cardiovascular (CV) ADR prediction by integrating different features of drugs, biological (drug transporters, targets and enzymes), chemical (substructure fingerprints) and phenotypic (therapeutic indications and other identified ADRs), and their two- and three-level combinations. To recognize quality and important features, we used the minimum redundancy maximum relevance approach, while the synthetic minority over-sampling technique was used to balance the training sets. Results: This is a rigorous and comprehensive study which involved the generation of a total of 504 computational models for 36 CV ADRs using two state-of-the-art machine-learning algorithms: random forest and sequential minimal optimization. All the models had an accuracy of around 90%, and the models using biological and chemical features were more informative than the models generated using chemical features. Conclusions: The results obtained demonstrated that the predictive models generated in the present study were highly accurate, and the phenotypic information of the drugs played the most important role in ADR prediction. Furthermore, the results also showed that using the proposed method, different drug properties can be combined to build computational predictive models which can effectively predict potential ADRs during early stages of drug development.
Electronic supplementary material The online version of this article (10.1186/s12967-019-1918-z) contains supplementary material, which is available to authorized users.
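The class-balancing step mentioned in the Methods (synthetic minority over-sampling, SMOTE) works by interpolating between a minority-class sample and one of its nearest minority-class neighbours. The sketch below shows only that interpolation idea. The function name, the choice of `k`, the fixed random seed, and the toy points are assumptions for illustration, not the authors' settings.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic points from `minority` (a list of equal-length tuples).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)

    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class: four drugs described by two numeric features.
minority_samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_samples = smote_sketch(minority_samples, n_new=5, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class.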
Affiliation(s)
- Salma Jamal
- JH-Institute of Molecular Medicine, Jamia Hamdard, New Delhi, India
- Waseem Ali
- JH-Institute of Molecular Medicine, Jamia Hamdard, New Delhi, India
- Priya Nagpal
- Department of Biotechnology, Jamia Millia Islamia, New Delhi, India
- Sonam Grover
- JH-Institute of Molecular Medicine, Jamia Hamdard, New Delhi, India
- Abhinav Grover
- School of Biotechnology, Jawaharlal Nehru University, New Delhi, India
|
29
|
Song X, Waitman LR, Hu Y, Yu ASL, Robins D, Liu M. Robust clinical marker identification for diabetic kidney disease with ensemble feature selection. J Am Med Inform Assoc 2019; 26:242-253. [PMID: 30602020 PMCID: PMC7792755 DOI: 10.1093/jamia/ocy165] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 06/17/2018] [Revised: 11/05/2018] [Accepted: 11/21/2018] [Indexed: 11/15/2022]
Abstract
Objective: Diabetic kidney disease (DKD) is one of the most frequent complications in diabetes associated with substantial morbidity and mortality. To accelerate DKD risk factor discovery, we present an ensemble feature selection approach to identify a robust set of discriminant factors using electronic medical records (EMRs). Material and Methods: We identified a retrospective cohort of 15 645 adult patients with type 2 diabetes, excluding those with pre-existing kidney disease, and utilized all available clinical data types in modeling. We compared 3 machine-learning-based embedded feature selection methods in conjunction with 6 feature ensemble techniques for selecting top-ranked features in terms of robustness to data perturbations and predictability for DKD onset. Results: The gradient boosting machine (GBM) with weighted mean rank feature ensemble technique achieved the best performance with an AUC of 0.82 [95%-CI, 0.81-0.83] on internal validation and 0.71 [95%-CI, 0.68-0.73] on external temporal validation. The ensemble model identified a set of 440 features from 84 872 unique clinical features that are both predictive of DKD onset and robust against data perturbations, including 191 labs, 51 visit details (mainly vital signs), 39 medications, 34 orders, 30 diagnoses, and 95 other clinical features. Discussion: Many of the top-ranked features have not been included in the state-of-the-art DKD prediction models, but their relationships with kidney function have been suggested in existing literature. Conclusion: Our ensemble feature selection framework provides an option for identifying a robust and parsimonious feature set unbiasedly from EMR data, which effectively aids in knowledge discovery for DKD risk factors.
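A weighted mean rank ensemble of the kind named in the Results can be sketched as follows: each fitted model ranks the features, and the ranks are averaged with one weight per model. How the weights were actually chosen is not stated in the abstract. Using a per-model validation score as the weight, along with the function name and the feature names, is an assumption for illustration.

```python
def weighted_mean_rank(rankings, weights):
    """Combine several feature rankings into one ordering.

    rankings: list of dicts mapping feature name -> rank (1 = most important);
              every dict must cover the same features.
    weights:  one positive weight per ranking (assumed here to be a model score).
    Returns the feature names sorted from best to worst aggregate rank.
    """
    total_weight = sum(weights)
    aggregate = {}
    for feature in rankings[0]:
        weighted_sum = sum(w * ranks[feature] for ranks, w in zip(rankings, weights))
        aggregate[feature] = weighted_sum / total_weight
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(aggregate, key=lambda f: (aggregate[f], f))


# Two hypothetical models fitted on resampled data disagree about the top feature.
model_rankings = [
    {"hba1c": 1, "egfr": 2, "bmi": 3},
    {"egfr": 1, "hba1c": 2, "bmi": 3},
]
model_weights = [0.8, 0.7]  # illustrative validation scores
ordered_features = weighted_mean_rank(model_rankings, model_weights)
```

Here the first model carries slightly more weight, so its top feature comes out ahead in the combined ordering even though the two models disagree.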
Affiliation(s)
- Xing Song
- Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, Kansas, USA
- Lemuel R Waitman
- Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, Kansas, USA
- Yong Hu
- Big Data Decision Institute, Jinan University, Guangzhou, PRC
- Alan S L Yu
- Division of Nephrology and Hypertension and the Kidney Institute, University of Kansas Medical Center, Kansas City, Kansas, USA
- David Robins
- Diabetes Institute, University of Kansas Medical Center, Kansas City, Kansas, USA
- Mei Liu
- Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, Kansas, USA
|
30
|
Schotland P, Racz R, Jackson D, Levin R, Strauss DG, Burkhart K. Target-Adverse Event Profiles to Augment Pharmacovigilance: A Pilot Study With Six New Molecular Entities. CPT: Pharmacometrics & Systems Pharmacology 2018; 7:809-817. [PMID: 30354029 PMCID: PMC6310867 DOI: 10.1002/psp4.12356] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 06/11/2018] [Accepted: 09/06/2018] [Indexed: 12/20/2022]
Abstract
Clinical trials can fail to detect rare adverse events (AEs). We assessed the ability of pharmacological target adverse‐event (TAE) profiles to predict AEs on US Food and Drug Administration (FDA) drug labels at least 4 years after approval. TAE profiles were generated by aggregating AEs from the FDA adverse event reporting system (FAERS) reports and the FDA drug labels for drugs that hit a common target. A genetic algorithm (GA) was used to choose the adverse event (AE) case count (N), disproportionality score in FAERS (proportional reporting ratio (PRR)), and percent of comparator drug labels with an AE to maximize F‐measure. With FAERS data alone, precision, recall, and specificity were 0.57, 0.78, and 0.61, respectively. After including FDA drug label data, precision, recall, and specificity improved to 0.67, 0.81, and 0.71, respectively. Eighteen of 23 (78%) postmarket label changes were identified correctly. TAE analysis shows promise as a method to predict AEs at the time of drug approval.
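The proportional reporting ratio (PRR) used here as the disproportionality score is computed from a 2x2 table of spontaneous reports. The sketch below uses the standard definition. The function names, the example counts, and the threshold values are assumptions for illustration, since the abstract does not give the cut-offs chosen by the genetic algorithm.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2x2 report table.

    a: reports mentioning the drug and the adverse event
    b: reports mentioning the drug and any other event
    c: reports mentioning other drugs and the adverse event
    d: reports mentioning other drugs and any other event
    """
    if a + b == 0 or c == 0:
        raise ValueError("PRR is undefined without drug reports or comparator events")
    rate_with_drug = a / (a + b)
    rate_with_others = c / (c + d)
    return rate_with_drug / rate_with_others


def passes_thresholds(case_count, prr, min_cases, min_prr):
    """Flag a drug-event pair when both the case count and the PRR are high enough."""
    return case_count >= min_cases and prr >= min_prr


# Illustrative counts: 20 of 100 reports for the drug mention the event,
# versus 100 of 10 000 reports for all other drugs.
prr = proportional_reporting_ratio(a=20, b=80, c=100, d=9900)
flagged = passes_thresholds(case_count=20, prr=prr, min_cases=3, min_prr=2.0)
```

In this example the event is reported about twenty times more often with the drug than with the comparators, so the pair passes both illustrative thresholds.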
Affiliation(s)
- Peter Schotland
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Office of Translational Science, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, USA
- Rebecca Racz
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Office of Translational Science, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, USA
- Robert Levin
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, USA
- David G Strauss
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Office of Translational Science, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, USA
- Keith Burkhart
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Office of Translational Science, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, USA
|
31
|
Mower J, Subramanian D, Cohen T. Learning predictive models of drug side-effect relationships from distributed representations of literature-derived semantic predications. J Am Med Inform Assoc 2018; 25:1339-1350. [PMID: 30010902 PMCID: PMC6454491 DOI: 10.1093/jamia/ocy077] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 01/24/2018] [Revised: 04/23/2018] [Accepted: 06/05/2018] [Indexed: 02/01/2023]
Abstract
Objective: The aim of this work is to leverage relational information extracted from biomedical literature using a novel synthesis of unsupervised pretraining, representational composition, and supervised machine learning for drug safety monitoring. Methods: Using ≈80 million concept-relationship-concept triples extracted from the literature using the SemRep Natural Language Processing system, distributed vector representations (embeddings) were generated for concepts as functions of their relationships utilizing two unsupervised representational approaches. Embeddings for drugs and side effects of interest from two widely used reference standards were then composed to generate embeddings of drug/side-effect pairs, which were used as input for supervised machine learning. This methodology was developed and evaluated using cross-validation strategies and compared to contemporary approaches. To qualitatively assess generalization, models trained on the Observational Medical Outcomes Partnership (OMOP) drug/side-effect reference set were evaluated against a list of ≈1100 drugs from an online database. Results: The employed method improved performance over previous approaches. Cross-validation results advance the state of the art (AUC 0.96; F1 0.90 and AUC 0.95; F1 0.84 across the two sets), outperforming methods utilizing literature and/or spontaneous reporting system data. Examination of predictions for unseen drug/side-effect pairs indicates the ability of these methods to generalize, with over tenfold label support enrichment in the top 100 predictions versus the bottom 100 predictions. Discussion and Conclusion: Our methods can assist the pharmacovigilance process using information from the biomedical literature. Unsupervised pretraining generates a rich relationship-based representational foundation for machine learning techniques to classify drugs in the context of a putative side effect, given known examples.
Affiliation(s)
- Justin Mower
- Baylor College of Medicine, Quantitative and Computational Biosciences, Houston, Texas, USA
- Trevor Cohen
- School of Biomedical Informatics, University of Texas Health Science Center Houston, Texas, USA
|
32
|
Kastrin A, Ferk P, Leskošek B. Predicting potential drug-drug interactions on topological and semantic similarity features using statistical learning. PLoS One 2018; 13:e0196865. [PMID: 29738537 PMCID: PMC5940181 DOI: 10.1371/journal.pone.0196865] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 10/06/2017] [Accepted: 04/20/2018] [Indexed: 01/03/2023]
Abstract
A drug-drug interaction (DDI) is a change in the effect of a drug when a patient takes another drug. Characterizing DDIs is extremely important to avoid potential adverse drug reactions. We represent DDIs as a complex network in which nodes refer to drugs and links refer to their potential interactions. Recently, the problem of link prediction has attracted much attention in the scientific community. We represent the process of link prediction as a binary classification task on networks of potential DDIs. We use link prediction techniques for predicting unknown interactions between drugs in five arbitrarily chosen large-scale DDI databases, namely DrugBank, KEGG, NDF-RT, SemMedDB, and Twosides. We estimated the performance of link prediction using a series of experiments on DDI networks. We performed link prediction using unsupervised and supervised approaches, the latter including classification tree, k-nearest neighbors, support vector machine, random forest, and gradient boosting machine classifiers based on topological and semantic similarity features. The supervised approach clearly outperforms the unsupervised approach. The Twosides network achieved the best prediction performance in terms of the area under the precision-recall curve (0.93 for both random forests and gradient boosting machine). The applied methodology can be used as a tool to help researchers identify potential DDIs. The supervised link prediction approach proved to be promising for predicting potential DDIs and may facilitate their identification in clinical research.
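To make the link-prediction framing concrete, the sketch below computes three common topological similarity scores for a candidate drug pair in a DDI network. Scores like these can serve as unsupervised predictors on their own or as input features for a classifier. The abstract does not list which topological features were used, so the choice of common neighbours, Jaccard coefficient, and Adamic-Adar index, along with the function name and the toy network, is an assumption for illustration.

```python
import math


def topological_features(graph, u, v):
    """Similarity scores for the node pair (u, v).

    graph: dict mapping each drug to the set of drugs it is known to interact with
           (undirected, so every interaction is listed for both drugs).
    """
    neighbours_u, neighbours_v = graph[u], graph[v]
    common = neighbours_u & neighbours_v
    union = neighbours_u | neighbours_v
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Adamic-Adar: shared neighbours with few links of their own count more.
        "adamic_adar": sum(
            1.0 / math.log(len(graph[z])) for z in common if len(graph[z]) > 1
        ),
    }


# Toy DDI network: drugs a and b are not linked but share partners c and d.
ddi_network = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b"},
    "e": {"b"},
}
pair_features = topological_features(ddi_network, "a", "b")
```

A supervised model would be trained on such feature vectors, with known interacting pairs as positives and sampled non-interacting pairs as negatives.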
Affiliation(s)
- Andrej Kastrin
- Institute of Biostatistics and Medical Informatics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Polonca Ferk
- Institute of Biostatistics and Medical Informatics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Brane Leskošek
- Institute of Biostatistics and Medical Informatics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
|
33
|
Zhu Y, Elemento O, Pathak J, Wang F. Drug knowledge bases and their applications in biomedical informatics research. Brief Bioinform 2018; 20:1308-1321. [DOI: 10.1093/bib/bbx169] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/08/2017] [Revised: 11/15/2017] [Indexed: 11/14/2022]
Abstract
Recent advances in biomedical research have generated a large volume of drug-related data. To effectively handle this flood of data, many initiatives have been taken to help researchers make good use of them. As a result of these initiatives, many drug knowledge bases have been constructed. They range from simple ones with specific focuses to comprehensive ones that contain information on almost every aspect of a drug. These curated drug knowledge bases have made significant contributions to the development of efficient and effective health information technologies for better health-care service delivery. Understanding and comparing existing drug knowledge bases and how they are applied in various biomedical studies will help us recognize the state of the art and design better knowledge bases in the future. In addition, researchers can gain insights into novel applications of the drug knowledge bases through a review of successful use cases. In this study, we provide a review of existing popular drug knowledge bases and their applications in drug-related studies. We discuss challenges in constructing and using drug knowledge bases as well as future research directions toward a better ecosystem of drug knowledge bases.
|