1
Zhai S, Tan Y, Zhu C, Zhang C, Gao Y, Mao Q, Zhang Y, Duan H, Yin Y. PepExplainer: An explainable deep learning model for selection-based macrocyclic peptide bioactivity prediction and optimization. Eur J Med Chem 2024; 275:116628. [PMID: 38944933; DOI: 10.1016/j.ejmech.2024.116628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 06/21/2024] [Accepted: 06/24/2024] [Indexed: 07/02/2024]
Abstract
Macrocyclic peptides possess unique features, making them highly promising as a drug modality. However, evaluating their bioactivity through wet lab experiments is generally resource-intensive and time-consuming. Despite advancements in artificial intelligence (AI) for bioactivity prediction, challenges remain due to limited data availability and the interpretability issues in deep learning models, often leading to less-than-ideal predictions. To address these challenges, we developed PepExplainer, an explainable graph neural network based on substructure mask explanation (SME). This model excels at deciphering amino acid substructures, translating macrocyclic peptides into detailed molecular graphs at the atomic level, and efficiently handling non-canonical amino acids and complex macrocyclic peptide structures. PepExplainer's effectiveness is enhanced by utilizing the correlation between peptide enrichment data from a selection-based focused library and bioactivity data, and by employing transfer learning to improve bioactivity predictions of macrocyclic peptides against the IL-17C/IL-17RE interaction. Additionally, PepExplainer underwent further validation for bioactivity prediction using an additional set of thirteen newly synthesized macrocyclic peptides. Moreover, it enabled the optimization of the IC50 of a macrocyclic peptide, reducing it from 15 nM to 5.6 nM based on the contribution score provided by PepExplainer. This achievement underscores PepExplainer's skill in deciphering complex molecular patterns, highlighting its potential to accelerate the discovery and optimization of macrocyclic peptides.
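To put the reported IC50 change in context, the short sketch below converts the two values from the abstract (15 nM and 5.6 nM) into a fold improvement and a pIC50 difference. The helper name `pic50` is illustrative and not part of PepExplainer; the conversion itself is the standard definition pIC50 = -log10(IC50 in mol/L).

```python
import math

def pic50(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_nm * 1e-9)

# IC50 values quoted in the abstract (before and after optimization).
before_nm, after_nm = 15.0, 5.6

fold_improvement = before_nm / after_nm
delta_pic50 = pic50(after_nm) - pic50(before_nm)

print(f"fold improvement: {fold_improvement:.2f}")
print(f"pIC50: {pic50(before_nm):.2f} -> {pic50(after_nm):.2f} (delta {delta_pic50:.2f})")
```

On this scale the optimization corresponds to roughly a 2.7-fold gain in potency, or a bit over 0.4 log units.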
Affiliation(s)
- Silong Zhai
  - School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- Yahong Tan
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
- Cheng Zhu
  - School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- Chengyun Zhang
  - School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- Yan Gao
  - Qilu Institute of Technology, Jinan, 250200, China
- Qingyi Mao
  - School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- Youming Zhang
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
- Hongliang Duan
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China.
- Yizhen Yin
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China; Shandong Research Institute of Industrial Technology, Jinan, 250101, China.
2
Wei Z, Wang X, Lu L, Li S, Long W, Zhang L, Shen S. Construction of an Early Risk Prediction Model for Type 2 Diabetic Peripheral Neuropathy Based on Random Forest. Comput Inform Nurs 2024; 42:665-674. [PMID: 38913980; DOI: 10.1097/cin.0000000000001157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Diabetic peripheral neuropathy is a major cause of disability and death in the later stages of diabetes. A retrospective chart review was performed using a hospital-based electronic medical record database to identify 1020 patients who met the criteria. The objective of this study was to explore and analyze the early risk factors for peripheral neuropathy in patients with type 2 diabetes, even in the absence of specific clinical symptoms or signs. Finally, the random forest algorithm was used to rank the influencing factors and construct a predictive model, and then the model performance was evaluated. Logistic regression analysis revealed that vitamin D plays a crucial protective role in preventing diabetic peripheral neuropathy. The top three risk factors with significant contributions to the model in the random forest algorithm eigenvalue ranking were glycosylated hemoglobin, disease duration, and vitamin D. The area under the receiver operating characteristic curve of the model was 0.90. The accuracy, precision, specificity, and sensitivity were 0.85, 0.83, 0.92, and 0.71, respectively. The predictive model, which is based on the random forest algorithm, is intended to support clinical decision-making by healthcare professionals and help them target timely interventions to key factors in early diabetic peripheral neuropathy.
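The four reported metrics all derive from a single confusion matrix. The sketch below shows how they relate; the counts are hypothetical, chosen only so that the ratios land near the reported values, and are not taken from the study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # all correct calls / all cases
        "precision": tp / (tp + fp),        # true positives / predicted positives
        "sensitivity": tp / (tp + fn),      # true positives / actual positives
        "specificity": tn / (tn + fp),      # true negatives / actual negatives
    }

# Hypothetical counts for illustration only (not data from the paper).
metrics = classification_metrics(tp=50, fp=10, tn=120, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A high specificity paired with a lower sensitivity, as in the abstract, means the model rarely flags unaffected patients but misses a larger share of true cases.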
Affiliation(s)
- Zhengang Wei
  - Author Affiliations: Department of Nursing, Affiliated Hospital of Zunyi Medical University (Mr Wei; Mss Lu, Long, and Zhang; and Dr Shen); Department of Endocrinology and Metabolic Diseases, Affiliated Hospital of Zunyi Medical University (Ms Li); and Department of Information Technology, Affiliated Hospital of Zunyi Medical University (Dr Wang), China
3
Zhang R, Zhu H, Chen M, Sang W, Lu K, Li Z, Wang C, Zhang L, Yin FF, Yang Z. A dual-radiomics model for overall survival prediction in early-stage NSCLC patient using pre-treatment CT images. Front Oncol 2024; 14:1419621. [PMID: 39206157; PMCID: PMC11349529; DOI: 10.3389/fonc.2024.1419621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Accepted: 07/26/2024] [Indexed: 09/04/2024] [Open Access]
Abstract
Introduction: Radiation therapy (RT) is one of the primary treatment options for early-stage non-small cell lung cancer (ES-NSCLC). Therefore, accurately predicting the overall survival (OS) rate following radiotherapy is crucial for implementing personalized treatment strategies. This work aims to develop a dual-radiomics (DR) model to (1) predict 3-year OS in ES-NSCLC patients receiving RT using pre-treatment CT images, and (2) provide explanations between feature importance and model prediction performance.
Methods: The publicly available TCIA Lung1 dataset, comprising 132 ES-NSCLC patients who received RT, was studied: 89 and 43 patients were in the under- and over-3-year OS groups, respectively. For each patient, two types of radiomic features were examined: 56 handcrafted radiomic features (HRFs) extracted within the gross tumor volume, and 512 image deep features (IDFs) extracted using a pre-trained U-Net encoder. They were combined as inputs to an explainable boosting machine (EBM) model for OS prediction. The EBM's mean absolute scores for HRFs and IDFs were used as feature importance explanations. To evaluate the identified feature importance, the DR model was compared with EBM using either (1) the key or (2) the non-key feature type only. Comparison studies with other models, including support vector machine (SVM) and random forest (RF), were also included. The performance was evaluated by the area under the receiver operating characteristic curve (AUCROC), accuracy, sensitivity, and specificity with a 100-fold Monte Carlo cross-validation.
Results: The DR model showed the highest performance in predicting 3-year OS (AUCROC=0.81 ± 0.04), and EBM scores suggested that IDFs showed significantly greater importance (normalized mean score=0.0019) than HRFs (score=0.0008). The comparison studies showed that EBM with the key feature type (IDFs-only) demonstrated comparable AUCROC results (0.81 ± 0.04), while EBM with the non-key feature type (HRFs-only) showed limited AUCROC (0.64 ± 0.10). The results suggested that the feature importance score identified by EBM is highly correlated with OS prediction performance. Both SVM and RF models were unable to explain the key feature type while showing limited overall AUCROC=0.66 ± 0.07 and 0.77 ± 0.06, respectively. Accuracy, sensitivity, and specificity showed a similar trend.
Discussion: In conclusion, a DR model was successfully developed to predict ES-NSCLC OS based on pre-treatment CT images. The results suggested that the feature importance from the DR model is highly correlated with the model prediction power.
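The "100-fold Monte Carlo cross-validation" mentioned in the Methods can be read as 100 independent random train/test splits, with the metric reported as mean ± standard deviation across repeats. The sketch below illustrates that idea under stated assumptions: the 30% test fraction, the stratified splitting, the synthetic one-dimensional feature, and the nearest-mean classifier are all placeholders, not the authors' pipeline. Only the group sizes (89 and 43) come from the abstract.

```python
import random
import statistics

def monte_carlo_splits(labels, n_repeats=100, test_fraction=0.3, seed=0):
    """Yield (train_idx, test_idx) for repeated stratified random splits."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for _ in range(n_repeats):
        train, test = [], []
        for idx in by_class.values():
            shuffled = idx[:]
            rng.shuffle(shuffled)
            n_test = max(1, round(len(shuffled) * test_fraction))
            test.extend(shuffled[:n_test])
            train.extend(shuffled[n_test:])
        yield train, test

def nearest_mean_accuracy(x, y, train, test):
    """Fit class means on the training indices, score accuracy on the test indices."""
    mean0 = statistics.mean(x[i] for i in train if y[i] == 0)
    mean1 = statistics.mean(x[i] for i in train if y[i] == 1)
    correct = 0
    for i in test:
        predicted = 1 if abs(x[i] - mean1) < abs(x[i] - mean0) else 0
        correct += predicted == y[i]
    return correct / len(test)

# Synthetic stand-in data: group sizes from the abstract, feature values invented.
rng = random.Random(42)
y = [0] * 89 + [1] * 43
x = [rng.gauss(1.5 if label else 0.0, 1.0) for label in y]

scores = [nearest_mean_accuracy(x, y, tr, te) for tr, te in monte_carlo_splits(y)]
print(f"accuracy = {statistics.mean(scores):.2f} +/- {statistics.stdev(scores):.2f}")
```

Unlike k-fold cross-validation, the test sets of different repeats may overlap, which is why the spread across repeats is reported alongside the mean.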
Affiliation(s)
- Rihui Zhang
  - Medical Physics Graduate Program, Duke Kunshan University, Kunshan, Jiangsu, China
- Haiming Zhu
  - Medical Physics Graduate Program, Duke Kunshan University, Kunshan, Jiangsu, China
- Minbin Chen
  - Department of Radiotherapy & Oncology, The First People’s Hospital of Kunshan, Kunshan, Jiangsu, China
- Weiwei Sang
  - Medical Physics Graduate Program, Duke Kunshan University, Kunshan, Jiangsu, China
- Ke Lu
  - Department of Radiation Oncology, Duke University, Durham, NC, United States
- Zhen Li
  - Radiation Oncology Department, Shanghai Sixth People’s Hospital, Shanghai, China
- Chunhao Wang
  - Department of Radiation Oncology, Duke University, Durham, NC, United States
- Lei Zhang
  - Medical Physics Graduate Program, Duke Kunshan University, Kunshan, Jiangsu, China
- Fang-Fang Yin
  - Medical Physics Graduate Program, Duke Kunshan University, Kunshan, Jiangsu, China
- Zhenyu Yang
  - Medical Physics Graduate Program, Duke Kunshan University, Kunshan, Jiangsu, China
4
Rodoplu Solovchuk D. Advances in AI-assisted biochip technology for biomedicine. Biomed Pharmacother 2024; 177:116997. [PMID: 38943990; DOI: 10.1016/j.biopha.2024.116997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Revised: 06/13/2024] [Accepted: 06/15/2024] [Indexed: 07/01/2024] [Open Access]
Abstract
The integration of biochips with AI opened up new possibilities and is expected to revolutionize smart healthcare tools within the next five years. The combination of miniaturized, multi-functional, rapid, high-throughput sample processing and sensing capabilities of biochips, with the computational data processing and predictive power of AI, allows medical professionals to collect and analyze vast amounts of data quickly and efficiently, leading to more accurate and timely diagnoses and prognostic evaluations. Biochips, as smart healthcare devices, offer continuous monitoring of patient symptoms. Integrated virtual assistants have the potential to send predictive feedback to users and healthcare practitioners, paving the way for personalized and predictive medicine. This review explores the current state-of-the-art biochip technologies including gene-chips, organ-on-a-chips, and neural implants, and the diagnostic and therapeutic utility of AI-assisted biochips in medical practices such as cancer, diabetes, infectious diseases, and neurological disorders. Choosing the appropriate AI model for a specific biomedical application, and possible solutions to the current challenges are explored. Surveying advances in machine learning models for biochip functionality, this paper offers a review of biochips for the future of biomedicine, an essential guide for keeping up with trends in healthcare, while inspiring cross-disciplinary collaboration among biomedical engineering, medicine, and machine learning fields.
Affiliation(s)
- Didem Rodoplu Solovchuk
  - Institute of Biomedical Engineering and Nanomedicine, National Health Research Institutes, Zhunan, Miaoli 35053, Taiwan.
5
Shah SK, Chaple DR, Masand VH, Jawarkar RD, Chaudhari S, Abiramasundari A, Zaki MEA, Al-Hussain SA. Multi-Target In-Silico modeling strategies to discover novel angiotensin converting enzyme and neprilysin dual inhibitors. Sci Rep 2024; 14:15991. [PMID: 38987327; PMCID: PMC11237057; DOI: 10.1038/s41598-024-66230-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Accepted: 06/28/2024] [Indexed: 07/12/2024] [Open Access]
Abstract
Cardiovascular diseases, including heart failure, stroke, and hypertension, affect 608 million people worldwide and cause 32% of deaths. Combination therapy is required in 60% of patients, involving concurrent Renin-Angiotensin-Aldosterone-System (RAAS) and Neprilysin inhibition. This study introduces a novel multi-target in-silico modeling technique (mt-QSAR) to evaluate the inhibitory potential against Neprilysin and Angiotensin-converting enzymes. Using both linear (GA-LDA) and non-linear (RF) algorithms, mt-QSAR classification models were developed using 983 chemicals to predict inhibitory effects on Neprilysin and Angiotensin-converting enzymes. The Box-Jenkins method, a feature selection method, and machine learning algorithms were employed to obtain the most predictive model with ~90% overall accuracy. Additionally, the study employed virtual screening of designed scaffolds (Chalcone and its analogues, 1,3-Thiazole, 1,3,4-Thiadiazole), applying the developed mt-QSAR models and molecular docking. The identified virtual hits underwent successive filtration steps, incorporating assessments of drug-likeness, ADMET profiles, and synthetic accessibility tools. Finally, molecular dynamics simulations were used to identify and rank the most favourable compounds. The data acquired from this study may provide crucial direction for the identification of new multi-targeted cardiovascular inhibitors.
Affiliation(s)
- Sapan K Shah
  - Department of Pharmaceutical Chemistry, Priyadarshini J. L. College of Pharmacy, Hingna Road, Nagpur, 440016, Maharashtra, India.
- Dinesh R Chaple
  - Department of Pharmaceutical Chemistry, Priyadarshini J. L. College of Pharmacy, Hingna Road, Nagpur, 440016, Maharashtra, India
- Vijay H Masand
  - Department of Chemistry, Vidya Bharati Mahavidyalaya, Amravati, 444602, Maharashtra, India
- Rahul D Jawarkar
  - Department of Medicinal Chemistry and Drug Discovery, Dr. Rajendra Gode Institute of Pharmacy, University Mardi Road, Amravati, 444603, India
- Somdatta Chaudhari
  - Department of Pharmaceutical Chemistry, Modern College of Pharmacy, Nigdi, Pune, India
- Magdi E A Zaki
  - Department of Chemistry, College of Science, Imam Mohammad Ibn Saud Islamic University, Riyadh, 11623, Saudi Arabia.
- Sami A Al-Hussain
  - Department of Chemistry, College of Science, Imam Mohammad Ibn Saud Islamic University, Riyadh, 11623, Saudi Arabia
6
Lorca M, Muscia GC, Pérez-Benavente S, Bautista JM, Acosta A, González C, Sabadini G, Mella J, Asís SE, Mellado M. 2D/3D-QSAR Model Development Based on a Quinoline Pharmacophoric Core for the Inhibition of Plasmodium falciparum: An In Silico Approach with Experimental Validation. Pharmaceuticals (Basel) 2024; 17:889. [PMID: 39065740; PMCID: PMC11279914; DOI: 10.3390/ph17070889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2024] [Revised: 06/19/2024] [Accepted: 06/27/2024] [Indexed: 07/28/2024] [Open Access]
Abstract
Malaria is an infectious disease caused by Plasmodium spp. parasites, with widespread drug resistance to most antimalarial drugs. We report the development of two 3D-QSAR models based on comparative molecular field analysis (CoMFA), comparative molecular similarity index analysis (CoMSIA), and a 2D-QSAR model, using a database of 349 compounds with activity against the P. falciparum 3D7 strain. The models were validated internally and externally, complying with all metrics (q2 > 0.5, r2test > 0.6, r2m > 0.5, etc.). The final models have shown the following statistical values: r2test CoMFA = 0.878, r2test CoMSIA = 0.876, and r2test 2D-QSAR = 0.845. The models were experimentally tested through the synthesis and biological evaluation of ten quinoline derivatives against P. falciparum 3D7. The CoMSIA and 2D-QSAR models outperformed CoMFA in terms of better predictive capacity (MAE = 0.7006, 0.4849, and 1.2803, respectively). The physicochemical and pharmacokinetic properties of three selected quinoline derivatives were similar to chloroquine. Finally, the compounds showed low cytotoxicity (IC50 > 100 µM) on human HepG2 cells. These results suggest that the QSAR models accurately predict the toxicological profile, correlating well with experimental in vivo data.
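The external-validation figures quoted above (r2test, MAE) compare observed and predicted activities on held-out compounds. The sketch below shows one common way to compute them. The activity values are hypothetical, and the r² convention used here (residual sum of squares against the variance of the test set around its own mean) is an assumption; the abstract does not state which variant the authors applied.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted activities."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (test-set mean as baseline)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical activity values for five test compounds (illustration only).
observed = [5.0, 6.0, 7.0, 8.0, 9.0]
predicted = [5.5, 5.5, 7.0, 8.5, 8.5]

mae = mean_absolute_error(observed, predicted)
r2_test = r_squared(observed, predicted)
print(f"MAE = {mae:.2f}, r2_test = {r2_test:.2f}")
```

The two metrics answer different questions: r² measures how much of the spread in activity the model explains, while MAE gives the typical error in the activity's own units, which is why a model can rank well on one and poorly on the other.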
Affiliation(s)
- Marcos Lorca
  - Instituto de Química y Bioquímica, Facultad de Ciencias, Universidad de Valparaíso, Av. Gran Bretaña 1111, Valparaíso 2360102, Chile
- Gisela C. Muscia
  - Departamento de Ciencias Químicas, Facultad de Farmacia y Bioquímica, Universidad de Buenos Aires, Junín 956, C1113AAB Ciudad Autónoma de Buenos Aires, Buenos Aires 1113, Argentina
- Susana Pérez-Benavente
  - Departamento de Bioquímica y Biología Molecular, Facultad de Veterinaria, Universidad Complutense de Madrid, 28040 Madrid, Spain
- José M. Bautista
  - Departamento de Bioquímica y Biología Molecular, Facultad de Veterinaria, Universidad Complutense de Madrid, 28040 Madrid, Spain
- Alison Acosta
  - Universidad Andres Bello, Facultad de Ciencias Exactas, Departamento de Ciencias Químicas, Viña del Mar 2531015, Chile
- Cesar González
  - Departamento de Química, Universidad Técnica Federico Santa María, Av. España 1680, Valparaíso 2390123, Chile
- Gianfranco Sabadini
  - Instituto de Química y Bioquímica, Facultad de Ciencias, Universidad de Valparaíso, Av. Gran Bretaña 1111, Valparaíso 2360102, Chile
- Jaime Mella
  - Instituto de Química y Bioquímica, Facultad de Ciencias, Universidad de Valparaíso, Av. Gran Bretaña 1111, Valparaíso 2360102, Chile
  - Centro de Investigacion, Desarrollo e Innovacion de Productos Bioactivos (CInBIO), Universidad de Valparaiso, Av. Gran Bretaña 1111, Valparaíso 2360102, Chile
- Silvia E. Asís
  - Departamento de Ciencias Químicas, Facultad de Farmacia y Bioquímica, Universidad de Buenos Aires, Junín 956, C1113AAB Ciudad Autónoma de Buenos Aires, Buenos Aires 1113, Argentina
- Marco Mellado
  - Facultad de Medicina y Ciencias de la Salud, Universidad Central de Chile, Santiago 8330507, Chile
7
Zhang R, Nolte D, Sanchez-Villalobos C, Ghosh S, Pal R. Topological regression as an interpretable and efficient tool for quantitative structure-activity relationship modeling. Nat Commun 2024; 15:5072. [PMID: 38871711; DOI: 10.1038/s41467-024-49372-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Accepted: 06/04/2024] [Indexed: 06/15/2024] [Open Access]
Abstract
Quantitative structure-activity relationship (QSAR) modeling is a powerful tool for drug discovery, yet the lack of interpretability of commonly used QSAR models hinders their application in molecular design. We propose a similarity-based regression framework, topological regression (TR), that offers a statistically grounded, computationally fast, and interpretable technique to predict drug responses. We compare the predictive performance of TR on 530 ChEMBL human target activity datasets against the predictive performance of deep-learning-based QSAR models. Our results suggest that our sparse TR model can achieve equal, if not better, performance than the deep learning-based QSAR models and provide better intuitive interpretation by extracting an approximate isometry between the chemical space of the drugs and their activity space.
Affiliation(s)
- Ruibo Zhang
  - Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA
- Daniel Nolte
  - Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA
- Cesar Sanchez-Villalobos
  - Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA
- Souparno Ghosh
  - Department of Statistics, University of Nebraska - Lincoln, Lincoln, NB, 68588, USA.
- Ranadip Pal
  - Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA.
8
Kumar N, Acharya V. Advances in machine intelligence-driven virtual screening approaches for big-data. Med Res Rev 2024; 44:939-974. [PMID: 38129992; DOI: 10.1002/med.21995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 07/15/2023] [Accepted: 10/29/2023] [Indexed: 12/23/2023]
Abstract
Virtual screening (VS) is an integral and ever-evolving domain of the drug discovery framework. VS is traditionally classified into ligand-based (LB) and structure-based (SB) approaches. Machine intelligence, or artificial intelligence, has wide applications in the drug discovery domain to reduce time and resource consumption. In combination with machine intelligence algorithms, VS has emerged as a progressive technology that learns within robust decision orders for data curation and hit molecule screening from large VS libraries in minutes or hours. The exponential growth of chemical and biological data in the public domain has evolved into "big data", which demands modern and advanced machine intelligence-driven VS approaches to screen hit molecules from ultra-large VS libraries. VS has evolved from individual approaches (LB and SB) to integrated LB and SB techniques that explore various ligand and target protein aspects to improve the rate of appropriate hit molecule prediction. Current trends demand advanced and intelligent solutions to handle enormous data in the drug discovery domain for screening and optimizing hits or leads with few or no false-positive hits. Following the big-data drift and the tremendous growth in computational architecture, we present this review. The article categorizes and emphasizes individual VS techniques, details the literature on machine learning implementation and modern machine intelligence approaches, and discusses limitations and future prospects.
Affiliation(s)
- Neeraj Kumar
  - Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
  - Academy of Scientific and Innovative Research, Ghaziabad, India
- Vishal Acharya
  - Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
  - Academy of Scientific and Innovative Research, Ghaziabad, India
9
Chou RT, Ouattara A, Adams M, Berry AA, Takala-Harrison S, Cummings MP. Positive-unlabeled learning identifies vaccine candidate antigens in the malaria parasite Plasmodium falciparum. NPJ Syst Biol Appl 2024; 10:44. [PMID: 38678051; PMCID: PMC11055854; DOI: 10.1038/s41540-024-00365-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Accepted: 03/29/2024] [Indexed: 04/29/2024] [Open Access]
Abstract
Malaria vaccine development is hampered by extensive antigenic variation and complex life stages of Plasmodium species. Vaccine development has focused on a small number of antigens, many of which were identified without utilizing systematic genome-level approaches. In this study, we implement a machine learning-based reverse vaccinology approach to predict potential new malaria vaccine candidate antigens. We assemble and analyze P. falciparum proteomic, structural, functional, immunological, genomic, and transcriptomic data, and use positive-unlabeled learning to predict potential antigens based on the properties of known antigens and remaining proteins. We prioritize candidate antigens based on model performance on reference antigens with different genetic diversity and quantify the protein properties that contribute most to identifying top candidates. Candidate antigens are characterized by gene essentiality, gene ontology, and gene expression in different life stages to inform future vaccine development. This approach provides a framework for identifying and prioritizing candidate vaccine antigens for a broad range of pathogens.
Affiliation(s)
- Renee Ti Chou
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, MD, USA
- Amed Ouattara
  - Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
- Matthew Adams
  - Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
- Andrea A Berry
  - Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
- Shannon Takala-Harrison
  - Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA.
- Michael P Cummings
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, MD, USA.
10
Zhang S, Luo X, Mai B. Multi-task machine learning models for simultaneous prediction of tissue-to-blood partition coefficients of chemicals in mammals. Environ Res 2024; 241:117603. [PMID: 37939805; DOI: 10.1016/j.envres.2023.117603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 10/25/2023] [Accepted: 11/04/2023] [Indexed: 11/10/2023]
Abstract
Tissue-to-blood partition coefficients (Ptb) are crucial for assessing the distribution of chemicals in organisms. Given the lack of experimental data and the laborious nature of experimental methods, there is an urgent need to develop efficient predictive models. With the help of machine learning algorithms, i.e., random forest (RF) and artificial neural network (ANN), this study developed multi-task (MT) models that can simultaneously predict Ptb values for various mammalian tissues, including liver, muscle, brain, lung, and adipose. Single-task (ST) models using partial least squares regression, RF, and ANN algorithms for each endpoint were established for comparison. Overall, the performances of MT models were superior to those of ST models. The MT model using ANN algorithms showed the highest prediction accuracy, with determination coefficients ranging from 0.704 to 0.886, root mean square errors between 0.223 and 0.410, and mean absolute errors ranging from 0.178 to 0.285 log units. Results showed that lipophilicity and polarizability of molecules significantly influence their partition behavior in organisms. Applicability domains (ADs) of the models were characterized by weighted molecular similarity density and weighted inconsistency in molecular activities of structure-activity landscapes. When constrained by ADs, the models displayed enhanced predictive accuracy, making them valuable tools for the risk assessment and management of chemicals.
Affiliation(s)
- Shuying Zhang
  - State Key Laboratory of Organic Geochemistry and Guangdong Key Laboratory of Environmental Resources Utilization and Protection, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou, 510640, China
- Xiaojun Luo
  - State Key Laboratory of Organic Geochemistry and Guangdong Key Laboratory of Environmental Resources Utilization and Protection, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou, 510640, China.
- Bixian Mai
  - State Key Laboratory of Organic Geochemistry and Guangdong Key Laboratory of Environmental Resources Utilization and Protection, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou, 510640, China
11
Lu J, Ji X, Liu X, Jiang Y, Li G, Fang P, Li W, Zuo A, Guo Z, Yang S, Ji Y, Lu D. Machine learning-based radiomics strategy for prediction of acquired EGFR T790M mutation following treatment with EGFR-TKI in NSCLC. Sci Rep 2024; 14:446. [PMID: 38172228; PMCID: PMC10764785; DOI: 10.1038/s41598-023-50984-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Accepted: 12/28/2023] [Indexed: 01/05/2024] [Open Access]
Abstract
The epidermal growth factor receptor (EGFR) Thr790Met (T790M) mutation is responsible for approximately half of the acquired resistance to EGFR-tyrosine kinase inhibitor (TKI) in non-small-cell lung cancer (NSCLC) patients. Identifying patients at diagnosis who are likely to develop this mutation after first- or second-generation EGFR-TKI treatment is crucial for better treatment outcomes. This study aims to develop and validate a radiomics-based machine learning (ML) approach to predict the T790M mutation in NSCLC patients at diagnosis. We collected retrospective data from 210 EGFR mutation-positive NSCLC patients, extracting 1316 radiomics features from CT images. Using the LASSO algorithm, we selected 10 radiomics features and 2 clinical features most relevant to the mutations. We built models with 7 ML approaches and assessed their performance through the receiver operating characteristic (ROC) curve. The radiomics model and combined model, which integrated radiomics features and relevant clinical factors, achieved an area under the curve (AUC) of 0.80 (95% confidence interval [CI] 0.79-0.81) and 0.86 (0.87-0.88), respectively, in predicting the T790M mutation. Our study presents a convenient and noninvasive radiomics-based ML model for predicting this mutation at the time of diagnosis, aiding in targeted treatment planning for NSCLC patients with EGFR mutations.
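An AUC with a 95% confidence interval, as reported above, can be obtained in several ways; the abstract does not say which one the authors used. The sketch below shows one common option, a percentile bootstrap around a rank-based AUC. The scores and labels are made up for illustration, and the function names are not from the paper.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (resampling cases with replacement)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[round(alpha / 2 * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Made-up model scores and true labels (1 = mutation, 0 = no mutation).
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.60, 0.90, 0.65, 0.55]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 0, 1]

point = auc(scores, labels)
low, high = bootstrap_ci(scores, labels)
print(f"AUC = {point:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

With only ten cases the interval is wide; the narrow intervals in the abstract would require a much larger evaluation set or a different estimation procedure.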
Collapse
Affiliation(s)
- Jiameng Lu
  - Department of Respiratory, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Shandong Institute of Respiratory Diseases, Shandong Institute of Anesthesia and Respiratory Critical Medicine, 16766 Jingshilu, Lixia, Jinan, 250014, Shandong, People's Republic of China
  - School of Microelectronics, Shandong University, Jinan, 250100, Shandong, People's Republic of China
- Xiaoqing Ji
  - Department of Nursing, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, 250014, Shandong, People's Republic of China
- Xinyi Liu
  - Graduate School of Shandong First Medical University, Jinan, 250000, Shandong, People's Republic of China
- Yunxiu Jiang
  - Graduate School of Shandong First Medical University, Jinan, 250000, Shandong, People's Republic of China
- Gang Li
  - Department of Radiology, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Shandong Medicine and Health Key Laboratory of Abdominal Medicine Imaging, Shandong Lung Cancer Institute, Shandong Institute of Neuroimmunology, Jinan, 250000, Shandong, China
- Ping Fang
  - Department of Blood Transfusion, The First Affiliated Hospital of Shandong First Medical University and Shandong Province Qianfoshan Hospital, Jinan, 250014, Shandong, China
- Wei Li
  - Department of Radiology, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Shandong Medicine and Health Key Laboratory of Abdominal Medicine Imaging, Shandong Lung Cancer Institute, Shandong Institute of Neuroimmunology, Jinan, 250000, Shandong, China
- Anli Zuo
  - Graduate School of Shandong First Medical University, Jinan, 250000, Shandong, People's Republic of China
- Zihan Guo
  - Graduate School of Shandong First Medical University, Jinan, 250000, Shandong, People's Republic of China
- Shuran Yang
  - Graduate School of Shandong First Medical University, Jinan, 250000, Shandong, People's Republic of China
- Yanbo Ji
  - Department of Nursing, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, 250014, Shandong, People's Republic of China
- Degan Lu
  - Department of Respiratory, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Shandong Institute of Respiratory Diseases, Shandong Institute of Anesthesia and Respiratory Critical Medicine, 16766 Jingshilu, Lixia, Jinan, 250014, Shandong, People's Republic of China.
|
12
|
Wang C, Liu J, Qiu C, Su X, Ma N, Li J, Wang S, Qu S. Identifying the drivers of chlorophyll-a dynamics in a landscape lake recharged by reclaimed water using interpretable machine learning. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 906:167483. [PMID: 37832666 DOI: 10.1016/j.scitotenv.2023.167483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2023] [Revised: 09/21/2023] [Accepted: 09/28/2023] [Indexed: 10/15/2023]
Abstract
The water quality of lakes recharged by reclaimed water is affected by both fluctuations in reclaimed water quality and the biochemical processes in the lakes, and therefore the main controlling factors of algal blooms are difficult to identify. Taking a typical landscape lake recharged by reclaimed water as an example and using the spatiotemporal distribution characteristics and correlation analysis of water quality indexes, we propose an interpretable machine learning framework based on random forest to predict chlorophyll-a (Chl-a). The model considered nutrient difference indexes between reclaimed water and lake water, and further used feature importance ranking and partial dependence plots to identify nutrient drivers. Results show that the NO3⁻-N input from reclaimed water is the dominant nutrient driver of algal blooms, especially at high temperatures, and that the negative correlation between NO3⁻-N and Chl-a in the lake water is the consequence of algal blooms rather than the cause. Our study provides new insights into the identification of eutrophication factors for lakes recharged by reclaimed water.
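The partial dependence analysis mentioned in this abstract can be sketched in a few lines. A partial dependence curve fixes one feature at each value of a grid for every row of the data and averages the model's predictions. The stand-in model `toy_model`, the feature names, and the numbers below are hypothetical and are used only for illustration; the study itself used a random forest, which is not reproduced here.

```python
def partial_dependence(model, rows, feature, grid):
    """Average model output over the data with one feature forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # force the feature of interest
            total += model(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a fitted Chl-a predictor (not the study's model)."""
    return 2.0 * row["nitrate"] + 0.5 * row["temperature"]


# Hypothetical observations; the values are illustrative only.
rows = [
    {"nitrate": 1.0, "temperature": 10.0},
    {"nitrate": 3.0, "temperature": 30.0},
]
pd_curve = partial_dependence(toy_model, rows, "nitrate", [0.0, 1.0, 2.0])
print(pd_curve)
```

Because the other features keep their observed values, the curve shows the average effect of the chosen feature rather than its effect at any single operating point.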
Affiliation(s)
- Chenchen Wang
  - School of Environmental and Municipal Engineering, Tianjin Chengjian University, Tianjin 300384, China; Tianjin Key Laboratory of Aquatic Science and Technology, Tianjin Chengjian University, Tianjin 300384, China; Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Juan Liu
  - School of Environmental and Municipal Engineering, Tianjin Chengjian University, Tianjin 300384, China
- Chunsheng Qiu
  - School of Environmental and Municipal Engineering, Tianjin Chengjian University, Tianjin 300384, China; Tianjin Key Laboratory of Aquatic Science and Technology, Tianjin Chengjian University, Tianjin 300384, China.
- Xiao Su
  - Tianjin Water Group Co., Ltd, Tianjin 300042, China
- Ning Ma
  - Tianjin Eco-City Water Investment and Construction Ltd, Tianjin 300467, China
- Jing Li
  - School of Environmental and Municipal Engineering, Tianjin Chengjian University, Tianjin 300384, China
- Shaopo Wang
  - School of Environmental and Municipal Engineering, Tianjin Chengjian University, Tianjin 300384, China; Tianjin Key Laboratory of Aquatic Science and Technology, Tianjin Chengjian University, Tianjin 300384, China
- Shen Qu
  - Beijing Institute of Technology, Beijing 100081, China.
|
13
|
Gao T, Ren H, He S, Liang D, Xu Y, Chen K, Wang Y, Zhu Y, Dong H, Xu Z, Chen W, Cheng W, Jing F, Tao X. Development of an interpretable machine learning-based intelligent system of exercise prescription for cardio-oncology preventive care: A study protocol. Front Cardiovasc Med 2023; 9:1091885. [PMID: 38106819 PMCID: PMC10722170 DOI: 10.3389/fcvm.2022.1091885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 12/12/2022] [Indexed: 12/19/2023] Open
Abstract
Background: Cardiovascular disease (CVD) and cancer are the first and second causes of death in over 130 countries across the world. They are also among the top three causes in almost 180 countries worldwide. Cardiovascular complications are often noticed in cancer patients, with nearly 20% exhibiting cardiovascular comorbidities. Physical exercise may be helpful for cancer survivors and people living with cancer (PLWC), as it prevents relapses, CVD, and cardiotoxicity. Therefore, it is beneficial to recommend exercise as part of cardio-oncology preventive care. Objective: With the progress of deep learning algorithms and the improvement of big data processing techniques, artificial intelligence (AI) has gradually become popular in the fields of medicine and healthcare. In the context of the shortage of medical resources in China, it is of great significance to adopt AI and machine learning methods for prescription recommendations. This study aims to develop an interpretable machine learning-based intelligent system of exercise prescription for cardio-oncology preventive care, and this paper presents the study protocol. Methods: This will be a retrospective machine learning modeling cohort study with interventional methods (i.e., exercise prescription). We will recruit PLWC participants at baseline (from 1 January 2025 to 31 December 2026) and follow up over several years (from 1 January 2027 to 31 December 2028). Specifically, participants will be eligible if they are (1) PLWC in Stage I or cancer survivors from Stage I; (2) aged between 18 and 55 years; (3) interested in physical exercise for rehabilitation; (4) willing to wear smart sensors/watches; (5) assessed by doctors as suitable for exercise interventions. At baseline, clinical exercise physiologists certified by the joint training program (from 1 January 2023 to 31 December 2024) of the American College of Sports Medicine and the Chinese Association of Sports Medicine will recommend an exercise prescription to each participant. During the follow-up, effective exercise prescriptions will be determined by assessing the CVD status of the participants. Expected outcomes: This study aims to develop not only an interpretable machine learning model to recommend exercise prescriptions but also an intelligent system of exercise prescription for precision cardio-oncology preventive care. Ethics: This study is approved by the Human Experimental Ethics Inspection of Guangzhou Sport University. Clinical trial registration: http://www.chictr.org.cn, identifier ChiCTR2300077887.
Affiliation(s)
- Tianyu Gao
  - School of Physical Education, Jinan University, Guangzhou, China
- Hao Ren
  - Institute for Healthcare Artificial Intelligence Application, Guangdong Second Provincial General Hospital, Guangzhou, China
  - Faculty of Data Science, City University of Macau, Macao, Macao SAR, China
- Shan He
  - Guangzhou Sport University, Guangzhou, China
- Deyi Liang
  - Guangdong Women and Children Hospital, Guangzhou, China
- Yuming Xu
  - Division of Physical Education, Guangdong University of Finance and Economics, Guangzhou, China
  - School of Education, City University of Macau, Macao, Macao SAR, China
- Kecheng Chen
  - School of Data Science, City University of Hong Kong, Hong Kong, Hong Kong SAR, China
- Yufan Wang
  - Department of Industrial Engineering and Management, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Yuxin Zhu
  - Syns Institute of Educational Research, Hong Kong, Hong Kong SAR, China
- Heling Dong
  - School of Physical Education, Jinan University, Guangzhou, China
- Zhongzhi Xu
  - School of Public Health, Sun Yat-Sen University, Guangzhou, China
- Weiming Chen
  - Department of Health Medicine, Guangdong Second Provincial General Hospital, Guangzhou, China
- Weibin Cheng
  - Institute for Healthcare Artificial Intelligence Application, Guangdong Second Provincial General Hospital, Guangzhou, China
  - School of Data Science, City University of Hong Kong, Hong Kong, Hong Kong SAR, China
- Fengshi Jing
  - Institute for Healthcare Artificial Intelligence Application, Guangdong Second Provincial General Hospital, Guangzhou, China
  - Faculty of Data Science, City University of Macau, Macao, Macao SAR, China
  - UNC Project-China, UNC Global, School of Medicine, The University of North Carolina, Chapel Hill, NC, United States
- Xiaoyu Tao
  - Zhuhai College of Science and Technology, Zhuhai, China
  - ZCST Health and Medicine Industry Research Institute, Zhuhai, China
|
14
|
Jia X, Wang T, Zhu H. Advancing Computational Toxicology by Interpretable Machine Learning. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:17690-17706. [PMID: 37224004 PMCID: PMC10666545 DOI: 10.1021/acs.est.3c00653] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 01/25/2023] [Revised: 05/05/2023] [Accepted: 05/05/2023] [Indexed: 05/26/2023]
Abstract
Chemical toxicity evaluations for drugs, consumer products, and environmental chemicals have a critical impact on human health. Traditional animal models to evaluate chemical toxicity are expensive, time-consuming, and often fail to detect toxicants in humans. Computational toxicology is a promising alternative approach that utilizes machine learning (ML) and deep learning (DL) techniques to predict the toxicity potentials of chemicals. Although the applications of ML- and DL-based computational models in chemical toxicity predictions are attractive, many toxicity models are "black boxes" in nature and difficult to interpret by toxicologists, which hampers the chemical risk assessments using these models. The recent progress of interpretable ML (IML) in the computer science field meets this urgent need to unveil the underlying toxicity mechanisms and elucidate the domain knowledge of toxicity models. In this review, we focused on the applications of IML in computational toxicology, including toxicity feature data, model interpretation methods, use of knowledge base frameworks in IML development, and recent applications. The challenges and future directions of IML modeling in toxicology are also discussed. We hope this review can encourage efforts in developing interpretable models with new IML algorithms that can assist new chemical assessments by illustrating toxicity mechanisms in humans.
Affiliation(s)
- Xuelian Jia
  - Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
- Tong Wang
  - Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
- Hao Zhu
  - Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
|
15
|
Al-Maini M, Maindarkar M, Kitas GD, Khanna NN, Misra DP, Johri AM, Mantella L, Agarwal V, Sharma A, Singh IM, Tsoulfas G, Laird JR, Faa G, Teji J, Turk M, Viskovic K, Ruzsa Z, Mavrogeni S, Rathore V, Miner M, Kalra MK, Isenovic ER, Saba L, Fouda MM, Suri JS. Artificial intelligence-based preventive, personalized and precision medicine for cardiovascular disease/stroke risk assessment in rheumatoid arthritis patients: a narrative review. Rheumatol Int 2023; 43:1965-1982. [PMID: 37648884 DOI: 10.1007/s00296-023-05415-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 07/31/2023] [Indexed: 09/01/2023]
Abstract
The challenges associated with diagnosing and treating cardiovascular disease (CVD)/Stroke in rheumatoid arthritis (RA) arise from the delayed onset of symptoms. Existing clinical risk scores are inadequate in predicting cardiac events, and conventional risk factors alone do not accurately classify many individuals at risk. Several CVD biomarkers consider the multiple pathways involved in the development of atherosclerosis, which is the primary cause of CVD/Stroke in RA. To enhance the accuracy of CVD/Stroke risk assessment in the RA framework, a proposed approach involves combining genomic-based biomarkers (GBBM) derived from plasma and/or serum samples with innovative non-invasive radiomic-based biomarkers (RBBM), such as measurements of synovial fluid, plaque area, and plaque burden. This review presents two hypotheses: (i) RBBM and GBBM biomarkers exhibit a significant correlation and can precisely detect the severity of CVD/Stroke in RA patients; (ii) the artificial intelligence (AI)-based preventive, precision, and personalized (aiP3) CVD/Stroke risk AtheroEdge™ model (AtheroPoint™, CA, USA), which utilizes deep learning (DL), can accurately classify the risk of CVD/Stroke in the RA framework. The authors conducted a comprehensive search using the PRISMA technique, identifying 153 studies that assessed the features/biomarkers of RBBM and GBBM for CVD/Stroke. The study demonstrates how DL models can be integrated into the AtheroEdge™-aiP3 framework to determine the risk of CVD/Stroke in RA patients. The findings of this review suggest that the combination of RBBM with GBBM introduces a new dimension to the assessment of CVD/Stroke risk in the RA framework. Synovial fluid levels that are higher than normal lead to an increase in the plaque burden. Additionally, the review provides recommendations for novel, unbiased, and pruned DL algorithms that can predict CVD/Stroke risk within an RA framework that is preventive, precise, and personalized.
Affiliation(s)
- Mustafa Al-Maini
  - Allergy, Clinical Immunology and Rheumatology Institute, Toronto, ON, L4Z 4C4, Canada
- Mahesh Maindarkar
  - Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA, 95661, USA
  - Asia Pacific Vascular Society, New Delhi, 110001, India
- George D Kitas
  - Academic Affairs, Dudley Group NHS Foundation Trust, Dudley, DY1 2HQ, UK
  - Arthritis Research UK Epidemiology Unit, Manchester University, Manchester, M13 9PL, UK
- Narendra N Khanna
  - Asia Pacific Vascular Society, New Delhi, 110001, India
  - Department of Cardiology, Indraprastha APOLLO Hospitals, New Delhi, 110001, India
- Amer M Johri
  - Division of Cardiology, Department of Medicine, Queen's University, Kingston, Canada
- Laura Mantella
  - Division of Cardiology, Department of Medicine, University of Toronto, Toronto, Canada
- Vikas Agarwal
  - Department of Immunology, SGPIMS, Lucknow, 226014, India
- Aman Sharma
  - Department of Immunology, SGPIMS, Lucknow, 226014, India
- Inder M Singh
  - Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA, 95661, USA
- George Tsoulfas
  - Department of Surgery, Aristoteleion University of Thessaloniki, 54124, Thessaloniki, Greece
- John R Laird
  - Heart and Vascular Institute, Adventist Health St. Helena, St Helena, CA, 94574, USA
- Gavino Faa
  - Department of Pathology, Azienda Ospedaliero Universitaria, 09124, Cagliari, Italy
- Jagjit Teji
  - Ann and Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, 60611, USA
- Monika Turk
  - The Hanse-Wissenschaftskolleg Institute for Advanced Study, 27753, Delmenhorst, Germany
- Klaudija Viskovic
  - Department of Radiology and Ultrasound, UHID, 10 000, Zagreb, Croatia
- Zoltan Ruzsa
  - Invasive Cardiology Division, University of Szeged, Szeged, Hungary
- Sophie Mavrogeni
  - Cardiology Clinic, Onassis Cardiac Surgery Centre, Athens, Greece
- Vijay Rathore
  - Nephrology Department, Kaiser Permanente, Sacramento, CA, 95823, USA
- Martin Miner
  - Men's Health Centre, Miriam Hospital Providence, Providence, RI, 02906, USA
- Manudeep K Kalra
  - Department of Radiology, Harvard Medical School, Boston, MA, USA
- Esma R Isenovic
  - Department of Radiobiology and Molecular Genetics, National Institute of the Republic of Serbia, University of Belgrade, 11000, Belgrade, Serbia
- Luca Saba
  - Department of Radiology, Azienda Ospedaliero Universitaria, 40138, Cagliari, Italy
- Mostafa M Fouda
  - Department of Electrical and Computer Engineering, Idaho State University, Pocatello, ID, 83209, USA
- Jasjit S Suri
  - Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA, 95661, USA.
|
16
|
Luo L, Li B, Wang X, Cui L, Liu G. Interpretable spatial identity neural network-based epidemic prediction. Sci Rep 2023; 13:18159. [PMID: 37875546 PMCID: PMC10598274 DOI: 10.1038/s41598-023-45177-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 10/17/2023] [Indexed: 10/26/2023] Open
Abstract
Epidemic spatial-temporal risk analysis, e.g., infectious number forecasting, is a mainstream task in the multivariate time series research field, which plays a crucial role in the public health management process. With the rise of deep learning methods, many studies have focused on the epidemic prediction problem. However, recent primary prediction techniques face two challenges: the overcomplicated model and unsatisfactory interpretability. Therefore, this paper proposes an Interpretable Spatial IDentity (ISID) neural network to predict infectious numbers at the regional weekly level, which employs a light model structure and provides post-hoc explanations. First, this paper streamlines the classical spatio-temporal identity model (STID) and retains the optional spatial identity matrix for learning the contagion relationship between regions. Second, the well-known SHapley Additive explanations (SHAP) method was adopted to interpret how the ISID model predicts with multivariate sliding-window time series input data. The prediction accuracy of ISID is compared with several models in the experimental study, and the results show that the proposed ISID model achieves satisfactory epidemic prediction performance. Furthermore, the SHAP result demonstrates that the ISID pays particular attention to the most proximate and remote data in the input sequence (typically 20 steps long) while paying little attention to the intermediate steps. This study contributes to reliable and interpretable epidemic prediction through a more coherent approach for public health experts.
Affiliation(s)
- Lanjun Luo
  - School of Management, North Sichuan Medical College, Nanchong, China
- Boxiao Li
  - Graduate School of Information Sciences, Tohoku University, Sendai, Japan
- Xueyan Wang
  - Information Centre, Affiliated Hospital of North Sichuan Medical College, Nanchong, China.
- Lei Cui
  - School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
- Gang Liu
  - School of Management, Huazhong University of Science and Technology, Wuhan, China
|
17
|
Xiang Y, Tang YH, Lin G, Reker D. Interpretable Molecular Property Predictions Using Marginalized Graph Kernels. J Chem Inf Model 2023; 63:4633-4640. [PMID: 37504964 DOI: 10.1021/acs.jcim.3c00396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Marginalized graph kernels have shown competitive performance in molecular machine learning tasks but currently lack measures of interpretability, which are important to improve trust in the models, detect biases, and inform molecular optimization campaigns. We here conceive and implement two interpretability measures for Gaussian process regression using a marginalized graph kernel (GPR-MGK) to quantify (1) the contribution of specific training data to the prediction and (2) the contribution of specific nodes of the graph to the prediction. We demonstrate the applicability of these interpretability measures for molecular property prediction. We compare GPR-MGK to graph neural networks on four logic and two real-world toxicology data sets and find that the atomic attribution of GPR-MGK generally outperforms the atomic attribution of graph neural networks. We also perform a detailed molecular attribution analysis using the FreeSolv data set, showing how molecules in the training set influence machine learning predictions and why Morgan fingerprints perform poorly on this data set. This is the first systematic examination of the interpretability of GPR-MGK and thereby is an important step in the further maturation of marginalized graph kernel methods for interpretable molecular predictions.
Affiliation(s)
- Yan Xiang
  - Department of Biomedical Engineering, Duke University, Durham, North Carolina 27705, United States
- Yu-Hang Tang
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Guang Lin
  - Department of Mathematics & School of Mechanical Engineering, Purdue University, West Lafayette, Indiana 47907, United States
- Daniel Reker
  - Department of Biomedical Engineering, Duke University, Durham, North Carolina 27705, United States
|
18
|
Charvet CJ, Ofori K, Falcone C, Rigby Dames BA. Transcription, structure, and organoids translate time across the lifespan of humans and great apes. PNAS NEXUS 2023; 2:pgad230. [PMID: 37554928 PMCID: PMC10406161 DOI: 10.1093/pnasnexus/pgad230] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/31/2022] [Revised: 02/20/2023] [Accepted: 07/13/2023] [Indexed: 08/10/2023]
Abstract
How the neural structures supporting human cognition developed and arose in evolution is an enduring question of interest. Yet, we still lack appropriate procedures to align ages across primates, and this lacuna has hindered progress in understanding the evolution of biological programs. We generated a dataset of unprecedented size consisting of 573 time points from abrupt and gradual changes in behavior, anatomy, and transcription across human and 8 nonhuman primate species. We included time points from diverse human populations to capture within-species variation in the generation of cross-species age alignments. We also extracted corresponding ages from organoids. The identification of corresponding ages across the lifespan of 8 primate species, including apes (e.g., orangutans, gorillas) and monkeys (i.e., marmosets, macaques), reveals that some biological pathways are extended in humans compared with some nonhuman primates. Notably, the human lifespan is unusually extended relative to studied nonhuman primates demonstrating that very old age is a phase of life in humans that does not map to other studied primate species. More generally, our work prompts a reevaluation in the choice of a model system to understand aging given very old age in humans is a period of life without a clear counterpart in great apes.
Affiliation(s)
- Christine J Charvet
  - Department of Anatomy, Physiology and Pharmacology, College of Veterinary Medicine, Auburn University, 1130 Wire Road, Auburn, 36832, AL, USA
- Kwadwo Ofori
  - Department of Biology, Delaware State University, 1200 N. Dupont Highway, Dover, DE, 19901, USA
- Carmen Falcone
  - Department of Neuroscience, International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
- Brier A Rigby Dames
  - Department of Computer Science, University of Bath, Claverton Down, Bath, BA2 7AY, UK
  - Department of Psychology, University of Bath, Claverton Down, Bath, BA2 7AY, UK
|
19
|
Rossi RJ, Tisherman RA, Jaeger JM, Domen J, Shonkoff SBC, DiGiulio DC. Historic and Contemporary Surface Disposal of Produced Water Likely Inputs Arsenic and Selenium to Surficial Aquifers. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:7559-7567. [PMID: 37146013 DOI: 10.1021/acs.est.3c01219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/07/2023]
Abstract
Oil and gas development generates large amounts of wastewater (i.e., produced water), which in California has been partially disposed of in unlined percolation/evaporation ponds since the mid-20th century. Although produced water is known to contain multiple environmental contaminants (e.g., radium and trace metals), prior to 2015, detailed chemical characterizations of pondwaters were the exception rather than the norm. Using a state-run database, we synthesized samples (n = 1688) collected from produced water ponds within the southern San Joaquin Valley of California, one of the most productive agricultural regions in the world, to examine regional trends in pondwater arsenic and selenium concentrations. We filled crucial knowledge gaps resulting from historical pondwater monitoring by constructing random forest regression models using commonly measured analytes (boron, chloride, and total dissolved solids) and geospatial data (e.g., soil physiochemical data) to predict arsenic and selenium concentrations in historical samples. Our analysis suggests that both arsenic and selenium levels are elevated in pondwaters and thus this disposal practice may have contributed substantial amounts of arsenic and selenium to aquifers having beneficial uses. We further use our models to identify areas where additional monitoring infrastructure would better constrain the extent of legacy contamination and potential threats to groundwater quality.
Affiliation(s)
- Robert J Rossi
  - PSE Healthy Energy, Oakland, California 94612, United States
- Jessie M Jaeger
  - PSE Healthy Energy, Oakland, California 94612, United States
- Jeremy Domen
  - PSE Healthy Energy, Oakland, California 94612, United States
- Seth B C Shonkoff
  - PSE Healthy Energy, Oakland, California 94612, United States
  - Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, California 94720, United States
  - Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Dominic C DiGiulio
  - Department of Civil, Environmental, and Architectural Engineering, University of Colorado, Boulder, Colorado 80309, United States
|
20
|
Wu Y, Grant S, Chen W, Szarka A. Refining acute human exposure assessment to pesticides in surface water: An integrated data-driven modeling approach. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 865:161190. [PMID: 36581287 DOI: 10.1016/j.scitotenv.2022.161190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Revised: 12/03/2022] [Accepted: 12/21/2022] [Indexed: 06/17/2023]
Abstract
The substantial spatial and temporal variability of pesticides has led to large uncertainties when determining their peak aqueous concentrations. There is, however, a lack of large-scale studies dealing with accurate determination of annual maximum daily concentration (AMDC) across the landscape and over time based on publicly available monitoring data. We developed a novel data-driven approach that first used time series modeling to generate AMDCs for qualified water monitoring sites in the conterminous U.S. With feature variables such as pesticide use and land cover compiled into the dataset, machine learning models using eXtreme Gradient Boosting (XGBoost) and Random Forest Regressor (RF) were then developed to estimate AMDCs in surface waters across the U.S. Both models exhibited significant predictability, while a hybrid model consisting of the average of the XGBoost and RF predictions had the highest prediction accuracy (mean absolute error (MAE): 1.23; R²: 0.61). The analysis of permutation variable importance indicated that pesticide use and drainage area were the two most important drivers. Partial dependence analysis revealed that pesticide use, precipitation, cultivated crop land cover and solubility exhibited concentration-promoting effects, whereas drainage area and molecular weight had concentration-demoting effects. Soil adsorption coefficient (Koc) showed nonmonotonic effects. The hybrid model was used to predict and map AMDCs of four example pesticides, including 2,4-dichlorophenoxyacetic acid (2,4-D), atrazine, glyphosate and imidacloprid, during 2016-2019 at national scale. The predictive capability was validated using independent monitoring datasets. The fully evaluated approach significantly reduced the uncertainties in modeling annual peak concentrations and served as a valuable solution for conducting geographically oriented, highly refined exposure assessments for pesticides.
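The hybrid model described in this abstract averages the predictions of two regressors and is scored with MAE and R². The sketch below illustrates only that averaging and scoring step. The prediction lists stand in for XGBoost and RF outputs and are hypothetical, as are all function and variable names; no model is trained here, and the numbers are not from the study.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of absolute differences between observations and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def hybrid_predictions(pred_a, pred_b):
    """Element-wise average of two models' predictions."""
    return [(a + b) / 2.0 for a, b in zip(pred_a, pred_b)]


# Hypothetical values standing in for observed AMDCs and two models' outputs.
observed = [1.0, 2.0, 3.0, 4.0]
pred_xgb = [1.5, 2.5, 2.5, 3.5]  # assumed XGBoost-style predictions
pred_rf = [0.5, 2.5, 3.5, 4.5]   # assumed random-forest-style predictions

hybrid = hybrid_predictions(pred_xgb, pred_rf)
hybrid_mae = mean_absolute_error(observed, hybrid)
hybrid_r2 = r_squared(observed, hybrid)
print(hybrid, hybrid_mae, hybrid_r2)
```

In this toy case the two models err in opposite directions on most points, so the average has a lower MAE than either model alone, which is the usual motivation for this kind of simple ensemble.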
Affiliation(s)
- Yaoxing Wu
  - Product Safety, Syngenta Crop Protection LLC, Greensboro, NC 27409, USA.
- Shanique Grant
  - Product Safety, Syngenta Crop Protection LLC, Greensboro, NC 27409, USA
- Wenlin Chen
  - Product Safety, Syngenta Crop Protection LLC, Greensboro, NC 27409, USA
- Arpad Szarka
  - Product Safety, Syngenta Crop Protection LLC, Greensboro, NC 27409, USA
|
21
|
Zhang HS, Feng QD, Zhang DY, Zhu GL, Yang L. Bacterial community structure in geothermal springs on the northern edge of Qinghai-Tibet plateau. Front Microbiol 2023; 13:994179. [PMID: 37180363 PMCID: PMC10172933 DOI: 10.3389/fmicb.2022.994179] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/14/2022] [Accepted: 12/13/2022] [Indexed: 03/19/2023] Open
Abstract
Introduction: This study aimed to reveal the composition of the subsurface hydrothermal bacterial community in zones of magmatic tectonics and its response to heat storage environments. Methods: We performed hydrochemical analysis and sequencing of the 16S rRNA V4-V5 region in 7 Pleistocene and Lower Neogene hot water samples from the Gonghe basin. Results: Two geothermal hot spring reservoirs in the study area were found to be alkaline reducing environments with mean temperatures of 24.83°C and 69.28°C, respectively, and the major type of hydrochemistry was SO4-Cl·Na. The composition and structure of microorganisms in both types of geologic thermal storage were primarily controlled by temperature, reducing environment intensity, and hydrogeochemical processes. Only 195 ASVs were shared across different temperature environments, and the dominant bacterial genera in recent samples from temperate hot springs were Thermus and Hydrogenobacter, both of which are typical thermophiles. The correlation analysis showed that the overall level of relative abundance in the subsurface hot springs relied on a high temperature and a slightly alkaline reducing environment. Nearly all of the top 4 species by abundance (53.99% of total abundance) were positively correlated with temperature and pH, whereas they were negatively correlated with ORP (oxidation–reduction potential), nitrate, and bromine ions. Discussion: In general, the composition of bacteria in the groundwater in the study area was sensitive to the thermal storage environment and also showed a relationship with geochemical processes, such as gypsum dissolution and mineral oxidation.
|
22
|
Romero-Gainza E, Stewart C. AI-Driven Validation of Digital Agriculture Models. Sensors (Basel, Switzerland) 2023; 23:1187. [PMID: 36772227 PMCID: PMC9919666 DOI: 10.3390/s23031187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Revised: 01/12/2023] [Accepted: 01/16/2023] [Indexed: 06/18/2023]
Abstract
Digital agriculture employs artificial intelligence (AI) to transform data collected in the field into actionable crop management. Effective digital agriculture models can detect problems early, reducing costs significantly. However, ineffective models can be counterproductive. Farmers often want to validate models by spot checking their fields before expending time and effort on recommended actions. However, in large fields, farmers can spot check too few areas, leading them to wrongly believe that ineffective models are effective. Model validation is especially difficult for models that use neural networks, an AI technology that normally assesses crop health accurately but makes inexplicable recommendations. We present a new approach that trains random forests, an AI modeling approach whose recommendations are easier to explain, to mimic neural network models. Then, using the random forest as an explainable white box, we can (1) gain knowledge about the neural network, (2) assess how well a test set represents possible inputs in a given field, (3) determine when and where a farmer should spot check their field for model validation, and (4) find input data that improve the test set. We tested our approach with data used to assess soybean defoliation. Using information from the four processes above, our approach can reduce spot checks by up to 94%.
Affiliation(s)
- Eduardo Romero-Gainza
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, USA
- Christopher Stewart
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, USA
|
23
|
Yang J, Zhang D, Cai Y, Yu K, Li M, Liu L, Chen X. Computational Prediction of Drug Phenotypic Effects Based on Substructure-Phenotype Associations. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:256-265. [PMID: 35239490 DOI: 10.1109/tcbb.2022.3155453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Identifying drug phenotypic effects, including therapeutic effects and adverse drug reactions (ADRs), is an integral part of evaluating the potential of new drug candidates (NDCs). However, current computational methods for predicting phenotypic effects of NDCs are mainly based on the overall structure of an NDC or a related target. These approaches often lead to inconsistencies between structure and function and limit the prediction space of NDCs. In this study, we first constructed quantitative associations of substructure-domain, domain-ADR, and domain-ATC (Anatomical Therapeutic Chemical Classification System code) through L1LOG and L1SVM machine learning models. These associations represent relationships between phenotypes (ADRs and ATCs) and local structures of drugs and proteins. Then, based on these established associations, substructure-phenotype relationships were constructed and used to quantify drug-phenotype relationships. This approach could thus achieve high-throughput and effective evaluations of the druggability of NDCs by referring to the established substructure-phenotype relationships and structural information of NDCs without additional prior knowledge. Using this computational pipeline, 83,205 drug-ATC relationships (including 1,479 drugs and 178 ATCs) and 306,421 drug-ADR relationships (including 1,752 drugs and 454 ADRs) were predicted in total. The prediction results were validated at four levels: five-fold cross validation, public databases, literature, and molecular docking. Furthermore, three case studies demonstrated the feasibility of our method. For Maraviroc, an approved drug, 79 ATCs and 269 ADRs were predicted to be related, including the existing antiviral effect in clinical use. Additionally, we found risk substructures of severe ADRs; for example, SUB215 (>= 1, saturated or only aromatic carbon ring size 7) can result in shock. We also analyzed the mechanism of action (MOA) of drugs of interest based on the established drug-substructure-domain-protein associations. In short, by establishing drug-substructure-phenotype relationships, this approach can achieve quantitative prediction of phenotypes for a given NDC or drug without any prior knowledge except its structural information. In this way, we can directly obtain the relationships between substructure and phenotype of a compound, which makes it more convenient to analyze the phenotypic mechanism of drugs and accelerates the process of rational drug design.
|
24
|
Wang S, Wang J, Zhu MX, Tan Q. Machine learning for the prediction of minor amputation in University of Texas grade 3 diabetic foot ulcers. PLoS One 2022; 17:e0278445. [PMID: 36472981 PMCID: PMC9725167 DOI: 10.1371/journal.pone.0278445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 11/16/2022] [Indexed: 12/12/2022] Open
Abstract
Minor amputations are performed in a large proportion of patients with diabetic foot ulcers (DFU) and early identification of the outcome of minor amputations facilitates medical decision-making and ultimately reduces major amputations and deaths. However, there are currently no clinical predictive tools for minor amputations in patients with DFU. We aim to establish a predictive model based on machine learning to quickly identify patients requiring minor amputation among newly admitted patients with DFU. Overall, 362 cases with University of Texas grade (UT) 3 DFU were screened from tertiary care hospitals in East China. We utilized the synthetic minority oversampling strategy to compensate for the disparity in the initial dataset. A univariable analysis revealed nine variables to be included in the model: random blood glucose, years with diabetes, cardiovascular diseases, peripheral arterial diseases, DFU history, smoking history, albumin, creatinine, and C-reactive protein. Then, risk prediction models based on five machine learning algorithms: decision tree, random forest, logistic regression, support vector machine, and extreme gradient boosting (XGBoost) were independently developed with these variables. After evaluation, XGBoost earned the highest score (accuracy 0.814, precision 0.846, recall 0.767, F1-score 0.805, and AUC 0.881). For convenience, a web-based calculator based on our data and the XGBoost algorithm was established (https://dfuprediction.azurewebsites.net/). These findings imply that XGBoost can be used to develop a reliable prediction model for minor amputations in patients with UT3 DFU, and that our online calculator will make it easier for clinicians to assess the risk of minor amputations and make proactive decisions.
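The XGBoost figures reported above can be cross-checked against each other, because the F1-score is the harmonic mean of precision and recall. The following is a minimal sketch of that check, not code from the study; the helper name `f1_score` is a hypothetical choice made here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for XGBoost in the abstract.
reported_precision = 0.846
reported_recall = 0.767
reported_f1 = 0.805

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.3f}, reported F1 = {reported_f1}")
```

Under this definition the computed value rounds to 0.805, which agrees with the F1-score stated in the abstract.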
Affiliation(s)
- Shiqi Wang
  - Department of Burns and Plastic Surgery, Affiliated Drum Tower Hospital, Medical School of Nanjing University, Nanjing, China
- Jinwan Wang
  - School of Information Management, Nanjing University, Nanjing, China
- Mark Xuefang Zhu
  - School of Information Management, Nanjing University, Nanjing, China
  - * E-mail: (MXZ); (QT)
- Qian Tan
  - Department of Burns and Plastic Surgery, Affiliated Drum Tower Hospital, Medical School of Nanjing University, Nanjing, China
  - * E-mail: (MXZ); (QT)
|
25
|
Khanna NN, Maindarkar MA, Viswanathan V, Puvvula A, Paul S, Bhagawati M, Ahluwalia P, Ruzsa Z, Sharma A, Kolluri R, Krishnan PR, Singh IM, Laird JR, Fatemi M, Alizad A, Dhanjil SK, Saba L, Balestrieri A, Faa G, Paraskevas KI, Misra DP, Agarwal V, Sharma A, Teji JS, Al-Maini M, Nicolaides A, Rathore V, Naidu S, Liblik K, Johri AM, Turk M, Sobel DW, Miner M, Viskovic K, Tsoulfas G, Protogerou AD, Mavrogeni S, Kitas GD, Fouda MM, Kalra MK, Suri JS. Cardiovascular/Stroke Risk Stratification in Diabetic Foot Infection Patients Using Deep Learning-Based Artificial Intelligence: An Investigative Study. J Clin Med 2022; 11:6844. [PMID: 36431321 PMCID: PMC9693632 DOI: 10.3390/jcm11226844] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/10/2022] [Revised: 11/15/2022] [Accepted: 11/16/2022] [Indexed: 11/22/2022] Open
Abstract
A diabetic foot infection (DFI) is among the most serious, incurable, and costly to treat conditions. The presence of a DFI renders machine learning (ML) systems extremely nonlinear, posing difficulties in CVD/stroke risk stratification. In addition, there is a limited number of well-explained ML paradigms due to comorbidity, sample size limits, and weak scientific and clinical validation methodologies. Deep neural networks (DNN) are potent machines for learning that generalize nonlinear situations. The objective of this article is to propose a novel investigation of deep learning (DL) solutions for predicting CVD/stroke risk in DFI patients. The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) search strategy was used for the selection of 207 studies. We hypothesize that a DFI is responsible for increased morbidity and mortality due to the worsening of atherosclerotic disease and affecting coronary artery disease (CAD). Since surrogate biomarkers for CAD, such as carotid artery disease, can be used for monitoring CVD, we can thus use a DL-based model, namely, Long Short-Term Memory (LSTM) and Recurrent Neural Networks (RNN) for CVD/stroke risk prediction in DFI patients, which combines covariates such as office and laboratory-based biomarkers, carotid ultrasound image phenotype (CUSIP) lesions, along with the DFI severity. We confirmed the viability of CVD/stroke risk stratification in the DFI patients. Strong designs were found in the research of the DL architectures for CVD/stroke risk stratification. Finally, we analyzed the AI bias and proposed strategies for the early diagnosis of CVD/stroke in DFI patients. Since DFI patients have an aggressive atherosclerotic disease, leading to prominent CVD/stroke risk, we, therefore, conclude that the DL paradigm is very effective for predicting the risk of CVD/stroke in DFI patients.
Affiliation(s)
- Narendra N. Khanna
  - Department of Cardiology, Indraprastha APOLLO Hospitals, New Delhi 110001, India
- Mahesh A. Maindarkar
  - Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA 95661, USA
  - Department of Biomedical Engineering, North Eastern Hill University, Shillong 793022, India
- Anudeep Puvvula
  - Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA 95661, USA
  - Annu’s Hospitals for Skin and Diabetes, Nellore 524101, India
- Sudip Paul
  - Department of Biomedical Engineering, North Eastern Hill University, Shillong 793022, India
- Mrinalini Bhagawati
  - Department of Biomedical Engineering, North Eastern Hill University, Shillong 793022, India
- Puneet Ahluwalia
  - Max Institute of Cancer Care, Max Super Specialty Hospital, New Delhi 110017, India
- Zoltan Ruzsa
  - Invasive Cardiology Division, Faculty of Medicine, University of Szeged, 6720 Szeged, Hungary
- Aditya Sharma
  - Division of Cardiovascular Medicine, University of Virginia, Charlottesville, VA 22904, USA
- Raghu Kolluri
  - Ohio Health Heart and Vascular, Columbus, OH 43214, USA
- Inder M. Singh
  - Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA 95661, USA
- John R. Laird
  - Heart and Vascular Institute, Adventist Health St. Helena, St Helena, CA 94574, USA
- Mostafa Fatemi
  - Department of Physiology & Biomedical Engineering, Mayo Clinic College of Medicine and Science, Rochester, MN 55905, USA
- Azra Alizad
  - Department of Radiology, Mayo Clinic College of Medicine and Science, Rochester, MN 55905, USA
- Surinder K. Dhanjil
  - Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA 95661, USA
- Luca Saba
  - Department of Radiology, Azienda Ospedaliero Universitaria, 40138 Cagliari, Italy
- Antonella Balestrieri
  - Cardiovascular Prevention and Research Unit, Department of Pathophysiology, National & Kapodistrian University of Athens, 15772 Athens, Greece
- Gavino Faa
  - Department of Pathology, Azienda Ospedaliero Universitaria, 09124 Cagliari, Italy
- Vikas Agarwal
  - Department of Immunology, SGPGIMS, Lucknow 226014, India
- Aman Sharma
  - Department of Immunology, SGPGIMS, Lucknow 226014, India
- Jagjit S. Teji
  - Ann and Robert H. Lurie Children’s Hospital of Chicago, Chicago, IL 60611, USA
- Mustafa Al-Maini
  - Allergy, Clinical Immunology and Rheumatology Institute, Toronto, ON L4Z 4C4, Canada
- Andrew Nicolaides
  - Vascular Screening and Diagnostic Centre, University of Nicosia Medical School, Egkomi 2408, Cyprus
- Subbaram Naidu
  - Electrical Engineering Department, University of Minnesota, Duluth, MN 55812, USA
- Kiera Liblik
  - Department of Medicine, Division of Cardiology, Queen’s University, Kingston, ON K7L 3N6, Canada
- Amer M. Johri
  - Department of Medicine, Division of Cardiology, Queen’s University, Kingston, ON K7L 3N6, Canada
- Monika Turk
  - The Hanse-Wissenschaftskolleg Institute for Advanced Study, 27753 Delmenhorst, Germany
- David W. Sobel
  - Rheumatology Unit, National Kapodistrian University of Athens, 15772 Athens, Greece
- Martin Miner
  - Men’s Health Centre, Miriam Hospital Providence, Providence, RI 02906, USA
- Klaudija Viskovic
  - Department of Radiology and Ultrasound, University Hospital for Infectious Diseases, 10000 Zagreb, Croatia
- George Tsoulfas
  - Department of Surgery, Aristoteleion University of Thessaloniki, 54124 Thessaloniki, Greece
- Athanasios D. Protogerou
  - Cardiovascular Prevention and Research Unit, Department of Pathophysiology, National & Kapodistrian University of Athens, 15772 Athens, Greece
- Sophie Mavrogeni
  - Cardiology Clinic, Onassis Cardiac Surgery Centre, 17674 Athens, Greece
- George D. Kitas
  - Academic Affairs, Dudley Group NHS Foundation Trust, Dudley DY1 2HQ, UK
  - Arthritis Research UK Epidemiology Unit, Manchester University, Manchester M13 9PL, UK
- Mostafa M. Fouda
  - Department of Electrical and Computer Engineering, Idaho State University, Pocatello, ID 83209, USA
- Jasjit S. Suri
  - Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA 95661, USA
|
26
|
Bellamy H, Rehim AA, Orhobor OI, King R. Batched Bayesian Optimization for Drug Design in Noisy Environments. J Chem Inf Model 2022; 62:3970-3981. [PMID: 36044048 PMCID: PMC9472273 DOI: 10.1021/acs.jcim.2c00602] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Abstract
The early stages of the drug design process involve identifying compounds with suitable bioactivities via noisy assays. As databases of possible drugs are often very large, assays can only be performed on a subset of the candidates. Selecting which assays to perform is best done within an active learning process, such as batched Bayesian optimization, and aims to reduce the number of assays that must be performed. We compare how noise affects different batched Bayesian optimization techniques and introduce a retest policy to mitigate the effect of noise. Our experiments show that batched Bayesian optimization remains effective, even when large amounts of noise are present, and that the retest policy enables more active compounds to be identified in the same number of experiments.
Affiliation(s)
- Hugo Bellamy
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK
- Abbi Abdel Rehim
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK
- Oghenejokpeme I Orhobor
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK
- Ross King
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK
|
27
|
Effah CY, Miao R, Drokow EK, Agboyibor C, Qiao R, Wu Y, Miao L, Wang Y. Machine learning-assisted prediction of pneumonia based on non-invasive measures. Front Public Health 2022; 10:938801. [PMID: 35968461 PMCID: PMC9371749 DOI: 10.3389/fpubh.2022.938801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 06/23/2022] [Indexed: 11/13/2022] Open
Abstract
Background: Pneumonia is an infection of the lungs that is characterized by high morbidity and mortality. The use of machine learning systems to detect respiratory diseases via non-invasive measures such as physical and laboratory parameters is gaining momentum and has been proposed to decrease diagnostic uncertainty associated with bacterial pneumonia. Herein, this study conducted several experiments using eight machine learning models to predict pneumonia based on biomarkers, laboratory parameters, and physical features. Methods: We performed machine-learning analysis on 535 different patients, each with 45 features. Data normalization to rescale all real-valued features was performed. Since it is a binary problem, we categorized each patient into one class at a time. We designed three experiments to evaluate the models: (1) feature selection techniques to select appropriate features for the models, (2) experiments on the imbalanced original dataset, and (3) experiments on the SMOTE data. We then compared eight machine learning models to evaluate their effectiveness in predicting pneumonia. Results: Biomarkers such as C-reactive protein and procalcitonin demonstrated the most significant discriminating power. Ensemble machine learning models such as RF (accuracy = 92.0%, precision = 91.3%, recall = 96.0%, f1-score = 93.6%) and XGBoost (accuracy = 90.8%, precision = 92.6%, recall = 92.3%, f1-score = 92.4%) achieved the highest performance accuracy on the original dataset with AUCs of 0.96 and 0.97, respectively. On the SMOTE dataset, RF and XGBoost achieved the highest prediction results with f1-scores of 92.0 and 91.2%, respectively. Also, an AUC of 0.97 was achieved for both RF and XGBoost models. Conclusions: Our models showed that in the diagnosis of pneumonia, individual clinical history, laboratory indicators, and symptoms do not have adequate discriminatory power. We can also conclude that the ensemble ML models performed better in this study.
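The abstract does not say how the SMOTE data were produced. As a rough illustration of the general idea behind synthetic minority oversampling, the sketch below creates new minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, its parameters, and the toy points are assumptions made here; this is a simplified stand-in, not the authors' pipeline or any library's implementation.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class points.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy minority class with two features per sample (hypothetical values).
toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(toy_minority, n_new=5)
print(len(new_points), "synthetic minority samples generated")
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region already occupied by the minority class.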
Affiliation(s)
- Ruoqi Miao
  - College of Public Health, Zhengzhou University, Zhengzhou, China
- Emmanuel Kwateng Drokow
  - Department of Radiation Oncology, Zhengzhou University People's Hospital, Henan Provincial People's Hospital, Zhengzhou, China
- Clement Agboyibor
  - School of Pharmaceutical Sciences, Zhengzhou University, Zhengzhou, China
- Ruiping Qiao
  - Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Yongjun Wu
  - College of Public Health, Zhengzhou University, Zhengzhou, China
  - *Correspondence: Yongjun Wu
- Lijun Miao
  - Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
  - Lijun Miao
- Yanbin Wang
  - Center of Health Management, General Hospital of Anyang Iron and Steel Group Co., Ltd, Anyang, China
  - Yanbin Wang
|
28
|
Artificial intelligence and machine-learning approaches in structure and ligand-based discovery of drugs affecting central nervous system. Mol Divers 2022; 27:959-985. [PMID: 35819579 DOI: 10.1007/s11030-022-10489-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/27/2022] [Accepted: 06/21/2022] [Indexed: 12/11/2022]
Abstract
CNS disorders are indications with very high unmet medical needs, a relatively small number of available drugs, and a subpar satisfaction level among patients and caregivers. Discovery of CNS drugs is an extremely expensive affair with its own unique challenges, leading to extremely high attrition rates and low efficiency. With the explosion of data in the information age, there is hardly any aspect of life that has not been touched by data-driven technologies such as artificial intelligence (AI) and machine learning (ML). Drug discovery is no exception: the emergence of big data via genomic, proteomic, biological, and chemical technologies has driven pharmaceutical giants to collaborate with AI-oriented companies to revolutionise drug discovery, with the goal of increasing the efficiency of the process. In recent years, many examples of innovative applications of AI and ML techniques in CNS drug discovery have been reported. Research on therapeutics for diseases such as schizophrenia, Alzheimer's and Parkinsonism has been given a new direction and thrust by these developments. AI and ML have been applied to both ligand-based and structure-based drug discovery and design of CNS therapeutics. In this review, we summarise the general aspects of AI and ML from the perspective of drug discovery, followed by comprehensive coverage of recent developments in the applications of AI/ML techniques in CNS drug discovery.
|
29
|
Beers AT, Frey SN. Greater sage‐grouse habitat selection varies across the marginal habitat of its lagging range margin. Ecosphere 2022. [DOI: 10.1002/ecs2.4146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022] Open
Affiliation(s)
- Aidan T. Beers
  - Department of Wildland Resources Utah State University Logan Utah USA
- Shandra N. Frey
  - Department of Wildland Resources Utah State University Logan Utah USA
|
30
|
Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. Frontiers in Bioinformatics 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023] Open
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Garrett M. Morris
  - Department of Statistics, University of Oxford, Oxford, United Kingdom
- Philip C. Biggin
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
|
31
|
Jiménez-Luna J, Skalic M, Weskamp N. Benchmarking Molecular Feature Attribution Methods with Activity Cliffs. J Chem Inf Model 2022; 62:274-283. [PMID: 35019265 DOI: 10.1021/acs.jcim.1c01163] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023]
Abstract
Feature attribution techniques are popular choices within the explainable artificial intelligence toolbox, as they can help elucidate which parts of the provided inputs used by an underlying supervised-learning method are considered relevant for a specific prediction. In the context of molecular design, these approaches typically involve the coloring of molecular graphs, whose presentation to medicinal chemists can be useful for making a decision of which compounds to synthesize or prioritize. The consistency of the highlighted moieties alongside expert background knowledge is expected to contribute to the understanding of machine-learning models in drug design. Quantitative evaluation of such coloring approaches, however, has so far been limited to substructure identification tasks. We here present an approach that is based on maximum common substructure algorithms applied to experimentally-determined activity cliffs. Using the proposed benchmark, we found that molecule coloring approaches in conjunction with classical machine-learning models tend to outperform more modern, graph-neural-network alternatives. The provided benchmark data are fully open sourced, which we hope will facilitate the testing of newly developed molecular feature attribution techniques.
Affiliation(s)
- José Jiménez-Luna
  - Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8093 Zurich, Switzerland
  - Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
- Miha Skalic
  - Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
- Nils Weskamp
  - Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
|
32
|
Polypharmacology: The science of multi-targeting molecules. Pharmacol Res 2022; 176:106055. [PMID: 34990865 DOI: 10.1016/j.phrs.2021.106055] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 11/15/2021] [Revised: 12/23/2021] [Accepted: 12/31/2021] [Indexed: 12/28/2022]
Abstract
Polypharmacology is a concept where a molecule can interact with two or more targets simultaneously. It offers many advantages as compared to the conventional single-targeting molecules. A multi-targeting drug is much more efficacious due to its cumulative efficacy at all of its individual targets making it much more effective in complex and multifactorial diseases like cancer, where multiple proteins and pathways are involved in the onset and development of the disease. For a molecule to be polypharmacologic in nature, it needs to possess promiscuity which is the ability to interact with multiple targets; and at the same time avoid binding to antitargets which would otherwise result in off-target adverse effects. There are certain structural features and physicochemical properties which when present would help researchers to predict if the designed molecule would possess promiscuity or not. Promiscuity can also be identified via advanced state-of-the-art computational methods. In this review, we also elaborate on the methods by which one can intentionally incorporate promiscuity in their molecules and make them polypharmacologic. The polypharmacology paradigm of "one drug-multiple targets" has numerous applications especially in drug repurposing where an already established drug is redeveloped for a new indication. Though designing a polypharmacological drug is much more difficult than designing a single-targeting drug, with the current technologies and information regarding different diseases and chemical functional groups, it is plausible for researchers to intentionally design a polypharmacological drug and unlock its advantages.
|
33
|
Wang S, Xia C, Zheng Q, Wang A, Tan Q. Machine Learning Models for Predicting the Risk of Hard-to-Heal Diabetic Foot Ulcers in a Chinese Population. Diabetes Metab Syndr Obes 2022; 15:3347-3359. [PMID: 36341229 PMCID: PMC9628710 DOI: 10.2147/dmso.s383960] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/01/2022] [Accepted: 10/20/2022] [Indexed: 11/05/2022] Open
Abstract
BACKGROUND Early detection of hard-to-heal diabetic foot ulcers (DFUs) is vital to prevent a poor prognosis. The purpose of this work was to employ clinical characteristics to create an optimal predictive model of hard-to-heal DFUs (failing to decrease by >50% at 4 weeks) based on machine learning algorithms. METHODS A total of 362 DFU patients hospitalized in two tertiary hospitals in eastern China were enrolled in this study. The training dataset and validation dataset were split at a ratio of 7:3. Univariate logistic analysis and clinical experience were utilized to screen clinical characteristics as predictive features. The following six machine learning algorithms were used to build prediction models for differentiating hard-to-heal DFUs: support vector machine, the naïve Bayesian (NB) model, k-nearest neighbor, general linear regression, adaptive boosting, and random forest. Five cross-validations were employed to realize the model's parameters. Accuracy, precision, recall, F1-scores, and AUCs were utilized to compare and evaluate the models' efficacy. On the basis of the best model identified, the significance of each characteristic was evaluated, and then an online calculator was developed. RESULTS Independent predictors for model establishment included sex, insulin use, random blood glucose, wound area, diabetic retinopathy, peripheral arterial disease, smoking history, serum albumin, serum creatinine, and C-reactive protein. After evaluation, the NB model was identified as the most generalizable model, with an AUC of 0.864, a recall of 0.907, and an F1-score of 0.744. Random blood glucose, C-reactive protein, and wound area were determined to be the three most important influencing factors. A corresponding online calculator was created (https://predicthardtoheal.azurewebsites.net/). CONCLUSION Based on clinical characteristics, machine learning algorithms can achieve acceptable predictions of hard-to-heal DFUs, with the NB model performing the best. 
Our online calculator can assist doctors in identifying the possibility of hard-to-heal DFUs at the time of admission to reduce the likelihood of a dismal prognosis.
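The data handling described in the methods (a 7:3 training/validation split followed by cross-validation on the training portion) can be sketched as below. This is an illustration under stated assumptions, not the study's code: "five cross-validations" is read here as five-fold cross-validation, and the helper names `train_validation_split` and `k_fold`, the seed, and the use of plain integer indices as patient identifiers are all hypothetical.

```python
import random


def train_validation_split(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold(indices, k=5):
    """Yield (fit, held_out) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]  # k interleaved, disjoint folds
    for i, held_out in enumerate(folds):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, held_out


# 362 patients, as stated in the abstract, split at a ratio of 7:3.
train, valid = train_validation_split(362)
print(len(train), "training samples,", len(valid), "validation samples")

for fold_number, (fit, held_out) in enumerate(k_fold(train, k=5), start=1):
    print(f"fold {fold_number}: fit on {len(fit)}, tune on {len(held_out)}")
```

With 362 patients this split gives 253 training and 109 validation samples; the validation set is left untouched while the five folds of the training set are used for parameter tuning.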
Affiliation(s)
- Shiqi Wang
  - Department of Burns and Plastic Surgery, Affiliated Drum Tower Hospital, Medical School of Nanjing University, Nanjing, People’s Republic of China
- Chao Xia
  - Department of Orthopedics, Air Force Hospital of Eastern Theater Command, Nanjing, People’s Republic of China
- Qirui Zheng
  - Software Institute, Nanjing University, Nanjing, People's Republic of China
- Aiping Wang
  - Department of Endocrinology, Air Force Hospital of Eastern Theater Command, Nanjing, People's Republic of China
  - Aiping Wang, Department of Endocrinology, Air Force Hospital of Eastern Theater Command, Nanjing, 210002, People’s Republic of China, Email
- Qian Tan
  - Department of Burns and Plastic Surgery, Affiliated Drum Tower Hospital, Medical School of Nanjing University, Nanjing, People’s Republic of China
  - Correspondence: Qian Tan, Department of Burns and Plastic Surgery, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, People’s Republic of China, Tel +86 25 83106666, Email
|
34
|
Zhan M, Chen Z, Ding C, Qu Q, Wang G, Liu S, Wen F. Risk prediction for delayed clearance of high-dose methotrexate in pediatric hematological malignancies by machine learning. Int J Hematol 2021; 114:483-493. [PMID: 34170480 DOI: 10.1007/s12185-021-03184-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2021] [Revised: 06/21/2021] [Accepted: 06/21/2021] [Indexed: 10/21/2022]
Abstract
This study aimed to establish a machine learning-based predictive model to identify children with hematologic malignancy at high risk for delayed clearance of high-dose methotrexate (HD-MTX). A total of 205 patients were recruited. Five variables (hematocrit, risk classification, dose, SLC19A1 rs2838958, sex) were statistically significant in univariable analysis, and three variables (SLC19A1 rs2838958, sex, dose) were statistically significant in multivariate logistic regression. The data were randomly split into a training cohort and a validation cohort. A nomogram for prediction of delayed HD-MTX clearance was constructed using the three variables in the training dataset and validated in the validation dataset. Five machine learning algorithms (classification and regression trees [CART], naïve Bayes, support vector machine, random forest, C5.0 decision tree) combined with different resampling methods were used for model building with five or three variables. When the developed machine learning models were evaluated in the validation dataset, the C5.0 decision tree combined with the synthetic minority oversampling technique (SMOTE) using five variables had the highest area under the receiver operating characteristic curve (AUC 0.807 [95% CI 0.724-0.889]), a better performance than the nomogram (AUC 0.69 [95% CI 0.594-0.787]). The results support potential clinical application of machine learning for patient risk classification.
Affiliation(s)
- Min Zhan
- Department of Pharmacy, Shenzhen Children's Hospital, Shenzhen, 518036, People's Republic of China
- Zebin Chen
- Department of Pharmacy, Shenzhen Children's Hospital, Shenzhen, 518036, People's Republic of China
- Changcai Ding
- Department of Research and Development, Shenzhen Advanced Precision Medical CO., LTD, Shenzhen, 518000, People's Republic of China
- Qiang Qu
- Department of Pharmacy, Xiangya Hospital Central South University, Changsha, 410008, People's Republic of China
- Guoqiang Wang
- Department of Pharmacy, Shenzhen Children's Hospital, Shenzhen, 518036, People's Republic of China
- Sixi Liu
- Department of Hematology/Oncology, Shenzhen Children's Hospital, Shenzhen, 518036, People's Republic of China
- Feiqiu Wen
- Department of Hematology/Oncology, Shenzhen Children's Hospital, Shenzhen, 518036, People's Republic of China
|
35
|
Ye Z, Yang W, Yang Y, Ouyang D. Interpretable machine learning methods for in vitro pharmaceutical formulation development. Food Frontiers 2021. [DOI: 10.1002/fft2.78] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022] Open
Affiliation(s)
- Zhuyifan Ye
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
- Wenmian Yang
- State Key Laboratory of Internet of Things for Smart City, University of Macau, Macau, China
- Yilong Yang
- School of Software, Beihang University, Beijing, China
- Defang Ouyang
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
|
36
|
Vatansever S, Schlessinger A, Wacker D, Kaniskan HÜ, Jin J, Zhou M, Zhang B. Artificial intelligence and machine learning-aided drug discovery in central nervous system diseases: State-of-the-arts and future directions. Med Res Rev 2021; 41:1427-1473. [PMID: 33295676 PMCID: PMC8043990 DOI: 10.1002/med.21764] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Received: 07/23/2020] [Revised: 10/30/2020] [Accepted: 11/20/2020] [Indexed: 01/11/2023]
Abstract
Neurological disorders significantly outnumber diseases in other therapeutic areas. However, developing drugs for central nervous system (CNS) disorders remains the most challenging area in drug discovery, accompanied with the long timelines and high attrition rates. With the rapid growth of biomedical data enabled by advanced experimental technologies, artificial intelligence (AI) and machine learning (ML) have emerged as an indispensable tool to draw meaningful insights and improve decision making in drug discovery. Thanks to the advancements in AI and ML algorithms, now the AI/ML-driven solutions have an unprecedented potential to accelerate the process of CNS drug discovery with better success rate. In this review, we comprehensively summarize AI/ML-powered pharmaceutical discovery efforts and their implementations in the CNS area. After introducing the AI/ML models as well as the conceptualization and data preparation, we outline the applications of AI/ML technologies to several key procedures in drug discovery, including target identification, compound screening, hit/lead generation and optimization, drug response and synergy prediction, de novo drug design, and drug repurposing. We review the current state-of-the-art of AI/ML-guided CNS drug discovery, focusing on blood-brain barrier permeability prediction and implementation into therapeutic discovery for neurological diseases. Finally, we discuss the major challenges and limitations of current approaches and possible future directions that may provide resolutions to these difficulties.
Affiliation(s)
- Sezen Vatansever
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Daniel Wacker
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- H. Ümit Kaniskan
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Jian Jin
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Ming‐Ming Zhou
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
|
37
|
Gousiadou C, Marchese Robinson RL, Kotzabasaki M, Doganis P, Wilkins TA, Jia X, Sarimveis H, Harper SL. Machine learning predictions of concentration-specific aggregate hazard scores of inorganic nanomaterials in embryonic zebrafish. Nanotoxicology 2021; 15:446-476. [PMID: 33586589 DOI: 10.1080/17435390.2021.1872113] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/19/2023]
Abstract
The possibility of employing computational approaches like nano-QSAR or nano-read-across to predict nanomaterial hazard is attractive from both a financial, and most importantly, where in vivo tests are required, ethical perspective. In the present work, we have employed advanced Machine Learning techniques, including stacked model ensembles, to create nano-QSAR tools for modeling the toxicity of metallic and metal oxide nanomaterials, both coated and uncoated and with a variety of different core compositions, tested at different dosage concentrations on embryonic zebrafish. Using both computed and experimental descriptors, we have identified a set of properties most relevant for the assessment of nanomaterial toxicity and successfully correlated these properties with the associated biological responses observed in zebrafish. Our findings suggest that for the group of metal and metal oxide nanomaterials, the core chemical composition, concentration and properties dependent upon nanomaterial surface and medium composition (such as zeta potential and agglomerate size) are significant factors influencing toxicity, albeit the ranking of different variables is sensitive to the exact analysis method and data modeled. Our generalized nano-QSAR ensemble models provide a promising framework for anticipating the toxicity potential of new nanomaterials and may contribute to the transition out of the animal testing paradigm. However, future experimental studies are required to generate comparable, similarly high quality data, using consistent protocols, for well characterized nanomaterials, as per the dataset modeled herein. This would enable the predictive power of our promising ensemble modeling approaches to be robustly assessed on large, diverse and truly external datasets.
Affiliation(s)
- C Gousiadou
- School of Chemical Engineering, National Technical University of Athens, Athens, Greece
- R L Marchese Robinson
- School of Chemical and Process Engineering, University of Leeds, Leeds, United Kingdom
- M Kotzabasaki
- School of Chemical Engineering, National Technical University of Athens, Athens, Greece
- P Doganis
- School of Chemical Engineering, National Technical University of Athens, Athens, Greece
- T A Wilkins
- School of Chemical and Process Engineering, University of Leeds, Leeds, United Kingdom
- X Jia
- School of Chemical and Process Engineering, University of Leeds, Leeds, United Kingdom
- H Sarimveis
- School of Chemical Engineering, National Technical University of Athens, Athens, Greece
- S L Harper
- School of Chemical, Biological and Environmental Engineering, Oregon State University, Corvallis, OR, USA; Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR, USA; Safer Nanomaterials and Nanomanufacturing Initiative, Oregon Nanoscience and Microtechnologies Institute, Eugene, OR, USA
|
38
|
Barton-Henry K, Wenz L, Levermann A. Decay radius of climate decision for solar panels in the city of Fresno, USA. Sci Rep 2021; 11:8571. [PMID: 33883574 PMCID: PMC8060319 DOI: 10.1038/s41598-021-87714-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/16/2021] [Accepted: 03/30/2021] [Indexed: 11/09/2022] Open
Abstract
To design incentives towards achieving climate mitigation targets, it is important to understand the mechanisms that affect individual climate decisions such as solar panel installation. It has been shown that peer effects are important in determining the uptake and spread of household photovoltaic installations. Due to coarse geographical data, it remains unclear whether this effect is generated through geographical proximity or within groups exhibiting similar characteristics. Here we show that geographical proximity is the most important predictor of solar panel implementation, and that peer effects diminish with distance. Using satellite imagery, we build a unique geo-located dataset for the city of Fresno to specify the importance of small distances. Employing machine learning techniques, we find the density of solar panels within the shortest measured radius of an address is the most important factor in determining the likelihood of that address having a solar panel. The importance of geographical proximity decreases with distance following an exponential curve with a decay radius of 210 meters. The dependence is slightly more pronounced in low-income groups. These findings support the model of distance-related social diffusion, and suggest priority should be given to seeding panels in areas where few exist.
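The exponential dependence on distance described in this abstract can be written out as a short worked equation. This is a sketch that assumes "decay radius" means the e-folding length of the curve; the symbols $I(d)$ (importance of proximity at distance $d$), $I_0$ and $r_0$ are illustrative and are not taken from the paper.

```latex
I(d) = I_0 \, e^{-d / r_0}, \qquad r_0 = 210\ \text{m}
\quad\Longrightarrow\quad
\frac{I(r_0)}{I_0} = e^{-1} \approx 0.37, \qquad
\frac{I(2 r_0)}{I_0} = e^{-2} \approx 0.14
```

Under that assumption, the importance of proximity falls to roughly 37% of its initial value at 210 meters and to roughly 14% at 420 meters.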
Affiliation(s)
- Leonie Wenz
- Potsdam Institute for Climate Impact Research, Potsdam, Germany
- Mercator Research Institute On Global Commons and Climate Change, Berlin, Germany
- Department of Agriculture and Resource Economics, University of California, Berkeley, USA
- Anders Levermann
- Potsdam Institute for Climate Impact Research, Potsdam, Germany
- Institute of Physics, Potsdam University, Potsdam, Germany
- Columbia University, New York, NY, USA
|
39
|
Wu Z, Jiang D, Hsieh CY, Chen G, Liao B, Cao D, Hou T. Hyperbolic relational graph convolution networks plus: a simple but highly efficient QSAR-modeling method. Brief Bioinform 2021; 22:6235968. [PMID: 33866354 DOI: 10.1093/bib/bbab112] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 01/12/2021] [Revised: 03/11/2021] [Accepted: 03/12/2021] [Indexed: 01/04/2023] Open
Abstract
Accurate predictions of druggability and bioactivities of compounds are desirable to reduce the high cost and time of drug discovery. After more than five decades of continuing developments, quantitative structure-activity relationship (QSAR) methods have been established as indispensable tools that facilitate fast, reliable and affordable assessments of physicochemical and biological properties of compounds in drug-discovery programs. Currently, there are mainly two types of QSAR methods, descriptor-based methods and graph-based methods. The former is developed based on predefined molecular descriptors, whereas the latter is developed based on simple atomic and bond information. In this study, we presented a simple but highly efficient modeling method by combining molecular graphs and molecular descriptors as the input of a modified graph neural network, called hyperbolic relational graph convolution network plus (HRGCN+). The evaluation results show that HRGCN+ achieves state-of-the-art performance on 11 drug-discovery-related datasets. We also explored the impact of the addition of traditional molecular descriptors on the predictions of graph-based methods, and found that the addition of molecular descriptors can indeed boost the predictive power of graph-based methods. The results also highlight the strong anti-noise capability of our method. In addition, our method provides a way to interpret models at both the atom and descriptor levels, which can help medicinal chemists extract hidden information from complex datasets. We also offer an HRGCN+'s online prediction service at https://quantum.tencent.com/hrgcn/.
Affiliation(s)
- Zhenxing Wu
- College of Pharmaceutical Sciences, Zhejiang University, under the supervision of Prof. Tingjun Hou
- Dejun Jiang
- College of Pharmaceutical Sciences, Zhejiang University, under the supervision of Prof. Tingjun Hou
- Guangyong Chen
- Shenzhen Institute of Advanced Technology Chinese Academy of Sciences
- Ben Liao
- demonstrated history of working in industry and academia. Skilled in machine learning, mathematics, natural language processing, computer vision and graph neural networks. Strong education professional with a PhD from Université de Paris in France
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University
|
40
|
Alcantara RS, Day EM, Hahn ME, Grabowski AM. Sacral acceleration can predict whole-body kinetics and stride kinematics across running speeds. PeerJ 2021; 9:e11199. [PMID: 33954039 PMCID: PMC8048400 DOI: 10.7717/peerj.11199] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/16/2020] [Accepted: 03/10/2021] [Indexed: 12/31/2022] Open
Abstract
Background Stress fractures are injuries caused by repetitive loading during activities such as running. The application of advanced analytical methods such as machine learning to data from multiple wearable sensors has allowed for predictions of biomechanical variables associated with running-related injuries like stress fractures. However, it is unclear if data from a single wearable sensor can accurately estimate variables that characterize external loading during running such as peak vertical ground reaction force (vGRF), vertical impulse, and ground contact time. Predicting these biomechanical variables with a single wearable sensor could allow researchers, clinicians, and coaches to longitudinally monitor biomechanical running-related injury risk factors without expensive force-measuring equipment. Purpose We quantified the accuracy of applying quantile regression forest (QRF) and linear regression (LR) models to sacral-mounted accelerometer data to predict peak vGRF, vertical impulse, and ground contact time across a range of running speeds. Methods Thirty-seven collegiate cross country runners (24 females, 13 males) ran on a force-measuring treadmill at 3.8-5.4 m/s while wearing an accelerometer clipped posteriorly to the waistband of their running shorts. We cross-validated QRF and LR models by training them on acceleration data, running speed, step frequency, and body mass as predictor variables. Trained models were then used to predict peak vGRF, vertical impulse, and contact time. We compared predicted values to those calculated from a force-measuring treadmill on a subset of data (n = 9) withheld during model training. We quantified prediction accuracy by calculating the root mean square error (RMSE) and mean absolute percentage error (MAPE). 
Results The QRF model predicted peak vGRF with a RMSE of 0.150 body weights (BW) and MAPE of 4.27 ± 2.85%, predicted vertical impulse with a RMSE of 0.004 BW*s and MAPE of 0.80 ± 0.91%, and predicted contact time with a RMSE of 0.011 s and MAPE of 4.68 ± 3.00%. The LR model predicted peak vGRF with a RMSE of 0.139 BW and MAPE of 4.04 ± 2.57%, predicted vertical impulse with a RMSE of 0.002 BW*s and MAPE of 0.50 ± 0.42%, and predicted contact time with a RMSE of 0.008 s and MAPE of 3.50 ± 2.27%. There were no statistically significant differences between QRF and LR model prediction MAPE for peak vGRF (p = 0.549) or vertical impulse (p = 0.073), but the LR model's MAPE for contact time was significantly lower than the QRF model's MAPE (p = 0.0497). Conclusions Our findings indicate that the QRF and LR models can accurately predict peak vGRF, vertical impulse, and contact time (MAPE < 5%) from a single sacral-mounted accelerometer across a range of running speeds. These findings may be beneficial for researchers, clinicians, or coaches seeking to monitor running-related injury risk factors without force-measuring equipment.
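The two accuracy metrics used in this abstract, RMSE and MAPE, can be sketched in a few lines. This is a minimal illustration of the standard definitions, not the authors' analysis code; the function names and the sample values are hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between paired measurements and predictions."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(measured, predicted):
    """Mean absolute percentage error, in percent (measured values must be non-zero)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs((m - p) / m) for m, p in zip(measured, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical peak vGRF values in body weights (BW), for illustration only.
measured = [2.5, 2.0, 4.0]
predicted = [2.0, 2.5, 4.0]
print(f"RMSE = {rmse(measured, predicted):.3f} BW")
print(f"MAPE = {mape(measured, predicted):.1f}%")
```

RMSE keeps the units of the measured quantity (BW, BW*s, or s in the abstract), while MAPE is unitless, which is why the abstract can apply a single 5% threshold across all three variables.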
Affiliation(s)
- Ryan S Alcantara
- Department of Integrative Physiology, University of Colorado Boulder, Boulder, CO, United States of America
- Evan M Day
- Department of Human Physiology, University of Oregon, Eugene, OR, United States of America
- Michael E Hahn
- Department of Human Physiology, University of Oregon, Eugene, OR, United States of America
- Alena M Grabowski
- Department of Integrative Physiology, University of Colorado Boulder, Boulder, CO, United States of America
|
41
|
Wang Y, Yu Y, Han W, Zhang YJ, Jiang L, Xue HD, Lei J, Jin ZY, Yu JC. CT Radiomics for Distinction of Human Epidermal Growth Factor Receptor 2 Negative Gastric Cancer. Acad Radiol 2021; 28:e86-e92. [PMID: 32303442 DOI: 10.1016/j.acra.2020.02.018] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 12/28/2019] [Revised: 02/12/2020] [Accepted: 02/14/2020] [Indexed: 02/07/2023]
Abstract
RATIONALE AND OBJECTIVES The purpose of this study was to investigate the role of computed tomography (CT) radiomics for the prediction of the human epidermal growth factor receptor 2 (HER2) status in patients with gastric cancer. METHODS One hundred and thirty-two consecutive patients with advanced gastric cancer undergoing radical gastrectomy were retrospectively reviewed. All patients received preoperative contrast CT examination, and immunohistochemistry results of their HER2 status were available. All the subjects were randomly divided into a training cohort (n = 90) and a test cohort (n = 42). Arterial phase (AP) and portal phase (PP) contrast CT images were retrieved for tumor segmentation and feature extraction. Receiver operating characteristic (ROC) curves and area under the curve (AUC) were used to evaluate the performance of the radiomics classifiers. RESULTS Among the 132 patients, a total of 99 patients were HER2 negative, and the remaining 33 patients were borderline or positive. The AP radiomics model could distinguish HER2-negative cases with an AUC of 0.756 (95% confidence interval [CI]: 0.656-0.840) in the training cohort, which was confirmed in the test cohort with an AUC of 0.830 (95% CI: 0.678-0.930). The PP radiomics model showed AUCs of 0.715 (95% CI: 0.612-0.804) and 0.718 (95% CI: 0.554-0.849) in the training and test cohorts, respectively, for distinction of HER2-negative cases. CONCLUSION Radiomics models based on standard-of-care CT images hold promise for distinguishing HER2-negative gastric cancer.
|
42
|
Jiménez-Luna J, Skalic M, Weskamp N, Schneider G. Coloring Molecules with Explainable Artificial Intelligence for Preclinical Relevance Assessment. J Chem Inf Model 2021; 61:1083-1094. [PMID: 33629843 DOI: 10.1021/acs.jcim.0c01344] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Indexed: 01/06/2023]
Abstract
Graph neural networks are able to solve certain drug discovery tasks such as molecular property prediction and de novo molecule generation. However, these models are considered "black-box" and "hard-to-debug". This study aimed to improve modeling transparency for rational molecular design by applying the integrated gradients explainable artificial intelligence (XAI) approach for graph neural network models. Models were trained for predicting plasma protein binding, hERG channel inhibition, passive permeability, and cytochrome P450 inhibition. The proposed methodology highlighted molecular features and structural elements that are in agreement with known pharmacophore motifs, correctly identified property cliffs, and provided insights into unspecific ligand-target interactions. The developed XAI approach is fully open-sourced and can be used by practitioners to train new models on other clinically relevant endpoints.
Affiliation(s)
- José Jiménez-Luna
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8049 Zurich, Switzerland
- Miha Skalic
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
- Nils Weskamp
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
- Gisbert Schneider
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8049 Zurich, Switzerland
|
43
|
Wu Z, Zhu M, Kang Y, Leung ELH, Lei T, Shen C, Jiang D, Wang Z, Cao D, Hou T. Do we need different machine learning algorithms for QSAR modeling? A comprehensive assessment of 16 machine learning algorithms on 14 QSAR data sets. Brief Bioinform 2020; 22:6032614. [PMID: 33313673 DOI: 10.1093/bib/bbaa321] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 09/17/2020] [Revised: 10/09/2020] [Accepted: 10/19/2020] [Indexed: 12/18/2022] Open
Abstract
Although a wide variety of machine learning (ML) algorithms have been utilized to learn quantitative structure-activity relationships (QSARs), there is no agreed single best algorithm for QSAR learning. Therefore, a comprehensive understanding of the performance characteristics of popular ML algorithms used in QSAR learning is highly desirable. In this study, five linear algorithms [linear function Gaussian process regression (linear-GPR), linear function support vector machine (linear-SVM), partial least squares regression (PLSR), multiple linear regression (MLR) and principal component regression (PCR)], three analogizers [radial basis function support vector machine (rbf-SVM), K-nearest neighbor (KNN) and radial basis function Gaussian process regression (rbf-GPR)], six symbolists [extreme gradient boosting (XGBoost), Cubist, random forest (RF), multiple adaptive regression splines (MARS), gradient boosting machine (GBM), and classification and regression tree (CART)] and two connectionists [principal component analysis artificial neural network (pca-ANN) and deep neural network (DNN)] were employed to learn the regression-based QSAR models for 14 public data sets comprising nine physicochemical properties and five toxicity endpoints. The results show that rbf-SVM, rbf-GPR, XGBoost and DNN generally illustrate better performances than the other algorithms. The overall performances of different algorithms can be ranked from the best to the worst as follows: rbf-SVM > XGBoost > rbf-GPR > Cubist > GBM > DNN > RF > pca-ANN > MARS > linear-GPR ≈ KNN > linear-SVM ≈ PLSR > CART ≈ PCR ≈ MLR. In terms of prediction accuracy and computational efficiency, SVM and XGBoost are recommended to the regression learning for small data sets, and XGBoost is an excellent choice for large data sets. We then investigated the performances of the ensemble models by integrating the predictions of multiple ML algorithms. 
The results illustrate that the ensembles of two or three algorithms in different categories can indeed improve the predictions of the best individual ML algorithms.
Affiliation(s)
- Zhenxing Wu
- College of Pharmaceutical Sciences, Hangzhou Institute of Innovative Medicine, Zhejiang University, P. R. China
- Minfeng Zhu
- Xiangya School of Pharmaceutical Sciences, Central South University, P. R. China
- Yu Kang
- College of Pharmaceutical Sciences, Hangzhou Institute of Innovative Medicine, Zhejiang University, P. R. China
- Elaine Lai-Han Leung
- State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, P. R. China
- Tailong Lei
- College of Pharmaceutical Sciences, Hangzhou Institute of Innovative Medicine, Zhejiang University, P. R. China
- Chao Shen
- College of Pharmaceutical Sciences, Hangzhou Institute of Innovative Medicine, Zhejiang University, P. R. China
- Dejun Jiang
- College of Pharmaceutical Sciences, Hangzhou Institute of Innovative Medicine, Zhejiang University, P. R. China
- Zhe Wang
- College of Pharmaceutical Sciences, Hangzhou Institute of Innovative Medicine, Zhejiang University, P. R. China
- Tingjun Hou
- Peking University, China. He is currently a professor in the College of Pharmaceutical Sciences, Zhejiang University, China
|
44
|
Multiclass machine learning vs. conventional calculators for stroke/CVD risk assessment using carotid plaque predictors with coronary angiography scores as gold standard: a 500 participants study. Int J Cardiovasc Imaging 2020; 37:1171-1187. [DOI: 10.1007/s10554-020-02099-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 10/13/2020] [Accepted: 11/03/2020] [Indexed: 02/07/2023]
|
45
|
Tinkov O, Polishchuk P, Matveieva M, Grigorev V, Grigoreva L, Porozov Y. The Influence of Structural Patterns on Acute Aquatic Toxicity of Organic Compounds. Mol Inform 2020; 40:e2000209. [PMID: 33029954 DOI: 10.1002/minf.202000209] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/14/2020] [Accepted: 10/01/2020] [Indexed: 12/28/2022]
Abstract
The influence of the molecular structure of different organic compounds on acute toxicity towards Fathead minnow, Daphnia magna, and Tetrahymena pyriformis was investigated using the 2D simplex representation of molecular structure and two modelling methods: Random Forest (RF) and Gradient Boosting Machine (GBM). Suitable QSAR (Quantitative Structure-Activity Relationship) models were obtained. The study focused on the interpretation of the QSAR models. Its aim was to develop a set of structural fragments that consistently increase toxicity toward Fathead minnow, Daphnia magna, and Tetrahymena pyriformis simultaneously. The interpretation provided more detail about known toxicophores and suggested new fragments. The results made it possible to rank the contributions of molecular fragments to various types of toxicity to aquatic organisms. This information can be used for molecular optimization of chemicals. According to the results of structural interpretation, the most significant common mechanisms of the toxic effect of organic compounds on Fathead minnow, Daphnia magna, and Tetrahymena pyriformis are reactions of nucleophilic substitution and inhibition of oxidative phosphorylation in mitochondria. In addition, acetylcholinesterase and voltage-gated ion channels of Fathead minnow and Daphnia magna are important targets for toxicants. The on-line version of the OCHEM expert system (https://ochem.eu) was used for a comparative QSAR investigation. The proposed QSAR models comply with the OECD principles and can be used to reliably predict acute toxicity of organic compounds towards Fathead minnow, Daphnia magna, and Tetrahymena pyriformis with allowance for applicability domain estimation.
Affiliation(s)
- Oleg Tinkov
- Department of Computer Science, Military Institute of the Ministry of Defense, 3300, Gogol str. 2"B", Tiraspol, Transdniestria, Moldova; Department of Pharmacology and Pharmaceutical Chemistry, Medical Faculty, Transnistrian State University, 3300, October 25 str. 128, Tiraspol, Transdniestria, Moldova
- Pavel Polishchuk
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
- Mariia Matveieva
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
- Veniamin Grigorev
- Institute of Physiologically Active Compounds, Russian Academy of Sciences, 142432, Severniy proezd 1, Chernogolovka, Moscow region, Russia
- Ludmila Grigoreva
- Department of Fundamental Physical and Chemical Engineering, Moscow State University, 119991, Leninskiye Gory 1/51, Moscow, Russia
- Yuri Porozov
- World-Class Research Center "Digital biodesign and personalized healthcare", I.M. Sechenov First Moscow State Medical University, Moscow, Russia; Department of Computational Biology, Sirius University of Science and Technology, 354340, Olympic Ave 1, Sochi, Russia
|
46
|
Jiménez-Luna J, Grisoni F, Schneider G. Drug discovery with explainable artificial intelligence. Nat Mach Intell 2020. [DOI: 10.1038/s42256-020-00236-4] [Citation(s) in RCA: 152] [Impact Index Per Article: 38.0] [Indexed: 12/17/2022]
|
47
|
Wang Y, Liu W, Yu Y, Liu JJ, Jiang L, Xue HD, Lei J, Jin Z, Yu JC. Prediction of the Depth of Tumor Invasion in Gastric Cancer: Potential Role of CT Radiomics. Acad Radiol 2020; 27:1077-1084. [PMID: 31761666 DOI: 10.1016/j.acra.2019.10.020] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 09/10/2019] [Revised: 10/22/2019] [Accepted: 10/25/2019] [Indexed: 12/12/2022]
Abstract
RATIONALE AND OBJECTIVES: The aim of this study was to investigate the value of computed tomography (CT) radiomics for differentiating between T2 and T3/4 stage lesions in gastric cancer. MATERIALS AND METHODS: A total of 244 consecutive patients with pathologically proven gastric cancer were retrospectively included and split into a training cohort (171 patients) and a test cohort (73 patients). Preoperative arterial phase and portal phase contrast-enhanced CT images were retrieved for tumor segmentation and feature extraction using dedicated postprocessing software. The random forest method was used to build the classifier models. RESULTS: The performance of the single-phase radiomics models was favorable in differentiating between T2 and T3/4 stage tumors. The arterial phase-based radiomics model exhibited areas under the curve of 0.899 (95% CI: 0.812-0.955) in the training cohort and 0.825 (95% CI: 0.718-0.904) in the test cohort. The portal phase-based radiomics model showed areas under the curve of 0.843 (95% CI: 0.746-0.914) and 0.818 (95% CI: 0.711-0.899) in the training and test cohorts, respectively. CONCLUSION: The CT radiomics approach has a potential role in differentiating between T2 and T3/4 stage tumors in gastric cancer.
|
48
|
Chen CH, Tanaka K, Kotera M, Funatsu K. Comparison and improvement of the predictability and interpretability with ensemble learning models in QSPR applications. J Cheminform 2020; 12:19. [PMID: 33430997 PMCID: PMC7106596 DOI: 10.1186/s13321-020-0417-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 12/12/2018] [Accepted: 02/05/2020] [Indexed: 12/23/2022] Open
Abstract
Ensemble learning improves machine learning results by combining several models, yielding better predictive performance than a single model. It also benefits and accelerates research in quantitative structure–activity relationship (QSAR) and quantitative structure–property relationship (QSPR) modeling. With the growing number of ensemble learning models such as random forest, the effectiveness of QSAR/QSPR will be limited by the machine’s inability to explain its predictions to researchers. In fact, many implementations of ensemble learning models are able to quantify the overall magnitude of each feature. For example, feature importance allows us to assess the relative importance of features and to interpret the predictions. However, different ensemble learning methods or implementations may lead to different feature selections for interpretation. In this paper, we compared the predictability and interpretability of four well-established ensemble learning models (random forest, extremely randomized trees, adaptive boosting, and gradient boosting) for regression and binary classification modeling tasks. A blending method was then built by combining the four ensemble learning methods. The blending method led to better performance and a unified interpretation by summarizing the individual predictions of the different learning models. The important features in two case studies, which provided valuable information about compound properties, are discussed in detail in this report. QSPR modeling with interpretable machine learning techniques can help chemical design work more efficiently, confirm hypotheses, and establish knowledge for better results.
Affiliation(s)
- Chia-Hsiu Chen
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
- Kenichi Tanaka
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
- Masaaki Kotera
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
- Kimito Funatsu
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
|
49
|
Luo Y, Tang Z, Hu X, Lu S, Miao B, Hong S, Bai H, Sun C, Qiu J, Liang H, Na N. Machine learning for the prediction of severe pneumonia during posttransplant hospitalization in recipients of a deceased-donor kidney transplant. Ann Transl Med 2020; 8:82. [PMID: 32175375 DOI: 10.21037/atm.2020.01.09] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/20/2022]
Abstract
Background: Pneumonia accounts for the majority of infection-related deaths after kidney transplantation. We aimed to build a machine learning-based predictive model for severe pneumonia in recipients of deceased-donor transplants within the perioperative period after surgery. Methods: We collected the features of kidney transplant recipients and used tree-based ensemble classification algorithms (Random Forest or AdaBoost) and nonensemble classifiers (support vector machine, Naïve Bayes, or logistic regression) to build the predictive models. We used the area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC) to evaluate predictive performance via ten-fold cross-validation. Results: Five hundred nineteen patients who underwent transplantation from January 2015 to December 2018 were included. Forty-three severe pneumonia episodes (8.3%) occurred during hospitalization after surgery. Significant differences in the recipients' age, diabetes status, HBsAg level, operation time, reoperation, usage of anti-fungal drugs, preoperative albumin and immunoglobulin levels, preoperative pulmonary lesions, and delayed graft function, as well as donor age, were observed between patients with and without severe pneumonia (P<0.05). We screened eight important features correlated with severe pneumonia using the recursive feature elimination method and then constructed a predictive model based on these features. The top three features were preoperative pulmonary lesions, reoperation, and recipient age (with importance scores of 0.194, 0.124, and 0.078, respectively). Among the machine learning algorithms described above, the Random Forest algorithm displayed the best predictive performance, with a sensitivity of 0.67, specificity of 0.97, positive likelihood ratio of 22.33, negative likelihood ratio of 0.34, AUROC of 0.91, and AUPRC of 0.72.
Conclusions: The Random Forest model is potentially useful for predicting severe pneumonia in kidney transplant recipients. Recipients with a potential preoperative pulmonary infection, who are older, and who require reoperation should be monitored carefully to prevent the occurrence of severe pneumonia.
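The likelihood ratios reported in this abstract follow from the stated sensitivity and specificity by the standard definitions, LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The short sketch below reproduces that arithmetic; the function name `likelihood_ratios` is a hypothetical helper introduced here for illustration and does not come from the cited paper.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity.

    Hypothetical helper for illustration; uses the standard definitions:
      LR+ = sensitivity / (1 - specificity)
      LR- = (1 - sensitivity) / specificity
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Values quoted in the abstract for the Random Forest model.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.67, specificity=0.97)
print(f"LR+ = {lr_pos:.2f}")  # prints "LR+ = 22.33"
print(f"LR- = {lr_neg:.2f}")  # prints "LR- = 0.34"
```

Both values agree with the figures quoted in the abstract (22.33 and 0.34) to two decimal places.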
Affiliation(s)
- You Luo
- Department of Kidney Transplantation, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510630, China
- Zuofu Tang
- Department of Kidney Transplantation, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510630, China
- Xiao Hu
- Department of Kidney Transplantation, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510630, China
- Shuo Lu
- Department of Kidney Transplantation, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510630, China
- Bin Miao
- Department of Kidney Transplantation, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510630, China
- Songlin Hong
- Fane Data Technology Corporation, Tianjin 300384, China
- Haiyun Bai
- Fane Data Technology Corporation, Tianjin 300384, China
- Chen Sun
- Fane Data Technology Corporation, Tianjin 300384, China
- Jiang Qiu
- Department of Kidney Transplantation, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou 510080, China
- Huiying Liang
- Institute of Pediatrics, Guangzhou Women and Children's Medical Center, Guangzhou 510623, China
- Ning Na
- Department of Kidney Transplantation, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510630, China
|
50
|
Neural-based approaches to overcome feature selection and applicability domain in drug-related property prediction. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105777] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/20/2022]
|