1
Boonyarit B, Yamprasert N, Kaewnuratchadasorn P, Kinchagawat J, Prommin C, Rungrotmongkol T, Nutanong S. GraphEGFR: Multi-task and transfer learning based on molecular graph attention mechanism and fingerprints improving inhibitor bioactivity prediction for EGFR family proteins on data scarcity. J Comput Chem 2024; 45:2001-2023. [PMID: 38713612 DOI: 10.1002/jcc.27388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 04/16/2024] [Accepted: 04/19/2024] [Indexed: 05/09/2024]
Abstract
The proteins within the human epidermal growth factor receptor (EGFR) family, members of the tyrosine kinase receptor family, play a pivotal role in the molecular mechanisms driving the development of various tumors. Tyrosine kinase inhibitors, key compounds in targeted therapy, encounter challenges in cancer treatment due to emerging drug resistance mutations. Consequently, machine learning has undergone significant evolution to address the challenges of cancer drug discovery related to EGFR family proteins. However, the application of deep learning in this area is hindered by inherent difficulties associated with small-scale data, particularly the risk of overfitting. Moreover, the design of a model architecture that facilitates learning through multi-task and transfer learning, coupled with appropriate molecular representation, poses substantial challenges. In this study, we introduce GraphEGFR, a deep learning regression model designed to enhance molecular representation and model architecture for predicting the bioactivity of inhibitors against both wild-type and mutant EGFR family proteins. GraphEGFR integrates a graph attention mechanism for molecular graphs with deep and convolutional neural networks for molecular fingerprints. We observed that GraphEGFR models employing multi-task and transfer learning strategies generally achieve predictive performance comparable to existing competitive methods. The integration of molecular graphs and fingerprints adeptly captures relationships between atoms and enables both global and local pattern recognition. We further validated potential multi-targeted inhibitors for wild-type and mutant HER1 kinases, exploring key amino acid residues through molecular dynamics simulations to understand molecular interactions. 
This predictive model offers a robust strategy that could significantly contribute to overcoming the challenges of developing deep learning models for drug discovery with limited data and exploring new frontiers in multi-targeted kinase drug discovery for EGFR family proteins.
Affiliation(s)
- Bundit Boonyarit
- School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
- Nattawin Yamprasert
- School of Information, Computer, and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand
- Jiramet Kinchagawat
- School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
- Chanatkran Prommin
- School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
- Thanyada Rungrotmongkol
- Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
- Center of Excellence in Structural and Computational Biology Research Unit, Department of Biochemistry, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Sarana Nutanong
- School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
2
Chen Z, Zhang L, Zhang P, Guo H, Zhang R, Li L, Li X. Prediction of Cytochrome P450 Inhibition Using a Deep Learning Approach and Substructure Pattern Recognition. J Chem Inf Model 2024; 64:2528-2538. [PMID: 37864562 DOI: 10.1021/acs.jcim.3c01396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2023]
Abstract
Cytochrome P450 (CYP) is a family of enzymes that are responsible for about 75% of all metabolic reactions. Among them, CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4 participate in the metabolism of most drugs and mediate many adverse drug reactions. Therefore, it is necessary to estimate the chemical inhibition of cytochrome P450 enzymes in drug discovery and the food industry. In the past few decades, many computational models have been reported, and some have performed well. However, these models still have several unresolved issues, such as coverage of only a single isoform, unbalanced performance, lack of structural characteristics analysis, and poor availability. In the present study, deep learning models were developed in Python, using the Keras framework and TensorFlow, to predict the chemical inhibition of each CYP isoform. These models were built on a large data set containing 85715 compounds extracted from the PubChem bioassay database. On external validation, the models achieved good AUC values of 0.97, 0.94, 0.94, 0.96, and 0.94 for CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4, respectively. The models can be freely accessed on the Web server named CYPi-DNNpredictor (cypi.sapredictor.cn), and the code for the models was made open source in the Supporting Information. In addition, we analyzed the structural characteristics of chemicals with CYP450 inhibition and detected the structural alerts (SAs) that should be responsible for the inhibition. The SAs were also made available online through a tool named CYPi-SAdetector (cypisa.sapredictor.cn). The models can be used as a powerful tool for the prediction of CYP450 inhibitors, and the SAs should provide useful information on the mechanisms of cytochrome P450 inhibition.
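The AUC values quoted in this abstract measure how reliably a model ranks inhibitors ahead of non-inhibitors. The following is a minimal sketch of that metric, not the authors' code (which accompanies the paper's Supporting Information). It uses the rank-based Mann-Whitney formulation, and the labels and scores are made-up illustrative values.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win (Mann-Whitney formulation of ROC AUC).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = inhibitor, 0 = non-inhibitor.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(labels, scores))
```

In this toy case eight of the nine positive/negative pairs are ranked correctly, so the AUC is 8/9. The pairwise loop is quadratic and is meant only to make the definition explicit; a sorted-rank implementation would be preferable for a data set of the size described above.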
Affiliation(s)
- Zhaoyang Chen
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Le Zhang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Pei Zhang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Huizhu Guo
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Ruiqiu Zhang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Ling Li
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Xiao Li
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
3
Ibrahim S, Abdul Wahab N. Optimizing neural network algorithms for submerged membrane bioreactor: A comparative study of OVAT and RSM hyperparameter optimization techniques. Water Science and Technology: A Journal of the International Association on Water Pollution Research 2024; 89:1701-1724. [PMID: 38619898 DOI: 10.2166/wst.2024.099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 03/10/2024] [Indexed: 04/17/2024]
Abstract
Hyperparameter tuning is an important process to maximize the performance of any neural network model. The present study proposes a factorial design of experiments for screening and response surface methodology to optimize the hyperparameters of two artificial neural network algorithms. A feed-forward neural network (FFNN) and a radial basis function neural network (RBFNN) are applied to predict the permeate flux of palm oil mill effluent. Permeate pump and transmembrane pressure of the submerged membrane bioreactor system are the input variables. Six hyperparameters of the FFNN model, comprising four numerical factors (neuron numbers, learning rate, momentum, and epoch numbers) and two categorical factors (training and activation functions), are used in hyperparameter optimization. The RBFNN has two numerical factors: the number of neurons and the spread. The conventional one-variable-at-a-time method is compared in terms of optimization processing time and model accuracy. The results indicate that the optimal hyperparameters obtained by the proposed approach produce good accuracy with a smaller generalization error. The simulation results show an improvement of more than 65% in training performance, with less repetition and processing time. The proposed methodology can be applied to any type of neural network application to find the optimum levels of different parameters.
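To make the contrast between the two experimental designs concrete, here is a minimal sketch that enumerates the runs of a two-level full factorial design and of a simplified one-variable-at-a-time (OVAT) plan. The low and high levels are hypothetical, since the abstract names the four numerical FFNN factors but not their ranges, and the OVAT variant shown (varying each factor around a fixed baseline) is an assumption made for illustration.

```python
from itertools import product

# Hypothetical low/high levels; the abstract does not give the actual ranges.
levels = {
    "neurons": (5, 20),
    "learning_rate": (0.01, 0.1),
    "momentum": (0.5, 0.9),
    "epochs": (100, 1000),
}


def full_factorial(levels):
    """Every combination of levels, so factor interactions are covered."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


def one_variable_at_a_time(levels):
    """Start from the first level of each factor and change one factor per run."""
    baseline = {name: values[0] for name, values in levels.items()}
    runs = [dict(baseline)]
    for name, values in levels.items():
        for value in values[1:]:
            runs.append({**baseline, name: value})
    return runs


factorial_runs = full_factorial(levels)
ovat_runs = one_variable_at_a_time(levels)
print(len(factorial_runs), len(ovat_runs))
```

The OVAT plan never changes two factors in the same run, so it cannot estimate interaction effects between hyperparameters, whereas the factorial plan includes every pairing of levels.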
Affiliation(s)
- Syahira Ibrahim
- Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia
- Norhaliza Abdul Wahab
- Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia
4
López-Flores FJ, Ramírez-Márquez C, Rubio-Castro E, Ponce-Ortega JM. Solar photovoltaic panel production in Mexico: A novel machine learning approach. Environmental Research 2024; 246:118047. [PMID: 38160972 DOI: 10.1016/j.envres.2023.118047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 11/29/2023] [Accepted: 12/24/2023] [Indexed: 01/03/2024]
Abstract
This study examines the potential for widespread solar photovoltaic panel production in Mexico and emphasizes the country's unique qualities that position it as a strong manufacturing candidate in this field. An advanced model based on artificial neural networks has been developed to predict solar photovoltaic panel plant metrics. This model integrates a state-of-the-art non-linear programming framework using Pyomo as well as an innovative optimization and machine learning toolkit library. This approach creates surrogate models for individual photovoltaic plants including production timelines. This research, conducted through extensive simulations and meticulous computations, reveals that Latin America has been significantly underrepresented in the production of silicon, wafers, cells, and modules within the global market. It also demonstrates the substantial potential of scaling up photovoltaic panel production in Mexico, leading to significant economic, social, and environmental benefits. Through hyperparameter optimization, an outstanding and competitive artificial neural network model has been developed, with coefficient of determination values above 0.99 for all output variables. It has been found that water and energy consumption during PV panel production is remarkable. However, water consumption (33.16 × 10⁻⁴ m³/kWh) and the emissions generated (1.12 × 10⁻⁶ ton CO₂/kWh) during energy production are significantly lower than those of conventional power plants. Notably, the results highlight a positive economic trend, with module production plants generating the highest profits (35.7%) among all production stages, while polycrystalline silicon production plants yield comparatively lower earnings (13.0%). Furthermore, this study underscores a critical factor in the photovoltaic panel production process: cell production plants contribute the most to energy consumption (39.7%) due to their intricate multi-stage processes.
The blending of machine learning and optimization models heralds a new era in resource allocation for a more sustainable renewable energy sector, offering a brighter, greener future.
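The coefficient of determination cited for the surrogate model can be illustrated with a short worked example. This is a minimal sketch with made-up target and prediction values, not data from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical plant metric values and surrogate-model predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.1, 3.9]
print(r_squared(y_true, y_pred))
```

Here the residual sum of squares is 0.04 against a total sum of squares of 5, giving a value of about 0.992, which is the kind of fit the abstract describes as above 0.99.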
Affiliation(s)
- Francisco Javier López-Flores
- Chemical Engineering Department, Universidad Michoacana de San Nicolás de Hidalgo, Av. Francisco J. Múgica, S/N, Ciudad Universitaria, Edificio V1, Morelia, Mich., 58060, Mexico
- César Ramírez-Márquez
- Chemical Engineering Department, Universidad Michoacana de San Nicolás de Hidalgo, Av. Francisco J. Múgica, S/N, Ciudad Universitaria, Edificio V1, Morelia, Mich., 58060, Mexico
- Eusiel Rubio-Castro
- Chemical and Biological Sciences Department, Universidad Autónoma de Sinaloa, Av. de las Américas S/N, Culiacán, Sinaloa, 80010, Mexico
- José María Ponce-Ortega
- Chemical Engineering Department, Universidad Michoacana de San Nicolás de Hidalgo, Av. Francisco J. Múgica, S/N, Ciudad Universitaria, Edificio V1, Morelia, Mich., 58060, Mexico
5
Boldini D, Ballabio D, Consonni V, Todeschini R, Grisoni F, Sieber SA. Effectiveness of molecular fingerprints for exploring the chemical space of natural products. J Cheminform 2024; 16:35. [PMID: 38528548 DOI: 10.1186/s13321-024-00830-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 03/17/2024] [Indexed: 03/27/2024] [Open Access]
Abstract
Natural products are a diverse class of compounds with promising biological properties, such as high potency and excellent selectivity. However, they have different structural motifs than typical drug-like compounds, e.g., a wider range of molecular weight, multiple stereocenters, and a higher fraction of sp3-hybridized carbons. This makes the encoding of natural products via molecular fingerprints difficult, thus restricting their use in cheminformatics studies. To tackle this issue, we explored over 30 years of research to systematically evaluate which molecular fingerprint provides the best performance on the natural product chemical space. We considered 20 molecular fingerprints from four different sources, which we then benchmarked on over 100,000 unique natural products from the COCONUT (COlleCtion of Open Natural prodUcTs) and CMNPD (Comprehensive Marine Natural Products Database) databases. Our analysis focused on the correlation between different fingerprints and their classification performance on 12 bioactivity prediction datasets. Our results show that different encodings can provide fundamentally different views of the natural product chemical space, leading to substantial differences in pairwise similarity and performance. While Extended Connectivity Fingerprints are the de facto option for encoding drug-like compounds, other fingerprints were found to match or outperform them for bioactivity prediction of natural products. These results highlight the need to evaluate multiple fingerprinting algorithms for optimal performance and suggest new areas of research. Finally, we provide an open-source Python package for computing all molecular fingerprints considered in the study, as well as data and scripts necessary to reproduce the results, at https://github.com/dahvida/NP_Fingerprints.
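The point that different encodings give different pairwise similarities can be sketched with the Tanimoto coefficient, a common similarity measure for bit-vector fingerprints. The abstract does not state which similarity measure the authors used, so its use here is an assumption, and the on-bit indices are invented rather than computed from real molecules or from the authors' package.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


# Hypothetical on-bit indices for the same pair of molecules under two encodings.
encoding_1 = ({1, 4, 9, 12}, {1, 4, 9, 30})
encoding_2 = ({2, 7}, {7, 8, 15, 21})

print(tanimoto(*encoding_1))
print(tanimoto(*encoding_2))
```

With these toy bit sets the same molecule pair scores 0.6 under the first encoding and 0.2 under the second, which is the kind of disagreement between fingerprints that the study quantifies at scale.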
Affiliation(s)
- Davide Boldini
- TUM School of Natural Sciences, Department of Bioscience, Technical University of Munich, Center for Functional Protein Assemblies (CPA), 85748, Garching bei München, Germany.
- Davide Ballabio
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.zza Della Scienza, 1, 20126, Milan, Italy
- Viviana Consonni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.zza Della Scienza, 1, 20126, Milan, Italy
- Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.zza Della Scienza, 1, 20126, Milan, Italy
- Francesca Grisoni
- Institute for Complex Molecular Systems and Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, Netherlands
- Stephan A Sieber
- TUM School of Natural Sciences, Department of Bioscience, Technical University of Munich, Center for Functional Protein Assemblies (CPA), 85748, Garching bei München, Germany
6
Tran TTV, Tayara H, Chong KT. Recent Studies of Artificial Intelligence on In Silico Drug Absorption. J Chem Inf Model 2023; 63:6198-6211. [PMID: 37819031 DOI: 10.1021/acs.jcim.3c00960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/13/2023]
Abstract
Absorption is an important area of research in pharmacochemistry and drug development, because the drug has to be absorbed before any drug effects can occur. Furthermore, the ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) profile of drugs can be directly and considerably altered by modulating factors affecting absorption. Many drugs in development fail because of poor absorption. The research and continuous efforts of researchers in recent years have brought many successes and promises in drug absorption property prediction, especially in silico, which helps to reduce the time and cost significantly for screening undesirable drug candidates. In this report, we explicitly provide an overview of recent in silico studies on predicting absorption properties, especially from 2019 to the present, using artificial intelligence. Additionally, we have collected and investigated public databases that support absorption prediction research. On those grounds, we also proposed the challenges and development directions of absorption prediction in the future. We hope this review can provide researchers with valuable guidelines on absorption prediction to facilitate the development of newer approaches in drug discovery.
Affiliation(s)
- Thi Tuyet Van Tran
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Faculty of Information Technology, An Giang University, Long Xuyen 880000, Vietnam
- Vietnam National University, Ho Chi Minh City, Ho Chi Minh 700000, Vietnam
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Kil To Chong
- Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea
7
Bond A, Mccay K, Lal S. Artificial intelligence & clinical nutrition: What the future might have in store. Clin Nutr ESPEN 2023; 57:542-549. [PMID: 37739704 DOI: 10.1016/j.clnesp.2023.07.082] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/17/2023] [Revised: 07/02/2023] [Accepted: 07/17/2023] [Indexed: 09/24/2023]
Abstract
Artificial Intelligence (AI) is a rapidly emerging technology in healthcare that has the potential to revolutionise clinical nutrition. AI can assist in analysing complex data, interpreting medical images, and providing personalised nutrition interventions for patients. Clinical nutrition is a critical aspect of patient care, and AI can help clinicians make more informed decisions regarding patients' nutritional requirements, disease prevention, and management. AI algorithms can analyse large datasets to identify novel associations between diet and disease outcomes, enabling clinicians to make evidence-based nutritional recommendations. AI-powered devices and applications can also assist in tracking dietary intake, providing feedback, and motivating patients towards healthier food choices. However, the adoption of AI in clinical nutrition raises several ethical and regulatory concerns, such as data privacy and bias. Further research is needed to assess the clinical effectiveness and safety of AI-powered nutrition interventions. In conclusion, AI has the potential to transform clinical nutrition, but its integration into clinical practice should be carefully monitored to ensure patient safety and benefit. This article discusses the current and future applications of AI in clinical nutrition and highlights its potential benefits.
Affiliation(s)
- Ashley Bond
- Intestinal Failure Unit, Salford Royal Foundation Trust, UK; University of Manchester, Manchester, UK.
- Kevin Mccay
- Manchester Metropolitan University, Manchester, UK; Northern Care Alliance NHS Foundation Trust, Salford Royal Hospital, Salford, UK
- Simon Lal
- Intestinal Failure Unit, Salford Royal Foundation Trust, UK; University of Manchester, Manchester, UK
8
Cuomo A, Ibarraran S, Sreekumar S, Li H, Eun J, Menzel JP, Zhang P, Buono F, Song JJ, Crabtree RH, Batista VS, Newhouse TR. Feed-Forward Neural Network for Predicting Enantioselectivity of the Asymmetric Negishi Reaction. ACS Central Science 2023; 9:1768-1774. [PMID: 37780365 PMCID: PMC10540279 DOI: 10.1021/acscentsci.3c00512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Indexed: 10/03/2023]
Abstract
Density functional theory (DFT) is a powerful tool to model transition state (TS) energies to predict selectivity in chemical synthesis. However, a successful multistep synthesis campaign must navigate energetically narrow differences in pathways that create some limits to rapid and unambiguous application of DFT to these problems. While powerful data science techniques may provide a complementary approach to overcome this problem, doing so with the relatively small data sets that are widespread in organic synthesis presents a significant challenge. Herein, we show that a small data set can be labeled with features from DFT TS calculations to train a feed-forward neural network for predicting enantioselectivity of a Negishi cross-coupling reaction with P-chiral hindered phosphines. This approach to modeling enantioselectivity is compared with conventional approaches, including exclusive use of DFT energies and data science approaches, using features from ligands or ground states with neural network architectures.
Affiliation(s)
- Abbigayle E. Cuomo
- Department of Chemistry, Yale University, New Haven, Connecticut 06511, United States
- Sebastian Ibarraran
- Department of Chemistry, Yale University, New Haven, Connecticut 06511, United States
- Sanil Sreekumar
- Chemical Development, Boehringer Ingelheim Pharmaceuticals Inc, 900 Ridgebury Road, Ridgefield, Connecticut 06877, United States
- Haote Li
- Department of Chemistry, Yale University, New Haven, Connecticut 06511, United States
- Jungmin Eun
- Department of Chemistry, Yale University, New Haven, Connecticut 06511, United States
- Jan Paul Menzel
- Department of Chemistry, Yale University, New Haven, Connecticut 06511, United States
- Pengpeng Zhang
- Department of Chemistry, Yale University, New Haven, Connecticut 06511, United States
- Frederic Buono
- Chemical Development, Boehringer Ingelheim Pharmaceuticals Inc, 900 Ridgebury Road, Ridgefield, Connecticut 06877, United States
- Jinhua J. Song
- Chemical Development, Boehringer Ingelheim Pharmaceuticals Inc, 900 Ridgebury Road, Ridgefield, Connecticut 06877, United States
- Robert H. Crabtree
- Department of Chemistry, Yale University, New Haven, Connecticut 06511, United States
- Victor S. Batista
- Department of Chemistry, Yale University, New Haven, Connecticut 06511, United States
- Timothy R. Newhouse
- Department of Chemistry, Yale University, New Haven, Connecticut 06511, United States
9
Fang C, Wang Y, Grater R, Kapadnis S, Black C, Trapa P, Sciabola S. Prospective Validation of Machine Learning Algorithms for Absorption, Distribution, Metabolism, and Excretion Prediction: An Industrial Perspective. J Chem Inf Model 2023. [PMID: 37216672 DOI: 10.1021/acs.jcim.3c00160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Absorption, distribution, metabolism, and excretion (ADME), which collectively define the concentration profile of a drug at the site of action, are of critical importance to the success of a drug candidate. Recent advances in machine learning algorithms and the availability of larger proprietary as well as public ADME data sets have generated renewed interest within the academic and pharmaceutical science communities in predicting pharmacokinetic and physicochemical endpoints in early drug discovery. In this study, we collected 120 internal prospective data sets over 20 months across six ADME in vitro endpoints: human and rat liver microsomal stability, MDR1-MDCK efflux ratio, solubility, and human and rat plasma protein binding. A variety of machine learning algorithms in combination with different molecular representations were evaluated. Our results suggest that gradient boosting decision tree and deep learning models consistently outperformed random forest over time. We also observed better performance when models were retrained on a fixed schedule; more frequent retraining generally resulted in increased accuracy, while hyperparameter tuning improved the prospective predictions only marginally.
Affiliation(s)
- Cheng Fang
- Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States
- Ye Wang
- Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States
- Richard Grater
- DMPK, Biogen, Cambridge, Massachusetts 02142, United States
- Cheryl Black
- DMPK, Biogen, Cambridge, Massachusetts 02142, United States
- Patrick Trapa
- DMPK, Biogen, Cambridge, Massachusetts 02142, United States
- Simone Sciabola
- Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States
10
Chen S, Wulamu A, Zou Q, Zheng H, Wen L, Guo X, Chen H, Zhang T, Zhang Y. MD-GNN: A mechanism-data-driven graph neural network for molecular properties prediction and new material discovery. J Mol Graph Model 2023; 123:108506. [PMID: 37182505 DOI: 10.1016/j.jmgm.2023.108506] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/30/2022] [Revised: 04/12/2023] [Accepted: 04/30/2023] [Indexed: 05/16/2023]
Abstract
Molecular property prediction and new material discovery are significant for the pharmaceutical industry, food, chemistry, and other fields. The popular methods are theoretical mechanism calculation and machine learning. The results of theoretical mechanism calculations deviate from the experimental data. Machine learning provides a promising solution; however, the process lacks interpretability, and its reliability and generalization depend on the training data. In this paper, a mechanism correction model combined with a graph neural network (GNN) model, based on the fusion of a graph embedding and a descriptor vector, is proposed as the backbone network for molecular property prediction and new material discovery. The molecular structure is input to the graph neural network, and the abstracted features are fused with numerical features for training. The experimental data and computed data are designed as the label constructor, and the theoretical computation (mechanism-driven model) is then fused with the output of the GNN (data-driven model) to form a fused model that modulates the output for molecular property prediction. Experiments on a public data set show that the Mechanism-Data-Driven Graph Neural Network (MD-GNN) can effectively make the predicted results more accurate. Nineteen molecules with different constructions were designed for potential drug discovery, and predictions from the proposed MD-GNN model identified 9 candidates.
Affiliation(s)
- Saian Chen
- Department of Computer, School of Computer and Communication Engineering, University of Science and Technology Beijing (USTB), Beijing, 100083, China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing, 100083, China
- Aziguli Wulamu
- Department of Computer, School of Computer and Communication Engineering, University of Science and Technology Beijing (USTB), Beijing, 100083, China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing, 100083, China
- Qiping Zou
- Key Laboratory of AI and Information Processing (Hechi University), Education Department of Guangxi Zhuang Autonomous Region, Hechi, 546300, Guangxi, China
- Han Zheng
- Key Laboratory of AI and Information Processing (Hechi University), Education Department of Guangxi Zhuang Autonomous Region, Hechi, 546300, Guangxi, China
- Li Wen
- Department of Business Administration, School of Business, City University of Macau (City U), Macao, 999078, China
- Xi Guo
- Department of Computer, School of Computer and Communication Engineering, University of Science and Technology Beijing (USTB), Beijing, 100083, China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing, 100083, China
- Han Chen
- Department of Computer, School of Computer and Communication Engineering, University of Science and Technology Beijing (USTB), Beijing, 100083, China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing, 100083, China
- Taohong Zhang
- Department of Computer, School of Computer and Communication Engineering, University of Science and Technology Beijing (USTB), Beijing, 100083, China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing, 100083, China
- Ying Zhang
- QingGong College, North China University of Science and Technology, TangShan, Hebei, 064000, China
11
Fine-tuning BERT for automatic ADME semantic labeling in FDA drug labeling to enhance product-specific guidance assessment. J Biomed Inform 2023; 138:104285. [PMID: 36632860 DOI: 10.1016/j.jbi.2023.104285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2022] [Revised: 10/25/2022] [Accepted: 01/07/2023] [Indexed: 01/11/2023]
Abstract
Product-specific guidances (PSGs) recommended by the United States Food and Drug Administration (FDA) are instrumental to promote and guide generic drug product development. To assess a PSG, the FDA assessor needs to take extensive time and effort to manually retrieve supportive drug information of absorption, distribution, metabolism, and excretion (ADME) from the reference listed drug labeling. In this work, we leveraged state-of-the-art pre-trained language models to automatically label the ADME paragraphs in the pharmacokinetics section from the FDA-approved drug labeling to facilitate PSG assessment. We applied a transfer learning approach by fine-tuning the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model to develop a novel application of ADME semantic labeling, which can automatically retrieve ADME paragraphs from drug labeling instead of manual work. We demonstrate that fine-tuning the pre-trained BERT model can outperform conventional machine learning techniques, achieving up to 12.5% absolute F1 improvement. To our knowledge, we were the first to successfully apply BERT to solve the ADME semantic labeling task. We further assessed the relative contribution of pre-training and fine-tuning to the overall performance of the BERT model in the ADME semantic labeling task using a series of analysis methods, such as attention similarity and layer-based ablations. Our analysis revealed that the information learned via fine-tuning is focused on task-specific knowledge in the top layers of BERT, whereas the benefit from the pre-trained BERT model comes from the bottom layers.
|
12
|
Recent Studies of Artificial Intelligence on In Silico Drug Distribution Prediction. Int J Mol Sci 2023; 24:ijms24031815. [PMID: 36768139 PMCID: PMC9915725 DOI: 10.3390/ijms24031815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Revised: 01/11/2023] [Accepted: 01/13/2023] [Indexed: 01/19/2023] Open
Abstract
Drug distribution is an important process in pharmacokinetics because it has the potential to influence both the amount of medicine reaching the active sites and the effectiveness as well as safety of the drug. The main causes of 90% of drug failures in clinical development are lack of efficacy and uncontrolled toxicity. In recent years, several advances and promising developments in drug distribution property prediction have been achieved, especially in silico, which helped to drastically reduce the time and expense of screening undesired drug candidates. In this study, we provide comprehensive knowledge of drug distribution background, influencing factors, and artificial intelligence-based distribution property prediction models from 2019 to the present. Additionally, we gathered and analyzed public databases and datasets commonly utilized by the scientific community for distribution prediction. The distribution property prediction performance of five large ADMET prediction tools is reported as a benchmark for future research. On this basis, we also outline future challenges in drug distribution prediction and research directions. We hope that this review will provide researchers with helpful insight into distribution prediction, thus facilitating the development of innovative approaches for drug discovery.
|
13
|
Interactive framework for Covid-19 detection and segmentation with feedback facility for dynamically improved accuracy and trust. PLoS One 2022; 17:e0278487. [PMID: 36548288 PMCID: PMC9778629 DOI: 10.1371/journal.pone.0278487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2022] [Accepted: 11/17/2022] [Indexed: 12/24/2022] Open
Abstract
Due to the severity and speed of spread of the ongoing Covid-19 pandemic, fast but accurate diagnosis of Covid-19 patients has become a crucial task. Achievements in this respect might inform future efforts for the containment of other possible pandemics. Researchers from various fields have been trying to provide novel ideas for models or systems to identify Covid-19 patients from different medical and non-medical data. AI-based researchers have also been trying to contribute to this area, mostly by providing novel approaches to automated systems using convolutional neural networks (CNNs) and deep neural networks (DNNs) for Covid-19 detection and diagnosis. Due to the efficiency of deep learning (DL) and transfer learning (TL) models in classification and segmentation tasks, most recent AI-based studies have proposed various DL and TL models for Covid-19 detection and infected region segmentation from chest medical images such as X-rays or CT images. This paper describes a web-based application framework for Covid-19 lung infection detection and segmentation. The proposed framework is characterized by a feedback mechanism for self-learning and tuning. It uses variations of three popular DL models, namely Mask R-CNN, U-Net, and U-Net++. The models were trained, evaluated and tested using CT images of Covid patients which were collected from two different sources. The web application provides a simple, user-friendly interface to process the CT images from various resources using the chosen models, thresholds and other parameters to generate the decisions on detection and segmentation. The models achieve high performance scores for Dice similarity, Jaccard similarity, accuracy, loss, and precision values. The U-Net model outperformed the other models with more than 98% accuracy.
|
14
|
Ji Y, Li R, Tian Y, Chen G, Yan A. Classification models and SAR analysis on thromboxane A2 synthase inhibitors by machine learning methods. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2022; 33:429-462. [PMID: 35678125 DOI: 10.1080/1062936x.2022.2078880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Accepted: 05/11/2022] [Indexed: 06/15/2023]
Abstract
Thromboxane A2 synthase (TXS) is a promising drug target for cardiovascular diseases and cancer. In this work, we conducted a structure-activity relationship (SAR) study on 526 TXS inhibitors for bioactivity prediction. Three types of descriptors (MACCS fingerprints, ECFP4 fingerprints, and MOE descriptors) were utilized to characterize inhibitors, and 24 classification models were developed using support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), and deep neural networks (DNN). Then we reduced the number of fingerprints according to the contribution of descriptors to the models, and constructed 16 extra models on simplified fingerprints. In general, Model_4D, built with the DNN algorithm and 67-bit MACCS fingerprints, performs best. The prediction accuracy of the model on the test set is 0.969, and the Matthews correlation coefficient (MCC) is 0.936. The distance between compound and model (dSTD-PRO) was used to characterize the application domain of the model. In the test set of Model_4D, the dSTD-PRO of 91.5% of compounds is lower than the corresponding training set threshold (threshold0.90 = 0.1055), and the accuracy for these compounds is 0.983. In addition, the important descriptors were summarized and further analyzed. The analysis showed that aromatic nitrogenous heterocyclic groups were beneficial to improve the bioactivity of TXS inhibitors.
Affiliation(s)
- Y Ji
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, Beijing, P. R. China
| | - R Li
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, Beijing, P. R. China
| | - Y Tian
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, Beijing, P. R. China
| | - G Chen
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China
| | - A Yan
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, Beijing, P. R. China
| |
|
15
|
Tuning hyperparameters of machine learning algorithms and deep neural networks using metaheuristics: A bioinformatics study on biomedical and biological cases. Comput Biol Chem 2022; 97:107619. [DOI: 10.1016/j.compbiolchem.2021.107619] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/21/2021] [Revised: 08/23/2021] [Accepted: 12/17/2021] [Indexed: 12/14/2022]
|
16
|
Accurate predictions of drugs aqueous solubility via deep learning tools. J Mol Struct 2022. [DOI: 10.1016/j.molstruc.2021.131562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
|
17
|
Brooks-Warburton J, Ashton J, Dhar A, Tham T, Allen PB, Hoque S, Lovat LB, Sebastian S. Artificial intelligence and inflammatory bowel disease: practicalities and future prospects. Frontline Gastroenterol 2021; 13:325-331. [PMID: 35722596 PMCID: PMC9186028 DOI: 10.1136/flgastro-2021-102003] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/07/2021] [Accepted: 11/16/2021] [Indexed: 02/04/2023] Open
Abstract
Artificial intelligence (AI) is an emerging technology predicted to have significant applications in healthcare. This review highlights AI applications that impact the patient journey in inflammatory bowel disease (IBD), from genomics to endoscopic applications in disease classification, stratification and self-monitoring to risk stratification for personalised management. We discuss the practical AI applications currently in use while giving a balanced view of concerns and pitfalls and look to the future with the potential of where AI can provide significant value to the care of the patient with IBD.
Affiliation(s)
- Johanne Brooks-Warburton
- Department of Clinical Pharmacology and Biological Sciences, University of Hertfordshire, Hatfield, UK; Gastroenterology Department, Lister Hospital, Stevenage, UK
| | - James Ashton
- Paediatric Gastroenterology, Southampton University Hospitals NHS Trust, Southampton, UK
| | - Anjan Dhar
- Gastroenterology, County Durham & Darlington NHS Foundation Trust, Bishop Auckland, UK
| | - Tony Tham
- Department of Gastroenterology, Ulster Hospital, Dundonald, UK
| | - Patrick B Allen
- Department of Gastroenterology, Ulster Hospital, Dundonald, UK
| | - Sami Hoque
- Department of Gastroenterology, Barts Health NHS Trust, London, UK
| | - Laurence B Lovat
- Division of Surgery & Interventional Science, University College London, London, UK
| | - Shaji Sebastian
- Department of Gastroenterology, Hull University Teaching Hospitals NHS Trust, Hull, UK; Hull York Medical School, Hull, UK
| |
|
18
|
Muller C, Rabal O, Diaz Gonzalez C. Artificial Intelligence, Machine Learning, and Deep Learning in Real-Life Drug Design Cases. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2021; 2390:383-407. [PMID: 34731478 DOI: 10.1007/978-1-0716-1787-8_16] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 02/06/2023]
Abstract
The discovery and development of drugs is a long and expensive process with a high attrition rate. Computational drug discovery contributes to ligand discovery and optimization, by using models that describe the properties of ligands and their interactions with biological targets. In recent years, artificial intelligence (AI) has made remarkable modeling progress, driven by new algorithms and by the increase in computing power and storage capacities, which allow the processing of large amounts of data in a short time. This review provides the current state of the art of AI methods applied to drug discovery, with a focus on structure- and ligand-based virtual screening, library design and high-throughput analysis, drug repurposing and drug sensitivity, de novo design, chemical reactions and synthetic accessibility, ADMET, and quantum mechanics.
Affiliation(s)
- Christophe Muller
- Evotec (France) SAS, Computational Drug Discovery, Integrated Drug Discovery, Toulouse, France
| | - Obdulia Rabal
- Evotec (France) SAS, Computational Drug Discovery, Integrated Drug Discovery, Toulouse, France
| | | |
|
19
|
Grebner C, Matter H, Kofink D, Wenzel J, Schmidt F, Hessler G. Application of Deep Neural Network Models in Drug Discovery Programs. ChemMedChem 2021; 16:3772-3786. [PMID: 34596968 DOI: 10.1002/cmdc.202100418] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/11/2021] [Revised: 09/29/2021] [Indexed: 12/14/2022]
Abstract
In silico driven optimization of compound properties related to pharmacokinetics, pharmacodynamics, and safety is a key requirement in modern drug discovery. Nowadays, large and harmonized datasets allow deep neural networks (DNNs) to be implemented as a framework for leveraging predictive models. Nevertheless, various available model architectures differ in their global applicability and performance in lead optimization projects, such as stability over time and interpretability of the results. Here, we describe and compare the value of established DNN-based methods for the prediction of key ADME property trends and biological activity in an industrial drug discovery environment, represented by microsomal lability, CYP3A4 inhibition and factor Xa inhibition. Three architectures are exemplified: our earlier described multilayer perceptron approach (MLP), graph convolutional network-based models (GCN), and a vector representation approach, Mol2Vec. From a statistical perspective, MLP and GCN were found to outperform Mol2Vec when applied to external validation sets. Interestingly, GCN-based predictions are most stable over a longer period in a time series validation study. Apart from those statistical observations, DNNs prove valuable for guiding local SAR. To illustrate this important aspect in pharmaceutical research projects, we discuss challenging applications in medicinal chemistry towards a more realistic picture of artificial intelligence in drug discovery.
Affiliation(s)
- Christoph Grebner
- Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Industriepark Höchst, 65926, Frankfurt am Main, Germany
| | - Hans Matter
- Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Industriepark Höchst, 65926, Frankfurt am Main, Germany
| | - Daniel Kofink
- Sanofi-Aventis France SA, R&D, Digital & Data Science, AI and Deep Analytics, 1 Avenue Pierre Brossolette, 91380, Chilly-Mazarin, France
| | - Jan Wenzel
- Sanofi-Aventis Deutschland GmbH, R&D, Preclinical Safety, Industriepark Höchst, 65926, Frankfurt am Main, Germany
| | - Friedemann Schmidt
- Sanofi-Aventis Deutschland GmbH, R&D, Preclinical Safety, Industriepark Höchst, 65926, Frankfurt am Main, Germany
| | - Gerhard Hessler
- Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Industriepark Höchst, 65926, Frankfurt am Main, Germany
| |
|
20
|
Brown N, Ertl P, Lewis R, Luksch T, Reker D, Schneider N. Artificial intelligence in chemistry and drug design. J Comput Aided Mol Des 2021; 34:709-715. [PMID: 32468207 DOI: 10.1007/s10822-020-00317-x] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Indexed: 01/17/2023]
Affiliation(s)
- Nathan Brown
- BenevolentAI, 4-8 Maple Street, London, W1T 5HD, UK
| | - Peter Ertl
- Novartis Institutes for BioMedical Research, 4056, Basel, Switzerland
| | - Richard Lewis
- Novartis Institutes for BioMedical Research, 4056, Basel, Switzerland.
| | | | - Daniel Reker
- Koch Institute for Integrative Cancer Research and MIT-IBM Watson AI Lab, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA; Division of Gastroenterology, Hepatology and Endoscopy, Department of Medicine, Harvard Medical School, Brigham and Women's Hospital, Boston, MA, 02115, USA.
| | - Nadine Schneider
- Novartis Institutes for BioMedical Research, 4056, Basel, Switzerland
| |
|
21
|
Aleksić S, Seeliger D, Brown JB. ADMET Predictability at Boehringer Ingelheim: State-of-the-Art, and Do Bigger Datasets or Algorithms Make a Difference? Mol Inform 2021; 41:e2100113. [PMID: 34473408 DOI: 10.1002/minf.202100113] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/27/2021] [Accepted: 08/21/2021] [Indexed: 11/08/2022]
Abstract
Computational methods assisting drug discovery and development are routine in the pharmaceutical industry. Digital recording of ADMET assays has provided a rich source of data for development of predictive models. Despite the accumulation of data and the public availability of advanced modeling algorithms, the utility of prediction in ADMET research is not clear. Here, we present a critical evaluation of the relationships between data volume, modeling algorithm, chemical representation and grouping, and temporal aspect (time sequence of assays) using an in-house ADMET database. We find no large difference in prediction algorithms nor any systemic and substantial gain from increasingly large datasets. Temporal-based data enlargement led to performance improvement in only a limited number of assays, and with fractional improvement at best. Assays that are well-, intermediately-, or poorly-suited for ADMET predictions and reasons for such behavior are systematically identified, generating realistic expectations for areas in which computational models can be used to guide decision making in molecular design and development.
Affiliation(s)
- Stevan Aleksić
- Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, 88397, Biberach, Germany
| | - Daniel Seeliger
- Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, 88397, Biberach, Germany
| | - J B Brown
- Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, 88397, Biberach, Germany
| |
|
22
|
|
23
|
Ciallella HL, Russo DP, Aleksunes LM, Grimm FA, Zhu H. Revealing Adverse Outcome Pathways from Public High-Throughput Screening Data to Evaluate New Toxicants by a Knowledge-Based Deep Neural Network Approach. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2021; 55:10875-10887. [PMID: 34304572 PMCID: PMC8713073 DOI: 10.1021/acs.est.1c02656] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 05/16/2023]
Abstract
Traditional experimental testing to identify endocrine disruptors that enhance estrogenic signaling relies on expensive and labor-intensive experiments. We sought to design a knowledge-based deep neural network (k-DNN) approach to reveal and organize public high-throughput screening data for compounds with nuclear estrogen receptor α and β (ERα and ERβ) binding potentials. The target activity was rodent uterotrophic bioactivity driven by ERα/ERβ activations. After training, the resultant network successfully inferred critical relationships among ERα/ERβ target bioassays, shown as weights of 6521 edges between 1071 neurons. The resultant network uses an adverse outcome pathway (AOP) framework to mimic the signaling pathway initiated by ERα and identify compounds that mimic endogenous estrogens (i.e., estrogen mimetics). The k-DNN can predict estrogen mimetics by activating neurons representing several events in the ERα/ERβ signaling pathway. Therefore, this virtual pathway model, starting from a compound's chemistry initiating ERα activation and ending with rodent uterotrophic bioactivity, can efficiently and accurately prioritize new estrogen mimetics (AUC = 0.864-0.927). This k-DNN method is a potential universal computational toxicology strategy to utilize public high-throughput screening data to characterize hazards and prioritize potentially toxic compounds.
Affiliation(s)
- Heather L Ciallella
- Center for Computational and Integrative Biology, Rutgers University Camden, Camden, New Jersey 08103, United States
| | - Daniel P Russo
- Center for Computational and Integrative Biology, Rutgers University Camden, Camden, New Jersey 08103, United States
- Department of Chemistry, Rutgers University Camden, Camden, New Jersey 08102, United States
| | - Lauren M Aleksunes
- Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, New Jersey 08854, United States
| | - Fabian A Grimm
- ExxonMobil Biomedical Sciences, Inc., Annandale, New Jersey 08801, United States
| | - Hao Zhu
- Center for Computational and Integrative Biology, Rutgers University Camden, Camden, New Jersey 08103, United States
- Department of Chemistry, Rutgers University Camden, Camden, New Jersey 08102, United States
| |
|
24
|
Ciallella HL, Russo DP, Aleksunes LM, Grimm FA, Zhu H. Predictive modeling of estrogen receptor agonism, antagonism, and binding activities using machine- and deep-learning approaches. J Transl Med 2021; 101:490-502. [PMID: 32778734 PMCID: PMC7873171 DOI: 10.1038/s41374-020-00477-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/03/2020] [Revised: 07/19/2020] [Accepted: 07/21/2020] [Indexed: 11/23/2022] Open
Abstract
As defined by the World Health Organization, an endocrine disruptor is an exogenous substance or mixture that alters function(s) of the endocrine system and consequently causes adverse health effects in an intact organism, its progeny, or (sub)populations. Traditional experimental testing regimens to identify toxicants that induce endocrine disruption can be expensive and time-consuming. Computational modeling has emerged as a promising and cost-effective alternative method for screening and prioritizing potentially endocrine-active compounds. The efficient identification of suitable chemical descriptors and machine-learning algorithms, including deep learning, is a considerable challenge for computational toxicology studies. Here, we sought to apply classic machine-learning algorithms and deep-learning approaches to a panel of over 7500 compounds tested against 18 Toxicity Forecaster assays related to nuclear estrogen receptor (ERα and ERβ) activity. Three binary fingerprints (Extended Connectivity FingerPrints, Functional Connectivity FingerPrints, and Molecular ACCess System) were used as chemical descriptors in this study. Each descriptor was combined with four machine-learning and two deep-learning (normal and multitask neural networks) approaches to construct models for all 18 ER assays. The resulting model performance was evaluated using the area under the receiver-operating curve (AUC) values obtained from a fivefold cross-validation procedure. The results showed that individual models have AUC values that range from 0.56 to 0.86. External validation was conducted using two additional sets of compounds (n = 592 and n = 966) with established interactions with nuclear ER demonstrated through experimentation. An agonist, antagonist, or binding score was determined for each compound by averaging its predicted probabilities in relevant assay models as an external validation, yielding AUC values ranging from 0.63 to 0.91.
The results suggest that multitask neural networks offer advantages when modeling mechanistically related endpoints. Consensus predictions based on the average values of individual models remain the best modeling strategy for computational toxicity evaluations.
Affiliation(s)
- Heather L Ciallella
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
| | - Daniel P Russo
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
| | - Lauren M Aleksunes
- Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ, USA
| | - Fabian A Grimm
- ExxonMobil Biomedical Sciences, Inc., Annandale, NJ, USA
| | - Hao Zhu
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA.
- Department of Chemistry, Rutgers University, Camden, NJ, USA.
| |
|
25
|
|
26
|
|
27
|
Gao P, Zhang J, Sun Y, Yu J. Accurate predictions of aqueous solubility of drug molecules via the multilevel graph convolutional network (MGCN) and SchNet architectures. Phys Chem Chem Phys 2020; 22:23766-23772. [PMID: 33063077 DOI: 10.1039/d0cp03596c] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022]
Abstract
Deep learning based methods have been widely applied to predict various kinds of molecular properties in the pharmaceutical industry with increasingly more success. In this study, we propose two novel models for aqueous solubility predictions, based on the Multilevel Graph Convolutional Network (MGCN) and SchNet architectures, respectively. The advantage of the MGCN lies in the fact that it could extract the graph features of the target molecules directly from the (3D) structural information; therefore, it doesn't need to rely on a lot of intra-molecular descriptors to learn the features, which are of significance for accurate predictions of the molecular properties. The SchNet performs well in modelling the interatomic interactions inside a molecule, and such a deep learning architecture is also capable of extracting structural information and further predicting the related properties. The actual accuracy of these two novel approaches was systematically benchmarked with four different independent datasets. We found that both the MGCN and SchNet models performed well for aqueous solubility predictions. In the future, we believe such promising predictive models will be applicable to enhancing the efficiency of the screening, crystallization and delivery of drug molecules, essentially as a useful tool to promote the development of molecular pharmaceutics.
Affiliation(s)
- Peng Gao
- School of Chemistry and Molecular Bioscience, University of Wollongong, NSW 2500, Australia
| | | | | | | |
|
28
|
Zhao L, Ciallella HL, Aleksunes LM, Zhu H. Advancing computer-aided drug discovery (CADD) by big data and data-driven machine learning modeling. Drug Discov Today 2020; 25:1624-1638. [PMID: 32663517 PMCID: PMC7572559 DOI: 10.1016/j.drudis.2020.07.005] [Citation(s) in RCA: 66] [Impact Index Per Article: 16.5] [Received: 02/19/2020] [Revised: 06/26/2020] [Accepted: 07/06/2020] [Indexed: 02/06/2023]
Abstract
Advancing a new drug to market requires substantial investments in time as well as financial resources. Crucial bioactivities for drug candidates, including their efficacy, pharmacokinetics (PK), and adverse effects, need to be investigated during drug development. With advancements in chemical synthesis and biological screening technologies over the past decade, a large amount of biological data points for millions of small molecules have been generated and are stored in various databases. These accumulated data, combined with new machine learning (ML) approaches, such as deep learning, have shown great potential to provide insights into relevant chemical structures to predict in vitro, in vivo, and clinical outcomes, thereby advancing drug discovery and development in the big data era.
Affiliation(s)
- Linlin Zhao
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA
| | - Heather L Ciallella
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA
| | - Lauren M Aleksunes
- Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ 08854, USA
| | - Hao Zhu
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA; Department of Chemistry, Rutgers University, Camden, NJ 08102, USA.
| |
|
29
|
Hu B, Zhou X, Mohutsky MA, Desai PV. Structure–Property Relationships and Machine Learning Models for Addressing CYP3A4-Mediated Victim Drug–Drug Interaction Risk in Drug Discovery. Mol Pharm 2020; 17:3600-3608. [DOI: 10.1021/acs.molpharmaceut.0c00637] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/26/2023]
Affiliation(s)
- Bingjie Hu
- Computational ADME, ADME−Toxicology−PKPD, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Xin Zhou
- ADME−Toxicology−PKPD, Lilly Research Laboratories, Eli Lilly and Company, San Diego, California 92121, United States
| | - Michael A. Mohutsky
- Investigational Drug Disposition, ADME−Toxicology−PKPD, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Prashant V. Desai
- Computational ADME, ADME−Toxicology−PKPD, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| |
|
30
|
Shen J, Nicolaou CA. Molecular property prediction: recent trends in the era of artificial intelligence. DRUG DISCOVERY TODAY. TECHNOLOGIES 2020; 32-33:29-36. [PMID: 33386091 DOI: 10.1016/j.ddtec.2020.05.001] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 12/11/2019] [Revised: 03/10/2020] [Accepted: 04/06/2020] [Indexed: 12/18/2022]
Abstract
Artificial intelligence (AI) has become a powerful tool in many fields, including drug discovery. Among various AI applications, molecular property prediction can have a more significant immediate impact on the drug discovery process since most algorithms and methods use predicted properties to evaluate, select, and generate molecules. Herein, we provide a brief review of the state-of-the-art molecular property prediction methodologies and discuss examples reported recently. We highlight key techniques that have been applied to molecular property prediction such as learned representation, multi-task learning, transfer learning, and federated learning. We also point out some critical but less discussed issues such as data set quality, benchmarks, model performance evaluation, and prediction confidence quantification.
Affiliation(s)
- Jie Shen
- Advanced Analytics and Data Sciences, Eli Lilly and Company, Indianapolis, IN 46285, United States.
| | - Christos A Nicolaou
- Discovery Chemistry Research & Technologies, Eli Lilly and Company, Indianapolis, IN 46285, United States.
| |
|
31
|
Affiliation(s)
- Günter Klambauer
- Johannes Kepler University, LIT AI Lab & Institute for Machine Learning, 4040 Linz, Austria
| | - Sepp Hochreiter
- Johannes Kepler University , LIT AI Lab & Institute for Machine Learning , 4040 Linz , Austria
| | - Matthias Rarey
- Universität Hamburg , ZBH-Center for Bioinformatics , 20146 Hamburg , Germany
|
32
|
Sturm N, Mayr A, Le Van T, Chupakhin V, Ceulemans H, Wegner J, Golib-Dzib JF, Jeliazkova N, Vandriessche Y, Böhm S, Cima V, Martinovic J, Greene N, Vander Aa T, Ashby TJ, Hochreiter S, Engkvist O, Klambauer G, Chen H. Industry-scale application and evaluation of deep learning for drug target prediction. J Cheminform 2020; 12:26. [PMID: 33430964 PMCID: PMC7169028 DOI: 10.1186/s13321-020-00428-5] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/29/2019] [Accepted: 03/30/2020] [Indexed: 12/02/2022] Open
Abstract
Artificial intelligence (AI) is undergoing a revolution thanks to the breakthroughs of machine learning algorithms in computer vision, speech recognition, natural language processing and generative modelling. Recent works on publicly available pharmaceutical data showed that AI methods are highly promising for Drug Target prediction. However, the quality of public data might be different than that of industry data due to different labs reporting measurements, different measurement techniques, fewer samples and less diverse and specialized assays. As part of a European funded project (ExCAPE), that brought together expertise from pharmaceutical industry, machine learning, and high-performance computing, we investigated how well machine learning models obtained from public data can be transferred to internal pharmaceutical industry data. Our results show that machine learning models trained on public data can indeed maintain their predictive power to a large degree when applied to industry data. Moreover, we observed that deep learning derived machine learning models outperformed comparable models, which were trained by other machine learning algorithms, when applied to internal pharmaceutical company datasets. To our knowledge, this is the first large-scale study evaluating the potential of machine learning and especially deep learning directly at the level of industry-scale settings and moreover investigating the transferability of publicly learned target prediction models towards industrial bioactivity prediction pipelines.
Affiliation(s)
- Noé Sturm
- Clinical Pharmacology and Safety Science, R&D BioPharmaceuticals, AstraZeneca, Pepparedsleden 1, 43183, Mölndal, Sweden.
| | - Andreas Mayr
- LIT AI Lab & Institute for Machine Learning, Johannes Kepler University Linz, Altenberger Str. 69, 4040, Linz, Austria
| | - Thanh Le Van
- High-Dimensional Biology & Discovery Data Sciences, Discovery Sciences, Janssen Pharmaceutica, Turnhoutseweg 30, 2349, Beerse, Belgium
| | - Vladimir Chupakhin
- High-Dimensional Biology & Discovery Data Sciences, Discovery Sciences, Janssen R&D, 1400 McKean Rd, Spring House, Pennsylvania, 19002, USA
| | - Hugo Ceulemans
- High-Dimensional Biology & Discovery Data Sciences, Discovery Sciences, Janssen Pharmaceutica, Turnhoutseweg 30, 2349, Beerse, Belgium
| | - Joerg Wegner
- High-Dimensional Biology & Discovery Data Sciences, Discovery Sciences, Janssen Pharmaceutica, Turnhoutseweg 30, 2349, Beerse, Belgium
| | - Jose-Felipe Golib-Dzib
- High-Dimensional Biology & Discovery Data Sciences, Discovery Sciences, Janssen Cilag SA, Calle Río Jarama, 75A, 45007, Toledo, Spain
| | - Nina Jeliazkova
- Ideaconsult Ltd., 4. Angel Kanchev Str., 1000, Sofia, Bulgaria
| | - Yves Vandriessche
- Intel Corporation, Data Center Group, Veldkant 31, 2550, Kontich, Belgium
| | - Stanislav Böhm
- IT4Innovations, VSB - Technical University of Ostrava, 17. Listopadu 2172/15, 70800, Ostrava-Poruba, Czech Republic
| | - Vojtech Cima
- IT4Innovations, VSB - Technical University of Ostrava, 17. Listopadu 2172/15, 70800, Ostrava-Poruba, Czech Republic
| | - Jan Martinovic
- IT4Innovations, VSB - Technical University of Ostrava, 17. Listopadu 2172/15, 70800, Ostrava-Poruba, Czech Republic
| | - Nigel Greene
- Clinical Pharmacology and Safety Science, R&D BioPharmaceuticals, AstraZeneca, Pepparedsleden 1, 43183, Mölndal, Sweden
| | - Tom Vander Aa
- Exascience Lab, Imec, Kapeldreef 75, 3001, Louvain, Belgium
| | - Thomas J Ashby
- Exascience Lab, Imec, Kapeldreef 75, 3001, Louvain, Belgium
| | - Sepp Hochreiter
- LIT AI Lab & Institute for Machine Learning, Johannes Kepler University Linz, Altenberger Str. 69, 4040, Linz, Austria
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca, Pepparedsleden 1, 43183, Mölndal, Sweden
| | - Günter Klambauer
- LIT AI Lab & Institute for Machine Learning, Johannes Kepler University Linz, Altenberger Str. 69, 4040, Linz, Austria.
| | - Hongming Chen
- Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca, Pepparedsleden 1, 43183, Mölndal, Sweden.
|
33
|
Modeling Physico-Chemical ADMET Endpoints with Multitask Graph Convolutional Networks. Molecules 2019; 25:44. [PMID: 31877719 PMCID: PMC6982787 DOI: 10.3390/molecules25010044] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 12/01/2019] [Revised: 12/19/2019] [Accepted: 12/20/2019] [Indexed: 11/19/2022] Open
Abstract
Simple physico-chemical properties, like logD, solubility, or melting point, can reveal a great deal about how a compound under development might later behave. These data are typically measured for most compounds in drug discovery projects in a medium throughput fashion. Collecting and assembling all the Bayer in-house data related to these properties allowed us to apply powerful machine learning techniques to predict the outcome of those assays for new compounds. In this paper, we report our finding that, especially for predicting physicochemical ADMET endpoints, a multitask graph convolutional approach appears a highly competitive choice. For seven endpoints of interest, we compared the performance of that approach to fully connected neural networks and different single task models. The new model shows increased predictive performance compared to previous modeling methods and will allow early prioritization of compounds even before they are synthesized. In addition, our model follows the generalized solubility equation without being explicitly trained under this constraint.
|
34
|
Neural-based approaches to overcome feature selection and applicability domain in drug-related property prediction. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105777] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/20/2022]
|
35
|
Abstract
Due to the massive data sets available for drug candidates, modern drug discovery has advanced to the big data era. Central to this shift is the development of artificial intelligence approaches to implementing innovative modeling based on the dynamic, heterogeneous, and large nature of drug data sets. As a result, recently developed artificial intelligence approaches such as deep learning and relevant modeling studies provide new solutions to efficacy and safety evaluations of drug candidates based on big data modeling and analysis. The resulting models provided deep insights into the continuum from chemical structure to in vitro, in vivo, and clinical outcomes. The relevant novel data mining, curation, and management techniques provided critical support to recent modeling studies. In summary, the new advancement of artificial intelligence in the big data era has paved the road to future rational drug development and optimization, which will have a significant impact on drug discovery procedures and, eventually, public health.
Affiliation(s)
- Hao Zhu
- Department of Chemistry and Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey 08102, USA
|
36
|
Neural networks in drug discovery: current insights from medicinal chemists. Future Med Chem 2019; 11:1669-1672. [DOI: 10.4155/fmc-2019-0118] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/15/2022] Open
|
37
|
Cai C, Guo P, Zhou Y, Zhou J, Wang Q, Zhang F, Fang J, Cheng F. Deep Learning-Based Prediction of Drug-Induced Cardiotoxicity. J Chem Inf Model 2019; 59:1073-1084. [PMID: 30715873 DOI: 10.1021/acs.jcim.8b00769] [Citation(s) in RCA: 89] [Impact Index Per Article: 17.8] [Indexed: 01/03/2023]
Abstract
Blockade of the human ether-à-go-go-related gene (hERG) channel by small molecules induces the prolongation of the QT interval which leads to fatal cardiotoxicity and accounts for the withdrawal or severe restrictions on the use of many approved drugs. In this study, we develop a deep learning approach, termed deephERG, for prediction of hERG blockers of small molecules in drug discovery and postmarketing surveillance. In total, we assemble 7,889 compounds with well-defined experimental data on the hERG and with diverse chemical structures. We find that deephERG models built by a multitask deep neural network (DNN) algorithm outperform those built by single-task DNN, naïve Bayes (NB), support vector machine (SVM), random forest (RF), and graph convolutional neural network (GCNN). Specifically, the area under the receiver operating characteristic curve (AUC) value for the best model of deephERG is 0.967 on the validation set. Furthermore, based on 1,824 U.S. Food and Drug Administration (FDA) approved drugs, 29.6% drugs are computationally identified to have potential hERG inhibitory activities by deephERG, highlighting the importance of hERG risk assessment in early drug discovery. Finally, we showcase several novel predicted hERG blockers on approved antineoplastic agents, which are validated by clinical case reports, experimental evidence, and the literature. In summary, this study presents a powerful deep learning-based tool for risk assessment of hERG-mediated cardiotoxicities in drug discovery and postmarketing surveillance.
Affiliation(s)
- Chuipu Cai
- Institute of Clinical Pharmacology, Guangzhou University of Chinese Medicine, Guangzhou 510405, China; School of Basic Medical Sciences, Guangzhou University of Chinese Medicine, Guangzhou 510405, China
| | - Pengfei Guo
- Institute of Clinical Pharmacology, Guangzhou University of Chinese Medicine, Guangzhou 510405, China
| | - Yadi Zhou
- Department of Chemistry and Biochemistry, Ohio University, Athens, Ohio 45701, United States
| | - Jingwei Zhou
- Institute of Clinical Pharmacology, Guangzhou University of Chinese Medicine, Guangzhou 510405, China
| | - Qi Wang
- Institute of Clinical Pharmacology, Guangzhou University of Chinese Medicine, Guangzhou 510405, China
| | - Fengxue Zhang
- School of Basic Medical Sciences, Guangzhou University of Chinese Medicine, Guangzhou 510405, China
| | - Jiansong Fang
- Institute of Clinical Pharmacology, Guangzhou University of Chinese Medicine, Guangzhou 510405, China
| | - Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio 44106, United States; Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, 9500 Euclid Avenue, Cleveland, Ohio 44195, United States; Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106, United States
|