1
Wu JN, Wang T, Chen Y, Tang LJ, Wu HL, Yu RQ. t-SMILES: a fragment-based molecular representation framework for de novo ligand design. Nat Commun 2024; 15:4993. [PMID: 38862578 PMCID: PMC11167009 DOI: 10.1038/s41467-024-49388-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Accepted: 06/04/2024] [Indexed: 06/13/2024]
Abstract
Effective representation of molecules is a crucial factor affecting the performance of artificial intelligence models. This study introduces a flexible, fragment-based, multiscale molecular representation framework called t-SMILES (tree-based SMILES) with three code algorithms: TSSA (t-SMILES with shared atom), TSDY (t-SMILES with dummy atom but without ID) and TSID (t-SMILES with ID and dummy atom). It describes molecules using SMILES-type strings obtained by performing a breadth-first search on a full binary tree formed from a fragmented molecular graph. Systematic evaluations using JTVAE, BRICS, MMPA, and Scaffold show the feasibility of constructing a multi-code molecular description system, where various descriptions complement each other, enhancing the overall performance. In addition, it can avoid overfitting and achieve higher novelty scores while maintaining reasonable similarity on labeled low-resource datasets, regardless of whether the model is original, data-augmented, or pre-trained then fine-tuned. Furthermore, it significantly outperforms classical SMILES, DeepSMILES, SELFIES and baseline models in goal-directed tasks. And it surpasses state-of-the-art fragment, graph and SMILES based approaches on ChEMBL, Zinc, and QM9.
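The traversal idea in this abstract (a breadth-first search over a full binary tree of fragments) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` class, the `&` padding token, the `^` separator, and the hand-built fragment tree are all assumptions made for illustration, and no real fragmentation scheme (JTVAE, BRICS, MMPA, Scaffold) is applied.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional

EMPTY = "&"  # hypothetical placeholder token for a padded (missing) child


@dataclass
class Node:
    """One fragment in a hand-built binary tree (illustrative only)."""
    fragment: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def pad_to_full(node: Optional[Node]) -> Optional[Node]:
    """Make the tree full: every node ends up with zero or two children."""
    if node is None:
        return None
    if node.left is None and node.right is None:
        return node  # leaves stay as they are
    # An internal node gets an EMPTY leaf wherever a child is missing.
    node.left = pad_to_full(node.left) or Node(EMPTY)
    node.right = pad_to_full(node.right) or Node(EMPTY)
    return node


def bfs_tokens(root: Node) -> List[str]:
    """Collect fragment strings level by level (breadth-first order)."""
    tokens: List[str] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        tokens.append(node.fragment)
        if node.left is not None:  # in a full tree, right is present too
            queue.append(node.left)
            queue.append(node.right)
    return tokens


# Assumed toy tree: the fragments are made up, not produced by a real
# fragmentation algorithm.
tree = Node("c1ccccc1", left=Node("CC", right=Node("O")))
tokens = bfs_tokens(pad_to_full(tree))
print("^".join(tokens))  # "^" is a hypothetical separator
```

Padding before the traversal means the tree shape can be recovered from the token order alone, which is the usual reason for working with a full binary tree in this kind of encoding.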
Affiliation(s)
- Juan-Ni Wu
  - State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, PR China
- Tong Wang
  - State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, PR China
- Yue Chen
  - State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, PR China
- Li-Juan Tang
  - State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, PR China
- Hai-Long Wu
  - State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, PR China
- Ru-Qin Yu
  - State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, PR China
2
McDonald SM, Augustine EK, Lanners Q, Rudin C, Catherine Brinson L, Becker ML. Applied machine learning as a driver for polymeric biomaterials design. Nat Commun 2023; 14:4838. [PMID: 37563117 PMCID: PMC10415291 DOI: 10.1038/s41467-023-40459-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 02/08/2023] [Accepted: 07/24/2023] [Indexed: 08/12/2023]
Abstract
Polymers are ubiquitous to almost every aspect of modern society and their use in medical products is similarly pervasive. Despite this, the diversity in commercial polymers used in medicine is stunningly low. Considerable time and resources have been extended over the years towards the development of new polymeric biomaterials which address unmet needs left by the current generation of medical-grade polymers. Machine learning (ML) presents an unprecedented opportunity in this field to bypass the need for trial-and-error synthesis, thus reducing the time and resources invested into new discoveries critical for advancing medical treatments. Current efforts pioneering applied ML in polymer design have employed combinatorial and high throughput experimental design to address data availability concerns. However, the lack of available and standardized characterization of parameters relevant to medicine, including degradation time and biocompatibility, represents a nearly insurmountable obstacle to ML-aided design of biomaterials. Herein, we identify a gap at the intersection of applied ML and biomedical polymer design, highlight current works at this junction more broadly and provide an outlook on challenges and future directions.
Affiliation(s)
- Emily K Augustine
  - Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
- Quinn Lanners
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Cynthia Rudin
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- L Catherine Brinson
  - Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
- Matthew L Becker
  - Department of Chemistry, Duke University, Durham, NC, USA
  - Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
3
Chen Y, Ou Y, Zheng P, Huang Y, Ge F, Dral PO. Benchmark of general-purpose machine learning-based quantum mechanical method AIQM1 on reaction barrier heights. J Chem Phys 2023; 158:074103. [PMID: 36813722 DOI: 10.1063/5.0137101] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 02/01/2023]
Abstract
Artificial intelligence-enhanced quantum mechanical method 1 (AIQM1) is a general-purpose method that was shown to achieve high accuracy for many applications with a speed close to its baseline semiempirical quantum mechanical (SQM) method ODM2*. Here, we evaluate the hitherto unknown performance of out-of-the-box AIQM1 without any refitting for reaction barrier heights on eight datasets, including a total of ∼24 thousand reactions. This evaluation shows that AIQM1's accuracy strongly depends on the type of transition state and ranges from excellent for rotation barriers to poor for, e.g., pericyclic reactions. AIQM1 clearly outperforms its baseline ODM2* method and, even more so, a popular universal potential, ANI-1ccx. Overall, however, AIQM1 accuracy largely remains similar to SQM methods (and B3LYP/6-31G* for most reaction types) suggesting that it is desirable to focus on improving AIQM1 performance for barrier heights in the future. We also show that the built-in uncertainty quantification helps in identifying confident predictions. The accuracy of confident AIQM1 predictions is approaching the level of popular density functional theory methods for most reaction types. Encouragingly, AIQM1 is rather robust for transition state optimizations, even for the type of reactions it struggles with the most. Single-point calculations with high-level methods on AIQM1-optimized geometries can be used to significantly improve barrier heights, which cannot be said for its baseline ODM2* method.
Affiliation(s)
- Yuxinxin Chen
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yanchi Ou
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Peikun Zheng
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yaohuang Huang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Fuchun Ge
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Pavlo O Dral
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
4
Abate C, Decherchi S, Cavalli A. Graph neural networks for conditional de novo drug design. WIREs Computational Molecular Science 2023. [DOI: 10.1002/wcms.1651] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
Affiliation(s)
- Carlo Abate
  - Fondazione Istituto Italiano di Tecnologia, Genoa, Italy
  - Università degli Studi di Bologna, Bologna, Italy
- Andrea Cavalli
  - Fondazione Istituto Italiano di Tecnologia, Genoa, Italy
  - Università degli Studi di Bologna, Bologna, Italy
5
Ess DH, Jelfs KE, Kulik HJ. Chemical design by artificial intelligence. J Chem Phys 2022; 157:120401. [PMID: 36182437 DOI: 10.1063/5.0123281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Affiliation(s)
- Daniel H Ess
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, USA
- Kim E Jelfs
  - Department of Chemistry, Molecular Sciences Research Hub, 82 Wood Lane, White City Campus, Imperial College London, London W12 0BZ, United Kingdom
- Heather J Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA