Karasev DA, Sobolev BN, Filimonov DA, Lagunin A. Prediction of viral protease inhibitors using proteochemometrics approach. Comput Biol Chem 2024;110:108061. [PMID: 38574417 DOI: 10.1016/j.compbiolchem.2024.108061]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 03/21/2024] [Accepted: 03/23/2024] [Indexed: 04/06/2024]
Abstract
Although (Q)SAR methods are widely accepted tools in computational drug discovery, they are limited by incomplete data. The proteochemometrics (PCM) approach extends the applicability domain by describing both protein and ligand structures. PCM algorithms are urgently needed for the development of new antiviral agents. We propose a PCM method based on TLMNA descriptors, which combine the MNA descriptors of ligands with protein sequence N-grams. The method was validated on viral chymotrypsin-like proteases and their ligands. We developed an original protocol that allowed us to collect a comprehensive set of 15 protein sequences and more than 9000 ligands from the ChEMBL database. The N-grams were derived from a 3D-based alignment that accurately superposes the ligand-binding regions. Testing the ligand set in SAR mode with MNA descriptors gave an accuracy above 0.95, which shows the promise of searching for antiviral drugs in virtual chemical libraries. Effective PCM models were built with the TLMNA descriptors. A rigorous validation procedure with pair exclusion simulated the prediction of interactions between new ligands and new targets, yielding accuracy estimates of up to 0.89. The PCM approach shows slightly lower accuracy than SAR because of greater uncertainty, but it overcomes the problem of data incompleteness.
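The abstract does not spell out how the N-grams or the pair-exclusion split are computed, so the following is only a minimal sketch under stated assumptions. It assumes that an N-gram is an overlapping length-n substring of a sequence, and that "pair exclusion" means dropping from the training set every ligand-protein pair that shares either the ligand or the protein with the test pair. The function names `sequence_ngrams` and `pair_exclusion_split` and the identifiers `L1`, `P1`, etc. are hypothetical. The sketch works on plain sequences and does not reproduce the authors' 3D-based alignment, MNA descriptors, or TLMNA construction.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (ligand_id, protein_id); identifiers are hypothetical


def sequence_ngrams(sequence: str, n: int = 3) -> Set[str]:
    """Return the set of overlapping length-n substrings of a sequence.

    Assumption: plain substrings of an unaligned sequence, not the
    alignment-derived N-grams described in the abstract.
    """
    return {sequence[i:i + n] for i in range(len(sequence) - n + 1)}


def pair_exclusion_split(pairs: List[Pair], test_pair: Pair) -> List[Pair]:
    """Keep only training pairs that share neither ligand nor protein
    with the test pair, so the test pair looks like a new ligand
    against a new target (assumed reading of "pair exclusion").
    """
    ligand, protein = test_pair
    return [(l, p) for (l, p) in pairs if l != ligand and p != protein]


# Toy interaction set: three ligands and three proteins.
pairs = [("L1", "P1"), ("L1", "P2"), ("L2", "P1"), ("L2", "P2"), ("L3", "P3")]
train = pair_exclusion_split(pairs, ("L1", "P1"))
grams = sequence_ngrams("ACDEF", 3)
```

In this toy set, holding out the pair (L1, P1) leaves only pairs that involve neither L1 nor P1 for training. That is a stricter test than a random split, which is consistent with the lower accuracy reported for PCM than for the SAR mode.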