1
Tarasova O, Biziukova N, Shemshura A, Filimonov D, Kireev D, Pokrovskaya A, Poroikov VV. Identification of Molecular Mechanisms Involved in Viral Infection Progression Based on Text Mining: Case Study for HIV Infection. Int J Mol Sci 2023; 24:ijms24021465. [PMID: 36674980 PMCID: PMC9862153 DOI: 10.3390/ijms24021465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 12/29/2022] [Accepted: 01/06/2023] [Indexed: 01/13/2023] Open access
Abstract
Viruses cause various infections that may affect human lifestyle for durations ranging from several days to many years. Although preventative and therapeutic remedies are available for many viruses, they may still have a profound impact on human life. The human immunodeficiency virus type 1 is the most common cause of HIV infection, which represents one of the most dangerous and complex diseases since it affects the immune system and causes its disruption, leading to secondary complications and negatively influencing health-related quality of life. While highly active antiretroviral therapy may decrease the viral load and the velocity of HIV infection progression, some individual peculiarities may affect viral load control or the progression of T-cell malfunction induced by HIV. Our study is aimed at the text-based identification of molecular mechanisms that may be involved in viral infection progression, using HIV as a case study. Specifically, we identified human proteins and genes that commonly occurred, overexpressed or underexpressed, in the collections of publications relevant to (i) HIV infection progression and (ii) acute and chronic stages of HIV infection. Then, we considered biological processes that are controlled by the identified proteins and genes. We verified the impact of the identified molecules in the associated clinical study.
Affiliation(s)
- Olga Tarasova
  - Institute of Biomedical Chemistry, 10 Bldg. 8, Pogodinskaya Str., 119121 Moscow, Russia
- Nadezhda Biziukova
  - Institute of Biomedical Chemistry, 10 Bldg. 8, Pogodinskaya Str., 119121 Moscow, Russia
- Andrey Shemshura
  - Federal Budget Public Health Institution “Clinical Center of HIV/AIDS Treatment and Prevention” of the Ministry of Health of Krasnodar Region, 204/2, im. Mitrofana Sedina Str., 350000 Krasnodar, Russia
- Dmitry Filimonov
  - Institute of Biomedical Chemistry, 10 Bldg. 8, Pogodinskaya Str., 119121 Moscow, Russia
- Dmitry Kireev
  - Federal Budget Institution of Science «Central Research Institute for Epidemiology» of the Federal Service for Surveillance on Consumer Rights Protection and Human Wellbeing, Novogireevskaya Str., 3A, 111123 Moscow, Russia
- Anastasia Pokrovskaya
  - Federal Budget Institution of Science «Central Research Institute for Epidemiology» of the Federal Service for Surveillance on Consumer Rights Protection and Human Wellbeing, Novogireevskaya Str., 3A, 111123 Moscow, Russia
  - Department of Infectious Diseases with Courses of Epidemiology and Phthisiology, Medical Institute, Peoples’ Friendship University of Russia, 6 Miklukho-Maklaya Str., 117198 Moscow, Russia
- Vladimir V. Poroikov
  - Institute of Biomedical Chemistry, 10 Bldg. 8, Pogodinskaya Str., 119121 Moscow, Russia
2
Ghosh S, Lu K. Band gap information extraction from materials science literature – a pilot study. ASLIB J INFORM MANAG 2022. [DOI: 10.1108/ajim-03-2022-0141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Purpose: The purpose of this paper is to present preliminary work on extracting band gap information of materials from academic papers. With increasing demand for renewable energy, band gap information will help material scientists design and implement novel photovoltaic (PV) cells. Design/methodology/approach: The authors collected 1.44 million titles and abstracts of scholarly articles related to materials science, and then filtered the collection to 11,939 articles that potentially contain relevant information about materials and their band gap values. ChemDataExtractor was extended to extract information about PV materials and their band gap information. Evaluation was performed on randomly sampled information records of 415 papers. Findings: The findings of this study show that the current system is able to correctly extract information for 51.32% of articles, with partially correct extraction for 36.62% of articles and incorrect extraction for 12.04%. The authors have also identified errors belonging to three main categories pertaining to chemical entity identification, band gap information and interdependency resolution. Future work will focus on addressing these errors to improve the performance of the system. Originality/value: The authors did not find any literature to date on band gap information extraction from academic text using automated methods. This work is unique and original. Band gap information is of importance to materials scientists in applications such as solar cells, light emitting diodes and laser diodes.
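To make the extraction task concrete, here is a minimal sketch of pulling band gap values out of a sentence. It is not the authors' system and does not use ChemDataExtractor; it is a plain regular expression, and the pattern, the function name `extract_band_gaps`, and the sample sentence are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: a "band gap" mention followed, within the same
# sentence and a short window, by a number and the unit eV.
BAND_GAP_RE = re.compile(
    r"band\s*gap\b[^.]{0,60}?(\d+(?:\.\d+)?)\s*eV",
    re.IGNORECASE,
)


def extract_band_gaps(text: str) -> list[float]:
    """Return band gap values (in eV) that follow a 'band gap' mention."""
    return [float(m.group(1)) for m in BAND_GAP_RE.finditer(text)]


# Invented sample sentence, used only to exercise the pattern.
sample = "TiO2 has a band gap of 3.2 eV. The film thickness was 50 nm."
print(extract_band_gaps(sample))
```

A pattern like this finds the value but not the material it belongs to, which mirrors the chemical entity identification and interdependency resolution error categories named in the abstract.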
3
Tarasova OA, Rudik AV, Biziukova NY, Filimonov DA, Poroikov VV. Chemical named entity recognition in the texts of scientific publications using the naïve Bayes classifier approach. J Cheminform 2022; 14:55. [PMID: 35964150 PMCID: PMC9375066 DOI: 10.1186/s13321-022-00633-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 07/12/2022] [Indexed: 11/24/2022] Open access
Abstract
Motivation: Application of chemical named entity recognition (CNER) algorithms allows retrieval of information from texts about chemical compound identifiers and creates associations with physical–chemical properties and biological activities. Scientific texts represent low-formalized sources of information. Most methods aimed at CNER are based on machine learning approaches, including conditional random fields and deep neural networks. In general, most machine learning approaches require either vector or sparse word representation of texts. Chemical named entities (CNEs) constitute only a small fraction of the whole text, and the datasets used for training are highly imbalanced. Methods and results: We propose a new method for extracting CNEs from texts based on the naïve Bayes classifier combined with specially developed filters. In contrast to the earlier developed CNER methods, our approach uses the representation of the data as a set of fragments of text (FoTs) with the subsequent preparation of a set of multi-n-grams (sequences from one to n symbols) for each FoT. Our approach may provide the recognition of novel CNEs. For the CHEMDNER corpus, the sensitivity (recall) was 0.95, precision was 0.74, specificity was 0.88, and balanced accuracy was 0.92 based on five-fold cross validation. We applied the developed algorithm to the extraction of CNEs of potential Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease (Mpro) inhibitors. A set of CNEs corresponding to the chemical substances evaluated in the biochemical assays used for the discovery of Mpro inhibitors was retrieved. Manual analysis of the appropriate texts showed that CNEs of potential SARS-CoV-2 Mpro inhibitors were successfully identified by our method.
Conclusion: The obtained results show that the proposed method can be used for filtering out words that are not related to CNEs; therefore, it can be successfully applied to the extraction of CNEs for the purposes of cheminformatics and medicinal chemistry. Supplementary Information: The online version contains supplementary material available at 10.1186/s13321-022-00633-4.
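The multi-n-gram representation and the balanced accuracy figure can be illustrated with a short sketch. It assumes that "sequences from one to n symbols" means all contiguous character substrings of length 1 to n; the function names and the default `n_max` are hypothetical, and the authors' actual fragment preparation and filters are not reproduced here.

```python
def multi_ngrams(fragment: str, n_max: int = 3) -> list[str]:
    """All contiguous character sequences of length 1..n_max in a fragment.

    Assumption: this is one plausible reading of "multi-n-grams"; the
    published method may tokenize or filter fragments differently.
    """
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(fragment) - n + 1):
            grams.append(fragment[i:i + n])
    return grams


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Balanced accuracy as the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


print(multi_ngrams("abc", n_max=2))    # → ['a', 'b', 'c', 'ab', 'bc']
print(balanced_accuracy(0.95, 0.88))   # close to 0.915
```

With the sensitivity and specificity quoted in the abstract, the mean is about 0.915, in line with the reported balanced accuracy of 0.92 after rounding.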
Affiliation(s)
- O A Tarasova
  - Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, 10 bldg. 8, Pogodinskaya Str., Moscow, 119121, Russia
- A V Rudik
  - Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, 10 bldg. 8, Pogodinskaya Str., Moscow, 119121, Russia
- N Yu Biziukova
  - Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, 10 bldg. 8, Pogodinskaya Str., Moscow, 119121, Russia
- D A Filimonov
  - Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, 10 bldg. 8, Pogodinskaya Str., Moscow, 119121, Russia
- V V Poroikov
  - Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, 10 bldg. 8, Pogodinskaya Str., Moscow, 119121, Russia
4
Web-Based Quantitative Structure-Activity Relationship Resources Facilitate Effective Drug Discovery. Top Curr Chem (Cham) 2021; 379:37. [PMID: 34554348 DOI: 10.1007/s41061-021-00349-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/12/2021] [Accepted: 08/17/2021] [Indexed: 12/28/2022]
Abstract
Traditional drug discovery effectively contributes to the treatment of many diseases but is limited by high costs and long cycles. Quantitative structure-activity relationship (QSAR) methods were introduced to evaluate the activity of compounds virtually, which saves the significant cost of determining the activities of the compounds experimentally. Over the past two decades, many web tools for QSAR modeling with various features have been developed to facilitate the usage of QSAR methods. These web tools significantly reduce the difficulty of using QSAR and indirectly promote drug discovery. However, there are few comprehensive summaries of these QSAR tools, and researchers may have difficulty determining which tool to use. Hence, we systematically surveyed the mainstream web tools for QSAR modeling. This work may guide researchers in choosing appropriate web tools for developing QSAR models, and may also help develop more bioinformatics tools based on these existing resources. For nonprofessionals, we also hope to make more people aware of QSAR methods and expand their use.
5
Bannigan P, Aldeghi M, Bao Z, Häse F, Aspuru-Guzik A, Allen C. Machine learning directed drug formulation development. Adv Drug Deliv Rev 2021; 175:113806. [PMID: 34019959 DOI: 10.1016/j.addr.2021.05.016] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Received: 01/13/2021] [Revised: 03/31/2021] [Accepted: 05/14/2021] [Indexed: 12/12/2022]
Abstract
Machine learning (ML) has enabled ground-breaking advances in the healthcare and pharmaceutical sectors, from improvements in cancer diagnosis, to the identification of novel drugs and drug targets as well as protein structure prediction. Drug formulation is an essential stage in the discovery and development of new medicines. Through the design of drug formulations, pharmaceutical scientists can engineer important properties of new medicines, such as improved bioavailability and targeted delivery. The traditional approach to drug formulation development relies on iterative trial-and-error, requiring a large number of resource-intensive and time-consuming in vitro and in vivo experiments. This review introduces the basic concepts of ML-directed workflows and discusses how these tools can be used to aid in the development of various types of drug formulations. ML-directed drug formulation development offers unparalleled opportunities to fast-track development efforts, uncover new materials, innovative formulations, and generate new knowledge in drug formulation science. The review also highlights the latest artificial intelligence (AI) technologies, such as generative models, Bayesian deep learning, reinforcement learning, and self-driving laboratories, which have been gaining momentum in drug discovery and chemistry and have potential in drug formulation development.
Affiliation(s)
- Pauric Bannigan
  - Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON M5S 3M2, Canada
- Matteo Aldeghi
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON M5S 3H6, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 3H6, Canada; Vector Institute for Artificial Intelligence, Toronto, ON M5S 1M1, Canada
- Zeqing Bao
  - Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON M5S 3M2, Canada
- Florian Häse
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON M5S 3H6, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 3H6, Canada; Vector Institute for Artificial Intelligence, Toronto, ON M5S 1M1, Canada
- Alán Aspuru-Guzik
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON M5S 3H6, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 3H6, Canada; Vector Institute for Artificial Intelligence, Toronto, ON M5S 1M1, Canada; Lebovic Fellow, Canadian Institute for Advanced Research, Toronto, ON M5S 1M1, Canada
- Christine Allen
  - Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON M5S 3M2, Canada
6
Tarasova O, Poroikov V. Machine Learning in Discovery of New Antivirals and Optimization of Viral Infections Therapy. Curr Med Chem 2021; 28:7840-7861. [PMID: 33949929 DOI: 10.2174/0929867328666210504114351] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/10/2020] [Revised: 02/13/2021] [Accepted: 02/24/2021] [Indexed: 11/22/2022]
Abstract
Nowadays, computational approaches play an important role in the design of new drug-like compounds and optimization of pharmacotherapeutic treatment of diseases. The emerging growth of viral infections, including those caused by the Human Immunodeficiency Virus (HIV), Ebola virus, recently detected coronavirus, and some others, leads to many newly infected people with a high risk of death or severe complications. A huge amount of chemical, biological, and clinical data is at the disposal of researchers. Therefore, there are many opportunities to find the relationships between the particular features of chemical data and the antiviral activity of biologically active compounds based on machine learning approaches. Biological and clinical data can also be used for building models to predict relationships between viral genotype and drug resistance, which might help determine the clinical outcome of treatment. In the current study, we consider machine-learning approaches in the antiviral research carried out during the past decade. We review in detail the application of machine-learning methods for the design of new potential antiviral agents and vaccines, drug resistance prediction, and analysis of virus-host interactions. Our review also covers the perspectives of using machine-learning approaches for antiviral research, including Dengue, Ebola viruses, Influenza A, Human Immunodeficiency Virus, coronaviruses, and some others.
Affiliation(s)
- Olga Tarasova
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russian Federation
- Vladimir Poroikov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russian Federation
7
Tarasova OA, Biziukova NY, Rudik AV, Dmitriev AV, Filimonov DA, Poroikov VV. Extraction of Data on Parent Compounds and Their Metabolites from Texts of Scientific Abstracts. J Chem Inf Model 2021; 61:1683-1690. [PMID: 33724829 DOI: 10.1021/acs.jcim.0c01054] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
Abstract
The growing amount of experimental data on chemical objects includes properties of small molecules, results of studies of their interaction with human and animal proteins, and methods of synthesis of organic compounds (OCs). The data obtained can be used to identify the names of OCs automatically, including all possible synonyms and relevant data on the molecular properties and biological activity. Utilization of different synonymic names of chemical compounds allows researchers to increase the completeness of data on their properties available from publications. Enrichment of the data on the names of chemical compounds by information about their possible metabolites can help estimate the biological effects of parent compounds and their metabolites more thoroughly. Therefore, an attempt at automated extraction of the names of parent compounds and their metabolites from the texts is a rather important task. In our study, we aimed at developing a method that provides the extraction of the named entities (NEs) of parent compounds and their metabolites from abstracts of scientific publications. Based on the application of the conditional random fields' algorithm, we extracted the NEs of chemical compounds. We developed a set of rules allowing identification of parent compound NEs and their metabolites in the texts. We evaluated the possibility of extracting the names of potential metabolites based on cosine similarity between strings representing names of parent compounds and all other chemical NEs found in the text. Additionally, we used conditional random fields to fetch the names of parent compounds and their metabolites from the texts based on the corpus of texts labeled manually. Our computational experiments showed that usage of rules in combination with cosine similarity could increase the accuracy of recognition of the names of metabolites compared to the rule-based algorithm and application of a machine-learning algorithm (conditional random fields).
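The idea of comparing name strings by cosine similarity can be sketched as follows. The abstract does not say how names are turned into vectors, so this example assumes character bigram counts; the function names and the compound names in the usage lines are illustrative choices, not taken from the paper.

```python
from collections import Counter
from math import sqrt


def char_ngrams(name: str, n: int = 2) -> Counter:
    """Count character n-grams of a lowercased name (assumed representation)."""
    s = name.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine_similarity(a: str, b: str, n: int = 2) -> float:
    """Cosine similarity between the n-gram count vectors of two names."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


# A parent compound name compared with a structurally related name and
# with an unrelated one (names chosen only for illustration).
print(cosine_similarity("morphine", "normorphine"))
print(cosine_similarity("morphine", "caffeine"))
```

Names that share a long common substring, as metabolite names often do with their parent compound, score higher than unrelated names, which is the property a similarity threshold would rely on.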
Affiliation(s)
- Olga A Tarasova
  - Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow 119121, Russia
- Anastassia V Rudik
  - Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow 119121, Russia
- Alexander V Dmitriev
  - Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow 119121, Russia
- Dmitry A Filimonov
  - Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow 119121, Russia
- Vladimir V Poroikov
  - Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow 119121, Russia
8
Biziukova N, Tarasova O, Ivanov S, Poroikov V. Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies. Front Genet 2021; 11:618862. [PMID: 33414815 PMCID: PMC7783389 DOI: 10.3389/fgene.2020.618862] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/18/2020] [Accepted: 11/26/2020] [Indexed: 12/16/2022] Open access
Abstract
Text analysis can help to identify named entities (NEs) of small molecules, proteins, and genes. Such data are very important for the analysis of molecular mechanisms of disease progression and the development of new strategies for the treatment of various diseases and pathological conditions. The texts of publications represent a primary source of information, which is especially important for collecting data of the highest quality because the information becomes available immediately, in contrast to databases. In our study, we aimed at the development and testing of an approach to named entity recognition in the abstracts of publications. More specifically, we have developed and tested an algorithm based on conditional random fields, which provides recognition of NEs of (i) genes and proteins and (ii) chemicals. Careful selection of abstracts strictly related to the subject of interest makes it possible to extract the NEs strongly associated with the subject. To test the applicability of our approach, we have applied it to the extraction of (i) potential HIV inhibitors and (ii) a set of proteins and genes potentially responsible for viremic control in HIV-positive patients. The computational experiments performed provide estimates of the accuracy of recognition of chemical NEs and proteins (genes). The precision of chemical NE recognition is over 0.91, recall is 0.86, and the F1-score (harmonic mean of precision and recall) is 0.89; the precision of recognition of protein and gene names is over 0.86, recall is 0.83, and the F1-score is above 0.85. Evaluation of the algorithm on two case studies related to HIV treatment confirms our suggestion about the possibility of extracting the NEs strongly relevant to (i) HIV inhibitors and (ii) a group of patients, i.e., the group of HIV-positive individuals with an ability to maintain an undetectable HIV-1 viral load over time in the absence of antiretroviral therapy. Analysis of the results obtained provides insights into the function of proteins that can be responsible for viremic control. Our study demonstrated the applicability of the developed approach for the extraction of useful data on HIV treatment.
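For readers less familiar with the metrics quoted above, the sketch below shows how precision, recall, and the F1-score follow from entity-level counts. The counts in the usage line are invented for illustration and are not data from the study; the function name is likewise hypothetical.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from true/false positive and
    false negative counts of recognized named entities."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 entities found correctly, 2 spurious, 2 missed.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(p, r, f1)
```

Because the harmonic mean is pulled toward the smaller of the two values, a recognizer cannot reach a high F1-score by maximizing precision or recall alone.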
Affiliation(s)
- Nadezhda Biziukova
  - Laboratory of Structure-Function Based Drug Design, Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
- Olga Tarasova
  - Laboratory of Structure-Function Based Drug Design, Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
- Sergey Ivanov
  - Laboratory of Structure-Function Based Drug Design, Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
  - Department of Bioinformatics, Faculty of Biomedicine, Pirogov Russian National Research Medical University, Moscow, Russia
- Vladimir Poroikov
  - Laboratory of Structure-Function Based Drug Design, Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
9
Tarasova O, Ivanov S, Filimonov DA, Poroikov V. Data and Text Mining Help Identify Key Proteins Involved in the Molecular Mechanisms Shared by SARS-CoV-2 and HIV-1. Molecules 2020; 25:E2944. [PMID: 32604797 PMCID: PMC7357070 DOI: 10.3390/molecules25122944] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/04/2020] [Revised: 06/22/2020] [Accepted: 06/24/2020] [Indexed: 12/11/2022] Open access
Abstract
Viruses can be spread from one person to another; therefore, they may cause disorders in many people, sometimes leading to epidemics and even pandemics. New, previously unstudied viruses and some specific mutant or recombinant variants of known viruses constantly appear. An example is a variant of coronaviruses (CoV) causing severe acute respiratory syndrome (SARS), named SARS-CoV-2. Some antiviral drugs, such as remdesivir, as well as antiretroviral drugs including darunavir, lopinavir, and ritonavir, are suggested to be effective in treating disorders caused by SARS-CoV-2. There are data on the utilization of antiretroviral drugs against SARS-CoV-2. Since there are many studies aimed at the identification of the molecular mechanisms of human immunodeficiency virus type 1 (HIV-1) infection and the development of novel therapeutic approaches against HIV-1, we used HIV-1 for our case study to identify possible molecular pathways shared by SARS-CoV-2 and HIV-1. We applied a text and data mining workflow and identified a list of 46 targets, which can be essential for the development of infections caused by SARS-CoV-2 and HIV-1. We show that SARS-CoV-2 and HIV-1 share some molecular pathways involved in inflammation, immune response, and cell cycle regulation.
Affiliation(s)
- Olga Tarasova
  - Department for Bioinformatics, Institute of Biomedical Chemistry, 107076 Moscow, Russia
- Sergey Ivanov
  - Department for Bioinformatics, Institute of Biomedical Chemistry, 107076 Moscow, Russia
  - Department of Bioinformatics of Pirogov Russian National Research Medical University, 107076 Moscow, Russia
- Dmitry A. Filimonov
  - Department for Bioinformatics, Institute of Biomedical Chemistry, 107076 Moscow, Russia
- Vladimir Poroikov
  - Department for Bioinformatics, Institute of Biomedical Chemistry, 107076 Moscow, Russia
10
Stolbov LA, Druzhilovskiy DS, Filimonov DA, Nicklaus MC, Poroikov VV. (Q)SAR Models of HIV-1 Protein Inhibition by Drug-Like Compounds. Molecules 2019; 25:molecules25010087. [PMID: 31881687 PMCID: PMC6983201 DOI: 10.3390/molecules25010087] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/29/2019] [Revised: 12/17/2019] [Accepted: 12/18/2019] [Indexed: 12/17/2022] Open access
Abstract
Despite the achievements of antiretroviral therapy, discovery of new anti-HIV medicines remains an essential task because the existing drugs do not provide a complete cure for the infected patients, exhibit severe adverse effects, and lead to the appearance of resistant strains. To predict the interaction of drug-like compounds with multiple targets for HIV treatment, the ligand-based drug design approach is widely applied. In this study, we evaluated the possibilities and limitations of (Q)SAR analysis aimed at the discovery of novel antiretroviral agents inhibiting the vital HIV enzymes. Local (Q)SAR models are based on the analysis of structure–activity relationships for molecules from the same chemical class, which significantly restricts their applicability domain. In contrast, global (Q)SAR models exploit data from heterogeneous sets of drug-like compounds, which allows their application to databases containing diverse structures. We compared the information for HIV-1 integrase, protease and reverse transcriptase inhibitors available in the EBI ChEMBL, NIAID HIV/OI/TB Therapeutics, and Clarivate Analytics Integrity databases as the sources for (Q)SAR training sets. Using the PASS and GUSAR software, we developed and validated a variety of (Q)SAR models, which can be further used for virtual screening of new antiretrovirals in the SAVI library. The developed models are implemented in the freely available web resource AntiHIV-Pred.
Affiliation(s)
- Leonid A. Stolbov
  - Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, 10 bldg. 8, Pogodinskaya str., 119121 Moscow, Russia
- Dmitry S. Druzhilovskiy
  - Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, 10 bldg. 8, Pogodinskaya str., 119121 Moscow, Russia
- Dmitry A. Filimonov
  - Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, 10 bldg. 8, Pogodinskaya str., 119121 Moscow, Russia
- Marc C. Nicklaus
  - Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, Frederick, MD 21702, USA
- Vladimir V. Poroikov
  - Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, 10 bldg. 8, Pogodinskaya str., 119121 Moscow, Russia