Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Tarasova OA, Biziukova NY, Filimonov DA, Poroikov VV, Nicklaus MC. Data Mining Approach for Extraction of Useful Information About Biologically Active Compounds from Publications. J Chem Inf Model 2019;59:3635-3644. [PMID: 31453694 DOI: 10.1021/acs.jcim.9b00164] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]

For:	Tarasova OA, Biziukova NY, Filimonov DA, Poroikov VV, Nicklaus MC. Data Mining Approach for Extraction of Useful Information About Biologically Active Compounds from Publications. J Chem Inf Model 2019;59:3635-3644. [PMID: 31453694 DOI: 10.1021/acs.jcim.9b00164] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]

Number

Cited by Other Article(s)

Tarasova O, Biziukova N, Shemshura A, Filimonov D, Kireev D, Pokrovskaya A, Poroikov VV. Identification of Molecular Mechanisms Involved in Viral Infection Progression Based on Text Mining: Case Study for HIV Infection. Int J Mol Sci 2023;24:ijms24021465. [PMID: 36674980 PMCID: PMC9862153 DOI: 10.3390/ijms24021465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2022] [Revised: 12/29/2022] [Accepted: 01/06/2023] [Indexed: 01/13/2023] Open

Ghosh S, Lu K. Band gap information extraction from materials science literature – a pilot study. ASLIB J INFORM MANAG 2022. [DOI: 10.1108/ajim-03-2022-0141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Tarasova OA, Rudik AV, Biziukova NY, Filimonov DA, Poroikov VV. Chemical named entity recognition in the texts of scientific publications using the naïve Bayes classifier approach. J Cheminform 2022;14:55. [PMID: 35964150 PMCID: PMC9375066 DOI: 10.1186/s13321-022-00633-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2022] [Accepted: 07/12/2022] [Indexed: 11/24/2022] Open

Abstract

Motivation

Application of chemical named entity recognition (CNER) algorithms allows retrieval of information from texts about chemical compound identifiers and creates associations with physical–chemical properties and biological activities. Scientific texts represent low-formalized sources of information. Most methods aimed at CNER are based on machine learning approaches, including conditional random fields and deep neural networks. In general, most machine learning approaches require either vector or sparse word representation of texts. Chemical named entities (CNEs) constitute only a small fraction of the whole text, and the datasets used for training are highly imbalanced.

Methods and results

We propose a new method for extracting CNEs from texts based on the naïve Bayes classifier combined with specially developed filters. In contrast to the earlier developed CNER methods, our approach uses the representation of the data as a set of fragments of text (FoTs) with the subsequent preparati`on of a set of multi-n-grams (sequences from one to n symbols) for each FoT. Our approach may provide the recognition of novel CNEs. For CHEMDNER corpus, the values of the sensitivity (recall) was 0.95, precision was 0.74, specificity was 0.88, and balanced accuracy was 0.92 based on five-fold cross validation. We applied the developed algorithm to the extracted CNEs of potential Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease (Mpro) inhibitors. A set of CNEs corresponding to the chemical substances evaluated in the biochemical assays used for the discovery of Mpro inhibitors was retrieved. Manual analysis of the appropriate texts showed that CNEs of potential SARS-CoV-2 Mpro inhibitors were successfully identified by our method.

Conclusion

The obtained results show that the proposed method can be used for filtering out words that are not related to CNEs; therefore, it can be successfully applied to the extraction of CNEs for the purposes of cheminformatics and medicinal chemistry.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13321-022-00633-4.

Collapse

Web-Based Quantitative Structure-Activity Relationship Resources Facilitate Effective Drug Discovery. Top Curr Chem (Cham) 2021;379:37. [PMID: 34554348 DOI: 10.1007/s41061-021-00349-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2021] [Accepted: 08/17/2021] [Indexed: 12/28/2022]

Bannigan P, Aldeghi M, Bao Z, Häse F, Aspuru-Guzik A, Allen C. Machine learning directed drug formulation development. Adv Drug Deliv Rev 2021;175:113806. [PMID: 34019959 DOI: 10.1016/j.addr.2021.05.016] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2021] [Revised: 03/31/2021] [Accepted: 05/14/2021] [Indexed: 12/12/2022]

Tarasova O, Poroikov V. Machine Learning in Discovery of New Antivirals and Optimization of Viral Infections Therapy. Curr Med Chem 2021;28:7840-7861. [PMID: 33949929 DOI: 10.2174/0929867328666210504114351] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2020] [Revised: 02/13/2021] [Accepted: 02/24/2021] [Indexed: 11/22/2022]

Tarasova OA, Biziukova NY, Rudik AV, Dmitriev AV, Filimonov DA, Poroikov VV. Extraction of Data on Parent Compounds and Their Metabolites from Texts of Scientific Abstracts. J Chem Inf Model 2021;61:1683-1690. [PMID: 33724829 DOI: 10.1021/acs.jcim.0c01054] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]

Abstract

The growing amount of experimental data on chemical objects includes properties of small molecules, results of studies of their interaction with human and animal proteins, and methods of synthesis of organic compounds (OCs). The data obtained can be used to identify the names of OCs automatically, including all possible synonyms and relevant data on the molecular properties and biological activity. Utilization of different synonymic names of chemical compounds allows researchers to increase the completeness of data on their properties available from publications. Enrichment of the data on the names of chemical compounds by information about their possible metabolites can help estimate the biological effects of parent compounds and their metabolites more thoroughly. Therefore, an attempt at automated extraction of the names of parent compounds and their metabolites from the texts is a rather important task. In our study, we aimed at developing a method that provides the extraction of the named entities (NEs) of parent compounds and their metabolites from abstracts of scientific publications. Based on the application of the conditional random fields' algorithm, we extracted the NEs of chemical compounds. We developed a set of rules allowing identification of parent compound NEs and their metabolites in the texts. We evaluated the possibility of extracting the names of potential metabolites based on cosine similarity between strings representing names of parent compounds and all other chemical NEs found in the text. Additionally, we used conditional random fields to fetch the names of parent compounds and their metabolites from the texts based on the corpus of texts labeled manually. Our computational experiments showed that usage of rules in combination with cosine similarity could increase the accuracy of recognition of the names of metabolites compared to the rule-based algorithm and application of a machine-learning algorithm (conditional random fields).

Collapse

Biziukova N, Tarasova O, Ivanov S, Poroikov V. Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies. Front Genet 2021;11:618862. [PMID: 33414815 PMCID: PMC7783389 DOI: 10.3389/fgene.2020.618862] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2020] [Accepted: 11/26/2020] [Indexed: 12/16/2022] Open

Abstract

Text analysis can help to identify named entities (NEs) of small molecules, proteins, and genes. Such data are very important for the analysis of molecular mechanisms of disease progression and development of new strategies for the treatment of various diseases and pathological conditions. The texts of publications represent a primary source of information, which is especially important to collect the data of the highest quality due to the immediate obtaining information, in comparison with databases. In our study, we aimed at the development and testing of an approach to the named entity recognition in the abstracts of publications. More specifically, we have developed and tested an algorithm based on the conditional random fields, which provides recognition of NEs of (i) genes and proteins and (ii) chemicals. Careful selection of abstracts strictly related to the subject of interest leads to the possibility of extracting the NEs strongly associated with the subject. To test the applicability of our approach, we have applied it for the extraction of (i) potential HIV inhibitors and (ii) a set of proteins and genes potentially responsible for viremic control in HIV-positive patients. The computational experiments performed provide the estimations of evaluating the accuracy of recognition of chemical NEs and proteins (genes). The precision of the chemical NEs recognition is over 0.91; recall is 0.86, and the F1-score (harmonic mean of precision and recall) is 0.89; the precision of recognition of proteins and genes names is over 0.86; recall is 0.83; while F1-score is above 0.85. Evaluation of the algorithm on two case studies related to HIV treatment confirms our suggestion about the possibility of extracting the NEs strongly relevant to (i) HIV inhibitors and (ii) a group of patients i.e., the group of HIV-positive individuals with an ability to maintain an undetectable HIV-1 viral load overtime in the absence of antiretroviral therapy. Analysis of the results obtained provides insights into the function of proteins that can be responsible for viremic control. Our study demonstrated the applicability of the developed approach for the extraction of useful data on HIV treatment.

Collapse

Tarasova O, Ivanov S, Filimonov DA, Poroikov V. Data and Text Mining Help Identify Key Proteins Involved in the Molecular Mechanisms Shared by SARS-CoV-2 and HIV-1. Molecules 2020;25:E2944. [PMID: 32604797 PMCID: PMC7357070 DOI: 10.3390/molecules25122944] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2020] [Revised: 06/22/2020] [Accepted: 06/24/2020] [Indexed: 12/11/2022] Open

Stolbov LA, Druzhilovskiy DS, Filimonov DA, Nicklaus MC, Poroikov VV. (Q)SAR Models of HIV-1 Protein Inhibition by Drug-Like Compounds. Molecules 2019;25:molecules25010087. [PMID: 31881687 PMCID: PMC6983201 DOI: 10.3390/molecules25010087] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2019] [Revised: 12/17/2019] [Accepted: 12/18/2019] [Indexed: 12/17/2022] Open