Reference Citation Analysis

For: Rebholz-Schuhmann D, Oellrich A, Hoehndorf R. Text-mining solutions for biomedical research: enabling integrative biology. Nat Rev Genet 2012;13:829-39. [DOI: 10.1038/nrg3337] [Citation(s) in RCA: 170] [Impact Index Per Article: 14.2] [Indexed: 12/24/2022]

Cited by Other Article(s)

Arora S, Chettri S, Percha V, Kumar D, Latwal M. Artificial intelligence: a virtual chemist for natural product drug discovery. J Biomol Struct Dyn 2024;42:3826-3835. [PMID: 37232451 DOI: 10.1080/07391102.2023.2216295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 05/12/2023] [Indexed: 05/27/2023]

Chen J, Ikeda SI, Negishi K, Tsubota K, Kurihara T. Identification of Potential Therapeutic Targets for Myopic Choroidal Neovascularization via Discovery-Driven Data Mining. Curr Eye Res 2023;48:1160-1169. [PMID: 37610842 DOI: 10.1080/02713683.2023.2252201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 07/29/2023] [Accepted: 08/21/2023] [Indexed: 08/25/2023]

Buch AM, Vértes PE, Seidlitz J, Kim SH, Grosenick L, Liston C. Molecular and network-level mechanisms explaining individual differences in autism spectrum disorder. Nat Neurosci 2023;26:650-663. [PMID: 36894656 DOI: 10.1038/s41593-023-01259-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 11/29/2021] [Accepted: 01/17/2023] [Indexed: 03/11/2023]

Kontoghiorghes L, Colubi A. New metrics and tests for subject prevalence in documents based on topic modeling. Int J Approx Reason 2023. [DOI: 10.1016/j.ijar.2023.02.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]

Scott-Fordsmand JJ, Amorim MJB. Using Machine Learning to make nanomaterials sustainable. Sci Total Environ 2023;859:160303. [PMID: 36410486 DOI: 10.1016/j.scitotenv.2022.160303] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/22/2022] [Revised: 11/06/2022] [Accepted: 11/15/2022] [Indexed: 06/16/2023]

Gene Identification and Potential Drug Therapy for Drug-Resistant Melanoma with Bioinformatics and Deep Learning Technology. Dis Markers 2022;2022:2461055. [PMID: 35915735 PMCID: PMC9338845 DOI: 10.1155/2022/2461055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Revised: 06/13/2022] [Accepted: 06/21/2022] [Indexed: 11/17/2022]

Ali F, Khan A, Muhammad SA, Abbas SQ, Hassan SSU, Bungau S. Genome-wide Meta-analysis Reveals New Gene Signatures and Potential Drug Targets of Hypertension. ACS Omega 2022;7:22754-22772. [PMID: 35811894 PMCID: PMC9260904 DOI: 10.1021/acsomega.2c02277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/12/2022] [Accepted: 06/03/2022] [Indexed: 06/02/2023]

Abstract

The prevalence of hypertension reported around the world is increasing and is an important public health challenge. This study was designed to explore the disease's genetic variations and to identify new hypertension-related genes and target proteins. We analyzed 22 publicly available Affymetrix cDNA datasets of hypertension using an integrated system-level framework involving differential expression genetic (DEG) analysis, data mining, gene enrichment, protein-protein interaction, microRNA analysis, toxicogenomics, gene regulation, molecular docking, and simulation studies. We found potential DEGs after screening out the extracellular proteins. We studied the functional role of seven shortlisted DEGs (ADM, EDN1, ANGPTL4, NFIL3, MSR1, CEBPD, and USP8) in hypertension after disease gene curation analysis. The expression profiling and cluster analysis showed significant variations and enriched GO terms. hsa-miR-365a-3p, hsa-miR-2052, hsa-miR-3065-3p, hsa-miR-603, hsa-miR-7113-3p, hsa-miR-3923, and hsa-miR-524-5p were identified as hypertension-associated miRNA targets for each gene using computational algorithms. We found functional interactions of source DEGs with target and important gene signatures including EGFR, AGT, AVP, APOE, RHOA, SRC, APOB, STAT3, UBC, LPL, APOA1, and AKT1 associated with the disease. These DEGs are mainly involved in fatty acid metabolism, myometrial pathways, MAPK, and G-alpha signaling pathways linked with hypertension pathogenesis. We predicted significantly disordered regions of 71.2, 48.8, and 45.4% representing the mutation in the sequence of NFIL3, USP8, and ADM, respectively. Regulation of gene expression was performed to find upregulated genes. Molecular docking analysis was used to evaluate Food and Drug Administration-approved medicines against the four DEGs that were overexpressed. For each elevated target protein, the three best drug candidates were chosen. 
Furthermore, molecular dynamics (MD) simulation using the target's active sites for 100 ns was used to validate these 12 complexes after docking. This investigation establishes the worth of systems genetics for finding four possible genes as potential drug targets for hypertension. These network-based approaches are significant for finding genetic variant data, which will advance the understanding of how to hasten the identification of drug targets and improve the understanding regarding the treatment of hypertension.

Silva MC, Eugénio P, Faria D, Pesquita C. Ontologies and Knowledge Graphs in Oncology Research. Cancers (Basel) 2022;14:cancers14081906. [PMID: 35454813 PMCID: PMC9029532 DOI: 10.3390/cancers14081906] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/08/2022] [Revised: 03/25/2022] [Accepted: 04/07/2022] [Indexed: 11/16/2022]

Alshahrani M, Almansour A, Alkhaldi A, Thafar MA, Uludag M, Essack M, Hoehndorf R. Combining biomedical knowledge graphs and text to improve predictions for drug-target interactions and drug-indications. PeerJ 2022;10:e13061. [PMID: 35402106 PMCID: PMC8988936 DOI: 10.7717/peerj.13061] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/28/2020] [Accepted: 02/13/2022] [Indexed: 01/11/2023]

Darling: A Web Application for Detecting Disease-Related Biomedical Entity Associations with Literature Mining. Biomolecules 2022;12:biom12040520. [PMID: 35454109 PMCID: PMC9028073 DOI: 10.3390/biom12040520] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/01/2022] [Revised: 03/24/2022] [Accepted: 03/28/2022] [Indexed: 12/15/2022]

Fisher JL, Jones EF, Flanary VL, Williams AS, Ramsey EJ, Lasseigne BN. Considerations and challenges for sex-aware drug repurposing. Biol Sex Differ 2022;13:13. [PMID: 35337371 PMCID: PMC8949654 DOI: 10.1186/s13293-022-00420-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/11/2022] [Accepted: 03/06/2022] [Indexed: 01/09/2023]

Abstract

Sex differences are essential factors in disease etiology and manifestation in many diseases such as cardiovascular disease, cancer, and neurodegeneration [33]. The biological influence of sex differences (including genomic, epigenetic, hormonal, immunological, and metabolic differences between males and females) and the lack of biomedical studies considering sex differences in their study design have led to several policies, for example the National Institutes of Health's (NIH) sex as a biological variable (SABV) and the Sex and Gender Equity in Research (SAGER) policies, which motivate researchers to consider sex differences [204]. However, drug repurposing, a promising alternative to traditional drug discovery by identifying novel uses for FDA-approved drugs, lacks sex-aware methods that can improve the identification of drugs that have sex-specific responses [7, 11, 14, 33]. Sex-aware drug repurposing methods either select drug candidates that are more efficacious in one sex or deprioritize drug candidates based on whether they are predicted to cause a sex-bias adverse event (SBAE), unintended therapeutic effects that are more likely to occur in one sex. Computational drug repurposing methods are encouraging approaches to develop for sex-aware drug repurposing because they can prioritize sex-specific drug candidates or SBAEs at lower cost and time than traditional drug discovery. Sex-aware methods currently exist for clinical, genomic, and transcriptomic information [1, 7, 155]. They have not expanded to other data types, such as DNA variation, which has been beneficial in other drug repurposing methods that do not consider sex [114]. Additionally, some sex-aware methods suffer from poorer performance because a disproportionate number of male and female samples are available to train computational methods [7]. However, there is development potential for several different categories (i.e., data mining, ligand binding predictions, molecular associations, and networks). Low-dimensional representations of molecular association and network approaches are also especially promising candidates for future sex-aware drug repurposing methodologies because they reduce the multiple hypothesis testing burden and capture sex-specific variation better than the other methods [151, 159]. Here we review how sex influences drug response, the current state of drug repurposing including with respect to sex-bias drug response, and how model organism study design choices influence drug repurposing validation.

Restrepo S, ter Horst E, Zambrano JD, Gunn LH, Molina G, Salazar CA. Hierarchical Bayesian classification methods to identify topics by journal quartile with an application in biological sciences. Educ Inf 2022. [DOI: 10.3233/efi-211546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]

Zhang XC, Yi JC, Yang GP, Wu CK, Hou TJ, Cao DS. ABC-Net: a divide-and-conquer based deep learning architecture for SMILES recognition from molecular images. Brief Bioinform 2022;23:6535678. [PMID: 35212357 DOI: 10.1093/bib/bbac033] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/07/2021] [Revised: 01/10/2022] [Accepted: 01/24/2022] [Indexed: 11/14/2022]

Saldívar-González FI, Aldas-Bulos VD, Medina-Franco JL, Plisson F. Natural product drug discovery in the artificial intelligence era. Chem Sci 2022;13:1526-1546. [PMID: 35282622 PMCID: PMC8827052 DOI: 10.1039/d1sc04471k] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Received: 08/13/2021] [Accepted: 12/10/2021] [Indexed: 12/19/2022]

Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022;23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022]

Donoghue T, Voytek B. Automated meta-analysis of the event-related potential (ERP) literature. Sci Rep 2022;12:1867. [PMID: 35115622 PMCID: PMC8814144 DOI: 10.1038/s41598-022-05939-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/21/2021] [Accepted: 01/18/2022] [Indexed: 12/04/2022]

Yim WWY, Kurikawa Y, Mizushima N. An exploratory text analysis of the autophagy research field. Autophagy 2021;18:1648-1661. [PMID: 34812110 PMCID: PMC9298454 DOI: 10.1080/15548627.2021.1995151] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022]

Hypoglycemia, Vascular Disease and Cognitive Dysfunction in Diabetes: Insights from Text Mining-Based Reconstruction and Bioinformatics Analysis of the Gene Networks. Int J Mol Sci 2021;22:ijms222212419. [PMID: 34830301 PMCID: PMC8620086 DOI: 10.3390/ijms222212419] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/20/2021] [Revised: 11/14/2021] [Accepted: 11/16/2021] [Indexed: 12/16/2022]

Baltoumas FA, Zafeiropoulou S, Karatzas E, Paragkamian S, Thanati F, Iliopoulos I, Eliopoulos AG, Schneider R, Jensen LJ, Pafilis E, Pavlopoulos GA. OnTheFly2.0: a text-mining web application for automated biomedical entity recognition, document annotation, network and functional enrichment analysis. NAR Genom Bioinform 2021;3:lqab090. [PMID: 34632381 PMCID: PMC8494211 DOI: 10.1093/nargab/lqab090] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/14/2021] [Revised: 09/09/2021] [Accepted: 09/20/2021] [Indexed: 02/06/2023]

Delmas M, Filangi O, Paulhe N, Vinson F, Duperier C, Garrier W, Saunier PE, Pitarch Y, Jourdan F, Giacomoni F, Frainay C. FORUM: Building a Knowledge Graph from public databases and scientific literature to extract associations between chemicals and diseases. Bioinformatics 2021;37:3896-3904. [PMID: 34478489 PMCID: PMC8570811 DOI: 10.1093/bioinformatics/btab627] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/02/2021] [Revised: 08/16/2021] [Accepted: 09/01/2021] [Indexed: 11/22/2022]

Mann M, Kumar C, Zeng WF, Strauss MT. Artificial intelligence for proteomics and biomarker discovery. Cell Syst 2021;12:759-770. [PMID: 34411543 DOI: 10.1016/j.cels.2021.06.006] [Citation(s) in RCA: 86] [Impact Index Per Article: 28.7] [Received: 03/04/2021] [Revised: 05/07/2021] [Accepted: 06/28/2021] [Indexed: 12/14/2022]

Chen Q, Leaman R, Allot A, Luo L, Wei CH, Yan S, Lu Z. Artificial Intelligence in Action: Addressing the COVID-19 Pandemic with Natural Language Processing. Annu Rev Biomed Data Sci 2021;4:313-339. [PMID: 34465169 DOI: 10.1146/annurev-biodatasci-021821-061045] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 11/09/2022]

Ali I, Dreij K, Baker S, Högberg J, Korhonen A, Stenius U. Application of Text Mining in Risk Assessment of Chemical Mixtures: A Case Study of Polycyclic Aromatic Hydrocarbons (PAHs). Environ Health Perspect 2021;129:67008. [PMID: 34165340 PMCID: PMC8318069 DOI: 10.1289/ehp6702] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/19/2019] [Revised: 05/07/2021] [Accepted: 05/10/2021] [Indexed: 05/08/2023]

Abstract

BACKGROUND

Cancer risk assessment of complex exposures, such as exposure to mixtures of polycyclic aromatic hydrocarbons (PAHs), is challenging due to the diverse biological activities of these compounds. With the help of text mining (TM), we have developed TM tools, namely the latest iteration of the Cancer Risk Assessment using Biomedical literature tool (CRAB3) and a Cancer Hallmarks Analytics Tool (CHAT), that could be useful for automatic literature analyses in cancer risk assessment and research. Whereas CRAB3 analyses are based on carcinogenic modes of action (MOAs) and cover almost all the key characteristics of carcinogens, CHAT evaluates literature according to the hallmarks of cancer, that is, the alterations in cellular behavior that characterize the cancer cell.

OBJECTIVES

The objective was to evaluate the usefulness of these tools to support cancer risk assessment by performing a case study of 22 European Union and U.S. Environmental Protection Agency priority PAHs and diesel exhaust and a case study of PAH interactions with silica.

METHODS

We analyzed PubMed literature, comprising 57,498 references concerning priority PAHs and complex PAH mixtures, using CRAB3 and CHAT.

RESULTS

CRAB3 analyses correctly identified similarities and differences in genotoxic and nongenotoxic MOAs of the 22 priority PAHs and grouped them according to their known carcinogenic potential. CHAT had the same capacity and complemented the CRAB output when comparing, for example, benzo[a]pyrene and dibenzo[a,l]pyrene. Both CRAB3 and CHAT analyses highlighted potentially interacting mechanisms within and across complex PAH mixtures and mechanisms of possible importance for interactions with silica.

CONCLUSION

These data suggest that our TM approach can be useful in the hazard identification of PAHs and mixtures including PAHs. The tools can assist in grouping chemicals and identifying similarities and differences in carcinogenic MOAs and their interactions. https://doi.org/10.1289/EHP6702.

Pereira V, Cooper CL, Chandwani R, Varma A, Tarba SYY. Guest editorial. J Knowl Manag 2021. [DOI: 10.1108/jkm-02-2021-0086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]

Britt BC, Britt RK, Hayes JL, Panek ET, Maddox J, Musaev A. Oral Healthcare Implications of Dedicated Online Communities: A Computational Content Analysis of the r/Dentistry Subreddit. Health Commun 2021;36:572-584. [PMID: 32091259 DOI: 10.1080/10410236.2020.1731937] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/10/2023]

Danesh F, Dastani M, Ghorbani M. Retrospective and prospective approaches of coronavirus publications in the last half-century: a Latent Dirichlet allocation analysis. Libr Hi Tech 2021. [DOI: 10.1108/lht-09-2020-0216] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/04/2023]

Turina P, Fariselli P, Capriotti E. ThermoScan: Semi-automatic Identification of Protein Stability Data From PubMed. Front Mol Biosci 2021;8:620475. [PMID: 33842537 PMCID: PMC8027235 DOI: 10.3389/fmolb.2021.620475] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/23/2020] [Accepted: 02/18/2021] [Indexed: 11/13/2022]

Abstract

In recent years, the increasing number of DNA sequencing and protein mutagenesis studies has generated a large amount of variation data published in the biomedical literature. The collection of such data has been essential for the development and assessment of tools predicting the impact of protein variants at functional and structural levels. Nevertheless, the collection of manually curated data from literature is a highly time-consuming and costly process that requires domain experts. In particular, the development of methods for predicting the effect of amino acid variants on protein stability relies on the thermodynamic data extracted from literature. In the past, such data were deposited in the ProTherm database, which, however, has not been maintained since 2013. To facilitate the collection of protein thermodynamic data from literature, we developed the semi-automatic tool ThermoScan. ThermoScan is a text mining approach for the identification of relevant thermodynamic data on protein stability from full-text articles. The method relies on a regular expression searching for groups of words, including the most common conceptual words appearing in experimental studies on protein stability, several thermodynamic variables, and their units of measure. ThermoScan analyzes full-text articles from the PubMed Central Open Access subset and calculates an empirical score that allows the identification of manuscripts reporting thermodynamic data on protein stability. The method was optimized on a set of publications included in the ProTherm database, and tested on a new curated set of articles, manually selected for presence of thermodynamic data. The results show that ThermoScan returns accurate predictions and outperforms recently developed text-mining algorithms based on the analysis of publication abstracts. Availability: The ThermoScan server is freely accessible online at https://folding.biofold.org/thermoscan. The ThermoScan python code and the Google Chrome extension for submitting visualized PMC web pages to the ThermoScan server are available at https://github.com/biofold/ThermoScan.

Tarasova OA, Biziukova NY, Rudik AV, Dmitriev AV, Filimonov DA, Poroikov VV. Extraction of Data on Parent Compounds and Their Metabolites from Texts of Scientific Abstracts. J Chem Inf Model 2021;61:1683-1690. [PMID: 33724829 DOI: 10.1021/acs.jcim.0c01054] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]

Abstract

The growing amount of experimental data on chemical objects includes properties of small molecules, results of studies of their interaction with human and animal proteins, and methods of synthesis of organic compounds (OCs). The data obtained can be used to identify the names of OCs automatically, including all possible synonyms and relevant data on the molecular properties and biological activity. Utilization of different synonymic names of chemical compounds allows researchers to increase the completeness of data on their properties available from publications. Enrichment of the data on the names of chemical compounds by information about their possible metabolites can help estimate the biological effects of parent compounds and their metabolites more thoroughly. Therefore, an attempt at automated extraction of the names of parent compounds and their metabolites from the texts is a rather important task. In our study, we aimed at developing a method that provides the extraction of the named entities (NEs) of parent compounds and their metabolites from abstracts of scientific publications. Based on the application of the conditional random fields' algorithm, we extracted the NEs of chemical compounds. We developed a set of rules allowing identification of parent compound NEs and their metabolites in the texts. We evaluated the possibility of extracting the names of potential metabolites based on cosine similarity between strings representing names of parent compounds and all other chemical NEs found in the text. Additionally, we used conditional random fields to fetch the names of parent compounds and their metabolites from the texts based on the corpus of texts labeled manually. Our computational experiments showed that usage of rules in combination with cosine similarity could increase the accuracy of recognition of the names of metabolites compared to the rule-based algorithm and application of a machine-learning algorithm (conditional random fields).

Cheerkoot-Jalim S, Khedo KK. A systematic review of text mining approaches applied to various application areas in the biomedical domain. J Knowl Manag 2020. [DOI: 10.1108/jkm-09-2019-0524] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/11/2022]

Abstract

Purpose

This work shows the results of a systematic literature review on biomedical text mining. The purpose of this study is to identify the different text mining approaches used in different application areas of the biomedical domain, the common tools used and the challenges of biomedical text mining as compared to generic text mining algorithms. This study will be of value to biomedical researchers by allowing them to correlate text mining approaches to specific biomedical application areas. Implications for future research are also discussed.

Design/methodology/approach

The review was conducted following the principles of the Kitchenham method. A number of research questions were first formulated, followed by the definition of the search strategy. The papers were then selected based on a list of assessment criteria. Each of the papers was analyzed, and information relevant to the research questions was extracted.

Findings

It was found that researchers have mostly harnessed data sources such as electronic health records, biomedical literature, social media and health-related forums. The most common text mining technique was natural language processing using tools such as MetaMap and Unstructured Information Management Architecture, alongside the use of medical terminologies such as Unified Medical Language System. The main application area was the detection of adverse drug events. Challenges identified included the need to deal with huge amounts of text, the heterogeneity of the different data sources, the duality of meaning of words in biomedical text and the amount of noise introduced mainly from social media and health-related forums.

Originality/value

To the best of the authors’ knowledge, other reviews in this area have focused on either specific techniques, specific application areas or specific data sources. The results of this review will help researchers to correlate most relevant and recent advances in text mining approaches to specific biomedical application areas by providing an up-to-date and holistic view of work done in this research area. The use of emerging text mining techniques has great potential to spur the development of innovative applications, thus considerably impacting on the advancement of biomedical research.

Saik OV, Klimontov VV. Bioinformatic Reconstruction and Analysis of Gene Networks Related to Glucose Variability in Diabetes and Its Complications. Int J Mol Sci 2020;21:ijms21228691. [PMID: 33217980 PMCID: PMC7698756 DOI: 10.3390/ijms21228691] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 10/14/2020] [Revised: 11/06/2020] [Accepted: 11/16/2020] [Indexed: 02/06/2023]

Gobeill J, Caucheteur D, Michel PA, Mottin L, Pasche E, Ruch P. SIB Literature Services: RESTful customizable search engines in biomedical literature, enriched with automatically mapped biomedical concepts. Nucleic Acids Res 2020;48:W12-W16. [PMID: 32379317 PMCID: PMC7319474 DOI: 10.1093/nar/gkaa328] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/10/2020] [Revised: 04/09/2020] [Accepted: 04/22/2020] [Indexed: 01/05/2023]

Yan S, Wong KC. Context awareness and embedding for biomedical event extraction. Bioinformatics 2020;36:637-643. [PMID: 31392318 DOI: 10.1093/bioinformatics/btz607] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/23/2018] [Revised: 07/26/2019] [Accepted: 08/06/2019] [Indexed: 11/13/2022]

Piereck B, Oliveira-Lima M, Benko-Iseppon AM, Diehl S, Schneider R, Brasileiro-Vidal AC, Barbosa-Silva A. LAITOR4HPC: A text mining pipeline based on HPC for building interaction networks. BMC Bioinformatics 2020;21:365. [PMID: 32838742 PMCID: PMC7447576 DOI: 10.1186/s12859-020-03620-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/14/2019] [Accepted: 06/19/2020] [Indexed: 11/11/2022]

Abstract

Background

The number of published full-text articles has increased dramatically. Text mining tools are an essential approach to building biological networks, updating databases and providing annotation for new pathways. PESCADOR is an online web server based on LAITOR and NLProt text mining tools, which retrieves protein-protein co-occurrences in a tabular-based format, adding a network schema. Here we present an HPC-oriented version of PESCADOR’s native text mining tool, renamed to LAITOR4HPC, aiming to process an unlimited number of abstracts in a short time to enrich available networks, build new ones and possibly highlight whether fields of research have been exhaustively studied.

Results

By taking advantage of parallel computing HPC infrastructure, the full collection of MEDLINE abstracts available until June 2017 was analyzed in a shorter period (6 days) when compared to the original online implementation (with an estimated 2 years to run the same data). Additionally, three case studies were presented to illustrate LAITOR4HPC usage possibilities. The first case study targeted soybean and was used to retrieve an overview of published co-occurrences in a single organism, retrieving 15,788 proteins in 7894 co-occurrences. In the second case study, a target gene family was searched in many organisms, by analyzing 15 species under biotic stress. Most co-occurrences regarded Arabidopsis thaliana and Zea mays. The third case study concerned the construction and enrichment of an available pathway. Choosing A. thaliana for further analysis, the defensin pathway was enriched, showing additional signaling and regulation molecules, and how they respond to each other in the modulation of this complex plant defense response.

Conclusions

LAITOR4HPC can be used for efficient text-mining-based construction of biological networks derived from big data sources, such as MEDLINE abstracts. Time consumption and data input limitations will depend on the resources available at the HPC facility. LAITOR4HPC offers enough flexibility for different approaches and data volumes, whether targeted to an organism, a subject, or a specific pathway. Additionally, it can deliver comprehensive results in which interactions are classified into four types according to their reliability.

Malvezzi H, Marengo EB, Podgaec S, Piccinato CDA. Endometriosis: current challenges in modeling a multifactorial disease of unknown etiology. J Transl Med 2020;18:311. [PMID: 32787880 PMCID: PMC7425005 DOI: 10.1186/s12967-020-02471-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2020] [Accepted: 07/28/2020] [Indexed: 02/07/2023] Open

Arguello Casteleiro M, Des Diz J, Maroto N, Fernandez Prieto MJ, Peters S, Wroe C, Sevillano Torrado C, Maseda Fernandez D, Stevens R. Semantic Deep Learning: Prior Knowledge and a Type of Four-Term Embedding Analogy to Acquire Treatments for Well-Known Diseases. JMIR Med Inform 2020;8:e16948. [PMID: 32759099 PMCID: PMC7441383 DOI: 10.2196/16948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2019] [Revised: 02/27/2020] [Accepted: 02/27/2020] [Indexed: 11/13/2022] Open

Abstract

Background

How to treat a disease remains the most common type of clinical question. Obtaining evidence-based answers from biomedical literature is difficult. Analogical reasoning with embeddings from deep learning (embedding analogies) may extract such biomedical facts, although the state-of-the-art focuses on pair-based proportional (pairwise) analogies such as man:woman::king:queen (“queen = −man +king +woman”).

Objective

This study aimed to systematically extract disease treatment statements with a Semantic Deep Learning (SemDeep) approach underpinned by prior knowledge and another type of 4-term analogy (other than pairwise).

Methods

As preliminaries, we investigated Continuous Bag-of-Words (CBOW) embedding analogies in a common-English corpus with five lines of text and observed a type of 4-term analogy (not pairwise) applying the 3CosAdd formula and relating the semantic fields person and death: “dagger = −Romeo +die +died” (search query: −Romeo +die +died). Our SemDeep approach worked with pre-existing items of knowledge (what is known) to make inferences sanctioned by a 4-term analogy (search query −x +z1 +z2) from CBOW and Skip-gram embeddings created with a PubMed systematic reviews subset (PMSB dataset). Stage 1: Knowledge acquisition. Obtaining a set of terms, candidate y, from embeddings using vector arithmetic. Some n-gram pairs obtained from the cosine and validated with evidence (prior knowledge) are the input for the 3CosAdd, seeking a type of 4-term analogy relating the semantic fields disease and treatment. Stage 2: Knowledge organization. Identification of candidates sanctioned by the analogy belonging to the semantic field treatment and mapping these candidates to Unified Medical Language System (UMLS) Metathesaurus concepts with MetaMap. A concept pair is a brief disease treatment statement (biomedical fact). Stage 3: Knowledge validation. An evidence-based evaluation followed by human validation of biomedical facts potentially useful for clinicians.
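The 4-term search query (−x +z1 +z2) described above can be illustrated with a minimal sketch. It assumes unit-normalised vectors and ranks candidates by cosine similarity to the combined query vector, which gives the same ordering as the additive 3CosAdd objective. The function name `three_cos_add` and the tiny hand-made embeddings are hypothetical; they are not taken from the PMSB dataset or from the authors' code.

```python
import numpy as np

def three_cos_add(embeddings, negative, positives, top_n=1):
    """Rank candidate terms for the query: -negative +positives[0] +positives[1] ..."""
    # Normalise every vector to unit length so dot products are cosines.
    unit = {w: v / np.linalg.norm(v) for w, v in embeddings.items()}
    # Combine the query terms by vector arithmetic.
    query = sum(unit[p] for p in positives) - unit[negative]
    query = query / np.linalg.norm(query)
    # Query terms themselves are excluded from the candidates.
    excluded = set(positives) | {negative}
    scores = {w: float(np.dot(v, query)) for w, v in unit.items() if w not in excluded}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Toy 4-dimensional vectors (illustrative only, not learned embeddings).
toy = {
    "asthma": np.array([1.0, 0.0, 1.0, 0.0]),
    "inhaled_corticosteroids": np.array([0.0, 1.0, 1.0, 0.0]),
    "inhaled_corticosteroid": np.array([0.0, 1.0, 1.0, 0.1]),
    "budesonide": np.array([0.0, 1.0, 0.9, 0.0]),
    "valproate": np.array([0.0, 1.0, 0.0, 1.0]),
    "epilepsy": np.array([1.0, 0.0, 0.0, 1.0]),
}

best = three_cos_add(toy, "asthma", ["inhaled_corticosteroids", "inhaled_corticosteroid"])
print(best[0][0])
```

With real embeddings the candidates returned this way would still need the mapping and validation steps of Stages 2 and 3; the sketch covers only the vector arithmetic of Stage 1.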

Results

We obtained 5352 n-gram pairs from 446 search queries by applying the 3CosAdd. The microaveraging performance of MetaMap for candidate y belonging to the semantic field treatment was F-measure=80.00% (precision=77.00%, recall=83.25%). We developed an empirical heuristic with some predictive power for clinical winners, that is, search queries bringing candidate y with evidence of a therapeutic intent for target disease x. The search queries -asthma +inhaled_corticosteroids +inhaled_corticosteroid and -epilepsy +valproate +antiepileptic_drug were clinical winners, finding eight evidence-based beneficial treatments.

Conclusions

Extracting treatments with therapeutic intent by analogical reasoning from embeddings (423K n-grams from the PMSB dataset) is an ambitious goal. Our SemDeep approach is knowledge-based, underpinned by embedding analogies that exploit prior knowledge. Biomedical facts from embedding analogies (4-term type, not pairwise) are potentially useful for clinicians. The heuristic offers a practical way to discover beneficial treatments for well-known diseases. Learning from deep learning models does not require a massive amount of data. Embedding analogies are not limited to pairwise analogies; hence, analogical reasoning with embeddings is underexploited.

Nédellec C, Ibanescu L, Bossy R, Sourdille P. WTO, an ontology for wheat traits and phenotypes in scientific publications. Genomics Inform 2020;18:e14. [PMID: 32634868 PMCID: PMC7362939 DOI: 10.5808/gi.2020.18.2.e14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2020] [Revised: 06/09/2020] [Accepted: 06/10/2020] [Indexed: 11/20/2022] Open

Abriata LA. Building blocks for commodity augmented reality-based molecular visualization and modeling in web browsers. PeerJ Comput Sci 2020;6:e260. [PMID: 33816912 PMCID: PMC7924717 DOI: 10.7717/peerj-cs.260] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2019] [Accepted: 01/22/2020] [Indexed: 06/12/2023]

Althubaiti S, Kafkas Ş, Abdelhakim M, Hoehndorf R. Combining lexical and context features for automatic ontology extension. J Biomed Semantics 2020;11:1. [PMID: 31931870 PMCID: PMC6958746 DOI: 10.1186/s13326-019-0218-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2018] [Accepted: 12/24/2019] [Indexed: 11/10/2022] Open

Oh J, Bae H, Kim CE. Construction And Analysis Of The Time-Evolving Pain-Related Brain Network Using Literature Mining. J Pain Res 2019;12:2891-2903. [PMID: 31802931 PMCID: PMC6801488 DOI: 10.2147/jpr.s217036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2019] [Accepted: 09/17/2019] [Indexed: 11/23/2022] Open

Abstract

Purpose

We aimed to quantitatively investigate how the neuroscience field has developed over time in terms of its concept of how pain is represented in the brain, and to compare the research trends of pain with those of mental disorders through literature mining of accumulated published articles.

Methods

The abstracts and publication years of 137,525 pain-related articles were retrieved from the PubMed database. We defined 22 pain-related brain regions that appeared more than 100 times in the retrieved abstracts. Time-evolving networks of pain-related brain regions were constructed using the co-occurrence frequency. The state-space model was implemented to capture the trend patterns of the pain-related brain regions and the patterns were compared with those of mental disorders.
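As a rough illustration of how co-occurrence frequencies can be turned into time-sliced networks, the sketch below counts, per publication year, how often two region names appear in the same abstract. The function name `cooccurrence_by_year`, the simple substring matching, and the four toy records are assumptions made for this example; the study's actual term matching and its state-space model are not reproduced here.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence_by_year(records, regions):
    """Return {year: Counter({(region_a, region_b): n_abstracts})}."""
    counts = defaultdict(Counter)
    for year, abstract in records:
        text = abstract.lower()
        # Regions mentioned in this abstract, sorted so each pair has one canonical order.
        present = sorted(r for r in regions if r in text)
        for pair in combinations(present, 2):
            counts[year][pair] += 1
    return counts

# Hypothetical records: (publication year, abstract text).
records = [
    (1995, "Thalamus and brain stem responses to noxious stimuli."),
    (1995, "Brain stem modulation of pain."),
    (2015, "Insula and thalamus activity during chronic pain."),
    (2015, "Insula, amygdala and thalamus in pain affect."),
]
regions = ["thalamus", "brain stem", "insula", "amygdala"]

edges = cooccurrence_by_year(records, regions)
for year in sorted(edges):
    print(year, dict(edges[year]))
```

Each per-year counter can be read as a weighted edge list, so comparing counters across years shows which region pairs gain or lose prominence.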

Results

The number of pain-related abstracts including brain areas steadily increased; however, the relative frequency of each brain region showed different patterns. According to the chronological patterns of relative frequencies, pain-related brain regions were clustered into three groups: rising, falling, and consistent. The network of pain-related brain regions extended over time from localized regions (mainly including brain stem and diencephalon) to wider cortical/subcortical regions. In the state-space model, the relative frequency trajectory of pain-related brain regions gradually became closer to that of mental disorder-related brain regions.

Conclusion

Temporal changes of pain-related brain regions in the abstracts indicate that emotional/cognitive aspects of pain have been gradually emphasized. The networks of pain-related brain regions imply perspective changes on pain from the simple percept to the multidimensional experience. Based on the notable occurrence patterns of the cerebellum and motor cortex, we suggest that motor-related areas will be actively explored in pain studies.

A universal information theoretic approach to the identification of stopwords. Nat Mach Intell 2019. [DOI: 10.1038/s42256-019-0112-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]

Zhou W, Shao F, Li J. Bioinformatic analysis of the molecular mechanism underlying bronchial pulmonary dysplasia using a text mining approach. Medicine (Baltimore) 2019;98:e18493. [PMID: 31876736 PMCID: PMC6946243 DOI: 10.1097/md.0000000000018493] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/26/2022] Open

Arguello-Casteleiro M, Stevens R, Des-Diz J, Wroe C, Fernandez-Prieto MJ, Maroto N, Maseda-Fernandez D, Demetriou G, Peters S, Noble PJM, Jones PH, Dukes-McEwan J, Radford AD, Keane J, Nenadic G. Exploring semantic deep learning for building reliable and reusable one health knowledge from PubMed systematic reviews and veterinary clinical notes. J Biomed Semantics 2019;10:22. [PMID: 31711540 PMCID: PMC6849172 DOI: 10.1186/s13326-019-0212-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023] Open

Abstract

Background

Deep Learning opens up opportunities for routinely scanning large bodies of biomedical literature and clinical narratives to represent the meaning of biomedical and clinical terms. However, the validation and integration of this knowledge at scale requires cross checking with ground truths (i.e. evidence-based resources) that are unavailable in an actionable or computable form. In this paper we explore how to turn information about diagnoses, prognoses, therapies and other clinical concepts into computable knowledge using free-text data about human and animal health. We used a Semantic Deep Learning approach that combines the Semantic Web technologies and Deep Learning to acquire and validate knowledge about 11 well-known medical conditions mined from two sets of unstructured free-text data: 300 K PubMed Systematic Review articles (the PMSB dataset) and 2.5 M veterinary clinical notes (the VetCN dataset). For each target condition we obtained 20 related clinical concepts using two deep learning methods applied separately on the two datasets, resulting in 880 term pairs (target term, candidate term). Each concept, represented by an n-gram, is mapped to UMLS using MetaMap; we also developed a bespoke method for mapping short forms (e.g. abbreviations and acronyms). Existing ontologies were used to formally represent associations. We also created ontological modules and illustrate how the extracted knowledge can be queried. The evaluation was performed using the content within BMJ Best Practice.

Results

MetaMap achieves an F measure of 88% (precision 85%, recall 91%) when applied directly to the total of 613 unique candidate terms for the 880 term pairs. When the processing of short forms is included, MetaMap achieves an F measure of 94% (precision 92%, recall 96%). Validation of the term pairs with BMJ Best Practice yields precision between 98 and 99%.
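The reported F measures can be checked against the stated precision and recall. The short sketch below assumes the F measure is the balanced F1 score, that is, the harmonic mean of precision and recall; the abstract does not state which weighting was used, so this is an assumption, and the function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs as reported: MetaMap alone, then with short-form processing.
reported = [(0.85, 0.91), (0.92, 0.96)]
for p, r in reported:
    print(f"precision={p:.2f} recall={r:.2f} F={f_measure(p, r):.2%}")
```

Under that assumption, both values round to the figures given in the abstract (88% and 94%).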

Conclusions

The Semantic Deep Learning approach can transform neural embeddings built from unstructured free-text data into reliable and reusable One Health knowledge using ontologies and content from BMJ Best Practice.

Desterke C, Chiappini F. Lipid Related Genes Altered in NASH Connect Inflammation in Liver Pathogenesis Progression to HCC: A Canonical Pathway. Int J Mol Sci 2019;20:ijms20225594. [PMID: 31717414 PMCID: PMC6888337 DOI: 10.3390/ijms20225594] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2019] [Revised: 11/03/2019] [Accepted: 11/04/2019] [Indexed: 02/06/2023] Open

ProtFus: A Comprehensive Method Characterizing Protein-Protein Interactions of Fusion Proteins. PLoS Comput Biol 2019;15:e1007239. [PMID: 31437145 PMCID: PMC6705771 DOI: 10.1371/journal.pcbi.1007239] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2018] [Accepted: 07/03/2019] [Indexed: 01/10/2023] Open

Abstract

Tailored therapy aims to cure cancer patients effectively and safely, based on the complex interactions between patients' genomic features, disease pathology and drug metabolism. Thus, the continual increase in scientific literature drives the need for efficient methods of data mining to improve the extraction of useful information from texts based on patients' genomic features. An important application of text mining to tailored therapy in cancer encompasses the use of mutations and cancer fusion genes as moieties that change patients' cellular networks to develop cancer, and also affect drug metabolism. Fusion proteins, which are derived from the slippage of two parental genes, are produced in cancer by chromosomal aberrations and trans-splicing. Given that the two parental proteins for predicted fusion proteins are known, we used our previously developed method for identifying chimeric protein-protein interactions (ChiPPIs) associated with the fusion proteins. Here, we present a validation approach that receives fusion proteins of interest, predicts their cellular network alterations by ChiPPI and validates them by our new method, ProtFus, using an online literature search. This process resulted in a set of 358 fusion proteins and their corresponding protein interactions, as a training set for a Naïve Bayes classifier, to identify predicted fusion proteins that have reliable evidence in the literature and that were confirmed experimentally. Next, for a test group of 1817 fusion proteins, we were able to identify from the literature 2908 PPIs in total, across 18 cancer types. The described method, ProtFus, can be used for screening the literature to identify unique cases of fusion proteins and their PPIs, as means of studying alterations of protein networks in cancers. Availability: http://protfus.md.biu.ac.il/.

García del Valle EP, Lagunes García G, Prieto Santamaría L, Zanin M, Menasalvas Ruiz E, Rodríguez-González A. Disease networks and their contribution to disease understanding: A review of their evolution, techniques and data sources. J Biomed Inform 2019;94:103206. [DOI: 10.1016/j.jbi.2019.103206] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2019] [Revised: 04/14/2019] [Accepted: 05/06/2019] [Indexed: 12/14/2022]

Essack M, Salhi A, Stanimirovic J, Tifratene F, Bin Raies A, Hungler A, Uludag M, Van Neste C, Trpkovic A, Bajic VP, Bajic VB, Isenovic ER. Literature-Based Enrichment Insights into Redox Control of Vascular Biology. Oxid Med Cell Longev 2019;2019:1769437. [PMID: 31223421 PMCID: PMC6542245 DOI: 10.1155/2019/1769437] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/07/2019] [Revised: 04/11/2019] [Accepted: 05/02/2019] [Indexed: 02/07/2023]

Demenkov PS, Saik OV, Ivanisenko TV, Kolchanov NA, Kochetov AV, Ivanisenko VA. Prioritization of potato genes involved in the formation of agronomically valuable traits using the SOLANUM TUBEROSUM knowledge base. Vavilovskii Zhurnal Genet Selektsii 2019. [DOI: 10.18699/vj19.501] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open

Abstract

The development of highly efficient technologies in genomics, transcriptomics, proteomics and metabolomics, as well as new technologies in agriculture, has led to an “information explosion” in plant biology and crop production, including potato production. Only a small part of the information reaches formalized databases (for example, UniProt, NCBI Gene, BioGRID, IntAct, etc.). One of the main sources of reliable biological data is the scientific literature. The well-known PubMed database contains more than 18 thousand abstracts of articles on potato. The effective use of the knowledge presented in such a number of non-formalized natural-language documents requires modern intelligent methods of analysis. However, in the literature, there is no evidence of widespread use of intelligent methods for automatically extracting knowledge from scientific publications on crops such as potato. Earlier we developed the SOLANUM TUBEROSUM knowledge base (http://www-bionet.sysbio.cytogen.ru/and/plant/). Information integrated into the knowledge base about the molecular genetic mechanisms underlying the selection of significant traits helps to accelerate the identification of candidate genes for potato breeding traits and the development of diagnostic markers for breeding. The article searches for new potential participants in the molecular genetic mechanisms of resistance to adverse factors in plants. Prioritization of candidate genes has shown that the PHYA, GF14, CNIH1, RCI1A, ABI5, CPK1, RGS1, NHL3, GRF8, and CYP21-4 genes are the most promising for further testing of their relationships with resistance to adverse factors. The analysis showed that the molecular genetic relationships responsible for the formation of significant agricultural traits are complex and include many direct and indirect interactions. The construction of associative gene networks and their analysis using the SOLANUM TUBEROSUM knowledge base is the basis for searching for target genes for targeted mutagenesis and marker-oriented selection of potato varieties with valuable agricultural characteristics.

Karim MR, Michel A, Zappa A, Baranov P, Sahay R, Rebholz-Schuhmann D. Improving data workflow systems with cloud services and use of open data for bioinformatics research. Brief Bioinform 2019;19:1035-1050. [PMID: 28419324 PMCID: PMC6169675 DOI: 10.1093/bib/bbx039] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2016] [Indexed: 11/22/2022] Open

Saqi M, Lysenko A, Guo YK, Tsunoda T, Auffray C. Navigating the disease landscape: knowledge representations for contextualizing molecular signatures. Brief Bioinform 2019;20:609-623. [PMID: 29684165 PMCID: PMC6556902 DOI: 10.1093/bib/bby025] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2017] [Revised: 02/05/2018] [Indexed: 12/14/2022] Open

Inferring Drug-Protein-Side Effect Relationships from Biomedical Text. Genes (Basel) 2019;10:genes10020159. [PMID: 30791472 PMCID: PMC6409686 DOI: 10.3390/genes10020159] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2019] [Revised: 02/13/2019] [Accepted: 02/14/2019] [Indexed: 11/16/2022] Open

Abstract

Background: Although there are many studies of drugs and their side effects, the underlying mechanisms of these side effects are not well understood. It is also difficult to understand the specific pathways between drugs and side effects. Objective: The present study seeks to construct putative paths between drugs and their side effects by applying text-mining techniques to free text of biomedical studies, and to develop ranking metrics that could identify the most-likely paths. Materials and Methods: We extracted three types of relationships (drug-protein, protein-protein, and protein-side effect) from biomedical texts by using text mining and predefined relation-extraction rules. Based on the extracted relationships, we constructed whole drug-protein-side effect paths. For each path, we calculated its ranking score by a new ranking function that combines corpus- and ontology-based semantic similarity as well as co-occurrence frequency. Results: We extracted 13 plausible biomedical paths connecting drugs and their side effects from cancer-related abstracts in the PubMed database. The top 20 paths were examined, and the proposed ranking function outperformed the other methods tested, including co-occurrence, COALS, and UMLS by P@5-P@20. In addition, we confirmed that the paths are novel hypotheses that are worth investigating further. Discussion: The risk of side effects has been an important issue for the US Food and Drug Administration (FDA). However, the causes and mechanisms of such side effects have not been fully elucidated. This study extends previous research on understanding drug side effects by using various techniques such as Named Entity Recognition (NER), Relation Extraction (RE), and semantic similarity. Conclusion: It is not easy to reveal the biomedical mechanisms of side effects due to a huge number of possible paths. However, we automatically generated predictable paths using the proposed approach, which could provide meaningful information to biomedical researchers to generate plausible hypotheses for the understanding of such mechanisms.
