1
Arora S, Chettri S, Percha V, Kumar D, Latwal M. Artifical intelligence: a virtual chemist for natural product drug discovery. J Biomol Struct Dyn 2024; 42:3826-3835. [PMID: 37232451 DOI: 10.1080/07391102.2023.2216295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 05/12/2023] [Indexed: 05/27/2023]
Abstract
Nature is full of a bundle of medicinal substances and its product perceived as a prerogative structure to collaborate with protein drug targets. The natural product's (NPs) structure heterogeneity and eccentric characteristics inspired scientists to work on natural product-inspired medicine. To gear NP drug-finding artificial intelligence (AI) to confront and excavate unexplored opportunities. Natural product-inspired drug discoveries based on AI to act as an innovative tool for molecular design and lead discovery. Various models of machine learning produce quickly synthesizable mimetics of the natural products templates. The invention of novel natural products mimetics by computer-assisted technology provides a feasible strategy to get the natural product with defined bio-activities. AI's hit rate makes its high importance by improving trial patterns such as dose selection, trial life span, efficacy parameters, and biomarkers. Along these lines, AI methods can be a successful tool in a targeted way to formulate advanced medicinal applications for natural products. 'Prediction of future of natural product based drug discovery is not magic, actually it's artificial intelligence'. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Shefali Arora
  - Department of Chemistry, University of Petroleum and Energy Studies, Dehradun, Uttarakhand, India
- Sukanya Chettri
  - Department of Chemistry, University of Petroleum and Energy Studies, Dehradun, Uttarakhand, India
- Versha Percha
  - Department of Pharmaceutical Chemistry, Dolphin (PG) Institute of Biomedical and Natural Sciences, Dehradun, Uttarakhand, India
- Deepak Kumar
  - Department of Pharmaceutical Chemistry, Dolphin (PG) Institute of Biomedical and Natural Sciences, Dehradun, Uttarakhand, India
- Mamta Latwal
  - Department of Chemistry, University of Petroleum and Energy Studies, Dehradun, Uttarakhand, India
2
Chen J, Ikeda SI, Negishi K, Tsubota K, Kurihara T. Identification of Potential Therapeutic Targets for Myopic Choroidal Neovascularization via Discovery-Driven Data Mining. Curr Eye Res 2023; 48:1160-1169. [PMID: 37610842 DOI: 10.1080/02713683.2023.2252201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 07/29/2023] [Accepted: 08/21/2023] [Indexed: 08/25/2023]
Abstract
Purpose: Myopic choroidal neovascularization (mCNV) is a prevalent cause of vision loss. However, the development of effective therapeutic targets for mCNV has been hindered by the paucity of suitable animal models. Therefore, the aim of this study is to identify potential genes and pathways associated with mCNV and to unearth prospective therapeutic targets that can be utilized to devise efficacious treatments. Methods: Text data mining was used to identify genes linked to choroid, neovascularization, and myopia. g:Profiler was utilized to analyze the biological processes of gene ontology and the Reactome pathways. Protein interaction network analysis was performed using STRING and visualized in Cytoscape. MCODE and cytoHubba were used for further screening. Results: Discovery-driven text data mining identified 55 potential genes related to choroid, neovascularization, and myopia. Gene enrichment analysis revealed 11 biological processes and seven Reactome pathways. A protein-protein interaction network with 47 nodes was constructed and analyzed using centrality ranking. Key clusters were identified through algorithm tools. Finally, 14 genes (IL6, FGF2, MMP9, IL10, TNF, MMP2, HGF, MMP3, IGF1, CCL2, CTNNB1, BDNF, NGF, and EDN1), in addition to VEGFA, were evaluated as targets with potential as future therapeutics. Conclusions: This study provides new potential therapeutic targets for mCNV, including IL6, FGF2, MMP9, IL10, TNF, MMP2, HGF, MMP3, IGF1, CCL2, CTNNB1, BDNF, NGF, and EDN1, which correspond to seven potential enriched pathways. These findings provide a basis for further research and offer new possibilities for developing therapeutic interventions for this condition.
Affiliation(s)
- Junhan Chen
  - Laboratory of Photobiology, Keio University School of Medicine, Tokyo, Japan
  - Department of Ophthalmology, Keio University School of Medicine, Tokyo, Japan
- Shin-Ichi Ikeda
  - Laboratory of Photobiology, Keio University School of Medicine, Tokyo, Japan
  - Department of Ophthalmology, Keio University School of Medicine, Tokyo, Japan
- Kazuno Negishi
  - Department of Ophthalmology, Keio University School of Medicine, Tokyo, Japan
- Kazuo Tsubota
  - Department of Ophthalmology, Keio University School of Medicine, Tokyo, Japan
  - Tsubota Laboratory, Inc, Tokyo, Japan
- Toshihide Kurihara
  - Laboratory of Photobiology, Keio University School of Medicine, Tokyo, Japan
  - Department of Ophthalmology, Keio University School of Medicine, Tokyo, Japan
3
Buch AM, Vértes PE, Seidlitz J, Kim SH, Grosenick L, Liston C. Molecular and network-level mechanisms explaining individual differences in autism spectrum disorder. Nat Neurosci 2023; 26:650-663. [PMID: 36894656 DOI: 10.1038/s41593-023-01259-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 11/29/2021] [Accepted: 01/17/2023] [Indexed: 03/11/2023]
Abstract
The mechanisms underlying phenotypic heterogeneity in autism spectrum disorder (ASD) are not well understood. Using a large neuroimaging dataset, we identified three latent dimensions of functional brain network connectivity that predicted individual differences in ASD behaviors and were stable in cross-validation. Clustering along these three dimensions revealed four reproducible ASD subgroups with distinct functional connectivity alterations in ASD-related networks and clinical symptom profiles that were reproducible in an independent sample. By integrating neuroimaging data with normative gene expression data from two independent transcriptomic atlases, we found that within each subgroup, ASD-related functional connectivity was explained by regional differences in the expression of distinct ASD-related gene sets. These gene sets were differentially associated with distinct molecular signaling pathways involving immune and synapse function, G-protein-coupled receptor signaling, protein synthesis and other processes. Collectively, our findings delineate atypical connectivity patterns underlying different forms of ASD that implicate distinct molecular signaling mechanisms.
Affiliation(s)
- Amanda M Buch
  - Department of Psychiatry and Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- Petra E Vértes
  - Department of Psychiatry, University of Cambridge, Cambridge, UK
- Jakob Seidlitz
  - Department of Psychiatry, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Child and Adolescent Psychiatry and Behavioral Science, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- So Hyun Kim
  - Department of Psychiatry and Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Autism and the Developing Brain, Weill Cornell Medicine, White Plains, NY, USA
  - School of Psychology, Korea University, Seoul, South Korea
- Logan Grosenick
  - Department of Psychiatry and Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- Conor Liston
  - Department of Psychiatry and Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
4
Kontoghiorghes L, Colubi A. New metrics and tests for subject prevalence in documents based on topic modeling. Int J Approx Reason 2023. [DOI: 10.1016/j.ijar.2023.02.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
5
Scott-Fordsmand JJ, Amorim MJB. Using Machine Learning to make nanomaterials sustainable. Sci Total Environ 2023; 859:160303. [PMID: 36410486 DOI: 10.1016/j.scitotenv.2022.160303] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/22/2022] [Revised: 11/06/2022] [Accepted: 11/15/2022] [Indexed: 06/16/2023]
Abstract
Sustainable development is a key challenge for contemporary human societies; failure to achieve sustainability could threaten human survival. In this review article, we illustrate how Machine Learning (ML) could support more sustainable development, covering the basics of data gathering through each step of the Environmental Risk Assessment (ERA). The literature provides several examples showing how ML can be employed in most steps of a typical ERA. A key observation is that there is currently no clear guidance for using such autonomous technologies in ERAs or on which standards/checks are required. Steering thus seems to be the most important task for supporting the use of ML in the ERA of nano- and smart-materials. Resources should be devoted to developing a strategy for implementing ML in ERA with a strong emphasis on data foundations, methodologies, and the related sensitivities/uncertainties. We should recognise historical errors and biases (e.g., in data) to avoid embedding them during ML programming.
Affiliation(s)
- Mónica J B Amorim
  - Department of Biology & CESAM, University of Aveiro, 3810-193 Aveiro, Portugal
6
Gene Identification and Potential Drug Therapy for Drug-Resistant Melanoma with Bioinformatics and Deep Learning Technology. Dis Markers 2022; 2022:2461055. [PMID: 35915735 PMCID: PMC9338845 DOI: 10.1155/2022/2461055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Revised: 06/13/2022] [Accepted: 06/21/2022] [Indexed: 11/17/2022]
Abstract
Background. Melanomas are skin malignant tumors that arise from melanocytes which are primarily treated with surgery, chemotherapy, targeted therapy, immunotherapy, radiation therapy, etc. Targeted therapy is a promising approach to treating advanced melanomas, but resistance always occurs. This study is aimed at identifying the potential target genes and candidate drugs for drug-resistant melanoma effectively with computational methods. Methods. Identification of genes associated with drug-resistant melanomas was conducted using the text mining tool pubmed2ensembl. Further gene screening was carried out by GO and KEGG pathway enrichment analyses. The PPI network was constructed using STRING database and Cytoscape. GEPIA was used to perform the survival analysis and conduct the Kaplan-Meier curve. Drugs targeted at these genes were selected in Pharmaprojects. The binding affinity scores of drug-target interactions were predicted by DeepPurpose. Results. A total of 433 genes were found associated with drug-resistant melanomas by text mining. The most statistically differential functional enriched pathways of GO and KEGG analyses contained 348 genes, and 27 hub genes were further screened out by MCODE in Cytoscape. Six genes were identified with statistical differences after survival analysis and literature review. 16 candidate drugs targeted at hub genes were found by Pharmaprojects under our restrictions. Finally, 11 ERBB2-targeted drugs with top affinity scores were predicted by DeepPurpose, including 10 ERBB2 kinase inhibitors and 1 antibody-drug conjugate. Conclusion. Text mining and bioinformatics are valuable methods for gene identification in drug discovery. DeepPurpose is an efficient and operative deep learning tool for predicting the DTI and selecting the candidate drugs.
7
Ali F, Khan A, Muhammad SA, Abbas SQ, Hassan SSU, Bungau S. Genome-wide Meta-analysis Reveals New Gene Signatures and Potential Drug Targets of Hypertension. ACS Omega 2022; 7:22754-22772. [PMID: 35811894 PMCID: PMC9260904 DOI: 10.1021/acsomega.2c02277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/12/2022] [Accepted: 06/03/2022] [Indexed: 06/02/2023]
Abstract
The prevalence of hypertension reported around the world is increasing and is an important public health challenge. This study was designed to explore the disease's genetic variations and to identify new hypertension-related genes and target proteins. We analyzed 22 publicly available Affymetrix cDNA datasets of hypertension using an integrated system-level framework involving differential expression genetic (DEG) analysis, data mining, gene enrichment, protein-protein interaction, microRNA analysis, toxicogenomics, gene regulation, molecular docking, and simulation studies. We found potential DEGs after screening out the extracellular proteins. We studied the functional role of seven shortlisted DEGs (ADM, EDN1, ANGPTL4, NFIL3, MSR1, CEBPD, and USP8) in hypertension after disease gene curation analysis. The expression profiling and cluster analysis showed significant variations and enriched GO terms. hsa-miR-365a-3p, hsa-miR-2052, hsa-miR-3065-3p, hsa-miR-603, hsa-miR-7113-3p, hsa-miR-3923, and hsa-miR-524-5p were identified as hypertension-associated miRNA targets for each gene using computational algorithms. We found functional interactions of source DEGs with target and important gene signatures including EGFR, AGT, AVP, APOE, RHOA, SRC, APOB, STAT3, UBC, LPL, APOA1, and AKT1 associated with the disease. These DEGs are mainly involved in fatty acid metabolism, myometrial pathways, MAPK, and G-alpha signaling pathways linked with hypertension pathogenesis. We predicted significantly disordered regions of 71.2, 48.8, and 45.4% representing the mutation in the sequence of NFIL3, USP8, and ADM, respectively. Regulation of gene expression was performed to find upregulated genes. Molecular docking analysis was used to evaluate Food and Drug Administration-approved medicines against the four DEGs that were overexpressed. For each elevated target protein, the three best drug candidates were chosen. 
Furthermore, molecular dynamics (MD) simulation using the target's active sites for 100 ns was used to validate these 12 complexes after docking. This investigation establishes the worth of systems genetics for finding four possible genes as potential drug targets for hypertension. These network-based approaches are significant for finding genetic variant data, which will advance the understanding of how to hasten the identification of drug targets and improve the understanding regarding the treatment of hypertension.
Affiliation(s)
- Fawad Ali
  - Riphah Institute of Pharmaceutical Sciences, Riphah International University, Islamabad, 44000 Pakistan
  - Department of Pharmacy, Kohat University of Science and Technology, Kohat, 26000 Pakistan
- Arifullah Khan
  - Riphah Institute of Pharmaceutical Sciences, Riphah International University, Islamabad, 44000 Pakistan
- Syed Aun Muhammad
  - Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, 60800 Pakistan
- Syed Qamar Abbas
  - Department of Pharmacy, Sarhad University of Science and Technology, Peshawar 24840, Pakistan
- Syed Shams ul Hassan
  - Shanghai Key Laboratory for Molecular Engineering of Chiral Drugs, School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, PR China
  - Department of Natural Product Chemistry, School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, PR China
- Simona Bungau
  - Department of Pharmacy, Faculty of Medicine and Pharmacy, University of Oradea, 410028 Oradea, Romania
  - Doctoral School of Biological and Biomedical Sciences, University of Oradea, 410087 Oradea, Romania
8
Silva MC, Eugénio P, Faria D, Pesquita C. Ontologies and Knowledge Graphs in Oncology Research. Cancers (Basel) 2022; 14:1906. [PMID: 35454813 PMCID: PMC9029532 DOI: 10.3390/cancers14081906] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/08/2022] [Revised: 03/25/2022] [Accepted: 04/07/2022] [Indexed: 11/16/2022] Open
Abstract
The complexity of cancer research stems from leaning on several biomedical disciplines for relevant sources of data, many of which are complex in their own right. A holistic view of cancer—which is critical for precision medicine approaches—hinges on integrating a variety of heterogeneous data sources under a cohesive knowledge model, a role which biomedical ontologies can fill. This study reviews the application of ontologies and knowledge graphs in cancer research. In total, our review encompasses 141 published works, which we categorized under 14 hierarchical categories according to their usage of ontologies and knowledge graphs. We also review the most commonly used ontologies and newly developed ones. Our review highlights the growing traction of ontologies in biomedical research in general, and cancer research in particular. Ontologies enable data accessibility, interoperability and integration, support data analysis, facilitate data interpretation and data mining, and more recently, with the emergence of the knowledge graph paradigm, support the application of Artificial Intelligence methods to unlock new knowledge from a holistic view of the available large volumes of heterogeneous data.
9
Alshahrani M, Almansour A, Alkhaldi A, Thafar MA, Uludag M, Essack M, Hoehndorf R. Combining biomedical knowledge graphs and text to improve predictions for drug-target interactions and drug-indications. PeerJ 2022; 10:e13061. [PMID: 35402106 PMCID: PMC8988936 DOI: 10.7717/peerj.13061] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/28/2020] [Accepted: 02/13/2022] [Indexed: 01/11/2023] Open
Abstract
Biomedical knowledge is represented in structured databases and published in biomedical literature, and different computational approaches have been developed to exploit each type of information in predictive models. However, the information in structured databases and literature is often complementary. We developed a machine learning method that combines information from literature and databases to predict drug targets and indications. To effectively utilize information in published literature, we integrate knowledge graphs and published literature using named entity recognition and normalization before applying a machine learning model that utilizes the combination of graph and literature. We then use supervised machine learning to show the effects of combining features from biomedical knowledge and published literature on the prediction of drug targets and drug indications. We demonstrate that our approach using datasets for drug-target interactions and drug indications is scalable to large graphs and can be used to improve the ranking of targets and indications by exploiting features from either structured or unstructured information alone.
Affiliation(s)
- Mona Alshahrani
  - National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
- Abdullah Almansour
  - National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
- Asma Alkhaldi
  - National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
- Maha A. Thafar
  - College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Mahmut Uludag
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Magbubah Essack
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Robert Hoehndorf
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
10
Darling: A Web Application for Detecting Disease-Related Biomedical Entity Associations with Literature Mining. Biomolecules 2022; 12:520. [PMID: 35454109 PMCID: PMC9028073 DOI: 10.3390/biom12040520] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/01/2022] [Revised: 03/24/2022] [Accepted: 03/28/2022] [Indexed: 12/15/2022] Open
Abstract
Finding, exploring and filtering frequent sentence-based associations between a disease and a biomedical entity, co-mentioned in disease-related PubMed literature, is a challenge as the volume of publications increases. Darling is a web application that utilizes Named Entity Recognition to identify human-related biomedical terms in PubMed articles mentioned in OMIM, DisGeNET and Human Phenotype Ontology (HPO) disease records, and generates an interactive biomedical entity association network. Nodes in this network represent genes, proteins, chemicals, functions, tissues, diseases, environments and phenotypes. Users can search by identifiers, terms/entities or free text and explore the relevant abstracts in an annotated format.
11
Fisher JL, Jones EF, Flanary VL, Williams AS, Ramsey EJ, Lasseigne BN. Considerations and challenges for sex-aware drug repurposing. Biol Sex Differ 2022; 13:13. [PMID: 35337371 PMCID: PMC8949654 DOI: 10.1186/s13293-022-00420-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/11/2022] [Accepted: 03/06/2022] [Indexed: 01/09/2023] Open
Abstract
Sex differences are essential factors in disease etiology and manifestation in many diseases such as cardiovascular disease, cancer, and neurodegeneration [33]. The biological influence of sex differences (including genomic, epigenetic, hormonal, immunological, and metabolic differences between males and females) and the lack of biomedical studies considering sex differences in their study design has led to several policies. For example, the National Institute of Health's (NIH) sex as a biological variable (SABV) and Sex and Gender Equity in Research (SAGER) policies to motivate researchers to consider sex differences [204]. However, drug repurposing, a promising alternative to traditional drug discovery by identifying novel uses for FDA-approved drugs, lacks sex-aware methods that can improve the identification of drugs that have sex-specific responses [7, 11, 14, 33]. Sex-aware drug repurposing methods either select drug candidates that are more efficacious in one sex or deprioritize drug candidates based on if they are predicted to cause a sex-bias adverse event (SBAE), unintended therapeutic effects that are more likely to occur in one sex. Computational drug repurposing methods are encouraging approaches to develop for sex-aware drug repurposing because they can prioritize sex-specific drug candidates or SBAEs at lower cost and time than traditional drug discovery. Sex-aware methods currently exist for clinical, genomic, and transcriptomic information [1, 7, 155]. They have not expanded to other data types, such as DNA variation, which has been beneficial in other drug repurposing methods that do not consider sex [114]. Additionally, some sex-aware methods suffer from poorer performance because a disproportionate number of male and female samples are available to train computational methods [7]. However, there is development potential for several different categories (i.e., data mining, ligand binding predictions, molecular associations, and networks). 
Low-dimensional representations of molecular association and network approaches are also especially promising candidates for future sex-aware drug repurposing methodologies because they reduce the multiple hypothesis testing burden and capture sex-specific variation better than the other methods [151, 159]. Here we review how sex influences drug response, the current state of drug repurposing including with respect to sex-bias drug response, and how model organism study design choices influence drug repurposing validation.
Affiliation(s)
- Jennifer L. Fisher
  - Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294 USA
- Emma F. Jones
  - Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294 USA
- Victoria L. Flanary
  - Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294 USA
- Avery S. Williams
  - Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294 USA
- Elizabeth J. Ramsey
  - Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294 USA
- Brittany N. Lasseigne
  - Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294 USA
12
Restrepo S, ter Horst E, Zambrano JD, Gunn LH, Molina G, Salazar CA. Hierarchical Bayesian classification methods to identify topics by journal quartile with an application in biological sciences. Education for Information 2022. [DOI: 10.3233/efi-211546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
This manuscript builds on a novel, automatic, freely-available Bayesian approach to extract information in abstracts and titles to classify research topics by quartile. This approach is demonstrated for all N= 149,129 ISI-indexed publications in biological sciences journals during 2017. A Bayesian multinomial inverse regression approach is used to extract rankings of topics without the need of a pre-defined dictionary. Bigrams are used for extraction of research topics across manuscripts, and rankings of research topics are constructed by quartile. Worldwide and local results (e.g., comparison between two peer/aspirational research institutions in Colombia) are provided, and differences are explored both at the global and local levels. Some topics persist across quartiles, while the relevance of others is quartile-specific. Challenges in sustainable development appear as more prevalent in top quartile journals across institutions, while the two Colombian institutions favour plant and microorganism research. This approach can reduce information inequities, by allowing young/incipient researchers in biological sciences, especially within lower income countries or universities with limited resources, to freely assess the state of the literature and the relative likelihood of publication in higher impact journals by research topic. This can also serve institutions of higher education to identify missing research topics and areas of competitive advantage.
Affiliation(s)
- Laura H. Gunn
  - University of North Carolina at Charlotte & Imperial College London, USA
13
Zhang XC, Yi JC, Yang GP, Wu CK, Hou TJ, Cao DS. ABC-Net: a divide-and-conquer based deep learning architecture for SMILES recognition from molecular images. Brief Bioinform 2022; 23:6535678. [PMID: 35212357 DOI: 10.1093/bib/bbac033] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/07/2021] [Revised: 01/10/2022] [Accepted: 01/24/2022] [Indexed: 11/14/2022] Open
Abstract
Structural information for chemical compounds is often described by pictorial images in most scientific documents, which cannot be easily understood and manipulated by computers. This dilemma makes optical chemical structure recognition (OCSR) an essential tool for automatically mining knowledge from an enormous amount of literature. However, existing OCSR methods fall far short of our expectations for realistic requirements due to their poor recovery accuracy. In this paper, we developed a deep neural network model named ABC-Net (Atom and Bond Center Network) to predict graph structures directly. Based on the divide-and-conquer principle, we propose to model an atom or a bond as a single point in the center. In this way, we can leverage a fully convolutional neural network (CNN) to generate a series of heat-maps to identify these points and predict relevant properties, such as atom types, atom charges, bond types and other properties. Thus, the molecular structure can be recovered by assembling the detected atoms and bonds. Our approach integrates all the detection and property prediction tasks into a single fully CNN, which is scalable and capable of processing molecular images quite efficiently. Experimental results demonstrate that our method could achieve a significant improvement in recognition performance compared with publicly available tools. The proposed method could be considered as a promising solution to OCSR problems and a starting point for the acquisition of molecular information in the literature.
Affiliation(s)
- Xiao-Chen Zhang
- School of Computer Science, National University of Defense Technology, China
| | - Jia-Cai Yi
- School of Computer Science and Technology, National University of Defense Technology, China
| | - Guo-Ping Yang
- Center of Clinical Pharmacology, the Third Xiangya Hospital, Central South University, China
| | - Cheng-Kun Wu
- Institute for Quantum Information & State Key Laboratory of High-Performance Computing, College of Computer Science and Technology, National University of Defense Technology, China
| | - Ting-Jun Hou
- College of Pharmaceutical Sciences, Zhejiang University, China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, China
| |
|
14
|
Saldívar-González FI, Aldas-Bulos VD, Medina-Franco JL, Plisson F. Natural product drug discovery in the artificial intelligence era. Chem Sci 2022; 13:1526-1546. [PMID: 35282622 PMCID: PMC8827052 DOI: 10.1039/d1sc04471k] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Received: 08/13/2021] [Accepted: 12/10/2021] [Indexed: 12/19/2022]
Abstract
Natural products (NPs) are primarily recognized as privileged structures to interact with protein drug targets. Their unique characteristics and structural diversity continue to marvel scientists for developing NP-inspired medicines, even though the pharmaceutical industry has largely given up. High-performance computer hardware, extensive storage, accessible software and affordable online education have democratized the use of artificial intelligence (AI) in many sectors and research areas. The last decades have introduced natural language processing and machine learning algorithms, two subfields of AI, to tackle NP drug discovery challenges and open up opportunities. In this article, we review and discuss the rational applications of AI approaches developed to assist in discovering bioactive NPs and capturing the molecular "patterns" of these privileged structures for combinatorial design or target selectivity.
Affiliation(s)
- F I Saldívar-González
- DIFACQUIM Research Group, School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México Avenida Universidad 3000 04510 Mexico Mexico
| | - V D Aldas-Bulos
- Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del IPN Irapuato Guanajuato Mexico
| | - J L Medina-Franco
- DIFACQUIM Research Group, School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México Avenida Universidad 3000 04510 Mexico Mexico
| | - F Plisson
- CONACYT - Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del IPN Irapuato Guanajuato Mexico
| |
|
15
|
Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022; 23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022]
Abstract
In recent decades, exploring potential relationships between diseases has been an active research field. With the rapid accumulation of disease-related biomedical data, many computational methods and tools/platforms have been developed to reveal intrinsic relationships between diseases, which can provide useful insights into the study of complex diseases, e.g. understanding the molecular mechanisms of diseases and discovering new treatments. Human complex diseases involve both external phenotypic abnormalities and complex internal molecular mechanisms in organisms. Computational methods with different types of biomedical data from phenotype to genotype can evaluate disease-disease associations at different levels, providing a comprehensive perspective for understanding diseases. In this review, available biomedical data and databases for evaluating disease-disease associations are first summarized. Then, existing computational methods for disease-disease associations are reviewed and classified into five groups according to the biomedical data they use: disease semantic-based, phenotype-based, function-based, representation learning-based and text mining-based methods. Further, we summarize software tools/platforms for computation and analysis of disease-disease associations. Finally, we discuss and summarize research on disease-disease associations. This review provides a systematic overview of current disease association research, which could promote the development and applications of computational methods and tools/platforms for disease-disease associations.
Affiliation(s)
- Ju Xiang
- School of Computer Science and Engineering, Central South University, China
| | - Jiashuai Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
| | - Yichao Zhao
- School of Computer Science and Engineering, Central South University, China
| | - Fang-Xiang Wu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
| | - Min Li
- Division of Biomedical Engineering and Department of Mechanical Engineering at University of Saskatchewan, Saskatoon, Canada
| |
|
16
|
Donoghue T, Voytek B. Automated meta-analysis of the event-related potential (ERP) literature. Sci Rep 2022; 12:1867. [PMID: 35115622 PMCID: PMC8814144 DOI: 10.1038/s41598-022-05939-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/21/2021] [Accepted: 01/18/2022] [Indexed: 12/04/2022]
Abstract
Event-related potentials (ERPs) are a common approach for investigating the neural basis of cognition and disease. There exists a vast and growing literature of ERP-related articles, the scale of which motivates the need for efficient and systematic meta-analytic approaches for characterizing this research. Here we present an automated text-mining approach as a form of meta-analysis to examine the relationships between ERP terms, cognitive domains and clinical disorders. We curated dictionaries of terms, collected articles of interest, and measured co-occurrence probabilities in published articles between ERP components and cognitive and disorder terms. Collectively, this literature dataset allows for creating data-driven profiles for each ERP, examining key associations of each component, and comparing the similarity across components, ultimately allowing for characterizing patterns and associations between topics and components. Additionally, by examining large literature collections, novel analyses can be done, such as examining how ERPs of different latencies relate to different cognitive associations. This openly available dataset and project can be used both as a pedagogical tool, and as a method of inquiry into the previously hidden structure of the existing literature. This project also motivates the need for consistency in naming, and for developing a clear ontology of electrophysiological components.
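The co-occurrence measure described in this abstract can be sketched as a conditional frequency: of the articles that mention an ERP component, what fraction also mention a given association term? The sketch below is a minimal illustration, not the authors' code; the function names, the whole-word matching rule, and the four toy sentences are assumptions.

```python
import re
from itertools import product


def mentions(text, term):
    """True if `term` occurs in `text` as a whole word (case-insensitive)."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


def cooccurrence_probabilities(docs, components, associations):
    """Estimate P(association | component) as the share of documents
    mentioning the component that also mention the association term."""
    probs = {}
    for comp, assoc in product(components, associations):
        with_comp = [d for d in docs if mentions(d, comp)]
        if not with_comp:
            continue  # component never mentioned, probability undefined
        both = sum(1 for d in with_comp if mentions(d, assoc))
        probs[(comp, assoc)] = both / len(with_comp)
    return probs


# Invented toy "abstracts"; a real run would use collected article text.
docs = [
    "The P300 amplitude tracks attention in an oddball task.",
    "N400 effects reflect semantic processing in language.",
    "Reduced P300 amplitude was observed in schizophrenia.",
    "P300 latency and attention were examined in older adults.",
]
probs = cooccurrence_probabilities(docs, ["P300", "N400"], ["attention", "language"])
```

Comparing the resulting rows across components gives the kind of data-driven profile the abstract describes; curated term dictionaries with synonyms would replace the single-string terms used here.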
Affiliation(s)
- Thomas Donoghue
- Department of Cognitive Science, University of California, San Diego, La Jolla, USA.
| | - Bradley Voytek
- Department of Cognitive Science, University of California, San Diego, La Jolla, USA.,Neurosciences Graduate Program, University of California, San Diego, La Jolla, USA.,Halıcıoğlu Data Science Institute, University of California, San Diego, La Jolla, USA
| |
|
17
|
Yim WWY, Kurikawa Y, Mizushima N. An exploratory text analysis of the autophagy research field. Autophagy 2021; 18:1648-1661. [PMID: 34812110 PMCID: PMC9298454 DOI: 10.1080/15548627.2021.1995151] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022]
Abstract
After its discovery in the 1950s, the autophagy research field has seen its annual number of publications climb from tens to thousands. The ever-growing number of autophagy publications is a wealth of information but presents a challenge to researchers, especially those new to the field, who are looking for a general overview of the field to, for example, determine current topics of the field or formulate new hypotheses. Here, we employed text mining tools to extract research trends in the autophagy field, including those of genes, terms, and topics. The publication trend of the field can be separated into three phases. The exponential rise in publication number began in the last phase and is most likely spurred by a series of highly cited research papers published in previous phases. The exponential increase in papers has resulted in a larger variety of research topics, with the majority involving those that are directly physiologically relevant, such as disease and modulating autophagy. Our findings provide researchers a summary of the history of the autophagy research field and perhaps hints of what is to come. Abbreviations: 5Y-IF: 5-year impact factor; AIS: article influence score; EM: electron microscopy; HGNC: HUGO gene nomenclature committee; LDA: latent Dirichlet allocation; MeSH: medical subject headings; ncRNA: non-coding RNA.
Affiliation(s)
- Willa Wen-You Yim
- Department of Biochemistry and Molecular Biology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Yoshitaka Kurikawa
- Department of Biochemistry and Molecular Biology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Noboru Mizushima
- Department of Biochemistry and Molecular Biology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| |
|
18
|
Hypoglycemia, Vascular Disease and Cognitive Dysfunction in Diabetes: Insights from Text Mining-Based Reconstruction and Bioinformatics Analysis of the Gene Networks. Int J Mol Sci 2021; 22:ijms222212419. [PMID: 34830301 PMCID: PMC8620086 DOI: 10.3390/ijms222212419] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/20/2021] [Revised: 11/14/2021] [Accepted: 11/16/2021] [Indexed: 12/16/2022]
Abstract
Hypoglycemia has been recognized as a risk factor for diabetic vascular complications and cognitive decline, but the molecular mechanisms of the effect of hypoglycemia on target organs are not fully understood. In this work, gene networks of hypoglycemia and cardiovascular disease, diabetic retinopathy, diabetic nephropathy, diabetic neuropathy, cognitive decline, and Alzheimer's disease were reconstructed using ANDSystem, a text-mining-based tool. The gene network of hypoglycemia included 141 genes and 2467 interactions. Enrichment analysis of Gene Ontology (GO) biological processes showed that the regulation of insulin secretion, glucose homeostasis, apoptosis, nitric oxide biosynthesis, and cell signaling are significantly enriched for hypoglycemia. Among the network hubs, INS, IL6, LEP, TNF, IL1B, EGFR, and FOS had the highest betweenness centrality, while GPR142, MBOAT4, SLC5A4, IGFBP6, PPY, G6PC1, SLC2A2, GYS2, GCGR, and AQP7 demonstrated the highest cross-talk specificity. Hypoglycemia-related genes were overrepresented in the gene networks of diabetic complications and comorbidity; moreover, 14 genes were mutual for all studied disorders. Eleven GO biological processes (glucose homeostasis, nitric oxide biosynthesis, smooth muscle cell proliferation, ERK1 and ERK2 cascade, etc.) were overrepresented in all reconstructed networks. The obtained results expand our understanding of the molecular mechanisms underlying the deteriorating effects of hypoglycemia in diabetes-associated vascular disease and cognitive dysfunction.
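Betweenness centrality, the hub measure reported in this abstract, counts how often a node lies on shortest paths between other nodes. The sketch below implements Brandes' algorithm for a small unweighted, undirected graph. The gene symbols are taken from the abstract, but the edges are invented for illustration and do not come from ANDSystem or the paper's network.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph given as
    {node: set_of_neighbours}. Returns unnormalized betweenness scores."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack = []                          # nodes in order of non-decreasing distance
        preds = {v: [] for v in graph}      # shortest-path predecessors
        sigma = {v: 0 for v in graph}       # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                        # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}     # dependency of s on each node
        while stack:                        # accumulate from the farthest nodes back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical toy network: INS as a hub, with TNF bridging to IL1B.
toy_edges = [("INS", "IL6"), ("INS", "LEP"), ("INS", "TNF"), ("TNF", "IL1B")]
graph = {}
for a, b in toy_edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

scores = betweenness(graph)
```

In this toy graph INS scores highest because every path between the other genes passes through it, which mirrors why such nodes are read as hubs in a reconstructed gene network.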
|
19
|
Baltoumas FA, Zafeiropoulou S, Karatzas E, Paragkamian S, Thanati F, Iliopoulos I, Eliopoulos AG, Schneider R, Jensen LJ, Pafilis E, Pavlopoulos GA. OnTheFly 2.0: a text-mining web application for automated biomedical entity recognition, document annotation, network and functional enrichment analysis. NAR Genom Bioinform 2021; 3:lqab090. [PMID: 34632381 PMCID: PMC8494211 DOI: 10.1093/nargab/lqab090] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/14/2021] [Revised: 09/09/2021] [Accepted: 09/20/2021] [Indexed: 02/06/2023]
Abstract
Extracting and processing information from documents is of great importance as lots of experimental results and findings are stored in local files. Therefore, extracting and analyzing biomedical terms from such files in an automated way is absolutely necessary. In this article, we present OnTheFly2.0, a web application for extracting biomedical entities from individual files such as plain texts, office documents, PDF files or images. OnTheFly2.0 can generate informative summaries in popup windows containing knowledge related to the identified terms along with links to various databases. It uses the EXTRACT tagging service to perform named entity recognition (NER) for genes/proteins, chemical compounds, organisms, tissues, environments, diseases, phenotypes and gene ontology terms. Multiple files can be analyzed, whereas identified terms such as proteins or genes can be explored through functional enrichment analysis or be associated with diseases and PubMed entries. Finally, protein-protein and protein-chemical networks can be generated with the use of STRING and STITCH services. To demonstrate its capacity for knowledge discovery, we interrogated published meta-analyses of clinical biomarkers of severe COVID-19 and uncovered inflammatory and senescence pathways that impact disease pathogenesis. OnTheFly2.0 currently supports 197 species and is available at http://bib.fleming.gr:3838/OnTheFly/ and http://onthefly.pavlopouloslab.info.
Affiliation(s)
- Fotis A Baltoumas
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari 16672, Greece
| | - Sofia Zafeiropoulou
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari 16672, Greece
| | - Evangelos Karatzas
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari 16672, Greece
| | - Savvas Paragkamian
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes P.O. Box 2214, 71003 Heraklion, Crete, Greece
| | - Foteini Thanati
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari 16672, Greece
| | - Ioannis Iliopoulos
- Department of Basic Sciences, School of Medicine, University of Crete, Heraklion 71003, Crete, Greece
| | - Aristides G Eliopoulos
- Department of Biology, School of Medicine, National and Kapodistrian University of Athens, Athens, 70013, Greece
| | - Reinhard Schneider
- University of Luxembourg, Luxembourg Centre for Systems Biomedicine, Bioinformatics Core, Esch-sur-Alzette, L-4365, Luxembourg
| | - Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, 2200, Denmark
| | - Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes P.O. Box 2214, 71003 Heraklion, Crete, Greece
| | - Georgios A Pavlopoulos
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari 16672, Greece
| |
|
20
|
Delmas M, Filangi O, Paulhe N, Vinson F, Duperier C, Garrier W, Saunier PE, Pitarch Y, Jourdan F, Giacomoni F, Frainay C. FORUM: Building a Knowledge Graph from public databases and scientific literature to extract associations between chemicals and diseases. Bioinformatics 2021; 37:3896-3904. [PMID: 34478489 PMCID: PMC8570811 DOI: 10.1093/bioinformatics/btab627] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/02/2021] [Revised: 08/16/2021] [Accepted: 09/01/2021] [Indexed: 11/22/2022]
Abstract
Motivation: Metabolomics studies aim at reporting a metabolic signature (list of metabolites) related to a particular experimental condition. These signatures are instrumental in the identification of biomarkers or classification of individuals; however, their biological and physiological interpretation remains a challenge. To support this task, we introduce FORUM: a Knowledge Graph (KG) providing a semantic representation of relations between chemicals and biomedical concepts, built from a federation of life science databases and scientific literature repositories. Results: The use of a Semantic Web framework on biological data allows us to apply ontology-based reasoning to infer new relations between entities. We show that these new relations provide different levels of abstraction and could open the path to new hypotheses. We estimate the statistical relevance of each extracted relation, explicit or inferred, using an enrichment analysis, and instantiate them as new knowledge in the KG to support results interpretation and further inquiries. Availability and implementation: A web interface to browse and download the extracted relations, as well as a SPARQL endpoint to directly probe the whole FORUM KG, are available at https://forum-webapp.semantic-metabolomics.fr. The code needed to reproduce the triplestore is available at https://github.com/eMetaboHUB/Forum-DiseasesChem. Supplementary information: Supplementary data are available at Bioinformatics online.
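The abstract does not say which statistical test underlies its enrichment analysis. One common choice for this kind of question is a right-tailed hypergeometric test, sketched below under that assumption; the function name, argument names, and the example counts are illustrative and are not taken from FORUM.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Right-tailed hypergeometric p-value P(X >= k).

    N: total number of articles in the corpus
    K: articles mentioning the disease concept
    n: articles mentioning the chemical
    k: articles mentioning both
    """
    upper = min(n, K)
    # math.comb returns 0 when asked to choose more items than are available,
    # so impossible overlaps contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Invented counts: 10 articles in total, 3 on the disease, 3 on the chemical,
# and all 3 chemical articles also mention the disease.
p_all_overlap = enrichment_p_value(k=3, n=3, K=3, N=10)
p_no_overlap_needed = enrichment_p_value(k=0, n=3, K=3, N=10)
```

A small p-value suggests the chemical and the concept co-occur more often than chance would predict; with many chemical-concept pairs tested at once, a multiple-testing correction would normally follow.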
Affiliation(s)
- M Delmas
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, 31300, France
| | - O Filangi
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, Le Rheu, 35653, France
| | - N Paulhe
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, F-63000, France
| | - F Vinson
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, 31300, France
| | - C Duperier
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, F-63000, France
| | - W Garrier
- ISIMA, Campus des Cézeaux, Aubière, 63177, France
| | - P-E Saunier
- ISIMA, Campus des Cézeaux, Aubière, 63177, France
| | - Y Pitarch
- IRIT, Université de Toulouse, Cours Rose Dieng-Kuntz, Toulouse, 31400, France
| | - F Jourdan
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, 31300, France
| | - F Giacomoni
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, F-63000, France
| | - C Frainay
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, 31300, France
| |
|
21
|
Mann M, Kumar C, Zeng WF, Strauss MT. Artificial intelligence for proteomics and biomarker discovery. Cell Syst 2021; 12:759-770. [PMID: 34411543 DOI: 10.1016/j.cels.2021.06.006] [Citation(s) in RCA: 86] [Impact Index Per Article: 28.7] [Received: 03/04/2021] [Revised: 05/07/2021] [Accepted: 06/28/2021] [Indexed: 12/14/2022]
Abstract
There is an avalanche of biomedical data generation and a parallel expansion in computational capabilities to analyze and make sense of these data. Starting with genome sequencing and widely employed deep sequencing technologies, these trends have now taken hold in all omics disciplines and increasingly call for multi-omics integration as well as data interpretation by artificial intelligence technologies. Here, we focus on mass spectrometry (MS)-based proteomics and describe how machine learning and, in particular, deep learning now predicts experimental peptide measurements from amino acid sequences alone. This will dramatically improve the quality and reliability of analytical workflows because experimental results should agree with predictions in a multi-dimensional data landscape. Machine learning has also become central to biomarker discovery from proteomics data, which now starts to outperform existing best-in-class assays. Finally, we discuss model transparency and explainability and data privacy that are required to deploy MS-based biomarkers in clinical settings.
Affiliation(s)
- Matthias Mann
- Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
| | - Chanchal Kumar
- Translational Science & Experimental Medicine, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden.
| | - Wen-Feng Zeng
- Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
| | | |
|
22
|
Chen Q, Leaman R, Allot A, Luo L, Wei CH, Yan S, Lu Z. Artificial Intelligence in Action: Addressing the COVID-19 Pandemic with Natural Language Processing. Annu Rev Biomed Data Sci 2021; 4:313-339. [PMID: 34465169 DOI: 10.1146/annurev-biodatasci-021821-061045] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 11/09/2022]
Abstract
The COVID-19 (coronavirus disease 2019) pandemic has had a significant impact on society, both because of the serious health effects of COVID-19 and because of public health measures implemented to slow its spread. Many of these difficulties are fundamentally information needs; attempts to address these needs have caused an information overload for both researchers and the public. Natural language processing (NLP), the branch of artificial intelligence that interprets human language, can be applied to address many of the information needs made urgent by the COVID-19 pandemic. This review surveys approximately 150 NLP studies and more than 50 systems and datasets addressing the COVID-19 pandemic. We detail work on four core NLP tasks: information retrieval, named entity recognition, literature-based discovery, and question answering. We also describe work that directly addresses aspects of the pandemic through four additional tasks: topic modeling, sentiment and emotion analysis, caseload forecasting, and misinformation detection. We conclude by discussing observable trends and remaining challenges.
Affiliation(s)
- Qingyu Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA;
| | - Robert Leaman
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA;
| | - Alexis Allot
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA;
| | - Ling Luo
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA;
| | - Chih-Hsuan Wei
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA;
| | - Shankai Yan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA;
| | - Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA;
| |
|
23
|
Ali I, Dreij K, Baker S, Högberg J, Korhonen A, Stenius U. Application of Text Mining in Risk Assessment of Chemical Mixtures: A Case Study of Polycyclic Aromatic Hydrocarbons (PAHs). ENVIRONMENTAL HEALTH PERSPECTIVES 2021; 129:67008. [PMID: 34165340 PMCID: PMC8318069 DOI: 10.1289/ehp6702] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/19/2019] [Revised: 05/07/2021] [Accepted: 05/10/2021] [Indexed: 05/08/2023]
Abstract
BACKGROUND: Cancer risk assessment of complex exposures, such as exposure to mixtures of polycyclic aromatic hydrocarbons (PAHs), is challenging due to the diverse biological activities of these compounds. With the help of text mining (TM), we have developed TM tools, namely the latest iteration of the Cancer Risk Assessment using Biomedical literature tool (CRAB3) and a Cancer Hallmarks Analytics Tool (CHAT), that could be useful for automatic literature analyses in cancer risk assessment and research. Although CRAB3 analyses are based on carcinogenic modes of action (MOAs) and cover almost all the key characteristics of carcinogens, CHAT evaluates literature according to the hallmarks of cancer referring to the alterations in cellular behavior that characterize the cancer cell. OBJECTIVES: The objective was to evaluate the usefulness of these tools to support cancer risk assessment by performing a case study of 22 European Union and U.S. Environmental Protection Agency priority PAHs and diesel exhaust and a case study of PAH interactions with silica. METHODS: We analyzed PubMed literature, comprising 57,498 references concerning priority PAHs and complex PAH mixtures, using CRAB3 and CHAT. RESULTS: CRAB3 analyses correctly identified similarities and differences in genotoxic and nongenotoxic MOAs of the 22 priority PAHs and grouped them according to their known carcinogenic potential. CHAT had the same capacity and complemented the CRAB output when comparing, for example, benzo[a]pyrene and dibenzo[a,l]pyrene. Both CRAB3 and CHAT analyses highlighted potentially interacting mechanisms within and across complex PAH mixtures and mechanisms of possible importance for interactions with silica. CONCLUSION: These data suggest that our TM approach can be useful in the hazard identification of PAHs and mixtures including PAHs. The tools can assist in grouping chemicals and identifying similarities and differences in carcinogenic MOAs and their interactions.
https://doi.org/10.1289/EHP6702.
Affiliation(s)
- Imran Ali
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
| | - Kristian Dreij
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
| | - Simon Baker
- Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK
| | - Johan Högberg
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
| | - Anna Korhonen
- Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK
| | - Ulla Stenius
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
| |
|
24
|
Pereira V, Cooper CL, Chandwani R, Varma A, Tarba SYY. Guest editorial. JOURNAL OF KNOWLEDGE MANAGEMENT 2021. [DOI: 10.1108/jkm-02-2021-0086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
|
25
|
Britt BC, Britt RK, Hayes JL, Panek ET, Maddox J, Musaev A. Oral Healthcare Implications of Dedicated Online Communities: A Computational Content Analysis of the r/Dentistry Subreddit. HEALTH COMMUNICATION 2021; 36:572-584. [PMID: 32091259 DOI: 10.1080/10410236.2020.1731937] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/10/2023]
Abstract
The current study explores communication expressed by participants in a subreddit surrounding oral health care, moderated by dentists and dental hygienists. The corpus was analyzed through Leximancer, a computer-assisted program used for computational content analyses of large data sets. Users' personal disclosures about ongoing dental concerns, advice about others' self-care, and the role of interpersonal communication with and among health care providers emerged as dominant themes. The findings suggest that online communities may serve an important role that dentists are unable to fill in their limited interactions with individual patients. Such interaction spaces may therefore offer a fertile environment for future interventions to promote beneficial practices and achieve positive health-related outcomes.
Affiliation(s)
- Brian C Britt
- Department of Advertising and Public Relations, University of Alabama
| | - Rebecca K Britt
- Department of Journalism and Creative Media, University of Alabama
| | - Jameson L Hayes
- Department of Advertising and Public Relations, University of Alabama
| | - Elliot T Panek
- Department of Journalism and Creative Media, University of Alabama
| | - Jessica Maddox
- Department of Journalism and Creative Media, University of Alabama
| | - Aibek Musaev
- Department of Computer Science, University of Alabama
| |
|
26
|
Danesh F, Dastani M, Ghorbani M. Retrospective and prospective approaches of coronavirus publications in the last half-century: a Latent Dirichlet allocation analysis. LIBRARY HI TECH 2021. [DOI: 10.1108/lht-09-2020-0216] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/04/2023]
Abstract
Purpose: The primary purpose of the present article is the topic modeling of global coronavirus publications from the last 50 years. Design/methodology/approach: The present study is applied research conducted using text mining. The statistical population is the coronavirus publications collected from the Web of Science Core Collection (1970–2020). The main keywords were extracted from the Medical Subject Heading browser to design the search strategy. Latent Dirichlet allocation and the Python programming language were applied to analyze the data and implement the text mining algorithms of topic modeling. Findings: The findings indicated that SARS, science, protein, MERS, veterinary, cell, human, RNA, medicine and virology are the most important keywords in the global coronavirus publications. Also, eight important topics were identified in the global coronavirus publications by implementing the topic modeling algorithm. The highest numbers of publications were, in order, on the following topics: “structure and proteomics,” “Cell signaling and immune response,” “clinical presentation and detection,” “Gene sequence and genomics,” “Diagnosis tests,” “vaccine and immune response and outbreak,” “Epidemiology and Transmission” and “gastrointestinal tissue.” Originality/value: The originality of this article can be considered in three ways. First, text mining and Latent Dirichlet allocation were applied to analyzing coronavirus literature for the first time. Second, coronavirus is mentioned as a hot topic of research. Finally, in addition to the retrospective approaches to 50 years of data collection and analysis, the results can be exploited with prospective approaches to strategic planning and macro-policymaking.
|
27
|
Turina P, Fariselli P, Capriotti E. ThermoScan: Semi-automatic Identification of Protein Stability Data From PubMed. Front Mol Biosci 2021; 8:620475. [PMID: 33842537 PMCID: PMC8027235 DOI: 10.3389/fmolb.2021.620475] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/23/2020] [Accepted: 02/18/2021] [Indexed: 11/13/2022]
Abstract
In recent years, the increasing number of DNA sequencing and protein mutagenesis studies has generated a large amount of variation data published in the biomedical literature. The collection of such data has been essential for the development and assessment of tools predicting the impact of protein variants at functional and structural levels. Nevertheless, the collection of manually curated data from the literature is a highly time-consuming and costly process that requires domain experts. In particular, the development of methods for predicting the effect of amino acid variants on protein stability relies on the thermodynamic data extracted from the literature. In the past, such data were deposited in the ProTherm database, which, however, has not been maintained since 2013. To facilitate the collection of protein thermodynamic data from the literature, we developed the semi-automatic tool ThermoScan. ThermoScan is a text mining approach for the identification of relevant thermodynamic data on protein stability from full-text articles. The method relies on a regular expression searching for groups of words, including the most common conceptual words appearing in experimental studies on protein stability, several thermodynamic variables, and their units of measure. ThermoScan analyzes full-text articles from the PubMed Central Open Access subset and calculates an empirical score that allows the identification of manuscripts reporting thermodynamic data on protein stability. The method was optimized on a set of publications included in the ProTherm database, and tested on a new curated set of articles, manually selected for the presence of thermodynamic data. The results show that ThermoScan returns accurate predictions and outperforms recently developed text-mining algorithms based on the analysis of publication abstracts. Availability: The ThermoScan server is freely accessible online at https://folding.biofold.org/thermoscan. The ThermoScan Python code and the Google Chrome extension for submitting visualized PMC web pages to the ThermoScan server are available at https://github.com/biofold/ThermoScan.
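The regular-expression idea described in this abstract (thermodynamic variables, their units, and conceptual words combined into a score) could be sketched roughly as follows. This is not the ThermoScan pattern or scoring function: the variable list, the unit list, the keyword list, the weights, and the names `THERMO`, `KEYWORDS` and `score_text` are all assumptions made for illustration.

```python
import re

# Hypothetical pattern (NOT ThermoScan's actual expression): a thermodynamic
# symbol, an optional linking word, a signed number, and a unit of measure.
THERMO = re.compile(
    r"(?P<var>ΔΔG|ΔG|ΔH|ΔCp|Tm)\s*(?:=|of|was|is)?\s*"
    r"(?P<value>[-+−]?\d+(?:\.\d+)?)\s*"
    r"(?P<unit>kcal/mol|kJ/mol|°C|K)\b"
)

# Hypothetical "conceptual words" typical of protein stability experiments.
KEYWORDS = ("unfolding", "denaturation", "stability", "calorimetry", "mutant")


def score_text(text: str) -> dict:
    """Extract (variable, value, unit) triples and compute a toy relevance score."""
    measurements = [
        # Normalise the Unicode minus sign before converting to float.
        (m["var"], float(m["value"].replace("−", "-")), m["unit"])
        for m in THERMO.finditer(text)
    ]
    lowered = text.lower()
    keyword_hits = sum(lowered.count(k) for k in KEYWORDS)
    # Assumed weighting: each numeric measurement counts twice as much as a keyword.
    return {
        "measurements": measurements,
        "keyword_hits": keyword_hits,
        "score": 2 * len(measurements) + keyword_hits,
    }
```

Calling `score_text` on a sentence that reports a melting temperature and a ΔΔG value returns the extracted triples together with the toy score; a real system would calibrate the weights and threshold on curated articles, as the abstract describes.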
Affiliation(s)
- Paola Turina
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Torino, Italy
- Emidio Capriotti
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
28
Tarasova OA, Biziukova NY, Rudik AV, Dmitriev AV, Filimonov DA, Poroikov VV. Extraction of Data on Parent Compounds and Their Metabolites from Texts of Scientific Abstracts. J Chem Inf Model 2021; 61:1683-1690. [PMID: 33724829 DOI: 10.1021/acs.jcim.0c01054] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
Abstract
The growing amount of experimental data on chemical objects includes properties of small molecules, results of studies of their interaction with human and animal proteins, and methods of synthesis of organic compounds (OCs). The data obtained can be used to identify the names of OCs automatically, including all possible synonyms and relevant data on the molecular properties and biological activity. Using different synonymous names of chemical compounds allows researchers to increase the completeness of data on their properties available from publications. Enriching the data on the names of chemical compounds with information about their possible metabolites can help estimate the biological effects of parent compounds and their metabolites more thoroughly. Therefore, automated extraction of the names of parent compounds and their metabolites from texts is an important task. In our study, we aimed to develop a method that extracts the named entities (NEs) of parent compounds and their metabolites from abstracts of scientific publications. Using the conditional random fields algorithm, we extracted the NEs of chemical compounds. We developed a set of rules allowing identification of parent compound NEs and their metabolites in the texts. We evaluated the possibility of extracting the names of potential metabolites based on cosine similarity between strings representing names of parent compounds and all other chemical NEs found in the text. Additionally, we used conditional random fields to fetch the names of parent compounds and their metabolites from the texts based on a manually labeled corpus of texts. Our computational experiments showed that using rules in combination with cosine similarity could increase the accuracy of recognition of the names of metabolites compared to the rule-based algorithm and the application of a machine-learning algorithm (conditional random fields).
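The cosine-similarity step mentioned in this abstract can be illustrated with a small sketch. The abstract does not say how the name strings are turned into vectors, so the character-trigram representation used here is an assumption, as are the function names `char_ngrams`, `cosine` and `rank_candidates` and the example compound names.

```python
from collections import Counter
from math import sqrt


def char_ngrams(name: str, n: int = 3) -> Counter:
    """Bag of overlapping character n-grams (assumed representation)."""
    s = name.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(parent: str, candidates: list[str], n: int = 3) -> list[tuple[float, str]]:
    """Rank chemical named entities by string similarity to the parent compound name."""
    pv = char_ngrams(parent, n)
    return sorted(((cosine(pv, char_ngrams(c, n)), c) for c in candidates), reverse=True)
```

For a parent name such as "morphine", names that embed it ("normorphine", "morphine-6-glucuronide") rank above unrelated names such as "aspirin". In the study this kind of score is combined with hand-written rules; the sketch covers only the similarity part.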
Affiliation(s)
- Olga A Tarasova
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow 119121, Russia
- Anastassia V Rudik
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow 119121, Russia
- Alexander V Dmitriev
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow 119121, Russia
- Dmitry A Filimonov
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow 119121, Russia
- Vladimir V Poroikov
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow 119121, Russia
29
Cheerkoot-Jalim S, Khedo KK. A systematic review of text mining approaches applied to various application areas in the biomedical domain. Journal of Knowledge Management 2020. [DOI: 10.1108/jkm-09-2019-0524] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/11/2022]
Abstract
Purpose
This work shows the results of a systematic literature review on biomedical text mining. The purpose of this study is to identify the different text mining approaches used in different application areas of the biomedical domain, the common tools used and the challenges of biomedical text mining as compared to generic text mining algorithms. This study will be of value to biomedical researchers by allowing them to correlate text mining approaches to specific biomedical application areas. Implications for future research are also discussed.
Design/methodology/approach
The review was conducted following the principles of the Kitchenham method. A number of research questions were first formulated, followed by the definition of the search strategy. The papers were then selected based on a list of assessment criteria. Each paper was analyzed, and information relevant to the research questions was extracted.
Findings
It was found that researchers have mostly harnessed data sources such as electronic health records, biomedical literature, social media and health-related forums. The most common text mining technique was natural language processing using tools such as MetaMap and Unstructured Information Management Architecture, alongside the use of medical terminologies such as Unified Medical Language System. The main application area was the detection of adverse drug events. Challenges identified included the need to deal with huge amounts of text, the heterogeneity of the different data sources, the duality of meaning of words in biomedical text and the amount of noise introduced mainly from social media and health-related forums.
Originality/value
To the best of the authors’ knowledge, other reviews in this area have focused on either specific techniques, specific application areas or specific data sources. The results of this review will help researchers to correlate the most relevant and recent advances in text mining approaches to specific biomedical application areas by providing an up-to-date and holistic view of work done in this research area. The use of emerging text mining techniques has great potential to spur the development of innovative applications, thus considerably advancing biomedical research.
30
Saik OV, Klimontov VV. Bioinformatic Reconstruction and Analysis of Gene Networks Related to Glucose Variability in Diabetes and Its Complications. Int J Mol Sci 2020; 21:ijms21228691. [PMID: 33217980 PMCID: PMC7698756 DOI: 10.3390/ijms21228691] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 10/14/2020] [Revised: 11/06/2020] [Accepted: 11/16/2020] [Indexed: 02/06/2023] Open
Abstract
Glucose variability (GV) has recently been recognized as a promoter of complications and as a therapeutic target in diabetes. The aim of this study was to reconstruct and analyze gene networks related to GV in diabetes and its complications. For network analysis, we used ANDSystem, which provides automatic network reconstruction and analysis based on text mining. The network of GV consisted of 37 genes/proteins associated with both hyperglycemia and hypoglycemia. The cardiovascular system, pancreas, adipose and muscle tissues, gastrointestinal tract, and kidney were recognized as the loci with the highest expression of GV-related genes. According to Gene Ontology enrichment analysis, these genes are associated with insulin secretion, glucose metabolism, glycogen biosynthesis, gluconeogenesis, MAPK and JAK-STAT cascades, protein kinase B signaling, cell proliferation, nitric oxide biosynthesis, etc. GV-related genes were found to occupy central positions in the networks of diabetes complications (cardiovascular disease, diabetic nephropathy, retinopathy, and neuropathy) and were associated with response to hypoxia. Gene prioritization analysis identified new gene candidates (THBS1, FN1, HSP90AA1, EGFR, MAPK1, STAT3, TP53, EGF, GSK3B, and PTEN) potentially involved in GV. The results expand the understanding of the molecular mechanisms of the GV phenomenon in diabetes and provide molecular markers and therapeutic targets for future research.
Affiliation(s)
- Olga V. Saik
- Laboratory of Endocrinology, Research Institute of Clinical and Experimental Lymphology—Branch of the Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (RICEL—Branch of IC&G SB RAS), 630060 Novosibirsk, Russia;
- Laboratory of Computer Proteomics, Federal Research Center Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences (ICG SB RAS), 630090 Novosibirsk, Russia
- Vadim V. Klimontov
- Laboratory of Endocrinology, Research Institute of Clinical and Experimental Lymphology—Branch of the Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (RICEL—Branch of IC&G SB RAS), 630060 Novosibirsk, Russia;
31
Gobeill J, Caucheteur D, Michel PA, Mottin L, Pasche E, Ruch P. SIB Literature Services: RESTful customizable search engines in biomedical literature, enriched with automatically mapped biomedical concepts. Nucleic Acids Res 2020; 48:W12-W16. [PMID: 32379317 PMCID: PMC7319474 DOI: 10.1093/nar/gkaa328] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/10/2020] [Revised: 04/09/2020] [Accepted: 04/22/2020] [Indexed: 01/05/2023] Open
Abstract
Thanks to recent efforts by the text mining community, biocurators now have access to plenty of good tools and Web interfaces for identifying and visualizing biomedical entities in literature. Yet, many of these systems start with a PubMed query, which is limited by strong Boolean constraints. Some semantic search engines exploit entities for Information Retrieval, and/or deliver relevance-based ranked results. Yet, they are not designed to support a specific curation workflow, and allow very limited control over the search process. The Swiss Institute of Bioinformatics Literature Services (SIBiLS) provide personalized Information Retrieval in the biological literature. Indeed, SIBiLS allow fully customizable search in semantically enriched contents, based on keywords and/or mapped biomedical entities from a growing set of standardized and legacy vocabularies. The services have been used and favourably evaluated to assist the curation of genes and gene products, by delivering customized literature triage engines to different curation teams. SIBiLS (https://candy.hesge.ch/SIBiLS) are freely accessible via REST APIs and are ready to empower any curation workflow, built on modern technologies scalable with big data: MongoDB and Elasticsearch. They cover MEDLINE and PubMed Central Open Access enriched by nearly 2 billion mapped biomedical entities, and are updated daily.
Affiliation(s)
- Julien Gobeill
- To whom correspondence should be addressed. Tel: +41 22 388 17 86; Fax: +41 22 546 97 38
- Déborah Caucheteur
- BiTeM group, Information Sciences, HES-SO / HEG Geneva, 1227 Carouge, Switzerland
- Pierre-André Michel
- SIB Text Mining group, Swiss Institute of Bioinformatics, 1206 Geneva, Switzerland
- Luc Mottin
- BiTeM group, Information Sciences, HES-SO / HEG Geneva, 1227 Carouge, Switzerland
- Emilie Pasche
- SIB Text Mining group, Swiss Institute of Bioinformatics, 1206 Geneva, Switzerland
- BiTeM group, Information Sciences, HES-SO / HEG Geneva, 1227 Carouge, Switzerland
- Patrick Ruch
- Correspondence may also be addressed to Patrick Ruch. Tel: +41 22 388 17 81; Fax: +41 22 546 97 38
32
Yan S, Wong KC. Context awareness and embedding for biomedical event extraction. Bioinformatics 2020; 36:637-643. [PMID: 31392318 DOI: 10.1093/bioinformatics/btz607] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/23/2018] [Revised: 07/26/2019] [Accepted: 08/06/2019] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION
Biomedical event extraction is fundamental for information extraction in molecular biology and biomedical research. The detected events form the central basis for comprehensive biomedical knowledge fusion, facilitating the digestion of the massive influx of information from the literature. Limited by the event context, the existing event detection models are mostly applicable to a single task. A general and scalable computational model is needed for biomedical knowledge management.
RESULTS
We propose a bottom-up detection framework to identify the events from recognized arguments. To capture the relations between the arguments, we trained a bidirectional long short-term memory network to model their context embedding. Leveraging the compositional attributes, we further derived the candidate samples for training event classifiers. We built our models on the datasets from the BioNLP Shared Task for evaluation. Our method achieved average F-scores of 0.81 and 0.92 on the BioNLPST-BGI and BioNLPST-BB datasets, respectively. Compared with seven state-of-the-art methods, our method nearly doubled the existing F-score performance (0.92 versus 0.56) on the BioNLPST-BB dataset. Case studies were conducted to reveal the underlying reasons.
AVAILABILITY AND IMPLEMENTATION
https://github.com/cskyan/evntextrc.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shankai Yan
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR 999077
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR 999077
33
Piereck B, Oliveira-Lima M, Benko-Iseppon AM, Diehl S, Schneider R, Brasileiro-Vidal AC, Barbosa-Silva A. LAITOR4HPC: A text mining pipeline based on HPC for building interaction networks. BMC Bioinformatics 2020; 21:365. [PMID: 32838742 PMCID: PMC7447576 DOI: 10.1186/s12859-020-03620-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/14/2019] [Accepted: 06/19/2020] [Indexed: 11/11/2022] Open
Abstract
Background
The number of published full-text articles has increased dramatically. Text mining tools constitute an essential approach to building biological networks, updating databases and providing annotation for new pathways. PESCADOR is an online web server based on the LAITOR and NLProt text mining tools, which retrieves protein-protein co-occurrences in a tabular format, adding a network schema. Here we present an HPC-oriented version of PESCADOR’s native text mining tool, renamed LAITOR4HPC, which aims to process an unlimited number of abstracts in a short time to enrich available networks, build new ones and possibly highlight whether fields of research have been exhaustively studied.
Results
By taking advantage of parallel computing HPC infrastructure, the full collection of MEDLINE abstracts available until June 2017 was analyzed in a shorter period (6 days) when compared to the original online implementation (with an estimated 2 years to run the same data). Additionally, three case studies were presented to illustrate LAITOR4HPC usage possibilities. The first case study targeted soybean and was used to retrieve an overview of published co-occurrences in a single organism, retrieving 15,788 proteins in 7894 co-occurrences. In the second case study, a target gene family was searched in many organisms, by analyzing 15 species under biotic stress. Most co-occurrences concerned Arabidopsis thaliana and Zea mays. The third case study concerned the construction and enrichment of an available pathway. Choosing A. thaliana for further analysis, the defensin pathway was enriched, showing additional signaling and regulation molecules, and how they respond to each other in the modulation of this complex plant defense response.
Conclusions
LAITOR4HPC can be used for efficient text-mining-based construction of biological networks derived from big data sources, such as MEDLINE abstracts. Time consumption and data input limitations will depend on the available resources at the HPC facility. LAITOR4HPC offers enough flexibility for different approaches and data amounts targeted to an organism, a subject, or a specific pathway. Additionally, it can deliver comprehensive results where interactions are classified into four types, according to their reliability.
Affiliation(s)
- Bruna Piereck
- Genetics Department, Laboratório de Genética e Biologia Vegetal, Universidade Federal de Pernambuco, Recife, Pernambuco, Brazil
- Marx Oliveira-Lima
- Genetics Department, Laboratório de Genética e Biologia Vegetal, Universidade Federal de Pernambuco, Recife, Pernambuco, Brazil
- Ana Maria Benko-Iseppon
- Genetics Department, Laboratório de Genética e Biologia Vegetal, Universidade Federal de Pernambuco, Recife, Pernambuco, Brazil
- Sarah Diehl
- University of Luxembourg, Luxembourg Centre for Systems Biomedicine, Bioinformatics Core, Esch-sur-Alzette, Luxembourg
- Reinhard Schneider
- University of Luxembourg, Luxembourg Centre for Systems Biomedicine, Bioinformatics Core, Esch-sur-Alzette, Luxembourg
- Ana Christina Brasileiro-Vidal
- Genetics Department, Laboratório de Genética e Biologia Vegetal, Universidade Federal de Pernambuco, Recife, Pernambuco, Brazil
- Adriano Barbosa-Silva
- University of Luxembourg, Luxembourg Centre for Systems Biomedicine, Bioinformatics Core, Esch-sur-Alzette, Luxembourg; Queen Mary University of London, Centre for Translational Bioinformatics, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Charterhouse Square, London, UK
34
Malvezzi H, Marengo EB, Podgaec S, Piccinato CDA. Endometriosis: current challenges in modeling a multifactorial disease of unknown etiology. J Transl Med 2020; 18:311. [PMID: 32787880 PMCID: PMC7425005 DOI: 10.1186/s12967-020-02471-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 05/01/2020] [Accepted: 07/28/2020] [Indexed: 02/07/2023] Open
Abstract
Endometriosis is a chronic inflammatory hormone-dependent condition associated with pelvic pain and infertility, characterized by the growth of ectopic endometrium outside the uterus. Given its still unknown etiology, treatments usually aim at diminishing pain and/or achieving pregnancy. Despite some progress in defining mode-of-action for drug development, the lack of reliable animal models indicates that novel approaches are required. The difficulties inherent to modeling endometriosis are related to its multifactorial nature, a condition that hinders the recreation of its pathology and the identification of clinically relevant metrics to assess drug efficacy. In this review, we report and comment on endometriosis models and how they have led to new therapies. We envision a roadmap for endometriosis research, integrating Artificial Intelligence, three-dimensional cultures and organ-on-chip models as ways to achieve a better understanding of physiopathological features and better tailored, effective treatments.
Affiliation(s)
- Helena Malvezzi
- Hospital Israelita Albert Einstein, São Paulo, SP 05652-900 Brazil
- Eliana Blini Marengo
- Instituto Butanta- EstabilidadeBiotech Quality Control, São Paulo, SP 05503-900 Brazil
- Sérgio Podgaec
- Hospital Israelita Albert Einstein, São Paulo, SP 05652-900 Brazil
35
Arguello Casteleiro M, Des Diz J, Maroto N, Fernandez Prieto MJ, Peters S, Wroe C, Sevillano Torrado C, Maseda Fernandez D, Stevens R. Semantic Deep Learning: Prior Knowledge and a Type of Four-Term Embedding Analogy to Acquire Treatments for Well-Known Diseases. JMIR Med Inform 2020; 8:e16948. [PMID: 32759099 PMCID: PMC7441383 DOI: 10.2196/16948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2019] [Revised: 02/27/2020] [Accepted: 02/27/2020] [Indexed: 11/13/2022] Open
Abstract
Background
How to treat a disease remains the most common type of clinical question. Obtaining evidence-based answers from biomedical literature is difficult. Analogical reasoning with embeddings from deep learning (embedding analogies) may extract such biomedical facts, although the state-of-the-art focuses on pair-based proportional (pairwise) analogies such as man:woman::king:queen (“queen = −man +king +woman”).
Objective
This study aimed to systematically extract disease treatment statements with a Semantic Deep Learning (SemDeep) approach underpinned by prior knowledge and another type of 4-term analogy (other than pairwise).
Methods
As preliminaries, we investigated Continuous Bag-of-Words (CBOW) embedding analogies in a common-English corpus with five lines of text and observed a type of 4-term analogy (not pairwise) applying the 3CosAdd formula and relating the semantic fields person and death: “dagger = −Romeo +die +died” (search query: −Romeo +die +died). Our SemDeep approach worked with pre-existing items of knowledge (what is known) to make inferences sanctioned by a 4-term analogy (search query −x +z1 +z2) from CBOW and Skip-gram embeddings created with a PubMed systematic reviews subset (PMSB dataset). Stage 1: Knowledge acquisition. Obtaining a set of terms, candidate y, from embeddings using vector arithmetic. Some n-gram pairs from the cosine and validated with evidence (prior knowledge) are the input for the 3CosAdd, seeking a type of 4-term analogy relating the semantic fields disease and treatment. Stage 2: Knowledge organization. Identification of candidates sanctioned by the analogy belonging to the semantic field treatment and mapping these candidates to Unified Medical Language System Metathesaurus concepts with MetaMap. A concept pair is a brief disease treatment statement (biomedical fact). Stage 3: Knowledge validation. An evidence-based evaluation followed by human validation of biomedical facts potentially useful for clinicians.
Results
We obtained 5352 n-gram pairs from 446 search queries by applying the 3CosAdd. The microaveraging performance of MetaMap for candidate y belonging to the semantic field treatment was F-measure=80.00% (precision=77.00%, recall=83.25%). We developed an empirical heuristic with some predictive power for clinical winners, that is, search queries bringing candidate y with evidence of a therapeutic intent for target disease x. The search queries -asthma +inhaled_corticosteroids +inhaled_corticosteroid and -epilepsy +valproate +antiepileptic_drug were clinical winners, finding eight evidence-based beneficial treatments.
Conclusions
Extracting treatments with therapeutic intent by analogical reasoning from embeddings (423K n-grams from the PMSB dataset) is an ambitious goal. Our SemDeep approach is knowledge-based, underpinned by embedding analogies that exploit prior knowledge. Biomedical facts from embedding analogies (4-term type, not pairwise) are potentially useful for clinicians. The heuristic offers a practical way to discover beneficial treatments for well-known diseases. Learning from deep learning models does not require a massive amount of data. Embedding analogies are not limited to pairwise analogies; hence, analogical reasoning with embeddings is underexploited.
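A search query of the form −x +z1 +z2 scored with 3CosAdd can be sketched as follows, assuming unit-normalised vectors and the usual additive form cos(y, z1) + cos(y, z2) − cos(y, x). The three-dimensional vectors in `TOY` are invented for illustration and are not embeddings from the PMSB dataset; the function names are likewise hypothetical.

```python
from math import sqrt


def unit(v):
    """Scale a vector to unit length so dot products equal cosines."""
    n = sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def cos(u, v):
    """Cosine of two unit vectors (plain dot product)."""
    return sum(a * b for a, b in zip(u, v))


def three_cos_add(emb, x, z1, z2, topn=3):
    """Rank candidates y for the query -x +z1 +z2, excluding the query terms."""
    e = {w: unit(v) for w, v in emb.items()}
    scored = [
        (cos(e[y], e[z1]) + cos(e[y], e[z2]) - cos(e[y], e[x]), y)
        for y in e
        if y not in (x, z1, z2)
    ]
    return sorted(scored, reverse=True)[:topn]


# Invented toy embeddings (assumption): axis 0 ~ disease, axis 1 ~ treatment.
TOY = {
    "asthma": (1.0, 0.0, 0.0),
    "inhaled_corticosteroids": (0.6, 0.8, 0.0),
    "inhaled_corticosteroid": (0.5, 0.8, 0.1),
    "budesonide": (0.3, 0.9, 0.1),
    "wheezing": (0.9, 0.1, 0.2),
    "placebo": (0.0, 0.3, 0.9),
}
```

With these toy vectors, `three_cos_add(TOY, "asthma", "inhaled_corticosteroids", "inhaled_corticosteroid")` ranks the treatment-like term first and the symptom-like term last, which is the behaviour the query is meant to elicit. Real embeddings would come from a trained CBOW or Skip-gram model.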
Affiliation(s)
- Nava Maroto
- Departamento de Lingüística Aplicada a la Ciencia y a la Tecnología, Universidad Politécnica de Madrid, Madrid, Spain
- Simon Peters
- School of Social Sciences, University of Manchester, Manchester, United Kingdom
- Robert Stevens
- Department of Computer Science, University of Manchester, Manchester, United Kingdom
36
Nédellec C, Ibanescu L, Bossy R, Sourdille P. WTO, an ontology for wheat traits and phenotypes in scientific publications. Genomics Inform 2020; 18:e14. [PMID: 32634868 PMCID: PMC7362939 DOI: 10.5808/gi.2020.18.2.e14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/27/2020] [Revised: 06/09/2020] [Accepted: 06/10/2020] [Indexed: 11/20/2022] Open
Abstract
Phenotyping is a major issue for wheat agriculture in meeting the challenges of adapting wheat varieties to climate change and reducing chemical inputs in crops. The need to improve the reuse of observations and experimental data has led to the creation of reference ontologies to standardize descriptions of phenotypes and to facilitate their comparison. The scientific literature is largely under-exploited, although extremely rich in phenotype descriptions associated with cultivars and genetic information. In this paper we propose the Wheat Trait Ontology (WTO), which is suitable for the extraction and management of scientific information from scientific papers, and its combination with data from genomic and experimental databases. We describe the principles of WTO construction and show examples of WTO use for the extraction and management of phenotype descriptions obtained from scientific documents.
Affiliation(s)
- Claire Nédellec
- Paris-Saclay University, INRAE, MaIAGE, F-78350 Jouy-en-Josas, France
- Liliana Ibanescu
- Paris-Saclay University, INRAE, UMR MIA-Paris, AgroParisTech, F-75005, Paris, France
- Robert Bossy
- Paris-Saclay University, INRAE, MaIAGE, F-78350 Jouy-en-Josas, France
- Pierre Sourdille
- University Clermont-Auvergne, INRAE, UMR 1095 GDEC, F-63000 Clermont-Ferrand, France
37
Abriata LA. Building blocks for commodity augmented reality-based molecular visualization and modeling in web browsers. PeerJ Comput Sci 2020; 6:e260. [PMID: 33816912 PMCID: PMC7924717 DOI: 10.7717/peerj-cs.260] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/11/2019] [Accepted: 01/22/2020] [Indexed: 06/12/2023]
Abstract
For years, immersive interfaces using virtual and augmented reality (AR) for molecular visualization and modeling have promised a revolution in the way we teach, learn, communicate and work in chemistry, structural biology and related areas. However, most tools available today for immersive modeling require specialized hardware and software, and are costly and cumbersome to set up. These limitations prevent wide use of immersive technologies in education and research centers in a standardized form, which in turn prevents large-scale testing of the actual effects of such technologies on learning and thinking processes. Here, I discuss building blocks for creating marker-based AR applications that run as web pages on regular computers, and explore how they can be exploited to develop web content for handling virtual molecular systems in commodity AR with no more than a webcam- and internet-enabled computer. Examples range from displaying molecules, electron microscopy maps and molecular orbitals with minimal amounts of HTML code, to incorporation of molecular mechanics, real-time estimation of experimental observables and other interactive resources using JavaScript. These web apps provide virtual alternatives to physical, plastic-made molecular modeling kits, where the computer augments the experience with information about spatial interactions, reactivity, energetics, etc. The ideas and prototypes introduced here should serve as starting points for building active content that everybody can utilize online at minimal cost, providing novel interactive pedagogic material in such an open way that it could enable mass-testing of the effect of immersive technologies on chemistry education.
Affiliation(s)
- Luciano A. Abriata
- École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
38
Althubaiti S, Kafkas Ş, Abdelhakim M, Hoehndorf R. Combining lexical and context features for automatic ontology extension. J Biomed Semantics 2020; 11:1. [PMID: 31931870 PMCID: PMC6958746 DOI: 10.1186/s13326-019-0218-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/02/2018] [Accepted: 12/24/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND
Ontologies are widely used across biology and biomedicine for the annotation of databases. Ontology development is often a manual, time-consuming, and expensive process. Automatic or semi-automatic identification of classes that can be added to an ontology can make ontology development more efficient.
RESULTS
We developed a method that uses machine learning and word embeddings to identify words and phrases that are used to refer to an ontology class in biomedical Europe PMC full-text articles. Once labels and synonyms of a class are known, we use machine learning to identify the super-classes of a class. For this purpose, we identify lexical term variants, use word embeddings to capture context information, and rely on automated reasoning over ontologies to generate features, and we use an artificial neural network as classifier. We demonstrate the utility of our approach in identifying terms that refer to diseases in the Human Disease Ontology and in distinguishing between different types of diseases.
CONCLUSIONS
Our method is capable of discovering labels that refer to a class in an ontology but are not present in the ontology, and it can identify whether a class should be a subclass of some high-level ontology classes. Our approach can therefore be used for the semi-automatic extension and quality control of ontologies. The algorithm, corpora and evaluation datasets are available at https://github.com/bio-ontology-research-group/ontology-extension.
Affiliation(s)
- Sara Althubaiti
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
| | - Şenay Kafkas
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
| | - Marwa Abdelhakim
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
| | - Robert Hoehndorf
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
| |
|
39
|
Oh J, Bae H, Kim CE. Construction And Analysis Of The Time-Evolving Pain-Related Brain Network Using Literature Mining. J Pain Res 2019; 12:2891-2903. [PMID: 31802931 PMCID: PMC6801488 DOI: 10.2147/jpr.s217036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/24/2019] [Accepted: 09/17/2019] [Indexed: 11/23/2022] Open
Abstract
Purpose We aimed to quantitatively investigate how the neuroscience field developed over time in terms of its concept on how pain is represented in the brain and compare the research trends of pain with those of mental disorders through literature mining of accumulated published articles. Methods The abstracts and publication years of 137,525 pain-related articles were retrieved from the PubMed database. We defined 22 pain-related brain regions that appeared more than 100 times in the retrieved abstracts. Time-evolving networks of pain-related brain regions were constructed using the co-occurrence frequency. The state-space model was implemented to capture the trend patterns of the pain-related brain regions and the patterns were compared with those of mental disorders. Results The number of pain-related abstracts including brain areas steadily increased; however, the relative frequency of each brain region showed different patterns. According to the chronological patterns of relative frequencies, pain-related brain regions were clustered into three groups: rising, falling, and consistent. The network of pain-related brain regions extended over time from localized regions (mainly including brain stem and diencephalon) to wider cortical/subcortical regions. In the state-space model, the relative frequency trajectory of pain-related brain regions gradually became closer to that of mental disorder-related brain regions. Conclusion Temporal changes of pain-related brain regions in the abstracts indicate that emotional/cognitive aspects of pain have been gradually emphasized. The networks of pain-related brain regions imply perspective changes on pain from the simple percept to the multidimensional experience. Based on the notable occurrence patterns of the cerebellum and motor cortex, we suggest that motor-related areas will be actively explored in pain studies.
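The co-occurrence step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `cooccurrence_by_year`, the plain substring matching, and the three toy records are all assumptions made for illustration. It counts, for each publication year, how often two brain-region names appear in the same abstract, which gives one edge-weighted network per year.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_by_year(records, regions):
    """Count, per publication year, how often two region names share an abstract.

    records: iterable of (year, abstract_text) pairs.
    regions: lowercase region names to look for (hypothetical dictionary).
    """
    networks = {}
    for year, abstract in records:
        text = abstract.lower()
        # Simple substring match; a real pipeline would need proper term matching.
        present = sorted(r for r in regions if r in text)
        counts = networks.setdefault(year, Counter())
        for pair in combinations(present, 2):
            counts[pair] += 1
    return networks


# Toy records, invented for illustration only.
records = [
    (1990, "Pain signals relay through the thalamus and brain stem."),
    (2015, "Insula and amygdala activity tracked pain unpleasantness."),
    (2015, "The amygdala, insula and thalamus responded to noxious heat."),
]
regions = ["thalamus", "brain stem", "insula", "amygdala"]
networks = cooccurrence_by_year(records, regions)
```

Each value in `networks` is a `Counter` keyed by an alphabetically ordered pair of regions, so comparing the counters of successive years shows how the network changes over time.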
Affiliation(s)
- Jihong Oh
- Department of Physiology, College of Korean Medicine, Gachon University, Seongnam 13120, Republic of Korea
| | - Hyojin Bae
- Department of Physiology, College of Korean Medicine, Gachon University, Seongnam 13120, Republic of Korea
| | - Chang-Eop Kim
- Department of Physiology, College of Korean Medicine, Gachon University, Seongnam 13120, Republic of Korea
| |
|
40
|
|
41
|
Zhou W, Shao F, Li J. Bioinformatic analysis of the molecular mechanism underlying bronchial pulmonary dysplasia using a text mining approach. Medicine (Baltimore) 2019; 98:e18493. [PMID: 31876736 PMCID: PMC6946243 DOI: 10.1097/md.0000000000018493] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022] Open
Abstract
Bronchopulmonary dysplasia (BPD) is a common disease of premature infants with very low birth weight. The mechanism is inconclusive. The aim of this study is to systematically explore BPD-related genes and characterize their functions. Natural language processing analysis was used to identify BPD-related genes. Gene data were extracted from PubMed database. Gene ontology, pathway, and network analysis were carried out, and the result was integrated with corresponding database. In this study, 216 genes were identified as BPD-related genes with P < .05, and 30 pathways were identified as significant. A network of BPD-related genes was also constructed with 17 hub genes identified. In particular, phosphatidyl inositol-3-enzyme-serine/threonine kinase signaling pathway involved the largest number of genes. Insulin was found to be a promising candidate gene related with BPD, suggesting that it may serve as an effective therapeutic target. Our data may help to better understand the molecular mechanisms underlying BPD. However, the mechanisms of BPD are elusive, and further studies are needed.
Affiliation(s)
- Weitao Zhou
- Department of Pediatrics, The First Affiliated Hospital of the University of Science and Technology of China
| | - Fei Shao
- Department of Oncology, Second Affiliated Hospital of Anhui Medical University, Hefei
| | - Jing Li
- Department of Pediatric Intensive Care Unit, Children's Hospital of Chongqing Medical University; Ministry of Education Key Laboratory of Child Development and Disorders; National Clinical Research Center for Child Health and Disorders; China International Science and Technology Cooperation base of Child Development and Critical Disorders
- Chongqing Key Laboratory of Pediatrics, Chongqing, China
| |
|
42
|
Arguello-Casteleiro M, Stevens R, Des-Diz J, Wroe C, Fernandez-Prieto MJ, Maroto N, Maseda-Fernandez D, Demetriou G, Peters S, Noble PJM, Jones PH, Dukes-McEwan J, Radford AD, Keane J, Nenadic G. Exploring semantic deep learning for building reliable and reusable one health knowledge from PubMed systematic reviews and veterinary clinical notes. J Biomed Semantics 2019; 10:22. [PMID: 31711540 PMCID: PMC6849172 DOI: 10.1186/s13326-019-0212-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND Deep Learning opens up opportunities for routinely scanning large bodies of biomedical literature and clinical narratives to represent the meaning of biomedical and clinical terms. However, the validation and integration of this knowledge on a scale requires cross checking with ground truths (i.e. evidence-based resources) that are unavailable in an actionable or computable form. In this paper we explore how to turn information about diagnoses, prognoses, therapies and other clinical concepts into computable knowledge using free-text data about human and animal health. We used a Semantic Deep Learning approach that combines the Semantic Web technologies and Deep Learning to acquire and validate knowledge about 11 well-known medical conditions mined from two sets of unstructured free-text data: 300 K PubMed Systematic Review articles (the PMSB dataset) and 2.5 M veterinary clinical notes (the VetCN dataset). For each target condition we obtained 20 related clinical concepts using two deep learning methods applied separately on the two datasets, resulting in 880 term pairs (target term, candidate term). Each concept, represented by an n-gram, is mapped to UMLS using MetaMap; we also developed a bespoke method for mapping short forms (e.g. abbreviations and acronyms). Existing ontologies were used to formally represent associations. We also create ontological modules and illustrate how the extracted knowledge can be queried. The evaluation was performed using the content within BMJ Best Practice. RESULTS MetaMap achieves an F measure of 88% (precision 85%, recall 91%) when applied directly to the total of 613 unique candidate terms for the 880 term pairs. When the processing of short forms is included, MetaMap achieves an F measure of 94% (precision 92%, recall 96%). Validation of the term pairs with BMJ Best Practice yields precision between 98 and 99%. 
CONCLUSIONS The Semantic Deep Learning approach can transform neural embeddings built from unstructured free-text data into reliable and reusable One Health knowledge using ontologies and content from BMJ Best Practice.
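The F measures reported in this abstract are consistent with the balanced F measure, the harmonic mean of precision and recall. That the authors used this balanced form is an assumption; the abstract does not state it. A short Python check, with the hypothetical helper `f_measure`, reproduces both figures from the quoted precision and recall values:

```python
def f_measure(precision, recall):
    """Balanced F measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract.
direct = f_measure(0.85, 0.91)            # MetaMap applied directly
with_short_forms = f_measure(0.92, 0.96)  # with short-form processing

print(round(direct * 100), round(with_short_forms * 100))  # prints: 88 94
```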
Affiliation(s)
| | - Robert Stevens
- School of Computer Science, University of Manchester, Manchester, UK
| | - Julio Des-Diz
- Hospital do Salnés, Villagarcía de Arousa, Pontevedra, Spain
| | | | | | - Nava Maroto
- Departamento de Lingüística Aplicada a la Ciencia y a la Tecnología, Universidad Politécnica de Madrid, Madrid, Spain
| | - Diego Maseda-Fernandez
- Midcheshire Hospital Foundation Trust, NHS England, Crewe, UK
- School of Medical Sciences, University of Manchester, Manchester, UK
| | - George Demetriou
- School of Computer Science, University of Manchester, Manchester, UK
| | - Simon Peters
- School of Social Sciences, University of Manchester, Manchester, UK
| | - Peter-John M Noble
- Small Animal Veterinary Surveillance Network, University of Liverpool, Liverpool, UK
| | - Phil H Jones
- Small Animal Veterinary Surveillance Network, University of Liverpool, Liverpool, UK
| | - Jo Dukes-McEwan
- Small Animal Teaching Hospital, University of Liverpool, Liverpool, UK
| | - Alan D Radford
- Small Animal Veterinary Surveillance Network, University of Liverpool, Liverpool, UK
| | - John Keane
- School of Computer Science, University of Manchester, Manchester, UK
- Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
| | - Goran Nenadic
- School of Computer Science, University of Manchester, Manchester, UK
- Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Health eResearch Centre, University of Manchester, Manchester, UK
| |
|
43
|
Desterke C, Chiappini F. Lipid Related Genes Altered in NASH Connect Inflammation in Liver Pathogenesis Progression to HCC: A Canonical Pathway. Int J Mol Sci 2019; 20:ijms20225594. [PMID: 31717414 PMCID: PMC6888337 DOI: 10.3390/ijms20225594] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 10/09/2019] [Revised: 11/03/2019] [Accepted: 11/04/2019] [Indexed: 02/06/2023] Open
Abstract
Nonalcoholic steatohepatitis (NASH) is becoming a public health problem worldwide. While the number of research studies on NASH progression rises every year, their findings are sometimes controversial. To identify the most important and commonly described findings related to NASH progression, we used an original bioinformatics, integrative, text-mining approach that combines PubMed database querying and available Gene Expression Omnibus data. We have identified a signature of 25 genes that are commonly found to be dysregulated during steatosis progression to NASH and cancer. These genes are implicated in lipid metabolism, insulin resistance, inflammation, and cancer. They are functionally connected, forming the basis necessary for steatosis progression to NASH and further progression to hepatocellular carcinoma (HCC). We also show that five of the identified genes have genome alterations present in HCC patients. Patients with genome alterations in these genes have a poor prognosis. In conclusion, using an integrative literature- and data-mining approach, we have identified and described a canonical pathway underlying progression of NASH. Other parameters (e.g., polymorphisms) that also contribute to the progression of the disease to cancer can be added to this pathway. This work improved our understanding of the molecular basis of NASH progression and will help to develop new therapeutic approaches.
Affiliation(s)
| | - Franck Chiappini
- Laboratoire Croissance, Régénération, Réparation et Régénération Tissulaires (CRRET)/ EAC CNRS 7149, Univ Paris-Est Créteil (UPEC), F-94010 Créteil, France
- Correspondence: ; Tel.: +33-(0)1-45177080; Fax: +33-(0)1-45171816
| |
|
44
|
ProtFus: A Comprehensive Method Characterizing Protein-Protein Interactions of Fusion Proteins. PLoS Comput Biol 2019; 15:e1007239. [PMID: 31437145 PMCID: PMC6705771 DOI: 10.1371/journal.pcbi.1007239] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/02/2018] [Accepted: 07/03/2019] [Indexed: 01/10/2023] Open
Abstract
Tailored therapy aims to cure cancer patients effectively and safely, based on the complex interactions between patients' genomic features, disease pathology and drug metabolism. Thus, the continual increase in scientific literature drives the need for efficient methods of data mining to improve the extraction of useful information from texts based on patients' genomic features. An important application of text mining to tailored therapy in cancer encompasses the use of mutations and cancer fusion genes as moieties that change patients' cellular networks to develop cancer, and also affect drug metabolism. Fusion proteins, which are derived from the slippage of two parental genes, are produced in cancer by chromosomal aberrations and trans-splicing. Given that the two parental proteins for predicted fusion proteins are known, we used our previously developed method for identifying chimeric protein-protein interactions (ChiPPIs) associated with the fusion proteins. Here, we present a validation approach that receives fusion proteins of interest, predicts their cellular network alterations by ChiPPI and validates them by our new method, ProtFus, using an online literature search. This process resulted in a set of 358 fusion proteins and their corresponding protein interactions, as a training set for a Naïve Bayes classifier, to identify predicted fusion proteins that have reliable evidence in the literature and that were confirmed experimentally. Next, for a test group of 1817 fusion proteins, we were able to identify from the literature 2908 PPIs in total, across 18 cancer types. The described method, ProtFus, can be used for screening the literature to identify unique cases of fusion proteins and their PPIs, as means of studying alterations of protein networks in cancers. Availability: http://protfus.md.biu.ac.il/.
|
45
|
García del Valle EP, Lagunes García G, Prieto Santamaría L, Zanin M, Menasalvas Ruiz E, Rodríguez-González A. Disease networks and their contribution to disease understanding: A review of their evolution, techniques and data sources. J Biomed Inform 2019; 94:103206. [DOI: 10.1016/j.jbi.2019.103206] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 01/08/2019] [Revised: 04/14/2019] [Accepted: 05/06/2019] [Indexed: 12/14/2022]
|
46
|
Essack M, Salhi A, Stanimirovic J, Tifratene F, Bin Raies A, Hungler A, Uludag M, Van Neste C, Trpkovic A, Bajic VP, Bajic VB, Isenovic ER. Literature-Based Enrichment Insights into Redox Control of Vascular Biology. OXIDATIVE MEDICINE AND CELLULAR LONGEVITY 2019; 2019:1769437. [PMID: 31223421 PMCID: PMC6542245 DOI: 10.1155/2019/1769437] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/07/2019] [Revised: 04/11/2019] [Accepted: 05/02/2019] [Indexed: 02/07/2023]
Abstract
In cellular physiology and signaling, reactive oxygen species (ROS) play one of the most critical roles. ROS overproduction leads to cellular oxidative stress. This may lead to an irrecoverable imbalance of redox (oxidation-reduction reaction) function that deregulates redox homeostasis, which itself could lead to several diseases including neurodegenerative disease, cardiovascular disease, and cancers. In this study, we focus on the redox effects related to vascular systems in mammals. To support research in this domain, we developed an online knowledge base, DES-RedoxVasc, which enables exploration of information contained in the biomedical scientific literature. The DES-RedoxVasc system analyzed 233399 documents consisting of PubMed abstracts and PubMed Central full-text articles related to different aspects of redox biology in vascular systems. It allows researchers to explore enriched concepts from 28 curated thematic dictionaries, as well as literature-derived potential associations of pairs of such enriched concepts, where associations themselves are statistically enriched. For example, the system allows exploration of associations of pathways, diseases, mutations, genes/proteins, miRNAs, long ncRNAs, toxins, drugs, biological processes, molecular functions, etc. that allow for insights about different aspects of redox effects and control of processes related to the vascular system. Moreover, we deliver case studies about some existing or possibly novel knowledge regarding redox of vascular biology demonstrating the usefulness of DES-RedoxVasc. DES-RedoxVasc is the first compiled knowledge base using text mining for the exploration of this topic.
Affiliation(s)
- Magbubah Essack
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Adil Salhi
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Julijana Stanimirovic
- Vinca Institute, University of Belgrade, Laboratory for Molecular Endocrinology and Radiobiology, Belgrade, Serbia
| | - Faroug Tifratene
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Arwa Bin Raies
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Arnaud Hungler
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Mahmut Uludag
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Christophe Van Neste
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Andreja Trpkovic
- Vinca Institute, University of Belgrade, Laboratory for Molecular Endocrinology and Radiobiology, Belgrade, Serbia
| | - Vladan P. Bajic
- Vinca Institute, University of Belgrade, Laboratory for Molecular Endocrinology and Radiobiology, Belgrade, Serbia
| | - Vladimir B. Bajic
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Esma R. Isenovic
- Vinca Institute, University of Belgrade, Laboratory for Molecular Endocrinology and Radiobiology, Belgrade, Serbia
| |
|
47
|
Demenkov PS, Saik OV, Ivanisenko TV, Kolchanov NA, Kochetov AV, Ivanisenko VA. Prioritization of potato genes involved in the formation of agronomically valuable traits using the SOLANUM TUBEROSUM knowledge base. Vavilovskii Zhurnal Genet Selektsii 2019. [DOI: 10.18699/vj19.501] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/26/2022] Open
Abstract
The development of highly efficient technologies in genomics, transcriptomics, proteomics and metabolomics, as well as new technologies in agriculture, has led to an “information explosion” in plant biology and crop production, including potato production. Only a small part of the information reaches formalized databases (for example, Uniprot, NCBI Gene, BioGRID, IntAct, etc.). One of the main sources of reliable biological data is the scientific literature. The well-known PubMed database contains more than 18 thousand abstracts of articles on potato. The effective use of knowledge presented in such a number of non-formalized natural-language documents requires modern intelligent methods of analysis. However, in the literature, there is no evidence of a widespread use of intelligent methods for automatically extracting knowledge from scientific publications on crops such as potato. Earlier we developed the SOLANUM TUBEROSUM knowledge base (http://www-bionet.sysbio.cytogen.ru/and/plant/). Information integrated into the knowledge base about the molecular genetic mechanisms underlying the selection of significant traits helps to accelerate the identification of candidate genes for the breeding characteristics of potatoes and the development of diagnostic markers for breeding. The article searches for new potential participants of the molecular genetic mechanisms of resistance to adverse factors in plants. Prioritizing candidate genes has shown that the PHYA, GF14, CNIH1, RCI1A, ABI5, CPK1, RGS1, NHL3, GRF8, and CYP21-4 genes are the most promising for further testing of their relationships with resistance to adverse factors. As a result of the analysis, it was shown that the molecular genetic relationships responsible for the formation of significant agricultural traits are complex and include many direct and indirect interactions. The construction of associative gene networks and their analysis using the SOLANUM TUBEROSUM knowledge base is the basis for searching for target genes for targeted mutagenesis and marker-oriented selection of potato varieties with valuable agricultural characteristics.
Affiliation(s)
- P. S. Demenkov
- Institute of Cytology and Genetics, SB RAS; Novosibirsk State University
| | - O. V. Saik
- Institute of Cytology and Genetics, SB RAS
| | | | | | | | | |
|
48
|
Karim MR, Michel A, Zappa A, Baranov P, Sahay R, Rebholz-Schuhmann D. Improving data workflow systems with cloud services and use of open data for bioinformatics research. Brief Bioinform 2019; 19:1035-1050. [PMID: 28419324 PMCID: PMC6169675 DOI: 10.1093/bib/bbx039] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 10/21/2016] [Indexed: 11/22/2022] Open
Abstract
Data workflow systems (DWFSs) enable bioinformatics researchers to combine components for data access and data analytics, and to share the final data analytics approach with their collaborators. Increasingly, such systems have to cope with large-scale data, such as full genomes (about 200 GB each), public fact repositories (about 100 TB of data) and 3D imaging data at even larger scales. As moving the data becomes cumbersome, the DWFS needs to embed its processes into a cloud infrastructure, where the data are already hosted. As the standardized public data play an increasingly important role, the DWFS needs to comply with Semantic Web technologies. This advancement to DWFS would reduce overhead costs and accelerate the progress in bioinformatics research based on large-scale data and public resources, as researchers would require less specialized IT knowledge for the implementation. Furthermore, the high data growth rates in bioinformatics research drive the demand for parallel and distributed computing, which then imposes a need for scalability and high-throughput capabilities onto the DWFS. As a result, requirements for data sharing and access to public knowledge bases suggest that compliance of the DWFS with Semantic Web standards is necessary. In this article, we will analyze the existing DWFS with regard to their capabilities toward public open data use as well as large-scale computational and human interface requirements. We untangle the parameters for selecting a preferable solution for bioinformatics research with particular consideration to using cloud services and Semantic Web technologies. Our analysis leads to research guidelines and recommendations toward the development of future DWFS for the bioinformatics research community.
Affiliation(s)
- Md Rezaul Karim
- Semantics in eHealth and Life Sciences (SeLS), Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
| | - Audrey Michel
- School of Biochemistry and Cell Biology, University College Cork, Ireland
| | - Achille Zappa
- Insight Centre for Data Analytics, National University of Ireland Galway, Dangan, Galway, Ireland
| | - Pavel Baranov
- School of Biochemistry and Cell Biology, University College Cork, Ireland
| | - Ratnesh Sahay
- Semantics in eHealth and Life Sciences (SeLS), Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
| | | |
|
49
|
Saqi M, Lysenko A, Guo YK, Tsunoda T, Auffray C. Navigating the disease landscape: knowledge representations for contextualizing molecular signatures. Brief Bioinform 2019; 20:609-623. [PMID: 29684165 PMCID: PMC6556902 DOI: 10.1093/bib/bby025] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 11/09/2017] [Revised: 02/05/2018] [Indexed: 12/14/2022] Open
Abstract
Large amounts of data emerging from experiments in molecular medicine are leading to the identification of molecular signatures associated with disease subtypes. The contextualization of these patterns is important for obtaining mechanistic insight into the aberrant processes associated with a disease, and this typically involves the integration of multiple heterogeneous types of data. In this review, we discuss knowledge representations that can be useful to explore the biological context of molecular signatures, in particular three main approaches, namely, pathway mapping approaches, molecular network centric approaches and approaches that represent biological statements as knowledge graphs. We discuss the utility of each of these paradigms, illustrate how they can be leveraged with selected practical examples and identify ongoing challenges for this field of research.
Affiliation(s)
- Mansoor Saqi
- Data Science Institute, Imperial College London, UK
| | - Artem Lysenko
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Yi-Ke Guo
- Data Science Institute, Imperial College London, UK
| | - Tatsuhiko Tsunoda
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- CREST, JST, Tokyo, Japan
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
| | - Charles Auffray
- European Institute for Systems Biology and Medicine, Lyon, France
| |
|
50
|
Inferring Drug-Protein-Side Effect Relationships from Biomedical Text. Genes (Basel) 2019; 10:genes10020159. [PMID: 30791472 PMCID: PMC6409686 DOI: 10.3390/genes10020159] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/24/2019] [Revised: 02/13/2019] [Accepted: 02/14/2019] [Indexed: 11/16/2022] Open
Abstract
Background: Although there are many studies of drugs and their side effects, the underlying mechanisms of these side effects are not well understood. It is also difficult to understand the specific pathways between drugs and side effects. Objective: The present study seeks to construct putative paths between drugs and their side effects by applying text-mining techniques to free text of biomedical studies, and to develop ranking metrics that could identify the most-likely paths. Materials and Methods: We extracted three types of relationships—drug-protein, protein-protein, and protein–side effect—from biomedical texts by using text mining and predefined relation-extraction rules. Based on the extracted relationships, we constructed whole drug-protein–side effect paths. For each path, we calculated its ranking score by a new ranking function that combines corpus- and ontology-based semantic similarity as well as co-occurrence frequency. Results: We extracted 13 plausible biomedical paths connecting drugs and their side effects from cancer-related abstracts in the PubMed database. The top 20 paths were examined, and the proposed ranking function outperformed the other methods tested, including co-occurrence, COALS, and UMLS by P@5-P@20. In addition, we confirmed that the paths are novel hypotheses that are worth investigating further. Discussion: The risk of side effects has been an important issue for the US Food and Drug Administration (FDA). However, the causes and mechanisms of such side effects have not been fully elucidated. This study extends previous research on understanding drug side effects by using various techniques such as Named Entity Recognition (NER), Relation Extraction (RE), and semantic similarity. Conclusion: It is not easy to reveal the biomedical mechanisms of side effects due to a huge number of possible paths. 
However, we automatically generated predictable paths using the proposed approach, which could provide meaningful information to biomedical researchers to generate plausible hypotheses for the understanding of such mechanisms.
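The P@5 to P@20 comparison mentioned in this abstract refers to precision at k, the fraction of the top k ranked items judged relevant. The Python sketch below shows that metric only; it does not reproduce the authors' ranking function, and the helper name `precision_at_k`, the path identifiers, and the relevance judgments are invented for illustration.

```python
def precision_at_k(ranked_items, relevant, k):
    """Fraction of the top-k ranked items that appear in the relevant set."""
    if k <= 0:
        return 0.0
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical ranked drug-protein-side effect paths and judged-relevant paths.
ranked_paths = ["p1", "p2", "p3", "p4", "p5"]
relevant_paths = {"p1", "p3", "p4"}

p_at_2 = precision_at_k(ranked_paths, relevant_paths, 2)
p_at_5 = precision_at_k(ranked_paths, relevant_paths, 5)
```

Dividing by k rather than by the number of items returned penalizes rankings that produce fewer than k candidates, which is the usual convention for this metric.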
|