1
Lecca P, Lecca M. Graph embedding and geometric deep learning relevance to network biology and structural chemistry. Front Artif Intell 2023; 6:1256352. [PMID: 38035201 PMCID: PMC10687447 DOI: 10.3389/frai.2023.1256352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 10/16/2023] [Indexed: 12/02/2023] Open
Abstract
Graphs have been used to model complex relationships among data in the biological sciences since the advent of systems biology in the early 2000s. In particular, graph data analysis and graph data mining play an important role in biological interaction networks, where recent artificial intelligence techniques, usually employed in other types of networks (e.g., social, citation, and trademark networks), are applied to data mining tasks including classification, clustering, recommendation, anomaly detection, and link prediction. The commitment and efforts of artificial intelligence research in network biology are motivated by the fact that machine learning techniques are often prohibitively computationally demanding, poorly parallelizable, and ultimately inapplicable, since a biological network of realistic size is a large system characterised by a high density of interactions and often by non-linear dynamics and a non-Euclidean latent geometry. Currently, graph embedding is emerging as a new learning paradigm that shifts the task of building complex models for classification, clustering, and link prediction to learning an informative representation of the graph data in a vector space, so that many graph mining and learning tasks can be performed more easily with efficient non-iterative traditional models (e.g., a linear support vector machine for classification). The great potential of graph embedding is the main reason for the flourishing of studies in this area and, in particular, of artificial intelligence learning techniques. In this mini review, we give a comprehensive summary of the main graph embedding algorithms in light of the recent burgeoning interest in geometric deep learning.
Affiliation(s)
- Paola Lecca
- Faculty of Engineering, Free University of Bozen-Bolzano, Bolzano, Italy
- Michela Lecca
- Fondazione Bruno Kessler, Digital Industry Center, Technologies of Vision, Trento, Italy
2
He F, Liu K, Yang Z, Chen Y, Hammer RD, Xu D, Popescu M. pathCLIP: Detection of Genes and Gene Relations from Biological Pathway Figures through Image-Text Contrastive Learning. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.31.564859. [PMID: 37961680 PMCID: PMC10635012 DOI: 10.1101/2023.10.31.564859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
In biomedical literature, biological pathways are commonly described through a combination of images and text. These pathways contain valuable information, including genes and their relationships, which provide insight into biological mechanisms and precision medicine. Curating pathway information across the literature enables the integration of this information to build a comprehensive knowledge base. While some studies have extracted pathway information from images and text independently, they often overlook the correspondence between the two modalities. In this paper, we present a pathway figure curation system named pathCLIP for identifying genes and gene relations from pathway figures. Our key innovation is the use of an image-text contrastive learning model to learn coordinated embeddings of image snippets and text descriptions of genes and gene relations, thereby improving curation. Our validation results, using pathway figures from PubMed, showed that our multimodal model outperforms models using only a single modality. Additionally, our system effectively curates genes and gene relations from multiple literature sources. A case study on extracting pathway information from non-small cell lung cancer literature further demonstrates the usefulness of our curated pathway information in enhancing related pathways in the KEGG database.
Affiliation(s)
- Fei He
- School of Information Science and Technology, Northeast Normal University, Changchun 130000, China; Department of Electrical Engineer and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia Missouri, MO 65211 USA
- Kai Liu
- School of Information Science and Technology, Northeast Normal University, Changchun 130000, China
- Zhiyuan Yang
- School of Information Science and Technology, Northeast Normal University, Changchun 130000, China
- Yibo Chen
- Department of Electrical Engineer and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia Missouri, MO 65211 USA
- Richard D Hammer
- School of Medicine, University of Missouri, Columbia Missouri, MO 65211 USA
- Dong Xu
- Department of Electrical Engineer and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia Missouri, MO 65211 USA
- Mihail Popescu
- School of Medicine, University of Missouri, Columbia Missouri, MO 65211 USA
3
Eslami M, Borujeni AE, Eramian H, Weston M, Zheng G, Urrutia J, Corbet C, Becker D, Maschhoff P, Clowers K, Cristofaro A, Hosseini HD, Gordon DB, Dorfan Y, Singer J, Vaughn M, Gaffney N, Fonner J, Stubbs J, Voigt CA, Yeung E. Prediction of whole-cell transcriptional response with machine learning. Bioinformatics 2022; 38:404-409. [PMID: 34570169 DOI: 10.1093/bioinformatics/btab676] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/11/2021] [Revised: 08/29/2021] [Accepted: 09/22/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Applications in synthetic and systems biology can benefit from measuring whole-cell response to biochemical perturbations. Execution of experiments to cover all possible combinations of perturbations is infeasible. In this paper, we present the host response model (HRM), a machine learning approach that maps response of single perturbations to transcriptional response of the combination of perturbations.
RESULTS: The HRM combines high-throughput sequencing with machine learning to infer links between experimental context, prior knowledge of cell regulatory networks, and RNASeq data to predict a gene's dysregulation. We find that the HRM can predict the directionality of dysregulation to a combination of inducers with an accuracy of >90% using data from single inducers. We further find that the use of prior, known cell regulatory networks doubles the predictive performance of the HRM (an R² from 0.3 to 0.65). The model was validated in two organisms, Escherichia coli and Bacillus subtilis, using new experiments conducted after training. Finally, while the HRM is trained with gene expression data, the direct prediction of differential expression makes it possible to also conduct enrichment analyses using its predictions. We show that the HRM can accurately classify >95% of the pathway regulations. The HRM reduces the number of RNASeq experiments needed as responses can be tested in silico prior to the experiment.
AVAILABILITY AND IMPLEMENTATION: The HRM software and tutorial are available at https://github.com/sd2e/CDM and the configurable differential expression analysis tools and tutorials are available at https://github.com/SD2E/omics_tools.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Amin Espah Borujeni
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Hamed Eramian
- Data Science, Netrias, LLC, Annapolis, MD 21409, USA
- Mark Weston
- Data Science, Netrias, LLC, Annapolis, MD 21409, USA
- George Zheng
- Data Science, Netrias, LLC, Annapolis, MD 21409, USA
- Joshua Urrutia
- Life Sciences and Computing, Texas Advanced Computing Center, Austin, TX 78758, USA
- Alexander Cristofaro
- TScan Therapeutics, Inc., Waltham, MA 02451, USA; Foundry for Synthetic Biology, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Hamid Doost Hosseini
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- D Benjamin Gordon
- Foundry for Synthetic Biology, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Yuval Dorfan
- Foundry for Synthetic Biology, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Matthew Vaughn
- Life Sciences and Computing, Texas Advanced Computing Center, Austin, TX 78758, USA
- Niall Gaffney
- Life Sciences and Computing, Texas Advanced Computing Center, Austin, TX 78758, USA
- John Fonner
- Life Sciences and Computing, Texas Advanced Computing Center, Austin, TX 78758, USA
- Joe Stubbs
- Life Sciences and Computing, Texas Advanced Computing Center, Austin, TX 78758, USA
- Christopher A Voigt
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Enoch Yeung
- Bioengineering Center, University of California Santa Barbara, Santa Barbara, CA 93106, USA
4
A Survey on Recent Named Entity Recognition and Relationship Extraction Techniques on Clinical Texts. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11188319] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/31/2022]
Abstract
Significant growth in Electronic Health Records (EHR) over the last decade has provided an abundance of clinical text that is mostly unstructured and untapped. This huge amount of clinical text data has motivated the development of new information extraction and text mining techniques. Named Entity Recognition (NER) and Relationship Extraction (RE) are key components of information extraction tasks in the clinical domain. In this paper, we describe the present status of clinical NER and RE techniques in detail by discussing the NLP models proposed for the two tasks, their performance, and the current challenges. Our comprehensive survey of clinical NER and RE encompasses current challenges, state-of-the-art practices, and future directions in information extraction from clinical text. This is the first attempt to discuss both of these interrelated topics together in the clinical context. We identified many research articles published based on different approaches and looked at applications of these tasks. We also discuss the evaluation metrics used in the literature to measure the effectiveness of these two NLP methods, as well as future research directions.
5
Sänger M, Leser U. Large-scale entity representation learning for biomedical relationship extraction. Bioinformatics 2021; 37:236-242. [PMID: 32726411 DOI: 10.1093/bioinformatics/btaa674] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/03/2020] [Revised: 07/14/2020] [Accepted: 07/21/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: The automatic extraction of published relationships between molecular entities has important applications in many biomedical fields, ranging from Systems Biology to Personalized Medicine. Existing works focused on extracting relationships described in single articles or in single sentences. However, a single record is rarely sufficient to judge the biological correctness of a relation, as experimental evidence might be weak or only valid in a certain context. Furthermore, statements may be more speculative than confirmative, and different articles often contradict each other. Experts therefore always take the complete literature into account to make a reliable decision on a relationship. It is an open research question how to do this effectively in an automatic manner.
RESULTS: We propose two novel relation extraction approaches which use recent representation learning techniques to create comprehensive models of biomedical entities or entity pairs, respectively. These representations are learned by considering all publications from PubMed mentioning an entity or a pair. They are used as input for a neural network for classifying relations globally, i.e. the derived predictions are corpus-based, not sentence- or article-based as in prior art. Experiments on the extraction of mutation-disease, drug-disease and drug-drug relationships show that the learned embeddings indeed capture semantic information of the entities under study and outperform traditional methods by 4-29% regarding F1 score.
AVAILABILITY AND IMPLEMENTATION: Source codes are available at: https://github.com/mariosaenger/bio-re-with-entity-embeddings.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mario Sänger
- Computer Science Department, Knowledge Management in Bioinformatics, Humboldt-Universität zu Berlin, Berlin 10099, Germany
- Ulf Leser
- Computer Science Department, Knowledge Management in Bioinformatics, Humboldt-Universität zu Berlin, Berlin 10099, Germany
6
Adam ZR, Fahrenbach AC, Jacobson SM, Kacar B, Zubarev DY. Radiolysis generates a complex organosynthetic chemical network. Sci Rep 2021; 11:1743. [PMID: 33462313 PMCID: PMC7813863 DOI: 10.1038/s41598-021-81293-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/11/2020] [Accepted: 01/01/2021] [Indexed: 11/22/2022] Open
Abstract
The architectural features of cellular life and its ecologies at larger scales are built upon foundational networks of reactions between molecules that avoid a collapse to equilibrium. The search for life’s origins is, in some respects, a search for biotic network attributes in abiotic chemical systems. Radiation chemistry has long been employed to model prebiotic reaction networks, and here we report network-level analyses carried out on a compiled database of radiolysis reactions, acquired by the scientific community over decades of research. The resulting network shows robust connections between abundant geochemical reservoirs and the production of carboxylic acids, amino acids, and ribonucleotide precursors—the chemistry of which is predominantly dependent on radicals. Moreover, the network exhibits the following measurable attributes associated with biological systems: (1) the species connectivity histogram exhibits a heterogeneous (heavy-tailed) distribution, (2) overlapping families of closed-loop cycles, and (3) a hierarchical arrangement of chemical species with a bottom-heavy energy-size spectrum. The latter attribute is implicated with stability and entropy production in complex systems, notably in ecology where it is known as a trophic pyramid. Radiolysis is implicated as a driver of abiotic chemical organization and could provide insights about the complex and perhaps radical-dependent mechanisms associated with life’s origins.
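The first two network attributes listed in this abstract can be illustrated with a minimal sketch. It assumes a toy reaction list with placeholder species labels (`A` to `E`), not the compiled radiolysis database, and the helper names `species_graph`, `connectivity_histogram`, and `has_cycle` are hypothetical. Each reaction is projected onto directed reactant-to-product edges, the degree of each species is tallied into a histogram, and a depth-first search checks whether any closed loop exists.

```python
from collections import Counter, defaultdict

# Hypothetical toy reactions as (reactants, products); labels are placeholders.
REACTIONS = [
    (("A", "B"), ("C",)),
    (("C",), ("D",)),
    (("D",), ("A",)),
    (("A",), ("E",)),
]


def species_graph(reactions):
    """Project reactions onto a set of directed reactant -> product edges."""
    edges = set()
    for reactants, products in reactions:
        for reactant in reactants:
            for product in products:
                edges.add((reactant, product))
    return edges


def connectivity_histogram(edges):
    """Map each degree value (in + out) to the number of species having it."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return Counter(degree.values())


def has_cycle(edges):
    """Return True if the directed species graph contains a closed loop."""
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
    state = {}  # node -> "active" while on the DFS path, "done" afterwards

    def visit(node):
        state[node] = "active"
        for nxt in successors.get(node, ()):
            if state.get(nxt) == "active":
                return True  # back edge: nxt is already on the current path
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    nodes = {n for edge in edges for n in edge}
    return any(visit(n) for n in nodes if n not in state)


EDGES = species_graph(REACTIONS)
HISTOGRAM = connectivity_histogram(EDGES)
```

On a real network, a heavy-tailed distribution would show up as a few species with very high degree alongside many with degree one or two; a toy example of this size can only show the bookkeeping.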
Affiliation(s)
- Zachary R Adam
- Department of Planetary Sciences, University of Arizona, Tucson, AZ, 85721, USA; Department of Earth and Planetary Sciences, Harvard University, Cambridge, MA, USA
- Albert C Fahrenbach
- School of Chemistry, University of New South Wales, Sydney, NSW, 2052, Australia
- Sofia M Jacobson
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, 85721, USA
- Betul Kacar
- Department of Planetary Sciences, University of Arizona, Tucson, AZ, 85721, USA; Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, 85721, USA; Department of Astronomy, University of Arizona, Tucson, AZ, 85721, USA; Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
7
Abstract
Identifying the evolution path of a research field is essential to scientific and technological innovation. There have been many attempts to identify the technology evolution path based on topic models or social network analysis, but many of them had methodological deficiencies. First, many studies have considered only a single type of information (text or citation information) in scientific literature, which may lead to incomplete technology path mapping. Second, the number of topics in each period cannot be determined automatically, making dynamic topic tracking difficult. Third, data mining methods have not been effectively combined with visual analysis, which affects the efficiency and flexibility of mapping. In this study, we developed a method for mapping the technology evolution path using a novel non-parametric topic model, the citation involved Hierarchical Dirichlet Process (CIHDP), to achieve better topic detection and tracking of scientific literature. To better present and analyze the path, D3.js is used to visualize the splitting and fusion of the evolutionary path. We used this novel model to map the artificial intelligence research domain; the successful mapping of the evolution path shows the proposed method's validity and merits. After incorporating the citation information, we found that the CIHDP can map a complete path evolution process and performs better than the Hierarchical Dirichlet Process and LDA. This method can be helpful for understanding and analyzing the development of technical topics. Moreover, it can be used to map the science or technology of the innovation ecosystem. It may also be of interest to technology evolution path researchers and policymakers.
8
Lei X, Wang Y. Predicting Microbe-Disease Association by Learning Graph Representations and Rule-Based Inference on the Heterogeneous Network. Front Microbiol 2020; 11:579. [PMID: 32351464 PMCID: PMC7174569 DOI: 10.3389/fmicb.2020.00579] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/18/2019] [Accepted: 03/17/2020] [Indexed: 12/18/2022] Open
Abstract
More and more clinical observations have implied that microbes have great effects on human diseases. Understanding the relations between microbes and diseases is of profound significance for disease prevention and therapy. In this paper, we propose a predictive model based on the known microbe-disease associations to discover potential microbe-disease associations through integrating Learning Graph Representations and a modified Scoring mechanism on the Heterogeneous network (called LGRSH). Firstly, the similarity networks for microbes and diseases are obtained based on Gaussian interaction profile kernel similarity. Then, we construct a heterogeneous network comprising these two similarity networks and the microbe-disease association network. After that, the embedding algorithm Node2vec is used to learn representations of nodes in the heterogeneous network. Finally, from these low-dimensional vector representations, we calculate the relevance between each microbe and disease using a modified rule-based inference method. In comparison with three other methods, LRLSHMDA, KATZHMDA and BiRWHMDA, LGRSH performs better. Moreover, in case studies of asthma, chronic obstructive pulmonary disease and inflammatory bowel disease, 8, 8, and 10 of the top-10 discovered disease-related microbes were validated, respectively, demonstrating that LGRSH performs well in predicting potential microbe-disease associations.
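The first two steps described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the LGRSH implementation: the kernel uses the commonly cited Gaussian interaction profile form, exp(-γ‖IP(i) - IP(j)‖²) with γ equal to a parameter γ′ divided by the mean squared profile norm, and the association matrix `ASSOC`, the parameter `gamma_prime`, and both function names are hypothetical. The Node2vec and rule-based scoring steps are not shown.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of a 0/1 matrix."""
    n = len(profiles)
    # The bandwidth is normalised by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


def heterogeneous_adjacency(assoc, gamma_prime=1.0):
    """Block matrix [[S_m, A], [A^T, S_d]]: microbes first, then diseases."""
    assoc_t = [list(col) for col in zip(*assoc)]
    s_microbe = gip_kernel(assoc, gamma_prime)
    s_disease = gip_kernel(assoc_t, gamma_prime)
    top = [s_microbe[i] + [float(v) for v in assoc[i]] for i in range(len(assoc))]
    bottom = [
        [float(v) for v in assoc_t[j]] + s_disease[j] for j in range(len(assoc_t))
    ]
    return top + bottom


# Hypothetical toy association matrix: 3 microbes (rows) x 2 diseases (columns).
ASSOC = [
    [1, 0],
    [1, 1],
    [0, 1],
]
HET = heterogeneous_adjacency(ASSOC)
```

The resulting weighted adjacency matrix is the kind of input a node embedding method would walk over; rows 0 to 2 are microbes and rows 3 to 4 are diseases in this toy layout.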
Affiliation(s)
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Yueyue Wang
- School of Computer Science, Shaanxi Normal University, Xi'an, China
9
Turki T, Taguchi YH. SCGRNs: Novel supervised inference of single-cell gene regulatory networks of complex diseases. Comput Biol Med 2020; 118:103656. [PMID: 32174324 DOI: 10.1016/j.compbiomed.2020.103656] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/06/2020] [Revised: 02/06/2020] [Accepted: 02/07/2020] [Indexed: 12/19/2022]
10
Abstract
The abundance of high-throughput data and technical refinements in graph theories have allowed network analysis to become an effective approach for various medical fields. This chapter introduces co-expression, Bayesian, and regression-based network construction methods, which are the basis of network analysis. Various methods in network topology analysis are explained, along with their unique features and applications in biomedicine. Furthermore, we explain the role of network embedding in reducing the dimensionality of networks and outline several popular algorithms used by researchers today. Current literature has implemented different combinations of topology analysis and network embedding techniques, and we outline several studies in the fields of genetic-based disease prediction, drug-target identification, and multi-level omics integration.
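Of the construction methods named in this abstract, the co-expression approach is the simplest to illustrate. The sketch below assumes one common variant (an edge wherever the absolute Pearson correlation between two expression profiles reaches a threshold), which may differ from what the chapter describes; the expression values, gene names, the 0.8 threshold, and the function names are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (norm_x * norm_y)


def coexpression_edges(expression, threshold=0.8):
    """Return (gene, gene, r) for every pair with |r| >= threshold."""
    genes = sorted(expression)
    edges = []
    for i, gene_a in enumerate(genes):
        for gene_b in genes[i + 1:]:
            r = pearson(expression[gene_a], expression[gene_b])
            if abs(r) >= threshold:
                edges.append((gene_a, gene_b, r))
    return edges


# Hypothetical expression profiles: one value per sample for each gene.
EXPRESSION = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
}
COEXPRESSION = coexpression_edges(EXPRESSION)
```

Using the absolute value keeps strongly anti-correlated pairs as edges; a signed network would test `r` directly instead.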