1
Rojas-Carabali W, Agrawal R, Gutierrez-Sinisterra L, Baxter SL, Cifuentes-González C, Wei YC, Abisheganaden J, Kannapiran P, Wong S, Lee B, de-la-Torre A, Agrawal R. Natural Language Processing in medicine and ophthalmology: A review for the 21st-century clinician. Asia Pac J Ophthalmol (Phila) 2024:100084. [PMID: 39059557 DOI: 10.1016/j.apjo.2024.100084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Revised: 07/08/2024] [Accepted: 07/19/2024] [Indexed: 07/28/2024]
Abstract
Natural Language Processing (NLP) is a subfield of artificial intelligence that focuses on the interaction between computers and human language, enabling computers to understand, generate, and derive meaning from human language. NLP's potential applications in medicine are extensive, ranging from extracting data from Electronic Health Records (one of its best-known and most frequently exploited uses) to investigating relationships among genetics, biomarkers, drugs, and diseases in order to propose new medications. NLP can also be useful for clinical decision support, patient monitoring, and medical image analysis. Despite this potential, real-world application of NLP remains limited by various challenges and constraints, so its development continues predominantly within the research domain. However, as NLP becomes increasingly widespread, particularly with the availability of large language models such as ChatGPT, it is crucial for medical professionals to be aware of the status, uses, and limitations of these technologies.
Affiliation(s)
- William Rojas-Carabali
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore; Tan Tock Seng Hospital, National Healthcare Group Eye Institute, Singapore
- Rajdeep Agrawal
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
- Sally L Baxter
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, CA, USA; Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Carlos Cifuentes-González
- Neuroscience Research Group (NEUROS), Neurovitae Center for Neuroscience, Institute of Translational Medicine (IMT), Escuela de Medicina y Ciencias de la Salud, Universidad del Rosario, Bogotá, Colombia
- Yap Chun Wei
- Health Services and Outcomes Research, National Healthcare Group, Singapore
- John Abisheganaden
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore; Health Services and Outcomes Research, National Healthcare Group, Singapore; Department of Respiratory Medicine, Tan Tock Seng Hospital, Singapore
- Sunny Wong
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
- Bernett Lee
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
- Alejandra de-la-Torre
- Neuroscience Research Group (NEUROS), Neurovitae Center for Neuroscience, Institute of Translational Medicine (IMT), Escuela de Medicina y Ciencias de la Salud, Universidad del Rosario, Bogotá, Colombia
- Rupesh Agrawal
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore; Tan Tock Seng Hospital, National Healthcare Group Eye Institute, Singapore; Singapore Eye Research Institute, Singapore; Duke-NUS Medical School, National University of Singapore, Singapore
2
Bhuvaneshwari S, Venkataraman K, Sankaranarayanan K. Exploring potential ion channel targets for rheumatoid arthritis: combination of network analysis and gene expression analysis. Biotechnol Appl Biochem 2024. [PMID: 39049164 DOI: 10.1002/bab.2638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Accepted: 06/29/2024] [Indexed: 07/27/2024]
Abstract
Rheumatoid arthritis (RA) is a systemic autoimmune disease characterized by chronic inflammation of the synovial membrane that leads to the destruction of cartilage and bone. Pharmacological targeting of ion channels is increasingly recognized as an attractive and feasible strategy for treating RA. The present work employs a network analysis approach to predict the most promising ion channel target for potential RA-treating drugs. A protein-protein interaction map of 343 genes associated with inflammation in RA and ion channel genes was generated using the Search Tool for the Retrieval of Interacting Genes (STRING) and visualized using Cytoscape. Using betweenness centrality and traffic values as key topological parameters, 17 hub nodes were identified, including FOS (9800.85), tumor necrosis factor (3654.60), TGFB1 (3305.75), and VEGFA (3052.88). The backbone network constructed from these 17 hub genes was analyzed in depth with NetworkAnalyzer to identify the most promising ion channel target. Calcium-permeating ion channels, especially store-operated calcium entry channels, and their associated regulatory proteins were found to interact extensively with RA inflammatory hub genes. This ion channel target, identified through theoretical and statistical analysis, was further validated by a pilot case-control gene expression study: experimental verification in 75 RA cases and 25 controls showed increased ORAI1 expression. Thus, by combining network analysis with gene expression studies, we have explored potential targets for RA treatment.
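The hub-selection step in this abstract ranks nodes by betweenness centrality, i.e. how often a node lies on shortest paths between other nodes. A minimal sketch, assuming an unweighted, undirected interaction graph, implements Brandes' algorithm in plain Python. The toy edge list is hypothetical and is not the study's STRING network, and the "traffic" parameter is not modeled.

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs.

    adj maps each node to a set of neighbours. Returns raw (unnormalized)
    betweenness scores, counting each unordered endpoint pair once.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Single-source shortest paths from s via breadth-first search.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of non-increasing distance.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Undirected graph: every pair was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}

def build_graph(edges):
    """Build an undirected adjacency map from (a, b) edge tuples."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def top_hubs(edges, k):
    """Return the k nodes with highest betweenness (ties broken by name)."""
    bc = betweenness_centrality(build_graph(edges))
    return sorted(bc, key=lambda v: (-bc[v], v))[:k]

# Hypothetical toy interaction list (not the study's network).
EDGES = [("FOS", "TNF"), ("FOS", "VEGFA"), ("FOS", "ORAI1"),
         ("TNF", "IL6"), ("VEGFA", "TGFB1"), ("ORAI1", "STIM1")]
```

In this toy graph FOS connects three branches, so it receives the highest score. The scores are unnormalized; tools differ in their normalization conventions, so absolute values are not directly comparable with the numbers quoted in the abstract.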
Affiliation(s)
- Sampath Bhuvaneshwari
- Ion Channel Biology Laboratory, AU-KBC Research Centre, Madras Institute of Technology, Anna University, Chennai, India
- Kavitha Sankaranarayanan
- Ion Channel Biology Laboratory, AU-KBC Research Centre, Madras Institute of Technology, Anna University, Chennai, India
3
Wishart DS, Hiebert-Giesbrecht M, Inchehborouni G, Cao X, Guo AC, LeVatte MA, Torres-Calzada C, Gautam V, Johnson M, Liigand J, Wang F, Zahraei S, Bhumireddy S, Wang Y, Zheng J, Mandal R, Dyck JRB. Chemical Composition of Commercial Cannabis. J Agric Food Chem 2024; 72:14099-14113. [PMID: 38181219 PMCID: PMC11212042 DOI: 10.1021/acs.jafc.3c06616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 12/18/2023] [Accepted: 12/22/2023] [Indexed: 01/07/2024]
Abstract
Cannabis is widely used for medicinal and recreational purposes, and as a result there is increased interest in its chemical components and their physiological effects. However, current information on cannabis chemistry is often outdated or scattered across many books and journals. To address this issue, we used modern metabolomics and bioinformatics techniques to compile a comprehensive list of >6000 chemical constituents in commercial cannabis. The metabolomics methods combined high- and low-resolution liquid chromatography-mass spectrometry (MS), gas chromatography-MS, and inductively coupled plasma-MS. The bioinformatics methods included computer-aided text mining and computational genome-scale metabolic inference. This information, along with detailed compound descriptions, physicochemical data, known physiological effects, protein targets, and reference compound spectra, has been made available through a publicly accessible database called the Cannabis Compound Database (https://cannabisdatabase.ca). Such a centralized, open-access resource should prove useful for the cannabis community.
Affiliation(s)
- David S. Wishart
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada
- Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, Alberta T6G 2H7, Canada
- Department of Laboratory Medicine and Pathology, University of Alberta, Edmonton, Alberta T6G 2R3, Canada
- Gozal Inchehborouni
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- Xuan Cao
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- An Chi Guo
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- Marcia A. LeVatte
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- Claudia Torres-Calzada
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- Vasuk Gautam
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- Mathew Johnson
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- Jaanus Liigand
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- Fei Wang
- Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada
- Shirin Zahraei
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- Sudarshana Bhumireddy
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- Yilin Wang
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- Jiamin Zheng
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- Rupasri Mandal
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- Jason R. B. Dyck
- Department of Pediatrics, University of Alberta, Edmonton, Alberta T6G 1C9, Canada
4
Zhao Y, Yin J, Zhang L, Zhang Y, Chen X. Drug-drug interaction prediction: databases, web servers and computational models. Brief Bioinform 2023; 25:bbad445. [PMID: 38113076 PMCID: PMC10782925 DOI: 10.1093/bib/bbad445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 10/26/2023] [Accepted: 11/14/2023] [Indexed: 12/21/2023]
Abstract
In clinical treatment, two or more drugs (i.e. a drug combination) are used simultaneously or successively, primarily to enhance therapeutic efficacy or reduce side effects. However, an inappropriate drug combination may not only fail to improve efficacy but may even cause adverse reactions. Therefore, guided by the principle of improving efficacy and/or reducing adverse reactions, drug-drug interactions (DDIs) should be studied comprehensively and thoroughly so that drug combinations can be used rationally. In this review, we first introduce the basic concepts and classification of DDIs. Next, we briefly describe important publicly available databases and web servers covering experimentally verified or predicted DDIs. As effective auxiliary tools, computational models for predicting DDIs can not only reduce the cost of biological experiments but also provide some guidance for combination therapy. We therefore summarize three types of prediction models proposed in recent years (traditional machine learning-based, deep learning-based, and score function-based models) and discuss their advantages and limitations. Finally, we point out problems that remain to be solved in future DDI prediction research and offer corresponding suggestions.
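To make the idea of a similarity-driven predictor concrete, the sketch below scores an unseen drug pair by how closely it resembles a known interacting pair, using Jaccard similarity over hypothetical drug feature sets (for example, known protein targets). This is a generic nearest-neighbour baseline chosen for illustration; it is not one of the specific models surveyed in the review, and the drug names and features are placeholders.

```python
def jaccard(a, b):
    """Jaccard similarity between two feature sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def pair_score(x, y, known_pairs, features):
    """Score drug pair (x, y) by its best match to a known interacting pair.

    A known pair (a, b) supports (x, y) with strength sim(x, a) * sim(y, b);
    both orientations are tried because interactions are unordered.
    """
    best = 0.0
    for a, b in known_pairs:
        if {a, b} == {x, y}:
            continue  # a known pair must not support itself
        s1 = jaccard(features[x], features[a]) * jaccard(features[y], features[b])
        s2 = jaccard(features[x], features[b]) * jaccard(features[y], features[a])
        best = max(best, s1, s2)
    return best

# Hypothetical feature sets (e.g. protein targets) for placeholder drugs.
FEATURES = {
    "drugA": {"T1", "T2"},
    "drugB": {"T3"},
    "drugC": {"T1", "T2", "T4"},
    "drugD": {"T3", "T5"},
    "drugE": {"T9"},
}
KNOWN = [("drugA", "drugB")]
```

Here drugC resembles drugA and drugD resembles drugB, so the pair (drugC, drugD) inherits a nonzero score from the known interaction, while drugE, which shares no features, scores zero. Learned models replace this hand-crafted score with one trained on labeled interactions.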
Affiliation(s)
- Yan Zhao
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Jun Yin
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Li Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Yong Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Xing Chen
- School of Science, Jiangnan University, Wuxi 214122, China
5
Upadhyay V, Boorla VS, Maranas CD. Rank-ordering of known enzymes as starting points for re-engineering novel substrate activity using a convolutional neural network. Metab Eng 2023; 78:171-182. [PMID: 37301359 DOI: 10.1016/j.ymben.2023.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 05/19/2023] [Accepted: 06/02/2023] [Indexed: 06/12/2023]
Abstract
Retro-biosynthetic approaches have made significant advances in predicting synthesis routes for target biofuel, bio-renewable, or bio-active molecules. Using only cataloged enzymatic activities limits the discovery of new production routes. Recent retro-biosynthetic algorithms increasingly use novel conversions that require altering the substrate or cofactor specificities of existing enzymes while connecting pathways leading to a target metabolite. However, identifying and re-engineering enzymes for the desired novel conversions is currently the bottleneck in implementing such designed pathways. Herein, we present EnzRank, a convolutional neural network (CNN)-based approach that rank-orders existing enzymes by their suitability to undergo successful protein engineering, through directed evolution or de novo design, toward a desired substrate activity. We train the CNN model on 11,800 known active enzyme-substrate pairs from the BRENDA database as positive samples. Negative samples are generated by scrambling these pairs, using the Tanimoto similarity score to measure substrate dissimilarity between each enzyme's native substrate and all other molecules in the dataset. Using a 10-fold holdout method for training and cross-validation, EnzRank achieves average recovery rates of 80.72% and 73.08% for positive and negative pairs, respectively, on test data. We further developed a web-based user interface (available at https://huggingface.co/spaces/vuu10/EnzRank) that predicts enzyme-substrate activity from substrate SMILES strings and an enzyme sequence, providing convenient access to EnzRank. In summary, this effort can help de novo pathway design tools prioritize starting enzyme candidates for re-engineering toward novel reactions, and can help predict the potential secondary activity of enzymes in cell metabolism.
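The negative-sampling step described in this abstract can be sketched as follows. Molecules are represented as hypothetical sets of "on" fingerprint bits (in practice a cheminformatics toolkit would derive these from SMILES strings), Tanimoto similarity is computed as |A ∩ B| / |A ∪ B|, and each enzyme is paired with molecules whose similarity to its native substrate falls below a threshold. The threshold value, function names, and data are assumptions for illustration, not EnzRank's published settings.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two binary fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def scrambled_negatives(positives, fingerprints, threshold=0.3):
    """Generate (enzyme, molecule) negatives by scrambling positive pairs.

    positives: list of (enzyme_id, native_substrate_id) pairs.
    fingerprints: dict mapping molecule_id -> set of fingerprint bits.
    threshold: hypothetical cut-off; a molecule becomes a negative for an
        enzyme only if its Tanimoto similarity to that enzyme's native
        substrate is strictly below this value.
    """
    known = set(positives)
    negatives = []
    for enzyme, native in positives:
        for mol, fp in sorted(fingerprints.items()):
            if (enzyme, mol) in known:
                continue  # never label a known active pair as negative
            if tanimoto(fingerprints[native], fp) < threshold:
                negatives.append((enzyme, mol))
    return negatives

# Hypothetical toy data: the fingerprint bits are invented for illustration.
FPS = {
    "glucose": {1, 2, 3, 4},
    "galactose": {1, 2, 3, 5},
    "benzene": {7, 8},
}
POSITIVES = [("E1", "glucose")]
```

With the default threshold, only the structurally unrelated molecule is used as a negative for E1; the close analogue galactose is excluded because it might plausibly be a real substrate. Raising the threshold admits more, and riskier, negatives.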
Affiliation(s)
- Vikas Upadhyay
- Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, 16802, USA
- Veda Sheersh Boorla
- Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, 16802, USA
- Costas D Maranas
- Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, 16802, USA
6
Gajendran S, Manjula D, Sugumaran V, Hema R. Extraction of knowledge graph of Covid-19 through mining of unstructured biomedical corpora. Comput Biol Chem 2023; 102:107808. [PMID: 36621289 PMCID: PMC9807269 DOI: 10.1016/j.compbiolchem.2022.107808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Revised: 12/21/2022] [Accepted: 12/29/2022] [Indexed: 01/04/2023]
Abstract
The number of published biomedical articles has grown rapidly over the years. Currently there are about 30 million articles in PubMed and over 25 million mentions in Medline. Among the fundamental tasks in analysing this literature, Biomedical Named Entity Recognition (BioNER) and Biomedical Relation Extraction (BioRE) are the most essential. In the biomedical domain, knowledge graphs (KGs) are used to visualize the relationships between entities such as proteins, chemicals, and diseases. The search for treatments and potential cures for the new coronavirus has dramatically increased the number of scientific publications, but efficiently analysing, integrating, and utilising related sources of information remains difficult. To combat diseases effectively during pandemics like COVID-19, the literature must be used quickly and effectively. In this paper, we introduce a fully automated framework consisting of BERT-BiLSTM, a knowledge graph, and a representation learning model to extract the top diseases, chemicals, and proteins related to COVID-19 from the literature. The framework uses Named Entity Recognition models for disease, chemical, and protein recognition, followed by Chemical-Disease and Chemical-Protein Relation Extraction models. Using these models, the system extracts entities and relations from the CORD-19 dataset and builds a knowledge graph from them. Finally, it performs representation learning on this KG to obtain embeddings of all entities and uses them to identify the diseases, chemicals, and proteins most related to COVID-19.
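The final ranking step, retrieving the entities most related to COVID-19 from learned KG embeddings, can be illustrated with cosine similarity. This sketch assumes the entity embeddings have already been learned; the abstract does not name the representation learning model, so none is implemented here, and the vectors, entity names, and type labels below are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_related(query, embeddings, entity_types, wanted_type, k):
    """Return the k entities of wanted_type whose embeddings are closest to query."""
    q = embeddings[query]
    candidates = [e for e, t in entity_types.items()
                  if t == wanted_type and e != query]
    # Highest similarity first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda e: (-cosine(q, embeddings[e]), e))
    return candidates[:k]

# Hypothetical 3-dimensional embeddings; real KG embeddings are learned.
EMB = {
    "COVID-19": [1.0, 0.0, 0.0],
    "chem_X": [0.9, 0.1, 0.0],
    "chem_Y": [0.0, 1.0, 0.0],
    "protein_P": [0.8, 0.0, 0.6],
}
TYPES = {"COVID-19": "disease", "chem_X": "chemical",
         "chem_Y": "chemical", "protein_P": "protein"}
```

Filtering by entity type lets the same embedding space answer separate "top chemicals" and "top proteins" queries, which matches the per-type outputs the abstract describes.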
Affiliation(s)
- Sudhakaran Gajendran
- School of Electronics Engineering, Vellore Institute of Technology, Chennai, India (corresponding author)
- D. Manjula
- School of Computer Science Engineering, Vellore Institute of Technology, Chennai, India
- Vijayan Sugumaran
- Center for Data Science and Big Data Analytics, Oakland University, Rochester, MI, USA; Department of Decision and Information Sciences, School of Business Administration, Oakland University, Rochester, MI, USA
- R. Hema
- Department of Electronics and Communication Engineering, St. Joseph College of Engineering, Chennai, India
7
Feng Z, Shen Z, Li H, Li S. e-TSN: an interactive visual exploration platform for target-disease knowledge mapping from literature. Brief Bioinform 2022; 23:6809962. [PMID: 36347537 PMCID: PMC9677481 DOI: 10.1093/bib/bbac465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Revised: 09/20/2022] [Accepted: 09/27/2022] [Indexed: 11/10/2022]
Abstract
Target discovery and identification are driven by the increasing amount of biomedical data. The vast body of unstructured biomedical publications provides a rich source of knowledge for drug target discovery research and calls for specific algorithms or tools to facilitate finding disease genes and proteins. Text mining can automatically extract useful information related to drug target discovery from massive biomedical literature. However, there is a substantial lag between the publication of biomedical findings and the deposition of text-mined information into databases. Knowledge graphs have been introduced to integrate heterogeneous biomedical data. Here, we describe e-TSN (Target significance and novelty explorer, http://www.lilab-ecust.cn/etsn/), a knowledge visualization web server that integrates the largest database of target-disease associations from the full scientific literature, using significance and novelty scoring methods based on bibliometric statistics. The platform visualizes target-disease knowledge graphs to assist in prioritizing candidate disease-related proteins. Approved drugs and associated bioactivities for each target of interest are also provided to facilitate visualization of drug-target relationships. In summary, e-TSN is a fast and customizable visualization resource for investigating and analyzing intricate target-disease networks, which could help researchers understand the mechanisms underlying complex disease phenotypes and improve drug discovery and development efficiency, especially during unexpected outbreaks of infectious disease such as the COVID-19 pandemic.
Affiliation(s)
- Ziyan Feng
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Zihao Shen
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Honglin Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China; Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai 200062, China; Lingang Laboratory, Shanghai 200031, China
- Shiliang Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China; Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai 200062, China (corresponding author)
8
Gnilopyat S, DePietro PJ, Parry TK, McLaughlin WA. The Pharmacorank Search Tool for the Retrieval of Prioritized Protein Drug Targets and Drug Repositioning Candidates According to Selected Diseases. Biomolecules 2022; 12:1559. [PMID: 36358909 PMCID: PMC9687941 DOI: 10.3390/biom12111559] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/27/2022] [Revised: 10/19/2022] [Accepted: 10/22/2022] [Indexed: 08/13/2023]
Abstract
We present the Pharmacorank search tool as an objective means of obtaining prioritized protein drug targets and their associated medications for user-selected diseases. The tool could be used to obtain prioritized protein targets for the creation of novel medications or to predict novel indications for existing medications. To prioritize the proteins associated with each disease, a gene similarity profiling method based on protein functions is implemented. The priority scores of the proteins correlate well with the likelihood that the associated medications are clinically relevant in treating the disease. We plotted the protein priority scores against the percentage of protein targets known to bind medications currently indicated to treat the disease, a quantity we termed the pertinency score, and observed a strong correlation: the correlation coefficient was 0.9978 using a weighted second-order polynomial fit. Because this highly predictive fit was obtained across a broad range of diseases, we were able to identify a general pertinency-score threshold as a starting point for considering drug repositioning candidates. Several repositioning candidates are described for proteins with high predicted pertinency scores; these illustrate applications of the tool. We also describe focused reviews of repositioning candidates for Alzheimer's disease. The tool's URL, https://protein.som.geisinger.edu/Pharmacorank/, provides an open online interface for interactive use, and a separate site is available for programmatic access.
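A weighted second-order polynomial fit of the kind mentioned above can be sketched with weighted least squares: minimize the sum of w_i (y_i - (c0 + c1 x_i + c2 x_i^2))^2 by solving the 3x3 normal equations, then report a correlation coefficient as the square root of the weighted R^2. The abstract does not say how the weights were chosen or exactly how the coefficient was computed, so both choices here are assumptions, and the example data are synthetic.

```python
import math

def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def weighted_quadratic_fit(xs, ys, ws):
    """Weighted least-squares fit y ~ c0 + c1*x + c2*x**2; returns (coeffs, r)."""
    # Normal equations with basis (1, x, x^2):
    # sum_i w_i phi_j(x_i) phi_k(x_i) c_k = sum_i w_i phi_j(x_i) y_i
    basis = [[1.0, x, x * x] for x in xs]
    ata = [[sum(w * p[j] * p[k] for w, p in zip(ws, basis)) for k in range(3)]
           for j in range(3)]
    aty = [sum(w * p[j] * y for w, p, y in zip(ws, basis, ys)) for j in range(3)]
    coeffs = solve3(ata, aty)
    preds = [sum(c * b for c, b in zip(coeffs, p)) for p in basis]
    # Weighted coefficient of determination, then its square root as r.
    ybar = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    ss_res = sum(w * (y - f) ** 2 for w, y, f in zip(ws, ys, preds))
    ss_tot = sum(w * (y - ybar) ** 2 for w, y in zip(ws, ys))
    r_squared = 1.0 - ss_res / ss_tot
    return coeffs, math.sqrt(max(r_squared, 0.0))
```

Data lying exactly on a parabola return r = 1 regardless of the weights; noisy data yield r < 1, and the weights control which points the fit is allowed to miss.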
Affiliation(s)
- William A. McLaughlin
- Department of Medical Education, Geisinger Commonwealth School of Medicine, 525 Pine Street, Scranton, PA 18509, USA
9
Lazarczyk M, Duda K, Mickael ME, AK O, Paszkiewicz J, Kowalczyk A, Horbańczuk JO, Sacharczuk M. Adera2.0: A Drug Repurposing Workflow for Neuroimmunological Investigations Using Neural Networks. Molecules 2022; 27:6453. [PMID: 36234990 PMCID: PMC9571571 DOI: 10.3390/molecules27196453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Revised: 09/25/2022] [Accepted: 09/26/2022] [Indexed: 11/16/2022]
Abstract
Drug repurposing in the context of neuroimmunological (NI) investigations is still in its early stages. Drug repurposing is an important method that bypasses lengthy drug discovery procedures and focuses on discovering new uses for known medications. Neuroimmunological diseases, such as Alzheimer's disease, Parkinson's disease, multiple sclerosis, and depression, include various pathologies that result from interactions between the central nervous system and the immune system. However, the repurposing of NI medications is hindered by the vast amount of information that needs to be mined. We previously presented Adera1.0, which could text-mine PubMed to answer query-based questions. However, Adera1.0 could not automatically identify chemical compounds within relevant sentences. To address the need to repurpose known medications for neuroimmunological diseases, we built a deep neural network named Adera2.0 to perform drug repurposing. The workflow uses three deep learning networks. The first network is an encoder whose main task is to embed text into matrices. The second network uses a mean squared error (MSE) loss function to predict answers in the form of embedded matrices. The third network, which constitutes the main novelty of the updated workflow, also uses an MSE loss function; its main role is to extract compound names from the relevant sentences produced by the previous network. To optimize network performance, we compared eight different designs. We found that a deep neural network consisting of an RNN and a leaky ReLU could achieve a loss of 0.0001 and 67% sensitivity. Additionally, we validated Adera2.0's ability to predict NI drug usage against the Drug Repurposing Hub database. These results establish the ability of Adera2.0 to identify drug repurposing candidates, which can shorten the drug development cycle. The workflow can be downloaded online.
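The building blocks named in this abstract can be written out explicitly: an MSE loss comparing a predicted embedding matrix with a target one, the leaky ReLU activation, and sensitivity (recall) computed as TP / (TP + FN). The negative slope of 0.01 is a common default assumed here; Adera2.0's actual hyperparameters and layer layout are not given in the abstract, so this is only an illustration of the definitions.

```python
def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: identity for x >= 0, a small linear slope for x < 0."""
    return x if x >= 0 else negative_slope * x

def mse(pred, target):
    """Mean squared error between two equally shaped matrices (lists of rows)."""
    diffs = [(p - t) ** 2
             for prow, trow in zip(pred, target)
             for p, t in zip(prow, trow)]
    return sum(diffs) / len(diffs)

def sensitivity(true_labels, predicted_labels):
    """Sensitivity (recall) = TP / (TP + FN) for binary 0/1 labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0
```

A very small MSE on embedded matrices, such as the reported 0.0001, does not by itself guarantee that compound names are recovered, which is why a task-level metric like sensitivity is reported alongside it.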
Affiliation(s)
- Marzena Lazarczyk
- Department of Experimental Genomics, Institute of Genetics and Animal Biotechnology of the Polish Academy of Sciences, ul. Postepu 36A, Jastrzebiec, 05-552 Magdalenka, Poland
- Kamila Duda
- Centre for Preclinical Research and Technology, Department of Pharmacodynamics, Faculty of Pharmacy with the Laboratory Medicine Division, Medical University of Warsaw, Banacha 1B, 02-091 Warsaw, Poland
- Michel Edwar Mickael
- Department of Experimental Genomics, Institute of Genetics and Animal Biotechnology of the Polish Academy of Sciences, ul. Postepu 36A, Jastrzebiec, 05-552 Magdalenka, Poland
- PM Research Center, Väpnaregatan 22, 58649 Linköping, Sweden
- Correspondence: (M.E.M.); (M.S.)
- Onurhan AK
- Department of Sociology, Queen’s University at Kingston, 99 University Ave, Kingston, ON K7L 3N6, Canada
- Justyna Paszkiewicz
- Department of Health, John Paul II University of Applied Sciences in Biala Podlaska, Sidorska 95/97, 21-500 Biała Podlaska, Poland
- Agnieszka Kowalczyk
- Centre for Preclinical Research and Technology, Department of Pharmacodynamics, Faculty of Pharmacy with the Laboratory Medicine Division, Medical University of Warsaw, Banacha 1B, 02-091 Warsaw, Poland
- Jarosław Olav Horbańczuk
- Institute of Genetics and Animal Biotechnology of the Polish Academy of Sciences, ul. Postepu 36A, Jastrzebiec, 05-552 Magdalenka, Poland
- Mariusz Sacharczuk
- Department of Experimental Genomics, Institute of Genetics and Animal Biotechnology of the Polish Academy of Sciences, ul. Postepu 36A, Jastrzebiec, 05-552 Magdalenka, Poland
- Department of Pharmacodynamics, Faculty of Pharmacy with the Laboratory Medicine Division, Medical University of Warsaw, Banacha 1B, 02-091 Warsaw, Poland
- Correspondence: (M.E.M.); (M.S.)
10
Dlamini Z, Skepu A, Kim N, Mkhabele M, Khanyile R, Molefi T, Mbatha S, Setlai B, Mulaudzi T, Mabongo M, Bida M, Kgoebane-Maseko M, Mathabe K, Lockhat Z, Kgokolo M, Chauke-Malinga N, Ramagaga S, Hull R. AI and precision oncology in clinical cancer genomics: From prevention to targeted cancer therapies-an outcomes based patient care. Inform Med Unlocked 2022. [DOI: 10.1016/j.imu.2022.100965] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/11/2022]
11
Marchesin S, Silvello G. TBGA: a large-scale Gene-Disease Association dataset for Biomedical Relation Extraction. BMC Bioinformatics 2022; 23:111. [PMID: 35361129 PMCID: PMC8973894 DOI: 10.1186/s12859-022-04646-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/29/2021] [Accepted: 03/22/2022] [Indexed: 01/12/2023]
Abstract
Background: Databases are fundamental to advancing biomedical science. However, most of them are populated and updated with a great deal of human effort. Biomedical Relation Extraction (BioRE) aims to shift this burden to machines. Among its different applications, the discovery of Gene-Disease Associations (GDAs) is one of the most relevant BioRE tasks. Nevertheless, few resources have been developed to train models for GDA extraction, and these resources are all limited in size, which prevents models from scaling effectively to large amounts of data.
Results: To overcome this limitation, we exploited the DisGeNET database to build a large-scale, semi-automatically annotated dataset for GDA extraction. DisGeNET stores one of the largest available collections of genes and variants involved in human diseases. Relying on DisGeNET, we developed TBGA: a GDA extraction dataset generated from more than 700K publications that consists of over 200K instances and 100K gene-disease pairs. Each instance consists of the sentence from which the GDA was extracted, the corresponding GDA, and information about the gene-disease pair.
Conclusions: TBGA is among the largest datasets for GDA extraction. We evaluated state-of-the-art models for GDA extraction on TBGA, showing that it is a challenging and well-suited dataset for the task. We have made the dataset publicly available to foster the development of state-of-the-art BioRE models for GDA extraction.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04646-6.
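The abstract describes each TBGA instance as a sentence, the GDA it expresses, and information about the gene-disease pair. A hypothetical record type along those lines is sketched below, together with a simplified distant-supervision labeler that marks a sentence with an association label when both names occur in it and the pair is listed in a curated source. The field names, the exact-substring matching rule, and the "NA" fallback label are illustrative assumptions; the actual TBGA schema and DisGeNET-based annotation procedure may differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GDAInstance:
    """Hypothetical record: one sentence plus the gene-disease pair it mentions."""
    sentence: str
    gene: str
    disease: str
    relation: str  # association label, or "NA" when no known association

def label_sentences(sentences, gene_names, disease_names, known_associations):
    """Simplified distant supervision using exact, case-sensitive substring matches.

    known_associations: dict mapping (gene, disease) -> association label,
    standing in for a curated resource such as DisGeNET.
    """
    instances = []
    for sent in sentences:
        genes = [g for g in gene_names if g in sent]
        diseases = [d for d in disease_names if d in sent]
        # Every co-occurring gene-disease pair becomes one instance.
        for g in genes:
            for d in diseases:
                rel = known_associations.get((g, d), "NA")
                instances.append(GDAInstance(sent, g, d, rel))
    return instances
```

Labels produced this way are noisy, because a sentence can mention a curated pair without asserting the association; this is the usual trade-off of semi-automatic annotation in exchange for scale.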
Affiliation(s)
- Stefano Marchesin
- Department of Information Engineering, University of Padova, Padova, Italy
- Gianmaria Silvello
- Department of Information Engineering, University of Padova, Padova, Italy
12
Comparison of Text Mining Models for Food and Dietary Constituent Named-Entity Recognition. Mach Learn Knowl Extr 2022. [DOI: 10.3390/make4010012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
Biomedical Named-Entity Recognition (BioNER) has become an essential part of text mining because digital archives of biological and medical articles are continuously growing. While there are many well-performing BioNER tools for entities such as genes, proteins, diseases, or species, there is very little research on food and dietary constituent named-entity recognition. In this paper, we therefore study seven BioNER models for recognizing food and dietary constituents. Specifically, we study a dictionary-based model, a conditional random fields (CRF) model, and a new hybrid model combining the two, which we introduce as FooDCoNER (Food and Dietary Constituents Named-Entity Recognition). In addition, we study the deep language models BERT, BioBERT, RoBERTa, and ELECTRA. We find that FooDCoNER not only leads to the best overall results, comparable with those of the deep language models, but is also much more efficient with respect to run time and the amount of training data required. The latter was established by studying learning curves. Overall, our results not only provide a new tool for food and dietary constituent NER but also shed light on the differences between classical machine learning models and recent deep language models.
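Of the approaches compared above, the dictionary-based model is the simplest to illustrate. The sketch below performs greedy longest-match lookup of token sequences against a term dictionary and emits BIO tags. The dictionary entries and the "FOOD" label are hypothetical, and this is not the FooDCoNER implementation, which combines dictionary matching with a CRF.

```python
def dictionary_ner(tokens, dictionary, label="FOOD"):
    """Greedy longest-match dictionary tagger producing BIO tags.

    tokens: list of word tokens.
    dictionary: set of lower-cased terms, each stored as a tuple of tokens.
    """
    max_len = max((len(term) for term in dictionary), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = 0
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in dictionary:
                match = n
                break
        if match:
            tags[i] = "B-" + label
            for j in range(i + 1, i + match):
                tags[j] = "I-" + label
            i += match
        else:
            i += 1
    return tags

# Hypothetical dictionary of food and dietary-constituent terms.
FOOD_TERMS = {("olive", "oil"), ("olive",), ("vitamin", "c")}
```

For example, `dictionary_ner(["Olive", "oil", "contains", "vitamin", "C", "."], FOOD_TERMS)` tags "Olive oil" and "vitamin C" as two entities. Longest-match prevents "olive" alone from splitting "olive oil"; a statistical model such as a CRF is what handles terms absent from the dictionary.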
13
Popescu VB, Kanhaiya K, Năstac DI, Czeizler E, Petre I. Network controllability solutions for computational drug repurposing using genetic algorithms. Sci Rep 2022; 12:1437. [PMID: 35082323 PMCID: PMC8791995 DOI: 10.1038/s41598-022-05335-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/03/2021] [Accepted: 12/29/2021] [Indexed: 12/22/2022]
Abstract
Control theory has recently seen impactful applications in network science, especially in network medicine. A key research topic is finding minimal external interventions that offer control over the dynamics of a given network, a problem known as network controllability. In this article, we propose a new solution to this problem based on genetic algorithms. We tailor our solution for applications in computational drug repurposing, seeking to maximize its use of FDA-approved drug targets in a given disease-specific protein-protein interaction network. We demonstrate our algorithm on several cancer networks and on several random networks whose edges are distributed according to the Erdős-Rényi, Scale-Free, and Small World models. Overall, we show that our new algorithm is more efficient in identifying relevant drug targets in a disease network, advancing the computational solutions needed for new therapeutic and drug repurposing approaches.
Affiliation(s)
- Dumitru Iulian Năstac
- POLITEHNICA University of Bucharest, Faculty of Electronics, Telecommunications and Information Technology, 061071, Bucharest, Romania
- Eugen Czeizler
- Computer Science, Åbo Akademi University, 20500, Turku, Finland
- National Institute for Research and Development in Biological Sciences, 060031, Bucharest, Romania
- Ion Petre
- Department of Mathematics and Statistics, University of Turku, 20014, Turku, Finland.
- National Institute for Research and Development in Biological Sciences, 060031, Bucharest, Romania.
14
Common and Unique Genetic Background between Attention-Deficit/Hyperactivity Disorder and Excessive Body Weight. Genes (Basel) 2021; 12:genes12091407. [PMID: 34573389 PMCID: PMC8464917 DOI: 10.3390/genes12091407] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/24/2021] [Revised: 09/08/2021] [Accepted: 09/09/2021] [Indexed: 02/07/2023]
Abstract
Comorbidity studies show that children with ADHD have a higher risk of being overweight and obese than healthy children. This study aimed to assess the genetic alterations that differ between, and are shared by, ADHD and excessive body weight (EBW). The sample consisted of 743 Polish children aged between 6 and 17 years. We analyzed a unique set of genes and polymorphisms selected for ADHD and/or obesity based on gene prioritization tools. Polymorphisms in the KCNIP1, SLC1A3, MTHFR, ADRA2A, and SLC6A2 genes proved to be associated with the risk of ADHD in the studied population. The COMT gene polymorphism specifically increased the risk of EBW in the ADHD group. Using whole-exome sequencing, we showed that the ADHD group carries rare and protein-truncating variants in the FBXL17, DBH, MTHFR, PCDH7, RSPH3, SPTBN1, and TNRC6C genes. In turn, variants in the ADRA2A, DYNC1H1, MAP1A, SEMA6D, and ZNF536 genes were specific to ADHD with EBW. In this way, we confirmed, at the molecular level, the existence of genes specifically predisposing to EBW in ADHD patients, which are associated with biological pathways involved in the regulation of the reward system, the intestinal microbiome, and muscle metabolism.
15
Delmas M, Filangi O, Paulhe N, Vinson F, Duperier C, Garrier W, Saunier PE, Pitarch Y, Jourdan F, Giacomoni F, Frainay C. FORUM: Building a Knowledge Graph from public databases and scientific literature to extract associations between chemicals and diseases. Bioinformatics 2021; 37:3896-3904. [PMID: 34478489 PMCID: PMC8570811 DOI: 10.1093/bioinformatics/btab627] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/02/2021] [Revised: 08/16/2021] [Accepted: 09/01/2021] [Indexed: 11/22/2022]
Abstract
Motivation Metabolomics studies aim at reporting a metabolic signature (list of metabolites) related to a particular experimental condition. These signatures are instrumental in the identification of biomarkers or classification of individuals; however, their biological and physiological interpretation remains a challenge. To support this task, we introduce FORUM: a Knowledge Graph (KG) providing a semantic representation of relations between chemicals and biomedical concepts, built from a federation of life science databases and scientific literature repositories. Results The use of a Semantic Web framework on biological data allows us to apply ontology-based reasoning to infer new relations between entities. We show that these new relations provide different levels of abstraction and could open the path to new hypotheses. We estimate the statistical relevance of each extracted relation, explicit or inferred, using an enrichment analysis, and instantiate them as new knowledge in the KG to support result interpretation and further inquiries. Availability and implementation A web interface to browse and download the extracted relations, as well as a SPARQL endpoint to directly probe the whole FORUM KG, are available at https://forum-webapp.semantic-metabolomics.fr. The code needed to reproduce the triplestore is available at https://github.com/eMetaboHUB/Forum-DiseasesChem. Supplementary information Supplementary data are available at Bioinformatics online.
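The abstract does not spell out which enrichment test is used. One common choice for literature co-occurrence is a one-sided hypergeometric test; the sketch below assumes that choice and uses made-up counts, so treat it as an illustration rather than FORUM's implementation. In the hypothetical `hypergeom_sf` helper, `N` is the number of articles, `K` those mentioning the chemical, `n` those mentioning the disease, and `k` those mentioning both.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """One-sided enrichment p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total articles; K: articles mentioning the chemical;
    n: articles mentioning the disease; k: articles mentioning both.
    """
    upper = min(K, n)
    if k > upper:
        return 0.0
    # Sum the probability mass from the observed overlap up to the largest possible overlap.
    # math.comb returns 0 when the lower argument exceeds the upper, so impossible terms vanish.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical counts: 10,000 articles, chemical in 40, disease in 200, 8 co-mentions.
    p = hypergeom_sf(k=8, N=10_000, K=40, n=200)
    print(f"p = {p:.3g}")
```

When many chemical-disease pairs are tested at once, the resulting p-values would also need a multiple-testing correction (for example Benjamini-Hochberg); that step is omitted here.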
Affiliation(s)
- M Delmas
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, 31300, France
- O Filangi
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, Le Rheu, 35653, France
- N Paulhe
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, F-63000, France
- F Vinson
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, 31300, France
- C Duperier
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, F-63000, France
- W Garrier
- ISIMA, Campus des Cézeaux, Aubière, 63177, France
- P-E Saunier
- ISIMA, Campus des Cézeaux, Aubière, 63177, France
- Y Pitarch
- IRIT, Université de Toulouse, Cours Rose Dieng-Kuntz, Toulouse, 31400, France
- F Jourdan
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, 31300, France
- F Giacomoni
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, F-63000, France
- C Frainay
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, 31300, France
16
Yao R, Ianevski A, Kainov D. Safe-in-Man Broad Spectrum Antiviral Agents. Advances in Experimental Medicine and Biology 2021; 1322:313-337. [PMID: 34258746 DOI: 10.1007/978-981-16-0267-2_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2023]
Abstract
Emerging and re-emerging viral diseases occur with regularity within the human population. The conventional 'one drug, one virus' paradigm for antivirals does not adequately allow for proper preparedness in the face of unknown future epidemics. In addition, drug developers lack the financial incentives to work on antiviral drug discovery, with most pharmaceutical companies choosing to focus on more profitable disease areas. Safe-in-man broad spectrum antiviral agents (BSAAs) can help meet the need for antiviral development by already having passed phase I clinical trials, requiring less time and money to develop, and having the capacity to work against many viruses, allowing for a speedy response when unforeseen epidemics arise. In this chapter, we discuss the benefits of repurposing existing drugs as BSAAs, describe the major steps in safe-in-man BSAA drug development from discovery through clinical trials, and list several database resources that are useful tools for antiviral drug repositioning.
Affiliation(s)
- Rouan Yao
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Aleksandr Ianevski
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Denis Kainov
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), Trondheim, Norway.
- Institute of Technology, University of Tartu, Tartu, Estonia.
- Institute for Molecular Medicine Finland, FIMM, University of Helsinki, Helsinki, Finland.
17
Birgmeier J, Haeussler M, Deisseroth CA, Steinberg EH, Jagadeesh KA, Ratner AJ, Guturu H, Wenger AM, Diekhans ME, Stenson PD, Cooper DN, Ré C, Beggs AH, Bernstein JA, Bejerano G. AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature. Sci Transl Med 2020; 12(544):eaau9113. [PMID: 32434849 DOI: 10.1126/scitranslmed.aau9113] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 07/30/2018] [Revised: 08/14/2019] [Accepted: 04/22/2020] [Indexed: 12/21/2022]
Abstract
The diagnosis of Mendelian disorders requires labor-intensive literature research. Trained clinicians can spend hours looking for the right publication(s) supporting a single gene that best explains a patient's disease. AMELIE (Automatic Mendelian Literature Evaluation) greatly accelerates this process. AMELIE parses all 29 million PubMed abstracts and downloads and further parses hundreds of thousands of full-text articles in search of information supporting the causality and associated phenotypes of most published genetic variants. AMELIE then prioritizes patient candidate variants for their likelihood of explaining any patient's given set of phenotypes. Diagnosis of singleton patients (without relatives' exomes) is the most time-consuming scenario, and AMELIE ranked the causative gene at the very top for 66% of 215 diagnosed singleton Mendelian patients from the Deciphering Developmental Disorders project. Evaluating only the top 11 AMELIE-scored genes of 127 (median) candidate genes per patient resulted in a rapid diagnosis in more than 90% of cases. AMELIE-based evaluation of all cases was 3 to 19 times more efficient than hand-curated database-based approaches. We replicated these results on a retrospective cohort of clinical cases from Stanford Children's Health and the Manton Center for Orphan Disease Research. An analysis web portal with our most recent update, programmatic interface, and code is available at AMELIE.stanford.edu.
Affiliation(s)
- Johannes Birgmeier
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Maximilian Haeussler
- Santa Cruz Genomics Institute, MS CBSE, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Cole A Deisseroth
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Ethan H Steinberg
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Karthik A Jagadeesh
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Alexander J Ratner
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Harendra Guturu
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA 94305, USA
- Aaron M Wenger
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA 94305, USA
- Mark E Diekhans
- Santa Cruz Genomics Institute, MS CBSE, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Peter D Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, UK
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, UK
- Christopher Ré
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Alan H Beggs
- Manton Center for Orphan Disease Research, Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Gill Bejerano
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA 94305, USA
- Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
18
Suresh NT, Ravindran VE, Krishnakumar U. A Computational Framework to Identify Cross Association Between Complex Disorders by Protein-protein Interaction Network Analysis. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200724145434] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Objective: Numerous complex disorders do not occur in isolation, which suggests a set of shared causes common to several different diseases. Hence, comorbidity analysis can be used to explore the associations between disorders. In this study, we propose a network-based computational approach in which genes are organized according to the topological characteristics of a constructed Protein-Protein Interaction Network (PPIN), followed by a network prioritization scheme, to identify distinctive key genes and biological pathways shared among diseases.
Methods: The approach starts from a PPIN constructed from the genes of a randomly chosen disease and infers that disease's associations with other diseases in terms of shared pathways, coexpression, co-occurrence, etc. First, the proteins associated with a randomly chosen disease were identified. Second, the PPIN was organized through topological analysis to define hub genes. Finally, a prioritization algorithm generated a ranked list of newly predicted multimorbidity-associated proteins. The cellular pathways involving these multimorbidity-associated proteins were then mined using Gene Ontology (GO).
Result and Conclusion: The methodology was tested on three disorders, namely diabetes, obesity, and blood pressure, at an atomic level, and the results suggest comorbidity with other complex diseases that share proteins and pathways with the disease under study. For diabetes, we obtained key genes such as GAPDH, TNF, IL6, AKT1, ALB, TP53, IL10, MAPK3, TLR4, and EGF, with key pathways such as the P53 pathway, VEGF signaling pathway, Ras pathway, interleukin signaling pathway, endothelin signaling pathway, Huntington disease, etc. Studies on the other disorders, obesity and blood pressure, also revealed promising results.
Affiliation(s)
- Nikhila T. Suresh
- Department of Computer Science and IT, Amrita School of Arts and Sciences, Amrita Vishwa Vidyapeetham, Kochi Campus, Kochi, India
- Vimina E. Ravindran
- Department of Computer Science and IT, Amrita School of Arts and Sciences, Amrita Vishwa Vidyapeetham, Kochi Campus, Kochi, India
- Ullattil Krishnakumar
- Department of Computer Science and IT, Amrita School of Arts and Sciences, Amrita Vishwa Vidyapeetham, Kochi Campus, Kochi, India
19
Muthubharathi BC, Gowripriya T, Balamurugan K. Metabolomics: small molecules that matter more. Mol Omics 2021; 17:210-229. [PMID: 33598670 DOI: 10.1039/d0mo00176g] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Indexed: 12/14/2022]
Abstract
Metabolomics, an analytical approach based on high-throughput profiling, helps to understand interactions within a biological system. Small molecules called metabolites, with masses below 1500 Da, reflect the status of a biological system in a distinct way. There is currently a need to analyze the metabolome, and the pathways involved in both healthy and diseased conditions, globally for possible therapeutic applications. Metabolome analysis has revealed high-abundance molecules under different conditions such as diet, environmental stress, microbiota, and disease and treatment states. As a result, it is hard to understand the complete and stable network of metabolites of a biological system. This review introduces the available techniques for studying metabolomics alongside other major omics such as genomics, transcriptomics, and proteomics. It also discusses metabolomics in various pathological conditions and its importance in therapeutic applications.
20
Protein-Protein Interaction Analysis through Network Topology (Oral Cancer). Journal of Healthcare Engineering 2021; 2021:6623904. [PMID: 33510888 PMCID: PMC7826244 DOI: 10.1155/2021/6623904] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/04/2020] [Revised: 12/21/2020] [Accepted: 12/23/2020] [Indexed: 11/18/2022]
Abstract
Oral cancer is a complex disorder. Its development and spread result from the interaction of several proteins and genes in different biological pathways. Many high-throughput methods have been used to study biological pathways, yet efforts to merge data obtained at separate levels into biological pathways and interaction networks remain elusive. In our research work, we propose a protein-protein interaction (PPI) network technique for analyzing and exploring the genes involved in oral cancer. Previous studies have not fully analyzed the proteins or genes involved in oral cancer. Our proposed technique is fully interactive and analyzes oral cancer data more accurately and efficiently. The methods used here enabled us to observe a wide network consisting of one large network of 208 nodes connected by 1572 edges, together with various small detached networks. In our study, TP53 occupied an important position in the network: it has a degree of 113 and a betweenness centrality (BC) of 0.03881821, indicating that TP53 is centrally located in the network and is a significant bottleneck protein in the oral cancer PPI network. These findings suggest that the pathogenesis of oral cancer is organized through an integrated PPI network centered on TP53. Furthermore, our results identify TP53 as the key protein in the oral cancer network and determine its significance in the body's cellular networks. Because TP53 (tumor protein 53) is a vital player in the cell division process, helping to keep cells from growing or dividing in a disordered way, it fulfills the function of at least one of the gene groups in oral cancer. Although further progress in this area is needed, the intention of developing these networks is to transform the picture of core disease development, prognosis, and treatment.
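The degree and betweenness centrality values quoted for TP53 are standard network measures. The sketch below computes both for a small hypothetical graph, using Brandes' algorithm for betweenness and normalizing by the number of node pairs that exclude the node itself, (n−1)(n−2)/2 for an undirected graph. The abstract does not state which tool or normalization the authors used, so that choice, the toy edges, and the helper names are assumptions.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Number of interaction partners of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Normalized betweenness centrality (Brandes' algorithm, unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma) and predecessors.
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    if n <= 2:
        return dict.fromkeys(adj, 0.0)
    # Each unordered pair was counted once from each endpoint, so divide by (n-1)(n-2), not /2.
    return {v: b / ((n - 1) * (n - 2)) for v, b in bc.items()}


if __name__ == "__main__":
    # Hypothetical toy network: "HUB" bridges two small groups of proteins.
    g = build_graph([("HUB", "A"), ("HUB", "B"), ("A", "B"), ("HUB", "C"), ("C", "D")])
    print(degree(g))
    print(betweenness(g))
```

A node with high degree is a hub, whereas a node with high betweenness sits on many shortest paths between other nodes, which is why a "bottleneck" protein is usually identified from the second measure.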
21
Taha K, Davuluri R, Yoo P, Spencer J. Personizing the prediction of future susceptibility to a specific disease. PLoS One 2021; 16:e0243127. [PMID: 33406077 PMCID: PMC7787538 DOI: 10.1371/journal.pone.0243127] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/25/2019] [Accepted: 11/17/2020] [Indexed: 01/22/2023]
Abstract
A traceable biomarker is a member of a disease's molecular pathway. A disease may be associated with several molecular pathways. Each combination of these molecular pathways to which detected traceable biomarkers belong may indicate the onset of the disease at a different time frame in the future. Based on this notion, we introduce a novel methodology for personalizing an individual's degree of future susceptibility to a specific disease. We implemented the methodology in a working system called Susceptibility Degree to a Disease Predictor (SDDP). For a specific disease d, let S be the set of molecular pathways to which the traceable biomarkers detected in most patients with d belong. For the same disease d, let S' be the set of molecular pathways to which the traceable biomarkers detected in a certain individual belong. SDDP is able to infer the subset S'' ⊆ {S − S'} of undetected molecular pathways for the individual. Thus, SDDP can infer a disease's undetected molecular pathways for an individual from the few molecular pathways detected in that individual. SDDP can also help infer the combination of molecular pathways in the set {S' + S''} whose traceable biomarkers are collectively indicative of the disease. SDDP is composed of four components: an information extractor, an interrelationship between molecular pathways modeler, a logic inferencer, and a risk indicator. The information extractor takes advantage of the exponential growth of the biomedical literature to automatically extract the common traceable biomarkers for a specific disease. The interrelationship between molecular pathways modeler models the hierarchical interrelationships between the molecular pathways of the traceable biomarkers. The logic inferencer transforms these hierarchical interrelationships into rule-based specifications. It employs the specification rules and the inference rules of predicate logic to infer as many of the disease's undetected molecular pathways for the individual as possible. The risk indicator outputs a value that reflects the individual's degree of future susceptibility to the disease. We evaluated SDDP by comparing it experimentally with other methods; the results revealed marked improvement.
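The abstract describes inferring the subset S'' ⊆ {S − S'} by applying rules derived from pathway hierarchies. A minimal, hypothetical illustration is forward chaining over Horn-style rules of the form "if all premise pathways are present, the conclusion pathway is implied". The rule format, the pathway names, and the `infer_undetected` helper are assumptions, not SDDP's actual rule set or inference engine.

```python
def infer_undetected(S, S_detected, rules):
    """Return the pathways in S - S_detected implied by the detected ones.

    rules: iterable of (premises, conclusion) pairs, read as
    "if every premise pathway is present, the conclusion pathway is present".
    """
    known = set(S_detected)
    changed = True
    # Forward chaining: keep firing rules until no new pathway is added (a fixed point).
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    # Keep only pathways that belong to the disease-level set S but were not detected.
    return known & (set(S) - set(S_detected))


if __name__ == "__main__":
    S = {"P1", "P2", "P3", "P4"}  # hypothetical disease-level pathways
    S_prime = {"P1"}              # hypothetical pathways detected in the individual
    rules = [(("P1",), "P2"), (("P1", "P2"), "P3"), (("P5",), "P4")]
    print(sorted(infer_undetected(S, S_prime, rules)))  # → ['P2', 'P3']
```

Here P4 stays undetected because its only rule depends on P5, which is neither detected nor derivable; chained rules such as P1 → P2 → P3 show why the loop repeats until a fixed point is reached.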
Affiliation(s)
- Kamal Taha
- Department of Electrical and Computer Science, Khalifa University, Abu Dhabi, UAE
- Ramana Davuluri
- Department of Biomedical Informatics, School of Medicine and College of Engineering and Applied Sciences, Stony Brook University, Stony Brook, New York, United States of America
- Paul Yoo
- Department of Computer Science & Information Systems, University of London, Birkbeck College, London, United Kingdom
- Jesse Spencer
- Department of Pathology, University of Utah, Salt Lake City, Utah, United States of America
22
Perera N, Dehmer M, Emmert-Streib F. Named Entity Recognition and Relation Detection for Biomedical Information Extraction. Front Cell Dev Biol 2020; 8:673. [PMID: 32984300 PMCID: PMC7485218 DOI: 10.3389/fcell.2020.00673] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 12/12/2019] [Accepted: 07/02/2020] [Indexed: 12/29/2022]
Abstract
The number of scientific publications, which contain our knowledge in the biomedical, health, and clinical sciences, is steadily growing. Since there is currently no automatic archiving of the obtained results, much of this information remains buried in textual details not readily available for further use or analysis. For this reason, natural language processing (NLP) and text mining methods are used to extract information from such publications. In this paper, we review practices for Named Entity Recognition (NER) and Relation Detection (RD), which allow, e.g., the identification of interactions between proteins and drugs or between genes and diseases. This information can be integrated into networks that summarize large-scale details on a particular biomedical or clinical problem, which are then amenable to easy data management and further analysis. Furthermore, we survey novel deep learning methods that have recently been introduced for such tasks.
Affiliation(s)
- Nadeesha Perera
- Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
- Matthias Dehmer
- Department of Mechatronics and Biomedical Computer Science, University for Health Sciences, Medical Informatics and Technology (UMIT), Hall in Tirol, Austria
- College of Artificial Intelligence, Nankai University, Tianjin, China
- Frank Emmert-Streib
- Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
- Faculty of Medicine and Health Technology, Institute of Biosciences and Medical Technology, Tampere University, Tampere, Finland
23
Jarada TN, Rokne JG, Alhajj R. A review of computational drug repositioning: strategies, approaches, opportunities, challenges, and directions. J Cheminform 2020; 12:46. [PMID: 33431024 PMCID: PMC7374666 DOI: 10.1186/s13321-020-00450-7] [Citation(s) in RCA: 139] [Impact Index Per Article: 34.8] [Received: 02/19/2020] [Accepted: 07/13/2020] [Indexed: 01/13/2023]
Abstract
Drug repositioning is the process of identifying novel therapeutic potentials for existing drugs and discovering therapies for untreated diseases. Drug repositioning, therefore, plays an important role in optimizing the pre-clinical process of developing novel drugs by saving time and cost compared to the traditional de novo drug discovery processes. Since drug repositioning relies on data for existing drugs and diseases, the enormous growth of publicly available large-scale biological, biomedical, and electronic health-related data, along with high-performance computing capabilities, has accelerated the development of computational drug repositioning approaches. Multidisciplinary researchers and scientists have carried out numerous attempts, with different degrees of efficiency and success, to computationally study the potential of repositioning drugs to identify alternative drug indications. This study reviews recent advancements in the field of computational drug repositioning. First, we highlight different drug repositioning strategies and provide an overview of frequently used resources. Second, we summarize computational approaches that are extensively used in drug repositioning studies. Third, we present different computing and experimental models to validate computational methods. Fourth, we address prospective opportunities, including a few target areas. Finally, we discuss challenges and limitations encountered in computational drug repositioning and conclude with an outline of further research directions.
Affiliation(s)
- Tamer N Jarada
- Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
- Jon G Rokne
- Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
- Reda Alhajj
- Department of Computer Science, University of Calgary, Calgary, Alberta, Canada.
- Department of Computer Engineering, Istanbul Medipol University, Istanbul, Turkey.
24
Suresh NT, E R V, U K. Multi-scale top-down approach for modelling epileptic protein-protein interaction network analysis to identify driver nodes and pathways. Comput Biol Chem 2020; 88:107323. [PMID: 32653778 DOI: 10.1016/j.compbiolchem.2020.107323] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/29/2019] [Revised: 06/04/2020] [Accepted: 06/23/2020] [Indexed: 12/23/2022]
Abstract
Protein-Protein Interaction Network (PPIN) analysis unveils the molecular-level mechanisms involved in disease conditions. To explore the complex regulatory mechanisms behind epilepsy and to address its clinical and biological issues, in silico techniques offer a feasible and cost-effective option. In this work, a hierarchical procedure to identify influential genes and regulatory pathways in epilepsy prognosis is proposed. To obtain the key genes and pathways causing epilepsy, two benchmark datasets devoted exclusively to complex disorders are first integrated. Using the STRING database, a PPIN is constructed to model protein-protein interactions. Key interactions are then obtained from the established PPIN using network centrality measures followed by a network propagation algorithm, Random Walk with Restart (RWR). The method reveals several influential genes behind epilepsy prognosis, along with their associated pathways, such as PI3 kinase, VEGF signaling, Ras, and Wnt signaling. Compared with similar works, our results show improvement in identifying unique molecular functions, biological processes, gene co-occurrences, etc. In addition, CORUM annotations of human protein complexes show approximately 60% similarity with the obtained result. We believe that the formulated strategy can put-up the vast consideration of indigenous drugs towards meticulous identification of genes encoded by protein against several combinatorial disorders.
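Random Walk with Restart propagates scores from seed genes over the network: at each step, a walker either moves to a random neighbor or, with restart probability r, jumps back to a seed. The sketch below iterates p ← (1 − r)·W·p + r·p0 until the scores stop changing, where W spreads each node's score evenly over its neighbors. The restart value of 0.3, the toy graph, and the `random_walk_with_restart` helper are assumptions, since the abstract does not report the authors' parameters or implementation.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state RWR scores on an undirected graph given as {node: set(neighbors)}.

    Assumes every node has at least one neighbor and seeds is a non-empty subset of nodes.
    """
    # Restart distribution p0: uniform over the seed genes.
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in adj}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: with probability `restart` the walker returns to a seed.
        nxt = {v: restart * p0[v] for v in adj}
        # Walk term: each node passes its remaining score evenly to its neighbors.
        for u, nbrs in adj.items():
            share = (1.0 - restart) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        diff = sum(abs(nxt[v] - p[v]) for v in adj)
        p = nxt
        if diff < tol:
            break
    return p


if __name__ == "__main__":
    # Hypothetical path graph A - B - C with A as the single seed gene.
    adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
    scores = random_walk_with_restart(adj, seeds={"A"})
    for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(gene, round(score, 4))
```

Because every step keeps the total score at 1 and shrinks differences by a factor of (1 − r), the iteration converges for any connected graph; nodes that rank highly besides the seeds are the kind of candidates such a pipeline would report as influential.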
Affiliation(s)
- Nikhila T Suresh
- Department of Computer Science and IT, Amrita School of Arts and Sciences, Amrita Vishwa Vidyapeetham, Kochi Campus, India
- Vimina E R
- Department of Computer Science and IT, Amrita School of Arts and Sciences, Amrita Vishwa Vidyapeetham, Kochi Campus, India.
- Krishnakumar U
- Department of Computer Science and IT, Amrita School of Arts and Sciences, Amrita Vishwa Vidyapeetham, Kochi Campus, India
25
Foroutan A, Fitzsimmons C, Mandal R, Piri-Moghadam H, Zheng J, Guo A, Li C, Guan LL, Wishart DS. The Bovine Metabolome. Metabolites 2020; 10:metabo10060233. [PMID: 32517015 PMCID: PMC7345087 DOI: 10.3390/metabo10060233] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 05/08/2020] [Revised: 06/01/2020] [Accepted: 06/02/2020] [Indexed: 01/17/2023]
Abstract
From an animal health perspective, relatively little is known about the typical or healthy ranges of concentrations for many metabolites in bovine biofluids and tissues. Here, we describe the results of a comprehensive, quantitative metabolomic characterization of six bovine biofluids and tissues, including serum, ruminal fluid, liver, Longissimus thoracis (LT) muscle, semimembranosus (SM) muscle, and testis tissues. Using nuclear magnetic resonance (NMR) spectroscopy, liquid chromatography–tandem mass spectrometry (LC–MS/MS), and inductively coupled plasma–mass spectrometry (ICP–MS), we were able to identify and quantify more than 145 metabolites in each of these biofluids/tissues. Combining these results with previous work done by our team on other bovine biofluids, as well as previously published literature values for other bovine tissues and biofluids, we were able to generate quantitative reference concentration data for 2100 unique metabolites across five different bovine biofluids and seven different tissues. These experimental data were combined with computer-aided, genome-scale metabolite inference techniques to add another 48,628 unique metabolites that are biochemically expected to be in bovine tissues or biofluids. Altogether, 51,801 unique metabolites were identified in this study. Detailed information on these 51,801 unique metabolites has been placed in a publicly available database called the Bovine Metabolome Database.
Affiliation(s)
- Aidin Foroutan
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2P5, Canada
  - Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Carolyn Fitzsimmons
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2P5, Canada
  - Agriculture and Agri-Food Canada, Edmonton, AB T6G 2P5, Canada
- Rupasri Mandal
  - Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Hamed Piri-Moghadam
  - Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Jiamin Zheng
  - Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- AnChi Guo
  - Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Carin Li
  - Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Le Luo Guan
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2P5, Canada
- David S. Wishart
  - Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
  - Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
26
Cabrera-Andrade A, López-Cortés A, Jaramillo-Koupermann G, Paz-y-Miño C, Pérez-Castillo Y, Munteanu CR, González-Díaz H, Pazos A, Tejera E. Gene Prioritization through Consensus Strategy, Enrichment Methodologies Analysis, and Networking for Osteosarcoma Pathogenesis. Int J Mol Sci 2020; 21:E1053. [PMID: 32033398 PMCID: PMC7038221 DOI: 10.3390/ijms21031053] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/10/2019] [Revised: 01/30/2020] [Accepted: 01/30/2020] [Indexed: 12/12/2022]
Abstract
Osteosarcoma is the most common subtype of primary bone cancer and mostly affects adolescents. In recent years, several studies have focused on elucidating the molecular mechanisms of this sarcoma; however, its molecular etiology has still not been determined with precision. We therefore applied a consensus strategy using several bioinformatics tools to prioritize genes involved in its pathogenesis. Subsequently, we assessed the physical interactions of the selected genes and applied a communality analysis to this protein-protein interaction network. The consensus strategy prioritized a total of 553 genes. Our enrichment analysis corroborates several studies that describe the PI3K/AKT and MAPK/ERK signaling pathways as pathogenic. The gene ontology analysis described TP53 as a principal signal transducer that chiefly mediates processes associated with the cell cycle and the DNA damage response. Interestingly, the communality analysis clusters several members involved in metastasis events, such as MMP2 and MMP9, and genes associated with DNA repair complexes, like ATM, ATR, CHEK1, and RAD51. In this study, we have identified well-known pathogenic genes for osteosarcoma and prioritized genes that need to be further explored.
Affiliation(s)
- Alejandro Cabrera-Andrade
  - Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito 170125, Ecuador
  - Carrera de Enfermería, Facultad de Ciencias de la Salud, Universidad de Las Américas, Quito 170125, Ecuador
  - RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruña, 15071 A Coruña, Spain
- Andrés López-Cortés
  - RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruña, 15071 A Coruña, Spain
  - Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito 170129, Ecuador
- Gabriela Jaramillo-Koupermann
  - Laboratorio de Biología Molecular, Subproceso de Anatomía Patológica, Hospital de Especialidades Eugenio Espejo, Quito 170403, Ecuador
- César Paz-y-Miño
  - Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito 170129, Ecuador
- Yunierkis Pérez-Castillo
  - Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito 170125, Ecuador
  - Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito 170125, Ecuador
- Cristian R. Munteanu
  - RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruña, 15071 A Coruña, Spain
  - Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006 A Coruña, Spain
  - Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), Campus de Elviña s/n, 15071 A Coruña, Spain
- Humbert González-Díaz
  - Department of Organic Chemistry II, University of the Basque Country UPV/EHU, 48940 Leioa, Spain
  - IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
- Alejandro Pazos
  - RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruña, 15071 A Coruña, Spain
  - Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006 A Coruña, Spain
  - Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), Campus de Elviña s/n, 15071 A Coruña, Spain
- Eduardo Tejera
  - Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito 170125, Ecuador
  - Facultad de Ingeniería y Ciencias Agropecuarias, Universidad de Las Américas, Quito 170125, Ecuador
27
Boland MR, Kashyap A, Xiong J, Holmes J, Lorch S. Development and validation of the PEPPER framework (Prenatal Exposure PubMed ParsER) with applications to food additives. J Am Med Inform Assoc 2018; 25:1432-1443. [PMID: 30371821 PMCID: PMC6213088 DOI: 10.1093/jamia/ocy119] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/18/2018] [Accepted: 08/13/2018] [Indexed: 11/14/2022]
Abstract
Background: Globally, 36% of deaths among children can be attributed to environmental factors. However, no comprehensive list of environmental exposures exists. We seek to address this gap by developing a literature-mining algorithm to catalog prenatal environmental exposures. Methods: We designed a framework called PEPPER (Prenatal Exposure PubMed ParsER) to (a) catalog prenatal exposures studied in the literature and (b) identify study type. Using PubMed Central, PEPPER classifies article type (methodology, systematic review) and catalogs prenatal exposures. We coupled PEPPER with the FDA's food additive database to form a master set of exposures. Results: We found that of 31 764 prenatal exposure studies, only 53.0% were methodology studies. PEPPER consists of 219 prenatal exposures, including a common set of 43 exposures. PEPPER captured prenatal exposures from 56.4% of methodology studies (9492/16 832 studies). Two raters independently reviewed 50 randomly selected articles and annotated the presence of exposures and the study methodology type. Error rates for PEPPER's exposure assignment ranged from 0.56% to 1.30%, depending on the rater. Evaluation of the study type assignment showed agreement ranging from 96% to 100% (kappa = 0.909, p < .001). Using a gold-standard set of relevant prenatal exposure studies, PEPPER achieved a recall of 94.4%. Conclusions: Using curated exposures and food additives, PEPPER provides the first comprehensive list of 219 prenatal exposures studied in methodology papers. On average, 1.45 exposures were investigated per study. PEPPER successfully distinguished article type for all prenatal studies, allowing literature gaps to be easily identified.
Affiliation(s)
- Mary Regina Boland
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
  - Center for Excellence in Environmental Toxicology, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Aditya Kashyap
  - Data Science Masters Program, University of Pennsylvania, Philadelphia, PA, USA
- Jiadi Xiong
  - Data Science Masters Program, University of Pennsylvania, Philadelphia, PA, USA
- John Holmes
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Scott Lorch
  - Division of Neonatology, Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
28
Chaganti S, Welty VF, Taylor W, Albert K, Failla MD, Cascio C, Smith S, Mawn L, Resnick SM, Beason-Held LL, Bagnato F, Lasko T, Blume JD, Landman BA. Discovering novel disease comorbidities using electronic medical records. PLoS One 2019; 14:e0225495. [PMID: 31774837 PMCID: PMC6880990 DOI: 10.1371/journal.pone.0225495] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 06/07/2019] [Accepted: 09/22/2019] [Indexed: 11/18/2022]
Abstract
Increasing reliance on electronic medical records (EMRs) at large medical centers provides unique opportunities to perform population-level analyses exploring disease progression and etiology. The massive accumulation of diagnostic, procedure, and laboratory codes in one place has enabled the exploration of co-occurring conditions, their risk factors, and potential prognostic factors. While most of the readily identifiable associations in medical records are (now) well known to the scientific community, there are no doubt many more relationships still to be uncovered in EMR data. In this paper, we introduce a novel finding index to help with that task. This new index uses data mined from real-time PubMed abstracts to indicate the extent to which empirically discovered associations are already known (i.e., present in the scientific literature). Our methods leverage second-generation p-values, which better identify associations that are truly clinically meaningful. We illustrate our new method with three examples: Autism Spectrum Disorder, Alzheimer's Disease, and Optic Neuritis. Our results demonstrate wide utility for identifying the new associations in EMR data that have the highest priority among the complex web of correlations and causalities. Data scientists and clinicians can work together more effectively to discover novel associations that are both empirically reliable and clinically understudied.
Affiliation(s)
- Shikha Chaganti
  - Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, United States of America
- Valerie F. Welty
  - Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, United States of America
- Warren Taylor
  - Department of Psychiatry & Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Kimberly Albert
  - Department of Psychiatry & Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Michelle D. Failla
  - Department of Psychiatry & Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Carissa Cascio
  - Department of Psychiatry & Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Seth Smith
  - Department of Radiology and Radiological Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Louise Mawn
  - Department of Ophthalmology and Visual Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Susan M. Resnick
  - Laboratory of Behavioral Neuroscience, National Institute on Aging, Baltimore, Maryland, United States of America
- Lori L. Beason-Held
  - Laboratory of Behavioral Neuroscience, National Institute on Aging, Baltimore, Maryland, United States of America
- Francesca Bagnato
  - Department of Neurology, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Thomas Lasko
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Jeffrey D. Blume
  - Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, United States of America
- Bennett A. Landman
  - Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, United States of America
29
Kumar R, Harilal S, Gupta SV, Jose J, Thomas Parambi DG, Uddin MS, Shah MA, Mathew B. Exploring the new horizons of drug repurposing: A vital tool for turning hard work into smart work. Eur J Med Chem 2019; 182:111602. [PMID: 31421629 PMCID: PMC7127402 DOI: 10.1016/j.ejmech.2019.111602] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 06/25/2019] [Revised: 08/07/2019] [Accepted: 08/07/2019] [Indexed: 02/07/2023]
Abstract
Drug discovery and development are long and financially taxing processes. On average, successful drug discovery and approval for clinical use take 12-15 years and cost 1.2 billion USD. Many lead molecules are not developed further, and their potential is not fully tapped owing to a lack of resources or time constraints. For a drug to be approved by the FDA for clinical use, it must have excellent therapeutic potential in the intended target area with minimal toxicity, as supported by both preclinical and clinical studies. These targeted clinical evaluations fail to explore other potential therapeutic applications of the candidate drug. Drug repurposing, or repositioning, is a fast and relatively inexpensive alternative to lengthy and expensive de novo drug discovery and development. Drug repositioning uses already available clinical trial data on toxicity and adverse effects while exploring the drug's therapeutic potential for a different disease. This review addresses recent developments in, and the future scope of, the drug repositioning strategy.
Affiliation(s)
- Rajesh Kumar
  - Department of Pharmacy, Kerala University of Health Sciences, Thrissur, Kerala, India
- Seetha Harilal
  - Department of Pharmacy, Kerala University of Health Sciences, Thrissur, Kerala, India
- Sheeba Varghese Gupta
  - Department of Pharmaceutical Sciences, College of Pharmacy, University of South Florida, Tampa, FL, 33612, USA
- Jobin Jose
  - Department of Pharmaceutics, NGSM Institute of Pharmaceutical Science, NITTE Deemed to be University, Mangalore, 575018, India
- Della Grace Thomas Parambi
  - Department of Pharmaceutical Chemistry, College of Pharmacy, Jouf University, Sakaka, Al Jouf, 2014, Saudi Arabia
- Md Sahab Uddin
  - Department of Pharmacy, Southeast University, Dhaka, Bangladesh
  - Pharmakon Neuroscience Research Network, Dhaka, Bangladesh
- Muhammad Ajmal Shah
  - Department of Pharmacognosy, Faculty of Pharmaceutical Sciences, Government College University, Faisalabad, Pakistan
- Bijo Mathew
  - Division of Drug Design and Medicinal Chemistry Research Lab, Department of Pharmaceutical Chemistry, Ahalia School of Pharmacy, Palakkad, 678557, Kerala, India
30
Zolotareva O, Kleine M. A Survey of Gene Prioritization Tools for Mendelian and Complex Human Diseases. J Integr Bioinform 2019; 16:20180069. [PMID: 31494632 PMCID: PMC7074139 DOI: 10.1515/jib-2018-0069] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/06/2018] [Accepted: 07/12/2019] [Indexed: 12/16/2022]
Abstract
Modern high-throughput experiments provide us with numerous potential associations between genes and diseases. Experimental validation of all the discovered associations, let alone all the possible interactions between them, is time-consuming and expensive. To facilitate the discovery of causative genes, various approaches have been developed for prioritizing genes according to their relevance to a given disease. In this article, we explain the gene prioritization problem and provide an overview of computational tools for gene prioritization. From about a hundred published gene prioritization tools, we select and briefly describe the 14 most up-to-date and user-friendly ones. We also discuss the advantages and disadvantages of existing tools, the challenges of their validation, and directions for future research.
Affiliation(s)
- Olga Zolotareva
  - Bielefeld University, Faculty of Technology and Center for Biotechnology, International Research Training Group "Computational Methods for the Analysis of the Diversity and Dynamics of Genomes" and Genome Informatics, Universitätsstraße 25, Bielefeld, Germany
- Maren Kleine
  - Bielefeld University, Faculty of Technology, Bioinformatics/Medical Informatics Department, Universitätsstraße 25, Bielefeld, Germany
31
GPS: Identification of disease genes by rank aggregation of multi-genomic scoring schemes. Genomics 2019; 111:612-618. [DOI: 10.1016/j.ygeno.2018.03.017] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/20/2018] [Revised: 03/16/2018] [Accepted: 03/21/2018] [Indexed: 12/19/2022]
32
Essack M, Salhi A, Stanimirovic J, Tifratene F, Bin Raies A, Hungler A, Uludag M, Van Neste C, Trpkovic A, Bajic VP, Bajic VB, Isenovic ER. Literature-Based Enrichment Insights into Redox Control of Vascular Biology. Oxid Med Cell Longev 2019; 2019:1769437. [PMID: 31223421 PMCID: PMC6542245 DOI: 10.1155/2019/1769437] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/07/2019] [Revised: 04/11/2019] [Accepted: 05/02/2019] [Indexed: 02/07/2023]
Abstract
In cellular physiology and signaling, reactive oxygen species (ROS) play one of the most critical roles. ROS overproduction leads to cellular oxidative stress. This may lead to an irrecoverable imbalance of redox (oxidation-reduction reaction) function that deregulates redox homeostasis, which itself could lead to several diseases including neurodegenerative disease, cardiovascular disease, and cancers. In this study, we focus on the redox effects related to vascular systems in mammals. To support research in this domain, we developed an online knowledge base, DES-RedoxVasc, which enables exploration of information contained in the biomedical scientific literature. The DES-RedoxVasc system analyzed 233399 documents consisting of PubMed abstracts and PubMed Central full-text articles related to different aspects of redox biology in vascular systems. It allows researchers to explore enriched concepts from 28 curated thematic dictionaries, as well as literature-derived potential associations of pairs of such enriched concepts, where associations themselves are statistically enriched. For example, the system allows exploration of associations of pathways, diseases, mutations, genes/proteins, miRNAs, long ncRNAs, toxins, drugs, biological processes, molecular functions, etc. that allow for insights about different aspects of redox effects and control of processes related to the vascular system. Moreover, we deliver case studies about some existing or possibly novel knowledge regarding redox of vascular biology demonstrating the usefulness of DES-RedoxVasc. DES-RedoxVasc is the first compiled knowledge base using text mining for the exploration of this topic.
Affiliation(s)
- Magbubah Essack
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Adil Salhi
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Julijana Stanimirovic
  - Vinca Institute, University of Belgrade, Laboratory for Molecular Endocrinology and Radiobiology, Belgrade, Serbia
- Faroug Tifratene
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Arwa Bin Raies
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Arnaud Hungler
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Mahmut Uludag
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Christophe Van Neste
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Andreja Trpkovic
  - Vinca Institute, University of Belgrade, Laboratory for Molecular Endocrinology and Radiobiology, Belgrade, Serbia
- Vladan P. Bajic
  - Vinca Institute, University of Belgrade, Laboratory for Molecular Endocrinology and Radiobiology, Belgrade, Serbia
- Vladimir B. Bajic
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- Esma R. Isenovic
  - Vinca Institute, University of Belgrade, Laboratory for Molecular Endocrinology and Radiobiology, Belgrade, Serbia
33
Foroutan A, Guo AC, Vazquez-Fresno R, Lipfert M, Zhang L, Zheng J, Badran H, Budinski Z, Mandal R, Ametaj BN, Wishart DS. Chemical Composition of Commercial Cow's Milk. J Agric Food Chem 2019; 67:4897-4914. [PMID: 30994344 DOI: 10.1021/acs.jafc.9b00204] [Citation(s) in RCA: 120] [Impact Index Per Article: 24.0] [Indexed: 06/09/2023]
Abstract
Bovine milk is a nutritionally rich, chemically complex biofluid consisting of hundreds of different components. While the chemical composition of cow's milk has been studied for decades, much of this information is fragmentary and very dated. In an effort to consolidate and update this information, we have applied modern, quantitative metabolomics techniques along with computer-aided literature mining to obtain the most comprehensive and up-to-date characterization of the chemical constituents in commercial cow's milk. Using nuclear magnetic resonance (NMR) spectroscopy, liquid chromatography-mass spectrometry (LC-MS), and inductively coupled plasma-mass spectrometry (ICP-MS), we were able to identify and quantify 296 bovine milk metabolites or metabolite species (corresponding to 1447 unique structures) from a variety of commercial milk samples. Through our literature analysis, we also found another 676 metabolites or metabolite species (corresponding to 908 unique structures). Detailed information regarding all 2355 of the identified chemicals in bovine milk has been made freely available through a Web-accessible database called the Milk Composition Database or MCDB (http://www.mcdb.ca/).
Affiliation(s)
- Aidin Foroutan
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2E9
  - Department of Agricultural, Food and Nutritional Science, Edmonton, Alberta, Canada T6G 2P5
- An Chi Guo
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2E9
- Rosa Vazquez-Fresno
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2E9
- Matthias Lipfert
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2E9
- Lun Zhang
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2E9
- Jiamin Zheng
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2E9
- Hasan Badran
  - Department of Computing Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2E8
- Zachary Budinski
  - Department of Computing Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2E8
- Rupasri Mandal
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2E9
- Burim N Ametaj
  - Department of Agricultural, Food and Nutritional Science, Edmonton, Alberta, Canada T6G 2P5
- David S Wishart
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2E9
  - Department of Computing Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2E8
34
Xu J, Yang P, Xue S, Sharma B, Sanchez-Martin M, Wang F, Beaty KA, Dehan E, Parikh B. Translating cancer genomics into precision medicine with artificial intelligence: applications, challenges and future perspectives. Hum Genet 2019; 138:109-124. [PMID: 30671672 PMCID: PMC6373233 DOI: 10.1007/s00439-019-01970-5] [Citation(s) in RCA: 94] [Impact Index Per Article: 18.8] [Received: 09/27/2018] [Accepted: 01/02/2019] [Indexed: 02/07/2023]
Abstract
In the field of cancer genomics, the broad availability of genetic information offered by next-generation sequencing technologies and the rapid growth in biomedical publication have led to the advent of the big-data era. Integration of artificial intelligence (AI) approaches such as machine learning, deep learning, and natural language processing (NLP) to tackle the challenges of scalability and high dimensionality of data and to transform big data into clinically actionable knowledge is expanding and becoming the foundation of precision medicine. In this paper, we review the current status and future directions of AI application in cancer genomics within the context of workflows to integrate genomic analysis for precision cancer care. The existing solutions of AI and their limitations in cancer genetic testing and diagnostics, such as variant calling and interpretation, are critically analyzed. Publicly available tools or algorithms for key NLP technologies in literature mining for evidence-based clinical recommendations are reviewed and compared. In addition, the present paper highlights the challenges to AI adoption in digital healthcare with regard to data requirements, algorithmic transparency, reproducibility, and real-world assessment, and discusses the importance of preparing patients and physicians for modern digitized healthcare. We believe that AI will remain the main driver of healthcare transformation toward precision medicine, yet the unprecedented challenges it poses should be addressed to ensure safety and a beneficial impact on healthcare.
Affiliation(s)
- Jia Xu
  - IBM Watson Health, Cambridge, MA, USA
- Shang Xue
  - IBM Watson Health, Cambridge, MA, USA
- Fang Wang
  - IBM Watson Health, Cambridge, MA, USA
35
Zhao Y, Jhamb D, Shu L, Arneson D, Rajpal DK, Yang X. Multi-omics integration reveals molecular networks and regulators of psoriasis. BMC Syst Biol 2019; 13:8. [PMID: 30642337 PMCID: PMC6332659 DOI: 10.1186/s12918-018-0671-x] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 11/08/2018] [Accepted: 12/11/2018] [Indexed: 12/19/2022]
Abstract
Background: Psoriasis is a complex multifactorial disease involving both genetic susceptibilities and environmental triggers. Genome-wide association studies (GWAS) and epigenome-wide association studies (EWAS) have been carried out to identify genetic and epigenetic variants associated with psoriasis. However, these loci cannot fully explain the disease pathogenesis. Methods: To achieve a comprehensive mechanistic understanding of psoriasis, we conducted a systems biology study integrating multi-omics datasets, including GWAS, EWAS, tissue-specific transcriptomes, expression quantitative trait loci (eQTLs), gene networks, and biological pathways, to identify the key genes, processes, and networks that are genetically and epigenetically associated with psoriasis risk. Results: This integrative genomics study identified both well-characterized (e.g., the IL17 pathway in both GWAS and EWAS) and novel biological processes (e.g., the branched-chain amino acid catabolism process in GWAS and the platelet and coagulation pathway in EWAS) involved in psoriasis. Finally, by utilizing tissue-specific gene regulatory networks, we unraveled the interactions among the psoriasis-associated genes and pathways in a tissue-specific manner and detected potential key regulatory genes in the psoriasis networks. Conclusions: The integration and convergence of multi-omics signals provide deeper and more comprehensive insights into the biological mechanisms associated with psoriasis susceptibility.
Affiliation(s)
- Yuqi Zhao
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Dr. East, Los Angeles, CA, 90095, USA
- Deepali Jhamb
  - Target Sciences, Computational Biology (US) GSK, 1250 South Collegeville Road, Collegeville, PA, 19426, USA
- Le Shu
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Dr. East, Los Angeles, CA, 90095, USA
- Douglas Arneson
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Dr. East, Los Angeles, CA, 90095, USA
- Deepak K Rajpal
  - Target Sciences, Computational Biology (US) GSK, 1250 South Collegeville Road, Collegeville, PA, 19426, USA
- Xia Yang
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Dr. East, Los Angeles, CA, 90095, USA
  - Institute for Quantitative and Computational Biosciences, University of California, 610 Charles E. Young Dr. East, Los Angeles, CA, 90095, USA
  - Molecular Biology Institute, University of California, 610 Charles E. Young Dr. East, Los Angeles, CA, 90095, USA
  - Bioinformatics Interdepartmental Program, University of California, 10 Charles E. Young Dr. East, Los Angeles, CA, 90095, USA
36
Stacey D, Fauman EB, Ziemek D, Sun BB, Harshfield EL, Wood AM, Butterworth AS, Suhre K, Paul DS. ProGeM: a framework for the prioritization of candidate causal genes at molecular quantitative trait loci. Nucleic Acids Res 2019; 47:e3. [PMID: 30239796 PMCID: PMC6326795 DOI: 10.1093/nar/gky837] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 12/06/2017] [Revised: 08/31/2018] [Accepted: 09/11/2018] [Indexed: 12/27/2022]
Abstract
Quantitative trait locus (QTL) mapping of molecular phenotypes such as metabolites, lipids and proteins through genome-wide association studies represents a powerful means of highlighting molecular mechanisms relevant to human diseases. However, a major challenge of this approach is to identify the causal gene(s) at the observed QTLs. Here, we present a framework for the 'Prioritization of candidate causal Genes at Molecular QTLs' (ProGeM), which incorporates biological domain-specific annotation data alongside genome annotation data from multiple repositories. We assessed the performance of ProGeM using a reference set of 227 previously reported and extensively curated metabolite QTLs. For 98% of these loci, the expert-curated gene was one of the candidate causal genes prioritized by ProGeM. Benchmarking analyses revealed that 69% of the causal candidates were nearest to the sentinel variant at the investigated molecular QTLs, indicating that genomic proximity is the most reliable indicator of 'true positive' causal genes. In contrast, cis-gene expression QTL data led to three false positive candidate causal gene assignments for every one true positive assignment. We provide evidence that these conclusions also apply to other molecular phenotypes, suggesting that ProGeM is a powerful and versatile tool for annotating molecular QTLs. ProGeM is freely available via GitHub.
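ProGeM's benchmark found that genomic proximity to the sentinel variant was the most reliable indicator of a true causal gene. The sketch below illustrates only that nearest-gene heuristic and does not reproduce ProGeM. The `Gene` record, the distance rule (0 inside the gene body, otherwise the gap to the closest boundary), and the toy gene names and coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    start: int  # inclusive start coordinate (assumed convention)
    end: int    # inclusive end coordinate


def distance_to_gene(pos: int, gene: Gene) -> int:
    """0 if the variant lies within the gene body, else bp to the closest gene boundary."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))


def nearest_genes(chrom: str, pos: int, genes: list[Gene], k: int = 1) -> list[str]:
    """Rank genes on the sentinel variant's chromosome by proximity and return the k closest."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    same_chrom.sort(key=lambda g: distance_to_gene(pos, g))
    return [g.name for g in same_chrom[:k]]


# Toy annotation with hypothetical gene names and coordinates.
genes = [
    Gene("GENE_A", "1", 100, 200),
    Gene("GENE_B", "1", 500, 600),
    Gene("GENE_C", "2", 150, 250),
]
print(nearest_genes("1", 450, genes))  # ['GENE_B']
```

Real pipelines usually measure distance to the transcription start site instead, and they keep several nearby genes as candidates rather than only one.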
Affiliation(s)
- David Stacey
- MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
- Eric B Fauman
- Pfizer Worldwide Research & Development, Genome Sciences & Technologies, Cambridge, MA 02142, USA
- Daniel Ziemek
- Pfizer Worldwide Research & Development, Inflammation & Immunology, 14167 Berlin, Germany
- Benjamin B Sun
- MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
- Eric L Harshfield
- MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0QQ, UK
- Angela M Wood
- MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
- Adam S Butterworth
- MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
- Karsten Suhre
- Department of Physiology and Biophysics, Weill Cornell Medicine-Qatar, PO 24144, Doha, Qatar
- Dirk S Paul
- MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
37
Abstract
Recent advances in technology have led to exponential growth of the scientific literature in the biomedical sciences. This rapid increase in information has outpaced manual curation efforts, making text mining approaches necessary in the life sciences. One such application of text mining is supporting in silico drug discovery, including drug target screening, pharmacogenomics and adverse drug event detection. This chapter introduces the applications of various text mining approaches in drug discovery. It is divided into two parts: the first half gives an overview of text mining in the biosciences, and the second half reviews strategies and methods for four distinct applications of text mining in drug discovery.
Affiliation(s)
- Si Zheng
- Institute of Medical Information and Library, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Shazia Dharssi
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA
- Meng Wu
- Institute of Medical Information and Library, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Jiao Li
- Institute of Medical Information and Library, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA
38
Shen J, Vasaikar S, Zhang B. DLAD4U: deriving and prioritizing disease lists from PubMed literature. BMC Bioinformatics 2018; 19:495. [PMID: 30591010 PMCID: PMC6309061 DOI: 10.1186/s12859-018-2463-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/04/2022]
Abstract
Background Because of recent technological advances, disease-related knowledge is growing rapidly. It has become nontrivial to go through all the published literature to identify associations between human diseases and genetic, environmental and lifestyle factors, disease symptoms and treatment strategies. Here we report DLAD4U (Disease List Automatically Derived For You), an efficient, accurate and easy-to-use disease search engine based on PubMed literature. Results DLAD4U uses the eSearch and eFetch APIs from the National Center for Biotechnology Information (NCBI) to find publications related to a query and to identify diseases in the retrieved publications. The hypergeometric test is used to prioritize the identified diseases for display to users. DLAD4U accepts any valid PubMed query, and the output includes a ranked disease list, information associated with each disease, chronologically ordered supporting publications, a summary of the run and links for file export. DLAD4U outperformed other disease search engines in our comparative evaluation, which used selected genes and drugs as query terms and manually curated data as the “gold standard”. For 100 genes that are associated with only one disease in the gold standard, DLAD4U achieved a Mean Average Precision (MAP) of 0.77, clearly outperforming other tools. For 10 genes that are associated with multiple diseases in the gold standard, the mean precision, recall and F-measure scores of DLAD4U were always higher than those of other tools. The superior performance of DLAD4U was further confirmed using 100 drugs as queries, with a MAP of 0.90. Conclusions DLAD4U is a new, intuitive disease search engine that takes advantage of existing resources at NCBI to provide computational efficiency and uses statistical analyses to ensure accuracy. DLAD4U is publicly available at http://dlad4u.zhang-lab.org.
Electronic supplementary material The online version of this article (10.1186/s12859-018-2463-0) contains supplementary material, which is available to authorized users.
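DLAD4U prioritizes diseases with a hypergeometric test. The minimal sketch below assumes the test is used as an over-representation test. For each disease, it asks how surprising it is that k of the n retrieved articles mention that disease, given that K of all N background articles do. The counting scheme, the function names and the toy counts are assumptions, and DLAD4U's actual background set and scoring details may differ. Retrieval through NCBI eSearch/eFetch is left out.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background articles, K mentioning the disease, n retrieved).

    math.comb returns 0 when the lower argument exceeds the upper one, so
    impossible terms contribute nothing to the sum.
    """
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def rank_diseases(hits: dict[str, int], background: dict[str, int],
                  n_retrieved: int, n_background: int) -> list[str]:
    """Order diseases by ascending enrichment p-value (most over-represented first)."""
    pvals = {
        disease: hypergeom_upper_tail(k, n_background, background[disease], n_retrieved)
        for disease, k in hits.items()
    }
    return sorted(pvals, key=pvals.get)


# Hypothetical counts: 20 articles retrieved for a query, out of 1,000 background articles.
hits = {"disease_x": 10, "disease_y": 2}          # retrieved articles mentioning each disease
background = {"disease_x": 50, "disease_y": 500}  # background articles mentioning each disease
print(rank_diseases(hits, background, n_retrieved=20, n_background=1000))  # ['disease_x', 'disease_y']
```

The score rewards specificity. `disease_x` is mentioned far more often than its background rate predicts, while `disease_y` is common everywhere, so `disease_x` ranks first even though neither raw count alone would show this.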
Affiliation(s)
- Junhui Shen
- Information Center, Beijing University of Chinese Medicine, Beijing, China
- Suhas Vasaikar
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Mail Stop BCM600, Houston, TX, 77030, USA
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Mail Stop BCM600, Houston, TX, 77030, USA
39
López-Cortés A, Paz-Y-Miño C, Cabrera-Andrade A, Barigye SJ, Munteanu CR, González-Díaz H, Pazos A, Pérez-Castillo Y, Tejera E. Gene prioritization, communality analysis, networking and metabolic integrated pathway to better understand breast cancer pathogenesis. Sci Rep 2018; 8:16679. [PMID: 30420728 PMCID: PMC6232116 DOI: 10.1038/s41598-018-35149-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 04/13/2018] [Accepted: 10/16/2018] [Indexed: 12/30/2022]
Abstract
A consensus strategy has proved highly efficient in recognizing gene-disease associations. Therefore, the main objective of this study was to apply theoretical approaches to explore genes and communities directly involved in breast cancer (BC) pathogenesis. We evaluated the consensus between 8 prioritization strategies for the early recognition of pathogenic genes. A communality analysis of the protein-protein interaction (PPi) network of the previously selected genes was enriched with gene ontology and metabolic pathway data, as well as with oncogenomics validation from the OncoPPi and DRIVE projects. The consensus genes were rationally filtered to 1842 genes. The communality analysis showed an enrichment of 14 communities especially connected with the ERBB, PI3K-AKT, mTOR, FOXO, p53, HIF-1, VEGF, MAPK and prolactin signaling pathways. The highest-ranking genes were TP53, ESR1, BRCA2, BRCA1 and ERBB2. The genes with the highest connectivity degree were TP53, AKT1, SRC, CREBBP and EP300. The connectivity degree revealed a significant correlation between the OncoPPi network and our integrated BC network, which comprises 51 genes and 62 PPi. In addition, CCND1, RAD51, CDC42, YAP1 and RPA1 were functional genes with significant sensitivity scores in BC cell lines. In conclusion, the consensus strategy identifies both well-known pathogenic genes and prioritized genes that need further exploration.
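The abstract does not say how the consensus among the 8 prioritization strategies is computed. One common way to merge ranked lists is a Borda count, sketched below as an assumed stand-in rather than the authors' method. The three input rankings are toy data and are not results from the study.

```python
from collections import defaultdict


def borda_consensus(rankings: list[list[str]]) -> list[str]:
    """Combine ranked gene lists with a Borda count.

    In a list of length L, the gene at 0-based position i earns L - i points;
    a gene missing from a list earns nothing from it. Ties break alphabetically.
    """
    scores: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        length = len(ranking)
        for i, gene in enumerate(ranking):
            scores[gene] += length - i
    return sorted(scores, key=lambda g: (-scores[g], g))


# Toy output of three hypothetical prioritization strategies.
rankings = [
    ["TP53", "ESR1", "BRCA2"],
    ["ESR1", "TP53", "BRCA1"],
    ["TP53", "BRCA1", "ESR1"],
]
print(borda_consensus(rankings))  # ['TP53', 'ESR1', 'BRCA1', 'BRCA2']
```

A gene that several strategies rank consistently high, such as TP53 here, beats a gene that one strategy happens to place first. That is the intuition behind preferring a consensus over any single prioritization method.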
Affiliation(s)
- Andrés López-Cortés
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Mariscal Sucre Avenue, 170129, Quito, Ecuador
- RNASA-IMEDIR, Computer Sciences Faculty, University of Coruna, 15071, Coruna, Spain
- César Paz-Y-Miño
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Mariscal Sucre Avenue, 170129, Quito, Ecuador
- Alejandro Cabrera-Andrade
- Carrera de Enfermería, Facultad de Ciencias de la Salud, Universidad de las Américas, Avenue de los Granados, 170125, Quito, Ecuador
- Grupo de Bio-Quimioinformática, Universidad de las Américas, Avenue de los Granados, 170125, Quito, Ecuador
- Stephen J Barigye
- Department of Chemistry, McGill University, 801 Sherbrooke Street West, Montreal, QC, H3A 0B8, Canada
- Cristian R Munteanu
- RNASA-IMEDIR, Computer Sciences Faculty, University of Coruna, 15071, Coruna, Spain
- INIBIC, Institute of Biomedical Research, CHUAC, UDC, 15006, Coruna, Spain
- Humberto González-Díaz
- Department of Organic Chemistry II, University of the Basque Country UPV/EHU, 48940, Leioa, Biscay, Spain
- IKERBASQUE, Basque Foundation for Science, 48011, Bilbao, Biscay, Spain
- Alejandro Pazos
- RNASA-IMEDIR, Computer Sciences Faculty, University of Coruna, 15071, Coruna, Spain
- INIBIC, Institute of Biomedical Research, CHUAC, UDC, 15006, Coruna, Spain
- Yunierkis Pérez-Castillo
- Grupo de Bio-Quimioinformática, Universidad de las Américas, Avenue de los Granados, 170125, Quito, Ecuador
- Escuela de Ciencias Físicas y Matemáticas, Universidad de las Américas, Avenue de los Granados, 170125, Quito, Ecuador
- Eduardo Tejera
- Grupo de Bio-Quimioinformática, Universidad de las Américas, Avenue de los Granados, 170125, Quito, Ecuador
- Facultad de Ingeniería y Ciencias Agropecuarias, Universidad de las Américas, Avenue de los Granados, 170125, Quito, Ecuador
40
Kordopati V, Salhi A, Razali R, Radovanovic A, Tifratene F, Uludag M, Li Y, Bokhari A, AlSaieedi A, Bin Raies A, Van Neste C, Essack M, Bajic VB. DES-Mutation: System for Exploring Links of Mutations and Diseases. Sci Rep 2018; 8:13359. [PMID: 30190574 PMCID: PMC6127254 DOI: 10.1038/s41598-018-31439-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 12/18/2017] [Accepted: 08/17/2018] [Indexed: 12/17/2022]
Abstract
During cell division, DNA is replicated, and this process is the basis for passing genetic information to the next generation. However, the copying process sometimes produces an imperfect copy, that is, one with mutations. The collection of all such mutations in the DNA copy of an organism makes it unique and determines the organism’s phenotype. Mutations are also often the cause of disease. It is therefore useful to be able to explore links between mutations and disease. We approached this problem by analyzing a vast amount of published information linking mutations to disease states. Based on this information, we developed the DES-Mutation knowledgebase. It supports exploration of mutation-disease links and also of links between mutations and concepts from 27 topic-specific dictionaries, such as human genes/proteins, toxins and pathogens. This gives more detailed insight into mutation-disease links and their context. On a sample of 600 predicted and curated mutation-disease associations, our system achieves a precision of 72.83%. To demonstrate the utility of DES-Mutation, we provide case studies related to known or potentially novel information involving disease mutations. To our knowledge, this is the first mutation-disease knowledgebase dedicated to exploring this topic through text mining and data mining of different mutation types and their associations with terms from multiple thematic dictionaries.
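DES-Mutation links mutation mentions to terms from topic-specific dictionaries. The minimal sketch below shows the underlying idea of sentence-level co-occurrence. It assumes that mutations are written in one-letter protein notation (for example V600E) and that dictionary terms match as case-insensitive substrings. The real system's mutation recognition and dictionaries are far richer, and the sentences and dictionary here are toy inputs.

```python
import re

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Protein point mutation in one-letter notation: wild-type residue, position, mutant residue.
POINT_MUTATION = re.compile(rf"\b([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])\b")


def mutation_term_pairs(text: str, dictionary: list[str]) -> list[tuple[str, str]]:
    """Pair every point-mutation mention with every dictionary term in the same sentence."""
    pairs = []
    # Naive sentence split: terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        mutations = ["".join(groups) for groups in POINT_MUTATION.findall(sentence)]
        lowered = sentence.lower()
        terms = [term for term in dictionary if term.lower() in lowered]
        pairs.extend((m, t) for m in mutations for t in terms)
    return pairs


text = ("The BRAF V600E mutation is common in melanoma. "
        "KRAS G12D occurs in pancreatic cancer.")
print(mutation_term_pairs(text, ["melanoma", "pancreatic cancer"]))
# [('V600E', 'melanoma'), ('G12D', 'pancreatic cancer')]
```

Raw co-occurrence over-generates links. This is why a system of this kind reports precision on a curated sample, such as the 72.83% above, rather than treating every co-mention as a true association.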
Affiliation(s)
- Vasiliki Kordopati
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Adil Salhi
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Rozaimi Razali
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Aleksandar Radovanovic
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Faroug Tifratene
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Mahmut Uludag
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Yu Li
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Ameerah Bokhari
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Ahdab AlSaieedi
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia; King Abdulaziz University (KAU), Faculty of Applied Medical Sciences (FAMS), Department of Medical Laboratory Technology (MLT), Jeddah, 21589-80324, Saudi Arabia
- Arwa Bin Raies
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Christophe Van Neste
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia; Ghent University, Center for Medical Genetics Ghent (CMGG), B-9000, Ghent, Belgium
- Magbubah Essack
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Vladimir B Bajic
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
41
Ibrahim SJA, Thangamani M. Prediction of Novel Drugs and Diseases for Hepatocellular Carcinoma Based on Multi-Source Simulated Annealing Based Random Walk. J Med Syst 2018; 42:188. [PMID: 30173379 DOI: 10.1007/s10916-018-1038-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/27/2018] [Accepted: 08/20/2018] [Indexed: 01/09/2023]
Abstract
Computational techniques that predict drug-disease associations by integrating gene expression and biological networks give valuable insight into the complex associations among targets, drugs, disease genes and diseases at the system level. Hepatocellular carcinoma (HCC) is a malignant tumor with high morbidity and mortality. An integrative framework for identifying new drugs for HCC, based on Multi-Source Random Walk (PD-MRW), scores all drugs by building a drug-drug similarity network. However, clinical phenotypes and drug side effects, combined with patient-specific genetic information, give rise to disease-drug networks that represent the prescriptions allotted to treat diseases, and the PD-MRW model does not take these into account. To overcome this issue, this research offers an integrative framework for predicting new drugs and diseases for HCC based on Multi-Source Simulated Annealing based Random Walk (PDD-MSSARW). First, a Gene-Gene Weighted Interaction Network (GWIN) is built from gene expression and protein interaction network data. Next, a drug-drug similarity network is constructed using a multi-source random walk in GWIN. A disease-drug similarity network is then built with a Similarity Weighted Bipartite Graph Network (SWBGN), in which the nodes are drugs and the association between one node and another represents disease diagnoses. Finally, all drugs in the similarity networks are scored based on the known drugs for HCC.
The robustness of the predictions, their overlap with those reported in the Comparative Toxicogenomics Database (CTD) and in the literature, and their enriched KEGG pathways show that the PDD-MSSARW method can efficiently identify novel drug indications.
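PDD-MSSARW scores drugs with random walks seeded from the known HCC drugs. The sketch below shows only a plain random walk with restart (RWR) on a weighted similarity network. It leaves out the multi-source and simulated-annealing components of the published method. The network, the node names and the restart probability of 0.7 are all hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities for a walker that returns to a seed with
    probability `restart` and otherwise moves to a neighbor in proportion to edge weight.
    `adj` maps node -> {neighbor: weight}; every neighbor must also be a key (symmetric graph)."""
    nodes = sorted(adj)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(start)
    for _ in range(max_iter):
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            if total == 0:
                continue  # a node without edges passes no probability mass on
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / total
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


def rank_candidates(adj, known_drugs, **kwargs):
    """Rank non-seed nodes by their RWR score, highest first."""
    scores = random_walk_with_restart(adj, set(known_drugs), **kwargs)
    candidates = [n for n in scores if n not in known_drugs]
    return sorted(candidates, key=lambda n: -scores[n])


# Hypothetical drug-drug similarity network with symmetric weights.
network = {
    "drug_A": {"drug_B": 1.0},
    "drug_B": {"drug_A": 1.0, "drug_C": 1.0},
    "drug_C": {"drug_B": 1.0, "drug_D": 1.0},
    "drug_D": {"drug_C": 1.0},
}
print(rank_candidates(network, ["drug_A"]))  # ['drug_B', 'drug_C', 'drug_D']
```

The restart term keeps the walk anchored to the known drugs. Candidates that are closer to the seeds in the similarity network therefore receive higher scores, which is the intuition shared by random-walk repositioning methods.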
Affiliation(s)
- M Thangamani
- Kongu Engineering College, Perundurai, Tamilnadu, India
42
Bhasuran B, Subramanian D, Natarajan J. Text mining and network analysis to find functional associations of genes in high altitude diseases. Comput Biol Chem 2018; 75:101-110. [DOI: 10.1016/j.compbiolchem.2018.05.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 04/05/2017] [Revised: 03/14/2018] [Accepted: 05/01/2018] [Indexed: 02/07/2023]
43
Bhasuran B, Natarajan J. Automatic extraction of gene-disease associations from literature using joint ensemble learning. PLoS One 2018; 13:e0200699. [PMID: 30048465 PMCID: PMC6061985 DOI: 10.1371/journal.pone.0200699] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 01/30/2018] [Accepted: 07/02/2018] [Indexed: 12/26/2022]
Abstract
A wealth of knowledge about relations between genes and their associated diseases is present in the biomedical literature. Mining these biological associations from the literature can greatly support research ranging from drug-targetable pathways to biomarker discovery. However, the time and cost of manual curation slow this work down considerably. In this context, biomedical text mining is a crucial technology, and relation extraction has shown promising results for exploring genes associated with diseases. We addressed this problem from a text mining perspective by developing automatic extraction of gene-disease associations from the literature using joint ensemble learning. In the proposed work, we employ a supervised machine learning approach. A rich feature set covering conceptual, syntactic and semantic properties, jointly learned with word embeddings, is trained using an ensemble support vector machine to extract gene-disease relations from four gold standard corpora. On evaluation, the approach achieved promising F-measures of 85.34%, 83.93%, 87.39% and 85.57% on the EUADR, GAD, CoMAGC and PolySearch corpora, respectively. The approach combines a rich syntactic and semantic feature set with domain-specific word embeddings through ensemble support vector machines and is evaluated on four gold standard corpora. We strongly believe that it can serve as a new baseline for future work on gene-disease relation extraction from the literature.
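The reported results are F-measures, the harmonic mean of precision and recall. The sketch below combines the labels of several base classifiers by simple majority voting and scores the result with precision, recall and F1. Majority voting is an assumed combination rule chosen for illustration and is not necessarily how the authors' ensemble SVM merges its members. The labels are toy data.

```python
from collections import Counter


def majority_vote(per_model_labels: list[list[int]]) -> list[int]:
    """Each inner list holds one model's labels for the same instances; return the
    most common label per instance (Counter breaks ties by first occurrence)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_model_labels)]


def precision_recall_f1(gold: list[int], predicted: list[int], positive: int = 1):
    """Precision, recall and F1 (harmonic mean of the two) for the positive class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# 1 = the sentence asserts a gene-disease relation, 0 = it does not (toy labels).
models = [[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1]]
gold = [1, 0, 0, 1]
ensemble = majority_vote(models)
print(ensemble, precision_recall_f1(gold, ensemble))
```

In this toy run the ensemble finds both true relations (recall 1.0) but also makes one false positive (precision 2/3), so F1 is 0.8. F1 penalizes whichever of the two is weaker.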
Affiliation(s)
- Balu Bhasuran
- DRDO-BU Center for Life Sciences, Bharathiar University Campus, Coimbatore, Tamilnadu, India
- Jeyakumar Natarajan
- DRDO-BU Center for Life Sciences, Bharathiar University Campus, Coimbatore, Tamilnadu, India
- Data mining and Text mining Laboratory, Department of Bioinformatics, Bharathiar University, Coimbatore, Tamilnadu, India
44
Xue H, Li J, Xie H, Wang Y. Review of Drug Repositioning Approaches and Resources. Int J Biol Sci 2018; 14:1232-1244. [PMID: 30123072 PMCID: PMC6097480 DOI: 10.7150/ijbs.24612] [Citation(s) in RCA: 323] [Impact Index Per Article: 53.8] [Received: 12/28/2017] [Accepted: 06/12/2018] [Indexed: 12/23/2022]
Abstract
Drug discovery is a time-consuming, high-investment and high-risk process in traditional drug development. Drug repositioning has become a popular strategy in recent years. Unlike traditional drug development strategies, it is efficient, economical and low-risk. There are usually three kinds of approaches, all widely used in drug repositioning: computational approaches, biological experimental approaches and mixed approaches. In this paper, we reviewed computational approaches and highlighted their characteristics to provide references for researchers developing more powerful approaches. We also list the important findings obtained using these approaches. Furthermore, we summarized 76 important resources for drug repositioning. Finally, challenges and opportunities in drug repositioning are discussed from multiple perspectives, including technology, commercial models, patents and investment.
Affiliation(s)
- Hanqing Xue
- School of Computer Science and Technology, Harbin Institute of Technology, 150001, Harbin, China
- Jie Li
- School of Computer Science and Technology, Harbin Institute of Technology, 150001, Harbin, China
- Haozhe Xie
- School of Computer Science and Technology, Harbin Institute of Technology, 150001, Harbin, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, 150001, Harbin, China
45
Xing W, Yuan X, Li L, Hu L, Peng J. Phenotype Extraction Based on Word Embedding to Sentence Embedding Cascaded Approach. IEEE Trans Nanobioscience 2018; 17:172-180. [PMID: 29994536 DOI: 10.1109/tnb.2018.2838137] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Abstract
Phenotypic descriptions are a significant target in the development of named entity recognition, but they are expressed in many different ways in the biomedical literature and often carry complicated semantics. In this paper, we propose a novel approach to identifying plant phenotypes that uses a word-embedding-to-sentence-embedding cascaded approach. We use a word embedding method to find high-frequency phenotypes and then use the original sentences as input to a sentence embedding method. In this way, a variety of complicated phenotypic expressions can be recognized accurately. In addition, we compared state-of-the-art word representation models and selected skip-gram with negative sampling, which performed best. To evaluate our approach, we applied it to a dataset of 56,748 PubMed abstracts on the model organism Arabidopsis thaliana. The experimental results showed that our approach performed best, achieving a 2.588-fold increase in the number of new phenotypic descriptions compared with the original phenotype ontology.
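Skip-gram with negative sampling, the word representation model the authors selected, learns embeddings by predicting context words from each center word while contrasting them with randomly drawn "noise" words. The sketch below covers only data preparation and performs no embedding training. It generates (center, context) training pairs within a window and draws negatives from the unigram distribution raised to the 3/4 power, the smoothing used in the original word2vec work. The window size, the seed and the token list are illustrative assumptions.

```python
import random
from collections import Counter


def skipgram_pairs(tokens: list[str], window: int = 2) -> list[tuple[str, str]]:
    """(center, context) pairs for every token within `window` positions of each center."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def make_negative_sampler(tokens: list[str], power: float = 0.75, seed: int = 0):
    """Return a function that draws k noise words with probability proportional to count**power."""
    counts = Counter(tokens)
    vocab = sorted(counts)
    weights = [counts[w] ** power for w in vocab]
    rng = random.Random(seed)

    def sample(k: int) -> list[str]:
        return rng.choices(vocab, weights=weights, k=k)

    return sample


tokens = "short root phenotype".split()
print(skipgram_pairs(tokens, window=1))
# [('short', 'root'), ('root', 'short'), ('root', 'phenotype'), ('phenotype', 'root')]
sample = make_negative_sampler(tokens)
print(sample(3))  # three noise words to contrast with one positive pair
```

Raising counts to the 3/4 power flattens the distribution. Rare words are sampled as negatives somewhat more often than their raw frequency would suggest.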
46
Xing W, Qi J, Yuan X, Li L, Zhang X, Fu Y, Xiong S, Hu L, Peng J. A gene-phenotype relationship extraction pipeline from the biomedical literature using a representation learning approach. Bioinformatics 2018; 34:i386-i394. [PMID: 29950017 PMCID: PMC6022650 DOI: 10.1093/bioinformatics/bty263] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 01/01/2023]
Abstract
Motivation The fundamental challenge of modern genetic analysis is to establish gene-phenotype correlations, which are often found in large-scale publications. Because the lexical features of genes are relatively regular in text, the main challenge of this relation extraction task is phenotype recognition. Since phenotypic descriptions are often study- or author-specific, few lexicons can effectively identify all phenotypic expressions in text, especially for plants. Results We have proposed a pipeline for extracting phenotypes, genes and their relations from the biomedical literature. Combined with abbreviation revision and sentence template extraction, we improved the unsupervised word-embedding-to-sentence-embedding cascaded approach as a representation learning method to recognize the broad range of phenotypic information in the literature. In addition, a dictionary- and rule-based method was applied for gene recognition. Finally, we integrated the well-known information extraction system OLLIE to identify gene-phenotype relations. To demonstrate the applicability of the pipeline, we established two types of comparison experiments using the model organism Arabidopsis thaliana. In the comparison with state-of-the-art baselines, our approach obtained the best performance (F1-measure of 66.83%). We also applied the pipeline to 481 full-text articles from the TAIR gene-phenotype manual relationship dataset to validate it. The results showed that our proposed pipeline covers 70.94% of the original dataset and adds 373 new relations to expand it. Availability and implementation The source code is available at http://www.wutbiolab.cn:82/Gene-Phenotype-Relation-Extraction-Pipeline.zip. Supplementary information Supplementary data are available at Bioinformatics online.
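The pipeline recognizes genes with a dictionary- and rule-based method. The minimal sketch below shows that combination under assumptions of its own. One rule matches Arabidopsis Genome Initiative (AGI) locus identifiers such as AT1G01010, and a dictionary supplies gene symbols that are matched as whole words. The symbol `GENEX` and the example sentence are hypothetical, and the abstract does not describe the authors' actual rules or dictionary.

```python
import re

# Rule: AGI locus codes such as AT1G01010 (AT, chromosome 1-5, C or M, G, five digits).
AGI_LOCUS = re.compile(r"\bAT[1-5CM]G\d{5}\b", re.IGNORECASE)


def recognize_genes(sentence: str, gene_dictionary: list[str]) -> list[str]:
    """Return gene mentions in order of appearance, from the AGI rule and the dictionary."""
    found = [(m.start(), m.group(0).upper()) for m in AGI_LOCUS.finditer(sentence)]
    for symbol in gene_dictionary:
        # Whole-word, case-sensitive match so short symbols do not fire inside other words.
        pattern = re.compile(rf"\b{re.escape(symbol)}\b")
        found.extend((m.start(), symbol) for m in pattern.finditer(sentence))
    return [name for _, name in sorted(found)]


print(recognize_genes("Expression of GENEX and At1g01010 was measured.", ["GENEX"]))
# ['GENEX', 'AT1G01010']
```

This mirrors the abstract's reasoning. Gene mentions follow regular lexical patterns that rules and dictionaries can capture, whereas free-text phenotypes do not, and that is why the pipeline needs representation learning only on the phenotype side.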
Affiliation(s)
- Wenhui Xing
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Junsheng Qi
- Department of Plant Science, College of Biological Science, China Agricultural University, Beijing, China
- Xiaohui Yuan
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Lin Li
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Xiaoyu Zhang
- Britton Chance Center for Biomedical Photonics, Wuhan National Laboratory for Optoelectronics-Huazhong University of Science and Technology, Wuhan, China
- Yuhua Fu
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Shengwu Xiong
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Lun Hu
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Jing Peng
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
47
Ozsoy MG, Özyer T, Polat F, Alhajj R. Realizing drug repositioning by adapting a recommendation system to handle the process. BMC Bioinformatics 2018; 19:136. [PMID: 29649971 PMCID: PMC5898022 DOI: 10.1186/s12859-018-2142-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/14/2017] [Accepted: 03/27/2018] [Indexed: 12/26/2022]
Abstract
Background Drug repositioning is the process of identifying new targets for known drugs. It can be used to overcome problems associated with traditional drug discovery by adapting existing drugs to treat newly discovered diseases. Thus, it may reduce the risk, cost and time required to identify and verify new drugs. Drug repositioning has recently received increasing attention from industry and academia. To tackle this problem, researchers have applied many different computational methods and have used various features of drugs and diseases. Results In this study, we contribute to the ongoing research efforts by combining multiple features, namely chemical structures, protein interactions and side effects, to predict new indications for target drugs. To this end, we treat drug repositioning as a recommendation process, which offers a new perspective on the problem. The recommendation method used is based on Pareto dominance and collaborative filtering, and it can integrate multiple data sources and multiple features. For the computation, we applied several settings and compared their performance. Evaluation results show that the proposed method makes more concentrated predictions with high precision: nearly half of the predictions are true. Conclusions Compared with other state-of-the-art methods described in the literature, the proposed method makes more correct predictions, as shown by its higher precision. The reported results demonstrate the applicability and effectiveness of recommendation methods for drug repositioning.
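The recommender is described as combining Pareto dominance with collaborative filtering. The minimal sketch below shows one way the two could fit together, assumed rather than taken from the paper. A target drug's neighbors are the candidate drugs whose per-feature similarity vectors (here chemical structure, protein interaction and side-effect similarity) are not Pareto-dominated by any other candidate. The neighbors' known indications are then recommended by vote. All drug names, scores and indications are hypothetical.

```python
from collections import Counter


def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """a dominates b if it is at least as similar on every feature and strictly more on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_neighbors(similarities: dict[str, tuple[float, ...]]) -> list[str]:
    """Drugs whose similarity vectors are not dominated by any other drug's vector."""
    return sorted(
        name for name, vec in similarities.items()
        if not any(dominates(other, vec) for o, other in similarities.items() if o != name)
    )


def recommend_indications(target, similarities, indications):
    """Vote for indications held by the target's Pareto neighbors but not yet by the target."""
    known = indications.get(target, set())
    votes = Counter(
        ind
        for drug in pareto_neighbors(similarities[target])
        for ind in indications.get(drug, set())
        if ind not in known
    )
    return sorted(votes, key=lambda ind: (-votes[ind], ind))


# (chemical, protein-interaction, side-effect) similarity of each drug to drug_T.
similarities = {"drug_T": {"d1": (0.9, 0.8, 0.7), "d2": (0.5, 0.4, 0.3), "d3": (0.2, 0.9, 0.1)}}
indications = {"drug_T": {"ind_a"}, "d1": {"ind_a", "ind_b"}, "d2": {"ind_c"}, "d3": {"ind_b", "ind_d"}}
print(recommend_indications("drug_T", similarities, indications))  # ['ind_b', 'ind_d']
```

With Pareto dominance, the features never need to be merged into one weighted score. `d3` survives as a neighbor because it is the most similar drug on protein interactions, even though it scores poorly on the other two features.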
Affiliation(s)
- Makbule Gulcin Ozsoy
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
- Tansel Özyer
- Department of Computer Engineering, TOBB University, Ankara, Turkey
- Faruk Polat
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
- Reda Alhajj
- Department of Computer Science, University of Calgary, Calgary, AB, Canada
48
Yu KH, Lee TLM, Wang CS, Chen YJ, Ré C, Kou SC, Chiang JH, Kohane IS, Snyder M. Systematic Protein Prioritization for Targeted Proteomics Studies through Literature Mining. J Proteome Res 2018; 17:1383-1396. [PMID: 29505266 DOI: 10.1021/acs.jproteome.7b00772] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 12/15/2022]
Abstract
There are more than 3.7 million published articles on the biological functions or disease implications of proteins, constituting an important resource of proteomics knowledge. However, it is difficult to summarize the millions of proteomics findings in the literature manually and quantify their relevance to the biology and diseases of interest. We developed a fully automated bioinformatics framework to identify and prioritize proteins associated with any biological entity. We used the 22 targeted areas of the Biology/Disease-driven (B/D)-Human Proteome Project (HPP) as examples, prioritized the relevant proteins through their Protein Universal Reference Publication-Originated Search Engine (PURPOSE) scores, validated the relevance of the score by comparing the protein prioritization results with a curated database, computed the scores of proteins across the topics of B/D-HPP, and characterized the top proteins in the common model organisms. We further extended the bioinformatics workflow to identify the relevant proteins in all organ systems and human diseases and deployed a cloud-based tool to prioritize proteins related to any custom search terms in real time. Our tool can facilitate the prioritization of proteins for any organ system or disease of interest and can contribute to the development of targeted proteomic studies for precision medicine.
Affiliation(s)
- Kun-Hsing Yu
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, United States
- Department of Statistics, Harvard University, Cambridge, Massachusetts 02138, United States
- Tsung-Lu Michael Lee
- Department of Information Engineering, Kun Shan University, Tainan City 710-03, Taiwan
- Chi-Shiang Wang
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan City 701-01, Taiwan
- Yu-Ju Chen
- Institute of Chemistry, Academia Sinica, Taipei 115-29, Taiwan
- Christopher Ré
- Department of Computer Science, Stanford University, Stanford, California 94305, United States
- Samuel C. Kou
- Department of Statistics, Harvard University, Cambridge, Massachusetts 02138, United States
- Jung-Hsien Chiang
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan City 701-01, Taiwan
- Isaac S. Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, United States
- Michael Snyder
- Department of Genetics, Stanford University, Stanford, California 94305, United States
49
Lee K, Kim B, Choi Y, Kim S, Shin W, Lee S, Park S, Kim S, Tan AC, Kang J. Deep learning of mutation-gene-drug relations from the literature. BMC Bioinformatics 2018; 19:21. [PMID: 29368597 PMCID: PMC5784504 DOI: 10.1186/s12859-018-2029-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 04/25/2017] [Accepted: 01/17/2018] [Indexed: 12/31/2022]
Abstract
Background: Molecular biomarkers that can predict drug efficacy in cancer patients are crucial to the advancement of precision medicine. However, identifying these biomarkers remains a laborious and challenging task. Next-generation sequencing of patients and preclinical models has increasingly led to the identification of novel gene-mutation-drug relations, and these results have been reported in the scientific literature.
Results: Here, we present two new computational methods that use all PubMed articles as domain-specific background knowledge to assist in the extraction and curation of gene-mutation-drug relations from the literature. The first method uses Biomedical Entity Search Tool (BEST) scoring results among the features used to train machine learning classifiers. The second method uses not only the BEST scoring results but also word vectors, constructed from and trained on large document collections such as PubMed abstracts and Google News articles, in a deep convolutional neural network model. Using features from both the BEST search engine scores and the word vectors, we extract mutation-gene and mutation-drug relations from the literature with machine learning classifiers such as random forests and deep convolutional neural networks. Our methods achieved better results than state-of-the-art methods. Using our proposed features in a simple machine learning model, we obtained F1-scores of 0.96 and 0.82 for mutation-gene and mutation-drug relation classification, respectively. We also developed a deep learning classification model using convolutional neural networks, BEST scores, and word embeddings pre-trained on PubMed or Google News data. With deep learning, classification accuracy improved, and F1-scores of 0.96 and 0.86 were obtained for the mutation-gene and mutation-drug relations, respectively.
Conclusion: We believe that the computational methods described in this research could be used as an important tool for identifying molecular biomarkers that predict drug responses in cancer patients. We also built a database of the mutation-gene-drug relations extracted from all PubMed abstracts, which we believe can prove to be a valuable resource for precision medicine researchers.
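The F1-scores reported in this abstract are the harmonic mean of precision and recall. As a reader's aid (not part of the original paper), the minimal Python sketch below computes the metric from true-positive, false-positive and false-negative counts; the function name `f1_from_counts` and the example counts are hypothetical and do not come from the study.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall) from raw counts."""
    # Precision: fraction of predicted relations that are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of true relations that the classifier recovered.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Avoid division by zero when the classifier finds nothing correct.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not taken from the study):
# precision is about 0.89 and recall about 0.73, so F1 falls between them.
print(round(f1_from_counts(tp=80, fp=10, fn=30), 2))  # prints 0.8
```

Because it is a harmonic mean, F1 is pulled toward the lower of precision and recall, so a classifier cannot score well by maximizing only one of them.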
Affiliation(s)
- Kyubum Lee
- Department of Computer Science and Engineering, Korea University, Seoul, South Korea
- Byounggun Kim
- Interdisciplinary Graduate Program in Bioinformatics, Korea University, Seoul, South Korea
- Yonghwa Choi
- Department of Computer Science and Engineering, Korea University, Seoul, South Korea
- Sunkyu Kim
- Department of Computer Science and Engineering, Korea University, Seoul, South Korea
- Wonho Shin
- Interdisciplinary Graduate Program in Bioinformatics, Korea University, Seoul, South Korea
- Sunwon Lee
- Department of Computer Science and Engineering, Korea University, Seoul, South Korea
- Sungjoon Park
- Department of Computer Science and Engineering, Korea University, Seoul, South Korea
- Seongsoon Kim
- Department of Computer Science and Engineering, Korea University, Seoul, South Korea
- Aik Choon Tan
- Translational Bioinformatics and Cancer Systems Biology Laboratory, Division of Medical Oncology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Jaewoo Kang
- Department of Computer Science and Engineering, Korea University, Seoul, South Korea
- Interdisciplinary Graduate Program in Bioinformatics, Korea University, Seoul, South Korea
50