1
Muhammad SA, Qousain Naqvi ST, Nguyen T, Wu X, Munir F, Jamshed MB, Zhang Q. Cisplatin's potential for type 2 diabetes repositioning by inhibiting CDKN1A, FAS, and SESN1. Comput Biol Med 2021; 135:104640. [PMID: 34261004 DOI: 10.1016/j.compbiomed.2021.104640] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/02/2021] [Revised: 07/06/2021] [Accepted: 07/06/2021] [Indexed: 12/16/2022]
Abstract
Cisplatin is a DNA-damaging chemotherapeutic agent used to treat cancer. Based on cDNA dataset analysis, we investigated how cisplatin modifies gene expression and observed cisplatin-induced dysregulation and system-level variations related to insulin resistance and type 2 diabetes mellitus (T2DM). T2DM is a multifactorial disease affecting 462 million people worldwide, and drug-induced T2DM is a serious issue. To understand this etiology, we designed an integrative, system-level study to identify associations between cisplatin-induced differentially expressed genes (DEGs) and T2DM. From the list of DEGs, cisplatin downregulated the cyclin-dependent kinase inhibitor 1A (CDKN1A), Fas cell surface death receptor (FAS), and sestrin-1 (SESN1) genes, which modify signaling pathways including the p53, JAK-STAT, FOXO, MAPK, mTOR, PI3K-AKT, Toll-like receptor (TLR), adipocytokine, and insulin signaling pathways. These enriched pathways were strongly associated with the disease. We observed significant gene signatures, including SMAD3, IRS, PDK1, PRKAA1, AKT, SOS, RAS, GRB2, MEK1/2, and ERK, interacting with the source genes. This study revealed the value of systems genetics for identifying the cisplatin-induced genetic variants responsible for the progression of T2DM. In addition, by cross-validating against gene expression data for T2DM islets, we found that downregulation of the IRS and PRK families is critical in the insulin and T2DM signaling pathways. By inhibiting CDKN1A, FAS, and SESN1, cisplatin promotes IRS and PRK activity in a way similar to rosiglitazone, a drug used to treat T2DM. Our integrative, network-based approach can help in understanding the drug-induced pathophysiological mechanisms of diabetes.
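Pathway enrichment of the kind reported here is commonly scored with a one-sided hypergeometric test: how likely is it to see at least k pathway genes among n DEGs by chance? The abstract does not say which test the authors used, so the following is a minimal Python sketch under that assumption; the counts in the usage note are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: number of genes in the background universe
    K: background genes annotated to the pathway
    n: genes in the query list (e.g. DEGs)
    k: query genes that fall in the pathway
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("invalid counts")
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probability of drawing i pathway genes for every i >= k.
    # math.comb returns 0 when the lower argument exceeds the upper one.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total
```

For example, `hypergeom_enrichment_p(k=5, n=50, K=100, N=20000)` scores a hypothetical pathway of 100 genes that contains 5 of 50 DEGs drawn from a 20,000-gene background. In practice, p-values across many pathways would also need multiple-testing correction.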
Affiliation(s)
- Syed Aun Muhammad
- Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan
- Thanh Nguyen
- Informatics Institute, School of Medicine, The University of Alabama, Birmingham, AL, USA
- Xiaogang Wu
- The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Fahad Munir
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China; Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Muhammad Babar Jamshed
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China; Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- QiYu Zhang
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
2
Lai PT, Lu Z. BERT-GT: Cross-sentence n-ary relation extraction with BERT and graph transformer. Bioinformatics 2021; 36:5678-5685. [PMID: 33416851 PMCID: PMC8023679 DOI: 10.1093/bioinformatics/btaa1087] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 04/08/2020] [Revised: 12/17/2020] [Accepted: 12/20/2020] [Indexed: 11/14/2022]
Abstract
MOTIVATION A biomedical relation statement is commonly expressed in multiple sentences and involves many concepts, including genes, diseases, chemicals, and mutations. To automatically extract information from biomedical literature, existing biomedical text-mining approaches typically formulate the problem as a cross-sentence n-ary relation-extraction task that detects relations among n entities across multiple sentences, and use either a graph neural network (GNN) with long short-term memory (LSTM) or an attention mechanism. Recently, the Transformer has been shown to outperform LSTM on many natural language processing (NLP) tasks. RESULTS In this work, we propose a novel architecture that combines Bidirectional Encoder Representations from Transformers with Graph Transformer (BERT-GT) by integrating a neighbor-attention mechanism into the BERT architecture. Unlike the original Transformer architecture, which uses the whole sentence(s) to calculate the attention of the current token, the neighbor-attention mechanism in our method calculates a token's attention using only its neighbor tokens. Thus, each token can attend to its neighbor information with little noise. We show that this is critically important when the text is very long, as in cross-sentence or abstract-level relation-extraction tasks. Our benchmarking results show improvements of 5.44% and 3.89% in accuracy and F1-measure over the state of the art on n-ary and chemical-protein relation datasets, suggesting that BERT-GT is a robust approach applicable to other biomedical relation extraction tasks or datasets. AVAILABILITY AND IMPLEMENTATION The source code of BERT-GT will be made freely available at https://github.com/ncbi-nlp/bert_gt upon publication.
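The contrast between full attention and neighbor attention can be illustrated by masking scaled dot-product attention so that each token only sees tokens inside a window. This is a minimal pure-Python sketch, not BERT-GT's implementation: defining "neighbors" as tokens within a fixed linear `window` is an assumption made for illustration, and BERT-GT may define neighborhoods differently (for example, from a graph over the text).

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax; entries equal to -inf get weight 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def neighbor_attention(Q: Matrix, K: Matrix, V: Matrix, window: int) -> Matrix:
    """Scaled dot-product attention in which token i attends only to
    tokens j with |i - j| <= window (the assumed 'neighbors')."""
    n, d = len(Q), len(Q[0])
    out: Matrix = []
    for i in range(n):
        scores = []
        for j in range(n):
            if abs(i - j) <= window:
                dot = sum(q * k for q, k in zip(Q[i], K[j]))
                scores.append(dot / math.sqrt(d))
            else:
                scores.append(float("-inf"))  # mask out non-neighbors
        weights = softmax(scores)
        out.append([sum(w * V[j][c] for j, w in enumerate(weights))
                    for c in range(len(V[0]))])
    return out
```

With `window=0` each token returns its own value vector; with a window at least as long as the sequence, the function reduces to ordinary full attention. Tokens outside the window receive exactly zero weight, which is the "little noise" property the abstract describes.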
Affiliation(s)
- Po-Ting Lai
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, 20894, USA
- Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, 20894, USA
3
Raza W, Guo J, Qadir MI, Bai B, Muhammad SA. qPCR Analysis Reveals Association of Differential Expression of SRR, NFKB1, and PDE4B Genes With Type 2 Diabetes Mellitus. Front Endocrinol (Lausanne) 2021; 12:774696. [PMID: 35046895 PMCID: PMC8761634 DOI: 10.3389/fendo.2021.774696] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/12/2021] [Accepted: 11/08/2021] [Indexed: 11/30/2022]
Abstract
BACKGROUND Type 2 diabetes mellitus (T2DM) is a heterogeneous, chronic metabolic condition affecting a large share of the world's population. Because of their diverse nature, the related variables and their associations with T2DM are not fully understood. However, functional genomics can facilitate understanding of the disease, and this information will be useful for drug design and for advanced diagnostic and prognostic markers. AIM To understand the genetic causes of T2DM, this study was designed to identify the differentially expressed genes (DEGs) of the disease. METHODS We investigated 20 publicly available disease-specific cDNA datasets from Gene Expression Omnibus (GEO) containing several attributes, including gene symbols and clone identifiers, GenBank accession numbers, and phenotypic feature coordinates. We applied an integrated system-level framework involving Gene Ontology (GO), protein motif and co-expression analysis, pathway enrichment, and transcription factors to reveal the biological information of the genes. A co-expression network was studied to highlight the genes that showed a coordinated expression pattern across a group of samples. The DEGs were validated by quantitative PCR (qPCR), comparing expression levels in case and control samples (50 each) with glyceraldehyde 3-phosphate dehydrogenase (GAPDH) as the reference gene. RESULTS From the list of 50 DEGs, we ranked three T2DM-related genes (p < 0.05): SRR, NFKB1, and PDE4B. The enriched terms revealed significant functional roles in amino acid metabolism, signal transduction, transmembrane and intracellular transport, and other vital biological functions. DMBX1, TAL1, ZFP161, NFIC (66.7%), and NR1H4 (33.3%) are transcription factors associated with the regulatory mechanism.
We found substantial enrichment of insulin signaling and other T2DM-related pathways, such as valine, leucine and isoleucine biosynthesis, serine and threonine metabolism, the adipocytokine signaling pathway, the PI3K/Akt pathway, and the Hedgehog signaling pathway. The expression profiles of these DEGs, verified by qPCR, showed substantial fold changes (FC), calculated with the 2^(−ΔΔCT) method, in SRR (FC ≤ 0.12), NFKB1 (FC ≤ 1.09), and PDE4B (FC ≤ 0.9) compared with controls (FC ≥ 1.6). The downregulated expression of these genes is associated with pathophysiological development and metabolic disorders. CONCLUSION This study could help modulate therapeutic strategies for T2DM and speed up drug discovery.
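The fold changes above follow the comparative CT (2^(−ΔΔCT)) method with GAPDH as the reference gene: ΔCT = CT(target) − CT(GAPDH) within each group, ΔΔCT = ΔCT(case) − ΔCT(control), and FC = 2^(−ΔΔCT). A minimal Python sketch of that arithmetic, assuming mean CT values per group and the method's standard assumption of roughly 100% amplification efficiency; the CT values in the checks are hypothetical, not data from the study.

```python
def fold_change_ddct(ct_target_case: float, ct_ref_case: float,
                     ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Relative expression of a target gene in cases vs. controls via 2^-ddCT.

    ct_ref_* are CT values of the reference gene (GAPDH in the study).
    Assumes ~100% PCR efficiency, i.e. the product doubles every cycle.
    """
    d_ct_case = ct_target_case - ct_ref_case   # normalize cases to the reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl   # normalize controls to the reference gene
    dd_ct = d_ct_case - d_ct_ctrl
    return 2.0 ** (-dd_ct)
```

For instance, a target that needs three more cycles than in controls (after GAPDH normalization) gives FC = 2^(−3) = 0.125, i.e. downregulation; FC > 1 indicates upregulation.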
Affiliation(s)
- Waseem Raza
- Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan
- Jinlei Guo
- School of Medical Engineering, Sanquan College of Xinxiang Medical University, Xinxiang, China
- Muhammad Imran Qadir
- Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan
- Baogang Bai
- School of Information and Technology, Wenzhou Business College, Wenzhou, China
- Engineering Research Center of Intelligent Medicine, Wenzhou, China
- The 1st School of Medical, School of Information and Engineering, The 1st Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- *Correspondence: Syed Aun Muhammad; Baogang Bai
- Syed Aun Muhammad
- Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan
4
Abbas SZ, Qadir MI, Muhammad SA. Systems-level differential gene expression analysis reveals new genetic variants of oral cancer. Sci Rep 2020; 10:14667. [PMID: 32887903 PMCID: PMC7473858 DOI: 10.1038/s41598-020-71346-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 12/11/2019] [Accepted: 07/20/2020] [Indexed: 01/28/2023]
Abstract
Oral cancer (OC) ranks as the eleventh most common malignancy worldwide, with increasing incidence among young patients. Limited understanding of the complexities of cancer progression, its developmental mechanisms, and their interactions is a major obstacle to developing optimal and effective treatment strategies. We designed a system-level approach to explore the genetic complexity of the disease and to identify novel OC-related genes, detecting genomic alterations at the molecular level through cDNA differential analysis. We analyzed 21 oral cancer-related cDNA datasets and listed 30 differentially expressed genes (DEGs). Among these 30, we found six significant DEGs (CYP1A1, CYP1B1, ADCY2, C7, SERPINB5, and ANAPC13) and studied their functional role in OC. Our genomic and interaction analysis showed significant enrichment of xenobiotic metabolism, the p53 signaling pathway, and microRNA pathways in OC progression and development. We used human proteomic data on post-translational modifications to interpret disease mutations and inter-individual genetic variations. The mutational analysis revealed predicted disordered regions covering 14%, 12.5%, and 10.5% of the sequences of ADCY2, CYP1B1, and C7, respectively. miRNA target prediction identified OC-associated miRNAs targeting CYP1B1 (hsa-miR-4282), C7 (hsa-miR-2052), and ADCY2 (hsa-miR-216a-3p). We constructed the system-level network and found important gene signatures. The drug-gene interactions of OC source genes with seven FDA-approved OC drugs can help design or identify new drug targets and establish novel biomedical links to disease pathophysiology. This investigation demonstrates the importance of systems genetics for identifying six OC genes (CYP1A1, CYP1B1, ADCY2, C7, SERPINB5, and ANAPC13) as potential drug targets.
Our integrative, network-based, system-level approach could help find genetic variants of OC, accelerating drug discovery and improving understanding of treatment strategies for many cancer types.
Affiliation(s)
- Syeda Zahra Abbas
- Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan
- Muhammad Imran Qadir
- Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan
- Syed Aun Muhammad
- Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan
5
Cañada A, Capella-Gutierrez S, Rabal O, Oyarzabal J, Valencia A, Krallinger M. LimTox: a web tool for applied text mining of adverse event and toxicity associations of compounds, drugs and genes. Nucleic Acids Res 2017; 45:W484-W489. [PMID: 28531339 PMCID: PMC5570141 DOI: 10.1093/nar/gkx462] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 03/13/2017] [Accepted: 05/16/2017] [Indexed: 01/03/2023]
Abstract
Considerable effort has been devoted to systematically retrieving information on genes and proteins as well as the relationships between them. Despite the importance of chemical compounds and drugs as central bio-entities in pharmacological and biological research, only a limited number of freely available chemical text-mining and search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with a special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition, and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based, and term lookup strategies. The system processes scientific abstracts, a set of full-text articles, and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it also enables basic searches for other organ-level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity, and phospholipidosis). The tool supports specialized search queries for chemical compounds/drugs, genes (with additional emphasis on key drug-metabolism enzymes, namely the cytochrome P450s, or CYPs), and biochemical liver markers. The LimTox website is free and open to all users, with no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es
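Of the strategies LimTox combines, term lookup is the simplest to illustrate: scan the tokenized text and tag the longest phrase found in a dictionary. The lexicon, entity labels, and function below are hypothetical and only sketch the general idea; they are not LimTox's actual resources or code.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon: lowercase term -> entity type.
LEXICON: Dict[str, str] = {
    "acetaminophen": "DRUG",
    "cyp2e1": "GENE",
    "alanine aminotransferase": "LIVER_MARKER",
    "hepatotoxicity": "ADVERSE_EVENT",
}
MAX_TERM_TOKENS = max(len(term.split()) for term in LEXICON)


def tag_terms(text: str) -> List[Tuple[str, str]]:
    """Return (matched span, entity type) pairs via greedy longest-match lookup."""
    tokens = re.findall(r"[A-Za-z0-9-]+", text)
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, shrinking to a single token.
        for size in range(min(MAX_TERM_TOKENS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + size])
            label = LEXICON.get(phrase.lower())
            if label:
                found.append((phrase, label))
                i += size
                break
        else:
            i += 1  # no dictionary term starts at this token
    return found
```

Longest-match ensures that a multi-word marker such as "alanine aminotransferase" is tagged as one entity rather than as separate words. Real systems add normalization, synonym lists, and disambiguation on top of this.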
Affiliation(s)
- Andres Cañada
- Spanish National Bioinformatics Institute Unit, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
- Salvador Capella-Gutierrez
- Spanish National Bioinformatics Institute Unit, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
- Obdulia Rabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Pamplona 31008, Spain
- Julen Oyarzabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Pamplona 31008, Spain
- Alfonso Valencia
- Barcelona Supercomputing Center (BSC), Joint BSC-CRG-IRB Research Program in Computational Biology, Barcelona 08028, Spain; Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), 08034 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
- Martin Krallinger
- Biological Text Mining Unit, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
6
Muhammad SA, Fatima N, Paracha RZ, Ali A, Chen JY. A systematic simulation-based meta-analytical framework for prediction of physiological biomarkers in alopecia. J Biol Res (Thessalon) 2019; 26:2. [PMID: 30993080 PMCID: PMC6449998 DOI: 10.1186/s40709-019-0094-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/12/2018] [Accepted: 03/20/2019] [Indexed: 01/13/2023]
Abstract
Background Alopecia, or hair loss, is a complex, polygenic, and psychologically devastating disease affecting millions of men and women globally. Because gene annotation and environmental knowledge are limited for alopecia, a systematic analysis to identify candidate biomarkers is required that could provide potential therapeutic targets for hair loss therapy. Results We designed an interactive framework to perform a meta-analytical study based on differential expression analysis, systems biology, and functional proteomic investigations. We analyzed eight publicly available microarray datasets and found 12 potential candidate biomarkers, including three extracellular proteins, from the list of differentially expressed genes with a p-value < 0.05. After expression profiling and functional analysis, we studied protein–protein interactions and observed functional associations of source proteins, including WIF1, SPON1, LYZ, GPRC5B, PTPRE, ZFP36L2, HBB, PHF15, LMCD1, KRT35 and VAV3, with target proteins, including APCDD1, WNT1, WNT3A, SHH, ESR1, TGFB1, and APP. Pathway analysis of these molecules revealed their role in major physiological reactions, including protein metabolism, signal transduction, and the WNT, BMP, EDA, NOTCH and SHH pathways. These pathways regulate hair growth, hair follicle differentiation, pigmentation, and morphogenesis. We studied the regulatory role of β-catenin, NF-κB, cytokines and retinoic acid in hair growth. Therefore, differential expression of these significant proteins could disturb their normal levels and cause aberrations in hair growth. Conclusion Our integrative approach helps prioritize biomarkers, ultimately lessening the economic burden of experimental studies. It will also be valuable for discovering mutants in genomic data and thereby identifying new biomarkers for similar problems.
Electronic supplementary material: The online version of this article (10.1186/s40709-019-0094-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Syed Aun Muhammad
- Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, 60800 Pakistan
- Nighat Fatima
- Department of Pharmacy, COMSATS Institute of Information Technology, Abbottabad, 22060 Pakistan
- Rehan Zafar Paracha
- Research Center of Modeling and Simulation (RCMS), Department of Computational Sciences, National University of Sciences and Technology (NUST), Islamabad, 44000 Pakistan
- Amjad Ali
- Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, 44000 Pakistan
- Jake Y Chen
- Informatics Institute, School of Medicine, The University of Alabama (UAB), Birmingham, USA
7
Zhu Y, Elemento O, Pathak J, Wang F. Drug knowledge bases and their applications in biomedical informatics research. Brief Bioinform 2018; 20:1308-1321. [DOI: 10.1093/bib/bbx169] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/08/2017] [Revised: 11/15/2017] [Indexed: 11/14/2022]
Abstract
Recent advances in biomedical research have generated a large volume of drug-related data. To handle this flood of data effectively, many initiatives have been taken to help researchers make good use of it. As a result of these initiatives, many drug knowledge bases have been constructed. They range from simple ones with specific focuses to comprehensive ones that contain information on almost every aspect of a drug. These curated drug knowledge bases have made significant contributions to the development of efficient and effective health information technologies for better health-care service delivery. Understanding and comparing existing drug knowledge bases and how they are applied in various biomedical studies will help us recognize the state of the art and design better knowledge bases in the future. In addition, researchers can gain insights into novel applications of drug knowledge bases through a review of successful use cases. In this study, we review existing popular drug knowledge bases and their applications in drug-related studies. We discuss challenges in constructing and using drug knowledge bases as well as future research directions toward a better ecosystem of drug knowledge bases.
8
Muhammad SA, Raza W, Nguyen T, Bai B, Wu X, Chen J. Cellular Signaling Pathways in Insulin Resistance-Systems Biology Analyses of Microarray Dataset Reveals New Drug Target Gene Signatures of Type 2 Diabetes Mellitus. Front Physiol 2017; 8:13. [PMID: 28179884 PMCID: PMC5264126 DOI: 10.3389/fphys.2017.00013] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 11/23/2016] [Accepted: 01/09/2017] [Indexed: 01/09/2023]
Abstract
Purpose: Type 2 diabetes mellitus (T2DM) is a chronic metabolic disorder affecting a large proportion of the world's population. To widen our understanding of the genetic causes of this disease, we performed an interactive, toxicogenomics-based systems biology study to find potential T2DM-related genes after cDNA differential analysis. Methods: From a list of 50 differentially expressed genes (p < 0.05), we found nine T2DM-related genes using extensive data mapping. In our constructed gene network, these nine T2DM-related differentially expressed seeder genes interact with 31 functionally related signature genes. Based on toxicogenomics and data curation, the genetic interaction network of both the T2DM-associated seeder genes and the signature genes relates well to the disease condition. Results: These networks showed significant enrichment of insulin signaling, insulin secretion, and other T2DM-related pathways, including JAK-STAT, MAPK, TGF, Toll-like receptor, p53, mTOR, adipocytokine, FOXO, PPAR, PI3K-AKT, and triglyceride metabolic pathways. Some enriched pathways were common to different conditions. We identified 11 signaling pathways as connecting links between gene signatures in insulin resistance and T2DM. Notably, in the drug-gene network, the interacting genes showed significant overlap with 13 FDA-approved drugs and a few non-approved drugs. This study demonstrates the value of systems genetics for identifying 18 potential T2DM-associated genes that are probable drug targets. Conclusions: These integrative, network-based approaches for finding variants in genomic data are expected to accelerate the identification of new drug target molecules for different diseases and to speed up drug discovery.
Affiliation(s)
- Syed Aun Muhammad
- Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan; Institute of Biopharmaceutical Informatics and Technologies, Wenzhou Medical University, Wenzhou, China; Wenzhou Medical University, 1st Affiliated Hospital Wenzhou, Wenzhou, China
- Waseem Raza
- Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan
- Thanh Nguyen
- Institute of Biopharmaceutical Informatics and Technologies, Wenzhou Medical University, Wenzhou, China; Wenzhou Medical University, 1st Affiliated Hospital Wenzhou, Wenzhou, China; Department of Computer and Information Science, Purdue University, Indianapolis, IN, USA
- Baogang Bai
- Institute of Biopharmaceutical Informatics and Technologies, Wenzhou Medical University, Wenzhou, China
- Xiaogang Wu
- Institute for Systems Biology, Seattle, WA, USA
- Jake Chen
- Institute of Biopharmaceutical Informatics and Technologies, Wenzhou Medical University, Wenzhou, China; Wenzhou Medical University, 1st Affiliated Hospital Wenzhou, Wenzhou, China; Department of Computer and Information Science, Purdue University, Indianapolis, IN, USA; Indiana Center for Systems Biology and Personalized Medicine, Indiana University-Purdue University, Indianapolis, IN, USA; Informatics Institute, School of Medicine, The University of Alabama, Birmingham, AL, USA
9
Xu D, Zhang M, Xie Y, Wang F, Chen M, Zhu KQ, Wei J. DTMiner: identification of potential disease targets through biomedical literature mining. Bioinformatics 2016; 32:3619-3626. [PMID: 27506226 PMCID: PMC5181534 DOI: 10.1093/bioinformatics/btw503] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 04/21/2016] [Revised: 06/07/2016] [Accepted: 07/19/2016] [Indexed: 11/12/2022]
Abstract
Motivation: Biomedical researchers often search through massive catalogues of literature to look for potential relationships between genes and diseases. Given the rapid growth of biomedical literature, automatic relation extraction, a crucial technology in biomedical literature mining, has shown great potential to support research on gene-related diseases. Existing work in this field has produced datasets that are limited in both scale and accuracy. Results: In this study, we propose a reliable and efficient framework that takes large biomedical literature repositories as input, identifies credible relationships between diseases and genes, and presents possible genes related to a given disease and possible diseases related to a given gene. The framework incorporates named entity recognition (NER), which identifies occurrences of genes and diseases in texts; association detection, whereby we extract and evaluate features from gene–disease pairs; and ranking algorithms that estimate how closely the pairs are related. The F1-score of the NER phase is 0.87, which is higher than in existing studies. The association detection phase takes drastically less time than previous work while maintaining a comparable F1-score of 0.86. The end-to-end result achieves a 0.259 F1-score for the top 50 genes associated with a disease, which is better than previous work. In addition, we released a web service for public use of the dataset. Availability and Implementation: The implementation of the proposed algorithms is publicly available at http://gdr-web.rwebox.com/public_html/index.php?page=download.php. The web service is available at http://gdr-web.rwebox.com/public_html/index.php. Contact: jenny.wei@astrazeneca.com or kzhu@cs.sjtu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
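The end-to-end figure is an F1-score over the top 50 genes ranked for a disease. One common way to compute such a top-k F1 is sketched below, assuming a curated gold-standard gene set exists for each disease; the abstract does not spell out the exact evaluation protocol, so treat this as an illustration rather than DTMiner's code.

```python
from typing import Iterable, List


def f1_at_k(ranked_genes: List[str], gold_genes: Iterable[str], k: int) -> float:
    """F1 between the top-k predicted genes and a gold-standard gene set."""
    gold = set(gold_genes)
    top_k = set(ranked_genes[:k])
    hits = len(top_k & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)  # fraction of predictions that are correct
    recall = hits / len(gold)      # fraction of gold genes recovered
    return 2 * precision * recall / (precision + recall)
```

Usage: `f1_at_k(ranked, gold, k=50)` evaluates the top 50 genes returned for one disease; averaging over diseases would give a single summary score.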
Affiliation(s)
- Dong Xu
- Department of CSE, Shanghai Jiao Tong University, Shanghai 200240, China
- Meizhuo Zhang
- R&D Information, Innovation Center China, AstraZeneca, Pudong, Shanghai 201203, China
- Yanping Xie
- Department of CSE, Shanghai Jiao Tong University, Shanghai 200240, China
- Fan Wang
- Department of CSE, Shanghai Jiao Tong University, Shanghai 200240, China
- Ming Chen
- R&D Information, Innovation Center China, AstraZeneca, Pudong, Shanghai 201203, China
- Kenny Q Zhu
- Department of CSE, Shanghai Jiao Tong University, Shanghai 200240, China
- Jia Wei
- R&D Information, Innovation Center China, AstraZeneca, Pudong, Shanghai 201203, China
10
Niu Y, Wang Y. Protein-protein interaction identification using a hybrid model. Artif Intell Med 2015; 64:185-93. [PMID: 26054427 DOI: 10.1016/j.artmed.2015.05.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/15/2014] [Revised: 05/13/2015] [Accepted: 05/15/2015] [Indexed: 11/26/2022]
Abstract
BACKGROUND Most existing systems that identify protein-protein interaction (PPI) in literature make decisions solely on evidence within a single sentence and ignore the rich context of PPI descriptions in large corpora. Moreover, they often suffer from the heavy burden of manual annotation. METHODS To address these problems, a new relational-similarity (RS)-based approach exploiting context in large-scale text is proposed. A basic RS model is first established to make initial predictions. Then word similarity matrices that are sensitive to the PPI identification task are constructed using a corpus-based approach. Finally, a hybrid model is developed to integrate the word similarity model with the basic RS model. RESULTS The experimental results show that the basic RS model achieves F-scores much higher than a baseline of random guessing on interactions (from 50.6% to 75.0%) and non-interactions (from 49.4% to 74.2%). The hybrid model further improves F-score by about 2% on interactions and 3% on non-interactions. CONCLUSION The experimental evaluations conducted with PPIs in well-known databases showed the effectiveness of our approach that explores context information in PPI identification. This investigation confirmed that within the framework of relational similarity, the word similarity model relieves the data sparseness problem in similarity calculation.
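The core intuition of relational similarity, that a protein pair is labeled by how similar its contexts are to those of pairs with known labels, can be illustrated with a nearest-neighbor classifier over bag-of-words context vectors and cosine similarity. This is a deliberately simplified sketch under assumed features; it does not reproduce the paper's RS model or its corpus-derived word-similarity matrices, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def bag_of_words(sentences: List[str]) -> Counter:
    return Counter(w for s in sentences for w in s.lower().split())


def classify_pair(contexts: List[str], labeled: List[Tuple[List[str], bool]]) -> bool:
    """Predict interaction (True) or not (False) from the most similar labeled pair.

    contexts: sentences in which the query protein pair co-occurs
    labeled:  (contexts, interacts) for pairs with known labels
    """
    query = bag_of_words(contexts)
    best_label, best_sim = False, -1.0
    for ctx, label in labeled:
        sim = cosine(query, bag_of_words(ctx))
        if sim > best_sim:
            best_sim, best_label = sim, label
    return best_label
```

Exact word overlap is the weak point of this sketch: contexts that use synonyms ("binds" vs. "associates with") score zero. That data-sparseness problem is what the paper's word similarity model is described as relieving.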
Affiliation(s)
- Yun Niu
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, 29 Yudao Street, Qinhuaiqu, Nanjing, Jiangsu 210016, China
- Yuwei Wang
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, 29 Yudao Street, Qinhuaiqu, Nanjing, Jiangsu 210016, China
11
Rinaldi F, Clematide S, Marques H, Ellendorff T, Romacker M, Rodriguez-Esteban R. OntoGene web services for biomedical text mining. BMC Bioinformatics 2014; 15 Suppl 14:S6. [PMID: 25472638 PMCID: PMC4255746 DOI: 10.1186/1471-2105-15-s14-s6] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 02/02/2023]
Abstract
Text mining services are rapidly becoming a crucial component of various knowledge management pipelines, for example in the process of database curation, or for exploration and enrichment of biomedical data within the pharmaceutical industry. Traditional architectures, based on monolithic applications, do not offer sufficient flexibility for a wide range of use case scenarios, and therefore open architectures, as provided by web services, are attracting increased interest. We present an approach towards providing advanced text mining capabilities through web services, using a recently proposed standard for textual data interchange (BioC). The web services leverage a state-of-the-art platform for text mining (OntoGene) which has been tested in several community-organized evaluation challenges, with top ranked results in several of them.
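For readers unfamiliar with BioC, the sketch below builds a minimal BioC-style XML document (collection, document, passage, annotation with a character-offset location) using only Python's standard library. The element layout follows the BioC collection/document/passage/annotation structure as we understand it; the BioC DTD is the authoritative schema. The source, date, key, document ID, text, and annotation are invented placeholders.

```python
import xml.etree.ElementTree as ET


def build_bioc(doc_id: str, text: str, entity: str, entity_type: str) -> str:
    """Return a BioC-style XML string with one passage and one annotation."""
    collection = ET.Element("collection")
    ET.SubElement(collection, "source").text = "example"      # placeholder
    ET.SubElement(collection, "date").text = "20140101"       # placeholder
    ET.SubElement(collection, "key").text = "example.key"     # placeholder
    document = ET.SubElement(collection, "document")
    ET.SubElement(document, "id").text = doc_id
    passage = ET.SubElement(document, "passage")
    ET.SubElement(passage, "infon", key="type").text = "abstract"
    ET.SubElement(passage, "offset").text = "0"
    ET.SubElement(passage, "text").text = text
    start = text.find(entity)
    if start < 0:
        raise ValueError("entity not found in text")
    ann = ET.SubElement(passage, "annotation", id="T1")
    ET.SubElement(ann, "infon", key="type").text = entity_type
    # Character offset and length locate the mention inside the passage text.
    ET.SubElement(ann, "location", offset=str(start), length=str(len(entity)))
    ET.SubElement(ann, "text").text = entity
    return ET.tostring(collection, encoding="unicode")
```

Because annotations carry offsets rather than inline markup, a web service can return entities and relations for a document without altering its text, which is what makes a shared interchange format practical between independent tools.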
12
Rinaldi F, Clematide S, Hafner S, Schneider G, Grigonyte G, Romacker M, Vachon T. Using the OntoGene pipeline for the triage task of BioCreative 2012. Database (Oxford) 2013; 2013:bas053. [PMID: 23396322 PMCID: PMC3568389 DOI: 10.1093/database/bas053] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 11/21/2022]
Abstract
In this article, we describe the architecture of the OntoGene Relation mining pipeline and its application in the triage task of BioCreative 2012. The aim of the task is to support the triage of abstracts relevant to the process of curation of the Comparative Toxicogenomics Database. We use a conventional information retrieval system (Lucene) to provide a baseline ranking, which we then combine with information provided by our relation mining system, in order to achieve an optimized ranking. Our approach additionally delivers domain entities mentioned in each input document as well as candidate relationships, both ranked according to a confidence score computed by the system. This information is presented to the user through an advanced interface aimed at supporting the process of interactive curation. Thanks, in particular, to the high-quality entity recognition, the OntoGene system achieved the best overall results in the task.
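The abstract does not specify how the Lucene baseline and the relation-mining evidence were combined. One simple possibility, shown here purely as an assumption, is a weighted linear interpolation of min-max-normalized scores; the weight `alpha`, the function names, and the example scores are hypothetical.

```python
from typing import Dict, List


def minmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score list maps to all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def combined_ranking(ir: Dict[str, float], rel: Dict[str, float],
                     alpha: float = 0.5) -> List[str]:
    """Rank documents by alpha * IR score + (1 - alpha) * relation score."""
    ir_n, rel_n = minmax(ir), minmax(rel)
    # A document missing from one system gets 0.0 (that system's minimum).
    combined = {d: alpha * ir_n.get(d, 0.0) + (1 - alpha) * rel_n.get(d, 0.0)
                for d in ir.keys() | rel.keys()}
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(combined, key=lambda d: (-combined[d], d))
```

Normalizing first matters because raw Lucene scores and relation confidences live on different scales; `alpha = 1.0` reproduces the baseline ranking and `alpha = 0.0` ranks by relation evidence alone.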
Affiliation(s)
- Fabio Rinaldi
- Institute of Computational Linguistics, University of Zurich, Binzmuhlestrasse 14, Zurich 8050, Switzerland
13
Islamaj Doğan R, Yeganova L. Topics in machine learning for biomedical literature analysis and text retrieval. J Biomed Semantics 2012; 3 Suppl 3:S1. [PMID: 23046748 PMCID: PMC3465208 DOI: 10.1186/2041-1480-3-s3-s1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Affiliation(s)
- Rezarta Islamaj Doğan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
14
Rinaldi F, Clematide S, Garten Y, Whirl-Carrillo M, Gong L, Hebert JM, Sangkuhl K, Thorn CF, Klein TE, Altman RB. Using ODIN for a PharmGKB revalidation experiment. Database (Oxford) 2012; 2012:bas021. [PMID: 22529178 PMCID: PMC3332569 DOI: 10.1093/database/bas021] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/17/2022]
Abstract
The need for efficient text-mining tools that support curation of the biomedical literature is ever increasing. In this article, we describe an experiment aimed at verifying whether a text-mining tool capable of extracting meaningful relationships among domain entities can be successfully integrated into the curation workflow of a major biological database. We evaluate in particular (i) the usability of the system's interface, as perceived by users, and (ii) the correlation of the ranking of interactions, as provided by the text-mining system, with the choices of the curators.
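The abstract does not name the measure used to correlate the system's ranking of interactions with the curators' choices. Spearman's rank correlation is a common choice for this kind of comparison; a minimal sketch assuming no tied ranks, where ρ = 1 − 6Σd² / (n(n² − 1)) and d is each item's rank difference. The interaction IDs in the checks are hypothetical.

```python
from typing import List


def spearman_rho(system_rank: List[str], curator_rank: List[str]) -> float:
    """Spearman's rho between two rankings of the same distinct items (no ties)."""
    if len(set(system_rank)) != len(system_rank) or sorted(system_rank) != sorted(curator_rank):
        raise ValueError("rankings must contain the same distinct items")
    n = len(system_rank)
    if n < 2:
        raise ValueError("need at least two items")
    pos = {item: i for i, item in enumerate(curator_rank)}
    # d is the difference between an item's position in the two rankings.
    d2 = sum((i - pos[item]) ** 2 for i, item in enumerate(system_rank))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

A value of 1 means the system orders interactions exactly as the curators do, and -1 means the exact reverse. If curators only accept or reject interactions rather than rank them, a precision-oriented measure would be a better fit.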
Affiliation(s)
- Fabio Rinaldi
- Institute of Computational Linguistics, Binzmuhlestrasse 171, 8050 Zurich, Switzerland