1
Wang X, Lyu J, Dong L, Xu K. Multitask learning for biomedical named entity recognition with cross-sharing structure. BMC Bioinformatics 2019; 20:427. [PMID: 31419937 PMCID: PMC6697996 DOI: 10.1186/s12859-019-3000-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/06/2018] [Accepted: 07/18/2019] [Indexed: 01/06/2023]
Abstract
BACKGROUND Biomedical named entity recognition (BioNER) is a fundamental and essential task for biomedical literature mining, which affects the performance of downstream tasks. Most BioNER models rely on domain-specific features or hand-crafted rules, but extracting features from massive data requires considerable time and human effort. To solve this, neural network models are used to learn features automatically. Recently, multi-task learning has been applied successfully to neural network models for biomedical literature mining. For BioNER models, multi-task learning makes use of features from multiple datasets and improves model performance. RESULTS In experiments, we compared our proposed model with other multi-task models and found that it outperformed them on datasets in the gene, protein, and disease categories. We also tested different dataset pairs to find the best partner for each dataset. In addition, we explored and analyzed the influence of different entity types by using sub-datasets. When dataset size was reduced, our model still produced positive results. CONCLUSION We propose a novel multi-task model for BioNER with a cross-sharing structure to improve the performance of multi-task models. The cross-sharing structure in our model makes use of features from both datasets in the training procedure. Detailed analysis of the best dataset partners and of the influence between entity categories can guide the choice of proper dataset pairs for multi-task training. Our implementation is available at https://github.com/JogleLew/bioner-cross-sharing.
Affiliation(s)
- Xi Wang
- State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China
- Jiagao Lyu
- State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China
- Li Dong
- State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China
- Ke Xu
- State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China
2
Zhao Y, Jhamb D, Shu L, Arneson D, Rajpal DK, Yang X. Multi-omics integration reveals molecular networks and regulators of psoriasis. BMC Syst Biol 2019; 13:8. [PMID: 30642337 PMCID: PMC6332659 DOI: 10.1186/s12918-018-0671-x] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 11/08/2018] [Accepted: 12/11/2018] [Indexed: 12/19/2022]
Abstract
BACKGROUND Psoriasis is a complex multi-factorial disease, involving both genetic susceptibilities and environmental triggers. Genome-wide association studies (GWAS) and epigenome-wide association studies (EWAS) have been carried out to identify genetic and epigenetic variants that are associated with psoriasis. However, these loci cannot fully explain the disease pathogenesis. METHODS To achieve a comprehensive mechanistic understanding of psoriasis, we conducted a systems biology study, integrating multi-omics datasets including GWAS, EWAS, tissue-specific transcriptome, expression quantitative trait loci (eQTLs), gene networks, and biological pathways to identify the key genes, processes, and networks that are genetically and epigenetically associated with psoriasis risk. RESULTS This integrative genomics study identified both well-characterized (e.g., the IL17 pathway in both GWAS and EWAS) and novel biological processes (e.g., the branched chain amino acid catabolism process in GWAS and the platelet and coagulation pathway in EWAS) involved in psoriasis. Finally, by utilizing tissue-specific gene regulatory networks, we unraveled the interactions among the psoriasis-associated genes and pathways in a tissue-specific manner and detected potential key regulatory genes in the psoriasis networks. CONCLUSIONS The integration and convergence of multi-omics signals provide deeper and more comprehensive insights into the biological mechanisms associated with psoriasis susceptibility.
Affiliation(s)
- Yuqi Zhao
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Dr. East, Los Angeles, CA, 90095, USA
- Deepali Jhamb
- Target Sciences, Computational Biology (US) GSK, 1250 South Collegeville Road, Collegeville, PA, 19426, USA
- Le Shu
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Dr. East, Los Angeles, CA, 90095, USA
- Douglas Arneson
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Dr. East, Los Angeles, CA, 90095, USA
- Deepak K Rajpal
- Target Sciences, Computational Biology (US) GSK, 1250 South Collegeville Road, Collegeville, PA, 19426, USA
- Xia Yang
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Dr. East, Los Angeles, CA, 90095, USA; Institute for Quantitative and Computational Biosciences, University of California, 610 Charles E. Young Dr. East, Los Angeles, CA, 90095, USA; Molecular Biology Institute, University of California, 610 Charles E. Young Dr. East, Los Angeles, CA, 90095, USA; Bioinformatics Interdepartmental Program, University of California, 10 Charles E. Young Dr. East, Los Angeles, CA, 90095, USA
4
Henderson J, Ke J, Ho JC, Ghosh J, Wallace BC. Phenotype Instance Verification and Evaluation Tool (PIVET): A Scaled Phenotype Evidence Generation Framework Using Web-Based Medical Literature. J Med Internet Res 2018; 20:e164. [PMID: 29728351 PMCID: PMC5960038 DOI: 10.2196/jmir.9610] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/14/2017] [Revised: 02/26/2018] [Accepted: 02/28/2018] [Indexed: 12/24/2022]
Abstract
Background Researchers are developing methods to automatically extract clinically relevant and useful patient characteristics from raw healthcare datasets. These characteristics, often capturing essential properties of patients with common medical conditions, are called computational phenotypes. Being generated by automated or semiautomated, data-driven methods, such potential phenotypes need to be validated as clinically meaningful (or not) before they are acceptable for use in decision making. Objective The objective of this study was to present Phenotype Instance Verification and Evaluation Tool (PIVET), a framework that uses co-occurrence analysis on an online corpus of publicly available medical journal articles to build clinical relevance evidence sets for user-supplied phenotypes. PIVET adopts a conceptual framework similar to the pioneering prototype tool PheKnow-Cloud that was developed for the phenotype validation task. PIVET completely refactors each part of the PheKnow-Cloud pipeline to deliver vast improvements in speed without sacrificing the quality of the insights PheKnow-Cloud achieved. Methods PIVET leverages indexing in NoSQL databases to efficiently generate evidence sets. Specifically, PIVET uses a succinct representation of the phenotypes that corresponds to the index on the corpus database and an optimized co-occurrence algorithm inspired by the Aho-Corasick algorithm. We compare PIVET’s phenotype representation with PheKnow-Cloud’s by using PheKnow-Cloud’s experimental setup. In PIVET’s framework, we also introduce a statistical model trained on domain expert–verified phenotypes to automatically classify phenotypes as clinically relevant or not. Additionally, we show how the classification model can be used to examine user-supplied phenotypes in an online, rather than batch, manner.
Results PIVET maintains the discriminative power of PheKnow-Cloud in terms of identifying clinically relevant phenotypes for the same corpus with which PheKnow-Cloud was originally developed, but PIVET’s analysis is an order of magnitude faster than that of PheKnow-Cloud. Not only is PIVET much faster, it can be scaled to a larger corpus and still retain speed. We evaluated multiple classification models on top of the PIVET framework and found ridge regression to perform best, realizing an average F1 score of 0.91 when predicting clinically relevant phenotypes. Conclusions Our study shows that PIVET improves on the most notable existing computational tool for phenotype validation in terms of speed and automation and is comparable in terms of accuracy.
Affiliation(s)
- Jette Henderson
- The University of Texas at Austin, Austin, TX, United States
- Junyuan Ke
- Emory University, Atlanta, GA, United States
- Joyce C Ho
- Emory University, Atlanta, GA, United States
- Joydeep Ghosh
- The University of Texas at Austin, Austin, TX, United States
5
Rastegar-Mojarad M, Liu H, Nambisan P. Using Social Media Data to Identify Potential Candidates for Drug Repurposing: A Feasibility Study. JMIR Res Protoc 2016; 5:e121. [PMID: 27311964 PMCID: PMC4929348 DOI: 10.2196/resprot.5621] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 02/08/2016] [Revised: 04/14/2016] [Accepted: 04/15/2016] [Indexed: 11/26/2022]
Abstract
Background Drug repurposing (defined as discovering new indications for existing drugs) could play a significant role in drug development, especially considering the declining success rates of developing novel drugs. Typically, new indications for existing medications are identified by accident. However, new technologies and a large number of available resources enable the development of systematic approaches to identify and validate drug-repurposing candidates. Patients today report their experiences with medications on social media and reveal side effects as well as beneficial effects of those medications. Objective Our aim was to assess the feasibility of using patient reviews from social media to identify potential candidates for drug repurposing. Methods We retrieved patient reviews of 180 medications from an online forum, WebMD. Using dictionary-based and machine learning approaches, we identified disease names in the reviews. Several publicly available resources were used to exclude comments containing known indications and adverse drug effects. After manually reviewing some of the remaining comments, we implemented a rule-based system to identify beneficial effects. Results The dictionary-based system and machine learning system identified 2178 and 6171 disease names respectively in 64,616 patient comments. We provided a list of 10 common patterns that patients used to report any beneficial effects or uses of medication. After manually reviewing the comments tagged by our rule-based system, we identified five potential drug repurposing candidates. Conclusions To our knowledge, this is the first study to consider using social media data to identify drug-repurposing candidates. We found that even a rule-based system, with a limited number of rules, could identify beneficial effect mentions in patient comments. Our preliminary study shows that social media has the potential to be used in drug repurposing.
Affiliation(s)
- Majid Rastegar-Mojarad
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, United States.
6
McEntire R, Szalkowski D, Butler J, Kuo MS, Chang M, Chang M, Freeman D, McQuay S, Patel J, McGlashen M, Cornell WD, Xu JJ. Application of an automated natural language processing (NLP) workflow to enable federated search of external biomedical content in drug discovery and development. Drug Discov Today 2016; 21:826-35. [DOI: 10.1016/j.drudis.2016.03.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 12/11/2015] [Revised: 02/25/2016] [Accepted: 03/07/2016] [Indexed: 10/22/2022]