Reference Citation Analysis

For: Zweigenbaum P, Demner-Fushman D, Yu H, Cohen KB. Frontiers of biomedical text mining: current progress. Brief Bioinform 2007;8:358-75. [PMID: 17977867 PMCID: PMC2516302 DOI: 10.1093/bib/bbm045] [Citation(s) in RCA: 141] [Impact Index Per Article: 8.3] [Indexed: 12/21/2022]

Cited by Other Article(s)

Matsumoto Y, Gotoh H. Compound Classification and Consideration of Correlation with Chemical Descriptors from Articles on Antioxidant Capacity Using Natural Language Processing. J Chem Inf Model 2024;64:119-127. [PMID: 38118462 DOI: 10.1021/acs.jcim.3c01826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/22/2023]

Natural Language Processing to Extract Information from Portuguese-Language Medical Records. DATA 2022. [DOI: 10.3390/data8010011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/31/2022]

Li Y, Hui L, Zou L, Li H, Xu L, Wang X, Chua S. Relation Extraction in Biomedical Texts: Development of a Multi-Head Attention Model with Syntactic Dependency Feature (Preprint). JMIR Med Inform 2022;10:e41136. [PMID: 36264604 PMCID: PMC9634522 DOI: 10.2196/41136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2022] [Revised: 08/27/2022] [Accepted: 09/07/2022] [Indexed: 11/19/2022]

Abstract

Background

With the rapid expansion of biomedical literature, biomedical information extraction has attracted increasing attention from researchers. In particular, relation extraction between 2 entities is a long-standing research topic.

Objective

This study aimed to perform 2 multiclass relation extraction tasks from the Biomedical Natural Language Processing Workshop 2019 Open Shared Tasks: relation extraction of Bacteria-Biotope (BB-rel) and binary relation extraction of plant seed development (SeeDev-binary). In essence, both tasks aim to extract the relation between annotated entity pairs from biomedical texts, which is a challenging problem.

Methods

Traditional approaches used feature- or kernel-based methods and achieved good performance. For these tasks, we propose a deep learning model based on a combination of several distributed features, such as domain-specific word embeddings, part-of-speech embeddings, entity-type embeddings, distance embeddings, and position embeddings. A multi-head attention mechanism is used to extract global semantic features of the entire sentence. In addition, we introduce a dependency-type feature and the shortest dependency path connecting the 2 candidate entities in the syntactic dependency graph to enrich the feature representation.
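To make the "shortest dependency path" feature concrete, here is a minimal sketch that finds the path between two entity tokens with breadth-first search, treating dependency arcs as undirected edges. The example sentence and its head indices are a hand-made, hypothetical parse, not output from the authors' pipeline; in the model described above, such a path would be embedded and learned from rather than simply printed.

```python
from collections import deque


def shortest_dependency_path(heads, start, end):
    """Return the token indices on the shortest path from start to end.

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    Arcs are treated as undirected so the path can go up and down the tree.
    """
    # Build an undirected adjacency list from the head links.
    adjacency = {i: set() for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            adjacency[child].add(head)
            adjacency[head].add(child)

    # Breadth-first search, remembering each node's predecessor.
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for neighbour in sorted(adjacency[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)

    if end not in previous:
        return []  # the two entities are not connected
    # Walk back from end to start, then reverse.
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


# Hypothetical parse of "Bacteria were isolated from the soil samples".
tokens = ["Bacteria", "were", "isolated", "from", "the", "soil", "samples"]
heads = [2, 2, -1, 6, 6, 6, 2]
path = shortest_dependency_path(heads, 0, 5)
print([tokens[i] for i in path])  # ['Bacteria', 'isolated', 'samples', 'soil']
```

BFS is enough here because every arc has the same cost; a weighted variant would only matter if some dependency types were to be penalized.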

Results

Experiments show that our proposed model performs well in biomedical relation extraction, achieving F₁ scores of 65.56% and 38.04% on the test sets of the BB-rel and SeeDev-binary tasks, respectively. On the SeeDev-binary task in particular, the F₁ score of our model is superior to that of other existing models and achieves state-of-the-art performance.

Conclusions

We demonstrated that the multi-head attention mechanism can learn relevant syntactic and semantic features in different representation subspaces and at different positions to extract a comprehensive feature representation. Moreover, syntactic dependency features can improve the performance of the model by learning dependency relations between the entities in biomedical texts.

Ong SQ, Pauzi MBM, Gan KH. Text mining in mosquito-borne disease: A systematic review. Acta Trop 2022;231:106447. [PMID: 35430265 PMCID: PMC9663275 DOI: 10.1016/j.actatropica.2022.106447] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/18/2022] [Revised: 03/31/2022] [Accepted: 04/01/2022] [Indexed: 01/09/2023]

Abstract

Mosquito-borne diseases are emerging and re-emerging across the globe, especially after the COVID-19 pandemic. Recent advances in text mining of infectious diseases hold the potential to provide timely access to explicit and implicit associations among pieces of information in text. In the past few years, the availability of online text data in unstructured or semi-structured form, rich in information from this domain, has enabled many studies to provide solutions in this area, e.g., disease-related knowledge discovery, disease surveillance, and early detection systems. However, to the best of our knowledge, no recent review of text mining in the domain of mosquito-borne disease was available. In this review, we survey recent work on the text mining techniques used in combating mosquito-borne diseases. We highlight the corpus sources, technologies, applications, and challenges faced by the studies, followed by possible future directions in this domain. We present a bibliometric analysis of the 294 scientific articles on text mining in mosquito-borne diseases published in Scopus and PubMed from 2016 to 2021. The papers were further filtered and reviewed based on the techniques used to analyze text related to mosquito-borne diseases. Of the corpus of 158 selected articles, we found 27 that were relevant and used text mining in mosquito-borne diseases. These articles mainly covered Zika (38.70%), dengue (32.26%), and malaria (29.03%), with very few or no articles on other crucial mosquito-borne diseases such as chikungunya, yellow fever, and West Nile fever. Twitter was the dominant corpus resource for text mining in mosquito-borne diseases, followed by the PubMed and LexisNexis databases.
Sentiment analysis was the most popular text mining technique for understanding the discourse around the disease, followed by information extraction, which used dependency-relation and co-occurrence-based approaches to extract relations and events. Surveillance was the main use in most of the reviewed studies, followed by treatment, which focused on drug-disease or symptom-disease associations. Advances in text mining could improve the management of mosquito-borne diseases. However, the techniques and applications pose many limitations and challenges, including biases (such as user authentication and language) and real-world implementation. We discuss future directions that could be useful for expanding this area and domain. This review paper contributes mainly as a library for text mining in mosquito-borne diseases, and the approach could be extended to other neglected diseases.
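The co-occurrence-based approach mentioned above can be sketched, in its simplest form, as counting how often a disease term and a symptom term appear in the same sentence. The term lists, the naive sentence splitter and the example text below are illustrative assumptions, not taken from any of the reviewed studies.

```python
import re
from collections import Counter
from itertools import product


def cooccurrence_counts(text, diseases, symptoms):
    """Count disease-symptom pairs that co-occur within a sentence."""
    counts = Counter()
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        # Plain substring matching; a real system would tokenize and
        # normalize terms against a vocabulary such as MeSH.
        found_diseases = {d for d in diseases if d in lowered}
        found_symptoms = {s for s in symptoms if s in lowered}
        for pair in product(sorted(found_diseases), sorted(found_symptoms)):
            counts[pair] += 1
    return counts


text = ("Dengue often presents with fever and rash. "
        "Zika infection can also cause rash. "
        "Fever was reported in most dengue cases.")
counts = cooccurrence_counts(text, {"dengue", "zika"}, {"fever", "rash"})
print(counts.most_common())
```

Raw counts like these are usually only a starting point; pairs are then ranked with an association score or filtered by dependency patterns, which is where the dependency-relation approaches come in.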

Transducer Cascades for Biological Literature-Based Discovery. INFORMATION 2022. [DOI: 10.3390/info13050262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]

Abstract

G protein-coupled receptors (GPCRs) control the response of cells to many signals and, as such, are involved in most cellular processes. As membrane receptors, they are accessible at the cell surface. GPCRs are also the largest family of membrane receptors, with more than 800 representatives in mammalian genomes. For this reason, they are ideal drug targets. Although about one third of approved drugs target GPCRs, only about 16% of GPCRs are targeted by drugs. One difficulty stems from the lack of knowledge about the intracellular events triggered by these molecules. In the last two decades, scientists have started mapping the signaling networks triggered by GPCRs. However, it soon became apparent that the system is very complex, which has led to the publication of more than 320,000 scientific papers. Clearly, no human can take such a massive body of information into account. These papers are a mine of information about both ontological knowledge and experimental results related to GPCRs, which must be exploited in order to build signaling networks. The ABLISS project aims at the automatic building of GPCR networks using automated deductive reasoning, allowing all available data to be integrated. To this end, we automatically extracted network information from the literature using Natural Language Processing (NLP). We mainly focused on the experimental results about GPCRs reported in the scientific papers, as so far there is no source gathering all these experimental results. We designed a relational database in order to make them available to the scientific community later. After introducing the more general objectives of the ABLISS project, we describe the formalism in detail. We then explain the NLP program that we implemented using finite-state methods (Unitex graph cascades) and discuss the extracted facts obtained. Finally, we present the design of the relational database that stores the facts extracted from the selected papers.
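A Unitex graph cascade applies a sequence of finite-state transducers in order, each one rewriting the output of the previous one. As a rough stand-in (ordered regular-expression substitutions in Python, not Unitex itself), the sketch below first tags a receptor and a ligand, and a later stage combines those tags into a fact. The patterns, the ligand list and the tag names are hypothetical.

```python
import re

# Each stage is (pattern, replacement). Stages run in order, so later
# stages can match the tags inserted by earlier ones. All patterns here
# are hypothetical examples, not the ABLISS grammars.
CASCADE = [
    # Stage 1: mark receptor names such as "beta2-adrenergic receptor".
    (r"\b([\w-]+ receptor)\b", r"<REC>\1</REC>"),
    # Stage 2: mark ligands from a small, hypothetical list.
    (r"\b(isoproterenol|adrenaline)\b", r"<LIG>\1</LIG>"),
    # Stage 3: combine a ligand tag and a receptor tag into an activation fact.
    (r"<LIG>(.+?)</LIG> activates (?:the )?<REC>(.+?)</REC>",
     r"<FACT type=activation ligand='\1' receptor='\2'/>"),
]


def run_cascade(sentence, cascade=CASCADE):
    """Apply every rewriting stage in sequence and return the final text."""
    for pattern, replacement in cascade:
        sentence = re.sub(pattern, replacement, sentence)
    return sentence


print(run_cascade("isoproterenol activates the beta2-adrenergic receptor"))
```

The point of a cascade is that each stage stays small and reusable: the receptor and ligand stages can feed many different fact-building stages without being rewritten.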

Zhao S, Su C, Lu Z, Wang F. Recent advances in biomedical literature mining. Brief Bioinform 2021;22:bbaa057. [PMID: 32422651 PMCID: PMC8138828 DOI: 10.1093/bib/bbaa057] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 12/11/2019] [Revised: 03/22/2020] [Accepted: 03/25/2020] [Indexed: 01/26/2023]

Wang LL, Lo K. Text mining approaches for dealing with the rapidly expanding literature on COVID-19. Brief Bioinform 2021;22:781-799. [PMID: 33279995 PMCID: PMC7799291 DOI: 10.1093/bib/bbaa296] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 08/05/2020] [Revised: 10/02/2020] [Accepted: 10/07/2020] [Indexed: 12/13/2022]

Espinosa C, Becker M, Marić I, Wong RJ, Shaw GM, Gaudilliere B, Aghaeepour N, Stevenson DK. Data-Driven Modeling of Pregnancy-Related Complications. Trends Mol Med 2021;27:762-776. [PMID: 33573911 DOI: 10.1016/j.molmed.2021.01.007] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 10/09/2020] [Revised: 12/01/2020] [Accepted: 01/20/2021] [Indexed: 12/11/2022]

Chen Y. A transfer learning model with multi-source domains for biomedical event trigger extraction. BMC Genomics 2021;22:31. [PMID: 33413073 PMCID: PMC7788773 DOI: 10.1186/s12864-020-07315-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/19/2020] [Accepted: 12/07/2020] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Automatic extraction of biomedical events from the literature, which allows the latest discoveries to be incorporated faster and automatically, is currently an active research topic. Trigger word recognition is a critical step in event extraction, and its performance directly influences the results of event extraction. In general, machine learning-based trigger recognition approaches such as neural networks must be trained on a dataset with plentiful annotations to achieve high performance. However, datasets in wide-coverage event domains suffer from insufficient and imbalanced annotations. One widely used method for dealing with this problem is transfer learning. In this work, we aim to extend transfer learning to use multiple source domains. Multiple source-domain datasets can be trained jointly to help achieve higher recognition performance on a target domain with wide-coverage events.

RESULTS

Based on previous work, we propose an improved multi-source domain neural network transfer learning architecture and a training approach for the biomedical trigger detection task, which can share knowledge between the multi-source and target domains more comprehensively. We extend the ability of traditional adversarial networks to extract common features between source and target domains to the case where the source domain contains more than one dataset. We design multiple feature extraction channels to simultaneously capture global and local common features. Moreover, under the constraint of an extra classifier, the multiple local common-feature sub-channels can effectively extract and transfer more diverse common features from the related multi-source domains. In the experiments, the MLEE corpus is used as the target dataset to train and test the proposed model on recognizing wide-coverage triggers. Four other corpora from different domains, with varying degrees of relevance to MLEE, are each used as source datasets. Our proposed approach improves recognition compared with traditional adversarial networks. Moreover, its performance is competitive with the results of other leading systems on the same MLEE corpus.

CONCLUSIONS

The proposed Multi-Source Transfer Learning-based Trigger Recognizer (MSTLTR) can further improve performance compared with the traditional method when there is more than one source domain. The most essential improvement is that our approach represents common features in two aspects: global common features and local common features. These more shareable features effectively improve the performance and generalization of the model on the target domain.

Sousa D, Lamurias A, Couto FM. Using Neural Networks for Relation Extraction from Biomedical Literature. Methods Mol Biol 2021;2190:289-305. [PMID: 32804372 DOI: 10.1007/978-1-0716-0826-5_14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]

Cheerkoot-Jalim S, Khedo KK. A systematic review of text mining approaches applied to various application areas in the biomedical domain. JOURNAL OF KNOWLEDGE MANAGEMENT 2020. [DOI: 10.1108/jkm-09-2019-0524] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/11/2022]

Abstract

Purpose

This work reports the results of a systematic literature review on biomedical text mining. The purpose of this study is to identify the different text mining approaches used in different application areas of the biomedical domain, the common tools used, and the challenges of biomedical text mining compared with generic text mining algorithms. This study will be of value to biomedical researchers by allowing them to correlate text mining approaches with specific biomedical application areas. Implications for future research are also discussed.

Design/methodology/approach

The review was conducted following the principles of the Kitchenham method. A number of research questions were first formulated, followed by the definition of the search strategy. The papers were then selected based on a list of assessment criteria. Each paper was analyzed, and information relevant to the research questions was extracted.

Findings

Researchers have mostly harnessed data sources such as electronic health records, biomedical literature, social media and health-related forums. The most common text mining technique was natural language processing, using tools such as MetaMap and the Unstructured Information Management Architecture, alongside medical terminologies such as the Unified Medical Language System. The main application area was the detection of adverse drug events. The challenges identified included the need to deal with huge amounts of text, the heterogeneity of the different data sources, the ambiguity of words with more than one meaning in biomedical text, and the noise introduced mainly by social media and health-related forums.

Originality/value

To the best of the authors' knowledge, other reviews in this area have focused on specific techniques, specific application areas or specific data sources. The results of this review will help researchers correlate the most relevant and recent advances in text mining approaches with specific biomedical application areas by providing an up-to-date and holistic view of work in this research area. The use of emerging text mining techniques has great potential to spur the development of innovative applications, thus considerably advancing biomedical research.

Menadue CB. Pandemics, epidemics, viruses, plagues, and disease: Comparative frequency analysis of a cultural pathology reflected in science fiction magazines from 1926 to 2015. ACTA ACUST UNITED AC 2020;2:100048. [PMID: 34173491 PMCID: PMC7480741 DOI: 10.1016/j.ssaho.2020.100048] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/30/2020] [Revised: 07/13/2020] [Accepted: 07/13/2020] [Indexed: 12/03/2022]

Current trends in cancer immunotherapy: a literature-mining analysis. Cancer Immunol Immunother 2020;69:2425-2439. [PMID: 32556496 DOI: 10.1007/s00262-020-02630-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/08/2020] [Accepted: 05/28/2020] [Indexed: 11/27/2022]

Zhu H, Zeng Y, Wang D, Huangfu C. Species Classification for Neuroscience Literature Based on Span of Interest Using Sequence-to-Sequence Learning Model. Front Hum Neurosci 2020;14:128. [PMID: 32372933 PMCID: PMC7187631 DOI: 10.3389/fnhum.2020.00128] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/09/2020] [Accepted: 03/19/2020] [Indexed: 11/13/2022]

Abstract

Large-scale neuroscience literature calls for effective methods to mine knowledge from a species perspective, in order to link the brain and neuroscience communities, neurorobotics, computing devices, and AI research communities. Structured knowledge can motivate researchers to better understand the function and structure of the brain and to link related resources and components. However, the abstracts of many scientific works do not explicitly mention the species. Therefore, in addition to dictionary-based methods, species must be mined with cognitive computing models that more closely resemble the human reading process and can take advantage of the rich information in the literature. We also enable the model to automatically distinguish whether a mentioned species is the main research subject. Distinguishing the two situations can generate value at different levels of knowledge management. We propose the SpecExplorer project, which explores the knowledge associations of different species for brain and neuroscience research. This project frees humans from the tedious task of classifying neuroscience literature by species. Species classification is a multi-label classification task, which is more complex than single-label classification because of correlations between labels. To address this problem, we present a sequence-to-sequence classification framework that adaptively assigns multiple species to each document. To model the structural information of documents, we propose hierarchical attentive decoding (HAD) to extract spans of interest (SOI) for predicting each species. We create three datasets from the PubMed and PMC corpora. We present two versions of annotation criteria (mention-based annotation and semantic-based annotation) for species research. Experiments demonstrate that our approach improves the final results.
Finally, we perform species-based analysis of brain diseases, brain cognitive functions, and proteins related to the hippocampus and provide potential research directions for certain species.

Krsnik I, Glavaš G, Krsnik M, Miletić D, Štajduhar I. Automatic Annotation of Narrative Radiology Reports. Diagnostics (Basel) 2020;10:E196. [PMID: 32244833 PMCID: PMC7235892 DOI: 10.3390/diagnostics10040196] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/26/2020] [Revised: 03/27/2020] [Accepted: 03/27/2020] [Indexed: 12/04/2022]

Abstract

Narrative texts in electronic health records can be used efficiently to build clinical decision support systems only if they are correctly and automatically interpreted in accordance with a specified standard. This paper tackles the problem of developing an automated method for labeling free-form radiology reports, as a precursor to building query-capable report databases in hospitals. The analyzed dataset consists of 1295 radiology reports concerning the condition of the knee, retrospectively gathered at the Clinical Hospital Centre Rijeka, Croatia. Reports were manually labeled with one or more labels from a set of the 10 most commonly occurring clinical conditions. After primary preprocessing of the texts, two sets of text classification methods were compared: (1) traditional classification models, namely Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), and Random Forests (RF), coupled with Bag-of-Words (BoW) features (i.e., a symbolic text representation) and (2) a Convolutional Neural Network (CNN) coupled with dense word vectors (i.e., word embeddings as a semantic text representation) as input features. We used nested 10-fold cross-validation to evaluate the performance of the competing methods using accuracy, precision, recall, and F1 score. The CNN with semantic word representations as input yielded the best overall performance, with a micro-averaged F1 score of 86.7%. The CNN classifier yielded particularly encouraging results for the most represented conditions: degenerative disease (95.9%), arthrosis (93.3%), and injury (89.2%). As a data-hungry deep learning model, however, the CNN performed notably worse than the competing models on underrepresented classes with fewer training instances, such as multicausal disease or metabolic disease. LR, RF, and SVM performed comparably well, with micro-averaged F1 scores of 84.6%, 82.2%, and 82.1%, respectively.
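Micro-averaging pools true positives, false positives and false negatives over all labels before computing precision, recall and F1, so frequent conditions weigh more than rare ones. The sketch below uses one common definition for multi-label data; the abstract does not spell out the authors' exact evaluation code, and the three label sets are made up for illustration.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 for multi-label data.

    gold and predicted are lists of label sets, one set per report.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # labels correctly assigned
        fp += len(p - g)  # labels assigned but absent from the gold set
        fn += len(g - p)  # gold labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for three knee reports.
gold = [{"degenerative disease", "injury"}, {"arthrosis"}, {"injury"}]
pred = [{"degenerative disease"}, {"arthrosis", "injury"}, {"injury"}]
print(round(micro_f1(gold, pred), 3))  # 0.75
```

Because the counts are pooled, a classifier that does well on the frequent classes (as the CNN does here) can reach a high micro-averaged score while still failing on rare classes; macro-averaging would expose that gap.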

Jiang K, Yang T, Wu C, Chen L, Mao L, Wu Y, Deng L, Jiang T. LATTE: A knowledge-based method to normalize various expressions of laboratory test results in free text of Chinese electronic health records. J Biomed Inform 2020;102:103372. [PMID: 31901507 DOI: 10.1016/j.jbi.2019.103372] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/07/2019] [Revised: 12/29/2019] [Accepted: 12/30/2019] [Indexed: 12/23/2022]

Abstract

BACKGROUND

A wealth of clinical information is buried in the free text of electronic health records (EHRs), and converting clinical information into a machine-understandable form is crucial for the secondary use of EHRs. Laboratory test results, one of the most important types of clinical information, are written in various styles in the free text of EHRs. This creates great difficulties for the integration and use of EHR data. Therefore, technology to normalize different expressions of laboratory test results in free text is indispensable for the secondary use of EHRs.

METHODS

In this study, we developed a knowledge-based method named LATTE (transforming lab test results), which transforms various expressions of laboratory test results into a normalized, machine-understandable format. We first identified the analyte of a laboratory test result with a dictionary-based method and then designed a series of rules to detect information associated with the analyte, including its specimen, measured value, unit of measure, conclusive phrase and sampling factor. We determined whether a test result is normal or abnormal by interpreting conclusive phrases or by comparing its measured value with an appropriate normal range. Finally, we converted various expressions of laboratory test results, in either numeric or textual form, into the normalized form "specimen-analyte-abnormality". With this method, laboratory tests with the same type of abnormality have the same representation, regardless of how they are mentioned in free text.
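To illustrate the kind of rule pipeline described (dictionary lookup of the analyte, then either a conclusive phrase or a comparison of the measured value with a normal range, ending in "specimen-analyte-abnormality"), here is a toy normalizer. It is not LATTE: LATTE works on Chinese EHR text with its own dictionaries and rules, and units and sampling factors are ignored here. The synonym dictionary, the phrases and the normal ranges are hypothetical placeholders, not clinical reference values.

```python
import re

# Hypothetical knowledge base: synonym -> (specimen, analyte, normal range).
# The ranges are placeholders for illustration, not clinical reference values.
ANALYTES = {
    "hemoglobin": ("blood", "hemoglobin", (120.0, 160.0)),
    "hgb": ("blood", "hemoglobin", (120.0, 160.0)),
    "urine protein": ("urine", "protein", None),
}
# Hypothetical conclusive phrases and the abnormality label each implies.
PHRASES = {"negative": "normal", "positive": "abnormal"}
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def normalize(mention):
    """Map a free-text lab result to 'specimen-analyte-abnormality', or None."""
    text = mention.lower()
    # Step 1: dictionary lookup of the analyte, trying longer synonyms first.
    for synonym in sorted(ANALYTES, key=len, reverse=True):
        if synonym in text:
            specimen, analyte, normal_range = ANALYTES[synonym]
            rest = text.split(synonym, 1)[1]  # text after the analyte name
            break
    else:
        return None  # no known analyte mentioned
    # Step 2a: a conclusive phrase decides the abnormality directly.
    for phrase, label in PHRASES.items():
        if phrase in rest:
            return f"{specimen}-{analyte}-{label}"
    # Step 2b: otherwise compare the measured value with the normal range.
    match = NUMBER.search(rest)
    if match is None or normal_range is None:
        return None
    value = float(match.group())
    low, high = normal_range
    label = "low" if value < low else "high" if value > high else "normal"
    return f"{specimen}-{analyte}-{label}"


for mention in ["HGB 98 g/L", "Hemoglobin: 135 g/L", "urine protein positive"]:
    print(mention, "->", normalize(mention))
```

Two differently worded mentions of a low hemoglobin ("HGB 98 g/L", "hemoglobin 101") both map to the same string, which is the property the abstract describes.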

RESULTS

LATTE was developed and optimized on a training set of 8894 laboratory test results from 756 EHRs and evaluated on a test set of 3740 laboratory test results from 210 EHRs. Compared with expert annotations, LATTE achieved a precision of 0.936, a recall of 0.897 and an F1 score of 0.916 on the training set, and a precision of 0.892, a recall of 0.843 and an F1 score of 0.867 on the test set. For 223 laboratory tests with at least two different forms of expression in the test set, LATTE transformed 85.7% (2870/3350) of laboratory test results into a normalized form. In addition, LATTE achieved F1 scores above 0.8 for EHRs from 18 of 21 different hospital departments, indicating its ability to generalize in normalizing laboratory test results.
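The reported F1 scores are the harmonic mean of precision and recall, F1 = 2PR / (P + R), and the normalization rate is a simple ratio of the stated counts; recomputing them from the numbers in the abstract is a quick consistency check:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for LATTE.
print(round(f1_score(0.936, 0.897), 3))  # training set: 0.916
print(round(f1_score(0.892, 0.843), 3))  # test set: 0.867
# Share of results normalized for tests with at least two expression forms.
print(round(100 * 2870 / 3350, 1))  # 85.7
```

All three values agree with the figures stated in the abstract.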

CONCLUSION

In conclusion, LATTE is an effective method for normalizing various expressions of laboratory test results in free text of EHRs. LATTE will facilitate EHR-based applications such as cohort querying, patient clustering and machine learning.

AVAILABILITY

LATTE is freely available for download on GitHub (https://github.com/denglizong/LATTE).

Boland MR, Kashyap A, Xiong J, Holmes J, Lorch S. Development and validation of the PEPPER framework (Prenatal Exposure PubMed ParsER) with applications to food additives. J Am Med Inform Assoc 2019;25:1432-1443. [PMID: 30371821 PMCID: PMC6213088 DOI: 10.1093/jamia/ocy119] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/18/2018] [Accepted: 08/13/2018] [Indexed: 11/14/2022]

Farahmand S, Riley T, Zarringhalam K. ModEx: A text mining system for extracting mode of regulation of transcription factor-gene regulatory interaction. J Biomed Inform 2019;102:103353. [PMID: 31857203 DOI: 10.1016/j.jbi.2019.103353] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/15/2019] [Revised: 11/22/2019] [Accepted: 12/10/2019] [Indexed: 10/25/2022]

Abstract

BACKGROUND

Transcription factors (TFs) are proteins that are fundamental to transcription and the regulation of gene expression. Each TF may regulate multiple genes, and each gene may be regulated by multiple TFs. TFs can act as either activators or repressors of gene expression. This complex network of interactions between TFs and genes underlies many developmental and biological processes and is implicated in several human diseases such as cancer. Hence, deciphering the network of TF-gene interactions, together with information on the mode of regulation (activation vs. repression), is an important step toward understanding the regulatory pathways that underlie complex traits. There are many experimental, computational, and manually curated databases of TF-gene interactions. In particular, high-throughput ChIP-Seq datasets provide a large-scale map of transcriptional regulatory interactions. However, these interactions are not annotated with information on context and mode of regulation. Such information is crucial for gaining a global picture of gene regulatory mechanisms and can aid in developing machine learning models for applications such as biomarker discovery, prediction of response to therapy, and precision medicine.

METHODS

In this work, we introduce a text-mining system that annotates ChIP-Seq-derived interactions with such metadata by mining PubMed articles. We evaluate the performance of our system using small-scale, manually curated gold-standard databases.
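As a rough illustration of assigning a mode of regulation to a TF-gene pair, a simple baseline looks for activation or repression cue words in sentences that mention both names and takes a vote. The abstract does not describe ModEx's internals at this level, so this is a hypothetical baseline, not its method; the cue lists and example sentences are assumptions.

```python
import re

# Hypothetical cue words; a real system would use curated, lemmatized lists
# and would check that the cue actually links the two entities.
ACTIVATION_CUES = {"activates", "induces", "upregulates", "enhances"}
REPRESSION_CUES = {"represses", "inhibits", "downregulates", "suppresses"}


def mode_of_regulation(tf, gene, sentences):
    """Vote 'activation', 'repression' or 'unknown' from sentences naming both."""
    votes = {"activation": 0, "repression": 0}
    for sentence in sentences:
        words = set(re.findall(r"[\w-]+", sentence.lower()))
        if tf.lower() not in words or gene.lower() not in words:
            continue  # sentence does not mention both entities
        votes["activation"] += len(words & ACTIVATION_CUES)
        votes["repression"] += len(words & REPRESSION_CUES)
    if votes["activation"] == votes["repression"]:
        return "unknown"  # no evidence, or evidence is balanced
    return max(votes, key=votes.get)


sentences = [
    "TP53 activates CDKN1A transcription after DNA damage.",
    "Loss of TP53 reduces CDKN1A levels.",
]
print(mode_of_regulation("TP53", "CDKN1A", sentences))  # activation
```

A bag-of-cues vote ignores negation and sentence structure, which is why systems of this kind are evaluated against curated resources such as TRRUST.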

RESULTS

Our results show that the method accurately extracts the mode of regulation, with an F-score of 0.77 on TRRUST curated interactions and an F-score of 0.96 on the intersection of TRRUST and the ChIP-network. We provide an HTTP REST API for our code to facilitate usage. Availability: source code and datasets are available for download on GitHub: https://github.com/samanfrm/modex.

Zhang T, Lin H, Ren Y, Yang L, Xu B, Yang Z, Wang J, Zhang Y. Adverse drug reaction detection via a multihop self-attention mechanism. BMC Bioinformatics 2019;20:479. [PMID: 31533622 PMCID: PMC6751590 DOI: 10.1186/s12859-019-3053-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/21/2019] [Accepted: 08/26/2019] [Indexed: 12/17/2022]

Abstract

Background

Adverse reactions caused by drugs are potentially life-threatening. Comprehensive knowledge of adverse drug reactions (ADRs) can reduce their detrimental impact on patients. Detecting ADRs through clinical trials requires a large number of experiments and a long period of time. With the growing amount of unstructured textual data, such as biomedical literature and electronic records, detecting ADRs in the available unstructured data has important implications for ADR research. Most neural network-based methods focus on the simple semantic information of sentence sequences; however, the relationship between two entities depends on more complex semantic information.

Methods

In this paper, we propose a multihop self-attention mechanism (MSAM) model that aims to learn multi-aspect semantic information for the ADR detection task. First, the contextual information of the sentence is captured using a bidirectional long short-term memory (Bi-LSTM) model. Then, by applying multiple steps of an attention mechanism, multiple semantic representations of a sentence are generated. Each attention step obtains a different attention distribution focusing on different segments of the sentence. Meanwhile, our model locates and enhances various keywords from the multiple representations of a sentence.
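The multihop idea can be sketched in a few lines: each hop has its own query vector, produces its own softmax distribution over the token states, and yields its own weighted sentence vector. The toy 2-dimensional states stand in for Bi-LSTM outputs and the query vectors for learned parameters; the plain dot-product scoring is an assumption for illustration, not necessarily the scoring function used in MSAM.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multihop_attention(hidden, queries):
    """Return one (weights, sentence_vector) pair per attention hop.

    hidden: list of token vectors (e.g. Bi-LSTM outputs), one per token.
    queries: list of query vectors, one per hop (learned in a real model).
    """
    dim = len(hidden[0])
    results = []
    for q in queries:
        # Dot-product score of every token against this hop's query.
        scores = [sum(h_i * q_i for h_i, q_i in zip(h, q)) for h in hidden]
        weights = softmax(scores)
        # Weighted sum of token vectors gives this hop's sentence vector.
        vector = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
        results.append((weights, vector))
    return results


# Toy states for a 3-token sentence and two hops with different queries.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[2.0, 0.0], [0.0, 2.0]]
for weights, vector in multihop_attention(hidden, queries):
    print([round(w, 3) for w in weights], [round(v, 3) for v in vector])
```

The two hops put most of their weight on different tokens, which is the "different attention distribution focusing on different segments" described above; a full model would then combine the per-hop vectors (for example by concatenation) before classification.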

Results

Our model was evaluated on two ADR corpora, and the method shows stable generalization ability. In extensive experiments, our model achieved F-measures of 0.853, 0.799, and 0.851 for ADR detection on TwiMed-PubMed, TwiMed-Twitter, and ADE, respectively. The experimental results show that our model significantly outperforms the other compared models for ADR detection.

Conclusions

In this paper, we propose a modification of the multihop self-attention mechanism (MSAM) model for the ADR detection task. The proposed method significantly improves the learning of complex semantic information in sentences.

Chen Y. Multiple-level biomedical event trigger recognition with transfer learning. BMC Bioinformatics 2019;20:459. [PMID: 31492112 PMCID: PMC6731566 DOI: 10.1186/s12859-019-3030-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/11/2019] [Accepted: 08/16/2019] [Indexed: 11/10/2022] Open

Abstract

Background

Automatic extraction of biomedical events from literature is an important task for understanding biological systems, because it allows the latest discoveries to be captured automatically and more quickly. Detecting the trigger words that indicate events is a critical step in event extraction, because subsequent steps depend on the recognized triggers. The task in this study is to identify event triggers in the literature across multiple levels of biological organization. To achieve high performance, machine learning-based approaches such as neural networks must be trained on a dataset with plentiful annotations. However, annotations may be difficult to obtain at multiple levels, and annotated resources have so far focused mainly on relations and processes at the molecular level. In this work, we apply transfer learning to multiple-level trigger recognition: a source dataset with sufficient annotations at the molecular level is used to improve performance on a target domain with insufficient annotations and more trigger types.

Results

We propose a generalized cross-domain neural network transfer learning architecture and approach that shares as much knowledge as possible between the source and target domains, especially when their label sets overlap. In the experiments, the MLEE corpus is used as the target dataset to train and test the proposed model on recognizing multiple-level triggers. Two corpora from the BioNLP'09 and BioNLP'11 Shared Tasks, whose labels overlap with those of MLEE to varying degrees, are used as source datasets. Regardless of the degree of overlap, our approach improves recognition. Moreover, its performance exceeds previously reported results of other leading systems on the same MLEE corpus.

Conclusions

The proposed transfer learning method further improves performance compared with the traditional method when the labels of the source and target datasets overlap. The main reason is that our approach changes the way parameters are shared: vertical sharing replaces horizontal sharing, which yields more sharable parameters. These additional parameters shared between networks effectively improve the performance and generalization of the model on the target domain.

Percha B, Altman RB. A global network of biomedical relationships derived from text. Bioinformatics 2019;34:2614-2624. [PMID: 29490008 PMCID: PMC6061699 DOI: 10.1093/bioinformatics/bty114] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 05/17/2017] [Accepted: 02/26/2018] [Indexed: 11/14/2022] Open

Abstract

Motivation

The biomedical community’s collective understanding of how chemicals, genes and phenotypes interact is distributed across the text of over 24 million research articles. These interactions offer insights into the mechanisms behind higher order biochemical phenomena, such as drug-drug interactions and variations in drug response across individuals. To assist their curation at scale, we must understand what relationship types are possible and map unstructured natural language descriptions onto these structured classes. We used NCBI’s PubTator annotations to identify instances of chemical, gene and disease names in Medline abstracts and applied the Stanford dependency parser to find connecting dependency paths between pairs of entities in single sentences. We combined a published ensemble biclustering algorithm (EBC) with hierarchical clustering to group the dependency paths into semantically-related categories, which we annotated with labels, or ‘themes’ (‘inhibition’ and ‘activation’, for example). We evaluated our theme assignments against six human-curated databases: DrugBank, Reactome, SIDER, the Therapeutic Target Database, OMIM and PharmGKB.

Results

Clustering revealed 10 broad themes for chemical-gene relationships, 7 for chemical-disease, 10 for gene-disease and 9 for gene–gene. In most cases, enriched themes corresponded directly to known database relationships. Our final dataset, represented as a network, contained 37 491 thematically-labeled chemical-gene edges, 2 021 192 chemical-disease edges, 136 206 gene-disease edges and 41 418 gene–gene edges, each representing a single-sentence description of an interaction from somewhere in the literature.

Availability and implementation

The complete network is available on Zenodo (https://zenodo.org/record/1035500). We have also provided the full set of dependency paths connecting biomedical entities in Medline abstracts, with associated sentences, for future use by the biomedical research community.

Supplementary information

Supplementary data are available at Bioinformatics online.

Azam MF, Musa A, Dehmer M, Yli-Harja OP, Emmert-Streib F. Global Genetics Research in Prostate Cancer: A Text Mining and Computational Network Theory Approach. Front Genet 2019;10:70. [PMID: 30838019 PMCID: PMC6383410 DOI: 10.3389/fgene.2019.00070] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/21/2018] [Accepted: 01/28/2019] [Indexed: 11/13/2022] Open

Rusanov A, Miotto R, Weng C. Trends in anesthesiology research: a machine learning approach to theme discovery and summarization. JAMIA Open 2018;1:283-293. [PMID: 30474079 PMCID: PMC6241511 DOI: 10.1093/jamiaopen/ooy009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 12/28/2017] [Revised: 03/18/2018] [Accepted: 08/23/2018] [Indexed: 11/13/2022] Open

Oh SY, Kim JH, Kim SJ, Nam HJ, Park HS. GNI Corpus Version 1.0: Annotated Full-Text Corpus of Genomics & Informatics to Support Biomedical Information Extraction. Genomics Inform 2018;16:75-77. [PMID: 30309207 PMCID: PMC6187819 DOI: 10.5808/gi.2018.16.3.75] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/31/2018] [Accepted: 08/23/2018] [Indexed: 11/21/2022] Open

Mower J, Subramanian D, Cohen T. Learning predictive models of drug side-effect relationships from distributed representations of literature-derived semantic predications. J Am Med Inform Assoc 2018;25:1339-1350. [PMID: 30010902 PMCID: PMC6454491 DOI: 10.1093/jamia/ocy077] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 01/24/2018] [Revised: 04/23/2018] [Accepted: 06/05/2018] [Indexed: 02/01/2023] Open

Abstract

Objective

The aim of this work is to leverage relational information extracted from biomedical literature using a novel synthesis of unsupervised pretraining, representational composition, and supervised machine learning for drug safety monitoring.

Methods

Using ≈80 million concept-relationship-concept triples extracted from the literature using the SemRep Natural Language Processing system, distributed vector representations (embeddings) were generated for concepts as functions of their relationships utilizing two unsupervised representational approaches. Embeddings for drugs and side effects of interest from two widely used reference standards were then composed to generate embeddings of drug/side-effect pairs, which were used as input for supervised machine learning. This methodology was developed and evaluated using cross-validation strategies and compared to contemporary approaches. To qualitatively assess generalization, models trained on the Observational Medical Outcomes Partnership (OMOP) drug/side-effect reference set were evaluated against a list of ≈1100 drugs from an online database.
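To make the composition step concrete, the sketch below builds a drug/side-effect pair vector from two concept embeddings. The abstract does not say which composition operator was used, so two common choices, element-wise product and concatenation, are shown as assumptions. The concept names and toy vectors are hypothetical and are not learned from SemRep predications; each resulting row would serve as one input example for the supervised classifier.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def hadamard(a: Vector, b: Vector) -> Vector:
    # Element-wise product: the pair vector keeps the input dimension
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return [x * y for x, y in zip(a, b)]


def concatenate(a: Vector, b: Vector) -> Vector:
    # Concatenation: preserves order (drug first, side effect second), doubles the dimension
    return a + b


def pair_features(
    pairs: Sequence[Tuple[str, str]],
    emb: Dict[str, Vector],
    compose: Callable[[Vector, Vector], Vector],
) -> List[Vector]:
    # One composed feature vector per (drug, side effect) pair
    return [compose(emb[drug], emb[effect]) for drug, effect in pairs]


# Hypothetical toy embeddings standing in for pretrained concept vectors
emb = {
    "drug_a": [0.5, -1.0, 2.0],
    "effect_x": [2.0, 0.5, 0.25],
}
pairs = [("drug_a", "effect_x")]
X_had = pair_features(pairs, emb, hadamard)
X_cat = pair_features(pairs, emb, concatenate)
```

The element-wise product is symmetric in its two arguments, while concatenation keeps track of which vector is the drug and which is the side effect; which property matters depends on the downstream classifier.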

Results

The employed method improved performance over previous approaches. Cross-validation results advance the state of the art (AUC 0.96; F1 0.90 and AUC 0.95; F1 0.84 across the two sets), outperforming methods utilizing literature and/or spontaneous reporting system data. Examination of predictions for unseen drug/side-effect pairs indicates the ability of these methods to generalize, with over tenfold label support enrichment in the top 100 predictions versus the bottom 100 predictions.

Discussion and Conclusion

Our methods can assist the pharmacovigilance process using information from the biomedical literature. Unsupervised pretraining generates a rich relationship-based representational foundation for machine learning techniques to classify drugs in the context of a putative side effect, given known examples.

Vilar S, Friedman C, Hripcsak G. Detection of drug-drug interactions through data mining studies using clinical sources, scientific literature and social media. Brief Bioinform 2018;19:863-877. [PMID: 28334070 PMCID: PMC6454455 DOI: 10.1093/bib/bbx010] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Received: 10/14/2016] [Revised: 12/28/2016] [Indexed: 11/13/2022] Open

Zhu Y, Elemento O, Pathak J, Wang F. Drug knowledge bases and their applications in biomedical informatics research. Brief Bioinform 2018;20:1308-1321. [DOI: 10.1093/bib/bbx169] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/08/2017] [Revised: 11/15/2017] [Indexed: 11/14/2022] Open

Smalheiser NR. Rediscovering Don Swanson: the Past, Present and Future of Literature-Based Discovery. Journal of Data and Information Science 2017;2:43-64. [PMID: 29355246 PMCID: PMC5771422 DOI: 10.1515/jdis-2017-0019] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 01/25/2023] Open

Singh G, Marshall IJ, Thomas J, Shawe-Taylor J, Wallace BC. A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation. Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management 2017;2017:1519-1528. [PMID: 29308293 PMCID: PMC5752318 DOI: 10.1145/3132847.3132989] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/18/2022]

Kang T, Zhang S, Tang Y, Hruby GW, Rusanov A, Elhadad N, Weng C. EliIE: An open-source information extraction system for clinical trial eligibility criteria. J Am Med Inform Assoc 2017;24:1062-1071. [PMID: 28379377 PMCID: PMC6259668 DOI: 10.1093/jamia/ocx019] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 07/27/2016] [Revised: 01/31/2017] [Accepted: 03/02/2017] [Indexed: 12/22/2022] Open

Correlating Lab Test Results in Clinical Notes with Structured Lab Data: A Case Study in HbA1c and Glucose. AMIA Joint Summits on Translational Science Proceedings 2017;2017:221-228. [PMID: 28815133 PMCID: PMC5543347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]

Kuusisto F, Steill J, Kuang Z, Thomson J, Page D, Stewart R. A Simple Text Mining Approach for Ranking Pairwise Associations in Biomedical Applications. AMIA Joint Summits on Translational Science Proceedings 2017;2017:166-174. [PMID: 28815126 PMCID: PMC5543342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]

Yan S, Wong KC. Elucidating high-dimensional cancer hallmark annotation via enriched ontology. J Biomed Inform 2017;73:84-94. [PMID: 28723579 DOI: 10.1016/j.jbi.2017.07.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/13/2016] [Revised: 05/23/2017] [Accepted: 07/14/2017] [Indexed: 10/19/2022]

Wang P, Hao T, Yan J, Jin L. Large-scale extraction of drug-disease pairs from the medical literature. J Assoc Inf Sci Technol 2017. [DOI: 10.1002/asi.23876] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 01/20/2023]

Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017;117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Indexed: 01/03/2023]

Karystianis G, Thayer K, Wolfe M, Tsafnat G. Evaluation of a rule-based method for epidemiological document classification towards the automation of systematic reviews. J Biomed Inform 2017;70:27-34. [PMID: 28455150 DOI: 10.1016/j.jbi.2017.04.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 12/12/2016] [Revised: 03/14/2017] [Accepted: 04/02/2017] [Indexed: 02/02/2023]

Abstract

Introduction

Most data extraction efforts in epidemiology are focused on obtaining targeted information from clinical trials. In contrast, limited research has been conducted on the identification of information from observational studies, a major source for human evidence in many fields, including environmental health. The recognition of key epidemiological information (e.g., exposures) through text mining techniques can assist in the automation of systematic reviews and other evidence summaries.

Method

We designed and applied a knowledge-driven, rule-based approach to identify targeted information (study design, participant population, exposure, outcome, confounding factors, and the country where the study was conducted) in abstracts of epidemiological studies included in several systematic reviews of environmental health exposures. The rules were based on common syntactic patterns observed in text and are thus not specific to any systematic review. To validate the general applicability of our approach, we compared the data extracted by our approach with hand curation for 35 epidemiological study abstracts manually selected for inclusion in two systematic reviews.

Results

F-score, precision and recall ranged from 70% to 98%, 81% to 100% and 54% to 97%, respectively. Precision was highest (100%) for exposure, outcome and population, while recall was best for exposure (97%) and study design (89%). Population had the lowest recall (54%) and also the lowest F-score (70%).
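Assuming the reported F-score is the standard balanced F1 (the harmonic mean of precision and recall), the population figures are internally consistent, as this short check shows. The helper name `f1` is illustrative.

```python
def f1(precision: float, recall: float) -> float:
    # Balanced F-score: harmonic mean of precision and recall
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Population: precision 100%, recall 54%
population = f1(1.00, 0.54)
print(f"{population:.2f}")  # 0.70
```

Because the harmonic mean is pulled toward the smaller value, perfect precision cannot compensate for a recall of 54%, which is why population has the lowest F-score despite its 100% precision.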

Conclusion

Our text-mining approach showed encouraging results for identifying targeted information in observational epidemiological study abstracts related to environmental exposures. We have demonstrated that rules based on generic syntactic patterns in one corpus can be applied to other observational study designs simply by interchanging the dictionaries used to identify certain characteristics (e.g., outcomes, exposures). At the document level, the recognised information can assist in selecting and categorizing studies for inclusion in a systematic review.

Wu H, Oellrich A, Girges C, de Bono B, Hubbard TJ, Dobson RJ. Automated PDF highlighting to support faster curation of literature for Parkinson's and Alzheimer's disease. Database (Oxford) 2017;2017:3091736. [PMID: 28365743 PMCID: PMC5467557 DOI: 10.1093/database/bax027] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/31/2016] [Revised: 01/23/2017] [Accepted: 03/08/2017] [Indexed: 12/20/2022]

Luo Y, Uzuner Ö, Szolovits P. Bridging semantics and syntax with graph algorithms-state-of-the-art of extracting biomedical relations. Brief Bioinform 2017;18:160-178. [PMID: 26851224 PMCID: PMC5221425 DOI: 10.1093/bib/bbw001] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 09/24/2015] [Revised: 11/29/2015] [Indexed: 01/18/2023] Open

Feng Q, Gui Y, Yang Z, Wang L, Li Y. Semisupervised Learning Based Disease-Symptom and Symptom-Therapeutic Substance Relation Extraction from Biomedical Literature. Biomed Res Int 2016;2016:3594937. [PMID: 27822473 PMCID: PMC5086401 DOI: 10.1155/2016/3594937] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/24/2016] [Revised: 07/13/2016] [Accepted: 08/18/2016] [Indexed: 11/18/2022]

Swain MC, Cole JM. ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature. J Chem Inf Model 2016;56:1894-1904. [PMID: 27669338 DOI: 10.1021/acs.jcim.6b00207] [Citation(s) in RCA: 158] [Impact Index Per Article: 19.8] [Indexed: 11/29/2022]

Abstract

The emergence of "big data" initiatives has led to the need for tools that can automatically extract valuable chemical information from large volumes of unstructured data, such as the scientific literature. Since chemical information can be present in figures, tables, and textual paragraphs, successful information extraction often depends on the ability to interpret all of these domains simultaneously. We present a complete toolkit for the automated extraction of chemical entities and their associated properties, measurements, and relationships from scientific documents that can be used to populate structured chemical databases. Our system provides an extensible, chemistry-aware, natural language processing pipeline for tokenization, part-of-speech tagging, named entity recognition, and phrase parsing. Within this scope, we report improved performance for chemical named entity recognition through the use of unsupervised word clustering based on a massive corpus of chemistry articles. For phrase parsing and information extraction, we present the novel use of multiple rule-based grammars that are tailored for interpreting specific document domains such as textual paragraphs, captions, and tables. We also describe document-level processing to resolve data interdependencies and show that this is particularly necessary for the autogeneration of chemical databases since captions and tables commonly contain chemical identifiers and references that are defined elsewhere in the text. The performance of the toolkit to correctly extract various types of data was evaluated, affording an F-score of 93.4%, 86.8%, and 91.5% for extracting chemical identifiers, spectroscopic attributes, and chemical property attributes, respectively; set against the CHEMDNER chemical name extraction challenge, ChemDataExtractor yields a competitive F-score of 87.8%. All tools have been released under the MIT license and are available to download from http://www.chemdataextractor.org .

Papamokos G, Silins I. Combining QSAR Modeling and Text-Mining Techniques to Link Chemical Structures and Carcinogenic Modes of Action. Front Pharmacol 2016;7:284. [PMID: 27625608 PMCID: PMC5003827 DOI: 10.3389/fphar.2016.00284] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 05/10/2016] [Accepted: 08/18/2016] [Indexed: 12/28/2022] Open

Sharma V, Law W, Balick MJ, Sarkar IN. Identifying Plant-Human Disease Associations in Biomedical Literature: A Case Study. AMIA Joint Summits on Translational Science Proceedings 2016;2016:84-93. [PMID: 27595045 PMCID: PMC5009952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]

Abbe A, Grouin C, Zweigenbaum P, Falissard B. Text mining applications in psychiatry: a systematic literature review. Int J Methods Psychiatr Res 2016;25:86-100. [PMID: 26184780 PMCID: PMC6877250 DOI: 10.1002/mpr.1481] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Received: 03/05/2014] [Revised: 01/21/2015] [Accepted: 04/09/2015] [Indexed: 11/08/2022] Open

Zhu Y, Song M, Yan E. Identifying Liver Cancer and Its Relations with Diseases, Drugs, and Genes: A Literature-Based Approach. PLoS One 2016;11:e0156091. [PMID: 27195695 PMCID: PMC4873143 DOI: 10.1371/journal.pone.0156091] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 08/24/2015] [Accepted: 05/09/2016] [Indexed: 12/04/2022] Open

Jain S, Tumkur KR, Kuo TT, Bhargava S, Lin G, Hsu CN. Weakly supervised learning of biomedical information extraction from curated data. BMC Bioinformatics 2016;17 Suppl 1:1. [PMID: 26817711 PMCID: PMC4847485 DOI: 10.1186/s12859-015-0844-1] [Citation(s) in RCA: 70] [Impact Index Per Article: 8.8] [Indexed: 01/13/2023] Open

Ahmed Z, Dandekar T. MSL: Facilitating automatic and physical analysis of published scientific literature in PDF format. F1000Res 2015;4:1453. [PMID: 29721305 PMCID: PMC5897790 DOI: 10.12688/f1000research.7329.3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Accepted: 03/26/2018] [Indexed: 01/12/2023] Open

Vilares M, Fernández M, Blanco A. Supporting knowledge discovery for biodiversity. Data Knowl Eng 2015. [DOI: 10.1016/j.datak.2015.08.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]

Gonzalez GH, Tahsin T, Goodale BC, Greene AC, Greene CS. Recent Advances and Emerging Applications in Text and Data Mining for Biomedical Discovery. Brief Bioinform 2015;17:33-42. [PMID: 26420781 PMCID: PMC4719073 DOI: 10.1093/bib/bbv087] [Citation(s) in RCA: 103] [Impact Index Per Article: 11.4] [Received: 02/17/2015] [Indexed: 02/06/2023] Open

GNormPlus: An Integrative Approach for Tagging Genes, Gene Families, and Protein Domains. Biomed Res Int 2015;2015:918710. [PMID: 26380306 PMCID: PMC4561873 DOI: 10.1155/2015/918710] [Citation(s) in RCA: 111] [Impact Index Per Article: 12.3] [Received: 01/15/2015] [Revised: 04/03/2015] [Accepted: 04/04/2015] [Indexed: 02/01/2023]

From Literature to Knowledge: Exploiting PubMed to Answer Biomedical Questions in Natural Language. ACTA ACUST UNITED AC 2015. [DOI: 10.1007/978-3-319-22741-2_1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/19/2023]