1
Lee CC, Lee S, Song MH, Kim JY, Lee S. Bidirectional Long Short-Term Memory-Based Detection of Adverse Drug Reaction Posts Using Korean Social Networking Services Data: Deep Learning Approaches. JMIR Med Inform 2024; 12:e45289. [PMID: 39565685 PMCID: PMC11601139 DOI: 10.2196/45289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 05/20/2024] [Accepted: 08/06/2024] [Indexed: 11/22/2024] Open
Abstract
Background: Social networking services (SNS) closely reflect the lives of individuals in modern society and generate large amounts of data. Previous studies have extracted drug information from relevant SNS data. In particular, early detection of adverse drug reactions (ADRs) through drug surveillance systems is important. To this end, various deep learning methods have been used to analyze data in multiple languages in addition to English.
Objective: We selected a cautionary drug that can cause ADRs in older patients and collected Korean SNS data containing information on this drug. Based on this information, we aimed to develop a recurrent neural network-based deep learning model that classifies ADR posts.
Methods: Ketoprofen, which has a high prescription frequency and was therefore the most frequently mentioned drug in SNS posts secured in previous studies, was selected as the target drug. Blog posts, café posts, and NAVER Q&A posts from 2005 to 2020 were collected from NAVER, a portal site containing drug-related information, and natural language processing techniques were applied to analyze data written in Korean. Posts containing highly relevant drug name and ADR word pairs were filtered through association analysis, and training data were generated through manual labeling. Using the training data, a word2vec embedding layer was built, and a Bidirectional Long Short-Term Memory (Bi-LSTM) classification model was generated. We then compared its area under the curve with that of other machine learning models. In addition, the entire process was further verified using the nonsteroidal anti-inflammatory drug aceclofenac.
Results: Among the nonsteroidal anti-inflammatory drugs, Korean SNS posts containing information on ketoprofen and aceclofenac were secured, and the generic name lexicon, ADR lexicon, and Korean stop word lexicon were generated. In addition, to improve the accuracy of the classification model, an embedding layer was created that takes into account the association between the drug name and the ADR word. In the ADR post classification test, ketoprofen and aceclofenac achieved 85% and 80% accuracy, respectively.
Conclusions: Here, we propose a process for developing a model that classifies ADR posts using SNS data. After analyzing drug name-ADR patterns, we filtered high-quality data by extracting posts that include known ADR words, based on the analysis. Based on these data, we developed a model that classifies ADR posts. This confirmed the feasibility of a model that leverages social data to monitor ADRs automatically.
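The filtering step described in the Methods, keeping posts whose drug name and ADR word pairs are strongly associated, can be illustrated with a minimal sketch. It assumes lift as the association measure; the abstract does not say which measure the authors used. The function name `pair_lift`, the toy posts, and the term sets are all hypothetical.

```python
from collections import Counter
from itertools import product


def pair_lift(posts, drug_terms, adr_terms):
    """Return {(drug, adr): lift} for posts given as sets of tokens.

    Assumes drug_terms and adr_terms are disjoint sets.
    """
    n = len(posts)
    term_counts = Counter()   # number of posts mentioning each term
    pair_counts = Counter()   # number of posts mentioning both terms of a pair
    for tokens in posts:
        drugs = tokens & drug_terms
        adrs = tokens & adr_terms
        term_counts.update(drugs | adrs)
        pair_counts.update(product(drugs, adrs))
    lifts = {}
    for (drug, adr), both in pair_counts.items():
        # lift = P(drug, adr) / (P(drug) * P(adr)); values above 1 suggest association
        lifts[(drug, adr)] = (both / n) / (
            (term_counts[drug] / n) * (term_counts[adr] / n)
        )
    return lifts


# Toy, made-up posts (already tokenized); not data from the study.
posts = [
    {"ketoprofen", "rash", "patch"},
    {"ketoprofen", "rash"},
    {"ketoprofen", "pain"},
    {"aspirin", "pain"},
]
lifts = pair_lift(posts, {"ketoprofen", "aspirin"}, {"rash", "nausea"})
print(lifts)
```

In a pipeline of this kind, pairs whose lift exceeds a chosen threshold would mark the posts passed on to manual labeling; the threshold itself is a design choice not given in the abstract.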
Affiliation(s)
- Chung-Chun Lee
  - Department of Biomedical Informatics, College of Medicine, Konyang University, Daejeon, Republic of Korea
- Seunghee Lee
  - Healthcare Data Science Center, Konyang University Hospital, Daejeon, Republic of Korea
- Mi-Hwa Song
  - Division of Computer Engineering, College of IT Engineering, Hansung University, Seoul, Republic of Korea
- Jong-Yeup Kim
  - Department of Biomedical Informatics, College of Medicine, Konyang University, Daejeon, Republic of Korea
  - Healthcare Data Science Center, Konyang University Hospital, Daejeon, Republic of Korea
- Suehyun Lee
  - Department of Computer Engineering, College of IT Convergence, Gachon University, Seongnam, Republic of Korea
2
Nerella S, Bandyopadhyay S, Zhang J, Contreras M, Siegel S, Bumin A, Silva B, Sena J, Shickel B, Bihorac A, Khezeli K, Rashidi P. Transformers and large language models in healthcare: A review. Artif Intell Med 2024; 154:102900. [PMID: 38878555 PMCID: PMC11638972 DOI: 10.1016/j.artmed.2024.102900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 05/28/2024] [Accepted: 05/30/2024] [Indexed: 08/09/2024]
Abstract
With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformer neural network architecture is rapidly changing many applications. The Transformer is a type of deep learning architecture initially developed to solve general-purpose Natural Language Processing (NLP) tasks and has subsequently been adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of healthcare data, including clinical NLP, medical imaging, structured Electronic Health Records (EHR), social media, bio-physiological signals, and biomolecular sequences. Furthermore, we also include articles that used the transformer architecture for generating surgical instructions and predicting adverse outcomes after surgeries under the umbrella of critical care. Under diverse settings, these models have been used for clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. Finally, we discuss the benefits and limitations of using transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
Affiliation(s)
- Subhash Nerella
  - Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Jiaqing Zhang
  - Department of Electrical and Computer Engineering, University of Florida, Gainesville, United States
- Miguel Contreras
  - Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Scott Siegel
  - Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Aysegul Bumin
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
- Brandon Silva
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
- Jessica Sena
  - Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Benjamin Shickel
  - Department of Medicine, University of Florida, Gainesville, United States
- Azra Bihorac
  - Department of Medicine, University of Florida, Gainesville, United States
- Kia Khezeli
  - Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Parisa Rashidi
  - Department of Biomedical Engineering, University of Florida, Gainesville, United States
3
Yang J, Hu Z, Zhang L, Peng B. Predicting Drugs Suspected of Causing Adverse Drug Reactions Using Graph Features and Attention Mechanisms. Pharmaceuticals (Basel) 2024; 17:822. [PMID: 39065673 PMCID: PMC11279999 DOI: 10.3390/ph17070822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Revised: 06/12/2024] [Accepted: 06/20/2024] [Indexed: 07/28/2024] Open
Abstract
BACKGROUND: An adverse drug reaction (ADR) is an unintended harmful reaction that occurs after the administration of a medication for therapeutic purposes and is unrelated to the intended pharmacological action of the drug. In the United States, ADRs account for 6% of all hospital admissions annually. The cost of ADR-related illnesses in 2016 was estimated at USD 528.4 billion. Increasing awareness of ADRs is an effective measure to prevent them, and assessing suspected drugs in adverse events helps to enhance that awareness.
METHODS: In this study, a suspect drug assisted judgment model (SDAJM) is designed to identify suspected drugs in adverse events. This framework uses a graph isomorphism network (GIN) and an attention mechanism to extract features from patients' demographic information, drug information, and ADR information.
RESULTS: Comparisons with other models across various tests show that this model performs well in predicting the suspected drugs in adverse reaction events. ADR signal detection was conducted on a group of cardiovascular system drugs, and case analyses were performed on two classic drugs, mexiletine and captopril, as well as on two classic antithyroid drugs. The results indicate that the model can accomplish the task of predicting drug ADRs. Validation using benchmark datasets from ten drug discovery domains shows that the model is applicable to classification tasks on the Tox21 and SIDER datasets.
CONCLUSIONS: This study applies deep learning methods to construct the SDAJM for three purposes: (1) identifying drugs suspected to cause adverse drug events (ADEs), (2) predicting the ADRs of drugs, and (3) other drug discovery tasks. The results indicate that this method can offer new directions for research in the field of ADRs.
Affiliation(s)
- Bin Peng
  - College of Public Health, Chongqing Medical University, Chongqing 401331, China; (J.Y.); (Z.H.); (L.Z.)
4
Lyu D, Wang X, Chen Y, Wang F. Language model and its interpretability in biomedicine: A scoping review. iScience 2024; 27:109334. [PMID: 38495823 PMCID: PMC10940999 DOI: 10.1016/j.isci.2024.109334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024] Open
Abstract
With advancements in large language models, artificial intelligence (AI) is undergoing a paradigm shift where AI models can be repurposed with minimal effort across various downstream tasks. This holds great promise for learning generally useful representations from biomedical corpora at scale, which would empower AI solutions in healthcare and biomedical research. Nonetheless, our understanding of how these models work, when they fail, and what they are capable of remains limited due to their emergent properties. Consequently, there is a need to comprehensively examine the use of language models in biomedicine. This review aims to summarize existing studies of language models in biomedicine and identify topics ripe for future research, along with the technical and analytical challenges with respect to interpretability. We expect this review to help researchers and practitioners better understand the landscape of language models in biomedicine and what methods are available to enhance the interpretability of their models.
Affiliation(s)
- Daoming Lyu
  - Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY, USA
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Xingbo Wang
  - Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY, USA
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Yong Chen
  - Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Fei Wang
  - Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY, USA
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
5
Khoruzhaya A, Kozlov D, Arzamasov K, Kremneva E. Comparison of an Ensemble of Machine Learning Models and the BERT Language Model for Analysis of Text Descriptions of Brain CT Reports to Determine the Presence of Intracranial Hemorrhage. Sovrem Tekhnologii Med 2024; 16:27-34. [PMID: 39421632 PMCID: PMC11482096 DOI: 10.17691/stm2024.16.1.03] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Indexed: 10/19/2024] Open
Abstract
The aim of this study is to train and test an ensemble of machine learning models and to compare its performance with that of a BERT language model pre-trained on medical data on a simple binary classification task, i.e., determining the presence or absence of signs of intracranial hemorrhage (ICH) in brain CT reports.
Materials and Methods: Seven machine learning algorithms and three text vectorization techniques were selected as models to solve the binary classification problem. These models were trained on textual data represented by 3980 brain CT reports from 56 inpatient medical facilities in Moscow. The study utilized three text vectorization techniques: bag of words, TF-IDF, and word2vec. The resulting data were then processed by the following machine learning algorithms: decision tree, random forest, logistic regression, nearest neighbors, support vector machines, CatBoost, and XGBoost. Data analysis and pre-processing were performed using NLTK (Natural Language Toolkit, version 3.6.5), libraries for character-based and statistical processing of natural language, and Scikit-learn (version 0.24.2), a library for machine learning containing tools to tackle classification challenges. MedRuBertTiny2 was taken as a BERT transformer model pre-trained on medical data.
Results: Based on the training and testing outcomes from seven machine learning algorithms, the authors selected the three algorithms that yielded the highest metrics (i.e., sensitivity and specificity): CatBoost, logistic regression, and nearest neighbors. The highest metrics were achieved with the bag of words technique. These algorithms were assembled into an ensemble using the stacking technique. The sensitivity and specificity for the validation dataset separated from the original sample were 0.93 and 0.90, respectively. Next, the ensemble and the BERT model were trained on an independent dataset containing 9393 textual radiology reports, also divided into training and test sets. Once the ensemble was tested on this dataset, the resulting sensitivity and specificity were 0.92 and 0.90, respectively. The BERT model tested on these data demonstrated a sensitivity of 0.97 and a specificity of 0.90.
Conclusion: When analyzing textual reports of brain CT scans with signs of intracranial hemorrhage, the trained ensemble demonstrated high accuracy metrics. Still, manual quality control of the results is required during its application. The pre-trained BERT transformer model, additionally trained on diagnostic textual reports, demonstrated higher accuracy metrics (p<0.05). The results show promise for extracting specific values, both for binary classification tasks and for in-depth analysis of unstructured medical information.
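The two metrics reported above follow from the binary confusion counts. The sketch below shows the standard definitions on made-up labels (1 = ICH signs present, 0 = absent); the function name and the data are hypothetical and unrelated to the study's reports.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 is positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    sensitivity = tp / (tp + fn)  # share of positive reports that were detected
    specificity = tn / (tn + fp)  # share of negative reports correctly cleared
    return sensitivity, specificity


# Made-up example: 4 positive reports and 6 negative reports.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting both values matters in a screening setting like this one, because a model can reach high overall accuracy while still missing a clinically important share of positive reports.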
Affiliation(s)
- A.N. Khoruzhaya
  - Junior Researcher, Department of Innovative Technologies; Scientific and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Department of Health, Bldg 1, 24 Petrovka St., Moscow, 127051, Russia
- D.V. Kozlov
  - Junior Researcher, Department of Medical Informatics, Radiomics and Radiogenomics; Scientific and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Department of Health, Bldg 1, 24 Petrovka St., Moscow, 127051, Russia
- K.M. Arzamasov
  - Head of the Department of Medical Informatics, Radiomics and Radiogenomics; Scientific and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Department of Health, Bldg 1, 24 Petrovka St., Moscow, 127051, Russia
- E.I. Kremneva
  - Leading Researcher, Department of Innovative Technologies; Scientific and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Department of Health, Bldg 1, 24 Petrovka St., Moscow, 127051, Russia; Senior Researcher; Research Center for Neurology, 80 Volokolamskoye Shosse, Moscow, 125367, Russia
6
Loukachevitch N, Manandhar S, Baral E, Rozhkov I, Braslavski P, Ivanov V, Batura T, Tutubalina E. NEREL-BIO: a dataset of biomedical abstracts annotated with nested named entities. Bioinformatics 2023; 39:btad161. [PMID: 37004189 PMCID: PMC10129873 DOI: 10.1093/bioinformatics/btad161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2022] [Revised: 02/12/2023] [Accepted: 03/24/2023] [Indexed: 04/03/2023] Open
Abstract
MOTIVATION: This article describes NEREL-BIO, an annotation scheme and corpus of PubMed abstracts in Russian and a smaller number of abstracts in English. NEREL-BIO extends the general-domain dataset NEREL by introducing domain-specific entity types. The NEREL-BIO annotation scheme covers both general and biomedical domains, making it suitable for domain transfer experiments. NEREL-BIO provides annotation for nested named entities as an extension of the scheme employed for NEREL. Nested named entities may cross entity boundaries to connect to shorter entities nested within longer entities, making them harder to detect.
RESULTS: NEREL-BIO contains annotations for 700+ Russian and 100+ English abstracts. All English PubMed annotations have corresponding Russian counterparts. Thus, NEREL-BIO has the following specific features: it annotates nested named entities, and it can be used as a benchmark for cross-domain (NEREL → NEREL-BIO) and cross-language (English → Russian) transfer. We experiment with both transformer-based sequence models and machine reading comprehension models and report their results.
AVAILABILITY AND IMPLEMENTATION: The dataset and annotation guidelines are freely available at https://github.com/nerel-ds/NEREL-BIO.
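To make the notion of nested named entities concrete, the sketch below represents annotations as character-offset spans and lists every entity that lies inside a longer one. The tuple layout, the function name `nested_pairs`, and the labels `DISO` and `ANATOMY` are illustrative assumptions; they are not taken from the NEREL-BIO annotation files.

```python
def nested_pairs(spans):
    """Return (outer, inner) pairs where inner lies within outer.

    Each span is a (start, end, label) tuple with an exclusive end offset.
    Spans with identical boundaries are not treated as nested.
    """
    pairs = []
    for outer in spans:
        for inner in spans:
            same_bounds = (inner[0], inner[1]) == (outer[0], outer[1])
            contained = outer[0] <= inner[0] and inner[1] <= outer[1]
            if contained and not same_bounds:
                pairs.append((outer, inner))
    return pairs


# Hypothetical annotation: a disorder mention that contains an anatomy mention.
text = "acute kidney injury"
spans = [(0, 19, "DISO"), (6, 12, "ANATOMY")]

for outer, inner in nested_pairs(spans):
    print(text[outer[0]:outer[1]], "contains", text[inner[0]:inner[1]])
```

A flat sequence tagger assigns one label per token and so cannot emit both spans at once, which is one reason nested annotation is harder to model.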
Affiliation(s)
- Suresh Manandhar
  - Madan Bhandari University of Science and Technology, Chitlang 44600, Nepal
- Elina Baral
  - Madan Bhandari University of Science and Technology, Chitlang 44600, Nepal
- Igor Rozhkov
  - Lomonosov Moscow State University, Moscow 19899, Russia
- Pavel Braslavski
  - Ural Federal University, Yekaterinburg 620002, Russia
  - HSE University, Moscow 101000, Russia
- Tatiana Batura
  - A.P. Ershov Institute of Informatics Systems, Novosibirsk 630090, Russia
- Elena Tutubalina
  - HSE University, Moscow 101000, Russia
  - Artificial Intelligence Research Institute, Moscow 105064, Russia
  - Sber AI, Moscow 121170, Russia
7
iADRGSE: A Graph-Embedding and Self-Attention Encoding for Identifying Adverse Drug Reaction in the Earlier Phase of Drug Development. Int J Mol Sci 2022; 23:ijms232416216. [PMID: 36555858 PMCID: PMC9786008 DOI: 10.3390/ijms232416216] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/22/2022] [Revised: 12/15/2022] [Accepted: 12/16/2022] [Indexed: 12/23/2022] Open
Abstract
Adverse drug reactions (ADRs) are a major issue to be addressed by the pharmaceutical industry. Early and accurate detection of potential ADRs contributes to enhancing drug safety and reducing financial expenses. The majority of the approaches that have been employed to identify ADRs are limited to determining whether a drug exhibits an ADR, rather than identifying the exact type of ADR. By introducing the "multi-level feature-fusion deep-learning model", a new predictor, called iADRGSE, has been developed, which can be used to identify adverse drug reactions at the early stage of drug discovery. iADRGSE integrates a self-attentive module and a graph-network module that can extract one-dimensional sub-structure sequence information and two-dimensional chemical-structure graph information of drug molecules. As a demonstration, cross-validation and independent testing were performed with iADRGSE on a dataset of ADRs classified into 27 categories, based on SOC (system organ classification). In addition, experiments comparing iADRGSE with approaches such as NPF were conducted on the OMOP dataset, using the jackknife test method. Experiments show that iADRGSE was superior to existing state-of-the-art predictors.
8
Sakhovskiy A, Tutubalina E. Multimodal model with text and drug embeddings for adverse drug reaction classification. J Biomed Inform 2022; 135:104182. [DOI: 10.1016/j.jbi.2022.104182] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/12/2022] [Revised: 05/24/2022] [Accepted: 08/20/2022] [Indexed: 11/16/2022]
9
Analysis of the Full-Size Russian Corpus of Internet Drug Reviews with Complex NER Labeling Using Deep Learning Neural Networks and Language Models. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12010491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
The paper presents the full-size Russian corpus of Internet users' reviews on medicines with complex named entity recognition (NER) labeling of pharmaceutically relevant entities. We evaluate the accuracy levels reached on this corpus by a set of advanced deep learning neural networks for extracting mentions of these entities. The corpus markup includes mentions of the following entities: medication (33,005 mentions), adverse drug reaction (1778), disease (17,403), and note (4490). Two of them, medication and disease, include a set of attributes. A part of the corpus has a coreference annotation with 1560 coreference chains in 300 documents. A multi-label model based on a language model and a set of features has been developed for recognizing entities of the presented corpus. We analyze how the choice of different model components affects the entity recognition accuracy. Those components include methods for vector representation of words, types of language models pre-trained for the Russian language, ways of text normalization, and other pre-processing methods. The size of our corpus is sufficient to study the effects of particularities of annotation and entity balancing. We compare our corpus to existing ones by the occurrences of entities of different types and show that balancing the corpus by the number of texts with and without adverse drug reaction (ADR) mentions improves the ADR recognition accuracy with no notable decline in the accuracy of detecting entities of other types. As a result, the state of the art for the pharmacological entity extraction task for the Russian language is established on a full-size labeled corpus. For the ADR entity type, the accuracy achieved is 61.1% by the F1-exact metric, which is on par with the accuracy level for other language corpora with similar characteristics and ADR representativeness. The accuracy of the coreference relation extraction evaluated on our corpus is 71%, which is higher than the results achieved on the other Russian-language corpora.
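The abstract does not define the F1-exact metric. A common reading, assumed here, is that a predicted entity counts as correct only when its start offset, end offset, and type all match a gold annotation exactly. The sketch below computes F1 under that assumption on made-up spans; the function name and the labels are hypothetical.

```python
def exact_f1(gold, predicted):
    """F1 over (start, end, label) tuples, counting only exact matches."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # spans that match in boundaries and type
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up annotations: one prediction is off by a character, one is spurious.
gold = [(0, 8, "ADR"), (15, 23, "Medication"), (30, 38, "Disease")]
pred = [(0, 8, "ADR"), (15, 22, "Medication"), (30, 38, "Disease"), (40, 45, "ADR")]
score = exact_f1(gold, pred)
print(f"F1-exact = {score:.3f}")
```

Under exact matching, the off-by-one boundary on the medication span earns no credit, which is why exact scores tend to be lower than those from partial-overlap scoring.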
10
RuMedBench: A Russian Medical Language Understanding Benchmark. Artif Intell Med 2022. [DOI: 10.1007/978-3-031-09342-5_38] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
11
Kalyan KS, Rajasekharan A, Sangeetha S. AMMU: A survey of transformer-based biomedical pretrained language models. J Biomed Inform 2021; 126:103982. [PMID: 34974190 DOI: 10.1016/j.jbi.2021.103982] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 05/14/2021] [Revised: 12/12/2021] [Accepted: 12/20/2021] [Indexed: 01/04/2023]
Abstract
Transformer-based pretrained language models (PLMs) have started a new era in modern natural language processing (NLP). These models combine the power of transformers, transfer learning, and self-supervised learning (SSL). Following the success of these models in the general domain, the biomedical research community has developed various in-domain PLMs starting from BioBERT to the latest BioELECTRA and BioALBERT models. We strongly believe there is a need for a survey paper that can provide a comprehensive survey of various transformer-based biomedical pretrained language models (BPLMs). In this survey, we start with a brief overview of foundational concepts like self-supervised learning, embedding layer and transformer encoder layers. We discuss core concepts of transformer-based PLMs like pretraining methods, pretraining tasks, fine-tuning methods, and various embedding types specific to biomedical domain. We introduce a taxonomy for transformer-based BPLMs and then discuss all the models. We discuss various challenges and present possible solutions. We conclude by highlighting some of the open issues which will drive the research community to further improve transformer-based BPLMs. The list of all the publicly available transformer-based BPLMs along with their links is provided at https://mr-nlp.github.io/posts/2021/05/transformer-based-biomedical-pretrained-language-models-list/.
12
Giachelle F, Irrera O, Silvello G. MedTAG: a portable and customizable annotation tool for biomedical documents. BMC Med Inform Decis Mak 2021; 21:352. [PMID: 34922517 PMCID: PMC8684237 DOI: 10.1186/s12911-021-01706-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/27/2021] [Accepted: 12/01/2021] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND: Semantic annotators and Natural Language Processing (NLP) methods for Named Entity Recognition and Linking (NER+L) require plenty of training and test data, especially in the biomedical domain. Despite the abundance of unstructured biomedical data, the lack of richly annotated biomedical datasets hinders the further development of NER+L algorithms for any effective secondary use. In addition, manual annotation of biomedical documents performed by physicians and experts is a costly and time-consuming task. To support, organize, and speed up the annotation process, we introduce MedTAG, a collaborative biomedical annotation tool that is open-source, platform-independent, and free to use and distribute.
RESULTS: We present the main features of MedTAG and how it has been employed in the histopathology domain by physicians and experts to annotate more than seven thousand clinical reports manually. We compare MedTAG with a set of well-established biomedical annotation tools, including BioQRator, ezTag, MyMiner, and tagtog, weighing their pros and cons against those of MedTAG. We highlight that MedTAG is one of the very few open-source tools provided with an open license and a straightforward installation procedure supporting cross-platform use.
CONCLUSIONS: MedTAG has been designed according to five requirements (i.e., available, distributable, installable, workable, and schematic) defined in a recent extensive review of manual annotation tools. Moreover, MedTAG satisfies 20 of the 22 criteria specified in the same study.
Affiliation(s)
- Fabio Giachelle
  - Department of Information Engineering, University of Padua, Padua, Italy
- Ornella Irrera
  - Department of Information Engineering, University of Padua, Padua, Italy
- Gianmaria Silvello
  - Department of Information Engineering, University of Padua, Padua, Italy