1
Fernandez-Llimos F, Negrão LG, Bond C, Stewart D. Influence of automated indexing in Medical Subject Headings (MeSH) selection for pharmacy practice journals. Res Social Adm Pharm 2024; 20:911-917. [PMID: 38902136 DOI: 10.1016/j.sapharm.2024.06.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Accepted: 06/10/2024] [Indexed: 06/22/2024]
Abstract
BACKGROUND The Medical Subject Headings (MeSH) thesaurus is the controlled vocabulary used to index articles in MEDLINE. MeSH terms were mainly selected manually until June 2022, when an automated algorithm, the Medical Text Indexer (MTI), was fully implemented. A selection of automatically indexed articles is then reviewed (curated) by human indexers to ensure the quality of the process. OBJECTIVE To describe the association of MEDLINE indexing methods (i.e., manual, automated, and automated + curated) with MeSH assignment in pharmacy practice journals compared with medical journals. METHODS Original research articles published between 2016 and 2023 in two groups of journals (i.e., the Big-five general medicine and three pharmacy practice journals) were selected from PubMed using journal-specific search strategies. Metadata of the articles, including MeSH terms and indexing method, were extracted. A list of pharmacy-specific MeSH terms was compiled from previously published studies, and their presence in pharmacy practice journal records was investigated. Using bivariate and multivariate analyses, as well as effect size measures, the number of MeSH terms per article was compared between journal groups, geographic origin of the journal, and indexing method. RESULTS A total of 8479 original research articles were retrieved: 6254 from the medical journals and 2225 from pharmacy practice journals. The number of articles indexed by the various methods was disproportionate; 77.8 % of medical and 50.5 % of pharmacy practice articles were manually indexed. Among those indexed using the automated system, 51.1 % of medical and 10.9 % of pharmacy practice articles were then curated to ensure the indexing quality. The number of MeSH terms per article varied among the three indexing methods for medical and pharmacy journals, with 15.5 vs. 13.0 in manually indexed, 9.4 vs. 7.4 in automatically indexed, and 12.1 vs. 7.8 in automated and then curated articles, respectively. Multivariate analysis showed a significant effect of indexing method and journal group on the number of MeSH terms assigned, but not of the geographical origin of the journal. CONCLUSIONS Articles indexed using the automated MTI have fewer MeSH terms than manually indexed articles. Articles published in pharmacy practice journals were indexed with fewer MeSH terms than general medical journal articles, regardless of the indexing method used.
Affiliation(s)
- Fernando Fernandez-Llimos
  - Applied Molecular Biosciences Unit (UCIBIO), Laboratory of Pharmacology, Department of Drug Sciences, Faculty of Pharmacy, University of Porto, Porto, Portugal.
- Luciana G Negrão
  - Applied Molecular Biosciences Unit (UCIBIO), Laboratory of Pharmacology, Department of Drug Sciences, Faculty of Pharmacy, University of Porto, Porto, Portugal.
- Christine Bond
  - Primary Care, University of Aberdeen Institute of Applied Health Sciences, Aberdeen, United Kingdom.
- Derek Stewart
  - College of Pharmacy, QU Health, Qatar University, Doha, Qatar.
2
Zolotarev O, Khakimova A, Rahim F, Senel E, Zatsman I, Gu D. Scientometric analysis of trends in global research on acne treatment. Int J Womens Dermatol 2023; 9:e082. [PMID: 37521754 PMCID: PMC10378739 DOI: 10.1097/jw9.0000000000000082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Accepted: 03/12/2023] [Indexed: 08/01/2023] Open
Abstract
Acne or acne vulgaris is the most common chronic inflammatory disease of the sebaceous follicles. Objectives The present study aims to identify the main lines of research in the field of acne treatment using reproducible scientometric methods. In this article, we reviewed the following research trends: facial acne, different antibiotics, retinoids, anti-inflammatory drugs, epidermal growth factor receptor inhibitors therapy, and associated diseases. Methods The analysis of publications from the PubMed collection was carried out from 1871 to 2022. All data were analyzed using Microsoft Excel. The evolution of the terminological portrait of the disease is shown. Results Trends in the use of various groups of antibiotics, retinoids, anti-inflammatory drugs, and photodynamic therapy for acne treatment have been found. There is a growing interest in clindamycin and doxycycline (polynomial and exponential growth, respectively). The effects of isotretinoin are also being studied more frequently (active linear growth). The publication of studies on spironolactone is increasing (linear growth). There is also a steady interest in the use of epidermal growth factor receptor inhibitors in the recent years. There is active research on acne and polycystic ovary syndrome (exponential growth). Limitations Only articles in English were selected. The most frequent terms were considered. Conclusions The dynamics of publication activity in the field of acne was considered. The aim of the current scientometric study was to analyze the global trends in acne treatments. The trend analysis made it possible to identify the most explored areas of research, as well as indicate those areas in dermatology in which interest is declining.
Affiliation(s)
- Oleg Zolotarev
  - Institute of Information Systems and Engineering Computer Technologies, Russian New University, Moscow, Russia
- Aida Khakimova
  - Institute of Information Systems and Engineering Computer Technologies, Russian New University, Moscow, Russia
- Fakher Rahim
  - Department of Anesthesia, Cihan University - Sulaimaniya, Kurdistan Region, Iraq
- Engin Senel
  - Department of Dermatology and Venereology, Hitit University Faculty of Medicine, Corum, Turkey
- Igor Zatsman
  - Research Department, Institute of Informatics Problems FRC CSC RAS, Moscow, Russia
- Dongxiao Gu
  - MIS School of Management, Hefei University of Technology, Hefei, Anhui, China
3
Cai L, Li J, Lv H, Liu W, Niu H, Wang Z. Integrating domain knowledge for biomedical text analysis into deep learning: A survey. J Biomed Inform 2023; 143:104418. [PMID: 37290540 DOI: 10.1016/j.jbi.2023.104418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Revised: 04/24/2023] [Accepted: 05/31/2023] [Indexed: 06/10/2023]
Abstract
The past decade has witnessed an explosion of textual information in the biomedical field. Biomedical texts provide a basis for healthcare delivery, knowledge discovery, and decision-making. Over the same period, deep learning has achieved remarkable performance in biomedical natural language processing; however, its development has been limited by the scarcity of well-annotated datasets and by poor interpretability. To address this, researchers have considered combining domain knowledge (such as biomedical knowledge graphs) with biomedical data, which has become a promising means of introducing more information into biomedical datasets and following evidence-based medicine. This paper comprehensively reviews more than 150 recent studies on incorporating domain knowledge into deep learning models to facilitate typical biomedical text analysis tasks, including information extraction, text classification, and text generation. Finally, we discuss various challenges and future directions.
Affiliation(s)
- Linkun Cai
  - School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China
- Jia Li
  - Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China
- Han Lv
  - Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China
- Wenjuan Liu
  - Aerospace Center Hospital, 100049 Beijing, China
- Haijun Niu
  - School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China
- Zhenchang Wang
  - School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China; Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China.
4
Mylonas N, Karlos S, Tsoumakas G. WeakMeSH: Leveraging provenance information for weakly supervised classification of biomedical articles with emerging MeSH descriptors. Artif Intell Med 2023; 137:102505. [PMID: 36868691 DOI: 10.1016/j.artmed.2023.102505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Revised: 12/15/2022] [Accepted: 01/27/2023] [Indexed: 02/04/2023]
Abstract
Medical Subject Headings (MeSH) is a hierarchically structured thesaurus created by the National Library of Medicine of the USA. Each year the vocabulary is revised, bringing forth different types of changes. Of particular interest are those that introduce new descriptors into the vocabulary, either brand new ones or those that arise as the product of a complex change. These new descriptors often lack ground truth articles, rendering learning models that require supervision inapplicable. Furthermore, this problem is characterized by its multi-label nature and the fine-grained character of the descriptors that play the role of classes, requiring expert supervision and a lot of human resources. In this work, we alleviate these issues by retrieving insights from provenance information about those descriptors present in MeSH to create a weakly labeled training set for them. At the same time, we make use of a similarity mechanism to further filter the weak labels obtained through the descriptor information mentioned earlier. Our method, called WeakMeSH, was applied to a large-scale subset of the BioASQ 2018 data set consisting of 900 thousand biomedical articles. The performance of our method was evaluated on BioASQ 2020 against several other approaches that had given competitive results in similar problems in the past or that apply alternative transformations to the proposed one, as well as some variants that showcase the importance of each component of our proposed approach. Finally, an analysis was performed on the different MeSH descriptors each year to assess the applicability of our method to the thesaurus.
Affiliation(s)
- Nikolaos Mylonas
  - Aristotle University of Thessaloniki, Thessaloniki 541 24, Thessaloniki, 54124, Greece.
- Stamatis Karlos
  - Aristotle University of Thessaloniki, Thessaloniki 541 24, Thessaloniki, 54124, Greece
- Grigorios Tsoumakas
  - Aristotle University of Thessaloniki, Thessaloniki 541 24, Thessaloniki, 54124, Greece
5
Gu J, Chersoni E, Wang X, Huang CR, Qian L, Zhou G. LitCovid ensemble learning for COVID-19 multi-label classification. Database (Oxford) 2022; 2022:6846687. [PMID: 36426767 PMCID: PMC9693804 DOI: 10.1093/database/baac103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2022] [Revised: 10/27/2022] [Accepted: 11/04/2022] [Indexed: 11/27/2022]
Abstract
The Coronavirus Disease 2019 (COVID-19) pandemic has shifted the focus of research worldwide, and more than 10 000 new articles per month have concentrated on COVID-19-related topics. Considering this rapidly growing literature, the efficient and precise extraction of the main topics of COVID-19-relevant articles is of great importance. The manual curation of this information for biomedical literature is labor-intensive and time-consuming, and as such the procedure is insufficient and difficult to maintain. In response to these complications, the BioCreative VII community has proposed a challenging task, LitCovid Track, calling for a global effort to automatically extract semantic topics for COVID-19 literature. This article describes our work on the BioCreative VII LitCovid Track. We proposed the LitCovid Ensemble Learning (LCEL) method for the tasks and integrated multiple biomedical pretrained models to address the COVID-19 multi-label classification problem. Specifically, seven different transformer-based pretrained models were ensembled for the initialization and fine-tuning processes independently. To enhance the representation abilities of the deep neural models, diverse additional biomedical knowledge was utilized to facilitate the fruitfulness of the semantic expressions. Simple yet effective data augmentation was also leveraged to address the learning deficiency during the training phase. In addition, given the imbalanced label distribution of the challenging task, a novel asymmetric loss function was applied to the LCEL model, which explicitly adjusted the negative-positive importance by assigning different exponential decay factors and helped the model focus on the positive samples. After the training phase, an ensemble bagging strategy was adopted to merge the outputs from each model for final predictions. The experimental results show the effectiveness of our proposed approach, as LCEL obtains the state-of-the-art performance on the LitCovid dataset. 
Database URL: https://github.com/JHnlp/LCEL.
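The asymmetric loss mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so the form below is an assumption: a focal-style binary loss in which positive and negative labels receive different focusing exponents, so that easy negatives are down-weighted more strongly than positives. The function name, the parameter names `gamma_pos` and `gamma_neg`, and their default values are hypothetical and are not taken from the LCEL code.

```python
import math


def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0, eps=1e-8):
    """Mean asymmetric focal-style loss over the labels of one sample.

    probs   -- predicted probabilities, one per label
    targets -- 0/1 ground-truth indicators, one per label
    gamma_pos, gamma_neg -- assumed focusing exponents for positives/negatives
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive label: weight shrinks as the prediction gets confident.
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative label: a larger exponent suppresses easy negatives.
            total += -(p ** gamma_neg) * math.log(max(1.0 - p, eps))
    return total / len(probs)


# Three labels, only the first is positive.
loss = asymmetric_loss([0.7, 0.1, 0.2], [1, 0, 0])
```

With `gamma_neg` larger than `gamma_pos`, the many well-classified negative labels in an imbalanced multi-label problem contribute little to the total, which is one way to make training focus on the positive samples.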
Affiliation(s)
- Emmanuele Chersoni
  - Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong 999077, China
- Xing Wang
  - Tencent AI Lab, Shenzhen 518071, China
- Chu-Ren Huang
  - Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong 999077, China
- Longhua Qian
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Guodong Zhou
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
6
Liu H, Carini S, Chen Z, Phillips Hey S, Sim I, Weng C. Ontology-based categorization of clinical studies by their conditions. J Biomed Inform 2022; 135:104235. [PMID: 36283581 DOI: 10.1016/j.jbi.2022.104235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2022] [Revised: 09/24/2022] [Accepted: 10/18/2022] [Indexed: 11/20/2022]
Abstract
OBJECTIVE The free-text Condition data field in the ClinicalTrials.gov is not amenable to computational processes for retrieving, aggregating and visualizing clinical studies by condition categories. This paper contributes a method for automated ontology-based categorization of clinical studies by their conditions. MATERIALS AND METHODS Our method first maps text entries in ClinicalTrials.gov's Condition field to standard condition concepts in the OMOP Common Data Model by using SNOMED CT as a reference ontology and using Usagi for concept normalization, followed by hierarchical traversal of the SNOMED ontology for concept expansion, ontology-driven condition categorization, and visualization. We compared the accuracy of this method to that of the MeSH-based method. RESULTS We reviewed the 4,506 studies on Vivli.org categorized by our method. Condition terms of 4,501 (99.89%) studies were successfully mapped to SNOMED CT concepts, and with a minimum concept mapping score threshold, 4,428 (98.27%) studies were categorized into 31 predefined categories. When validating with manual categorization results on a random sample of 300 studies, our method achieved an estimated categorization accuracy of 95.7%, while the MeSH-based method had an accuracy of 85.0%. CONCLUSION We showed that categorizing clinical studies using their Condition terms with referencing to SNOMED CT achieved a better accuracy and coverage than using MeSH terms. The proposed ontology-driven condition categorization was useful to create accurate clinical study categorization that enables clinical researchers to aggregate evidence from a large number of clinical studies.
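The hierarchical-traversal step described in this abstract can be sketched as follows. This is only an illustration of the general idea (walk upward through is-a links until a predefined category is reached), not the authors' implementation. The hierarchy, the concept names, and the category set are hypothetical toy data and are not real SNOMED CT or OMOP content.

```python
from collections import deque

# Hypothetical toy is-a hierarchy (child -> list of parents).
PARENTS = {
    "type 2 diabetes": ["diabetes mellitus"],
    "diabetes mellitus": ["endocrine disorder"],
    "asthma": ["respiratory disorder"],
    "endocrine disorder": [],
    "respiratory disorder": [],
}

# Hypothetical predefined top-level categories.
CATEGORIES = {"endocrine disorder", "respiratory disorder"}


def categorize(concept):
    """Return every predefined category reachable from `concept` via is-a links."""
    seen = {concept}
    queue = deque([concept])
    found = set()
    while queue:
        node = queue.popleft()
        if node in CATEGORIES:
            found.add(node)
        for parent in PARENTS.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return found


result = categorize("type 2 diabetes")
```

A breadth-first walk is used here because a concept in a polyhierarchy may have several parents and can therefore fall into more than one category; a concept that maps to no known node simply yields an empty set.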
Affiliation(s)
- Hao Liu
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Simona Carini
  - Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Zhehuan Chen
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Ida Sim
  - Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA.
7
Tonin FS, Gmünder V, Bonetti AF, Mendes AM, Fernandez-Llimos F. Use of 'Pharmaceutical services' Medical Subject Headings (MeSH) in articles assessing pharmacists' interventions. Exploratory Research in Clinical and Social Pharmacy 2022; 7:100172. [PMID: 36082143 PMCID: PMC9445408 DOI: 10.1016/j.rcsop.2022.100172] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/22/2022] [Revised: 08/11/2022] [Accepted: 08/19/2022] [Indexed: 11/15/2022] Open
Abstract
Background The Medical Subject Headings (MeSH) thesaurus contributes towards efficient searching of biomedical information. However, insufficient coverage of specific fields and inaccuracies in the indexing of articles can lead to bias during literature retrieval. Objectives This meta-research study aimed to assess the use of 'Pharmaceutical Services' MeSH terms in studies evaluating the effect of pharmacists' interventions. Methods An updated systematic search (Jan-2022) was performed to gather meta-analyses comparing pharmacists' interventions vs. other forms of care. All MeSH terms allocated to the MEDLINE record of each primary study included in the selected meta-analyses were systematically extracted. Terms from the 'Pharmaceutical Services' branch, including its descendants, as well as 26 other pharmacy-specific MeSH terms were identified. The assignment of these terms as a 'Major MeSH' was also evaluated. Descriptive statistics and social network analyses to evaluate the co-occurrence of the MeSH terms in the articles were conducted. Sensitivity analyses including only meta-analyses with declared objectives mentioning the words 'pharmacist' or 'pharmacy' were performed (SPSS v.24.0). Results Overall, 138 meta-analyses including 2012 primary articles were evaluated. A median of 15 [IQR 12-18] MeSH terms were assigned per article with a slight positive time-trend (Spearman rho = 0.193; p < 0.001). Only 36.6% (n = 736/2012) and 58.1% (n = 338/1099) of studies were indexed with one MeSH term from the 'Pharmaceutical Services' branch in the overall and sensitivity analyses, respectively. In <20% of cases, these terms were a 'Major MeSH'. The pharmacy-specific term 'Pharmacists' was the most frequently used, yet in only 27.8% and 47.7% of articles in the original and sensitivity analyses, respectively. Social networks showed a weak association between pharmacy-specific and 'Pharmaceutical services' branch MeSH terms. Conclusions The availability of a 'Pharmaceutical services' hierarchical branch and of further pharmacy-specific MeSH terms incorporated into the MeSH thesaurus in recent years is not associated with accurate indexing of articles.
Affiliation(s)
- Fernanda S. Tonin
  - H&TRC - Health & Technology Research Center, ESTeSL - Escola Superior de Tecnologia da Saúde, Instituto Politécnico de Lisboa, Lisbon, Portugal
- Vanessa Gmünder
  - Pharmaceutical Care, Department of Pharmaceutical Sciences, University of Basel, Basel, Switzerland
- Aline F. Bonetti
  - Pharmaceutical Sciences Postgraduate Research Program, Federal University of Paraná, Curitiba, Brazil
- Antonio M. Mendes
  - Pharmacy Service, Hospital de Clínicas, Federal University of Paraná, Curitiba, Brazil
- Fernando Fernandez-Llimos
  - Laboratory of Pharmacology, Department of Drug Sciences, Faculty of Pharmacy, University of Porto, Porto, Portugal
  - Center for Health Technology and Services Research (CINTESIS), University of Porto, Porto, Portugal
8
Frandsen TF, Carlsen AMF, Eriksen MB. The use of subject headings varied in Embase and MEDLINE: An analysis of indexing across six subject areas. J Inf Sci 2022. [DOI: 10.1177/01655515221107335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Many bibliographic databases describe the content of a publication using a thesaurus. The vocabularies vary and the extent to which the databases apply them may also differ significantly. The aim of this study is to empirically explore the number of subject headings assigned to publications in two databases over time and to determine if publication characteristics are associated with the number of subject headings. Articles and reviews in MEDLINE and Embase from 1990 to 2019 that were assigned one of the subject headings from six subject areas are included in this study. Each of the retrieved publications in Embase is matched with a similar publication in MEDLINE. Furthermore, multivariable linear regressions are used to explore the association of the number of subject headings in MEDLINE and Embase with six prespecified publication characteristics. The average number of assigned subject headings in MEDLINE is stable or even slightly decreasing over time. In Embase, the average number of assigned subject headings was stable until about 2000, after which it increased dramatically over the next 3 years. Furthermore, linear regressions show that the average number of subject headings in MEDLINE and Embase is higher for publications in English, publications with longer abstracts, recent publications and publications belonging to specific subject areas. However, reviews are assigned more subject headings in Embase and fewer in MEDLINE. The implications of the results are discussed.
Affiliation(s)
- Tove Faber Frandsen
  - Department of Design and Communication, University of Southern Denmark, Denmark
- Mette Brandt Eriksen
  - The University Library of Southern Denmark, Cochrane Denmark & Centre for Evidence-Based Medicine Odense (CEBMO), University of Southern Denmark
9
Almeida T, Antunes R, Silva JF, Almeida JR, Matos S. Chemical identification and indexing in PubMed full-text articles using deep learning and heuristics. Database (Oxford) 2022; 2022:6625810. [PMID: 35776534 PMCID: PMC9248917 DOI: 10.1093/database/baac047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Revised: 05/13/2022] [Accepted: 06/06/2022] [Indexed: 11/14/2022]
Abstract
The identification of chemicals in articles has attracted great interest in the biomedical scientific community, given its importance in drug development research. Most previous research has focused on PubMed abstracts, and further investigation using full-text documents is required because these contain additional valuable information that must be explored. The manual expert task of assigning Medical Subject Headings (MeSH) terms to these articles later helps researchers find the most relevant publications for their ongoing work. The BioCreative VII NLM-Chem track fostered the development of systems for chemical identification and indexing in PubMed full-text articles. Chemical identification consisted of identifying the chemical mentions and linking these to unique MeSH identifiers. This manuscript describes our participating system and the post-challenge improvements we made. We propose a three-stage pipeline that individually performs chemical mention detection, entity normalization and indexing. Regarding chemical identification, we adopted a deep-learning solution that utilizes the PubMedBERT contextualized embeddings followed by a multilayer perceptron and a conditional random field tagging layer. For the normalization approach, we use sieve-based dictionary filtering followed by a deep-learning similarity search strategy. Finally, for the indexing we developed rules for identifying the most relevant MeSH codes for each article. During the challenge, our system obtained the best official results in the normalization and indexing tasks despite the lower performance in the chemical mention recognition task. In a post-contest phase we boosted our results by improving our named entity recognition model with additional techniques. The final system achieved 0.8731, 0.8275 and 0.4849 in the chemical identification, normalization and indexing tasks, respectively. The code to reproduce our experiments and run the pipeline is publicly available.
Database URL
https://github.com/bioinformatics-ua/biocreativeVII_track2
Affiliation(s)
- Tiago Almeida
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
- Rui Antunes
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
- João F. Silva
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
- João R Almeida
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
  - Department of Information and Communications Technologies, University of A Coruña, A Coruña, Spain
- Sérgio Matos
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
10
Gu J, Xiang R, Wang X, Li J, Li W, Qian L, Zhou G, Huang CR. Multi-probe attention neural network for COVID-19 semantic indexing. BMC Bioinformatics 2022; 23:259. [PMID: 35768777 PMCID: PMC9241329 DOI: 10.1186/s12859-022-04803-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2021] [Accepted: 06/15/2022] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND The COVID-19 pandemic has increasingly accelerated the publication pace of scientific literature. How to efficiently curate and index this large amount of biomedical literature under the current crisis is of great importance. Previous literature indexing is mainly performed by human experts using Medical Subject Headings (MeSH), which is labor-intensive and time-consuming. Therefore, to alleviate the expensive time consumption and monetary cost, there is an urgent need for automatic semantic indexing technologies for the emerging COVID-19 domain. RESULTS In this research, to investigate the semantic indexing problem for COVID-19, we first construct the new COVID-19 Semantic Indexing dataset, which consists of more than 80 thousand biomedical articles. We then propose a novel semantic indexing framework based on the multi-probe attention neural network (MPANN) to address the COVID-19 semantic indexing problem. Specifically, we employ a k-nearest neighbour based MeSH masking approach to generate candidate topic terms for each input article. We encode and feed the selected candidate terms as well as other contextual information as probes into the downstream attention-based neural network. Each semantic probe carries specific aspects of biomedical knowledge and provides informatively discriminative features for the input article. After extracting the semantic features at both term-level and document-level through the attention-based neural network, MPANN adopts a linear multi-view classifier to conduct the final topic prediction for COVID-19 semantic indexing. CONCLUSION The experimental results suggest that MPANN promises to represent the semantic features of biomedical texts and is effective in predicting semantic topics for COVID-19 related biomedical articles.
Affiliation(s)
- Jinghang Gu
  - Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China
- Rong Xiang
  - Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
- Jing Li
  - Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
- Wenjie Li
  - Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
- Longhua Qian
  - School of Computer Science and Technology, Soochow University, Suzhou, China
- Guodong Zhou
  - School of Computer Science and Technology, Soochow University, Suzhou, China
- Chu-Ren Huang
  - Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China.
11
Karlos S, Mylonas N, Tsoumakas G. Instance-Based Zero-Shot learning for semi-Automatic MeSH indexing. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2021.08.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
12
Bachelet VC, Navarrete MS, Barrera-Riquelme C, Carrasco VA, Dallaserra M, Díaz RA, Ibarra ÁA, Lizana FJ, Meza-Ducaud N, Saavedra MG, Tapia-Davegno C, Vergara AF, Villanueva J. A multiyear systematic survey of the quality of reporting for randomised trials in dentistry, neurology and geriatrics published in journals of Spain and Latin America. BMC Med Res Methodol 2021; 21:153. [PMID: 34311704 PMCID: PMC8314448 DOI: 10.1186/s12874-021-01337-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2020] [Accepted: 06/22/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The Iberoamerican Cochrane Network is currently developing an extensive project to identify Spanish-language journals that publish original clinical research in Spain and Latin America. The project is called BADERI (Database of Iberoamerican Essays and Journal) and feeds the research articles, mainly randomised clinical trials (RCTs), into CENTRAL (Cochrane Collaboration Central Register of Controlled Trials). This study aims to assess the quality of reporting of RCTs published in Spanish and Latin American journals for three clinical fields and assess changes over time. METHODS We did a systematic survey with time trend analysis of RCTs for dentistry, geriatrics, and neurology. These fields were chosen for pragmatic reasons as they had not yet been completed in BADERI. After screening RCTs from 1990 to 2018 for randomised or quasi-randomised clinical trials, we extracted data for 23 CONSORT items. The primary outcome was the total score of the 23 predefined CONSORT 2010 items for each RCT (score range from 0 to 34). The secondary outcome measure was the score for each one of these 23 items. RESULTS A total of 392 articles from 1990 to 2018 were included as follows: dentistry (282), neurology (80), and geriatrics (30). We found that the overall compliance score for the CONSORT items included in this study for all 392 RCTs analysed was 12.6 on a scale with a maximum score of 34. With time, the quality of reporting improved slightly for all RCTs. None of the articles achieved the complete individual CONSORT item compliance score. The lowest overall compliance percentage was for item 10 (Randomisation implementation) and item 24 (Protocol registration), with a dismal 1% compliance across all included RCTs, regardless of country. CONCLUSIONS CONSORT compliance is very poor in the 392 analysed RCTs. The impact of the CONSORT statement on improving the completeness of RCT reporting in Latin America and Spain is not clear. Iberoamerican journals should become more involved in endorsing and enforcing adherence to the CONSORT guidelines.
Affiliation(s)
- Vivienne C Bachelet
- Escuela de Medicina, Facultad de Ciencias Médicas, Universidad de Santiago de Chile (USACH), Avenida Libertador Bernardo O'Higgins 3363, Santiago, Estación Central, Chile
- María S Navarrete
- Escuela de Medicina, Facultad de Ciencias Médicas, Universidad de Santiago de Chile (USACH), Avenida Libertador Bernardo O'Higgins 3363, Santiago, Estación Central, Chile
- Constanza Barrera-Riquelme
- Escuela de Medicina, Facultad de Ciencias Médicas, Universidad de Santiago de Chile (USACH), Avenida Libertador Bernardo O'Higgins 3363, Santiago, Estación Central, Chile
- Víctor A Carrasco
- Escuela de Medicina, Facultad de Ciencias Médicas, Universidad de Santiago de Chile (USACH), Avenida Libertador Bernardo O'Higgins 3363, Santiago, Estación Central, Chile
- Matías Dallaserra
- Departamento de Cirugía Maxilofacial, Facultad de Odontología, Universidad de Chile, Santiago, Chile
- Rubén A Díaz
- Escuela de Medicina, Facultad de Ciencias Médicas, Universidad de Santiago de Chile (USACH), Avenida Libertador Bernardo O'Higgins 3363, Santiago, Estación Central, Chile
- Álvaro A Ibarra
- Escuela de Medicina, Facultad de Ciencias Médicas, Universidad de Santiago de Chile (USACH), Avenida Libertador Bernardo O'Higgins 3363, Santiago, Estación Central, Chile
- Francisca J Lizana
- Escuela de Medicina, Facultad de Ciencias Médicas, Universidad de Santiago de Chile (USACH), Avenida Libertador Bernardo O'Higgins 3363, Santiago, Estación Central, Chile
- Nicolás Meza-Ducaud
- Escuela de Medicina, Facultad de Ciencias Médicas, Universidad de Santiago de Chile (USACH), Avenida Libertador Bernardo O'Higgins 3363, Santiago, Estación Central, Chile
- Macarena G Saavedra
- Escuela de Medicina, Facultad de Ciencias Médicas, Universidad de Santiago de Chile (USACH), Avenida Libertador Bernardo O'Higgins 3363, Santiago, Estación Central, Chile
- Camila Tapia-Davegno
- Escuela de Medicina, Facultad de Ciencias Médicas, Universidad de Santiago de Chile (USACH), Avenida Libertador Bernardo O'Higgins 3363, Santiago, Estación Central, Chile
- Alonso F Vergara
- Escuela de Medicina, Facultad de Ciencias Médicas, Universidad de Santiago de Chile (USACH), Avenida Libertador Bernardo O'Higgins 3363, Santiago, Estación Central, Chile
- Julio Villanueva
- Departamento de Cirugía Maxilofacial, Facultad de Odontología, Universidad de Chile, Santiago, Chile
- Hospital Clínico San Borja-Arriarán, Santiago, Chile
13
You R, Liu Y, Mamitsuka H, Zhu S. BERTMeSH: deep contextual representation learning for large-scale high-performance MeSH indexing with full text. Bioinformatics 2021; 37:684-692. [PMID: 32976559 DOI: 10.1093/bioinformatics/btaa837] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/05/2020] [Revised: 09/02/2020] [Accepted: 09/11/2020] [Indexed: 02/06/2023]
Abstract
MOTIVATION With the rapid increase of biomedical articles, large-scale automatic Medical Subject Headings (MeSH) indexing has become increasingly important. FullMeSH, the only method for large-scale MeSH indexing with full text, suffers from three major drawbacks: FullMeSH (i) uses Learning To Rank, which is time-consuming, (ii) can capture only some pre-defined sections of the full text and (iii) ignores the whole MEDLINE database. RESULTS We propose a computationally lighter, full-text and deep-learning-based MeSH indexing method, BERTMeSH, which is flexible for section organization in full text. BERTMeSH combines two techniques: (i) the state-of-the-art pre-trained deep contextual representation, Bidirectional Encoder Representations from Transformers (BERT), which lets BERTMeSH capture deep semantics of full text; and (ii) a transfer learning strategy that uses both full text in PubMed Central (PMC) and title and abstract only (no full text) in MEDLINE, to take advantage of both. In our experiments, BERTMeSH was pre-trained with 3 million MEDLINE citations and trained on ∼1.5 million full texts in PMC. BERTMeSH outperformed various cutting-edge baselines. For example, for 20 K test articles of PMC, BERTMeSH achieved a Micro F-measure of 69.2%, which was 6.3% higher than FullMeSH with the difference being statistically significant. In addition, BERTMeSH needed 5 min to predict 20 K test articles, whereas FullMeSH took more than 10 h, demonstrating the computational efficiency of BERTMeSH. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
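The Micro F-measure reported in this abstract is the usual metric for multi-label indexing: true positives, predicted headings and gold headings are pooled over all articles before precision and recall are computed. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `micro_f_measure` and the toy label sets are assumptions made for illustration.

```python
from typing import Iterable, Set


def micro_f_measure(gold: Iterable[Set[str]], predicted: Iterable[Set[str]]) -> float:
    """Micro-averaged F-measure over per-article heading sets."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)      # headings that are both predicted and correct
        n_pred += len(p)      # all predicted headings
        n_gold += len(g)      # all gold (indexer-assigned) headings
    if tp == 0:
        return 0.0            # avoids division by zero when nothing matches
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): two articles with gold and predicted headings.
gold = [{"Humans", "Neoplasms"}, {"Humans", "Mice", "Diabetes Mellitus"}]
pred = [{"Humans", "Neoplasms", "Mice"}, {"Humans", "Diabetes Mellitus"}]
score = micro_f_measure(gold, pred)
```

In this toy case 4 of 5 predicted headings are correct and 4 of 5 gold headings are recovered, so pooled precision and recall are both 0.8. Because counts are pooled, articles with many headings weigh more than they would under a per-article (example-based) average.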
Affiliation(s)
- Ronghui You
- School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China
- Yuxuan Liu
- School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China
- Hiroshi Mamitsuka
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Japan; Department of Computer Science, Aalto University, Espoo, Finland
- Shanfeng Zhu
- School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China; Institute of Science and Technology for Brain-Inspired Intelligence and Shanghai Institute of Artificial Intelligence Algorithms, Shanghai 200433, China; Ministry of Education, Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; Bio-Med Big Data Center, Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai 200031, China
14
Zhao S, Su C, Lu Z, Wang F. Recent advances in biomedical literature mining. Brief Bioinform 2021; 22:bbaa057. [PMID: 32422651 PMCID: PMC8138828 DOI: 10.1093/bib/bbaa057] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 12/11/2019] [Revised: 03/22/2020] [Accepted: 03/25/2020] [Indexed: 01/26/2023]
Abstract
Recent years have witnessed a rapid increase in the number of scientific articles in the biomedical domain. This literature is mostly available and readily accessible in electronic format. The domain knowledge hidden in it is critical for biomedical research and applications, which puts biomedical literature mining (BLM) techniques in high demand. Numerous efforts have been made on this topic by both the biomedical informatics (BMI) and computer science (CS) communities. The BMI community focuses more on concrete application problems and thus prefers more interpretable and descriptive methods, while the CS community pursues superior performance and generalization ability, and thus develops more sophisticated and universal models. The goal of this paper is to review the recent advances in BLM from both communities and inspire new research directions.
Affiliation(s)
- Sendong Zhao
- Department of Healthcare Policy and Research, Weill Medical College of Cornell University, New York, NY 10065, USA
- Chang Su
- Division of Health Informatics, Department of Healthcare Policy and Research at Weill Cornell Medicine at Cornell University, New York, NY, USA
- Zhiyong Lu
- National Center for Biotechnology Information (NCBI) at National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Fei Wang
- Department of Healthcare Policy and Research, Weill Medical College of Cornell University, New York, NY 10065, USA
15
Koutsomitropoulos DA, Andriopoulos AD. Thesaurus-based word embeddings for automated biomedical literature classification. Neural Comput Appl 2021; 34:937-950. [PMID: 33994670 PMCID: PMC8111057 DOI: 10.1007/s00521-021-06053-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/06/2020] [Accepted: 04/15/2021] [Indexed: 11/29/2022]
Abstract
The special nature, volume and broadness of biomedical literature pose barriers for automated classification methods. On the other hand, manual indexing is time-consuming, costly and error-prone. We argue that current word embedding algorithms can be efficiently used to support the task of biomedical text classification even in a multilabel setting, with many distinct labels. The ontology representation of Medical Subject Headings provides machine-readable labels and specifies the dimensionality of the problem space. Both deep and shallow network approaches are implemented. Predictions are determined by the similarity between extracted features from contextualized representations of abstracts and headings. The addition of a separate classifier for transfer learning is also proposed and evaluated. Large datasets of biomedical citations are harvested for their metadata and used for training and testing. These automated approaches are still far from entirely substituting human experts, yet they can be useful as a mechanism for validation and recommendation. Dataset balancing, distributed processing and training parallelization on GPUs all play an important part in the effectiveness and performance of the proposed methods.
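The similarity-based prediction step described in this abstract can be illustrated with a small sketch: each heading and each abstract is represented as a vector, and the headings whose vectors are closest to the abstract's are suggested. This is an illustration under stated assumptions, not the authors' implementation: cosine similarity is assumed as the similarity measure, the three-dimensional vectors are made-up stand-ins for learned embeddings, and `rank_headings` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_headings(
    abstract_vec: Sequence[float],
    heading_vecs: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[str]:
    """Return the top_k headings most similar to the abstract vector."""
    ranked = sorted(
        heading_vecs,
        key=lambda h: cosine(abstract_vec, heading_vecs[h]),
        reverse=True,
    )
    return ranked[:top_k]


# Made-up embeddings (hypothetical); real ones would come from a trained model.
heading_vecs = {
    "Diabetes Mellitus": [0.9, 0.1, 0.0],
    "Neoplasms": [0.0, 0.2, 0.9],
    "Insulin": [0.7, 0.6, 0.1],
}
abstract_vec = [0.8, 0.3, 0.0]
top = rank_headings(abstract_vec, heading_vecs)
```

With these toy vectors the abstract is closest to "Diabetes Mellitus", then "Insulin". A fixed `top_k` is the simplest cut-off; a similarity threshold is an alternative when the number of headings per article varies widely.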
Affiliation(s)
- Andreas D Andriopoulos
- Department of Computer Engineering and Informatics, School of Engineering, University of Patras, Patras, Greece
16
Pita Costa J, Rei L, Stopar L, Fuart F, Grobelnik M, Mladenić D, Novalija I, Staines A, Pääkkönen J, Konttila J, Bidaurrazaga J, Belar O, Henderson C, Epelde G, Gabaráin MA, Carlin P, Wallace J. NewsMeSH: A new classifier designed to annotate health news with MeSH headings. Artif Intell Med 2021; 114:102053. [PMID: 33875160 DOI: 10.1016/j.artmed.2021.102053] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/06/2020] [Revised: 01/21/2021] [Accepted: 03/11/2021] [Indexed: 11/29/2022]
Abstract
MOTIVATION In the age of big data, the amount of scientific information available online dwarfs the ability of current tools to support researchers in locating and securing access to the necessary materials. Well-structured open data and the smart systems that make appropriate use of it are invaluable and can help health researchers and professionals find the appropriate information by, e.g., configuring the monitoring of information or refining a specific query on a disease. METHODS We present an automated text classifier approach based on the MEDLINE/MeSH thesaurus, trained on more than 26 million expert-annotated scientific abstracts. The classifier was tailored to public health and health research domain experts, in light of their specific challenges and needs. We applied the proposed methodology to three specific health domains: coronavirus, mental health and diabetes, considering the pertinence of the first, and the known relations with the other two health topics. RESULTS A classifier is trained on the MEDLINE dataset that can automatically annotate text, such as scientific articles, news articles or medical reports, with relevant concepts from the MeSH thesaurus. CONCLUSIONS The proposed text classifier shows promising results in the evaluation of health-related news. The application of the developed classifier enables the exploration of news and extraction of health-related insights, based on the MeSH thesaurus, through a workflow similar to that of PubMed, with which most health researchers are familiar.
Affiliation(s)
- Luis Rei
- Jožef Stefan Institute, Slovenia
- Luka Stopar
- Jožef Stefan Institute, Slovenia; Quintelligence, Slovenia
- Flavio Fuart
- Jožef Stefan Institute, Slovenia; Quintelligence, Slovenia
- Dunja Mladenić
- Jožef Stefan Institute, Slovenia; Quintelligence, Slovenia
- Gorka Epelde
- Vicomtech Foundation, Basque Research and Technology Alliance (BRTA), Spain; Biodonostia, Spain
- Mónica Arrúe Gabaráin
- Northern Ireland Department of Health, UK; Vicomtech Foundation, Basque Research and Technology Alliance (BRTA), Spain
17
Ru X, Ye X, Sakurai T, Zou Q. Application of learning to rank in bioinformatics tasks. Brief Bioinform 2021; 22:6102666. [PMID: 33454758 DOI: 10.1093/bib/bbaa394] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/03/2020] [Revised: 11/09/2020] [Accepted: 11/24/2020] [Indexed: 12/17/2022]
Abstract
Over the past decades, learning to rank (LTR) algorithms have been gradually applied to bioinformatics. Such methods have shown significant advantages in multiple research tasks in this field. Therefore, it is necessary to summarize and discuss the application of these algorithms so that they can be used conveniently and contribute to bioinformatics. In this paper, the characteristics of LTR algorithms and their strengths over other types of algorithms are analyzed from the perspective of multiple bioinformatics applications. Finally, the paper discusses the shortcomings of LTR algorithms, ways to make better use of them and some open problems that currently exist.
Affiliation(s)
- Xiucai Ye
- Department of Computer Science and Center for Artificial Intelligence Research (C-AIR), University of Tsukuba
- Quan Zou
- University of Electronic Science and Technology of China
18
Mylonas N, Karlos S, Tsoumakas G. A Multi-instance Multi-label Weakly Supervised Approach for Dealing with Emerging MeSH Descriptors. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-77211-6_47] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]