1
Thompson P, Ananiadou S, Basinas I, Brinchmann BC, Cramer C, Galea KS, Ge C, Georgiadis P, Kirkeleit J, Kuijpers E, Nguyen N, Nuñez R, Schlünssen V, Stokholm ZA, Taher EA, Tinnerberg H, Van Tongeren M, Xie Q. Supporting the working life exposome: Annotating occupational exposure for enhanced literature search. PLoS One 2024; 19:e0307844. [PMID: 39146349 PMCID: PMC11326626 DOI: 10.1371/journal.pone.0307844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Accepted: 07/12/2024] [Indexed: 08/17/2024] Open
Abstract
An individual's likelihood of developing non-communicable diseases is often influenced by the types, intensities and duration of exposures at work. Job exposure matrices provide exposure estimates associated with different occupations. However, due to their time-consuming expert curation process, job exposure matrices currently cover only a subset of possible workplace exposures and may not be regularly updated. Scientific literature articles describing exposure studies provide important supporting evidence for developing and updating job exposure matrices, since they report on exposures in a variety of occupational scenarios. However, the constant growth of scientific literature is increasing the challenges of efficiently identifying relevant articles and important content within them. Natural language processing methods emulate the human process of reading and understanding texts, but in a fraction of the time. Such methods can increase the efficiency of both finding relevant documents and pinpointing specific information within them, which could streamline the process of developing and updating job exposure matrices. Named entity recognition is a fundamental natural language processing method for language understanding, which automatically identifies mentions of domain-specific concepts (named entities) in documents, e.g., exposures, occupations and job tasks. State-of-the-art machine learning models typically use evidence from an annotated corpus, i.e., a set of documents in which named entities are manually marked up (annotated) by experts, to learn how to detect named entities automatically in new documents. We have developed a novel annotated corpus of scientific articles to support machine learning based named entity recognition relevant to occupational substance exposures. 
Through incremental refinements to the annotation process, we demonstrate that expert annotators can attain high levels of agreement, and that the corpus can be used to train high-performance named entity recognition models. The corpus thus constitutes an important foundation for the wider development of natural language processing tools to support the study of occupational exposures.
Affiliation(s)
- Paul Thompson
  - Department of Computer Science, National Centre for Text Mining, University of Manchester, Manchester, United Kingdom
- Sophia Ananiadou
  - Department of Computer Science, National Centre for Text Mining, University of Manchester, Manchester, United Kingdom
- Ioannis Basinas
  - Centre for Occupational and Environmental Health, School of Health Sciences, University of Manchester, Manchester, United Kingdom
- Bendik C Brinchmann
  - Federation of Norwegian Industries, Oslo, Norway
  - Department of Occupational Medicine and Epidemiology, National Institute of Occupational Health, Oslo, Norway
- Christine Cramer
  - Department of Public Health, Research Unit for Environment, Occupation and Health, Danish Ramazzini Centre, Aarhus University, Aarhus, Denmark
  - Department of Occupational Medicine, Danish Ramazzini Centre, Aarhus University Hospital, Aarhus, Denmark
- Karen S Galea
  - Institute of Occupational Medicine, Edinburgh, United Kingdom
- Calvin Ge
  - Netherlands Organisation for Applied Scientific Research, Utrecht, Netherlands
- Panagiotis Georgiadis
  - Department of Computer Science, National Centre for Text Mining, University of Manchester, Manchester, United Kingdom
- Jorunn Kirkeleit
  - Federation of Norwegian Industries, Oslo, Norway
  - Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
- Eelco Kuijpers
  - Netherlands Organisation for Applied Scientific Research, Utrecht, Netherlands
- Nhung Nguyen
  - Department of Computer Science, National Centre for Text Mining, University of Manchester, Manchester, United Kingdom
- Roberto Nuñez
  - Occupational Health Group, Institute for Risk Assessment Sciences, Utrecht University, Utrecht, Netherlands
- Vivi Schlünssen
  - Department of Public Health, Research Unit for Environment, Occupation and Health, Danish Ramazzini Centre, Aarhus University, Aarhus, Denmark
- Zara Ann Stokholm
  - Department of Occupational Medicine, Danish Ramazzini Centre, Aarhus University Hospital, Aarhus, Denmark
- Evana Amir Taher
  - Center for Occupational and Environmental Medicine, Stockholm, Sweden
- Håkan Tinnerberg
  - School of Public Health and Community Medicine, University of Gothenburg, Gothenburg, Sweden
  - Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Martie Van Tongeren
  - Centre for Occupational and Environmental Health, School of Health Sciences, University of Manchester, Manchester, United Kingdom
- Qianqian Xie
  - Department of Computer Science, National Centre for Text Mining, University of Manchester, Manchester, United Kingdom
2
Gehanno JF, Thaon I, Pelissier C, Rollin L. Assessment of search strategies in Medline to identify studies on the impact of long COVID on workability. Front Res Metr Anal 2024; 9:1300533. [PMID: 38495828 PMCID: PMC10940504 DOI: 10.3389/frma.2024.1300533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 02/19/2024] [Indexed: 03/19/2024] Open
Abstract
Objectives: Studies on the impact of long COVID on work capacity are increasing but are difficult to locate in bibliographic databases, due to the heterogeneity of the terms used to describe this new condition and its consequences. This study aims to report on the effectiveness of different search strategies to find studies on the impact of long COVID on work participation in PubMed and to create validated search strings. Methods: We searched PubMed for articles on long COVID that included information about work. Relevant articles were identified and their reference lists were screened. Occupational health journals were manually scanned to identify articles that could have been missed. A total of 885 potentially relevant articles were collected, of which 120 were finally included in a gold standard database. Recall, Precision, and Number Needed to Read (NNR) of various keywords or combinations of keywords were assessed. Results: Overall, 123 search terms, alone or in combination, were tested. The highest Recalls with a single MeSH term or textword were 23% and 90%, respectively. Two different search strings were developed, one optimizing Recall while keeping Precision acceptable (Recall 98.3%, Precision 15.9%, NNR 6.3) and one optimizing Precision while keeping Recall acceptable (Recall 90.8%, Precision 26.1%, NNR 3.8). Conclusions: No single MeSH term retrieves all relevant studies on the impact of long COVID on work ability in PubMed. The use of various MeSH and non-MeSH terms in combination is required to recover such studies without being overwhelmed by irrelevant articles.
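The three metrics named in this abstract can be illustrated with a minimal sketch. It assumes the usual definitions (Recall is the share of gold-standard articles retrieved, Precision is the share of retrieved articles that are relevant, and NNR is the reciprocal of Precision); the function names and the toy article identifiers are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchMetrics:
    recall: float
    precision: float
    nnr: float  # Number Needed to Read, assumed here to be 1 / precision


def evaluate_search(retrieved: set[str], gold: set[str]) -> SearchMetrics:
    """Score one search string against a gold-standard set of relevant IDs."""
    if not retrieved or not gold:
        raise ValueError("both sets must be non-empty")
    hits = len(retrieved & gold)          # relevant articles that were found
    recall = hits / len(gold)
    precision = hits / len(retrieved)
    nnr = float("inf") if hits == 0 else 1.0 / precision
    return SearchMetrics(recall, precision, nnr)


def nnr_from_precision(precision: float) -> float:
    """Articles a reader must screen, on average, to find one relevant one."""
    return 1.0 / precision


# Hypothetical toy data, not the study's gold standard.
gold = {"a1", "a2", "a3", "a4"}
retrieved = {"a1", "a2", "a3", "x1", "x2", "x3"}
metrics = evaluate_search(retrieved, gold)
print(metrics)

# The reported Precision values are consistent with the reported NNR values.
print(round(nnr_from_precision(0.159), 1), round(nnr_from_precision(0.261), 1))
```

Under this definition, the reported Precision figures of 15.9% and 26.1% round to the reported NNR values of 6.3 and 3.8.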
Affiliation(s)
- Jean-François Gehanno
  - Institute of Occupational Medicine, Rouen University Hospital, Rouen, France
  - Inserm, Rouen University, Sorbonne University, University of Paris 13, Laboratory of Medical Informatics and Knowledge Engineering in e-Health, LIMICS, Paris, France
- Isabelle Thaon
  - Centre de Consultations de Pathologie Professionnelle, CHRU de Nancy, Vandoeuvre les Nancy, Nancy, France
- Carole Pelissier
  - Centre Hospitalier Universitaire de Saint-Etienne, Université Lyon 1, Université de St Etienne, Université Gustave Eiffel-IFSTTAR, Saint-Etienne, France
  - UMRESTTE UMR-T9405, Saint-Etienne, France
- Laetitia Rollin
  - Institute of Occupational Medicine, Rouen University Hospital, Rouen, France
  - Inserm, Rouen University, Sorbonne University, University of Paris 13, Laboratory of Medical Informatics and Knowledge Engineering in e-Health, LIMICS, Paris, France
3
Ahmad I, Amelio A, Merla A, Scozzari F. A survey on the role of artificial intelligence in managing Long COVID. Front Artif Intell 2024; 6:1292466. [PMID: 38274052 PMCID: PMC10808521 DOI: 10.3389/frai.2023.1292466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 12/26/2023] [Indexed: 01/27/2024] Open
Abstract
In recent years, several artificial intelligence techniques have been applied to data from COVID-19. In addition to the symptoms related to COVID-19, many individuals with SARS-CoV-2 infection have described various long-lasting symptoms, now termed Long COVID. In this context, artificial intelligence techniques have been utilized to analyze data from Long COVID patients in order to assist doctors and alleviate the considerable strain on care and rehabilitation facilities. In this paper, we explore the impact of the machine learning methodologies that have been applied to analyze the many aspects of Long COVID syndrome, from clinical presentation through diagnosis. We also include the text mining techniques used to extract insights and trends from large amounts of text data related to Long COVID. Finally, we critically compare the various approaches and outline the work that has to be done to create a robust artificial intelligence approach for efficient diagnosis and treatment of Long COVID.
Affiliation(s)
- Ijaz Ahmad
  - Department of Human, Legal and Economic Sciences, Telematic University “Leonardo da Vinci”, Chieti, Italy
- Alessia Amelio
  - Department of Engineering and Geology, University “G. d'Annunzio” Chieti-Pescara, Pescara, Italy
- Arcangelo Merla
  - Department of Engineering and Geology, University “G. d'Annunzio” Chieti-Pescara, Pescara, Italy
- Francesca Scozzari
  - Laboratory of Computational Logic and Artificial Intelligence, Department of Economic Studies, University “G. d'Annunzio” Chieti-Pescara, Pescara, Italy
4
Somayajula SA, Litake O, Liang Y, Hosseini R, Nemati S, Wilson DO, Weinreb RN, Malhotra A, Xie P. Improving long COVID-related text classification: a novel end-to-end domain-adaptive paraphrasing framework. Sci Rep 2024; 14:85. [PMID: 38168099 PMCID: PMC10761882 DOI: 10.1038/s41598-023-48594-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 11/28/2023] [Indexed: 01/05/2024] Open
Abstract
The emergence of long COVID during the ongoing COVID-19 pandemic has presented considerable challenges for healthcare professionals and researchers. The task of identifying relevant literature is particularly daunting due to the rapidly evolving scientific landscape, inconsistent definitions, and a lack of standardized nomenclature. This paper proposes a novel solution to this challenge by employing machine learning techniques to classify long COVID literature. However, the scarcity of annotated data for machine learning poses a significant obstacle. To overcome this, we introduce a strategy called medical paraphrasing, which diversifies the training data while maintaining the original content. Additionally, we propose a Data-Reweighting-Based Multi-Level Optimization Framework for Domain Adaptive Paraphrasing, supported by a Meta-Weight-Network (MWN). This innovative approach incorporates feedback from the downstream text classification model to influence the training of the paraphrasing model. During the training process, the framework assigns higher weights to the training examples that contribute more effectively to the downstream task of long COVID text classification. Our findings demonstrate that this method substantially improves the accuracy and efficiency of long COVID literature classification, offering a valuable tool for physicians and researchers navigating this complex and ever-evolving field.
Affiliation(s)
- Sai Ashish Somayajula
  - Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA
- Onkar Litake
  - Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA
- Youwei Liang
  - Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA
- Ramtin Hosseini
  - Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA
- Shamim Nemati
  - Division of Biomedical Informatics, University of California, La Jolla, San Diego, USA
- David O Wilson
  - Department of Medicine, University of Pittsburgh Medical Center, Pittsburgh, USA
- Robert N Weinreb
  - Hamilton Glaucoma Center, Shiley Eye Center and Department of Ophthalmology, University of California, La Jolla, San Diego, USA
- Atul Malhotra
  - UC San Diego Health, Department of Medicine, La Jolla, San Diego, USA
- Pengtao Xie
  - Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA