1
Madan S, Lentzen M, Brandt J, Rueckert D, Hofmann-Apitius M, Fröhlich H. Transformer models in biomedicine. BMC Med Inform Decis Mak 2024; 24:214. [PMID: 39075407 PMCID: PMC11287876 DOI: 10.1186/s12911-024-02600-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Accepted: 07/08/2024] [Indexed: 07/31/2024] Open
Abstract
Deep neural networks (DNN) have fundamentally revolutionized the artificial intelligence (AI) field. The transformer model is a type of DNN that was originally used for natural language processing tasks and has since gained increasing attention for processing various kinds of sequential data, including biological sequences and structured electronic health records. Along with this development, transformer-based models such as BioBERT, MedBERT, and MassGenie have been trained and deployed by researchers to answer various scientific questions originating in the biomedical domain. In this paper, we review the development and application of transformer models for analyzing various biomedical datasets such as biomedical textual data, protein sequences, structured longitudinal medical data, biomedical images, and graphs. We also look at explainable AI strategies that help to interpret the predictions of transformer-based models. Finally, we discuss the limitations and challenges of current models and point out emerging research directions.
Affiliation(s)
- Sumit Madan
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53757, Germany
  - Institute of Computer Science, University of Bonn, Bonn, 53115, Germany
- Manuel Lentzen
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53757, Germany
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, 53115, Germany
- Johannes Brandt
  - School of Medicine, Klinikum Rechts der Isar, Technical University Munich, Munich, Germany
- Daniel Rueckert
  - School of Medicine, Klinikum Rechts der Isar, Technical University Munich, Munich, Germany
  - School of Computation, Information and Technology, Technical University Munich, Munich, Germany
  - Department of Computing, Imperial College London, London, UK
- Martin Hofmann-Apitius
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53757, Germany
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, 53115, Germany
- Holger Fröhlich
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53757, Germany
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, 53115, Germany
2
Ullah S, Li Y, Rahman W, Ullah F, Ijaz M, Ullah A, Ahmad G, Ullah H, Gao T. CO-19 PDB 2.0: A Comprehensive COVID-19 Database with Global Auto-Alerts, Statistical Analysis, and Cancer Correlations. Database (Oxford) 2024; 2024:baae072. [PMID: 39066515 PMCID: PMC11281848 DOI: 10.1093/database/baae072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2024] [Revised: 06/13/2024] [Accepted: 07/15/2024] [Indexed: 07/28/2024]
Abstract
Biological databases serve as a critical foundation for modern research, and amid the dynamic landscape of biology, the COVID-19 database has emerged as an indispensable resource. The global outbreak of COVID-19, which began in December 2019, necessitates comprehensive databases to unravel the intricate connections between this novel virus and cancer. Despite existing databases, a crucial need persists for a centralized and accessible method to acquire precise information within the research community. The main aim of this work is to develop a database that makes all COVID-19-related data available in one click, with automatic global notifications. This gap is addressed by the meticulously designed COVID-19 Pandemic Database (CO-19 PDB 2.0), positioned as a comprehensive resource for researchers navigating the complexities of COVID-19 and cancer. Between December 2019 and June 2024, the CO-19 PDB 2.0 systematically collected and organized 120 datasets into six distinct categories, each catering to specific functionalities. These categories encompass a chemical structure database, a digital image database, a visualization tool database, a genomic database, a social science database, and a literature database. Functionalities range from image analysis and gene sequence information to data visualization and updates on environmental events. CO-19 PDB 2.0 lets users choose either the database search page or the auto-notification page, providing seamless retrieval of information. The dedicated page introduces six predefined charts, providing insights into crucial criteria such as 'number of cases and deaths', country-wise distribution, 'new cases and recovery', and rates of death and recovery. The global impact of COVID-19 on cancer patients has led to extensive collaboration among research institutions, producing numerous articles and computational studies published in international journals.
A key feature of this initiative is auto daily notifications for standardized information updates. Users can easily navigate based on different categories or use a direct search option. The study offers up-to-date COVID-19 datasets and global statistics on COVID-19 and cancer, highlighting the top 10 cancers diagnosed in the USA in 2022. Breast and prostate cancers are the most common, representing 30% and 26% of new cases, respectively. The initiative also ensures the removal or replacement of dead links, providing a valuable resource for researchers, healthcare professionals, and individuals. The database has been implemented in PHP, HTML, CSS and MySQL and is available freely at https://www.co-19pdb.habdsk.org/. Database URL: https://www.co-19pdb.habdsk.org/.
Affiliation(s)
- Yingmei Li
  - Department of Pharmacy, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, China
  - Big Data Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, China
- Anees Ullah
  - S Khan Lab Mardan, Khyber Pakhtunkhwa, Pakistan
- Tianshun Gao
  - Big Data Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, China
3
Gustavson AM, Morrow CD, Brown RJ, Kaka AS, Sowerby C, Wilt TJ, Diem SJ. Reimagining How We Synthesize Information to Impact Clinical Care, Policy, and Research Priorities in Real Time: Examples and Lessons Learned from COVID-19. J Gen Intern Med 2024:10.1007/s11606-024-08855-y. [PMID: 38926318 DOI: 10.1007/s11606-024-08855-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 06/03/2024] [Indexed: 06/28/2024]
Abstract
Real-time clinical care, policy, and research decisions need real-time evidence synthesis. However, as we found during the COVID-19 pandemic, it is challenging to rapidly address key clinical and policy questions through rigorous, relevant, and usable evidence. Our objective is to present three exemplar cases of rapid evidence synthesis products from the Veterans Healthcare Administration Evidence Synthesis Program (ESP) and, in the context of these examples, outline ESP products, challenges, and lessons learned. We faced challenges in (1) balancing scientific rigor with the speed at which evidence synthesis was needed, (2) sorting through rapidly evolving large bodies of evidence, and (3) assessing the impact of evidence synthesis products on clinical care, policy, and research. We found solutions in (1) engaging stakeholders early, (2) utilizing artificial intelligence capabilities, (3) building infrastructure to establish living reviews, and (4) planning for dissemination to maximize impact.
Affiliation(s)
- Allison M Gustavson
  - Veterans Affairs Health Services Research and Development Center for Care Delivery and Outcomes Research, Minneapolis Veterans Affairs Healthcare System, Minneapolis, MN, USA
  - Department of Medicine, University of Minnesota, Minneapolis, MN, USA
- Rebecca Jl Brown
  - Veterans Affairs Health Services Research and Development Center for Care Delivery and Outcomes Research, Minneapolis Veterans Affairs Healthcare System, Minneapolis, MN, USA
  - School of Nursing, University of Minnesota, Minneapolis, MN, USA
- Anjum S Kaka
  - Veterans Affairs Health Services Research and Development Center for Care Delivery and Outcomes Research, Minneapolis Veterans Affairs Healthcare System, Minneapolis, MN, USA
  - Department of Medicine, University of Minnesota, Minneapolis, MN, USA
- Catherine Sowerby
  - Veterans Affairs Health Services Research and Development Center for Care Delivery and Outcomes Research, Minneapolis Veterans Affairs Healthcare System, Minneapolis, MN, USA
- Timothy J Wilt
  - Veterans Affairs Health Services Research and Development Center for Care Delivery and Outcomes Research, Minneapolis Veterans Affairs Healthcare System, Minneapolis, MN, USA
  - Department of Medicine, University of Minnesota, Minneapolis, MN, USA
  - Division of Health Policy and Management, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Susan J Diem
  - Veterans Affairs Health Services Research and Development Center for Care Delivery and Outcomes Research, Minneapolis Veterans Affairs Healthcare System, Minneapolis, MN, USA
  - Department of Medicine, University of Minnesota, Minneapolis, MN, USA
4
Hagerman L, Clark EC, Neil-Sztramko SE, Colangeli T, Dobbins M. Features of databases that supported searching for rapid evidence synthesis during COVID-19: implications for future public health emergencies. BMC Med Res Methodol 2024; 24:135. [PMID: 38907198 PMCID: PMC11191239 DOI: 10.1186/s12874-024-02246-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 05/14/2024] [Indexed: 06/23/2024] Open
Abstract
BACKGROUND As evidence related to the COVID-19 pandemic surged, databases, platforms, and repositories evolved with features and functions to assist users in promptly finding the most relevant evidence. In response, research synthesis teams adopted novel searching strategies to sift through the vast amount of evidence to synthesize and disseminate the most up-to-date evidence. This paper explores the key database features that facilitated systematic searching for rapid evidence synthesis during the COVID-19 pandemic to inform knowledge management infrastructure during future global health emergencies. METHODS This paper outlines the features and functions of previously existing and newly created evidence sources routinely searched as part of the NCCMT's Rapid Evidence Service methods, including databases, platforms, and repositories. Specific functions of each evidence source were assessed as they pertain to searching in the context of a public health emergency, including the topics of indexed citations, the level of evidence of indexed citations, and specific usability features of each evidence source. RESULTS Thirteen evidence sources were assessed, of which four were newly created and nine were either pre-existing or adapted from previously existing resources. Evidence sources varied in topics indexed, level of evidence indexed, and specific searching functions. CONCLUSION This paper offers insights into which features enabled systematic searching for the completion of rapid reviews to inform decision makers within 5-10 days. These findings provide guidance for knowledge management strategies and evidence infrastructures during future public health emergencies.
Affiliation(s)
- Leah Hagerman
  - National Collaborating Centre for Methods and Tools, McMaster University, McMaster Innovation Park, 175 Longwood Rd S, Suite 210a, Hamilton, ON, L8P 0A1, Canada
- Emily C Clark
  - National Collaborating Centre for Methods and Tools, McMaster University, McMaster Innovation Park, 175 Longwood Rd S, Suite 210a, Hamilton, ON, L8P 0A1, Canada
- Sarah E Neil-Sztramko
  - National Collaborating Centre for Methods and Tools, McMaster University, McMaster Innovation Park, 175 Longwood Rd S, Suite 210a, Hamilton, ON, L8P 0A1, Canada
  - Department of Health Research Methods, Evidence & Impact, McMaster University, McMaster University Medical Centre, 2C Area, 1280 Main St W, Hamilton, ON, L8S 4K1, Canada
- Taylor Colangeli
  - National Collaborating Centre for Methods and Tools, McMaster University, McMaster Innovation Park, 175 Longwood Rd S, Suite 210a, Hamilton, ON, L8P 0A1, Canada
- Maureen Dobbins
  - National Collaborating Centre for Methods and Tools, McMaster University, McMaster Innovation Park, 175 Longwood Rd S, Suite 210a, Hamilton, ON, L8P 0A1, Canada
  - School of Nursing, Health Sciences Centre, McMaster University, 2J20, 1280 Main St W, Hamilton, ON, L8S 4K1, Canada
5
Invernici F, Bernasconi A, Ceri S. Searching COVID-19 Clinical Research Using Graph Queries: Algorithm Development and Validation. J Med Internet Res 2024; 26:e52655. [PMID: 38814687 PMCID: PMC11176882 DOI: 10.2196/52655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2023] [Revised: 03/06/2024] [Accepted: 03/30/2024] [Indexed: 05/31/2024] Open
Abstract
BACKGROUND Since the beginning of the COVID-19 pandemic, >1 million studies have been collected within the COVID-19 Open Research Dataset, a corpus of manuscripts created to accelerate research against the disease. Their related abstracts hold a wealth of information that remains largely unexplored and difficult to search due to its unstructured nature. Keyword-based search is the standard approach, which allows users to retrieve the documents of a corpus that contain (all or some of) the words in a target list. This type of search, however, does not provide visual support to the task and is not suited to expressing complex queries or compensating for missing specifications. OBJECTIVE This study aims to consider small graphs of concepts and exploit them for expressing graph searches over existing COVID-19-related literature, leveraging the increasing use of graphs to represent and query scientific knowledge and providing a user-friendly search and exploration experience. METHODS We considered the COVID-19 Open Research Dataset corpus and summarized its content by annotating the publications' abstracts using terms selected from the Unified Medical Language System and the Ontology of Coronavirus Infectious Disease. Then, we built a co-occurrence network that includes all relevant concepts mentioned in the corpus, establishing connections when their mutual information is relevant. A sophisticated graph query engine was built to allow the identification of the best matches of graph queries on the network. It also supports partial matches and suggests potential query completions using shortest paths. 
RESULTS We built a large co-occurrence network, consisting of 128,249 entities and 47,198,965 relationships. The GRAPH-SEARCH interface allows users to explore the network by formulating or adapting graph queries. It produces a bibliography of publications, which are globally ranked, and each publication is further associated with the specific parts of the query that it explains, thereby allowing the user to understand each aspect of the matching. CONCLUSIONS Our approach supports the process of query formulation and evidence search upon a large text corpus; it can be reapplied to any scientific domain where document corpora and curated ontologies are made available.
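The pipeline described in this abstract (annotated abstracts, a co-occurrence network filtered by mutual information, and shortest paths for query completion) can be illustrated with a small sketch. The abstract does not say which mutual-information measure or path algorithm the authors used, so the pointwise mutual information (PMI) threshold, the unweighted breadth-first search, and all function names and toy concepts below are assumptions made for illustration only.

```python
import math
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence_graph(docs, min_pmi=0.0):
    """Build an undirected concept graph from per-abstract concept sets.

    docs: list of sets of concept labels (one set per annotated abstract).
    An edge is kept only when the PMI of the pair exceeds min_pmi
    (an assumed stand-in for "mutual information is relevant").
    """
    n_docs = len(docs)
    concept_count = defaultdict(int)
    pair_count = defaultdict(int)
    for concepts in docs:
        for concept in concepts:
            concept_count[concept] += 1
        for a, b in combinations(sorted(concepts), 2):
            pair_count[(a, b)] += 1

    graph = defaultdict(dict)
    for (a, b), n_ab in pair_count.items():
        # PMI = log2( P(a, b) / (P(a) * P(b)) ), estimated from document counts.
        pmi = math.log2(n_ab * n_docs / (concept_count[a] * concept_count[b]))
        if pmi > min_pmi:
            graph[a][b] = pmi
            graph[b][a] = pmi
    return graph


def shortest_path(graph, source, target):
    """Unweighted breadth-first shortest path; returns a node list or None."""
    if source == target:
        return [source]
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, {}):
            if neighbour not in parent:
                parent[neighbour] = node
                if neighbour == target:
                    path = [neighbour]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(neighbour)
    return None


# Hypothetical toy corpus: each set is the concepts found in one abstract.
toy_docs = [
    {"covid", "anosmia"},
    {"covid", "anosmia"},
    {"anosmia", "olfactory_bulb"},
    {"covid", "fever"},
    {"fever", "ibuprofen"},
    {"vaccine"},
]
toy_graph = build_cooccurrence_graph(toy_docs)
completion = shortest_path(toy_graph, "covid", "olfactory_bulb")
```

In this toy corpus the pair (covid, fever) co-occurs exactly as often as chance predicts, so its PMI is 0 and the edge is dropped; a query linking "covid" to "olfactory_bulb" is completed through "anosmia". A production system over millions of edges would need a proper graph store and ranking, which this sketch does not attempt.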
Affiliation(s)
- Francesco Invernici
  - Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Milan, Italy
- Anna Bernasconi
  - Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Milan, Italy
- Stefano Ceri
  - Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Milan, Italy
6
Zhao D, Cheng S, Tsui FR, Mathur MB, Wang CHJ. The Risk of Aircraft-Acquired SARS-CoV-2 Transmission during Commercial Flights: A Systematic Review. Int J Environ Res Public Health 2024; 21:654. [PMID: 38928901 PMCID: PMC11203943 DOI: 10.3390/ijerph21060654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 05/09/2024] [Accepted: 05/15/2024] [Indexed: 06/28/2024]
Abstract
The aircraft-acquired transmission of SARS-CoV-2 poses a public health risk. Following PRISMA guidelines, we conducted a systematic review and analysis of articles, published prior to vaccines being available, from 24 January 2020 to 20 April 2021 to identify factors important for transmission. Articles were included if they mentioned index cases and identifiable flight duration, and excluded if they discussed non-commercial aircraft, airflow or transmission models, or cases without flight data, or if they could not determine in-flight transmission. From the 15 articles selected for in-depth review, 50 total flights were analyzed by flight duration, both as a categorical variable (short [<3 h], medium [3-6 h], or long [>6 h] flights) and as a continuous variable, with case counts modeled by negative binomial regression. Compared to short flights without masking, medium and long flights without masking were associated with a 4.66-fold increase (95% CI: [1.01, 21.52]; p < 0.0001) and a 25.93-fold increase in incidence rates (95% CI: [4.1, 164]; p < 0.0001), respectively; long flights with enforced masking had no transmission reported. A 1 h increase in flight duration was associated with a 1.53-fold (95% CI: [1.19, 1.66]; p < 0.001) increase in the incidence rate ratio (IRR) of cases. Masking should be considered for long flights.
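Because negative binomial regression models the log of the expected case count as a linear function of the covariates, a per-hour IRR compounds multiplicatively over longer durations. The short sketch below shows that arithmetic using the 1.53 per-hour estimate quoted in the abstract; the function names are hypothetical, and extrapolating the estimate to multi-hour differences is an illustration of the model's form, not a result reported by the authors.

```python
import math


def irr_from_coefficient(beta):
    """Convert a log-link regression coefficient into an incidence rate ratio."""
    return math.exp(beta)


def irr_over_hours(irr_per_hour, hours):
    """Rate ratio implied by a log-linear model for a difference of `hours`.

    Under a log link, each extra hour multiplies the expected count by
    irr_per_hour, so k extra hours multiply it by irr_per_hour ** k.
    """
    return irr_per_hour ** hours


# Per-hour IRR taken from the abstract; the 3-hour horizon is an arbitrary example.
per_hour = 1.53
beta = math.log(per_hour)          # coefficient a fitted model would report
three_hour_ratio = irr_over_hours(per_hour, 3)   # about 3.58
```

Under these assumptions, a flight three hours longer would carry roughly 3.6 times the expected case count, which shows why the categorical estimates for medium and long flights rise so steeply.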
Affiliation(s)
- Diana Zhao
  - Center for Policy, Outcomes and Prevention, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Stephanie Cheng
  - Center for Policy, Outcomes and Prevention, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Fuchiang R. Tsui
  - Department of Anesthesiology and Critical Care Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19146, USA
- Maya B. Mathur
  - Quantitative Sciences Unit, Department of Pediatrics, Stanford University, Stanford, CA 94305, USA
- Chih-Hung Jason Wang
  - Center for Policy, Outcomes and Prevention, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Center for Health Policy, Freeman-Spogli Institute for International Studies, Stanford University, Stanford, CA 94305, USA
7
Hogberg HT, Tsaioun K, Breidenbach JD, Elmore B, Filipovska J, Garcia-Reyero N, Hargreaves AJ, Joshi O, Omeragic E, Plant S, Ram R, Virmani I, Waspe J, Macmillan DS. A systematic scoping review of the neurological effects of COVID-19. Neurotoxicology 2024; 103:16-26. [PMID: 38763473 DOI: 10.1016/j.neuro.2024.05.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 05/10/2024] [Accepted: 05/15/2024] [Indexed: 05/21/2024]
Abstract
BACKGROUND The global coronavirus disease 2019 (COVID-19) pandemic began in early 2020, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In mid-2020 the CIAO (Modelling the Pathogenesis of COVID-19 Using the Adverse Outcome Pathway Framework) project was established, bringing together over 75 interdisciplinary scientists worldwide to collaboratively investigate the underlying biological mechanisms of COVID-19 and consolidate the data using the Adverse Outcome Pathway (AOP) Framework. Neurological symptoms such as anosmia and encephalitis have been frequently reported to be associated with infection with SARS-CoV-2. OBJECTIVE Within CIAO, a working group was formed to conduct a systematic scoping review of COVID-19 and its related neurological symptoms to determine which key events and modulating factors are most commonly reported and to identify knowledge gaps. DESIGN LitCOVID was used to retrieve 86,075 papers, of which 10,244 contained relevant keywords. After title and abstract screening, 2,328 remained and their full texts were reviewed based on predefined inclusion and exclusion criteria. In total, 991 studies fulfilled the inclusion criteria and were retrieved to conduct knowledge synthesis. RESULTS The majority of publications reported human observational studies. Early key events were less likely to be reported compared to middle and late key events/adverse outcomes. The majority of modulating factors described related to age or sex. Less recognised COVID-19-associated adverse outcomes (AO) or neurological effects of COVID-19 were also identified, including multiple sclerosis/demyelination, neurodegeneration/cognitive effects and peripheral neuronal effects. CONCLUSION There were many methodological and reporting issues noted in the reviewed studies. In particular, publication abstracts would benefit from clearer reporting of the methods and endpoints used and the key findings, to ensure relevant papers are included when systematic reviews are conducted.
The information extracted from the scoping review may be useful in understanding the mechanisms of neurological effects of COVID-19 and to further develop or support existing AOPs linking COVID-19 and its neurological key events and adverse outcomes. Further evaluation of the less recognised COVID-19 effects is needed.
Affiliation(s)
- Helena T Hogberg
  - National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
  - Evidence-Based Toxicology Collaboration (EBTC), Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Katya Tsaioun
  - Evidence-Based Toxicology Collaboration (EBTC), Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Joshua D Breidenbach
  - Biochemistry and Biotechnology Group, Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Natalia Garcia-Reyero
  - Office of the Secretary of Defense Energy, Installations & Environment, Washington, DC, USA
- Elma Omeragic
  - University of Sarajevo-Faculty of Pharmacy, Sarajevo, Bosnia and Herzegovina
- Ishita Virmani
  - Centre for Alternatives to Animal Testing, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
  - RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- Donna S Macmillan
  - Humane Society International, 1255 23rd St. NW, Suite 450, Washington, DC 20037, USA
8
Qiu X, Wang C, Li B, Tong H, Tan X, Yang L, Tao J, Huang J. An audio-semantic multimodal model for automatic obstructive sleep Apnea-Hypopnea Syndrome classification via multi-feature analysis of snoring sounds. Front Neurosci 2024; 18:1336307. [PMID: 38800571 PMCID: PMC11116639 DOI: 10.3389/fnins.2024.1336307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 04/29/2024] [Indexed: 05/29/2024] Open
Abstract
Introduction Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a common sleep-related breathing disorder that significantly impacts the daily lives of patients. Currently, the diagnosis of OSAHS relies on various physiological signal monitoring devices, requiring a comprehensive Polysomnography (PSG). However, this invasive diagnostic method faces challenges such as data fluctuation and high costs. To address these challenges, we propose a novel data-driven Audio-Semantic Multi-Modal model for OSAHS severity classification (i.e., ASMM-OSA) based on patient snoring sound characteristics. Methods In light of the correlation between the acoustic attributes of a patient's snoring patterns and their episodes of breathing disorders, we utilize the patient's sleep audio recordings as an initial screening modality. We analyze the audio features of snoring sounds during the night for subjects suspected of having OSAHS. Audio features were augmented via PubMedBERT to enrich their diversity and detail and subsequently classified for OSAHS severity using XGBoost based on the number of sleep apnea events. Results Experimental results using the OSAHS dataset from a collaborative university hospital demonstrate that our ASMM-OSA audio-semantic multimodal model achieves a diagnostic level in automatically identifying sleep apnea events and classifying the four-class severity (normal, mild, moderate, and severe) of OSAHS. Discussion Our proposed model promises new perspectives for non-invasive OSAHS diagnosis, potentially reducing costs and enhancing patient quality of life.
Affiliation(s)
- Xihe Qiu
  - School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
- Chenghao Wang
  - School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
- Bin Li
  - School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
- Huijie Tong
  - School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
- Xiaoyu Tan
  - INF Technology (Shanghai) Co., Ltd., Shanghai, China
- Long Yang
  - Department of Otolaryngology, Shenzhen Second People's Hospital, Shenzhen, China
- Jing Tao
  - Department of Otolaryngology, Shenzhen Second People's Hospital, Shenzhen, China
9
Chandrabhatla AS, Narahari AK, Horgan TM, Patel PD, Sturek JM, Davis CL, Jackson PEH, Bell TD. Machine Learning-based Analysis of Publications Funded by the National Institutes of Health's Initial COVID-19 Pandemic Response. Open Forum Infect Dis 2024; 11:ofae156. [PMID: 38659624 PMCID: PMC11041405 DOI: 10.1093/ofid/ofae156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2024] [Accepted: 03/14/2024] [Indexed: 04/26/2024] Open
Abstract
Background The National Institutes of Health (NIH) mobilized more than $4 billion in extramural funding for the COVID-19 pandemic. Assessing the research output from this effort is crucial to understanding how the scientific community leveraged federal funding and responded to this public health crisis. Methods NIH-funded COVID-19 grants awarded between January 2020 and December 2021 were identified from NIH Research Portfolio Online Reporting Tools Expenditures and Results using the "COVID-19 Response" filter. PubMed identifications of publications under these grants were collected and the NIH iCite tool was used to determine citation counts and focus (eg, clinical, animal). iCite and the NIH's LitCOVID database were used to identify publications directly related to COVID-19. Publication titles and Medical Subject Heading terms were used as inputs to a machine learning-based model built to identify common topics/themes within the publications. Results and Conclusions We evaluated 2401 grants that resulted in 14 654 publications. The majority of these papers were published in peer-reviewed journals, though 483 were published to preprint servers. In total, 2764 (19%) papers were directly related to COVID-19 and generated 252 029 citations. These papers were mostly clinically focused (62%), followed by cell/molecular (32%), and animal focused (6%). Roughly 60% of preprint publications were cell/molecular-focused, compared with 26% of nonpreprint publications. The machine learning-based model identified the top 3 research topics to be clinical trials and outcomes research (8.5% of papers), coronavirus-related heart and lung damage (7.3%), and COVID-19 transmission/epidemiology (7.2%). This study provides key insights regarding how researchers leveraged federal funding to study the COVID-19 pandemic during its initial phase.
Affiliation(s)
- Adishesh K Narahari
  - Division of Cardiothoracic Surgery, University of Virginia School of Medicine, Charlottesville, Virginia, USA
- Taylor M Horgan
  - School of Medicine, University of Virginia, Charlottesville, Virginia, USA
- Paranjay D Patel
  - Department of Cardiovascular Surgery, Houston Methodist Hospital, Houston, Texas, USA
- Jeffrey M Sturek
  - School of Medicine, University of Virginia, Charlottesville, Virginia, USA
  - Division of Pulmonary and Critical Care Medicine, University of Virginia, Charlottesville, Virginia, USA
- Claire L Davis
  - School of Medicine, University of Virginia, Charlottesville, Virginia, USA
  - Division of Pulmonary and Critical Care Medicine, University of Virginia, Charlottesville, Virginia, USA
- Patrick E H Jackson
  - School of Medicine, University of Virginia, Charlottesville, Virginia, USA
  - Division of Infectious Diseases and International Health, University of Virginia, Charlottesville, Virginia, USA
- Taison D Bell
  - School of Medicine, University of Virginia, Charlottesville, Virginia, USA
  - Division of Pulmonary and Critical Care Medicine, University of Virginia, Charlottesville, Virginia, USA
  - Division of Infectious Diseases and International Health, University of Virginia, Charlottesville, Virginia, USA
10
Liu H, Soroush A, Nestor JG, Park E, Idnay B, Fang Y, Pan J, Liao S, Bernard M, Peng Y, Weng C. Retrieval augmented scientific claim verification. JAMIA Open 2024; 7:ooae021. [PMID: 38455840 PMCID: PMC10919922 DOI: 10.1093/jamiaopen/ooae021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/19/2024] [Accepted: 02/14/2024] [Indexed: 03/09/2024] Open
Abstract
Objective To automate scientific claim verification using PubMed abstracts. Materials and Methods We developed CliVER, an end-to-end scientific Claim VERification system that leverages retrieval-augmented techniques to automatically retrieve relevant clinical trial abstracts, extract pertinent sentences, and use the PICO framework to support or refute a scientific claim. We also created an ensemble of three state-of-the-art deep learning models to classify each rationale as support, refute, or neutral. We then constructed CoVERt, a new COVID VERification dataset comprising 15 PICO-encoded drug claims accompanied by 96 manually selected and labeled clinical trial abstracts that either support or refute each claim. We used CoVERt and SciFact (a public scientific claim verification dataset) to assess CliVER's performance in predicting labels. Finally, we compared CliVER to clinicians in the verification of 19 claims from 6 disease domains, using 189 648 PubMed abstracts extracted from January 2010 to October 2021. Results In the evaluation of label prediction accuracy on CoVERt, CliVER achieved a notable F1 score of 0.92, highlighting the efficacy of the retrieval-augmented models. The ensemble model outperformed each individual state-of-the-art model by an absolute increase of 3% to 11% in the F1 score. Moreover, when compared with four clinicians, CliVER achieved a precision of 79.0% for abstract retrieval, 67.4% for sentence selection, and 63.2% for label prediction. Conclusion CliVER demonstrates its early potential to automate scientific claim verification using retrieval-augmented strategies to harness the wealth of clinical trial abstracts in PubMed. Future studies are warranted to further test its clinical utility.
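The abstract mentions an ensemble of three models that assigns one of three labels to each rationale, but it does not state how the individual predictions are combined. One common option, sketched here purely as an assumption, is a majority vote with a neutral fallback when no label wins outright. The label strings, the function name, and the tie-breaking rule are all hypothetical and are not taken from CliVER.

```python
from collections import Counter

# Hypothetical label names for the three classes named in the abstract.
LABELS = ("SUPPORT", "REFUTE", "NEUTRAL")


def majority_vote(predictions, fallback="NEUTRAL"):
    """Combine per-model labels into one label by plurality.

    predictions: one label per model, e.g. three labels for three models.
    If the two most frequent labels are tied, return `fallback`
    (an assumed tie-breaking rule, chosen to avoid overclaiming).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    unknown = set(predictions) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    ranked = Counter(predictions).most_common()
    top_label, top_votes = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        return fallback
    return top_label


# Example: two of three hypothetical models agree that the abstract supports the claim.
combined = majority_vote(["SUPPORT", "SUPPORT", "REFUTE"])
```

Falling back to the neutral label on a three-way split is a conservative choice for claim verification, since it avoids asserting support or refutation without agreement; weighted voting or averaging class probabilities would be equally plausible designs.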
Affiliation(s)
- Hao Liu
- School of Computing, Montclair State University, Montclair, NJ 07043, United States
| | - Ali Soroush
- Department of Medicine, Columbia University, New York, NY 10027, United States
| | - Jordan G Nestor
- Department of Medicine, Columbia University, New York, NY 10027, United States
| | - Elizabeth Park
- Department of Medicine, Columbia University, New York, NY 10027, United States
| | - Betina Idnay
- Department of Biomedical Informatics, Columbia University, New York, NY 10027, United States
| | - Yilu Fang
- Department of Biomedical Informatics, Columbia University, New York, NY 10027, United States
| | - Jane Pan
- Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027, United States
| | - Stan Liao
- Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027, United States
| | - Marguerite Bernard
- Institute of Human Nutrition, Columbia University, New York, NY 10027, United States
| | - Yifan Peng
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States
| | - Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY 10027, United States
|
11
|
Harada NM, Kuzmichev A, Dean HD. COVID-19 Response of the Journal Public Health Reports (PHR), March 2020-March 2023. Public Health Rep 2024; 139:154-162. [PMID: 38044622 PMCID: PMC10851904 DOI: 10.1177/00333549231210514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023] Open
Abstract
OBJECTIVE: Publication science is the scholarly study of various aspects of the academic publishing process. Its applications to COVID-19 literature have been limited. Here, we describe COVID-19 submissions to, and resulting articles published by, the journal Public Health Reports (PHR), an important resource for US public health practice. METHODS: We reviewed PHR's COVID-19 submissions and articles published between March 27, 2020, and March 27, 2023. We coded each article for article type, author affiliation, the categories listed in PHR's call for COVID-19 papers, and the public health emergency preparedness and response capabilities from the Centers for Disease Control and Prevention (CDC). RESULTS: During the study period, PHR received 1545 COVID-19 submissions and published 190 of those articles in a collection, COVID-19 Response. The COVID-19 Response collection included 102 research articles, 29 case study/practice articles, and 24 commentaries. The corresponding author of more than half (52.1%; n = 99) of the articles was affiliated with academia. By the categories listed in PHR's call for COVID-19 papers, 51 articles addressed health disparities, 38 addressed public health surveillance, and 34 addressed COVID-19 vaccination. By the CDC public health emergency preparedness and response capabilities, 87 articles addressed public health surveillance and epidemiologic investigation, 38 addressed community preparedness, and 32 addressed community recovery. The percentage of articles focused on policy/law was higher early in the pandemic (2020-2021) than later (2022-2023) (9.5% vs <3.0%). During the latter period, articles largely focused on vaccination (12.8%) and contact tracing (10.6%). CONCLUSIONS: Articles published in PHR's COVID-19 Response collection covered a broad range of topics and were authored by contributors from diverse organizations. Our characterization of the COVID-19 output of a representative US public health practice journal can help academic publishing better address informational needs of public health responders.
Affiliation(s)
- Noelle M. Harada
- Public Health Reports, Office of the Surgeon General, US Department of Health and Human Services, Washington, DC, USA
| | - Andrey Kuzmichev
- Public Health Reports, Office of the Surgeon General, US Department of Health and Human Services, Washington, DC, USA
- Fogarty International Center, National Institutes of Health, Bethesda, MD, USA
| | - Hazel D. Dean
- Public Health Reports, Office of the Surgeon General, US Department of Health and Human Services, Washington, DC, USA
|
12
|
Jahan I, Laskar MTR, Peng C, Huang JX. A comprehensive evaluation of large Language models on benchmark biomedical text processing tasks. Comput Biol Med 2024; 171:108189. [PMID: 38447502 DOI: 10.1016/j.compbiomed.2024.108189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 02/14/2024] [Accepted: 02/18/2024] [Indexed: 03/08/2024]
Abstract
Recently, Large Language Models (LLMs) have demonstrated impressive capability to solve a wide range of tasks. However, despite their success across various tasks, no prior work has investigated their capability in the biomedical domain. To this end, this paper aims to evaluate the performance of LLMs on benchmark biomedical tasks. For this purpose, a comprehensive evaluation of 4 popular LLMs in 6 diverse biomedical tasks across 26 datasets has been conducted. To the best of our knowledge, this is the first work that conducts an extensive evaluation and comparison of various LLMs in the biomedical domain. Interestingly, we find that on biomedical datasets with smaller training sets, zero-shot LLMs even outperform the current state-of-the-art models that were fine-tuned only on the training sets of these datasets. This suggests that pre-training on large text corpora makes LLMs quite specialized even in the biomedical domain. We also find that no single LLM outperforms the other LLMs in all tasks; the performance of different LLMs varies depending on the task. While their performance is still quite poor in comparison to the biomedical models that were fine-tuned on large training sets, our findings demonstrate that LLMs have the potential to be a valuable tool for various biomedical tasks that lack large annotated data.
Affiliation(s)
- Israt Jahan
- Department of Biology, York University, Canada; Information Retrieval and Knowledge Management Research Lab, York University, Canada.
| | - Md Tahmid Rahman Laskar
- School of Information Technology, York University, Canada; Information Retrieval and Knowledge Management Research Lab, York University, Canada; Dialpad Inc., Canada.
| | - Chun Peng
- Department of Biology, York University, Canada.
| | - Jimmy Xiangji Huang
- School of Information Technology, York University, Canada; Information Retrieval and Knowledge Management Research Lab, York University, Canada.
|
13
|
Jones MG, Clarke PJ, Meshesha HS, Mulhorn KA, Traci MA, Nieuwenhuijsen ER. COVID-19, Disability, and the International Classification of Functioning, Disability and Health: A Scoping Review of Early-Stage Pandemic Response. AJPM FOCUS 2024; 3:100152. [PMID: 38089427 PMCID: PMC10711380 DOI: 10.1016/j.focus.2023.100152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2024]
Abstract
Introduction: This study aimed to systematically identify the environmental factors that impacted people with disability during the COVID-19 pandemic. Methods: A scoping literature review was conducted using LitCOVID (January 1-July 31, 2020). Sixty-six articles met the inclusion criteria: they (1) discussed disability and/or health conditions related to functioning and (2) considered environmental factors. A qualitative content analysis was conducted using codes from the WHO International Classification of Functioning, Disability and Health. Results: A total of 212 International Classification of Functioning, Disability and Health codes were used in the coding process. The most frequent codes referred to health services policies and public health guidelines. These policies, although generally considered facilitators for minimizing infection, were frequently identified as barriers to the health, participation, and human rights of people with disability. The lack of disability-specific population data was identified as a key barrier to planning and decision making. Conclusions: The social determinants of health for people with disability were not adequately considered in the acute phase of infection prevention at the population level. Integrating the International Classification of Functioning, Disability and Health in emergency management provides a tool to evaluate functioning and address barriers for those in need.
Affiliation(s)
| | - Philippa J. Clarke
- Institute for Social Research, University of Michigan, Ann Arbor, Michigan
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan
| | - Hana Shewamoltot Meshesha
- Rural Institute for Inclusive Communities, University of Montana, Missoula, Montana
- Department of Counseling, College of Education, University of Montana, Missoula, Montana
- Department of Counseling, Idaho State University, Meridian, Idaho
| | - Kristine A. Mulhorn
- Health Administration Department, Drexel University, Philadelphia, Pennsylvania
| | - Meg Ann Traci
- Rural Institute for Inclusive Communities, University of Montana, Missoula, Montana
- School of Public and Community Health Sciences, College of Health, University of Montana, Missoula, Montana
|
14
|
Jin Q, Leaman R, Lu Z. PubMed and beyond: biomedical literature search in the age of artificial intelligence. EBioMedicine 2024; 100:104988. [PMID: 38306900 PMCID: PMC10850402 DOI: 10.1016/j.ebiom.2024.104988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Revised: 01/14/2024] [Accepted: 01/15/2024] [Indexed: 02/04/2024] Open
Abstract
Biomedical research yields vast information, much of which is only accessible through the literature. Consequently, literature search is crucial for healthcare and biomedicine. Recent improvements in artificial intelligence (AI) have expanded functionality beyond keywords, but they might be unfamiliar to clinicians and researchers. In response, we present an overview of over 30 literature search tools tailored to common biomedical use cases, to help readers efficiently fulfill their information needs. We first discuss recent improvements and continued challenges of the widely used PubMed. Then, we describe AI-based literature search tools catering to five specific information needs: (1) evidence-based medicine; (2) precision medicine and genomics; (3) searching by meaning, including questions; (4) finding related articles with literature recommendation; and (5) discovering hidden associations through literature mining. Finally, we discuss the impacts of recent developments of large language models such as ChatGPT on biomedical information seeking.
Affiliation(s)
- Qiao Jin
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA
| | - Robert Leaman
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA
| | - Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA.
|
15
|
Alqaissi E, Alotaibi F, Sher Ramzan M, Algarni A. Novel graph-based machine-learning technique for viral infectious diseases: application to influenza and hepatitis diseases. Ann Med 2024; 55:2304108. [PMID: 38242107 PMCID: PMC10802812 DOI: 10.1080/07853890.2024.2304108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 12/18/2023] [Indexed: 01/21/2024] Open
Abstract
BACKGROUND: Most infectious diseases are caused by viruses, fungi, bacteria and parasites. Their ability to easily infect humans and trigger large-scale epidemics makes them a public health concern. Methods for early detection of these diseases have been developed; however, they are hindered by the absence of a unified, interoperable and reusable model. This study seeks to create a holistic and real-time model for swift, preliminary detection of infectious diseases using symptoms and additional clinical data. MATERIALS AND METHODS: In this study, we present a medical knowledge graph (MKG) that leverages multiple data sources to analyse connections between different nodes. Medical ontologies were used to enhance the MKG. We applied various graph algorithms to extract key features. The performance of multiple machine-learning (ML) techniques for influenza and hepatitis detection was assessed, and multi-layer perceptron (MLP) and random forest (RF) models were selected due to their superior outcomes. The hyperparameters of both graph-based ML models were automatically fine-tuned. RESULTS: Both the graph-based MLP and RF models showed the lowest loss and error rates, along with the highest specificity, accuracy, recall, precision and F1 scores. Their Matthews correlation coefficients were also optimal. When compared with existing ML techniques and findings from the literature, these graph-based ML models showed superior detection accuracy. CONCLUSIONS: The graph-based MLP and RF models effectively diagnosed influenza and hepatitis, respectively. This underlines the potential of graph data science in enhancing ML model performance and uncovering concealed relationships in the MKG.
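Two of the metrics named in the abstract above, the F1 score and the Matthews correlation coefficient (MCC), can be computed directly from binary confusion-matrix counts. The following is a minimal sketch using the standard definitions; the function names and the example counts are illustrative assumptions and do not come from the study.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up counts for a binary disease-detection classifier.
tp, tn, fp, fn = 90, 85, 15, 10
print(f"F1  = {f1_score(tp, fp, fn):.3f}")
print(f"MCC = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

Unlike F1, MCC uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced.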
Affiliation(s)
- Eman Alqaissi
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Computer Science and Information Systems, The Applied College, King Khalid University, Abha, Saudi Arabia
| | - Fahd Alotaibi
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Muhammad Sher Ramzan
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
|
16
|
Somayajula SA, Litake O, Liang Y, Hosseini R, Nemati S, Wilson DO, Weinreb RN, Malhotra A, Xie P. Improving long COVID-related text classification: a novel end-to-end domain-adaptive paraphrasing framework. Sci Rep 2024; 14:85. [PMID: 38168099 PMCID: PMC10761882 DOI: 10.1038/s41598-023-48594-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 11/28/2023] [Indexed: 01/05/2024] Open
Abstract
The emergence of long COVID during the ongoing COVID-19 pandemic has presented considerable challenges for healthcare professionals and researchers. The task of identifying relevant literature is particularly daunting due to the rapidly evolving scientific landscape, inconsistent definitions, and a lack of standardized nomenclature. This paper proposes a novel solution to this challenge by employing machine learning techniques to classify long COVID literature. However, the scarcity of annotated data for machine learning poses a significant obstacle. To overcome this, we introduce a strategy called medical paraphrasing, which diversifies the training data while maintaining the original content. Additionally, we propose a Data-Reweighting-Based Multi-Level Optimization Framework for Domain Adaptive Paraphrasing, supported by a Meta-Weight-Network (MWN). This innovative approach incorporates feedback from the downstream text classification model to influence the training of the paraphrasing model. During the training process, the framework assigns higher weights to the training examples that contribute more effectively to the downstream task of long COVID text classification. Our findings demonstrate that this method substantially improves the accuracy and efficiency of long COVID literature classification, offering a valuable tool for physicians and researchers navigating this complex and ever-evolving field.
Affiliation(s)
- Sai Ashish Somayajula
- Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA
| | - Onkar Litake
- Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA
| | - Youwei Liang
- Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA
| | - Ramtin Hosseini
- Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA
| | - Shamim Nemati
- Division of Biomedical Informatics, University of California, La Jolla, San Diego, USA
| | - David O Wilson
- Department of Medicine, University of Pittsburgh Medical Center, Pittsburgh, USA
| | - Robert N Weinreb
- Hamilton Glaucoma Center, Shiley Eye Center and Department of Ophthalmology, University of California, La Jolla, San Diego, USA
| | - Atul Malhotra
- UC San Diego Health, Department of Medicine, La Jolla, San Diego, USA
| | - Pengtao Xie
- Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA.
|
17
|
Ghedira K, Dallali H, Ardhaoui M, Bouslema Z, Hamdi Y, Feki Ben-Salah S, Chelbi H, Atri C, Chaouch M, Dekhil N, Rais A, Azouz S, Gharbi M, Guerfali F, Hkimi C, Kamoun S, Ksouri A, Moumni I, Ouragini H, Bsibes R, Afifi Z, Youssfi K, Ben Hassine H, Hadhri N, Mardassi H, Othman H, Khamessi O. PHINDaccess Hackathons for COVID-19 and Host-Pathogen Interaction: Lessons Learned and Recommendations for Low- and Middle-Income Countries. BIOMED RESEARCH INTERNATIONAL 2023; 2023:6638714. [PMID: 37854792 PMCID: PMC10581832 DOI: 10.1155/2023/6638714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Revised: 08/29/2023] [Accepted: 09/01/2023] [Indexed: 10/20/2023]
Abstract
Hackathons are collaborative events that bring together diverse groups to solve predefined challenges. The COVID-19 pandemic caused by SARS-CoV-2 has emphasized the need for portable and reproducible genomics analysis pipelines to study the genetic susceptibility of the human host and investigate human-SARS-CoV-2 protein interactions. To build and strengthen institutional capacities in OMICS data analysis applied to host-pathogen interaction (HPI), the PHINDaccess project organized two hackathons in 2020 and 2021. These hackathons aimed to develop bioinformatics pipelines related to the SARS-CoV-2 viral genome, its phylodynamic transmission, and the identification of human genome host variants, with a focus on addressing global health challenges, particularly in low- and middle-income countries (LMIC). This paper outlines the preparation, proceedings, and lessons learned from these hackathons, including the challenges faced by participants and our experience-based recommendations for organizing hackathons in LMIC and beyond.
Affiliation(s)
- Kais Ghedira
- Laboratory of Bioinformatics, Biomathematics and Biostatistics LR20IPT09, Pasteur Institute of Tunis, University of Tunis El Manar, Tunis 1002, Tunisia
| | - Hamza Dallali
- Laboratory of Biomedical Genomics and Oncogenetics (LR20IPT05), Pasteur Institute of Tunis, University of Tunis El Manar, Tunis 1002, Tunisia
| | - Monia Ardhaoui
- Department of Human and Experimentally Anatomic Pathology, Laboratory of Molecular Epidemiology and Experimental Pathology, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- Laboratory of Molecular Epidemiology and Experimental Pathology, Tunisia
| | - Zied Bouslema
- Laboratory for Rabies Diagnostics, Institute Pasteur of Tunis, Belvedere, Tunis 1002, Tunisia
- University of Tunis El Manar, Tunis, Tunisia
| | - Yosr Hamdi
- Laboratory of Biomedical Genomics and Oncogenetics (LR20IPT05), Pasteur Institute of Tunis, University of Tunis El Manar, Tunis 1002, Tunisia
| | - Salma Feki Ben-Salah
- Laboratory of Virus, Vector and Hosts (LR20IPT02), Institut Pasteur de Tunis, Université Tunis El Manar, Tunis 1068, Tunisia
| | - Hanen Chelbi
- Laboratory of Medical Parasitology, Biotechnology and Biomolecules, LR16IPT06, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis Belvédère 1002, Tunisia
| | - Chiraz Atri
- Laboratory of Transmission, Control and Immunobiology of Infections (LTCII), LR16IPT02, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis Belvédère 1002, Tunisia
| | - Melek Chaouch
- Laboratory of Bioinformatics, Biomathematics and Biostatistics LR20IPT09, Pasteur Institute of Tunis, University of Tunis El Manar, Tunis 1002, Tunisia
| | - Naira Dekhil
- Laboratory of Molecular Microbiology, Vaccinology, And Biotechnology Development, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
| | - Afef Rais
- Laboratory of Transmission, Control and Immunobiology of Infections (LTCII), LR16IPT02, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis Belvédère 1002, Tunisia
| | - Saifeddine Azouz
- Genomics Platform, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis 1068, Tunisia
| | - Manel Gharbi
- Laboratory of Epidemiology and Veterinary Microbiology. Group of Bacteriology and Biotechnology Institut Pasteur of Tunisia, University of Tunis El Manar (UTM), Tunis 1002, Tunisia
| | - Fatma Guerfali
- Laboratory of Transmission, Control and Immunobiology of Infections (LTCII), LR16IPT02, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis Belvédère 1002, Tunisia
| | - Chaima Hkimi
- Laboratory of Bioinformatics, Biomathematics and Biostatistics LR20IPT09, Pasteur Institute of Tunis, University of Tunis El Manar, Tunis 1002, Tunisia
| | - Selim Kamoun
- Laboratory of Bioinformatics, Biomathematics and Biostatistics LR20IPT09, Pasteur Institute of Tunis, University of Tunis El Manar, Tunis 1002, Tunisia
| | - Ayoub Ksouri
- Laboratory of Venom, Toxins and Therapeutic Molecules, Institut Pasteur Tunis, University Tunis El Manar, Tunis, Tunisia
| | - Imen Moumni
- Laboratory of Molecular and Cellular Hematology, LR16IPT07, Pasteur Institute of Tunis, University of Tunis El Manar, Tunisia
| | - Houyem Ouragini
- Laboratory of Molecular and Cellular Hematology, LR16IPT07, Pasteur Institute of Tunis, University of Tunis El Manar, Tunisia
| | - Raghda Bsibes
- Grant Office, Institut Pasteur de Tunis, Tunis, Tunisia
| | - Zeineb Afifi
- Grant Office, Institut Pasteur de Tunis, Tunis, Tunisia
| | - Khouloud Youssfi
- Specialized Unit “Communication, Science and Society”, Institut Pasteur de Tunis, Tunis, Tunisia
| | - Hichem Ben Hassine
- Specialized Unit “Communication, Science and Society”, Institut Pasteur de Tunis, Tunis, Tunisia
| | - Najet Hadhri
- Grant Office, Institut Pasteur de Tunis, Tunis, Tunisia
| | - Helmi Mardassi
- Laboratory of Molecular Microbiology, Vaccinology, And Biotechnology Development, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
| | - Houcemeddine Othman
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Department of Genetics, Farhat Hached University Hospital, Sousse, Tunisia
- Laboratory of Cytogenetics, Molecular Genetics, and Reproductive Biology (LR03SP02), Farhat Hached University Hospital, Sousse, Tunisia
| | - Oussema Khamessi
- Laboratory of Venoms and Therapeutic Molecules LR11IPT08, Institut Pasteur de Tunis, University of Tunis El Manar, 13 Place Pasteur BP74 Belvédère, Tunis Belvédère, Tunisia
- High Institute of Biotechnology of Sidi Thabet, University of Manouba, Ariana BP-66, Manouba 2010, Tunisia
|
18
|
Liu T, Hu Y, Wang B, Sun Y, Gao J, Yin B. Hierarchical Graph Convolutional Networks for Structured Long Document Classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:8071-8085. [PMID: 35767491 DOI: 10.1109/tnnls.2022.3185295] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
Long document classification (LDC) has recently attracted focused interest in natural language processing (NLP) with the exponential increase of publications. Building on pretrained language models, many LDC methods have been proposed and have achieved considerable progress. However, most existing methods model long documents as sequences of text while omitting the document structure, which limits their ability to effectively represent long texts that carry structural information. To mitigate this limitation, we propose a novel hierarchical graph convolutional network (HGCN) for structured LDC in this article, in which a section graph network is proposed to model the macrostructure of a document and a word graph network with a decoupled graph convolutional block is designed to extract the fine-grained features of a document. In addition, an interaction strategy is proposed to integrate these two networks as a whole by propagating features between them. To verify the effectiveness of the proposed model, four structured long document datasets are constructed, and extensive experiments conducted on these datasets and another unstructured dataset show that the proposed method outperforms the state-of-the-art related classification methods.
|
19
|
Zhang Z, Fang M, Wu R, Zong H, Huang H, Tong Y, Xie Y, Cheng S, Wei Z, Crabbe MJC, Zhang X, Wang Y. Large-Scale Biomedical Relation Extraction Across Diverse Relation Types: Model Development and Usability Study on COVID-19. J Med Internet Res 2023; 25:e48115. [PMID: 37632414 PMCID: PMC10551783 DOI: 10.2196/48115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Revised: 07/03/2023] [Accepted: 08/25/2023] [Indexed: 08/28/2023] Open
Abstract
BACKGROUND: Biomedical relation extraction (RE) is of great importance for researchers to conduct systematic biomedical studies. It not only helps knowledge mining, such as knowledge graphs and novel knowledge discovery, but also promotes translational applications, such as clinical diagnosis, decision-making, and precision medicine. However, the relations between biomedical entities are complex and diverse, and comprehensive biomedical RE is not yet well established. OBJECTIVE: We aimed to investigate and improve large-scale RE with diverse relation types and conduct usability studies with application scenarios to optimize biomedical text mining. METHODS: Data sets containing 125 relation types with different entity semantic levels were constructed to evaluate the impact of entity semantic information on RE, and performance analysis was conducted on different model architectures and domain models. This study also proposed a continued pretraining strategy and integrated models with scripts into a tool. Furthermore, this study applied RE to the COVID-19 corpus with article topics and application scenarios of clinical interest to assess and demonstrate its biological interpretability and usability. RESULTS: The performance analysis revealed that RE achieves the best performance when the detailed semantic type is provided. For a single model, PubMedBERT with continued pretraining performed the best, with an F1-score of 0.8998. Usability studies on COVID-19 demonstrated the interpretability and usability of RE, and a relation graph database was constructed, which was used to reveal existing and novel drug paths with edge explanations. The models (including pretrained and fine-tuned models), integrated tool (Docker), and generated data (including the COVID-19 relation graph database and drug paths) have been made publicly available to the biomedical text mining community and clinical researchers. CONCLUSIONS: This study provided a comprehensive analysis of RE with diverse relation types. Optimized RE models and tools for diverse relation types were developed, which can be widely used in biomedical text mining. Our usability studies provided a proof-of-concept demonstration of how large-scale RE can be leveraged to facilitate novel research.
Affiliation(s)
- Zeyu Zhang
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Department of Clinical Laboratory Medicine Center, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, China
| | - Meng Fang
- Department of Laboratory Medicine, Shanghai Eastern Hepatobiliary Surgery Hospital, Shanghai, China
| | - Rebecca Wu
- University of California, Berkeley, Berkeley, CA, United States
| | - Hui Zong
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Institutes for Systems Genetics, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
| | - Honglian Huang
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - Yuantao Tong
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - Yujia Xie
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - Shiyang Cheng
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - Ziyi Wei
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - M James C Crabbe
- Wolfson College, Oxford University, Oxford, United Kingdom
- Institute of Biomedical and Environmental Science & Technology, University of Bedfordshire, Luton, United Kingdom
- School of Life Sciences, Shanxi University, Taiyuan, China
| | - Xiaoyan Zhang
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - Ying Wang
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Department of Clinical Laboratory Medicine Center, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Department of Laboratory Medicine, Shanghai Eastern Hepatobiliary Surgery Hospital, Shanghai, China
|
20
|
Case BKM, Young JG, Hébert-Dufresne L. Accurately summarizing an outbreak using epidemiological models takes time. ROYAL SOCIETY OPEN SCIENCE 2023; 10:230634. [PMID: 37771961 PMCID: PMC10523082 DOI: 10.1098/rsos.230634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 08/30/2023] [Indexed: 09/30/2023]
Abstract
Recent outbreaks of Mpox and Ebola, and worrying waves of COVID-19, influenza and respiratory syncytial virus, have all led to a sharp increase in the use of epidemiological models to estimate key epidemiological parameters. The feasibility of this estimation task is known as the practical identifiability (PI) problem. Here, we investigate the PI of eight commonly reported statistics of the classic susceptible-infectious-recovered model using a new measure that shows how much a researcher can expect to learn in a model-based Bayesian analysis of prevalence data. Our findings show that the basic reproductive number and final outbreak size are often poorly identified, with learning exceeding that of individual model parameters only in the early stages of an outbreak. The peak intensity, peak timing and initial growth rate are better identified, being in expectation over 20 times more probable having seen the data by the time the underlying outbreak peaks. We then test PI for a variety of true parameter combinations and find that PI is especially problematic in slow-growing or less-severe outbreaks. These results add to the growing body of literature questioning the reliability of inferences from epidemiological models when limited data are available.
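For readers unfamiliar with the summary statistics named in the abstract above, the sketch below integrates the classic susceptible-infectious-recovered (SIR) model and reads off several of them: basic reproductive number, initial growth rate, peak intensity, peak timing, and final outbreak size. It is a deterministic forward-Euler simulation with illustrative parameter values, not the Bayesian identifiability analysis of the paper; the function name `simulate_sir` and all defaults are assumptions.

```python
def simulate_sir(beta, gamma, i0=1e-3, dt=0.01, t_max=200.0):
    """Integrate ds/dt = -beta*s*i, di/dt = beta*s*i - gamma*i, dr/dt = gamma*i.

    s, i, r are population fractions; beta is the transmission rate and
    gamma the recovery rate. Uses a simple forward-Euler step of size dt.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    t = 0.0
    peak_i, peak_t = i, t
    for _ in range(int(t_max / dt)):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        t += dt
        if i > peak_i:  # track the highest prevalence seen so far
            peak_i, peak_t = i, t
    return {
        "R0": beta / gamma,                                # basic reproductive number
        "initial_growth_rate": beta * (1.0 - i0) - gamma,  # early exponential rate
        "peak_intensity": peak_i,                          # maximum infectious fraction
        "peak_time": peak_t,                               # time of that maximum
        "final_size": r,                                   # recovered fraction at t_max
    }


# Illustrative parameters only (R0 = 3); not fitted to any data set.
summary = simulate_sir(beta=0.3, gamma=0.1)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

In an inference setting, beta and gamma would be unknown and estimated from prevalence data, which is where the identifiability question studied in the paper arises.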
Affiliation(s)
- B. K. M. Case
  - Vermont Complex Systems Center, University of Vermont, Burlington, VT 05405, USA
  - Department of Computer Science, University of Vermont, Burlington, VT 05405, USA
- Jean-Gabriel Young
  - Vermont Complex Systems Center, University of Vermont, Burlington, VT 05405, USA
  - Department of Computer Science, University of Vermont, Burlington, VT 05405, USA
  - Department of Mathematics and Statistics, University of Vermont, Burlington, VT 05405, USA
- Laurent Hébert-Dufresne
  - Vermont Complex Systems Center, University of Vermont, Burlington, VT 05405, USA
  - Department of Computer Science, University of Vermont, Burlington, VT 05405, USA

21
Rafique Q, Rehman A, Afghan MS, Ahmad HM, Zafar I, Fayyaz K, Ain Q, Rayan RA, Al-Aidarous KM, Rashid S, Mushtaq G, Sharma R. Reviewing methods of deep learning for diagnosing COVID-19, its variants and synergistic medicine combinations. Comput Biol Med 2023; 163:107191. [PMID: 37354819 PMCID: PMC10281043 DOI: 10.1016/j.compbiomed.2023.107191] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 03/24/2023] [Revised: 05/28/2023] [Accepted: 06/19/2023] [Indexed: 06/26/2023]
Abstract
The COVID-19 pandemic has necessitated the development of reliable diagnostic methods for accurately detecting the novel coronavirus and its variants. Deep learning (DL) techniques have shown promising potential as screening tools for COVID-19 detection. In this study, we explore the realistic development of DL-driven COVID-19 detection methods and focus on the fully automatic framework using available resources, which can effectively investigate various coronavirus variants through modalities. We conducted an exploration and comparison of several diagnostic techniques that are widely used and globally validated for the detection of COVID-19. Furthermore, we explore review-based studies that provide detailed information on synergistic medicine combinations for the treatment of COVID-19. We recommend DL methods that effectively reduce time, cost, and complexity, providing valuable guidance for utilizing available synergistic combinations in clinical and research settings. This study also highlights the implication of innovative diagnostic technical and instrumental strategies, exploring public datasets, and investigating synergistic medicines using optimised DL rules. By summarizing these findings, we aim to assist future researchers in their endeavours by providing a comprehensive overview of the implication of DL techniques in COVID-19 detection and treatment. Integrating DL methods with various diagnostic approaches holds great promise in improving the accuracy and efficiency of COVID-19 diagnostics, thus contributing to effective control and management of the ongoing pandemic.
Affiliation(s)
- Qandeel Rafique
  - Department of Internal Medicine, Sahiwal Medical College, Sahiwal, 57040, Pakistan.
- Ali Rehman
  - Department of General Medicine Govt. Eye and General Hospital Lahore, 54000, Pakistan.
- Muhammad Sher Afghan
  - Department of Internal Medicine District Headquarter Hospital Faislaabad, 62300, Pakistan.
- Hafiz Muhamad Ahmad
  - Department of Internal Medicine District Headquarter Hospital Bahawalnagar, 62300, Pakistan.
- Imran Zafar
  - Department of Bioinformatics and Computational Biology, Virtual University Pakistan, 44000, Pakistan.
- Kompal Fayyaz
  - Department of National Centre for Bioinformatics, Quaid-I-Azam University Islamabad, 45320, Pakistan.
- Quratul Ain
  - Department of Chemistry, Government College Women University Faisalabad, 03822, Pakistan.
- Rehab A Rayan
  - Department of Epidemiology, High Institute of Public Health, Alexandria University, 21526, Egypt.
- Khadija Mohammed Al-Aidarous
  - Department of Computer Science, College of Science and Arts in Sharurah, Najran University, 51730, Saudi Arabia.
- Summya Rashid
  - Department of Pharmacology & Toxicology, College of Pharmacy, Prince Sattam Bin Abdulaziz University, P.O. Box 173, Al-Kharj, 11942, Saudi Arabia.
- Gohar Mushtaq
  - Center for Scientific Research, Faculty of Medicine, Idlib University, Idlib, Syria.
- Rohit Sharma
  - Department of Rasashastra and Bhaishajya Kalpana, Faculty of Ayurveda, Institute of Medical Sciences, Banaras Hindu University, Varanasi, India.

22
Wang L, Ambite JL, Appaji A, Bijsterbosch J, Dockes J, Herrick R, Kogan A, Lander H, Marcus D, Moore SM, Poline JB, Rajasekar A, Sahoo SS, Turner MD, Wang X, Wang Y, Turner JA. NeuroBridge: a prototype platform for discovery of the long-tail neuroimaging data. Front Neuroinform 2023; 17:1215261. [PMID: 37720825 PMCID: PMC10500076 DOI: 10.3389/fninf.2023.1215261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Accepted: 08/01/2023] [Indexed: 09/19/2023] Open
Abstract
Introduction: Open science initiatives have enabled sharing of large amounts of already collected data. However, significant gaps remain regarding how to find appropriate data, including underutilized data that exist in the long tail of science. We demonstrate the NeuroBridge prototype and its ability to search PubMed Central full-text papers for information relevant to neuroimaging data collected from schizophrenia and addiction studies.
Methods: The NeuroBridge architecture contained the following components: (1) Extensible ontology for modeling study metadata: subject population, imaging techniques, and relevant behavioral, cognitive, or clinical data. Details are described in the companion paper in this special issue; (2) A natural-language based document processor that leveraged pre-trained deep-learning models on a small-sample document corpus to establish efficient representations for each article as a collection of machine-recognized ontological terms; (3) Integrated search using ontology-driven similarity to query PubMed Central and NeuroQuery, which provides fMRI activation maps along with PubMed source articles.
Results: The NeuroBridge prototype contains a corpus of 356 papers from 2018 to 2021 describing schizophrenia and addiction neuroimaging studies, of which 186 were annotated with the NeuroBridge ontology. The search portal on the NeuroBridge website https://neurobridges.org/ provides an interactive Query Builder, where the user builds queries by selecting NeuroBridge ontology terms to preserve the ontology tree structure. For each return entry, links to the PubMed abstract as well as to the PMC full-text article, if available, are presented. For each of the returned articles, we provide a list of clinical assessments described in the Section "Methods" of the article. Articles returned from NeuroQuery based on the same search are also presented.
Conclusion: The NeuroBridge prototype combines ontology-based search with natural-language text-mining approaches to demonstrate that papers relevant to a user's research question can be identified. The NeuroBridge prototype takes a first step toward identifying potential neuroimaging data described in full-text papers. Toward the overall goal of discovering "enough data of the right kind," ongoing work includes validating the document processor with a larger corpus, extending the ontology to include detailed imaging data, and extracting information regarding data availability from the returned publications and incorporating XNAT-based neuroimaging databases to enhance data accessibility.
Affiliation(s)
- Lei Wang
  - Psychiatry and Behavioral Health Department, The Ohio State University Wexner Medical Center, Columbus, OH, United States
- José Luis Ambite
  - Information Sciences Institute and Computer Science, University of Southern California, Los Angeles, CA, United States
- Abhishek Appaji
  - Department of Medical Electronics Engineering, BMS College of Engineering, Bangalore, India
- Janine Bijsterbosch
  - Department of Radiology, Washington University in St. Louis, St. Louis, MO, United States
- Jerome Dockes
  - Department of Neurology and Neurosurgery, McGill University, Montreal, QC, Canada
- Rick Herrick
  - Department of Radiology, Washington University in St. Louis, St. Louis, MO, United States
- Alex Kogan
  - Psychiatry and Behavioral Health Department, The Ohio State University Wexner Medical Center, Columbus, OH, United States
- Howard Lander
  - Renaissance Computing Institute, Chapel Hill, NC, United States
- Daniel Marcus
  - Department of Radiology, Washington University in St. Louis, St. Louis, MO, United States
- Stephen M. Moore
  - Department of Radiology, Washington University in St. Louis, St. Louis, MO, United States
- Jean-Baptiste Poline
  - Department of Neurology and Neurosurgery, McGill University, Montreal, QC, Canada
- Arcot Rajasekar
  - Renaissance Computing Institute, Chapel Hill, NC, United States
  - School of Information and Library Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Satya S. Sahoo
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, United States
- Matthew D. Turner
  - Psychiatry and Behavioral Health Department, The Ohio State University Wexner Medical Center, Columbus, OH, United States
- Xiaochen Wang
  - College of Information Sciences and Technology, Pennsylvania State University, State College, PA, United States
- Yue Wang
  - School of Information and Library Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Jessica A. Turner
  - Psychiatry and Behavioral Health Department, The Ohio State University Wexner Medical Center, Columbus, OH, United States

23
Sofi-Mahmudi A, Raittio E, Uribe SE. Transparency of COVID-19-related research: A meta-research study. PLoS One 2023; 18:e0288406. [PMID: 37494359 PMCID: PMC10370694 DOI: 10.1371/journal.pone.0288406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Accepted: 06/26/2023] [Indexed: 07/28/2023] Open
Abstract
BACKGROUND: We aimed to assess the adherence to five transparency practices (data availability, code availability, protocol registration, and conflicts of interest (COI) and funding disclosures) from open access Coronavirus disease 2019 (COVID-19) related articles.
METHODS: We searched and exported all open access COVID-19-related articles from PubMed-indexed journals in the Europe PubMed Central database published from January 2020 to June 9, 2022. With a validated and automated tool, we detected transparent practices of three paper types: research articles, randomized controlled trials (RCTs), and reviews. Basic journal- and article-related information was retrieved from the database. We used R for the descriptive analyses.
RESULTS: The total number of articles was 258,678, of which we were able to retrieve full texts of 186,157 (72%) articles from the database. Over half of the papers (55.7%, n = 103,732) were research articles, 10.9% (n = 20,229) were review articles, and less than one percent (n = 1,202) were RCTs. Approximately nine-tenths of articles (in all three paper types) had a statement to disclose COI. Funding disclosure (83.9%, 95% confidence interval (CI): 81.7-85.8) and protocol registration (53.5%, 95% CI: 50.7-56.3) were more frequent in RCTs than in reviews or research articles. Reviews shared data (2.5%, 95% CI: 2.3-2.8) and code (0.4%, 95% CI: 0.4-0.5) less frequently than RCTs or research articles. Articles published in 2022 had the highest adherence to all five transparency practices. Most of the reviews (62%) and research articles (58%) adhered to two transparency practices, whereas almost half of the RCTs (47%) adhered to three practices. There were journal- and publisher-related differences in all five practices, and articles that did not adhere to transparency practices were more likely to be published in the lowest-impact journals and less likely to be cited.
CONCLUSION: While most articles were freely available and had a COI disclosure, adherence to other transparent practices was far from acceptable. A much stronger commitment to open science practices, particularly to protocol registration, data and code sharing, is needed from all stakeholders.
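As a worked illustration of the confidence intervals quoted in the results, the sketch below computes a normal-approximation (Wald) interval for a proportion. The abstract does not say which interval method the authors used in R, so the Wald formula is an assumption, as is the choice of the 1,202 RCTs as the denominator for the protocol-registration figure; the helper name `wald_ci` is hypothetical.

```python
import math


def wald_ci(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    p_hat: observed proportion, n: sample size, z: normal quantile
    (1.96 corresponds to a two-sided 95% interval).
    """
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error


# Protocol registration among RCTs: 53.5% of an assumed n = 1,202.
lower, upper = wald_ci(0.535, 1202)
print(f"95% CI: {lower * 100:.1f}-{upper * 100:.1f}")
```

Under these assumptions the interval agrees with the reported 50.7-56.3 after rounding. For proportions close to 0 or 1, such as the 0.4% code-sharing rate, a Wilson or exact interval is usually a better fit than the Wald formula.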
Affiliation(s)
- Ahmad Sofi-Mahmudi
  - National Pain Centre, Department of Anesthesia, McMaster University, Hamilton, Ontario, Canada
  - Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, Ontario, Canada
  - Seqiz Health Network, Kurdistan University of Medical Sciences, Seqiz, Kurdistan
- Eero Raittio
  - Institute of Dentistry, University of Eastern Finland, Kuopio, Finland
  - Department of Dentistry and Oral Health, Aarhus University, Aarhus, Denmark
- Sergio E Uribe
  - Department of Conservative Dentistry and Oral Health, Riga Stradins University, Riga, Latvia
  - School of Dentistry, Universidad Austral de Chile, Valdivia, Chile
  - Baltic Biomaterials Centre of Excellence, Headquarters at Riga Technical University, Riga, Latvia

24
Trelles Trabucco J, Arighi C, Shatkay H, Marai GE. Enhancing biomedical search interfaces with images. Bioinformatics Advances 2023; 3:vbad095. [PMID: 37485423 PMCID: PMC10359625 DOI: 10.1093/bioadv/vbad095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 06/30/2023] [Accepted: 07/14/2023] [Indexed: 07/25/2023]
Abstract
Motivation: Figures in biomedical papers communicate essential information with the potential to identify relevant documents in biomedical and clinical settings. However, academic search interfaces mainly search over text fields.
Results: We describe a search system for biomedical documents that leverages image modalities and an existing index server. We integrate a problem-specific taxonomy of image modalities and image-based data into a custom search system. Our solution features a front-end interface to enhance classical document search results with image-related data, including page thumbnails, figures, captions and image-modality information. We demonstrate the system on a subset of the CORD-19 document collection. A quantitative evaluation demonstrates higher precision and recall for biomedical document retrieval. A qualitative evaluation with domain experts further highlights our solution's benefits to biomedical search.
Availability and implementation: A demonstration is available at https://runachay.evl.uic.edu/scholar. Our code and image models can be accessed via github.com/uic-evl/bio-search. The dataset is continuously expanded.
Affiliation(s)
- Juan Trelles Trabucco
  - Department of Computer Science, University of Illinois Chicago, Chicago, IL 60607, USA
- Cecilia Arighi
  - Department of Computer and Information Science, University of Delaware, Newark, DE 19716, USA
- Hagit Shatkay
  - Department of Computer and Information Science, University of Delaware, Newark, DE 19716, USA

25
Maison DP, Deng Y, Gerschenson M. SARS-CoV-2 and the host-immune response. Front Immunol 2023; 14:1195871. [PMID: 37404823 PMCID: PMC10315470 DOI: 10.3389/fimmu.2023.1195871] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/29/2023] [Accepted: 06/05/2023] [Indexed: 07/06/2023] Open
Abstract
The SARS-CoV-2 pandemic and the COVID-19 disease have affected everyone globally, leading to one of recorded history's most significant research surges. As our knowledge evolves, our approaches to the virus and treatments must also evolve. The evaluation of future research approaches to SARS-CoV-2 will necessitate reviewing the host immune response and viral antagonism of that response. This review provides an overview of the current knowledge on SARS-CoV-2 by summarizing the virus and human response. The focuses are on the viral genome, replication cycle, host immune activation, response, signaling, and antagonism. To effectively fight the pandemic, efforts must focus on the current state of research to help develop treatments and prepare for future outbreaks.
Affiliation(s)
- David P. Maison
  - Department of Cell and Molecular Biology, John A. Burns School of Medicine, University of Hawaii at Manoa, Honolulu, HI, United States
- Youping Deng
  - Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii at Manoa, Honolulu, HI, United States
- Mariana Gerschenson
  - Department of Cell and Molecular Biology, John A. Burns School of Medicine, University of Hawaii at Manoa, Honolulu, HI, United States

26
Knafou J, Haas Q, Borissov N, Counotte M, Low N, Imeri H, Ipekci AM, Buitrago-Garcia D, Heron L, Amini P, Teodoro D. Ensemble of deep learning language models to support the creation of living systematic reviews for the COVID-19 literature. Syst Rev 2023; 12:94. [PMID: 37277872 DOI: 10.1186/s13643-023-02247-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Accepted: 04/24/2023] [Indexed: 06/07/2023] Open
Abstract
BACKGROUND: The COVID-19 pandemic has led to an unprecedented amount of scientific publications, growing at a pace never seen before. Multiple living systematic reviews have been developed to assist professionals with up-to-date and trustworthy health information, but it is increasingly challenging for systematic reviewers to keep up with the evidence in electronic databases. We aimed to investigate deep learning-based machine learning algorithms to classify COVID-19-related publications to help scale up the epidemiological curation process.
METHODS: In this retrospective study, five different pre-trained deep learning-based language models were fine-tuned on a dataset of 6365 publications manually classified into two classes, three subclasses, and 22 sub-subclasses relevant for epidemiological triage purposes. In a k-fold cross-validation setting, each standalone model was assessed on a classification task and compared against an ensemble, which takes the standalone model predictions as input and uses different strategies to infer the optimal article class. A ranking task was also considered, in which the model outputs a ranked list of sub-subclasses associated with the article.
RESULTS: The ensemble model significantly outperformed the standalone classifiers, achieving an F1-score of 89.2 at the class level of the classification task. The difference between the standalone and ensemble models increases at the sub-subclass level, where the ensemble reaches a micro F1-score of 70% against 67% for the best-performing standalone model. For the ranking task, the ensemble obtained the highest recall@3, with a performance of 89%. Using a unanimity voting rule, the ensemble can provide predictions with higher confidence on a subset of the data, achieving detection of original papers with an F1-score up to 97% on a subset of 80% of the collection instead of 93% on the whole dataset.
CONCLUSION: This study shows the potential of using deep learning language models to perform triage of COVID-19 references efficiently and support epidemiological curation and review. The ensemble consistently and significantly outperforms any standalone model. Fine-tuning the voting strategy thresholds is an interesting alternative to annotate a subset with higher predictive confidence.
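The voting strategies mentioned in this abstract can be sketched in a few lines. The example below contrasts a plain majority vote with a unanimity rule that abstains whenever the standalone models disagree, which is how an ensemble can trade coverage for confidence. The function names, the labels `"original"` and `"other"`, and the toy predictions are hypothetical; the paper's actual ensemble strategies and thresholds are not specified in the abstract.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label (ties go to the first one seen)."""
    return Counter(labels).most_common(1)[0][0]


def unanimity_vote(labels):
    """Return the shared label if all models agree, otherwise abstain (None)."""
    return labels[0] if len(set(labels)) == 1 else None


def ensemble_predict(per_model_predictions, rule):
    """Apply a voting rule article by article.

    per_model_predictions[m][k] is the label model m assigns to article k.
    """
    return [rule(list(votes)) for votes in zip(*per_model_predictions)]


# Hypothetical predictions from three models over four articles.
predictions = [
    ["original", "other", "original", "other"],
    ["original", "other", "other", "other"],
    ["original", "original", "original", "other"],
]

majority = ensemble_predict(predictions, majority_vote)
unanimous = ensemble_predict(predictions, unanimity_vote)
coverage = sum(label is not None for label in unanimous) / len(unanimous)
print(majority, unanimous, coverage)
```

Articles on which the unanimity rule abstains would be left for manual triage, so the reported gain in F1-score applies only to the covered subset.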
Affiliation(s)
- Julien Knafou
  - University of Applied Sciences and Arts of Western Switzerland (HES-SO), Rue de la Tambourine 17, 1227, Geneva, Switzerland.
- Nikolay Borissov
  - University of Applied Sciences and Arts of Western Switzerland (HES-SO), Rue de la Tambourine 17, 1227, Geneva, Switzerland
  - CTU Bern, University of Bern, Bern, Switzerland
- Michel Counotte
  - Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
  - Wageningen Bioveterinary Research, Wageningen University & Research, Wageningen, The Netherlands
- Nicola Low
  - Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
- Hira Imeri
  - Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
- Aziz Mert Ipekci
  - Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
- Leonie Heron
  - Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
- Poorya Amini
  - Risklick AG, Bern, Switzerland
  - CTU Bern, University of Bern, Bern, Switzerland
- Douglas Teodoro
  - University of Applied Sciences and Arts of Western Switzerland (HES-SO), Rue de la Tambourine 17, 1227, Geneva, Switzerland.
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland.

27
Ramjattun K, Xiaojun M, Shou-Jiang G, Singh H, Osmanbeyoglu HU. COVID-19db linkage maps of cell surface proteins and transcription factors in immune cells. J Med Virol 2023; 95:e28887. [PMID: 37341527 PMCID: PMC10478683 DOI: 10.1002/jmv.28887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 05/25/2023] [Accepted: 06/08/2023] [Indexed: 06/22/2023]
Abstract
The highly contagious SARS-CoV-2 and its associated disease (COVID-19) are a threat to global public health and economies. To develop effective treatments for COVID-19, we must understand the host cell types, cell states and regulators associated with infection and pathogenesis such as dysregulated transcription factors (TFs) and surface proteins, including signaling receptors. To link cell surface proteins with TFs, we recently developed SPaRTAN (Single-cell Proteomic and RNA-based Transcription factor Activity Network) by integrating parallel single-cell proteomic and transcriptomic data based on Cellular Indexing of Transcriptomes and Epitopes by sequencing (CITE-seq) and gene cis-regulatory information. We apply SPaRTAN to CITE-seq data sets from patients with varying degrees of COVID-19 severity and healthy controls to identify the associations between surface proteins and TFs in host immune cells. Here, we present COVID-19db of Immune Cell States (https://covid19db.streamlit.app/), a web server containing cell surface protein expression, SPaRTAN-inferred TF activities, and their associations with major host immune cell types. The data include four high-quality COVID-19 CITE-seq data sets with a toolset for user-friendly data analysis and visualization. We provide interactive surface protein and TF visualizations across major immune cell types for each data set, allowing comparison between various patient severity groups for the discovery of potential therapeutic targets and diagnostic biomarkers.
Affiliation(s)
- Koushul Ramjattun
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - UPMC Hillman Cancer Center, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Ma Xiaojun
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - UPMC Hillman Cancer Center, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Gao Shou-Jiang
  - UPMC Hillman Cancer Center, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
- Harinder Singh
  - Center for Systems Immunology and Department of Immunology, University of Pittsburgh, Pittsburgh, PA, USA
- Hatice Ulku Osmanbeyoglu
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - UPMC Hillman Cancer Center, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Department of Bioengineering, University of Pittsburgh School of Engineering, Pittsburgh, USA
  - Department of Biostatistics, University of Pittsburgh School of Public Health, Pittsburgh, PA, USA

28
Wu M, Zhang Y, Markley M, Cassidy C, Newman N, Porter A. COVID-19 knowledge deconstruction and retrieval: an intelligent bibliometric solution. Scientometrics 2023:1-31. [PMID: 37360228 PMCID: PMC10230150 DOI: 10.1007/s11192-023-04747-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2022] [Accepted: 05/16/2023] [Indexed: 06/28/2023]
Abstract
COVID-19 has been an unprecedented challenge that disruptively reshaped societies and brought a massive amount of novel knowledge to the scientific community. However, as this knowledge flood continues surging, researchers have been disadvantaged by not having access to a platform that can quickly synthesize emerging information and link the new knowledge to the latent knowledge foundation. Aiming to fill this gap, we propose a research framework and develop a dashboard that can assist scientists in identifying, retrieving, and understanding COVID-19 knowledge from the ocean of scholarly articles. Incorporating principal component decomposition (PCD), a knowledge mode-based search approach, and hierarchical topic tree (HTT) analysis, the proposed framework profiles the COVID-19 research landscape, retrieves topic-specific latent knowledge foundation, and visualizes knowledge structures. The regularly updated dashboard presents our research results. Addressing 127,971 COVID-19 research papers from PubMed, the PCD topic analysis identifies 35 research hotspots, along with their inner correlations and fluctuating trends. The HTT result segments the global knowledge landscape of COVID-19 into clinical and public health branches and reveals the deeper exploration of those studies. To supplement this analysis, we additionally built a knowledge model from research papers on the topic of vaccination and fetched 92,286 pre-Covid publications as the latent knowledge foundation for reference. The HTT analysis results on the retrieved papers show multiple relevant biomedical disciplines and four future research topics: monoclonal antibody treatments, vaccinations in diabetic patients, vaccine immunity effectiveness and durability, and vaccination-related allergic sensitization.
Affiliation(s)
- Mengjia Wu
  - Australian Artificial Intelligence Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, Australia
- Yi Zhang
  - Australian Artificial Intelligence Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, Australia
- Alan Porter
  - Search Technology, Inc., Norcross, USA
  - Science, Technology & Innovation Policy, Georgia Institute of Technology, Atlanta, USA

29
Badenes-Olmedo C, Corcho O. Lessons learned to enable question answering on knowledge graphs extracted from scientific publications: A case study on the coronavirus literature. J Biomed Inform 2023; 142:104382. [PMID: 37156393 PMCID: PMC10163941 DOI: 10.1016/j.jbi.2023.104382] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/15/2022] [Revised: 04/14/2023] [Accepted: 05/03/2023] [Indexed: 05/10/2023]
Abstract
The article presents a workflow to create a question-answering system whose knowledge base combines knowledge graphs and scientific publications on coronaviruses. It is based on the experience gained in modeling evidence from research articles to provide answers to questions in natural language. The work contains best practices for acquiring scientific publications, tuning language models to identify and normalize relevant entities, creating representational models based on probabilistic topics, and formalizing an ontology that describes the associations between domain concepts supported by the scientific literature. All the resources generated in the domain of coronavirus are available openly as part of the Drugs4COVID initiative, and can be (re)-used independently or as a whole. They can be exploited by scientific communities conducting research related to SARS-CoV-2/COVID-19 and also by therapeutic communities, laboratories, etc., wishing to find and understand relationships between symptoms, drugs, active ingredients and their documentary evidence.
Affiliation(s)
- Oscar Corcho
  - Artificial Intelligence Department, Campus de Montegancedo, s/n., Boadilla del Monte, 28660, Madrid, Spain

30
Chakraborty C, Bhattacharya M, Dhama K, Agoramoorthy G. Artificial intelligence-enabled clinical trials might be a faster way to perform rapid clinical trials and counter future pandemics: lessons learned from the COVID-19 period. Int J Surg 2023; 109:1535-1538. [PMID: 36906740 PMCID: PMC10389411 DOI: 10.1097/js9.0000000000000088] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/05/2022] [Accepted: 11/20/2022] [Indexed: 03/13/2023]
Affiliation(s)
- Chiranjib Chakraborty
  - Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal
- Kuldeep Dhama
  - Division of Pathology, ICAR-Indian Veterinary Research Institute, Bareilly, Uttar Pradesh, India

31
Alqaissi E, Alotaibi F, Ramzan MS. Graph data science and machine learning for the detection of COVID-19 infection from symptoms. PeerJ Comput Sci 2023; 9:e1333. [PMID: 37346701 PMCID: PMC10280642 DOI: 10.7717/peerj-cs.1333] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/04/2023] [Accepted: 03/16/2023] [Indexed: 06/23/2023]
Abstract
Background: COVID-19 is an infectious disease caused by SARS-CoV-2. Its symptoms range from mild to moderate respiratory illness, and it sometimes requires urgent medication. Therefore, it is crucial to detect COVID-19 at an early stage through specific clinical tests, testing kits, and medical devices. However, these tests are not always available during a pandemic. Therefore, this study developed an automatic, intelligent, rapid, and real-time diagnostic model for the early detection of COVID-19 based on its symptoms.
Methods: A COVID-19 knowledge graph (KG), constructed from literature drawing on heterogeneous data, was imported to understand the different relations involving COVID-19. We added the human disease ontology to the COVID-19 KG and applied a node-embedding graph algorithm called fast random projection to extract an extra feature from the COVID-19 dataset. Subsequently, experiments were conducted using two machine learning (ML) pipelines to predict COVID-19 infection from its symptoms. Additionally, automatic tuning of the model hyperparameters was adopted.
Results: We compared two graph-based ML models, logistic regression (LR) and random forest (RF). The proposed graph-based RF model achieved a low error rate (0.0064) and the best scores on all performance metrics, including specificity = 98.71%, accuracy = 99.36%, precision = 99.65%, recall = 99.53%, and F1-score = 99.59%. Furthermore, the Matthews correlation coefficient achieved by the RF model was higher than that of the LR model. Comparative analysis with other ML algorithms and with studies from the literature showed that the proposed RF model exhibited the best detection accuracy.
Conclusion: The graph-based RF model registered high performance in classifying the symptoms of COVID-19 infection, thereby indicating that graph data science, in conjunction with ML techniques, helps improve performance and accelerate innovations.
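The performance metrics listed in the results all derive from the four cells of a binary confusion matrix. The sketch below shows those standard definitions, including the Matthews correlation coefficient and the error rate as one minus accuracy (consistent with the reported 0.0064 and 99.36%). The function name `binary_metrics` and the confusion-matrix counts are hypothetical, since the abstract does not report the underlying counts.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes are present and
    the model predicts both classes at least once).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, not taken from the study.
metrics = binary_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells symmetrically, which makes it a useful complement when the positive and negative classes are imbalanced.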
Affiliation(s)
- Eman Alqaissi
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Information Systems, King Khalid University, Abha, Saudi Arabia
| | - Fahd Alotaibi
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Muhammad Sher Ramzan
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
|
32
|
Tsueng G, Mullen JL, Alkuzweny M, Cano M, Rush B, Haag E, Lin J, Welzel DJ, Zhou X, Qian Z, Latif AA, Hufbauer E, Zeller M, Andersen KG, Wu C, Su AI, Gangavarapu K, Hughes LD. Outbreak.info Research Library: a standardized, searchable platform to discover and explore COVID-19 resources. Nat Methods 2023; 20:536-540. [PMID: 36823331 PMCID: PMC10393269 DOI: 10.1038/s41592-023-01770-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 06/03/2022] [Accepted: 01/17/2023] [Indexed: 02/25/2023]
Abstract
Outbreak.info Research Library is a standardized, searchable interface of coronavirus disease 2019 (COVID-19) and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) publications, clinical trials, datasets, protocols and other resources, built with a reusable framework. We developed a rigorous schema to enforce consistency across different sources and resource types and linked related resources. Researchers can quickly search the latest research across data repositories, regardless of resource type or repository location, via a search interface, public application programming interface (API) and R package.
Affiliation(s)
- Ginger Tsueng
- Department of Integrative, Structural and Computational Biology, the Scripps Research Institute, La Jolla, CA, USA.
| | - Julia L Mullen
- Department of Integrative, Structural and Computational Biology, the Scripps Research Institute, La Jolla, CA, USA
| | - Manar Alkuzweny
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA
- Department of Immunology and Microbiology, the Scripps Research Institute, La Jolla, CA, USA
| | - Marco Cano
- Department of Integrative, Structural and Computational Biology, the Scripps Research Institute, La Jolla, CA, USA
| | | | - Emily Haag
- Department of Integrative, Structural and Computational Biology, the Scripps Research Institute, La Jolla, CA, USA
| | - Jason Lin
- Department of Integrative, Structural and Computational Biology, the Scripps Research Institute, La Jolla, CA, USA
| | - Dylan J Welzel
- Department of Integrative, Structural and Computational Biology, the Scripps Research Institute, La Jolla, CA, USA
| | - Xinghua Zhou
- Department of Integrative, Structural and Computational Biology, the Scripps Research Institute, La Jolla, CA, USA
| | - Zhongchao Qian
- Department of Integrative, Structural and Computational Biology, the Scripps Research Institute, La Jolla, CA, USA
| | - Alaa Abdel Latif
- Department of Immunology and Microbiology, the Scripps Research Institute, La Jolla, CA, USA
| | - Emory Hufbauer
- Department of Immunology and Microbiology, the Scripps Research Institute, La Jolla, CA, USA
| | - Mark Zeller
- Department of Immunology and Microbiology, the Scripps Research Institute, La Jolla, CA, USA
| | - Kristian G Andersen
- Department of Immunology and Microbiology, the Scripps Research Institute, La Jolla, CA, USA
- Scripps Research Translational Institute, La Jolla, CA, USA
| | - Chunlei Wu
- Department of Integrative, Structural and Computational Biology, the Scripps Research Institute, La Jolla, CA, USA
- Scripps Research Translational Institute, La Jolla, CA, USA
- Department of Molecular Medicine, the Scripps Research Institute, La Jolla, CA, USA
| | - Andrew I Su
- Department of Integrative, Structural and Computational Biology, the Scripps Research Institute, La Jolla, CA, USA
- Scripps Research Translational Institute, La Jolla, CA, USA
- Department of Molecular Medicine, the Scripps Research Institute, La Jolla, CA, USA
| | - Karthik Gangavarapu
- Department of Immunology and Microbiology, the Scripps Research Institute, La Jolla, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
| | - Laura D Hughes
- Department of Integrative, Structural and Computational Biology, the Scripps Research Institute, La Jolla, CA, USA.
|
33
|
Felemban O, Al-Zahrani A, Alsharari A. Prevalence, Attitudes, and Factors Influencing Uptake of the COVID-19 Vaccine in Saudi Arabia. Healthcare (Basel) 2023; 11:healthcare11070999. [PMID: 37046926 PMCID: PMC10094212 DOI: 10.3390/healthcare11070999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 03/26/2023] [Accepted: 03/28/2023] [Indexed: 04/03/2023] Open
Abstract
Background: The availability and access to COVID-19 vaccinations are critical to a successful pandemic response. More than 70% of the population reportedly needs to be vaccinated against COVID-19 to achieve herd immunity worldwide. However, the reluctance to get vaccinated with the COVID-19 vaccines is holding up the process of vaccination and efforts to control the pandemic and its negative consequences for the global health system, society, and economy. Previous studies have shown low uptake of vaccination in some Middle Eastern countries due to negative attitudes toward vaccination, including concerns about safety and efficacy and doubts about the need for vaccination. Aim: The aim of this study is to investigate the prevalence, attitudes, and factors influencing COVID-19 vaccine acceptance among healthcare workers, academic staff, and students in Saudi Arabia after the vaccine was made widely available by the government. Method: A cross-sectional survey was conducted to determine the prevalence, attitudes, and association between demographic factors and uptake of the first or second dose of vaccination among Saudi Arabian health workers and students. Data were collected using an online questionnaire administered and distributed through the Qualtrics platform. Results: The study recruited 173 participants from different countries and from different Saudi regions, most of whom were faculty members (n = 83). Results indicated significant differences between regions; the mean attitude score for the Western region (M 3.23) was significantly higher than that for other regions (M 3.08, p = 0.030). There was also an association between education level and number of vaccine doses received. Thus, the participants with higher education were the most compliant with national vaccination requirements (p = 0.004). 
Although the three professional groups reported social media as the most frequently reported source of information (p = 0.021), administrators were more likely to receive information from the MOH than other professional groups. Similarly, faculty members were more likely to receive information from colleagues and professional journals than the other two professional groups. Conclusions: Government officials should build public confidence through vaccination campaigns and devise effective health education programs to increase vaccination uptake. Authorized institutions can effectively use social media platforms to encourage vaccination and promote awareness among all audiences.
Affiliation(s)
- Ohood Felemban
- College of Nursing, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Ahlam Al-Zahrani
- College of Nursing, King Abdulaziz University, Jeddah 21589, Saudi Arabia
|
34
|
Krithara A, Nentidis A, Bougiatiotis K, Paliouras G. BioASQ-QA: A manually curated corpus for Biomedical Question Answering. Sci Data 2023; 10:170. [PMID: 36973320 PMCID: PMC10042099 DOI: 10.1038/s41597-023-02068-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/19/2022] [Accepted: 03/13/2023] [Indexed: 03/29/2023] Open
Abstract
The BioASQ question answering (QA) benchmark dataset contains questions in English, along with golden standard (reference) answers and related material. The dataset has been designed to reflect real information needs of biomedical experts and is therefore more realistic and challenging than most existing datasets. Furthermore, unlike most previous QA benchmarks that contain only exact answers, the BioASQ-QA dataset also includes ideal answers (in effect summaries), which are particularly useful for research on multi-document summarization. The dataset combines structured and unstructured data. The materials linked with each question comprise documents and snippets, which are useful for Information Retrieval and Passage Retrieval experiments, as well as concepts that are useful in concept-to-text Natural Language Generation. Researchers working on paraphrasing and textual entailment can also measure the degree to which their methods improve the performance of biomedical QA systems. Last but not least, the dataset is continuously extended, as the BioASQ challenge is running and new data are generated.
Affiliation(s)
- Anastasia Krithara
- Institute of Informatics and Telecommunications, National Center for Scientific Research "Demokritos", Athens, Greece.
| | - Anastasios Nentidis
- Institute of Informatics and Telecommunications, National Center for Scientific Research "Demokritos", Athens, Greece
- School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
| | - Konstantinos Bougiatiotis
- Institute of Informatics and Telecommunications, National Center for Scientific Research "Demokritos", Athens, Greece
| | - Georgios Paliouras
- Institute of Informatics and Telecommunications, National Center for Scientific Research "Demokritos", Athens, Greece
|
35
|
Leaman R, Islamaj R, Adams V, Alliheedi MA, Almeida JR, Antunes R, Bevan R, Chang YC, Erdengasileng A, Hodgskiss M, Ida R, Kim H, Li K, Mercer RE, Mertová L, Mobasher G, Shin HC, Sung M, Tsujimura T, Yeh WC, Lu Z. Chemical identification and indexing in full-text articles: an overview of the NLM-Chem track at BioCreative VII. Database (Oxford) 2023; 2023:7071696. [PMID: 36882099 PMCID: PMC9991492 DOI: 10.1093/database/baad005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Revised: 01/06/2023] [Accepted: 02/15/2023] [Indexed: 03/09/2023]
Abstract
The BioCreative National Library of Medicine (NLM)-Chem track calls for a community effort to fine-tune automated recognition of chemical names in the biomedical literature. Chemicals are one of the most searched biomedical entities in PubMed, and, as highlighted during the coronavirus disease 2019 pandemic, their identification may significantly advance research in multiple biomedical subfields. While previous community challenges focused on identifying chemical names mentioned in titles and abstracts, the full text contains valuable additional detail. We, therefore, organized the BioCreative NLM-Chem track as a community effort to address automated chemical entity recognition in full-text articles. The track consisted of two tasks: (i) chemical identification and (ii) chemical indexing. The chemical identification task required predicting all chemicals mentioned in recently published full-text articles, both span [i.e. named entity recognition (NER)] and normalization (i.e. entity linking), using Medical Subject Headings (MeSH). The chemical indexing task required identifying which chemicals reflect topics for each article and should therefore appear in the listing of MeSH terms for the document in the MEDLINE article indexing. This manuscript summarizes the BioCreative NLM-Chem track and post-challenge experiments. We received a total of 85 submissions from 17 teams worldwide. The highest performance achieved for the chemical identification task was 0.8672 F-score (0.8759 precision and 0.8587 recall) for strict NER performance and 0.8136 F-score (0.8621 precision and 0.7702 recall) for strict normalization performance. The highest performance achieved for the chemical indexing task was 0.6073 F-score (0.7417 precision and 0.5141 recall).
This community challenge demonstrated that (i) the current substantial achievements in deep learning technologies can be utilized to improve automated prediction accuracy further and (ii) the chemical indexing task is substantially more challenging. We look forward to further developing biomedical text-mining methods to respond to the rapid growth of biomedical literature. The NLM-Chem track dataset and other challenge materials are publicly available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/. Database URL https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/.
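The F-scores quoted in this abstract can be reproduced from the quoted precision and recall if one assumes the balanced F1 (harmonic mean) definition. The abstract says only "F-score", so the F1 reading is an assumption, though all three reported values match it to four decimals. The helper name `f_score` is illustrative.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) as quoted in the abstract above.
reported = {
    "strict NER": (0.8759, 0.8587, 0.8672),
    "strict normalization": (0.8621, 0.7702, 0.8136),
    "chemical indexing": (0.7417, 0.5141, 0.6073),
}

for task, (p, r, f_reported) in reported.items():
    f_computed = round(f_score(p, r), 4)
    print(f"{task}: computed {f_computed}, reported {f_reported}")
```

The gap between precision and recall is widest for the indexing task (0.7417 vs 0.5141), which is why its F-score falls well below either identification result.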
Affiliation(s)
| | | | - Virginia Adams
- NVIDIA, 2788 San Tomas Expressway, Santa Clara, CA 95051, USA
| | - Mohammed A Alliheedi
- Department of Computer Science, Al Baha University, 4781 King Fahd Rd, Al Aqiq 65779, Saudi Arabia
| | - João Rafael Almeida
- Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal
- Department of Information and Communications Technologies, University of A Coruña, Camiño do Lagar de Castro, A Coruña 15008, Spain
| | - Rui Antunes
- Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal
| | - Robert Bevan
- Informatics Department, Medicines Discovery Catapult, Alderley Park, Block 35, Mereside, Macclesfield SK10 4ZF, UK
| | - Yung-Chun Chang
- Graduate Institute of Data Science, Taipei Medical University, No. 172-1, Section 2, Keelung Rd, Da’an District, Taipei City, Taipei 106, Taiwan
| | - Arslan Erdengasileng
- Department of Statistics, Florida State University, 117 N. Woodward Ave, Tallahassee, FL 32306, USA
| | - Matthew Hodgskiss
- Informatics Department, Medicines Discovery Catapult, Alderley Park, Block 35, Mereside, Macclesfield SK10 4ZF, UK
| | - Ryuki Ida
- Computational Intelligence Laboratory, Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya, Aichi 468-8511, Japan
| | - Hyunjae Kim
- Department of Computer Science and Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, South Korea
| | - Keqiao Li
- Department of Statistics, Florida State University, 117 N. Woodward Ave, Tallahassee, FL 32306, USA
| | - Robert E Mercer
- Department of Computer Science, The University of Western Ontario, Room 355, Middlesex College, Ontario, London N6A 5B7, Canada
| | - Lukrécia Mertová
- Scientific Databases and Visualization Group, Heidelberg Institute for Theoretical Studies (HITS gGmbH), Schloss-Wolfsbrunnenweg 35, Heidelberg 69118, Germany
| | - Ghadeer Mobasher
- Scientific Databases and Visualization Group, Heidelberg Institute for Theoretical Studies (HITS gGmbH), Schloss-Wolfsbrunnenweg 35, Heidelberg 69118, Germany
- Institute of Computer Science, Heidelberg University, Im Neuenheimer Feld 205, Heidelberg 69120, Germany
| | - Hoo-Chang Shin
- NVIDIA, 2788 San Tomas Expressway, Santa Clara, CA 95051, USA
| | - Mujeen Sung
- Department of Computer Science and Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, South Korea
| | - Tomoki Tsujimura
- Computational Intelligence Laboratory, Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya, Aichi 468-8511, Japan
| | - Wen-Chao Yeh
- Institute of Information Systems and Applications, National Tsing Hua University, No. 101, Section 2, Kuang-Fu Road, Hsinchu 30013, Taiwan
| | - Zhiyong Lu
- *Corresponding author: Tel: +1-301-594-7089; Fax: +1-301-480-2290;
|
36
|
Basit SA, Qureshi R, Musleh S, Guler R, Rahman MS, Biswas KH, Alam T. COVID-19Base v3: Update of the knowledgebase for drugs and biomedical entities linked to COVID-19. Front Public Health 2023; 11:1125917. [PMID: 36950105 PMCID: PMC10025554 DOI: 10.3389/fpubh.2023.1125917] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/16/2022] [Accepted: 02/07/2023] [Indexed: 03/08/2023] Open
Abstract
COVID-19 has taken a huge toll on our lives over the last 3 years. Global initiatives put forward by all stakeholders are still in place to combat this pandemic and help us learn lessons for future ones. While the vaccine rollout was not able to curb the spread of the disease for all strains, the research community is still trying to develop effective therapeutics for COVID-19. Although Paxlovid and remdesivir have been approved by the FDA against COVID-19, they are not free of side effects. Therefore, the search for a therapeutic solution with high efficacy continues in the research community. To support this effort, in this latest version (v3) of COVID-19Base, we have summarized the biomedical entities linked to COVID-19 that have been highlighted in the scientific literature after the vaccine rollout. Eight different topic-specific dictionaries, i.e., gene, miRNA, lncRNA, PDB entries, disease, alternative medicines registered under clinical trials, drugs, and the side effects of drugs, were used to build this knowledgebase. We have introduced a BLSTM-based deep-learning model to predict the drug-disease associations that outperforms the existing model for the same purpose proposed in the earlier version of COVID-19Base. For the very first time, we have incorporated disease-gene, disease-miRNA, disease-lncRNA, and drug-PDB associations covering the largest number of biomedical entities related to COVID-19. We have provided examples of and insights into different biomedical entities covered in COVID-19Base to support the research community by incorporating all of these entities under a single platform to provide evidence-based support from the literature. COVID-19Base v3 can be accessed from: https://covidbase-v3.vercel.app/. The GitHub repository for the source code and data dictionaries is available to the community from: https://github.com/91Abdullah/covidbasev3.0.
Affiliation(s)
- Syed Abdullah Basit
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
| | - Rizwan Qureshi
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
| | - Saleh Musleh
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
| | - Reto Guler
- International Centre for Genetic Engineering and Biotechnology (ICGEB), Cape Town Component, University of Cape Town, Cape Town, South Africa
- Department of Pathology, Division of Immunology and South African Medical Research Council (SAMRC) Immunology of Infectious Diseases, Institute of Infectious Diseases and Molecular Medicine (IDM), Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Wellcome Centre for Infectious Diseases Research in Africa, Institute of Infectious Diseases and Molecular Medicine (IDM), Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - M. Sohel Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
| | - Kabir H. Biswas
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
| | - Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
|
37
|
Systematic Guidelines for Effective Utilization of COVID-19 Databases in Genomic, Epidemiologic, and Clinical Research. Viruses 2023; 15:v15030692. [PMID: 36992400 PMCID: PMC10059256 DOI: 10.3390/v15030692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 02/27/2023] [Accepted: 03/04/2023] [Indexed: 03/09/2023] Open
Abstract
The pandemic has led to the production and accumulation of various types of data related to coronavirus disease 2019 (COVID-19). To understand the features and characteristics of COVID-19 data, we summarized representative databases and determined the data types, purpose, and utilization details of each database. In addition, we categorized COVID-19 associated databases into epidemiological data, genome and protein data, and drug and target data. We found that the data present in each of these databases have nine separate purposes (clade/variant/lineage, genome browser, protein structure, epidemiological data, visualization, data analysis tool, treatment, literature, and immunity) according to the types of data. Utilizing the databases we investigated, we created four queries as integrative analysis methods that aimed to answer important scientific questions related to COVID-19. Our queries can make effective use of multiple databases to produce valuable results that can reveal novel findings through comprehensive analysis. This allows clinical researchers, epidemiologists, and clinicians to have easy access to COVID-19 data without requiring expert knowledge in computing or data science. We expect that users will be able to reference our examples to construct their own integrative analysis methods, which will act as a basis for further scientific inquiry and data searching.
|
38
|
Dai T, Zhao J, Li D, Tian S, Zhao X, Pan S. Heterogeneous deep graph convolutional network with citation relational BERT for COVID-19 inline citation recommendation. EXPERT SYSTEMS WITH APPLICATIONS 2023; 213:118841. [PMID: 36157791 PMCID: PMC9482209 DOI: 10.1016/j.eswa.2022.118841] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/12/2022] [Revised: 09/02/2022] [Accepted: 09/12/2022] [Indexed: 06/16/2023]
Abstract
The COVID-19 outbreak triggered one of the largest surges of scientific literature ever. Faced with such a volume of literature, researchers, especially junior ones, find it hard to locate the right citations when carrying out COVID-19-related research. This paper presents a novel neural-network-based method, called citation relational BERT with heterogeneous deep graph convolutional network (CRB-HDGCN), for the COVID-19 inline citation recommendation task. CRB-HDGCN has two main stages. The first stage enhances the representation learning of the BERT model for this task through CRB. To that end, an augmented citation sentence corpus, in which each citation placeholder is replaced with the title of the cited paper, is used to lightly retrain the BERT model. In addition, we extract three types of sentence pairs according to citation relation and set up sentence prediction tasks to further fine-tune the BERT model. The second stage learns effective dense vectors for the nodes of the COVID-19 bibliographic graph through HDGCN. The HDGCN contains four layers, each of which is essentially a sub-neural network. The first is an initial embedding layer, which generates fixed-size initial input vectors through CRB and a multilayer perceptron. The second is a heterogeneous graph convolutional layer, in which we extend the traditional homogeneous graph convolutional network to the heterogeneous case by adding heterogeneous nodes and relations. The third is a deep attention layer, which uses trainable projection vectors to reweight node importance according to both node types and convolution layers, further improving the learnt node vectors. The last, a decoder layer, recovers the graph structure and makes the whole network trainable. The recommendation is finally obtained by integrating the heterogeneous vectors learnt from CRB-HDGCN with the query vectors.
We conduct experiments on the CORD-19 and LitCovid datasets. The results show that, compared with the second-best method, CO-Search, CRB-HDGCN improves MAP, MRR, P@100 and R@100 by 21.8%, 22.7%, 37.6% and 21.2% on CORD-19, and by 29.1%, 25.9%, 15.3% and 11.3% on LitCovid, respectively.
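The four ranking metrics named in this abstract (MAP, MRR, P@k, R@k) have widely used textbook definitions, sketched below for a recommender that returns a ranked list of candidate citations per query. This is a minimal illustration under that assumption. The paper's exact evaluation protocol may differ, and the document identifiers and queries are made up.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result (0 if none is retrieved)."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision values taken at each relevant hit."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical queries: (ranked recommendations, ground-truth cited papers).
queries = [
    (["a", "b", "c", "d"], {"b", "d"}),
    (["x", "y", "z"], {"x"}),
]
map_score = mean(average_precision(r, rel) for r, rel in queries)  # MAP
mrr_score = mean(reciprocal_rank(r, rel) for r, rel in queries)    # MRR
print(map_score, mrr_score)
```

MAP and MRR are simply the per-query average precision and reciprocal rank averaged over all queries, while P@100 and R@100 fix the cutoff at k = 100.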
Affiliation(s)
- Tao Dai
- School of Future Transportation, Chang'an University, Xi'an, Shaanxi 710064, China
| | - Jie Zhao
- School of Economics and Management, Chang'an University, Xi'an, Shaanxi 710064, China
| | - Dehong Li
- School of Economics and Management, Chang'an University, Xi'an, Shaanxi 710064, China
| | - Shun Tian
- School of Future Transportation, Chang'an University, Xi'an, Shaanxi 710064, China
| | - Xiangmo Zhao
- School of Information Engineering, Chang'an University, Xi'an, Shaanxi 710064, China
| | - Shirui Pan
- Faculty of Information Technology, Monash University, Melbourne, Australia
|
39
|
Ahmad SJS, Degiannis K, Borucki J, Pouwels S, Rawaf DL, Head M, Li CH, Archid R, Ahmed AR, Lala A, Raza W, Mellor K, Wichmann D, Exadaktylos A. The most influential COVID-19 articles: A systematic review. New Microbes New Infect 2023; 52:101094. [PMID: 36816491 PMCID: PMC9918314 DOI: 10.1016/j.nmni.2023.101094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Revised: 01/27/2023] [Accepted: 02/02/2023] [Indexed: 02/12/2023] Open
Abstract
Background: Since December 2019, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative pathogen of coronavirus disease 2019 (COVID-19), has triggered a pandemic with challenges for health care systems around the world. Researchers have studied and published on the subject of SARS-CoV-2 and the disease extensively. What is the significance of articles published, shared and cited in the early stages of such a pandemic? Materials and methods: A systematic literature search in a time frame of 12 months and analysis rating using Principal Component Analysis (PCA) and Multiple Factor Analysis (MFA) were performed. Results: The 100 most cited COVID-19 articles were identified. The majority of these articles were from China (n = 54), followed by the United States of America (USA) (n = 21) and the United Kingdom (UK) (n = 8). All articles were published in high-ranked, peer-reviewed journals, with research focusing on the diagnosis, transmission and therapy of COVID-19. The level of evidence of the 100 most cited COVID-19 articles was low on average. Conclusion: In the early stages of a pandemic, new and innovative research can emerge and be highly cited, regardless of the level of evidence.
Affiliation(s)
- Suhaib JS. Ahmad
- Department of General Surgery, Betsi Cadwaladr University Health Board, Wales, UK
- Department of Emergency Medicine, Inselspital, University Hospital of Bern, Bern, Switzerland
- Corresponding author. Department of General Surgery, Betsi Cadwaladr University Health Board, Wales, UK.
| | - Konstantinos Degiannis
- Department of Trauma, Hand and Reconstructive Surgery, University Hospital of Saarland, University of Saarland, Homburg, Germany
- Department of Emergency Medicine, Inselspital, University Hospital of Bern, Bern, Switzerland
| | - Joseph Borucki
- Norfolk and Norwich University Hospitals NHS Foundation Trust, Norwich, UK
| | - Sjaak Pouwels
- Department of General, Abdominal and Minimally Invasive Surgery, Helios Klinikum Krefeld, Germany
| | - David Laith Rawaf
- WHO Collaborating Centre for Public Health Education & Training, Imperial College London, London, UK
| | - Marion Head
- Department of General Surgery, Betsi Cadwaladr University Health Board, Wales, UK
| | - Chun Hei Li
- Vascular Institute, St George's University Hospitals NHS Foundation Trust, London, UK
| | - Rami Archid
- Department of General, Visceral and Transplant Surgery, Eberhard-Karls-University Hospital Tuebingen, Tuebingen, Germany
| | - Ahmed R. Ahmed
- Department of Bariatric and Metabolic Surgery, Imperial College London, London, UK
| | - Anil Lala
- Department of General Surgery, Betsi Cadwaladr University Health Board, Wales, UK
| | - Wasif Raza
- Department of General Surgery, Betsi Cadwaladr University Health Board, Wales, UK
| | - Katie Mellor
- Department of General Surgery, Betsi Cadwaladr University Health Board, Wales, UK
| | - Doerte Wichmann
- Department of General, Visceral and Transplant Surgery, Eberhard-Karls-University Hospital Tuebingen, Tuebingen, Germany
| | - Aristomenis Exadaktylos
- Department of Emergency Medicine, Inselspital, University Hospital of Bern, Bern, Switzerland
|
40
|
In silico transcriptional analysis of asymptomatic and severe COVID-19 patients reveals the susceptibility of severe patients to other comorbidities and non-viral pathological conditions. HUMAN GENE 2023; 35. [PMID: 37521006 PMCID: PMC9754755 DOI: 10.1016/j.humgen.2022.201135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Abstract
COVID-19 is a severe respiratory disease caused by SARS-CoV-2, a novel human coronavirus. Patients infected with SARS-CoV-2 exhibit heterogeneous symptoms that pose practical hurdles for implementing appropriate therapy and for managing COVID-19 patients and their post-COVID complications. Understanding the impact of infection severity at the molecular level in the host is therefore vital to understanding the host response and, accordingly, its precise management. In the current study, we performed a comparative transcriptomics analysis of publicly available data from seven asymptomatic and eight severe COVID-19 patients. Exploratory data analysis employing Principal Component Analysis (PCA) showed distinct clusters of asymptomatic and severe patients. Subsequently, differential gene expression analysis using DESeq2 identified 1224 significantly upregulated genes (logFC ≥ 1.5, p-adjusted value < 0.05) and 268 significantly downregulated genes (logFC ≤ −1.5, p-adjusted value < 0.05) in severe samples in comparison to asymptomatic samples. Finally, Gene Set Enrichment Analysis (GSEA) revealed the upregulation of anti-viral and anti-inflammatory pathways, secondary infections, iron homeostasis, anemia, cardiac-related pathways, etc., and the downregulation of lipid metabolism, adaptive immune response, translation, recurrent respiratory infections, heme-biosynthetic pathways, etc. In conclusion, these findings provide insight into the enhanced susceptibility of severe COVID-19 patients to other health comorbidities, including non-viral pathogenic infections, atherosclerosis, autoinflammatory diseases, anemia, male infertility, etc., owing to the activation of the biological processes, pathways and molecular functions associated with them. We anticipate that this study will help researchers find efficient therapeutic targets and, eventually, help clinicians manage COVID-19 patients and their post-COVID-19 effects.
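The up/down classification described in this abstract amounts to a two-threshold filter on each gene's log fold change and adjusted p-value. The sketch below applies the thresholds quoted in the abstract (|logFC| ≥ 1.5, adjusted p < 0.05) to rows shaped like a DESeq2 results table, whose standard columns include `log2FoldChange` and `padj`. The gene names and values are hypothetical, the function name `classify_genes` is illustrative, and the study's actual pipeline is not shown in the abstract.

```python
def classify_genes(rows, lfc_cutoff=1.5, padj_cutoff=0.05):
    """Split genes into up- and downregulated lists using fixed thresholds.

    Each row is a dict with keys 'gene', 'log2FoldChange' and 'padj'.
    Rows whose adjusted p-value is missing (None) are skipped, mirroring
    the NA values DESeq2 can emit for filtered genes.
    """
    up, down = [], []
    for row in rows:
        padj = row["padj"]
        if padj is None or padj >= padj_cutoff:
            continue  # not significant after multiple-testing correction
        if row["log2FoldChange"] >= lfc_cutoff:
            up.append(row["gene"])
        elif row["log2FoldChange"] <= -lfc_cutoff:
            down.append(row["gene"])
    return up, down


# Hypothetical results rows (severe vs asymptomatic), for illustration only.
rows = [
    {"gene": "GENE_A", "log2FoldChange": 2.1, "padj": 0.001},   # up
    {"gene": "GENE_B", "log2FoldChange": -1.8, "padj": 0.01},   # down
    {"gene": "GENE_C", "log2FoldChange": 3.0, "padj": 0.2},     # not significant
    {"gene": "GENE_D", "log2FoldChange": 0.4, "padj": 0.001},   # effect too small
    {"gene": "GENE_E", "log2FoldChange": -2.5, "padj": None},   # padj missing
]
up, down = classify_genes(rows)
print(up, down)
```

Requiring both an effect-size cutoff and a significance cutoff keeps genes with large but noisy changes, and genes with tiny but statistically significant changes, out of both lists.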
|
41
|
Jimeno Yepes AJ, Verspoor K. Classifying literature mentions of biological pathogens as experimentally studied using natural language processing. J Biomed Semantics 2023; 14:1. [PMID: 36721225 PMCID: PMC9889128 DOI: 10.1186/s13326-023-00282-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 01/17/2023] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Information pertaining to mechanisms, management and treatment of disease-causing pathogens including viruses and bacteria is readily available from research publications indexed in MEDLINE. However, identifying the literature that specifically characterises these pathogens and their properties based on experimental research, important for understanding of the molecular basis of diseases caused by these agents, requires sifting through a large number of articles to exclude incidental mentions of the pathogens, or references to pathogens in other non-experimental contexts such as public health. OBJECTIVE In this work, we lay the foundations for the development of automatic methods for characterising mentions of pathogens in scientific literature, focusing on the task of identifying research that involves the experimental study of a pathogen. There are no manually annotated pathogen corpora available for this purpose, while such resources are necessary to support the development of machine learning-based models. We therefore aim to fill this gap, producing a large data set automatically from MEDLINE under some simplifying assumptions for the task definition, and using it to explore automatic methods that specifically support the detection of experimentally studied pathogen mentions in research publications. METHODS We developed a pathogen mention characterisation literature data set, READBiomed-Pathogens, automatically using NCBI resources, which we make available. Resources such as the NCBI Taxonomy, MeSH and GenBank can be used effectively to identify relevant literature about experimentally researched pathogens, more specifically using MeSH to link to MEDLINE citations including titles and abstracts with experimentally researched pathogens. We experiment with several machine learning-based natural language processing (NLP) algorithms leveraging this data set as training data, to model the task of detecting papers that specifically describe experimental study of a pathogen. RESULTS We show that our data set READBiomed-Pathogens can be used to explore natural language processing configurations for experimental pathogen mention characterisation. READBiomed-Pathogens includes citations related to organisms including bacteria, viruses, and a small number of toxins and other disease-causing agents. CONCLUSIONS We studied the characterisation of experimentally studied pathogens in scientific literature, developing several natural language processing methods supported by an automatically developed data set. As a core contribution of the work, we presented a methodology to automatically construct a data set for pathogen identification using existing biomedical resources. The data set and the annotation code are made publicly available. The performance of the pathogen mention identification and characterisation algorithms was additionally evaluated on a small manually annotated data set, which shows that the data set we generated allows characterising pathogens of interest. TRIAL REGISTRATION N/A.
Affiliation(s)
- Antonio Jose Jimeno Yepes
- School of Computing Technologies, RMIT University, Melbourne, Australia.
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia.
| | - Karin Verspoor
- School of Computing Technologies, RMIT University, Melbourne, Australia
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
| |
|
42
|
COVID-19 literature surveillance-A framework to manage the literature and support evidence-based decision-making on a rapidly evolving public health topic. CANADA COMMUNICABLE DISEASE REPORT = RELEVE DES MALADIES TRANSMISSIBLES AU CANADA 2023; 49:5-9. [PMID: 36815866 PMCID: PMC9902036 DOI: 10.14745/ccdr.v49i01a02] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2023]
Abstract
Background The coronavirus disease 2019 (COVID-19) pandemic has led to a rapid surge of literature on severe acute respiratory syndrome coronavirus 2 and the wider impacts of the pandemic. Research on COVID-19 has been produced at an unprecedented rate, and the ability to stay on top of the most relevant evidence is a top priority for clinicians, researchers, public health professionals and policymakers. This article presents a knowledge synthesis methodology developed and used by the Public Health Agency of Canada for managing and maintaining a literature surveillance system to identify, characterize, categorize and disseminate COVID-19 evidence daily. Methods The Daily Scan of COVID-19 Literature project comprised a systematic process involving four main steps: literature search; screening for relevance; classification and summarization of studies; and dissemination of a daily report. Results As of the end of March 2022 there were approximately 300,000 COVID-19 and pandemic-related citations in the COVID-19 database, of which 50%-60% were primary research. Each day, a report of all new COVID-19 citations, literature highlights and a link to the updated database was generated and sent to a mailing list of over 200 recipients including federal, provincial and local public health agencies and academic institutions. Conclusion This central repository of COVID-19 literature was maintained in real time to aid in accelerated evidence synthesis activities and support evidence-based decision-making during the pandemic response in Canada. This systematic process can be applied to future rapidly evolving public health topics that require the continuous evaluation and dissemination of evidence.
|
43
|
Building an intelligent system for answering specialized questions about COVID-19. PROCEDIA COMPUTER SCIENCE 2023; 219:388-396. [PMID: 36968672 PMCID: PMC10030178 DOI: 10.1016/j.procs.2023.01.304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/24/2023]
Abstract
The paper discusses the design and implementation process of an intelligent system for answering specialized questions about COVID-19. The system is based on deep learning and transfer learning techniques and uses the popular CORD-19 dataset as a source of scientific knowledge about the problem domain. The experiments performed with the pilot version of the system are presented and the obtained results are analyzed. Conclusions are formulated about the applicability and the opportunities for improvement of the proposed approach.
|
44
|
Bashir SR, Raza S, Kocaman V, Qamar U. Clinical Application of Detecting COVID-19 Risks: A Natural Language Processing Approach. Viruses 2022; 14:v14122761. [PMID: 36560764 PMCID: PMC9781729 DOI: 10.3390/v14122761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Accepted: 12/08/2022] [Indexed: 12/14/2022] Open
Abstract
The clinical application of detecting COVID-19 factors is a challenging task. Existing named entity recognition models are usually trained on a limited set of named entities. Besides clinical factors, non-clinical factors, such as social determinants of health (SDoH), are also important for studying infectious diseases. In this paper, we propose a generalizable machine learning approach that improves on previous efforts by recognizing a large number of clinical risk factors and SDoH. The novelty of the proposed method lies in the subtle combination of a number of deep neural networks, including the BiLSTM-CNN-CRF method and a transformer-based embedding layer. Experimental results on a cohort of COVID-19 data prepared from PubMed articles show the superiority of the proposed approach. When compared to other methods, the proposed approach achieves a performance gain of about 1-5% in terms of macro- and micro-average F1 scores. Clinical practitioners and researchers can use this approach to obtain accurate information regarding clinical risks and SDoH factors, and use this pipeline as a tool to end the pandemic or to prepare for future pandemics.
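The macro- and micro-average F1 scores mentioned in this abstract differ in how they weight classes: the macro average gives every label equal weight, while the micro average pools all decisions and so favours frequent labels. The sketch below shows the distinction on a toy single-label example; the labels and predictions are hypothetical and unrelated to the paper's data, and the function names are illustrative.

```python
from collections import Counter

# Minimal sketch of macro- vs micro-averaged F1 for single-label predictions.
# Labels and predictions are toy values, not results from the paper.

def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(gold, pred):
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[g] += 1  # true label gains a false negative
    labels = sorted(set(gold) | set(pred))
    # Macro: average the per-label F1 scores, each label weighted equally.
    macro = sum(f1_from_counts(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro: pool the counts over all labels, then compute a single F1.
    micro = f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


gold_labels = ["RISK", "RISK", "RISK", "RISK", "SDOH", "SDOH"]
pred_labels = ["RISK", "RISK", "RISK", "SDOH", "SDOH", "RISK"]
macro_f1, micro_f1 = macro_micro_f1(gold_labels, pred_labels)
```

In this toy case the rarer label is predicted less accurately, so the macro average comes out lower than the micro average; reporting both, as the abstract does, exposes that kind of imbalance.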
Affiliation(s)
- Syed Raza Bashir
- Department of Computer Science, Toronto Metropolitan University, Toronto, ON M5B 2K3, Canada
| | - Shaina Raza
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada
- Correspondence:
| | | | - Urooj Qamar
- Institute of Business & Information Technology, University of the Punjab, Lahore 54590, Pakistan
| |
|
45
|
Tsueng G, Mullen JL, Alkuzweny M, Cano M, Rush B, Haag E, Curators O, Lin J, Welzel DJ, Zhou X, Qian Z, Latif AA, Hufbauer E, Zeller M, Andersen KG, Wu C, Su AI, Gangavarapu K, Hughes LD. Outbreak.info Research Library: A standardized, searchable platform to discover and explore COVID-19 resources. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2022:2022.01.20.477133. [PMID: 35132411 PMCID: PMC8820656 DOI: 10.1101/2022.01.20.477133] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Indexed: 05/17/2023]
Abstract
To combat the ongoing COVID-19 pandemic, scientists have been conducting research at breakneck speeds, producing over 52,000 peer-reviewed articles within the first year. To address the challenge in tracking the vast amount of new research located in separate repositories, we developed outbreak.info Research Library, a standardized, searchable interface of COVID-19 and SARS-CoV-2 resources. Unifying metadata from sixteen repositories, we assembled a collection of over 350,000 publications, clinical trials, datasets, protocols, and other resources as of October 2022. We used a rigorous schema to enforce consistency across different sources and resource types and linked related resources. Researchers can quickly search the latest research across data repositories, regardless of resource type or repository location, via a search interface, public API, and R package. Finally, we discuss the challenges inherent in combining metadata from scattered and heterogeneous resources and provide recommendations to streamline this process to aid scientific research.
Affiliation(s)
- Ginger Tsueng
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Julia L. Mullen
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Manar Alkuzweny
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA
| | - Marco Cano
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | | | - Emily Haag
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | | | - Jason Lin
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Dylan J. Welzel
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Xinghua Zhou
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Zhongchao Qian
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Alaa Abdel Latif
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Emory Hufbauer
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Mark Zeller
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Kristian G. Andersen
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Scripps Research Translational Institute, La Jolla, CA 92037, USA
| | - Chunlei Wu
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Scripps Research Translational Institute, La Jolla, CA 92037, USA
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Andrew I. Su
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Scripps Research Translational Institute, La Jolla, CA 92037, USA
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Karthik Gangavarapu
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
| | - Laura D. Hughes
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
| |
|
46
|
Byrne JA, Park Y, Richardson RAK, Pathmendra P, Sun M, Stoeger T. Protection of the human gene research literature from contract cheating organizations known as research paper mills. Nucleic Acids Res 2022; 50:12058-12070. [PMID: 36477580 PMCID: PMC9757046 DOI: 10.1093/nar/gkac1139] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/06/2022] [Revised: 11/08/2022] [Accepted: 11/14/2022] [Indexed: 12/12/2022] Open
Abstract
Human gene research generates new biology insights with translational potential, yet few studies have considered the health of the human gene literature. The accessibility of human genes for targeted research, combined with unreasonable publication pressures and recent developments in scholarly publishing, may have created a market for low-quality or fraudulent human gene research articles, including articles produced by contract cheating organizations known as paper mills. This review summarises the evidence that paper mills contribute to the human gene research literature at scale and outlines why targeted gene research may be particularly vulnerable to systematic research fraud. To raise awareness of targeted gene research from paper mills, we highlight features of problematic manuscripts and publications that can be detected by gene researchers and/or journal staff. As improved awareness and detection could drive the further evolution of paper mill-supported publications, we also propose changes to academic publishing to more effectively deter and correct problematic publications at scale. In summary, the threat of paper mill-supported gene research highlights the need for all researchers to approach the literature with a more critical mindset, and demand publications that are underpinned by plausible research justifications, rigorous experiments and fully transparent reporting.
Affiliation(s)
- Jennifer A Byrne
- To whom correspondence should be addressed. Tel: +61 2 4920 4135;
| | - Yasunori Park
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, NSW, Australia
| | - Reese A K Richardson
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, USA
| | - Pranujan Pathmendra
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, NSW, Australia
| | - Mengyi Sun
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, USA
| | - Thomas Stoeger
- To whom correspondence should be addressed. Tel: +61 2 4920 4135;
| |
|
47
|
Raza S, Reji DJ, Shajan F, Bashir SR. Large-scale application of named entity recognition to biomedicine and epidemiology. PLOS DIGITAL HEALTH 2022; 1:e0000152. [PMID: 36812589 PMCID: PMC9931203 DOI: 10.1371/journal.pdig.0000152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2022] [Accepted: 11/01/2022] [Indexed: 12/13/2022]
Abstract
BACKGROUND Despite significant advancements in biomedical named entity recognition methods, the clinical application of these systems continues to face many challenges: (1) most of the methods are trained on a limited set of clinical entities; (2) these methods are heavily reliant on a large amount of data for both pre-training and prediction, making their use in production impractical; (3) they do not consider non-clinical entities that are also related to a patient's health, such as social, economic or demographic factors. METHODS In this paper, we develop Bio-Epidemiology-NER (https://pypi.org/project/Bio-Epidemiology-NER/), an open-source Python package for detecting biomedical named entities in text. This approach is based on a Transformer-based system and trained on a dataset that is annotated with many named entities (medical, clinical, biomedical, and epidemiological). This approach improves on previous efforts in three ways: (1) it recognizes many clinical entity types, such as medical risk factors, vital signs, drugs, and biological functions; (2) it is easily configurable, reusable, and can scale up for training and inference; (3) it also considers non-clinical factors (such as age, gender, race, and social history) that influence health outcomes. At a high level, it consists of four phases: pre-processing, data parsing, named entity recognition, and named entity enhancement. RESULTS Experimental results show that our pipeline outperforms other methods on three benchmark datasets, with macro- and micro-average F1 scores of around 90 percent and above. CONCLUSION This package is made publicly available for researchers, doctors, clinicians, and anyone else who needs to extract biomedical named entities from unstructured biomedical texts.
Affiliation(s)
- Shaina Raza
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- * E-mail: (SR); (SRB)
| | | | - Femi Shajan
- Environmental Resources Management, Bangalore, India
| | - Syed Raza Bashir
- Toronto Metropolitan University, Toronto, Ontario, Canada
- * E-mail: (SR); (SRB)
| |
|
48
|
Kidera A, Moritsugu K, Ekimoto T, Ikeguchi M. Functional dynamics of SARS-CoV-2 3C-like protease as a member of clan PA. Biophys Rev 2022; 14:1473-1485. [PMID: 36474932 PMCID: PMC9716165 DOI: 10.1007/s12551-022-01020-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/13/2022] [Accepted: 11/17/2022] [Indexed: 12/05/2022] Open
Abstract
SARS-CoV-2 3C-like protease (3CLpro), a potential therapeutic target for COVID-19, consists of a chymotrypsin fold and a C-terminal α-helical domain (domain III), the latter of which mediates dimerization required for catalytic activation. To gain further understanding of the functional dynamics of SARS-CoV-2 3CLpro, this review extends the scope to the comparative study of many crystal structures of proteases having the chymotrypsin fold (clan PA of the MEROPS database). First, the close correspondence between the zymogen-enzyme transformation in chymotrypsin and the allosteric dimerization activation in SARS-CoV-2 3CLpro is illustrated. Then, it is shown that the 3C-like proteases of family Coronaviridae (the protease family C30), which are closely related to SARS-CoV-2 3CLpro, have the same homodimeric structure and common activation mechanism via domain III mediated dimerization. The survey extended to order Nidovirales reveals that all 3C-like proteases belonging to Nidovirales have domain III, but with various chain lengths, and 3CLpro of family Mesoniviridae (family C107) has the same homodimeric structure as that of C30, even though they have no sequence similarity. As a reference, monomeric 3C proteases belonging to the more distant family Picornaviridae (family C3) lacking domain III are compared with C30, and it is shown that the 3C proteases are rigid enough to maintain their structures in the active state. Supplementary Information The online version contains supplementary material available at 10.1007/s12551-022-01020-x.
Affiliation(s)
- Akinori Kidera
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-Cho, Tsurumi, Yokohama 230-0045 Japan
| | - Kei Moritsugu
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-Cho, Tsurumi, Yokohama 230-0045 Japan; Present Address: Graduate School of Science, Osaka Metropolitan University, 1-1 Gakuen-Cho, Nakaku, Sakai, Osaka 599-8570 Japan
| | - Toru Ekimoto
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-Cho, Tsurumi, Yokohama 230-0045 Japan
| | - Mitsunori Ikeguchi
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-Cho, Tsurumi, Yokohama 230-0045 Japan
| |
|
49
|
Comprehensively identifying Long Covid articles with human-in-the-loop machine learning. PATTERNS (NEW YORK, N.Y.) 2022; 4:100659. [PMID: 36471749 PMCID: PMC9712067 DOI: 10.1016/j.patter.2022.100659] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/29/2022] [Revised: 09/19/2022] [Accepted: 11/17/2022] [Indexed: 12/05/2022]
Abstract
A significant percentage of COVID-19 survivors experience ongoing multisystemic symptoms that often affect daily living, a condition known as Long Covid or post-acute-sequelae of SARS-CoV-2 infection. However, identifying scientific articles relevant to Long Covid is challenging since there is no standardized or consensus terminology. We developed an iterative human-in-the-loop machine learning framework combining data programming with active learning into a robust ensemble model, demonstrating higher specificity and considerably higher sensitivity than other methods. Analysis of the Long Covid Collection shows that (1) most Long Covid articles do not refer to Long Covid by any name, (2) when the condition is named, the name used most frequently in the literature is Long Covid, and (3) Long Covid is associated with disorders in a wide variety of body systems. The Long Covid Collection is updated weekly and is searchable online at the LitCovid portal: https://www.ncbi.nlm.nih.gov/research/coronavirus/docsum?filters=e_condition.LongCovid.
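The data programming component named in this abstract can be pictured as a set of noisy labeling functions whose votes are combined into a weak label. The following Python sketch is a deliberately simplified illustration, not the authors' framework: the three keyword rules are hypothetical, and a plain majority vote stands in for the learned label model that data programming systems normally use. The active learning step is not shown.

```python
# Simplified illustration of data programming: several noisy labeling
# functions vote on each text, and the votes are combined.
# The rules below are hypothetical; a majority vote replaces the learned
# label model used in real data programming systems.

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def lf_names_condition(text):
    """Vote positive when the condition is explicitly named."""
    names = ("long covid", "post-acute sequelae", "post-covid")
    lowered = text.lower()
    return POSITIVE if any(name in lowered for name in names) else ABSTAIN


def lf_persistent_symptoms(text):
    """Vote positive for unnamed descriptions of lingering symptoms."""
    lowered = text.lower()
    if "covid" in lowered and "persistent symptoms" in lowered:
        return POSITIVE
    return ABSTAIN


def lf_unrelated_topic(text):
    """Vote negative when the text never mentions the disease or virus."""
    lowered = text.lower()
    if "covid" not in lowered and "sars-cov-2" not in lowered:
        return NEGATIVE
    return ABSTAIN


LABELING_FUNCTIONS = [lf_names_condition, lf_persistent_symptoms, lf_unrelated_topic]


def majority_vote(text, labeling_functions=LABELING_FUNCTIONS):
    """Combine non-abstaining votes; ties and all-abstain cases stay unlabeled."""
    votes = [lf(text) for lf in labeling_functions]
    positives = votes.count(POSITIVE)
    negatives = votes.count(NEGATIVE)
    if positives == negatives:
        return ABSTAIN
    return POSITIVE if positives > negatives else NEGATIVE
```

Texts on which every function abstains are the natural candidates to hand to a human reviewer, which is where a human-in-the-loop step would fit.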
|
50
|
Rabby G, Berka P. Multi-class classification of COVID-19 documents using machine learning algorithms. J Intell Inf Syst 2022; 60:571-591. [PMID: 36465147 PMCID: PMC9707112 DOI: 10.1007/s10844-022-00768-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 11/16/2022] [Accepted: 11/17/2022] [Indexed: 11/30/2022]
Abstract
In most biomedical research paper corpora, document classification is a crucial task. Particularly during the global epidemic, it is crucial for researchers across a variety of fields to identify the relevant scientific research papers accurately and quickly from a flood of biomedical research papers. Classification can also assist learners or researchers in assigning a research paper to an appropriate category and help them find the relevant research paper within a very short time. A biomedical document classifier needs to be designed differently from a "general" text classifier because it does not depend only on the text itself (i.e. on titles and abstracts) but can also utilize other information, such as entities extracted using medical taxonomies or bibliometric data. The main objective of this research was to find out which types of information or features, and which representation methods, influence the biomedical document classification task. For this reason, we ran several experiments on conventional text classification methods with different kinds of features extracted from the titles, abstracts, and bibliometric data. These procedures include data cleaning, feature engineering, and multi-class classification. Eleven different variants of input data tables were created and analyzed using ten machine learning algorithms. We also evaluate the data efficiency and interpretability of these models as essential features of any biomedical research paper classification system, specifically for handling the COVID-19 related health crisis. Our major findings are that TF-IDF representations outperform the entity extraction methods and that the abstract itself provides sufficient information for correct classification. Among the machine learning algorithms used, the best performance over various forms of document representation was achieved by Random Forest and Neural Network (BERT). Our results lead to a concrete guideline for practitioners on biomedical document classification.
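The TF-IDF representation that this abstract reports as the strongest feature set can be sketched in a few lines of Python. The version below uses the textbook weighting (relative term frequency times the logarithm of inverse document frequency); it is an assumption for illustration, since the abstract does not state which variant or library the authors used, and common libraries apply smoothed formulas that give different numbers. The example titles are made up.

```python
import math
from collections import Counter

# Minimal sketch of unsmoothed TF-IDF: tf = count / document length,
# idf = ln(N / document frequency). Variant and example texts are assumptions.

def tfidf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


example_docs = [
    "covid vaccine trial",
    "covid transmission model",
    "covid vaccine safety",
]
example_vectors = tfidf(example_docs)
```

A term that occurs in every document, such as "covid" here, receives a weight of zero, while terms confined to one document receive the largest weights; that is why the representation helps separate categories within a corpus where every paper is about the same disease.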
Affiliation(s)
- Gollam Rabby
- Department of Information and Knowledge Engineering, Prague University of Economics and Business, Prague, Czech Republic
| | - Petr Berka
- Department of Information and Knowledge Engineering, Prague University of Economics and Business, Prague, Czech Republic
| |
|