1
Theile CM, Beall AL. Conducting a Systematic Review of the Literature. J Dent Hyg 2024; 98:51-56. [PMID: 38649289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Accepted: 03/25/2024] [Indexed: 04/25/2024]
Abstract
This overview of the systematic review provides guidance regarding how and when to use this approach to a research question. High-quality systematic reviews are essential to help health care practitioners keep current with the large and rapidly growing body of scientific evidence. The systematic review is a transparent and reproducible synthesis of all the available evidence on a clearly defined research question or topic. Key stages in conducting a systematic review include clarification of aims and methods in a protocol, finding all of the relevant research, data collection, quality assessments, synthesizing evidence, and interpreting the findings. This short report provides examples for the various stages and steps of the systematic review research approach.
Affiliation(s)
- Cheryl M Theile
  - Department of Dental Hygiene and Dental Assisting, New York University College of Dentistry, New York, NY, USA
- Andrea L Beall
  - Department of Dental Hygiene and Dental Assisting, New York University College of Dentistry, New York, NY, USA
2
Kernan Freire S, Wang C, Foosherian M, Wellsandt S, Ruiz-Arenas S, Niforatos E. Knowledge sharing in manufacturing using LLM-powered tools: user study and model benchmarking. Front Artif Intell 2024; 7:1293084. [PMID: 38601111 PMCID: PMC11004332 DOI: 10.3389/frai.2024.1293084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 03/14/2024] [Indexed: 04/12/2024] Open
Abstract
Recent advances in natural language processing enable more intelligent ways to support knowledge sharing in factories. In manufacturing, operating production lines has become increasingly knowledge-intensive, putting strain on a factory's capacity to train and support new operators. This paper introduces a Large Language Model (LLM)-based system designed to retrieve information from the extensive knowledge contained in factory documentation and knowledge shared by expert operators. The system aims to efficiently answer queries from operators and facilitate the sharing of new knowledge. We conducted a user study at a factory to assess its potential impact and adoption, eliciting several perceived benefits, namely, enabling quicker information retrieval and more efficient resolution of issues. However, the study also highlighted a preference for learning from a human expert when such an option is available. Furthermore, we benchmarked several commercial and open-sourced LLMs for this system. The current state-of-the-art model, GPT-4, consistently outperformed its counterparts, with open-source models trailing closely, presenting an attractive option given their data privacy and customization benefits. In summary, this work offers preliminary insights and a system design for factories considering using LLM tools for knowledge management.
Affiliation(s)
- Samuel Kernan Freire
  - Faculty of Industrial Design Engineering, Delft University of Technology, Delft, Netherlands
- Chaofan Wang
  - Faculty of Industrial Design Engineering, Delft University of Technology, Delft, Netherlands
- Mina Foosherian
  - BIBA—Bremer Institut für Produktion und Logistik GmbH, Bremen, Germany
- Stefan Wellsandt
  - BIBA—Bremer Institut für Produktion und Logistik GmbH, Bremen, Germany
- Santiago Ruiz-Arenas
  - Grupo de Investigación en Ingeniería de Diseño (GRID), Universidad EAFIT - Escuela de Administración, Finanzas e Instituto Tecnológico, Medellin, Colombia
- Evangelos Niforatos
  - Faculty of Industrial Design Engineering, Delft University of Technology, Delft, Netherlands
3
Gharavi E, LeRoy NJ, Zheng G, Zhang A, Brown DE, Sheffield NC. Joint Representation Learning for Retrieval and Annotation of Genomic Interval Sets. Bioengineering (Basel) 2024; 11:263. [PMID: 38534537 DOI: 10.3390/bioengineering11030263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 02/20/2024] [Accepted: 02/22/2024] [Indexed: 03/28/2024] Open
Abstract
As available genomic interval data increase in scale, we require fast systems to search them. A common approach is simple string matching to compare a search term to metadata, but this is limited by incomplete or inaccurate annotations. An alternative is to compare data directly through genomic region overlap analysis, but this approach leads to challenges like sparsity, high dimensionality, and computational expense. We require novel methods to quickly and flexibly query large, messy genomic interval databases. Here, we develop a genomic interval search system using representation learning. We train numerical embeddings for a collection of region sets simultaneously with their metadata labels, capturing similarity between region sets and their metadata in a low-dimensional space. Using these learned co-embeddings, we develop a system that solves three related information retrieval tasks using embedding distance computations: retrieving region sets related to a user query string, suggesting new labels for database region sets, and retrieving database region sets similar to a query region set. We evaluate these use cases and show that jointly learned representations of region sets and metadata are a promising approach for fast, flexible, and accurate genomic region information retrieval.
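The three retrieval tasks named in this abstract all reduce to nearest-neighbour lookups in a shared embedding space. The sketch below is only an illustration of that idea, not the authors' system: the 2-D vectors, the names `label_vecs` and `region_set_vecs`, and the choice of cosine similarity as the distance measure are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest(query_vec, candidates, k=1):
    """Return the k candidate names most similar to query_vec."""
    ranked = sorted(candidates,
                    key=lambda name: cosine(query_vec, candidates[name]),
                    reverse=True)
    return ranked[:k]

# Hypothetical co-embeddings: labels and region sets share one space.
label_vecs = {"liver": [1.0, 0.1], "brain": [0.1, 1.0]}
region_set_vecs = {"setA": [0.9, 0.2], "setB": [0.2, 0.8], "setC": [0.8, 0.3]}

# Task 1: query string -> related region sets.
text_hits = nearest(label_vecs["liver"], region_set_vecs, k=2)

# Task 2: region set -> suggested metadata label.
suggested = nearest(region_set_vecs["setB"], label_vecs, k=1)

# Task 3: region set -> similar region sets (excluding the query itself).
others = {n: v for n, v in region_set_vecs.items() if n != "setA"}
similar = nearest(region_set_vecs["setA"], others, k=1)
```

Because labels and region sets live in the same space, one `nearest` helper serves all three tasks; only the query vector and the candidate pool change.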
Affiliation(s)
- Erfaneh Gharavi
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
  - School of Data Science, University of Virginia, Charlottesville, VA 22904, USA
- Nathan J LeRoy
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
  - Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA 22904, USA
- Guangtao Zheng
  - Department of Computer Science, School of Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Aidong Zhang
  - School of Data Science, University of Virginia, Charlottesville, VA 22904, USA
  - Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA 22904, USA
  - Department of Computer Science, School of Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Donald E Brown
  - School of Data Science, University of Virginia, Charlottesville, VA 22904, USA
  - Department of Systems and Information Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Nathan C Sheffield
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
  - School of Data Science, University of Virginia, Charlottesville, VA 22904, USA
  - Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA 22904, USA
  - Department of Computer Science, School of Engineering, University of Virginia, Charlottesville, VA 22908, USA
  - Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
  - Child Health Research Center, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
4
Escobar-Liquitay CM, Vergara-Merino L, Verdejo C, Kirmayr M, Schuller-Martínez B, Madrid E, Meza N, Bracchiglione J, Franco JVA. Methodological and users' surveys on the use of the LILACS database in Cochrane reviews identified desirable improvements to the database. Health Info Libr J 2024; 41:76-83. [PMID: 37574776 DOI: 10.1111/hir.12505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2021] [Revised: 05/14/2023] [Accepted: 07/25/2023] [Indexed: 08/15/2023]
Abstract
BACKGROUND Latin American and Caribbean Health Sciences Literature (LILACS) is the main reference database in the region; however, the way in which this resource is used in Cochrane systematic reviews has not been studied. OBJECTIVES To assess the search methods of Cochrane reviews that used LILACS as a source of information and explore the Cochrane community's perceptions about this resource. METHODS We identified all Cochrane reviews of interventions published during 2019, which included LILACS as a source of information, and analysed their search methods and also ran a survey through the Cochrane Community. RESULTS We found 133 Cochrane reviews that reported the full search strategies, identifying heterogeneity in search details. The respondents to our survey highlighted many areas for improvement in the use of LILACS, including the usability of the search platform for this purpose. DISCUSSION The use and reporting of LILACS in Cochrane reviews demonstrate inconsistencies, as evidenced by the analysis of search reports from systematic reviews and surveys conducted among members of the Cochrane community. CONCLUSION With better guidance on how LILACS database is structured, information specialists working on Cochrane reviews should be able to make more effective use of this unique resource.
Affiliation(s)
- Camila Micaela Escobar-Liquitay
  - Research Department, Cochrane Associate Centre, Instituto Universitario Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
- Laura Vergara-Merino
  - Medicine School, University of Valparaíso, Associated Cochrane Centre, Valparaíso, Chile
- Catalina Verdejo
  - Medicine School, University of Valparaíso, Associated Cochrane Centre, Valparaíso, Chile
- Matías Kirmayr
  - Medicine School, University of Valparaíso, Associated Cochrane Centre, Valparaíso, Chile
- Eva Madrid
  - Interdisciplinary Centre for Health Studies (CIESAL), Cochrane Associate Centre, School of Medicine, Universidad de Valparaíso, Valparaíso, Chile
- Nicolás Meza
  - Interdisciplinary Centre for Health Studies (CIESAL), Cochrane Associate Centre, School of Medicine, Universidad de Valparaíso, Valparaíso, Chile
- Javier Bracchiglione
  - Interdisciplinary Centre for Health Studies (CIESAL), Cochrane Associate Centre, School of Medicine, Universidad de Valparaíso, Valparaíso, Chile
- Juan Víctor Ariel Franco
  - Institute of General Practice, Medical Faculty of the Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
5
Theile CM, Beall AL. Narrative Reviews of the Literature: An overview. J Dent Hyg 2024; 98:78-82. [PMID: 38346895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 01/17/2024] [Indexed: 02/15/2024]
Abstract
This short report guides the reader through the types of narrative reviews and describes the narrative review process from conception to completion. This report is an overview on the topic of literature reviews and serves to provide guidance regarding how and when to use a narrative review approach. Authors have many purposes for selecting the narrative review of the literature including introducing an original research manuscript, reviewing a critical topic for a scholarly journal, creating an introductory chapter for a thesis, or completing a classroom assignment. Each purpose may include a specific format and may require different components to be included in the research and writing process. This short report provides examples for each section of the narrative review research and writing process.
Affiliation(s)
- Cheryl M Theile
  - Department of Dental Hygiene and Dental Assisting, New York University College of Dentistry, New York, NY, USA
- Andrea L Beall
  - Department of Dental Hygiene and Dental Assisting, New York University College of Dentistry, New York, NY, USA
6
Zare-Farashbandi E, Adibi P, Zare-Farashbandi F. Retrieving Rare Cases: A Protocol for Searching Complex Medical Cases. Med Ref Serv Q 2024; 43:15-25. [PMID: 38237019 DOI: 10.1080/02763869.2024.2289797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/23/2024]
Abstract
This study sought to provide a protocol for searching complex medical cases presented at grand rounds. A clinical informationist was embedded in gastroenterology grand rounds to use comprehensive search strategies and summarize patients' information through concept mapping. Our proposed protocol is organized into three categories: (1) the general search strategy, (2) the protocol for searching for evidence about rare diseases, and (3) identifying resources beyond routine medical databases. This approach represents a novel method that goes beyond previous studies, which focused on usual ward rounds, to facilitate evidence-based decision-making by providing and simplifying a comprehensive summary view of complex medical cases.
Affiliation(s)
- Elahe Zare-Farashbandi
  - Clinical Informationist Research Group, Health Information Technology Research Center, Isfahan University of Medical Sciences, Iran
- Peyman Adibi
  - Integrative Functional Gastroenterology Research Center, Isfahan University of Medical Sciences, Iran
7
Liu S, Bourgeois FT, Narang C, Dunn AG. A comparison of machine learning methods to find clinical trials for inclusion in new systematic reviews from their PROSPERO registrations prior to searching and screening. Res Synth Methods 2024; 15:73-85. [PMID: 37749068 PMCID: PMC10872991 DOI: 10.1002/jrsm.1672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 08/13/2023] [Accepted: 09/08/2023] [Indexed: 09/27/2023]
Abstract
Searching for trials is a key task in systematic reviews and a focus of automation. Previous approaches required knowing examples of relevant trials in advance, and most methods are focused on published trial articles. To complement existing tools, we compared methods for finding relevant trial registrations given an International Prospective Register of Systematic Reviews (PROSPERO) entry and where no relevant trials have been screened for inclusion in advance. We compared SciBERT-based (extension of Bidirectional Encoder Representations from Transformers) PICO extraction, MetaMap, and term-based representations using an imperfect dataset mined from 3632 PROSPERO entries connected to a subset of 65,662 trial registrations and 65,834 trial articles known to be included in systematic reviews. Performance was measured by the median rank and recall by rank of trials that were eventually included in the published systematic reviews. When ranking trial registrations relative to PROSPERO entries, 296 trial registrations needed to be screened to identify half of the relevant trials, and the best-performing approach used a basic term-based representation. When ranking trial articles relative to PROSPERO entries, 162 trial articles needed to be screened to identify half of the relevant trials, and the best-performing approach used a term-based representation. The results show that MetaMap and term-based representations outperformed approaches that included PICO extraction for this use case. The results suggest that when starting with a PROSPERO entry and where no trials have been screened for inclusion, automated methods can reduce workload, but additional processes are still needed to efficiently identify trial registrations or trial articles that meet the inclusion criteria of a systematic review.
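The two performance measures mentioned in this abstract, median rank and recall by rank, can be illustrated on a toy ranking. This is a minimal sketch under stated assumptions: the trial identifiers are made up, and the paper's exact definitions (for example, how ties or unranked trials are handled) may differ from the simple versions here.

```python
from statistics import median

def ranks_of_relevant(ranked_ids, relevant_ids):
    """1-based positions at which relevant items appear in the ranking."""
    return [i for i, doc in enumerate(ranked_ids, start=1) if doc in relevant_ids]

def recall_at(ranked_ids, relevant_ids, k):
    """Fraction of relevant items found within the top k positions."""
    if not relevant_ids:
        return 0.0
    found = sum(1 for doc in ranked_ids[:k] if doc in relevant_ids)
    return found / len(relevant_ids)

# Hypothetical ranking of trial records for one review, best match first.
ranked = ["t7", "t2", "t9", "t4", "t1", "t5"]
# Hypothetical set of trials that the published review eventually included.
relevant = {"t2", "t4", "t5"}

ranks = ranks_of_relevant(ranked, relevant)
median_rank = median(ranks)
recall_at_4 = recall_at(ranked, relevant, 4)
```

Read this way, the median rank is the screening depth at which about half of the relevant trials have been seen, which matches how the abstract reports "needed to be screened to identify half of the relevant trials".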
Affiliation(s)
- Shifeng Liu
  - Biomedical Informatics and Digital Health, Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia
- Florence T Bourgeois
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
  - Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Claire Narang
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
- Adam G Dunn
  - Biomedical Informatics and Digital Health, Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
8
Wang G, Gao K, Liu Q, Wu Y, Zhang K, Zhou W, Guo C. Potential and Limitations of ChatGPT 3.5 and 4.0 as a Source of COVID-19 Information: Comprehensive Comparative Analysis of Generative and Authoritative Information. J Med Internet Res 2023; 25:e49771. [PMID: 38096014 PMCID: PMC10755661 DOI: 10.2196/49771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 10/01/2023] [Accepted: 11/16/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has necessitated reliable and authoritative information for public guidance. The World Health Organization (WHO) has been a primary source of such information, disseminating it through a question and answer format on its official website. Concurrently, ChatGPT 3.5 and 4.0, a deep learning-based natural language generation system, has shown potential in generating diverse text types based on user input. OBJECTIVE This study evaluates the accuracy of COVID-19 information generated by ChatGPT 3.5 and 4.0, assessing its potential as a supplementary public information source during the pandemic. METHODS We extracted 487 COVID-19-related questions from the WHO's official website and used ChatGPT 3.5 and 4.0 to generate corresponding answers. These generated answers were then compared against the official WHO responses for evaluation. Two clinical experts scored the generated answers on a scale of 0-5 across 4 dimensions-accuracy, comprehensiveness, relevance, and clarity-with higher scores indicating better performance in each dimension. The WHO responses served as the reference for this assessment. Additionally, we used the BERT (Bidirectional Encoder Representations from Transformers) model to generate similarity scores (0-1) between the generated and official answers, providing a dual validation mechanism. RESULTS The mean (SD) scores for ChatGPT 3.5-generated answers were 3.47 (0.725) for accuracy, 3.89 (0.719) for comprehensiveness, 4.09 (0.787) for relevance, and 3.49 (0.809) for clarity. For ChatGPT 4.0, the mean (SD) scores were 4.15 (0.780), 4.47 (0.641), 4.56 (0.600), and 4.09 (0.698), respectively. All differences were statistically significant (P<.001), with ChatGPT 4.0 outperforming ChatGPT 3.5. The BERT model verification showed mean (SD) similarity scores of 0.83 (0.07) for ChatGPT 3.5 and 0.85 (0.07) for ChatGPT 4.0 compared with the official WHO answers. 
CONCLUSIONS ChatGPT 3.5 and 4.0 can generate accurate and relevant COVID-19 information to a certain extent. However, compared with official WHO responses, gaps and deficiencies exist. Thus, users of ChatGPT 3.5 and 4.0 should also reference other reliable information sources to mitigate potential misinformation risks. Notably, ChatGPT 4.0 outperformed ChatGPT 3.5 across all evaluated dimensions, a finding corroborated by BERT model validation.
Affiliation(s)
- Guoyong Wang
  - Children's Hospital, Chongqing Medical University, Chongqing, China
  - Women and Children's Hospital, Chongqing Medical University, Chongqing, China
- Kai Gao
  - Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Qianyang Liu
  - Women and Children's Hospital, Chongqing Medical University, Chongqing, China
- Yuxin Wu
  - Children's Hospital, Chongqing Medical University, Chongqing, China
- Kaijun Zhang
  - Children's Hospital, Chongqing Medical University, Chongqing, China
- Wei Zhou
  - Women and Children's Hospital, Chongqing Medical University, Chongqing, China
- Chunbao Guo
  - Women and Children's Hospital, Chongqing Medical University, Chongqing, China
9
McDonald S, Hill K, Li HZ, Turner T. Evidence surveillance for a living clinical guideline: Case study of the Australian stroke guidelines. Health Info Libr J 2023. [PMID: 37942888 DOI: 10.1111/hir.12515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 07/26/2023] [Accepted: 10/26/2023] [Indexed: 11/10/2023]
Abstract
BACKGROUND Continual evidence surveillance is an integral feature of living guidelines. The Australian Stroke Guidelines include recommendations on 100 clinical topics and have been 'living' since 2018. OBJECTIVES To describe the approach for establishing and evaluating an evidence surveillance system for the living Australian Stroke Guidelines. METHODS We developed a pragmatic surveillance system based on an analysis of the searches for the 2017 Stroke Guidelines and evaluated its reliability by assessing the potential impact on guideline recommendations. Search retrieval and screening workload are monitored monthly, together with the frequency of changes to the guideline recommendations. RESULTS Evidence surveillance was guided by practical considerations of efficiency and sustainability. A single PubMed search covering all guideline topics, limited to systematic reviews and randomised trials, is run monthly. The search retrieves about 400 records a month of which a sixth are triaged to the guideline panels for further consideration. Evaluations with Epistemonikos and the Cochrane Stroke Trials Register demonstrated the robustness of adopting this more restrictive approach. Collaborating with the guideline team in designing, implementing and evaluating the surveillance is essential for optimising the approach. CONCLUSION Monthly evidence surveillance for a large living guideline is feasible and sustainable when applying a pragmatic approach.
Affiliation(s)
- Steve McDonald
  - Cochrane Australia, School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
- Kelvin Hill
  - Stroke Services, Stroke Foundation, Melbourne, Australia
- Heidi Z Li
  - Stroke Services, Stroke Foundation, Melbourne, Australia
- Tari Turner
  - Cochrane Australia, School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
10
Sutton A, O'Keefe H, Johnson EE, Marshall C. A mapping exercise using automated techniques to develop a search strategy to identify systematic review tools. Res Synth Methods 2023; 14:874-881. [PMID: 37669905 DOI: 10.1002/jrsm.1665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 07/31/2023] [Accepted: 08/04/2023] [Indexed: 09/07/2023]
Abstract
The Systematic Review Toolbox aims to provide a web-based catalogue of tools that support various tasks within the systematic review and wider evidence synthesis process. Identifying publications surrounding specific systematic review tools is currently challenging, leading to a high screening burden for few eligible records. We aimed to develop a search strategy that could be regularly and automatically run to identify eligible records for the SR Toolbox, thus reducing time on task and burden for those involved. We undertook a mapping exercise to identify the PubMed IDs of papers indexed within the SR Toolbox. We then used the Yale MeSH Analyser and Visualisation of Similarities (VOS) Viewer text-mining software to identify the most commonly used MeSH terms and text words within the eligible records. These MeSH terms and text words were combined using Boolean Operators into a search strategy for Ovid MEDLINE. Prior to the mapping exercise and search strategy development, 81 software tools and 55 'Other' tools were included within the SR Toolbox. Since implementation of the search strategy, 146 tools have been added. There has been an increase in tools added to the toolbox since the search was developed and its corresponding auto-alert in MEDLINE was originally set up. Developing a search strategy based on a mapping exercise is an effective way of identifying new tools to support the systematic review process. Further research could be conducted to help prioritise records for screening to reduce reviewer burden further and to adapt the strategy for disciplines beyond healthcare.
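The general idea of the mapping exercise, counting the most frequent indexing terms and text words in known eligible records and joining them with Boolean operators, can be sketched as follows. This is an illustration only: the records are invented, the query uses generic quoted-phrase syntax rather than actual Ovid MEDLINE field codes, and joining the two OR blocks with AND is an assumption, since the abstract does not state how the groups were combined.

```python
from collections import Counter

def top_terms(records, field, n):
    """Return the n most frequent terms found in one field across records."""
    counts = Counter(term for rec in records for term in rec[field])
    return [term for term, _ in counts.most_common(n)]

def or_block(terms):
    """Join terms into a parenthesised OR group of quoted phrases."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# Hypothetical eligible records with their indexing terms and text words.
records = [
    {"mesh": ["Software", "Review Literature as Topic"], "words": ["tool", "screening"]},
    {"mesh": ["Software", "Data Mining"], "words": ["tool", "automation"]},
    {"mesh": ["Software", "Review Literature as Topic"], "words": ["tool", "screening"]},
]

mesh = top_terms(records, "mesh", 2)
words = top_terms(records, "words", 2)

# Assumed combination: both groups must match.
query = or_block(mesh) + " AND " + or_block(words)
```

A real strategy would need translating into the target platform's own field tags and subject-heading syntax before it could be saved as an auto-alert.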
Affiliation(s)
- Anthea Sutton
  - Sheffield Centre for Health and Related Research, School of Medicine and Population Health, The University of Sheffield, Sheffield, UK
- Hannah O'Keefe
  - NIHR Innovation Observatory, Newcastle University, Newcastle, UK
- Eugenie Evelynne Johnson
  - NIHR Innovation Observatory, Newcastle University, Newcastle, UK
  - Population Health Sciences Institute, Newcastle University, Newcastle, UK
11
Hickner A. How do search systems impact systematic searching? A qualitative study. J Med Libr Assoc 2023; 111:774-782. [PMID: 37928121 PMCID: PMC10621724 DOI: 10.5195/jmla.2023.1647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2023] Open
Abstract
Objective Systematic reviews and other evidence synthesis projects require systematic search methods. Search systems require several essential attributes to support systematic searching; however, many systems used in evidence synthesis fail to meet one or more of these requirements. I undertook a qualitative study to examine the effects of these limitations on systematic searching and how searchers select information sources for evidence synthesis projects. Methods Qualitative data were collected from interviews with twelve systematic searchers. Data were analyzed using reflexive thematic analysis. Results I used thematic analysis to identify two key themes relating to search systems: systems shape search processes, and systematic searching occurs within the information market. Many systems required for systematic reviews, in particular sources of unpublished studies, are not designed for systematic searching. Participants described various workarounds for the limitations they encounter in these systems. Economic factors influence searchers' selection of sources to search, as well as the degree to which vendors prioritize these users. Conclusion Interviews with systematic searchers suggest priorities for improving search systems, and barriers to improvement that must be overcome. Vendors must understand the unique requirements of systematic searching and recognize systematic searchers as a distinct group of users. Better interfaces and improved functionality will result in more efficient evidence synthesis.
Affiliation(s)
- Andy Hickner
  - Education and Outreach Librarian, Weill Cornell Medicine, New York, NY
12
Wu DTY, Hanauer D, Murdock P, Vydiswaran VGV, Mei Q, Zheng K. Developing a Semantically Based Query Recommendation for an Electronic Medical Record Search Engine: Query Log Analysis and Design Implications. JMIR Form Res 2023; 7:e45376. [PMID: 37713239 PMCID: PMC10541636 DOI: 10.2196/45376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 07/19/2023] [Accepted: 08/03/2023] [Indexed: 09/16/2023] Open
Abstract
BACKGROUND An effective and scalable information retrieval (IR) system plays a crucial role in enabling clinicians and researchers to harness the valuable information present in electronic health records. In a previous study, we developed a prototype medical IR system, which incorporated a semantically based query recommendation (SBQR) feature. The system was evaluated empirically and demonstrated high perceived performance by end users. To delve deeper into the factors contributing to this perceived performance, we conducted a follow-up study using query log analysis. OBJECTIVE One of the primary challenges faced in IR is that users often have limited knowledge regarding their specific information needs. Consequently, an IR system, particularly its user interface, needs to be thoughtfully designed to assist users through the iterative process of refining their queries as they encounter relevant documents during their search. To address these challenges, we incorporated "query recommendation" into our Electronic Medical Record Search Engine (EMERSE), drawing inspiration from the success of similar features in modern IR systems for general purposes. METHODS The query log data analyzed in this study were collected during our previous experimental study, where we developed EMERSE with the SBQR feature. We implemented a logging mechanism to capture user query behaviors and the output of the IR system (retrieved documents). In this analysis, we compared the initial query entered by users with the query formulated with the assistance of the SBQR. By examining the results of this comparison, we could examine whether the use of SBQR helped in constructing improved queries that differed from the original ones. RESULTS Our findings revealed that the first query entered without SBQR and the final query with SBQR assistance were highly similar (Jaccard similarity coefficient=0.77). 
This suggests that the perceived positive performance of the system was primarily attributed to the automatic query expansion facilitated by the SBQR rather than users manually manipulating their queries. In addition, through entropy analysis, we observed that search results converged in scenarios of moderate difficulty, and the degree of convergence correlated strongly with the perceived system performance. CONCLUSIONS The study demonstrated the potential contribution of the SBQR in shaping participants' positive perceptions of system performance, contingent upon the difficulty of the search scenario. Medical IR systems should therefore consider incorporating an SBQR as a user-controlled option or a semiautomated feature. Future work entails redesigning the experiment in a more controlled manner and conducting multisite studies to demonstrate the effectiveness of EMERSE with SBQR for patient cohort identification. By further exploring and validating these findings, we can enhance the usability and functionality of medical IR systems in real-world settings.
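The Jaccard similarity coefficient reported in this abstract is the size of the intersection of two sets divided by the size of their union. The sketch below computes it for two made-up queries; treating each query as a set of whitespace-separated words is an assumption, since the abstract does not say how queries were tokenized.

```python
def jaccard(a, b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B| for two collections of terms."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    # Two empty sets are treated as identical by convention here.
    return len(set_a & set_b) / len(union) if union else 1.0

# Hypothetical first query and final query after a recommended term is added.
first_query = "chest pain shortness of breath".split()
final_query = "chest pain shortness of breath dyspnea".split()

score = jaccard(first_query, final_query)
```

Here the two queries share five of six distinct terms, so the score is 5/6. A value near 1 means the final query differs little from the first, which is the pattern the abstract reports.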
Affiliation(s)
- Danny T Y Wu
  - Department of Biomedical Informatics, University of Cincinnati College of Medicine, Cincinnati, OH, United States
  - School of Information, University of Michigan, Ann Arbor, MI, United States
- David Hanauer
  - School of Information, University of Michigan, Ann Arbor, MI, United States
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
- Paul Murdock
  - Burnett School of Medicine, Texas Christian University, Fort Worth, TX, United States
  - Department of Biomedical Informatics, University of Cincinnati, Cincinnati, OH, United States
- V G Vinod Vydiswaran
  - School of Information, University of Michigan, Ann Arbor, MI, United States
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
- Qiaozhu Mei
  - School of Information, University of Michigan, Ann Arbor, MI, United States
- Kai Zheng
  - School of Information, University of Michigan, Ann Arbor, MI, United States
  - Department of Informatics, University of California, Irvine, CA, United States
|
13
|
Siglen E, Vetti HH, Augestad M, Steen VM, Lunde Å, Bjorvatn C. Evaluation of the Rosa Chatbot Providing Genetic Information to Patients at Risk of Hereditary Breast and Ovarian Cancer: Qualitative Interview Study. J Med Internet Res 2023; 25:e46571. [PMID: 37656502 PMCID: PMC10504626 DOI: 10.2196/46571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Revised: 06/27/2023] [Accepted: 07/20/2023] [Indexed: 09/02/2023] Open
Abstract
BACKGROUND Genetic testing has become an integrated part of health care for patients with breast or ovarian cancer, and the increasing demand for genetic testing is accompanied by an increasing need for easy access to reliable genetic information for patients. Therefore, we developed a chatbot app (Rosa) that is able to perform humanlike digital conversations about genetic BRCA testing. OBJECTIVE Before implementing this new information service in daily clinical practice, we wanted to explore 2 aspects of chatbot use: the perceived utility and trust in chatbot technology among healthy patients at risk of hereditary cancer and how interaction with a chatbot regarding sensitive information about hereditary cancer influences patients. METHODS Overall, 175 healthy individuals at risk of hereditary breast and ovarian cancer were invited to test the chatbot, Rosa, before and after genetic counseling. To secure a varied sample, participants were recruited from all cancer genetic clinics in Norway, and the selection was based on age, gender, and risk of having a BRCA pathogenic variant. Among the 34.9% (61/175) of participants who consented for individual interview, a selected subgroup (16/61, 26%) shared their experience through in-depth interviews via video. The semistructured interviews covered the following topics: usability, perceived usefulness, trust in the information received via the chatbot, how Rosa influenced the user, and thoughts about future use of digital tools in health care. The transcripts were analyzed using the stepwise-deductive inductive approach. RESULTS The overall finding was that the chatbot was very welcomed by the participants. They appreciated the 24/7 availability wherever they were and the possibility to use it to prepare for genetic counseling and to repeat and ask questions about what had been said afterward. As Rosa was created by health care professionals, they also valued the information they received as being medically correct. 
Rosa was referred to as being better than Google because it provided specific and reliable answers to their questions. The findings were summed up in 3 concepts: "Anytime, anywhere"; "In addition, not instead"; and "Trustworthy and true." All participants (16/16) denied increased worry after reading about genetic testing and hereditary breast and ovarian cancer in Rosa. CONCLUSIONS Our results indicate that a genetic information chatbot has the potential to contribute to easy access to uniform information for patients at risk of hereditary breast and ovarian cancer, regardless of geographical location. The 24/7 availability of quality-assured information, tailored to the specific situation, had a reassuring effect on our participants. It was consistent across concepts that Rosa was a tool for preparation and repetition; however, none of the participants (0/16) supported that Rosa could replace genetic counseling if hereditary cancer was confirmed. This indicates that a chatbot can be a well-suited digital companion to genetic counseling.
Affiliation(s)
- Elen Siglen
  - Western Norway Familial Cancer Center, Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway
  - Faculty of Health Studies, VID Specialized University, Bergen, Norway
- Hildegunn Høberg Vetti
  - Western Norway Familial Cancer Center, Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway
  - Faculty of Health Studies, VID Specialized University, Bergen, Norway
- Mirjam Augestad
  - Faculty of Health Studies, VID Specialized University, Bergen, Norway
- Vidar M Steen
  - Western Norway Familial Cancer Center, Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway
  - Department of Clinical Science, University of Bergen, Bergen, Norway
- Åshild Lunde
  - Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
- Cathrine Bjorvatn
  - Western Norway Familial Cancer Center, Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway
  - Faculty of Health Studies, VID Specialized University, Bergen, Norway
|
14
|
Inau ET, Sack J, Waltemath D, Zeleke AA. Initiatives, Concepts, and Implementation Practices of the Findable, Accessible, Interoperable, and Reusable Data Principles in Health Data Stewardship: Scoping Review. J Med Internet Res 2023; 25:e45013. [PMID: 37639292 PMCID: PMC10495848 DOI: 10.2196/45013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Revised: 03/25/2023] [Accepted: 04/14/2023] [Indexed: 08/29/2023] Open
Abstract
BACKGROUND Thorough data stewardship is a key enabler of comprehensive health research. Processes such as data collection, storage, access, sharing, and analytics require researchers to follow elaborate data management strategies properly and consistently. Studies have shown that findable, accessible, interoperable, and reusable (FAIR) data leads to improved data sharing in different scientific domains. OBJECTIVE This scoping review identifies and discusses concepts, approaches, implementation experiences, and lessons learned in FAIR initiatives in health research data. METHODS The Arksey and O'Malley stage-based methodological framework for scoping reviews was applied. PubMed, Web of Science, and Google Scholar were searched to access relevant publications. Articles written in English, published between 2014 and 2020, and addressing FAIR concepts or practices in the health domain were included. The 3 data sources were deduplicated using a reference management software. In total, 2 independent authors reviewed the eligibility of each article based on defined inclusion and exclusion criteria. A charting tool was used to extract information from the full-text papers. The results were reported using the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. RESULTS A total of 2.18% (34/1561) of the screened articles were included in the final review. The authors reported FAIRification approaches, which include interpolation, inclusion of comprehensive data dictionaries, repository design, semantic interoperability, ontologies, data quality, linked data, and requirement gathering for FAIRification tools. Challenges and mitigation strategies associated with FAIRification, such as high setup costs, data politics, technical and administrative issues, privacy concerns, and difficulties encountered in sharing health data despite its sensitive nature were also reported. 
We found various workflows, tools, and infrastructures designed by different groups worldwide to facilitate the FAIRification of health research data. We also uncovered a wide range of problems and questions that researchers are trying to address by using the different workflows, tools, and infrastructures. Although the concept of FAIR data stewardship in the health research domain is relatively new, almost all continents have been reached by at least one network trying to achieve health data FAIRness. Documented outcomes of FAIRification efforts include peer-reviewed publications, improved data sharing, facilitated data reuse, return on investment, and new treatments. Successful FAIRification of data has informed the management and prognosis of various diseases such as cancer, cardiovascular diseases, and neurological diseases. Efforts to FAIRify data on a wider variety of diseases have been ongoing since the COVID-19 pandemic. CONCLUSIONS This work summarises projects, tools, and workflows for the FAIRification of health research data. The comprehensive review shows that implementing the FAIR concept in health data stewardship carries the promise of improved research data management and transparency in the era of big data and open research publishing. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) RR2-10.2196/22505.
Affiliation(s)
- Esther Thea Inau
  - Department of Medical Informatics, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
- Jean Sack
  - International Health Department, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Dagmar Waltemath
  - Department of Medical Informatics, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
- Atinkut Alamirrew Zeleke
  - Department of Medical Informatics, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
|
15
|
El-Khatib Z, Richter L, Reich A, Benka B, Assadian O. Implementation of a Surveillance System for Severe Acute Respiratory Infections at a Tertiary Care Hospital in Austria: Protocol for a Retrospective Longitudinal Feasibility Study. JMIR Res Protoc 2023; 12:e47547. [PMID: 37535414 PMCID: PMC10436110 DOI: 10.2196/47547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Revised: 05/31/2023] [Accepted: 06/14/2023] [Indexed: 08/04/2023] Open
Abstract
BACKGROUND The risk of a large number of severe acute respiratory infection (SARI) cases emerging is a global concern. SARI can overwhelm the health care capacity and cause several deaths. Therefore, the Austrian Agency for Health and Food Safety will explore the feasibility of implementing an automatic electronically based SARI surveillance system at a tertiary care hospital in Austria as part of the hospital network, initiated by the European Centre for Disease Prevention and Control. OBJECTIVE We aim to investigate the availability of routinely collected health record data pertaining to respiratory infections and the optimal approach to use such available data for systematic surveillance of SARI in a real-world setting, describe the characteristics of patients with SARI before and after the beginning of the COVID-19 pandemic, and investigate the feasibility of identifying the risk factors for a severe outcome (intensive care unit admission or death) in patients with SARI. METHODS We will test the feasibility of a surveillance system, as part of a large European network, at a tertiary care hospital in the province of Lower Austria (called Regional Hospital Wiener Neustadt). It will be a cross-sectional study for the inventory of the electronic data records and implementation of automatic data retrieval for the period of January 2019 through the end of December 2022. The analysis will include an exploration of the database structure, descriptive analysis of the general characteristics of the patients with SARI, estimation of the SARI incidence rate, and assessment of the risk factors and different levels of severity of patients with SARI using logistic regression analysis. RESULTS This will be the first study to assess the feasibility of SARI surveillance at a large 800-bed tertiary care hospital in Austria. It will provide a general overview of the potential for establishing a hospital-based surveillance system for SARI. 
In addition, if successful, the electronic surveillance will be able to improve the response to early warning signs of new SARI, which will better inform policy makers in strengthening the surveillance system. CONCLUSIONS The findings will support the expansion of the SARI hospital-based surveillance system to other hospitals in Austria. This network will be of use to Austria in preparing for future pandemics. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) PRR1-10.2196/47547.
Affiliation(s)
- Ziad El-Khatib
  - Austrian Agency for Health and Food Safety, Vienna, Austria
- Lukas Richter
  - Austrian Agency for Health and Food Safety, Vienna, Austria
- Andreas Reich
  - Austrian Agency for Health and Food Safety, Vienna, Austria
- Bernhard Benka
  - Austrian Agency for Health and Food Safety, Vienna, Austria
- Ojan Assadian
  - Landesklinikum Wiener Neustadt, Wiener Neustadt, Austria
|
16
|
Chen E, Bullard J, Giustini D. Automated indexing using NLM's Medical Text Indexer (MTI) compared to human indexing in Medline: a pilot study. J Med Libr Assoc 2023; 111:684-694. [PMID: 37483360 PMCID: PMC10361558 DOI: 10.5195/jmla.2023.1588] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/25/2023] Open
Abstract
Objective In 2002, the National Library of Medicine (NLM) introduced semi-automated indexing of Medline using the Medical Text Indexer (MTI). In 2021, NLM announced that it would fully automate its indexing in Medline with an improved MTI by mid-2022. This pilot study examines indexing using a sample of records in Medline from 2000, and how an early, public version of MTI's outputs compares to records created by human indexers. Methods This pilot study examines twenty Medline records from 2000, a year before the MTI was introduced as a MeSH term recommender. We identified twenty higher- and lower-impact biomedical journals based on Journal Impact Factor (JIF) and examined the indexing of papers by feeding their PubMed records into the Interactive MTI tool. Results In the sample, we found key differences between automated and human-indexed Medline records: MTI assigned more terms and used them more accurately for citations in the higher JIF group, and MTI tended to rank the Male check tag more highly than the Female check tag and to omit Aged check tags. Sometimes MTI chose more specific terms than human indexers but was inconsistent in applying specificity principles. Conclusion NLM's transition to fully automated indexing of the biomedical literature could introduce or perpetuate inconsistencies and biases in Medline. Librarians and searchers should assess changes to index terms, and their impact on PubMed's mapping features for a range of topics. Future research should evaluate automated indexing as it pertains to finding clinical information effectively, and in performing systematic searches.
Affiliation(s)
- Eileen Chen
  - Student, University of British Columbia, School of Information, Vancouver, British Columbia, Canada
- Julia Bullard
  - Assistant Professor, University of British Columbia, School of Information, Vancouver, British Columbia, Canada
- Dean Giustini
  - Librarian, University of British Columbia, Biomedical Branch Library, Vancouver General Hospital, Vancouver, British Columbia, Canada
|
17
|
Gendrin A, Souliotis L, Loudon-Griffiths J, Aggarwal R, Amoako D, Desouza G, Dimitrievska S, Metcalfe P, Louvet E, Sahni H. Identifying Patient Populations in Texts Describing Drug Approvals Through Deep Learning-Based Information Extraction: Development of a Natural Language Processing Algorithm. JMIR Form Res 2023; 7:e44876. [PMID: 37347514 DOI: 10.2196/44876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 03/30/2023] [Accepted: 04/17/2023] [Indexed: 06/23/2023] Open
Abstract
BACKGROUND New drug treatments are regularly approved, and it is challenging to remain up-to-date in this rapidly changing environment. Fast and accurate visualization is important to allow a global understanding of the drug market. Automation of this information extraction provides a helpful starting point for the subject matter expert, helps to mitigate human errors, and saves time. OBJECTIVE We aimed to semiautomate disease population extraction from the free text of oncology drug approval descriptions from the BioMedTracker database for 6 selected drug targets. More specifically, we intended to extract (1) line of therapy, (2) stage of cancer of the patient population described in the approval, and (3) the clinical trials that provide evidence for the approval. We aimed to use these results in downstream applications, aiding the searchability of relevant content against related drug project sources. METHODS We fine-tuned a state-of-the-art deep learning model, Bidirectional Encoder Representations from Transformers, for each of the 3 desired outputs. We independently applied rule-based text mining approaches. We compared the performances of deep learning and rule-based approaches and selected the best method, which was then applied to new entries. The results were manually curated by a subject matter expert and then used to train new models. RESULTS The training data set is currently small (433 entries) and will enlarge over time when new approval descriptions become available or if a choice is made to take another drug target into account. The deep learning models achieved 61% and 56% 5-fold cross-validated accuracies for line of therapy and stage of cancer, respectively, which were treated as classification tasks. Trial identification is treated as a named entity recognition task, and the 5-fold cross-validated F1-score is currently 87%. 
Although the scores of the classification tasks could seem low, the models comprise 5 classes each, and such scores are a marked improvement when compared to random classification. Moreover, we expect improved performance as the input data set grows, since deep learning models need to be trained on a large enough amount of data to be able to learn the task they are taught. The rule-based approach achieved 60% and 74% 5-fold cross-validated accuracies for line of therapy and stage of cancer, respectively. No attempt was made to define a rule-based approach for trial identification. CONCLUSIONS We developed a natural language processing algorithm that is currently assisting subject matter experts in disease population extraction, which supports health authority approvals. This algorithm achieves semiautomation, enabling subject matter experts to leverage the results for deeper analysis and to accelerate information retrieval in a crowded clinical environment such as oncology.
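The comparison with random classification above can be made concrete. A classifier that guesses uniformly among k classes is correct with probability 1/k regardless of how the true labels are distributed, so 5 classes give a 20% baseline. The Python sketch below checks this by simulation; the function names and the synthetic, deliberately imbalanced labels are illustrative assumptions and are not the study's data.

```python
import random


def chance_accuracy(num_classes: int) -> float:
    """Expected accuracy of uniform random guessing among num_classes labels."""
    return 1.0 / num_classes


def simulate_random_guessing(true_labels, num_classes, seed=0):
    """Empirical accuracy of uniform random guesses against true_labels."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(rng.randrange(num_classes) == label for label in true_labels)
    return hits / len(true_labels)


# Synthetic labels for 5 classes (hypothetical, not the study's 433 entries).
labels = [0] * 12000 + [1] * 5000 + [2] * 2000 + [3] * 700 + [4] * 300

print(chance_accuracy(5))                   # 0.2
print(simulate_random_guessing(labels, 5))  # close to 0.2
```

Against this 20% baseline, the reported accuracies (61% and 56% for the deep learning models, 60% and 74% for the rule-based approach) are all well above chance. A majority-class baseline, which depends on a class distribution the abstract does not give, could be higher than 20%.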
|
18
|
Upadhyay R, Knoth P, Pasi G, Viviani M. Explainable online health information truthfulness in Consumer Health Search. Front Artif Intell 2023; 6:1184851. [PMID: 37415938 PMCID: PMC10321772 DOI: 10.3389/frai.2023.1184851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2023] [Accepted: 05/30/2023] [Indexed: 07/08/2023] Open
Abstract
Introduction People are today increasingly relying on health information they find online to make decisions that may impact both their physical and mental wellbeing. Therefore, there is a growing need for systems that can assess the truthfulness of such health information. Most of the current literature solutions use machine learning or knowledge-based approaches treating the problem as a binary classification task, discriminating between correct information and misinformation. Such solutions present several problems with regard to user decision making, among which: (i) the binary classification task provides users with just two predetermined possibilities with respect to the truthfulness of the information, which users should take for granted; indeed, (ii) the processes by which the results were obtained are often opaque and the results themselves have little or no interpretation. Methods To address these issues, we approach the problem as an ad hoc retrieval task rather than a classification task, with reference, in particular, to the Consumer Health Search task. To do this, a previously proposed Information Retrieval model, which considers information truthfulness as a dimension of relevance, is used to obtain a ranked list of both topically-relevant and truthful documents. The novelty of this work concerns the extension of such a model with a solution for the explainability of the results obtained, by relying on a knowledge base consisting of scientific evidence in the form of medical journal articles. Results and discussion We evaluate the proposed solution both quantitatively, as a standard classification task, and qualitatively, through a user study to examine the "explained" ranked list of documents. The results obtained illustrate the solution's effectiveness and usefulness in making the retrieved results more interpretable by Consumer Health Searchers, both with respect to topical relevance and truthfulness.
Affiliation(s)
- Rishabh Upadhyay
  - Information and Knowledge Representation, Retrieval, and Reasoning (IKR3) Lab, Department of Informatics, Systems, and Communication, University of Milano-Bicocca, Milan, Italy
- Petr Knoth
  - Big Scientific Data and Text Analytics Group, Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
- Gabriella Pasi
  - Information and Knowledge Representation, Retrieval, and Reasoning (IKR3) Lab, Department of Informatics, Systems, and Communication, University of Milano-Bicocca, Milan, Italy
- Marco Viviani
  - Information and Knowledge Representation, Retrieval, and Reasoning (IKR3) Lab, Department of Informatics, Systems, and Communication, University of Milano-Bicocca, Milan, Italy
|
19
|
Khan MA, Mowforth OD, Kuhn I, Kotter MRN, Davies BM. Development of a validated search filter for Ovid Embase for degenerative cervical myelopathy. Health Info Libr J 2023; 40:181-189. [PMID: 34409722 DOI: 10.1111/hir.12373] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/10/2020] [Revised: 03/14/2021] [Accepted: 04/14/2021] [Indexed: 12/17/2022]
Abstract
BACKGROUND Degenerative cervical myelopathy (DCM) is a recently proposed umbrella term for symptomatic cervical spinal cord compression secondary to degeneration of the spine. Currently, literature searching for DCM is challenged by the inconsistent uptake of the term 'DCM', with many overlapping keywords and numerous synonyms. OBJECTIVES Here, we adapt our previous Ovid MEDLINE search filter for the Ovid Embase database to support comprehensive literature searching. Both Embase and MEDLINE are recommended as a minimum for systematic reviews. METHODS References contained within Embase identified in our prior study formed a 'development gold standard' reference database (N = 220). The search filter was adapted for Embase and checked against the reference database. The filter was then validated against the 'validation gold standard'. RESULTS A direct translation was not possible, as MEDLINE indexing for DCM and the keywords search field were not available in Embase. We also used the 'focus' function to improve precision. The resulting search filter has 100% sensitivity in testing. DISCUSSION AND CONCLUSION We have developed a validated search filter capable of retrieving DCM references in Embase with high sensitivity. In the absence of consistent terminology and indexing, this will support more efficient and robust evidence synthesis in the field.
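Sensitivity, as used above, is the share of gold-standard references that the filter retrieves; precision is the share of retrieved records that belong to the gold standard. The following Python sketch shows both calculations under stated assumptions: record identifiers are plain strings, and the function names and sample identifiers are hypothetical rather than drawn from the study.

```python
def sensitivity(retrieved, gold_standard):
    """Fraction of gold-standard records that the search filter retrieved."""
    gold = set(gold_standard)
    if not gold:
        raise ValueError("gold standard must not be empty")
    return len(set(retrieved) & gold) / len(gold)


def precision(retrieved, gold_standard):
    """Fraction of retrieved records that are in the gold standard."""
    found = set(retrieved)
    if not found:
        raise ValueError("retrieved set must not be empty")
    return len(found & set(gold_standard)) / len(found)


# Hypothetical record identifiers, for illustration only.
gold = {"rec1", "rec2", "rec3", "rec4"}
hits = {"rec1", "rec2", "rec3", "rec4", "rec9"}

print(sensitivity(hits, gold))  # 1.0: every gold-standard record was retrieved
print(precision(hits, gold))    # 0.8: one of five retrieved records is off-target
```

A filter can reach 100% sensitivity while still returning off-target records, which is why the two measures are reported separately.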
Affiliation(s)
- Maaz A Khan
  - Academic Neurosurgery Unit, Department of Clinical Neurosurgery, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK
- Oliver D Mowforth
  - Academic Neurosurgery Unit, Department of Clinical Neurosurgery, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK
- Isla Kuhn
  - University of Cambridge Medical Library, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK
- Mark R N Kotter
  - Academic Neurosurgery Unit, Department of Clinical Neurosurgery, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK
- Benjamin M Davies
  - Academic Neurosurgery Unit, Department of Clinical Neurosurgery, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK
|
20
|
Lemenkova P, Debeir O. Multispectral Satellite Image Analysis for Computing Vegetation Indices by R in the Khartoum Region of Sudan, Northeast Africa. J Imaging 2023; 9:98. [PMID: 37233317 DOI: 10.3390/jimaging9050098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 05/04/2023] [Accepted: 05/08/2023] [Indexed: 05/27/2023] Open
Abstract
Desertification is one of the most destructive climate-related issues in the Sudan-Sahel region of Africa. As the assessment of desertification is possible by satellite image analysis using vegetation indices (VIs), this study reports on the technical advantages and capabilities of scripting the 'raster' and 'terra' R-language packages for computing the VIs. The test area which was considered includes the region of the confluence between the Blue and White Niles in Khartoum, southern Sudan, northeast Africa and the Landsat 8-9 OLI/TIRS images taken for the years 2013, 2018 and 2022, which were chosen as test datasets. The VIs used here are robust indicators of plant greenness, and combined with vegetation coverage, are essential parameters for environmental analytics. Five VIs were calculated to compare both the status and dynamics of vegetation through the differences between the images collected within the nine-year span. Using scripts for computing and visualising the VIs over Sudan demonstrates previously unreported patterns of vegetation to reveal climate-vegetation relationships. The ability of the R packages 'raster' and 'terra' to process spatial data was enhanced through scripting to automate image analysis and mapping, and choosing Sudan for the case study enables us to present new perspectives for image processing.
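The abstract does not name the five vegetation indices, and the study computed them in R with 'raster' and 'terra'. As an illustration only, the sketch below computes one widely used index, the Normalized Difference Vegetation Index, NDVI = (NIR - Red) / (NIR + Red), in plain Python; for Landsat 8-9 OLI the near-infrared and red bands are bands 5 and 4. The function names and the small reflectance grids are hypothetical and do not reproduce the authors' scripts.

```python
def ndvi(nir: float, red: float) -> float:
    """NDVI for one pixel from near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        return float("nan")  # undefined where both bands are zero (e.g. no data)
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply NDVI pixel by pixel to two equally shaped 2D lists."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical 2x2 reflectance grids standing in for OLI band 5 (NIR) and band 4 (red).
nir_band = [[0.5, 0.3], [0.0, 0.4]]
red_band = [[0.1, 0.3], [0.0, 0.2]]

print(round(ndvi(0.5, 0.1), 3))  # 0.667: strong near-infrared response, dense vegetation
print(ndvi_grid(nir_band, red_band))
```

Values near 1 indicate dense green vegetation and values near 0 indicate bare soil or sparse cover, which is what makes differences between image dates usable as a signal of vegetation change.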
Affiliation(s)
- Polina Lemenkova
  - Laboratory of Image Synthesis and Analysis (LISA), École Polytechnique de Bruxelles (Brussels Faculty of Engineering), Université Libre de Bruxelles (ULB), Building L, Campus du Solbosch, ULB-LISA CP165/57, Avenue Franklin D. Roosevelt 50, 1050 Brussels, Belgium
- Olivier Debeir
  - Laboratory of Image Synthesis and Analysis (LISA), École Polytechnique de Bruxelles (Brussels Faculty of Engineering), Université Libre de Bruxelles (ULB), Building L, Campus du Solbosch, ULB-LISA CP165/57, Avenue Franklin D. Roosevelt 50, 1050 Brussels, Belgium
|
21
|
Banerjee A, Banik P, Wörndl W. A review on individual and multistakeholder fairness in tourism recommender systems. Front Big Data 2023; 6:1168692. [PMID: 37234689 PMCID: PMC10206003 DOI: 10.3389/fdata.2023.1168692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 04/18/2023] [Indexed: 05/28/2023] Open
Abstract
The growing use of Recommender Systems (RS) across various industries, including e-commerce, social media, news, travel, and tourism, has prompted researchers to examine these systems for any biases or fairness concerns. Fairness in RS is a multi-faceted concept ensuring fair outcomes for all stakeholders involved in the recommendation process, and its definition can vary based on the context and domain. This paper highlights the importance of evaluating RS from multiple stakeholders' perspectives, specifically focusing on Tourism Recommender Systems (TRS). Stakeholders in TRS are categorized based on their main fairness criteria, and the paper reviews state-of-the-art research on TRS fairness from various viewpoints. It also outlines the challenges, potential solutions, and research gaps in developing fair TRS. The paper concludes that designing fair TRS is a multi-dimensional process that requires consideration not only of the other stakeholders but also of the environmental impact and effects of overtourism and undertourism.
|
22
|
Brody S, Loree S, Sampson M, Mensinkai S, Coffman J, Mueller MH, Askin N, Hamill C, Wilson E, McAteer MB, Staines H. Searching for evidence in public health emergencies: a white paper of best practices. J Med Libr Assoc 2023; 111:566-578. [PMID: 37312802 PMCID: PMC10259619 DOI: 10.5195/jmla.2023.1530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023] Open
Abstract
Objectives: Information professionals have supported medical providers, administrators and decision-makers, and guideline creators in the COVID-19 response. Searching COVID-19 literature presented new challenges, including the volume and heterogeneity of literature and the proliferation of new information sources, and exposed existing issues in metadata and publishing. An expert panel developed best practices, including recommendations, elaborations, and examples, for searching during public health emergencies. Methods: Project directors and advisors developed core elements from experience and literature. Experts, identified by affiliation with evidence synthesis groups, COVID-19 search experience, and nomination, responded to an online survey to reach consensus on core elements. Expert participants provided written responses to guiding questions. A synthesis of responses provided the foundation for focus group discussions. A writing group then drafted the best practices into a statement. Experts reviewed the statement prior to dissemination. Results: Twelve information professionals contributed to best practice recommendations on six elements: core resources, search strategies, publication types, transparency and reproducibility, collaboration, and conducting research. Underlying principles across recommendations include timeliness, openness, balance, preparedness, and responsiveness. Conclusions: The authors and experts anticipate the recommendations for searching for evidence during public health emergencies will help information specialists, librarians, evidence synthesis groups, researchers, and decision-makers respond to future public health emergencies, including but not limited to disease outbreaks. The recommendations complement existing guidance by addressing concerns specific to emergency response. The statement is intended as a living document. Future revisions should solicit input from a broader community and reflect conclusions of meta-research on COVID-19 and health emergencies.
Affiliation(s)
- Stacy Brody
- Reference & Instruction Librarian, Himmelfarb Health Sciences Library, George Washington University, School of Medicine and Health Sciences, Washington, DC, United States
- Sara Loree
- Medical Library Manager, St. Luke's Health System, ID, United States
- Margaret Sampson
- Children's Hospital of Eastern Ontario Research Institute, Ottawa, ON, Canada
- Jennifer Coffman
- Science and Engineering Research Librarian, University of Virginia, Charlottesville, VA, United States
- Nicole Askin
- WRHA Virtual Library, University of Manitoba, Winnipeg, MB, Canada
- Cheryl Hamill
- South and East Metropolitan Health Services, Perth, Australia
- Emma Wilson
- The University of Edinburgh, Centre for Clinical Brain Sciences, Edinburgh, Scotland
- Mary Beth McAteer
- Virginia Mason Medical Center, Jones Learning Center, Seattle, WA, United States
- Best Practices for Searching During Public Health Emergencies Working Group
- Cheryl Hamill, FALIA, AALIA (CP) Health, 0000-0002-6069-1806, South and East Metropolitan Health Services, Perth, Australia; Maureen Dobbins, RN, PhD, 0000-0002-1968-6765, McMaster University, Canada; Amy M Claussen, MLIS, 0000-0003-3996-1055, University of Minnesota, United States; Kavita Umesh Kothari, MPH, 0000-0002-0759-5225, Health Information Consultant, Kobe, Japan; Caroline De Brún, PhD, 0000-0002-5185-0043, UK Health Security Agency, United Kingdom; Sarah Young, 0000-0002-8301-5106, Carnegie Mellon University, United States; Sarah E Neil-Sztramko, PhD, 0000-0002-9600-3403, McMaster University, Canada; Shaila Mensinkai, MA, MLIS, Librarian Reserve Corps, Canada; Emma Wilson, 0000-0002-8100-7508, The University of Edinburgh, Scotland; Robin M Featherstone, MLIS, 0000-0003-2517-2258, CADTH Canadian Agency for Drugs and Technologies in Health (present affiliation); Cochrane Central Executive Team (sponsor), Toronto, Canada; Margaret Sampson, MLIS, PhD, AHIP, 0000-0003-2550-9893, Children's Hospital of Eastern Ontario Research Institute, Canada; Heather Staines, PhD, MA, 0000-0003-3876-1182, Delta Think, United States; Martha Knuth, MLIS, 0000-0003-4264-1642, Centers for Disease Control and Prevention, United States
23
Teitz J, Sander J, Sarker H, Fernandez-Patron C. Potential of dissimilarity measure-based computation of protein thermal stability data for determining protein interactions. Brief Bioinform 2023; 24:7126339. [PMID: 37068306 DOI: 10.1093/bib/bbad143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 03/02/2023] [Accepted: 03/18/2023] [Indexed: 04/19/2023] Open
Abstract
Determining the interacting proteins in multiprotein complexes can be technically challenging. An emerging biochemical approach to this end is based on the 'thermal proximity co-aggregation' (TPCA) phenomenon. Accordingly, when two or more proteins interact to form a complex, they tend to co-aggregate when subjected to heat-induced denaturation and thus exhibit similar melting curves. Here, we explore the potential of leveraging TPCA for determining protein interactions. We demonstrate that dissimilarity measure-based information retrieval applied to melting curves tends to rank a protein-of-interest's interactors higher than its non-interactors, as shown in the context of pull-down assay results. Consequently, such rankings can reduce the number of confirmatory biochemical experiments needed to find bona fide protein-protein interactions. In general, rankings based on dissimilarity measures generated through metric learning further reduce the required number of experiments compared to those based on standard dissimilarity measures such as Euclidean distance. When a protein mixture's melting curves are obtained in two conditions, we propose a scoring function that uses melting curve data to inform how likely a protein pair is to interact in one condition but not another. We show that ranking protein pairs by their scores is an effective approach for determining condition-specific protein-protein interactions. By contrast, clustering melting curve data generally does not inform about the interacting proteins in multiprotein complexes. In conclusion, we report improved methods for dissimilarity measure-based computation of melting curves data that can greatly enhance the determination of interacting proteins in multiprotein complexes.
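The ranking idea in this abstract can be illustrated with a minimal sketch. It uses plain Euclidean distance, which the abstract names as a standard baseline; it does not reproduce the authors' metric-learning approach or their scoring function. The protein names and melting-curve values below are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two melting curves sampled at the same temperatures."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_candidates(bait, curves):
    """Return the other proteins, ordered from most to least similar melting curve."""
    others = [name for name in curves if name != bait]
    return sorted(others, key=lambda name: euclidean(curves[bait], curves[name]))


# Hypothetical soluble fractions measured at five increasing temperatures.
curves = {
    "bait": [1.00, 0.90, 0.60, 0.20, 0.05],
    "protA": [1.00, 0.88, 0.58, 0.22, 0.05],
    "protB": [1.00, 0.99, 0.95, 0.70, 0.30],
    "protC": [1.00, 0.70, 0.30, 0.05, 0.00],
}

ranking = rank_candidates("bait", curves)
print(ranking)
```

Under the TPCA assumption, candidates near the top of such a ranking would be tested first in confirmatory experiments.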
Affiliation(s)
- Joshua Teitz
- Department of Computing Science, Faculty of Science, 2-32 Athabasca Hall, University of Alberta, Edmonton, AB Canada T6G 2E8
- Joerg Sander
- Department of Computing Science, Faculty of Science, 2-32 Athabasca Hall, University of Alberta, Edmonton, AB Canada T6G 2E8
- Hassan Sarker
- Department of Biochemistry, Faculty of Medicine & Dentistry, College of Health Sciences, 3-19 Medical Sciences Building, University of Alberta, Edmonton, AB Canada T6G 2H7
- Carlos Fernandez-Patron
- Department of Biochemistry, Faculty of Medicine & Dentistry, College of Health Sciences, 3-19 Medical Sciences Building, University of Alberta, Edmonton, AB Canada T6G 2H7
24
Rosonovski S, Levchenko M, Ide-Smith M, Faulk L, Harrison M, McEntyre J. Searching and Evaluating Publications and Preprints Using Europe PMC. Curr Protoc 2023; 3:e694. [PMID: 36946755 DOI: 10.1002/cpz1.694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2023]
Abstract
In the field of life sciences there is a growing need for literature analysis tools that help scientists tackle information overload. Europe PubMed Central (Europe PMC), a partner of PubMed Central (PMC; National Library of Medicine, 2022), is an open access database of over 41 million life science publications and preprints, enriched with supporting data, reviews, protocols, and other relevant resources. Europe PMC is a trusted repository of choice for many life science funders (Europe PMC, 2022a), offering a suite of innovative search tools that allow users to search and evaluate the literature, including finding highly cited articles, preprints with community peer reviews, or papers referencing a proteomics dataset in the figure legend. In addition, Europe PMC utilizes text-mining to help researchers identify key terms and find data and evidence in the literature. First-time users often do not utilize the wealth of tools Europe PMC offers and can feel overwhelmed about how to perform the most effective search. This protocol, describing how to search and evaluate publications and preprints using Europe PMC, demonstrates how to carry out more efficient and effective literature searches using the tools provided by Europe PMC. This includes discovering the latest findings on a research topic, following research from a specific author, journal, or preprint server, exploring literature on a new method, expanding your reading list with relevant articles, as well as accessing and evaluating publications and preprints of interest. © 2023 EMBL-EBI. Current Protocols published by Wiley Periodicals LLC. 
- Basic Protocol 1: Finding articles and preprints on a topic of interest
- Basic Protocol 2: Accessing an article
- Basic Protocol 3: Browsing the article
- Basic Protocol 4: Evaluating the article
- Basic Protocol 5: Refining search results
- Basic Protocol 6: Finding research by author
- Basic Protocol 7: Finding a specific article
- Basic Protocol 8: Finding information about a methodology
- Basic Protocol 9: Finding evidence of biological interactions, relations, and modifications
- Basic Protocol 10: Finding data behind a publication
- Basic Protocol 11: Expanding a reading list and building a bibliography
- Basic Protocol 12: Staying on top of the current literature
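The protocols above describe the Europe PMC website. For readers who prefer scripted searches, the sketch below builds a request URL for what is, to the best of our knowledge, Europe PMC's public REST search endpoint. The endpoint path and the parameter names `query`, `format`, and `pageSize` are assumptions to check against the current Europe PMC web service documentation. The query text is only an example, and the code does not send the request.

```python
from urllib.parse import urlencode

# Assumed base URL of the Europe PMC REST search service; verify before use.
BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query, page_size=25):
    """Build a search URL; fetching and parsing the response is left to the caller."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("systematic review handsearching")
print(url)
```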
Affiliation(s)
- Summer Rosonovski
- Literature services, EMBL-EBI, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Maria Levchenko
- Literature services, EMBL-EBI, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Michele Ide-Smith
- Literature services, EMBL-EBI, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Lynne Faulk
- Literature services, EMBL-EBI, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Melissa Harrison
- Literature services, EMBL-EBI, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Johanna McEntyre
- Literature services, EMBL-EBI, Wellcome Trust Genome Campus, Cambridge, United Kingdom
25
Pallath A, Zhang Q. Paperfetcher: A tool to automate handsearching and citation searching for systematic reviews. Res Synth Methods 2023; 14:323-335. [PMID: 36260090 DOI: 10.1002/jrsm.1604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2022] [Revised: 08/04/2022] [Accepted: 08/20/2022] [Indexed: 11/09/2022]
Abstract
Systematic reviews are vital instruments for researchers to understand broad trends in a field and synthesize evidence on the effectiveness of interventions in addressing specific issues. The quality of a systematic review depends critically on having comprehensively surveyed all relevant literature on the review topic. In addition to database searching, handsearching is an important supplementary technique that helps increase the likelihood of identifying all relevant studies in a literature search. Traditional handsearching requires reviewers to manually browse through a curated list of field-specific journals and conference proceedings to find articles relevant to the review topic. This manual process is not only time-consuming, laborious, costly, and error-prone due to human fatigue, but it also lacks replicability due to its cumbersome manual nature. To address these issues, this paper presents a free and open-source Python package and an accompanying web-app, Paperfetcher, to automate the retrieval of article metadata for handsearching. With Paperfetcher's assistance, researchers can retrieve article metadata from designated journals within a specified time frame in just a few clicks. In addition to handsearching, it also incorporates a beta version of citation searching in both forward and backward directions. Paperfetcher has an easy-to-use interface, which allows researchers to download the metadata of retrieved studies as a list of DOIs or as an RIS file to facilitate seamless import into systematic review screening software. To the best of our knowledge, Paperfetcher is the first tool to automate handsearching with high usability and a multi-disciplinary focus.
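To make the RIS export step concrete, the sketch below serializes a list of records into minimal RIS text using the standard `TY`, `TI`, `DO`, and `ER` tags. It does not use Paperfetcher's own API, which is not described in the abstract. The function name `to_ris` and the record layout are hypothetical, and the sample record is this entry's own citation.

```python
def to_ris(records):
    """Serialize records (dicts with 'title' and 'doi') into minimal RIS text."""
    lines = []
    for rec in records:
        lines.append("TY  - JOUR")             # reference type: journal article
        lines.append(f"TI  - {rec['title']}")  # title
        lines.append(f"DO  - {rec['doi']}")    # DOI
        lines.append("ER  - ")                 # end of record
    return "\n".join(lines) + "\n"


records = [
    {
        "title": "Paperfetcher: A tool to automate handsearching and citation "
                 "searching for systematic reviews",
        "doi": "10.1002/jrsm.1604",
    },
]

ris_text = to_ris(records)
print(ris_text)
```

Screening tools generally expect more fields (authors, journal, year), so a real export would add the corresponding tags.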
Affiliation(s)
- Akash Pallath
- Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Qiyang Zhang
- School of Education, Johns Hopkins University, Baltimore, Maryland, USA
26
Oh IY, Schindler SE, Ghoshal N, Lai AM, Payne PRO, Gupta A. Extraction of clinical phenotypes for Alzheimer's disease dementia from clinical notes using natural language processing. JAMIA Open 2023; 6:ooad014. [PMID: 36844369 PMCID: PMC9952043 DOI: 10.1093/jamiaopen/ooad014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Revised: 01/27/2023] [Accepted: 02/10/2023] [Indexed: 02/28/2023] Open
Abstract
Objectives: There is much interest in utilizing clinical data for developing prediction models for Alzheimer's disease (AD) risk, progression, and outcomes. Existing studies have mostly utilized curated research registries, image analysis, and structured electronic health record (EHR) data. However, much critical information resides in relatively inaccessible unstructured clinical notes within the EHR. Materials and Methods: We developed a natural language processing (NLP)-based pipeline to extract AD-related clinical phenotypes, documenting strategies for success and assessing the utility of mining unstructured clinical notes. We evaluated the pipeline against gold-standard manual annotations performed by 2 clinical dementia experts for AD-related clinical phenotypes including medical comorbidities, biomarkers, neurobehavioral test scores, behavioral indicators of cognitive decline, family history, and neuroimaging findings. Results: Documentation rates for each phenotype varied in the structured versus unstructured EHR. Interannotator agreement was high (Cohen's kappa = 0.72-1) and positively correlated with the NLP-based phenotype extraction pipeline's performance (average F1-score = 0.65-0.99) for each phenotype. Discussion: We developed an automated NLP-based pipeline to extract informative phenotypes that may improve the performance of eventual machine learning predictive models for AD. In the process, we examined documentation practices for each phenotype relevant to the care of AD patients and identified factors for success. Conclusion: Success of our NLP-based phenotype extraction pipeline depended on domain-specific knowledge and focus on a specific clinical domain instead of maximizing generalizability.
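The two metrics reported in this abstract, Cohen's kappa for interannotator agreement and the F1-score for extraction performance, can be computed as in the sketch below. The labels are hypothetical binary annotations (1 = phenotype documented), not data from the study, and the functions assume non-degenerate inputs (expected agreement below 1, at least one positive label).

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)


def f1_score(gold, predicted, positive=1):
    """F1-score of predicted labels against gold labels for the positive class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical annotations for ten notes.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
annotator_2 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]
pipeline = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]

kappa = cohen_kappa(annotator_1, annotator_2)
f1 = f1_score(annotator_1, pipeline)
print(round(kappa, 2), round(f1, 2))
```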
Affiliation(s)
- Inez Y Oh
- Corresponding Author: Inez Y. Oh, Institute for Informatics, Washington University School of Medicine, 660 S. Euclid Ave, Campus Box 8132, St Louis, MO 63110, USA
- Suzanne E Schindler
- Department of Neurology, Washington University School of Medicine, St. Louis, Missouri, USA
- Nupur Ghoshal
- Department of Neurology, Washington University School of Medicine, St. Louis, Missouri, USA; Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri, USA
- Albert M Lai
- Institute for Informatics, Washington University School of Medicine, St. Louis, Missouri, USA
- Philip R O Payne
- Institute for Informatics, Washington University School of Medicine, St. Louis, Missouri, USA
- Aditi Gupta
- Institute for Informatics, Washington University School of Medicine, St. Louis, Missouri, USA; Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri, USA
27
Munarko Y, Rampadarath A, Nickerson DP. CASBERT: BERT-based retrieval for compositely annotated biosimulation model entities. Front Bioinform 2023; 3:1107467. [PMID: 36865672 PMCID: PMC9971925 DOI: 10.3389/fbinf.2023.1107467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Accepted: 01/31/2023] [Indexed: 02/16/2023] Open
Abstract
Maximising FAIRness of biosimulation models requires a comprehensive description of model entities such as reactions, variables, and components. The COmputational Modeling in BIology NEtwork (COMBINE) community encourages the use of Resource Description Framework with composite annotations that semantically involve ontologies to ensure completeness and accuracy. These annotations facilitate scientists to find models or detailed information to inform further reuse, such as model composition, reproduction, and curation. SPARQL has been recommended as a key standard to access semantic annotation with RDF, which helps get entities precisely. However, SPARQL is unsuitable for most repository users who explore biosimulation models freely without adequate knowledge of ontologies, RDF structure, and SPARQL syntax. We propose here a text-based information retrieval approach, CASBERT, that is easy to use and can present candidates of relevant entities from models across a repository's contents. CASBERT adapts Bidirectional Encoder Representations from Transformers (BERT), where each composite annotation about an entity is converted into an entity embedding for subsequent storage in a list of entity embeddings. For entity lookup, a query is transformed to a query embedding and compared to the entity embeddings, and then the entities are displayed in order based on their similarity. The list structure makes it possible to implement CASBERT as an efficient search engine product, with inexpensive addition, modification, and insertion of entity embedding. To demonstrate and test CASBERT, we created a dataset for testing from the Physiome Model Repository and a static export of the BioModels database consisting of query-entities pairs. Measured using Mean Average Precision and Mean Reciprocal Rank, we found that our approach can perform better than the traditional bag-of-words method.
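The lookup step described in this abstract (compare a query embedding with stored entity embeddings, return entities in order of similarity, then score with Mean Reciprocal Rank) can be sketched as follows. The three-dimensional vectors are hypothetical stand-ins for BERT embeddings, cosine similarity is an assumed choice of similarity measure, and the entity names are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def search(query_vec, entity_vecs):
    """Return entity names ordered from most to least similar to the query."""
    return sorted(entity_vecs,
                  key=lambda name: cosine(query_vec, entity_vecs[name]),
                  reverse=True)


def mean_reciprocal_rank(rankings, targets):
    """Average of 1/rank of the relevant entity over all queries."""
    total = sum(1.0 / (ranking.index(target) + 1)
                for ranking, target in zip(rankings, targets))
    return total / len(rankings)


# Hypothetical entity embeddings (real ones would come from a BERT encoder).
entities = {
    "membrane_potential": [0.9, 0.1, 0.0],
    "sodium_current": [0.1, 0.9, 0.2],
    "calcium_concentration": [0.0, 0.2, 0.9],
}
queries = [[0.8, 0.2, 0.1], [0.1, 0.5, 0.6]]
targets = ["membrane_potential", "sodium_current"]

rankings = [search(q, entities) for q in queries]
mrr = mean_reciprocal_rank(rankings, targets)
print(rankings, mrr)
```

Here the first query ranks its target first and the second ranks it second, so the MRR is (1 + 1/2) / 2 = 0.75.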
Affiliation(s)
- Yuda Munarko
- Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand (correspondence: Yuda Munarko)
- Anand Rampadarath
- Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand; The New Zealand Institute for Plant & Food Research Ltd., Auckland, New Zealand
- David P. Nickerson
- Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
28
Munarko Y, Rampadarath A, Nickerson D. Building a search tool for compositely annotated entities using Transformer-based approach: Case study in Biosimulation Model Search Engine (BMSE). F1000Res 2023; 12:162. [PMID: 37842339 PMCID: PMC10570691 DOI: 10.12688/f1000research.128982.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/25/2023] [Indexed: 10/17/2023] Open
Abstract
The Transformer-based approaches to solving natural language processing (NLP) tasks such as BERT and GPT are gaining popularity due to their ability to achieve high performance. These approaches benefit from using enormous data sizes to create pre-trained models and the ability to understand the context of words in a sentence. Their use in the information retrieval domain is thought to increase effectiveness and efficiency. This paper demonstrates a BERT-based method (CASBERT) implementation to build a search tool over data annotated compositely using ontologies. The data was a collection of biosimulation models written using the CellML standard in the Physiome Model Repository (PMR). A biosimulation model structurally consists of basic entities of constants and variables that construct higher-level entities such as components, reactions, and the model. Finding these entities specific to their level is beneficial for various purposes regarding variable reuse, experiment setup, and model audit. Initially, we created embeddings representing compositely-annotated entities for constant and variable search (lowest level entity). Then, these low-level entity embeddings were vertically and efficiently combined to create higher-level entity embeddings to search components, models, images, and simulation setups. Our approach was general, so it can be used to create search tools with other data semantically annotated with ontologies - biosimulation models encoded in the SBML format, for example. Our tool is named Biosimulation Model Search Engine (BMSE).
Affiliation(s)
- Yuda Munarko
- Auckland Bioengineering Institute, University of Auckland, Auckland, 1010, New Zealand
- Anand Rampadarath
- Auckland Bioengineering Institute, University of Auckland, Auckland, 1010, New Zealand
- The New Zealand Institute for Plant and Food Research Limited, Auckland, New Zealand
- David Nickerson
- Auckland Bioengineering Institute, University of Auckland, Auckland, 1010, New Zealand
29
Bakheet S, Al-Hamadi A, Soliman E, Heshmat M. Hybrid Bag-of-Visual-Words and FeatureWiz Selection for Content-Based Visual Information Retrieval. Sensors (Basel) 2023; 23:1653. [PMID: 36772705 PMCID: PMC9919877 DOI: 10.3390/s23031653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Revised: 01/30/2023] [Accepted: 01/30/2023] [Indexed: 06/18/2023]
Abstract
Recently, content-based image retrieval (CBIR) based on bag-of-visual-words (BoVW) model has been one of the most promising and increasingly active research areas. In this paper, we propose a new CBIR framework based on the visual words fusion of multiple feature descriptors to achieve an improved retrieval performance, where interest points are separately extracted from an image using features from accelerated segment test (FAST) and speeded-up robust features (SURF). The extracted keypoints are then fused together in a single keypoint feature vector and the improved RootSIFT algorithm is applied to describe the region surrounding each keypoint. Afterward, the FeatureWiz algorithm is employed to reduce features and select the best features for the BoVW learning model. To create the codebook, K-means clustering is applied to quantize visual features into a smaller set of visual words. Finally, the feature vectors extracted from the BoVW model are fed into a support vector machines (SVMs) classifier for image retrieval. An inverted index technique based on cosine distance metric is applied to sort the retrieved images to the similarity of the query image. Experiments on three benchmark datasets (Corel-1000, Caltech-10 and Oxford Flower-17) show that the presented CBIR technique can deliver comparable results to other state-of-the-art techniques, by achieving average accuracies of 92.94%, 98.40% and 84.94% on these datasets, respectively.
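The core bag-of-visual-words step in this abstract (quantize local descriptors against a codebook, build a histogram, rank by cosine distance) can be sketched as below. The sketch assumes the codebook has already been learned (the paper uses K-means) and omits keypoint detection, RootSIFT, FeatureWiz selection, the SVM classifier, and the inverted index. The two-dimensional descriptors and image names are hypothetical, and each image is assumed to have at least one descriptor.

```python
import math


def nearest_word(descriptor, codebook):
    """Index of the codebook entry (visual word) closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(descriptor, codebook[i]))


def bovw_histogram(descriptors, codebook):
    """Normalized histogram of visual-word occurrences for one image."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist]


def cosine_distance(u, v):
    """1 minus cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


# Hypothetical codebook of three visual words and toy local descriptors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
query = bovw_histogram([(0.1, 0.0), (0.9, 0.1), (1.0, 0.0), (0.1, 0.9)], codebook)
database = {
    "img_a": bovw_histogram([(0.0, 0.1), (1.1, 0.0), (0.8, 0.1), (0.0, 1.0)], codebook),
    "img_b": bovw_histogram([(0.0, 0.9), (0.1, 1.0), (0.0, 0.0), (0.2, 0.8)], codebook),
}

ranked = sorted(database, key=lambda name: cosine_distance(query, database[name]))
print(query, ranked)
```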
Affiliation(s)
- Samy Bakheet
- Faculty of Computers and Artificial Intelligence, Sohag University, Sohag 82524, Egypt
- Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, 39106 Magdeburg, Germany
- Ayoub Al-Hamadi
- Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, 39106 Magdeburg, Germany
- Emadeldeen Soliman
- Faculty of Computers and Artificial Intelligence, Sohag University, Sohag 82524, Egypt
- Mohamed Heshmat
- Faculty of Computers and Artificial Intelligence, Sohag University, Sohag 82524, Egypt
30
Mavragani A, Sandsdalen V, Manskow US, Småbrekke L, Waaseth M. Internet Use for Obtaining Medicine Information: Cross-sectional Survey. JMIR Form Res 2023; 7:e40466. [PMID: 36729577 PMCID: PMC9936360 DOI: 10.2196/40466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Revised: 12/23/2022] [Accepted: 12/27/2022] [Indexed: 02/03/2023] Open
Abstract
BACKGROUND: The internet is increasingly being used as a source of medicine-related information. People want information to facilitate decision-making and self-management, and they tend to prefer the internet for ease of access. However, it is widely acknowledged that the quality of web-based information varies. Poor interpretation of medicine information can lead to anxiety and poor adherence to drug therapy. It is therefore important to understand how people search, select, and trust medicine information. OBJECTIVE: The objectives of this study were to establish the extent of internet use for seeking medicine information among Norwegian pharmacy customers, analyze factors associated with internet use, and investigate the level of trust in different sources and websites. METHODS: This is a cross-sectional study with a convenience sample of pharmacy customers recruited from all but one community pharmacy in Tromsø, a medium size municipality in Norway (77,000 inhabitants). Persons (aged ≥16 years) able to complete a questionnaire in Norwegian were asked to participate in the study. The recruitment took place in September and October 2020. Due to COVID-19 restrictions, social media was also used to recruit medicine users. RESULTS: A total of 303 respondents reported which sources they used to obtain information about their medicines (both prescription and over the counter) and to what extent they trusted these sources. A total of 125 (41.3%) respondents used the internet for medicine information, and the only factor associated with internet use was age. The odds of using the internet declined by 5% per year of age (odds ratio 0.95, 95% CI 0.94-0.97; P=.048). We found no association between internet use and gender, level of education, or regular medicine use. The main purpose reported for using the internet was to obtain information about side effects. Other main sources of medicine information were physicians (n=191, 63%), pharmacy personnel (n=142, 47%), and medication package leaflets (n=124, 42%), while 36 (12%) respondents did not obtain medicine information from any sources. Note that 272 (91%) respondents trusted health professionals as a source of medicine information, whereas 58 (46%) respondents who used the internet trusted the information they found on the internet. The most reliable websites were the national health portals and other official health information sites. CONCLUSIONS: Norwegian pharmacy customers use the internet as a source of medicine information, but most still obtain medicine information from health professionals and packet leaflets. People are aware of the potential for misinformation on websites, and they mainly trust high-quality sites run by health authorities.
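As a worked reading of the reported odds ratio of 0.95 per year of age, the sketch below compounds the per-year effect over a larger age gap. It assumes the ratio comes from a logistic regression with a constant per-year effect, which the abstract does not state explicitly; the function name is hypothetical.

```python
def odds_ratio_over(years, per_year=0.95):
    """Odds ratio for an age difference of `years`, assuming a constant per-year effect."""
    return per_year ** years


ten_year_ratio = odds_ratio_over(10)
print(round(ten_year_ratio, 3))
```

Under that assumption, a respondent 10 years older would have roughly 0.60 times the odds of using the internet for medicine information, that is, about 40% lower odds.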
Affiliation(s)
- Vilde Sandsdalen
- Department of Pharmacy, UiT The Arctic University of Norway, Tromsø, Norway
- Unn Sollid Manskow
- Norwegian Centre for E-health Research, University Hospital of North Norway, Tromsø, Norway
- Lars Småbrekke
- Department of Pharmacy, UiT The Arctic University of Norway, Tromsø, Norway
- Marit Waaseth
- Department of Pharmacy, UiT The Arctic University of Norway, Tromsø, Norway
31
Alderden JG, Sharkey PD, Kennerly SM, Ghosh S, Barrett RS, Horn SD, Ghosh S, Yap TL. Developing a Relational Database for Best Practice Data Management: The Turn Everyone and Move for Ulcer Prevention Database. Comput Inform Nurs 2023; 41:59-65. [PMID: 36735569 PMCID: PMC10153087 DOI: 10.1097/cin.0000000000001011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Affiliation(s)
- Jenny Grace Alderden
- Author Affiliations: Boise State University (Dr Alderden), ID; Sellinger School of Business, Loyola University Maryland (Dr Sharkey), Baltimore; East Carolina University (Dr Kennerly), Greenville, NC; Duke University (Mr Sanjay Ghosh), Durham, NC; Acima (Mr Barrett), Draper, UT; School of Medicine, University of Utah (Dr Horn), Salt Lake City; University of North Carolina, Charlotte (Ms Sayoni Ghosh); and Duke University (Dr Yap), Durham, NC
32
Weinzierl MA, Harabagiu SM. Epidemic Question Answering: question generation and entailment for Answer Nugget discovery. J Am Med Inform Assoc 2023; 30:329-339. [PMID: 36394232 PMCID: PMC9846678 DOI: 10.1093/jamia/ocac222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 10/31/2022] [Accepted: 11/03/2022] [Indexed: 11/18/2022] Open
Abstract
OBJECTIVE: The rapidly growing body of communications during the COVID-19 pandemic posed a challenge to information seekers, who struggled to find answers to their specific and changing information needs. We designed a Question Answering (QA) system capable of answering ad-hoc questions about the COVID-19 disease, its causal virus SARS-CoV-2, and the recommended response to the pandemic. MATERIALS AND METHODS: The QA system incorporates, in addition to relevance models, automatic generation of questions from relevant sentences. We relied on entailment between questions for (1) pinpointing answers and (2) selecting novel answers early in the list of its results. RESULTS: The QA system produced state-of-the-art results when processing questions asked by experts (eg, researchers, scientists, or clinicians) and competitive results when processing questions asked by consumers of health information. Although state-of-the-art models for question generation and question entailment were used, more than half of the answers were missed, due to the limitations of the relevance models employed. DISCUSSION: Although question entailment enabled by automatic question generation is the cornerstone of our QA system's architecture, question entailment did not prove to always be reliable or sufficient in ranking the answers. Question entailment should be enhanced with additional inferential capabilities. CONCLUSION: The QA system presented in this article produced state-of-the-art results processing expert questions and competitive results processing consumer questions. Improvements should be considered by using better relevance models and enhanced inference methods. Moreover, experts and consumers have different answer expectations, which should be accounted for in future QA development.
Affiliation(s)
- Maxwell A Weinzierl
- Human Language Technology Research Institute, Department of Computer Science, University of Texas at Dallas, Richardson, Texas, USA
- Sanda M Harabagiu
- Human Language Technology Research Institute, Department of Computer Science, University of Texas at Dallas, Richardson, Texas, USA
33
Lee G, Jo W, Choi Y. VERD: Emergence of Product-Based Video E-Commerce Retrieval Dataset from User's Perspective. Sensors (Basel) 2023; 23:513. [PMID: 36617111 PMCID: PMC9824814 DOI: 10.3390/s23010513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2022] [Revised: 12/28/2022] [Accepted: 12/30/2022] [Indexed: 06/17/2023]
Abstract
Customer demands for product search are growing as a result of the recent growth of the e-commerce market. According to this trend, studies on object-centric retrieval using product images have emerged, but it is difficult to respond to complex user-environment scenarios and a search requires a vast amount of data. In this paper, we propose the Video E-commerce Retrieval Dataset (VERD), which utilizes user-perspective videos. In addition, a benchmark and additional experiments are presented to demonstrate the need for independent research on product-centered video-based retrieval. VERD is publicly accessible for academic research and can be downloaded by contacting the author by email.
Collapse
Affiliation(s)
- Gwangjin Lee
- Department of Intelligent Mechatronics Engineering, Sejong University, Seoul 05006, Republic of Korea
| | - Won Jo
- Department of Artificial Intelligence, Sejong University, Seoul 05006, Republic of Korea
| | - Yukyung Choi
- Department of Intelligent Mechatronics Engineering, Sejong University, Seoul 05006, Republic of Korea
- Department of Artificial Intelligence, Sejong University, Seoul 05006, Republic of Korea
| |
|
34
|
O'Keefe H, Rankin J, Wallace SA, Beyer F. Investigation of text-mining methodologies to aid the construction of search strategies in systematic reviews of diagnostic test accuracy-a case study. Res Synth Methods 2023; 14:79-98. [PMID: 35841125 PMCID: PMC10088010 DOI: 10.1002/jrsm.1593] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/25/2022] [Revised: 07/01/2022] [Accepted: 07/08/2022] [Indexed: 01/18/2023]
Abstract
Current methodologies for designing search strategies rely heavily on the knowledge and expertise of information specialists. Yet, the volume and complexity of scientific literature is overwhelming for even the most experienced information specialists, making it difficult to produce robust search strategies for complex systematic reviews. In this case study, we aimed to assess and describe the benefits and limitations of using semi-automated text-mining tools for designing search strategies in a systematic review of diagnostic test accuracy. An experienced information specialist designed a search strategy using traditional methods. This strategy was then amended to include additional terms identified by text-mining tools. We evaluated the usability and expertise required, risk of introducing bias to the search, precision of the search strategy and rated the usefulness of the tools. Thirteen of the 16 investigated tools produced a total of 40 additional terms, beyond those in the original search strategy. This resulted in 11 previously unidentified relevant articles being retrieved. Precision was reduced or remained the same in all cases. After considering all aspects of the investigation we rated each application, with two being 'extremely useful', three being 'useful', three having 'no impact' and eight being 'not very useful'. Comparative analysis revealed discrepancies between similar tools. Our findings have implications for the way in which these methodologies are used and applied to search strategies. If semi-automated techniques are to become mainstream in information retrieval for complex systematic reviews, we need tailored tools that fit information specialists' requirements across disciplines.
Affiliation(s)
- Hannah O'Keefe
- Evidence Synthesis Group, National Institute for Health Research (NIHR) Innovation Observatory, Newcastle University, Newcastle Upon Tyne, UK
| | - Judith Rankin
- Maternal and Child Health, Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Medical School, Newcastle Upon Tyne, UK
| | - Sheila A Wallace
- Cochrane Incontinence, Evidence Synthesis Group, Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle Upon Tyne, UK
| | - Fiona Beyer
- Evidence Synthesis Group, National Institute for Health Research (NIHR) Innovation Observatory, Newcastle University, Newcastle Upon Tyne, UK
| |
|
35
|
Rodrigues J, Liu H, Folgado D, Belo D, Schultz T, Gamboa H. Feature-Based Information Retrieval of Multimodal Biosignals with a Self-Similarity Matrix: Focus on Automatic Segmentation. Biosensors (Basel) 2022; 12:1182. [PMID: 36551149 PMCID: PMC9776348 DOI: 10.3390/bios12121182] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 10/01/2022] [Revised: 12/14/2022] [Accepted: 12/15/2022] [Indexed: 05/27/2023]
Abstract
Biosignal-based technology has become increasingly available in our daily life and is a critical information source. Wearable biosensors have been widely applied in, among others, biometrics, sports, health care, rehabilitation assistance, and edutainment. Continuous data collection from biodevices provides a valuable volume of information, which needs to be curated and prepared before serving machine learning applications. One of the universal preparation steps is data segmentation and labelling/annotation. This work proposes a practical and manageable way to automatically segment and label single-channel or multimodal biosignal data using a self-similarity matrix (SSM) computed with signals' feature-based representation. Applied to public biosignal datasets and a benchmark for change point detection, the proposed approach delivered lucid visual support in interpreting the biosignals with the SSM while performing accurate automatic segmentation of biosignals with the help of the novelty function and associating the segments grounded on their similarity measures with the similarity profiles. The proposed method outperformed other algorithms in most cases across a series of automatic biosignal segmentation tasks; of equal appeal is that it provides an intuitive visualization for information retrieval of multimodal biosignals.
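The abstract names a self-similarity matrix and a novelty function but gives no implementation details. The following is a minimal sketch of the general idea, assuming cosine similarity between feature frames and a checkerboard kernel slid along the main diagonal (the classic novelty formulation); the function names, the kernel, and the toy features are illustrative assumptions, not the authors' code.

```python
import numpy as np

def self_similarity(features):
    """Cosine self-similarity matrix for an (n_frames, n_features) array."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return unit @ unit.T

def novelty(ssm, half):
    """Slide a (2*half x 2*half) checkerboard kernel along the main diagonal."""
    sign = np.r_[-np.ones(half), np.ones(half)]
    kernel = np.outer(sign, sign)  # +1 on within-segment blocks, -1 on cross-segment blocks
    padded = np.pad(ssm, half, mode="constant")  # zero-pad so every frame has a full window
    n = ssm.shape[0]
    return np.array([
        np.sum(padded[i:i + 2 * half, i:i + 2 * half] * kernel) for i in range(n)
    ])

# Toy signal (assumed data): 20 frames of one regime followed by 20 frames of another.
features = np.vstack([np.tile([1.0, 0.0], (20, 1)), np.tile([0.0, 1.0], (20, 1))])
curve = novelty(self_similarity(features), half=5)
boundary = int(np.argmax(curve))  # index of the strongest change point
```

On this toy input the novelty curve peaks at the frame where the regime changes; on real biosignals the peaks would need thresholding or peak picking, which the sketch leaves out.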
Affiliation(s)
- João Rodrigues
- Laboratory for Instrumentation, Biomedical Engineering and Radiation Physics, NOVA School of Science and Technology, Campus da Caparica, 2829-516 Caparica, Portugal
- Cognitive Systems Lab, University of Bremen, Bibliothekstraße 1, 28359 Bremen, Germany
| | - Hui Liu
- Laboratory for Instrumentation, Biomedical Engineering and Radiation Physics, NOVA School of Science and Technology, Campus da Caparica, 2829-516 Caparica, Portugal
- Cognitive Systems Lab, University of Bremen, Bibliothekstraße 1, 28359 Bremen, Germany
| | - Duarte Folgado
- Laboratory for Instrumentation, Biomedical Engineering and Radiation Physics, NOVA School of Science and Technology, Campus da Caparica, 2829-516 Caparica, Portugal
- Associação Fraunhofer Portugal Research, Rua Alfredo Allen 455/461, 4200-135, Porto, Portugal
| | - David Belo
- Associação Fraunhofer Portugal Research, Rua Alfredo Allen 455/461, 4200-135, Porto, Portugal
| | - Tanja Schultz
- Cognitive Systems Lab, University of Bremen, Bibliothekstraße 1, 28359 Bremen, Germany
| | - Hugo Gamboa
- Laboratory for Instrumentation, Biomedical Engineering and Radiation Physics, NOVA School of Science and Technology, Campus da Caparica, 2829-516 Caparica, Portugal
| |
|
36
|
Focsa M, Tan C, Chen M, Yan M, Zhang N, Huang S, Liu X. State-of-the-Art Evidence Retriever for Precision Medicine: Algorithm Development and Validation. JMIR Med Inform 2022; 10:e40743. [PMID: 36409468 PMCID: PMC9801267 DOI: 10.2196/40743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Revised: 11/13/2022] [Accepted: 11/16/2022] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Under the paradigm of precision medicine (PM), patients with the same disease can receive different personalized therapies according to their clinical and genetic features. These therapies are determined by the totality of all available clinical evidence, including results from case reports, clinical trials, and systematic reviews. However, it is increasingly difficult for physicians to find such evidence from scientific publications, whose size is growing at an unprecedented pace. OBJECTIVE In this work, we propose the PM-Search system to facilitate the retrieval of clinical literature that contains critical evidence for or against giving specific therapies to certain cancer patients. METHODS The PM-Search system combines a baseline retriever that selects document candidates at a large scale and an evidence reranker that finely reorders the candidates based on their evidence quality. The baseline retriever uses query expansion and keyword matching with the ElasticSearch retrieval engine, and the evidence reranker fits pretrained language models to expert annotations that are derived from an active learning strategy. RESULTS The PM-Search system achieved the best performance in the retrieval of high-quality clinical evidence at the Text Retrieval Conference PM Track 2020, outperforming the second-ranking systems by large margins (0.4780 vs 0.4238 for standard normalized discounted cumulative gain at rank 30 and 0.4519 vs 0.4193 for exponential normalized discounted cumulative gain at rank 30). CONCLUSIONS We present PM-Search, a state-of-the-art search engine to assist the practice of evidence-based PM. PM-Search uses a novel Bidirectional Encoder Representations from Transformers for Biomedical Text Mining-based active learning strategy that models evidence quality and improves the model performance. Our analyses show that evidence quality is a distinct aspect from general relevance, and specific modeling of evidence quality beyond general relevance is required for a PM search engine.
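The results above are reported as standard and exponential normalized discounted cumulative gain at rank 30. As a rough illustration of how the two variants differ, here is a minimal sketch using the common textbook definitions (linear gain `r` versus exponential gain `2**r - 1`, with a `log2` rank discount). The exact gain mapping used by the evaluation track is not stated in the abstract, so the function names, the gain definitions, and the sample grades are assumptions.

```python
import math

def dcg(gains, k):
    """Discounted cumulative gain over the first k gains (ranks start at 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg(ranked_grades, judged_grades, k, exponential=False):
    """nDCG@k for one query.

    ranked_grades: relevance grades of the returned documents, in rank order.
    judged_grades: grades of all judged documents for the query (defines the ideal ranking).
    """
    gain = (lambda r: 2 ** r - 1) if exponential else (lambda r: r)
    ideal = sorted((gain(r) for r in judged_grades), reverse=True)
    best = dcg(ideal, k)
    return dcg([gain(r) for r in ranked_grades], k) / best if best > 0 else 0.0

# Hypothetical grades: the system ranks the best document last.
grades = [0, 1, 2]
standard = ndcg(grades, grades, k=3)
exponential = ndcg(grades, grades, k=3, exponential=True)
```

Because the exponential gain weights highly graded documents more heavily, misplacing the top document costs more under the exponential variant than under the standard one.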
Affiliation(s)
| | | | | | | | | | | | - Xiaozhong Liu
- Indiana University Bloomington, Bloomington, IN, United States
| |
|
37
|
Williams-Lekuona M, Cosma G, Phillips I. A Framework for Enabling Unpaired Multi-Modal Learning for Deep Cross-Modal Hashing Retrieval. J Imaging 2022; 8:jimaging8120328. [PMID: 36547493 PMCID: PMC9785405 DOI: 10.3390/jimaging8120328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Revised: 11/30/2022] [Accepted: 12/06/2022] [Indexed: 12/23/2022] Open
Abstract
Cross-Modal Hashing (CMH) retrieval methods have garnered increasing attention within the information retrieval research community due to their capability to deal with large amounts of data thanks to the computational efficiency of hash-based methods. To date, the focus of cross-modal hashing methods has been on training with paired data. Paired data refers to samples with one-to-one correspondence across modalities, e.g., image and text pairs where the text sample describes the image. However, real-world applications produce unpaired data that cannot be utilised by most current CMH methods during the training process. Models that can learn from unpaired data are crucial for real-world applications such as cross-modal neural information retrieval where paired data is limited or not available to train the model. This paper provides (1) an overview of the CMH methods when applied to unpaired datasets, (2) proposes a framework that enables pairwise-constrained CMH methods to train with unpaired samples, and (3) evaluates the performance of state-of-the-art CMH methods across different pairing scenarios.
|
38
|
Hu YJ, Fedyukova A, Wang J, Said JM, Thomas N, Noble E, Cheong JLY, Karanatsios B, Goldfeld S, Wake M. Improving Cohort-Hospital Matching Accuracy through Standardization and Validation of Participant Identifiable Information. Children (Basel) 2022; 9. [PMID: 36553359 DOI: 10.3390/children9121916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Revised: 11/25/2022] [Accepted: 12/03/2022] [Indexed: 12/12/2022]
Abstract
Linking very large, consented birth cohorts to birthing hospitals' clinical data could elucidate the lifecourse outcomes of health care and exposures during the pregnancy, birth and newborn periods. Unfortunately, cohort personally identifiable information (PII) often does not include unique identifier numbers, presenting matching challenges. To develop optimized cohort matching to birthing hospital clinical records, this pilot drew on a one-year (December 2020-December 2021) cohort for a single Australian birthing hospital participating in the whole-of-state Generation Victoria (GenV) study. For 1819 consented mother-baby pairs and 58 additional babies (whose mothers were not themselves participating), we tested the accuracy and effort of various approaches to matching. We selected demographic variables drawn from names, DOB, sex, telephone, address (and birth order for multiple births). After variable standardization and validation, accuracy rose from 10% to 99% using a deterministic-rule-based approach in 10 steps. Using cohort-specific modifications of the Australian Statistical Linkage Key (SLK-581), it took only 3 steps to reach 97% (SLK-5881) and 98% (SLK-5881.1) accuracy. We conclude that our SLK-5881 process could safely and efficiently achieve high accuracy at the population level for future birth cohort-birth hospital matching in the absence of unique identifier numbers.
|
39
|
Levay P, Heath A, Tuvey D. Efficient searching for NICE public health guidelines: Would using fewer sources still find the evidence? Res Synth Methods 2022; 13:760-789. [PMID: 35657294 PMCID: PMC9795891 DOI: 10.1002/jrsm.1577] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/25/2021] [Revised: 05/13/2022] [Accepted: 05/31/2022] [Indexed: 12/30/2022]
Abstract
Systematic searches are integral to identifying the evidence that is used in National Institute for Health and Care Excellence (NICE) public health guidelines (PHGs). This study analyses the sources, including bibliographic databases and other techniques, required for PHGs. The aims were to analyse the sources used to identify the publications included in NICE PHGs; and to assess whether fewer sources could have been searched to retrieve these publications. Data showing how the included publications had been identified was collated using search summary tables. Three scenarios were created to test various combinations of sources to determine whether fewer sources could have been used. The sample included 29 evidence reviews, compiled using 13 searches, to support 10 PHG topics. Across the PHGs, 23 databases and six other techniques retrieved included publications. A mean reduction in total results of 6.5% could have been made if the minimum set of sources plus Cochrane Library, Embase, and MEDLINE were searched. On average, Cochrane Library, Embase, and MEDLINE contributed 76.8% of the included publications, with other databases adding 11% and other techniques 12.2%. None of the searches had a minimum set that was comprised entirely of databases. There was not a core set of sources for PHGs. A range of databases and techniques, covering a multi-disciplinary evidence base, was required to identify all included publications. It would be possible to reduce the number of sources searched and make some gains in productivity. It is important to create a tailored set of sources to do an efficient search.
Affiliation(s)
- Paul Levay
- Information Services, National Institute for Health and Care Excellence (NICE), Manchester, UK
| | - Andrea Heath
- Information Services, National Institute for Health and Care Excellence (NICE), London, UK
| | - Daniel Tuvey
- Information Services, National Institute for Health and Care Excellence (NICE), London, UK
| |
|
40
|
Urru S, Sciannameo V, Lanera C, Salaris S, Gregori D, Berchialla P. A topic trend analysis on COVID-19 literature. Digit Health 2022; 8:20552076221133696. [PMID: 36325437 PMCID: PMC9619924 DOI: 10.1177/20552076221133696] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/20/2022] [Accepted: 09/30/2022] [Indexed: 11/06/2022] Open
Abstract
Objective In the past 2 years, the number of scientific publications has grown exponentially. The COVID-19 outbreak hugely contributed to this dramatic increase in the volume of published research. Currently, text mining of the volume of SARS-CoV-2 and COVID-19 publications is limited to the first months of the outbreak. We aim to identify the major topics in COVID-19 literature collected from several citational sources and analyze the temporal trend from November 2019 to December 2021. Methods We performed an extensive literature search on SARS-CoV-2 and COVID-19 publications on PubMed, Scopus, and Web of Science (WoS) and a structural topic modelling on the retrieved abstracts. The temporal trend of the recognized topics was analyzed. Furthermore, a comparison between our corpus and the COVID-19 Open Research Dataset (CORD-19) repository was performed. Results We collected 269,186 publications and identified 10 topics. The most popular topic was related to the clinical pictures of the COVID-19 outbreak, which has a constant trend, and the least popular includes studies on COVID-19 literature and databases. "Telemedicine", "Vaccine development", and "Epidemiology" were popular topics in the early phase of the pandemic; increasing topics in the last period are "COVID-19 impact on mental health", "Forecasting", and "Molecular Biology". "Education" was the second most popular topic, which emerged in September 2020. Conclusions We identified 10 topics for classifying COVID-19 research publications and estimated a nonlinear temporal trend that gives an overview of their unfolding over time. Several citational databases must be searched to retrieve a complete set of studies despite the efforts to build repositories for COVID-19 literature. Our collected data can help build a more focused literature search between November 2019 and December 2021 when carrying out systematic and rapid reviews and our findings can give a complete picture on the topic.
Affiliation(s)
- Sara Urru
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padua, Italy
| | - Veronica Sciannameo
- Center of Biostatistics, Epidemiology and Public Health, Department of Clinical and Biological Sciences, University of Torino, Turin, Italy
| | - Corrado Lanera
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padua, Italy
| | - Silvano Salaris
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padua, Italy
| | - Dario Gregori
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padua, Italy
| | - Paola Berchialla
- Center of Biostatistics, Epidemiology and Public Health, Department of Clinical and Biological Sciences, University of Torino, Turin, Italy
- Paola Berchialla, Center of Biostatistics, Epidemiology and Public Health, Department of Clinical and Biological Sciences, University of Torino, Regione Gonzole 10, Turin, 10043 Orbassano, Italy.
| |
|
41
|
Ebeid IA. MedGraph: A semantic biomedical information retrieval framework using knowledge graph embedding for PubMed. Front Big Data 2022; 5:965619. [PMID: 36338335 PMCID: PMC9627348 DOI: 10.3389/fdata.2022.965619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 09/20/2022] [Indexed: 01/24/2023] Open
Abstract
Here we study the semantic search and retrieval problem in biomedical digital libraries. First, we introduce MedGraph, a knowledge graph embedding-based method that provides semantic relevance retrieval and ranking for the biomedical literature indexed in PubMed. Second, we evaluate our approach using PubMed's Best Match algorithm. Moreover, we compare our method MedGraph to a traditional TF-IDF-based algorithm. Third, we use a dataset extracted from PubMed, including 30 million articles' metadata such as abstracts, author information, citation information, and extracted biological entity mentions. We pull a subset of the dataset to evaluate MedGraph using predefined queries with ground truth ranked results. To our knowledge, this technique has not been explored before in biomedical information retrieval. In addition, our results provide some evidence that semantic approaches to search and relevance in biomedical digital libraries that rely on knowledge graph modeling offer better search relevance results when compared with traditional methods in terms of objective metrics.
|
42
|
Zhang C, Zhou Q, Qiao M, Tang K, Xu L, Liu F. Re_Trans: Combined Retrieval and Transformer Model for Source Code Summarization. Entropy (Basel) 2022; 24:1372. [PMID: 37420392 DOI: 10.3390/e24101372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Revised: 09/20/2022] [Accepted: 09/23/2022] [Indexed: 07/09/2023]
Abstract
Source code summarization (SCS) is a natural language description of source code functionality. It can help developers understand programs and maintain software efficiently. Retrieval-based methods generate SCS by reorganizing terms selected from source code or by using the SCS of similar code snippets. Generative methods generate SCS via an attentional encoder-decoder architecture. A generative method can generate SCS for any code, but its accuracy is sometimes still far from expectation (due to the lack of numerous high-quality training sets). A retrieval-based method is considered to have higher accuracy, but usually fails to generate SCS for a source code in the absence of a similar candidate in the database. In order to effectively combine the advantages of retrieval-based methods and generative methods, we propose a new method: Re_Trans. For a given code, we first utilize the retrieval-based method to obtain its most semantically similar code and the corresponding SCS (S_RM). Then, we input the given code and the similar code into the trained discriminator. If the discriminator outputs one, we take S_RM as the result; otherwise, we utilize the generative model, a transformer, to generate the given code's SCS. In particular, we use AST-augmented (Abstract Syntax Tree) and code sequence-augmented information to make the source code semantic extraction more complete. Furthermore, we build a new SCS retrieval library through the public dataset. We evaluate our method on a dataset of 2.1 million Java code-comment pairs, and experimental results show improvement over the state-of-the-art (SOTA) benchmarks, which demonstrates the effectiveness and efficiency of our method.
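The retrieve-then-decide control flow described above can be sketched as follows. This is only an illustration of the branching logic: token-level Jaccard similarity with a fixed threshold stands in for the paper's trained discriminator, and `generate` is a placeholder callable standing in for the transformer model. All names, the threshold, and the sample library are assumptions.

```python
def jaccard(a, b):
    """Token-set overlap between two whitespace-tokenized code strings."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0

def summarize(code, library, generate, threshold=0.8):
    """Return (summary, source), where source is 'retrieved' or 'generated'.

    library:  list of (code, summary) pairs acting as the retrieval library.
    generate: fallback callable mapping code to a summary (stand-in for the generator).
    """
    if library:
        best_code, best_summary = max(library, key=lambda pair: jaccard(code, pair[0]))
        # Stand-in for the discriminator: accept the retrieved summary only
        # when the candidate is similar enough to the given code.
        if jaccard(code, best_code) >= threshold:
            return best_summary, "retrieved"
    return generate(code), "generated"

# Hypothetical one-entry library and a dummy generator.
library = [("int add ( int a , int b ) { return a + b ; }", "Adds two integers.")]
dummy_generate = lambda code: "<generated summary>"

hit = summarize("int add ( int a , int b ) { return a + b ; }", library, dummy_generate)
miss = summarize("void print ( String s ) { System.out.println ( s ) ; }", library, dummy_generate)
```

The design point is that the generator is only called when retrieval has no trustworthy candidate, so the retrieval path keeps its accuracy while the generative path guarantees coverage.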
Affiliation(s)
- Chunyan Zhang
- State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
| | - Qinglei Zhou
- School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China
| | - Meng Qiao
- State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
| | - Ke Tang
- State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
| | - Lianqiu Xu
- State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
| | - Fudong Liu
- State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
| |
|
43
|
Noor K, Roguski L, Bai X, Handy A, Klapaukh R, Folarin A, Romao L, Matteson J, Lea N, Zhu L, Asselbergs FW, Wong WK, Shah A, Dobson RJ. Deployment of a Free-Text Analytics Platform at a UK National Health Service Research Hospital: CogStack at University College London Hospitals. JMIR Med Inform 2022; 10:e38122. [PMID: 36001371 PMCID: PMC9453582 DOI: 10.2196/38122] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/30/2022] [Revised: 06/05/2022] [Accepted: 07/01/2022] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND As more health care organizations transition to using electronic health record (EHR) systems, it is important for these organizations to maximize the secondary use of their data to support service improvement and clinical research. These organizations will find it challenging to have systems capable of harnessing the unstructured data fields in the record (clinical notes, letters, etc) and more practically have such systems interact with all of the hospital data systems (legacy and current). OBJECTIVE We describe the deployment of the EHR interfacing information extraction and retrieval platform CogStack at University College London Hospitals (UCLH). METHODS At UCLH, we have deployed the CogStack platform, an information retrieval platform with natural language processing capabilities. The platform addresses the problem of data ingestion and harmonization from multiple data sources using the Apache NiFi module for managing complex data flows. The platform also facilitates the extraction of structured data from free-text records through use of the MedCAT natural language processing library. Finally, data science tools are made available to support data scientists and the development of downstream applications dependent upon data ingested and analyzed by CogStack. RESULTS The platform has been deployed at the hospital, and in particular, it has facilitated a number of research and service evaluation projects. To date, we have processed over 30 million records, and the insights produced from CogStack have informed a number of clinical research use cases at the hospital. CONCLUSIONS The CogStack platform can be configured to handle the data ingestion and harmonization challenges faced by a hospital. More importantly, the platform enables the hospital to unlock important clinical information from the unstructured portion of the record using natural language processing technology.
Affiliation(s)
- Kawsar Noor
- University College London, London, United Kingdom
- Institute of Health Informatics, University College London, London, United Kingdom
- National Institute for Health and Care Research Biomedical Research Centre, University College London Hospitals National Health Service Foundation Trust, London, United Kingdom
- Health Data Research UK London, University College London, London, United Kingdom
| | - Lukasz Roguski
- University College London, London, United Kingdom
- Institute of Health Informatics, University College London, London, United Kingdom
- National Institute for Health and Care Research Biomedical Research Centre, University College London Hospitals National Health Service Foundation Trust, London, United Kingdom
| | - Xi Bai
- University College London, London, United Kingdom
- Institute of Health Informatics, University College London, London, United Kingdom
- National Institute for Health and Care Research Biomedical Research Centre, University College London Hospitals National Health Service Foundation Trust, London, United Kingdom
| | - Alex Handy
- University College London, London, United Kingdom
- Institute of Health Informatics, University College London, London, United Kingdom
- National Institute for Health and Care Research Biomedical Research Centre, University College London Hospitals National Health Service Foundation Trust, London, United Kingdom
- Health Data Research UK London, University College London, London, United Kingdom
| | - Roman Klapaukh
- Health Data Research UK London, University College London, London, United Kingdom
| | - Amos Folarin
- University College London, London, United Kingdom
- Institute of Health Informatics, University College London, London, United Kingdom
- Health Data Research UK London, University College London, London, United Kingdom
- National Institute for Health and Care Research Biomedical Research Centre, South London and Maudsley National Health Service Foundation Trust, King's College London, London, United Kingdom
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Luis Romao
- Institute of Health Informatics, University College London, London, United Kingdom
- National Institute for Health and Care Research Biomedical Research Centre, University College London Hospitals National Health Service Foundation Trust, London, United Kingdom
- Health Data Research UK London, University College London, London, United Kingdom
| | | | - Nathan Lea
- Institute of Health Informatics, University College London, London, United Kingdom
- National Institute for Health and Care Research Biomedical Research Centre, University College London Hospitals National Health Service Foundation Trust, London, United Kingdom
- Health Data Research UK London, University College London, London, United Kingdom
| | - Leilei Zhu
- National Institute for Health and Care Research Biomedical Research Centre, University College London Hospitals National Health Service Foundation Trust, London, United Kingdom
| | - Folkert W Asselbergs
- Institute of Health Informatics, University College London, London, United Kingdom
- National Institute for Health and Care Research Biomedical Research Centre, University College London Hospitals National Health Service Foundation Trust, London, United Kingdom
| | - Wai Keong Wong
- National Institute for Health and Care Research Biomedical Research Centre, University College London Hospitals National Health Service Foundation Trust, London, United Kingdom
| | - Anoop Shah
- University College London, London, United Kingdom
- Institute of Health Informatics, University College London, London, United Kingdom
- National Institute for Health and Care Research Biomedical Research Centre, University College London Hospitals National Health Service Foundation Trust, London, United Kingdom
- Health Data Research UK London, University College London, London, United Kingdom
| | - Richard JB Dobson
- University College London, London, United Kingdom
- Institute of Health Informatics, University College London, London, United Kingdom
- National Institute for Health and Care Research Biomedical Research Centre, University College London Hospitals National Health Service Foundation Trust, London, United Kingdom
- Health Data Research UK London, University College London, London, United Kingdom
- National Institute for Health and Care Research Biomedical Research Centre, South London and Maudsley National Health Service Foundation Trust, King's College London, London, United Kingdom
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| |
|
44
|
Wu L, Ali S, Ali H, Brock T, Xu J, Tong W. NeuroCORD: A Language Model to Facilitate COVID-19-Associated Neurological Disorder Studies. Int J Environ Res Public Health 2022; 19:9974. [PMID: 36011614 PMCID: PMC9408703 DOI: 10.3390/ijerph19169974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 08/03/2022] [Accepted: 08/05/2022] [Indexed: 06/15/2023]
Abstract
COVID-19 can lead to multiple severe outcomes including neurological and psychological impacts. However, it is challenging to manually scan hundreds of thousands of COVID-19 articles on a regular basis. To update our knowledge, provide sound science to the public, and communicate effectively, it is critical to have an efficient means of following the most current published data. In this study, we developed a language model to search abstracts using the most advanced artificial intelligence (AI) to accurately retrieve articles on COVID-19-associated neurological disorders. We applied this NeuroCORD model to the largest benchmark dataset of COVID-19, CORD-19. We found that the model developed on the training set yielded 94% prediction accuracy on the test set. This result was subsequently verified by two experts in the field. In addition, when applied to 96,000 non-labeled articles that were published after 2020, the NeuroCORD model accurately identified approximately 3% of them to be relevant for the study of COVID-19-associated neurological disorders, while only 0.5% were retrieved using conventional keyword searching. In conclusion, NeuroCORD provides an opportunity to profile neurological disorders resulting from COVID-19 in a rapid and efficient fashion, and its general framework could be used to study other COVID-19-related emerging health issues.
Affiliation(s)
- Leihong Wu
- National Center for Toxicological Research, Food and Drug Administration, 3900 NCTR Rd., Jefferson, AR 72079, USA
| | - Syed Ali
- National Center for Toxicological Research, Food and Drug Administration, 3900 NCTR Rd., Jefferson, AR 72079, USA
| | - Heather Ali
- Department of Internal Medicine, University of Arkansas for Medical Sciences, 4301 West Markham, Little Rock, AR 72205, USA
| | - Tyrone Brock
- National Center for Toxicological Research, Food and Drug Administration, 3900 NCTR Rd., Jefferson, AR 72079, USA
- Department of Mathematics and Computer Science, University of Arkansas at Pine Bluff, 1200 University Drive, Pine Bluff, AR 71601, USA
| | - Joshua Xu
- National Center for Toxicological Research, Food and Drug Administration, 3900 NCTR Rd., Jefferson, AR 72079, USA
| | - Weida Tong
- National Center for Toxicological Research, Food and Drug Administration, 3900 NCTR Rd., Jefferson, AR 72079, USA
| |
|
45
|
Golder S, Farrah K, Mierzwinski-Urban M, Barker B, Rama A. Updated generic search filters for finding studies of adverse drug effects in Ovid MEDLINE and Embase may retrieve up to 90% of relevant studies. Health Info Libr J 2022. [PMID: 35670564 DOI: 10.1111/hir.12441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2021] [Revised: 03/23/2022] [Accepted: 05/19/2022] [Indexed: 11/30/2022]
Abstract
BACKGROUND: The most current objectively derived search filters for adverse drug effects are 15 years old, and other strategies have not been developed and tested empirically. OBJECTIVE: To develop and validate search filters to retrieve evidence on adverse drug effects from Ovid MEDLINE and Ovid Embase. METHODS: We identified systematic reviews of adverse drug effects in Epistemonikos. From these reviews, we collated the included studies, which we then randomly divided into three test sets and one validation set of records. We constructed a search strategy to maximise relative recall using word frequency analysis with test set one. This search strategy was then refined using test sets two and three and validated on the final set of records. RESULTS: From the 107 systematic reviews that met our inclusion criteria, 1948 unique included studies were available from MEDLINE and 1980 from Embase. Generic adverse drug effects searches in MEDLINE and Embase achieved 90% and 89% relative recall, respectively. When specific adverse effects terms were added, recall improved. CONCLUSION: We have derived and validated search filters that retrieve around 90% of records with adverse drug effects data in MEDLINE and Embase. The addition of specific adverse effects terms is required to achieve higher recall.
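The relative recall figure reported above can be illustrated with a minimal sketch. It assumes the usual definition: the share of a known reference set of relevant records that a search filter retrieves. The function name and the record identifiers are hypothetical and are not taken from the article.

```python
from typing import Iterable


def relative_recall(retrieved_ids: Iterable[str], reference_ids: Iterable[str]) -> float:
    """Share of the reference set (known relevant records) that the search retrieved."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference set must not be empty")
    # Only records that are in the reference set count as hits;
    # extra retrieved records affect precision, not relative recall.
    hits = reference & set(retrieved_ids)
    return len(hits) / len(reference)


# Hypothetical toy data: four known relevant records, two of them retrieved.
reference_set = {"rec1", "rec2", "rec3", "rec4"}
retrieved = {"rec1", "rec2", "rec9"}
print(relative_recall(retrieved, reference_set))  # 0.5
```

Under this definition, a filter with 90% relative recall retrieves nine of every ten records in the reference set, whatever else it returns.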
Affiliation(s)
- Su Golder
- Department of Health Sciences, University of York, York, UK
| | - Kelly Farrah
- Research Information Services, Canadian Agency for Drugs and Technologies in Health (CADTH), Ottawa, Ontario, Canada
| | - Monika Mierzwinski-Urban
- Research Information Services, Canadian Agency for Drugs and Technologies in Health (CADTH), Ottawa, Ontario, Canada
| | - Beth Barker
- Department of Social and Policy Sciences, University of Bath, Bath, UK
| | - Anna Rama
- Hull York Medical School (HYMS), University of York, York, UK
| |
|
46
|
Antoun J, Lapin J, Beck D. Information retrieval at the point of care of community family physicians in Arab countries. Health Info Libr J 2022; 39:178-184. [PMID: 35396788 DOI: 10.1111/hir.12429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Accepted: 03/18/2022] [Indexed: 11/30/2022]
Abstract
This study is based on Jumana Antoun's PhD thesis at Walden University, USA, which examined the information retrieval behaviour of 72 community family physicians at the point of care in eight Arab countries in the Eastern Mediterranean. The key findings were that participants looked for digital clinical information at the point of care on average 14.0 times per week, with the majority (80.3%) using a mobile phone. Clinical information about medication dosage and side effects was the most sought, and patient education the least. Just over half of the participants considered that they often found relevant (55.6%), useful (56.9%) and unbiased (58.3%) information. Whilst none of the factors examined predicted the physicians' self-reported effectiveness and efficiency at information retrieval, the implications for practice point clearly to the barriers and to the need for curricula to focus on search strategies using free resources at the point of care.
Affiliation(s)
- Jumana Antoun
- Department of Family Medicine, American University of Beirut, Beirut, Lebanon
| | - Jennifer Lapin
- Richard W. Riley College of Education and Leadership, Walden University, Minneapolis, Minnesota, USA
| | - Dennis Beck
- Department of Educational Technology, University of Arkansas, Fayetteville, Arkansas, USA
| |
|
47
|
Pohyer V, Baudoin D, Fournier L, Rance B. Extraction of Tumor Response Criteria in Semi-Structured Imaging Report. Stud Health Technol Inform 2022; 294:149-150. [PMID: 35612044 DOI: 10.3233/shti220424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
In this study, we extracted information from 6,376 French semi-structured CT scan text reports that evaluate cancer treatment response using the RECIST methodology. We evaluated extraction performance against manual annotation of 100 reports and measured how the presence of this information evolved over time. The results show high extraction performance as well as trends over time.
Affiliation(s)
| | - David Baudoin
- Hôpital Européen Georges Pompidou, AP-HP, Paris, France
- INSERM, UMRS 1138, Centre de Recherche des Cordeliers, Université Sorbonne-Paris Cité, Paris, France
| | - Laure Fournier
- Hôpital Européen Georges Pompidou, AP-HP, Paris, France
- INSERM, PARCC, Paris, France
- Université de Paris, Paris, France
| | - Bastien Rance
- Hôpital Européen Georges Pompidou, AP-HP, Paris, France
- INSERM, UMRS 1138, Centre de Recherche des Cordeliers, Université Sorbonne-Paris Cité, Paris, France
- Université de Paris, Paris, France
- INRIA, France
| |
|
48
|
Guan R, Pang H, Liang Y, Shao Z, Gao X, Xu D, Feng X. Discovering trends and hotspots of biosafety and biosecurity research via machine learning. Brief Bioinform 2022; 23:6590367. [PMID: 35596953 PMCID: PMC9487701 DOI: 10.1093/bib/bbac194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Revised: 04/06/2022] [Accepted: 04/27/2022] [Indexed: 11/14/2022] Open
Abstract
Coronavirus disease 2019 (COVID-19) has infected hundreds of millions of people and killed millions of them. As an RNA virus, COVID-19 is more susceptible to variation than other viruses. Many problems involved in this epidemic have made biosafety and biosecurity (hereafter collectively referred to as ‘biosafety’) a popular and timely topic globally. Biosafety research covers a broad and diverse range of topics, and it is important to quickly identify hotspots and trends in biosafety research through big data analysis. However, the data-driven literature on biosafety research discovery is quite scant. We developed a novel topic model based on latent Dirichlet allocation, affinity propagation clustering and the PageRank algorithm (LDAPR) to extract knowledge from biosafety research publications from 2011 to 2020. Then, we conducted hotspot and trend analysis with LDAPR and carried out further studies, including annual hot topic extraction, a 10-year keyword evolution trend analysis, topic map construction, hot region discovery and fine-grained correlation analysis of interdisciplinary research topic trends. These analyses revealed valuable information that can guide epidemic prevention work: (1) the research enthusiasm over a certain infectious disease not only is related to its epidemic characteristics but also is affected by the progress of research on other diseases, and (2) infectious diseases are not only strongly related to their corresponding microorganisms but also potentially related to other specific microorganisms. The detailed experimental results and our code are available at https://github.com/KEAML-JLU/Biosafety-analysis.
Affiliation(s)
- Renchu Guan
- Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, Jilin, China; Zhuhai Sub Laboratory, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Zhuhai College of Science and Technology, Zhuhai, 519041, Guangdong, China
| | - Haoyu Pang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, Jilin, China
| | - Yanchun Liang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, Jilin, China; Zhuhai Sub Laboratory, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Zhuhai College of Science and Technology, Zhuhai, 519041, Guangdong, China
| | - Zhongjun Shao
- Department of Epidemiology, Ministry of Education Key Laboratory of Hazard Assessment and Control in Special Operational Environment, School of Public Health, Air Force Medical University, Xi'an, 710032, Shaanxi, China
| | - Xin Gao
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia; Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia; BioMap, Beijing, 100192, China
| | - Dong Xu
- Department of Electric Engineering and Computer Science, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, 65201, Missouri, USA
| | - Xiaoyue Feng
- Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, Jilin, China
| |
|
49
|
Liu S, Bourgeois FT, Dunn AG. Identifying unreported links between ClinicalTrials.gov trial registrations and their published results. Res Synth Methods 2022; 13:342-352. [PMID: 34970844 PMCID: PMC9090946 DOI: 10.1002/jrsm.1545] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/23/2021] [Revised: 12/13/2021] [Accepted: 12/17/2021] [Indexed: 11/10/2022]
Abstract
A substantial proportion of trial registrations are not linked to corresponding published articles, limiting analyses and new tools. Our aim was to develop a method for finding articles reporting the results of trials that are registered on ClinicalTrials.gov when they do not include metadata links. We used a set of 27,280 trial registration and article pairs to train and evaluate methods for identifying missing links in both directions: from articles to registrations and from registrations to articles. We trained a classifier with six distance metrics as feature representations to rank the correct article or registration, using recall@K to evaluate performance and compare to baseline methods. When identifying links from registrations to published articles, the classifier ranked the correct article first (recall@1) among 378,048 articles in 80.8% of evaluation cases, compared to 34.9% for the baseline method. Recall@10 was 85.1% compared to 60.7% for the baseline. When predicting links from articles to registrations, recall@1 was 83.4% for the classifier and 39.8% for the baseline. Recall@10 was 89.5% compared to 65.8% for the baseline. The proposed method improves on our baseline document similarity method enough to be feasible for identifying missing links in practice. Given a ClinicalTrials.gov registration, a user checking 10 ranked articles can expect to identify the matching article in at least 85% of cases, if the trial has been published. The proposed method can be used to improve the coupling of ClinicalTrials.gov and PubMed, with applications related to automating systematic review and evidence synthesis processes.
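The recall@K measure used above can be sketched as follows. This is a minimal illustration assuming one correct target per query, as in the registration-to-article setting described. The function name, the identifiers, and the toy rankings are hypothetical, and this is not the authors' implementation.

```python
from typing import Hashable, Sequence


def recall_at_k(
    rankings: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
    k: int,
) -> float:
    """Fraction of queries whose single correct target is among the top k ranked candidates."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(rankings) != len(targets):
        raise ValueError("need exactly one target per ranking")
    if not rankings:
        raise ValueError("need at least one query")
    # A query counts as a hit when its target appears in the first k positions.
    hits = sum(1 for ranked, target in zip(rankings, targets) if target in ranked[:k])
    return hits / len(rankings)


# Hypothetical toy data: three registrations, each with a ranked list of candidate articles.
rankings = [
    ["art1", "art2", "art3"],  # correct article ranked first
    ["art7", "art5", "art9"],  # correct article ranked second
    ["art4", "art8", "art6"],  # correct article ranked third
]
targets = ["art1", "art5", "art6"]

for k in (1, 2, 3):
    print(k, recall_at_k(rankings, targets, k))
```

Recall@K never decreases as K grows, which matches the pattern in the abstract, where recall@10 exceeds recall@1 for both the classifier and the baseline.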
Affiliation(s)
- Shifeng Liu
- Faculty of Medicine and Health, The University of Sydney, Biomedical Informatics and Digital Health, School of Medical Sciences, Sydney, New South Wales, Australia
| | - Florence T Bourgeois
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
| | - Adam G Dunn
- Faculty of Medicine and Health, The University of Sydney, Biomedical Informatics and Digital Health, School of Medical Sciences, Sydney, New South Wales, Australia
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
| |
|
50
|
Haddaway NR, Grainger MJ, Gray CT. citationchaser: a tool for transparent and efficient forward and backward citation chasing in systematic searching. Res Synth Methods 2022; 13:533-545. [PMID: 35472127 DOI: 10.1002/jrsm.1563] [Citation(s) in RCA: 49] [Impact Index Per Article: 24.5] [Received: 11/01/2021] [Revised: 04/01/2022] [Accepted: 04/21/2022] [Indexed: 11/10/2022]
Abstract
Systematic searching aims to find all possibly relevant research from multiple sources, the basis for an unbiased and comprehensive evidence base. Along with bibliographic databases, systematic reviewers use a variety of additional methods to minimise procedural bias. Citation chasing exploits connections between research articles to identify relevant records for a review by making use of explicit mentions of one article within another. Citation chasing is a popular supplementary search method because it helps to build on the work of primary research and review authors. It does so by identifying potentially relevant studies that might otherwise not be retrieved by other search methods; for example, because they did not use the review authors' search terms in the specified combinations in their titles, abstracts, or keywords. Here, we briefly provide an overview of citation chasing as a method for systematic reviews. Furthermore, given the challenges and high resource requirements associated with citation chasing, the limited application of citation chasing in otherwise rigorous systematic reviews, and the potential benefit of identifying terminologically disconnected but semantically linked research studies, we have developed and describe a free and open source tool that allows for rapid forward and backward citation chasing. We introduce citationchaser, an R package and Shiny app for conducting forward and backward citation chasing from a starting set of articles. We describe the sources of data, the backend code functionality, and the user interface provided in the Shiny app.
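Forward and backward citation chasing, as described above, can be sketched over a small in-memory citation map. This is only an illustration of the idea in Python. It does not use or reproduce the citationchaser package, and the function name, the `references` mapping, and the article identifiers are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def chase_citations(
    start_ids: Iterable[str],
    references: Dict[str, List[str]],
) -> Tuple[Set[str], Set[str]]:
    """Return (backward, forward) record sets for a starting set of articles.

    `references` maps each article to the articles it cites.
    Backward chasing collects what the starting articles cite;
    forward chasing collects the articles that cite them.
    """
    start = set(start_ids)
    backward: Set[str] = set()
    for article in start:
        backward.update(references.get(article, []))
    forward = {
        citing for citing, cited in references.items() if start.intersection(cited)
    }
    # Drop the starting articles themselves so only new records are returned.
    return backward - start, forward - start


# Hypothetical toy citation map: A cites B and C, D cites A, E cites B.
toy_references = {"A": ["B", "C"], "D": ["A"], "E": ["B"]}
backward, forward = chase_citations(["A"], toy_references)
print(sorted(backward))  # ['B', 'C']
print(sorted(forward))   # ['D']
```

In practice the citation map would come from a bibliographic data source rather than a hand-written dictionary, and the returned records would then be deduplicated against the database search results and screened.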
Affiliation(s)
- Neal R Haddaway
- Leibniz-Centre for Agricultural Landscape Research (ZALF), Eberswalder Str. 84, 15374, Müncheberg, Germany; African Centre for Evidence, University of Johannesburg, Johannesburg, South Africa
| | - Matthew J Grainger
- Norwegian Institute for Nature Research, Postboks 5685 Torgarden, 7485, Trondheim, Norway
| | - Charles T Gray
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, UK
| |
|