1
Roth S, Wermer-Colan A. Machine Learning Methods for Systematic Reviews: A Rapid Scoping Review. Dela J Public Health 2023; 9:40-47. [PMID: 38173960] [PMCID: PMC10759980] [DOI: 10.32481/djph.2023.11.008]
Abstract
Objective Natural language processing, also known as text mining, has been at the forefront of machine learning research since its inception; it refers to a wide range of statistical processes for analyzing textual data and retrieving information. In medical fields, text mining has made valuable contributions in unexpected ways, not least by synthesizing data from disparate biomedical studies. This rapid scoping review examines how machine learning methods for text mining can be implemented at the intersection of these disparate fields to improve the workflow and process of conducting systematic reviews in medical research and related academic disciplines. Methods The primary research question was: "What impact does the use of machine learning have on the methods used by systematic review teams to carry out the systematic review process, such as the precision of search strategies, unbiased article selection, or data abstraction and/or analysis for systematic reviews and other comprehensive review types of similar methodology?" A literature search was conducted by a medical librarian using multiple databases, a grey literature search, and handsearching of the literature. The database search was completed on December 4, 2020; handsearching continued on an ongoing basis through April 14, 2023. Results The search yielded 23,190 studies after duplicates were removed. Of these, 117 studies (1.70%) met the eligibility criteria for inclusion in this rapid scoping review. Conclusions Several machine learning methods and tools, either in development or already fully developed, can assist with the stages of the systematic review process. Combined with human intelligence, they hold promise for making the systematic review process more efficient, saving valuable time for systematic review authors and increasing the speed with which evidence can be created and placed in the hands of decision makers and the public.
Affiliation(s)
- Stephanie Roth: Medical Librarian, Lewis B. Flinn Medical Library, ChristianaCare
- Alex Wermer-Colan: Academic Director, Loretta C. Duckworth Scholars Studio, Temple University Libraries
2
Oliveira Dos Santos Á, Sergio da Silva E, Machado Couto L, Valadares Labanca Reis G, Silva Belo V. The use of artificial intelligence for automating or semi-automating biomedical literature analyses: a scoping review. J Biomed Inform 2023; 142:104389. [PMID: 37187321] [DOI: 10.1016/j.jbi.2023.104389]
Abstract
OBJECTIVE Evidence-based medicine (EBM) is a decision-making process based on the conscious and judicious use of the best available scientific evidence. However, the exponential increase in the amount of information currently available likely exceeds the capacity of human-only analysis. In this context, artificial intelligence (AI) and its branches such as machine learning (ML) can be used to facilitate human efforts in analyzing the literature to foster EBM. The present scoping review aimed to examine the use of AI in the automation of biomedical literature surveys and analyses with a view to establishing the state of the art and identifying knowledge gaps. MATERIALS AND METHODS Comprehensive searches of the main databases were performed for articles published up to June 2022, and studies were selected according to inclusion and exclusion criteria. Data were extracted from the included articles and the findings categorized. RESULTS The total number of records retrieved from the databases was 12,145, of which 273 were included in the review. Classification of the studies according to the use of AI in evaluating the biomedical literature revealed three main application groups, namely assembly of scientific evidence (n=127; 47%), mining the biomedical literature (n=112; 41%), and quality analysis (n=34; 12%). Most studies addressed the preparation of systematic reviews, while articles focusing on the development of guidelines and evidence synthesis were the least frequent. The biggest knowledge gap was identified within the quality analysis group, particularly regarding methods and tools that assess the strength of recommendation and consistency of evidence.
CONCLUSION Our review shows that, despite significant progress in the automation of biomedical literature surveys and analyses in recent years, intense research is needed to fill knowledge gaps on more difficult aspects of ML, deep learning and natural language processing, and to consolidate the use of automation by end-users (biomedical researchers and healthcare professionals).
Affiliation(s)
- Eduardo Sergio da Silva: Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil
- Letícia Machado Couto: Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil
- Vinícius Silva Belo: Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil
3
Gil-Leiva I, Fujita MSL, Redigolo FM, Saran JF. Extracción de información de documentos PDF para su uso en la indización automática de e-books. Transinformação 2022. [DOI: 10.1590/2318-0889202234e210069]
Abstract
Abstract The number of e-books entering libraries in PDF format grows larger every day, complicating, and in some cases rendering nearly unworkable, processes traditionally performed manually by librarians, such as subject assignment. In this context, the design and development of applications to assist librarians becomes necessary. With this in mind, this paper presents an evaluation of tools for extracting information from PDF books that could later serve as raw material for an automatic indexing system. To this end, we first evaluated five software packages (PDFMiner.six, PDFAct, PDF-extract, PDFExtract and Grobib) and then, since PDFAct achieved the best performance, carried out a second evaluation to assess its ability to identify and extract information from books, such as titles, tables of contents, sections, table and figure captions, and bibliographic references, all of which is relevant to any indexing system. We conclude that none of the evaluated tools adequately extracts the different parts of PDF books, although PDFAct outperformed the rest.
4
Scheibel B, Mangler J, Rinderle-Ma S. Extraction of dimension requirements from engineering drawings for supporting quality control in production processes. Comput Ind 2021. [DOI: 10.1016/j.compind.2021.103442]
5
Hussain M, Hussain J, Ali T, Lee S. An Empirical Method of Automatic Pattern Extraction for Clinical Text Classification. Annu Int Conf IEEE Eng Med Biol Soc 2020; 2020:5292-5295. [PMID: 33019178] [DOI: 10.1109/embc44109.2020.9176503]
Abstract
Clinical text classification is an indispensable and extensively studied problem in medical text processing. Existing research primarily employs machine learning and pattern-based approaches to address this problem. In general, pattern-based approaches perform better than other methods. However, they commonly require human intervention for pattern identification, which diminishes their benefits and restrains their application. In this study, we present a novel pattern extraction algorithm that automatically identifies and extracts patterns from clinical textual resources. The algorithm identifies candidate concepts in the clinical text, finds the context of each concept by discovering its context window, and finally transforms each context window into a pattern. We evaluated the proposed algorithm on Hypertension, Rhinosinusitis, and Asthma guidelines: 70% of the hypertension guideline was used for pattern extraction, while the remaining 30% and the other two guidelines were used for evaluation. The algorithm extracts 21 patterns that classify Hypertension, Rhinosinusitis, and Asthma guideline sentences as recommendation or non-recommendation sentences with 84.53%, 80.03%, and 84.62% accuracy, respectively. These initial results demonstrate the benefits and applicability of the algorithm for clinical text classification.
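The context-window procedure this abstract describes can be sketched as follows. The whitespace tokenizer, window size, and concept list here are illustrative assumptions, not the authors' implementation:

```python
import re

def extract_patterns(text, concepts, window=3):
    """For each occurrence of a candidate concept, take a window of
    +/- `window` tokens around it and generalize the concept itself to a
    placeholder, yielding a reusable lexical pattern."""
    tokens = re.findall(r"[A-Za-z0-9']+", text.lower())
    concepts = {c.lower() for c in concepts}
    patterns = set()
    for i, tok in enumerate(tokens):
        if tok in concepts:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            patterns.add(" ".join(left + ["<CONCEPT>"] + right))
    return sorted(patterns)

sentence = "Patients with hypertension should receive a thiazide diuretic."
print(extract_patterns(sentence, {"hypertension", "diuretic"}, window=2))
# ['a thiazide <CONCEPT>', 'patients with <CONCEPT> should receive']
```

In a real pipeline the candidate concepts would come from a terminology lookup (e.g. a medical vocabulary) rather than a hand-written set, and the resulting patterns would then be matched against unseen guideline sentences.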
6
Mbuagbaw L, Lawson DO, Puljak L, Allison DB, Thabane L. A tutorial on methodological studies: the what, when, how and why. BMC Med Res Methodol 2020; 20:226. [PMID: 32894052] [PMCID: PMC7487909] [DOI: 10.1186/s12874-020-01107-7]
Abstract
BACKGROUND Methodological studies - studies that evaluate the design, analysis or reporting of other research-related reports - play an important role in health research. They help to highlight issues in the conduct of research with the aim of improving health research methodology, and ultimately reducing research waste. MAIN BODY We provide an overview of some of the key aspects of methodological studies such as what they are, and when, how and why they are done. We adopt a "frequently asked questions" format to facilitate reading this paper and provide multiple examples to help guide researchers interested in conducting methodological studies. Some of the topics addressed include: is it necessary to publish a study protocol? How to select relevant research reports and databases for a methodological study? What approaches to data extraction and statistical analysis should be considered when conducting a methodological study? What are potential threats to validity and is there a way to appraise the quality of methodological studies? CONCLUSION Appropriate reflection and application of basic principles of epidemiology and biostatistics are required in the design and analysis of methodological studies. This paper provides an introduction for further discussion about the conduct of methodological studies.
Affiliation(s)
- Lawrence Mbuagbaw: Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, ON, Canada; Biostatistics Unit/FSORC, 50 Charlton Avenue East, St Joseph's Healthcare-Hamilton, 3rd Floor Martha Wing, Room H321, Hamilton, Ontario, L8N 4A6, Canada; Centre for the Development of Best Practices in Health, Yaoundé, Cameroon
- Daeria O Lawson: Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, ON, Canada
- Livia Puljak: Center for Evidence-Based Medicine and Health Care, Catholic University of Croatia, Ilica 242, 10000, Zagreb, Croatia
- David B Allison: Department of Epidemiology and Biostatistics, School of Public Health - Bloomington, Indiana University, Bloomington, IN, 47405, USA
- Lehana Thabane: Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, ON, Canada; Biostatistics Unit/FSORC, 50 Charlton Avenue East, St Joseph's Healthcare-Hamilton, 3rd Floor Martha Wing, Room H321, Hamilton, Ontario, L8N 4A6, Canada; Departments of Paediatrics and Anaesthesia, McMaster University, Hamilton, ON, Canada; Centre for Evaluation of Medicine, St. Joseph's Healthcare-Hamilton, Hamilton, ON, Canada; Population Health Research Institute, Hamilton Health Sciences, Hamilton, ON, Canada
7
A multi-label text classification method via dynamic semantic representation model and deep neural network. Appl Intell 2020. [DOI: 10.1007/s10489-020-01680-w]
8
Mull HJ, Stolzmann K, Kalver E, Shin MH, Schweizer ML, Asundi A, Mehta P, Stanislawski M, Branch-Elliman W. Novel methodology to measure pre-procedure antimicrobial prophylaxis: integrating text searches with structured data from the Veterans Health Administration's electronic medical record. BMC Med Inform Decis Mak 2020; 20:15. [PMID: 32000780] [PMCID: PMC6993312] [DOI: 10.1186/s12911-020-1031-5]
Abstract
Background Antimicrobial prophylaxis is an evidence-proven strategy for reducing procedure-related infections; however, measuring this key quality metric typically requires manual review, due to the way antimicrobial prophylaxis is documented in the electronic medical record (EMR). Our objective was to electronically measure compliance with antimicrobial prophylaxis using both structured and unstructured data from the Veterans Health Administration (VA) EMR. We developed this methodology for cardiac device implantation procedures. Methods With clinician input and review of clinical guidelines, we developed a list of antimicrobial names recommended for the prevention of cardiac device infection. We trained the algorithm using existing fiscal year (FY) 2008–15 data from the VA Clinical Assessment Reporting and Tracking-Electrophysiology (CART-EP), which contains manually determined information about antimicrobial prophylaxis. We merged CART-EP data with EMR data and programmed statistical software to flag antimicrobial orders or drug fills from structured data fields in the EMR and hits on text string searches of antimicrobial names documented in clinicians' notes. We iteratively tested combinations of these data elements to optimize an algorithm to accurately classify antimicrobial use. The final algorithm was validated in a national cohort of VA cardiac device procedures from FY2016–2017. Discordant cases underwent expert manual review to identify reasons for algorithm misclassification. Results The CART-EP dataset included 2102 procedures at 38 VA facilities with manually identified antimicrobial prophylaxis in 2056 cases (97.8%). The final algorithm combining structured EMR fields and text note search results correctly classified 2048 of the CART-EP cases (97.4%). In the validation sample, the algorithm measured compliance with antimicrobial prophylaxis in 16,606 of 18,903 cardiac device procedures (87.8%).
Misclassification was due to EMR documentation issues, such as antimicrobial prophylaxis documented only in hand-written clinician notes in a format that cannot be electronically searched. Conclusions We developed a methodology with high accuracy to measure guideline concordant use of antimicrobial prophylaxis before cardiac device procedures using data fields present in modern EMRs. This method can replace manual review in quality measurement in the VA and other healthcare systems with EMRs; further, this method could be adapted to measure compliance in other procedural areas where antimicrobial prophylaxis is recommended.
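The two-pronged flagging logic described above, structured orders plus text-string searches over notes, can be sketched as follows. The drug list, record format, and function name are hypothetical illustrations, not the VA algorithm itself:

```python
import re

# Illustrative subset; the study used a clinician-reviewed list of
# antimicrobials recommended for cardiac device infection prevention.
ANTIMICROBIALS = ["cefazolin", "vancomycin", "clindamycin"]

def prophylaxis_given(structured_orders, note_text):
    """Flag a procedure as having documented antimicrobial prophylaxis if
    either (a) a structured order/fill names a listed antimicrobial, or
    (b) a text-string search over the clinician's note hits one."""
    names = set(ANTIMICROBIALS)
    if any(order.lower() in names for order in structured_orders):
        return True
    pattern = re.compile("|".join(re.escape(n) for n in ANTIMICROBIALS),
                         re.IGNORECASE)
    return bool(pattern.search(note_text))

print(prophylaxis_given([], "Vancomycin 1 g IV given 30 min pre-incision."))  # True
print(prophylaxis_given(["cefazolin"], ""))                                   # True
print(prophylaxis_given([], "No antibiotics documented."))                    # False
```

The study's misclassification analysis (next paragraph) shows the main limit of this approach: documentation that exists only as scanned or hand-written notes is invisible to both prongs.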
Affiliation(s)
- Hillary J Mull: VA Boston Healthcare System, Center for Healthcare Organization and Implementation Research (CHOIR), 150 S. Huntington Ave, Boston, MA, 02130, USA; Department of Surgery, Boston University School of Medicine, Boston, MA, USA
- Kelly Stolzmann: VA Boston Healthcare System, Center for Healthcare Organization and Implementation Research (CHOIR), 150 S. Huntington Ave, Boston, MA, 02130, USA
- Emily Kalver: VA Boston Healthcare System, Center for Healthcare Organization and Implementation Research (CHOIR), 150 S. Huntington Ave, Boston, MA, 02130, USA
- Marlena H Shin: VA Boston Healthcare System, Center for Healthcare Organization and Implementation Research (CHOIR), 150 S. Huntington Ave, Boston, MA, 02130, USA
- Marin L Schweizer: Center for Access and Delivery Research and Evaluation (CADRE), Iowa City VA Health Care System, Iowa City, Iowa, USA; University of Iowa Carver College of Medicine, Iowa City, Iowa, USA
- Archana Asundi: Department of Medicine, Boston University School of Medicine, Boston, MA, USA; Boston Medical Center, Department of Medicine, Division of Infectious Diseases, Boston, MA, USA
- Payal Mehta: VA Boston Healthcare System, Department of Medicine, Sections of Infectious Diseases and Cardiology, Boston, MA, USA
- Maggie Stanislawski: Seattle-Denver Center of Innovation for Veteran-Centered and Value-Driven Care, Seattle, Washington and Denver, Colorado, USA; Division of Biomedical Informatics and Personalized Medicine, University of Colorado School of Medicine, Aurora, Colorado, USA
- Westyn Branch-Elliman: VA Boston Healthcare System, Center for Healthcare Organization and Implementation Research (CHOIR), 150 S. Huntington Ave, Boston, MA, 02130, USA; VA Boston Healthcare System, Department of Medicine, Sections of Infectious Diseases and Cardiology, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
9
Riedel MC, Salo T, Hays J, Turner MD, Sutherland MT, Turner JA, Laird AR. Automated, Efficient, and Accelerated Knowledge Modeling of the Cognitive Neuroimaging Literature Using the ATHENA Toolkit. Front Neurosci 2019; 13:494. [PMID: 31156374] [PMCID: PMC6530419] [DOI: 10.3389/fnins.2019.00494]
Abstract
Neuroimaging research is growing rapidly, providing expansive resources for synthesizing data. However, navigating these dense resources is complicated by the volume of research articles and the variety of experimental designs implemented across studies. The advent of machine learning algorithms and text-mining techniques has advanced automated labeling of published articles in biomedical research to alleviate such obstacles. To date, however, a comprehensive examination of document features and classifier techniques for annotating neuroimaging articles has not been undertaken. Here, we evaluated which combination of corpus (abstract-only or full-article text), features (bag-of-words or Cognitive Atlas terms), and classifier (Bernoulli naïve Bayes, k-nearest neighbors, logistic regression, or support vector classifier) resulted in the highest predictive performance in annotating a selection of 2,633 manually annotated neuroimaging articles. We found that, when utilizing full article text, data-driven features derived from the text performed best, whereas if article abstracts were used for annotation, features derived from the Cognitive Atlas performed better. Additionally, we observed that when features were derived from article text, anatomical terms appeared to be the most frequently utilized for classification purposes, and that cognitive concepts can be identified based on similar representations of these anatomical terms. Optimizing parameters for the automated classification of neuroimaging articles may result in a larger proportion of the neuroimaging literature being annotated with labels supporting the meta-analysis of psychological constructs.
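As one example of the feature/classifier combinations compared above, a minimal bag-of-words Bernoulli naive Bayes can be written from scratch. The toy corpus and labels below are invented for illustration and are unrelated to the ATHENA toolkit or its data:

```python
import math
from collections import defaultdict

def train_bernoulli_nb(docs, labels):
    """Fit Bernoulli naive Bayes on binary bag-of-words features
    (word present/absent per document), with Laplace smoothing."""
    vocab = sorted({w for d in docs for w in d.split()})
    classes = sorted(set(labels))
    n_c = {c: labels.count(c) for c in classes}
    present = {c: defaultdict(int) for c in classes}
    for d, y in zip(docs, labels):
        for w in set(d.split()):
            present[y][w] += 1
    prior = {c: math.log(n_c[c] / len(docs)) for c in classes}
    return vocab, classes, n_c, present, prior

def predict(model, doc):
    vocab, classes, n_c, present, prior = model
    words = set(doc.split())
    best, best_lp = None, -math.inf
    for c in classes:
        lp = prior[c]
        for w in vocab:
            p = (present[c][w] + 1) / (n_c[c] + 2)  # Laplace-smoothed
            lp += math.log(p if w in words else 1 - p)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

docs = ["fmri activation amygdala", "fmri activation prefrontal",
        "reward learning striatum", "reward prediction striatum"]
labels = ["emotion", "emotion", "reward", "reward"]
model = train_bernoulli_nb(docs, labels)
print(predict(model, "amygdala activation"))  # emotion
print(predict(model, "striatum reward"))      # reward
```

Unlike a multinomial model, the Bernoulli variant also penalizes a class for vocabulary words that are absent from the document, which is why the loop runs over the full vocabulary.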
Affiliation(s)
- Michael C. Riedel: Department of Physics, Florida International University, Miami, FL, United States
- Taylor Salo: Department of Psychology, Florida International University, Miami, FL, United States
- Jason Hays: Department of Psychology, Florida International University, Miami, FL, United States
- Matthew D. Turner: Psychology and Neuroscience, Georgia State University, Atlanta, GA, United States
- Matthew T. Sutherland: Department of Psychology, Florida International University, Miami, FL, United States
- Jessica A. Turner: Psychology and Neuroscience, Georgia State University, Atlanta, GA, United States
- Angela R. Laird: Department of Physics, Florida International University, Miami, FL, United States
10
Pendergrass SA, Crawford DC. Using Electronic Health Records To Generate Phenotypes For Research. Curr Protoc Hum Genet 2019; 100:e80. [PMID: 30516347] [PMCID: PMC6318047] [DOI: 10.1002/cphg.80]
Abstract
Electronic health records contain patient-level data collected during and for clinical care. Data within the electronic health record include diagnostic billing codes, procedure codes, vital signs, laboratory test results, clinical imaging, and physician notes. With repeated clinic visits, these data are longitudinal, providing important information on disease development, progression, and response to treatment or intervention strategies. The near universal adoption of electronic health records nationally has the potential to provide population-scale real-world clinical data accessible for biomedical research, including genetic association studies. For this research potential to be realized, high-quality research-grade variables must be extracted from these clinical data warehouses. We describe here common and emerging electronic phenotyping approaches applied to electronic health records, as well as current limitations of both the approaches and the biases associated with these clinically collected data that impact their use in research. © 2018 by John Wiley & Sons, Inc.
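A rule-based electronic phenotype of the kind discussed in this abstract can be sketched as follows. The codes, thresholds, and record layout are illustrative assumptions for a toy type 2 diabetes definition, not a validated algorithm from the article:

```python
def diabetes_phenotype(patient):
    """Toy rule-based phenotype: classify a patient as a case if they have
    >=2 type 2 diabetes diagnosis codes on distinct dates, OR one such code
    plus a diabetes medication, OR a qualifying lab value (HbA1c >= 6.5%).
    All rules and cutoffs here are illustrative only."""
    dx_dates = {date for code, date in patient["diagnoses"]
                if code.startswith("E11")}   # ICD-10 E11.* = type 2 diabetes
    on_med = any(m in {"metformin", "insulin"} for m in patient["medications"])
    lab_hit = any(test == "HbA1c" and value >= 6.5
                  for test, value in patient["labs"])
    return len(dx_dates) >= 2 or (len(dx_dates) >= 1 and on_med) or lab_hit

patient = {"diagnoses": [("E11.9", "2021-03-01")],
           "medications": ["metformin"],
           "labs": [("HbA1c", 7.1)]}
print(diabetes_phenotype(patient))  # True
```

Real phenotyping algorithms combine many more data types (procedure codes, note text, medication histories) and are validated against chart review, precisely because single-source rules like these inherit the billing and documentation biases the abstract warns about.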
Affiliation(s)
- Sarah A. Pendergrass: Biomedical and Translational Informatics Institute, Geisinger Research, Rockville, MD
- Dana C. Crawford: Institute for Computational Biology, Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH
11
The UAB Informatics Institute and 2016 CEGS N-GRID de-identification shared task challenge. J Biomed Inform 2017; 75S:S54-S61. [PMID: 28478268] [DOI: 10.1016/j.jbi.2017.05.001]
Abstract
Clinical narratives (the text notes found in patients' medical records) are important information sources for secondary use in research. However, in order to protect patient privacy, they must be de-identified prior to use. Manual de-identification is considered the gold standard approach but is tedious, expensive, slow, and impractical for large-scale clinical data. Automated or semi-automated de-identification using computer algorithms is a potentially promising alternative. The Informatics Institute of the University of Alabama at Birmingham is applying de-identification to clinical data drawn from the UAB hospital's electronic medical records system before releasing them for research. To gain experience developing our own automatic de-identification tool, we participated in the de-identification regular track of the shared task challenge organized by the Centers of Excellence in Genomic Science (CEGS) Neuropsychiatric Genome-Scale and RDoC Individualized Domains (N-GRID). We focused on the popular and successful methods from previous challenges: rule-based, dictionary-matching, and machine-learning approaches. We also explored new techniques, such as disambiguation rules and term ambiguity measurement, and used a multi-pass sieve framework at a micro level. For the challenge's primary measure (strict entity), our submissions achieved competitive results (f-measures: 87.3%, 87.1%, and 86.7%). For our preferred measure (binary token HIPAA), our submissions achieved superior results (f-measures: 93.7%, 93.6%, and 93%). These encouraging results give us the confidence to improve the tool and use it for the real de-identification task at the UAB Informatics Institute.
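The rule-based and dictionary-matching layers mentioned above can be sketched with a few regular expressions. The patterns and the toy name dictionary are illustrative and far smaller than anything a real de-identification system would use:

```python
import re

# Toy dictionary layer; real systems use large census/name lexicons
# plus machine-learned models on top of rules like these.
NAME_DICT = {"john", "smith", "mary"}

RULES = [
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def deidentify(text):
    """Apply regex rules, then a dictionary pass that masks known names."""
    for pattern, tag in RULES:
        text = pattern.sub(tag, text)
    tokens = [("[NAME]" if t.lower().strip(".,") in NAME_DICT else t)
              for t in text.split()]
    return " ".join(tokens)

note = "John Smith seen 3/14/2017, callback 555-123-4567."
print(deidentify(note))
# [NAME] [NAME] seen [DATE], callback [PHONE].
```

The abstract's "term ambiguity" problem shows up immediately in a sketch like this: a dictionary entry such as "brown" would wrongly mask the color as a surname, which is what disambiguation rules are for.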
12
Bui DDA, Del Fiol G, Hurdle JF, Jonnalagadda S. Extractive text summarization system to aid data extraction from full text in systematic review development. J Biomed Inform 2016; 64:265-272. [PMID: 27989816] [PMCID: PMC5362293] [DOI: 10.1016/j.jbi.2016.10.014]
Abstract
OBJECTIVES Extracting data from publication reports is a standard process in systematic review (SR) development. However, the data extraction process still relies heavily on manual effort, which is slow, costly, and subject to human error. In this study, we developed a text summarization system aimed at enhancing productivity and reducing errors in the traditional data extraction process. METHODS We developed a computer system that used machine learning and natural language processing approaches to automatically generate summaries of full-text scientific publications. The summaries at the sentence and fragment levels were evaluated for finding common clinical SR data elements such as sample size, group size, and PICO values. We compared the computer-generated summaries with human-written summaries (title and abstract) in terms of the presence of the information necessary for data extraction, as presented in the study characteristics tables of Cochrane reviews. RESULTS At the sentence level, the computer-generated summaries covered more of this information than the human-written summaries (recall 91.2% vs. 83.8%, p<0.001). They also had a better density of relevant sentences (precision 59% vs. 39%, p<0.001). At the fragment level, an ensemble approach combining rule-based, concept-mapping, and dictionary-based methods performed better than the individual methods alone, achieving an 84.7% F-measure. CONCLUSION Computer-generated summaries are potential alternative information sources for data extraction in systematic review development. Machine learning and natural language processing are promising approaches to the development of such an extractive summarization system.
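A minimal frequency-based extractive summarizer illustrates the general idea of sentence-level extraction discussed above. This is a sketch of the technique class, not the authors' ensemble system; the stopword list, scoring function, and sample text are illustrative choices:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "in", "and", "to", "was", "were", "with"}

def extract_summary(text, k=2):
    """Score each sentence by the mean corpus frequency of its content
    words and return the top-k sentences in document order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)

    def score(s):
        toks = [w for w in re.findall(r"[a-z0-9]+", s.lower())
                if w not in STOPWORDS]
        return sum(freq[w] for w in toks) / max(len(toks), 1)

    top = sorted(sentences, key=score, reverse=True)[:k]
    return [s for s in sentences if s in top]

text = ("The trial enrolled 120 patients. Patients received 50 mg daily. "
        "Weather was pleasant that season. "
        "The primary outcome improved in treated patients.")
print(extract_summary(text, k=2))
```

A system like the one in the abstract would replace this generic frequency score with supervised signals (e.g. whether a sentence mentions sample sizes or PICO elements) learned from annotated reviews.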
Affiliation(s)
- Duy Duc An Bui: Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Division of Health and Biomedical Informatics, Northwestern University, Chicago, IL, USA
- Guilherme Del Fiol: Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- John F Hurdle: Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA