1
Lokker C, Abdelkader W, Bagheri E, Parrish R, Cotoi C, Navarro T, Germini F, Linkins LA, Haynes RB, Chu L, Afzal M, Iorio A. Boosting efficiency in a clinical literature surveillance system with LightGBM. PLOS Digital Health 2024; 3:e0000299. [PMID: 39312500 PMCID: PMC11419392 DOI: 10.1371/journal.pdig.0000299] [Received: 06/18/2023] [Accepted: 08/14/2024] [Indexed: 09/25/2024]
Abstract
Given the suboptimal performance of Boolean searching to identify methodologically sound and clinically relevant studies in large bibliographic databases, exploring machine learning (ML) to efficiently classify studies is warranted. To boost the efficiency of a literature surveillance program, we used a large internationally recognized dataset of articles tagged for methodological rigor and applied an automated ML approach to train and test binary classification models to predict the probability of clinical research articles being of high methodologic quality. We trained over 12,000 models on a dataset of titles and abstracts of 97,805 articles indexed in PubMed from 2012-2018 which were manually appraised for rigor by highly trained research associates and rated for clinical relevancy by practicing clinicians. As the dataset is unbalanced, with more articles that do not meet the criteria for rigor, we used the unbalanced dataset and over- and under-sampled datasets. Models that maintained sensitivity for high rigor at 99% and maximized specificity were selected and tested in a retrospective set of 30,424 articles from 2020 and validated prospectively in a blinded study of 5253 articles. The final selected algorithm, combining a LightGBM (gradient boosting machine) model trained in each dataset, maintained high sensitivity and achieved 57% specificity in the retrospective validation test and 53% in the prospective study. The number of articles needed to read to find one that met appraisal criteria was 3.68 (95% CI 3.52 to 3.85) in the prospective study, compared with 4.63 (95% CI 4.50 to 4.77) when relying only on Boolean searching. Gradient-boosting ML models reduced the work required to classify high quality clinical research studies by 45%, improving the efficiency of literature surveillance and subsequent dissemination to clinicians and other evidence users.
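The model-selection rule described above (hold sensitivity for high rigor at 99%, then maximize specificity) reduces to picking a probability cutoff on a validation set, and the "number of articles needed to read" is the reciprocal of precision at that cutoff. A minimal sketch on toy scores, not the authors' code; `pick_cutoff` and `number_needed_to_read` are illustrative names:

```python
import math

def pick_cutoff(scores, labels, min_sens=0.99):
    """Highest cutoff whose sensitivity is still >= min_sens.

    scores: predicted probabilities of high methodological rigor;
    labels: 1 = article passed critical appraisal.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    # We must keep at least ceil(min_sens * n_pos) true positives at or above
    # the cutoff, so the cutoff is the score of the last one we must accept.
    keep = math.ceil(min_sens * len(positives))
    return positives[keep - 1]

def number_needed_to_read(scores, labels, cutoff):
    """Flagged articles read per true high-quality article found (1 / precision)."""
    flagged = [y for s, y in zip(scores, labels) if s >= cutoff]
    return len(flagged) / sum(flagged)

# Toy example: three truly high-quality articles and two that failed appraisal.
scores = [0.9, 0.8, 0.7, 0.95, 0.6]
labels = [1, 1, 1, 0, 0]
cutoff = pick_cutoff(scores, labels)                  # -> 0.7
nnr = number_needed_to_read(scores, labels, cutoff)   # -> 4/3
```

With a higher cutoff the classifier rejects more non-rigorous articles (better specificity and a lower NNR) at the cost of missed positives, which is why the authors fix sensitivity first and only then optimize specificity.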
Affiliation(s)
- Cynthia Lokker
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Wael Abdelkader
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Elham Bagheri
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Rick Parrish
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Chris Cotoi
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Tamara Navarro
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Federico Germini
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Lori-Ann Linkins
- Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- R. Brian Haynes
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Lingyang Chu
- Department of Computing and Software, McMaster University, Hamilton, Ontario, Canada
- Muhammad Afzal
- School of Computing and Digital Technology, Birmingham City University, Birmingham, United Kingdom
- Alfonso Iorio
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Department of Medicine, McMaster University, Hamilton, Ontario, Canada
2
Jin L, Li C, Zhu Z, Zou S, Sun X. Mining LDA topics on construction engineering change risks based on graded evidence. PLoS One 2024; 19:e0303424. [PMID: 38900821 PMCID: PMC11189256 DOI: 10.1371/journal.pone.0303424] [Received: 11/10/2023] [Accepted: 04/25/2024] [Indexed: 06/22/2024]
Abstract
Engineering change (EC) risk may negatively impact project schedule, cost, quality, and stakeholder satisfaction. However, existing methods for managing EC risk have shortcomings in evidence selection and do not adequately consider the quality and reliability of the evidence associated with EC risks. Evidence grading plays a crucial role in ensuring the reliability of decisions related to EC risks and can provide essential scientific and reliability support for decision-making. To explore the potential risks associated with architectural engineering changes (ECs) and identify the most significant ones, this study proposed a methodology that combines evidence grading theory with Latent Dirichlet Allocation (LDA) topic analysis. Initially, evidence-based grading theory served as the basis for creating a grading table of evidence sources related to EC risk; specifically, we categorized the evidence sources into three levels based on their credibility. Subsequently, we selected evidence with higher credibility levels for textual analysis using the LDA topic model. This involved analyzing regulations, industry standards, and judgment documents related to EC, ultimately identifying the themes associated with EC risks. In addition, by combining EC risk topics with relevant literature, we identified factors influencing EC risks, and we designed an expert survey questionnaire to determine the key risks and important risk topics. The results show that by synthesizing information from both Class A and Class B evidence, five prominent risk themes were identified: contract, technology, funds, personnel, and other hazards. Among these, technical risk scored highest, indicating that it is the most important; its key risks are engineering design defects, errors, and omissions.
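The topic-mining step can be illustrated with a minimal collapsed-Gibbs LDA sampler. This is a dependency-free toy sketch, not the pipeline used in the study (which analyzed graded regulations, standards, and judgment documents); `lda_topics` and its hyperparameters are illustrative assumptions:

```python
import random
from collections import defaultdict

def lda_topics(docs, n_topics, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Tiny collapsed-Gibbs LDA; returns the top words of each topic."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    doc_topic = [[0] * n_topics for _ in docs]                # n_dk counts
    topic_word = [defaultdict(int) for _ in range(n_topics)]  # n_kw counts
    topic_total = [0] * n_topics                              # n_k counts
    assign = []
    for di, doc in enumerate(docs):               # random initialization
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[di][k] += 1; topic_word[k][w] += 1; topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for di, doc in enumerate(docs):
            for wi, w in enumerate(doc):
                k = assign[di][wi]                # remove token's assignment
                doc_topic[di][k] -= 1; topic_word[k][w] -= 1; topic_total[k] -= 1
                # Sample from the full conditional p(z = k | everything else).
                weights = [(doc_topic[di][j] + alpha)
                           * (topic_word[j][w] + beta)
                           / (topic_total[j] + vocab_size * beta)
                           for j in range(n_topics)]
                k = rng.choices(range(n_topics), weights)[0]
                assign[di][wi] = k
                doc_topic[di][k] += 1; topic_word[k][w] += 1; topic_total[k] += 1
    return [sorted(tw, key=tw.get, reverse=True)[:3] for tw in topic_word]

# Toy corpus: contract-themed vs design-defect-themed "documents".
docs = [["contract", "clause", "contract"], ["design", "defect", "design"],
        ["contract", "clause"], ["design", "defect"]]
topics = lda_topics(docs, n_topics=2)
```

In the study, each discovered topic (e.g. contract, technology, funds) is then matched against the literature to enumerate concrete risk factors.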
Affiliation(s)
- Lianghai Jin
- Hubei Key Laboratory of Hydro-Power Construction and Management, China Three Gorges University, Yichang, Hubei, China
- College of Hydraulic and Environmental Engineering, China Three Gorges University, Yichang, Hubei, China
- Construction Law Appraisal Center, China Three Gorges University, Yichang, Hubei, China
- Safety Production Standardization Review Center, China Three Gorges University, Yichang, Hubei, China
- Chenxi Li
- Hubei Key Laboratory of Hydro-Power Construction and Management, China Three Gorges University, Yichang, Hubei, China
- College of Hydraulic and Environmental Engineering, China Three Gorges University, Yichang, Hubei, China
- Zhongrong Zhu
- Hubei Key Laboratory of Hydro-Power Construction and Management, China Three Gorges University, Yichang, Hubei, China
- College of Hydraulic and Environmental Engineering, China Three Gorges University, Yichang, Hubei, China
- Safety Production Standardization Review Center, China Three Gorges University, Yichang, Hubei, China
- Songxiang Zou
- Hubei Key Laboratory of Hydro-Power Construction and Management, China Three Gorges University, Yichang, Hubei, China
- College of Hydraulic and Environmental Engineering, China Three Gorges University, Yichang, Hubei, China
- Xushu Sun
- Hubei Key Laboratory of Hydro-Power Construction and Management, China Three Gorges University, Yichang, Hubei, China
- College of Hydraulic and Environmental Engineering, China Three Gorges University, Yichang, Hubei, China
3
Lokker C, Bagheri E, Abdelkader W, Parrish R, Afzal M, Navarro T, Cotoi C, Germini F, Linkins L, Haynes RB, Chu L, Iorio A. Deep Learning to Refine the Identification of High-Quality Clinical Research Articles from the Biomedical Literature: Performance Evaluation. J Biomed Inform 2023; 142:104384. [PMID: 37164244 DOI: 10.1016/j.jbi.2023.104384] [Received: 11/15/2022] [Revised: 04/24/2023] [Accepted: 05/03/2023] [Indexed: 05/12/2023]
Abstract
BACKGROUND Identifying practice-ready evidence-based journal articles in medicine is a challenge due to the sheer volume of biomedical research publications. Newer approaches to support evidence discovery apply deep learning techniques to improve the efficiency and accuracy of classifying sound evidence. OBJECTIVE To determine how well deep learning models using variants of Bidirectional Encoder Representations from Transformers (BERT) identify high-quality evidence with high clinical relevance from the biomedical literature for consideration in clinical practice. METHODS We fine-tuned variations of BERT models (BERTBASE, BioBERT, BlueBERT, and PubMedBERT) and compared their performance in classifying articles based on methodological quality criteria. The dataset used for fine-tuning included titles and abstracts of >160,000 PubMed records from 2012-2020 that were relevant to human health and had been manually labeled against established critical appraisal criteria for methodological rigor. The data was randomly divided into 80:10:10 sets for training, validation, and testing. In addition to the full unbalanced set, the training data was randomly undersampled into four balanced datasets to assess performance and select the best performing model. For each of the four sets, one model that maintained sensitivity (recall) at ≥99% was selected, and the four models were ensembled. The best performing model was evaluated in a prospective, blinded test and applied to an established reference standard, the Clinical Hedges dataset. RESULTS In training, three of the four selected best performing models were trained using BioBERTBASE. The ensembled model did not boost performance compared with the best individual model, so a solo BioBERT-based model (named DL-PLUS) was selected for further testing as it was computationally more efficient. The model had high recall (>99%) and 60% to 77% specificity in a prospective evaluation conducted with blinded research associates, and saved >60% of the work required to identify high-quality articles. CONCLUSIONS Deep learning using pretrained language models and a large dataset of classified articles produced models with improved specificity while maintaining >99% recall. The resulting DL-PLUS model identifies high-quality, clinically relevant articles from PubMed at the time of publication. The model improves the efficiency of a literature surveillance program, which allows for faster dissemination of appraised research.
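The balanced-dataset construction mentioned in METHODS (randomly undersampling the majority class of the unbalanced corpus) might look like the sketch below; `undersample` is a hypothetical helper, not the authors' code:

```python
import random

def undersample(records, labels, seed=0):
    """One balanced training set: randomly downsample the majority class
    (here, the many articles that did not meet rigor criteria) to match
    the minority class size. Different seeds yield different balanced sets."""
    rng = random.Random(seed)
    pos = [r for r, y in zip(records, labels) if y == 1]
    neg = [r for r, y in zip(records, labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))
    data = [(r, 1) for r in (pos if minority is pos else kept)] + \
           [(r, 0) for r in (neg if minority is neg else kept)]
    rng.shuffle(data)
    return data
```

Repeating this with four seeds gives the four balanced datasets the abstract describes, each used to train and select one high-recall candidate model.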
Affiliation(s)
- Cynthia Lokker
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada.
- Elham Bagheri
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Wael Abdelkader
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Rick Parrish
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Muhammad Afzal
- Department of Computing, Birmingham City University, Birmingham, UK
- Tamara Navarro
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Chris Cotoi
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Federico Germini
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada; Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Lori Linkins
- Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- R Brian Haynes
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada; Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Lingyang Chu
- Department of Computing and Software, McMaster University, Hamilton, Ontario, Canada
- Alfonso Iorio
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada; Department of Medicine, McMaster University, Hamilton, Ontario, Canada
4
Šuster S, Baldwin T, Lau JH, Jimeno Yepes A, Martinez Iraola D, Otmakhova Y, Verspoor K. Automating Quality Assessment of Medical Evidence in Systematic Reviews: Model Development and Validation Study. J Med Internet Res 2023; 25:e35568. [PMID: 36722350 PMCID: PMC10131699 DOI: 10.2196/35568] [Received: 12/12/2021] [Revised: 01/18/2023] [Accepted: 01/31/2023] [Indexed: 02/01/2023]
Abstract
BACKGROUND Assessment of the quality of medical evidence available on the web is a critical step in the preparation of systematic reviews. Existing tools that automate parts of this task validate the quality of individual studies but not of entire bodies of evidence and focus on a restricted set of quality criteria. OBJECTIVE We proposed a quality assessment task that provides an overall quality rating for each body of evidence (BoE), as well as finer-grained justification for different quality criteria according to the Grading of Recommendation, Assessment, Development, and Evaluation formalization framework. For this purpose, we constructed a new data set and developed a machine learning baseline system (EvidenceGRADEr). METHODS We algorithmically extracted quality-related data from all summaries of findings found in the Cochrane Database of Systematic Reviews. Each BoE was defined by a set of population, intervention, comparison, and outcome criteria and assigned a quality grade (high, moderate, low, or very low) together with quality criteria (justification) that influenced that decision. Different statistical data, metadata about the review, and parts of the review text were extracted as support for grading each BoE. After pruning the resulting data set with various quality checks, we used it to train several neural-model variants. The predictions were compared against the labels originally assigned by the authors of the systematic reviews. RESULTS Our quality assessment data set, Cochrane Database of Systematic Reviews Quality of Evidence, contains 13,440 instances, or BoEs labeled for quality, originating from 2252 systematic reviews published on the internet from 2002 to 2020. 
On the basis of a 10-fold cross-validation, the best neural binary classifiers for quality criteria detected risk of bias at 0.78 F1 (P=.68; R=0.92) and imprecision at 0.75 F1 (P=.66; R=0.86), while the performance on inconsistency, indirectness, and publication bias criteria was lower (F1 in the range of 0.3-0.4). The prediction of the overall quality grade into 1 of the 4 levels resulted in 0.5 F1. When casting the task as a binary problem by merging the Grading of Recommendation, Assessment, Development, and Evaluation classes (high+moderate vs low+very low-quality evidence), we attained 0.74 F1. We also found that the results varied depending on the supporting information that is provided as an input to the models. CONCLUSIONS Different factors affect the quality of evidence in the context of systematic reviews of medical evidence. Some of these (risk of bias and imprecision) can be automated with reasonable accuracy. Other quality dimensions such as indirectness, inconsistency, and publication bias prove more challenging for machine learning, largely because they are much rarer. This technology could substantially reduce reviewer workload in the future and expedite quality assessment as part of evidence synthesis.
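The reported binary F1 scores follow directly from the stated precision/recall pairs, and the "high+moderate vs low+very low" recasting is a simple label map. A small sketch (function and variable names are illustrative, not from the paper):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Collapse the four GRADE levels into the binary problem described above.
GRADE_TO_BINARY = {"high": 1, "moderate": 1, "low": 0, "very low": 0}

def merge_grades(grades):
    return [GRADE_TO_BINARY[g] for g in grades]

f1(0.68, 0.92)  # -> ~0.78, matching the reported risk-of-bias F1
f1(0.66, 0.86)  # -> ~0.75, matching the reported imprecision F1
```

The reconstruction confirms the paper's figures are internally consistent: both headline criterion scores are exactly the harmonic means of the quoted precision and recall values.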
Affiliation(s)
- Simon Šuster
- School of Computing and Information Systems, University of Melbourne, Melbourne, Australia
- Timothy Baldwin
- School of Computing and Information Systems, University of Melbourne, Melbourne, Australia
- Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Jey Han Lau
- School of Computing and Information Systems, University of Melbourne, Melbourne, Australia
- Antonio Jimeno Yepes
- School of Computing and Information Systems, University of Melbourne, Melbourne, Australia
- School of Computing Technologies, RMIT University, Melbourne, Australia
- Yulia Otmakhova
- School of Computing and Information Systems, University of Melbourne, Melbourne, Australia
- Karin Verspoor
- School of Computing Technologies, RMIT University, Melbourne, Australia
5
Abdelkader W, Navarro T, Parrish R, Cotoi C, Germini F, Iorio A, Haynes RB, Lokker C. Machine Learning Approaches to Retrieve High-Quality, Clinically Relevant Evidence From the Biomedical Literature: Systematic Review. JMIR Med Inform 2021; 9:e30401. [PMID: 34499041 PMCID: PMC8461527 DOI: 10.2196/30401] [Received: 05/13/2021] [Revised: 07/15/2021] [Accepted: 07/25/2021] [Indexed: 11/20/2022]
Abstract
BACKGROUND The rapid growth of the biomedical literature makes identifying strong evidence a time-consuming task. Applying machine learning to the process could be a viable solution that limits effort while maintaining accuracy. OBJECTIVE The goal of the research was to summarize the nature and comparative performance of machine learning approaches that have been applied to retrieve high-quality evidence for clinical consideration from the biomedical literature. METHODS We conducted a systematic review of studies that applied machine learning techniques to identify high-quality clinical articles in the biomedical literature. Multiple databases were searched to July 2020. Extracted data focused on the applied machine learning model, steps in the development of the models, and model performance. RESULTS From 3918 retrieved studies, 10 met our inclusion criteria. All followed a supervised machine learning approach and applied, from a limited range of options, a high-quality standard for the training of their model. The results show that machine learning can achieve a sensitivity of 95% while maintaining a high precision of 86%. CONCLUSIONS Machine learning approaches perform well in retrieving high-quality clinical studies. Performance may improve by applying more sophisticated approaches such as active learning and unsupervised machine learning approaches.
Affiliation(s)
- Wael Abdelkader
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Tamara Navarro
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Rick Parrish
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Chris Cotoi
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Federico Germini
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Department of Medicine, McMaster University, Hamilton, ON, Canada
- Alfonso Iorio
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Department of Medicine, McMaster University, Hamilton, ON, Canada
- R Brian Haynes
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Department of Medicine, McMaster University, Hamilton, ON, Canada
- Cynthia Lokker
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
6
Afzal M, Alam F, Malik KM, Malik GM. Clinical Context-Aware Biomedical Text Summarization Using Deep Neural Network: Model Development and Validation. J Med Internet Res 2020; 22:e19810. [PMID: 33095174 PMCID: PMC7647812 DOI: 10.2196/19810] [Received: 05/02/2020] [Accepted: 09/24/2020] [Indexed: 01/09/2023]
Abstract
Background Automatic text summarization (ATS) enables users to retrieve meaningful evidence from big data of biomedical repositories to make complex clinical decisions. Deep neural and recurrent networks outperform traditional machine-learning techniques in areas of natural language processing and computer vision; however, they are yet to be explored in the ATS domain, particularly for medical text summarization. Objective Traditional approaches in ATS for biomedical text suffer from fundamental issues such as an inability to capture clinical context, quality of evidence, and purpose-driven selection of passages for the summary. We aimed to circumvent these limitations through achieving precise, succinct, and coherent information extraction from credible published biomedical resources, and to construct a simplified summary containing the most informative content that can offer a review particular to clinical needs. Methods In our proposed approach, we introduce a novel framework, termed Biomed-Summarizer, that provides quality-aware Patient/Problem, Intervention, Comparison, and Outcome (PICO)-based intelligent and context-enabled summarization of biomedical text. Biomed-Summarizer integrates the prognosis quality recognition model with a clinical context–aware model to locate text sequences in the body of a biomedical article for use in the final summary. First, we developed a deep neural network binary classifier for quality recognition to acquire scientifically sound studies and filter out others. Second, we developed a bidirectional long-short term memory recurrent neural network as a clinical context–aware classifier, which was trained on semantically enriched features generated using a word-embedding tokenizer for identification of meaningful sentences representing PICO text sequences. 
Third, we calculated the similarity between query and PICO text sequences using Jaccard similarity with semantic enrichments, where the semantic enrichments are obtained using medical ontologies. Last, we generated a representative summary from the high-scoring PICO sequences aggregated by study type, publication credibility, and freshness score. Results Evaluation of the prognosis quality recognition model using a large dataset of biomedical literature related to intracranial aneurysm showed an accuracy of 95.41% (2562/2686) in terms of recognizing quality articles. The clinical context–aware multiclass classifier outperformed the traditional machine-learning algorithms, including support vector machine, gradient boosted tree, linear regression, K-nearest neighbor, and naïve Bayes, by achieving 93% (16127/17341) accuracy for classifying five categories: aim, population, intervention, results, and outcome. The semantic similarity algorithm achieved a significant Pearson correlation coefficient of 0.61 (0-1 scale) on a well-known BIOSSES dataset (with 100 pair sentences) after semantic enrichment, representing an improvement of 8.9% over baseline Jaccard similarity. Finally, we found a highly positive correlation among the evaluations performed by three domain experts concerning different metrics, suggesting that the automated summarization is satisfactory. Conclusions By employing the proposed method Biomed-Summarizer, high accuracy in ATS was achieved, enabling seamless curation of research evidence from the biomedical literature to use for clinical decision-making.
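The semantically enriched Jaccard step described above can be sketched as follows, with a toy synonym map standing in for the medical ontologies used by Biomed-Summarizer; all names and the example terms are illustrative:

```python
def enrich(tokens, synonyms):
    """Expand a token set with ontology synonyms before comparison."""
    out = set(tokens)
    for t in tokens:
        out |= synonyms.get(t, set())
    return out

def jaccard(a, b):
    """Plain Jaccard similarity of two sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def semantic_jaccard(query, sentence, synonyms):
    return jaccard(enrich(set(query), synonyms), enrich(set(sentence), synonyms))

# Toy ontology: "mi" expands to its spelled-out form.
synonyms = {"mi": {"myocardial", "infarction"}}
query = ["mi", "risk"]
sentence = ["myocardial", "infarction", "risk"]
plain = jaccard(set(query), set(sentence))              # -> 0.25
enriched = semantic_jaccard(query, sentence, synonyms)  # -> 0.75
```

The enrichment lets surface-mismatched but semantically equivalent terms overlap, which is the mechanism behind the reported 8.9% improvement over baseline Jaccard on BIOSSES.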
Affiliation(s)
- Muhammad Afzal
- Department of Software, Sejong University, Seoul, Republic of Korea
- Department of Computer Science & Engineering, School of Engineering and Computer Science, Oakland University, Rochester, MI, United States
- Fakhare Alam
- Department of Computer Science & Engineering, School of Engineering and Computer Science, Oakland University, Rochester, MI, United States
- Khalid Mahmood Malik
- Department of Computer Science & Engineering, School of Engineering and Computer Science, Oakland University, Rochester, MI, United States
- Ghaus M Malik
- Department of Neurosurgery, Henry Ford Hospital, Detroit, MI, United States
7
Deep Learning Based Biomedical Literature Classification Using Criteria of Scientific Rigor. Electronics 2020. [DOI: 10.3390/electronics9081253] [Indexed: 12/15/2022]
Abstract
A major obstacle to supporting evidence-based clinical decision-making is accurately and efficiently recognizing appropriate and scientifically rigorous studies in the biomedical literature. We trained a multi-layer perceptron (MLP) model on a dataset with two textual features, title and abstract. The dataset, consisting of 7958 PubMed citations classified into two classes (scientific rigor and non-rigor), was used to train the proposed model. We compared our model with other promising machine learning models such as Support Vector Machine (SVM), Decision Tree, Random Forest, and Gradient Boosted Tree (GBT) approaches. Based on the higher cumulative score, deep learning was chosen and was tested on test datasets obtained by running a set of domain-specific queries. On the training dataset, the proposed deep learning model obtained significantly higher accuracy (97.3%) and AUC (0.993) than the competitors, but slightly lower recall (95.1%) compared with GBT. The trained model sustained its performance on the testing datasets. Unlike previous approaches, the proposed model does not require a human expert to create fresh annotated data; instead, we used studies cited in Cochrane reviews as a surrogate for quality studies in a clinical topic. We conclude that deep learning methods are beneficial for biomedical literature classification. Not only do such methods minimize the workload in feature engineering, but they also show better performance on large and noisy data.
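The AUC reported here (0.993) can be understood via the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen rigorous article is scored above a randomly chosen non-rigorous one. A minimal sketch, not the evaluation code used in the paper:

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly;
    ties count half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])  # -> 0.75 (3 of 4 pairs ordered correctly)
```

This quadratic-pair version is fine for illustration; production implementations sort once and use rank sums for O(n log n) cost.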
8
Afzal M, Hussain M, Malik KM, Lee S. Impact of Automatic Query Generation and Quality Recognition Using Deep Learning to Curate Evidence From Biomedical Literature: Empirical Study. JMIR Med Inform 2019; 7:e13430. [PMID: 31815673 PMCID: PMC6928703 DOI: 10.2196/13430] [Received: 01/17/2019] [Revised: 08/07/2019] [Accepted: 09/26/2019] [Indexed: 11/13/2022]
Abstract
BACKGROUND The quality of health care is continuously improving and is expected to improve further because of the advancement of machine learning and knowledge-based techniques, along with the innovation and availability of wearable sensors. With these advancements, health care professionals are becoming more interested in seeking scientific research evidence from external sources for decision making relevant to medical diagnosis, treatment, and prognosis. However, little work has been done to develop methods for unobtrusive and seamless curation of data from the biomedical literature. OBJECTIVE This study aimed to design a framework that intelligently brings quality publications to users, assisting medical practitioners in answering clinical questions and fulfilling their informational needs. METHODS The proposed framework consists of methods for efficient biomedical literature curation, including the automatic construction of a well-built question, the recognition of evidence quality via a proposed extended quality recognition model (E-QRM), and the ranking and summarization of the extracted evidence. RESULTS Unlike previous works, the proposed framework systematically integrates the echelons of biomedical literature curation by including methods for searching queries, content quality assessment, and ranking and summarization. Using an ensemble approach, our high-impact classifier E-QRM obtained significantly higher accuracy than the existing quality recognition model (1723/1894, 90.97% vs 1462/1894, 77.21%). CONCLUSIONS Our proposed methods and evaluation demonstrate the validity and rigorousness of the results, which can be used in different applications, including evidence-based medicine, precision medicine, and medical education.
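E-QRM is described as an ensemble, but this abstract does not specify its combination rule, so the sketch below shows plain majority voting over binary base classifiers as one common option; the actual E-QRM scheme may differ:

```python
def majority_vote(classifiers, x):
    """Combine binary base classifiers (callables returning 0 or 1)
    by simple majority vote over their predictions for input x."""
    votes = sum(clf(x) for clf in classifiers)
    return int(2 * votes > len(classifiers))

# Toy base classifiers standing in for trained quality-recognition models.
clfs = [lambda x: 1, lambda x: 1, lambda x: 0]
majority_vote(clfs, "some abstract text")  # -> 1
```

Ensembling helps when base models make partly independent errors, which is consistent with the accuracy gain reported over the single existing quality recognition model.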
Affiliation(s)
- Muhammad Afzal
- Department of Software, Sejong University, Seoul, Republic of Korea
- Department of Computer Science and Engineering, Oakland University, Rochester, MI, United States
- Maqbool Hussain
- Department of Software, Sejong University, Seoul, Republic of Korea
- Khalid Mahmood Malik
- Department of Computer Science and Engineering, Oakland University, Rochester, MI, United States
- Sungyoung Lee
- Department of Computer Science and Engineering, Kyung Hee University, Yongin, Republic of Korea