1
Jacquemyn X, Guerrier K, Harvey E, Tackett S, Kutty S, Wetzel GT. pECGreview: Assessment of a Novel Tool to Evaluate the Accuracy of Pediatric ECG Interpretation Skills. Pediatr Cardiol 2024. PMID: 38953950. DOI: 10.1007/s00246-024-03556-z. Received: 05/03/2024; Accepted: 06/14/2024.
Abstract
Skill in interpreting the electrocardiogram (ECG) remains poor despite existing educational initiatives. We sought to evaluate the validity of a subjective scoring system for assessing the accuracy of ECG interpretations submitted by pediatric cardiology fellows, trainees, and faculty to the Pediatric ECG Review (pECGreview), a web-based ECG interpretation training program. We conducted a retrospective, cross-sectional study of responses submitted to pECGreview. ECG interpretations were assessed independently by four raters with a range of experience. Accuracy was scored on a 3-point scale: 100% for generally correct interpretations, 50% for over- or underdiagnosis of minor ECG abnormalities, and 0% for over- or underdiagnosis of major ECG abnormalities. Inter-rater agreement was assessed using expanded Bland-Altman plots, Pearson correlation coefficients, and intraclass correlation coefficients (ICC). In total, 1460 ECG interpretations by 192 participants were analyzed; 107 participants interpreted at least five ECGs. The mean accuracy score was 76.6 ± 13.7%. Participants were correct in 66.1 ± 5.1% of interpretations, showed minor over- or underdiagnosis in 21.5 ± 4.6%, and major over- or underdiagnosis in 12.3 ± 3.9%. Bland-Altman analysis between evaluators demonstrated limits of agreement of 11.3%, and inter-rater agreement was consistent across rater pairs (all correlations ≥ 0.75). The ICC for absolute agreement was 0.74 (95% CI 0.69-0.80) and for average measures 0.92 (95% CI 0.89-0.94). Accuracy score analysis of as few as five ECG interpretations submitted to pECGreview yielded good inter-rater reliability for assessing and ranking ECG interpretation skills in pediatric cardiology fellows in training.
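The Bland-Altman limits of agreement used in this study can be sketched as follows. This is a minimal illustration with hypothetical rater scores on the study's 0/50/100% scale, not the study's own data or code:

```python
import numpy as np

def limits_of_agreement(a, b):
    """Bland-Altman 95% limits of agreement between two raters' scores:
    mean difference +/- 1.96 * SD of the differences."""
    d = np.asarray(a, float) - np.asarray(b, float)
    bias = d.mean()
    sd = d.std(ddof=1)  # sample SD of paired differences
    return bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical accuracy scores (%) from two raters on the same eight ECGs
rater1 = [100, 50, 100, 0, 50, 100, 100, 50]
rater2 = [100, 50, 50, 0, 50, 100, 100, 100]
lo, hi = limits_of_agreement(rater1, rater2)
```

Narrower limits indicate closer rater agreement; the study reports limits of agreement of 11.3% across its four raters.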
Affiliation(s)
- Xander Jacquemyn
- Helen B. Taussig Heart Center, Department of Pediatrics, Johns Hopkins Hospital, M2315, 1800 Orleans St, Baltimore, MD, 21287, USA
- Department of Cardiovascular Sciences, KU Leuven, Leuven, Belgium
- Karine Guerrier
- Division of Peds Cardiology, Department of Pediatrics, University of Tennessee Health Science Center, College of Medicine, Memphis, TN, USA
- Evan Harvey
- Division of Peds Cardiology, Department of Pediatrics, University of Tennessee Health Science Center, College of Medicine, Memphis, TN, USA
- Sean Tackett
- Biostatistics, Epidemiology, and Data Management Core, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Shelby Kutty
- Helen B. Taussig Heart Center, Department of Pediatrics, Johns Hopkins Hospital, M2315, 1800 Orleans St, Baltimore, MD, 21287, USA
- Glenn T Wetzel
- Helen B. Taussig Heart Center, Department of Pediatrics, Johns Hopkins Hospital, M2315, 1800 Orleans St, Baltimore, MD, 21287, USA
- Division of Peds Cardiology, Department of Pediatrics, University of Tennessee Health Science Center, College of Medicine, Memphis, TN, USA
2
Al-Dasuqi K, Taylor E, Ehrlich L, Cooperman D, Socci A, Tuason D, Hoerner M, Staib L, Silva CT. Performance and reliability assessment of a lower dose, task-based scoliosis radiography protocol in pediatric patients. Pediatr Radiol 2024; 54:146-153. PMID: 38010426. DOI: 10.1007/s00247-023-05812-5. Received: 03/27/2023; Revised: 11/08/2023; Accepted: 11/09/2023.
Abstract
BACKGROUND: Follow-up scoliosis radiographs are performed to assess the degree of spinal curvature and skeletal maturity, which can be done at lower radiation exposures than those of standard-dose radiography. OBJECTIVE: To describe and evaluate a protocol that reduced the radiation dose of follow-up frontal-view scoliosis radiographs. MATERIALS AND METHODS: We implemented a postero-anterior, lower-dose modified technique for scoliosis radiography with a task-based definition of adequate image quality and technique charts based on target exposure index and the patient's height and weight. We then retrospectively evaluated 40 consecutive patients who underwent a follow-up radiograph using the modified technique after an initial standard-technique radiograph. Proportions from subjective assessments were compared with chi-squared tests, and agreement between readers' scores was assessed with intraclass correlation coefficients and Bland-Altman plots. We determined incident air kerma, exposure index, deviation index/standard deviation, dose-area product (DAP), and effective dose for each radiograph, with statistical significance set at P<0.05. RESULTS: Forty patients (65% female), aged 4-17 years, were included. Median effective dose was reduced from 39 to 10 µSv (P<0.001), incident air kerma from 139 to 29 µSv (P<0.001), and DAP from 266 to 55 mGy·cm² (P<0.001). All modified-technique parameters received a mean score of acceptable or above, and all modified-technique measurements achieved inter- and intra-observer correlation coefficients of 0.86 ("good") or greater. CONCLUSION: Substantial dose reduction in follow-up scoliosis imaging with existing radiography units is achievable through a task-based definition of adequate image quality and tailoring of radiation to each patient's height and weight, while still allowing reliable assessment and reproducible measurements.
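The relative dose reductions implied by the study's reported medians work out as follows. This is simple arithmetic on the figures quoted in the abstract, not the authors' analysis:

```python
def percent_reduction(before, after):
    """Percent reduction from the standard to the modified technique."""
    return 100.0 * (before - after) / before

# Median doses reported in the abstract (standard technique -> modified technique)
eff_dose = percent_reduction(39, 10)   # effective dose, µSv  -> ~74% reduction
kerma = percent_reduction(139, 29)     # incident air kerma   -> ~79% reduction
dap = percent_reduction(266, 55)       # dose-area product, mGy·cm² -> ~79% reduction
```

All three metrics thus fell by roughly three quarters while image quality remained acceptable.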
Affiliation(s)
- Khalid Al-Dasuqi
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, 333 Cedar St, New Haven, CT, 06510, USA
- Department of Radiology, Boston Children's Hospital, Boston, MA, USA
- Erin Taylor
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, 333 Cedar St, New Haven, CT, 06510, USA
- Advanced Diagnostic Imaging, St. Vincent's Medical Center, Hartford Healthcare, Bridgeport, CT, USA
- Lauren Ehrlich
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, 333 Cedar St, New Haven, CT, 06510, USA
- Daniel Cooperman
- Department of Orthopaedics and Rehabilitation, Yale School of Medicine, New Haven, CT, USA
- Adrienne Socci
- Department of Orthopaedics and Rehabilitation, Yale School of Medicine, New Haven, CT, USA
- Dominick Tuason
- Department of Orthopaedics and Rehabilitation, Yale School of Medicine, New Haven, CT, USA
- Matthew Hoerner
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, 333 Cedar St, New Haven, CT, 06510, USA
- Lawrence Staib
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, 333 Cedar St, New Haven, CT, 06510, USA
- Department of Biomedical Engineering, Yale School of Engineering, New Haven, CT, USA
- Department of Electrical Engineering, Yale School of Engineering, New Haven, CT, USA
- Cicero T Silva
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, 333 Cedar St, New Haven, CT, 06510, USA
3
Yang F, Zamzmi G, Angara S, Rajaraman S, Aquilina A, Xue Z, Jaeger S, Papagiannakis E, Antani SK. Assessing Inter-Annotator Agreement for Medical Image Segmentation. IEEE Access 2023; 11:21300-21312. PMID: 37008654. PMCID: PMC10062409. DOI: 10.1109/ACCESS.2023.3249759.
Abstract
Artificial Intelligence (AI)-based medical computer vision algorithm training and evaluation depend on annotations and labeling. However, variability between expert annotators introduces noise into training data that can adversely affect the performance of AI algorithms. This study aims to assess, illustrate, and interpret the inter-annotator agreement among multiple expert annotators when segmenting the same lesion(s)/abnormalities on medical images. We propose three metrics for the qualitative and quantitative assessment of inter-annotator agreement: 1) a common agreement heatmap and a ranking agreement heatmap; 2) the extended Cohen's kappa and Fleiss' kappa coefficients for quantitative evaluation and interpretation of inter-annotator reliability; and 3) the Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm, as a parallel step, to generate ground truth for training AI models and to compute Intersection over Union (IoU), sensitivity, and specificity for assessing inter-annotator reliability and variability. Experiments are performed on two datasets, namely cervical colposcopy images from 30 patients and chest X-ray images from 336 tuberculosis (TB) patients, to demonstrate the consistency of inter-annotator reliability assessment and the importance of combining different metrics to avoid biased assessment.
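The IoU metric mentioned above is straightforward to compute for a pair of binary segmentation masks. A minimal sketch with hypothetical annotator masks, not the paper's implementation:

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over Union between two binary segmentation masks."""
    a = np.asarray(mask_a, bool)
    b = np.asarray(mask_b, bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 1.0  # two empty masks agree perfectly

# Two hypothetical annotators outlining the same lesion on an 8x8 image,
# offset by one pixel in each direction
m1 = np.zeros((8, 8), bool); m1[2:6, 2:6] = True  # 16-pixel square
m2 = np.zeros((8, 8), bool); m2[3:7, 3:7] = True  # shifted 16-pixel square
```

Here the masks overlap on a 3x3 region, giving IoU = 9/23, roughly 0.39: identical outlines would score 1.0, disjoint outlines 0.0.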
Affiliation(s)
- Feng Yang
- National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Ghada Zamzmi
- National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Sandeep Angara
- National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Zhiyun Xue
- National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Stefan Jaeger
- National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Sameer K Antani
- National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
4
Bland–Altman Limits of Agreement from a Bayesian and Frequentist Perspective. Stats 2021. DOI: 10.3390/stats4040062. Open access.
Abstract
Bland–Altman agreement analysis has gained widespread application across disciplines, last but not least in health sciences, since its inception in the 1980s. Bayesian analysis has been on the rise due to increased computational power over time, and Alari, Kim, and Wand have put Bland–Altman Limits of Agreement in a Bayesian framework (Meas. Phys. Educ. Exerc. Sci. 2021, 25, 137–148). We contrasted the prediction of a single future observation and the estimation of the Limits of Agreement from the frequentist and a Bayesian perspective by analyzing interrater data of two sequentially conducted, preclinical studies. The estimation of the Limits of Agreement θ1 and θ2 has wider applicability than the prediction of single future differences. While a frequentist confidence interval represents a range of nonrejectable values for null hypothesis significance testing of H0: θ1 ≤ −δ or θ2 ≥ δ against H1: θ1 > −δ and θ2 < δ, with a predefined benchmark value δ, Bayesian analysis allows for direct interpretation of both the posterior probability of the alternative hypothesis and the likelihood of parameter values. We discuss group-sequential testing and nonparametric alternatives briefly. Frequentist simplicity does not beat Bayesian interpretability due to improved computational resources, but the elicitation and implementation of prior information demand caution. Accounting for clustered data (e.g., repeated measurements per subject) is well-established in frequentist, but not yet in Bayesian Bland–Altman analysis.
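The frequentist testing scheme described above (rejecting H0 when both limits of agreement lie within a benchmark ±δ) can be sketched as follows. The CI uses the standard Bland-Altman variance approximation for an estimated limit of agreement; the data are hypothetical rater differences, not those of the two preclinical studies:

```python
import math

def loa_with_ci(diffs, z=1.96):
    """Frequentist limits of agreement theta1, theta2 with approximate 95% CIs,
    using the Bland-Altman variance approximation
    Var(LoA) ~= (1/n + 1.96^2 / (2(n-1))) * s^2."""
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    lo, hi = mean - 1.96 * sd, mean + 1.96 * sd
    se = sd * math.sqrt(1.0 / n + 1.96 ** 2 / (2 * (n - 1)))
    return (lo - z * se, lo + z * se), (hi - z * se, hi + z * se)

def agreement_ok(diffs, delta):
    """Accept H1 (agreement within +/-delta) only if both CIs lie
    entirely inside the benchmark interval (-delta, delta)."""
    (lo_l, _), (_, hi_u) = loa_with_ci(diffs)
    return lo_l > -delta and hi_u < delta

# Hypothetical interrater differences from ten paired ratings
diffs = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.05, 0.1, -0.15]
```

With these differences, agreement would be concluded against a lenient benchmark δ = 0.5 but not against a stricter δ = 0.3, mirroring the role of the predefined benchmark value δ in the hypothesis test above.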