1
Rtam N. Evaluation of the departmental inter-rater reliability when scoring thyroid nodules according to the British Thyroid Association Ultrasound-classification model: Is there significant disagreement? Ultrasound 2024; 32:76-84. [PMID: 38694831 PMCID: PMC11060119 DOI: 10.1177/1742271x231215500] [Received: 07/12/2023] [Accepted: 08/30/2023] [Indexed: 05/04/2024]
Abstract
Introduction The British Thyroid Association Ultrasound-classification is a risk stratification model which grades thyroid nodules as U2-5 based on their sonographic appearance. Variability between ultrasound operators when U-scoring is reported in the literature, with some evidence found in the author's department. The aim of this study was to investigate whether there is significant disagreement in the department and identify potential reasons for variability. Methods Eight operators, radiologists and sonographers, were recruited to grade 33 thyroid nodules (TNs) and answer a tick box questionnaire using the British Thyroid Association lexicon. The inter-operator variability for the U-categories, indication for fine-needle aspiration biopsy and ultrasound features was assessed using Fleiss' kappa and Gwet-AC1. The operators' accuracy was measured against the most experienced operator in the department using Cohen's kappa and percentage agreement. Results Fair agreement (Fleiss' K = 0.21) was obtained between the participants when U-scoring (U2-5). Fair-to-moderate agreement was noted between sonographers (K = 0.40). Significant variability was demonstrated between radiologists (p > 0.05). Indication for fine-needle aspiration biopsy reached fair to almost substantial agreement (radiologists' AC1 = 0.34, sonographers' AC1 = 0.58, overall AC1 = 0.41). No significant variability was measured for echogenicity (K = 0.29), composition (K = 0.33), shape (K = 0.58), margin (K = 0.45), halo (K = 0.34) and vascularity (K = 0.44). Accuracy reached fair agreement (mean Cohen's K = 0.29) and moderate agreement (mean AC1 = 0.53) for the U-categories and fine-needle aspiration biopsy, respectively. Radiologists demonstrated lower accuracy. Conclusion No significant inter-rater variability in U-scoring or recommending fine-needle aspiration biopsy was demonstrated between all the operators in the department. Radiologists showed significant variability in U-scoring and lower accuracy.
Reliability and accuracy could be improved by addressing those problematic categories and features identified with this study.
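The abstract above reports both Fleiss' kappa and Gwet's AC1 for the same ratings. The following is a minimal sketch of how the two coefficients can be computed, not the study's analysis code. It assumes every item is rated by the same number of raters and, unless `categories` is passed, that the scale has no levels beyond those observed. The function names and the toy data are hypothetical.

```python
from collections import Counter


def _agreement_parts(ratings, categories=None):
    """Return (observed agreement, category proportions).

    ratings: list of items; each item is a list of labels, one per rater.
    Assumes the same number of raters (at least two) for every item.
    """
    n_raters = len(ratings[0])
    if categories is None:
        # Assumption: the scale has no categories beyond those observed.
        categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]
    # Share of agreeing rater pairs within each item, averaged over items.
    p_obs = sum(
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / len(ratings)
    # Overall proportion of ratings that fall in each category.
    props = [
        sum(c[k] for c in counts) / (len(ratings) * n_raters)
        for k in categories
    ]
    return p_obs, props


def fleiss_kappa(ratings, categories=None):
    """Fleiss' kappa: chance agreement is the sum of squared proportions."""
    p_obs, props = _agreement_parts(ratings, categories)
    p_chance = sum(p * p for p in props)
    return (p_obs - p_chance) / (1 - p_chance)


def gwet_ac1(ratings, categories=None):
    """Gwet's AC1: chance agreement is sum of p(1 - p) over (q - 1)."""
    p_obs, props = _agreement_parts(ratings, categories)
    p_chance = sum(p * (1 - p) for p in props) / (len(props) - 1)
    return (p_obs - p_chance) / (1 - p_chance)


# Toy data (hypothetical): two raters, four nodules, one disagreement.
toy = [["U2", "U2"], ["U2", "U2"], ["U2", "U2"], ["U2", "U3"]]
print(round(fleiss_kappa(toy), 3), round(gwet_ac1(toy), 3))
```

On this deliberately skewed toy set, raw agreement is 75% yet the two coefficients land far apart, which may help explain why a study would report both. Both functions divide by zero when only one category is ever used, so real data should be checked for that case first.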
Affiliation(s)
- Nabil Rtam
  - Ultrasound Department, Yeovil District Hospital, Somerset NHS Foundation Trust, Yeovil, UK
2
Pizzi MA, Damiao J. Inter-Rater Reliability of the Pizzi Health and Wellness Assessment (PHWA). Occup Ther Health Care 2024; 38:414-423. [PMID: 35703067 DOI: 10.1080/07380577.2022.2088916] [Received: 10/05/2021] [Revised: 05/30/2022] [Accepted: 06/04/2022] [Indexed: 10/18/2022]
Abstract
The objective of this study is to determine the inter-rater reliability of the Pizzi Health and Wellness Assessment (PHWA) by comparing the consistency in scores between clients and their caregivers in the following areas of participation: social, physical, family, occupational, mental/emotional, and spiritual. A retrospective inter-rater correlational design was used to analyze the agreement of scores from a convenience sample consisting of two groups: clients with disabilities (n = 19) and their healthy caregivers (n = 19). Inter-rater reliability was calculated using correlations for the PHWA as a whole, and for the current level of participation and wishing to improve participation subsections. Inter-rater reliability was calculated using an Intraclass Correlation Coefficient and either the Pearson or Spearman rho correlation, and the PHWA was found to be reliable between clients and caregivers (rICC = .636, p < .001; rho = .642, p < .001). More specifically, current level of participation demonstrated acceptable reliability (rICC = .513, p < .001; r = .521, p < .001) as did wishing to improve participation (rICC = .689, p < .001; r = .725, p < .001). This supports the PHWA as a clinically relevant health and wellness occupational therapy assessment.
Affiliation(s)
- Michael A Pizzi
  - Occupational Therapy, New York Institute of Technology, Old Westbury, NY, USA
- John Damiao
  - Occupational Therapy, Pace University, Pleasantville, NJ, USA
3
Stanley C, Rotman A, McKenzie D, Malcolm L, Paddle P. South of the UES: Improving the ability of speech-language pathologists to detect oesophageal abnormalities during videofluoroscopy swallowing studies. Int J Speech Lang Pathol 2024; 26:225-232. [PMID: 37403440 DOI: 10.1080/17549507.2023.2225801] [Indexed: 07/06/2023]
Abstract
PURPOSE With two-thirds of adults presenting for a videofluoroscopy swallow study (VFSS) with oesophageal abnormalities, it seems prudent to include visualisation of the oesophagus, in the context of the entire swallow process, to provide further information to the diagnostic team. This study aims to evaluate the ability of speech-language pathologists (SLPs) to interpret oesophageal sweep on VFSS and the relative improvement in that ability with additional training. METHOD One hundred SLPs attended training in oesophageal visualisation during VFSS, based on a previous study. Ten oesophageal sweep videos (five normal, five abnormal) with one 20 ml thin fluid barium bolus (19% w/v) were presented at baseline and following training. Raters were blinded to patient information other than age. Binary ratings were collected for oesophageal transit time (OTT), presence of stasis, redirection, and referral to other specialists. RESULT Inter-rater reliability as measured by Fleiss' kappa improved for all parameters, reaching statistical significance for OTT (pre-test kappa = 0.34, post-test kappa = 0.73; p < 0.01) and redirection (pre-test kappa = 0.38, post-test kappa = 0.49; p < 0.05). Overall agreement improved significantly (p < 0.001) for all parameters except stasis, where improvement was only slight. Interaction between pre-post and type of video (normal/abnormal) was statistically significant (p < 0.001) for redirection, with a large pre-post increase in positive accuracy compared with a slight pre-post decrease in negative accuracy. CONCLUSION Findings indicate that SLPs require training to accurately interpret an oesophageal sweep on VFSS. This supports the inclusion of education and training on both normal and abnormal oesophageal sweep patterns, and the use of standardised protocols for clinicians using oesophageal visualisation as part of the VFSS protocol.
Affiliation(s)
- Claire Stanley
  - Department of Otolaryngology, Head and Neck Surgery, Monash Health, Melbourne, Australia
  - Department of Surgery, Nursing and Health Sciences, Monash University, Melbourne, Australia
  - Melbourne Swallow Analysis Centre, Melbourne, Australia
- Anthony Rotman
  - Department of Otolaryngology, Head and Neck Surgery, Monash Health, Melbourne, Australia
  - Department of Surgery, Nursing and Health Sciences, Monash University, Melbourne, Australia
  - Melbourne Swallow Analysis Centre, Melbourne, Australia
- Dean McKenzie
  - Epworth HealthCare, Melbourne, Australia
  - Department of Epidemiology and Preventive Medicine, Monash University, Melbourne, Australia
- Paul Paddle
  - Department of Otolaryngology, Head and Neck Surgery, Monash Health, Melbourne, Australia
  - Department of Surgery, Nursing and Health Sciences, Monash University, Melbourne, Australia
  - Melbourne Swallow Analysis Centre, Melbourne, Australia
  - Epworth HealthCare, Melbourne, Australia
4
Yoshida R, Kuruma H. Intra- and Inter-rater Reliability of the Lumbar-Locked Thoracic Rotation Test in Patients With Neck Pain. Cureus 2024; 16:e56407. [PMID: 38638709 PMCID: PMC11023911 DOI: 10.7759/cureus.56407] [Accepted: 03/18/2024] [Indexed: 04/20/2024] [Open access]
Abstract
PURPOSE Neck pain is a common musculoskeletal disorder. Therefore, establishing effective physical therapy for neck pain is one of the most important issues. In addition, in physical therapy for neck pain, it is important to evaluate the thoracic spine, which is an adjacent region of the neck. The lumbar-locked rotation test is designed to evaluate the rotational range of the thoracic spine. However, the reliability of the test when performed on patients with neck pain has not been confirmed. OBJECTIVE We aimed to determine the intra- and inter-rater reliability of the lumbar-locked rotation test in patients with neck pain. METHODS In this study involving 43 patients, two separate examiners measured thoracic spine rotation. Both examiners conducted three measurements for each side, before and after a five-minute interval. Reliability was assessed using various intra-class correlation coefficient (ICC) models. RESULTS The intra-rater reliability showed ICC values of 0.99 for both examiners. The inter-rater reliability showed ICC values of 0.98 for both right and left thoracic rotations. CONCLUSION The findings strongly suggest that the lumbar-locked rotation test has high within-session intra- and inter-rater reliability for patients with neck pain. This test can be considered a reliable method of measuring the thoracic spine rotational range of motion in patients with neck pain in clinical practice.
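The abstract above mentions "various ICC models" without naming them. As an illustration only, the sketch below computes one common form, ICC(2,1) (two-way random effects, absolute agreement, single rater), from ANOVA mean squares. The choice of this model, the function name `icc_2_1`, and the rotation angles are assumptions made for the example and are not taken from the study.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one per subject; each row holds one score per rater.
    Needs at least two subjects and two raters, with no missing values.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical rotation angles (degrees): five patients, two examiners.
angles = [[42.0, 43.0], [55.0, 54.0], [38.0, 39.5], [61.0, 60.0], [47.5, 48.0]]
print(round(icc_2_1(angles), 3))
```

Because this form penalises systematic offsets between examiners, it tends to be a stricter choice than consistency-type ICCs; which model suits a given design is a judgment the analyst has to make.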
Affiliation(s)
- Ryota Yoshida
  - Department of Physical Therapy, Tokyo Metropolitan University, Tokyo, JPN
- Hironobu Kuruma
  - Department of Physical Therapy, Tokyo Metropolitan University, Tokyo, JPN
5
Lidbeck C, Bartonek Å, Ferrari A, Alboresi S, Örtqvist M. Signs of perceptual disorder during movement were reliably assessed in children with cerebral palsy in Sweden. Acta Paediatr 2024; 113:344-352. [PMID: 37874018 DOI: 10.1111/apa.17012] [Received: 01/26/2023] [Revised: 10/03/2023] [Accepted: 10/12/2023] [Indexed: 10/25/2023]
Abstract
AIM The aim of this Swedish study was to evaluate the assessment of clinical signs of perceptual disorder in children with cerebral palsy (CP). METHODS Three experienced raters assessed 56 videos of 19 children from 1 to 18 years of age with bilateral spastic CP, which were recorded by colleagues at an Italian hospital. Six signs were evaluated for inter-rater reliability and criterion validity. Clinical applicability was evaluated by assessing inter-rater reliability between 47 Swedish clinicians, who examined 15 of the videos during face-to-face and online education seminars. There were 41 physiotherapists, two occupational therapists and four doctors, with 1-37 years of clinical experience and a median of 10 years. RESULTS The experienced raters demonstrated moderate to almost perfect inter-rater reliability (kappa 0.54-0.81) and criterion validity (0.54-0.87) for startle reaction, upper limbs in startle position, averted eye gaze and eye blinking. The clinicians recognised these signs with at least moderate reliability (0.56-0.88). Grimacing and posture freezing were less reliable (0.22-0.35) and valid (0.09-0.50). CONCLUSION Four of the six signs of perceptual disorder were reliably recognised by experienced raters and by clinicians after education seminars. Extended education and larger study samples are needed to recognise all the signs.
Affiliation(s)
- Cecilia Lidbeck
  - Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden
  - Astrid Lindgren Children's Hospital, Karolinska University Hospital, Stockholm, Sweden
- Åsa Bartonek
  - Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden
  - Astrid Lindgren Children's Hospital, Karolinska University Hospital, Stockholm, Sweden
- Adriano Ferrari
  - Department of Neuroscience, University of Modena and Reggio Emilia, Reggio Emilia, Italy
- Silvia Alboresi
  - Children Rehabilitation Unit of S. M. Nuova Hospital, AUSL-IRCCS, Reggio Emilia, Italy
- Maria Örtqvist
  - Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden
  - Astrid Lindgren Children's Hospital, Karolinska University Hospital, Stockholm, Sweden
6
Lesinski M, Bashford G, Markov A, Risch L, Cassel M. Reliability of assessing skeletal muscle architecture and tissue organization of the gastrocnemius medialis and vastus lateralis muscle using ultrasound and spatial frequency analysis. Front Sports Act Living 2024; 6:1282031. [PMID: 38304420 PMCID: PMC10830747 DOI: 10.3389/fspor.2024.1282031] [Received: 09/22/2023] [Accepted: 01/05/2024] [Indexed: 02/03/2024] [Open access]
Abstract
Introduction The purpose of this study was to investigate inter- and intra-rater reliability as well as the inter-rater interpretation error of ultrasound measurements assessing skeletal muscle architecture and tissue organization of the gastrocnemius medialis (GM) and vastus lateralis (VL) muscle. Methods The GM and VL of 13 healthy adults (22 ± 3 years) were examined thrice with sagittal B-mode ultrasound: intraday test-retest examination by one investigator (intra-rater) and separate examinations by two investigators (inter-rater). Additionally, images from one investigator were analysed by two interpreters (interpretation error). Muscle architecture was assessed by muscle thickness [MT], fascicle length [FL], as well as superior and inferior pennation angle [PA]. Muscle tissue organization was determined by spatial frequency analysis (SFA: peak spatial frequency radius, peak -6 dB width, PSFR/P6, normalized peak value of amplitude spectrum [Amax], power within peak [PWP], peak power percent). Reliability of ultrasound examination and image interpretation are presented as intraclass correlation coefficient (ICC), test-retest variability, standard error of measurement as well as bias and limits of agreement. Results GM and VL demonstrated excellent ICCs for inter- and intra-rater reliability, along with excellent ICCs for interpretation error of MT (0.91-0.99), showing minimal variability (<5%) and SEM% (<5%). Systematic bias for MT was less than 1 mm. For PA and FL poor to good ICCs for inter- and intra-rater reliability were revealed (0.41-0.90), with moderate variability (<12%), low SEM% (<10%) and systematic bias between 0.1-1.4°. Tissue organization analysis indicated moderate to good ICCs for inter- and intra-rater reliability.
Notably, Amax and PWP consistently held the highest ICC values (0.77-0.87) across all analyses but with higher variability (<24%) and SEM% (<18%), compared to lower variability (<9%) and SEM% (<8%) in other tissue organization parameters. Interpretation error of all muscle tissue organization parameters showed excellent ICCs (0.96-0.999) with very low variability (≤1%) and SEM% (<2%), except Amax & PWP (TRV%: <6%; SEM%: <7%). Conclusion Our findings demonstrated excellent inter- and intra-rater reliability for MT. However, agreement for PA, FL, and SFA parameters was not as strong. Additionally, MT and all SFA parameters exhibited excellent agreement for inter-rater interpretation error. Therefore, the SFA seems to offer the possibility of objectively and reliably evaluating ultrasound images.
Affiliation(s)
- Melanie Lesinski
  - Division of Training and Movement Sciences, Research Focus Cognition Sciences, University of Potsdam, Potsdam, Germany
- Gregory Bashford
  - Department of Biological Systems Engineering, University of Nebraska, Lincoln, NE, United States
- Adrian Markov
  - Division of Training and Movement Sciences, Research Focus Cognition Sciences, University of Potsdam, Potsdam, Germany
- Lucie Risch
  - Department of Sports Medicine, University Outpatient Clinic, University of Potsdam, Potsdam, Germany
- Michael Cassel
  - Department of Sports Medicine, University Outpatient Clinic, University of Potsdam, Potsdam, Germany
7
Moss J. Measures of Agreement with Multiple Raters: Fréchet Variances and Inference. Psychometrika 2024:10.1007/s11336-023-09945-2. [PMID: 38190018 DOI: 10.1007/s11336-023-09945-2] [Received: 08/30/2022] [Accepted: 12/06/2023] [Indexed: 01/09/2024]
Abstract
Most measures of agreement are chance-corrected. They differ in three dimensions: their definition of chance agreement, their choice of disagreement function, and how they handle multiple raters. Chance agreement is usually defined in a pairwise manner, following either Cohen's kappa or Fleiss's kappa. The disagreement function is usually a nominal, quadratic, or absolute value function. But how to handle multiple raters is contentious, with the main contenders being Fleiss's kappa, Conger's kappa, and Hubert's kappa, the variant of Fleiss's kappa where agreement is said to occur only if every rater agrees. More generally, multi-rater agreement coefficients can be defined in a g-wise way, where the disagreement weighting function uses g raters instead of two. This paper contains two main contributions. (a) We propose using Fréchet variances to handle the case of multiple raters. The Fréchet variances are intuitive disagreement measures and turn out to generalize the nominal, quadratic, and absolute value functions to the case of more than two raters. (b) We derive the limit theory of g-wise weighted agreement coefficients, with chance agreement of the Cohen-type or Fleiss-type, for the case where every item is rated by the same number of raters. Trying out three confidence interval constructions, we end up recommending calculating confidence intervals using the arcsine transform or the Fisher transform.
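The abstract above recommends confidence intervals built with the arcsine or Fisher transform but does not spell out the construction. The sketch below shows one generic way a Fisher-transform interval can be formed by the delta method from a point estimate and its standard error. It assumes the usual `atanh` form of the Fisher transform and a normal approximation; it is not the paper's exact procedure, and the name `fisher_interval` and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_interval(estimate, std_error, level=0.95):
    """Confidence interval for a coefficient in (-1, 1) via the atanh scale."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    center = math.atanh(estimate)
    # d/dx atanh(x) = 1 / (1 - x^2), so the standard error scales by that factor.
    se_transformed = std_error / (1 - estimate ** 2)
    lower = math.tanh(center - z_crit * se_transformed)
    upper = math.tanh(center + z_crit * se_transformed)
    return lower, upper


# Hypothetical agreement coefficient of 0.60 with standard error 0.10.
low, high = fisher_interval(0.60, 0.10)
print(round(low, 3), round(high, 3))
```

One practical appeal of this construction is that the back-transformed limits always stay inside (-1, 1), unlike a plain estimate plus or minus a multiple of the standard error.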
Affiliation(s)
- Jonas Moss
  - Department of Data Science and Analytics, BI Norwegian Business School, Oslo, Norway
8
Kim JH, Kim SK, Choi J, Lee Y. Reliability of ChatGPT for performing triage task in the emergency department using the Korean Triage and Acuity Scale. Digit Health 2024; 10:20552076241227132. [PMID: 38250148 PMCID: PMC10798071 DOI: 10.1177/20552076241227132] [Received: 08/14/2023] [Accepted: 12/28/2023] [Indexed: 01/23/2024] [Open access]
Abstract
Background Artificial intelligence (AI) technology can enable more efficient decision-making in healthcare settings. There is a growing interest in improving the speed and accuracy of AI systems in providing responses for given tasks in healthcare settings. Objective This study aimed to assess the reliability of ChatGPT in determining emergency department (ED) triage accuracy using the Korean Triage and Acuity Scale (KTAS). Methods Two hundred and two virtual patient cases were built. The gold standard triage classification for each case was established by an experienced ED physician. Three other human raters (ED paramedics) were involved and rated the virtual cases individually. The virtual cases were also rated by two different versions of the chat generative pre-trained transformer (ChatGPT, 3.5 and 4.0). Inter-rater reliability was examined using Fleiss' kappa and intra-class correlation coefficient (ICC). Results The kappa values for the agreement between the four human raters and ChatGPTs were .523 (version 4.0) and .320 (version 3.5). Of the five levels, the performance was poor when rating patients at levels 1 and 5, as well as case scenarios with additional text descriptions. There were differences in the accuracy of the different versions of GPTs. The ICC between version 3.5 and the gold standard was .520, and that between version 4.0 and the gold standard was .802. Conclusions A substantial level of inter-rater reliability was revealed when GPTs were used as KTAS raters. The current study showed the potential of using GPT in emergency healthcare settings. Considering the shortage of experienced manpower, this AI method may help improve triaging accuracy.
Affiliation(s)
- Jae Hyuk Kim
  - Department of Emergency Medicine, Mokpo Hankook Hospital, Jeonnam, South Korea
- Sun Kyung Kim
  - Department of Nursing, Mokpo National University, Jeonnam, South Korea
  - Department of Biomedicine, Health & Life Convergence Sciences, Biomedical and Healthcare Research Institute, Jeonnam, South Korea
- Jongmyung Choi
  - Department of Computer Engineering, Mokpo National University, Jeonnam, South Korea
- Youngho Lee
  - Department of Computer Engineering, Mokpo National University, Jeonnam, South Korea
9
Cortese MD, Arcuri F, Vatrano M, Pioggia G, Cerasa A, Raso MG, Tonin P, Riganello F. Wessex Head Injury Matrix in Patients with Prolonged Disorders of Consciousness: A Reliability Study. Biomedicines 2023; 12:82. [PMID: 38255189 PMCID: PMC10813453 DOI: 10.3390/biomedicines12010082] [Received: 12/04/2023] [Revised: 12/22/2023] [Accepted: 12/27/2023] [Indexed: 01/24/2024] [Open access]
Abstract
Introduction: The Wessex Head Injury Matrix (WHIM) was developed to assess patients with disorders of consciousness (DOC) and was tested in terms of inter-rater reliability (IRR) and test-retest reliability (TRR) in the year 2000. The American Congress of Rehabilitation Medicine reported that IRR and TRR were unproven. We aim to assess the reliability of the WHIM in prolonged DOC patients (PDOC). Methods: A total of 51 PDOC patients (32 unresponsive wakefulness syndrome (UWS/VS) and 19 minimally conscious state (MCS)) who were hosted in a dedicated unit for long-term brain injury care were enrolled. The time from injury ranged from 182 to 3325 days. Two raters administered the Coma Recovery Scale-Revised (CRS-R) and the WHIM to test the IRR and TRR. The TRR was administered two weeks after the first assessment. Results: For the CRS-R, the agreement in IRR and TRR was perfect between the two raters. The agreement for the WHIM ranged from substantial to almost perfect for IRR and from fair to substantial for the TRR. Conclusions: The WHIM showed a strong IRR when administered by expert raters and strongly correlated with the CRS-R. This study provides further evidence of the psychometric qualities of the WHIM and the importance of its use in PDOC patients.
Affiliation(s)
- Maria Daniela Cortese
  - Research in Advanced Neurorehabilitation, S. Anna Institute, Via Siris, 11, 88900 Crotone, Italy
- Francesco Arcuri
  - Research in Advanced Neurorehabilitation, S. Anna Institute, Via Siris, 11, 88900 Crotone, Italy
- Martina Vatrano
  - Research in Advanced Neurorehabilitation, S. Anna Institute, Via Siris, 11, 88900 Crotone, Italy
- Giovanni Pioggia
  - Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), 98100 Messina, Italy
- Antonio Cerasa
  - Research in Advanced Neurorehabilitation, S. Anna Institute, Via Siris, 11, 88900 Crotone, Italy
  - Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), 98100 Messina, Italy
- Maria Girolama Raso
  - Research in Advanced Neurorehabilitation, S. Anna Institute, Via Siris, 11, 88900 Crotone, Italy
- Paolo Tonin
  - Research in Advanced Neurorehabilitation, S. Anna Institute, Via Siris, 11, 88900 Crotone, Italy
- Francesco Riganello
  - Research in Advanced Neurorehabilitation, S. Anna Institute, Via Siris, 11, 88900 Crotone, Italy
10
Wood D, Reid M, Elliot B, Alderson J, Mian A. The expert eye? An inter-rater comparison of elite tennis serve kinematics and performance. J Sports Sci 2023; 41:1779-1786. [PMID: 38155177 DOI: 10.1080/02640414.2023.2298102] [Received: 08/17/2023] [Accepted: 12/13/2023] [Indexed: 12/30/2023]
Abstract
This study examined the reliability of expert tennis coaches/biomechanists to qualitatively assess selected features of the serve with the aid of two-dimensional (2D) video replays. Two expert high-performance coaches rated the serves of 150 male and 150 female players across three different age groups from two different camera viewing angles. Serve performance was rated across 13 variables that represented commonly investigated and coached (serve) mechanics using a 1-7 Likert rating scale. A total of 7800 ratings were performed. The reliability of the experts' ratings was assessed using Krippendorff's alpha. Strong agreement was shown across all age groups and genders when the experts rated the overall serve score (0.727-0.924), power or speed of the serve (0.720-0.907), rhythm (0.744-0.944), quality of the trunk action (0.775-1.000), leg drive (0.731-0.959) and the likelihood of back injury (0.703-0.934). They encountered greater difficulty in consistently rating shoulder internal rotation speed (0.688-0.717). In high-performance settings, the desire for highly precise measurement and large data sets powered by new technologies is commonplace, but this study revealed that tennis experts, through the use of 2D video, can reliably rate important mechanical features of the game's most important shot, the serve.
Affiliation(s)
- Dylan Wood
  - University of Western Australia & Tennis Australia, Perth, Australia
- Machar Reid
  - University of Western Australia & Tennis Australia, Perth, Australia
- Bruce Elliot
  - School of Human Sciences, University of Western Australia, Perth, Australia
- Ajmal Mian
  - School of Mathematics and Computer Science, University of Western Australia, Perth, Australia
11
Olvet DM, Bird JB, Fulton TB, Kruidering M, Papp KK, Qua K, Willey JM, Brenner JM. A Multi-institutional Study of the Feasibility and Reliability of the Implementation of Constructed Response Exam Questions. Teach Learn Med 2023; 35:609-622. [PMID: 35989668 DOI: 10.1080/10401334.2022.2111571] [Received: 09/14/2021] [Accepted: 07/27/2022] [Indexed: 06/15/2023]
Abstract
PROBLEM Some medical schools have incorporated constructed response short answer questions (CR-SAQs) into their assessment toolkits. Although CR-SAQs carry benefits for medical students and educators, the faculty perception that the amount of time required to create and score CR-SAQs is not feasible and concerns about reliable scoring may impede the use of this assessment type in medical education. INTERVENTION Three US medical schools collaborated to write and score CR-SAQs based on a single vignette. Study participants included faculty question writers (N = 5) and three groups of scorers: faculty content experts (N = 7), faculty non-content experts (N = 6), and fourth-year medical students (N = 7). Structured interviews were performed with question writers and an online survey was administered to scorers to gather information about their process for creating and scoring CR-SAQs. A content analysis was performed on the qualitative data using Bowen's model of feasibility as a framework. To examine inter-rater reliability between the content expert and other scorers, a random selection of fifty student responses from each site were scored by each site's faculty content experts, faculty non-content experts, and student scorers. A holistic rubric (6-point Likert scale) was used by two schools and an analytic rubric (3-4 point checklist) was used by one school. Cohen's weighted kappa (κw) was used to evaluate inter-rater reliability. CONTEXT This research study was implemented at three US medical schools that are nationally dispersed and have been administering CR-SAQ summative exams as part of their programs of assessment for at least five years. The study exam question was included in an end-of-course summative exam during the first year of medical school. IMPACT Five question writers (100%) participated in the interviews and twelve scorers (60% response rate) completed the survey. 
Qualitative comments revealed three aspects of feasibility: practicality (time, institutional culture, teamwork), implementation (steps in the question writing and scoring process), and adaptation (feedback, rubric adjustment, continuous quality improvement). The scorers described their experience in terms of the need for outside resources, concern about lack of expertise, and value gained through scoring. Inter-rater reliability between the faculty content expert and student scorers was fair/moderate (κw=.34-.53, holistic rubrics) or substantial (κw=.67-.76, analytic rubric), but much lower between faculty content and non-content experts (κw=.18-.29, holistic rubrics; κw=.59-.66, analytic rubric). LESSONS LEARNED Our findings show that from the faculty perspective it is feasible to include CR-SAQs in summative exams and we provide practical information for medical educators creating and scoring CR-SAQs. We also learned that CR-SAQs can be reliably scored by faculty without content expertise or senior medical students using an analytic rubric, or by senior medical students using a holistic rubric, which provides options to alleviate the faculty burden associated with grading CR-SAQs.
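The abstract above reports Cohen's weighted kappa (κw) for pairs of scorers on ordered rubric scales. As a rough illustration of what that statistic measures, the sketch below computes it for two raters using disagreement weights. The abstract does not state which weighting scheme was used, so the linear and quadratic options here are assumptions, and the function name and rubric scores are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordered scale.

    categories: the ordered scale levels (at least two).
    weights: "linear" or "quadratic" disagreement weights (assumed options).
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (rater_a level, rater_b level).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed_dis = sum(disagreement(i, j) * observed[i][j] for i, j in cells)
    expected_dis = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    return 1 - observed_dis / expected_dis


# Hypothetical 6-point rubric scores from a content expert and a student scorer.
expert = [6, 5, 4, 4, 3, 2, 5, 1]
student = [5, 5, 4, 3, 3, 2, 6, 2]
scale = list(range(1, 7))
print(round(weighted_kappa(expert, student, scale), 3))
print(round(weighted_kappa(expert, student, scale, weights="quadratic"), 3))
```

Unlike unweighted kappa, near misses (a 5 against a 6) cost less than distant ones, which is why a weighted form is a natural fit for Likert-style rubrics.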
Affiliation(s)
- Doreen M Olvet
  - Department of Science Education, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, USA
- Jeffrey B Bird
  - Department of Science Education, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, USA
- Tracy B Fulton
  - Department of Biochemistry and Biophysics, University of California San Francisco School of Medicine, San Francisco, California, USA
- Marieke Kruidering
  - Department of Cellular & Molecular Pharmacology, University of California at San Francisco School of Medicine, San Francisco, California, USA
- Klara K Papp
  - Center for Medical Education, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Kelli Qua
  - Research and Evaluation, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Joanne M Willey
  - Department of Science Education, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, USA
- Judith M Brenner
  - Department of Science Education, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, USA
12
|
Capinha M, Rijo D, Matos M, Pereira M. Interpartner Agreement on Intimate Partner Violence Reports: Evidence From a Community Sample of Different-Sex Couples. Assessment 2023:10731911231196483. [PMID: 37732644 DOI: 10.1177/10731911231196483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023]
Abstract
An accurate assessment of intimate partner violence (IPV) is crucial to guide public policy and intervention. The Revised Conflict Tactics Scales (CTS-2) is one of the most widely used instruments to do so. Despite its good psychometric properties, research on interpartner agreement has pointed to low-to-moderate estimates, which has raised concerns about the validity of the results obtained through single-partner reports. This cross-sectional study introduces indexes that have not previously been used to assess interpartner agreement. Both partners' reports on perpetration and victimization were analyzed in a community sample of 268 different-sex couples. Our results generally pointed to better agreement levels on IPV occurrence than frequency, suggesting that the proxy method (i.e., using a single-partner report) could be a reliable method for assessing IPV occurrence but not its frequency in this population. The findings are discussed, along with the advantages and constraints of different IPV assessment practices.
Affiliation(s)
- Marta Capinha
  - Faculty of Psychology and Educational Sciences, Center for Research in Neuropsychology and Cognitive and Behavioral Intervention, University of Coimbra, Coimbra, Portugal
- Daniel Rijo
  - Faculty of Psychology and Educational Sciences, Center for Research in Neuropsychology and Cognitive and Behavioral Intervention, University of Coimbra, Coimbra, Portugal
- Marlene Matos
  - Psychology Research Center, School of Psychology, University of Minho, Braga, Portugal
- Marco Pereira
  - Faculty of Psychology and Educational Sciences, Center for Research in Neuropsychology and Cognitive and Behavioral Intervention, University of Coimbra, Coimbra, Portugal
13
Quinton K, Guy-Frank CJ, Syed S, Klugh JM, Dhanani NH, Adibi SS, Kao LS. Poor Oral Health in Trauma Intensive Care Unit Patients: Application of a Novel Oral Health Score. Surg Infect (Larchmt) 2023; 24:657-662. [PMID: 37695683 DOI: 10.1089/sur.2023.171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/13/2023]
Abstract
Background: Although oral hygiene in patients in the intensive care unit (ICU) has been shown to reduce hospital-associated infections, baseline and progressive oral health are often not reported because of the lack of a standardized tool. The Oral Health Risk Assessment Value Index (OHRAVI) is a comprehensive oral assessment validated by dental providers. This study hypothesizes that non-dental providers can use OHRAVI in trauma ICU patients with minimal training and acceptable inter-rater reliability (IRR). Patients and Methods: Dentulous adult patients in the ICU at a level 1 trauma center were scored, excluding those with severe orofacial trauma. Each of the eight OHRAVI categories was scored 0 to 3 (best to worst), giving a summed total score and an index (average) score. An index score of 1 or less indicates a need for routine oral care; a score greater than 1 and up to 2, moderate care; and a score greater than 2 and up to 3, extensive oromaxillofacial care. Inter-rater reliability was assessed by two to three raters with Krippendorff's α (≥0.80 for good and ≥0.667 for acceptable). Results: Eighty-four ratings were completed across 34 patients, with 16 patients (47%) scored by all three raters. Ten patients (29%) had an index score <1. The average index score for patients was 1.28 (median, 1.34; range, 0.63-2). Krippendorff's α for index score was 0.86. For individual categories, α ranged from 0.44 to 1, with six of the eight categories achieving an α ≥ 0.667. Conclusions: With minimal training, non-dental providers were able to use OHRAVI with a good IRR for index score and an acceptable/good IRR for most individual categories. This novel, simple, comprehensive oral health score could help standardize oral assessment and facilitate future studies of peri-operative oral hygiene interventions.
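The scoring arithmetic described in this abstract can be sketched as follows. This is only an illustration built from the thresholds stated above; the function name `ohravi_index` and the care-level labels are hypothetical, and the category names are not given in the abstract, so scores are passed as a plain list.

```python
def ohravi_index(category_scores):
    """Return (total, index, care_level) for eight OHRAVI category scores.

    Each category is scored 0 to 3 (best to worst), as stated in the abstract.
    The labels "routine", "moderate", "extensive" are illustrative names.
    """
    if len(category_scores) != 8:
        raise ValueError("OHRAVI uses eight categories")
    if any(not 0 <= s <= 3 for s in category_scores):
        raise ValueError("each category score must lie between 0 and 3")

    total = sum(category_scores)
    index = total / 8  # index score is the average over the eight categories

    # Thresholds as given in the abstract: <=1, >1 to 2, >2 to 3
    if index <= 1:
        care_level = "routine"
    elif index <= 2:
        care_level = "moderate"
    else:
        care_level = "extensive"
    return total, index, care_level
```

A patient scored `[2, 2, 2, 2, 1, 1, 1, 2]` would, under these assumptions, fall in the moderate-care band.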
Affiliation(s)
- Kayli Quinton
  - Department of Surgery, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas, USA
- Chelsea J Guy-Frank
  - Department of Surgery, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas, USA
  - Center for Surgical Trials and Evidence-Based Practice, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas, USA
- Sophia Syed
  - Department of Surgery, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas, USA
- James M Klugh
  - Department of Surgery, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas, USA
  - Center for Surgical Trials and Evidence-Based Practice, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas, USA
- Naila H Dhanani
  - Department of Surgery, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas, USA
  - Center for Surgical Trials and Evidence-Based Practice, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas, USA
- Shawn S Adibi
  - UTHealth Houston, School of Dentistry, Houston, Texas, USA
- Lillian S Kao
  - Department of Surgery, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas, USA
  - Center for Surgical Trials and Evidence-Based Practice, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas, USA
  - Center for Translational Injury Research, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas, USA
  - Center for Clinical Research and Evidence-Based Medicine, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas, USA
14
Dolotova DD, Blagosklonova ER, Muslimov RS, Ramazanov GR, Zagryazkina TA, Stepanov VN, Gavrilov AV. Inter-Rater Reliability of Collateral Status Assessment Based on CT Angiography: A Retrospective Study of Middle Cerebral Artery Ischaemic Stroke. J Clin Med 2023; 12:5470. [PMID: 37685536 PMCID: PMC10487547 DOI: 10.3390/jcm12175470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 08/12/2023] [Accepted: 08/15/2023] [Indexed: 09/10/2023]
Abstract
The importance of assessing the collateral status (CS) in patients with ischaemic stroke (IS) has repeatedly been emphasised in clinical guidelines. Various publications offer qualitative or semiquantitative scales with gradations corresponding to the different extents of the collaterals, visualised mostly on the basis of CTA images. However, information on their inter-rater reliability is limited. Therefore, the aim of this study is to investigate the inter-rater reliability of the scales for collateral assessment. CTA images of 158 patients in the acute period of IS were used in the study. The assessment of CS was performed by two experts using three methodologies: the modified Tan scale, the Miteff scale, and the Rosenthal scale. Cohen's kappa, weighted kappa and Krippendorff's alpha were used as reliability measures. For the modified Tan scale and the Miteff and Rosenthal scales, the weighted kappa values were 0.72, 0.49 and 0.59, respectively. Although the best measure of consistency was found for the modified Tan scale, no statistically significant differences were revealed among the scales. The impact of the CS on the degree of neurological deficit at discharge was shown for the modified Tan and Rosenthal scales. In conclusion, the analysis showed a moderate inter-rater reliability of the three scales, but was not able to distinguish the best one among them.
Affiliation(s)
- Daria D. Dolotova
  - Department of Bioinformatics, Department of Pediatric Surgery, Pirogov Russian National Research Medical University, Russian Ministry of Health, 117997 Moscow, Russia
  - Research Department, Gammamed-Soft, Ltd., 127473 Moscow, Russia
- Rustam Sh. Muslimov
  - Department of Radiology, Scientific Department of Emergency Neurology and Rehabilitation Treatment, N.V. Sklifosovsky Research Institute for Emergency Medicine, Moscow Health Department, 129090 Moscow, Russia
- Ganipa R. Ramazanov
  - Department of Radiology, Scientific Department of Emergency Neurology and Rehabilitation Treatment, N.V. Sklifosovsky Research Institute for Emergency Medicine, Moscow Health Department, 129090 Moscow, Russia
- Valentin N. Stepanov
  - Department of Radiology, Scientific Department of Emergency Neurology and Rehabilitation Treatment, N.V. Sklifosovsky Research Institute for Emergency Medicine, Moscow Health Department, 129090 Moscow, Russia
- Andrey V. Gavrilov
  - Research Department, Gammamed-Soft, Ltd., 127473 Moscow, Russia
  - Scobeltsyn Nuclear Physics Research Institute, Lomonosov Moscow State University, 119991 Moscow, Russia
15
Mickenautsch S, Rupf S, Miletić I, Strähle UT, Sturm R, Kimmie-Dhansay F, Vidosusić K, Yengopal V. Inter-rater reliability of the extended Composite Quality Score (CQS-2). Front Med (Lausanne) 2023; 10:1201517. [PMID: 37663665 PMCID: PMC10469905 DOI: 10.3389/fmed.2023.1201517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 08/07/2023] [Indexed: 09/05/2023]
Abstract
Aim To establish the inter-rater reliability of the Composite Quality Score (CQS-2) and to test the null hypothesis that it did not differ significantly from that of the first CQS version (CQS-1). Materials and methods Four independent raters were selected to rate 45 clinical trial reports using CQS-1 and CQS-2. The raters remained unaware of each other's participation in this study until all ratings had been completed. Each rater received only one rating template at a time in a random sequence for CQS-1 and CQS-2 rating. Raters completed each template and sent it back to the principal investigator. Each rater received their next template 2 weeks after submission of the completed previous template. The inter-rater reliabilities for the overall appraisal score of the CQS-1 and the CQS-2 were established by using the Brennan-Prediger coefficient (BPC). The coefficients of both CQS versions were compared by using the two-sample z-test. During secondary analysis, the BPCs for every criterion and each corroboration level for both CQS versions were established. Results The BPC for the CQS-1 was 0.85 (95% CI: 0.64-1.00) and for the CQS-2 it was 1.00 (95% CI: 0.94-1.00), suggesting a very high inter-rater reliability for both. The difference between the two CQS versions was not statistically significant (p = 0.17). The null hypothesis was accepted. Conclusion The CQS-2 is still under development. This study shows that it is associated with a very high inter-rater reliability, which did not differ statistically significantly from that of the CQS-1. The promising results of this study warrant further investigation of the applicability of the CQS-2 as an appraisal tool for prospective controlled clinical therapy trials.
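For readers unfamiliar with the Brennan-Prediger coefficient, the sketch below shows the usual unweighted multi-rater form: observed pairwise agreement corrected by a fixed chance term of 1/q, where q is the number of rating categories. It is an illustration only; the abstract does not describe the software, weighting, or confidence-interval method the authors used, and the function name `brennan_prediger` is hypothetical.

```python
from collections import Counter


def brennan_prediger(ratings, n_categories):
    """Unweighted Brennan-Prediger coefficient for several raters.

    `ratings` is a list of items; each item is the list of categories
    assigned to it by the raters (at least two ratings per item).
    Chance agreement is fixed at 1 / n_categories.
    """
    if not ratings or n_categories < 2:
        raise ValueError("need at least one item and two categories")

    per_item_agreement = []
    for item in ratings:
        r = len(item)
        if r < 2:
            raise ValueError("each item needs ratings from at least two raters")
        counts = Counter(item)
        # Proportion of rater pairs that agree on this item
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_item_agreement.append(agreeing_pairs / (r * (r - 1)))

    observed = sum(per_item_agreement) / len(per_item_agreement)
    chance = 1.0 / n_categories
    return (observed - chance) / (1.0 - chance)
```

With four raters who all assign the same category to every report, the function returns 1.0, which corresponds to the situation reported for the CQS-2.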
Affiliation(s)
- Steffen Mickenautsch
  - Faculty of Dentistry, University of the Western Cape, Bellville, South Africa
  - Department of Community Dentistry, Faculty of Health Sciences, School of Oral Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
  - Review Centre for Health Science Research, Johannesburg, South Africa
- Stefan Rupf
  - Synoptic Dentistry, Saarland University, Homburg, Germany
- Ivana Miletić
  - Department of Endodontics and Restorative Dentistry, School of Dental Medicine, University of Zagreb, Zagreb, Croatia
- Richard Sturm
  - Department of Operative, Preventive and Paediatric Dentistry, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Faheema Kimmie-Dhansay
  - Department of Community Oral Health, Faculty of Dentistry, University of the Western Cape, Bellville, South Africa
- Kata Vidosusić
  - Department of Endodontics and Restorative Dentistry, School of Dental Medicine, University of Zagreb, Zagreb, Croatia
- Veerasamy Yengopal
  - Faculty of Dentistry, University of the Western Cape, Bellville, South Africa
16
Frącz W, Matuska J, Szyszka J, Dobrakowski P, Szopka W, Skorupska E. The Cross-Sectional Area Assessment of Pelvic Muscles Using the MRI Manual Segmentation among Patients with Low Back Pain and Healthy Subjects. J Imaging 2023; 9:155. [PMID: 37623687 PMCID: PMC10455268 DOI: 10.3390/jimaging9080155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Revised: 07/25/2023] [Accepted: 07/28/2023] [Indexed: 08/26/2023]
Abstract
The pain pathomechanism of chronic low back pain (LBP) is complex and the available diagnostic methods are insufficient. Patients present morphological changes in the volume and cross-sectional area (CSA) of the lumbosacral region. The main objective of this study was to assess whether CSA measurements of the pelvic muscles indicate muscle atrophy between the asymptomatic and symptomatic sides in chronic LBP patients, as well as between the right and left sides in healthy volunteers. In addition, inter-rater reliability for CSA measurements was examined. The study involved 71 chronic LBP patients and 29 healthy volunteers. The CSA of the gluteus maximus, medius, minimus and piriformis was measured using the MRI manual segmentation method. Muscle atrophy was confirmed in the gluteus maximus, gluteus minimus and piriformis muscles for over 50% of chronic LBP patients (p < 0.05). Gluteus medius showed atrophy in patients with left-side pain (p < 0.001). Muscle atrophy occurred on the symptomatic side for all inspected muscles, except gluteus maximus in rater one's assessment. The reliability of CSA measurements between raters, calculated using CCC and ICC, showed great inter-rater reproducibility for each muscle in both patients and healthy volunteers (p < 0.95). Therefore, CSA assessment could be used in the diagnosis of patients with symptoms of chronic LBP.
Affiliation(s)
- Wiktoria Frącz
  - Faculty of Biomedical Sciences, Medical University of Lodz, Al. Kosciuszki 4, 90-419 Lodz, Poland
- Jakub Matuska
  - Department of Physiotherapy, Poznan University of Medical Sciences, ul. 28 czerwca 1956r. nr 135/147, 61-545 Poznan, Poland
  - Doctoral School, Poznan University of Medical Sciences, Bukowska 70, 60-812 Poznań, Poland
  - Doctoral School, Rovira I Virgili University, Carrer St. Llorenç No. 21, 43201 Reus, Spain
- Jarosław Szyszka
  - Opole Rehabilitation Centre in Korfantów, Wyzwolenia 11, 48-317 Korfantów, Poland
- Paweł Dobrakowski
  - Psychology Institute, Humanitas University in Sosnowiec, 41-200 Sosnowiec, Poland
- Wiktoria Szopka
  - Faculty of Veterinary Medicine and Animal Science, Poznan University of Life Sciences, 60-637 Poznań, Poland
- Elżbieta Skorupska
  - Department of Physiotherapy, Poznan University of Medical Sciences, ul. 28 czerwca 1956r. nr 135/147, 61-545 Poznan, Poland
17
Ferenstein M, Ostrzyżek-Przeździecka K, Gąsior JS, Werner B. Inter-Rater Reliability of the Polish Version of the Alberta Infant Motor Scale in Children with Heart Disease. J Clin Med 2023; 12:4555. [PMID: 37445590 DOI: 10.3390/jcm12134555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 07/04/2023] [Accepted: 07/05/2023] [Indexed: 07/15/2023]
Abstract
There is an urgent need for the systematic monitoring of motor and cognitive neurodevelopment and the evaluation of motor skill development in infants and children with heart disease. Familiarizing students and early graduates with the developmental care needed by these patients may help in the system-wide implementation of early motor screening in this population. The purpose of this study was to investigate the agreement between a last-year physiotherapy student and an experienced pediatric physiotherapist when applying the Polish version of the Alberta Infant Motor Scale (AIMS) to a heterogeneous group of children with congenital heart defects. Agreement between raters was verified based on the observation of 80 (38 females) patients with heart disease aged 1-18 months using a Bland-Altman plot with limits of agreement and an intraclass correlation coefficient. The bias between raters for the total score for four age groups (0-3 months, 4-7 months, 8-11 months and 12-18 months) was between -0.17 and 0.22 (range: -0.54-0.78), and the ICC was between 0.875 and 1.000. Thus, a reliable assessment of motor development or motor skills using the Polish version of the AIMS can be performed in pediatric patients with heart defects by clinically inexperienced last-year physiotherapy students who are familiarized with the AIMS manual.
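The Bland-Altman quantities mentioned above (bias and limits of agreement) follow a simple recipe, sketched here under the conventional assumption of 95% limits at bias ± 1.96 standard deviations of the paired differences. The abstract does not state the multiplier or software used; the function name `bland_altman` is hypothetical.

```python
from statistics import mean, stdev


def bland_altman(scores_rater_1, scores_rater_2):
    """Return (bias, lower_limit, upper_limit) for paired scores.

    Bias is the mean difference (rater 1 minus rater 2); limits of
    agreement are bias +/- 1.96 * SD of the differences (assumed 95% limits).
    """
    if len(scores_rater_1) != len(scores_rater_2) or len(scores_rater_1) < 2:
        raise ValueError("need at least two paired scores")

    differences = [a - b for a, b in zip(scores_rater_1, scores_rater_2)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)  # sample SD of the paired differences
    return bias, bias - spread, bias + spread
```

A bias close to zero with narrow limits, as reported for the AIMS total score, indicates that the two raters neither systematically over-score nor under-score relative to each other.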
Affiliation(s)
- Maria Ferenstein
  - Department of Pediatric Cardiology and General Pediatrics, Medical University of Warsaw, 02-091 Warsaw, Poland
- Jakub S Gąsior
  - Department of Pediatric Cardiology and General Pediatrics, Medical University of Warsaw, 02-091 Warsaw, Poland
- Bożena Werner
  - Department of Pediatric Cardiology and General Pediatrics, Medical University of Warsaw, 02-091 Warsaw, Poland
18
López-Ruiz J, Estrada-Barranco C, Giménez-Mestre MJ, Villarroya-Mateos I, Martín-Casas P, López-de-Uralde-Villanueva I. Differences between Novice and Expert Raters Assessing Trunk Control Using the Trunk Control Measurement Scale Spanish Version (TCMS-S) in Children with Cerebral Palsy. J Clin Med 2023; 12:3568. [PMID: 37240674 DOI: 10.3390/jcm12103568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 05/09/2023] [Accepted: 05/18/2023] [Indexed: 05/28/2023]
Abstract
The Trunk Control Measurement Scale (TCMS) is a valid and reliable tool to assess static and dynamic trunk control in cerebral palsy (CP). However, there is no evidence on differences between novice and expert raters. A cross-sectional study was conducted with participants between the ages of 6 and 18 years with a CP diagnosis. The TCMS Spanish version (TCMS-S) was administered in person by an expert rater, and video recordings were taken for later scoring by the expert and three other raters with varying levels of clinical experience. The intraclass correlation coefficient (ICC) was used to evaluate reliability between raters for the total and subscale scores of the TCMS-S. Standard Error of Measurement (SEM) and Minimal Detectable Change (MDC) were also calculated. There was a high level of agreement between expert raters (ICC ≥ 0.93), while novice raters demonstrated good agreement (ICC > 0.72). Additionally, novice raters had a slightly higher SEM and MDC than expert raters. The Selective Movement Control subscale exhibited slightly higher SEM and MDC values compared to the TCMS-S total and other subscales, irrespective of the rater's level of expertise. Overall, the study showed that the TCMS-S is a reliable tool for evaluating trunk control in the Spanish pediatric population with cerebral palsy, regardless of the rater's experience level.
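SEM and MDC are usually derived from the ICC and the score standard deviation. The sketch below uses the commonly cited formulas SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM; the abstract does not state which formulas or confidence level the authors applied, so these are assumptions, and the function name `sem_and_mdc95` is hypothetical.

```python
import math


def sem_and_mdc95(score_sd, icc):
    """Return (SEM, MDC95) from a score standard deviation and an ICC.

    Assumes SEM = SD * sqrt(1 - ICC) and MDC95 = 1.96 * sqrt(2) * SEM,
    which are common conventions, not formulas quoted from the study.
    """
    if score_sd < 0:
        raise ValueError("standard deviation cannot be negative")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1")

    sem = score_sd * math.sqrt(1.0 - icc)
    mdc95 = 1.96 * math.sqrt(2.0) * sem
    return sem, mdc95
```

Because SEM shrinks as the ICC rises, the lower ICC of novice raters is consistent with the slightly higher SEM and MDC the abstract reports for them.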
Affiliation(s)
- Javier López-Ruiz
  - Department of Physiotherapy, Faculty of Sport Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670 Madrid, Spain
  - Doctoral Program in Healthcare, Faculty of Nursing, Physiotherapy and Podiatry, University Complutense of Madrid, 28040 Madrid, Spain
  - Department of Radiology, Rehabilitation and Physiotherapy, Faculty of Nursing, Physiotherapy and Podiatry, Universidad Complutense de Madrid, 28040 Madrid, Spain
- Cecilia Estrada-Barranco
  - Department of Physiotherapy, Faculty of Sport Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670 Madrid, Spain
- Maria José Giménez-Mestre
  - Department of Physiotherapy, Faculty of Sport Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670 Madrid, Spain
- Patricia Martín-Casas
  - Department of Radiology, Rehabilitation and Physiotherapy, Faculty of Nursing, Physiotherapy and Podiatry, Universidad Complutense de Madrid, 28040 Madrid, Spain
  - InPhysio Research Group, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), 28040 Madrid, Spain
- Ibai López-de-Uralde-Villanueva
  - Department of Radiology, Rehabilitation and Physiotherapy, Faculty of Nursing, Physiotherapy and Podiatry, Universidad Complutense de Madrid, 28040 Madrid, Spain
  - InPhysio Research Group, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), 28040 Madrid, Spain
19
Jonsdottir G, Haraldsdottir E, Sigurdardottir V, Thoroddsen A, Vilhjalmsson R, Tryggvadottir GB, Jonsdottir H. Developing and testing inter-rater reliability of a data collection tool for patient health records on end-of-life care of neurological patients in an acute hospital ward. Nurs Open 2023. [PMID: 37141442 DOI: 10.1002/nop2.1789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2021] [Revised: 11/21/2022] [Accepted: 04/16/2023] [Indexed: 05/06/2023]
Abstract
AIM To develop and test a data collection tool, the Neurological End-Of-Life Care Assessment Tool (NEOLCAT), for extracting data from patient health records (PHRs) on end-of-life care of neurological patients in an acute hospital ward. DESIGN Instrument development and inter-rater reliability (IRR) assessment. METHOD NEOLCAT was constructed from patient care items obtained from clinical guidelines and literature on end-of-life care. Expert clinicians reviewed the items. Using percentage agreement and Fleiss' kappa, we calculated IRR on 32 nominal items out of 76 items. RESULTS IRR of NEOLCAT showed 89% (range 83%-95%) overall categorical percentage agreement. The Fleiss' kappa categorical coefficient was 0.84 (range 0.71-0.91). There was fair or moderate agreement on six items, and moderate or almost perfect agreement on 26 items. CONCLUSION The NEOLCAT shows promising psychometric properties for studying clinical components of care of neurological patients at the end of life on an acute hospital ward but could be further developed in future studies.
Affiliation(s)
- Gudrun Jonsdottir
  - Faculty of Nursing and Midwifery, School of Health Sciences, University of Iceland, Reykjavik, Iceland
  - Landspitali, The National University Hospital of Iceland, Reykjavik, Iceland
- Asta Thoroddsen
  - Faculty of Nursing and Midwifery, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Runar Vilhjalmsson
  - Faculty of Nursing and Midwifery, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Helga Jonsdottir
  - Faculty of Nursing and Midwifery, School of Health Sciences, University of Iceland, Reykjavik, Iceland
20
Jang JS, Kim JI, Ku B, Lee JH. Reliability Analysis of Vertebral Landmark Labelling on Lumbar Spine X-ray Images. Diagnostics (Basel) 2023; 13:1411. [PMID: 37189512 DOI: 10.3390/diagnostics13081411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 03/28/2023] [Accepted: 04/12/2023] [Indexed: 05/17/2023]
Abstract
Vertebral landmark labelling on X-ray images is important for objective and quantitative diagnosis. Most studies related to the reliability of labelling focus on the Cobb angle, and it is difficult to find studies describing landmark point locations. Since points are the most fundamental geometric feature that can generate lines and angles, the assessment of landmark point locations is essential. The aim of this study is to provide a reliability analysis of landmark points and vertebral endplate lines with a large number of lumbar spine X-ray images. A total of 1000 pairs of anteroposterior and lateral view lumbar spine images were prepared, and 12 manual medicine experts participated in the labelling process as raters. A standard operating procedure (SOP) was proposed by consensus of the raters based on manual medicine and provided guidelines for reducing sources of error in landmark labelling. High intraclass correlation coefficients ranging from 0.934 to 0.991 verified the reliability of the labelling process using the proposed SOP. We also presented means and standard deviations of measurement errors, which could be a valuable reference for evaluating both automated landmark detection algorithms and manual labelling by experts.
Affiliation(s)
- Jun-Su Jang
  - Digital Health Research Division, Korea Institute of Oriental Medicine, 1672 Yuseong-daero, Yuseong-gu, Daejeon 34054, Republic of Korea
- Joong Il Kim
  - Digital Health Research Division, Korea Institute of Oriental Medicine, 1672 Yuseong-daero, Yuseong-gu, Daejeon 34054, Republic of Korea
- Boncho Ku
  - Digital Health Research Division, Korea Institute of Oriental Medicine, 1672 Yuseong-daero, Yuseong-gu, Daejeon 34054, Republic of Korea
- Jin-Hyun Lee
  - Institute for Integrative Medicine, Catholic Kwandong University International St. Mary's Hospital, 25 Simgok-ro 100 beon-gil, Seo-gu, Incheon 22711, Republic of Korea
21
Yee KOK, Yoon CK, Seman Z, Hong CK, Misron SNF, Lim CH. Ictal Electroencephalogram Visual Pattern Recognition of Seizure Adequacy During Electroconvulsive Therapy Treatment: A Step-by-Step Approach. Malays J Med Sci 2023; 30:83-89. [PMID: 37102040 PMCID: PMC10125233 DOI: 10.21315/mjms2023.30.2.7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Accepted: 09/09/2022] [Indexed: 04/28/2023]
Abstract
Background The NEURON (Neuropsychiatry and Neuromodulation Unit) electroconvulsive therapy electroencephalogram (ECT-EEG) Algorithmic Rating Scale (NEARS) is a step-by-step approach to ictal electroencephalogram visual pattern recognition of seizure adequacy based on recruitment, amplitude, symmetry, duration and degree of post-ictal suppression. The objectives of this clinical audit were to determine the degree of agreement on the NEARS operational criteria between two neuropsychiatrists, the reliability of electroconvulsive therapy practitioners' administration of NEARS during ECT procedures and the correlation of NEARS scores with Clinical Global Impression scale scores after each ECT treatment session. Methods Systematic random sampling was conducted. Even numbers of ictal tracings were selected for analysis from the total samples collected over 8 consecutive days of ECT overseen by a total of eight different ECT practitioners. Cohen's kappa coefficient was used to measure the inter-rater reliability of the two neuropsychiatrists and determine the level of agreement between NEARS scores and those of the ECT practitioners. The correlation using NEARS scores and post-ECT Clinical Global Impression scores was measured with Spearman's test. The significance level was set at P < 0.05. Results Cohen's kappa showed perfect agreement between the two neuropsychiatrists, at κ = 1.00 (SE = 0.001; P < 0.001), and strong agreement between NEARS scores of overall seizure adequacy and the scores interpreted by the ECT practitioners, at κ = 0.83 (95% CI: 0.66, 0.99; P < 0.001). Spearman's test showed a weak negative association between NEARS scores and post-ECT Clinical Global Impression scores (r = -0.018; P = 0.900). Conclusion NEARS may facilitate a brief, objectively reliable and practical assessment of ictal electroencephalogram quality. 
The scale is readily applicable by any trained ECT practitioner during an ongoing ECT procedure, especially when a prompt treatment decision is required.
Affiliation(s)
- Kenny Ong Kheng Yee
  - NEURON (Neuropsychiatry and Neuromodulation Unit), Department of Psychiatry and Mental Health, Kuala Lumpur General Hospital, Kuala Lumpur, Malaysia
- Chee Kok Yoon
  - NEURON (Neuropsychiatry and Neuromodulation Unit), Department of Psychiatry and Mental Health, Kuala Lumpur General Hospital, Kuala Lumpur, Malaysia
- Zamtira Seman
  - Sector for Biostatistics and Data Repository, National Institutes of Health, Selangor, Malaysia
- Chhoa Keng Hong
  - NEURON (Neuropsychiatry and Neuromodulation Unit), Department of Psychiatry and Mental Health, Kuala Lumpur General Hospital, Kuala Lumpur, Malaysia
- Chin Han Lim
  - NEURON (Neuropsychiatry and Neuromodulation Unit), Department of Psychiatry and Mental Health, Kuala Lumpur General Hospital, Kuala Lumpur, Malaysia
22
Sathraju S, Johnson K, Cicalese KV, Opalak CF, Broaddus WC. Reducing Gadolinium Exposure in Patients Undergoing Monitoring for Meningiomas. Cureus 2023; 15:e37492. [PMID: 37187666 PMCID: PMC10180544 DOI: 10.7759/cureus.37492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/12/2023] [Indexed: 05/17/2023]
Abstract
Background Due to the non-malignant and slow-growing nature of many meningiomas, surveillance with serial magnetic resonance imaging (MRI) serves as an acceptable management plan. However, repeated imaging with gold-standard contrast-based studies may lead to contrast-associated adverse effects. Non-gadolinium T2 sequences may serve as a suitable alternative without the risk of adverse effects of contrast. Thus, this study sought to investigate the agreement between post-contrast T1 and non-gadolinium T2 MRI sequences in the measurement of meningioma growth. Methodology The Virginia Commonwealth University School of Medicine (VCU SOM) brain tumor database was used to create a cohort of meningioma patients and determine the number of patients who had T1 post-contrast imaging accompanied by readily measurable imaging from either T2 fast spin echo (FSE) or T2 fluid-attenuated inversion recovery (FLAIR) sequences. Measurements of the largest axial and perpendicular diameters of each tumor were conducted by two independent observers using T1 post-contrast, T2 FSE, and T2 FLAIR imaging series. Lin's concordance correlation coefficient (CCC) was calculated to assess inter-rater reliability between observers and agreement between measurements of tumor diameter among the different imaging sequences. Results In total, 33 patients (average age = 72.1 ± 12.9 years, 90% female) with meningiomas were extracted from our database, with 22 (66.7%) undergoing T1 post-contrast imaging accompanied by readily measurable imaging from T2 FSE and/or T2 FLAIR sequences. The inter-rater reliability between the measurements of T1 axial and perpendicular diameters was 0.96 (95% confidence interval (CI) = 0.92-0.98) and 0.92 (95% CI = 0.83-0.97), respectively. The inter-rater reliability between the measurements of T2 axial and perpendicular diameters was 0.93 (95% CI = 0.92-0.97) and 0.89 (95% CI = 0.74-0.95), respectively.
The agreements between the measurements of T1 and T2 FSE axial diameter by each observer were 0.97 (95% CI = 0.93-0.98) and 0.92 (95% CI = 0.81-0.97). The agreements between the measurements of T1 and T2 FSE perpendicular diameter by each observer were 0.98 (95% CI = 0.95-0.99) and 0.88 (95% CI = 0.73-0.95). Conclusions Two-thirds of our patients had meningiomas that were readily measurable on either T2 FSE or T2 FLAIR sequences. Additionally, there was excellent inter-rater reliability between the observers in our study as well as agreement between individual measurements of T1 post-contrast and T2 FSE tumor diameters. These findings suggest that T2 FSE may serve as a safe and similarly effective surveillance method for the long-term management of meningioma patients.
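The agreement values above are Lin's concordance correlation coefficients. A minimal sketch of the usual sample formula, CCC = 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²) with 1/n variances, is given below; the function name `lins_ccc` and the diameters are hypothetical and are not the study's measurements.

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements (illustrative helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Population (1/n) variances and covariance.
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both series are constant and equal")
    # The squared mean difference penalises systematic offset between methods.
    return 2 * cov_xy / denom


# Hypothetical tumor diameters in mm from two sequences.
t1_post_contrast = [21.0, 34.5, 18.2, 40.1, 27.3]
t2_fse = [20.4, 35.0, 17.9, 41.0, 26.5]
print(f"CCC = {lins_ccc(t1_post_contrast, t2_fse):.3f}")
```

Unlike Pearson's r, the CCC drops when one series is consistently offset from the other, which is the property that matters when one sequence is meant to replace another.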
Affiliation(s)
- Srikar Sathraju
  - Neurosurgery, Virginia Commonwealth University School of Medicine, Richmond, USA
- Kyle V Cicalese
  - Neurosurgery, Virginia Commonwealth University School of Medicine, Richmond, USA
- Charles F Opalak
  - Neurosurgery, Prisma Health Southeastern Neurosurgical and Spine Institute, Greenville, USA
- William C Broaddus
  - Neurosurgery, Virginia Commonwealth University School of Medicine, Richmond, USA
|
23
|
Kaiser I, Pfahlberg AB, Mathes S, Uter W, Diehl K, Steeb T, Heppt MV, Gefeller O. Inter-Rater Agreement in Assessing Risk of Bias in Melanoma Prediction Studies Using the Prediction Model Risk of Bias Assessment Tool (PROBAST): Results from a Controlled Experiment on the Effect of Specific Rater Training. J Clin Med 2023; 12:jcm12051976. [PMID: 36902763 PMCID: PMC10003882 DOI: 10.3390/jcm12051976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 02/27/2023] [Accepted: 02/28/2023] [Indexed: 03/06/2023]
Abstract
Assessing the risk of bias (ROB) of studies is an important part of the conduct of systematic reviews and meta-analyses in clinical medicine. Among the many existing ROB tools, the Prediction Model Risk of Bias Assessment Tool (PROBAST) is a rather new instrument specifically designed to assess the ROB of prediction studies. In our study we analyzed the inter-rater reliability (IRR) of PROBAST and the effect of specialized training on the IRR. Six raters independently assessed the risk of bias (ROB) of all melanoma risk prediction studies published until 2021 (n = 42) using the PROBAST instrument. The raters evaluated the ROB of the first 20 studies without any guidance other than the published PROBAST literature. The remaining 22 studies were assessed after receiving customized training and guidance. Gwet's AC1 was used as the primary measure to quantify the pairwise and multi-rater IRR. Depending on the PROBAST domain, results before training showed a slight to moderate IRR (multi-rater AC1 ranging from 0.071 to 0.535). After training, the multi-rater AC1 ranged from 0.294 to 0.780 with a significant improvement for the overall ROB rating and two of the four domains. The largest net gain was achieved in the overall ROB rating (difference in multi-rater AC1: 0.405, 95%-CI 0.149-0.630). In conclusion, without targeted guidance, the IRR of PROBAST is low, questioning its use as an appropriate ROB instrument for prediction studies. Intensive training and guidance manuals with context-specific decision rules are needed to correctly apply and interpret the PROBAST instrument and to ensure consistency of ROB ratings.
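The study above uses Gwet's AC1 in pairwise and multi-rater form. The sketch below covers only the two-rater case, using the standard definition in which chance agreement is p_e = (1/(Q−1)) Σ π_q(1 − π_q), with π_q the average proportion of ratings in category q. The function name `gwet_ac1` and the risk-of-bias labels are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, categories):
    """Gwet's AC1 for two raters on a nominal scale (illustrative helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    q = len(categories)
    if q < 2:
        raise ValueError("the rating scale needs at least two categories")
    n = len(rater_a)
    # Observed agreement.
    p_a = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Average proportion of ratings falling in each category across both raters.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    pi = [(count_a[c] + count_b[c]) / (2 * n) for c in categories]
    # Chance agreement under Gwet's model; it never reaches 1, so no division by zero.
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical risk-of-bias judgements for six studies.
scale = ["low", "high", "unclear"]
rater_1 = ["high", "high", "high", "low", "high", "unclear"]
rater_2 = ["high", "high", "low", "low", "high", "high"]
ac1 = gwet_ac1(rater_1, rater_2, scale)
print(f"AC1 = {ac1:.3f}")
```

AC1 is often preferred over kappa when category prevalence is very uneven, because its chance-agreement term does not inflate as one category comes to dominate.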
Affiliation(s)
- Isabelle Kaiser
  - Department of Medical Informatics, Biometry and Epidemiology, Friedrich Alexander University of Erlangen-Nuremberg, 91054 Erlangen, Germany
  - Correspondence:
- Annette B. Pfahlberg
  - Department of Medical Informatics, Biometry and Epidemiology, Friedrich Alexander University of Erlangen-Nuremberg, 91054 Erlangen, Germany
- Sonja Mathes
  - Department of Dermatology and Allergy Biederstein, Faculty of Medicine, Technical University of Munich, 80802 Munich, Germany
- Wolfgang Uter
  - Department of Medical Informatics, Biometry and Epidemiology, Friedrich Alexander University of Erlangen-Nuremberg, 91054 Erlangen, Germany
- Katharina Diehl
  - Department of Medical Informatics, Biometry and Epidemiology, Friedrich Alexander University of Erlangen-Nuremberg, 91054 Erlangen, Germany
- Theresa Steeb
  - Department of Dermatology, University Hospital Erlangen, 91054 Erlangen, Germany
- Markus V. Heppt
  - Department of Dermatology, University Hospital Erlangen, 91054 Erlangen, Germany
  - Comprehensive Cancer Center Erlangen-European Metropolitan Area of Nuremberg (CCC ER-EMN), 91054 Erlangen, Germany
- Olaf Gefeller
  - Department of Medical Informatics, Biometry and Epidemiology, Friedrich Alexander University of Erlangen-Nuremberg, 91054 Erlangen, Germany
|
24
|
Staibano P, Ham J, Chen J, Zhang H, Gupta MK. Inter-Rater Reliability of Thyroid Ultrasound Risk Criteria: A Systematic Review and Meta-Analysis. Laryngoscope 2023; 133:485-493. [PMID: 36039947 DOI: 10.1002/lary.30347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2022] [Revised: 07/05/2022] [Accepted: 07/29/2022] [Indexed: 11/06/2022]
Abstract
OBJECTIVE The most commonly employed diagnostic criteria for identifying thyroid nodules include Thyroid Imaging and Reporting Data System (TI-RADS) and American Thyroid Association (ATA) guidelines. The purpose of this systematic review and meta-analysis is to determine the inter-rater reliability of thyroid ultrasound criteria. METHODS We performed a library search of MEDLINE (Ovid), EMBASE (Ovid), and Web of Science for full-text articles published from January 2005 to June 2022. We included full-text primary research articles that used TI-RADS and/or ATA guidelines to evaluate thyroid nodules in adults. Included studies had to have calculated inter-rater reliability using any validated metric. The Quality Appraisal for Reliability Studies (QAREL) was used to assess study quality. We planned for a random-effects meta-analysis, in addition to covariate and publication bias analyses. This study was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines and registered before it was conducted (International prospective register of systematic reviews-PROSPERO: CRD42021275072). RESULTS Of the 951 articles identified via the database search, 35 met eligibility criteria. All studies were observational. The most commonly utilized criteria were ACR Thyroid Imaging and Reporting Data System (TI-RADS) and/or ATA criteria, while the majority of studies employed κ statistics. For ACR TI-RADS, the pooled κ was 0.51 (95% confidence interval [CI]: 0.42, 0.57; n = 7) while for ATA, the pooled κ was 0.52 (95% CI: 0.37, 0.67; n = 3). Due to the small number of studies, covariate or publication bias analyses were not performed. CONCLUSION Ultrasound criteria demonstrate moderate inter-rater reliability, but these findings are impacted by poor study quality and a lack of standardization. Laryngoscope, 133:485-493, 2023.
Affiliation(s)
- Phillip Staibano
  - Division of Otolaryngology-Head and Neck Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Jennifer Ham
  - Division of Otolaryngology-Head and Neck Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Jennifer Chen
  - Division of Otolaryngology-Head and Neck Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Han Zhang
  - Division of Otolaryngology-Head and Neck Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Michael K Gupta
  - Division of Otolaryngology-Head and Neck Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
|
25
|
Aita SL, Moncrief GG, Greene J, Trujillo S, Carrillo A, Iwanicki S, Morera CC, Gioia GA, Isquith PK, Roth RM. Univariate and Multivariate Base Rates of Score Elevations, Reliable Change, and Inter-Rater Discrepancies in the BRIEF-A Standardization Samples. Assessment 2023; 30:390-401. [PMID: 34726086 DOI: 10.1177/10731911211055673] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 02/04/2023]
Abstract
The Behavior Rating Inventory of Executive Function-Adult Version (BRIEF-A) is a standardized rating scale of subjective executive functioning. We provide univariate and multivariate base rates (BRs) for scale/index scores in the clinical range (T scores ≥65), reliable change, and inter-rater information not included in the Professional Manual. Participants were adults (ages = 18-90 years) from the BRIEF-A self-report (N = 1,050) and informant report (N = 1,200) standardization samples, as well as test-retest (n = 50 for self, n = 44 for informant) and inter-rater (n = 180) samples. Univariate BRs of elevated T scores were low (self-report = 3.3%-15.4%, informant report = 4.5%-16.3%). Multivariate BRs revealed the common occurrence of obtaining at least one elevated T-score across scales (self-report = 26.5%-37.3%, informant report = 22.7%-30.3%), whereas virtually none had elevated scores on all scales. Test-retest scores were highly correlated (self = .82-.94; informant = .91-.96). Inter-rater correlations ranged from .44 to .68. Significant (p < .05) test-retest T-score differences ranged from 7 to 12 for self-report, from 6 to 8 for informant report, and from 16 to 21 points for inter-rater T-score differences. Applications of these findings are discussed.
Affiliation(s)
- Stephen L Aita
  - Geisel School of Medicine at Dartmouth, Hanover, NH, USA
  - Dartmouth-Hitchcock Medical Center, Lebanon, NH, USA
- Grant G Moncrief
  - Geisel School of Medicine at Dartmouth, Hanover, NH, USA
  - Dartmouth-Hitchcock Medical Center, Lebanon, NH, USA
- Sue Trujillo
  - Psychological Assessment Resources, Lutz, FL, USA
- Robert M Roth
  - Geisel School of Medicine at Dartmouth, Hanover, NH, USA
  - Dartmouth-Hitchcock Medical Center, Lebanon, NH, USA
|
26
|
Thaker AI, Smith J, Pathak M, Park JY. Challenges in Inter-rater Agreement on Lamina Propria Fibrosis in Esophageal Biopsies. Pediatr Dev Pathol 2023; 26:106-114. [PMID: 36755427 DOI: 10.1177/10935266221147084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
Abstract
BACKGROUND Mucosal biopsies in eosinophilic esophagitis (EoE) can exhibit lamina propria (LP) fibrosis, which may portend stenotic complications; however, the histologic diagnosis of LP fibrosis is subjective. We sought to assess and improve the consistency of LP fibrosis diagnosis among our pathologist group. METHODS At a large pediatric hospital, 25 esophageal biopsy slides from 19 patients (16 with EoE) exhibiting a wide spectrum of LP area, artifacts, and fibrosis severity were scanned into whole-slide images. Staff pediatric pathologists (n = 8) separate from the authors classified each biopsy by LP adequacy and fibrosis severity 1 month before and after completion of an educational tutorial. Consensus was defined as >70% agreement. RESULTS At baseline, 16/25 (64%) cases reached consensus for no fibrosis (n = 3), fibrosis (n = 7), or inadequate LP (n = 6); agreement was fair (α = 0.34). Post-tutorial, 13/25 (52%) cases reached consensus for no fibrosis (n = 2), fibrosis (n = 7), or inadequate LP (n = 4); agreement was again fair (α = 0.33). There was moderate agreement in grading of fibrosis severity (α = 0.54). CONCLUSION We document only fair-to-moderate agreement in the diagnosis of esophageal LP fibrosis and adequacy in a large pediatric pathologist group despite targeted education, highlighting a challenge in incorporating this feature into EoE research and clinical decision-making.
Affiliation(s)
- Ameet I Thaker
  - Division of Pediatric Pathology, UT Southwestern/Children's Health, Dallas, TX, USA
- Jacob Smith
  - Division of Pediatric Pathology, UT Austin/Dell Children's Medical Center, Austin, TX, USA
- Mona Pathak
  - Department of Population and Data Sciences, UT Southwestern, Dallas, TX, USA
- Jason Y Park
  - Division of Pediatric Pathology, UT Southwestern/Children's Health, Dallas, TX, USA
|
27
|
Williams B, Hedger N, McNabb CB, Rossetti GMK, Christakou A. Inter-rater reliability of functional MRI data quality control assessments: A standardised protocol and practical guide using pyfMRIqc. Front Neurosci 2023; 17:1070413. [PMID: 36816136 PMCID: PMC9936142 DOI: 10.3389/fnins.2023.1070413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 01/11/2023] [Indexed: 02/05/2023]
Abstract
Quality control is a critical step in the processing and analysis of functional magnetic resonance imaging data. Its purpose is to remove problematic data that could otherwise lead to downstream errors in the analysis and reporting of results. The manual inspection of data can be a laborious process that is susceptible to human error. The development of automated tools aims to mitigate these issues. One such tool is pyfMRIqc, which we previously developed as a user-friendly method for assessing data quality. Yet, these methods still generate output that requires subjective interpretations about whether the quality of a given dataset meets an acceptable standard for further analysis. Here we present a quality control protocol using pyfMRIqc and assess the inter-rater reliability of four independent raters using this protocol for data from the fMRI Open QC project (https://osf.io/qaesm/). Data were classified by raters as either "include," "uncertain," or "exclude." There was moderate to substantial agreement between raters for "include" and "exclude," but little to no agreement for "uncertain." In most cases only a single rater used the "uncertain" classification for a given participant's data, with the remaining raters showing agreement for "include"/"exclude" decisions in all but one case. We suggest several approaches to increase rater agreement and reduce disagreement for "uncertain" cases, aiding classification consistency.
Affiliation(s)
- Brendan Williams
  - Centre for Integrative Neuroscience and Neurodynamics, University of Reading, Reading, United Kingdom
  - School of Psychology and Clinical Language Sciences, University of Reading, Reading, United Kingdom
  - *Correspondence: Brendan Williams
- Nicholas Hedger
  - Centre for Integrative Neuroscience and Neurodynamics, University of Reading, Reading, United Kingdom
  - School of Psychology and Clinical Language Sciences, University of Reading, Reading, United Kingdom
- Carolyn B. McNabb
  - Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, College of Biomedical and Life Sciences, Cardiff University, Cardiff, United Kingdom
- Gabriella M. K. Rossetti
  - Centre for Integrative Neuroscience and Neurodynamics, University of Reading, Reading, United Kingdom
  - School of Psychology and Clinical Language Sciences, University of Reading, Reading, United Kingdom
- Anastasia Christakou
  - Centre for Integrative Neuroscience and Neurodynamics, University of Reading, Reading, United Kingdom
  - School of Psychology and Clinical Language Sciences, University of Reading, Reading, United Kingdom
|
28
|
Wagener MG, Schregel J, Ossowski N, Trojakowska A, Ganter M, Kiene F. The influence of different examiners on the Body Condition Score (BCS) in South American camelids-Experiences from a mixed llama and alpaca herd. Front Vet Sci 2023; 10:1126399. [PMID: 36816196 PMCID: PMC9936059 DOI: 10.3389/fvets.2023.1126399] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/17/2022] [Accepted: 01/16/2023] [Indexed: 02/05/2023]
Abstract
Particularly in unshorn llamas and alpacas with a dense fiber coat, changes in body condition often remain undetected for a long time. Manual palpation of the lumbar vertebrae is hence a simple and practical method for the objective assessment of body condition in South American camelids (SAC). Depending on tissue coverage, a body condition score (BCS) of 1 (emaciated) to 5 (obese) with an optimum of 3 is assigned. To date, there is a lack of detailed information on the comparability of the results when the BCS in llamas or alpacas is assessed by different examiners. Reliability of BCS assessment of 20 llamas and nine alpacas during a veterinary herd visit by six examiners was hence evaluated in this study. A gold standard BCS (gsBCS) was calculated from the results of the two most experienced examiners. The other examiners deviated by a maximum of 0.5 score points from the gsBCS in more than 80% of the animals. Inter-rater reliability statistics between the assessors were comparable to those in body condition scoring in sheep and cattle (r = 0.52-0.89; τ = 0.43-0.80; κw = 0.50-0.79). Agreements were higher among the more experienced assessors. Based on the results, the assessment of BCS in SAC by palpation of the lumbar vertebrae can be considered as a simple and reproducible method to reliably determine nutritional status in llamas and alpacas.
|
29
|
Sinvani RT, Gilboa Y. Video-Conference-Based Graphomotor Examination for Children: A Validation Study. OTJR (Thorofare N J) 2023:15394492221145693. [PMID: 36631753 DOI: 10.1177/15394492221145693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2023]
Abstract
BACKGROUND Telehealth can assist with providing accessible pediatric occupational therapy services. OBJECTIVES This study aimed to investigate the acceptability, inter-rater reliability, and concurrent validity of the Gilboa Functional Test (GIFT) as a video-conference-based graphomotor examination for children (GIFT-Online). METHODOLOGY A community-based sample of 157 children aged 3 to 7 years was screened using the GIFT-Online. FINDINGS Inter-rater reliability was excellent (r = 0.97; p < .001; n = 40). In addition, significant correlations were found between the total GIFT-Online scores and the Developmental Coordination Disorder Questionnaire 2007 (DCDQ'07), the Little DCDQ (LDCDQ) (r = 0.29, p < .001; n = 157), and the Drawing Proficiency Screening Questionnaire (DPSQ) (r = -0.35, p < .001; n = 157); demonstrating construct and concurrent validity. The online assessment was well received by parents and children. CONCLUSIONS The GIFT-Online was found to be an acceptable method of assessing graphomotor performance. Our results support the validity and reliability of the GIFT-Online as a screening tool administered remotely, thereby overcoming physical distancing and travel restrictions.
|
30
|
Romaniello C, Romanazzo S, Cosci F. Clinimetric properties of the diagnostic criteria for psychosomatic research among the elderly. Clin Psychol Psychother 2023. [PMID: 36607260 DOI: 10.1002/cpp.2822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 12/23/2022] [Accepted: 12/28/2022] [Indexed: 01/07/2023]
Abstract
INTRODUCTION Among the elderly, the availability of tools assessing psychosomatic syndromes is limited. The present study aims at testing inter-rater reliability and concurrent validity of the semi-structured interview for the Diagnostic Criteria for Psychosomatic Research (DCPR-R-SSI) in the elderly of the general population. METHOD One hundred eight subjects were recruited. Participants received a clinical assessment which included the DCPR-R-SSI, the Illness Attitude Scale (IAS), the Geriatric Depression Scale (GDS), the Psychosocial Index (PSI), and the Toronto Alexithymia Scale-20 (TAS-20). Analyses of inter-rater reliability of DCPR-R-SSI and concurrent validity between DCPR-R-SSI and self-administered questionnaires were conducted. RESULTS DCPR-R-SSI showed excellent inter-rater reliability with a percent agreement of 90.7% (Cohen's κ = 0.856 [SE = 0.043], 95% CI: 0.77-0.94). DCPR-R demoralization showed fair concurrent validity with GDS; concurrent validity was also fair between DCPR-R Alexithymia and TAS-20, and between DCPR-R allostatic overload and PSI allostatic load, while the concurrent validity between DCPR-R Disease Phobia and IAS was moderate. CONCLUSION DCPR-R-SSI represents a reliable and valid tool to assess psychosomatic syndromes in the elderly. DCPR-R should be implemented in the clinical evaluation of the elderly.
Affiliation(s)
- Caterina Romaniello
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Sara Romanazzo
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Fiammetta Cosci
  - Department of Health Sciences, University of Florence, Florence, Italy
  - International Lab of Clinical Measurements, University of Florence, Florence, Italy
  - Department of Psychiatry and Neuropsychology, Maastricht University, Maastricht, The Netherlands
|
31
|
Astrup K, Corner E, Van Tulder M, Sørensen L. Reliability and responsiveness of the Danish version of The Chelsea Critical Care Physical Assessment tool (CPAx). Physiother Theory Pract 2023; 39:193-199. [PMID: 34784835 DOI: 10.1080/09593985.2021.2005197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/16/2022]
Abstract
INTRODUCTION Measurement instruments are important in clinical practice and research for assessing physical function in critically ill patients in the intensive care unit (ICU). OBJECTIVE To investigate inter-rater reliability and responsiveness of the Danish version of the CPAx (CPAx-D). METHOD Critically ill patients from three Danish ICUs were included. Patients were assessed with CPAx-D by two blinded testers during a regular physiotherapy session. Follow-up tests were performed in patients who stayed in the ICU for more than 24 hours, were not transferred to another hospital or received palliative care. Floor and ceiling effects were examined in all assessments. RESULTS For the reliability analysis, 66 patients were included. Results showed no significant difference between raters. For the total score, the intraclass correlation coefficient (ICC) was 0.996 (95% CI: 0.993; 0.997), the standard error of measurement was 0.72 points and the minimal detectable change was 2.0 points. The Bland-Altman plot revealed no heteroscedasticity. The responsiveness results of 24 patients showed that the effect size was 1.2 and the standardized response mean 1.1, which was in accordance with the hypothesis. No ceiling or floor effect was revealed. CONCLUSION The CPAx-D showed excellent inter-rater reliability and responsiveness.
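The abstract above reports a standard error of measurement (SEM) of 0.72 points and a minimal detectable change (MDC) of 2.0 points. It does not state which formulas were used; the sketch below assumes the conventional ones, SEM = SD·√(1 − ICC) and MDC95 = 1.96·√2·SEM, and the function names are hypothetical. Under that assumption the reported SEM reproduces the reported MDC after rounding.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and a reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at 95% confidence for a difference of two measurements."""
    # sqrt(2) accounts for measurement error in both the first and the second assessment.
    return 1.96 * math.sqrt(2.0) * sem


# SEM taken from the abstract; the conversion formula is an assumption of this sketch.
reported_sem = 0.72
mdc = mdc95(reported_sem)
print(f"MDC95 = {mdc:.2f} points")

# Hypothetical SD, only to show the first formula in use.
print(f"SEM = {sem_from_icc(sd=10.0, icc=0.75):.2f}")
```

A change smaller than the MDC cannot be distinguished from measurement noise, which is why the statistic is usually reported alongside the ICC.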
Affiliation(s)
- Katrine Astrup
  - Department of Physiotherapy and Occupational Therapy, Aarhus University Hospital, Aarhus N, Denmark
  - Department of Clinical Medicine, Aarhus University, Aarhus N, Denmark
- Evelyn Corner
  - Department of Health Sciences, Brunel University London, London, UK
- Maurits Van Tulder
  - Department of Physiotherapy and Occupational Therapy, Aarhus University Hospital, Aarhus N, Denmark
  - Department of Clinical Medicine, Aarhus University, Aarhus N, Denmark
  - Imperial College NHS Healthcare Trust, London, UK
- Lotte Sørensen
  - Department of Physiotherapy and Occupational Therapy, Aarhus University Hospital, Aarhus N, Denmark
  - Faculty Behavioural and Movement Sciences, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
|
32
|
Scielzo SA, Abdelfattah K, Ryder HF. Is It All About the Form? Norm- vs Criterion-Referenced Ratings and Faculty Inter-Rater Reliability. Ochsner J 2023; 23:206-221. [PMID: 37711480 PMCID: PMC10498947 DOI: 10.31486/toj.23.0014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/16/2023]
Abstract
Background: Little research to date has examined the quality of data obtained from resident performance evaluations. This study sought to address this need and compared inter-rater reliability obtained from norm-referenced and criterion-referenced evaluation scaling approaches for faculty completing resident performance evaluations. Methods: Resident performance evaluation data were examined from 2 institutions (3 programs, 2 internal medicine and 1 surgery; 426 residents in total), with 4 evaluation forms: 2 criterion-referenced (1 with an additional norm-referenced item) and 2 norm-referenced. Faculty inter-rater reliability was calculated with intraclass correlation coefficients (ICCs) (1,10) for each competency area within the form. ICCs were transformed to z-scores, and 95% CIs were computed. Reliabilities for each evaluation form and competency, averages within competency, and averages within scaling type were examined. Results: Inter-rater reliability averages were higher for all competencies that used criterion-referenced scaling relative to those that used norm-referenced scaling. Aggregate scores of all independent categories (competencies and the items assessing overall competence) for criterion-referenced scaling demonstrated higher reliability (z=1.37, CI 1.26-1.48) than norm-referenced scaling (z=0.88, CI 0.77-0.99). Moreover, examination of the distributions of composite scores (average of all competencies and raters for each individual being rated) suggested that the criterion-referenced evaluations better represented the performance continuum. Conclusion: Criterion-referenced evaluation approaches appear to provide superior inter-rater reliability relative to norm-referenced evaluation scaling approaches. Although more research is needed to identify resident evaluation best practices, using criterion-referenced scaling may provide more valid data than norm-referenced scaling.
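The abstract above reports reliabilities on a z scale (z = 1.37 vs z = 0.88) without giving the transformation. Assuming the conventional Fisher transformation z = atanh(r), which is an assumption of this sketch and not confirmed by the abstract, the values can be mapped back to the correlation scale as follows. The function names are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transformation, defined for -1 < r < 1."""
    return math.atanh(r)


def inverse_fisher_z(z):
    """Map a Fisher z value back to the correlation scale."""
    return math.tanh(z)


def mean_reliability(coefficients):
    """Average coefficients in z space, then back-transform the mean."""
    z_values = [fisher_z(r) for r in coefficients]
    return inverse_fisher_z(sum(z_values) / len(z_values))


# z values quoted in the abstract; the back-transformation is illustrative only.
criterion_r = inverse_fisher_z(1.37)
norm_r = inverse_fisher_z(0.88)
print(f"criterion-referenced ~ {criterion_r:.2f}, norm-referenced ~ {norm_r:.2f}")
```

Averaging in z space is a common choice because the sampling distribution of a correlation-type coefficient is skewed near 1, whereas its z transform is approximately normal.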
Affiliation(s)
- Shannon A. Scielzo
  - Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX
- Kareem Abdelfattah
  - Department of General Surgery, University of Texas Southwestern Medical Center, Dallas, TX
- Hilary F. Ryder
  - Burnett School of Medicine, Texas Christian University, Fort Worth, TX
  - Internal Medicine Residency Program, Texas Health Harris Methodist Hospital, Fort Worth, TX
|
33
|
Somaskandhan P, Leppänen T, Terrill PI, Sigurdardottir S, Arnardottir ES, Ólafsdóttir KA, Serwatko M, Sigurðardóttir SÞ, Clausen M, Töyräs J, Korkalainen H. Deep learning-based algorithm accurately classifies sleep stages in preadolescent children with sleep-disordered breathing symptoms and age-matched controls. Front Neurol 2023; 14:1162998. [PMID: 37122306 PMCID: PMC10140398 DOI: 10.3389/fneur.2023.1162998] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/10/2023] [Accepted: 03/23/2023] [Indexed: 05/02/2023]
Abstract
Introduction Visual sleep scoring has several shortcomings, including inter-scorer inconsistency, which may adversely affect diagnostic decision-making. Although automatic sleep staging in adults has been extensively studied, it is uncertain whether such sophisticated algorithms generalize well to different pediatric age groups due to distinctive EEG characteristics. The preadolescent age group (10-13-year-olds) is relatively understudied, and thus, we aimed to develop an automatic deep learning-based sleep stage classifier specifically targeting this cohort. Methods A dataset (n = 115) containing polysomnographic recordings of Icelandic preadolescent children with sleep-disordered breathing (SDB) symptoms, and age and sex-matched controls was utilized. We developed a combined convolutional and long short-term memory neural network architecture relying on electroencephalography (F4-M1), electrooculography (E1-M2), and chin electromyography signals. Performance relative to human scoring was further evaluated by analyzing intra- and inter-rater agreements in a subset (n = 10) of data with repeat scoring from two manual scorers. Results The deep learning-based model achieved an overall cross-validated accuracy of 84.1% (Cohen's kappa κ = 0.78). There was no meaningful performance difference between SDB-symptomatic (n = 53) and control subgroups (n = 52) [83.9% (κ = 0.78) vs. 84.2% (κ = 0.78)]. The inter-rater reliability between manual scorers was 84.6% (κ = 0.78), and the automatic method reached similar agreements with scorers, 83.4% (κ = 0.76) and 82.7% (κ = 0.75). Conclusion The developed algorithm achieved high classification accuracy and substantial agreements with two manual scorers; the performance metrics compared favorably with typical inter-rater reliability between manual scorers and performance reported in previous studies. These results suggest that our algorithm may facilitate less labor-intensive and reliable automatic sleep scoring in preadolescent children.
Affiliation(s)
- Pranavan Somaskandhan
- School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
- *Correspondence: Pranavan Somaskandhan
| | - Timo Leppänen
- School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
- Department of Technical Physics, University of Eastern Finland, Kuopio, Finland
- Diagnostic Imaging Center, Kuopio University Hospital, Kuopio, Finland
| | - Philip I. Terrill
- School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
| | - Sigridur Sigurdardottir
- Reykjavik University Sleep Institute, School of Technology, Reykjavik University, Reykjavik, Iceland
| | - Erna Sif Arnardottir
- Reykjavik University Sleep Institute, School of Technology, Reykjavik University, Reykjavik, Iceland
- Internal Medicine Services, Landspitali–The National University Hospital of Iceland, Reykjavik, Iceland
| | - Kristín A. Ólafsdóttir
- Reykjavik University Sleep Institute, School of Technology, Reykjavik University, Reykjavik, Iceland
| | - Marta Serwatko
- Department of Clinical Engineering, Landspitali University Hospital, Reykjavik, Iceland
| | - Sigurveig Þ. Sigurðardóttir
- Department of Immunology, Landspitali University Hospital, Reykjavik, Iceland
- Faculty of Medicine, University of Iceland, Reykjavik, Iceland
| | - Michael Clausen
- Department of Allergy, Landspitali University Hospital, Reykjavik, Iceland
- Children's Hospital Reykjavik, Reykjavik, Iceland
| | - Juha Töyräs
- School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
- Department of Technical Physics, University of Eastern Finland, Kuopio, Finland
- Science Service Center, Kuopio University Hospital, Kuopio, Finland
| | - Henri Korkalainen
- Department of Technical Physics, University of Eastern Finland, Kuopio, Finland
- Diagnostic Imaging Center, Kuopio University Hospital, Kuopio, Finland
|
34
|
Yamakawa Y, Yamamoto N, Tomita Y, Okuda R, Masada Y, Shiroshita A, Matsumoto T. Reliability of the Garden Alignment Index and Valgus Tilt Measurement for Nondisplaced Femoral Neck Fractures. J Pers Med 2022; 13:jpm13010053. [PMID: 36675714 PMCID: PMC9863890 DOI: 10.3390/jpm13010053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 12/23/2022] [Accepted: 12/24/2022] [Indexed: 12/28/2022] Open
Abstract
Anteroposterior (AP) alignment assessment for nondisplaced femoral neck fractures is important for determining the treatment strategy and predicting postoperative outcomes. AP alignment is generally measured using the Garden alignment index (GAI). However, its reliability remains unknown. We compared the reliability of GAI and a new AP alignment measurement (valgus tilt measurement [VTM]) using preoperative AP radiographs of nondisplaced femoral neck fractures. The study was designed as an intra- and inter-rater reliability analysis. The raters were four trauma surgeons who assessed 50 images twice. The main outcome was the intraclass correlation coefficient (ICC). To calculate intra- and inter-rater reliability, we used a mixed-effects model considering rater, patient, and time. The overall ICC (95% CI) of GAI and VTM for intra-rater reliability was 0.92 (0.89−0.94) and 0.86 (0.82−0.89), respectively. The overall ICC of GAI and VTM for inter-rater reliability was 0.92 (0.89−0.95), and 0.85 (0.81−0.88), respectively. The intra- and inter-rater reliability of GAI was higher in patients aged <80 years than in patients aged ≥80 years. Our results showed that GAI is a more reliable measurement method than VTM, although both are reliable. Variations in patient age should be considered in GAI measurements.
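The intraclass correlation coefficient reported above can be illustrated with a minimal sketch. Note that this is not the mixed-effects model described in the abstract (which also accounts for time); it is the simpler two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), computed from ANOVA mean squares. The function name `icc_2_1` and the `example_scores` table are hypothetical and do not come from the study.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is the measurement of subject i made by rater j.
    """
    n = len(scores)                      # number of subjects (e.g. radiographs)
    k = len(scores[0]) if n else 0       # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns), and residual error.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative ratings only (6 subjects x 4 raters); not data from the study.
example_scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example_scores)
```

Because ICC(2,1) penalizes systematic offsets between raters, it tends to be lower than a consistency-type ICC when one rater scores uniformly higher than another.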
Affiliation(s)
- Yasuaki Yamakawa
- Department of Orthopedic Surgery, Kochi Health Sciences Center, Kochi 781-8555, Japan
| | - Norio Yamamoto
- Department of Orthopedic Surgery, Miyamoto Orthopedic Hospital, Okayama 773-8236, Japan
- Department of Epidemiology, Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama University, Okayama 700-8558, Japan
- Scientific Research Works Peer Support Group (SRWS-PSG), Osaka 541-0043, Japan
- Correspondence:
| | - Yosuke Tomita
- Department of Physical Therapy, Faculty of Health Care, Takasaki University of Health and Welfare, Gunma 370-0033, Japan
| | - Ryuichiro Okuda
- Department of Orthopedic Surgery, Kochi Health Sciences Center, Kochi 781-8555, Japan
| | - Yasutaka Masada
- Department of Orthopedic Surgery, Kochi Health Sciences Center, Kochi 781-8555, Japan
| | - Akihiro Shiroshita
- Scientific Research Works Peer Support Group (SRWS-PSG), Osaka 541-0043, Japan
- Division of Epidemiology, Department of Medicine, Vanderbilt University School of Medicine, 2525 West End Avenue, Nashville, TN 37203, USA
- Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
| | - Toshiyuki Matsumoto
- Department of Orthopedic Surgery, Kochi Health Sciences Center, Kochi 781-8555, Japan
|
35
|
Longo P, Toppino F, Martini M, Panero M, De Bacco C, Marzola E, Abbate-Daga G. Diagnostic Concordance between Research and Clinical-Based Assessments of Psychiatric Comorbidity in Anorexia Nervosa. J Clin Med 2022; 11:jcm11247419. [PMID: 36556034 PMCID: PMC9782669 DOI: 10.3390/jcm11247419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Revised: 12/09/2022] [Accepted: 12/12/2022] [Indexed: 12/23/2022] Open
Abstract
The literature has reported poor concordance in the assessment of psychiatric conditions, and inhomogeneity in the prevalence of psychiatric comorbidities in Anorexia Nervosa (AN). We aimed to investigate the level of concordance between clinicians' and researchers' diagnoses of psychiatric comorbidity in AN and differences in eating and general psychopathology between patients with and without psychiatric comorbidity assessed by clinicians versus researchers. A clinical psychiatrist interviewed 122 patients with AN; then a researcher administered the Structured Clinical Interview for DSM-5 (SCID-5). Participants completed the Eating Disorder Examination Questionnaire (EDE-Q), the State-Trait Anxiety Inventory (STAI), and the Beck Depression Inventory (BDI). The agreement between clinicians and researchers was poor for all diagnoses except obsessive-compulsive disorder and substance use disorder. Patients with comorbid disorders diagnosed by researchers reported more severe eating and general psychopathology than those without SCID-comorbidity. The differences between patients with and without comorbidities assessed by a clinician were smaller. Two approaches to psychiatric comorbidity assessment emerged: SCID-5 diagnoses yield a precise and rigorous assessment, while clinicians tend to consider some symptoms as secondary to the eating disorder rather than as part of another psychiatric condition, seeing the clinical picture as a whole. Overall, the study highlights the importance of carefully assessing comorbidity in AN.
|
36
|
Eltayar AN, Aref SR, Khalifa HM, Hammad AS. Do entrustment scales make a difference in the inter-rater reliability of the workplace-based assessment? Med Educ Online 2022; 27:2053401. [PMID: 35311494 PMCID: PMC8942514 DOI: 10.1080/10872981.2022.2053401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Revised: 03/08/2022] [Accepted: 03/11/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND A workplace-based assessment (WBA) is used to assess learners' competencies in their workplaces. Many workplace assessment tools are available and validated to assess various constructs. The implementation of workplace-based assessment requires proper training of the staff. OBJECTIVE This study aimed to explore the impact of staff training on WBA practices and evaluate the inter-rater reliability of these practices while using entrustment scales, performance descriptors, and personal judgment. DESIGN In this quasi-experimental study, the staff members of the orthopedic department were invited to participate in a training program on the use of entrustment scales and assessment descriptors within the WBA tools. As a response to the training, subjective judgment was replaced by entrustment scales and performance descriptors in a trauma course offered by the orthopedic department. The inter-rater reliability of the WBA was evaluated using various rating scales. RESULTS The entrustment scales yielded higher inter-rater reliability than performance descriptors and personal judgment. CONCLUSION The inter-rater reliability was highest when using entrustment scales for WBAs, which could indicate that the entrustment scales achieve good psychometric properties as regards consistency among different raters. Thus, they decrease the confounding effect of differences in assessors. They may also give a clearer image of the actual academic level of the learners.
Affiliation(s)
- Ayat Nabil Eltayar
- Medical Education Department, Faculty of Medicine, Alexandria University, Egypt
| | - Soha Rashed Aref
- Community Medicine and Public Health Department, Faculty of Medicine, Alexandria University, Egypt
| | - Hoda Mahmoud Khalifa
- Histology and Cell biology Department, Faculty of Medicine, Alexandria University, Egypt
| | - Abdullah Said Hammad
- Orthopaedic and Traumatology Department, Faculty of Medicine, Alexandria University, Egypt
|
37
|
de Jesus Ferreira LG, de Almeida Ventura Á, da Silva Almeida I, Mansur H, Babault N, Durigan JLQ, de Cássia Marqueti R. Intra- and Inter-Rater Reliability and Agreement of Ultrasound Imaging of Muscle Architecture and Patellar Tendon in Post-COVID-19 Patients Who Had Experienced Moderate or Severe COVID-19 Infection. J Clin Med 2022; 11:jcm11236934. [PMID: 36498509 PMCID: PMC9738112 DOI: 10.3390/jcm11236934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 11/03/2022] [Accepted: 11/07/2022] [Indexed: 11/27/2022] Open
Abstract
COVID-19 is associated with musculoskeletal disorders. Ultrasound can assess muscle architecture and tendon measurements and thereby indicate the extent of the consequences of the disease, since significant changes directly reflect a reduced ability to produce force and, consequently, reduced patient function; however, its application in post-COVID-19 infection needs to be determined. We aimed to assess the intra- and inter-rater reliability of ultrasound measures of the architecture of the vastus lateralis (VL), rectus femoris (RF), vastus medialis (VM), gastrocnemius lateralis (GL), gastrocnemius medialis (GM), soleus (SO), and tibialis anterior (TA) muscles, as well as the patellar tendon (PT) cross-sectional area (CSA) in post-COVID-19 patients. An observational, prospective study with repeated measures was designed to evaluate 20 post-COVID-19 patients, who were measured for the pennation angle (θp), fascicular length (Lf), thickness, and echogenicity of the muscles, and the CSA and echogenicity of the PT. The intra-class correlation coefficient (ICC) and 95% limits of agreement were used. The intra-rater reliability presented high or very high correlations (ICC = 0.71-1.0) for most measures, except the θp of the TA, which was classified as moderate (ICC = 0.69). For inter-rater reliability, all the evaluations of the PT and of the thickness and echogenicity of the muscles presented high or very high correlations. For the Lf, only the RF showed low reliability (ICC = 0.43); for the θp, the RF (ICC = 0.68), GL (ICC = 0.70), and TA (ICC = 0.71) showed moderate reliability and the SO (ICC = 0.40) showed low reliability. The ultrasound reliability was acceptable for the muscle architecture, muscle and tendon echogenicity, and PT CSA, despite the low reliability for the Lf and θp of the RF and SO, respectively.
Affiliation(s)
- Leandro Gomes de Jesus Ferreira
- Laboratory of Muscle and Tendon Plasticity, Graduate Program in Rehabilitation Science, Faculdade de Ceilândia, Universidade de Brasília, Distrito Federal, Brasília 70910-900, Brazil
- Laboratory of Molecular Analysis, Graduate Program in Rehabilitation Science, Faculdade de Ceilândia, Universidade de Brasília, Distrito Federal, Brasília 70910-900, Brazil
| | - Álvaro de Almeida Ventura
- Laboratory of Muscle and Tendon Plasticity, Graduate Program in Rehabilitation Science, Faculdade de Ceilândia, Universidade de Brasília, Distrito Federal, Brasília 70910-900, Brazil
- Laboratory of Molecular Analysis, Graduate Program in Rehabilitation Science, Faculdade de Ceilândia, Universidade de Brasília, Distrito Federal, Brasília 70910-900, Brazil
| | - Isabella da Silva Almeida
- Laboratory of Muscle and Tendon Plasticity, Graduate Program in Rehabilitation Science, Faculdade de Ceilândia, Universidade de Brasília, Distrito Federal, Brasília 70910-900, Brazil
- Laboratory of Molecular Analysis, Graduate Program in Rehabilitation Science, Faculdade de Ceilândia, Universidade de Brasília, Distrito Federal, Brasília 70910-900, Brazil
| | - Henrique Mansur
- Department of Orthopaedics, Hospital Santa Helena—Rede D’or, Sao Paulo 03313-000, Brazil
| | - Nicolas Babault
- Centre d’Expertise de la Performance, INSERM U1093 CAPS, Sports Science Faculty, University of Burgundy, 21000 Dijon, France
| | - João Luiz Quagliotti Durigan
- Laboratory of Muscle and Tendon Plasticity, Graduate Program in Rehabilitation Science, Faculdade de Ceilândia, Universidade de Brasília, Distrito Federal, Brasília 70910-900, Brazil
- Laboratory of Molecular Analysis, Graduate Program in Rehabilitation Science, Faculdade de Ceilândia, Universidade de Brasília, Distrito Federal, Brasília 70910-900, Brazil
- Graduate Program in Physical Education, Physical Education Department, Universidade de Brasília, Distrito Federal, Brasília 70910-900, Brazil
| | - Rita de Cássia Marqueti
- Laboratory of Muscle and Tendon Plasticity, Graduate Program in Rehabilitation Science, Faculdade de Ceilândia, Universidade de Brasília, Distrito Federal, Brasília 70910-900, Brazil
- Laboratory of Molecular Analysis, Graduate Program in Rehabilitation Science, Faculdade de Ceilândia, Universidade de Brasília, Distrito Federal, Brasília 70910-900, Brazil
- Correspondence: Tel./Fax: +55-61-3107-8401
|
38
|
Hasebe Y, Suzuki K, Akasaka K, Saita K, Ogihara S. Inter-examiner reliability in identifying lumbar paraspinal muscle atrophy by lumbar paraspinal muscle atrophy index, a novel parameter. J Phys Ther Sci 2022; 34:737-740. [PMID: 36337221 PMCID: PMC9622343 DOI: 10.1589/jpts.34.737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Accepted: 08/16/2022] [Indexed: 11/05/2022] Open
Abstract
[Purpose] To evaluate the inter-examiner reliability of our novel parameter, the lumbar paraspinal muscle atrophy index, in identifying the lumbar paravertebral muscle atrophy. [Participants and Methods] The study group consisted of 225 adults, with a mean age of 64.7 (range, 21–89) years, who underwent posterior lumbar spinal surgery for degenerative spinal disease at our hospital between July 2013 and June 2017. Preoperative axial T2-weighted magnetic resonance images were used to evaluate the lumbar paraspinal muscle atrophy index and observe the presence or absence of severe lumbar paraspinal muscle atrophy. The lumbar paraspinal muscle atrophy index was calculated at each intervertebral level, from L1-2 through L4-5, once by two examiners, and the Cohen’s kappa statistic was used to calculate the inter-examiner agreement of the classification of the presence or absence of atrophy at each level. [Results] The agreement was high (kappa, 0.79–0.88) for the lumbar paraspinal muscle atrophy index at all levels, except at the L3-4 level (kappa, 0.49). The lower kappa statistic at L3-4 likely reflects the unique morphological characteristics at this level. [Conclusion] The lumbar paraspinal muscle atrophy index is a new, simple, easy-to-use, and sufficiently reliable parameter to identify lumbar paraspinal atrophy.
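As an illustration of the statistic used above, the following is a minimal sketch of unweighted Cohen's kappa for two examiners classifying atrophy as present or absent. The function name `cohen_kappa` and the example ratings are hypothetical; they are not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_chance == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # and this sketch returns 1.0 by convention.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical ratings (1 = severe atrophy present, 0 = absent); not study data.
examiner_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
examiner_2 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
kappa = cohen_kappa(examiner_1, examiner_2)
```

In this toy example the examiners agree on 8 of 10 items, but half of that agreement is expected by chance, which is why kappa is lower than the raw percent agreement.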
Affiliation(s)
- Yuki Hasebe
- Department of Rehabilitation, Saitama Medical Center, Saitama Medical University, Japan
- Department of Physical Therapy, Saitama Medical University Graduate School of Medicine, Japan
| | - Kenta Suzuki
- Department of Rehabilitation, Saitama Medical Center, Saitama Medical University, Japan
| | - Kiyokazu Akasaka
- Department of Physical Therapy, Saitama Medical University Graduate School of Medicine, Japan
| | - Kazuo Saita
- Department of Orthopedics, Saitama Medical Center, Saitama Medical University: 1981 Kamoda, Kawagoe, Saitama 350-8550, Japan
| | - Satoshi Ogihara
- Department of Orthopedics, Saitama Medical Center, Saitama Medical University: 1981 Kamoda, Kawagoe, Saitama 350-8550, Japan
- Corresponding author: Satoshi Ogihara
|
39
|
Bower J, Magee WL, Catroppa C, Baker FA. Content Validity and Inter-rater Reliability of the Music Interventions in Pediatric DoC Behavior Observation Record. J Music Ther 2022; 60:13-35. [PMID: 36197798 DOI: 10.1093/jmt/thac013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Aligned with best practice guidelines for patients presenting with a disorder of consciousness (DoC), music therapy interventions with this population aim to increase arousal and awareness to support emergence to consciousness. There is a significant evidence base supporting music therapy for adults with a DoC; however, there are currently no published tools that systematically capture behavioral responses of this population during rehabilitative music therapy interventions. Further, the developmentally specific response to severe brain injury in the pediatric population means pediatric-specific research is required. The Music Interventions in Pediatric DoC Behavior Observation Record (Music Behavior Record [MBR]) was developed to objectively record responses during music therapy interventions for children presenting with a DoC. To establish content validity and inter-rater reliability, a pragmatic pilot study was undertaken. Results established that the MBR has content validity with 100% agreement among participants. Overall fair-substantial inter-rater reliability in >70% of the behavioral responses recorded in the MBR indicate the MBR is an early but promising tool to objectively capture responses during music therapy interventions. The use of the MBR may ultimately support clinical advancement and intervention research to optimize consciousness recovery for the pediatric DoC population.
Affiliation(s)
- Janeen Bower
- Faculty of Fine Arts and Music, The University of Melbourne, Southbank, VIC, Australia
- Music Therapy Department, The Royal Children's Hospital Melbourne, Parkville, VIC, Australia
| | - Wendy L Magee
- Boyer College of Music and Dance, Temple University, Philadelphia, PA, USA
| | - Cathy Catroppa
- Brain and Mind, Clinical Sciences, The Murdoch Children's Research Institute, Parkville, VIC, Australia
- Melbourne School of Psychological Sciences and The Department of Paediatrics, The University of Melbourne, Parkville, VIC, Australia
| | - Felicity A Baker
- Faculty of Fine Arts and Music, The University of Melbourne, Southbank, VIC, Australia
- Centre of Research in Music and Health, Norwegian Academy of Music, Oslo, Norway
|
40
|
Budeanu RG, Broemmer C, Budeanu AR, Pop M. Comparing the Diagnostic Performance of ECG Gated versus Non-Gated CT Angiography in Ascending Aortic Dissection: A GRRAS Study. Tomography 2022; 8:2426-2434. [PMID: 36287800 PMCID: PMC9609484 DOI: 10.3390/tomography8050201] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/16/2022] [Revised: 09/17/2022] [Accepted: 09/26/2022] [Indexed: 11/27/2022] Open
Abstract
Rationale and Objective: Thoracic CT angiography (CTA) for ascending aortic dissection, a life-threatening emergency, is performed routinely without electrocardiographic (ECG) gating, which allows pulsation artefacts to appear. We aimed to evaluate and compare the diagnostic performance and the inter- and intra-reporter agreement of ECG gated CTA and non-ECG gated CTA for detecting ascending aortic dissection, taking the reporters' training level into account. Our hypothesis is that ECG gated CTA has superior diagnostic accuracy for ascending aortic dissection compared to non-gated CTA. Materials and Methods: We collected data using a 24-question survey based on clinically validated CT examinations. Sixty-six respondents (medical students, radiology residents, and consultants) blinded to the actual diagnosis independently evaluated the images for the presence of ascending aortic dissection. The reference standard was the clinical and imaging diagnosis. Inter-rater and inter-group concordance was evaluated; the agreement with the reference standard was calculated and assessed as a function of reporters’ training level. Results: Reporters’ ascending aortic dissection assessment showed a better correlation with the reference standard in the ECG gated CTA. The inter-rater correlation was higher in the ECG gated CTA compared to non-ECG gated CTA. Observers’ confidence for diagnosing ascending aortic dissection was higher in the ECG gated CTA. Statistically significant differences (p < 0.05) were found between different training levels when assessing non-ECG gated examinations. Conclusions: ECG gated CTA shows a higher diagnostic performance for ascending aortic dissection than non-ECG gated CTA, regardless of the reporters’ training level.
Affiliation(s)
| | - Christian Broemmer
- ME1 Department, “George Emil Palade” University of Medicine, Pharmacy, Science and Technology of Targu Mures, 540142 Targu Mures, Romania
| | - Anamaria R. Budeanu
- Emergency County Hospital Târgu Mureș, 540136 Targu Mures, Romania
- Correspondence:
| | - Marian Pop
- ME1 Department, “George Emil Palade” University of Medicine, Pharmacy, Science and Technology of Targu Mures, 540142 Targu Mures, Romania
- Radiology and Medical Imaging Department, Emergency Institute for Cardiovascular Disease and Heart Transplant of Targu Mures, 540136 Targu Mures, Romania
|
41
|
Jeong J, Lee JM, Cho YS, Kim J. Inter-rater discrepancy of the House-Brackmann facial nerve grading system. Clin Otolaryngol 2022; 47:680-683. [PMID: 35818896 DOI: 10.1111/coa.13956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Revised: 05/31/2022] [Accepted: 06/26/2022] [Indexed: 11/30/2022]
Affiliation(s)
- Junhui Jeong
- Department of Otorhinolaryngology, National Health Insurance Service Ilsan Hospital, Goyang, Korea
| | - Jeon Mi Lee
- Department of Otorhinolaryngology, Ilsan Paik Hospital, Inje University College of Medicine, Goyang, Korea
| | - Yang-Sun Cho
- Department of Otorhinolaryngology-Head and Neck Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul
| | - Jin Kim
- Department of Otorhinolaryngology-Head and Neck Surgery, Dongtan Sacred Heart Hospital, Hallym University College of Medicine, Hwaseong, Korea
|
42
|
Fisher EH, Claudius I, Kaji AH, Shaban A, McGlynn N, Cicero MX, Santillanes G, Gausche-Hill M, Chang TP, Donofrio-Odmann JJ. Inter-Rater Reliability and Agreement Among Mass-Casualty Incident Algorithms Using a Pediatric Trauma Dataset: A Pilot Study. Prehosp Disaster Med 2022;:1-8. [PMID: 35441588 DOI: 10.1017/S1049023X22000632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
INTRODUCTION Many triage algorithms exist for use in mass-casualty incidents (MCIs) involving pediatric patients. Most of these algorithms have not been validated for reliability across users. STUDY OBJECTIVE Investigators sought to compare inter-rater reliability (IRR) and agreement among five MCI algorithms used in the pediatric population. METHODS A dataset of 253 pediatric (<14 years of age) trauma activations from a Level I trauma center was used to obtain prehospital information and demographics. Three raters were trained on five MCI triage algorithms: Simple Triage and Rapid Treatment (START) and JumpSTART, as appropriate for age (combined as J-START); Sort Assess Life-Saving Intervention Treatment (SALT); Pediatric Triage Tape (PTT); CareFlight (CF); and Sacco Triage Method (STM). Patient outcomes were collected but not available to raters. Each rater triaged the full set of patients into Green, Yellow, Red, or Black categories with each of the five MCI algorithms. The IRR was reported as weighted kappa scores with 95% confidence intervals (CI). Descriptive statistics were used to describe inter-rater and inter-MCI algorithm agreement. RESULTS Of the 253 patients, 247 had complete triage assignments among the five algorithms and were included in the study. The IRR was excellent for a majority of the algorithms; however, J-START and CF had the highest reliability with a kappa 0.94 or higher (0.9-1.0, 95% CI for overall weighted kappa). The greatest variability was in SALT among Green and Yellow patients. Overall, J-START and CF had the highest inter-rater and inter-MCI algorithm agreements. CONCLUSION The IRR was excellent for a majority of the algorithms. The SALT algorithm, which contains subjective components, had the lowest IRR when applied to this dataset of pediatric trauma patients. Both J-START and CF demonstrated the best overall reliability and agreement.
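Weighted kappa, as reported above for the ordinal triage categories, gives partial credit when two raters choose nearby categories. The abstract does not state which weighting scheme was used, so the sketch below assumes linear weights; the function name `weighted_kappa` and the example assignments are hypothetical and are not study data.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Linearly weighted Cohen's kappa; `categories` is ordered from low to high."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordered categories")
    index = {category: i for i, category in enumerate(categories)}
    n = len(rater_a)

    # Joint counts: observed[i][j] = items rated category i by A and category j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Disagreement weights grow linearly with the distance between categories.
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed_disagreement += weight * observed[i][j] / n
            expected_disagreement += weight * row_totals[i] * col_totals[j] / (n * n)

    if expected_disagreement == 0.0:
        return 1.0  # both raters used one identical category throughout
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical triage assignments for eight patients; not study data.
triage_levels = ["Green", "Yellow", "Red", "Black"]
rater_1 = ["Green", "Green", "Yellow", "Yellow", "Red", "Red", "Black", "Green"]
rater_2 = ["Green", "Yellow", "Yellow", "Yellow", "Red", "Yellow", "Black", "Green"]
kappa_w = weighted_kappa(rater_1, rater_2, triage_levels)
```

With linear weights, a Green versus Yellow disagreement costs one third as much as a Green versus Black disagreement; quadratic weights would penalize distant disagreements more heavily.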
|
43
|
Ansari NN, Rahimi M, Naghdi S, Barzegar-Ganji Z, Hasson S, Moghimi E. Inter- and intra-rater reliability of the modified modified ashworth scale in the assessment of muscle spasticity in cerebral palsy: A preliminary study. J Pediatr Rehabil Med 2022; 15:151-158. [PMID: 35213334 DOI: 10.3233/prm-190648] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023] Open
Abstract
PURPOSE The aim of the study was to investigate the inter- and intra-rater reliability of the Modified Modified Ashworth Scale (MMAS) in the assessment of lower extremity spasticity in children with spastic cerebral palsy (CP). METHODS Fifteen children (10 boys) with a mean age of 8.7±3.4 years participated. Two physiotherapists rated the spasticity of the hip adductors, knee extensors, and ankle plantar flexors for inter-rater reliability. Each child was examined again by one of the physiotherapists (same physiotherapist for all of the children) for intra-rater reliability (mean interval = 7 days). A random sequence of raters and muscles tested was applied. RESULTS The intraclass correlation coefficients (ICCs) for individual muscle groups ranged from good to excellent (ICCagreement of 0.60-0.83). The ICC values for overall inter-rater (ICCagreement = 0.82) and intra-rater reliability (ICCagreement = 0.85) were excellent. CONCLUSION The MMAS showed excellent reliability for the assessment of lower extremity muscle spasticity in children with cerebral palsy. However, the results should be interpreted with caution due to the small sample size and the wide confidence intervals.
Affiliation(s)
- Noureddin Nakhostin Ansari
- Department of Physiotherapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
- Research Center for War-affected People, Tehran University of Medical Sciences, Tehran, Iran
| | - Maryam Rahimi
- Department of Physiotherapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
| | - Soofia Naghdi
- Department of Physiotherapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
- Research Center for War-affected People, Tehran University of Medical Sciences, Tehran, Iran
| | - Zahra Barzegar-Ganji
- Department of Physiotherapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
| | - Scott Hasson
- Department of Physical Therapy, Augusta University, Augusta, GA, USA
| | - Ehsan Moghimi
- Research Center for War-affected People, Tehran University of Medical Sciences, Tehran, Iran
|
44
|
Park E, Lee K, Han T, Nam HS. Agreement and Reliability Analysis of Machine Learning Scaling and Wireless Monitoring in the Assessment of Acute Proximal Weakness by Experts and Non-Experts: A Feasibility Study. J Pers Med 2022; 12:jpm12010020. [PMID: 35055335 PMCID: PMC8780198 DOI: 10.3390/jpm12010020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/02/2021] [Accepted: 12/17/2021] [Indexed: 12/15/2022] Open
Abstract
Assessing the symptoms of proximal weakness caused by neurological deficits requires the knowledge and experience of neurologists. Recent advances in machine learning and the Internet of Things have resulted in the development of automated systems that emulate physicians’ assessments. The application of those systems requires not only accuracy in the classification but also reliability regardless of users’ proficiency in the real environment for the clinical point-of-care and the personalized health management. This study provides an agreement and reliability analysis of using a machine learning-based scaling of Medical Research Council (MRC) proximal scores to evaluate proximal weakness by experts and non-experts. The system trains an ensemble learning model using the signals from sensors attached to the limbs of patients in a neurological intensive care unit. For the agreement analysis, we investigated the percent agreement of MRC proximal scores and Bland-Altman plots of kinematic features between the expert- and non-expert scaling. We also analyzed the intra-class correlation coefficients (ICCs) of kinematic features and Krippendorff’s alpha of the observers’ scaling for the reliability analysis. The mean percent agreement between the expert- and the non-expert scaling was 0.542 for manual scaling and 0.708 for autonomous scaling. The ICCs of kinematic features measured using sensors ranged from 0.742 to 0.850, whereas the Krippendorff’s alpha of manual scaling for the three observers was 0.275. The autonomous assessment system can be utilized by the caregivers, paramedics, or other observers during an emergency to evaluate acute stroke patients.
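The two agreement measures named above, percent agreement for categorical scores and Bland-Altman analysis for continuous features, can be sketched as follows. This is only an illustration under stated assumptions: the function names and all example values are hypothetical, and the limits of agreement use the conventional bias ± 1.96 × SD of the paired differences rather than anything specified in the abstract.

```python
from statistics import mean, stdev


def percent_agreement(scores_a, scores_b):
    """Fraction of items on which two observers assigned the same score."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("scores must be non-empty and of equal length")
    return sum(a == b for a, b in zip(scores_a, scores_b)) / len(scores_a)


def bland_altman_limits(values_a, values_b):
    """Return (bias, lower, upper): mean difference and 95% limits of agreement."""
    if len(values_a) != len(values_b) or len(values_a) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [a - b for a, b in zip(values_a, values_b)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread


# Hypothetical MRC-style scores from an expert and a non-expert; not study data.
expert_scores = [5, 4, 3, 4]
nonexpert_scores = [5, 3, 3, 4]
agreement = percent_agreement(expert_scores, nonexpert_scores)

# Hypothetical kinematic feature values from the two sessions; not study data.
expert_feature = [10.0, 12.0, 14.0, 16.0]
nonexpert_feature = [9.0, 13.0, 13.0, 17.0]
bias, lower, upper = bland_altman_limits(expert_feature, nonexpert_feature)
```

A Bland-Altman plot would then chart each pair's difference against its mean, with horizontal lines at `bias`, `lower`, and `upper`.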
Affiliation(s)
- Eunjeong Park
  - Integrative Research Center for Cerebrovascular and Cardiovascular Diseases, Yonsei University College of Medicine, Seoul 03722, Korea
- Kijeong Lee
  - Department of Neurology, National Health Insurance Service, Ilsan Hospital, Goyang 10444, Korea
- Taehwa Han
  - Health-IT Center, Yonsei University College of Medicine, Seoul 03722, Korea
- Hyo Suk Nam
  - Department of Neurology, Yonsei University College of Medicine, Seoul 03722, Korea
  - Correspondence: ; Tel.: +82-2-2228-1617
|
45
|
Schaeffer EK, Ponton E, Sankar WN, Kim HK, Kelley SP, Cundy PJ, Price CT, Clarke NM, Wedge JH, Mulpuri K. Interobserver and Intraobserver Reliability in the Salter Classification of Avascular Necrosis of the Femoral Head in Developmental Dysplasia of the Hip. J Pediatr Orthop 2022; 42:e59-e64. [PMID: 34889834 PMCID: PMC8663514 DOI: 10.1097/bpo.0000000000001979] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
Abstract
BACKGROUND Avascular necrosis (AVN) of the femoral head is a concerning complication that can result from treatments for developmental dysplasia of the hip (DDH). AVN can lead to degenerative osteoarthritis, persistent acetabular dysplasia, reduced function, and continuing hip pain. The incidence of AVN reported in the DDH literature is widely varied (0% to 73%). This variability may arise from lack of consensus on what constitutes true AVN in this patient population, and lack of clear criteria provided in studies reporting incidence rates. METHODS A multicentre, prospective database of infants diagnosed with DDH between 2010 and 2014 from 0 to 18 months of age was analyzed for patients treated by closed reduction (CR). Twelve pediatric orthopaedic surgeons completed 2 rounds of AVN assessments. Deidentified anteroposterior radiographs at most recent follow-up were provided to surgeons along with patient age at radiographic assessment, length of follow-up, and affected hip. Ten of 12 surgeons completed a third round of assessments where they were provided with 1 to 2 additional radiographs within the follow-up period. Radiographic criteria for total AVN described by Salter and colleagues were used. Surgeons rated the presence of AVN as "yes" or "no" and kappa values were calculated within and between rounds. RESULTS A total of 69 hips in 60 patients were assessed for AVN a median of 22 months (range: 12 to 36) post-CR. Interobserver kappa values for rounds 1, 2, and 3 were 0.52 (range: 0.11 to 0.90), 0.61 (range: 0.21 to 0.90), and 0.53 (range: 0.10 to 0.79), respectively. Intraobserver agreement for AVN diagnosis was an average of 0.72 (range: 0.31 to 0.96). CONCLUSIONS Despite using the most commonly referenced diagnostic criteria, radiographic diagnosis of AVN following CR in DDH patients demonstrated only moderate agreement across surgeons. The addition of sequential radiographs did not improve cross-observer reliability, and while substantial agreement was seen within observers, the range of intraobserver kappa values was large. LEVEL OF EVIDENCE Level I-diagnostic study.
Affiliation(s)
- Emily K. Schaeffer
  - Department of Orthopaedics, University of British Columbia
  - Department of Orthopaedic Surgery, BC Children’s Hospital
- Ethan Ponton
  - Department of Orthopaedics, University of British Columbia
  - Office of Pediatric Surgical Evaluation and Innovation, BC Children’s Hospital, University of British Columbia, Vancouver, BC
- Wudbhav N. Sankar
  - Division of Orthopaedics, The Children’s Hospital of Philadelphia, Philadelphia, PA
- Harry K.W. Kim
  - Center for Excellence in Hip Disorders, Texas Scottish Rite Hospital for Children, Dallas, TX
  - Department of Orthopaedic Surgery, University of Texas Southwestern Medical Center, Dallas, TX
- Peter J. Cundy
  - Centre for Orthopaedic and Trauma Research, The University of Adelaide
  - Department of Orthopaedic Surgery, Women’s and Children’s Hospital, Adelaide, SA, Australia
- Nicholas M.P. Clarke
  - Department of Pediatric Orthopaedic Surgery, Southampton Children’s Hospital
  - University of Southampton, Southampton, UK
- Kishore Mulpuri
  - Department of Orthopaedics, University of British Columbia
  - Department of Orthopaedic Surgery, BC Children’s Hospital
|
46
|
Charvolin L, Rippert P, Roche S, Rabilloud M, Morard MD, Marco JD, Dinomais M, Pouyfaucon M, Gimat R, Perennou D, Houx L, Iwaz J, Rode G, Vuillerot C. Determining the inter-rater reliability of the SOFMER Activity Score (version 2) for subjects in rehabilitation centers. Arch Phys Med Rehabil 2021; 103:1122-1130. [PMID: 34890563 DOI: 10.1016/j.apmr.2021.11.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2021] [Revised: 11/03/2021] [Accepted: 11/09/2021] [Indexed: 11/29/2022]
Abstract
OBJECTIVES To assess the inter-rater reliability of the SOFMER Activity Score (SAS, version 2, an 8-item [4 motor and 4 cognitive], 5-level scale) and improve its scoring system before conducting further validation steps. DESIGN Cross-sectional, prospective, observational, non-interventional, and multicentric study. SETTING The study was conducted between November 2018 and September 2019 in four French rehabilitation centers (two public university hospitals for adults and two private not-for-profit rehabilitation centers for children). PARTICIPANTS The study included 101 subjects (mean age: 44.5 years; SD: 25.4; 28.7% under 18 and 18.8% over 65). The female/male sex ratio was 0.6. The causes for admission to the center were mainly neurological (65%) or orthopedic (24%). INTERVENTIONS None. MAIN OUTCOME MEASURE Activity limitation was rated with the SOFMER Activity Score on the same day by two independent multidisciplinary teams. The inter-rater reliabilities of the Score items were assessed using weighted kappa coefficients. RESULTS All weighted kappa coefficients ranged between 0.83 and 0.92, indicating 'good' to 'excellent' inter-rater reliability. Inter-team score disagreements occurred in 227 scores out of 808 (28%). The reason for most disagreements was unnoticed human or material aid during the observation period. CONCLUSION The results demonstrate the high inter-rater reliability of the SASv2 and allow further validation steps to be carried out after minor changes to item scoring instructions and clearer definitions of some items that help improve scoring standardization. The SASv2 may then become a consistent measure of activity level for clinical research or burden of care investigations.
Affiliation(s)
- Lorraine Charvolin
  - Service de Médecine Physique et de Réadaptation Pédiatrique (L'Escale), Hôpital Femme-Mère-Enfant, Hospices Civils de Lyon, Bron, France
- Pascal Rippert
  - Service Recherche et Épidémiologie Clinique, Pôle santé publique, Hospices Civils de Lyon, Lyon, France
- Sylvain Roche
  - Université de Lyon, Lyon, France; Université Lyon 1, Villeurbanne, France; Hospices Civils de Lyon, Pôle Santé Publique, Service de Biostatistique-Bioinformatique, Lyon, France; CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Évolutive, Équipe Biostatistique-Santé, Villeurbanne, France
- Muriel Rabilloud
  - Université de Lyon, Lyon, France; Université Lyon 1, Villeurbanne, France; Hospices Civils de Lyon, Pôle Santé Publique, Service de Biostatistique-Bioinformatique, Lyon, France; CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Évolutive, Équipe Biostatistique-Santé, Villeurbanne, France
- Marie-Doriane Morard
  - Service de Médecine Physique et de Réadaptation Pédiatrique (L'Escale), Hôpital Femme-Mère-Enfant, Hospices Civils de Lyon, Bron, France
- Julie Di Marco
  - Service de Médecine Physique et Réadaptation, Hôpital Henry-Gabrielle, Hospices Civils de Lyon, Saint-Genis-Laval, France
- Mickael Dinomais
  - Département de Médecine Physique et Rééducation, Centre Hospitalier Universitaire, Angers, France
- Margaux Pouyfaucon
  - Service Médecine Physique et Rééducation Fonctionnelle, Centre Hospitalier Universitaire d'Angers, Angers, France; Centre de Rééducation et de Réadaptation Fonctionnelles Les Capucins, Angers, France; Service de Rééducation, Centre Hospitalier de Cholet, Cholet, France
- Rémi Gimat
  - Service Rééducation Neurologique, Hôpital Sud Centre Hospitalier Universitaire de Grenoble-Alpes, Echirolles, France; Laboratoire de Psychologie et Neurocognition (LPNC), Université Grenoble-Alpes, Grenoble, France
- Dominique Perennou
  - Service Rééducation Neurologique, Hôpital Sud Centre Hospitalier Universitaire de Grenoble-Alpes, Echirolles, France; Laboratoire de Psychologie et Neurocognition (LPNC), Université Grenoble-Alpes, Grenoble, France
- Laetitia Houx
  - Service de Médecine Physique et de Réadaptation, Centre Hospitalier Régional et Universitaire de Brest, Brest, France; Inserm UMR 1101, Laboratoire de Traitement de l'Information Médicale (LaTIM), Brest, France; Service de Médecine Physique et de Réadaptation Pédiatrique, Fondation Ildys, Brest, France
- Jean Iwaz
  - Université de Lyon, Lyon, France; Université Lyon 1, Villeurbanne, France; Hospices Civils de Lyon, Pôle Santé Publique, Service de Biostatistique-Bioinformatique, Lyon, France; CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Évolutive, Équipe Biostatistique-Santé, Villeurbanne, France
- Gilles Rode
  - Service de Médecine Physique et Réadaptation, Hôpital Henry-Gabrielle, Hospices Civils de Lyon, Saint-Genis-Laval, France; Integrative, Multisensory, Perception, Action and Cognition Team (IMPACT), Centre de Recherche en Neurosciences de Lyon (Inserm UMR-S, 1028, CNRS UMR 5292, Université Lyon 1, Université Saint-Etienne), Bron, France
- Carole Vuillerot
  - Service de Médecine Physique et de Réadaptation Pédiatrique (L'Escale), Hôpital Femme-Mère-Enfant, Hospices Civils de Lyon, Bron, France; Institut Neuromyogène, CNRS UMR 5310 - INSERM U1217, Université de Lyon, Lyon, France
|
47
|
Nguyen Huynh A, Besse C, Mediouni Z, El May E, Shoman Y, Hansez I, Guseva Canu I. Diagnostic Performances of an Occupational Burnout Detection Method Designed for Healthcare Professionals. Int J Environ Res Public Health 2021; 18:ijerph182312300. [PMID: 34886022 PMCID: PMC8657176 DOI: 10.3390/ijerph182312300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2021] [Revised: 11/09/2021] [Accepted: 11/19/2021] [Indexed: 11/16/2022]
Abstract
BACKGROUND We aimed to assess the validity (criterion and cross-cultural validity) and reliability of the first occupational burnout (OB) detection tool designed for healthcare professionals in Belgium in the context of Swiss medical practice. METHODS First, we assessed the sensitivity and specificity of the Tool. We developed this tool based on the consultation reports of 42 patients and compared its detection to the results of the Oldenburg Burnout Inventory (OLBI), filled in by patients before a consultation. Second, we performed an inter-rater reliability (IRR) assessment on the OB symptoms and detection reached by the Tool between a psychiatrist, two psychologists, and an occupational physician. RESULTS The Tool correctly identified over 80% of patients with OB, regardless of the cutoff value used for OLBI scores, reflecting its high sensitivity. Conversely, its specificity varied strongly depending on the OLBI cutoff. There was slight to fair overall agreement between the four raters on the detection of OB and the number of OB symptoms. Around 41% of symptoms showed substantial to almost perfect agreement, and 36% showed slight to moderate agreement. CONCLUSIONS The Tool seems useful for identifying OB of moderate and strong severity in both the Belgian and Swiss contexts.
Affiliation(s)
- Agathe Nguyen Huynh
  - Center for Primary Care and Public Health (Unisanté), Department of Occupational and Environmental Health, University of Lausanne, 1066 Lausanne, Switzerland; (A.N.H.); (Z.M.); (E.E.M.); (I.G.C.)
- Christine Besse
  - Medical Direction of the Department of Psychiatry, CHUV, Les Cèdres (Cery), 1008 Prilly, Switzerland
- Zakia Mediouni
  - Center for Primary Care and Public Health (Unisanté), Department of Occupational and Environmental Health, University of Lausanne, 1066 Lausanne, Switzerland; (A.N.H.); (Z.M.); (E.E.M.); (I.G.C.)
- Emna El May
  - Center for Primary Care and Public Health (Unisanté), Department of Occupational and Environmental Health, University of Lausanne, 1066 Lausanne, Switzerland; (A.N.H.); (Z.M.); (E.E.M.); (I.G.C.)
  - Faculty of Psychology and Educational Sciences, University of Geneva, 1205 Geneva, Switzerland
- Yara Shoman
  - Center for Primary Care and Public Health (Unisanté), Department of Occupational and Environmental Health, University of Lausanne, 1066 Lausanne, Switzerland; (A.N.H.); (Z.M.); (E.E.M.); (I.G.C.)
  - Correspondence: ; Tel.: +41-21-314-7413
- Isabelle Hansez
  - Unit of Promotion of Human Resources, Faculty of Psychology, Speech Therapy and Educational Sciences, University of Liège, 4000 Liège, Belgium
- Irina Guseva Canu
  - Center for Primary Care and Public Health (Unisanté), Department of Occupational and Environmental Health, University of Lausanne, 1066 Lausanne, Switzerland; (A.N.H.); (Z.M.); (E.E.M.); (I.G.C.)
|
48
|
Van der Plas D, Verbraecken J, Willemen M, Meert W, Davis J. Evaluation of Automated Hypnogram Analysis on Multi-Scored Polysomnographies. Front Digit Health 2021; 3:707589. [PMID: 34713177 PMCID: PMC8521900 DOI: 10.3389/fdgth.2021.707589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2021] [Accepted: 06/29/2021] [Indexed: 11/21/2022] Open
Abstract
A new method for automated sleep stage scoring of polysomnographies is proposed that uses a random forest approach to model feature interactions and temporal effects. The model mostly relies on features based on the rules from the American Academy of Sleep Medicine, which allows medical experts to gain insights into the model. A common way to evaluate automated approaches to constructing hypnograms is to compare the one produced by the algorithm to an expert's hypnogram. However, given the same data, two expert annotators will construct (slightly) different hypnograms due to differing interpretations of the data or individual mistakes. A thorough evaluation of our method is performed on a multi-labeled dataset in which both the inter-rater variability as well as the prediction uncertainties are taken into account, leading to a new standard for the evaluation of automated sleep stage scoring algorithms. On all epochs, our model achieves an accuracy of 82.7%, which is only slightly lower than the inter-rater disagreement. When only considering the 63.3% of the epochs where both the experts and algorithm are certain, the model achieves an accuracy of 97.8%. Transition periods between sleep stages are identified and studied for the first time. Scoring guidelines for medical experts are provided to complement the certain predictions by scoring only a few epochs manually. This makes the proposed method highly time-efficient while guaranteeing a highly accurate final hypnogram.
Affiliation(s)
- Dries Van der Plas
  - Onafhankelijke Software Groep (OSG bv), Micromed Group, Kontich, Belgium; Department of Computer Science, Leuven AI, KU Leuven, Leuven, Belgium; Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium
- Johan Verbraecken
  - Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium; Multidisciplinary Sleep Disorders Centre, Antwerp University Hospital, Antwerp, Belgium; Department of Pulmonary Medicine, Antwerp University Hospital, Antwerp, Belgium
- Marc Willemen
  - Multidisciplinary Sleep Disorders Centre, Antwerp University Hospital, Antwerp, Belgium
- Wannes Meert
  - Department of Computer Science, Leuven AI, KU Leuven, Leuven, Belgium
- Jesse Davis
  - Department of Computer Science, Leuven AI, KU Leuven, Leuven, Belgium
|
49
|
Dumas CM, Grajo LC. The Content Validity and Inter-Rater Reliability of the Occupational Therapy Pediatric Inventory of Cognitive Skills (OT-PICS): An Assessment Tool of Functional Cognition in Children. Occup Ther Health Care 2021; 36:84-100. [PMID: 34473001 DOI: 10.1080/07380577.2021.1972381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
The Occupational Therapy Pediatric Inventory of Cognitive Skills (OT-PICS) is being developed to evaluate functional cognition skills in children in the domains of play, educational participation, and self-care. This study aimed to determine the content validity and inter-rater reliability of the OT-PICS. Seven content experts agreed that all 15 items of the tool are essential items to examine functional cognition in children (k = 0.71-1.0; I-CVI = 0.71-1.0; S-CVI = 0.96). The OT-PICS also has moderate reliability (ICC = 0.63) between nine trained raters. The tool was then revised and refined for clarity based on therapists' comments and feedback.
Affiliation(s)
- Christina M Dumas
  - Programs in Occupational Therapy, Department of Rehabilitation and Regenerative Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Lenin C Grajo
  - Programs in Occupational Therapy, Department of Rehabilitation and Regenerative Medicine, Columbia University Irving Medical Center, New York, NY, USA
|
50
|
Jager NW, Newig J, Challies E, Kochskämper E, von Wehrden H. Case study meta-analysis in the social sciences. Insights on data quality and reliability from a large-N case survey. Res Synth Methods 2021; 13:12-27. [PMID: 34318609 DOI: 10.1002/jrsm.1514] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/09/2020] [Revised: 04/28/2021] [Accepted: 07/10/2021] [Indexed: 11/06/2022]
Abstract
Meta-analytical methods face particular challenges in research fields such as social and political research, where studies often rest primarily on qualitative and case study research. In such contexts, where research findings are less standardized and amenable to structured synthesis, the case survey method has been proposed as a means of data generation and analysis. The method offers a meta-analytical tool to synthesize larger numbers of qualitative case studies, yielding data amenable to large-N analysis. However, resulting data is prone to specific threats to validity, including biases due to publication type, rater behaviour, and variable characteristics, which researchers need to be aware of. While these biases are well known in theory, and typically explored for primary research, their prevalence in case survey meta-analyses remains relatively unexplored. We draw on a case survey of 305 published qualitative case studies of public environmental decision-making, and systematically analyze these biases in the resultant data. Our findings indicate that case surveys can deliver high-quality and reliable results. However, we also find that these biases do indeed occur, albeit to a small degree or under specific conditions of complexity. We identify a number of design choices to mitigate biases that may threaten validity in case survey meta-analysis. Our findings are of importance to those using the case survey method - and to those who might apply insights derived by this method to inform policy and practice.
Affiliation(s)
- Nicolas W Jager
  - Research Group on Ecological Economics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Jens Newig
  - Research Group Governance and Sustainability, Leuphana University of Lüneburg, Lüneburg, Germany
- Edward Challies
  - Research Group Governance and Sustainability, Leuphana University of Lüneburg, Lüneburg, Germany; Waterways Centre for Freshwater Management, University of Canterbury, Christchurch, New Zealand
- Elisa Kochskämper
  - IRS Leibniz Institute for Research on Society and Space, Erkner, Germany
- Henrik von Wehrden
  - Faculty of Sustainability, Leuphana University of Lüneburg, Lüneburg, Germany
|