1
Chen Z, Hu B, Liu X, Becker B, Eickhoff SB, Miao K, Gu X, Tang Y, Dai X, Li C, Leonov A, Xiao Z, Feng Z, Chen J, Chuan-Peng H. Sampling inequalities affect generalization of neuroimaging-based diagnostic classifiers in psychiatry. BMC Med 2023; 21:241. [PMID: 37400814] [DOI: 10.1186/s12916-023-02941-4] [Received: 02/10/2023] [Accepted: 06/13/2023]
Abstract
BACKGROUND The development of machine learning models to aid the diagnosis of mental disorders is recognized as a significant breakthrough in psychiatry. However, translating such models into clinical practice remains a challenge, with poor generalizability being a major limitation. METHODS Here, we conducted a pre-registered meta-research assessment of neuroimaging-based models in the psychiatric literature, quantitatively examining global and regional sampling issues over recent decades from a relatively underexplored perspective. A total of 476 studies (n = 118,137) were included in the current assessment. Based on these findings, we built a comprehensive 5-star rating system to quantitatively evaluate the quality of existing machine learning models for psychiatric diagnosis. RESULTS We quantitatively revealed a global sampling inequality in these models (sampling Gini coefficient (G) = 0.81, p < .01), which varied across countries (regions) (e.g., China, G = 0.47; the USA, G = 0.58; Germany, G = 0.78; the UK, G = 0.87). Furthermore, the severity of this sampling inequality was significantly predicted by national economic level (β = - 2.75, p < .001, R2adj = 0.40; r = - .84, 95% CI: - .41 to - .97), and plausibly predicted model performance, with greater sampling inequality associated with higher reported classification accuracy. Further analyses showed that lack of independent testing (84.24% of models, 95% CI: 81.0-87.5%), improper cross-validation (51.68% of models, 95% CI: 47.2-56.2%), and poor technical transparency (87.8% of models, 95% CI: 84.9-90.8%)/availability (80.88% of models, 95% CI: 77.3-84.4%) remain prevalent in current diagnostic classifiers despite improvements over time. Consistent with these observations, model performance was lower in studies with independent cross-country sampling validation (all p < .001, BF10 > 15).
In light of this, we proposed a purpose-built quantitative assessment checklist, which showed that the overall ratings of these models increased with publication year but were negatively associated with model performance. CONCLUSIONS Together, improving economic equality in sampling, and hence the quality of machine learning models, may be a crucial facet of plausibly translating neuroimaging-based diagnostic classifiers into clinical practice.
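The study summarizes sampling concentration with a Gini coefficient. A minimal sketch of the standard sorted-weights (mean-absolute-difference) formula; the per-site sample counts below are hypothetical illustrations, not data from the study:

```python
def gini(counts):
    """Gini coefficient of non-negative values: 0 = perfect equality, 1 = maximal inequality.
    Uses the sorted-values identity G = sum_i (2i - n - 1) * x_i / (n * sum(x))."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)

# Hypothetical recruitment counts across five sites
print(gini([1000, 50, 30, 10, 5]))      # concentrated sampling -> high G (~0.74)
print(gini([200, 200, 200, 200, 200]))  # uniform sampling -> G = 0.0
```

The same formula applied to per-country (or per-region) sample counts yields the national sampling Gini coefficients reported above.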
Affiliation(s)
- Zhiyi Chen
- Experimental Research Center for Medical and Psychological Science (ERC-MPS), School of Psychology, Third Military Medical University, Chongqing, China.
- Faculty of Psychology, Southwest University, Chongqing, China.
- Bowen Hu
- Faculty of Psychology, Southwest University, Chongqing, China
- Xuerong Liu
- Experimental Research Center for Medical and Psychological Science (ERC-MPS), School of Psychology, Third Military Medical University, Chongqing, China
- Benjamin Becker
- The Center of Psychosomatic Medicine, Sichuan Provincial Center for Mental Health, Sichuan Provincial People's Hospital, Chengdu, China
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, China
- Simon B Eickhoff
- Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Kuan Miao
- Experimental Research Center for Medical and Psychological Science (ERC-MPS), School of Psychology, Third Military Medical University, Chongqing, China
- Xingmei Gu
- Experimental Research Center for Medical and Psychological Science (ERC-MPS), School of Psychology, Third Military Medical University, Chongqing, China
- Yancheng Tang
- School of Business and Management, Shanghai International Studies University, Shanghai, China
- Xin Dai
- Faculty of Psychology, Southwest University, Chongqing, China
- Chao Li
- Department of Radiology, The Third Affiliated Hospital, Sun Yat-Sen University, Guangdong, China
- Artemiy Leonov
- School of Psychology, Clark University, Worcester, MA, USA
- Zhibing Xiao
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
- Zhengzhi Feng
- Experimental Research Center for Medical and Psychological Science (ERC-MPS), School of Psychology, Third Military Medical University, Chongqing, China
- Ji Chen
- Department of Psychology and Behavioral Sciences, Zhejiang University, Hangzhou, China
- Department of Psychiatry, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Hu Chuan-Peng
- School of Psychology, Nanjing Normal University, Nanjing, China
2
Chen Z, Liu X, Yang Q, Wang YJ, Miao K, Gong Z, Yu Y, Leonov A, Liu C, Feng Z, Chuan-Peng H. Evaluation of Risk of Bias in Neuroimaging-Based Artificial Intelligence Models for Psychiatric Diagnosis: A Systematic Review. JAMA Netw Open 2023; 6:e231671. [PMID: 36877519] [PMCID: PMC9989906] [DOI: 10.1001/jamanetworkopen.2023.1671]
Abstract
IMPORTANCE Neuroimaging-based artificial intelligence (AI) diagnostic models have proliferated in psychiatry. However, their clinical applicability and reporting quality (ie, feasibility) for clinical practice have not been systematically evaluated. OBJECTIVE To systematically assess the risk of bias (ROB) and reporting quality of neuroimaging-based AI models for psychiatric diagnosis. EVIDENCE REVIEW PubMed was searched for peer-reviewed, full-length articles published between January 1, 1990, and March 16, 2022. Studies aimed at developing or validating neuroimaging-based AI models for clinical diagnosis of psychiatric disorders were included. Reference lists were further searched for suitable original studies. Data extraction followed the CHARMS (Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies) and PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) guidelines. A closed-loop cross-sequential design was used for quality control. The PROBAST (Prediction Model Risk of Bias Assessment Tool) and modified CLEAR (Checklist for Evaluation of Image-Based Artificial Intelligence Reports) benchmarks were used to systematically evaluate ROB and reporting quality. FINDINGS A total of 517 studies presenting 555 AI models were included and evaluated. Of these models, 461 (83.1%; 95% CI, 80.0%-86.2%) were rated as having a high overall ROB based on the PROBAST. ROB was particularly high in the analysis domain, including inadequate sample size (398 of 555 models [71.7%; 95% CI, 68.0%-75.6%]), poor examination of model performance (with 100% of models lacking calibration examination), and failure to handle data complexity (550 of 555 models [99.1%; 95% CI, 98.3%-99.9%]). None of the AI models was considered applicable to clinical practice.
Overall reporting completeness (ie, number of reported items/number of total items) for the AI models was 61.2% (95% CI, 60.6%-61.8%), and completeness was poorest for the technical assessment domain at 39.9% (95% CI, 38.8%-41.1%). CONCLUSIONS AND RELEVANCE This systematic review found that the clinical applicability and feasibility of neuroimaging-based AI models for psychiatric diagnosis were challenged by a high ROB and poor reporting quality. ROB in AI diagnostic models, particularly in the analysis domain, should be addressed before clinical application.
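The binomial proportions reported above (eg, 461 of 555 models rated high overall ROB: 83.1%; 95% CI, 80.0%-86.2%) are closely reproduced by a normal-approximation (Wald) interval; whether the review used exactly this interval type is an assumption here, and small differences can arise from intermediate rounding. A minimal sketch:

```python
import math

def wald_ci(k, n, z=1.96):
    """Point estimate and normal-approximation (Wald) 95% CI for a binomial proportion k/n."""
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, p - half, p + half

# Models rated as having a high overall ROB
p, lo, hi = wald_ci(461, 555)
print(f"{p:.1%} (95% CI, {lo:.1%}-{hi:.1%})")
```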
Affiliation(s)
- Zhiyi Chen
- School of Psychology, Third Military Medical University, Chongqing, China
- Experimental Research Center for Medical and Psychological Science, Third Military Medical University, Chongqing, China
- Xuerong Liu
- School of Psychology, Third Military Medical University, Chongqing, China
- Experimental Research Center for Medical and Psychological Science, Third Military Medical University, Chongqing, China
- Qingwu Yang
- Department of Neurology, Daping Hospital, Third Military Medical University, Chongqing, China
- Yan-Jiang Wang
- Department of Neurology, Daping Hospital, Third Military Medical University, Chongqing, China
- Kuan Miao
- School of Psychology, Third Military Medical University, Chongqing, China
- Experimental Research Center for Medical and Psychological Science, Third Military Medical University, Chongqing, China
- Zheng Gong
- School of Psychology, Third Military Medical University, Chongqing, China
- Experimental Research Center for Medical and Psychological Science, Third Military Medical University, Chongqing, China
- Yang Yu
- School of Psychology, Third Military Medical University, Chongqing, China
- Artemiy Leonov
- Department of Psychology, Clark University, Worcester, Massachusetts
- Chunlei Liu
- School of Psychology, Qufu Normal University, Qufu, China
- Zhengzhi Feng
- School of Psychology, Third Military Medical University, Chongqing, China
- Experimental Research Center for Medical and Psychological Science, Third Military Medical University, Chongqing, China
- Hu Chuan-Peng
- School of Psychology, Nanjing Normal University, Nanjing, China
3
Abramov DM, Miranda de Sá AMFL. Probability waves: Adaptive cluster-based correction by convolution of p-value series from mass univariate analysis. J Neurosci Methods 2021; 357:109155. [PMID: 33781790] [DOI: 10.1016/j.jneumeth.2021.109155] [Received: 09/20/2020] [Revised: 02/09/2021] [Accepted: 03/20/2021]
Abstract
BACKGROUND Methods for p-value correction are criticized for either increasing Type II error or improperly reducing Type I error in large exploratory data analyses. This work considers patterns in the probability vectors resulting from mass univariate analysis to correct p-values, where clusters of significant p-values may indicate true H0 rejection. NEW METHOD We used ERP experimental data from control and ADHD boys to test the method. The log10 of the p-vector was convolved with a Gaussian window whose length was set as the shortest lag above which the autocorrelation of each ERP wave may be assumed to have vanished. We ran Monte Carlo (MC) simulations to (1) evaluate confidence intervals of rejected and non-rejected areas of our data, (2) evaluate differences between corrected, uncorrected, and simulated p-vectors in terms of the distribution of significant p-values, and (3) empirically verify the Type I error rate (comparing 10,000 pairs of mixed samples with control and ADHD subjects). RESULTS The differences between the simulated or raw p-vector and the corrected p-vectors were, respectively, minimal and maximal when the window length for the p-vector convolution was set by autocorrelation. COMPARISON WITH EXISTING METHODS Our method was less conservative, while FDR methods rejected essentially all significant p-values. The MC simulations presented 2.78 ± 4.83% difference (20 channels) from the corrected p-vector, while the difference from the raw p-vector was 596 ± 5.00% (p = 0.0003). CONCLUSION As a cluster-based correction, the present method seems biologically and statistically suitable for correcting p-values in mass univariate analysis of ERP waves, as it adopts adaptive correction parameters.
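The core operation described in the abstract, smoothing the log10 p-value series with a unit-area Gaussian window so that isolated significant p-values are attenuated while clusters survive, can be sketched as follows. This is a reimplementation of the smoothing step only, not the authors' code: the window length is a free parameter here (in the method it is derived from the ERP autocorrelation), and the sigma choice and edge renormalization are assumptions.

```python
import math

def gaussian_window(length, sigma=None):
    """Unit-area Gaussian window of a given length."""
    if sigma is None:
        sigma = length / 6.0  # assumption: +/-3 sigma spans the window
    mid = (length - 1) / 2.0
    w = [math.exp(-0.5 * ((i - mid) / sigma) ** 2) for i in range(length)]
    s = sum(w)
    return [v / s for v in w]

def correct_pvector(p_values, window_len):
    """Convolve log10(p) with a Gaussian window and map back to p.
    Edge samples are renormalized over the in-range part of the window."""
    logp = [math.log10(p) for p in p_values]
    w = gaussian_window(window_len)
    half = window_len // 2
    out = []
    for t in range(len(logp)):
        acc = norm = 0.0
        for k, wk in enumerate(w):
            j = t + k - half
            if 0 <= j < len(logp):
                acc += wk * logp[j]
                norm += wk
        out.append(10 ** (acc / norm))
    return out

# An isolated significant p is pulled toward its non-significant neighbors,
# while the center of a cluster of significant p-values stays significant.
isolated = correct_pvector([0.5] * 12 + [0.001] + [0.5] * 12, 5)[12]
cluster = correct_pvector([0.5] * 10 + [0.001] * 5 + [0.5] * 10, 5)[12]
```

The full method additionally derives rejection thresholds and confidence intervals from Monte Carlo simulation, which is not sketched here.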
Affiliation(s)
- Dimitri Marques Abramov
- Laboratory of Neurobiology and Clinical Neurophysiology, National Institute of Women, Children and Adolescents Health Fernandes Figueira, Oswaldo Cruz Foundation, Rio de Janeiro, RJ, Brazil.
4
Abramov DM, Cunha CQ, Galhanone PR, Alvim RJ, de Oliveira AM, Lazarev VV. Neurophysiological and behavioral correlates of alertness impairment and compensatory processes in ADHD evidenced by the Attention Network Test. PLoS One 2019; 14:e0219472. [PMID: 31344047] [PMCID: PMC6657843] [DOI: 10.1371/journal.pone.0219472] [Received: 02/12/2019] [Accepted: 06/24/2019]
Abstract
In Attention Deficit Hyperactivity Disorder (ADHD), fMRI studies show asymmetric alterations: widespread hypoactivation in anterior cortical areas and hyperactivation in some posterior regions, the latter considered to be related to compensatory processes. In Posner's attentional networks, an important role is attributed to functional interhemispheric asymmetries. The psychophysiological Attention Network Test (ANT), which measures the efficiency of the alerting, orienting, and executive networks, seems particularly informative for ADHD. Potentials related to ANT stimuli (ANT-RPs) have revealed a reduced cognitive potential P3 in ADHD. However, there are no studies of the asymmetry of ANT-RPs. In the present study, conducted with 20 typically developing boys and 19 boys with ADHD, aged 11–13 years, the efficiency of the three Posner networks, regarding performance and amplitude asymmetries in ANT-RPs, was evaluated according to the arithmetic difference of these parameters between different cue and target presentation conditions. The results were correlated with Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR) scores. Regarding accuracy and intraindividual variation in reaction time, ADHD subjects showed lower efficiency of the executive and alerting networks, and this effect correlated with DSM scores. Regarding the alerting network, ANT-RPs in ADHD lacked the right-side amplitude prevalence in the temporal regions that was observed in controls. In all ANT conditions, significantly higher asymmetries were observed in ADHD than in controls in the occipital regions 40–200 ms after target onset. Their amplitude in ADHD subjects was inversely proportional to DSM scores of inattentiveness and directly proportional to accuracy and efficiency of the executive network. The results suggest impaired alerting and executive networks in ADHD and compensatory occipital mechanisms.
Affiliation(s)
- Dimitri M. Abramov
- Laboratory of Neurobiology and Clinical Neurophysiology, National Institute of Women, Children and Adolescents Health Fernandes Figueira, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
- Carla Quero Cunha
- Laboratory of Neurobiology and Clinical Neurophysiology, National Institute of Women, Children and Adolescents Health Fernandes Figueira, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
- Paulo Ricardo Galhanone
- Laboratory of Neurobiology and Clinical Neurophysiology, National Institute of Women, Children and Adolescents Health Fernandes Figueira, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
- Renata Joviano Alvim
- Laboratory of Neurobiology and Clinical Neurophysiology, National Institute of Women, Children and Adolescents Health Fernandes Figueira, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
- Andrei Mayer de Oliveira
- Department of Physiological Sciences, Center of Biological Sciences, Federal University of Santa Catarina, Florianopolis, Brazil
- Vladimir V. Lazarev
- Laboratory of Neurobiology and Clinical Neurophysiology, National Institute of Women, Children and Adolescents Health Fernandes Figueira, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil