1. Sanchez T, Esteban O, Gomez Y, Pron A, Koob M, Dunet V, Girard N, Jakab A, Eixarch E, Auzias G, Bach Cuadra M. FetMRQC: A robust quality control system for multi-centric fetal brain MRI. Med Image Anal 2024;97:103282. PMID: 39053168. DOI: 10.1016/j.media.2024.103282.
Abstract
Fetal brain MRI is becoming an increasingly relevant complement to neurosonography for perinatal diagnosis, allowing fundamental insights into fetal brain development throughout gestation. However, uncontrolled fetal motion and heterogeneity in acquisition protocols lead to data of variable quality, potentially biasing the outcome of subsequent studies. We present FetMRQC, an open-source machine-learning framework for automated image quality assessment and quality control that is robust to domain shifts induced by the heterogeneity of clinical data. FetMRQC extracts an ensemble of quality metrics from unprocessed anatomical MRI and combines them to predict experts' ratings using random forests. We validate our framework on an unprecedentedly large and diverse dataset of more than 1600 manually rated fetal brain T2-weighted images from four clinical centers and 13 different scanners. Our study shows that FetMRQC's predictions generalize well to unseen data while remaining interpretable. FetMRQC is a step towards more robust fetal brain neuroimaging, with the potential to shed new light on the developing human brain.
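The prediction step described in this abstract, combining image quality metrics (IQMs) to predict expert ratings with a random forest while validating across centers, can be sketched as follows. The IQM values, ratings, and site labels here are synthetic placeholders, not FetMRQC's actual feature set or data:

```python
# Minimal sketch of the FetMRQC idea: predict expert quality ratings from
# an ensemble of image quality metrics (IQMs) with a random forest,
# validated with site-aware (leave-site-out style) cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_predict

rng = np.random.default_rng(0)
n_images, n_iqms = 200, 10
iqms = rng.normal(size=(n_images, n_iqms))            # one row per image/stack
# Synthetic "expert" ratings driven by two of the IQMs plus noise.
ratings = iqms[:, 0] - 0.5 * iqms[:, 1] + rng.normal(scale=0.3, size=n_images)
sites = rng.integers(0, 4, size=n_images)             # acquisition center labels

model = RandomForestRegressor(n_estimators=200, random_state=0)
# Grouping folds by site mimics generalization to an unseen clinical center.
pred = cross_val_predict(model, iqms, ratings,
                         cv=GroupKFold(n_splits=4), groups=sites)
print(round(np.corrcoef(pred, ratings)[0, 1], 2))     # agreement with ratings
```

Random forests also expose `feature_importances_`, which is one route to the interpretability the abstract mentions.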
Affiliation(s)
- Thomas Sanchez
- CIBM - Center for Biomedical Imaging, Switzerland; Department of Diagnostic and Interventional Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- Oscar Esteban
- Department of Diagnostic and Interventional Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- Yvan Gomez
- BCNatal Fetal Medicine Research Center (Hospital Clínic and Hospital Sant Joan de Déu), Universitat de Barcelona, Spain; Department Woman-Mother-Child, CHUV, Lausanne, Switzerland
- Alexandre Pron
- Aix-Marseille Université, CNRS, Institut de Neurosciences de La Timone, Marseilles, France
- Mériam Koob
- Department of Diagnostic and Interventional Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- Vincent Dunet
- Department of Diagnostic and Interventional Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- Nadine Girard
- Aix-Marseille Université, CNRS, Institut de Neurosciences de La Timone, Marseilles, France; Service de Neuroradiologie Diagnostique et Interventionnelle, Hôpital Timone, AP-HM, Marseilles, France
- Andras Jakab
- Center for MR Research, University Children's Hospital Zurich, University of Zurich, Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich, Zurich, Switzerland; Research Priority Project Adaptive Brain Circuits in Development and Learning (AdaBD), University of Zürich, Zurich, Switzerland
- Elisenda Eixarch
- BCNatal Fetal Medicine Research Center (Hospital Clínic and Hospital Sant Joan de Déu), Universitat de Barcelona, Spain; IDIBAPS and CIBERER, Barcelona, Spain
- Guillaume Auzias
- Aix-Marseille Université, CNRS, Institut de Neurosciences de La Timone, Marseilles, France
- Meritxell Bach Cuadra
- CIBM - Center for Biomedical Imaging, Switzerland; Department of Diagnostic and Interventional Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
2. Gross M, Huber S, Arora S, Ze'evi T, Haider SP, Kucukkaya AS, Iseke S, Kuhn TN, Gebauer B, Michallek F, Dewey M, Vilgrain V, Sartoris R, Ronot M, Jaffe A, Strazzabosco M, Chapiro J, Onofrey JA. Automated MRI liver segmentation for anatomical segmentation, liver volumetry, and the extraction of radiomics. Eur Radiol 2024;34:5056-5065. PMID: 38217704. PMCID: PMC11245591. DOI: 10.1007/s00330-023-10495-5.
Abstract
OBJECTIVES To develop and evaluate a deep convolutional neural network (DCNN) for automated liver segmentation, volumetry, and radiomic feature extraction on contrast-enhanced portal venous phase magnetic resonance imaging (MRI). MATERIALS AND METHODS This retrospective study included hepatocellular carcinoma patients from an institutional database with portal venous MRI. After manual segmentation, the data were randomly split into independent training, validation, and internal testing sets. De-identified scans from a collaborating institution were used for external testing, and the public LiverHccSeg dataset was used for further external validation. A 3D DCNN was trained to automatically segment the liver. Segmentation accuracy was quantified by the Dice similarity coefficient (DSC) with respect to manual segmentation. A Mann-Whitney U test was used to compare the internal and external test sets. Agreement of volumetry and radiomic features was assessed using the intraclass correlation coefficient (ICC). RESULTS In total, 470 patients met the inclusion criteria (63.9±8.2 years; 376 males) and 20 patients were used for external validation (41±12 years; 13 males). DSC segmentation accuracy of the DCNN was similarly high between the internal (0.97±0.01) and external (0.96±0.03) test sets (p=0.28), and the network demonstrated robust segmentation performance on public testing (0.93±0.03). Agreement of liver volumetry was satisfactory in the internal (ICC, 0.99), external (ICC, 0.97), and public (ICC, 0.85) test sets. Radiomic features demonstrated excellent agreement in the internal (mean ICC, 0.98±0.04), external (mean ICC, 0.94±0.10), and public (mean ICC, 0.91±0.09) datasets. CONCLUSION Automated liver segmentation yields robust and generalizable segmentation performance on MRI data and can be used for volumetry and radiomic feature extraction.
CLINICAL RELEVANCE STATEMENT Liver volumetry, anatomic localization, and extraction of quantitative imaging biomarkers require accurate segmentation, but manual segmentation is time-consuming. A deep convolutional neural network demonstrates fast and accurate segmentation performance on T1-weighted portal venous MRI.
KEY POINTS
• This deep convolutional neural network yields robust and generalizable liver segmentation performance on internal, external, and public testing data.
• Automated liver volumetry demonstrated excellent agreement with manual volumetry.
• Automated liver segmentations can be used for robust and reproducible radiomic feature extraction.
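The Dice similarity coefficient used above as the segmentation accuracy metric is simple to state: twice the overlap of two binary masks divided by their total size. A minimal NumPy sketch with toy masks (illustrative only, not the study's data):

```python
# Dice similarity coefficient (DSC) between two binary masks.
import numpy as np

def dice(a, b):
    """DSC = 2*|A intersect B| / (|A| + |B|) for binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / denom

auto = np.zeros((4, 4), dtype=int)
auto[1:3, 1:3] = 1         # "automated" mask: 4 voxels
manual = np.zeros((4, 4), dtype=int)
manual[1:3, 1:4] = 1       # "manual" mask: 6 voxels, 4 shared with `auto`
print(dice(auto, manual))  # → 0.8
```

The same formula extends unchanged to 3D volumes, as used for the liver masks in the study.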
Affiliation(s)
- Moritz Gross
- Department of Radiology and Biomedical Imaging, Yale University School of Medicine, New Haven, CT, USA
- Charité Center for Diagnostic and Interventional Radiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Steffen Huber
- Department of Radiology and Biomedical Imaging, Yale University School of Medicine, New Haven, CT, USA
- Sandeep Arora
- Department of Radiology and Biomedical Imaging, Yale University School of Medicine, New Haven, CT, USA
- Tal Ze'evi
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Stefan P Haider
- Department of Radiology and Biomedical Imaging, Yale University School of Medicine, New Haven, CT, USA
- Department of Otorhinolaryngology, University Hospital of Ludwig Maximilians Universität München, Munich, Germany
- Ahmet S Kucukkaya
- Department of Radiology and Biomedical Imaging, Yale University School of Medicine, New Haven, CT, USA
- Charité Center for Diagnostic and Interventional Radiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Simon Iseke
- Department of Radiology and Biomedical Imaging, Yale University School of Medicine, New Haven, CT, USA
- Department of Diagnostic and Interventional Radiology, Pediatric Radiology and Neuroradiology, Rostock University Medical Center, Rostock, Germany
- Tom Niklas Kuhn
- Department of Radiology and Biomedical Imaging, Yale University School of Medicine, New Haven, CT, USA
- Department of Diagnostic and Interventional Radiology, University Duesseldorf, Duesseldorf, Germany
- Bernhard Gebauer
- Charité Center for Diagnostic and Interventional Radiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Florian Michallek
- Charité Center for Diagnostic and Interventional Radiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Marc Dewey
- Charité Center for Diagnostic and Interventional Radiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Valérie Vilgrain
- Université Paris Cité, Île-de-France, Paris, France
- Department of Radiology, Hôpital Beaujon, AP-HP.Nord, Clichy, Île-de-France, France
- Riccardo Sartoris
- Université Paris Cité, Île-de-France, Paris, France
- Department of Radiology, Hôpital Beaujon, AP-HP.Nord, Clichy, Île-de-France, France
- Maxime Ronot
- Université Paris Cité, Île-de-France, Paris, France
- Department of Radiology, Hôpital Beaujon, AP-HP.Nord, Clichy, Île-de-France, France
- Ariel Jaffe
- Department of Internal Medicine, Yale University School of Medicine, New Haven, CT, USA
- Mario Strazzabosco
- Department of Internal Medicine, Yale University School of Medicine, New Haven, CT, USA
- Julius Chapiro
- Department of Radiology and Biomedical Imaging, Yale University School of Medicine, New Haven, CT, USA
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- John A Onofrey
- Department of Radiology and Biomedical Imaging, Yale University School of Medicine, New Haven, CT, USA
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Department of Urology, Yale University School of Medicine, New Haven, CT, USA
3. Rosenblatt M, Tejavibulya L, Sun H, Camp CC, Khaitova M, Adkinson BD, Jiang R, Westwater ML, Noble S, Scheinost D. Power and reproducibility in the external validation of brain-phenotype predictions. Nat Hum Behav 2024 [Epub ahead of print]. PMID: 39085406. DOI: 10.1038/s41562-024-01931-7.
Abstract
Brain-phenotype predictive models seek to identify reproducible and generalizable brain-phenotype associations. External validation, or the evaluation of a model in external datasets, is the gold standard in evaluating the generalizability of models in neuroimaging. Unlike typical studies, external validation involves two sample sizes: the training and the external sample sizes. Thus, traditional power calculations may not be appropriate. Here we ran over 900 million resampling-based simulations in functional and structural connectivity data to investigate the relationship between training sample size, external sample size, phenotype effect size, theoretical power and simulated power. Our analysis included a wide range of datasets: the Healthy Brain Network, the Adolescent Brain Cognitive Development Study, the Human Connectome Project (Development and Young Adult), the Philadelphia Neurodevelopmental Cohort, the Queensland Twin Adolescent Brain Project, and the Chinese Human Connectome Project; and phenotypes: age, body mass index, matrix reasoning, working memory, attention problems, anxiety/depression symptoms and relational processing. High effect size predictions achieved adequate power with training and external sample sizes of a few hundred individuals, whereas low and medium effect size predictions required hundreds to thousands of training and external samples. In addition, most previous external validation studies used sample sizes prone to low power, and theoretical power curves should be adjusted for the training sample size. Furthermore, model performance in internal validation often informed subsequent external validation performance (Pearson's r difference <0.2), particularly for well-harmonized datasets. These results could help decide how to power future external validation studies.
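The resampling logic behind "simulated power" in this abstract can be sketched roughly as follows: repeatedly draw a training and an external sample, fit a model, and count how often the prediction-phenotype association is significant in the external sample. The linear model, effect encoding, and sample sizes below are illustrative assumptions, not the paper's exact simulation design:

```python
# Rough sketch of resampling-based power estimation for external validation.
import numpy as np
from scipy import stats
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def simulated_power(n_train, n_ext, effect=0.3, n_sims=100, alpha=0.05):
    """Fraction of simulations where the external prediction-phenotype
    correlation is positive and significant at `alpha`."""
    hits = 0
    for _ in range(n_sims):
        # Phenotype carries a single true effect of size `effect`.
        X = rng.normal(size=(n_train + n_ext, 20))
        y = effect * X[:, 0] + np.sqrt(1 - effect**2) * rng.normal(size=n_train + n_ext)
        model = Ridge().fit(X[:n_train], y[:n_train])
        r, p = stats.pearsonr(model.predict(X[n_train:]), y[n_train:])
        hits += (r > 0) and (p < alpha)
    return hits / n_sims

print(simulated_power(n_train=300, n_ext=100))
```

Unlike a textbook power calculation, this makes explicit that power depends on both sample sizes: shrinking `n_train` weakens the model, shrinking `n_ext` weakens the test.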
Affiliation(s)
- Matthew Rosenblatt
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Link Tejavibulya
- Interdepartmental Neuroscience Program, Yale University, New Haven, CT, USA
- Huili Sun
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Chris C Camp
- Interdepartmental Neuroscience Program, Yale University, New Haven, CT, USA
- Milana Khaitova
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
- Brendan D Adkinson
- Interdepartmental Neuroscience Program, Yale University, New Haven, CT, USA
- Rongtao Jiang
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
- Margaret L Westwater
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
- Stephanie Noble
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
- Department of Bioengineering, Northeastern University, Boston, MA, USA
- Department of Psychology, Northeastern University, Boston, MA, USA
- Dustin Scheinost
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Interdepartmental Neuroscience Program, Yale University, New Haven, CT, USA
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
- Child Study Center, Yale School of Medicine, New Haven, CT, USA
- Department of Statistics and Data Science, Yale University, New Haven, CT, USA
4. Faust L, Wilson P, Asai S, Fu S, Liu H, Ruan X, Storlie C. Considerations for Quality Control Monitoring of Machine Learning Models in Clinical Practice. JMIR Med Inform 2024;12:e50437. PMID: 38941140. PMCID: PMC11245651. DOI: 10.2196/50437.
Abstract
Integrating machine learning (ML) models into clinical practice presents a challenge of maintaining their efficacy over time. While existing literature offers valuable strategies for detecting declining model performance, there is a need to document the broader challenges and solutions associated with the real-world development and integration of model monitoring solutions. This work details the development and use of a platform for monitoring the performance of a production-level ML model operating at Mayo Clinic. In this paper, we aimed to provide a series of considerations and guidelines necessary for integrating such a platform into a team's technical infrastructure and workflow. We have documented our experiences with this integration process, discussed the broader challenges encountered with real-world implementation and maintenance, and included the source code for the platform. Our monitoring platform was built as an R Shiny application, developed and implemented over the course of 6 months. The platform has been used and maintained for 2 years and is still in use as of July 2023. The considerations necessary for the implementation of the monitoring platform center on 4 pillars: feasibility (what resources can be used for platform development?); design (through what statistics or models will the model be monitored, and how will these results be efficiently displayed to the end user?); implementation (how will this platform be built, and where will it exist within the IT ecosystem?); and policy (based on monitoring feedback, when and what actions will be taken to fix problems, and how will these problems be communicated to clinical staff?). While much of the literature surrounding ML performance monitoring emphasizes methodological approaches for capturing changes in performance, there remains a battery of other challenges and considerations that must be addressed for successful real-world implementation.
Affiliation(s)
- Louis Faust
- Robert D and Patricia E Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, United States
- Patrick Wilson
- Robert D and Patricia E Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, United States
- Shusaku Asai
- Robert D and Patricia E Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, United States
- Sunyang Fu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Hongfang Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Xiaoyang Ruan
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Curt Storlie
- Robert D and Patricia E Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, United States
5. Brosula R, Corbin CK, Chen JH. Pathophysiological Features in Electronic Medical Records Sustain Model Performance under Temporal Dataset Shift. AMIA Jt Summits Transl Sci Proc 2024;2024:95-104. PMID: 38827052. PMCID: PMC11141811.
Abstract
Access to real-world data streams like electronic medical records (EMRs) has accelerated the development of supervised machine learning (ML) models for clinical applications. However, few studies investigate the differential impact of particular features in the EMR on model performance under temporal dataset shift. To explain how features in the EMR impact models over time, this study aggregates features into feature groups by their source (e.g. medication orders, diagnosis codes and lab results) and feature categories based on their reflection of patient pathophysiology or healthcare processes. We adapt Shapley values to explain feature groups' and feature categories' marginal contribution to initial and sustained model performance. We investigate three standard clinical prediction tasks and find that while feature contributions to initial performance differ across tasks, pathophysiological features help mitigate temporal discrimination deterioration. These results provide interpretable insights on how specific feature groups contribute to model performance and robustness to temporal dataset shift.
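A cheap stand-in for the grouped attribution described in this abstract is grouped permutation importance: permute all features in a group together and measure the drop in held-out score. The sketch below uses that substitute rather than the paper's Shapley adaptation, with invented feature groups and synthetic data:

```python
# Grouped permutation importance: attribute held-out performance to feature
# groups (e.g. labs vs diagnoses vs medications) by permuting each group.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 9))
# Outcome driven by one "lab" feature (col 0) and one "diagnosis" feature (col 3).
y = (X[:, 0] + X[:, 3] + rng.normal(scale=0.5, size=500) > 0).astype(int)
groups = {"labs": [0, 1, 2], "diagnoses": [3, 4, 5], "medications": [6, 7, 8]}

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
base = model.score(X_te, y_te)

imps = {}
for name, cols in groups.items():
    X_perm = X_te.copy()
    # Permute the whole group jointly to break its link with the outcome.
    X_perm[:, cols] = X_te[rng.permutation(len(X_te))][:, cols]
    imps[name] = base - model.score(X_perm, y_te)
    print(name, round(imps[name], 3))
```

Here the "medications" group carries no signal, so its importance should hover near zero while "labs" and "diagnoses" do not.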
Affiliation(s)
- Raphael Brosula
- Genomic Center for Infectious Diseases, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Conor K Corbin
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Jonathan H Chen
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
6. Andersson E, Hult J, Troein C, Stridh M, Sjögren B, Pekar-Lukacs A, Hernandez-Palacios J, Edén P, Persson B, Olariu V, Malmsjö M, Merdasa A. Facilitating clinically relevant skin tumor diagnostics with spectroscopy-driven machine learning. iScience 2024;27:109653. PMID: 38680659. PMCID: PMC11053315. DOI: 10.1016/j.isci.2024.109653.
Abstract
In the dawning era of artificial intelligence (AI), health care stands to undergo a significant transformation with the increasing digitalization of patient data. Digital imaging, in particular, will serve as an important platform for AI to aid decision making and diagnostics. A growing number of studies demonstrate the potential of automatic pre-surgical skin tumor delineation, which could have tremendous impact on clinical practice. However, current methods rely on having ground truth images in which tumor borders are already identified, which is not clinically possible. We report a novel approach where hyperspectral images provide spectra from small regions representing healthy tissue and tumor, which are used to generate prediction maps using artificial neural networks (ANNs), after which a segmentation algorithm automatically identifies the tumor borders. This circumvents the need for ground truth images, since an ANN model is trained with data from each individual patient, representing a more clinically relevant approach.
Affiliation(s)
- Emil Andersson
- Centre for Environmental and Climate Science, Lund University, Lund, Sweden
- Jenny Hult
- Department of Clinical Sciences Lund, Ophthalmology, Lund University, Lund, Sweden
- Carl Troein
- Centre for Environmental and Climate Science, Lund University, Lund, Sweden
- Magne Stridh
- Department of Clinical Sciences Lund, Ophthalmology, Lund University, Lund, Sweden
- Benjamin Sjögren
- Department of Clinical Sciences Lund, Ophthalmology, Lund University, Lund, Sweden
- Patrik Edén
- Centre for Environmental and Climate Science, Lund University, Lund, Sweden
- Bertil Persson
- Department of Dermatology, Skåne University Hospital, Lund, Sweden
- Victor Olariu
- Centre for Environmental and Climate Science, Lund University, Lund, Sweden
- Malin Malmsjö
- Department of Clinical Sciences Lund, Ophthalmology, Lund University, Lund, Sweden
- Aboma Merdasa
- Department of Clinical Sciences Lund, Ophthalmology, Lund University, Lund, Sweden
7. Ktena I, Wiles O, Albuquerque I, Rebuffi SA, Tanno R, Roy AG, Azizi S, Belgrave D, Kohli P, Cemgil T, Karthikesalingam A, Gowal S. Generative models improve fairness of medical classifiers under distribution shifts. Nat Med 2024;30:1166-1173. PMID: 38600282. PMCID: PMC11031395. DOI: 10.1038/s41591-024-02838-6.
Abstract
Domain generalization is a ubiquitous challenge for machine learning in healthcare. Model performance in real-world conditions might be lower than expected because of discrepancies between the data encountered during deployment and development. Underrepresentation of some groups or conditions during model development is a common cause of this phenomenon. This challenge is often not readily addressed by targeted data acquisition and 'labeling' by expert clinicians, which can be prohibitively expensive or practically impossible because of the rarity of conditions or the available clinical expertise. We hypothesize that advances in generative artificial intelligence can help mitigate this unmet need in a steerable fashion, enriching our training dataset with synthetic examples that address shortfalls of underrepresented conditions or subgroups. We show that diffusion models can automatically learn realistic augmentations from data in a label-efficient manner. We demonstrate that learned augmentations make models more robust and statistically fair in-distribution and out of distribution. To evaluate the generality of our approach, we studied three distinct medical imaging contexts of varying difficulty: (1) histopathology, (2) chest X-ray and (3) dermatology images. Complementing real samples with synthetic ones improved the robustness of models in all three medical tasks and increased fairness by improving the accuracy of clinical diagnosis within underrepresented groups, especially out of distribution.
8. Rosenblatt M, Tejavibulya L, Jiang R, Noble S, Scheinost D. Data leakage inflates prediction performance in connectome-based machine learning models. Nat Commun 2024;15:1829. PMID: 38418819. PMCID: PMC10901797. DOI: 10.1038/s41467-024-46150-w.
Abstract
Predictive modeling is a central technique in neuroimaging to identify brain-behavior relationships and test their generalizability to unseen data. However, data leakage undermines the validity of predictive models by breaching the separation between training and test data. Leakage is always an incorrect practice but still pervasive in machine learning. Understanding its effects on neuroimaging predictive models can inform how leakage affects existing literature. Here, we investigate the effects of five forms of leakage-involving feature selection, covariate correction, and dependence between subjects-on functional and structural connectome-based machine learning models across four datasets and three phenotypes. Leakage via feature selection and repeated subjects drastically inflates prediction performance, whereas other forms of leakage have minor effects. Furthermore, small datasets exacerbate the effects of leakage. Overall, our results illustrate the variable effects of leakage and underscore the importance of avoiding data leakage to improve the validity and reproducibility of predictive modeling.
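The feature-selection leakage studied in this abstract is easy to reproduce: on pure noise, selecting features before cross-validation yields apparently predictive models, while selecting inside each training fold does not. A minimal scikit-learn sketch on synthetic data (not the connectome datasets):

```python
# Demonstration of feature-selection leakage on pure noise data.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))   # "connectome" features: pure noise
y = rng.normal(size=100)           # phenotype unrelated to X

# Leaky: feature selection sees the test folds before the CV split.
X_sel = SelectKBest(f_regression, k=20).fit_transform(X, y)
leaky = cross_val_score(Ridge(), X_sel, y, cv=5, scoring="r2").mean()

# Correct: selection is refit inside each training fold via a pipeline.
pipe = make_pipeline(SelectKBest(f_regression, k=20), Ridge())
clean = cross_val_score(pipe, X, y, cv=5, scoring="r2").mean()

# The leaky estimate looks "predictive" despite zero true signal;
# the pipeline estimate does not.
print(round(leaky, 2), round(clean, 2))
```

Wrapping selection (and any covariate correction) in a pipeline is the standard guard against this class of leakage.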
Affiliation(s)
- Matthew Rosenblatt
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Link Tejavibulya
- Interdepartmental Neuroscience Program, Yale University, New Haven, CT, USA
- Rongtao Jiang
- Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
- Stephanie Noble
- Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
- Department of Bioengineering, Northeastern University, Boston, MA, USA
- Department of Psychology, Northeastern University, Boston, MA, USA
- Dustin Scheinost
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Interdepartmental Neuroscience Program, Yale University, New Haven, CT, USA
- Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
- Child Study Center, Yale School of Medicine, New Haven, CT, USA
- Department of Statistics & Data Science, Yale University, New Haven, CT, USA
9. Sievers B, Thornton MA. Deep social neuroscience: the promise and peril of using artificial neural networks to study the social brain. Soc Cogn Affect Neurosci 2024;19:nsae014. PMID: 38334747. PMCID: PMC10880882. DOI: 10.1093/scan/nsae014.
Abstract
This review offers an accessible primer to social neuroscientists interested in neural networks. It begins by providing an overview of key concepts in deep learning. It then discusses three ways neural networks can be useful to social neuroscientists: (i) building statistical models to predict behavior from brain activity; (ii) quantifying naturalistic stimuli and social interactions; and (iii) generating cognitive models of social brain function. These applications have the potential to enhance the clinical value of neuroimaging and improve the generalizability of social neuroscience research. We also discuss the significant practical challenges, theoretical limitations and ethical issues faced by deep learning. If the field can successfully navigate these hazards, we believe that artificial neural networks may prove indispensable for the next stage of the field's development: deep social neuroscience.
Affiliation(s)
- Beau Sievers
- Department of Psychology, Stanford University, 420 Jane Stanford Way, Stanford, CA 94305, USA
- Department of Psychology, Harvard University, 33 Kirkland St., Cambridge, MA 02138, USA
- Mark A Thornton
- Department of Psychological and Brain Sciences, Dartmouth College, 6207 Moore Hall, Hanover, NH 03755, USA
10. Demidenko MI, Mumford JA, Ram N, Poldrack RA. A multi-sample evaluation of the measurement structure and function of the modified monetary incentive delay task in adolescents. Dev Cogn Neurosci 2024;65:101337. PMID: 38160517. PMCID: PMC10801229. DOI: 10.1016/j.dcn.2023.101337.
Abstract
Interpreting the neural response elicited during task functional magnetic resonance imaging (fMRI) remains a challenge in neurodevelopmental research. The monetary incentive delay (MID) task is an fMRI reward processing task that is extensively used in the literature. However, modern psychometric tools have not been used to evaluate measurement properties of the MID task fMRI data. The current study uses data for a similar task design across three adolescent samples (N = 346 [mean age 12.0 years; 44% female]; N = 97 [19.3 years; 58%]; N = 112 [20.2 years; 38%]) to evaluate multiple measurement properties of fMRI responses on the MID task. Confirmatory factor analysis (CFA) is used to evaluate an a priori theoretical model for the task and its measurement invariance across three samples. Exploratory factor analysis (EFA) is used to identify the data-driven measurement structure across the samples. CFA results suggest that the a priori model is a poor representation of these MID task fMRI data. Across the samples, the data-driven EFA models consistently identify a six-to-seven factor structure with run and bilateral brain region factors. This factor structure is moderately-to-highly congruent across the samples. Altogether, these findings demonstrate a need to evaluate theoretical frameworks for popular fMRI task designs to improve our understanding and interpretation of brain-behavior associations.
Affiliation(s)
- Nilam Ram
- Department of Psychology, Stanford University, Stanford, United States
11. Adkinson BD, Rosenblatt M, Dadashkarimi J, Tejavibulya L, Jiang R, Noble S, Scheinost D. Brain-phenotype predictions can survive across diverse real-world data. bioRxiv [Preprint] 2024:2024.01.23.576916. PMID: 38328100. PMCID: PMC10849571. DOI: 10.1101/2024.01.23.576916.
Abstract
Recent work suggests that machine learning models predicting psychiatric treatment outcomes based on clinical data may fail when applied to unharmonized samples. Neuroimaging predictive models offer the opportunity to incorporate neurobiological information, which may be more robust to dataset shifts. Yet, among the minority of neuroimaging studies that undertake any form of external validation, there is a notable lack of attention to generalization across dataset-specific idiosyncrasies. Research settings, by design, remove the between-site variations that real-world and, eventually, clinical applications demand. Here, we rigorously test the ability of a range of predictive models to generalize across three diverse, unharmonized samples: the Philadelphia Neurodevelopmental Cohort (n=1291), the Healthy Brain Network (n=1110), and the Human Connectome Project in Development (n=428). These datasets have high inter-dataset heterogeneity, encompassing substantial variations in age distribution, sex, racial and ethnic minority representation, recruitment geography, clinical symptom burdens, fMRI tasks, sequences, and behavioral measures. We demonstrate that reproducible and generalizable brain-behavior associations can be realized across diverse dataset features with sample sizes in the hundreds. Results indicate the potential of functional connectivity-based predictive models to be robust despite substantial inter-dataset variability. Notably, for the HCPD and HBN datasets, the best predictions were not from training and testing in the same dataset (i.e., cross-validation) but across datasets. This result suggests that training on diverse data may improve prediction in specific cases. Overall, this work provides a critical foundation for future work evaluating the generalizability of neuroimaging predictive models in real-world scenarios and clinical settings.
Affiliation(s)
- Brendan D Adkinson
- Interdepartmental Neuroscience Program, Yale School of Medicine, New Haven, CT, 06510, USA
- Matthew Rosenblatt
- Department of Biomedical Engineering, Yale University, New Haven, CT, 06520, USA
- Javid Dadashkarimi
- Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, 02129, USA
- Department of Radiology, Harvard Medical School, Boston, MA, 02129, USA
- Link Tejavibulya
- Interdepartmental Neuroscience Program, Yale School of Medicine, New Haven, CT, 06510, USA
- Rongtao Jiang
- Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT, 06510, USA
- Stephanie Noble
- Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT, 06510, USA
- Department of Bioengineering, Northeastern University, Boston, MA, 02120, USA
- Department of Psychology, Northeastern University, Boston, MA, 02115, USA
- Dustin Scheinost
- Interdepartmental Neuroscience Program, Yale School of Medicine, New Haven, CT, 06510, USA
- Department of Biomedical Engineering, Yale University, New Haven, CT, 06520, USA
- Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT, 06510, USA
- Department of Statistics & Data Science, Yale University, New Haven, CT, 06520, USA
- Child Study Center, Yale School of Medicine, New Haven, CT, 06510, USA
- Wu Tsai Institute, Yale University, New Haven, CT, 06510, USA
12
Belov V, Erwin-Grabner T, Aghajani M, Aleman A, Amod AR, Basgoze Z, Benedetti F, Besteher B, Bülow R, Ching CRK, Connolly CG, Cullen K, Davey CG, Dima D, Dols A, Evans JW, Fu CHY, Gonul AS, Gotlib IH, Grabe HJ, Groenewold N, Hamilton JP, Harrison BJ, Ho TC, Mwangi B, Jaworska N, Jahanshad N, Klimes-Dougan B, Koopowitz SM, Lancaster T, Li M, Linden DEJ, MacMaster FP, Mehler DMA, Melloni E, Mueller BA, Ojha A, Oudega ML, Penninx BWJH, Poletti S, Pomarol-Clotet E, Portella MJ, Pozzi E, Reneman L, Sacchet MD, Sämann PG, Schrantee A, Sim K, Soares JC, Stein DJ, Thomopoulos SI, Uyar-Demir A, van der Wee NJA, van der Werff SJA, Völzke H, Whittle S, Wittfeld K, Wright MJ, Wu MJ, Yang TT, Zarate C, Veltman DJ, Schmaal L, Thompson PM, Goya-Maldonado R. Multi-site benchmark classification of major depressive disorder using machine learning on cortical and subcortical measures. Sci Rep 2024; 14:1084. [PMID: 38212349] [PMCID: PMC10784593] [DOI: 10.1038/s41598-023-47934-8]
Abstract
Machine learning (ML) techniques have gained popularity in the neuroimaging field due to their potential for classifying neuropsychiatric disorders. However, the diagnostic predictive power of existing algorithms has been limited by small sample sizes, lack of representativeness, data leakage, and/or overfitting. Here, we overcome these limitations with the largest multi-site sample size to date (N = 5365) to provide a generalizable ML classification benchmark of major depressive disorder (MDD) using shallow linear and non-linear models. Leveraging brain measures from standardized ENIGMA analysis pipelines in FreeSurfer, we were able to classify MDD versus healthy controls (HC) with a balanced accuracy of around 62%. After harmonizing the data (e.g., using ComBat), however, the balanced accuracy dropped to approximately 52%. Accuracy close to random chance levels was also observed in groups stratified by age of onset, antidepressant use, number of episodes, and sex. Future studies incorporating higher-dimensional brain imaging/phenotype features and/or using more advanced machine and deep learning methods may yield more encouraging prospects.
Affiliation(s)
- Vladimir Belov
- Laboratory of Systems Neuroscience and Imaging in Psychiatry (SNIP-Lab), Department of Psychiatry and Psychotherapy, University Medical Center Göttingen (UMG), Georg-August University, Von-Siebold-Str. 5, 37075, Göttingen, Germany
- Tracy Erwin-Grabner
- Laboratory of Systems Neuroscience and Imaging in Psychiatry (SNIP-Lab), Department of Psychiatry and Psychotherapy, University Medical Center Göttingen (UMG), Georg-August University, Von-Siebold-Str. 5, 37075, Göttingen, Germany
- Moji Aghajani
- Department of Psychiatry, Amsterdam UMC, Amsterdam Neuroscience, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Institute of Education and Child Studies, Section Forensic Family and Youth Care, Leiden University, Leiden, The Netherlands
- Andre Aleman
- Department of Biomedical Sciences of Cells and Systems, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Alyssa R Amod
- Department of Psychiatry and Mental Health, University of Cape Town, Cape Town, South Africa
- Zeynep Basgoze
- Department of Psychiatry and Behavioral Science, University of Minnesota Medical School, Minneapolis, MN, USA
- Francesco Benedetti
- Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Bianca Besteher
- Department of Psychiatry and Psychotherapy, Jena University Hospital, Jena, Germany
- Robin Bülow
- Institute for Radiology and Neuroradiology, University Medicine Greifswald, Greifswald, Germany
- Christopher R K Ching
- Imaging Genetics Center, Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Marina del Rey, CA, USA
- Colm G Connolly
- Department of Biomedical Sciences, Florida State University, Tallahassee, FL, USA
- Kathryn Cullen
- Department of Psychiatry and Behavioral Science, University of Minnesota Medical School, Minneapolis, MN, USA
- Christopher G Davey
- Melbourne Neuropsychiatry Centre, Department of Psychiatry, The University of Melbourne, Parkville, VIC, Australia
- Danai Dima
- Department of Psychology, School of Arts and Social Sciences, City, University of London, London, UK
- Department of Neuroimaging, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Annemiek Dols
- Department of Psychiatry, Amsterdam UMC, Amsterdam Neuroscience, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Jennifer W Evans
- Experimental Therapeutics and Pathophysiology Branch, National Institute for Mental Health, National Institutes of Health, Bethesda, MD, USA
- Cynthia H Y Fu
- School of Psychology, University of East London, London, UK
- Centre for Affective Disorders, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Ali Saffet Gonul
- SoCAT Lab, Department of Psychiatry, School of Medicine, Ege University, Izmir, Turkey
- Ian H Gotlib
- Department of Psychology, Stanford University, Stanford, CA, USA
- Hans J Grabe
- Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Greifswald, Germany
- Nynke Groenewold
- Department of Psychiatry and Mental Health, University of Cape Town, Cape Town, South Africa
- J Paul Hamilton
- Center for Social and Affective Neuroscience, Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden
- Center for Medical Imaging and Visualization, Linköping University, Linköping, Sweden
- Ben J Harrison
- Melbourne Neuropsychiatry Centre, Department of Psychiatry, The University of Melbourne, Parkville, VIC, Australia
- Tiffany C Ho
- Department of Psychiatry and Behavioral Sciences, Division of Child and Adolescent Psychiatry, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Department of Psychology, University of California, Los Angeles, Los Angeles, CA, USA
- Benson Mwangi
- Louis A. Faillace, MD, Department of Psychiatry and Behavioral Sciences, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Center Of Excellence On Mood Disorders, Louis A. Faillace, MD, Department of Psychiatry and Behavioral Sciences at McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Natalia Jaworska
- Department of Psychiatry, McGill University, Montreal, QC, Canada
- Neda Jahanshad
- Imaging Genetics Center, Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Marina del Rey, CA, USA
- Thomas Lancaster
- Cardiff University Brain Research Imaging Center, Cardiff University, Cardiff, UK
- MRC Center for Neuropsychiatric Genetics and Genomics, Cardiff University, Cardiff, UK
- Meng Li
- Department of Psychiatry and Psychotherapy, Jena University Hospital, Jena, Germany
- David E J Linden
- Cardiff University Brain Research Imaging Center, Cardiff University, Cardiff, UK
- MRC Center for Neuropsychiatric Genetics and Genomics, Cardiff University, Cardiff, UK
- Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, UK
- School of Mental Health and Neuroscience, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- Frank P MacMaster
- Departments of Psychiatry and Pediatrics, University of Calgary, Calgary, AB, Canada
- David M A Mehler
- Cardiff University Brain Research Imaging Center, Cardiff University, Cardiff, UK
- MRC Center for Neuropsychiatric Genetics and Genomics, Cardiff University, Cardiff, UK
- Department of Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Aachen, Germany
- Elisa Melloni
- Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Bryon A Mueller
- Department of Psychiatry and Behavioral Science, University of Minnesota Medical School, Minneapolis, MN, USA
- Amar Ojha
- Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA
- Center for Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA
- Mardien L Oudega
- Department of Psychiatry, Amsterdam UMC, Amsterdam Neuroscience, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Brenda W J H Penninx
- Department of Psychiatry, Amsterdam UMC, Amsterdam Neuroscience, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Sara Poletti
- Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Edith Pomarol-Clotet
- FIDMAG Germanes Hospitalàries Research Foundation, Centro de Investigación Biomédica en Red de Salud Mental (CIBERSAM), Barcelona, Catalonia, Spain
- Maria J Portella
- Sant Pau Mental Health Research Group, Institut de Recerca de L'Hospital de La Santa Creu I Sant Pau, Barcelona, Catalonia, Spain
- Elena Pozzi
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia
- Orygen, Parkville, VIC, Australia
- Liesbeth Reneman
- Department of Radiology and Nuclear Medicine, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Matthew D Sacchet
- Meditation Research Program, Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Anouk Schrantee
- Department of Radiology and Nuclear Medicine, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Kang Sim
- West Region, Institute of Mental Health, Singapore, Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Jair C Soares
- Center Of Excellence On Mood Disorders, Louis A. Faillace, MD, Department of Psychiatry and Behavioral Sciences at McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Dan J Stein
- SA MRC Research Unit on Risk and Resilience in Mental Disorders, Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, South Africa
- Sophia I Thomopoulos
- Imaging Genetics Center, Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Marina del Rey, CA, USA
- Aslihan Uyar-Demir
- SoCAT Lab, Department of Psychiatry, School of Medicine, Ege University, Izmir, Turkey
- Nic J A van der Wee
- Leiden Institute for Brain and Cognition, Leiden University Medical Center, Leiden, The Netherlands
- Steven J A van der Werff
- Leiden Institute for Brain and Cognition, Leiden University Medical Center, Leiden, The Netherlands
- Department of Psychiatry, Leiden University Medical Center, Leiden, The Netherlands
- Henry Völzke
- Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
- Sarah Whittle
- Melbourne Neuropsychiatry Centre, Department of Psychiatry, The University of Melbourne and Melbourne Health, Melbourne, VIC, Australia
- Katharina Wittfeld
- Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Greifswald, Germany
- German Center for Neurodegenerative Diseases (DZNE), Site Rostock/Greifswald, Greifswald, Germany
- Margaret J Wright
- Queensland Brain Institute, The University of Queensland, Brisbane, QLD, Australia
- Centre for Advanced Imaging, The University of Queensland, Brisbane, QLD, Australia
- Mon-Ju Wu
- Louis A. Faillace, MD, Department of Psychiatry and Behavioral Sciences, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Center Of Excellence On Mood Disorders, Louis A. Faillace, MD, Department of Psychiatry and Behavioral Sciences at McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Tony T Yang
- Department of Psychiatry and Behavioral Sciences, Division of Child and Adolescent Psychiatry, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Carlos Zarate
- Section on the Neurobiology and Treatment of Mood Disorders, National Institute of Mental Health, Bethesda, MD, USA
- Dick J Veltman
- Department of Psychiatry, Amsterdam UMC, Amsterdam Neuroscience, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Lianne Schmaal
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia
- Orygen, Parkville, VIC, Australia
- Paul M Thompson
- Imaging Genetics Center, Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Marina del Rey, CA, USA
- Roberto Goya-Maldonado
- Laboratory of Systems Neuroscience and Imaging in Psychiatry (SNIP-Lab), Department of Psychiatry and Psychotherapy, University Medical Center Göttingen (UMG), Georg-August University, Von-Siebold-Str. 5, 37075, Göttingen, Germany.
13
Grzenda A, Widge AS. Electronic health records and stratified psychiatry: bridge to precision treatment? Neuropsychopharmacology 2024; 49:285-290. [PMID: 37667021] [PMCID: PMC10700348] [DOI: 10.1038/s41386-023-01724-y]
Abstract
The use of a stratified psychiatry approach that combines electronic health records (EHR) data with machine learning (ML) is one potentially fruitful path toward rapidly improving precision treatment in clinical practice. This strategy, however, requires confronting pervasive methodological flaws as well as deficiencies in transparency and reporting in the current conduct of ML-based studies for treatment prediction. EHR data shares many of the same data quality issues as other types of data used in ML prediction, plus some unique challenges. To fully leverage EHR data's power for patient stratification, increased attention to data quality and collection of patient-reported outcome data is needed.
Affiliation(s)
- Adrienne Grzenda
- Department of Psychiatry & Biobehavioral Sciences, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA, USA.
- Olive View-UCLA Medical Center, Sylmar, CA, USA.
- Alik S Widge
- Department of Psychiatry & Behavioral Sciences, University of Minnesota, Minneapolis, MN, USA
14
Rosenblatt M, Tejavibulya L, Jiang R, Noble S, Scheinost D. The effects of data leakage on connectome-based machine learning models. bioRxiv 2023:2023.06.09.544383. [PMID: 38234740] [PMCID: PMC10793416] [DOI: 10.1101/2023.06.09.544383]
Abstract
Predictive modeling has now become a central technique in neuroimaging to identify complex brain-behavior relationships and test their generalizability to unseen data. However, data leakage, which unintentionally breaches the separation between data used to train and test the model, undermines the validity of predictive models. Previous literature suggests that leakage is generally pervasive in machine learning, but few studies have empirically evaluated the effects of leakage in neuroimaging data. Although leakage is always an incorrect practice, understanding the effects of leakage on neuroimaging predictive models provides insight into the extent to which leakage may affect the literature. Here, we investigated the effects of leakage on machine learning models in two common neuroimaging modalities, functional and structural connectomes. Using over 400 different pipelines spanning four large datasets and three phenotypes, we evaluated five forms of leakage fitting into three broad categories: feature selection, covariate correction, and lack of independence between subjects. As expected, leakage via feature selection and repeated subjects drastically inflated prediction performance. Notably, other forms of leakage had only minor effects (e.g., leaky site correction) or even decreased prediction performance (e.g., leaky covariate regression). In some cases, leakage affected not only prediction performance, but also model coefficients, and thus neurobiological interpretations. Finally, we found that predictive models using small datasets were more sensitive to leakage. Overall, our results illustrate the variable effects of leakage on prediction pipelines and underscore the importance of avoiding data leakage to improve the validity and reproducibility of predictive modeling.
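The feature-selection leak described above is easy to demonstrate: selecting features on the full dataset before cross-validation inflates accuracy even on pure noise, whereas selecting inside each training fold does not. A minimal sketch (the dataset sizes and `k` are arbitrary assumptions, not values from the paper):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5000))  # pure-noise "connectome" features
y = rng.integers(0, 2, 100)           # random phenotype labels

# Leaky: select features using ALL data, then cross-validate
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5).mean()

# Correct: selection happens inside each training fold via a Pipeline
pipe = Pipeline([("select", SelectKBest(f_classif, k=20)),
                 ("clf", LogisticRegression(max_iter=1000))])
proper = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky CV accuracy:  {leaky:.2f}")   # inflated well above chance
print(f"proper CV accuracy: {proper:.2f}")  # near 0.5, as expected for noise
```

Because the leaky variant has already seen the test folds when picking the 20 most label-correlated of 5000 noise features, its cross-validated accuracy is spuriously high; wrapping selection in a `Pipeline` keeps it honest.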
Affiliation(s)
- Link Tejavibulya
- Interdepartmental Neuroscience Program, Yale University, New Haven, CT
- Rongtao Jiang
- Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT
- Stephanie Noble
- Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT
- Department of Bioengineering, Northeastern University, Boston, MA
- Department of Psychology, Northeastern University, Boston, MA
- Dustin Scheinost
- Department of Biomedical Engineering, Yale University, New Haven, CT
- Interdepartmental Neuroscience Program, Yale University, New Haven, CT
- Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT
- Child Study Center, Yale School of Medicine, New Haven, CT
- Department of Statistics & Data Science, Yale University, New Haven, CT
15
Shen X, Mo S, Zeng X, Wang Y, Lin L, Weng M, Sugasawa T, Wang L, Gu W, Nakajima T. Identification of antigen-presentation related B cells as a key player in Crohn's disease using single-cell dissecting, hdWGCNA, and deep learning. Clin Exp Med 2023; 23:5255-5267. [PMID: 37550553] [DOI: 10.1007/s10238-023-01145-7]
Abstract
Crohn's disease (CD) arises from intricate intercellular interactions within the intestinal lamina propria. Our objective was to use single-cell RNA sequencing to investigate CD pathogenesis and explore its clinical significance. We identified a distinct subset of B cells, highly infiltrated in the CD lamina propria, that expressed genes related to antigen presentation. Using high-dimensional weighted gene co-expression network analysis and nine machine learning techniques, we demonstrated that the antigen-presenting CD-specific B cell signature effectively differentiated diseased mucosa from normal mucosa (independent external testing AUC = 0.963). Additionally, using MCPcounter and non-negative matrix factorization, we established a relationship between the antigen-presenting CD-specific B cell signature and immune cell infiltration and patient heterogeneity. Finally, we developed a gene-immune convolutional neural network deep learning model that accurately diagnosed CD mucosa in diverse cohorts (independent external testing AUC = 0.963). Our research has revealed a population of B cells with a potential promoting role in CD pathogenesis and represents a fundamental step in the development of future clinical diagnostic tools for the disease.
Affiliation(s)
- Xin Shen
- Department of Digestive Diseases, Huashan Hospital, Fudan University, Shanghai, 200040, China
- Shaocong Mo
- Department of Digestive Diseases, Huashan Hospital, Fudan University, Shanghai, 200040, China.
- Xinlei Zeng
- School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou, 510006, China
- Yulin Wang
- Department of Nephrology, Zhongshan Hospital, Fudan University, Shanghai, 200032, China
- Lingxi Lin
- Department of Digestive Diseases, Huashan Hospital, Fudan University, Shanghai, 200040, China
- Meilin Weng
- Department of Anesthesiology, Zhongshan Hospital, Fudan University, Shanghai, China
- Takehito Sugasawa
- Laboratory of Clinical Examination and Sports Medicine, Department of Clinical Medicine, Faculty of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8577, Japan
- Lei Wang
- Department of Pathology, Fudan University Shanghai Cancer Center, Shanghai, China
- Department of Oncology, Shanghai Medical College of Fudan University, Shanghai, China
- Wenchao Gu
- Department of Diagnostic and Interventional Radiology, University of Tsukuba, Ibaraki, 305-8577, Japan.
- Department of Diagnostic Radiology and Nuclear Medicine, Gunma University Graduate School of Medicine, Maebashi, 371-8511, Japan.
- Takahito Nakajima
- Department of Diagnostic and Interventional Radiology, University of Tsukuba, Ibaraki, 305-8577, Japan
16
Rosenblatt M, Tejavibulya L, Camp CC, Jiang R, Westwater ML, Noble S, Scheinost D. Power and reproducibility in the external validation of brain-phenotype predictions. bioRxiv 2023:2023.10.25.563971. [PMID: 37961654] [PMCID: PMC10634903] [DOI: 10.1101/2023.10.25.563971]
Abstract
Identifying reproducible and generalizable brain-phenotype associations is a central goal of neuroimaging. Consistent with this goal, prediction frameworks evaluate brain-phenotype models in unseen data. Most prediction studies train and evaluate a model in the same dataset. However, external validation, or the evaluation of a model in an external dataset, provides a better assessment of robustness and generalizability. Despite the promise of external validation and calls for its usage, the statistical power of such studies has yet to be investigated. In this work, we ran over 60 million simulations across several datasets, phenotypes, and sample sizes to better understand how the sizes of the training and external datasets affect statistical power. We found that prior external validation studies used sample sizes prone to low power, which may lead to false negatives and effect size inflation. Furthermore, increases in the external sample size led to increased simulated power directly following theoretical power curves, whereas changes in the training dataset size offset the simulated power curves. Finally, we compared the performance of a model within a dataset to the external performance. The within-dataset performance was typically within r=0.2 of the cross-dataset performance, which could help decide how to power future external validation studies. Overall, our results illustrate the importance of considering the sample sizes of both the training and external datasets when performing external validation.
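The dependence of power on the external sample size can be illustrated with a far smaller simulation than the paper's 60 million runs. The sketch below is a hypothetical illustration (the assumed true prediction-observation correlation of r=0.2, the one-sided Pearson test, and the simulation counts are all assumptions, not the paper's settings):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def external_power(r_true, n_external, n_sim=2000, alpha=0.05):
    """Fraction of simulated external validations with a significant positive
    correlation between model predictions and the observed phenotype."""
    hits = 0
    for _ in range(n_sim):
        pred = rng.standard_normal(n_external)
        noise = rng.standard_normal(n_external)
        obs = r_true * pred + np.sqrt(1 - r_true**2) * noise
        r, p = stats.pearsonr(pred, obs)
        if r > 0 and p / 2 < alpha:  # one-sided test for r > 0
            hits += 1
    return hits / n_sim

results = {n: external_power(0.2, n) for n in (25, 100, 400)}
for n, pwr in results.items():
    print(f"external n = {n:3d}: power ≈ {pwr:.2f}")  # power rises with n
```

As the abstract notes, small external samples leave such a test badly underpowered for realistic brain-phenotype effect sizes, while a few hundred external participants bring power near 1.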
Affiliation(s)
- Link Tejavibulya
- Interdepartmental Neuroscience Program, Yale University, New Haven, CT
- Chris C. Camp
- Interdepartmental Neuroscience Program, Yale University, New Haven, CT
- Rongtao Jiang
- Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT
- Margaret L. Westwater
- Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT
- Stephanie Noble
- Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT
- Department of Bioengineering, Northeastern University, Boston, MA
- Department of Psychology, Northeastern University, Boston, MA
- Dustin Scheinost
- Department of Biomedical Engineering, Yale University, New Haven, CT
- Interdepartmental Neuroscience Program, Yale University, New Haven, CT
- Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT
- Child Study Center, Yale School of Medicine, New Haven, CT
- Department of Statistics & Data Science, Yale University, New Haven, CT
17
Sahiner B, Chen W, Samala RK, Petrick N. Data drift in medical machine learning: implications and potential remedies. Br J Radiol 2023; 96:20220878. [PMID: 36971405] [PMCID: PMC10546450] [DOI: 10.1259/bjr.20220878]
Abstract
Data drift refers to differences between the data used in training a machine learning (ML) model and that applied to the model in real-world operation. Medical ML systems can be exposed to various forms of data drift, including differences between the data sampled for training and used in clinical operation, differences between medical practices or context of use between training and clinical use, and time-related changes in patient populations, disease patterns, and data acquisition, to name a few. In this article, we first review the terminology used in ML literature related to data drift, define distinct types of drift, and discuss in detail potential causes within the context of medical applications with an emphasis on medical imaging. We then review the recent literature regarding the effects of data drift on medical ML systems, which overwhelmingly show that data drift can be a major cause for performance deterioration. We then discuss methods for monitoring data drift and mitigating its effects with an emphasis on pre- and post-deployment techniques. Some of the potential methods for drift detection and issues around model retraining when drift is detected are included. Based on our review, we find that data drift is a major concern in medical ML deployment and that more research is needed so that ML models can identify drift early, incorporate effective mitigation strategies and resist performance decay.
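One of the post-deployment monitoring techniques discussed above, detecting a shift in the input feature distribution, can be sketched with a per-feature two-sample Kolmogorov-Smirnov test. Everything below (the data, the size of the shift, and the significance threshold) is an illustrative assumption, not a method from the article:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train = rng.normal(size=(1000, 3))     # feature sample logged at training time
deployed = rng.normal(size=(1000, 3))  # features observed in clinical operation
deployed[:, 2] += 0.5                  # feature 2 has drifted (e.g., new scanner)

# Flag a feature as drifted when the KS test rejects distributional equality
alpha = 0.01
drifted = [i for i in range(train.shape[1])
           if ks_2samp(train[:, i], deployed[:, i]).pvalue < alpha]
print("drifted features:", drifted)  # feature 2 should be flagged
```

In practice such a detector would run on a schedule over incoming data, with flagged features triggering review or model retraining as the article discusses.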
Affiliation(s)
- Berkman Sahiner
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993-0002
- Weijie Chen
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993-0002
- Ravi K. Samala
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993-0002
- Nicholas Petrick
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993-0002
18
Kiessner AK, Schirrmeister RT, Gemein LAW, Boedecker J, Ball T. An extended clinical EEG dataset with 15,300 automatically labelled recordings for pathology decoding. Neuroimage Clin 2023; 39:103482. [PMID: 37544168] [PMCID: PMC10432245] [DOI: 10.1016/j.nicl.2023.103482]
Abstract
Automated clinical EEG analysis using machine learning (ML) methods is a growing EEG research area. Previous studies on binary EEG pathology decoding have mainly used the Temple University Hospital (TUH) Abnormal EEG Corpus (TUAB), which contains approximately 3,000 manually labelled EEG recordings. To evaluate, and eventually improve, the generalisation performance of machine learning methods for EEG pathology decoding, larger publicly available datasets are required. A number of studies have addressed the automatic labelling of large open-source datasets as an approach to creating new datasets for EEG pathology decoding, but little is known about the extent to which training on a larger, automatically labelled dataset affects the decoding performance of established deep neural networks. In this study, we automatically created additional pathology labels for the Temple University Hospital (TUH) EEG Corpus (TUEG) based on the medical reports, using a rule-based text classifier. We generated a dataset of 15,300 newly labelled recordings, which we call the TUH Abnormal Expansion EEG Corpus (TUABEX), and which is five times larger than the TUAB. Since the TUABEX contains more pathological (75%) than non-pathological (25%) recordings, we then selected a balanced subset of 8,879 recordings, the TUH Abnormal Expansion Balanced EEG Corpus (TUABEXB). To investigate how training on a larger, automatically labelled dataset affects the decoding performance of deep neural networks, we applied four established deep convolutional neural networks (ConvNets) to the task of pathological versus non-pathological classification and compared the performance of each architecture after training on the different datasets. The results show that training on the automatically labelled TUABEXB dataset rather than on the manually labelled TUAB dataset increases accuracies on TUABEXB, and even on TUAB itself for some architectures.
We argue that automatic labelling of large open-source datasets can be used to efficiently utilise the massive amount of EEG data stored in clinical archives. We make the proposed TUABEXB available open source, thus offering a new dataset for EEG machine learning research.
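A rule-based report labeler of the kind described above can be as simple as keyword matching on report text. The patterns and function below are hypothetical stand-ins for illustration only, not the rules used to build TUABEX:

```python
import re

# Hypothetical keyword rules; the actual TUABEX classifier applied more
# elaborate rules to the TUH medical report text.
ABNORMAL = re.compile(r"abnormal eeg|epileptiform|focal slowing|sharp waves?", re.I)
NORMAL = re.compile(r"normal eeg|within normal limits|no epileptiform", re.I)

def label_report(report):
    """Return 'pathological', 'non-pathological', or None when undecided."""
    if ABNORMAL.search(report):
        return "pathological"
    if NORMAL.search(report):
        return "non-pathological"
    return None  # ambiguous reports are left out of the derived dataset

print(label_report("IMPRESSION: Abnormal EEG due to focal slowing."))
print(label_report("IMPRESSION: Normal EEG in wakefulness and drowsiness."))
```

Checking abnormal patterns first keeps reports such as "abnormal EEG" (which also contains the substring "normal EEG" semantics) from being mislabelled, and leaving ambiguous reports unlabelled trades dataset size for label precision.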
Affiliation(s)
- Ann-Kathrin Kiessner
- Neuromedical AI Lab, Department of Neurosurgery, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Engelbergerstr. 21, 79106 Freiburg, Germany; BrainLinks-BrainTools, IMBIT (Institute for Machine-Brain Interfacing Technology), University of Freiburg, Georges-Köhler-Allee 201, 79110 Freiburg, Germany; Autonomous Intelligent Systems, Computer Science Department - University of Freiburg, Faculty of Engineering, University of Freiburg, Georges-Köhler-Allee 80, 79110 Freiburg, Germany.
- Robin T Schirrmeister
- Neuromedical AI Lab, Department of Neurosurgery, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Engelbergerstr. 21, 79106 Freiburg, Germany; BrainLinks-BrainTools, IMBIT (Institute for Machine-Brain Interfacing Technology), University of Freiburg, Georges-Köhler-Allee 201, 79110 Freiburg, Germany
- Lukas A W Gemein
- Neuromedical AI Lab, Department of Neurosurgery, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Engelbergerstr. 21, 79106 Freiburg, Germany; Neurorobotics Lab, Computer Science Department - University of Freiburg, Faculty of Engineering, University of Freiburg, Georges-Köhler-Allee 80, 79110 Freiburg, Germany
- Joschka Boedecker
- BrainLinks-BrainTools, IMBIT (Institute for Machine-Brain Interfacing Technology), University of Freiburg, Georges-Köhler-Allee 201, 79110 Freiburg, Germany; Neurorobotics Lab, Computer Science Department - University of Freiburg, Faculty of Engineering, University of Freiburg, Georges-Köhler-Allee 80, 79110 Freiburg, Germany
- Tonio Ball
- Neuromedical AI Lab, Department of Neurosurgery, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Engelbergerstr. 21, 79106 Freiburg, Germany; BrainLinks-BrainTools, IMBIT (Institute for Machine-Brain Interfacing Technology), University of Freiburg, Georges-Köhler-Allee 201, 79110 Freiburg, Germany
Collapse
|
19
|
Badal K, Lee CM, Esserman LJ. Guiding principles for the responsible development of artificial intelligence tools for healthcare. Commun Med 2023; 3:47. [PMID: 37005467] [PMCID: PMC10066953] [DOI: 10.1038/s43856-023-00279-9]
Abstract
Several principles have been proposed to improve the use of artificial intelligence (AI) in healthcare, but the need for AI to address longstanding healthcare challenges has not been sufficiently emphasized. We propose that AI should be designed to alleviate health disparities, report clinically meaningful outcomes, reduce overdiagnosis and overtreatment, have high healthcare value, consider biographical drivers of health, be easily tailored to the local population, promote a learning healthcare system, and facilitate shared decision-making. These principles are illustrated by examples from breast cancer research, and we provide questions that AI developers can use when applying each principle to their work.
Affiliation(s)
- Kimberly Badal: Department of Surgery, Helen Diller Comprehensive Cancer Center, University of California, San Francisco, CA, USA
- Carmen M Lee: Department of Emergency Medicine, Highland Hospital, Alameda Health System, Alameda, CA, USA
- Laura J Esserman: Department of Surgery, Helen Diller Comprehensive Cancer Center, University of California, San Francisco, CA, USA
20
Modabbernia A, Whalley HC, Glahn DC, Thompson PM, Kahn RS, Frangou S. Systematic evaluation of machine learning algorithms for neuroanatomically-based age prediction in youth. Hum Brain Mapp 2022; 43:5126-5140. [PMID: 35852028] [PMCID: PMC9812239] [DOI: 10.1002/hbm.26010]
Abstract
Application of machine learning (ML) algorithms to structural magnetic resonance imaging (sMRI) data has yielded behaviorally meaningful estimates of the biological age of the brain (brain-age). The choice of the ML approach in estimating brain-age in youth is important because age-related brain changes in this age-group are dynamic. However, the comparative performance of the available ML algorithms has not been systematically appraised. To address this gap, the present study evaluated the accuracy (mean absolute error [MAE]) and computational efficiency of 21 machine learning algorithms using sMRI data from 2105 typically developing individuals aged 5-22 years from five cohorts. The trained models were then tested in two independent holdout datasets, one comprising 4078 individuals aged 9-10 years and another comprising 594 individuals aged 5-21 years. The algorithms encompassed parametric and nonparametric, Bayesian, linear and nonlinear, tree-based, and kernel-based models. Sensitivity analyses were performed for parcellation scheme, number of neuroimaging input features, number of cross-validation folds, number of extreme outliers, and sample size. Tree-based models and algorithms with a nonlinear kernel performed comparably well, with the latter being especially computationally efficient. Extreme Gradient Boosting (MAE of 1.49 years), Random Forest Regression (MAE of 1.58 years), and Support Vector Regression (SVR) with Radial Basis Function (RBF) Kernel (MAE of 1.64 years) emerged as the three most accurate models. Linear algorithms, with the exception of Elastic Net Regression, performed poorly. Findings of the present study could be used as a guide for optimizing methodology when quantifying brain-age in youth.
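The benchmarking design can be sketched with scikit-learn on synthetic data. This is illustrative only: the features, sample, and resulting MAE values below are stand-ins, not the study's data or results:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))  # stand-in for parcel-wise sMRI features
age = 13.5 + 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=1.0, size=300)

# A few of the algorithm families compared in the study: tree ensembles,
# an RBF-kernel SVR, and a penalized linear model.
models = {
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "svr_rbf": SVR(kernel="rbf"),
    "elastic_net": ElasticNet(alpha=0.1),
}
# Cross-validated mean absolute error, the accuracy metric used in the study.
mae = {
    name: -cross_val_score(m, X, age, cv=5, scoring="neg_mean_absolute_error").mean()
    for name, m in models.items()
}
best = min(mae, key=mae.get)
```

Wall-clock fitting time could be recorded alongside MAE to reproduce the study's accuracy-versus-efficiency trade-off on real data.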
Affiliation(s)
- Heather C. Whalley: Division of Psychiatry, University of Edinburgh, Kennedy Tower, Royal Edinburgh Hospital, Edinburgh, UK
- David C. Glahn: Boston Children's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Paul M. Thompson: Imaging Genetics Center, Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Rene S. Kahn: Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Sophia Frangou: Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, USA; Department of Psychiatry, Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, British Columbia, Canada
21
Li Y, Salimi-Khorshidi G, Rao S, Canoy D, Hassaine A, Lukasiewicz T, Rahimi K, Mamouei M. Validation of risk prediction models applied to longitudinal electronic health record data for the prediction of major cardiovascular events in the presence of data shifts. Eur Heart J Digit Health 2022; 3:535-547. [PMID: 36710898] [PMCID: PMC9779795] [DOI: 10.1093/ehjdh/ztac061]
Abstract
Aims: Deep learning has come to dominate predictive modelling across many fields, but in medicine it has met with a mixed reception. In clinical practice, simple statistical models and risk scores continue to inform cardiovascular disease risk predictions. This is due in part to a knowledge gap about how deep learning models perform in practice when subjected to dynamic data shifts, a key criterion that common internal validation procedures do not address. We evaluated the performance of a novel deep learning model, BEHRT, under data shifts and compared it with several machine-learning-based and established risk models. Methods and results: Using linked electronic health records of 1.1 million patients across England, aged at least 35 years between 1985 and 2015, we replicated three established statistical models for predicting 5-year risk of incident heart failure, stroke, and coronary heart disease. The results were compared with a widely accepted machine learning model (random forests) and a novel deep learning model (BEHRT). In addition to internal validation, we investigated how data shifts affect model discrimination and calibration by testing the models on cohorts from (i) distinct geographical regions and (ii) different time periods. Under internal validation, the deep learning models substantially outperformed the best statistical models by 6%, 8%, and 11% for heart failure, stroke, and coronary heart disease, respectively, in terms of the area under the receiver operating characteristic curve. Conclusion: The performance of all models declined as a result of data shifts; despite this, the deep learning models maintained the best performance in all risk prediction tasks. Updating the model with the latest information can improve discrimination, but if the prior distribution changes, the model may remain miscalibrated.
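The validation design (not the BEHRT model itself) can be sketched as follows: fit a risk model on one cohort, then measure discrimination on a cohort whose feature distribution has shifted, mimicking the geographical and temporal test splits. All data here are synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
coef = np.array([1.0, -0.5, 0.3, 0.0, 0.2])  # illustrative risk-factor weights

def make_cohort(n, shift=0.0):
    """Synthetic cohort; `shift` moves the feature distribution."""
    X = rng.normal(loc=shift, size=(n, 5))
    p = 1.0 / (1.0 + np.exp(-(X @ coef)))
    y = (rng.random(n) < p).astype(int)
    return X, y

X_train, y_train = make_cohort(2000)             # development cohort
X_int, y_int = make_cohort(1000)                 # internal validation
X_shift, y_shift = make_cohort(1000, shift=0.8)  # shifted region/period

model = LogisticRegression().fit(X_train, y_train)
auc_internal = roc_auc_score(y_int, model.predict_proba(X_int)[:, 1])
auc_shifted = roc_auc_score(y_shift, model.predict_proba(X_shift)[:, 1])
```

Comparing `auc_internal` with `auc_shifted` (and, for calibration, observed versus predicted risk on the shifted cohort) is the essence of the external checks the paper argues internal validation misses.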
Affiliation(s)
- Yikuan Li: Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK; Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
- Gholamreza Salimi-Khorshidi: Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK; Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
- Shishir Rao: Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK; Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
- Dexter Canoy: Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK; Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Abdelaali Hassaine: Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK; Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
- Kazem Rahimi: Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK; Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Mohammad Mamouei: Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK; Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
22
Thomas AW, Ré C, Poldrack RA. Interpreting mental state decoding with deep learning models. Trends Cogn Sci 2022; 26:972-986. [PMID: 36223760] [DOI: 10.1016/j.tics.2022.07.003]
Abstract
In mental state decoding, researchers aim to identify the set of mental states (e.g., experiencing happiness or fear) that can be reliably identified from the activity patterns of a brain region (or network). Deep learning (DL) models are highly promising for mental state decoding because of their unmatched ability to learn versatile representations of complex data. However, their widespread application in mental state decoding is hindered by their lack of interpretability, difficulties in applying them to small datasets, and in ensuring their reproducibility and robustness. We recommend approaching these challenges by leveraging recent advances in explainable artificial intelligence (XAI) and transfer learning, and also provide recommendations on how to improve the reproducibility and robustness of DL models in mental state decoding.
Affiliation(s)
- Armin W Thomas: Stanford Data Science, Stanford University, Stanford, CA, USA; Department of Psychology, Stanford University, Stanford, CA, USA
- Christopher Ré: Department of Computer Science, Stanford University, Stanford, CA, USA
- Russell A Poldrack: Stanford Data Science, Stanford University, Stanford, CA, USA; Department of Psychology, Stanford University, Stanford, CA, USA
23
Bayer JMM, Thompson PM, Ching CRK, Liu M, Chen A, Panzenhagen AC, Jahanshad N, Marquand A, Schmaal L, Sämann PG. Site effects how-to and when: An overview of retrospective techniques to accommodate site effects in multi-site neuroimaging analyses. Front Neurol 2022; 13:923988. [PMID: 36388214] [PMCID: PMC9661923] [DOI: 10.3389/fneur.2022.923988]
Abstract
Site differences, or systematic differences in feature distributions across multiple data-acquisition sites, are a known source of heterogeneity that may adversely affect large-scale meta- and mega-analyses of independently collected neuroimaging data. They influence nearly all multi-site imaging modalities and biomarkers, and methods to compensate for them can improve reliability and generalizability in the analysis of genetics, omics, and clinical data. The origins of statistical site effects are complex and involve both technical differences (scanner vendor, head coil, acquisition parameters, image processing) and differences in sample characteristics (inclusion/exclusion criteria, sample size, ancestry) between sites. In an age of expanding international consortium research, there is a growing need to disentangle technical site effects from sample characteristics of interest. Numerous statistical and machine learning methods have been developed to control for, model, or attenuate site effects, yet to date no comprehensive review has discussed the benefits and drawbacks of each for different use cases. Here, we provide an overview of the existing statistical and machine learning methods developed to remove unwanted site effects from independently collected neuroimaging samples. We focus on linear mixed effect models, the ComBat technique and its variants, adjustments based on image quality metrics, normative modeling, and deep learning approaches such as generative adversarial networks. For each method, we outline the statistical foundation and summarize strengths and weaknesses, including their assumptions and conditions of use. We provide information on software availability, and comment on ease of use and the applicability of these methods to different types of data. We discuss validation and comparative reports, mention caveats, and provide guidance on when to use each method, depending on context and specific research questions.
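The simplest relative of the adjustment families reviewed here is additive site-effect removal by per-site mean-centering. The sketch below is a crude illustration only, not ComBat: real methods also model site-specific variance and preserve covariates of interest:

```python
import numpy as np

rng = np.random.default_rng(2)
site = np.repeat([0, 1, 2], 100)                # 3 acquisition sites
site_offset = np.array([0.0, 0.6, -0.4])[site]  # additive scanner bias
X = rng.normal(size=300) + site_offset          # one imaging-derived feature

# Center each site on its own mean, then restore the grand mean.
grand_mean = X.mean()
X_adj = X.copy()
for s in np.unique(site):
    X_adj[site == s] += grand_mean - X[site == s].mean()

site_means_before = [X[site == s].mean() for s in np.unique(site)]
site_means_after = [X_adj[site == s].mean() for s in np.unique(site)]
spread_before = np.ptp(site_means_before)  # large: site effects present
spread_after = np.ptp(site_means_after)    # ~0: additive effects removed
```

The danger the review highlights is visible even here: if site were correlated with a variable of interest (for example, diagnosis), this naive centering would also remove part of the signal, which is why covariate-aware methods such as ComBat exist.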
Affiliation(s)
- Johanna M. M. Bayer: Centre for Youth Mental Health, University of Melbourne, Melbourne, VIC, Australia; Orygen, Parkville, VIC, Australia
- Paul M. Thompson: Imaging Genetics Center, Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Marina del Rey, CA, United States
- Christopher R. K. Ching: Imaging Genetics Center, Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Marina del Rey, CA, United States
- Mengting Liu: School of Biomedical Engineering, Sun Yat-sen University, Shenzhen, China
- Andrew Chen: Department of Biostatistics, Epidemiology, and Informatics, Penn Statistics in Imaging and Visualization Center, University of Pennsylvania, Philadelphia, PA, United States; Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA, United States
- Alana C. Panzenhagen: Programa de Pós-graduação em Ciências Biológicas: Bioquímica, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil; Department of Translational Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
- Neda Jahanshad: Laboratory of Brain eScience, Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Marina del Rey, CA, United States
- Andre Marquand: Department of Cognitive Neuroscience, Donders Institute for Brain, Cognition and Behavior, Radboudumc, Nijmegen, Netherlands
- Lianne Schmaal: Centre for Youth Mental Health, University of Melbourne, Melbourne, VIC, Australia; Orygen, Parkville, VIC, Australia
24
Sabovčik F, Ntalianis E, Cauwenberghs N, Kuznetsova T. Improving predictive performance in incident heart failure using machine learning and multi-center data. Front Cardiovasc Med 2022; 9:1011071. [PMID: 36330000] [PMCID: PMC9623026] [DOI: 10.3389/fcvm.2022.1011071]
Abstract
Objective: To mitigate the burden associated with heart failure (HF), primary prevention is of the utmost importance. To improve early risk stratification, advanced computational methods such as machine learning (ML), which can capture complex individual patterns in large data, may be necessary. We therefore compared the predictive performance of incident HF risk models in terms of (a) flexible ML models versus linear models and (b) models trained on a single cohort (single-center) versus multiple heterogeneous cohorts (multi-center). Design and methods: Our analysis pooled data on 30,354 individuals from 6 cohorts. During a median follow-up of 5.40 years, 1,068 individuals experienced a non-fatal HF event. We evaluated the predictive performance of survival gradient boosting (SGB), CoxNet, the PCP-HF risk score, and a stacking method. Predictions were obtained iteratively, with one cohort serving as an external test set in each iteration and either one or all remaining cohorts as the training set (single- or multi-center, respectively). Results: Overall, multi-center models systematically outperformed single-center models. Furthermore, the c-index in the pooled population was higher for SGB (0.735) than for CoxNet (0.694). In the precision-recall (PR) analysis for predicting 10-year HF risk, the stacking method, combining the SGB, CoxNet, Gaussian mixture, and PCP-HF models, outperformed the other models with a PR AUC of 0.804, while PCP-HF achieved only 0.551. Conclusion: With a greater number and variety of training cohorts, a model learns a wider range of specific individual health characteristics. Flexible ML algorithms can capture these diverse distributions and produce more precise prediction models.
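The stacking idea can be sketched with scikit-learn on synthetic, class-imbalanced data. The base learners below are simplified stand-ins for SGB and CoxNet (the survival-analysis details are omitted), combined by a logistic meta-learner and scored with a precision-recall metric as in the abstract:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Imbalanced outcome (~10% events), loosely mirroring incident-HF rates.
X, y = make_classification(n_samples=1500, n_features=12, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("boost", GradientBoostingClassifier(random_state=0)),   # SGB stand-in
        ("linear", LogisticRegression(max_iter=1000)),           # CoxNet stand-in
    ],
    final_estimator=LogisticRegression(),  # meta-learner over base predictions
)
stack.fit(X_tr, y_tr)
pr_auc = average_precision_score(y_te, stack.predict_proba(X_te)[:, 1])
```

In the paper's leave-one-cohort-out design, the train/test split would instead be by cohort, so that each evaluation is genuinely external.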
Affiliation(s)
- Tatiana Kuznetsova: Research Unit of Hypertension and Cardiovascular Epidemiology, KU Leuven Department of Cardiovascular Sciences, University of Leuven, Leuven, Belgium
25
Feng C, Wang Z, Liu C, Liu S, Wang Y, Zeng Y, Wang Q, Peng T, Pu X, Liu J. Integrated bioinformatical analysis, machine learning and in vitro experiment-identified m6A subtype, and predictive drug target signatures for diagnosing renal fibrosis. Front Pharmacol 2022; 13:909784. [PMID: 36120336] [PMCID: PMC9470879] [DOI: 10.3389/fphar.2022.909784]
Abstract
Renal biopsy is the gold standard for defining renal fibrosis, which causes calcium deposits in the kidneys. Persistent calcium deposition leads to kidney inflammation and cell necrosis, and is related to serious kidney diseases. However, biopsy is invasive and carries the risk of complications such as bleeding, especially in patients with end-stage renal disease. It is therefore necessary to identify specific diagnostic biomarkers for renal fibrosis. This study aimed to develop a predictive drug target signature to diagnose renal fibrosis based on m6A subtypes. We performed an unsupervised consensus clustering analysis to identify three distinct m6A subtypes of renal fibrosis based on the expression of 21 m6A regulators. We evaluated the immune infiltration characteristics and the expression of canonical immune checkpoints and immune-related genes across the distinct m6A modification patterns. Subsequently, we performed WGCNA on the expression data of 1,611 drug targets to identify 474 genes associated with m6A modification. The 92 drug targets overlapping between WGCNA and the DEGs (renal fibrosis vs. normal samples) were defined as key drug targets. A five-target-gene predictive model was developed through a combination of LASSO regression and stepwise logistic regression (LASSO-SLR) to diagnose renal fibrosis. We further performed drug sensitivity and extracellular matrix analyses on the model genes. The ROC curve showed that the risk score performed well in diagnosing renal fibrosis in the training dataset (AUC = 0.863), and the external validation dataset further confirmed its predictive performance (AUC = 0.755). Furthermore, our results show that this five-gene model is significantly associated with many drugs and with extracellular matrix activities. Finally, the expression levels of the predictive signature genes EGR1 and PLA2G4A were validated in renal fibrosis and adjacent normal tissues using qRT-PCR and Western blotting.
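The signature-selection step can be sketched with an L1-penalized logistic regression, which shrinks most candidate-gene coefficients to zero and leaves a small diagnostic panel. This is a simplified stand-in for the LASSO-SLR pipeline on synthetic data; the gene indices are arbitrary:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
n, p = 200, 50                 # samples x candidate drug-target genes
X = rng.normal(size=(n, p))
informative = [0, 1, 2, 3, 4]  # five truly diagnostic "genes" (synthetic)
logits = X[:, informative].sum(axis=1)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# L1 penalty zeroes out uninformative coefficients; C controls sparsity.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.2).fit(X, y)
selected = np.flatnonzero(lasso.coef_[0])   # surviving panel members
auc = roc_auc_score(y, lasso.decision_function(X))
```

In the paper, a stepwise logistic regression then prunes the LASSO survivors further, and the resulting risk score is evaluated on a held-out external dataset rather than, as here, on the training data.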
Affiliation(s)
- Chunxiang Feng: Department of Urology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, Guangdong, China
- Zhixian Wang: Department of Urology, Wuhan Hospital of Traditional Chinese and Western Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; Department of Urology, Wuhan No. 1 Hospital, Wuhan, China
- Chang Liu: Department of Geriatrics, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Shiliang Liu: Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Yuxi Wang: Department of Nephrology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Yuanyuan Zeng: School of Life Science and Engineering, Southwest Jiaotong University, Chengdu, China
- Qianqian Wang: Department of Urology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, Guangdong, China
- Tianming Peng: Department of Urology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, Guangdong, China
- Xiaoyong Pu (correspondence): Department of Urology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, Guangdong, China
- Jiumin Liu (correspondence): Department of Urology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, Guangdong, China
26
Spisak T. Statistical quantification of confounding bias in machine learning models. Gigascience 2022; 11:giac082. [PMID: 36017878] [PMCID: PMC9412867] [DOI: 10.1093/gigascience/giac082]
Abstract
BACKGROUND: The lack of nonparametric statistical tests for confounding bias significantly hampers the development of robust, valid, and generalizable predictive models in many fields of research. Here I propose the partial confounder test, which, for a given confounder variable, probes the null hypothesis that the model is unconfounded. RESULTS: The test provides strict control of type I errors and high statistical power, even for non-normally and nonlinearly dependent predictions, as often seen in machine learning. Applying the proposed test to models trained on large-scale functional brain connectivity data (N = 1,865) (i) reveals previously unreported confounders and (ii) shows that state-of-the-art confound mitigation approaches may fail to prevent confounding bias in several cases. CONCLUSIONS: The proposed test (implemented in the package mlconfound; https://mlconfound.readthedocs.io) can aid the assessment and improvement of the generalizability and validity of predictive models and thereby fosters the development of clinically useful machine learning biomarkers.
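The core question the test asks can be illustrated with a simplified permutation test (the mlconfound package implements the full, conditional-permutation version): do the predictions carry information about the confounder beyond what the true target explains? In this synthetic setup, the target and confounder are generated independently, so a naive permutation of the confounder is a valid null:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
c = rng.normal(size=n)                               # confounder
y = rng.normal(size=n)                               # target, independent of c
yhat = 0.7 * y + 0.3 * c + 0.1 * rng.normal(size=n)  # confounded predictions

def partial_corr(a, b, z):
    """Correlation of a and b after linearly regressing z out of both."""
    ra = a - np.polyval(np.polyfit(z, a, 1), z)
    rb = b - np.polyval(np.polyfit(z, b, 1), z)
    return np.corrcoef(ra, rb)[0, 1]

# Observed confounder dependence of the predictions, given the target,
# compared against a permutation null distribution.
obs = abs(partial_corr(yhat, c, y))
null = [abs(partial_corr(yhat, rng.permutation(c), y)) for _ in range(200)]
p_value = (1 + sum(t >= obs for t in null)) / (1 + len(null))
```

A small `p_value` flags the model as confounded, as it should here since `yhat` leaks `c`. The actual partial confounder test permutes conditionally on the target and is not restricted to linear dependence.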
Affiliation(s)
- Tamas Spisak: Center for Translational Neuro- and Behavioral Sciences, Institute for Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Essen, D-45147, Germany
27
Leonardsen EH, Peng H, Kaufmann T, Agartz I, Andreassen OA, Celius EG, Espeseth T, Harbo HF, Høgestøl EA, Lange AMD, Marquand AF, Vidal-Piñeiro D, Roe JM, Selbæk G, Sørensen Ø, Smith SM, Westlye LT, Wolfers T, Wang Y. Deep neural networks learn general and clinically relevant representations of the ageing brain. Neuroimage 2022; 256:119210. [PMID: 35462035] [PMCID: PMC7614754] [DOI: 10.1016/j.neuroimage.2022.119210]
Abstract
The discrepancy between chronological age and the apparent age of the brain based on neuroimaging data, the brain age delta, has emerged as a reliable marker of brain health. With an increasing wealth of data, approaches to tackle heterogeneity in data acquisition are vital. To this end, we compiled raw structural magnetic resonance images into one of the largest and most diverse datasets assembled (n = 53,542), and trained convolutional neural networks (CNNs) to predict age. We achieved state-of-the-art performance on unseen data from unknown scanners (n = 2,553), and showed that a higher brain age delta is associated with diabetes, alcohol intake, and smoking. Using transfer learning, the intermediate representations learned by our model complemented and partly outperformed brain age delta in predicting common brain disorders. Our work shows that we can achieve generalizable and biologically plausible brain age predictions using CNNs trained on heterogeneous datasets, and transfer them to clinical use cases.
Affiliation(s)
- Esten H Leonardsen: Department of Psychology, University of Oslo, Oslo, Norway; Norwegian Centre for Mental Disorders Research (NORMENT), Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Han Peng: Wellcome Centre for Integrative Neuroimaging (WIN FMRIB), University of Oxford, Oxford, OX3 9DU, United Kingdom
- Tobias Kaufmann: Norwegian Centre for Mental Disorders Research (NORMENT), Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway; Department of Psychiatry and Psychotherapy, Tübingen Center for Mental Health, University of Tübingen, Germany
- Ingrid Agartz: Norwegian Centre for Mental Disorders Research (NORMENT), Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway; Department of Psychiatric Research, Diakonhjemmet Hospital, Oslo, Norway; Centre for Psychiatry Research, Department of Clinical Neuroscience, Karolinska Institutet & Stockholm Health Care Services, Stockholm County Council, Stockholm, Sweden
- Ole A Andreassen: Norwegian Centre for Mental Disorders Research (NORMENT), Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Elisabeth Gulowsen Celius: Department of Neurology, Oslo University Hospital, Norway; Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Thomas Espeseth: Department of Psychology, University of Oslo, Oslo, Norway; Department of Psychology, Bjørknes University College, Oslo, Norway
- Hanne F Harbo: Department of Neurology, Oslo University Hospital, Norway; Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Einar A Høgestøl: Department of Psychology, University of Oslo, Oslo, Norway; Norwegian Centre for Mental Disorders Research (NORMENT), Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway; Department of Neurology, Oslo University Hospital, Norway
- Ann-Marie de Lange: Department of Psychology, University of Oslo, Oslo, Norway; LREN, Centre for Research in Neurosciences, Department of Clinical Neurosciences, CHUV and University of Lausanne, Lausanne, Switzerland; Department of Psychiatry, University of Oxford, Oxford, UK
- Andre F Marquand: Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Centre, Nijmegen, Netherlands
- James M Roe: Department of Psychology, University of Oslo, Oslo, Norway
- Geir Selbæk: Norwegian National Advisory Unit on Aging and Health, Vestfold Hospital Trust, Tønsberg, Norway; Department of Geriatric Medicine, Oslo University Hospital, Oslo, Norway
- Stephen M Smith: Wellcome Centre for Integrative Neuroimaging (WIN FMRIB), University of Oxford, Oxford, OX3 9DU, United Kingdom
- Lars T Westlye: Department of Psychology, University of Oslo, Oslo, Norway; Norwegian Centre for Mental Disorders Research (NORMENT), Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway; KG Jebsen Center for Neurodevelopmental Disorders, University of Oslo, Oslo, Norway
- Thomas Wolfers: Department of Psychology, University of Oslo, Oslo, Norway; Norwegian Centre for Mental Disorders Research (NORMENT), Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Yunpeng Wang: Department of Psychology, University of Oslo, Oslo, Norway
28
Dobosz P, Stempor PA, Ramírez Moreno M, Bulgakova NA. Transcriptional and post-transcriptional regulation of checkpoint genes on the tumour side of the immunological synapse. Heredity (Edinb) 2022; 129:64-74. [PMID: 35459932] [PMCID: PMC9273643] [DOI: 10.1038/s41437-022-00533-1]
Abstract
Cancer is a disease of the genome; its development therefore has a clear Mendelian component, demonstrated by well-studied genes such as BRCA1 and BRCA2 in breast cancer risk. However, a single genetic variant is not enough for cancer to develop, leading to the theory of multistage carcinogenesis. In many cases, it is a sequence of events, acquired somatic mutations, or simply polygenic components with strong epigenetic effects, as in the case of brain tumours. The expression of many genes is the product of a complex interplay between several factors, including the organism's genotype (in most cases Mendelian-inherited), genetic instability, epigenetic factors (non-Mendelian-inherited), and the immune response of the host, to name just a few. In recent years, the importance of the immune system has been elevated, especially in light of the discovery of immune checkpoint genes and the subsequent development of their inhibitors. As the expression of these genes normally suppresses self-immunoreactivity, their expression by tumour cells prevents the elimination of the tumour by the immune system. These discoveries led to the rapid growth of the field of immuno-oncology, which offers new possibilities for long-lasting and effective treatment. Here we discuss recent advances in the understanding of the key mechanisms controlling the expression of immune checkpoint genes in tumour cells.
Affiliation(s)
- Paula Dobosz
- Central Clinical Hospital of the Ministry of Interior Affairs and Administration in Warsaw, Warsaw, Poland
- Miguel Ramírez Moreno
- School of Biosciences and Bateson Centre, The University of Sheffield, Sheffield, UK
- Natalia A Bulgakova
- School of Biosciences and Bateson Centre, The University of Sheffield, Sheffield, UK.
29
Varoquaux G, Cheplygina V. Machine learning for medical imaging: methodological failures and recommendations for the future. NPJ Digit Med 2022; 5:48. [PMID: 35413988 PMCID: PMC9005663 DOI: 10.1038/s41746-022-00592-y] [Citation(s) in RCA: 144] [Impact Index Per Article: 72.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2021] [Accepted: 03/09/2022] [Indexed: 12/23/2022] Open
Abstract
Research in computer analysis of medical images bears many promises to improve patients' health. However, a number of systematic challenges are slowing down the progress of the field, from limitations of the data, such as biases, to research incentives, such as optimizing for publication. In this paper we review roadblocks to developing and assessing methods. Building our analysis on evidence from the literature and from data challenges, we show that potential biases can creep in at every step. On a positive note, we also discuss ongoing efforts to counteract these problems. Finally, we provide recommendations on how to further address these problems in the future.
Affiliation(s)
- Gaël Varoquaux
- INRIA, Versailles, France.
- McGill University, Montreal, Canada.
- Mila, Montreal, Canada.
30
Crosby D, Bhatia S, Brindle KM, Coussens LM, Dive C, Emberton M, Esener S, Fitzgerald RC, Gambhir SS, Kuhn P, Rebbeck TR, Balasubramanian S. Early detection of cancer. Science 2022; 375:eaay9040. [PMID: 35298272 DOI: 10.1126/science.aay9040] [Citation(s) in RCA: 257] [Impact Index Per Article: 128.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
Survival improves when cancer is detected early. However, ~50% of cancers are at an advanced stage when diagnosed. Early detection of cancer or precancerous change allows early intervention to try to slow or prevent cancer development and lethality. To achieve early detection of all cancers, numerous challenges must be overcome. It is vital to better understand who is at greatest risk of developing cancer. We also need to elucidate the biology and trajectory of precancer and early cancer to identify consequential disease that requires intervention. Insights must be translated into sensitive and specific early detection technologies and be appropriately evaluated to support practical clinical implementation. Interdisciplinary collaboration is key; advances in technology and biological understanding highlight that it is time to accelerate early detection research and transform cancer survival.
Affiliation(s)
- Sangeeta Bhatia
- Marble Center for Cancer Nanomedicine, Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, USA
- Kevin M Brindle
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Lisa M Coussens
- Cell, Developmental and Cancer Biology, Oregon Health and Science University, Portland, OR, USA
- Knight Cancer Institute, Oregon Health and Science University, Portland, OR, USA
- Caroline Dive
- Cancer Research UK Lung Cancer Centre of Excellence at the University of Manchester and University College London, University of Manchester, Manchester, UK
- CRUK Manchester Institute Cancer Biomarker Centre, University of Manchester, Manchester, UK
- Mark Emberton
- Division of Surgery and Interventional Science, University College London, London, UK
- Sadik Esener
- Knight Cancer Institute, Oregon Health and Science University, Portland, OR, USA
- Department of Biomedical Engineering, School of Medicine, Oregon Health and Science University, Portland, OR, USA
- Cancer Early Detection Advanced Research Center, Oregon Health and Science University, Portland, OR, USA
- Rebecca C Fitzgerald
- Medical Research Council (MRC) Cancer Unit, Hutchison/MRC Research Centre, University of Cambridge, Cambridge, UK
- Sanjiv S Gambhir
- Department of Radiology, Molecular Imaging Program at Stanford, Stanford University, Stanford, CA, USA
- Peter Kuhn
- USC Michelson Center Convergent Science Institute in Cancer, University of Southern California, Los Angeles, CA, USA
- Timothy R Rebbeck
- Division of Population Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Shankar Balasubramanian
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
31
Eichinski P, Alexander C, Roe P, Parsons S, Fuller S. A Convolutional Neural Network Bird Species Recognizer Built From Little Data by Iteratively Training, Detecting, and Labeling. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.810330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Automatically detecting the calls of species of interest in audio recordings is a common but often challenging exercise in ecoacoustics. This challenge is increasingly being tackled with deep neural networks, which generally require a rich set of training data. Often, the available training data are not from the same geographical region as the study area and so may contain important differences. This mismatch between training and deployment datasets can reduce accuracy at deployment, mainly because confusing sounds absent from the training data generate false positives, and because call types vary somewhat between regions. We developed a multiclass convolutional neural network classifier for seven target bird species to track the presence/absence of these species over time in cotton-growing regions. We started with no training data from cotton regions, but we did have an unbalanced library of calls from other locations. Because calls are relatively scarce in recordings from cotton regions, manually scanning and labeling the recordings was prohibitively time consuming. In this paper we describe our process for overcoming this data mismatch to develop a recognizer that performs well on the cotton recordings for most classes. The recognizer was trained on recordings from outside the cotton regions and then applied to unlabeled cotton recordings. Based on the resulting outputs, a verification set was chosen to be manually tagged and incorporated into the training set. By iterating this process, we gradually built up the training set of cotton audio examples, increasing the average class F1 score (the harmonic mean of precision and recall) of the recognizer on target recordings from 0.45 in the first iteration to 0.74.
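The abstract's headline metric is the average class F1 score: for each class, the harmonic mean of precision and recall, then averaged over the target classes. A minimal sketch of that computation in plain Python (the function and variable names are illustrative, not taken from the paper's code):

```python
def per_class_f1(y_true, y_pred, classes):
    """F1 score (harmonic mean of precision and recall) for each class."""
    scores = {}
    for c in classes:
        # One-vs-rest counts for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores

def average_class_f1(y_true, y_pred, classes):
    """Macro average: each class weighs equally, regardless of frequency."""
    scores = per_class_f1(y_true, y_pred, classes)
    return sum(scores.values()) / len(scores)
```

A macro average like this gives rare call types the same weight as common ones, which matters for the paper's unbalanced call library; a frequency-weighted average would mask poor performance on scarce classes.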