1. Horng H, Scott C, Winham S, Jensen M, Pantalone L, Mankowski W, Kerlikowske K, Vachon CM, Kontos D, Shinohara RT. Multivariate testing and effect size measures for batch effect evaluation in radiomic features. Sci Rep 2024; 14:13923. [PMID: 38886407 PMCID: PMC11183083 DOI: 10.1038/s41598-024-64208-z] [Received: 01/28/2024] [Accepted: 06/06/2024] [Indexed: 06/20/2024]
Abstract
While precision medicine applications of radiomics analysis are promising, differences in image acquisition can cause "batch effects" that reduce reproducibility and affect downstream predictive analyses. Harmonization methods such as ComBat have been developed to correct these effects, but evaluation methods for quantifying batch effects are inconsistent. In this study, we propose the use of the multivariate statistical test PERMANOVA and the Robust Effect Size Index (RESI) to better quantify and characterize batch effects in radiomics data. We evaluate these methods in both simulated and real radiomics features extracted from full-field digital mammography (FFDM) data. PERMANOVA demonstrated higher power than standard univariate statistical testing, and RESI was able to interpretably quantify the effect size of site at extremely large sample sizes. These methods show promise as more powerful and interpretable methods for the detection and quantification of batch effects in radiomics studies.
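To make the multivariate test named in the abstract concrete, here is a minimal sketch of a one-way PERMANOVA on a feature matrix with a batch (site) label. It is not the authors' implementation: the function name `permanova`, the choice of Euclidean distance, the number of permutations, and the simulated two-site data are all assumptions made for illustration.

```python
import numpy as np

def permanova(features, batch, n_perm=999, seed=0):
    """One-way PERMANOVA on Euclidean distances (illustrative sketch).

    Returns the observed pseudo-F statistic and a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    batch = np.asarray(batch)
    n = features.shape[0]

    # Squared Euclidean distance between every pair of samples.
    diff = features[:, None, :] - features[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)

    def pseudo_f(labels):
        groups = np.unique(labels)
        a = len(groups)
        # The full symmetric matrix counts each pair twice, hence the 2.
        ss_total = d2.sum() / (2 * n)
        ss_within = 0.0
        for g in groups:
            idx = np.flatnonzero(labels == g)
            ss_within += d2[np.ix_(idx, idx)].sum() / (2 * len(idx))
        ss_between = ss_total - ss_within
        return (ss_between / (a - 1)) / (ss_within / (n - a))

    observed = pseudo_f(batch)
    # Null distribution: shuffle batch labels, keep the distances fixed.
    perm_stats = np.array(
        [pseudo_f(rng.permutation(batch)) for _ in range(n_perm)]
    )
    p_value = (1 + np.sum(perm_stats >= observed)) / (n_perm + 1)
    return observed, p_value

# Simulated example (assumed data): two sites, the second with shifted means.
data_rng = np.random.default_rng(42)
site_a = data_rng.normal(0.0, 1.0, size=(30, 5))
site_b = data_rng.normal(0.8, 1.0, size=(30, 5))
X = np.vstack([site_a, site_b])
labels = np.array(["A"] * 30 + ["B"] * 30)

f_stat, p = permanova(X, labels)
print(f"pseudo-F = {f_stat:.2f}, permutation p = {p:.3f}")
```

Because the test works on the whole distance matrix, a single p-value summarizes the batch effect across all features at once, rather than one p-value per feature as in univariate testing.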
Affiliation(s)
- Hannah Horng
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Department of Radiology, Center for Biomedical Image Computing and Analysis (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Penn Statistics in Imaging Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Lauren Pantalone
  - Department of Radiology, Center for Biomedical Image Computing and Analysis (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA
- Walter Mankowski
  - Department of Radiology, Center for Biomedical Image Computing and Analysis (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA
- Despina Kontos
  - Department of Radiology, Center for Biomedical Image Computing and Analysis (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Center for Innovation in Imaging Biomarkers and Integrated Diagnostics (CIMBID), Columbia University, New York, NY, 10027, USA
- Russell T Shinohara
  - Department of Radiology, Center for Biomedical Image Computing and Analysis (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Penn Statistics in Imaging Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA
2. Chen AA, Weinstein SM, Adebimpe A, Gur RC, Gur RE, Merikangas KR, Satterthwaite TD, Shinohara RT, Shou H. Similarity-based multimodal regression. Biostatistics 2023:kxad033. [PMID: 38058018 DOI: 10.1093/biostatistics/kxad033] [Received: 11/22/2022] [Revised: 10/07/2023] [Accepted: 11/06/2023] [Indexed: 12/08/2023]
Abstract
To better understand complex human phenotypes, large-scale studies have increasingly collected multiple data modalities across domains such as imaging, mobile health, and physical activity. The properties of each data type often differ substantially and require either separate analyses or extensive processing to obtain comparable features for a combined analysis. Multimodal data fusion enables certain analyses on matrix-valued and vector-valued data, but it generally cannot integrate modalities of different dimensions and data structures. For a single data modality, multivariate distance matrix regression provides a distance-based framework for regression accommodating a wide range of data types. However, no distance-based method exists to handle multiple complementary types of data. We propose a novel distance-based regression model, which we refer to as Similarity-based Multimodal Regression (SiMMR), that enables simultaneous regression of multiple modalities through their distance profiles. We demonstrate through simulation, imaging studies, and longitudinal mobile health analyses that our proposed method can detect associations between clinical variables and multimodal data of differing properties and dimensionalities, even with modest sample sizes. We perform experiments to evaluate several different test statistics and provide recommendations for applying our method across a broad range of scenarios.
Affiliation(s)
- Andrew A Chen
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425, USA
- Sarah M Weinstein
  - Department of Epidemiology and Biostatistics, Temple University College of Public Health, Philadelphia, PA 19122, USA
- Azeez Adebimpe
  - Penn Lifespan Informatics & Neuroimaging Center, Department of Psychiatry, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Psychiatry, University of Pennsylvania, Philadelphia, PA 19104, USA
- Ruben C Gur
  - Department of Psychiatry, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Lifespan Brain Institute Penn Medicine and CHOP, University of Pennsylvania, Philadelphia, PA 19104, USA
- Raquel E Gur
  - Department of Psychiatry, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Lifespan Brain Institute Penn Medicine and CHOP, University of Pennsylvania, Philadelphia, PA 19104, USA
- Kathleen R Merikangas
  - Genetic Epidemiology Research Branch, Intramural Research Program, National Institute of Mental Health, Bethesda, MD 20892, USA
- Theodore D Satterthwaite
  - Penn Lifespan Informatics & Neuroimaging Center, Department of Psychiatry, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Psychiatry, University of Pennsylvania, Philadelphia, PA 19104, USA
- Russell T Shinohara
  - Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Haochang Shou
  - Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA 19104, USA