1
Chintapalli SS, Wang R, Yang Z, Tassopoulou V, Yu F, Bashyam V, Erus G, Chaudhari P, Shou H, Davatzikos C. NeuroSynth: MRI-Derived Neuroanatomical Generative Models and Associated Dataset of 18,000 Samples. arXiv 2024; arXiv:2407.12897v1. PMID: 39070036; PMCID: PMC11275685.
Abstract
Availability of large and diverse medical datasets is often challenged by privacy and data sharing restrictions. For successful application of machine learning techniques for disease diagnosis, prognosis, and precision medicine, large amounts of data are necessary for model building and optimization. To help overcome such limitations in the context of brain MRI, we present NeuroSynth: a collection of generative models of normative regional volumetric features derived from structural brain imaging. NeuroSynth models are trained on real brain imaging regional volumetric measures from the iSTAGING consortium, which encompasses over 40,000 MRI scans across 13 studies, incorporating covariates such as age, sex, and race. Leveraging NeuroSynth, we produce and offer 18,000 synthetic samples spanning the adult lifespan (ages 22-90 years), alongside the model's capability to generate unlimited data. Experimental results indicate that samples generated from NeuroSynth agree with the distributions obtained from real data. Most importantly, the generated normative data significantly enhance the accuracy of downstream machine learning models on tasks such as disease classification. Data and models are available at: https://huggingface.co/spaces/rongguangw/neuro-synth.
Affiliation(s)
- Sai Spandana Chintapalli
  - Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Rongguang Wang
  - Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Zhijian Yang
  - Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Vasiliki Tassopoulou
  - Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Fanyang Yu
  - Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Vishnu Bashyam
  - Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Guray Erus
  - Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Pratik Chaudhari
  - Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104, USA
- Haochang Shou
  - Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Christos Davatzikos
  - Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
2
Shiyam Sundar LK, Beyer T. Is Automatic Tumor Segmentation on Whole-Body 18F-FDG PET Images a Clinical Reality? J Nucl Med 2024; 65:995-997. PMID: 38844359; PMCID: PMC11218718; DOI: 10.2967/jnumed.123.267183.
Abstract
The integration of automated whole-body tumor segmentation using 18F-FDG PET/CT images represents a pivotal shift in oncologic diagnostics, enhancing the precision and efficiency of tumor burden assessment. This editorial examines the transition toward automation, propelled by advancements in artificial intelligence, notably through deep learning techniques. We highlight the current availability of commercial tools and the academic efforts that have set the stage for these developments. Further, we comment on the challenges of data diversity, validation needs, and regulatory barriers. The role of metabolic tumor volume and total lesion glycolysis as vital metrics in cancer management underscores the significance of this evaluation. Despite promising progress, we call for increased collaboration across academia, clinical users, and industry to better realize the clinical benefits of automated segmentation, thus helping to streamline workflows and improve patient outcomes in oncology.
Affiliation(s)
- Thomas Beyer
  - Quantitative Imaging and Medical Physics Team, Medical University of Vienna, Vienna, Austria
3
Deshpande R, Kelkar VA, Gotsis D, Kc P, Zeng R, Myers KJ, Brooks FJ, Anastasio MA. Report on the AAPM Grand Challenge on deep generative modeling for learning medical image statistics. arXiv 2024; arXiv:2405.01822v1. PMID: 38745699; PMCID: PMC11092676.
Abstract
Background: The findings of the 2023 AAPM Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics are reported in this Special Report. Purpose: The goal of this challenge was to promote the development of deep generative models (DGMs) for medical imaging and to emphasize the need for their domain-relevant assessment via the analysis of relevant image statistics. Methods: As part of this Grand Challenge, a common training dataset and an evaluation procedure were developed for benchmarking deep generative models for medical image synthesis. To create the training dataset, an established 3D virtual breast phantom was adapted. The resulting dataset comprised about 108,000 images of size 512×512. For the evaluation of submissions to the Challenge, an ensemble of 10,000 DGM-generated images from each submission was employed. The evaluation procedure consisted of two stages. In the first stage, a preliminary check for memorization and image quality (via the Fréchet Inception Distance (FID)) was performed. Submissions that passed the first stage were then evaluated for the reproducibility of image statistics corresponding to several feature families, including texture, morphology, image moments, fractal statistics, and skeleton statistics. A summary measure in this feature space was employed to rank the submissions. Additional analyses of submissions were performed to assess DGM performance specific to individual feature families and to the four classes in the training data, and to identify various artifacts. Results: Fifty-eight submissions from 12 unique users were received for this Challenge. Of the submissions from these 12 users, nine passed the first stage of evaluation and were eligible for ranking. The top-ranked submission employed a conditional latent diffusion model, whereas the joint runners-up employed a generative adversarial network followed by another network for image superresolution. In general, we observed that the overall ranking of the top nine submissions according to our evaluation method (i) did not match the FID-based ranking, and (ii) differed with respect to individual feature families. Another important finding from our additional analyses was that different DGMs demonstrated similar kinds of artifacts. Conclusions: This Grand Challenge highlighted the need for domain-specific evaluation to further DGM design as well as deployment. It also demonstrated that the specification of a DGM may differ depending on its intended use.
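The first-stage image-quality check described in this abstract relies on the Fréchet Inception Distance, which fits a Gaussian to the embedding features of each image set and measures the Fréchet distance between the two fits. The sketch below is an illustrative reimplementation of that distance, not the Challenge's actual evaluation code; the feature arrays are assumed to be precomputed embeddings (e.g. Inception-v3 activations for the standard FID).

```python
import numpy as np
from scipy import linalg


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussian fits of two feature sets.

    Each input is an (n_samples, n_features) array of image embeddings;
    with Inception-v3 activations this is the standard FID.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Matrix square root of the covariance product; tiny imaginary
    # components from numerical error are discarded.
    covmean, _ = linalg.sqrtm(cov_a @ cov_b, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```

Lower is better, and identical distributions give a distance of zero. Note that, as the report's results indicate, a low FID alone did not predict the final domain-specific ranking of submissions.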
Affiliation(s)
- Rucha Deshpande
  - Dept. of Biomedical Engineering, Washington University in St. Louis, St. Louis, Missouri, USA
- Varun A. Kelkar
  - Dept. of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Dimitrios Gotsis
  - Dept. of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Prabhat Kc
  - Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, Maryland, USA
- Rongping Zeng
  - Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, Maryland, USA
- Frank J. Brooks
  - Center for Label-free Imaging and Multiscale Biophotonics (CLIMB), University of Illinois Urbana-Champaign, Urbana, Illinois, USA
  - Dept. of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
- Mark A. Anastasio
  - Dept. of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
  - Center for Label-free Imaging and Multiscale Biophotonics (CLIMB), University of Illinois Urbana-Champaign, Urbana, Illinois, USA
  - Dept. of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
4
Chang Q, Yan Z, Zhou M, Qu H, He X, Zhang H, Baskaran L, Al'Aref S, Li H, Zhang S, Metaxas DN. Mining multi-center heterogeneous medical data with distributed synthetic learning. Nat Commun 2023; 14:5510. PMID: 37679325; PMCID: PMC10484909; DOI: 10.1038/s41467-023-40687-y.
Abstract
Overcoming barriers to the use of multi-center data for medical analytics is challenging due to privacy protection and data heterogeneity in the healthcare system. In this study, we propose the Distributed Synthetic Learning (DSL) architecture to learn across multiple medical centers while ensuring the protection of sensitive personal information. DSL enables the building of a homogeneous dataset of entirely synthetic medical images via a form of GAN-based synthetic learning. The proposed DSL architecture has the following key functionalities: multi-modality learning, missing-modality completion learning, and continual learning. We systematically evaluate the performance of DSL on different medical applications using cardiac computed tomography angiography (CTA), brain tumor MRI, and histopathology nuclei datasets. Extensive experiments demonstrate the superior performance of DSL as a high-quality synthetic medical image provider, as measured by a synthetic-quality metric called Dist-FID. We show that DSL can be adapted to heterogeneous data, outperforming the segmentation model trained on real misaligned modalities by 55% and the temporal-datasets segmentation model by 8%.
Affiliation(s)
- Qi Chang
  - Department of Computer Science, Rutgers University, Piscataway, NJ, USA
- Mu Zhou
  - SenseBrain Research, Princeton, NJ, USA
  - Shanghai Artificial Intelligence Laboratory, Shanghai, China
- Hui Qu
  - Department of Computer Science, Rutgers University, Piscataway, NJ, USA
- Xiaoxiao He
  - Department of Computer Science, Rutgers University, Piscataway, NJ, USA
- Han Zhang
  - Department of Computer Science, Rutgers University, Piscataway, NJ, USA
- Lohendran Baskaran
  - Department of Cardiovascular Medicine, National Heart Centre Singapore, and Duke-National University of Singapore, Singapore, Singapore
- Subhi Al'Aref
  - Department of Medicine, Division of Cardiology, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Hongsheng Li
  - Chinese University of Hong Kong, Hong Kong SAR, China
  - Centre for Perceptual and Interactive Intelligence (CPII), Hong Kong SAR, China
- Shaoting Zhang
  - Shanghai Artificial Intelligence Laboratory, Shanghai, China
  - Centre for Perceptual and Interactive Intelligence (CPII), Hong Kong SAR, China
  - SenseTime, Shanghai, China
- Dimitris N Metaxas
  - Department of Computer Science, Rutgers University, Piscataway, NJ, USA
5
Jiang X, Hu Z, Wang S, Zhang Y. Deep Learning for Medical Image-Based Cancer Diagnosis. Cancers (Basel) 2023; 15:3608. PMID: 37509272; PMCID: PMC10377683; DOI: 10.3390/cancers15143608.
Abstract
(1) Background: The application of deep learning technology to cancer diagnosis based on medical images is one of the research hotspots in the fields of artificial intelligence and computer vision. Although deep learning methods are developing rapidly, cancer diagnosis demands very high accuracy and timeliness, and medical imaging has an inherent particularity and complexity. A comprehensive review of relevant studies is therefore necessary to help readers understand the current research status and ideas. (2) Methods: Five types of radiological images, namely X-ray, ultrasound (US), computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET), as well as histopathological images, are reviewed in this paper. The basic architecture of deep learning and classical pretrained models are comprehensively reviewed. In particular, advanced neural networks emerging in recent years, including transfer learning, ensemble learning (EL), graph neural networks, and vision transformers (ViT), are introduced. Four overfitting-prevention methods are summarized: batch normalization, dropout, weight initialization, and data augmentation. The application of deep learning technology in medical image-based cancer analysis is sorted out. (3) Results: Deep learning has achieved great success in medical image-based cancer diagnosis, showing good results in image classification, image reconstruction, image detection, image segmentation, image registration, and image synthesis. However, the lack of high-quality labeled datasets limits the role of deep learning, and challenges remain in rare-cancer diagnosis, multi-modal image fusion, model explainability, and generalization. (4) Conclusions: More public standard databases for cancer are needed. Pretrained models based on deep neural networks have the potential to be improved, and special attention should be paid to research on multimodal data fusion and the supervised paradigm. Technologies such as ViT, ensemble learning, and few-shot learning may bring further advances to cancer diagnosis based on medical images.
Grants
- RM32G0178B8 BBSRC
- MC_PC_17171 MRC, UK
- RP202G0230 Royal Society, UK
- AA/18/3/34220 BHF, UK
- RM60G0680 Hope Foundation for Cancer Research, UK
- P202PF11 GCRF, UK
- RP202G0289 Sino-UK Industrial Fund, UK
- P202ED10, P202RE969 LIAS, UK
- P202RE237 Data Science Enhancement Fund, UK
- 24NN201 Fight for Sight, UK
- OP202006 Sino-UK Education Fund, UK
- 2023SJZD125 Major project of philosophy and social science research in colleges and universities in Jiangsu Province, China
Affiliation(s)
- Xiaoyan Jiang
  - School of Mathematics and Information Science, Nanjing Normal University of Special Education, Nanjing 210038, China
- Zuojin Hu
  - School of Mathematics and Information Science, Nanjing Normal University of Special Education, Nanjing 210038, China
- Shuihua Wang
  - School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
- Yudong Zhang
  - School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
6
Wang Z, Lim G, Ng WY, Tan TE, Lim J, Lim SH, Foo V, Lim J, Sinisterra LG, Zheng F, Liu N, Tan GSW, Cheng CY, Cheung GCM, Wong TY, Ting DSW. Synthetic artificial intelligence using generative adversarial network for retinal imaging in detection of age-related macular degeneration. Front Med (Lausanne) 2023; 10:1184892. PMID: 37425325; PMCID: PMC10324667; DOI: 10.3389/fmed.2023.1184892.
Abstract
Introduction: Age-related macular degeneration (AMD) is one of the leading causes of vision impairment globally, and early detection is crucial to prevent vision loss. However, screening for AMD is resource-dependent and demands experienced healthcare providers. Recently, deep learning (DL) systems have shown potential for effective detection of various eye diseases from retinal fundus images, but developing such robust systems requires large datasets, which can be limited by disease prevalence and patient privacy. In the case of AMD, images of the advanced phenotype are often too scarce for DL analysis, a limitation that may be tackled by generating synthetic images using Generative Adversarial Networks (GANs). This study aims to develop GAN-synthesized fundus photos with AMD lesions and to assess the realness of these images with an objective scale. Methods: To build our GAN models, a total of 125,012 fundus photos were used from a real-world non-AMD phenotypical dataset. StyleGAN2 and a human-in-the-loop (HITL) method were then applied to synthesize fundus images with AMD features. To objectively assess the quality of the synthesized images, we propose a novel realness scale based on the frequency of broken vessels observed in the fundus photos. Four residents conducted two rounds of grading on 300 images to distinguish real from synthetic images, based on their subjective impression and the objective scale, respectively. Results and discussion: The introduction of HITL training increased the percentage of synthetic images with AMD lesions, despite the limited number of AMD images in the initial training dataset. Qualitatively, the synthesized images proved robust in that our residents had limited ability to distinguish real from synthetic ones, as evidenced by an overall accuracy of 0.66 (95% CI: 0.61-0.66) and a Cohen's kappa of 0.320. For the non-referable AMD classes (no or early AMD), the accuracy was only 0.51. With the objective scale, the overall accuracy improved to 0.72. In conclusion, GAN models built with HITL training are capable of producing realistic-looking fundus images that could fool human experts, while our objective realness scale based on broken vessels can help identify synthetic fundus photos.
Affiliation(s)
- Zhaoran Wang
  - Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
- Gilbert Lim
  - Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
  - Singapore Eye Research Institute, Singapore, Singapore
- Wei Yan Ng
  - Singapore Eye Research Institute, Singapore, Singapore
  - Singapore National Eye Centre, Singapore, Singapore
- Tien-En Tan
  - Singapore Eye Research Institute, Singapore, Singapore
  - Singapore National Eye Centre, Singapore, Singapore
- Jane Lim
  - Singapore Eye Research Institute, Singapore, Singapore
  - Singapore National Eye Centre, Singapore, Singapore
- Sing Hui Lim
  - Singapore Eye Research Institute, Singapore, Singapore
  - Singapore National Eye Centre, Singapore, Singapore
- Valencia Foo
  - Singapore Eye Research Institute, Singapore, Singapore
  - Singapore National Eye Centre, Singapore, Singapore
- Joshua Lim
  - Singapore Eye Research Institute, Singapore, Singapore
  - Singapore National Eye Centre, Singapore, Singapore
- Feihui Zheng
  - Singapore Eye Research Institute, Singapore, Singapore
- Nan Liu
  - Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
  - Singapore Eye Research Institute, Singapore, Singapore
- Gavin Siew Wei Tan
  - Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
  - Singapore Eye Research Institute, Singapore, Singapore
  - Singapore National Eye Centre, Singapore, Singapore
- Ching-Yu Cheng
  - Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
  - Singapore Eye Research Institute, Singapore, Singapore
  - Singapore National Eye Centre, Singapore, Singapore
- Gemmy Chui Ming Cheung
  - Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
  - Singapore Eye Research Institute, Singapore, Singapore
  - Singapore National Eye Centre, Singapore, Singapore
- Tien Yin Wong
  - Singapore National Eye Centre, Singapore, Singapore
  - School of Medicine, Tsinghua University, Beijing, China
- Daniel Shu Wei Ting
  - Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
  - Singapore Eye Research Institute, Singapore, Singapore
  - Singapore National Eye Centre, Singapore, Singapore
7
Yan C, Yan Y, Wan Z, Zhang Z, Omberg L, Guinney J, Mooney SD, Malin BA. A multifaceted benchmarking of synthetic electronic health record generation models. Nat Commun 2022; 13:7609. PMID: 36494374; PMCID: PMC9734113; DOI: 10.1038/s41467-022-35295-1.
Abstract
Synthetic health data have the potential to mitigate privacy concerns in supporting biomedical research and healthcare applications. Modern approaches for data generation continue to evolve and demonstrate remarkable potential. Yet there is a lack of a systematic assessment framework to benchmark methods as they emerge and determine which methods are most appropriate for which use cases. In this work, we introduce a systematic benchmarking framework to appraise key characteristics with respect to utility and privacy metrics. We apply the framework to evaluate synthetic data generation methods for electronic health records data from two large academic medical centers with respect to several use cases. The results illustrate that there is a utility-privacy tradeoff for sharing synthetic health data and further indicate that no method is unequivocally the best on all criteria in each use case, which makes it evident why synthetic data generation methods need to be assessed in context.
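As one concrete example of the kind of privacy metric such a benchmarking framework can include, the distance to the closest real record is a common check for memorisation in synthetic tabular data. The sketch below is illustrative only, not one of this paper's specific metrics; it assumes the feature matrices are numeric and comparably scaled.

```python
import numpy as np


def distance_to_closest_record(real: np.ndarray, synth: np.ndarray) -> np.ndarray:
    """For each synthetic row, the Euclidean distance to its nearest real row.

    Values near zero suggest the generator may have memorised (and could
    leak) individual training records; larger values suggest novel samples.
    Inputs are (n_rows, n_features) numeric arrays on a common scale.
    """
    # Pairwise squared distances via broadcasting: shape (n_synth, n_real).
    d2 = ((synth[:, None, :] - real[None, :, :]) ** 2).sum(axis=-1)
    return np.sqrt(d2.min(axis=1))
```

In practice the resulting distances are compared against a baseline of real-to-real (e.g. train-to-holdout) nearest-neighbour distances rather than judged in absolute terms, which is one way the utility-privacy tradeoff described above becomes measurable.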
Affiliation(s)
- Chao Yan
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Yao Yan
  - Sage Bionetworks, Seattle, WA, USA
- Zhiyu Wan
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Ziqi Zhang
  - Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Larsson Omberg
  - Sage Bionetworks, Seattle, WA, USA
- Justin Guinney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
  - Tempus Labs, Chicago, IL, USA
- Sean D. Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Bradley A. Malin
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Computer Science, Vanderbilt University, Nashville, TN, USA
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
8
Seastedt KP, Schwab P, O’Brien Z, Wakida E, Herrera K, Marcelo PGF, Agha-Mir-Salim L, Frigola XB, Ndulue EB, Marcelo A, Celi LA. Global healthcare fairness: We should be sharing more, not less, data. PLOS Digit Health 2022; 1:e0000102. PMID: 36812599; PMCID: PMC9931202; DOI: 10.1371/journal.pdig.0000102.
Abstract
The availability of large, deidentified health datasets has enabled significant innovation in using machine learning (ML) to better understand patients and their diseases. However, questions remain regarding the true privacy of these data, patient control over their data, and how we regulate data sharing in a way that does not encumber progress or further potentiate biases against underrepresented populations. After reviewing the literature on potential reidentifications of patients in publicly available datasets, we argue that the cost of slowing ML progress, measured in terms of access to future medical innovations and clinical software, is too great to limit the sharing of data through large publicly available databases over concerns about imperfect data anonymization. This cost is especially great for developing countries, where the barriers preventing inclusion in such databases will continue to rise, further excluding these populations and increasing existing biases that favor high-income countries. Preventing artificial intelligence's progress toward precision medicine and sliding back to clinical practice dogma may pose a larger threat than concerns of potential patient reidentification within publicly available datasets. While the risk to patient privacy should be minimized, we believe this risk will never be zero, and society has to determine an acceptable risk threshold below which data sharing can occur, for the benefit of a global medical knowledge system.
Affiliation(s)
- Kenneth P. Seastedt
  - Beth Israel Deaconess Medical Center, Department of Surgery, Harvard Medical School, Boston, Massachusetts, United States of America
- Patrick Schwab
  - GlaxoSmithKline, Artificial Intelligence & Machine Learning, Zug, Switzerland
- Zach O’Brien
  - Australian and New Zealand Intensive Care Research Centre (ANZIC-RC), Department of Epidemiology and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
- Edith Wakida
  - Mbarara University of Science and Technology, Mbarara, Uganda
- Karen Herrera
  - Quality and Patient Safety, Hospital Militar, Managua, Nicaragua
- Portia Grace F. Marcelo
  - Department of Family & Community Medicine, University of the Philippines, Manila, Philippines
- Louis Agha-Mir-Salim
  - Institute of Medical Informatics, Charité—Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Berlin, Germany
  - Laboratory for Computational Physiology, Harvard-MIT Division of Health Sciences & Technology, Cambridge, Massachusetts, United States of America
- Xavier Borrat Frigola
  - Laboratory for Computational Physiology, Harvard-MIT Division of Health Sciences & Technology, Cambridge, Massachusetts, United States of America
  - Anesthesiology and Critical Care Department, Hospital Clinic de Barcelona, Barcelona, Spain
- Emily Boardman Ndulue
  - Department of Journalism, Northeastern University, Boston, Massachusetts, United States of America
- Alvin Marcelo
  - Department of Surgery, University of the Philippines, Manila, Philippines
- Leo Anthony Celi
  - Laboratory for Computational Physiology, Harvard-MIT Division of Health Sciences & Technology, Cambridge, Massachusetts, United States of America
  - Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, United States of America
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
9
Shmatko A, Ghaffari Laleh N, Gerstung M, Kather JN. Artificial intelligence in histopathology: enhancing cancer research and clinical oncology. Nat Cancer 2022; 3:1026-1038. PMID: 36138135; DOI: 10.1038/s43018-022-00436-4.
Abstract
Artificial intelligence (AI) methods have multiplied our capabilities to extract quantitative information from digital histopathology images. AI is expected to reduce workload for human experts, improve the objectivity and consistency of pathology reports, and have a clinical impact by extracting hidden information from routinely available data. Here, we describe how AI can be used to predict cancer outcome, treatment response, genetic alterations and gene expression from digitized histopathology slides. We summarize the underlying technologies and emerging approaches, noting limitations, including the need for data sharing and standards. Finally, we discuss the broader implications of AI in cancer research and oncology.
Affiliation(s)
- Artem Shmatko
  - Division of AI in Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Moritz Gerstung
  - Division of AI in Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Jakob Nikolas Kather
  - Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany
  - Medical Oncology, National Center for Tumor Diseases, University Hospital Heidelberg, Heidelberg, Germany
  - Pathology and Data Analytics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, UK
  - Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, Technical University Dresden, Dresden, Germany
10
Conditional generation of medical time series for extrapolation to underrepresented populations. PLOS Digit Health 2022; 1:e0000074. PMID: 36812549; PMCID: PMC9931259; DOI: 10.1371/journal.pdig.0000074.
Abstract
The widespread adoption of electronic health records (EHRs) and subsequent increased availability of longitudinal healthcare data has led to significant advances in our understanding of health and disease with direct and immediate impact on the development of new diagnostics and therapeutic treatment options. However, access to EHRs is often restricted due to their perceived sensitive nature and associated legal concerns, and the cohorts therein typically are those seen at a specific hospital or network of hospitals and therefore not representative of the wider population of patients. Here, we present HealthGen, a new approach for the conditional generation of synthetic EHRs that maintains an accurate representation of real patient characteristics, temporal information and missingness patterns. We demonstrate experimentally that HealthGen generates synthetic cohorts that are significantly more faithful to real patient EHRs than the current state-of-the-art, and that augmenting real data sets with conditionally generated cohorts of underrepresented subpopulations of patients can significantly enhance the generalisability of models derived from these data sets to different patient populations. Synthetic conditionally generated EHRs could help increase the accessibility of longitudinal healthcare data sets and improve the generalisability of inferences made from these data sets to underrepresented populations.